International Journal of Advanced Computer Engineering and Communication Technology (IJACECT)
ISSN (Print): 2319-2526, Volume -2, Issue -4, 2013-14

Privacy and Integrity Preserving on Range Queries in Sensor Networks

Anurag Singh¹, Vishal Shinde², Akshay Agrawal³
¹Department of Computer, Shivajirao S. Jhondhle College of Engg & Tech, Asangaon, Thane, India
²DYPCOE, Pune, India
³ARMIET, Thane, India
Email: Singhanuragv@gmail.com, vishalshinde@rediffmail.com, Akshay1661@gmail.com

Abstract— In the two-tiered architecture of sensor networks, storage nodes serve as an intermediate tier between the sensors and the sink for storing data and processing queries. This model is widely adopted because of its benefits in power saving, storage, and query-processing efficiency. However, its importance also makes the storage node attractive to attackers. The SafeQ protocol prevents attackers from gaining information from both sensor-collected data and sink-issued queries, and it allows the sink to detect compromised storage nodes when they misbehave. SafeQ preserves privacy with a novel technique that encodes data and queries so that a storage node can correctly process encoded queries without knowing the actual values. To preserve integrity, that is, to let the sink verify whether a query result contains exactly the data items that satisfy the query, two schemes can be used: Merkle hash trees and neighborhood chains.

Keywords— Integrity, privacy, range queries, sensor networks.

I. INTRODUCTION

A. Motivation
Wireless sensor networks are widely deployed for applications such as forest-fire detection, robot control, and earthquake monitoring. In a two-tiered sensor network, storage nodes gather data from nearby sensors and answer queries from the sink; each storage node thus acts as an intermediary between the sensors and the sink. Storage nodes provide three benefits to sensor networks. First, power is saved because sensors send their collected data to the closest storage node instead of over long routes. Second, sensors can have limited memory, because data are stored at the storage nodes. Third, query processing becomes more efficient, because the sink communicates only with storage nodes. Storage nodes were first introduced in [1] and have been widely used since [2]–[5]. StarGate [6] and RISE [7] are examples of commercial products used as storage nodes.

Because a storage node stores data received from many sensors and plays an important role in answering queries, it also brings significant security challenges: storage nodes are attractive targets, are vulnerable to compromise, and expose the network to several attacks. First, the attacker may obtain sensitive data. Second, a compromised storage node may return forged data. Third, a compromised storage node may omit data items that satisfy the query.

B. Technical Challenges
Preserving privacy and preserving integrity are the two main challenges in the range query problem. First, a storage node must correctly process encoded queries over encoded data without learning their actual values. Second, the sink must verify that the query result contains all the data items that satisfy the query and does not include any forged data.

C. Limitations of Prior Art
Sheng and Li, in their recent seminal work [5], proposed a solution to this problem, commonly known as the "S&L scheme". The S&L scheme has two main drawbacks: (1) it allows an attacker to obtain a reasonable estimation of the original data and of the queries issued by the sink, and (2) its power and storage consumption grow exponentially with the number of dimensions of the data.

D. Approach and Key Contribution
This paper uses SafeQ, a novel privacy- and integrity-preserving range query protocol. SafeQ and S&L are two fundamentally different protocols.
SafeQ preserves privacy with a novel technique that encodes both data and queries so that the storage node can correctly process encoded queries over encoded data without knowing their actual values. To preserve integrity, SafeQ uses a neighborhood chaining technique that allows the sink to verify that a query result contains exactly the data items that satisfy the query. SafeQ has two main advantages over the S&L scheme. First, SafeQ provides stronger security and privacy, because estimating the actual data values is much more difficult. Second, SafeQ achieves better performance in terms of both power and storage consumption for multidimensional data.

II. RELATED WORK

A. Privacy and Integrity Preserving in WSNs
Preserving privacy and integrity in wireless sensor networks has recently received attention [5], [9]. Sheng and Li [5] proposed a scheme based on the bucket-partitioning idea that Hacigumus et al. proposed for database privacy [10]. The domain of data values is divided into multiple buckets whose sizes are determined by the distribution of the data and the location of the sensors. In each time slot, a sensor collects data items from the environment, places them into buckets, encrypts the buckets, and sends the encrypted buckets along with their bucket IDs to a nearby storage node. For each bucket that contains no data items, the sensor sends an encoding number, which the sink uses to verify that the bucket is indeed empty. To perform a range query, the sink finds the smallest set of bucket IDs whose buckets cover the queried range and sends this set as the query to the storage node. On receiving the bucket IDs, the storage node returns the corresponding encrypted buckets, which the sink decrypts and verifies using the encoding numbers.

The S&L scheme has two main drawbacks. First, bucket partitioning allows a reasonable estimation of the actual values of both data items and queries. Second, the power consumption of the sensors and the sink, as well as the storage space consumption, grows exponentially with the number of dimensions. In SafeQ, by contrast, power and space consumption grow linearly with the number of dimensions times the number of data items.

B. Privacy Preserving in Database
In the database-as-a-service model, where sensitive data are outsourced to an untrusted server, the bucket-partitioning idea has been used for querying encrypted data [10]; however, it suffers from the drawbacks discussed above. Boneh and Waters proposed a public-key system that supports conjunctive, subset, and range queries on encrypted data [15]. This scheme cannot be used to solve the privacy problem considered here, because it is too computationally expensive for sensor networks.

C. Integrity Preserving in Database
Work in this area focuses on verifying the completeness of the results of relational database queries. Merkle hash trees have been used to authenticate data elements and to verify the integrity of database query results. Pang et al. [19] and Narasimha and Tsudik [18] proposed similar schemes using signature aggregation and chaining. Although the neighborhood chaining method appears similar to these signature aggregation and chaining techniques, it is much more suitable and efficient for sensor networks because of two differences. First, it directly concatenates each data item with its left neighbor, without computing a digest. Second, it does not need to compute signatures.

D. Secure File System on Untrusted Servers
Prior work [16], [17] has studied secure file systems on untrusted servers, which aim to let users securely store their files on an untrusted server. Because the untrusted server cannot process queries over the files, these solutions cannot solve the range query problem. In contrast, enabling the storage node to process queries while preserving privacy is a main design goal of SafeQ.

III. MODELS AND PROBLEM STATEMENT

A. System Model
Figure 1 illustrates a two-tiered sensor network, which consists of three types of nodes: sensors, storage nodes, and the sink. Sensors are inexpensive devices with limited storage space and computing power, deployed in large numbers to collect data (such as temperature) from the environment. Storage nodes are much more powerful mobile devices, with more computation power and storage space than sensors. Each sensor sends the data it collects in each time slot to its nearest storage node. The sink is the point of contact for the users of the sensor network. Each time the sink receives a question from a user, it translates the question into multiple queries and disseminates them to the corresponding storage nodes, which process the queries and return the results to the sink. The sink combines the query results received from multiple storage nodes and sends the answer to the user.

Fig. 1. Architecture of two-tiered sensor networks

In this architecture, all sensors and storage nodes are assumed to be loosely synchronized with the sink. Time is divided into fixed intervals (time slots), and every sensor collects data once per interval. Starting from a time that the sensors and the sink agree upon, after a sensor has collected data n times, it sends a message containing the three-tuple (a, t, {v1, ..., vn}), where a is the sensor ID, t is the sequence number of the time slot, and {v1, ..., vn} are the data items collected by sensor a.

It is also assumed that the queries from the sink are range queries. A range query is denoted {t, [p, q]} and means "find the data items collected in time slot t whose values are in the range [p, q]".

The basic idea of the privacy-preserving solution is as follows. Each sensor si in the network shares a secret key ki with the sink. Consider n data items v1, ..., vn that the sensor collects in time slot t. The sensor first encrypts the items with the secret key; the results are denoted (v1)ki, ..., (vn)ki. The sensor then applies a "magic" function H to the data items, obtaining H(v1, ..., vn). The message that the sensor sends to its closest storage node includes both the encrypted data and H(v1, ..., vn). When the sink wants to perform the query {t, [p, q]}, it applies another magic function G and sends {t, G([p, q])} to the storage node. The storage node processes the query over the encrypted data of time slot t using a third magic function F. The three functions H, F, and G must satisfy the following three conditions.
(1) A data item vj (1 ≤ j ≤ n) is in the range [p, q] if and only if F(j, H(v1, ..., vn), G([p, q])) is true. This condition allows the storage node to decide whether the item should be included in the query result.
(2) Given H(v1, ..., vn) and (vj)ki, it is computationally infeasible for the storage node to compute vj. This guarantees data privacy.
(3) Given G([p, q]), it is computationally infeasible for the storage node to compute p and q. This guarantees query privacy.
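The functions H, G and F above are abstract. The sketch below shows one way they could be instantiated with prefix membership verification and HMAC, in the spirit of the submission, query and query-processing steps of Section V; it is a minimal illustration under stated assumptions, not the authors' implementation. H tags the prefix set of every range [v_i, v_{i+1}] between consecutive sorted items (bounds included), G tags the prefix families of p and q, and F reports item j as a match when p falls in some range [v_{n1-1}, v_{n1}] with n1 ≤ j and q falls in some range [v_{n2}, v_{n2+1}] with n2 ≥ j, tested only by intersecting HMAC tags. Assumptions: values are 4-bit integers (W = 4, matching the 4-bit examples in Section V); HMAC uses SHA-1 (the text allows MD5 or SHA-1); a prefix b1...bk*...* is numericalized as the number b1...bk 1 0...0 (a convention assumed here, since the text does not define N); duplicates are collapsed without counts; and the encryption of items with ki is omitted because F never reads it. All function names and the key are hypothetical, and prefix_family is used for the prefix family to avoid clashing with the magic function F.

```python
import hashlib
import hmac

W = 4  # hypothetical bit width of data values, matching the 4-bit examples


def prefix_family(x, w=W):
    """Prefix family of x: the w + 1 prefixes that contain x."""
    bits = format(x, f"0{w}b")
    return {bits[:k] + "*" * (w - k) for k in range(w + 1)}


def min_prefix_set(lo, hi, w=W):
    """S([lo, hi]): minimum set of prefixes whose union is exactly [lo, hi]."""
    result = set()
    while lo <= hi:
        s = 0  # the prefix chosen at lo will cover 2**s values
        while s < w and lo % 2 ** (s + 1) == 0 and lo + 2 ** (s + 1) - 1 <= hi:
            s += 1
        result.add(format(lo, f"0{w}b")[: w - s] + "*" * s)
        lo += 2 ** s
    return result


def numericalize(prefix):
    """N(P) under the assumed convention: b1..bk*..* becomes b1..bk 1 0..0."""
    k = prefix.index("*") if "*" in prefix else len(prefix)
    return int(prefix[:k] + "1" + "0" * (len(prefix) - k), 2)


def hmac_tags(prefixes, key):
    """HMAC_y(N(P)) for every prefix P in the set, using HMAC-SHA1."""
    return {
        hmac.new(key, numericalize(p).to_bytes(8, "big"), hashlib.sha1).hexdigest()
        for p in prefixes
    }


def H(values, lo, hi, key):
    """Sensor (hypothetical H): one tag set per range [v0, v1], ..., [vn, vn+1]."""
    v = [lo] + sorted(set(values)) + [hi]  # duplicates collapsed, counts omitted
    return [hmac_tags(min_prefix_set(v[i], v[i + 1]), key) for i in range(len(v) - 1)]


def G(p, q, key):
    """Sink (hypothetical G): tag sets for the prefix families of p and q."""
    return hmac_tags(prefix_family(p), key), hmac_tags(prefix_family(q), key)


def F(j, range_tags, query):
    """Storage node (hypothetical F): is the j-th smallest item (1-based) in [p, q]?
    Only HMAC tags are compared; no plaintext value is needed."""
    tags_p, tags_q = query
    n = len(range_tags) - 1  # range_tags[i] covers [v_i, v_{i+1}]
    # p lies in some [v_{n1-1}, v_{n1}] with n1 <= j
    left = any(tags_p & range_tags[n1 - 1] for n1 in range(1, j + 1))
    # q lies in some [v_{n2}, v_{n2+1}] with n2 >= j
    right = any(tags_q & range_tags[n2] for n2 in range(j, n + 1))
    return left and right


if __name__ == "__main__":
    key = b"hypothetical-key-y"
    items = [1, 3, 5, 7, 9]
    tags = H(items, lo=0, hi=15, key=key)
    query = G(3, 7, key)
    print([v for j, v in enumerate(items, start=1) if F(j, tags, query)])  # [3, 5, 7]
```

Because the sensor and the sink apply the same keyed HMAC, equal prefixes produce equal tags, so the storage node can test set intersection without learning which prefixes, and hence which values, are involved.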
B. Threat Model
It is assumed that the sensors and the sink are trusted, whereas the storage nodes are not. In a hostile environment, both sensors and storage nodes can be compromised. If a sensor is compromised, the data it collects become known to the attacker, who can also send forged data to the nearest storage node. Protecting sensors from attackers is extremely difficult without tamper-proof hardware; however, the data from one sensor constitute only a small portion of the data collected by the entire network. Therefore, the main focus is on storage nodes. Once a storage node is compromised, the attacker learns the data stored on it and the queries it receives from the sink. Attackers are therefore more motivated to compromise storage nodes.

IV. MODULE DESCRIPTION

A. Sensor Node
Sensor nodes are the devices that sense data and send it in encrypted form to the storage node. A sensor node may have limited storage space.

B. Storage Node
The storage node is the main node and the most powerful device in this architecture. It stores the data collected from the sensor nodes, and the sink interacts with it to view the data. The storage node can also record details of misbehavior by parties that have tried to access the data.

C. Sink
The sink is the node through which users or the administrator interact with the storage node and view the data items collected by the sensors.

D. SafeQ
SafeQ is the protocol used to preserve privacy; it prevents attackers from gaining information. It encodes both queries and data in such a way that the storage node can process queries without knowing the actual values.

V. PRIVACY PRESERVING FOR 1-DIMENSIONAL DATA
To preserve privacy, it seems natural to have the sensors encrypt data and the sink encrypt queries. The key challenge, however, is how the storage node can process encrypted queries over encrypted data without knowing their actual values.

A. Prefix Membership Verification
The basic building block for preserving privacy is the prefix membership verification scheme. Its key idea is to convert the verification of whether a number is in a range into several verifications of whether two numbers are equal. A prefix {0,1}^k{*}^(w-k), with k leading 0s and 1s followed by w-k *s, is called a k-prefix. For example, 1*** is a 1-prefix and denotes the range [1000, 1111]. The prefix family of a number x, denoted F(x), is the set of all prefixes that contain x. For example, the prefix family of 12 is F(12) = F(1100) = {1100, 110*, 11**, 1***, ****}. Prefix membership verification is based on the fact that, for any number x and prefix P, x ∈ P if and only if P ∈ F(x). To verify whether a number p is in a range [v1, v2], we first convert the range to the minimum set of prefixes whose union is [v1, v2], denoted S([v1, v2]). For example, S([11, 15]) = {1011, 11**}. Second, we compute the prefix family F(p) of p. Then p ∈ [v1, v2] if and only if F(p) ∩ S([v1, v2]) ≠ ∅. For example, F(12) ∩ S([11, 15]) = {11**}, so 12 ∈ [11, 15].

Fig. 2. Prefix membership verification

B. Submission Protocol
The submission protocol concerns how a sensor sends its data to a storage node. Let v1, ..., vn be the data items that a sensor collects in a time slot, where every item lies in the range [v0, v_{n+1}]; the lower bound v0 and the upper bound v_{n+1} are known to both the sink and the sensor. After collecting data, the sensor performs the following six steps:
1. Sort the n data items in ascending order. If several data items have the same value, represent them only once, annotated with the number of items that share that value.
2. Convert the n+1 ranges [v0, v1], [v1, v2], ..., [vn, v_{n+1}] to their prefix representations, i.e., compute S([v0, v1]), S([v1, v2]), ..., S([vn, v_{n+1}]).
3. Numericalize all prefixes, i.e., compute N(S([v0, v1])), ..., N(S([vn, v_{n+1}])).
4. Compute a keyed-hash message authentication code (HMAC) of each numericalized prefix using the key y. HMAC can be implemented with MD5 or SHA-1.
5. Encrypt every data item using the key ki.
6. Send the encrypted data items, together with HMACy(N(S([v0, v1]))), ..., HMACy(N(S([vn, v_{n+1}]))), to the nearest storage node.

C. Query Protocol
The query protocol concerns how the sink sends a range query to a storage node. To perform the query {t, [p, q]} on a storage node, the sink performs the following steps:
1. Compute the prefix families F(p) and F(q).
2. Numericalize all prefixes, i.e., compute N(F(p)) and N(F(q)).
3. Apply HMACy to all numericalized prefixes.
4. Send {t, HMACy(N(F(p))), HMACy(N(F(q)))} to the storage node as the query.

D. Query Processing
On receiving the query {t, HMACy(N(F(p))), HMACy(N(F(q)))}, the storage node processes it over the n data items received from the nearby sensor using the following theorem. Given n numbers in ascending order v1 < ... < vn, where vj ∈ (v0, v_{n+1}) (1 ≤ j ≤ n), vj ∈ [p, q] if and only if there exist 1 ≤ n1 ≤ j ≤ n2 ≤ n such that the following two conditions hold:
1. HMACy(N(F(p))) ∩ HMACy(N(S([v_{n1-1}, v_{n1}]))) ≠ ∅
2. HMACy(N(F(q))) ∩ HMACy(N(S([v_{n2}, v_{n2+1}]))) ≠ ∅
Condition 1 states that p ∈ [v_{n1-1}, v_{n1}], and condition 2 states that q ∈ [v_{n2}, v_{n2+1}]. The storage node thus finds the data items that satisfy the query, as well as the right neighbor of the largest data item in the query result, which it needs for the verification object.

VI. INTEGRITY PRESERVING FOR 1-DIMENSIONAL DATA
The query result that a storage node sends to the sink must satisfy two requirements. First, the storage node cannot include any data item that does not satisfy the query. Second, it cannot exclude any data item that satisfies the query. To allow the sink to check the integrity of the query result, the response of the storage node consists of two parts: (1) the query result QR, which includes the data items that satisfy the query, and (2) the verification object VO, which is used to verify the result.

A new data structure called neighborhood chains is used for this purpose. Given n data items v1, ..., vn, where v0 < v1 < ... < vn < v_{n+1}, each data item is concatenated with its left neighbor and encrypted with the key ki, giving the list (v0 | v1)ki, (v1 | v2)ki, ..., (vn | v_{n+1})ki, where "|" denotes concatenation. Figure 3 shows the neighborhood chain for the five data items 1, 3, 5, 7, and 9.

Fig. 3. An example of neighborhood chains

Integrity of the query result is preserved as follows. After collecting n data items v1, ..., vn, the sensor sends the corresponding neighborhood chain (v0 | v1)ki, (v1 | v2)ki, ..., (vn | v_{n+1})ki to the storage node. Given a range query [p, q], the storage node processes the query as usual; the VO always consists of the right neighbor of the largest data item in the query result.

On receiving QR and VO, the sink verifies the integrity of the result. Let {v_{n1}, ..., v_{n2}} be the correct query result and QR the result returned by the storage node. The sink detects misbehavior in each of the following four cases:
1) If (v_{j-1} | vj)ki ∉ QR for some n1 < j < n2, the sink detects the error because the neighborhood chain in QR is broken.
2) If (v_{n1-1} | v_{n1})ki ∉ QR, the sink detects the error because the left neighbor stored in the smallest returned item is then not less than p.
3) If (v_{n2-1} | v_{n2})ki ∉ QR, the sink detects the error because the item in VO then either does not chain to the largest returned item or has a value within [p, q].
4) If QR = ∅, the sink verifies that the item (v_{k-1} | vk)ki in VO satisfies v_{k-1} < p ≤ q < vk, i.e., that no data item lies in [p, q].

A Merkle hash tree can also be used to preserve integrity. When a sensor sends data to the storage node, a Merkle hash tree is constructed over the data items. On receiving the query result and the verification object, the sink computes the root value of the tree and then verifies the integrity of the query result. Sorting is essential in this technique; without it, the integrity of the query result is difficult to maintain.
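The neighborhood-chain construction and the sink's checks can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the hypothetical seal function stands in for (v_{j-1} | vj)ki and only authenticates the pair with HMAC-SHA256 instead of encrypting it (the evaluation used DES for encryption), so it illustrates the integrity logic, not confidentiality. Duplicates are collapsed, and the honest storage node is simply handed the indices n1 and n2 that it would obtain from the prefix membership test of Section V. The function names and the key are hypothetical. sink_verify raises an error in each of the four detection cases and also rejects forged elements and items outside [p, q].

```python
import hashlib
import hmac
import json


def seal(left, value, key):
    """Stand-in for (left | value)_ki: HMAC-authenticated, NOT encrypted."""
    payload = json.dumps([left, value]).encode()
    return payload, hmac.new(key, payload, hashlib.sha256).digest()


def unseal(element, key):
    """Check the tag and return (left, value); a forged element raises ValueError."""
    payload, tag = element
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("forged chain element")
    left, value = json.loads(payload)
    return left, value


def build_chain(values, lo, hi, key):
    """Sensor: (v0|v1), (v1|v2), ..., (vn|vn+1) over the sorted items and bounds."""
    v = [lo] + sorted(set(values)) + [hi]
    return [seal(v[i], v[i + 1], key) for i in range(len(v) - 1)]


def storage_answer(chain, n1, n2):
    """Honest storage node: QR = elements of v_n1..v_n2, VO = right neighbor of v_n2.
    For an empty result use n2 = n1 - 1, so VO is the element straddling [p, q]."""
    return chain[n1 - 1:n2], chain[n2]


def sink_verify(qr, vo, p, q, key):
    """Sink: return the verified values, or raise ValueError on misbehavior."""
    pairs = [unseal(e, key) for e in qr]
    vo_left, vo_right = unseal(vo, key)
    if not pairs:  # case 4: VO must show that no item lies in [p, q]
        if not (vo_left < p and q < vo_right):
            raise ValueError("empty result not justified by VO")
        return []
    for (_, prev), (left, _) in zip(pairs, pairs[1:]):
        if left != prev:  # case 1: an item inside the result was dropped
            raise ValueError("neighborhood chain broken")
    if not pairs[0][0] < p:  # case 2: the smallest matching item was dropped
        raise ValueError("left end of result incomplete")
    values = [value for _, value in pairs]
    if vo_left != values[-1] or not q < vo_right:  # case 3: largest item dropped
        raise ValueError("right end of result incomplete")
    if not all(p <= v <= q for v in values):  # item that does not satisfy the query
        raise ValueError("result contains an item outside [p, q]")
    return values


if __name__ == "__main__":
    key = b"hypothetical-key-ki"
    chain = build_chain([9, 3, 7, 1, 5], lo=0, hi=15, key=key)
    qr, vo = storage_answer(chain, 2, 4)  # items v2..v4 answer the query [3, 7]
    print(sink_verify(qr, vo, 3, 7, key))  # [3, 5, 7]
    try:
        sink_verify(qr[:1] + qr[2:], vo, 3, 7, key)  # drop the middle item 5
    except ValueError as err:
        print(err)  # neighborhood chain broken
```

Because every element carries its left neighbor, the storage node cannot remove an item from the middle or either end of the result without leaving a gap that the sink can see.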
VII. RANGE QUERIES IN EVENT-DRIVEN NETWORKS
So far, it has been assumed that in each time slot a sensor sends the data collected in that slot to the storage node. This assumption fails for event-driven networks, where sensors report to storage nodes only when events occur. If the solution above is applied directly, the sink cannot verify whether a sensor collected data in a given time slot: the sensor may not have submitted any data in time slot n, or the storage node may have discarded all the data that the sensor submitted in time slot n.

This challenge is addressed by having sensors report their idle periods to the storage node, either when they submit data after an idle period or when an idle period exceeds a threshold. The storage node can use such an idle period to prove to the sink that the sensor did not submit any data in any time slot of that period.

Sensors: An idle period of a sensor is a range of time slots [n1, n2], including n1 and n2, in which the sensor does not submit data. Let α be the threshold on how long a sensor may remain idle without reporting to the storage node. Suppose the sensor last submitted data in time slot n1 - 1; in the current time slot n ≥ n1, the sensor acts according to the following cases:
n = n1: If the sensor has data to submit, it simply submits the data; otherwise, it takes no action.
n1 < n < α + n1 - 1: If the sensor has data to submit, it submits the data together with the encrypted idle period [n1, n-1]; otherwise, it takes no action.
n = α + n1 - 1: If the sensor has data to submit, it submits the data together with the idle proof [n1, n-1]; otherwise, it submits the idle proof [n1, n-1].

Storage Nodes: When a storage node receives a query {n, G([p, q])} from the sink, it first checks whether the sensor submitted data in time slot n. If so, the storage node sends the query result as discussed in the previous sections. Otherwise, it checks whether the sensor has submitted an idle proof covering time slot n; if so, it sends the idle proof to the sink. Otherwise, it replies to the sink that the sensor has not yet submitted an idle proof for that time slot, and forwards the idle proof to the sink once it receives it. The sink waits at most α - 1 time slots for the idle proof. A smaller α favors integrity verification, whereas a larger α favors power saving at the sensors.

Sink: The sink needs only minimal changes: it uses idle proofs, in addition to query results, when verifying the integrity of query results.

VIII. EXPERIMENTAL RESULTS

A. Evaluation Methodology
To compare SafeQ with the S&L scheme, both schemes were implemented and compared side by side on a large data set. The average power and space consumption were measured for both the submission and query protocols of both schemes.

B. Evaluation Setup
SafeQ and the S&L scheme were implemented using TOSSIM [21], a widely used wireless sensor network simulator. The efficiency of SafeQ and the S&L scheme was measured on 1-dimensional data. For a fair comparison, the experiments were conducted on the same data set that the S&L scheme used [5], a large set of real data collected in the Intel Lab [8]. SafeQ was implemented with 128-bit HMAC-MD5, and DES was used for encryption in both SafeQ and the S&L scheme.

C. Result Summary
The side-by-side comparison shows that SafeQ outperforms the S&L scheme. The experimental results also confirm that the power consumption of the S&L scheme grows exponentially with the number of dimensions.

Fig. 4. Average power consumption per query response for storage node (y-axis: power consumption (mW); x-axis: time slots in minutes; curves: SafeQ and S&L scheme)

Fig. 5. Average power consumption per submission for sensor (y-axis: power consumption (mW); x-axis: time slots in minutes; curves: SafeQ and S&L scheme)

IX. CONCLUSION
There are two main contributions in this paper. First, the SafeQ protocol, which efficiently handles range queries in two-tiered sensor networks. SafeQ uses prefix membership verification to preserve privacy, and Merkle hash trees or neighborhood chaining to preserve integrity. SafeQ significantly strengthens security, because it prevents a compromised storage node from obtaining a reasonable estimation of the actual values. The results show its efficiency: SafeQ outperforms the prior art for multidimensional data in terms of both power and storage. Second, a solution for adapting SafeQ to event-driven sensor networks.

REFERENCES
[1] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin, and F. Yu, "Data-centric storage in sensornets with GHT, a geographic hash table," Mobile Networks and Applications, vol. 8, no. 4, pp. 427–442, 2003.
[2] P. Desnoyers, D. Ganesan, H. Li, and P. Shenoy, "Presto: A predictive storage architecture for sensor networks," in Proc. 10th HotOS, 2005, vol. 1, no. 8.
[3] B. Sheng, Q. Li, and W. Mao, "Data storage placement in sensor networks," in Proc. 7th ACM MobiHoc, 2006, pp. 344–355.
[4] B. Sheng, C. C. Tan, Q. Li, and W. Mao, "An approximation algorithm for data storage placement in sensor networks," in Proc. WASA, 2007, vol. 1, no. 8, pp. 71–78.
[5] B. Sheng and Q. Li, "Verifiable privacy-preserving range query in two-tiered sensor networks," in Proc. IEEE INFOCOM, 2008, vol. 2, no. 6, pp. 46–50.
[6] Stargate gateway (SPB400), http://www.xbow.com.
[7] RISE project, http://www.cs.urc.edu/rise.
[8] Intel lab data, http://berkeley.intel-research.net/labtdata.
[9] J. Shi, R. Zhang, and Y. Zhang, "Secure range queries in tiered sensor networks," in Proc. IEEE INFOCOM, 2009.
[10] H. Hacigumus, B. Iyer, C. Li, and S. Mehrotra, "Executing SQL over encrypted data in the database-service-provider model," in Proc. ACM SIGMOD, 2002, pp. 216–227.
[11] B. Hore, S. Mehrotra, and G. Tsudik, "A privacy-preserving index for range queries," in Proc. 30th VLDB, 2004, pp. 720–731.
[12] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, "Order preserving encryption for numeric data," in Proc. ACM SIGMOD, 2004, pp. 563–574.
[13] D. X. Song, D. Wagner, and A. Perrig, "Practical techniques for searches on encrypted data," in Proc. IEEE S&P, 2000.
[14] P. Golle, J. Staddon, and B. Waters, "Secure conjunctive keyword search over encrypted data," in Proc. 2nd ACNS, 2004, pp. 31–45.
[15] D. Boneh and B. Waters, "Conjunctive, subset, and range queries on encrypted data," in Proc. Theory of Cryptography Conference (TCC), 2007, pp. 535–554.
[16] E.-J. Goh, H. Shacham, N. Modadugu, and D. Boneh, "SiRiUS: Securing remote untrusted storage," in Proc. NDSS, 2003, pp. 131–145.
[17] M. Kallahalla, E. Riedel, R. Swaminathan, Q. Wang, and K. Fu, "Plutus: Scalable secure file sharing on untrusted storage," in Proc. FAST, 2003, pp. 29–42.
[18] M. Narasimha and G. Tsudik, "Authentication of outsourced databases using signature aggregation and chaining," in Proc. DASFAA, 2006.
[19] W. Cheng, H. Pang, and K.-L. Tan, "Authenticating multi-dimensional query results in data publishing," in Data and Applications Security, 2006, pp. 60–73.
[20] F. Chen and A. X. Liu, "Privacy and integrity preserving range queries in sensor networks," Michigan State University, Tech. Rep. MSU-CSE-09-26, 2009.
[21] "TOSSIM," http://www.cs.berkeley.edu/pal/research/tossim.html.