Proceedings of the 7th Annual ISC Graduate Research Symposium
ISC-GRS 2013
April 24, 2013, Rolla, Missouri

Vimal Kumar
Department of Computer Science
Missouri University of Science and Technology, Rolla, MO 65409

EFFICIENT AND SECURE CODE DISSEMINATION IN SENSOR CLOUDS

ABSTRACT

In this paper, we present an efficient and secure code dissemination technique aimed at sensor clouds. Previous code dissemination techniques were geared towards traditional wireless sensor networks and did not take into account the dynamic nature of a sensor cloud. The technique presented in this paper first identifies the code that is common across various wireless sensor applications and distributes this code into the network a priori, in the form of functions. During code dissemination, the sensors pick up these common functions from the network, so only a part of the code needs to be transmitted from the base station. This reduces both the total amount of code transmitted and the energy consumed. Since security is important in sensor clouds, we further present a security scheme based on symmetric proxy re-encryption to provide confidentiality and integrity of the code. We also evaluate our scheme in terms of energy consumption and the reduction in disseminated code size to illustrate its efficiency.

1. INTRODUCTION

Over the lifetime of a wireless sensor network, the code running on the sensors may need to be updated or changed completely a number of times. Wireless sensor networks are usually very large, which makes it infeasible to manually update each sensor with new code. The alternative is to disseminate the code wirelessly in the network. Sensors receive the code packet by packet and rebuild the code image once all the code has been received. In wireless code dissemination, code images are communicated over the wireless channel, which is inherently insecure and prone to attacks from adversaries.
A secure code dissemination technique keeps the disseminated code confidential and protects it against malicious code injection attacks.

A large body of work [1-5] aims to reduce the amount of code that must be transmitted from the base station to the sensor nodes. These efforts, however, have focused on traditional wireless sensor networks, which support one application at a time. In such networks the code needs to be updated every once in a while. The updates are most often minor, and most of the code does not change. Papers such as [1, 3, 5] create a difference script between the old code and the updated code. The base station disseminates the script instead of the whole code, which reduces the number of packets and saves energy on the forwarding nodes. In a sensor cloud [6, 7], however, clusters of nodes are provisioned dynamically to users to support several applications on demand. Dynamic provisioning implies that the code on the wireless sensors is changed entirely when a new application is installed. The difference script mechanism cannot be applied in this scenario because the script itself would be about the size of the code. Thus, there is a need for an efficient code dissemination scheme that is well suited to a sensor cloud scenario.

Efficiency of code dissemination is especially important in sensor clouds because the code changes frequently in order to support different applications. A high frequency of code change implies that the sensors spend a good deal of their energy forwarding and installing new code. Any reduction in the total amount of code transferred is thus multiplied by this high frequency, resulting in a large reduction in energy consumption. Moreover, clusters of sensors in a sensor cloud are dynamically provisioned to users. This means that at any given point in time, various clusters are working for various users.
In such a scenario, the security of the code in terms of confidentiality and integrity also becomes very important. Code disseminated from the base station will inevitably be forwarded by many sensors on its way to its destination cluster. Code confidentiality is thus a critical prerequisite, since the code may also carry keying material that needs to be protected against eavesdropping. Another prerequisite is code integrity, which ensures that an adversary has not injected malicious code packets during the code dissemination process.

To summarize, there is a need for a code dissemination scheme that is well suited to a dynamically provisioned sensor cloud scenario. The scheme needs to minimize the number of code packets transmitted and should provide confidentiality and protect the integrity of the disseminated code. In this paper we describe our efficient and secure code dissemination scheme, which addresses the above concerns. Our contributions are thus two-fold.

1. A code dissemination algorithm which reduces the total number of packets sent from the base station to a cluster of sensors in a sensor cloud scenario.
2. A security mechanism which provides confidentiality, integrity and immediate authentication of code packets.

2. SYSTEM MODEL AND ASSUMPTIONS

The sensor cloud consists of a large number of wireless sensors. We assume that a clustering algorithm has been run and the sensors have been grouped into clusters. The sensors are provisioned to the users in units of clusters. At any given point in time, there may be many users in the sensor cloud, each holding one or more clusters of sensors. Such a model has been discussed in [6, 7] and is visualized in Fig. 1. Sensors in a cluster provisioned to a particular user x collect data for user x but may act as forwarding nodes for other clusters, transferring their data and code. In previous models, code updates happened at the scale of the whole WSN.
In our model, on the other hand, code changes happen at the cluster scale. Code is updated when individual users install new applications in their sensor clusters, or when a cluster is provisioned to a new user and a new application is installed. We assume a routing structure is also in place, using which the base station can route the code to any particular cluster. Each cluster has a cluster key CK, which is known to the cluster members and the base station.

The adversary in our model lies inside the network. The sensors in clusters provisioned to other users are curious and may want to eavesdrop on the code being transferred. The sensors which store common code may want to inject malicious code by modifying the code they store and making other sensors accept this modified code.

Figure 1. System Model

3. PROPOSED APPROACH

Our approach is based upon the observation that the executable code which runs on the wireless sensors consists entirely of subroutines and objects. The subroutines and objects have a one-to-one correspondence with the functions and global variables in a high-level language such as nesC in TinyOS. We further observe that a number of these functions and global variables are common across a number of wireless sensor applications. In a sensor cloud environment, which has a number of WSN applications running simultaneously in various clusters of motes, many applications may share parts of their code. Some applications may have the same security code, others may share the same routing subroutines, others may share the sensing code, and so on. All applications also share the same operating system code. The basic idea, therefore, is to identify the commonly used functions and global variables across all the given applications. These are then distributed in the network such that every node probabilistically stores a few of them.
When the code on a cluster of sensors needs to be changed, the base station first checks which of the functions and objects the sensors can request from other sensors in the network. Only the part of the code which does not already exist in the network is sent from the base station. The rest of the code is requested from nearby sensors. The security challenges in the scheme are enumerated below.

1. Since the functions are stored on sensors, a request for a specific function will leak information about the code. To avoid this, the functions need to be kept encrypted on the sensors. This, however, presents a problem because the encryption keys will need to be revealed to the requesting sensors. Once the requesting cluster knows the encryption keys, it can send spurious requests and retrieve all the encrypted code.
2. When sensors reply to function requests with encrypted functions, it must be ensured that the functions have not been tampered with. This authentication needs to be done as soon as the functions are received in order to thwart energy draining attacks.

4. DETECTING COMMON FUNCTIONS

We assume that the base station has a tentative list of applications that may be used in the sensor cloud in the future. Note that not all the applications need to be known beforehand; a small sample is sufficient to detect the functions common across the applications. We follow a procedure similar to that of Qdiff [1] to detect common functions in the applications. We use the msp430-objdump utility of the MSP430 toolchain to dump the ELF files, and the Bauhaus toolkit to compare the C files generated from the nesC code of the various applications, compiled for the TelosB mote. The Bauhaus toolkit has a clone detection utility which can detect Type 1 and Type 2 clones in different applications at the source code level.
Type 1 clones are fragments of code which are identical, and Type 2 clones are copies which are structurally identical but may have their identifiers changed. At this point in our research we only consider Type 1 clones. In the future we will develop techniques to take Type 2 clones into account as well. The Type 1 clones found in the C code by the clone detection utility of the Bauhaus toolkit can be further divided into two different types at the ELF file level.

Definition 1. We define Type 1a clones as the true Type 1 clones, where the two pieces of code are exactly the same and may or may not have been shifted in memory.

Definition 2. Some Type 1 clones may also contain calls to functions and references to global variables which have shifted in memory. We define such clones as Type 1b clones.

While Type 1a clones can be used as they are, the code shifts and the resulting changes of memory addresses in Type 1b clones must be fixed before the Type 1b clones are used. To deal with the code shifts, the base station performs the following steps before disseminating the common functions and objects in the network. The base station reorders the functions and global variables in the ELF files. Beginning with the Type 1a functions, it places the common functions at the end of the .text section. The code will now grow towards the beginning of the file. After all the Type 1a functions are moved, the Type 1b functions are moved in the same way. This results in the reordering of other functions and changes in function references. The base station then fixes the changed function references throughout the code. We place the common functions at the end of the .text section rather than at the beginning to avoid problems if, in the future, the base station receives applications with large .data and .bss sections. Such sections would move the beginning of the .text section further down.
In such a case it would be impossible to use a common function which was placed at the beginning of the .text section of a smaller application. Therefore, it is necessary to ensure that no common functions are present at the beginning of the .text section at all. The rearranged common functions, with their function calls fixed, are then distributed on the wireless sensors as explained in section 5.1. When the base station has to disseminate a new application code in the network, it first rearranges the functions of the new application code such that the common functions reside in the same memory locations as in the code which was distributed in the network. The rest of the code is then placed around these functions. The global variables of the new application are also arranged in the .data and .bss sections according to the needs of the common functions.

Figure 2. Pre-dissemination packet content

5. PROPOSED ALGORITHM

5.1. Pre-deployment Phase

The pre-deployment phase consists of two parts, the code processing part and the crypto pre-processing part. In the code processing part, the base station gives a unique identifier called the FID to each new function it encounters and identifies the common functions across all the applications as described in section 4. The FIDs are kept in a table called the function table. Once the common functions have been identified, the application codes are rearranged according to these functions. In the crypto pre-processing part, the base station creates two one-way hash chains: the encryption hash chain K_0, K_1, …, K_n and the authentication hash chain A_0, A_1, …, A_n, where n is taken to be sufficiently large to cover the entire lifetime of the network operations. The hash chains obey the following rules.

1. A_i = h(A_{i+1}) and K_i = h(K_{i+1}).
2. A_0 and K_0 are the roots of the chains, obtained by applying the hash function h() n times to A_n and K_n respectively.

A_0 is pre-deployed on the sensors.
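As an illustration of the two hash chain rules, the following minimal Python sketch builds a chain from a secret seed and checks a newly revealed key against the previously accepted one. SHA-256 as h(), the seed value, and the helper names `build_chain` and `verify_next` are assumptions made for this example; the paper does not prescribe them.

```python
import hashlib


def h(data: bytes) -> bytes:
    """One-way hash function h(); SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(data).digest()


def build_chain(seed: bytes, n: int) -> list:
    """Return [X_0, ..., X_n] with X_n = seed and X_i = h(X_{i+1})."""
    chain = [seed]
    for _ in range(n):
        chain.append(h(chain[-1]))
    chain.reverse()  # the root X_0 now sits at index 0
    return chain


def verify_next(accepted: bytes, candidate: bytes) -> bool:
    """A node holding X_{i-1} accepts X_i only if h(X_i) == X_{i-1}."""
    return h(candidate) == accepted


# Hypothetical usage: the base station keeps the whole chain,
# while the sensors are pre-deployed with the root only.
auth_chain = build_chain(b"secret seed for A_n", 5)
root_on_sensor = auth_chain[0]
```

Because the chain is generated backwards from the seed, the base station can reveal keys in the order X_1, X_2, … while a node that only knows X_{i-1} cannot compute X_i on its own.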
For each sensor, some FIDs are randomly selected, based on the flash memory allocated for storing the functions. The functions are encrypted with key K_0 using the symmetric re-encryption scheme of [13], and pairs of FIDs and the associated encrypted functions are deployed on the sensors.

Figure 3. Content disseminated by base station

5.2. Pre-dissemination

For each new application code to be disseminated, the base station creates a list called the common functions list (CFL). It consists of the FIDs of all the common functions which a node can find stored on other nodes in the network, along with the size of each function and its memory location in the compiled code. Before disseminating the code, the base station creates a pre-dissemination packet. This packet consists of the re-encryption key RK_i, the hash of the CFL, h(CFL), and an HMAC over the hash of the re-encryption key concatenated with the hash of the CFL, i.e. HMAC_{A_i}(h(RK_i) || h(CFL)). For the first code dissemination, the re-encryption key is calculated from the keys K_0 and K_1 of the encryption key chain. For any given iteration of code change i, the re-encryption key is calculated using the (i-1)-th and the i-th keys in the encryption key chain. The key A_i used for generating the HMAC is the next key in the authentication key chain. The pre-dissemination packet is disseminated in the network just prior to the code. It is a broadcast packet, and all the nodes in the network save its contents so that they can later authenticate the CFL and the re-encryption key. The structure of this packet can be seen in Fig. 2.

5.3. Code Dissemination

The base station prepares for the code dissemination by creating a Bloom Filter (BF) of an appropriate length. It uses a hash function to hash the common functions on the CFL one by one to populate the BF.
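A minimal sketch of how such a Bloom Filter could be populated and later queried is shown below. The filter length of 256 bits, the use of three SHA-256 based hash positions per function, and the class name `BloomFilter` are assumptions for illustration only; the paper leaves these parameters open.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over function bodies (illustrative parameters)."""

    def __init__(self, m_bits: int = 256, k: int = 3):
        self.m = m_bits  # filter length in bits (assumed value)
        self.k = k       # number of hash positions per item (assumed value)
        self.bits = 0    # the bit array, held in one Python integer

    def _positions(self, item: bytes):
        # Derive k positions by prefixing the item with a counter byte.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:4], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: bytes) -> bool:
        # Any unset position proves the item was never added.
        return all((self.bits >> p) & 1 for p in self._positions(item))


# Hypothetical common functions on the CFL, represented by their bytes.
common_functions = [b"\x12\x34 routing code", b"\x56\x78 sensing code"]
bf = BloomFilter()
for body in common_functions:
    bf.add(body)
```

A function that was added always passes the membership test. A modified function is rejected unless all of its positions happen to be set already, which is the usual false positive risk of a Bloom filter and depends on the chosen length and number of hash positions.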
The base station then combines the CFL, the BF, the new code and the next encryption key K_i. It also creates an index on the code dissemination content to help the nodes recover every part, and appends this index to the contents. Once the total code dissemination content, as shown in Fig. 3, is known, it is divided into pages and then into packets. The base station then creates a session key S_i using a nonce n and the cluster's group key CK. This key is used to encrypt the packets to provide confidentiality. All the packets are encrypted individually with the same key.

To provide integrity of the code, a process similar to Seluge [10] is used, with some modifications. For the sake of continuity we use the same nomenclature as Seluge [10]. We assume that there are P pages and each page has N packets. The pages are denoted as Page 1 to Page P, while the packets of Page i are denoted as Pkt_{i,1} to Pkt_{i,N}. The packets in Page P are hashed, and the hash of packet j in Page P is appended to packet j in Page P-1. The packets in Page P-1 thus consist of the original Page P-1 packets along with the hashes of the corresponding Page P packets. This process is repeated until the packets of Page 1 are hashed. A Merkle hash tree is created over the packets in Page 1; we call this the Vertical Hash Tree (VHT).

In Seluge [10] a signature is created over the root of the Merkle hash tree. Verifying a signature is a public key cryptography operation and consumes a large amount of energy. Our implementation of ECDSA signatures on TelosB motes shows that the verification of one signature needs 28.771 mJ of energy. On the other hand, a single 256-bit AES encryption costs 0.01 mJ of energy. In a traditional wireless sensor network, this is a necessary evil since the whole network needs to be updated. In a sensor cloud, since only one cluster needs to be updated at a time, we can use symmetric key cryptography in place of public key cryptography.
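The page-by-page hash chaining and the Vertical Hash Tree can be sketched as follows. This is a simplified illustration, not the Seluge packet format: SHA-256 with untruncated 32-byte digests, the duplication of the last node on odd tree levels, and the function names are all assumptions of this example. Pages are 0-indexed in the code, so `pages[0]` corresponds to Page 1.

```python
import hashlib

HASH_LEN = 32  # full SHA-256 digest; real packets would likely truncate


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def chain_pages(pages):
    """pages[p][j] is the payload of packet j in page p.

    Working backwards from the last page, packet j of page p gets the hash
    of the (already chained) packet j of page p + 1 appended to it.
    All pages are assumed to hold the same number of packets.
    """
    out = [list(page) for page in pages]
    for p in range(len(pages) - 2, -1, -1):
        out[p] = [pkt + h(out[p + 1][j]) for j, pkt in enumerate(pages[p])]
    return out


def verify_link(earlier_pkt: bytes, later_pkt: bytes) -> bool:
    """Check a packet against the hash carried by its predecessor page."""
    return earlier_pkt[-HASH_LEN:] == h(later_pkt)


def merkle_root(leaves):
    """Root of a binary Merkle tree over a non-empty list of packets."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


pages = [[b"a1", b"a2"], [b"b1", b"b2"], [b"c1", b"c2"]]
chained = chain_pages(pages)
vht_root = merkle_root(chained[0])  # tree over the first page's packets
```

Since every packet of the first page transitively commits to the packets behind it, authenticating the single value `vht_root` is enough to authenticate the whole image packet by packet.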
Instead of signing the root of the hash tree with the base station's private key, the base station creates a signature key from the cluster key CK and a random nonce k, and uses this key to produce the signature. A signature packet which includes the VHT root hash and the nonce k is created, and the signature is produced by encrypting VHT root hash || k. The nodes in the cluster can derive the signature key from the cluster key and the nonce and verify the root hash.

We observe that since a code update in a sensor cloud happens in a cluster where the sensors are physically close together, energy intensive tasks such as the decryption of packets can be done in a distributed manner. Thus, instead of every sensor decrypting all the code, each sensor can decrypt a few packets, which can save a large amount of energy. This, however, makes it necessary to protect the nodes within a cluster against code injection from each other. To accomplish this, the base station creates another hash tree on the same dissemination contents, which we call the Horizontal Hash Tree (HHT). For this hash tree, each page of the code is hashed and the page hashes h(Page) are used as leaf nodes. The root hash of the HHT is encrypted using the signature key, and the resulting signature is included in the signature packet. Just prior to starting the dissemination of the code, the base station broadcasts the next authentication key A_i, which was used in creating the HMAC in the pre-dissemination phase.

5.4. Activity on the nodes

After receiving the next authentication key A_i, the nodes proceed as follows. The nodes verify this key by checking whether h(A_i) = A_{i-1}. Once the validity of this key has been ensured, the re-encryption key and the hash of the CFL in the pre-dissemination packet are validated by computing an HMAC over them and comparing it to the one obtained in the pre-dissemination phase.
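The two checks performed by a node, first on the revealed authentication key and then on the stored pre-dissemination packet, can be sketched as follows. SHA-256 and HMAC-SHA256 stand in for h() and the HMAC, and the function names and the tuple layout of the packet are assumptions of this example.

```python
import hashlib
import hmac


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def make_predissemination(rk_i: bytes, cfl: bytes, a_i: bytes):
    """Base station side: (RK_i, h(CFL), HMAC_{A_i}(h(RK_i) || h(CFL)))."""
    cfl_hash = h(cfl)
    tag = hmac.new(a_i, h(rk_i) + cfl_hash, hashlib.sha256).digest()
    return rk_i, cfl_hash, tag


def verify_predissemination(packet, a_prev: bytes, a_i: bytes) -> bool:
    """Node side: run once A_i is revealed; a_prev is the stored A_{i-1}."""
    rk_i, cfl_hash, tag = packet
    if h(a_i) != a_prev:  # A_i must hash to the previous chain key
        return False
    expected = hmac.new(a_i, h(rk_i) + cfl_hash, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


# Hypothetical two-element chain: A_1 = h(A_2).
a_2 = b"next authentication key"
a_1 = h(a_2)
packet = make_predissemination(b"re-encryption key", b"FID list", a_2)
```

The node has to buffer the packet until the key is revealed, which is why the text describes this as delayed authentication: until then, no node in the network can forge a matching HMAC.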
The one-way property of the hash chain ensures that a malicious node which has the authentication key A_{i-1} can predict the key A_i only with negligible probability. This implies that an adversary which makes any change to the contents of the pre-dissemination packet will be caught with very high probability, which ensures the delivery of the untampered RK_i and h(CFL). After validating the re-encryption key, the nodes re-encrypt the encrypted functions they store using the re-encryption key. Re-encrypting with RK_i the functions which were earlier encrypted under the key K_{i-1} means that these functions can now only be decrypted with the key K_i. This delayed authentication using a hash chain means that an adversary cannot make the nodes accept arbitrary re-encryption keys.

This, however, presents a problem: if a node happens to miss one pre-dissemination packet, its chain of re-encryption keys is broken and the node is no longer able to use the following re-encryption keys. A simple solution is to send multiple re-encryption keys in each pre-dissemination packet. Past re-encryption keys can be sent along with the current re-encryption key, enabling nodes which have missed one of the past re-encryption keys to mend their broken chain.

To enable distributed decryption, the cluster head in each cluster creates N virtual ids ranging from 1 to N and gives each sensor one of the virtual ids, where N is the number of packets in a page. Instead of dealing with all the packets, each sensor only handles the packets whose index within the page matches its virtual id. So, the node with virtual id 1 decrypts Pkt 1 in all the pages, the node with virtual id 2 decrypts Pkt 2 in all the pages, and so on. If the number of nodes in the cluster is less than N, nodes can be given additional virtual ids. Each node can verify a packet in page i + 1 using the hash carried in the corresponding packet of page i.
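One possible way for a cluster head to hand out virtual ids is sketched below. The round-robin policy and the function names are assumptions of this example; the paper only requires that every virtual id from 1 to N is covered and that nodes may hold more than one id when the cluster is small.

```python
def assign_virtual_ids(node_ids, n_packets):
    """Round-robin assignment (an assumed policy) of virtual ids 1..N.

    Every node receives at least one id and every id is held by at least
    one node, whether the cluster is smaller or larger than N.
    """
    assignment = {node: [] for node in node_ids}
    for k in range(max(len(node_ids), n_packets)):
        node = node_ids[k % len(node_ids)]
        assignment[node].append(k % n_packets + 1)
    return assignment


def packets_for_node(virtual_ids, n_pages):
    """(page, packet) pairs a node decrypts: its packet index in every page."""
    return [(page, vid)
            for page in range(1, n_pages + 1)
            for vid in virtual_ids]


# Hypothetical cluster of three nodes and pages of five packets each.
small_cluster = assign_virtual_ids(["n0", "n1", "n2"], 5)
work_for_n0 = packets_for_node(small_cluster["n0"], 2)
```

With three nodes and N = 5, two of the nodes hold two virtual ids each, so every packet position in a page still has a node responsible for decrypting it.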
After all the packets have been received and all except the packets in page 0 have been verified, the nodes encrypt their decrypted packets again and broadcast them for the other nodes to receive. The encryption is done in large blocks of data, which reduces the number of encryption operations a node has to perform. Once the nodes have received all the packets of the code dissemination, they first verify that the code has not been tampered with by the cluster members. This is done by extracting the horizontal hash bits from each packet and verifying the root of the Horizontal Hash Tree. Since nodes are only allowed to decrypt a part of each page, a change in the code by a malicious node will always be caught. After the verification of the Horizontal Hash Tree, the Vertical Hash Tree is verified in the same way in which Seluge [10] verifies its hash tree. The slight difference is in the verification of the root node, which is encrypted with the symmetric signature key generated from the cluster key CK and the random nonce k.

After the verification phase is complete, the nodes use the index to extract the CFL, the Bloom Filter, the new code and the encryption key K_i. The cluster head then broadcasts the CFL in the clear. When a sensor receives the CFL, it first checks its validity by hashing the CFL and comparing the result to the hash of the CFL received in the pre-dissemination packet. Once the CFL is verified, the sensors check whether they have the requested functions by comparing the FIDs. If a sensor has one or more of the requested functions, it sends the functions, which were re-encrypted with RK_i, back to the requesting nodes. The encryption key received by the cluster nodes in the code dissemination is used to decrypt the received encrypted functions. The received functions are then verified using the Bloom Filter. The functions are hashed, and the positions in the filter that this hash maps to are checked against the entries already set in the Bloom Filter.
If the hash of a received function maps to positions which are unset in the Bloom Filter, the function is rejected; otherwise it is accepted. Once all the functions have passed through the Bloom Filter, the nodes are ready to build the code image from its various parts. To build the code image, the nodes only have to use the CFL to plug these functions into their appropriate positions in the code. The code image is stored in the flash memory and built. Once the build is complete, the boot loader can boot the node up using it.

6. PERFORMANCE ANALYSIS

We have implemented our scheme in TinyOS for TelosB sensors and simulated it using TOSSIM. As discussed in section 5.4, sending a single re-encryption key in a high noise environment may not be sufficient because of unreliable sensors, packet drops and packet corruption. We simulated a high noise environment with a network of 100 nodes arranged in a 10x10 grid, with the nodes 2 meters apart from each other. As can be seen in Fig. 4, when only a single key is sent in the pre-dissemination packet, only 95.86% of the nodes remained in sync with the keys after a run of 1000 iterations. Once a node fails to receive a re-encryption key, it goes out of sync and is unable to take any part in further operations. This problem can be alleviated by sending multiple re-encryption keys in a packet. For example, in the i-th iteration, the previous re-encryption key RK_{i-1} can be sent along with the re-encryption key RK_i. This enables nodes which missed the re-encryption key in the last iteration to come back in sync. It can be seen in the figure that the percentage of nodes out of sync with the network goes down as we increase the number of keys. We found that 99.99% of the nodes were in sync when 4 keys were sent, even after a run of 1000 iterations. Thereafter, the percentage of in-sync nodes drops slightly, which we attribute to the larger packet size caused by the multiple keys.
With the TinyOS 2.x packet size limit, at most 6 re-encryption keys can be sent at once, since the packet also carries an HMAC and a hash.

Figure 4. Synchronization of nodes with the re-encryption keys

We also implemented the symmetric proxy re-encryption of [13]. This is the first time a symmetric proxy re-encryption algorithm has been implemented on wireless sensors. The results of our implementation can be seen in Fig. 5. The block size for encryption was 128 bits. The operations shown in the figure are encryption, decryption, re-encryption, re-encryption key generation and key generation. We can see that most operations of the scheme are very lightweight, except for key generation. In our secure code dissemination scheme, however, key generation happens on the base station. The operations performed on the nodes are decryption and re-encryption. On TelosB motes decryption takes 2.93 milliseconds while proxy re-encryption takes 5.18 milliseconds. The corresponding energy consumption data is shown in Fig. 6. From our implementation, we can say that symmetric proxy re-encryption is feasible on wireless sensors.

Figure 5. Execution times of proxy re-encryption operations

Figure 6. Energy consumption of proxy re-encryption operations

Figure 7. Percentage reduction in overall code size

Fig. 7 shows the reduction in the size of the disseminated code for various applications compared to Seluge [10]. For this experiment, we chose five applications. Four are standard TinyOS applications, namely Blink, Sense, Oscilloscope and RadioSenseToLeds (RSTL), and the fifth is Tree, a tree construction application. The common functions between these applications were discovered and distributed in the simulated network. The dissemination code for our algorithm consisted of the new code along with the overhead of the Index, the CFL, the Bloom Filter and the Decryption Key. This was compared with the standard Seluge [10] dissemination code. The results in Fig. 7 show the percentage reduction of disseminated code for each of the applications. The reduction is highest for the Blink application, for which our algorithm disseminates 19.06% less code.

Fig. 8 illustrates the energy overhead of our algorithm compared to Seluge [10] for each of the applications. The energy overhead is due to the decryption of code packets, the verification of the HHT and the verification of common functions through the Bloom Filter. The energy overhead in the case of Blink is 4.2 mJ, while in the case of Oscilloscope, RadioSenseToLeds and Tree it is approximately 18.5 joules. A large portion of this overhead, however, is the cost of providing confidentiality for the code packets.

Figure 8. Energy overhead compared to Seluge

7. CONCLUSIONS

Sensor clouds are an emerging and highly dynamic paradigm for sensor networks. Nodes in sensor clouds are constantly provisioned and de-provisioned for users. In such a scenario, a code dissemination algorithm which is both efficient and secure becomes necessary. In this paper we have presented a novel code dissemination algorithm with these properties. Our code dissemination algorithm takes into account the similarity of code across applications. The basic idea is to communicate only the new code to the sensors, while the common code is picked up from other sensors in the network. This reduces the amount of code that needs to be communicated, which in turn makes the code dissemination more energy efficient. Our security framework is based on the symmetric proxy re-encryption algorithm. We have implemented our algorithm on TelosB motes, and from the experiments it can be concluded that the algorithm reduces the amount of communication and energy required while also providing confidentiality and integrity of the code. While in the current work we have focused only on Type 1 clones in application code, in the future we intend to include Type 2 clones as well, which we expect will further increase the efficiency.

8. ACKNOWLEDGMENTS

This research is supported by the Intelligent Systems Center.

9. REFERENCES

[1] Shafi, N., Ali, K., and Hassanein, H., “No-reboot and zero-flash over-the-air programming for wireless sensor networks,” in Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2012.
[2] Dong, W., Liu, Y., Chen, C., Bu, J., Huang, C., and Zhao, Z., “R2: Incremental reprogramming using relocatable code in networked embedded systems,” IEEE Transactions on Computers, vol. 99, no. 2012.
[3] Jeong, J., and Culler, D., “Incremental network programming for wireless sensors,” in Sensor and Ad Hoc Communications and Networks, IEEE SECON 2004.
[4] Koshy, J., and Pandey, R., “Remote incremental linking for energy efficient reprogramming of sensor networks,” in Wireless Sensor Networks, 2005.
[5] Reijers, N., and Langendoen, K., “Efficient code distribution in wireless sensor networks,” in Proceedings of the 2nd ACM international conference on Wireless sensor networks and applications, ser. WSNA ’03.
[6] Poolsappasit, N., Kumar, V., Madria, S., and Chellappan, S., “Challenges in secure sensor-cloud computing,” Secure Data Management, pp. 70–84, 2011.
[7] Yuriyama, M., and Kushida, T., “Sensor-cloud infrastructure-physical sensor management with virtualized sensors on cloud computing,” in Network-Based Information Systems (NBiS), 2010.
[8] Hui, J., and Culler, D., “The dynamic behavior of a data dissemination protocol for network programming at scale,” in Proceedings of the 2nd international conference on Embedded networked sensor systems. ACM, 2004, pp. 81–94.
[9] Li, W., Zhou, P., and Yang, J., “Adaptive buffer management for efficient code dissemination in multiapplication wireless sensor networks,” in IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, 2008, pp. 295–301.
[10] Hyun, S., Ning, P., Liu, A., and Du, W., “Seluge: Secure and dos-resistant code dissemination in wireless sensor networks,” in Proceedings of the 7th international conference on Information processing in sensor networks. IEEE Computer Society, 2008.
[11] Dutta, P., Hui, J., Chu, D., and Culler, D., “Securing the deluge network programming system,” in Proceedings of the 5th international conference on Information processing in sensor networks, IPSN ’06.
[12] Tan, H., Ostry, D., Zic, J., and Jha, S., “A confidential and dos-resistant multi-hop code dissemination protocol for wireless sensor networks,” Computers & Security, vol. 32, 2013.
[13] Syalim, A., Nishide, T., and Sakurai, K., “Realizing proxy re-encryption in the symmetric world,” in Informatics Engineering and Information Science, ser. Communications in Computer and Information Science.