Efficient and Secure Code Dissemination in Sensor Clouds

Vimal Kumar
Department of Computer Science
Missouri University of Science and Technology, Rolla, MO 65409
In this paper, we present an efficient and secure code
dissemination technique aimed towards sensor clouds. Previous
code dissemination techniques were geared towards traditional
wireless sensor networks and did not take into account the
dynamic nature of a sensor cloud. The technique presented in
this paper first identifies the code which is common across
various wireless sensor applications and distributes this code
a priori into the network in the form of functions. During
code dissemination these common functions are picked up by
the sensors from the network, and only a part of the code needs
to be transmitted from the base station. This reduces both the
overall code transmitted and the energy consumption.
Since security is important in sensor clouds, we further present
a security scheme based on symmetric proxy re-encryption to
provide confidentiality and integrity of the code. We also
evaluate our scheme in terms of energy consumption and the
reduction in disseminated code size to illustrate its efficiency.
In a wireless sensor network's lifetime, the code running
on the sensors may need to be updated or changed completely a
number of times. Wireless sensor networks are usually very
large in size which makes it infeasible to manually update each
sensor with new code. The other alternative is to disseminate
the code wirelessly in the network. Sensors receive the code
packet by packet and then rebuild the code image, once all the
code has been received. In wireless code dissemination, code
images are communicated via the wireless channel which is
inherently insecure and prone to attacks from adversaries. A
secure code dissemination technique enables the code
dissemination to be confidential and protected against
malicious code injection attacks. A large amount of work has
been done in [1-5] to reduce the amount of code to be
transmitted from the base station to different sensor nodes. The
efforts however have been focused on traditional wireless
sensor networks, which support one application at a time. In
such networks the code needs to be updated every once in a
while. The updates most often are minor and most of the code
does not change. Papers such as [1, 3, 5] create a difference
script between the old code and the updated code. The base
station disseminates the script instead of the whole code which
reduces the number of packets and saves energy on the
forwarding nodes. In a sensor cloud [6, 7], however, clusters of
nodes are provisioned dynamically to the user, to support
several applications on demand. Dynamic provisioning implies
that the code on the wireless sensors is changed entirely as a
new application is installed. The difference script mechanism
cannot be applied in this scenario because the script itself
would be the size of the code. Thus, there is a need for an
efficient code dissemination scheme, which is well suited for a
sensor cloud scenario. Efficiency of code dissemination is an
especially important issue in sensor clouds because the
frequency of code change is high, in order to support different
applications. High frequency of code change implies that the
sensors spend a good deal of their energy on forwarding new
code and installing new code. Any reduction in the amount of
total code transferred thus gets multiplied by high frequency,
resulting in great reduction in energy consumption. Moreover,
clusters of sensors in a sensor cloud are dynamically
provisioned to users. This means that at any given point of time
various clusters would be working for various users. In such a
scenario, the security of the code in terms of confidentiality and
integrity also becomes very important. Code disseminated from
the base station will inevitably be forwarded by many sensors
on its way to its destination cluster. Code confidentiality is thus
a critical pre-requisite since the code may be carrying keying
material as well which needs to be protected against
eavesdropping. Another pre-requisite is code integrity, which
will make sure that an adversary has not injected malicious
code packets during the code dissemination process. To
summarize, there is a need for a code dissemination scheme,
which is well suited to a dynamically provisioned sensor cloud
scenario. The scheme needs to minimize the number of code
packets transmitted and should provide confidentiality and
protect the integrity of the disseminated code. In this paper we
describe our efficient and secure code dissemination scheme
which addresses the above concerns. Our contributions are:
1. A code dissemination algorithm which reduces the total
number of packets sent from the base station to a cluster of
sensors in a sensor cloud scenario.
2. A security mechanism which provides confidentiality,
integrity and immediate authentication of code packets.
The sensor cloud consists of a large number of wireless
sensors. We consider that a clustering algorithm has been run
and the sensors have been grouped into clusters. The sensors
are provisioned to the users in terms of clusters. At any given
point of time, there may be many users in the sensor cloud,
each holding one or more clusters of sensors. Such a model has
been discussed in [6, 7] and is visualized in Fig. 1. Sensors in a
cluster provisioned to a particular user x, collect data for the
user x but may act as forwarding nodes for other clusters, for
transferring data and code. In previous models, code
updates happened on the scale of the whole WSN. In our model, on
the other hand, the code change happens at the cluster scale.
Code is updated when individual users install new applications
in their sensor clusters or a cluster is provisioned to a new user
and a new application is installed. We assume a routing
structure is also in place, using which the base station can route
the code to any particular cluster. Each cluster has a cluster key
CK, which is known to the cluster members and the base
station. The adversary in our model lies inside the network.
Sensors in clusters provisioned to other users are curious and
may want to eavesdrop on the code which is being transferred.
Sensors which store common code may want to inject
malicious code by modifying the code they store and making
other sensors accept this modified code.
Figure 1. System Model (labels: User A, User B, sensor clusters)
Our approach is based upon the observation that the
executable code which runs on the wireless sensors, consists
entirely of subroutines and objects. The subroutines and objects
have a one to one correspondence with the functions and global
variables in a high level language such as nesC in TinyOS. We
further observe that a number of these functions and global
variables are common across a number of wireless sensor
applications. In a sensor cloud environment, which has a
number of WSN applications running simultaneously in various
clusters of motes, many applications may share parts of code.
Some applications may have the same security code, while
others may share the same routing subroutines, some others
may share the sensing code and so on. All applications also
share the same operating system code. The basic idea, therefore,
is to identify the commonly used functions and global
variables across all the given applications and then distribute
these in the network such that every node probabilistically
stores a few of them. When the code on a cluster of sensors
needs to be changed, the base station first checks which of the
functions and objects the sensors can request from the other
sensors in the network. Only the part of the code which does not
already exist in the network is sent from the base station. The rest
of the code is requested from nearby sensors. The security
challenges in the scheme are enumerated below.
1. Since the functions are stored on sensors, a request for a
specific function will leak information about the code. To avoid
this, the functions need to be kept encrypted on the sensors.
This however presents a problem because the encryption keys
will need to be revealed to the requesting sensors. Once the
requesting cluster knows the encryption keys it can send
spurious requests and retrieve all the encrypted code.
2. When sensors reply with encrypted functions to function
requests, it needs to be made sure that the functions have not
been tampered with. This authentication needs to be done as
soon as the functions are received to thwart energy draining
We assume that the base station has a tentative list of
applications that may be used in the sensor cloud in the future. Note
that not all the applications need to be
known beforehand; a small sample which is
sufficient to detect the common functions across the
applications is enough. We follow a procedure similar to that of
Qdiff [1] to detect common functions in the applications. To
dump ELF files we use MSP430's msp430-objdump utility and
the Bauhaus-toolkit to compare the C files generated from the
nesC code of the various applications, compiled for the TelosB
mote. Bauhaus-toolkit has a clone detection utility which can
detect Type I and Type II clones, in different applications, at the
source code level. Type I clones are fragments of code which
are identical and Type II clones are copies which are
structurally identical but may have the identifiers changed. At
this point of time in our research we only consider Type I
clones. In future we will develop techniques to take Type II
clones into account as well. The Type I clones found in the C
code by the clone detection utility of the Bauhaus toolkit, can
be further divided into two different types at the ELF file level.
Definition 1. We define Type 1a clones as the true Type 1
clones, where the two codes are exactly the same and they may
or may not have been shifted in memory.
Definition 2. Some Type 1 clones may also contain calls to
functions and references to global variables which have shifted
in memory. We define such clones as Type 1b
clones.
While Type 1a clones can be used as they are, it is
necessary to fix the code shifts and the change of memory
addresses caused by these code shifts in Type 1b clones, before
the Type 1b clones are used. To deal with the code shifts, before
disseminating the common functions and objects in the
network, the base station performs the following activity. The
base station reorders the functions and global variables in the
ELF files. Beginning with the Type 1a functions, it places the
common functions at the end of the .text section. The code will
now grow towards the beginning of the file. After all the Type
1a functions are moved, Type 1b functions are moved in the
same way. This results in the reordering of other functions and
changes in function references. The base station then fixes the
changes in function references throughout the code. We place
the common functions at the end of the .text section and not at
the beginning to avoid the situation where the base station later
receives applications with large .data and .bss sections, which would
move the beginning of the .text section further down. In such a
case it would be impossible to use a common function which
was placed at the beginning of the .text section of a smaller
application. Therefore, it is necessary to ensure that there are no
common functions present at the beginning of the .text section
at all. The rearranged common functions with the function calls
fixed are then distributed on the wireless sensors as explained
in section 5.1. When the base station has to disseminate a new
application code in the network, it first rearranges the functions
of the new application code such that the common functions
reside in the same memory location as the code which was
distributed in the network. Rest of the code is then placed
around these functions. The global variables of this new
application are also arranged according to the common
functions' need in the .data and .bss sections.
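The downward placement of common functions can be illustrated with a minimal sketch. It assumes a fixed end address for the .text section that is shared by all applications; the function name `place_common_functions`, the FIDs and the addresses are hypothetical and chosen only for illustration.

```python
def place_common_functions(text_end: int, functions: list[tuple[str, int]]) -> dict[str, int]:
    """Assign start addresses so that functions are packed downward from text_end.

    functions holds (FID, size in bytes) pairs, Type 1a entries first and
    Type 1b entries after them, mirroring the order described in the text.
    """
    addresses = {}
    cursor = text_end  # hypothetical fixed end of the .text section
    for fid, size in functions:
        cursor -= size  # the common code grows toward the beginning of .text
        addresses[fid] = cursor
    return addresses


# Hypothetical example: two common functions placed below address 0x8000.
layout = place_common_functions(0x8000, [("F1", 0x40), ("F2", 0x100)])
```

Because the addresses depend only on the end of the section and on the sizes of the functions placed before them, a later application with larger .data and .bss sections does not disturb them.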
Index CFL BF New Code Key
Figure 2. Pre dissemination packet content
5.1. Pre-deployment Phase
The pre-deployment phase consists of two parts, the code
processing part and the crypto pre-processing part. In the code
processing part, the base station gives a unique identifier called
the FID to each new function it encounters and identifies the
common functions across all the applications as described in
section 4. The FIDs are kept in a table called the function table.
Once the common functions have been identified, the
application codes are rearranged according to these functions.
In the crypto pre-processing part, the base station creates two
one-way hash chains: the encryption hash chain K0, K1, …, Kn
and the authentication hash chain A0, A1, …, An, where n is
taken to be sufficiently large to cover the entire lifetime of
the network operations. The hash chains have the following
properties:
1. Ai = h(Ai+1) and Ki = h(Ki+1)
2. A0 and K0 are the roots of the chains, which are obtained by
applying the hash function h() n times to An and Kn, respectively.
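The two properties can be sketched as follows. This is an illustration only: SHA-256 is assumed as h(), since the hash function is not named here, and the helper names `build_chain` and `verify_next` are hypothetical.

```python
import hashlib
import os


def h(data: bytes) -> bytes:
    """One-way hash h(); SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).digest()


def build_chain(seed: bytes, n: int) -> list[bytes]:
    """Return [X0, X1, ..., Xn] with Xn = seed and Xi = h(Xi+1)."""
    chain = [seed]
    for _ in range(n):
        chain.append(h(chain[-1]))
    chain.reverse()  # chain[0] is the root X0, chain[n] is the seed Xn
    return chain


def verify_next(previous: bytes, candidate: bytes) -> bool:
    """A node holding Ai-1 accepts a disclosed Ai only if h(Ai) equals Ai-1."""
    return h(candidate) == previous


# The base station keeps the whole chain; only the root A0 is pre-deployed.
auth_chain = build_chain(os.urandom(32), n=5)
```

A node that holds A0 can check each newly disclosed key with a single hash, while the one-way property prevents it from computing the next key itself.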
A0 is pre-deployed on the sensors. For each sensor, some
FIDs are randomly selected, based on the Flash memory
allocated for storing the functions. The functions are encrypted
with key K0 using the symmetric re-encryption scheme of [13]
and pairs of FIDs and the associated encrypted function are
deployed on the sensors.
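One possible way to select FIDs for a sensor under a flash budget is sketched below. The selection policy (shuffle, then keep every function that still fits) is an assumption, as are the names `select_fids`, `sizes` and the byte values; the text only states that the FIDs are chosen randomly according to the allocated flash memory.

```python
import random


def select_fids(function_sizes: dict[str, int], flash_budget: int,
                rng: random.Random) -> list[str]:
    """Pick FIDs in random order, keeping each one that still fits in the budget."""
    candidates = list(function_sizes)
    rng.shuffle(candidates)
    chosen, used = [], 0
    for fid in candidates:
        if used + function_sizes[fid] <= flash_budget:
            chosen.append(fid)
            used += function_sizes[fid]
    return chosen


# Hypothetical function sizes in bytes and a 400-byte flash allocation.
sizes = {"F1": 120, "F2": 300, "F3": 80, "F4": 200}
stored = select_fids(sizes, flash_budget=400, rng=random.Random(7))
```

In the scheme, each selected function would then be encrypted under K0 before being stored together with its FID; that step is omitted here.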
RKi | h(CFL) | HMAC_Ai(h(RKi)||h(CFL))
Figure 3. Content disseminated by base station
5.2. Pre-dissemination
For each new application code to be disseminated the base
station creates a list called the common functions list (CFL). It
consists of the FIDs of all the common functions which a node
can find in the network stored on other nodes. The list is in the
form of FIDs along with the size of the functions and their
memory location in the compiled code.
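A minimal sketch of building the CFL for one application follows. The record layout `CFLEntry`, the helper `build_cfl` and the example values are hypothetical; the text only specifies that each entry carries the FID, the function size and the memory location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFLEntry:
    fid: str      # function identifier from the function table
    size: int     # size of the function in bytes
    address: int  # memory location in the compiled code


def build_cfl(app_functions: dict[str, tuple[int, int]],
              common_fids: set[str]) -> tuple[list[CFLEntry], list[str]]:
    """Split an application's functions into the CFL and the FIDs that must be sent.

    app_functions maps FID -> (size, address); common_fids is the set of
    functions already distributed in the network.
    """
    cfl = [CFLEntry(fid, size, address)
           for fid, (size, address) in sorted(app_functions.items())
           if fid in common_fids]
    new_code = sorted(fid for fid in app_functions if fid not in common_fids)
    return cfl, new_code


# Hypothetical application with one common and one application-specific function.
cfl, new_code = build_cfl({"F1": (64, 0x7FC0), "F9": (200, 0x4000)}, {"F1", "F2"})
```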
Before disseminating the code, the base station creates a
pre-dissemination packet carrying the re-encryption key RKi.
For the first code dissemination, the re-encryption key is
calculated from the keys K0 and K1 of the encryption key chain.
For any given iteration of code change i, the re-encryption key
is calculated using the (i-1)-th and the i-th keys in the encryption
key chain. In addition to RKi, the packet contains the hash of the
CFL, h(CFL), and an HMAC over the hash of the re-encryption key
concatenated with the hash of the CFL, i.e. HMAC_Ai(h(RKi)||h(CFL)).
The key Ai used for generating the HMAC is
the next key in the authentication key chain. This pre-dissemination packet is then disseminated in the network just
prior to the code. This is a broadcast packet and all the nodes in
the network save the contents of this packet to authenticate the
CFL and the re-encryption key. The structure of this packet can
be seen in Fig. 2.
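The construction and later validation of the pre-dissemination packet can be sketched as below. SHA-256 and HMAC-SHA-256 are assumptions, the re-encryption key is treated as opaque bytes (its computation follows [13] and is not reproduced here), and the function and field names are hypothetical.

```python
import hashlib
import hmac


def h(data: bytes) -> bytes:
    """Hash h(); SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).digest()


def make_predissemination_packet(rk_i: bytes, cfl_bytes: bytes, a_i: bytes) -> dict:
    """Build RKi | h(CFL) | HMAC_Ai(h(RKi) || h(CFL)) at the base station."""
    cfl_hash = h(cfl_bytes)
    tag = hmac.new(a_i, h(rk_i) + cfl_hash, hashlib.sha256).digest()
    return {"rk": rk_i, "cfl_hash": cfl_hash, "hmac": tag}


def verify_predissemination_packet(packet: dict, a_i: bytes) -> bool:
    """Run on a node once Ai has been disclosed and checked against Ai-1."""
    expected = hmac.new(a_i, h(packet["rk"]) + packet["cfl_hash"],
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, packet["hmac"])


# Placeholder values; real keys and CFL bytes come from the base station.
packet = make_predissemination_packet(b"opaque-rk", b"serialized-cfl", b"key-Ai")
```

Since Ai is disclosed only after the packet has been broadcast, a node can store the packet on arrival and validate it later, which is the delayed authentication the scheme relies on.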
5.3. Code Dissemination
The base station prepares for the code dissemination by
creating a Bloom Filter (BF) of an appropriate length. It uses a
hash function to hash the common functions on the CFL one by
one to populate the BF. It then combines the CFL, the BF, the
new code and the next encryption key Ki together. The base station also
creates an index on the code dissemination content to help the
nodes recover everything and appends this index itself to the
Once the total code dissemination content as shown in Fig
3 is known, the code is divided into pages and then packets.
The base station then creates a session key Si using a nonce n
and the cluster's group key CK. This key is used to encrypt the
packets to provide confidentiality. All the packets are encrypted
individually with the same key. To provide integrity of code, a
process similar to Seluge [10] is used with some modifications.
For the sake of continuation we use the same nomenclature as
in Seluge [10]. We assume that there are P pages and each page
has N packets. The pages are denoted as Page 1 to Page P,
while the packets for Page i are denoted as Pkti-1 to Pkti-N.
Packets in Page P are hashed and the hash of packet i is
appended to packet i in page P-1. The packets in Page P -1 then
consist of the hash of the corresponding Page P packet along
with the original packets of Page P-1. This process is followed
until the packets of Page 1 are hashed. A Merkle Hash Tree is
created over the packets in Page 1, we call this the Vertical
Hash Tree (VHT). In Seluge [10] a signature is created over the
root of the Merkle Hash Tree. Verifying a signature is a public
key cryptography operation and consumes a large amount of
energy. Our implementation of ECDSA signatures over TelosB
motes shows that verification of one signature needs 28.771 mJ
of energy. On the other hand, one 256-bit AES encryption costs
0.01 mJ of energy. In a traditional wireless sensor network, this is
a necessary evil since the whole network needs to be updated.
In a sensor cloud since only one cluster needs to be updated at a
time, we can use symmetric key cryptography in place of public
key cryptography. Instead of signing the root of the hash tree with
the base station's private key, the base station creates a signature
key from the cluster's key CK and a random nonce k, which will
be used to produce the signature. A signature packet which
includes the VHT root hash and the nonce k is created and the
signature is produced by encrypting VHT root hash||k. The
nodes in the cluster can derive the signature key from the
cluster key and the nonce and verify the root hash.
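The page-by-page hash chaining and the Vertical Hash Tree root can be sketched as follows. SHA-256, the duplication of the last node on odd tree levels and the helper names are assumptions; equal-sized, non-empty pages are assumed, and the symmetric signature over the root is omitted because the key derivation is not specified in detail here.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash h(); SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).digest()


def chain_pages(pages: list[list[bytes]]) -> list[list[bytes]]:
    """Append to packet j of each page the hash of packet j of the following page.

    Processing starts at the last page, so every hash covers a packet that
    already carries the hash of its own successor.
    """
    chained = [list(page) for page in pages]
    for p in range(len(chained) - 2, -1, -1):
        chained[p] = [pkt + h(nxt) for pkt, nxt in zip(chained[p], chained[p + 1])]
    return chained


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a binary Merkle tree over the (non-empty) list of leaves."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: duplicate the last node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Three hypothetical pages of two packets each.
pages = [[b"a1", b"a2"], [b"b1", b"b2"], [b"c1", b"c2"]]
chained = chain_pages(pages)
vht_root = merkle_root(chained[0])
```

Authenticating the root of the first page is enough to authenticate every later packet, because each packet commits to its successor in the next page.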
We observe that since a code update in a sensor cloud
happens in a cluster where the sensors are physically close
together, energy intensive tasks such as decryption of packets
can be done in a distributed manner. Thus, instead of every
sensor decrypting all the code, each sensor can decrypt a few
packets, which can save a large amount of energy. This
however makes it necessary that nodes within a cluster are
protected against code injection from each other. To accomplish
this, the base station creates another hash tree on the same
dissemination contents, which we call the Horizontal Hash Tree
(HHT). For this hash tree, each page of the code is hashed and
the hashes of the page h(Page) are used as leaf nodes. The root
hash of HHT is encrypted using the signature key and the
signature is included in the signature packet. The base station
just prior to starting the dissemination of the code, broadcasts
the next authentication key Ai, which was used in creating the
HMAC in the pre-dissemination phase.
5.4. Activity on the nodes
After receiving the next authentication key Ai, the nodes
proceed as follows. The nodes verify this key by
checking whether h(Ai) = Ai-1. Once the validity of this key has
been ensured, the re-encryption key and the hash of the CFL in
the pre-dissemination packet are validated by creating an
HMAC and comparing it to the one obtained in the pre-dissemination
phase. The one-way property of the hash chain
ensures that a malicious node which has the authentication
key Ai-1 cannot predict the key Ai with non-negligible
probability. This implies that an adversary which makes any
changes to the contents of the pre-dissemination packet will be
caught with a very high probability, which ensures the delivery
of the untampered RKi and h(CFL). After validating the
re-encryption key, the nodes re-encrypt the encrypted functions
they store using the re-encryption key. Re-encrypting
with RKi the functions which were earlier
encrypted by the key Ki-1 means that these functions can now only
be decrypted by key Ki. This delayed authentication using a
hash chain means an adversary cannot make the nodes accept
arbitrary re-encryption keys. This however presents a problem,
that if a node happens to miss one pre-dissemination packet, it
will break the entire chain of re-encryption keys and the node
will no longer be able to use the following re-encryption keys.
A simple solution is to send multiple re-encryption keys in each
pre-dissemination packet. Past re-encryption keys can be sent
along with the current re-encryption key to enable the nodes
which have missed one of the past re-encryption keys to mend
their broken chain of re-encryption keys. To enable distributed
decryption, the cluster head in each cluster creates N virtual ids
ranging from 1 to N and gives each sensor one of the virtual
ids, where N is the number of packets in a page. Instead of
dealing with all the packets, each sensor only handles the packets
whose index within a page matches its virtual id. So, a node with virtual id
1 decrypts pkt1 in all the pages, the node with virtual id 2
decrypts pkt2 in all the pages, and so on. In case the number of nodes in
the cluster is less than N, nodes can be given additional virtual
ids. Each node can verify its packet in page i + 1 against the hash carried in
the corresponding packet of page i. After all the packets have been
received and all except the packets in page 0 have been
verified, the nodes encrypt their decrypted packets again and
broadcast them for other nodes to receive. The encryption is done in
large blocks of data, which reduces the number of encryption
operations a node has to perform. Once the nodes receive all
the packets of the code dissemination, they first verify that the
code has not been tampered with by the cluster members. This is
done by extracting the horizontal hash bits from each packet
and verifying the root of the Horizontal Hash Tree. Since nodes
are only allowed to decrypt a part of each page, a change in the
code by a malicious node will always be caught. After the
verification of the Horizontal Hash Tree, the Vertical Hash Tree
is verified, in a similar way, in which Seluge [10] verifies its
hash tree. The slight difference is in the verification of the root
node, which is encrypted by the symmetric signature key
generated from the cluster's key CK and the random nonce k.
After the verification phase is complete, the nodes extract the
CFL, the Bloom filter, the new code and the encryption key Ki
using the index. The cluster head then broadcasts the CFL in
clear. When a sensor receives the CFL, it first checks its
validity by hashing the CFL and comparing it to the hash of
CFL received in the pre- dissemination packet. Once the CFL is
verified the sensors check if they have the requested functions
by comparing the FIDs. If the sensor has one or more of the
requested functions, the functions which were re-encrypted by
RKi are sent back to the requesting nodes. The encryption key
received by cluster nodes, in the code dissemination is used to
decrypt the received encrypted functions. The received
functions are then verified using the Bloom Filter. Each received
function is hashed, and the positions in the filter to which the
hash maps are checked against the entries already set in
the Bloom Filter. If the hash of a received function maps to
positions which are unset in the Bloom Filter, the function is
rejected; otherwise it is accepted. Once all functions pass through
the Bloom Filter, the nodes are ready to build the code image
from its various parts. To build the code image the nodes have
to only use the CFL to plug these functions into their
appropriate position in the code. The code image is stored in the
flash memory and built. Once the build is complete, the boot
loader can boot the node up using it.
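The Bloom Filter check on received functions can be sketched as follows. The filter length, the number of hash positions per item and the use of SHA-256 are assumptions made for illustration, and the class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a single integer

    def _positions(self, item: bytes):
        """Derive num_hashes bit positions by prefixing the item with a counter."""
        for k in range(self.num_hashes):
            digest = hashlib.sha256(bytes([k]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        """Base station side: set the positions of a common function."""
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: bytes) -> bool:
        """Node side: accept only if every position is already set."""
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical function bodies; the base station populates the filter.
bf = BloomFilter(num_bits=256, num_hashes=3)
bf.add(b"code of function F1")
bf.add(b"code of function F2")
accepted = bf.might_contain(b"code of function F1")
```

A Bloom Filter never rejects an item that was added, but it can accept an item that was not, so the filter length has to be chosen large enough to keep the false positive probability low.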
We have implemented our scheme using TinyOS on TelosB
sensors and simulated it using TOSSIM. As discussed in section
5.4, sending a single re-encryption key in a high noise
environment may not be sufficient because of the unreliable
sensors, packet drops and packet corruption. We simulated a
high noise environment with network of 100 nodes spread in a
10x10 grid, with nodes 2 meters apart from each other. As can
be seen in Fig. 4, when only a single key is sent in the
pre-dissemination packet, after a run of 1000 iterations, only 95.86%
of the nodes remained in sync with the keys. Once a node
does not receive a re-encryption key, it goes out of sync and is
unable to take any part in further operations. This problem can
be alleviated by sending multiple re-encryption keys in a
packet. For example, in the i-th iteration, along with the
re-encryption key RKi, the last re-encryption key RKi-1 can also be
sent. This enables nodes which missed the re-encryption key in
the last iteration to come back in sync. It can be seen in the figure
that the percentage of nodes out of sync with the network goes
down as we increase the number of keys. We found that
99.99% of the nodes were in sync when 4 keys were sent, even
after a run of 1000 iterations. Thereafter, the percentage of
in-sync nodes drops slightly, which we attribute to the large packet
size because of the multiple keys. With TinyOS 2.x packet size
limit, only 6 re-encryption keys can be sent at once, since the
packet also involves an HMAC and a hash.
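The effect of sending several keys per packet can be illustrated with a small model. It assumes that a node must apply every re-encryption key in order and that a packet carries the current key plus the preceding ones, so a node is lost once it misses as many consecutive packets as there are keys per packet. The function name and the reception pattern are hypothetical, and the sketch does not reproduce the simulation results reported here.

```python
def stays_in_sync(received: list[bool], keys_per_packet: int) -> bool:
    """Return True if a node can still follow the chain of re-encryption keys.

    received[i] is True if the node got the pre-dissemination packet of
    iteration i. With keys_per_packet keys in each packet, a gap of up to
    keys_per_packet - 1 consecutive misses can be repaired by the next packet.
    """
    missed_in_a_row = 0
    for got_packet in received:
        if got_packet:
            missed_in_a_row = 0
        else:
            missed_in_a_row += 1
            if missed_in_a_row >= keys_per_packet:
                return False  # the oldest missing key will never be resent
    return True


# Hypothetical reception pattern with one single miss and one double miss.
pattern = [True, False, True, False, False, True]
in_sync_with_three_keys = stays_in_sync(pattern, 3)
```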
Figure 4. Synchronization of nodes with the re-encryption keys
Figure 5. Execution times of proxy re-encryption operations
Figure 6. Energy consumption of proxy re-encryption operations
Figure 7. Percentage reduction in overall code size
We also implemented the symmetric proxy re-encryption
of [13]. This is the first time a symmetric proxy re-encryption
algorithm has been implemented on wireless sensors. The
results of our implementation can be seen in Fig. 5. The block
size for encryption was 128 bits. The operations shown in the
figure are encryption, decryption, re-encryption, re-encryption
key generation and key generation. We can see that most
operations of the scheme are very lightweight, except for key
generation. In our secure code dissemination scheme, however,
the key generation happens on the base station. The operations
performed on the nodes are decryption and re-encryption.
On TelosB motes decryption takes 2.93 milliseconds while
proxy re-encryption takes 5.18 milliseconds. The corresponding
energy consumption data is shown in Fig. 6. From our
implementation, we can say that symmetric proxy re-encryption
is feasible on wireless sensors. Fig. 7 shows the reduction in
the size of disseminated code for various applications compared
to Seluge [10]. For this experiment, we chose five applications.
Four are the standard TinyOS apps Blink, Sense, Oscilloscope and
RadioSenseToLeds (RSTL); the fifth is Tree, a
tree construction application. The common functions between
these applications were discovered and distributed in the
simulated network. The dissemination code for our algorithm
consisted of the new code along with the overhead of the Index,
the CFL, the Bloom Filter and the Decryption Key. This was
compared with the standard Seluge [10] dissemination code. The
results in Fig. 7 show the percentage reduction of disseminated
code for each of the applications. The reduction is highest in the
Blink application, in which our algorithm disseminates 19.06%
less code. Fig. 8 illustrates the energy overhead of our
algorithm compared to Seluge [10] for each of the applications.
The energy overhead is due to decryption of code packets,
verification of HHT and verification of common functions
through Bloom Filter. The energy overhead in the case of Blink
is 4.2 mJ, while in the case of Oscilloscope, RadioSenseToLeds
and Tree it is approximately 18.5 joules. A large portion of this
overhead however, is the cost of confidentiality of code
Figure 8. Energy overhead compared to Seluge
The sensor cloud is an emerging paradigm for sensor
networks which is very dynamic in nature. Nodes in sensor
clouds are constantly provisioned and de-provisioned for users.
In such a scenario an efficient code dissemination algorithm
which is also secure becomes necessary. In this paper we have
presented a novel code dissemination algorithm, which is both
efficient and secure. Our code dissemination algorithm takes
into account the similarity of code across applications. The
basic idea therefore is to only communicate the new code to the
sensors while the common code can be picked up from the
sensors in the network. This reduces the amount of code needed
to be communicated. Reduced amount of code results in an
energy efficient code dissemination. Our security framework is
based around the symmetric proxy re-encryption algorithm. We
have implemented our algorithm on TelosB motes and from the
experiments it can be concluded that the algorithm reduces the
amount of communication and energy required, while also
providing confidentiality and integrity of code. While in the
current work we have focused only on Type 1 clones in
application code, in future we intend to include Type 2 clones
too, which will greatly increase the efficiency.
This research is supported by the Intelligent Systems
