Secure and Highly-Available
Aggregation Queries via Set Sampling
Haifeng Yu
National University of Singapore
Secure Aggregation Queries in Sensor Networks
• Multi-hop sensor network with a trusted base station
• In the presence of malicious (Byzantine) sensors
• Goal: Count the number of sensors sensing smoke (i.e., satisfying a certain predicate)
  ◦ Sum, avg, and other aggregates are similar – see paper
• Type-1 attack: Malicious sensors report fake readings
  ◦ If the number of malicious sensors is small, the damage is limited
  ◦ Not the focus of our work
Haifeng Yu, National University of Singapore
Secure Aggregation Queries in Sensor Networks
• Type-2 attack: Malicious sensors (indirectly) corrupt the readings of other sensors – much larger damage
  ◦ E.g., in tree-based aggregation
  ◦ The focus of most research on secure aggregation – and our focus too
[Figure: tree-based aggregation toward the base station, with partial counts at each sensor and one malicious sensor]
State of the Art and Our Goal
• Active area in recent years (e.g., [Chan et al.’06], [Frikken et al.’08], [Roy et al.’06], [Nath et al.’09])
• All these approaches focus on detection (i.e., safety only)
  ◦ They detect whether the result is corrupted
  ◦ But they do not produce a correct result when under attack
Our Goal
Detecting attacks → Tolerating attacks
Safety only → Safety + Liveness
System made harmless → System made useful
Our Approach to Tolerating Attacks
• Previous approaches: Fix the security holes in tree-based aggregation
  ◦ Dilemma in in-network processing
• Our novel approach: Use sampling
  ◦ With MACs on each sample, security comes almost automatically
[Figure: a sampled sensor floods its sample result (with a MAC) to the base station; other sensors cannot modify the result]
Challenge with sampling: Potentially large overhead
Background: Estimate Count via Sampling
• n sensors, b of which sense smoke (called black sensors)
• Goal: Output an (ε, δ) approximation b' such that $\Pr[|b' - b| \le \epsilon b] \ge 1 - \delta$
• E.g.: Sample 10 sensors and 5 are black ⇒ b' = 0.5n
• Classic result: the number of sensors that need to be sampled is on the order of $\frac{1}{\epsilon^2}\cdot\frac{n}{b}\cdot\log\frac{1}{\delta}$
  ◦ (Prohibitively) expensive for small b
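A minimal Python sketch of the naïve estimator, assuming sensors are sampled uniformly with replacement (the slides do not say how). `naive_count_estimate` and `naive_sample_size` are hypothetical names, and the constant `c` in the sample-size formula is an illustrative assumption; the slides only give the order of growth.

```python
import math
import random


def naive_count_estimate(readings, k, rng):
    """Estimate b (the number of black sensors) from k uniform samples.

    readings[i] is True if sensor i is black. Sampling is with replacement.
    """
    n = len(readings)
    black = sum(readings[rng.randrange(n)] for _ in range(k))
    return n * black / k  # e.g. 5 black out of 10 samples gives 0.5 * n


def naive_sample_size(n, b, eps, delta, c=3.0):
    """Order-of-growth sample count (c / eps^2) * (n / b) * log(1 / delta).

    The constant c is an illustrative assumption, not taken from the slides.
    """
    return math.ceil(c * (n / b) * math.log(1 / delta) / eps ** 2)


rng = random.Random(0)
readings = [True] * 5000 + [False] * 5000
print(naive_count_estimate(readings, 2000, rng))
print(naive_sample_size(10_000, 50, 0.1, 0.05))  # far more than n = 10,000
```

The n/b factor is what makes small counts expensive: with b = 50 out of 10,000 sensors, the sketch asks for more samples than there are sensors.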
Reduce the Overhead via Set Sampling
• Challenge with small b:
  ◦ Need many samples to encounter black sensors
• Set sampling: Sample a set of sensors together
  ◦ The binary result tells whether any sensor in the set is black (but not how many)
  ◦ Efficient implementation in sensor networks – later
  ◦ It should be easier to hit sets containing black sensors
How effective will this be?
(How many sets do we need to sample to estimate count?)
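To make the intuition concrete, here is a small Python sketch of a set-sampling oracle and of how much more likely a set is to hit a black sensor than a single sample. The names `set_sample` and `hit_probability` are hypothetical, and the probability assumes the set's members are drawn uniformly and independently (with replacement).

```python
def set_sample(readings, members):
    """Binary set sample: True iff at least one sensor in `members` is black.

    It reveals whether the set contains a black sensor, but not how many.
    """
    return any(readings[i] for i in members)


def hit_probability(n, b, s):
    """Probability that a set of s sensors, each drawn uniformly at random
    with replacement from n sensors, contains at least one of the b black ones."""
    return 1 - (1 - b / n) ** s


readings = [False, True, False, True]
print(set_sample(readings, [0, 2]))  # False
print(set_sample(readings, [0, 1]))  # True
# n = 10,000 and b = 10: a single sensor hits w.p. 0.001, a set of 1,000 w.p. about 0.63
print(hit_probability(10_000, 10, 1), hit_probability(10_000, 10, 1_000))
```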
Our Results
• Novel algorithm for estimating count using set sampling
  ◦ Defines randomized and inter-related sets, and samples them adaptively
• # sets needed to sample: $O\left(\frac{\log n}{\epsilon^2}\log\left(\frac{1}{\delta}\log n\right)\right)$
  ◦ # of samples reduced from polynomial to polylogarithmic (can be further reduced – see paper)
• Previously, without set sampling: on the order of $\frac{1}{\epsilon^2}\cdot\frac{n}{b}\cdot\log\frac{1}{\delta}$
Our Results
• Per-sensor msg complexity: $O\left(\frac{\log n}{\epsilon^2}\log\left(\frac{1}{\delta}\log n\right)\right)$
  ◦ Comparable to some detection-only protocols [Roy et al.’06]: $O\left(\frac{1}{\epsilon^2}\log\frac{1}{\delta}\log n\right)$
  ◦ Similar msg sizes
• See paper for time complexity
• See paper for other aggregates (sum, avg)
• Set sampling + novel algorithms using set sampling ⇒ enables secure aggregation queries despite adversarial interference
Outline of This Talk
• Background, goal, and summary of results
• Simple implementation of set sampling in sensor networks
• Main technical results: Novel algorithm for estimating count via set sampling
Implementing Set Sampling – Non-Secure Version
Goal: O(1) per-sensor msg complexity for sampling a set
• Example: sample the set {A, B, C, D}
• Request flooded from the base station: O(log n) bits
  ◦ We use only O(n) (instead of O(2^n)) random sets ⇒ O(log n) bits to name a set
• Reply: Single bit
  ◦ Flooded back from all black sensors in the set (e.g., A and C)
  ◦ Each sensor forwards only the first message it receives
  ◦ The base station sees a binary answer
• Multiple samples can be taken in one flooding
  ◦ Our algorithm takes samples in O(log n) sequential stages ⇒ only O(log n) rounds of flooding
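The reply phase can be simulated in a few lines of Python. This sketch assumes an idealized, collision-free network given as an adjacency dict; `flood_reply` is a hypothetical name. It shows why each sensor sends at most one message per sampled set: a sensor forwards only the first reply it receives.

```python
from collections import deque


def flood_reply(adjacency, base, black_members):
    """Simulate reply flooding for one set sample.

    adjacency: dict mapping each node (sensors and the base station) to its neighbors.
    black_members: sensors in the sampled set that satisfy the predicate.
    Returns (answer_bit, sends), where sends[v] counts broadcasts by node v.
    """
    sends = {v: 0 for v in adjacency}
    have_reply = set(black_members)  # nodes that already hold a reply
    queue = deque(black_members)     # each black member starts a reply
    while queue:
        v = queue.popleft()
        if v == base:
            continue                 # the base station only listens
        sends[v] += 1                # v broadcasts the one-bit reply once
        for u in adjacency[v]:
            if u not in have_reply:  # later copies are dropped: forward the first reply only
                have_reply.add(u)
                queue.append(u)
    return base in have_reply, sends


# Line topology base - A - B - C - D; sampling {A, B, C, D} where A and C are black
adjacency = {
    "base": ["A"],
    "A": ["base", "B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}
answer, sends = flood_reply(adjacency, "base", ["A", "C"])
print(answer, sends)
```

Counting `sends` confirms at most one broadcast per sensor per sampled set, which is the O(1) per-sensor cost stated in the goal; the request flood has the same shape, started by the base station.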
Implementing Set Sampling – Secure Design
• Each set = some distinct symmetric key K
  ◦ Preload K onto all sensors in the set
  ◦ Each sensor should be in only a small number of sets – O(log n) in our protocol
• Request: name of K, nonce
• Reply: MAC_K(nonce)
  ◦ Only sensors holding K can generate it
• DoS attacks are possible
  ◦ Can be avoided with an improved design – see paper
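A Python sketch of the request/reply exchange, assuming HMAC-SHA256 truncated to 8 bytes as the MAC (the slides only specify MAC_K(nonce); the 8-byte length matches the message size in the system model). All function names are hypothetical. The base station, which knows every key, can check a reply, while a sensor without K cannot produce a valid one.

```python
import hashlib
import hmac
import secrets

MAC_BYTES = 8  # assumed MAC length


def mac(key, nonce):
    """Assumed MAC construction: HMAC-SHA256 truncated to MAC_BYTES."""
    return hmac.new(key, nonce, hashlib.sha256).digest()[:MAC_BYTES]


def make_request(key_name):
    """Base station: name the set's key K and attach a fresh random nonce,
    so that replies from earlier samples cannot be replayed."""
    return key_name, secrets.token_bytes(16)


def sensor_reply(held_keys, key_name, nonce, is_black):
    """A sensor answers only if it holds K (is in the set) and is black."""
    if is_black and key_name in held_keys:
        return mac(held_keys[key_name], nonce)
    return None


def verify(key, nonce, reply):
    """Base station: accept the reply only if it equals MAC_K(nonce)."""
    return hmac.compare_digest(mac(key, nonce), reply)


all_keys = {5: secrets.token_bytes(16)}  # the base station knows every key
sensor_keys = {5: all_keys[5]}           # a sensor preloaded with K5
name, nonce = make_request(5)
reply = sensor_reply(sensor_keys, name, nonce, is_black=True)
print(verify(all_keys[5], nonce, reply))  # True
```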
Outline of This Talk
• Background, goal, and summary of results
• Implementing set sampling in sensor networks
• Main technical meat: Novel algorithm for estimating count via set sampling
  ◦ For now, assume all sensors are honest
  ◦ Security follows from the clean security guarantees of sampling, though some minor modifications are needed – see paper
Random Sets on the Sampling Tree
• Basic approach:
  ◦ Construct (related) randomized sets of different sizes and sample them adaptively
• The base station internally creates a sampling tree
  ◦ A complete binary tree with 4n leaves
  ◦ Each tree node = a distinct symmetric key = some set of sensors
  ◦ The sampling tree is an internal data structure, not the network topology
[Figure: sampling tree with keys K1–K15 (leaves K8–K15); sensor A is at leaf K10 and is loaded with K1, K2, K5, K10; sensor B is at leaf K12 and is loaded with K1, K3, K6, K12]
Each sensor is associated with a uniformly random leaf (independently)
Each tree node corresponds to a set
containing all the sensors in its subtree
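The key assignment in the figure can be reproduced with heap numbering: the root is K1 and the children of Ki are K(2i) and K(2i+1). This Python sketch assumes that numbering (it matches the figure, where leaf K10 yields K1, K2, K5, K10) and rounds 4n up to a power of two when 4n is not one; the function names are hypothetical.

```python
import random


def tree_depth(n):
    """Smallest depth d with 2**d >= 4n leaves (4n rounded up to a power of two)."""
    return (4 * n - 1).bit_length()


def path_keys(leaf):
    """Keys from the root K1 down to `leaf`, using heap numbering."""
    keys = []
    while leaf >= 1:
        keys.append(leaf)
        leaf //= 2  # the parent of Ki is K(i // 2)
    return keys[::-1]


def assign_keys(sensors, rng):
    """Map each sensor to an independent, uniformly random leaf and
    preload it with every key on that leaf's path to the root."""
    depth = tree_depth(len(sensors))
    first_leaf = 2 ** depth  # leaves are K(2**depth) .. K(2**(depth + 1) - 1)
    return {s: path_keys(rng.randrange(first_leaf, 2 * first_leaf)) for s in sensors}


print(path_keys(10))  # [1, 2, 5, 10]  (sensor A in the figure)
print(path_keys(12))  # [1, 3, 6, 12]  (sensor B in the figure)
print(assign_keys(["A", "B"], random.Random(7)))
```

Each sensor ends up holding depth + 1 = O(log n) keys, in line with the requirement that every sensor be in only a small number of sets.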
Properties of the Sampling Tree
[Figure: example sampling tree with $f_0 = 1$, $f_1 = 1$, $f_2 = 0.5$, $f_3 = 0.25$]
• A sensor is black if it satisfies the predicate
• A key is black iff the corresponding set contains a black sensor
• $f_i$: fraction of black keys at level i
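Given the leaves of the black sensors, the fractions $f_i$ follow directly from the definition: a key at level i is black iff some black sensor's leaf lies in its subtree. A minimal Python sketch, assuming heap numbering (root K1, children of Ki are K(2i) and K(2i+1), as in the figure), so that a leaf's ancestor at level i is `leaf >> (depth - i)`; `black_fractions` is a hypothetical name. With both A (leaf K10) and B (leaf K12) black, it reproduces the example values on the slide.

```python
def black_fractions(black_leaves, depth):
    """Return [f_0, ..., f_depth]: the fraction of black keys on each level.

    black_leaves: heap indices of the leaves holding black sensors.
    Leaves sit at level `depth`; level i holds the 2**i keys 2**i .. 2**(i+1) - 1.
    """
    fractions = []
    for level in range(depth + 1):
        # the ancestor of each black leaf at this level is a black key
        black_keys = {leaf >> (depth - level) for leaf in black_leaves}
        fractions.append(len(black_keys) / 2 ** level)
    return fractions


print(black_fractions([10, 12], 3))  # [1.0, 1.0, 0.5, 0.25]
```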
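• $f_i$ is monotonically non-increasing as we go down the tree
  ◦ It decreases by a factor of at most 2 per level (each black key has at least one black child, and each level has twice as many keys as the one above)
• At the top, $f_0 = 1$ (assuming at least one black sensor)
• At the bottom, $f_i \le 1/4$ (4n leaves, at most n of them black!)
Lemma: There exists a level ℓ with $f_\ell \in [1/4, 1/2]$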
Why Level ℓ Helps
• $f_\ell$ not too small ⇒ efficient estimation of $f_\ell$ via naïve sampling:
  ◦ $O\left(\frac{1}{\epsilon^2}\log\frac{1}{\delta}\right)$ samples on level ℓ yield an (ε, δ) approximation for $f_\ell$
• $f_\ell$ not too large ⇒ can potentially estimate the final count directly from $f_\ell$
  ◦ Chernoff-type occupancy tail bound for balls into bins
  ◦ See paper for details
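A Python sketch of the naïve sampling step on level ℓ. To get concrete numbers it uses Hoeffding's inequality (our choice of bound; the slides only state the O(·) form): k ≥ ln(2/δ)/(2t²) samples give additive error at most t with probability at least 1 − δ, and since $f_\ell \ge 1/4$, additive error t = ε/4 implies relative error ε. Function names are hypothetical.

```python
import math
import random


def samples_needed(eps, delta, f_min=0.25):
    """Samples so that the estimate is within eps * f of f w.p. >= 1 - delta,
    for any f >= f_min (Hoeffding bound; the constants are not the paper's)."""
    t = eps * f_min  # additive error that implies relative error eps
    return math.ceil(math.log(2 / delta) / (2 * t * t))


def estimate_fraction(key_is_black, eps, delta, rng):
    """Estimate f on one level by set-sampling uniformly random keys.

    key_is_black[j] is the binary set-sampling result for key j on the level.
    """
    k = samples_needed(eps, delta)
    hits = sum(key_is_black[rng.randrange(len(key_is_black))] for _ in range(k))
    return hits / k


level = [True] * 30 + [False] * 70  # a level with f = 0.3
print(samples_needed(0.1, 0.001))   # depends only on eps and delta, not on n
print(estimate_fraction(level, 0.1, 0.001, random.Random(3)))
```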
Additional Issues: Too Few Keys on Level ℓ
• Challenge:
  ◦ To estimate the final count based on $f_\ell$, the number of keys on level ℓ needs to be large enough
  ◦ If not, we need to track down to lower levels
• Need to leverage other interesting properties of the sampling tree
  ◦ See paper
Additional Issues: Finding Level ℓ
• Binary search on the O(log n) levels
  ◦ On each level i examined, sample a small number of random keys to roughly estimate $f_i$
  ◦ Extremely efficient
• Challenges:
  ◦ The binary search operates on estimated values (which have errors and may not be monotonic)
  ◦ When $f_i$ is small, the estimate has an error guarantee on only one side
  ◦ See paper
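An idealized Python sketch of the binary search, assuming exact, non-increasing fractions supplied by a hypothetical oracle `f_at(level)`. The real protocol works with noisy estimates, which is exactly the challenge listed above, so this only shows the search structure: it examines O(log log n) of the O(log n) levels.

```python
def binary_search_level(f_at, depth):
    """Smallest level i in [0, depth] with f_at(i) <= 1/2.

    Assumes f_at is non-increasing in i and f_at(depth) <= 1/2.
    Queries at most ceil(log2(depth + 1)) levels.
    """
    lo, hi = 0, depth  # invariant: the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if f_at(mid) <= 0.5:
            hi = mid      # mid qualifies; the first such level is mid or shallower
        else:
            lo = mid + 1  # f_at(mid) > 1/2, so the answer is deeper
    return lo


fractions = [1.0] * 10 + [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625]
queried = []


def f_at(level):
    queried.append(level)
    return fractions[level]


print(binary_search_level(f_at, len(fractions) - 1), len(queried))  # 10 4
```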
Example Numerical Results
• n = 10,000 and the count result (b) ranges from 0 to 10,000
• Overhead:
  ◦ 5–15 sequential stages of sampling
  ◦ 250–300 samples in total
• Avg approximation error: (1 ± 0.08)
  ◦ Hard to get better accuracy even in trusted environments ([Nath et al.’09])…
• Naïve sampling: 300 samples give the same accuracy only when b > 2,000
Conclusions
• Making aggregation queries secure is critical for many sensor network applications
• Contribution: Detecting attacks → Tolerating attacks
  ◦ Safety only → Safety + Liveness
• Our approach:
  ◦ Abandon in-network processing and use sampling
  ◦ Use novel set sampling to reduce the overhead
  ◦ Polynomial overhead → Polylogarithmic overhead
Related Work to Set Sampling
• Decision tree complexity for threshold-t functions (i.e., whether b ≥ t) [Ben-Asher and Newman’95] [Aspnes’09]
  ◦ Most results are for error-free deterministic protocols
  ◦ Large lower bound: Ω(t) (implying Ω(b) for count)
  ◦ No prior results for general Monte Carlo randomized algorithms
Tolerating Attacks is Difficult
• Example: Byzantine consensus
  ◦ Detection is substantially easier than tolerance
  ◦ The n ≥ 3f + 1 lower bound applies only to tolerance, not to detection
• Pinpointing / revoking malicious sensors is hard
  ◦ E.g., due to the lack of public-key authentication
  ◦ An active research area by itself
System Model
• Multi-hop sensor network with a trusted base station
• Performance metric: Time complexity – see paper
• Performance metric: Per-sensor msg complexity
  ◦ Max number of msgs sent/received by a single sensor (captures load balance)
  ◦ Msg size is either 8 bytes (size of a MAC) or log(n) bits
• Collisions ignored – as in all prior work
  ◦ Or one can apply existing algorithms…
Implementing Set Sampling – Non-Secure Version
Goal: O(1) per-sensor msg complexity for sampling a set
[Figure: request flooding – every sensor sends/receives one msg]
• Request size: We use at most O(n) (random) sets ⇒ O(log n) bits to name a set
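A quick check of the request-size claim, assuming the sets are the keys of the sampling tree (4n leaves, rounded up to a power of two) named by their heap index; `bits_to_name_set` is a hypothetical helper. Naming an arbitrary subset of the sensors would instead take n bits.

```python
def bits_to_name_set(n):
    """Bits needed to name one of the sampling-tree keys K1 .. K(2L - 1),
    where L = 4n rounded up to a power of two (an assumption)."""
    leaves = 1 << (4 * n - 1).bit_length()
    num_keys = 2 * leaves - 1  # O(n) sets in total
    return num_keys.bit_length()


for n in (2, 1_000, 10_000):
    print(n, bits_to_name_set(n))
```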
Implementing Set Sampling – Non-Secure Version
Goal: O(1) per-sensor msg complexity for sampling a set
[Figure: reply flooding when sampling {A, B, C, D}; B, C, D satisfy the predicate and A does not; only the first reply is forwarded]
• Reply: Single bit
  ◦ This is why set sampling is designed to be binary
(The overhead of sampling a set needs to be
properly controlled – will discuss later.)
Translating to b
• We now have a good estimate of $f_\ell$
• Need to produce a good estimate of b
• Let the number of keys on level ℓ be $n_\ell$
  ◦ Throw b balls into $n_\ell$ bins
  ◦ The fraction of occupied bins has the same distribution as $f_\ell$
  ◦ This distribution is highly concentrated near its mean (Chernoff-type occupancy tail bound), assuming
    ◦ $f_\ell$ is not too close to 1
    ◦ $n_\ell$ is not too small
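Since each black sensor lands on a uniformly random key of level ℓ, $E[f_\ell] = 1 - (1 - 1/n_\ell)^b$. One natural way to turn an estimate of $f_\ell$ into an estimate of b is to invert this expectation; this Python sketch does that (an illustration, not necessarily the paper's estimator; the function names are hypothetical, and m stands for $n_\ell$). The inversion becomes very sensitive as $f_\ell$ approaches 1, since small errors in $f_\ell$ then produce large errors in b.

```python
import math
import random


def expected_occupied_fraction(b, m):
    """E[fraction of occupied bins] when b balls land uniformly in m bins."""
    return 1 - (1 - 1 / m) ** b


def count_from_fraction(f, m):
    """Solve 1 - (1 - 1/m)**b = f for b. Needs 0 <= f < 1 and m >= 2;
    the result is very sensitive to f when f is close to 1."""
    return math.log(1 - f) / math.log(1 - 1 / m)


def simulate_fraction(b, m, rng):
    """Throw b balls into m bins; return the fraction of occupied bins."""
    return len({rng.randrange(m) for _ in range(b)}) / m


m, b = 1024, 300
f = simulate_fraction(b, m, random.Random(5))
print(f, count_from_fraction(f, m))  # the second value should be near b = 300
```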
Summary of Techniques to Achieve the Results
• Define randomized sets based on a complete binary tree
  ◦ Interesting relationships among the sets
• Sample the sets adaptively
• Leverage Chernoff-type occupancy tail bounds for balls into bins