Packet Level Algorithms

Michael Mitzenmacher
Goals of the Talk
• Consider algorithms/data structures for
measurement/monitoring schemes at the
router level.
– Focus on packets, flows.
• Emphasis on my recent work, future plans.
– “Applied theory”.
• Less on experiments, more on design/analysis of data
structures for applications.
– Hash-based schemes
• Bloom filters and variants.
Vision
• Three-pronged research data.
• Low: Efficient hardware implementations of
relevant algorithms and data structures.
• Medium: New, improved data structures and
algorithms for old and new applications.
• High: Distributed infrastructure supporting
monitoring and measurement schemes.
Background / Building Blocks
• Multiple-choice hashing
• Bloom filters
Multiple Choices: d-left Hashing
• Split hash table into d equal subtables.
• To insert, choose a bucket uniformly for each
subtable.
• Place item in a cell in the least loaded bucket,
breaking ties to the left.
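The insertion rule above can be sketched in a few lines of Python. This is a minimal illustration, not the hardware design discussed in the talk: the BLAKE2b-based `bucket_index` helper, the table sizes, and the use of Python lists as unbounded buckets are all assumptions made for the example.

```python
import hashlib

def bucket_index(item, subtable, num_buckets):
    """Hypothetical hash: BLAKE2b over 'subtable:item', reduced mod num_buckets."""
    digest = hashlib.blake2b(f"{subtable}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets

def d_left_insert(tables, item):
    """Place item in the least loaded candidate bucket, breaking ties to the left."""
    best = None  # (load, subtable, bucket)
    for i, table in enumerate(tables):
        b = bucket_index(item, i, len(table))
        load = len(table[b])
        if best is None or load < best[0]:  # strict '<' keeps the leftmost on ties
            best = (load, i, b)
    _, i, b = best
    tables[i][b].append(item)
    return i, b

def make_tables(d, buckets_per_subtable):
    """d equal subtables, each a list of initially empty buckets."""
    return [[[] for _ in range(buckets_per_subtable)] for _ in range(d)]

tables = make_tables(d=3, buckets_per_subtable=1000)
for item in range(12000):  # 12000 items in 3000 buckets: average load 4
    d_left_insert(tables, item)
max_load = max(len(bucket) for table in tables for bucket in table)
```

Here `max_load` gives a quick empirical check of how close the maximum load stays to the average.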
Properties of d-left Hashing
• Analyzable using both combinatorial
methods and differential equations.
– Maximum load very small: O(log log n).
– Differential equations give very, very accurate
performance estimates.
• Maximum load is extremely close to
average load for small values of d.
Example of d-left hashing
• Consider 3-left performance.
Load    Average load 6.4    Average load 4
0       1.7e-08             2.3e-05
1       5.6e-07             6.0e-04
2       1.2e-05             1.1e-02
3       2.1e-04             1.5e-01
4       3.5e-03             6.6e-01
5       5.6e-02             1.8e-01
6       4.8e-01             2.3e-05
7       4.5e-01             5.6e-31
8       6.2e-03
9       4.8e-15
Example of d-left hashing
• Consider 4-left performance with average load of 6,
using differential equations.
Load    Insertions only    Alternating insertions/deletions, steady state
≥ 1     1.0000             1.0000
≥ 2     1.0000             0.9999
≥ 3     1.0000             0.9990
≥ 4     0.9999             0.9920
≥ 5     0.9971             0.9505
≥ 6     0.8747             0.7669
≥ 7     0.1283             0.2894
≥ 8     1.273e-10          0.0023
≥ 9     2.460e-138         1.681e-27
Review: Bloom Filters
• Given a set S = {x1,x2,x3,…xn} on a universe U,
want to answer queries of the form:
Is y ∈ S?
• Bloom filter provides an answer in
– “Constant” time (time to hash).
– Small amount of space.
– But with some probability of being wrong.
• Alternative to hashing with interesting tradeoffs.
Bloom Filters
Start with an m bit array, filled with 0s.
B: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Hash each item xj in S k times. If Hi(xj) = a, set B[a] = 1.
B: 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
To check if y is in S, check B at Hi(y). All k values must be 1.
B: 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
Possible to have a false positive; all k values are 1, but y is not in S.
B: 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
n items
m = cn bits
k hash functions
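A minimal Python sketch of the structure just described follows. The way the k hash functions are derived (BLAKE2b with a per-function prefix) and the parameters m = 8000, k = 6 are assumptions chosen for illustration; any k reasonably independent hash functions would do.

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m  # the m bit array B, filled with 0s

    def _positions(self, item):
        # Hypothetical choice of H_1..H_k: BLAKE2b over "i:item", reduced mod m.
        for i in range(self.k):
            digest = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item):
        for a in self._positions(item):
            self.bits[a] = 1

    def __contains__(self, item):
        # All k probed bits must be 1; a non-member can pass by coincidence.
        return all(self.bits[a] for a in self._positions(item))

bf = BloomFilter(m=8000, k=6)  # m/n = 8 bits per item for n = 1000
for x in range(1000):
    bf.add(x)
false_positives = sum(1 for y in range(1000, 11000) if y in bf)
```

Members are never reported absent; `false_positives` counts how many of 10000 non-members slip through.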
False Positive Probability
• Pr(specific bit of filter is 0) is
$p' = (1 - 1/m)^{kn} \approx e^{-kn/m} = p$
• If r is fraction of 0 bits in the filter then false
positive probability is
$(1 - r)^k \approx (1 - p')^k \approx (1 - p)^k = (1 - e^{-k/c})^k$
• Approximations valid as r is concentrated around
E[r].
– Martingale argument suffices.
• Find optimal at k = (ln 2)m/n by calculus.
– So optimal fpp is about $(0.6185)^{m/n}$
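To make the formula concrete, the following Python sketch evaluates $(1 - e^{-k/c})^k$ for c = m/n = 8 and compares the real-valued optimum with the best integer k. The function and variable names are illustrative only.

```python
import math

def false_positive(c, k):
    """(1 - e^{-k/c})^k for c = m/n bits per item and k hash functions."""
    return (1.0 - math.exp(-k / c)) ** k

c = 8
k_opt = c * math.log(2)  # real-valued optimum, (ln 2) * m/n
rates = {k: false_positive(c, k) for k in range(1, 11)}
best_integer_k = min(rates, key=rates.get)  # k must be an integer in practice
bound = 0.6185 ** c  # the (0.6185)^{m/n} approximation
```

Since k must be an integer in practice, the achievable rate `rates[best_integer_k]` is slightly above `bound`.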
Example
[Figure: false positive rate (0 to 0.1) versus number of hash functions (0 to 10), for m/n = 8. Opt k = 8 ln 2 = 5.545...]
Handling Deletions
• Bloom filters can handle insertions, but not
deletions.
B: 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
[Figure: items xi and xj both hashing into B.]
• If deleting xi means resetting 1s to 0s, then
deleting xi will “delete” xj.
Counting Bloom Filters
Start with an array of m counters, filled with 0s.
B: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Hash each item xj in S k times. If Hi(xj) = a, add 1 to B[a].
B: 0 3 0 0 1 0 2 0 0 3 2 1 0 2 1 0
To delete xj decrement the corresponding counters.
B: 0 2 0 0 0 0 2 0 0 3 2 1 0 1 1 0
Can obtain a corresponding Bloom filter by reducing to 0/1.
B: 0 1 0 0 0 0 1 0 0 1 1 1 0 1 1 0
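The counting variant can be sketched as below. As before, the BLAKE2b-derived hash positions and the sizes are assumptions for illustration, and the sketch uses unbounded Python integers rather than fixed-width counters.

```python
import hashlib

class CountingBloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item):
        # Hypothetical choice of H_1..H_k: BLAKE2b over "i:item", reduced mod m.
        for i in range(self.k):
            digest = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item):
        for a in self._positions(item):
            self.counters[a] += 1

    def remove(self, item):
        # Only meaningful for items that were actually inserted.
        for a in self._positions(item):
            self.counters[a] -= 1

    def __contains__(self, item):
        return all(self.counters[a] > 0 for a in self._positions(item))

    def to_bits(self):
        """Reduce counters to 0/1 to obtain the corresponding Bloom filter."""
        return [1 if c > 0 else 0 for c in self.counters]

cbf = CountingBloomFilter(m=1024, k=4)
cbf.add("xi")
cbf.add("xj")
cbf.remove("xi")  # xj survives because shared counters are only decremented
```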
Counting Bloom Filters: Overflow
• Must choose counters large enough to avoid
overflow.
• Poisson approximation suggests 4 bits/counter.
– Average load using k = (ln 2)m/n counters is ln 2.
– Probability a counter has load at least 16:
$\approx e^{-\ln 2} (\ln 2)^{16} / 16! \approx 6.78 \times 10^{-17}$
• Failsafes possible.
• We assume 4 bits/counter for comparisons.
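The overflow estimate can be reproduced numerically. The sketch below assumes counter loads are Poisson with mean ln 2, as the approximation above suggests; the tail is summed directly rather than computed as one minus a partial sum, which would lose all precision in floating point.

```python
import math

def poisson_pmf(lam, j):
    """Pr(X = j) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** j / math.factorial(j)

lam = math.log(2)  # average counter load at the optimal k
p_exactly_16 = poisson_pmf(lam, 16)  # the dominant term quoted above
# Terms beyond j = 40 are far below floating-point resolution of the sum.
p_at_least_16 = sum(poisson_pmf(lam, j) for j in range(16, 40))
```

The first term dominates the tail, which is why the single-term estimate is adequate for sizing counters at 4 bits.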
Bloomier Filters
• Instead of set membership, keep an r-bit function
value for each set element.
– Correct value should be given for each set element.
– Non-set elements should return NULL with high
probability.
• Mutable version: function values can change.
– But underlying set cannot.
• First suggested in paper by Chazelle, Kilian,
Rubinfeld, Tal.
From Low to High
• Low
– Hash Tables for Hardware
– New Bloom Filter/Counting Bloom Filter Constructions
(Hardware Friendly)
• Medium
– Approximate Concurrent State Machines
– Distance-Sensitive Bloom Filters
• High
– A Distributed Hashing Infrastructure
Low Level :
Better Hash Tables for Hardware
• Joint work with Adam Kirsch.
– Simple Summaries for Hashing with Choices.
– The Power of One Move: Hashing Schemes for
Hardware.
Perfect Hashing Approach
Element 1       Element 2       Element 3       Element 4       Element 5
Fingerprint(4)  Fingerprint(5)  Fingerprint(2)  Fingerprint(1)  Fingerprint(3)
Near-Perfect Hash Functions
• Perfect hash functions are challenging.
– Require all the data up front – no insertions or
deletions.
– Hard to find efficiently in hardware.
• In [BM96], we note that d-left hashing can
give near-perfect hash functions.
– Useful even with insertions, deletions.
– Some loss in space efficiency.
Near-Perfect Hash Functions via
d-left Hashing
• Maximum load equals 1
– Requires significant space to avoid all
collisions, or some small fraction of spillovers.
• Maximum load greater than 1
– Multiple buckets must be checked, and multiple
cells in a bucket must be checked.
– Not perfect in space usage.
• In practice, 75% space usage is very easy.
• In theory, can do even better.
Hash Table Design : Example
• Desired goals:
– At most 1 item per bucket.
– Minimize space.
• And minimize number of hash functions.
– Small amount of spillover possible.
• We model as a constant fraction, e.g. 0.2%.
• Can be placed in a content-addressable memory
(CAM) if small enough.
Basic d-left Scheme
• For hash table holding up to n elements,
with max load 1 per bucket, use 4 choices
and 2n cells.
– Spillover of approximately 0.002n elements
into CAM.
Improvements from Skew
• For hash table holding up to n elements, with max
load 1 per bucket, use 4 choices and 1.8n cells.
– Subtable sizes 0.79n, 0.51n, 0.32n, 0.18n.
– Spillover still approximately 0.002n elements into
CAM.
– Subtable sizes optimized using differential equations,
black-box optimization.
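The spillover figures for the balanced and skewed layouts can be checked with a short large-n calculation. The sketch below is a simplified fluid-limit model, not necessarily the exact differential equations used in the analysis: it assumes each item takes the leftmost empty bucket among its choices, so a subtable of size s (in units of n) that receives a items ends up with a fraction 1 - exp(-a/s) of its buckets occupied.

```python
import math

def spillover_fraction(subtable_sizes):
    """Fluid-limit spillover (fraction of n) with max load 1 per bucket.

    subtable_sizes are given left to right as multiples of n.
    """
    remaining = 1.0  # items, as a fraction of n, not yet placed
    for size in subtable_sizes:
        filled = 1.0 - math.exp(-remaining / size)  # occupied fraction of this subtable
        remaining -= size * filled  # items absorbed by this subtable
    return remaining

balanced = spillover_fraction([0.5, 0.5, 0.5, 0.5])  # 2n cells in total
skewed = spillover_fraction([0.79, 0.51, 0.32, 0.18])  # 1.8n cells in total
```

Under these assumptions both layouts spill roughly 0.002n elements, while the skewed one uses 10% fewer cells.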
Summaries to Avoid Lookups
• In hardware, d choices of location can be done by
parallelization.
– Look at d memory banks in parallel.
• But there’s still a cost: pin count.
• Can we keep track of which hash function to use
for each item, using a small summary?
– Yes: use a Bloom-filter like structure to track.
• Skew impacts summary performance; more skew better.
– Uses small amount of on-chip memory.
– Avoids multiple look-ups.
– Special case of a Bloomier filter.
Hash Tables with Moves
• Cuckoo Hashing (Pagh, Rodler)
– Hashed items need not stay in their initial place.
– With multiple choices, can move item to
another choice, without affecting lookups.
• As long as hash values can be recomputed.
– When inserting, if all spots are filled, new item
kicks out an old item, which looks for another
spot, and might kick out another item, and so
on.
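The kick-out process can be sketched as follows. This is a generic illustration of cuckoo-style insertion with d subtables and one item per cell, not the specific scheme of Pagh and Rodler; the `slot` hash helper, the eviction order, and the `max_moves` cap are assumptions made for the example.

```python
import hashlib

def slot(item, i, size):
    """Hypothetical hash for subtable i: BLAKE2b over 'i:item', reduced mod size."""
    digest = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % size

def cuckoo_insert(tables, item, max_moves=50):
    """Return None on success, or the item still homeless after max_moves evictions."""
    d = len(tables)
    for move in range(max_moves + 1):
        for i in range(d):  # take any empty candidate cell
            s = slot(item, i, len(tables[i]))
            if tables[i][s] is None:
                tables[i][s] = item
                return None
        if move == max_moves:
            break
        i = move % d  # all candidates full: evict one occupant
        s = slot(item, i, len(tables[i]))
        item, tables[i][s] = tables[i][s], item  # evicted item looks for a new spot
    return item

def cuckoo_lookup(tables, item):
    """An item can only ever sit in one of its d candidate cells."""
    return any(tables[i][slot(item, i, len(tables[i]))] == item
               for i in range(len(tables)))

tables = [[None] * 500 for _ in range(4)]
spilled = [cuckoo_insert(tables, x) for x in range(1000)]  # 50% utilization
```

Lookups stay cheap because moves never take an item outside its own d choices.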
Benefits and Problems of Moves
• Benefit: much better space utilization.
– Multiple choices, multiple items per bucket, can
achieve 90+% with no spillover.
• Drawback: complexity.
– Moves required can grow like log n.
• Constant on average.
– Bounded maximum time per operation
important in many settings.
– Moves expensive.
• Table usually in slow memory.
Question : Power of One Move
• How much leverage do we get by just
allowing one move?
– One move likely to be possible in practice.
– Simple for hardware.
– Analysis possible via differential equations.
• Cuckoo hard to analyze.
– Downside : some spillover into CAM.
Comparison, Insertions Only
• 4 schemes
– No moves
– Conservative : Place item if possible. If not, try to move earliest
item that has not already replaced another item to make room.
Otherwise spill over.
– Second chance : Read all possible locations, and for each location
with an item, check if it can be placed in the next subtable. Place
new item as early as possible, moving up to 1 item left 1 level.
– Second chance, with 2 per bucket.
• Target of 0.2% spillover.
• Balanced (all subtables the same) and skewed
compared.
• All done by differential equation analysis (and
simulations match).
Results of Moves : Insertions Only
Scheme         Space overhead,   Space overhead,   Fraction moved,
               balanced          skewed            skewed
No moves       2.00              1.79              0%
Conservative   1.46              1.39              1.6%
Standard       1.41              1.29              12.0%
Standard, 2    1.14              1.06              14.9%
Conclusions, Moves
• Even one move saves significant space.
– More aggressive schemes, considering all
possible single moves, save even more.
(Harder to analyze, more hardware resources.)
• Importance of allowing small amounts of
spillover in practical settings.
Low-Medium:
New Bloom Filters /
Counting Bloom Filters
• Joint work with Flavio Bonomi, Rina
Panigrahy, Sumeet Singh, George Varghese.
A New Approach to Bloom Filters
• Folklore Bloom filter construction.
– Recall: Given a set S = {x1,x2,x3,…xn} on a universe U, want
to answer membership queries.
– Method: Find an n-cell perfect hash function for S.
• Maps set of n elements to n cells in a 1-1 manner.
– Then keep log 2 (1 / e ) bit fingerprint of item in each cell.
Lookups have false positive < e.
– Advantage: each bit/item reduces false positives by a factor of
1/2, vs ln 2 for a standard Bloom filter.
• Negatives:
– Perfect hash functions non-trivial to find.
– Cannot handle on-line insertions.
Near-Perfect Hash Functions
• In [BM96], we note that d-left hashing can
give near-perfect hash functions.
– Useful even with deletions.
• Main differences
– Multiple buckets must be checked, and multiple
cells in a bucket must be checked.
– Not perfect in space usage.
• In practice, 75% space usage is very easy.
• In theory, can do even better.
First Design : Just d-left Hashing
• For a Bloom filter with n elements, use a 3-left
hash table with average load 4, 60 bits per bucket
divided into 6 fixed-size fingerprints of 10 bits.
– Overflow rare, can be ignored.
• False positive rate of $12 \times 2^{-10} = 0.01171875$
– Vs. 0.000744 for a standard Bloom filter.
• Problem: Too much empty, wasted space.
– Other parametrizations similarly impractical.
– Need to avoid wasting space.
Just Hashing : Picture
Bucket
Empty
Empty
0000111111
1010101000
0001110101
1011011100
Key: Dynamic Bit Reassignment
• Use 64-bit buckets: 4 bit counter, 60 bits divided
equally among actual fingerprints.
– Fingerprint size depends on bucket load.
• False positive rate of 0.0008937
– Vs. 0.0004587 for a standard Bloom filter.
• DBR: Within a factor of 2.
– And would be better for larger buckets.
– But 64 bits is a nice bucket size for hardware.
• Can we remove the cost of the counter?
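The two false positive rates can be approximated from the rounded 3-left load distribution at average load 4 given in the earlier d-left example. The sketch below is a rough estimate under stated assumptions: equal-size subtables, one bucket inspected per subtable, and each stored fingerprint of b bits matching a non-member with probability 2^-b (summed as a union bound). Because the input distribution is rounded, it only approximates the quoted 0.0008937.

```python
# Rounded 3-left load distribution at average load 4 (from the earlier example).
load_dist = {0: 2.3e-05, 1: 6.0e-04, 2: 1.1e-02, 3: 1.5e-01,
             4: 6.6e-01, 5: 1.8e-01, 6: 2.3e-05, 7: 5.6e-31}
D = 3  # a lookup inspects one bucket in each of the 3 subtables
PAYLOAD_BITS = 60  # bits left for fingerprints after the 4-bit counter

def fixed_fpr(bits=10, avg_load=4):
    """First design: fixed 10-bit fingerprints, about D * avg_load comparisons."""
    return D * avg_load * 2.0 ** -bits

def dbr_fpr():
    """Dynamic bit reassignment: a bucket with load L uses floor(60 / L) bits each."""
    total = 0.0
    for load, prob in load_dist.items():
        if load == 0:
            continue
        bits = PAYLOAD_BITS // load
        total += prob * load * 2.0 ** -bits
    return D * total

fixed = fixed_fpr()
dbr = dbr_fpr()
```

Most of the remaining error comes from buckets at load 5, where each fingerprint has only 12 bits.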
DBR : Picture
Bucket
000110110101
111010100001
101010101000
101010110101
010101101011
Count : 4
Semi-Sorting
• Fingerprints in bucket can be in any order.
– Semi-sorting: keep sorted by first bit.
• Use counter to track #fingerprints and
#fingerprints starting with 0.
• First bit can then be erased, implicitly given
by counter info.
• Can extend to first two bits (or more) but
added complexity.
DBR + Semi-sorting : Picture
Bucket
000110110101
111010100001
101010101000
101010110101
010101101011
Count : 4,2
DBR + Semi-Sorting Results
• Using 64-bit buckets, 4 bit counter.
– Semi-sorting on loads 4 and 5.
– Counter only handles up to load 6.
– False positive rate of 0.0004477
• Vs. 0.0004587 for a standard Bloom filter.
– This is the tradeoff point.
• Using 128-bit buckets, 8 bit counter, 3-left hash
table with average load 6.4.
– Semi-sorting all loads: fpr of 0.00004529
– 2 bit semi-sorting for loads 6/7: fpr of 0.00002425
• Vs. 0.00006713 for a standard Bloom filter.
Additional Issues
• Further possible improvements
– Group buckets to form super-buckets that share
bits.
– Conjecture: Most further improvements are not
worth it in terms of implementation cost.
• Moving items for better balance?
• Underloaded case.
– New structure maintains good performance.
Improvements to Counting
Bloom Filter
• Similar ideas can be used to develop an improved
Counting Bloom Filter structure.
– Same idea: use fingerprints and a d-left hash table.
• Counting Bloom Filters waste lots of space.
– Lots of bits to record counts of 0.
• Our structure beats standard CBFs easily, by
factors of 2 or more in space.
– Even without dynamic bit reassignment.
Deletion Problem
Suppose x and y have the same fingerprint z.
[Figure: three panels, "Insert x", "Insert y", "Delete x?", in which x and y leave the same fingerprint z in the table.]
Deletion Problem
• When you delete, if you see the same fingerprint
at two of the location choices, you don’t know
which is the right one.
– Take both out: false negatives.
– Take neither out: false positives/eventual overflow.
Handling the Deletion Problem
• Want to make sure the fingerprint for an
element cannot appear in two locations.
• Solution: make sure it can’t happen.
– Trick: use (pseudo)random permutations
instead of hashing.
Two Stages
• Suppose we have d subtables, each with $2^b$
buckets, and want f bit fingerprints.
• Stage 1: Hash element x into b+f bits using a
“strong” hash function H(x).
• Stage 2: Apply d permutations taking
$\{0, \ldots, 2^{b+f}-1\} \to \{0, \ldots, 2^{b+f}-1\}$:
$\pi_i(H(x)) = (B_i, F_i)$
– Bucket Bi and fingerprint Fi for ith subtable given by ith
permutation.
– Also, Bi and Fi completely determine H(x).
Handling the Deletion Problem
• Lemma: if x and y yield the same fingerprint in the
same bucket, then H(x) = H(y).
– Proof: because of permutation setup, fingerprint and bucket
determine H(x).
• Each cell has a small counter.
– In case two elements have same hash, H(x) = H(y).
– Note they would match for all buckets/fingerprints.
– 2 bit counters generally suffice.
• Deletion problem avoided.
– Can’t have two fingerprints for x in the table at the same
time; handled by the counter.
A Problem for Analysis
• Permutations imply no longer “pure” d-left
hashing.
– Dependence.
– Analysis no longer applies.
• Some justification:
– Balanced Allocation on Graphs (SODA 2006,
Kenthapadi and Panigrahy.)
– Differential equations.
• Justified experimentally.
Other Practical Issues
• Simple, linear permutations
$\pi_i(H(x)) = a H(x) \bmod 2^{b+f}$, with a odd
– High order bits for bucket, low order for fingerprint.
– Not analyzed, works fine in practice.
• Invertible permutations allow moving elements if
hash table overflows.
– Move element from overflow bucket to another choice.
– Powerful paradigm…
• Cuckoo hashing and related schemes.
– But more expensive in implementation terms.
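A small Python sketch of the linear permutation and its inverse is given below. The sizes b = 10 and f = 12 and the particular odd multipliers are arbitrary choices for illustration. The inverse relies on the fact that every odd number is invertible modulo a power of two; `pow(a, -1, MOD)` requires Python 3.8 or later.

```python
B_BITS, F_BITS = 10, 12  # hypothetical: 2^10 buckets, 12-bit fingerprints
MOD = 1 << (B_BITS + F_BITS)

def permute(h, a):
    """pi(h) = a*h mod 2^(b+f), split as (bucket, fingerprint); a must be odd."""
    v = (a * h) % MOD
    return v >> F_BITS, v & ((1 << F_BITS) - 1)  # high-order bits, low-order bits

def invert(bucket, fingerprint, a):
    """Recover H(x) from (bucket, fingerprint) via the inverse of a mod 2^(b+f)."""
    v = (bucket << F_BITS) | fingerprint
    return (pow(a, -1, MOD) * v) % MOD

# Arbitrary odd constants, one per subtable (illustrative values only).
multipliers = [1, 0x2545F5, 0x9E3779B1, 0x85EBCA6B]
```

Because `invert` recovers H(x), an element stored in one subtable can be relocated to another choice without access to the original key.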
Space Comparison : Theory
• Standard counting Bloom filter uses
c counters/element = 4c bits/element.
• The d-left CBF using r bit remainders, 4 hash
functions, 8 cells/bucket uses 4(r+2)/3
bits/element.
• Space equalized when c = (r+2)/3.
(Standard false pos) / (d-left false pos) ≈ $2^{-c \ln 2} / (24 \times 2^{-3c+2})$
• Can change parameters to get other tradeoffs.
Space Comparison : Practice
• Everything behaves essentially according to
expectations.
– Not surprising: everything is a “balls-and-bins”
process.
• Using 4-left hashing:
– Save over a factor of 2 in space with 1% false
positive rate.
– Save over a factor of 2.5 in space with 0.1%
false positive rate.
Approximate Concurrent
State Machines
• Joint work with Flavio Bonomi, Rina
Panigrahy, Sumeet Singh, George Varghese.
• Extending the Bloomier filter idea to handle
dynamic sets, dynamic function values, in a
practical setting.
Approximate Concurrent
State Machines
• Model for ACSMs
– We have underlying state machine, states 1…X.
– Lots of concurrent flows.
– Want to track state per flow.
– Dynamic: Need to insert new flows and delete
terminating flows.
– Can allow some errors.
– Space, hardware-level simplicity are key.
Motivation: Router State Problem
• Suppose each flow has a state to be tracked.
Applications:
– Intrusion detection
– Quality of service
– Distinguishing P2P traffic
– Video congestion control
– Potentially, lots of others!
• Want to track state for each flow.
– But compactly; routers have small space.
– Flow IDs can be ~100 bits. Can’t keep a big lookup
table for hundreds of thousands or millions of flows!
Problems to Be Dealt With
• Keeping state values with small space, small
probability of errors.
• Handling deletions.
• Graceful reaction to adversarial/erroneous
behavior.
– Invalid transitions.
– Non-terminating flows.
• Could fill structure if not eventually removed.
– Useful to consider data structures in well-behaved
systems and ill-behaved systems.
ACSM Basics
• Operations
– Insert new flow, state
– Modify flow state
– Delete a flow
– Lookup flow state
• Errors
– False positive: return state for non-extant flow
– False negative: no state for an extant flow
– False return: return wrong state for an extant flow
– Don’t know: return don’t know
• Don’t know may be better than other types of errors for many
applications, e.g., slow path vs. fast path.
ACSM via Counting Bloom Filters
• Dynamically track a set of current
(FlowID,FlowState) pairs using a CBF.
• Consider first when system is well-behaved.
– Insertion easy.
– Lookups, deletions, modifications are easy
when current state is given.
• If not, have to search over all possible states. Slow,
and can lead to don’t knows for lookups, other
errors for deletions.
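The approach can be sketched by storing (FlowID, FlowState) pairs in a counting Bloom filter. The code below is an illustrative sketch for a well-behaved system only; the constants, the BLAKE2b-based hashing, and the `"dont_know"` return value are assumptions, not details from the talk.

```python
import hashlib

NUM_STATES = 8  # hypothetical: states 1..8
M, K = 4096, 4  # hypothetical number of counters and hash functions
counters = [0] * M

def positions(flow_id, state):
    """Counter positions for the pair (flow_id, state)."""
    for i in range(K):
        digest = hashlib.blake2b(f"{i}:{flow_id}:{state}".encode(), digest_size=8).digest()
        yield int.from_bytes(digest, "big") % M

def insert(flow_id, state):
    for a in positions(flow_id, state):
        counters[a] += 1

def delete(flow_id, state):
    for a in positions(flow_id, state):
        counters[a] -= 1

def modify(flow_id, old_state, new_state):
    delete(flow_id, old_state)  # easy because the current state is given
    insert(flow_id, new_state)

def lookup(flow_id):
    """Without the current state, search over all possible states."""
    hits = [s for s in range(1, NUM_STATES + 1)
            if all(counters[a] > 0 for a in positions(flow_id, s))]
    if not hits:
        return None  # flow not present
    return hits[0] if len(hits) == 1 else "dont_know"

insert(123456, 3)
state_before = lookup(123456)
modify(123456, 3, 5)
state_after = lookup(123456)
```

The search over all states in `lookup` is the slow path noted above, and it is also where a false positive on a second state turns into a don't know.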
Direct Bloom Filter (DBF) Example
[Figure: counter array for flow 123456, shown holding the pair (123456,3) and then the pair (123456,5).]
Timing-Based Deletion
• Motivation: Try to turn non-terminating flow
problem into an advantage.
• Add a 1-bit flag to each cell, and a timer.
– If a cell is not “touched” in a phase, 0 it out.
• Non-terminating flows eventually zeroed.
• Counters can be smaller or non-existent, since
deletions occur via timing.
• Timing-based deletion required for all of our
schemes.
Timer Example
Timer bits: 1 0 0 0 1 0 1 0
Cells:      3 0 0 2 1 0 1 1
RESET
Timer bits: 0 0 0 0 0 0 0 0
Cells:      3 0 0 0 1 0 1 0
Stateful Bloom Filters
• Each flow hashed to k cells, like a Bloom filter.
• Each cell stores a state.
• If two flows collide at a cell, cell takes on don’t know
value.
• On lookup, as long as one cell has a state value, and
there are not contradicting state values, return state.
• Deletions handled by timing mechanism (or counters
in well-behaved systems).
• Similar in spirit to [KM], Bloom filter summaries for
multiple choice hash tables.
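One plausible reading of these rules is sketched below. The sentinel values, the BLAKE2b-based cell selection, and the exact update rule in `sbf_modify` are assumptions for illustration; deletion by timing or counters is omitted.

```python
import hashlib

EMPTY, DONT_KNOW = 0, -1  # real states are positive integers
K = 3  # hypothetical number of hash functions

def cell_positions(flow_id, m):
    """Distinct cell indices for a flow (duplicates collapsed)."""
    pos = set()
    for i in range(K):
        digest = hashlib.blake2b(f"{i}:{flow_id}".encode(), digest_size=8).digest()
        pos.add(int.from_bytes(digest, "big") % m)
    return sorted(pos)

def sbf_insert(cells, flow_id, state):
    for a in cell_positions(flow_id, len(cells)):
        # An occupied cell means two flows collide there: mark it don't know.
        cells[a] = state if cells[a] == EMPTY else DONT_KNOW

def sbf_modify(cells, flow_id, new_state):
    for a in cell_positions(flow_id, len(cells)):
        if cells[a] != DONT_KNOW:
            cells[a] = new_state

def sbf_lookup(cells, flow_id):
    values = [cells[a] for a in cell_positions(flow_id, len(cells))]
    if EMPTY in values:
        return None  # flow not present
    states = {v for v in values if v != DONT_KNOW}
    # Exactly one non-contradicting state value is needed for an answer.
    return states.pop() if len(states) == 1 else DONT_KNOW

cells = [EMPTY] * 1024
sbf_insert(cells, 123456, 3)
first = sbf_lookup(cells, 123456)
sbf_modify(cells, 123456, 5)
second = sbf_lookup(cells, 123456)
```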
Stateful Bloom Filter (SBF) Example
[Figure: stateful Bloom filter cells for flow 123456, shown for the pair (123456,3) and then the pair (123456,5); cells marked ? hold the don't know value.]
What We Need : A New Design
• These Bloom filter generalizations were not
doing the job.
– Poor performance experimentally.
• Maybe we need a new design for Bloom
filters!
• In real life, things went the other way; we
designed a new ACSM structure, and found
that it led to the new Bloom filter/counting
Bloom filter designs.
Fingerprint Compressed Filter
• Each flow hashed to d choices in the table, placed at
the least loaded.
– Fingerprint and state stored.
• Deletions handled by timing mechanism or explicitly.
• False positives/negatives can still occur (especially in
ill-behaved systems).
• Lots of parameters: number of hash functions, cells
per bucket, fingerprint size, etc.
– Useful for flexible design.
Fingerprint Compressed Filter
(FCF) Example
Fingerprint          State
10001110011111100    3
01110100100010111    1
01110010010101111    6
11110101001000111    2
11110111001001011    2
00011110011101101    1
11111111110000000    4
10101110010101011    2
01110010001011111    3
11100010010111110    1
x : 11110111001001011 : State 2 to State 4
Experiment Summary
• FCF-based ACSM is the clear winner.
– Better performance with less space than the
others in test situations.
• ACSM performance seems reasonable:
– Sub 1% error rates with reasonable size.
Distance-Sensitive Bloom Filters
• Instead of answering questions of the form
Is y ∈ S?
we would like to answer questions of the form
Is y ≈ x ∈ S?
• That is, is the query close to some element of the set,
under some metric and some notion of close.
• Applications:
– DNA matching
– Virus/worm matching
– Databases
Distance-Sensitive Bloom Filters
• Goal: something in same spirit as Bloom
filters.
– Don’t exhaustively check set.
• Initial results for Hamming distance show it
is possible. [KM]
• Closely related to locality-sensitive hashing.
• Not currently practical.
• New ideas?
A Distributed
Router Infrastructure
• Recently funded FIND proposal.
• Looking for ideas/collaborators.
The High-Level Pitch
• Lots of hash-based schemes being designed
for approximate measurement/monitoring
tasks.
– But not built into the system to begin with.
• Want a flexible router architecture that
allows:
– New methods to be easily added.
– Distributed cooperation using such schemes.
What We Need
• Memory: Off-Chip Memory, On-Chip Memory, CAM(s)
• Computation: Hashing Computation Unit, Unit for Other Computation
• Communication + Control: Control System, Programming Language,
Communication Architecture
Lots of Design Questions
• How much space for various memory levels? How can we
dynamically divide memory among multiple competing
applications?
• What hash functions should be included? How open
should system be to new hash functions?
• What programming functionality should be included?
What programming language to use?
• What communication is necessary to achieve distributed
monitoring tasks given the architecture?
• Should security be a consideration? What security
approaches are possible?
• And so on…
Related Theory Work
• What hash functions should be included?
– Joint work with Salil Vadhan.
– Using theory of randomness extraction, we
show that for d-left hashing, Bloom filters, and
other hashing methods, choosing a hash
function from a pairwise independent family is
enough – if data has sufficient entropy.
• Behavior matches truly random hash function with
high probability.
• Randomness of hash function and data “combine”.
• Pairwise independence enough for many
applications.
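For reference, a pairwise independent family is cheap to implement. The sketch below uses the standard construction h(x) = ((a·x + b) mod p) mod m with a and b drawn uniformly from Z_p; it is pairwise independent over Z_p and only approximately so after the final reduction mod m. The prime 2^61 - 1 and the function name are choices made for this example, and keys are assumed to be integers smaller than p.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, assumed larger than every key

def make_pairwise_hash(m, rng):
    """Draw one function from the family x -> ((a*x + b) mod P) mod m."""
    a = rng.randrange(P)
    b = rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m

rng = random.Random(1)  # fixed seed only to make the example repeatable
h = make_pairwise_hash(1000, rng)
values = [h(x) for x in range(10000)]
```

Each function needs only two stored words and one multiply-add per key, which is part of the appeal for hardware.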
Conclusions and Future Work
• Low: Mapping current hashing techniques to hardware is
fruitful for practice.
• Medium: Big boom in hashing-based algorithms/data
structures. Trend is likely to continue.
– Approximate concurrent state machines: Natural progression from
set membership to functions (Bloomier filter) to state machines.
What is next?
– Power of d-left hashing variants for near-perfect matchings.
• High: Wide open. Need to systematize our knowledge for
next generation systems.
– Measurement and monitoring infrastructure built into the system.