Architectures for Packet Classification Caching (CSIE, NCKU)

Authors: Kang Li, Francis Chang, Wu-chang Feng
Published at: ICON 2003
Presenter: Yun-Yan Chang
Date: 2010/11/03




Introduction
Approach
Cache Architecture
Evaluation


Given limited silicon resources, what is the
best way to implement a packet classification
cache?
Determine how best to use the limited
resources in three aspects:
◦ Cache associativity
◦ Replacement policy
◦ Hash function

Cache performance is evaluated with
trace-driven simulations.
◦ Packet Classification Cache Simulator (PCCS)
◦ Trace data sets:
 Bell Labs
 New Zealand
 University (OC-3 link)
Fig 1. Flow volume in traces
m: Flow-ID size
k: result size
N: cache set associativity


The cache memory is an
N-way set-associative
cache, which splits the
cache memory into N
memory banks.
Each memory bank is a
direct-mapped cache
that is addressable by
the output of the hash
function.
Fig 2. Cache Architecture



For an N-way set-associative
cache, every input FlowID
selects N memory entries,
one from each memory bank.
Each entry contains an
m-bit FlowID and a k-bit
classification result.
The classification result is at
least 1 bit for a packet
filter, but could be
multiple bits.
Fig 2. Cache Architecture
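As a sketch of the lookup path just described (an assumed Python rendering, not the authors' PCCS code; all names here are illustrative): the hash of the FlowID selects one slot in each of the N direct-mapped banks, and a hit requires an exact FlowID match in some bank.

```python
# Sketch of an N-way set-associative classification cache (assumed
# structure): the hash of the FlowID picks one slot per bank.

class SetAssociativeCache:
    def __init__(self, n_ways, bank_size, hash_fn):
        self.bank_size = bank_size
        self.hash_fn = hash_fn
        # Each bank is a direct-mapped table of (flow_id, result) entries.
        self.banks = [[None] * bank_size for _ in range(n_ways)]

    def lookup(self, flow_id):
        index = self.hash_fn(flow_id) % self.bank_size
        for bank in self.banks:
            entry = bank[index]
            if entry is not None and entry[0] == flow_id:
                return entry[1]  # hit: return the cached classification result
        return None              # miss: the packet needs full classification

    def insert(self, flow_id, result, victim_bank=0):
        # Which bank to evict from is the replacement policy's decision
        # (discussed on the next slides); bank 0 is just a placeholder.
        index = self.hash_fn(flow_id) % self.bank_size
        self.banks[victim_bank][index] = (flow_id, result)
```

Comparing the stored m-bit FlowID on every lookup is what keeps hash collisions from returning a wrong classification result.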

Cache associativity
◦ Focus on the storage cost of the cache.
 Direct-mapped
 N-way set-associative
 Fully associative
Fig 3. Cache associativity

Cache replacement
◦ Determines which entry must be replaced in order
to make room for a newly classified flow.
 LRU (Least-Recently-Used) replacement
 LFU (Least-Frequently-Used) replacement
 Probabilistic replacement
Algorithm:
Upon cache miss:
    update(0)
    // replace an entry with probability h
Upon cache hit:
    update(1)
update(state):
    if (state == 0)
        h = alpha * h;
    else
        h = alpha * h + (1 - alpha);
alpha = 0.9
h: recent hit ratio
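The probabilistic policy above can be sketched as follows (a hypothetical rendering; the initial value of h is an assumption, as the slide does not give one). On a miss, the new flow replaces an existing entry with probability h, where h tracks the recent hit ratio as an exponential moving average with alpha = 0.9.

```python
import random

ALPHA = 0.9  # EWMA weight from the slide

class ProbabilisticReplacement:
    def __init__(self, initial_h=0.5):
        self.h = initial_h  # recent-hit-ratio estimate (initial value assumed)

    def update(self, state):
        # state == 0 on a miss, state == 1 on a hit (as in the slide)
        if state == 0:
            self.h = ALPHA * self.h
        else:
            self.h = ALPHA * self.h + (1 - ALPHA)

    def on_hit(self):
        self.update(1)

    def on_miss(self):
        # Returns True if the newly classified flow should replace an entry.
        self.update(0)
        return random.random() < self.h
```

The effect is that replacement becomes aggressive when the cache has been hitting (h is high) and conservative when it has been missing, which dampens thrashing on miss-heavy traffic.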
◦ Performance of different cache replacement
algorithms using a 4-way set-associative cache.
Fig. 5. Replacement policies using 4-way caches

Hash function
◦ A critical component in implementing a cache is the
hash function used to index into it.
◦ With a traditional hash function such as SHA-1,
generating the output takes more than
1000 logic operations (Shift, AND, OR, and XOR)
on 32-bit words.
◦ To reduce the size and latency of the hash function,
we design an XOR-based hash function that operates
on the packet header and consumes only 16 logic
operations (XOR and Shift).
SHA-1 hash function
(needs more than 1000 logic operations)
Fig 6. XOR-based hash function
(needs 16 logic operations)
◦ Comparing performance on a 4-way set-associative
LRU cache, the XOR-based hash function performs
almost identically to the SHA-1 hash function.
Fig. 7. Hash performance using 4-way, LRU cache.