CAM

Content Addressable Memories
Vahid Tabatabaee
Fall 2007
ENTS689L: Packet Processing and Switching
Commercial Network Processor Architectures
References
 Panos C. Lekkas, Network Processors: Architectures, Protocols, and
Platforms, McGraw-Hill.
 Kostas Pagiamtzis, Ali Sheikholeslami, “Content-Addressable
Memory (CAM) Circuits and Architectures: A Tutorial and Survey,”
IEEE J. of Solid-State Circuits, vol. 41, no. 3, March 2006.
 NetLogic MicroSystems Application Note, “Intradevice
Configuration of Network Search Engines”.
 NetLogic MicroSystems Application Note, “High Performance Layer
3 Forwarding”.
 IDT White Paper, “Taking Packet Processing to the Next Level”.
ENTS689L: Packet Processing and Switching
Content Addressable Memory (CAM)
Classification and Search Engines
 Classification engine receives streams of packets as its input.
 It applies a set of application-specific sorting rules and policies
continuously on the packets.
 It ends up compiling a series of new parallel packet streams in
queues of packets.
 For classification the NP should consult a memory bank, a lookup
table or even a database where the rules are stored.
 Search engines are used for consultation of a lookup table or a
database based on rules and policies for the correct classification.
 Search engines are mostly based on associative memory, which is
also known as CAM.
What is CAM?
 Content Addressable Memory is a
special kind of memory!
 Read operation in traditional memory:
 Input is the address of the location
whose content we are interested in.
 Output is the content of that
address.
 In CAM it is the reverse:
 Input is associated with something
stored in the memory.
 Output is location where the
associated content is stored.
Figure: the same four stored words accessed both ways (X = don't care).
Address   Stored word
00        1 0 1 X X
01        0 1 1 0 X
10        0 1 1 X X
11        1 0 0 1 1
Traditional Memory: input address 01, output word 0 1 1 0 X.
Content Addressable Memory: input word 0 1 1 0 1, output address 01.
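The two access directions in the figure can be sketched in Python. This is only a software model under stated assumptions: the names `TABLE`, `read_by_address`, `word_matches`, and `search_by_content` are illustrative, X is treated as a don't-care bit, and a real CAM compares all rows in parallel in hardware rather than in a loop.

```python
# Stored words from the figure; 'X' marks a don't-care bit (assumed meaning).
TABLE = ["101XX", "0110X", "011XX", "10011"]

def read_by_address(address):
    """Traditional memory: address in, stored word out."""
    return TABLE[address]

def word_matches(stored, key):
    """A stored word matches when every non-X bit equals the key bit."""
    return all(s in ("X", k) for s, k in zip(stored, key))

def search_by_content(key):
    """CAM: search word in, address of the first matching row out.

    Returns None when no stored word matches.
    """
    for address, stored in enumerate(TABLE):
        if word_matches(stored, key):
            return address
    return None
```

With the figure's search word, `search_by_content("01101")` returns address 1 (binary 01). Row 10 would also match that word, so the sketch returns the first matching row, the same policy a priority encoder applies.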
CAM for Routing Table Implementation
 CAM can be used as a search engine.
 We want to find matching contents in a database or table.
 Example Routing Table
Source: http://pagiamtzis.com/cam/camintro.html
Simplified CAM Block Diagram









The input to the system is the search word.
The search word is broadcast on the search lines.
Each match line indicates whether there is a match between the search word and the stored word.
The encoder specifies the match location.
If there are multiple matches, a priority encoder selects the first match.
A hit signal flags the case in which no stored word matches.
The search word is long, ranging from 36 to 144 bits.
Table size ranges from a few hundred to 32K entries.
Address space: 7 to 15 bits.
Source: K. Pagiamtzis, A. Sheikholeslami,
“Content-Addressable Memory (CAM)
Circuits and Architectures:
A Tutorial and Survey,”
IEEE J. of Solid-state circuits. March 2006
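As a rough illustration of the encoder stage and of the address widths quoted above, the following Python sketch models a priority encoder over a list of match lines. The function names `priority_encode` and `address_bits` are hypothetical, and the flag returned here is simply "some word matched", which is one possible reading of the hit signal.

```python
def priority_encode(match_lines):
    """Return (any_match, address) for a list of match-line values.

    The address is that of the lowest-numbered asserted match line,
    or None when no line is asserted.
    """
    for address, matched in enumerate(match_lines):
        if matched:
            return True, address
    return False, None

def address_bits(entries):
    """Number of bits needed to encode a match location among `entries` words."""
    return (entries - 1).bit_length()
```

For example, a table of 128 words needs 7 address bits and a table of 32K words needs 15, which agrees with the 7 to 15 bit range on the slide.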
CAM Memory Size
 Largest available around 18
Mbit (single chip).
 Rule of thumb: Largest CAM
chip is about half the largest
available SRAM chip.
 A typical CAM cell consists
of two SRAM cells.
 CAM size has been growing
at an exponential rate.
Source: K. Pagiamtzis, A. Sheikholeslami, “Content-Addressable
Memory (CAM) Circuits and Architectures: A Tutorial and
Survey,” IEEE J. of Solid-state circuits. March 2006
CAM Basics
 The search-data word is loaded
into the search-data register.
 All match-lines are pre-charged to
high (temporary match state).
 Search line drivers broadcast the
search word onto the differential
search lines.
 Each CAM core cell compares its
stored bit against the bit on the
corresponding search lines.
 Match lines of words that have at
least one mismatching bit discharge
to ground.
Source: K. Pagiamtzis, A. Sheikholeslami, “Content-Addressable
Memory (CAM) Circuits and Architectures: A Tutorial and
Survey,” IEEE J. of Solid-state circuits. March 2006
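The search sequence above (precharge, broadcast, compare, discharge on mismatch) can be mimicked in software. The sketch below is a behavioral model of a binary CAM only, not a circuit description; `evaluate_match_lines` and its parameters are illustrative names, and words are represented as Python integers of `width` bits.

```python
def evaluate_match_lines(stored_words, search_word, width):
    """Binary CAM search: return one boolean per stored word (True = match)."""
    bit_mask = (1 << width) - 1
    match_lines = [True] * len(stored_words)   # precharge every match line high
    for row, stored in enumerate(stored_words):
        # XOR exposes the bit positions where the stored and search bits differ.
        mismatching_bits = (stored ^ search_word) & bit_mask
        if mismatching_bits:                   # at least one cell mismatches
            match_lines[row] = False           # the match line discharges to ground
    return match_lines
```

Only the match lines that stay high after the comparison correspond to matching words; they would then feed the encoder.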
Type of CAMs
 Binary CAM (BCAM) stores only 0s and 1s.
 Applications: MAC table consultation, Layer 2 security-related
VPN segregation.
 Ternary CAM (TCAM) stores 0s, 1s and don’t cares.
 Applications: cases that need wildcards, such as Layer 3 and 4
classification for QoS and CoS purposes, and IP routing (longest
prefix matching).
 Available sizes: 1Mb, 2Mb, 4.7Mb, 9.4Mb, and 18.8Mb.
 CAM entries are structured as multiples of 36 bits rather than 32
bits.
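To make the role of don't cares in IP routing concrete, here is a small sketch that turns an IPv4 prefix into a 32-symbol ternary word. The function name `prefix_to_ternary` and the string encoding with `X` for don't care are assumptions made for illustration; a real TCAM stores the same information as value and mask bits.

```python
import ipaddress

def prefix_to_ternary(prefix):
    """Encode an IPv4 prefix such as '10.0.0.0/8' as a 32-symbol ternary word.

    Bits covered by the prefix length are kept; the remaining host bits
    become 'X' (don't care).
    """
    network = ipaddress.IPv4Network(prefix)
    bits = format(int(network.network_address), "032b")
    return bits[:network.prefixlen] + "X" * (32 - network.prefixlen)
```

For instance, `prefix_to_ternary("10.0.0.0/8")` keeps the eight bits `00001010` and wildcards the other 24, so any destination address inside that prefix matches the entry.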
CAM Advantages
 They associate the input (comparand) with their memory contents
in one clock cycle.
 They are configurable in multiple formats of width and depth of
search data, which allows searches to be conducted in parallel.
 CAMs can be cascaded to increase the size of the lookup tables that
they can store.
 We can add new entries to their tables, so they can learn what they
did not know before.
 They are one of the appropriate solutions for higher speeds.
CAM Disadvantages
 They cost several hundred dollars per CAM, even in
large quantities.
 They occupy a relatively large footprint on a card.
 They consume excessive power.
 Generic system engineering problems:
 Interface with the network processor.
 Simultaneous table update and lookup requests.
CAM structure
Figure: CAM block diagram. Blocks: Output Port Control; I/O Port Control;
Control & status registers; Global mask registers; Flag Control;
Priority Encoder; Decoder; Empty Bit; CAM control; Pipeline execution
control (command bus).
CAM array: 72 bits x 131072 (72 bits x 16K x 8 structures), mixable with
72 bits x 16384, 144 bits x 8192, 288 bits x 4096, 576 bits x 2048.
 The comparand bus is 72 bits
wide and bidirectional.
 The result bus is output.
 Command bus enables
instructions to be loaded to the
CAM.
 It has 8 configurable banks of
memory.
 The NPU issues a command to
the CAM.
 The CAM then performs an exact match
or uses wildcards to
extract relevant information.
 There are two sets of mask
registers inside the CAM.
CAM structure
 There are global mask registers,
which can exclude specific bits from
the comparison, and a mask register
that is present in each location of
memory.
 The search result can be
 one output (highest priority), or
 a burst of successive results.
 The output port is 24 bits
wide.
 Flag and control signals specify
status of the banks of the
memory.
 They also enable us to cascade
multiple chips.
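The bank configurations listed in the block diagram can be cross-checked with a few lines of arithmetic. The sketch below only restates the figures from the diagram; the variable names are illustrative.

```python
# (width in bits, depth in words) options for one bank, as listed in the diagram.
BANK_CONFIGS = [(72, 16384), (144, 8192), (288, 4096), (576, 2048)]
BANKS = 8

# Every configuration uses the same number of bits per bank.
bank_bits = {width * depth for width, depth in BANK_CONFIGS}

# Whole device in its narrowest configuration: 72 bits x 16K x 8 banks.
total_entries_72 = 16384 * BANKS
total_bits = 72 * total_entries_72
```

Each option works out to 1,179,648 bits per bank, and the full array holds 131,072 entries of 72 bits, or 9,437,184 bits, which is in line with the 9.4Mb size listed among the available TCAM sizes.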
CAM Features
 CAM Cascading:
 We can cascade up to 8 devices without incurring a performance
penalty in search time (72 bits x 512K).
 We can cascade up to 32 devices with performance degradation
(72 bits x 2M).
 Terminology:
 Initializing the CAM: writing the table into the memory.
 Learning: updating specific table entries.
 Search operation: writing a search key to the CAM.
 Handling wider keys:
 Most CAMs support 72-bit keys.
 They can support wider keys in native hardware.
 Shorter keys: can be handled more efficiently at the system level.
CAM Latency
 Clock rate is between 66 and 133 MHz.
 The clock speed determines
maximum search capacity.
 Factors affecting the search
performance:
 Key size
 Table size
 For the system designer the total
latency to retrieve data from the
SRAM connected to the CAM is
important.
 By using pipelining and multithreading
techniques for resource allocation, we
can ease the CAM speed
requirements.
Source: IDT
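A back-of-the-envelope sketch of how clock rate bounds search capacity is given below. It assumes one search completes per clock cycle and that each packet needs a fixed number of searches; both are simplifying assumptions, and `max_packets_per_second` is a hypothetical helper.

```python
def max_packets_per_second(clock_hz, searches_per_packet):
    """Upper bound on packet rate, assuming one search per clock cycle."""
    return clock_hz // searches_per_packet

# Clock rates quoted on the slide.
LOW_CLOCK_HZ = 66_000_000
HIGH_CLOCK_HZ = 133_000_000
```

Under these assumptions, a packet that needs four searches limits a 133 MHz CAM to 33.25 million packets per second and a 66 MHz CAM to 16.5 million, which shows why key size and the number of tables consulted per packet matter.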
Packet Search Speed Requirements
Source: IDT article in CommsDesign:
http://www.commsdesign.com/showArticle.jhtml?articleID=16501972
Management of Tables Inside a CAM
 It is important to squeeze as much information as we can in a CAM.
 Example from Netlogic application notes:
 We want to store 4 tables of 32 bit wide IP destination addresses.
 The CAM is 128 bits wide.
 If we store one address directly in every slot, 96 bits are wasted.
 We can arrange the 32 bit wide tables next to each other.
 Every 128 bit slot is partitioned into four 32 bit slots.
 These are tables 3, 2, 1, and 0, going from left to right.
 We use the global mask register to access only one of the tables.
The zero field of each mask selects the table that is compared.
MASK 3: 00000000 FFFFFFFF FFFFFFFF FFFFFFFF
MASK 2: FFFFFFFF 00000000 FFFFFFFF FFFFFFFF
MASK 1: FFFFFFFF FFFFFFFF 00000000 FFFFFFFF
MASK 0: FFFFFFFF FFFFFFFF FFFFFFFF 00000000
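The four-tables-in-one-slot arrangement can be sketched as follows. The code assumes, as the MASK values suggest, that a 1 bit in the global mask means "ignore this bit" and a 0 bit means "compare"; the helpers `global_mask`, `pack`, and `search` are illustrative and not taken from the NetLogic application note.

```python
FIELD = 0xFFFFFFFF            # one 32-bit table field
ALL_ONES = (1 << 128) - 1     # a full 128-bit word of ones

def global_mask(table):
    """128-bit mask with zeros over the chosen field (0 = compare, 1 = ignore)."""
    return ALL_ONES ^ (FIELD << (32 * table))

def pack(t3, t2, t1, t0):
    """Place four 32-bit entries side by side, table 3 leftmost."""
    return (t3 << 96) | (t2 << 64) | (t1 << 32) | t0

def search(rows, key, table):
    """Return the index of the first row whose chosen field equals key, else None."""
    compare_bits = ~global_mask(table) & ALL_ONES
    comparand = key << (32 * table)
    for index, row in enumerate(rows):
        if ((row ^ comparand) & compare_bits) == 0:
            return index
    return None
```

Searching table 2 for a key therefore ignores whatever is stored in the table 3, 1, and 0 fields of the same slot, so the four tables share the 128-bit rows without interfering with each other.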
Example Continued
 We can still use the per-entry mask register (not the global mask register) to do
longest prefix matching.
Figure: Comparand Register and Global Mask Register (bit positions 127 to 0)
compared against the stored CAM words; one word is marked MATCH FOUND.
Table Aggregation
 We can use tag bits to aggregate multiple tables in a single CAM.
 Example:
 We want to use a single CAM (NL85721) for IPV4 packet classification
and forwarding.
 We want to filter packets based on other parameters such as VPN.
 We can get an undesired match when we want to do a classification.
 In the example, CAM word 0 does not match, but the destination address matches CAM word 1.
Source: http://www.netlogicmicro.com/pdf/ncs12_rev_0_8.pdf
Tag bits to avoid undesired matches
 Tag bits can be used to differentiate between tables.
 Tag bits should not be masked.
 For packet classification tag bit is 0 and for packet forwarding it is 1.
Source: http://www.netlogicmicro.com/pdf/ncs12_rev_0_8.pdf
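A toy model of tag bits is sketched below. The two 8-bit entries are made-up examples, not taken from the NetLogic note; only the tag values (0 for classification, 1 for forwarding) come from the slide. `search_untagged` exists purely to show the undesired match that appears when the tag is not compared.

```python
CLASSIFICATION_TAG = "0"   # tag values as given on the slide
FORWARDING_TAG = "1"

# Hypothetical 8-bit ternary entries, each prefixed with its table's tag bit.
ENTRIES = [
    CLASSIFICATION_TAG + "1010XXXX",   # CAM word 0: a classification rule
    FORWARDING_TAG + "10XXXXXX",       # CAM word 1: a forwarding prefix
]

def matches(stored, key):
    """Ternary compare: 'X' in the stored word matches either key bit."""
    return all(s in ("X", k) for s, k in zip(stored, key))

def search(tag, key, entries=ENTRIES):
    """Search with the tag bit included in the comparand (never masked)."""
    comparand = tag + key
    for address, stored in enumerate(entries):
        if matches(stored, comparand):
            return address
    return None

def search_untagged(key, entries=ENTRIES):
    """Same search with the tag bit ignored, to show the undesired match."""
    for address, stored in enumerate(entries):
        if matches(stored[1:], key):
            return address
    return None
```

With the key `10001111`, the untagged search hits the forwarding entry at word 1 even when a classification lookup was intended, while the tagged classification search correctly reports no match.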
Vertically Oriented Table Aggregation
 We can use validity bits to support multiple tables with different numbers of
entries.
 We need one validity bit for each table.
 When the validity bit in a slot is 1 the corresponding table has a valid entry.
 In the comparand register, only the validity bit of the table that is under
search should be 1.
Source: http://www.netlogicmicro.com/pdf/ncs12_rev_0_8.pdf
System Design Issues (multiple searches)
 For deep packet inspection, several searches
must occur simultaneously.
 For example: MAC table, IP table, rules table,
flow-management table.
 Question: Do we use 4 CAMs or just 1 CAM
with 4 partitions?
 If we use only 1 CAM:
 Some tables are very large and
some small.
 This approach wastes expensive
partitions.
 If we use 4 CAMs:
 This approach suffers when smaller tables do
not justify using separate CAMs.
 The overall cost also increases since
we have to use separate SRAMs too.
Figure: packet processing environment. A network processor or
custom-designed ASIC connected to four CAMs, each with its own SRAM.
System Design Issues (shorter and longer
search keys)
 We showed how we can implement 36-bit search tables in a 72-bit
wide CAM.
 This approach halves the speed, since we need to search
twice for each key.
 Some CAMs are hardwired to support both 36- and 72-bit wide
search keys, but they are more expensive.
 For longer search keys there are two choices:
 We can use a double data rate (DDR) bus and load meaningful
bits on both the rising and the falling edge of the clock.
 We can double the clock frequency of the bus that loads the
comparands.
System Design Issues
(simultaneous update and search)
 A CAM location cannot be updated while it is being searched at the
same time.
 While we update, packets cannot be forwarded and they are
backlogged.
 We can have a backup CAM for updates while searches are done on
the other CAM.
 Some designs offer a third port for table maintenance without
inhibiting search operations (SiberCore is an example).
 This increases pin count, board real estate, and the signals to be routed on
the board.
System Design Issues (CIDR table update)
 Recall that CIDR works based on
the longest prefix match (LPM).
 CAM segments are created
based on the prefix length.
 Some empty slots are left in
each segment to accommodate
new entries.
 If a segment is suddenly filled
up, the table must be taken
offline to reshuffle the entries.
 A read and write operation is
needed for each entry that must
be relocated. We may need a
read and write for the mask word
too.
Source: http://www.netlogicmicro.com/pdf/cidr_white_paper.pdf
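Why the segments are arranged by prefix length can be seen in a small software model. Because a priority encoder returns the first match, longest prefix matching works when longer prefixes sit ahead of shorter ones. The sketch below assumes that ordering convention; `build_table` and `lookup` are hypothetical helpers, and routes are given as (prefix value, prefix length, next hop) tuples with 32-bit integer prefixes.

```python
def build_table(routes):
    """Order (prefix, length, next_hop) entries so longer prefixes come first."""
    return sorted(routes, key=lambda route: route[1], reverse=True)

def lookup(table, address):
    """Return the next hop of the first (hence longest) matching prefix, else None."""
    for prefix, length, next_hop in table:
        shift = 32 - length            # number of host bits to ignore
        if (address >> shift) == (prefix >> shift):
            return next_hop
    return None
```

Inserting a new route of a given length into such an ordered table is what forces entries to be relocated when its segment has no free slot left.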
CIDR table update: worst case analysis
 What is the worst case scenario? All segments but one are full.
 A new entry may need up to 31 move operations.
 Each move requires 4 clock cycles, for a total of
4 x 31 = 124 clock cycles
 We have 3000 routing updates per second:
3000 x 124 = 372000 clock cycles per second
 If the NP clock rate is 100 MHz, the cycle time is 10 nsec.
 How much time do the updates consume in each second?
372000 cycles x 10 nsec per cycle = 3.72 msec
 At OC-192 rate, we have around 20 to 30 Mpps.
 Therefore, 74,400 to 111,600 packets per second will not be classified and
should be discarded.
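The worst-case arithmetic above can be reproduced with integer arithmetic. All figures are taken from the slide; the constant and function names are illustrative.

```python
MOVES_WORST_CASE = 31        # relocations needed for one new entry
CYCLES_PER_MOVE = 4
UPDATES_PER_SECOND = 3000
CLOCK_HZ = 100_000_000       # 100 MHz, i.e. a 10 nsec cycle

cycles_per_update = MOVES_WORST_CASE * CYCLES_PER_MOVE        # 124
cycles_per_second = UPDATES_PER_SECOND * cycles_per_update    # 372000
busy_microseconds = cycles_per_second * 1_000_000 // CLOCK_HZ # 3720 usec = 3.72 msec

def packets_affected(packets_per_second):
    """Packets arriving while the table is busy with updates, per second."""
    return packets_per_second * cycles_per_second // CLOCK_HZ
```

At 20 and 30 million packets per second this gives 74,400 and 111,600 affected packets per second, matching the range quoted on the slide.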
Reproaches against CAM based
search engines (POWER)
 There is a misconception that the power consumption of CAMs is increasing!
 It does not make sense to compare the power consumption of a 2Mb CAM
clocked at 66 MHz and capable of 66 Msps with that of a 9Mb CAM clocked at 150
MHz and capable of 125 Msps.
 Power consumption is result of multiple factors such as:
 Semiconductor manufacturing process.
 Number of searches per second.
 Storage density.
 The smaller the process, the larger the capacity; a smaller process also allows a lower
supply voltage and a higher clock rate.
 A 0.18μ process uses 50% less power than 0.25μ, with a further 30% improvement
in 0.15μ.
 The absolute power consumption is increasing, because:
 Larger table.
 Wider search key for deep packet classification.
 Increased wire speed.
 Make sure to consider worst case scenarios not the data sheet values.
Reproaches against CAM based search engines
 Table maintenance and management is a software-related problem.
 The third port (Synchronous Maintenance Interface [SMI]) on
SiberCore CAMs is an interesting way of doing table
maintenance without affecting the ongoing search processes.
 Sort-free CAMs do not need partitioning.
 Density and footprint (Not a real issue)
example:
 The three members in the family, the
CYNSE10512, 10256, and 10128, provide
address tables of 512k, 256k, and 128k
entries (18 Mbits, 9 Mbits, and 4.5 Mbits),
respectively.
 All three devices are housed in 388-contact
BGA packages.
 Price: $75, $135, $275
 1,000,000 entry IPV4 can be handled in two
18Mbits CAM.
Reproaches against CAM based search engines
 Inflexibility with Table Configurations:
 This is a real issue
 Some applications need flexible table sizes and widths.
 More research and development needed.
 Price
 In absolute terms they are expensive.
 They are sophisticated complex products that are
indispensable in most designs. So they should be expensive!