Bluespec- 3 The IP Lookup Problem The IP lookup problem

advertisement
L19- 1
Bluespec- 3
The IP Lookup Problem
Arvind
Laboratory for Computer Science
M.I.T.
Lecture 19
http://www.csg.lcs.mit.edu/6.827
L19-2
Arvind
The IP lookup problem
• An IP lookup table contains IP prefixes
and associated data
• The problem: given an IP address, return
the data associated with the longest
prefix match ("LPM")
Example
Table
Prefix
Data
IP address
7.14.*.*
A
7.13.7.3
F
7.14.7.3
B
10.7.12.15
F
10.18.200.*
C
10.18.201.5
F
10.18.200.5
D
7.14.7.2
5.*.*.*
E
5.13.7.2
*
F
8.0.0.0
Result
Example
lookups
10.18.200.7
http://www.csg.lcs.mit.edu/6.827
1
L19-3
Arvind
F
F
…
…
F
F
F
C
F
10
255
IP address
A
…
F
A
7
Result
M Ref
7.13.7.3
F
2
10.18.201.5
F
3
7.14.7.2
A
5.13.7.2
E
10.18.200.7
C
18
200
5
D
…
*
…
E
F
B
…
5.*.*.*
3
7
A
…
D
F
A
…
10.18.200.5
…
C
14
E
…
10.18.200.*
5
F
…
B
…
7.14.7.3
F
…
A
…
0
7.14.*.*
…
Sparse tree representation
C
http://www.csg.lcs.mit.edu/
L19-4
Arvind
Table representation issues
• LPM is used for CIDR (Classless InterDomain Routing)
• Number of memory accesses for an LPM?
– Too many
difficult to do LPMs at line rate
• Table size?
– Too big
bigger SRAM
more latency, cost, power
• Control- plane issues:
– incremental table update
– size, speed of table maintenance software
• In this lecture (so code will fit on slides!):
– Level 1: 16 bits, Levels 2 and 3: 8 bits
– So: from 1 to 3 memory accesses for an LPM
http://www.csg.lcs.mit.edu/6.827
2
L19-5
Arvind
Outline
• Example: IP Lookup √
• Three solutions ⇐
– Statically scheduled memory pipeline
– Straight pipeline with uncoordinated memory
references
– Circular pipeline for 100% memory utilization
• Modeling RAMs
– Synchronous vs. Asynchonous view
– Port replicator
• Bluespec coding for straight pipeline
• Bluespec coding for circular pipeline
• Phase 1 compilation
http://www.csg.lcs.mit.edu/6.827
L19-6
Arvind
Static scheduling solution
• Assume the SRAM containing the table
has latency of n cycles, lay out a pipeline
so that the memory accesses are
precisely scheduled
• Issues:
– Since an LPM may take 1- 3 mem accesses, unused
slots may be left idle
– May have to replan the pipeline for a different
latency memory
– Very difficult to plan if memory is also to be used
for some unrelated task.
http://www.csg.lcs.mit.edu/6.827
3
L19-7
Arvind
LPM: Straight Pipeline Solution
ROM
IP Address Table
Replicator
Replicator
rom0
start
lookup 1
rom1
rom2
finish lookup 1,
(start lookup 2)
fifo0
finish lookup 2,
(start lookup 3)
fifo1
IP addr
18.100.32.127
finish
lookup 3
ofifo
fifo2
Lookup Result
(Egress port, …)
http://www.csg.lcs.mit.edu/6.827
L19-8
Arvind
Circular pipeline solution
getToken
Completion
buffer
getTokenAck
luResp
luRespAck
done
luReq
tf
Enter
Enter
Move
Move
RAM
ops
leaf
node
Complete
Complete
Recirc
Recirc
buf
http://www.csg.lcs.mit.edu/6.827
4
L19-9
Arvind
Outline
• Example: IP Lookup √
• Three solutions √
– Statically scheduled memory pipeline
– Straight pipeline with uncoordinated memory
references
– Circular pipeline for 100% memory utilization
• Modeling RAMs ⇐
– Synchronous vs. Asynchonous view
– Port replicator
• Bluespec coding for straight pipeline
• Bluespec coding for circular pipeline
• Phase 1 compilation
http://www.csg.lcs.mit.edu/6.827
e
L19-10
Arvind
RAMs
• Basic memory components are "synchronous":
– Present an read- address AJ on clock J
– Data DJ arrives on clock J+N
– If you don't "catch" DJ on clock J+N, it may be lost, i.e., data
D J+1 may arrive on clock J+1+N
Clock
Addr
Synch Mem,
Latency N
Data
• This kind of synchronicity can pervade the
design and cause complications
http://www.csg.lcs.mit.edu/6.827
5
L19-11
Arvind
Asynchronous RAMs
It's easier to work with an "asynchronous" block:
Ready
( ctr > 0)
ctr
ctr++
deq
Data
Ready
ctr-­
Addr
Ack
Synch Mem
Latency N
Data
http://www.csg.lcs.mit.edu/6.827
L19-12
Arvind
RAMs: Synchronous vs Asynchronous
• The asynch mem has interface:
interface AsyncROM addr data =
read
:: addr -> Action
result :: data
:: Action
ack
• A synch mem can be converted into an
asynch mem with a Bluespec function:
syncToAsync :: SyncROM lat addr data ->
AsyncROM addr data
http://www.csg.lcs.mit.edu/6.827
6
L19-13
Arvind
A common subproblem:
port replicator
...
e.g., SRAM, IP lookup
x
y
Replicator
Replicator
• Given a base module s.t.:
– It generates response y to a particular request
x after some number of cycles
– It can accept requests and produce responses
on every cycle
• Construct a mechanism so that multiple
client modules can utilize the base
module with full utilization
http://www.csg.lcs.mit.edu/6.827
L19-14
Arvind
A general port replicator
base module
...
x
replicator
forward x,
enq tag
y
when tag matches,
forward y, deq tag
shared
tag FIFO
x
y
Client 1
x
y
Client 2
http://www.csg.lcs.mit.edu/6.827
7
L19-15
Arvind
Port replicator code
mk3ROMports :: AsyncROM lat Adr Dta ->
Module (AsyncROM (lat+1) Adr Dta,
AsyncROM (lat+1) Adr Dta,
AsyncROM (lat+1) Adr Dta)
mk3ROMports rom =
module
tags :: FIFO Tag <- mkSizedFIFO lat
let mkPort :: Tag -> Module (AsyncROM (lat+1)
mkPort i =
module
...
not quite legal
Adr
Dta)
port0 <- mkPort 0
port1 <- mkPort 1
port2 <- mkPort 2
interface (port0, port1, port2)
http://www.csg.lcs.mit.edu/6.827
L19-16
Arvind
Port replicator code
cont.
mkPort :: Tag -> Module (AsyncROM (lat+1)
mkPort i =
module
out :: FIFO Dta <- mkSizedFIFO lat
rules
when tags.first == i
==> action tags.deq
rom.ack
out.enq rom.result
interface
read a = action rom.read a
tags.enq i
result = out.first
ack
= out.deq
Adr
Dta)
Can one port’s activity block another port’s activity?
http://www.csg.lcs.mit.edu/6.827
8
L19-17
Arvind
A general port replicator
base module
...
x
replicator
fixed
y
forward x,
enq tag
when tag matches,
forward y, deq tag
cnt
shared
tag FIFO
cnt
cnt-­
cnt > 0
ready
x
y
cnt++
ack
cnt-­
deq
cnt > 0
ready
Client 1
x
deq
cnt++
ack
y
Client 2
http://www.csg.lcs.mit.edu/6.827
L19-18
Arvind
Port replicator code
fixed
mkPort :: Tag -> Module (AsyncROM (lat+1) Adr Dta)
mkPort i =
module
out :: FIFO Dta <- mkSizedFIFO lat
cnt :: Counter (log (lat+1)) <- mkCounter lat
rules
when tags.first == i
==> action tags.deq
rom.ack
out.enq rom.result
interface
read a = action rom.read a
tags.enq i
cnt.down
when cnt.value > 0
result = out.first
ack
= action out.deq
cnt.up
http://www.csg.lcs.mit.edu/6.827
9
L19-19
Arvind
Some observations
• Port replicator can be easily generalized
to n ports
• Port rules conflict with each other but
serializable semantics preserves
correctness
http://www.csg.lcs.mit.edu/6.827
L19-20
Arvind
Outline
• Example: IP Lookup √
• Three solutions √
– Statically scheduled memory pipeline
– Straight pipeline with uncoordinated memory
references
– Circular pipeline for 100% memory utilization
• Modeling RAMs √
– Synchronous vs. Asynchonous view
– Port replicator
• Bluespec coding for straight pipeline ⇐
• Bluespec coding for circular pipeline
• Phase 1 compilation
http://www.csg.lcs.mit.edu/6.827
10
L19-21
Arvind
Bluespec code: Straight pipeline
data Mid = Lookup IPaddr | Done LuResult
mkLPM :: AsyncROM lat LuAddr LuData -> Module LPM
mkLPM rom =
module
(rom0, rom1, rom2) <- mk3ROMports rom
fifo0
fifo1
fifo2
ofifo
::
::
::
::
FIFO
FIFO
FIFO
FIFO
Mid <- mkFIFO
Mid <- mkFIFO
Mid <- mkFIFO
LuResult <- mkFIFO
rules
... for Stages 1, 2 and Completion ...
interface
-- Stage 0
luReq ipa = action
rom0.read (zeroExtend ipa[31:16])
fifo0.enq (Lookup (ipa << 16))
luResp
= ofifo.first
luRespAck = ofifo.deq
http://www.csg.lcs.mit.edu/6.827
L19-22
Arvind
Straight pipeline
cont.
data Mid = Lookup IPaddr | Done LuResult
mkLPM rom =
module
... state is rom0, rom1, rom2, fifo0, fifo1, fifo2, ofifo
rules
-- Stage 1: lookup, leaf
when Lookup ipa <- fifo0.first,
Leaf res <- rom0.result
==> action fifo0.deq
rom0.ack
fifo1.enq (Done res)
-- Stage 1: lookup, node
when Lookup ipa <- fifo0.first,
Node res <- rom0.result
==> action fifo0.deq
rom0.ack
rom1.read (addr+(zeroExt ipa[31:24]))
fifo1.enq (Lookup (ipa << 8))
interface
...
http://www.csg.lcs.mit.edu/6.827
11
L19-23
Arvind
Outline
• Example: IP Lookup √
• Three solutions √
– Statically scheduled memory pipeline
– Straight pipeline with uncoordinated memory
references
– Circular pipeline for 100% memory utilization
• Modeling RAMs √
– Synchronous vs. Asynchonous view
– Port replicator
• Bluespec coding for straight pipeline √
• Bluespec coding for circular pipeline ⇐
• Phase 1 compilation
http://www.csg.lcs.mit.edu/6.827
L19-24
Arvind
Bluespec code: Circular pipeline
luReq
tf
getToken
Enter
Enter
Move
Move
RAM
ops
leaf
no
de
Complete
Complete
Recirc
Recirc
done
cb
luResp
buf
mkLPMK :: AsyncROM lat LuAddr LuData -> Module LPM
mkLPMK rom =
module
cb :: CBuffer NStage LuResult <- mkCBuffer
tf :: FIFO (CBToken NStage, IPaddr) <- mkFIFO
ops :: FIFO (CBToken NStage, IPaddr) <- mkSizedFIFO (lat+1)
buf :: FIFO ((CBToken NStage, IPaddr), LuAddr)
<- mkSizedFIFO (NStage+1)
rules
... rules for Entering, Completion, Recirculation & Movement...
interface
luReq ipa = action tf.enq (cb.getToken, ipa)
cb.getTokenAck
luResp
= cb.get
luRespAck = cb.ack
http://www.csg.lcs.mit.edu/6.827
12
L19-25
Arvind
Circular pipeline rules
luReq
tf
getToken
Enter
Enter
Move
Move
RAM
leaf
no
de
Complete
Complete
Recirc
Recirc
ops
done
cb
luResp
buf
“Enter”:
when (tok, ipa) <- tf.first
==> action
tf.deq
rom.read (zeroExt ipa[31:16])
ops.enq (tok, ipa << 16)
http://www.csg.lcs.mit.edu/6.827
L19-26
Arvind
Circular pipeline rules Continued
luReq
tf
getToken
Enter
Enter
Move
Move
RAM
ops
leaf
no
de
Complete
Complete
Recirc
Recirc
done
cb
luResp
buf
“Complete”:
when (Leaf res) <- rom.result,
(tok, ipa) <- ops.first
==> action
rom.ack
ops.deq
cb.done tok.res
“Recirculate”:
when (Node addr) <- rom.result,
(tok, ipa) <- ops.first
==> action
rom.ack
ops.deq
buf.enq ((tok, ipa << 8),
addr+(zeroExt ipa[31:24])
http://www.csg.lcs.mit.edu/6.827
13
L19-27
Arvind
Circular pipeline rules Continued- 2
luReq
tf
getToken
Enter
Enter
Move
Move
RAM
ops
leaf
no
de
Complete
Complete
Recirc
Recirc
done
cb
luResp
buf
“Move”:
when (tokipa, addr) <- buf.first
==> action
buf.deq
rom.read addr
ops.enq tokipa
“Enter”:
notice the
when (tok, ipa) <- tf.first
conflicts!
==> action
tf.deq
rom.read (zeroExt ipa[31:16])
ops.enq (tok, ipa << 16)
http://www.csg.lcs.mit.edu/6.827
L19-28
Arvind
Some observations
• Timing Closure
– Insert more pipeline stages (i.e. FIFO buffers)!
• Circular pipeline solution trivially extends
to IPv6
http://www.csg.lcs.mit.edu/6.827
14
L19-29
Arvind
Outline
• Example: IP Lookup √
• Three solutions √
– Statically scheduled memory pipeline
– Straight pipeline with uncoordinated memory
references
– Circular pipeline for 100% memory utilization
• Modeling RAMs √
– Synchronous vs. Asynchonous view
– Port replicator
• Bluespec coding for straight pipeline √
• Bluespec coding for circular pipeline √
• Phase 1 compilation ⇐
http://www.csg.lcs.mit.edu/6.827
L19-30
Arvind
LPM code structure
mkLPM rom =
module
(rom0, rom1, rom2) <- mk3ROMports rom
fifo0 <- mkFIFO
fifo1 <- mkFIFO
fifo2 <- mkFIFO
Free variables of the rule
ofifo <- mkFIFO
rules
RuleStage1Leaf(fifo0, fifo1, rom0)
RuleStage1Node(fifo0, fifo1, rom0, rom1)
RuleStage2Noop(fifo1, fifo2)
RuleStage2Leaf(fifo1, fifo2, rom1)
RuleStage2Node(fifo1, fifo2, rom1, rom2)
RuleCompletionNoop(fifo2, ofifo)
RuleCompletionLeaf(fifo2, ofifo, rom2)
RuleCompletionNode(fifo2, ofifo, rom2)
interface
luReq = EluReq(fifo0, rom0)
luResp = EluResp(ofifo)
luRespAck = EluRespAck(ofifo)
http://www.csg.lcs.mit.
edu/6.827
15
L19-31
Arvind
Port replicator code structure
mk3ROMports rom =
module
tags <- mkSizedFIFO
let
mkPort i =
module
out <- mkSizedFIFO
cnt <- mkCounter
rules
RuleTags(i, rom, tags, out)
interface
read = Eread(i, rom, tags, cnt)
result = Eresult(out)
ack = Eack(out, cnt)
substitute
port0 <- mkPort 0
port1 <- mkPort 1
port2 <- mkPort 2
interface (port0, port1, port2)
http://www.csg.lcs.mit.edu/6.827
L19-32
Arvind
Port replicator – after step 1
Step 2:
Flatten
the
module
mk3ROMports rom =
module
tags <- mkSizedFIFO
port0 <module
out <- mkSizedFIFO
cnt <- mkCounter
rules
RuleTags(0, rom, tags, out)
interface
read = Eread(0, rom, tags, cnt)
result = Eresult(out)
ack = Eack(out, cnt)
port1 <- ...similarly...
port2 <- ...similarly...
interface (port0, port1, port2)
http://www.csg.lcs.mit.edu/6.827
16
L19-33
Arvind
Port replicator – after step 2
mk3ROMports rom =
module
tags <- mkSizedFIFO
out0 <- mkSizedFIFO
cnt0 <- mkCounter
rules
RuleTags(0, rom, tags, out0)
let port0 = interface
read = Eread(0, rom, tags, cnt0)
result = Eresult(out0)
ack = Eack(out0, cnt0)
port1 <- ...similarly...
port2 <- ...similarly...
interface (port0, port1, port2)
http://www.csg.lcs.mit.edu/6.827
L19-34
Arvind
Port replicator – final step
mk3ROMports rom =
module
tags <- mkSizedFIFO
out0 <- mkSizedFIFO ; cnt0 <- mkCounter
out1 <- mkSizedFIFO ; cnt1 <- mkCounter
out2 <- mkSizedFIFO ; cnt2 <- mkCounter
rules
RuleTags(0, rom, tags, out0)
RuleTags(1, rom, tags, out1)
RuleTags(2, rom, tags, out2)
let port0 = interface
read = Eread(0, rom, tags, cnt0)
Next step:
result = Eresult(out0)
substitute
ack = Eack(out0, cnt0)
mk3ROMports port1 = interface
read = Eread(1, rom, tags, cnt1)
into mkLPM
...
port2 = interface ...
interface (port0, port1, port2)
http://www.csg.lcs.mit.edu/6.827
17
L19-35
Arvind
Port replicator call
(rom0, rom1, rom2) <- mk3ROMports rom
tags <- mkSizedFIFO
out0 <- mkSizedFIFO ; cnt0 <- mkCounter
out1 <- mkSizedFIFO ; cnt1 <- mkCounter
out2 <- mkSizedFIFO ; cnt2 <- mkCounter
rules
RuleTags(0, rom, tags, out0)
RuleTags(1, rom, tags, out1)
RuleTags(2, rom, tags, out2)
let port0 = interface
read = Eread(0, rom, tags, cnt0)
result = Eresult(out0)
substitutue
ack = Eack(out0, cnt0)
for ports
port1 = interface ...
next
port2 = interface ...
(rom0, rom1, rom2) = (port0, port1, port2)
http://www.csg.lcs.mit.edu/6.827
After Port replicator call
susbtitution
L19-36
Arvind
(rom0, rom1, rom2) <- mk3ROMports rom
tags <- mkSizedFIFO
out0 <- mkSizedFIFO ; cnt0 <- mkCounter
out1 <- mkSizedFIFO ; cnt1 <- mkCounter
out2 <- mkSizedFIFO ; cnt2 <- mkCounter
rules
RuleTags(0, rom, tags, out0)
RuleTags(1, rom, tags, out1)
RuleTags(2, rom, tags, out2)
let rom0 = interface
read = Eread(0, rom, tags, cnt0)
result = Eresult(out0)
ack = Eack(out0, cnt0)
rom1 = interface ...
rom2 = interface ...
http://www.csg.lcs.mit.edu/6.827
18
L19-37
Arvind
LPM code after flattening
mkLPM rom =
module
tags <- mkSizedFIFO;
out0 <- mkSizedFIFO; cnt0 <- mkCounter;
out1 <- mkSizedFIFO; cnt1 <- mkCounter;
out2 <- mkSizedFIFO; cnt2 <- mkCounter;
fifo0 <- mkFIFO; fifo1 <- mkFIFO; fifo2 <- mkFIFO;
ofifo <- mkFIFO;
rules
RuleTags(0, rom, tags, out0)...
let rom0 = interface
read = Eread(0, rom, tags, cnt0)
result = Eresult(out0)
ack = Eack(out0, cnt0)
rom1 = interface ... ; rom2 = interface ...
RuleStage1Leaf(fifo0, fifo1, rom0)...
interface
luReq = EluReq(fifo0, rom0)
luResp = EluResp(ofifo)
luRespAck = EluRespAck(ofifo)
http://www.csg.lcs.mit.edu/6.827
19
Download