L19-1

Bluespec-3: The IP Lookup Problem

Arvind
Laboratory for Computer Science
M.I.T.

Lecture 19
http://www.csg.lcs.mit.edu/6.827


L19-2  The IP lookup problem

• An IP lookup table contains IP prefixes and associated data
• The problem: given an IP address, return the data associated with the longest prefix match ("LPM")

Example table:

    Prefix         Data
    7.14.*.*       A
    7.14.7.3       B
    10.18.200.*    C
    10.18.200.5    D
    5.*.*.*        E
    *              F

Example lookups:

    IP address     Result
    7.13.7.3       F
    10.7.12.15     F
    10.18.201.5    F
    7.14.7.2       A
    5.13.7.2       E
    8.0.0.0        F
    10.18.200.7    C


L19-3  Sparse tree representation

[Figure: the example table drawn as a sparse tree. Each node is indexed 0…255 by one byte of the IP address, and each entry holds either data (A to F) or a pointer to a child node.]

    IP address     Result   Mem refs
    7.13.7.3       F        2
    10.18.201.5    F        3
    7.14.7.2       A
    5.13.7.2       E
    10.18.200.7    C


L19-4  Table representation issues

• LPM is used for CIDR (Classless InterDomain Routing)
• Number of memory accesses for an LPM?
  – Too many ⇒ difficult to do LPMs at line rate
• Table size?
  – Too big ⇒ bigger SRAM ⇒ more latency, cost, power
• Control-plane issues:
  – incremental table update
  – size, speed of table maintenance software
• In this lecture (so code will fit on slides!):
  – Level 1: 16 bits, Levels 2 and 3: 8 bits
  – So: from 1 to 3 memory accesses for an LPM


L19-5  Outline

• Example: IP Lookup √
• Three solutions ⇐
  – Statically scheduled memory pipeline
  – Straight pipeline with uncoordinated memory references
  – Circular pipeline for 100% memory utilization
• Modeling RAMs
  – Synchronous vs. Asynchronous view
  – Port replicator
• Bluespec coding for straight pipeline
• Bluespec coding for circular pipeline
• Phase 1 compilation


L19-6  Static scheduling solution

• Assume the SRAM containing the table has a latency of n cycles; lay out a pipeline so that the memory accesses are precisely scheduled
• Issues:
  – Since an LPM may take 1 to 3 memory accesses, unused slots may be left idle
  – May have to replan the pipeline for a memory with a different latency
  – Very difficult to plan if the memory is also to be used for some unrelated task


L19-7  LPM: Straight pipeline solution

[Figure: the IP address table ROM is shared through a port replicator as rom0, rom1, rom2. An IP address (e.g. 18.100.32.127) enters at "start lookup 1" and flows through fifo0 → "finish lookup 1 (start lookup 2)" → fifo1 → "finish lookup 2 (start lookup 3)" → fifo2 → "finish lookup 3" → ofifo, which delivers the lookup result (egress port, …).]


L19-8  Circular pipeline solution

[Figure: luReq obtains a token from the completion buffer (getToken, getTokenAck) and enqueues into tf. The Enter and Move rules issue RAM reads and enqueue into ops. On a leaf result, Complete signals done to the completion buffer, which serves luResp and luRespAck. On a node result, Recirc enqueues into buf, which feeds Move.]


L19-9  Outline (as on L19-5)

Done √: Example: IP Lookup; Three solutions. Next ⇐: Modeling RAMs.


L19-10  RAMs

• Basic memory components are "synchronous":
  – Present a read address A_J on clock J
  – Data D_J arrives on clock J+N
  – If you don't "catch" D_J on clock J+N, it may be lost, i.e., data D_(J+1) may arrive on clock J+1+N

[Figure: synchronous memory with latency N; inputs Clock and Addr, output Data.]

• This kind of synchronicity can pervade the design and cause complications


L19-11  Asynchronous RAMs

It's easier to work with an "asynchronous" block:

[Figure: a synchronous memory of latency N wrapped with an output FIFO and a counter ctr. Addr is accepted only when Ready (ctr > 0) and does ctr--. Data Ready signals that Data is available; Ack does deq and ctr++.]


L19-12  RAMs: Synchronous vs. Asynchronous

• The asynch mem has interface:

    interface AsyncROM addr data =
        read   :: addr -> Action
        result :: data
        ack    :: Action

• A synch mem can be converted into an asynch mem with a Bluespec function:

    syncToAsync :: SyncROM lat addr data -> AsyncROM addr data


L19-13  A common subproblem: port replicator

[Figure: a base module (e.g., SRAM, IP lookup) taking request x and producing response y, shared by several clients through a replicator.]

• Given a base module s.t.:
  – It generates response y to a particular request x after some number of cycles
  – It can accept requests and produce responses on every cycle
• Construct a mechanism so that multiple client modules can share the base module at full utilization


L19-14  A general port replicator

[Figure: the base module sits behind a replicator with a shared tag FIFO; Client 1 and Client 2 each have an x/y port. On a request: forward x, enq tag. On a response: when the tag matches, forward y, deq tag.]


L19-15  Port replicator code

    mk3ROMports :: AsyncROM lat Adr Dta ->
                   Module (AsyncROM (lat+1) Adr Dta,
                           AsyncROM (lat+1) Adr Dta,
                           AsyncROM (lat+1) Adr Dta)
    mk3ROMports rom =
      module
        tags :: FIFO Tag <- mkSizedFIFO lat
        let mkPort :: Tag -> Module (AsyncROM (lat+1) Adr Dta)
            mkPort i = module ...
        port0 <- mkPort 0
        port1 <- mkPort 1
        port2 <- mkPort 2
        interface (port0, port1, port2)

Slide annotation: "not quite legal".


L19-16  Port replicator code cont.

    mkPort :: Tag -> Module (AsyncROM (lat+1) Adr Dta)
    mkPort i =
      module
        out :: FIFO Dta <- mkSizedFIFO lat
        rules
          when tags.first == i
            ==> action tags.deq
                       rom.ack
                       out.enq rom.result
        interface
          read a = action rom.read a
                          tags.enq i
          result = out.first
          ack    = out.deq

Can one port's activity block another port's activity?


L19-17  A general port replicator, fixed

[Figure: as on L19-14, with a counter cnt added to each client port. A request is ready only when cnt > 0 and does cnt--; ack does deq and cnt++.]


L19-18  Port replicator code, fixed

    mkPort :: Tag -> Module (AsyncROM (lat+1) Adr Dta)
    mkPort i =
      module
        out :: FIFO Dta <- mkSizedFIFO lat
        cnt :: Counter (log (lat+1)) <- mkCounter lat
        rules
          when tags.first == i
            ==> action tags.deq
                       rom.ack
                       out.enq rom.result
        interface
          read a = action rom.read a
                          tags.enq i
                          cnt.down
                   when cnt.value > 0
          result = out.first
          ack    = action out.deq
                          cnt.up


L19-19  Some observations

• The port replicator can be easily generalized to n ports
• Port rules conflict with each other, but serializable semantics preserves correctness


L19-20  Outline (as on L19-5)

Done √: Example: IP Lookup; Three solutions; Modeling RAMs. Next ⇐: Bluespec coding for straight pipeline.


L19-21  Bluespec code: Straight pipeline

    data Mid = Lookup IPaddr | Done LuResult

    mkLPM :: AsyncROM lat LuAddr LuData -> Module LPM
    mkLPM rom =
      module
        (rom0, rom1, rom2) <- mk3ROMports rom
        fifo0 :: FIFO Mid      <- mkFIFO
        fifo1 :: FIFO Mid      <- mkFIFO
        fifo2 :: FIFO Mid      <- mkFIFO
        ofifo :: FIFO LuResult <- mkFIFO
        rules
          ... for Stages 1, 2 and Completion ...
        interface
          -- Stage 0
          luReq ipa = action rom0.read (zeroExtend ipa[31:16])
                             fifo0.enq (Lookup (ipa << 16))
          luResp    = ofifo.first
          luRespAck = ofifo.deq


L19-22  Straight pipeline cont.

    data Mid = Lookup IPaddr | Done LuResult

    mkLPM rom =
      module
        ... state is rom0, rom1, rom2, fifo0, fifo1, fifo2, ofifo
        rules
          -- Stage 1: lookup, leaf
          when Lookup ipa <- fifo0.first, Leaf res <- rom0.result
            ==> action fifo0.deq
                       rom0.ack
                       fifo1.enq (Done res)
          -- Stage 1: lookup, node
          when Lookup ipa <- fifo0.first, Node addr <- rom0.result
            ==> action fifo0.deq
                       rom0.ack
                       rom1.read (addr + (zeroExtend ipa[31:24]))
                       fifo1.enq (Lookup (ipa << 8))
        interface ...


L19-23  Outline (as on L19-5)

Done √: Example: IP Lookup; Three solutions; Modeling RAMs; Bluespec coding for straight pipeline. Next ⇐: Bluespec coding for circular pipeline.


L19-24  Bluespec code: Circular pipeline

[Figure: circular pipeline, as on L19-8.]

    mkLPMK :: AsyncROM lat LuAddr LuData -> Module LPM
    mkLPMK rom =
      module
        cb  :: CBuffer NStage LuResult <- mkCBuffer
        tf  :: FIFO (CBToken NStage, IPaddr) <- mkFIFO
        ops :: FIFO (CBToken NStage, IPaddr) <- mkSizedFIFO (lat+1)
        buf :: FIFO ((CBToken NStage, IPaddr), LuAddr) <- mkSizedFIFO (NStage+1)
        rules
          ... rules for Entering, Completion, Recirculation & Movement ...
        interface
          luReq ipa = action tf.enq (cb.getToken, ipa)
                             cb.getTokenAck
          luResp    = cb.get
          luRespAck = cb.ack


L19-25  Circular pipeline rules

    "Enter":
      when (tok, ipa) <- tf.first
        ==> action tf.deq
                   rom.read (zeroExtend ipa[31:16])
                   ops.enq (tok, ipa << 16)


L19-26  Circular pipeline rules, continued

    "Complete":
      when (Leaf res) <- rom.result, (tok, ipa) <- ops.first
        ==> action rom.ack
                   ops.deq
                   cb.done tok res

    "Recirculate":
      when (Node addr) <- rom.result, (tok, ipa) <- ops.first
        ==> action rom.ack
                   ops.deq
                   buf.enq ((tok, ipa << 8), addr + (zeroExtend ipa[31:24]))


L19-27  Circular pipeline rules, continued (2)

    "Move":
      when (tokipa, addr) <- buf.first
        ==> action buf.deq
                   rom.read addr
                   ops.enq tokipa

    "Enter":
      when (tok, ipa) <- tf.first
        ==> action tf.deq
                   rom.read (zeroExtend ipa[31:16])
                   ops.enq (tok, ipa << 16)

Notice the conflicts! (Both rules call rom.read and ops.enq.)


L19-28  Some observations

• Timing closure
  – Insert more pipeline stages (i.e., FIFO buffers)!
• The circular pipeline solution trivially extends to IPv6


L19-29  Outline (as on L19-5)

Done √: Example: IP Lookup; Three solutions; Modeling RAMs; Bluespec coding for straight pipeline; Bluespec coding for circular pipeline. Next ⇐: Phase 1 compilation.


L19-30  LPM code structure

In the listing, each rule or interface expression is written with its free variables as arguments.

    mkLPM rom =
      module
        (rom0, rom1, rom2) <- mk3ROMports rom
        fifo0 <- mkFIFO
        fifo1 <- mkFIFO
        fifo2 <- mkFIFO
        ofifo <- mkFIFO
        rules
          RuleStage1Leaf(fifo0, fifo1, rom0)
          RuleStage1Node(fifo0, fifo1, rom0, rom1)
          RuleStage2Noop(fifo1, fifo2)
          RuleStage2Leaf(fifo1, fifo2, rom1)
          RuleStage2Node(fifo1, fifo2, rom1, rom2)
          RuleCompletionNoop(fifo2, ofifo)
          RuleCompletionLeaf(fifo2, ofifo, rom2)
          RuleCompletionNode(fifo2, ofifo, rom2)
        interface
          luReq     = EluReq(fifo0, rom0)
          luResp    = EluResp(ofifo)
          luRespAck = EluRespAck(ofifo)


L19-31  Port replicator code structure

    mk3ROMports rom =
      module
        tags <- mkSizedFIFO
        let mkPort i =
              module
                out <- mkSizedFIFO
                cnt <- mkCounter
                rules
                  RuleTags(i, rom, tags, out)
                interface
                  read   = Eread(i, rom, tags, cnt)
                  result = Eresult(out)
                  ack    = Eack(out, cnt)
        port0 <- mkPort 0
        port1 <- mkPort 1
        port2 <- mkPort 2
        interface (port0, port1, port2)

Step 1: substitute mkPort at its three uses.


L19-32  Port replicator – after step 1

    mk3ROMports rom =
      module
        tags <- mkSizedFIFO
        port0 <- module
                   out <- mkSizedFIFO
                   cnt <- mkCounter
                   rules
                     RuleTags(0, rom, tags, out)
                   interface
                     read   = Eread(0, rom, tags, cnt)
                     result = Eresult(out)
                     ack    = Eack(out, cnt)
        port1 <- ...similarly...
        port2 <- ...similarly...
        interface (port0, port1, port2)

Step 2: flatten the module.


L19-33  Port replicator – after step 2

    mk3ROMports rom =
      module
        tags <- mkSizedFIFO
        out0 <- mkSizedFIFO
        cnt0 <- mkCounter
        rules
          RuleTags(0, rom, tags, out0)
        let port0 = interface
                      read   = Eread(0, rom, tags, cnt0)
                      result = Eresult(out0)
                      ack    = Eack(out0, cnt0)
        port1 <- ...similarly...
        port2 <- ...similarly...
        interface (port0, port1, port2)


L19-34  Port replicator – final step

    mk3ROMports rom =
      module
        tags <- mkSizedFIFO
        out0 <- mkSizedFIFO ; cnt0 <- mkCounter
        out1 <- mkSizedFIFO ; cnt1 <- mkCounter
        out2 <- mkSizedFIFO ; cnt2 <- mkCounter
        rules
          RuleTags(0, rom, tags, out0)
          RuleTags(1, rom, tags, out1)
          RuleTags(2, rom, tags, out2)
        let port0 = interface
                      read   = Eread(0, rom, tags, cnt0)
                      result = Eresult(out0)
                      ack    = Eack(out0, cnt0)
            port1 = interface
                      read   = Eread(1, rom, tags, cnt1)
                      ...
            port2 = interface ...
        interface (port0, port1, port2)

Next step: substitute mk3ROMports into mkLPM.


L19-35  Port replicator call

    (rom0, rom1, rom2) <- mk3ROMports rom

becomes

    tags <- mkSizedFIFO
    out0 <- mkSizedFIFO ; cnt0 <- mkCounter
    out1 <- mkSizedFIFO ; cnt1 <- mkCounter
    out2 <- mkSizedFIFO ; cnt2 <- mkCounter
    rules
      RuleTags(0, rom, tags, out0)
      RuleTags(1, rom, tags, out1)
      RuleTags(2, rom, tags, out2)
    let port0 = interface
                  read   = Eread(0, rom, tags, cnt0)
                  result = Eresult(out0)
                  ack    = Eack(out0, cnt0)
        port1 = interface ...
        port2 = interface ...
    (rom0, rom1, rom2) = (port0, port1, port2)

Next: substitute for the ports.


L19-36  After port replicator call substitution

    (rom0, rom1, rom2) <- mk3ROMports rom

has become

    tags <- mkSizedFIFO
    out0 <- mkSizedFIFO ; cnt0 <- mkCounter
    out1 <- mkSizedFIFO ; cnt1 <- mkCounter
    out2 <- mkSizedFIFO ; cnt2 <- mkCounter
    rules
      RuleTags(0, rom, tags, out0)
      RuleTags(1, rom, tags, out1)
      RuleTags(2, rom, tags, out2)
    let rom0 = interface
                 read   = Eread(0, rom, tags, cnt0)
                 result = Eresult(out0)
                 ack    = Eack(out0, cnt0)
        rom1 = interface ...
        rom2 = interface ...


L19-37  LPM code after flattening

    mkLPM rom =
      module
        tags <- mkSizedFIFO;
        out0 <- mkSizedFIFO; cnt0 <- mkCounter;
        out1 <- mkSizedFIFO; cnt1 <- mkCounter;
        out2 <- mkSizedFIFO; cnt2 <- mkCounter;
        fifo0 <- mkFIFO; fifo1 <- mkFIFO;
        fifo2 <- mkFIFO; ofifo <- mkFIFO;
        rules
          RuleTags(0, rom, tags, out0)...
          let rom0 = interface
                       read   = Eread(0, rom, tags, cnt0)
                       result = Eresult(out0)
                       ack    = Eack(out0, cnt0)
              rom1 = interface ... ; rom2 = interface ...
          RuleStage1Leaf(fifo0, fifo1, rom0)...
        interface
          luReq     = EluReq(fifo0, rom0)
          luResp    = EluResp(ofifo)
          luRespAck = EluRespAck(ofifo)
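
Appendix sketch (not part of the original slides). To cross-check the table walk that the pipelines implement, the lookup can be modeled in software. The following is a minimal Python sketch, assuming the 16/8/8 level split from L19-4 and a flat memory whose words are either a leaf (data) or a node (base address of a child table), mirroring the Leaf/Node cases and the `addr + index` read in the rules on L19-22 and L19-26. The names `ip`, `build_table`, `lookup` and the tuple encoding are hypothetical, chosen only for illustration; the slides do not show how the table is built.

```python
LEAF, NODE = "leaf", "node"
STRIDES = (16, 8, 8)  # level 1: 16 bits; levels 2 and 3: 8 bits (L19-4)


def ip(text):
    """Convert dotted-quad text to a 32-bit integer."""
    a, b, c, d = (int(part) for part in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def build_table(prefixes):
    """Build a flat memory image from (prefix, length, data) triples.

    Addresses 0..65535 hold the level-1 table; child tables are appended.
    Each word is (LEAF, data) or (NODE, base address of the child table).
    """
    mem = [(LEAF, None)] * (1 << STRIDES[0])
    # Insert shorter prefixes first so longer ones refine them.
    for value, length, data in sorted(prefixes, key=lambda p: p[1]):
        base, shift, remaining = 0, 32, length
        for level, stride in enumerate(STRIDES):
            shift -= stride
            index = (value >> shift) & ((1 << stride) - 1)
            if remaining <= stride:
                # Prefix ends in this level: expand over every index it covers.
                span = 1 << (stride - remaining)
                start = index & ~(span - 1)
                for i in range(start, start + span):
                    mem[base + i] = (LEAF, data)
                break
            kind, payload = mem[base + index]
            if kind == LEAF:
                # Push the shorter match down into a new child table.
                child = len(mem)
                mem.extend([(LEAF, payload)] * (1 << STRIDES[level + 1]))
                mem[base + index] = (NODE, child)
                payload = child
            base, remaining = payload, remaining - stride
    return mem


def lookup(mem, addr):
    """Return (data, number of memory accesses) for a 32-bit address."""
    base, shift = 0, 32
    for accesses, stride in enumerate(STRIDES, start=1):
        shift -= stride
        index = (addr >> shift) & ((1 << stride) - 1)
        kind, payload = mem[base + index]
        if kind == LEAF:
            return payload, accesses
        base = payload  # node: payload is the child table's base address
    raise ValueError("malformed table: node found at the last level")


# Example table from L19-2.
TABLE = build_table([
    (ip("7.14.0.0"), 16, "A"),
    (ip("7.14.7.3"), 32, "B"),
    (ip("10.18.200.0"), 24, "C"),
    (ip("10.18.200.5"), 32, "D"),
    (ip("5.0.0.0"), 8, "E"),
    (ip("0.0.0.0"), 0, "F"),
])

for text in ("7.13.7.3", "10.18.201.5", "7.14.7.2", "5.13.7.2", "10.18.200.7"):
    print(text, lookup(TABLE, ip(text)))
```

The access counts in this model follow the 16/8/8 split, so they need not match the memory-reference counts on L19-3, where the tree is drawn with one byte per level.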