CALIFORNIA STATE UNIVERSITY, NORTHRIDGE
IMPLEMENTATION OF MEMORY EFFICIENT IP LOOKUP
ARCHITECTURE
A graduate project submitted in partial fulfillment of the requirements
For the degree of Master of Science in
Electrical Engineering
By
Karthik Venkatesh Malekal
May 2014
The graduate project implemented by Karthik Venkatesh Malekal is approved:
___________________________________          _____________________
Dr. Shahnam Mirzaei                          Date

___________________________________          _____________________
Dr. Somnath Chattopadhyay                    Date

___________________________________          _____________________
Dr. Nagi El Naga, Chair                      Date

California State University, Northridge
ACKNOWLEDGMENTS
This project work would not have been possible without the guidance and the help of
several individuals who contributed and extended their valuable assistance in the
preparation and completion of this project.
Firstly, I would like to extend my most sincere gratitude to Dr. Nagi El Naga for all the
help and support I received from him. He has been an inspiration and a constant driving
force over the course of this project, making all the required resources available to me.
It was an honor for me to have worked on a project under his guidance.
I would also like to thank Dr. Shahnam Mirzaei for taking the time to guide me on several
occasions early on during the implementation of this project; his guidance was extremely
helpful and invaluable. I would like to thank Dr. Somnath Chattopadhyay for his support
and for agreeing to be a member of the project committee.
Most importantly, I would like to thank my parents, Mr. Venkatesh A.K and Mrs. Geetha
Venkatesh for their constant support, prayers and well wishes at every step that I have
taken. I would also like to extend my sincere thanks to my brother, Vinay Malekal and
my sister-in-law, Mithila Malekal for their constant support over the past two years of my
graduate studies.
Lastly, I extend my sincere gratitude to the Department of Electrical and Computer
Engineering at California State University, Northridge and to all professors who have in
one way or the other extended their support and help in the completion of this project.
TABLE OF CONTENTS
Signature Page
Acknowledgments
List of Tables
List of Figures
Abstract
CHAPTER 1: Objective and Project Outline
1.1: Introduction
1.2: Objective
1.3: Project Outline
CHAPTER 2: IP Addressing
2.1: Classful Addressing Scheme
2.2: Classless Inter-Domain Addressing
2.3: Solutions to longest prefix match search process
2.3.1: Path compression techniques
2.3.2: Prefix expansion/Leaf pushing
2.3.3: Using multi-bit tries
2.3.4: Using a combination of leaf pushing and multi-bit tries
CHAPTER 3: Proposals for search process using binary tries
CHAPTER 4: Implementation details of memory efficient pipeline architecture
4.1: Implementation of pipeline structure
4.1.1: The prefix table
4.1.2: The trie structure and mapping
4.2: Implementation of hardware
4.2.1: Data organization in memory blocks
4.2.2: Logical blocks used at each stage
4.2.3: Search process at stage 1
4.2.4: Search process in other stages of the pipeline
CHAPTER 5: Simulation results and analysis
5.1: Simulation analysis
5.2: Implementation analysis
CHAPTER 6: Conclusion
6.1: Project achievements
6.2: Scope for improvements
REFERENCES
APPENDIX A: Data in Memory Modules
APPENDIX B: Implementation codes
LIST OF TABLES
Table 2.1: An example forwarding table
Table 2.2: Prefixes and corresponding nodes in forwarding table
Table 4.1: Prefix/forwarding table showing sample prefix values and their corresponding nodes
Table 4.2: Algorithm to find initial stride
Table 5.1: List of test vectors to test the IP Lookup request outputs
LIST OF FIGURES
Figure 1.1: Block diagram of TCAM based solution for IP lookup
Figure 1.2: Block diagram of SRAM based solution for IP lookup
Figure 2.0: Block diagram of IP packet showing the source and destination IP address
Figure 2.1: IP network address representation
Figure 2.2: Example binary trie structure
Figure 2.3: Example path compressed uni-bit trie
Figure 2.4: Leaf pushed uni-bit trie
Figure 2.5: Multi-bit trie structure with fixed stride
Figure 2.6: Leaf pushed multi-bit trie
Figure 3.1: Pipeline stages with a ring structure
Figure 3.2: Pipeline structure used in CAMP search algorithm
Figure 4.1: Sample trie structure with two subtries and mapping to pipeline
Figure 4.2: Example subtrie with one parent (node A), two children (nodes B and C) and mapping to the pipeline
Figure 4.3: Trie structure corresponding to given prefixes in Table 4.1
Figure 4.4: Leaf-pushed uni-bit trie structure
Figure 4.5: Multi-bit trie with initial stride, i = 2
Figure 4.6: Multi-bit trie with initial stride, i = 3
Figure 4.7: Subtrie with root at node “a” in the multi-bit trie from Figure 4.6
Figure 4.8: Pipeline structure with nodes mapped to its memory modules
Figure 4.9: General structure of the pipelined hardware
Figure 4.10: Input packet format
Figure 4.11: Data packet from memory block
Figure 4.12: Shift register module
Figure 4.13: Stage 1 memory module
Figure 4.13a: Memory module structure
Figure 4.14: Node distance check module
Figure 4.15: Update module
Figure 4.16: Write data packet
Figure 4.17: Block diagram of stage 1 of pipeline
Figure 4.17a: Block diagram of the stage module of a pipeline
Figure 5.1: Simulation output with outputs from each of the 7 stages in the pipeline
Figure 5.2: Simplified simulation output showing only the inputs and final output
Figure 5.3: Simulation outputs showing operation of NDC module
Figure 5.4: Simulation outputs showing write operation
ABSTRACT
IMPLEMENTATION OF MEMORY EFFICIENT IP LOOKUP
ARCHITECTURE
By
Karthik Venkatesh Malekal
Master of Science in Electrical Engineering
The internet today has grown into a vast network of networks: a large number of smaller
networks linked together via communication media and routers, connecting millions of
users. As the communication speeds between these networks have grown rapidly over the
past several years, the demand for faster processing and routing of packets between the
networks has been increasing at a rapid pace. This increased demand on packet routing
can be met by implementing the IP lookup process using some form of tree structure. A
trie-based implementation of IP lookup can be very efficient, as it allows for a highly
pipelined structure to achieve high throughput. But mapping a trie onto the pipeline
stages without a proper structure would result in an unbalanced memory distribution over
the stages of the pipeline.
In this project, a simple and highly effective IP lookup process using a linear pipeline
architecture with a uniform distribution of the trie structure over all the pipeline stages
is presented. This is achieved by using a pipeline of memory modules with additional
logic at each stage to process the data resulting from the search at that stage. The result
is a fast and linear IP lookup operation that can be implemented on an FPGA, with one
lookup request processed every clock cycle.
CHAPTER 1
OBJECTIVE AND PROJECT OUTLINE
1.1 INTRODUCTION
The main function of network routers is to accept the incoming packets and forward them
to their final destinations. For this, the routers on the network will have to decide the
forwarding path for every incoming packet to reach the next network on the route to the
destination network. This forwarding decision made by the routers usually consists of the
IP address of the next-hop router and the outgoing port through which the packet will
have to be forwarded. This information is usually stored in the form of a forwarding table
in every router which the router builds using information collected from the routing
protocols.
The packet’s destination address is used as the key search field during the lookup through
the forwarding table to find the next-hop location for the packet. This process is called
the “Address Lookup”. Once the forwarding information is obtained from the table, the
router can route the packet from its incoming link through the appropriate port on its
output link to the next network. This process is called “Packet Switching”.
With the continued increase in network speeds, called the “Network Link Rates”, the IP
address lookup process in network routers has become a significant bottleneck. With
advances in the networking technologies used to route packets, network link rates are
approaching speeds in the terabits per second range. At these increased speeds, software
based implementations of IP lookup can be very slow and will mostly not be able to keep
up.
Hardware based implementations of the lookup process are more appropriate solutions,
as the search for forwarding information can be made much faster once the overhead of
the software running on the router is removed by a pure hardware implementation. Two
types of hardware implementations are currently available for this process: Ternary
Content Addressable Memory (TCAM) based solutions and Static Random Access
Memory (SRAM) based solutions.
A. Ternary Content Addressable Memory (TCAM) based solutions
Ternary Content Addressable Memory, also called TCAM, is a high speed memory
structure that searches its entire memory for an exact match to the key provided by
the user, in this case the router. Though this implementation of IP lookup can be
fast, producing an output in one clock cycle, its throughput is limited by the low
operating speed of the TCAM. Besides, the power consumption of TCAM structures
is significantly higher, and they do not scale well. All these factors make TCAM
based solutions not very effective in solving the IP lookup problem. The block
diagram of a TCAM based IP lookup solution is presented in Figure 1.1.
Figure 1.1: Block diagram of TCAM based solution for IP lookup
B. Static Random Access Memory (SRAM) based solutions
Static Random Access Memory, also called SRAM, is a strong alternative to the use
of TCAMs for IP lookup. SRAMs usually have very high operating speeds, and their
power requirements are comparatively lower than those of TCAMs. The problem
with SRAM based structures, however, is that they require multiple clock cycles to
complete a single lookup operation. This limitation of the SRAM based architecture
can be overcome by the use of pipelining, which significantly improves its
throughput. An implementation of the IP lookup process using SRAMs is presented
in the block diagram in Figure 1.2.
Figure 1.2: Block diagram of SRAM based solution for IP lookup
Using a binary search process on a tree-like data structure called a “trie”, the IP lookup
process on the pipelined SRAM architecture can be implemented by mapping each trie
level to a particular stage of the pipeline, each stage with its own memory and logic
processing unit to perform the lookup. In this way, by implementing an entire trie over
the stages of a complete pipelined structure, the IP lookup process flows through the
pipeline stages, and the forwarding information is output at the end of the pipeline every
clock cycle.
Although the above method would increase the throughput of the system, the mapping
would result in an unbalanced distribution of the trie nodes over the different stages of
the pipeline. A pipeline stage containing a larger number of trie nodes would require
multiple clock cycles to access its larger memory, and the number of updates a memory
block receives is usually proportional to the number of nodes stored in it. As a result,
when there is a large number of updates resulting from route insertions and deletions,
the stages with larger memory blocks in the pipeline need proportionately more updates
and can also suffer memory overflow. Hence such stages in the pipeline can become
bottlenecks and slow down the entire search process.
1.2 OBJECTIVE
In this project, a pipelined search architecture is implemented with a uniform
distribution of the trie nodes across all the memory blocks of the pipeline structure,
resulting in a constant search time across all the stages of the pipeline. It also achieves a
high throughput of one output at the end of every clock cycle, resulting from the
pipelined operation of the search process. With this linear pipelined structure and
constant memory block sizes at each stage, the updates required to each memory block
are also reduced due to the uniformity of the distribution of the trie nodes.
1.3 PROJECT OUTLINE
Here, we build a pipelined search structure on an FPGA, with each pipeline stage
containing a memory block and logic units that carry out the search process and the
updates to the memory block when a trie node has to be added or deleted depending on
the topology of the networks surrounding the router. Furthermore, the nodes of the
binary trie structure are uniformly distributed over all the memory blocks to achieve a
uniform search time at each pipeline stage. The pipelined search architecture is designed
to be implemented on a Xilinx Spartan 6 FPGA.
CHAPTER 2
IP ADDRESSING
In computer networking, which is mainly based on packet switched network protocols,
the data to be transmitted are organized into packets before transmission. There are two
forms of addressing used to route these data packets from a source node to a destination
node, where a node can be either a host computer at the user end or a server servicing
requests from users: IPv4 (Internet Protocol version 4) and IPv6 (Internet Protocol
version 6). The main difference between these two forms is that IPv4 uses a 32 bit
address format whereas IPv6 uses a 128 bit address format. This increase in address
field length in IPv6 is mainly to cope with the fast increasing number of interconnected
networks and the number of devices connected to each of these networks. A block
diagram of a complete IPv4 packet is shown in Figure 2.0, which includes the 32 bit
address field of the destination node.
Figure 2.0: Block diagram of IP packet showing the source and destination IP address
In IPv4, as mentioned above, the source and destination IP addresses are 32 bits long.
These 32 bits are divided into four fields, each 8 bits long, called octets. In decimal
form, the IP address is represented in dot notation, with the value of each octet followed
by a dot. As an example, let us consider the 32 bit IPv4 address
11000000101010000000101000000000 in binary format. This address can be broken
down into four octets and written as 11000000_10101000_00001010_00000000, which
in decimal dot notation gives the address 192.168.10.0.
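As a small illustration, the C sketch below (not part of this project's FPGA implementation) performs this conversion by extracting the four octets of a 32 bit address with shifts and masks.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t addr = 0xC0A80A00u;  /* 11000000_10101000_00001010_00000000 */

    /* Shift each octet down to the low byte and mask off the rest. */
    printf("%u.%u.%u.%u\n",
           (addr >> 24) & 0xFF,
           (addr >> 16) & 0xFF,
           (addr >> 8)  & 0xFF,
           addr & 0xFF);          /* prints 192.168.10.0 */
    return 0;
}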
In the Internet protocol the IP address is mainly required for the interconnection of
various networks. This naturally results in routing of packets based on the network
addresses rather than on host addresses. This resulted in the IP address consisting of two
levels of hierarchy,
A. The network address part: The network part of the IPv4 address is used to
identify the network to which the host is connected. Thus, all hosts connected
to the same network have the same network address field.
B. The host address part: The host address is used to identify an individual host
on a particular network.
As the network address forms the first bits of the IP address, we can call the network
address a “Prefix”, which can have a length of up to 32 bits in the IPv4 addressing space,
followed by a “*” representing the host address. As an example, considering the IP
address 192.168.10.0, let us assume “192.168”, i.e., the first two octets
(11000000_10101000) of the IP address, form the network address part and “10.0” is the
host address of one terminal on the network. Therefore the IP address
1100000010101000* (192.168.*) would represent all 2^16 hosts connected to the
network addressed by “192.168”.
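The C sketch below illustrates this prefix idea: an address belongs to a prefix when its leading bits match under the corresponding network mask. The helper name matches_prefix is ours, used only for illustration.

#include <stdio.h>
#include <stdint.h>

/* Returns 1 if the top `len` bits of addr equal those of prefix. */
static int matches_prefix(uint32_t addr, uint32_t prefix, int len) {
    uint32_t mask = (len == 0) ? 0 : 0xFFFFFFFFu << (32 - len);
    return (addr & mask) == (prefix & mask);
}

int main(void) {
    uint32_t prefix = 0xC0A80000u;                            /* 192.168.0.0  */
    printf("%d\n", matches_prefix(0xC0A80A01u, prefix, 16));  /* 192.168.10.1 -> 1 */
    printf("%d\n", matches_prefix(0xC2AB0A01u, prefix, 16));  /* 194.171.10.1 -> 0 */
    return 0;
}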
With this format of addressing, the packets are now routed using the network address
until the destination network is reached and then the host address can be used to identify
the intended host on that network to which the data packet has to be delivered. This
process is called “Address Aggregation” by which prefixes can be used to identify a
group of addresses. Also with this format of addressing, the forwarding tables in the core
routers now have to contain only the address prefixes and the forwarding information,
which can be either the next-hop address or the output port number toward the
destination network, as shown in the example in Table 2.1. So when a packet comes in,
the router has to search the forwarding table for a prefix that matches the bits in the
destination address field of the packet.
Destination Network Address Prefix    Next-Hop Address    Output Port Number
192.168                               208.24.101.0        1
194.171                               208.24.101.10       2
208.10                                208.24.101.16       4
Table 2.1: An example forwarding table
This format of IP addressing led to the formation of two different addressing schemes
known as,
1. Classful Addressing
2. Classless Inter-domain Addressing (CIDR)
2.1 CLASSFUL ADDRESSING SCHEME
During the early days of the internet, the addressing scheme was based on a simple
assumption of three different sizes for the network addresses, thus forming three different
classes of networks. These predetermined sizes were 8, 16 and 24 bits of the total 32 bit
length IP address. The classification of the networks was as follows:
a. Class A networks: This class of networks had the first 8 bits of the IP address
representing the network address while the remaining 24 bits represented the
hosts on that particular network. This meant that there were fewer networks
(a maximum of 2^8 = 256 networks), with each network having a large
number of hosts connected to it (a maximum of 2^24 hosts).
b. Class B networks: This class of networks had the first 16 bits of the IP address
representing the network address while the remaining 16 bits represented the
hosts on that particular network. This meant that there was a sufficiently large
number of networks (a maximum of 2^16 networks), with 2^16 hosts connected
to each of these networks.
c. Class C networks: For these networks the first 24 bits of the IP address were used
to address the networks, which meant a very large number of networks with only a
few hosts (a maximum of 2^8) connected to each of these networks.
This kind of classification of the networks with fixed network address sizes resulted in
easier lookup procedures. But, it was determined over the course of the growth of the
internet and the increase in the number of networks and hosts that the distribution of the
addresses was wasteful. Also, with the growth of the internet, the size of the forwarding
tables also grew exponentially because the routers had to now maintain an entry for every
network in their forwarding tables. This resulted in slower IP Lookups and hence slower
overall network speeds.
2.2 CLASSLESS INTER-DOMAIN ADDRESSING
With an increase in the number of networks and number of hosts on each network, a
solution was needed to solve the problem of address shortages and wasteful allocation of
IP addresses to some networks. For this, an alternative addressing scheme was
introduced, which was called the ‘Classless Inter-Domain Routing’ (CIDR).
With the Classful addressing scheme, the prefix lengths were constrained to either 8, 16
or 24 bits to form the Classes A, B and C respectively. But with the CIDR addressing
scheme, the IP address space is utilized more efficiently by allowing the prefix (network
address) lengths to be arbitrary and not constraining them to be 8, 16 or 24 bits long. By
this scheme, the CIDR allows the network addresses to be aggregated at various levels
with the idea that addresses have a topological significance. This concept allows for the
recursive aggregation of the network addresses at various points in the topology of the
internet. This also leads to a reduction in the number of entries the routers will have to
maintain in their forwarding tables.
As an example to illustrate this, let us consider a group of networks represented by
network addresses from 192.168.10/24 to 192.168.20/24. Here the slash “/” symbol and
the number following the symbol (24 in this case) denote the number of bits of the IP
address that are used as the prefix (network address); here the leftmost 24 bits of the IP
address represent the network address. In 32 bit binary form, these addresses can be
represented as given below in Figure 2.1.
192.168.10/24 - 11000000_10101000_00001010
...
192.168.16/24 - 11000000_10101000_00010000
...
192.168.20/24 - 11000000_10101000_00010100
Figure 2.1: IP network address representation
In the set of network addresses above, it can be seen that the first 19 bits
(11000000_10101000_000) are common to all the networks, and thus all these networks
can be combined into one network with the first 19 bits representing the network
address, denoted 192.168.0/19. This process allows a large number of networks to be
combined and represented by a common prefix value. By this, the amount of forwarding
information that has to be maintained by the network routers is reduced significantly.
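A minimal C sketch of this aggregation step follows: the length of the common prefix of two network addresses is the number of equal leading bits, found here by scanning the XOR of the two addresses. The helper name common_prefix_len is ours.

#include <stdio.h>
#include <stdint.h>

static int common_prefix_len(uint32_t a, uint32_t b) {
    uint32_t diff = a ^ b;      /* differing bits are set */
    int len = 0;
    while (len < 32 && !(diff & 0x80000000u)) {  /* count equal leading bits */
        diff <<= 1;
        len++;
    }
    return len;
}

int main(void) {
    uint32_t lo = 0xC0A80A00u;  /* 192.168.10.0 */
    uint32_t hi = 0xC0A81400u;  /* 192.168.20.0 */
    printf("/%d\n", common_prefix_len(lo, hi));  /* prints /19 */
    return 0;
}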
But for the above addressing scheme to work, all the networks with the common prefix
address have to be serviced by the same service provider. If this criterion is not met, a
few additional information fields have to be maintained to route the packets correctly.
As an example, consider that the network with the address 192.168.16/24 is not serviced
by the same Internet Service Provider (ISP) as the other networks. This leads to a
discontinuity in the addressing scheme, and packets addressed to the network
192.168.16/24 may get dropped. A solution is obtained by still combining the networks
with the same 19 bit prefix value and keeping an additional entry in the forwarding
table for the 192.168.16/24 network. Now, because of this situation, the router has to
find the entry that most accurately matches the destination address among the entries in
the forwarding table. This process is called “Longest Prefix Matching” (LPM).
Therefore, the address lookup process in network routers requires a search for the
longest prefix matching the destination address contained in the packet.
Although the CIDR addressing scheme reduces the number of entries that have to be
maintained in the forwarding tables, the IP lookup process now becomes more complex:
it involves not only the bit pattern comparison but also finding the appropriate length of
the prefix to be used for the search.
2.3 SOLUTIONS TO LONGEST PREFIX MATCH SEARCH PROCESS
The most common way to implement the Longest Prefix Match is through the use of
binary trie structures. A binary trie is a tree-like data structure containing nodes that
represent binary values at different levels from the top of the tree to the bottom. The
search mechanism uses the bits of the prefix address for branching while traversing
down the trie.
Consider the prefixes and the corresponding trie nodes for the prefixes in Table 2.2 to
represent the prefix and the forwarding information in a forwarding table.
PREFIXES    NODE
0*          P1
000*        P2
010*        P3
01001*      P4
01011*      P5
011*        P6
110*        P7
111*        P8
Table 2.2: Prefixes and corresponding nodes in the forwarding table
This table can be represented in a binary trie structure as shown in Figure 2.2.
Figure 2.2: Example Binary trie structure
The nodes at the different levels of the trie structure represent prefix values whose
length equals the depth of the node in the trie. For example, node P2 at level 3 of the
trie represents the prefix value 000. This sort of binary trie structure is known as a
Uni-bit Trie. The nodes in such a structure are not located only at the edges of the trie
but are also present at intermediate positions in the trie. This is a result of the exception
networks that are served by different internet service providers, as discussed above in
section 2.2. Therefore, for example, nodes P2 and P3 are exceptions to the group of
networks that are represented by the prefix value of node P1.
With this structure, there is usually an overlap of prefixes when a search is conducted.
Consider a lookup on the prefix value 010*. As seen from the trie structure above, both
nodes P1 and P3 match this value. To solve this problem, the Longest Prefix Match
(LPM) rule is used, which results in node P3 being chosen as the final result of the
lookup process.
Tries make it easy to find the longest matching prefix for the destination IP address.
The bits of the destination IP address are used at each node of the trie to decide the
direction in which the search should proceed. If, at a node, the next bit in the address is
a 1, the search takes the right path, and if the bit value is 0, the search proceeds down
the left path. When the search reaches a prefix node, that node is registered as the
longest matching prefix, and the search continues down the trie to find any longer
matching prefix node that might be present. If no better matching node is found, then
the last registered prefix is chosen as the final result and the longest matching prefix.
As an example, let us consider an IP address starting with the value 01001. The search
begins at the node labeled “ROOT” and continues down the left path, where node P1 is
found. This node is registered as the longest matching prefix, and the search continues
down the trie to the right of node P1. Subsequently, at position 010, node P3 is found,
which replaces node P1 as the longest matching prefix node. The search then continues
down the trie until position 01001, where node P4 is encountered. This node is
registered as the longest matching prefix node, and the search terminates as there are no
further values to search in the trie structure.
The search through the trie structure leads to a hierarchical reduction of the search
space at each step down the trie. Updating the trie structure is also a simple process: a
search for a node is performed, and when the search reaches a node with no further
nodes below it, the appropriate nodes can be added to the structure to create a new
prefix node. The node deletion process is similar, where the nodes below a particular
node can be deleted using the same search process.
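The C sketch below (a software illustration, not the project's hardware design) implements this uni-bit trie search and insertion: insertion walks the bits of a prefix and creates nodes as needed, and lookup records the last prefix node seen while walking the destination address. Node names follow Table 2.2.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct Node {
    struct Node *child[2];   /* child[0]: next bit 0, child[1]: next bit 1 */
    const char  *prefix;     /* non-NULL if this node stores a prefix */
} Node;

static Node *new_node(void) { return calloc(1, sizeof(Node)); }

/* Insert a prefix given as a bit string, e.g. "010" for P3. */
static void insert(Node *root, const char *bits, const char *name) {
    for (; *bits; bits++) {
        int b = *bits - '0';
        if (!root->child[b]) root->child[b] = new_node();
        root = root->child[b];
    }
    root->prefix = name;
}

/* Longest prefix match on the top bits of addr. */
static const char *lookup(Node *root, uint32_t addr) {
    const char *best = NULL;
    for (int i = 31; i >= 0 && root; i--) {
        if (root->prefix) best = root->prefix;   /* register best match so far */
        root = root->child[(addr >> i) & 1];     /* branch on the next bit */
    }
    return best;
}

int main(void) {
    Node *root = new_node();
    insert(root, "0", "P1");   insert(root, "000", "P2");
    insert(root, "010", "P3"); insert(root, "01001", "P4");
    /* Address starting 01001...: P1, then P3, then P4 are registered. */
    printf("%s\n", lookup(root, 0x48000000u));   /* prints P4 */
    return 0;
}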
Although the above described process of uni-bit trie search is simple, it can often lead
to unnecessary search operations being performed on a path of the trie even after a
prefix node has been registered and no better matches can be found. This is a
disadvantage that can slow down the search process. Also, every extra node on the trie
takes up memory space, resulting in larger memory use than required. A number of
ways have been suggested to overcome these limitations, some of which are discussed
here.
2.3.1 PATH COMPRESSION TECHNIQUE
In this technique of trie length reduction, the one-way branches are reduced by
eliminating intermediate nodes in the path that do not represent prefix nodes. In doing
this, additional information has to be maintained in the nodes about which bit position
in the IP address has to be compared next to find the next node in the trie. As shown in
the diagram in Figure 2.3, the paths from node P1 to P2 and from the root node to the
nodes P7 and P8 have been compressed. The nodes now have to hold additional
information about the position of the bit in the IP address that has to be compared next,
represented by the decimal number beside every node other than the nodes at the edges
of the trie.
Figure 2.3: Example Path Compressed Uni-Bit trie
For example, suppose the search has reached the intermediate node between nodes P7,
P8 and the root of the trie. This node now directs the search process to compare bit
number three of the destination IP address in order to determine whether the output will
be node P7 or node P8. As soon as a prefix node is encountered, the actual prefix value
that the node represents is compared with the corresponding bits of the IP address. In
this way, we see that the search process is shortened by one step and the memory
requirement is reduced by one node on this path of the trie. In a similar way, all the
one-way branches in a trie can be compressed, resulting in a speedup of the search
process.
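As a small illustration of the decision step at a compressed node, the C sketch below tests the stored bit position directly; the helper test_bit and the bit numbering (position 1 = most significant bit) are our assumptions for the example.

#include <stdio.h>
#include <stdint.h>

/* Returns the bit of addr at position bit_pos, counting from 1 at the
 * most significant bit, as a path-compressed node would test it. */
static int test_bit(uint32_t addr, int bit_pos) {
    return (addr >> (32 - bit_pos)) & 1;
}

int main(void) {
    /* The intermediate node above P7/P8 in Figure 2.3 stores bit position 3. */
    uint32_t a = 0xC0000000u;   /* 110... : bit 3 is 0 -> go left,  node P7 */
    uint32_t b = 0xE0000000u;   /* 111... : bit 3 is 1 -> go right, node P8 */
    printf("%d %d\n", test_bit(a, 3), test_bit(b, 3));   /* prints 0 1 */
    return 0;
}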
Although this method of binary trie search reduces the length of the trie structure, it adds
an additional value that has to be maintained at the nodes for the search process to
proceed. This is a disadvantage which can lead to additional memory usage.
2.3.2 PREFIX EXPANSION / LEAF PUSHING
In a router, the forwarding table consists of address prefixes that correspond to the next-hop
information for a particular destination IP address. The prefixes are data obtained by
the routers over a period of time and by combining networks with the same prefix bits
using the address aggregation process described in the previous sections. The problem
with the search process using either the uni-bit trie structure or the path-compressed
uni-bit trie structure is that the search requires either storing a prefix node while the
search proceeds down the trie to find the longest matching prefix, or backtracking
through the trie structure. That is, the search has to either keep track of the best
matching node encountered or move back up the trie if no valid matches are found
at the bottom edges of the trie structure. This is usually not a desirable attribute when the
trie structure is mapped onto pipelined stages for hardware implementation. While
additional memory is required to keep track of the best matching prefix, backtracking
involves implementing a looping structure within the pipeline. To avoid these
disadvantages, an optimization of the uni-bit trie called Prefix Expansion or Leaf
Pushing can be used.
In this process, the prefix nodes at the intermediate level of a trie structure are expanded
to be pushed down to the bottom edges of the trie structure. When this optimization is
performed on the uni-bit trie, there can be an overlap of nodes at the edges of the trie. At
the point where two nodes overlap, the node that was already present at the position is
given precedence over the new node. This enables the best matching prefix to be obtained
at the end of the trie.
As an example, the trie structure presented in Figure 2.4 represents a Prefix expanded/
Leaf Pushed trie structure.
Figure 2.4: Leaf pushed uni-bit trie structure
Comparing the trie structure above to that of Figure 2.2, it can be seen that the nodes
P1 and P3, which were present at intermediate locations in the trie structure, have been
pushed to the bottom edge of the trie by way of expanding their prefixes. Here,
A. Node P1’s prefix can be expanded by one bit to the values 00* and 01* to push it
to the bottom of the trie.
B. Node P3’s prefix value can be expanded by 2 bits to the values 01000*, 01001*,
01010* and 01011* to be pushed to the bottom of the trie.
When prefixes of nodes are expanded, there is a possibility of these nodes overlapping
with other nodes that represent the same prefix value but were part of the original trie
structure forming the exception nodes. In this case, when the expansion takes place,
preference is given to the node already present at a particular location over the expanded
node. This enables the best matching prefix to be found at the end of the trie search
process.
Another advantage of prefix expanding or leaf pushing is that all the nodes that represent
prefix values in the forwarding table are present only at the bottom of the trie structure.
This would eliminate the need to keep track of the best matching prefix node or the need
to backtrack during the trie search. This is due to the fact that all the prefix nodes are
pushed to the bottom of the trie and the nodes encountered at the end of the trie structure
are the best matching or the Longest Prefix Matches that can be found in the trie. As a
result of this, the mapping of the trie structure to the pipelined stages will become linear
and the extra memory needed to keep track of the prefix nodes can be eliminated.
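The C sketch below illustrates one expansion step under this precedence rule, using a small string-keyed table; the table layout and helper names are ours, chosen only for illustration. Expanding P3 (010*) by two bits fills the free slots, while an existing, more specific entry such as P4 (01001*) is kept.

#include <stdio.h>
#include <string.h>

#define MAXP 64
static char table_prefix[MAXP][8];
static char table_node[MAXP][4];
static int  n_entries = 0;

static int find(const char *p) {
    for (int i = 0; i < n_entries; i++)
        if (strcmp(table_prefix[i], p) == 0) return i;
    return -1;
}

/* Expand prefix `p` (a bit string) of node `name` by `extra` bits. */
static void expand(const char *p, const char *name, int extra) {
    for (int v = 0; v < (1 << extra); v++) {
        char e[8];
        strcpy(e, p);
        int len = (int)strlen(p);
        for (int b = extra - 1; b >= 0; b--)       /* append the extra bits */
            e[len++] = ((v >> b) & 1) + '0';
        e[len] = '\0';
        if (find(e) < 0) {                         /* existing node wins */
            strcpy(table_prefix[n_entries], e);
            strcpy(table_node[n_entries++], name);
        }
    }
}

int main(void) {
    /* P4 (01001*) is already at the bottom of the trie. */
    strcpy(table_prefix[0], "01001"); strcpy(table_node[0], "P4"); n_entries = 1;
    expand("010", "P3", 2);    /* adds 01000, 01010, 01011; keeps 01001 -> P4 */
    for (int i = 0; i < n_entries; i++)
        printf("%s -> %s\n", table_prefix[i], table_node[i]);
    return 0;
}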
2.3.3 USING MULTI BIT TRIES
When a search through the forwarding table in a router is conducted using the trie
structures described in sections 2.3 and 2.3.1, the search process by itself is simple but
requires extra memory to remember the best matching prefix or requires backtracking
through the trie structure. Using the prefix expansion explained in section 2.3.2 has the
benefit of eliminating the need for extra memory to register the matching nodes and the
need for backtracking. But with both these methods, the search is still performed on a
bit-by-bit basis. This has the disadvantage of requiring a larger amount of time, as only
one bit of the IP address is compared in one clock cycle.
This problem of increased search time can be reduced by searching through the trie
structure using multiple bits at a given time. This leads to the creation of Multi-bit tries
where search at each node of the trie will be performed using multiple bits.
To illustrate this process, the trie from Figure 2.2 has been transformed to a multi bit trie
and is shown in Figure 2.5.
Figure 2.5: Multi bit trie structure with fixed stride
It can be observed from the diagram in Figure 2.5 representing the trie structure that from
level 2 down the trie, the search process now uses 2 bits at a time. The first level still has
a single bit comparison. The number of bits that are used for comparison at a node in the
trie is called a “Stride”. So here, at level 1 of the trie which is comparison at the root
node, the Stride value is equal to 1 and the bit comparisons at levels 2 and 3 have a Stride
value equal to 2. If the Stride value used is constant throughout a level, then the trie is
called a “Fixed stride multi-bit trie” or else if the stride values vary on the same level of
a trie, it will be called a “Variable stride multi-bit trie”. The trie structure above
represents a fixed stride multi-bit trie as the stride at each level of comparison is constant.
The use of a multi-bit trie structure now creates a new problem: prefixes of arbitrary
lengths, that is, prefixes with lengths other than the stride boundaries, cannot be
compared directly. This requires some prefix transformations, where the nodes with
arbitrary prefix lengths have to be expanded. As an example, the trie structure in Figure
2.5 uses a stride of 1 at level 1 and a stride of 2 at levels 2 and 3. This means that if
there existed a node with a prefix length of 4 bits, say 0100*, that node would have to
be expanded by one bit to 01000* and 01001* to comply with the search process. When
this expansion is done, there is a possibility of nodes overlapping, as is the case in this
example, where the expanded node would overlap with node P4, which represents the
prefix value 01001*. When this scenario occurs, node P4 is preserved and is not
replaced by the expanded node, as preserving the node leads to a more accurate
comparison and thus the correct forwarding information.
Using this method, the trie structure can be compressed, which leads to a decrease in
the length of the trie and therefore a reduction in the number of pipelined hardware
stages needed to map the trie structure. In the extreme case, the trie could be reduced to
a single stage in which all 32 bits are compared at once. Though this would reduce the
length of the trie to one level, it would also require a much larger memory with a longer
memory access time to find the correct forwarding information. Also, though the
overall search time is reduced using multi-bit comparisons, the problem of backtracking
or remembering the best matching prefix still exists in this trie structure, which can still
slow the search through the trie.
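The C sketch below illustrates how a fixed-stride search consumes address bits, using the strides of Figure 2.5 (1 bit at level 1, then 2 bits at each later level); the helper next_bits is ours, used only for illustration.

#include <stdio.h>
#include <stdint.h>

/* Extract k bits of addr starting at bit offset `pos` (MSB = offset 0).
 * With a stride of k, these bits index a node's array of 2^k entries. */
static unsigned next_bits(uint32_t addr, int pos, int k) {
    return (addr >> (32 - pos - k)) & ((1u << k) - 1);
}

int main(void) {
    uint32_t addr = 0x4C000000u;    /* address starting 01001100... */
    int pos = 0;
    unsigned i1 = next_bits(addr, pos, 1); pos += 1;  /* level 1: stride 1 */
    unsigned i2 = next_bits(addr, pos, 2); pos += 2;  /* level 2: stride 2 */
    unsigned i3 = next_bits(addr, pos, 2);            /* level 3: stride 2 */
    printf("%u %u %u\n", i1, i2, i3);   /* prints 0 2 1 -> path 0, 10, 01 */
    return 0;
}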
2.3.4 USING A COMBINATION OF LEAF PUSHING AND MULTI BIT TRIES
As described above, although each of the suggested search methods has its own set of
benefits, each still has drawbacks: a large trie structure, extra memory to register
matching nodes, or the need to backtrack through the trie to find the best matching
prefix, the Longest Prefix Match value. These drawbacks can be reduced to a minimum
when the prefix expansion/leaf pushing method and the multi-bit trie search method
suggested above are combined to form a trie structure that is both compact and free of
any backtracking.
To see the benefits of this search method, consider the trie in figure 2.2 to be leaf pushed
and expanded to form the multi bit trie structure presented in Figure 2.6.
Figure 2.6: Leaf pushed multi bit trie
With this trie structure, it can be seen that the search is performed at level 1 using a
stride value of 2 and then subsequently using a stride value of 1 on all the other levels of
the trie structure. Also, it is seen that nodes P1 and P3 have been expanded and pushed
to the bottom edges of the trie structure. Although this kind of trie structure is not the
most compact possible, it is significantly more compact than a uni-bit trie and also has
the added benefit that all the prefix nodes are at the bottom of the trie, making sure that
the nodes encountered at the end of the search are the best matching, or longest, prefix
matches. This eliminates the need for any sort of backtracking or any need to remember
a previously encountered matching node. Therefore, the speed of the search process and
the memory requirements with this structure can be optimized to obtain search results
in the shortest time possible.
CHAPTER 3
PROPOSALS FOR THE SEARCH PROCESS USING BINARY TRIES
There are various proposals for the implementation of the IP lookup process using the
binary trie structure and its variations that have been discussed above. Some of the
proposals optimize the uni-bit trie structure directly while others modify the trie structure
as needed to map onto a set of pipelined stages. Some of the proposals have been briefly
described in this chapter.
One proposal is to use a binary trie with variable stride values at different levels of the
trie and mapping nodes at each level of the trie to a memory block in a pipeline stage.
Although using variable strides does produce a compact trie structure as described in
Chapter 2, the uneven distribution of nodes on different levels of the trie structure will
lead to the use of uneven memory blocks at different stages of the pipeline hardware
architecture which will mean that some memory blocks will consume more time than
others. Furthermore, using variable strides for hardware implementation can be a
complex design as the number of bits that will have to be compared at each stage can be
different and additional logic units will be required to make the decision about the
number of bits to be used for the search. It can also lead to a non-uniform distribution of
the nodes in the memory blocks at each stage of the pipeline.
Another proposal is to create a circular pipelined trie structure. In this, the pipelined
stages are configured in a circular structure with multi point access to allow for the
lookup process to begin at any point in the pipeline stages. The stage where the lookup
starts is called the “Starting Stage”. In this implementation, a trie is divided into many
smaller subtries of equal size and these subtries are then mapped to the different stages of
the pipeline structure to create a balanced memory distribution across all the stages. As a
result, the nodes are assigned to the pipeline stages based on the subtrie divisions rather
than on the level at which the nodes are present in the main trie structure. This sort of
mapping might require some subtries to wrap around if their roots are mapped to stages
closer to the end of the pipeline.
Due to this type of distribution of subtries, although all the incoming IP addresses enter
the pipeline at the first stage, the search may not start for the next few stages depending
on which stage of the pipeline the root of the subtrie corresponding to the address bits has
been mapped to. A solution to this problem was proposed by implementing a pipeline in a
ring structure consisting of two data paths.
Figure 3.1: Pipeline stages with a ring structure
The diagram in Figure 3.1 represents the ring pipeline structure consisting of two
datapaths. Here datapath 1 refers to the path the search algorithm takes during its first
pass and datapath 2 represents the path taken during the second pass through the pipeline.
The datapath 1 is designed to be operational only during the odd cycles of the clock
signal whereas the datapath 2 is active during the even cycles of the clock signal. Also,
this search algorithm will require the use of an index table for the search process to
initially find at which stage in the pipeline the actual search has to begin. Although this
idea of a ring pipeline produces an even distribution of nodes over the memory, the
throughput falls to 1 lookup every two clock cycles as the search has to now pass through
all the pipeline stages twice before a final output is obtained.
A different approach to the IP lookup through a ring pipeline architecture was suggested
and called the Circular, Adaptive and Monotonic Pipeline (CAMP). In this approach, the
trie structure was broken down into a root subtrie which is the starting point of the trie
structure and subsequent child node subtries. This search algorithm uses the multi bit trie
structure. Hence the initial stride of the trie is determined and is used in an index table to
hold the starting points of the subtries that are mapped to the pipeline stages.
Figure 3.2 shows the structure of the CAMP architecture in more detail.
Figure 3.2: Pipeline structure used in the CAMP search algorithm
In this lookup algorithm, the distribution of nodes to the memory stages of the pipeline
structure is made independent of the levels in a trie at which the nodes are present, that is,
the nodes are not mapped onto the memory based on the level in the trie structure where
they are present, rather they are mapped based on which subtrie they belong to. In this
way, the subtries are now mapped uniformly to the memory stages. While doing so, some
of the nodes of one subtrie may get mapped to the next memory stage and so on. These
stages in the pipeline are interconnected internally so that the lookup can pass through all
the stages.
Here, the initial stride bits listed in the index table are used to find the stage at which a
search should begin. Now, since search requests can arrive at different stages at
arbitrary times, a FIFO queue is placed in front of every stage to buffer the requests
while a previous search request is traversing the stages of the pipeline. When the stage
is idle, the first request in its queue is accepted and the search progresses. In this way,
since the requests can be buffered at different stages, the order of the incoming requests
can be lost, which in turn requires that the outputs be buffered and reordered into the
correct sequence to match that of the input.
Also, since stages can be blocked due to previous requests passing through the pipeline,
there can be a long delay that can be encountered for processing certain search requests
over others. This is a major drawback of this pipeline structure which can reduce the
overall throughput by a significant amount.
CHAPTER 4
IMPLEMENTATION DETAILS OF THE MEMORY EFFICIENT PIPELINE
ARCHITECTURE
The main aim of the implementation is to create a pipelined IP address lookup
architecture with a uniform distribution of all the nodes of a trie structure, thereby
creating evenly balanced memory blocks across all the pipeline stages with uniform
memory access times for the lookup, and to eliminate the need for any sort of loopback
to previous stages of the pipeline, which was the case in the pipeline architectures
described in the previous chapter. The pipeline is designed to be implemented on a
Xilinx Spartan 6 FPGA, which has low power requirements and sufficient on-board
memory to hold the prefix tables used here.
The Ring and CAMP architectures divide the main trie into many subtries which are then
uniformly distributed over the pipeline stages. But they do not enforce the constraint that
the roots of all the subtries be mapped to the first stage of the pipeline. This leads to the
pipelines having multiple starting points causing search conflicts and affecting the overall
throughput of the architecture.
There are two main constraints that are enforced in this implementation that try to solve
the issues from the previous implementations.
Constraint 1: The roots of all the subtries of the main trie are mapped to the first stage of
the pipeline.
Figure 4.1: Example trie structure with two subtries and mapping to the pipeline
Consider an example of a simple trie structure having two subtries with the roots of the
subtries at nodes B and C. Applying the above mentioned constraint to this trie, we see
that the roots of the subtries (node B and node C) are mapped to stage 1 and the child
nodes are mapped to subsequent stages.
This ensures that the search process has only one starting point, thereby eliminating any
loopbacks and ensuring that the output port number of the router through which the IP
packet has to be forwarded, found at a particular stage of the pipeline, is the best match
for the IP address being used for the lookup process.
Also, the node distribution of the trie structure over the pipeline stages is made on the
basis of the level of the trie at which the nodes are present. With the earlier
implementations, this meant that the nodes at the same level of a subtrie were all
mapped to the same pipeline stage. By relaxing this rule so that nodes on the same level
of a subtrie can be mapped to different stages of the pipeline structure, a more uniform
distribution of nodes can be achieved. This leads to the second constraint that has to be
followed in this implementation.
Constraint 2: If one node is the child of another node (the parent node), then the parent
node has to be mapped to a pipeline stage preceding the stage to which the child node
is mapped.
Considering an example for this, let’s assume the presence of three nodes namely, nodes
A, B and C in a trie structure. Nodes B and C are the child nodes of parent node A. With
the above constraint enforced, this means that node A has to be mapped to a pipeline
stage preceding the stage to which nodes B and C have been mapped to. This constraint
again ensures that the entire trie structure is distributed uniformly from top to bottom
starting at the first stage and ending at the last stage of the pipeline. This method of node
distribution, as explained before helps in increasing the throughput by making sure that
the search process flows in only one direction starting at the first stage and ending at one
of the stages down the pipeline structure.
To support this type of node mapping, no-operations (no-ops) have to be supported, by
which a stage can be skipped over to reach the intended node located in one of the later
stages of the pipeline.
Figure 4.2: Example subtrie with one parent (node A), two children (nodes B, C) and
mapping to the pipeline
4.1 IMPLEMENTATION OF THE PIPELINE STRUCTURE
With the above mentioned constraints taken into consideration, the main goal now is to
achieve a throughput of one IP lookup for every clock cycle, a uniform distribution of
nodes across the memory and finally, a constant delay for every IP lookup. For this, the
pipeline structure is implemented using an example prefix table (forwarding table) and a
trie structure corresponding to the entries in the prefix table.
4.1.1 THE PREFIX TABLE
Let us assume a sample prefix table with the following entries as shown below in Table
4.1.
Prefix      Node
0*          P1
1*          P2
101*        P4
11010*      P5
1001*       P6
10101*      P7
1100*       P8
010011*     P9
101100*     P10
00010*      P11
111*        P12
01011*      P13
100011*     P14
11111*      P15
Table 4.1: Prefix/forwarding table showing sample prefix values and their corresponding
nodes
The prefix values in this table vary in size from 1 bit up to 6 bits in length. These prefixes
represent the address aggregation at arbitrary levels for the networks that was described
in Chapter 2. The nodes are used to represent the end location for a prefix search when
these prefixes are implemented as a trie data structure.
4.1.2 THE TRIE STRUCTURE AND MAPPING
Based on the set of prefix values presented above, a Uni-bit trie structure corresponding
to these prefix values is constructed as shown in the tree diagram presented in Figure 4.3.
Figure 4.3: Trie structure corresponding to the given prefixes in Table 4.1
It can be seen here that the “Root” node is the starting point of the trie and depending on
the value of the bits in the IP address being looked up, the search process can either go to
the left (when a bit is 0) or to the right (when a bit is 1) of a node down the trie structure
till the best matching result is found.
As explained in chapter 2, this type of uni-bit trie structure can be very inefficient in
terms of node distribution in the pipeline search structure which will also lead to a
complicated lookup process. This is due to the fact that address aggregation is at arbitrary
levels with prefix nodes placed at different levels of the trie structure. This will lead to
the requirement for some extra processing or may sometimes even lead to unexpected
results for a given destination IP address that is being used for the lookup. Therefore, the
Leaf pushing algorithm is applied to this trie structure to push all the prefix nodes at
arbitrary level to the bottom of the trie structure, thereby guaranteeing more accurate
results.
Therefore, the uni-bit trie structure after Leaf-Pushing will be as shown in the trie
structure represented in Figure 4.4.
Figure 4.4: Leaf-pushed uni-bit trie structure
Although the process of leaf pushing can lead to the creation of duplicate nodes at the
edges of the tries, it will result in a more uniform length of the prefix values in the prefix/
forwarding table.
But, we see here that the trie structure is still in the uni-bit format. By looking at the node
distributions, it can be seen that changing the uni-bit trie structure to a multi-bit trie
structure can lead to a decrease in the height of the entire trie structure by eliminating
intermediate nodes and thereby reducing the number of stages that will be required in the
pipeline structure used for the IP lookup process. However, using multi-bit addressing for
nodes throughout the trie structure would result in an uneven distribution of nodes in the
actual pipeline structure and will also cause complexity in the search process itself.
Therefore, multi-bit addressing is used only at the root of the trie structure to create
subtries which will remain as uni-bit structures. This multi-bit addressing used at the root
level of a trie structure, as explained in chapter 2 is called the “Initial stride” for the trie.
In order to obtain an initial stride value that leads to a uniform distribution of nodes
across the pipeline’s memory, we have to consider a pipeline that is longer by at least
one stage than the maximum height of the leaf-pushed uni-bit trie structure starting at
its root. For the implementation here, since the maximum height of the trie is 6, a 7
stage pipeline is considered. With this constraint, the algorithm used to select the
length of the initial stride is presented in Table 4.2.
Inputs: Leaf-pushed uni-bit trie (T)
        Number of pipeline stages (P)
Output: Initial stride, i

Initialize i to max[1, (height of T) - P]
Loop: use the value of i to expand T to get the expanded trie
    if 2^(i-1) < (number of nodes in expanded trie / number of pipeline stages) < 2^i
        i is the required initial stride value
    else
        increment i by 1 and return to Loop
End Loop
Table 4.2: Algorithm to find initial stride
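A C sketch of this selection loop is shown below. The function nodes_after_expansion stands in for expanding the leaf-pushed trie with initial stride i and counting its nodes; the node counts used here are illustrative stand-ins, chosen only so that the loop terminates at i = 3 as in the text.

#include <stdio.h>

/* Stand-in for expanding the trie with initial stride i and counting
 * its nodes; the values below are illustrative, not from the design. */
static int nodes_after_expansion(int i) {
    static const int counts[] = {0, 25, 30, 41};
    return (i >= 1 && i <= 3) ? counts[i] : -1;
}

/* Selection loop from Table 4.2. */
static int initial_stride(int height, int stages) {
    int i = (height - stages > 1) ? height - stages : 1;
    for (;; i++) {
        int n = nodes_after_expansion(i);
        if (n < 0) return -1;                     /* no more data: give up */
        double per_stage = (double)n / stages;
        if ((1 << (i - 1)) < per_stage && per_stage < (1 << i))
            return i;      /* average nodes per stage fit the window */
    }
}

int main(void) {
    printf("initial stride i = %d\n", initial_stride(6, 7));   /* prints 3 */
    return 0;
}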
Using the algorithm in the above table to find the initial stride for the uni-bit trie above,
the first iteration through the loop would lead to the trie structure that would be
equivalent to that of the uni-bit trie (refer Figure 4.4) as the value of the initial stride (i)
will be 1.
For the second iteration through the algorithm, a multi-bit trie structure with initial stride,
i = 2 is constructed and is shown in the following diagram in Figure 4.5.
Figure 4.5: Multi-bit trie with initial stride i = 2
With the value of i = 2, the IF condition in the algorithm fails and therefore the value of i
is incremented by 1 and another iteration through the algorithm is made. The trie
structure resulting from an initial stride of 3 is shown in the tree diagram in Figure 4.6.
Figure 4.6: Multi-bit trie with initial stride, i = 3
With this multi-bit trie structure, six subtries are created with their roots at nodes a, b,
c, d, e and f, while the prefixes 001* and 011* resolve directly to an output port in
stage 1. The leaf pushing and the use of multi-bit tries optimize the forwarding table
entries by creating subtries which now represent a higher level of network address
aggregation.
As the next step, the six subtries obtained in the multi-bit trie are now converted into a
segmented queue structure starting at the root of each subtrie till the last node of that
subtrie. The nodes at each level of the subtrie are mapped to one segment in the queue. In
this way, six segmented queues are created as shown below.
As an example of the creation of a segmented queue structure, let us consider the
subtrie rooted at node “a”.
Figure 4.7: Subtrie with root at node “a” in the multi-bit trie from Figure 4.6
As there are two levels in the subtrie below the root node, there are two segments in the
queue corresponding to this subtrie. Starting at node ‘a’, its child nodes fill the first
segment of the queue, the left child first (in this case node P1) followed by the right
child (in this case node ‘g’) in the second location of the first segment.
After this is done, the queue will be as shown below.

P1, g

Next, if nodes P1 and ‘g’ had child nodes, their child nodes would be added to the
second segment in the same way that nodes P1 and ‘g’ were added to the first segment.
Here, since node P1 does not have any child nodes, only the nodes below node ‘g’ are
added to the second segment. The queue would then appear as follows (the ‘|’ separates
the two segments).

P1, g | P11, P1
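The construction just described is a breadth-first traversal in which each level of the subtrie fills one segment, left child first. The C sketch below (illustrative only; the node and helper names are ours) reproduces the queue for subtrie “a”.

#include <stdio.h>

typedef struct N { struct N *l, *r; const char *name; } N;

/* Walk the subtrie level by level; each level's children, left child
 * first, form one segment of the queue. The root itself is not queued,
 * since all subtrie roots go to stage 1 of the pipeline. */
static void build_queue(N *root) {
    N *cur[16]; int nc = 0;
    cur[nc++] = root;
    int first = 1;
    while (nc > 0) {
        N *next[16]; int nn = 0;
        for (int i = 0; i < nc; i++) {
            if (cur[i]->l) next[nn++] = cur[i]->l;
            if (cur[i]->r) next[nn++] = cur[i]->r;
        }
        if (nn > 0) {
            if (!first) printf("| ");            /* segment boundary */
            for (int i = 0; i < nn; i++) printf("%s ", next[i]->name);
            first = 0;
        }
        for (int i = 0; i < nn; i++) cur[i] = next[i];
        nc = nn;
    }
    printf("\n");
}

int main(void) {
    N p1a = {0, 0, "P1"}, p11 = {0, 0, "P11"}, p1b = {0, 0, "P1"};
    N g = {&p11, &p1b, "g"};
    N a = {&p1a, &g, "a"};
    build_queue(&a);     /* prints: P1 g | P11 P1 */
    return 0;
}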
Repeating the same process over all the subtries considering a different segmented queue
for each subtrie, the following queues will be created.
Segmented queue for subtrie with root at node “a”: P1, g | P11, P1
Segmented queue for subtrie with root at node “b”: h, i | P1, q, P13 | P1, P9
Segmented queue for subtrie with root at node “c”: j, P6 | P1, q | P13, P2, r
Segmented queue for subtrie with root at node “d”: k, l | P4, P7, s | P10, P4
Segmented queue for subtrie with root at node “e”: P8, m | P5, P2
Segmented queue for subtrie with root at node “f”: P12, n | P12, P15
Also, since the roots of all subtries are mapped to the memory block at the first stage of
the pipeline, the total number of memory stages available for mapping the remaining
nodes onto the 7 stage pipeline is now 6. To achieve an even distribution of nodes at each
stage, the total number of nodes are divided by the number of stages available to find the
number of nodes to be mapped to each stage. In this case, 6 nodes can be mapped to each
stage from stage 2 to stage 7.
These segmented queues are next sorted in descending order based on,
A. the number of segments in each queue, and
B. the number of nodes in the first segment of each queue.
The nodes are then popped from the queues in order; only the segments at the front of the segmented queues are allowed to pop nodes. When the current stage's memory fills up, the queues are re-sorted in descending order based on the above two criteria, and the popping continues until all the nodes in all queues are mapped to the memories of the pipeline stages.
The node distribution across the 7 pipeline stages after the mapping process is shown in Figure 4.8 on the following page.
Figure 4.8: Pipeline structure with nodes mapped to its memory modules
As seen in the diagram above, the root nodes of all six subtries are mapped to the first stage's memory block; the first stage is therefore usually not balanced. The remaining nodes are evenly distributed over the remaining six stages. The last stage has fewer nodes, and any future nodes added to the trie due to variations in the network topology can be mapped to the last stage.
4.2 IMPLEMENTATION OF THE HARDWARE
The general structure of the hardware to implement this search process is presented in
Figure 4.9.
Figure 4.9: General structure of the pipelined hardware
Each stage in the pipeline shown above consists of several logic units and a memory block to store the data corresponding to the nodes mapped to that particular stage. The logic units and the memory block are synchronous in their operation and are described in the sections that follow. There are two main inputs to this design:
1. Packet_in: a 44-bit packet containing two fields. The first field is the 32-bit destination IP address whose forwarding information is to be searched for in the pipeline structure. The second field is 12 bits of write information for any stage that may require an update operation. (A packing sketch is given after this list.)
Figure 4.10: Input packet format
2. Clock: This is the clock signal used for the synchronous operations in the logic
units and also for the synchronous memory access at each stage.
The output from the hardware is a 4 bit number representing the output port on the router
through which the IP packet received will have to be forwarded.
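As an illustrative sketch, a lookup-only input packet can be assembled from these fields as follows (the signal names here are chosen only for illustration; the field widths follow Figures 4.10 and 4.16):

//Sketch: packing the 44-bit Packet_in (32-bit IP address + 12-bit write data)
wire [31:0] ip_addr = 32'b11000000_10101000_00001010_00001010; //192.168.10.10
wire wr_en = 1'b0; //no update: this is a pure lookup request
wire [2:0] wr_stage = 3'b000; //stages to skip before the write (unused here)
wire [7:0] wr_data = 8'b00000000; //data to be written (unused here)
wire [43:0] Packet_in = {ip_addr, wr_en, wr_stage, wr_data};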
4.2.1 DATA ORGANIZATION IN MEMORY BLOCKS
Each of the seven memory blocks used in this implementation has 32 memory locations, each holding up to 8 bits, for a total of 256 bits (32 bytes) of data per memory bank. Every node mapped to a stage in the pipeline occupies two memory locations: one representing the child node when the next bit in the IP address is 0, and another representing the child node when the next bit is 1.
Each 8-bit word in memory holds the partial address of the memory location where the next child node is present (4 bits), an enable bit (1 bit) used to enable memory access at the next stage where the child node is present, and the stage number (3 bits) at which a memory block has to be accessed next to find the child node. The data packet output from a memory block at the end of a clock cycle while a search is in progress is shown in Figure 4.11 on the next page.
Figure 4.11: Data packet from memory block
The data in the memory blocks at each of the pipeline stages that forms a part of the
forwarding table used for the IP lookup process in this implementation is organized in the
tables listed in Appendix A.
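The following sketch shows how the three fields of this 8-bit word are sliced out; the bit positions match the Data_in usage in the Appendix B modules:

//8-bit memory word: [7:4] partial address, [3] enable bit, [2:0] stage number
wire [7:0] mem_word; //word read from a stage memory block
wire [3:0] partial_addr = mem_word[7:4]; //upper 4 bits of the child node's address
wire enable_bit = mem_word[3]; //1 -> enable the memory access at the target stage
wire [2:0] stage_num = mem_word[2:0]; //number of stages to skip before that access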
4.2.2 LOGICAL BLOCKS USED AT EACH STAGE
1. The Shift Register block
Each of the seven stages in this implementation has a shift register used to shift the destination IP address, whose output port number is being looked up, to the left by either 4 bits or 1 bit. This shift register is a synchronous module, shifting data left on every clock cycle. It is followed by a register used to synchronize the shift register's output data with the output data from the other blocks of that stage. A block diagram of the shift register module is shown in Figure 4.12. This module has the incoming IP address and the clock signal as inputs.
Figure 4.12: Shift register module
2. Memory Module - Stage 1
The memory module at the first stage of the pipeline has a slightly different structure from the memory modules used in the subsequent stages. It consists of a 32 × 8-bit memory block followed by a register and a 2:1 multiplexer.
Figure 4.13: Stage 1 Memory module
The Address input is a 5-bit input used to access a memory location for a read or a write at every clock cycle when a lookup or update request arrives. The Write Enable signal enables the memory block's write process. The Write Data input holds the data to be written into the memory block in the event that an update has to be made.
The register at the output registers the memory block's output and synchronizes it with the outputs from the other blocks. Finally, the 2:1 multiplexer is a combinational block used to pass data to the next stage in the pipeline. The select line input (Sel) to the multiplexer is the enable bit (bit 3 in this case) of the data packet output from the memory.
3. Memory Module - All other stages
The memory module used in all stages of the pipeline other than stage 1 has a structure that differs from the one used in stage 1. This module consists of a concatenation block and a memory block. The register and 2:1 multiplexer that were present at stage 1 are now located outside the memory module, in the stage's top-level module. The structure of the memory module used here is represented in the block diagram in Figure 4.13a.
Figure 4.13a: Memory module structure
The inputs to this module are the 8 bits of data from the previous stage and the MSB of the shifted IP address from the previous stage. These two inputs are connected to the concatenation block, where the partial address in the Data_In input and the 1-bit MSB of the IP address are concatenated to form the address of the memory location to be accessed at that particular stage of the pipeline, as shown in the sketch below. The other inputs are the Write Enable, the Write Data and the clock signal, which have the same functionality as described for the stage 1 memory module.
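In other words, the address presented to the memory at these stages is formed as in the following sketch, mirroring the concatenation block of Appendix B (signal names assumed for illustration):

//5-bit memory address at stages 2 to 7: the 4-bit partial address from the
//previous stage's data frame followed by the MSB of the shifted IP address
wire [4:0] mem_addr = {Data_in[7:4], IP_in[31]};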
4. Node Distance Check module
The second constraint used to map the trie nodes to the pipeline stages states that nodes on the same level of a subtrie can be mapped to memory blocks of different pipeline stages. With this type of distribution, there is a need to introduce no-operations so that certain stages can be skipped and the search resumed at the stage where the next node is present. This is accomplished using the enable bit and the stage number field in the data packet output from the memory block, together with the Node Distance Check module.
The block diagram of this module is presented in Figure 4.14.
Figure 4.14: Node distance check module
The operation of this module is as follows:
• When stages are to be skipped, the data from the previous stage is passed to the Node Distance Check module.
• In the first module, named Node Distance Check, the bits representing the stage number are extracted and checked to see if the stage number is 0. If not, the value is decremented by 1 and forwarded to the next stage.
• In the module labeled NDC Enable, if the stage number value is 0, the enable bit is set; otherwise the enable bit, which will have a value of 0, is passed forward unchanged.
• In the next module, named NDC Concatenation, the partial memory address (4 bits), enable bit (1 bit) and stage number (3 bits) are combined to form the data output passed to the next stage in the pipeline.
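The three sub-modules of Figure 4.14 amount to the condensed behavior sketched below (a combinational equivalent of the NDC, NDC Enable and NDC Concatenation listings in Appendix B; in the actual design the concatenated output is registered on the clock edge):

//Condensed behavioral sketch of the Node Distance Check path
module NDC_Sketch
(
input [7:0] Data_in, //data frame from the previous stage
output [7:0] Data_out //frame forwarded towards the next stage
);
//decrement the remaining node distance, saturating at zero
wire [2:0] dist = (Data_in[2:0] == 3'd0) ? 3'd0 : Data_in[2:0] - 3'd1;
//once the distance reaches zero, assert the enable bit for the next stage's memory
assign Data_out = {Data_in[7:4], (dist == 3'd0), dist};
endmodule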
5. Update Module
Routers must frequently accommodate updates resulting from changes in the network topology surrounding the networks to which they are connected. This update operation has to be carried out without stalling the pipeline, so that the lookup operation is not disrupted. The Update module is used here to accomplish this functionality. The block diagram of this module is shown in Figure 4.15.
Figure 4.15: Update module
The data to be written during an update to the memory is passed into the pipeline as part of the 44-bit Packet_in data input. This data occupies the last 12 bits of the Packet_in packet; the structure of the write data is shown in Figure 4.16 on the next page.
Figure 4.16: Write data packet
Here, the first bit is the enable bit, which indicates whether the write is destined for the current stage of the pipeline. The next 3 bits are the stage number, indicating the stage at which the update has to be performed, and the last 8 bits are the actual data to be written to a particular location in the memory block.
The operation of the update module is as follows:
• When Write_Data_In, the 12-bit update data, arrives, the 3-bit stage number is extracted and checked to see if its value is 0. If the value is 0, the Write Enable to the memory module is set to 1 and the enable bit in the data frame is set to 0.
• In the concatenation module, the data frame is rebuilt and forwarded to the demultiplexer.
• At the demultiplexer, the enable bit in the data frame is used as the select line input. When the bit is 0, the write data is forwarded to the write data line of the current stage's memory module; otherwise it is forwarded to the update module of the next stage in the pipeline.
4.2.3 SEARCH PROCESS IN STAGE 1
The first stage of the pipeline structure follows a search algorithm that is slightly different from the one followed by all subsequent stages. This is a result of all the root nodes being mapped to this stage, whereby every IP lookup request has to start at this stage and flow forward. The first stage of the pipeline consists of the following units.
1. A shift register module used to left-shift the incoming IP address by 4 bits.
2. A memory module that holds either the port number or the address and stage number at which the child node exists.
3. A write block used for the update process whenever a node has to be deleted or added depending on variations in the network topology surrounding the router.
A block diagram representing stage 1 of the pipeline is presented in Figure 4.17.
Figure 4.17: Block diagram of stage 1 of pipeline
The search process at this stage is as follows.
• The 32-bit IP address and the 12-bit Write Data are extracted from the Packet_in input.
• The 4 MSBs of the IP address are concatenated with a 0 bit and taken as the address input to the memory module.
• The Write Data input is checked in the write module to verify whether any update has to be made to the data in the memory module. If the Packet_in input contains update information for this stage, the Write Enable to the memory module is set to 1 and the Write Data is forwarded to the memory module, where the update is made to the memory location pointed to by the 5-bit address input.
• If no update is found, the Packet_in data is a lookup request. The lookup proceeds by accessing the memory location pointed to by the address input; the data read points either to the output port number on the router or to the next stage to be accessed to find the next node in the trie.
• Finally, the shift register module left-shifts the incoming IP address by 4 bits and forwards it to the next stage in the pipeline.
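The stage 1 lookup path can be summarized by the behavioral sketch below (the full structural modules appear in Appendix B; the update path is omitted here for brevity):

//Condensed behavioral sketch of the stage 1 search
module Stage1_Search_Sketch
(
input [31:0] IP_in,
input clk,
output reg [7:0] Data_out,
output reg [31:0] IP_shift_out
);
reg [7:0] mem [0:31]; //32 x 8-bit stage 1 memory (contents as in Appendix A)
always @(posedge clk)
begin
//the 4 MSBs, prefixed with a 0 bit, form the 5-bit memory address:
//3 bits select a stride-3 root node and the 4th bit selects its 0/1 child
Data_out <= mem[{1'b0, IP_in[31:28]}];
//consume the 4 bits used at this stage before passing the address on
IP_shift_out <= IP_in << 4;
end
endmodule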
4.2.4 SEARCH PROCESS IN OTHER STAGES OF THE PIPELINE
The search process in the remaining stages of the pipeline involves a few extra steps due to Constraint 2 used for mapping the trie onto the pipeline stages. Extra logic is needed to verify whether the memory module at a certain stage has to be accessed, or whether the data from the previous stage has to be forwarded to later stages of the pipeline where the next node's information may be present.
The following blocks are included in these stages of the pipeline.
1. A shift register module used to left-shift the incoming IP address from the previous stage by 1 bit depending on the enable bit to the shift register. The enable input here is the enable bit of the Data_In packet arriving from the previous stage.
2. A memory module that holds either the port number or the address and stage number at which the child node exists.
3. A Node Distance Check module used to verify for which stage of the pipeline the incoming data is intended.
4. A write block used for the update process whenever a node has to be deleted or added depending on variations in the network topology surrounding the router.
A block diagram representing this stage module of the pipeline is presented in Figure 4.17a on the next page.
Figure 4.17a: Block diagram of the stage module of a pipeline
The search process at these stages is as follows.
• The shifted IP address from the previous stage is passed into the shift register module, which at these stages is modeled to shift the IP address 1 bit to the left. This module has an enable input, which is bit 3 of the Data_In frame from the previous stage; depending on its value, the shift operation is either enabled or disabled. This, again, is a requirement of Constraint 2 of the node mapping process, where certain stages in the pipeline may have to be skipped. When a stage is skipped, the IP address must not be shifted, to maintain the correctness of the lookup.
• The Data_In packet from the previous stage is sent as input to the memory module and the Node Distance Check module.
• At the Node Distance Check module, the stage number in the Data_In frame is checked and the value of the enable bit is altered depending on whether the stage number is 0 or not.
• In the write module, the Write Data input from the previous stage is checked to see whether the memory access at this stage is for a node lookup or for an update operation. Depending on the data present in the Write Data, the Write Enable line to the memory block is either set to 1 or left unchanged at 0.
• At the memory module, if the data in the Data_In frame is intended for the present stage, the 4 MSBs of the Data_In frame are extracted and concatenated with the MSB of the shifted IP address from the shift register module. This 5-bit address is then used to access the memory location holding the details of the node being searched for, or to update the data at that location.
• The registers at the outputs of each module synchronize the outputs from every module.
• The outputs from the memory module and the Node Distance Check module are connected to the inputs of a 2:1 multiplexer. The select line of this multiplexer is the negated value of the enable bit contained in the Data_In frame. Depending on this bit, either the data frame from the memory module or that from the Node Distance Check module is sent as Data_Out to the next stage of the pipeline.
This search algorithm is implemented at every stage of the pipeline other than the first stage, where the algorithm presented in the previous section is used. The pipeline is completely synchronous. For this implementation, when a lookup request arrives at the input, the first output arrives after seven clock cycles; subsequently, there is one output at the end of every clock cycle. In this way, the pipelined search process is implemented, with the general structure presented in Figure 4.9.
CHAPTER 5
SIMULATION RESULTS AND ANALYSIS
5.1 SIMULATION ANALYSIS
Here, the IP lookup process with a pipelined architecture and uniform node distribution is implemented for a lookup table containing fourteen prefixes. Typically, a router's forwarding table can contain thousands of prefixes, with each router having hundreds of output ports. As the structure of the trie depends on the prefix values in a forwarding table, with large routing tables the trie structure, the number of pipeline stages required and the node distribution will vary, but the flow of the search process down the pipeline and the way the nodes are distributed across the stages remain the same.
For this implementation, a total of fourteen output ports are considered, numbered 1 through 14, with each port representing one prefix node in the final leaf-pushed trie. Port 7 is the default port, chosen whenever the router is unable to find a best match for the input IP address. This creates a fail-safe by which IP packets are forwarded on the default port to the next network, where a best-matching prefix for the destination network may be found.
The RTL model, designed in Verilog HDL, was simulated using several test vectors. The different operating conditions of the pipeline structure were tested to verify that the expected results were obtained at the output. The test vectors used to test the IP lookup operation are listed below in Table 5.1.
IP ADDRESS: 192.168.10.10 (11000000_10101000_00001010_00001010)
WRITE DATA: 0 000 00000000
INPUT DATA: 44'b11000000101010000000101000001010000000000000

IP ADDRESS: 208.120.26.0 (11010000_01111000_00011010_00000000)
WRITE DATA: 0 000 00000000
INPUT DATA: 44'b11010000011110000001101000000000000000000000

IP ADDRESS: 10.10.10.1 (00001010_00001010_00001010_00000001)
WRITE DATA: 0 000 00000000
INPUT DATA: 44'b00001010000010100000101000000001000000000000

IP ADDRESS: 198.225.162.1 (11000110_11100001_10100010_00000001)
WRITE DATA: 0 000 00000000
INPUT DATA: 44'b11000110111000011010001000000001000000000000

IP ADDRESS: 21.149.202.229 (00010101_10010101_11001010_11100101)
WRITE DATA: 0 000 00000000
INPUT DATA: 44'b00010101100101011100101011100101000000000000

IP ADDRESS: 53.214.24.1 (01001001_11010110_00011000_00000001)
WRITE DATA: 0 000 00000000
INPUT DATA: 44'b01001001100101011100101011100101000000000000

IP ADDRESS: 141.23.170.213 (10001101_00010111_10101010_11000011)
WRITE DATA: 0 000 00000000
INPUT DATA: 44'b10001101100101011100101011100101000000000000

IP ADDRESS: 177.224.140.10 (10110001_11100000_10001010_00001010)
WRITE DATA: 0 000 00000000
INPUT DATA: 44'b10110001100101011100101011100101000000000000

IP ADDRESS: 213.85.224.18 (11010101_01010101_11100000_00010010)
WRITE DATA: 0 000 00000000
INPUT DATA: 44'b11010101100101011100101011100101000000000000

IP ADDRESS: 245.188.8.20 (11110101_10111100_00001000_00010100)
WRITE DATA: 0 000 00000000
INPUT DATA: 44'b11110101100101011100101011100101000000000000

TABLE 5.1: List of test vectors used to test the IP lookup request outputs
It was observed that, for the above test vectors, the outputs obtained matched the expected outputs. A screen capture of the simulation waveforms is presented in Figure 5.1.
Figure 5.1: Simulation output with outputs from each of the 7 stages in the pipeline
The diagram in Figure 5.1 shows the detailed output value at each of the seven stages of the pipeline architecture for every test vector given as input to the search module. The output at the very end, named "PORT NUMBER", is the final output of the model and represents the port on the output side of a router through which the IP packet would be forwarded to reach the next-hop router. Furthermore, it is observed that the first output arrives after seven clock cycles, equal to the number of stages in the pipeline structure; after that, an output arrives at the end of every clock cycle. A simplified version of the simulation diagram in Figure 5.1 is presented below in Figure 5.2, with only the inputs and the final "PORT NUMBER" output shown.
Figure 5.2: Simplified simulation output showing only the inputs and final outputs
To show the operation of the Node Distance Check (NDC) module, one of the above test vectors was selected and the outputs of the NDC modules at the stages containing the nodes 'c' (stage 1), 'j' (stage 2) and 'r' (stage 4) were observed. The diagram in Figure 5.3 shows the observed outputs of the NDC module at each stage.
Figure 5.3: Simulation outputs showing operation of NDC module
At stage 1, the output points to the next node in stage 2. The output from stage 2 points to the next node at stage 4, clearing the enable bit so that stage 3 is skipped. The NDC module at stage 3 decrements the stage value by 1 to zero, which indicates that the next node is present at stage 4, and the NDC Enable module at stage 3 sets the enable bit in stage 3's Data_out so that the memory at stage 4 can be accessed.
As described above, the write operation is used when updates to the pipeline structure are required due to a node dropping out of a network or a new node joining one. The simulation diagram in Figure 5.4 shows the operation of the write process when node P14 in the trie structure is dropped from the network. This requires that the memory location of the stage representing node P14 be cleared to 0, so that any packet with an IP address matching node P14's prefix is forwarded through the router's default output port to the next network.
Figure 5.4: Simulation outputs showing write operation
When node P14 drops off the network to which it is connected, an input packet with the write operation enabled is inserted into the pipeline with the IP address pointing to node P14, which is at stage 6 in this implementation. The write data in the input packet indicates how many stages have to be skipped to reach the stage where the memory is to be updated. The write module at each stage of the pipeline decrements this value by 1 until it reaches the stage where the value becomes 0. At that point, the write enable to the memory is asserted, and the data to be written is presented to the memory, where the location pointed to by the IP address is updated. In this case, the data is all zeroes, indicating that a node has just been deleted. If a node has to be added, the appropriate data for that node is forwarded through the write data bits, and the memory locations corresponding to that node are updated to contain the new node's information.
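For instance, the first update packet issued in the Appendix B testbench to delete node P14 at stage 6 has the following form (values taken from that testbench; the stage field of 101 means the write skips five stages and lands at stage 6):

//Sketch: update packet deleting node P14 (values from the Appendix B testbench)
wire [31:0] ip_p14 = 32'b10001101_00010111_10101010_11000011; //IP matching P14's prefix
wire [11:0] wr_field = {1'b1, 3'b101, 8'b00000001}; //write enable, stage count, write data
wire [43:0] Packet_in = {ip_p14, wr_field};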
5.2 IMPLEMENTATION ANALYSIS
The model was implemented on a Xilinx Spartan 6 XC6SLX9 FPGA. After synthesis and place and route, it was observed that this implementation had a maximum operating frequency of 147.189 MHz, i.e. a period of 6.79 ns per lookup, for a total of about 147 million lookups per second. This translates to a throughput of 47.1 Gbps for a minimum packet size of 40 bytes. The design used 149 of the 5720 available slice LUTs and 3 of the 32 18Kb block RAMs available in this FPGA.
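The throughput figure follows directly from the lookup rate and the minimum packet size:

147.189 × 10^6 lookups/s × 40 bytes/packet × 8 bits/byte ≈ 47.1 × 10^9 bits/s = 47.1 Gbps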
The implementation of the IP lookup process using a pipeline structure with uniform distribution of the trie nodes across all the stages, together with the search process described above, shows that an IP lookup can be completed every clock cycle. The lookup delay for each IP packet arriving at the router's input is constant and equal to the number of stages in the pipeline. The memory size at each pipeline stage will vary depending on the total number of subtrie nodes. Also, since the search process here is linear, with only one entry point and one exit point, the order of the lookups for incoming IP packets is maintained. With the linear operation of the pipeline, updates can also be made without disrupting the lookup process, by inserting write operations between the lookup requests.
CHAPTER 6
CONCLUSION
6.1 PROJECT ACHIEVEMENTS
With the current pace of growth of the networks connected by the Internet and the rapid increase in the number of connected devices, hardware implementation of the routing process has proven to be the most efficient and least time-consuming way of routing IP packets from source to destination. The number of lookups that routers must perform has increased rapidly, and software implementations of IP lookup cannot keep up; one main reason is the additional overhead of running the routing process in software on a general-purpose processor. By implementing the lookup on an FPGA, the power requirements are also considerably lower than when using a processor. Implementation on an FPGA also makes the lookup process and the entire pipeline structure reconfigurable.
The pipelined IP lookup architecture with uniform node distribution implemented in this project has several benefits over other pipelined IP lookup implementations. For one, it creates balanced memory blocks across all the memory stages of the pipeline. This yields uniform memory access times at all of the memory blocks, thereby speeding up the lookup process. The method used in this implementation to distribute the trie nodes across the pipeline stages also produces a linear pipeline structure with one entry point and one exit point. This eliminates the need for any loopbacks or backtracking in the pipeline structure, along with the extra logic units that would be required to implement such loopbacks. The linear pipeline has the added advantage of maintaining the sequence of the lookup requests arriving at the input, which in turn eliminates the need for any hardware to track input packet order and match packets with their corresponding outputs.
The only drawback of this implementation arises when a new router comes online with an initially small routing table: the pipelined architecture has to be reconfigured several times as the entries in the forwarding table grow while the router discovers new networks that it can reach. Once the forwarding table has grown to a significant size, however, any changes in the network topology around the router, and the corresponding changes to the forwarding table, can be made by updating the entries with the write process, without disrupting the lookup process.
6.2 SCOPE FOR IMPROVEMENTS
With the pipelined architecture, the IP lookup process can also be scaled up to perform searches on forwarding tables for IPv6 addresses, which are 128 bits long (with forwarding typically performed on a network prefix of up to 64 bits), as against the 32 bits of the currently more widespread IPv4 addressing scheme. This is possible because, even though the number of address bits increases, the packet routing process remains the same, although the forwarding tables can be considerably larger for IPv6 than for IPv4. With the pipelined lookup architecture implemented on an FPGA, and given the reconfigurability that FPGAs offer, the structural changes required to implement lookups on IPv6 addresses can be made easily.
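Since the stage modules listed in Appendix B are parameterized on the address width, part of this change reduces to re-instantiating them with a wider IP field, as in the sketch below (an assumption for illustration only: the strides, the number of stages, the hard-coded bit selects inside the memory modules and the memory contents would all have to be re-derived for an IPv6 prefix set):

//Hypothetical re-instantiation with a 64-bit IPv6 routing prefix as the search key
Stage1_Top #(64, 8) Stage1_v6 (
.IP_in(IPv6_prefix_in), //64-bit prefix field of the destination address
.wr_Data_In(wr_Data_In),
.clk(clk),
.Data_out(DS1_to_DS2),
.IP_shift_out(Shift1_Shift2),
.wr_Data_Out(WData1_WData2)
);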
REFERENCES
1. V. Srinivasan and G. Varghese. Fast address lookups using controlled prefix
expansion. ACM Transactions on Computer Systems, Feb 1999.
2. http://en.wikipedia.org/wiki/IP_address, Retrieve date: August 2013
3. http://en.wikipedia.org/wiki/IPv4, Retrieve date: August 2013
4. http://www.networkcomputing.com/netdesign/ip101.html, Retrieve date: August 2013
5. F. Baboescu, D. M. Tullsen, G. Rosu, and S. Singh. A tree based router search engine
architecture with single port memories. Proceedings of ISCA ’05, Jun 2005.
6. S. Kumar, M. Becchi, P. Crowley, and J. Turner. Camp: fast and efficient IP lookup
architecture. Proceedings of ANCS’06, Dec 2006.
7. M. A. Ruiz-Sanchez, E. W. Biersack, and W. Dabbous. Survey and taxonomy of IP
address lookup algorithms. IEEE Network, Mar/Apr 2001.
8. A. Basu and G. Narlikar. Fast incremental updates for pipelined forwarding engines.
Proceedings of INFOCOM ’03, Mar/Apr 2003.
9. H. Le, W. Jiang, and V. K. Prasanna. A sram-based architecture for trie-based ip
lookup using fpga. In Proc. FCCM ’08, 2008.
10. Ravikumar V.C. and Rabi N. Mahapatra. TCAM architecture for IP lookup using prefix properties. IEEE Micro, 2004.
11. W. Jiang and V. K. Prasanna. A memory balanced linear pipeline architecture for trie based IP lookup. 15th Annual IEEE Symposium on High-Performance Interconnects, 2007.
12. Xilinx Spartan 6 FPGA Datasheet (http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf), Retrieve date: September 2013
13. http://embeddedmicro.com/tutorials/mojo, Retrieve date: September 2013
APPENDIX A
DATA IN MEMORY MODULES
1. Stage 1 Memory Module
NODE   ADDRESS   DATA
a      0000      010 0 010
a      0001      011 0 010
P1     0010      001 0 110
P1     0011      001 0 110
b      0100      000 1 000
b      0101      001 1 000
P1     0110      001 0 110
P1     0111      001 0 110
c      1000      100 1 000
c      1001      101 1 000
d      1010      010 1 000
d      1011      011 1 000
e      1100      100 0 010
e      1101      101 0 010
f      1110      000 0 011
f      1111      001 0 011
2. Stage 2 Memory Module
NODE   ADDRESS   DATA
h      0000      000 1 000
h      0001      001 1 000
i      0010      000 1 000
i      0011      010 1 000
k      0100      011 1 000
k      0101      100 1 000
l      0110      101 1 000
l      0111      011 1 000
j      1000      000 0 001
j      1001      001 0 001
P6     1010      011 0 101
P6     1011      011 0 101
-      1100      000 0 000
-      1101      000 0 000
-      1110      000 0 000
-      1111      000 0 000
3. Stage 3 Memory Module
NODE   ADDRESS   DATA
P1     0000      001 0 100
P1     0001      001 0 100
q      0010      010 0 010
q      0011      011 0 010
P13    0100      010 0 100
P13    0101      010 0 100
P4     0110      100 0 100
P4     0111      100 0 100
P7     1000      100 0 100
P7     1001      100 0 100
s      1010      100 0 001
s      1011      101 0 001
-      1100      000 0 000
-      1101      000 0 000
-      1110      000 0 000
-      1111      000 0 000
4. Stage 4 Memory Module
NODE   ADDRESS   DATA
P2     0000      011 0 011
P2     0001      011 0 011
r      0010      000 0 001
r      0011      001 0 001
P1     0100      001 0 011
P1     0101      001 0 011
g      0110      010 0 001
g      0111      011 0 001
P8     1000      101 0 011
P8     1001      101 0 011
m      1010      100 0 001
m      1011      101 0 001
-      1100      000 0 000
-      1101      000 0 000
-      1110      000 0 000
-      1111      000 0 000
5. Stage 5 Memory Module
NODE   ADDRESS   DATA
P12    0000      110 0 010
P12    0001      110 0 010
n      0010      000 0 001
n      0011      001 0 001
P1     0100      010 0 010
P1     0101      010 0 010
P9     0110      010 0 010
P9     0111      010 0 010
P10    1000      100 0 010
P10    1001      100 0 010
P4     1010      100 0 010
P4     1011      100 0 010
-      1100      000 0 000
-      1101      000 0 000
-      1110      000 0 000
-      1111      000 0 000
6. Stage 6 Memory Module
NODE   ADDRESS   DATA
P2     0000      011 0 001
P2     0001      011 0 001
P14    0010      011 0 001
P14    0011      011 0 001
P11    0100      001 0 001
P11    0101      001 0 001
P1     0110      001 0 001
P1     0111      001 0 001
P5     1000      101 0 001
P5     1001      101 0 001
P2     1010      101 0 001
P2     1011      101 0 001
-      1100      000 0 000
-      1101      000 0 000
-      1110      000 0 000
-      1111      000 0 000
7. Stage 7 Memory Module
NODE   ADDRESS   DATA
P12    0000      110 1 000
P12    0001      110 1 000
P15    0010      110 1 000
P15    0011      110 1 000
-      0100      000 0 000
-      0101      000 0 000
-      0110      000 0 000
-      0111      000 0 000
-      1000      000 0 000
-      1001      000 0 000
-      1010      000 0 000
-      1011      000 0 000
-      1100      000 0 000
-      1101      000 0 000
-      1110      000 0 000
-      1111      000 0 000
APPENDIX B
IMPLEMENTATION CODES
1. TOP MODULE OF IP LOOKUP STRUCTURE
module IP_Lookup_Top
(
input [31:0] IP_in,
input [11:0] wr_Data_In,
input clk,
output [7:0] Data_out,
output [31:0] Shift_out,
output [11:0] wr_Data_Out
);
//wires connecting BRAM data from one stage to the next
wire [7:0] DS1_to_DS2;
wire [7:0] DS2_to_DS3;
wire [7:0] DS3_to_DS4;
wire [7:0] DS4_to_DS5;
wire [7:0] DS5_to_DS6;
wire [7:0] DS6_to_DS7;
//wires connecting SRL data from one stage to the next
wire [31:0] Shift1_Shift2;
wire [31:0] Shift2_Shift3;
wire [31:0] Shift3_Shift4;
wire [31:0] Shift4_Shift5;
wire [31:0] Shift5_Shift6;
wire [31:0] Shift6_Shift7;
//wires connecting WRITE data from one stage to the next
wire [11:0] WData1_WData2;
wire [11:0] WData2_WData3;
wire [11:0] WData3_WData4;
wire [11:0] WData4_WData5;
wire [11:0] WData5_WData6;
wire [11:0] WData6_WData7;
//Parameter: IPSize -> IP address size to SRL, DSize -> Output data size to BRAM
Stage1_Top #(32,8) Stage1
(
.IP_in(IP_in),
.wr_Data_In(wr_Data_In),
.clk(clk),
.Data_out(DS1_to_DS2),
.IP_shift_out(Shift1_Shift2),
.wr_Data_Out(WData1_WData2)
);
//Parameter: IPSize -> IP address size to SRL, DSize -> Input data size to BRAM and NDC Mod, ASize -> BRAM address size
Stage2_Top #(32,8,5) Stage2 (
.IP_in(Shift1_Shift2),
.Data_in(DS1_to_DS2),
.wr_Data_In(WData1_WData2),
.clk(clk),
.Data_out(DS2_to_DS3),
.IP_out(Shift2_Shift3),
.wr_Data_Out(WData2_WData3)
);
//Parameter: IPSize -> IP address size to SRL, DSize -> Input data size to BRAM and NDC Mod, ASize -> BRAM address size
Stage3_Top #(32,8,5) Stage3 (
.IP_in(Shift2_Shift3),
.Data_in(DS2_to_DS3),
.wr_Data_In(WData2_WData3),
.clk(clk),
.Data_out(DS3_to_DS4),
.IP_out(Shift3_Shift4),
.wr_Data_Out(WData3_WData4)
);
//Parameter: IPSize -> IP address size to SRL, DSize -> Input data size to BRAM and NDC Mod, ASize -> BRAM address size
Stage4_Top #(32,8,5) Stage4 (
.IP_in(Shift3_Shift4),
.Data_in(DS3_to_DS4),
.wr_Data_In(WData3_WData4),
.clk(clk),
.Data_out(DS4_to_DS5),
.IP_out(Shift4_Shift5),
.wr_Data_Out(WData4_WData5)
);
//Parameter: IPSize -> IP address size to SRL, DSize -> Input data size to BRAM and NDC Mod, ASize -> BRAM address size
Stage5_Top #(32,8,5) Stage5 (
.IP_in(Shift4_Shift5),
.Data_in(DS4_to_DS5),
.wr_Data_In(WData4_WData5),
.clk(clk),
.Data_out(DS5_to_DS6),
.IP_out(Shift5_Shift6),
.wr_Data_Out(WData5_WData6)
);
//Parameter: IPSize -> IP address size to SRL, DSize -> Input data size to BRAM and NDC Mod, ASize -> BRAM address size
Stage6_Top #(32,8,5) Stage6 (
.IP_in(Shift5_Shift6),
.Data_in(DS5_to_DS6),
.wr_Data_In(WData5_WData6),
.clk(clk),
.Data_out(DS6_to_DS7),
.IP_out(Shift6_Shift7),
.wr_Data_Out(WData6_WData7)
);
//Parameter: IPSize -> IP address size to SRL, DSize -> Input data size to BRAM and NDC Mod, ASize -> BRAM address size
Stage7_Top #(32,8,5) Stage7 (
.IP_in(Shift6_Shift7),
.Data_in(DS6_to_DS7),
.wr_Data_In(WData6_WData7),
.clk(clk),
.Data_out(Data_out),
.IP_out(Shift_out),
.wr_Data_Out(wr_Data_Out)
);
endmodule
2. STAGE 1 MODULES
2.1 STAGE 1 TOP MODULE
//Parameter: IPSize -> IP addr size, DSize -> Memory data size
module Stage1_Top #(parameter IPSize=1,DSize=1)
(
input [IPSize-1:0] IP_in,
input [11:0] wr_Data_In,
input clk,
output [DSize-1:0] Data_out,
output [IPSize-1:0] IP_shift_out,
output [11:0] wr_Data_Out
);
wire [11:0] WData_bram;
wire wea_bram;
SRL_Mod #(IPSize) SRL_TOP_ST1 (
.IP_in(IP_in),
.clk(clk),
.IP_shift_out(IP_shift_out)
);
//Parameter: IP_size -> IP addr size, DSize -> Memory data size
BRAM_ST1_Mod #(IPSize,DSize) BRAM_TOP_ST1 (
.IP_in(IP_in),
.clk(clk),
.WData(WData_bram[7:0]),
.WEN(wea_bram),
.Data_out(Data_out)
);
Write_Mod_Top WRITE_TOP_ST1 (
// wr_Data_In: 12 bits -> 1 bit - wr_en, 3 bits - stage, 8 bits - write data
.wr_Data_In(wr_Data_In),
.clk(clk),
.wr_Data_Bram(WData_bram),
.wr_Data_Next(wr_Data_Out),
.wea(wea_bram)
);
endmodule
2.2 STAGE 1 SHIFT REGISTER MODULE
2.2.1 TOP MODULE OF SHIFT REGISTER
//Parameter: size -> IP data size
module SRL_Mod #(parameter size=1)
(
input [size-1:0] IP_in,
input clk,
output [size-1:0] IP_shift_out
);
wire [size-1:0] Data_to_Reg;
SRL_Stage1 #(size) Shift_ST1 (
.IP_in(IP_in),
.clk(clk),
.IP_out(Data_to_Reg)
);
Register #(size) Reg_SR_ST1 (
.In(Data_to_Reg),
.clk(clk),
.Out(IP_shift_out)
);
endmodule
2.2.2 SHIFT REGISTER MODULE
module SRL_Stage1 #(parameter IP_size = 1)
(
input [IP_size-1:0] IP_in,
input clk,
output reg [IP_size-1:0] IP_out
);
always@(posedge clk)
begin
IP_out <= IP_in<<4;
end
endmodule
2.2.3 REGISTER IN SHIFT REGISTER MODULE
module Register #(parameter size = 1)
(
input [size-1:0] In,
input clk,
output reg [size-1:0] Out
);
always@(negedge clk)
begin
Out <= In;
end
endmodule
2.3 STAGE 1 MEMORY MODULE
2.3.1 TOP MODULE OF MEMORY MODULE
//Parameter: IP_size -> IP addr size, DSize -> Memory data size
module BRAM_ST1_Mod #(parameter IP_size=1, DSize=1)
(
input [IP_size-1:0] IP_in,
input clk,
input [DSize-1:0] WData,
input WEN,
output [DSize-1:0] Data_out
);
wire [DSize-1:0] Data_to_Reg;
wire [DSize-1:0] Data_to_Mux;
BRAM_1 BRAM_Stage1 (
.clka(clk),
.wea(WEN),
.addra({1'b0, IP_in[31:28]}), //5-bit address: a 0 bit followed by the 4 IP MSBs
.dina(WData),
.douta(Data_to_Reg)
);
Register #(DSize) Reg_Data_Mux (
.In(Data_to_Reg),
.clk(clk),
.Out(Data_to_Mux)
);
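//Both multiplexer inputs receive the registered memory output, so at stage 1
//the multiplexer simply passes the data frame on to the next stage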
Multiplexer #(DSize) Mux_ST1 (
.Mux_in1(Data_to_Mux),
.Mux_in2(Data_to_Mux),
.Sel(~Data_to_Mux[3]),
.Mux_out(Data_out)
);
endmodule
2.3.2 REGISTER IN MEMORY MODULE
module Register #(parameter size = 1)
(
input [size-1:0] In,
input clk,
output reg [size-1:0] Out
);
always@(negedge clk)
begin
Out <= In;
end
endmodule
2.3.3 MULTIPLEXER IN MEMORY MODULE
module Multiplexer #(parameter Mux_size=1)
(
input [Mux_size-1:0] Mux_in1,
input [Mux_size-1:0] Mux_in2,
input Sel,
output reg [Mux_size-1:0] Mux_out
);
always@(Mux_in1 or Mux_in2 or Sel)
begin
if (Sel == 1'b0)
Mux_out <= Mux_in1;
else
Mux_out <= Mux_in2;
end
endmodule
2.4 STAGE 1 UPDATE MODULE
2.4.1 TOP MODULE OF UPDATE MODULE
module Write_Mod_Top
(
// wr_Data_In: 12 bits -> 1 bit - wr_en, 3 bits - stage, 8 bits - write data
input [11:0] wr_Data_In,
input clk,
output [11:0] wr_Data_Bram,
output [11:0] wr_Data_Next,
output wea
);
// wires connecting write module to concat module
wire wea_to_concat;
wire wren_to_concat;
wire [2:0] stage_to_concat;
wire [7:0]dout_to_concat;
//wires connecting concat module to demux and bram module
wire wea_reg;
wire wea_to_bram;
// wdata_demux: 12 bits -> 1 bit - wr_en, 3 bits - stage, 8 bits - wr_data
wire [11:0] wdata_demux;
wire [11:0] wdata_reg;
Write_Mod wr_top_mod (
.wr_en(wr_Data_In[11]),
.wr_stage(wr_Data_In[10:8]),
.wr_data(wr_Data_In[7:0]),
.wea(wea_to_concat),
.wr_en_out(wren_to_concat),
.stage_out(stage_to_concat),
.wr_dout(dout_to_concat)
);
Write_Concat wr_concat_top (
.wea(wea_to_concat),
.wr_en(wren_to_concat),
.wr_stage(stage_to_concat),
.wr_data(dout_to_concat),
.wea_out(wea),
.wr_out(wdata_demux)
);
Write_Demux wr_demux_top (
.wr_data(wdata_demux),
.sel(wdata_demux[11]),
.wr_bram(wr_Data_Bram),
.wr_next_stage(wdata_reg)
);
Register #(12) Reg_DNext (
.In(wdata_reg),
.clk(clk),
.Out(wr_Data_Next)
);
endmodule
2.4.2 WRITE MODULE OF UPDATE MODULE
module Write_Mod
(
input [7:0] wr_data,
input [2:0] wr_stage,
input wr_en,
output reg wea,
output reg wr_en_out,
output reg [2:0] stage_out,
output reg [7:0] wr_dout
);
always@(wr_data or wr_stage or wr_en) begin
if (wr_en == 1) begin
if (wr_stage == 0) begin
wea <= 1'b1;
wr_en_out <= 1'b0;
wr_dout <= wr_data;
stage_out <= wr_stage;
end
else begin
wea <= 1'b0;
wr_en_out <= 1'b1;
wr_dout <= wr_data;
stage_out <= wr_stage - 1;
end
end
else begin
wea <= 1'b0;
wr_en_out <= 1'b0;
wr_dout <= 8'b0;
stage_out <= 3'b0;
end
end
endmodule
2.4.3 CONCATENATION MODULE IN UPDATE MODULE
module Write_Concat
(
input wea,
input wr_en,
input [7:0] wr_data,
input [2:0] wr_stage,
input clk,
output reg [11:0] wr_out,
output reg wea_out
);
always@(wr_en or wr_stage or wr_data or wea)
begin
wea_out <= wea;
wr_out <= {wr_en,wr_stage,wr_data};
end
endmodule
2.4.4 DEMULTIPLEXER MODULE IN UPDATE MODULE
module Write_Demux
(
input [11:0] wr_data,
input sel,
output reg [11:0] wr_bram,
output reg [11:0] wr_next_stage
);
always@(wr_data or sel)
begin
if (sel == 0) begin
wr_bram <= wr_data;
wr_next_stage <= 12'b0;
end
else begin
wr_bram <= 12'b0;
wr_next_stage <= wr_data;
end
end
endmodule
2.4.5 REGISTER MODULE IN UPDATE MODULE
module Register #(parameter size = 1)
(
input [size-1:0] In,
input clk,
output reg [size-1:0] Out
);
always@(negedge clk)
begin
Out <= In;
end
endmodule
3. MODULES OF OTHER STAGES OF THE PIPELINE
3.1 TOP MODULE
//Parameter: IPSize -> IP address size to SRL, DSize -> Input data size to BRAM and NDC Mod, ASize -> BRAM address size
module Stage2_Top #(parameter IPSize=1,DSize=1,ASize=1)
(
input [IPSize-1:0] IP_in,
input [DSize-1:0] Data_in,
input [11:0] wr_Data_In,
input clk,
output [DSize-1:0] Data_out,
output [IPSize-1:0] IP_out,
output [11:0] wr_Data_Out
);
wire [11:0] WData_bram;
wire wea_bram;
wire [DSize-1:0] Data_BReg;
wire [DSize-1:0] Data_NReg;
wire [DSize-1:0] Data_to_Mux1;
wire [DSize-1:0] Data_to_Mux2;
//Parameter: IP_size -> IP address size
SRL_Mod_2 #(IPSize) SRL_TOP_ST2 (
.IP_in(IP_in),
.clk(clk),
.EN(Data_in[3]),
.IP_shift_out(IP_out)
);
//Parameter: DSize1 -> Input data size, DSize2 -> address size to memory
BRAM_Mod_2 #(DSize,ASize) BRAM_TOP_ST2 (
.Data_in(Data_in),
.WData(WData_bram[7:0]),
//MSB from IP address to be concatenated with the input data
.IP_bit_in(IP_in[IPSize-1]),
.clk(clk),
.WEN(wea_bram),
.Data_out(Data_BReg)
);
Register #(DSize) Reg_BRAM_ST2
(
.In(Data_BReg),
.clk(clk),
.Out(Data_to_Mux1)
);
//Parameter: DSize -> Input data size
Node_Check_Mod #(DSize) NDC_TOP_ST2 (
.Data_in(Data_in),
.clk(clk),
.Data_out(Data_NReg)
);
Register #(DSize) Reg_NDC_ST2
(
.In(Data_NReg),
.clk(clk),
.Out(Data_to_Mux2)
);
//Parameter: DSize -> Input data size from either BRAM or NDC
Mux_out #(DSize) MUX_TOP_ST2 (
.Data_in1(Data_to_Mux1),
.Data_in2(Data_to_Mux2),
.clk(clk),
.Reg_in(Data_in[3]),
.Data_out(Data_out)
);
Write_Mod_Top WRITE_TOP_ST2 (
//wr_Data_In: 12 bits -> 1 bit - wr_en, 3 bits - stage, 8 bits - write data
.wr_Data_In(wr_Data_In),
.clk(clk),
.wr_Data_Bram(WData_bram),
.wr_Data_Next(wr_Data_Out),
.wea(wea_bram)
);
endmodule
3.2 SHIFT REGISTER MODULE AT EACH STAGE
3.2.1 TOP MODULE OF SHIFT REGISTER
//Parameter: IP_size -> IP address size
module SRL_Mod_2 #(parameter IP_size=1)
(
input [IP_size-1:0] IP_in,
input clk,
input EN,
output [IP_size-1:0] IP_shift_out
);
wire [IP_size-1:0] Data_to_Reg;
SRL_1Bit #(IP_size) SRL_1Bit_instance (
.IP_in(IP_in),
.EN(EN),
.clk(clk),
.IP_out(Data_to_Reg)
);
Register #(IP_size) Reg_SR (
.In(Data_to_Reg),
.clk(clk),
.Out(IP_shift_out)
);
endmodule
3.2.2 SHIFT REGISTER MODULE
module SRL_1Bit #(parameter IP_size = 1)
(
input [IP_size-1:0] IP_in,
input EN,
input clk,
output reg [IP_size-1:0] IP_out
);
always@(posedge clk)
begin
if (EN == 1)
IP_out <= IP_in<<1;
else
IP_out <= IP_in;
end
endmodule
3.2.3 REGISTER IN SHIFT REGISTER MODULE
module Register #(parameter size = 1)
(
input [size-1:0] In,
input clk,
output reg [size-1:0] Out
);
always@(negedge clk)
begin
Out <= In;
end
endmodule
3.3 MEMORY MODULE AT EACH STAGE
3.3.1 TOP MODULE OF MEMORY MODULE
//Parameter: DSize1 -> Input data size, DSize2 -> address size to memory
module BRAM_Mod_2 #(parameter DSize1=1, DSize2=1)
(
input [DSize1-1:0] Data_in,
input [DSize1-1:0] WData,
//MSB from IP address to be concatenated with the input data
input IP_bit_in,
input clk,
input WEN,
output [DSize1-1:0] Data_out
);
wire [DSize2-1:0] Data_to_Bram;
//Parameter: size1 -> Input data size, size2 -> Concat output size
Concat_Mod_2 #(4,5) Concat_Mod2_ST2 (
.Concat_in1(Data_in[7:4]),
.Concat_in2(IP_bit_in),
.Concat_out(Data_to_Bram)
);
BRAM_2 BRAM_Stage2 (
.clka(clk),
.wea(WEN),
.addra(Data_to_Bram),
.dina(WData),
.douta(Data_out)
);
endmodule
3.3.2 CONCATENATION MODULE IN MEMORY MODULE
//Parameter: size1 -> Input data size, size2 -> Concat output size
module Concat_Mod_2 #(parameter size1=1,size2=1)
(
input [size1-1:0] Concat_in1,
input Concat_in2,
output reg [size2-1:0] Concat_out
);
always@(Concat_in1 or Concat_in2)
begin
Concat_out <= {Concat_in1,Concat_in2};
end
endmodule
3.3.3 REGISTER IN MEMORY MODULE
module Register #(parameter size = 1)
(
input [size-1:0] In,
input clk,
output reg [size-1:0] Out
);
always@(negedge clk)
begin
Out <= In;
end
endmodule
3.4 NODE DISTANCE CHECK MODULE AT EACH STAGE
3.4.1 TOP MODULE OF NODE DISTANCE CHECK MODULE
//Parameter: d_size -> Input data size
module Node_Check_Mod #(parameter d_size=1)
(
input [d_size-1:0] Data_in,
input clk,
output [d_size-1:0] Data_out
);
wire [2:0] Node_Dist;
wire EN_NDC;
//Parameter -> Node distance size
NDC #(3) NDC_Mod (
.Node_Dist_in(Data_in[2:0]),
.Node_Dist_out(Node_Dist)
);
//Parameter -> Node distance size
NDC_EN #(3) NDC_EN_Mod (
.Node_Dist(Node_Dist),
.EN(EN_NDC)
);
//Parameter: size1->Node addr, size2->EN, size3->Node Dist
NDC_Concat #(4,1,3,8) NDC_Concat_Mod (
.Concat_in1(Data_in[7:4]),
//Node memory address
.Concat_in2(EN_NDC),
//Enable
.Concat_in3(Node_Dist),
//Node distance
.clk(clk),
.Concat_out(Data_out)
);
endmodule
3.4.2 DISTANCE CHECK MODULE IN NODE DISTANCE CHECK MODULE
module NDC #(parameter size = 1)
(
input [size-1:0] Node_Dist_in,
output reg [size-1:0] Node_Dist_out
);
always@(Node_Dist_in)
begin
if(Node_Dist_in == 0) begin
Node_Dist_out <= Node_Dist_in;
end
else begin
Node_Dist_out <= Node_Dist_in - 1;
end
end
endmodule
3.4.3 ENABLE MODULE IN NODE DISTANCE CHECK MODULE
module NDC_EN #(parameter size = 1)
(
input [size-1:0] Node_Dist,
output reg EN
);
always@(Node_Dist)
begin
if (Node_Dist == 0)
EN <= 1;
else
EN <= 0;
end
endmodule
3.4.4 CONCATENATION MODULE IN NODE DISTANCE CHECK MODULE
//Parameter: size1->Node addr, size2->EN, size3->Node Dist, dsize->concatenated data output
module NDC_Concat #(parameter size1=1,size2=1,size3=1,dsize=1)
(
input [size1-1:0] Concat_in1,
input [size2-1:0] Concat_in2,
input [size3-1:0] Concat_in3,
input clk,
output reg [dsize-1:0] Concat_out
);
always@(posedge clk)
begin
Concat_out <= {Concat_in1,Concat_in2,Concat_in3};
end
endmodule
3.4.5 REGISTER MODULE IN NODE DISTANCE CHECK MODULE
module Register #(parameter size = 1)
(
input [size-1:0] In,
input clk,
output reg [size-1:0] Out
);
always@(negedge clk)
begin
Out <= In;
end
endmodule
3.5 OUTPUT MULTIPLEXER MODULE AT EACH STAGE
3.5.1 REGISTER MODULE OF OUTPUT MULTIPLEXER MODULE
module Register #(parameter size = 1)
(
input [size-1:0] In,
input clk,
output reg [size-1:0] Out
);
always@(negedge clk)
begin
Out <= In;
end
endmodule
3.5.2 MULTIPLEXER MODULE OF OUTPUT MULTIPLEXER MODULE
module Multiplexer #(parameter Mux_size=1)
(
input [Mux_size-1:0] Mux_in1,
input [Mux_size-1:0] Mux_in2,
input Sel,
output reg [Mux_size-1:0] Mux_out
);
always@(Mux_in1 or Mux_in2 or Sel)
begin
if (Sel == 1'b0)
Mux_out <= Mux_in1;
else
Mux_out <= Mux_in2;
end
endmodule
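The top-level Mux_out wrapper instantiated in the stage top modules is not reproduced in the listings above. The following is a minimal reconstruction, not the original listing, consistent with its port list in Stage2_Top and with the search-process description of Section 4.2.4 (the enable bit is registered so it stays aligned with the data paths, and its negation drives the select):

module Mux_out #(parameter size = 1)
(
input [size-1:0] Data_in1, //registered data frame from the Memory module
input [size-1:0] Data_in2, //registered data frame from the NDC module
input clk,
input Reg_in, //Enable bit (bit 3) of the incoming Data_In frame
output [size-1:0] Data_out
);
wire Sel_reg;
//register the enable bit so it lines up with the registered data paths
Register #(1) Reg_Sel (
.In(Reg_in),
.clk(clk),
.Out(Sel_reg)
);
//negated enable: 0 selects the memory frame, 1 selects the NDC frame
Multiplexer #(size) Mux_Out_Inst (
.Mux_in1(Data_in1),
.Mux_in2(Data_in2),
.Sel(~Sel_reg),
.Mux_out(Data_out)
);
endmodule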
3.6 UPDATE MODULE AT EACH STAGE
3.6.1 TOP MODULE OF UPDATE MODULE
module Write_Mod_Top
(
// wr_Data_In: 12 bits -> 1 bit - wr_en, 3 bits - stage, 8 bits - write data
input [11:0] wr_Data_In,
input clk,
output [11:0] wr_Data_Bram,
output [11:0] wr_Data_Next,
output wea
);
// wires connecting write module to concat module
wire wea_to_concat;
wire wren_to_concat;
wire [2:0] stage_to_concat;
wire [7:0]dout_to_concat;
//wires connecting concat module to demux and bram module
wire wea_reg;
wire wea_to_bram;
// wdata_demux: 12 bits -> 1 bit - wr_en, 3 bits - stage, 8 bits - wr_data
wire [11:0] wdata_demux;
wire [11:0] wdata_reg;
Write_Mod wr_top_mod (
.wr_en(wr_Data_In[11]),
.wr_stage(wr_Data_In[10:8]),
.wr_data(wr_Data_In[7:0]),
.wea(wea_to_concat),
.wr_en_out(wren_to_concat),
.stage_out(stage_to_concat),
.wr_dout(dout_to_concat)
);
Write_Concat wr_concat_top (
.wea(wea_to_concat),
.wr_en(wren_to_concat),
.wr_stage(stage_to_concat),
.wr_data(dout_to_concat),
.wea_out(wea),
.wr_out(wdata_demux)
);
Write_Demux wr_demux_top (
.wr_data(wdata_demux),
.sel(wdata_demux[11]),
.wr_bram(wr_Data_Bram),
.wr_next_stage(wdata_reg)
);
Register #(12) Reg_DNext (
.In(wdata_reg),
.clk(clk),
.Out(wr_Data_Next)
);
endmodule
3.6.2 WRITE MODULE OF UPDATE MODULE
module Write_Mod
(
input [7:0] wr_data,
input [2:0] wr_stage,
input wr_en,
output reg wea,
output reg wr_en_out,
output reg [2:0] stage_out,
output reg [7:0] wr_dout
);
always@(wr_data or wr_stage or wr_en) begin
if (wr_en == 1) begin
if (wr_stage == 0) begin
wea <= 1'b1;
wr_en_out <= 1'b0;
wr_dout <= wr_data;
stage_out <= wr_stage;
end
else begin
wea <= 1'b0;
wr_en_out <= 1'b1;
wr_dout <= wr_data;
stage_out <= wr_stage - 1;
end
end
else begin
wea <= 1'b0;
wr_en_out <= 1'b0;
wr_dout <= 8'b0;
stage_out <= 3'b0;
end
end
endmodule
3.6.3 CONCATENATION MODULE IN UPDATE MODULE
module Write_Concat
(
input wea,
input wr_en,
input [7:0] wr_data,
input [2:0] wr_stage,
input clk,
output reg [11:0] wr_out,
output reg wea_out
);
always@(wr_en or wr_stage or wr_data or wea)
begin
wea_out <= wea;
wr_out <= {wr_en,wr_stage,wr_data};
end
endmodule
3.6.4 DEMULTIPLEXER MODULE IN UPDATE MODULE
module Write_Demux
(
input [11:0] wr_data,
input sel,
output reg [11:0] wr_bram,
output reg [11:0] wr_next_stage
);
always@(wr_data or sel)
begin
if (sel == 0) begin
wr_bram <= wr_data;
wr_next_stage <= 12'b0;
end
else begin
wr_bram <= 12'b0;
wr_next_stage <= wr_data;
end
end
endmodule
3.6.5 REGISTER MODULE IN UPDATE MODULE
module Register #(parameter size = 1)
(
input [size-1:0] In,
input clk,
output reg [size-1:0] Out
);
always@(negedge clk)
begin
Out <= In;
end
endmodule
4. TESTBENCH TO VERIFY THE FUNCTIONAL BEHAVIOR OF THE DESIGN
module Top_Module_tb;
// Inputs
reg [43:0] Packet_in;
reg clk;
// Outputs
wire [3:0] Port_Number;
wire [31:0] IP_shifted;
wire [11:0] wr_Data;
// Instantiate the Unit Under Test (UUT)
Top_Module uut (
.Packet_in(Packet_in),
.clk(clk),
.Port_Number(Port_Number),
.IP_shifted(IP_shifted),
.wr_Data(wr_Data)
);
initial begin
clk = 0;
forever begin
#5 clk = ~clk;
end
end
initial begin
Packet_in= 44'b11000000101010000000101000001010000000000000;
//192.168.10.10
#10 Packet_in = 44'b11110000011110000001101000000000000000000000;
//241.120.26.0
#10 Packet_in = 44'b00001010000010100000101000000001000000000000;
//10.10.10.1
#10 Packet_in = 44'b11000110111000011010001000000001000000000000;
//198.225.162.1
#10 Packet_in = 44'b00010101100101011100101011100101000000000000;
//21.149.202.229
#10 Packet_in = 44'b01001001100101011100101011100101000000000000;
//53.214.24.1
#10 Packet_in = 44'b10001101100101011100101011100101000000000000;
//141.23.170.213
#10 Packet_in = 44'b10110001100101011100101011100101000000000000;
//177.224.140.10
#10 Packet_in = 44'b11010101100101011100101011100101000000000000;
//213.85.224.18
#10 Packet_in = 44'b11110101100101011100101011100101000000000000;
//245.188.8.20
#10 Packet_in = 44'b00001101101011010010110100101011000000000000;
//49.173.45.43
#10 Packet_in = 44'b11100001010011101000101010001010000000000000;
//225.78.138.138
#10 Packet_in = 44'b01110101000100101110001011101000000000000000;
//117.18.226.232
#10 Packet_in = 44'b10100010110010010010101010101011000000000000;
//162.201.42.189
#10 Packet_in = 44'b10001001010101011111010111010010000000000000;
//137.85.245.210
//UPDATE OPERATION 1 : Node P14 in Subtrie c
//Initial lookup
//IP address(32 bits) -> 141.23.170.213 - 10001101_00010111_10101010_11000011;
Packet_in = 44'b10001101000101111010101011000011000000000000;
//Write Data(12 bits) -> 0 000 00000000;
//Output -> 1110 (Port14) - P14 - subtrie (c)
//update Mem location at Stage 6 - Node P14
#10 Packet_in = 44'b10001101000101111010101011000011110100000001;
//Write Data(12 bits) -> 1 101 00000001;
//Output -> 0000 (----) - P14 - subtrie (c)
//update Mem location at Stage 6 - Node P14
#10 Packet_in = 44'b10001111000101111010101011000011110100000001;
//Write Data(12 bits) -> 1 101 00000001;
//Output -> 0000 (----) - P14 - subtrie (c)
//update Mem location at Stage 4 - Node r
#10 Packet_in = 44'b10001101000101111010101011000011101100000011;
//Write Data(12 bits) -> 1 011 00000011;
//Output -> 0000 (----) - P14 - subtrie (c)
//Check updated value
#10 Packet_in = 44'b10001101000101111010101011000011000000000000;
//UPDATE OPERATION 2 : Node m in Subtrie e
//Initial lookup - Node P5 @ subtrie e
//IP address(32 bits) -> 213.85.224.18 - 11010101_10010101_11001010_11100101;
#10 Packet_in = 44'b11010101100101011100101011100101000000000000;
//Write Data(12 bits) -> 0 000 00000000;
//Output -> 0101 (Port5) - P5 - subtrie (e)
//Initial lookup - Node P2 @ subtrie e
//IP address(32 bits) -> 221.85.224.18 - 11010101_10010101_11001010_11100101;
#10 Packet_in = 44'b11011101100101011100101011100101000000000000;
//Write Data(12 bits) -> 0 000 00000000;
//Output -> 0010 (Port2) - P2 - subtrie (e)
//Update Memory location at Stage 6 for Node P5
//IP address(32 bits) -> 213.85.224.18 - 10001101_00010111_10101010_11000011;
#10 Packet_in = 44'b10001101100101011100101011100101110100000000;
//Write Data(12 bits) -> 1 101 00000000;
//Output -> 0000 (----) - P5 - subtrie (e)
//Update Memory location at Stage 6 for Node P2
//IP address(32 bits) -> 221.85.224.18 - 10001101_00010111_10101010_11000011;
#10 Packet_in = 44'b10001101100101011100101011100101110100000000;
//Write Data(12 bits) -> 1 101 00000000;
//Output -> 0000 (----) - P2 - subtrie (e)
//Update Memory location at Stage 4 for Node m
//IP address(32 bits) -> 213.85.224.18 - 10001101_00010111_10101010_11000011;
#10 Packet_in = 44'b10001101100101011100101011100101101100000000;
//Write Data(12 bits) -> 1 011 00000000;
//Output -> 0000 (----) - m - subtrie (e)
//Check updated value - search for Node P5
//IP address(32 bits) -> 213.85.224.18 - 11010101_10010101_11001010_11100101;
#10 Packet_in = 44'b11010101100101011100101011100101000000000000;
//Write Data(12 bits) -> 0 000 00000000;
//Output -> 0111 (Port7) - P5 - subtrie (e) - Default Port
//Check updated value - search for Node P2
//IP address(32 bits) -> 221.85.224.18 - 11010101_10010101_11001010_11100101;
#10 Packet_in = 44'b11011101100101011100101011100101000000000000;
#100;
end
endmodule