IEEE Conference Paper Template - Academic Science,International

advertisement
Perfect HashTable Construction Using Pipelined
Architecture Based on Signature tree
Neeshma k k
Renisha P salim
Department of computer Science and Engineering
Caarmel Engineering College,Ranni,Perunad.
Mahathma University,Kerala.
neeshma99@gmail.com
Assistant Professor
Department of computer Science and Engineering
Caarmel Engineering College,Ranni,Perunad.
Mahathma University,Kerala
renisha.salim@bccaarmel.ac.in
Abstract As the Internet continues to grow rapidly, packet classification
has become a major bottleneck of high-speed routers. Most
traditional network applications require packet classification to
return the best or highest-priority matched rule. However, with
the emergence of new network applications like Network
Intrusion Detection System (NIDS), packet-level accounting, and
load balancing, packet classification is required to report all
matched rules, not only the best matched rule. Here decompose
the operation of multimatch packet classification from the
complicated multidimensional search to several singledimensional searches, and present an asynchronous pipeline
architecture based on a signature tree structure to combine the
intermediate results returned from single-dimensional searches.
By spreading edges of the signature tree across multiple hash
tables at different stages, the pipeline can achieve a high
throughput via the interstage parallel access to hash tables. To
exploit further intrastage parallelism, two edge-grouping
algorithms are designed to evenly divide the edges associated with
each stage into multiple work-conserving hash tables. To avoid
collisions involved in hash table lookup, a hybrid perfect hash
table construction scheme is proposed. Extensive simulation using
realistic classifiers and traffic traces shows that the proposed
pipeline architecture outperforms HyperCuts and B2PC schemes
in classification speed by at least one order of magnitude, while
having a similar storage requirement. Here model the multimatch
packet classification as a concatenated multistring matching
problem, which can be solved by traversing a flat signature tree.
To speed up the traversal of the signature tree, the edges of the
signature tree are divided into different hash tables in both
vertical and horizontal directions. These hash tables are then
connected together by a pipeline architecture, and they work in
parallel when packet classification operations are performed. A
perfect hash table construction is also presented, which
guarantees that each hash table lookup can be finished in exactly
one memory access..
A computer network or data network is a
telecommunications network which allows computers to
exchange data. In computer networks, networked computing
devices pass data to each other along data connections
(network links). Data is transferred in the form of packets. The
connections between nodes are established using either cable
media or wireless media. The best-known computer network is
the Internet. Network computer devices that originate, route
and terminate the data are called network nodes. Nodes can
include hosts such as personal computers, phones, servers as
well as networking hardware. Two such devices are said to be
networked together when one device is able to exchange
information with the other device, whether or not they have a
direct connection to each other. Computer networks differ in
the transmission media used to carry their signals, the
communications protocols to organize network traffic, the
network's size, topology and organizational intent. In most
cases, communications protocols are layered on (i.e. work
using) other more specific or more general communications
protocols, except for the physical layer that directly deals with
the transmission media. Computer networks support
applications such as access to the World Wide Web, shared
use of application and storage servers, printers, and fax
machines, and use of email and instant messaging applications
Computer networks also differ in their design. The two basic
forms of network design are called client/server and peer-topeer. Client-Server network feature centralized server
computers that store email, Web pages, files and or
applications. On a peer-to-peer network, conversely, all
computers tend to support the same functions. Client-server
networks are much more common in business and peer-to-peer
networks much more common in homes. A network topology
represents its layout or structure from the point of view of data
flow. In so-called bus networks, for example, all of the
computers share and communicate across one common
conduit, whereas in a star network, all data flows through one
centralized device. Common types of network topologies
include bus, star, ring network sand mesh networks[10].
Index Terms - Hashtable,HyperCuts,B2PC,Multimatch Packet
Classification.
I. INTRODUCTION
In computing, a hash table (hash map) is a data structure
used to implement an associative array, a structure that can
map keys to values. A hash table uses a hash function to
compute an index into an array of buckets or slots, from which
the correct value can be found Ideally, the hash function will
assign each key to a unique bucket, but this situation is rarely
achievable in practice (usually some keys will hash to the same
bucket). Instead, most hash table designs assume that hash
collisions—different keys that are assigned by the hash
function to the same bucket—will occur and must be
accommodated in some way. In a well-dimensioned hash table,
the average cost (number of instructions) for each lookup is
independent of the number of elements stored in the table.
Many hash table designs also allow arbitrary insertions and
deletions of key-value pairs, at constant average cost per
operation. In many situations, hash tables turn out to be more
efficient than search trees or any other table lookup structure.
For this reason, they are widely used in many kinds of
computer software, particularly for associative arrays, database
indexing, caches, and sets. A hash table is made up of two
parts: an array (the actual table where the data to be searched
is stored) and a mapping function, known as a hash function.
The hash function is a mapping from the input space to the
integer space that defines the indices of the array. In other
words, the hash function provides a way for assigning numbers
to the input data such that the data can then be stored at the
array index corresponding to the assigned number[11].
A. Packet Classification Problem
In the packet classification application, packets are
classified into flows according to policy or routing
information. The policy is specified using fields in the header
of a packet. Specifications of fields are called rules. So flows
are specified by rules applied to incoming packets. Each rule
consists of several fields, say d. A collection of rules is called
a classifier. Each field is either an exact value or a prefix or a
range. In fact, exact values and prefixes are special ranges.
Here treat fields as arbitrary ranges. Each rule also has a
priority index number. Usually, the rules in a classifier are
sorted according to their priorities. This index number is
necessary since a packet may match more than one rule. In this
case, the rule with the highest priority index is chosen. Let C
be the classifier; or C={R1,....RN } where Ri (i=1,...N) is a rule
and N is the number of the rules in the classifier. For each
ruleRi , let Ri={Fd1 ,.....Fdi }, where (j=1,....d) is the jth field.
Throughout the paper, a field or a range of integers is
expressed as F=[b,e] , which means all integers greater than or
equal to b and smaller than or equal to e. b and e are called the
begin point and end point of the field F respectively. For
example, if the field is an IPv4 destination address, then either
point is an integer between 0 and 232-1 . When a packet is
arriving, the values fi(i=1,...d) from the relevant d fields are
extracted and expressed as P=(f1 ,...fd ) .Here say that a rule
R=(F1,....Fd) is matched by the packet, if ε for all i=1,...d.
Among the all matched rules, the rule with the highest priority
index defines the flow that the packet belongs to[8].
With the above definitions, a rule can be considered as a
hyper rectangle in the d-dimensional space. A classifier is a set
of such hyper rectangles. Hyper rectangles in the classifier
might be overlapped. A packet is then a point in the ddimensional space. Thus, packet classification is equivalent to
finding all hyper rectangles which contain the query point.
This resembles the point location problem in computational
geometry . The difference between the packet classification
and the point location problem is that hyper rectangles in the
point location problem are no overlapping, while hyper
rectangles in the packet classification problem may overlap.
Hence, the packet classification is more complex than the
point location problem. However, structures and
characteristics in classifiers could be exploited to develop high
performance packet classification algorithms. Such packet
classification algorithms may sidestep the performance upper
bound achieved in the point location background. The
algorithm to be developed in this paper serves as one such
example.
B. Ternary Content Addressable Memory
Content-addressable memory (CAM) is a special type of
computer memory used in certain very-high-speed searching
applications. It is also known as associative memory,
associative storage, or associative array, although the last term
is more often used for a programming data structure. It
compares input search data (tag) against a table of stored data,
and returns the address of matching data (or in the case of
associative memory, the matching data). Several custom
computers, like the Goodyear STARAN[12], were built to
implement CAM, and were designated associative computers.
C. Signature Tree
The signature file method is a popular indexing technique
used in information retrieval and databases. It excels in
efficient index maintenance and lower space overhead.
Different approaches for organizing signature files have been
proposed, such as sequential signature files, bit-slice files, Strees, and its different variants, as well as signature trees. In
this paper, extends the structure of signature trees by
introducing multiple-bit checking[1][13]. That is, during the
searching of a signature tree against a query signature s q ,
more than one bit in s q will be checked each time when a
node is encountered. This does not only reduce significantly
the size of a signature tree, but also increases the filtering
ability of the signature tree, call such a structure a general
signature tree. Experiments have been made, showing that the
general signature tree uniformly outperforms the signature tree
approach.
The emergence of new network applications, such as the
network intrusion detection system and packet-level
accounting, requires packet classification to report all matched
rules instead of only the best matched rule. Although several
schemes have been proposed recently to address the
multimatch packet classification problem, most of them
require either huge memory or expensive ternary content
addressable memory (TCAM) to store the intermediate data
structure, or they suffer from steep performance degradation
under certain types of classifiers. Here decompose the
operation of multimatch packet classification from the
complicated multidimensional search to several singledimensional searches, and present an asynchronous pipeline
architecture based on a signature tree structure to combine the
intermediate results returned from single-dimensional
searches. By spreading edges of the signature tree across
multiple hash tables at different stages, the pipeline can
achieve a high throughput via the interstage parallel access to
hash tables. To exploit further intrastage parallelism, two
edge-grouping algorithms are designed to evenly divide the
edges associated with each stage into multiple workconserving hash tables. To avoid collisions involved in hash
table lookup, a hybrid perfect hash table construction scheme
is proposed. Extensive simulation using realistic classifiers and
traffic traces shows that the proposed pipeline architecture
outperforms Hyper Cuts and B2PC schemes in classification
speed by at least one order of magnitude, while having a
similar storage requirement. Particularly, with different types
of classifiers of 4K rules, the proposed pipeline architecture is
able to achieve a throughput between 26.8 and 93.1 Gb/s using
perfect hash tables.
II. RELATED WORK
Many schemes have been proposed in literature to address
the best-match packet classification problem,such as TCAMbased schemes, hicut, hyper cut, set splitting algorithm, multi
dimensional cutting. TreeCAM, wire speed TCAM.
Ternary content-addressable memories (TCAMs) have
gained wide acceptance in the industry for storing and
searching Access Control Lists (ACLs).Here propose
algorithms for addressing two important problems that are
encountered while using TCAMs: reducing range expansion
and multi-match classification .The first algorithm addresses
the problem of expansion of rules with range fields to
represent range rules in TCAMs, a single range rule is mapped
to multiple TCAM entries, which reduces the utilization of
TCAMs .Here propose a new scheme called Database
Independent Range PreEncoding (DIRPE) that, in comparison
to earlier approaches, reduces the worst-case number of
TCAM entries a single rule maps on to. DIRPE works without
prior knowledge of the database, scales when a large number
of ranges is present, and has good incremental update
properties[5].
Internet routers that operate as firewalls, or provide a
variety of service classes, perform different operations on
different flows. A flow is defined to be all the packets sharing
common header characteristics; for example a flow may be
defined as all the packets between two specific IP addresses.
In order to classify a packet, a router consults a table (or
classifier) using one or more fields from the packet header to
search for the corresponding flow. The classifier is a list of
rules that identify each flow and the actions to be performed
on each. With the increasing demands on router performance,
there is a need for algorithms that can classify packets quickly
with minimal storage requirements and allow new flows to be
frequently added and deleted. In the worst case, packet
classification is hard requiring routers to use heuristics that
exploit structure present in the classifiers[9].
New network applications like intrusion detection systems
and packet-level accounting require multi match packet
classification, where all matching filters need to be reported.
Ternary Content Addressable Memories (TCAMs) have been
adopted to solve the multi-match classification problem due to
their ability to perform fast parallel matching. However,
TCAM is expensive and consumes large amounts of power.
None of the previously published multi-match classification
schemes is both memory and power efficient.
Here introduces a classification algorithm called
HyperCuts. Like the previously best known algorithm, HiCuts,
HyperCuts is based on a decision tree structure. Unlike
HiCuts, however, in which each node in the decision tree
represents a hyperplane, each node in the HyperCuts decision
tree represents a k−dimensional hypercube. Using this extra
degree of freedom and a new set of heuristics to find optimal
hypercubes for a given amount of storage, HyperCuts can
provide an order of magnitude improvement over existing
classification algorithms[6].
The first one is a renovated TCAM design that can find all
or the first r matches in a packet filter set. The second
architecture is a novel partitioning scheme based on filter
intersection properties allowing us to use off-the-shelf TCAMs
for multimatch packet classification. The classifier engine
finds all matches in exactly one conventional TCAM cycle
while reducing the power consumption by at least two orders
of magnitude, which is far better than the existing hardwarebased designs
The process of categorizing packets into “flows” in an
Internet router is called packet classification. All packets
belonging to the same flow obey a pre-defined rule and are
processed in a similar manner by the router. For example, all
packets with the same source and destination IP addresses may
be defined to form a flow. Packet classification is needed for
non “best-effort” services, such as firewalls and quality of
service; services that require the capability to distinguish and
isolate traffic in different flows for suitable processing. In
general, packet classification on multiple fields is a difficult
problem. Hence, researchers have proposed a variety of
algorithms which, broadly speaking, can be categorized as
“basic search algorithms,” geometric algorithms, heuristic
algorithms, or hardware-specific search algorithms .Here
describe algorithms that are representative of each category,
and discuss which type of algorithm might be suitable for
different applications.
Packet Classification is a key functionality provided by
modern routers. Previous approaches TCAM and algorithmic
perform well in either lookup efficiency (power and number of
accesses) or update effort but not both. To perform well in
both, propose TreeCAM, which employs three novel ideas[7].
 Dual versions of TreeCAM’s decision tree to
decouple lookups and updates: A coarse version with
a few thousand rules per leaf achieves efficient
lookups and a fine version with a few tens of rules per
leaf reduces update effort.
 Interleaved layout of the rules in the TCAM:
Combined with the fine version’s few rules per leaf,
the layout enables us to bound our worst-case update
effort.
Path-by-path updates to enable update work to be interspersed
with packet lookups (i.e., non-atomic updates), eliminating
packet buffering or packet drops during update. Using
simulations of 100,000-rule classifiers, show that TreeCAM
performs well in both lookups and updates:
 6- 8 TCAM subarray accesses per packet, matching
modern TCAMs.
 close to an idealized TCAM in worst-case update
effort while requiring little buffering of packets.
architecture for perfect hash table. The main system
architecture of this system shows figure .
Here propose TreeCAM[2], which employs three novel ideas.
Packet classification is a function increasingly used in a
number of networking appliances and applications. Typically,
this consists of a set of abstract classifications, and a set of
rules which ,sort packets into the various classifications. For
packet classification at line speeds, Ternary ContentAddressable Memories,(TCAMs) have become a norm in most
networking hardware. However, TCAMs are expensive and
power-hungry. Hence, a packet classification rule set need to
be minimized before populating the TCAM. Here formulate
the Ruleset Minimization, Problem, for TCAM as an abstract,
optimization problem based on two-level logic minimization,
and propose an, exact, solution, and, a number, of heuristics.
Here present experimental, results with two different
datasets—artificial filter sets generated, using Class Bench
tool suite and, a real firewall Access Control List(ACL) from,
a large enterprise. Observe an average, reduction, of 41% in
artificial filter sets and,72.5% reduction, in the firewall ACL
using the proposed, heuristics.
III. METHODOLOGY
This system architecture shows that a administrative
system recieve all incoming packets and monitoring packets
and their ports,then categorize the packets using
classifier.There is an option for adding new ruleset and delete
ruleset.All rulesets are arranged into a hashtable. Access
corresponding packets and there is a chance for anomaly
detection.
In this section, the major implementation techniques of the
system are described.
A. System Overview
In the packet classification application, packets are
classified into ows according to policy or routing information.
The policy is specified using fields in the header of a packet.
Specifications of fields are called rules. So ows are specified
by rules applied to incoming packets. These packets are
classified without using predetermined classifier. Using this
classifier create a signature tree Edges of signature tree
divided into a hash table Therefore create a pipelined
Fig.1 System Overview
B. Classifier
In this paper,it is an administrative system,it accept alla the
incoming packet coming from intranet or internet.The
administrative system then classify the packets based on
header field.The classifier using header field information such
as source IP,destinationIP,source port,destination port,protocol
etc.
C. Signature Tree
After classifying each incoming packet.The classifier classify
each packet based on rules.Using this rule create a signature
tree.Each edge on the tree represent a character and each node
represent a prefix of string in the encoded classifier.The D of
each leaf node represent the ID of a rule in the encoded
classifier.
D. Pipelined Architecture
Traversing the signature tree .partition the signature tree and
store each edge of each partition into an individual hash
table.To further partiition the hash table at each stage to
exploit intrastage parallelism.
E. Edge Grouping
In this section,we want to partition the original hash table
associated with process module into M work conserving hash
table.The most straightforward way is to divide eddged of
original hashtable into M independent edge set and store each
of them in a individual hash table.This action is called edge
grouping.Two edge grouping scheme are used[1].
1.Character-Based Edge Grouping
The first edge-grouping scheme is named character-based
edge-grouping (CB_EG)[1]. Its basic idea is to divide edges
according to their associated characters, and then embed the
grouping information in the encodings of characters.
2. Node-Character-Based Edge-Grouping
The second edge-grouping scheme is named Node-CharacterBased Edge-Grouping (NCB_EG), which divides edges not
only based on their labeled characters, but also based on the
IDs of their source nodes[1].
Major problem involved in the hashtable design is hash
collision.Which may increase the number of memory access
involvedin each hash table lookup and slow down the packet
classification speed.In this section construct a perfect hash
table with the guarantee of no hash table collision and support
high speed packet classification.
IV.CONCLUSION
Here model the multimatch packet classification as a
concatenated multistring matching problem, which can be
solved by traversing a flat signature tree. To speed up the
traversal of the signature tree, the edges of the signature tree
are divided into different hash tables in both vertical and
horizontal directions. These hash tables are then connected
together by a pipeline architecture, and they work in parallel
when packet classification operations are performed. A perfect
hash table construction is also presented, which guarantees that
each hash table lookup can be finished in exactly one memory
access. Because of the large degree of parallelism and
elaborately designed edge partition scheme, the proposed
pipeline architecture is able to achieve an ultra-high packet
classification speed with a very low storage requirement.
Simulation results show that the proposed pipeline architecture
outperforms HyperCuts and B2PC schemes in classification
speed by at least one order of magnitude with a storage
requirement similar to that of the HyperCuts and B2PC
schemes.
REFERENCES
[1] Yang Xu,, IEEE, Zhaobo Liu, Zhuoyuan Zhang, and H. Jonathan
Chao,"High-Throughput and Memory-Efficient Multimatch Packet
Classification Based on Distributed and Pipelined Hash Tables" in
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 3,
JUNE 2014.
[2] M. Faezipour and M. Nourani, “Wire-speed TCAM-based architectures
for multimatch packet classification,” IEEE Trans. Comput., vol. 58, no.
1, pp. 5–17, Jan. 2009.
[3] R. McGeer and P. Yalagandula, “Minimizing rulesets for TCAM
implementation,” Proc. IEEE INFOCOM, pp. 1314–1322, Apr. 2009.
[4] S. Kumar, J. Turner, and J. Williams, “Advanced algorithms for fast and
scalable deep packet inspection,” in Proc. ACM/IEEE ANCS, New York,
NY, USA, 2006, pp. 81–92.
[5] K. Lakshminarayanan, A. Rangarajan, and S. Venkatachary, “Algorithms
for advanced packet classification with ternary CAMS,” in Proc. ACM
SIGCOMM , New York, NY, USA, 2005, pp. 193–204.
[6] F. Yu, T. V. Lakshman, M. A. Motoyama, and R. H. Katz, “SSA: a power
and memory efficient scheme to multi-match packet classification,” in
Proc. ACM ANCS, New York, NY, USA, 2005, pp. 105–113.
[7] S. Singh, F. Baboescu, G. Varghese, and J.Wang, “Packet classification
using multidimensional cutting,” in Proc. SIGCOMM, New York, NY,
USA, 2003.
[8] P. Gupta and N. McKeown, “Algorithms for packet classification,” IEEE
Netw., vol. 15, no. 2, pp. 24–32, Mar.–Apr. 2001.
[9] P. Gupta and N. McKeown, “Classifying packets with hierarchical
intelligent cuttings,” IEEE Micro, vol. 20, no. 1, pp. 34–41, Jan.–Feb.
2000.
[10] https://en.wikipedia.org/?title=Computer_network
[11] https://en.wikipedia.org/wiki/Hash_table
[12] https://en.wikipedia.org/wiki/Ternary Content Addressable Memory
[13] https://en.wikipedia.org/wiki/Signature_Tree.
Download