Perfect HashTable Construction Using Pipelined Architecture Based on Signature tree Neeshma k k Renisha P salim Department of computer Science and Engineering Caarmel Engineering College,Ranni,Perunad. Mahathma University,Kerala. neeshma99@gmail.com Assistant Professor Department of computer Science and Engineering Caarmel Engineering College,Ranni,Perunad. Mahathma University,Kerala renisha.salim@bccaarmel.ac.in Abstract As the Internet continues to grow rapidly, packet classification has become a major bottleneck of high-speed routers. Most traditional network applications require packet classification to return the best or highest-priority matched rule. However, with the emergence of new network applications like Network Intrusion Detection System (NIDS), packet-level accounting, and load balancing, packet classification is required to report all matched rules, not only the best matched rule. Here decompose the operation of multimatch packet classification from the complicated multidimensional search to several singledimensional searches, and present an asynchronous pipeline architecture based on a signature tree structure to combine the intermediate results returned from single-dimensional searches. By spreading edges of the signature tree across multiple hash tables at different stages, the pipeline can achieve a high throughput via the interstage parallel access to hash tables. To exploit further intrastage parallelism, two edge-grouping algorithms are designed to evenly divide the edges associated with each stage into multiple work-conserving hash tables. To avoid collisions involved in hash table lookup, a hybrid perfect hash table construction scheme is proposed. Extensive simulation using realistic classifiers and traffic traces shows that the proposed pipeline architecture outperforms HyperCuts and B2PC schemes in classification speed by at least one order of magnitude, while having a similar storage requirement. Here model the multimatch packet classification as a concatenated multistring matching problem, which can be solved by traversing a flat signature tree. To speed up the traversal of the signature tree, the edges of the signature tree are divided into different hash tables in both vertical and horizontal directions. These hash tables are then connected together by a pipeline architecture, and they work in parallel when packet classification operations are performed. A perfect hash table construction is also presented, which guarantees that each hash table lookup can be finished in exactly one memory access.. A computer network or data network is a telecommunications network which allows computers to exchange data. In computer networks, networked computing devices pass data to each other along data connections (network links). Data is transferred in the form of packets. The connections between nodes are established using either cable media or wireless media. The best-known computer network is the Internet. Network computer devices that originate, route and terminate the data are called network nodes. Nodes can include hosts such as personal computers, phones, servers as well as networking hardware. Two such devices are said to be networked together when one device is able to exchange information with the other device, whether or not they have a direct connection to each other. Computer networks differ in the transmission media used to carry their signals, the communications protocols to organize network traffic, the network's size, topology and organizational intent. In most cases, communications protocols are layered on (i.e. work using) other more specific or more general communications protocols, except for the physical layer that directly deals with the transmission media. Computer networks support applications such as access to the World Wide Web, shared use of application and storage servers, printers, and fax machines, and use of email and instant messaging applications Computer networks also differ in their design. The two basic forms of network design are called client/server and peer-topeer. Client-Server network feature centralized server computers that store email, Web pages, files and or applications. On a peer-to-peer network, conversely, all computers tend to support the same functions. Client-server networks are much more common in business and peer-to-peer networks much more common in homes. A network topology represents its layout or structure from the point of view of data flow. In so-called bus networks, for example, all of the computers share and communicate across one common conduit, whereas in a star network, all data flows through one centralized device. Common types of network topologies include bus, star, ring network sand mesh networks[10]. Index Terms - Hashtable,HyperCuts,B2PC,Multimatch Packet Classification. I. INTRODUCTION In computing, a hash table (hash map) is a data structure used to implement an associative array, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found Ideally, the hash function will assign each key to a unique bucket, but this situation is rarely achievable in practice (usually some keys will hash to the same bucket). Instead, most hash table designs assume that hash collisions—different keys that are assigned by the hash function to the same bucket—will occur and must be accommodated in some way. In a well-dimensioned hash table, the average cost (number of instructions) for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key-value pairs, at constant average cost per operation. In many situations, hash tables turn out to be more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets. A hash table is made up of two parts: an array (the actual table where the data to be searched is stored) and a mapping function, known as a hash function. The hash function is a mapping from the input space to the integer space that defines the indices of the array. In other words, the hash function provides a way for assigning numbers to the input data such that the data can then be stored at the array index corresponding to the assigned number[11]. A. Packet Classification Problem In the packet classification application, packets are classified into flows according to policy or routing information. The policy is specified using fields in the header of a packet. Specifications of fields are called rules. So flows are specified by rules applied to incoming packets. Each rule consists of several fields, say d. A collection of rules is called a classifier. Each field is either an exact value or a prefix or a range. In fact, exact values and prefixes are special ranges. Here treat fields as arbitrary ranges. Each rule also has a priority index number. Usually, the rules in a classifier are sorted according to their priorities. This index number is necessary since a packet may match more than one rule. In this case, the rule with the highest priority index is chosen. Let C be the classifier; or C={R1,....RN } where Ri (i=1,...N) is a rule and N is the number of the rules in the classifier. For each ruleRi , let Ri={Fd1 ,.....Fdi }, where (j=1,....d) is the jth field. Throughout the paper, a field or a range of integers is expressed as F=[b,e] , which means all integers greater than or equal to b and smaller than or equal to e. b and e are called the begin point and end point of the field F respectively. For example, if the field is an IPv4 destination address, then either point is an integer between 0 and 232-1 . When a packet is arriving, the values fi(i=1,...d) from the relevant d fields are extracted and expressed as P=(f1 ,...fd ) .Here say that a rule R=(F1,....Fd) is matched by the packet, if ε for all i=1,...d. Among the all matched rules, the rule with the highest priority index defines the flow that the packet belongs to[8]. With the above definitions, a rule can be considered as a hyper rectangle in the d-dimensional space. A classifier is a set of such hyper rectangles. Hyper rectangles in the classifier might be overlapped. A packet is then a point in the ddimensional space. Thus, packet classification is equivalent to finding all hyper rectangles which contain the query point. This resembles the point location problem in computational geometry . The difference between the packet classification and the point location problem is that hyper rectangles in the point location problem are no overlapping, while hyper rectangles in the packet classification problem may overlap. Hence, the packet classification is more complex than the point location problem. However, structures and characteristics in classifiers could be exploited to develop high performance packet classification algorithms. Such packet classification algorithms may sidestep the performance upper bound achieved in the point location background. The algorithm to be developed in this paper serves as one such example. B. Ternary Content Addressable Memory Content-addressable memory (CAM) is a special type of computer memory used in certain very-high-speed searching applications. It is also known as associative memory, associative storage, or associative array, although the last term is more often used for a programming data structure. It compares input search data (tag) against a table of stored data, and returns the address of matching data (or in the case of associative memory, the matching data). Several custom computers, like the Goodyear STARAN[12], were built to implement CAM, and were designated associative computers. C. Signature Tree The signature file method is a popular indexing technique used in information retrieval and databases. It excels in efficient index maintenance and lower space overhead. Different approaches for organizing signature files have been proposed, such as sequential signature files, bit-slice files, Strees, and its different variants, as well as signature trees. In this paper, extends the structure of signature trees by introducing multiple-bit checking[1][13]. That is, during the searching of a signature tree against a query signature s q , more than one bit in s q will be checked each time when a node is encountered. This does not only reduce significantly the size of a signature tree, but also increases the filtering ability of the signature tree, call such a structure a general signature tree. Experiments have been made, showing that the general signature tree uniformly outperforms the signature tree approach. The emergence of new network applications, such as the network intrusion detection system and packet-level accounting, requires packet classification to report all matched rules instead of only the best matched rule. Although several schemes have been proposed recently to address the multimatch packet classification problem, most of them require either huge memory or expensive ternary content addressable memory (TCAM) to store the intermediate data structure, or they suffer from steep performance degradation under certain types of classifiers. Here decompose the operation of multimatch packet classification from the complicated multidimensional search to several singledimensional searches, and present an asynchronous pipeline architecture based on a signature tree structure to combine the intermediate results returned from single-dimensional searches. By spreading edges of the signature tree across multiple hash tables at different stages, the pipeline can achieve a high throughput via the interstage parallel access to hash tables. To exploit further intrastage parallelism, two edge-grouping algorithms are designed to evenly divide the edges associated with each stage into multiple workconserving hash tables. To avoid collisions involved in hash table lookup, a hybrid perfect hash table construction scheme is proposed. Extensive simulation using realistic classifiers and traffic traces shows that the proposed pipeline architecture outperforms Hyper Cuts and B2PC schemes in classification speed by at least one order of magnitude, while having a similar storage requirement. Particularly, with different types of classifiers of 4K rules, the proposed pipeline architecture is able to achieve a throughput between 26.8 and 93.1 Gb/s using perfect hash tables. II. RELATED WORK Many schemes have been proposed in literature to address the best-match packet classification problem,such as TCAMbased schemes, hicut, hyper cut, set splitting algorithm, multi dimensional cutting. TreeCAM, wire speed TCAM. Ternary content-addressable memories (TCAMs) have gained wide acceptance in the industry for storing and searching Access Control Lists (ACLs).Here propose algorithms for addressing two important problems that are encountered while using TCAMs: reducing range expansion and multi-match classification .The first algorithm addresses the problem of expansion of rules with range fields to represent range rules in TCAMs, a single range rule is mapped to multiple TCAM entries, which reduces the utilization of TCAMs .Here propose a new scheme called Database Independent Range PreEncoding (DIRPE) that, in comparison to earlier approaches, reduces the worst-case number of TCAM entries a single rule maps on to. DIRPE works without prior knowledge of the database, scales when a large number of ranges is present, and has good incremental update properties[5]. Internet routers that operate as firewalls, or provide a variety of service classes, perform different operations on different flows. A flow is defined to be all the packets sharing common header characteristics; for example a flow may be defined as all the packets between two specific IP addresses. In order to classify a packet, a router consults a table (or classifier) using one or more fields from the packet header to search for the corresponding flow. The classifier is a list of rules that identify each flow and the actions to be performed on each. With the increasing demands on router performance, there is a need for algorithms that can classify packets quickly with minimal storage requirements and allow new flows to be frequently added and deleted. In the worst case, packet classification is hard requiring routers to use heuristics that exploit structure present in the classifiers[9]. New network applications like intrusion detection systems and packet-level accounting require multi match packet classification, where all matching filters need to be reported. Ternary Content Addressable Memories (TCAMs) have been adopted to solve the multi-match classification problem due to their ability to perform fast parallel matching. However, TCAM is expensive and consumes large amounts of power. None of the previously published multi-match classification schemes is both memory and power efficient. Here introduces a classification algorithm called HyperCuts. Like the previously best known algorithm, HiCuts, HyperCuts is based on a decision tree structure. Unlike HiCuts, however, in which each node in the decision tree represents a hyperplane, each node in the HyperCuts decision tree represents a k−dimensional hypercube. Using this extra degree of freedom and a new set of heuristics to find optimal hypercubes for a given amount of storage, HyperCuts can provide an order of magnitude improvement over existing classification algorithms[6]. The first one is a renovated TCAM design that can find all or the first r matches in a packet filter set. The second architecture is a novel partitioning scheme based on filter intersection properties allowing us to use off-the-shelf TCAMs for multimatch packet classification. The classifier engine finds all matches in exactly one conventional TCAM cycle while reducing the power consumption by at least two orders of magnitude, which is far better than the existing hardwarebased designs The process of categorizing packets into “flows” in an Internet router is called packet classification. All packets belonging to the same flow obey a pre-defined rule and are processed in a similar manner by the router. For example, all packets with the same source and destination IP addresses may be defined to form a flow. Packet classification is needed for non “best-effort” services, such as firewalls and quality of service; services that require the capability to distinguish and isolate traffic in different flows for suitable processing. In general, packet classification on multiple fields is a difficult problem. Hence, researchers have proposed a variety of algorithms which, broadly speaking, can be categorized as “basic search algorithms,” geometric algorithms, heuristic algorithms, or hardware-specific search algorithms .Here describe algorithms that are representative of each category, and discuss which type of algorithm might be suitable for different applications. Packet Classification is a key functionality provided by modern routers. Previous approaches TCAM and algorithmic perform well in either lookup efficiency (power and number of accesses) or update effort but not both. To perform well in both, propose TreeCAM, which employs three novel ideas[7]. Dual versions of TreeCAM’s decision tree to decouple lookups and updates: A coarse version with a few thousand rules per leaf achieves efficient lookups and a fine version with a few tens of rules per leaf reduces update effort. Interleaved layout of the rules in the TCAM: Combined with the fine version’s few rules per leaf, the layout enables us to bound our worst-case update effort. Path-by-path updates to enable update work to be interspersed with packet lookups (i.e., non-atomic updates), eliminating packet buffering or packet drops during update. Using simulations of 100,000-rule classifiers, show that TreeCAM performs well in both lookups and updates: 6- 8 TCAM subarray accesses per packet, matching modern TCAMs. close to an idealized TCAM in worst-case update effort while requiring little buffering of packets. architecture for perfect hash table. The main system architecture of this system shows figure . Here propose TreeCAM[2], which employs three novel ideas. Packet classification is a function increasingly used in a number of networking appliances and applications. Typically, this consists of a set of abstract classifications, and a set of rules which ,sort packets into the various classifications. For packet classification at line speeds, Ternary ContentAddressable Memories,(TCAMs) have become a norm in most networking hardware. However, TCAMs are expensive and power-hungry. Hence, a packet classification rule set need to be minimized before populating the TCAM. Here formulate the Ruleset Minimization, Problem, for TCAM as an abstract, optimization problem based on two-level logic minimization, and propose an, exact, solution, and, a number, of heuristics. Here present experimental, results with two different datasets—artificial filter sets generated, using Class Bench tool suite and, a real firewall Access Control List(ACL) from, a large enterprise. Observe an average, reduction, of 41% in artificial filter sets and,72.5% reduction, in the firewall ACL using the proposed, heuristics. III. METHODOLOGY This system architecture shows that a administrative system recieve all incoming packets and monitoring packets and their ports,then categorize the packets using classifier.There is an option for adding new ruleset and delete ruleset.All rulesets are arranged into a hashtable. Access corresponding packets and there is a chance for anomaly detection. In this section, the major implementation techniques of the system are described. A. System Overview In the packet classification application, packets are classified into ows according to policy or routing information. The policy is specified using fields in the header of a packet. Specifications of fields are called rules. So ows are specified by rules applied to incoming packets. These packets are classified without using predetermined classifier. Using this classifier create a signature tree Edges of signature tree divided into a hash table Therefore create a pipelined Fig.1 System Overview B. Classifier In this paper,it is an administrative system,it accept alla the incoming packet coming from intranet or internet.The administrative system then classify the packets based on header field.The classifier using header field information such as source IP,destinationIP,source port,destination port,protocol etc. C. Signature Tree After classifying each incoming packet.The classifier classify each packet based on rules.Using this rule create a signature tree.Each edge on the tree represent a character and each node represent a prefix of string in the encoded classifier.The D of each leaf node represent the ID of a rule in the encoded classifier. D. Pipelined Architecture Traversing the signature tree .partition the signature tree and store each edge of each partition into an individual hash table.To further partiition the hash table at each stage to exploit intrastage parallelism. E. Edge Grouping In this section,we want to partition the original hash table associated with process module into M work conserving hash table.The most straightforward way is to divide eddged of original hashtable into M independent edge set and store each of them in a individual hash table.This action is called edge grouping.Two edge grouping scheme are used[1]. 1.Character-Based Edge Grouping The first edge-grouping scheme is named character-based edge-grouping (CB_EG)[1]. Its basic idea is to divide edges according to their associated characters, and then embed the grouping information in the encodings of characters. 2. Node-Character-Based Edge-Grouping The second edge-grouping scheme is named Node-CharacterBased Edge-Grouping (NCB_EG), which divides edges not only based on their labeled characters, but also based on the IDs of their source nodes[1]. Major problem involved in the hashtable design is hash collision.Which may increase the number of memory access involvedin each hash table lookup and slow down the packet classification speed.In this section construct a perfect hash table with the guarantee of no hash table collision and support high speed packet classification. IV.CONCLUSION Here model the multimatch packet classification as a concatenated multistring matching problem, which can be solved by traversing a flat signature tree. To speed up the traversal of the signature tree, the edges of the signature tree are divided into different hash tables in both vertical and horizontal directions. These hash tables are then connected together by a pipeline architecture, and they work in parallel when packet classification operations are performed. A perfect hash table construction is also presented, which guarantees that each hash table lookup can be finished in exactly one memory access. Because of the large degree of parallelism and elaborately designed edge partition scheme, the proposed pipeline architecture is able to achieve an ultra-high packet classification speed with a very low storage requirement. Simulation results show that the proposed pipeline architecture outperforms HyperCuts and B2PC schemes in classification speed by at least one order of magnitude with a storage requirement similar to that of the HyperCuts and B2PC schemes. REFERENCES [1] Yang Xu,, IEEE, Zhaobo Liu, Zhuoyuan Zhang, and H. Jonathan Chao,"High-Throughput and Memory-Efficient Multimatch Packet Classification Based on Distributed and Pipelined Hash Tables" in IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 3, JUNE 2014. [2] M. Faezipour and M. Nourani, “Wire-speed TCAM-based architectures for multimatch packet classification,” IEEE Trans. Comput., vol. 58, no. 1, pp. 5–17, Jan. 2009. [3] R. McGeer and P. Yalagandula, “Minimizing rulesets for TCAM implementation,” Proc. IEEE INFOCOM, pp. 1314–1322, Apr. 2009. [4] S. Kumar, J. Turner, and J. Williams, “Advanced algorithms for fast and scalable deep packet inspection,” in Proc. ACM/IEEE ANCS, New York, NY, USA, 2006, pp. 81–92. [5] K. Lakshminarayanan, A. Rangarajan, and S. Venkatachary, “Algorithms for advanced packet classification with ternary CAMS,” in Proc. ACM SIGCOMM , New York, NY, USA, 2005, pp. 193–204. [6] F. Yu, T. V. Lakshman, M. A. Motoyama, and R. H. Katz, “SSA: a power and memory efficient scheme to multi-match packet classification,” in Proc. ACM ANCS, New York, NY, USA, 2005, pp. 105–113. [7] S. Singh, F. Baboescu, G. Varghese, and J.Wang, “Packet classification using multidimensional cutting,” in Proc. SIGCOMM, New York, NY, USA, 2003. [8] P. Gupta and N. McKeown, “Algorithms for packet classification,” IEEE Netw., vol. 15, no. 2, pp. 24–32, Mar.–Apr. 2001. [9] P. Gupta and N. McKeown, “Classifying packets with hierarchical intelligent cuttings,” IEEE Micro, vol. 20, no. 1, pp. 34–41, Jan.–Feb. 2000. [10] https://en.wikipedia.org/?title=Computer_network [11] https://en.wikipedia.org/wiki/Hash_table [12] https://en.wikipedia.org/wiki/Ternary Content Addressable Memory [13] https://en.wikipedia.org/wiki/Signature_Tree.