International Journal of Engineering Trends and Technology (IJETT) – Volume 6 Number 2- Dec 2013

An Efficient Network Traffic Classification Approach with Bayesian Classifier

Veelamuri Ramakrishna Rao1, L. Prasanna Kumar2, Amarendra Kothalanka3
1 M.Tech Scholar, 2 Associate Professor, 3 Professor & Head of the Department of CSE
Computer Science & Engineering Department, Dadi Institute of Engineering and Technology, Visakhapatnam, Andhra Pradesh

Abstract: Internet traffic classification remains an important research problem in network traffic analysis. Although various traditional approaches are available, they rely on static measures. In this paper we propose an efficient and simple probability-based approach that uses Bayesian classification to classify testing inputs against a training dataset of IP packet meta-information records (source IP, source port, destination IP, destination port, header length, isAck, isRst and isSyn). Our practical approach shows better results than the traditional approach.

I. INTRODUCTION

In the 21st century, the number of Internet users has increased dramatically. Users run many Internet applications such as WWW, FTP, peer-to-peer software, web media, messaging, email and VoIP. This has led to rapid growth in Internet traffic. The classification of Internet traffic offers three main functions to network administrators, Internet service providers (ISPs), and governments. First, packet classification can be used in an intrusion detection system (IDS) to detect the patterns of denial-of-service (DoS) or other malicious attacks. It can also be used by the administrator to identify and control network applications when needed. Second, it can be used by ISPs to monitor network flows, diagnose the network to find faults, properly allocate bandwidth to applications, and ensure the performance of the applications and services running on the networks.
Third, it can be used by governments to perform “Lawful Inspection” (LI) of the payload of packets to obtain user information. Just as telephone companies let governments monitor telephone calls, ISPs provide LI services to governments [1, 2, 3].

We know the importance of characterizing Internet traffic. Now we need to understand the barriers to packet classification. Characterizing Internet traffic has been a challenge over the past few years [4]. It requires an in-depth understanding of the sophisticated network protocol structure, because ISPs carry many types of traffic as well as a large volume of stream flows. With bandwidth and the number of services increasing, users can perform much more complicated activities than before. A broadband user can perform tasks such as VoIP, shopping and banking online, peer-to-peer file and video sharing, and other functions far more complicated than those previously known to dial-up users [5]. The complexity increases further when different wireless technologies are used, such as the 4G Long Term Evolution (LTE) system and the Wi-Fi system [6].

ISSN: 2231-5381

II. RELATED WORK

Classifying network traffic means analyzing sampled network information against training information sets, for example in mobile ad hoc networks and peer-to-peer communications. Many issues arise if an agent cannot classify traffic before communicating with the receiving node. Due to the limitations of port-based and payload-based classification, recent research focuses on the use of transport- and flow-layer behavior statistics for packet classification [6, 7, 8]. This approach uses a set of sample traffic traces to train the classification engine to identify future traffic based on application flow behaviors, such as packet length, inter-packet arrival time, TCP and IP flags, and checksum. The goal is to classify traffic with similar patterns into groups, or to classify traffic into individual applications.
However, the accuracy of classifying encrypted traffic using a statistical approach is relatively low, varying from 76% to 86% with a false positive rate between 0% and 8%, depending on the rule settings [8, 9, 10, 11]. Many researchers use Machine Learning (ML) to perform statistics-based classification. One reason to choose ML is that it can automatically create signatures for an application and automatically identify that application in future traffic flows. Another reason is that it can automatically select the most appropriate features to create the signature.

Classification is the method of training the machine with sample datasets whose traffic type is known to the user, in order to build classification rules. The machine then uses the rules to classify unknown datasets. Clustering is the method of finding similar patterns among different traffic types and grouping traffic with similar patterns into clusters. This method does not require supervision. Association is a way to detect relationships between attributes. Numeric prediction is a way to find the total number of features appearing in the dataset. This method is useful when finding important features or attributes. This method is supervised learning. The main difference between supervised learning and unsupervised learning is that supervised learning needs training datasets to train the machine, whereas unsupervised learning does not require a training phase.

Within a class, examples are grouped together because they have common values for the features. Such classes are often called natural kinds. In this section, the target feature corresponds to a discrete class, which is not necessarily binary.

http://www.ijettjournal.org
III. PROPOSED SYSTEM

In our proposed approach we introduce an empirical model of Internet traffic classification based on Bayesian classification. We calculate the initial (prior) and posterior probabilities for each attribute of the testing dataset with respect to the corresponding training dataset. Given an example with inputs X1=v1,...,Xk=vk, Bayes' rule is used to compute the posterior probability distribution of the example's classification, Y:

$$P(Y \mid X_1=v_1,\ldots,X_k=v_k) = \frac{P(X_1=v_1,\ldots,X_k=v_k \mid Y)\times P(Y)}{P(X_1=v_1,\ldots,X_k=v_k)} = \frac{P(X_1=v_1 \mid Y)\times\cdots\times P(X_k=v_k \mid Y)\times P(Y)}{\sum_{Y} P(X_1=v_1 \mid Y)\times\cdots\times P(X_k=v_k \mid Y)\times P(Y)}$$

The second equality holds when the input features are conditionally independent given the class. The denominator is a normalizing constant that ensures the probabilities sum to 1. It does not depend on the class and, therefore, is not needed to determine the most likely class.

There are many existing methods for assigning packets in the network to a particular application (class), but none of them is capable of providing high-quality per-application statistics in high-speed networks. Classification by port or protocol gives sufficient results only for a limited number of applications, namely those that use fixed port numbers or contain characteristic patterns in the payload, and it fails in vast networks with busy Internet traffic. We are developing a new system intended to work efficiently and effectively in all types of networks through a deeper study of network packets.

The system captures IP packets crossing a target network and constructs traffic flows by checking the headers of the IP packets. A flow consists of successive IP packets with the same 8-tuple: “source IP, source port, destination IP, destination port, header length, isAck, isRst and isSyn”. Existing systems consider only a 5-tuple for classification, but a 5-tuple does not give sufficient information to classify Internet traffic efficiently, as its fields are only the basic properties of a packet.
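The flow-construction step described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the record layout (a dict per packet), the field names and the sample packets are hypothetical and do not come from the implementation reported in this paper. It groups every packet that shares an 8-tuple into one flow and does not model flow timeouts.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The 8-tuple named in the text; field names are illustrative."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    header_length: int
    is_ack: bool
    is_rst: bool
    is_syn: bool


def build_flows(packets):
    """Group packet records (dicts) into flows keyed by their 8-tuple.

    Arrival order is preserved inside each flow.
    """
    flows = defaultdict(list)
    for pkt in packets:
        key = FlowKey(
            pkt["src_ip"], pkt["src_port"], pkt["dst_ip"], pkt["dst_port"],
            pkt["header_length"], pkt["is_ack"], pkt["is_rst"], pkt["is_syn"],
        )
        flows[key].append(pkt)
    return dict(flows)


# Made-up packets for illustration only (not the paper's dataset).
base = {
    "src_ip": "10.0.0.1", "src_port": 5000,
    "dst_ip": "10.0.0.2", "dst_port": 80,
    "header_length": 20, "is_ack": True, "is_rst": False, "is_syn": False,
}
packets = [
    dict(base, size=120),
    dict(base, size=300),
    dict(base, is_syn=True, is_ack=False, size=60),  # differs in the flag fields
]
flows = build_flows(packets)
for key, members in flows.items():
    print(key.dst_port, key.is_syn, len(members))
```

Using an immutable named tuple as the dictionary key keeps the eight fields readable while remaining hashable. Note that, because the flags are part of the key, a SYN packet and the later ACK packets of the same connection fall into different flows.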
To study packets in more depth, we therefore add a few new fields to the basic tuple. Traditional systems used a Naive Bayesian classifier for traffic classification, but that classifier takes more time to compute probabilities than the Bayesian classifier. Because time is critical in network classification, we use a Bayesian classifier in the proposed system. Classification is the process of comparing testing data with training data and predicting class values by probability. A Bayesian classifier is based on the idea that the role of a (natural) class is to predict the values of features for members of that class.

In our approach we classify network traffic as follows. Initially, we get the device from the node and load the training dataset with the basic meta-attribute set (source IP, source port, destination IP, destination port, header length, isAck, isRst and isSyn). The details of the connected node (the sample node) can then be retrieved to analyze its traffic for classification.
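As a minimal sketch of the Bayes-rule computation used in this section, the following Python code estimates the class prior and the per-attribute conditional probabilities from a training set and then normalizes the per-class scores into a posterior distribution. Several things here are assumptions made for illustration and are not taken from the paper: the function names, the `app` label field, the reduced three-attribute records, the sample data, and the add-one smoothing used to avoid zero probabilities for unseen attribute values.

```python
from collections import Counter, defaultdict


def train(records, features, label="app"):
    """Count classes and per-class attribute values in the training set."""
    class_counts = Counter(r[label] for r in records)
    value_counts = defaultdict(Counter)  # (feature, class) -> value counts
    values_seen = defaultdict(set)       # feature -> distinct values
    for r in records:
        for f in features:
            value_counts[(f, r[label])][r[f]] += 1
            values_seen[f].add(r[f])
    return class_counts, value_counts, values_seen


def posterior(sample, model, features, alpha=1.0):
    """Return P(Y | X1=v1,...,Xk=vk) for every class Y."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = n_y / total  # prior P(Y)
        for f in features:
            # P(X_f = v | Y) with add-alpha smoothing (an assumption).
            n_values = len(values_seen[f] | {sample[f]})
            count = value_counts[(f, y)][sample[f]]
            score *= (count + alpha) / (n_y + alpha * n_values)
        scores[y] = score
    norm = sum(scores.values())  # the denominator of Bayes' rule
    return {y: s / norm for y, s in scores.items()}


# Made-up training records with a reduced attribute set, for illustration.
FEATURES = ("dst_port", "is_syn", "is_ack")
training = [
    {"dst_port": 80, "is_syn": False, "is_ack": True, "app": "web"},
    {"dst_port": 80, "is_syn": True, "is_ack": False, "app": "web"},
    {"dst_port": 443, "is_syn": False, "is_ack": True, "app": "web"},
    {"dst_port": 25, "is_syn": False, "is_ack": True, "app": "mail"},
    {"dst_port": 25, "is_syn": True, "is_ack": False, "app": "mail"},
]
sample = {"dst_port": 80, "is_syn": False, "is_ack": True}

model = train(training, FEATURES)
post = posterior(sample, model, FEATURES)
predicted = max(post, key=post.get)
print(post, predicted)
```

Because the denominator is the same for every class, the predicted label could also be taken directly from the unnormalized scores; normalization is kept here so that the returned values are probabilities that sum to 1.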
We then compute the posterior probability for the individual attributes of each training dataset record against the meta-information of the connected node, which is taken as the sample. The resulting prediction values determine the class labels that are reported as the final result.

IV. EXPERIMENTAL ANALYSIS

For implementation, the network traffic classifier was developed in Java using the Eclipse IDE. The following screen shows the sample training dataset, which contains the meta-information:

[Screenshot: sample training dataset]

The input testing node sample is taken as the basic meta-information of the node:

[Screenshot: input testing node sample]

By analyzing the testing data attributes against the training dataset samples, the posterior probability is calculated as follows:

[Screenshot: calculated posterior probabilities]

V. CONCLUSION AND FUTURE WORK

We conclude our research work with an efficient technique for classifying Internet traffic that uses posterior probability as its measure. The work can be extended by addressing drawbacks of the traditional approach such as attribute mismatching, semantic comparison and optimal feature extraction. In particular, the system can be enhanced to handle mismatched attributes, for example when the training attribute set is larger than the testing attribute set or vice versa, and to integrate semantic features while classifying the data.

REFERENCES

[1] T. T. Nguyen and G. Armitage, “A survey of techniques for internet traffic classification using machine learning,” Commun. Surveys Tuts., vol. 10, no. 4, pp. 56–76, 4th Quarter 2008.
[2] Y. Xiang, W. Zhou, and M. Guo, “Flexible deterministic packet marking: An IP traceback system to find the real source of attacks,” IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 4, pp. 567–580, Apr. 2009.
[3] Snort 2011 [Online]. Available: http://www.snort.org/
[4] Bro 2011 [Online]. Available: http://broids.org/index.html
[5] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, and K. Lee, “Internet traffic classification demystified: Myths, caveats, and the best practices,” in Proc. ACM CoNEXT Conf., New York, 2008, pp. 1–12.
[6] T. Karagiannis, K. Papagiannaki, and M. Faloutsos, “BLINC: Multilevel traffic classification in the dark,” in Proc. SIGCOMM Comput. Commun. Rev., Aug. 2005, vol. 35, pp. 229–240.
[7] A. W. Moore and D. Zuev, “Internet traffic classification using Bayesian analysis techniques,” in SIGMETRICS Perform. Eval. Rev., Jun. 2005, vol. 33, pp. 50–60.
[8] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. New York: Wiley, 2001.
[9] N. Williams, S. Zander, and G. Armitage, “A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification,” in Proc. SIGCOMM Comput. Commun. Rev., Oct. 2006, vol. 36, pp. 5–16.
[10] Y.-S. Lim, H.-C. Kim, J. Jeong, C.-K. Kim, T. T. Kwon, and Y. Choi, “Internet traffic classification demystified: On the sources of the discriminative power,” in Proc. 6th Int. Conf., Ser. Co-NEXT’10, New York, 2010, pp. 9:1–9:12, ACM.
[11] J. Ma, K. Levchenko, C. Kreibich, S. Savage, and G. M. Voelker, “Unexpected means of protocol inference,” in Proc. 6th ACM SIGCOMM Conf. Internet Measurement, New York, 2006, pp. 313–326.
[12] S. Zander, T. Nguyen, and G. Armitage, “Automated traffic classification and application identification using machine learning,” in Proc. Ann. IEEE Conf. Local Computer Networks, Los Alamitos, CA, 2005, pp. 250–257.
[13] J. Erman, M. Arlitt, and A. Mahanti, “Traffic classification using clustering algorithms,” in Proc. SIGCOMM Workshop on Mining Network Data, New York, 2006, pp. 281–286.
[14] L. Bernaille, R. Teixeira, I. Akodkenou, A. Soule, and K. Salamatian, “Traffic classification on the fly,” in Proc. SIGCOMM Comput. Commun. Rev., Apr. 2006, vol. 36, pp. 23–26.
[15] Y. Wang, Y. Xiang, and S.-Z. Yu, “An automatic application signature construction system for unknown traffic,” Concurrency Computat.: Pract. Exper., vol. 22, pp. 1927–1944, 2010.

BIOGRAPHIES

Veelamuri Ramakrishna Rao completed his MSc (Computer Science) and is currently pursuing an M.Tech in the Department of CSE at Dadi Institute of Engineering and Technology. His areas of interest are computer networks, network security and data warehousing.

L. Prasanna Kumar is an Associate Professor in the Computer Science & Engineering Department, Dadi Institute of Engineering and Technology, Visakhapatnam, Andhra Pradesh, India. His main research interests are data mining, cloud computing and neural networks, and he has 8 years of experience.

Amarendra Kothalanka is a Professor & Head of the Department of CSE. He obtained his M.Tech. in Computer Science & Technology from Andhra University. He is pursuing his Ph.D in Computer Science & Engineering at GITAM University, Visakhapatnam. His main research interests are Safety Critical Computer Systems, Software Engineering and Mobile Computing. He is the Sponsor of DIET ACM Student, Women in Computing and Professional Chapters.