www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3, Issue 10 October, 2014 Page No. 8964-8971

Internet traffic classification using Hybrid Aggregated classifier and Neural Network

Ms. G. Rubadevi1, Mrs. R. Amsaveni2
1 Research scholar, Department of Computer science, PSGR.Krishnammal college for women, Tamil Nadu, India.
2 Assistant professor, Department of Information Technology, PSGR.Krishnammal college for women, Tamil Nadu, India.

Abstract: Internet traffic classification is a fundamental technology for modern network management and security, for example quality of service (QoS) control. It is useful for tackling a number of network security problems, including lawful interception and intrusion detection. The growing variety of network applications has increased the demand for modern traffic classification techniques. In this work, Internet traffic is classified using supervised classification techniques, namely Neural Networks (Multilayer Perceptron (MLP) and Radial Basis Function (RBF)) and a Hybrid Aggregated Classifier. The work involves IP packet capturing, preprocessing, Flow Container construction (flows observed within a certain period of time that share the same destination IP, destination port and transport layer protocol are treated as correlated flows and modeled as a "Flow Container"), separation of low density and high density flows, feature extraction and classification. The Hybrid Aggregated classifier achieves higher accuracy than the Neural Network classifiers.

Keywords: Internet traffic, Hybrid aggregated classifier, Neural Network.

1. Introduction
Internet traffic is the flow of data across the Internet. Because of the distributed nature of the Internet, there is no single point of measurement for total Internet traffic. Today, a connection to the Internet can be an organization's most vital link to the outside world, and too much Internet traffic can cause even the fastest connections to bog down. Learning to identify common sources of Internet traffic helps keep bandwidth available and prevents congestion from interfering with important data transfers.

Accurate network traffic classification plays a vital role in numerous network activities, from security monitoring to accounting, and from Quality of Service to forecasts for long-term provisioning. Internet traffic classification schemes are difficult to model accurately because of the limited information commonly available in the network. Traditional methods of Internet traffic classification include flow-based application detection, packet-based methods and payload-based methods. Packet headers alone, for example, do not contain the information required for accurate classification, so the accuracy of traditional traffic classification techniques is often only approximately 50.70% to 70%.

Recent research on traffic classification has focused on correlation-based statistical features. Flow-statistics-based Internet traffic classification can be implemented with supervised (classification) algorithms or unsupervised (clustering) algorithms. Unsupervised traffic classification is very difficult to construct without knowing the real traffic classes, so in this research work a set of pre-labeled data is used for supervised traffic classification. Supervised classifiers are divided into two categories: parametric and non-parametric. Parametric classifiers include Naïve Bayes, the C4.5 decision tree, Bayesian networks, SVM and neural networks; non-parametric classifiers include k-Nearest Neighbor (k-NN).

Ms. G. Rubadevi1 IJECS Volume3 Issue10 October, 2014 Page No.8964-8971 Page 8964

The proposed Internet traffic classification uses parametric classifiers: the Hybrid Aggregated classifier (which combines the advantages of Naïve Bayes and C4.5) and Neural Networks (MLP and RBF). As reported, the nearest-neighbor (NN) classifier can achieve performance comparable to that of parametric classifiers such as SVM and neural nets. The NN classifier has several important advantages: for example, it does not require a training procedure, it is immune to over-fitting of parameters, and it is able to process a large number of classes. However, its accuracy suffers when the training set is small: when the number of training samples is reduced from 100 to 10 per class, the classification accuracy of the NN-based traffic classifier drops by approximately 20%. The Hybrid Aggregated classifier combines the advantages of NB and C4.5 and provides higher accuracy than the NN classifier. In this research work, the traffic dataset is divided into low density and high density flows: the NB classifier is used for high density flows and the C4.5 algorithm for low density flows.

2. Methodology
2.1 The Proposed Framework
The proposed framework for Internet traffic classification consists of the following modules: IP packet capturing, preprocessing, Flow Container (FC) construction, separation of low density and high density flows, feature extraction, feature discretization, classification and performance evaluation. The classification performance (accuracy, precision, recall and F-measure) of the proposed hybrid classifier is compared with that of other machine learning algorithms: simple Naïve Bayes and Neural Networks (Multilayer Perceptron and Radial Basis Function). The proposed hybrid classifier gives higher accuracy than the other algorithms.

Fig 1: Flow diagram for internet traffic classification

IP packet capturing: The Internet traffic packets were captured with the WIRESHARK tool from the ISP provider of an educational institution. Wireshark, a well-known open-source packet capturing program, captures network packets and extracts the details of each captured packet. To create the data set, Internet traffic packets were captured for a duration of 1 minute.

Preprocessing: Preprocessing removes noise and incorrect data using data cleaning and data reduction techniques. Real-world databases are highly susceptible to noisy, missing and inconsistent data because of their typically huge size (often several gigabytes or more) and their likely origin in multiple, heterogeneous sources. Low-quality data leads to low-quality mining results, so preprocessing is required. Data preprocessing techniques include data cleaning, data integration, data transformation and data reduction. In the preprocessing step, the system captures IP packets crossing a target network and constructs traffic flows by checking the headers of the IP packets. A flow consists of successive IP packets with the same 5-tuple: source IP, source port, destination IP, destination port and transport layer protocol. A heuristic is applied to determine correlated flows, which are modeled using a "Flow Container (FC)".

Correlated flows: If flows observed in a certain period of time share the same destination IP, destination port and transport layer protocol, they are determined to be correlated flows.

Flow Container construction: In the proposed scheme, a set of correlated flows is generated by the same application and is modeled using a "Flow Container". Since these flows belong to the same application-based class, this correlation information can be utilized to improve the classification results.
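The flow and Flow Container construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the `Packet` record and its field names are hypothetical, flows are keyed on the directional 5-tuple, and the 60-second grouping window is an assumed value chosen to match the 1-minute capture.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal packet record (field names are assumptions)."""
    ts: float       # capture timestamp in seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str      # transport-layer protocol, e.g. "TCP" or "UDP"
    length: int     # IP length in bytes


def build_flows(packets):
    """Group packets into flows keyed by the (directional) 5-tuple."""
    flows = defaultdict(list)
    for p in sorted(packets, key=lambda p: p.ts):
        key = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)
        flows[key].append(p)
    return dict(flows)


def build_flow_containers(flows, window=60.0):
    """Group flows that share destination IP, destination port and protocol
    and whose first packet falls in the same time window (assumed 60 s)."""
    containers = defaultdict(list)
    for key, pkts in flows.items():
        _, _, dst_ip, dst_port, proto = key
        bucket = int(pkts[0].ts // window)  # index of the time window
        containers[(dst_ip, dst_port, proto, bucket)].append(key)
    return dict(containers)
```

With this keying, two clients querying the same server port over the same protocol within one window land in one container, which is the unit whose flow predictions are later aggregated.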
Therefore, we aim to aggregate the individual predictions of the correlated flows so as to achieve more accurate classification. Our research shows that this goal can be achieved by following the approach of classifier combination, and an analysis of classifier combination using bagging and random subspace is provided. A strong assumption is often made that the average performance of all the individual classifiers, each trained on a subset of the features and on replicas of the training set, is similar to that of a classifier which uses the full feature set and the whole training set. This assumption is not always true, and we do not make it here. From the bagging inequality, one can see that a more accurate aggregated classifier can be obtained when the simple predictor has higher diversity. In our work, the simple predictor is unstable because of the small set of training data. Consequently, aggregating the predictions of correlated flows into an aggregated predictor can improve performance.

Low density flow and high density flow: After the Flow Container (FC) is constructed, the traffic flows are divided into low density and high density flows based on the packet size.

2.2. Feature Extraction
The proposed research work follows statistics-based classification: statistical features of the packet-level trace are extracted and used to classify the network traffic. Such statistics must be interpreted with care. For example, a jump in the rate of packets generated by a host might be a sign of worm propagation; however, it might equally indicate a P2P application, which generates plenty of zero-payload flows while peers try to connect to each other. With statistical approaches it is feasible to determine the application type, but the specific application or client cannot be determined in general. For example, it cannot be stated that a flow belongs to Skype or MSN Messenger voice traffic, but it can be assumed to be the traffic of some kind of VoIP application, which generates packets at a constant bit rate in both directions. Such flow characteristics can be hard-coded manually, or the features of a specific kind of traffic can be discovered automatically. To achieve the latter, statistical methods are combined with methods from the field of artificial intelligence.

The following features were found to match the above criteria and became the base feature set for our experiments:
• Protocol
• Flow duration
• Flow volume in bytes and packets
• Packet length (minimum, mean, maximum and standard deviation)
• Inter-arrival time between packets (minimum, mean, maximum and standard deviation)

Packet lengths are based on the IP length, excluding link layer overhead. Inter-arrival times have at least microsecond precision and accuracy (the traces were captured using Wireshark). As the traces contained both directions of the flows, features were calculated in both directions, except protocol and flow duration. This gives 10 features per direction (bytes, packets, four packet-length statistics and four inter-arrival-time statistics) plus protocol and flow duration, a total of 2 + 2 × 10 = 22 flow features, which we refer to as the 'full feature set'. These features are simple and well understood within the networking community.

3. Machine learning classification
3.1. Neural Network
A neural network is a computational model that can solve problems in many fields. It processes information in a way loosely inspired by the human brain: it consists of a large number of processing elements, called neurons, working together to perform specific tasks. In the human brain, thousands of dendrites carry information signals to a neuron, which transmits signals along its axon in the form of electrical spikes. The axon passes the signals on to the dendrites of other neurons across a synapse, and the receiving neuron fires when its excitatory input is sufficiently larger than its inhibitory input. This concept of signal transmission also describes how a neural network processes the inputs it receives.
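To make the 22-feature set of Section 2.2 concrete, the sketch below computes it for one bidirectional flow. It assumes each packet is given as a hypothetical `(timestamp, ip_length, is_forward)` tuple, with "forward" meaning the direction of the flow's first packet; it uses the population standard deviation and fills a statistic with zeros when a direction has too few packets to define it. These conventions are illustrative choices, not taken from the paper.

```python
import statistics


def _summary(values):
    """Minimum, mean, maximum and population standard deviation (zeros if empty)."""
    if not values:
        return [0.0, 0.0, 0.0, 0.0]
    return [float(min(values)), float(statistics.mean(values)),
            float(max(values)), float(statistics.pstdev(values))]


def _direction_features(pkts):
    """10 features for one direction: bytes, packets, packet-length stats, IAT stats."""
    lengths = [length for _, length in pkts]
    times = [ts for ts, _ in pkts]
    iats = [later - earlier for earlier, later in zip(times, times[1:])]
    return [float(sum(lengths)), float(len(pkts))] + _summary(lengths) + _summary(iats)


def flow_features(packets, protocol):
    """Return the 22-element feature vector for one bidirectional flow.

    packets:  list of (timestamp, ip_length, is_forward), sorted by timestamp
    protocol: numeric protocol code, e.g. 6 for TCP or 17 for UDP
    """
    fwd = [(ts, ln) for ts, ln, is_fwd in packets if is_fwd]
    bwd = [(ts, ln) for ts, ln, is_fwd in packets if not is_fwd]
    duration = packets[-1][0] - packets[0][0]
    # 2 direction-independent features + 10 per direction = 22
    return ([float(protocol), float(duration)]
            + _direction_features(fwd) + _direction_features(bwd))
```

The vector layout is fixed (protocol, duration, 10 forward features, 10 backward features), which is what a classifier expecting a constant-width input needs.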
Multilayer Perceptron (MLP):
A neural network is characterized by 1) its pattern of connections between the neurons (called its architecture), 2) its method of determining the weights on the connections (called its training, or learning, algorithm), and 3) its activation function. The Multilayer Perceptron (MLP) is the most common neural network. It is known as a supervised network because it requires a desired output in order to learn. The purpose of the MLP is to develop a model that correctly maps the input data to the output using historical data, so that the model can then be used to produce the output when the desired output is unknown. A graphical representation of an MLP is shown in Figure 2. In the first step, called the training phase, the MLP learns the behavior of the input data using the back-propagation algorithm. In the second step, the trained MLP is tested on unknown input data.

Figure 2: MLP architecture with two hidden layers

There are different training algorithms, and it is very difficult to know which training algorithm is the fastest for a given problem. Determining the fastest training algorithm requires considering many parameters, for instance the complexity of the problem, the number of data points in the training set, the number of weights and biases in the network, and the error goal. In this research work, the Multilayer Perceptron provides only 65% to 75% accuracy.

Radial Basis Function Neural Network:
Radial basis function (RBF) networks typically have three layers: an input layer, a hidden layer with a non-linear RBF activation function, and a linear output layer. The RBF neural network is a multilayer feed-forward artificial neural network which uses radial basis functions as the activation functions of its hidden-layer neurons. The output of the network is a weighted linear superposition of all these basis functions.

Figure 3: Radial Basis Function Architecture

In this research work, the Radial Basis Function network provides only 70% to 75% accuracy.

3.2. Hybrid aggregated classifier
The Hybrid aggregated classifier takes advantage of both the NB classifier and the C4.5 classifier. The traffic flows are divided into low density and high density flows based on packet length. High density flows are classified using the NB algorithm and low density flows using the C4.5 algorithm.

The performance of the proposed classifier is also evaluated in the presence of attacks. In our system, an unknown source address in the data transmission is considered an attack. The classification performance (accuracy, precision, recall and F-measure) of the hybrid classifier is compared with and without attacks, and the proposed hybrid classifier performs well even in the attack environment.

High density flow – NB classifier
Naïve Bayes is a simple ("naive") classification method based on Bayes' rule. Bayesian classification is both a supervised learning method and a statistical method for classification. It assumes an underlying probabilistic model, which allows uncertainty about the model to be captured in a principled way by determining the probabilities of the outcomes. It can solve diagnostic and predictive problems. Bayesian classification provides practical learning algorithms in which prior knowledge and observed data can be combined, and it offers a useful perspective for understanding and evaluating many learning algorithms. It calculates explicit probabilities for hypotheses and is robust to noise in the input data.
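A minimal sketch of the NB branch follows. It assumes the features have already been discretized (the framework includes a feature discretization module) and uses Laplace (add-one) smoothing with one extra bucket for values never seen in training. The `predict_container` method illustrates one plausible way to aggregate the predictions of correlated flows, by summing their log-posteriors over a Flow Container; the paper does not spell out its aggregation rule, so this rule, like the class and method names, is an assumption.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes over discretized (categorical) features with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[c][j][v]: number of class-c samples whose feature j has value v
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.values = defaultdict(set)  # distinct values seen for each feature j
        for xs, c in zip(X, y):
            for j, v in enumerate(xs):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def log_posterior(self, xs):
        """Unnormalized log P(c|x) = log P(c) + sum_j log P(x_j|c) for every class."""
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n)
            for j, v in enumerate(xs):
                # add-one smoothing; the extra "+ 1" in the denominator reserves
                # probability mass for a value never seen in training
                num = self.counts[c][j][v] + 1
                den = self.class_counts[c] + len(self.values[j]) + 1
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, xs):
        scores = self.log_posterior(xs)
        return max(scores, key=scores.get)

    def predict_container(self, flows):
        """Assumed aggregation rule: sum the log-posteriors of all flows in one
        Flow Container and give every flow the resulting class."""
        totals = defaultdict(float)
        for xs in flows:
            for c, score in self.log_posterior(xs).items():
                totals[c] += score
        return max(totals, key=totals.get)
```

Summing log-posteriors is equivalent to multiplying the per-flow posteriors, so one ambiguous flow in a container can be outvoted by its more confidently classified neighbours.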
The Naive Bayesian classifier is based on Bayes' theorem with independence assumptions between predictors. A Naive Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets. Despite its simplicity, the Naive Bayesian classifier often does surprisingly well and is widely used because it often outperforms more sophisticated classification methods. Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c):

$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}$$

where
• P(c|x) is the posterior probability of the class (target) given the predictor (attribute);
• P(c) is the prior probability of the class;
• P(x|c) is the likelihood, which is the probability of the predictor given the class;
• P(x) is the prior probability of the predictor.

The Naive Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of the other predictors. This assumption is called class conditional independence. For a sample with predictors x_1, ..., x_n it gives

$$P(c \mid x_1, \ldots, x_n) \propto P(c) \prod_{k=1}^{n} P(x_k \mid c)$$

Low density flow – C4.5 classifier
C4.5 is a well-known decision tree machine learning algorithm used to generate univariate decision trees. It is an extension of the Iterative Dichotomiser 3 (ID3) algorithm, which is used to find simple decision trees. C4.5 is also called a statistical classifier because of its classification capability. C4.5 builds decision trees from a set of training samples with the help of the information entropy concept. The training dataset consists of a large number of training samples, which are characterized by various features and labeled with a target class. At each node of the tree, C4.5 selects the feature of the data that best splits its set of samples into subsets enriched in one class or another. The splitting criterion is the normalized information gain obtained by selecting a feature for splitting the data: the feature with the highest normalized information gain is selected and a decision is made. The C4.5 algorithm then repeats the same procedure on the smaller subsets. C4.5 makes a number of improvements to ID3: it can handle both continuous and discrete attributes, training data with missing attribute values, attributes with differing costs, etc.

Let S be a set consisting of s data samples with m distinct classes. The expected information needed to classify a given sample is given by

$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

where p_i is the probability that an arbitrary sample belongs to class C_i and is estimated by s_i/s. Let attribute A have v distinct values {a_1, ..., a_v}, and let s_ij be the number of samples of class C_i in a subset S_j, where S_j contains those samples in S that have value a_j of A. The entropy, or expected information based on the partitioning into subsets by A, is given by

$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s}\, I(s_{1j}, \ldots, s_{mj})$$

The encoding information that would be gained by branching on A is

$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$$

C4.5 uses the gain ratio, which applies normalization to information gain using a value defined as

$$\mathrm{SplitInfo}_A(S) = -\sum_{j=1}^{v} \frac{|S_j|}{|S|} \log_2 \frac{|S_j|}{|S|}$$

This value represents the information generated by splitting the training data set S into v partitions corresponding to the v outcomes of a test on the attribute A. The gain ratio is defined as

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(S)}$$

The attribute with the highest gain ratio is selected as the splitting attribute. The non-leaf nodes of the generated decision tree are considered the relevant attributes. Decision trees have also been integrated with neural networks, which resulted in improved classification accuracy.

In this research work, the hybrid classifier provides higher accuracy than the other machine learning algorithms: the accuracy of the hybrid classifier is 90% to 96%.
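The gain-ratio criterion defined above can be computed directly for a discrete attribute. This sketch implements only C4.5's split-selection step, not a full C4.5 tree (which also handles continuous attributes, missing values and pruning); the function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def info(labels):
    """Expected information I(s_1, ..., s_m) = -sum_i p_i log2 p_i, with p_i = s_i / s."""
    s = len(labels)
    return -sum((n / s) * math.log2(n / s) for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """GainRatio(A) = Gain(A) / SplitInfo_A(S) for the discrete attribute at index attr."""
    s = len(labels)
    subsets = defaultdict(list)  # S_j: class labels of the samples with A = a_j
    for row, label in zip(rows, labels):
        subsets[row[attr]].append(label)
    expected = sum(len(sj) / s * info(sj) for sj in subsets.values())  # E(A)
    gain = info(labels) - expected                                      # Gain(A)
    split_info = -sum(len(sj) / s * math.log2(len(sj) / s) for sj in subsets.values())
    # an attribute with a single value cannot split S; treat its ratio as 0
    return gain / split_info if split_info > 0 else 0.0
```

For example, with rows `[("a", "p"), ("a", "q"), ("b", "p"), ("b", "q")]` and labels `["x", "x", "y", "y"]`, attribute 0 separates the classes perfectly and has a gain ratio of 1, while attribute 1 carries no class information and has a gain ratio of 0, so C4.5 would split on attribute 0.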
4. Experimental Result
4.1. Dataset
The Internet traffic packets were captured with the WIRESHARK tool from the ISP provider of an educational institution. Wireshark captures network packets and extracts the details of each captured packet. To create the data set, Internet traffic packets were captured for a duration of 1 minute; 2730 packets were captured in this period. For our experiments, 2330 data instances were given to the classifiers in the training phase, each together with its class label, so that the classifiers learn the features of the data according to the class labels. For the testing phase, 400 traffic flows were used, covering protocols such as ALC, ARP, UDP, CDP, DHCP, LLMNR and NBNS. These data were classified using the existing NB classifier and the proposed hybrid classifier.

4.2. Result
Working Environment
Various experiments have been carried out with supervised classification algorithms, namely the Neural Network classifiers (Multilayer Perceptron and Radial Basis Function) and the Hybrid aggregated classifier, all implemented in MATLAB 2012. The results of the experiments are compared using accuracy, recall, precision and F-measure.

Accuracy
Accuracy is the percentage of correctly classified samples over all classified samples:

$$\text{Accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of classified samples}} \times 100\%$$

Figure 4: Accuracy for different classifiers

Recall
Recall is the proportion of samples of a particular class Z that are correctly classified as belonging to class Z. It is equivalent to the True Positive Rate (TPR):

$$\text{Recall} = \frac{TP}{TP + FN}$$

where TP (true positives) is the number of class-Z samples classified as Z and FN (false negatives) is the number of class-Z samples classified as another class.

Figure 5: Recall for different classifiers

Precision
Precision is the proportion of samples which truly belong to class Z among all those classified as class Z:

$$\text{Precision} = \frac{TP}{TP + FP}$$

where FP (false positives) is the number of samples of other classes classified as Z.

Figure 6: Precision for different classifiers

F-measure comparison
The F-measure is the harmonic mean of precision and recall. In essence, it assesses the effectiveness of the algorithm on a single class, and the higher it is, the better the classification. It is defined as follows:

$$F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

Figure 7: F-measure for different classifiers

Comparative result of all the classifiers and overall performance
Table 1 compares the results of the three experiments carried out with the Multilayer Perceptron, the Radial Basis Function network and the Hybrid aggregated classifier. The comparison shows that the Hybrid aggregated classifier gives better results than the other classifiers.

Table 1: Performance of all the classifiers

| Algorithm | Precision (%) | Recall (%) | F-measure (%) | Accuracy (%) |
|---|---|---|---|---|
| Multilayer Perceptron | 65.09 | 68.65 | 66.74 | 69.85 |
| Radial Basis Function | 69.26 | 72.24 | 70.70 | 75 |
| Hybrid Aggregated classifier | 93.96 | 88.80 | 90.77 | 96.28 |

5. Conclusion
In this research work, the performance evaluation of different classifiers shows that the Hybrid Aggregated classifier provides better results than the Neural Network classifiers. The existing system has several drawbacks. It does not analyze the density of the data and uses the same classifier for both high density and low density flows, which degrades the performance of the system; it also extracts fewer features from the data, which degrades the accuracy rate. To overcome these problems and increase the accuracy of traffic classification, we propose a novel hybrid aggregated classifier that combines the advantages of the Naïve Bayesian classifier and the C4.5 classifier, applying each of them according to the density of the data. In addition, the proposed system extracts more relevant features from the traffic data in order to enhance the accuracy rate and improve the performance of the system. The classifier performance of the proposed system is also evaluated in the presence of attacks. The experimental results show that the proposed scheme achieves much better classification performance than existing Internet traffic classification methods.

Future scope of work: To improve the performance of the ML classifier, our future work will include the following:
• In this research work, the Internet traffic dataset was developed from a packet capture of 1 minute. The capture duration for the training data set could be increased, so that significant variation in the feature values for different classes could be observed.
• Internet traffic could also be captured in various other real-time environments, such as universities, offices, homes and shopping malls.
• Other types of attacks in real-time traffic could be detected using different techniques.
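The accuracy, recall, precision and F-measure definitions of Section 4.2 can be computed from lists of true and predicted labels as in the minimal sketch below. The function names are illustrative; precision, recall and F-measure are computed per class Z (one-vs-rest), and how the per-class values were combined into the single figures of Table 1 is not stated, so the sketch stops at per-class values.

```python
def accuracy(y_true, y_pred):
    """Percentage of correctly classified samples over all classified samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def class_metrics(y_true, y_pred, z):
    """Precision, recall and F-measure for class z, as fractions in [0, 1]."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == z and p == z for t, p in pairs)  # class z predicted as z
    fp = sum(t != z and p == z for t, p in pairs)  # other classes predicted as z
    fn = sum(t == z and p != z for t, p in pairs)  # class z predicted as another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

For labels `["DNS", "DNS", "ARP", "ARP"]` predicted as `["DNS", "ARP", "ARP", "ARP"]`, accuracy is 75%, and the DNS class has precision 1.0, recall 0.5 and F-measure 2/3.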