International Journal of Engineering Trends and Technology (IJETT) – Volume 6 Number 2- Dec 2013
An Efficient Network Traffic Classification Approach
with Bayesian Classifier
Veelamuri Ramakrishna Rao1, L. Prasanna Kumar2, Amarendra Kothalanka3
1M.Tech Scholar, 2Associate Professor, 3Professor & Head of the Department of CSE
1,2,3Computer Science & Engineering Department, Dadi Institute of Engineering and Technology, Visakhapatnam, Andhra Pradesh
Abstract: Internet traffic classification remains an important
research issue in network traffic analysis. Although various
traditional approaches are available, they rely on static
measures. In this paper we propose an efficient and simple
probability-based approach that uses Bayesian classification to
classify testing inputs against a training dataset of IP packet
meta information records (source IP, source port, destination IP,
destination port, header length, isAck, isRst and isSyn). In
practice, our approach shows better results than the traditional
approach.
I. INTRODUCTION
In the 21st century, the number of Internet users has
increased dramatically. Users run many Internet applications,
such as WWW, FTP, peer-to-peer software, web media,
messaging, email and VoIP. This has led to a rapid increase in
Internet traffic. The classification of Internet traffic offers
three main functions to network administrators, Internet
service providers (ISPs) and governments. First, packet
classification can be used in an intrusion detection system
(IDS) to detect the patterns of denial of service (DoS) or other
malicious attacks. It can also be used by the administrator to
identify and control network applications when needed.
Second, it can be used by ISPs to monitor network flows,
diagnose network faults, properly allocate bandwidth to
applications, and ensure the performance of the applications
and services running on the networks. Third, it can be used by
governments to perform “Lawful Inspection” (LI) of the
payload of packets to obtain user information. Just as
telephone companies allow governments to monitor telephone
calls, ISPs provide LI services to governments. [1, 2, 3]
Given the importance of characterizing Internet traffic, we
now need to understand the barriers to packet classification.
Characterizing Internet traffic has been a challenge over the
past few years. [4] It requires an in-depth understanding of the
sophisticated network protocol structure, because ISPs carry
many types of traffic as well as a large volume of stream
flows. As bandwidth and the number of services increase,
users can perform much more complicated activities than
before. A broadband user can perform tasks such as VoIP,
shopping and banking online, peer-to-peer file and video
sharing, and other complicated functions that were previously
unknown to dial-up users. [5] The complexity increases
further with different wireless technologies such as the 4G
Long Term Evolution (LTE) system and the Wi-Fi system. [6]
II. RELATED WORK
Classifying network traffic involves analyzing sampled
network information against training information sets. In
mobile ad hoc networks and peer-to-peer communications,
many issues arise if an agent cannot classify traffic before
communicating with the receiving node.
Due to the limitations of port-based and payload-based
classification, recent research focuses on the use of transport
and flow layer behavior statistics for packet classification.
[6, 7, 8] This approach uses a set of sample traffic traces to
train the classification engine to identify future traffic based
on application flow behaviors, such as packet length,
inter-packet arrival time, TCP and IP flags, and checksum. The
goal is to classify traffic with similar patterns into groups, or
to classify traffic into individual applications. However, the
accuracy of classifying encrypted traffic using a
statistics-based approach is relatively low, varying from 76%
to 86% with a false positive rate between 0% and 8%,
depending on the rule settings. [8, 9, 10, 11] Many researchers
use Machine Learning (ML) to perform statistics-based
classification. One reason to choose ML is that it can
automatically create signatures for an application and
automatically identify the application in future traffic flows.
Another reason is that it can automatically select the most
appropriate features to create the signature.
Classification is the method of training the machine with
sample datasets in which the traffic type is known to the user,
in order to build classification rules. The machine then uses
the rules to classify unknown datasets. Clustering is the
method of finding similar patterns among different traffic
types and grouping traffic with similar patterns into clusters.
This method does not require supervision. Association is a
way to detect the relationships between attributes. Numeric
prediction is a way to find the total number of features
appearing in the dataset. This method is useful for finding
important features or attributes, and it is a supervised learning
method. The main difference between supervised learning and
unsupervised learning is that supervised learning needs
training datasets to train the machine, whereas unsupervised
learning does not require a training phase.
Examples are grouped in classes because they have common
values for the features. Such classes are often called natural
kinds. In this section, the target feature corresponds to a
discrete class, which is not necessarily binary.
III. PROPOSED SYSTEM
In our proposed approach we introduce an empirical model of
Internet traffic classification based on Bayesian
classification. It calculates the prior and posterior probability
for each attribute of a testing record with respect to the
training dataset.
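Given an example with inputs X1=v1,...,Xk=vk, Bayes' rule is
used to compute the posterior probability distribution of the
example's classification, Y. The second line factors the
likelihood by treating the input features as conditionally
independent given the class:
\[
\begin{aligned}
P(Y \mid X_1=v_1,\dots,X_k=v_k)
&= \frac{P(X_1=v_1,\dots,X_k=v_k \mid Y)\times P(Y)}{P(X_1=v_1,\dots,X_k=v_k)} \\
&= \frac{P(X_1=v_1 \mid Y)\times\cdots\times P(X_k=v_k \mid Y)\times P(Y)}{\sum_{Y} P(X_1=v_1 \mid Y)\times\cdots\times P(X_k=v_k \mid Y)\times P(Y)}
\end{aligned}
\]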
where the denominator is a normalizing constant to ensure
the probabilities sum to 1. The denominator does not depend
on the class and, therefore, it is not needed to determine the
most likely class.
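As a worked illustration of the formula above, the following minimal Python sketch evaluates the factored numerator for each class and then normalizes. The class names ("web", "p2p"), the probability values and the function name `posterior` are hypothetical and chosen only for the example; they are not taken from the authors' dataset or implementation.

```python
from math import prod

def posterior(priors, likelihoods, sample):
    """Return P(Y | X1=v1,...,Xk=vk) for every class Y.

    priors:      {class: P(Y)}
    likelihoods: {class: {attribute: {value: P(X=value | Y)}}}
    sample:      {attribute: value}
    """
    # Numerator for each class: P(X1=v1|Y) x ... x P(Xk=vk|Y) x P(Y)
    scores = {
        y: priors[y] * prod(likelihoods[y][a][v] for a, v in sample.items())
        for y in priors
    }
    # Denominator: the same expression summed over all classes
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

# Hypothetical probabilities for two traffic classes and two flag attributes
priors = {"web": 0.6, "p2p": 0.4}
likelihoods = {
    "web": {"isSyn": {1: 0.2, 0: 0.8}, "isAck": {1: 0.9, 0: 0.1}},
    "p2p": {"isSyn": {1: 0.5, 0: 0.5}, "isAck": {1: 0.6, 0: 0.4}},
}
sample = {"isSyn": 1, "isAck": 1}

post = posterior(priors, likelihoods, sample)
best = max(post, key=post.get)  # most likely class
print(post, best)
```

Here the numerators are 0.6 × 0.2 × 0.9 = 0.108 for "web" and 0.4 × 0.5 × 0.6 = 0.120 for "p2p", so "p2p" is the most likely class even before dividing by the normalizing constant 0.228.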
There are many existing methods for assigning packets in the
network to a particular application (class), but none of them
is capable of providing high-quality per-application statistics
in high-speed networks. Classification by ports or protocol
gives sufficient results only for a limited number of
applications, those that use fixed port numbers or contain
characteristic patterns in the payload, and fails in large
networks with busy Internet traffic. We are developing a new
system intended to work efficiently and effectively in all
types of networks through a deeper study of the network
packets.
The system captures IP packets crossing a target network and
constructs traffic flows by checking the headers of the IP
packets. A flow consists of successive IP packets with the
same 8-tuple: “source IP, source port, destination IP,
destination port, header length, isAck, isRst and isSyn”. The
existing system considers only a 5-tuple for classification, but
the 5-tuple does not give sufficient information to classify
Internet traffic efficiently, as its fields are only basic
properties of a packet. To study the packets in more depth, we
add a few new fields to the basic tuple. The traditional system
used a Naive Bayesian classifier for traffic classification, but
this classifier takes more time to compute the probabilities
than the Bayesian classifier. Because time is critical in
network classification, we use the Bayesian classifier in the
proposed system.
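The flow construction step can be sketched as follows. This is only an illustration in Python, not the authors' Java implementation: the `FlowKey` type, the `build_flows` function, the dictionary field names and the sample packets are all assumptions made for the example. For simplicity the sketch groups every packet sharing the same 8-tuple and does not check that the packets are successive.

```python
from collections import defaultdict
from typing import NamedTuple

class FlowKey(NamedTuple):
    """The 8-tuple used to identify a flow (field names are hypothetical)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    header_length: int
    is_ack: bool
    is_rst: bool
    is_syn: bool

def build_flows(packets):
    """Group packet records (dicts) into flows keyed by the 8-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        # Pick only the eight key fields; other fields stay in the record
        key = FlowKey(**{name: pkt[name] for name in FlowKey._fields})
        flows[key].append(pkt)
    return flows

# Hypothetical packet records; payload_len is an extra, non-key field
base = dict(src_ip="10.0.0.5", src_port=50001, dst_ip="192.0.2.10",
            dst_port=80, header_length=20, is_rst=False)
packets = [
    {**base, "is_ack": False, "is_syn": True, "payload_len": 0},
    {**base, "is_ack": True, "is_syn": False, "payload_len": 512},
    {**base, "is_ack": True, "is_syn": False, "payload_len": 1460},
]

flows = build_flows(packets)
for key, members in flows.items():
    print(key.dst_port, key.is_syn, key.is_ack, len(members))
```

Because the flag fields are part of the key, the SYN packet and the two ACK packets above fall into different flows even though they share the same addresses and ports.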
Classification is the process of classifying the testing data
against the training data and predicting the class values by
probability. A Bayesian classifier is based on the idea that
the role of a (natural) class is to predict the values of features
for members of that class.
In our approach we classify the network traffic as follows.
Initially, we get the device from the node and load the
training dataset with the basic meta attribute set (source IP,
source port, destination IP, destination port, header length,
isAck, isRst and isSyn). The details of the connected node
(the sample node) are then retrieved so that its traffic can be
analyzed by the classification or machine learning approach.
We then compute the posterior probability for the individual
attributes, using each training dataset record and the meta
information of the connected node as the sample. The
resulting prediction values determine the class label assigned
as the final result.
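The steps above (load labelled training records, take the connected node's meta information as the sample, compute per-attribute probabilities, and pick the class label) can be sketched in Python as below. This is a hedged illustration, not the authors' Java code: the function names `train` and `classify`, the record layout, the class labels and the sample values are hypothetical. The add-one smoothing is also an assumption introduced here so that an attribute value never seen in training does not force the product to zero; the paper does not say how such values are handled.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["src_ip", "src_port", "dst_ip", "dst_port",
              "header_length", "is_ack", "is_rst", "is_syn"]

def train(records):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (class, attribute) -> value counts
    distinct = defaultdict(set)          # attribute -> values seen in training
    for attrs, label in records:
        class_counts[label] += 1
        for a in ATTRIBUTES:
            value_counts[(label, a)][attrs[a]] += 1
            distinct[a].add(attrs[a])
    return class_counts, value_counts, distinct

def classify(sample, model):
    """Return (most likely class, posterior per class) for one sample."""
    class_counts, value_counts, distinct = model
    n = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / n  # prior P(Y)
        for a in ATTRIBUTES:
            seen = value_counts[(label, a)][sample[a]]
            # Add-one smoothing (assumption): one extra slot for unseen values
            score *= (seen + 1) / (count + len(distinct[a]) + 1)
        scores[label] = score
    total = sum(scores.values())  # normalizing constant
    posteriors = {y: s / total for y, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors

def rec(src_port, dst_ip, dst_port, is_ack, is_syn):
    """Build one hypothetical packet meta-information record."""
    return dict(src_ip="10.0.0.5", src_port=src_port, dst_ip=dst_ip,
                dst_port=dst_port, header_length=20,
                is_ack=is_ack, is_rst=False, is_syn=is_syn)

training = [
    (rec(50001, "192.0.2.10", 80, True, False), "web"),
    (rec(50002, "192.0.2.10", 80, False, True), "web"),
    (rec(50003, "198.51.100.7", 6881, True, False), "p2p"),
    (rec(50004, "198.51.100.7", 6881, True, False), "p2p"),
]
sample = rec(50009, "192.0.2.10", 80, True, False)

label, posteriors = classify(sample, train(training))
print(label, posteriors)
```

With this toy data the sample's destination IP and port match only the "web" records, which outweighs the flag attributes that slightly favor "p2p", so the sample is labelled "web".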
IV. EXPERIMENTAL ANALYSIS
The classification of network traffic information was
implemented in the Java language using the Eclipse IDE. The
following screen shows the sample training dataset, which
contains the meta information as follows
The input testing node sample is taken as the basic meta information of the node, as follows
By analyzing the testing data attributes against the training dataset sample, the posterior probability can be calculated as follows
V. CONCLUSION AND FUTURE WORK
We conclude our research work with an efficient technique
for classifying Internet traffic, using posterior probability as
the measure. We can extend this work by addressing
drawbacks of the traditional approach such as attribute
mismatching, semantic comparison and optimal feature
extraction.
In particular, we can improve our system by handling
mismatched attributes, for example when the training
attribute set is larger than the testing attribute set or vice
versa, and by integrating semantic features while classifying
the data.
REFERENCES
[1] T. T. Nguyen and G. Armitage, “A survey of techniques
for internet traffic classification using machine learning,”
Commun. Surveys Tuts., vol. 10, no. 4, pp. 56–76, 4th
Quarter 2008.
[2] Y. Xiang, W. Zhou, and M. Guo, “Flexible deterministic
packet marking: An IP traceback system to find the real
source of attacks,” IEEE Trans. Parallel Distrib. Syst., vol. 20,
no. 4, pp. 567–580, Apr. 2009.
[3] Snort 2011 [Online]. Available: http://www.snort.org/
[4] Bro 2011 [Online]. Available: http://broids.org/index.html
[5] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M.
Faloutsos, and K. Lee, “Internet traffic classification
demystified: Myths, caveats, and the best practices,” in Proc.
ACM CoNEXT Conf., New York, 2008, pp. 1–12.
[6] T. Karagiannis, K. Papagiannaki, and M. Faloutsos,
“BLINC: Multilevel traffic classification in the dark,” in
Proc. SIGCOMM Comput. Commun. Rev., Aug. 2005, vol.
35, pp. 229–240.
[7] A. W. Moore and D. Zuev, “Internet traffic classification
using Bayesian analysis techniques,” in SIGMETRICS
Perform. Eval. Rev., Jun. 2005, vol. 33, pp. 50–60.
[8] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern
Classification. New York: Wiley, 2001.
[9] N. Williams, S. Zander, and G. Armitage, “A preliminary
performance comparison of five machine learning algorithms
for practical IP traffic flow classification,” in Proc.
SIGCOMM Comput. Commun. Rev., Oct. 2006, vol. 36, pp.
5–16.
[10] Y.-S. Lim, H.-C. Kim, J. Jeong, C.-K. Kim, T. T. Kwon,
and Y. Choi, “Internet traffic classification demystified: On
the sources of the discriminative power,” in Proc. 6th Int.
Conf., Ser. Co-NEXT’10, New York, 2010, pp. 9:1–9:12,
ACM.
[11] J. Ma, K. Levchenko, C. Kreibich, S. Savage, and G. M.
Voelker, “Unexpected means of protocol inference,” in Proc.
6th ACM SIGCOMM Conf. Internet Measurement, New
York, 2006, pp. 313–326.
[12] S. Zander, T. Nguyen, and G. Armitage, “Automated
traffic classification and application identification using
machine learning,” in Proc. Ann. IEEE Conf. Local
Computer Networks, Los Alamitos, CA, 2005, pp. 250–257.
[13] J. Erman, M. Arlitt, and A. Mahanti, “Traffic
classification using clustering algorithms,” in Proc.
SIGCOMM Workshop on Mining Network Data, New York,
2006, pp. 281–286.
[14] L. Bernaille, R. Teixeira, I. Akodkenou, A. Soule, and
K. Salamatian, “Traffic classification on the fly,” in Proc.
SIGCOMM Comput. Commun. Rev., Apr. 2006, vol. 36, pp.
23–26.
[15] Y. Wang, Y. Xiang, and S.-Z. Yu, “An automatic
application signature construction system for unknown
traffic,” Concurrency Computat.: Pract. Exper., vol. 22, pp.
1927–1944, 2010.
BIOGRAPHIES
Veelamuri Ramakrishna Rao completed his MSc (Computer
Science) and is currently pursuing an M.Tech in the
Department of CSE at Dadi Institute of Engineering and
Technology. His areas of interest are computer networks,
network security and data warehousing.
L. Prasanna Kumar is an Associate Professor in the Computer
Science & Engineering Department, Dadi Institute of
Engineering and Technology, Visakhapatnam, Andhra
Pradesh, India. His main research interests are data mining,
cloud computing and neural networks, and he has 8 years of
experience.
Amarendra Kothalanka is a Professor & Head of the
Department of CSE. He obtained his M.Tech. in Computer
Science & Technology from Andhra University. He is
pursuing his Ph.D. in Computer Science & Engineering from
GITAM University, Visakhapatnam. His main research
interests are Safety Critical Computer Systems, Software
Engineering and Mobile Computing. He is the Sponsor of
DIET ACM Student, Women in Computing and Professional
Chapters.
ISSN: 2231-5381
http://www.ijettjournal.org