Detection of Skype flows in Web traffic

Detecting Skype Flows Hidden in Web Traffic
Presenter: Kuei-Yu Hsu
Advisor: Dr. Kai-Wei Ke
2013/4/29
Outline
• Introduction
• Proposed Methodology
• Experimental Datasets
• Experimental Results
• Conclusions
Introduction
• What is VoIP?
• Delude restrictive firewalls
• Skype Proprietary Protocol
• About Detection
What is VoIP?
• VoIP (Voice over Internet Protocol) is a way to carry phone calls over an IP data network, whether on the Internet or on your own internal network.
• VoIP calls are usually much cheaper than traditional long-distance telephone calls to PSTN users, and are even free when placed directly from one VoIP end user to another.
Delude restrictive firewalls
• Network managers commonly adopt restrictive firewalls to improve the security of the internal network and to optimize the use of network resources.
• Such firewalls are unlikely to block Web traffic, because it is usually perceived as a fundamental service, essential for Internet access.
• Non-HTTP traffic can therefore be delivered over TCP ports 80 (HTTP) or 443 (HTTPS), fooling restrictive firewalls in order to gain network access.
Skype Proprietary Protocol
• Skype can delude a network firewall by using Web ports to establish communication with other Skype peers.
• Skype adopts this strategy as a fallback mechanism when other strategies fail to get through a restrictive firewall.
• As a result, Skype traffic disguised as Web traffic is quite difficult for network operators to detect.
About Detection
• Detection of Skype flows in Web traffic
1. HTTP Workload Model
2. Goodness-of-fit tests
   1) Chi-square test
   2) Kolmogorov-Smirnov test
3. P2P VoIP characteristics
• Detection Process
1. Training Datasets
2. Evaluation Datasets
Proposed Methodology
• HTTP Workload Model
• Goodness-of-fit tests
1) Chi-square test
2) Kolmogorov-Smirnov test
• Skype characteristics
Proposed Methodology
1. Define an HTTP workload model and capture real Web data to build empirical distributions of some relevant parameters.
2. Capture Web traffic with VoIP calls hidden in it, calculate the same parameters for each flow, and use metrics taken from two goodness-of-fit tests to decide whether the computed parameters are compatible with the empirical distributions derived in the previous step, classifying each flow as legitimate Web traffic or not.
HTTP Workload Model
• Define a model to evaluate “normal” Web behavior.
• This model has the following parameters:
1. Web request size;
2. Web response size;
3. Interarrival time between requests;
4. Number of requests per page;
5. Page retrieval time.
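The five parameters above can be held in a small record, and an empirical distribution can be built from the training values of any one of them. The following is a minimal sketch under assumptions: the names `PageSample` and `empirical_cdf` are hypothetical and do not come from the original tool, and sizes are assumed to be in bytes and times in seconds.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PageSample:
    """One observed Web page described by the workload-model parameters."""
    request_sizes: List[int]         # bytes, one entry per Web request
    response_sizes: List[int]        # bytes, one entry per Web response
    interarrival_times: List[float]  # seconds between consecutive requests
    retrieval_time: float            # seconds needed to retrieve the page

    @property
    def requests_per_page(self) -> int:
        # The fourth model parameter is derived from the request list.
        return len(self.request_sizes)


def empirical_cdf(samples: Sequence[float]) -> Callable[[float], float]:
    """Return F(x) = fraction of training samples that are <= x."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(x: float) -> float:
        return bisect_right(ordered, x) / n

    return cdf
```

For example, `empirical_cdf` applied to all request sizes in a training trace gives the reference distribution of the Web request size.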
Goodness-of-fit tests
1. Chi-square test
• It was first investigated by Karl Pearson in 1900.
$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i} \qquad (1)$$
• O_i: an observed frequency;
• E_i: an expected (theoretical) frequency, asserted by the null hypothesis;
• K: the number of classes.
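As an illustration of the definitions above, the Pearson statistic sums (O_i - E_i)^2 / E_i over the K classes. The sketch below assumes the expected frequencies come from the per-class fractions of the training distribution scaled to the number of observations in the flow. The function names are hypothetical, and skipping classes with a zero expected count is a choice of this sketch, not something the slides specify.

```python
from typing import List, Sequence


def expected_counts(training_fractions: Sequence[float],
                    n_observations: int) -> List[float]:
    """Scale per-class training fractions to the size of the observed sample."""
    return [p * n_observations for p in training_fractions]


def chi_square_score(observed: Sequence[float],
                     expected: Sequence[float]) -> float:
    """Pearson chi-square statistic over K classes (histogram bins)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same K classes")
    score = 0.0
    for o, e in zip(observed, expected):
        if e > 0:  # assumption: classes with no expected mass are ignored
            score += (o - e) ** 2 / e
    return score


# Illustrative numbers only: 60 observations spread over K = 3 classes.
observed = [10, 20, 30]
expected = expected_counts([0.2, 0.3, 0.5], sum(observed))
score = chi_square_score(observed, expected)
```

A larger score means the flow's histogram is further from the training distribution.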
2. Kolmogorov-Smirnov test
• It quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution.
$$D = \max_{x} \left| F_0(x) - S_N(x) \right| \qquad (2)$$
• F_0(x): the empirical distribution function derived from the training part.
• S_N(x): the cumulative step function of a sample of N observations.
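The distance D can be computed directly when both F0 and SN are step functions built from data. This is a minimal sketch under that assumption. Because both functions change value only at data points, it is enough to evaluate the difference at every value that occurs in either set. The name `ks_distance` is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_distance(training: Sequence[float], sample: Sequence[float]) -> float:
    """Return D = max over x of |F0(x) - SN(x)| for two empirical step functions."""
    train = sorted(training)
    obs = sorted(sample)
    d = 0.0
    # Both functions are constant between consecutive data points,
    # so the maximum is reached at one of the observed values.
    for x in sorted(set(train) | set(obs)):
        f0 = bisect_right(train, x) / len(train)  # training distribution F0(x)
        sn = bisect_right(obs, x) / len(obs)      # sample step function SN(x)
        d = max(d, abs(f0 - sn))
    return d
```

D ranges from 0 (identical distributions) to 1 (no overlap at all).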
Skype characteristics
• It does not use SIP or any other known signaling protocol for VoIP calls, and all its traffic is end-to-end encrypted.
• It automatically detects network characteristics and chooses the best available option to communicate with other Skype peers.
• It uses Web ports only as a fallback mechanism, when UDP is not available.
Experimental Datasets
1. Training Datasets – model part
2. Evaluation Datasets – detection part
Training Datasets - model part
• A training dataset is used to characterize “normal” Web traffic behavior.
1. tcpdump: captures full HTTP packet traces, generating dump files.
2. tcpflow: reads these dump files and calculates the parameters present in the Web workload model.
• We read HTTP headers to clearly identify a Web request or a Web response, and we also compute the inactivity time between Web messages.
• ISP: Internet service provider
• ACD: academic institution
Evaluation Datasets - detection part
1. tcpdump: captured Web packet traces, but this time only TCP/IP headers were captured.
2. Another software tool: the calculations and the division of flows into Web pages are done without examining TCP payload (HTTP headers) information.
• Web message size: every MTU-sized packet is considered part of the same Web message, provided there is not too much inactive time between them.
• We used the number of requests per page as a filter to remove smaller flows.
• The other three parameters (Web request size, Web response size, and interarrival time between requests) are each represented by a list of values, which are used in Equations (1) and (2) to generate a χ² or a Kolmogorov-Smirnov D score.
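The Web message size rule above can be sketched as follows: consecutive packets are merged while the previous packet was MTU-sized and the gap to the next one is short. This is only an illustration under assumptions. The 1500-byte MTU and the 0.5 s inactivity limit are hypothetical values, the function name is invented here, and the input is assumed to be the time-ordered data packets of one direction of a flow with pure ACKs already removed.

```python
from typing import List, Sequence, Tuple

MTU_BYTES = 1500        # assumed Ethernet MTU, not taken from the slides
MAX_GAP_SECONDS = 0.5   # hypothetical inactivity limit


def group_into_messages(packets: Sequence[Tuple[float, int]],
                        mtu: int = MTU_BYTES,
                        max_gap: float = MAX_GAP_SECONDS) -> List[int]:
    """Merge (timestamp, size) packets into Web message sizes, headers only."""
    messages: List[int] = []
    current = 0          # bytes accumulated in the message being built
    last_ts = None       # timestamp of the previous packet
    prev_full = False    # was the previous packet MTU-sized?
    for ts, size in packets:
        continues = (prev_full and last_ts is not None
                     and ts - last_ts <= max_gap)
        if current and not continues:
            messages.append(current)  # close the previous message
            current = 0
        current += size
        prev_full = size >= mtu
        last_ts = ts
    if current:
        messages.append(current)
    return messages


# Illustrative packets: three back-to-back, one small, then two far apart.
packets = [(0.00, 1500), (0.01, 1500), (0.02, 400),
           (1.00, 300), (1.01, 1500), (3.00, 1500)]
sizes = group_into_messages(packets)
```

The resulting list of message sizes is the kind of per-flow value list that feeds the two goodness-of-fit scores.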
• We then have three values that can be compared with thresholds to decide whether this set of related request-response messages is likely to be Skype or not.
• VoIP calls of different durations were produced in a controlled way by a small network of computers running the Skype program behind port-restrictive firewalls.
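The threshold comparison mentioned above could look like the sketch below. The slides do not say how the three comparisons are combined, so the `min_votes` parameter is an assumption of this example, as are the parameter names and all numeric values. A higher score is taken to mean a larger deviation from the Web model.

```python
from typing import Dict


def is_probably_skype(scores: Dict[str, float],
                      thresholds: Dict[str, float],
                      min_votes: int = 3) -> bool:
    """Flag a flow when at least min_votes scores exceed their thresholds."""
    votes = sum(1 for name, limit in thresholds.items()
                if scores[name] > limit)
    return votes >= min_votes


# Hypothetical scores for one flow and hypothetical thresholds.
scores = {"request_size": 120.0, "response_size": 95.0, "interarrival": 10.0}
thresholds = {"request_size": 50.0, "response_size": 50.0, "interarrival": 50.0}
strict = is_probably_skype(scores, thresholds)                # all three needed
relaxed = is_probably_skype(scores, thresholds, min_votes=2)  # two are enough
```

Moving the thresholds trades detection rate against false positives, which is what the ROC analysis in the results explores.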
Experimental Results
• Sensitivity and specificity
• ROC curves
• Detecting Skype flows
• Evaluating real-time detection
Sensitivity and specificity
• Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function.
• The test outcome can be positive or negative:
  • True positive = correctly identified
  • False positive = incorrectly identified
  • True negative = correctly rejected
  • False negative = incorrectly rejected
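From the four outcome counts above, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The short sketch below applies these standard definitions. The true positive and false negative counts correspond to 90% of 80 Skype flows, while the false positive count of 300 out of 17,294 non-Skype flows is a hypothetical value chosen for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of actual positives that were identified."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of actual negatives that were rejected."""
    return tn / (tn + fp)


# 72 of 80 Skype flows detected; 300 false positives is a made-up figure.
tpr = sensitivity(tp=72, fn=8)
fpr = 1.0 - specificity(tn=17294 - 300, fp=300)
```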
ROC curves
• ROC curves: Receiver Operating Characteristic curves.
• A ROC curve is a graphical plot of the sensitivity against (1 − specificity) of a binary classifier. Sensitivity is the same as the true positive rate, and (1 − specificity) is equal to the false positive rate.
• The classifier has a discrimination threshold that is varied to produce the different points on the curve.
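Varying the discrimination threshold can be sketched as below: each distinct score is tried as a threshold, and the resulting (false positive rate, true positive rate) pair is one point of the curve. The sketch assumes that a flow is flagged as Skype when its score is at least the threshold. The function name and the example scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(skype_scores: Sequence[float],
               web_scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by sweeping the threshold downward."""
    thresholds = sorted(set(skype_scores) | set(web_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for t in thresholds:
        tpr = sum(s >= t for s in skype_scores) / len(skype_scores)
        fpr = sum(s >= t for s in web_scores) / len(web_scores)
        points.append((fpr, tpr))
    return points


# Made-up scores: three Skype flows and four ordinary Web flows.
points = roc_points([9, 8, 3], [1, 2, 8, 0])
```

A curve that climbs quickly toward the top-left corner indicates a high detection rate at a low false positive rate.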
Detecting Skype flows
• Fig. 5. χ² detection:
  • 90% of the 80 Skype flows are correctly identified (the true positive rate), with less than 2% of the 17,294 non-Skype flows incorrectly identified (the false positive rate).
  • A 100% detection rate is reached with around 5% false positives.
• Fig. 6. Kolmogorov-Smirnov D detection:
  • A true positive rate of 70% with a false positive rate of around 2%.
  • An 80% detection rate with 5% false positives.
• The χ² ROC curve is always closer to the top-left corner than the K-S curve.
Evaluating real-time detection
• A network administrator may want to identify the Skype calls that are currently using the network, not the calls made some minutes or hours ago.
• Here, the data is captured and analyzed in short, limited time intervals.
• The χ² detection using the newly generated trace (the set of all 10 s capture files) had a true positive rate of up to 85%, with fewer false positives than the χ² detection using the ISP-3 trace.
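Analysis over short intervals can be imitated offline by splitting a packet list into fixed windows and scoring each window separately. The sketch below uses the 10 s interval mentioned above. Everything else is an assumption: the function name is hypothetical, and the original work captured separate 10 s files rather than splitting one trace.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Packet = Tuple[float, int]  # (timestamp in seconds, size in bytes)


def split_into_windows(packets: Sequence[Packet],
                       window_seconds: float = 10.0) -> List[List[Packet]]:
    """Group packets by the fixed-length time window they fall into."""
    windows: Dict[int, List[Packet]] = defaultdict(list)
    for ts, size in packets:
        windows[int(ts // window_seconds)].append((ts, size))
    # Empty windows are simply absent from the result.
    return [windows[index] for index in sorted(windows)]


# Illustrative packets spanning three different 10 s windows.
chunks = split_into_windows([(0.5, 100), (9.9, 200), (10.0, 300), (25.0, 400)])
```

Each chunk can then go through the same parameter extraction and scoring as a full trace.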
Conclusions
• It is rather common to find non-HTTP traffic using Web ports to delude firewalls and other network elements.
• We evaluated a Skype detection system based on statistical tests that efficiently detects Skype flows hidden among Web traffic, without searching for particular Skype patterns or signatures and without inspecting payload information.
• We manually produced Skype traffic to build our Web evaluation dataset and verify that the proposed parameters are able to identify Skype flows hidden among HTTP traffic.
• Using simple metrics taken from two goodness-of-fit tests, the χ² value and the Kolmogorov-Smirnov distance, we show that Skype flows can be clearly detected, but our results suggest that the χ² metric is a much better choice.
• Considering the experimental results for the chi-square detection, our methodology gives network management enough flexibility to adopt different approaches regarding the possible detection of Skype flows in Web traffic.
• As future work:
  • We intend to further analyze real-time detection by investigating the minimum time interval needed.
  • We intend to build and evaluate an optimized version of our tool to perform real-time monitoring on network links.
References
• E. P. Freire, A. Ziviani, and R. M. Salles, "Detecting Skype Flows in Web Traffic," Proc. of the IEEE/IFIP Network Operations and Management Symposium (NOMS 2008), April 2008, pp. 89-96.
• E. P. Freire, A. Ziviani, and R. M. Salles, "Detecting VoIP Calls Hidden in Web Traffic," IEEE Transactions on Network and Service Management, vol. 5, pp. 210-214, December 2008.
Thanks for listening