A New Methodology for Packet Trace Classification
and Compression based on Semantic Traffic
Characterization
by
Raimir Holanda Filho
Submitted to the Computer Architecture Department
In Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy in Computer Science
at the
Technical University of Catalonia
September 2005
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 01
1.1. The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 01
1.2. Overview of thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 02
1.3. Contribution of our work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 03
2. Traffic Modeling and Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 07
2.1. Classical traffic characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 07
2.2. Traffic modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3. Data collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3. Semantic Traffic Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1. Semantic Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4. Flow Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .31
4.2. Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3. Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.4. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .44
5. Entropy of TCP/IP Header Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .47
5.2. Packet Level Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.3. Flow Level Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.4. Trace Compression Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.5. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .75
6. Lossless Compression Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.1. Generic Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2. TCP/IP Header Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.3. Proposed Header Trace Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.4. Decompression Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.5. Compression Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7. Lossy Compression Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.1. Packet Trace Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.2. Decompression Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.3. Compression Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.4. Comparative Packet Trace Characteristics . . . . . . . . . . . . . . . . . . . . . . 100
7.5. Memory Performance Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
8. Trace Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
8.1. Packet Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
8.2. Flow Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.3. Packet Trace Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
List of Figures
Figure 1.1 Relation between the main contributions . . . . . . . . . . . . . . . . . . . . . . . 5
Figure 3.1 RedIRIS topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Figure 3.2 TSH header data format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Figure 3.3 Flow mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Figure 4.1 The asymptotic equipartition property . . . . . . . . . . . . . . . . . . . . . . . . 46
Figure 5.1 Flow clustering methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Figure 5.2 Number of clusters for RedIRIS . . . . . . . . . . . . . . . . . . . . . . . . 51
Figure 5.3 Number of clusters for ATM OC-3 traces . . . . . . . . . . . . . . . . . . . . . 52
Figure 5.4 Selected fields used for flow clustering . . . . . . . . . . . . . . . . . . . . . . . 53
Figure 5.5 Number of clusters for m=2 packets . . . . . . . . . . . . . . . . . . . . . . . . . 54
Figure 5.6 Number of clusters for m=3 packets . . . . . . . . . . . . . . . . . . . . . . . . . 55
Figure 5.7 Number of clusters for m=4 packets . . . . . . . . . . . . . . . . . . . . . . . . . 56
Figure 5.8 Number of clusters for m=5 packets . . . . . . . . . . . . . . . . . . . . . . . . . 57
Figure 5.9 Number of clusters for m=6 packets . . . . . . . . . . . . . . . . . . . . . . . . . 59
Figure 5.10 Number of clusters for m=7 packets . . . . . . . . . . . . . . . . . . . . . . . . 59
Figure 5.11 Relation between clusters and flows . . . . . . . . . . . . . . . . . . . . . . . . 61
Figure 6.1 Temporary data structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Figure 6.2 Compression model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Figure 6.3 Small flow compressed data format . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Figure 6.4 Web compression model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Figure 6.5 Small Web flow compressed data format . . . . . . . . . . . . . . . . . . . . . 68
Figure 6.6 Packet compressed data format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Figure 6.7 Large flow compressed data format . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Figure 6.8 Decompression model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Figure 6.9 Flow clustering vs. Huffman based behavior . . . . . . . . . . . . . . . . . . 73
Figure 6.10 Compression techniques comparison (Lossless) . . . . . . . . . . . . . . 74
Figure 7.1 Compression techniques comparison (Lossy) . . . . . . . . . . . . . . . . . 79
Figure 7.2 R/S plotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Figure 7.3 Unique addresses set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Figure 7.4 Temporal locality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Figure 7.5 as a function of prefix length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Figure 7.6 Multifractal spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Figure 7.7 Memory access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Figure 7.8 Cache miss rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Figure 8.1 Conventional routing procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Figure 8.2 IP switching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Figure 8.3 Packet distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Figure 8.4 Number of clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Figure 8.5. RedIRIS trace-3D shaping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Figure 8.6. Memphis trace-3D shaping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Figure 8.7. Columbia trace-3D shaping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Figure 8.8 Flow clustering spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
List of Tables
Table 4.1 IP version elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Table 4.2 IP version entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Table 4.3 IHL elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Table 4.4 IHL entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Table 4.5 TOS elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Table 4.6 TOS entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Table 4.7 Length elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Table 4.8 Length entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Table 4.9 Flags elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Table 4.10 Flags entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Table 4.11 Fragment offset elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Table 4.12 Fragment offset entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Table 4.13 Protocol elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Table 4.14 Protocol entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Table 4.15 Data offset elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Table 4.16 Data offset entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Table 4.17 Control bits elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Table 4.18 Control bits entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Table 4.19 Source port entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Table 4.20 Destination port entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Table 4.21 Source address entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Table 4.22 Destination address entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Table 4.23 Resume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Table 4.24 Version,IHL joint probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Table 4.25 Version,IHL joint entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Table 4.26 Version,IHL,Flags joint probability . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Table 4.27 Version,IHL,Flags joint entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Table 4.28 Version,IHL,Flags,FragOff joint probability . . . . . . . . . . . . . . . . . . 38
Table 4.29 Version,IHL,Flags,FragOff joint entropy . . . . . . . . . . . . . . . . . . . . . 38
Table 4.30 Version,IHL,Flags,FragOff,Protocol joint probability . . . . . . . . . . 39
Table 4.31 Version,IHL,Flags,FragOff,Protocol joint entropy . . . . . . . . . . . . . 39
Table 4.32 Version,IHL,Flags,FragOff,Protocol,TOS joint probability . . . . . 40
Table 4.33 Version,IHL,Flags,FragOff,Protocol,TOS joint entropy . . . . . . . . 40
Table 4.34 Version,IHL,Flags,FragOff,Protocol,TOS,DataOff joint prob . . . 41
Table 4.35 Version,IHL,Flags,FragOff,Protocol,TOS,DataOff joint entropy 41
Table 4.36 Version,IHL,Flags,FragOff,Protocol,TOS,DataOff,Control bits joint
probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Table 4.37 Version,IHL,Flags,FragOff,Protocol,TOS,DataOff,Control bits joint
entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Table 4.38 Version,IHL,Flags,FragOff,Protocol,TOS,DataOff,Control bits,Length
joint probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Table 4.39 Version,IHL,Flags,FragOff,Protocol,TOS,DataOff,Control bits,Length
joint entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Table 4.40 Independent random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Table 4.41 Resume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Table 4.42 AEP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Table 5.1 Flow probability distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Table 5.2 Number of clusters for m=2 packets . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Table 5.3 Clusters description (m=2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Table 5.4 Flow Entropy (m=2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Table 5.5 Number of clusters for m=3 packets . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Table 5.6 Clusters description (m=3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Table 5.7 Flow Entropy (m=3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Table 5.8 Number of clusters for m=4 packets . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Table 5.9 Clusters description (m=4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Table 5.10 Flow Entropy (m=4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Table 5.11 Number of clusters for m=5 packets . . . . . . . . . . . . . . . . . . . . . . . . . 57
Table 5.12 Clusters description (m=5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Table 5.13 Flow Entropy (m=5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Table 5.14 Number of clusters for m=6 packets . . . . . . . . . . . . . . . . . . . . . . . . . 58
Table 5.15 Number of clusters for m=7 packets . . . . . . . . . . . . . . . . . . . . . . . . . 58
Table 5.16 Cluster and entropy behavior for m-packet flows . . . . . . . . . . . . . . 61
Table 6.1 Set of different values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Table 6.2 Huffman encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Table 7.1 Hurst parameter estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Table 8.1 Packet classification example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
List of Symbols
Random variable
Time series
Mean
Expectation
Variance
Autocorrelation function
Autocovariance function
Hurst parameter
Lattice box counting dimension
m-packet flow
Flow with m packets
Packet header of the i-th packet of a flow consisting of m packets
Selected header field of a packet header
Mapping function
Mapped value of a selected header field
Vector of mapped values
Flow of m packets
Numerical representation of the m packets
Header field variation
Entropy
Conditional Entropy
Joint Entropy
Relative Entropy
Mutual Information
Acknowledgments
Abstract
Internet traffic measurement has been a subject of interest as long as there
has been Internet traffic. Nowadays, packet traces are collected and used for performance evaluation purposes in many systems. For instance, we can use packet
traces to evaluate basic packet forwarding or complex functions such as quality of
service (QoS) processing, packet classification, security, billing and accounting of
modern network devices.
Moreover, the need for network analysis has increased, and realistic models and methodologies for understanding network behavior play an ever more essential role in facilitating the evolution toward future gigabit-per-second speeds.
In this context, we have developed a novel flow characterization approach that incorporates semantic characteristics of flows. By semantic characterization we mean the joint analysis of traffic characteristics including the inter-packet time and some of the most important fields (source and destination address, port numbers, packet length, TCP flags, etc.) of the TCP/IP header content.
Firstly, using clustering techniques we demonstrated that, behind the great number of flows in a high-speed link, there is not so much variety among them. In addition, traces captured from different links showed similar variety. Using the evidence that many flows can be grouped into few clusters, we have implemented a template of flows. This template consists of a dataset storing the most common classes of flows. These results confirmed that, by exploiting TCP flow properties, we can obtain high compression ratios for TCP/IP header traces.
The analysis carried out using concepts from information theory gave the proposed methodology a more formal basis. We calculated the entropy at the header-field and flow levels. Moreover, we demonstrated that the methodology could be used to develop header trace compression and trace classification.
The proposed compression methods address the problem of compressing huge packet header traces. We propose two compression methods: lossless and lossy. The lossless compression is a combined method based on TCP flow clustering for small flows and Huffman encoding for large flows. With our proposed method, storage size requirements for .tsh packet header traces are reduced to 16% of their original size. Other known methods have their compression ratios bounded at 50% and 32%.
The lossy compression is a packet trace compression method based on the most representative flow clusters, on self-similar inter-packet time properties and on fractal IP address generation. With this proposed method, storage size requirements are reduced to 3% of the original size. Although this specification does not define a completely lossless compressed data format, it preserves important statistical properties present in the original trace, such as self-similarity, spatial and temporal locality, and IP address structure. Furthermore, a memory performance evaluation was carried out with four types of traces, and the outcomes for memory access and cache miss ratio measurements demonstrated a similar behavior between our decompressed trace and the original trace.
Finally, the proposed trace classification can be used to identify how similar traces collected from different links are and which types of applications are present in a trace. Using traces with different properties is strongly recommended for evaluation purposes and extended validation of many systems. Our approach to trace classification consists of identifying semantically, for each trace, its typical flows.
Chapter 1
Introduction
This chapter describes the problem addressed in this thesis and presents our main contributions to overcome it.
1.1. The problem
The Internet is a global internetwork, sharing information among millions
of computers throughout the world. Internet users send packets of information
from one machine to another carrying traffic from a variety of data, video and
voice applications. Since the appearance of the world-wide-web (WWW) and
more recently of P2P and real-time applications, the Internet traffic has continued
to grow exponentially.
This rapid growth and the proliferation of new applications have combined to change the character of the Internet in recent years. The volume of traffic and the high capacity of the links have rendered traffic characterization and analysis more difficult and yet more critical, making both a challenging endeavor.
However, we have seen that, nowadays, the variety of software in use on the Internet is not very large and is concentrated in a few programs. For instance, almost all operating systems are Windows or Linux based, TCP is the dominant protocol, there are few TCP versions, and Web and P2P are the most common applications. Moreover, since many people use the same search engines (Google, Google Scholar, etc.), users tend to show similar behavior when using the Internet. Hence, the need for realistic models and methodologies for understanding network behavior plays an even more essential role in facilitating the evolution toward future gigabit-per-second speeds. From our point of view, it seems inevitable that simulation and empirical techniques to describe traffic behavior will play a larger role than traditional mathematical techniques have played in the past.
Internet traffic measurement has been a subject of interest as long as there has been Internet traffic. Important results have been obtained from experimental analysis of packet traces. For instance, studies in the 1990s of traffic measurements from a variety of different networks provided ample evidence that actual network traffic is self-similar or fractal in nature, i.e., bursty over a wide range of time scales. This observation is in contrast to common modeling choices in engineering theory, where exponential assumptions are still used to reproduce
the bursty behavior.
Moreover, understanding the nature of network traffic is critical in order to
properly design and implement network devices and network interconnections.
Our work is based on empirical investigations of high speed Internet links which
aggregate substantial amounts of traffic. Nowadays, packet traces are collected
and used for performance evaluation purposes in many systems. For instance,
we can use packet traces to evaluate functions ranging from basic packet forwarding to complex functions such as quality of service (QoS) processing, packet classification, security, billing and accounting of modern network devices.
High performance in these network devices is necessary to support these new functionalities. For this reason, several techniques have been proposed for high speed network devices. For example, MPLS (Multi Protocol Label Switching) combines the flexibility of layer-3 routing with the high capacity of layer-2 switching. However, its performance can be strongly affected by the set of control parameters. To choose the appropriate control parameter set, we need to know the characteristics of the Internet traffic.
Moreover, these network devices must not only achieve high performance, but they also must have the flexibility to deal with the large and ever-increasing demands for new and more complex network functionalities. To execute these tasks, general-purpose processors or specific processors known as Network Processors [41] are normally used; otherwise, the basic forwarding-plane functions are performed in software. In both cases, the performance of these systems depends not only on parameters such as packet length or inter-packet time, but also on some semantic properties of flows, such as spatial and temporal locality of IP addresses, IP address structure, TCP flags sequence, type of service, etc.
A critical requirement for the performance evaluation and design of those network elements is the availability of realistic traffic traces. Nowadays, there is a growing interest in capturing Internet traffic in pursuit of insights into its evolution. A popular scheme to obtain real traces for extended periods of time is to collect them from routers [20]. There are, however, several reasons that make it difficult, in many cases, to gain access to them.
Firstly, Internet providers are usually reluctant to make public real traces
captured in their networks. Moreover, when these traffic traces are made public [89], they are delivered after some transformations, such as sanitization [91],
which modify some basic semantic properties (such as IP address structure).
Secondly, other problems arise due to the increasing speed of Internet routers. Hardware for collecting traces at high speed (e.g., at link rates of 2.5 Gbps, 10 Gbps or even 40 Gbps) is usually expensive. Moreover, with the increase of link rates, the required storage for packet traces of meaningful duration becomes too large.
1.2. Overview of thesis
This chapter presents the thesis problem, and the main contributions to the
field of research on packet trace characterization, compression and classification.
Chapter 2 presents the related work on traffic modeling and classical traffic characteristics, describing in more detail the self-similar traffic characteristics and the fractal properties of the IP address structure. It also describes the set of traces used in our analysis.
Chapter 3 presents our flow characterization approach which incorporates
semantic characteristics of flows.
Chapter 4 demonstrates that, behind the great number of flows in a high-speed link, there is not so much variety among them. The evidence that Internet traffic shows a small variety of flows has guided us to group the flows into a set of clusters.
Using some concepts provided by information theory, we calculate in Chapter 5 the entropy at the header-field and flow levels. Furthermore, we demonstrate that those outcomes can be used to develop header trace compression and trace classification.
In Chapter 6 we address the problem of compressing huge packet traces. We propose a novel lossless packet header compression method, focused not on the problem of reducing transmission bandwidth or latency, but on the problem of saving storage space. With our proposed method, storage size requirements are reduced to 16% of the original size.
Chapter 7 studies the properties of a new lossy packet trace compression method. The compression ratio that we achieve is around 3%, reducing the file size, for instance, from 100 MB to 3 MB. Although this specification does not define a lossless compressed data format, it preserves important statistical properties present in the original trace, such as self-similarity, spatial and temporal locality, and IP address structure. Furthermore, memory performance studies are presented with the Radix Tree algorithm executing over a trace generated by our method. To support these studies, measurements of memory access and cache miss ratio were taken.
Our approach to trace classification is presented in Chapter 8. It consists of identifying, for each trace, its typical flows.
Finally, in Chapter 9 we summarize our results and mention open areas for
continued research in this area.
1.3. Contribution of our work
The core contributions of our work encompass four areas:
- Semantic traffic characterization;
- Lossless packet trace compression;
- Lossy packet trace compression;
- Packet trace classification.
The first component deals with packet traffic characterization. It differs from previous studies in its focus on semantic characterization. By semantic characterization we mean the analysis of traffic characteristics including the inter-packet time and some of the most important fields (source and destination address, port numbers, packet length, TCP flags, etc.) of the TCP/IP header content. Many published papers show important characteristics of the traffic, such as inter-packet time, traffic intensity, packet length, etc.; but in our opinion, more semantic aspects of flows are required for a useful traffic characterization. We propose, for Internet traffic, a novel semantic characterization. We demonstrate that, behind the great number of flows in a high-speed link, there is not so much variety among them and clearly they can be grouped into a set of clusters [61]. In this analysis, we have assumed the most common case of storing TSH (Time Sequence Header) packet header files [112].
Using the evidence that many flows can be grouped into few clusters, we have constructed a new lossless packet header trace compressor. We propose a novel packet header compressor, focused not on the problem of reducing transmission bandwidth or latency, but on the problem of saving storage space. In our case, we take advantage of knowing in advance all the packets in a flow to be compressed, and the compression ratio that we achieve is around 16%, reducing the file size, for instance, from 100 MB to 16 MB [59].
To reach this performance, the method uses two classes of algorithms. The first class fits well for small flows and the second for large flows. Analysis using both classes has demonstrated that the best combined performance is reached when we consider small flows ranging from 1 to 7 packets per flow. The final method presented is lossless in the sense that for some fields the decompression algorithm regenerates exactly the original value, while for others, those for which the initial values are random, as for instance the initial TCP sequence number, the values are shifted, as if we were capturing the trace at another execution time. Evidently, these changes do not affect, in most cases, the analysis made from the decompressed file. Other known methods have their compression ratios bounded at 50% and 32%.
Then, in order to reach a higher compression ratio, the third component proposes a lossy compression method [62]. For some specific research purposes, a lossy method that preserves important statistical properties present in the original traces can be more appropriate. The compression ratio that we achieve with this method is around 3%.
The last component proposes, for Internet traffic, a new approach to classify the traffic based on the flow clustering spectrum. Using a three-step methodology, the proposed packet trace classification identifies how similar traces collected from different links are [60].
Figure 1.1 illustrates how these four components are interconnected.
Figure 1.1: Relation between the main contributions
Chapter 2
Traffic modeling and
characterization
This chapter presents the classical traffic characteristics and the related work on traffic modeling, describing in more detail the self-similar traffic characteristics and the fractal properties of the IP address structure. Moreover, we describe the set of traces used in our analysis.
2.1. Classical traffic characteristics
The complexity of Internet traffic necessitates that we characterize it as a function of multiple dimensions in order to understand the network mechanisms. These traffic characteristics are strongly influenced by a set of factors such as: delay, jitter, loss, throughput, utilization, reachability, availability, burstiness, and length. Below, we describe each of these factors in more detail.
2.1.1. Delay and jitter
Delay and jitter are typically end-to-end performance notions. Delay includes transfer delay, caused by the intermediate switching nodes and end hosts.
Many real-time multimedia applications may require predictable delay, notably
inconsistent with the datagram best-effort architecture of the Internet. In addition,
many continuous media applications will rely on synchronization between audio
and video streams. Thus the variance in delay will also be an important Internet property [26]. Studies have provided evidence for the variance and asymmetry of delay on wide-area Internet infrastructures [85] [25]. Floyd and Jacobson [44] [45] analyze traffic phase effects in packet-switched gateways and end systems, their potentially damaging effects on Internet performance, and suggest possible ways to mitigate the systematic tendency of routing protocols to synchronize in large systems. Zhang et al. [123] study the phase effects of congestion control algorithms and other aspects of TCP implementations [84].
2.1.2. Loss
Multimedia applications may require low or predictable delay but not necessarily completely lossless services, while other applications require transmission
guarantees but not strict delay bounds. In addition to the effect of loss on protocol performance and network dynamics [115] [104] [4] [85], studies have also investigated the potential impact of loss on charging policies [103] [76]. Related to loss are the reliability and availability of given links or nodes in the network.
2.1.3. Throughput
According to Jain [67], throughput is defined as the rate (requests per unit of time) at which requests can be serviced by the system, and includes:
- nominal capacity or bandwidth: achievable throughput under ideal workload conditions;
- usable capacity: throughput achievable under actual workload conditions;
- efficiency: ratio of maximum achievable throughput (usable capacity) to nominal capacity.
2.1.4. Utilization
The metric that typically comes to mind in describing a network, at least
for a network operator, is utilization. Utilization metrics can reflect any measured
granularity, and statistics of their distribution, including mean, variance, and percentile statistics, can reveal trends over both short and long intervals. Related to
utilization is the congestion of the network, or contention for either bandwidth or
switching resources. Measurements of congestion include distributions of queue
length or available buffers in nodes. Several studies of local area environments
focus on short-term utilization characteristics [14] [79]. Longer term utilization
metrics would include traffic volume growth over several years on a backbone
infrastructure [23]. An Internet service provider will tend to pay attention to utilization metrics as indicators of how close their network is to saturation so they
can plan for upgrades.
2.1.5. Reachability
As networks increase their range of possible destinations, so does the size
of routing tables, and thus the cost of maintaining them in switching nodes and
the cost of searching them in forwarding datagrams. Metrics such as the size of
these routing tables, or the number of IP network numbers to which an Internet
component can route traffic, are indicators of network reachability.
2.1.6. Reliability
The reliability of a system is usually measured by the probability of errors
or by the mean time between errors.
2.1.7. Availability
The availability of a system is defined as the fraction of the time the system
is available to service user’s requests. The time during which the system is not
available is called downtime; the time during which the system is available is
called uptime. Often the mean uptime, better known as the Mean Time To Failure
(MTTF), is a better indicator.
2.1.8. Burstiness
The statement that traffic is bursty means that the traffic is characterized by
short periods of activity separated by long idle periods. There are two implications
to the bursty nature of data traffic. First of all, the fact that the long-term average
usage by a single source is small would suggest that the dedication of facilities
to a single source is not economical and that some sort of sharing is appropriate.
The second aspect of burstiness is that sources transmit at a relatively high instantaneous rate. This is, in effect, a requirement on the form that this sharing of
facilities can take [55].
Burstiness metrics fall into two categories: those that measure inter-arrival processes, e.g., time between packet arrivals, and those that measure arrival processes, e.g., number of packets per time interval.
2.1.9. Length
Payload is the amount of information carried in a packet. What constitutes
information will depend on the layer, e.g., the payload of an IP packet would be
the contents of the packet following the IP header. A loose definition of payload
sometimes includes the entire packet including headers. Packet payload is one
indicator of protocol efficiency, although a more accurate analysis of efficiency
will also reflect end-to-end behavior, including acknowledgments, retransmission,
and update strategies of different protocols.
The differences in payload per application are also visible at the aggregate level. In [23] and [15], evidence is presented for significantly different distributions of traffic in packets and bytes across individual networks. The disparity between the number of packets and the number of bytes sent by networks indicates a definite difference in workload profiles, where specific networks, likely those with major data repositories, source mostly large packet sizes into the backbone.
2.2. Traffic Modeling
In recent years, many researchers have developed mathematical models which have provided a great deal of insight into the design and performance of network systems. The fundamental aim of this section is to present these models and to give an overview of the state of the art in traffic modeling.
The work of Erlang [17] [37] [38] in the context of telephone switching systems constitutes the pioneering work on traffic modeling. Erlang found that, given a sufficiently large population, the random arrivals of calls can be described by a Poisson process and that the duration of a call follows an exponential distribution. An important result obtained by Erlang is the probability of a busy signal as a function of the traffic level and the number of lines that have been provided.
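The busy-signal probability Erlang derived is today usually written as the Erlang B formula. As an illustration of that classical result (not a formula stated in this chapter), the following short sketch computes it with the standard numerically stable recurrence, where the offered load in Erlangs is the call arrival rate times the mean call duration:

def erlang_b(offered_load: float, lines: int) -> float:
    """Probability that a call finds all `lines` busy, given `offered_load` Erlangs.

    Uses the recurrence B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1)),
    which avoids computing large factorials directly.
    """
    b = 1.0
    for n in range(1, lines + 1):
        b = offered_load * b / (n + offered_load * b)
    return b

# Example: 10 Erlangs of offered traffic on a group of 15 lines.
print(f"P(busy signal) = {erlang_b(10.0, 15):.4f}")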
In the generic queueing theory model, the theory attempts to find probabilistic descriptions of quantities such as the sizes of waiting lines, the delay experienced by an arrival, and the availability of a service facility [55]. In the voice telephone network, for instance, demands for service take the form of subscribers initiating calls. In this application the server is the telephone line. The analog of the service time of a customer is the duration of the call.
In connection with queues, a convenient notation has been developed. In
its simplest form it is written A/R/S, where A designates the arrival process, R
the service required by an arriving customer, and S the number of servers. For
example, M/M/1 indicates Poisson arrivals of customers, exponentially distributed service times and a single server. The M may be taken to stand for memoryless (or Markovian) in both the arrival process and the service time.
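For reference, the standard steady-state results for the M/M/1 queue with arrival rate $\lambda$, service rate $\mu$ and utilization $\rho = \lambda/\mu < 1$ (textbook formulas, quoted here for context rather than derived in the text) are:

$$P\{N = n\} = (1-\rho)\,\rho^{\,n}, \qquad E[N] = \frac{\rho}{1-\rho}, \qquad E[T] = \frac{1}{\mu - \lambda}$$

Here $E[N]$ is the mean number of customers in the system and $E[T]$ the mean time spent in the system; the two are related by Little's law, $E[N] = \lambda\,E[T]$.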
Although queueing theory was developed for voice traffic, it was applicable
to computer communications. The generation of data messages at a computer
corresponds to the initiation of voice calls to the network. The time required to
transmit the message over a communications facility corresponds to the required
time of service.
Measurements of traffic in data systems have shown that some message generation can be modeled as a Poisson process. In particular, it was shown that the Poisson arrival process is a special case of the pure birth process. This led directly to the consideration of birth-death processes, which model certain queueing systems in which customers having exponentially distributed service requirements arrive at a service facility at a Poisson rate.
Consider a network of communication links and assume that there are several packet streams, each following a unique path that consists of a sequence of links through the network. Let $\lambda_s$, in packets/sec, be the arrival rate of packet stream $s$. Then the total arrival rate at link $(i,j)$ is:

$$\lambda_{ij} = \sum_{\text{all packet streams } s \text{ crossing link } (i,j)} \lambda_s \qquad (2.1)$$
The preceding network model is well suited for virtual circuit networks, with
each packet stream modelling a separate virtual circuit. For datagram networks,
it is necessary to use a more general model that allows bifurcation of the traffic of a packet stream, i.e., there may be several paths followed by the packets of a stream. Let $\lambda_s$ denote the arrival rate of packet stream $s$, and let $f_{ij}(s)$ denote the fraction of the packets of stream $s$ that go through link $(i,j)$. Then the total arrival rate at link $(i,j)$ is:

$$\lambda_{ij} = \sum_{\text{all packet streams } s \text{ crossing link } (i,j)} f_{ij}(s)\,\lambda_s \qquad (2.2)$$
It can be seen from the special case of two queues that, even if the packet streams are Poisson with independent packet lengths at their point of entry into the network, this property is lost after the first transmission line. To resolve this dilemma, Kleinrock [72] [71] suggested that merging several packet streams on a transmission line has an effect similar to restoring the independence of interarrival times and packet lengths. It was concluded that it is often appropriate to adopt an M/M/1 queueing model for each communication link regardless of the interaction of the traffic on this link with traffic on other links. This is known as the Kleinrock independence approximation and seems to be a reasonably good approximation for systems involving Poisson stream arrivals at the entry points, packet lengths that are nearly exponentially distributed, a densely connected network and moderate-to-heavy traffic load.
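Under this approximation, each link $(i,j)$ with capacity $C_{ij}$ and mean packet length $1/\mu$ behaves as an independent M/M/1 queue, which leads to the well-known estimate of the average per-packet delay in the network (a standard textbook consequence quoted here for context rather than a result of this chapter; propagation and processing delays are ignored):

$$T = \frac{1}{\gamma} \sum_{(i,j)} \frac{\lambda_{ij}}{\mu C_{ij} - \lambda_{ij}}, \qquad \gamma = \sum_{s} \lambda_s,$$

where $\gamma$ is the total arrival rate into the network and $\lambda_{ij}$ is given by (2.1) or (2.2).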
Queueing models were also extensively used for ATM traffic. An interesting modeling approach was to decompose the problem by time scales. This approach was first introduced by Hui [63]. The decomposition is based on the qualitatively different nature of the system at three different time scales: call scale, burst scale and cell scale. At the cell time scale, the traffic consists of discrete entities, the cells, produced by each source at a rate which is often orders of magnitude lower than the transmission rate of the output link. At the burst time scale the finer cell scale is ignored and the input process is characterized by its instantaneous rate. Consequently, fluid flow models appear as a natural modelling tool [97]. At the cell level, the mechanisms that handle individual cells were studied. Basically, two classes of cell-level models were used: those based on renewal processes and those based on Markov modulated processes. At the burst level, several models where traffic is considered as a continuous fluid are listed below:
- Renewal rate process
- On/Off processes and their superpositions
- Poisson burst process
- Gaussian traffic modeling
- Markov modulated rate process
Traffic measurement studies, however, have demonstrated that data traffic characteristics differ significantly from Poisson features. These studies may be
classified as belonging either to local area networks (LAN) or wide area networks (WAN). Jain et al. [68] studied the traffic on a token ring network, and showed that successive packet arrivals on a token ring network were neither Poisson nor compound Poisson. An alternative model was proposed, using the concept of a packet train that represents a cluster of arrivals characterized by a fixed range of inter-arrival times. This model considers a track between two nodes A and B. All packets on the track are flowing either from A to B or from B to A. A train consists of packets flowing on this track with the intercar time between them being smaller than a specified maximum, referred to as the maximum allowed intercar gap (MAIG). If no packets are seen on the track for MAIG time units, the previous train is declared to have ended and the next packet is declared to be the locomotive (first car) of the next train. The intertrain time is defined as the time between the last packet of a train and the locomotive of the next train (figure 2.1).
Figure 2.1: Packet Trains
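As a concrete illustration of the train definition above (an illustrative sketch, not tooling from the thesis), the following function splits the packet timestamps observed on one track into trains using the maximum allowed intercar gap (MAIG); the first packet of each train is its locomotive:

def split_into_trains(timestamps, maig):
    """Group sorted packet timestamps (seconds) observed on one track into trains.

    A new train starts whenever the gap since the previous packet exceeds the
    maximum allowed intercar gap (MAIG); the first packet of a train is its
    locomotive.
    """
    trains, current = [], []
    for t in timestamps:
        if current and t - current[-1] > maig:
            trains.append(current)       # previous train has ended
            current = []
        current.append(t)
    if current:
        trains.append(current)
    return trains

# Example: with MAIG = 0.5 s these eight packets form three trains of 3, 2 and 3 cars.
arrivals = [0.00, 0.05, 0.12, 1.00, 1.02, 3.00, 3.10, 3.15]
print([len(train) for train in split_into_trains(arrivals, maig=0.5)])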
More recent studies show that it is not accurate to use a Poisson process to model data traffic. A pioneering work by Leland et al. [79] showed that the LAN traffic generated by Ethernet-connected workstations, file servers and personal computers exhibits the same degree of correlation when aggregated using window sizes increasing from seconds to hours. This work proposed that Ethernet LAN traffic is statistically self-similar, and that major components of Ethernet LAN traffic, such as external LAN traffic or external TCP traffic, share the same self-similar characteristics as the overall LAN traffic. Since this work, many researchers have shown features of data traffic that exhibit self-similar, long-range dependent (LRD) properties. A stochastic process is said to have LRD if it is wide-sense stationary and the sum of its autocorrelation values diverges.
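In symbols, for a wide-sense stationary process with autocorrelation function $\rho(k)$ this is usually formalized as (standard definitions, added here for precision):

$$\sum_{k=1}^{\infty} \rho(k) = \infty, \qquad \text{typically with } \rho(k) \sim c\,k^{-\beta}, \; 0 < \beta < 1,$$

which corresponds to a Hurst parameter $H = 1 - \beta/2 \in (1/2, 1)$; the Hurst parameter is discussed below.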
More recent investigations have proposed other notions for burstiness, such
as focusing on the arrival process, as a counting process, rather than the interarrival process. Willinger et al. [118] [79] have studied the packet arrival process, and the correlation of packet arrivals in local environments. In their study
of several Ethernet environments [79] they present evidence that Ethernet traffic
is self-similar, implying no natural burst length. Such traffic exhibits the same
correlation structure at various aggregation granularities. They conclude that empirical data demands reexamination of currently considered formal models for
packet traffic, e.g., pure Poisson or Poisson-related models such as Poisson-batch
or Markov-Modulated Poisson processes [56], packet train models [68], and fluid
flow models [8]. In particular, their evidence indicates that Poisson modeling assumptions are false for environments that aggregate much traffic. Contrary to the Poisson assumption, the traffic profile of their measured environments becomes burstier rather than smoother as the number of active sources increases; the Poisson assumption appears to hold only during low-traffic periods with mostly machine-generated router-to-router traffic.
While these studies focus on LAN, specifically Ethernet traffic, Paxson and
Floyd [94] also comment on the potential self-similarity of wide-area traffic. They
also note that packet inter-arrivals are not exponentially distributed [68] [30] [45].
Paxson et al. [95] evaluated 21 WAN traces in their traffic analysis research.
They considered both the Poisson process model and new models to characterize
FTP and TELNET traffic and found that in some cases commonly used Poisson
models result in serious underestimation of TCP traffic burstiness that existed over
a wide range of time scales. For interactive TELNET traffic, the exponentially
distributed inter-arrivals commonly used to model packet arrivals generated by
the user side of a TELNET connection grievously underestimated the variability of
these connections. For applications such as SMTP and NNTP, connection arrivals are not well modeled as Poisson since both types of connections are machine-initiated and can be timer-driven. For large bulk transfers, exemplified by FTP, the traffic structure deviates significantly from the Poisson model. Paxson et al. also offered results that suggest self-similar properties of WAN traffic.
The degree of self-similarity present in the process is measured in terms of the Hurst parameter [79]. The Hurst parameter has been proposed as a measure of the burstiness of the traffic. The persistence of traffic burstiness is one of the main causes of congestion, implying packet loss and delays.
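One common way to estimate $H$ from a packet count series is the aggregated-variance method (one of several estimators; the sketch below is an illustrative implementation under the usual assumptions, not code from the thesis):

import numpy as np

def hurst_aggregated_variance(counts, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter of a count series (e.g. packets per interval).

    For a self-similar series the variance of the m-aggregated series scales as
    m**(2H - 2), so H is recovered from the slope of a log-log least-squares fit.
    """
    x = np.asarray(counts, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue
        aggregated = x[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(aggregated.var()))
    slope = np.polyfit(log_m, log_var, 1)[0]
    return 1.0 + slope / 2.0

# Sanity check: an uncorrelated Poisson count series should give H close to 0.5.
rng = np.random.default_rng(0)
print(round(hurst_aggregated_variance(rng.poisson(10, 100_000)), 2))

For a long-range dependent series the estimate lies between 0.5 and 1, with larger values indicating more persistent burstiness.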
A further study by Willinger et al. [120] provided a plausible physical explanation for the occurrence of self-similarity in high-speed network traffic. The superposition of many ON/OFF sources with strictly alternating ON and OFF periods, whose ON periods or OFF periods exhibit the Noah effect (i.e. have infinite variance), can produce aggregate network traffic that exhibits the Joseph effect (i.e. is self-similar or LRD). This was provided as a physical explanation for the presence of self-similar traffic patterns in modern high-speed network traffic that is consistent with traffic measurements at the source level.
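A minimal sketch of that construction (illustrative only; the number of sources and the Pareto shape parameter are arbitrary choices, not values from the thesis) superposes ON/OFF sources whose ON and OFF period lengths are heavy-tailed with infinite variance; the resulting aggregate count series can be fed to the Hurst estimator sketched above:

import numpy as np

def onoff_aggregate(n_sources=200, n_slots=50_000, alpha=1.5, rate=1.0, seed=1):
    """Aggregate packet counts per time slot from ON/OFF sources.

    Each source alternates ON and OFF periods whose lengths (in slots) are
    Pareto-distributed with shape 1 < alpha < 2, i.e. finite mean but infinite
    variance (the Noah effect); during ON periods the source emits `rate`
    packets per slot.
    """
    rng = np.random.default_rng(seed)
    total = np.zeros(n_slots)
    for _ in range(n_sources):
        t = 0
        on = rng.random() < 0.5          # random initial state
        while t < n_slots:
            length = int(np.ceil(rng.pareto(alpha) + 1.0))
            if on:
                total[t:min(t + length, n_slots)] += rate
            t += length
            on = not on                  # alternate ON and OFF periods
    return total

counts = onoff_aggregate()               # e.g. pass to hurst_aggregated_variance(counts)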
Crovella et al. [28] showed that Web traffic is self-similar, and that the self-similarity is in all likelihood attributable to the heavy-tailed distributions of transmission times of documents and of silent times between document requests. The data traces used in the Crovella study were recorded by Cunha et al. [29].
Over the last years, researchers have pointed out evidence that Internet traffic shows long-range dependence, scaling phenomena and heavy-tail behavior. Furthermore, networking series, such as the aggregate number of packets and bytes over time, have been shown to exhibit correlations over large time scales and self-similar scaling properties.
These models invalidated the traditionally used assumptions in
modeling and simulations, namely that packet arrivals are Poisson and packet sizes
and interarrival times are mutually independent.
However, the results presented in [70] showed that, unlike the older data sets, current network traffic can be well represented by the Poisson model at sub-second time scales, i.e., up to sub-second time scales the traffic is characterized by a stationary Poisson model.
2.2.1. Traffic Flow Properties
Efforts to model and characterize computer network traffic have focused on temporal statistics of packet arrivals and on the packet size distribution. These models were used for link and buffer size dimensioning, respectively.
However, in recent years, we have seen a strong tendency to use a flow-based approach to model Internet traffic. Claffy, Braun and Polyzos [22] have
presented a parameterizable methodology for profiling Internet traffic flows at a
variety of granularities. Barakat et al. [11] have presented a traffic model at flow
level by a Poisson shot-noise process. In this model, a flow is a generic notion that
must be able to capture the characteristics of any kind of data stream.
Moreover, many published papers have studied some traffic characteristics
such as flow size, flow lifetime, IP address locality, and IP address structure. For
instance, in [51] flow size distribution is studied, introducing a flow classification
based on number of bytes, i.e. mice or elephants. In [19] flows are classified by
lifetime, demonstrating that most flows are very short. Kohler, Li, Paxson, and Shenker [75] have investigated the structure of addresses contained in IP traffic.
All these studies show important characteristics of the traffic, but in our opinion,
more semantic aspects of flows are required for a useful traffic characterization.
In terms of synthetic generation, Barford and Crovella [12] created a Web
workload generator which mimics a set of real users accessing a server. The tool, called SURGE, generates references matching empirical measurements of: server
file size distribution, request size distribution, relative file popularity, embedded
file references, temporal locality of reference and idle periods of individual users.
The work of Aida and Abe [6] investigates the stochastic property of the packet
destinations and proposes an address generation algorithm which is applicable for
describing various Internet access patterns.
2.2.2. Popularity
Popularity has been extensively studied, mainly for Web reference streams. Web reference streams show highly skewed popularity distributions, which are usually characterized by the term Zipf's Law [49] [29] [7] [16].
Zipf's Law was originally applied to the relationship between a word's popularity in terms of rank and its frequency of use. It states that if one ranks the popularity of words used in a given text (denoted by $\rho$) by their frequency of use (denoted by $P$), then

$$P \sim k/\rho \qquad (2.3)$$
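To make the rank-frequency relation concrete, the following sketch (an illustration with synthetic data, not measurements from the thesis) generates a reference stream whose popularity follows $P \sim k/\rho$ and recovers the exponent as the slope of the log-log rank-frequency plot:

import numpy as np
from collections import Counter

# Synthetic reference stream over 1000 objects with popularity p(i) proportional to 1/i.
rng = np.random.default_rng(0)
n_objects = 1000
p = 1.0 / np.arange(1, n_objects + 1)
p /= p.sum()
requests = rng.choice(n_objects, size=200_000, p=p)

# Rank objects by observed request count and fit the log-log rank-frequency slope.
counts = np.array(sorted(Counter(requests).values(), reverse=True), dtype=float)
ranks = np.arange(1, len(counts) + 1)
slope = np.polyfit(np.log(ranks), np.log(counts), 1)[0]
print(f"log-log rank-frequency slope ~ {slope:.2f}  (Zipf's Law corresponds to about -1)")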
The practical implication of Zipf-like distributions for reference streams is that most references are concentrated among a small fraction of all of the objects referenced. Based on the near-ubiquity of Zipf-like distributions in the Web, many authors have captured popularity skew [69] [16] [12]. Highly popular documents tend to be requested frequently, and thus will exhibit shorter inter-request times; less popular documents, on the other hand, tend to be requested infrequently, and thus will exhibit longer inter-request times. The relationship between popularity and temporal locality was shown in [69]; that is, Zipf's Law results in strong temporal locality [46].
Designers of computer systems incorporated the notion of memory reference locality in system design years ago, largely through the use of virtual memory and memory caches [58]. In deriving metrics for network traffic locality, Jain [66] draws a comparison to memory reference locality, which is either spatial or temporal. Spatial locality refers to the likelihood of reference to memory locations near previously referenced locations. Temporal locality refers to the likelihood of future references to the same location. In network traffic, the concentration of references to a small fraction of addresses and the persistence of references
three measures of locality: the network traffic income distribution, which can reflect long or short-term locality; and two metrics that only apply to short-term
locality assessment: the average working set size as a function of packet window size, and the stack depth probability distribution. The income distribution
measures what percent of communicating network entities is responsible for what
percent of traffic on the network. Changes in the working set of source or destination IP networks are indicators of source-based and destination-based favoritism.
To measure the working set one plots the number of unique address references
as a function of the number of total address references. The stack level probability distribution measures the likelihood of reference to a network address as a
function of the previous reference to that address.
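As an illustration of these two short-term metrics (an illustrative sketch with made-up addresses, not code from the thesis or from [66]), the working set size can be computed as the number of unique addresses per window of packets, and stack depths can be obtained from an LRU stack of addresses:

from collections import Counter

def working_set_sizes(addresses, window):
    """Number of unique addresses seen in each consecutive window of `window` packets."""
    return [len(set(addresses[i:i + window]))
            for i in range(0, len(addresses) - window + 1, window)]

def stack_distances(addresses):
    """LRU stack distance of each reference (None for the first reference to an address)."""
    stack, distances = [], []
    for a in addresses:
        if a in stack:
            d = stack.index(a) + 1       # depth 1 = most recently used address
            stack.remove(a)
        else:
            d = None                     # cold (first) reference
        distances.append(d)
        stack.insert(0, a)               # referenced address becomes most recently used
    return distances

trace = ["A", "B", "A", "C", "B", "B", "A"]
print(working_set_sizes(trace, window=4))                         # [3]
print(Counter(d for d in stack_distances(trace) if d is not None))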
Gulati et al. [50] offer four locality metrics, two of which overlap with
those of Jain: persistence; address reuse, which is similar to persistence with the
requirement for consecutive reference loosened; concentration; and reference density. Reference density reflects the number of communicating entities responsible
for a given percentile of the network traffic.
Many previous studies have established the existence of network traffic locality, in particular short-term traffic locality in specific network environments for
selected granularities of network traffic flow. Jain originally established the packet
train model to study locality behavior on a local area network [68]. Other studies,
though not focused on packet trains in particular, also find evidence for locality
even in networks of wider geographic scope [24] [57] [93] [39] [30] [23] [105].
Others [2] [1] [83] have extended the packet train model to the transport and application layers, defining a train as a quadruple of source/destination address pairs
in conjunction with port numbers.
Just as program and data caching policies can exploit memory reference locality in a virtual memory system, router designers can exploit traffic locality with
analogous schemes such as caching network addresses and specialized flow information in switching nodes. [43] and [66] simulate caching algorithms on traffic traces taken from LAN gateways. Feldmeier [43] estimated the potential benefit of caching on the performance of gateway routing tables. Using measured traffic from gateways at MIT, he simulated a variety of fully associative cache replacement algorithms (LRU, FIFO, and random) to determine cache performance metrics such as hit ratio and inter-fault distance. His data indicated that the probability of reference to a destination address versus the time of the previous reference to that address monotonically decreases for up to 50 previous references, implying that an LRU cache management procedure is optimal for caches of 50 slots or less. His conservative conclusion was that caching could reduce gateway routing table lookup time by up to 65%. In addition to caching destination addresses, his simulations indicate benefits from caching source addresses as well.
Jain [66] also performed trace-driven cache simulations. Simulating MIN
(optimal), LRU, FIFO, and random replacement algorithms, he found significantly different locality behavior between interactive and non-interactive traffic. The interactive traffic did not follow the LRU stack model while the non-interactive traffic did. In particular, the periodic nature of certain protocols may
make caches ineffective unless they are sufficiently large. Such environments may
require larger or multiple caches, or new cache replacement/fetch algorithms.
Estrin and Mitzel [39] also explore locality in their investigation of lookup
overhead in routers. They use data collected at border routers and transit networks
to estimate the number of active conversations at a router, which reflects the storage
requirements for the associated conversation state table. They find that maintaining fine grained traffic state may be possible at the network periphery, but deeper
within the network coarser granularity may be necessary. They also use the traces
to perform simulations of an LRU cache for different conversation granularities,
and find that improvements in state lookup time are possible with a small cache,
even without special hardware.
Gulati et al. [50] have explored LAN cache performance for source addresses, destination addresses, and both source and destination addresses. In their measurement study of LAN traffic they find that it is more important to cache destination rather than source addresses, especially for caches with more than 15 entries.
One reason is that many source hosts send very few packets, and thus the cost
of caching the source address is greater than the benefit. Another reason is that
source addresses are poor predictors of destination address references in the future.
Regarding locality in WWW traffic, Almeida et al. [7] proposed models
for both temporal and spatial locality of reference in streams of requests arriving
at Web servers. They showed that simple models based on document popularity
are insufficient for capturing either temporal or spatial locality. Moreover, they
showed that temporal locality can be characterized by the marginal distribution
of the stack distance trace and that spatial locality can be characterized using the
notion of self-similarity.
2.2.3. Long range dependence and self-similarity
From Ethernet traffic [79] to wide-area traffic [95], passing through Web
traffic [28], all have been characterized by statistical self-similarity. Some have
modeled quite successfully such traffic with ON/OFF traffic sources [119] while
others have tried to fit a Markov-modulated model [102]. The implications of self-similarity for network performance, as well as the impact of network handling mechanisms on traffic characteristics, have been discussed in [92]. An important idea had already been raised in [6]: the heavy-tailed characteristics of the sizes of the objects transferred over the network could suffice for generating self-similarity.
It has been shown that heavy-tailed file transfer duration and file sizes can lead to
high variability.
Taqqu et al. [110] proved that aggregate World Wide Web traffic as found on Internet links can be modeled by superposing many ON/OFF traffic sources where the ON and OFF periods are drawn from heavy-tailed distributions. This method afforded the generation of traffic traces for simulation in linear time. Traffic generated by one of the ON/OFF traffic sources in the above
mentioned model was representative of a single Web user [79]. Deng [36] proposed an ON/OFF traffic model to be used for the simulation of traffic generated
by an individual browsing the Web. He derived distributions for the parameters
of the model by means of analyzing datasets measured on a corporate network.
He used probability plots to gauge the goodness-of-fit of analytic distributions
to the datasets. The model had the advantage of being simple and of generating
self-similar traffic due to the heavy tailed nature of the ON and OFF distributions.
In the study of wide area networks (WANs), Klivansky et al. [73] examined the
packet traffic from geographically dispersed locations on the NSFNET T3 backbone, and indicated that packet-level traffic over NSFNET core switches exhibits
LRD. Their key conclusion is that LRD in TCP traffic is primarily caused by the
joint distributions of two TCP conversation parameters: the one-way number of
packets per conversation and the conversation duration.
Along this section, we describe the main concepts related to self-similarity. We start by defining the concept of a stationary ergodic process. A process X is stationary if its behavior or structure is invariant with respect to shifts in time. X is strictly stationary if (X_{t_1}, ..., X_{t_k}) and (X_{t_1+\tau}, ..., X_{t_k+\tau}) possess the same joint distribution for all k ≥ 1, all t_1, ..., t_k, and all shifts τ. A process is said to be stationary ergodic when it is possible to estimate the process statistics (mean, variance, autocorrelation function, etc.) from the observed values of a single time series {X_t, t = 1, 2, ...}.
For a stationary time series X, we define the m-aggregated time series X^(m) = {X_k^(m), k = 1, 2, 3, ...} by averaging the original time series over non-overlapping, adjacent blocks of size m. This may be expressed as

$$X_k^{(m)} = \frac{1}{m} \sum_{i=(k-1)m+1}^{km} X_i \qquad (2.4)$$
One way of viewing the aggregated time series is as a technique for compressing the time scale. If the statistics of the process (mean, variance, correlation,
etc.) are preserved with compression, then we are dealing with a self-similar process.
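As a simple illustration (our own sketch, not part of the original analysis), the following Python fragment builds the m-aggregated series of equation (2.4) using NumPy; the Poisson input is placeholder data standing in for a measured packet-count series:

```python
import numpy as np

def aggregate(x, m):
    """Return the m-aggregated series: averages over non-overlapping blocks of size m."""
    n = (len(x) // m) * m              # drop the incomplete final block
    return x[:n].reshape(-1, m).mean(axis=1)

# Example: packet counts per 10 ms interval (synthetic placeholder data).
x = np.random.poisson(lam=5.0, size=100_000).astype(float)
for m in (1, 10, 100, 1000):
    print(m, aggregate(x, m).var())
```

For a self-similar series the printed variances would decay roughly like m^(-β) with β < 1, more slowly than the 1/m decay seen for this short-range dependent Poisson example.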
Some of the properties of self-similar processes are most clearly stated in terms of the autocorrelation r(k), defined as:

$$r(k) = \frac{E[(X_t - \mu)(X_{t+k} - \mu)]}{E[(X_t - \mu)^2]} \qquad (2.5)$$

However, other properties are best expressed in terms of the auto-covariance:

$$\gamma(k) = E[(X_t - \mu)(X_{t+k} - \mu)] \qquad (2.6)$$

where γ(k) = σ²·r(k), with μ and σ² the mean and variance of the process.
When a given pattern is reproduced exactly at different scales, it might be termed exact self-similarity. This exact self-similarity can be constructed for a deterministic time series. A process is said to be exactly self-similar with parameter β (0 < β < 1) if for all m = 1, 2, ... we have:

$$\operatorname{Var}(X^{(m)}) = \frac{\operatorname{Var}(X)}{m^{\beta}} \qquad (2.7)$$

$$r^{(m)}(k) = r(k) \quad \text{for all } k \qquad (2.8)$$

where r^(m) denotes the autocorrelation of the aggregated process X^(m).

The parameter β is related to the Hurst parameter, defined as H = 1 - β/2. For a stationary, ergodic process, β = 1 and the variance of the time average decays to zero at the rate of 1/m. For a self-similar process, the variance of the time average decays more slowly.

A process X is said to be asymptotically self-similar if for all large enough m:

$$\operatorname{Var}(X^{(m)}) \approx \frac{\operatorname{Var}(X)}{m^{\beta}} \qquad (2.9)$$

$$r^{(m)}(k) \approx r(k) \qquad (2.10)$$

Thus, with this definition of self-similarity, the autocorrelation of the aggregated process has the same form as that of the original process.
One of the most significant properties of self-similar processes is referred to as long-range dependence. This property is defined in terms of the behavior of the auto-covariance γ(k) as k increases. In general, a short-range dependent process satisfies the condition that its auto-covariance decays at least as fast as exponentially:

$$\gamma(k) \sim \rho^{|k|} \quad \text{as } |k| \to \infty, \quad 0 < \rho < 1 \qquad (2.11)$$

also, we can observe that:

$$\sum_{k} \gamma(k) < \infty \qquad (2.12)$$

In contrast, a long-range dependent process has a hyperbolically decaying auto-covariance:

$$\gamma(k) \sim |k|^{-\beta} \quad \text{as } |k| \to \infty, \quad 0 < \beta < 1 \qquad (2.13)$$

where β is related to the Hurst parameter as defined earlier. In this case,

$$\sum_{k} \gamma(k) = \infty \qquad (2.14)$$
One of the attractive features of using self-similar models for time series is that the degree of self-similarity of a series is expressed using only a single parameter, which expresses the speed of decay of the series' autocorrelation function. For historical reasons, the parameter used is the Hurst parameter H = 1 - β/2. Thus, for self-similar series with long-range dependence, 1/2 < H < 1. As H approaches 1, the degree of both self-similarity and long-range dependence increases.
A number of approaches have been taken to determine whether a given time series of actual data is self-similar and, if so, to estimate the self-similarity parameter H. Below, we summarize some of the more common approaches.

The first method, the variance-time plot, relies on the slowly decaying variance of a self-similar series. The variance of X^(m) is plotted against m on a log-log plot; a straight line with slope -β greater than -1 is indicative of self-similarity.

The second method, the R/S plot, uses the fact that, for a self-similar dataset, the rescaled range (R/S statistic) grows according to a power law with exponent H in the number of points n included. Thus the plot of R/S against n on a log-log scale has a slope which is an estimate of H.

The third approach, the periodogram method, uses the slope of the power spectrum of the series as the frequency approaches zero. On a log-log scale, the periodogram is a straight line with slope β - 1 = 1 - 2H.

The last method is the Whittle estimator. The two forms most commonly used to calculate it are Fractional Gaussian Noise (FGN) with parameter 1/2 < H < 1, and Fractional ARIMA (p, d, q) with 0 < d < 1/2 [13] [18]. These two models differ in their assumptions about the short-range dependence in the datasets: FGN assumes no short-range dependence while fractional ARIMA can assume a fixed degree of short-range dependence.
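To make the variance-time method concrete, the following sketch (an illustrative stand-in, not the estimator actually used later in this thesis) fits the log-log slope of Var(X^(m)) against m and converts it to a Hurst estimate through H = 1 - β/2:

```python
import numpy as np

def hurst_variance_time(x, ms=(1, 2, 4, 8, 16, 32, 64, 128)):
    """Estimate H from the slope of log Var(X^(m)) versus log m (variance-time plot)."""
    variances = []
    for m in ms:
        n = (len(x) // m) * m
        xm = x[:n].reshape(-1, m).mean(axis=1)   # m-aggregated series
        variances.append(xm.var())
    slope, _ = np.polyfit(np.log10(ms), np.log10(variances), 1)
    beta = -slope                                # Var(X^(m)) ~ m^(-beta)
    return 1.0 - beta / 2.0                      # H = 1 - beta/2

x = np.random.normal(size=100_000)               # i.i.d. noise: expect H close to 0.5
print(round(hurst_variance_time(x), 3))
```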
2.2.4. IP Address Multifractality
The presence of multifractality in IP addresses was originally described in [75]. According to the authors, an address structure can be viewed as a subset of the unit interval [0, 1), where the subinterval [a·2^-32, (a+1)·2^-32) corresponds to address a. Considered this way, an address structure might resemble a Cantor dust-like fractal [81] [96]. The lattice box counting fractal dimension metric naturally fits with address structures and prefix aggregation. The lattice box counting dimension measures, for every p, the number of dyadic intervals of length 2^-p required to cover the relevant dust. These dyadic intervals correspond to the first p bits of the IP addresses (p-aggregates). Given a trace, let N_p be the number of p-aggregates that contain at least one address present in the trace (0 ≤ p ≤ 32). Furthermore, since each p-aggregate contains and is covered by exactly two disjoint (p+1)-aggregates, we know that N_p ≤ N_{p+1} ≤ 2·N_p. Using this notation, the lattice box counting dimension is defined as

$$D = \lim_{p \to \infty} \frac{\log_2 N_p}{p} \qquad (2.15)$$

If address structures were fractal, log_2 N_p would appear as a straight line with slope D when plotted as a function of p.
Adaptations of the well-known Cantor dust construction can generate address structures with any fractal dimension. The original Cantor construction can be extended, for instance, to a multifractal Cantor measure [54] [101]. Begin by assigning a unit of mass to the unit interval I. Then, split the interval into three parts, where the middle part takes up a fraction δ of the whole interval; call these parts I_0, I_1, and I_2. Then throw away the middle part I_1, giving it none of the parent interval's mass. The other subintervals are assigned masses m_0 and m_2 = 1 - m_0. Recursing on the nonempty subintervals I_0 and I_2 generates four nonempty subintervals I_00, I_02, I_20, and I_22 with respective masses m_0², m_0·m_2, m_2·m_0, and m_2². Continuing the procedure defines a sequence of measures in which the interval I_{ε_1...ε_n} has mass m_{ε_1}·...·m_{ε_n} (each ε_i is 0, 1, or 2). To create an address structure from this measure, we choose a number of addresses so that the probability of selecting address a equals its measure. If m_0 and m_2 differ, the measure is multifractal.
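The recursive construction described above can be sketched as follows. This is an illustrative toy implementation (the function name, the choice of m0 = 0.3, and the number of levels are ours): it enumerates the masses of the surviving intervals after a few recursion levels.

```python
from itertools import product

def cantor_masses(m0, levels):
    """Masses of the 2**levels surviving intervals of the multifractal Cantor measure."""
    m2 = 1.0 - m0
    masses = {}
    for labels in product((0, 2), repeat=levels):       # e.g. (0, 2, 2, 0)
        mass = 1.0
        for eps in labels:
            mass *= m0 if eps == 0 else m2
        masses[labels] = mass
    return masses

masses = cantor_masses(m0=0.3, levels=4)
print(len(masses), round(sum(masses.values()), 6))      # 16 intervals, total mass 1.0
print(max(masses.values()), min(masses.values()))       # unequal masses => multifractal
```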
2.3. Data collection
Network traffic measurement provides a means to understand what is and
what is not working properly on a local-area or wide-area network. Using specialized network measurement hardware or software, a network researcher can collect
detailed information about the transmission of packets on the network, including
their time structure and contents. With detailed packet-level measurements, and
some knowledge of the Internet Protocol stack, it is possible to obtain significant
information about the structure of an Internet application or the behavior of an
Internet user.
According to [117], there are four main reasons why network traffic measurement is a useful methodology:

- Network troubleshooting: Computer networks are not infallible. Often, a single malfunctioning piece of equipment can disrupt the operation of an entire network, or at least degrade performance significantly.

- Protocol debugging: Developers often want to test new versions of network applications and protocols. Network traffic measurement provides a means to ensure the correct operation of the new protocol or application, its conformance to required standards, and its backward compatibility with previous versions.

- Workload characterization: Network traffic measurements can be used as input to the workload characterization process, which analyzes empirical data (often using statistical techniques) to extract salient and representative properties describing a network application or protocol. Knowledge of the workload characteristics can then lead to the design of better protocols and networks for supporting the application.

- Performance evaluation: Finally, network traffic measurements can be used to determine how well a given protocol or application is performing in the Internet. Detailed analysis of network measurements can help identify performance bottlenecks.
In general, the tools for network traffic measurement can be classified in the following ways:

- Hardware-based versus software-based measurement tools. The primary categorization among network measurement tools is hardware-based versus software-based tools. Hardware-based tools are often referred to as network traffic analyzers: special-purpose equipment designed expressly for the collection and analysis of network data. Software-based measurement tools typically rely on kernel-level modifications. One widely used utility is tcpdump [111], a user-level tool for TCP/IP packet capture. In general, the software-based approach is much less expensive than the hardware-based approach, but may not offer the same functionality and performance as a dedicated network traffic analyzer. Another software-based approach to workload analysis relies on the access logs that are recorded by Web servers and Web proxies on the Internet. These logs record each client request for Web site content, including the time of day, client IP address, URL requested, and document size. Post-processing of such access logs provides useful insight into Web server workloads [9], without the need to collect detailed network-level packet traces.

- Passive versus active measurement approaches. A passive network monitor is used to observe and record the packet traffic on an operational network, without injecting any traffic of its own onto the network. That is, the measurement device is non-intrusive. An active network measurement approach uses packets generated by a measurement device to probe the Internet and measure its characteristics. Examples of this approach include the ping utility for estimating network latency to a particular destination on the Internet, the traceroute utility for determining Internet routing paths, and the pathchar tool for estimating link capacities and latencies along an Internet path.

- On-line versus off-line traffic analysis. Some network traffic analyzers support real-time collection and analysis of network data, often with graphical displays for on-line visualization of live traffic data. Other network measurement devices are intended only for real-time collection and storage of traffic data; analysis is postponed to an off-line stage. The tcpdump utility falls into this category. Once the traffic data is collected and stored, a researcher can perform as many analyses as desired in the post-processing phase.
However, reliable and representative measurements of wide-area Internet traffic are difficult to obtain. Basically, these difficulties are related to the following problems:

- Firstly, real traces are often difficult to obtain, mostly because of security or privacy concerns. For this reason, publicly available collections of traces are often sanitized by hiding the real source and destination addresses of all packets. Although such sanitized traces may still be useful for many studies (e.g. dealing with the interarrival time or length distribution of packets), they are completely useless in those cases when the actual IP addresses are needed, e.g. to investigate the behavior of caching algorithms.

- Secondly, with the increasing speed of Internet routers, it becomes more and more expensive to collect and store full traces of a meaningful duration without affecting the router's performance. The situation begins to resemble the problems of collecting memory reference strings of programs in execution. The complete string of a sizable CPU-bound program cannot be collected and stored in real time without drastically impairing its execution time.

- Thirdly, Internet traffic patterns are likely to undergo significant changes due to new applications and user activity patterns that cannot be anticipated at present. For example, Web transactions became one of the major components of Internet traffic almost overnight. Such changes will render real trace collections obsolete.
The off-line analyses carried out along this thesis are based on traces captured from many sites and are mainly devoted to workload characterization. One of them is a trace of an OC-3 link (155 Mbps) that connects the Scientific Ring of Catalonia to RedIRIS (the Spanish National Research Network) [98], collected by a hardware-based measurement tool in passive mode. This non-sanitized trace is a collection of packets flowing in one direction of the link, containing a timestamp and the first 40 bytes of each packet. For our analysis, we have used only the output link. The Scientific Ring (see Figure 2.2) is a high-performance communication network created in 1993 that nowadays joins more than forty research institutions.
Figure 2.2: RedIRIS Topology
Furthermore, we have surveyed the publicly available archive of traces collected and maintained by the National Laboratory for Applied Network Research (NLANR) [89]. We downloaded traces obtained from the following sites [90]:

- Colorado State University (COS)
- Front Range GigaPOP (FRG)
- University of Buffalo (BUF)
- Columbia University (BWY)
In all cases, the traces are stored using the TSH packet header format. For
.tsh files the header size is 44 bytes: 8 bytes of timestamp and interface identifier,
20 bytes of IP, and 16 of TCP (see Figure 2.3). No IP or TCP options are included.
The packet payload is also not stored.
Figure 2.3: TSH header data format
Chapter 3
Semantic traffic characterization
In this chapter, we present a novel flow characterization approach that incorporates
semantic characteristics of flows. We understand by semantic characterization the
joint analysis of traffic characteristics including some of the most important fields
(source and destination address, port numbers, packet length, TCP flags, etc) of
the TCP/IP headers content [61].
3.1. Semantic Characterization
Let us define a packet flow as a sequence of packets in which each packet
has the same value for a 5-tuple of source and destination IP address, protocol
number, and source and destination port number and such that the time between
two consecutive packets does not exceed a threshold. In our case, we have adopted
a threshold of 5 sec.
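As a concrete reading of this definition, the following sketch (illustrative only; the tuple layout is a placeholder, not the TSH format described earlier) groups timestamped packets into flows keyed by the 5-tuple, starting a new flow whenever the gap to the previous packet of the same 5-tuple exceeds the 5-second threshold:

```python
FLOW_TIMEOUT = 5.0  # seconds

def split_into_flows(packets):
    """packets: iterable of (timestamp, src_ip, dst_ip, proto, src_port, dst_port),
    assumed sorted by timestamp. Returns a list of flows (lists of packets)."""
    flows = []
    last_seen = {}   # 5-tuple -> (timestamp of last packet, index of its flow)
    for pkt in packets:
        ts, key = pkt[0], pkt[1:]
        if key in last_seen and ts - last_seen[key][0] <= FLOW_TIMEOUT:
            idx = last_seen[key][1]
            flows[idx].append(pkt)
        else:
            flows.append([pkt])
            idx = len(flows) - 1
        last_seen[key] = (ts, idx)
    return flows

pkts = [(0.00, "10.0.0.1", "10.0.0.2", 6, 1234, 80),
        (0.12, "10.0.0.1", "10.0.0.2", 6, 1234, 80),
        (9.00, "10.0.0.1", "10.0.0.2", 6, 1234, 80)]   # gap > 5 s starts a new flow
print(len(split_into_flows(pkts)))                      # 2
```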
First of all, our analysis intends to classify the header fields depending on how they change for packets belonging to the same flow. Let F(i) denote a header field for the i-th packet of a flow, and define the increment ΔF(i) = F(i) - F(i-1). For the first packet of a flow, F(1) can be classified as F(1)-random, F(1)-predictable, or F(1)-not predictable:
- F(1)-random fields: fields whose initial values could or should be chosen at random: Identification, Sequence Number, and Acknowledgment Number. The identification field is primarily used for uniquely identifying fragments of an original IP datagram, and many operating systems assign a sequential number to each packet. Hence, assigning a random value to the first packet of each flow does not constitute a problem. Equally, the sequence number and acknowledgment number fields are not affected if we assign random values to the first packet of each flow.

- F(1)-predictable fields: fields whose value is usually known or at least predictable: Interface, Version, IHL, Type of Service, Flags, Fragment Offset, Protocol, Data Offset, Reserved, and Control Bits. The fields placed in this group preserve a high level of similarity among different flows.

- F(1)-not predictable fields: fields whose value cannot be predicted and has a specific meaning: Timestamp, TTL, Header Checksum, Total Length, Source Address, Source Port, Destination Address, Destination Port, and Window. This group embraces the fields whose values are very hard to guess for the first packet of each flow. For instance, we cannot know in advance when each flow will start; hence it is impossible to guess, for each flow, the value of the timestamp field of the first packet. The TTL field is modified during Internet header processing, so depending on the number of hops previously visited, its value may vary broadly between flows. The total length carried by each packet as well as the window field also show a large variation. Finally, the source and destination addresses represent a set of directions that is impossible to know in advance.
Moreover, according to the ΔF(i) behavior, the fields of the i-th packet of a flow were classified as ΔF(i) = 0, ΔF(i)-predictable, or ΔF(i)-not predictable:

- ΔF(i) = 0 fields: header fields whose values are likely to stay constant over the life of a connection: Version, Type of Service, Protocol, Source Address, Destination Address, Source Port, and Destination Port. For these fields, we only need to store the data from the first packet of each flow.

- ΔF(i)-predictable fields: fields whose ΔF(i) values are predictable, can be calculated based on the information stored in another field, or follow sequential increments: Interface, IHL, Identification, Flags, Fragment Offset, Time to Live, Sequence Number, Acknowledgment Number, Data Offset, Reserved, and Control Bits.

- ΔF(i)-not predictable fields: fields that are likely to change over the life of the conversation and, furthermore, cannot be calculated: Timestamp, Total Length, Header Checksum, and Window.
Taking into account the joint behavior of F(1) and ΔF(i), we have created four categories of fields. In the first category are placed the fields whose F(1) values are predictable and whose ΔF(i) values are constant or predictable through a flow:

((F(1)-Not Random) AND (F(1)-predictable)) AND ((ΔF(i) = 0) OR (ΔF(i)-predictable))

The fields that satisfy these constraints are: Interface, Version, IHL, Type of Service, Flags, Fragment Offset, Protocol, Data Offset, Reserved, and Control Bits. This set of fields shows a high similarity between consecutive packets belonging to the same flow and, in particular, between m-packet flows (flows with m packets).

In the second category are included the fields whose F(1) values are not predictable and whose ΔF(i) values are constant or predictable:

((F(1)-Not Random) AND (F(1)-Not predictable)) AND ((ΔF(i) = 0) OR (ΔF(i)-predictable))

According to these constraints, we have the following fields: TTL, Source Address, Source Port, Destination Address, and Destination Port. For these fields, storage needs are restricted to the first packet of each flow.

The third category incorporates the fields that are hard to predict or calculate and to which we cannot assign random values:

(ΔF(i)-Not predictable)

In this case, storage needs extend over all packets. These fields are: Timestamp, Total Length, Header Checksum, and Window.

Finally, the last category groups the fields whose initial value F(1) is random and whose increments ΔF(i) can be calculated: Identification, Sequence Number, and Acknowledgment Number.
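For later reference, the four categories can be written down as a simple lookup structure. The sketch below merely restates the classification given above; the dictionary keys are our own shorthand labels:

```python
# Joint classification of TSH header fields by F(1) and Delta F(i) behavior.
FIELD_CATEGORIES = {
    # predictable first value, constant or predictable increments
    "constant_or_predictable": [
        "Interface", "Version", "IHL", "Type of Service", "Flags",
        "Fragment Offset", "Protocol", "Data Offset", "Reserved", "Control Bits"],
    # unpredictable first value, constant/predictable afterwards: store first packet only
    "store_first_packet": [
        "TTL", "Source Address", "Source Port", "Destination Address", "Destination Port"],
    # unpredictable throughout the flow: store for every packet
    "store_every_packet": ["Timestamp", "Total Length", "Header Checksum", "Window"],
    # random first value, predictable increments
    "random_start_predictable_increment": [
        "Identification", "Sequence Number", "Acknowledgment Number"],
}
```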
3.1.1 Flow Mapping

For a better representation of the header fields, as well as a way to understand their behavior, we have developed a header field mapping. In this mapping, the values of some header fields are simply copied from the packets; for others the mapped value represents the increment or decrement between consecutive packets of a flow; and finally, for some fields whose distribution of values is highly skewed, we replace the original value by a transformation or function of the values. Below, we describe this mapping (see Figure 3.1).

Let P_i^f be the packet header of the i-th packet of a flow f consisting of m packets. For each packet, let P_i^f(j) be a selected header field of P_i^f. For each field P_i^f(j), a function f_j performs a mapping into an integer value F_i^f(j):

$$F_i^f(j) = f_j\bigl(P_i^f(j)\bigr) \qquad (3.1)$$

Let

$$F_i^f = \bigl(F_i^f(1), F_i^f(2), \ldots\bigr) \qquad (3.2)$$

denote a vector of integers, where we include the selected fields. For the complete flow we can define:

$$P^f = \bigl(P_1^f, P_2^f, \ldots, P_m^f\bigr) \qquad (3.3)$$

and

$$F^f = \bigl(F_1^f, F_2^f, \ldots, F_m^f\bigr). \qquad (3.4)$$
Figure 3.1: Flow mapping
Note that the vector F^f can be viewed as a numerical representation of the m packet headers, as we substitute the selected packet header fields by integers.

Examples of header fields that are simply copied are: Version, IHL, Type of Service, Flags, Fragment Offset, Data Offset, and Control Bits. However, for some fields, such as Identification and Time to Live, we map their values in terms of the increment or decrement between consecutive packets.

On the other hand, as we have said earlier, if the distribution of a parameter is highly skewed, one should consider the possibility of replacing the parameter by a transformation or function of the parameter. For instance, the timestamp and the packet size are replaced by more appropriate values:
- Packet size: Observing the Internet packets, we have seen that there is a high predominance of small packets (acknowledgement packets) and of packets with sizes near 1,500 bytes (packets carrying data). As a consequence of this observation, we map the packet size into one of three integer values F_i^f(j): 0 for small packets, 1 for intermediate packet sizes, and 2 for packet sizes near the 1,500-byte upper range.
- Inter-packet time within a flow: Analysing a set of flows with the same number of packets, we have seen that, for small flows, the inter-packet time between consecutive packets is very similar across many flows. Basically, this inter-packet time is either very small or close to the Round Trip Time (RTT). This behavior is related to TCP properties. The sequence of Figures 3.2 to 3.6 shows, for a set of 1,100 flows with 6 packets, how similar the inter-packet times between consecutive packets of a flow are. Hence, if a packet to be transmitted waits for a packet sent by the opposite node, we call it a dependent packet; otherwise, if a packet is sent immediately after the previous one, we call it a no-dependent packet. For instance, in the TCP three-way handshake, when a node sends a Syn control flag, it waits for a Syn+Ack control flag from the opposite node. This waiting time corresponds to the RTT. In this sense, we associate the inter-packet time with acknowledgement dependence. Hence:

$$F_i^f(j) = \begin{cases} 0 & \text{if packet } i \text{ is a dependent packet} \\ 1 & \text{if packet } i \text{ is a no-dependent packet} \end{cases}$$

A small illustrative sketch of the complete mapping is given after the figures listed below.

Figure 3.2: Inter-packet time between the 1st and 2nd packets (flows with 6 packets); the modes correspond to small, medium, and large RTT values
Figure 3.3: Inter-packet time between the 2nd and 3rd packets (flows with 6 packets)
Figure 3.4: Inter-packet time between the 3rd and 4th packets (flows with 6 packets)
Figure 3.5: Inter-packet time between the 4th and 5th packets (flows with 6 packets)
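A minimal sketch of the mapping of equations (3.1)-(3.4) is shown below. It is illustrative only: the list of copied fields and the two transformations follow the text above, but the dictionary keys and the numeric cut-offs for the size classes are placeholders of our own choosing.

```python
COPIED_FIELDS = ["version", "ihl", "tos", "flags", "frag_offset", "data_offset", "control_bits"]

def size_class(length):
    """0 = small (ack-sized), 1 = intermediate, 2 = near the 1,500-byte data packets.
    The exact cut-offs are illustrative placeholders."""
    if length <= 64:
        return 0
    return 2 if length >= 1400 else 1

def map_packet(pkt, prev_pkt):
    """Map one packet header (a dict of field -> value) to the integer vector F_i^f."""
    vec = [pkt[f] for f in COPIED_FIELDS]                       # fields copied as-is
    if prev_pkt is not None:                                    # increment-mapped fields
        vec.append(pkt["identification"] - prev_pkt["identification"])
        vec.append(pkt["ttl"] - prev_pkt["ttl"])
    else:
        vec.extend([0, 0])
    vec.append(size_class(pkt["total_length"]))                 # transformed field
    vec.append(0 if pkt.get("waits_for_peer") else 1)           # ack-dependence of the gap
    return vec

def map_flow(packets):
    return [map_packet(p, packets[i - 1] if i else None) for i, p in enumerate(packets)]
```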
Figure 3.6: Inter-packet time between the 5th and 6th packets (flows with 6 packets)
Chapter 4
Flow Clustering
This chapter is dedicated to exploring the similarity between Internet flows using the semantic traffic characterization proposed in the previous chapter. To do that, we have broken the trace down into flows with the same number of packets and calculated the number of clusters. Beyond calculating the number of clusters, we have studied the popularity of each cluster. For compression purposes, for instance, the ratio between the number of clusters and the number of flows must be as small as possible.
4.1. Introduction
Internet Traffic is composed of a very large number of very short flows, and
a few very long flows. The terminology mice and elephants provides a useful
metaphor for understanding this characteristic of Internet traffic: there are relatively few elephants, and a large number of mice [51].
Table 4.1 shows, for each one of the m-packet flow classes (column 1), the probability distribution. The data indicate that 90% of the flows have fewer than 21 packets (column 2). Similar probability distributions were shown in [22] [10]. Moreover, in the same table we see that 77% of the packets (column 3) are carried by only a small number of flows (elephants), while the remaining large number of flows carry few packets (mice). These outcomes are similar to those described in [51].
Figure 4.1 shows the cumulative distribution of flow packet volume. The curve indicates that the 90th percentile of the flows corresponds to 20 packets or less and that the tail of the distribution behaves as a heavy-tailed distribution, decaying very slowly.
4.2. Methodology
Using the flow characterization described in Chapter 3, we can potentially find a large variety of flows in a high-speed link. However, from our studies, we have seen that the flows are not very different from each other.
To study the variety among flows, we have used an approach based on clustering, a classical technique for workload characterization [67].
Table 4.1: Flows distribution

Number of packets per flow   Flow Probability   Flow Volume (Packets)
1      0.095546   0.004193
2      0.070476   0.006184
3      0.063517   0.008361
4      0.052245   0.009169
5      0.195584   0.042908
6      0.163134   0.042947
7      0.077876   0.023919
8      0.039831   0.013981
9      0.028339   0.011191
10     0.021941   0.009627
11     0.016285   0.007860
12     0.012876   0.006779
13     0.010409   0.005937
14     0.008343   0.005125
15     0.007762   0.005108
16     0.006719   0.004717
17     0.007220   0.005386
18     0.005656   0.004467
19     0.005214   0.004347
20     0.003911   0.003432
> 20   0.107118   0.774358
Figure 4.1: Distribution of the number of packets in flows
The basic idea of clustering is to partition the components into groups so that the members of a group are as similar as possible and different groups are as dissimilar as possible. Statistically, this implies that the intra-group variance should be as small as possible and the inter-group variance should be as large as possible.
In recent years, a number of clustering techniques have been described in the literature. These techniques fall into two classes: hierarchical and nonhierarchical. In nonhierarchical approaches, one starts with an arbitrary set of clusters, and the members of the clusters are moved until the intra-group variance is minimal. There are two kinds of hierarchical approaches: agglomerative and divisive. In the agglomerative hierarchical approach, given n components, one starts with n clusters (each cluster having one component). Then neighboring clusters are merged successively until the desired number of clusters is obtained. In the divisive hierarchical approach, on the other hand, one starts with one cluster (of n components) and then divides the cluster successively into two, three, and so on, until the desired number of clusters is obtained. A popular clustering technique is the minimum spanning tree method.
Generally, a measured trace consists of a large number of flows. For analysis purposes, it is useful to classify these flows into a small number of classes or clusters such that the components within a cluster are very similar to each other. Later, one member from each cluster may be selected to represent the class.
Clustering analysis basically consists of mapping each component into an n-dimensional space, where n is the number of parameters, and identifying components that are close to each other. The closeness between two components is measured by defining a distance measure. The Euclidean distance is the most commonly used distance metric and is defined as:
$$d = \left[ \sum_{k=1}^{n} (x_{ik} - x_{jk})^2 \right]^{1/2} \qquad (4.1)$$
Figure 4.2 depicts the proposed clustering methodology. Starting from a real trace, we break it down into flows with the same number of packets. Using the mapping described in Chapter 3 (see Figure 3.1), from a set of vectors F^f we calculate the Euclidean distance between the vectors, and the results are stored in a distance matrix of flows. Initially, each vector F^f represents a cluster. Evidently, a distance of 0 means that two vectors are exactly identical. Later, we search for the smallest element of the distance matrix. Let d_jk, the distance between clusters j and k, be the smallest. We merge clusters j and k and also merge any other cluster pairs that have the same distance. We have used the Minimum Spanning Tree hierarchical clustering technique [67], which starts with n clusters of one component each and successively joins the nearest clusters until a specified distance between clusters is reached. For each m, we apply the clustering method separately. After that, the final clusters are joined and Templates of Flows are generated. A simplified sketch of this procedure is given below.
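The sketch below is a simplified stand-in for the procedure of Figure 4.2 (a greedy assignment of flow vectors to cluster representatives rather than the exact minimum spanning tree implementation of [67]); with a maximum distance of zero it simply groups identical flow vectors:

```python
import numpy as np

def cluster_flows(vectors, max_distance=0.0):
    """Greedy clustering of flow vectors (lists of equal length) by Euclidean distance.
    With max_distance=0.0, identical vectors end up in the same cluster."""
    clusters = []                       # list of (representative vector, member indices)
    for i, v in enumerate(map(np.asarray, vectors)):
        for rep, members in clusters:
            if np.linalg.norm(v - rep) <= max_distance:   # Euclidean distance, eq. (4.1)
                members.append(i)
                break
        else:
            clusters.append((v, [i]))
    return clusters

flows = [[1, 4, 5, 2, 0, 0, 7, 2, 0],
         [1, 4, 5, 2, 0, 0, 7, 2, 0],
         [1, 4, 5, 2, 0, 0, 6, 2, 0]]
print(len(cluster_flows(flows)))        # 2 clusters: two identical flows plus one distinct
```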
Figure 4.2: Flow Clustering Methodology
4.3. Clustering
We have selected 9 fields to study their diversity among flows in Internet links. The shaded boxes in Figure 4.3 depict those selected fields. We use the definition of a flow as a sequence of packets in which each packet has the same value for the 5-tuple of source and destination address, source and destination port, and protocol, and such that the time between two consecutive packets does not exceed a threshold of 5 sec.
Figure 4.3: Selected fields used for flow clustering
We started our analysis using the RedIRIS trace, converting each flow into a vector F^f and calculating the distance to the previously read flows. The vector F^f contains the mapped values of the selected fields described above. However, for the TTL field, we have used the increment or decrement between consecutive packets.
For each m, we show how the number of clusters increases when new flows are read. However, we reach a point where the number of clusters reaches a limit, and newly read flows always fit into one of the previous clusters.
Below, we describe the set of clusters for each class of m-packet flows:
- Flows with 2 packets per flow: Table 4.2 shows, for 5 to 243 flows, how the number of clusters increases. After reading only 243 flows, the number of clusters tends to increase smoothly (see Figure 4.4) and the number of clusters (15) represents only 6% of the number of flows.
In Table 4.3 we describe some clusters found in flows with 2 packets. As we can see, 87% of the flows are concentrated in a single cluster. Data points with extreme parameter values are called outliers, particularly if they lie far away from the majority of the other points. Since those outlying components do not account for a significant portion of the flows, their exclusion would not significantly affect the final clustering results.
Table 4.2: Number of clusters for m=2 packets

Number of flows   Number of clusters   Percentage
5      2     0.40
47     6     0.13
87     7     0.08
201    14    0.07
243    15    0.06
Figure 4.4: Number of clusters m=2
- Flows with 3 packets per flow: Table 4.4 shows, for 6 to 648 flows, how the number of clusters increases. After reading 648 flows, the number of clusters tends to increase smoothly (see Figure 4.5) and the number of clusters (69) represents only 10% of the number of flows.
In Table 4.5 we describe some clusters found in flows with 3 packets. As we can see, approximately 82% of the flows are concentrated in only four clusters.

- Flows with 4 packets per flow: Table 4.6 shows, for 24 to 1,740 flows, how the number of clusters increases. After reading 1,740 flows, the number of clusters tends to increase smoothly (see Figure 4.6) and the number of clusters (103) represents only 6% of the number of flows.
In Table 4.7 we describe some clusters found in flows with 4 packets. As we can see, approximately 81% of the flows are concentrated in only five clusters.
Table 4.3: Flows distribution

Cluster   Packet 1              Packet 2              Prob
01   1:4:5:2:0:0:6:2:0     1:4:5:0:0:0:5:4:0     0.029586
02   1:4:5:2:0:0:7:2:0     1:4:5:0:0:0:5:4:0     0.875740
03   1:4:5:2:0:0:7:2:0     1:4:5:2:0:0:5:17:0    0.017751
04   1:4:5:2:0:0:11:2:0    1:4:5:2:0:0:5:4:0     0.005917
05   1:4:5:2:0:0:7:2:0     1:4:5:2:0:0:5:4:0     0.035503
06   1:4:5:2:0:0:7:18:0    1:4:5:2:0:0:5:17:0    0.011834
07   1:4:5:2:0:0:7:18:0    1:4:5:2:0:0:5:4:0     0.005917
08   1:4:5:2:0:0:6:2:0     1:4:5:2:0:0:5:4:0     0.011834
09   1:4:5:2:0:0:11:18:0   1:4:5:2:0:0:8:17:0    0.005917
Table 4.4: Number of clusters for m=3 packets

Number of flows   Number of clusters   Percentage
6      5     0.83
40     10    0.27
133    25    0.19
265    45    0.17
362    54    0.14
499    65    0.13
648    69    0.10
Figure 4.5: Number of clusters m=3
Table 4.5: Flows with 3 pkts per flow

Cluster   Packet 1             Packet 2               Packet 3             Prob
1    1:4:5:2:0:0:7:18:0   1:4:5:2:0:0:5:24:0     1:4:5:2:0:0:5:17:0   0.076923
2    1:4:5:2:0:0:7:2:0    1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:17:0   0.564103
3    1:4:5:2:0:0:6:18:0   1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:17:0   0.102564
4    1:4:5:2:0:0:6:2:0    1:4:5:0:0:0:5:16:-67   1:4:5:0:0:0:5:4:0    0.025641
5    1:4:5:2:0:0:7:18:0   1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:20:0   0.076923
6    1:4:5:2:0:0:7:18:0   1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:17:0   0.051282
7    1:4:5:2:0:0:7:2:0    1:4:5:2:0:0:8:16:0     1:4:5:2:0:0:5:17:0   0.025641
8    1:4:5:2:0:0:7:2:0    1:4:5:2:0:0:7:2:0      1:4:5:0:0:0:5:4:0    0.025641
9    1:4:5:2:0:0:8:18:0   1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:17:0   0.025641
10   1:4:5:2:0:0:6:2:0    1:4:5:2:0:0:5:24:0     1:4:5:2:0:0:5:25:0   0.025641
Table 4.6: Number of clusters for m=4 packets

Number of flows   Number of clusters   Percentage
24     9      0.37
56     11     0.19
201    35     0.17
461    51     0.11
873    76     0.08
1740   103    0.06
Figure 4.6: Number of clusters m=4
Table 4.7: Flows with 4 pkts per flow

Cluster   Packet 1              Packet 2               Packet 3             Packet 4             Prob
1    1:4:5:2:0:0:7:2:0     1:4:5:0:0:0:5:16:-66   1:4:5:2:0:0:5:16:0   1:4:5:2:0:0:5:17:0   0.018182
2    1:4:5:2:0:0:10:18:0   1:4:5:2:0:0:8:16:0     1:4:5:2:0:0:8:24:0   1:4:5:2:0:0:8:17:0   0.163636
3    1:4:5:2:0:0:7:2:0     1:4:5:0:0:0:5:16:-67   1:4:5:2:0:0:5:16:0   1:4:5:2:0:0:5:17:0   0.018182
4    1:4:5:2:0:0:7:18:0    1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:24:2   1:4:5:2:0:0:5:17:0   0.218182
5    1:4:5:2:0:0:7:18:0    1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:24:0   1:4:5:2:0:0:5:17:0   0.072727
6    1:4:5:2:0:0:7:2:0     1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:24:0   1:4:5:2:0:0:5:17:0   0.218182
7    1:4:5:0:0:0:7:18:0    1:4:5:0:0:0:5:24:0     1:4:5:0:0:0:5:16:0   1:4:5:0:0:0:5:17:0   0.018182
8    1:4:5:2:0:0:7:2:0     1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:24:0   1:4:5:2:0:0:5:4:0    0.127273
9    1:4:5:2:0:0:6:2:0     1:4:5:2:0:0:5:24:0     1:4:5:2:0:0:5:16:0   1:4:5:2:0:0:5:20:0   0.109091
10   1:4:5:2:0:0:7:18:0    1:4:5:2:0:0:5:16:2     1:4:5:2:0:0:5:16:2   1:4:5:2:0:0:5:17:0   0.018182
11   1:4:5:2:0:0:7:18:0    1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:16:0   1:4:5:2:0:0:5:25:0   0.018182
To improve the accuracy of our outcomes, we extended our analysis to other networks. We demonstrated that the flow clustering observed in the RedIRIS trace can also be seen in other traces. In Figure 4.7, we show, for m = 4 packets and an inter-cluster distance equal to zero, the outcomes from four ATM OC-3 traces downloaded from the NLANR web site. The plotted curves are from Colorado State University (COS), Front Range GigaPOP (FRG), University of Buffalo (BUF), and Columbia University (BWY).
Furthermore, in Figure 4.7, we see the behavior of the Joined Trace (upper curve). This trace was obtained by joining the four downloaded NLANR traces. From Figure 4.7, we obtain two important conclusions: (i) the NLANR traces show the same behavior as the RedIRIS trace, i.e. we can obtain a small number of clusters to represent a packet trace; (ii) the number of clusters in the joined trace is less than the sum of the clusters of the other four traces. This implies that the types of flows are basically the same in all traces. Similar behavior was obtained for different values of m.
Figure 4.7: Number of clusters for ATM OC-3 traces and joined trace - NLANR traces
- Flows with 5 packets per flow: Table 4.8 shows, for 30 to 3,162 flows, how the number of clusters increases. After reading 3,162 flows, the number of clusters tends to increase smoothly (see Figure 4.8) and the number of clusters (142) represents only 4% of the number of flows.
In Table 4.9 we describe some clusters found in flows with 5 packets. As we can see, approximately 88% of the flows are concentrated in only four clusters.
Table 4.8: Number of clusters for m=5 packets

Number of flows   Number of clusters   Percentage
30     5      0.16
62     7      0.11
207    19     0.09
709    63     0.08
1589   99     0.06
3162   142    0.04
Figure 4.8: Number of clusters m=5
- Flows with 6 packets per flow: Table 4.10 shows, for 52 to 1,174 flows, how the number of clusters increases. After reading 1,174 flows, the number of clusters tends to increase smoothly (see Figure 4.9) and the number of clusters (92) represents only 8% of the number of flows.

- Flows with 7 packets per flow: Table 4.11 shows, for 5 to 2,106 flows, how the number of clusters increases. After reading 2,106 flows, the number of clusters tends to increase smoothly (see Figure 4.10) and the number of clusters (252) represents 11% of the number of flows.
Table 4.9: Flows with 5 pkts per flow

Cluster   Packet 1              Packet 2               Packet 3              Packet 4              Packet 5              Prob
1    1:4:5:2:0:0:7:2:0     1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:16:0    1:4:5:2:0:0:5:17:0    0.600
2    1:4:5:2:0:0:10:2:0    1:4:5:2:0:0:8:16:0     1:4:5:2:0:0:8:24:0    1:4:5:2:0:0:8:16:0    1:4:5:2:0:0:8:17:0    0.110
3    1:4:5:2:0:0:7:2:0     1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:16:0    1:4:5:2:0:0:5:4:0     0.085
4    1:4:5:2:0:0:6:2:0     1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:16:0    1:4:5:2:0:0:5:17:0    0.085
5    1:4:5:2:0:0:7:2:0     1:4:5:0:0:0:5:16:-67   1:4:5:2:0:0:5:16:0    1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:4:0     0.010
6    1:4:5:2:0:0:10:18:0   1:4:5:2:0:0:8:16:0     1:4:5:2:0:0:8:24:0    1:4:5:2:0:0:8:24:0    1:4:5:2:0:0:8:25:0    0.015
7    1:4:5:2:0:0:7:18:0    1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:17:0    0.005
8    1:4:5:2:0:16:6:2:0    1:4:5:0:0:0:5:16:-66   1:4:5:2:0:16:5:16:0   1:4:5:2:0:16:5:24:0   1:4:5:2:0:16:5:17:0   0.005
9    1:4:5:2:0:0:7:18:0    1:4:5:2:0:0:5:24:0     1:4:5:2:0:0:5:16:0    1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:17:0    0.015
10   1:4:5:2:0:16:6:2:0    1:4:5:0:0:0:5:16:-67   1:4:5:2:0:16:5:16:0   1:4:5:2:0:16:5:24:0   1:4:5:2:0:16:5:17:0   0.010
11   1:4:5:2:0:0:7:2:0     1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:24:2    1:4:5:2:0:0:5:16:0    1:4:5:2:0:0:5:17:0    0.020
12   1:4:5:2:0:0:6:2:0     1:4:5:0:0:0:5:16:-66   1:4:5:2:0:0:5:16:0    1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:17:0    0.005
13   1:4:5:2:0:0:7:2:0     1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:4:0     0.005
14   1:4:5:2:0:0:7:18:0    1:4:5:2:0:0:5:24:0     1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:17:0    0.005
15   1:4:5:2:0:0:10:18:0   1:4:5:2:0:0:8:16:2     1:4:5:2:0:0:8:16:2    1:4:5:2:0:0:8:24:2    1:4:5:2:0:0:8:17:0    0.005
16   1:4:5:2:0:0:7:2:0     1:4:5:2:0:0:5:24:0     1:4:5:2:0:0:5:16:0    1:4:5:2:0:0:5:16:0    1:4:5:2:0:0:5:17:0    0.005
17   1:4:5:2:0:0:7:2:0     1:4:5:0:0:0:5:16:-67   1:4:5:2:0:0:5:16:0    1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:17:0    0.005
18   1:4:5:2:0:0:7:2:0     1:4:5:2:0:0:5:16:0     1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:17:0    0.005
19   1:4:5:2:0:0:7:2:0     1:4:5:0:0:0:5:16:-66   1:4:5:2:0:0:5:24:0    1:4:5:2:0:0:5:16:0    1:4:5:2:0:0:5:17:0    0.005
Table 4.10: Number of clusters for m=6 packets

Number of flows   Number of clusters   Percentage
52     18    0.34
157    29    0.18
518    59    0.11
1174   92    0.08

Table 4.11: Number of clusters for m=7 packets

Number of flows   Number of clusters   Percentage
5      5     1
146    50    0.34
396    95    0.23
949    163   0.17
2106   252   0.11
Figure 4.9: Number of clusters m=6
Figure 4.10: Number of clusters m=7
4.4. Conclusions
From the previous analysis, we have concluded that Internet flows are not very different from each other and that a large number of them can be grouped into a few clusters.
Table 4.12 extends this analysis, showing the relation between the number of clusters and the number of flows for each class of m-packet flows (m ranging from 2 to 20). As we see in Figure 4.11, for m ranging from 2 to 7, this relation is highly favorable, which means that we can represent many flows with few clusters, reaching a high compression ratio. However, for m greater than 7, this relation is impaired. This does not mean that clusters do not exist for large flows, but rather that we would need an extremely large trace to find them, which is not practical for our purposes.
Table 4.12: Percentage of clusters for m-packet flows

Number of pkts per flow   Clusters/Flows (%)
2    6%
3    10%
4    6%
5    4%
6    8%
7    11%
8    24%
9    30%
10   39%
11   47%
12   59%
13   57%
14   54%
15   58%
16   66%
17   52%
18   66%
19   64%
20   79%
An extensive analysis carried out with many traces has produced the following conclusions:

- For small flows, behind the great number of flows in a high-speed link there is not so much variety among them, and they can clearly be grouped into a set of clusters;
- For each subset of m-packet clusters, the TCP/IP flows are not equally distributed into the clusters, with a high predominance of a few clusters;
- The same types of clusters are present in different traces;
- The evidence that Internet flows can be grouped into a small set of clusters led us to create templates of flows;
- Those templates constitute an efficient mechanism for packet trace compression and classification.

Figure 4.11: Relation between clusters and flows
Chapter 5
Entropy of TCP/IP Header Fields
In this chapter we discuss the definition of entropy and its applicability to TCP/IP header fields. The analyses were carried out at the packet and flow levels. The entropy of a random variable is a measure of the uncertainty of the random variable; it is a measure of the amount of information required on average to describe the random variable. For data compression, the entropy of a random variable is a lower bound on the average length of the shortest description of the random variable, and it is used to establish the fundamental limit for the compression of information. Data compression can be achieved by assigning short descriptions to the most frequent outcomes of the data source and necessarily longer descriptions to the less frequent outcomes. Thus the entropy is the data compression limit as well as the number of bits needed in random number representation, and codes achieving this limit turn out to be optimal.
5.1. Introduction
The entropy of a random variable X with a probability mass function p(x) is defined by

$$H(X) = -\sum_{x} p(x) \log_2 p(x) \qquad (5.1)$$

Note that entropy is a functional of the distribution of X. It does not depend on the actual values taken by the random variable X, but only on the probabilities. Using logarithms to base 2, the entropy is measured in bits. It is the number of bits required on average to describe the random variable. The entropy of X can also be interpreted as the expected value of log(1/p(X)), where X is drawn according to the probability mass function p(x). Thus:

$$H(X) = E_p\!\left[ \log \frac{1}{p(X)} \right] \qquad (5.2)$$
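As a concrete reading of equation (5.1), the sketch below (illustrative; the sample TTL values are made up) estimates the empirical entropy of a header field from the frequencies of its observed values:

```python
from collections import Counter
from math import log2

def entropy(values):
    """Empirical entropy, in bits per symbol, of a sequence of observed field values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

ttl_samples = [64, 64, 128, 64, 255, 64, 128, 64]   # made-up TTL observations
print(round(entropy(ttl_samples), 3))                # about 1.3 bits per packet
```

Summing such per-field estimates over all header fields gives an empirical lower bound, in bits per packet, for lossless header compression.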
5.1.1 Conditional Entropy

The conditional entropy is defined as the entropy of a random variable given another random variable: the expected value of the entropies of the conditional distributions, averaged over the conditioning random variable.

$$H(Y|X) = \sum_{x} p(x)\, H(Y|X=x) \qquad (5.3)$$

$$H(Y|X) = -\sum_{x} p(x) \sum_{y} p(y|x) \log p(y|x) \qquad (5.4)$$

$$H(Y|X) = -\sum_{x} \sum_{y} p(x,y) \log p(y|x) \qquad (5.5)$$

$$H(Y|X) = -E_{p(x,y)}\!\left[ \log p(Y|X) \right] \qquad (5.6)$$
5.1.2 Joint Entropy

The joint entropy H(X,Y) of a pair of discrete random variables (X,Y) with a joint distribution p(x,y) is defined as:

$$H(X,Y) = -\sum_{x} \sum_{y} p(x,y) \log p(x,y) \qquad (5.7)$$

which can also be expressed as

$$H(X,Y) = -E\!\left[ \log p(X,Y) \right] \qquad (5.8)$$

Also, the entropy of a pair of random variables can be written as the entropy of one plus the conditional entropy of the other:

$$H(X,Y) = -\sum_{x} \sum_{y} p(x,y) \log p(x,y) \qquad (5.9)$$

$$H(X,Y) = -\sum_{x} \sum_{y} p(x,y) \log \bigl[ p(x)\, p(y|x) \bigr] \qquad (5.10)$$

$$H(X,Y) = -\sum_{x} \sum_{y} p(x,y) \log p(x) - \sum_{x} \sum_{y} p(x,y) \log p(y|x) \qquad (5.11)$$

$$H(X,Y) = -\sum_{x} p(x) \log p(x) - \sum_{x} \sum_{y} p(x,y) \log p(y|x) \qquad (5.12)$$

$$H(X,Y) = H(X) + H(Y|X) \qquad (5.13)$$
5.1.3 Relative Entropy

The relative entropy is a measure of the distance between two distributions. In statistics, it arises as the expected logarithm of the likelihood ratio. The relative entropy D(p||q) is a measure of the inefficiency of assuming that the distribution is q when the true distribution is p. For example, if we knew the true distribution p of the random variable, then we could construct a code with average description length H(p). If, instead, we used the code for a distribution q, we would need H(p) + D(p||q) bits on average to describe the random variable. The relative entropy between two probability mass functions p(x) and q(x) is defined as:

$$D(p\|q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)} \qquad (5.14)$$

$$D(p\|q) = E_p\!\left[ \log \frac{p(X)}{q(X)} \right] \qquad (5.15)$$
5.1.4 Mutual Information

The mutual information is a measure of the amount of information that one random variable contains about another random variable. It is the reduction in the uncertainty of one random variable due to the knowledge of the other. Consider two random variables X and Y with a joint probability mass function p(x,y) and marginal probability mass functions p(x) and p(y). The mutual information I(X;Y) is the relative entropy between the joint distribution and the product distribution p(x)p(y), i.e.,

$$I(X;Y) = \sum_{x} \sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \qquad (5.16)$$

$$I(X;Y) = \sum_{x} \sum_{y} p(x,y) \log \frac{p(x|y)}{p(x)} \qquad (5.17)$$

$$I(X;Y) = -\sum_{x} \sum_{y} p(x,y) \log p(x) + \sum_{x} \sum_{y} p(x,y) \log p(x|y) \qquad (5.18)$$

$$I(X;Y) = -\sum_{x} p(x) \log p(x) - \Bigl( -\sum_{x} \sum_{y} p(x,y) \log p(x|y) \Bigr) \qquad (5.19)$$

$$I(X;Y) = H(X) - H(X|Y) \qquad (5.20)$$
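A small sketch of equation (5.20) computed from counts is shown below (illustrative only; the paired field values are made up). It uses the equivalent identity I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from empirical marginal and joint frequencies:

```python
from collections import Counter
from math import log2

def _entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def mutual_information(pairs):
    """I(X;Y) in bits, estimated from a list of (x, y) observations."""
    xs = Counter(x for x, _ in pairs)
    ys = Counter(y for _, y in pairs)
    xy = Counter(pairs)
    return _entropy(xs) + _entropy(ys) - _entropy(xy)

# Example: (source port, packet size class) pairs for a handful of packets.
pairs = [(80, 2), (80, 2), (80, 2), (443, 0), (443, 0), (25, 1)]
print(round(mutual_information(pairs), 3))
```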
5.1.5 Asymptotic Equipartition Property

In information theory, the analog of the law of large numbers is the Asymptotic Equipartition Property (AEP). It is a direct consequence of the weak law of large numbers. The law of large numbers states that, for independent, identically distributed (i.i.d.) random variables, (1/n) Σ X_i is close to its expected value for large values of n. The AEP states that -(1/n) log p(X_1, X_2, ..., X_n) is close to the entropy H(X), where X_1, X_2, ..., X_n are i.i.d. random variables and p(X_1, X_2, ..., X_n) is the probability of observing the sequence X_1, X_2, ..., X_n. Thus the probability assigned to an observed sequence will be close to 2^{-nH}.

The asymptotic equipartition property can be formalized in the following way:

$$-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) = -\frac{1}{n} \sum_{i} \log p(X_i) \qquad (5.21)$$

$$-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to -E\!\left[ \log p(X) \right] \qquad (5.22)$$

$$-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X) \qquad (5.23)$$

This enables us to divide the set of all sequences into two sets: the typical set, where the sample entropy is close to the true entropy, and the non-typical set, which contains the other sequences (see Figure 5.1). Any property that is proved for the typical sequences will then be true with high probability and will determine the average behavior of a large sample.

Let X_1, X_2, ..., X_n be independent and identically distributed random variables with probability mass function p(x). We wish to find short descriptions for such sequences of random variables. First, we divide all sequences into two sets: the typical set and its complement. The typical set with respect to p(x) is the set of sequences (x_1, x_2, ..., x_n) with the following property:

$$2^{-n(H(X)+\epsilon)} \leq p(x_1, x_2, \ldots, x_n) \leq 2^{-n(H(X)-\epsilon)} \qquad (5.24)$$

We can represent each sequence of the typical set by giving the index of the sequence in the set. Since there are no more than 2^{n(H+ε)} sequences in the typical set, the index requires no more than n(H+ε) bits. For n sufficiently large and ε small, and using the notation x^n to denote a sequence x_1, x_2, ..., x_n and l(x^n) to denote the length of the codeword corresponding to x^n, we can represent sequences x^n using nH(X) bits on the average.

Figure 5.1: The Asymptotic Equipartition Property
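A quick numerical illustration of the AEP (a made-up Bernoulli example, not drawn from the trace data) shows the per-symbol log-probability of a long i.i.d. sequence approaching the entropy H(X):

```python
import numpy as np

p = 0.2                                        # Bernoulli(p) source
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

rng = np.random.default_rng(0)
x = rng.random(100_000) < p                    # one long i.i.d. sample sequence
log_prob = np.where(x, np.log2(p), np.log2(1 - p)).sum()
print(round(H, 4), round(-log_prob / len(x), 4))   # sample entropy is close to H
```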
5.2. Packet Approach Entropy

For a sequence of packets in an Internet trace, we can assume that the sequence of header field values constitutes a stationary ergodic process. Moreover, given the high aggregation level of high-speed Internet links, we can also assume that the header field values of consecutive packets are independent. In this case, the joint entropy is:

$$H(h_i, h_{i+1}) = H(h_i) + H(h_{i+1} \mid h_i) \qquad (5.25)$$

$$H(h_i, h_{i+1}) = H(h_i) + H(h_{i+1}) \qquad (5.26)$$

$$H(h_i, h_{i+1}) = 2 \cdot H(h) \qquad (5.27)$$

where H(h_i) is the header field entropy of packet i. For a sequence of n packets:

$$H(h_i, h_{i+1}, \ldots, h_n) = H(h_i) + H(h_{i+1} \mid h_i) + \cdots + H(h_n \mid h_{n-1}, \ldots, h_1) \qquad (5.28)$$

$$H(h_i, h_{i+1}, \ldots, h_n) = n \cdot H(h) \qquad (5.29)$$
5.2.1 Header Field Entropy
Using a header field approach and the RedIRIS trace, we have calculated the entropy of each of the TSH header fields. For our analysis, we used 1,000,000 packets. The summation of their corresponding code sizes gives us the average length that establishes the limit for the compression of packet headers.
We have used the following header fields: timestamp, Interface, Version, Internet Header Length (IHL), Type of Service (TOS), Total Length, Identification, Flags, Fragment Offset, Time to Live, Protocol, Header Checksum, Source Address, Destination Address, Source Port, Destination Port, Sequence Number, Acknowledgment Number, Data Offset, Control Bits, and Window. Below, we show a brief description of these header fields (obtained from [99] and [100]) and the calculated entropy:
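Before going field by field, the per-field calculation can be sketched in a few lines (an illustrative sketch, not the tool used for the thesis; field_values is a hypothetical list holding the values observed for one header field):

import math
from collections import Counter

def field_entropy(field_values):
    # estimate p(x) from frequencies and compute H(X) = -sum p(x) log2 p(x)
    counts = Counter(field_values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# example with made-up values: a field that is almost always 0
print(round(field_entropy([0] * 980 + [16] * 15 + [160] * 5), 4))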
H
Timestamp: The timestamp is composed of two fields: a first one that records the time in seconds (32 bits) and a second one that records the microseconds (24 bits). The timestamp shows a highly skewed distribution. However, we did not calculate the entropy for the timestamp values but for the inter-packet time between consecutive packets. In this case, the amount of bits needed to represent this header field is significantly lower. This does not represent any problem because the start time is a random value and the most important quantity is the inter-arrival time. Table 5.1 shows some inter-packet times and their associated frequency. As we can see, the inter-packet times of 8 µs and 7 µs are very common, having frequencies of 17% and 12% respectively. Table 5.2 shows the entropy for the timestamp header field.
Table 5.1: Timestamp Header Field

Elements     Probability p(x)
0.000055     0.002054
0.000010     0.003099
0.000034     0.002352
0.000023     0.005603
0.000011     0.001727
0.000048     0.001218
0.000049     0.002505
0.000009     0.012568
0.000047     0.002521
0.000046     0.003484
0.000032     0.002680
0.000019     0.005069
0.000016     0.000705
0.000048     0.000647
0.000024     0.001683
0.000008     0.171074
0.000007     0.124183
0.000016     0.002643
0.000032     0.003537
0.000012     0.002018
0.000007     0.048100
0.000033     0.005350
0.000009     0.012205
0.000033     0.001073
0.000014     0.006262
0.000034     0.000593
0.000013     0.007582
0.000031     0.003082
0.000011     0.000644
0.000012     0.001001
0.000030     0.001514
0.000018     0.007563
0.000028     0.004963
0.000035     0.001344
0.000026     0.002027
...

Table 5.2: Timestamp Header Field

Original Size (bits)     Entropy H(x)
56                       6.432200
H
Interface: In the RedIRIS trace, packets are recorded only in one direction; in fact, the only interface used is the output link. Table 5.4 shows the original size in bits and the entropy.
Table 5.3: Interface Header Field
Elements Probability p(x)
1
1.00
H
Table 5.4: Interface Header Field
Original Size (bits) Entropy H(x)
8
0.00
Version: The Version field indicates the format of the Internet header. Table 5.5 shows the only IP version present in the trace. Table 5.6 shows the size in bits of the original field and the calculated entropy.
H
H
Table 5.5: IP Version Header Field
Elements Probability p(x)
4
1.00
Internet Header Length: Internet Header Length is the length of the Internet header in 32-bit words, and thus points to the beginning of the data. Note that the minimum value for a correct header is 5. Table 5.7 shows the only IHL value present in the trace. Table 5.8 shows the original size in bits and the calculated entropy.
Type of Service: The Type of Service provides an indication of the abstract parameters of the quality of service desired. These parameters are to be used to guide the selection of the actual service parameters when transmitting a datagram through a particular network. Several networks offer service precedence, which somehow treats high precedence traffic as more important than other traffic (generally by accepting only traffic above a certain precedence at times of high load). The major choice is a three-way tradeoff between low delay, high reliability, and high throughput. Table 5.9 shows the different elements present in the trace. Table 5.10 shows the original size in bits and the calculated entropy.
67
Table 5.6: IP Version Header Field
Original Size (bits) Entropy H(x)
4
0.00
Table 5.7: IHL Field
Elements Probability p(x)
5
1.00
Table 5.8: IHL Header Field
Original Size (bits) Entropy H(x)
4
0.00
Table 5.9: TOS Header Field
Elements Probability p(x)
0
0.980928
28
0.000176
16
0.014960
160
0.000484
21
0.000880
27
0.000088
128
0.000880
8
0.000176
7
0.000264
18
0.000176
192
0.000176
12
0.000132
136
0.000044
64
0.000088
24
0.000132
3
0.000044
..
..
.
.
Table 5.10: TOS Header Field
Original Size (bits) Entropy H(x)
8
0.179016
68
H
Total Length: Total Length is the length of the datagram, measured in octets,
including Internet header and data. This field allows the length of a datagram to be up to 65,535 octets. Such long datagrams are impractical for
most hosts and networks. All hosts must be prepared to accept datagrams
of up to 576 octets (whether they arrive whole or in fragments). It is recommended that hosts only send datagrams larger than 576 octets if they have
assurance that the destination is prepared to accept the larger datagrams.
The number 576 is selected to allow a reasonable sized data block to be
transmitted in addition to the required header information. For example,
this size allows a data block of 512 octets plus 64 header octets to fit in a
datagram. The maximal Internet header is 60 octets, and a typical Internet
header is 20 octets, allowing a margin for headers of higher level protocols.
Table 5.11 shows the different elements present in the trace. Table 5.12 shows the original size in bits and the calculated entropy.
Table 5.11: Total Length Header Field
Elements Probability p(x)
1440
0.006618
40
0.398533
985
0.000281
552
0.004167
64
0.005891
1500
0.185472
78
0.022422
52
0.068102
1374
0.000024
1480
0.010455
886
0.000041
48
0.030769
..
..
.
.
H
H
Table 5.12: Total Length Header Field
Original Size (bits) Entropy H(x)
16
4.492404
Identification: An identifying value assigned by the sender to aid in assembling the fragments of a datagram. Each flow assigns a random value to the first packet. Hence, this field shows a highly skewed distribution and we assumed that compressing it is not possible; we therefore did not apply any compression (see Table 5.13).
Table 5.13: Identification Header Field

Original Size (bits)     Entropy H(x)
16                       16

Flags: This field has only three bits: bit 0 (reserved, must be zero), bit 1 (DF: 0 = May Fragment, 1 = Don't Fragment) and bit 2 (MF: 0 = Last Fragment, 1 = More Fragments). Table 5.14 shows the different elements present in the trace. Table 5.15 shows the original size in bits and the calculated entropy.
Table 5.14: Flags Header Field
Elements Probability p(x)
2
0.936070
0
0.063710
1
0.000220
H
Table 5.15: Flags Header Field
Original Size (bits) Entropy H(x)
3
0.345932
H
H
Fragment Offset: This field (13 bits) indicates where in the datagram this fragment belongs. The fragment offset is measured in units of 8 octets (64 bits). The first fragment has offset zero. Table 5.16 shows the different elements present in the trace. Table 5.17 shows the original size in bits and the calculated entropy.
Time to live: This field indicates the maximum time the datagram is allowed
to remain in the Internet system. If this field contains the value zero, then
the datagram must be destroyed. This field is modified in Internet header
processing. The time is measured in units of seconds, but since every module that processes a datagram must decrease the TTL by at least one even if
it processes the datagram in less than a second, the TTL must be thought of only as an upper bound on the time a datagram may exist. The intention is to cause undeliverable datagrams to be discarded, and to bound the maximum datagram lifetime. Table 5.18 shows the most important elements present in the trace and Table 5.19 shows the original size in bits and the calculated entropy.
Protocol: This field indicates the next level protocol used in the data portion of the Internet datagram. Table 5.20 shows the different elements present in the trace. Table 5.21 shows the original size in bits and the calculated entropy.
70
Table 5.16: Fragment Offset Header Field
Elements Probability p(x)
0
0.999780
185
0.000088
1295
0.000044
1639
0.000088
Table 5.17: Fragment Offset Header Field
Original Size (bits) Entropy H(x)
13
0.003325
Table 5.18: Time to Live Header Field
Elements Probability p(x)
61
0.075678
124
0.115364
123
0.259768
125
0.227341
126
0.113956
..
..
.
.
Table 5.19: Time to Live Header Field
Original Size (bits) Entropy H(x)
8
3.206642
Table 5.20: Protocol Header Field
Elements Probability p(x)
6
0.944122
17
0.048530
1
0.007260
50
0.000044
0
0.000044
Table 5.21: Protocol Header Field
Original Size (bits) Entropy H(x)
8
0.343014
H
Header Checksum: A checksum on the header only. The checksum field is the 16-bit one's complement of the one's complement sum of all 16-bit words in the header. Because the checksum is simple to recompute, we have reserved only one bit to identify whether the checksum is correct or not (see Table 5.22).
H
Table 5.22: Header Checksum Header Field
Original Size (bits) Entropy H(x)
16
1
Source Address: The source address. The table 5.23 show the size in bits of
the original field and the calculated entropy.
H
Table 5.23: Source Address Header Field
Original Size (bits) Entropy H(x)
32
8.667664
Destination Address: The destination address. The table 5.24 show the size
in bits of the original field and the calculated entropy.
H
Table 5.24: Destination Address Header Field
Original Size (bits) Entropy H(x)
32
10.258050
H
Source Port: The source port number. The table 5.25 show the size in bits
of the original field and the calculated entropy.
H
H
Destination Port: The destination port number. The table 5.26 show the size
in bits of the original field and the calculated entropy.
Sequence Number: The sequence number of the first data octet in this segment (except when SYN is present). If SYN is present the sequence number is the initial sequence number (ISN) and the first data octet is ISN+1. As with the Identification header field, each flow assigns a random value to the first packet and thus the distribution of values is highly skewed and compression is not possible (see Table 5.27).
Acknowledgment Number: If the ACK control bit is set this field contains
the value of the next sequence number the sender of the segment is expecting
to receive. Once a connection is established this is always sent. As the
sequence number field, the distribution is highly skewed and compression
also is not possible (see 5.28).
Table 5.25: Source Port Header Field
Original Size (bits) Entropy H(x)
16
9.002667
Table 5.26: Destination Port Header Field
Original Size (bits) Entropy H(x)
16
6.713252
Table 5.27: Sequence Number Header Field
Original Size (bits) Entropy H(x)
32
32
Table 5.28: Acknowlegment Number Header Field
Original Size (bits) Entropy H(x)
32
32
H
Data Offset: The number of 32-bit words in the TCP header. This indicates where the data begins. The TCP header (even one including options) is an integral number of 32-bit words long. Table 5.29 shows the different elements present in the trace. Table 5.30 shows the original size in bits and the calculated entropy.
Table 5.29: Data Offset Header Field
Elements Probability p(x)
8
0.102957
5
0.797254
11
0.005412
0
0.045715
7
0.028467
10
0.006380
6
0.006864
2
0.000396
13
0.000660
3
0.000528
15
0.000924
14
0.002772
9
0.000088
4
0.000572
1
0.000264
12
0.000748
H
Table 5.30: Data Offset Header Field
Original Size (bits) Entropy H(x)
4
1.152861
H
H
Reserved: Reserved for future use. Must be zero. Consequently, we do not need to store it.
Control Bits: This field records the TCP operations URG (Urgent Pointer field significant), ACK (Acknowledgment field significant), PSH (Push Function), RST (Reset the connection), SYN (Synchronize sequence numbers) and FIN (No more data from sender). Table 5.31 shows the different elements present in the trace. Table 5.32 shows the original size in bits and the calculated entropy.
Window: The number of data octets, beginning with the one indicated in the acknowledgment field, which the sender of this segment is willing to accept. This field shows a large number of possible values. Table 5.33 shows the different elements present in the trace. Table 5.34 shows the original size in bits and the calculated entropy.
74
Table 5.31: Control Bits Header Field
Elements Probability p(x)
16
0.621172
24
0.252288
2
0.032119
1
0.036475
17
0.021867
..
..
.
.
Table 5.32: Control Bits Header Field
Original Size (bits) Entropy H(x)
6
1.694607
Table 5.33: Window Header Field
Elements Probability p(x)
33312
0.001364
16093
0.000044
16459
0.000220
16040
0.000484
8192
0.013332
8760
0.119720
37376
0.010164
0
0.057814
65535
0.034187
17254
0.000044
17232
0.001364
16448
0.003476
17520
0.112020
63898
0.000220
63002
0.000968
16532
0.000132
31680
0.003520
24820
0.015224
..
..
.
.
Table 5.34: Window Header Field
Original Size (bits) Entropy H(x)
16
7.324844
In the table below, we show a summary of the entropy values calculated earlier. Using a header field compression approach, the compression ratio is limited to 40% (141/352), which means that a packet header file of 100 MB can be reduced to at most about 40 MB. Clearly, this compression bound is not satisfactory, and other approaches must be evaluated to reach higher compression ratios.
Table 5.35: Summary

Header Field              Size (bits)    Entropy H(x)
Timestamp                 56             6.432200
Interface                 8              0.000000
Version                   4              0.000000
IHL                       4              0.000000
Type of Service           8              0.179016
Total Length              16             4.492404
Identification            16             16.000000
Flags                     3              0.345932
Fragment Offset           13             0.003325
Time to Live              8              3.206642
Protocol                  8              0.343014
Header Checksum           16             1.000000
Source Address            32             8.667664
Destination Address       32             10.258050
Source Port               16             9.002667
Destination Port          16             6.713252
Sequence Number           32             32.000000
Acknowledgment Number     32             32.000000
Data Offset               4              1.152861
Reserved                  6              0.000000
Control Bits              6              1.694607
Window                    16             7.324844
Total                     352            140.816478
5.2.2 Joint Fields Entropy
In this section we are interested in evaluating the behavior of the entropy when we aggregate header fields. For each aggregation level, we depict two tables. The first table shows the probability associated with each joint value and the second table shows the entropy of the joint header fields.
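As an illustration of how these joint probabilities and entropies can be estimated (an added sketch; it assumes packets are available as dictionaries of header fields, which is not the original tool), the tuple of field values is simply treated as a single symbol:

import math
from collections import Counter

def joint_entropy(packets, fields):
    # count the joint value tuples and compute H over their empirical distribution
    counts = Counter(tuple(pkt[f] for f in fields) for pkt in packets)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

packets = [{"version": 4, "ihl": 5, "flags": 2},
           {"version": 4, "ihl": 5, "flags": 2},
           {"version": 4, "ihl": 5, "flags": 0}]
print(round(joint_entropy(packets, ("version", "ihl", "flags")), 4))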
H
Aggregation level 1: Version+IHL (Tables 5.36 and 5.37).
Table 5.36: Joint Version,IHL
Joint Elements Joint Probability p(x)
p(4,5)
1.00
H
Table 5.37: Joint Version,IHL
Joint Size (bits) Entropy H(x,y)
8
0.00
Aggregation level 2: Version+IHL+Flags (Tables 5.38 and 5.39)
H
Table 5.38: Joint Version,IHL,Flags
Joint Elements Joint Probability p(x)
p(4,5,2)
0.936070
p(4,5,0)
0.063710
p(4,5,1)
0.000220
Aggregation level 3: Version+IHL+Flags+Fragment Offset (Tables 5.40
and 5.41)
77
Table 5.39: Joint Version,IHL,Flags
Joint Size (bits) Entropy H(x,y)
11
0.344969
Table 5.40: Joint Version,IHL,Flags,Fragment Offset
Joint Elements Joint Probability p(x)
p(4,5,2,0)
0.936070
p(4,5,0,0)
0.063622
p(4,5,1,0)
0.000088
p(4,5,1,185)
0.000088
p(4,5,1,1295)
0.000044
p(4,5,0,1639)
0.000088
Table 5.41: Joint Version,IHL,Flags,Fragment Offset
Joint Size (bits) Entropy H(x,y)
24
0.346266
78
H
Aggregation level 4: Version+IHL+Flags+FragOff+Protocol (Tables 5.42
and 5.43)
Table 5.42: Joint Version,IHL,Flags,FragOff,Protocol
Joint Elements Joint Probability p(x)
p(4,5,2,0,6)
0.928326
p(4,5,0,0,6)
0.015795
p(4,5,2,0,17)
0.006688
p(4,5,0,0,17)
0.041843
p(4,5,2,0,1)
0.001056
p(4,5,0,0,1)
0.005896
p(4,5,1,0,1)
0.000088
p(4,5,1,185,1)
0.000088
p(4,5,1,1295,1)
0.000044
p(4,5,0,1639,1)
0.000088
p(4,5,0,0,50)
0.000044
p(4,5,0,0,0)
0.000044
H
H
Table 5.43: Joint Version,IHL,Flags,FragOff,Protocol
Joint Size (bits) Entropy H(x,y)
32
0.493611
Aggregation level 5: Version+IHL+Flags+FragOff+Protocol+Type of Service (Tables 5.44 and 5.45)
Aggregation level 6: Version+IHL+Flags+FragOff+Protocol+TOS+DataOffset
(Tables 5.46 and 5.47)
79
Table 5.44: Joint Version,IHL,Flags,FragOff,Protocol,TOS
Joint Elements
Joint Probability p(x)
p(4,5,2,0,6,0)
0.910815
p(4,5,2,0,6,28)
0.000176
p(4,5,0,0,17,0)
0.041843
p(4,5,0,0,6,0)
0.015136
p(4,5,2,0,6,16)
0.014960
p(4,5,2,0,17,0)
0.006688
p(4,5,0,0,1,0)
0.005368
p(4,5,2,0,6,160)
0.000484
p(4,5,2,0,6,21)
0.000880
p(4,5,2,0,6,27)
0.000088
p(4,5,0,0,6,128)
0.000660
p(4,5,0,0,1,128)
0.000088
p(4,5,2,0,6,8)
0.000176
p(4,5,2,0,1,0)
0.001056
p(4,5,0,0,1,7)
0.000264
p(4,5,2,0,6,18)
0.000176
p(4,5,1,185,1,0)
0.000088
p(4,5,0,0,1,192)
0.000176
p(4,5,1,1295,1,0)
0.000044
p(4,5,0,1639,1,0)
0.000088
p(4,5,2,0,6,128)
0.000132
p(4,5,0,0,50,0)
0.000044
p(4,5,2,0,6,12)
0.000132
p(4,5,1,0,1,0)
0.000088
p(4,5,2,0,6,136)
0.000044
p(4,5,2,0,6,64)
0.000088
p(4,5,2,0,6,24)
0.000132
p(4,5,2,0,6,3)
0.000044
p(4,5,0,0,0,0)
0.000044
Table 5.45: Joint Version,IHL,Flags,FragOff,Protocol,TOS
Joint Size (bits) Entropy H(x,y)
40
0.644337
80
Table 5.46: Joint Version,IHL,Flags,FragOff,Protocol,TOS,DataOffset
Joint Elements
Joint Probability p(x)
p(4,5,2,0,6,0,8)
0.100493
p(4,5,2,0,6,0,5)
0.766543
p(4,5,2,0,6,28,11)
0.000044
p(4,5,0,0,17,0,0)
0.033791
p(4,5,2,0,6,0,7)
0.026927
p(4,5,2,0,6,0,10)
0.005808
p(4,5,0,0,6,0,5)
0.014696
p(4,5,2,0,6,0,11)
0.004532
p(4,5,2,0,6,0,6)
0.005500
p(4,5,0,0,17,0,8)
0.000528
p(4,5,2,0,6,16,5)
0.014168
p(4,5,0,0,17,0,2)
0.000352
p(4,5,0,0,17,0,13)
0.000132
p(4,5,2,0,17,0,0)
0.006512
p(4,5,0,0,1,0,0)
0.004752
p(4,5,0,0,17,0,3)
0.000396
p(4,5,0,0,17,0,15)
0.000352
p(4,5,0,0,17,0,14)
0.002684
p(4,5,2,0,6,160,5)
0.000484
p(4,5,2,0,6,28,5)
0.000132
p(4,5,2,0,6,21,8)
0.000616
p(4,5,0,0,17,0,5)
0.000220
p(4,5,0,0,17,0,9)
0.000088
p(4,5,2,0,6,27,5)
0.000088
p(4,5,0,0,6,128,5)
0.000660
p(4,5,0,0,1,128,4)
0.000088
..
..
.
.
Table 5.47: Joint Version,IHL,Flags,FragOff,Protocol,TOS,DataOffset
Joint Size (bits) Entropy H(x,y)
44
1.495013
81
H
Aggregation level 7: Version+IHL+Flags+FragOff+Protocol+TOS+DataOffset+Control
Bits (Tables 5.48 and 5.49)
Table 5.48: Joint Version,IHL,Flags,FragOff,Protocol,TOS,DataOffset,Control
Bits
Joint Elements
Joint Probability p(x)
p(4,5,2,0,6,0,8,16)
0.082453
p(4,5,2,0,6,0,5,16)
0.509988
p(4,5,2,0,6,0,5,24)
0.231389
p(4,5,2,0,6,28,11,2)
0.000044
p(4,5,0,0,17,0,0,1)
0.030007
p(4,5,2,0,6,0,7,2)
0.023803
p(4,5,2,0,6,0,10,2)
0.002772
p(4,5,0,0,6,0,5,16)
0.009460
p(4,5,0,0,17,0,0,7)
0.000044
p(4,5,2,0,6,0,5,17)
0.018479
p(4,5,2,0,6,0,11,16)
0.004268
p(4,5,2,0,6,0,6,18)
0.001012
p(4,5,2,0,6,0,7,18)
0.003124
p(4,5,2,0,6,0,8,24)
0.014696
p(4,5,0,0,17,0,8,20)
0.000088
p(4,5,2,0,6,16,5,16)
0.009460
p(4,5,0,0,17,0,2,13)
0.000044
p(4,5,0,0,17,0,13,31)
0.000044
p(4,5,2,0,6,16,5,24)
0.004356
p(4,5,0,0,6,0,5,4)
0.002816
p(4,5,2,0,17,0,0,1)
0.006424
p(4,5,2,0,6,0,5,4)
0.005632
p(4,5,0,0,1,0,0,0)
0.004708
p(4,5,2,0,6,0,8,17)
0.002728
p(4,5,0,0,17,0,0,14)
0.000044
..
..
.
.
H
Aggregation level 8: Version+IHL+Flags+FragOff+Protocol+TOS+DataOffset+Control
Bits+Length (Tables 5.50 and 5.51)
82
Table 5.49: Joint Version,IHL,Flags,FragOff,Protocol,TOS,DataOffset,Control
Bits
Joint Size (bits) Entropy H(x,y)
50
2.541000
Table 5.50: Joint Version,IHL,Flags,FragOff,Protocol,TOS,DataOffset,Control
Bits,Length
Joint Elements
Joint Probability p(x)
p(4,5,2,0,6,0,8,16,1440)
0.001364
p(4,5,2,0,6,0,5,16,40)
0.364088
p(4,5,2,0,6,0,5,24,985)
0.000088
p(4,5,2,0,6,0,5,16,552)
0.001496
p(4,5,2,0,6,28,11,2,64)
0.000044
p(4,5,2,0,6,0,5,16,1500)
0.105729
p(4,5,0,0,17,0,0,1,78)
0.024243
p(4,5,2,0,6,0,5,24,1500)
0.057506
p(4,5,2,0,6,0,5,24,52)
0.000836
p(4,5,2,0,6,0,5,24,1374)
0.000044
p(4,5,2,0,6,0,5,24,1480)
0.007172
p(4,5,2,0,6,0,5,24,886)
0.000044
p(4,5,2,0,6,0,7,2,48)
0.023803
p(4,5,2,0,6,0,5,24,378)
0.000308
p(4,5,2,0,6,0,5,24,256)
0.000088
p(4,5,2,0,6,0,5,24,582)
0.001452
p(4,5,2,0,6,0,5,24,498)
0.000132
p(4,5,2,0,6,0,10,2,60)
0.002772
p(4,5,2,0,6,0,5,24,1440)
0.002112
p(4,5,0,0,6,0,5,16,40)
0.007700
p(4,5,2,0,6,0,5,24,281)
0.000088
p(4,5,0,0,17,0,0,1,153)
0.000088
p(4,5,2,0,6,0,5,24,396)
0.000132
..
..
.
.
Table 5.51: Joint Version,IHL,Flags,FragOff,Protocol,TOS,DataOffset,Control
Bits,Length
Joint Size (bits) Entropy H(x,y)
66
5.228122
83
Using a joint header approach, the compression ratio is improved by only 1% and reaches a limit of 39%, which means that a packet header file of 100 MB can at best be reduced to 39 MB. This new approach does not represent a big gain over the previous method, and we must evaluate another approach.
Table 5.52: Summary

Header Field              Size (bits)    Entropy H
Joint fields              66             5.228122
Interface                 8              0.000000
Timestamp                 56             6.432200
Identification            16             16.000000
Time to Live              8              3.206642
Header Checksum           16             1.000000
Source Address            32             8.667664
Destination Address       32             10.258050
Source Port               16             9.002667
Destination Port          16             6.713252
Sequence Number           32             32.000000
Acknowledgment Number     32             32.000000
Reserved                  6              0.000000
Window                    16             7.324844
Total                     352            137.833441
5.3. Flow Level Entropy
We have seen that, for a sequence of packets in an Internet trace, we assumed that they are independent. However, this is not true for a sequence of packets within the same flow, where some header field sequences, such as the IP addresses and TCP port numbers, show a strong dependence. Hence, for a flow approach we have three scenarios.
5.3.1 First scenario
From the chain rule for entropy we have that if X_1, X_2, \ldots, X_n is drawn according to p(x_1, x_2, \ldots, x_n), then:

H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1)   (5.30)

In the first scenario, we have that the entropy of the field for a set of flows with n packets is:

H(f_1) \approx 0,\; H(f_2) \approx 0,\; \ldots,\; H(f_n) \approx 0   (5.31)

consequently:

H(f_1, f_2, \ldots, f_n) \approx 0   (5.32)
This scenario embraces the following fields:
- Interface
- Version
- IHL
- Type of Service
- Flags
- Fragment Offset
- Protocol
5.3.2 Second scenario
In the second scenario, we have that:

H(f_1) \neq 0,\; H(f_2) \neq 0,\; \ldots,\; H(f_n) \neq 0   (5.33)

and, according to the chain rule:

H(f_i, f_{i+1}) = H(f_i) + H(f_{i+1} \mid f_i)   (5.34)

where the conditional term H(f_{i+1} \mid f_i) \approx 0, so that

H(f_i, f_{i+1}) \approx H(f_i)   (5.35)

where H(f_i) is the entropy of the header field in packet i. For a flow with n packets:

H(f_i, f_{i+1}, \ldots, f_n) = H(f_i) + H(f_{i+1} \mid f_i) + \cdots + H(f_n \mid f_{n-1}, \ldots, f_1)   (5.36)

H(f_i, f_{i+1}, \ldots, f_n) \approx H(f_i)   (5.37)
This scenario embraces the following fields:
- Source Address
- Destination Address
- Source Port
- Destination Port
- Time to Live
To calculate the conditional entropy of a header field, we normalized all
packets in relation to the first packet. Hence, the table 5.53 shows the normalized value for flows with 5 packets, obtaining the following conditional
entropy:
85
Table 5.53: Flows with 5 pkts per flow
Flow Pkt 1 Pkt 2 Pkt 3 Pkt 4 Pkt 5
1
0
0
0
0
0
2
0
0
0
0
0
3
0
0
0
0
0
..
..
..
..
..
..
.
.
.
.
.
.
28
0
0
-67
0
0
29
0
0
0
0
0
..
..
..
..
..
..
.
.
.
.
.
.
H
H(f_2 | f_1) = 0.000000
H(f_3 | f_2, f_1) = 0.122288
H(f_4 | f_3, f_2, f_1) = 0.000000
H(f_5 | f_4, f_3, f_2, f_1) = 0.000000
Data Offset
The table 5.54 shows the normalized value for flows with 5 packets, obtaining the following conditional entropy:
H(f_2 | f_1) = 0.286396
H(f_3 | f_2, f_1) = 0.000000
H(f_4 | f_3, f_2, f_1) = 0.000000
H(f_5 | f_4, f_3, f_2, f_1) = 0.000000
H
Table 5.54: Flows with 5 pkts per flow
Flow Pkt 1 Pkt 2 Pkt 3 Pkt 4 Pkt 5
1
0
-2
-2
-2
-2
2
0
-2
-2
-2
-2
3
0
-2
-2
-2
-2
..
..
..
..
..
..
.
.
.
.
.
.
21
0
-1
-1
-1
-1
22
0
-2
-2
-2
-2
..
..
..
..
..
..
.
.
.
.
.
.
Control Bits
The table 5.55 shows the normalized value for flows with 5 packets, obtaining the following conditional entropy:
H(f_2 | f_1) = 0.000000
H(f_3 | f_2, f_1) = 0.122288
H(f_4 | f_3, f_2, f_1) = 0.286396
H(f_5 | f_4, f_3, f_2, f_1) = 0.519701
Table 5.55: Flows with 5 pkts per flow
Flow Pkt 1 Pkt 2 Pkt 3 Pkt 4 Pkt 5
1
0
14
22
14
15
2
0
14
22
14
15
3
0
14
22
14
15
..
..
..
..
..
..
.
.
.
.
.
.
20
0
14
22
14
2
..
..
..
..
..
..
.
.
.
.
.
.
28      0     14    14    22    2
..
44      0     -2     6     6    7
..
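A sketch of how the conditional entropies quoted in this section can be estimated from the normalized per-flow sequences (an added illustration; the flows list below is a hypothetical input, not the RedIRIS data):

import math
from collections import Counter

def conditional_entropy(flows, k):
    # H(f_{k+1} | f_1, ..., f_k) estimated from per-flow value sequences
    joint = Counter(tuple(f[:k + 1]) for f in flows)   # counts of (f_1, ..., f_{k+1})
    prior = Counter(tuple(f[:k]) for f in flows)       # counts of (f_1, ..., f_k)
    n = len(flows)
    h = 0.0
    for seq, c in joint.items():
        p_joint = c / n
        p_cond = c / prior[seq[:-1]]
        h -= p_joint * math.log2(p_cond)
    return h

flows = [[0, 0, 0, 0, 0], [0, 0, -67, 0, 0], [0, 0, 0, 0, 0]]
print(round(conditional_entropy(flows, 2), 4))   # H(f_3 | f_2, f_1)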
5.3.3 Third scenario
In the last scenario, the sequence of header field values within a flow behaves in the same way as in a sequence of consecutive (independent) packets. Hence, in this case:

H(f_1) \neq 0,\; H(f_2) \neq 0,\; \ldots,\; H(f_n) \neq 0   (5.38)

and the joint entropy is:

H(f_i, f_{i+1}) = H(f_i) + H(f_{i+1} \mid f_i)   (5.39)

H(f_i, f_{i+1}) = H(f_i) + H(f_{i+1})   (5.40)

H(f_i, f_{i+1}) = 2 \cdot H(f)   (5.41)

For a flow with n packets:

H(f_i, f_{i+1}, \ldots, f_n) = H(f_i) + H(f_{i+1} \mid f_i) + \cdots + H(f_n \mid f_{n-1}, \ldots, f_1)   (5.42)

H(f_i, f_{i+1}, \ldots, f_n) = n \cdot H(f)   (5.43)
This third scenario embraces the following fields:
- Timestamp
- Total Length
- Identification
- Sequence Number
- Acknowledgment Number
- Window
5.4. Trace Compression Bound
From the three scenarios shown in the last section, we define C_m as:

C_m = \frac{\sum_{f \in \text{scenario 1}} H(f) + \sum_{f \in \text{scenario 2}} H(f) + m \cdot \sum_{f \in \text{scenario 3}} H(f)}{352 \cdot m}   (5.44)

where the summations for the three scenarios are shown in Tables 5.56, 5.57 and 5.58. In Table 5.58 we consider that the entropy of the Sequence Number and Acknowledgment Number header fields is zero, because they can be deduced from the Total Length header field.
Table 5.56: Entropy 1st Scenario

Header Field         Entropy H(x)
Interface            0.000000
Version              0.000000
IHL                  0.000000
Type of Service      0.179016
Flags                0.345932
Fragment Offset      0.003325
Protocol             0.343014
Total                0.871287

Table 5.57: Entropy 2nd Scenario

Header Field           Entropy H(x)
Time to Live           3.206642
Source Address         8.667664
Destination Address    10.258050
Source Port            9.002667
Destination Port       6.713252
Data Offset            1.152861
Control Bits           1.694607
Total                  40.695743
Hence, the final expression for C_m is:

C_m = \frac{0.871287 + 40.695743 + m \cdot 34.249448}{352 \cdot m}   (5.45)

and the maximum compression ratio is:

\sum_{m} C_m \cdot P_m   (5.46)

where P_m is the flow probability distribution for m-packet flows.
Table 5.58: Entropy 3rd Scenario

Header Field             Entropy H(x)
Timestamp                6.432200
Total Length             4.492404
Identification           16.000000
Sequence Number          0.000000
Acknowledgment Number    0.000000
Window                   7.324844
Total                    34.249448
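As an illustrative check (added here, not part of the original text), the bound of Equations (5.44)-(5.46) can be reproduced directly from the three scenario totals; the dictionary P below is only a hypothetical, truncated version of the flow-length distribution of Table 5.59:

S1, S2, S3, HEADER_BITS = 0.871287, 40.695743, 34.249448, 352

def c_m(m):
    # per-flow compression bound for m-packet flows, Equation (5.45)
    return (S1 + S2 + m * S3) / (HEADER_BITS * m)

P = {1: 0.095545617, 2: 0.070475923, 5: 0.195583723, 150: 0.107117787}
bound = sum(p * c_m(m) for m, p in P.items())
print(round(c_m(1), 6))   # 0.215388, as in Table 5.59
print(round(bound, 6))    # partial sum over the hypothetical subset of P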
5.5. Conclusions
Applying the equations shown previously, we deduced that the compression bound for a TCP/IP header trace is around 13%. Table 5.59 shows the associated entropy for m ranging from 1 to 20. For large flows (m ≥ 21) we have assumed a mean value for the number of packets; this value was m = 150.
Below, we summarize the main conclusions derived from this chapter:
- For some TCP/IP header fields, we found a relatively low entropy, which means that some of the values assigned to them occur with high probability;
- Selecting only the header fields with low entropy and grouping them, we have calculated the entropy at packet level. We have seen that this new arrangement does not significantly impair the entropy;
- Also, we have seen that by breaking down a trace into flows with the same number of packets and considering each different flow as a random variable, we obtain a higher compression ratio;
- The compression bound for TCP/IP header traces is around 13%.
Table 5.59: Flows distribution

m-packet flow    P_m            Compression Bound C_m    C_m × P_m
1                0.095545617    0.215387651              0.020579346
2                0.070475923    0.156343609              0.011018460
3                0.063516576    0.136662262              0.008680319
4                0.052245242    0.126821589              0.006625825
5                0.195583723    0.120917185              0.023649433
6                0.163133511    0.116980915              0.019083507
7                0.077876497    0.114169294              0.008891105
8                0.039830729    0.112060578              0.004463455
9                0.028338782    0.110420466              0.003129182
10               0.021940996    0.109108376              0.002393946
11               0.016285273    0.108034848              0.001759377
12               0.012875795    0.107140242              0.001379516
13               0.010408937    0.106383267              0.001107337
14               0.008343194    0.105734431              0.000882163
15               0.007761577    0.105172107              0.000816301
16               0.006718678    0.104680073              0.000703312
17               0.007220072    0.104245926              0.000752663
18               0.005655723    0.103860017              0.000587403
19               0.005214496    0.103514730              0.000539777
20               0.003910872    0.103203972              0.000403618
150              0.107117787    0.098086822              0.010506843
Total                                                    0.127952888
Chapter 6
Lossless Compression Method
The main reason why header compression can be done at all is the fact that there is significant redundancy between header fields, both within consecutive packets belonging to the same flow and in particular between flows. The big gain of our proposed method comes from the observation that, for a set of selected header fields, the flows traveling over an Internet link are very similar. By utilizing a set of pre-computed templates of flows, and Huffman encoding, the header size can be significantly reduced. Hence, we have embarked upon the development of a new header compression scheme for packet header files that drastically reduces storage requirements. This chapter provides the details of how the method works, focusing on the fact that the decompressed header is functionally identical to the original header.
6.1 Generic Compression
Content compression can be as simple as removing all extra space characters, inserting a single repeat character to indicate a string of repeated characters, or substituting smaller bit strings for frequently occurring characters. The compression is performed by algorithms which determine how to compress and decompress. Some of the most popular compression algorithms are Huffman coding [74], LZ77 [124], and deflate [32]. These specifications define lossless compressed data formats.
Huffman encoding belongs to a family of algorithms with variable codeword length. That means that individual values are replaced by bit sequences (messages) that have a distinct length, so values that appear frequently in a packet header are given a short sequence, while others that are seldom used get a longer bit sequence. To achieve those aims, the following basic restrictions are imposed on the encoding algorithm:
H
H
No two messages will consist of identical arrangements of bits.
The message codes will be constructed in such a way that no additional
indication is necessary to specify where a message code begins and ends
once the starting point of a sequence of messages is known.
According to [27], an optimal (shortest expected length) prefix code for a given distribution can be constructed by the Huffman algorithm. It was proved that any other code for the same alphabet cannot have a lower expected length than the code constructed by the algorithm.
Huffman coding is a form of prefix coding prepared by a special algorithm.
The Huffman compression algorithm assumes data files consist of some values
that occur more frequently than other values in the same file. This is very true, for
instance, for text files and TCP/IP header traces.
The algorithm builds a Frequency Table for each value within a file. With the
frequency table the algorithm can then build the Huffman Tree from the frequency
table. The purpose of the tree is to associate each value with a bit string of variable
length. The more frequently used characters get shorter bit strings, while the less
frequent characters get longer bit strings. Thus, the data file may be compressed.
The tree structure contains nodes, each of which holds a value, its frequency, a pointer to a parent node, and pointers to the left and right child nodes. At first there are no parent nodes. The tree grows by making successive passes through the existing nodes. Each pass searches for the two nodes that have not yet been given a parent node and that have the two lowest frequency counts. When the algorithm finds those two nodes, it allocates a new node, assigns it as the parent of the two nodes, and gives the new node a frequency count that is the sum of the two child nodes. The next iteration ignores those two child nodes but includes the new parent node. The passes continue until only one node with no parent remains. That node will be the root node of the tree.
To compress the file, the Huffman algorithm reads the file a second time, converting each value into the bit string assigned to it by the Huffman tree and then writing the bit string to a new file. Compression then involves traversing the tree beginning at the leaf node for the value to be compressed and navigating to the root. This navigation iteratively selects the parent of the current node and checks whether the current node is the right or left child of the parent, thus determining whether the next bit is a one (1) or a zero (0). The assignment of the 1 bit to the left branch and the 0 bit to the right is arbitrary.
The decompression routine reverses the process by reading in the stored frequency table (presumably stored in the compressed file as a header) that was used in compressing the file. With the frequency table the decompressor can then re-build the Huffman tree, and from that, expand all the bit strings stored in the compressed file back to their original values. To do that, the algorithm reads the file a bit at a time. Beginning at the root node of the Huffman tree and depending on the value of the bit, the right or left branch of the tree is taken, and then another bit is read. When the selected node is a leaf (it has no right and left child nodes) the algorithm writes the value to the decompressed file and goes back to the root node for the next bit.
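The code-assignment part of the procedure described above can be sketched in a few lines of Python (an added illustration, not the tool built for the thesis; the example values at the end are hypothetical Total Length values):

import heapq
from collections import Counter

def huffman_codes(values):
    # frequency table
    counts = Counter(values)
    # heap entries: (frequency, tie-breaker, subtree); a subtree is a leaf value or a pair
    heap = [(f, i, v) for i, (v, f) in enumerate(counts.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        # repeatedly merge the two least frequent subtrees under a new parent node
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, i, (t1, t2)))
        i += 1
    codes = {}
    def walk(tree, prefix):
        # here the first child gets bit 0 and the second gets bit 1 (the choice is arbitrary)
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman_codes([40] * 40 + [1500] * 20 + [52] * 7 + [48] * 3))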
LZ77 compression works by finding sequences of data that are repeated.
The term sliding window is used; all it really means is that at any given point in
the data, there is a record of what characters went before. A 32K sliding window
means that the compressor (and decompressor) have a record of what the last
32768 characters were. When the next sequence of characters to be compressed
is identical to one that can be found within the sliding window, the sequence of
characters is replaced by two numbers: a distance, representing how far back
into the window the sequence starts, and a length, representing the number of
characters for which the sequence is identical. With the Deflate compressor, the
LZ77 and Huffman algorithms work together.
6.2 TCP/IP Header Compression
The previous methods do not take into account the specific properties of the
data to be compressed. The following methods have been developed for saving
transmission bandwidth on channels such as wireless and slow point-to-point links
and are based on the fact that in TCP connections, the content of many TCP/IP
header fields of consecutive packets of a flow can be usually predicted.
The original scheme proposed for TCP/IP header compression in the context
of transmission of Internet traffic through low speed serial links is Van Jacobson’s
header compression algorithm [65]. The main goal behind [65] header compression scheme is to improve the line efficiency for a serial link. Increasing the line
efficiency also allows for better allocation in asymmetric bandwidth allocation
schemes. Since allocation is often dependent on the amount of data to be sent or
received, the smaller headers will cause less fluctuation in the amount of actual
data traversing the link. Also for this particular header compression scheme, the
TCP and IP headers are compressed together and not individually.
The header compression scheme in [35] aims to satisfy several goals, including better response time, line efficiency, loss rate, and bit overhead. [35] is
similar to [65] in regards to TCP, but includes support for other features and protocols, such as TCP options, ECN, IPV6, and UDP. The specification also allows
extension to multicast, multi-access links, and other compression schemes which
ride over UDP.
The goal of [21] is to provide a means of reducing the cost of headers when
using the Real-Time Transport Protocol (RTP), which is often used for applications such as audio and video transport. However, instead of compressing the
RTP header alone, greater efficiency is obtained by compressing the combined
RTP/UDP/IP headers together. Another important goal is that the implementations for the compression and decompression code need to be simple, as a single
processor may need to handle many interfaces. Finally, this compression scheme is not intended to work in conjunction with RTCP (Real-Time Transport Control Protocol), as the required additional complexity is undesirable.
The incentive for the robust header compression scheme [40] arose from links with significant error rates, long round-trip times, and bandwidth-limited capacity; thus, the goal is to be able to design highly robust and efficient header compression schemes based upon a highly extensible framework. Consequently, the information presented in this document for this particular header compression scheme represents the underlying framework on which compression for other protocols is built. Finally, both UDP and RTP are covered in the [40] scheme.
Since then, specifications for the compression of a number of other protocols
have been written. Degermark proposed additional compression algorithms for
UDP/IP and TCP/IPv6 [34]. Equally for wireless environments, another scheme
that makes use of the similarity in consecutive flows from or to a given mobile
terminal is described in [116].
6.3 Proposed Header Trace Compression
Our proposed compression method is aimed at saving storage space for potentially huge packet traces. The advantage of our proposal is that the trace file, and consequently the complete flows, are known in advance. Thus, we exploit the properties of TCP/IP flows to predict some header fields, and Huffman encoding to obtain an optimal compression ratio.
We make use of the similarity in consecutive header fields in a packet flow
to compress these headers. Here, we use the definition of flow presented earlier
as a sequence of packets with the same IP 5-tuple and such that the time between
two consecutive packets does not exceed 5 sec.
Within a flow, most of the header fields of consecutive packets are identical
or change in a very predictable manner. For this set of fields, storage is required
only once per each flow. However, for another set, the header fields assume different values along the same flow.
A phase prior to the compression itself was the step to determine the clusters of TCP/IP flows and build a Huffman tree for them. For practical reasons, the clusters are limited to flows with at most 7 packets. The clusters were obtained from traces collected in RedIRIS and from traces downloaded from NLANR (Figure 6.1). This set of clusters is not expected to change significantly in size for different traces.
Figure 6.1: Cluster Generation (RedIRIS and NLANR traces feed the Cluster Analyzer, which produces the Clusters of Flows dataset)
The table 6.1 shows the calculated entropy for the set of clusters. The clusters
were grouped by the number of packets per flow.
Our proposed compression occurs in two steps. In the first step, the algorithm traverses the whole trace file, building the header field frequency tables and checking for the presence of new clusters. The second step is the compression itself. Moreover,
Table 6.1: Entropy of the flow clusters, grouped by packets per flow

m-packet    Entropy H    Code Size (bits)
2           0.940182     1
3           2.269575     3
4           2.913547     3
5           2.206767     3
6           3.317451     4
7           4.785971     5
we have applied different approaches for small flows and large flows. In both
cases, the compression is carried out at flow level but for small flows we apply the
clustering techniques described in chapter 4.
6.3.1 First Step: Header Frequency
The first step starts by reading the trace and extracting from each packet the value of every header field. Here, the compression algorithm creates one dataset and updates a second dataset (Figure 6.2).
A first dataset (Header Frequency Table) stores the frequency of some header fields, such as source address, destination address, source port, destination port and protocol. For each packet, the algorithm reads the value of these headers, and after reading the last packet it calculates the frequency of each one.
The second dataset (Clustering Frequency Table) stores the flow clusters. This dataset shows small variability because the most common flows were stored previously. For each packet, the algorithm looks into the 5-tuple of fields (Source and Destination IP address, Source and Destination port number, and Protocol number) to identify each new connection. Whenever a packet carrying a new flow is found, a new node is inserted at the end of a temporary data structure (Figure 6.3). This temporary data structure is implemented as a linked list and it stores the packet headers of the n open connections. When a Fin/Rst TCP flag arises in a packet, or the time between two consecutive packets exceeds 5 sec, the flow is flagged, indicating that this flow has been completed. After that, the algorithm examines the number of nodes inserted for this flow. If the number of packets is greater than seven, the flow is immediately removed from the temporary data structure. Otherwise, if the number of packets is smaller than 8, it searches the Clustering Frequency Table dataset for an identical flow. If no hit is found, this flow is added to the dataset and a longer message length is assigned to it. It is important to say that this dataset is not stored in the compressed file and, moreover, it is shared among many compressed files.
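A minimal sketch of this first pass (an illustration only; the packet dictionaries and field names src, dst, sport, dport, proto, ts are assumptions, and the matching of completed flows against the Clustering Frequency Table is only hinted at in a comment):

from collections import Counter, OrderedDict

FLOW_TIMEOUT = 5.0  # seconds, the flow inactivity timeout used in the text

def first_pass(packets):
    header_freq = {f: Counter() for f in ("src", "dst", "sport", "dport", "proto")}
    flows = OrderedDict()   # 5-tuple -> list of packet headers (temporary linked list)
    last_seen = {}          # 5-tuple -> timestamp of the last packet seen
    for pkt in packets:
        for f in header_freq:
            header_freq[f][pkt[f]] += 1
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        if key in last_seen and pkt["ts"] - last_seen[key] > FLOW_TIMEOUT:
            # timed out: in the real method the completed flow would be matched
            # against the Clustering Frequency Table here before being dropped
            flows.pop(key, None)
        flows.setdefault(key, []).append(pkt)
        last_seen[key] = pkt["ts"]
    return header_freq, flows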
6.3.2 Second Step: Compression
Figure 6.2: Compression First Step

This second step is related to the compression itself. The method starts again by looking into the 5-tuple of fields (Source and Destination IP address, Source and Destination port number, and Protocol number) to identify each new connection. Whenever a packet carrying a new flow is found, a new node is inserted at the
end of another temporary data structure (linked list). When the flow placed at the
head of the temporary linked list reaches a completed flow status, the compressor
examines the number of inserted nodes associated to this flow to see if it is a small
or large flow.
In the case of small flows, and for a first set of fields, the algorithm searches the Clusters of Flows dataset for an identical sequence of packet characteristics. For the remaining header fields, the algorithm searches for the corresponding code size (Figure 6.4).
After the template search, the compressor algorithm starts writing into the compressed header file. For many fields (see Figure 4.3), the storage is reduced to a template identifier, which is the most important achievement of our proposed method.
However, for the other fields, for which prediction is not possible, the carried information must be stored. Here, it is important to consider that, for some of these fields, whose value is likely to stay constant over the life of a flow, the storage is required only once per flow. However, for the remaining fields, the storage is required for each packet.
Figure 6.5 shows the compressed data format for small flows. The first field (1 bit) is a flag to identify the type of flow: small or large. The following five fields store the respective codes for source and destination address, source and destination port, and protocol. The next field stores the flow clustering code, which represents the following fields: Interface, Version, IHL, Type of Service, Flags, Fragment Offset, Data Offset, and Control Bits. The next five fields, stored for each packet, hold the codes for the inter-packet time, Identification, Length, Window and checksum.
Figure 6.3: Temporary data structure (one linked list of packet headers per active flow)
Figure 6.4: Temporary data structure
Figure 6.5: Temporary data structure (compressed record for small flows: FC, Source Address Code, Destination Address Code, Source Port Code, Destination Port Code, Protocol Code, Initial TTL, Flow Clustering Code, followed by per-packet Time, Identification, Length, Window and ChkS codes)
As already mentioned, for large flows we have used a different approach. In
this class of flows, when a large flow reaches the completed status, each packet is
inspected by the compressor in order to determine the correspondent header field
codes (Figure 6.6).
Figure 6.6: Temporary data structure
The Figure 6.7 shows the compressed data format for large flows. The first
field (1 bit), is a flag to identify the type of flow: small or large. The following
five fields store the respective codes for source and destination address, source
and destination port and protocol. For each packet the next fields store the following informations: the packet control (1 bit) which indicates whether it is the last
packet; the joint header codes (Interface, Version, IHL, Type of Service, Flags,
Fragment Offset, Data Offset, Control Bits); Inter-packet time code, Identification
code, Length code, Window code and chksum.
Figure 6.7: Temporary data structure (compressed record for large flows: FC, Source Address Code, Destination Address Code, Source Port Code, Destination Port Code, Protocol Code, Initial TTL, followed for each packet by PC, Joint Header Code, Time, Identification, Length, Window and ChkS codes)
6.4 Decompression algorithm
Processing at the decompressor is much simpler than at the compressor, because all decisions have already been made and the decompressor simply does what the compressor has told it to do. To perform its functions, the decompression algorithm sets up a temporary linked list to store the decompressed packet headers of the n open connections. It works by reading the Clustering Frequency Table and the Header Frequency Table datasets (Figure 6.8). These two datasets store the necessary information to reproduce the header fields of all original packets.
Figure 6.8: Decompression model
The decompression algorithm starts by assigning a random timestamp to the first flow. For the following flows, the Time Code field (see Figures 6.5 and 6.7) indicates where each packet starts. The field FC (Flow Control) indicates how to decompress the flow (small or large). The first method refers to small flows. For each flow, the algorithm reads the following information: Source Address, Destination Address, Source Port, Destination Port, Protocol, initial TTL value and the template identifier. We have assigned initial random values to the following header fields: Sequence Number and Acknowledgment Number. After the template is identified, the algorithm decodes the following fields: Interface, Version, IHL, Type of Service, Flags, Fragment Offset, ΔTTL, Data Offset, Reserved, and Control Bits. The timestamp, Identification, Total Length and Window fields are restored packet by packet (see Figure 6.5). The Sequence Number and Acknowledgment Number values are reconstructed based on the stored Total Length field.
The second method refers to large flows. The decompression method is similar to the previously described method for small flows, with the small difference that some header field information is gathered directly from the Huffman tree, and not from templates of flows.
Once all header information has been consumed for a packet, its checksum is recalculated and stored in the IP checksum field. Depending on the header checksum flag, the value is calculated correctly or not. For each decompressed packet, the algorithm inserts a new node into the temporary linked list, sorted by time-stamp. After decoding the last packet of a flow, the algorithm continues the process by reading the next record in the compressed dataset. Meanwhile, all nodes in the linked list are checked. For the nodes whose timestamp fields are smaller than the new flow start point, the packet headers are written to the decompressed file.
6.5 Compression Ratio
The expected message length L can be calculated in order to measure the efficiency of the algorithm. Let N be the number of messages, p(i) the probability of the i-th value and l(i) the length of a message (the number of bits assigned to it). Then:

L = \sum_{i=1}^{N} p(i)\, l(i)   (6.1)

Moreover, the expected length L of any instantaneous code for a random variable X is greater than or equal to the entropy H(X), i.e.,

L \geq H(X)   (6.2)
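A small numeric check of these two relations (an added illustration with a hypothetical four-symbol source and a matching prefix code):

import math

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lens  = {"a": 1,   "b": 2,    "c": 3,     "d": 3}      # a Huffman code for probs
L = sum(probs[s] * lens[s] for s in probs)              # Equation (6.1)
H = -sum(p * math.log2(p) for p in probs.values())
print(L, round(H, 3))   # 1.75 1.75 -> L >= H(X) holds, here with equality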
This section addresses the performance analysis of the proposed compression method. The analysis was carried out by comparing the compression ratio of different compression methods.
Using the compression algorithm described in the previous section, we depict in Figure 6.9, for each m-packet flow (X axis), the corresponding compression ratio (Y axis). As we can see in Figure 6.9, the compression ratio starts at 23% for 1-packet flows, decays quickly to 13%, and then decays smoothly until reaching 11% in the case of very large flows. Around 7-packet flows, we notice the change of compression method from small to large flows.
The curve in Figure 6.10 depicts the cumulative compression ratio when m (packets in a flow) ranges from 1 to large values. The summation of all terms gives us the value of the trace compression. As we can see from Figure 6.10, this value is around 16%, which means that our method compresses a 100 MB header trace file to about 16 MB.
It is important to say that some data structures related to this method, such as the Flow Clustering dataset, are also needed. However, we do not take them into account because their size stays almost constant.
Figure 6.9: Flow compression bound (compression ratio versus packets per flow)

Figure 6.10: Trace compression bound (cumulative compression ratio versus packets per flow)

We studied the efficiency of the proposed compression method by comparing it against the GZIP [47] and Van Jacobson methods. The GZIP, ZIP and ZLIB [48] applications use the deflate algorithm. The measures were taken from a TSH (Time Sequence Header) header trace file [89], [98]. The compressed file size obtained using the GZIP application is around 50% of the original TSH file size.
For the Van Jacobson method, the header size of a compressed datagram ranges from 3 to 16 bytes. However, we must slightly modify the original method, because the number of active flows is much larger in a high-speed Internet link than in a low-speed serial link (the scenario for which Van Jacobson was originally proposed). Hence, we must increase the number of bytes needed to store the flow identifier (we have increased it from 1 byte to 3 bytes). Moreover, we assume that a time stamp (3 bytes) is added to each header. As a result, we assume that the minimal encoded header becomes 8 bytes in the best case and 21 bytes in the worst case. Taking into account only the best case and considering the changes that we have explained before, the compression ratio for m-packet flows using the Van Jacobson method is bounded by:
d_{VJ}(m) = \frac{44 + 8\,(m - 1)}{44\, m}   (6.3)

obtaining thus a compression ratio given by:

C_{VJ} = \sum_{m} P_m \; d_{VJ}(m)   (6.4)
The performance of the three compression methods under analysis (our proposed method, the GZIP method and the Van Jacobson method) is depicted in Figure 6.11. For different uncompressed file sizes (X axis), we show the corresponding storage needs (Y axis).
Figure 6.11: Compression techniques comparison (compressed file size in MBytes versus uncompressed file size in MBytes, for the GZIP, VJ and proposed methods)
Chapter 7
Lossy Compression Method
In this chapter we present a new lossy header trace compression method based on
clustering of TCP flows. With a flow characterization approach that incorporates
a set of packet characteristics such as inter-packet time, payload size, and TCP
structures, we demonstrated that behind the great number of flows in a high-speed
link, there is not so much variety among them and clearly they can be grouped
into a set of clusters. Using templates of flows, we developed an efficient method
to compress a header trace file. With this proposed method, storage size requirements are reduced to 3% of the original size. Although this specification defines a compressed data format that is not lossless, it preserves important statistical properties present in the original trace, such as self-similarity, spatial and temporal locality, and IP address structure. Furthermore, in order to validate the decompressed trace, measurements were taken of memory access and cache miss ratio. The results showed that, for specific purposes, our proposed method provides a good
solution for header trace compression.
7.1 Packet Trace Compression
Following the analysis presented in [61], we have seen in several traces that some types of flows are extremely popular, and most of them are short-lived and have a small number of packets [51], [19].
As with the lossless compression method, we make use of the similarity of consecutive header fields in a packet flow to compress these headers. Here, we also use the definition of flow presented earlier, as a sequence of packets with the same IP 5-tuple and such that the time between two consecutive packets does not exceed 5 sec.
A phase prior to the compression itself was the step to build the templates of flows. These templates are based on the most common types of clusters found in many traces. Here, we have not limited the clusters to small flows: all flow lengths are stored. Besides the header fields used earlier to calculate the clusters, we have added the following header fields:
H
Inter-packet time within a flow: for small flows, the inter-packet time is calculated in terms of acknowledgment dependence. If a packet to be transmitted waits for a packet sent by the opposite node, it is called a dependent packet; otherwise, if a packet is sent immediately after the last one, we classify it as not dependent. For instance, in the TCP three-way handshake, when a node sends a Syn TCP flag, it waits for a Syn+Ack TCP flag from the opposite node. This waiting time corresponds to the Round Trip Time (RTT). In this sense, we associate inter-packet time with acknowledgment dependence. In the case of short flows, we have seen that time variation does not represent a serious problem. Hence, for short flows, we have assumed that each flow has a specific RTT. Evidently, this assumption is not true for long flows, where the RTT is dynamic and time-varying. Hence, for long flows, we store the inter-packet time;
Total length: we have seen that approximately 40% and 20% of all packets carry 40 and 1,500 bytes, respectively. In this case we have used the following binary codes: 00 if packet size = 40, 01 if packet size = 1500, and 11 if 40 < packet size < 1500.
A second step in the compression consists of exploring some header field distributions. We apply this step to obtain the following data:
- Server Port frequency: for known TCP ports, such as Web servers, we calculate their distribution;
- Client/server frequency: we calculate how the flows are distributed among the following relationships: client/client, client/server, and server/client;
- IP address: the number of unique IP addresses found in the trace;
- @src,@dst pairs: the number of unique source and destination IP address pairs in the trace;
- RTT: the Round Trip Time distribution;
- Window: the value distribution for the Window header field.
Finally, the method for compression starts looking into the 5-tuple of fields
(source and destination IP address, source and destination port number, and protocol number). When a packet carrying a new flow is found, a new node is inserted
at the end of a temporary linked list and a new entry is created in the compressed
dataset.
When the flow reaches completed status, the algorithm first looks at the number of nodes inserted for this flow and searches the template dataset for an identical sequence of header fields. If a match is not possible, the algorithm searches for the most similar template, calculating the Euclidean distance between them. Moreover, the algorithm looks into the trace for a previous identical pair of source and destination addresses. After that, we update the compressed dataset and remove all nodes of this flow from the linked list.
For each flow, we store in the compressed dataset the following data:
- Inter-flow time (7 bits): distance between consecutive flows;
- Number of packets (17 bits): number of packets in the flow;
- Cluster identifier (14 bits): a pointer to the template dataset;
- Distance to same pair (10 bits): distance, in number of flows, to a previous flow with an identical pair of source and destination IP addresses.
As we can see, independently of the flow length, we need 48 bits to represent each flow.
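A small sketch of how these four fields could be packed into a single 48-bit record; the field widths are the ones listed above, while the bit ordering is our assumption:

def pack_flow_record(inter_flow_time, num_packets, cluster_id, distance_to_pair):
    """Pack the per-flow fields into one 48-bit value: 7 + 17 + 14 + 10 bits."""
    assert inter_flow_time < (1 << 7)
    assert num_packets < (1 << 17)
    assert cluster_id < (1 << 14)
    assert distance_to_pair < (1 << 10)
    record = (inter_flow_time << 41) | (num_packets << 24) \
             | (cluster_id << 10) | distance_to_pair
    return record.to_bytes(6, 'big')       # 6 bytes per flow

print(pack_flow_record(12, 5, 37, 3).hex())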
7.2 Decompression algorithm
The key question is, given a compressed trace, how to decompress it quickly. The decompression algorithm is implemented in two steps: a multifractal IP address generator, which mimics the IP address structure of real traces, and the algorithm that restores the compressed files.
Kohler, Paxson, and Shenker [75] have demonstrated that real address structure looks broadly self-similar: meaningful structure appears at all magnification levels. The multifractal address generator is based on a multiplicative process. Following Evertsz and Mandelbrot [42], a process that fragments a set into smaller and smaller components according to some rule, and at the same time fragments the measure or mass associated with these components according to some other rule, is a multiplicative process or cascade. The more formal mathematical construction of a cascade starts by assigning a unit of mass to the unit interval [0,1], where the subinterval [a · 2^-32, (a+1) · 2^-32) corresponds to address a. The construction starts by splitting the interval into three parts, where the middle part takes up a fraction delta of the whole interval. These parts are called B0, B1, and B2. The middle part B1 is then thrown away, receiving none of the parent interval's mass. The other subintervals are assigned masses m0 and m2 = 1 - m0. Recursing on the nonempty subintervals B0 and B2 generates four nonempty subintervals B00, B02, B20, and B22 with respective masses m0·m0, m0·m2, m2·m0, and m2·m2. Continuing the procedure defines a set of addresses that exhibits a wide spectrum of local scaling behaviors.
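A minimal sketch of such a cascade used to draw one address at a time; the parameters delta (the discarded middle fraction), m0, and the recursion depth are illustrative values, not figures taken from the thesis:

import random

def cascade_address(delta=0.3, m0=0.6, levels=16, addr_bits=32):
    """Draw one address from a multiplicative cascade on [0, 1):
    at each level the middle fraction `delta` of the current interval is
    discarded (mass 0), the left part keeps mass m0 and the right part 1 - m0.
    The surviving point is finally quantized to an addr_bits-bit address."""
    lo, hi = 0.0, 1.0
    for _ in range(levels):
        width = hi - lo
        side = (1.0 - delta) / 2.0 * width        # width of each kept side
        if random.random() < m0:
            hi = lo + side                         # descend into the left part
        else:
            lo = hi - side                         # descend into the right part
    point = lo + random.random() * (hi - lo)
    return int(point * (1 << addr_bits))

random.seed(7)
print([hex(cascade_address()) for _ in range(3)])

Addresses drawn this way concentrate on a Cantor-like subset of the address space, which is what produces the multifractal structure.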
Using the previously calculated number of IP addresses, number of source/destination address pairs, client/server distribution, and TCP port frequency, we generate a sequence of anonymized 4-tuples.
The decompression algorithm sets up a linked list to temporarily store the sequence of decompressed packets. It works by reading the compressed datasets. As we have seen, the compressed dataset stores, for each flow, the time-stamp of the first packet, the number of packets, a pointer to the template dataset, and the distance to a previous flow with an identical source and destination pair. The template dataset stores the information necessary to reproduce important packet flow characteristics such as inter-packet time, TCP flag sequence, and packet size.
The algorithm starts by reading the compressed dataset. Note that this dataset is sorted by the time-stamp field. For each flow, a TTL field is generated using the previously calculated distribution. Reading the cluster identifier, the algorithm identifies the corresponding template. The algorithm then reads the template values and decodes the inter-packet time, interface, version, IHL, TOS, Length, Flags, FragOFF, Protocol, DataOFF, and Control Flags. For each decompressed packet, the algorithm inserts a new node into the linked list, sorted by time-stamp. Furthermore, the source and destination IP addresses and the source and destination port numbers are assigned. If the record points to a previous address, we simply copy the same 4-tuple; if not, we take a new 4-tuple from the previously generated list. The Sequence and Acknowledgement numbers are generated based on the Total Length. For the Identification header field we generate an initial random value and sequential values for the subsequent packets. After reading the last value of the template, the algorithm continues the procedure by reading the next record in the compressed dataset. At this moment, all nodes in the linked list with a time-stamp less than the current value are written to the decompressed file.
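The loop below is a compressed outline of this procedure, with simplified record and template layouts (the tuple shapes, field names and the external 4-tuple generator are ours; TTL, sequence numbers and the other per-packet fields are omitted):

import heapq
from itertools import count

def decompress(compressed_records, templates, tuple_generator):
    """Rebuild a packet sequence from per-flow records (simplified layout).
    record   = (first_timestamp, num_packets, cluster_id, prev_pair_distance)
    template = list of (inter_packet_time, header_fields) per packet."""
    heap, seq = [], count()
    emitted_pairs = []                       # 4-tuples of already rebuilt flows
    for ts, n_pkts, cluster_id, prev_dist in compressed_records:
        template = templates[cluster_id]
        if 0 < prev_dist <= len(emitted_pairs):
            four_tuple = emitted_pairs[-prev_dist]   # reuse an earlier address pair
        else:
            four_tuple = next(tuple_generator)       # draw a fresh anonymized 4-tuple
        emitted_pairs.append(four_tuple)
        t = ts
        for i in range(n_pkts):
            # reuse the last template entry if the flow is longer than the template
            ipt, fields = template[min(i, len(template) - 1)]
            t += ipt
            heapq.heappush(heap, (t, next(seq), four_tuple, fields))
    # emit packets globally ordered by timestamp
    return [heapq.heappop(heap) for _ in range(len(heap))]

Keeping the rebuilt packets in a heap keyed by timestamp plays the role of the sorted linked list described above.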
7.3 Compression Ratio
To study the efficiency of the proposed compression method, we compare the compression ratio of different compression methods for large packet traces. The measurements were taken from a TSH (Time Sequence Header) trace file, and the compression methods evaluated were GZIP [47], the Van Jacobson method, and the method proposed in [61].
In the proposed compression method, the 48 bits (6 bytes) stored for the first packet of a flow are sufficient to represent each flow of m packets. Some data structures with information related to the clusters of flows are also needed; however, these additional data structures are almost constant with the packet trace length. Since each packet occupies 44 bytes in the TSH format, the compression ratio for m-packet flows is given by:

$$ r_m = \frac{6}{44\,m} \qquad (7.1) $$

obtaining thus a total compression ratio of:

$$ R = \sum_m P_m \, r_m \qquad (7.2) $$

where P_m is the fraction of packets carried by m-packet flows (second column of Table 7.1).
Table 7.1 shows, for each of the m-packet flow lengths, the corresponding compression ratio (third column), and Figure 7.1 depicts them graphically. Multiplying each m-packet compression ratio (third column) by its frequency (second column) we obtain the relative trace compression ratio (fourth column). The summation of all components produces the total trace compression ratio; as we can see, this value is around 3.5%. The curve in Figure 7.2 shows how it accumulates.
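The total value in Table 7.1 is simply the frequency-weighted sum of the per-length ratios; a quick check using the first four rows of the table, with r_m = 6/(44·m) as in Eq. (7.1):

# Frequencies P_m for m = 1..4 (Table 7.1) and per-flow ratios r_m = 6 / (44 * m)
P = {1: 0.095545617, 2: 0.070475923, 3: 0.063516576, 4: 0.052245242}
r = {m: 6.0 / (44.0 * m) for m in P}
partial_total = sum(P[m] * r[m] for m in P)
print(round(partial_total, 6))   # ~0.0225; summing all rows of Table 7.1 gives ~0.035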
Table 7.1: Flow distribution

m-packet flow   P_m (fraction of packets)   r_m (compression ratio)   P_m x r_m
1               0.095545617                 0.136363636               0.013028948
2               0.070475923                 0.068181818               0.004805177
3               0.063516576                 0.045454545               0.002887117
4               0.052245242                 0.034090909               0.001781088
5               0.195583723                 0.027272727               0.005334102
6               0.163133511                 0.022727273               0.003707580
7               0.077876497                 0.019480519               0.001517075
8               0.039830729                 0.017045455               0.000678933
9               0.028338782                 0.015151515               0.000429375
10              0.021940996                 0.013636364               0.000299195
11              0.016285273                 0.012396694               0.000201884
12              0.012875795                 0.011363636               0.000146316
13              0.010408937                 0.010489510               0.000109185
14              0.008343194                 0.009740260               0.000081264
15              0.007761577                 0.009090909               0.000070559
16              0.006718678                 0.008522727               0.000057261
17              0.007220072                 0.008021390               0.000057915
18              0.005655723                 0.007575758               0.000042846
19              0.005214496                 0.007177033               0.000037424
20              0.003910872                 0.006818182               0.000026665
150             0.107117787                 0.000909091               0.000097379
Total                                                                 0.03539729

Figure 7.1: Flow compression ratio per m-packet flow (compression ratio vs. packets per flow, log scale).

Figure 7.2: Cumulative compression ratio of the proposed compression algorithm vs. packets per flow (log scale).

The GZIP application, and also ZIP and ZLIB [48], use the deflate algorithm. For different TSH file sizes, the compressed file size obtained using the GZIP application is 50% of the original TSH file size (see Figure 7.3). For the Van Jacobson method we have seen in the last chapter that the compression ratio is around 30%.
Figure 7.3 shows the file size of the original trace and the corresponding file sizes for the three compression methods under analysis.
Figure 7.3: File size comparison: compressed file size (MBytes) vs. uncompressed file size (MBytes) for the GZIP, Van Jacobson, and proposed methods.
7.4 Comparative Packet Trace Characteristics
In this section we validate how effective our lossy compression method is. We have compared packet trace properties of the original trace against the decompressed trace. The results demonstrate that the decompressed trace reproduces good approximations of the original trace. The properties under analysis were: self-similarity, spatial and temporal locality, and IP address structure.
7.4.1 Self-Similarity
To capture long-range dependence we employed the statistics of self-similarity. Self-similarity means that the statistical properties of a stochastic process do not change across aggregation levels; that is, the process looks the same if one zooms in and out in time [121]. The Hurst parameter H expresses the degree of self-similarity; larger values indicate stronger self-similarity. If H lies in (0.5, 1), the process is called long-range dependent (LRD).
There are three common methods to determine self-similarity and estimate the parameter H of a given set of data: variance-time plots, the R/S (rescaled adjusted range) statistic, and frequency-domain methods (periodogram and Whittle's estimator).
To estimate the Hurst parameter, we have used the R/S plot method. This method is useful for providing a single estimate of H and results in an X-Y plot. Linearity in the plot is evidence of long-range dependence in the underlying series, and the slope of the line in each case can be used to estimate H. Plots of
the graphical estimators for the decompressed and the original RedIRIS trace are
shown in Figure 7.4.
The slopes of the upper curves yield estimates of H = 0.71 and H = 0.73 for the decompressed and original traces, respectively. In addition to these two traces, we also estimated the Hurst parameter for a trace with exponential inter-packet times. The graphical result for this trace yields H = 0.52, indicative of no long-range dependence in this series. These results are shown in Table 7.4.1.

Table 7.4.1: Hurst parameter estimators

Decompressed   RedIRIS   Exponential
0.71           0.73      0.52
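For reference, a compact sketch of the standard R/S procedure used to produce such estimates (the block sizes and the white-noise test series are illustrative; this is not the exact estimator code used in the thesis):

import numpy as np

def rs_statistic(x):
    """Rescaled adjusted range R/S of a series x."""
    y = np.cumsum(x - x.mean())
    r = y.max() - y.min()
    s = x.std()
    return r / s if s > 0 else np.nan

def hurst_rs(series, block_sizes=(16, 32, 64, 128, 256, 512)):
    """Estimate H as the slope of log(R/S) versus log(block size)."""
    series = np.asarray(series, dtype=float)
    log_d, log_rs = [], []
    for d in block_sizes:
        blocks = [series[i:i + d] for i in range(0, len(series) - d + 1, d)]
        rs = np.nanmean([rs_statistic(b) for b in blocks])
        log_d.append(np.log10(d))
        log_rs.append(np.log10(rs))
    slope, _ = np.polyfit(log_d, log_rs, 1)
    return slope

rng = np.random.default_rng(0)
print(round(hurst_rs(rng.normal(size=10000)), 2))   # i.i.d. noise should give H near 0.5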
7.4.2 Spatial locality
The existence of spatial locality of reference can be established by comparing the total number of unique sequences observed in the decompressed trace with the total number of unique sequences found in the original trace and in a trace with a random permutation of addresses [7]. Note that a random permutation destroys any spatial locality of reference that may exist in a trace by decorrelating the sequence of references. If references are indeed correlated, then
one would expect the total number of unique sequences observed in the trace to be less than the total number of unique sequences observed in a random trace.

Figure 7.4: R/S plots (log10(r/s) vs. log10(d)) for the original RedIRIS trace, the decompressed trace, and the exponential trace.
Figure 7.5 shows the total number of unique destination addresses (Y axis) as a function of trace length N (X axis). Three cases are shown. The top curve shows the total number of addresses observed in a random trace. The middle and bottom curves show the total number of unique addresses observed in our decompressed trace and in the original trace, respectively. The three curves show an increase in the total number of unique addresses as N increases, with the randomized trace showing the steepest increase and the other two curves showing a similar, slower increase. These results confirm a good approximation between the decompressed and original traces.
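The comparison can be reproduced in a few lines by counting unique reference pairs in a stream and in a random permutation of it (the toy stream below is illustrative; the thesis uses the decompressed, original, and randomized traces):

import random

def unique_bigrams(refs):
    """Number of distinct (previous, current) reference pairs in a stream."""
    return len(set(zip(refs, refs[1:])))

trace = list('aaabbbccc' * 20)        # a stream with strong spatial locality
shuffled = trace[:]
random.seed(0)
random.shuffle(shuffled)              # permutation destroys the correlation
print(unique_bigrams(trace), unique_bigrams(shuffled))
# the correlated stream yields fewer unique pairs than its random permutation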
7.4.3 Temporal Locality
Temporal locality implies that recently accessed addresses are more likely to
be referenced in the near future [7]. Cache miss ratio is one of the most important
parameters in characterizing the temporal locality.
In our experiments, we have used the decompressed trace, the original trace
and a random trace to drive a routing table lookup algorithm observing the cache
miss ratio for different cache sizes. Note that in this simulation study we are
concerned with the data cache only. This makes sense from the viewpoint of IP
address caching at a router, where the program itself is short and can be entirely
stored in any reasonable instruction cache, while the implementation of the data
structures for the address cache is a primary concern.
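The miss-ratio measurement itself reduces to driving an LRU cache of destination addresses with the packet stream; a toy sketch (the cache sizes and the five-packet trace are illustrative):

from collections import OrderedDict

def lru_miss_ratio(destinations, cache_size):
    """Fraction of lookups that miss an LRU cache of destination addresses."""
    cache, misses = OrderedDict(), 0
    for dst in destinations:
        if dst in cache:
            cache.move_to_end(dst)            # hit: refresh recency
        else:
            misses += 1
            cache[dst] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)     # evict the least recently used entry
    return misses / len(destinations)

trace = ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.3', '10.0.0.1']
for size in (1, 2, 3):
    print(size, lru_miss_ratio(trace, size))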
Figure 7.5: Total number of unique destination addresses observed vs. number of packets (k) for the random trace, the decompressed trace, and the original RedIRIS trace.

Figure 7.6 shows how temporal locality (measured by miss rate) varies
across the traces. We see that the three curves show a decrease in the miss ratio as cache size increases, with the randomized trace showing the slowest decrease and the decompressed and original traces showing similar decreases. The miss ratios of the random trace (top curve) are consistently higher than those of both the decompressed and the original traces, confirming again a good approximation between the two traces.
7.4.4 IP Address Structure
This section compares the structural characteristics of the destination IP addresses seen in the RedIRIS and decompressed traces. These characteristics may have implications for algorithms that deal with IP address aggregates, such as routing lookup and congestion control. In [75] it was investigated how a conglomerate's packets are distributed among its component addresses, and how those addresses aggregate.
As we have seen in chapter 2, if address structure were fractal, N_p would appear as a straight line with slope D when plotted on a log scale as a function of the prefix length p. Figure 7.7 shows, for the RedIRIS trace, a log plot of N_p as a function of p; we found that, for a reasonable middle region of prefix lengths, the N_p curves do appear linear on a log-scale plot. In this case, the fractal dimension D is equal to 0.625. For the decompressed trace (Figure 7.7), the fractal dimension obtained is sufficiently close to that of the original trace.
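A short sketch of the N_p computation and the slope fit (the sampled addresses and the prefix-length region are illustrative; the thesis fits the middle region of the RedIRIS and decompressed traces):

import math
import random

def fractal_dimension(addresses, p_range=range(8, 17)):
    """Estimate the fractal dimension D from the growth of N_p, the number of
    distinct p-bit prefixes of 32-bit addresses, over a region of prefix
    lengths where log2(N_p) grows roughly linearly with p."""
    ys = []
    for p in p_range:
        prefixes = {addr >> (32 - p) for addr in addresses}
        ys.append(math.log2(len(prefixes)))
    # least-squares slope of log2(N_p) against p
    n = len(ys)
    mean_p = sum(p_range) / n
    mean_y = sum(ys) / n
    num = sum((p - mean_p) * (y - mean_y) for p, y in zip(p_range, ys))
    den = sum((p - mean_p) ** 2 for p in p_range)
    return num / den

random.seed(3)
addrs = [random.getrandbits(32) for _ in range(1000)]
# uniform random addresses: D near 1 over small p, before N_p saturates
print(round(fractal_dimension(addrs, p_range=range(1, 10)), 2))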
To test whether a data set is consistent with the properties of multifractals, we use the Histogram Method to examine its multifractal spectrum [96]. Figure 7.8 plots the spectrum of the original and decompressed traces. Again, we see that both are very similar.

Figure 7.6: Cache miss rate vs. cache size (k) for the random trace, the decompressed trace, and the original RedIRIS trace.

Figure 7.7: N_p as a function of prefix length p for the original RedIRIS and decompressed traces.
Figure 7.8: Multifractal spectrum of the original and decompressed traces (scaling exponent on the X axis).
7.5 Memory Performance Validation
The compression method studied in this chapter achieves a high compression rate. However, it is not able to recover the exact compressed packet trace. In this section we study whether the recovered trace is suitable for studies focusing on memory access characteristics. The results presented in this section do not cover the entire set of possible network benchmarks, but they clearly show that the recovered trace exhibits behavior close to that of the uncompressed packet trace.
We have applied three benchmark programs taken from the Netbench [82] and Commbench [122] suites. The selected programs were: Route (Netbench), NAT (Netbench), and RTR (Commbench). All the selected programs involve Radix Tree Routing in their algorithms. The Radix Tree is a binary tree which, starting at the root, stores the prefix address and mask seen so far. Moving down the tree, more bits are matched along one branch; if they do not match, the other branch holds the required entry. This sort of data structure can result in efficient average performance for forwarding table lookup times, on the order of
ln (number of entries), which for large routing tables is quite a gain. The returned
value from looking up an entry will typically be the next hop IP router.
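A minimal binary-trie longest-prefix lookup of the kind these benchmarks perform is sketched below; this is an illustration of the data structure, not the Netbench/Commbench code itself:

class TrieNode:
    __slots__ = ('children', 'next_hop')
    def __init__(self):
        self.children = [None, None]   # branch on one address bit at a time
        self.next_hop = None           # set when a stored prefix ends at this node

def insert(root, prefix, plen, next_hop):
    node = root
    for i in range(plen):
        bit = (prefix >> (31 - i)) & 1
        if node.children[bit] is None:
            node.children[bit] = TrieNode()
        node = node.children[bit]
    node.next_hop = next_hop

def lookup(root, addr):
    """Longest-prefix match: remember the last next_hop seen along the path."""
    node, best = root, None
    for i in range(32):
        if node.next_hop is not None:
            best = node.next_hop
        node = node.children[(addr >> (31 - i)) & 1]
        if node is None:
            return best
    return node.next_hop or best

root = TrieNode()
insert(root, 0x0A000000, 8, 'hop-A')     # 10.0.0.0/8
insert(root, 0x0A010000, 16, 'hop-B')    # 10.1.0.0/16
print(lookup(root, 0x0A010203))          # -> hop-B (longest match wins)
print(lookup(root, 0x0A7F0001))          # -> hop-A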
The Radix Tree code was instrumented using the ATOM tool [108]. In order to delimit the processing of packets, checkpoints were placed at the beginning and at the end of the packet processing. The instrumented code records the number of memory accesses performed for each packet. At the end of the traffic trace processing, a list containing the total number of memory accesses per packet is generated.
7.5.1 Memory Access Measurements
In our experiments, we have used four different traces. The first trace is a subset of the original RedIRIS trace, containing only Web flows; henceforth, we will refer to it as the Original trace. The second one is the decompressed trace, obtained after applying our proposed compression/decompression method to the Original trace. A third trace was generated by assigning random destination IP addresses while maintaining the same temporal distribution as the Original trace. Finally, for the last trace, the IP addresses were generated by a multiplicative process and replayed using an LRU stack model with an exponential inter-packet time distribution.
Figure 7.9 plots the cumulative traffic (Y axis) against the number of memory accesses (X axis) when executing the Radix Tree Routing algorithm for the four traces. We observe that the Original and the Decompressed traces show similar behavior while the other traces exhibit different shapes. We can see, for instance, that approximately 55% of the traffic from the Original and Decompressed traces performs between 53 and 67 memory accesses. In contrast, the random trace shows that only 30% of its traffic performs between 53 and 62 memory accesses, and the fractal trace, for this same range of memory accesses, accounts for approximately 27% of the traffic. Furthermore, we observe that for the Original and Decompressed traces, between 53 and 92 accesses correspond to 60% of the traffic, whereas for the Random trace 70% of the traffic performs between 53 and 88 memory accesses, and the fractal trace performs between 53 and 96 memory accesses for 37% of the traffic. These divergences are due to the fact that the number of visited nodes differs.
7.5.2 Cache Miss Rate
In Figure 7.10, for the same Radix Tree algorithm, we show the cumulative traffic (Y axis) against the cache miss rate (X axis). Here, again, we observe a strong similarity between the Original and the Decompressed traces, but in this case the fractal trace also behaves similarly, while the random trace shows no agreement with the Original trace. In the graph we can see that about 60% of the packets from the Original and Decompressed traces show a cache miss rate lower than 5%, which corresponds to sequences of packets with very similar behavior. In contrast, for this same range, only around 10% of the packets from the Random trace qualify. For a cache miss ratio ranging from 5% to 10%, we observe the inverse behavior, with 50% of the packets from the random trace falling into this range and only 10% of the packets from the Original and Decompressed traces. In our opinion, the differences between the Original and random traces are due to the fact that in one trace memory needs to be released, whereas in the other trace memory is still available.

Figure 7.9: Memory accesses per packet for the traces: original RedIRIS, the random address, the fractal address, and the decompressed trace generated by our method.
Figure 7.10: Cache miss rate for the traces: original RedIRIS, the random address, the fractal address, and the decompressed trace generated by our method.
Chapter 8
Trace Classification
Until recently, routing of packets has involved determining the outgoing link based on the destination address and then transferring packet data to the appropriate link interface. Destination-based packet forwarding treats all packets going to the same destination address identically, providing only best-effort service (servicing packets in a first-come-first-served manner). However, routers are now called upon to provide additional functionalities such as security, billing, accounting and different qualities of service to different applications. To provide these additional functionalities, complex packet analyses involving multiple header fields are performed as part of packet processing. Thus, a better understanding and classification of the TCP/IP packet header fields constitutes an important activity.
8.1. Packet Classification
Traditional routers do not provide service differentiation because they treat all traffic going to a particular Internet address in the same way. However, users are demanding a more discriminating form of router forwarding, called service differentiation. The process of mapping packets to different service classes is referred to as packet classification. Packet classification is important for applications such as firewalls, intrusion detection, differentiated services, VPN implementations, QoS applications, and server load balancing (as shown in Table 8.1).
Table 8.1: Packet classification examples

Layer   Application
Two     Switching, MPLS
Three   Forwarding
Four    Flow Identification, IntServ
Four    Filtering, DiffServ
Seven   Load Balancing
Seven   Intrusion Detection
Packet classification routers have a database of rules. The rules are explicitly
ordered by a network manager (or protocol) that creates the rule database. Thus
when a packet arrives at a router, the router must find a rule that matches the
packet headers; if more than one match is found, the first matching rule is applied.
Packet classification involves selecting header fields from packets, such as source and destination addresses, source and destination port numbers, protocol, or even parts of a URL, and then finding the best packet classification rule (also called a filtering rule or filter) to determine the action to be taken on the packet. Each packet classification rule consists of a prefix (or range of values) for each possible header field, which matches a subset of packets.
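As an illustration of first-matching-rule semantics, a toy classifier over a small ordered rule set (the rules, field layout, and default action are ours):

from ipaddress import ip_network, ip_address

# Each rule: (src prefix, dst prefix, dst port range, protocol or None, action)
RULES = [
    (ip_network('10.0.0.0/8'), ip_network('0.0.0.0/0'), (80, 80),   'tcp', 'allow'),
    (ip_network('0.0.0.0/0'),  ip_network('0.0.0.0/0'), (0, 65535), 'udp', 'deny'),
    (ip_network('0.0.0.0/0'),  ip_network('0.0.0.0/0'), (0, 65535), None,  'deny'),
]

def classify(src, dst, dport, proto):
    """Return the action of the first rule that matches all fields."""
    for src_net, dst_net, (lo, hi), rule_proto, action in RULES:
        if (ip_address(src) in src_net and ip_address(dst) in dst_net
                and lo <= dport <= hi
                and (rule_proto is None or rule_proto == proto)):
            return action
    return 'deny'      # default action when no rule matches

print(classify('10.1.2.3', '192.0.2.7', 80, 'tcp'))    # -> allow
print(classify('172.16.0.1', '192.0.2.7', 53, 'udp'))  # -> deny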
The requirements for packet classification can vary widely depending on the
application and where packet classification is performed in the network:
- Resource limitations: packet classification solutions can trade off time to perform the classification per packet against memory used. At large corporate campuses, access speeds may range from medium speeds of T3 and OC-3 to top speeds of OC-12 and above. At inter-ISP boundaries, the access speeds will be OC-12, OC-48, and above. Residential customers have access speeds of T1 (DSL) or less. Solutions should achieve the required target access speed while minimizing the amount of memory used.
- Number of rules to be supported: packet classification applications differ in the number of rules that are specified. Today, typical firewalls may specify a few hundred rules, while an access/backbone router may have a few hundred thousand rules; these rules are expected to scale up with enhanced services and router throughput and may reach millions of rules.
- Number of fields used: packet classification applications differ in the number of fields (dimensions) of the IP header that are used for classification. Current routers use one field (the destination IP address). Firewalls and other access list applications may use a few more fields [53].
- Nature of rules: current routers use rules with a prefix mask on destination IP addresses. However, more general masks such as arbitrary ranges are expected to become permissible. Packet classification solutions need to accommodate such general specifications.
- Updating the set of rules: the number of changes to the rules, due either to a route or to a policy change, is moderate to small compared with the number of packets that an application, e.g., a router, needs to classify in the same time period. Packet classification solutions must adapt gracefully and quickly to such updates without sacrificing access performance. Rebuilding major parts of the data structure for every update is prohibitive.
- Worst case vs. average case: there is a widely held view that for the access time performance of packet classification, one must focus on the worst case rather than the average case [77].
Many of these requirements have been articulated in the extensive collection
of papers that have addressed the packet classification problem [3], [52], [53],
[77], [78], [106], [107].
8.2. Flow Classification
Although the proposed solutions vary greatly, two different approaches exist to provide QoS (Quality of Service): one speeds up the forwarding or the route look-up procedure, and the other requires routers to classify packets based on the information carried in the packet headers, optimizing the amount of work that needs to be done when routing decisions are made [86].
The first set of solutions either speeds up the forwarding procedure by implementing the forwarding engine in hardware instead of software or optimizes the route look-up procedure through algorithm development. These solutions are referred to as Gigabit-routing solutions and they tend to move the forwarding procedure to specialized components, speeding up the process of sending packets to the Gigabit/s level [87], [114], [109]. Various algorithms for speeding up the route look-up process have also emerged and gained wide interest; these algorithms usually utilize different and improved binary search methods [33], [64]. However, with the development of new broadband technologies, and thus more available bandwidth, the limits of hardware-based routing or fast route look-ups will eventually be reached.
The second set of solutions aims to decrease the workload of routers by assigning long-lived packet flows to dedicated connections, giving these flows the possibility of a better service level than can be achieved over the default (routed) connection. This approach consists of optimizing the amount of work that needs to be done when routing decisions are made [86] and is known as integrated Internet routing and switching, or the IP switching solution [87], [5], [31], [88], [113].
Routing decisions can then be made for only a fraction of the total packets in a flow, thus reducing the total workload of the router. Doing just one route look-up for a series of packets and then forwarding the rest of the packets at OSI layer 2 is an attractive approach compared with the burden of performing the route look-up and forwarding at OSI layer 3 as many times as there are packets in the traffic flow.
Flow classification is one of the key issues in IP switching. An IP switch must perform flow classification to decide whether a flow should be switched directly or forwarded hop-by-hop by the routing software. This is implemented by inspecting the values of the packet header fields and making a decision based upon a local policy.
One flow classification policy, the protocol-based policy [80], is simply to classify flows by protocol. With this policy, all TCP flows are selected for switching while all UDP flows are forwarded hop-by-hop by the routing software. The argument is that connection-oriented services are longer and have more packets to be sent over a short time than connectionless services. Similarly, flows can also be classified by application, such as ftp, smtp, and http [80]. Only those applications that tend to generate long-lived flows and contain a large number of packets are selected for switching.
8.3. Packet Trace Classifier
Our work is related to the second set of solutions. In this chapter we describe a trace classifier. The idea of this trace classifier is to analyze how similar traces collected from different links are and to provide information about the traces that can be helpful for implementing packet classification rules or for choosing the most appropriate flow classification scheme to be used in traffic-controlled IP switching. To this end, we have developed a visual tool based on a spectrum of colors to easily compare different traces. After describing the trace classification method, we discuss how this classification method can be used in both packet and flow classification.
In this section we propose a three-step methodology to identify how similar traces collected from different places are and how different applications (e.g. Web, P2P, FTP, etc.) are distributed within a trace. The idea behind this proposed classification is to offer an easy and efficient method for selecting different trace characteristics to be used for performance evaluation purposes. We have applied our methodology to traces captured from an OC-3 link (155 Mbps) that connects the Scientific Ring of Catalonia to RedIRIS (the Spanish National Research Network) [98], which comprises about 250 institutions. This non-sanitized trace is a collection of packets flowing in one direction of the link, each record containing a timestamp and the first 40 bytes of the packet. For our analysis, we have used only the output link. Furthermore, we surveyed traces downloaded from the NLANR Web site [89].
The first step is devoted to evaluating how packets are distributed among the m-packet flows. Figure 8.1 shows, for four different traces, the percentage of packets (Y axis) placed in different m-packet flows (X axis). The first trace was collected in 1993 and shows a high predominance of FTP traffic. The RedIRIS and Memphis University traces show a predominance of Web traffic, but with the presence of P2P traffic. The last trace, captured at Columbia University, shows a high predominance of Web traffic. Based on this first step alone, we can see that two of them (RedIRIS and Memphis) show similar behavior while the others have different distributions. From this first step, we have concluded that the trace collected in 1993 is very different from the others, which led us to exclude it from the subsequent analysis.
Using the flow characterization described in section 2, in a high-speed link we can potentially find a large variety of flows. However, looking into the flows, we can see that they are not very different from each other. The second step is devoted to studying the variety among flows. To do that, we have used an approach based on clustering, a classical technique used for workload characterization [67]. The basic idea of clustering is to partition the components into groups so that the members of a group are as similar as possible and different groups are as dissimilar as possible. From the set of flow vectors, we calculate the Euclidean distance between them and store the results in a distance matrix of flows. Initially, each vector represents a cluster. For each m ranging from 2 to 13, we apply the clustering method separately. Figure 8.2 shows, for three traces, the number of different clusters (Y axis) for each one of the m-packet flows (X axis).
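A compressed sketch of this grouping step: a greedy variant of the clustering in which a flow vector joins the first cluster whose representative lies within a distance threshold (threshold 0 groups only identical header sequences; the vectors are illustrative):

import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_flows(vectors, threshold=0.0):
    """Greedy clustering: a vector joins the first existing cluster whose
    representative is within `threshold`; otherwise it starts a new cluster."""
    clusters = []                       # list of (representative, members)
    for v in vectors:
        for rep, members in clusters:
            if euclidean(v, rep) <= threshold:
                members.append(v)
                break
        else:
            clusters.append((v, [v]))
    return clusters

flows = [[40, 40, 1500], [40, 40, 1500], [40, 1500, 1500]]
print(len(cluster_flows(flows)))        # -> 2 distinct clusters at distance 0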
Figure 8.1: Packet distribution: percentage of packets vs. m-packet flows for the 1993, RedIRIS, Memphis University, and Columbia University traces.
Figure 8.2: Number of clusters vs. m-packet flows for the RedIRIS, Memphis University, and Columbia University traces.
Taking the percentage of packets and the number of clusters per m-packet flow and applying a triangle-based cubic interpolation to create uniformly spaced grid data, we display each trace (RedIRIS, Memphis University, Columbia University) as a surface plot (Figures 8.3, 8.4 and 8.5). Looking at the shape of each figure, we can see clearly that the RedIRIS and Memphis traces show some similarities and that the shape of the Columbia trace is totally different.
The third step analyzes, for each one of the m-packet flows, how the packets are distributed among their clusters. Using the technique of a spectrum of colors, we have represented in Figure 8.6 the spectrum for m = 5. From each trace, we have selected the most representative clusters. In Figure 8.6, each bar represents a trace and each color on the bar represents the percentage of flows that fit a given cluster. Plotting the spectrum of the three traces under analysis, we can see that the spectra of the RedIRIS and Memphis traces are similar while the Columbia trace shows a different spectrum.
After concluding these three steps, we are able to identify with a high level of precision how semantically similar different traces are. Ongoing work has been devoted to identifying the types of applications present in each analyzed trace (e.g. Web, P2P, etc.).
Figure 8.3: RedIRIS trace - 3D shaping (number of clusters vs. m-packet flows vs. percentage of packets).
Figure 8.4: Memphis trace - 3D shaping (number of clusters vs. m-packet flows vs. percentage of packets).
Figure 8.5: Columbia trace - 3D shaping (number of clusters vs. m-packet flows vs. percentage of packets).
Figure 8.6: Flow clustering spectrum for the RedIRIS, Memphis University, and Columbia University traces.
Chapter 9
Conclusions
This work makes five main contributions:
1) A novel traffic characterization that incorporates semantic characteristics of flows. By semantic characterization we mean the analysis of the traffic characteristics of the TCP/IP header contents. We have demonstrated that, behind the great number of flows in a high-speed link, there is not so much variety among them and they can clearly be grouped into a set of clusters. For flows with 5 packets, for instance, which represent 20% of the total, we can group them into only 142 types of flows, with 88% of the flows grouped into only four clusters. The evidence that Internet flows can be grouped into a small set of clusters led us to create templates of flows and an efficient method to compress and classify packet header traces.
2) Using the concept of entropy, we have studied the compression bound for packet header traces. We have seen that for some TCP/IP header fields the entropy is very low. Using a flow-level approach, supported by the chain rule for entropy, we have seen that the compression bound for TCP/IP header traces is around 13%.
3) A lossless packet header compression method based on TCP flow clustering and Huffman encoding. The combined algorithm applies the flow clustering technique to small flows and Huffman encoding to large flows. This approach significantly increases the compression ratio. We have seen that the flow clustering technique fits small flows well, where many flows are grouped into few templates; in these circumstances, the number of templates remains constant or shows small variations. The technique is based on semantic similarities among flows and TCP/IP functionalities. For large flows, we adopt another approach, based on Huffman encoding, which exploits the similarity between packets during the life of a connection. With our proposed method, the compression ratio achieved for .tsh packet header traces is around 16%, reducing the file size, for instance, from 100 MB to 16 MB. The compression proposed here is more efficient than the other methods considered and simple to implement: other known methods have their compression ratios bounded at 50% (GZIP) and 32% (Van Jacobson method), pointing out the effectiveness of our method.
4) A lossy compression method. In order to reach higher compression ratios, we have also proposed a methodology for lossy compression. With our proposed method, storage size requirements are reduced to 3% of the original size. Although this is a lossy compression method, analysis of the decompressed trace has demonstrated that, for a set of statistical properties, it represents a good approximation of the original traces. Furthermore, a memory performance evaluation was carried out with four types of traces, and the outcomes of the memory access and cache miss ratio measurements demonstrated that our proposed compression method is highly effective.
5) A packet trace classifier. We proposed a three-step methodology to identify how similar traces collected from different places are and how different applications (e.g. Web, P2P, FTP, etc.) are distributed within a trace.
As future work, we believe that using the flow clustering methodology presented here, together with other properties such as packet loss, retransmissions, etc., we can build a synthetic traffic generator.
Bibliography
[1] M. Acharya and B. Bhalla. A flow model for computer network traffic using
real-time measurements. in Second International Conference on Telecommunications Systems, Modeling and Analysis, March 1994.
[2] M. Acharya, R. Newman-Wolfe, H. Latchman, R. Chow, and B. Bhalla.
Real-time hierarchical traffic characterization of a campus area network.
in Proceedings of the Sixth International Conference on Modelling Techniques and Tools for Computer Performance Evaluation, 1992.
[3] H. Adiseshu, S. Suri, and G. Parulkar. Packet filter management for layer 4
switching. 1998.
[4] A. Agrawala and D. Sanghi. Network dynamics: an experimental study of
the Internet. in Proceedings of Globecom’92, December 1992.
[5] H. Ahmed, R. Callon, A. Malis, and J. Moy. IP switching for scalable IP
services. Proceedings of the IEEE, 85:1984–1997, December 1997.
[6] M. Aida and T. Abe. Pseudo-Address Generation Algorithm of Packet
Destinations for Internet Performance Simulation. IEEE INFOCOM 2001,
2001.
[7] V. Almeida, A. Bestavros, M. Crovella, and A. Oliveira. Characterizing
reference locality in the WWW. Proceedings of the Fourth International
Conference on Parallel and Distributed Information Systems (PDIS96), December 1996.
[8] D. Anick, D. Mitra, and M. Sondhi. Stochastic theory of a data-handling
system with multiple sources. Bell System Technical Journal, 1984.
[9] M. Arlitt and C. Williamson. Internet Web Servers: Workload Characterization and Performance Implications. IEEE/ACM Transactions on Networking, 5:815–826, October 1997.
[10] S. Ata, M. Murata, and H. Miyahara. Analysis of Network Traffic and Its
Application to Design of High-Speed Routers. IEICE Trans. Inf and Syst.,
5, May 2000.
[11] C. Barakat, P. Thiran, G. Iannaccone, C. Diot, and P. Owezarski. Modeling
Internet Backbone Traffic at the Flow Level. IEEE Transactions on Signal
Processing - Special Issue on Networking, 51, August 2003.
[12] P. Barford and M. Crovella. Generating Representative Web Workloads
for Network and Server Performance Evaluation. Proceedings of the 1998
ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 151–160, July 1998.
[13] J. Beran. Statistics for Long-Memory Processes. Chapman and Hall, New
York, NY, 1994.
[14] D. Boggs, J. Mogul, and C. Kent. Measured capacity of an Ethernet: Myths and reality. in Proceedings of ACM SIGCOMM'88, pages 222–234, August 1988.
[15] H. Braun and K. Claffy. Network analysis in support of Internet policy
requirements. in Proceedings of INET’93, June 1993.
[16] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web Caching and
Zipf-like Distributions: Evidence and Implications. In Proceedings of the
IEEE Infocom, April 1999.
[17] E. Brockmeyer, H.L. Halstrom, and A. Jensen. The Life and Works of A.
K. Erlang. Transactions of the Danish Academy of Technical Science ATS,
2, 1948.
[18] P. Brockwell and R. Davis. Time Series: Theory and Methods. Springer
Series in Statistics. Springer-Verlag, second edition, 1991.
[19] N. Brownlee and K. Claffy. Understanding Internet traffic streams: Dragonflies and tortoises. IEEE Communications Magazine, 40:110–117, October
2002.
[20] CAIDA.
The Cooperative Association for Internet Data Analysis.
http://www.caida.org.
[21] S. Casner and V. Jacobson. Compressing IP/UDP/RTP Headers for LowSpeed Serial Links. Internet Engineering Task Force, RFC-2508, February
1999.
[22] K. Claffy, H. Braun, and G. Polyzos. A Parameterizable Methodology for
Internet Traffic Flow Profiling. IEEE Journal on Selected Areas in Communications, 13, October 1995.
[23] K. Claffy, G. Polyzos, and H. Braun. Traffic characteristics of the T1
NSFNET backbone. in Proceedings of IEEE Infocom 93, pages 885–892,
1993.
[24] K. Claffy, G. Polyzos, and H.-W. Braun. Internet traffic flow profiling.
UCSD TR-CS93-328, SDCS GA-A21526, November 1993.
[25] K. Claffy, G. Potyzos, and H. Braun. Measurement considerations for assessing unidirectional latencies. Internetworking: Research and Experience, vol. 4:pp. 121–132, September 1993.
[26] Kimberly C. Claffy. Internet Traffic Characterization. Ph.D. Thesis, University of California, San Diego, 1994.
[27] T.M. Cover and J.A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc.
[28] M. Crovella and A. Bestavros. Self-Similarity in World Wide Web Traffic:
Evidence and Possible Causes. In SIGMETRICS96, pages 160–169, May
1996.
[29] C. Cunha, A. Bestavros, and M. Crovella. Characteristics of WWW Clientbased Traces. Technical Report BU-CS-95-010, Boston University, Computer Science Department, July 1995.
[30] P. Danzig, S. Jamin, R. Caceres, D.J. Mitzel, and D. Estrin. An empirical
workload model for driving wide-area TCP/IP network simulation. Internetworking: Research and Experience, vol. 3:no. 1, 1991.
[31] B. Davie, P. Doolan, and Y. Rekhter. Switching in IP Networks-IP Switching, Tag Switching and Related Technologies. Morgan kaufmann Publishers, 1998.
[32] DEFLATE. Compressed data format specification. Available at ftp://ds.internic.net/rfc/rfc1951.txt.
[33] M. Degermark, A. Brodnik, and S. Pink. Small Forwarding Tables for Fast
Routing Lockups. Luea University of Technology, 1997.
[34] M. Degermark, M. Engan, B. Nordgren, and S. Pink. Low-loss TCP/IP
Header Compression for Wireless Networks. Proc. MOBICOM, November
1996.
[35] M. Degermark, B. Nordgren, and S. Pink. IP Header Compression. Internet
Engineering Task Force, RFC-2507, February 1999.
[36] S. Deng. Empirical Model of WWW Document Arrivals at Access Link.
IEEE International Conference on Communication, june 1996.
[37] A.K. Erlang. The Theory of Probabilities and Telephone Conversations.
Nyt Tidsskrift Matematik, 20:33–39, 1909.
[38] A.K. Erlang. Solution of Some Problems in the Theory of Probabilities of
significance in Automatic Telephone Exchanges. Electrical Engineering
Journal, 10:189–197, 1917.
[39] D. Estrin and D. Mitzel. An assesment of state and lookup overhead in
routers. in Proceedings of IEEE Infocom 92, pages 2332–42, 1992.
[40] C. Bormann et al. RObust Header Compression ROHC: Framework and
four profiles: RTP, UDP, ESP, and uncompressed. Request for Comments
3095, July 2001.
[41] F. Arts et al. Network processor requirements and benchmarking. Computer Networks Journal. Special Issue: Network Processors, 41, April
2003.
[42] C.J.G. Evertsz and B.B. Mandelbrot. Multifractal measures. H.-O. Peitgen,
H. Jurgens and D. Saupe, editors, Chaos and Fractals: New Frontiers in
Science, Springer-Verlag, New York, 1992.
[43] D. Feldmeier. Improving gateway performance with a routing table cache.
in Proceedings of IEEE Infocom 88, pages 298–307, March 1988.
[44] S. Floyd and V. Jacobson. On traffic phase effects in packet-switched gateways. Internetworking: Research and Experience, vol. 3:pp. 115–156,
September 1992.
[45] S. Floyd and V. Jacobson. The synchronization of periodic routing messages. in Proceedings of ACM SIGCOMM’93, pages pp. 33–44, September
1993.
[46] R. Fonseca, V. Almeida, M. Crovella, and B. Abrahao. On the Intrinsic Locality Properties of Web Reference Streams. BUCS-TR-2002-022, August
2002.
[47] J.-L. Gailly and M. Adler. GZIP documentation and sources. ftp://prep.ai.mit.edu/pub/gnu/.
[48] J.-L. Gailly and M. Adler. ZLIB documentation and sources. ftp://ftp.uu.net/pub/archiving/zip/doc/.
[49] S. Glassman. A caching relay for the World Wide Web. In Proceedings of
the First International World Wide Web Conference, pages 69–76, 1994.
[50] N. Gulati, C. Williamson, and R. Bunt. Local area network locality: Characterization and application. in Proceedings of the first International Conference on LAN Interconnection, pages 233–250, October 1993.
[51] L. Guo and I. Matta. The war between mice and elephants. Technical Report BU-CS-2001-05 Boston University - Computer Science Department,
May 2001.
[52] P. Gupta, S. Lin, and N. McKeown. Routing lookups in hardware at memory access speeds. Proc. IEEE INFOCOM, San Francisco, California, page
1241, 1998.
[53] P. Gupta and N. McKeown. Packet classification on multiple fields. ACM
Computer Communication Review, 1999.
[54] D. Harte. Multifractals: Theory and Applications. Chapman Hall, 2001.
[55] J. F. Hayes. Modeling and Analysis of Computer Communications Networks. Plenum Publishing Corporation, New York, N.Y., 1984.
[56] H. Heffes and D. Lucantoni. A Markov modulated characterization of packetized voice and data traffic and related statistical multiplexer performance.
IEEE Journal on Selected Areas in Communications, vol. 4:pp. 856–868,
April 1986.
[57] S. Heimlich. Traffic characterization on the NSFNET National Backbone.
in Proceedings of the 1990 Winter USENIX Conference, December 1988.
[58] J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach. Morgan kaufmann Publishers, 1990.
[59] R. Holanda and J. Garcia. A Lossless Compression Method for Internet
Packet Headers. EuroNGI Conference on Next Generation Internet Networks - Traffic Engineering. Rome, Italy, April 2005.
[60] R. Holanda and J. Garcia. A new methodology for packet trace classification and compression based on semantic traffic characterization. ITC - 19th
International Teletraffic Congress. Beijing, China., August 2005.
[61] R. Holanda, J. Garcia, and V. Almeida. Flow Clustering: a New Approach
to Semantic Traffic Characterization. 12th Conference on Measuring, Modelling, and Evaluation of Computer and Communication Systems, Dresden
- Germany, September 2004.
[62] R. Holanda, J. Verdu, J. Garcia, and M. Valero. Performance Analysis of
a New Packet Trace Compressor based on TCP Flow Clustering. ISPASS
2005 - IEEE International Symposium on Performance Analysis of Systems
and Software. Austin, Texas, USA, March 2005.
[63] J.Y. Hui. Resource allocation for broadband networks. IEEE J. Selected
Areas in Comm., 6:1598–1608, 1988.
[64] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner. Scalable High Speed IP Routing Lookups. In Proceedings of ACM SIGCOMM 97, pages 25–36, 1997.
[65] Van Jacobson. Compressing TCP/IP Headers for Low-Speed Serial Links.
RFC-1144, February 1990.
[66] R. Jain. Characteristics of destination address locality in computer networks: a comparison of caching schemes. Computer networks and ISDN
systems, vol. 18:pp. 243–254, May 1990.
[67] R. Jain. The Art of Computer Systems Performance Analysis. John Wiley
& Sons, 1991.
[68] R. Jain and S. Routhier. Packet trains measurements and a new model for
computer network traffic. IEEE Journal on Selected Areas in Communications, pages 986–995, September 1986.
[69] S. Jin and A. Bestavros. Sources and Characteristics of Web Temporal
Locality. Proceedings of the 8th MASCOTS. IEEE Computer Society Press,
August 2000.
[70] T. Karagiannis, M. Molle, M. Faloutsos, and A. Broido. A Nonstationary
Poisson View of Internet Traffic. IEEE INFOCOM, 2004.
[71] L. Kleinrock. Queueing Systems, Volume 1: Theory. Wiley, 1975.
[72] L. Kleinrock and R. Gail. Queueing Systems: Problems and Solutions.
Wiley Interscience, 1996.
[73] S. Klivansky, A. Mukherjee, and C. Song. On Long Dependence in
NSFNET Traffic. http://www.cc.gatech.edu/, December 1994.
[74] D.E. Knuth. Dynamic Huffman coding. Journal of Algorithms, pages 163–
180, June 1985.
[75] E. Kohler, J. Li, V. Paxson, and S. Shenker. Observed structure of address in
IP traffic. Proceedings of the SIGCOMM Internet Measurements Workshop,
November 2002.
[76] B. Kumar. Effect of packet losses on end-user cost in internetworks with
usage based charging. Computer Communications Review, August 1993.
[77] T.V. Lakshman and D. Stiliadis. High-speed policy-based packet forwarding using efficient multi-dimensional range matching. ACM Computer
Communication Review, 28:203–214, September 1998.
[78] B. Lampson, V. Srinivasan, and G. Varghese. IP lookups using multiway
and multicolumn search. IEEE INFOCOM, page 1248, 1998.
[79] W. Leland, M. Taqqu, W. Willinger, and D. Wilson. On the self-similar nature of Ethernet traffic. in Proceedings of ACM SIGCOMM 93, September
1993.
[80] S. Lin and N. McKeown. A Simulation Study of IP Switching. ACM SIGCOMM'97, pages 15–24, 1997.
[81] B. B. Mandelbrot. Fractals, Form, Chance and Dimension. San Francisco,
CA, 1977.
[82] G. Memik, W.H. Mangione-Smith, and W. Hu. Netbench: A benchmarking
suite for network processors. IEEE International Conference ComputerAided Design-ICCA, 2001.
[83] J. Mogul. Network locality at the scale of processes. in Proceedings of
ACM SIGCOMM’91, pages 273–285, September 1991.
[84] J. Mogul. Observing TCP dynamics in real networks. in Proceedings of
ACM SIGCOMM’92, pages 305–317, August 1992.
[85] A. Mukherjee. On the dynamics and significance of low frequency components of Internet load. Technical Report, December 1992.
[86] P. Newman, G. Minshall, T. Lyon, and L. Huston. IP switching and gigabit
routers. IEEE Communications Magazine, pages 64–49, January 1997.
[87] P. Newman, G. Minshall, T. Lyon, and L. Huston. IP switching and gigabit
routers. IEEE Communications Magazine, pages 64–49, January 1997.
[88] P. Newman, G. Minshall, and T.L. Lyon. IP Switching - ATM under IP.
IEEE/ACM Transactions on Networking, 6:117–129, April 1998.
[89] NLANR. National Laboratory for Applied Network Research. http://www.nlanr.net.
[90] NLANR. Passive Measurement and Analysis: Site configuration and status. http://pma.nlanr.net/PMA/Sites/.
[91] R. Pang and V. Paxson. A High-Level Programming Environment for
Packet Trace Anonymization and Transformation. Proceedings of ACM
SIGCOMM Conference, August 2003.
[92] K. Park, G. Kim, and M. Crovella. On the effect of traffic self-similarity on
network performance. In Proceedings of SPIE International Conference on
Performance and control of Network Systems, November 1997.
[93] V. Paxson. Growth trends of wide area TCP conversations. IEEE Network,
1994.
[94] V. Paxson and S. Floyd. Wide-area traffic: the failure of Poisson modeling.
Technical Report Lawrence Berkeley Laboratory, February 1994.
[95] V. Paxson and S. Floyd. Wide Area Traffic: The Failure of Poisson Modeling. IEEE/ACM Transactions on Networking, vol. 3:226–244, June 1995.
[96] H. O. Peitgen, H. Jurgens, and D. Saupe. Chaos and Fractals. SpringerVerlag, 1992.
[97] COST 242 Project. Broadband Network Teletraffic-Final Report of Action.
Springer, 1996.
[98] RedIRIS. Spanish National Research Network. http://www.rediris.es.
[99] rfc791. Internet Protocol. DARPA Internet Program Protocol Specification,
September 1981.
[100] rfc793. Transmission Control Protocol. DARPA Internet Program Protocol
Specification, September 1981.
[101] R. Riedi. Introduction to multifractals. Technical Report, Rice University,
October 1999.
[102] S. Robert and J. Le Boudec. New models for self-similar traffic. Performance Evaluation, pages 57–68, 1997.
[103] V. Rutenburg and R. G. Ogier. Fair charging policies and minimumexpected-cost routing in internets with packet loss. in Proceedings of IEEE
Infocom 91, pages 279–288, April 1991.
[104] D. Sanghi, A. Agrawala, O. Gudmundson, and B. Jain. Experimental assesment of end-to-end behavior on the Internet. in Proceedings of IEEE
Infocom 93, pages 867–874, March 1993.
[105] A. Schmidt and R. Campbell. Internet protocol traffic analysis with applications for ATM switch design. Computer Communications Review, vol.
23:pp. 39–52, April 1993.
[106] V. Srinivasan, S. Suri, and G. Varghese. Packet classification using tuple
space search. ACM SIGCOMM’99, September 1999.
[107] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel. Fast and scalable
layer four switching. ACM SIGCOMM’98, September 1998.
[108] A. Srivastava and A. Eustace. ATOM - A system for building customized
program analysis tool. Programming Language Design and Implementation - PLDI, pages 196–205, June 1994.
[109] A. Tantaway, O. Koufopavlou, M. Zitterbart, and J. Abler. On the Design
of a Multigigabit IP Router. Journal of High Speed Networks, 1994.
[110] M. Taqqu, W. Willinger, and R. Sherman. Proof of a Fundamental Result
in Self-Similar Traffic Modeling. ACMCCR: Computer Communication
Review, 1997.
[111] tcpdump. The tcpdump program. ftp://ftp.ee.lbl.gov/tcpdump.tar.Z.
[112] TSH. TSH format. http://pma.nlanr.net/Traces/tsh.format.html.
[113] A. Viswanathan, N. Feldman, Z. Wang, and R. Callon. Evolution of multiprotocol label switching. IEEE Communications Magazine, 36:165–173,
May 1998.
[114] R.J. Walsh and C.M. Ozveren. The GIGA switch control processor. IEEE
Network, pages 36–43, February 1995.
[115] Z. Wang and J. Crowcroft. Eliminating periodic packet losses in the 4.3
Tahoe BSD TCP congestion contro algorithm. Computer Communications
Review, April 1992.
[116] C. Westphal. Improvements on IP Header Compression. GLOBECOM
2003 - IEEE Global Telecommunications Conference, 22:676–681, December 2003.
[117] C. Williamson. Internet Traffic Measurements. IEEE Internet Computing,
5:70–74, November 2001.
[118] W. Willinger. Variable-bit-rate video traffic and long-range dependence.
IEEE Transactions on Communication, 1994.
[119] W. Willinger, V. Paxson, and M. Taqqu. Self-similarity and heavy tails:
Structural modeling of network traffic. In Practical Guide to heavy Tails:
Statistical Techniques and Applications, Birkhauser Verlag, 1998.
[120] W. Willinger, M. Taqqu, R. Sherman, and D. Wilson. Self-similarity
through high-variability: Statistical analysis of Ethernet LAN traffic at the
source level. IEEE/ACM Transactions on Networking, 5(1):71–86, February 1997.
[121] W. Willinger, M.S. Taqqu, and A. Erramilli. A bibliographical guide to selfsimilar traffic and performance modeling for modern high-speed networks,
Stochastic Networks: Theory and applications. Royal Statistical Society
Lecture Notes Series, Oxford University Press, 4:339–366, 1996.
[122] T. Wolf and M.A. Franklin. CommBench - a telecommunications benchmark for network processors. Proc. of IEEE International Symposium on
Performance Analysis of Systems and Software - ISPASS, 2000.
[123] L. Zhang, S. Shenker, and D. Clark. Observations on the dynamics of a congestion control algorithm: The effects of two-way traffic. in Proceedings
of ACM SIGCOMM’91, pages 133–148, September 1991.
[124] J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23:337–343, 1977.