
NETDOOP – DISTRIBUTED PROTOCOL-AWARE NETWORK TRACE
ANALYZER
Jankiben Patel
B.E., Hemchandracharya North Gujarat University, India, 2005
PROJECT
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
COMPUTER SCIENCE
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
SPRING
2012
NETDOOP – DISTRIBUTED PROTOCOL-AWARE NETWORK TRACE
ANALYZER
A Project
by
Jankiben Patel
Approved by:
__________________________________, Committee Chair
Dr. Jinsong Ouyang
__________________________________, Second Reader
Dr. Chung-E Wang
____________________________
Date
Student: Jankiben Patel
I certify that this student has met the requirements for format contained in the University
format manual, and that this project is suitable for shelving in the Library and credit is to
be awarded for the Project.
__________________________, Graduate Coordinator
Dr. Nikrouz Faroughi
Department of Computer Science
________________
Date
Abstract
of
NETDOOP – DISTRIBUTED PROTOCOL-AWARE NETWORK TRACE
ANALYZER
by
Jankiben Patel
Modern data centers house hundreds to hundreds of thousands of servers. As the
complexity of data centers increases, it becomes very important to monitor the network
for capacity planning, performance analysis, and enforcing security. Sampling and
analyzing the network data helps data center administrators plan and tune for optimum
application performance. But as data centers grow in size, the sampled data sets become
very large. We study the application of the map-reduce model, a parallel
programming technique, to process and analyze these large network traces. Specifically,
we analyze the network traces for iSCSI performance and network statistics.
We design and implement a prototype of a protocol-aware network trace-processing tool
called Netdoop. This prototype functions as a reference design. We also implement a
farm of virtual servers and iSCSI targets to create an environment that represents a small
data center. We use this virtual environment to collect data sets and demonstrate the
scalability of the prototype. Further, these virtual servers are also used to host and run the
map-reduce framework.
Based on the performance and scalability of the tool that is developed, we draw
conclusions about the applicability of the map-reduce model for analyzing large network
traces.
_______________________, Committee Chair
Dr. Jinsong Ouyang
_______________________
Date
ACKNOWLEDGEMENTS
I wish to extend my sincere gratitude to my project guide, Dr. Jinsong Ouyang, for his
invaluable guidance, support, and timely feedback at every stage of my project research
and implementation. Dr. Ouyang has been an immensely supportive mentor
throughout my graduate studies. Without his support, I would not have completed this project.
Dr. Ouyang’s knowledge in the fields of computer networking protocols and distributed
systems helped me solve the complex tasks of this project. I take this opportunity
to thank Dr. Chung-E Wang, my second reader, for helping me draft my project report
and for providing his expertise and advice in the form of positive feedback.
I am highly grateful to Dr. Nikrouz Faroughi and Dr. Cui Zhang for reviewing my project
report and providing valuable suggestions.
I also wish to thank my manager, Sanjeev Datla, at Emulex Corporation for supporting me
as needed and providing valuable feedback.
TABLE OF CONTENTS

Acknowledgments
List of Tables
List of Figures

Chapter
1 INTRODUCTION
  1.1 Motivation
  1.2 Requirements
    1.2.1 Scalability
    1.2.2 Processing Time
    1.2.3 Operations
  1.3 Report Organization
2 BACKGROUND
  2.1 Network Traffic Analysis
    2.1.1 Sample Based Analysis
    2.1.2 Network Trace-based Analysis
  2.2 iSCSI Protocol
    2.2.1 Server I/O and Convergence
      2.2.1.1 Direct Attached Storage
      2.2.1.2 Shared Storage
      2.2.1.3 I/O Convergence
    2.2.2 iSCSI
      2.2.2.1 Overview
      2.2.2.2 iSCSI Definitions
      2.2.2.3 Flow Concepts
      2.2.2.4 iSCSI Protocol Headers
      2.2.2.5 Summary
  2.3 The Map-reduce Model
    2.3.1 Serial vs. Parallel Programming
    2.3.2 Map-reduce
3 METHODOLOGY
  3.1 Overview
  3.2 Design
  3.3 Framework Evaluation
  3.4 Implementation
  3.5 Prototyping Environment
  3.6 Analysis and Conclusion
4 DESIGN
  4.1 Assumptions
    4.1.1 Virtual Infrastructure
    4.1.2 Input Data
    4.1.3 User Queries
  4.2 Data Format
  4.3 Fundamental Architecture
  4.4 Framework Based Architecture
    4.4.1 Apache Hadoop – A Map-reduce Framework
      4.4.1.1 Typical Hadoop Workflow
      4.4.1.2 Fault Tolerance and Load Balancing
    4.4.2 Netdoop and Hadoop
      4.4.2.1 Netdoop Workflow
5 IMPLEMENTATION
  5.1 Prototype Environment
    5.1.1 Virtual Data Center
    5.1.2 Virtual iSCSI Environment
    5.1.3 Virtual Map-reduce Cluster
  5.2 Netdoop
    5.2.1 Structure
      5.2.1.1 mapper.py
      5.2.1.2 FlowTable.py
      5.2.1.3 IscsiFlow.py
      5.2.1.4 IscsiFlowEntry.py
      5.2.1.5 IscsiTask.py
      5.2.1.6 NetStatistics.py
      5.2.1.7 reducer.py
      5.2.1.8 ReducerFlowTable.py
      5.2.1.9 ReducerFlow.py
      5.2.1.10 ReducerNetStates.py
      5.2.1.11 web2netdoop.py
    5.2.2 Execution
      5.2.2.1 Copying the network trace to the cluster
      5.2.2.2 Submitting the network trace for processing
      5.2.2.3 Copying data from the cluster
      5.2.2.4 Web-based Interface
6 CONCLUSION AND FUTURE WORK
  6.1 Conclusion
  6.2 Future Work
Bibliography

LIST OF TABLES

Table 4.1 Fields in a Network Trace Record

LIST OF FIGURES

Figure 2.1 Netflow-enabled Device
Figure 2.2 Screenshot of Netflow Analyzer Software
Figure 2.3 Finisar Xgig Network Analyzer Appliance
Figure 2.4 Finisar Xgig Software
Figure 2.5 Wireshark - Open Source Network Analyzer
Figure 2.6 A Setup to Capture Network Traces
Figure 2.7 PCAP File Format
Figure 2.8 Server with Direct Attached Storage
Figure 2.9 Server Connected to a Storage Area Network
Figure 2.10 Server with a Converged Network Adapter
Figure 2.11 Server Directly Attached to a SCSI Disk
Figure 2.12 Servers Connected to Shared Storage Using iSCSI
Figure 2.13 iSCSI Initiator and Target Stacks
Figure 2.14 iSCSI Protocol Data Unit (PDU)
Figure 2.15 iSCSI Login PDU
Figure 2.16 iSCSI Login Response PDU
Figure 2.17 iSCSI SCSI Command PDU
Figure 2.18 iSCSI SCSI Response PDU
Figure 2.19 iSCSI Task Management Function Request PDU
Figure 2.20 iSCSI Task Management Function Response PDU
Figure 2.21 iSCSI Data-out PDU
Figure 2.22 iSCSI Data-in PDU
Figure 2.23 iSCSI R2T PDU
Figure 2.24 iSCSI Asynchronous PDU
Figure 2.25 The Map-reduce Model
Figure 4.1 Setup to Capture Network Traces
Figure 4.2 Parallel Processing Architecture of Netdoop
Figure 4.3 Components of Hadoop Map-reduce Framework
Figure 5.1 Virtual Data Center Environment
Figure 5.2 Virtual iSCSI Environment
Figure 5.3 Virtual Map-reduce Environment
Figure 5.4 Netdoop Web-based User Interface
Figure 5.5 Select All From Table Query Result
Figure 5.6 Flow Statistics Calculated by Netdoop
Figure 5.7 Search Summary Statistics
Chapter 1
INTRODUCTION
1.1 Motivation
Network traces provide important information about the traffic that passes through the
network. This information can be used for capacity planning as well as to diagnose
performance problems. Modern data centers house hundreds to hundreds of thousands of
servers connected using one or more networks. The size of the network traces captured in
these large data centers is too big to analyze using conventional processing tools running
on any single server. Further, data centers are converging network and storage traffic
over Ethernet using protocols such as the Internet Small Computer System Interface (iSCSI)
protocol [8] and the Fibre Channel over Ethernet (FCoE) protocol [2]. This makes it even
more important to mine these large network traces for information that supports capacity
planning and performance analysis.
Map-reduce is a programming model, implemented by several software frameworks, that
eliminates much of the complexity of writing fault-tolerant, distributed processing software. As a part of this
work, we study the applicability of the map-reduce model for processing large network
traces. We also develop a tool to analyze these large network traces for network and
storage protocol performance metrics.
1.2 Requirements
This section describes the general requirements for the study of the map-reduce model
and the tool that we develop.
1.2.1 Scalability
Scalability, in this context, is the ability of the processing tool to run efficiently on any
number of CPU cores in one or more machines. The tool should be able to scale to utilize
more CPU cores and storage as the size of the data increases. Inexpensive
servers and storage can then be added to the cluster to process larger data sets.
1.2.2 Processing Time
Processing time and scalability are in a way related. It is much faster to process and
analyze a large data set using a number of CPU cores working in parallel, compared to
processing the data set on a single server. However, the data set must be amenable to
parallel processing. Processing time is an important consideration both for this work and
for the tool that we develop. Faster processing times mean analysis results can be produced more frequently.
1.2.3 Operations
After a careful study of the map-reduce model, three operations were identified for
processing network traces. The first operation, filtering, separates the relevant data
from the irrelevant data present in the network trace. The second operation is grouping,
where the filtered data is grouped into records that share one or more common attributes,
for example a TCP flow or an iSCSI session. Finally, the sorting operation sorts the records
based on one or more attributes. The sorted data set is then processed in parallel on a
scalable cluster of CPU cores and storage.
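To make these three operations concrete, the following Python sketch applies filtering, grouping, and sorting to an in-memory list of simplified trace records. The record fields and the TCP port check are hypothetical placeholders for the actual fields extracted from a network trace; this is an illustration written for this report, not the Netdoop implementation.

# Minimal sketch of the filter, group, and sort operations on trace records.
# The record layout (dictionaries with these field names) is hypothetical.
from collections import defaultdict

def flow_key(record):
    # Group packets that belong to the same TCP flow (five-tuple).
    return (record["src_ip"], record["src_port"],
            record["dst_ip"], record["dst_port"], record["proto"])

def analyze(records):
    # 1. Filtering: keep only TCP packets on the iSCSI port (3260).
    relevant = [r for r in records
                if r["proto"] == "TCP" and 3260 in (r["src_port"], r["dst_port"])]

    # 2. Grouping: collect the filtered packets per flow.
    flows = defaultdict(list)
    for r in relevant:
        flows[flow_key(r)].append(r)

    # 3. Sorting: order the packets in each flow by capture timestamp.
    for key in flows:
        flows[key].sort(key=lambda r: r["timestamp"])
    return flows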
1.3 Report Organization
In Chapter 2 we present an overview of network monitoring, the tools used to collect
network traces and some applications. We also present a brief introduction to the iSCSI
storage protocol and its network frame formats. Then we introduce the map-reduce model.
In Chapter 3 we describe the methodology used throughout this project as well as in the
development of the Netdoop tool.
In Chapter 4 we describe the design of Netdoop, the tool that was developed as a part of
this project.
In Chapter 5 we describe the implementation of Netdoop and the prototyping environment
that was used to simulate a data center, to collect network traces, and to host a map-reduce framework.
Finally, in Chapter 6 we summarize our conclusions about the applicability of the map-reduce model for processing network traces and mining the data for performance metrics.
Chapter 2
BACKGROUND
This chapter introduces some of the technologies related to this project. We start off by
looking at network traces, tools used for collecting network traces and some use cases.
Then we introduce the Internet SCSI (iSCSI) protocol to better understand the importance
and relevance of analyzing the performance of this protocol in modern data centers. We
then explain the map-reduce programming model.
2.1 Network Traffic Analysis
Network traffic analysis is a common operation in data centers of all sizes. Network
traffic analysis provides valuable data for capacity planning, event analysis, performance
monitoring, and security analysis.
2.1.1 Sample Based Analysis
There are a number of methods and tools to collect network data for analysis. One
method of collecting data for network traffic analysis is to configure the network
elements in a network with Cisco NetFlow[3] or sFlow[7]. In particular, NetFlow and
sFlow define the behavior of network monitoring agents inside switches and routers, a
MIB for configuring the agents, and the format of the datagrams that carry the traffic
measurement data from the agents to a collector. Both NetFlow and sFlow deal with
flows. A flow is a set of packets that share a certain set of attributes. Information about
the flows is stored in the flow cache of the switch or router. Periodically the flow entries
are encapsulated in a datagram and sent to the collector that computes various statistics.
Figure 2.1 shows the sampling operation in a NetFlow enabled device, and the NetFlow
cache in that device. Figure 2.2 shows a screenshot of NetFlow analyzer software.
Figure 2.1 Netflow-enabled Device
Figure 2.2 Screenshot of Netflow Analyzer Software [15]
2.1.2 Network Trace-based Analysis
Another method to collect network data for analysis is to use network analyzers or packet
capture equipment. There are dedicated hardware network analyzers, for example the Xgig
from JDSU [6], and software-based network analyzers such as Wireshark [11] that run on
general-purpose computers. There is also hardware-based packet capture equipment
such as the IPCopper [5]. Figure 2.3 shows the Finisar Xgig hardware network analyzer
and Figure 2.4 shows a screenshot from the Xgig analyzer interface. Figure 2.5 shows a
screenshot from the Wireshark network analyzer software running on a PC.
Figure 2.3 Finisar Xgig Network Analyzer Appliance [6]
Figure 2.4 Finisar Xgig Software [12]
Figure 2.5 Wireshark - Open Source Network Analyzer
For detailed performance analysis or for debugging, the most common approach in data
centers is to use network traces. For the scope of this project, we use the network trace
files generated by network analyzers. To capture a network trace using a network
analyzer, a port on the Ethernet switch is mirrored on to another port (labeled as mirror
port in Figure 2.6). Once mirrored, all traffic going through that port is sent to the
mirror port as well. By connecting a network analyzer to the mirror port, the network
analyzer can record all traffic going through the port that has been mirrored. In Figure 2.6
below, the network analyzer can record all traffic going between the servers and the
storage server.
Figure 2.6 A Setup to Capture Network Traces
It is also possible to connect some of the network analyzers or packet capture equipment
inline. When connected inline, the device behaves as a pass-thru device, in series with the
network connection of the network device whose network interface is being monitored.
Some network analyzers have proprietary file formats to store the network traces, and
other network analyzers, including the Finisar Xgig and Wireshark, support the popular PCAP
file format [10]. The network performance analysis tool developed as a part of this project
is designed to read PCAP files as input. The PCAP file has a very simple format,
which comprises a global header and multiple records, as shown in Figure 2.7.
Figure 2.7 PCAP File Format [10]
The global header contains information that describes the contents of the file. The
structure of the global header is shown below.
guint32 magic_number;  /* magic number */
guint16 version_major; /* major version number */
guint16 version_minor; /* minor version number */
gint32  thiszone;      /* GMT to local correction */
guint32 sigfigs;       /* accuracy of timestamps */
guint32 snaplen;       /* max length of captured packets, in octets */
guint32 network;       /* data link type */
Each record comprises a packet header and an Ethernet frame captured on the wire.
The packet header contains timestamp information related to the time of capture as well
as meta-data about the frame that is encapsulated in this record. The structure of a packet
header is shown below.
guint32 ts_sec;   /* timestamp seconds */
guint32 ts_usec;  /* timestamp microseconds */
guint32 incl_len; /* number of octets of packet saved in file */
guint32 orig_len; /* actual length of packet */
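As an illustration of how these two headers can be consumed, the following Python sketch walks a PCAP file and yields each captured frame along with its timestamp. It is a simplified reader written for this report: it detects the byte order from the classic 0xa1b2c3d4 magic number and does not handle the nanosecond-resolution PCAP variant or the newer PCAPNG format.

# Simplified PCAP reader: yields (timestamp, frame bytes) for each record.
import struct

GLOBAL_HDR = "IHHiIII"   # magic, version major/minor, thiszone, sigfigs, snaplen, network
PACKET_HDR = "IIII"      # ts_sec, ts_usec, incl_len, orig_len

def read_pcap(path):
    with open(path, "rb") as f:
        magic = f.read(4)
        # The stored byte order of the magic number reveals the file's endianness.
        if magic == b"\xd4\xc3\xb2\xa1":
            endian = "<"                      # written by a little-endian host
        elif magic == b"\xa1\xb2\xc3\xd4":
            endian = ">"                      # written by a big-endian host
        else:
            raise ValueError("not a classic PCAP file")
        # Skip the remainder of the global header (magic already consumed).
        f.read(struct.calcsize(endian + GLOBAL_HDR) - 4)
        record_hdr_size = struct.calcsize(endian + PACKET_HDR)
        while True:
            hdr = f.read(record_hdr_size)
            if len(hdr) < record_hdr_size:
                break                         # end of file
            ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + PACKET_HDR, hdr)
            yield ts_sec + ts_usec / 1e6, f.read(incl_len)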
2.2 iSCSI Protocol
The previous section discussed the general topic of collecting network traffic data in a
data center. In this section we start by introducing I/O convergence happening in modern
data centers, and then describe the Internet Small Computer System Interface (iSCSI)
protocol [8]. The iSCSI protocol allows storage commands and data to be transferred over
Ethernet.
2.2.1 Server I/O and Convergence
Computer servers typically have a network interface and a storage interface. A network
interface allows the server to communicate with other servers on the local network and
the Internet. A storage interface allows the server to communicate with a storage
subsystem to store and access data.
2.2.1.1 Direct Attached Storage
If the storage interface connects the server to a local storage device such as a hard drive
(HDD) or solid-state disk (SSD), the server is said to have direct-attached storage or
DAS. Common storage interfaces that connect a server to a local storage subsystem are
Small Computer System Interface (SCSI), Integrated Drive Electronics (IDE), Serial
Advanced Technology Attachment (SATA) and Serial Attached SCSI (SAS). Figure 2.8
shows the block diagram of a server with local storage.
Figure 2.8 Server with Direct Attached Storage
2.2.1.2 Shared Storage
If the storage interface connects the server to a storage subsystem shared by multiple
servers, the server is said to be connected to a Storage Area Network (SAN). Servers and
the storage devices connected over a SAN use a transport protocol to exchange control
information and data. Figure 2.9 shows the block diagram of a server that is connected to
a shared storage device over a SAN.
Figure 2.9 Server Connected to a Storage Area Network
As shown in Figure 2.9 above, using a SAN results in data centers buying, deploying, and
managing two separate I/O fabrics: one fabric for carrying network traffic and another
fabric for carrying storage traffic.
2.2.1.3 I/O Convergence
In the previous section we saw that connecting servers to a SAN requires that data centers
buy, deploy and manage two separate fabrics — one for carrying network traffic, and
another for carrying storage traffic. With the introduction of new protocols such as iSCSI
and FCoE [2], data centers are now able to use Converged Network Adapters (CNA) [9]
to converge network and storage traffic over a single Ethernet fabric. Figure 2.10 shows
the block diagram of a server that uses a converged network adapter to exchange network
and storage traffic over Ethernet.
Figure 2.10 Server with a Converged Network Adapter
I/O convergence helps data centers save costs by buying, deploying, and maintaining just
one I/O fabric: Ethernet. FCoE-capable Ethernet switches bridge FCoE traffic to Fibre
Channel traffic and pass the Fibre Channel traffic over to a Fibre Channel fabric.
The iSCSI protocol, in contrast, is an end-to-end protocol between the server and an
iSCSI-capable storage device (an iSCSI target).
When multiple protocols and traffic types share the same Ethernet fabric in mission
critical business environments, it is important for data center administrators to allocate
proper bandwidth and quality-of-service (QoS) to each protocol. This ensures that the
protocols share the Ethernet fabric efficiently. To tune application performance or study
the cause-effect relationship of a particular data center event, it is important to have the
ability to analyze network traces as quickly as possible. As a part of our work, we apply
the results of our study of the applicability of the map-reduce model for processing large
network traces to storage protocols. The work that is done as a part of this project can be
extended to study the performance of other protocols as well.
2.2.2 iSCSI
This section introduces the Internet Small Computer System Interface (iSCSI) protocol in
more detail. We chose iSCSI as a protocol to apply our work because iSCSI is being
widely deployed in modern data centers today. The prototype developed as a part of our
project can be extended for use in real production environments, helping data center
administrators to benefit from our work.
2.2.2.1 Overview
The iSCSI protocol is layered on top of the TCP/IP protocol. iSCSI allows one or more
servers to use a shared storage subsystem on the network. Specifically, iSCSI provides a
transport for carrying SCSI commands and data over IP networks. Because iSCSI is
layered on top of the TCP/IP protocol, iSCSI can be used to connect servers and storage
across local area networks (LAN), wide area networks (WAN) or the Internet. Figure
2.11 shows a server connected to its storage subsystem over a SCSI interface. Figure 2.12
shows multiple servers using iSCSI to connect to an iSCSI target, sharing the storage
attached to the target.
Figure 2.11 Server Directly Attached to a SCSI Disk
Figure 2.12 Servers Connected to Shared Storage Using iSCSI
In an iSCSI environment, there are iSCSI initiators (clients), iSCSI targets (storage
server) and servers running management services. The common management services for
iSCSI are Internet Storage Name Service (iSNS) and Service Location Protocol (SLP).
The iSNS and SLP services help discover the available storage resources on a network,
such as advertising the available iSCSI targets to iSCSI initiators.
Figure 2.13 iSCSI Initiator and Target Stacks
Figure 2.13 shows the data flow from the application to the disks when using iSCSI.
There are two entities shown in Figure 2.13 — the server that is the iSCSI initiator or the
client, and the iSCSI target serving storage from the disks attached to it. Logically, iSCSI
connects the SCSI stack on a server to the SCSI stack on the storage server (iSCSI target)
over Ethernet, using the TCP/IP protocol.
In a server, applications and the operating system read or write data to the filesystem. The
filesystem maps the data to storage blocks, which are then handed off to the SCSI layer.
The SCSI layer prepares command descriptor blocks (CDBs), which are commands that
instruct SCSI-capable disks to read and write data blocks. However, when
using iSCSI, these CDBs and the data are exchanged with the iSCSI stack instead of a
disk subsystem. The iSCSI stack encapsulates the CDBs into iSCSI command protocol
data units (iSCSI command PDUs) and data into iSCSI data PDUs. These PDUs are
exchanged with a remote iSCSI target over the TCP/IP stack using normal TCP/IP
protocol mechanisms and components.
At the iSCSI target, the iSCSI target stack receives iSCSI command PDUs along with
any data PDUs. The iSCSI target stack then extracts the SCSI CDBs from the iSCSI
command PDUs and the data from the iSCSI data PDUs. The CDBs and data are then
sent to the SCSI stack. The iSCSI protocol has logically connected the SCSI stack on the
server with the SCSI stack on the iSCSI target or the storage server. This allows the
server to use the disks attached to the remote iSCSI target for data storage.
The storage space on the storage servers or iSCSI targets is virtualized as logical units
(LUNs). iSCSI management software typically determines which LUNs are available to
which server. This allows fine-grained control of how much storage is allocated to each
server, as well as enforcement of access control. The blocks in each LUN are addressed by a
logical block address (LBA). Logical block addresses are linear within a LUN,
irrespective of how they are stored on the physical media.
2.2.2.2 iSCSI Definitions
The following is a summary of definitions and acronyms that are important in
understanding and analyzing the iSCSI protocol [8, pages 10-14].
Alias: An alias string can also be associated with an iSCSI Node. The alias allows an
organization to associate a user-friendly string with the iSCSI Name. However, the alias
string is not a substitute for the iSCSI Name.
CID (Connection ID): Connections within a session are identified by a connection ID. It
is a unique ID for this connection within the session for the initiator. It is generated by the
initiator and presented to the target during login requests and during logouts that close
connections.
Connection: A connection is a TCP connection. Communication between the initiator
and target occurs over one or more TCP connections. The TCP connections carry control
messages, SCSI commands, parameters, and data within iSCSI Protocol Data Units
(iSCSI PDUs).
iSCSI Device: A SCSI Device using an iSCSI service delivery subsystem. Service
Delivery Subsystem is defined by the SCSI protocol as a transport mechanism for SCSI
commands and responses.
iSCSI Initiator Name: The iSCSI Initiator Name specifies the worldwide unique name
of the initiator.
iSCSI Initiator Node: The “initiator”. The word “initiator” has been appropriately
qualified as either a port or device in the rest of the iSCSI protocol document when the
context is ambiguous. All unqualified usages of “initiator” refer to an initiator port (or
device) depending on the context.
iSCSI Layer: This layer builds/receives iSCSI PDUs and relays/receives them to/from
one or more TCP connections that form an initiator-target “session”.
iSCSI Name: The name of an iSCSI initiator or iSCSI target.
iSCSI Node: The iSCSI Node represents a single iSCSI initiator or iSCSI target. There
are one or more iSCSI Nodes within a Network Entity. The iSCSI Node is accessible via
one or more Network Portals. An iSCSI Node is identified by its iSCSI Name. The separation
of the iSCSI Name from the addresses used by and for the iSCSI Node allows multiple
iSCSI Nodes to use the same address, and the same iSCSI Node to use multiple
addresses.
iSCSI Target Name: The iSCSI Target name specifies the worldwide unique name of
the target.
iSCSI Target Node: The “target”.
iSCSI Task: An iSCSI task is an iSCSI request for which a response is expected.
iSCSI Transfer Direction: The iSCSI transfer direction is defined with regard to the
initiator. Outbound or outgoing transfers are transfers from the initiator to the target,
while inbound or incoming transfers are from the target to the initiator.
ISID: The initiator part of the Session Identifier. It is explicitly specified by the initiator
during Login.
I_T nexus: According to the SCSI protocol, the I_T nexus is a relationship between a
SCSI Initiator Port and a SCSI Target Port. For iSCSI, this relationship is a session,
defined as a relationship between an iSCSI Initiator’s end of the session (SCSI Initiator
Port) and the iSCSI Target’s Portal Group. The I_T nexus can be identified by the
conjunction of the SCSI port names; that is, the I_T nexus identifier is the tuple (iSCSI
Initiator Name + ‘,i,’+ISID, iSCSI Target Name + ‘,t,’+ Portal Group Tag).
Network Entity: The Network Entity represents a device or gateway that is accessible
from the IP network. A Network Entity must have one or more Network Portals, each of
which can be used to gain access to the IP network by some iSCSI Nodes contained in
that Network Entity.
Network Portal: The Network Portal is a component of a Network Entity that has a
TCP/IP network address and that may be used by an iSCSI Node within that Network
Entity for the connection(s) within one of its iSCSI sessions. A Network Portal in an
initiator is identified by its IP address. A Network Portal in a target is identified by its IP
address and its listening TCP port.
Originator: In a negotiation or exchange, the party that initiates the negotiation or
exchange.
PDU (Protocol Data Unit): The initiator and target divide their communications into
messages. The term “iSCSI protocol data unit” (iSCSI PDU) is used for these messages.
Portal Groups: iSCSI supports multiple connections within the same session; some
implementations will have the ability to combine connections in a session across multiple
Network Portals. A Portal Group defines a set of Network Portals within an iSCSI
Network Entity that collectively supports the capability of coordinating a session with
connections spanning these portals. Not all Network Portals within a Portal Group need
participate in every session connected through that Portal Group. One or more Portal
Groups may provide access to an iSCSI Node. Each Network Portal, as utilized by a
given iSCSI Node, belongs to exactly one portal group within that Node.
Portal Group Tag: This 16-bit quantity identifies a Portal Group within an iSCSI Node.
All Network Portals with the same portal group tag in the context of a given iSCSI Node
are in the same Portal Group.
Recovery R2T: An R2T generated by a target upon detecting the loss of one or more
Data-Out PDUs through one of the following means: a digest error, a sequence error, or a
sequence reception timeout. A recovery R2T carries the next unused R2TSN, but requests
all or part of the data burst that an earlier R2T (with a lower R2TSN) had already
requested.
Responder: In a negotiation or exchange, the party that responds to the originator of the
negotiation or exchange.
SCSI Device: This is a SCSI protocol term for an entity that contains one or more SCSI
ports that are connected to a service delivery subsystem and supports a SCSI application
protocol. For example, a SCSI Initiator Device contains one or more SCSI Initiator Ports
and zero or more application clients. A Target Device contains one or more SCSI target
Ports and one or more device servers and associated logical units. For iSCSI, the SCSI
device is the component within an iSCSI Node that provides the SCSI functionality. As
such, there can be at most one SCSI Device within a given iSCSI Node. Access to the
SCSI Device can only be achieved in an iSCSI normal operational session. The SCSI
Device Name is defined to be the iSCSI Name of the node.
SCSI Layer: This builds/receives SCSI CDBs (Command Descriptor Blocks) and
relays/receives them with the remaining command execute parameters to/from the iSCSI
Layer.
Session: The group of TCP connections that link an initiator with a target form a session
(loosely equivalent to a SCSI I-T nexus). TCP connections can be added and removed
from a session. Across all connections within a session, an initiator sees one and the same
target.
SCSI Initiator Port: This maps to the endpoint of an iSCSI normal operation session.
An iSCSI normal operational session is negotiated through the login process between an
iSCSI initiator node and an iSCSI target node. At successful completion of this process, a
SCSI Initiator Port is created within the SCSI Initiator Device. The SCSI Initiator Port
Name and SCSI Initiator Port Identifier are both defined to be the iSCSI Initiator Name
together with (a) a label that identifies it as an initiator port name/identifier and (b) the
ISID portion of the session identifier.
SCSI Port: This is the SCSI term for an entity in a SCSI Device that provides the SCSI
functionality to interface with a service delivery subsystem. For iSCSI, the definition of
the SCSI Initiator Port and the SCSI Target Port are different.
SCSI Port Name: A name made up of UTF-8 [RFC2279] characters that includes the
iSCSI Name + ‘i’ or ‘t’ + ISID or Portal Group Tag.
SCSI Target Port: This maps to an iSCSI Target Portal Group.
SCSI Target Port Name and SCSI Target Port Identifier: These are both defined to
be the iSCSI target Name together with (a) a label that identifies it as a target port
name/identifier and (b) the portal group tag.
SSID (Session ID): A session between an iSCSI initiator and an iSCSI target is defined
by a session ID that is a tuple composed of an initiator part (ISID) and a target part
(Target Portal Group Tag). The ISID is explicitly specified by the initiator at session
establishment. The Target Portal Group Tag is implied by the initiator through the
selection of the TCP endpoint at connection establishment. The TargetPortalGroupTag
key must also be returned by the target as a confirmation during connection establishment
when TargetName is given.
Target Portal Group Tag: A numerical identifier (16-bit) for an iSCSI Target Portal
Group.
TSIH (Target Session Identifying Handle): A target assigned tag for a session with a
specific named initiator. The target generates it during session establishment. Its internal
format and content are not defined by this protocol, except for the value 0 that is reserved
and used by the initiator to indicate a new session. It is given to the target during
additional connection establishment for the same session.
2.2.2.3 Flow Concepts
The iSCSI protocol is implemented on top of the TCP/IP protocol. TCP is a
connection-oriented, byte-stream-based, lossless transport protocol. An iSCSI connection
between an iSCSI initiator and an iSCSI target is a TCP connection. Communication
between the iSCSI initiator and target occurs over one or more TCP connections. The
TCP connections carry SCSI commands, parameters and data within iSCSI protocol data
units (iSCSI PDUs). Each connection has a connection ID (CID).
A group of iSCSI connections that connect an iSCSI initiator with an iSCSI target form
an iSCSI session. An iSCSI session is similar to an Initiator-Target nexus (I-T nexus) in
the SCSI protocol. TCP connections can be added or removed from an iSCSI session as
required. The iSCSI initiator sees one and the same target across all the connections within a
session. Each session has a session ID (SSID). The SSID comprises an initiator-defined
component called the initiator session ID (ISID) and a target component called the
target portal group tag. A portal group defines a set of network portals within an iSCSI
network entity that collectively supports the capability of coordinating a session with
connections spanning these portals. One or more portal groups may provide access to an
iSCSI node. Each network portal used by a given iSCSI node belongs to exactly one
portal group within that node.
An iSCSI session is established through a login phase. An iSCSI login creates a TCP
connection, authenticates an iSCSI initiator and iSCSI target with each other, negotiates
operation parameters and associates the connection to a session. Once a successful login
phase has completed between an iSCSI initiator and an iSCSI target, the iSCSI session
enters a full feature phase. Once an iSCSI session is in the full feature phase, data can be
transferred between the iSCSI initiator and iSCSI target. Sending a logout command can
terminate a session. The session also may be terminated due to timeouts or TCP
connection failures.
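Because an iSCSI connection is a TCP connection and an iSCSI session is identified by the ISID together with the target portal group tag, a trace-processing tool can derive simple grouping keys from these fields. The Python sketch below is illustrative only; the record field names are hypothetical placeholders for values extracted from the trace.

# Illustrative grouping keys for trace records; the field names are hypothetical.
def connection_key(record):
    # A TCP connection (and hence an iSCSI connection) is identified by its two
    # endpoints; sorting them makes both directions of the flow map to one key.
    endpoints = sorted([(record["src_ip"], record["src_port"]),
                        (record["dst_ip"], record["dst_port"])])
    return tuple(endpoints)

def session_key(record):
    # An iSCSI session is identified by the initiator part (ISID) and the
    # target portal group tag, as described above.
    return (record["isid"], record["target_portal_group_tag"])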
As we have seen above, iSCSI carries SCSI commands and data over TCP connections
that together form a session. A network trace shows the TCP packets that form the iSCSI
PDUs. The headers of TCP packets (TCP headers) and the headers of the iSCSI PDUs
contain all the information required to analyze the state of an iSCSI transfer and the end-to-end performance of a session or connection. The following section is a summary of some
of the common iSCSI header formats as well as some examples of I/O operations.
2.2.2.4 iSCSI Protocol Headers
This section lists some important headers in iSCSI PDUs used in our work to compute
performance metrics of the iSCSI flows. For a more comprehensive list of all iSCSI
headers, PDU formats and definitions of individual fields in the headers please refer to
the iSCSI protocol specification [8].
2.2.2.4.1 iSCSI PDU
All iSCSI PDUs have one or more header segments and, optionally, a data segment. After
the entire header segment group, a header digest may follow. A data digest may also
follow the data segment.
Figure 2.14 iSCSI Protocol Data Unit (PDU) [8]
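As an example of how a protocol-aware tool can interpret a PDU, the following Python sketch decodes a few fields of the 48-byte Basic Header Segment (BHS) shown above. The offsets and the small opcode table follow the iSCSI specification [8]; the function itself is an illustration written for this report, not Netdoop's code.

# Decode selected fields of a 48-byte iSCSI Basic Header Segment (BHS).
import struct

OPCODES = {                      # a few iSCSI opcodes; see [8] for the full list
    0x01: "SCSI Command",
    0x21: "SCSI Response",
    0x03: "Login Request",
    0x23: "Login Response",
    0x05: "SCSI Data-Out",
    0x25: "SCSI Data-In",
    0x31: "R2T",
    0x32: "Asynchronous Message",
}

def parse_bhs(bhs):
    if len(bhs) < 48:
        raise ValueError("a BHS is 48 bytes")
    opcode = bhs[0] & 0x3F                          # low six bits of byte 0
    return {
        "opcode": OPCODES.get(opcode, hex(opcode)),
        "immediate": bool(bhs[0] & 0x40),           # I (immediate delivery) bit
        "ahs_bytes": bhs[4] * 4,                    # TotalAHSLength is in 4-byte words
        "data_segment_bytes": int.from_bytes(bhs[5:8], "big"),
        "initiator_task_tag": struct.unpack(">I", bhs[16:20])[0],
    }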
2.2.2.4.2 Login Request
After establishing a TCP connection between an initiator and a target, the initiator must
start a login phase to gain further access to the target’s resources.
The Login Phase consists of a sequence of Login Requests and Responses that carry the
same Initiator Task Tag. The Login PDU is as shown below.
Figure 2.15 iSCSI Login PDU [8]
2.2.2.4.3 Login Response
The Login Response indicates the progress and/or end of the Login Phase. The Login
Response PDU is as shown below.
Figure 2.16 iSCSI Login Response PDU [8]
2.2.2.4.4 SCSI Command
The format of the SCSI Command PDU is as shown below. The SCSI Command PDU
carries SCSI commands from the iSCSI initiator to the iSCSI target.
Figure 2.17 iSCSI SCSI Command PDU [8]
2.2.2.4.5 SCSI Response
The format of a SCSI Response PDU is as shown below. The SCSI Response PDU
carries the SCSI response from an iSCSI target to the iSCSI initiator in response to a
SCSI command.
Figure 2.18 iSCSI SCSI Response PDU [8]
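A SCSI Response carries the same Initiator Task Tag as the command it completes, so a trace analyzer can pair the two PDUs to measure per-command latency. The following is a hypothetical sketch written for this report, reusing the parse_bhs() output from the earlier PDU example; it is not Netdoop's actual code.

# Pair SCSI Command and SCSI Response PDUs by Initiator Task Tag to get latency.
def command_latencies(pdus):
    """pdus: iterable of (timestamp, parsed_bhs) tuples in capture order."""
    pending = {}              # initiator task tag -> command timestamp
    latencies = []
    for ts, bhs in pdus:
        tag = bhs["initiator_task_tag"]
        if bhs["opcode"] == "SCSI Command":
            pending[tag] = ts
        elif bhs["opcode"] == "SCSI Response" and tag in pending:
            latencies.append(ts - pending.pop(tag))
    return latencies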
2.2.2.4.6 Task Management Function Request
The Task Management function provides an initiator with a way to explicitly control the
execution of one or more Tasks (SCSI and iSCSI tasks). The Task Management Function
Request PDU is as shown below.
Figure 2.19 iSCSI Task Management Function Request PDU [8]
2.2.2.4.7 Task Management Function Response
In response to a Task Management Function Request, the iSCSI target responds with a
Task Management Response. The iSCSI Task Management Function Response PDU is as
shown below.
Figure 2.20 iSCSI Task Management Function Response PDU [8]
2.2.2.4.8 SCSI Data-Out for WRITE
The SCSI Data-Out PDU carries data from the iSCSI initiator to the iSCSI target, for a
write operation. The SCSI Data-Out PDU is as shown below.
Figure 2.21 iSCSI Data-out PDU [8]
2.2.2.4.9 SCSI Data-In for READ
The SCSI Data-In PDU carries data from the iSCSI target back to the iSCSI initiator for a
read operation. The SCSI Data-In PDU is as shown below.
Figure 2.22 iSCSI Data-in PDU [8]
2.2.2.4.10 Ready To Transfer (R2T)
When an initiator has submitted a SCSI Command with data that passes from initiator to
the target (WRITE), the target may specify which blocks of data it is ready to receive.
The target may request that the data blocks be delivered in whichever order is convenient
for the target at that particular instant. This information is passed from the target to the
initiator in the Ready To Transfer (R2T) PDU. The R2T PDU is as shown below.
Figure 2.23 iSCSI R2T PDU [8]
2.2.2.4.11 Asynchronous Message
An Asynchronous Message may be sent from the target to the initiator without
correspondence to a particular command. The target specifies the reason for the event and
sense data. This is an unsolicited message from the iSCSI target to the iSCSI initiator, for
example to indicate an error. The Asynchronous Message PDU is as shown below.
Figure 2.24 iSCSI Asynchronous PDU [8]
2.2.2.5 Summary
This section introduced the concepts in the iSCSI protocol. A subset of iSCSI PDU
formats that are extracted from a network trace for performance analysis were introduced
as well. Servers using the iSCSI protocol to connect to remote storage servers or iSCSI
targets do so over one or more TCP connections that form an iSCSI session.
Information related to the TCP connections and iSCSI sessions is used in our work to
extract per-flow statistics.
2.3 The Map-reduce Model
This section introduces parallel programming and the map-reduce programming model.
We start by comparing serial and parallel programming paradigms.
2.3.1 Serial vs. Parallel Programming
In the serial programming paradigm, a program consists of a sequence of instructions,
where each instruction is executed one after another, from start to finish, on a single
processor. To improve performance and efficiency, parallel programming was developed.
A parallel program can be broken up into parts, where each part can be executed
concurrently with the other parts. Each part can be run on a different CPU in a single
computer, or CPUs in a set of computers connected over a network. Building a parallel
program consists of identifying tasks that can be run concurrently or partitioning the data
so that it can be processed simultaneously. It is not possible to parallelize a program if each
computed value depends on a previously computed value. However, if the data can be
broken up into equal chunks, we can put a parallel program together.
As an example, let us consider the processing requirements for a huge array of data. The
huge array can be partitioned into sub-arrays. If the processing needs are the same for
each element in the array, and does not depend on the computational result on another
element elsewhere in the array, this represents an ideal case for parallel programming.
We can structure the environment such that a master divides the array into a number of
equal-sized sub-arrays and distributes the sub-arrays to the available processing nodes or
workers. Once the processing is complete, the master can collect the results from each
worker. Each worker receives the sub-array from the master, and performs processing.
Once the processing is complete, the worker returns the results to the master.
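The following Python sketch shows this master/worker pattern on a single machine using the standard multiprocessing module; the square() function is a placeholder for whatever per-element processing is actually required.

# Master/worker processing of a large array, sketched with multiprocessing.
from multiprocessing import Pool

def square(x):
    # Placeholder for the real per-element computation.
    return x * x

def process_chunk(sub_array):
    # Work performed by one worker on its sub-array.
    return [square(x) for x in sub_array]

def process_in_parallel(data, workers=4):
    # The "master" splits the array into roughly equal-sized sub-arrays ...
    chunk = (len(data) + workers - 1) // workers
    sub_arrays = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    # ... distributes them to the workers, and collects the partial results.
    with Pool(processes=workers) as pool:
        partial_results = pool.map(process_chunk, sub_arrays)
    # Flatten the per-worker results back into a single list.
    return [value for part in partial_results for value in part]

if __name__ == "__main__":
    print(process_in_parallel(list(range(10))))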
2.3.2 Map-reduce
Google introduced the map-reduce programming model in 2004. The map-reduce model
derives from the map and reduce functions in functional programming languages. In a
functional programming language, the map function takes a function and a sequence of
values as input. The map function then applies the function passed to it to each element in
the sequence. The reduce operation combines all the elements of the resulting sequence
using a binary operation, such as “+” to add up all the elements in the sequence.
Map-reduce was developed by Google as a mechanism for processing large amounts of
raw data [1] [4]. The size of the data that had to be processed was so large that the
processing had to be distributed across thousands of machines, in order to be processed in
a reasonable amount of time. The map-reduce model provided an abstraction for Google
engineers to perform simple computations while hiding the details of parallelization,
distribution of data across processing nodes, load balancing and fault tolerance.
The users of the map-reduce library have to write a map function and a reduce function.
The map function takes an input pair and produces a set of intermediate key/value pairs.
The map-reduce library groups together all the intermediate values associated with the
same intermediate key and passes them to the reduce function. The reduce function
accepts an intermediate key and a set of values for that key. It merges together these
values to form a smaller set of values.
One of the main advantages of the map-reduce library is that the complexity of writing
parallel programs is abstracted away or even eliminated. However, this simplification
sometimes limits the applicability of the map-reduce model to certain computational
tasks.
A classic example used to illustrate how a map-reduce program can be implemented is
shown in the pseudo code below. This example implements a word counting application
using the map-reduce model. The pseudo code is designed to take a set of text documents
and generate a list of words along with the number of occurrences of each word in the
input text data.
function map(String key, String text) {
    for word in split_words(text) {
        emit_intermediate(word, "1")
    }
}

function reduce(String word, Iterator values) {
    int count = 0
    for v in values {
        count = count + to_int(v)
    }
    emit(word, to_string(count))
}
As seen from the pseudo code, the map function accepts a key and a value as input and
produces a number of intermediate key-value pairs as output. The reduce function accepts
a key and a list of values as input. The reduce function in this case simply adds up the list
of values passed to it, and produces a final key-value pair for each input key.
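For comparison, the same word count can be written as two small Python scripts in the streaming style, where the mapper reads lines from standard input and emits tab-separated key-value pairs, and the reducer receives the pairs already sorted by key and totals the counts. This is a sketch of the general pattern only, similar in spirit to the mapper.py and reducer.py scripts described in Chapter 5, not their actual code.

# mapper.py - emit "word<TAB>1" for every word read from standard input.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))

# reducer.py - input arrives sorted by key, so all counts for a word are adjacent.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word != current_word:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, 0
    current_count += int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))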
Figure 2.25 The Map-reduce Model
Figure 2.25 illustrates the execution and flow of the map-reduce job. After submitting a
map-reduce job, the master splits up the input data based on a set of default rules that can
be overridden by writing a custom function. The master then schedules each input data
chunk to a worker. For each input record in the data chunk, the worker calls the map
function. The map function processes the records and produces output records in the form
of key-value pairs. This phase is called the map phase. The map phase is followed by a
sort phase in which the intermediate key-value pairs are sorted based on the key. The
master then selects sets of keys and their values, and schedules them to workers for the
reduce operation. In the reduce phase, the workers participating in the reduce operation
call the reduce function on the key-value pairs assigned to them.
The reduce functions produce the final output files.
In the word counting example above, a large input text file is split into several chunks.
The default split criterion can be a set of lines based on line numbers. The map function
then processes a set of lines, splitting each line into individual words and emitting key-value
pairs of the format “word 1”. In this example, the word is the key and “1” the
value. During the sort phase, the intermediate files with key-value pairs generated during
the map phase are sorted based on the keys, which in this case are the words. The sorted
list is split once again and scheduled to the reduce workers. The reduce function totals up
the number of occurrences of each word and emits the final count to an output file.
As we have seen above, the map-reduce model abstracts away the complexity of writing parallel
programs. The master takes care of load balancing, scheduling, and fault tolerance,
leaving the programmer to focus on solving the problem at hand. However, this
simplification comes at a cost in that the map-reduce model cannot be applied to every
problem. Hence, as a part of our work, we study the applicability of the map-reduce model for processing large network traces.
Chapter 3
METHODOLOGY
This chapter describes the methodology used to study the problem of analyzing large
network traces using the map-reduce model, and to apply that study in implementing a tool
that analyzes network traces for iSCSI and network performance metrics.
3.1 Overview
One of the goals of the project was to study the viability of using the map-reduce model to
process network traces, specifically to mine for performance data of iSCSI and network
protocols. To do this we start by creating a prototype environment that represents a
typical data center. Using server virtualization, we created a number of virtual servers
and a virtual iSCSI target. Using this environment and the Wireshark network analyzer, we
collect network traces. We then develop a tool to use the features of a map-reduce
framework. We use the cluster of virtual servers to deploy the tool and study the
distribution of processing workload across the configured cluster of virtual servers. Then
we collect the performance metrics of iSCSI protocol and the underlying network as
reported by the tool.
3.2 Design
In Chapter 1 we presented a set of requirements for this project. The key requirements
were scalability and processing time. In Chapter 2 we also introduced concepts around
the collection of network traces, as well as I/O convergence — a dominant trend in
modern data centers. We also introduced iSCSI as a popular protocol that is used to carry
storage and network traffic on a shared Ethernet fabric. There is more data than ever to
observe and analyze in order to predict problems before they occur, as well as to tune
application performance. The capability to process and analyze the network traffic data
sets must not be limited by the size of the data set.
A major design goal for this project has been scalability. Though we leverage the features
of a map-reduce framework, the design should not be restricted to the functionality of any
map-reduce framework. The design should serve as a reference design that represents a
best case in terms of performance.
It is not our intention to design a full-fledged implementation ready for deployment in
data centers. It could take months to design a fully functional tool that can be used with
realistic workloads, let alone to build the infrastructure necessary for demonstrating the system
working at scale. Instead, we must choose the components of the design such that the
design can be implemented in a limited timeframe but still provides a reference for using
the map-reduce model for processing network traces. Leveraging the features of a map-reduce framework allows us to make tradeoffs between complexity and time to
implement. By leveraging the features of a standard map-reduce framework, the
implementation time is reduced to days and weeks instead of months.
3.3
Framework Evaluation
To implement the designed system and the tool for processing network traces in data centers, we use the features provided by the map-reduce model. We need to select an existing map-reduce framework for our implementation, and there are a number of map-reduce frameworks to pick from. However, the body of work produced by this project is intended to solve a real problem in modern data centers. The framework we choose must therefore be stable, mature and, ideally, well known to data center administrators, so that our work can be leveraged and applied in real data centers.
3.4
Implementation
Once we select a map-reduce framework, we proceed to implementation. We choose a programming language supported by the map-reduce framework to implement the functionality required for processing network traces. A number of protocols use Ethernet as a transport, and it is not feasible, in the timeframe available for this project, to implement a system that analyzes network traces for every protocol layered on top of Ethernet. We therefore choose a protocol that is relevant and widely deployed in modern data centers, and implement the system in a way that can be extended to support other network protocols in the future. Usability is also a very important design goal, so we implement a simple web-based user interface to access the performance data gathered from the network traces.
3.5
Prototyping Environment
Map-reduce frameworks significantly reduce the complexity of implementing parallel programs. However, testing and demonstrating these programs at scale requires a complex infrastructure, even to represent a small portion of a data center. Likewise, the infrastructure required to generate a large network trace, store it, and move it around can be substantial. Given that our implementation is a reference implementation, we need to build an environment that requires as little infrastructure as possible but still represents a typical data center. The same reference infrastructure must be capable of generating a scaled-down version of the network trace, just enough to demonstrate the parallel execution of the processing and analysis. It would be very expensive in time and money to acquire, configure, and deploy real servers, switches and storage for the purpose of implementing, testing and demonstrating our work. We again make a careful tradeoff between complexity and performance.
3.6
Analysis and Conclusion
After implementing the design and building the prototype data center environment, we deploy our tool. We then study the results of analyzing a sample network trace comprising TCP/IP and iSCSI traffic. Based on the analysis of the results, we draw conclusions on the viability, advantages and limitations of using map-reduce frameworks for processing network traces.
Chapter 4
DESIGN
This chapter describes the way in which the map-reduce model can be used to process network traces, by distributing the processing workload onto a number of servers working on a trace in parallel. We start by listing some of the assumptions that our implementation is based on. We then describe the simple data format that we use for our implementation. Next, we present the design details of our implementation, which is basic yet can be extended to support more network protocols as well as real, data-center-sized network traces.
4.1
Assumptions
This section lists the assumptions we had to make, considering the limitations of the available prototyping infrastructure, the limitations imposed by the map-reduce framework, and performance versus complexity tradeoffs.
4.1.1
Virtual Infrastructure
Parallel programming is a model of computing where a program or its data is split into multiple chunks and processed on separate CPUs or servers concurrently. The map-reduce model provides a simple framework for developing parallel programs, hiding much of the complexity associated with load balancing and fault tolerance.
However, to implement, test and demonstrate a parallel processing model, we need a number of servers and an interconnect fabric. Acquiring and maintaining such an infrastructure for the duration of this project would be expensive and time consuming.
For the implementation, testing and demonstration of our work, we make use of server
virtualization. Server virtualization software such as VMware ESX and VMware Fusion allows a single physical server to be partitioned into multiple virtual servers, each with its
own CPU instance, memory, network and storage interfaces. Recent developments in
CPU technology and enhancements in server virtualization software have dramatically
reduced the overhead due to virtualization. However, the performance of virtual
machines is not yet equal to the performance of physical machines. The performance
overhead comes from sharing of the physical resources among the virtual machines.
We use VMware Fusion running on a notebook computer to create a cluster of virtual
servers to deploy and demonstrate our implementation. While this is a reference
implementation, it should work out-of-the-box on physical servers with improved
performance.
4.1.2
Input Data
Our design is agnostic to input file formats. However, we assume that the network traces used with our implementation are simple text files. Each text file is a text representation of a network trace in PCAP format captured by a network analyzer, formatted in a way that simplifies parsing. The text file contains flow data represented as a set of flow records. Later in this chapter, we describe the file format that we use for these text files, which is similar to the text version of the PCAP files. A utility that we developed as part of our implementation converts the binary PCAP files to text files.
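The converter itself is not reproduced here; the following is a minimal sketch of how such a conversion could be scripted on top of tshark, the command-line companion of Wireshark, assuming tshark is installed. The -e options request fields from Table 4.1 (only a subset is shown), and tshark prints one tab-separated record per frame.

# pcap2text.py - sketch of a PCAP-to-text converter built on tshark
import subprocess
import sys

# subset of the fields from Table 4.1; extend as needed
FIELDS = ['frame.time_relative', 'frame.number', 'frame.len',
          'eth.src', 'eth.dst', 'eth.type', 'ip.src', 'ip.dst',
          'tcp.srcport', 'tcp.dstport', 'tcp.window_size', 'tcp.len',
          'iscsi.request_frame', 'iscsi.response_frame', 'iscsi.time']

def convert(pcap_file, text_file):
    # -T fields emits one line per frame with the requested fields,
    # tab-separated by default
    cmd = ['tshark', '-r', pcap_file, '-T', 'fields']
    for field in FIELDS:
        cmd += ['-e', field]
    with open(text_file, 'w') as out:
        subprocess.check_call(cmd, stdout=out)

if __name__ == '__main__':
    convert(sys.argv[1], sys.argv[2])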
4.1.3
User Queries
User interaction with our tool is through a command line interface (CLI) and a web interface. We assume that network trace analysis jobs will be submitted using the CLI, as it allows scripting. The web interface allows users to run queries on the database of performance results generated by analyzing one or more trace files. A user query can consist of:
• A selection between iSCSI performance data or network statistics
• A request to extract performance data based on a calendar day
Our design is very flexible and can be extended to support a richer user interface, with
more statistics and user queries.
4.2
Data Format
The data format that we use in our reference implementation is intentionally simple. This keeps the parsing complexity to a minimum, while retaining all the information required for computing protocol-aware performance statistics for each flow in the network trace. The requirement of a text-based input format was partially imposed by the map-reduce framework.
The input text files consist of a list of records, one record per line. Each record contains the iSCSI protocol and network statistics related fields extracted from the network trace. Table 4.1 lists each field in the record with a brief description. The fields in Table 4.1 include fields from the Ethernet, TCP/IP and iSCSI protocol headers. The iSCSI headers relevant to our work were summarized in Chapter 2.
Table 4.1 Fields in a Network Trace Record
frame.time_relative - Relative time of arrival with respect to the previous frame
frame.number - Frame serial number, starting from the first frame captured
frame.len - Length of the captured frame
eth.src - Ethernet source MAC address
eth.dst - Ethernet destination MAC address
eth.type - Ethernet type
ip.src - IP source address for this frame
ip.dst - IP destination address for this frame
tcp.srcport - TCP source port for this frame
tcp.dstport - TCP destination port for this frame
tcp.window_size - TCP window size advertised in this frame
tcp.analysis.retransmission - If the frame was a retransmission
tcp.analysis.window_update - If the remote node is updating a closed window
tcp.analysis.window_full - The window of the sender of this frame is closed
tcp.len - Length of the TCP payload
tcp.analysis.bytes_in_flight - Number of bytes sent but not acknowledged
tcp.options.wscale - TCP window scale enabled
tcp.options.wscale_val - TCP window scale value
iscsi.request_frame - If this frame is carrying an iSCSI request
iscsi.response_frame - If this frame is a response to a previous iSCSI request
iscsi.time - Time from request
iscsi - Frame belongs to an iSCSI exchange
iscsi.scsicommand.R - PDU carries a read command
iscsi.scsicommand.W - PDU carries a write command
iscsi.initiatortasktag - Initiator Task Tag
As a part of this project we calculate various flow-based statistics and also present a summary of the flows selected by a user query. There are two types of statistics: counters and gauges. For all counters, we calculate the total, minimum, maximum and average values for each selected flow. The summary at the end shows, for each counter, the cumulative, minimum, maximum and average values across the selected flows. IP packets, IP bytes, broadcast packets, multicast packets, unicast packets, frame size counters, iSCSI reads, iSCSI read bytes, iSCSI writes, iSCSI write bytes and iSCSI packets are counters. The minimum, maximum and average command response times and the minimum, maximum and average window sizes are gauges. All the statistics are explained below:
IP Packets: This counter shows the number of IP packets captured in a flow. The average value is calculated as the total number of IP packets divided by the number of flows in the result.
IP Bytes: This counter represents the total bytes sent or received in a flow. IP bytes is the sum of the packet sizes of all IP packets.
Broadcast Packets, Multicast Packets, and Unicast Packets: These three counters show the number of broadcast, multicast and unicast packets, respectively.
Frame Size Counters: Packet size is an important factor in understanding network performance. We consider four frame size categories: frames of 64 bytes or less (a typical Ethernet packet), frames between 64 and 256 bytes, frames between 256 and 1500 bytes, and frames between 1500 and 9000 bytes.
Minimum iSCSI Command Response Time: The iSCSI command response time is the time between when the initiator issues a command to a target and when the target successfully processes the command and returns a response to the initiator. The minimum command response time is the shortest time taken by any command of a flow. In the summary, the minimum value shows the shortest time taken by a command across all selected flows.
Maximum iSCSI Command Response Time: This value shows the longest response time among all commands of a flow. The summary value shows the longest time taken by a command across all selected flows.
Average iSCSI Command Response Time: The average response time for a flow is defined as the total response time of all tasks of the flow divided by the number of tasks of the flow. For the summary, this value is calculated as the total response time of all tasks of all selected flows divided by the total number of tasks of all selected flows.
Minimum Window Size, Maximum Window Size: The TCP window size is the window size advertised in the TCP packet header.
Average Window Size: The average window size for a flow is calculated as the total of the window sizes of all packets divided by the total number of TCP packets. The average window size for all selected flows is calculated as the total of the window sizes of all packets of all selected flows divided by the total number of packets of all selected flows.
iSCSI Reads and iSCSI Read Bytes: This counter shows the number of iSCSI read commands present in a flow. iSCSI read bytes shows how many bytes are read from the iSCSI target storage across all read commands.
iSCSI Writes and iSCSI Write Bytes: This counter shows the number of iSCSI write commands present in a flow. iSCSI write bytes shows how many bytes are written to the iSCSI target storage across all write commands.
iSCSI IP Packets: This counter shows how many packets in a flow are iSCSI packets, and, in the summary, how many iSCSI packets appear across all selected flows.
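To make the distinction between counters and gauges concrete, the following sketch shows how per-flow values of both kinds could be accumulated and then rolled up into a summary across the selected flows. The class and method names are illustrative and are not the actual Netdoop code.

# sketch: per-flow counters and gauges, plus a cross-flow summary
class FlowStats(object):
    def __init__(self):
        self.ip_packets = 0        # counter
        self.ip_bytes = 0          # counter
        self.min_window = None     # gauge
        self.max_window = None     # gauge
        self.window_total = 0
        self.tcp_packets = 0

    def add_frame(self, frame_len, window_size):
        self.ip_packets += 1
        self.ip_bytes += frame_len
        self.tcp_packets += 1
        self.window_total += window_size
        if self.min_window is None or window_size < self.min_window:
            self.min_window = window_size
        if self.max_window is None or window_size > self.max_window:
            self.max_window = window_size

    def avg_window(self):
        return float(self.window_total) / self.tcp_packets

def summarize(flows):
    # counters are summed across flows; the average counter value is the
    # total divided by the number of flows, as described above
    total_packets = sum(f.ip_packets for f in flows)
    return {'total_ip_packets': total_packets,
            'avg_ip_packets': float(total_packets) / len(flows),
            'min_window': min(f.min_window for f in flows),
            'max_window': max(f.max_window for f in flows)}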
4.3
Fundamental Architecture
This section describes the fundamental architecture of Netdoop, a distributed protocol-aware network trace-processing tool that we developed, leveraging the features of a map-reduce framework. The map-reduce framework that we selected for our work is covered
in the next section.
Figure 4.1 shows a typical setup for capturing network traces, as described earlier in Chapter 2. The switch is configured to mirror a port of interest to another port that is
connected to a network analyzer. The network analyzer captures all the traffic that goes
through the mirrored port.
Figure 4.1 Setup to Capture Network Traces
The network analyzer captures a linear list of frames as they appear on the mirrored port.
In the example above, the mirrored port is connected to an iSCSI target. The servers on
the left of the figure share the storage attached to the iSCSI target. Hence the collected
trace can have many flows carrying storage commands and data between the servers and
the iSCSI target.
Figure 4.2 Parallel Processing Architecture of Netdoop
Figure 4.2 shows the fundamental architecture of our processing pipeline. The network
trace is a linear list of packets captured on the wire. The packets belong to different
protocols or flows. Unlike the word count example presented earlier, the network trace
file cannot be split arbitrarily into chunks for parallel processing. Splitting the file into
chunks without taking the protocol flows into consideration will result in the undesirable
side effect of re-ordering the frames.
Instead, we identify the network and protocol flows in the trace, group the frames
belonging to a flow and treat the group as a chunk. These flow-aware chunks can be
processed concurrently on a set of computers (called workers) connected over a network.
Because all the frames that comprise a flow are together, the frames belonging to a
particular flow are processed in-order. Components of our Netdoop software run on each
server shown in Figure 4.2. Once each server processes a flow-aware chunk assigned to
it, a collection server collects the partial performance results. On the collection server, the
performance metrics are merged and formatted, and the results are stored in a database.
Users interact with the system using a command line interface as well as a web interface.
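The grouping step hinges on deriving a stable flow key for each frame so that all frames of one conversation land in the same chunk. A minimal sketch of such a key function is shown below; the record layout and key format are illustrative, and the actual key used by Netdoop is defined by its flow table implementation.

# sketch: derive a flow key so frames of one conversation stay together
def flow_key(record):
    # an iSCSI flow is identified here by its source/destination IP pair;
    # TCP ports could be added to the key for finer-grained flows
    return '%s-%s' % (record['ip.src'], record['ip.dst'])

# frames emitted under this key are grouped together by the map-reduce
# sort phase, so each worker receives whole flows and processes the
# frames of a flow in order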
The following is a summary of functions performed by Netdoop:
• Segment the network trace into protocol and flow-aware chunks
• Process each chunk concurrently with other chunks, extracting protocol header data
• Perform simple computations on the header data to demonstrate concurrent processing
• Collect partial results from concurrent processing and compile the results
• Store the results in a database
• Provide a web-based user interface for querying the database
This section introduced the fundamental architecture of Netdoop, a tool that we
developed for processing large network traces using the map-reduce model. The next
section describes the map-reduce framework used, and how the fundamental architecture
that we described in this section is mapped onto the map-reduce framework.
4.4
Framework Based Architecture
This section presents a detailed description of how we map the processing pipeline
described in the previous section to a map-reduce framework. We also present key
decisions that we had to make in the selection of a map-reduce framework, and
summarize its functions, advantages and disadvantages.
4.4.1
Apache Hadoop – A Map-Reduce Framework
We have discussed the map-reduce model and how it works in the earlier chapters. While
the map-reduce concept is very simple, implementing it for at-scale processing is
complex. Popular map-reduce frameworks generally provide the following features:
• Segment the input data set
• Schedule segments of input data to servers for processing
• Implement load balancing algorithms to schedule tasks appropriately
• Implement fault tolerance mechanisms to protect data against machine failures
• Implement a distributed file system to provide uniform access to data across the map-reduce cluster
• Provide a job tracker to report status as well as restart failed workers
There are a number of frameworks that take different approaches to implement the above
features. We choose Apache Hadoop as it is a very popular open source map-reduce
framework widely deployed in the industry. Using a framework well known in the
industry makes our work more relevant and applicable to real data center use cases.
Apache Hadoop is a production-ready map-reduce framework developed and maintained
by the Apache Foundation [14].
The servers play three major roles in the Hadoop map-reduce framework:
• Master nodes
• Slave nodes
• Client machines
Figure 4.3 Components of Hadoop Map-reduce Framework [13]
Figure 4.3 shows the different types of nodes in a Hadoop system. The master nodes oversee the two key functional pieces that make up Hadoop: the name node coordinates the Hadoop Distributed File System (HDFS), and the job tracker coordinates the processing of data. The slave nodes form the majority of machines and process the actual data. Each slave node runs two services: the data node daemon and the task tracker daemon. The data node daemon is a slave to the name node, while the task tracker daemon is a slave to the job tracker. Client machines, though installed with Hadoop software, are neither masters nor slaves. The client machines are typically used to load data onto HDFS and to copy data and results out of the file system.
4.4.1.1 Typical Hadoop Workflow
A typical workflow for processing data using Hadoop is as follows:
• Load data into the cluster (HDFS writes)
• Analyze the data (map-reduce)
• Store results in the cluster (HDFS writes)
• Read the results from the cluster (HDFS reads)
4.4.1.2 Fault Tolerance and Load Balancing
This section briefly describes some of the mission-critical features provided by Hadoop.
A detailed description of these features is beyond the scope of this project.
When data is loaded into the cluster (HDFS writes), HDFS replicates the data to multiple machines in the cluster, making the data highly available in the event of hardware failures. The job tracker on a master node, along with the task trackers on each slave machine, ensures that the processing workload is distributed efficiently based on the number of slave machines available in the cluster.
4.4.2
Netdoop and Hadoop
Netdoop is a protocol and flow-aware distributed network trace-analysis tool that we
developed as a part of this project. Netdoop uses the features provided by the Hadoop framework, such as the Hadoop Distributed File System (HDFS) and task scheduling
capabilities.
4.4.2.1 Netdoop Workflow
The Netdoop tool works as follows, leveraging features provided by Hadoop:
• Netdoop software is installed on the Hadoop cluster
• A network trace in the format described in Section 4.2 is used as input
• The network trace is loaded onto the Hadoop cluster (HDFS writes)
• Hadoop calls the map function implemented by Netdoop
• The map function implemented by Netdoop parses the network trace for flow information, compiles protocol and flow-aware key-value pairs, and hands the key-value pairs off to Hadoop for scheduling
• The Hadoop job tracker schedules the key-value pairs to slave nodes in the cluster for processing
• Hadoop calls the processing components of Netdoop on each slave node
• Netdoop processes the key-value pairs scheduled to that slave node and generates result key-value pairs; the key-value pairs produced in this step contain performance metrics for the protocol being analyzed
• Hadoop calls the reduce function implemented by Netdoop and schedules the reduce operation on selected slave nodes
• The Netdoop reduce functions compile the final performance metrics and write the data to the cluster's file system (HDFS)
• The performance data is stored in a MySQL database
• Users load and query the performance data in the database using the web interface
In this chapter we presented some of the assumptions related to our work. We also
described the design of the data processing pipeline of Netdoop and how Netdoop leverages
the Hadoop map-reduce framework infrastructure. We study the actual implementation of
Netdoop in the next chapter.
Chapter 5
IMPLEMENTATION
This chapter presents the implementation of the Netdoop tool that we developed. We start by
presenting an overview of the virtual data center environment that we created to deploy,
test and demonstrate the distributed processing of network traces on a cluster of virtual
servers. Next we present an overview of the modules and source code structure of
Netdoop. We then show sample performance metrics that were generated by processing
network traces using Netdoop.
5.1
Prototype Environment
This section describes the virtual data center environment that was created for this
project. Map-reduce frameworks are parallel processing frameworks that work well in a
compute cluster environment. Input data is split into chunks and processed concurrently
on multiple computers in a cluster. To build and maintain a cluster for the purpose of this
project is complex, expensive and time consuming. Hence we use server virtualization
software to partition a server into multiple virtual servers, and then build a virtual cluster
using virtual servers. Specifically, we use VMware Fusion on an Apple Mac to host these
virtual servers. The number of virtual servers we use in a cluster is limited by the amount of memory and the number of processing cores available on the computer used for implementing, testing and demonstrating our work as a part of this project.
5.1.1
Virtual Data Center
Figure 5.1 Virtual Data Center Environment
Figure 5.1 shows the virtual data center environment we created for the purpose of this
project. We use VMware Fusion virtualization software to instantiate four virtual
machines on a laptop computer. The number of virtual machines we could instantiate was
limited by the number of CPU cores and memory available on the laptop computer.
We use Ubuntu 10.04 as the operating system in each of the virtual machines. The
machines are labeled as master, slave 1, slave 2 and slave 3 based on their roles in the
Hadoop environment. The same environment was also used to create an iSCSI initiator and target configuration to generate sample network traces containing iSCSI protocol traffic. Each virtual machine is configured with two network interfaces. One network interface is connected to the virtual switch provided by the virtualization software; this interface allows the virtual machines to communicate with each other. Virtual machines on this private network are assigned static IP addresses as shown in Figure 5.1. The other network interface is connected to the laptop's networking stack so that applications installed on the laptop computer can communicate with the virtual machines. This network interface uses a dynamic IP address.
The next two sections show the configuration of this virtual data center for 1) collecting network traces with iSCSI protocol traffic and 2) installing Netdoop and Hadoop to deploy and run the tool on network trace samples.
5.1.2
Virtual iSCSI Environment
Figure 5.2 Virtual iSCSI Environment
Figure 5.2 shows the virtual data center environment as configured to generate and collect sample network traces with iSCSI traffic. Slave nodes 1-3 are configured with iSCSI initiator software, and the master node is configured as an iSCSI target. The iSCSI target is configured with local virtual disks and LUNs. On successful logins from the iSCSI initiators to the iSCSI target, these LUNs are exposed to the initiators as block storage devices. We perform a number of file operations on these disks from the iSCSI initiators to generate sufficient network and iSCSI traffic to the iSCSI target.
We install Wireshark, an open source network capture utility, on the iSCSI target node to capture the traffic on the iSCSI target's network interface connected to the virtual switch. We use Wireshark to generate the text version of the PCAP file. The network traffic captured on this port of the iSCSI target contains the traffic between the slave/iSCSI initiator nodes and the iSCSI target.
5.1.3
Virtual Map-Reduce Cluster
Figure 5.3 Virtual Map-reduce Environment
Figure 5.3 shows the configuration of the virtual map-reduce infrastructure that we build using the virtual data center environment described earlier. In this configuration, the master virtual machine fills the role of a master node in the Hadoop framework and runs the name node and job tracker services of Hadoop. We also install Netdoop on the master node and hence onto HDFS. We use the Python web framework web.py to implement a web application server, and develop a web application on top of it to provide a web-based user interface. We also use a MySQL database installed on the master node to store the performance metrics computed during the analysis of a network trace.
The slave nodes run the data node and task tracker services of the Hadoop framework. Netdoop resides on the distributed file system (HDFS) and hence is available on all the slave nodes.
5.2
Netdoop
This section describes the implementation of the Netdoop tool. We use the Python programming language to implement Netdoop. We chose Python partly because it is an excellent object-oriented programming language and partly because of the limitations imposed by the Hadoop map-reduce framework.
5.2.1
Structure
This section summarizes the functionality of each source file of Netdoop.
5.2.1.1 mapper.py
This is the main file that implements the mapper function called by the Hadoop
framework. In this file we create an instance of the FlowTable class, the flow_table
object. We then call flow_table.process_line(line) where line is a line read from stdin. We
use stdin because Hadoop streams the input data to the mapper function over the stdin
interface. We then call flow_table.compute() to generate the partial output key-value
pairs of the network statistics gathered from the network trace data received over stdin.
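In outline, mapper.py is a thin driver around the FlowTable class. The sketch below shows the general shape of such a Hadoop-streaming mapper; details are simplified relative to the actual source.

# mapper.py (simplified sketch)
import sys
from FlowTable import FlowTable

def main():
    flow_table = FlowTable()
    # Hadoop streaming delivers the assigned input split line by line on stdin
    for line in sys.stdin:
        flow_table.process_line(line)
    # emit the partial, per-flow key-value pairs on stdout for the sort phase
    flow_table.compute()

if __name__ == '__main__':
    main()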
5.2.1.2 FlowTable.py
This file implements the FlowTable class. The FlowTable class is a container for a list of IscsiFlow objects, which are instances of the IscsiFlow class. We use the IscsiFlowEntry class to parse and qualify each entry received over stdin, and then create an IscsiFlow object from the IscsiFlowEntry. We make an entry in the FlowTable only if an entry does not already exist for that flow. The compute method of the FlowTable class generates key-value pairs for the sort phase of the map-reduce pipeline.
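A skeleton of the FlowTable container might look as follows. The constructor signature and the helper names is_valid(), flow_key(), add_entry() and emit() are illustrative; the real class also validates records and handles network-level statistics, which are omitted here.

# FlowTable.py (simplified skeleton)
from IscsiFlowEntry import IscsiFlowEntry
from IscsiFlow import IscsiFlow

class FlowTable(object):
    def __init__(self):
        self.flows = {}                    # flow key -> IscsiFlow

    def process_line(self, line):
        entry = IscsiFlowEntry(line)       # parse and qualify the record
        if not entry.is_valid():           # hypothetical validity check
            return
        key = entry.flow_key()             # hypothetical key helper
        if key not in self.flows:          # create the flow on first sight
            self.flows[key] = IscsiFlow(entry)
        self.flows[key].add_entry(entry)   # hypothetical add helper

    def compute(self):
        # print key-value pairs consumed by the sort and reduce phases
        for key, flow in self.flows.items():
            flow.emit(key)                 # hypothetical emit helper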
5.2.1.3 IscsiFlow.py
In this file we implement the IscsiFlow class. This class encapsulates the properties
related to a particular iSCSI flow. The properties include ip_src, ip_dst, tcp_src, tcp_dst,
task_table, min_response_time, max_response_time, num_valid, num_reads, num_writes,
min_window_sz, max_window_sz and frame_cnt. An iSCSI flow in this context is an
iSCSI exchange between two nodes defined by the IP.src:IP.dst tuple. The class provides
methods to add flow entries to the flow. The flow entries are records from the trace file
received over the stdin interface.
5.2.1.4 IscsiFlowEntry.py
In this file we implement the IscsiFlowEntry class. This class encapsulates the following properties of a flow entry: frame_time_relative, frame_number, ip_src, ip_dst, tcp_src, tcp_dst, tcp_window_size, tcp_analysis_retransmission, tcp_analysis_window_update, tcp_analysis_window_full, tcp_len, tcp_analysis_bytes_in_flight, tcp_options_wscale, tcp_options_wscale_val, iscsi_request_frame, iscsi_response_frame, iscsi_time and iscsi_cmd_string. This class is also responsible for parsing the records from the text format trace file. We implement a number of private methods within this file that help validate and parse iSCSI flow information from the network trace.
5.2.1.5 IscsiTask.py
In this file we implement the IscsiTask class. The IscsiTask class encapsulates the properties of a single iSCSI task, such as the task associated with a command. This class contains the
following properties: cmd_str, rd_str, wr_str, is_valid, itt, iscsi_task_entries, base_time,
is_response_time_valid, task_min_response_time, task_max_response_time, is_read,
is_write. This class includes private methods to serve as helper functions in computing
statistics and performance metrics for an iSCSI task.
5.2.1.6 NetStatistics.py
In this file we implement all the routines required to calculate MIB statistics such as the number of broadcast packets, multicast packets, unicast packets, the total number of packets and the total bytes sent or received. We also check the frame size and maintain a counter for each frame size category. We currently use four frame size ranges: frame size less than 64 bytes, between 64 and 256 bytes, between 256 and 1500 bytes, and any size greater than 1500 bytes.
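The frame-size bucketing can be expressed compactly, as in the sketch below. The bucket labels and counter layout are illustrative, and the exact boundary handling (Section 4.2 uses "less than or equal to 64 bytes") may differ slightly in the actual code.

# sketch: maintain counters for the four frame size categories
from collections import defaultdict

frame_size_counters = defaultdict(int)

def count_frame(frame_len):
    if frame_len <= 64:
        bucket = '<=64B'
    elif frame_len <= 256:
        bucket = '65-256B'
    elif frame_len <= 1500:
        bucket = '257-1500B'
    else:
        bucket = '1501-9000B'
    frame_size_counters[bucket] += 1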
5.2.1.7 reducer.py
In this file we implement the reducer function called by the Hadoop framework. The
reducer function is responsible for aggregating the partial results collected and sorted by
Hadoop. Once again Hadoop passes these intermediate results as key-value pairs over the
stdin interface. Therefore reducer.py reads its input from stdin.
The implementation of the reducer is very similar to the mapper, in that we use a
ReducerFlowTable class to capture the partial results of the iSCSI flows generated by the
map phase of Hadoop. We call the reducer_flow_table.process_line(line) method of the
ReducerFlowTable class, where line is the key-value pair that we receive on stdin. We
then call reducer_flow_table.report() to dump the final results generated by compiling
the partial results from the map phase.
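The reducer mirrors the mapper; a simplified sketch of its shape is shown below, with details omitted relative to the actual source.

# reducer.py (simplified sketch)
import sys
from ReducerFlowTable import ReducerFlowTable

def main():
    reducer_flow_table = ReducerFlowTable()
    # Hadoop sorts the mapper output by key and streams it to stdin
    for line in sys.stdin:
        reducer_flow_table.process_line(line)
    # compile and print the final per-flow performance metrics
    reducer_flow_table.report()

if __name__ == '__main__':
    main()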
5.2.1.8 ReducerFlowTable.py
In this file we implement the ReducerFlowTable class. This class is a container for
objects of the ReducerFlow class. The only property of this class is flow_table, which is a Python dictionary used to store the flow table entries. The ReducerFlowTable class implements
two methods. The process_line() method parses the incoming key-value pairs and makes
entries in the flow_table structure. The report() method generates a report of the
performance metrics processed by the reducer.
5.2.1.9 ReducerFlow.py
In this file we implement the ReducerFlow class. This class encapsulates the following properties corresponding to an iSCSI flow: key, min_response_time, max_response_time, min_window_size, max_window_size, iscsi_frame_count, num_reads, num_writes. This class also implements the process_value(value) method, which extracts the fields from the key-value record received over the stdin interface after the map phase.
5.2.1.10 ReducerNetStates.py
In this file we implement reduce functions for all MIB statistics.
5.2.1.11 web2netdoop.py
In this file we implement the logic for the web interface. We leverage the features provided by web.py, a popular web application framework for Python. We implement interface functions that connect to the MySQL database and extract data. We also implement functions that format the data retrieved from the database and present it to the browser as HTTP responses.
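For reference, a minimal web.py application with a single query handler is sketched below. The URL pattern, database credentials, table name and column names are placeholders, not those of the actual deployment.

# web2netdoop.py (simplified sketch)
import web

urls = ('/flows', 'Flows')
app = web.application(urls, globals())

# placeholder connection parameters
db = web.database(dbn='mysql', db='netdoop', user='netdoop', pw='secret')

class Flows:
    def GET(self):
        # fetch per-flow metrics from a placeholder table and render them
        rows = db.select('flow_stats')
        html = ['<table>']
        for row in rows:
            html.append('<tr><td>%s</td><td>%s</td></tr>'
                        % (row.flow_key, row.avg_response_time))
        html.append('</table>')
        return '\n'.join(html)

if __name__ == '__main__':
    app.run()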
5.2.2
Execution
This section describes the user interaction with Netdoop. Specifically, we describe the
commands to copy the network trace to the cluster, as well as to submit the network trace
for processing. We then describe the web-based user interface for querying the
performance results stored in the database.
5.2.2.1 Copying the network trace to the cluster
>bin/hadoop dfs -copyFromLocal network_trace.txt /user/user/network_trace.txt
5.2.2.2 Submitting the network trace for processing
>bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -file netdoop/mapper.py -mapper mapper.py -file netdoop/reducer.py -reducer reducer.py -file netdoop/IscsiFlow.py -file netdoop/IscsiFlowEntry.py -file netdoop/IscsiTask.py -file netdoop/ReducerFlow.py -file netdoop/ReducerFlowTable.py -input /user/user/network_trace.txt -output /user/user/results.txt
5.2.2.3 Copying data from the cluster
>bin/hadoop dfs -getmerge /user/user/results.txt results.txt
5.2.2.4 Web-based Interface
Figure 5.4 Netdoop Web-based User Interface
Figure 5.4 shows a screen capture of the web-based user interface of Netdoop. The web-based interface allows users to query the performance metrics stored in the database. For this reference implementation we support a minimal set of queries, such as selecting a flow defined by an IP address pair, a date range, or a time range. However, our architecture is extensible and can easily be extended to support more complex queries.
Figure 5.5 Select All From Table Query Result
Figure 5.5 shows an example screen capture of a query that selects all data from the table. First, all flows selected by the query are displayed with an expandable control. After all flows, a summary of the statistics for the search is displayed.
Figure 5.6 Flow Statistics Calculated by Netdoop
Netdoop calculates iSCSI protocol related statistics and network statistics for each flow.
Figure 5.6 shows statistics calculated per flow in detail.
Figure 5.7 Search Summary Statistics
Figure 5.7 shows an example of the summary of a search result in detail.
In this chapter we presented a detailed description of the implementation of Netdoop. We
described the functions implemented by each source file. We also presented the
command line interface, the commands to submit a network trace for processing as well
as the web-based user interface. In the next chapter, we present our conclusions and
future work.
Chapter 6
CONCLUSION AND FUTURE WORK
In this chapter we draw a conclusion about the suitability of the map-reduce model and its frameworks for analyzing large network traces and extracting performance metrics of network protocols. We also suggest future work that can be done based on the results of our work.
6.1
Conclusion
We studied the applicability of the map-reduce model for processing network traces. Processing a large network trace using a parallel programming model involves multiple operations. These operations include identifying and grouping the network flows present in the network trace, sorting the groups, scheduling the groups for processing on a cluster of processing nodes, and aggregating the results. The map-reduce model has been found to be a good match for scheduling the workload and providing a distributed storage infrastructure accessible across the cluster. However, general-purpose map-reduce frameworks are not optimal at partitioning network traces or at aggregating the results, because of the ordering requirements and the protocol awareness required for these steps.
We implemented Netdoop, a tool that leverages some of the useful features of Hadoop, a general-purpose map-reduce framework that is very popular in the industry. Netdoop adds protocol and flow awareness to Hadoop, allowing us to use the scheduling, HDFS and fault-tolerance features that Hadoop offers. Specifically, we observed that breaking up a network trace into small flow-based chunks and processing all the chunks in parallel produces results faster than executing the same task on a single machine. We also verified that frame ordering is maintained and that the computed performance statistics were as expected.
To summarize, Netdoop brings scalability through parallel processing to the analysis of
network traces. While our study and implementation showed what was possible using the
map-reduce model, our work needs to be extended before it can be ready for real-world
usage.
6.2
Future Work
There is clearly a need to improve the workflow, starting from the network trace in PCAP format through to storing the results in the database. One improvement would be to write a protocol-aware scheduler plug-in for Hadoop that avoids the need for a text-formatted input file. Another improvement would be to use a more modern real-time map-reduce framework such as Yahoo S4. Using a real-time map-reduce engine would allow the network trace to be continuously streamed through the processing software without first saving it to disk and processing it later, thus improving upon the batch-processing nature of our existing implementation.
BIBLIOGRAPHY
[1] J. Dean, and S. Ghemawat, “MapReduce: Simplified data processing on large
clusters,” Communications of the ACM, 51, 2008, ACM, pp. 107-113.
[2] Fibre Channel Backbone - 5 (FC-BB-5).
[3] Cisco Systems Inc. Introduction to Cisco IOS NetFlow.
http://www.cisco.com/en/US/prod/collateral/iosswrel/ps6537/ps6555/ps6601/
prod_white_paper0900aecd80406232.pdf
[4] Google Inc. Introduction to Parallel Programming and MapReduce.
http://code.google.com/edu/parallel/mapreduce-tutorial.html
[5] IPCopper. IPCopper ITR1000 Packet Capture Appliance.
http://www.ipcopper.com/product_itr1000.htm
[6] JDSU. Xgig:Powerful Hardware-Based Ethernet/iSCSI Analyzer for SAN And NAS.
[7] Peter Phaal. sFlow Version 5. http://www.sflow.org/sflow_version_5.txt
[8] J. Satran et al. Internet Small Computer Systems Interface (iSCSI).
http://www.rfc-editor.org/rfc/pdfrfc/rfc3720.txt.pdf
[9] Wikipedia. Converged Network Adapter (CNA).
http://en.wikipedia.org/wiki/Converged_network_adapter
[10] WIRESHARK. PCAP File Format.
http://wiki.wireshark.org/Development/LibpcapFileFormat
[11] WIRESHARK. WIRESHARK: The World's Foremost Network Protocol Analyzer.
http://www.wireshark.org/
[12] Xgig Software. http://www.sanhealthcheck.com/files/Finisar/Xgig/81Z6AN-Xgigiscsi-analyzer-data-sheet.pdf
[13] Understanding Hadoop cluster and network.
http://bradhedlund.com/2011/09/10/understanding-hadoop-cluster-and-the-network
[14] Apache Hadoop.
http://hadoop.apache.org
[15] NetFlow Analyzer Software.
http://networkmanagementsoftware.com/5-free-tools-for-windows