NETDOOP – DISTRIBUTED PROTOCOL-AWARE NETWORK TRACE ANALYZER

Jankiben Patel
B.E., Hemchandracharya North Gujarat University, India, 2005

PROJECT

Submitted in partial satisfaction of the requirements for the degree of MASTER OF SCIENCE in COMPUTER SCIENCE at CALIFORNIA STATE UNIVERSITY, SACRAMENTO

SPRING 2012

NETDOOP – DISTRIBUTED PROTOCOL-AWARE NETWORK TRACE ANALYZER
A Project by Jankiben Patel

Approved by:
Dr. Jinsong Ouyang, Committee Chair
Dr. Chung-E Wang, Second Reader

Student: Jankiben Patel

I certify that this student has met the requirements for format contained in the University format manual, that this project is suitable for shelving in the Library, and that credit is to be awarded for the Project.

Dr. Nikrouz Faroughi, Graduate Coordinator
Department of Computer Science

Abstract of NETDOOP – DISTRIBUTED PROTOCOL-AWARE NETWORK TRACE ANALYZER
by Jankiben Patel

Modern data centers house hundreds to hundreds of thousands of servers. As the complexity of data centers increases, it becomes very important to monitor the network for capacity planning, performance analysis and security enforcement. Sampling and analyzing the network data helps data center administrators plan and tune for optimum application performance. But as data centers grow, the sampled data sets become very large. We study the application of the map-reduce model, a parallel programming technique, to process and analyze these large network traces. Specifically, we analyze the network traces for iSCSI performance and network statistics. We design and implement a prototype of a protocol-aware network trace-processing tool called Netdoop. This prototype functions as a reference design. We also implement a farm of virtual servers and iSCSI targets to create an environment that represents a small data center. We use this virtual environment to collect data sets and demonstrate the scalability of the prototype. Further, these virtual servers are also used to host and run the map-reduce framework. Based on the performance and scalability of the tool that is developed, we draw conclusions about the applicability of the map-reduce model for analyzing large network traces.

Dr. Jinsong Ouyang, Committee Chair

ACKNOWLEDGEMENTS

I wish to extend my sincere gratitude to my project guide, Dr. Jinsong Ouyang, for his invaluable guidance, support and timely feedback at every stage of my project research and implementation. Dr. Ouyang has been an immensely supportive mentor throughout my graduate studies. Without his support, I would not have completed this project. Dr. Ouyang's knowledge in the fields of computer networking protocols and distributed systems helped me solve the complex tasks of this project. I take this opportunity to thank Dr. Chung-E Wang, my second reader, for helping me draft my project report and for providing his expertise, advice and positive feedback. I am highly grateful to Dr. Nikrouz Faroughi and Dr. Cui Zhang for reviewing my project report and providing valuable suggestions. I also wish to thank my manager, Sanjeev Datla, at Emulex Corporation for supporting me as needed and providing valuable feedback.

TABLE OF CONTENTS

Acknowledgments
List of Tables
List of Figures

1 INTRODUCTION
   1.1 Motivation
   1.2 Requirements
      1.2.1 Scalability
      1.2.2 Processing Time
      1.2.3 Operations
   1.3 Report Organization
2 BACKGROUND
   2.1 Network Traffic Analysis
      2.1.1 Sample Based Analysis
      2.1.2 Network Trace-based Analysis
   2.2 iSCSI Protocol
      2.2.1 Server I/O and Convergence
         2.2.1.1 Direct Attached Storage
         2.2.1.2 Shared Storage
         2.2.1.3 I/O Convergence
      2.2.2 iSCSI
         2.2.2.1 Overview
         2.2.2.2 iSCSI Definitions
         2.2.2.3 Flow Concepts
         2.2.2.4 iSCSI Protocol Headers
         2.2.2.5 Summary
   2.3 The Map-reduce Model
      2.3.1 Serial vs. Parallel Programming
      2.3.2 Map-reduce
3 METHODOLOGY
   3.1 Overview
   3.2 Design
   3.3 Framework Evaluation
   3.4 Implementation
   3.5 Prototyping Environment
   3.6 Analysis and Conclusion
4 DESIGN
   4.1 Assumptions
      4.1.1 Virtual Infrastructure
      4.1.2 Input Data
      4.1.3 User Queries
   4.2 Data Format
   4.3 Fundamental Architecture
   4.4 Framework Based Architecture
      4.4.1 Apache Hadoop – A Map-reduce Framework
         4.4.1.1 Typical Hadoop Workflow
         4.4.1.2 Fault Tolerance and Load Balancing
      4.4.2 Netdoop and Hadoop
         4.4.2.1 Netdoop Workflow
5 IMPLEMENTATION
   5.1 Prototype Environment
      5.1.1 Virtual Data Center
      5.1.2 Virtual iSCSI Environment
      5.1.3 Virtual Map-reduce Cluster
   5.2 Netdoop
      5.2.1 Structure
         5.2.1.1 mapper.py
         5.2.1.2 FlowTable.py
         5.2.1.3 IscsiFlow.py
         5.2.1.4 IscsiFlowEntry.py
         5.2.1.5 IscsiTask.py
         5.2.1.6 NetStatistics.py
         5.2.1.7 reducer.py
         5.2.1.8 ReducerFlowTable.py
         5.2.1.9 ReducerFlow.py
         5.2.1.10 ReducerNetStates.py
         5.2.1.11 web2netdoop.py
      5.2.2 Execution
         5.2.2.1 Copying the network trace to the cluster
         5.2.2.2 Submitting the network trace for processing
         5.2.2.3 Copying data from the cluster
         5.2.2.4 Web-based Interface
6 CONCLUSION AND FUTURE WORK
   6.1 Conclusion
   6.2 Future Work
Bibliography

LIST OF TABLES

Table 4.1 Fields in a Network Trace Record

LIST OF FIGURES

Figure 2.1 Netflow-enabled Device
Figure 2.2 Screenshot of Netflow Analyzer Software
Figure 2.3 Finisar Xgig Network Analyzer Appliance
Figure 2.4 Finisar Xgig Software
Figure 2.5 Wireshark - Open Source Network Analyzer
Figure 2.6 A Setup to Capture Network Traces
Figure 2.7 PCAP File Format
Figure 2.8 Server with Direct Attached Storage
Figure 2.9 Server Connected to a Storage Area Network
Figure 2.10 Server with a Converged Network Adapter
Figure 2.11 Server Directly Attached to a SCSI Disk
Figure 2.12 Servers Connected to Shared Storage Using iSCSI
Figure 2.13 iSCSI Initiator and Target Stacks
Figure 2.14 iSCSI Protocol Data Unit (PDU)
Figure 2.15 iSCSI Login PDU
Figure 2.16 iSCSI Login Response PDU
Figure 2.17 iSCSI SCSI Command PDU
Figure 2.18 iSCSI SCSI Response PDU
Figure 2.19 iSCSI Task Management Function Request PDU
Figure 2.20 iSCSI Task Management Function Response PDU
Figure 2.21 iSCSI Data-out PDU
Figure 2.22 iSCSI Data-in PDU
Figure 2.23 iSCSI R2T PDU
Figure 2.24 iSCSI Asynchronous PDU
Figure 2.25 The Map-reduce Model
Figure 4.1 Setup to Capture Network Traces
Figure 4.2 Parallel Processing Architecture of Netdoop
Figure 4.3 Components of Hadoop Map-reduce Framework
Figure 5.1 Virtual Data Center Environment
Figure 5.2 Virtual iSCSI Environment
Figure 5.3 Virtual Map-reduce Environment
Figure 5.4 Netdoop Web-based User Interface
Figure 5.5 Select All From Table Query Result
Figure 5.6 Flow Statistics Calculated by Netdoop
Figure 5.7 Search Summary Statistics

Chapter 1
INTRODUCTION

1.1 Motivation

Network traces provide important information about the traffic that passes through the network. This information can be used for capacity planning as well as to diagnose performance problems. Modern data centers house hundreds to hundreds of thousands of servers connected using one or more networks. The size of the network traces captured in these large data centers is too big to analyze using conventional processing tools running on any single server. Further, data centers are converging network and storage traffic over Ethernet using protocols such as the Internet Small Computer System Interface (iSCSI) protocol [8] and the Fibre Channel over Ethernet (FCoE) protocol [2]. This makes it even more important to mine these large network traces for capacity planning and performance analysis. Map-reduce is the name of several software frameworks that eliminate much of the complexity of writing fault-tolerant, distributed processing software. As a part of this work, we study the applicability of the map-reduce model for processing large network traces.
We also develop a tool to analyze these large network traces for network and storage protocol performance metrics.

1.2 Requirements

This section describes the general requirements for the study of the map-reduce model and the tool that we develop.

1.2.1 Scalability

Scalability, in this context, is the ability of the processing tool to run efficiently on any number of CPU cores in one or more machines. The tool should be able to scale to utilize additional CPU cores and storage as the size of the data increases. Inexpensive servers and storage can then be added to the cluster to process larger data sets.

1.2.2 Processing Time

Processing time and scalability are closely related. It is much faster to process and analyze a large data set using a number of CPU cores working in parallel, compared to processing the data set on a single server. However, the data set must lend itself to parallel processing. Processing time is an important consideration for this work as well as the tool that we develop. Faster processing times mean that analysis results can be produced more frequently.

1.2.3 Operations

After a careful study of the map-reduce model, three operations were identified for processing network traces. The first operation, filtering, involves separating relevant data from the irrelevant data present in the network trace. The second operation is grouping, where the filtered data is grouped into records that share one or more common attributes, for example a TCP flow or iSCSI session. Finally, the sorting operation sorts the records based on one or more attributes. The sorted data set is then processed in parallel on a scalable cluster of CPU cores and storage.
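As a simple illustration of these three operations (this is not the Netdoop implementation itself, and the record fields are hypothetical stand-ins for the fields described in Chapter 4), the following Python sketch filters iSCSI packets out of a list of parsed trace records, groups them by TCP flow, and sorts each group by capture time:

    # Illustrative sketch of the filter, group and sort operations on parsed
    # trace records; field names are placeholders, not Netdoop's actual format.
    from collections import defaultdict

    ISCSI_PORT = 3260  # default iSCSI target port

    def filter_group_sort(records):
        # Filter: keep only records that belong to iSCSI traffic.
        iscsi_records = [r for r in records
                         if ISCSI_PORT in (r["tcp_srcport"], r["tcp_dstport"])]

        # Group: bucket records by the TCP flow they belong to.
        flows = defaultdict(list)
        for r in iscsi_records:
            key = (r["ip_src"], r["tcp_srcport"], r["ip_dst"], r["tcp_dstport"])
            flows[key].append(r)

        # Sort: order the records of each flow by capture time.
        for key in flows:
            flows[key].sort(key=lambda r: r["time"])
        return flows

    # Example usage with two hand-made records belonging to the same flow.
    sample = [
        {"time": 0.002, "ip_src": "10.0.0.1", "tcp_srcport": 44312,
         "ip_dst": "10.0.0.9", "tcp_dstport": 3260},
        {"time": 0.001, "ip_src": "10.0.0.1", "tcp_srcport": 44312,
         "ip_dst": "10.0.0.9", "tcp_dstport": 3260},
    ]
    print(filter_group_sort(sample))

In the map-reduce implementation described later, filtering happens in the map phase, while grouping and sorting are handled by the framework's sort phase.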
1.3 Report Organization

In Chapter 2 we present an overview of network monitoring, the tools used to collect network traces and some applications. We also present a brief introduction to the iSCSI storage protocol and its network frame formats. Then we introduce the map-reduce model. In Chapter 3 we describe the methodology used throughout this project as well as in the development of the Netdoop tool. In Chapter 4 we describe the design and implementation of Netdoop that was developed as a part of this project. In Chapter 5 we describe the prototyping environment that was used to simulate a data center environment to collect network traces, and also to host a map-reduce framework. Finally, in Chapter 6 we summarize our conclusions about the applicability of the map-reduce model for processing network traces and mining the data for performance.

Chapter 2
BACKGROUND

This chapter introduces some of the technologies related to this project. We start by looking at network traces, tools used for collecting network traces and some use cases. Then we introduce the Internet SCSI (iSCSI) protocol to better understand the importance and relevance of analyzing the performance of this protocol in modern data centers. We then explain the map-reduce programming model.

2.1 Network Traffic Analysis

Network traffic analysis is a common operation in data centers of all sizes. It provides valuable data for capacity planning, event analysis, performance monitoring, and security analysis.

2.1.1 Sample Based Analysis

There are a number of methods and tools to collect network data for analysis. One method of collecting data for network traffic analysis is to configure the network elements in a network with Cisco NetFlow [3] or sFlow [7]. In particular, NetFlow and sFlow define the behavior of network monitoring agents inside switches and routers, a MIB for configuring the agents, and the format of the datagrams that carry the traffic measurement data from the agents to a collector. Both NetFlow and sFlow deal with flows. A flow is a set of packets that share a certain set of attributes. Information about the flows is stored in the flow cache of the switch or router. Periodically, the flow entries are encapsulated in a datagram and sent to the collector, which computes various statistics. Figure 2.1 shows the sampling operation in a NetFlow-enabled device, and the NetFlow cache in that device. Figure 2.2 shows a screenshot of NetFlow analyzer software.

Figure 2.1 Netflow-enabled Device

Figure 2.2 Screenshot of Netflow Analyzer Software [15]

2.1.2 Network Trace-based Analysis

Another method to collect network data for analysis is to use network analyzers or packet capture equipment. There are dedicated hardware network analyzers, for example the Xgig from JDSU [6], and software-based network analyzers such as Wireshark [11] that run on general-purpose computers. There is also hardware-based packet capture equipment such as the IPCopper [5]. Figure 2.3 shows the Finisar Xgig hardware network analyzer and Figure 2.4 shows a screenshot from the Xgig analyzer interface. Figure 2.5 shows a screenshot from the Wireshark network analyzer software running on a PC.

Figure 2.3 Finisar Xgig Network Analyzer Appliance [6]

Figure 2.4 Finisar Xgig Software [12]

Figure 2.5 Wireshark - Open Source Network Analyzer

For detailed performance analysis or for debugging, the most common approach in data centers is to use network traces. For the scope of this project, we use the network trace files generated by network analyzers. To capture a network trace using a network analyzer, a port on the Ethernet switch is mirrored onto another port (labeled as the mirror port in Figure 2.6). Once mirrored, all traffic going through that port is sent to the mirror port as well. By connecting a network analyzer to the mirror port, the network analyzer can record all traffic going through the port that has been mirrored. In Figure 2.6 below, the network analyzer can record all traffic going between the servers and the storage server.

Figure 2.6 A Setup to Capture Network Traces

It is also possible to connect some network analyzers or packet capture equipment inline. When connected inline, the device behaves as a pass-through device, in series with the network connection of the network device whose network interface is being monitored.

Some network analyzers have proprietary file formats to store the network traces, and other network analyzers, including the Finisar Xgig and Wireshark, support the popular PCAP file format [10]. The network performance analysis tool developed as a part of this project is designed to read PCAP files as input. The PCAP file has a very simple format, which comprises a global header and multiple records, as shown in Figure 2.7.

Figure 2.7 PCAP File Format [10]

The global header contains information that describes the contents of the file. The structure of the global header is shown below.

    guint32 magic_number;   /* magic number */
    guint16 version_major;  /* major version number */
    guint16 version_minor;  /* minor version number */
    gint32  thiszone;       /* GMT to local correction */
    guint32 sigfigs;        /* accuracy of timestamps */
    guint32 snaplen;        /* max length of captured packets, in octets */
    guint32 network;        /* data link type */

Each record comprises a packet header and an Ethernet frame captured on the wire. The packet header contains timestamp information related to the time of capture as well as metadata about the frame that is encapsulated in this record. The structure of a packet header is shown below.

    guint32 ts_sec;    /* timestamp seconds */
    guint32 ts_usec;   /* timestamp microseconds */
    guint32 incl_len;  /* number of octets of packet saved in file */
    guint32 orig_len;  /* actual length of packet */
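To make the layout concrete, the following minimal Python sketch walks a PCAP file using only these two headers. It assumes a little-endian capture file (magic number 0xa1b2c3d4) with microsecond timestamps; a production reader, like the conversion utility described in Chapter 4, would also need to handle the byte-swapped and nanosecond variants.

    import struct

    def read_pcap(path):
        """Yield (timestamp, raw_frame_bytes) for each record in a PCAP file."""
        with open(path, "rb") as f:
            global_header = f.read(24)            # fixed 24-byte global header
            magic = struct.unpack("<I", global_header[:4])[0]
            if magic != 0xa1b2c3d4:               # little-endian, microsecond format assumed
                raise ValueError("unsupported PCAP variant")
            while True:
                record_header = f.read(16)        # fixed 16-byte per-record header
                if len(record_header) < 16:
                    break
                ts_sec, ts_usec, incl_len, orig_len = struct.unpack("<IIII", record_header)
                frame = f.read(incl_len)          # the captured Ethernet frame
                yield ts_sec + ts_usec / 1e6, frame

    for ts, frame in read_pcap("trace.pcap"):
        print(ts, len(frame))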
2.2 iSCSI Protocol

The previous section discussed the general topic of collecting network traffic data in a data center. In this section we start by introducing the I/O convergence happening in modern data centers, and then describe the Internet Small Computer System Interface (iSCSI) protocol [8]. The iSCSI protocol allows storage commands and data to be transferred over Ethernet.

2.2.1 Server I/O and Convergence

Computer servers typically have a network interface and a storage interface. A network interface allows the server to communicate with other servers on the local network and the Internet. A storage interface allows the server to communicate with a storage subsystem, to store and access data.

2.2.1.1 Direct Attached Storage

If the storage interface connects the server to a local storage device such as a hard disk drive (HDD) or solid-state disk (SSD), the server is said to have direct-attached storage, or DAS. Common storage interfaces that connect a server to a local storage subsystem are Small Computer System Interface (SCSI), Integrated Drive Electronics (IDE), Serial Advanced Technology Attachment (SATA) and Serial Attached SCSI (SAS). Figure 2.8 shows the block diagram of a server with local storage.

Figure 2.8 Server with Direct Attached Storage

2.2.1.2 Shared Storage

If the storage interface connects the server to a storage subsystem shared by multiple servers, the server is said to be connected to a Storage Area Network (SAN). Servers and the storage devices connected over a SAN use a transport protocol to exchange control information and data. Figure 2.9 shows the block diagram of a server that is connected to a shared storage device over a SAN.

Figure 2.9 Server Connected to a Storage Area Network

As shown in Figure 2.9 above, using a SAN results in data centers buying, deploying and managing two separate I/O fabrics: one fabric for carrying network traffic and another fabric to carry storage traffic.

2.2.1.3 I/O Convergence

In the previous section we saw that connecting servers to a SAN requires data centers to buy, deploy and manage two separate fabrics: one for carrying network traffic, and another for carrying storage traffic. With the introduction of new protocols such as iSCSI and FCoE [2], data centers are now able to use Converged Network Adapters (CNA) [9] to converge network and storage traffic over a single Ethernet fabric. Figure 2.10 shows the block diagram of a server that uses a converged network adapter to exchange network and storage traffic over Ethernet.

Figure 2.10 Server with a Converged Network Adapter

I/O convergence helps data centers save costs by buying, deploying and maintaining just one I/O fabric: Ethernet.
FCoE-capable Ethernet switches bridge FCoE traffic to Fibre Channel traffic and pass the Fibre Channel traffic over to a Fibre Channel fabric, whereas the iSCSI protocol is an end-to-end protocol between the server and an iSCSI-capable storage device (iSCSI target).

When multiple protocols and traffic types share the same Ethernet fabric in mission-critical business environments, it is important for data center administrators to allocate proper bandwidth and quality-of-service (QoS) to each protocol. This ensures that the protocols share the Ethernet fabric efficiently. To tune application performance or study the cause-effect relationship of a particular data center event, it is important to have the ability to analyze network traces as quickly as possible. As a part of our work, we apply the results of our study of the applicability of the map-reduce model for processing large network traces to storage protocols. The work that is done as a part of this project can be extended to study the performance of other protocols as well.

2.2.2 iSCSI

This section introduces the Internet Small Computer System Interface (iSCSI) protocol in more detail. We chose iSCSI as the protocol to apply our work to because iSCSI is being widely deployed in modern data centers today. The prototype developed as a part of our project can be extended for use in real production environments, helping data center administrators to benefit from our work.

2.2.2.1 Overview

The iSCSI protocol is layered on top of the TCP/IP protocol. iSCSI allows one or more servers to use a shared storage subsystem on the network. Specifically, iSCSI provides a transport for carrying SCSI commands and data over IP networks. Because iSCSI is layered on top of the TCP/IP protocol, iSCSI can be used to connect servers and storage across local area networks (LAN), wide area networks (WAN) or the Internet. Figure 2.11 shows a server connected to its storage subsystem over a SCSI interface. Figure 2.12 shows multiple servers using iSCSI to connect to an iSCSI target, sharing the storage attached to the target.

Figure 2.11 Server Directly Attached to a SCSI Disk

Figure 2.12 Servers Connected to Shared Storage Using iSCSI

In an iSCSI environment, there are iSCSI initiators (clients), iSCSI targets (storage servers) and servers running management services. The common management services for iSCSI are Internet Storage Name Service (iSNS) and Service Location Protocol (SLP). The iSNS and SLP services help discover the available storage resources on a network, for example by advertising the available iSCSI targets to iSCSI initiators.

Figure 2.13 iSCSI Initiator and Target Stacks

Figure 2.13 shows the data flow from the application to the disks when using iSCSI. There are two entities shown in Figure 2.13: the server that is the iSCSI initiator or client, and the iSCSI target serving storage from the disks attached to it. Logically, iSCSI connects the SCSI stack on a server to the SCSI stack on the storage server (iSCSI target) over Ethernet, using the TCP/IP protocol.

In a server, applications and the operating system read or write data to the filesystem. The filesystem maps the data to storage blocks, which are then handed off to the SCSI layer. The SCSI layer prepares command descriptor blocks (CDBs), which are commands to interact with SCSI protocol-capable disks to read and write data blocks. However, when using iSCSI, these CDBs and the data are exchanged with the iSCSI stack instead of a disk subsystem.
The iSCSI stack encapsulates the CDBs into iSCSI command protocol data units (iSCSI command PDUs) and the data into iSCSI data PDUs. These PDUs are exchanged with a remote iSCSI target over the TCP/IP stack using normal TCP/IP protocol mechanisms and components. At the iSCSI target, the iSCSI target stack receives the iSCSI command PDUs along with any data PDUs. The iSCSI target stack then extracts the SCSI CDBs from the iSCSI command PDUs and the data from the iSCSI data PDUs. The CDBs and data are then sent to the SCSI stack. The iSCSI protocol has logically connected the SCSI stack on the server with the SCSI stack on the iSCSI target or storage server. This allows the server to use the disks attached to the remote iSCSI target for data storage.

The storage space on the storage servers or iSCSI targets is virtualized as logical units (LUNs). iSCSI management software typically determines which LUNs are available to which server. This allows fine-grained control of how much storage is allocated to each server, as well as enforcement of access control. The blocks in each LUN are addressed by a logical block address (LBA). Logical block addresses are linear within a LUN, irrespective of how they are stored on the physical media.

2.2.2.2 iSCSI Definitions

The following is a summary of definitions and acronyms that are important in understanding and analyzing the iSCSI protocol [8] (pages 10-14).

Alias: An alias string can also be associated with an iSCSI Node. The alias allows an organization to associate a user-friendly string with the iSCSI Name. However, the alias string is not a substitute for the iSCSI Name.

CID (Connection ID): Connections within a session are identified by a connection ID. It is a unique ID for this connection within the session for the initiator. It is generated by the initiator and presented to the target during login requests and during logouts that close connections.

Connection: A connection is a TCP connection. Communication between the initiator and target occurs over one or more TCP connections. The TCP connections carry control messages, SCSI commands, parameters, and data within iSCSI Protocol Data Units (iSCSI PDUs).

iSCSI Device: A SCSI Device using an iSCSI service delivery subsystem. A Service Delivery Subsystem is defined by the SCSI protocol as a transport mechanism for SCSI commands and responses.

iSCSI Initiator Name: The iSCSI Initiator Name specifies the worldwide unique name of the initiator.

iSCSI Initiator Node: The "initiator". The word "initiator" has been appropriately qualified as either a port or device in the rest of the iSCSI protocol document when the context is ambiguous. All unqualified usages of "initiator" refer to an initiator port (or device) depending on the context.

iSCSI Layer: This layer builds/receives iSCSI PDUs and relays/receives them to/from one or more TCP connections that form an initiator-target "session".

iSCSI Name: The name of an iSCSI initiator or iSCSI target.

iSCSI Node: The iSCSI Node represents a single iSCSI initiator or iSCSI target. There are one or more iSCSI Nodes within a Network Entity. The iSCSI Node is accessible via one or more Network Portals. An iSCSI Node is identified by its iSCSI Name. The separation of the iSCSI Name from the addresses used by and for the iSCSI Node allows multiple iSCSI Nodes to use the same address, and the same iSCSI Node to use multiple addresses.

iSCSI Target Name: The iSCSI Target Name specifies the worldwide unique name of the target.

iSCSI Target Node: The "target".
iSCSI Task: An iSCSI task is an iSCSI request for which a response is expected.

iSCSI Transfer Direction: The iSCSI transfer direction is defined with regard to the initiator. Outbound or outgoing transfers are transfers from the initiator to the target, while inbound or incoming transfers are from the target to the initiator.

ISID: The initiator part of the Session Identifier. It is explicitly specified by the initiator during Login.

I_T nexus: According to the SCSI protocol, the I_T nexus is a relationship between a SCSI Initiator Port and a SCSI Target Port. For iSCSI, this relationship is a session, defined as a relationship between an iSCSI Initiator's end of the session (SCSI Initiator Port) and the iSCSI Target's Portal Group. The I_T nexus can be identified by the conjunction of the SCSI port names; that is, the I_T nexus identifier is the tuple (iSCSI Initiator Name + ',i,' + ISID, iSCSI Target Name + ',t,' + Portal Group Tag).

Network Entity: The Network Entity represents a device or gateway that is accessible from the IP network. A Network Entity must have one or more Network Portals, each of which can be used to gain access to the IP network by some iSCSI Nodes contained in that Network Entity.

Network Portal: The Network Portal is a component of a Network Entity that has a TCP/IP network address and that may be used by an iSCSI Node within that Network Entity for the connection(s) within one of its iSCSI sessions. A Network Portal in an initiator is identified by its IP address. A Network Portal in a target is identified by its IP address and its listening TCP port.

Originator: In a negotiation or exchange, the party that initiates the negotiation or exchange.

PDU (Protocol Data Unit): The initiator and target divide their communications into messages. The term "iSCSI protocol data unit" (iSCSI PDU) is used for these messages.

Portal Groups: iSCSI supports multiple connections within the same session; some implementations will have the ability to combine connections in a session across multiple Network Portals. A Portal Group defines a set of Network Portals within an iSCSI Network Entity that collectively supports the capability of coordinating a session with connections spanning these portals. Not all Network Portals within a Portal Group need to participate in every session connected through that Portal Group. One or more Portal Groups may provide access to an iSCSI Node. Each Network Portal, as utilized by a given iSCSI Node, belongs to exactly one portal group within that Node.

Portal Group Tag: This 16-bit quantity identifies a Portal Group within an iSCSI Node. All Network Portals with the same portal group tag in the context of a given iSCSI Node are in the same Portal Group.

Recovery R2T: An R2T generated by a target upon detecting the loss of one or more Data-Out PDUs through one of the following means: a digest error, a sequence error, or a sequence reception timeout. A recovery R2T carries the next unused R2TSN, but requests all or part of the data burst that an earlier R2T (with a lower R2TSN) had already requested.

Responder: In a negotiation or exchange, the party that responds to the originator of the negotiation or exchange.

SCSI Device: This is a SCSI protocol term for an entity that contains one or more SCSI ports that are connected to a service delivery subsystem and supports a SCSI application protocol. For example, a SCSI Initiator Device contains one or more SCSI Initiator Ports and zero or more application clients.
A Target Device contains one or more SCSI Target Ports and one or more device servers and associated logical units. For iSCSI, the SCSI device is the component within an iSCSI Node that provides the SCSI functionality. As such, there can be at most one SCSI Device within a given iSCSI Node. Access to the SCSI Device can only be achieved in an iSCSI normal operational session. The SCSI Device Name is defined to be the iSCSI Name of the node.

SCSI Layer: This builds/receives SCSI CDBs (Command Descriptor Blocks) and relays/receives them with the remaining command execute parameters to/from the iSCSI Layer.

Session: The group of TCP connections that link an initiator with a target form a session (loosely equivalent to a SCSI I-T nexus). TCP connections can be added to and removed from a session. Across all connections within a session, an initiator sees one and the same target.

SCSI Initiator Port: This maps to the endpoint of an iSCSI normal operational session. An iSCSI normal operational session is negotiated through the login process between an iSCSI initiator node and an iSCSI target node. At successful completion of this process, a SCSI Initiator Port is created within the SCSI Initiator Device. The SCSI Initiator Port Name and SCSI Initiator Port Identifier are both defined to be the iSCSI Initiator Name together with (a) a label that identifies it as an initiator port name/identifier and (b) the ISID portion of the session identifier.

SCSI Port: This is the SCSI term for an entity in a SCSI Device that provides the SCSI functionality to interface with a service delivery subsystem. For iSCSI, the definitions of the SCSI Initiator Port and the SCSI Target Port are different.

SCSI Port Name: A name made up of UTF-8 [RFC2279] characters that includes the iSCSI name + 'i' or 't' + ISID or Portal Group Tag.

SCSI Target Port: This maps to an iSCSI Target Portal Group.

SCSI Target Port Name and SCSI Target Port Identifier: These are both defined to be the iSCSI Target Name together with (a) a label that identifies it as a target port name/identifier and (b) the portal group tag.

SSID (Session ID): A session between an iSCSI initiator and an iSCSI target is defined by a session ID that is a tuple composed of an initiator part (ISID) and a target part (Target Portal Group Tag). The ISID is explicitly specified by the initiator at session establishment. The Target Portal Group Tag is implied by the initiator through the selection of the TCP endpoint at connection establishment. The TargetPortalGroupTag key must also be returned by the target as a confirmation during connection establishment when TargetName is given.

Target Portal Group Tag: A numerical identifier (16-bit) for an iSCSI Target Portal Group.

TSIH (Target Session Identifying Handle): A target-assigned tag for a session with a specific named initiator. The target generates it during session establishment. Its internal format and content are not defined by this protocol, except for the value 0, which is reserved and used by the initiator to indicate a new session. It is given to the target during additional connection establishment for the same session.

2.2.2.3 Flow Concepts

The iSCSI protocol is implemented on top of the TCP/IP protocol. TCP/IP is a connection-oriented, byte-based, lossless transport protocol. An iSCSI connection between an iSCSI initiator and an iSCSI target is a TCP connection. Communication between the iSCSI initiator and target occurs over one or more TCP connections. The TCP connections carry SCSI commands, parameters and data within iSCSI protocol data units (iSCSI PDUs). Each connection has a connection ID (CID).

A group of iSCSI connections that connect an iSCSI initiator with an iSCSI target form an iSCSI session. An iSCSI session is similar to an Initiator-Target nexus (I-T nexus) in the SCSI protocol. TCP connections can be added to or removed from an iSCSI session as required. The iSCSI initiator sees one and the same target across all the connections within a session. Each session has a session ID (SSID). The SSID comprises an initiator-defined component called the initiator session ID (ISID) and a target component called the target portal group tag. A portal group defines a set of network portals within an iSCSI network entity that collectively supports the capability of coordinating a session with connections spanning these portals. One or more portal groups may provide access to an iSCSI node. Each network portal used by a given iSCSI node belongs to exactly one portal group within that node.

An iSCSI session is established through a login phase. An iSCSI login creates a TCP connection, authenticates the iSCSI initiator and iSCSI target with each other, negotiates operation parameters and associates the connection with a session. Once a successful login phase has completed between an iSCSI initiator and an iSCSI target, the iSCSI session enters the full feature phase. Once an iSCSI session is in the full feature phase, data can be transferred between the iSCSI initiator and iSCSI target. Sending a logout command can terminate a session. The session may also be terminated due to timeouts or TCP connection failures.

As we have seen above, iSCSI carries SCSI commands and data over TCP connections that together form a session. A network trace shows the TCP packets that form the iSCSI PDUs. The headers of the TCP packets (TCP headers) and the headers of the iSCSI PDUs contain all the information required to analyze the state of an iSCSI transfer and the end-to-end performance of a session or connection.
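As an illustration of how this header information can be turned into performance metrics, the Python sketch below pairs iSCSI command and response records that share the same Initiator Task Tag within a TCP flow and records the elapsed time. It is a simplified, hypothetical stand-in for the per-flow bookkeeping that Netdoop performs (Chapter 5); the record field names are assumptions for this example.

    # Simplified per-flow latency tracking; not the actual Netdoop classes.
    class FlowStats(object):
        def __init__(self):
            self.pending = {}      # Initiator Task Tag -> time the request was seen
            self.latencies = []    # completed request/response round-trip times

        def on_record(self, rec):
            itt = rec["initiator_task_tag"]
            if rec["is_request"]:
                self.pending[itt] = rec["time"]
            elif itt in self.pending:
                # Response observed: record command-to-response latency.
                self.latencies.append(rec["time"] - self.pending.pop(itt))

    def flow_key(rec):
        # Direction-independent key so requests and responses map to the same flow.
        ends = sorted([(rec["ip_src"], rec["tcp_srcport"]),
                       (rec["ip_dst"], rec["tcp_dstport"])])
        return tuple(ends)

    flows = {}   # flow key -> FlowStats

    def process(rec):
        flows.setdefault(flow_key(rec), FlowStats()).on_record(rec)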
The following section is a summary of some of the common iSCSI header formats, as well as some examples of I/O operations.

2.2.2.4 iSCSI Protocol Headers

This section lists some important headers in iSCSI PDUs used in our work to compute performance metrics of the iSCSI flows. For a more comprehensive list of all iSCSI headers, PDU formats and definitions of the individual fields in the headers, please refer to the iSCSI protocol specification [8].

2.2.2.4.1 iSCSI PDU

All iSCSI PDUs have one or more header segments and, optionally, a data segment. After the entire header segment group, a header digest may follow. A data digest may also follow the data segment.

Figure 2.14 iSCSI Protocol Data Unit (PDU) [8]

2.2.2.4.2 Login Request

After establishing a TCP connection between an initiator and a target, the initiator must start a login phase to gain further access to the target's resources. The Login Phase consists of a sequence of Login Requests and Responses that carry the same Initiator Task Tag. The Login PDU is as shown below.

Figure 2.15 iSCSI Login PDU [8]

2.2.2.4.3 Login Response

The Login Response indicates the progress and/or end of the Login Phase. The Login Response PDU is as shown below.

Figure 2.16 iSCSI Login Response PDU [8]

2.2.2.4.4 SCSI Command

The format of the SCSI Command PDU is as shown below. The SCSI Command PDU carries SCSI commands from the iSCSI initiator to the iSCSI target.
Figure 2.17 iSCSI SCSI Command PDU [8]

2.2.2.4.5 SCSI Response

The format of a SCSI Response PDU is as shown below. The SCSI Response PDU carries the SCSI response from an iSCSI target to the iSCSI initiator in response to a SCSI command.

Figure 2.18 iSCSI SCSI Response PDU [8]

2.2.2.4.6 Task Management Function Request

The Task Management function provides an initiator with a way to explicitly control the execution of one or more Tasks (SCSI and iSCSI tasks). The Task Management Function Request PDU is as shown below.

Figure 2.19 iSCSI Task Management Function Request PDU [8]

2.2.2.4.7 Task Management Function Response

In response to a Task Management Function Request, the iSCSI target responds with a Task Management Response. The iSCSI Task Management Function Response PDU is as shown below.

Figure 2.20 iSCSI Task Management Function Response PDU [8]

2.2.2.4.8 SCSI Data-Out for WRITE

The SCSI Data-Out PDU carries data from the iSCSI initiator to the iSCSI target for a write operation. The SCSI Data-Out PDU is as shown below.

Figure 2.21 iSCSI Data-out PDU [8]

2.2.2.4.9 SCSI Data-In for READ

The SCSI Data-In PDU carries data from the iSCSI target back to the iSCSI initiator for a read operation. The SCSI Data-In PDU is as shown below.

Figure 2.22 iSCSI Data-in PDU [8]

2.2.2.4.10 Ready To Transfer (R2T)

When an initiator has submitted a SCSI Command with data that passes from the initiator to the target (WRITE), the target may specify which blocks of data it is ready to receive. The target may request that the data blocks be delivered in whichever order is convenient for the target at that particular instant. This information is passed from the target to the initiator in the Ready To Transfer (R2T) PDU. The R2T PDU is as shown below.

Figure 2.23 iSCSI R2T PDU [8]

2.2.2.4.11 Asynchronous Message

An Asynchronous Message may be sent from the target to the initiator without correspondence to a particular command. The target specifies the reason for the event and sense data. This is an unsolicited message from the iSCSI target to the iSCSI initiator, for example to indicate an error. The Asynchronous Message PDU is as shown below.

Figure 2.24 iSCSI Asynchronous PDU [8]

2.2.2.5 Summary

This section introduced the concepts in the iSCSI protocol. A subset of the iSCSI PDU formats that are extracted from a network trace for performance analysis was introduced as well. Servers using the iSCSI protocol to connect to remote storage servers or iSCSI targets do so over one or more TCP connections that form an iSCSI session. Information related to the TCP connections and iSCSI sessions is used in our work to extract per-flow statistics.
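The PDU fields that matter most for the analysis in later chapters, the opcode and the Initiator Task Tag, sit at fixed offsets in the 48-byte Basic Header Segment (BHS) that begins every PDU. The short Python sketch below pulls them out of a raw BHS. It is illustrative only: Netdoop itself works on fields already decoded by the capture tooling rather than on raw bytes, and only two opcode values are shown here.

    import struct

    SCSI_COMMAND = 0x01    # initiator -> target
    SCSI_RESPONSE = 0x21   # target -> initiator

    def parse_bhs(bhs):
        """Extract a few fields from a 48-byte iSCSI Basic Header Segment."""
        opcode = bhs[0] & 0x3f                           # low 6 bits of byte 0
        data_segment_length = int.from_bytes(bhs[5:8], "big")
        initiator_task_tag = struct.unpack(">I", bhs[16:20])[0]
        return opcode, data_segment_length, initiator_task_tag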
2.3 The Map-Reduce Model

This section introduces parallel programming and the map-reduce programming model. We start by comparing the serial and parallel programming paradigms.

2.3.1 Serial vs. Parallel Programming

In the serial programming paradigm, a program consists of a sequence of instructions, where each instruction is executed one after another, from start to finish, on a single processor. To improve performance and efficiency, parallel programming was developed. A parallel program can be broken up into parts, where each part can be executed concurrently with the other parts. Each part can run on a different CPU in a single computer, or on CPUs in a set of computers connected over a network. Building a parallel program consists of identifying tasks that can be run concurrently or partitioning the data so that it can be processed simultaneously. It is not possible to parallelize a program if each computed value depends on a previously computed value. However, if the data can be broken up into equal chunks, we can put a parallel program together.

As an example, let us consider the processing requirements for a huge array of data. The huge array can be partitioned into sub-arrays. If the processing needs are the same for each element in the array and do not depend on the computational result for another element elsewhere in the array, this represents an ideal case for parallel programming. We can structure the environment such that a master divides the array into a number of equal-sized sub-arrays and distributes the sub-arrays to the available processing nodes or workers. Each worker receives its sub-array from the master and performs the processing. Once the processing is complete, the worker returns the results to the master, and the master collects the results from each worker.

2.3.2 Map-Reduce

Google introduced the map-reduce programming model in 2004. The map-reduce model derives from the map and reduce functions in functional programming languages. In a functional programming language, the map function takes a function and a sequence of values as input and applies the function passed to it to each element in the sequence. The reduce operation combines all the elements of the resulting sequence using a binary operation, such as "+" to add up all the elements in the sequence.

Map-reduce was developed by Google as a mechanism for processing large amounts of raw data [1] [4]. The size of the data to be processed was so large that the processing had to be distributed across thousands of machines in order to complete in a reasonable amount of time. The map-reduce model provided an abstraction for Google engineers to perform simple computations while hiding the details of parallelization, distribution of data across processing nodes, load balancing and fault tolerance. The users of the map-reduce library have to write a map function and a reduce function. The map function takes an input pair and produces a set of intermediate key/value pairs. The map-reduce library groups together all the intermediate values associated with the same intermediate key and passes them to the reduce function. The reduce function accepts an intermediate key and a set of values for that key. It merges these values together to form a smaller set of values.

One of the main advantages of the map-reduce library is that the complexity of writing parallel programs is abstracted away or even eliminated. However, this simplification sometimes limits the applicability of the map-reduce model to certain computational tasks.

A classic example used to illustrate how a map-reduce program can be implemented is shown in the pseudo code below. This example implements a word counting application using the map-reduce model. The pseudo code is designed to take a set of text documents and generate a list of words along with the number of occurrences of each word in the input text data.

    function map(String key, String text) {
        for word in split_words(text) {
            emit_intermediate(word, "1")
        }
    }

    function reduce(String word, Iterator values) {
        int count = 0
        for v in values {
            count = count + to_int(v)
        }
        emit(word, to_string(count))
    }

As seen from the pseudo code, the map function accepts a key and a value as input and produces a number of intermediate key-value pairs as output. The reduce function accepts a key and a list of values as input; in this case it simply adds up the list of values passed to it and produces a final key-value pair for each input key.
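Because Netdoop (Chapter 5) is written as Python mapper and reducer scripts driven by Hadoop's streaming interface, the same word-count example can also be expressed as two small Python programs that read lines on standard input and write tab-separated key-value pairs on standard output. This is a generic sketch of the streaming convention, not Netdoop's own code.

    # mapper.py - emit one "word<TAB>1" pair per word on standard output
    import sys

    for line in sys.stdin:
        for word in line.split():
            print("%s\t%s" % (word, 1))

    # reducer.py - input arrives sorted by key, so counts accumulate per word
    import sys

    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                print("%s\t%d" % (current_word, count))
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        print("%s\t%d" % (current_word, count))

The pair can be tested locally with a pipeline such as "cat input.txt | python mapper.py | sort | python reducer.py", which mimics the map, sort and reduce phases described below.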
Figure 2.25 The Map-reduce Model

Figure 2.25 illustrates the execution and flow of a map-reduce job. After a map-reduce job is submitted, the master splits up the input data based on a set of default rules that can be overridden by writing a custom function. The master then schedules each input data chunk to a worker. For each input record in the data chunk, the worker calls the map function. The map function processes the records and produces output records in the form of key-value pairs. This phase is called the map phase. The map phase is followed by a sort phase in which the intermediate key-value pairs are sorted based on the key. The master then selects sets of keys and their values and schedules them to workers for the reduce operation. In the reduce phase, the workers participating in the reduce operation call the reduce function on the key-value pairs assigned to that worker. The reduce functions produce the final output files.

In the word counting example above, a large input text file is split into several chunks. The default split criterion can be a set of lines based on line numbers. The map function then processes a set of lines, splitting each line into individual words and emitting key-value pairs of the form "word 1". In this example, the word is the key and "1" the value. During the sort phase, the intermediate files with key-value pairs generated during the map phase are sorted based on the keys, which in this case are the words. The sorted list is split once again and scheduled to the reduce workers. The reduce function totals up the number of occurrences of each word and emits a final count as the output file.

As we have seen above, the map-reduce model abstracts away the complexity of writing parallel programs. The master takes care of load balancing, scheduling and fault tolerance, leaving the programmer to focus on solving the problem at hand. However, this simplification comes at a cost, in that the map-reduce model cannot be applied to every generic problem. Hence, as a part of our work, we study the applicability of the map-reduce model for processing large network traces.

Chapter 3
METHODOLOGY

This chapter describes the methodology used to study the problem of analyzing large network traces using the map-reduce model, and to apply the study to implement a tool that analyzes network traces for iSCSI and network performance metrics.

3.1 Overview

One of the goals of the project was to study the viability of using the map-reduce model to process network traces, specifically to mine for performance data of the iSCSI and network protocols. To do this we start by creating a prototype environment that represents a typical data center. Using server virtualization, we created a number of virtual servers and a virtual iSCSI target. Using this environment and the Wireshark network analyzer, we collect network traces. We then develop a tool that uses the features of a map-reduce framework. We use the cluster of virtual servers to deploy the tool and study the distribution of the processing workload across the configured cluster of virtual servers. Then we collect the performance metrics of the iSCSI protocol and the underlying network as reported by the tool.

3.2 Design

In Chapter 1 we presented a set of requirements for this project. The key requirements were scalability and processing time.
In Chapter 2 we also introduced concepts around the collection of network traces, as well as I/O convergence, a dominant trend in modern data centers. We also introduced iSCSI as a popular protocol that is used to carry storage and network traffic on a shared Ethernet fabric. There is more data than ever to observe and analyze in order to predict problems before they occur, as well as to tune application performance. The capability to process and analyze the network traffic data sets must not be limited by the size of the data set. A major design goal for this project has been scalability.

Though we leverage the features of a map-reduce framework, the design should not be restricted to the functionality of any one map-reduce framework. The design should serve as a reference design that represents a best case in terms of performance. It is not our intention to design a full-fledged implementation ready for deployment in data centers. It could take months to design a fully functional tool that can be used with realistic workloads, let alone the infrastructure necessary for demonstrating the system working at scale. Instead, we must choose the components of the design such that the design can be implemented in a limited timeframe, but still provide a reference for using the map-reduce model for processing network traces. Leveraging the features of a map-reduce framework allows us to make tradeoffs between complexity and time to implement. By leveraging the features of a standard map-reduce framework, the implementation time is reduced to days and weeks instead of months.

3.3 Framework Evaluation

To implement the designed system and the tool for processing network traces in data centers, we use the features provided by a map-reduce model. We need to select an existing map-reduce framework for our implementation. There are a number of map-reduce frameworks available to pick from. However, the body of work that we generate from this project is intended to solve a real problem in modern data centers. The framework we choose must therefore be stable and mature, and ideally well known to data center administrators, so that our work may be leveraged and applied in real data centers.

3.4 Implementation

Once we select a map-reduce framework, we proceed to implementation. We choose a programming language supported by the map-reduce framework to implement the functionality required for processing network traces. A number of protocols use Ethernet as a transport. It would be impossible to implement a system to analyze network traces for all the protocols that layer on top of Ethernet in the timeframe available for this project. We choose a protocol that is relevant and deployed in modern data centers, and we implement the system in a way that it can be extended to support other network protocols in the future. Usability is a very important design goal. We also implement a simple web-based user interface to access the performance data gathered from the network traces.

3.5 Prototyping Environment

Map-reduce frameworks significantly reduce the complexity of implementing parallel programs. However, testing and demonstrating the programs at scale requires a complex infrastructure even to represent a small portion of a data center. Likewise, the infrastructure required for generating a large network trace, storing it and transporting it around can be huge.
Given that our implementation is a reference implementation, we need to build an environment that requires as little infrastructure as possible but still represents a typical data center. The same reference infrastructure must be capable of generating a scaled-down version of the network trace, just enough to demonstrate the parallel execution of the processing and analysis. It would be very expensive, in both time and money, to acquire, configure and deploy real servers, switches and storage for the purpose of implementing, testing and demonstrating our work. We again make a careful tradeoff between complexity and performance.

3.6 Analysis and Conclusion

After implementing the design and building the prototype data center environment, we deploy our tool. We then study the results of the analysis of a sample network trace comprising TCP/IP and iSCSI traffic. Based on the analysis of the results, we should be able to draw conclusions on the viability, advantages and limitations of using map-reduce frameworks for processing network traces.

Chapter 4

DESIGN

This chapter describes the way in which the map-reduce model can be used to process network traces by distributing the processing workload onto a number of servers working on a trace in parallel. We start by listing some of the assumptions that our implementation is based on. We then describe a simple data format that we use for our implementation. Next, we present the design details of our implementation, which is basic but sufficient to be extended to support more network protocols as well as to work with data-center-sized network traces.

4.1 Assumptions

This section lists the assumptions we had to make considering the limitations of the prototyping infrastructure available, the limitations imposed by the map-reduce framework, and performance vs. complexity tradeoffs.

4.1.1 Virtual Infrastructure

Parallel programming is a model of computing in which a program or its data is split into multiple chunks and processed on separate CPUs or servers concurrently. The map-reduce model provides a simple framework for developing parallel programs, hiding much of the complexity associated with load balancing and fault tolerance. However, to implement, test and demonstrate a parallel processing model, we need a number of servers and an interconnect fabric to demonstrate the benefits of parallel processing. Acquiring such infrastructure for the length of this project would be expensive, and maintaining it would be time consuming.

For the implementation, testing and demonstration of our work, we make use of server virtualization. Server virtualization software such as VMware ESX and VMware Fusion allows a single physical server to be partitioned into multiple virtual servers, each with its own CPU instance, memory, network and storage interfaces. Recent developments in CPU technology and enhancements in server virtualization software have dramatically reduced the overhead due to virtualization. However, the performance of virtual machines is not yet equal to that of physical machines; the overhead comes from sharing the physical resources among the virtual machines. We use VMware Fusion running on a notebook computer to create a cluster of virtual servers on which to deploy and demonstrate our implementation. While this is a reference implementation, it should work out-of-the-box on physical servers with improved performance.

4.1.2 Input Data

Our design is agnostic to input file formats.
However, we assume that the network traces used with our implementation are simple text files. The text file is a text representation of the network trace in PCAP format captured by a network analyzer, and it is formatted in a way that simplifies parsing. The text file contains flow data represented as a set of flow records. Later in this chapter, we describe the file format that we use for these text files, which is similar to the text version of the PCAP files. A utility that we developed as a part of our implementation converts the binary PCAP files to text files.

4.1.3 User Queries

User interaction with our tool is through a command line interface (CLI) and a web interface. We assume that network trace analysis jobs will be submitted using the CLI, as it allows scripting. The web interface allows the user to run queries on the database of performance results generated by analyzing one or more trace files. A user query can consist of:

• A selection between iSCSI performance data or network statistics
• A request to extract performance data based on a calendar day

Our design is very flexible and can be extended to support a richer user interface, with more statistics and user queries.

4.2 Data Format

The data format that we use in our reference implementation is intentionally simple. This keeps the parsing complexity to a minimum, while retaining all the information required for computing protocol-aware performance statistics for each flow in the network trace. The map-reduce framework partially imposed the requirement of using a text-based input format. The input text files consist of a list of records, one record per line. Each record consists of iSCSI protocol or network statistics related fields from the network trace. Table 4.1 lists each field in the record with a brief description. The fields in Table 4.1 include fields from the Ethernet, TCP/IP and iSCSI protocol headers. The iSCSI headers relevant to our work were summarized in Chapter 2.

Table 4.1 Fields in a Network Trace Record

frame.time_relative           Relative time of arrival with respect to the previous frame
frame.number                  Frame serial number, starting from the first frame captured
frame.len                     Length of the captured frame
eth.src                       Ethernet source MAC address
eth.dst                       Ethernet destination MAC address
eth.type                      Ethernet type
ip.src                        IP source address for this frame
ip.dst                        IP destination address for this frame
tcp.srcport                   TCP source port for this frame
tcp.dstport                   TCP destination port for this frame
tcp.window_size               TCP window size advertised in this frame
tcp.analysis.retransmission   Whether the frame was a retransmission
tcp.analysis.window_update    Whether the remote node is updating a closed window
tcp.analysis.window_full      The window of the sender of this frame is closed
tcp.len                       Length of the TCP payload
tcp.analysis.bytes_in_flight  Number of bytes sent but not acknowledged
tcp.options.wscale            TCP window scale enabled
tcp.options.wscale_val        TCP window scale value
iscsi.request_frame           Whether this frame is carrying an iSCSI request
iscsi.response_frame          Whether this frame is a response to a previous iSCSI request
iscsi.time                    Time from request
iscsi                         Frame belongs to an iSCSI exchange
iscsi.scsicommand.R           PDU carries a read command
iscsi.scsicommand.W           PDU carries a write command
iscsi.initiatortasktag        Initiator Task Tag
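As an illustration of how such a record maps onto the fields of Table 4.1, the sketch below splits one line into a field dictionary. The tab delimiter and the field ordering are assumptions for illustration; the exact layout is produced by our PCAP-to-text conversion utility.

    # Parse one tab-separated trace record into a dictionary keyed by field name.
    FIELDS = [
        "frame.time_relative", "frame.number", "frame.len",
        "eth.src", "eth.dst", "eth.type",
        "ip.src", "ip.dst", "tcp.srcport", "tcp.dstport",
        "tcp.window_size", "tcp.analysis.retransmission",
        "tcp.analysis.window_update", "tcp.analysis.window_full",
        "tcp.len", "tcp.analysis.bytes_in_flight",
        "tcp.options.wscale", "tcp.options.wscale_val",
        "iscsi.request_frame", "iscsi.response_frame",
        "iscsi.time", "iscsi", "iscsi.scsicommand.R",
        "iscsi.scsicommand.W", "iscsi.initiatortasktag",
    ]

    def parse_record(line):
        """Split one trace record (one line of the text file) into its fields."""
        values = line.rstrip("\n").split("\t")
        return dict(zip(FIELDS, values))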
As a part of this project we calculate various flow-based statistics and also present a summary of the flows selected by a user query. There are two types of statistics: counters and gauges. For all counters, we calculate the total, minimum, maximum and average value for each selected flow. The summary at the end shows, for each counter, the cumulative, minimum, maximum and average values across the selected flows. IP packets, IP bytes, broadcast packets, multicast packets, unicast packets, the frame size counters, iSCSI reads, iSCSI read bytes, iSCSI writes, iSCSI write bytes and iSCSI packets are counters. The minimum, maximum and average command response time and the minimum, maximum and average window size are calculated as gauges. All the statistics are explained below.

IP Packets: This counter shows the number of IP packets captured in a flow. The average value is calculated as the sum of IP packets divided by the number of flows in the result.

IP Bytes: This counter represents the total bytes sent or received in a flow. It is the sum of the packet sizes of all IP packets.

Broadcast Packets, Multicast Packets, and Unicast Packets: These three counters show the number of broadcast, multicast and unicast packets, respectively.

Frame Size Counters: Packet size is an important factor in understanding network performance. We consider four frame size categories: frames of 64 bytes or less (the minimum Ethernet frame size), frames between 64 and 256 bytes, frames between 256 and 1500 bytes, and frames between 1500 and 9000 bytes.

Minimum iSCSI Command Response Time: The iSCSI command response time is the time between the initiator issuing a command to a target and the target successfully processing the command and returning a response to the initiator. The minimum command response time is the shortest time taken by any command of a flow. In the summary, the minimum value shows the shortest time taken by any command across all selected flows.

Maximum iSCSI Command Response Time: This value shows the longest response time among all commands of a flow. The summary value shows the longest time taken by any command across all selected flows.

Average iSCSI Command Response Time: The average response time for a flow is defined as the total response time of all tasks of the flow divided by the total number of tasks of the flow. For the summary, this value is calculated as the total response time of all tasks of all selected flows divided by the total number of tasks of all selected flows.

Minimum Window Size, Maximum Window Size: The TCP window size is the window size present in the TCP packet header.

Average Window Size: The average window size is calculated as the total window size of all packets divided by the total number of TCP packets. The average window size for all selected flows is calculated as the total window size of all packets of all selected flows divided by the total number of packets of all selected flows.

iSCSI Reads and iSCSI Read Bytes: The reads counter shows the number of iSCSI read commands present in a flow. iSCSI read bytes shows how many bytes are read from the iSCSI target storage across all read commands.

iSCSI Writes and iSCSI Write Bytes: The writes counter shows the number of iSCSI write commands present in a flow. iSCSI write bytes shows how many bytes are written to the iSCSI target storage across all write commands.

iSCSI IP Packets: This counter shows how many packets are iSCSI packets, both per flow and across all selected flows.
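The per-flow and summary aggregation described above can be expressed as a short sketch. The helper below is illustrative only; the dictionary keys and the function name are assumptions, and each element of flows is assumed to already hold that flow's computed values.

    def summarize_flows(flows):
        """Aggregate per-flow statistics into the search summary."""
        # Counters: the summary carries the cumulative value, and the average
        # is the cumulative value divided by the number of selected flows.
        total_ip_packets = sum(f["ip_packets"] for f in flows)
        avg_ip_packets = total_ip_packets / float(len(flows))

        # Gauges: minimum and maximum are taken across all selected flows,
        # while the average response time is the total response time of all
        # tasks divided by the total number of tasks of all selected flows.
        min_rt = min(f["min_response_time"] for f in flows)
        max_rt = max(f["max_response_time"] for f in flows)
        avg_rt = (sum(f["total_response_time"] for f in flows)
                  / float(sum(f["task_count"] for f in flows)))

        return {"ip_packets": total_ip_packets, "ip_packets_avg": avg_ip_packets,
                "min_response_time": min_rt, "max_response_time": max_rt,
                "avg_response_time": avg_rt}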
4.3 Fundamental Architecture

This section describes the fundamental architecture of Netdoop, a distributed protocol-aware network trace-processing tool that we developed, leveraging the features of a map-reduce framework. The map-reduce framework that we selected for our work is covered in the next section.

Figure 4.1 shows a typical setup for capturing network traces, as described earlier in Chapter 2. The switch is configured to mirror a port of interest to another port that is connected to a network analyzer. The network analyzer captures all the traffic that goes through the mirrored port.

Figure 4.1 Setup to Capture Network Traces

The network analyzer captures a linear list of frames as they appear on the mirrored port. In the example above, the mirrored port is connected to an iSCSI target. The servers on the left of the figure share the storage attached to the iSCSI target. Hence the collected trace can contain many flows carrying storage commands and data between the servers and the iSCSI target.

Figure 4.2 Parallel Processing Architecture of Netdoop

Figure 4.2 shows the fundamental architecture of our processing pipeline. The network trace is a linear list of packets captured on the wire. The packets belong to different protocols or flows. Unlike the word count example presented earlier, the network trace file cannot be split arbitrarily into chunks for parallel processing. Splitting the file into chunks without taking the protocol flows into consideration has the undesirable side effect of re-ordering the frames. Instead, we identify the network and protocol flows in the trace, group the frames belonging to a flow, and treat each group as a chunk. These flow-aware chunks can be processed concurrently on a set of computers (called workers) connected over a network. Because all the frames that comprise a flow are kept together, the frames belonging to a particular flow are processed in order.

Components of our Netdoop software run on each server shown in Figure 4.2. Once each server processes the flow-aware chunk assigned to it, a collection server collects the partial performance results. On the collection server, the performance metrics are merged and formatted, and the results are stored in a database. Users interact with the system using a command line interface as well as a web interface. The following is a summary of the functions performed by Netdoop:

• Segment the network trace into protocol and flow-aware chunks
• Process each chunk concurrently with other chunks, extracting protocol header data
• Perform simple computations on the header data to demonstrate concurrent processing
• Collect partial results from the concurrent processing and compile the results
• Store the results in a database
• Provide a web-based user interface for querying the database

This section introduced the fundamental architecture of Netdoop, a tool that we developed for processing large network traces using the map-reduce model. The next section describes the map-reduce framework used, and how the fundamental architecture described in this section is mapped onto the map-reduce framework.
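The flow-aware grouping described in this section can be illustrated with a minimal sketch. The flow key follows the per-flow fields used by Netdoop (source and destination IP addresses and TCP ports), but the function and variable names are illustrative assumptions rather than the actual implementation.

    from collections import defaultdict

    def flow_key(record):
        """Identify the flow a parsed trace record (see Table 4.1) belongs to."""
        return (record["ip.src"], record["ip.dst"],
                record["tcp.srcport"], record["tcp.dstport"])

    def group_by_flow(records):
        """Group trace records into flow-aware chunks, preserving frame order."""
        chunks = defaultdict(list)
        for record in records:              # records arrive in capture order
            chunks[flow_key(record)].append(record)
        return chunks                       # each value becomes one chunk for a worker

Because a chunk contains every frame of exactly one flow, the frames of that flow are never split across workers and are processed in the order they were captured.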
4.4 Framework Based Architecture

This section presents a detailed description of how we map the processing pipeline described in the previous section onto a map-reduce framework. We also present the key decisions that we had to make in the selection of a map-reduce framework, and summarize its functions, advantages and disadvantages.

4.4.1 Apache Hadoop – A Map-Reduce Framework

We have discussed the map-reduce model and how it works in the earlier chapters. While the map-reduce concept is very simple, implementing it for at-scale processing is complex. Popular map-reduce frameworks generally provide the following features:

• Segment the input data set
• Schedule segments of input data to servers for processing
• Implement load balancing algorithms to schedule tasks appropriately
• Implement fault tolerance mechanisms to protect data against machine failures
• Implement a distributed file system to provide uniform access to data across the map-reduce cluster
• Provide a job tracker to report status as well as restart failed workers

There are a number of frameworks that take different approaches to implementing the above features. We choose Apache Hadoop as it is a very popular open source map-reduce framework widely deployed in the industry. Using a framework well known in the industry makes our work more relevant and applicable to real data center use cases. Apache Hadoop is a production-ready map-reduce framework developed and maintained by the Apache Foundation [14]. Servers play three major roles in the Hadoop map-reduce framework:

• Master nodes
• Slave nodes
• Client machines

Figure 4.3 Components of Hadoop Map-reduce Framework [13]

Figure 4.3 shows the different types of nodes in a Hadoop system. The master nodes oversee the key functional pieces that make up Hadoop: coordinating the Hadoop Distributed File System (HDFS) and running jobs on the machines. A job tracker coordinates the processing of data. The slave nodes form the majority of the machines and process the actual data. Each slave node runs two services: the data node daemon and the task tracker daemon. The data node daemon is a slave to the name node, while the task tracker daemon is a slave to the job tracker. Client machines, though installed with Hadoop software, are neither masters nor slaves. The client machines are usually used to load data onto HDFS and to copy data and results out of the file system.

4.4.1.1 Typical Hadoop Workflow

A typical workflow for processing data using Hadoop is as follows:

• Load data into the cluster (HDFS writes)
• Analyze the data (map-reduce)
• Store results in the cluster (HDFS writes)
• Read the results from the cluster (HDFS reads)

4.4.1.2 Fault Tolerance and Load Balancing

This section briefly describes some of the mission-critical features provided by Hadoop. A detailed description of these features is beyond the scope of this project. When data is loaded into the cluster (HDFS writes), HDFS replicates the data to multiple machines in the cluster, making the data highly available in the event of hardware failures. The job tracker on a master node, along with the task trackers on each slave machine, ensures that the processing workload is distributed efficiently based on the number of slave machines available in the cluster.

4.4.2 Netdoop and Hadoop

Netdoop is a protocol and flow-aware distributed network trace-analysis tool that we developed as a part of this project. Netdoop uses the features provided by the Hadoop framework, such as the Hadoop Distributed File System (HDFS) and its task scheduling capabilities.
4.4.2.1 Netdoop Workflow

The Netdoop tool works as follows, leveraging features provided by Hadoop:

• Netdoop software is installed on the Hadoop cluster
• A network trace in the format described in Section 4.2 is used as input
• The network trace is loaded onto the Hadoop cluster (HDFS writes)
• Hadoop calls the map function implemented by Netdoop
• The map function implemented by Netdoop parses the network trace for flow information, compiles protocol and flow-aware key-value pairs, and hands off the key-value pairs to Hadoop for scheduling
• The job tracker of Hadoop schedules the key-value pairs to slave nodes in the cluster for processing
• Hadoop calls the processing components of Netdoop on each slave node
• Netdoop processes the key-value pairs scheduled to that slave node and generates result key-value pairs; the key-value pairs produced in this step contain performance metrics for the protocol being analyzed
• Hadoop calls the reduce function implemented by Netdoop and schedules the reduce operation on selected slave nodes; the Netdoop reduce functions compile the final performance metrics and write the data to the cluster's file system (HDFS)
• The performance data is stored in a MySQL database
• Users load and query the performance data in the database using the web interface

In this chapter we presented some of the assumptions related to our work. We also described the design of the data processing pipeline of Netdoop and how Netdoop leverages the Hadoop map-reduce framework infrastructure. We study the actual implementation of Netdoop in the next chapter.

Chapter 5

IMPLEMENTATION

This chapter presents the implementation of the Netdoop tool that we developed. We start by presenting an overview of the virtual data center environment that we created to deploy, test and demonstrate the distributed processing of network traces on a cluster of virtual servers. Next we present an overview of the modules and source code structure of Netdoop. We then show sample performance metrics that were generated by processing network traces using Netdoop.

5.1 Prototype Environment

This section describes the virtual data center environment that was created for this project. Map-reduce frameworks are parallel processing frameworks that work well in a compute cluster environment. Input data is split into chunks and processed concurrently on multiple computers in a cluster. Building and maintaining a cluster for the purpose of this project would be complex, expensive and time consuming. Hence we use server virtualization software to partition a server into multiple virtual servers, and then build a virtual cluster out of those virtual servers. Specifically, we use VMware Fusion on an Apple Mac to host these virtual servers. The number of virtual servers we use in a cluster is limited by the amount of memory and the number of processing cores available in the computer that we use for implementing, testing and demonstrating our work.

5.1.1 Virtual Data Center

Figure 5.1 Virtual Data Center Environment

Figure 5.1 shows the virtual data center environment we created for the purpose of this project. We use VMware Fusion virtualization software to instantiate four virtual machines on a laptop computer.
The number of virtual machines we could instantiate was limited by the number of CPU cores and the amount of memory available on the laptop computer. We use Ubuntu 10.04 as the operating system in each of the virtual machines. The machines are labeled master, slave 1, slave 2 and slave 3 based on their roles in the Hadoop environment. The same environment was also used to create an iSCSI initiator and target configuration to generate sample network traces containing iSCSI protocol traffic.

Each virtual machine is configured with two network interfaces. One network interface is connected to the virtual switch provided by the virtualization software. This interface allows the virtual machines to communicate with each other. Virtual machines on this private network are assigned static IP addresses as shown in Figure 5.1. The other network interface is connected to the laptop's networking stack so that applications installed on the laptop computer can communicate with the virtual machines. This network interface uses a dynamic IP address. The next two sections show the configuration of this virtual data center for 1) collecting network traces with iSCSI protocol traffic and 2) installing Netdoop and Hadoop to deploy and run the tool on network trace samples.

5.1.2 Virtual iSCSI Environment

Figure 5.2 Virtual iSCSI Environment

Figure 5.2 shows the virtual data center environment as configured to generate and collect sample network traces with iSCSI traffic. Slave nodes 1 through 3 are configured with iSCSI initiator software, and the master node is configured as an iSCSI target. The iSCSI target is configured with local virtual disks and LUNs. On successful logins from the iSCSI initiators to the iSCSI target, these LUNs are exposed to the iSCSI initiators as block storage devices. We perform a number of file operations on these disks on the iSCSI initiators to generate sufficient network and iSCSI traffic to the iSCSI target. We install Wireshark, an open source network capture utility, on the iSCSI target node to capture the network traffic on the iSCSI target's network interface connected to the virtual switch. We use Wireshark to generate the text version of the PCAP file. Network traffic captured on this port of the iSCSI target contains the traffic between the slave/iSCSI initiator nodes and the iSCSI target.
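The conversion utility itself is not listed in this report, but the following sketch shows how such a PCAP-to-text step could be driven from Python using tshark, the command-line companion to Wireshark. The field subset, tab separator and file names are assumptions for illustration, not the exact tool used.

    import subprocess

    # A subset of the Table 4.1 fields is shown for brevity.
    FIELDS = ["frame.time_relative", "frame.number", "frame.len",
              "ip.src", "ip.dst", "tcp.srcport", "tcp.dstport",
              "tcp.window_size", "tcp.len"]

    def pcap_to_text(pcap_path, text_path):
        """Export selected header fields from a PCAP capture as tab-separated text."""
        cmd = ["tshark", "-r", pcap_path, "-T", "fields", "-E", "separator=/t"]
        for field in FIELDS:
            cmd += ["-e", field]
        with open(text_path, "w") as out:
            subprocess.check_call(cmd, stdout=out)

    pcap_to_text("network_trace.pcap", "network_trace.txt")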
5.1.3 Virtual Map-Reduce Cluster

Figure 5.3 Virtual Map-reduce Environment

Figure 5.3 shows the configuration of the virtual map-reduce infrastructure that we built using the virtual data center environment described earlier. In this configuration, the master node fills the role of a master node in the Hadoop framework. The master node runs the name node and job tracker services of Hadoop. We also install Netdoop on the master node and hence onto HDFS. We use the Python web framework web.py to implement a web application server, and we develop a web application using web.py to provide a web-based user interface. We also use a MySQL database installed on the master node to store the performance metrics computed during the analysis of a network trace. The slave nodes run the data node and task tracker services of the Hadoop framework. Netdoop resides on the distributed file system (HDFS) and hence is available on all the slave nodes.

5.2 Netdoop

This section describes the implementation of the Netdoop tool. We use the Python programming language to implement Netdoop. We chose Python partly because it is an excellent object-oriented programming language and partly because of the limitations imposed by the Hadoop map-reduce framework.

5.2.1 Structure

This section summarizes the functionality of each source file of Netdoop.

5.2.1.1 mapper.py

This is the main file that implements the mapper function called by the Hadoop framework. In this file we create an instance of the FlowTable class, the flow_table object. We then call flow_table.process_line(line), where line is a line read from stdin. We use stdin because Hadoop streams the input data to the mapper function over the stdin interface. We then call flow_table.compute() to generate the partial output key-value pairs of the network statistics gathered from the network trace data received over stdin.

5.2.1.2 FlowTable.py

This file implements the FlowTable class. The FlowTable class serves as a container for a list of IscsiFlow objects, which are instances of the IscsiFlow class. We use the IscsiFlowEntry class to parse and qualify each entry received over stdin, and we then create an IscsiFlow object from the IscsiFlowEntry. We make an entry in the FlowTable only if an entry does not already exist for that flow. The compute method of the FlowTable class generates key-value pairs for the sort phase of the map-reduce pipeline.

5.2.1.3 IscsiFlow.py

In this file we implement the IscsiFlow class. This class encapsulates the properties related to a particular iSCSI flow. The properties include ip_src, ip_dst, tcp_src, tcp_dst, task_table, min_response_time, max_response_time, num_valid, num_reads, num_writes, min_window_sz, max_window_sz and frame_cnt. An iSCSI flow in this context is an iSCSI exchange between two nodes defined by the IP.src:IP.dst tuple. The class provides methods to add flow entries to the flow. The flow entries are records from the trace file received over the stdin interface.

5.2.1.4 IscsiFlowEntry.py

In this file we implement the IscsiFlowEntry class. This class encapsulates the following properties of a flow: frame_time_relative, frame_number, ip_src, ip_dst, tcp_src, tcp_dst, tcp_window_size, tcp_analysis_retransmission, tcp_analysis_window_update, tcp_analysis_window_full, tcp_len, tcp_analysis_bytes_in_flight, tcp_options_wscale, tcp_options_wscale_val, iscsi_request_frame, iscsi_response_frame, iscsi_time, iscsi_cmd_string. This class is also responsible for parsing the records from the text format trace file. We implement a number of private methods within this file that help validate and parse iSCSI flow information from the network trace.

5.2.1.5 IscsiTask.py

In this file we implement the IscsiTask class. The IscsiTask class encapsulates the properties related to a single iSCSI task, such as the task associated with a command.
This class contains the following properties: cmd_str, rd_str, wr_str, is_valid, itt, iscsi_task_entries, base_time, is_response_time_valid, task_min_response_time, task_max_response_time, is_read, is_write. This class includes private methods that serve as helper functions in computing statistics and performance metrics for an iSCSI task.

5.2.1.6 NetStatistics.py

In this file we implement all the routines required to calculate MIB statistics such as the number of broadcast packets, multicast packets, unicast packets, the total number of packets and the total bytes sent or received. We also check the frame size and maintain counters for each frame size category. We currently use four frame size ranges: less than 64 bytes, between 64 and 256 bytes, between 256 and 1500 bytes, and greater than 1500 bytes.

5.2.1.7 reducer.py

In this file we implement the reducer function called by the Hadoop framework. The reducer function is responsible for aggregating the partial results collected and sorted by Hadoop. Once again Hadoop passes these intermediate results as key-value pairs over the stdin interface; therefore reducer.py reads its input from stdin. The implementation of the reducer is very similar to the mapper, in that we use a ReducerFlowTable class to capture the partial results of the iSCSI flows generated by the map phase of Hadoop. We call the reducer_flow_table.process_line(line) method of the ReducerFlowTable class, where line is the key-value pair that we receive on stdin. We then call reducer_flow_table.report() to dump the final results generated by compiling the partial results from the map phase.

5.2.1.8 ReducerFlowTable.py

In this file we implement the ReducerFlowTable class. This class is a container for objects of the ReducerFlow class. The only property of this class is flow_table, which is a Python hash for storing the flow table entries. The ReducerFlowTable class implements two methods. The process_line() method parses the incoming key-value pairs and makes entries in the flow_table structure. The report() method generates a report of the performance metrics processed by the reducer.

5.2.1.9 ReducerFlow.py

In this file we implement the ReducerFlow class. This class encapsulates the following properties corresponding to an iSCSI flow: key, min_response_time, max_response_time, min_window_size, max_window_size, iscsi_frame_count, num_reads, num_writes. This class also implements the process_value(value) method that extracts the fields from the key-value record received over the stdin interface after the map phase.

5.2.1.10 ReducerNetStates.py

In this file we implement the reduce functions for all MIB statistics.

5.2.1.11 web2netdoop.py

In this file we implement the logic for the web interface. We leverage the features provided by web.py, a popular web application framework for Python. We implement interface functions that connect to the MySQL database and extract data. We also implement functions that format the data retrieved from the database and present it to the browser as HTTP responses.

5.2.2 Execution

This section describes the user interaction with Netdoop. Specifically, we describe the commands to copy the network trace to the cluster and to submit the network trace for processing. We then describe the web-based user interface for querying the performance results stored in the database.
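Before listing the commands, the skeleton below sketches how a Hadoop streaming mapper and reducer of this kind are typically driven from stdin. The class interfaces mirror the descriptions above (process_line(), compute(), report()), but the wiring shown here is an illustrative assumption, not the exact Netdoop source.

    # mapper.py (skeleton), invoked by Hadoop streaming on a slave node
    import sys
    from FlowTable import FlowTable            # shipped to the node via -file

    flow_table = FlowTable()
    for line in sys.stdin:                     # Hadoop streams trace records over stdin
        flow_table.process_line(line)          # parse the record into its iSCSI flow
    flow_table.compute()                       # emit partial key-value pairs on stdout

    # reducer.py (skeleton), invoked on the sorted intermediate key-value pairs
    import sys
    from ReducerFlowTable import ReducerFlowTable

    reducer_flow_table = ReducerFlowTable()
    for line in sys.stdin:                     # key-value pairs arrive sorted by Hadoop
        reducer_flow_table.process_line(line)
    reducer_flow_table.report()                # write the final per-flow metrics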
5.2.2.1 Copying the network trace to the cluster

> bin/hadoop dfs -copyFromLocal network_trace.txt /user/user/network_trace.txt

5.2.2.2 Submitting the network trace for processing

> bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar \
    -file netdoop/mapper.py -mapper mapper.py \
    -file netdoop/reducer.py -reducer reducer.py \
    -file netdoop/IscsiFlow.py -file netdoop/IscsiFlowEntry.py \
    -file netdoop/IscsiTask.py -file netdoop/ReducerFlow.py \
    -file netdoop/ReducerFlowTable.py \
    -input /user/user/network_trace.txt -output /user/user/results.txt

5.2.2.3 Copying data from the cluster

> bin/hadoop dfs -getmerge /user/user/results.txt results.txt

5.2.2.4 Web-based Interface

Figure 5.4 Netdoop Web-based User Interface

Figure 5.4 shows a screen capture of the web-based user interface of Netdoop. The web-based interface allows users to query the performance metrics stored in the database. For this reference implementation we support a minimal set of queries, such as the flow defined by an IP address pair, a date range and a time range. However, our architecture is extensible and can easily be extended to support more complex queries.

Figure 5.5 Select All From Table Query Result

Figure 5.5 shows an example screen capture of the "select all data from the table" query. First, all flows selected by the query are displayed with an expandable control. Below the flows, a summary of the search statistics is displayed.

Figure 5.6 Flow Statistics Calculated by Netdoop

Netdoop calculates iSCSI protocol related statistics and network statistics for each flow. Figure 5.6 shows the statistics calculated per flow in detail.

Figure 5.7 Search Summary Statistics

Figure 5.7 shows an example of the search summary in detail.

In this chapter we presented a detailed description of the implementation of Netdoop. We described the functions implemented by each source file. We also presented the command line interface, the commands to submit a network trace for processing, and the web-based user interface. In the next chapter, we present our conclusions and future work.

Chapter 6

CONCLUSION AND FUTURE WORK

In this chapter we draw a conclusion about the suitability of the map-reduce model and its frameworks for analyzing large network traces and extracting performance metrics of network protocols. We also suggest future work that can be done based on the results of our work.

6.1 Conclusion

We studied the applicability of the map-reduce model for processing network traces. Processing a large network trace using a parallel programming model involves multiple operations. These operations include identifying and grouping the network flows present in the trace, sorting the groups, scheduling the groups for processing on a cluster of processing nodes, and aggregating the results. The map-reduce model has been found to be a good match for scheduling the workload and providing a distributed storage infrastructure accessible across the cluster. However, general-purpose map-reduce frameworks are not optimal at partitioning network traces or aggregating the results, because of the ordering requirements and protocol awareness required for these steps. We implemented Netdoop, a tool that leverages some of the useful features of Hadoop, a general-purpose map-reduce framework that is very popular in the industry. Netdoop adds protocol/flow-awareness to Hadoop, allowing us to use the scheduling, HDFS and fault-tolerance features that Hadoop offers.
Specifically, we observed that breaking up a network trace into small flow-based chunks and processing all the chunks in parallel produces results faster than executing the same task on a single machine. We also verified that frame ordering is maintained and that the computed performance statistics were as expected. To summarize, Netdoop brings scalability through parallel processing to the analysis of network traces. While our study and implementation showed what is possible using the map-reduce model, our work needs to be extended before it is ready for real-world usage.

6.2 Future Work

There is clearly a need to improve the workflow, starting from the network trace in PCAP format through to storing the results in the database. One improvement would be to write a protocol-aware scheduler plug-in for Hadoop that avoids the need for a text-formatted input file. Another improvement would be to use a more modern real-time map-reduce framework such as Yahoo S4. Using a real-time map-reduce engine would allow the network trace to be continuously streamed through the processing software without first saving it to disk and processing it later, thus improving upon the batch-processing nature of our existing implementation.

BIBLIOGRAPHY

[1] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, 2008, pp. 107-113.
[2] Fibre Channel Backbone - 5 (FC-BB-5).
[3] Cisco Systems Inc. Introduction to Cisco IOS NetFlow. http://www.cisco.com/en/US/prod/collateral/iosswrel/ps6537/ps6555/ps6601/prod_white_paper0900aecd80406232.pdf
[4] Google Inc. Introduction to Parallel Programming and MapReduce. http://code.google.com/edu/parallel/mapreduce-tutorial.html
[5] IPCopper. IPCopper ITR1000 Packet Capture Appliance. http://www.ipcopper.com/product_itr1000.htm
[6] JDSU. Xgig: Powerful Hardware-Based Ethernet/iSCSI Analyzer for SAN and NAS.
[7] Peter Phaal. sFlow Version 5. http://www.sflow.org/sflow_version_5.txt
[8] J. Satran et al. Internet Small Computer Systems Interface (iSCSI). http://www.rfc-editor.org/rfc/pdfrfc/rfc3720.txt.pdf
[9] Wikipedia. Converged Network Adapter (CNA). http://en.wikipedia.org/wiki/Converged_network_adapter
[10] Wireshark. PCAP File Format. http://wiki.wireshark.org/Development/LibpcapFileFormat
[11] Wireshark. Wireshark: The World's Foremost Network Protocol Analyzer. http://www.wireshark.org/
[12] Xgig Software. http://www.sanhealthcheck.com/files/Finisar/Xgig/81Z6AN-Xgigiscsi-analyzer-data-sheet.pdf
[13] Understanding Hadoop Cluster and Network. http://bradhedlund.com/2011/09/10/understanding-hadoop-cluster-and-the-network
[14] Apache Hadoop. http://hadoop.apache.org
[15] NetFlow Analyzer Software. http://networkmanagementsoftware.com/5-free-tools-for-windows