Network monitoring and analysis are essential for troubleshooting and resolving issues quickly when they occur, so that network services are not brought to a standstill for extended periods. Several software network analyzers have therefore been introduced in the market to date. However, software-based analyzers incur high latency when analyzing network data, so the analysis cannot be performed in real time on high-speed networks. We develop a hardware-based system that performs data analysis at line speed (1 Gbps). The main purpose of our project is to monitor the network for malicious attacks and to keep track of data exchange between reliable and unreliable systems. Finally, we compare the performance of the hardware data analyzer with that of software data analyzers on the basis of latency.
Network monitoring is a difficult and demanding task that is a vital part of a network administrator's job. Network administrators constantly strive to keep their networks operating smoothly. If a network were down even for a short period of time, productivity within a company would decline. Administrators therefore need to monitor traffic movement and performance throughout the network and verify that security breaches do not occur within it.
Traffic reports help to detect anomalies in network behavior. Detecting an attack that is about to occur saves much of the time and cost otherwise spent recovering from it. Several software-based analyzers are in use for network traffic monitoring.
High performance RISC processor.
Soft-core processor implemented on NetFPGA-1G with a Xilinx Virtex-II Pro FPGA.
Dual-Core-Dual Thread Processor with 32-bit Instructions.
512x32b Instruction Memory.
Data memory convertible to a FIFO which is a 256x64b dual port RAM for each core.
32 General purpose registers for each core (each register is of 64-bits size).
Supports up to 2 threads.
Direct, Indirect and Relative Addressing.
Includes two hardware accelerators: Peer to Peer Detection System and Malicious Pattern detection system.
Very minimal latency for analyzing data and displaying the information on dashboard.
A 16x32b memory for hardware accelerator. (CAM Specification)
Works at 125 MHz.
The NetFPGA is a line-rate, flexible, and low-cost open platform used for research and as a teaching tool for networking hardware and router design. More than 2,000 NetFPGA systems have been deployed at over 150 institutions in over 40 countries around the world. The board contains four 1 Gigabit/second Ethernet (GigE) interfaces, a user-programmable Field Programmable Gate Array (FPGA), and four banks of locally attached Static and Dynamic Random Access Memory (SRAM and DRAM).
It has a standard PCI interface allowing it to be connected to a desktop PC or server. A reference design can be downloaded that contains a hardware-accelerated Network Interface Card (NIC) or an Internet Protocol Version 4 (IPv4) router that can be readily configured into the NetFPGA hardware. The router kit allows the NetFPGA to interoperate with other IPv4 routers.
Field Programmable Gate Array (FPGA) Logic
Xilinx Virtex-II Pro 50
53,136 logic cells
4,176 Kbit block RAM
Up to 738 Kbit distributed RAM
2 x PowerPC cores
Fully programmable by the user
Gigabit Ethernet networking ports
Connector block on left of PCB interfaces to 4 external RJ45 plugs.
Interfaces with standard Cat5E or Cat6 copper network cables using.
Fig. 1 Block diagram showing Major Components (NetFPGA)
The system design consists of a dual-core, dual-thread processor integrated with two hardware accelerators per core, an input packet arbiter, and a fall-through FIFO. The whole design is implemented on the NetFPGA. The hardware accelerators assist the processor in processing packets quickly. The processor has its own customized RISC Instruction Set Architecture (ISA) comprising arithmetic, logical, and control instructions.
The data memory of each processor is a convertible FIFO memory. It works as a FIFO while packets coming from the input packet arbiter are stored into it. Once a packet has been stored completely and the processor takes control, the convertible FIFO is converted into data memory for the processor, where the stored packet is processed.
The packet stored in the convertible FIFO is simultaneously intercepted by the two hardware accelerators integrated with the processor. The hardware accelerators perform deep inspection of the packet to extract the vital information required for the analysis.
Various applications supported by the network processor are:
1. To identify malicious activities in the network.
2. To identify the network topology (node-to-node connections through the NetFPGA).
3. To identify connections between reliable and unreliable systems.
4. To identify a possible distributed attack on a particular server.
The conceptual diagram of the whole system, showing the basic blocks (fall-through FIFO, processor cores, and hardware accelerators), is shown in Fig. 2.
Fig. 2 System Design (CORE 1 and CORE 2, each paired with a P2P Detection System and an Intrusion Detection System)
The processor is a 5-stage pipelined core that supports up to 2 threads, with a separate program counter (PC) for each thread. The stages of the core are Instruction Fetch (IF), Instruction Decode (ID), Execution (EX), Memory (MEM), and Write Back (WB). The register file is 32x64 bits, of which 17 registers are currently used by the processor core.
Fig. 3 below shows the multithread implementation for the processor core, ahead of instruction fetch and decode.
Fig. 3 Multi Thread implementation
The processor supports most of the MIPS instructions. Each thread has a specific starting address in the instruction memory, and its PC is reset to that address. A multiplexer controls which PC address is passed to the instruction memory.
The multiplexer select line is driven by a combinational logic circuit that decides which thread to select depending on the opcode bits. The instructions for each thread are written at the locations given below:
Instruction Memory 512 x 32 bits
Thread 1: Starting Memory Address 0
Thread 2: Starting Memory Address 256
Fig. 3 shows the two program counters implemented for multithreading. The multiplexer selects one of the program counters, which supplies the instruction memory with the address of the instruction to be fetched and decoded.
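The PC selection described above can be sketched as a small behavioral model. This is only an illustration, not the hardware description: the class name `ThreadPCs`, the thread indices 0 and 1, and the select-line sequence are assumptions, while the start addresses 0 and 256 come from the text.

```python
# Behavioral sketch of the two per-thread program counters and the PC mux.
THREAD_START = (0, 256)   # reset addresses of thread 1 and thread 2 (from the text)


class ThreadPCs:
    """Hypothetical model: one program counter per thread plus a selecting mux."""

    def __init__(self):
        # Each PC is reset to the starting address of its own thread.
        self.pc = list(THREAD_START)

    def fetch_address(self, select):
        """Model the multiplexer: 'select' chooses which PC addresses the memory."""
        return self.pc[select]

    def advance(self, select):
        """Only the PC of the selected thread moves to the next instruction."""
        self.pc[select] += 1


pcs = ThreadPCs()
trace = []
for select in (0, 0, 1, 0, 1):   # assumed select-line sequence, for illustration
    trace.append(pcs.fetch_address(select))
    pcs.advance(select)
print(trace)
```

Keeping one counter per thread means that switching threads needs no save or restore of the PC; the select line alone decides which instruction stream is fetched.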
Fig. 4 below shows the IF/ID stage of the processor, which consists of the IF/ID stage register, an instruction memory of depth 512, the register file, and the control unit. The running thread decides the address from which the instruction is fetched from the instruction memory. The control unit takes the opcode as input and generates the control signals for the instruction to be executed.
The ALU performs its operation according to the control signals generated by the control unit for each instruction; in general, it executes arithmetic and logical instructions. The control unit also generates the select signals for the muxes that choose the correct source/destination register pair and for the jump/branch mux.
Fig. 4 IF/ID Stage
Fig. 5 below shows the other stages of the pipeline: EX, MEM, and WB. They consist of the ID/EX stage register, an equality checker for branch instructions, hardware (a combine box) for generating the jump address from the offset specified in the instruction, muxes for selecting the correct source/destination register pair based on the instruction being executed, and the ALU.
The result generated by the ALU, along with the control signals for writing data back into the register file, is passed to the EX/MEM stage register. The memory stage contains the convertible FIFO, which acts as a FIFO while an input packet is flowing in and as data memory while the processor is in use. Data is fetched from the memory depending on the instruction being executed and is written back into the register file through the WB stage register.
Fig. 5 EX/MEM/WB Stages
The other components of the pipelined processor, namely the convertible FIFO/data memory, its control unit, the jump and branch muxes, and the combinational circuit that generates the branch mux select signal, are shown in Fig. 6 below.
Fig. 6 Convertible FIFO Control Unit
The instructions of the processor are 32 bits wide, with the fields Opcode, Source 1, Source 2, Destination, Shamt (shift amount), Function, and Stall. Each core defines 32 general purpose registers, of which 17 are currently in use. The addresses of the destination (rd), source (rs), and transfer (rt) registers are therefore each represented by 5 bits. All immediate instructions, LW/SW, branch, and jump take a 9-bit data/address offset from the instruction. The opcode and function fields are each 6 bits wide, Shamt is 4 bits wide, and Stall is 1 bit.
The processor currently supports basic R-type (ADD, SUB, AND, OR, XOR, XNOR, SLT, SLTU, SLL, SLR), LW/SW, immediate (ADDI, SUBI, ANDI, ORI, XORI, XNORI, SLTI, SLTUI), branch equal (BEQ), branch not equal (BNE), and jump instructions. The processor is flexible: control logic can be added to support further instructions such as JAL, and the number of general purpose registers in use can be increased to 32.
The stall bit is used to stall the pipeline. It is represented by the last bit of the last instruction executed by the code written in the instruction memory. The control signals for the instructions are generated by the control logic in the schematic, which is combinational logic that generates the signals depending on the opcode provided to it from the decode stage.
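The field widths given above add up to exactly 32 bits (6 + 5 + 5 + 5 + 4 + 6 + 1), which the following sketch checks while packing and unpacking an instruction word. The bit positions are an assumption: the fields are taken to be packed most-significant-first in the order they are listed, with the stall bit as the least significant bit. The opcode and function values used in the example are hypothetical, and the placement of the 9-bit offset for immediate instructions is not modeled because the text does not specify it.

```python
# Field names and widths from the text.
# ASSUMPTION: packed most-significant-first in the listed order.
FIELDS = (("opcode", 6), ("rs", 5), ("rt", 5), ("rd", 5),
          ("shamt", 4), ("funct", 6), ("stall", 1))

# The seven fields must fill the 32-bit instruction word exactly.
assert sum(width for _, width in FIELDS) == 32


def encode(**values):
    """Pack field values into one 32-bit word; missing fields default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word):
    """Split a 32-bit word back into its named fields."""
    fields = {}
    shift = 32
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


# Hypothetical R-type instruction with the stall bit set (values are made up).
word = encode(opcode=0, rs=1, rt=2, rd=3, funct=0b100000, stall=1)
print(f"{word:032b}")
print(decode(word))
```

With 5-bit register fields the format can already address all 32 defined registers, so raising the number of registers in use from 17 to 32 would not require a change to the encoding.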
Table-1
The signals generated by the control unit are regwrite, memwrite, alusrc, jump, beq, bne, aluop, regdest, and memreg.
Instructions  RegWrite  Alusrc  Memwrite  Jump  Bne  Beq  Aluop  Regdest  Memreg
R-Type        1         0       0         0     0    0    10     1        0
I-Type        1         1       0         0     0    0    10     0        0
LW            1         1       0         0     0    0    00     0        1
SW            0         1       1         0     0    0    00     0        0
BEQ           0         0       0         0     0    1    01     0        0
BNE           0         0       0         0     1    0    01     0        0
Jump          0         0       0         1     0    0    11     0        0
Table-2
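Because the control unit is purely combinational, its behavior can be modeled as a lookup keyed by instruction class. The sketch below is one reading of the control signals in Table-2. The Regdest and Memreg values follow the usual MIPS convention (destination taken from rd for R-type, write-back from memory for LW) and are stated here as an assumption. The names `Control`, `CONTROL_TABLE`, and `branch_taken` are illustrative, not taken from the design.

```python
from collections import namedtuple

Control = namedtuple(
    "Control",
    "regwrite alusrc memwrite jump bne beq aluop regdest memreg")

# One row per instruction class.
# ASSUMPTION: regdest and memreg follow the standard MIPS convention.
CONTROL_TABLE = {
    "R-Type": Control(1, 0, 0, 0, 0, 0, 0b10, 1, 0),
    "I-Type": Control(1, 1, 0, 0, 0, 0, 0b10, 0, 0),
    "LW":     Control(1, 1, 0, 0, 0, 0, 0b00, 0, 1),
    "SW":     Control(0, 1, 1, 0, 0, 0, 0b00, 0, 0),
    "BEQ":    Control(0, 0, 0, 0, 0, 1, 0b01, 0, 0),
    "BNE":    Control(0, 0, 0, 0, 1, 0, 0b01, 0, 0),
    "Jump":   Control(0, 0, 0, 1, 0, 0, 0b11, 0, 0),
}


def branch_taken(kind, operands_equal):
    """Combine the beq/bne signals with the equality checker output."""
    signals = CONTROL_TABLE[kind]
    return bool((signals.beq and operands_equal) or
                (signals.bne and not operands_equal))


print(CONTROL_TABLE["LW"])
print(branch_taken("BEQ", True), branch_taken("BNE", True))
```

The `branch_taken` helper mirrors the role of the combinational circuit that produces the branch mux select signal: a branch is taken only when the branch type and the equality checker result agree.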
A finite-state machine (FSM) is simply a mathematical model of computation used to design both computer programs and sequential logic circuits. It is conceived as an abstract machine that can be in one of a finite number of states. The machine is in only one state at a time; the state it is in at any given time is called the current state. It can change from one state to another when initiated by a triggering event or condition; this is called a transition. A particular FSM is defined by a list of its states, and the triggering condition for each transition.
The FIFO control state machine (SM) contains 3 states: FIFO_WRITE, PROCESSOR, and FIFO_READ.
FIFO_WRITE
1. Reset all the signals.
2. Write packet into FIFO.
3. Check for last word.
4. If last word of packet received: go to PROCESSOR state.
PROCESSOR
1. Processor starts running.
2. Thread 1 runs, then thread 2 runs.
3. When proc_end signal becomes high: go to FIFO_READ state.
FIFO_READ
1. Packet is read out from memory.
2. When read pointer becomes one less than write pointer: go to FIFO_WRITE state.
Fig. 7 FIFO Control State Machine
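The transitions of the FIFO control state machine can be written as a small next-state function. This is a behavioral sketch under stated assumptions: the input names `last_word`, `rd_ptr`, and `wr_ptr` are illustrative (only `proc_end` is named in the text), and pointer wraparound in the 256-deep memory is not modeled.

```python
from enum import Enum


class State(Enum):
    FIFO_WRITE = 0
    PROCESSOR = 1
    FIFO_READ = 2


def next_state(state, last_word=False, proc_end=False, rd_ptr=0, wr_ptr=0):
    """Return the next state of the FIFO control state machine."""
    if state is State.FIFO_WRITE:
        # Keep writing words until the last word of the packet has arrived.
        return State.PROCESSOR if last_word else State.FIFO_WRITE
    if state is State.PROCESSOR:
        # The memory serves as data memory until the processor signals the end.
        return State.FIFO_READ if proc_end else State.PROCESSOR
    # FIFO_READ: drain until the read pointer is one less than the write pointer.
    return State.FIFO_WRITE if rd_ptr == wr_ptr - 1 else State.FIFO_READ


state = State.FIFO_WRITE
state = next_state(state, last_word=True)       # packet fully stored
state = next_state(state, proc_end=True)        # both threads have finished
state = next_state(state, rd_ptr=7, wr_ptr=8)   # packet read out
print(state)
```

Each state corresponds to one role of the convertible memory: a FIFO being filled, a data memory owned by the processor, and a FIFO being drained.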
Software-based data analyzers cannot achieve line speed and involve packet decoding overhead. They also introduce a time lag between capturing the information, analyzing it, and displaying or providing the result. This latency is quite high and can sometimes lead to an analysis that no longer holds significance in its context. For example, if a distributed attack is imminent, or a restricted website has been accessed in an educational institution, the delay in interpreting and reporting the result can cause damage. Developing a data analyzer that works at line speed using hardware accelerators therefore seemed a possible solution.
Fig. 7 Network Analyzer Block diagram
The packet is sent through the network analyzer, which is ported on the NetFPGA, and is received at the processor (dual core, dual threaded) alongside the hardware accelerators. The accelerators extract the information used to map the network topology, showing the interconnection of the various nodes in the system. The system also checks whether any restricted website is being accessed by analyzing the data transfer and alerting the admin if the data transfer crosses a particular threshold.
The system also checks for malicious activity by performing deep packet inspection and raising an alert if malicious packets are being sent or a distributed attack is in progress.
Figure 7 above shows the basic diagram representing the various nodes in a network connected through a router installed on the NetFPGA. The hardware network analyzer is also ported on the NetFPGA, where it accesses the packets flowing through the router and analyzes them. The data extracted by the analyzer is then read by the software system, which displays the analyzed information in a user-understandable format and alerts the user/admin about any malicious activity in progress.
The Peer to Peer Detection System is a hardware accelerator designed mainly to keep track of protocols, unauthorized access, and network topology connections of incoming packets.
It provides data for applications such as traffic distribution and detection of unauthorized access by any node in the network, and it allows the processor to analyze data at line speed.
It is designed so that only information for the intended protocols is extracted and analyzed, thus avoiding unwanted packet information such as ARP.
Traffic distribution is analyzed in terms of the data transferred across the network for each protocol and for each node.
The unauthorized access flag latches whenever a node accesses an unauthorized IP address, thereby providing the access information and the data transferred during the access for each node.
Figure 8 below shows the circuit diagram for the Peer to Peer Detection System.
Fig. 8 Peer to Peer detection System
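The bookkeeping attributed to the Peer to Peer Detection System can be illustrated with a software model. This is a sketch of the behavior described above, not the circuit itself: the class `P2PMonitor`, its attribute names, and the example IP addresses are assumptions. The protocol numbers 1, 6, and 17 are the standard IP protocol numbers for ICMP, TCP, and UDP.

```python
from collections import defaultdict

# Standard IP protocol numbers for the three tracked protocols.
TRACKED = {1: "ICMP", 6: "TCP", 17: "UDP"}


class P2PMonitor:
    """Hypothetical software model of the counters kept by the accelerator."""

    def __init__(self, unauthorized_ips):
        self.unauthorized_ips = set(unauthorized_ips)
        self.protocol_packets = defaultdict(int)   # packets seen per protocol
        self.node_bytes = defaultdict(int)         # bytes sent per source node
        self.access_latch = {}                     # node -> True once it offends
        self.access_bytes = defaultdict(int)       # bytes sent to unauthorized IPs

    def observe(self, src, dst, ip_protocol, length):
        name = TRACKED.get(ip_protocol)
        if name is None:
            return                                 # untracked traffic (e.g. ARP) is skipped
        self.protocol_packets[name] += 1
        self.node_bytes[src] += length
        if dst in self.unauthorized_ips:
            self.access_latch[src] = True          # stays set once triggered
            self.access_bytes[src] += length


monitor = P2PMonitor(unauthorized_ips={"10.0.0.99"})
monitor.observe("10.0.0.1", "10.0.0.2", 6, 1500)    # TCP between allowed nodes
monitor.observe("10.0.0.1", "10.0.0.99", 17, 200)   # UDP to an unauthorized IP
monitor.observe("10.0.0.3", "10.0.0.2", None, 60)   # non-IP frame such as ARP
print(dict(monitor.protocol_packets), monitor.access_latch)
```

The latch is deliberately sticky in this model: once a node has contacted an unauthorized address, the flag remains set for later inspection even if its subsequent traffic is legitimate.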
The Malicious Pattern detection system analyses each packet for malicious patterns and keeps track of the malicious activity of each node over a period of time. This information helps the administrator to know the activity of the nodes in the network. This hardware accelerator also provides information about distributed attacks in the network. For instance, if three patterns are transferred, say pattern 1 from node 0, pattern 2 from node 1, and pattern 3 from node 2, we say that it was a distributed attack.
Thus this hardware accelerator alerts the administrator to any distributed attack as well as to malicious activity going on in the network.
The hardware accelerators receive the input packet at the same time as the processor (essentially, the convertible FIFO) and extract the information required for malicious activity detection along with the protocol information, which is then given to the OpenGL system to display in a user-readable format.
Figure 9 below shows the circuit diagram for the Malicious Pattern detection system.
Fig. 9 Malicious Activity Detection
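The distributed attack example (three patterns arriving from three different nodes) can be sketched in software as follows. The detection criterion used here is an assumption made for illustration: an attack is flagged once every configured pattern has been seen and the patterns came from more than one node. The class name `PatternMonitor`, the byte-string patterns, and the node labels are likewise hypothetical.

```python
class PatternMonitor:
    """Hypothetical model of per-node pattern tracking."""

    def __init__(self, patterns):
        self.patterns = list(patterns)   # byte strings to search for
        self.senders = {}                # pattern index -> set of sending nodes
        self.hits_per_node = {}          # node -> number of pattern matches

    def inspect(self, node, payload):
        """Record every configured pattern found in this packet payload."""
        for index, pattern in enumerate(self.patterns):
            if pattern in payload:
                self.senders.setdefault(index, set()).add(node)
                self.hits_per_node[node] = self.hits_per_node.get(node, 0) + 1

    def distributed_attack(self):
        """ASSUMED rule: all patterns seen, contributed by more than one node."""
        if len(self.senders) < len(self.patterns):
            return False
        nodes = set().union(*self.senders.values())
        return len(nodes) > 1


monitor = PatternMonitor([b"pattern1", b"pattern2", b"pattern3"])
monitor.inspect("node0", b"....pattern1....")
monitor.inspect("node1", b"pattern2")
before_last = monitor.distributed_attack()
monitor.inspect("node2", b"xxpattern3")
print(before_last, monitor.distributed_attack())
```

Tracking hits per node as well as senders per pattern keeps both views available: the activity of an individual node over time, and the network-wide combination that indicates a distributed attack.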
A compiler is a computer program that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code or an executable program).
The Instruction Set Architecture of our network processor is based on the MIPS 5-stage pipeline architecture. A MIPS GCC cross compiler generates MIPS assembly code from a C program.
This assembly code is given as input to a translator, written in C, which translates it to our modified ISA.
An assembler, also written in C, parses the modified-ISA assembly code and converts it to a binary file.
The generated binary file is then loaded into the processor's instruction memory using a Perl script.
C Program -> MIPS GCC Compiler -> MIPS Assembly Code -> Translator -> Customized Assembly Code -> Assembler -> Binary Output
Fig. 10 Flow of Custom Compile
Hardware registers are used to read out the count values for the ICMP, TCP, and UDP protocols, the access information, and the count of patterns sent by a particular node. These hardware registers can be written only by hardware and are read by software.
Software registers are used to initialize values such as the pattern information and clear signals. Software registers are written by software and read by hardware. Thus, setting pattern information or resetting particular registers in hardware is done through software registers.
OpenGL (Open Graphics Library) is a cross-language, multi-platform application programming interface (API) for rendering 2D and 3D vector graphics. The API is typically used to interact with a graphics processing unit (GPU), to achieve hardware-accelerated rendering.
OpenGL was developed by Silicon Graphics Inc. (SGI) from 1991 and released in January 1992 and is widely used in CAD, virtual reality, scientific visualization, information visualization, flight simulation, and video games. OpenGL is managed by the non-profit technology consortium Khronos Group.
We have used OpenGL and C++ to display a dashboard for the data extracted by the hardware. The data from the hardware registers of the NetFPGA is read out by a Perl script and written continuously to a text file. The C++ code reads the data, interprets it, and calls the OpenGL libraries to display the information in graphical format.
The figure below shows the basic Network Topology, restricted website access, malicious activities going on in the network and traffic distribution per node for the network.
Fig. 11 Dashboard View
Software Implementation
The software module was designed in line with the hardware implementation. A server node and a client node are created using socket programming. Time stamps are taken when the data packet is received at the NetFPGA and when the extracted information has been written into a file for analysis and represented on the dashboard through OpenGL. The latency achieved is 2.287 ms.
The latency achieved for the network processor is 5.4 us, calculated from the size of the packet sent from a node, received at the NetFPGA, and written into the file.
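As a quick check of the comparison, the two reported latencies can be put side by side. The sketch below uses only figures quoted in this report (2.287 ms, 5.4 us, and the 125 MHz clock). The variable names are illustrative, and expressing the hardware latency in clock cycles assumes that the whole interval is spent in the 125 MHz clock domain.

```python
software_latency_s = 2.287e-3   # reported latency of the software analyzer
hardware_latency_s = 5.4e-6     # reported latency of the network processor
clock_hz = 125e6                # reported operating frequency of the processor

# How many times lower the hardware latency is than the software latency.
ratio = software_latency_s / hardware_latency_s

# Hardware latency expressed in 125 MHz clock cycles (assumption: one clock domain).
cycles = hardware_latency_s * clock_hz

print(round(ratio, 1), round(cycles))
```

Under these figures the hardware path is roughly 420 times faster than the software path, and its latency corresponds to about 675 clock cycles.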
The network processor analyses the data flowing through the network at line speed once it is implemented on the control node. The basic applications of the network processor designed and implemented are:
Keeping track of malicious activity in the network.
Displaying the network topology (node-to-node connections).
Identifying restricted website access if the data exchange between the nodes exceeds a certain threshold.
Identifying a possible distributed attack on a particular server.
Team Alpha Adroits was formed as part of the EE533 coursework at the University of Southern California, Los Angeles. The course is taught by Dr. Young Cho and mentored by Siddharth Bhargav.
Ankit Dwivedi - Graduate Student - Electrical Engineering, The University of Southern California
Nitish Jain - Graduate Student - Electrical Engineering, The University of Southern California
Puneeth Appi Reddy - Graduate Student - Electrical Engineering, The University of Southern California
Ritu Arora - Graduate Student - Electrical Engineering, The University of Southern California
Vinit Melinamani - Graduate Student - Electrical Engineering, The University of Southern California
Web Link for project - http://www-scf.usc.edu/~adwivedi/page/Alpha_Adroits_Home.html