Usefulness of the results - a forgotten evaluation metric of traffic identification tools
Tomasz Bujlow (tbu@es.aau.dk)
Aalborg University

Agenda
• A few words about myself
• Motivations for traffic monitoring
• Existing methods and tools for traffic monitoring & classification - and why they are far from excellent
• A deep look into Deep Packet Inspection
• How to verify the accuracy of classification tools?
• Implementation and various applications of VBS

Tomasz Bujlow
• University of Southern Denmark (2007 – 2009): Bachelor of Computer Engineering, Computer Engineering
• The Silesian University of Technology (2003 – 2008): Master of Science in Engineering, Computer Engineering
• Aalborg University (2010 – 2014): Doctor of Philosophy (PhD), Classification and analysis of computer network traffic
• Universitat Politecnica de Catalunya (January 2013 – April 2013): Visiting PhD Student, CBA - Broadband Communications Research Group
• Cisco Certified Network Professional (2010)

Part I
Motivations for traffic monitoring

Why perform traffic monitoring?
• To obtain basic statistical information about the different kinds of flows in the network and improve Quality of Service
Our interests:
● content (audio, video)
● application (P2P, FTP)
● service (YouTube, Facebook)

Why perform traffic monitoring?
• To learn which applications are used most frequently in the network and enhance the user experience by tuning some network parameters or setting up dedicated proxies or servers
Our interests:
● application (Skype, HTTP)

Why perform traffic monitoring?
• To compare users located in the same network and group them into profiled sections
Our interests:
● application (Skype, BitTorrent)
● IP protocol (TCP / UDP)

Why perform traffic monitoring?
• To create graphs of traffic flow between different networks and optimize the amount of bandwidth bought from different content providers
Our interests:
● service (YouTube, Facebook)

Why perform traffic monitoring?
• To introduce smart logging of traffic. Logging is now required by law. The ability to recognize the type of transmitted content makes it possible to register, for example, only the text content of websites, but not images or downloaded binary files. This saves resources, especially storage space.
Our interests:
● content (text, audio, video)

Why perform traffic monitoring?
• To create a traffic generator that imitates the traffic generated by particular applications, or the real traffic in the network. This allows different solutions to be tested before they are implemented in the real network, and therefore minimizes the cost.
Our interests:
● IP protocol (TCP / UDP)
● application (HTTP, BitTorrent, Skype)
● content (audio, video)
● service (YouTube, Facebook)

Why perform traffic monitoring?
• To obtain the precise data needed to create fast and accurate traffic classifiers working in the network core, based on statistical information (Machine Learning Algorithms)
Our interests:
● IP protocol (TCP / UDP)
● application (HTTP, BitTorrent, Skype)
● content (audio, video)
● service (YouTube, Facebook)

Why perform traffic monitoring?
• To implement smart assessment of QoS at the users' level and in the core of the network
Our interests:
● IP protocol (TCP / UDP)
● application (HTTP, BitTorrent, Skype)
● content (audio, video)
● service (YouTube, Facebook)

Why perform traffic monitoring?
• To understand the behavior of different applications and services, for example:
web browsing, YouTube, World of Warcraft, Skype
Our interests:
● IP protocol (TCP / UDP)
● application (HTTP, BitTorrent, Skype)
● content (audio, video)
● service (YouTube, Facebook)

Why perform traffic monitoring?
• To detect malicious traffic, such as Botnet traffic
Our interests:
● application (bot?)

Why perform traffic monitoring?
• To detect malicious traffic, such as DDoS attacks

Part II
Existing methods and tools for traffic monitoring & classification - and why they are far from excellent

Traffic classification – overview
• Classification by ports
• Deep Packet Inspection (DPI)
• QoS based (IP precedence, DSCP)
• Statistical classification

Port-based classification
• Very simple idea, widely used by network administrators to limit unwanted traffic (generated by worms, spam, etc.)
• Implemented on almost all layer-3 switches on the market
• Can classify only applications operating on fixed port numbers
• Very easy to cheat, so unreliable
What can we get?
● the low application-layer protocol (HTTP, POP3) for some old, well-known cases

Deep Packet Inspection (DPI)
• Relies on inspecting the payload on the application layer
• Much more convenient to use than the previously described methods
• Requires significant amounts of resources
• Numerous privacy and confidentiality issues
• Encryption makes DPI more difficult
• False positives and false negatives due to the statistical methods implemented in DPI tools
What can we get?
● Everything we want: IP protocol, application, content, service, etc.
● But what kinds of results are really produced by the existing DPI tools?

DPI – is it really a consistent means?
• Ipoque PACE – application level, content container level [FLASH, WINDOWSMEDIA, QUICKTIME]
• OpenDPI – an open-source fork of PACE, the same level of consistency
• nDPI – successor of OpenDPI, additionally: service provider level [FACEBOOK, GOOGLE, TWITTER]
• Libprotoident – L4 level [TCP / UDP] + the application [BitTorrent], content [Flash_Player], or service provider [YahooError]
• NBAR – consistent output on the application level
• L7-filter – consistent output on the application level

Today, accuracy != consistency:
● accurate tools (PACE, OpenDPI, nDPI) – inconsistent
● consistent tools (NBAR, L7-filter) – inaccurate

DPI – results by PACE, OpenDPI, nDPI
• applications and application protocols: BITTORRENT, RDP, SMB, NTP, SSH, DNS, PANDO, NETBIOS, EDONKEY, SOPCAST, DIRECT_DOWNLOAD_LINK, FTP, ICMP, QUICKTIME, MAIL_SMTP, MAIL_IMAP, WINDOWSMEDIA, MAIL_POP, PPSTREAM, STUN, STEAM
• low-level application protocols: HTTP, SSL
• content: FLASH, MPEG
• undetected traffic: UNKNOWN
• nDPI adds a few services, such as Facebook, YouTube, and Google

DPI – effects of the consistency aspect
• Even if the classification results are consistent on the application level, the other levels remain unknown (IP protocol, lower application protocol, content, service). So, the usefulness of such results is very limited; however, they can be used for accounting purposes on the application level.
• Mixing the levels of the results makes things even worse:
a) it is not possible to account the traffic on any level, as always one chosen level is given and the rest is unknown
b) as only one level is given, we do not know what is on any other level, so the usefulness of such results is almost NONE!
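To make the accounting point above concrete: with consistent application-level labels, per-application byte accounting is straightforward. A minimal Java sketch (class, method, and label names are illustrative and are not part of any of the tools mentioned here):

import java.util.HashMap;
import java.util.Map;

public class AppAccounting {

    // Bytes transferred per application label, e.g. "BITTORRENT" or "HTTP".
    private final Map<String, Long> bytesPerApp = new HashMap<>();

    // Called once per classified flow with the label reported by the DPI tool.
    public void addFlow(String applicationLabel, long flowBytes) {
        bytesPerApp.merge(applicationLabel, flowBytes, Long::sum);
    }

    // Works only if every flow is labeled on the same level; a mixed result
    // such as "FLASH" (a content label) cannot be charged to any application here.
    public Map<String, Long> report() {
        return bytesPerApp;
    }
}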
DPI – reasons for the lack of consistency
• Most developers claim that "their tool provides the most detailed result, on whatever level it is"
• However, how can we assess which level is more precise? Content (MP4 video), content container (Flash), service (YouTube), or application protocol (HTTP)?
• Given that the obtained result is Flash, what is the real flow association?
a) TCP → HTTP → Flash → MP4 video → YouTube (regular file download)?
b) TCP → RTMP → Flash → Justin.tv (live TV streaming)?
c) TCP → FTP → Flash → EXE (executable file inside a Flash container transferred by FTP)?

DPI – how to generate useful results?
• Structure the results, so that all the relevant classification levels are evaluated:
a) IP protocol level (TCP / UDP)
b) lower application-level protocol, such as HTTP, SSL, POP3, etc.
c) higher application-level protocol or application, such as SMTPS, Skype, BitTorrent, Dropbox
d) content, such as MP4 video, FLV video, MP3 audio, JPG image
e) service, such as Facebook, YouTube, or Google
• Implemented by:
a) the new version of PACE (partly and in a very limited manner)
b) the new, development version of nDPI (full implementation)

DPI – results generated by the new PACE
BitTorrent:plain:not_detected SSL:generic:not_detected RDP:no_subprotocols:not_detected
HTTP:generic:not_yet_detected unknown:no_subprotocols:not_yet_detected HTTP:generic:youtube
BitTorrent:uTP:not_detected HTTP:generic:youtube SMB/CIFS:no_subprotocols:not_detected
Socks:socksv5:not_yet_detected SSH:no_subprotocols:not_detected PPLIVE:no_subprotocols:not_detected
HTTP:generic:not_detected Skype:unknown:not_detected BitTorrent:encrypted:not_detected
PPSTREAM:no_subprotocols:not_detected DNS:no_subprotocols:not_detected Google:encrypted:not_detected
Pando:no_subprotocols:not_detected unknown:no_subprotocols:not_detected NETBIOS:no_subprotocols:not_detected
HTTP:media:not_detected Yahoo:webmail:not_detected FLASH:no_subprotocols:not_detected
eDonkey:plain:not_detected HTTP:generic:facebook

DPI – results generated by nDPI-ng
• proto: TCP->SSL_with_certificate->POP3S, service: Google → encrypted POP3 session with a Google mail server
• proto: TCP->SSL_with_certificate, service: Twitter → encrypted connection to a Twitter server
• proto: TCP->FTP_Data, content: JPG → file-transfer FTP session, which carries a JPG image
• proto: TCP->SSL_with_certificate->Dropbox, service: Dropbox → encrypted Dropbox session (the application is Dropbox) with the Dropbox server
• proto: TCP->SSL_with_certificate, service: Dropbox → encrypted session with a Dropbox server, while the application is unknown (it can be a web browser connection)
• proto: TCP->HTTP, content: WebM, service: YouTube → a flow from YouTube, which transports a WebM movie
• proto: UDP->DNS, service: Facebook → DNS query about a hostname belonging to Facebook
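Such a structured result can be captured directly in code. The Java sketch below mirrors the nDPI-ng output format shown above; the class and field names are illustrative, not the actual nDPI-ng data structures:

import java.util.List;

public class ClassificationResult {

    // e.g. ["TCP", "HTTP"] or ["TCP", "SSL_with_certificate", "POP3S"]
    final List<String> protocolStack;
    final String content;   // e.g. "WebM", or null if unknown
    final String service;   // e.g. "YouTube", or null if unknown

    ClassificationResult(List<String> protocolStack, String content, String service) {
        this.protocolStack = protocolStack;
        this.content = content;
        this.service = service;
    }

    // Prints in the same style as the examples above,
    // e.g. "proto: TCP->HTTP, content: WebM, service: YouTube".
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("proto: " + String.join("->", protocolStack));
        if (content != null) sb.append(", content: ").append(content);
        if (service != null) sb.append(", service: ").append(service);
        return sb.toString();
    }
}

Keeping each level in its own field is what makes per-level accounting possible, in contrast to the single mixed label discussed earlier.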
Using QoS markers
• Class of Service (CoS): a 3-bit field that is present in the Ethernet frame header when 802.1Q VLAN tagging is used
• Very easy to cheat – everyone can set it to any value
• Most Internet Service Providers do not trust incoming QoS markings from their customers
What can we get?
● Nothing more than what was previously set by a user or an application

Using QoS markers
• IP packets contain the Type of Service field, which can be used for layer-3 QoS marking
What can we get?
● Nothing more than what was previously set by a user or an application – limited to trusted devices in the network

Using QoS markers
• Valid values for IP Precedence: 0 – 7
• Valid values for DSCP: 0 – 63

Statistical classification
• Based on rules, which can be written manually (slow and inefficient) or derived automatically by the use of Machine Learning Algorithms (MLAs)
• Very broad choice of MLAs: K-Nearest Neighbors, K-Means, Naive Bayes, C4.5, J48, Random Forest, etc.
• The achievable detection rate is over 95%
• MLAs require a significant amount of good-quality training data
• But... the speed is the power!
What can we get?
● application
● content (indirectly)
● service (indirectly)

So how can we use the statistical methods?

What can we use to classify the traffic with statistical methods?
• IP protocol level → the Protocol field of the IP header
• Application level → statistical classification by packet sizes, ports, TCP flags, flow durations, etc.
• Content level → statistical classification by IP addresses
• Service provider level → statistical classification by IP addresses
What is the real result?
● Pretty good accuracy for the cases on which the MLA was trained
● Poor accuracy for all the other cases

Identification of service providers
• Monitoring of DNS replies delivers the required information
• Problem: many service providers may use the same IP address
"tcpdump -v -K -n -N -t -i eth0 udp src port 53"
IP (tos 0x0, ttl 46, id 30600, offset 0, flags [none], proto UDP (17), length 102) 8.8.8.8.53 > 172.26.10.88.58238: 33261 2/0/0 www.facebook.com. CNAME star.c10r.facebook.com., star.c10r.facebook.com. A 31.13.72.17 (74)
IP (tos 0x0, ttl 46, id 26945, offset 0, flags [none], proto UDP (17), length 181) 8.8.8.8.53 > 172.26.10.88.46207: 10707 4/0/0 fbstatic-a.akamaihd.net. CNAME fbstatic-a.akamaihd.net.edgesuite.net., fbstatic-a.akamaihd.net.edgesuite.net. CNAME a1168.dsw4.akamai.net., a1168.dsw4.akamai.net. A 95.101.2.73, a1168.dsw4.akamai.net. A 95.101.2.91 (153)

Part III
A deep look into Deep Packet Inspection

How much information is needed?
• It depends on the specific DPI tool
• Libprotoident requires only 4 bytes of packet payload in each direction to recognize the traffic. The price: only the IP protocol and application levels can be determined.
• Other tools also process the subsequent bytes, looking for specific signatures of content or services
• Some signatures can identify the traffic after the first packet with payload (e.g., DNS, NTP, or BitTorrent). Finding the web service or the content in an HTTP flow usually requires the first 4 packets.
• At most, the first 10 packets in each direction should be sufficient to determine all the flow characteristics

Which information is used by DPI?
Libprotoident: comparison of the first 4 bytes of payload + the packet lengths + port numbers
if (!match_str_either(data, "\x01\x00\x00\x00")) return false;
if (!match_chars_either(data, 0x00, 0x00, 0x00, ANY)) return false;
if (data->payload_len[0] == 4 && data->payload_len[1] == 1) return true;
if (data->server_port != 53 && data->client_port != 53) return false;
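Expressed outside the library, a check of this style reduces to comparing the first four payload bytes in each direction together with payload lengths and port numbers. A simplified Java sketch of the idea (this is not Libprotoident's actual API; the method names and the port-53 example are only illustrative):

public class FirstBytesRule {

    // Returns true if the first four payload bytes match the given signature
    // and one side of the flow uses the expected server port.
    static boolean matches(byte[] payload, int serverPort, int clientPort,
                           byte[] signature, int expectedPort) {
        if (payload == null || payload.length < 4) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            if (payload[i] != signature[i]) {
                return false;
            }
        }
        return serverPort == expectedPort || clientPort == expectedPort;
    }

    public static void main(String[] args) {
        byte[] payload = {0x01, 0x00, 0x00, 0x00, 0x2a};
        byte[] signature = {0x01, 0x00, 0x00, 0x00};
        // Hypothetical check against port 53, in the spirit of the fragments above.
        System.out.println(matches(payload, 53, 49152, signature, 53));
    }
}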
Which information is used by DPI?
In PACE / OpenDPI / nDPI, there are the same kinds of checks:
if ((payload_len > 0) && match_first_bytes(packet->payload, "\xe9\x03\x41\x01"))
  NDPI_LOG(0, ndpi_struct, NDPI_LOG_DEBUG, "Found PPLIVE.\n");
if ((payload_len == 0) || ((payload_len == 2) && (packet->payload[0] == 0x05) && (packet->payload[1] == 0x00)))
  NDPI_LOG(0, ndpi_struct, NDPI_LOG_DEBUG, "Found SOCKS5.\n");
if ((payload_len == 0) || (payload_len == 49) || (payload_len == 94))
  NDPI_LOG(0, ndpi_struct, NDPI_LOG_DEBUG, "Found PPLIVE.\n");
if (packet->udp->dest == htons(5041) || packet->udp->source == htons(5041))
  NDPI_LOG(0, ndpi_struct, NDPI_LOG_DEBUG, "Possible PPLIVE ...\n");

Which information is used by DPI?
• But the checks are done for each packet separately (in the order in which the packets arrive), so we do not have access to the payload of the previous packet
• The detection status is kept in state variables associated with the particular flow

Which information is used by DPI?
However, the tools use a bunch of other methods, such as an IP check:
/* Apple (FaceTime, iMessage, ...) 17.0.0.0/8 */
if (((saddr & 0xFF000000 /* 255.0.0.0 */) == 0x11000000 /* 17.0.0.0 */) ||
    ((daddr & 0xFF000000 /* 255.0.0.0 */) == 0x11000000 /* 17.0.0.0 */)) {
  flow->ndpi_result_service = NDPI_RESULT_SERVICE_APPLE;
}

Which information is used by DPI?
Or TCP flags:
if (packet->tcp->psh != 0 && flow->rtmp_bytes == 1537)
  NDPI_LOG(0, ndpi_struct, NDPI_LOG_DEBUG, ...
Or even the number of processed packets:
if (flow->packet_counter > 20)
  NDPI_LOG(0, ndpi_struct, NDPI_LOG_DEBUG, ...

Which information is used by DPI?
In order to discover web services and types of HTTP content, nDPI parses the HTTP headers to find the "host" and "content-type" lines. The "host" field is compared against domain names associated with the particular service, such as:
"amazon.com" -> NDPI_RESULT_SERVICE_AMAZON
"amazonaws.com" -> NDPI_RESULT_SERVICE_AMAZON
"amazon-adsystem.com" -> NDPI_RESULT_SERVICE_AMAZON
".apple.com" -> NDPI_RESULT_SERVICE_APPLE
".mzstatic.com" -> NDPI_RESULT_SERVICE_APPLE

Which information is used by DPI?
The "content-type" field is compared against predefined values associated with the particular types of content:
"video/mp4" -> NDPI_RESULT_CONTENT_MPEG
"video/mpeg" -> NDPI_RESULT_CONTENT_MPEG
"video/nsv" -> NDPI_RESULT_CONTENT_MPEG
"misc/ultravox" -> NDPI_RESULT_CONTENT_MPEG
"audio/ogg" -> NDPI_RESULT_CONTENT_OGG
"video/ogg" -> NDPI_RESULT_CONTENT_OGG

How to deal with encrypted traffic?
• Encrypted web traffic grew from 20% of all web traffic in 2011 to 45% in 2014
• The content is always unknown
• The application protocol (HTTPS, POP3S, SMTPS, etc.) is discovered based on ports (e.g., port 465 = SMTPS)
• The service is discovered based on:
a) inspection of the server field in certificates (nDPI)
b) matching with services based on cached DNS replies (TSTAT)
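Approach (b) can be sketched as follows: remember which IP addresses were returned for which hostnames in DNS replies, and label later (possibly encrypted) flows towards those addresses with the matching service. A minimal Java sketch with hypothetical names, in the spirit of the tcpdump example shown earlier:

import java.util.HashMap;
import java.util.Map;

public class DnsServiceCache {

    // Resolved IP address -> hostname taken from an observed DNS reply.
    private final Map<String, String> ipToHost = new HashMap<>();

    // Called for every A record seen in a DNS reply.
    public void onDnsAnswer(String hostname, String ipAddress) {
        ipToHost.put(ipAddress, hostname);
    }

    // Called for a later (possibly encrypted) flow towards that address.
    public String serviceForFlow(String remoteIp) {
        String host = ipToHost.get(remoteIp);
        if (host == null) return "unknown";
        // e.g. www.facebook.com or fbstatic-a.akamaihd.net from the earlier dump
        if (host.endsWith("facebook.com") || host.contains("fbstatic")) return "Facebook";
        if (host.endsWith("youtube.com")) return "YouTube";
        return host;  // fall back to the raw hostname
    }
}

As noted on the slides, the main caveat is that many service providers may share one IP address, so such a mapping is a hint rather than a proof.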
Part IV
How to verify the accuracy of classification tools?

The origin of the reference data
• The reference data (ground truth) are usually obtained in one of the previously described ways, which causes incompleteness and a high misclassification rate
• Publicly available databases very often contain incomplete and inaccurate data
• So how can we provide good-quality data?

Monitoring on the user's level
• System sockets provide the name of the application associated with each particular stream in the network
• Ability to split HTTP streams according to their content
• Fast, precise, and avoids privacy issues
• Avoids the unreliability of port-based or statistical tools

Volunteer-Based System (VBS)
• Collects data from clients
• Enhanced privacy
• Application names are taken from system sockets
• Recognizes different types of HTTP content
• Open source, GPL licensed
• Windows (32/64-bit) and Linux
• Can be downloaded free of charge from SourceForge: http://vbsi.sourceforge.net

Design of the system
• The Volunteer-Based System consists of clients installed on users' computers, the server located at Aalborg University, and statistics generators. Each part of the software can be developed independently and collaborates with the others through SQL database interfaces.

The concept of a flow
• Remote end-point: IP address, port
• The means of transport: the transport protocol
• Local end-point: IP address, port

Information logged for each flow
• Identifier of the client
• Start timestamp
• Hashed local, global, and remote IP addresses
• Local and remote ports
• Transport-layer protocol
• Name of the application
• Name of the network

Information logged for each packet
• Identifier of the flow
• Direction (inbound / outbound)
• Size
• State of all TCP flags (for TCP flows)
• Time elapsed since the previous packet in the flow
• Type of the content (for HTTP flows)

Privacy is guaranteed!
• Masked users' identities
• Masked IP addresses (local, global, remote)
• Collecting only general information about the transferred HTTP content (such as image/gif, video/flv, audio/mpeg)
• We do not perform deep inspection of application payloads; all the collected information is obtained only from headers

Performance tests
• The tests were made on 16 client machines located in Poland
• The clients analyzed 121.21 GB of data
• The server stored 7.4 GB of statistical data
• Communication between the clients and the server was responsible for around 5% of the traffic

Performance of VBS
• CPU usage by our system does not exceed 5% on average
• The system has no impact on the performance of users' computers
• VBS runs in the background as a Windows service or a Linux daemon, so it is completely transparent to the users

The official project website
The website http://vbsi.sourceforge.net contains:
• A broad description of the project
• Screenshots
• Roadmap
• Binary packages for Windows and Linux
• Source code in a Git repository
• Comprehensive documentation of the source code
• A bug-tracking and feature-request system

Part V
Implementation and various applications of VBS - a host-based monitoring tool

Implementation of our VBS
• Developed in Java, using the Eclipse environment
• Open source, GPL licensed, freely available to everyone
• VBS is split into a client, a server, and a statistics generator
• Modular design; all parts can be developed independently
• Runs in the background as a Windows service or a Linux daemon thanks to Yet Another Java Service Wrapper (YAJSW) – an open-source project that supports both 32-bit and 64-bit versions of Windows and Linux
• The client has an auto-update mechanism
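The per-flow and per-packet records listed earlier (Information logged for each flow / packet) translate naturally into simple data classes. A minimal Java sketch of what the client could store for each flow (field names are illustrative; the real VBS schema lives in its SQLite and MySQL tables):

import java.util.ArrayList;
import java.util.List;

public class FlowRecord {

    // Fields from the "Information logged for each flow" slide.
    int clientId;
    long startTimestamp;
    String hashedLocalIp, hashedGlobalIp, hashedRemoteIp;  // masked for privacy
    int localPort, remotePort;
    String transportProtocol;   // "TCP" or "UDP"
    String applicationName;     // taken from the system sockets
    String networkName;

    final List<PacketRecord> packets = new ArrayList<>();

    // Fields from the "Information logged for each packet" slide.
    static class PacketRecord {
        boolean inbound;
        int size;
        String tcpFlags;        // only for TCP flows
        long microsSincePrev;   // time elapsed since the previous packet in the flow
        String httpContentType; // only for HTTP flows, e.g. "video/flv"
    }
}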
The client
The client is installed on users' computers and is responsible for collecting data about the users' traffic and sending it to the server. The client consists of the following modules:
• Packet capturer
• Socket monitor
• Flows generator
• Data transmitter

Packet capturer
• Uses the jNetPcap Java library to collect packets from the network interface. The library makes use of the installed WinPcap / libpcap.
• Captures all traffic passing the network interface except the traffic from the local subnet (the traffic is filtered by Pcap on the interface level)
• jNetPcap offers detecting and stripping various headers (data-link, IP, TCP, UDP, HTTP, etc.)
• Packets are collected using the native Pcap loopPacket function, which saves the resources consumed by VBS

Socket monitor
• Calls the external socket-monitoring tools every second to ensure that even quick openings of sockets are registered
• Uses Netstat (Linux) or TCPViewCon (Windows) to get the list of open sockets
• Monitors both TCP and UDP sockets and provides information about the time of opening, the time of closing, and the name of the application associated with each socket

Flows generator
• Organizes the captured packets into flows
• Attaches the application name to the flow based on the information from the socket monitor
• TCP flows are closed based on the time when the corresponding socket is closed
• UDP flows are closed based on a timeout (one UDP socket can be associated with many flows, since only the local end-point is defined)
• Closed flows are stored in a SQLite database

Data transmitter
• When the local SQLite database file exceeds 700 kB, it is sent to the server
• Raw sockets are used for the communication, but a simple password-based authentication mechanism exists
• The transmitted file also includes the identifier of the client and information about the computer on which the client is installed: the version of the operating system and information about the RAM and CPU(s)

The server
• Responsible for registering and authenticating clients
• Receives SQLite database files from clients
• The obtained files are extracted into a MySQL database installed on the server machine
• The MySQL database contains the following tables: Clients, Flows, Packets, Applications, ContentTypes, Performance
• So far, we have collected information about 13,242,858 flows and the 999,731,839 packets associated with them; this information takes 75.4 GiB of disk space

Risk assessment
• The threat assessment we performed showed that the system can handle the majority of potential security problems

Examples of the statistics
• Data obtained from 4 users:
- User 1 – private user in Denmark, joined December 28, 2011
- User 2 – private user in Poland, joined December 28, 2011
- User 3 – private user in Poland, joined December 31, 2011
- User 4 – private user in Denmark, joined April 24, 2012
• Statistics calculated for all users together and for each user separately
• Statistics obtained on a per-application and per-content-type basis

Chart slides: Number of flows vs amount of traffic; Top applications for all users; Torrent download vs upload; Top HTTP content-types for all users; Amounts of traffic vs content types

Characterizing the applications
• We tried to obtain the characteristics of 5 different applications based on the traffic generated by all our users
• It is interesting to observe that 60% of packets (and 71% of packets carrying data) for Chrome are inbound
• For Dropbox, the number of inbound and outbound packets is almost the same, but there is a large difference in the size of the inbound and outbound packets

The End
Thank you for your attention!