Securing Networks using SDN and Machine Learning


SECURING NETWORKS USING SDN AND MACHINE LEARNING

DRAGOS COMANECI – IXIA

@DRCOMANECI

DCOMANECI@IXIACOM.COM

ABOUT ME

Software Engineer/Security Researcher at Ixia in the ATI (Application Threat Intelligence) team

Reverse engineering & emulating application protocols and strikes

Doing a PhD on Software-Enabled Adaptive Network Traffic Management (short version: SDN + ML)

SHORT INTRODUCTION

Problem:

Traditional signature-based IPS/IDS approaches won’t scale as the network becomes more complex

Solution:

Adaptive way of defending the network: SDN & Machine Learning

Allows: Anomaly detection, botnet detection, honeypot rerouting

SYSTEM OVERVIEW

[Diagram: Progressive Flow Classification (Supervised Learning, Unsupervised Learning, Flow Grouping), SDN Controller, Network Devices]

INTEGRATING FLOW CLASSIFICATION INTO AN SDN CONTROLLER

Modern SDN Controllers are basically event handlers

Streams of events come into the controller from the network and are transformed into forwarding rules

Structure flow classification as events (e.g. flow match)
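The event-handler idea above can be sketched as follows. This is a minimal illustration, not the controller used in the talk: the `FlowClassified` event, the `Rule` type, the `Controller` class and the `prioritise_voip` handler are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol)
FiveTuple = Tuple[str, int, str, int, str]


@dataclass(frozen=True)
class FlowClassified:
    """Hypothetical event: a classifier matched a flow to a known flow type."""
    flow: FiveTuple
    flow_type: str


@dataclass(frozen=True)
class Rule:
    """Hypothetical forwarding rule: match a 5-tuple and apply an action."""
    match: FiveTuple
    action: str


class Controller:
    """Toy controller: events come in, handlers turn them into rules."""

    def __init__(self) -> None:
        self._handlers: Dict[type, List[Callable]] = {}
        self.rules: List[Rule] = []

    def on(self, event_type: type, handler: Callable) -> None:
        # Register a handler for one event type.
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, event) -> None:
        # Run every handler registered for this event and collect its rules.
        for handler in self._handlers.get(type(event), []):
            self.rules.extend(handler(event))


def prioritise_voip(event: FlowClassified) -> List[Rule]:
    # Example policy (an assumption): VoIP flows get a high-priority queue.
    if event.flow_type == "voip":
        return [Rule(event.flow, "queue:high")]
    return [Rule(event.flow, "forward:normal")]


controller = Controller()
controller.on(FlowClassified, prioritise_voip)
controller.dispatch(FlowClassified(("10.0.0.1", 5060, "10.0.0.2", 5060, "udp"), "voip"))
```

Treating a classification result as just another event type means the same dispatch path that handles network events can also react to classifier output.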

NETWORK ANOMALY DETECTION

Continually train & refine supervised models for the traffic flows in our network

When a new flow doesn’t match any model, flag it as suspicious and add it to the queue for the clustering algorithm

Run clustering with side information to see if there are other flows similar to it

If it’s in a separate cluster => anomaly; if not, refine the model for the closest match
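The loop described above can be sketched in a few lines. This is a simplified stand-in, not the system from the talk: each supervised model is reduced to a centroid in feature space, and "clustering with side information" is replaced by a plain distance threshold. The class name, both radius parameters and the averaging update are assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


class AnomalyDetector:
    def __init__(self, models: Dict[str, Sequence[float]],
                 match_radius: float, cluster_radius: float) -> None:
        # Assumption: one centroid per flow type stands in for a trained model.
        self.models = {name: list(c) for name, c in models.items()}
        self.match_radius = match_radius      # "matches a model" threshold
        self.cluster_radius = cluster_radius  # "same cluster" threshold
        self.pending: List[List[float]] = []  # queue for the clustering step

    def closest_model(self, features: Sequence[float]) -> Tuple[str, float]:
        name = min(self.models, key=lambda n: math.dist(self.models[n], features))
        return name, math.dist(self.models[name], features)

    def observe(self, features: Sequence[float]) -> Optional[str]:
        """Return the matched flow type, or None if the flow was queued as suspicious."""
        name, dist = self.closest_model(features)
        if dist <= self.match_radius:
            return name
        self.pending.append(list(features))
        return None

    def review_pending(self) -> List[List[float]]:
        """Process the queue: return anomalies, refine models with the rest."""
        anomalies = []
        queue, self.pending = self.pending, []
        for flow in queue:
            name, dist = self.closest_model(flow)
            if dist > self.cluster_radius:
                anomalies.append(flow)  # sits in its own cluster => anomaly
            else:
                # Refine the closest model (here: move its centroid halfway).
                centroid = self.models[name]
                self.models[name] = [(c + f) / 2 for c, f in zip(centroid, flow)]
        return anomalies


detector = AnomalyDetector({"web": [500.0, 10.0], "dns": [80.0, 2.0]},
                           match_radius=50.0, cluster_radius=200.0)
```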

BOTNET DETECTION

Groups of hosts communicate periodically with a C&C server and execute the commands they receive from it (e.g. performing DDoS, scanning the network, sending spam)

Communication flow with the C&C server => anomaly

Executing the command afterwards produces similar communication flows => group of related flows

Anomaly + group of related flows originating from the same host afterwards => bot
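The rule above (an anomaly followed by a group of related flows from the same host) can be sketched as a simple correlation. The record types, the `window` of 300 seconds and the `min_group_size` of 3 are illustrative assumptions; the slides do not give thresholds.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Anomaly:
    """Hypothetical record: a flow from `host` matched no model at `time`."""
    host: str
    time: float


@dataclass(frozen=True)
class GroupedFlow:
    """Hypothetical record: a flow that flow grouping placed in `group_id`."""
    host: str
    time: float
    group_id: int


def find_bots(anomalies: Iterable[Anomaly], grouped_flows: Iterable[GroupedFlow],
              min_group_size: int = 3, window: float = 300.0) -> Set[str]:
    """Flag hosts whose anomaly is followed by a group of related flows."""
    flows = list(grouped_flows)
    bots: Set[str] = set()
    for anomaly in anomalies:
        # Count flows per group from the same host, after the anomaly.
        counts: Dict[int, int] = {}
        for flow in flows:
            same_host = flow.host == anomaly.host
            afterwards = anomaly.time < flow.time <= anomaly.time + window
            if same_host and afterwards:
                counts[flow.group_id] = counts.get(flow.group_id, 0) + 1
        if any(n >= min_group_size for n in counts.values()):
            bots.add(anomaly.host)
    return bots


anomalies = [Anomaly("10.0.0.5", 100.0), Anomaly("10.0.0.9", 100.0)]
flows = [
    GroupedFlow("10.0.0.5", 110.0, 7),
    GroupedFlow("10.0.0.5", 120.0, 7),
    GroupedFlow("10.0.0.5", 130.0, 7),
    GroupedFlow("10.0.0.9", 110.0, 7),   # anomaly, but no group afterwards
    GroupedFlow("10.0.0.7", 110.0, 8),   # grouped flows, but no anomaly
    GroupedFlow("10.0.0.7", 115.0, 8),
    GroupedFlow("10.0.0.7", 118.0, 8),
]
bots = find_bots(anomalies, flows)
```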

HONEYPOT TRAFFIC REROUTING

As before, if the flow doesn’t match any supervised model, mark the host which initiated it as suspicious and store the flow 5-tuple

The next time that host tries to communicate, reroute its flow to a honeypot
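A minimal sketch of the rerouting decision follows. The `HoneypotRerouter` class, its method names and the honeypot address are hypothetical; a real controller would install a forwarding rule rather than return an address.

```python
from typing import Dict, List, Optional, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol)
FiveTuple = Tuple[str, int, str, int, str]


class HoneypotRerouter:
    def __init__(self, honeypot_ip: str) -> None:
        self.honeypot_ip = honeypot_ip
        # Suspicious hosts and the 5-tuples of the flows that got them flagged.
        self.suspicious: Dict[str, List[FiveTuple]] = {}

    def next_hop(self, flow: FiveTuple, matched_model: Optional[str]) -> str:
        """Return the IP the flow should be delivered to."""
        src_ip, dst_ip = flow[0], flow[2]
        if src_ip in self.suspicious:
            # Host was flagged earlier: send this flow to the honeypot.
            return self.honeypot_ip
        if matched_model is None:
            # No supervised model matched: remember the host and the 5-tuple.
            self.suspicious.setdefault(src_ip, []).append(flow)
        return dst_ip


rerouter = HoneypotRerouter(honeypot_ip="192.0.2.99")
first = rerouter.next_hop(("10.0.0.5", 4444, "10.0.0.20", 22, "tcp"), None)
second = rerouter.next_hop(("10.0.0.5", 4445, "10.0.0.21", 22, "tcp"), "ssh")
```

In this sketch the first unmatched flow is still delivered normally and only later flows from the flagged host are diverted, which follows the "next time" wording above.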

SYSTEM ARCHITECTURE

[Diagram: Hadoop Cluster (Traffic Flows & Computed Features; Classifier Models & Flow Groups); Network Controller (Common Distributed State Data Store; three Nettle Controller VMs); Network Forwarding Elements (three Network Elements, each with a Traffic Classifier). Link labels: Flow Classification Events; Forwarding Rules & Classifier Models]

EXPERIMENTAL TESTBED

[Diagram: ML Enhanced SDN Controller; four virtualized OVS switches; Diffuse Classifier; Ixia BreakingPoint Application Traffic Emulator]

TESTING & RESULTS

Used the Ixia BreakingPoint traffic emulator to simulate enterprise, small business and ISP network traffic, using the Enterprise, SOHO/Small Business and Sandvine 2H 2013 North America Fixed application profiles

TESTING & RESULTS

Along with the normal network traffic, we also emulated application attacks (Critical Strikes strikelist – 607 strikes) as well as botnet traffic (1646 different botnets, the majority of them HTTP based)

EVALUATION & RESULTS

For training data, we generated packet captures with 256 streams for each flow type in the application profile

Then, we proceeded to train classification models for Diffuse (C4.5) for each flow type through the WEKA ML framework
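For readers unfamiliar with C4.5: it builds a decision tree by choosing, at each node, the split with the best gain ratio (information gain divided by split information). The sketch below computes that criterion for one numeric feature and one threshold. It is illustrative only and is not the WEKA or Diffuse code; the sample packet sizes and labels are made up.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values: Sequence[float], labels: Sequence[str],
               threshold: float) -> float:
    """Gain ratio of the binary split `value <= threshold`."""
    left: List[str] = [l for v, l in zip(values, labels) if v <= threshold]
    right: List[str] = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    total = len(labels)
    remainder = sum(len(part) / total * entropy(part) for part in (left, right))
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return info_gain / split_info


# Made-up example: mean packet size per flow and its flow type.
sizes = [60.0, 70.0, 1400.0, 1500.0]
types = ["dns", "dns", "video", "video"]
clean_split = gain_ratio(sizes, types, threshold=100.0)
poor_split = gain_ratio(sizes, types, threshold=65.0)
```

A threshold that separates the two flow types cleanly scores higher than one that leaves them mixed, which is why the tree would pick it.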

Classification Accuracy:

Application Profile                    Without attack/botnet traffic   With attack/botnet traffic
Enterprise                             82%                             68%
SOHO/Small Business                    87%                             79%
Sandvine 2H 2013 North America Fixed   71%                             63%

CLASSIFICATION TIME

How many packets do we have to inspect before we can reach a conclusion about the flow type? (cap at 20 packets)

Flow features:

Minimum, mean, maximum, standard deviation and sum of the packet sizes

First 10 packet sizes

Direction (initiator or responder) of each of the first 10 packets
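The feature list above can be turned into a fixed-length vector as sketched below. Several details are assumptions the slides do not state: statistics are computed over the capped 20-packet window, the population standard deviation is used, and short flows are padded with 0 (sizes) and -1 (directions).

```python
from statistics import mean, pstdev
from typing import List, Tuple

MAX_PACKETS = 20  # inspection cap from the slide
FIRST_N = 10      # number of leading packets described individually


def flow_features(packets: List[Tuple[int, bool]]) -> List[float]:
    """Build a 25-element feature vector.

    `packets` is a list of (size_in_bytes, sent_by_initiator) pairs in
    arrival order.
    """
    window = packets[:MAX_PACKETS]
    if not window:
        raise ValueError("a flow needs at least one packet")
    sizes = [size for size, _ in window]

    # 5 summary statistics of the packet sizes.
    stats = [min(sizes), mean(sizes), max(sizes), pstdev(sizes), sum(sizes)]

    # Sizes of the first 10 packets, zero-padded (padding is an assumption).
    first_sizes = sizes[:FIRST_N]
    first_sizes += [0] * (FIRST_N - len(first_sizes))

    # Direction of the first 10 packets: 1 = initiator, 0 = responder,
    # -1 = no such packet (padding value is an assumption).
    directions = [1 if from_initiator else 0
                  for _, from_initiator in window[:FIRST_N]]
    directions += [-1] * (FIRST_N - len(directions))

    return [float(x) for x in stats + first_sizes + directions]


features = flow_features([(60, True), (60, False), (1500, False)])
```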

RESOURCE USAGE OVERHEAD

Setup: one Mininet VM with Diffuse installed, simulating a topology with 4 switches, with a learning-switch SDN controller running on the same machine

CPU usage overhead when enabling Diffuse: 17%

Memory usage overhead: 13%

CONCLUSIONS

Machine learning flow classification & SDN can work together to make the network adaptive

We can extract & use three types of information from the network:

Flow type classification

New flow type classifiers

Flow groups

Anomaly detection, botnet detection & honeypot rerouting can be done

ML traffic classification overhead is manageable
