Defending Your Networks with Advanced Machine Learning

Leveraging Information for Smarter Organizational Outcomes
University of Maryland
Robert H. Smith School of Business
Applying Machine Learning
to Cyber Defense
May 18, 2015
Greg Porpora – Chief Engineer IBM Federal
Bernie Beekman – Executive I/T Architect
© 2015 IBM Corporation
Cybersecurity Challenges
▪ High-Volume Data Streams
▪ Threats Emerging at a High Rate
• Short-Lived Patterns
• Highly Adaptive New Threats
▪ Evasive Threats: hard to detect, harder to predict
• Discriminative over Multiple Channels
• High Degree of Obfuscation with Degrees of Separation
• Low-and-Slow Threats
▪ High Cost of Prevention
• Signature generation takes time and money
• Domain experts are hard to find and expensive
Cyber Defense Landscape – Welcome to the Real World
▪ Today, if you are connected to the internet and are a high-value target:
• you are either already infected and don't know it,
• or you will be infected or attacked in the near future.
▪ Everyone has defense in depth. However, attacks still get through to high-value targets every week despite massive investments.
▪ The issue is not defense in depth against known or suspected activities, but adaptive defense that can respond in real time (seconds to minutes) to changing attack vectors.
Traditional Signature Defense is Insufficient –
Advanced Pattern Detection is Needed.
▪ Signature-based security systems can only detect attacks for which a signature has previously been created.
▪ Advanced analytic techniques model behavioral patterns, leverage heuristics, and generate rules to detect behavior that falls outside of normal system operation.
• Supervised learning: we label the data and train the algorithm so that, when given new data, the system correctly determines (i.e., scores) which class it belongs to (benign or infected).
• Unsupervised learning: we provide unlabeled data and ask the system to find structures and patterns in it. By observing various data sets and activities, anomaly detection systems can classify behavior as either normal or anomalous.
• Semi-supervised learning: a class of supervised learning tasks and techniques that also uses unlabeled data for training, typically a small amount of labeled data with a large amount of unlabeled data. It falls between unsupervised learning (no labeled training data) and supervised learning (completely labeled training data).
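The first two learning modes can be contrasted in a few lines of Python. This is only a sketch under stated assumptions: the two features (kilobytes per flow, distinct destinations per minute) and all numbers are invented, the supervised part is a simple nearest-centroid classifier, and the unsupervised part flags points far from the centroid of unlabeled data. None of this is claimed to be the technique used in the presentation.

```python
import math
from statistics import mean

# Hypothetical feature vectors: (kilobytes per flow, distinct destinations per minute).
labeled = [
    ((1.0, 2.0), "benign"),
    ((1.2, 1.8), "benign"),
    ((8.0, 9.0), "infected"),
    ((7.5, 9.5), "infected"),
]

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))

def train_supervised(samples):
    """Supervised: compute one centroid per label from labeled data."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}

def classify(model, features):
    """Score new data by assigning it to the class with the nearest centroid."""
    return min(model, key=lambda label: math.dist(model[label], features))

def find_anomalies(points, factor=2.0):
    """Unsupervised: no labels; flag points unusually far from the data centroid."""
    center = centroid(points)
    distances = [math.dist(center, p) for p in points]
    cutoff = factor * mean(distances)  # illustrative cutoff, not a tuned value
    return [p for p, d in zip(points, distances) if d > cutoff]

model = train_supervised(labeled)
print(classify(model, (7.0, 8.0)))

unlabeled = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.2), (9.0, 9.0)]
print(find_anomalies(unlabeled))
```

A semi-supervised variant would start from the small labeled set and then refine the centroids with the unlabeled points; that step is omitted here to keep the sketch short.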
As it relates to Cybersecurity, Machine Learning is best used for:
• Dynamically discovering new or subtle changes in attack signatures and tradecraft
• Behavior modeling to characterize normal versus anomalous activity
• Reducing false alarms by tuning analysis sensitivity, balancing bias against variance and precision against recall
• Adapting quickly to changing threat domains
• Gaining deeper insight into Advanced Persistent Threats (APTs), both in real time and over the longer term
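The precision-versus-recall balance mentioned above can be made concrete with a small worked example. The scores and ground-truth labels below are invented for illustration; the point is only that raising the alert threshold cuts false alarms (higher precision) at the cost of missed attacks (lower recall).

```python
# Hypothetical anomaly scores with ground truth (True = real attack).
scored_events = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.40, True), (0.30, False), (0.10, False),
]

def precision_recall(events, threshold):
    """Alert on score >= threshold; return (precision, recall)."""
    tp = sum(1 for score, attack in events if score >= threshold and attack)
    fp = sum(1 for score, attack in events if score >= threshold and not attack)
    fn = sum(1 for score, attack in events if score < threshold and attack)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

for threshold in (0.85, 0.50, 0.20):
    p, r = precision_recall(scored_events, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, the strictest threshold raises no false alarms but misses half of the attacks, while the loosest catches every attack but nearly half of its alerts are false.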
Analytic Techniques for Cyber Security
• Adaptive Defense: Employment of machine learning techniques and models that self-learn and/or discover changing unknown APT signatures and tradecraft in near real-time (seconds to minutes).
• Responsive Defense: Post-forensic analysis of activities and events (hours to days) that modifies existing rules-signature/signature-less analysis or creates new ones.
• Real-Time Cyber Intelligence: Blending both Adaptive and Responsive Defense detection analysis to create new prevention and response methods and capabilities that can be employed in near real-time and shared with partners.
Diagram labels: Rules-Signature Based Analysis; Signature-less Based Analysis; Machine Learning Based Analysis; Adaptive Cyber Intelligence
• Uncover new APT tradecraft in real-time
• Dynamically adapt defensive posture
• Shorten exposure time
• Improve resiliency of enterprise
Stream Computing
How InfoSphere Streams Works
▪ Continuous ingestion
▪ Continuous analysis
Diagram labels (operators): Filter / Sample; Annotate; Transform; Correlate; Classify
Infrastructure provides services for:
• Scheduling analytics across hardware hosts
• Establishing streaming connectivity
Achieve scale:
• By partitioning applications into software components
• By distributing across stream-connected hardware hosts
Where appropriate:
• Elements can be fused together for lower communication latency
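The operator chain named on this slide (Filter / Sample, Annotate, Transform, Correlate, Classify) can be imitated with Python generators to show what continuous, record-at-a-time analysis looks like. This is a conceptual sketch only: it is not InfoSphere Streams code or its API, and the record fields (`proto`, `src`, `query`), the `10.` internal prefix, and the 40-character limit are all hypothetical.

```python
from collections import defaultdict

def filter_stage(records):
    """Filter / Sample: keep only DNS records (hypothetical 'proto' field)."""
    for rec in records:
        if rec["proto"] == "dns":
            yield rec

def annotate_stage(records, internal_prefix="10."):
    """Annotate: tag each record with whether the source is internal."""
    for rec in records:
        yield {**rec, "internal": rec["src"].startswith(internal_prefix)}

def transform_stage(records):
    """Transform: derive a feature, here the length of the queried name."""
    for rec in records:
        yield {**rec, "qlen": len(rec["query"])}

def correlate_stage(records):
    """Correlate: keep a running per-source query count across the stream."""
    counts = defaultdict(int)
    for rec in records:
        counts[rec["src"]] += 1
        yield {**rec, "queries_from_src": counts[rec["src"]]}

def classify_stage(records, max_qlen=40):
    """Classify: label long query names from internal hosts as suspicious."""
    for rec in records:
        suspicious = rec["internal"] and rec["qlen"] > max_qlen
        yield {**rec, "label": "suspicious" if suspicious else "normal"}

def pipeline(source):
    """Chain the stages; each record flows through as soon as it arrives."""
    return classify_stage(correlate_stage(transform_stage(
        annotate_stage(filter_stage(source)))))

sample = [
    {"proto": "dns", "src": "10.0.0.5", "query": "example.com"},
    {"proto": "http", "src": "10.0.0.5", "query": ""},
    {"proto": "dns", "src": "10.0.0.5", "query": "a" * 50 + ".example.net"},
]
results = list(pipeline(sample))
for rec in results:
    print(rec["src"], rec["qlen"], rec["queries_from_src"], rec["label"])
```

Because the stages are chained generators running in one process, they loosely resemble the "fused" case described above; a distributed deployment would instead place stages on separate hosts connected by streams.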
Example: Botnet Detection
Diagram labels: Botmaster; Command and Control Server; Proxies; Vulnerable host
Arrow labels: Malicious commands; Maintenance and updates
Fast Flux
▪ Fast flux DNS is a technique that a cybercriminal can use to prevent identification of the IP address of a key host server.
▪ By abusing the way the domain name system works, the criminal can create a botnet with nodes that join and drop off the network faster than law enforcement officials can trace them.
▪ The basic idea behind fast flux is to have numerous IP addresses associated with a single fully qualified domain name (e.g., “xxx.yyy.com”), where the IP addresses are swapped in and out with extremely high frequency (each time the low TTL expires) by changing DNS records.
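The rotation described above suggests a simple observable: the same domain name keeps answering with new IP addresses under a short TTL. The sketch below tracks that over repeated lookups. The class name, both thresholds, and the sample domains and addresses (taken from reserved documentation ranges) are hypothetical, and this is not presented as the detector used in the talk.

```python
from collections import defaultdict

class FluxTracker:
    """Accumulate A-record answers per domain across repeated lookups."""

    def __init__(self, max_ttl=600, min_distinct_ips=8):
        # Both thresholds are illustrative assumptions, not tuned values.
        self.max_ttl = max_ttl
        self.min_distinct_ips = min_distinct_ips
        self.ips_seen = defaultdict(set)
        self.lowest_ttl = {}

    def observe(self, domain, ttl, ips):
        """Record one DNS response (TTL plus the returned IP addresses)."""
        self.ips_seen[domain].update(ips)
        self.lowest_ttl[domain] = min(ttl, self.lowest_ttl.get(domain, ttl))

    def is_suspect(self, domain):
        """Many distinct IPs combined with a short TTL suggests fast flux."""
        return (len(self.ips_seen[domain]) >= self.min_distinct_ips
                and self.lowest_ttl.get(domain, self.max_ttl + 1) <= self.max_ttl)

tracker = FluxTracker()
# Three lookups of a hypothetical domain, each returning a fresh set of IPs.
tracker.observe("flux.example", 300, ["192.0.2.1", "198.51.100.7", "203.0.113.9"])
tracker.observe("flux.example", 300, ["192.0.2.44", "198.51.100.80", "203.0.113.3"])
tracker.observe("flux.example", 300, ["192.0.2.17", "198.51.100.5", "203.0.113.60"])
# A stable domain keeps answering with the same address and a long TTL.
tracker.observe("stable.example", 86400, ["192.0.2.200"])
tracker.observe("stable.example", 86400, ["192.0.2.200"])

print(tracker.is_suspect("flux.example"))
print(tracker.is_suspect("stable.example"))
```

Legitimate content delivery networks also rotate addresses under short TTLs, so a rule like this would need whitelisting or further features before being used for alerting.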
Fast Fluxing Example
QUESTION
fanarm.net.            IN  A
ANSWER
fanarm.net.       300  IN  A   71.35.101.107
fanarm.net.       300  IN  A   71.37.48.123
fanarm.net.       300  IN  A   195.214.238.241
fanarm.net.       300  IN  A   219.95.36.17
fanarm.net.       300  IN  A   41.222.11.122
Notice non-contiguous IP addresses and TTL of 300 seconds for Domain flickingers.net
AUTHORITY
fanarm.net.       300  IN  NS  ns1.flickingers.net.
fanarm.net.       300  IN  NS  ns2.flickingers.net.

ASN     Net-block          Country  Registrar
209     71.32.0.0/13       US       arin
209     71.32.0.0/13       US       arin
24881   195.214.236.0/22   UA       ripencc
4788    219.95.0.0/17      MY       apnic
36866   41.222.8.0/21      KE       afrinic
Notice different ASN, Countries and Registrars for Domain flickingers.net
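The two observations on this slide (short TTL, answers scattered across ASNs, countries and registrars) can be turned into features for a single DNS response. The sketch below uses the slide's own values and assumes the rows of the ASN table line up with the A records in the order listed. The function names and the rule thresholds are hypothetical.

```python
import ipaddress

# Rows taken from the slide: (IP, TTL, ASN, net-block, country, registrar).
answers = [
    ("71.35.101.107", 300, 209, "71.32.0.0/13", "US", "arin"),
    ("71.37.48.123", 300, 209, "71.32.0.0/13", "US", "arin"),
    ("195.214.238.241", 300, 24881, "195.214.236.0/22", "UA", "ripencc"),
    ("219.95.36.17", 300, 4788, "219.95.0.0/17", "MY", "apnic"),
    ("41.222.11.122", 300, 36866, "41.222.8.0/21", "KE", "afrinic"),
]

def diversity_features(rows):
    """Summarize how widely one DNS answer set is spread across networks."""
    for ip, _ttl, _asn, block, _cc, _rir in rows:
        # Sanity check: each address should fall inside its listed net-block.
        assert ipaddress.ip_address(ip) in ipaddress.ip_network(block)
    return {
        "max_ttl": max(row[1] for row in rows),
        "asns": len({row[2] for row in rows}),
        "net_blocks": len({row[3] for row in rows}),
        "countries": len({row[4] for row in rows}),
        "registrars": len({row[5] for row in rows}),
    }

def looks_like_fast_flux(features, ttl_limit=600, min_asns=3, min_countries=3):
    """Hypothetical rule: short TTL plus answers spread over many networks."""
    return (features["max_ttl"] <= ttl_limit
            and features["asns"] >= min_asns
            and features["countries"] >= min_countries)

features = diversity_features(answers)
print(features)
print(looks_like_fast_flux(features))
```

In a learned detector these counts would more likely serve as model inputs than as fixed rule thresholds; the rule form is used here only to keep the example readable.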
Botnet Detection Processing Elements
IBM’s Cognitive Cyber Defense (CCD) Solution
• Three years of effort from IBM Research in concert with IBM Federal
• 100% machine learning based network communication detector
• Detects anomalous behavior as it appears in real time
• Can easily scale to 32 TB per day of ingest
• Models dynamically adapt to changing signatures
Diagram labels: Ingest Live Netflow & DNS Data; Extract Netflow & DNS Features; Anomaly Detection via Trained Models; Dynamically Retrain Models; Visualization
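The detect-then-retrain loop in this flow can be illustrated with a baseline that updates itself as records arrive. The sketch below uses Welford's online mean and variance algorithm with a z-score test. It is a stand-in chosen for brevity, not the CCD models themselves, and the feature (DNS queries per minute from one client), the threshold, and the warm-up length are assumptions.

```python
import math

class AdaptiveBaseline:
    """Running mean/variance (Welford's algorithm) that updates per record."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # z-score above which a value is anomalous
        self.warmup = warmup        # observations required before scoring
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean

    def score(self, value):
        """Return the z-score of value against the current baseline."""
        if self.count < self.warmup:
            return 0.0
        std = math.sqrt(self.m2 / (self.count - 1))
        return abs(value - self.mean) / std if std > 0 else 0.0

    def update(self, value):
        """Fold a new observation into the baseline (the 'retrain' step)."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def process(self, value):
        """Detect first, then retrain only on values judged normal."""
        anomalous = self.score(value) > self.threshold
        if not anomalous:
            self.update(value)
        return anomalous

# Hypothetical feature: DNS queries per minute from one client.
stream = [20, 22, 19, 21, 20, 23, 21, 400, 22, 20]
baseline = AdaptiveBaseline()
flags = [baseline.process(v) for v in stream]
print(flags)
```

Retraining only on values judged normal keeps a single burst from inflating the baseline, at the price of adapting more slowly to a genuine shift in behavior.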
CCD Stream Processing Detailed Flow
Diagram labels (inputs): Blacklist & Whitelist; Proxy Logs; PCAP-DNS; Net Flow; WHOIS or Maxmind
Base Models
• DNS
• Net Flow
Applying Advanced Machine Learning Techniques to
Cyber Defense in 2015
▪ Major North American Telco
• Detected more than 200 DNS tunneling attacks
• Detected many DGA-infected clients beaconing out to botnet C&C servers
• Detected many infected clients communicating with fast-fluxing botnet C&C servers
▪ Major U.S. Corporation
• Detected many DGA-infected clients beaconing out to botnet C&C servers
• Detected many infected clients communicating with fast-fluxing botnet C&C servers
▪ Major U.S. Manufacturer
• Detected hundreds of DGA-infected clients beaconing out to botnet C&C servers
• Detected many infected clients communicating with fast-fluxing botnet C&C servers
Responding to Cyber Threats with IBM’s Cognitive
Cyber Defense Solution
Cybersecurity Challenges                       Cognitive Cyber Defense
High Volume Data Streams                       Scalable analytics platform
Cybersecurity threats emerging at high rate    Real-Time detection
Evasive threats hard to detect                 Machine based learning
High cost of expertise                         Integration into SIEM and Big Data environments