Defending Your Networks with Advanced Machine Learning
Leveraging Information for Smarter Organizational Outcomes
University of Maryland, Robert H. Smith School of Business
Applying Machine Learning to Cyber Defense
May 18, 2015
Greg Porpora, Chief Engineer, IBM Federal
Bernie Beekman, Executive I/T Architect
© 2015 IBM Corporation

Cybersecurity Challenges
High Volume Data Streams
Threats Emerging at High Rate
• Short-lived patterns
• Highly adaptive new threats
Evasive Threats: hard to detect, harder to predict
• Discriminative over multiple channels
• High degree of obfuscation with degrees of separation
• Low-and-slow threats
High Cost of Prevention
• Signature generation takes time and money
• Domain experts are hard to find and expensive

Cyber Defense Landscape: Welcome to the Real World
Today, if you are connected to the internet and are a high-value target:
• you are either already infected and don’t know about it
• or you will be infected or attacked in the near future
Everyone has defense in depth. However, attacks are getting through to high-value targets every week despite massive investments.
The issue is not defense in depth against known or suspected activities, but:
• Adaptive defense that can respond in real time (seconds to minutes) to changing attack vectors

Traditional Signature Defense is Insufficient: Advanced Pattern Detection is Needed
Signature-based security systems can only detect attacks for which a signature has previously been created.
Advanced analytic techniques model behavioral patterns, leverage heuristics, and generate rules to detect behavior that falls outside of normal system operation.
• Supervised learning: we label the data and train the algorithm so that, when given new data, it correctly determines (i.e., scores) the class to which the data belongs (benign, infected).
• Unsupervised learning: we provide data without labels and want the system to find structures and patterns in it. By observing various data sets and activities, anomaly detection systems can classify behavior as either normal or anomalous.
• Semi-supervised learning: a class of supervised learning tasks and techniques that also make use of unlabeled data for training, typically a small amount of labeled data with a large amount of unlabeled data. It falls between unsupervised learning (no labeled training data) and supervised learning (completely labeled training data).

As it relates to Cybersecurity, Machine Learning is best used for:
• Dynamically discovering new or subtle changes in attack signatures and tradecraft
• Modeling behavior to characterize normal versus anomalous activity
• Tuning the sensitivity of analysis to reduce false alarms by balancing bias against variance and precision against recall
• Adapting readily to changing threat domains
• Gaining deeper insight into Advanced Persistent Threats (APTs), both in real time and over the longer term

Analytic Techniques for Cyber Security
• Adaptive Defense: Employment of machine learning techniques and models that self-learn and/or discover changing, unknown APT signatures and tradecraft in near real time (seconds to minutes).
• Responsive Defense: Post-forensic analysis of activities and events (hours to days) that modifies existing rules-signature/signature-less analysis or creates new ones.
• Real-Time Cyber Intelligence: Blending both Adaptive and Responsive Defense detection analysis to create new prevention and response methods and capabilities that can be employed in near real time and shared with partners.
Adaptive Cyber Intelligence (diagram labels): Rules-Signature Based analysis, Signature-less Based analysis, Machine Learning Based analysis
• Uncover new APT tradecraft in real time
• Dynamically adapt defensive posture
• Shorten exposure time
• Improve resiliency of enterprise

Stream Computing

How InfoSphere Streams Works
Continuous ingestion, continuous analysis
Operators (diagram labels): Filter / Sample, Annotate, Transform, Correlate, Classify
Infrastructure provides services for:
• Scheduling analytics across hardware hosts
• Establishing streaming connectivity
Achieve scale:
• By partitioning applications into software components
• By distributing across stream-connected hardware hosts
Where appropriate:
• Elements can be fused together for lower communication latency

Example: Botnet Detection
Diagram labels: Botmaster, Command and Control Server, Proxies, Vulnerable host, Malicious commands, Maintenance and updates

Fast Flux
Fast flux DNS is a technique that a cybercriminal can use to prevent identification of his key host server's IP address.
By abusing the way the domain name system works, the criminal can create a botnet with nodes that join and drop off the network faster than law enforcement officials can trace them.
The basic idea behind fast flux is to have numerous IP addresses associated with a single fully qualified domain name (e.g., “xxx.yyy.com”), where the IP addresses are swapped in and out with extremely high frequency (after the low TTL has expired) through changing DNS records.

Fast Fluxing Example
QUESTION
    fanarm.net.          IN   A
ANSWER
    fanarm.net.    300   IN   A    71.35.101.107
    fanarm.net.    300   IN   A    71.37.48.123
    fanarm.net.    300   IN   A    195.214.238.241
    fanarm.net.    300   IN   A    219.95.36.17
    fanarm.net.    300   IN   A    41.222.11.122
Notice non-contiguous IP addresses; TTL of 300 seconds for Domain flickingers.net
AUTHORITY
    fanarm.net.    300   IN   NS   ns1.flickingers.net.
    fanarm.net.    300   IN   NS   ns2.flickingers.net.

    ASN      Net-block            Country   Registrar
    209      71.32.0.0/13         US        arin
    209      71.32.0.0/13         US        arin
    24881    195.214.236.0/22     UA        ripencc
    4788     219.95.0.0/17        MY        apnic
    36866    41.222.8.0/21        KE        afrinic
Notice different ASNs, Countries and Registrars for Domain flickingers.net

Botnet Detection Processing Elements

IBM’s Cognitive Cyber Defense (CCD) Solution
• Three years of effort from IBM Research in concert with IBM Federal
• 100% machine learning based network communication detector
• Detect anomalous behavior as it appears in real time
• Can easily scale to 32 TB per day ingest
• Models dynamically adapt to changing signatures
Diagram labels: Ingest Live Netflow & DNS Data, Extract Netflow & DNS Features, Anomaly Detection via Trained Models, Dynamically Retrain Models, Visualization

CCD Stream Processing Detailed Flow
• Blacklist & Whitelist
• Proxy Logs
• PCAP-DNS
• Net Flow
• WHOIS or MaxMind
Base Models
• DNS
• Net Flow

Applying Advanced Machine Learning Techniques to Cyber Defense in 2015
• Major North American Telco
    • Detected more than 200 DNS tunneling attacks
    • Detected many DGA-infected clients beaconing out to botnet C&C servers
    • Detected many infected clients communicating with fast-fluxing botnet C&C servers
• Major U.S. Corporation
    • Detected many DGA-infected clients beaconing out to botnet C&C servers
    • Detected many infected clients communicating with fast-fluxing botnet C&C servers
• Major U.S. Manufacturer
    • Detected hundreds of DGA-infected clients beaconing out to botnet C&C servers
    • Detected many infected clients communicating with fast-fluxing botnet C&C servers

Responding to Cyber Threats with IBM’s Cognitive Cyber Defense Solution
    Cybersecurity Challenges                        Cognitive Cyber Defense
    High Volume Data Streams                        Scalable analytics platform
    Cybersecurity threats emerging at high rate     Real-time detection
    Evasive threats hard to detect                  Machine based learning
    High cost of expertise                          Integration into SIEM and Big Data environments
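
Sketch: Fast Flux Indicators as Features
The Fast Fluxing Example slide calls out three indicators: several A records for one name, a short TTL, and answers spread across different ASNs, countries, and registrars. The sketch below shows one way such indicators could be counted per DNS response, using the values from that slide. It is an illustration only, not the CCD implementation: the names `ARecord`, `fast_flux_features`, and `looks_like_fast_flux` are hypothetical, and the hand-set thresholds stand in for the trained models that the deck describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ARecord:
    """One A record from a DNS answer, enriched with lookup data (hypothetical type)."""
    ip: str
    ttl: int          # seconds
    asn: int
    country: str
    registrar: str


def fast_flux_features(records):
    """Count the indicators highlighted on the Fast Fluxing Example slide."""
    if not records:
        raise ValueError("need at least one A record")
    return {
        "num_ips": len({r.ip for r in records}),
        "min_ttl": min(r.ttl for r in records),
        "num_asns": len({r.asn for r in records}),
        "num_countries": len({r.country for r in records}),
        "num_registrars": len({r.registrar for r in records}),
    }


def looks_like_fast_flux(features, max_ttl=300, min_asns=3, min_countries=3):
    """Threshold rule with assumed cut-offs; a trained model would replace this."""
    return (
        features["min_ttl"] <= max_ttl
        and features["num_asns"] >= min_asns
        and features["num_countries"] >= min_countries
    )


# Answer section and ASN table for fanarm.net, copied from the slide.
FANARM = [
    ARecord("71.35.101.107", 300, 209, "US", "arin"),
    ARecord("71.37.48.123", 300, 209, "US", "arin"),
    ARecord("195.214.238.241", 300, 24881, "UA", "ripencc"),
    ARecord("219.95.36.17", 300, 4788, "MY", "apnic"),
    ARecord("41.222.11.122", 300, 36866, "KE", "afrinic"),
]

if __name__ == "__main__":
    features = fast_flux_features(FANARM)
    print(features)
    print("fast flux suspected:", looks_like_fast_flux(features))
```

For the slide's response, the features come out as 5 distinct IPs, a minimum TTL of 300 seconds, and 4 distinct ASNs, countries, and registrars, so the assumed rule flags it. In a streaming setting, a feature vector like this would be computed per response and passed to a model rather than to fixed thresholds.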