Analyzing and Profiling Attacker Behavior in Multistage Intrusions

Contents
• Introduction and Background
• Literature Review
• Methodology
• Implementation
• Evaluation
• Contribution
• Conclusion
• References

Introduction
• Advances in technology have brought more sophisticated intrusions, which make network security more challenging.
• Attackers may have different intentions, and attacks may differ in severity.
• Understanding attacker behavior is important for understanding the possible risks.
• A government study [7] classifies attackers into 9 groups: Amateurs, Criminals, Insiders, Phishers, Nations, Hackers, Terrorists, Bot-network operators, and Spyware/malware authors.

Introduction (Contd..)
• Amateurs: This group of attackers has little knowledge. They attack for fun.
• Criminals: Seek to attack systems for monetary gain. They use spam, phishing, and spyware/malware to commit identity theft and online fraud.
• Phishers: Execute phishing schemes in an attempt to steal identities or information for monetary gain.
• Terrorists: Seek to destroy, incapacitate, or exploit critical infrastructures in order to threaten national security.

Attacker Groups
• Hackers: Break into networks by gaining unauthorized access, which requires a fair amount of skill or computer knowledge.
• Insiders: Their knowledge of a target system often allows them to gain unrestricted access to cause damage to the system or to steal system data.
• Nations: Use cyber tools as part of their information-gathering and espionage activities.
• Spyware/malware authors: Carry out attacks against users by producing and distributing spyware and malware.
• Bot-network operators: Use a network (bot-net) of remotely controlled systems to coordinate attacks.

Problems with IDS (Contd..)
• It is very important to profile and predict attacker intentions in order to protect the network accordingly.
• There is a need to find an efficient way to identify the type of attacker.
• IDS such as Snort [8] help in detecting single-step intrusions, but not in detecting multistage attacks and attacker behavior, due to:
  • The huge number of alerts
  • The lack of a proper model that can detect multistage attacks
  • The lack of a method that can link multistage attacks to attacker behavior

Objective
Develop a system that can
• Detect multistage attacks
• Analyze the attacker behavior by classifying the activity
• Discover the attacker behavior patterns
• Predict and profile the type of attacker based on behavior

Literature Review
• Multilevel alert clustering and intelligent alert clustering models [2] are well-formed techniques for reducing the number of alerts.
• The complexity of these models could degrade the performance of the system.
• Mathew et al. [1] present a technique for understanding multistage attacks using attack-track-based visualization of heterogeneous event streams.
• They use event correlation based on attack tracks to determine the temporal relationships between the heterogeneous events.

Literature Review (Contd..)
• The above approach is useful for understanding the stages of a multistage attack, but not for predicting user behavior.
• A user behavior perception model based on the Markov process [7] was presented for intelligent mobile terminals. The model also introduces ideas from machine learning and context-awareness.
• User behavior histories are used to discover users' preferences, and information gathered from users is used to perceive their behavior.

Methodology
• Processing the raw data
• Alert grouping
• Attacker behavior analysis
• Preparation of semi-automatic HMM training
• Training the Hidden Markov Model
• Profiling and predicting attacker behavior

Collection and Generalization of Alerts
• The raw data was provided by ORNL in pcap format.
• The generated alerts contain a lot of insignificant information, which needs to be eliminated.
• Essential details in each alert, such as the IP addresses of the source and destination hosts, the alert type, and the classification, are extracted.

Raw alert:
  [**] [1:2000537:6] ET SCAN NMAP -sS [**]
  [Classification: Attempted Information Leak] [Priority: 2]
  07/17-09:30:09.298097 192.168.101.66:33966 -> 192.168.101.53:175
  TCP TTL:49 TOS:0x0 ID:27814 IpLen:20 DgmLen:44
  S* Seq: 0x3C25204F Ack: 0x0 Win: 0x800 TcpLen: 24
  TCP Options (1) => MSS: 1460

Generalized alert (alert type, time stamp, source IP and port, destination IP and port):
  (portscan)TCPPortscan, 07/17-10:03:27.114495, 192.168.101.66, 3387, 192.168.101.54, 4497

Alert Grouping
• Snort [8] generates thousands of alerts each day, and many of them may be false alarms.
• With such a large number of alerts, it is not possible to profile and predict attacker behavior directly.
• On average, an alert is generated every 2 milliseconds, so the alerts need to be grouped.
• Alerts are grouped together when they come from the same source, target the same destination, have the same purpose (i.e. the same alert type), and are generated within one second of each other.

Grouped alert (source, destination, time stamp, alert type, count, behavior code):
  192.168.101.56, 192.168.72.1, 07/17-10:59:06, ETSCANNMAP, 70, 50

Attacker Behavior Analysis
• Based on a government study from 2010 [7], attackers are divided into groups such as amateurs, criminals, insiders, terrorists, and hackers.
• To predict attacker behavior, we use a Hidden Markov Model (HMM) [9], a machine learning model, and analyze the behavior of these attackers by defining rules for each type of attacker.
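The grouping rule from the Alert Grouping slide can be sketched as follows. This is a minimal illustration in Python, not the Java implementation used in the study. The names `Alert`, `AlertGroup`, `parse_timestamp`, and `group_alerts` are hypothetical, and the sketch assumes that "within one second" is measured against the previous alert in the same group.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    source: str
    destination: str
    timestamp: datetime
    alert_type: str


@dataclass
class AlertGroup:
    source: str
    destination: str
    alert_type: str
    first_seen: datetime
    last_seen: datetime
    count: int = 1


def parse_timestamp(text):
    """Parse a Snort-style time stamp such as '07/17-10:59:06.000000'.

    Snort time stamps carry no year, so strptime falls back to 1900.
    """
    return datetime.strptime(text, "%m/%d-%H:%M:%S.%f")


def group_alerts(alerts, window=timedelta(seconds=1)):
    """Group alerts that share source, destination, and alert type.

    Assumption: an alert joins a group when it arrives within `window`
    of the previous alert in that group; otherwise a new group starts.
    """
    latest = {}   # (source, destination, alert_type) -> most recent group
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.source, alert.destination, alert.alert_type)
        group = latest.get(key)
        if group is not None and alert.timestamp - group.last_seen <= window:
            group.last_seen = alert.timestamp
            group.count += 1
        else:
            group = AlertGroup(alert.source, alert.destination,
                               alert.alert_type, alert.timestamp,
                               alert.timestamp)
            latest[key] = group
            groups.append(group)
    return groups
```

Each resulting group carries a count, which corresponds to the count field of the grouped alert record shown on the slide.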
Hidden Markov Model
λ = (A, B, π, N), where N is the number of states
• π: state probabilities
• A: transition probabilities
• B: emission (observation) probabilities

Attacker Behavior Analysis (Contd..)
We have defined five stages, which are also the hidden states in the HMM:
• Scanning
• Enumeration
• Access attempt
• Malware attempt
• Denial of service

Stages in the multistage attack
• Scanning: The attacker tries to gather information about the target system. Observation: ICMP PING
• Enumeration: The attacker tries to find the vulnerabilities of the target system. Observation: CHAT_MSN
• Access attempt: The attacker tries to gain access to the target system's resources. Observation: SQL version overflow attempt
• Denial of service: The attacker tries to deny service to other users. Observation: NETBIOS SMB-DS Trans Max Param DOS attempt
• Malware attempt: The attacker tries to execute their own code on the target system. Observation: SHELLCODE_x86_NOOP

Preparation of Semi-Automatic HMM Training
• Each alert (observation) is mapped to one of the five hidden states.
• For example, an alert of type ICMP PING is usually considered a scanning type, and an alert of type SHELLCODE X86 INC EXC NOOP is considered an exploitation (malware attempt) type.
• As of now, we have around 88 rules to train our model.
• Once the rule set is defined, we map a state name to each alert by applying the rules.

Example alert (time stamp, alert type, attacker, victim):
  07/14-13:12:54.775367 [**] [1:384:5] ICMP PING [**] [Classification: Misc activity] [Priority: 3] {ICMP} 192.168.1.24 -> 192.168.1.1

Preparation of Semi-Automatic HMM Training (Contd..)
• We have classified all the alerts into five sets, matching the states in our model, depending on the type of alert.
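The rule-based mapping of alerts to hidden states can be sketched as below. This is an illustrative Python sketch under stated assumptions: the study's roughly 88 rules are not listed in the slides, so `STATE_RULES` holds only five hypothetical keyword rules derived from the example observations above. The case-insensitive substring matching, the first-match-wins order, and the names `ALERT_RE`, `normalize`, and `map_alert` are all assumptions.

```python
import re

# Hypothetical rule set: (keyword, state). Only the example observations
# from the slides are covered; the real rule set is not given.
STATE_RULES = [
    ("ICMP PING", "Scanning"),
    ("CHAT_MSN", "Enumeration"),
    ("SQL VERSION OVERFLOW", "Access attempt"),
    ("DOS ATTEMPT", "Denial of service"),
    ("SHELLCODE", "Malware attempt"),
]

# Matches one-line alerts of the form shown on the slide:
# time stamp, [**] [gid:sid:rev] alert type [**], classification,
# priority, {protocol}, attacker -> victim.
ALERT_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+\[\*\*\]\s+\[[\d:]+\]\s+(?P<alert_type>.+?)\s+\[\*\*\]\s+"
    r"\[Classification:\s*(?P<classification>[^\]]+)\]\s+\[Priority:\s*\d+\]\s+"
    r"\{(?P<protocol>[^}]+)\}\s+(?P<attacker>\S+)\s+->\s+(?P<victim>\S+)$"
)


def normalize(text):
    """Upper-case the text and treat underscores and spaces alike."""
    return re.sub(r"[\s_]+", " ", text).upper().strip()


def state_for(alert_type):
    """Return the state of the first rule whose keyword occurs in the type."""
    wanted = normalize(alert_type)
    for keyword, state in STATE_RULES:
        if normalize(keyword) in wanted:
            return state
    return None


def map_alert(line):
    """Turn a raw alert line into 'state, time stamp, classification,
    attacker, victim', or None if the line or its type is not recognized."""
    match = ALERT_RE.match(line.strip())
    if match is None:
        return None
    state = state_for(match.group("alert_type"))
    if state is None:
        return None
    return ", ".join([state, match.group("timestamp"),
                      match.group("classification").strip(),
                      match.group("attacker"), match.group("victim")])
```

Applied to the ICMP PING example alert, `map_alert` produces a record that begins with the state name, followed by the time stamp, classification, attacker, and victim.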
• Once the rule set is defined, we assign a state name to each alert by applying the rules.

Mapped alert (state, time stamp, classification, attacker, victim):
  Scanning, 07/14-13:12:54.775367, Misc activity, 192.168.1.24, 192.168.1.1

Training the Hidden Markov Model
Steps
• Initialization: Initializes the state, transition, and observation probabilities.
• Forward algorithm: Computes, for each time step and state, the probability of the observation sequence seen so far.
• Backward algorithm: Computes, for each time step and state, the probability of the remaining observations in the sequence.
• Re-estimation of probabilities: Re-estimates the state, transition, and observation probabilities from the forward and backward values. The forward, backward, and re-estimation steps are repeated for a number of iterations.

Training the Hidden Markov Model (Contd..)
Table 3.1 Behavior Classification
  Attacker Groups | Behavior
  Amateur | Scanning + Enumeration
  Insider, Phisher, Spyware/Malware, Botnet (ISBN) | Access attempt + Denial of service + Malware attempt
  Criminal groups, Terrorists, Hackers, Nations (CTHN) | Scanning + Enumeration + Access attempt + Malware attempt
  Terrorists, Hackers (TH) | Scanning + Enumeration + Denial of service
  Terrorists, Hackers, Criminal groups (THC) | Scanning + Enumeration + Access attempt + Denial of service + Malware attempt

Prediction of Attacker Behavior
• Once the system is trained and the probabilities are stored in our database, the next step is to match each set of incoming alerts to one of the stored behaviors.
• To find the closest behavior for a set of alerts, we use the Kullback-Leibler distance calculator [3].
• The Kullback-Leibler (K-L) distance [3] is a measure of how much two completely determined probability distributions differ.

Attacker Behavior Analysis
The Kullback-Leibler distance (K-L)
Definition: Let p1(x) and p2(x) be two continuous probability distributions.
By definition, the K-L distance D(p1, p2) between p1(x) and p2(x) is:
  D(p1, p2) = ∫ p1(x) log[p1(x)/p2(x)] dx

Basic Properties
• D(p1, p2) is the mean of the quantity log[p1(x)/p2(x)], with p1(x) being the reference distribution.
• The K-L distance is always nonnegative. It is zero only when the two distributions are identical.
• It is common to encounter the symmetric version of the K-L distance between p1 and p2:
  Ds(p1, p2) = [D(p1, p2) + D(p2, p1)] / 2

Implementation
Technologies we used
• Clustering and Generalization: Java
• Attacker Behavior Analysis: Java, C#.net
API used
• Hidden Markov Model: Jahmm [10]
• KL-Distance calculator: Jahmm [10]

Implementation (Contd..)
[Figure 1: Probability Distribution. Panels: Transition Probability, State Probability, Observation Probability]

Implementation (Contd..)
[Figure 2: Behavior Description]

Evaluation - Experimentation
[Figure: attackers and victims in the experiment, with hosts labeled "Amateur type" and "Serious threats". Addresses shown: 192.168.0.192, 192.168.0.139, 192.168.0.1, 192.169.10.11, 192.168.0.10, 192.168.179.1, 192.168.0.191, 192.168.133.1]

Evaluation - Results
[Figure 4: Behavior comparison. Vertical axis: 1/KLDistance (0 to 7). Series: Amateur, ISBN, THC, TC, CTHN]

Contribution
• Grouping alerts
• Building an HMM for each of the attacker groups
• Profiling the 5 HMM models
• Predicting attacker behavior by calculating the K-L distance [3]

Conclusion
• In our study we achieved most of the expected results.
• Overall, more than 300 types of alerts were generated through this process, which enables our system to detect most of the known attacks.
• Attacker behavior analysis is an efficient way of finding the probable behavior of an attacker, which allows us to take action according to the attacker's intentions.

Demo

References
1. S. Mathew, D. Britt, R. Giomundo, S. Upadhyaya, S. Sudit, Real-time Multistage Attack Awareness Through Enhanced Intrusion Alert Clustering, Situation Management Workshop (SIMA 2005), MILCOM 2005, Atlantic City, NJ, October 2005.
2. Siraj, Vaughn, Multilevel Alert Clustering for Intrusion Detection Sensor Data, Fuzzy Information Processing Society, USA, 2005.
3. Kullback-Leibler distance, http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_kullbak.htm
4. Yang, Gasior, Katipally, Cui, Alerts Analysis and Visualization in Network-based Intrusion Detection Systems, The Second IEEE International Conference on Information Privacy, Security, Risk and Trust (PASSAT2010), USA, 2010.
5. Yang, Katipally, Gasior, Cui, Multistage Attack Detection System for Network Administrators, CSIIRW-6, USA, 2010.
6. Manavoglu, Pavlov, Giles, Probabilistic User Behavior Models, Proceedings of the Third IEEE International Conference on Data Mining (ICDM'03), USA, 2003.
7. CYBERSPACE: United States Faces Challenges in Addressing Global Cybersecurity and Governance, July 2010.
8. Snort, http://www.snort.org
9. Mark Stamp, A Revealing Introduction to Hidden Markov Models, 2008.
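Backup: the matching step from the Prediction of Attacker Behavior slide, using the symmetric K-L distance Ds(p1, p2), can be sketched in Python as follows. This is a simplified illustration, not the Jahmm-based implementation used in the study. It compares discrete distributions over the five stages (a sum replaces the mean over a continuous density). The profile values are made-up placeholders rather than results, and the `eps` smoothing that avoids division by zero is an assumption, as are the names `smooth`, `kl_distance`, `symmetric_kl`, and `closest_profile`.

```python
import math

STAGES = ["Scanning", "Enumeration", "Access attempt",
          "Malware attempt", "Denial of service"]


def smooth(dist, eps=1e-9):
    """Add a small constant to every entry and renormalize, so that no
    probability is exactly zero (assumption made for this sketch)."""
    total = sum(dist) + eps * len(dist)
    return [(value + eps) / total for value in dist]


def kl_distance(p1, p2):
    """Discrete D(p1, p2): the mean of log[p1(x)/p2(x)] under p1."""
    return sum(a * math.log(a / b) for a, b in zip(p1, p2))


def symmetric_kl(p1, p2):
    """Ds(p1, p2) = [D(p1, p2) + D(p2, p1)] / 2 on smoothed inputs."""
    p1, p2 = smooth(p1), smooth(p2)
    return (kl_distance(p1, p2) + kl_distance(p2, p1)) / 2


def closest_profile(observed, profiles):
    """Return the name of the stored profile with the smallest Ds."""
    return min(profiles, key=lambda name: symmetric_kl(observed, profiles[name]))


# Placeholder stage distributions, ordered as in STAGES (illustrative only).
EXAMPLE_PROFILES = {
    "Amateur": [0.5, 0.5, 0.0, 0.0, 0.0],
    "TH": [0.4, 0.3, 0.0, 0.0, 0.3],
    "THC": [0.2, 0.2, 0.2, 0.2, 0.2],
}
```

For example, `closest_profile([0.6, 0.4, 0.0, 0.0, 0.0], EXAMPLE_PROFILES)` selects the placeholder Amateur profile, since the observed activity consists only of scanning and enumeration.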