Botnet Behavior and Detection Strategies
Brad Wilder

Overview

What are botnets?
What security threats are associated with botnets?
How prevalent are botnets?
What strategies can we use to detect and contain botnets?

What Are Botnets?
Basic Definitions

Botnets are networks of malware-infected machines that a remote adversary can control. They consist of:
  – Bots: malware-infected machines
  – Botmaster (aka bot herder): the attacker who controls the network
  – Command and Control (C&C) channel: the channel over which the botmaster communicates with and issues commands to the bots
  – Bot client: the particular malware on which the bot is based
The malware may be a virus, worm, Trojan horse, spyware, rootkit, or any other malicious or unwelcome software

What Are Botnets?
Basic Architecture

[Figure: botnet architecture diagram]

Botnet Security Threats
Information Infiltration

Intellectual property and personal information theft
  – Trade secrets
  – Military intelligence
  – Banking credentials
  – Usernames and passwords
  – Information on personal preferences and habits
Key logging
Phishing/man-in-the-browser
Forms the basis for massive copyright infringement and identity theft

Botnet Security Threats
Information Infiltration (cont’d)

Stone-Gross et al. hijacked the Torpig botnet, used for spam and phishing attacks, for 10 days in early 2009, during which time they collected:
  – ~70 GB of data from more than 180,000 victims
  – 8,310 financial account credentials at 410 different institutions
  – ~300,000 username-password pairs from 52,540 different infected machines
28% of the victims reused the same credentials at other Web sites, giving an attacker access to another 368,501 Web site accounts
Mariposa botnet, taken down in Spain in March
  – Sensitive information from 800,000 users, including half of the Fortune 1000 companies and more than 40 of the world’s major banks

Botnet Security Threats
DDoS

Distributed Denial of Service attacks
  – Denial of Service attacks involve
    flooding a server with so much traffic that it crashes under the unexpected load
  – Botnets distribute the workload among many bots
  – Can be used to take down critical infrastructure
  – Also used in extortion plots
  – DDoS attacks are far more difficult to stop than DoS attacks, since blocking one IP address does not stop the others
Torpig had an aggregate bandwidth of 17 Gbps without factoring in corporate networks, which accounted for 22% of the total

Botnet Security Threats
Spam

95% of all spam is thought to originate from botnets
Spam represents 90% of all email traffic
160 billion spam messages per day!
Spam is not just irritating; it causes noticeable effects for the end user
  – Slows connection speed
  – Can steal contact information from your email inbox
  – Is a conduit for spreading infections
Spam is virtually free to send, but it costs the recipient time to sift through, and costs even more if a malware payload is delivered successfully

Botnet Security Threats
Cyber Attack Sophistication vs. Cyber Criminal Sophistication

[Figure: two curves, Intruder Knowledge and Attack Sophistication, plotted on a Low to High scale over the years 1980 to 2000+. Labeled attack tools include: password guessing, self-replicating code, password cracking, exploiting known vulnerabilities, disabling audits, back doors, hijacking sessions, burglaries, sweepers, sniffers, packet spoofing, GUI, automated probes/scans, denial of service, www attacks, network mgmt. diagnostics, “stealth” / advanced scanning techniques, distributed attack tools, staged attack, cross site scripting, bots]
Source: CERT

How Prevalent Are Botnets?
Size Estimation Is Difficult

Botnet footprint: aggregate number of bots that have been under the botnet’s control over time
Botnet live population: number of bots under the botnet’s control at the same time
There is no clear way to measure the size of a botnet; common approaches include:
  – Analyzing DNS traffic, looking for bots locating a C&C server, or querying DNS blacklists to see if they have been flagged
  – Redirecting C&C traffic into sinkholes
  – Infiltration of the botnet C&C server
Most methods rely on counting bot IDs or IP addresses

How Prevalent Are Botnets?
Size Estimation Is Difficult (cont’d)

Bot IDs can be changed at the whim of the botmaster, and may be inflated to make the botnet appear larger
IP addresses do not map one-to-one onto machines
  – One of the shortcomings of IPv4
  – DHCP: dynamic allocation of IP addresses means the same machine does not always use the same IP address; this inflates the size estimate
  – NAT: allows multiple users on the same private network to more or less share the same IP address; this deflates the size estimate

How Prevalent Are Botnets?
Sizes May Be Overrepresented

Sizes are often erroneously reported
Mariposa botnet was widely reported to have claimed more than 12 million hosts
  – Original quotation indicates 12 million IP addresses
  – Still must have compromised hundreds of thousands and possibly millions of computers
  – What is most surprising is the botmasters’ utter lack of proficiency

How Prevalent Are Botnets?
Torpig Case Study

Stone-Gross et al. showed that IP address information may give a basis for estimating the size of a botnet
  – Over 10 days, they observed 1.2 million IP addresses
  – Determined later that the botnet had a footprint of 182,800 bots
  – Estimated an average live population of ~49,000 bots, based on the rate of new IP addresses used
  – Found that the IP address count overstated the botnet footprint by about an order of magnitude

How Prevalent Are Botnets?
Total Number Of Bots In Operation

The difficulty of estimating the size of a single botnet compounds the difficulty of quantifying the entire botnet problem
There may be significant overlap among botnets, leading to overestimation
Current estimates diverge widely
  – Very conservative estimates put the total number in the hundreds of thousands
  – More convincing estimates put the total number in the millions or tens of millions, spread across perhaps thousands of botnets

What Can Be Done
Prevention Strategies

Difference between prevention and detection
  – Prevention involves stopping the spread of malware
  – Detection is a reactive approach
PEBKAC
  – Only 46% of computer users always update their AV software
  – 30-60% of users have little to no knowledge about basic security issues
  – Almost half of users that open spam do so intentionally
Zero-day viruses
  – 20% of malware is not detected in the best of cases

What Can Be Done
Basic Detection Strategies

Blacklisting domain names or IPs that exhibit problematic behavior
  – Use honeypots: software traps into which malware can be lured
  – Spam boxes can be used to study spam behavior
Early (naive) attempts were overly simplistic
  – Listening on particular ports: port assignments are often just a suggestion
  – Examining packet contents: doesn’t work if the transmission is encrypted, or if the bot commands are not known ahead of time

What Can Be Done
IRC-Based Detection Strategies

Traffic analysis seems to be the most promising method for detecting botnet C&C activity
Strayer et al. showed how a pipeline of successive filters could be used to distill network traffic
They started with a base pool of over 9 million traffic flows taken over a 4-month period (>600 GB of just TCP/IP header information); to this they added 42 botnet C&C flows that they generated with a bot under their control over the course of hours

What Can Be Done
IRC-Based Detection Strategies (cont’d)

Classifier stage is used to group flows into classes of communication
  – Interactive
  – Bulk data transfers
  – Streaming
  – Transactional
Even though this seemed promising, the researchers omitted this stage in their implementation because there were too many false negatives

What Can Be Done
IRC-Based Detection Strategies (cont’d)

Correlator stage attempts to correlate flows that lie very close to each other according to some metric
The goal is to find flows that are the product of similar applications, that demonstrate a causal relationship with one another, and that follow the multicast model of communication
Associate with each flow a vector quantifying certain metrics
  – Based on temporal qualities in their implementation
Group flows pairwise and plot each pair against the distance between the two flows’ vectors

What Can Be Done
IRC-Based Detection Strategies (cont’d)

Topological Analyzer stage attempts to find a common node that would indicate the C&C channel
The researchers were able to identify 9 out of 10 bots, and find the C&C channel
Confirmed their original hypothesis that IRC-based C&C flows are highly correlated

What Can Be Done
Fast-Flux And Domain Flux

Fast-flux: bots query a given domain looking for the C&C server, but the domain is mapped onto a set of constantly changing IP addresses
  – Researchers have been able to combat this because there is a single point of entry
  – Based on DNS traffic similarity
  – The method was strongly affected by experimental parameters, and dependent on blacklists
Domain flux: the domain names themselves change; each bot contains a domain generation algorithm
  – Stone-Gross et al. countered this when hijacking Torpig
  – The arms race is stacked against us; it is not scalable
  – This technique is used by Conficker to generate 50,000 domains a day

What Can Be Done
Current And Future Trends

P2P botnets: distributed architecture makes them much more resilient
  – Probably best countered with traffic flow analysis, but this is an area of intense research
Smaller size
More bandwidth
Decentralized C&C channels
Advanced customized encryption
IP disguising

Conclusion

What are botnets?
What security threats are associated with botnets?
How prevalent are botnets?
What strategies can we use to detect and contain botnets?

Thank You
Q&A
Questions?
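
Backup: Size Estimation Sketch

The size-estimation slides note that DHCP churn inflates an IP-based count while NAT deflates it. A minimal sketch of that effect follows, assuming each check-in seen at a sinkhole carries a stable bot ID (which, as noted earlier, a botmaster may not guarantee). The `Checkin` record, the `estimate_sizes` helper, and the sample addresses are hypothetical and made up for illustration; only the final ratio uses the Torpig figures quoted in the case study.

```python
from collections import namedtuple

# Hypothetical record: one bot check-in observed at a sinkhole.
Checkin = namedtuple("Checkin", ["bot_id", "src_ip"])


def estimate_sizes(checkins):
    """Return (unique IP count, unique bot ID count) for a list of check-ins."""
    ips = {c.src_ip for c in checkins}
    bots = {c.bot_id for c in checkins}
    return len(ips), len(bots)


# Synthetic data, invented for this example (not from any real dataset).
checkins = [
    # DHCP churn: one bot seen under three addresses inflates the IP count.
    Checkin("bot-A", "198.51.100.10"),
    Checkin("bot-A", "198.51.100.57"),
    Checkin("bot-A", "198.51.100.93"),
    # NAT: two bots behind one public address deflate the IP count.
    Checkin("bot-B", "203.0.113.5"),
    Checkin("bot-C", "203.0.113.5"),
]

ip_count, bot_count = estimate_sizes(checkins)
print(f"unique IPs: {ip_count}, unique bot IDs: {bot_count}")

# Ratio from the Torpig case study: IP addresses observed vs. footprint.
torpig_ratio = 1_200_000 / 182_800
print(f"Torpig IPs per bot: {torpig_ratio:.1f}")
```

In this toy data the two effects partly cancel (4 IPs for 3 bots); in the Torpig numbers churn dominates, so the IP count is several times the footprint.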