Heat-seeking Honeypots: Design and Experience
Reporter: 鄭志欣
Advisor: Hsing-Kuo Pao
Date: 2011/05/26

Conference
• John P. John, Fang Yu, Yinglian Xie, Arvind Krishnamurthy and Martin Abadi. "Heat-seeking Honeypots: Design and Experience." 20th International World Wide Web Conference (WWW 2011).

Outline
• Introduction
• System Design
• Result
• Discussion
• Conclusions

Introduction
• Many malicious activities
  – Phishing, malware pages, open proxies
  – Vulnerable servers
  – 90% (compromised legitimate sites)
• Understanding the following would be of great value:
  – How attackers identify Web servers running vulnerable applications
  – How they compromise them
  – What subsequent actions they perform on these servers

Introduction
• Honeypot
  – Client-based
    • Visiting suspicious servers
    • Executing malicious binaries
  – Server-based
    • Passive
    • Wait for attackers
• Challenge
  – How to effectively get attackers to target these honeypots?
  – How to select which Web applications to emulate?

Our System
• Our system: Heat-seeking Honeypots
  – Actively attract attackers
  – Dynamically generate and deploy honeypot pages
  – Analyze logs to identify attack patterns

System Design
• Obtaining attacker queries
• Creation of honeypot pages
• Advertising honeypot pages to attackers
• Detecting malicious traffic

Architecture
(Figure: system architecture)

Obtaining attacker queries
• Perform brute-force port scanning on the Internet.
• Make use of Internet search engines.
  – PHP vulnerability: Phpizabi, v0.848b, c1, hfp1
• Bing log
  – Group
  – SBotMiner: inurl:/includes/joomla.php [a-z]{3,7}

Creation of honeypot pages (cont.)
• How do we create an appropriate honeypot?
  – (a) Install vulnerable Web software
    • Pros: shows how the attacker interacts with and compromises the software
    • Cons: needs a domain expert to set up the software
  – (b) Set up Web pages matching the query
    • Pros: similar to the ones created by real software; can be automated
    • Cons: fewer interactions (limited depth of attack)
  – (c) Set up proxy pages
    • Pros: the advantages of (a) and (b)
    • Cons: risk of aiding malicious attacks

Creation of honeypot pages
• In our deployment, we choose a combination of options (a) and (b)
  – Issue the query to search engines (Bing and Google)
  – Emulate the top three results
  – Fetch the Web pages at these URLs (crawler)
  – Rewrite all the links on the page (JavaScript)
  – Ex.: http://path/to/honeypot/includes/joomla.php
• VMs (a few common Web applications)
  – Separate

Advertising honeypot pages to attackers
• Ideally, we want our honeypot pages to appear in the top results of all malicious searches (this needs help from major search engines).
• In our deployment
  – Boost the chance of honeypot pages being found
  – Add surreptitious links pointing to our honeypot pages on other public Web pages (authors' home pages)
• SEO

Detecting malicious traffic
• Identifying crawlers
  – Characterizing the behavior of known crawlers
  – Identifying unknown crawlers
• Identifying malicious traffic

Identifying crawlers (cont.)
• Well-known: Google's crawler uses the user agent Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
• Characterizing the behavior of known crawlers
  – We identify a few known crawlers by looking at the user agent string and verifying the IP address.
  – A single search engine uses multiple IP addresses to crawl pages (AS).
• To distinguish static links (honeypot pages) from dynamic links (real Web software)
  – Dynamic links are accessed by only one crawler.
  – /ucp.php?mode=register&sid=1f23...e51a1b
  – /ucp.php mode=register sid=[0-9a-f]{32} (AutoRE)
  – Dynamic links (E)
  – Static links (C)

Identifying crawlers (cont.)
• Identifying unknown crawlers
  – Identify other IP addresses that behave similarly to the known crawlers
• "Similar" is defined in two parts:
  – First, the IP address must access a large fraction of pages
    • K = |P|/|C|
    • All links (P), dynamic links (E), static links (C)
  – Second, |P| - |C| = |E|

Identifying crawlers (cont.)
• Identifying malicious traffic
  – Heat-seeking honeypots (static pages) attract attacker visits.
  – From the honeypot logs, attackers are
    • Not targeting these static pages
    • Trying to access non-existent files or private files
• Whitelist
  – Honeypot pages, real software pages, favicon.ico
  – Links outside the whitelist are suspicious.
  – Blacklist-based approaches need human operators or security experts.
  – The whitelist is automated and can be applied to different types of software.

Result
• Time: 3 months
• Place: University of Washington CS personal home page
• 96 automatically generated honeypot Web pages
• 4 manually installed Web application software packages
• 54,477 visits, 6,438 distinct IP addresses

Result
• Distinguishing malicious visits
• Properties of attacker visits
• Comparing honeypots
• Applying whitelists to the Internet

Distinguishing malicious visits
• Popular search engine crawlers; low PageRank
• Google, Bing and Yahoo
• Links visited by only one crawler are dynamic links in the software.

Crawler visit
(Figure)
• We choose K = 75%

Attack visits to each honeypot page
(Figure; annotation: Joomla)

Properties of attacker visits
(Figure; annotations: 0.1, aggressive IP)

Geographic locations & Discovery time
(Figure; annotation: 12)
• Discovery time: we calculate the number of days between the first crawl of the page by a search crawler and the first visit to the page by an attacker.

Comparing Honeypots
• 1. Web server
  – No hostname
  – Just IP
  – No hyperlinks
• 2. Vulnerable software
  – Links to them on Web sites
  – Search engines can find them
• 3. Heat-seeking honeypot pages
  – Emulate vulnerable pages
  – Search engines can find them

Comparison of the total number of visits and the number of distinct IP addresses
(Figure)

Attack types
(Figure)

Applying whitelists to the Internet
(Figure; annotation: 0.25)

Discussion
• Detectability of heat-seeking honeypots
  – Attackers may detect client-based honeypots
  – Full versions of the software packages are not installed
• Attracting more attacks
  – PlanetLab (different domains)
• Improving reaction times
  – Cooperation of search engines
• Whitelist
  – Administrators can secure Web applications

Conclusion
• In this paper, we present heat-seeking honeypots, which deploy honeypot pages corresponding to vulnerable software in order to attract attackers.
• Further, our system can detect malicious IP addresses solely through their Web access patterns, with a false-negative rate of at most 1%.

Thank You
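
Appendix: sketch of the detection rules
• The crawler test and the whitelist check from the "Identifying crawlers" slides can be sketched in Python as below. This is an illustration under assumptions, not the authors' implementation.
• Assumptions: P is read as the set of links one IP address accessed, C as the static links and E as the dynamic links. The second condition is read as "every link accessed outside C is in E". The simple 32-hex-digit rule stands in for AutoRE. All function names and sample paths are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Stand-in for AutoRE (assumption): only 32-hex-digit values are generalized.
HEX32 = re.compile(r"^[0-9a-f]{32}$")


def generalize(url):
    """Map a dynamic link such as /ucp.php?mode=register&sid=<32 hex digits>
    to a pattern, so visits with different session ids compare equal."""
    parts = urlsplit(url)
    params = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if HEX32.match(value):
            value = "[0-9a-f]{32}"
        params.append(f"{key}={value}")
    return parts.path + ("?" + "&".join(params) if params else "")


def looks_like_crawler(accessed, static_links, dynamic_links, k=0.75):
    """Two-part similarity test (assumed reading of the slide):
    1. the IP covers at least a fraction k of the static links C;
    2. every link it accessed outside C belongs to the dynamic links E."""
    if not static_links:
        return False
    pages = {generalize(u) for u in accessed}           # P
    coverage = len(pages & static_links) / len(static_links)
    outside = pages - static_links                      # P minus C
    return coverage >= k and outside <= dynamic_links   # subset of E


def suspicious_requests(requests, whitelist):
    """Return the requests whose generalized form is not on the whitelist."""
    return [u for u in requests if generalize(u) not in whitelist]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper's logs.
    static = {"/a.html", "/b.html", "/c.html", "/d.html"}
    dynamic = {"/ucp.php?mode=register&sid=[0-9a-f]{32}"}
    whitelist = static | dynamic | {"/favicon.ico"}
    visitor = ["/a.html", "/admin/config.php"]
    print(looks_like_crawler(visitor, static, dynamic))
    print(suspicious_requests(visitor, whitelist))
```

• k = 0.75 mirrors the K = 75% chosen in the deck; the whitelist is the union of honeypot pages, real software pages and favicon.ico, as on the whitelist slide.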