Throwing the Net and Catching Hackers – Final Report

Winter 2009-2010

Submitted by: Shalev Mintz, Ori Rezen
Supervisor: Amichai Shulman

Table of contents

Overview
Cloud Computing - Introduction
Amazon EC2 Cloud Computing
EC2 – Main Features
TOR – The Onion Router
Project Architecture
Results
Conclusion

Overview

Our project's goal was to discover new attack trends via an application-level honeypot. To achieve this goal, we pursued two paths:

1. Create a TOR server coupled with a sniffer, capturing all outgoing HTTP traffic.
2. Deliberately infect a machine with a botnet Trojan, and log all traffic from the machine.

To cope with analyzing all the data we expected to collect, we devised an automatic tool that tags and filters the log files using a host-based filter and displays the results as HTML pages, allowing easier processing of the data.

Cloud Computing - Introduction

“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources” – U.S. National Institute of Standards and Technology (NIST)

Cloud computing technology is based on five main principles:

1. On-demand self-service – A user may at any time, without other human intervention, use available resources from the Cloud, such as data storage or server time.
2. Broad network access – Access to the Cloud is not limited to a single ISP or to a geographic location, but is available globally and from any device (e.g., mobile phones, laptops, and PDAs).
3. Resource pooling – The user need not have any technical knowledge of the Cloud’s implementation or even its geographic location, and the Cloud’s resources are pooled to serve multiple users. Examples of resources include storage, processing, memory, network bandwidth, and virtual machines.
4.
Rapid elasticity – Computing resources must be quickly deployable, with the user having the option of quickly modifying the computing power received from the Cloud. To the average user, the Cloud's resources often appear unlimited.
5. Measured service – Cloud systems have monitoring services, allowing users to pay only for the Cloud resources they consume, measured for example by computing power or storage size.

Figure: Abstract Cloud Diagram

Amazon EC2 Cloud Computing

“Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud.” - aws.amazon.com/ec2/

For our project we needed a means of creating a honeypot, with the option of creating multiple honeypots for added visibility. This led us to implement our honeypot on Amazon’s EC2 service. By creating our probes in a cloud, we enjoyed the following benefits:

1. No physical server is needed; everything is in the cloud.
2. As we were handling Trojans and were prone to hacker attacks, we could rapidly terminate and re-create our instances.
3. By creating a ‘snapshot’ of the instance, we can easily back up our work.
4. With the same ‘snapshot’ feature we can deploy as many probes as we like, and choose their geographic location as well.

Figure: EC2 Interface

EC2 – Main Features

Instances

Through this menu we can browse our running instances, launch new instances, terminate them, configure security permissions, connect to them through Remote Desktop Protocol (RDP), and view their traffic usage.

AMI

Amazon Machine Images (AMIs) are snapshots of running instances. This feature allows us to save an instance's state, either as a backup or to be redeployed whenever required. We used this feature to save all our progress for future projects. To create a new AMI, a machine needs to be ‘bundled’.

Volume

In addition to instances, a storage device may be leased from Amazon.
This ‘removable’ storage acts as an external hard disk which can be ‘attached’ to or ‘removed’ from any of your instances, in effect allowing easy data transfer between instances. This feature solved an interesting problem: when we infected an instance with a Trojan, the Trojan blocked all RDP connections to the instance, denying us any contact with the machine. This stopped us from accessing the log files Wireshark was creating on the instance. The workaround we found was to redirect Wireshark’s log files to a volume; after a while we detached the volume and reattached it to a different instance in order to read the log files.

Security Groups

EC2 allows you to determine the security settings for every instance, a feature much like a firewall with an IP table. The user may define several security groups, and under each group define the IP rules. Through the instance screen, the security groups are assigned to the different instances. This allows us to reduce the likelihood of an instance being hijacked.

TOR – The Onion Router

"Tor is a network of virtual tunnels that allows people and groups to improve their privacy and security on the Internet." - www.torproject.org

When an internet user wants to hide their identity while surfing the web, they may use the TOR client software. Instead of taking a direct route from source to destination, data packets on the Tor network take a random pathway through several relays that cover one's tracks, so no observer at any single point can tell where the data came from or where it's going.

The underlying assumption of our project is that, alongside legitimate users, TOR is also being used by nefarious groups or individuals in order to commit cybercrime while staying anonymous.

Project Architecture

We set up a TOR server on Amazon's EC2 cloud computing service and installed Wireshark on it in order to capture all outgoing traffic, i.e., traffic that used our server as an end (exit) node in the TOR network.
We wrote a batch script that uses several executables and XSLT files to automatically export the log files, first into XML format and then into HTML format, making the analysis process much easier. The password for this instance is: h2YlscaDXcG.

Another virtual machine was also set up with Wireshark, and infected with a botnet Trojan. The password for this instance is: oXFBdHBIxrY.

Executables used

Filter-garbage.exe - Compares an XML file of 'packet' nodes to a pre-defined filter list and adds a 'valid' attribute to the 'host' field of each 'packet' node accordingly. Output is written into a new file named XML_file_filtered, where XML_file is the name of the original file.
Usage: Filter-garbage XML_file filter_list

remove-non-valid.exe - Iterates over an XML file of 'packet' nodes and creates a new file that holds only the 'packet' nodes whose hosts passed the filter list. Output is written into a new file named XML_file_only_valid.xml, where XML_file is the name of the original file.
Usage: remove-non-valid XML_file

fix_xml.exe - Fixes the encoding declaration in the given XML file so that it complies with UTF-8. Output is written into a new file named XML_file_fixed, where XML_file is the name of the original file. NOTE: this executable is a workaround for a bug in earlier versions of TShark. If you are using the latest version of TShark, this executable is superfluous and should be omitted.
Usage: fix_xml XML_file

XSLT files used

wireshark-basic-v2.xslt - Converts the PDML files created by TShark to XML files that store the captured packets in the form of 'packet' nodes. Written by Amichai Shulman.

tor-to-html.xslt - Converts XML files of 'packet' nodes to HTML files that contain a table of links to the URLs that appeared in the 'packet' nodes, sorted by IP address.

Results

Trojan attack

In log file 41 we noticed access to the guest books of legitimate sites.
The guest books were filled with comments, where each comment was actually a list of hyperlinks with XML-like tags and a comment about porn. The links lead to non-pornographic sites, where a script had uploaded a link to a “porn movie”. These links appear in forum comments or fake social profiles. When the “movie” is played, the browser is redirected to another site, where you are either told that you need to download a codec to play the video (the Trojan) or, using scareware, told that you have a virus on your Windows OS and must install a new anti-virus (again the Trojan).

Using a disassembler, we took a close look at the Trojan file. Most of the code appeared to be encrypted and/or obfuscated: we saw a reference to crypt32.dll, and the code was mostly unreadable. The only readable part of the code was a request for administrator privileges.

Using a different EC2 instance, we infected the machine with the Trojan and used Wireshark to log the traffic. Analyzing the log files, we noted that our machine contacted the IP address 195.5.161.117. This IP is a known malware server, and our Trojan was listed as a "fast-flux rogue anti-virus." After contacting the IP above, our machine began contacting various sites in the US (Manhattan, San Francisco, etc.) and downloading encrypted data.

Scraping

In many log files we saw systematic downloading of complete real estate sites. This phenomenon is known as scraping: the unauthorized extraction of information from another website. Automated bots can collect all of the exposed information on a site through a crude copy-and-paste operation. Realtors can then post the data on their own sites and link the properties to their agents, as if they had been hired to market the properties. Essentially, scraping is copying the listings from one company and then displaying them as your own on your website.

Physician identity theft

In log file 15 we found systematic downloading of physician names, as well as other details, from several state medical board websites. This is the first step in an identity fraud scheme intended to defraud Medicare.

Yahoo! brute force attack

One thing that was apparent in virtually every log file was a widespread brute-force attack against Yahoo email users, which aimed to obtain login credentials and then use the hijacked accounts for spamming.

Yahoo Mail's main login page uses a number of security mechanisms to protect against brute-force attacks, including a generic "error" page that does not reveal whether it was the username or the password that the user got wrong. Yahoo also tracks the number of failed login attempts and requires users to solve a CAPTCHA once they have exceeded a certain number of incorrect tries.

Attackers have apparently found a web service application used to authenticate Yahoo users that does not contain the same security mechanisms. The application, /config/isp_verify_user, is an API used to authenticate ISP business partners of Yahoo. It gives detailed error messages when someone enters a wrong username or password, noting which of the two was incorrect. It also does not use any CAPTCHA on the error page, enabling attackers to guess an unlimited number of times until they come up with the right credentials.

Zeus Botnet

We managed to identify the Trojan we had used for infection as belonging to the Zeus botnet, which at the time had infected around 100,000 computers worldwide. Our machine failed to contact its C&C server, probably due to a takedown of Zeus servers a month before we used the Trojan.

Conclusion

One of the initial goals of our project was to deploy several TOR probes, allowing truer visibility of TOR usage. From our single TOR probe we accumulated ~10MB of HTTP traffic per minute, meaning ~1.4GB of data per day.
The enormous amount of data we collected made sifting through it for attack vectors very difficult. Even after applying our filtering tool, the process remained Sisyphean.

We suggest a different approach to circumvent this problem: limit the scope. Instead of looking at all traffic while excluding specific sites, a better idea might be to exclude all sites except for a chosen few in which we are interested. This way, we can deploy multiple probes while collecting only pertinent data.
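
The suggested change from excluding specific sites to keeping only a chosen few can be illustrated with a minimal Python sketch. It is not part of the tool we built: the names ALLOWED_SITES, host_of, is_allowed and keep_pertinent are hypothetical, the example hosts are placeholders, and the sketch assumes the captured traffic has already been reduced to a list of request URLs.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of sites of interest; the names are placeholders only.
ALLOWED_SITES = {"example.com", "example.org"}


def host_of(url):
    """Return the lower-cased host of a URL, or an empty string if it has none."""
    return (urlsplit(url).hostname or "").lower()


def is_allowed(host, allowed=ALLOWED_SITES):
    """True if host equals an allowed site or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == site or host.endswith("." + site) for site in allowed)


def keep_pertinent(urls, allowed=ALLOWED_SITES):
    """Keep only URLs whose host is on the allowlist; everything else is dropped."""
    return [u for u in urls if is_allowed(host_of(u), allowed)]


if __name__ == "__main__":
    # Made-up sample of captured request URLs, for illustration only.
    captured = [
        "http://mail.example.com/login",
        "http://unrelated.test/index.html",
        "http://example.org/listing?id=7",
        "http://notexample.com/",
    ]
    for url in keep_pertinent(captured):
        print(url)
```

The match is made on whole domain labels (an exact match or a "." + site suffix), so a look-alike host such as notexample.com is not kept by accident. Because everything not listed is dropped by default, the volume each probe stores depends on the chosen sites rather than on overall TOR usage.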