Throwing the Net and Catching Hackers – Final Report

Winter 2009-2010
Submitted by:
Shalev Mintz
Ori Rezen
Supervisor:
Amichai Shulman
Table of contents
Overview
Cloud Computing - Introduction
Amazon EC2 Cloud Computing
EC2 – Main Features
TOR – The Onion Router
Project Architecture
Results
Conclusion
Overview
Our project's goal was to discover new attack trends via an application-level honeypot. In order to achieve this goal, we decided on two paths –

• Create a TOR server coupled with a sniffer, capturing all outgoing HTTP traffic.
• Deliberately infect a machine with a botnet Trojan, and log all traffic from the machine.

In order to cope with analyzing all the data we expected to collect, we devised an automatic tool to tag and filter the log files using a host-based filter and display them as HTML, allowing easier processing of the data.
Cloud Computing - Introduction
“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources” – the U.S. National Institute of Standards and Technology (NIST)
The Cloud Computing technology is based on five main principles:
1. On-demand self-service – A user may at any time, without other human
intervention, use available resources from the Cloud, such as data storage or
server time.
2. Broad network access – Access to the Cloud is not limited to a single ISP or limited by geographic location, but is available globally and from any device (e.g., mobile phones, laptops, and PDAs).
3. Resource pooling – The user need not have any technical knowledge of the
Cloud’s implementation or even geographic location, and the Cloud’s
resources are pooled to serve multiple users. Examples of resources include
storage, processing, memory, network bandwidth, and virtual machines.
4. Rapid elasticity – Computing resources must be quickly deployable, with the user having the option of quickly modifying the computing power received from the Cloud. To the average user, the Cloud resources often appear unlimited.
5. Measured Service – Cloud systems have monitoring services, allowing users
to pay only for consumed Cloud resources, such as by computing power or
storage size.
Abstract Cloud Diagram
Amazon EC2 Cloud Computing
“Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides
resizable compute capacity in the cloud.” - aws.amazon.com/ec2/
For our project we needed a means of creating a honey-pot with the option of
creating multiple honey-pots for added visibility. This led us to implement our
honey-pot on Amazon’s EC2 service.
By creating our probes in a cloud, we enjoyed the following benefits:

• No physical server is needed; everything is in the cloud.
• As we were handling Trojans and prone to hacker attacks, we could rapidly terminate and re-create our instances.
• By creating a 'snapshot' of the instance, we can easily back up our work.
• With the same 'snapshot' feature we can deploy as many probes as we like, determining their geographic location as well.
EC2 Interface
EC2 – Main Features
Instances
Through this menu we can browse our running instances and launch new instances,
terminate them, configure security permissions, connect to them through Remote
Desktop Protocol (RDP), and view their traffic usage.
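
The same operations can also be scripted. For illustration only (our project used the EC2 web console rather than code), a minimal sketch using the boto3 Python library might look as follows; the AMI ID, instance type, and key-pair name are placeholders, not values from the project:

# Illustrative sketch: launching and terminating an EC2 instance with boto3.
# The AMI ID, instance type, and key-pair name below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a single instance from a chosen machine image.
resp = ec2.run_instances(
    ImageId="ami-00000000",      # placeholder AMI ID
    InstanceType="t2.micro",     # example instance type
    KeyName="probe-keypair",     # placeholder key pair
    MinCount=1,
    MaxCount=1,
)
instance_id = resp["Instances"][0]["InstanceId"]

# Terminate the instance later, e.g. once it has been compromised.
ec2.terminate_instances(InstanceIds=[instance_id])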
AMI
Amazon Machine Images (AMIs) are snapshots of running instances. This feature allows us to save an instance's state, either as a backup or to be redeployed whenever required. We used this feature to save all our progress for future projects. To create a new AMI, a machine needs to be 'bundled'.
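
The 'bundling' step can likewise be scripted; the following minimal boto3 sketch only illustrates the idea (we bundled through the EC2 console, and the instance ID and image name are placeholders):

# Illustrative sketch: creating an AMI ("bundling") from a running instance with boto3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

image = ec2.create_image(
    InstanceId="i-00000000",     # placeholder: the instance to snapshot
    Name="tor-probe-backup",     # placeholder image name
)
print("New AMI:", image["ImageId"])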
Volume
In addition to instances, a storage device may be leased from Amazon. This 'removable' storage acts as an external hard drive which can be 'attached' to or 'detached' from any of your instances, allowing, in effect, easy data transfer between instances.
This feature solved an interesting problem – when we infected an instance with a Trojan, the Trojan blocked all RDP connections to the instance, in effect denying us any contact with the machine. This stopped us from accessing the log files Wireshark was creating on the instance. The workaround we found was redirecting Wireshark's log files to a volume; after a while we detached the volume and re-attached it to a different instance in order to read the log files.
Security Groups
EC2 allows you to determine the security for every instance, a feature very unlike a
firewall with an IP table. The user may define several security groups, and under
each group define the IP rules. Through the instance screen, the security groups are
assigned to the different instances. This allows to strengthen the likelihood of an
instance ‘highjack’
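
As a concrete illustration (assumed, not taken from the project), a security group that allows RDP only from a single trusted address could be defined with boto3 as follows:

# Illustrative sketch: a security group restricting RDP (TCP 3389) to one trusted address.
# The group name and the analyst IP address are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

sg = ec2.create_security_group(
    GroupName="honeypot-probes",
    Description="RDP restricted to the analysts' address",
)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 3389,
        "ToPort": 3389,
        "IpRanges": [{"CidrIp": "203.0.113.7/32"}],   # placeholder analyst IP
    }],
)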
TOR – The Onion Router
"Tor is a network of virtual tunnels that allows people and groups to improve their
privacy and security on the Internet." – www.torproject.org
When an internet user wants to hide his identity while surfing the web, he may use
the TOR client software. Instead of taking a direct route from source to destination,
data packets on the Tor network take a random pathway through several relays that
cover one's tracks so no observer at any single point can tell where the data came
from or where it's going.
The underlying assumption of our project is that amongst legitimate users, TOR is
also being used by nefarious groups or individuals in order to commit cybercrime
while staying anonymous.
Project Architecture
We set up a TOR server on Amazon's EC2 cloud computing service and installed Wireshark on it in order to capture all outgoing traffic, i.e. traffic that used our server as an exit node in the TOR network. We wrote a batch script that utilizes several executables and XSLT files in order to automatically export the log files, first into an XML format and then to HTML format, making the analysis process much easier (a sketch of this flow is given after the tool listings below). The password for this instance is: h2YlscaDXcG.
Another virtual machine was also set up with Wireshark, and infected with a botnet.
The password for this instance is: oXFBdHBIxrY.
Executables used

• Filter-garbage.exe – Compares an XML file with 'packet' nodes to a pre-defined filter-list and adds a valid attribute to the 'host' field of the 'packet' nodes accordingly. Output is written into a new file named XML_file_filtered, where XML_file is the name of the original file.
Usage: Filter-garbage XML_file filter_list

• remove-non-valid.exe – Iterates over an XML file with 'packet' nodes, and creates a new file that holds only the 'packet' nodes whose hosts passed the filter-list. Output is written into a new file named XML_file_only_valid.xml, where XML_file is the name of the original file.
Usage: remove-non-valid XML_file

• fix_xml.exe – Fixes the encoding definition in the given XML file to comply with UTF-8. Output is written into a new file named XML_file_fixed, where XML_file is the name of the original file.
NOTE: this executable is a work-around for a bug in earlier versions of TShark. If you are using the latest version of TShark, this executable is superfluous and can (and should) be omitted.
Usage: fix_xml XML_file
XSLT files used

• wireshark-basic-v2.xslt – Converts the PDML files created by TShark to XML files that store the captured packets in the form of 'packet' nodes. Written by Amichai Shulman.

• tor-to-html.xslt – Converts XML files with 'packet' nodes to HTML files that contain a table of links to the URLs that appeared in the 'packet' nodes, sorted by IP address.
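
The batch script itself is not reproduced here, but the overall flow can be sketched in Python. The sketch below is an approximation only: it assumes TShark is installed, it uses the two XSLT files listed above by name, and it re-implements the host filtering of Filter-garbage.exe/remove-non-valid.exe in a simplified form. The file names, the filter-list contents, and the 'host' element layout are assumptions.

# Approximate sketch of the log-processing pipeline (not the original batch script).
# Assumes TShark is on the PATH and the project's XSLT files are in the working directory.
import subprocess
from lxml import etree

PCAP = "capture.pcap"                  # placeholder capture file
FILTER_LIST = {"doubleclick.net"}      # placeholder: hosts treated as garbage and excluded

# 1. Export the capture to PDML (XML) with TShark.
with open("capture.pdml", "wb") as out:
    subprocess.run(["tshark", "-r", PCAP, "-T", "pdml"], stdout=out, check=True)

# 2. Convert the PDML into 'packet' nodes using the project's XSLT.
to_packets = etree.XSLT(etree.parse("wireshark-basic-v2.xslt"))
packets = to_packets(etree.parse("capture.pdml"))

# 3. Drop packets whose host appears on the filter-list
#    (a simplified stand-in for Filter-garbage.exe + remove-non-valid.exe).
for packet in list(packets.getroot().iter("packet")):
    host = packet.findtext("host") or ""     # assumes a 'host' child element
    if host in FILTER_LIST:
        packet.getparent().remove(packet)

# 4. Render the remaining packets as an HTML table of URLs.
to_html = etree.XSLT(etree.parse("tor-to-html.xslt"))
html = to_html(packets)
with open("capture.html", "wb") as out:
    out.write(etree.tostring(html, pretty_print=True))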
Results
Trojan attack
We noticed in log file 41 access to guest-books of legitimate sites. The guest-books were filled with comments, where each comment was actually a list of hyperlinks with XML-like tags and a comment about porn. The links led to non-pornographic sites where a script had uploaded a link to a "porn movie". These links are in forum comments or fake social profiles. When the "movie" is played, the browser is redirected to another site where you are either told you need to download a codec to play the video (a Trojan), or, using scareware, you are told you have a virus on your Windows OS and must install a new anti-virus (also a Trojan).
Using a disassembler we took a close look at the Trojan file. Most of the code was
encrypted and/or obfuscated, as we saw a reference to crypt32.dll and the code was
mostly unreadable. The only readable part of the code was a request for
administrator privileges.
Using a different EC2 instance, we infected the machine with the Trojan and used
Wireshark to log the traffic. Analyzing the log files, we noted our machine contacted
the IP address 195.5.161.117. This IP is a known malware server, and our Trojan was
listed as a "fast-flux rogue anti-virus."
After contacting the IP above, our machine began contacting various sites in the US (Manhattan, San Francisco, etc.) and downloading encrypted data.
Scraping
In many log files we saw systematic downloading of complete real estate sites. This phenomenon is known as scraping. Scraping is the unauthorized extraction of information from another website. Automated bots can collect all of the exposed information on a site through a crude copy-and-paste operation. Realtors can then
post the data on their own sites and link the properties to their agents, as if they had
been hired to market the properties. Essentially, scraping is copying the listings from
one company and then displaying them as your own on your website.
Physician identity theft
In log file 15 we found systematic downloading of physician names as well as other
details, from several state medical board Web sites. This is the first step in an
identity fraud scheme intended to defraud Medicare.
Yahoo! brute force attack
One of the things that was apparent in virtually every log file was a widespread brute-force attack against Yahoo email users, which aimed to obtain login credentials and then use the hijacked accounts for spamming. Yahoo Mail's main
login page utilizes a number of security mechanisms to protect against brute force
attacks, including providing a generic "error" page that does not reveal whether it
was the username or password that the user got wrong. Also, Yahoo tracks the
number of failed login attempts and requires that users solve a CAPTCHA if they
have exceeded a certain number of incorrect tries. Attackers have apparently found
a web service application used to authenticate Yahoo users that does not contain the
same security mechanisms. The application - /config/isp_verify_user – is an API used to
authenticate ISP business partners of Yahoo. The application gives detailed error messages when someone enters a wrong username or password, noting which one was incorrect. Also, it does not utilize any CAPTCHA on the error page, enabling attackers to
guess an unlimited number of times until they come up with the right credentials.
Zeus Botnet
We managed to identify the Trojan we had used for infection as belonging to the Zeus botnet, which at the time was infecting around 100,000 computers world-wide. Our machine failed to contact its C&C server, probably due to a takedown of Zeus servers a month before we used the botnet.
Conclusion
One of the initial goals of our project was to deploy several TOR probes, allowing
truer visibility of TOR usage. From our single TOR probe we accumulated ~10MB of HTTP traffic per minute, meaning roughly 14GB of data per day. The enormous amount of data we collected made sieving through it for attack vectors very difficult. Even after applying our filtering tool, the process remained Sisyphean.
We suggest a different approach in order to circumvent this problem. The idea is to
limit the scope. Instead of looking at all traffic while excluding specific sites, a better
idea might be to exclude all sites except for a chosen few, in which we are
interested. This way, we can deploy multiple probes, while getting only pertinent
data.
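
As a rough illustration of this whitelist approach (a sketch only; we did not implement it, and the element names, file names, and site list are assumptions), the filtering step could be inverted so that only packets destined for a chosen set of sites are kept:

# Illustrative sketch of the proposed whitelist filter over exported 'packet' nodes.
from lxml import etree

SITES_OF_INTEREST = {"mail.yahoo.com", "example-bank.com"}   # placeholder whitelist

tree = etree.parse("capture_packets.xml")   # XML with 'packet' nodes, as produced above
for packet in list(tree.getroot().iter("packet")):
    host = packet.findtext("host") or ""    # assumes a 'host' child element
    # Keep the packet only if its host is on the whitelist; drop everything else.
    if host not in SITES_OF_INTEREST:
        packet.getparent().remove(packet)

tree.write("capture_whitelisted.xml", encoding="utf-8")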