Strider HoneyMonkeys: Active, Client-Side
Honeypots for Finding Malicious Websites
Yi-Min Wang, Yuan Niu, Hao Chen, Doug Beck, Xuxian Jiang, Roussi Roussev,
Chad Verbowski, Shuo Chen, and Sam King
Abstract— A serious threat is emerging on the Internet where malicious Web sites exploit browser vulnerabilities to install malicious
software. Traditional honeypots are typically server-side; they passively await attacks from malicious clients. In this paper, we
describe the Strider HoneyMonkey Exploit Detection System, which uses client-side honeypots to hunt for potentially malicious Web
sites actively. Our HoneyMonkey system detects unauthorized persistent-state changes resulting from successful vulnerability
exploits; therefore, it can detect both known and zero-day exploits. The system also tracks URL-redirection and analyzes cloaking to
expose back-end malicious Web sites from front-end doorway pages that attract browser traffic. Within the first month of
deployment, the system identified 752 unique URLs that could successfully exploit unpatched Windows XP machines. During daily
monitoring of these 752 exploit-URLs, we discovered a malicious Web site that hosted a zero-day exploit of an unpatched
javaprxy.dll vulnerability and that was hiding behind 25 doorway pages. Microsoft Security Response Center confirmed our report as
the first “in-the-wild”, zero-day exploit of this vulnerability. Additionally, by scanning the most popular one million URLs as determined
by a search engine, we found over seven hundred exploit-URLs (0.071%), many of which served popular content related to
celebrities, song lyrics, wallpapers, video game cheats, and wrestling.
Index Terms—Security, Virtual Machines, Software Vulnerability, World-Wide Web
1 INTRODUCTION
Internet attacks that use malicious or hacked Web sites to
exploit unpatched browser vulnerabilities are on the rise.
These malicious Web servers install malcode on visiting
client machines without requiring any user action
[6],[10],[42]. In 2004 [66] and again in 2006 [67], for example, attackers compromised 3rd-party ad servers, which
then infected vulnerable machines that visited pages serving ads from these servers. Although manual analyses of
these events [11],[12],[15],[24],[41],[45],[48] help discover
the exploited vulnerabilities and installed malware, such
efforts do not scale. We need an automated approach that
can find malicious Web sites efficiently and can help us
study the landscape of this problem.
We describe the Strider HoneyMonkey system for automatic, systematic Web patrol. “Honey” means “honeypot” [18],
and “Monkey” refers to “monkey programs” that drive
Web browsers to mimic human browsing. As illustrated in
Figure 1, while common server-side honeypots expose
server-side vulnerabilities and passively await attacks from
malicious clients, HoneyMonkeys are client-side honeypots
with vulnerable client-side software that actively visit potentially malicious Web servers.
The HoneyMonkey system adopts a black-box, signature-free, state-change-based approach to exploit detection:
• Yi-Min Wang, Doug Beck, Chad Verbowski, and Shuo Chen are with Microsoft Corporation.
• Hao Chen and Yuan Niu are with the Department of Computer Science, the University of California, Davis.
• Xuxian Jiang is with George Mason University. Roussi Roussev is with Florida Institute of Technology. Sam King is with the University of Illinois at Urbana-Champaign. Part of this work was performed while these three coauthors were summer interns at Microsoft Research.
Figure 1: Honeypot versus HoneyMonkey
instead of directly detecting the acts of vulnerability exploits, the system uses the Strider Tracer [53] to catch unauthorized file creations and configuration changes that result from
successful exploits. This allows us to pipeline the same HoneyMonkey program on multiple Virtual Machines (VMs)
whose OSs and applications have different patch levels to
evaluate the strength of attacks from different Web sites.
URL-redirection tracking [56] is another key aspect of
the HoneyMonkey exploit analysis. Large-scale exploiters
typically structure their Web sites in two layers: doorway
pages at the front end that attract browser traffic and redirect it to exploit providers at the back end (see Figure 1).
We distinguish two types of doorway pages: content providers, which host attractive content (such as pornography,
celebrity information, game cheats, etc.) at well-known
URLs to draw direct browser traffic, versus “throw-away”
spam doorways, which attract traffic by appearing in top
results at major search engines through the use of questionable search engine optimization techniques (commonly
referred to as Web spam [17] or search spam). The HoneyMonkey uses the Strider URL Tracer [60] to capture all the
front-end and back-end exploit sites involved in detected
malicious activities.
We demonstrate the effectiveness of our method by discovering large communities of malicious Web sites and by
deriving the redirection relationships between them. We
describe our experience of identifying a new zero-day exploit using this system. We expose hundreds of malicious
Web pages from amongst the most popular Web sites and
thousands of pages that are involved in Web spamming.
Finally, we propose a comprehensive anti-exploit process
based on this monitoring system to improve Internet safety.
This paper is organized as follows. Section 2 provides
background information on the problem space by describing the techniques used in actual client-side exploits of
popular Web browsers. Section 3 gives an overview of the
Strider HoneyMonkey Exploit Detection System and its
surrounding Anti-Exploit Process. Section 4 evaluates the
effectiveness of HoneyMonkey in both known-vulnerability
and zero-day exploit detection, and presents an analysis of
the exploit data to help prioritize investigation tasks. Section 5 discusses the limitations of and possible attacks on
the current HoneyMonkey system and describes several
countermeasures including an enhancement based on a
Vulnerability-Specific Exploit Detection mechanism. Section 6 surveys related work and Section 7 concludes the
paper.
2 BROWSER-BASED VULNERABILITY EXPLOITS
Malicious Web sites typically exploit browser vulnerabilities in four steps: code obfuscation, URL redirection and
cloaking, vulnerability exploitation, and malware installation.
2.1 Code Obfuscation
To foil human investigation and signature-based detection
by anti-malware programs, some Web sites use a combination of the following code obfuscation techniques: (1) dynamic code injection using the document.write() function
inside a script; (2) unreadable, long strings with encoded
characters, such as “%28” and “&#104”, which are then
decoded either by the unescape() function inside a script or
directly by the browser; (3) custom decoding routine included in a script; and (4) sub-string replacement using the
replace() function. Code-obfuscation makes it difficult for
signature-based detectors to detect new variants of existing
exploit code.
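As a concrete (and harmless) illustration of how these techniques combine, the fragment below is our own construction rather than code observed in the wild; the encoded string decodes to a simple alert() call, where a real page would hide exploit code instead.

<script>
// Technique (2): an encoded, unreadable string is decoded by unescape()
var encoded = "%61%6C%65%72%74%28%27%64%65%63%6F%64%65%64%27%29";
var decoded = unescape(encoded);                  // yields: alert('decoded')
// Technique (4): sub-string replacement further hides keywords from scanners
decoded = decoded.replace("decoded", "payload ran");
// Technique (1): the final code is injected at parse time via document.write()
document.write("<scr" + "ipt>" + decoded + "</scr" + "ipt>");
</script>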
2.2 URL Redirection and Cloaking
2.2.1 URL Redirection
Most malicious Web sites automatically redirect browser
traffic to additional URLs. Specifically, when a browser
visits a primary URL, the response from that URL instructs
the browser to automatically visit one or more secondary
URLs, which may or may not affect the content that is displayed to the user. Such redirections typically use one of the following mechanisms, which fall into three categories:
(1) protocol redirection using HTTP 302 Temporary Redirect; (2) HTML tags including <iframe>, <frame> inside
<frameset>, and <META http-equiv=refresh>; (3) script
functions including window.location.replace(), window.location.href(), window.open(), window.showModalDialog(), and <link_ID>.click(), etc. Since
benign Web sites also commonly use redirection to enrich
content, forbidding redirection is infeasible.
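For illustration, the fragment below (our own sketch, with hypothetical URLs) shows how a single doorway page can invoke the HTML-tag and script-based mechanisms; the protocol-level HTTP 302 variant is issued by the server rather than by page content and is therefore not shown.

<!-- (2) HTML-tag redirection: a zero-size iframe silently fetches a second URL,
     and a meta refresh replaces the page after five seconds -->
<iframe src="http://exploit-provider.example/extra.html" width="0" height="0"></iframe>
<meta http-equiv="refresh" content="5;url=http://exploit-provider.example/next.html">

<!-- (3) Script-based redirection -->
<script>
window.location.replace("http://exploit-provider.example/landing.html");
</script>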
2.2.2 URL Cloaking
In some cases, spammers use redirection and cloaking
synergistically to serve optimized pages. For example, in
crawler-browser cloaking, a spam Web page contains scripts
that, when executed, will modify the page. Since most
crawlers run with script execution off, the page seen by
crawlers is different from the one seen by humans. In a recent paper [37], we reported a new type of cloaking, click-through cloaking, which serves different content based on the
Referer field in HTTP requests and the browser referrer variable. We discovered that some malicious Web site operators were using search spamming to draw traffic, and that
some of them were using cloaking to spam more effectively.
We divide click-through cloaking into three categories:
server-side cloaking, client-side cloaking, and combination techniques. Server-side cloaking checks the Referer field in the
HTTP request header for a search engine URL. For example, www.intheribbons.com/win330/2077_durwood.html serves
spam content from lotto.gamblingfoo.com to click-through
users, but serves a “404 Not Found” page to non-click-through users. To defeat this cloaking, one can issue a query of “url:www.intheribbons.com/win330/2077_durwood.html”
in a search engine to fool the Referer field check. In some
cases, the pages are scripted with checks for special queries
not expected of normal users. For example, “link:”, “linkdomain:”, “site:”, “url:” (or “info:” 1) are often used by those
investigating spam, but not normal search users. However,
all these checks can be circumvented by faking the Referer
field in the HTTP header.
The advantage gained by faking the Referer field is nullified when pages use client-side cloaking to distinguish between fake and real Referer field data by running a script in
the client’s browser to check the document.referrer variable. Example 1 shows a script used by the spam URL naha.org/old/tmp/evans-sara-real-fine-place/index.html. The script
checks whether the document.referrer string contains the
name of any major search engine. If it does, the browser
redirects to ppcan.info/mp3re.php and eventually to spam;
otherwise, the browser stays at the current doorway page.
To defeat the simple client-side cloaking, issuing a query of
the form “url:link1” is sufficient. This allows us to fake a
click through from a real search engine page.
1 To show information about particular Web pages, some search engines accept “url:” queries while others accept “info:” queries.

var url = document.location + "";
exit=true;
ref=escape(document.referrer);
if ((ref.indexOf('search')==-1) && (ref.indexOf('google')==-1) &&
    (ref.indexOf('find')==-1) && (ref.indexOf('yahoo')==-1) &&
    (ref.indexOf('aol')==-1) && (ref.indexOf('msn')==-1) &&
    (ref.indexOf('altavista')==-1) && (ref.indexOf('ask')==-1) &&
    (ref.indexOf('alltheweb')==-1) && (ref.indexOf('dogpile')==-1) &&
    (ref.indexOf('excite')==-1) && (ref.indexOf('netscape')==-1) &&
    (ref.indexOf('fast')==-1) && (ref.indexOf('seek')==-1) &&
    (ref.indexOf('find')==-1) && (ref.indexOf('searchfeed')==-1) &&
    (ref.indexOf('about.com')==-1) && (ref.indexOf('dmoz')==-1) &&
    (ref.indexOf('accoona')==-1) && (ref.indexOf('crawler')==-1)) { exit=false; }
if (exit) { p=location; r=escape(document.referrer);
  location='http://ppcan.info/mp3re.php?niche=Evans, Sara&ref='+r }
Example 1: A basic client-side cloaking script
More vigilant spammers may include checks for “link:”,
“linkdomain:”, and “site:” as well as for the spam URL’s
domain name in the “url:” and “info:” cases. These checks
are shown in Example 2 for lossovernigh180.blogspot.com.
Again, the script determines which page will be served.
function is_se_traffic() {
  if ( document.referrer ) {
    if ( document.referrer.indexOf("google")>0
      || document.referrer.indexOf("yahoo")>0
      || document.referrer.indexOf("msn")>0
      || document.referrer.indexOf("live")>0
      || document.referrer.indexOf("search.blogger.com")>0
      || document.referrer.indexOf("www.ask.com")>0 )
    {
      if ( document.referrer.indexOf( document.domain )<0
        && document.referrer.indexOf( "link%3A" )<0
        && document.referrer.indexOf( "linkdomain%3A" )<0
        && document.referrer.indexOf( "site%3A" )<0 )
      { return true; }
    }
  }
  return false;
}
Example 2: Script fragment for document.referrer checking from zeppele.com/9726_5fcb7_Vp8.js, which lossovernigh180.blogspot.com redirects to
Techniques exist to combine both server and client-side
scripting in an attempt to foil the efforts of spam investigators, to whom the client-side script logic is exposed. Combination techniques extract referrer information directly
from the client-side document.referrer variable and pass it
along to a spam domain for server-side checks. For example, lawweekly.student.virginia.edu/wwwboard/messages/007.html
sends document.referrer information to 4nrop.com as part of
the URL. If the referrer information passes the server-side
check, the browser is redirected to raph.us. Otherwise, the
browser is redirected to a bogus 404 error page. This
spammer has also attacked other .edu Web sites and set up
cloaked pages with similar behavior: pbl.cc.gatech.edu/bmed3200a/10 and languages.uconn.edu/faculty/CVs/data-10.php are two such examples.
Example 3 shows an obfuscated script using the combination techniques in mywebpage.netscape.com/superphrm2/ordertramadol.htm. By replacing eval() with alert(), we were able to
see that emaxrdr.com is sent the document.referrer information, and then either redirects the browser to the spam
page pillserch.com or to spampatrol.org, a fake spam-reporting
Web site.
<script> var params="f=pharmacy&cat=tramadol";
function kqqw(s){
  var Tqqe=String("qwertyuioplkjhgfdsazxcvbnmQWERTYUIOPLKJHGFDSAZXCVBNM_1234567890");
  var tqqr=String(s); var Bqqt=String("");
  var Iqqy,pqqu,Yqqi=tqqr.length;
  for ( Iqqy=0; Iqqy<Yqqi; Iqqy+=2) {
    pqqu=Tqqe.indexOf(tqqr.charAt(Iqqy))*63;
    pqqu+=Tqqe.indexOf(tqqr.charAt(Iqqy+1));
    Bqqt=Bqqt+String.fromCharCode(pqqu);
  }
  return(Bqqt);
}
eval(kqqw('wKwVwLw2wXwJwCw1qXw4wMwDw1wJqGqHq8qHqSqHw_ … Bw1qHqSqHq0qHqFq7'));</script>
Example 3: Obfuscated script for combo cloaking
2.3 Vulnerability Exploitation
Many malicious Web pages attempt to exploit multiple
browser vulnerabilities to maximize the chance of a successful attack. Figure 2 shows an example HTML fragment
that uses various primitives to load multiple files from different URLs on the same server to exploit three vulnerabilities fixed in Microsoft Security Bulletins MS05-002 [34],
MS03-011 [35], and MS04-013 [36]. If any of the exploits
succeeds, it will download and execute a Trojan downloader named win32.exe. Note that although Internet Explorer
is a popular target, other browsers also have exploitable
vulnerabilities.
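The fragment below is our own schematic sketch of such a page's structure (the file names are hypothetical and no exploit code is included): several vulnerability-specific resources are loaded in parallel through different primitives, and whichever one succeeds downloads and runs the same Trojan downloader.

<html>
  <body>
    <!-- each referenced file would carry the code for one specific vulnerability -->
    <iframe src="exploit-ms05-002.html" width="0" height="0"></iframe>  <!-- attempt #1 -->
    <script src="exploit-ms03-011.js"></script>                         <!-- attempt #2 -->
    <object data="exploit-ms04-013.html" width="0" height="0"></object> <!-- attempt #3 -->
    <!-- on success, any of the above downloads and executes win32.exe -->
  </body>
</html>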
2.4 Malware Installation
Most exploits eventually install malcode on the victim
machines to enable further attacks. We have observed a
plethora of malcode types installed through browser exploits, including viruses that infect files, backdoors that
open entry points for future unauthorized access, bot programs that allow the attacker to control a whole network of
compromised systems, Trojan downloaders that connect to
the Internet and download other programs, Trojan droppers that drop files from themselves without accessing the
Internet, and Trojan proxies that redirect network traffic.
Some exploits also install spyware programs and rogue
anti-spyware programs.
3 THE STRIDER HONEYMONKEY SYSTEM
The HoneyMonkey system automatically detects and analyzes Web sites that exploit Web browsers. Figure 3 illustrates the HoneyMonkey Exploit Detection System, shown
inside the dotted square, and the surrounding Anti-Exploit
Process, which includes both automatic and manual components.
3.1 Redirection and Cloaking Analysis
3.1.1 Redirection Analysis
Many exploit-URLs do not contain exploits but instead
act as front-end doorway pages to attract browser traffic.
They are either content providers serving “attractive” content
(such as pornography or celebrity information), or spam
pages that get into top results at major search engines
through questionable search engine optimization techniques [17][37]. These doorway pages then redirect to backend exploit providers, which execute the exploits.
We may track the URLs visited through browser redirection by intercepting APIs at the network level within each
browser process [56]. When the HoneyMonkey runs in its
“redirection analysis” mode, it checks all the automatically
visited URLs. In Section 4, we present how our URL redirection analysis helps identify major exploit providers that
receive traffic from a large number of doorway pages, and
helps illustrate how exploit providers organize their Web
pages to facilitate customized malware installations for
their affiliates. Finally, we are able to positively identify the
Web pages that actually perform the exploits by implementing an option in our redirection tracker to block all
redirection traffic.
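As a rough sketch of how the recorded redirections can be turned into the site-level graphs and connection counts used in Section 4.1.2, the JavaScript below (ours, for a Node.js environment) assumes that a log of source-URL to destination-URL pairs has already been captured by the tracker:

// Build a site-level redirection graph from logged (source, destination)
// URL pairs and rank sites by how many other sites they directly connect to.
const site = (u) => new URL(u).hostname;

function crossSiteEdges(redirections) {
  const edges = new Set();
  for (const [from, to] of redirections) {
    const a = site(from), b = site(to);
    if (a !== b) edges.add(a + ">" + b);        // keep cross-site arrows only
  }
  return edges;
}

function connectionCounts(edges) {
  const neighbors = new Map();                  // site -> set of connected sites
  for (const edge of edges) {
    const [a, b] = edge.split(">");
    if (!neighbors.has(a)) neighbors.set(a, new Set());
    if (!neighbors.has(b)) neighbors.set(b, new Set());
    neighbors.get(a).add(b);
    neighbors.get(b).add(a);
  }
  return [...neighbors].map(([s, n]) => ({ site: s, connections: n.size }))
                       .sort((x, y) => y.connections - x.connections);
}

// Hypothetical usage:
console.log(connectionCounts(crossSiteEdges([
  ["http://doorway-a.example/page", "http://exploit-provider.example/x"],
  ["http://doorway-b.example/page", "http://exploit-provider.example/x"],
])));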
3.1.2 Cloaked Malicious Web Pages
In a recent search spam investigation described in Section 4.4, we found thousands of malicious pages whose operators were using search spam techniques to attract more traffic. Of those malicious pages, hundreds use click-through cloaking (Section 2.2.2). For example, on October 13, 2006, mandevillechevrolet.com (spaces added for safety), which appeared as the top Yahoo search result for “Mandeville Chevrolet”, would install a malware program named “ane.exe” under C:\ if the visitor’s machine was vulnerable. This spam page used client-side script to check document.referrer and redirected click-through users to hqualityporn . com.
In a case study using mandevillechevrolet . com as a seed, we extracted 118 suspicious domains hosted on its nearby IP addresses. We scanned the 118 domain-level pages and found that 90 (76%) of the 118 URLs were cloaked URLs hiding behind three malicious domains: dinet.info, frlynx.info, and joutweb.net. We re-scanned the same group of URLs again in mid-November 2006 and found that the three domains had been replaced by tisall.info, frsets.info, and recdir.org (hosted on the same pair of IP addresses 85.255.115.227 and 66.230.138.194) but their cloaking behavior remained the same.
Given these findings, it is important for security investigators to adopt anti-cloaking techniques in their investigation. We present one such tool in the following section.
3.1.3 Anti-cloaking Scanner and Redirection-Diff Spam Detection Tool
To defeat cloaking, we could exploit the weaknesses of many individual cloaking techniques, but this approach would be laborious and unreliable. Instead, our end-to-end anti-cloaking scanner visits Web sites by clicking through search-result pages. Given a URL, the scanner derives likely keywords targeted by the spammer and queries search engines so that the document.referrer variable is correctly set.
We have observed that click-through cloaking almost always indicates spam; therefore, our redirection-diff approach uses the presence of cloaking as a spam detection mechanism [55]. Specifically, our tool scans each suspicious URL twice, once with anti-cloaking and once without, and compares the two sets of redirection domains. If the two sets are different, the tool flags the URL as cloaking.
False positives can exist under some circumstances: (1) Some Web sites serve rotating ads from different domains. (2) Some Web sites rotate final destinations to distribute visits among their traffic-affiliate programs. (3) Some Web accesses may fail. In practice, we find the following improvements effective for reducing false positives: (1) We can detect the majority of cloaked pages by comparing only the final destinations from the two scans. (2) We can perform the scans and diffs on each suspect URL multiple times and flag only the URLs that yield consistent diff results. (3) Given a group of URLs expected to exhibit similar cloaking behavior, we can perform the scans and diff for each URL once and exclude those inconsistent with the rest of the group. Given a confirmed cloaked URL, we can construct a group of suspect URLs by examining domains hosted on nearby IP addresses.
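To make the redirection-diff check concrete, the following minimal sketch (ours, in JavaScript for a Node.js environment) assumes a hypothetical helper scan(url, antiCloaking) that performs one scan and resolves to the set of redirection domains observed; the real scanner issues search-engine queries to set document.referrer and reads the HoneyMonkey redirection logs.

// Redirection-diff sketch: flag a URL as cloaking when the redirection
// domains observed with and without anti-cloaking differ consistently.
async function isCloaking(url, scan, repeats = 3) {
  let diffs = 0;
  for (let i = 0; i < repeats; i++) {
    const withReferrer = await scan(url, true);      // click-through scan
    const withoutReferrer = await scan(url, false);  // direct scan
    // Per improvement (1), scan() could be restricted to final destinations
    // to reduce false positives from rotating ads and failed accesses.
    const a = [...withReferrer].sort().join(",");
    const b = [...withoutReferrer].sort().join(",");
    if (a !== b) diffs++;
  }
  // Per improvement (2), flag only URLs whose two scans differ every time.
  return diffs === repeats;
}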
3.2 Exploit Detection System
The exploit detection system is the heart of the HoneyMonkey design (Figure 3). This system consists of a 3-stage pipeline of virtual machines. Given a large list of input URLs with a potentially low exploit-URL density, each HoneyMonkey in Stage 1 starts with a scalable mode by visiting N URLs simultaneously inside one unpatched VM. When the HoneyMonkey detects an exploit, it switches to the basic, one-URL-per-VM mode to re-test each of the N suspects to determine which ones are exploit-URLs. Stage-2 HoneyMonkeys scan the Stage-1 detected exploit-URLs and perform recursive redirection analysis to identify all Web pages involved in exploit activities and to determine their relationships. Stage-3 HoneyMonkeys continuously scan the Stage-2 detected exploit-URLs using (nearly) fully patched VMs to detect attacks exploiting the latest vulnerabilities.

Figure 3. HoneyMonkey Exploit Detection System and Anti-Exploit Process

We used a network of 20 machines to produce the results reported in this paper. Each machine had a CPU speed between 1.7 and 3.2 GHz and a memory size between 512 MB and 2 GB, and ran one VM configured with 256 MB to 512 MB of RAM. Each VM supported up to 10 simultaneous browser processes in the scalable mode, with each process visiting a different URL. Due to the way HoneyMonkeys detect exploits (discussed in Section 3.2.1), there is a trade-off between the scan rate and the robustness of exploit detection: if the HoneyMonkey does not wait long enough or if too many simultaneous browser processes cause excessive slowdown, some exploit pages may not perform a detectable attack (e.g., beginning a software installation).
Through extensive experiments, we determined that a
wait time of two minutes was a good trade-off. Taking into
account the overhead of restarting VMs in a clean state,
each machine was able to scan and analyze between 3,000 and 4,000 URLs per day. (We have since improved the scalability of the system to a scan rate of 8,000 URLs per day per
machine in the scalable mode.) In contrast, the basic mode
scans between 500 and 700 URLs per day per machine. We
expect that using a more sophisticated VM platform that
enables significantly more VMs per host machine and faster
rollback [50] would significantly increase our scalability.
To ease the cleanup of infected state, we run HoneyMonkeys inside a VM. (Our current implementation uses
Microsoft Virtual PC and Virtual Server.) Upon detecting
an exploit, the monkey saves its logs and notifies the Monkey Controller on the host machine to destroy the infected
VM and re-spawn a clean HoneyMonkey, which then continues to visit the remaining URL list. The Monkey Controller then passes the detected exploit-URL to the next monkey in the pipeline to further investigate the strength of the
exploit.
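The following simplified sketch (ours, in JavaScript) illustrates the Stage-1 scheduling logic just described; spawnCleanVM, runMonkey, destroyVM, and sendToStageTwo are hypothetical helpers standing in for the Virtual PC / Virtual Server automation and the hand-off to Stage-2 HoneyMonkeys.

// Monkey Controller sketch: scan URLs N at a time in one unpatched VM;
// when an exploit is detected, discard the infected VM, re-test the N
// suspects one per clean VM, and pass confirmed exploit-URLs to Stage 2.
async function stageOne(urls, N, { spawnCleanVM, runMonkey, destroyVM, sendToStageTwo }) {
  let vm = await spawnCleanVM();
  for (let i = 0; i < urls.length; i += N) {
    const batch = urls.slice(i, i + N);
    const exploited = await runMonkey(vm, batch);   // scalable mode
    if (!exploited) continue;                       // VM still clean: reuse it

    await destroyVM(vm);                            // never reuse an infected VM
    for (const url of batch) {                      // basic one-URL-per-VM mode
      const testVm = await spawnCleanVM();
      if (await runMonkey(testVm, [url])) await sendToStageTwo(url);
      await destroyVM(testVm);
    }
    vm = await spawnCleanVM();                      // resume scalable mode
  }
  await destroyVM(vm);
}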
3.2.1 Exploit Detection
Although we could manually build signature-based detection code for known vulnerabilities and exploits, this
approach would not scale and could not detect zero-day
exploits. Instead, we take the following black-box, non-signature-based approach: a monkey program launches a
browser instance to visit each input URL and then waits for
a few minutes to allow for possible delayed downloading.
Since the monkey does not click any dialog box to permit
software installation, any executable files or registry entries
created outside the browser sandbox indicate exploits. This approach has the additional important advantage of detecting
both known-vulnerability exploits and zero-day exploits uniformly. Specifically, the same monkey program running on unpatched machines to detect a broad range of vulnerability
exploits (as shown in Stages 1 and 2) can run on fully
patched machines to detect zero-day exploits, as shown in
Stage 3.
At the end of each visit, the HoneyMonkey generates an
XML report containing the following five pieces of information:
(i) Executable files created or modified outside the
browser sandbox folders: this is the primary mechanism
for exploit detection. We implemented it on top of the
Strider Tracer [53], which uses a file-tracing driver to efficiently record every single file read/write operation.
(ii) Processes created: Strider Tracer also tracks all child
processes created by the browser process.
(iii) Windows registry entries created or modified: Strider
Tracer additionally includes a driver that efficiently records
every single registry [16] read/write. To highlight the most
critical entries, we use the Strider Gatekeeper and GhostBuster filters [54,55], which target registry entries most frequently attacked by spyware, Trojans, and rootkits based
on an extensive study. This allows HoneyMonkey to detect
exploits that modify critical configuration settings (such as
the browser home page and the wallpaper) without creating executable files.
(iv) Vulnerability exploited: to provide additional information and to address limitations of the black-box approach, we have developed and incorporated a vulnerability-specific detector, to be discussed in Section 5. This is
based on the vulnerability signature of the exploit, rather
than on any particular piece of malcode.
(v) Redirect-URLs visited: Since malcode is often laundered through other sites, this module allows us to track redirections to determine both the real source of the malcode and the other sites involved in the distribution chain, as described in Section 3.1.1.
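As an illustration of how such a report might be assembled from the recorded state changes, the sketch below (ours; the trace fields and sandboxFolders are placeholder names, and XML escaping is omitted) applies the rule that executable content written outside the browser sandbox folders, or a write to a critical registry entry, marks the visit as an exploit.

// Exploit-report sketch: derive the exploited/clean verdict from the
// state-change trace and emit the five report fields as simple XML.
function buildReport(url, trace, sandboxFolders) {
  const outsideSandbox = (path) =>
    !sandboxFolders.some((dir) => path.toLowerCase().startsWith(dir.toLowerCase()));

  const suspiciousFiles = trace.filesWritten
    .filter((p) => /\.(exe|dll|sys)$/i.test(p))     // executable content only
    .filter(outsideSandbox);                        // created outside the sandbox

  const exploited =
    suspiciousFiles.length > 0 || trace.criticalRegistryWrites.length > 0;

  return [
    `<visit url="${url}" exploited="${exploited}">`,
    ...suspiciousFiles.map((f) => `  <file>${f}</file>`),
    ...trace.childProcesses.map((p) => `  <process>${p}</process>`),
    ...trace.criticalRegistryWrites.map((k) => `  <registry>${k}</registry>`),
    ...(trace.vulnerabilityId ? [`  <vulnerability>${trace.vulnerabilityId}</vulnerability>`] : []),
    ...trace.redirectUrls.map((u) => `  <redirect>${u}</redirect>`),
    `</visit>`,
  ].join("\n");
}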
3.3 Anti-Exploit Process
The Anti-Exploit Process generates the input URL lists
for HoneyMonkeys to scan, analyzes the output exploit-URLs, and takes appropriate actions.
3.3.1 Generating Input URL Lists
We use four sources for generating suspicious URLs for
analysis. The first category consists of suspicious URLs, such as
known Web sites that host malware [9], links appearing in phishing or spam emails [45] or instant messages, Web pages serving
questionable content such as pornography, URL names that are
typos of popular sites [G05], Web sites involved in DNS cache
poisoning [20,26,44], etc.
The second category consists of the most popular Web
pages, which can potentially infect a large client population. Examples include the top 100,000 Web sites based on
browser traffic ranking [3], the top-N search results of popular queries, and the top N million Web sites based on
click-through counts as measured by search engines.
The third category comes from seed-based suspect
hunting, as illustrated in the bottom-left corner of Figure 3. Confirmed exploit-URLs are used as “seeds” to locate more
potentially malicious Web pages. For example, we have
observed that depth-N crawling of exploit pages containing
a large number of links often leads to the discovery of even
more exploit pages. Similarly, as we will demonstrate in
Section 4.4, when an exploit-URL is comment-spammed at
an open forum (i.e., it is injected into a publicly accessible
comment field to inflate its popularity [38]), other URLs
that appear on the same forum page are also likely malicious [37].
The fourth category encompasses URL lists of a more localized scope. For example, an organization may want to
regularly verify that its Web pages have not been compromised to exploit visitors, or a user may want to investigate
whether any recently visited URL was responsible for his
spyware infection.
3.3.2 Acting on Output Exploit-URL Data
Stage 1 Output – Exploit-URLs
The percentage of exploit-URLs in a given list can be
used to measure the risk of Web surfing. For example, by
comparing the percentage numbers from two URL lists corresponding to two different search categories (e.g., gambling versus shopping), we can assess the relative risk of
malware infection for people with different browsing habits.
Stage 2 Output – Traffic Redirection Analysis
The HoneyMonkey system currently serves as a lead-generation tool for the Internet safety enforcement team in
the Microsoft legal department. The redirection analysis
and subsequent investigations of the malicious behavior of
the installed malware programs provide a prioritized list
for potential enforcement actions that include sending site-takedown notices, notifying law enforcement agencies, and
filing civil suits against the individuals responsible for distributing the malware programs [61][62]. We have successfully shut down many malicious URLs discovered by the
HoneyMonkey.
Due to the international nature of the exploit community, access blocking may be more appropriate and effective
than legal actions in many cases. Blocking can be implemented at different levels: search engines can remove exploit-URLs from their database [57], Internet Service Providers (ISPs) can black-list exploit-URLs to protect their
entire customer base, corporate proxy servers can prevent
employees from accessing any of the exploit-URLs, and
individual users can block their machines from communicating with any exploit sites by editing their local “hosts”
files to map those server hostnames to a local loopback IP
address.
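As a small illustration of the last option, the script below (ours; the hostnames are placeholders) emits hosts-file lines that map exploit hostnames to the loopback address; on Windows the lines would be appended to the hosts file under %SystemRoot%\System32\drivers\etc.

// Emit hosts-file entries that black-hole confirmed exploit hostnames.
const exploitHosts = ["exploit-provider.example", "doorway.example"];  // placeholders
const lines = exploitHosts.map((host) => "127.0.0.1\t" + host).join("\n");
console.log(lines);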
Exploit-URLs also provide valuable leads to our anti-spyware product team. Each installed program is tagged
with an “exploit-based installation without user permission” attribute. This clearly distinguishes the program from
other more benign spyware programs that are always installed after a user accepts the licensing agreement.
Stage 3 Output – Zero-Day Exploit-URLs
By constantly monitoring all known exploit-URLs using HoneyMonkeys running on fully patched VMs, the system can detect zero-day exploits. Because zero-day exploits are extremely damaging,
whether they are deployed in the wild is the most critical
piece of information in the decision process for security
guidance, patch development, and patch release. When a
HoneyMonkey detects a zero-day exploit, it reports the
URL to the Microsoft Security Response Center, which
works closely with the enforcement team and the groups
owning the software with the vulnerability to thoroughly
investigate the case and determine the most appropriate
course of action. We will discuss an actual case in Section
4.2.
Due to the unavoidable delay between patch release and
patch deployment, it is important to know whether the
vulnerabilities fixed in the newly released patch are being
actively exploited in the wild. Such latest-patched-vulnerability exploit monitoring can be achieved by running HoneyMonkeys on nearly fully patched machines that
miss only the latest patch. Such monitoring reveals the prevalence of these exploits, which helps gauge the urgency of patch deployment.
4 EXPERIMENTAL EVALUATION
We present experimental results in four sections: scanning
suspicious URLs, zero-day exploit detection, scanning popular URLs, and scanning spam URLs. We refer to the first
and the third sets of data as “suspicious-list data” and
“popular-list data”, respectively. All experiments were performed with Internet Explorer browser version 6.0.
We note that the statistics reported in this paper do not
allow us to calculate the total number of end-hosts exploited by the malicious Web sites we have found. Such calculations would require knowing precisely the number of machines that have visited each exploit page and whether each
machine has patched the specific vulnerabilities that the
visited page attempts to exploit.
4.1 Scanning Suspicious URLs
4.1.1 Summary Statistics
Our first experiment aimed at gathering a list of most
likely candidates for exploit-URLs in order to get the highest hit rate possible. In the May/June 2005 time frame, we
collected 16,190 potentially malicious URLs from three
sources: (1) a Web search of “known-bad” Web sites involved in the installations of malicious spyware programs
[9]; (2) a Web search for Windows “hosts” files [21] that are
used to block advertisements and bad sites by controlling
the domain name-to-IP address mapping; (3) depth-2
crawling of some of the discovered exploit-URLs.
We used the Stage-1 HoneyMonkeys running on unpatched WinXP SP1 and SP2 VMs to scan the 16,190 URLs
and identified 207 as exploit-URLs; this translates into a
density of 1.28%. This serves as an upper bound on the infection rate: if a user does not patch his machine at all and
he exclusively visits risky Web sites with questionable content, his machine will get exploited by approximately one
out of every 100 URLs he visits. We will discuss the exploit-URL density for normal browsing behavior in Section 4.3.
After recursive redirection analysis by Stage-2 HoneyMonkeys, the list expanded from 207 URLs to 752 URLs – a
263% increase. This reveals that there is a sophisticated
network of exploit providers hiding behind URL redirection to perform malicious activities.
Table 1 shows the breakdown of the 752 exploit-URLs
among different service-packs (SP1 or SP2) and patch levels, where “UP” stands for “UnPatched”, “PP” stands for
“Partially Patched”, and “FP” stands for “Fully Patched”.
As expected, the SP1-UP number is much higher than the
SP2-UP number because the former has more known vulnerabilities that have existed for a longer time.
Table 1. Exploit statistics for Windows XP as a function of patch levels (May/June 2005 data)

                                   # Exploit-URLs   # Exploit Sites
  Total                                 752              288
  SP1 Unpatched (SP1-UP)                688              268
  SP2 Unpatched (SP2-UP)                204              115
  SP2 Partially Patched (SP2-PP)         17               10
  SP2 Fully Patched (SP2-FP)              0                0
The SP2-PP numbers are the numbers of exploit pages and sites that successfully exploited a WinXP SP2 machine partially patched up to early 2005. The fact that these numbers are one order of magnitude lower than their SP2-UP counterparts demonstrates the importance of patching. An important observation is that only a small percentage of exploit sites are updating their exploit capabilities to keep up with the latest vulnerabilities, even though proof-of-concept exploit code for most of the vulnerabilities is publicly posted. We believe this is due to three factors: (1) Upgrading and testing new exploit code incurs some cost which needs to be traded off against the increase in the number of victim machines; (2) Some vulnerabilities are more difficult to exploit than others; for example, some of the attacks are nondeterministic or take longer. Most exploiters tend to stay with existing, reliable exploits and only upgrade when they find the next easy target. (3) Most security-conscious Web users diligently apply patches. Exploit sites with “advanced” capabilities are likely to draw attention from knowledgeable users and become targets for investigation.
The SP2-FP numbers again demonstrate the importance of software patching: none of the 752 exploit-URLs was able to exploit a fully updated WinXP SP2 machine according to our May/June 2005 data. As we describe in Section 4.2, there was a period of time in early July when this was no longer true. We were able to quickly identify and report the few exploit providers capable of infecting fully patched machines, which led to their shutdown.

4.1.2 Site Ranking
Figure 4 illustrates the site and redirection relationships of the 17 exploit-URLs for SP2-PP. These are among the most powerful exploit pages in terms of the number of machines they are capable of infecting and should be considered high priorities for investigation. Each rectangular node represents an exploit-URL, and nodes inside a dashed-line box are pages hosted on the same Web site. Each arrow represents an automatic traffic redirection. Nodes that do not receive redirected traffic are most likely content providers, while nodes that receive traffic from multiple exploit sites are most likely exploit providers.

Figure 4: SP2-PP Redirection Graph (17 URLs, 10 sites)

We define for each node a connection count that represents the number of cross-site arrows, both incoming and outgoing, directly connected to it. Such numbers provide a good indication of the degree of involvement in the exploit network. It is clear from the picture that node R has the highest connection count. Therefore, blocking access to this site would be the most effective starting point, as it would disrupt nearly half of this exploit network.
The redirection graph for the 688 SP1-UP exploit-URLs is much more complex and is omitted here. Most of the URLs appear to be pornography pages, and the aggressive traffic redirection among them accounts for the complexity of the bulk of the graph. In the isolated corners, we found a shopping site redirecting traffic to five advertising companies that serve exploiting advertisements, a screensaver freeware site, and over 20 exploit search sites. Next, we describe two ranking algorithms that help prioritize the investigations of these hundreds of URLs and sites.

Site ranking based on connection counts
Figure 5 illustrates the top 15 exploit sites for SP1-UP according to their connection counts². The bar height indicates how many other sites a given site has a direct traffic-redirection relationship with, and likely reflects how entrenched a site owner is with the exploit community. The bar for each site is composed of three segments of different colors: a black segment represents the number of sites that redirect traffic to the given site; a white segment represents the number of sites to which it redirects traffic; a gray segment indicates the number of sites that have a two-way traffic-redirection relationship with it.

² We anonymize the names of malicious Web sites so as not to interfere with the ongoing investigations by law-enforcement agencies and our legal department.

Figure 5: Top 15 exploit sites ranked by connection counts, among the 268 SP1-UP exploit sites from the suspicious list

For example, site #15 corresponds to a content provider who is redirecting traffic to multiple exploit providers and
sharing traffic with a few other content providers. Site #7
corresponds to an exploit provider that is receiving traffic
from multiple Web sites. Sites #4, #5, and #9 are pornography sites that play a complicated role: they redirect traffic
to many exploit providers and receive traffic from many
content providers. Their heavy involvement in exploit activities and the fact that they are registered to the same
owner suggest that they may be set up primarily for exploit
purposes.
Site ranking, categorization, and grouping play a key
role in the anti-exploit process because they serve as the basis
for deciding the most effective resource allocation for monitoring, investigation, blocking, and legal actions. For example, high-ranking exploit sites in Figure 5 should be heavily
monitored because a zero-day exploit page connected to
any of them would likely affect a large number of Web
sites. Legal investigations should focus on top exploit providers, rather than content providers that are mere traffic
doorways and do not perform exploits themselves.
Site ranking based on number of hosted exploit-URLs

Figure 6: Top 32 SP1-UP exploit sites ranked by the number of hosted exploit-URLs
Figure 6 illustrates the top 32 sites, each hosting five or
more exploit URLs. This ranking helps to highlight those
Web sites whose internal page hierarchy provides important insights. First, some Web sites host a large number
of exploit pages with a well-organized hierarchical structure. For example, the #1 site hosts 24 exploit pages that are
organized by what appear to be account names for affiliates; many others organize their exploit pages by affiliate
IDs or referring site names; some even organize their pages
by the names of the vulnerabilities they exploit and a few of
them have the word “exploit” as part of the URL names.
The second observation is that some sophisticated Web
sites use transient URL names that contain random strings.
This is designed to make investigations more difficult. Site
ranking based on the number of hosted exploit-URLs helps
highlight such sites so that they are prioritized higher for
investigation. The zero-day exploits discussed in the next
sub-section provide a good example of this.
4.2 Zero-Day Exploit Detection
In early July 2005, a Stage-3 HoneyMonkey discovered
our first zero-day exploit. The javaprxy.dll vulnerability
was known at that time without an available patch [28,29],
and whether it was actually being exploited was a critical
piece of information not previously known. The HoneyMonkey system detected the first exploit page within 2.5
hours of scanning and it was confirmed to be the first in-the-wild exploit-URL of the vulnerability reported to the
Microsoft Security Response Center. A second exploit-URL
was detected in the next hour. These two occupy positions
#132 and #179, respectively, in our list of 752 monitored
URLs. This information enabled the response center to provide customers with a security advisory and a follow-up
security bulletin [29],[46].
During the subsequent five days, HoneyMonkey detected that 26 of the 752 exploit-URLs upgraded to the zero-day
exploit. Redirection analysis further revealed that 25 of
them were redirecting traffic to a previously unknown exploit provider site that was hosting exploit-URLs with
names in the following form:
http://82.179.166.2/[8 chars]/test2/iejp.htm
where [8 chars] consists of 8 random characters that appeared to change gradually over time. Takedown notices
were sent after further investigation of the installed malware programs, and most of the 25 Web pages stopped exploiting the javaprxy.dll vulnerability shortly after that.
Latest-Patched-Vulnerability Exploit Monitoring
One day after the patch release, HoneyMonkey detected
another jump in the number of exploit-URLs for the vulnerability: 53 URLs from 12 sites were upgraded in the subsequent six days. Redirection analysis revealed that all of
them were redirecting traffic to a previously known exploit
provider (ranked #1 in Figure 6) who added a new exploit
page for javaprxy.dll to increase its infection base. A
takedown notice was sent and all 53 URLs stopped exploiting within a couple of days.
Important Observations
This experience provides concrete evidence that the
HoneyMonkey system can potentially evolve into a fullfledged, systematic and automatic zero-day exploit monitoring system for browser-based attacks. We make the following observations from the initial success:
(i) Monitoring easy-to-find exploit-URLs is effective: we
predicted that monitoring the 752 exploit-URLs would be
useful for detecting zero-day exploits because the fact that
we could find them quickly within the first month implies
that they are more popular and easier to reach. Although
zero-day exploits are extremely powerful, they still need to
connect to popular Web sites in order to receive traffic to
exploit. If they connect to any of the monitored URLs in our
list, the HoneyMonkey can quickly detect the exploits and
identify the exploit providers behind the scene through
redirection analysis. Our zero-day exploit detection experience confirmed the effectiveness of this approach.
(ii) Monitoring content providers with well-known URLs
is effective: we predicted that monitoring content providers would be useful for tracking the potentially dynamic
behavior of exploit providers. Unlike exploit providers who
could easily move from one IP address to another and use
random URLs, content providers need to maintain their
well-known URLs in order to continue attracting browser
traffic. The HoneyMonkey takes advantage of this fundamental weakness in the browser-based exploit model and
utilizes the content providers as convenient doorways into
the exploit network.
The effectiveness of this approach was further confirmed by the detection of another zero-day exploit of the
msdds.dll vulnerability around mid-August 2005 [30]: 48
doorway pages redirecting to three exploit providers were
detected by early September. Of particular importance was
a doorway URL from the popular list connecting to an exploit provider who was downloading a malware program
not in the signature set of any anti-malware scanners at that
time. This demonstrated the value of using HoneyMonkey
for malware sample collection to improve the effectiveness
of anti-malware programs, as depicted in the bottom row of
Figure 3.
(iii) Monitoring highly ranked and advanced exploit-URLs is effective: we predicted that the top exploit sites we identified are more likely to upgrade their exploits because they have a serious investment in this business. Also, Web sites that appear in the SP2-PP graph are more likely to upgrade because they appeared to be more up-to-date exploiters. Both predictions have been shown to be true: the first detected zero-day exploit-URL belongs to the #9 site in Figure 5 (which is registered to the same email address that also owns the #4 and #5 sites) and 7 of the top 10 sites in Figure 5 upgraded to the javaprxy.dll exploit; nearly half of the SP2-PP exploit-URLs in Figure 4 upgraded as well.

4.3 Scanning Popular URLs
By specifically searching for potentially malicious Web sites, we were able to obtain a list of URLs in which 1.28% of the pages perform exploits. A natural question that most Web users will ask is: if I never visit those risky Web sites that serve questionable content, do I have to worry about vulnerability exploits? To answer this question, around July 2005, we gathered the most popular one million URLs as measured by the click-through counts from a search engine and tested them with the HoneyMonkey system. Table 2 summarizes the comparison of key data from this popular list with the previous data from the suspicious list.

Table 2. Comparison of suspicious and popular list data

                                        Suspicious List    Popular List
  # URLs scanned                        16,190             1,000,000
  # Exploit URLs                        207 (1.28%)        710 (0.071%)
  # Exploit URLs after redirection
    (expansion factor)                  752 (263%)         1,036 (46%)
  # Exploit Sites                       288                470
  SP2-to-SP1 Ratio                      204/688 = 0.30     131/980 = 0.13

4.3.1 Summary Statistics
Before redirection analysis
Of the one million URLs, HoneyMonkey determined that 710 were exploit pages. This translates into a density of 0.071%, which is between one and two orders of magnitude lower than the 1.28% number from the suspicious-list data. The distribution of exploit-URLs among the ranked list is fairly uniform, which implies that the next million URLs likely exhibit a similar distribution and so there are likely many more exploit-URLs to be discovered. Eleven of the 710 exploit pages are very popular: they are among the top 10,000 of the one million URLs that we scanned. This demonstrates the need for constant, automatic Web patrol of popular pages in order to protect the Internet from large-scale infections.
After redirection analysis
The Stage-2 HoneyMonkey redirection analysis expanded the list of 710 exploit-URLs to 1,036 URLs hosted by 470 sites. This (1,036-710)/710 = 46% expansion is much lower than the 263% expansion in the suspicious-list data, suggesting that the redirection network behind the former is less complex. The SP2-to-SP1 ratio of 0.13 is lower than its counterpart of 0.30 from the suspicious-list data (see Table 2). This suggests that overall the exploit capabilities in the popular list are not as advanced as those in the suspicious list, which is consistent with the findings from our manual analysis.
Intersecting the 470 exploit sites with the 288 sites from
Section 4.1 yields only 17 sites. These numbers suggest that
the degree of overlap between the suspicious list, generally
with more powerful attacks, and the popular list is not
alarmingly high at this point. But more and more exploit
sites from the suspicious list may try to “infiltrate” the
popular list to increase their infection base. In total, we
have collected 1,780 exploit-URLs hosted by 741 sites.
4.3.2 Site Ranking
Site ranking based on connection counts

Figure 7: Top 15 exploit sites ranked by connection counts, among the 426 SP1-UP exploit sites from the popular list
Figure 7 illustrates the top 15 SP1-UP exploit sites by
connection counts. There are several interesting differences
between the two data sets behind the suspicious-list plot
(Figure 5) and the popular-list plot (Figure 7). First, there is
not a single pair of exploit sites in the popular-list data that
are doing two-way traffic redirection, which appears to be
unique in the malicious pornography community. Second,
while it is not uncommon to see Web sites redirecting traffic to more than 10 or even 20 sites in the suspicious-list,
sites in the popular-list data redirect traffic to at most 4
sites. This suggests that aggressive traffic redirection/exchange is also a phenomenon unique to the malicious pornography community.
Finally, the top four exploit providers in the popular-list
clearly stand out. None of them have any URLs in the original list of one million primary URLs, but all of them are
behind a large number of doorway pages. The #1 site provides exploits to 75 Web sites primarily in the following
five categories: (1) celebrities, (2) song lyrics, (3) wallpapers, (4) video game cheats, and (5) wrestling. The #2 site receives traffic from 72 Web sites, the majority of which are located in one particular country. The #3 site is behind 56 related Web sites that serve cartoon-related pornographic content. The #4 site appears to be an advertising company serving exploiting ads through Web sites that belong to similar categories as those covered by the #1 site.

Site ranking based on number of hosted exploit-URLs

Figure 8: Top 32 sites ranked by the number of exploit-URLs, among the 426 SP1-UP exploit sites

Figure 8 illustrates the top 32 sites hosting more than one exploit URL. Unlike Figure 7, which highlights mostly exploit provider sites, Figure 8 highlights many content provider sites that host a large number of exploit pages containing a similar type of content. Again, the top four sites stand out: the #1 site is a content provider of video game cheats information for multiple game consoles. The #2 site (which also appears as the third entry in Figure 7) hosts a separate URL for each different Web site from which it receives traffic. The #3 site is a content provider that has a separate entry page for each celebrity figure. The #4 site is a content provider of song lyrics with one entry page per celebrity singer.

4.4 Scanning Spam URLs
Between July and September 2006, we used the HoneyMonkey system to scan the top search results of a set of spammer-targeted keywords at the three major search engines and identified five malicious spam URLs. For each URL, we used the search engine to find a forum that had been spammed with the URL, extracted all URLs spammed on the same forum page, and scanned them to see if we could discover additional malicious URLs.
Table 3 summarizes the results. For each of the five malicious URLs, we found many other seemingly related URLs spammed on the same forum page, and scan results showed that 14% to 49% of these related URLs were also malicious. In total, from this small set of five “seed URLs”, we were able to find 1,054 unique, comment-spammed malicious URLs, most of which were doorway pages residing on free-hosting Web sites. Figure 9 illustrates the distribution of malicious URLs among the top-15 domains according to the number of malicious URLs hosted. Clearly, several of these domains are heavily targeted by exploiters.

Table 3: Malicious URLs in Spammed Forums
(each entry lists the malicious seed URL, the forum on whose cached page it was spammed, and the number of malicious URLs among the spam URLs found on that page)

  1. http://eclatlantus.yiffyhost.com/fesseeprogenco.html
     spammed at http://members.tripod.com/deportivolapigeon/guestbook.htm
     688 (47%) of 1,463
  2. http://humorrise.sitesled.com/burberryhandbag-supplier-wholesale.html
     spammed at http://sgtrois.webator.net/?2004/11/02/4-un-cdgratuit-offert-par-ladisq
     174 (14%) of 1,280
  3. http://alexandrsultz.sitesled.com/chloe/handbag.html
     spammed at http://www.no-nation.org/article347.html
     22 (26%) of 85
  4. http://nowodus.tripod.com/new-mexicolottery.html
     spammed at http://aose.ift.ulaval.ca/modules.php?op=modload&name=News&file=article&sid=137
     164 (49%) of 338
  5. http://lermon.t35.com
     spammed at http://68.15.204.73/ngallery/albums/3/1.aspx
     6 (18%) of 33

Figure 9: Number of comment-spammed malicious URLs hosted on each of the top-15 domains

We also observed that many same-named malicious sub-domains existed on multiple domains. Table 4 illustrates the distribution of the top-10 sub-domains (in rows) across top domains hosting malicious sites (in columns), with “X” indicating that the sub-domain existed on the hosting domain as a malicious site. This is demonstrative of the malicious-site hierarchical structure mentioned in Section 4.1.2, in which exploiters are using their account names as the sub-domain names so that payment tracking can be portable across different hosting domains.

Table 4: Distribution of Malicious Sub-domains (top-10 sub-domains, such as Kalovalex, Maripirs, Sadoviktor, Vinnerira, and Footman, in rows versus the free-hosting domains in columns; the full matrix is omitted here)

In addition, we found a total of 75 behind-the-scenes, malicious URLs that received redirection traffic from these thousands of doorway URLs and were responsible for the actual vulnerability-exploit activities. In particular, the following three URLs were behind all five sets of malicious URLs:
  • http:/ /zllin.info/n/us14/index.php
  • http:/ /linim.net/fr/?id=us14
  • http:/ /alllinx.info/fr/?id=us14
All three domains share the same registrant with an Oklahoma state address, whose owner appears to be a major exploiter involved in search spamming.

5 DISCUSSIONS
Now that the effectiveness of the HoneyMonkey system is widely known [22], we expect that exploit sites will start adopting techniques to evade HoneyMonkey detection. We
discuss three types of potential evasion techniques and our
countermeasures. Since it has become clear that a weakness
of the HoneyMonkey is the time window between a successful exploit that allows foreign code execution and the
subsequent execution of the HoneyMonkey exploit detection code, we have developed and integrated a tool called
Vulnerability-Specific Exploit Detector (VSED), which allows the HoneyMonkey to detect and record the first sign
of an exploit. Such a detector works only for known vulnerabilities, however; detecting zero-day exploits of entirely
unknown vulnerabilities remains a challenge. The VSED
tool will be discussed in Section 5.4.
5.1 Identifying HoneyMonkey Machines
There are three ways for an exploit site to identify HoneyMonkey machines and skip exploits.
(i) Targeting HoneyMonkey IP addresses: The easiest way
to avoid detection is to black-list the IP addresses of HoneyMonkey machines. We plan to counter this by running
the HoneyMonkey network behind multiple ISPs with dynamically assigned IP addresses. If an exploit site wants to
black-list all IP addresses belonging to these ISPs, it will
need to sacrifice a significant percentage of its infection
base. One market research study of ISP client membership
shows that the top 10 US ISPs service over 62% of US Internet users [25].
(ii) Performing a test to determine if a human is present:
Currently, HoneyMonkeys do not click on any dialog box.
A malicious Web site could introduce a one-time dialog box
that asks a simple question; after the user clicks the OK button to prove he’s human, the Web site drops a cookie to
suppress the dialog box for future visits. More sophisticated Web sites can replace the simple dialog box with a
CAPTCHA Turing Test [2] (although this would raise suspicion because most non-exploiting sites do not use such
tests). We will need to incorporate additional intelligence
into the HoneyMonkeys to handle dialog boxes and to detect CAPTCHA tests if and when we see Web sites starting
to adopt such techniques to evade detection.
(iii) Detecting the presence of a VM or the HoneyMonkey
code: Malicious code could detect a VM by executing a series of instructions with high virtualization overhead and
comparing the elapsed time to some external reference [50];
by detecting the use of reserved x86 opcodes normally only
used by specific VMs [33]; by leveraging information
leaked by sensitive, non-privileged instructions [43]; and by
observing certain file directory contents known to be associated with UML (User-Mode Linux) [8] or a specific hardware configuration, default MAC address, or I/O backdoor
associated with VMware [23].
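As a concrete illustration of the first technique, the sketch below (our own illustration, not code observed in the wild) times an instruction that typically forces a VM exit, CPUID, with the RDTSC cycle counter and compares the average against a threshold. The 2,000-cycle cutoff is an assumption that would have to be calibrated per machine, and the code targets GCC/Clang on x86.

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc (GCC/Clang, x86 only) */
#include <cpuid.h>       /* __get_cpuid */

/* Time one CPUID instruction, which typically causes a VM exit. */
static uint64_t time_cpuid(void)
{
    unsigned int a, b, c, d;
    uint64_t start = __rdtsc();
    __get_cpuid(0, &a, &b, &c, &d);
    return __rdtsc() - start;
}

int main(void)
{
    /* Average several runs to smooth out cache and scheduling noise. */
    uint64_t total = 0;
    for (int i = 0; i < 100; i++)
        total += time_cpuid();
    uint64_t avg = total / 100;

    /* Hypothetical threshold: native CPUID costs a few hundred cycles,
       while a VM exit usually adds thousands; calibration is required. */
    printf("average cycles: %llu -> %s\n", (unsigned long long)avg,
           avg > 2000 ? "likely a virtual machine" : "likely physical hardware");
    return 0;
}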
Most VM-detection techniques exist because x86 processors
are not fully virtualizable. Fortunately,
both Intel [52] and AMD [40] have proposed architecture
extensions that would make x86 processors fully virtualizable, and thus make detecting a VM more difficult. In the
meantime, we can adopt anti-detection techniques that target known VM-detection methods [8],[50]. As VMs are increasingly used as general computing platforms, the approach of detecting HoneyMonkeys by detecting VMs will
become less effective.
Alternatively, we developed techniques that allow us to
also run HoneyMonkey on non-virtual machines so that the
results can be cross-checked to identify sophisticated attackers. We implemented support to efficiently checkpoint
our system (both memory and disk state) when it is in a
known-good state, and roll back to that checkpoint after an
attack has been detected. To checkpoint memory, we utilized the hibernation functionality already present in Windows to efficiently store and restore memory snapshots. To
support disk checkpoints, we implemented copy-on-write
disk functionality by modifying the generic Windows disk
class driver, which is used by most disks today. Our copy-on-write implementation divides the physical disk into two
equally sized partitions. We use the first partition to hold
the default disk image that we roll back to when restoring a
checkpoint, and the second partition as a scratch partition
to store all disk writes made after taking a checkpoint. We
maintain a bitmap in memory to record which blocks have
been written to so we know which partition contains the
most recent version of each individual block. As a result, no
extra disk reads or writes are needed to provide copy-on-write functionality, and a rollback can be accomplished simply by zeroing out the bitmap. To provide further protection, we can adopt resource-hiding techniques to hide
the driver from sophisticated attackers who are trying to
detect the driver to identify a HoneyMonkey machine.
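The following user-space sketch illustrates the copy-on-write scheme just described, with two in-memory arrays standing in for the two disk partitions and a bitmap recording redirected blocks. It is an illustration of the idea only, not the actual Windows disk class driver code, and the block counts and sizes are arbitrary.

#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define NUM_BLOCKS 1024
#define BLOCK_SIZE 512

static uint8_t base[NUM_BLOCKS][BLOCK_SIZE];     /* "first partition": checkpoint image */
static uint8_t scratch[NUM_BLOCKS][BLOCK_SIZE];  /* "second partition": post-checkpoint writes */
static uint8_t dirty[NUM_BLOCKS / 8];            /* in-memory bitmap: one bit per block */

static int is_dirty(size_t blk)  { return dirty[blk / 8] & (1u << (blk % 8)); }
static void set_dirty(size_t blk) { dirty[blk / 8] |= (1u << (blk % 8)); }

/* Writes always go to the scratch area; the bitmap records the redirection. */
void cow_write(size_t blk, const uint8_t *data)
{
    memcpy(scratch[blk], data, BLOCK_SIZE);
    set_dirty(blk);
}

/* Reads consult the bitmap to find the most recent copy of each block. */
void cow_read(size_t blk, uint8_t *out)
{
    memcpy(out, is_dirty(blk) ? scratch[blk] : base[blk], BLOCK_SIZE);
}

/* Rolling back to the checkpoint is just clearing the bitmap:
   all subsequent reads fall through to the pristine base image. */
void cow_rollback(void)
{
    memset(dirty, 0, sizeof dirty);
}

int main(void)
{
    uint8_t buf[BLOCK_SIZE] = {0};
    buf[0] = 0x42;
    cow_write(7, buf);                        /* modify block 7 after the checkpoint */
    cow_read(7, buf);
    printf("after write: %#x\n", buf[0]);     /* 0x42, served from the scratch partition */
    cow_rollback();                           /* exploit detected: restore the checkpoint */
    cow_read(7, buf);
    printf("after rollback: %#x\n", buf[0]);  /* 0, served from the base image */
    return 0;
}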
Some exploit sites may be able to obtain the “signatures”
of the HoneyMonkey logging infrastructure and build a
detection mechanism to allow them to disable the logging
or tamper with the log. Since such detection code can only
be executed after a successful exploit, we can use VSED to
detect occurrences of exploits and highlight those that do
not have a corresponding file-creation log. Additionally, we
are incorporating log signing techniques to detect missing
or modified log entries.
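As an illustration of the kind of tamper evidence we are after, the sketch below chains log entries with a simple, non-cryptographic FNV-1a hash so that a missing or modified entry invalidates every later tag. A production implementation would use a keyed MAC or digital signature instead, and the log text shown is purely hypothetical.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a, used here only as a stand-in for a real MAC or signature. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Each entry's tag covers the previous tag plus the entry text, so deleting
   or altering any entry breaks verification of every later tag. */
uint64_t chain_entry(uint64_t prev_tag, const char *entry)
{
    uint64_t h = fnv1a(14695981039346656037ULL, &prev_tag, sizeof prev_tag);
    return fnv1a(h, entry, strlen(entry));
}

int main(void)
{
    /* Purely illustrative log entries. */
    const char *log[] = { "exploit detected: javaprxy.dll vulnerability",
                          "file created outside browser sandbox" };
    uint64_t tag = 0;
    for (size_t i = 0; i < 2; i++) {
        tag = chain_entry(tag, log[i]);
        printf("entry %zu tag: %016llx\n", i, (unsigned long long)tag);
    }
    return 0;
}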
We note that some classes of exploits require writing a
file to disk and then executing that file for running arbitrary
code. These exploits cannot escape our detection by trying
to identify a HoneyMonkey machine because our file-based
detection actually occurs before they can execute code.
5.2 Exploiting without Triggering HoneyMonkey
Detection
Currently, HoneyMonkey cannot detect exploits that do not
make any persistent-state changes or make such changes
only inside the browser sandbox. Even with this limitation, the
HoneyMonkey is able to detect most of today’s Trojans,
backdoors, and spyware programs that rely on significant
persistent-state changes to enable automatic restart upon
reboot. Again, the VSED tool can help address this limitation.
HoneyMonkeys only wait for a few minutes for each
URL, so a possible evasion technique is to delay the exploit.
However, such delays reduce the chance of successful infections because real users may close the browser before the
exploit happens. We plan to run HoneyMonkeys with random wait times and highlight those exploit pages that exhibit inconsistent behaviors across runs for more in-depth
manual analysis.
5.3 Randomizing the Attacks
Exploit sites may try to inject nondeterministic behavior to
complicate the HoneyMonkey detection. They may randomly exploit one in every N browser visits. We consider
this an acceptable trade-off: while this would require multiple scans by the HoneyMonkeys to detect an exploit, it
forces the exploit sites to reduce their infection rates by a
factor of N as well. If a major exploit provider is behind more
than N monitored content providers, the HoneyMonkey
can still detect it through redirection tracking in one round
of scans.
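To make the trade-off concrete under a simplifying assumption of our own (each visit is exploited independently with probability 1/N), k repeated HoneyMonkey scans of the same page miss the exploit with probability (1 - 1/N)^k, which shrinks rapidly as k grows, while every real visitor's chance of infection is cut by the same factor of N.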
Exploit sites may try to randomize URL redirections by
selecting a random subset of machines to receive forwarded
traffic each time from a large set of infected machines that
are made to host exploit code. Our site ranking algorithm
based on connection counts should discourage this because
such sites would end up prioritizing themselves higher for
investigation. Also, they reveal the identities of infected
machines, whose owners can be notified to clean up the
machines.
5.4 Vulnerability-Specific Exploit Detector (VSED)
To address some of the limitations discussed above and
to provide additional information on the exact vulnerabilities being exploited, we have developed a vulnerability-specific detector, called VSED, and integrated it into the
HoneyMonkey. The VSED tool implements a source-code
level, vulnerability-specific intrusion detection technique
that is similar to IntroVirt [31]. For each vulnerability, we
manually write “predicates” to test the state of the monitored program to determine when an attacker is about to
trigger a vulnerability. VSED operates by inserting breakpoints within buggy code to stop execution before potentially malicious code runs, in order to allow secure logging
of an exploit alert. For example, VSED could detect a buffer
overflow involving the “strcpy” function by setting a
breakpoint right before the buggy “strcpy” executes. Once
VSED stops the application, the predicate examines the variables passed into “strcpy” to determine if an overflow is
going to happen. One limitation of VSED is that it cannot
identify zero-day exploits of unknown vulnerabilities.
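The sketch below shows the flavor of such a predicate for the strcpy example above: when the breakpoint fires, the detector inspects the source string and the destination capacity and raises an alert if the copy would overflow. The function name, the buffer size, and the in-process placement of the check are our own simplifications; in VSED the predicate runs in the monitoring layer, outside the monitored program.

#include <stdio.h>
#include <string.h>

/* Invoked when execution stops at the breakpoint placed just before the
   vulnerable strcpy, with the values about to be passed to it. */
int vsed_strcpy_predicate(const char *src, size_t dst_capacity)
{
    if (strlen(src) + 1 > dst_capacity) {
        printf("VSED alert: strcpy of %zu bytes into %zu-byte buffer\n",
               strlen(src) + 1, dst_capacity);
        return 1;   /* an attacker is about to trigger the vulnerability */
    }
    return 0;       /* benign call: let execution continue */
}

int main(void)
{
    char dst[16];
    const char *attack = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"; /* 32 bytes, too long */
    if (!vsed_strcpy_predicate(attack, sizeof dst))
        strcpy(dst, attack);    /* executed only when the predicate is clean */
    return 0;
}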
To evaluate the effectiveness of VSED for detecting browser-based exploits, we wrote predicates for six recent IE vulnerabilities and tested them against the exploit-URLs from
both the suspicious list and the popular list. Although we
do not have a comprehensive list of predicates built yet, we
can already pinpoint the vulnerabilities exploited by hundreds of exploit-URLs.
6 RELATED WORK
There is a rich body of literature on honeypots. Most
honeypots are deployed to mimic vulnerable servers waiting for attacks from client machines [18,39,27,32]. In contrast, HoneyMonkeys mimic clients drawing attacks from
malicious servers.
To our knowledge, there are six other projects related
to the concept of client-side honeypots. Three are active
client honeypots: HoneyClient [19] and Moshchuk et al.’s
study of spyware [63], both of which were done in parallel
with HoneyMonkey, and the recent work of Provos et al.
[65]. The other three are passive client honeypots used as
part of intrusion detection systems: email quarantine, shadow honeypots, and SpyProxy.
In parallel to our work, the Honeyclient project [19]
shares the same goal of trying to identify browser-based
attacks. However, the project has not published any deployment experience or any data on detected exploit-URLs.
There are also several major differences in terms of implementation: HoneyMonkeys are VM-based, use a pipeline of
machines with different patch levels, and track URL redirections.
Also parallel to HoneyMonkey are systems deployed for
measurement studies of malware on the Web. Moshchuk et
al. [63] focused on spyware. Their tools actively sought out
executables for download as well as Web sites performing
“drive-by downloads.” They used virtual machines in a
state-change-based approach and anti-spyware tools to determine whether the executables were malicious. Their system had built-in heuristics to deal with non-deterministic
elements such as time bombs, pop-up windows, and other
Javascript tricks. They determined through three separate
crawls that although there was a reduction in malicious
Web pages, spyware’s presence on the Web is still significant.
Provos et al. [65] conducted another measurement study
also using a state-change-based approach. Their goal was to
gauge the state of Web-based malware over a twelve-month
period by identifying and finding examples of four injection mechanisms: Web server security, user-contributed
content, advertising, and 3rd-party widgets. Using data
from Google crawls, they identified potentially malicious
URLs that they fed to Internet Explorer running in a virtual
machine. Using anti-virus engines and detecting state
changes to the file system, registry, and processes spawned,
they classified URLs as harmless, malicious, or inconclusive. They found that not only is malware distributed
across large numbers of URLs and domains, but that malware binaries change frequently. Furthermore, they discovered that machines compromised by Web-based malware
communicate with malicious servers in botnet-like structures. In May of 2007, the same team released a blog post
[68] that reemphasized the significance of the finding that
0.1% of sampled URLs were malicious. In the context of
billions of URLs, this means that millions are malicious.
Their findings are consistent with our finding that 0.071% of
URLs are malicious.
Sidiroglou et al. [47] described an email quarantine system which intercepts every incoming message, “opens” all
suspicious attachments inside instrumented virtual machines, uses behavior-based anomaly detection to flag potentially malicious actions, quarantines flagged emails, and
only delivers messages that are deemed safe.
Anagnostakis et al. [4] proposed the technique of “shadow honeypots” which are applicable to both servers and
clients. The key idea is to combine anomaly detection with
honeypots by diverting suspicious traffic identified by
anomaly detectors to a shadow version of the target application that is instrumented to detect potential attacks and
filter out false positives. As a demonstration of client-side
protection, the authors deployed their prototype on Mozilla
Firefox browsers.
SpyProxy [64] is optimized to detect malicious Web sites
in real time transparently using a state-change-based approach. While the HoneyMonkeys are designed for detection and measurement, SpyProxy is designed to act as a
proxy for client browsers by intercepting HTTP requests
and executing Web content on the fly in a virtual machine
before forwarding the results to the client. The authors deployed SpyProxy as a network service and tested it
against a list of 100 malicious Web pages from 45 Web sites.
SpyProxy was 100% effective in blocking malware in that
experiment.
The three client-side honeypots described above are
passive in that they are given existing traffic and do not
actively solicit traffic. In contrast, HoneyMonkeys are active
and are responsible for seeking out malicious Web sites and
drawing attack traffic from them. The former has the advantage of providing effective, focused protection of the targeted population. The latter has the advantages of staying out
of the application’s critical path and achieving a broader
coverage, but it does require additional defense against
potential traps/black-holes during the recursive redirection
analysis. The two approaches are complementary and can
be used in conjunction with each other to provide maximum protection.
Existing honeypot techniques can be categorized using
two other criteria: (1) physical honeypots [32] with dedicated physical machines versus virtual honeypots built on
Virtual Machines [51,49]; (2) low-interaction honeypots
[39], which only simulate network protocol stacks of different operating systems, versus high-interaction honeypots
[27], which provide an authentic decoy system environment. HoneyMonkeys belong to the category of high-interaction, virtual honeypots.
In contrast with the black-box, state-change-based detection approach used in HoneyMonkey, several papers proposed vulnerability-oriented detection methods, which can
be further divided into vulnerability-specific and vulnerability-generic methods. The former includes Shield [58], a
network-level filter designed to detect worms exploiting
known vulnerabilities, and IntroVirt [31], a technique for
specifying and monitoring vulnerability-specific predicates
at code level. The latter includes system call-based intrusion detection systems [13][14], memory layout randomization [5][59], non-executable pages [1] and pointer encryption [7]. An advantage of vulnerability-oriented techniques
is the ability to detect an exploit earlier and identify the
exact vulnerability being exploited. As discussed in Section
5.4, we have incorporated IntroVirt-style, vulnerability-specific detection capability into the HoneyMonkey.
7 SUMMARY
We have presented the design and implementation of
the Strider HoneyMonkey as the first systematic method for
automated Web patrol in the hunt for malicious Web sites
that exploit browser vulnerabilities. Our analyses of two
sets of data showed that the densities of malicious URLs are
1.28% and 0.071%, respectively. In total, we have identified
a large community of 741 Web sites hosting 1,780 exploit-URLs. Furthermore, using a small set of only five seed
URLs, we discovered 1,054 additional malicious URLs that
were spammed in the comment fields at public forums. We
proposed redirection analysis to capture the relationship
between exploit sites, and site ranking algorithms based on
the number of directly connected sites and the number of
hosted exploit-URLs to identify major players. Our success
in detecting the first-reported, in-the-wild, zero-day exploit-URL of the javaprxy.dll vulnerability demonstrates
the effectiveness of our approach, which monitors easy-to-find content providers with well-known URLs as well as
top exploit providers with advanced exploit capabilities.
Finally, we discussed several techniques that malicious
Web sites can adopt to evade HoneyMonkey detection,
which motivated us to incorporate an additional vulnerability-specific exploit detection mechanism to complement
the HoneyMonkey’s core black-box exploit detection approach.
ACKNOWLEDGMENT
The authors wish to express their thanks to Ming Ma for his
help in finding malicious spam pages.
REFERENCES
[1] S. Andersen and V. Abella, “Data Execution Prevention. Changes to
Functionality in Microsoft Windows XP Service Pack 2, Part 3: Memory
Protection Technologies,”
http://www.microsoft.com/technet/prodtechnol/winxppro/maintain/sp2mempr.mspx.
[2] L. von Ahn, M. Blum, and J. Langford, “Telling Humans and Computers Apart Automatically,” in Communications of the ACM, Feb. 2004.
[3] Alexa, http://www.alexa.com/.
[4] K. Anagnostakis, S. Sidiroglou, P. Akritidis, K. Xinidis, E. Markatos, and A. Keromytis, “Detecting Targeted Attacks Using Shadow
Honeypots,” in Proc. USENIX Security Symposium, August 2005.
[5] PaX Address Space Layout Randomization (ASLR).
http://pax.grsecurity.net/docs/aslr.txt.
[6] Xpire.info, http://www.vitalsecurity.org/xpire-splitinfinityserverhack_malwareinstall-condensed.pdf, Nov. 2004.
[7] C. Cowan, S. Beattie, J. Johansen, and P. Wagle. “PointGuard: Protecting pointers from buffer overflow vulnerabilities,” in Proc. USENIX
Security Symposium, August 2003.
[8] C. Carella, J. Dike, N. Fox, and M. Ryan, “UML Extensions for
Honeypots in the ISTS Distributed Honeypot Project,” in Proc. IEEE
Workshop on Information Assurance, 2004.
[9] “Webroot: CoolWebSearch Top Spyware Threat,”
http://www.techweb.com/showArticle.jhtml?articleID=160400314,
TechWeb, March 30, 2005.
[10] Download.Ject,
http://www.microsoft.com/security/incident/download_ject.mspx,
June 2004.
[11] Ben Edelman, “Who Profits from Security Holes?” Nov. 2004,
http://www.benedelman.org/news/111804-1.html.
[12] “Follow the Money; or, why does my computer keep getting infested with spyware?”
http://www.livejournal.com/users/tacit/125748.html.
[13] S. Forrest, S. Hofmeyr, A. Somayaji, and T. Longstaff, “A sense of
self for Unix processes,” in Proc. IEEE Symposium on Security and Privacy, May 1996.
[14] H. Feng, O. Kolesnikov, P. Fogla, W. Lee, and W. Gong, “Anomaly detection using call stack information,” in Proc. IEEE Symposium on
Security and Privacy, May 2003.
[15] “Googkle.com installed malware by exploiting browser vulnerabilities,” http://www.f-secure.com/v-descs/googkle.shtml.
[16] Archana Ganapathi, Yi-Min Wang, Ni Lao, and Ji-Rong Wen,
"Why PCs Are Fragile and What We Can Do About It: A Study of
Windows Registry Problems", in Proc. IEEE Dependable Computing and
Communications Symposium, June 2004.
[17] Z. Gyongyi and H. Garcia-Molina, “Web Spam Taxonomy,” in
Proc. International Workshop on Adversarial Information Retrieval on the
Web (AIRWeb), 2005.
[18] The Honeynet Project, http://www.honeynet.org/.
[19] Honeyclient Development Project, http://www.honeyclient.org/.
[20] “Another round of DNS cache poisoning,” Handlers Diary, March
30, 2005, http://isc.sans.org/.
[21] hpHOSTS community managed hosts file, http://www.hostsfile.net/downloads.html.
[22] Strider HoneyMonkey Exploit Detection,
http://research.microsoft.com/HoneyMonkey.
[23] T. Holz and F. Raynal, “Detecting Honeypots and other suspicious
environments,” in Proc. IEEE Workshop on Information Assurance and
Security, 2005.
[24] “iframeDOLLARS dot biz partnership maliciousness,”
http://isc.sans.org/diary.php?date=2005-05-23.
[25] ISP Ranking by Subscriber, http://www.ispplanet.com/research/rankings/index.html.
[26] “Scammers use Symantec, DNS holes to push adware,” InfoWorld.com, March 7, 2005,
http://www.infoworld.com/article/05/03/07/HNsymantecholesand
adware_1.html?DESKTOP%20SECURITY.
[27] Xuxian Jiang, Dongyan Xu, “Collapsar: A VM-Based Architecture
for Network Attack Detention Center”, in Proc. USENIX Security Symposium, Aug. 2004.
[28] Microsoft Security Advisory (903144) - A COM Object (Javaprxy.dll) Could Cause Internet Explorer to Unexpectedly Exit,
http://www.microsoft.com/technet/security/advisory/903144.mspx.
[29] Microsoft Security Bulletin MS05-037 - Vulnerability in JView
Profiler Could Allow Remote Code Execution (903235),
http://www.microsoft.com/technet/security/bulletin/ms05-037.mspx.
[30] Microsoft Security Advisory (906267) – A COM Object (Msdds.dll)
Could Cause Internet Explorer to Unexpectedly Exit,
http://www.microsoft.com/technet/security/advisory/906267.mspx.
[31] Ashlesha Joshi, Sam King, George Dunlap, Peter Chen, “Detecting
Past and Present Intrusions Through Vulnerability-Specific Predicates,” in Proc. ACM Symposium on Operating System Principles, 2005.
[32] Sven Krasser, Julian Grizzard, Henry Owen, and John Levine,
“The Use of Honeynets to Increase Computer Network Security and
User Awareness”, in Journal of Security Education, pp. 23-37, vol. 1, no.
2/3. March 2005.
[33] Lallous, “Detect if your program is running inside a Virtual Machine,” March 2005,
http://www.codeproject.com/system/VmDetect.asp.
[34] Microsoft Security Bulletin MS05-002, Vulnerability in Cursor and
Icon Format Handling Could Allow Remote Code Execution,
http://www.microsoft.com/technet/security/Bulletin/MS05-002.mspx.
[35] Microsoft Security Bulletin MS03-011, Flaw in Microsoft VM
Could Enable System Compromise,
http://www.microsoft.com/technet/security/Bulletin/MS03-011.mspx.
[36] Microsoft Security Bulletin MS04-013, Cumulative Security Update
for Outlook Express,
http://www.microsoft.com/technet/security/Bulletin/MS04-013.mspx.
[37] Y. Niu, Y. M. Wang, H. Chen, M. Ma, and F. Hsu, “A Quantitative
Study of Forum Spamming Using Context-based Analysis,” in Proc.
Network and Distributed System Security Symposium, 2007.
[38] L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank
Citation Ranking: Bringing Order to the Web,”
http://dbpubs.stanford.edu:8090/pub/1999-66, Jan. 29, 1998
[39] Niels Provos, “A Virtual Honeypot Framework”, in Proc. USENIX
Security Symposium, Aug. 2004.
[40] AMD Pacifica Virtualization Technology,
http://enterprise.amd.com/downloadables/Pacifica.ppt.
[41] “Russians use affiliate model to spread spyware,”
http://www.itnews.com.au/newsstory.aspx?CIaNID=18926.
[42] Team Register, “Bofra exploit hits our ad serving supplier,”
http://www.theregister.co.uk/2004/11/21/register_adserver_attack/, November 2004.
[43] Red Pill, http://invisiblethings.org/papers/redpill.html.
[44] Symantec Gateway Security Products DNS Cache Poisoning Vulnerability,
http://securityresponse.symantec.com/avcenter/security/Content/2004.06.21.html.
[45] “Michael Jackson suicide spam leads to Trojan horse,”
http://www.sophos.com/virusinfo/articles/jackotrojan.html, Sophos,
June 9, 2005.
[46] “What is Strider HoneyMonkey,”
http://research.microsoft.com/honeymonkey/article.aspx, Aug. 2005.
[47] Stelios Sidiroglou and Angelos D. Keromytis, “A Network Worm
Vaccine Architecture,” in Proc. Information Security Practice and Experience Conference, April 2005.
[48] Michael Ligh, “Tri-Mode Browser Exploits - MHTML, ANI, and
ByteVerify,” http://www.mnin.org/write/2005_trimode.html, April
30, 2005.
[49] Know Your Enemy: Learning with User-Mode Linux. Building
Virtual Honeynets using UML,
http://www.honeynet.org/papers/uml/.
[50] Michael Vrable, Justin Ma, Jay Chen, David Moore, Erik Vandekieft, Alex Snoeren, Geoff Voelker, and Stefan Savage, “Scalability,
Fidelity and Containment in the Potemkin Virtual Honeyfarm,” in
Proc. ACM Symposium on Operating Systems Principles, Oct. 2005.
[51] Know Your Enemy: Learning with VMware. Building Virtual
Honeynets using VMware,
http://www.honeynet.org/papers/vmware/.
[52] Vanderpool Technology, Technical report, Intel Corporation, 2005.
[53] Yi-Min Wang, et al., “STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support”, in
Proc. Usenix LISA, Oct. 2003.
[54] Yi-Min Wang, et al., “Gatekeeper: Monitoring Auto-Start Extensibility Points (ASEPs) for Spyware Management”, in Proc. Usenix LISA,
2004.
[55] Yi-Min Wang, Doug Beck, Binh Vo, Roussi Roussev, and Chad
Verbowski, “Detecting Stealth Software with Strider GhostBuster,” in
Proc. International Conference on Dependable Systems and Networks, June
2005.
[56] Yi-Min Wang, Doug Beck, Jeffrey Wang, Chad Verbowski, and
Brad Daniels, “Strider Typo-Patrol: Discovery and Analysis of Systematic Typo-Squatting,” in Proc. 2nd Workshop on Steps to Reducing Unwanted Traffic on the Internet (SRUTI), July 2006.
[57] Yi-Min Wang, Doug Beck, Xuxian Jiang, and Roussi Roussev,
“Automated Web Patrol with Strider HoneyMonkeys: Finding Web
sites That Exploit Browser Vulnerabilities,” Microsoft Research Technical Report MSR-TR-2005-72, July 2005,
ftp://ftp.research.microsoft.com/pub/tr/TR-2005-72.pdf.
[58] Helen J. Wang, Chuanxiong Guo, Daniel R. Simon, and Alf
Zugenmaier, “Shield: Vulnerability-Driven Network Filters for Preventing Known Vulnerability Exploits,” in Proc. ACM SIGCOMM,
August 2004.
[59] J. Xu, Z. Kalbarczyk and R. K. Iyer, “Transparent Runtime Randomization for Security,” in Proc. Symp. Reliable and Distributed Systems (SRDS), October 2003.
[60] Strider URL Tracer, http://research.microsoft.com/URLTracer.
[61] Microsoft, Washington AG sue antispyware company
http://www.computerworld.com/printthis/2006/0,4814,108047,00.html, Jan.
25, 2006
[62] Attorney General McKenna Announces $1 Million Settlement in
Washington’s First Spyware Suit.
http://www.atg.wa.gov/releases/2006/rel_Secure_Computer_Restitution_2006.html, Dec. 4, 2006.
[63] Alexander Moshchuk, Tanya Bragin, Steven D. Gribble, and Henry M. Levy. “A Crawler-based Study of Spyware on the Web,” in Proc. Network and Distributed Systems Security Symposium, San Diego, CA, February 2006.
[64] Alexander Moshchuk, Tanya Bragin, Damien Deville, Steven D.
Gribble, and Henry M. Levy. “SpyProxy: Execution-based Detection of Malicious Web Content,” in Proc. USENIX Security Symposium, Boston, MA, August 2007.
[65] Niels Provos, Dean McNamee, Panayiotis Mavrommatis, Ke
Wang, and Nagendra Modadugu. “The Ghost in the Browser: Analysis
of Web-based Malware,” in Proc. Workshop on Hot Topics in Understanding Botnets, Cambridge, MA, April 2007.
[66] Bofra exploit hits our ad serving supplier.
http://www.theregister.com/2004/11/21/register_adserver_attack/, Nov. 11,
2004.
[67] Brian Krebs, “Hacked Ad Seen on MySpace Served Spyware to a
Million.”
http://blog.washingtonpost.com/securityfix/2006/07/myspace_ad_served_adware_to_mo.html, Jul. 19, 2006.
[68] Panayiotis Mavrommatis and Niels Provos. “Introducing Google's
online security efforts.”
http://googleonlinesecurity.blogspot.com/2007/05/introducing-googles-antimalware.html, May 21, 2007.