Spamming Botnets: Signatures and Characteristics

Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov.
SIGCOMM, 2008.
Presented by: Arnold Perez
Outline

Introduction
Goals
AutoRE
 Challenges
 Design
 Results
Botnet characteristics
Contributions
Weaknesses
Introduction

Botnets are commonly used for profit
 Botnets are rented out to spammers
Botnets can send spam emails at a large scale
 Can transmit thousands of emails in a short duration
Difficult to detect and blacklist individual bots
Goals

Understand the behaviors of botnets from the perspective of large email servers that are popular targets
 Identify botnet characteristics and trends
 Track sending behavior and content patterns
Develop a framework (AutoRE) that identifies botnet hosts by generating botnet spam signatures from emails
AutoRE

Motivated by recent success of signature-based worm and virus detection systems
 Botnet spam emails are often sent in an aggregate fashion, resulting in content prevalence similar to worm propagation
Focus primarily on URLs embedded in the email
AutoRE Challenges

Spammers often add random, legitimate URLs to content in order to increase the perceived legitimacy of emails
AutoRE Challenges

Spammers use URL obfuscation techniques to evade detection
AutoRE Design

Input
 Set of unlabeled email messages
Output
 Set of spam URL signatures
  Complete URL string
  URL regular expression
 List of botnet host IP addresses
AutoRE Design

Comprised of three modules
 URL preprocessor
  Extracts URLs and other relevant fields and groups them according to web domain
 Group selector
  Selects URL groups with the highest degree of burstiness in sending times
 RegEx generator
  Extracts signatures by processing one group at a time
URL Pre-Processing

Extracts
 URL string
 Source server IP address
 Email sending time
Partitions into groups based on web domains
 Emails from same spam campaign always advertise the same product or service from the same domain
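
The extraction and grouping steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the email layout (dicts with `body`, `source_ip`, `sent_time`), the simple URL pattern, and the use of the URL hostname as the "web domain" are all assumptions made for the example.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical, deliberately simple URL matcher (assumption, not from the paper).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def group_urls_by_domain(emails):
    """Group (url, source_ip, sent_time) tuples by the hostname of each URL.

    `emails` is an iterable of dicts with 'body', 'source_ip' and 'sent_time'
    keys; this layout is an assumption made for the sketch.
    """
    groups = defaultdict(list)
    for email in emails:
        for url in URL_PATTERN.findall(email["body"]):
            try:
                domain = urlsplit(url).hostname
            except ValueError:
                continue  # skip URLs that cannot be parsed
            if domain is None:
                continue
            # An email with URLs on several domains lands in several groups.
            groups[domain].append((url, email["source_ip"], email["sent_time"]))
    return dict(groups)
```

Each group then carries everything the later steps need: the URL strings, who sent them, and when.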
URL Group Selection

Each email may belong to more than one group
 Use the bursty property of botnet email traffic
 Select the group that exhibits the strongest temporal correlation across a large set of distributed senders
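
One way to picture the selection step is the sketch below. The slides do not say how temporal correlation is measured, so the score used here (the largest number of distinct sender IPs seen inside any sliding time window) and the default window of 3600 time units are assumptions chosen for illustration; the function names are hypothetical.

```python
def peak_distinct_senders(entries, window):
    """Largest number of distinct source IPs inside any span of `window` time units.

    `entries` is a list of (url, source_ip, sent_time) tuples.
    """
    events = sorted((sent_time, ip) for _, ip, sent_time in entries)
    counts = {}   # ip -> number of events currently inside the window
    start = 0     # index of the oldest event still inside the window
    best = 0
    for sent_time, ip in events:
        counts[ip] = counts.get(ip, 0) + 1
        # Drop events that are older than `window` relative to the current one.
        while sent_time - events[start][0] > window:
            old_ip = events[start][1]
            counts[old_ip] -= 1
            if counts[old_ip] == 0:
                del counts[old_ip]
            start += 1
        best = max(best, len(counts))
    return best


def select_burstiest_group(groups, window=3600):
    """Return the domain whose group has the highest peak of distinct senders."""
    return max(groups, key=lambda domain: peak_distinct_senders(groups[domain], window))
```

A group where many different hosts send within a short span scores high, while the same number of senders spread over a long period scores low.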
Signature Generation and Botnet Identification

Two types of signatures
 Complete URL based signature
 Regular expression signatures
Signature criteria
 Distributed
 Bursty
 Specific
Signature Generation and Botnet Identification

Distributed
 Total number of Autonomous Systems (AS) spanned by source IP addresses must be at least 20
Bursty
 The set of matching URLs must be sent within 5 days
Specific
 Complete URLs are specific by definition
 Regular expressions are tested using entropy reduction: the probability of a random string matching the signature is 1/(2^90)
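
The three criteria can be combined into a single check, sketched below. The thresholds (20 ASes, 5 days, 2^90) come from the slide; the function name, its parameters, and the reading of the specificity test as "entropy reduction of at least 90 bits" (a match probability p corresponds to -log2(p) bits) are assumptions for this example. How the entropy reduction itself is computed is not covered here.

```python
from datetime import datetime, timedelta

MIN_AS_COUNT = 20                  # "distributed" threshold from the slide
MAX_SPAN = timedelta(days=5)       # "bursty" threshold from the slide
MIN_ENTROPY_REDUCTION_BITS = 90    # assumed reading of the 1/(2^90) figure


def meets_criteria(as_numbers, send_times, entropy_reduction_bits=None):
    """Check the distributed, bursty and specific criteria for one candidate.

    Pass entropy_reduction_bits=None for a complete-URL signature, which is
    specific by definition; pass a number of bits for a regular expression.
    """
    distributed = len(set(as_numbers)) >= MIN_AS_COUNT
    bursty = max(send_times) - min(send_times) <= MAX_SPAN
    specific = (
        entropy_reduction_bits is None
        or entropy_reduction_bits >= MIN_ENTROPY_REDUCTION_BITS
    )
    return distributed and bursty and specific
```

For example, a candidate seen from 20 distinct ASes over three days passes as a complete URL, but fails as a regular expression if its entropy reduction is only 40 bits.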
Automatic URL Regular Expression Generation
Signature Tree Construction

Constructs a keyword-based signature tree where each node corresponds to a substring, with the root of the tree set to the domain name
 Keywords are the most frequent substrings that are both bursty and distributed
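
As a rough illustration of the keyword idea only, the sketch below finds the most frequent substring of a fixed length across a set of URL strings, counting each URL at most once. It is a simplification: the fixed `length` parameter and the function name are assumptions, the bursty and distributed checks are left out, and no tree is built.

```python
from collections import Counter


def most_frequent_substring(urls, length):
    """Return (substring, url_count) for the substring of the given length
    that occurs in the most URLs; ties go to the lexicographically smallest.
    """
    counts = Counter()
    for url in urls:
        # A set ensures each URL contributes at most once per substring.
        seen = {url[i:i + length] for i in range(len(url) - length + 1)}
        counts.update(seen)
    if not counts:
        return None, 0
    best = min(counts, key=lambda s: (-counts[s], s))
    return best, counts[best]
```

In a fuller version, a keyword found this way would become a tree node, and the URLs containing it would be searched again for the next keyword.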
Regular Expression Generation

Detailing
 Returns a domain-specific regular expression using the keyword-based signature
Generalization
 Returns a more general domain-agnostic regular expression by merging very similar domain-specific expressions
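
To give a feel for what detailing produces, here is a much-simplified stand-in. Instead of the keyword-based signature tree, it treats the longest shared prefix and suffix of the URLs as the fixed parts and summarizes the varying middle as a character class with a length range. The function names and this prefix/suffix shortcut are assumptions for the example, not the paper's procedure, and generalization across domains is not shown.

```python
import os
import re


def detail_segment(samples):
    """Summarize non-empty sample strings as a character class plus length range."""
    chars = set("".join(samples))
    parts = []
    if any("a" <= c <= "z" for c in chars):
        parts.append("a-z")
    if any("A" <= c <= "Z" for c in chars):
        parts.append("A-Z")
    if any("0" <= c <= "9" for c in chars):
        parts.append("0-9")
    # Any other character is listed explicitly, escaped for use in a class.
    others = sorted(c for c in chars if not (c.isascii() and c.isalnum()))
    parts.extend(re.escape(c) for c in others)
    lengths = [len(s) for s in samples]
    lo, hi = min(lengths), max(lengths)
    quantifier = f"{{{lo}}}" if lo == hi else f"{{{lo},{hi}}}"
    return f"[{''.join(parts)}]{quantifier}"


def detail(urls):
    """Build one regular expression: shared prefix, variable middle, shared suffix."""
    prefix = os.path.commonprefix(urls)
    suffix = os.path.commonprefix([u[::-1] for u in urls])[::-1]
    if any(len(prefix) + len(suffix) >= len(u) for u in urls):
        raise ValueError("URLs need a non-empty variable part between prefix and suffix")
    middles = [u[len(prefix):len(u) - len(suffix)] for u in urls]
    return re.escape(prefix) + detail_segment(middles) + re.escape(suffix)
```

Given two URLs on the same domain that differ only in a random-looking path component, `detail` returns an expression that matches both and also unseen URLs with a middle part of similar shape.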
Datasets and Results

Based on randomly sampled Hotmail email messages
 November 2006
 June 2007
 July 2007
Total of 5,382,460 sampled emails
Pre-classified as either spam or non-spam by a human user (not used by the filter, used for validation purposes only)
AutoRE Results

Identified 7,721 botnet spam campaigns
 580,466 spam messages
 340,050 distinct botnet host IP addresses
 5,916 ASes

AutoRE Results

Majority of the campaigns belong to the complete URL (CU) category
100% increase in July 2007 compared to Nov 2006
Spam volume increased 50% in the same time period
Total number of botnet IPs does not increase proportionally, suggesting that each botnet is being used more aggressively
False Positive Rate

Rate = (number of non-spam emails matching a signature) / (total number of non-spam emails)
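
The rate above can be computed as in this small sketch. It assumes signatures are given as regular expression strings (a complete-URL signature would be an escaped literal), that each non-spam email is represented by the list of URLs it contains, and that a URL must match a signature in full; the function name and these representations are assumptions for the example.

```python
import re


def false_positive_rate(signatures, non_spam_emails):
    """Fraction of non-spam emails containing at least one URL that matches a signature.

    `signatures` is a list of regular expression strings; `non_spam_emails`
    is a list in which each item is the list of URLs found in one email.
    """
    if not non_spam_emails:
        raise ValueError("need at least one non-spam email")
    compiled = [re.compile(s) for s in signatures]
    matched = sum(
        1
        for urls in non_spam_emails
        if any(p.fullmatch(u) for p in compiled for u in urls)
    )
    return matched / len(non_spam_emails)
```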
Ability to Detect Future Spam

Experiment
 Apply signatures derived in Nov 2006 and June 2007 to the emails collected in July 2007
Nov 2006 signatures are not useful
 Indicates that spam URL patterns evolve over time
June 2007 signatures are highly effective
RE signatures are more robust than CU signatures over time
Regular Expressions vs Keyword Conjunctions

Identical spam detection rates
 Difference is in false positive rate
Domain-Specific vs Domain-Agnostic Signatures

Generalization effectively preserves the stable structures of polymorphic URLs while removing the volatile domain substrings
Botnet Characteristics

The distribution of IP addresses indicates that the botnet menace is a global phenomenon, with China, Korea, France, and the USA each having a significant number of IP addresses
Botnet Characteristics

When viewed individually, botnet hosts do not exhibit distinct sending patterns
 Content in email is quite different even though the target web pages are the same
50% of botnet spam campaigns have a standard deviation of sending time of less than 1.81 hours, while 90% have a standard deviation of less than 24 hours.
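
For reference, a per-campaign figure of this kind could be computed as below. Whether the population or the sample standard deviation was used is not stated on the slide; the population form (`statistics.pstdev`) and the function name are assumptions here.

```python
from datetime import datetime
from statistics import pstdev


def sending_time_stddev_hours(send_times):
    """Population standard deviation, in hours, of one campaign's sending times."""
    first = min(send_times)
    # Convert each timestamp to hours elapsed since the earliest email.
    offsets = [(t - first).total_seconds() / 3600 for t in send_times]
    return pstdev(offsets)
```

A campaign whose emails all arrive within a few hours gives a small value, which is what the 1.81-hour median reflects.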
Botnet Characteristics

Similar number of recipients per email
Share a constant connection rate
 Most likely due to rate control seen in botnet software
Large number of campaigns share the same domain-agnostic regular expression signatures
 Same botnets participating in multiple spam campaigns
Contributions

AutoRE, a framework that automatically generates URL signatures for spamming botnet detection
Several important findings about botnet spam
 Botnet hosts spread across the internet
 No distinctive pattern when viewed individually
 Botnet host sending patterns
Weaknesses

The AutoRE system analyzes batches of emails after they are all received
 It would be better to do this in real time, stopping email once a campaign has been identified and a signature created
The AutoRE system needs a lot of emails to work effectively
 We can’t use it on individual inboxes; it must be put between the ISP and the incoming email
Weaknesses

I was hoping to take the characteristics found in the paper to use in my own project
 The paper shows that spam from botnets cannot be identified by looking at hosts individually. The AutoRE system works on group behavior.
References

"Spamming Botnets: Signatures and Characteristics". Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov. SIGCOMM, 2008.