Understanding the Network-Level Behavior of Spammers
Authors:
Anirudh Ramachandran, Nick Feamster
SIGCOMM ’06, September 11-16, 2006, Pisa, Italy
Presenter:
Tao Li
Questions
What IP ranges send the most spam?
Common spamming modes? How much spam comes from botnets versus other techniques? (open relays, short-lived route announcements)
How persistent is each spamming host across time?
Characteristics of spamming botnets?
Motivation
17-month trace of over 10 million spam messages collected at a “spam sinkhole”
Joint analysis with IP-based blacklist lookups, passive TCP fingerprinting info, routing info, and botnet “C&C” traces
Goal: find network-level properties that can be used to design more robust network-level spam filters.
Outline
Background Information
Data Collection
Data Analysis
Network-level Characteristics of Spammers
Spam from Botnets
Spam from Transient BGP Announcements
Discussion
Spamming Methods
Direct spamming
Buy connectivity from “spam-friendly” ISPs
Open relays and proxies
Allow unauthenticated hosts to relay email
Botnets
Infected hosts act as mail relays
BGP Spectrum Agility
Hijack a route, send spam, then withdraw the route
Mitigation techniques
Content filter
Filtering rules must be updated continually; training needs large corpora
Spammers can easily change content
Blacklist lookup
Spammers use stolen IP addresses to send spam
Many bot IP addresses are short-lived
Spam Email Traces
“Sinkhole” corpus domain, 8/5/2004 to 1/6/2006
No legitimate email addresses
DNS Mail Exchange (MX) record
Runs Mail Avenger, an SMTP server
IP address of the relay
A traceroute to that IP address
A passive “p0f” TCP fingerprint (identifies the OS)
Result of DNS blacklist (DNSBL) lookups
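A DNSBL lookup is an ordinary DNS query: the octets of the client IPv4 address are reversed and prepended to the blacklist's zone name, and an answer means the address is listed. The following is a minimal sketch of that convention, not the sinkhole's actual tooling. The function names and the zone `dnsbl.example.org` are hypothetical placeholders.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers the query (requires network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No answer for the name: the address is not listed in this zone.
        return False


# Hypothetical zone name, for illustration only.
name = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
```

Here `name` is the string that would be sent to the resolver; repeating the lookup against several zones gives the per-blacklist result recorded for each message.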
Spam Email Traces
The number of spam messages and the number of distinct IP addresses are both rising
Data Collection
Legitimate Email Traces
700,000 legitimate emails from a large email provider
Botnet Command and Control Data
A trace of hosts infected by “Bobax”
The authoritative DNS server for the domain running the botnet's C&C was hijacked and redirected to a honeypot
BGP Routing Measurements
Colocate a BGP monitor in the same network as the “sinkhole”
Network-level Characteristics of Spammers
Distribution Across Networks
Distribution across IP address space
Distribution across ASes
Distribution by country
The Effectiveness of Blacklists
Distribution Across Networks
Distribution across IP address space
The majority of spam comes from a relatively small fraction of the IP address space, and the distribution is persistent over time.
Distribution Across Networks
About 85% of client IP addresses sent fewer than 10 emails to the sinkhole.
Important for spam filter design.
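A statistic like the one above can be computed from a log of sender addresses, one entry per received message. This is a minimal sketch under that assumption; the function name, the log format, and the sample addresses are hypothetical and do not come from the paper.

```python
from collections import Counter


def fraction_below(sender_ips, threshold=10):
    """Fraction of distinct sender IPs that sent fewer than `threshold` messages."""
    counts = Counter(sender_ips)  # messages per distinct IP
    if not counts:
        return 0.0
    low_volume = sum(1 for n in counts.values() if n < threshold)
    return low_volume / len(counts)


# Toy log: one heavy sender and three low-volume senders.
log = (["192.0.2.1"] * 12 + ["192.0.2.2"] * 3
       + ["192.0.2.3"] + ["192.0.2.4"] * 9)
share = fraction_below(log, threshold=10)
```

A high value means most senders are seen only a handful of times, so a filter that waits to observe many messages from one address before acting will miss most of them.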
Distribution Across Networks
Distribution across ASes
Over 10% of spam came from 2 ASes; 36% came from 20 ASes
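The per-AS concentration can be sketched as a top-k share, assuming each message has already been mapped to the origin AS of its sender (for example from routing data). The function name and the AS numbers below are hypothetical; the numbers are taken from the range reserved for documentation.

```python
from collections import Counter


def top_k_share(origin_asns, k):
    """Share of all messages that came from the k most active origin ASes."""
    counts = Counter(origin_asns)  # messages per AS
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = counts.most_common(k)  # [(asn, message_count), ...]
    return sum(n for _, n in top) / total


# Toy input: one origin AS number per received message.
messages = [64500] * 5 + [64501] * 3 + [64502] * 2
share_top2 = top_k_share(messages, 2)
```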
Distribution Across Networks
Distribution by country
Although the top 2 ASes from which spam was received were in Asia, 11 of the top 20 were in the USA, comprising 40% of all spam received from the top 20.
Assigning a higher level of suspicion according to an email’s country of origin may be effective in filtering.
The Effectiveness of Blacklists
Nearly 80% of relays are listed in the 8 blacklists queried
The Effectiveness of Blacklists
SpamCop lists only 50% of the spam received
Blacklists have a high false-positive rate
Ineffective against IP addresses that use more sophisticated cloaking techniques
Spam from Botnets
Bobax Topology
Spamming hosts and Bobax drones have a similar distribution across the IP address space, so much of the spam may be due to botnets
Spam from Botnets
Operating Systems of Spamming Hosts
About 4% of spamming hosts were not running Windows, but they sent about 8% of the spam
Spam from Botnets
Spamming Bot Activity Profile
Over 65% of bots are single-shot, 75% of which are active for less than 2 minutes
Spam from Botnets
Spamming Bot Activity Profile
Regardless of persistence, 99% of bots sent fewer than 100 pieces of spam
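An activity profile of this kind can be sketched from (IP, timestamp) pairs, one per received message. Here a host's lifetime is taken as the time between its first and last message. That definition, the function names, and the sample data are assumptions made for illustration and may differ from the paper's exact method.

```python
def activity_profile(events):
    """Map each IP to (lifetime_seconds, message_count).

    `events` is an iterable of (ip, unix_timestamp) pairs.
    """
    first, last, count = {}, {}, {}
    for ip, ts in events:
        first[ip] = min(ts, first.get(ip, ts))
        last[ip] = max(ts, last.get(ip, ts))
        count[ip] = count.get(ip, 0) + 1
    return {ip: (last[ip] - first[ip], count[ip]) for ip in count}


def fraction(profile, predicate):
    """Fraction of hosts whose (lifetime, count) satisfies `predicate`."""
    if not profile:
        return 0.0
    matching = sum(1 for lifetime, n in profile.values() if predicate(lifetime, n))
    return matching / len(profile)


events = [
    ("192.0.2.1", 0), ("192.0.2.1", 60),     # active for one minute
    ("192.0.2.2", 100),                      # a single message
    ("192.0.2.3", 0), ("192.0.2.3", 86400),  # seen again a day later
]
profile = activity_profile(events)
brief = fraction(profile, lambda lifetime, n: lifetime < 120)  # under 2 minutes
low_volume = fraction(profile, lambda lifetime, n: n < 100)    # under 100 messages
```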
Spam from Transient BGP Announcements
BGP Spectrum Agility
A small but persistent group of spammers appears to send spam by:
Advertising (hijacking) large blocks of IP address space (e.g., /8s)
Sending spam from IP address scattered throughout that space
Withdrawing the route for the IP address space shortly after the spam is sent
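The three steps above suggest a simple way to look for this pattern in collected data: find prefixes that were announced and then withdrawn within a short time, and check which spam arrived from inside them while the route was up. The sketch below assumes the BGP feed has already been reduced to (prefix, announce time, withdraw time) intervals. The `RouteInterval` type, the function names, and the one-day threshold are hypothetical choices, not the paper's exact procedure.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class RouteInterval:
    prefix: str       # e.g. "61.0.0.0/8"
    announced: float  # unix time the route appeared
    withdrawn: float  # unix time the route was withdrawn


def short_lived(intervals, max_lifetime=86400.0):
    """Keep intervals that lasted at most `max_lifetime` seconds (assumed threshold)."""
    return [r for r in intervals if r.withdrawn - r.announced <= max_lifetime]


def spam_during(intervals, spam):
    """Return (ip, ts, prefix) for spam sent from a prefix while it was announced.

    `spam` is an iterable of (ip, unix_timestamp) pairs.
    """
    nets = [(ipaddress.ip_network(r.prefix), r) for r in intervals]
    hits = []
    for ip, ts in spam:
        addr = ipaddress.ip_address(ip)
        for net, r in nets:
            if addr in net and r.announced <= ts <= r.withdrawn:
                hits.append((ip, ts, r.prefix))
                break  # count each message once
    return hits


intervals = [
    RouteInterval("61.0.0.0/8", 0, 600),    # up for ten minutes
    RouteInterval("82.0.0.0/8", 0, 10**7),  # long-lived route
]
spam = [("61.1.2.3", 300), ("61.1.2.3", 900), ("82.5.5.5", 300)]
suspicious = spam_during(short_lived(intervals), spam)
```

Only the message sent at time 300 from inside the briefly announced prefix is reported; the later message and the one from the long-lived route are ignored.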
Spam from Transient BGP Announcements
Announcement, withdrawal, and spam from 61.0.0.0/8 and 82.0.0.0/8
Spam from Transient BGP Announcements
Prevalence of BGP Spectrum Agility
About 1% of spam comes from short-lived routes, but sometimes as much as 10%
Contribution
Suggest using network-level properties of spammers as an addition to spam mitigation techniques
Quantify and document spammers using BGP route announcements for the first time
Present the first study examining the interplay between spam, botnets and the Internet routing infrastructure
Many useful findings about the network-level properties of spam
Weakness
Uses only a small sample, so it cannot support general conclusions about Internet-wide characteristics
Only studied spam sent by Bobax drones
The Botnet Command and Control data collection assumes that hosts were not patched and did not use dynamic addressing during the course of the trace.
How to improve
Design a better notion of host identity
Detection techniques based on aggregate behavior
Securing the Internet routing infrastructure
Incorporating some network-level properties of spam into spam filters