slides - Computer Science and Engineering

Spam: Why?
Chris Kanich
Christian Kreibich
Kirill Levchenko
Brandon Enright
Vern Paxson
Geoffrey M. Voelker
Stefan Savage
What is Computer security?
• Most of computer science is about providing functionality:
– User Interface
– Software Design
– Algorithms
– Operating Systems/Networking
– Compilers/PL
– Microarchitecture
– VLSI/CAD
• Computer security is not about functionality
• It is about how the embodiment of functionality
behaves in the presence of an adversary
• Security mindset – think like a bad guy
My Background
• Collaborative Center for Internet Epidemiology and Defenses (CCIED)
– UCSD/ICSI group created in response to worm threat
– Very well funded, many strong partners
• Goals
– Internet epidemiology: measuring/understanding attacks
– Automated defenses: stopping outbreaks/attacks
– Economic and legal issues: that other stuff
Many big successes…
• 50+ papers, lots of tech transfer, big systems, etc.
• Network Telescope
– Passive monitor for > 1% of routable Internet address space
• Potemkin & GQ Honeyfarms
– Active VM honeypot servers on > 250k IP addresses
• Earlybird
– On-line learning of new worm signatures in < 1 ms
But… depressing truth
We didn’t stop Internet worms,
let alone malware,
let alone cybercrime…
nor did anyone else.
At best, moved it around a bit.
By any meaningful metric the bad guys are winning…
Mistake: looking at this solely as a technical problem
Key threat transformations of the 21st century
• Efficient large-scale compromises
– Internet communications model
– Software homogeneity
– User naïveté/fatigue
• Centralized control
– Makes compromised host a commodity good
– Platform economy
• Profit-driven applications
– Commodity resources (IP, bandwidth, storage, CPU)
– Unique resources (PII/credentials, CD-Keys, address book, etc.)
DDoS for sale
• Emergence of economic engine for Internet crime
– SPAM, phishing, spyware, etc.
• Fluid third-party markets for illicit digital goods/services
– Bots ~$0.5/host, special orders, value-added tiers
– Cards, malware, exploits, DDoS, cashout, etc.
Botnet Spammer Rental Rates
>20-30k always online SOCKs4, url is de-duped and updated
> every 10 minutes. 900/weekly, Samples will be sent on
> request. Monthly payments arranged at discount prices.
• 3.6 cents per bot week
>$350.00/weekly - $1,000/monthly (USD)
>Type of service: Exclusive (One slot only)
>Always Online: 5,000 - 6,000
>Updated every: 10 minutes
• 6 cents per bot week
>$220.00/weekly - $800.00/monthly (USD)
>Type of service: Shared (4 slots)
>Always Online: 9,000 - 10,000
>Updated every: 5 minutes
• 2.5 cents per bot week
September 2004 postings to SpecialHam.com, Spamforum.biz
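The per-bot-week figures above come from dividing each weekly price by the number of bots on offer. A minimal sketch, assuming the bot count is the midpoint of each advertised range and that the first posting's "900/weekly" is in USD; the slides do not say exactly which count each figure used.

```python
def cents_per_bot_week(weekly_usd: float, bots_low: int, bots_high: int) -> float:
    """Price of renting one bot for one week, in US cents.

    Assumption: the bot count is the midpoint of the advertised
    "always online" range.
    """
    bots = (bots_low + bots_high) / 2
    return 100 * weekly_usd / bots


# (weekly price in USD, low and high end of the advertised online-bot range)
OFFERS = {
    "20-30k SOCKS4 proxies": (900, 20_000, 30_000),
    "Exclusive, one slot": (350, 5_000, 6_000),
    "Shared, four slots": (220, 9_000, 10_000),
}

for name, (price, low, high) in OFFERS.items():
    print(f"{name}: {cents_per_bot_week(price, low, high):.1f} cents per bot-week")
```

This gives 3.6, about 6.4 and about 2.3 cents, close to the rounded figures quoted on the slide.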
Bot Payloads
Spamalytics
Key structural asymmetries
• Defenders reactive, attackers proactive
– Defenses public, attacker develops/tests in private
– Arms race where best case for defender is to “catch up”
• New defenses expensive, new attacks cheap
– Defenses are sunk costs/business model; attacker agile and not tied to particular technology
• Low risk to attacker, high reward to attacker
– Minimal deterrence
– Functional anonymity on the Internet; very hard to fix
• Defenses hard to measure, attacks easy to measure
– Few security metrics (no “evidence-based” security); attackers measure monetization, which drives attack quality
Revisiting the problem
• We tend to think about this in terms of technical means for securing computer systems
• Most of the $50-100B IT budget on cyber security is spent on securing the end host
– AV, firewalls, IDS, encryption, etc…
– Single most expensive front to secure
– Single hardest front to secure
• But are individual end hosts valuable to bad guys?
– Maybe $1.50? Even less in bulk… not a pain point
• What instead? Economically informed strategies
– Identify and attack economic bottlenecks in the value chain
– This means understanding the return on investment for bad guys
Today: the spam problem
• We tend to focus on the costs of spam
– > 100 billion spam emails sent every day [Ironport]
– > $1B in direct costs: anti-spam products/services [IDC]
– Estimates of indirect costs (e.g., productivity) 10-100x more
• But spam exists only because it is profitable
• Someone is buying! (though no one has admitted it to me…)
• Our goal
– Understand underlying economic support for spam
History of the spam business model
• Direct Mail: origins in 19th century catalog business
– Idea: send unsolicited advertisements to potential customers
– Rough value proposition: Delivery cost < (Conversion rate * Marginal revenue)
• Modern direct mail (> $60B in US)
– Response rate: ~2.5% (mean per DMA)
– CPM (cost per thousand) = $250 - $1000
• Spam is qualitatively the same…
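Plugging the direct-mail figures into the value-proposition inequality shows how much each response must be worth. A small sketch; treating a "response" as a conversion is a simplifying assumption.

```python
def breakeven_revenue_per_response(cpm_usd: float, response_rate: float) -> float:
    """Smallest marginal revenue per response at which a mailing pays off.

    From  delivery cost < response rate * marginal revenue,
    with delivery cost per piece = CPM / 1000.
    """
    cost_per_piece = cpm_usd / 1000
    return cost_per_piece / response_rate


RESPONSE_RATE = 0.025  # ~2.5% mean response rate for direct mail (DMA figure)

for cpm in (250, 1000):
    revenue = breakeven_revenue_per_response(cpm, RESPONSE_RATE)
    print(f"CPM ${cpm}: each response must bring in more than ${revenue:.2f}")
```

At $250-$1000 CPM and a 2.5% response rate, each response must be worth roughly $10 to $40 just to cover delivery.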
… but quantitatively different
• Advantages of e-mail direct marketing
– No printing cost
– Legitimate delivery cost low (outsourced price ~ $0.001/message [Get Response])
– Dominated by production & lead generation cost (i.e., mailing list)
– But this is for spam as a legal marketing vehicle… a minority
• Spam as marketing/bait for criminal enterprises (scams)
– Mailing lists → ε (purchase/steal/harvest): <$10/M retail
– Delivery cost → ε (botnet-based delivery): <$70/M retail
Anatomy of a modern Pharma spam campaign
Courtesy Stuart Brown, modernlifisrubbish.co.uk
Estimating spam profits
• Recall key basic inequality:
(Delivery Cost) < (Conversion Rate) x (Marginal Revenue)
• We have some handle on two of these (e.g., [Franklin07])
– Delivery cost to send spam
» Outsourced cost: retail purchase price < $70/M addrs
» In-house cost: development/management labor
– Marginal revenue
» Average pharma sale of $100, affiliate commissions ≈ 50%
• Conversion rate is fundamentally different
• We don’t know; estimates vary by orders of magnitude
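With the two known quantities fixed, the inequality gives the conversion rate a retail-delivered campaign would need just to break even. A sketch using only the slide figures (treated as rough inputs):

```python
def breakeven_conversion_rate(cost_per_million_usd: float,
                              average_sale_usd: float,
                              commission: float) -> float:
    """Conversion rate at which delivery cost equals expected revenue per message."""
    cost_per_message = cost_per_million_usd / 1_000_000
    marginal_revenue = average_sale_usd * commission
    return cost_per_message / marginal_revenue


# Slide figures: retail delivery < $70 per million addresses,
# $100 average pharma sale, ~50% affiliate commission.
rate = breakeven_conversion_rate(70, 100, 0.5)
print(f"break-even at 1 conversion per {1 / rate:,.0f} messages")
```

Under these assumptions, retail-delivered pharma spam needs roughly one conversion per 714,000 messages. The study's measured rate of about 1 in 12M falls far short of that, which is why the summary concludes the campaign would not be profitable via the retail market.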
The measurement conundrum
• No accident that we lack good conversion measures
• It’s easy to measure spam from a receiver viewpoint
– Which MTA sent it to me?
– What does the content contain?
– Where do the links go? etc…
• But the key economic issue is only known by the sender
– Conversion rate * marginal profit = revenue per msg sent
• What to do?
– Interview spammers? (0.00036) [Carmack03]
– Guess? (“millions of dollars a day”) [Corman08]
– Send lots of spam and see who clicks on links? (gold standard)
Botnet infiltration
• Key idea: distributed C&C is a vulnerability
– Botnet authors like decentralized communications for scalability and resilience, but…
– … to do so, they trust their bots to be good actors
– If you can modify the right bots you can observe and influence actions of the botnet
• Rest of today: preliminary results from a case study
– Infiltrated Storm P2P botnet, instrumented ~500M spams
– Delivery rates (anti-spam impacts on delivery)
– Click-through (visits to spam-advertised sites)
– Conversions (purchases and purchase amounts)
Kanich, Kreibich, Levchenko, Enright, Paxson, Voelker and Savage, Spamalytics: An Empirical Analysis of Spam Marketing Conversion, ACM CCS 2008
How this works in detail
• Botnet Infiltration
– Overview of the Storm peer-to-peer botnet
» How does Storm work?
– Mechanics of botnet spamming
» How can Storm’s C&C be instrumented?
• Economic issues
– Using a botnet for measurement
» How to measure conversion via C&C interposition
– Measuring spam delivery pipeline
» What happens to spam from when a bot sends it…
» …to when a user clicks “purchase” at a scam site?
Storm
• Storm is a well-known peer-to-peer botnet
• Storm has a hierarchical architecture
– Workers perform tasks (send spam, launch DDoS attacks, etc.)
– Proxies organize workers, connect to HTTP proxies
– Master servers controlled directly by botmaster
• Workers and proxies are compromised hosts (bots)
– Use a Distributed Hash Table protocol (Overnet) for rendezvous
– Roughly 20,000 active bots at any time in April [Kanich08]
• Master servers run in “bullet-proof” hosting centers
– Communicate with proxies and workers via command and control (C&C) protocol over TCP
Kanich, Levchenko, Enright, Voelker and Savage, The Heisenbot Uncertainty Problem: Challenges in Separating Bots from Chaff, LEET 2008.
Storm architecture
[Figure: Dr. Evil (the botmaster) controls the master servers, which command the proxy bots, which in turn organize the worker bots]
Storm setup
• New bots decide if they are proxies or workers
– Inbound connectivity? Yes, proxy. No, worker.
• Proxies advertise their status via an encrypted variant of the Overnet DHT P2P protocol
– Master sends “Breath of Life” packet to new proxies to tell them the IP address of the master servers (RSA-signed)
– Allows master servers to be mobile if necessary
• Workers use Overnet to find proxies (tricky: time-based key identifies request)
• Workers send to a proxy; the proxy forwards to one of the master servers in a “safe” data center
• Bottom line: imperfect, but remarkably sophisticated
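The role decision and the time-based rendezvous can be sketched as follows. Everything beyond the slide's proxy/worker rule is hypothetical: the 32 keys per day, the seed string and the use of MD5 to produce 128-bit Overnet-style IDs are illustrative assumptions, not Storm's actual key derivation.

```python
import hashlib
from datetime import date


def choose_role(has_inbound_connectivity: bool) -> str:
    """Slide rule: a bot reachable from outside becomes a proxy, otherwise a worker."""
    return "proxy" if has_inbound_connectivity else "worker"


def rendezvous_keys(day: date, slots: int = 32) -> list[bytes]:
    """Hypothetical time-based rendezvous keys.

    Every bot that knows the current date derives the same small set of
    128-bit keys, so proxies can publish under them and workers can search
    for them without prior coordination. The set changes every day.
    """
    return [
        hashlib.md5(f"{day.isoformat()}/{i}".encode()).digest()
        for i in range(slots)
    ]


today = date(2008, 4, 1)
print(choose_role(True), choose_role(False))
print(rendezvous_keys(today)[0].hex())
```

The design point is that the key schedule depends only on shared knowledge (the date), which also means anyone who reverse-engineers the schedule can find the same proxies.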
Storm spam campaigns
• Workers request “updates” to send spam [Kreibich08]
– Dictionaries: names, domains, URLs, etc.
– Email templates for producing polymorphic spam
» Macros instantiate fields: %^Fdomains^% from domains dict
– Lists of target email addresses (batches of 500-1000 at a time)
• Workers immediately act on these updates
– Create a unique message for each email address
– Send the message to the target
– Report the results (success, failure) back to proxies
• Many campaign types
– Self-propagation malware, pharmaceutical, stocks, phishing, …
Kreibich, Kanich, Levchenko, Enright, Voelker, Paxson and Savage, On the Spam Campaign Trail, LEET 2008.
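The worker-side loop (one unique message per target address) can be sketched with a small macro expander. Only two macro forms are modeled, and their meaning is inferred from the example Storm template rather than from Storm's code; the dictionary contents are illustrative values taken from the slide example.

```python
import random
import re

# Assumed subset of Storm's macro language: %^F<dict>^% inserts a random
# entry from dictionary <dict>, and %^0^% inserts the target address.
# Any other macro is left untouched.
MACRO = re.compile(r"%\^(?:F(\w+)|(0))\^%")


def instantiate(template: str, dictionaries: dict[str, list[str]],
                target: str, rng: random.Random) -> str:
    """Produce one polymorphic message for one target address."""
    def expand(m: re.Match) -> str:
        if m.group(2):  # %^0^%: the recipient
            return target
        return rng.choice(dictionaries[m.group(1)])
    return MACRO.sub(expand, template)


TEMPLATE = (
    "From: <%^Fnames^%@%^Fdomains^%>\n"
    "To: <%^0^%>\n"
    "Subject: Say hello to bluepill!\n"
    "\n"
    "<%^Fpharma_links^%>\n"
)
DICTIONARIES = {
    "names": ["eduardo", "rafael", "katiera", "chris", "johnny"],
    "domains": ["hotmail.com", "experimentalist.org", "slave.org", "superlative.edu"],
    "pharma_links": ["spammerdomain1.com", "spammerdomain2.com", "spammerdomain3.com"],
}

rng = random.Random(2008)
for target in ["vern@icir.org", "ckanich@cs.ucsd.edu"]:
    print(instantiate(TEMPLATE, DICTIONARIES, target, rng))
```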
Storm templates
[Figure: example Storm spam template and its instantiation; callout: macro expansion inserts the target email address]
Storm in action
[Figure: a Storm template, its dictionaries, and messages instantiated from them.
Template (excerpt):
Received: from … ([%^C6%^I^%.%^I^%.%^I^%.%^I^%^%]) by %^A^% with Microsoft SMTPSVC(%^Fsvcver^%); %^D^%
From: <%^Fnames^%@%^Fdomains^%>
To: <%^0^%>
Subject: Say hello to bluepill!
<%^Fpharma_links^%>
Dictionaries: pharma_links (spammerdomain1.com, spammerdomain2.com, spammerdomain3.com); names (eduardo, rafael, katiera, chris, johnny); target addresses (vern@icir.org, ckanich@cs.ucsd.edu, savage@cs.ucsd.edu, kreibich@icir.org)
Example instantiation: From: <johnny@hotmail.com>, To: <kreibich@icir.org>, Subject: Say hello to bluepill!, relayed via [132.233.197.74] with Microsoft SMTPSVC(5.0.2195.6713) on Wed, 6 Feb 2008 16:33:44 -0800]
Interposition on Storm
• We interpose on Storm command and control network
– Reverse-engineered Storm protocols, communication scrambling, rendezvous mechanisms [Kanich08] [Kreibich08]
• Run unmodified Storm proxy bots in VMs
– Key issue: real bot workers connect to our proxies
• Insert rewriting proxies between workers & proxies
– Transparently interpose on messages between Storm proxies and their associated Storm workers
– Generic engine for rewriting traffic based on rules
• Interpose to control site URLs and spam delivery
– Which sites the spam advertises (replace URLs in template links)
– To whom spam gets sent (replace addresses in target list)
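The two rewriting rules in the last bullet can be sketched on an already-decoded update. This assumes the Storm wire protocol has been parsed elsewhere (not shown); SpamUpdate, RewriteRules and rewrite are hypothetical names, and overwriting the first entries of the target list is one illustrative way to "replace addrs in target list" without growing the batch.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class SpamUpdate:
    """Hypothetical decoded form of an 'update' a proxy hands to a worker."""
    template: str
    dictionaries: dict[str, list[str]]
    targets: list[str]


@dataclass(frozen=True)
class RewriteRules:
    link_dictionary: str   # dictionary holding the advertised URLs
    our_urls: list[str]    # URLs of our mirror sites
    control_addresses: list[str] = field(default_factory=list)  # our test accounts


def rewrite(update: SpamUpdate, rules: RewriteRules) -> SpamUpdate:
    """Return a rewritten copy of the update; the original is not modified."""
    dictionaries = dict(update.dictionaries)
    if rules.link_dictionary in dictionaries:
        dictionaries[rules.link_dictionary] = list(rules.our_urls)

    # Swap some existing recipients for control addresses, so the batch
    # never grows and no new real users are targeted.
    targets = list(update.targets)
    n = min(len(rules.control_addresses), len(targets))
    targets[:n] = rules.control_addresses[:n]

    return replace(update, dictionaries=dictionaries, targets=targets)


update = SpamUpdate(
    template="... <%^Fpharma_links^%> ...",
    dictionaries={"pharma_links": ["spammerdomain1.com", "spammerdomain2.com"]},
    targets=["a@example.net", "b@example.net", "c@example.net"],
)
rules = RewriteRules("pharma_links", ["newdomain1.com"], ["probe@example.org"])
print(rewrite(update, rules))
```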
Modifying template links
[Figure: the rewriting proxy replaces the template's link dictionary (spammerdomain1.com, spammerdomain2.com, spammerdomain3.com) with our own domains (newdomain1.com, newdomain2.com, newdomain3.com) before workers instantiate messages such as From: <johnny@hotmail.com>, To: <kreibich@icir.org>, Subject: Say hello to bluepill!]
Measuring click-through
• Create two sites that mirror actual sites in spam
– E-card (self-propagation) and pharmaceutical
– Replace dictionaries with URLs to our sites
• E-card (self-prop) site
– Link to benign executable that POSTs to our server
– Log all POSTs to track downloads and executions
• Pharma site
– Log all accesses up through clicks on “purchase”
– Track the contents of shopping carts
• Strive for verisimilitude to remove bias (spam filtering)
– Site content is similar, URLs have same format as originals, …
Aside: having fun
Measuring Delivery
• Create various test email accounts
– At Web mail providers: Hotmail, Yahoo!, Gmail
– Behind a commercial spam filtering appliance
– As SMTP sinks: accept every message delivered
• Put email addresses in Storm target delivery lists
• Log all emails delivered to these addresses
– Both labeled as spam (“Junk E-mail”) and in inbox
Ethical context
• Consequentialism
• First, do no harm (users no worse off than before)
– We do not send any spam
» Proxies are relays, worker bots send spam
– We do not enable additional spam to be sent
» Workers would have connected to some other proxy
– We do not enable spam to be sent to additional users
» Users are already on target lists, only add control addresses
• Second, reduce harm where possible
– Our pharma sites don’t take credit card info
– Our e-card sites don’t export malicious code
Legal context
• Warning: IANAL (we had lawyers involved, though)
• CAN-SPAM
– Subject to strong definition of “initiator”; we don’t fit it
• ECPA
– Our proxy is directly addressed by worker bots (“party to” communication carve-out)
• CFAA
– We do not contact worker bots, they contact us (“unauthorized access”?)
– We do not cause any information to be extracted or any fundamentally new activity to take place
– Hard to find a good theory of damages (functionally indistinguishable: consequentialism)
But…
• In this kind of work there is little precedent
• No agency to get permission; no way to get indemnity
• Lawyers tend to say “I believe this activity has low risk of…”
• We communicate our activities to a lot of people
– Security researchers in industry, academia
– Affected network operators/registrars
– Law enforcement
– FTC
Aside: Spam is hard
• Lots of operational complexities to a study like this
• Net Ops notices huge Storm infestation
• Address space cleanliness
• Registrar issues
– GoDaddy
– TUCOWS
• Abuse complaints
• Spam site support e-mail
• Anti-virus signatures
• Law enforcement
Spam conversion experiment
• Experimented with Storm March 21 – April 15, 2008
• Instrumented roughly 1.5% of Storm’s total output
                      Worker bots   Emails        Duration
Pharmacy campaign     31,348        347,590,389   19 days
E-card: Postcard      17,639        83,665,479    7 days
E-card: April Fool    3,678         38,651,124    3 days
Effects of Blacklisting
Response rates by country
Spam pipeline
[Figure: spam pipeline: Sent → MTA (blacklisting, CBL feed) → spam filtering software → Inbox → Visits → Conversions]
Campaign      Sent      After blacklisting   Visits             Conversions
Pharmacy      347.5M    82.7M (24%)          10,522 (0.003%)    28 (0.000008%)
Postcard      83.6M     21.1M (25%)          3,827 (0.005%)     316 (0.00037%)
April Fool    40.1M     10.1M (25%)          2,721 (0.005%)     225 (0.00056%)
• The fraction of spam delivered into user inboxes depends on the spam filtering software used
– Combination of site filtering (e.g., blacklists) and content filtering (e.g., SpamAssassin)
– Difficult to generalize, but we can use our test accounts for specific services
• No large aberrations based on email topic
• Pharma: 12M spam emails for one “purchase”
• E-card: 1 in 10 visitors execute the binary
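The percentages in the pipeline table are simple ratios against the number of messages sent. A short sketch that recomputes them from the rounded counts on the slide; the stage name after_blacklist is our own label for the second column.

```python
# Stage counts read off the pipeline figure (sent and post-blacklist
# counts are rounded to 0.1M on the slide).
CAMPAIGNS = {
    "Pharmacy":   {"sent": 347.5e6, "after_blacklist": 82.7e6, "visits": 10_522, "conversions": 28},
    "Postcard":   {"sent": 83.6e6,  "after_blacklist": 21.1e6, "visits": 3_827,  "conversions": 316},
    "April Fool": {"sent": 40.1e6,  "after_blacklist": 10.1e6, "visits": 2_721,  "conversions": 225},
}


def stage_fractions(stages: dict[str, float]) -> dict[str, float]:
    """Each later stage as a fraction of the messages sent."""
    sent = stages["sent"]
    return {name: count / sent for name, count in stages.items() if name != "sent"}


def messages_per_conversion(stages: dict[str, float]) -> float:
    return stages["sent"] / stages["conversions"]


def visit_conversion_rate(stages: dict[str, float]) -> float:
    """Share of site visitors who went on to convert (purchase or run the binary)."""
    return stages["conversions"] / stages["visits"]


for name, stages in CAMPAIGNS.items():
    f = stage_fractions(stages)
    print(f"{name}: {f['after_blacklist']:.0%} past blacklists, "
          f"{f['visits']:.3%} visit, "
          f"1 conversion per {messages_per_conversion(stages):,.0f} sent, "
          f"{visit_conversion_rate(stages):.1%} of visitors convert")
```

For pharma this gives roughly 12.4M messages per purchase; for the e-card campaigns about 8% of visitors ran the executable, in line with the slide's one in ten.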
The spammer’s bottom line
• Recall that we tracked the contents of shopping carts
• Using the prices on the actual site, we can estimate the value of the purchases
– 28 purchases for $2,731 over 25 days, or $100/day ($140/day counting only active days)
• We only interposed on a fraction of the workers
– Connected to approx. 1.5% of workers
– Back-of-the-envelope (be very careful) → $7-10k/day for all, or ~$3M/year
– With a 50% affiliate commission, $1.5M/year revenue
• For self-propagation
– Roughly 3-9k new bots/day
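The back-of-the-envelope extrapolation can be reproduced in a few lines. A sketch assuming revenue scales linearly with the share of workers observed (exactly the kind of assumption the slide warns about), and taking the 19-day pharmacy campaign duration from the experiment table as the active period.

```python
REVENUE_USD = 2_731        # value of the 28 tracked purchases
DAYS_OBSERVED = 25         # March 21 - April 15, 2008
DAYS_ACTIVE = 19           # assumed active period: pharmacy campaign duration
SHARE_OF_WORKERS = 0.015   # fraction of Storm workers we interposed on
COMMISSION = 0.5           # affiliate share kept by the spammer


def extrapolate(revenue: float, days: int, share: float) -> tuple[float, float]:
    """Scale observed revenue to the whole botnet: (USD per day, USD per year).

    Assumes revenue grows linearly with the share of workers observed.
    """
    per_day = revenue / days / share
    return per_day, per_day * 365


low_day, low_year = extrapolate(REVENUE_USD, DAYS_OBSERVED, SHARE_OF_WORKERS)
high_day, high_year = extrapolate(REVENUE_USD, DAYS_ACTIVE, SHARE_OF_WORKERS)
print(f"${low_day:,.0f}-${high_day:,.0f}/day, "
      f"${low_year / 1e6:.1f}M-${high_year / 1e6:.1f}M/year gross")
print(f"spammer's cut at {COMMISSION:.0%}: "
      f"${low_year * COMMISSION / 1e6:.1f}M-${high_year * COMMISSION / 1e6:.1f}M/year")
```

The range brackets the slide's rounded $7-10k/day, ~$3M/year and $1.5M/year figures.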
Summary
• First measurement study of spam marketing conversion
• Infiltrated Storm botnet, interposed on spam campaigns
– Rewriting proxies take advantage of Storm reverse-engineering
• Pharmaceutical spam
– 1 in 12M conversion rate → $1.5M/yr net revenue
– Profitability possibly tied to infrastructure integration
– Sent via retail market, this campaign would not be profitable
– Ergo: in-house delivery (Storm owners = pharma spammers)
• Self-propagation spam
– 250k spam emails per infection
– Social engineering effective: one in ten visitors run the executable
What are we doing now?
• More analysis
– Extending infiltration to ~15 botnets; comparative analysis
– Characteristic fingerprints of different spammers/crews
– Characterizing supply chain relationships
» Broadly order online “viagra”, rolexes, etc.
» Cluster credit processor/merchant, mailing materials, etc.
» Cluster on manufacturing fingerprint (e.g., NIR spectroscopy)
– Measuring monetization by purposely losing credit cards
• Proactive defenses
– Automated filter generation from templates
– Automated classification of URLs
– Automated vision-based detection of phishing pages
Security courses at UCSD
• CSE107 – Introduction to modern cryptography
• CSE127 – Computer Security
• But…
• Security plays a role in virtually all of your courses
Questions?
Collaborative Center
for Internet Epidemiology and Defenses
http://ccied.org
What’s next: Value-chain characterization
• Value-chain characterization
– Empirical map establishing links between criminal groups and enablers
» Affiliate programs, botnets, fast flux networks, registrars, payment processors, SEO/traffic partners, fulfillment/manufacturing
» Data mining across huge data feeds we’ve built or established relationships for
– Social network among criminal groups
» Semantic Web mining
New: Fulfillment measurements
• About to start purchasing a wide range of spam-advertised products
– Watches
– Pharma
– Traffic
• Cluster purchases based on
– Merchant and processor
– Packaging (postmark, forensic analysis of paper)
– Artifacts of manufacturing process (e.g., FT-NIR on drugs)
New: Bot-based spam filter generation
• Observations
– Modest number of bots send most spam [chart source: http://www.marshal.com/trace/spam_statistics.asp]
– Virtually all bots use templates with simple rules to describe polymorphism (e.g., random letters and numbers, phrases from a dictionary)
– Templates + dictionaries ≈ regex describing spam to be generated
– If we can extract or infer these from the botnets, we have a perfect filter for all the spam generated by the botnet
– Very specific filters, extremely low FP risk
• Early results (last week)
– 0 FP with 50 examples
– 0 FN on Storm with 500 examples
– Still tuning for other botnets
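Since a template plus its dictionaries already describes every message a bot can generate, turning them into a filter is mostly escaping. A sketch, assuming only the simple macro forms visible in the Storm template (%^F<dict>^% for dictionary picks, %^0^% for the recipient); real templates use more macro types, and template_to_regex is a hypothetical name, not the authors' tool.

```python
import re

# Assumed macro subset: %^F<dict>^% (dictionary pick) and %^0^% (recipient).
MACRO = re.compile(r"%\^(?:F(\w+)|(0))\^%")
ADDRESS = r"[^\s<>@]+@[^\s<>@]+"   # loose pattern for the target address


def template_to_regex(template: str, dictionaries: dict[str, list[str]]) -> re.Pattern:
    """Compile a template plus its dictionaries into one exact-match regex.

    Literal text is escaped, each dictionary macro becomes an alternation
    of its escaped entries, and the recipient macro becomes ADDRESS.
    """
    parts, pos = [], 0
    for m in MACRO.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))
        if m.group(2):
            parts.append(ADDRESS)
        else:
            # Longest entries first, so a search() prefers whole entries over prefixes.
            entries = sorted(dictionaries[m.group(1)], key=len, reverse=True)
            parts.append("(?:" + "|".join(map(re.escape, entries)) + ")")
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


TEMPLATE = ("From: <%^Fnames^%@%^Fdomains^%>\nTo: <%^0^%>\n"
            "Subject: Say hello to bluepill!\n\n<%^Fpharma_links^%>\n")
DICTIONARIES = {
    "names": ["eduardo", "rafael", "katiera", "chris", "johnny"],
    "domains": ["hotmail.com", "experimentalist.org", "slave.org", "superlative.edu"],
    "pharma_links": ["spammerdomain1.com", "spammerdomain2.com", "spammerdomain3.com"],
}

spam_filter = template_to_regex(TEMPLATE, DICTIONARIES)
message = ("From: <johnny@hotmail.com>\nTo: <kreibich@icir.org>\n"
           "Subject: Say hello to bluepill!\n\n<spammerdomain3.com>\n")
print(bool(spam_filter.fullmatch(message)))
```

Matching with fullmatch keeps the filter as specific as the template itself, which is where the low false-positive risk comes from.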
Spare slides
Removing crawlers/honeyclients
• Anyone can send email to our accounts or visit our Web sites, potentially muddying the waters
– Use various heuristics to validate the logs
• Validate spam in mailboxes was sent by us
– Spam from other campaigns, bounce messages, etc.
– Subject line matches our campaign, URL from our dictionary
• Validate Web accesses were by users in response
– Sites with links in spam are immediately crawled by Google, A/V vendors, etc.
– Special 3rd-level DNS names, special URL encoding
– Ignore hosts that access robots.txt, don’t load JavaScript, don’t load Flash, don’t load images, or make many malformed requests
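The validation heuristics above amount to two filters over the logs. A minimal sketch; the Visit record, its field names and the malformed-request threshold are hypothetical, and the special DNS names and URL encodings are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Hypothetical per-host summary of our Web server logs."""
    host: str
    fetched_robots_txt: bool
    loaded_javascript: bool
    loaded_flash: bool
    loaded_images: bool
    malformed_requests: int


def looks_automated(v: Visit, max_malformed: int = 3) -> bool:
    """Slide heuristics: crawlers and honeyclients tend to fetch robots.txt,
    skip JavaScript/Flash/images, or send many malformed requests.
    max_malformed is an arbitrary illustrative threshold."""
    return (v.fetched_robots_txt
            or not v.loaded_javascript
            or not v.loaded_flash
            or not v.loaded_images
            or v.malformed_requests > max_malformed)


def is_our_spam(subject: str, body: str, our_subjects: set[str], our_urls: set[str]) -> bool:
    """Mailbox check: subject matches our campaign and the body carries one of our URLs."""
    return subject in our_subjects and any(url in body for url in our_urls)


visits = [
    Visit("198.51.100.7", False, True, True, True, 0),   # behaves like a person
    Visit("203.0.113.9", True, False, False, False, 0),  # behaves like a crawler
]
human_hosts = [v.host for v in visits if not looks_automated(v)]
print(human_hosts)
```

Here only the first host survives; the second is dropped because it fetched robots.txt and skipped page resources.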
Pharma and e-card conversions
Who is targeted?
[Figure: top 20 targeted domains]
• Many Web mail & broadband providers, but very long tail
• Campaigns have nearly identical distributions
– Same scammers, or target lists sold to multiple scammers