CCIED Futures - Systems and Networking

CCIED Looking Forward
Context
• After four years…

50+ papers, multiple awards, significant advances on state of
the art, two new workshops, lots of tech transfer, many students
trained, etc
• But…
We didn’t stop Internet worms,
let alone malware,
let alone cybercrime…
nor did anyone else.
At best, moved it around a bit.
By any meaningful metric things are worse than when we started…
• Mistake: looking at this primarily as a technical problem
Key threat transformations
of the 21st century
• Efficient large-scale compromises



Internet communications model
Software homogeneity
User naïveté/fatigue
• Centralized control

Cheap scalability for criminal applications
(e.g. spam, info theft, DDoS, etc)
• Profit-driven applications


Commodity resources
(IP, bandwidth, storage, CPU)
Unique resources
(PII/credentials, CD-Keys, address book, etc)
Emergence of
Economic Drivers
• In the last five years, emergence of profit-making malware


Anti-spam efforts force spammers to launder e-mail through
compromised machines (starts with MyDoom.A, SoBig)
“Virtuous” economic cycle transforms nature of threat
• Commoditization of compromised hosts

Fluid third-party exchange market (millions of hosts)
» Raw bots (range from pennies to dollars)
» Value added tier: SPAM proxying (more expensive)
• Innovation in both host substrate and its uses


Sophisticated infection and command/control networks: platform
SPAM, piracy, phishing, identity theft, DDoS are all applications
DDoS for sale
• Emergence of economic engine for Internet crime

SPAM, phishing, spyware, etc
• Fluid third party markets for illicit digital goods/services


Bots ~$0.5/host, special orders, value added tiers
Cards, malware, exploits, DDoS, cashout, etc.
Botnet Spammer Rental Rates
> 20-30k always online SOCKs4, url is de-duped and updated
> every 10 minutes. 900/weekly, Samples will be sent on
> request. Monthly payments arranged at discount prices.
• 3.6 cents per bot week
> $350.00/weekly - $1,000/monthly (USD)
> Type of service: Exclusive (One slot only)
> Always Online: 5,000 - 6,000
> Updated every: 10 minutes
• 6 cents per bot week
> $220.00/weekly - $800.00/monthly (USD)
> Type of service: Shared (4 slots)
> Always Online: 9,000 - 10,000
> Updated every: 5 minutes
• 2.5 cents per bot week
September 2004 postings to SpecialHam.com, Spamforum.biz
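The per-bot-week figures follow from dividing each weekly price by the advertised bot count. A small sketch of that arithmetic, assuming the midpoint of each advertised range stands in for the bot count (the postings do not say which figure the rounded estimates used):

```python
def cents_per_bot_week(weekly_price_usd: float, bots_low: int, bots_high: int) -> float:
    """Weekly rental price divided by the midpoint of the advertised bot count, in cents."""
    midpoint = (bots_low + bots_high) / 2
    return 100 * weekly_price_usd / midpoint


# Figures transcribed from the three September 2004 postings above.
offers = {
    "20-30k SOCKS4 proxies": (900.00, 20_000, 30_000),
    "exclusive, 5-6k always online": (350.00, 5_000, 6_000),
    "shared, 9-10k always online": (220.00, 9_000, 10_000),
}

for name, (price, low, high) in offers.items():
    print(f"{name}: {cents_per_bot_week(price, low, high):.1f} cents per bot week")
```

Under the midpoint assumption this prints 3.6, 6.4 and 2.3 cents, in line with the rounded 3.6 / 6 / 2.5 cent figures quoted above.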
Bot Payloads
Structural asymmetries
• Defenders reactive, attackers proactive


Defenses public, attacker develops/tests in private
Arms race where best case for defender is to “catch up”
• New defenses expensive, new attacks cheap

Defenses are sunk costs tied to a business model;
attacker is agile and not tied to a particular technology
• Minimal deterrent effect

Functional anonymity on the Internet; very hard to fix
• Defenses hard to measure, attacks easy to measure

Few security metrics (no “evidence-based” security),
attackers measure monetization which drives attack quality
Example: brief history
of the spam arms race
Anti-spam action → Spammer response
1. Real-time IP blacklisting → Send via open relays/proxies
2. Clean up open relays/proxies → Delivery via compromised botnets
3. Content-based learning → Content chaff, polymorphic spam generators, img spam
4. Site takedown → Fast-flux redirect and transparent proxies
5. CAPTCHAs → CAPTCHA outsourcing, OCR-based breaking
The problem
• We think about this in terms of technical means for
securing computer systems
• Most of the $50-100B IT budget for cyber security is spent
on securing the end host



AV, firewalls, IDS, encryption, etc…
Single most expensive front to secure
Single hardest front to secure
• But individual end hosts are not that valuable to the
bad guys?

Maybe $1.50? Even less in bulk…
• We need to focus on their economic bottlenecks
• Which means we need to understand their economics
Internet Criminal Economics
• Our experience so far


Underground market analysis [CCS 07]
Spam [USEC ‘07, LEET ‘08/’09, CCS ’08]
• Where we’re going




In-depth analysis of market enablers
Large-scale analysis of vertical markets
Technical defenses based on market enablers
Empirical defense assessment (“evidence-based security”)
Elements of the Internet
“underground economy”
• Acquisition of illicit digital goods

Tier-1 goods (e.g. credit card data, PayPal, etc)
» Directly valued in “real world”; single step liquidity

Tier-2 goods (e.g. bots, malware, $ services)
» Valued only in the underground economy (UE), rented for service, or used to produce value in a scam
• Trade/Sale in such goods

On-line markets and market enablers (IRC/Web Forums)
• Scams (capital investment to extract new value)


Combine digital goods with value creation strategy
SPAM, phishing, DDoS extortion, pump/dump, etc
• Liquidation of goods (cash out)


Indirect: SPAM/Adware (potentially legal), Click fraud, pump/dump,
gambling
Direct: cash out (WU, eGold, WebMoney), wire transfer, card “tracking”,
mules/drops
Example
• Scammer runs phishing campaign





Buys phishing kit from software specialist
Buys mailing list
Buys bots for mail relay or rents remailing net
Buys host(s) for phishing server
Gets credit cards plus PII & CVV2 info (“fulls”)
• Trade fulls on-line for money or other digital goods
• Can use to buy physical goods

Drop/remailer: launders physical goods
• Cashier will cash-out fulls for percentage of take

E.g. WU: drop receives cash, confirmer “fakes” true owner
Market data collection
• 13 million public messages
• From Jan. ’06 to Aug. ’06
• Think QVC, not NASDAQ
• Market is public channel active on independent IRC networks (#ccpower)
• Common channel activity and admin. creates unified market
• IRC log dataset (2.4GB)
[Figure: market messages on an IRC network, collected into the dataset]
Market Activity
• 1. Posting advertisements
Sales and want ads for goods and services
• 2. Posting sensitive personal information
Full personal information freely pasted to channel
Establishes credibility
• Unstructured quasi-English
Need automatic techniques to identify ads and sensitive data
“have hacked hosts, mail lists, php mailer send to all inbox”
“i have verified paypal accounts with good balance…and i can cashout paypals”
Name: Phil Phished
Address: 100 Scammed Ln
Phone: 555-687-5309
Card Num: 4123 4567 8901 2345
Exp: 10/09
CVV: 123
SSN: 123-45-6789
[Figure: market channel: Buy, Sell, & Trade]
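The slides call for automatic techniques to pick ads and sensitive data out of unstructured chat, but do not say which were used. The sketch below is a deliberately crude stand-in, not the study's method: a few hypothetical regular expressions modelled on the pasted example above, reporting which kinds of sensitive fields a message contains. Real channel traffic would defeat patterns this simple.

```python
import re

# Hypothetical, illustrative patterns modelled on the pasted "fullz" example above.
PATTERNS = {
    # 13-16 digits, optionally separated by single spaces or hyphens
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "expiry": re.compile(r"\bExp\w*\s*:?\s*\d{2}/\d{2,4}\b", re.IGNORECASE),
    "cvv": re.compile(r"\bCVV2?\s*:?\s*\d{3,4}\b", re.IGNORECASE),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def sensitive_fields(message: str) -> set:
    """Return the names of all patterns that occur somewhere in one channel message."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(message)}


print(sorted(sensitive_fields(
    "Card Num: 4123 4567 8901 2345 Exp: 10/09 CVV:123 SSN: 123-45-6789")))
print(sensitive_fields("i sell CVV2s at $0.90, hacked hosts at $8"))
```

The first message is flagged for all four fields; the sales ad mentions "CVV2s" but carries no actual values, so nothing is flagged.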
What’s on the market?
Financial instruments
i sell CVV2s at $0.90, hacked hosts
at $8, paypals at 8, fullz at $10,
and wells fargo logins. IM me at
XXXX
DO NOT ASK FOR TESTS OR FREE CARDS.
Thank you :)
What’s on the market?
Financial services
i am boa cashout and wellsfargo
including chase
westernunion confirmer
can confirm males and females have
drops in usa
I AM VERIFIED MSG ME
looking good and legit drop from
USA for stuff (laptop, mobile
phones, TV plasma etc)
Goods
Percentage of labeled data, by ad type (goods):
• Mailer sale: 3%
• Hacked host sale: 3%
• Scam page sale: 1.5%
• Email list sale: 2%
courtesy Jason Franklin
Some high bits
• Value of “goodwill data”



87k unique credit cards (w/valid Luhn and BIN #)
» Estimated exposure of $427.50 per card ≈ $37M total
Declared value of bank accounts = $54M
But these are only the public numbers, not trades
• Reputation



Few miscreants will deal with unknown buyers/sellers
New entrants establish reputation by providing free samples
or services
» Post raw credit card, bank account, etc
Poor behavior is systematically reported
» #rippers channel
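The "valid Luhn" filter and the $37M figure can both be reproduced in a few lines. A minimal sketch of the standard Luhn (mod 10) checksum plus the slide's exposure arithmetic; the BIN (issuer prefix) check is omitted because it needs an issuer table the slides don't include.

```python
def luhn_valid(card_number: str) -> bool:
    """Standard Luhn (mod 10) checksum; ignores spaces and other non-digits."""
    digits = [int(c) for c in card_number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit and folding two-digit results.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Exposure estimate from the slide: unique valid cards times per-card exposure.
UNIQUE_CARDS = 87_000
EXPOSURE_PER_CARD_USD = 427.50
print(f"${UNIQUE_CARDS * EXPOSURE_PER_CARD_USD / 1e6:.1f}M")  # $37.2M
```

The well-known test number 4111 1111 1111 1111 passes the check, while the illustrative 4123 4567 8901 2345 pasted in the channel example does not, which is exactly the kind of noise such a filter removes.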
Leads to many questions…
• Vertical integration vs open markets?

How much is each? How much transparency?
• Who dominates market volume?


A small number of bigger players?
A large number of small players?
• What dominates value creation in each segment?
• Can we use market data to directly value threat risk?
• Where are the bottlenecks?


Cashout? Market friction (reputation issues)
Which bottlenecks are amenable to technical means vs
economic means/state power?
• All unknown… and fairly critical
Vertical market segment:
Spam-based marketing
• 100B+ spam e-mails sent per day [Ironport]



Most focused on product/service advertising
Some as vector for malware, etc.
>$1B in direct costs [IDC], larger indirect costs 10-100x
• Range of enablers

Botnet-based mail delivery, spamming software, address list,
redirection infrastructure, hosting infrastructure, payment
processing, fulfillment
• Direct marketing business model


Cost of delivery < marginal revenue * conversion rate
Only works because someone is buying?
• Very little empirical data on any of this…
Anatomy of a modern pharma
spam campaign
Courtesy Stuart Brown
modernlifisrubbish.co.uk
Spamscatter
• Goal: Measure and analyze Internet scam hosting
infrastructure
• Mine spam for URLs pointing to the scam sites hosting the advertised content



Probe machines hosting the scams over time
Follow all redirections (separate redirection infrastructure from
hosting infrastructure)
Render pages and cluster sites based on image similarity
(image shingling)
Anderson, Fleizach, Savage and Voelker, Spamscatter: Characterizing
Internet Scam Hosting Infrastructure, USENIX Security 2007.
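Image shingling, as used here to cluster rendered scam pages, can be sketched as: cut each screenshot into fixed-size tiles, hash every tile, and compare two pages by the overlap of their tile-hash sets. The version below is a simplified illustration under assumptions of our own (grayscale images as nested lists, 4x4 tiles, Jaccard similarity, a hypothetical 0.7 clustering threshold); the parameters used in the Spamscatter paper are not given on the slide.

```python
import hashlib

# Simplified screenshot: list of rows, each a list of 0-255 grayscale values.
Image = list[list[int]]

SAME_SCAM_THRESHOLD = 0.7  # hypothetical clustering cutoff


def shingles(img: Image, tile: int = 4) -> set[str]:
    """Hash every non-overlapping tile x tile block of the image."""
    out = set()
    for r in range(0, len(img) - tile + 1, tile):
        for c in range(0, len(img[0]) - tile + 1, tile):
            block = bytes(img[r + i][c + j] for i in range(tile) for j in range(tile))
            out.add(hashlib.sha256(block).hexdigest())
    return out


def similarity(a: Image, b: Image, tile: int = 4) -> float:
    """Jaccard similarity of the two shingle sets (1.0 for two empty sets)."""
    sa, sb = shingles(a, tile), shingles(b, tile)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def same_cluster(a: Image, b: Image) -> bool:
    return similarity(a, b) >= SAME_SCAM_THRESHOLD
```

Because shingles are kept in a set, repeated tiles such as blank background count only once; a production version would likely weight or drop such uninformative tiles.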
Spam Campaign Lifetime
How long do spam campaigns last for a scam?
• Spam campaigns relatively short
88% last < 20 hours
8% > 2 days
• On average...
12 hours of spam
Scam site up 1 week
[Figure: distribution of spam campaign lifetimes, marked at < 20 hours and < 2 days]
Scam Lifetime & Stability
How long are scams active, and how reliable are the
hosts?
• Scam sites long-lived

50+% lifetime as long as
probe time (1 week)
• Multiple hosts extend
scam lifetime
• Web servers and hosts
have same lifetime

Hosts likely blocked
• Overall availability high

97% of downloads
successful
Shared Infrastructure
To what extent do multiple scams share
infrastructure?
• Substantial
sharing


38% of scams
share IP with
another scam
10 IPs hosted 10
or more scams
• Reasons?


Same scammer,
multiple scams
Or, sites rented
to multiple
scammers...
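Both sharing statistics on this slide (fraction of scams sharing an IP, IPs hosting many scams) come from inverting the scam-to-IP mapping. A small sketch under assumed inputs: a dict from hypothetical scam IDs to the set of IPs observed hosting each one; the addresses below are documentation ranges, not real data.

```python
from collections import defaultdict


def sharing_stats(scam_ips, busy_threshold=10):
    """scam_ips maps a scam ID to the set of IPs observed hosting it.

    Returns (fraction of scams sharing at least one IP with another scam,
             sorted list of IPs hosting at least busy_threshold scams).
    """
    if not scam_ips:
        return 0.0, []
    ip_scams = defaultdict(set)  # invert: IP -> scams seen on it
    for scam, ips in scam_ips.items():
        for ip in ips:
            ip_scams[ip].add(scam)
    shared = {s for scams in ip_scams.values() if len(scams) > 1 for s in scams}
    busy = sorted(ip for ip, scams in ip_scams.items() if len(scams) >= busy_threshold)
    return len(shared) / len(scam_ips), busy


observed = {
    "pharma-A": {"192.0.2.1"},
    "watches-B": {"192.0.2.1", "192.0.2.2"},
    "pharma-C": {"198.51.100.7"},
    "casino-D": {"203.0.113.9"},
}
print(sharing_stats(observed, busy_threshold=2))  # (0.5, ['192.0.2.1'])
```

On the slide's data the same computation yields 38% of scams sharing an IP and 10 IPs at a threshold of 10 scams.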
Looking inside
spam campaigns
• Virtually all analysis of spam is from standpoint of
recipient

How many received, from whom, content of msg, etc?
• We really care much more about standpoint of spammer


How many sent, how many delivered, to whom, for how long,
sent how, what kind of countermeasures, how many site visits in
response, how many conversions, how much cost, how much
revenue?
But generally not visible, except to spammer
• Approach: botnet infiltration


Spam sent via botnets, botnets have trust problem wrt
compromised hosts
Instrumented botnet host offers window into spam operations
Storm
• Storm is a well-known peer-to-peer botnet
• Storm has a hierarchical architecture



Workers perform tasks (send spam, launch DDoS attacks, etc.)
Proxies organize workers, connect to HTTP proxies
Master servers controlled directly by botmaster
• Workers and proxies are compromised hosts (bots)


Use a Distributed Hash Table protocol (Overnet) for rendezvous
Roughly 20,000 active bots at any time in April [Kanich08]
• Master servers run in “bullet-proof” hosting centers

Communicate with proxies and workers via command and
control (C&C) protocol over TCP
Kanich, Levchenko, Enright, Voelker and Savage, The Heisenbot Uncertainty Problem: Challenges in Separating Bots from Chaff, LEET 2008.
Spamalytics
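Overnet, the DHT Storm uses for rendezvous, is built on Kademlia, which measures closeness between node IDs and keys by XOR distance; a lookup converges on the peers whose IDs are closest to the key. A toy sketch of that metric, with hypothetical 16-bit peer IDs for readability (Overnet IDs are 128 bits):

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two IDs."""
    return a ^ b


def closest_peers(key: int, peers: list[int], k: int = 3) -> list[int]:
    """The k peers whose IDs are XOR-closest to the key."""
    return sorted(peers, key=lambda p: xor_distance(p, key))[:k]


peers = [0x1A2B, 0x1A20, 0xF00D, 0x0001, 0x1B00]
print([hex(p) for p in closest_peers(0x1A2F, peers)])  # ['0x1a2b', '0x1a20', '0x1b00']
```

Because bots that agree on a key all converge on the same region of the ID space, they can rendezvous without any fixed server address, which is also what makes the overlay crawlable by outsiders.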
Storm architecture
[Figure: Dr. Evil (botmaster) → master servers → proxy bots → worker bots]
Storm spam campaigns
• Workers request “updates” to send spam [Kreibich08]
Dictionaries: names, domains, URLs, etc.
Email templates for producing polymorphic spam
» Macros instantiate fields: %^Fdomains^% from domains dict
Lists of target email addresses (batches of 500-1000 at a time)
• Workers immediately act on these updates
Create a unique message for each email address
Send the message to the target
Report the results (success, failure) back to proxies
Send harvested e-mail addresses
• Many campaign types
Self-propagation malware, pharmaceutical, stocks, phishing, …
Kreibich, Kanich, Levchenko, Enright, Voelker, Paxson and Savage, On the Spam Campaign Trail, LEET 2008.
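The update-then-instantiate loop can be sketched as a macro expander: each %^F<name>^% field is replaced by a random entry from the named dictionary, and the target-address macro by the recipient. This is a minimal illustration, not Storm's code: it handles only those two macros, reads %^0^% as the target address (our interpretation of the "To: <%^0^%>" line in the template example), and uses placeholder dictionary contents.

```python
import random
import re

# %^F<name>^% -> random entry from dictionary <name>; %^0^% -> target address (assumed).
MACRO = re.compile(r"%\^(?:F(\w+)|(0))\^%")


def instantiate(template, dictionaries, target, rng):
    """Expand one template into one unique message for one target address."""
    def expand(match):
        if match.group(2) is not None:
            return target
        return rng.choice(dictionaries[match.group(1)])
    return MACRO.sub(expand, template)


template = ("From: <%^Fnames^%@%^Fdomains^%>\nTo: <%^0^%>\n"
            "Subject: Say hello to bluepill!\n\n<%^Fpharma_links^%>")
dictionaries = {
    "names": ["eduardo", "rafael", "katiera"],
    "domains": ["example.org", "example.net"],
    "pharma_links": ["http://spammerdomain1.com/", "http://spammerdomain2.com/"],
}

rng = random.Random(0)
for target in ["alice@example.com", "bob@example.com"]:
    print(instantiate(template, dictionaries, target, rng), end="\n---\n")
```

Each recipient gets a message with a different sender and link, which is precisely the polymorphism that defeats naive duplicate-based filtering.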
Storm templates
Macro expansion to insert
target email address
Example Storm spam template and instantiation
Storm in action
[Figure: a Storm spam template (randomized Received: headers, From: <%^Fnames^%@%^Fdomains^%>, To: <%^0^%>, Subject: Say hello to bluepill!, link <%^Fpharma_links^%>) shown next to several instantiations, each with its own forged Received: header, sender name, spammer domain and recipient]
Data Collection: C&C Crawler
Data Collection: Proxy Operation
Data Collection: Summary
• Crawler-based dataset



Nov 20 2007 – Nov 11 2008
492,491 C&C requests (to 2,794 proxies)
536,607 templates (23% unique)
• Proxy dataset




March 9 2008 – April 02 2008
94,335 workers
813,655 templates (52% unique)
1,212,971 harvested addresses (49% unique)
• Harvest injection dataset





April 26 2008 – May 6 2008
1,820,360 harvested addresses (50% unique)
87,846 marker addresses injected
1,957 markers targeted (2.2%)
1,017 spams delivered to markers
Kreibich, Kanich, Levchenko, Enright, Voelker, Paxson and Savage, Spamcraft: An Inside Look At Spam Campaign Orchestration, LEET 2009.
Who gets spammed?
Campaigns: The Big Picture
• Others don't last, but have many types (types ~ instances)
• Stock scams took a break
• Long campaigns use few types
Domain Use & Usability
• No more .cn, shorter time to use, longer use
• Registrations in batches used at the same time
• Domains are abandoned after being blocked
• JwSpamSpy: 557 pharma 2LDs, 94% on blacklist
• Average use 5.6 days
Shortest use is single dictionary
Longest is 86 days
• 12.9 domains per hour
• Registration -> use: 21 days
• Use -> block: 18 minutes
Address Sourcing
• 10,000 addresses sampled from harvests and target lists
• Web-searches on Google
• Only available on infected machines:


76% of harvested addresses
87% of targeted addresses
• Web crawling for addresses unlikely
Affiliate linkage
• Evidence of pharma affiliate scheme


Web server error message leaked into dictionaries
21 days Nov 20 2007 – Feb 11 2008
<div style="padding-left:165px;padding-top:40px;"><img src="img/logo.gif" border="0" alt="Spamit.com"></div>
<div style="padding-bottom:3px;padding-top:26px; font-size: 14px;"><br /><strong>The system is temporary busy, try to access it later. No data can be lost.</strong></div>
<div>Copyright © SpamIt.com 2007, All rights reserved.</div>
Estimating spam profits
• Key basic inequality:
(Delivery Cost) < (Conversion Rate) x (Marginal Revenue)
• We have some handle on two of these


Delivery cost to send spam
» Outsourced cost: retail purchase price < $70/M addrs
» In-house cost: development/management labor
Marginal revenue
» Average pharma sale of $100, affiliate commissions ≈ 50%
• Conversion rate is hard to measure directly
• We provided first empirical measurement of conversion
• By rewriting requests sent through proxies under our control
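Plugging the two quantities we do have into the inequality gives the conversion rate a campaign needs to break even. A minimal sketch, assuming the spammer is the affiliate and keeps the ~50% commission on a $100 sale, and taking $70 per million addresses as the delivery cost. Since the slide gives $70/M as an upper bound, the result is an upper bound on the break-even rate too.

```python
def break_even_conversion_rate(cost_per_million_usd, avg_sale_usd, commission):
    """Conversion rate at which delivery cost = conversion rate x marginal revenue."""
    cost_per_email = cost_per_million_usd / 1e6
    revenue_per_conversion = avg_sale_usd * commission  # spammer's marginal revenue
    return cost_per_email / revenue_per_conversion


rate = break_even_conversion_rate(70.0, 100.0, 0.5)
print(f"break-even conversion rate: {rate:.1e}")  # 1.4e-06
print(f"one sale per {1 / rate:,.0f} emails")     # one sale per 714,286 emails
```

Any measured conversion rate below roughly 1.4e-6 means the campaign cannot pay retail prices for delivery, which is why the measurement that follows matters.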
Modifying template links
[Figure: Storm template and instantiated message in which the dictionary links spammerdomain.com, spammerdomain2.com and spammerdomain3.com are replaced by newdomain1.com, newdomain2.com and newdomain3.com]
Measuring click-through
• Create two sites that mirror actual sites in spam


E-card (self-propagation) and pharmaceutical
Replace dictionaries with URLs to our sites
• E-card (self-prop) site


Link to benign executable that POSTs to our server
Log all POSTs to track downloads and executions
• Pharma site


Log all accesses up through clicks on “purchase”
Track the contents of shopping carts
• Strive for verisimilitude to remove bias (spam filtering)

Site content is similar, URLs have same format as originals, …
Measuring Delivery
• Create various test email accounts



At Web mail providers: Hotmail, Yahoo!, Gmail
Behind a commercial spam filtering appliance
As SMTP sinks: accept every message delivered
• Put email addresses in Storm target delivery lists
• Log all emails delivered to these addresses

Both labeled as spam (“Junk E-mail”) and in inbox
Ethical context
• Consequentialism
• First, do no harm (users no worse off than before)
We do not send any spam
» Proxies are relays, worker bots send spam

We do not enable additional spam to be sent
» Workers would have connected to some other proxy

We do not enable spam to be sent to additional users
» Users are already on target lists, only add control addresses
• Second, reduce harm where possible
Our pharma sites don’t take credit card info
Our e-card sites don’t export malicious code
Legal context
• Warning: IANAL
• CAN*SPAM
• Subject to strong definition of “initiator”; we don’t fit it
• ECPA
• Our proxy is directly addressed by worker bots
(“party to” communication carve out)
• CFAA
• We do not contact worker bots, they contact us
(“unauthorized access”?)
• We do not cause any information to be extracted or any
fundamentally new activity to take place
• Hard to find a good theory of damages
(functionally indistinguishable -- consequentialism)
But…
• In this kind of work there is little precedent
• No agency to get permission; no way to get indemnity
• Lawyers tend to say “I believe this activity has low risk of…”
• We worked with two different lawyers to make sure
• Thus, we communicate our activities to a lot of
people
• Security researchers in industry, academia
• Affected network operators/registrars
• Law enforcement
• FTC
Effects of Blacklisting
Response rates by country
Spam pipeline (CBL Feed)
Stages: Sent → MTA (after blacklisting) → spam filtering software → Inbox → Visits → Conversions
• Pharma: 347.5M sent → 82.7M (24%) after blacklisting → 10,522 (0.003%) visits → 28 (0.000008%) conversions
• E-card: 83.6M sent → 21.1M (25%) after blacklisting → 3,827 (0.005%) visits → 316 (0.00037%) conversions
• Third campaign: 40.1M sent → 10.1M (25%) after blacklisting → 2,721 (0.005%) visits → 225 (0.00056%) conversions
• The fraction of spam delivered into user inboxes depends on the spam filtering software used
Combination of site filtering (e.g., blacklists) and content filtering (e.g., spamassassin)
Difficult to generalize, but we can use our test accounts for specific services
[Figure: fraction of spam that was delivered to inboxes; annotations “Other filtering”, “Effective”, “Two orders of magnitude”]
• Pharma: 12M spam emails for one “purchase”
• E-card: 1 in 10 visitors execute the binary
• No large aberrations based on email topic
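The headline ratios can be recomputed directly from the spam pipeline figures above. A small sketch of that arithmetic (the constant names are ours; the numbers are the slide's):

```python
PHARMA_SENT, PHARMA_PURCHASES = 347.5e6, 28
ECARD_VISITS, ECARD_EXECUTIONS = 3_827, 316

emails_per_purchase = PHARMA_SENT / PHARMA_PURCHASES
exec_per_visit = ECARD_EXECUTIONS / ECARD_VISITS

print(f"pharma: one purchase per {emails_per_purchase / 1e6:.1f}M emails")  # 12.4M
print(f"pharma conversion rate: {PHARMA_PURCHASES / PHARMA_SENT:.1e}")     # 8.1e-08
print(f"e-card: {exec_per_visit:.1%} of visitors ran the binary")          # 8.3%
```

The e-card figure of about 8% is what the slide rounds to "1 in 10 visitors".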
The spammer’s bottom line
• Recall that we tracked the contents of shopping carts
• Using the prices on the actual site, we can estimate the
value of the purchases

28 purchases for $2,731 over 25 days, or $100/day ($140 active)
• We only interposed on a fraction of the workers




Connected to approx 1.5% of workers
Back-of-the-envelope (be very careful):
$7-10k/day for all, or ~$3M/year
With a 50% affiliate commission, $1.5M/year revenue
Not enough to be profitable unless spammer = botnet owner
• For self-propagation

Roughly 3-9k new bots/day
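The back-of-the-envelope extrapolation is a single division and multiplication. A sketch that makes the key assumption explicit: revenue scales linearly with the fraction of workers we observed, which is exactly the step the slide says to treat with care.

```python
def extrapolate_yearly(observed_daily_usd, observed_fraction, days=365):
    """Scale revenue seen through our proxies to the whole botnet, assuming linearity."""
    return observed_daily_usd / observed_fraction * days


COMMISSION = 0.5      # slide: 50% affiliate commission
FRACTION = 0.015      # slide: connected to ~1.5% of workers

for daily in (100.0, 140.0):  # slide: ~$100/day overall, ~$140/day on active days
    yearly = extrapolate_yearly(daily, FRACTION)
    print(f"${daily / FRACTION:,.0f}/day, ${yearly / 1e6:.1f}M/year in sales, "
          f"${yearly * COMMISSION / 1e6:.1f}M/year after commission")
```

This gives roughly $6.7k-9.3k/day and $2.4M-3.4M/year in sales ($1.2M-1.7M after commission), bracketing the rounded $7-10k/day, ~$3M and $1.5M figures above.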
We’re on the cusp…
• This is a wide open area with huge impact potential
• We have tremendous momentum and experience here
• Over several years we’ve brokered the commercial
partnerships necessary to do this work (plus fed advice)
Active Data Providers
Active Research Partnerships
• Key agreements in UC: active purchasing experiments
Going forward…
• Epidemiology






Characterizing value chain for different scams
» Spammers, botnets, fast flux, affiliates, processing, fulfillment
Mining social network of underground providers
Analyzing market enablers (cost structure and characteristics)
» E.g., mules, domain registration, traffic selling, de-CAPTCHA
Mapping monetization via financial credential honeytokens
Characterization of phishing defense effectiveness
Nation-state vs e-crime infrastructure
• Defenses



Botnet-driven spam filtering
Proactive URL blocking via on-line learning
Proactive phishing defense via machine vision
Click Trajectory project
• 10,000 foot idea:


We’ve gone deep into one spam campaign
Like to understand the relationship between all the elements
of the value chain involved across the spam industry
• Value-chain characterization


Front end (visible via network)
» Spamming groups
» Botnets (& hosters)
» Fast flux networks (& hosters/registrars)
» Affiliate programs (& hosters)
Back end
» Payment processing
» Fulfillment
Unraveling
front end value chain
• Expanding honeyfarm to host all major botnets (safely)


Log C&C and spam traffic; additional reversing too
All URLs tagged and stored in database
• 1st- and 3rd-party spam feeds and bad URL feeds (many)

URLs into same database (with source tag)
• Crawl all pages, referrers and metadata (DNS, whois)
• Database allows direct association of





Distinct scams (Web page matching and text matching)
Distinct botnets (via source tag)
Distinct fast flux networks (mapped during crawl)
Distinct affiliate programs (via both cookies and templates –
also partner infiltrating affiliate programs to validate)
Have IP, DNS and registrar data for everything…
Unraveling back end
of value chain
• Purchasing wide range of spam-advertized products
(note: actual purchasing not using any NSF money)


Watches
Herbal, Pharma (via partner)
• Cluster purchases based on



Merchant and processor
Packaging (postmark, forensic analysis of paper)
Artifacts of manufacturing process (e.g., FT-NIR on drugs,
analysis of movement similarity for watches)
Crawling underground
social networks

Underground criminals have implicit social network



Who offers which services, who partners with whom, etc...
Use multiple pseudo-identities, but significant structure still can
be reconstructed manually
Goal: build social network via crawling/datamining


Identifiers (ICQ, phone, etc)
Web page content, linkage on forum sites (who referenced
whom, etc)
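One way to start reconstructing that network automatically is to merge pseudonyms that share a hard identifier (same ICQ number, same phone). A sketch using union-find over hypothetical forum handles; it covers only exact identifier matches, not the page-content and forum-linkage signals listed above.

```python
def link_identities(profiles):
    """profiles maps a pseudonym to its set of identifiers (e.g. 'icq:...', 'phone:...').

    Returns groups of pseudonyms connected, transitively, by a shared identifier.
    """
    parent = {name: name for name in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_owner = {}  # identifier -> first pseudonym seen using it
    for name, identifiers in profiles.items():
        for ident in identifiers:
            if ident in first_owner:
                parent[find(name)] = find(first_owner[ident])  # union
            else:
                first_owner[ident] = name

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


profiles = {
    "ccking": {"icq:111", "phone:555-0100"},
    "dumps4u": {"icq:111"},
    "cashier_x": {"phone:555-0100"},
    "loner": {"icq:999"},
}
print(sorted(sorted(g) for g in link_identities(profiles)))
```

Here "dumps4u" and "cashier_x" never share an identifier with each other, but both are linked to "ccking", so all three land in one group while "loner" stays alone.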
CAPTCHA solving analysis

Webmail based spam




Web bots hard to filter; launder reputation of Web mail provider
But bots must solve CAPTCHA to create account; key enabler
De-CAPTCHA services ($2/1k solved, 33% margin)
Study: purchase solving from range of such services
Key questions



Human vs vision-based solving (via error variation)
For Humans,
» Native language (language primes)
» Size of operation (via queuing)
For computers
» Accuracy variation, differential pricing
» Capacity
Mule recruitment

“Mules” are used to launder money or goods (remailers)




Recruited via spam
Building classifier that identifies mule spam
automatically; cluster based on e-mail content and site
Engage in automated conversation with e-mail sender
Goal:

Infer size of mule operation, turn-over, level of sophistication,
changes in demand, etc.
Traffic selling

On-line underground market for click traffic (parallel to
Google/Yahoo)



For direction to particular scams (e.g. pharma, counterfeits, etc)
For use in click fraud/PTC scams
Active purchasing of traffic streams


Characterize traffic streams themselves
» Real people, country of origin, time on site, click through, etc
» Survey of subset of people (why are you here)
Differential pricing for different click streams
Financial honeytokens


Range of scams that steal financial credentials
Question: do they share monetization infrastructure?


Money mules, wire cashout, layering via purchase, carding,
trading, etc
Methodology:



Purposely “lose” financial credentials
» Infostealing malware, phishing site, on open market
See how accounts are monetized
» Fingerprinting test transactions
» Merchant for large transfers
Exploring solo version and via
Partnership with financial services
company
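The core bookkeeping for honeytokens is that every leak channel gets a credential used nowhere else, so any later use of it identifies the channel. A minimal sketch, assuming a hypothetical "HT-" token format in place of real decoy account numbers (which would require a financial partner) and channel names taken from the slide:

```python
import secrets
from typing import Optional


class HoneytokenLedger:
    """Issue a distinct decoy credential per leak channel; attribute later use back to it."""

    def __init__(self) -> None:
        self._channel_by_token = {}

    def issue(self, channel: str) -> str:
        token = "HT-" + secrets.token_hex(8)  # hypothetical decoy-credential format
        while token in self._channel_by_token:  # collisions are very unlikely; stay safe anyway
            token = "HT-" + secrets.token_hex(8)
        self._channel_by_token[token] = channel
        return token

    def attribute(self, observed_token: str) -> Optional[str]:
        """Leak channel for a credential seen in use, or None if we never issued it."""
        return self._channel_by_token.get(observed_token)


ledger = HoneytokenLedger()
leaked = {ch: ledger.issue(ch) for ch in ("infostealer", "phishing_site", "open_market")}
```

When a test transaction on one of these credentials is fingerprinted, `ledger.attribute(...)` names the channel it was "lost" through, so monetization paths can be compared across channels.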
Scam domain registration

Web-based crime is built on cheap and easy domain
registration, but little understood
We now have full feeds for .com, .net and .org (and others)

Look at pattern of use for scam domains (as we did with Storm)




Time to use, length of use, registrar agility, etc
Differences between fast-flux (FF) domains and hosting domains
Mining registrant records

Either identify template or tie into social network
Phishing defense value
• We have three kinds of phishing defenses



Spam filtering: stops a subset of users from getting known e-mail lures
Toolbars: stop a subset of users from clicking on a known phishing site
Takedown: stops everyone from reaching a known phishing site
• But… how much do they each matter (i.e., to the
phisher) and which is worth additional investment?
• Dataset




Categorize phish e-mail and send through current filters
Track current toolbar blacklists
Track site lifetime (i.e. takedown)
Estimating click through (Taylor webalizer trick, DNS caching)
Assessing Attacks
By Nation-States
• AirJaldi is the ISP for the Nation of Tibet


20,000 users in wireless deployment in Dharamsala (nation-in-exile)
Maintains Tibetan nation’s web presence (San Jose)
• At both locations we’ve deployed Bro monitors
• Goal: can we observe attacks originating for nation-state
purposes rather than cybercrime?



San Jose location has “control”: AirJaldi has non-Tibetan customers
too, can partition address space
Control for Dharamsala deployment harder, but working on it …
Initial data captured prior to GhostNet story indeed exhibits GhostNet
infections = direct subversion from China
• Meta-question: how much similarity between e-crime/nation-state methods/infrastructure?
Proactive phishing defense
• Virtually all anti-phishing defenses are reactive
• Proactive defense via browser-based logo identification

Phishing campaigns all use logos or variations as trust cues
• SIFT feature matching is invariant to rotation, shearing and scale
Proactive phishing defense
Warning: you are attempting to enter data into a site that is not authorized to use the Bank of America trademark. It is likely that this is a scam.
• Query brand provider on recognized logo: is this IP address authorized to display it? (à la SPF for domains)
• Delay notification until user attempts to enter data
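The SPF-style check can be sketched as a lookup of the recognized brand in a policy of authorized networks, with the warning raised only when the user starts entering data. Both the policy registry and its contents are hypothetical here (no such brand registry is specified on the slide), and 192.0.2.0/24 is a documentation range standing in for the brand's real hosting.

```python
import ipaddress
from typing import Optional

# Hypothetical registry: brand -> networks authorized to display its logo.
AUTHORIZED_NETWORKS = {
    "Bank of America": [ipaddress.ip_network("192.0.2.0/24")],
}


def logo_authorized(brand: str, server_ip: str) -> Optional[bool]:
    """True/False if the brand publishes a policy, None if it publishes none."""
    networks = AUTHORIZED_NETWORKS.get(brand)
    if networks is None:
        return None
    addr = ipaddress.ip_address(server_ip)
    return any(addr in net for net in networks)


def should_warn(brand: str, server_ip: str, user_entering_data: bool) -> bool:
    """Warn only when the user tries to enter data on an unauthorized logo-bearing site."""
    return user_entering_data and logo_authorized(brand, server_ip) is False
```

Returning None for brands without a policy keeps the check from warning on every unknown logo, mirroring how SPF is only meaningful for domains that publish a record.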
Proactive detection of malicious web sites
URL = Uniform Resource Locator
• Safe URL?
• Web exploit?
• Spam-advertised site?
• Phishing site?
Example URLs: http://www.cs.mcgill.ca/~icml2009/abstracts.html, http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll, http://fblight.com, http://mail.ru
Predict what is safe without committing to risky actions
Joint work w/Lawrence Saul
Problem in a Nutshell


URL features to identify malicious Web sites
Different classes of URLs


Benign, spam, phishing, exploits, scams...
For now, distinguish benign vs. malicious
facebook.com
fblight.com
Live URL Classification System
[Figure: system diagram with components Example, Hypothesis and Label]
Feature vector construction
Example: http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll
• WHOIS registration: 3/25/2009
• Hosted from 208.78.240.0/22
• IP hosted in San Mateo
• Connection speed: T1
• Has DNS PTR record? Yes
• Registrant “Chad”
• ...
Feature vector = [real-valued | host-based | lexical]
• Real-valued: 60+ features
• Host-based: 1.8 million binary features
• Lexical: 1.1 million binary features
(GROWING)
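The lexical part of the feature vector can be sketched as a sparse bag of tokens from the hostname and path, plus a few real-valued counts. The delimiters and feature names below are assumptions for illustration; host-based features (WHOIS, IP prefix, geolocation, connection speed) need external lookups and are omitted.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict[str, float]:
    """Sparse binary hostname/path token features plus a few real-valued ones."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    features: dict[str, float] = {}
    for tok in re.split(r"[.\-]", host):
        if tok:
            features[f"host:{tok}"] = 1.0
    for tok in re.split(r"[/?.=_&\-]", parts.path + "?" + parts.query):
        if tok:
            features[f"path:{tok}"] = 1.0
    features["len:url"] = float(len(url))
    features["len:host"] = float(len(host))
    features["count:host_dots"] = float(host.count("."))
    return features


print(lexical_features("http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll"))
```

For the slide's example this yields host tokens www, bfuduuioo1fp and mobi and path tokens ws, ebayisapi and dll; keeping features as a sparse dict is what lets the vocabulary keep growing into the millions without a fixed vector layout.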
Which online algorithms?
• Perceptron
• LR w/ SGD
• Confidence-Weighted
99% accuracy w/on-line classifier
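Of the three online learners listed, logistic regression with SGD is the simplest to show: each labeled URL triggers one gradient step on the log loss, touching only the features present in that URL. A minimal sketch over sparse dict features; the learning rate and toy features are assumptions, and the deployed system is not described here.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time by stochastic gradient descent."""

    def __init__(self, learning_rate=0.1):
        self.lr = learning_rate
        self.weights = {}  # sparse: only features seen so far
        self.bias = 0.0

    def _score(self, x):
        return self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def predict_proba(self, x):
        """P(malicious | x), computed in a numerically stable way."""
        z = self._score(x)
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, x, label):
        """label: 1 = malicious, 0 = benign. One SGD step on the log loss."""
        grad = self.predict_proba(x) - label
        self.bias -= self.lr * grad
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * grad * v


model = OnlineLogisticRegression()
stream = [({"host:fblight": 1.0, "host:com": 1.0}, 1),
          ({"host:facebook": 1.0, "host:com": 1.0}, 0)] * 50
for x, y in stream:
    model.update(x, y)
print(model.predict_proba({"host:fblight": 1.0, "host:com": 1.0}) > 0.5)  # True
```

Real-valued features such as URL length would need normalizing before being mixed with binary ones in a model like this.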
Meta-points on URL
classification
• Two big practical issues for using machine learning


Much work doesn’t scale to large-scale problems
Batch SVM-type strategies adapt slowly and don’t work well in
practice (adversary just changes from day-to-day)
• We’ve been working closely with a large Web-mail
provider on this project



Scales to their problem size
Online update adapts quickly
Performs better than their current strategy
(they have reimplemented our scheme and tested w/live data)
Bot-based spam filter generation
• Observations
– Modest number of bots send most spam (http://www.marshal.com/trace/spam_statistics.asp)
– Virtually all bots use templates with simple rules to describe polymorphism: random letters and numbers, phrases from a dictionary
– Templates + dictionaries ≈ regex describing spam to be generated
– If we can extract or infer these from the botnets, we have a perfect filter for all the spam generated by the botnet
– Very specific filters, extremely low FP risk
• Fully automated algorithm
• Almost perfect in testing (~0 false positives, very few false negatives)
• Exploring live testing
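The "templates + dictionaries ≈ regex" observation can be sketched directly: escape the template's literal text and replace each dictionary macro with an alternation of that dictionary's entries. This is an illustration under assumptions, not the automated algorithm described above: it handles only %^F<name>^% dictionary macros (random-letter and random-number macros would become character-class repetitions) and uses placeholder dictionaries.

```python
import re

MACRO = re.compile(r"%\^F(\w+)\^%")


def template_to_regex(template, dictionaries):
    """Compile a template into a regex matching every message it can generate."""
    pieces, pos = [], 0
    for m in MACRO.finditer(template):
        pieces.append(re.escape(template[pos:m.start()]))
        # Longest entries first so alternation never stops at a shorter prefix.
        words = sorted(dictionaries[m.group(1)], key=len, reverse=True)
        pieces.append("(?:" + "|".join(map(re.escape, words)) + ")")
        pos = m.end()
    pieces.append(re.escape(template[pos:]))
    return re.compile("".join(pieces))


template = "Subject: Say hello to %^Fproducts^%!\n<%^Fpharma_links^%>"
dictionaries = {
    "products": ["bluepill", "redpill"],
    "pharma_links": ["http://spammerdomain1.com/", "http://spammerdomain2.com/"],
}
spam_filter = template_to_regex(template, dictionaries)
print(bool(spam_filter.fullmatch(
    "Subject: Say hello to bluepill!\n<http://spammerdomain2.com/>")))  # True
```

Because the regex admits only what the template and current dictionaries can produce, false positives are very unlikely, but the filter must be regenerated whenever the botnet pushes new dictionaries.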
Summary
• We think that the economic structures underlying
e-crime are far weaker than their technical
vulnerabilities
• Quantitative empirical data is key both for driving
technical innovations and policy
• We think we’re uniquely positioned to do this work
Questions?
Collaborative Center for Internet Epidemiology and Defenses
http://ccied.org
Yahoo!