23009
>> Helen Wang: So let me welcome Dr. Nick Weaver again, from ICSI in Berkeley.
Nick's group and the UCSD group headed by Stefan Savage and Geoff
Voelker have done a lot of high-impact work uncovering the underground
economy. Nick is graciously giving two talks in a row today, and in the
second talk he's going to talk about serendipity and spam. Nick, please.
>> Nicholas Weaver: Thank you very much. This is also my first time giving this
talk, so I hope it works out well. The title is because this is about two papers,
the second of which came entirely from random, lucky discoveries made in the
process of doing the first one. So, first of all, acknowledgments. I'm just one piece of
a large joint work between ICSI and U.C. Berkeley. We're such a big group that
we actually have a mission patch, which I will explain.
This is, of course, in Russian. It means measure seven times, cut three times.
The normal saying is cut once, but we tried to write the paper three times. We're
the Seaside Trajectory Project. Our database server is Mastodon, with Seaside
North and Seaside South. This being a cyber-warfare mission patch, you must have
lightning bolts, which coincidentally spell out WWW. There is the trajectory, which
cuts through them to spell out WIN, and, oh, these are the banks that we'll get to later.
And around it is the --
>> Helen Wang: Why Russian?
>> Nicholas Weaver: Because the affiliate programs are Russian. And --
>>: What was the Mastodon tie-in?
>> Nicholas Weaver: That's the name of our database server, which, admittedly,
we abused. And so this is a project of 15 people. This is the author list of the
first trajectory paper.
When in doubt, any work I'm going to talk about was done by one of these 15
people, and probably not the person in red there. Oh, and this being computer
science, unfortunately 14 out of 15 authors are male. So these Viagra pills represent
the male co-authors, and this represents our one female co-author.
So there's a lot of symbolism in our mission patch.
>>: Who designed that?
>> Nicholas Weaver: Carol and Christian. This is a big project with a lot of
sponsorship from the NSF and the Office of Naval Research, plus a lot of in-kind
sponsorship from Google, Microsoft, Yahoo!, Cisco, HP, UCSD, CNS, and spam
feeds that shall not be named. All opinions are those of the presenter.
So our group and everybody else have focused a huge amount on how spam
works: how botnets generate spam, how botmasters infect systems, how they
control their bots, botnet milking to fight spam. People have spent tons of money,
time, and effort fighting spam. And this is all before the spam reaches anyone's inbox.
But let's face it. Spam exists for one purpose: to make money.
If you talk to people who are legitimate spammers, well, they call it unsolicited
commercial e-mail. That's the term you use when you don't want to insult the
people who spam for a living.
The key word being commercial. So we focused on the largest categories of
commercial sales. We focused on pharmaceuticals: Viagra and all that stuff.
We focused on replicas: fake Rolexes and other luxury goods.
And we focused on downloadable OEM software. Fun stuff. All of these are
based on a model where the user has to click on the spam and order the
product for the spammer to make any money, and all these products are driven
by affiliate networks. The spammer doesn't run the Web store; the guy
doing the spam is divorced from the guy who is running the infrastructure,
processing payment, and fulfilling orders. So as a spammer, I don't have to run a
storefront; I can just spam.
We did not analyze 419 spam because, well, there are plenty of people who do that
for fun. The 419 Eater stories are amazingly amusing. We avoided pornography
for obvious reasons of not getting into trouble. And we avoided gaming-related
spam because that seems to be already under attack through the credit card
network.
What do I mean by affiliate program? This is RX Promotions' ad page for people
who want to become spammers. They provide the Web storefront and all this
stuff. And fortunately Kirill was able to read it, because he's a native Russian speaker,
and he does a much better job than Google Translate. Some of the highlights are
up to 60 percent commission. So if I as a spammer drive somebody who buys
$100 worth of goods from the storefront, I get 60 bucks. Cool. Payout on
demand: as soon as the payment comes in, it gets transferred as WebMoney into
my account and I get the money right away.
Low, low prices, so my storefront would be competitive. I open my own store
without having to do all the messy stuff of running a store: processing credit
cards, order fulfillment, everything else. Detailed statistics on how well my spam
is working.
Contests.
>>: There was 16 percent, 60 percent. What did it say?
>> Nicholas Weaver: I'm not sure as I don't read Russian.
>>: [Russian] The font is horrible. But something is up to 15, not 60.
>> Nicholas Weaver: The 60 percent is the commission; the 15 percent might be
if you refer other people as spammers.
>>: Refers to you -- I think it says refers. But I can't see the colors. Okay.
>> Nicholas Weaver: So, contests. Parties. RX Promotions is notorious for
having the gold brick party, where they give the best spammer a gold brick and
there are lots of mostly naked women running around. Free hosting: you don't
have to do any Web hosting, you just provide the spam. And rich promo
materials for you to use in the spam. It seems like a very attractive model if you
actually want to be a spammer. You don't have to do any work; these guys do
all the work for you.
But we wanted to understand these guys. We wanted to understand all the stuff
that happens after the click. The basic division is that the spammer handles everything
up to the point where the user clicks on the link in the mail, and after that it's handled
by the affiliate program.
So what infrastructure is needed to support the commercial website? How
does payment processing work to collect payments, and how does shipping work? One
thing that you might find surprising is that these guys actually ship stuff. Because
let's face it, if all you want is to rip people off, stolen credit cards are cheap.
If you want to make 100 bucks off somebody, you have to actually deliver the
goods; otherwise they'll call their credit card company and get their money back.
And along the way, we just happened to discover how big the market was: good
estimates of how much revenue these affiliate programs are bringing in,
and who the customers are and what they buy. So that's the serendipity part.
So let's start out with an actual spam message: "Viagra official site," linking to
medshoprx.ru [phonetic]. The user clicks on it, and this is where the affiliate program's
magic happens.
To start with, the browser has to do a DNS lookup. The domain was registered through
a registrar, in this case a Russian one, and the lookup goes to the actual DNS server
holding the A record, which in this case was out in China. The browser then sends an
HTTP GET to a Web proxy server that goes back to the actual affiliate program, which
in this case is Pharmacy Express in Russia, which delivers the happy fun page with the
lovely female doctor on it.
I've seen these pages way too often, with the ads full of Viagra and Cialis and the
like. But even then this hasn't made the spammer or the affiliate program a dime. They
only make money if the next step happens: the user decides to purchase, in
which case the user puts in their payment information.
The payment goes through a merchant bank in Azerbaijan, with the money
coming from the customer's bank in the U.S. The affiliate program contracts with some
manufacturer in India, or with somebody who contracts with the manufacturer in India,
to drop-ship a package that gets sent to the user.
All these moving parts have to work or the spammers can't make money. So
we wanted to understand this whole chain, the whole trajectory of the spam. So
we started with about a billion URLs. We had a whole bunch of feeds that we
all collected, and we isolated the URLs: 969 million distinct URLs representing 17
million domains.
>>: Are those actual verified spam feeds or probable spam feeds?
>> Nicholas Weaver: There are various sets of spam feeds. Some are MX
honeypots, where everything is bogus. One that I think we can talk about is a
domain name that's used by a lot of people for tests and bogus things, so any mail
to it is bogus.
Some are seeded-account honeypots from different vendors, where they put e-mail
addresses on the Web and get data back. Some are human-identified spam,
which can have false positives in it. And some feeds we actually generated
internally by running bots. Anything a bot sends is spam, so that's a
really good, clean spam feed. And we reduced all of them to just URLs, because
we were not interested in message bodies and contents and everything else.
We just wanted the URLs. We also ignored spam like image spam,
where the URL was in the image, because who is going to type that stuff? That's
not very good spam. So then we started the crawling process. I wrote some of
the DNS library used by the crawler. But it ended up
being huge: we visited 3.5 million domains, and these 3.5 million
domains represented 950 million of the URLs covered.
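Reducing feed URLs to domains, and ranking domains by how many distinct URLs point at them, is what lets a few million domains stand in for most of the URL volume. What follows is a minimal sketch, not the project's crawler code. It assumes URLs arrive as plain strings and uses the full hostname as the "domain" (a real pipeline would more likely collapse hostnames to registered domains using the Public Suffix List); the `min_urls` threshold is a hypothetical parameter.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlsplit


def domain_of(url: str) -> Optional[str]:
    """Return the lowercase hostname of a URL, or None if it has none."""
    # urlsplit(...).hostname is already lowercased by the standard library.
    return urlsplit(url).hostname


def rank_domains(urls: Iterable[str], min_urls: int = 2) -> List[Tuple[str, int]]:
    """Count distinct URLs per domain and keep domains seen in at least
    `min_urls` URLs, most frequently seen first (hypothetical threshold)."""
    counts: Counter = Counter()
    for url in set(urls):  # distinct URLs only
        host = domain_of(url)
        if host:
            counts[host] += 1
    return [(d, n) for d, n in counts.most_common() if n >= min_urls]
```

Crawling in this order visits the domains that cover the most URLs first, and skipping one-off domains trims some of the noise from feeds with false positives.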
So we biased toward domains that were seen in multiple URLs, which helped
somewhat with the false positive problem. The DNS crawler queried
each domain multiple times until the answers converged, because of fast flux: you just keep
asking. It also queried domains daily for a week because, well, spammer
infrastructure changes a lot. And then the Web crawler was an instrumented Firefox
running on a cluster, with 100 copies of Firefox per node. The reason we
had to run Firefox is, A, we wanted a real Web browser with real JavaScript, so
anything that tried to detect the user being a crawler wouldn't detect it. Also, they
have these redirection chains.
The first page you get is not the page that renders. You have 302 redirects,
meta refreshes, JavaScript: all stuff that the browser happily follows. So you want
to get to the final page, and we actually capture two things: the page's Document
Object Model, so the actual page HTML, as well as an image of the page. The image
was very useful in the clustering process, because a lot of us participated in
the manual clustering by looking at the screenshots. It's instantly clear when
some sites are the same.
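The crawler relied on a real Firefox to follow these chains. To illustrate what "follow to the final page" means, here is a minimal sketch over a simulated fetch function. The `(kind, payload)` response shape and the hop limit are assumptions made for the example, and a 302, a meta refresh, and a JavaScript redirect are all collapsed into a single "redirect" kind.

```python
from typing import Callable, List, Tuple

# A response is (kind, payload): kind "redirect" carries the next URL
# (standing in for a 302, a meta refresh, or a JavaScript location change);
# kind "page" carries the HTML of the page that finally renders.
Response = Tuple[str, str]


def follow_redirects(start: str, fetch: Callable[[str], Response],
                     max_hops: int = 10) -> Tuple[List[str], str]:
    """Follow a redirect chain to the page that actually renders.

    Returns (chain of URLs visited, final page HTML). Raises RuntimeError
    on a redirect loop or when the chain exceeds `max_hops` redirects."""
    chain = [start]
    url = start
    for _ in range(max_hops + 1):
        kind, payload = fetch(url)
        if kind == "page":
            return chain, payload
        if payload in chain:
            raise RuntimeError(f"redirect loop at {payload}")
        chain.append(payload)
        url = payload
    raise RuntimeError("too many redirects")
```

Recording the whole chain, not just the final URL, keeps the intermediate hops available for later infrastructure analysis.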
It prioritized crawling new domains, so they got a quick repeat visit. This allowed us
to map the technical resources: the DNS, HTTP, and domain name registration
infrastructure. I'll spare you the details of the paper, but the conclusion is:
if you're trying to play whack-a-mole here, give up. It's too wide a set of resources,
too easy for them to change, too volatile.
So technical interventions at both the DNS and the Web
level are probably a lost cause. You can try. It will make you feel good, but we
aren't going to, because we're lazy.
>>: So you ran 100 processes of the Firefox browser on a single node?
>> Nicholas Weaver: Yes.
>>: So that's kind of a lot.
>> Nicholas Weaver: Yes. That was not -- that was somebody else. Remember,
15 people. I didn't actually do the crawling. But they had to customize
Firefox: they stripped it down and did a custom plug-in for screen capture.
So there was a fair amount of infrastructure magic.
That's part of the reason this is a 15-person paper: this is our particle physics
paper. We have a lot of people doing a lot of different moving parts, and as I
said, most of the stuff was done by different people. Me, I did the DNS library. We
needed one in Python, and I had a Python library from Netalyzr that does
asynchronous DNS lookups and actually works really well when you
want to crawl two million domains in the space of a couple of days.
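The DNS side of the crawl combined two ideas from the talk: re-query each domain until its answers converge, and do it asynchronously so millions of domains fit into a couple of days. Below is a minimal asyncio sketch of that pattern. It assumes "converged" means a round of queries returned no address not already seen, and it takes the resolver as a parameter (`resolve` is hypothetical; a real crawler would plug in an asynchronous DNS client here), so the code itself performs no network I/O.

```python
import asyncio
from typing import Awaitable, Callable, Dict, FrozenSet, Iterable, Tuple

# Hypothetical resolver type: takes a domain, returns the A records from one query.
Resolver = Callable[[str], Awaitable[FrozenSet[str]]]


async def resolve_until_converged(domain: str, resolve: Resolver,
                                  max_rounds: int = 10) -> FrozenSet[str]:
    """Re-query `domain`, accumulating addresses, until a round adds nothing new.

    Fast-flux domains rotate through pools of addresses, so one query
    shows only part of the pool."""
    seen: set = set()
    for _ in range(max_rounds):
        answers = await resolve(domain)
        if answers <= seen:  # nothing new this round: treat as converged
            break
        seen |= answers
    return frozenset(seen)


async def crawl(domains: Iterable[str], resolve: Resolver,
                concurrency: int = 100) -> Dict[str, FrozenSet[str]]:
    """Resolve many domains concurrently, at most `concurrency` at a time."""
    sem = asyncio.Semaphore(concurrency)

    async def one(d: str) -> Tuple[str, FrozenSet[str]]:
        async with sem:
            return d, await resolve_until_converged(d, resolve)

    return dict(await asyncio.gather(*(one(d) for d in domains)))
```

Running the daily re-crawl described above would simply mean calling `crawl` again on the same domain list each day and diffing the results.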
Then we did clustering and tagging. So we got all these lovely images; these
are actual screenshots. We separated them out into clusters and then tagged the
clusters by which program they belonged to. Stefan and Kirill did a lot of reading of
Russian sites, finding the canonical examples for given vendors.
>>: These are images, though, right? So you --
>> Nicholas Weaver: We also captured the DOMs. We captured the HTML on the
page.
>>: That's what you're indexing, that's what you're using as --
>> Nicholas Weaver: We're clustering on the DOMs and building tags on the
DOMs, but we display them as images because it's a lot easier for us to work
with.
>>: Right.
>> Nicholas Weaver: So this allowed us to determine that 38 percent of the total volume
of URLs came from one of 30 identified affiliate programs. And that excludes all
the other categories of spam. Still, about 40 percent of the volume is specifically the
stuff we were interested in.
>>: So I'm clear: you went through and clustered and tagged all of the stuff as
either one of the 30 affiliate programs or not affiliated?
>> Nicholas Weaver: Yes.
>>: Out of the 915 million URLs you visited, you only clustered a third of them?
Only a third were clusterable?
>> Nicholas Weaver: No, these were clustered and tagged with specific affiliate
programs. So for these we identified the actual actors behind the program.
>>: So is this saying that for 62 percent you couldn't find an affiliate?
>> Nicholas Weaver: Or, actually, it was often in the categories we weren't
interested in: gambling, porn, random Chinese companies offering their
animation services. You've got all sorts of long-tail spam we weren't looking at.
During the clustering process, that's when we excluded the porn,
gambling, and others. So that other 60 percent can be gambling, can
be porn, et cetera.
We don't want to look at porn. It's not as interesting. We manually validated
the clusters, and the tagging was a manual process. This involved a lot
of us, including me; I did a lot of this. The goal was to examine the page
source to create regular expressions which identify individual programs rather
than storefronts. A given affiliate program might have six or seven
storefronts that look totally different but are actually the same in
the infrastructure underneath.
>>: You couldn't automate the tagging?
>> Nicholas Weaver: No, because we keyed in on -- once we developed the
tags, it was an automated process, but creating the tags required manual
examination, because it meant looking for implementation glitches and fingerprints.
I will give you a concrete example. But basically these guys are lazy. Why do
a totally different set of HTML for a different storefront when you can change a
little CSS, change the images directory, and call it good?
So let's actually look at an example. This is an Eva Pharmacy page. This actually
got sent to my mailbox a couple of days ago. I'm sick of looking at her;
I've seen this page way too many times. And I specifically did a lot of work
on the Eva pages. There were two tags of interest. First of all, they had this
convention where an image source would be a large hexadecimal string of about
30 characters, which looks like an MD5 sum, then .gif, and then what looked to be
sort of a counter appended.
This is an oddball convention. No other online pharmacy did this, with this appended
thing. One of the other online pharmacies did a tracker tag that was just .gif,
with nothing appended. We aren't sure why they do this, but we think it has to do
with the hosting infrastructure. If you have a system with caches, this acts as a
cache buster, so your proxy ends up going back to your origin server, and this
gives you a hit counter, basically.
>>: You don't think it's tagging you so they know which button it came from?
>> Nicholas Weaver: No, because it changes with every page load, at least as
far as we know. So we think it's tracking for a hit counter. It could be encoding
information back; it's hard to tell. The other interesting thing about Eva is
that they were cheap. For a lot of Eva sites they didn't bother hosting their own
images. Instead, they'd go all over the world, compromise Web servers, and
have their page randomly select five or eight of these servers, where they'd put
something on port 8080 that actually served up the images. This cuts their
bandwidth bill in half, because half of the page load is all the various little images.
A large fraction used this convention, but not all of them. Even for the ones that
don't, you can key in on the template structure: images/chcm, for
Canadian Health&Care Mall. That's theirs. So images/chcmlogo.gif says this
is an Eva page using the CHCM template.
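The tagging step turns fingerprints like these into regular expressions applied to each page's HTML. Here is a minimal sketch of the idea. The exact shape of the appended counter (assumed here to be `?` followed by digits), the 32-character MD5-style file name, and the program label are illustrative assumptions, not the project's actual tag set.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical tag set: each affiliate program maps to regexes that match
# implementation fingerprints in a page's HTML.
TAGS: Dict[str, List[Pattern[str]]] = {
    "EvaPharmacy": [
        # An MD5-looking image name (the whole file name is 32 hex digits)
        # followed by something appended after ".gif"; assumed to be
        # "?<digits>" here. A plain "<hex>.gif" tracker does not match.
        re.compile(r'src="(?:[^"]*/)?[0-9a-f]{32}\.gif\?\d+"', re.I),
        # Template-specific asset path (CHCM template).
        re.compile(r'images/chcmlogo\.gif', re.I),
    ],
}


def tag_page(html: str) -> List[str]:
    """Return the affiliate programs whose fingerprints appear in `html`."""
    return [program for program, patterns in TAGS.items()
            if any(p.search(html) for p in patterns)]
```

Keying on such incidental details is what lets several different-looking storefronts collapse onto one program, since the storefront's look changes but the underlying template habits do not.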
On one hand, this saves 50 percent of their bandwidth. On the other hand, it
often makes Eva page loads bad, because sometimes these systems are down or
they're two seconds away latency-wise. They stopped it in January but have
started up again; I don't know why they do this. These are third-party Web
servers running nginx as proxy caches. If anybody is good at decoding binary
blobs, we know the control query to these. You ask for style.css or something
like that, and it sends you a big binary blob that may give internal statistics, but we
don't have a clue, because we don't have the actual binary. So if anybody is
really good at looking at random strings of bits and seeing patterns, please talk to
me.
We gave up. But now we've identified the actual
30 affiliate programs that comprise 40 percent of the spam we've seen. And
when I say 40 percent of all spam, it's almost all of the spam of interest. This
is basically 95 percent plus of all the Viagra, 95 percent plus of the software,
95 percent plus of the replicas.
So we've identified the programs. Now we start purchasing and see what we get.
During the initial survey there were 120 purchases attempted, with an $80 median
purchase. 56 were successful, as in product delivered. They have to
deliver product because otherwise they get chargebacks. Almost all of the
unsuccessful purchases were ones that never got charged, because, well,
one of the things we discovered is that the spammers have fraud problems.
And therefore they're actually very fussy about shipping to valid
addresses, having a valid phone number that they can call, the e-mail
address being valid, and the ZIP code given for the credit card matching the
card's billing address as well as the shipping address of the product. They
have all the fraud problems that a normal merchant has and then some, because,
hell, they're selling Viagra; they're going to get ripped off.
So this is why, in our initial orders, we had so many unsuccessful
orders. In terms of purchasing, we generally purchased items that are legal for us to
buy: over-the-counter drugs. We were basically the only people who would buy Zyrtec
from these guys. I was used as a mule address, so I've actually got a package of
Zyrtec at home from them. Software with licenses: the software purchases
focused on things like Microsoft software, where UCSD has a site license. Also,
I'm assuming Microsoft would not object to research that finds out who is making
money pirating Microsoft software.
And we also bought fake Rolexes, which are very amusing. The payments were
tracked using special gift cards. Normal gift cards that you buy at the
supermarket cannot be used for these purchases these days, because they aren't
allowed for overseas transactions due to federal regulations. You have to use a
card that has a billing address associated with it.
This is to prevent money laundering. So Chris [inaudible] got together with a
company that does these for business use and was able to get gift cards on
demand that used the valid mailing address of the actual researcher. So things
were shipped to our houses.
I was late in the buying process, so I only had one thing shipped to me. We
have also gone to the point of splitting out people who are only used for shipping,
after we had problems, to see if they're blacklisting us.
The card agreement allowed us to see the merchant information on the
transaction, so we could see what banks were processing the payments. And
the fulfillment part was tracked with disposable Gmail accounts, burner cell
phones, and shipping to residential addresses.
You want to use real cell phones when you're doing this, not Skype numbers,
because there are databases that recognize Skype numbers. So Skype and
Google Voice are out for stuff like this. You have to use real cell phones.
>>: Why?
>> Nicholas Weaver: Because these days a lot of things, like Craigslist, et
cetera, require a valid cell phone number to register, and there are databases
of free phone numbers that are not allowed for this purpose. And these guys use
the same thing.
And it's a heavily interactive order process. You get the automatic e-mails with
the order number for your product, sent to the Gmail account.
>>: None of these guys insist on a land line phone?
>> Nicholas Weaver: No, they don't. You can't do that these days; a lot of
people don't have landlines. They just insist on real phone numbers. So
Chris Kanich [phonetic] has a fair number of T-Mobile SIMs lying
around. If you're getting burners, go with T-Mobile. For 100 bucks you can get a
burner number that's good for a year and has more airtime than you'll ever need.
A lot of these orders require e-mail or phone verification, and a lot of the reason
the early orders were cancelled was that we didn't have the infrastructure in
place. As I said, they have real fraud problems.
And this allowed us to look for weak points in fulfillment. Products are generally
drop-shipped from a foreign location, so it's difficult to disrupt. We
saw cases where, I think, Chris got a package of Zyrtec that was hidden in a
wallet. So this stuff is really designed to get through customs.
The fake Rolexes all came from China. And several were "genuine
[inaudible]" watches. If you Google this phrase, you find out that it has been
seen on the inside of fake Rolex watches for over five
years. It started out as some inadvertent Chinese mangling of "Geneva,
Switzerland" in the Rolex logo, but at this point it has to be a maker's mark,
because somebody has to have told the guy making them, hey, you're doing
this.
Oh, and these days if you're looking at Rolexes, fake Rolexes have hologram stickers
on the back; real Rolexes don't. Rolex has given up on the hologram stickers
because the fakes are too good. The drugs were from India and elsewhere. The
herbal stuff was sent from all over.
The herbal stuff is an interesting category, because that stuff is actually fully
legal as far as the FDA is concerned: the FDA has decided that as long as it's called
natural, you can ship whatever poison you want at people.
So order delivery does not look like a weak point. But the payment
processing was. Of the purchases that went through, 95 percent of the
test purchases cleared through just three banks. Only three banks: a bank in
Azerbaijan, a DnB NORD subsidiary in Latvia, and a Saint Kitts-Nevis-Anguilla
bank [phonetic], I believe.
The interesting part is the impact. All three of the banks terminated the merchants'
accounts in the face of bad publicity within a couple of days. Unfortunately --
>>: Within a couple of -- of your paper release?
>> Nicholas Weaver: Not the paper release. What got their attention
was an above-the-fold article in the business section of the New York Times.
Markoff, the reporter, tried contacting the banks, got nothing, got nothing; the story ended
up above the fold in the business section, and the banks dropped these guys like a
hot potato.
>>: These guys, specific merchants or all the -- which merchants?
>> Nicholas Weaver: All the merchants that we tested going through these
banks had their accounts suspended.
>>: How did the banks know which accounts? Did you provide the information
before the New York Times article?
>> Nicholas Weaver: Before the New York Times article we tried contacting
them and got nowhere. Markoff tried contacting them and got nowhere. The
interesting thing is they're honest about transaction type, because there are severe
penalties from Visa, et cetera, if you're dishonest. So if you're selling pirated
software, it's done with a transaction type of software. If you're selling
pharmaceuticals, it's a transaction type that says pharmaceuticals.
And so these were very easy to trace once the banks decided to care about it.
>>: You're talking about the 30 merchants.
>> Nicholas Weaver: The 30 affiliate programs.
>>: Cool. So there were 30 merchants.
>> Nicholas Weaver: About 28 or so. One went through Wells Fargo; they got
cancelled very quickly. So once you get the bank's attention, the response is quick,
and it costs a lot to set up a new merchant account: it costs days of time, et
cetera.
So if we can keep the pressure on the banks, this can really
put pressure on the infrastructure.
>>: So you observed that they were down for a couple of days?
>> Nicholas Weaver: Yes. And in fact it's been harder to order
from some of these guys since.
>>: Surprise.
>>: Great job.
>>: That was my question. How did you verify these accounts were closed?
>> Nicholas Weaver: The thing is, we've continued to do purchases, because
those order numbers seemed kind of interesting. Chris noticed something
interesting about the order numbers when we'd do multiple purchases from a
given affiliate program.
They seemed to be going up pretty much linearly, pretty much monotonically.
That seems kind of cool. That seems kind of odd. Seven of the online
pharmacies, representing a disproportionately large share of the business, and three
of the software vendors all used what looked like linearly increasing order
numbers in all those e-mails they sent us saying your item is being shipped.
Hey, wait a second. Let's just keep buying some stuff and see what happens.
So down in San Diego they did pairs of purchases to verify the sequential
hypothesis. Get two people on two Web browsers, get all set up, get everything
in, click purchase, click purchase. If the order numbers are truly linearly
increasing, you should see them just one apart. And there was other testing they
did to verify the hypothesis that the order numbers were increasing
sequentially.
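The paired-purchase test reduces to a simple check: if orders placed back to back always come out a constant, small step apart, the order numbers are very likely a single global counter. A minimal sketch; `max_stride` is an assumed cutoff (2 accommodates a program whose counter advances by two per order).

```python
from typing import Iterable, Optional, Tuple


def sequential_stride(pairs: Iterable[Tuple[int, int]],
                      max_stride: int = 2) -> Optional[int]:
    """Given order numbers from pairs of purchases placed back to back,
    return the gap between them if it is the same small positive value for
    every pair; otherwise None (the hypothesis is not supported)."""
    # abs(): either of the two simultaneous clicks may have landed first.
    gaps = {abs(b - a) for a, b in pairs}
    if len(gaps) == 1:
        (gap,) = gaps
        if 0 < gap <= max_stride:
            return gap
    return None
```

A genuine third-party order landing between the two clicks would widen one gap, so in practice one would repeat the pairs rather than reject the hypothesis on a single mismatch.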
Once you've done that, you purchase again a week later, and again a week later, and you
get a good estimate of the order volume.
>>: I would generalize: from the point of view of security, serial numbers are a bad idea.
There's a World War II story.
>> Nicholas Weaver: Yes. I think it's one of those classic blunders. I think it's
number three, just below getting involved in a land war in Asia and going up against
a Sicilian when death is on the line. Linearly increasing stuff is classic blunder
number three at this point.
So another 150-odd purchase attempts. Also, one of the things that's great about
Kirill reading Russian is that he reads the bad guys' forums. And the reason he
can read the bad guys' forums is, well, these are public markets.
When you have a market, there's a trade-off between secrecy and
size. A secret market is easy to do operational security on, but not that
productive, because you don't have a lot of people involved. If you're more
public, well, then it's easier for people you don't want reading about what you're
doing to read it.
But it's a bigger market. And so one of these, GlavMed, had their affiliate support
forums publicly readable, and people would post order numbers for purposes of
tracking packages, because affiliates have to worry about customers actually
getting the product.
And so the forums would post order numbers, and again these seemed to be
linearly increasing. All tested programs except RX Promotions used order numbers
that increased by one. RX Promotions increased it by two; they
probably had a ++ and another ++ in their code someplace.
And you get really lovely lines. There was a little glitch on GlavMed that I can't
remember, which is discussed in the paper, but other than that GlavMed has nice
linearly increasing orders as a function of time. 33drugs. 4RX. Eva. Where is
Eva? My favorite. I get sick of looking at her, but it's still my favorite
program. Nice, beautiful, linearly increasing orders. Everything looks linear.
And that makes it pretty easy, because now we can just draw a line and get the
orders per day. This is simply a matter of counting: counting and
subtraction. So RX Promotions, about 450 orders a day. Eva, Eva is a big one:
nearly 900 orders a day. Voila, we've got order volume.
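Drawing the line amounts to an ordinary least-squares fit of order number against time; the slope is orders per day, divided by the counter's increment for a program that counts by two. A minimal pure-Python sketch; the observations in the checks are synthetic, not data from the study.

```python
from datetime import datetime
from typing import List, Tuple


def orders_per_day(observations: List[Tuple[datetime, int]],
                   stride: int = 1) -> float:
    """Estimate daily order volume from (time observed, order number) pairs.

    Fits order_number = a + b * days by ordinary least squares and returns
    b / stride, where `stride` is how much the counter grows per order."""
    if len(observations) < 2:
        raise ValueError("need at least two observations")
    t0 = min(t for t, _ in observations)
    xs = [(t - t0).total_seconds() / 86400 for t, _ in observations]
    ys = [float(n) for _, n in observations]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("observations must span more than one instant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy / sxx) / stride
```

With only two observations this reduces to the "counting and subtraction" in the talk: the difference in order numbers divided by the days between them. More points simply average out noise.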
>>: Why do you say their sequencing is a blunder here? Do they care about
other people knowing their volumes? Does that need to be secret?
>> Nicholas Weaver: One vulnerability is that on some of these
sites, with just the order number, you can access the order information. We did not do
this, because we considered it hacking. But this represents a vulnerability in
their website, because you can just make a purchase and then see everything else
that has been bought by every other user.
So, yes, these do represent real security vulnerabilities.
>>: Even if it's not linear, it doesn't mean you couldn't just probe and get that --
>> Nicholas Weaver: Yeah. But it's really easy if it's linear, because you don't
have a wide space to guess. There's no good CERT advisory process for online
criminal pharmacies to report these bugs, so we didn't bother reporting them.
>>: Do you have an understanding of the size of these orders? Do you think they're
about 50 bucks, 80 bucks?
>> Nicholas Weaver: I'm going to get to that in a sec, actually. We do. We
now just need what's technically known as a scientific wild-ass guess of
revenue per order. And we actually made three different guesses per order.
The first one was from the Spamalytics paper, which suggested about $100 for
online pharmacy orders. So we just used that. Guess number one. Guess number two
is to always assume the cheapest item: what's the cheapest item on the site?
And guess three is what we call basket inference: try to understand what people
normally buy and take the cheapest option of each one. For software, that meant trawling
the torrent sites, figuring out which torrents are seeded the most, and using that as the
relative popularity of the different software products.
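Each guess turns into revenue the same way: orders per day times revenue per order. Basket inference just replaces the single price with a popularity-weighted average of the cheapest price of each commonly bought product. A minimal sketch; any prices and weights passed in are placeholders, not figures from the study.

```python
from typing import Sequence, Tuple


def basket_price(items: Sequence[Tuple[float, float]]) -> float:
    """Popularity-weighted average revenue per order ("basket inference").

    Each item is (cheapest price of a product, relative popularity); for
    software, popularity could come from torrent seed counts."""
    total_weight = sum(weight for _, weight in items)
    if total_weight <= 0:
        raise ValueError("need at least one item with positive popularity")
    return sum(price * weight for price, weight in items) / total_weight


def daily_revenue(orders_per_day: float, revenue_per_order: float) -> float:
    """Revenue estimate: order volume times the guessed revenue per order."""
    return orders_per_day * revenue_per_order
```

For instance, `daily_revenue(900, 100.0)` combines Eva's roughly 900 orders a day with the $100 Spamalytics guess; swapping in `basket_price(...)` for the second argument gives the basket-inference variant.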
For Eva, it's interesting, because this is serendipitous discovery number two.
Those 8080 things looked kind of odd. What kind of systems were compromised?
So I actually took it upon myself to try to contact every single site owner I
could for all of those, because any given Eva visit used a set of five image
hosters.
It seemed to randomly select which image mapped to which back-end hoster
on any given proxy, and that changed relatively slowly. So we tried to find out
what sort of systems these were.
I batted almost zero. There were random compromised Web servers in
Asia, a test website for a conference demo that
went live two years ago, et cetera, et cetera. There are all these forgotten computers
out there.
>>: On the net.
>> Nicholas Weaver: And almost all of these were forgotten. One was a VM whose owner
actually did reply to us. We got a capture of the VM but were unable to find the
binary in question. Oh, darn.
But one was at a university where we had colleagues who knew the IDS team.
The IDS team was actually very interested in this, because they didn't know that
they had a system that was compromised for five days. What happened
is the department, or whoever had the system, noticed it was
compromised and cleaned it up on their own, and never told security.
Fortunately, this IDS team runs Bro, and like every Bro user on the planet
they're unapologetic data pack rats. Vern has traffic traces going back a decade
at LBL.
They were able to give us the HTTP log from the IDS for five days for this image
hoster: every HTTP request to this hoster, the image that was
requested, and the referrer URL in question.
This allowed us to find the control program or the control IP, which was a dead
end, it was some random system, who knows where. This allowed us to find
their control channel, which looks like a blob that we can't adjust. And this
allowed us to get an analytics window into 3,000 plus people who added stuff to
their shopping cart on this website.
Because what happens is the user who actually goes all the way through
sees the landing page, which fetches a bunch of images from all these image
hosters. Then they go to the product page. Anything old on the
product page, the browser gets out of the cache, but for anything new it goes to
these image hosters, and the product page often has this bit with other stuff
you might be interested in.
And then when they click on a product to add it to the shopping cart, the
shopping cart page also has a bunch of images that go out, and then when they
actually click to check out, that goes to a different domain that's SSLed, with
none of this bonkers image hosting.
Oh well, this still gets us a huge amount of information. It lets us see all
the visitors whose visit used the logged image hoster, and during the monitoring
period about 40 percent of our own visits used this logged image hoster. So I think we
saw about 40 percent of Eva's traffic. We saw when a product is viewed,
because we see the image requests, and the referrer URL tells us what product they
viewed.
And then we see when they add a product to the shopping cart. Because we
know what product page we're on, we know what product got added to the
shopping cart. The only thing we don't know is quantity, so we'll assume the
minimum. And we don't know if they actually purchased, so we will assume that
the ratio of people adding stuff to the cart is the same as the ratio of those who go on to
check out.
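The log analysis described above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the actual analysis code: the (client IP, referrer URL) record format, the `/product/<name>` and `/cart` URL layout, and treating one client IP as one visitor are all hypothetical. Quantity is taken as one, as in the talk, and `scale_to_all_traffic` applies the rough 40 percent coverage figure.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import urlparse

# HYPOTHETICAL URL layout: the talk does not say what Eva's storefront paths look like.
PRODUCT_PREFIX = "/product/"
CART_PATH = "/cart"


def product_of(referrer: str) -> Optional[str]:
    """Return the product name if the referrer is a product page, else None."""
    path = urlparse(referrer).path
    if path.startswith(PRODUCT_PREFIX) and len(path) > len(PRODUCT_PREFIX):
        return path[len(PRODUCT_PREFIX):]
    return None


def tally(records: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Count product views and cart adds from (client, referrer) pairs.

    `records` are image requests seen at the logged hoster, in time order.
    One page load usually triggers several image fetches with the same
    referrer, so only a change of referrer per client counts as a new page.
    """
    views: Counter = Counter()
    cart_adds: Counter = Counter()
    last_page: Dict[str, str] = {}     # client -> last referrer seen
    last_product: Dict[str, str] = {}  # client -> last product page seen
    for client, referrer in records:
        if last_page.get(client) == referrer:
            continue  # more images from the same page load
        last_page[client] = referrer
        product = product_of(referrer)
        if product is not None:
            views[product] += 1
            last_product[client] = product
        elif urlparse(referrer).path == CART_PATH and client in last_product:
            # Quantity is not visible in the log, so assume the minimum of one.
            cart_adds[last_product[client]] += 1
    return views, cart_adds


def scale_to_all_traffic(observed: float, coverage: float = 0.4) -> float:
    """Rough extrapolation: the logged hoster saw about 40% of visits."""
    return observed / coverage


log = [
    ("10.0.0.1", "http://shop.example/product/item_a"),
    ("10.0.0.1", "http://shop.example/product/item_a"),  # same page load
    ("10.0.0.2", "http://shop.example/product/item_b"),
    ("10.0.0.1", "http://shop.example/cart"),
]
views, cart_adds = tally(log)
```

Here `views` ends up with one view each for `item_a` and `item_b`, and `cart_adds` credits one cart add to `item_a`. Browser caching means some page views generate no requests at all, so counts like these are lower bounds before any scaling.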
>>: Seems like a business might pop up, or should have popped up by now,
around this: companies that supply that as an analytics service, you know what I
mean?
>> Nicholas Weaver: A few -- we have seen online pharmacies that use Google
Analytics.
>>: It's similar in many ways to that.
>> Nicholas Weaver: This was inadvertently using spammer analytics.
>>: I like that.
>> Nicholas Weaver: One of the things is Eva actually has two businesses.
Spam pharmacies and SEO pharmacies. We suspect this was only the spam
pharmacies, because if you're doing SEO you don't use the same domains as
your spam, since your spam domains do get blacklisted. With SEO you want the domains to last
longer. And 45 of the top 50 domains were in our spam feeds. So this is
almost certainly their spam-side business.
>>: I'm guessing SEO is search engine optimization?
>> Nicholas Weaver: Yes, because that's the other way that a lot of these
pharmacy sites work is various tricks to bump their page rank up.
>>: The domains --
>> Nicholas Weaver: Of the top 50 landing-page domains, so the different
landing pages visited, 45 were in our spam feed. So at least 45 are provably
spam-advertised. So this is what people browse from spam. And
of course the visitors come from all over the world. The obligatory pin map.
>>: Lots of visitors.
>> Nicholas Weaver: Lots of visitors.
>>: Five days.
>> Nicholas Weaver: Five days. Lots of products added to the cart for five days.
Almost all western countries: 91 percent are western countries, the U.S., Canada, the EU,
Australia, New Zealand, et cetera.
>>: Isn't that approximately the percentage of where GDP goes in this country?
>> Nicholas Weaver: Probably.
>>: And it must also have to do with language. The page is in English.
>> Nicholas Weaver: The pages aren't just in English. Eva has translation
buttons. They are in a lot of European languages as well.
>>: But not Chinese.
>> Nicholas Weaver: Not Chinese.
>>: A lot of yellow pins. Yellow pins has a lot, drop to the --
>> Nicholas Weaver: But China probably doesn't need this stuff; they are
probably better served buying locally. The U.S. is 75 percent of the business.
So this is actually greater than the U.S. share of GDP among the western
countries.
We will see that U.S. healthcare policy matters. We
categorize stuff into two categories. These are, by the way, Eva's categories:
we didn't do the categorization, they kindly provided it for us. Lifestyle drugs:
basically anything in the men's category. These are significantly cheaper than
western purchases.
You have Soma and Tramadol, most of the pain category. These are Schedule
IV substances, so it's mild scheduling. It's not like the Valiums or the
oxycodones. It's easier to get a prescription for these, but it's not trivial; it's stuff that
is watched.
>>: Are they real? Did you guys test them?
>> Nicholas Weaver: We only ordered over-the-counter pharmaceuticals. Down in
San Diego they tested one of the shipments of the antihistamines. It had the
active ingredient in the right quantity. We're not sure about the inactive
ingredients. We did not test, and do not plan on testing at this time, the Viagra, et
cetera, et cetera.
>>: But some other guys purchased and tested it and it had the right quantity?
>> Nicholas Weaver: Yeah. Part of the worry is the inert ingredients, the other
stuff, because you don't have the quality control you have out of a western drug
plant. We see some abuse-potential items. Human growth hormone: there's a
lot of human growth hormone. Most of the general health category is HGH.
>>: Most of this is drugs not necessarily bad but they're not for the intended use.
>> Nicholas Weaver: They're not for the intended use. They're for lifestyle
purposes.
>>: The other part of it is probably a big part about United States healthcare is --
>> Nicholas Weaver: Legitimate. We see antibiotics. Antibiotics is right here.
Antidepressants. Heart and blood pressure medication. Cholesterol medication.
Diabetes medication.
>>: Small numbers?
>> Nicholas Weaver: These add up percentage-wise. It's a long tail, but it
matters. In the EU and Canada, 92 percent of the sales were in one of these lifestyle
categories. In the U.S., only 67 percent of the stuff added to the shopping cart
was in the lifestyle categories. A good 33 percent was in the legitimate drug
categories.
We suspect --
>>: Healthcare failure?
>> Nicholas Weaver: We suspect that the healthcare failure is due to the
availability of prescriptions. We never checked overall pricing comparing Eva
with Wal-Mart or with drugstore.com. But I did check the price of amoxicillin. If
you buy amoxicillin from Eva, you're spending twice as much and it takes vastly
longer than getting it from drugstore.com, let alone from Wal-Mart. I think
amoxicillin is one of the drugs in their $10 generics, so you can just go down
to Wal-Mart and plunk down your ten bucks. But in order to plunk down your ten
bucks you need a prescription, and getting a prescription is 100 bucks if you don't
have health insurance.
So, yes, this is the social commentary part of our work.
>>: So do you feel that in this whole process the only illegal part is
compromising those servers, and the rest is sort of like a business?
>> Nicholas Weaver: But it is -- all the stuff without a
prescription that you're ordering online, that's illegal. A lot of the stuff is
specifically for abusive purposes. That's illegal.
The counterfeit software is a big business. I'm sure Microsoft doesn't think the
OEM sites are legal.
>>: But there are prescription sites, which you haven't examined, that will
actually do a digital Turk kind of prescription, where you can go and somebody
in Miami who is a doctor writes the prescription.
>> Nicholas Weaver: Those are legal.
>>: Makes it legal.
>> Nicholas Weaver: These are not. These are not doing that gating.
>>: But if we leave aside whether the goods are legal -- imagine buyers
know they're buying replicas -- does the merchant
or storefront owner know this is spam-generated traffic?
>> Nicholas Weaver: Yes.
>>: Because he knows he's not SEO.
>> Nicholas Weaver: These are affiliate programs. They are specifically paying
people to advertise through black-hat means. These are specifically criminal-enterprise
Web stores.
>>: So like that first page you showed, that one in Russian that said online
seminar, it was saying, hey, we're generating your spam leads for you.
>> Nicholas Weaver: No, you generate your spam leads for us and we provide
all this and we split the revenue.
>>: Here's how we turn your spam into --
>> Nicholas Weaver: Yes.
>>: So the affiliate program is only the middleman?
>> Nicholas Weaver: The affiliate program runs the storefront and handles all
the back-end stuff: the order fulfillment and the payment. So it's split in
two, and the spammer drives the traffic.
>>: So you're saying there's a bad guy who gets the spam, and then there the
bad guy who sells things quasi legal.
>>: So if I want to sell these drugs, I can go to the affiliate: send traffic to me and I'll pay
you?
>> Nicholas Weaver: No, that's not how the affiliate program works. The affiliate
program is paying for the traffic and is the one that's doing the sales.
>>: Was there any spam going through legitimate affiliate programs such as
Amazon sorts of programs?
>> Nicholas Weaver: No, not this stuff. That stuff if you get caught spamming
for an Amazon affiliate program they'll smash you, they don't like it.
>>: There's lots of attention from the pharma industry and also law enforcement
and the [inaudible] down the --
>> Nicholas Weaver: That's why they're also in Russia.
>>: Wonder if you could make money by spamming books on Amazon or random
products on Amazon.
>>: You can't give away books.
>>: Question to you about this, for some of the products in some of the countries
are legal; is that true?
>> Nicholas Weaver: Some of the products in some of the countries are legal.
But they are not cost-effective when they are legal. We ordered -- we paid five
times the market rate for our Zoloft compared with what we'd pay down at
Costco.
>>: The Viagra, they sent for 1.5 and this is for like 16 --
>> Nicholas Weaver: That's because of patent issues. That stuff is illegal to
import even if you have a prescription, because of Pfizer's patents.
>>: So who is actually breaking the law? Is it the purchaser or is --
>> Nicholas Weaver: The storefront and the purchaser and the spammer.
Basically the purchase is illegal, but it's not enforced against the purchaser. It's illegal but
very hard to enforce against the storefront, because these guys are all in Russia.
>>: Illegal under U.S. law?
>> Nicholas Weaver: Yes.
>>: Versus something done in Russia.
>> Nicholas Weaver: Yes.
>>: You're saying that.
>>: There's no law in Russia anyway.
>>: You're saying it's still economical for them to do it that way?
>> Nicholas Weaver: No, that's just how they do it. Because, well, this is how
they make money. And the net result is it's not that big a business. Eva is the
biggest, and it's still only $2.4 million a month in gross revenue, of which about half
goes to the spammers. Some goes to the product that actually gets delivered
and the cost of the infrastructure, and the rest goes to the ones running the
affiliate program. Which means the take-away is that this is actually a pretty small
business. The gross industry-wide business is $100 million a year in gross revenue.
Not net. Gross. Microsoft will happily kill a $100 million a year gross
business for being totally useless. The Zune was way more than this when
Microsoft killed it.
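As a back-of-envelope check, taking the quoted figures at face value, the arithmetic below annualizes Eva's gross and compares it with the industry-wide estimate. The only inputs are the talk's $2.4 million a month, "about half" to spammers (modeled as exactly 0.5, an assumption), and $100 million a year; nothing else is measured here.

```python
# Figures quoted in the talk; SPAMMER_SHARE models "about half" as exactly 0.5.
EVA_GROSS_PER_MONTH = 2_400_000
SPAMMER_SHARE = 0.5
INDUSTRY_GROSS_PER_YEAR = 100_000_000

eva_gross_per_year = EVA_GROSS_PER_MONTH * 12                # 28,800,000
to_spammers_per_month = EVA_GROSS_PER_MONTH * SPAMMER_SHARE  # 1,200,000.0
eva_share_of_industry = eva_gross_per_year / INDUSTRY_GROSS_PER_YEAR

print(f"Eva gross per year: ${eva_gross_per_year:,}")
print(f"To spammers per month: ${to_spammers_per_month:,.0f}")
print(f"Eva share of industry gross: {eva_share_of_industry:.1%}")
```

On these numbers the biggest program alone would be a bit under 30 percent of the industry's gross, which is consistent with the point that the whole business is small.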
>>: Compared with Microsoft, but maybe for these people it's --
>> Nicholas Weaver: That's the thing. It's actually not that many players. 30
players, 30 affiliate programs, is not all that many. It's a pretty small business with a
low barrier to entry. You just have to be in Russia, break the law, and keep
your head down, because if you try to get too high-profile you might get into
arguments with the other ones.
And currently the big drama, if you're reading Krebs, is that two of the big affiliate
programs that used to be partners aren't anymore. They've each been
bribing the cops to go after the other. And right now the cops are going after
both of them. [laughter]
>>: In Russia.
>> Nicholas Weaver: In Russia.
>>: How do you know about this?
>> Nicholas Weaver: Krebs does a lot of reporting on this. Brian Krebs is really
fun. But the other thing is this is heavily driven by U.S. money. On the pharma
side, three-quarters is driven by U.S. money. So if we get the U.S. people to
stop buying or, more precisely, get the banks to not do card-not-present
pharmaceutical transactions to Azerbaijan banks, you'd cut off the money supply.
And depending on the wild-ass guess technique that you use, you get numbers
for Eva ranging from about $1.3 to $2.7 million a month in gross revenue. So it's not
pocket change, but it's not all that much money given all the damage that was
done, because this represents 40 percent of the spam.
>>: What was your contribution to these numbers?
>> Nicholas Weaver: I was the one who figured out, did the Eva analytics.
>>: The money.
>> Nicholas Weaver: The money we spent: we spent less than a couple
thousand dollars total. There was a lot of work in minimizing harm. So the
amount we purchased from any given affiliate program was in the
couple-hundred-dollar range. The products are all unused. They are in a locked room
for the most part, and they will be destroyed eventually. On the pharmaceutical side,
everything was over-the-counter or herbal. All the software was stuff we had site
licenses for.
So there was a lot of harm reduction in constructing the experiment. Because
one of the things we did not want to do is add a measurable volume to their traffic
or to their purchases in the act of measuring them.
>>: What we don't know is how big this quasi legal business is compared to
pornography and gambling.
>> Nicholas Weaver: Right.
>>: The other side is you don't know how else the spammer, the botnet owner, is
generating revenue off his botnet.
>> Nicholas Weaver: One of the other things is pay per install. Some of the
other folks at Berkeley and UCSD are working on the pay-per-install side. We're
also looking at other ways of calculating how much --
>>: Pay per install, sorry?
>> Nicholas Weaver: Oh, I want a botnet. I'm not going to actually go out and
compromise machines. I can pay somebody else to run my binary on the
machines they compromise.
>>: Hmm.
>> Nicholas Weaver: As a bot master, I don't even need to go out and
compromise machines to send my spam. I just pay for them and send spam
from them.
>>: A new middleware guy?
>> Nicholas Weaver: Yes.
>>: There must be more income in this business than 100 million a year if you're
seeing middlemen come up between --
>> Nicholas Weaver: Not necessarily.
>>: The bot herder and the affiliate program.
>> Nicholas Weaver: There may be some, but the other monetization channels
have to be independent, because pay per install is just a service, and it
sort of represents how the market has fragmented and specialized.
The other thing is, $100 million a year may not be much in the grand scheme of
businesses, but if I as a service vendor can get 0.1 percent of that, $100,000 a year,
and I'm living in Russia or India, et cetera, that's a lot of money for me. So you
get a lot of this race-to-the-bottom stuff going on, simply because the barrier
to entry is relatively low.
>>: Do you have a number that says how many compromised machines
generate that 100 million number?
>> Nicholas Weaver: No, I really don't.
>>: Do you look at the SEO spam and other things to see if the merchant's trail
leads to the same bank?
>> Nicholas Weaver: We haven't yet publicly, but I believe some of the
folks down at San Diego are.
>>: What are the other factors you're looking at, social networks...
>> Nicholas Weaver: There has been a lot of work. Chris Grier has been
looking at Twitter spam. Facebook is a bit harder because you can't get a good
feed of spam from Facebook. Facebook's privacy policy is if you're Facebook
they see everything and make your life a living hell but if you're somebody else
they don't let you get at that information. So it's really hard to find Facebook
spam.
>>: Back to Microsoft.
>>: Do you want to estimate how much bigger this figure is if you include
search engine optimization and maybe other things? Those guys are selling not
only through e-mail spam but --
>> Nicholas Weaver: It's unclear. So, for example, Eva the figure may actually
include the SEO revenue as well. Because if the order numbers are the same, if
it's the same order space for their SEO as their spam then it's capturing their
SEO revenue as well. We don't know.
>>: You mentioned pay per install. About how long do you think such an install
lasts? Will it take over the whole machine, or just take over enough to spam and
go away?
>> Nicholas Weaver: Actually, if you look at Chris Grier's paper -- I didn't do this
pay-per-install stuff, Chris Grier did -- what happens is you have droppers that drop
stuff that drops stuff. So what will actually happen is the same machine will be
sold to multiple individuals, because with pay per install you pay for
being installed, but other people's stuff gets installed too, including other pay-per-install
clients for other pay-per-install networks. It wouldn't surprise me if
some black hat somewhere is doing arbitrage: it costs X dollars to buy an install
from one network, but I can get Y dollars, greater than X, for doing installs for
these other two or three networks, so my pay-per-install
payload is something that installs three other pay-per-install clients. I bet that
that happens, because that's what I'd be doing if I was one of those black hats.
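The arbitrage speculated about above reduces to a simple inequality: buying one install for X and using it to install clients for other networks paying Y_1 through Y_k is profitable when the sum of the Y_i exceeds X. The sketch below encodes just that. The rates are hypothetical, kept in integer cents to avoid floating-point rounding, and the scenario itself is the speaker's conjecture rather than a measurement.

```python
from typing import Sequence


def arbitrage_profit(cost_per_install: int, payouts: Sequence[int]) -> int:
    """Per-machine profit for a hypothetical PPI middleman.

    The middleman pays `cost_per_install` to get its payload onto a machine,
    and that payload installs clients for other PPI networks, each of which
    pays its own per-install rate (the entries of `payouts`).
    """
    return sum(payouts) - cost_per_install


def is_profitable(cost_per_install: int, payouts: Sequence[int]) -> bool:
    """Arbitrage works only when the combined payouts exceed the cost X."""
    return arbitrage_profit(cost_per_install, payouts) > 0


# HYPOTHETICAL rates in cents: pay 10 per install, get 5 from each of three networks.
print(arbitrage_profit(10, [5, 5, 5]))  # 5
print(is_profitable(10, [5, 5]))        # False (break-even)
```

The same machine can then be counted as an "install" by several networks at once, which is one reason install counts are a poor proxy for distinct compromised machines.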
So in conclusion, actually, it's possible to understand the economic and technical
infrastructure, financial processing is a weak point, and the business is relatively
small for the damage done.
This is a great example for anybody writing an econ textbook on externalities.
Because spam doesn't hurt the spammers.
>>: Did you just -- I mean, you got the 30 merchants shut down. How long were
they disrupted?
>> Nicholas Weaver: They were disrupted for about a week. It takes a while to get
new merchant accounts up. And there are people --
>>: You expected about $2 million?
>> Nicholas Weaver: There are people who are -- but on the other hand they're
also being affected by the Russian cops right now because as I mentioned the
two biggest players are at each other's throats. It's quite amusing to read Krebs
today.
>>: Do they mention you guys?
>> Nicholas Weaver: No.
>>: Do they get their accounts back with the same banks?
>> Nicholas Weaver: No, different banks.
>>: Do they know about you? Do these affiliates --
>>: They have your home address. [laughter]
>> Nicholas Weaver: They have bigger problems right now with their internal
infighting.
>> Helen Wang: Any last questions? There's one more question here?
>>: Do you have any good estimate on exactly how much spam is costing the
world? Because whenever I discuss spam with other people most people don't
think it's a problem anymore.
>> Nicholas Weaver: I have no clue. The problem is that any figures you see
are done by anti-spam companies whose interest is in inflating the cost as much
as possible, because that's how they justify making money.
>>: Yeah.
>> Helen Wang: Thank you, Nick.
[applause]