>> Helen Wang: It's my great pleasure to have Nick Weaver from Berkeley ICSI Research
Institute. Nick's institute, as well as a UCSD research group, have been working together on
uncovering the underground economy, as well as a bunch of other network security work which has
made a really significant impact.
So I'm really happy that Nick is here with us and telling us about two pieces of work. So there are
two talks. The first talk is about Netalyzer, and he has some very exciting stories about some
ISPs. I will not spoil them; you will hear all the exciting stories from Nick firsthand.
The next talk is about the spam underground economy. Okay. Thank you, Nick.
>> Nicholas Weaver: Thank you very much.
The way I figured it, if I'm flying up to Microsoft, I might as well give two talks rather than one.
Also, it gave me an excuse to prepare the second talk.
This first one I've given in a couple of places because Netalyzer is this big network measurement
tool that we've built at ICSI. It has been widely successful at this point as network measurement
tools go.
How many people have run it before? Well, if you haven't, you probably want to run it on your
own network just to find out what exactly is broken. If you have, run it again, because we keep
adding new tests. Also it's really fun to run on random hot spots.
So this is joint work with Christian, Boris, Gregor and Vern (Boris while he was at ICSI as a
visitor), and the work is sponsored by the National Science Foundation, with additional support
from Google and Comcast, as well as EC2 time from Amazon. Again, opinions are my own and
not those of the sponsors or co-authors.
So let's start off with an anecdote of how Web search is supposed to work for people. They type
something into the browser's search bar. In this case it's Internet Explorer 8 and the browser
goes, okay, ask the DNS resolver what's the address of the search engine, which by default is
Bing. I'm using this example not because I'm at Microsoft but because this is what actually
happens for most people.
The result comes back. The request goes to the Web server. And the reply goes. However, we
saw something interesting for about one to two percent of Netalyzer sessions.
So not a huge number, but enough to be really significant. And I will get back to what this was
later. But as a teaser, the ISP had installed some sort of DNS appliance in their network, and
some third party, for purposes then unknown, was running a proxy server.
Again, the user does a query. The DNS resolver returns the address of the local Akamai node
that's serving Bing, and the appliance in between the resolver and the user goes, oh, wait, this is
a request for Bing, I'm going to change it to the address of a proxy server. The request goes to
the proxy server and, in this example, nothing happens.
Request gets forwarded to Bing. Results come back. And, well, nothing happens. The proxy
forwards it on and the user has no understanding of what happened.
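To make the kind of check this implies concrete, here is a rough sketch (not Netalyzer's code) that compares what the locally configured resolver says for the search engine's name against an outside reference resolver; the dnspython dependency and the use of Google Public DNS (8.8.8.8) as the reference are assumptions, and CDN answers legitimately differ by location, so a mismatch is only a hint to dig further:

```python
import dns.resolver   # third-party "dnspython" package (assumed available)

# Compare the local resolver's answer for the search engine's name against an
# outside reference resolver. Disjoint answers are a hint, not proof, since
# CDNs legitimately return different addresses by location.
def lookup(name, nameserver=None):
    res = dns.resolver.Resolver()
    if nameserver:
        res.nameservers = [nameserver]
    return {rr.address for rr in res.resolve(name, "A")}

local = lookup("www.bing.com")
reference = lookup("www.bing.com", "8.8.8.8")
print("local resolver:    ", sorted(local))
print("reference resolver:", sorted(reference))
if not (local & reference):
    print("no overlap; worth checking who actually owns the locally returned address")
```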
This is just one example of how the network really works in practice for people. And it runs up
against this lovely theoretical view of the Internet we have, where we've got our server over on
that side and the user on this side, and traffic goes back and forth.
But in reality, the path packets might take might as well be Napoleon's march on Moscow. We
have fragments that are just not allowed by the network these days.
We have hidden proxies on various networks for DNS and http and other protocols. We have this
problem called buffer bloat where the network can do great file transfers or do interactive tasks
but if you try to do both at the same time, you're in a world of pain.
We have this problem of NXDOMAIN wildcarding. Anybody who thinks that an error should be
properly reported back to the computer doing the lookup is unfortunately disagreeing with most of
the ISPs out there. And we have this situation where ISPs were man-in-the-middling search sites
for, up until a month ago, completely unknown reasons.
And, finally, this lovely gentleman created the home gateway. The home gateway or NAT is an
abomination unto God and man. No fragments. Broken DNS resolvers. FTP proxies. SIP
proxies. If a NAT can break the network in a new and interesting way, there's at least one vendor
whose NAT breaks the network in that particular interesting way.
I think the way to put it, all NATs are broken, all NATs are broken differently. So we wanted a tool
to know what the network actually is.
So we wanted two roles. An easy-to-use network survey for anybody. Basically the motto is so
easy my mother can run it. And indeed it is actually easy enough that I can have my mother run
this tool.
>>: Does your mother install an applet, a Java applet?
>> Nicholas Weaver: She doesn't have to install the applet; it's two mouse clicks for the typical
user. She's running a Mac. Windows XP has Java installed, and Windows XP is still your biggest
market share, so for all those users it's two mouse clicks.
As well as a detailed diagnostic tool for experts, including what we colloquially call mom-mode
debugging: tell your mother to run Netalyzer, have her send you the results, and now you know all
the different ways that her network is broken.
And so we targeted the Web browser with just two mouse clicks, and we did it in Java rather than
JavaScript, because JavaScript doesn't allow us to do raw traffic, and we didn't do it in Flash
because Flash gives us headaches.
But there's a lot of sneakiness in our tests --
>>: Why would Flash give you headaches?
>> Nicholas Weaver: I've just looked at it, and their same-origin policy is weird. One of the nice
things about Java is it has a blame-the-user bypass of same origin.
So our architecture is rather distributed. We built a cloud-based service. So our front end Web
server and DNS server are running at ICSI. All our servers are custom; we wrote them ourselves
because they fold, spindle and mutilate protocols in interesting ways.
Then the test runs on an EC2 node. We right now tend to run five EC2 nodes at a time, but we've
run as many as 20 when we're expecting a flash crowd. And this probes the network for
everything.
And then a result page is delivered. And then we fetch the result back. And now I try to do the
bold thing of a live demo on the network. So let's get on the guest network.
This doesn't count in the mouse clicks, okay?
>>: Well, while that's happening, let me ask you something. A couple of weeks ago I was at a hot
spot and I tried to do a search from the browser just as you described. And instead of some
hidden proxy, what actually happened was that instead of getting a Bing result I got a
result from some other search engine. It redirected, hijacked me away from Bing to another
search engine.
>> Nicholas Weaver: That's interesting. We've contemplated adding in a test for that, direct to
our server, because if it's done using DNS, Netalyzer would capture it; if it's done using direct
injection we don't currently capture it. But we do check for 404 rewriting, which a couple of ISPs are
doing, where the ISP looks at the 404 page and rewrites it.
So this is the basic user experience. They see our Web page. It's got a big start analysis button.
Although, if we ask my sister who is doing marketing stuff, it's not big enough. They click the start
analysis button. It gets redirected to a random back end node and there's a Java signed applet
dialogue, that's mouse click No. 2. If they click deny, most tests still run just not all of them.
And then it starts running, and the results display after way too much time. And the guest network
here is relatively clean. There's a little bit of port filtering, namely the Windows ports. Yes. Even
Microsoft is unwilling to allow the Windows ports onto the Internet.
And DNS passes through a firewall that's protocol-aware. And there are a couple of minor bugs
in the DNS server; basically it's not doing failover to TCP right.
Common, annoying, doesn't really matter until you want to start doing DNSSEC. When you start
doing DNSSEC, this may be a problem. So back to the talk. Yes, it works well enough that I'm
willing to do live demos in all my talks.
So to give you an idea of the test suite, this is actually most of the test suite; we've added in 404
rewriting detection, we've added in some more DNS probing, we've added in UPnP queries.
And it really depends on the program committee how they react to the paper. The IMC program
committee loved our paper.
The SIGCOMM program committee said this reads like a data dump. One thing we have is lots of
biases. Netalyzer is run by geeks: far more Linux, far more Firefox than you'd ever expect in
the real world, far more OpenDNS than you'd expect in the real world.
We have a tendency towards flash crowds. So our initial flash crowd was biased towards
Comcast users. We recently received 20,000 visitors from Germany, because Heise
provided a German translation and a link on their page, and as a result a lot of German geeks
went, oh, this is a really cool tool.
We've participated in at least two IPv6 deployment tests: Tore Anderson's trial, where if users had
problems it pointed them at Netalyzer, and Comcast's IPv6 deployment trial. And one online game has
decided, through no doing of our own, that Netalyzer is a cool debugging tool. And this is a
quote from their FAQ: none of this worked, my Internet is pwned -- and the answer is run
Netalyzer, post the results in the forum, and somebody who is a geek in the forum will actually be able
to find out what's going wrong.
We didn't intend this, but it shows just how usable this is for mom-level debugging. So one thing
we do is we really exploit Java's same-origin policy. If the signature is rejected, we are confined to
same origin, and we wanted as many tests to run as possible within that. Fortunately, Java same
origin is amazingly permissive from our point of view.
We can contact our own origin server on any port using raw TCP and UDP as well as http
through the Java API. No objections. It allows us to do whatever we want. And since we have
raw TCP and UDP we can do anything that's network layer and above. We can play games with
protocols, we can do direct DNS, all sorts. We can perform any DNS lookup, but the Java VM
enforces same origin on DNS, sort of.
If the answer is the origin server's IP address, that answer is returned. Otherwise it raises a
security exception: this request would violate same origin. You aren't supposed to know about
any other IP addresses.
But this allows us to do Boolean property tests on the DNS infrastructure. You do a query; it either
returns the server's IP address or an error. So as long as it's something that's a Boolean property
of the resolver itself, we can test it without violating same origin.
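A minimal sketch of that Boolean-property trick (not Netalyzer's actual code): the client only learns whether a specially constructed name resolved to the origin server, and the cooperating authoritative server answers each test name with the origin's address only when the property being probed holds. The origin address and test names below are hypothetical placeholders.

```python
import socket

ORIGIN_IP = "192.0.2.10"               # assumed address of the origin server

def resolver_has_property(test_name: str) -> bool:
    """True if the user's resolver exhibits the property encoded in test_name."""
    try:
        answer = socket.gethostbyname(test_name)
    except socket.gaierror:
        return False                   # NXDOMAIN / SERVFAIL -> property absent
    return answer == ORIGIN_IP         # origin's address -> property present

for name in ("edns-ok.test.example.net", "tcp-retry-ok.test.example.net"):
    print(name, resolver_has_property(name))
```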
However, if the signature is accepted, Java sandboxing is gone. Java believes in a
blame-the-user sandbox. There's one click to bypass the sandbox. And if the applet does
something wrong --
>>: What are examples of the Boolean tests?
>> Nicholas Weaver: For example, how is glue policy handled? Does the DNS resolver accept
arbitrary in-bailiwick glue records, or does it accept in-bailiwick glue records only when they're in
the NS RRset? Whether the resolver validates DNSSEC is one that we will be able to do. Can the
resolver fetch large responses over TCP? For that one, we send back a truncated reply with a
false value; if it just caches that, it can't, and when it retries with TCP we send the true value.
So this allows a large number of tests to be done.
>>: Who accepts this, if you're using Java --
>> Nicholas Weaver: Almost everybody.
>>: The user?
>> Nicholas Weaver: The user.
>>: So that's another click?
>> Nicholas Weaver: Yes, that's click two.
>>: Allowing deny click?
>> Nicholas Weaver: Yes. And I hate to admit it, but there are basically UI bugs involved that we
are inadvertently but happily exploiting. The Java dialogue actually is fairly reasonable for the
Mac. It's not great. But it says the applet is requesting access to your computer. Deny/allow.
Deny means run within same origin. We can't touch the disk. We can't touch the computer.
So it could be much clearer, but it's okay. It's great for our purposes. It's kind of okay from a user
education standpoint. Sun's dialogue box for Windows, however: what does this mean? This
application's digital signature cannot be verified. It can't be verified because I'm running a debug
build, so I'm running with a nonverified self-signed certificate in this case.
This is its dialogue if it has no clue where the code's coming from: run or cancel. Do you know
what cancel means? Run with same origin. Run means run bypassing same origin. This is a
textbook example of how not to design a dialogue box to educate users. But it's great for us
because it means everybody bypasses same origin.
So the first result of interest is fragments and path MTU discovery. We test fragments by sending
a 2,000 byte UDP datagram; it should just be fragmented by the network. Ones sent from our
server are sent without the DF bit set, so the network should fragment them and the client should
reassemble, et cetera. Eight percent can't send fragments; eight percent can't receive fragments.
So basically fragmentation to the client does not work on the modern Internet. This is due to
stateless firewalls and NATs. It's easy to make a stateless firewall break all fragments. People
do it all the time.
Those who can send fragments may have an MTU hole. Three percent of the sessions can
send a 2,000 byte UDP datagram but cannot send a 1500 byte UDP datagram. This is largely
due to, we'll get to it in a bit, PPP over Ethernet, which is still distressingly common.
>>: That doesn't exactly match what's on the slide. Did you mean three percent of all sessions can
send 2,000 byte frames, or three percent of those which can send fragments?
>> Nicholas Weaver: Three percent of the sessions which can send fragments can't send a
1500 byte datagram, a size that would only need fragmenting on something other than an Ethernet
network. So I hope that's precise enough. Basically they don't have a full 1500 byte Ethernet MTU and
something keeps fragments from working.
Java does not allow us to send raw packets, but our servers can. So we have a UDP
echo server that reads and writes raw UDP packets. It can return the size of the initial fragment
received, so we can see what the fragmentation is. It can send packets of a specific size with the
DF bit set and report the ICMP too-big replies. It basically supports traceroute functionality from
the server to the client with path MTU discovery, so we can figure out where the path MTU bottleneck is.
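Here is a sketch of just the client side of the fragmentation test, under the assumption of a cooperating UDP echo server at a made-up address; it only checks whether a datagram big enough to need IP fragmentation makes it through and back.

```python
import socket

ECHO_SERVER = ("probe.example.net", 1948)   # hypothetical stand-in for the custom server

def datagram_arrives(payload_size: int, timeout: float = 3.0) -> bool:
    """Send a UDP datagram of the given size and see whether any reply comes back."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(b"\x00" * payload_size, ECHO_SERVER)   # >1500 bytes -> IP fragmentation
        sock.recvfrom(4096)
        return True
    except socket.timeout:
        return False                                       # lost: fragments likely dropped
    finally:
        sock.close()

print("2,000 byte (fragmented) datagram gets through:", datagram_arrives(2000))
print("1,500 byte datagram gets through:", datagram_arrives(1500))
```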
And the network's mostly but not all ethernet. There's this great quote "I don't know what the
physical layer 20 years from now will look like but it will be called ethernet and have a 1500 byte
MTU." Well, at least for right now. But there's a lot of PPP over ethernet still out there. There's a
lot of old DSL modems and they're configured with PPPoE.
>>: What's the denominator for all this, how many sessions?
>> Nicholas Weaver: What do you mean? This is for our whole user base of 300,000 plus sessions.
So this is a good representative survey of the Internet, modulo the biases. We would have hoped
that with the number of geeks involved this would be lower, because geeks are willing to pay for
better Internet and upgrade their modems.
So that's kind of distressing. In these cases, where you should get an ICMP too-big sent, only
60 percent sent them. So path MTU discovery for IPv4 has been decreed broken on the Internet.
It doesn't work. 40 percent of the time you never get the ICMP too-big. And basically path MTU
discovery must use a fallback when ICMP isn't received.
Oh, and you guys don't have this problem, but we do. Linux uses path MTU discovery on UDP
by default and sets the DF bit on all UDP datagrams, so they aren't fragmented by the network
driver.
Don't ask me why.
>>: Question: what is eight percent of sessions? So this is drawn from how many hosts?
>> Nicholas Weaver: About 250,000 hosts or so. Over 200,000 plus.
>>: Okay.
>>: So I have a question. How did you decide to look into -- [inaudible] among the characteristics,
what motivated you to look at these two?
>> Nicholas Weaver: Basically we were motivated by looking at problems that we've personally
experienced. We've personally experienced a problem where in the middle of the network there
was a path MTU hole and that broke SSH connections from Germany to ICSI but not from
Germany to Comcast to ICSI because it was in the middle of the network. So that's how we
chose that.
So proxies are another thing that we see way too many of. They're delicate things. Proxies
assume that the traffic conforms to the protocol. But we control both ends, because for
everything but http and DNS we're connecting to a custom server on our end over a custom
protocol. Namely, you connect and it sends back the IP and port number.
So this lets us see all of the port filtering policy. If the response is received as expected, we don't
detect filtering. There could be a smarter proxy that goes, oh, this is just somebody playing
games, I'm going to allow it.
If the IP address is changed, this says either this protocol is redirected through a different
computer, or the user has a load balancing network connection between multiple IP addresses or
the user started Netalyzer, closed it and opened it elsewhere, which is why these days we do
tests in order. So we'll see that.
If the connection fails, whether you connect and get no data back or the SYN fails to connect
altogether, then the port is blocked somewhere in the network. But the interesting case is when
the connection succeeds and different data is returned: it passed through a proxy that declared
itself, because the proxy sent back something that was unexpected to us.
And if the connection succeeds but no data is returned -- [cell phone] -- sorry. Then the request or
response was blocked by a network device that was probably protocol-aware: something in the
network goes, oh, this is a protocol violation, stop it. That's actually quite common.
So, a few surprises. POP proxies are amazingly common. Seven percent of sessions reject our
protocol violation and another six percent actually send us a proxy banner. And it's almost
invariably local host antivirus: the local host antivirus tries to block mail worms by intercepting
POP connections. They don't actually end up intercepting IMAP, SSL or any of those,
so it's questionable how well it works in practice.
But we see a lot of it. NATs do FTP proxies for those who remember networking 101. FTP is
broken and bass ackwards. So the data connection comes from the server to the client. So most
NATs or a large number of NATs just don't bother doing port forwarding and parsing the protocol.
They just run a proxy.
>>: Is the NAT, the network address translation --
>> Nicholas Weaver: The home gateways.
>>: Okay. The --
>> Nicholas Weaver: Sorry. I capitalized the S. SIP-aware devices are amazingly common.
>>: These local antivirus programs, [inaudible] to the client, how do you know? Do you have
a screen capture?
>> Nicholas Weaver: We don't have screen capture. We have the banner that it returns. And
often the banner says AVG POP proxy. So the data coming back is specifically saying what the
product is.
SIP-aware network devices are surprisingly common. This is a big headache for those doing
VoIP work. We saw less SMTP filtering than we expected. Only about 25 percent blocked SMTP
requests to our server and eight percent rejected the protocol violation, which basically suggests
that many ISPs are doing port 25 filtering dynamically.
They're looking at the network. If you have a little port 25, they allow it. If you have too much,
then you're blocked. This is a compromise. This keeps the spam bots from spamming but keeps
the support calls down. Expected results, port 443 is almost completely unmolested. Run SSH
server on port 443 back at your home institution. You will thank yourself when you're on odd ball
networks. And the Windows port blocking is very common.
And this was distressing: UDP port 1434, the Slammer worm port from, what, eight years ago. That
single UDP port, which is in the ephemeral range, so you can have stuff randomly assigned to use
that port, is still blocked by 20, 30 percent of the net. It's kind of distressingly high.
DNS is commonly firewalled. 99 percent have DNS access; that is, they can seem to contact our
server. But of those, 10 percent failed to fetch non-DNS over port 53, which says there is some
device in the network path that's going, oh, this doesn't look like DNS, kill it. Which means you
had better hope that, as you deploy DNSSEC, this device actually understands DNS well enough
to not kill your legitimate DNS traffic. DNS is somewhat proxied: 1.4 percent have DNS requests that
are sent from the client direct to a remote machine redirected through some other system.
In this case I've seen NATs where, if you don't set the recursion desired bit, they will actually even
block those connections. DNS filtering on the wire: you see that 10 percent number. A lot of
these devices, as we've already shown, don't actually understand the DNS protocol. Quad A records
get blocked. TXT records get blocked. Unknown resource records get blocked. EDNS gets
blocked. Oh, you wanted to use DNSSEC to validate certificates so you don't have to deal with
DigiNotar? Sorry, you are out of luck. The network has decreed DNSSEC shall
not work for one to two percent of users. This is actually a big problem for those who want to use
DNSSEC to reduce a lot of the certificate problems.
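A sketch of the "non-DNS over port 53" check mentioned above, under the assumption of a cooperating echo service at a made-up address (not a public server): if something clearly not DNS never comes back, a protocol-aware box is filtering port 53.

```python
import socket

DNS_TEST_SERVER = ("probe.example.net", 53)   # hypothetical cooperating echo endpoint

def non_dns_on_port_53_passes(timeout: float = 3.0) -> bool:
    """Send something that is clearly not a DNS message over UDP/53 and see if it echoes back."""
    probe = b"this is not a DNS packet"
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(probe, DNS_TEST_SERVER)
        reply, _ = sock.recvfrom(512)
        return reply == probe
    except socket.timeout:
        return False      # something in the path only lets real-looking DNS through
    finally:
        sock.close()

print("raw (non-DNS) traffic on port 53 passes:", non_dns_on_port_53_passes())
```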
Http proxies: well, we took advantage of the custom http server, so we use known headers with
random capitalization, because any proxy that resynthesizes headers will change the capitalization.
If you see the capitalization of just the Connection header changed, that may be a censorware
proxy; at least one vendor changes just that header's capitalization.
Http proxies might not handle unknown requests. They may have a bug, now two or three
years old in this field, where the proxy follows the Host field. About five percent plus show evidence of
some http proxy, so it's significant. We also detect caches, because a hidden cache must cache.
So basically, having a custom Web server, we can lie about the cache control headers, and so the
first time you fetch an image it shows up like this, and the second time you fetch the image it shows
up like this. Same size, different MD5.
Known size. Four percent of IPs show in-path http caching, which would be fine and good, but
50 percent of the caches are broken: 50 percent of the caches cache data where the server says,
any cache that may exist in the network, don't cache this, please, and they cache it
anyway.
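A minimal sketch of the double-fetch check: the URL is a hypothetical image that a cooperating server marks uncacheable and regenerates on every request, so identical bodies on two fetches mean something in the path is caching anyway.

```python
import hashlib
import urllib.request

TEST_URL = "http://probe.example.net/cache-test.png"   # made-up cooperating endpoint

def body_digest(url: str) -> str:
    """Fetch the body and hash it so two responses can be compared cheaply."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return hashlib.md5(resp.read()).hexdigest()

first, second = body_digest(TEST_URL), body_digest(TEST_URL)
if first == second:
    print("identical bodies despite no-cache headers: a broken in-path cache is answering")
else:
    print("bodies differ: no misbehaving cache detected on this path")
```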
Hidden filtering must filter. About ten percent of hosts have virus filters that are running on the
end host itself. We test with the EICAR test virus.
>>: What directives are you using for don't-cache? Are you using all of them?
>> Nicholas Weaver: We do weakly, the kindly-sir-don't-cache kind, and strongly, everything that
says don't cache this thing!
>>: So 50 percent is even with the strong one?
>> Nicholas Weaver: A lot of this is with weak. A lot of what is weakly uncacheable will still get
cached, and we do see a fair number where strongly uncacheable will still get cached. Like, during
development we had problems where the MSF network would change the connection from close to
keep-alive and we weren't handling that. And it would cache things that were explicitly marked as, oh,
my God, do not cache.
>>: Are these ISPs?
>> Nicholas Weaver: Usually ISPs in foreign countries where bandwidth costs more, or corporate
networks.
>>: A similar question. So basically four percent of IPs is, like, more than 10,000 IPs.
Did you try looking into the cities, into a particular area?
>> Nicholas Weaver: We haven't looked into geography beyond just sort of the country level.
Certain countries are much more prevalent than others; like New Zealand, you see a lot of
Web caches.
>>: So what would be your hunch, that it will be a few ISPs or --
>> Nicholas Weaver: It's very, very rare among U.S. ISPs. It is much more common among
foreign ISPs, where they have high transit costs.
>>: Those undesired caches, if the client sends a reload every day or an If-Modified-Since, I would
assume that they honor that and then fetch it for --
>> Nicholas Weaver: We don't check for that. But I would not bet on it, since they were caching
stuff that was marked by the server as oh my God do not cache, so the browser itself
should always be generating new requests.
Another problem we see is big buffers. They're all over the place. They're in access devices.
They're in network drivers. They're in wireless access cards. Basically, everywhere there's a
buffer, the odds are somebody involved in building it thought bigger was better.
And so what we do is we just do a UDP hammer: a max-rate UDP flow. We see how much
the latency increases, and then we go the other direction and see what happens that way. And it's
just an endemic problem.
So this is what we call the graph of pain. Basically, it plots the upload bandwidth against the inferred
buffer capacity based on the delay and packet size. And we get nice vertical striations suggesting
that we're getting realistic measurements of buffers, and nice horizontal striations that
indicate common service tiers.
Anything right of the green line means that under stress the user experiences a half second of
latency; anything right of the red line means that under stress you can induce two seconds of
latency into the user's connection.
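The inference itself is simple arithmetic: the queued data at the bottleneck is roughly the achieved sending rate multiplied by the added delay. A back-of-the-envelope sketch (my own illustration, not Netalyzer's code):

```python
# If a full-rate UDP flood pushes round-trip latency up by added_delay_ms while
# achieving rate_kbps through the bottleneck, the buffer in front of that
# bottleneck holds roughly rate x delay bytes of queued data.
def inferred_buffer_bytes(rate_kbps: float, added_delay_ms: float) -> float:
    bytes_per_second = rate_kbps * 1000 / 8
    return bytes_per_second * (added_delay_ms / 1000.0)

# Example: a 1 Mbps uplink whose latency swells by two seconds under load
# implies roughly a 250 KB buffer at the bottleneck.
print(inferred_buffer_bytes(1000, 2000))   # -> 250000.0
```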
>>: Can you elaborate a little bit why the big buffers are bad?
>> Nicholas Weaver: Okay. I'm a full-rate TCP flow. I fill up the full buffer, and I'll alternate
between full and half full with the sawtoothing. If the buffer is a simple drop-tail queue and I have
something latency-sensitive like a SYN, that SYN now sees that full-buffer delay until the packet goes
through, and basically the Net can't walk and chew gum.
>>: Are most of the buffers [inaudible].
>> Nicholas Weaver: Yes, because a lot of these are either in the wireless card or in the access
device, the cable modem and DSL modem hardware. And the downlink is fairly bad; the uplink
is worse. And basically, until you do a full-rate transfer at the bottleneck, there's no
problem.
Once you do, everything interactive goes to heck. This is why if you have roommates, don't allow
them to run Bit Torrent. Your gaming will suffer.
Even if we just do user self-reporting as wired connections to exclude the wireless buffers, which
are unfortunately big, we still see a lot of systems with very large buffers. So we're able to fill up
buffers.
If we can fill them up, a TCP flow can. And if it's just a drop-tail queue, then they're in trouble.
We have this lovely graph of just Comcast users. God bless biases in your dataset. Well, we
only really see two big bands of capacity, which is what you would expect. They're more
homogenous. They have a lot of rental cable modems. There's some suggestion that some
cable modems actually implemented buffering properly; that is, in a simple drop tail queue you
put a max delay, not a max capacity. If you do that, you get good behavior.
It's great if you're doing just dumb drop-tail queues; it gets you 90 percent of the way there with a
comparator and a timestamp. However, we tracked down these cable modems. It looked like one
old Linksys. I bought one of these off of Flea Bay, plugged it in and got the Comcast walled
garden saying this cable modem is too old to connect to our network, which suggests that at least
one hardware vendor used to get it right and then doesn't anymore.
But basically it's the walk and chew gum problem. There's some work for 90 percent solutions,
smarter NATs. I hate NATs, but if you're going to have them anyway, you can have them do
queue management so that you can fix your network with a better NAT.
The 100 percent solution is very hard. Basically every buffer which might be congested needs
smart queue management. And this is required if you want sub hundred millisecond latency
under load. So if you want clean VoIP when you're doing full Bit Torrents, you need smart
queues.
And NXDOMAIN wildcarding is endemic at this point. This is the problem where you make a typo
and you get a helpful search page from your ISP. Strangely enough, this helpful search page is
filled with ads.
When they do wildcard, they're often broken. Seriously, they don't just wildcard WWW, which the
browser generates; they'll wildcard anything. They basically assume that every DNS lookup is
only from a Web browser, because the only thing they return is the address of a Web server.
Six percent, most of these being OpenDNS but some not, wildcard SERVFAIL. So they
don't care to limit collateral damage.
One percent of the non-OpenDNS sessions, five percent overall, wildcard an A record for a
zero-answer reply. If I have an IPv6-only test site, it returns a quad A record, and queries for the
A record return a valid answer with zero answers.
So basically the server is saying this name is valid; there's just no A record for it. OpenDNS
wildcards that A record. You cannot build an IPv6 test site and expect it to work reliably if you have
OpenDNS clients.
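For the simpler NXDOMAIN case, a sketch of a detection check (my illustration, not Netalyzer's implementation): a random name under a real TLD should not resolve, and if it does, the resolver or a box in front of it is wildcarding errors into a web server's address.

```python
import random
import socket
import string

def resolver_wildcards_nxdomain() -> bool:
    """Look up a name that almost certainly does not exist; success means wildcarding."""
    junk = "".join(random.choices(string.ascii_lowercase, k=20))
    name = "www." + junk + "-does-not-exist.com"
    try:
        addr = socket.gethostbyname(name)
    except socket.gaierror:
        return False                   # proper NXDOMAIN behavior
    print("nonexistent name", name, "resolved to", addr)
    return True

print("NXDOMAIN wildcarding detected:", resolver_wildcards_nxdomain())
```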
This is a bug. I reported it to them months ago. As far as I know this bug is still there. If you're
going to use third-party DNS, use Google Public DNS; it doesn't have these issues. And this is
disturbing: we actually tracked down the vendor literature. This sells for one to three dollars per
customer per year. For what is, even for a big ISP, slightly more than pocket lint, they will
drastically change how the Internet operates.
But this is the interesting one. And this directly affects this company. Some ISPs were
manipulating DNS results for Google, Yahoo! and Bing. This is what I showed at the start.
Yahoo! and Bing lookups always returned the IP address of a third-party controlled proxy in one of two
address ranges.
Google was sometimes faithful, sometimes an ISP-controlled proxy, sometimes a third-party
proxy. And the behavior was set per ISP, so the ISPs had specific control over how Google was
manipulated.
Google put a halt to all Google proxying back in May by throwing up a CAPTCHA on the search page
with a link to a page explaining why did I get this CAPTCHA: your ISP is doing this stuff without
your permission. Presumably they also got their lawyers involved; most lawyers
you talk to call this wiretapping. And the records are for a clear proxy; these aren't Akamai nodes.
So when Yahoo! and Bing don't have clean reverse DNS, we now do a direct probe of the server,
because for a lot of them, Akamai included, you don't get clean reverse DNS. But these are not Akamai
nodes; these are proxies. Proxied traffic appears to be unmolested, though as we will see in a sec that's
not the case. Https is passed through to Google unmolested, because Google is the only one that
runs https services on any of these domains. So if you are set to one of these rogue DNS
resolvers and go to https://www.bing.com, you get a complaint from your Web browser that this
certificate is only for Google.
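A minimal sketch of that certificate check, assuming an ordinary verified TLS client: if the local resolver points www.bing.com at a proxy that only holds a Google certificate, the handshake fails with a name mismatch that is easy to surface.

```python
import socket
import ssl

def https_identity_ok(host: str = "www.bing.com", port: int = 443) -> bool:
    """True if the host presents a certificate valid for its own name."""
    ctx = ssl.create_default_context()          # verification and hostname check on
    try:
        with socket.create_connection((host, port), timeout=10) as raw:
            with ctx.wrap_socket(raw, server_hostname=host):
                return True                     # certificate really is for this host
    except ssl.SSLCertVerificationError as err:
        print("certificate problem for", host, ":", err)
        return False

print("www.bing.com presents a certificate for itself:", https_identity_ok())
```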
The nonproxied hosts return a redirect to 255.255.255, so we can see that it's proxying all three
of these, and the banner reveals that it is a Squid proxy. Invalid requests put up a page referring to
phishing-warning-site.com, which is a GoDaddy parked domain name with anonymous
registration. That would be a dead end, except that if you Google for phishing-warning-site.com you get
complaints about Google being down, because if the proxy has an internal error it also throws
up this banner. So when the proxy breaks, the user thinks Google is down, sees the banner, and
if you search for it you get this.
So, anybody here have friends on the DNS library side at Microsoft? Client-side DNSSEC
validation is a must-have; we cannot trust the recursive resolver. So why were they doing this?
Well, these ISPs had this behavior as of this summer: Cavalier, Cogent, DirecPC, Frontier,
Fuse, IBBS, Insight Broadband, Megapath, Paetec, Wide Open West and XO.
Who here uses Frontier? Double-check your network or use a third-party DNS
resolver.
>>: How do you conclude these ISPs have these problems?
>> Nicholas Weaver: Because when we do these lookups on the client, ship them to the server
and record them, we saw that these ISPs were routing the user requests for Yahoo! and Bing
through these proxies.
>>: Have sufficient clients --
>> Nicholas Weaver: Yes. Dozens to hundreds, depending on the ISP.
>>: And they all experienced problems?
>> Nicholas Weaver: Almost all of them. It depends on the ISP. So, like, Paetec: only one of
their resolver clusters used this and the other ones did not. Frontier: pretty much all their resolver
clusters did this.
Actually, Weidong, I'd like to talk to you later we'd like to do some testing on Frontier, your home
network using Frontier.
>>: I'm kidding.
>> Nicholas Weaver: So Charter was doing this, but they stopped. And Iowa Telecom was doing
this, but when they got bought by Windstream, they stopped. Who was responsible? This lovely
company called Paxfire, which specializes in monetizing DNS. We found Paxfire
demonstration servers and a bunch of other evidence. We also found corroborating data online.
And the one thing that stumped us until just a little over a month ago is why, because we couldn't
get the proxies to change anything.
Note that there's also an MSR paper that noticed this but, as far as I can tell, didn't answer the
why question either, let alone get the publicity. It turns out the Google post had a
pointer to another blog post that had a URL that was changed, and that allowed us to determine
the condition.
If the query was from the browser search or URL bar only, and the query matched one of at least
165 specific keywords, then they replaced the search results with a 302 redirect through an affiliate
network.
>>: How do you know whether the query is from the browser --
>> Nicholas Weaver: I will get to that, because this is what happened. Let's say the search is
for CA; they're interested in the state of California. Standard flow: it goes to the resolver, the
DNS appliance lies, and it goes to the proxy. Now, the proxy actually observes two things here.
The first thing it observes is that it's for a keyword search, and the second is that the URL actually
does say this is from the browser URL or search bar, not from the Bing home page.
So it looks for both of these conditions and does a 302 redirect, in this case to Paxfire's own analysis
server, which goes, oh, this is for a specific affiliate program, and does a 302 redirect
through the affiliate program, which then does a 302 redirect back to the final merchant.
And after the final merchant gets the request, well, in this case the user ended up at the Computer
Associates Web store.
In the process, presumably the merchant pays the affiliate program, which pays Paxfire, which
splits the money with the ISP. This sounds very complicated, but let's watch; it's basically real time.
Well, it bounces back and forth, and the page renders. And how are users supposed to know
about this?
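A way to make a detour like that visible is simply to stop following redirects and print each Location hop. This sketch uses the standard library only; the example query URL is illustrative of an address-bar search for the keyword "ca" and is not claimed to be the exact URL format Paxfire keyed on.

```python
import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None          # returning None makes urllib surface 3xx responses as HTTPError

def show_redirect_chain(url: str, max_hops: int = 10) -> None:
    """Print each Location: hop so a detour through an affiliate network becomes visible."""
    opener = urllib.request.build_opener(NoRedirect())
    for _ in range(max_hops):
        try:
            resp = opener.open(url, timeout=10)
        except urllib.error.HTTPError as err:   # a redirect (or other error status) landed here
            print(err.code, url)
            location = err.headers.get("Location")
            if not location:
                return
            url = location
            continue
        print(resp.status, url)                 # final, non-redirect response
        return

show_redirect_chain("http://www.bing.com/search?q=ca")
```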
>>: How do you get these queries?
>> Nicholas Weaver: What we did was we took the Alexa top 10,000, we stripped out the suffix so it
was just the domain name, like google, amazon, ec2, and we bombarded this server with
queries to see which ones it would modify.
>>: I have this question.
>>: Has anybody sued them for modifying network traffic?
>> Nicholas Weaver: No, they sued us for libel because we pointed out that they -- or, they've
sued another party for libel because we pointed out that this can monitor user searches. They don't
claim to monitor user searches, just modify them, so it's okay.
>>: How long does it take to run an analysis from end-to-end?
>> Nicholas Weaver: Netalyzer takes about three, four minutes; we do a lot of tests. So they
do modify and monitor, but they claim not to record and profile. So from their point of view it's
okay.
>>: The user clearly notices.
>> Nicholas Weaver: How?
>>: Because he didn't get the query results.
>> Nicholas Weaver: How many users would actually notice this? Weidong, do you use Paxfire --
do you use Frontier's DNS resolver?
>>: D 2 to e-mail -- I assume he's working somewhere.
>>: Working from --
>>: And I never observe anything bad. I search Bing. I always get Bing. Basically --
>>: Search bar.
>>: Search bar, yes.
>> Nicholas Weaver: Do you hit the keywords?
>>: No like IE doesn't have search bar it's only in address bar.
>>: It's interesting stuff.
>>: So let me finish what I want to say. Basically, I know Frontier is doing this: if I type an
address that's wrong and it doesn't resolve correctly, it redirects to their own search engine.
>> Nicholas Weaver: That's their --
>>: Search this website.
>> Nicholas Weaver: Actually, talk to us. That opt-out is -- if Frontier has not added a new opt-out,
the opt-out they provide at the bottom of the page is fictional. It just sets a browser cookie so the
next time you go to the page it sends up a fake browser error message.
I have a protocol for testing their opt-out, so let's communicate afterwards. So on the 26th we
measured the monetization web of the things they manipulate. Some were redirected direct to
Amazon: if you typed Amazon, eBay or PayPal, it went direct, because they have direct affiliate
programs.
I got to C before I got tired of copying company logos. Alibris, Computer Associates, CompUSA
and Dyno Direct went through LinkShare. A lot went through Google Affiliate Network, two of
which are of note: if you typed HBO or FIFA into your address bar on the 26th, you would get to a
page saying this DoubleClick URL is wrong. So you wouldn't even get to the final destination.
Some went through Ask. All of these ended up going to YP.com with a banner on the top, did
you really mean the site you were looking for or YP.com, and a random yellowpages.com listing in
it. Kohl's was especially amusing on the 26th:
you would get a Kohl's Web page framed with an ad for JC Penney. A lot went through
Commission Junction; they're the biggest. I gave up at A. And some Commission Junction traffic was
then redirected to DoubleClick for these airlines.
During this process it's unclear whether anybody understood how Paxfire was driving the
traffic. So --
>>: So are these people actually paying Paxfire?
>> Nicholas Weaver: Presumably, because that's how it works.
>>: Then it looks like someone went to a search site and clicked on a link or something.
>> Nicholas Weaver: Or affiliate ad. So an ad on somebody's Web page.
>>: Google used Paxfire?
>> Nicholas Weaver: All three big affiliate programs, LinkShare, Google and Commission
Junction, suspended the accounts within two business days.
>>: So they didn't know.
>> Nicholas Weaver: They didn't know, as far as we know. So the current situation: they halted
the proxy server redirection to affiliate links within 24 hours of public disclosure. But as of the
16th they've maintained the DNS redirections that route user traffic through the proxy
servers, so they can presumably resume modifying results at a later date. Why they haven't
stopped, I don't know.
Commission Junction, LinkShare and Google have all publicly suspended Paxfire's accounts
pending further investigation. We haven't heard anything more from them, but they've all
publicly stated the accounts are at least temporarily suspended.
And so it's amazing what ISPs will do to manipulate traffic. We now see at least two ISPs
that do in-path 404 rewriting, where they have an IDS box that looks at http replies, and if the http
reply is a 404 they inject a JavaScript redirect to their advertisement page.
>>: So Paxfire --
>> Nicholas Weaver: But that's a different company that's doing that.
>>: So sounds like the ISPs are the problems.
>> Nicholas Weaver: Both. There are ISPs that use Paxfire for NXDOMAIN wildcarding that
don't do this behavior. It's both.
>>: Sounds like Paxfire is just exposing a service, providing an additional --
>> Nicholas Weaver: That's what they sell to the ISP, but they're direct participants in the traffic.
They're the ones running the proxy server. So they are the ones providing the
infrastructure and running the systems that are monitoring user search requests and modifying
the results, or at least were monitoring and modifying.
So, conclusions. Well, this worked. We actually did write once, run anywhere, and actually got it to
work. You can, as a small group, build a robust and comprehensive network measurement and
diagnostic tool. We got Slashdotted and the sysadmins never noticed, with a tool that does full-rate
UDP flows in ten-second bursts. We didn't get noticed.
We did a lot of sneakiness; well, the network is your adversary these days. We viewed the
network as an adversary and tried to trip it up, and we found interesting things. [applause].
>>: How do you turn this into a sort of longer-term monitoring strategy? It seems like people show
up once or twice.
>> Nicholas Weaver: A lot of users use it for debugging now, so it becomes something people
run on a regular basis. We're at 20 to 70 sessions an hour just as a random baseline at this point,
because of random links, people running it, people getting to Starbucks and running it, et cetera, et
cetera.
>>: Administrators running?
>> Nicholas Weaver: A fair number of network administrators really like the tool because it allows
them to help debug their own and their parents' networks.
>>: Isn't debugging performance, connection.
>> Nicholas Weaver: Connection problems like NAT issues.
>>: But if the connection doesn't work and you can't connect to your server, how can --
>> Nicholas Weaver: Because there's a lot of problems where the connection sort of works.
So, for example, the Internet seems slow. What causes the --
>>: The problem.
>> Nicholas Weaver: Yeah, or some sites aren't loading right. That could be a DNS error. So,
for example, with OpenDNS wildcarding SERVFAIL, if the site's DNS server is down you'll get
oddball OpenDNS stuff rather than the legitimate response.
So there's a lot of things like that. There's a lot of just weird edge cases. One of the tests we do
is background outages. The test takes three minutes; during those three minutes we do one ping
every tenth of a second, and if we lose three packets in a row we alert on that. So that
says that your network is flaky and dropping packets.
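A sketch of that background-outage logic, assuming a cooperating UDP echo endpoint at a made-up address (real ICMP pings would need raw sockets, so this uses UDP probes instead):

```python
import socket
import time

PROBE_SERVER = ("probe.example.net", 1948)   # hypothetical cooperating echo endpoint

def watch_for_outages(duration_s: float = 180.0, interval_s: float = 0.1) -> None:
    """Send one small probe roughly every tenth of a second; flag 3+ consecutive losses."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(interval_s)
    losses = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        sock.sendto(b"ping", PROBE_SERVER)
        try:
            sock.recvfrom(64)
            losses = 0
            time.sleep(interval_s)
        except socket.timeout:        # the timeout itself paces the loop on loss
            losses += 1
            if losses == 3:
                print("outage: three probes in a row lost at", time.strftime("%H:%M:%S"))
    sock.close()

watch_for_outages(duration_s=10)      # short run for illustration
```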
>>: So you also did the tripwire work with Charlie Reis before. Can you put them together
somehow? That --
>> Nicholas Weaver: We actually do use the tripwire test to see if we're in a frame, and we do
direct http requests to see if some replies are modified. So we do do similar detection; we
include tripwire functionality in this.
>>: So do you -- so in that work you discovered a bunch of ISPs injecting content into the result
page?
>> Nicholas Weaver: Yeah. And we see two ISPs injecting content into 404 pages, which is
something that you actually really would not be able to do with the tripwire, because of how the
framing and stuff like that works and the redirections. So there's a lot of that stuff.
>>: So for IPv6, it seems like there are a lot of negative lessons here, a lot of things that will break. Is
there anything we can take as a positive, like here's a way in which we could deploy IPv6?
>> Nicholas Weaver: IPv6 is looking remarkably good in our data; about five percent of users have
IPv6 connectivity. The gross problem of broken IPv6 meaning a broken Internet for dual-stack sites is
no longer the case. Apple and Opera fixed those showstoppers, so now basically
everybody's doing happy eyeballs: if you get a quad A record and an A record, you try
them both, and whichever one finishes first is the one you use.
IPv6 path MTU discovery is much less broken than v4, mostly because most IPv6 is going
through tunnels, but at the same time it has that minimum 1280 byte MTU that's enforced in the
spec. So the path MTU and fragment issues for v6 seem to be a lot better. One thing that's nice
for v6 is it gets the NATs out of the way because the NATs just pass traffic. So it reduces the
fragment problems.
So port filtering on v6 is vastly less than on v4. A lot of computers have v6 without knowing it,
because of Teredo and 6to4. The situation with 6to4 is now vastly better than it was a year ago, when
having 6to4 turned on would mean your computer broke for dual-stack sites. Now it means that if 6to4
is turned on and it doesn't work, it doesn't work and it doesn't matter.
>>: 6to4 -- are those deployed on the server side?
>> Nicholas Weaver: Those are deployed in the network infrastructure. How 6to4 works is: you
take your v6 packet and encapsulate it in v4 to a special anycast address. It gets to the first relay,
which strips the encapsulation off and sends the v6 on, and your v6 address has the v4 address
embedded in it. The reply goes to a special v6 anycast address, gets to a relay that re-encapsulates
it and sends it on.
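The address embedding is mechanical enough to show in a few lines; this is an illustration of the 2002::/16 scheme, not anything from the talk's tooling (192.88.99.1 is the well-known 6to4 anycast relay address from RFC 3068).

```python
import ipaddress

# 6to4 embeds the host's public IPv4 address in the 2002::/16 prefix, which is
# how a relay knows where to send the encapsulated reply.
def sixtofour_prefix(v4: str) -> ipaddress.IPv6Network:
    v4_int = int(ipaddress.IPv4Address(v4))
    prefix_int = (0x2002 << 112) | (v4_int << 80)
    return ipaddress.IPv6Network((prefix_int, 48))

def embedded_v4(v6: str) -> ipaddress.IPv4Address:
    v6_int = int(ipaddress.IPv6Address(v6))
    return ipaddress.IPv4Address((v6_int >> 80) & 0xFFFFFFFF)

print(sixtofour_prefix("192.0.2.1"))       # 2002:c000:201::/48
print(embedded_v4("2002:c000:201::1"))     # 192.0.2.1
```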
>>: Our own networks -- major ISPs.
>> Nicholas Weaver: Very few ISPs are -- 6to4 is actually something that's just on the client.
Teredo is just on the client, and Microsoft has it on Windows, and it actually works most of the time
now.
So the v6 story is that it's still pretty rare, on the order of 5 percent, but it's not breaking things, which
means turning on v6 is no longer the catastrophe it was a year ago.
And some retail ISPs are shifting to it. Comcast, for example, is supporting v6 in trial deployments as
native v6 to clients, I think mostly because they want to use Dual-Stack Lite to handle the v4
address space exhaustion problem.
>> Helen Wang: Great. So we have a half an hour break. I'm sorry, are there any more
questions? We have half an hour break and then the next talk starts at 11:30.