>> Srikanth Kandula: It's a pleasure to introduce John John. John John is from the University of Washington, where he was advised by Tom Anderson and Arvind Krishnamurthy. His interests lie at the intersection of security and networks and distributed systems. He's done quite a bit with bots and understanding botnets and how they work, which is what he's going to talk about. I will let him go ahead.
>> John John: Hi. Hello. I guess the mic is working. So, I was told that this is a 90 minute slot, so I tried to make sure that I use up all of the time.
[laughter]
>> John John: I am just kidding. All right. Now that I've got you all sweating, let's start. Good morning, I am John from the University of Washington, advised by Tom Anderson and Arvind Krishnamurthy. Today I am going to talk to you about understanding malware on the internet: botnets and all the other bad stuff that goes on. Let me start by saying that the internet is not a very safe place today. You don't want your kids roaming around alone there. There are a lot of prevailing problems. First there is spam. There are over 100 billion spam e-mails a day. Everyone gets it; everyone hates it; the estimated productivity loss is billions of dollars a year. Then you have denial of service attacks, and these are becoming more and more powerful each day. A couple of years ago you had one strong enough to take out a small country. And more recently we've seen attacks against big financial institutions, like PayPal and MasterCard, which cost them real dollar losses. Another thing, which matters in today's ad-driven internet economy, is click fraud. Recent studies have shown that nearly 20% of all ad clicks are fraudulent, and that is something that people really want to fix. And there is also other affiliated bad stuff that happens, like phishing: this is where they send you a page that looks like your Bank of America logon page, you enter your username and password, and they steal all your money. This again costs hundreds of millions of dollars in losses a year. So what is the common thread here? It turns out that the common thread for all of these problems is usually botnets. They provide the underlying infrastructure which allows hackers and bad guys to carry on their activities with impunity. And what exactly do I mean by botnet here? A botnet is a network of compromised computers that are controlled by an attacker. The attacker has complete control over these machines, and can use them to send whatever messages he wants and to participate in any malicious activity of his choice. And since they are such a big part of the internet malware infrastructure, understanding botnets is kind of a necessary first step to understanding these threats and combating them. Unfortunately botnets are not really well mapped out today. Vint Cerf would say that approximately a quarter of all computers connected to the internet are infected and malicious. Now that's probably not true, but who knows? And at the height of the Storm fervor in 2007, there were a bunch of news articles in September which said the Storm botnet has 50 million nodes; it's the end of the world. A month later you had researchers actually measure and understand it, and they revised it to a more reasonable estimate of 20,000 machines. So it wasn't too bad after all, and life went on. What this shows us is that there is a lot of confusion and fear and uncertainty and doubt regarding these botnets: their sizes, who is participating in them, what do they do, what are their operations?
And our idea was to throw some light on this, to understand botnets better. So it turns out understanding botnets is really hard. And the reason for this is that these malware authors do take special steps to make sure that you don't easily understand how they operate. They make use of sophisticated techniques for evasion, and reverse engineering is difficult because these bot binaries are obfuscated, primarily to prevent this sort of reverse engineering. And manual analysis is very difficult; it takes a lot of time. As you can see, the poor grad student has to work hard to manually analyze these binaries, and this approach does not really scale well, at least unless you have a lot of grad students. So essentially the problem is that there is a lack of a comprehensive botnet monitoring platform, something which would make life a lot easier, right? And so that brings us to our goal. Our goal here is to build a system which can, in a timely fashion, with minimum human interaction, monitor botnets and their propagation. I have highlighted a bunch of terms here, so let me just tell you about each of them. We want this to be done in a timely fashion. Botnets are constantly evolving and changing, and information about them becomes less valuable the older it gets. The information degrades quickly, so we need our system to produce results pretty quickly. We want to scale, so clearly we cannot have the human completely in the loop; we want to minimize the amount of human interaction required. And finally, we want to have a comprehensive system. We want to monitor not just botnets and their activities but the complete lifecycle: how they propagate and all the details regarding them. And this would give us information which can help combat attacks in real time. I will give you examples of this as we move along with the talk. So as I mentioned, there is a need for a comprehensive monitoring platform. We want to monitor the complete lifecycle of the botnet. And this figure here gives you a rather simplistic view of the botnet lifecycle. You have bots which perform botnet activity, which is all the malicious things I mentioned. And they also try to infect new hosts, which again participate in more botnet activity; a rather simple lifecycle. So in order to study what bots do and how they operate, we first started with Botlab. The focus of Botlab was to study the activity, the communication patterns, how they are organized, and how attackers control these millions of machines. And as a part of our study, one of the things that we found was that the second step, infecting new hosts, is in fact a rather complicated and more involved step. Traditionally, malware used to spread by taking advantage of vulnerabilities in applications or operating systems and compromising the machine. But as these operating systems and applications get more and more secure, and there is definitely a push towards better security, it becomes harder to exploit these loopholes. And so attackers have been moving towards a simpler approach. The thing is, even if your operating system is really secure, you still have the human, who is probably the weakest link in the chain. And a naïve user is very likely to click on any link and install any application you ask him to. And so there has been a trend towards having more social engineering attacks in order to infect hosts and spread malware.
And you just need to get someone to click on the link, and these links are spread through e-mails, instant messengers and even search results. Again, more details shortly. So one of the things we found in our research was that these links which are used to spread malware typically point to regular, legitimate web servers that have been compromised. These are regular websites which have been compromised and are being used to host malware. So this brings us to the question of how these web servers are actually getting compromised. We wanted to study this as well, in order to make it more difficult for attackers to go about doing this. And a step before this: in order to compromise these web servers, you must first find vulnerable web servers. So the question here is, how do attackers go about finding such vulnerable web servers and then compromising them? Our goal was to study this entire botnet ecosystem, study the lifecycle, and come up with defenses. And this brings us to the contributions. For each of these various steps we came up with systems which would measure, understand and come up with defenses. And these have appeared over the last few years at NSDI, USENIX Security, and more recently [inaudible] and WWW. So let me give you a brief outline of what each of these things does. There is SearchAudit, which studies how attackers go about finding vulnerable websites and web servers on the internet. Then we use heat-seeking honeypots, which take it one step further: once attackers find these vulnerable web servers, how do they go about actually compromising them? Our honeypots were built to study that. And once the servers are compromised and used to host malware, attackers spread these links through search results, so the goal of deSEO was to study how these malware links are being spread through search results. And finally, we started off with Botlab, which gave us an initial picture of how botnets operate and what activities they participate in. Let me start with Botlab, which essentially kick started our research into bots. Yes?
>>: [inaudible] you said there were four objectives, three objectives, to real-time monitor [inaudible] propagation. Based on what I see from the slides they are not finding long term activities [inaudible] bot has skill [inaudible] study bot behavior [inaudible] and the mainstream attacks are mostly off-line, studied off-line?
>> John John: Not necessarily off-line.
>>: Is it like real-time monitoring, all this?
>> John John: Reasonably real-time, as in on a daily basis. So these are things which run on, say, web server logs on a daily basis and come up with more information regarding how these botnets operate. So it gives us some idea as to how these things operate in real time, but as a research prototype it is not necessarily truly real time; it is roughly real time, on the order of a day. Okay, I am going to start off with Botlab, which was the project which kick started our involvement with bots. And the stuff that we learned here essentially led to the other remaining steps of understanding how bots operate. So what exactly is Botlab? Let's consider botnets in the wild. You have millions of infected machines, and each of these botnets talks to a command and control server. Different botnets have different command-and-control servers, which give them instructions on what to do. So these bots talk to the servers, and the server tells them who to send spam to, which website to attack, and things like that.
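To make this command-and-control pattern concrete, here is a minimal, purely illustrative Python sketch of the kind of HTTP polling loop such a bot performs; the controller URL, response format, and command names are invented for illustration and do not come from any real botnet.

```python
# Purely illustrative sketch of the HTTP-polling command-and-control pattern
# described above. The controller URL, response format, and command names are
# invented; real bots obfuscate all of these and, of course, act on the commands.
import json
import time
import urllib.request

CNC_URL = "http://cnc.example.com/get_command"  # hypothetical controller endpoint


def poll_controller():
    """Fetch one instruction (as JSON) from the hypothetical controller."""
    with urllib.request.urlopen(CNC_URL, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def simulate_bot(iterations=3):
    for _ in range(iterations):
        try:
            cmd = poll_controller()
        except (OSError, ValueError):
            time.sleep(60)  # controller unreachable or response garbled; retry later
            continue
        if cmd.get("action") == "spam":
            print("instructed to spam", len(cmd.get("targets", [])), "addresses")
        elif cmd.get("action") == "ddos":
            print("instructed to flood", cmd.get("victim"))
        time.sleep(cmd.get("interval", 300))


if __name__ == "__main__":
    simulate_bot()
```

The observation that this traffic is a small number of periodic connections to a handful of fixed endpoints is exactly what makes the black-box monitoring described next feasible.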
So the goal of Botlab was to take a small-scale version of this botnet ecosystem and study it in a locally contained environment. We have captive bots which we run in virtual machines in our contained environment. We study which command-and-control servers they talk to, what instructions they get, what kind of activities they participate in, what spam they send, and essentially get a feel for what bots look like in the wild. Our approach to scaling this sort of botnet analysis was to automate it. And here what we do is eliminate the manually intensive task of reverse engineering these bot binaries; we use a black box approach. And what is a black box approach? We essentially execute the binary and study its external behavior. And in order to do this in a scalable fashion we needed to automate the process of finding and executing these bot binaries. So what does Botlab need to do? Well, first and foremost we need to continuously find and incorporate new bot binaries. As I mentioned, they keep evolving; they keep changing, so we need to keep track of them as they change. Then we perform some sort of initial analysis to pick interesting binaries. There are lots of malicious binaries out there, but we want the ones that we are currently interested in, and in this case we are looking at botnets that send spam. So we want some initial analysis to select these interesting binaries. And finally we need to execute these binaries safely and collect the data that we want. The Botlab architecture is essentially the same three steps, but in a diagrammatic form. And the first step of the pipeline is obtaining bot binaries. How would you go about obtaining malware? The traditional way of collecting malware is through honeypots. So what is a honeypot here? The idea is very simple. You take a new machine, install an unpatched operating system, preferably Windows, connect it to the internet and wait for 5 minutes, or maybe go get a coffee. 5 minutes later you are infected and you have a bot. You rinse and repeat this process and you collect a whole range of these malware binaries. And in a mere two-month period, we collected nearly 2000 such binaries from honeypots. Unfortunately we did not find any of these spamming bots, which are a more recent variety. Most of the botnets that infected our honeypots were traditional IRC botnets, which are on the decline. And the reason for this is that the new generation of malware spreads through social engineering more than by exploiting vulnerabilities. To give you an example of a social engineering attack: you might receive, say, an e-card for Valentine's Day that says you have received an e-card, please click on this link to view the card. You click on this link, it installs a binary, and you've essentially got an infection. Another example: if you visit a site and you want to view a video, you occasionally find these little pop-ups that say your current version of Flash is outdated, please click on this link and install it to view the video. You click on the link and you've just become the latest member of a new botnet. So the point here is that our passive honeypots cannot capture these sorts of attacks, so we need to augment our honeypots with active crawling. And for this we essentially need to emulate a user.
And fortunately we just need to emulate a naïve user who clicks okay on everything, and we get the set of new binaries that we want to deal with. So what is our source for getting these binaries? Botlab gets a constant feed of spam from the University of Washington, and this is on the order of 2 1/2 million e-mails a day; 90% of all the e-mail that comes to the University is in fact spam. And nearly 1% of the URLs in this spam point to malicious binaries, malicious executables and drive-by downloads. We crawl these links and fetch the binaries. In addition to the spam we also get binaries from public repositories of malware and public honeypots. So this fills in the first step of our picture, which is how you obtain these bot binaries. The next step is to analyze the binary. At this stage we have thousands of binaries which we obtained, and we want to do two things. First we want to select spamming bots, to identify which bots send spam. And second, we want to eliminate duplicates. Unfortunately a simple hash is insufficient to detect duplicate binaries. This is because malware authors frequently repack their binaries to escape this sort of hash-based detection. So what we do instead is to generate a behavioral fingerprint of each binary. We execute the binary for a while and we log all the network connections, so we see which IP addresses and ports it contacts and how many packets it sends. We do this for all of the binaries, we compare any two binaries, and if they have a similar network fingerprint then we say that they are duplicates. And also, if we see them trying to make connections to port 25, we know that they are trying to send spam and that they are interesting spam bots. So at this step we have interesting binaries that we want to monitor. The third and final step is to actually execute these binaries and collect data. Here we have an interesting trade-off between safety and effectiveness. On one extreme we can say we are going to let Botlab send out any traffic it wants. This would be really effective because these bots get to communicate with their control servers and get to do whatever else they need to do, but the flip side is that you are now adding to the bot population, and if you end up infecting other machines you've got all these legal issues to deal with. The other extreme is to say we are going to make this a completely contained environment and these bots are not allowed to make any external connections. Now this is not going to be effective because, as I mentioned earlier, these bots need to talk to their control servers, or C&C servers, and unless they talk to these C&C servers they don't really know what to do. So in our case we decided to pick a middle ground where we say things like: traffic to known vulnerable ports is dropped; traffic to privileged ports is dropped; we place limits on connection rates and data rates so that the bots don't participate in an actual attack. And since we are dealing with spambots, we don't want them to send a lot of spam, so all the spam they attempt to send is redirected to a fake mail server. And at this step we have a system which finds bot binaries, picks interesting ones, and executes them. The bots are happy because they get to send tons of fake spam, and we are happy because we get to see what they are trying to do. Now most of the bots run fine… Yes?
>>: [inaudible]
>> John John: Yes.
We only drop traffic to known vulnerable and privileged ports, and most of these bots in fact [inaudible] to HTTP. So we do allow that traffic to go through, to go to the command-and-control servers. Yes?
>>: What [inaudible] connection may compromise the Web server?
>> John John: That is something which is difficult to do, and that is one of the reasons we decided that this might not really be feasible in the long run. For the purposes of the study we did look back and see that it only contacted actual C&C servers, but it is possible that it could masquerade as a C&C connection and try to compromise a Web server.
>>: How do you decide basically [inaudible] because typically the connection runs on [inaudible] so how are you going to recommend what to connect…
>> John John: So we drop all connections to privileged ports. Any port that it tries to connect to, say port 3389 for remote desktop, we drop; those ports are blocked. Ports with known vulnerabilities, those are dropped. So that would mean that we end up missing a few bots which might require this in order to become active.
>>: So it seems that when you are doing this [inaudible] you are making some assumption about what kind of activity they are doing. What kind of behavior, what kind of rules. Wondering if there are bots that are not [inaudible] and your restriction actually will make the bots unusable or [inaudible]
>> John John: So in that case we end up not seeing those bots. We were only able to capture bots which fit into the particular set of rules which we thought to be reasonable.
>>: Among all the bots that you catch, what's the fraction of them that remain, that conform to the rules you set?
>> John John: In terms of spamming bots, we are finally able to run around 11 or 12 of them.
>>: And this is out of how many?
>> John John: That we are not really sure, because the others don't get to send spam. So we find 11 different botnets that actually send spam. And we found that nearly 80% of all the spam that comes into the University is from these 11 different botnets, actually from only seven of them. So we do see reasonable coverage in terms of botnets that actually send spam.
>>: I wonder if there is a better way for you to, beyond restricting the properties so heavily, at the same time [inaudible] much better coverage [inaudible] just redirect all the mails to some fake IP address [inaudible]
>> John John: But you would still need to figure out, for each botnet, what to redirect, and the information would not be really high fidelity, because now the bot is not necessarily sending spam to the people the controller wants it to send spam to, or the kind of spam that it is sending would be quite different. So there is definitely a trade-off between fidelity and safety, and that is a somewhat difficult line to figure out.
>>: I might've missed this, but are you associating spam to [inaudible] to a known [inaudible].
>> John John: I haven't yet mentioned that; that is kind of the next part of the talk, based on what we learn from these bots, how can you gain additional information about them? So I will talk about that in a couple of slides. One of the things we found here was that most of the bots ran fine, but in some cases we did need to do some manual tweaking in order to get them to run. Here is an example. One of our bots, Mega-D, actually verified that it was able to send e-mail before it got activated.
So when the bot is running it would send a test e-mail to a specific mail server which is controlled by the attacker. That would return a special code, which is the activation code, and the bot needed to send this activation code to the C&C server before it got activated. And if the code was incorrect the bot would essentially refuse to run. So in these cases we had to allow a few connections out in order to get it to run.
>>: When you say your bot detection is automated, it seems that…
>> John John: It is mostly automated, with some manual tweaking.
>>: It seems that in the beginning you do have to go through a lot of manual process just to write up all these rules. And then you can use some detection by matching their behavior with some signature that you already have, but it's not like that's, you [inaudible] automatic that generates the signatures, or you can only detect [inaudible] based on existing signatures.
>> John John: It's usually based on existing signatures, because you could potentially come up with a scheme to automatically generate these signatures, but that would involve letting the bot run unhindered for a while to observe its actual behavior.
>>: But are you able to express this initial [inaudible]
>> John John: In terms of which?
>>: For example the natural behavior of this bot versus that bot [inaudible] use this [inaudible] are you able to extract all this…
>> John John: Yes. Each bot runs for a few minutes inside a VM; we look at all the network connections it makes and that becomes the signature of the binary.
>>: Do you use the raw network traffic as a signature or do you extract features [inaudible]
>> John John: We extract features, such as which IP addresses it connects to, which DNS names it looks up, which ports it connects to, and what the packet sizes are, and use that as the signature. So this is one of the cases in which it required some sort of manual tweaking to get these bots to run. But there are a couple of other challenges which we occasionally face. One of the main problems was that there were some bots which would detect that they were being run inside a virtual machine and they would self-destruct. So in those cases we needed to have physical, bare-metal boxes to run certain bots. And one of the bots we had did in fact try to use webmail: instead of using SMTP, it would connect to Hotmail, log in with a stolen username and password and then send mail. So in this case we did set up a man in the middle to look at the credentials, but this was again a very small botnet which did not send a lot of spam, so it was not a big deal here. So what we have here are two really interesting streams of data. On one hand we have our bots in Botlab, a few dozen of them, which are constantly churning out 6 million e-mails a day. And what's great here is that with these captive bots you get to see all the e-mail sent, irrespective of destination. You see spam sent to Hotmail, Gmail, Yahoo. You see spam in various languages, and so on. So we essentially have a tiny slice of each botnet, and this gives us a very local view of the spam producers but a global view of all the spam that is being produced, because you see a wide variety of spam sent to all places. On the other hand we have spam coming in to the University of Washington. This is another two and a half million e-mails a day, which provides a completely different perspective. Here we receive spam from pretty much every bot node in the world.
So if you have an infected machine, it is likely sending you spam. In a couple of days you would definitely see spam from pretty much all of the infected nodes out there. So this gives you a global view of the spam producers. You see all the spam producers, but a very local view of the actual spam, because you only see spam that is coming in to the University of Washington. You only see spam coming to washington.edu; it's mostly English spam, mostly targeted at students perhaps. And so this gives you a local view of the…
>>: [inaudible] that you received from almost every bot in the world. I guess I am going to challenge that, in terms of, my assumption is if you have a relatively small number of email addresses [inaudible]
>> John John: Right, around 1%. Or it's around 300,000 e-mail addresses.
>>: Yeah that's right but that's, come on…
>> John John: .1%.
>>: Wait a minute. 300,000?
>> John John: 300,000 e-mail addresses.
>>: Out of a billion, a couple billion? I guess I'm wondering…
>> John John: So we essentially see spam from…
>>: I would accept that you see from a large number of bots, but I wouldn't necessarily say it was every bot.
>> John John: Not necessarily every bot, but we do see on the order of 1 million IPs a day. So that is a fairly large number. Yeah, let me rephrase by saying a fairly large number of these bots.
>>: Well if you say a million, I would bet out of a billion PCs, more than a million of them are on botnets.
>> John John: This is 1 million a day.
>>: Okay.
>> John John: And over a reasonably long period of time you would see a larger fraction.
>>: Have you compared this to any external data feed? You could certainly go to Spamhaus or one of the open public lists of spam sources to see what fraction is visible to you versus what is being observed in the world.
>> John John: The problem with the Spamhaus blacklist is that you don't really know the number of false positives and false negatives in there. But for comparison, what we found was around 30% of the IPs that we actually observed are present in Spamhaus.
>>: It's the opposite number that you really wouldn't know.
>> John John: Yeah. So you don't really know what coverage Spamhaus would have either, right, especially since we have no idea of their false positives and false negatives.
>>: This should fall out statistically. I mean, right, if you know that maybe 5% of Spamhaus is false positives and you know they are listing 100 million IP addresses and you know you are seeing 2 million, then you know you are seeing 2,000,000/95,000,000.
>> John John: The Spamhaus blacklists are not in the hundred million range though. I do have access to the Spamhaus blacklist, and it is roughly 3 million IPs on a daily basis. And this changes over time as things drop out and things get re-added. So the Spamhaus blacklist which we looked at had on average 3 million IPs a day.
>>: Are you comparing your numbers with someone else, say ICSI, who were running [inaudible] honeypots?
>> John John: Yes. So we do share our information with them. This was done before their honeypots came up.
>>: No. This was published in 2009.
>> John John: 2008.
>>: 2008. So their first paper, the spam architecture…
>> John John: Spamalytics was before that, yes. But their actual spam-- but that was an incoming spam feed, right?
>>: Yes. But this one [inaudible]
>> John John: Yes, yes, yes. Right, theirs was incoming.
>>: So I am just saying, the numbers for incoming spam, how many IPs they see operating every day, how much spam they see every day, compared…
>> John John: I have not really chatted with them about their actual numbers versus ours.
>>: So they didn't report in their paper?
>> John John: Their paper didn't have the number of IP addresses. They were looking more at the hosting of the spam campaign. So things like where the web servers are hosted and those kinds of information, not the actual number of e-mails that they received. So one point here is that by combining these two feeds of information you're going to get a lot more than from either of these individual feeds, right? And the question was how you would go about linking them. What we observed from our data was that spam subjects are reasonably special. They are chosen quite carefully for two reasons. First they have to escape your spam filters, and second they have to be interesting enough for you to want to click on them. As a result, from nearly 6 months of our data, we found that there was absolutely no overlap in the subjects between two bots. This was on an average of 500 subjects per day per bot, and we found zero overlap across any two botnets. So we decided that looking at spam subjects and comparing them would be a good way of linking these two different streams of data.
>>: You're only looking at spam that gets caught by ESS.
>> John John: Yes. We do not have access to the other e-mails.
>>: It's because less than 1% of the e-mails, so 99% is [inaudible]
>> John John: 90%.
>>: Okay, so the remaining 10%, 90% of that might be great spam that is escaping your…?
>> John John: Yes. That is possible, but we don't have access to that data.
>>: And those might be other botnets that are [inaudible] higher-level [inaudible]
>> John John: Absolutely. They might be different botnets, but in terms of actual total volume it would still be a smaller fraction, even though they might be more effective.
>>: Yes. They might be affecting PCs far more often.
>>: [inaudible] classified with those others there are millions and millions of [inaudible]
>> John John: That will come up in the next couple of slides, the number of botnets we found. From linking these two streams of information we decided to ask a couple of questions. There are more questions in the papers, but for now I am just going to look at who is sending all the spam and what are some of the characteristics of these botnets. The first thing we found was that nearly 80% of all the spam came from just six different botnets. And a single botnet called Srizbi was responsible for almost 35% of the spam. This means that if you could knock a few of these botnets out, you are going to significantly reduce the volume of spam on the internet. The question now is how difficult it is to actually knock out a botnet. So for that let's look at a couple of characteristics of these bots. The first thing we observed is that most of the botnets we ran contacted only a very small number of C&C servers, on the order of a dozen. And in many cases the information about which IP address to contact was hardcoded in the binary. So if you could block access to that IP address the bot would be headless and it would have no idea of what to do. And in fact in November 2008, a hosting company called McColo in California was taken down by researchers and law enforcement, and overnight the volume of spam decreased by almost 80%.
And the largest botnet, Srizbi, was knocked off-line and never came back. Some of the other interesting characteristics that we found about these botnets were that you could possibly fingerprint which botnet was sending the mail based on the spam sending rates. This varies from 20 messages a minute to a crazy 2000 messages a minute. We also looked at the overlap in mailing lists across botnets: if you are a spammer you would preferably want to rent multiple botnets in order to reach a wider audience, because between two botnets the overlap in mailing lists was only 30%. And finally we also looked at the active sizes of these botnets, and that varied from 16,000 to 130,000 in terms of how many machines are actively sending spam on a daily basis.
>>: How did you get the [inaudible]
>> John John: Based on what we see, based on our incoming spam. For 80% of the spam that comes in we can say which botnet it belongs to, and then we look at how many different IP addresses it comes from, so this is definitely a lower bound on the number of spambots.
>>: These characteristics, it seems that the first two of them are not [inaudible], maybe because they have such a large volume they don't care if they are caught by anybody. So maybe a smarter bot would not have to contact a fixed server, would not have to send this many, and could adjust its sending rate.
>> John John: So back in the day, most of the botnets that we looked at had very simple central control mechanisms. They would contact one fixed IP address or a bunch of IP addresses. And that has changed slowly over time. And it still has not moved to decentralized control networks. It was only the Storm botnet that [inaudible] a decentralized network, but all the other botnets even today still use a simple HTTP central controller. The way they access the controller has changed. Now, instead of contacting a particular IP address, they use a DNS name which is algorithmically generated. So each day the bot would generate a DNS name and look that up. And one of the problems with these approaches is that if researchers reverse engineer this, what they do is they pick a date, say next month, and they buy that domain name. So on that day at least you get control of all the bots. So the simpler mechanism would ensure that there is no infiltration, but they do have techniques to spread out a bit.
>>: [inaudible] have this characteristic, the biggest
>> John John: These are the biggest botnets.
>>: Yeah, I think we [inaudible] the smarter, this Storm botnet. They only generated a small number of e-mails, but it was hard [inaudible] may be more effective [inaudible]
>> John John: I think Stefan Savage's group actually did look at the Storm botnet, and in terms of actual delivery effectiveness it's the same as all the other botnets. It was not any more effective than all the other bots. And they also had this additional problem that since their control network was decentralized, as part of a DHT, researchers could easily infiltrate the DHT and then had control of these bots and could make them do what they wanted. So it's kind of a trade-off between having full control and having reliability. And so far it turns out that botnets have not needed to branch out, and even today they still use a central controller with a small number of servers.
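To illustrate the algorithmically generated rendezvous domains just mentioned, here is a minimal sketch of a date-seeded domain generation algorithm; the seed, hash construction, and placeholder TLD are invented for illustration rather than taken from any particular botnet. The defensive point from the talk follows directly: anyone who reverse engineers the generator can compute a future day's domain and register it ahead of the bots.

```python
# Hypothetical date-seeded domain generation algorithm (DGA). Real botnets use
# their own obfuscated constructions; the seed, hash, and TLD here are invented.
import datetime
import hashlib


def generate_domain(day: datetime.date, seed: str = "example-seed", length: int = 12) -> str:
    """Derive a rendezvous domain deterministically from the date and a shared seed."""
    digest = hashlib.sha256(f"{seed}:{day.isoformat()}".encode("utf-8")).hexdigest()
    # Map hex digits to letters so the label looks like a plausible hostname.
    label = "".join(chr(ord("a") + int(c, 16) % 26) for c in digest[:length])
    return label + ".example"


if __name__ == "__main__":
    today = datetime.date.today()
    # Both the bots and a researcher who has reverse engineered the generator can
    # compute a future day's domain; the researcher can register it in advance.
    for offset in range(3):
        day = today + datetime.timedelta(days=offset)
        print(day, generate_domain(day))
```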
And the main reason they can get by with a small number of servers is that if the servers are not hosted in the US, if they are hosted in Eastern Europe or in Russia, it's really hard to take them down legally. And so this is sufficient for their current purposes.
>>: [inaudible] best botnets in different countries. So how do you [inaudible] solve a known [inaudible] because you are saying that they are centralized but [inaudible] these places but [inaudible]
>> John John: This is the way it was in 2008. And as I mentioned, as of now things have diversified a bit. But they still only contact a small number of hosts. They do not need to contact a large number of hosts yet, because they are hosted in different countries. There is no legal framework which makes it easy to take down these nodes, so a small number of nodes is sufficient for them to get by. But naturally they would diversify and have a larger set as you go; it's an arms race, and as you raise the stakes they are going to try something different. So this is the state of botnets as it was a couple of years ago. Okay, so what did we get with Botlab? What Botlab gave us was a real-time feed of malicious links and of the spam that was being sent out. This would be useful for safer browsing, because you know first-hand what bad links are currently going out, before they have been caught and added to safe-browsing databases. And also for better spam filtering, because now you have a pure feed of spam that is being produced, and this was used by the UCSD folks to generate signature-based detection for botnet spam. And the details of the command-and-control servers and the communication are useful for detecting network-level bot activity and also for C&C takedown. So the information from Botlab was provided to law enforcement agencies and antivirus companies so they could go about doing their business. Now we have seen the end result of what botnets do. And the rest of the talk is going to focus on how they propagate and how they work. Yeah?
>>: [inaudible] focus on spam sending bots. Some of the detection mechanisms [inaudible]
>> John John: So the other things bots do I mentioned in my initial slide; I said these are the bad things that happen on the internet. You have got click fraud, bots that do click fraud; you have got bots that send out phishing attacks, bots that do denial of service attacks. The reason we picked spam was that it was easy to see both sides of the attack. You get to see spam that is being sent out and you also get to receive spam. This is not really true of click fraud. You need to be a search engine or a large advertising provider to observe click fraud from the other side. And the same thing with denial of service attacks: unless you are a big company, you don't get to see denial of service attacks. So the reason we picked spambots was that you can get both sides of the picture. And other bots are also similarly operated. In fact a lot of the spambots partition some portion of their botnet to do various activities, and these are rented out to people who want to do different things. We do see some click fraud on the outgoing side, but we did not focus on it because we could not get a complete picture. But some of the techniques we used here could also be used for other kinds of botnets, because they use similar control mechanisms.
>>: So based on the information provided by the botnets [inaudible], what about just directly using the spam feed, where they come from, [inaudible] and use our spam detection?
It seems that this approach could be very similar [inaudible] depending on which idea, but what are [inaudible] does seem to be a much simpler approach, so what are [inaudible]
>> John John: So here you get attribution. There, you are going to see that these are botnet IP addresses out there, but you don't really know which botnet they belong to.
>>: [inaudible] based on e-mail subjects [inaudible] cluster them…
>> John John: You can cluster them; that is something that we learned from Botlab. So you don't have to keep running Botlab; you could use the information you gained here in order to continue without it. The fact that the spam subjects are in fact unique is something that came out of our running of the actual bots.
>>: So you use the subject as a signature to attack the [inaudible]
>> John John: We also did look at other signatures, such as their SMTP headers, which suggested that they were the same thing.
>>: But SMTP headers, you are going to get the same information from the e-mail spam directly as well, right? So what other significance [inaudible] botnets [inaudible]
>> John John: The only thing that we know is the set of subjects that are currently being sent out by each botnet. And we found that these subjects are unique. And this information can now be used even without the presence of Botlab; that is true. Botlab was kind of a bootstrapping process which let you understand how some of these bots operate.
>>: What I mean is, I can extrapolate this same information about the botnet from just directly getting the e-mails [inaudible] based on what e-mails I get in and then use a spam filter [inaudible] same subject and same sender with the same headers, so why do I still need to [inaudible] and do this kind of thing?
>> John John: You get to see things like which command-and-control servers they contact, what the control infrastructure is like. So these are two different aspects of the information that you would see. Okay, so let's backtrack and see how it all begins. One of the things we found was that the servers used for self-propagation, the servers that host malware, are in fact legitimate sites which had been compromised. And the question was, how do attackers find these vulnerabilities on the internet? And how do you find anything on the internet? You search for it. Here is an interesting thing that we found: search engines are really good at crawling and indexing everything that is accessible. And in many cases a poorly configured server might expose sensitive information that can then be used by attackers, so attackers can craft malicious queries which would give them this information. Let me give you a concrete example of what I mean. Here is a posted exploit for a PHP-based content management system. So this is an application running on top of your web server. The application is DataLife Engine; it's a content management system. And version 8.2 of this DataLife Engine has a remote file inclusion vulnerability, which means that any third party can store an arbitrary file onto the web server. And they helpfully provide a search term which can be used to find such servers, which in this case is "powered by DataLife Engine".
So you find that in all these web applications, in the case of DataLife Engine, at the very bottom of the generated webpage you have the stamp "powered by DataLife Engine" and the copyright and all the other things. You pop this into a search engine, in this case Bing, and you find hundreds of thousands of servers. And some fraction of those servers will in fact be running version 8.2, which suffers from this vulnerability. So now you no longer need to brute-force search the entire internet for all potential vulnerabilities; you have used a search engine to shortlist your search to a narrow set of things that you can now easily attack.
>>: Where do you discover this [inaudible]
>> John John: That is posted on hacker forums. You've got lots of these underground forums where hackers share their information, and you can bootstrap your system from this. Overall, search engines make it easier for bad guys to go about their business, right? And our goal here, now that you know what kind of queries attackers use in order to find vulnerable servers, is to ask whether we can use this information to understand how attacks happen and possibly detect new attacks before they are out in full force. So we want to essentially follow the attackers' trails and have them be our guides. In order to do this we have access to a good data set, which happens to be the Bing data set. We had three months of sampled logs from Bing. This is 1.2 TB of data containing billions of queries. With SearchAudit we have two stages. First we have the identification phase, wherein we try to detect malicious queries. This is an automated process where we start with a known seed set, expand it, and generate a list of all malicious queries. And the second stage is the investigation phase, where we can manually analyze these queries and try to understand the intent of the attackers. Let me quickly look at the identification phase. Here we start with a small set of known malicious queries. These can be obtained from a variety of sources; in our case we look at hacker forums. One of them was Hack Forums, another is milw0rm, where they post exploits and the sort of queries you could use to find these vulnerable servers. We crawled these forums and we started with a seed of 500 queries, from an earlier period. So we started with a small set. And now, on one hand we have the set of seed queries which we know to be malicious. And we also have the search log, which is the set of all the queries that were issued to Bing. Then it becomes reasonably straightforward to see which of these malicious queries actually show up in the Bing search logs. And once you have this, you know who the people issuing these queries are. One of the things we find about attackers is that they don't issue one query and stop; they issue a bunch of queries, so you now have a larger set of queries which you did not find in your seed set but are able to find through the search logs. The next step is to generalize these queries. One of the observations we made was that attackers don't always use exactly the same queries. They make changes to suit their needs. One of the things they do is sometimes restrict the domains they want the search results to [inaudible].
So they might be only interested in .edu domains which are running a particular piece of software, because those have a higher page rank and are more valuable for their time. And in some cases we find them adding random keywords to the query string so that they get a different set of search results. An exact query match, an exact string match, does not capture these variations, so we decided to use regular expressions. We feed all of these queries to our regular expression generator. This was [inaudible], which was in SIGCOMM 2008, from folks at MSR SVC. This helps you capture the underlying structure of the query, and you get to match all queries that are roughly similar even though they are not an exact match. Here is a sample regular expression, and once you have this set of regular expressions you can run them on top of your search log and find all the various queries which are similar. And once you have this, you can think of it as your new seed set: these are all malicious queries, so why not repeat the process? We essentially do that until we reach a fixed point, where we feed this back into our system and look at the query expansion, and we find our final set of malicious queries. Typically it converges in one or two iterations, so it wasn't a big deal. Here is some of the data we have from a week in February 2009. We found these sorts of malicious queries from nearly 40,000 IP addresses. They issued 540,000 unique queries, which are all different, for a total of 9 million queries. This number comes from the fact that many queries are repeated multiple times in order to get different pages of the search results. They issue the same query and look at the second page, issue the same query and look at the third page, and so on. And so in a week we found nearly 9 million queries for these kinds of vulnerable sites. And what kind of attacks did we find from these queries? This is part of our analysis phase, where we looked at these queries to figure out what they are looking for. One of the things that we found, naturally, is that they were searching for vulnerable web servers, and we found that nearly 5% of the returned search results show up in blacklists at a future point in time, and 12% of these returned servers were in fact vulnerable to SQL injection. We also find queries which are trying to find forums and blogs to post spam comments on, so you would find queries of the form "SueAnne's blog", "comment", "blog post". Yeah?
>>: Regarding the first one. So if my query is only for [inaudible] something, that's actually returning all of the websites that run that software, but it is different for this: some servers are vulnerable, some servers are not. So in terms of just finding [inaudible] new set of rules to identify vulnerable web servers…
>> John John: You're going to see a bunch of these queries, and there are some other queries which look at the-- it's not just the "powered by"; they also have things that are specific to the product. So a particular version, say 5.1, uses a particular set of paths in the URL.
>>: What I am looking for is, do you think that you can generate a set of vulnerable web servers just simply by analyzing the queries?
>> John John: Potentially vulnerable web servers. They are not all vulnerable.
>>: So when you say potentially, that means there could be a lot of false positives?
>> John John: Yes. In our case we found approximately 5% of them actually show up.
>>: So 5% is actually based on the future, some future event that you can cross check. Even when you account for that, that means a large number of them are not vulnerable.
>> John John: They are not necessarily vulnerable, but they are being targeted by the attackers, because the attackers are issuing these queries with the intent of trying to compromise them.
>>: Yeah. Okay. But basically you cannot.
>> John John: You cannot say for sure whether something is vulnerable purely based on this.
>>: Just a quick question on the previous slide. So you have those [inaudible] dorks, then you expand that dork set, so how big was this set compared to those initial dorks that you found on milw0rm, for example?
>> John John: So the initial set was 500 queries and the final one was 540,000. But these are kind of variations of similar dorks. In many cases they are other dorks which the attackers issued, and in many cases there are variations where they add things at the end and the beginning, add keywords.
>>: But you said that you also expanded based on what else they were searching for, based on IP addresses. So essentially this would not terminate and would leave you with the whole Bing index, so how would you differentiate between "this is a dork" and "this is just an IP that looks up a string on Bing"?
>> John John: Only the ones which started with these dorks, right? So we look at the IPs that issued at least one of these dorks, and we look at the other dorks or other queries that they issue.
>>: But say I am an attacker. And I Google for a dork and then I want to search, you know, the rest of the stuff I search for today. So then I would get everything…
>> John John: Yes, so we also look at whether a large number of people issued similar queries. There are overlap criteria that have to be met. So we do a bit of filtering; we don't blindly take all the other queries. We do filtering to make sure that that doesn't happen.
>>: What is the baseline for your top [inaudible]?
>> John John: Baseline in the sense?
>>: You said 5%, or if [inaudible] randomly sample web servers on the internet…
>> John John: .5 to 1%.
>>: So this is 10 times more likely to be on a blacklist?
>> John John: Yes, 6 to 10 times. It varied from .5% to 1%, so that was the baseline. And for SQL injection, we found 2% to be vulnerable if you search for random servers. And one of the other attacks we found was actually an ongoing attack, an ongoing phishing attack on Live Messenger user credentials. You had attackers who would compromise a Live Messenger user and send out a phishing link; users would then click on it, be shown some things, and their accounts would also be compromised. Looking at the SearchAudit results we found nearly 1,000,000 such compromised accounts appearing over a year. And this was something-- yes?
>>: [inaudible] how did you get hold of the compromised accounts? Maybe it's out of scope but I'm…
>> John John: The way it worked was that when you click on one of these links, it issues a query to Bing. The way they set it up, this was purely incidental; it was not anything special to this attack. It just happened to make use of the Bing search engine. It would issue a query to Bing with the referrer field containing the username of the person who clicked the link. And that's how we were able to see which set of users had been compromised.
And then we also later did some cross analysis which showed that these accounts had in fact been accessed by the attacker from an IP in Singapore, and verified that these were in fact compromised. As soon as an attacker starts this process of finding these vulnerable web servers, you know which servers are in the crosshairs, so you can potentially proactively defend against these attacks even before they are launched. And the search engine could try to block such malicious queries and sanitize the results to make it harder for the attacker to go about finding these things. So eventually we can use it to detect new attacks as they come up and potentially also find the attackers. All right, so the next step, once you have this notion of which servers are being targeted, is to figure out what the attackers do next. What is their next step? How do they actually go about compromising these machines? In order to do that we take a page from the attackers' playbook. We create fake pages that look vulnerable, and these pages are crawled by the search engines; when attackers issue these queries, they get our pages, and when they try to attack us we get to observe their attempts firsthand. So let me quickly run through the architecture of these heat-seeking honeypots, which give you this information. First we have the malicious query feed which we get from SearchAudit, so we know the sort of queries the attackers are issuing to the search engines. We issue the same queries to Google and Bing and we get these pages. We fetch these web pages, store them, and set them up in our honeypots. These are now crawled by all the various search engines. The next time an attacker issues a query for a similar term, our pages get returned. And then they try to attack us and we get to see firsthand how they go about this.
>>: Your pages will be returned way down the results [inaudible] right?
>> John John: Yes.
>>: And so why [inaudible]
>>: Because they want a billion of them.
>>: Even though you might need 10 million as a target…
>> John John: Even 1000 results down we still get visits, and since we have pages on .edu and microsoft.com linking to our honeypot pages, [inaudible] higher than it should otherwise be.
[laughter]
>> John John: So once we find these attacks, we install the actual software to see how the compromise really happens. And the actual compromise is often quite straightforward. osCommerce is web software for managing shopping carts. If you are running an Amazon-like website and you wanted to have a shopping cart, you would use osCommerce. And if the site is hosted at example.com/store, the way you would actually compromise the site is straightforward. You would visit this URL and give it the file you want to upload, and now it is hosted on their web server. And this could be any file. It could be an executable file. It could be a PHP file, which can essentially run with the privileges of the web server. And what they do after this is quite interesting. Most attackers typically host a PHP-based file management system. It is like a shell which gives you a graphical interface to delete files, upload files, change permissions, perform a brute force attack on your /etc/passwd file, and whatnot. So this is one of the typical things attackers do after they have compromised the server. And they can host any number of malicious files on there and then send out links.
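To give a rough sense of what observing these attempts firsthand looks like, here is a minimal sketch that buckets web-server access-log lines from such a honeypot into coarse attack categories; the regular expressions, category names, and the access.log path are simplified assumptions for illustration, not the classifier actually used in the study.

```python
# Rough sketch: bucket honeypot access-log lines into coarse attack categories.
# The patterns below are simplified illustrations, not the paper's real classifier.
import re
from collections import Counter

ATTACK_PATTERNS = {
    "admin probe": re.compile(r"/(wp-admin|administrator|phpmyadmin)", re.I),
    "file upload / RFI": re.compile(r"(file_manager\.php|\.php\?.*=(https?|ftp)://)", re.I),
    "sql injection": re.compile(r"(union\s+select|or\s+1=1|information_schema)", re.I),
    "cross-site scripting": re.compile(r"(<script|%3Cscript)", re.I),
}


def classify_request(request_line: str) -> str:
    """Return the first attack category whose pattern matches, else 'other'."""
    for category, pattern in ATTACK_PATTERNS.items():
        if pattern.search(request_line):
            return category
    return "other"


def summarize_log(path: str) -> Counter:
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            # Pull the quoted request ("GET /path HTTP/1.1") out of a combined-log line.
            match = re.search(r'"([A-Z]+ [^"]+)"', line)
            if match:
                counts[classify_request(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    print(summarize_log("access.log"))  # hypothetical honeypot log file
```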
So from our honeypot we set up nearly a hundred…
>>: [inaudible] change computers or something so [inaudible]
>> John John: It does not, at least for the smaller web servers. And actually the sites which are attacked by this are not the well administered ones; these are free, open-source software packages, usually run on smaller servers without a good security setup. So we ran our honeypot for three months, and with 100 pages set up we found nearly 6000 different IP addresses [inaudible], so not as large as it would have been if we had had a highly ranked site. We had nearly 55,000 attack attempts. And the honeypots saw all sorts of different attack attempts, such as trying to get admin access, brute force password attacks, SQL injection, cross-site scripting and the whole gamut of things. Yes?
>>: [inaudible] attacks were a result of the honeypot instead of just random [inaudible]. In other words, I run a number of web servers that are constantly just getting scattershot…
>> John John: Right. So we do have a baseline case of just a [inaudible] running just a plain web server. And there we see four or five different attacks, whereas with the honeypots we see a larger variety of attacks.
>>: How do you run them side by side?
>> John John: Oh, this was like before and after.
>>: But two different IPs, right, at different times?
>> John John: No, at the same time, two different IPs.
>>: Two different IPs, but the IPs are right next to each other or something?
>> John John: They are on the same domain; they're all in washington.edu. And now for the last part of the talk, which should be pretty quick. What happens after the site is compromised is that they host malware pages on there and then they spread these links through either e-mails or IMs, or in this case through search engines. So let me give you a video example of how this works. You type a benign query into Google, in this case Flintstone pictures on MySpace, and they happily help you autocomplete and search, and you get the results. Click on the very first link, and that turns out to be a compromised link, because now it shows a big pop-up which says your computer is infected. And you click okay. It scans your Windows drives, your C drive is infected, your D drive is infected, you've got a whole bunch of things going on, and now you have a choice of either protecting your PC or ignoring it. No matter what you do it tries to download a file, and if you actually save and install the file you have now been compromised. So this is a rather common form of social engineering which today is called a scareware attack.
>>: You're running that on a Mac.
>> John John: No, this is actually Windows. This is a video.
>>: [inaudible]
>> John John: It runs on a Mac too; you get the same thing.
[laughter]
>> John John: Full-screen doesn't look quite so realistic, so I had to run it on Windows to show you. Now, is this really a problem? Well, it turns out nearly half of the popular search terms contain at least one malicious link in the top results, and that is quite bad. And just last year, this sort of scareware fraud cost nearly $150 million. Now where is the money coming in? Well, once you install this fake antivirus it runs in your taskbar and every 30 days it pops up a link saying your protection is running out, please buy the full version for $30. And it turns out at least 5 million people did fall for that and paid $30 to buy it.
>>: So scareware doesn't take over your machine and scan all your files and blackmail you?
It just tries to sell you something? >> John John: So this one does not. But you do have things that… >>: But it's software. [laughter] >>: It does an update. [laughter] >> John John: There is a similar thing called ransomware which will essentially encrypt your C drive and then ask you to wire over so much money before it gives you the password. >>: [inaudible] you mean the top 10? >> John John: Top 50, top 50, yeah. And this was mostly a problem with Google, not with Bing. >>: I feel like [inaudible] getting an oil change or something. It's just weird. >> John John: So I guess they could use it also as a dropper, because once you install a piece of software, it doesn't have to be just this fake antivirus; it could be anything of their choice. For now this seems to be a good [inaudible] approach that they have stuck to. So our goal here is to understand how these search engines are getting poisoned. How is it possible for them to [inaudible] poison a popular search term and get it to the very top of the search results? For this we looked at a sample attack; this was a real attack in progress. And it involved nearly 5000 compromised web servers. And these have a very densely connected cross-domain link structure, so you have each of them pointing to 200 other sites and so on. And once you click on any of these results it redirects you to an actual exploit server which serves the malware. And these were hosted on nearly 400 domains in the US and in Russia. And one of the things we observed was that the log files on these servers were not very well protected, so we were able to access these logs and we could see which were the different things they were redirecting to and how many victims actually clicked through and ordered. And over a 10-week period we found that over 100,000 users had actually clicked through to the final scareware page. >>: When you say they weren't well protected? >> John John: They were not password protected. The log files were on the web server and could be read. So we didn't have to do anything shady to access the files. >>: [inaudible] is that part of the no crunch and no crawling like if [inaudible] was faulty [inaudible] access [inaudible] >> John John: This is a log stored by the attackers. >>: The attackers decided to [inaudible] to make it a request where you don't have to authenticate, so you just had to pretend to be them; you didn't have to do anything other than just say, may I have it? >> John John: It was capability-based: if you know the name of the file, you can access it. >>: It seems a bit optimistic that the attacker happened to store a log file? >> John John: Yes. >>: [inaudible] everyone access with no protection. >>: What do they care? [laughter] >>: So those logs are separate on those 5000 servers, right? >> John John: Actually the log files are on a redirection service, so all of these 5000 servers funnel into three servers which are responsible for redirecting to the final server. >>: And you verified those log files? One hypothesis would be that you have interfaces or log files basically producing random numbers. So if you go there today you know they will say 100,000 and if you go there tomorrow from a different IP it was a… >> John John: No. It was definitely verified [inaudible]. Every time I visited, my IP and everything was [inaudible]. >>: Okay. >> John John: So some of the prominent features of this attack: nearly 20,000 keywords were poisoned.
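The dense cross-domain link structure described here can be found mechanically. The following Python sketch, using networkx and a hypothetical input file of domain-to-domain links, pulls out large strongly connected components, which is where mutually interlinked compromised sites would cluster; the threshold and file name are assumptions for the example.

# Sketch of cross-domain link analysis: build a directed graph of
# domain -> domain links and extract large strongly connected components.
# Legitimate sites rarely form big mutual-link cliques across domains.
import networkx as nx

G = nx.DiGraph()
with open("domain_links.txt") as f:          # hypothetical: "siteA.com siteB.org" per line
    for line in f:
        parts = line.split()
        if len(parts) != 2:
            continue
        src, dst = parts
        G.add_edge(src, dst)

suspicious = [c for c in nx.strongly_connected_components(G) if len(c) > 50]
for component in suspicious:
    print(len(component), sorted(component)[:5])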
And where did the attackers find these keywords? They come from Google Trends: essentially, each day Google publishes a list of the top 10 or top 20 search terms. And these hackers take these 20 search terms, push them into Bing, and for each of these terms they get another 10 related search terms. And they collect all of these over several days, and they end up with a huge set of keywords that they can now poison. And more than 40 million pages were indexed, and all this happened in just 10 weeks. So in order to detect these things we make use of the features that we found. One of the features is that they have a very dense link structure across them, hundreds of sites linking to each other. There are many popular search terms in the URLs; pretty much anything related to Justin Bieber would show up there. Then there are a large number of new pages. So once a server got compromised you will find that an attacker suddenly hosts a thousand or 10,000 new pages on the server, and these new pages are very similar across multiple domains, since attackers typically use scripts that go out, attack, and host pages on these servers. They are very similar across these domains. >>: There is a long history of [inaudible] >> John John: Yes. This is not necessarily SEO spam in general; this is aimed only at the ones which are fully compromised sites. So SEO spam typically involves sites that are set up for this purpose, but with compromised sites you have a different model. So they have a particular behavior up to a certain time, and once they get compromised they completely change phase and do something different. >>: [inaudible] for a hacker to put on a compromised website [inaudible] they will use a similar attack [inaudible] so nonmalicious websites that are kind of optimized [inaudible] use similar techniques, right? And then in that sense your techniques can [inaudible] optimized compromised websites would look to the search engine similar to the ones the hacker optimizes [inaudible] >> John John: Yes, but in our case it's probably a little easier, because there is a sudden phase change after the compromise, so we can make use of that to determine which sites have been compromised. >>: I see, you assume you have a before and after… >> John John: So we have the web logs, the historical information about each website. >>: I see. >> John John: So it's actually easier than the normal SEO spam detection which search engines have to do. >>: I see, I see. >> John John: And some quick results. We found nearly 15,000 URLs and 900 domains corresponding to multiple compromised-site SEO campaigns. And we picked 120 popular searches and found that 43 of the searches did in fact have compromised results, and 163 URLs were compromised. And to conclude, today's landscape is rather complex and we need a multipronged strategy to address these various attacks. So we use SearchAudit, deSEO, heat-seeking honeypots and Botlab as defensive tools. And we found that monitoring attackers often reveals new attacks and that infiltration is a rather effective mechanism, but it has to be done carefully. And with that, let me just quickly mention I am also on a bunch of non-security projects: a couple of them are Consensus Routing, which does consistent routing at the inter-domain level, and also Hubble, which is a system for studying reachability problems in the internet, and more recently Keypad, a file system for theft-prone devices.
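The "phase change" feature mentioned here, a quiet site suddenly sprouting thousands of new keyword pages, can be illustrated with a short Python sketch; the input format, thresholds, and example domain are assumptions for the illustration, not the actual deSEO pipeline.

# Illustrative sketch: for each domain, count new URLs per week and flag
# domains whose new-page rate suddenly jumps relative to their own history.
from collections import defaultdict

def weekly_new_urls(crawl_records):
    """crawl_records: iterable of (domain, week_index, url_first_seen)."""
    counts = defaultdict(lambda: defaultdict(int))
    for domain, week, _url in crawl_records:
        counts[domain][week] += 1
    return counts

def flag_phase_changes(counts, jump_factor=20, min_new=1000):
    flagged = []
    for domain, per_week in counts.items():
        weeks = sorted(per_week)
        for prev, cur in zip(weeks, weeks[1:]):
            baseline = max(per_week[prev], 1)
            if per_week[cur] >= min_new and per_week[cur] / baseline >= jump_factor:
                flagged.append((domain, cur, per_week[cur]))
                break
    return flagged

# Example: a domain that historically added ~10 pages a week suddenly adds 5,000.
history = [("smallstore.example", w, f"u{w}-{i}") for w in range(5) for i in range(10)]
history += [("smallstore.example", 5, f"spam-{i}") for i in range(5000)]
print(flag_phase_changes(weekly_new_urls(history)))

In practice this kind of signal would be combined with the other features mentioned in the talk, such as the dense link structure and near-duplicate pages across domains.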
And if you want more information on bots, you can go to botlab.org. And that is it. [applause] >> John John: Yes? >>: You alluded to it in the slide: you had 120 searches on Google with 43 found with malicious… What about Bing results? >> John John: One. Bing had one, as I recall. >>: Are you going to publish that somewhere? >> John John: Oh yeah, it was there in the… >>: Yeah, because I mean the Bing guys have been working. >> John John: It's there in the paper. For sure, and yeah, Bing had only one malicious [inaudible]. >>: What is the scale for [inaudible] >> John John: For spam links. In our case we were looking only at compromised-site SEO. >>: I guess this is a follow-up to parts of the solutions, but [inaudible] the features that you use to detect and identify cases of SEO [inaudible] seems to be necessary, because the fact that there are a large number of pages or a large number of new pages or pages that are very similar, if somebody knew that by doing that they were going to get figured out, they would work around it. So it didn't seem necessary… >> John John: A couple of them are necessary conditions, for instance the dense link structure, because ultimately this depends on the PageRank algorithm. >>: What if I just have some random links? To get rid of the dense link structure, even that doesn't seem necessary. >> John John: No, if you just look at the dense link structure across these things, if you look at the completely connected or the strongly connected components, that is kind of necessary in order to boost your [inaudible]; you need a lot of incoming links. >>: It kind of depends. If you only have 100 maybe you need strong links. Maybe if you have 10,000 or 20,000 you would still boost up without being strongly connected in your setup. All you have to do is get it down to the level [inaudible]-based sample [inaudible] >> John John: So yes, in that case it is not strictly necessary, it is possible for them to move things around, but in terms of getting it to the top very quickly they do need things like relevant information in the page, right. If you just focus on the top search terms, that is sufficient for you to look at a smaller set of web pages and then do your analysis on there. So one of the things they found necessary was to target relevant search terms, because there's not much historical information about these things. Like it's really hard to knock Bank of America out of the top spot, but for something like, say, the tsunami, the search engines won't have any historical information, and so it becomes easier to game the system for short-term events. >>: Another way, I don't know if this would work, but another way to distinguish this apparent SEO from companies that are actually trying to boost their search rankings: in one case if you e-mail the site administrator they will say, oh my God, my site has been hacked, and in the other [inaudible] but that's the way it's supposed to look, it's got a great [inaudible], it's got all sorts of [inaudible]. Would that work, or would the [inaudible] >> John John: Most site owners do not respond, and one of the issues we had is that once you recognize a site has been compromised, any attempt to contact them could create some sort of legal liability, because in many cases they would be like, oh my God, Microsoft attacked me.
[laughter] >> John John: So if you send an e-mail from Microsoft and your site has been compromised, that is the kind of reaction they would have, and so the lawyers are like, if you find something, just don't do anything. >>: [inaudible] [laughter] >>: [inaudible] has like a crawler for websites [inaudible] >> John John: Yes. So we may end up using the historical Bing web information. >>: [inaudible] websites? >>: That's slander. [inaudible] >>: So this one is for the audience, but I'm assuming that most people came here because they are interested in botnets and research on that. Do we have an internal [inaudible] for discussing this kind of thing? Should we? How many people would be up for this kind of discussion? Not that many, okay. Okay well, maybe we should just get together whenever this is over, share some notes and get something set up. >> Srikanth Kandula: Let's thank the speaker one more time. [applause]