>> Leonardo de Moura: It's my great pleasure to introduce Hassen Saidi. He's a computer scientist at SRI International. One big thing that Hassen did when he was a student was to invent predicate abstraction. His paper, last time I checked, had more than 1,000 citations. It's the basis of many model checkers, including SLAM, SSET, that we have here in Microsoft.
Recently Hassen moved to security. His recent work is described in a book, Worm: The First Digital World War, from the same author as Black Hawk Down. Maybe in the future, you're going to have Hassen in the movies.
>> Hassen Saidi: Thank you. Today, I'm going to talk about challenges in malware analysis. Besides malware analysis, I work on different projects related to security and formal methods. And specifically, mobile security. We have some interesting projects with partners like Cambridge University where we're doing more sort of like foundational work, trying to design hosts from scratch to be more secure and applying that to the networking world where we're building switches and routers that are more secure from the ground up.
But today, I'm talking about malware analysis. It's a really interesting topic in the sense that it's attracting a lot of interest in the media and in several government circles, because it's a really important topic when it comes to securing the digital infrastructure of different countries around the world.
So you've probably seen this. This is the internet more than 30, 40 years ago. It was a small network. It has grown, and with every increase in scale, the threats to the internet and to the various components of what constitutes the digital infrastructure have grown.
It started with simple worms that were written as sort of like proofs of concept and toys that replicate very rapidly across the network, and moved to more and more sophisticated threats, where people started using multiple vectors of infection and combining them with certain payloads: botnets, bank intrusions and so on. And the latest ones are really interesting and are related even to cyber warfare, such as Stuxnet, Flame and other variants.
And we're really not dealing with a single network when you think about it.
The internet has grown from sort of like an IP network to the web. On top of that, you have social networks, you have cyber physical networks, you have critical infrastructures that were separate kind of like networks but now they're actually being connected to the global network through the internet.
And on top of that, you have also mobile networks.
And one thing to notice, something that we learned from biology, is that whenever you have such a connected world, that connectedness actually gives an edge to the most potent of viruses. This is a well-known fact in biology, and you can see that translating here, in the sense that people who are using the internet to infect the critical infrastructure can actually build much more potent viruses.
So the way people actually infect the infrastructure is through vulnerabilities. This is a list, compiled from various sources, of the prices of vulnerabilities in various systems. This is basically across the board: operating system vulnerabilities, browser vulnerabilities, some application vulnerabilities and so on.
And you can see, some of these actually go for quite a lot of money, hundreds of thousands of dollars. But you don't need a huge amount of money to actually infect the entire world. This is, for instance, the example of Conficker, which hit the world in 2008-2009. It basically sent a message to a random IP address on the internet. So if you are connected to the internet, you have an IP address, and if you have somebody basically generating IP addresses randomly, at some point they're going to hit your IP address, and just the fact that you are connected to the internet basically makes you vulnerable to these attacks.
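To put a rough number on the random-scanning argument, here is a back-of-the-envelope sketch. It assumes probes are drawn uniformly from the full 32-bit IPv4 space, which is a simplification (real worms skip some ranges and bias their scanning); the function name and probe counts are illustrative and do not come from the talk.

```python
import math

IPV4_SPACE = 2 ** 32  # assumption: uniform draws over all 32-bit addresses


def hit_probability(probes: int) -> float:
    """Chance that at least one of `probes` uniform random draws lands on one fixed address."""
    # 1 - (1 - 1/N)^probes, computed with log1p/expm1 for numerical stability
    return -math.expm1(probes * math.log1p(-1.0 / IPV4_SPACE))


if __name__ == "__main__":
    for probes in (10 ** 6, 10 ** 9, 10 ** 10):
        print(f"{probes:>14,d} probes -> {hit_probability(probes):.4f}")
```

Under this simplified model, a single address becomes more likely than not to be probed once the total number of probes approaches the size of the address space, which a large population of infected scanners can plausibly reach.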
Somebody can send you a message, and if you have a vulnerability in your operating system or in your infrastructure that's listening to that sort of like message, well, you get infected. In this case, there was a vulnerability in the Microsoft operating system that was very well documented. There was a patch for it. It was patched, but somebody actually wrote an exploit for it and distributed it on the internet and actually was selling it on a Chinese website for $37.80. So it's not a lot of money.
So if you have that exploit and you already thought about a scheme for infecting the world, you know, it's quite a cheap investment. And one thing to notice here is that the vulnerability was actually patched. So you wonder, you know, how come people can exploit vulnerabilities that are already patched? It's because it's a global problem. It's not just a local problem.
If you are in the U.S., you know, most people are actually running legitimate versions of the Microsoft operating system so they actually have valid licenses. They get free updates. They actually are diligent about their security and updates.
If you are in the rest of the world, you're more likely to actually get a pirated copy so you don't care about updates. You have a fake antivirus product running on your machine and so on. So you see, like, here when we looked at the number of machines that were infected by Conficker, you know, the lion's share was in China, Brazil, Russia, India and you can easily imagine why.
Let's look at another example. For instance, Stuxnet. So in Stuxnet, the machines that were infected were not on the internet, for instance. So you don't necessarily need to be connected to the internet to be infected. So you had a system running a Microsoft operating system, Windows, and it was controlling an industrial system.
So if you go to a factory, you have machines building products, you have robots, and guess what? Those are actually controlled by a computer, and those computers, of course, in the case of Stuxnet, were not connected to the internet, but they have a bunch of other components that actually make them open to the world, you know: USB ports, connections to printers, email services, file shares and so on.
In this case, they were just simply connected to a printer that was shared with other PCs on the same LAN, basically, and some of these PCs were actually connected to the internet, but most of them were not. But somebody actually put a USB key in one of those machines, and that infected other machines that shared the same network, and some of these machines actually attacked the machines that were controlling the industrial system, which were supposed to be completely shielded from the internet, and infected the controllers of these industrial systems.
So you have a machine or a robot that's programmed to do a particular action, you can actually change that, override that.
And the way they did it in this case is not through a $37.80 exploit, but actually through three or four zero-day attacks that nobody knew about. Some of them were known, most of them were not. And you can imagine how valuable those zero-day vulnerabilities were.
But when you think about it, they needed all of those zero-day vulnerabilities, because they were going after a really interesting and much more hardened target, whereas in the Conficker case, just the fact that you were connected to the internet would make you vulnerable. Here, they really needed, first of all, to exploit a vulnerability in the way the system would actually execute an executable just because you insert a USB key in your computer.
So they used a vulnerability there. They used another vulnerability to actually infect PCs on the same shared network, and then they used a different vulnerability, which actually was an interesting one. It was kind of like publicly disclosed, I think, a year before Stuxnet appeared. But it was on an obscure blog. Somebody said, oh, yeah, you can actually do this if you're connected to the internet. I can actually connect -- if you share a printer, you can actually make the other machine execute a script.
And once they got to the machine, the interesting machine that actually was controlling the industrial process, that was easy, because in that case, you actually needed a password, but the password was actually publicly available on the internet, because all of these machines running Siemens software had the same password worldwide and it was posted on the internet because if you have thousands of clients around the world, you don't want them to have, like, different passwords because clients forget their passwords and you have to reset it and so on. So they set a single password for everybody. Everybody knew it and so on. It's really interesting.
And if you go to like, if you are interested in these industrial systems, it's really interesting. You can look up in the internet, like, IP addresses and so on. So these, for instance, utility companies, they have machines that are connected to the internet and they need actually to maintain these machines.
So what they do, they actually post all of that information online.
So if somebody wants to access that machine to do some maintenance, they say oh, you know, that's the IP address of the machine, just log on there and change whatever you need to change.
So one thing to note about those vulnerabilities is that things have gotten better, actually. There are fewer, in some sense, vulnerabilities in these modern operating systems. I mean, things are getting better. It's much harder to compromise the newest version, for instance, of the Windows operating system. It is much harder, for instance, to jailbreak an iPhone, if you have one.
It's much harder now to jailbreak these iPhones. I mean, it's becoming more and more complex, because people are putting more and more security features in those products and so on.
But the malware, in itself, is getting more interesting. I mean, people are building large botnets. They send spam, and they can monetize spam, actually, quite effectively. You have people, basically, building capabilities like backdoors and keylogging and rootkits to actually spy on people, you know, basically countries spying on other countries. Countries are involved in industrial espionage. This is basically rampant, and people are building more and more sophisticated pieces of malware.
APT, advanced persistent threats, is an example of that. So if you have somebody writing malware to infect millions of machines, that's one thing. But if you get people, basically, building or crafting malicious software that actually infects basically just a hundred computers, that's much harder to deal with.
If you remember, the Dalai Lama organization had their computers, for instance, hacked into two or three years ago. And it was roughly like a thousand machines that were hacked into. It wasn't like a large number of machines. I think the government of India also a couple of years ago discovered that everything that they were doing in their, you know, Ministry of Foreign Affairs and Ministry of Defense was basically siphoned to China and China had basically access to everything, all of their policy decisions and so on.
And they didn't infect, like, millions of machines, like in the case of Conficker or when somebody wants to steal money. They just infected a few hundred machines, and they had access to all of that wealth of information.
And when you're dealing with those kind of issues, so once you have an infection, you have some sort of like binary that you have that infects your system, and you have to actually figure out what that binary is doing, how it's infecting your system, and how do you sort of like clean up your system, how do you deal with it.
And malware reverse engineering is close to impossible. I mean, it's very hard. I mean, it's -- actually, maybe I shouldn't say close to impossible. It just takes time. I mean, that's the issue. If you have enough resources, you can do a very good job in understanding what a particular piece of malware that affects your system is doing.
But it's really time consuming. So you need automated techniques. So you end up with binaries like that, in hex form, maybe some strings, and you want to know what that binary does on your system. Does it have hidden logic? What happens, you know? It's been on my machine for like a month, a day. What's the full capability of the malware, and so on.
Just a few examples. When I worked on Conficker, for instance, in 2008, 2009, there was one variant that was completely obfuscated. I mean, it was like really hard to look at code. You look at like the assembly code and it was basically spaghetti code and a lot of the calls, the function calls were obfuscated. You couldn't figure out what the malware was doing.
So it took me about like four weeks to basically write a report and rebuild the entire malware, basically, and get it into C form so you could look at it and read it as a C-code. So if you have to do this for every piece of malware that you get, I mean, it's really, it's crazy.
It took Symantec, for instance, six months to produce a full report on Stuxnet.
And I'm pretty sure that there are aspects of Stuxnet that they're not fully aware of. They cannot understand the malware 100 percent. They can't tell you, basically, what every line of code does, what every bit of data is for.
So it takes them six months, and they throw at it a lot of people.
And when I say six months for Symantec, I should basically add the effort that the Microsoft guys were involved in, and that the Kaspersky guys were involved in. If it takes you six months to understand what a piece of malware is doing on your system, I mean, in six months, you know, I can steal all your data, steal all your money and I'm gone. I don't care after six months.
>>: So you can stop it before six months. You have to understand it to stop it? That's how [indiscernible] detection works.
>> Hassen Saidi: You can stop it to a certain extent. I mean, like, for instance, in Conficker. I mean, Conficker, until today, it's still there. I mean, we have a signature for it. It's still there. I mean, if you go to the Conficker Working Group page, it still has the stats on all the machines that are infected. I don't really know what the latest numbers are, but at least, you know, it's over a million that are still infected.
>>: Many more people who [indiscernible].
>> Hassen Saidi: Right, right. And that causes, I mean, that's the point about this being a global problem is that you can stop it somewhere. But if it's really aggressive in its propagation, it's much harder.
Here, for instance, the example of Stuxnet, I mean, you can see, for instance, that even though people discovered Stuxnet after six months, they fully understand what it is, now they're discovering, for instance, Flame, which has been there actually for like, I think they're saying it's been there for years.
And it's quite interesting.
So what you want is basically to dedicate your resources, especially your reverse engineering resources, to only work on the difficult part and use automated tools for the rest. So we built something called Malgram, which tries to actually solve these problems. The idea is you want automated reverse engineering.
I want to give you a binary that I found on my system, and I want you to tell me what it does and produce a report, sort of like an automated way of generating reports. But instead of doing it in six months, you want to do it in a much shorter period of time so at least you can get an idea of what the piece of malware is doing and focus your energy on a particular aspect of it.
So what it does, basically, it has components doing dynamic analysis and static analysis, and those are basically, that's what people do when they analyze malware. I mean, people either run the malware, they collect forensic data, they see what the malware is doing on your system, or they look at it just from the static point of view. They look at the code and try to understand what the code does.
So the first challenge is when you get a piece of malware. And this is actually a real piece of malware that I got from somebody who visited China, actually. And when that person came back from that trip, they suspected that their computer was actually infected.
So they gave me a piece of malware and said, you know, look at it. And you get it and you go to these websites like VirusTotal, you submit the binary and say, look, do you know something about this binary? Have you seen it before? Has any of the antivirus software seen this binary?
And if you're lucky, you get multiple hits saying, yeah, you know, Avast knows about this, Bitdefender knows about this, Kaspersky knows about this, Symantec knows about this, and you can have a link to a report that describes what the malware does.
In this case, only one -- even now, you know, after over a year, two years, you have one obscure thing saying Trojan. You have no information whatsoever. And in most cases, when you have something that hits the internet, like, for instance, the case of Conficker, the first instance we got, we submitted it to [indiscernible] and it says zero. Nobody knows about this thing. They have no idea what it is.
So what do people do? They actually take the binary. So in this case, for instance, there's McAfee, and there's another company, a Chinese antivirus company, that produces like automated reports. When they get a sample, they analyze it and they actually produce a report, and this person actually gave the sample to friends at McAfee and at this Chinese security company, and they produced a report for him.
And when you click on the report, it basically didn't say much. All it says, if you go to characteristics, is basically that it drops these files. That's what most of these dynamic reports do: they basically try and give a label for these samples. In this case, they say it's a dropper, because it creates files on your system. What do these files do? They have no idea. And then they say, oh, you know, it talks to the internet and it talks to Microsoft. Okay. That's all it does. But when you look at it closely, when you open the code and try and figure out what's in that code, well, you see a reference to Microsoft updates, Microsoft. And what happened is that somebody is actually creating this buffer full of data that looks like, for instance, HTTP traffic with HTTP headers, putting it in a buffer and sending it through send.
So somebody is opening, like, sockets to some IP address and sending data that looks like HTTP traffic with nice headers and putting somewhere, like, you know, Microsoft update dot Microsoft dot-com, and if you're just relying on these sort of like dynamic analysis engines, you're completely fooled. You're saying, oh, this thing doesn't do much, it just talked to this IP address.
And in this case, it was actually talking to an IP address in, I think, one was in Malaysia and one was in indicia that were not Microsoft IP addresses.
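The mismatch described here, an HTTP-looking buffer that names one host but is sent to an unrelated IP address, can be checked mechanically. The following is only a sketch under stated assumptions: the hostname spelling, the allowlist, and the function names are hypothetical, and the addresses come from the documentation ranges rather than from the sample discussed in the talk.

```python
import re
from typing import Optional

# Hypothetical allowlist: IPs an analyst has confirmed for each hostname.
# The address below is from a documentation range, not a real Microsoft IP.
KNOWN_IPS = {"update.microsoft.com": {"192.0.2.10"}}

HOST_RE = re.compile(rb"^Host:[ \t]*([^\r\n:]+)", re.IGNORECASE | re.MULTILINE)


def claimed_host(payload: bytes) -> Optional[str]:
    """Return the hostname named in an HTTP-looking buffer, if any."""
    match = HOST_RE.search(payload)
    return match.group(1).decode("ascii", "replace").strip().lower() if match else None


def looks_spoofed(payload: bytes, dest_ip: str) -> bool:
    """True when the buffer names a known host but was sent to an IP outside that host's set."""
    host = claimed_host(payload)
    return host in KNOWN_IPS and dest_ip not in KNOWN_IPS[host]
```

A report built only from the header text would say "talks to Microsoft"; comparing the header against the socket's actual destination is what exposes the disguise.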
So what does Malgram do? It basically showcases the different phases for analyzing malware. It has some sort of file format analysis. You get a binary. You want to know, you know, is this a DLL, is this an executable, and so on. Is it, for instance, packed, meaning encrypted, or is it not, and so on.
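As an illustration of what such a file format pass might look at, here is a minimal sketch that reads the PE headers to tell a DLL from an executable and uses byte entropy as a packing hint. This is not Malgram's implementation; the function names and the 7.0 threshold are assumptions, and high entropy is only a heuristic for compressed or encrypted content.

```python
import math
import struct
from collections import Counter

IMAGE_FILE_DLL = 0x2000  # COFF Characteristics flag from the PE format


def pe_kind(data: bytes) -> str:
    """Classify a buffer as 'dll', 'exe', or 'not-pe' from its headers."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return "not-pe"
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00" or len(data) < pe_offset + 24:
        return "not-pe"
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 22)
    return "dll" if characteristics & IMAGE_FILE_DLL else "exe"


def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())


def maybe_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic only: compressed or encrypted payloads tend toward high entropy."""
    return entropy(data) > threshold
```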
It has a dynamic analysis. It's similar to the kind of like reports you've seen. It basically tells you, oh, you know, these are all the files that it creates. These are all the network communications that it issues and so on.
And then it has a component of static analysis that basically digs deep into the code and tries to recover semantic information from the binary, so that you don't get fooled by those dynamic reports that say, well, this thing is talking to Microsoft. It says no, this is not talking to Microsoft. It's basically opening a raw socket. It's not actually doing HTTP traffic and so on.
So some of the components that the static analyzer has: an unpacking component, which basically, if you look at the code and it's scrambled or it's compressed, actually unravels its inner sort of like logic. It has some sort of like decompilation capabilities so that you can actually go from binary to assembly to C-code and so on.
And it has some analysis components that try to actually -- now, it's not easy to go from binary to C-code. But sometimes, even looking at the C-code, sometimes some of the C-code is actually worse than the assembly. It's much easier to look at the assembly. But even if you have C-code, you still have that question, what does this C-code do.
And, you know, we know that's a difficult problem to solve. I think that's like a 70 or 80-year-old problem. I need to show you sort of like a sample report. I don't know if this is open to the rest of the world so you can see it. Yeah, something like this.
So it basically, in this case, tells you about -- this was [indiscernible]. So you submit a sample. It tells you, we think it's bi-root because of what some other people are thinking. It actually includes an interesting component, which is basically comparing the sample image to other samples. And this is actually quite interesting, because it's a little bit of science fiction. It's not 100 percent bulletproof. But when it works, it's really effective. And it's actually fast. It allows you to do matching through a large number of samples quite easily.
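The talk does not say how the sample-to-sample comparison is computed. As a stand-in, the sketch below uses a different and deliberately simple technique, Jaccard similarity over byte n-grams, just to show what "match this sample against a corpus" can look like. All names and the toy corpus are hypothetical.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """All distinct n-byte windows in the buffer."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the two buffers' n-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def closest(sample: bytes, corpus: dict) -> tuple:
    """Return (label, score) of the most similar known sample in `corpus`."""
    return max(((label, jaccard(sample, known)) for label, known in corpus.items()),
               key=lambda pair: pair[1])
```

Like the component described in the talk, a measure of this kind is fast and not bulletproof: packing or heavy rewriting changes most n-grams, so a low score does not prove two samples are unrelated.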
And then it tells you, you know, when you run that, basically, executable, it spawns different processes and it tells you, basically, what all the different library calls it invoked. It also gives you some -- you know, these are basically forensic data that it tells you basically that it creates some new keys and so on.
So you can generate from this some sort of like a signature for this particular sample. So let me move to the static analysis. Dynamic analysis, I mean, is much more popular, because it's much easier to set up, you know. You just have like a bunch of virtual machines. You run your sample under different configurations of the OS. You can say, oh, this is what it does on, say, XP. This is what it does on Windows 7. This is what it does on Windows NT and so on. And you collect all that data.
And this is what most people do, because it's much easier to set up. It's easy to get, for instance, quick data about the malware, you know. For instance, like you've seen in the report, as soon as you run that thing, it issues like a communication with the outside world. But what happens when you run a sample that actually has a time bomb that says, you know, I'll only execute this when it's January first, and so on? That's a little bit tricky to figure out.
But people use different tricks. They move the clock forward, they move it backward. They play with it. They do interesting things.
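The clock trick can be pictured as a sweep: run the same sample once per simulated day and see on which days its behavior changes. The sketch below is a simulation only. `toy_sample` is a hypothetical stand-in for a sandboxed run (a real harness would set the virtual machine's clock before each run), and the January 1 trigger is just an example date.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List


def toy_sample(today: date) -> str:
    """Stand-in for a sandboxed run; a real harness would boot a VM with its clock set to `today`."""
    return "payload" if (today.month, today.day) == (1, 1) else "idle"


def sweep_clock(run: Callable[[date], str], start: date, days: int) -> Dict[str, List[date]]:
    """Run the sample once per simulated day and group the days by observed behavior."""
    seen: Dict[str, List[date]] = {}
    for offset in range(days):
        day = start + timedelta(days=offset)
        seen.setdefault(run(day), []).append(day)
    return seen
```

A sweep like this only finds triggers that depend on the date alone. If the check also depends on system configuration, or is hidden behind obfuscated control flow, the dynamic run may never reach the interesting code.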
And the idea here is that instead of looking at these effect-oriented profiles, like, you know, what files are being created, what registry keys are being modified, you want to perform static analysis so you can figure out what the binary does. Not what it does when you run it for five minutes, but what does it do, what are the full capabilities.
So ideally, what you want is to go from raw binary to some sort of assembly code. And from assembly code, you want to go to a more structured assembly code. You want to get some sort of like functions, saying these sets of instructions constitute a function.
And hopefully, you want to go from that to decompiled code so that, instead of looking at, like, 20 machine instructions, you can look at like two lines of code. And at every step of the way, you have challenges, in the sense that none of these things are easy: all of these phases could actually be made difficult through obfuscation and anti-analysis techniques and so on.
So what people do is basically, malware authors, in some sense, they know that most people are trying to do this. And they know that to a certain extent, it's not difficult if you have enough resources. And what they're after, they don't care if you can decompile the code. They really don't care. They say, oh, go ahead, decompile. Figure out what I'm trying to do to your system.
They just want to slow you down. They don't want you to do it, like, after five seconds from, you know, finding out that you have a new malware strain on the internet.
They want you to be able to do this after five months. So after five months, they don't care. They already made their money. They already stole your information and so on. So basically, what most of the obfuscation techniques are, the most popular one is packing, basically. They just compress the malware. I'm going to show you examples of that.
But after that, what they do is basically, you know, you write your code in C or whatever, C plus plus, you compile it and you get an executable. And what they do with that is they compress it or they actually rewrite it and compress it. By rewriting it, they basically produce a semantically equivalent malware. It's just that that binary is much harder to analyze.
So what's the purpose of code obfuscation? Like I said, it's basically slowing down the analysis effort. And so some of the common things they do is basically they try to hide API calls. Because if you look at the binary and you say, oh, you know, this is open file, this is, you know, create a mutex, this is a sleep command and so on, then you can read the code and understand what it does.
They try and hide that. They try to actually hide the control flow, make it difficult to actually rebuild the control flow, by basically putting in indirect jumps. They compute addresses at run time. So only by running the malware can you actually follow the control flow.
But the problem is that if you combine that with like time bombs and logic bombs that says only execute this on a certain day, only execute this if you have this specific configuration of the system, and if they can hide those checks, then you will never execute part of the code. So you will never get to the code, to the part of the code where you can actually rebuild the control flow.
They also try and basically make static analysis almost impossible, basically by -- if you have control flow obfuscation, you can't actually build control flow graph of a function. So you can't know where a function starts, where it ends. So you can't know how many arguments it takes. It fools, basically, the static analysis engine quite dramatically.
And I'll talk briefly about several techniques to basically deal with these situations. The first thing is basically unpacking. So the malware is packed, you have to unpack it. Basically, if it's compressed, encrypted, you have to decrypt it and get to the logic. Once you get to the logic, you have to disassemble it. Now the disassembly has all sort of like, you know, anti-disassembly techniques to thwart that.
If you are successful with the unpacking and the disassembly, then you want to do decompilation. And there are techniques, actually, that are embedded in the binary that will prevent you from decompiling the code. And finally, if you actually decompile the code to C code, then you still have to understand what it does. For instance, you know, I want to understand that this piece of code that I decompiled is basically an encryption algorithm, because that tells me a lot. It tells me that the output of this function is actually put in a buffer and you're sending that buffer. It tells me, oh, you're sending encrypted data. So I don't actually have to look at those network traffic logs and try to figure out what you're sending and so on.
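One cheap way to get a first hint that a routine is cryptographic, offered here as an illustration and not as the method used in the talk, is to scan the code or data for well-known algorithm constants. The table below is a tiny hypothetical subset, and a hit is a hint to follow up on, not proof.

```python
import struct

# Well-known 32-bit constants; finding one is a hint, not proof.
CRYPTO_CONSTANTS = {
    0x67452301: "MD5/SHA-1 initial state",
    0x9E3779B9: "TEA/XTEA delta",
    0x6A09E667: "SHA-256 initial state",
}


def find_crypto_hints(blob: bytes) -> list:
    """Return sorted (offset, description) pairs for known constants stored little-endian."""
    hits = []
    for value, description in CRYPTO_CONSTANTS.items():
        needle = struct.pack("<I", value)
        start = blob.find(needle)
        while start != -1:
            hits.append((start, description))
            start = blob.find(needle, start + 1)
    return sorted(hits)
```

Custom or home-grown ciphers carry no such constants, so the absence of hits says nothing; recognizing those still requires understanding the decompiled logic.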
So one note about code obfuscation. A lot of people in this room, for instance, know a lot about program analysis. You can think about really evil ways of obfuscating a piece of code that actually very few people can understand. You can make it very, very difficult. So why don't these people who actually write malware use these sophisticated techniques? They are as smart as us. I mean, they could use any of these techniques.
But here's the thing. They have to strike a balance. First of all, you know, it's too much work for them. They have to think about these techniques and think about the threat model, try to come up with something that's really hard to analyze. And they say, well, maybe it's too much work. It's not worth it.
Why am I spending so much time on this, you know, since most people are actually running unpatched systems, most people are not protected anyway, so I'm just going to make my millions and not care about that.
They have to, basically, make sure -- you know, because what happens, you're taking a binary, you're basically transforming that binary into a semantically equivalent binary that actually should run. And they run, actually, quite sophisticated and scalable infrastructures. You write a piece of malware that infects millions of machines and you want all of these machines to actually work properly, to contact you on a regular basis so that you can obtain them. You don't want that binary to actually have a glitch that makes it crash. That actually, you know, cuts your bottom line in some sense.
The other thing, which I actually heard from several sources, is quite interesting. They say that people don't want to use quite sophisticated obfuscation, because they don't want the security researchers and security companies to actually come up with quite sophisticated techniques to deal with it. Because they want to maintain the level of the sort of like interaction with security companies and security researchers at a very low level.
So they don't want to come up with sophisticated techniques so that the security companies don't become smarter. They can basically come up with, you know, easy, sort of like solutions and so on. You have a question?
>>: Yeah, I do. Just concretely, so do you know what obfuscation Stuxnet and Flame use?
>> Hassen Saidi: No obfuscation. It's C plus plus. It's obfuscation by C plus plus. It's death by C plus plus.
>>: It didn't take the three months or whatever, obviously.
>> Hassen Saidi: Well, they did -- okay. So they did actually control flow, some control flow obfuscation. For instance, what they had, their binary had like, I think, 19 exported functions in their DLL, and the code actually -- so each of these functions had a particular functionality. One was doing propagation. One was actually going after the PLC code to replace it. One was doing networking so it was actually exfiltrating data. One of them was actually listening to traffic for updates and so on.
But the way they actually called each other was quite interesting. So basically, you would see in the code some functions taking as an argument a constant, which was the number of the entry, the DLL entry. And, I mean, it wasn't quite sophisticated, but it took, you know, some time to look at it to figure out, oh, okay, you're using six as an argument and, you know, three or four, five calls later, you see that basically what you were calling was the sixth DLL export in that thing.
That's all they did.
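The dispatch-by-number pattern described here can be undone by following the constant through the intermediate calls until it reaches the routine that invokes an export. The sketch below is a toy model of that idea: the export names, the call records, and the `call_export` dispatcher are all hypothetical, and the records are assumed to be listed in call order.

```python
# Hypothetical export table: ordinal -> analyst-assigned name (not Stuxnet's real layout).
EXPORTS = {1: "propagate", 2: "patch_plc", 6: "exfiltrate"}

# Toy call records: (caller, callee, constant argument or None when the argument is forwarded).
CALLS = [
    ("main", "wrapper_a", 6),
    ("wrapper_a", "wrapper_b", None),
    ("wrapper_b", "call_export", None),
]


def resolve_dispatch(calls, exports, dispatcher="call_export"):
    """Follow a constant through forwarding wrappers to name the export finally invoked."""
    constants = {}  # function -> constant it received
    for caller, callee, const in calls:
        value = const if const is not None else constants.get(caller)
        if value is not None:
            constants[callee] = value
    ordinal = constants.get(dispatcher)
    return exports.get(ordinal)
```

With the toy records above, `resolve_dispatch(CALLS, EXPORTS)` names the export reached through the two wrappers, which is the step an analyst otherwise does by hand several calls later.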
They encrypted some -- I mean, they basically stored some strings so it wasn't obvious if you just run the string command to figure out what kind of strings they had embedded. But this is like, this is not sophisticated at all compared to other people, other sophistication techniques.
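To see why a plain strings pass misses encoded text, and how weak encodings can still be recovered, here is a small sketch. It assumes a single-byte XOR encoding purely for illustration; the talk does not say how these strings were actually stored, and the URL and key in the usage note are made up.

```python
import re

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")


def strings(blob: bytes) -> list:
    """Rough equivalent of the `strings` command: printable ASCII runs of 6+ bytes."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(blob)]


def xor_bytes(blob: bytes, key: int) -> bytes:
    """XOR every byte of the buffer with one key byte."""
    return bytes(b ^ key for b in blob)


def find_hidden(blob: bytes, keyword: str) -> list:
    """Try every single-byte XOR key and report those that reveal `keyword`."""
    return [key for key in range(1, 256)
            if any(keyword in s for s in strings(xor_bytes(blob, key)))]
```

For example, a buffer holding `xor_bytes(b"http://example.invalid/update", 0x5A)` shows nothing recognizable to `strings`, while `find_hidden(buffer, "example")` reports the key that decodes it.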
But mainly, mainly it was death by C++. You would look at all of this code, you know, hundreds of functions that were there basically to handle string objects. If it wasn't C++, you would see a string concatenation function. For instance, they would use a library function that does that.
Instead, you would look at like this graph, call graph tree that has like 30 functions. And what it did, basically, it concatenated two objects. And then it takes a while to figure out that's what you're doing.
>>: You think it was intentional, they just chose a particular library to do this stuff that was harder to disassemble?
>> Hassen Saidi: No, it's just that C++ is ugly to look at. I mean --
>>: [indiscernible].
>> Hassen Saidi: It's quite effective. It's quite effective, because what happens, it bloats the code so instead of looking at a small portion that actually was doing the real thing, you were spending time looking at, like, oh, yeah, this is string concatenation. Duh. And C++, I mean, I would count C++ as an obfuscation technique, because it's quite -- and they didn't really use it, like, they could have done much worse in terms of obfuscation.
But because they're not using sophisticated obfuscation, there is actually a hope that you can actually undo the obfuscation, because the obfuscation is basically systematic. Once you figure out that systematic rewrite, you can basically undo it.
So like I said, basically, ideally what you have, you have somebody writing source code, compiling it into an executable, you would disassemble it and decompile it and get some legitimate C or C++. On the other hand, if you're analyzing an obfuscated binary, you do the same thing and you end up with a mess, basically. You end up with some C code that is completely incomprehensible, or you actually fail to decompile or to disassemble.
And what we do is basically just take that assembly code and rewrite it. Just take the assembly code and rewrite it and rebuild the assembly file until it's decompilable, and then we decompile it.
So the first thing that we do is basically unpacking. So I can actually give you an example, a live example of that. If you are -- so here's a piece of malware. You open it in [indiscernible], which is like a disassembler. This is what you see. It has only one function. The rest is all data. That's typically how you find malware. I mean, malware is basically distributed this way.
>>: Are there instances where non-malware is obfuscated this way?
>> Hassen Saidi: Not exactly. People use obfuscation to protect intellectual property. But they usually don't do it this way. There are other, more effective techniques to do it. The reason they actually compress the code this way is just to actually reduce the size of the code. Because, you know, a small file, you know, is not really suspicious if it's on the system.
If you're looking at a file that has, you know, two megabytes and it's residing on the system, you try and figure out what it does. But if it's like 5K, it's like, you know, who knows what it does. Also, if you're transmitting it on the internet, if you're basically propagating it, you want to minimize the footprint.
So behind the scenes, one of the components of Malgram is this Eureka unpacker.
So you can just -- oh, no. Malware. So you can give it an executable and it actually is going to execute it and it's going to drop a file hopefully. It dropped this hidden code file. And when you open this hidden code in the disassembler, you look at it, ah, suddenly this is the code.
So that was exactly the same file. Just by running the file, it basically decrypts itself. But you still have to now look at this file and say what does it do? What do all these functions do?
>>: Doesn't that give you a way to [indiscernible] malware? It's the only thing that comes in this manner that's predictable, right?
>> Hassen Saidi: Yes.
>>: But your goal was to do static detection, right? Not to emulate, right?
>>: No, the original thing was very different from the regular program, anything that sits in your computer is identified.
>>: Why not build an intrusion detection system that just looks for those kind of files?
>>: I thought the question was about what does it do now. Is it classification -- you're talking about classification.
>>: Right. The second question is what it does, but the first question is just to get rid of it.
>> Hassen Saidi: This is sort of like the standard technique for actually flagging something as suspicious. If it's basically packed, it's already suspicious. Now, because something was packed, and it's running on your system and it unpacks itself at run time, that could be suspicious. Or that could be, you know, somebody protecting their intellectual property.
But people who actually try to protect their intellectual property usually wouldn't use these compression techniques. That's right. But this is, yeah, I mean, it's remarkable how easily you can come up with ways of detecting malware.
But in practice, it's not. I mean, we run, for instance, at SRI, a Symantec product and we get spam every day that passes through their filter. And I run Linux or Mac so I can look at it and open it. Still, people open attachments and so on.
>>: But the goal there is to have no false positive rate so they'll err on the side of saying no, rather than on the side of understanding things.
>> Hassen Saidi: Right.
>>: There are different goals, the one you have and the one they have?
>> Hassen Saidi: That's right, yes. That goes back to your point. So once you get to that -- so like I said, you have that packed binary and then you unpack it. You have some binary and you try and figure out what it does now.
Like I showed you the example of the difference between the two, maybe a point about doing that. So the way we do it is basically the following way. So you have a binary and it's packed. And you want to basically unpack it.
The way you do it is basically you run the malware, and you try and figure out when it actually reveals its logic. And one way of doing this, and the way we do it, is looking for appearances of, like, machine instructions.
So we look at, like, call and push instructions. And the interesting thing is that, depending on the packer, you see these peaks. You run this malware and after maybe 350 system calls, you observe a spike in those instructions. So you can say, well, this is enough. I think the malware decrypted itself so I can actually capture its image and actually dump its memory content to analyze it.
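A minimal sketch of the spike heuristic described above, in Python. It assumes the sandbox already produces one count of call/push instructions per sampling interval; the window size, the threshold factor, and the minimum count are illustrative assumptions, not values from the talk.

```python
from typing import List, Optional

def find_unpack_point(counts: List[int],
                      window: int = 5,
                      factor: float = 4.0,
                      min_count: int = 10) -> Optional[int]:
    """Return the index of the first interval whose count of call/push
    instructions jumps well above the recent baseline, or None."""
    for i in range(window, len(counts)):
        # Baseline is the mean of the preceding `window` intervals.
        baseline = sum(counts[i - window:i]) / window
        if counts[i] >= min_count and counts[i] > factor * max(baseline, 1.0):
            return i
    return None

# Hypothetical trace: quiet unpacking stub, then the revealed code starts running.
trace = [2, 3, 1, 2, 2, 3, 2, 40, 38, 41]
spike_at = find_unpack_point(trace)
```

The index returned is the point at which a memory dump would be taken. A packer that reveals its code in several stages would need the check repeated rather than stopped at the first spike.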
Some others are actually clever. They do first step, they say, well, because they do things in steps, they look at like, you know, am I running on the virtual machines, things like that. It's like yeah so they bail out here.
So actually they maybe reveal enough logic to do the checks. It's like oh, I'm running a debugger. I'm running in a virtual machine so I'm going to bail out.
Otherwise, they keep going and say okay, I'm going to execute until completion.
So one of the things that you get in those binaries, you still have to actually rebuild them correctly, because you grab the memory image. Now you don't know where that actually, where that logic basically starts executing, because if you want to do that, you have to do like, basically, instruction level tracing to figure out oh, you know, I'm doing this long jump there or something like that.
References to APIs are basically all dynamically computed. So you have to actually rebuild them. And also, you might have calls through registers and so on. And you have to actually rebuild those.
One of the things, for instance, is that in a very well-behaved binary, you know, you have an import table. You have APIs, and then you have, you know, this is a call to CreateFileMapping, so I know exactly all the elements that you push onto the stack. I know their type, their meaning and so on.
But what happens is that if you're lucky, if you're lucky, what you will get, you will get these dynamic references to the addresses where these things are usually loaded. So you can see, for instance, here a call to a function, a call through a register and you can do data flow analysis and say, well, you know, you're actually -- the address that you're calling is actually in this variable. And then you look at the variable and this is like there's some value.
And then you say, well, I know this value because this is the standard address of a well-known API. But what happens is that sometimes the malware loads the DLLs into non-standard locations, so you won't be able to actually figure out -- well, you can actually figure it out, but you have to do a little bit more work than just the look-up table.
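The look-up step, and the extra work needed when a DLL is loaded at a non-standard address, might look like the Python sketch below. The base address and the offsets in the tables are made-up numbers for illustration; a real tool would read them from the DLL's export directory and from the process's module list.

```python
from typing import Dict, Optional, Tuple

# Hypothetical preferred load addresses and export offsets (made-up numbers).
PREFERRED_BASE: Dict[str, int] = {"kernel32.dll": 0x7C800000}
EXPORT_OFFSETS: Dict[str, Dict[int, str]] = {
    "kernel32.dll": {0x1A28: "CreateFileA", 0x1CAFA: "ExitProcess"},
}

def resolve_api(address: int,
                loaded_bases: Optional[Dict[str, int]] = None
                ) -> Optional[Tuple[str, str]]:
    """Map an absolute call target to (dll, api).

    With no `loaded_bases`, this is the plain look-up table that assumes
    standard load addresses. Passing the bases observed at run time handles
    DLLs that were loaded somewhere else."""
    bases = dict(PREFERRED_BASE)
    if loaded_bases:
        bases.update(loaded_bases)
    for dll, base in bases.items():
        name = EXPORT_OFFSETS[dll].get(address - base)
        if name is not None:
            return dll, name
    return None
```

The plain table fails on a relocated DLL, and succeeds once the actual base is supplied, which is the "little bit more work" mentioned above.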
Another thing that you can look for is the data-to-code ratio. A binary will usually have a lot of code, unless it's a weird binary. There are some cases where there's a lot of data, and that's where you really have to scratch your head. Like, for instance in Stuxnet, there are all these modules, and that was data. It was clearly not x86 code. It was data. But what was it? It was actually pieces of code, PLC code, that they were actually trying to overwrite.
And if you get binary code, you get like hex values, you have no idea what those hex values are.
The other case is when you get things like this. So you're calling a value stored in a pointer which references an array, and then you're looking at, like, 60 bytes into that array. And you want to say, well, I really want to recover from this that it's ExitProcess. And we have techniques exactly for doing this.
I don't have time to actually go through them, but they use sort of like type analysis to figure out what the call is. So basically, looking at, you know, how many functions take one argument, and this function occurs at the end of the execution, at the end of the longest execution paths, and stuff like that. So I know that this is exit and so on.
So after doing that, there's one thing that we do. We actually find the OEP, the original entry point. So you get a file. It has all these functions. You want to know which function is the main function. Because now you can figure out where ExitProcess is, where GetCommandLine is, where, you know, CreateMutex is, which is usually called at the beginning of the execution, you can figure out oh, that's where this code actually starts executing.
By doing a combination of looking at APIs and also looking at the call graph, you can figure out where. So once you know that, you can actually rebuild the import table. So you grab the memory image of a running process. It doesn't have an import table, because the import table was rebuilt dynamically, and it's all stored in an array. So we basically grab that and rebuild the binary totally: grabbing the code part, putting in a new header, computing the section sizes and stuff like that correctly, rebuilding the import table, setting the entry point.
And when you do that, then you can actually disassemble properly the code and hopefully decompile it. But not so easy.
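The entry-point search described above combines API knowledge with the call graph. One way such a heuristic could be sketched in Python is shown here. The scoring rule (an uncalled function whose call tree reaches both a start-up API and an exit API, preferring the largest tree) is an assumption made for illustration and is not necessarily what the speaker's tool does; the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set

def reachable(call_graph: Dict[str, List[str]], start: str) -> Set[str]:
    """All functions reachable from `start`, including `start` itself."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        func = stack.pop()
        if func not in seen:
            seen.add(func)
            stack.extend(call_graph.get(func, []))
    return seen

def guess_entry_point(
    call_graph: Dict[str, List[str]],
    api_calls: Dict[str, Set[str]],
    startup_apis: FrozenSet[str] = frozenset({"GetCommandLineA", "CreateMutexA"}),
    exit_apis: FrozenSet[str] = frozenset({"ExitProcess"}),
) -> Optional[str]:
    called = {callee for callees in call_graph.values() for callee in callees}
    best, best_size = None, -1
    for func in call_graph:
        if func in called:
            continue  # an entry point is not called by other functions
        funcs = reachable(call_graph, func)
        apis = set().union(*(api_calls.get(f, set()) for f in funcs))
        # Candidate must reach both start-up and exit APIs.
        if apis & startup_apis and apis & exit_apis and len(funcs) > best_size:
            best, best_size = func, len(funcs)
    return best
```

Given the resolved API names for each function, the result is a candidate address to write into the rebuilt header as the entry point.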
So the next thing is that even if you actually rebuild the import table and so on, and sometimes know where the entry point is, you might not know where all the functions are. So one of the techniques that people use, for instance, is this chunking, and I call it de-chunking as sort of like a way of undoing this.
But this is an example of this. So you have code, and I think this was -- yeah, this was Conficker. So this was a function. But instead of having a function stored this way in the file, basically in a contiguous memory region, it's broken into pieces, and every block is basically, it's in a different address. And basically, the code is stitched together with long jumps.
Now, you know, this is not a difficult thing for the disassembler to know. And even the decompiler to figure out. But the difficulty is that you see this chunk, for instance, or this chunk. These are very common chunks of code. So actually, this is actually a compiler optimization. So that these chunks are actually shared by multiple functions.
So the disassembler and the decompiler get confused. It's like okay, which function starts where, and where does it end. And if you have these chunks that are multiple -- shared by multiple functions, the disassembler gets a little bit actually confused.
And only if you stitch it together this way, then you can actually decompile it. And basically, that's what we do is basically look at that code that we got after finding the entry point and rebuilding the import table and basically rebuilding all the function.
This is, I mean, it's a little bit tricky, because you have to deal with relocation of code and stuff like that. So we compute exactly the right addresses and all the right references and so on to be able to decompile.
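A simplified Python sketch of de-chunking as described above: follow the jumps from a function's first chunk, concatenate the bodies, and give each function its own copy of any shared chunk. The `Chunk` structure and the addresses are hypothetical, each chunk is assumed to end in at most one unconditional jump, and the address fix-ups mentioned above (relocation, rewritten references) are left out.

```python
from typing import Dict, List, NamedTuple, Optional

class Chunk(NamedTuple):
    instructions: List[str]      # body of the chunk, without the trailing jump
    jump_target: Optional[int]   # address of the next chunk, None if it returns

def dechunk(chunks: Dict[int, Chunk], entry: int) -> List[str]:
    """Stitch the chunks of one function into a contiguous instruction list."""
    body: List[str] = []
    seen = set()
    addr: Optional[int] = entry
    while addr is not None:
        if addr in seen:
            # Real loops need a proper control-flow graph; this sketch only
            # handles straight chains of chunks.
            raise ValueError(f"chunk at {addr:#x} visited twice")
        seen.add(addr)
        chunk = chunks[addr]
        body.extend(chunk.instructions)  # the connecting jump itself is dropped
        addr = chunk.jump_target
    return body

# Two functions scattered in memory that share one epilogue chunk at 0x3000.
CHUNKS = {
    0x1000: Chunk(["push ebp", "mov ebp, esp"], 0x5000),
    0x5000: Chunk(["call sub_a"], 0x3000),
    0x2000: Chunk(["xor eax, eax"], 0x3000),
    0x3000: Chunk(["pop ebp", "ret"], None),
}
```

Calling `dechunk(CHUNKS, 0x1000)` and `dechunk(CHUNKS, 0x2000)` yields two self-contained functions, each with a private copy of the shared epilogue, which is the form a decompiler expects.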
This was used actually in Conficker and Hydraq. Hydraq was involved in the Google attack in 2010. And basically, that code was not obfuscated. It was right there, you could look at it. But if you tried to decompile it or look at it in the disassembler, it looked really ugly. It looked really ugly because all of the functions were in this spaghetti form. So even looking at it was difficult. You could trace the code. But again, there's this problem of am I tracing all the code. Am I executing all the paths. And that's sort of like a really difficult problem to solve. Because there are so many ways to make you -- basically, to force you to take certain paths and not the others and make it so difficult. There are these techniques of opaque predicates. So if I make all of the jumps, all the conditional jumps, very hard to evaluate, then I can make you go through a certain path. But if you want to now reason about what the conditions are, you can use an SMT [indiscernible] or something. Say, you know, give me a condition under which I take the other path. I can make that extremely difficult for you to figure out through these opaque predicates.
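As a small illustration of an opaque predicate, here is a Python sketch. The specific predicate (the product of two consecutive integers is always even) is a textbook example chosen for this sketch, not one taken from the samples discussed; real malware uses conditions that are much harder to reason about.

```python
def opaque_true(x: int) -> bool:
    # x * (x + 1) is the product of two consecutive integers, so it is always
    # even. The branch below therefore always goes the same way, but that is
    # not obvious from the conditional jump alone.
    return (x * (x + 1)) % 2 == 0

def obfuscated_branch(x: int) -> str:
    if opaque_true(x):
        return "real path"
    # Never executed; exists only to mislead static analysis and tracing.
    return "decoy path"
```

An analyst who asks "under which input is the decoy path taken?" has to prove the condition is unsatisfiable, which is easy here and can be made arbitrarily hard with more elaborate arithmetic.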
So like, for instance, in this case, if you looked at the code -- I'm trying to remember. Yeah, so basically you look at the disassembler and the disassembler was telling you, there are 185 functions. But, in fact, when you break them apart, rebuild them, there were only 141 functions.
So most of the 185 functions, the most interesting ones weren't -- it wasn't possible to decompile them until you actually rebuild the file completely, and you can then decompile everything and look at it. It was, like, really easy to analyze after that.
>>: So the next now uses return to [indiscernible] to avoid having to inject code at all?
>> Hassen Saidi: Right.
>>: Have you seen binaries that do that just for obfuscation purposes?
Imagine, basically, it's the same kind of thing. You can take a C program or whatever and then compile it down to a bunch of gadgets and then just execute the gadgets, in which case they're tiny chunks.
>> Hassen Saidi: I haven't seen something like that. I would consider that as a -- the bad guys are aware of these things. I can tell you that the bad guys, they read all of the, you know, USENIX, CCS, Oakland papers, because it's remarkable. When they see something really interesting or when they know that you're doing something interesting, immediately they actually try to work around it.
But this, I would consider that as maybe too sophisticated for them in the sense that, yeah, they know it, but then they say well, you know, why use it.
It's always a trade-off for them. You know, if it's too much work for them, because even for you, if you want to actually write an exploit using a return-to-libc attack, it's quite a lot of work for you to do that.
>>: [inaudible].
>> Hassen Saidi: Right. You had a question?
>>: No, sorry.
>> Hassen Saidi: Okay. So after you get the C code, we have this thing, we added a new component that actually tries to recover the types and the names of the variables in the C code. So what we do is basically try and improve the type analysis. Because if you are looking at assembly code, everything is a pointer. Everything is an int, but you want to know. I want to know if this pointer is a pointer to a buffer or is this a pointer to an integer.
Is this a pointer to -- if it's a pointer to a buffer, I want to know if it's an encrypted buffer or a clear buffer, for instance. So you want to recover type information and also semantically rich information.
So this is an example where, you know, basically, instead of generic names like first argument, second argument, third argument, you can say, well, this is actually a pointer to a destination buffer. This is actually a source buffer and so on.
And the idea is also to propagate those names and those types to actually other functions and so on. And the idea is -- this is, for instance, a comparison between like what a disassembler figures out in terms of like names and what we can actually figure out in terms of like names and types. We can do roughly like 50 percent or more. Actually, I have new numbers that are actually much better. We can do like 80 percent typing and renaming of variables. Because what happens is that again, you know, you jump through all these hoops. You get C code and you still have to figure out what the C code does.
And I hate to read C code that says, you know, this function has three arguments and they're all pointers to int. It's like, I know, they could all be pointers to void, actually. And I could type everything as a pointer to void and my program would be perfectly well typed. But I don't want that. I want to know that, oh, that argument is actually a buffer. And it's an encrypted buffer, and the second argument is an integer, but it's actually the length of the buffer and so on.
So I can quickly go through like 200 or 300 functions if I have to look at the binary and say these functions, I don't need to look at them, or this is encryption, this is quite interesting.
>>: Do you try to recover structure types?
>> Hassen Saidi: Yes, yes, we do. So we do simple types. We do structures. And we do names, which is -- yeah, we don't try to just generate the types. We try to actually give them names. So we want to give you, you know, this is a file, for instance. We don't call it like, you know, just a buffer or something like that.
>>: It might be based on [indiscernible].
>> Hassen Saidi: Yes, yes. So basically, I downloaded all the documentation of the Windows APIs and I have it all in a database with all the names, the arguments and so on which is basically useful in figuring out which APIs. So it's quite interesting. There are like cases where you see this function and there's a call to a value through a pointer or through like a register and you figure out what does this function do.
And you just simply look it up in the database and say this thing has seven arguments. The first one is an integer. The second one is a string. The third one is an integer, so it cannot really be a pointer because it's a small integer and so on. And you put that in, and it says oh, that's open file. Of the 88,000-something Windows APIs, there are no other functions that take seven arguments and do this, blah, blah, blah. So it's quite a neat sort of like --
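The database look-up described above can be sketched in Python as a match on argument count and coarse argument kinds. The three-entry table and the coarse kinds (`str`, `int`, `ptr`, `handle`) are simplifications introduced for this sketch; the speaker's database covers the documented Windows API and is not reproduced here.

```python
from typing import Dict, List, Optional

# Tiny illustrative database: API name -> coarse kind of each parameter.
API_DB: Dict[str, List[str]] = {
    "CreateFileA": ["str", "int", "int", "ptr", "int", "int", "handle"],
    "ExitProcess": ["int"],
    "Sleep": ["int"],
}

def match_signature(observed: List[Optional[str]],
                    db: Dict[str, List[str]] = API_DB) -> List[str]:
    """Return APIs whose parameter list is compatible with what was observed
    at an indirect call site. `None` means the kind could not be inferred."""
    return [
        name
        for name, params in db.items()
        if len(params) == len(observed)
        and all(seen is None or seen == kind for seen, kind in zip(observed, params))
    ]

# Seven pushed arguments, two of which could not be typed.
candidates = match_signature(["str", "int", "int", None, "int", "int", None])
```

With many arguments the match is usually unique; with a single integer argument it is ambiguous (`ExitProcess` and `Sleep` both fit here), which is where the call-graph position mentioned earlier helps.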
>>: [indiscernible] is your friend in this case.
>> Hassen Saidi: Yeah. Well, there are cases that are like -- maybe I shouldn't do this, but I looked at all the -- like, for instance, I looked at like a typical installation of XP, for instance, service pack 3. That was a couple years ago when it was probably the most predominant sort of like Windows system there.
And I looked at all the DLLs and basically scraped all their signatures and stuff like that. And there are a lot of them that are obscure, and you find this function takes ten arguments and it's int int int int int int. That's not really helpful. But if you look at malware and most applications, they don't use those. They don't use those.
There are about, like, what, 20, 25 DLLs that are mostly used. The rest are not. I mean, the rest are for obscure reasons.
The last point that I wanted to mention is crypto. So imagine that now I took binary. It was packed. I unpacked it. I managed to actually rebuild it properly. I managed to actually type it. I managed to decompile it. I still have to figure out, even sometimes you give it like meaningful names, but that's based on the Windows API.
So I know when you're doing like communication or networking, when you're doing like file manipulation, registry manipulation, there are some other things you might be interested in. One of the things might be, for instance, crypto which is used in malware to encrypt information and used also for digital signatures.
So I have this thing that actually does look in a binary for crypto modules. And the way it does that is, you know, through some heuristics. First of all, it looks for constants. A lot of encryption algorithms use constants, if you look at MD5 or SHA-1, SHA-2. Even SHA-3 has known constants.
Looking for padding analysis. Usually, in crypto algorithm, you start by allocating a large buffer, and then filling it with some weird sequence of bytes. And I look for that as well.
Look for the Microsoft crypto API as well. But then look for known computations which look like crypto. So large local variables, typically if you're doing large-number multiplication and so on. Loops, and looking for typical opcodes used in crypto and so on.
So in this case, yeah, you can look at a binary like this and say, well, these constants are basically SHA-1 and so on. Or look at an array of data, just an array, and say oh, that's an S-box and so on.
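The constant search described above can be sketched in a few lines of Python. The constants are the published SHA-1, MD5, and SHA-256 values; the choice of which ones to look for and the `SIGNATURES` table layout are assumptions of this sketch, not the speaker's implementation.

```python
import struct
from typing import Dict, List

# A few well-known 32-bit constants per algorithm.
SIGNATURES: Dict[str, List[int]] = {
    # SHA-1: fifth initial hash word and the four round constants.
    "SHA-1": [0xC3D2E1F0, 0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6],
    # MD5: first two entries of the sine-derived table.
    "MD5": [0xD76AA478, 0xE8C7B756],
    # SHA-256: first initial hash word and first round constant.
    "SHA-256": [0x6A09E667, 0x428A2F98],
}

def find_crypto_constants(data: bytes) -> Dict[str, List[int]]:
    """Return, per algorithm, the known constants that occur in `data`
    in either little-endian or big-endian byte order."""
    hits: Dict[str, List[int]] = {}
    for algo, constants in SIGNATURES.items():
        found = [
            c for c in constants
            if struct.pack("<I", c) in data or struct.pack(">I", c) in data
        ]
        if found:
            hits[algo] = found
    return hits
```

A hit on one constant is weak evidence; several constants of the same algorithm inside one function is a much stronger signal, which is why the results are grouped by algorithm.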
And basically, by labeling the call graph with these colors, by saying, well, this is actually MD6 here, or here is an unknown algorithm, I have no idea what it is, but it looks like crypto and so on, and doing some sort of like clustering.
And this is, for instance, an example of a call graph for MD6 that was present in Conficker. That was actually the first and only use of MD6. Nobody actually used it except the bad guys.
And when you look at the code, and the first time I wrote this and applied it to Conficker, it was really interesting. Even without knowing this function, this module told me that there were actually five unknown computations in the code. There were like 350 or 400 functions. But it told me that there were these four unknown functions in this cluster, and there was one other function that was used in the same function as rand. So that was okay. That function was doing some random number generation. That's fine.
But this cluster was a bit weird. And then when you go to the node that actually calls all of these, the common node, that was actually MD6, because I had a version where I labeled all the functions and I had a version where I didn't label the functions, so I ran it with the version that didn't have any labels. So when I looked up the name of this function, I was surprised. I saw MD6.
So with this, basically if you're looking at like a file that has, you know, hundreds of functions, that's typical for malware, hundreds of functions and you try and figure out where is the crypto, this actually helps you. Sometimes you don't discover, some of these operations are not basically crypto in the sense that they're implementing the crypto algorithm.
They basically tell you here what you're doing, you're transforming some information on the data. So you're taking the buffer and doing random things to it. Suddenly, you know, you look at the output of this function now suddenly is passed to a buffer and that buffer is sent.
So you say oh, you know, yeah, the data is not in the clear in some sense. The data. There was some transformation on the data that was applied.
So this is an example where I had, like, all kinds of, in a single sample, all kinds of functions that were labeled that way.
Probably I'll stop here for sort of like sake of time. But let me know if you have questions about any of the -- so the idea of the sort of like the report generation is that we have a single report that basically documents all of the steps that I described. So all of the dynamic output analysis is there.
The information about the packed version, the unpacked version, the complete disassembly of the unpacked version and complete decompilation with the output of type analyses and crypto finding and so on.
And the idea here is that -- so with this report, the idea is you can get a specialist who is like a malware analyst. You can save them a lot of time by giving them a high-level overview of what the sample does.
So that if they have to do actually manual work, it's much easier to look at the report and say, well, there's that part that I don't understand, or this part sounds interesting. Let me dive into it. Let me crack open the binary. Let me, you know, trace it or whatever.
And that's quite important. And also, to sort of like reconcile this with the dynamic side of the malware, so you can see oh, yeah, you've done all of these communications here, and I've seen it when I ran this thing in the sandbox, but I see all these threads here that were never executed. What are they about and so on.
So any questions?
>>: What do you find that these programs actually do?
>> Hassen Saidi: All kinds of things. So the typical things they do are file system changes. So they actually drop files. Typically, what they do is save a copy of themselves. So once they are on the system, they change their names, they store the same copies. You can figure it out with the MD5s and stuff like that, with the hashes.
The next thing they do is they want to persist on the system. So they have to find a way, when you reboot your machine, to restart. So typically what they do, they change some service configuration, they register themselves as a service or something. I've seen samples that don't hide themselves on the file system. They hide themselves in the registry. They actually copy themselves into a registry key. This is quite remarkable, and they encrypt themselves so you can look at these registry keys and you can't figure out what they're doing.
And then you have networking, a lot of networking. All of these things, I mean most of them, do a lot of networking. Usually, it's HTTP going to some site and downloading updates. They do peer-to-peer, a lot of peer-to-peer protocols. They all mostly have some sort of like networking protocol that you can actually extract from that.
>>: I would expect them to take something from my machine, take my data and send it somewhere.
>> Hassen Saidi: It's very interesting. Most of them don't have that.
>>: So they just lie on my system?
>> Hassen Saidi: Exactly. They sit there. So what they try and build, they build an infrastructure. So basically, they infect as many machines as they want, and then if they want to do something to your machine, if they want to steal that information, they want to actually send you something, execute, steal the information and disappear.
So basically what they're doing, they're buying -- they're not really buying, but they're acquiring real estate, computational real estate. A lot of them actually either use it or actually sell it. There are actually business models where you can actually rent stuff. You say, I have a million machines. I'm going to sell you -- they have a price for a machine in a dot-com. They have a price for a machine in a dot-gov or in any country domain, for instance. They have pricing models for that and so on.
They wouldn't do something -- yeah. Except, for instance, with things like APTs, advanced persistent threats, like RATs and Stuxnet and things like that. They want to be constantly there collecting info and then sending it quickly.
>>: How many of them are trying to [indiscernible]?
>> Hassen Saidi: A fair amount of them. I don't have like numbers, but it's a small percentage. I mean, but there are -- they do exist in a fair number.
>>: Can you say a few words on the [indiscernible] algorithms. Are there lots of diverse algorithms, or there's just one that --
>> Hassen Saidi: It's very interesting. So the characteristic of the packing algorithms is what makes this sort of like image analysis possible. That's why I was saying, like science fiction. Imagine if I can tell you that if I took two binaries that are unpacked, they are clear, and I find that there's a distance between them, just looking at their image, like their -- what if I tell you, even if you pack them, compress them, I can still find some relationship just by looking at the images of the compressed ones.
That tells you something about the compression algorithms, that they're not trying basically to be -- these are not encryption algorithms in the sense that, you know, if it was really an encryption algorithm, you would look at the output and you couldn't find any sort of similarity between the two.
So very simple things. They sometimes XOR the thing with a constant. So you look at that single function in the packed code and what that function does, it says start at this address and XOR every so-many bytes with this constant and that's it. And then jump to some address. It's as simple as that. Those are the simple ones.
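The simple XOR scheme described above fits in a few lines of Python. The key value and the sample byte strings are arbitrary choices for this sketch. The second helper illustrates why two related binaries stay recognizably related after this kind of packing: XOR with a constant preserves which positions are equal.

```python
def xor_every(data: bytes, key: int, start: int = 0, step: int = 1) -> bytes:
    """XOR every `step`-th byte from offset `start` with a one-byte key.
    Applying the same call twice restores the original bytes."""
    out = bytearray(data)
    for i in range(start, len(out), step):
        out[i] ^= key
    return bytes(out)

def matching_bytes(a: bytes, b: bytes) -> int:
    """Number of positions at which two byte strings agree."""
    return sum(x == y for x, y in zip(a, b))

# Two similar "binaries", packed with the same constant.
original_a = b"push ebp; ret"
original_b = b"push esi; ret"
packed_a = xor_every(original_a, 0x5A)
packed_b = xor_every(original_b, 0x5A)
```

`matching_bytes(packed_a, packed_b)` equals `matching_bytes(original_a, original_b)`, which a real encryption algorithm would not allow.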
The more sophisticated ones, and we do badly on those, I mean those are really sophisticated, are the ones that use virtualization. Virtualization. So in this case, I don't compress the code. I just rewrite the code in some form, and I put it in the data section. And my code, basically, is a virtual machine. It basically reads the data and, based on the form of the data, I can say, oh, this is actually -- I need to execute a move. Or this, I need to execute an add. And this causes me to do a jump.
And you look at those, and these are very difficult to actually even undo, because you have to actually understand and reverse engineer the virtual machine.
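To make the virtualization idea concrete, here is a toy interpreter in Python. The instruction encoding (two bytes per instruction, the opcode values, a single accumulator register) is invented for this sketch; commercial protectors use far richer and deliberately irregular instruction sets.

```python
def run_vm(bytecode: bytes) -> int:
    """Interpret a made-up two-byte instruction format stored as data.

    0x01 n : MOV  acc <- n
    0x02 n : ADD  acc <- (acc + n) mod 256
    0x03 n : JMP  to byte offset n
    0xFF 0 : HALT
    """
    acc = 0
    pc = 0
    while pc + 1 < len(bytecode):
        opcode, operand = bytecode[pc], bytecode[pc + 1]
        if opcode == 0x01:
            acc = operand
            pc += 2
        elif opcode == 0x02:
            acc = (acc + operand) & 0xFF
            pc += 2
        elif opcode == 0x03:
            pc = operand
        elif opcode == 0xFF:
            break
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at offset {pc}")
    return acc

# The "program" lives in the data section: MOV 5; JMP 6; ADD 100 (skipped); ADD 7; HALT.
PROGRAM = bytes([0x01, 5, 0x03, 6, 0x02, 100, 0x02, 7, 0xFF, 0])
```

A disassembler sees only the interpreter loop and a blob of data. To learn that the program computes 5 + 7, the analyst first has to recover the meaning of every opcode.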
>>: Are there papers about that using embedding virtual machines and malware to -- it's essentially creating a new instruction set, right?
>> Hassen Saidi: Oh, yeah. So there's one packer which is very difficult to deal with. It's called Themida, T-h-e-m-i-d-a, and they use actually, they use virtualization. They actually use some cute tricks.
For instance, they never do a jump to an API. So you never see a call to open file. What they do is that they copy the first few instructions of open file. They execute those as part of the code so you'll never see that you're actually doing open file. And then they jump to some weird address in the middle, or they virtualize the beginning of the API function, and then they jump covertly to the rest of the instructions.
>>: You might like to know there's a new virus out attacking computers called Gauss.
>> Hassen Saidi: Gauss, yes, I have a copy of it.
>>: You know about it?
>> Hassen Saidi: Yes.
>>: It just sent this afternoon [indiscernible].
>> Hassen Saidi: No, no. I don't care about it, actually. It's a lot of hype. It's actually stealing people's money. I think in the Middle East, people have a lot of money. It's not a big deal.
>>: So about packing, how many of the samples you analyze use, like, one of the well-known packers, like UPX, for example?
>> Hassen Saidi: Most of them use UPX. Most of them use the low-grade packers. Again, it's really fascinating to try and understand what their sort of like mental model is. But their mental model is really: I'm distributing a virus, and I want to make it a little bit difficult for you to detect it. So if I pack it, I know that every time I pack the same malware, I'm going to obtain a different MD5, a different hash.
So if you are basically looking for particular hashes, you know, you're defenseless against this.
So I'm going to make this difficult for you, but I'm not going to make this difficult for you if you want to unpack it and stuff like that. So still, as a security company, I still want you to be in the business of, you know, selling security products, but I don't want you to be very sophisticated at doing this.
So it's basically just to defeat hash analysis.
And it's so cheap to do so. It's so cheap. I mean, that's why they use it.
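[Editor's sketch] A small illustration of why repacking defeats hash-based detection. The `toy_pack` and `toy_unpack` functions are hypothetical stand-ins invented for this example. They use a one-byte XOR, whereas a real packer such as UPX compresses the code. The payload is a harmless placeholder string.

```python
import hashlib

def toy_pack(payload: bytes, key: int) -> bytes:
    """Hypothetical packer: a one-byte key header, then the XOR-encoded body."""
    return bytes([key]) + bytes(b ^ key for b in payload)

def toy_unpack(packed: bytes) -> bytes:
    """Reverse toy_pack by reading the key from the first byte."""
    key = packed[0]
    return bytes(b ^ key for b in packed[1:])

payload = b"identical payload bytes"

# The same payload packed twice with different keys.
packed_a = toy_pack(payload, 0x21)
packed_b = toy_pack(payload, 0x5A)

# A signature that matches on the file hash sees two unrelated files.
digest_a = hashlib.md5(packed_a).hexdigest()
digest_b = hashlib.md5(packed_b).hexdigest()
hashes_differ = digest_a != digest_b

# Unpacking either file recovers the same bytes.
same_after_unpack = toy_unpack(packed_a) == toy_unpack(packed_b) == payload
```

A blocklist keyed on the hash of the packed file misses the second variant. Anyone who unpacks first and hashes the result sees the same payload both times, which matches the point above that the goal is only to defeat hash analysis.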
You had a question in the back?
>>: Yeah, I had a bunch of questions, but let me ask one of them. So have you seen or do you see many attempts to avoid emulation, essentially, the code trying to find out whether you're --
>> Hassen Saidi: Oh, yeah. I got a sample like a week ago that basically, that sample was really interesting. It had embedded in it every trick in the book for detecting VMware, Xen. It was very educational, actually, to figure out, oh, okay, I didn't know about this one trick.
But it's also not very difficult to detect virtual machines. I mean, they are very sort of like noisy. I've seen instructions where, yeah, some malware was -- it was very interesting. It was using get time or one of those APIs for getting time. Get time, and then sleep for a certain number of milliseconds, get time again, take the difference. And if the difference was not right, it would exit.
It's a very simple thing. So if you were slowing down this thing or playing with a timer, it's so, so, so simple to detect those kinds of tricks. I've seen examples where there was this piece of malware. It was executing millions and millions of instructions in a big loop, a huge loop. I was like, what the heck is it doing? It was doing nothing. You look at the register values before and after. Same.
So basically, if you were running that in an emulator and a lot of people do, and they do like instruction tracing, you will be spending a lot of time. You would be spending half a day executing that routine, which basically was doing absolutely nothing. So yeah, they use all kinds of tricks.
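[Editor's sketch] The get time, sleep, get time check described above can be sketched as follows. The function name, the 50 ms sleep, and the 20 ms tolerance are assumptions chosen for illustration. The talk does not say which time API or thresholds the sample used. The clock and sleep functions are passed in as parameters so the logic can be exercised with a simulated timer.

```python
import time

def timing_looks_wrong(sleep_s=0.05, tolerance_s=0.02,
                       clock=time.monotonic, sleep=time.sleep):
    """Return True if the measured sleep differs from the requested sleep.

    A sandbox that skips sleeps, or an emulator that runs far slower than
    real hardware, tends to make the elapsed time disagree with sleep_s.
    """
    start = clock()
    sleep(sleep_s)
    elapsed = clock() - start
    return abs(elapsed - sleep_s) > tolerance_s

class SimulatedTimer:
    """Fake clock whose sleep advances time by a configurable factor."""
    def __init__(self, factor):
        self.now = 0.0
        self.factor = factor
    def clock(self):
        return self.now
    def sleep(self, seconds):
        self.now += seconds * self.factor

honest = SimulatedTimer(factor=1.0)    # sleep takes as long as requested
skipping = SimulatedTimer(factor=0.0)  # analysis tool fast-forwards sleeps
slow = SimulatedTimer(factor=10.0)     # heavily instrumented, slow emulator

honest_flagged = timing_looks_wrong(clock=honest.clock, sleep=honest.sleep)
skipping_flagged = timing_looks_wrong(clock=skipping.clock, sleep=skipping.sleep)
slow_flagged = timing_looks_wrong(clock=slow.clock, sleep=slow.sleep)
```

The simulated timers stand in for three environments: one where sleeps behave normally, one where they are skipped, and one where everything runs ten times slower. Only the first passes the check.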
It's interesting. Not all of these cute tricks are actually used by everybody.
People pick and choose. They're like, you know, they have a model of how much money they want to make, how stealthy they want to be, and one interesting thing is with respect, for instance, to Stuxnet and things like that, I was very surprised they didn't use a lot of tricks.
I mean, looking at like a lot of malware that came out of like Russia and things like that, where people are actually writing malware not to spy really on people but to actually make money, they use a lot of sophistication. So a lot of people could use actually a course from the Russian malware authors.
>>: Do you have a theory like -- it seems like Stuxnet is kind of a conundrum. Because on the one hand, they had very sophisticated exploits, and they were quite clever in that way. But on the other hand, they didn't use the sophistication in trying to --
>> Hassen Saidi: That's right, that's right.
>>: -- to be stealthy.
>> Hassen Saidi: It's a good question to ask, why they didn't do this. For instance, one of the things they could have done, and they didn't, was to contain it.
>>: Right.
>> Hassen Saidi: So this thing found itself in Indonesia and found itself in India and all kinds of places. And it's like, couldn't you have put a check that, you know, I shouldn't run if I'm somewhere else? I only run in that country or something like that.
So actually, the time difference would have been interesting. They could have actually used the time difference to target multiple countries of interest.
>>: How much money can they make, these guys?
>> Hassen Saidi: Oh, people have looked into that, and they make millions.
>>: [indiscernible].
>> Hassen Saidi: Well, I read this interesting article about a village in Romania. Well, it's not a village, it's a small town, and the most noticeable piece of news is that Mercedes opened a shop there. A point of sale. Why?
Because that's the number one Romanian I.T. sort of like village or small town, and a bunch of the hackers from Romania were running some scam to steal money from car buyers through Craigslist and Yahoo and things like that. They made quite a lot of money.
And Mercedes suddenly went to this obscure, small town in Romania and opened a shop, because there were suddenly a few people who were very, very rich and somebody started buying Mercedes.
So they make money. One thing to notice is that there aren't that many groups making money. When you look at, like, spam, phishing, there are roughly 10, 15 groups, tops, in the world doing this and making money on this, who understand this as a business model or are implementing this as a business model.
So they make money. Not a lot of them, actually. Also, think about this. If you are in the U.S. and you have like a six-figure salary, you don't do this. You know, you can work eight hours a day and go buy a Mercedes. But if you're in Romania, this actually makes a lot of sense for them. I mean, they are very skilled, and it's very easy to make money out of this, and you don't fear prosecution, because like, you know, if you talk to the Russian police or something like that and say hey, this guy's bad, he stole $100,000 from an American bank, they look at you and say okay, $100,000 from an American bank. If they have to prioritize that, that's way down their priority list. I mean, they have to deal with, you know, drug trafficking, human trafficking and it's like, you know, oh, you got $100,000 stolen from your bank account. Oh, too bad.
>>: But how do you steal $100,000 using this kind of exploit?
>> Hassen Saidi: What they do, so they have like an entire business model.
So, for instance, like I said, they build an infrastructure, they infect millions of machines. Then what do they do? They distribute fake AV, for instance. Suddenly, on your system, your infected system you don't know what happened and suddenly this pop-up window pops up and says your system is infected. Click here and we're going to clean it.
And you can click and you're going to give them like, you know, $20 or $30 and they're perfectly happy with that. $30 multiplied by, I don't know, a few thousand people who are going to click out of the millions, and they make money.
The other things that they do is they sell spam. So they use your machine to send spam to people. They use your machine, so basically they infect your machine and the only thing they do is they go through your mailbox and gather the addresses and take the addresses.
Now they have millions and millions of addresses. Or they can actually use your machine to send spam and it's a lot of money.
>>: They're starting to steal money from bank accounts too.
>> Hassen Saidi: Oh, yeah.
>>: It's happening right now. [indiscernible] was talking about that.
>> Hassen Saidi: So the most famous one is Zeus, for instance. Zeus is a banking Trojan. It's very cleverly done. So basically, it's like buying an SDK.
So the Zeus guys sell you this SDK, and this SDK basically allows you to customize your fake setup. So if the victim is a Bank of America client or a Wells Fargo client, they can actually customize that for you.
So it can actually intercept your network traffic. It knows that, oh, you have a Bank of America account and stuff like that, and it hijacks your network traffic and pops up for you like the fake Bank of America website, and takes your credentials that way.
Some banks started using two-factor authentication. So they tell you not just to use your password, your usual password, but we're going to send you an SMS to your phone number, your mobile phone, that has this secret code that you have to add. So now Zeus doesn't know about that. But they basically wrote a Trojan for the mobile phones. It runs on Android. It runs on Symbian. It runs, yeah, on Blackberry.
And what it does, it's basically looking for those SMS messages and now it can put two and two together and say ah, you have -- you are the same person who has --
>>: They also ride on the Verified by Visa stuff that some sites use. With facts they have, they'll verify your Visa card or debit card with the bank and now that gives hackers a way to now they have a way to [indiscernible].
>> Hassen Saidi: Yeah.
>>: So they ask you for a lot more than your bank would ever ask you for.
>> Leonardo de Moura: We should probably stop, because we're running late. We'll talk afterward, though. Thanks a lot.
>> Hassen Saidi: Thanks, sure.