Behavior-based Spyware Detection By Engin Kirda and Christopher Kruegel Secure Systems Lab Technical University Vienna Greg Banks, Giovanni Vigna, and Richard A. Kemmerer Department of Computer Science University of California, Santa Barbara Presenting : Majed Alhudaib Introduction • Spyware is growing. • In a study that I made on campus over 50 students I found that: • 6 out of each 10 students encountered some sort of spyware infect over the past three years. • Only 3 out of 50 know what a spyware is. • 20 asked for help removing what they call a “virus” and we call spyware. • some blamed the manufacturer of their computers for adding all these adds in the browser!!. Overview • • • • • • • • • • Current anti-Spyware detectors. Behavior based spyware detection. BHO’s, toolbars and COM. Spyware characterization. The Big picture. Dynamic Analysis. Static Analysis. Evading Detection. Evaluation. Conclusion. Current anti-Spyware detectors • Signature based mechanism. Behavior based spyware detection • a new spyware detection technique that overcomes some of the limitations of existing anti-spyware approaches. This technique is based on an abstract characterization of the behavior of a popular class of spyware programs that relies on Internet Explorer’s Browser Helper Object (BHO) and toolbar interfaces to monitor a user’s browsing behavior. • Why Internet Explorer’s BHO and toolbar? • How about other web browsers? BHO’s, toolbars and COM • What are BHO’s and Toolbars? • What is COM? • How do they interact? • How can spyware applications take advantage of all this? Spyware characterization • we classify a browser helper object or a toolbar as spyware if the component, in response to browser events: 1. monitors user behavior by interacting with the web browser and 2. Invokes Windows API calls that can potentially leak information about this behavior (e.g., calls to save the data to a file or transmit information to a remote host). The Big picture • Behavior based spyware detection consists of two major parts: • 1- Dynamic analysis: exposes a suspicious component to crafted browser events (which simulate user activity) and analyzes the component’s response. In particular, we dynamically record both the browser COM functions and the Windows API functions that the component calls. • 2- Static analysis: extracts the control flow graph of all code regions that are responsible for handling events. Dynamic Analysis • The goal of the dynamic analysis step is two things: • First, it has to monitor the interaction of the component with the browser and record all the browser’s COM functions that are invoked in response to events. • Second, it has to determine the code regions that are responsible for handling events, thereby providing the necessary starting points for the static analysis step. Dynamic Analysis: Core elements • “fake” WebBrowser COM object: which provides the component under analysis with an environment similar to the one that would be present when being hosted by an instance of Internet Explorer. • COM object host application: which properly instantiates all involved components and sends the relevant browser events to the BHO or toolbar component under evaluation. • A program that traces the execution of our host application to extract those code regions that handle the various browser events that are delivered. Dynamic Analysis: Locating eventhandling code • An instruction that handles an event is called “event-specific instruction”. • How to get the first event-specific instruction? Dynamic Analysis: Locating eventhandling code • Example. Static Analysis • first task: of the static analysis step is to disassemble the target binary and generate a control flow graph from the disassembled code. A control flow graph (CFG) is defined as a directed graph. • Next task: Based on the CFG for the entire component, we isolate those parts of the graph that are responsible for handling events. In particular, we are interested in all subgraphs of the CFG that contain the code to handle the different events. Using the event-specific addresses collected during dynamic analysis. • Finally: the event specific lists are merged to obtain a list of all API calls that are invoked in response to events. Evading Detection • How can a spyware bypass this kind of detection? 1- a spyware component could attempt to leak information using means other than API calls, or it could prevent the static analysis process from finding their invocations in the code of the BHO. How? • 2- evasion venue is to craft the BHO code such that it can resist static analysis. Evaluation • total of 51 samples (33 malicious and 18 benign); 34 of them were BHOs and 17 were toolbars. Conclusion • This mechanism does need more work, thus, it definitely raises the bar for spyware writers. • The potential is very high and the outcome could benefit in deferent malware types detection. Questions? • Thank You