31st IEEE Symposium on Security & Privacy, 2010
Clemens Kolbitsch, Thorsten Holz (Secure Systems Lab, Vienna University of Technology); Christopher Kruegel (University of California); Engin Kirda (Institute Eurecom)

Outline
○ Introduction
○ System Overview
○ Automated Extraction
○ Gadget Preparation and Replay
○ Gadget Inversion
○ Evaluation

Introduction
Malware is the driving force behind many of the attacks on the Internet today. It is now increasingly deployed as software that can be remotely controlled.
How can it be analyzed?
○ Static analysis: hampered by obfuscation, etc.
○ Dynamic analysis: does not support automatically extracting specific functionality from the malware, e.g., the domain generation algorithm of samples that use domain flux, or the decoding function.

This paper aims to...
○ Present a novel approach to automatically extract from a given malware sample the instructions that are responsible for a certain activity of the sample.
○ First, INSPECTOR performs dynamic program slicing on the malware to extract a slice with "interesting" behavior.
○ Second, it generates a stand-alone gadget based on the extracted slice.

Advantages of the extracted gadgets
○ Reduce our exposure to the malicious code
○ Immediately carry out a certain operation the malware performs
○ Identify in-memory buffers that hold decrypted data
○ Some gadgets can be inverted

System Overview

Automated Extraction
Generating Activity Logs
Anubis performs dynamic malware analysis based on a processor emulator (QEMU):
○ Records all executed instructions
○ Marks each byte returned by a system call and tracks it using taint propagation
○ Records all memory accesses
Once an analyst has spotted an interesting behavior, she can instruct INSPECTOR to extract a gadget.

Automated Extraction (cont.)
Selecting and Extracting Algorithms
An analyst has to select the relevant flow manually.
○ For an HTTP download, she may select WriteFile or CreateFile.
Extract a slice
○ INSPECTOR attempts to find all data sources required to calculate the parameters passed to the selected function call.

Selecting and Extracting Algorithms
Forward Searching and Backward Slicing
The behavior selected by the analyst may not be the intended endpoint.
○ The analyst should then specify an endpoint at which the forward search stops.
Heuristics for detecting the endpoint:
○ Calls to string comparison functions, or execution of code containing string-handling instructions
○ Data that has been processed by a series of mathematical instructions

Selecting and Extracting Algorithms (cont.)
Closure Analysis
INSPECTOR can deliberately exclude certain dependencies, e.g.:
○ Conditional jumps
○ A behavior that is only triggered under a certain condition

Gadget Preparation and Replay
Gadget Format and Relocation
○ The gadget is a dynamically loadable library (DLL).
○ All references to absolute code addresses are rewritten to use relative addressing.
○ All static memory areas are extracted into a data file.

Gadget Preparation and Replay (cont.)
Gadget Player
Memory Management
○ Preinitialized memory areas
○ Provides the player with a complete view of the memory buffers accessible to the gadget

Gadget Preparation and Replay (cont.)
Execution Containment
The gadget must be isolated from the player's memory. Some choices:
○ Emulation: raises performance concerns
○ Our approach: memory management rewrites the memory accesses; the gadget runs in a separate thread; API and system calls are redirected to the environment interface
○ Other approaches: SFI, Native Client

Gadget Preparation and Replay (cont.)
Environment Interface
During gadget start-up, the player registers a callback function inside the gadget.
○ The gadget invokes it each time it makes a system or Windows API call.
○ The analyst can change the callback.

Gadget Preparation and Replay (cont.)
Callback Handling
○ The gadget player can return fake information to the gadget.

Gadget Inversion
Main idea
○ First, extract the gadget that is responsible for stealing and encoding the data.
○ Second, compute the input that leads to the output observed in the network dump.
○ Use brute force guided by the data dependencies.

Gadget Inversion
Let $o \in O$, where $O$ is the set of output bytes.
Let $i \in I$, where $I$ is the set of input bytes.
$v_o$ is the expected value of output byte $o$.
Dependent input bytes: $D_o = \{\, i \mid i \in I \wedge o \text{ depends on } i \,\}$
Candidate inputs (value assignments to the dependent bytes): $C_o = \{\, (v_{i_1}, \dots, v_{i_n}) \mid i_1, \dots, i_n \in D_o \,\}$

Gadget Inversion
Implementation
○ Taint tracking provides the dependency information.
Applicability
Base64:
○ Encodes 3 bytes into 4 bytes
○ Each output byte depends on at most 2 input bytes

Gadget Inversion
XOR:
○ With a constant key: each output byte depends on 1 input byte
○ With the content as key: each output byte depends on 2 input bytes
Strong encryption:
○ E.g., RSA
○ Each output byte depends on all input bytes
○ Brute force is infeasible

Gadget Inversion
Possible Extensions
○ Extract algebraic formulae and solve them with a constraint solver
○ Input parallelization: check multiple input candidates at once

Evaluation
Domain Flux: Conficker

Evaluation
Fetching Binary Updates: Pushdo
○ Over a period of 16 days
○ IP changes for 3 C&C servers
Binary Update Decryption: Pushdo
○ The Pushdo client appends a random key to the URL in order to get an encrypted file.
○ Invert the program to find the key.

Evaluation
Binary Update Generation: Pushdo
○ Invert the decryption algorithm
○ Redirect the connection to our server
○ 140 bytes, 44 seconds

Evaluation
Template-based Spamming: Cutwail
○ XOR-based encryption
○ Templates stored in memory
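The backward slicing step under "Forward Searching and Backward Slicing" can be illustrated with a minimal sketch that works on an already-recorded instruction trace. The sketch assumes a hypothetical, simplified trace format (`TraceEntry`) in which every executed instruction lists the locations it writes and reads. The slicing criterion is the set of locations that hold the parameters of the selected call. Only data dependencies are followed, and control dependencies and the closure analysis are ignored, so this is not INSPECTOR's actual implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class TraceEntry:
    """One executed instruction in a hypothetical, simplified trace format."""
    index: int
    text: str
    writes: FrozenSet[str]
    reads: FrozenSet[str]


def backward_slice(trace: List[TraceEntry], criterion: Iterable[str]) -> List[TraceEntry]:
    """Keep every instruction the criterion locations (transitively) depend on.

    Only data dependencies are followed; control dependencies are ignored.
    """
    needed = set(criterion)  # locations whose defining instruction is still unknown
    kept = []
    for entry in reversed(trace):
        if needed & entry.writes:
            kept.append(entry)
            # This instruction defines the needed locations ...
            needed -= entry.writes
            # ... so its own operands must now be explained further back.
            needed |= entry.reads
    kept.reverse()  # restore execution order
    return kept


def _e(index, text, writes, reads):
    return TraceEntry(index, text, frozenset(writes), frozenset(reads))


# Toy trace: a buffer is XOR-decoded and stored to `out`, which is then
# passed to the selected call; the ebx counter is unrelated.
EXAMPLE_TRACE = [
    _e(0, "mov eax, [key]", {"eax"}, {"key"}),
    _e(1, "mov ebx, 0x10", {"ebx"}, set()),
    _e(2, "mov ecx, [buf]", {"ecx"}, {"buf"}),
    _e(3, "xor ecx, eax", {"ecx"}, {"ecx", "eax"}),
    _e(4, "mov [out], ecx", {"out"}, {"ecx"}),
    _e(5, "add ebx, 1", {"ebx"}, {"ebx"}),
]
```

Slicing `EXAMPLE_TRACE` from `{"out"}` keeps instructions 0, 2, 3 and 4 and drops the unrelated `ebx` counter. This is the kind of pruning that shrinks a full trace down to a gadget-sized slice.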
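The "Environment Interface" and "Callback Handling" slides describe a callback through which every system or Windows API call of the gadget passes. That arrangement can be sketched as a dispatcher that logs each call and returns whatever the analyst's handler supplies. The class `EnvironmentInterface` and its `register`/`dispatch` methods are hypothetical Python illustrations. They are not the player's real interface, which operates on native code inside the DLL.

```python
from typing import Any, Callable, Dict, List, Tuple


class EnvironmentInterface:
    """Hypothetical dispatcher standing in for the player's environment interface.

    The gadget never talks to the real OS: every API or system call it makes
    is routed through `dispatch`, and the analyst decides what each call returns.
    """

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[..., Any]] = {}
        self.log: List[Tuple[str, Tuple[Any, ...]]] = []

    def register(self, api_name: str, handler: Callable[..., Any]) -> None:
        """Install or replace the callback for one API (may be changed at any time)."""
        self.handlers[api_name] = handler

    def dispatch(self, api_name: str, *args: Any) -> Any:
        """Called by the gadget instead of the real API; logs and forwards the call."""
        self.log.append((api_name, args))
        handler = self.handlers.get(api_name)
        if handler is None:
            # Containment: calls no handler has been registered for are refused.
            raise PermissionError(f"unhandled call from gadget: {api_name}")
        return handler(*args)
```

As an example, `env.register("GetSystemTime", lambda: (2010, 5, 16))` lets the analyst feed a chosen date to a date-dependent gadget. The returned tuple is a simplification of the real `SYSTEMTIME` structure. Refusing unregistered calls, rather than forwarding them, keeps the gadget contained by default.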
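The brute-force inversion from the "Gadget Inversion" slides can be sketched as a backtracking search, a variant of the per-output-byte brute force rather than INSPECTOR's implementation. Input bytes are guessed one at a time. Each output byte is checked against its observed value as soon as every input byte it depends on has been assigned. The dependency lists stand in for what taint tracking would report. The toy gadget `xor_chain_encode` XORs each byte with the previous plaintext byte, which is the "content as key" case, and `base64_deps` encodes the Base64 dependencies. Both are hypothetical examples.

```python
import base64
from typing import Callable, List, Optional, Sequence


def invert_gadget(
    gadget: Callable[[bytes], bytes],
    deps: Sequence[Sequence[int]],
    expected: bytes,
    n_inputs: int,
) -> Optional[bytes]:
    """Find an input that makes `gadget` produce `expected`, or return None.

    deps[o] lists the input positions output byte o depends on. Inputs are
    guessed left to right; output o is checked once its last dependency is set.
    """
    check_at: List[List[int]] = [[] for _ in range(n_inputs)]
    for o, d in enumerate(deps):
        if d:
            check_at[max(d)].append(o)
    buf = bytearray(n_inputs)  # unassigned positions stay 0

    def search(k: int) -> Optional[bytes]:
        if k == n_inputs:
            return bytes(buf)
        for value in range(256):
            buf[k] = value
            if check_at[k]:
                out = gadget(bytes(buf))
                if any(out[o] != expected[o] for o in check_at[k]):
                    continue  # this guess contradicts an observed byte
            found = search(k + 1)
            if found is not None:
                return found
        buf[k] = 0
        return None

    return search(0)


def xor_chain_encode(data: bytes, key: int = 0x5A) -> bytes:
    """Toy gadget: XOR each byte with the previous plaintext byte."""
    out = bytearray(len(data))
    prev = key
    for j, b in enumerate(data):
        out[j] = b ^ prev
        prev = b
    return bytes(out)


def xor_chain_deps(n: int) -> List[List[int]]:
    """Output j depends on inputs j-1 and j (output 0 only on input 0)."""
    return [[0]] + [[j - 1, j] for j in range(1, n)]


def base64_deps(n: int) -> List[List[int]]:
    """Dependencies of Base64 output characters; n must be a multiple of 3."""
    deps: List[List[int]] = []
    for g in range(0, n, 3):
        deps += [[g], [g, g + 1], [g + 1, g + 2], [g + 2]]
    return deps
```

Pruning works here because each output byte depends on only one or two input bytes, so wrong guesses are rejected early. If every output byte depended on every input byte, as with RSA, no check could run before the whole input had been guessed. That is the infeasible strong-encryption case.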