Yara & Python
Malware Identification and Classification
CarolinaCon 7
Michael Goffin
@mjxg
http://www.mgoff.in

Hey sir! Why hello there!!
• Rochester Institute of Technology
• Computer Science House
• Information Security Scientist/Engineer

What’s in store?
• Malware
• Yara
• Python
• Identification and Classification of Malware
• Showing it all off
• QQ session

Malware! Sonofa...
Methods of acquisition
• downloads
• compromised website content (ex: images)
• attachments
• links to compromised site content

You’ve been infiltrated!
Things to note:
• You don’t know it yet, and might not for a while
• You don’t know the scope of it
• You don’t know the severity of it
But you eventually see something…

Start the cycle!
Management wants answers! What do you do next?
• Go into a panic!
• Oh no! We should remove the known compromised host(s) from network!
• We should assess the compromise…somehow!
• Oh geez, might be good to change passwords – let’s just have everyone do it just in case!
• We need to go through logs and other hosts for signs of lateral movement – wait, what are we looking for?
• Can we make firewall rules to block any IPs or domains?
• Do we have any AV or IDS appliances?

Most importantly
You did get a copy of the malware to analyze, right? …Right?

Get better at data mining!
• Who is interested in this user or your company?
• What are they trying to do with this malware (and what are they exploiting?)?
• When did this malware come in?
• Where did it come from and where did it go to?
• Why are they after your company, or this user?
• How does this malware help them accomplish their goals?

What do we do with all the data?
Build a classification database over time!
• Identify trends
• Find commonalities

Lots of action, now what?
Enter Yara

What does Yara do?
Identify and classify malware samples based on textual or binary patterns contained within those samples

MALWARE! MALWARE! MALWARE! MALWARE!

How does it do it?
Pretty basic:
• Search for patterns
• Use defined conditions to determine if the patterns are a positive match
• Output matching rule content for consumption

Yara and Python
Step 1:
    % python
Step 2:
    > import yara
    > rules = yara.compile(signatures)
    > matches = rules.match(filetoscan)
Step 3:
    profit

As the old saying goes…
If it walks like a duck…
And it quacks like a duck…
It’s probably the DHA installing backdoors and keyloggers while xfil’ing your data.

Identification
• Can we tease out specific characteristics about this piece of malware that can describe it both from a functional and fashionable perspective?
  – What does it attempt to touch?
  – What does it attempt to modify?
  – Is this type of malware stylish?
  – Etc.

Identification
• Are there any quantitative or qualitative datasets about this malware that can help further describe its nature?
  – Functions used in other malware
  – Code style similar to other malware
  – IPs or domains used
  – Specific targets (files, processes, etc.)
  – End result of successful execution

Classification
Questions[1]:
• Does an unknown malware instance belong to a known malware family or does it constitute a novel malware strain?
• What behavioral features are discriminative for distinguishing instances of one malware family from those of other families?
  – Compare these to our Identification

Strains
• Trojan
• Rootkit
• Backdoor
• Xfil
• Worms
• Ransomware
• Keylogger

Build Signatures
• Generate conditions
• Build rules for those conditions
• Compile rules into a signature set
• Develop process to scan files using those signature sets
• Generate alerts
Set human response expectations to these alerts!!
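
The Build Signatures steps can be sketched with the yara-python module roughly as follows. This is a minimal sketch under stated assumptions, not the speaker's actual tooling: the rule name demo_keylogger_strings, its strings, the family meta key, and the helpers scan_file and format_alert are all hypothetical. The yara import is done lazily inside scan_file so the alert-formatting helper can be run without the library installed.

```python
from types import SimpleNamespace

# Hypothetical rule source; the strings are illustrative, not real indicators.
RULE_SOURCE = r'''
rule demo_keylogger_strings
{
    meta:
        family = "keylogger"
    strings:
        $hook = "SetWindowsHookEx"
        $log = "keylog.txt"
    condition:
        all of them
}
'''


def scan_file(path, rule_source=RULE_SOURCE):
    """Compile the rule source and scan one file, returning yara match objects."""
    import yara  # requires the yara-python package

    rules = yara.compile(source=rule_source)
    return rules.match(path)


def format_alert(path, match):
    """Turn one match into a short alert line for a human to triage."""
    family = match.meta.get("family", "unknown")
    return f"ALERT {path}: rule={match.rule} family={family}"


if __name__ == "__main__":
    # Stand-in object exposing the same .rule and .meta attributes as a yara match.
    fake = SimpleNamespace(rule="demo_keylogger_strings", meta={"family": "keylogger"})
    print(format_alert("sample.bin", fake))
```

Keeping the alert text separate from the scan makes it easier to decide, per rule or per family, what response a human is expected to give when the alert fires.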
What a rule looks like
    rule foo
    {
        meta:
            key = "value"
        strings:
            $variable = "something"
        condition:
            logic_for_determining_positive_rule_match
    }

Conditions
Some basic condition examples:
• A string or value exists
• A set of strings or values exist
• Strings or values at certain offsets exist
• The number of times a string or value occurs
• File size restriction

Let’s see Yara in action!

How to incorporate Yara
• Web downloads
• Web content
  – urllib
• Email attachments
• Honeypots
Grab files from AV and IDS appliances to scan!

Why Yara?
• Supplement to additional applications (Snort, AV, detonation chambers)
• MD5 of known malware is only good if the exact file is seen again
• Detect future malware with similar identifiers that AV or IDS might not catch yet
• Free

The cooldown…
• http://code.google.com/p/yara-project/
Questions?

References
• [1] Learning and Classification of Malware Behavior
  – Rieck, Holz, Willems, Dussel, Laskov
  – http://pi1.informatik.uni-mannheim.de/filepool/publications/malwareclassification-dimva08.pdf
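
Appendix sketch for the Conditions slide: the five condition types listed there map onto Yara condition syntax roughly as shown below. This is an illustration only. Every rule name, string, offset, and threshold is invented, and the scan_bytes helper is hypothetical; it imports yara lazily, so the rule text can be read or reused without the yara-python package installed.

```python
# Each hypothetical rule illustrates one condition type from the
# "Conditions" slide. Strings and thresholds are made up for illustration.
CONDITION_EXAMPLES = r'''
rule string_exists
{
    strings:
        $a = "cmd.exe"
    condition:
        $a
}

rule set_of_strings
{
    strings:
        $a = "cmd.exe"
        $b = "powershell"
        $c = "wscript"
    condition:
        2 of ($a, $b, $c)
}

rule string_at_offset
{
    strings:
        $mz = "MZ"
    condition:
        $mz at 0
}

rule string_count
{
    strings:
        $a = "cmd.exe"
    condition:
        #a > 3
}

rule small_file
{
    strings:
        $mz = "MZ"
    condition:
        $mz at 0 and filesize < 200KB
}
'''


def scan_bytes(data, source=CONDITION_EXAMPLES):
    """Return the names of the rules that match a bytes buffer."""
    import yara  # requires the yara-python package

    rules = yara.compile(source=source)
    return [m.rule for m in rules.match(data=data)]
```

With yara-python installed, scan_bytes can be called on any bytes buffer, for example the contents of a downloaded attachment read in binary mode.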