The Needle in the Haystack: Find the Offending File
Robert K. Henry, CISSP, GCIH, GCFA
Information Security Officer

HR Has an Employee Grievance
  Hostile Workplace – Sexual Harassment
  Inappropriate/offensive files stored on web server and displayed in office
  College Staff Already Involved

College Investigation
  Course Site Files Deleted
    Six weeks prior to HR grievance report
  No Backups!
    Backup system on the fritz at the time files were deleted

College Investigation
  How do we get the goods?
    College systems admin made manual backups to local PC drive
    Not removed from local drive after backup system was repaired

The Mission:
  Find inappropriate material among 6 GB of mixed images, word-processed, and text files
  Identify owner/creator of files
  > 7000 files

Search Options
  Manual
  grep
  ssdeep
  foremost
  sorter
  Content Based Image Retrieval, CBIR
  Evaluation Criteria:
    Easy!
    Free!

Search Options
  Manual (The First Responder's Strategy)
  zzzzzzzzzzzzzzzzzz!
    Thumbnails
    Slide Show
    One-at-a-time
  Too much room for error
  Pretty inefficient (32 hours of searching)
    Two people spent two workdays each going through the DVDs

Search Options
  But . . . it worked!
    Identified inappropriate word-processed files and images in one directory on one of the DVDs
  Due to multiple file copying, creator/owner of files doesn't show up in Windows file properties
  Did I mention the files were uploaded via ftp with shared user IDs?
    Not much accountability!

Search Options
  There’s gotta be an easier way!

Search Options--grep
  Built-in *nix string search command, also available for Windows
  Steps to conduct search with grep
    (1) Make a forensic image of the disks
      #dd if=/dev/sr0 of=dvdimage.img conv=noerror,sync

Search Options--grep
  Steps to conduct search with grep
    (2) Extract strings
      ASCII strings first
        #strings --radix=d dvdimage.img > dvdimage.str
      Unicode strings second (-e l selects 16-bit little-endian characters)
        #srch_strings -t d -e l dvdimage.img > dvdimage.uni.str

Search Options--grep
  Steps to conduct search with grep
    (3) Examine strings files
      Create “dirty word” file
      Use “dirty word” file to search strings for, well, dirty words
        #grep -f dirtyWords.txt dvdimage.str > grepOutput.txt
        #grep -f dirtyWords.txt dvdimage.uni.str > grepOutput.uni.txt

Search Options--grep
  Results
    Process sounds a little involved, however . . .
      Took about 30 minutes to image the DVDs and run the commands. Not bad!
    Identified word-processed files with inappropriate jokes
    Doesn't find image files (didn't expect it to)
    Doesn't identify creator of files
      Zero non-repudiation
      Doesn't help investigation confirm or deny ownership of files
    Bonus: found survey data with Too Much Information
      Protected student information in clear text

Search Options--ssdeep
  Linux and Windows
  http://ssdeep.sourceforge.net/
  Uses fuzzy hashing
    A “partial” or “inexact” hashing of files to identify similar files
  Its author, Jesse Kornblum, even uses the phrase “finding needles in haystacks” in his documentation!
  Haven't heard of it being used to find questionable pictures, but why not give it a try?

Search Options--ssdeep
  “ssdeep! Go find files in the test directory that look like files in the “homeStuff” directory!”
    #ssdeep -lrd test homeStuff
  Bummer: identified exact matches only

Search Options--ssdeep
  Need to try carving out a portion of the file for true fuzziness
  Skip the first 20 blocks (header info and more) of the file and cut out the next 70 blocks for the hash comparison (dd's default block size is 512 bytes):
    #dd if=dsc00219.jpg of=219partial.jpg skip=20 count=70
  Create file for comparison
    #ssdeep 219partial.jpg > testhash.txt
  Compare fuzzy hash of image to images in directory
    #ssdeep -lrm testhash.txt homeStuff

Search Options--ssdeep
  Results: Not Promising
    Can check for similarities on a file-by-file basis, but that's too much like a manual search
    Can easily find exact matches, but then you must already have the file you are looking for ???
  However . . .
    Useful for an intellectual property issue or finding known bad files

Search Options--foremost
  Linux and Windows
  http://sourceforge.net/projects/foremost/
  Identifies files based on a database of file headers and footers
  Find a list of most file headers at http://www.wotsit.org

Search Options--foremost
  This is the header of a gzip file displayed in a hex editor
  The gzip header is 0x1f 0x8b 0x08

Search Options--foremost
  #foremost -o pathToOutputDir -c pathToConfigFile pathToImageFile

foremost--Results

Search Options--sorter
  Linux and Windows
  Perl wrapper for several Sleuth Kit tools
  http://www.sleuthkit.org/
  Runs against a disk image
  Finds active or deleted files
  Then displays thumbnail view of the files

Search Options--sorter
  #sorter -s -d pathToOutputDir pathToInputFile

Search Options--sorter
  Results
    Saves many steps compared to foremost
    Still have a bunch of thumbnails to look through

Search Options
  There’s gotta be an easier way!

Search Options--CBIR
  Content Based Image Retrieval
  Commercial Versions Available
    My office (me) too cheap; didn’t even look into commercial options!
  Free and Open Source
    imgSeek
      Linux and Windows
      http://www.imgseek.net/
    GNU Image Finding Tool
      Linux
      http://www.gnu.org/software/gift/gift.html

Search Options--CBIR
  imgSeek Demo

Lessons Learned
  Mission Accomplished! Not so much
    Found inappropriate material among 6 GB of mixed images, word-processed, and text files
    Failed to identify owner/creator of files
    Identified a potentially useful tool

Lessons Learned
  Need to develop incident response procedure for entire organization
    Procedures for breaches of Personally Identifiable Information and Payment Card data are on the books
    Procedures for responding to HR requests need documentation
    And need distribution to decentralized IT units

References:
  The Sleuth Kit (includes sorter)  http://www.sleuthkit.org/
  foremost  http://sourceforge.net/projects/foremost/
  GIFT (GNU Image Finding Tool)  http://www.gnu.org/software/gift/gift.html
  imgSeek  http://www.imgseek.net/
  ssdeep  http://ssdeep.sourceforge.net/

Presentation available at: http://boisestate.edu/oit/iso/HTCIA&CBIR.ppt

Questions?
bhenry@boisestate.edu
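
Backup: what header matching looks like
  The header matching that foremost relies on can be illustrated with a short sketch. This is not how foremost itself is implemented; it only lists the byte offsets at which the gzip header from the foremost slide (0x1f 0x8b 0x08) and the JPEG start-of-image bytes (0xff 0xd8 0xff, an example added here, not taken from the slides) occur in a raw image. It assumes GNU grep (for -P). The names sample.img and find_header are hypothetical, and the tiny sample image exists only so the sketch runs as-is.

```shell
#!/bin/sh
# Sketch only: report byte offsets of known file headers in a raw image.
# Assumes GNU grep (-P). sample.img and find_header are hypothetical names.
export LC_ALL=C   # treat the image as raw bytes, not as UTF-8 text

# Tiny stand-in for dvdimage.img: 16 filler bytes, a gzip header,
# 8 filler bytes, then a JPEG start-of-image marker.
printf 'AAAAAAAAAAAAAAAA\037\213\010BBBBBBBB\377\330\377' > sample.img

# find_header LABEL PATTERN IMAGE
# Prints "LABEL OFFSET" for every place PATTERN occurs in IMAGE.
find_header() {
    grep -obUaP "$2" "$3" | cut -d: -f1 | sed "s/^/$1 /"
}

find_header gzip '\x1f\x8b\x08' sample.img   # gzip header from the slide
find_header jpeg '\xff\xd8\xff' sample.img   # JPEG start-of-image (assumed example)
```

  Run as-is, this should print "gzip 16" and "jpeg 27". Pointed at a real image, each offset is only a candidate: a carver such as foremost still has to find the matching footer (or apply a size limit) before writing a file out.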