Automated Mapping of Large Binary Objects
Ben Sangster, Roy Ragsdale, Greg Conti
Image: http://www.loc.gov/loc/lcib/0611/images/map.jpg

The views expressed in this presentation are those of the author and do not reflect the official policy or position of the United States Military Academy, the Department of the Army, the Department of Defense or the U.S. Government.

Image: http://www.cdcr.ca.gov/News/Images/overcrowding/MuleCreek_071906v1.jpg

Motivation

Hex         Decimal       Description
0400-07FF   1024-2047     Screen memory
0800-9FFF   2048-40959    Basic ROM memory
8000-9FFF   32768-40959   Alternate: ROM plug-in area
A000-BFFF   40960-49151   ROM: Basic
A000-BFFF   40960-49151   Alternate: RAM
C000-CFFF   49152-53247   RAM memory, including alternate
D000-D02E   53248-53294   Video Chip (6566)
D400-D41C   54272-54300   Sound Chip (6581 SID)
D800-DBFF   55296-56319   Color nybble memory
DC00-DC0F   56320-56335   Interface chip 1, IRQ (6526 CIA)
DD00-DD0F   56576-56591   Interface chip 2, NMI (6526 CIA)
D000-DFFF   53248-57343   Alternate: Character set
E000-FFFF   57344-65535   ROM: Operating System
E000-FFFF   57344-65535   Alternate: RAM
FF81-FFF5   65409-65525   Jump Table

Goals
• Accurately identify regions within an arbitrary binary object
• Efficient algorithms
• Extensible framework
• Automated mapping process
• Automated process for generating test data
• Current State: BINMAP Utility

[Figure: a ~12MB binary object, offsets 0 to ~12MB, with ~5MB inserts between labeled regions: ASCII Text, Data Structure, Compressed Image 1 ... Compressed Image N, Unicode URLs, Data Structure]
[Figure: two plots of f(x) over offsets 0 to N]

Partial Taxonomy
[Tree diagram; node labels listed level by level in extraction order]
binary fragment
high entropy, medium entropy
compression, encryption, low entropy
machine, human
data, uncompressed, code, language, structures, media
RLE, LZW, ...; AES, DES, ...; ECB, CBC, ...; EN, FR, RU, ...
repeating values

Goal

Hex         Decimal       Fragment type
0400-07FF   1024-2047     ASCII Text (English)
0800-9FFF   2048-40959    Pointer Table
8000-9FFF   32768-40959   Variable Length Array
A000-BFFF   40960-49151   Compressed Data
A000-BFFF   40960-49151   Unicode (Basic Latin)
C000-CFFF   49152-53247   Unknown Region
D000-D02E   53248-53294   Repeating Value (0xFF)
D400-D41C   54272-54300   Encrypted Region (AES)
D800-DBFF   55296-56319   PNG Image
DC00-DC0F   56320-56335   JavaScript
DD00-DD0F   56576-56591   Encrypted Region (RSA Key?)
D000-DFFF   53248-57343   Unknown Region
E000-FFFF   57344-65535   BMP Image
E000-FFFF   57344-65535   Unicode (Hyperlinks?)
FF81-FFF5   65409-65525   Repeating Value (0x00)

f(x)
Fragment type 1   a1-a2
Fragment type 2   a3-a4
Fragment type N   a5-a6

                  Test 1   Test 2   Test 3   Test N
Fragment type 1   a1-a2    b1-b2    c1-c2    z1-z2
Fragment type 2   a3-a4    b3-b4    c3-c4    z3-z4
Fragment type N   a5-a6    b5-b6    c5-c6    z5-z6

Shannon Entropy
Perl Random Number Sequence    a1-a2
AES Encrypted Word Document    a3-a4
ASCII Text Document            a5-a6
BMP (Single Color)             a7-a8

Shannon Entropy
Shannon entropy H(X) measures uncertainty and quantifies the information contained in a message.
http://en.wikipedia.org/wiki/Shannon_entropy

Other Techniques
- Hamming Weight
- Index of Coincidence
- Mean / Standard Deviation
- Traditional pattern matching
- <Your ideas?>

Window Size
[Plots: Shannon entropy of an AES sample at varying window sizes (4 slides)]
[Plots: Shannon entropy of 4 file types at varying window sizes (2 slides)]

BinMap Demo

Extensibility Example
[Plot: Entropy/Evaluating; series: generic, compressed, encrypted; y-axis 3.5 to 8.5, x-axis 0 to 60000]

Future Work
• Improve Framework
  – Analyze performance
  – Develop & improve plug-ins
• Improve Datasets
• Integrate with visualization, interaction and GUI
• Other identification measures
• Apply datamining techniques
• Increase size of taxonomy

Code repository: http://binmap.googlecode.com

0x3F 0x3F 0x3F
? ? ?
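
Appendix: windowed Shannon entropy (illustrative sketch)
The Shannon Entropy and Window Size slides describe computing H(X) = -sum p(x) log2 p(x) over successive windows of a binary object. The Python below is a minimal sketch of that idea and is not taken from the BinMap code base. The function names, the 256-byte default window, and the sample data are all assumptions made for illustration.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # sum of p * log2(1/p) over the byte values that actually occur
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def windowed_entropy(data: bytes, window: int = 256, step: int = 256):
    """Yield (start, end, entropy) per window; the end offset is inclusive.

    The window and step sizes are hypothetical defaults, not BinMap's.
    A step smaller than the window gives overlapping windows.
    """
    for start in range(0, len(data), step):
        chunk = data[start:start + window]
        yield start, start + len(chunk) - 1, shannon_entropy(chunk)


if __name__ == "__main__":
    # Made-up sample: a repeating value, some ASCII text, then random bytes.
    sample = b"\x00" * 1024 + b"The quick brown fox. " * 50 + os.urandom(1024)
    for start, end, h in windowed_entropy(sample):
        print(f"{start:06X}-{end:06X}  H = {h:.2f} bits/byte")
```

Window size matters because a window of n bytes can hold at most min(n, 256) distinct byte values, so its entropy cannot exceed log2(min(n, 256)) bits per byte. Small windows therefore compress the gap between encrypted, compressed, and plain regions, while large windows blur region boundaries.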