“Hex sucks. A better mapping must be possible.” -Dan Kaminsky

Visual Forensic Analysis and Reverse Engineering of Binary Data
Gregory Conti, Erik Dean
United States Military Academy, West Point, New York
gregory.conti@usma.edu, erik.dean@usma.edu

The views expressed in this presentation are those of the authors and do not reflect the official policy or position of the United States Military Academy, the Department of the Army, the Department of Defense or the U.S. Government.

http://www.whitehouse.gov/omb/budget/fy2005/images/justice-7.jpg

A brief fashion statement...
http://www.amazon.com/gp/product/images/B000UI91XY/sr=1-12/qid=1216606930/ref=dp_image_text_0?ie=UTF8&n=377110011&s=watches&qid=1216606930&sr=1-12
http://www.amazon.com/gp/product/images/B000RWJG6U/sr=1-5/qid=1216606930/ref=dp_image_text_0?ie=UTF8&n=377110011&s=watches&qid=1216606930&sr=1-5

Outline
• The Problem – Tiny Windows
• Background and Motivation
• Related Work
• Moving Beyond Hex
• System Design
• Case Studies
• Demos

Releasing Two Tools...

[Figure: kinds of binary data (01010 10101 01010)]
data file (doc, xls, txt…): operated on by applications
exe, ELF, PE...: executed by OS
other special cases: core dump, pagefile.sys, hiberfil.sys…
memory: process memory, cache…
network: packets…

[Figure: tool landscape. Axes: lower insight to high insight; general purpose to precise application. Tools shown: Ida Pro, OllyDBG, BinNavi (Zynamics), BinDiff (Zynamics)…; Filemon, Regmon…; 011; hex editors; hexdump; grep & diff; strings; objdump; original application]

strings / grep / diff
H:\Datasets>strings 20040517_homeISP.pcap | more
Strings v2.4
Copyright (C) 1999-2007 Mark Russinovich
Sysinternals - www.sysinternals.com
0hF
M@y
7bs
Z19Z
MICROSOFT NETWORKS
WINDOWS USER
Microsoft Security Bulletin MS03-043
Buffer Overrun in Messenger Service Could Allow Code Execution (828035)
Affected Software:
Microsoft Windows NT Workstation
Microsoft Windows NT Server 4.0
Microsoft Windows 2000
...
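The strings output above can be approximated in a few lines. This is a minimal sketch, not the Sysinternals implementation: the function name `extract_strings` and the sample bytes are hypothetical, the minimum run length of 3 is an assumption chosen to match the three-character hits shown above, and only printable ASCII is scanned (no Unicode).

```python
import re

def extract_strings(data: bytes, min_len: int = 3) -> list[str]:
    """Return runs of printable ASCII (0x20-0x7e) at least min_len bytes long.

    Hypothetical helper for illustration; min_len=3 is an assumed default.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Made-up sample: two readable runs separated by non-printable bytes.
sample = b"\x00\x01MICROSOFT NETWORKS\x00\xffab\x00WINDOWS USER\x90"
found = extract_strings(sample)
print(found)  # the two-byte run "ab" is below min_len and is skipped
```

The sketch also shows the limitation the talk is pointing at: the output is a flat list with no position or context, so the structure of the surrounding binary data stays invisible.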
Hex Editors
Hex Workshop
011
WinHex

F-Secure Malware
http://www.f-secure.com/weblog/archives/00000662.html

IDA Pro v5.1
http://www.hex-rays.com/idapro/

Zynamics BinDiff
http://www.zynamics.com/content/_images/bindiff_scr2.gif

Zynamics BinNavi
http://www.zynamics.com/index.php?page=binnavi

nwdiff
http://computer.forensikblog.de/en/2006/02/compare_binary_files_with_nwdiff.html
http://www.geocities.jp/belden_dr/ToolNwdiff_Eng.html

Dot Plots & Visual BinDiff (Kaminsky)
Self-Similarity in a single file. (.NET Assembly)
Diffing Two Files
images: Dan Kaminsky, CCC2006

Framework
• File Independent Level
  – Entropy
  – Byte Frequency
  – N-Gram Analysis
  – Strings
  – Hex / Decimal / ASCII
  – Bit Plot (2D/3D)
  – File Statistics
• File Specific Level
  – Complete or Partial Knowledge of File Structure
  – For Example, Metadata

Textual Hex/ASCII Detail View
Traditional Textual Utilities (strings...)
Graphical Displays
Machine Assisted Mapping and Navigation
Hex Editor Core

Towards a Visual Hex Editor
• Malware Analysis
• Locate Embedded Objects
  – Encoding / Encryption
• Audit Files for Vulnerabilities
• Compare files (Diffing)
• Cracking
• Analyze Unknown/Undocumented File Format
• Cryptanalysis
• Perform Forensic Analysis
• File System Analysis
• Reporting
• File Fuzzing

Goals
• Handle Large Files
• Many Insightful Windows
• Big Picture Context
• Improved Navigation
• Data Files / Executable Files
• Hex Editor best practices are the foundation
• Support Art & Science
• Provide rapid analysis capability
• Inform machine processing development
• Fun

Two Approaches
• vizbin – C# VS 2008
• danglybytes – C# VS 2005

vizbin
• Textual: Text/ASCII
• Graphical: Byte plot, Byte Frequency Plot (overview + detail)
• Interaction: navigation arrows, search, entropy display
• Plug and Play Design: Designed to allow dynamic addition of new visualizations

Interface

Memory Management
Problem: .NET limits image height and width to <= 65535
Solution: Create a table with each cell containing a start and end offset into the file to be visualized

Byte Plot
[Figure: byte values (1, 1, 255, 108, 0, 40, ...) mapped to the pixels of a 640 x 480 image]

Color Coding
• ASCII
• Entropy
• Byte Frequency

Explorer.exe (color coding: ASCII enhanced)
Legend: Printable ASCII, CRLF, Tab or Space, Other ASCII

Word 2007 Document (color coding: entropy)
ASCII Viz
LZW Viz

Packing (color coding: byte frequency)
Original Explorer.exe, cmd.exe, UPX

What if we could use the same Viz techniques to find hidden messages in other files?
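The entropy color coding listed above can be sketched as follows. This is an illustration under stated assumptions, not the vizbin implementation: the function names, the 256-byte non-overlapping window, and the linear gray scale are all hypothetical choices.

```python
import math
from collections import Counter

def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def entropy_profile(data: bytes, window: int = 256) -> list[float]:
    """Entropy of each consecutive, non-overlapping window (assumed size)."""
    return [shannon_entropy(data[i:i + window])
            for i in range(0, len(data), window)]

def entropy_to_gray(h: float) -> int:
    """Map entropy in [0, 8] to a gray level in [0, 255] (assumed scale)."""
    return round(h / 8.0 * 255)

# Made-up data: one repeated byte, then every byte value exactly once.
low = b"A" * 256
high = bytes(range(256))
profile = entropy_profile(low + high)
grays = [entropy_to_gray(h) for h in profile]
print(profile, grays)
```

Regions that are compressed, packed, or encrypted tend toward the high end of this scale, while text and padding sit lower, which is why a change in shade can flag an embedded object worth a closer look.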
Erik Demo

Embedded Messages – MP3 Stego
Overview of the MP3 in question
But on closer inspection there’s something out of the ordinary which looks a bit like ASCII text…

Embedded Messages – MP3 Stego
On closer inspection of the suspect section of the file, we find…

VizBin Future Work / Lessons Learned
• Complete ‘Magic File’ search and display
• Change the memory table model to a memory array model
  – Memory table navigation is not intuitive enough
• Add navigation through overview image
• Add interactive interrogation of detail image

danglybytes
• Textual: Text/ASCII, Strings, ByteCloud
• Graphical: Bitplot, BytePlot, RGBPlot, BytePresence, ByteFrequency, Digram, Dotplot
• Interaction: VCR, Memory Map, Color Coding

Traditional Views
Hex / ASCII View
Strings

Strange Attractors and TCP/IP Sequence Number Analysis (Michal Zalewski)
• http://lcamtuf.coredump.cx/oldtcp/tcpseq.html
• http://lcamtuf.coredump.cx/newtcp/

Digraph View
black hat
bl la ac ck k_ _h ha at
(98,108) (108,97) (97,99) (99,107) (107,32) (32,104) (104,97) (97,116)

Digraph View
[Figure: 256 x 256 grid with Byte 0, Byte 1, ... Byte 255 on each axis; example points (32,108) and (98,108)]
Examples: uuencoded, slashdot.org, compression, encryption, .txt, incrementing words, constrained pairs

Bit Plot
[Figure: bit values (1, 1, 1, 1, 0, 1, ...) mapped to the pixels of a 640 x 480 image]

Byte Plot Example (Word Document)

Byte Presence
[Figure: byte presence display; values shown: 0, 255, 255, 108, 0, 40, 128, 255]

RGB Plot
[Figure: RGB plot; values shown: 1, 1, 0, 0, 0, 200, 0, 0; 640 x 480 image]

Display Comparison
Display       Pixels/Byte       19” Monitor   Gain
Textual Hex   300 pixels/byte   4.4 KB        N/A
Byte View     1 pixel/byte      1.3 MB        300x
RGB View      3 bytes/pixel     3.9 MB        900x

Dot Plots
• Jonathan Helfman’s “Dotplot Patterns: A Literal Look at Pattern Languages.”
• Dan Kaminsky, CCC & BH 2006

DotPlot Examples
Images: Jonathan Helfman, “Dotplot Patterns: A Literal Look at Pattern Languages.”

Kaminsky DotPlot
[Figure: dot plot with Byte 0, Byte 1, ... Byte N on both axes]
O(N²)

Modified for Interactivity
[Figure: dot plot with Byte 0, Byte 1, ... Byte N on both axes; 500x500]
O(N)

English Text
Bitmap Image
Compressed Audio

Byte Clouds
Tag Cloud: Smashing the Stack for Fun and Profit
http://tagcrowd.com/
Byte Cloud

Byte Frequency (word document)
[Figure: two byte frequency plots, each on an axis from 0 to 255: Unencrypted and AES]

Quick Assessments
• Alphabet in use
• Use of encryption
• Application file format exploration
• Fixed length structures
• Variable length structures
• Bitmaps

Pure Edge (.xfdl)
0A 31 37 42 48 4E 54 5A 66 6C 72 78
22 32 38 43 49 4F 55 61 67 6D 73 79
2B 33 39 44 4A 50 56 62 68 6E 74 7A
2D 34 3B 45 4B 51 57 63 69 6F 75
2E 35 3D 46 4C 52 58 64 6A 70 76
30 36 41 47 4D 53 59 65 6B 71 77

Encryption
unencrypted, XOR
unencrypted, AES

observe format changes (~chosen plaintext attack)
.tiff
Insert Image from File (.tiff)

Fixed Length Structure
Neverwinter Nights Database File

tvDebug.log
• Created by ZoneAlarm Firewall
• Can grow quite large
• 6.7M in this case
• Binary
• Seeking big picture context
• ~240 byte wide data structures
• Vertical bands identify identical values
• Exceptions visible

Alphabet (90 values)
01 03 05 09 0B 0D 0F 11 13 15 17 19 1D 1F 21
23 25 29 2B 2D 2F 31 33 35 37 3B 3D 3F 42 44
45 49 4B 4D 4F 51 53 54 55 56 57 59 5B 5D 5F
67 6D 80 82 84 86 88 8A 8C 8E 90 92 94 96 9A
9C 9E A0 A2 A4 A6 A8 AA AC AE B0 B2 B4 B6 B8
BA BC BE C0 C6 C8 CA CE D0 D2 D4 D6 DA DE EA

Variable Length Structure
Thumbs.db
See http://www.acquisitiondata.com/white_papers/thumbsdbfiles.pdf for a well written white paper.

pcap
8 bits / pixel

pcap
packet length / color coding by protocol

pcap
multicolumn

Demo (Firefox hdmp)

Firefox .hdmp

Redacted PDF...
http://entertainment.slashdot.org/article.pl?sid=08/05/20/0228229

Example .NET Image Formats
Format8bppIndexed: Specifies that the format is 8 bits per pixel, indexed.
Format16bppGrayScale: The pixel format is 16 bits per pixel. The color information specifies 65536 shades of gray.
Format16bppRgb565: Specifies that the format is 16 bits per pixel; 5 bits are used for the red component, 6 bits are used for the green component, and 5 bits are used for the blue component.
Format1bppIndexed: Specifies that the pixel format is 1 bit per pixel and that it uses indexed color. The color table therefore has two colors in it.
Format24bppRgb: Specifies that the format is 24 bits per pixel; 8 bits each are used for the red, green, and blue components.
Format32bppArgb: Specifies that the format is 32 bits per pixel; 8 bits each are used for the alpha, red, green, and blue components.
Format48bppRgb: Specifies that the format is 48 bits per pixel; 16 bits each are used for the red, green, and blue components.
Format64bppArgb: Specifies that the format is 64 bits per pixel; 16 bits each are used for the alpha, red, green, and blue components.
http://msdn.microsoft.com/en-us/library/system.drawing.imaging.pixelformat(VS.80).aspx

Weaknesses
• entire file may be extracted from bit/byte/RGB
  – May trigger AV or IDS
  – 8bit/byte steg
• Screams for big monitor
• Better memory management
  – ~300MB+
• Unicode

Future Work
• Plug-ins / Editable Config Files
  – Visualizations
  – Encodings
• Saving state
  – Memory Maps
• Improving Interaction
  – What works / What doesn’t
• Multiple Files / File Systems
• REGEX search
• Automated Memory Map Generation

DAVIX (Jan Monsch and Raffy Marty)
DAVIX Workshop
DEFCON Breakout Room
Sunday 2PM-4PM
http://www.secviz.org/node/89

Communities
http://secviz.org/
“The place to share, discuss, challenge, and learn about security visualization.”
Raffy Marty, Splunk

http://vizsec.org/
“vizSEC is a research community for computer security visualization.”
John Goodall, Secure Decisions

VizSEC 2008
http://www.vizsec.org/workshop2008/

More Information
• “Visual Reverse Engineering of Binary and Data Files.” Gregory Conti, Erik Dean, Matthew Sinda, Benjamin Sangster. VizSEC 2008.
  – Publicly available September
• Security Data Visualization (No Starch Press)
• Applied Security Visualization (Addison-Wesley)

Acknowledgements
Damon Becknell, Jon Bentley, Jean Blair, Sergey Bratus, Chris Compton, Tom Cross, Ron Dodge, Carrie Gates, Chris Gates, Joe Grand, Julian Grizzard, Toby Kohlenberg, Oleg Kolesnikov, Frank Mabry, Raffy Marty, Brent Nolan, Gene Ressler, Ben Sangster, Dino Schweitzer, Matt Sinda, and Ed Sobiesk

Feedback Welcome
• Visualization ideas
• Usage feedback
• Desired functionality / feature requests
• Plug-in architecture recommendations
• DanglyBytes is here... www.rumint.org/db.zip
• We’ll have a link to VizBin up at www.rumint.org shortly

Survey

Gregory Conti gregory.conti@usma.edu | Erik Dean erik.dean@usma.edu