Static Analysis of Anomalies and Security Vulnerabilities in Executable Files
Presented by Jay-Evan Tevis
Department of Computer Science

What is actually inside an executable file?
Can the contents of an executable file reveal whether secure programming practices were used when developing the software?

   #include <stdio.h>
   #include <math.h>

   int main()
   {
      float number;
      float answer;

      printf("Enter a value:");
      scanf("%f", &number);
      answer = sqrt(number);
      printf("Square root: ");
      printf("%f", answer);
      return 0;
   }

Source code file (209 bytes) → Object code file → Executable file

Answer: There is much more than just binary code in an executable file. The contents of the file contain a wealth of information on anomalies and security vulnerabilities.

Source code file: the 209-byte program above

Object code file (760 bytes):
   File Header [20 bytes]
   Section Table [160 bytes]
   .text section [128 bytes]
   .rdata section [32 bytes]
   COFF Relocations [110 bytes]
   Symbol Table [306 bytes]
   String Table [4 bytes]

Executable file (13,049 bytes):
   DOS Header [64 bytes]
   MS-DOS Stub [57 bytes]
   PE Signature [4 bytes]
   File Header [20 bytes]
   Optional Header [224 bytes]
   Section Table [200 bytes]
   .text section [2048 bytes]
   .data section [512 bytes]
   .rdata section [512 bytes]
   .idata section [439 bytes]
   Symbol Table [6138 bytes]
   String Table [1791 bytes]

Research Thesis
• A methodology can be devised that uses information located in the headers, sections, and tables of an executable file, along with information derived from the overall contents of the file, as a means to detect specific software security vulnerabilities without having to disassemble the code.
• Such a methodology can be instantiated in a software utility program that automatically detects certain software security vulnerabilities before the executable file is ever installed and run.

Summary of Research Objectives
1) Do an in-depth study of the PE format of an executable file to discover any information that could be useful in detecting software security vulnerabilities
2) Create a software utility to dissect a PE file byte by byte from beginning to end (without disassembling) in order to identify, map, and categorize its contents
3) Formulate a methodology for combining and correlating the information found in a PE file to detect indicators of certain security vulnerabilities
4) Incorporate the methodology into the PE dissecting utility to detect software security vulnerabilities automatically in a matter of seconds
5) Test the automated methodology on installation files, software development files, Windows XP operating system files, Microsoft application files, security-centric application files, and other application files
6) Analyze the test results and draw conclusions

Overview of Presentation
• The PE File Format
• A Software Utility for Dissecting a PE File
• A Methodology to Detect Indicators of Certain Security Vulnerabilities
• Methodology + PE Dissecting Utility = findssv
• Testing the Automated Methodology
• Conclusion and Future Work

The PE File Format

PE and COFF Formats
• PE: Portable Executable
   – Common format created by Microsoft and used by all 32-bit Windows NT executable files (.exe, .dll, .ocx, .sys, .drv, etc.)
   – PE files are called image files in the Microsoft documentation (as in "binary image")
• COFF: Common Object File Format
   – Object code format (Microsoft's variant of the COFF format from UNIX System V) used by many compilers when producing object code files
   – COFF is a subset of the PE format

Typical Contents of a PE File
• DOS header
• MS-DOS stub
• PE signature
• File header
• Optional header
• Section table
• Symbol table
• String table
• Various sections (.text, .data, etc.)
• Import table
• Export table

helloworld.exe (Cygwin) (1 of 2)
ADDRESS              DESCRIPTION
      0 -     63     DOS Header [64 bytes]
     64 -    120     MS-DOS Stub [57 bytes]
    121 -    127     (Contents not known) [7 bytes]
    128 -    131     PE Signature [4 bytes]
    132 -    151     File Header [20 bytes]
    152 -    375     Optional Header [224 bytes]
    376 -    575     Section Table [200 bytes]
    576 -   1023     (** Zero-filled region **) [448 bytes]
   1024 - 177151     .text section [176128 bytes]

helloworld.exe (Cygwin) (2 of 2)
ADDRESS              DESCRIPTION
 177152 - 181759     .data section [4608 bytes]
 181760 - 217599     .rdata section [35840 bytes]
 217600 - 218782     .idata section [1183 bytes]
 218783 - 219023     More of Import Table (.idata section) [241 bytes]
 219024 - 219135     (** Zero-filled region **) [112 bytes]

A Software Utility for Dissecting a PE File

Currently Available PE Dump Utilities
• First considered using
   – objdump from Cygwin
   – tdump from Borland
   – dumpbin from Microsoft
   – pedump from Pietrek
• All provide PE format information that has first been filtered and then formatted as text; some of the output is extraneous
• All appear to rely on the values in the headers and look-up tables to locate and display section and table contents instead of examining what is actually in the tables
• Comparing the results from each utility revealed inconsistencies in the section table and import table information
• None provides a complete byte-for-byte account of the file contents that would reveal "unmapped regions" or compressed file storage
• Some abort on PE files containing atypical information

A PE File Dissecting Utility
• Built a PE file dissecting utility
• Accepts program options from the command line
• Works on a single PE or COFF file
• Written in C++; 2700 source lines of code
• Made up of a driver module, a utility module, and 16 classes
   – Modeled the classes after the PE and COFF formats

A Methodology to Detect Indicators of Certain Security Vulnerabilities

Overview of the Methodology
• Derived from an in-depth search for security indicators gleaned from the PE file format
• Describes how to statically analyze a PE file
• Categorizes its findings as facts, anomalies, and vulnerabilities
• Consists of four steps

Step 1: Read through the complete file and create a file fact summary
==== File Fact Summary ====
- Image file in Windows NT portable executable (PE) format
- Actual file size: 7168 bytes
- Created on Tue Feb 1 16:01:34 2005
- Target CPU: Intel 386 or later compatibles
- Targeted for a 32-bit-word architecture
- Debugging information has been removed
- Designed for Windows Operating System version 4.0
- Runs in the Windows character subsystem
- Lists these table names in the data directory: Import
- Contains no string table
- Contains no symbol table
- Imports functions from
   -- cygwin1.dll (CYGWIN GNU base dynamic link library)
   -- KERNEL32.dll (WinNT base API client)
==== End of File Fact Summary ====
Sample C-1: vulnerable.exe (g++)

Step 2: Detect anomalies when reading the headers, tables, and sections in the file
**** Anomalies ****
- A section entry named .bss appears in the section table, but the table doesn't contain the location of the 226272 bytes for that section
- The data directory table in the optional header states that the Resource Table (.rsrc section) is 1104 bytes in size when actually it is 1536 bytes in size
- The data directory table in the optional header states that the Relocation Table (.reloc section) is 43792 bytes in size when actually it is 44032 bytes in size
**** End of Anomalies ****
Sample H-1: cygwin1.dll (g++)

Step 3: Detect anomalies when mapping the file contents
ADDRESS                  DESCRIPTION
   135168 - 10495519     .rsrc section [10360352 bytes]
      135168 - 10481711     (No additional details) [10346544 bytes]
    10481712 - 10487327     Certificate Table [5616 bytes]
    10487328 - 10495519     (No additional details) [8192 bytes]
- Contains an unusual area of 10346544 bytes starting at address 135168 which may indicate a group of compressed files
Sample K-3: Real Player installation file

Step 4: Detect software security vulnerabilities (Steps 4a, 4b, and 4c)

Step 4a: Detect sections that are both writable and executable
!!!! Security Vulnerabilities and Risks!!!!
- Has a section named .advapi32_text whose contents can be both written to and executed
- Has a section named .netapi32_text whose contents can be both written to and executed
- Has a section named .ntdll_text whose contents can be both written to and executed
- Has a section named .psapi_text whose contents can be both written to and executed
- Has a section named .secur32_text whose contents can be both written to and executed
- Has a section named .user32_text whose contents can be both written to and executed
- Has a section named .wsock32_text whose contents can be both written to and executed
- Has a section named .ws2_32_text whose contents can be both written to and executed
- Has a section named .iphlpapi_text whose contents can be both written to and executed
- Has a section named .ole32_text whose contents can be both written to and executed
- Has a section named .kernel32_text whose contents can be both written to and executed
- Has a section named .winmm_text whose contents can be both written to and executed
!!!! End of Security Vulnerabilities and Risks!!!!
Sample H-2: cygwin1.dll (g++)

Step 4b: Detect large unused zero-filled regions (over 50 bytes)
FILE NAME: OC30.DLL
!!!! Security Vulnerabilities and Risks!!!!
- Contains 9228 bytes of unused zero-filled space that could be used to store malicious code or data
!!!! End of Security Vulnerabilities and Risks!!!!
Sample L-2: oc30.dll (Windows system file)

Step 4c: Detect the use of vulnerable C library functions
FILE NAME: OC30.DLL
!!!! Security Vulnerabilities and Risks!!!!
- Uses 5 standard C functions susceptible to buffer overflow attacks: fgets (Low risk), memcpy (Low risk), strcat (Very high risk), strcpy (Very high risk), vsprintf (Very high risk)
!!!! End of Security Vulnerabilities and Risks!!!!
Sample L-2: oc30.dll (Windows system file)

List of Vulnerable Functions in the Standard C Library
• Ultra High Risk: gets
• Very High Risk: strcpy, strcat, sprintf, scanf, sscanf, fscanf, vfscanf, vsprintf, vscanf, vsscanf, streadd, strecpy, strtrns, realpath, syslog, getopt, getopt_long, getpass
• Medium Risk: getchar, fgetc, getc, read
• Low Risk: bcopy, fgets, memcpy, snprintf, strccpy, strcadd, strncpy, vsnprintf
(Source: Viega and McGraw, "Building Secure Software", 2002, pp. 152-153)

The Consequences of No Symbol Table or Import Table
• If the symbol table or import table is missing, the methodology cannot detect the use of functions vulnerable to buffer overflow attacks
• The symbol table can be stripped when a file is linked
• An import table is not needed if the function definitions are in the PE file itself rather than in a separate DLL
An empty list of vulnerable function names only indicates a lack of enough information to detect any function names at all; it does not indicate that no vulnerable functions are used.

Methodology + PE Dissecting Utility = findssv
"find software security vulnerabilities"

Features of findssv
• Written in C++; command-line interface; 3800 SLOC
• Executable size
   – Borland: 1,150,000 bytes
   – Cygwin: 660,000 bytes
   – Microsoft: 412,000 bytes
• Performs a static security analysis of object code or image files stored in the COFF or PE format
• Detects and reports facts about the file, along with certain anomalies and software security vulnerabilities
• Can also search, display, and change any range of bytes in the file in byte, character, string, Unicode, word, or double word format
• Accepts command-line options
• Can be used on a single file, a selected group of files, or a whole directory
• Scans scores of files in only a few seconds
• Produces an optional trace showing each stage of the analysis and the results obtained from it

Sample Uses of findssv
– Display the file map of helloworld.exe
   c:\>findssv helloworld.exe -MZ
– Detect anomalies and vulnerabilities in Microsoft Word
   c:\>findssv winword.exe
– Display the contents of each PE format part in SecureCRT
   c:\>findssv securecrt.exe -P
– Detect anomalies and vulnerabilities in the DLL files located in the Windows System32 directory
   c:\>findssv "C:\windows\system32\*.dll"
– Create a trace output while reading through cygwin1.dll
   c:\>findssv cygwin1.dll -T
– Find all Unicode strings of length four or more in a file
   c:\>findssv file.exe 4 0 100000

Testing the Automated Methodology

Testing Approach
1) Identified seven categories of files to test
2) Identified PE files for each category (2725 files in total)
3) Ran findssv against a single file, a group of files, or a whole directory, depending on the location and configuration of the files stored on the hard drive
4) Saved the report data generated by findssv by redirecting standard output from the command line
5) Manually analyzed the report data and summarized the results in seven separate tables, one per category

Categories of Test Files
• Specific example files
• Executable installation files
• Software development files
• Windows XP operating system files
• Microsoft application files
• Security-centric application files
• Miscellaneous application files

Specific Example Files (Key Findings 1 of 2)
• The Cygwin GNU C++ compiler and linker injected seven vulnerable function calls into the findssv executable
• A program compiled and linked using the Cygwin GNU tools will contain standard C functions that are susceptible to buffer overflow attacks, even when the software developer does not use these functions explicitly
• The cygwin1.dll file contains security vulnerabilities that allow executable code to be modified after the program is loaded into memory and executed

Specific Example Files (Key Findings 2 of 2)
• The kernel32.dll file uses functions that are susceptible to buffer overflow attacks
• An executable file reveals less information about the functions it uses when its symbol table is stripped and when it uses fewer DLLs
• It may be possible to analyze the general layout of the sections and tables in the file map of an executable file to detect a pattern that indicates which compiling and linking tools generated the file

Executable Installation Files (Key Findings)
• An image file can take advantage of the flexibility of the PE format and serve as its own storehouse for millions of bytes of data
What is in the pdf995s.exe installation file?

pdf995s.exe (1 of 2)
ADDRESS              DESCRIPTION
      0 -     63     DOS Header [64 bytes]
     64 -    120     MS-DOS Stub [57 bytes]
    121 -    199     (Contents not known) [79 bytes]
    200 -    203     PE Signature [4 bytes]
    204 -    223     File Header [20 bytes]
    224 -    447     Optional Header [224 bytes]
    448 -    647     Section Table [200 bytes]
    648 -   1023     (** Zero-filled region **) [376 bytes]
   1024 -  23039     .text section [22016 bytes]

pdf995s.exe (2 of 2)
ADDRESS              DESCRIPTION
  23040 -   26111    .rdata section [3072 bytes]
     23040 -   23531    (No additional details) [492 bytes]
     23532 -   25834    Import Table (.idata section) [2303 bytes]
     25835 -   26015    (No additional details) [181 bytes]
     26016 -   26065    Export Table (.edata section) [50 bytes]
     26066 -   26111    (No additional details) [46 bytes]
  26112 -   29695    .data section [3584 bytes]
  29696 -   31231    .rsrc section [1536 bytes]
  31232 - 1360383    _winzip_ section [1329152 bytes] (1.3 MB of data)

Software Development Files (Key Findings 1 of 2)
• The Cygwin software development files and utility programs contain scores of security vulnerabilities; therefore, we do not recommend them for secure programming activities
• 42 of the 56 Sun Microsystems Java software development files that we tested (including the Java interpreter) use one or more functions that are susceptible to buffer overflow attacks
• 8 of the 32 Microsoft Visual Studio SDK files that we tested (including the MSIL assembler) use one or more functions that are susceptible to buffer overflow attacks

Software Development Files (Key Findings 2 of 2)
• 13 of the 21 Microsoft Visual Studio C/C++ 7.0 files that we tested (including the compiler and linker) use one or more functions that are susceptible to buffer overflow attacks

Windows XP Operating System Files (Key Findings)
• In the Windows XP Home Edition system32 directory, approximately 25% of the executable files and DLLs use one or more standard C functions that are susceptible to buffer overflow attacks

Microsoft Application Files (Key Findings)
• (Good) Only 3 of the 48 Microsoft Office DLLs have four or more security vulnerabilities
• (Good) Of the 15 NetMeeting DLLs, only one file has a security vulnerability: it uses a single low-risk standard C function susceptible to buffer overflow attacks

Security-centric Application Files (Key Findings)
• In the Network Associates Common Framework software installation that accompanies the VirusScan software installation, approximately 75% of the executable files and DLLs use one or more standard C functions that are susceptible to buffer overflow attacks
• The SecureCRT 4.0 software contains executable files and DLLs that are highly vulnerable to buffer overflow attacks
• In the Zero Knowledge Freedom software, approximately 70% of the DLLs use standard C functions that are susceptible to buffer overflow attacks
• In the Zone Alarm Pro 4.0 software, all three DLLs use standard C functions that are susceptible to buffer overflow attacks

Miscellaneous Application Files (Key Findings 1 of 2)
• Earthlink TotalAccess: 42 of the 54 DLLs contain security vulnerabilities
• MusicMatch Jukebox: 29 of the 44 DLLs contain security vulnerabilities; 11 of those contain four or more vulnerabilities
• OpenOffice: 50 of the 193 DLLs contain security vulnerabilities; 12 of those contain four or more vulnerabilities
• Real One Player: 14 of the 15 DLLs contain security vulnerabilities

Miscellaneous Application Files (Key Findings 2 of 2)
"These key findings indicate a major lack of secure programming practices by the people who developed the DLLs for these miscellaneous applications. This is in sharp contrast to the very low number of security vulnerabilities detected by findssv in the DLLs of the Windows application files. However, this high number of vulnerabilities corresponds closely to the large number of vulnerabilities found in the executable files and the DLLs in the Windows System32 directory."

Conclusion and Future Work

Research Thesis
This research demonstrated the following:
– A methodology can be devised that uses information located in the headers, sections, and tables of an executable file, along with information derived from the overall contents of the file, as a means to detect specific software security vulnerabilities without having to disassemble the code
– Such a methodology can be instantiated in a software utility program that automatically detects certain software security vulnerabilities before the executable file is ever installed and run

Immediate Practical Uses of findssv
• It quickly pares a group of executable files down to those whose developers did not make secure programming an objective
• It can do in seconds what could take a security analyst days or weeks using hex editors and file dump utilities
Scan for Vulnerabilities…
   – It knows what to look for and where to look for it in the PE format
   – It knows when to stop looking because specific security vulnerability indicators are not present

Future Work
• Determine the compiler and linker used to build an executable file
• Establish the relationship between DLL function use and program purpose
• Provide more details on unknown regions
• Reveal the names of files stored in compressed file regions
• Detect the use of standard C functions by searching the code sections of a PE file for function call signatures

Static Analysis of Anomalies and Security Vulnerabilities in Executable Files
Presented by Jay-Evan Tevis
Department of Computer Science
LeTourneau University
Longview, TX
(jaytevis@letu.edu)

Questions?
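Both the helloworld.exe file map and the Step 1 fact summary start from the PE headers. The following is a minimal C++ sketch of that first pass, not the findssv source. It assumes the whole file has already been read into a byte vector, checks the "MZ" DOS header, follows the e_lfanew field at offset 0x3C to the "PE\0\0" signature, and decodes the File Header fields behind facts such as "Target CPU: Intel 386 or later compatibles" and "Debugging information has been removed". The names readFileHeader and FileHeaderFacts are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch, not findssv's implementation.
// PE fields are little-endian; these helpers assume the caller has checked bounds.
static uint16_t readU16(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}

static uint32_t readU32(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint32_t>(b[off]) | (static_cast<uint32_t>(b[off + 1]) << 8) |
           (static_cast<uint32_t>(b[off + 2]) << 16) | (static_cast<uint32_t>(b[off + 3]) << 24);
}

// Hypothetical subset of the facts reported in a File Fact Summary.
struct FileHeaderFacts {
    uint32_t peOffset;          // file offset of the "PE\0\0" signature (e_lfanew)
    uint16_t machine;           // 0x014C = Intel 386 or later compatibles
    uint16_t numberOfSections;  // number of 40-byte entries in the section table
    uint32_t timeDateStamp;     // seconds since 1970-01-01 00:00 UTC
    bool debugStripped;         // IMAGE_FILE_DEBUG_STRIPPED (0x0200)
    bool is32BitMachine;        // IMAGE_FILE_32BIT_MACHINE (0x0100)
};

// Returns std::nullopt unless a DOS header leads to a valid PE signature and File Header.
std::optional<FileHeaderFacts> readFileHeader(const std::vector<uint8_t>& b) {
    if (b.size() < 64 || b[0] != 'M' || b[1] != 'Z') return std::nullopt;  // 64-byte DOS header
    const uint32_t pe = readU32(b, 0x3C);                                 // e_lfanew
    if (pe > b.size() || b.size() - pe < 24) return std::nullopt;         // signature + 20-byte File Header
    if (b[pe] != 'P' || b[pe + 1] != 'E' || b[pe + 2] != 0 || b[pe + 3] != 0) return std::nullopt;

    const std::size_t fh = pe + 4;  // the File Header immediately follows the signature
    const uint16_t characteristics = readU16(b, fh + 18);
    return FileHeaderFacts{pe,
                           readU16(b, fh),      // Machine
                           readU16(b, fh + 2),  // NumberOfSections
                           readU32(b, fh + 4),  // TimeDateStamp
                           (characteristics & 0x0200) != 0,
                           (characteristics & 0x0100) != 0};
}
```

For helloworld.exe, whose map places the PE Signature at address 128, peOffset would be 128 and the 20-byte File Header would occupy addresses 132 through 151.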
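The file maps and Step 3 depend on accounting for every byte, which, according to the slide on existing dump utilities, none of those tools does. One straightforward way to find unmapped regions is to sort the known regions and report whatever they leave uncovered. The sketch below assumes every header, table, and section has already been collected as an (offset, size) pair. Region and findUnmappedRegions are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not findssv's implementation.
// A byte range [start, start + size) within the file.
struct Region {
    uint64_t start;
    uint64_t size;
};

// Returns the ranges of the file that none of the known headers, tables, or sections cover.
std::vector<Region> findUnmappedRegions(std::vector<Region> known, uint64_t fileSize) {
    std::sort(known.begin(), known.end(),
              [](const Region& a, const Region& b) { return a.start < b.start; });
    std::vector<Region> gaps;
    uint64_t cursor = 0;  // first byte not yet accounted for
    for (const Region& r : known) {
        if (cursor >= fileSize) break;
        if (r.start > cursor) {
            gaps.push_back({cursor, std::min(r.start, fileSize) - cursor});
        }
        cursor = std::max(cursor, r.start + r.size);  // overlapping regions are merged
    }
    if (cursor < fileSize) gaps.push_back({cursor, fileSize - cursor});
    return gaps;
}
```

Given the helloworld.exe regions from the DOS Header through the .text section, and limiting the scan to the first 177152 bytes (the end of .text), it reports the 7 bytes at address 121 that the map labels "(Contents not known)" and the 448 bytes at address 576. A follow-up check that a gap holds only zero bytes is what would let the map label the second gap a zero-filled region.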
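Step 4a flags a section when its Characteristics field in the section table has both IMAGE_SCN_MEM_WRITE (0x80000000) and IMAGE_SCN_MEM_EXECUTE (0x20000000) set. The following is a minimal sketch. It assumes the raw section table bytes and the entry count (NumberOfSections from the File Header) are already available. Each entry is 40 bytes, so the 200-byte Section Table in helloworld.exe holds five entries. The 8-byte Name field sits at offset 0 and Characteristics at offset 36. The function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch, not findssv's implementation.
constexpr uint32_t kScnMemExecute = 0x20000000;  // IMAGE_SCN_MEM_EXECUTE
constexpr uint32_t kScnMemWrite = 0x80000000;    // IMAGE_SCN_MEM_WRITE
constexpr std::size_t kSectionEntrySize = 40;    // size of one section table entry

// Returns the names of sections whose contents can be both written to and executed.
std::vector<std::string> writableExecutableSections(const std::vector<uint8_t>& table,
                                                    std::size_t count) {
    std::vector<std::string> flagged;
    for (std::size_t i = 0; i < count && (i + 1) * kSectionEntrySize <= table.size(); ++i) {
        const uint8_t* entry = table.data() + i * kSectionEntrySize;
        // Name: 8 bytes, padded with NULs (no terminator when all 8 bytes are used).
        std::string name(reinterpret_cast<const char*>(entry), 8);
        name = name.substr(0, name.find('\0'));
        // Characteristics: little-endian 32-bit flags at offset 36 of the entry.
        const uint32_t flags = static_cast<uint32_t>(entry[36]) |
                               (static_cast<uint32_t>(entry[37]) << 8) |
                               (static_cast<uint32_t>(entry[38]) << 16) |
                               (static_cast<uint32_t>(entry[39]) << 24);
        if ((flags & kScnMemWrite) != 0 && (flags & kScnMemExecute) != 0) {
            flagged.push_back(name);
        }
    }
    return flagged;
}
```

One limitation applies: names longer than eight bytes, such as .advapi32_text in Sample H-2, do not fit in the Name field. Toolchains that emit them (typically the GNU linker) store a slash followed by a decimal offset into the string table instead. This sketch returns that raw "/nnn" form rather than resolving it.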
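Step 4b reports zero-filled regions longer than 50 bytes. The following is a minimal sketch of the scan. It assumes the scan runs over byte ranges already known to be unused (for example, the unmapped gaps in a file map), so that legitimately zero data inside a section is not counted. The 50-byte threshold comes from the slide. ZeroRun, findZeroRuns, and totalZeroBytes are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not findssv's implementation.
struct ZeroRun {
    std::size_t start;   // offset of the first zero byte
    std::size_t length;  // number of consecutive zero bytes
};

// Returns every run of consecutive 0x00 bytes that is longer than minLength bytes.
std::vector<ZeroRun> findZeroRuns(const std::vector<uint8_t>& bytes, std::size_t minLength = 50) {
    std::vector<ZeroRun> runs;
    std::size_t i = 0;
    while (i < bytes.size()) {
        if (bytes[i] != 0) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        while (i < bytes.size() && bytes[i] == 0) ++i;
        if (i - start > minLength) runs.push_back({start, i - start});
    }
    return runs;
}

// Sums the run lengths into a single total like the one in the Step 4b report.
std::size_t totalZeroBytes(const std::vector<ZeroRun>& runs) {
    std::size_t total = 0;
    for (const ZeroRun& r : runs) total += r.length;
    return total;
}
```

A run of exactly 50 zero bytes is not reported, which matches the slide's "over 50 bytes" wording.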
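Step 4c compares the function names recovered from the import table (or symbol table) against the Viega and McGraw list. The following is a minimal sketch of that lookup. It assumes the imported names have already been extracted and match the list exactly; decorated names such as _snprintf would need normalizing first, which is not shown. Risk, riskTable, and vulnerableImports are hypothetical names.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch, not findssv's implementation.
enum class Risk { Low, Medium, VeryHigh, UltraHigh };

// Risk ratings transcribed from the slide's list (Viega and McGraw, 2002).
const std::map<std::string, Risk>& riskTable() {
    static const std::map<std::string, Risk> table = [] {
        std::map<std::string, Risk> t;
        t["gets"] = Risk::UltraHigh;
        for (const char* f : {"strcpy", "strcat", "sprintf", "scanf", "sscanf", "fscanf",
                              "vfscanf", "vsprintf", "vscanf", "vsscanf", "streadd", "strecpy",
                              "strtrns", "realpath", "syslog", "getopt", "getopt_long", "getpass"})
            t[f] = Risk::VeryHigh;
        for (const char* f : {"getchar", "fgetc", "getc", "read"}) t[f] = Risk::Medium;
        for (const char* f : {"bcopy", "fgets", "memcpy", "snprintf", "strccpy", "strcadd",
                              "strncpy", "vsnprintf"})
            t[f] = Risk::Low;
        return t;
    }();
    return table;
}

// Returns each imported name found in the risk table, sorted by name, without duplicates.
std::map<std::string, Risk> vulnerableImports(const std::vector<std::string>& importedNames) {
    std::map<std::string, Risk> found;
    for (const std::string& name : importedNames) {
        auto it = riskTable().find(name);
        if (it != riskTable().end()) found.emplace(name, it->second);
    }
    return found;
}
```

Given the five C functions reported for oc30.dll plus unrelated imports, it returns exactly those five with their ratings. As the slide on missing symbol and import tables notes, an empty result only means that no names were available to check.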