Data Carving using Artificial Headers

R. Daniel1, N.L. Clarke1,2 & F. Li1

1Centre for Security, Communications & Network Research (CSCAN), Plymouth University, United Kingdom
2Security Research Institute, Edith Cowan University, Western Australia
info@cscan.org

Abstract

Digital forensic tools are essential for processing electronic evidence in criminal and, increasingly, civil cases. Investigators rely upon the functionality of these tools to identify and extract relevant artifacts. One of the key processes is data carving, an approach that ignores the file system and analyses the drive for files that match a particular signature. Unfortunately, for anything other than simple files, data carving has many limitations that result in either missed files or high numbers of false alarms. Detection is largely based upon a signature appearing in the header of the file. For files that have corrupted or missing headers, however, modern data carvers are unable to recover the file successfully. This paper proposes a new approach to data carving that inserts an artificial header onto the file, thereby circumventing the header issue. Experiments have demonstrated that this approach is able to successfully recover files that current data-carving tools are unable to recover.

1. INTRODUCTION

Digital forensics has become an invaluable tool in the identification of criminal activities (Casey, 2010). Computer and mobile forensics have received particular attention due to the demand from law enforcement, which is in turn linked to the growth and popularity of such equipment (European Anti-Fraud Office, 2014). Used for both cyber and traditional crime (e.g. terrorist attacks, child pornography and information leakage), these electronic devices provide an invaluable source of information and evidence.
Indeed, criminals have been prosecuted based upon the evidence recovered from their computers and mobile phones via digital forensic techniques (FBI, 2011; Inforsecusa, 2011; Brainz, 2014).

An essential analysis technique available to investigators is data carving. This process permits the recovery of files from the raw image independent of any file system that might be present. It enables files to be recovered from unallocated space, slack space and from within other files, none of which an inspection of the file system would reveal. The primary detection mechanism is to locate the header and footer of a file and extract the data in between (Beek, 2011). Unfortunately, due to a variety of issues, such as fragmentation, deletion and missing sectors, the ability of data carvers to recover the data successfully is variable (Merola, 2008).

A key issue for data carvers is their ability to recover data in scenarios where no associated header or footer information is present. For example, slack space often contains data belonging to files whose header is missing, perhaps because it has been overwritten. This paper develops a new approach to data carving that enables the investigator to determine whether particular chunks of data contain information.

The paper is structured as follows. Section 2 describes the current state of the art, introduces a range of data carvers and evaluates their performance. Section 3 presents the new tool and describes the design, testing and logic of the approach. An evaluation of the tool is presented in Section 4, followed by the conclusions and future work in Section 5.

Information Institute Conferences, Las Vegas, NV, May 21-23, 2014

2. BACKGROUND LITERATURE

The literature often classifies data carving approaches into two categories: simple and advanced (Pal & Memon, 2009).
Simple data carvers carve files by identifying a unique signature within the header and locating its associated footer. For example, a PDF file could be carved from a piece of data if it starts with “%PDF” (i.e. the PDF header) and ends with “%%EOF” (i.e. the PDF footer). The approach therefore assumes that files are stored in contiguous data clusters within the raw image (Hand, 2012). From one perspective, this is a sound assumption, as modern file systems seek to store data in contiguous clusters. However, due to the operation of the file system and the size of a file, a series of alternative scenarios is possible. As illustrated in Figure 1, a variety of fragmentation possibilities exist, which result in the data for a file being interleaved with data from another file, missing or stored in reverse order.

Figure 1: Examples of File Fragmentation

Advanced data carving approaches seek to overcome these issues. Techniques to date largely rely upon some internal file structure within the data itself. Content-based approaches utilize characteristics such as character count, text/language recognition, white and black listing of data, statistical attributes and information entropy (Kloet, 2010). Such approaches are, however, prone to error and can produce incorrectly carved files, which limits their performance. Garfinkel (2007) identified two key limitations with current data carving tools:

1. Files had to be stored in sequential clusters
2. No evaluation of the carved file, leading to a large number of false positives

Pal and Memon (2009) present a number of approaches that seek to automate the reconstruction of fragmented files, with varying levels of success. Automated verification of the validity of carved data is no simple problem to solve.

Whilst the literature provides a reasonable overview of the current state of the art, it is difficult to establish the relative performance of the tools.
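The header and footer matching that simple carvers rely upon, as described earlier in this section, can be sketched in a few lines of Python. This is a minimal illustration rather than the logic of any particular tool: the function name `carve_simple` is hypothetical, the image is assumed to fit in memory, and files are assumed to be contiguous.

```python
def carve_simple(image: bytes, header: bytes, footer: bytes) -> list[bytes]:
    """Carve every run that starts at `header` and ends at the next `footer`.

    Hypothetical helper for illustration; it assumes contiguous storage,
    which is exactly the assumption that fragmentation breaks.
    """
    carved = []
    pos = 0
    while True:
        start = image.find(header, pos)
        if start == -1:
            break  # no further header signature in the image
        end = image.find(footer, start + len(header))
        if end == -1:
            break  # header found but no footer follows it
        carved.append(image[start:end + len(footer)])
        pos = end + len(footer)
    return carved


# Two contiguous PDF-like objects surrounded by unrelated bytes.
raw = b"junk" + b"%PDF-1.4 first %%EOF" + b"gap" + b"%PDF-1.7 second %%EOF"
files = carve_simple(raw, b"%PDF", b"%%EOF")

# A payload whose header has been lost is never found by this approach.
headerless = carve_simple(b"-1.4 orphaned body %%EOF", b"%PDF", b"%%EOF")
```

Because the search is anchored on the header signature, a fragment with a missing or damaged header yields nothing, and foreign data lying between a header and the next footer would be carved as if it belonged to the file.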
Moreover, it is not evident from the prior work how well existing carvers perform in scenarios where files are fragmented. It was therefore considered prudent to evaluate the capabilities of current tools. An experiment was devised to test a number of data carvers against a fixed forensic image. The Digital Forensic Research Workshop (DFRWS), through its annual conference challenge, produced a dataset in 2006 (and a more advanced version in 2007) (DFRWS, 2006; DFRWS, 2007). The 2006 dataset focused primarily on four categories of files (HTML, Microsoft Office, JPEG and Zip) and contained a total of 32 base files. A selection of open source and commercial data carving tools was utilized, including the industry-leading products Guidance Software’s EnCase and AccessData’s FTK (Guidance Software, 2014; AccessData, 2014).

Editors: Gurpreet Dhillon and Spyridon Samonas

Application                      EnCase     FTK        Scalpel    WinHex
No. of Files Present             32         32         32         32
No. of Files Extracted           24         24         50         13
No. of Successfully Carved       10 (31%)   6 (19%)    15 (47%)   8 (25%)
No. of Partially Carved Files    6          10         5          5

Table 1: Data Carver Results for DFRWS 2006 Dataset

The results from the DFRWS 2006 dataset demonstrate relatively poor performance across the tools. A file is counted as successfully carved only when it is carved completely and correctly. It was notable that, on a number of occasions across all tools, partial recovery was possible. Indeed, using Scalpel, three of the fragmented image files were partially recovered. In these particular cases, enough was recovered to recognize the content and thus be of potential use; however, this is not necessarily always the case. Notably, none of the carvers supported the Microsoft Excel spreadsheet or text file formats, so neither was successfully carved. That said, some of the text files were contained within other partially carved files (i.e. they appeared as a fragment after an HTML file).

Initially, the 2007 dataset was also going to be evaluated; however, as it represents a more complex scenario incorporating a wider range of file types such as MP3, AVI, FLV and PDF, and given the performance against the 2006 dataset, this was deemed unnecessary.

Analysis of these results shows that the data carvers have a significant issue when it comes to files that are fragmented, out of sequence or missing. What is particularly surprising is that these problems have been established for over 8 years and modern carvers are still unable to process them (Garfinkel, 2007).

3. FILE RECOVERY USING ARTIFICIAL HEADERS (FRAH)

Given the prior art and the evaluation of the tools, the research sought to develop an approach to data carving that addresses the following issues:

• To provide the ability to render files with missing or corrupt headers
• To provide the ability to render fragments of data that contain no associated header information

This approach enables the investigator to examine whether files that do not render (or cannot be opened) might be incomplete yet still contain valuable information. It also provides a means of examining the slack space areas within the drive to determine whether the data is meaningful. It achieves this by inserting an artificial header onto the file and subsequently manipulating the data in order to determine whether a valid file is present. A process model for the approach is presented in Figure 2.

Figure 2: FRAH Process Model

In order to test the approach, a prototype was developed. As illustrated in Figure 3, a simple interface was produced that accepts the location of the file and then evaluates the data against a set of pre-defined file types (e.g. BMP, PNG, GIF, PDF).
In order to focus upon the concept of artificial headers, the tool was designed to take files that AccessData’s FTK was able to extract, rather than working directly on the forensic image; however, future developments will include this functionality. After the file has been entered and a file type selected, the system applies the appropriate header and attempts to open the file using the system’s built-in viewer.

Figure 3: FRAH Interface

For the purposes of demonstrating the capability, the tool leaves the decision as to whether the file content is valid to the investigator. However, for large numbers of files, this process will need to be automated.

4. EVALUATION & DISCUSSION

In order to test the tool across the different file types, a number of test files were created (2 BMP, 2 PNG, 1 GIF, 1 PDF). In each case, the header information was corrupted through the deletion or addition of random bytes. Importantly, however, in all but one test file (Testfile1b) the data carving signature was retained, meaning data carvers should be able to identify the file. As illustrated in Figure 4, in a standard file system view of the files, none of them is either rendered or identified except for Testfile1a, which is recognized as a BMP merely because the file extension is present in the file name. Nevertheless, the image still cannot be rendered due to the corruption.

Figure 4: Evaluation Files: Initial State

As illustrated in Table 3 and Figure 5, the application of FRAH results in each of these files being recoverable. In each case, FRAH simply ignores any header information present and inserts an artificial header onto the file.
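How an artificial header might be applied can be sketched for the BMP case. The paper does not specify how FRAH builds its headers or chooses fields such as image dimensions, so everything below is an assumption made for illustration: the function names `build_bmp_header` and `apply_artificial_header`, the uncompressed 24-bit format, the caller-supplied `width`, the height derived from the payload length, and the `skip` parameter for discarding damaged leading bytes.

```python
import struct

BMP_HEADER_SIZE = 54  # 14-byte file header + 40-byte BITMAPINFOHEADER


def build_bmp_header(payload_len: int, width: int) -> bytes:
    """Build a header for an uncompressed 24-bit BMP (assumed format)."""
    if width <= 0:
        raise ValueError("width must be positive")
    row_size = (width * 3 + 3) // 4 * 4   # BMP rows are padded to 4 bytes
    height = payload_len // row_size      # assumption: whole rows only
    image_size = row_size * height
    file_header = struct.pack(
        "<2sIHHI",
        b"BM",                            # signature that carvers look for
        BMP_HEADER_SIZE + image_size,     # declared file size
        0, 0,                             # reserved fields
        BMP_HEADER_SIZE,                  # offset of the pixel data
    )
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height,                # header size and dimensions
        1, 24, 0,                         # planes, bits per pixel, no compression
        image_size, 2835, 2835,           # pixel data size, 72 DPI in pixels/metre
        0, 0,                             # palette fields unused at 24 bpp
    )
    return file_header + info_header


def apply_artificial_header(data: bytes, width: int, skip: int = 0) -> bytes:
    """Drop `skip` leading bytes (hypothetical option), then prepend a new header."""
    payload = data[skip:]
    return build_bmp_header(len(payload), width) + payload
```

In this sketch an investigator would try several candidate widths and inspect the result in a viewer, which mirrors the manual validation step described for the prototype. Formats with checksums or internal cross-references, such as PNG or PDF, would need more than a fixed-size header, and that is outside what this example shows.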
Filename      Carve Signature   File Type Analysis   File Type   Successful Carve
Testfile1a    Yes               BMP                  BMP         Yes
Testfile1b    No                Unknown              BMP         Yes
Testfile2a    Yes               Unknown              PNG         Yes
Testfile2b    Yes               Unknown              PNG         Yes
Testfile3a    Yes               Unknown              GIF         Yes
Testfile4a    Yes               Unknown              PDF         Yes

Table 3: Evaluation Results

Interestingly, even with valid carver signatures present in five of the six files, testing these files with AccessData’s FTK resulted in none of them being recovered. The FTK data carving process did, however, recover three partially carved files, but all three were images contained within the PDF of Testfile4a.

Figure 5: Evaluation Files: Post FRAH

Notably, neither of the forensic images (DFRWS 2006 and 2007) contains files where the header is specifically corrupted or no longer present, although files that have been fragmented could arguably fall into this category for any fragment bar the one containing the header. Therefore, a secondary external source was identified in order to evaluate the tool. The DC3 Digital Forensics Challenge is an annual forensics challenge run by the US Department of Defense (DC3, 2013). The challenge involves users putting their knowledge of security to the test by completing a range of tasks such as data carving, decryption, file registry analysis and steganography. The challenge includes two files with missing headers (a PNG and a PDF). As illustrated in Figure 4, both of these files were recovered successfully.

Whilst the evaluation has proven successful, further analysis of the scenarios that would naturally occur within cases does highlight a number of limitations with the current approach. FRAH currently operates by inserting an artificial header onto the payload of the file. If a file header is corrupted then FRAH is able to recover the file.
However, in circumstances where the header or the first fragment is missing, it is likely that elements of the payload in addition to the header are also missing. Further research needs to investigate the impact of missing or corrupt payload data, with a view to padding and manipulating the data in order to recover the file contents that remain. This approach would then also permit individual fragments of data to be recovered (rather than simply the first fragment, as is typical with data carvers today).

5. CONCLUSIONS

The proposed tool is capable of recovering files with corrupt or missing header information across a number of standard file types. An analysis of current data carvers demonstrated that none of these tools currently has such a capability, and the evaluation successfully demonstrated recovery for all files. The initial prototype is, however, limited and further research is required to provide a more robust carver with a level of automation. Enhancements are required in the following areas:

• The ability to accept a range of data fragments, rather than a single file, so that multiple data fragments can be readily analyzed
• To automate the identification of meaningful data, thereby removing the need for human intervention
• To manipulate the file contents in a systematic fashion in order to enable successful viewing of the content
• To increase the range of file types supported

References

AccessData (2014) “FTK-Forensic Toolkit”, Retrieved from http://www.accessdata.com/products/digital-forensics/ftk

Beek, C (2011) “Introduction to File Carving”, McAfee white paper, Retrieved from http://www.mcafee.com/uk/resources/white-papers/foundstone/wp-intro-to-file-carving.pdf

Brainz (2014) “15 Criminal Cases Solved With Digital Evidence”, Retrieved from http://brainz.org/15criminal-cases-solved-digital-evidence/

Casey, E. ed (2010). “Handbook of Digital Forensics and Investigation”, Academic Press. p. 567.
ISBN 012-374267-6

DC3 (2013) “DC3 Cyber Crime Challenges”, Retrieved from https://www.dc3.mil/challenge/

DFRWS (2006) “DFRWS 2006 Forensics Challenge Overview”, Retrieved from http://www.dfrws.org/2006/challenge/index.shtml

DFRWS (2007) “DFRWS 2007 Forensics Challenge Overview”, Retrieved from http://www.dfrws.org/2007/challenge/index.shtml

European Anti-Fraud Office (2014) “Digital Forensics”, Retrieved from http://ec.europa.eu/anti_fraud/investigations/forensics/index_en.htm

FBI (2011) “Digital Forensics Regional Labs Help Solve Local Crimes”, Retrieved from http://www.fbi.gov/news/stories/2011/may/forensics_053111

Garfinkel, S. (2007). “Carving contiguous and fragmented files with fast object validation”, Retrieved from http://dfrws.org/2007/proceedings/p2-garfinkel.pdf

Guidance Software (2014) “EnCase Forensic”, Retrieved from http://www.guidancesoftware.com/products/Pages/encase-forensic/overview.aspx?cmpid=nav

Hand, S. (2012) “Bin-Carver: Automatic Recovery of Binary Executables”, Retrieved from http://www.dfrws.org/2012/proceedings/DFRWS2012-12.pdf

Inforsecusa (2011) “Computer Forensics Criminal Cases”, Retrieved from http://infosecusa.com/computer-forensics-criminal-cases

Kloet, B. (2010) “Advanced File Carving”, Retrieved from http://computer-forensics.sans.org/summitarchives/2010/eu-digital-forensics-incident-response-summit-bas-kloet-advanced-file-carving.pdf

Merola, A (2008) “Data Carving Concepts”, Retrieved from http://www.sans.org/readingroom/whitepapers/forensics/data-carving-concepts-32969

Pal, A & Memon, N. (2009). “The Evolution of File Carving”, Retrieved from http://digitalassembly.com/technology/research/pubs/ieee-spm-2009.pdf