Data Carving using Artificial Headers

R. Daniel1, N.L. Clarke1,2 & F. Li1
1Centre for Security, Communications & Network Research (CSCAN), Plymouth
University, United Kingdom;
2Security Research Institute, Edith Cowan University, Western Australia
info@cscan.org
Abstract
Digital forensic tools are an essential requirement in criminal and increasingly civil cases
in order to process electronic evidence. Investigators rely upon the functionality of these
tools to identify and extract relevant artifacts. One of these key processes is data carving –
an approach that ignores the file system and analyses the drive for files that match a
particular signature. Unfortunately, for anything other than simple files, data carving has
many limitations that result in either missed files or high numbers of false
alarms. Detection by these tools is largely based upon a signature appearing in the
header of the file. For files that have corrupted or missing headers, modern data
carvers are therefore unable to recover the file successfully. This paper proposes a new approach to
data carving that inserts an artificial header onto the file, thereby circumventing the
header issue. Experiments have demonstrated that this approach is able to successfully
recover files that current data-carving tools are unable to recover.
1. INTRODUCTION
Digital forensics has become an invaluable tool in the identification of criminal activities (Casey, 2010).
Computer and mobile forensics have received particular attention due to the demand from law
enforcement, which is in turn linked to the growth and popularity of such equipment (European Anti-Fraud Office, 2014). Used for both cyber and traditional crime (e.g. terrorist attacks, child pornography
and information leakage), these electronic devices provide an invaluable source of information and
evidence. Indeed, criminals have been prosecuted based upon the evidence recovered from their
computers and mobile phones via digital forensic techniques (FBI, 2011; Inforsecusa, 2011; Brainz, 2014).
An essential analysis technique available to investigators is data carving. This process permits the
recovery of files from the raw image independent of any file system that might be present. This enables
files to be recovered from unallocated space, slack space and from within files that an inspection of the file
system would not reveal. The primary detection mechanism is to locate the header and footer of a file
and extract the data in between (Beek, 2011). Unfortunately, however, due to a variety of issues, such as
fragmentation, deletion and missing sectors, the ability of data carvers to recover the data successfully is
variable (Merola, 2008).
A key issue for data carvers is their ability to recover data in scenarios where no associated header or
footer information is present. For example, slack space often contains information regarding files but with
the header missing, perhaps due to being overwritten. The paper develops a new approach to data carving
that enables the investigator to determine whether particular chunks of data contain information.
The paper is structured as follows. Section 2 describes the current state of the art, introduces a range of
data carvers and performs an evaluation of data carvers to investigate their performance. Section 3
presents the new tool and describes the design, testing and logic of the approach. An evaluation of the tool
is presented in Section 4 alongside the conclusions and future work in Section 5.
Information Institute Conferences, Las Vegas, NV, May 21-23, 2014
Daniel; Clarke; Li
2. BACKGROUND LITERATURE
Literature often seeks to classify data carving approaches into two categories: simple and advanced (Pal & Memon,
2009). Simple data carvers are able to carve files via identifying a unique signature within the header and
locating its associated footer. For example, a PDF file could be carved from a piece of data if it starts with
“%PDF” (i.e. the PDF header) and ends with “%%EOF” (i.e. the PDF footer). The approach therefore
assumes the files are stored in continuous data clusters within the raw image (Hand, 2012). From one
perspective, this is a sound assumption, as modern file systems will always seek to store data in
continuous data clusters.
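The header-to-footer strategy described above can be sketched in a few lines of Python. This is a minimal illustration rather than the logic of any particular tool: the function name carve_simple is hypothetical, the image is assumed to fit in memory, and the PDF end-of-file marker is taken to be the byte string %%EOF.

```python
PDF_HEADER = b"%PDF"
PDF_FOOTER = b"%%EOF"


def carve_simple(image: bytes, header: bytes = PDF_HEADER,
                 footer: bytes = PDF_FOOTER) -> list[bytes]:
    """Return every byte run that starts at a header and ends at the next footer.

    Assumes each file is stored contiguously, which is exactly the
    assumption that fragmentation breaks.
    """
    carved = []
    pos = 0
    while True:
        start = image.find(header, pos)
        if start == -1:
            break  # no further headers in the image
        end = image.find(footer, start + len(header))
        if end == -1:
            break  # header without a footer: a simple carver gives up here
        end += len(footer)
        carved.append(image[start:end])
        pos = end
    return carved
```

Note that a region whose header has been overwritten is never visited by this loop, which is the gap the paper sets out to address.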
However, due to the operation of the file system and the size of a file, a series of alternative scenarios are
possible. As illustrated in Figure 1, a variety of fragmentation possibilities exist, which result in the data for
a file being interleaved with data from another file, having parts missing or being stored out of order.
Figure 1: Examples of File Fragmentation
Advanced data carving approaches seek to overcome these issues. Techniques to date largely
rely upon some internal file structure within the data itself. Content-based approaches utilize
characteristics such as character count, text/language recognition, white and black listing of data,
statistical attributes and information entropy (Kloet, 2010). Such approaches are, however, prone to errors
that yield incorrectly carved files, which in turn affects their measured performance.
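As an illustration of one of the content-based characteristics listed above, the following sketch computes the Shannon entropy of a block of bytes. The function name is hypothetical and no specific carver is being reproduced; the only facts relied upon are that a constant block yields 0 bits per byte and that a uniformly distributed block approaches 8 bits per byte, as compressed or encrypted data typically does.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A carver could compare the value for a candidate cluster against the range expected for a file type, although any threshold would have to be established empirically.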
Garfinkel (2007) identified two key limitations with current data carving tools:
1. Files had to be stored in sequential clusters
2. No evaluation of the carved file leading to a large number of false positives
Pal and Memon (2009) present a number of approaches that seek to automate the reconstruction of
fragmented files with varying levels of success. Automated verification of the validity of data carving is no
simple problem to solve.
Whilst the literature provides a reasonable overview of the current state of the art, it is difficult to
establish the relative performance of the tools. Moreover, it is not evident from the prior work how well they
perform in scenarios where files are fragmented. It was therefore considered prudent to perform an
evaluation of current tool capabilities in order to evaluate the performance. An experiment was devised to
test the capabilities of a number of data carvers against a fixed forensic image. The Digital Forensic Research
Workshop (DFRWS) through its annual conference challenge produced a dataset in 2006 (and also a
more advanced version in 2007) (DFRWS, 2006; DFRWS, 2007). The 2006 dataset focused primarily on
4 categories of files: HTML, Microsoft Office, JPEG and Zip and contained a total of 32 base files.
Editors: Gurpreet Dhillon and Spyridon Samonas
A selection of open source and commercial data carving tools were utilized, including the industry leading
products: Guidance Software’s EnCase and AccessData’s FTK (Guidance Software, 2014; AccessData,
2014).
Application   No. of Files   No. of Files   No. of Successfully   No. of Partially
              Present        Extracted      Carved                Carved Files
Encase        32             24             10 (31%)              6
FTK           32             24             6 (19%)               10
Scalpel       32             50             15 (47%)              5
WinHex        32             13             8 (25%)               5
Table 1: Data Carver Results for DFRWS 2006 Dataset
The results from the DFRWS 2006 dataset demonstrate a relatively poor performance across the tools.
A file is counted as successful only if it is carved completely and correctly. It was notable
on a number of occasions across all tools that partial recovery was possible. Indeed, utilizing Scalpel, three
of the fragmented image files were partially recovered. In these particular cases, enough was recovered to
recognize the content and thus be of potential use; however, this is not necessarily always the case.
Notably, none of the carvers supported the Microsoft Excel spreadsheet or the text file formats, so neither
were successfully carved. That said, some of the text files were contained within other partially carved files
(i.e. appeared as a fragment after an HTML file). Initially, the 2007 dataset was also going to be evaluated;
however, as it represents a more complex scenario incorporating a wider range of file types such as MP3,
AVI, FLV and PDF and given the performance against the 2006 dataset, it was deemed unnecessary.
Analysis of these results shows that the data carvers have a significant issue when it comes to files that are
fragmented, out of sequence or missing. What is particularly surprising is that these problems have been
established for over 8 years and modern carvers are still unable to process them (Garfinkel, 2007).
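The percentages in Table 1 follow directly from the counts, and the same counts also show how many of the extracted files were actually correct. The sketch below reproduces that arithmetic; the second ratio (correct files as a share of extracted files) is not reported in the paper and is derived here purely for illustration, and the names used are hypothetical.

```python
# Counts copied from Table 1 (DFRWS 2006 dataset, 32 files present).
FILES_PRESENT = 32
RESULTS = {
    # tool: (files extracted, files successfully carved)
    "Encase": (24, 10),
    "FTK": (24, 6),
    "Scalpel": (50, 15),
    "WinHex": (13, 8),
}


def percentage(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


def summarise(results=RESULTS, present=FILES_PRESENT) -> dict:
    """Per tool: share of present files recovered, and share of output that was correct."""
    return {
        tool: {
            "recovered_of_present": percentage(ok, present),
            "correct_of_extracted": percentage(ok, extracted),
        }
        for tool, (extracted, ok) in results.items()
    }
```

For Scalpel, for example, the 15 correct files represent 47% of the 32 files present but only 30% of the 50 files it extracted, which illustrates the false positive problem noted by Garfinkel (2007).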
3. FILE RECOVERY USING ARTIFICIAL HEADERS (FRAH)
Given the prior art and evaluation of the tools, the research sought to develop an approach to data carving
that looks to solve several issues:
• To provide the ability to render files with missing or corrupt headers
• To provide the ability to render fragments of data that contain no associated header information.
This approach to the problem enables the investigator to examine whether files that are not rendering (or
cannot be opened) might indeed be incomplete but yet contain valuable information. It also provides an
approach to examine the slack space areas within the drive to determine whether the data is meaningful.
It achieves this by inserting an artificial header on the file and subsequently manipulating the data in
order to determine whether a valid file is present. A process model for the approach is presented in Figure
2.
Figure 2: FRAH Process Model
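The insertion step can be sketched for one file type as follows. The paper does not specify the contents of the artificial headers, so everything here is an assumption made for illustration: the payload is taken to be uncompressed 24-bit BMP pixel rows, the width and height are guesses supplied by the investigator, and the function names bmp_header and apply_artificial_header are hypothetical.

```python
import struct


def bmp_header(width: int, height: int, bits_per_pixel: int = 24) -> bytes:
    """Build a 54-byte header (file header + BITMAPINFOHEADER) for an uncompressed BMP."""
    row_size = (width * bits_per_pixel + 31) // 32 * 4  # rows are padded to 4 bytes
    image_size = row_size * height
    offset = 14 + 40  # pixel data starts right after the two headers
    file_header = struct.pack("<2sIHHI", b"BM", offset + image_size, 0, 0, offset)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40,              # size of this info header
        width, height,
        1,               # colour planes
        bits_per_pixel,
        0,               # compression: none
        image_size,
        2835, 2835,      # assumed resolution, roughly 72 DPI in pixels per metre
        0, 0,            # palette entries: none
    )
    return file_header + info_header


def apply_artificial_header(payload: bytes, header: bytes) -> bytes:
    """Prepend the artificial header to a headerless payload."""
    return header + payload
```

In practice the investigator would try several candidate dimensions, since a wrong width still renders but produces a sheared image.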
In order to test the approach, a prototype was developed. As illustrated in Figure 3, a simple interface was
proposed that accepted the location of the file and would then subsequently proceed to evaluate the data
against a set of pre-defined file types (e.g. BMP, PNG, GIF, PDF). In order to focus upon the concept of
artificial headers, the tool was designed to take files that AccessData’s FTK was able to extract, rather
than working on the individual forensic image; however, future developments will include this
functionality. After the file has been entered and a file type selected, the system will apply the appropriate
header and attempt to open the file using the system’s built-in viewer.
Figure 3: FRAH Interface
For the purposes of demonstrating the capability, the tool merely leaves the decision as to whether the file
content is valid or not to the investigator. However, for large numbers of files, this process will need to be
automated.
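One way such automation might begin is with a structural check on the rebuilt file before it is shown to the investigator. The sketch below is not part of FRAH; it is an assumed example for PNG only, which walks the chunk list and verifies each CRC (computed over the chunk type and data, as the PNG format requires). The helper make_chunk exists only to build sample data, and both function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def png_chunks_valid(data: bytes) -> bool:
    """True if the signature is present and every chunk up to IEND has a correct CRC."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):  # 12 bytes is the smallest possible chunk
        (length,) = struct.unpack_from(">I", data, pos)
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 8 + length  # index of the stored CRC
        if end + 4 > len(data):
            return False  # chunk runs past the end of the data
        (stored_crc,) = struct.unpack_from(">I", data, end)
        if zlib.crc32(data[pos + 4:end]) != stored_crc:
            return False
        if chunk_type == b"IEND":
            return True
        pos = end + 4
    return False  # ran out of data before reaching IEND
```

A check of this kind confirms structural integrity only; whether the content is meaningful to the case would still be a judgement for the investigator.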
4. EVALUATION & DISCUSSION
In order to test the tool across the differing file types, a number of test files were created (2 BMP, 2 PNG,
1 GIF, 1 PDF). In each of the cases, the header information was corrupted through the deletion or
addition of random bytes. Importantly however, in all but one test file (Testfile1b), the data carving
signature was included, meaning data carvers should be able to identify the file. As illustrated in Figure 4,
in a standard file system view of the files, none of them is rendered or identified except for
Testfile1a, which is recognized as a BMP merely due to the file extension being present on the file name.
Nevertheless, it still cannot be rendered due to the corruption.
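The two kinds of corruption described above (deleting bytes and adding random bytes) can be reproduced with a short script. The paper does not state the offsets or byte counts used for the test files, so the parameters below are placeholders, and the function names are hypothetical. A seeded generator is used so that a corrupted test file can be regenerated exactly.

```python
import random


def delete_bytes(data: bytes, offset: int, count: int) -> bytes:
    """Remove `count` bytes starting at `offset`."""
    return data[:offset] + data[offset + count:]


def insert_random_bytes(data: bytes, offset: int, count: int, seed: int = 0) -> bytes:
    """Insert `count` pseudo-random bytes at `offset` (repeatable for a given seed)."""
    rng = random.Random(seed)
    noise = bytes(rng.randrange(256) for _ in range(count))
    return data[:offset] + noise + data[offset:]
```

For instance, delete_bytes(original, 2, 4) removes the file size field that follows the "BM" signature in a BMP file, leaving the carving signature intact while making the header unreadable.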
Figure 4: Evaluation Files: Initial State
As illustrated in Table 3 and Figure 5, the application of FRAH results in each of these files being
recoverable. In each case, FRAH simply ignores any header information present and inserts an
artificial header onto the file.
Filename     Carve Signature   File Type Analysis   File Type   Successful Carve
Testfile1a   Yes               BMP                  BMP         Yes
Testfile1b   No                Unknown              BMP         Yes
Testfile2a   Yes               Unknown              PNG         Yes
Testfile2b   Yes               Unknown              PNG         Yes
Testfile3a   Yes               Unknown              GIF         Yes
Testfile4a   Yes               Unknown              PDF         Yes
Table 3: Evaluation Results
Interestingly, even with valid carver signatures present in five of the six files, FTK was unable to recover
any of the files when they were tested against it. The FTK data carving process did
however recover three partially carved files, but all three were associated with images contained within the
PDF of Testfile4a.
Figure 5: Evaluation Files: Post FRAH
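The idea of ignoring whatever header is present and substituting a known-good one can be illustrated for PNG. This is a sketch under stated assumptions rather than the FRAH implementation: it assumes the damage is confined to the bytes before the first chunk, that the IHDR chunk and its 4-byte length field have survived, and the function name replace_png_header is hypothetical.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def replace_png_header(damaged: bytes) -> bytes:
    """Discard everything before the IHDR chunk and prepend a fresh PNG signature."""
    marker = damaged.find(b"IHDR")
    if marker < 4:
        # Covers both "not found" (-1) and "no room for the length field".
        raise ValueError("no usable IHDR chunk found")
    # A chunk begins with its 4-byte length, which sits just before the type.
    return PNG_SIGNATURE + damaged[marker - 4:]
```

The same pattern applies to other formats that have a fixed signature, although the anchor used to locate the start of the payload differs for each one.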
Notably, neither of the forensic images (DFRWS 2006 and 2007) contains files where the header is
specifically corrupted or no longer present, although files that have been fragmented could arguably fall
into this category for any fragments (bar the one containing the header). Therefore, a secondary external
source was identified in order to evaluate the tool. The DC3 Digital Forensics Challenge is an annual
forensics challenge run by the US Department of Defence (DC3, 2013). The challenge involves users
putting their knowledge of security to the test in completing a range of tasks such as data carving,
decryption, file registry analysis and steganography. The challenge includes two files with missing
headers (a PNG and PDF). As illustrated in Figure 5, both of these files were recovered successfully.
Whilst the evaluation has proven successful, further analysis of the scenarios that would naturally occur
within cases does highlight a number of limitations with the current approach. FRAH currently operates
by inserting an artificial header onto the payload of the file. If a file header is corrupted then FRAH is able
to recover the file. However, in circumstances where the header or the first fragment is missing, it is likely
that elements of the payload in addition to the header are also missing. Further research needs to
investigate the impact of missing or corrupt payload data, with a view to the padding and manipulation of
the data in order to recover the file contents that remain. This approach would then also permit
single fragments of data to be recovered (rather than simply the first fragment, as is typical
with data carvers today).
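The padding idea raised above as future work could take a form such as the following. This is speculative and not something the paper evaluates: it assumes a format with a predictable payload length (for example, an uncompressed bitmap whose size follows from its dimensions), and the function name pad_payload is hypothetical.

```python
def pad_payload(payload: bytes, expected_size: int, fill: int = 0x00) -> bytes:
    """Pad a short payload with a fill byte, or trim a long one, to the expected size."""
    if len(payload) >= expected_size:
        return payload[:expected_size]
    missing = expected_size - len(payload)
    return payload + bytes([fill]) * missing
```

For compressed formats, appending filler bytes is unlikely to be sufficient, since the decoder depends on earlier data; that is part of what the proposed further research would need to establish.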
5. CONCLUSIONS
The proposed tool is capable of recovering files with corrupt or missing header information across a
number of standard file types. An analysis of current data carvers demonstrated that none of these tools
currently have such capability and the evaluation successfully demonstrated recovery for all files.
The initial prototype is however limited and further research is required to provide a more robust carver
with a level of automation. Enhancements are required in the following areas:
• The ability to accept a range of data fragments, rather than a single file, so that multiple data
fragments can be readily analyzed
• To automate the identification of meaningful data, thereby removing the need for human intervention
• To manipulate the file contents in a systematic fashion in order to enable successful viewing of the
content
• To increase the range of file types supported
References
AccessData (2014) “FTK-Forensic Toolkit”, Retrieved from
http://www.accessdata.com/products/digital-forensics/ftk
Beek, C (2011) “Introduction to File Carving”, McAfee white paper, Retrieved from
http://www.mcafee.com/uk/resources/white-papers/foundstone/wp-intro-to-file-carving.pdf
Brainz (2014) “15 Criminal Cases Solved With Digital Evidence”, Retrieved from http://brainz.org/15criminal-cases-solved-digital-evidence/
Casey, E. ed (2010). “Handbook of Digital Forensics and Investigation”, Academic Press. p. 567. ISBN 012-374267-6
DC3 (2013) “DC3 Cyber Crime Challenges”, Retrieved from https://www.dc3.mil/challenge/
DFRWS (2006) “DFRWS 2006 Forensics Challenge Overview”, Retrieved from
http://www.dfrws.org/2006/challenge/index.shtml
DFRWS (2007) “DFRWS 2007 Forensics Challenge Overview”, Retrieved from
http://www.dfrws.org/2007/challenge/index.shtml
European Anti-Fraud Office (2014) “Digital Forensics”, Retrieved from
http://ec.europa.eu/anti_fraud/investigations/forensics/index_en.htm
FBI (2011) “Digital Forensics Regional Labs Help Solve Local Crimes”, Retrieved from
http://www.fbi.gov/news/stories/2011/may/forensics_053111
Garfinkel, S. (2007). “Carving contiguous and fragmented files with fast object validation”, Retrieved
from http://dfrws.org/2007/proceedings/p2-garfinkel.pdf
Guidance Software (2014) “EnCase Forensic”, Retrieved from
http://www.guidancesoftware.com/products/Pages/encase-forensic/overview.aspx?cmpid=nav
Hand, S. (2012) “Bin-Carver: Automatic Recovery of Binary Executables”, Retrieved from
http://www.dfrws.org/2012/proceedings/DFRWS2012-12.pdf
Inforsecusa (2011) “Computer Forensics Criminal Cases”, Retrieved from
http://infosecusa.com/computer-forensics-criminal-cases
Kloet, B. (2010) “Advanced File Carving”, Retrieved from http://computer-forensics.sans.org/summitarchives/2010/eu-digital-forensics-incident-response-summit-bas-kloet-advanced-file-carving.pdf
Merola, A (2008) “Data Carving Concepts”, Retrieved from http://www.sans.org/readingroom/whitepapers/forensics/data-carving-concepts-32969
Pal, A & Memon, N. (2009). “The Evolution of File Carving”, Retrieved from http://digitalassembly.com/technology/research/pubs/ieee-spm-2009.pdf