Forensic Information in Digital Objects (FIDO)

Watching the Detectives
KCL (King's College London) Facts
• 5 million archives (including artefacts, images, sound recordings and databases)
• 295,000 rare/special books
• Spans 6 centuries (most from the 18th century onwards)
• Wide range of subjects, formats and languages
• Internationally and nationally recognised
• Whole collection valued at £81,000,000
• Liddell Hart Centre for Military Archives and Foyle Special Collections Library
Information Management Team
Responsible for advice and support for:
• Content creation
• Active management during business use
• Retention for legal or business purposes
• Digital archiving and preservation
JISC FIDO Project
• 6-month project in 2011
• Investigation of tools to aid data acquisition, file identification & process documentation
• Case study to report findings & lessons learnt
• Mapping of forensic terms to archival terms
• Addressing the ethical issues of the approach
• Establishing suitable computer hardware and tools to assist in the newly defined digital acquisition process
Why digital forensics?
• Forensic investigation is an emerging profession, developing tools that document user activity to the standards required for legal admissibility
• Digital collections can be large and difficult to appraise – forensic tools can analyse file characteristics and document what is done & when
• Forensic tools can provide contextual information, such as a timeline or file types, for initial appraisal
• Authenticity – archivists need to capture authentic digital collections, and forensic tools can support this process
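Forensic tools typically support authenticity by recording a checksum when material is acquired and re-checking it later, so that any change can be detected. A minimal sketch in Python, assuming SHA-256 as the checksum; `record_fixity` and `verify_fixity` are hypothetical helper names, not functions of any tool used in the project:

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths) -> dict:
    """Hypothetical manifest: each file's digest plus when it was recorded (UTC)."""
    recorded_at = datetime.now(timezone.utc).isoformat()
    return {str(p): {"sha256": sha256_of(Path(p)), "recorded_at": recorded_at}
            for p in paths}


def verify_fixity(manifest: dict) -> list:
    """Return the paths whose current digest no longer matches the manifest."""
    return [p for p, entry in manifest.items()
            if sha256_of(Path(p)) != entry["sha256"]]
```

Keeping the manifest with the accession records what was captured and when; re-running `verify_fixity` after transfer to the repository should return an empty list.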
Digital forensics vs digital appraisal
• Different language – terms mean different things to each practitioner
• Confidence & skills – digital archive skills are much closer to forensics or IT than to traditional archival skills
• Forensic investigators deal with a potential crime scene – archivists work with the co-operation of the depositor
• Forensic investigators want all available information, including deleted documents & browser history, whereas archivists may only have consent to take the files defined by the donor
Ethical Issues
• Does the depositor know the collection?
• A forensic image will capture everything!
• Is e-mail included in the deposit?
• Do all family members agree to the deposit?
• Does the depositor own the copyright?
• Is there unpublished work that might be published after deposit?
• Are computers included or just their contents?
Technical Issues
• Data transfer or recovery
• Level of rights required for tasks
• Additional hardware/software familiarisation
• New skills for archives staff
• Redaction
• Finding new software for particular tasks
Data handling workflow
1. Acquire: obtain data from the depositor/donor
2. Analyse: examine the acquired data to locate user-generated content
3. Appraise: appraise the data to select data of potential value to the institution
4. Archive: transfer the selected data into the digital repository for curation & preservation
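The four stages can be modelled as an ordered sequence with an audit trail, echoing the forensic habit of documenting what is done and when. A minimal sketch in Python; the `Accession` class and its log format are hypothetical, not part of the project's tooling:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Stage(Enum):
    ACQUIRE = 1
    ANALYSE = 2
    APPRAISE = 3
    ARCHIVE = 4


@dataclass
class Accession:
    """Hypothetical record of one deposit moving through the workflow."""
    name: str
    log: list = field(default_factory=list)

    def complete(self, stage: Stage, note: str) -> None:
        """Log a finished stage, refusing to skip or repeat stages."""
        if len(self.log) == len(Stage):
            raise ValueError("all stages already complete")
        expected = Stage(len(self.log) + 1)
        if stage is not expected:
            raise ValueError(f"expected {expected.name}, got {stage.name}")
        # Audit trail entry: stage, what was done, and when (UTC).
        self.log.append((stage.name, note, datetime.now(timezone.utc).isoformat()))
```

Enforcing the order means, for example, that nothing can be appraised before it has been analysed.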
Data Acquisition Methods
1. File copy: files are copied/moved from the donor’s media to AIM-owned storage, e.g. via FTP, DVD-R or hard disk
2. Disk clone: a bit-for-bit copy of the source disk is written directly to a mirror disk
3. Disk image: a bit-for-bit copy of the disk is created and stored as a file on other media
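Methods 2 and 3 both rely on reading the source byte for byte. A minimal sketch of method 3 in Python, assuming the source can be opened as a file (on Linux a block device such as `/dev/sdb` usually can be, with administrator rights, ideally behind a write blocker). `create_disk_image` and `verify_image` are hypothetical names; unlike dedicated imaging tools, this sketch does not handle unreadable sectors:

```python
import hashlib


def create_disk_image(source: str, image: str, chunk_size: int = 1 << 20) -> str:
    """Copy every byte of `source` into the file `image`; return the SHA-256.

    `source` may be a regular file or, on Linux, a device path such as
    /dev/sdb (an assumed example). Read errors are not handled.
    """
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(image, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of source reached
                break
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()


def verify_image(image: str, expected: str, chunk_size: int = 1 << 20) -> bool:
    """Re-hash the image file and compare it with the digest taken at capture."""
    digest = hashlib.sha256()
    with open(image, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

Hashing while copying gives a digest of exactly what was read from the source; a later `verify_image` call confirms the stored image has not changed since capture.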
Different Hardware
Different Media
What type of media do you wish to image?
• Removable media (e.g. floppy, CD-ROM, USB stick, etc.): locate media reader & create disk image
• Hard disk: is the disk installed in a computer?
  – No: what type of connectors does it have?
      ATA/IDE or SATA: install into portable disk enclosure
      Other: obtain appropriate reader device
  – Yes: does the machine possess appropriate ports (e.g. USB/Firewire) to allow connection of an external HD?
      Yes: are you able to boot from disk/optical media & perform capture?
          Yes: boot from media & perform imaging
          No: perform capture via host system
      No: do you have permission to remove the disk from the machine & is it physically possible?
          Yes: what type of connectors does it have? (as above)
          No: are you able to perform a network capture?
              Yes: capture disk image using network capture
              No: copy files to disk; notify donor that some content may be missed
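The imaging decisions can also be written as a function, which may help when recording why a particular capture route was chosen. A sketch only: the branch wiring is one reading of the extracted flowchart, the parameter names are hypothetical, and the returned strings are the chart's own actions:

```python
def connector_route(connector: str) -> str:
    """Route for a loose hard disk, chosen by its connector type."""
    # ATA/IDE and SATA disks fit a portable disk enclosure.
    if connector in ("ATA/IDE", "SATA"):
        return "install into portable disk enclosure"
    return "obtain appropriate reader device"


def imaging_route(media: str, installed: bool = False, has_ports: bool = False,
                  can_boot: bool = False, may_remove: bool = False,
                  network_capture: bool = False, connector: str = "") -> str:
    """Walk one reading of the imaging flowchart and return the suggested action."""
    if media == "removable":
        return "locate media reader & create disk image"
    # Otherwise treat the media as a hard disk.
    if not installed:
        return connector_route(connector)
    if has_ports:
        # An external HD can be attached to receive the image.
        if can_boot:
            return "boot from media & perform imaging"
        return "perform capture via host system"
    if may_remove:
        return connector_route(connector)
    if network_capture:
        return "capture disk image using network capture"
    return "copy files to disk; notify donor that some content may be missed"
```

For example, an installed disk in a machine without usable ports that cannot be removed or captured over the network falls through to the file copy route, with the donor notified.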
Data held on digital media
• Types:
– Operating system files, e.g. Windows has 30,000+ files after a fresh install
– Software: applications, utilities, games, etc.
– Log data: Windows Registry, browser cache, cookies, temp files
– User-generated content: documents, images, sound, emails, etc.
• Data layers:
1. Active data: information normally seen by the operating system
2. Inactive/residual data: deleted or modified data
• Deleted files located in unallocated space that have yet to be overwritten (retrieved using an undelete application)
• Data fragments that contain information from a partially deleted file (retrieved through carving)
• The usefulness of inactive data is still to be seen
Active Data Analysis
Common techniques:
• Navigate the directory structure to get a ‘feel’ for the data files held on disk
• Search by:
– File name, e.g. *report*
– File type, e.g. *.doc, *.pdf, etc.
– Creation/modification date
– Content type, e.g. word usage
– File size
• Windows search does not identify everything
• The investigation process leaves artefacts behind, e.g. thumbs.db
OS Forensic search interface for active files
Sort by: Name, Folder, Size, Type, Creation date, Modification date
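The search criteria can be combined in a short script run against a mounted disk image. A minimal sketch in Python; `search_files` is a hypothetical helper. It filters on modification time only, because what `st_ctime` means differs between operating systems (creation time on Windows, metadata-change time on Unix), and it is best run against an image or behind a write blocker, since browsing can leave artefacts:

```python
import fnmatch
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional


def search_files(root: str, name_pattern: str = "*",
                 extensions: Iterable[str] = (),
                 modified_after: Optional[datetime] = None,
                 modified_before: Optional[datetime] = None,
                 min_size: int = 0, sort_key: str = "name") -> list:
    """Find active files under `root` matching name, type, date and size criteria."""
    wanted = {e.lower() for e in extensions}
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Case-insensitive wildcard match on the file name, e.g. "*report*".
        if not fnmatch.fnmatch(path.name.lower(), name_pattern.lower()):
            continue
        # File type judged by extension only, e.g. {".doc", ".pdf"}.
        if wanted and path.suffix.lower() not in wanted:
            continue
        info = path.stat()
        modified = datetime.fromtimestamp(info.st_mtime)
        if modified_after is not None and modified < modified_after:
            continue
        if modified_before is not None and modified > modified_before:
            continue
        if info.st_size < min_size:
            continue
        hits.append({"name": path.name, "folder": str(path.parent),
                     "size": info.st_size, "type": path.suffix.lower(),
                     "modified": modified})
    # Sort by one field: "name", "folder", "size", "type" or "modified".
    return sorted(hits, key=lambda hit: hit[sort_key])
```

Matching on extensions alone will miss renamed files; a format identification tool gives a more reliable picture of file types.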
Recovering deleted files
Recovering partial/complete files
• Undelete/file recovery software searches unallocated space and makes any files found available.
Recovering Data Fragments
• Data carving technique – raw bits of the disk are analysed to identify recognisable patterns that may indicate a data file, e.g. header/footer, semantic information.
– Carving software is designed to take a linear approach to locating data files, so it is ineffective on fragmented disks
– It creates Franken-Files! – incomplete files, large files containing information from multiple sources, embedded images extracted from PowerPoint files, etc.
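Header/footer carving can be sketched for JPEG images, which begin with the bytes `FF D8 FF` and end with `FF D9`. A minimal illustration in Python of the linear approach; `carve_jpegs` is a hypothetical name, the 10 MB size cap is an arbitrary assumption, and real carving tools recognise many more formats and patterns:

```python
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_jpegs(raw: bytes, max_size: int = 10 * 1024 * 1024) -> list:
    """Linear header/footer carving: return byte runs that look like JPEG files.

    Each run starts at a JPEG header and ends at the first footer after it,
    so fragmented files or files with embedded thumbnails may be cut short
    or mixed with unrelated data.
    """
    carved = []
    start = raw.find(JPEG_HEADER)
    while start != -1:
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # no footer left, so nothing further can be carved
        end += len(JPEG_FOOTER)
        if end - start <= max_size:
            carved.append(raw[start:end])
            next_from = end
        else:
            next_from = start + 1  # implausibly large: skip this header
        start = raw.find(JPEG_HEADER, next_from)
    return carved
```

Because each run stops at the first footer, a fragmented file picks up whatever bytes lie between its pieces, and a JPEG embedded in another document is extracted as if it were a file in its own right.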
Keyword Search
• Scan the content of a disk, including all emails, documents and other text content, to locate a particular search term
• Commonly used by police to identify evidence of illegal activity, e.g. bank account numbers, telephone numbers
Archival use:
• Does the disk contain references to topic X?
• What trends can be identified in the use of a concept – when did a term appear and disappear?
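The second archival question, when a term appears and disappears, can be approached by pairing each document's text with a date (for example its modification date) and tracking matches over time. A minimal sketch in Python; the function names and the (date, text) input format are assumptions, and extracting text from formats such as .doc or e-mail is left out:

```python
import re
from collections import Counter
from datetime import date
from typing import Iterable, Optional, Tuple


def _term_pattern(term: str):
    # Whole-word, case-insensitive match; assumes the term starts and ends
    # with a letter or digit so that the \b word boundaries apply.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def term_timeline(documents: Iterable[Tuple[date, str]],
                  term: str) -> Optional[Tuple[date, date]]:
    """Return the (first, last) dates on which `term` appears, or None."""
    pattern = _term_pattern(term)
    dates = [when for when, text in documents if pattern.search(text)]
    if not dates:
        return None
    return min(dates), max(dates)


def term_counts_by_year(documents: Iterable[Tuple[date, str]], term: str) -> dict:
    """Count occurrences of `term` per year; years with no matches show 0."""
    pattern = _term_pattern(term)
    counts = Counter()
    for when, text in documents:
        counts[when.year] += len(pattern.findall(text))
    return dict(sorted(counts.items()))
```

File dates can be altered by copying, so any trend is only as reliable as the dates attached to the documents.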
Analysis of research behaviour
• A hard disk may contain other information:
– Web sites visited/bookmarked for research
– Chat logs indicating discussion with colleagues
– Other digital media that may have been used to store data
This may be useful for understanding the researcher's work process, but consider the ethical issues
What type of information do you wish to locate on the drive?
• Specific information on a topic: do you know what keywords should be used?
  – Yes: create & search index
  – No: contact/research donor
• User-created data files:
  – What level of analysis are you permitted to perform?
      Only readily available files (active files)
      Available & deleted files: perform search of active & inactive (deleted) files
      Full search, including active, deleted & fragments
  – Do you have any additional criteria for user content?
      None: perform file search of common file types
      Specific object types/formats: perform file search of specific file types
      Data created/modified before/after/between a set date: perform file search with additional date parameters
• Information about other media on which data may be stored: examine event logs for devices connected/disconnected
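Where suitable keywords are known, the "create & search index" step can be illustrated with a simple inverted index mapping each word to the files that contain it. A minimal sketch in Python, assuming text has already been extracted from each file; `build_index` and `search_index` are hypothetical names:

```python
import re
from collections import defaultdict


def build_index(documents: dict) -> dict:
    """Map each lower-cased word to the set of document IDs containing it.

    `documents` is a dict of {doc_id: extracted_text}.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search_index(index: dict, query: str) -> set:
    """Return IDs of documents containing every word in `query` (AND search)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Building the index once means each later keyword query is a quick lookup rather than a fresh scan of the whole disk.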
Forensic Hardware
(1) Desktop PC: Intel Pentium Dual Core E5800 CPU (3.20 GHz), 2GB DDR RAM, 500GB HD, Super multi DVD-RW
(2) USB write blocker: prevents the OS from writing to connected devices
(3) Drive enclosure: enables connection of internal ATA/SATA disks via USB
(4) Kryoflux: USB floppy disk controller enabling attachment of disparate disk devices & forensic imaging
Access to digital collections
• Publication of a summary guide
• Folder hierarchy to give an overview of the collection
• Ability for researchers to search across file lists/index to identify information
• Access to the whole digital collection?
• Policy on the number of files, type of access and copies still to be determined
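A folder-hierarchy overview of this kind can be generated automatically from a copy of the collection. A minimal sketch in Python; `folder_overview` and its output format are hypothetical, and the depth limit keeps the summary readable for large collections:

```python
import os


def folder_overview(root: str, max_depth: int = 2) -> str:
    """Return an indented outline of the folder hierarchy under `root`.

    Each line shows a folder, the number of files directly inside it and
    their total size in bytes; folders deeper than `max_depth` are not listed.
    """
    root = os.path.abspath(root)
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirnames[:] = []  # stop os.walk descending any further
        dirnames.sort()  # list sub-folders in alphabetical order
        # lstat so that broken symbolic links do not raise errors
        size = sum(os.lstat(os.path.join(dirpath, f)).st_size for f in filenames)
        name = os.path.basename(dirpath) or dirpath
        lines.append(f"{'  ' * depth}{name}/  files: {len(filenames)}, bytes: {size}")
    return "\n".join(lines)
```

The same walk could also write a full file list for researchers to search.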
Next steps
• Working with desktop support to capture images
• Drafting new advice for depositors
• Encouraging depositors to deposit their digital records
• Working with senior College staff to capture their personal papers and research data throughout their careers
• Improving skills within the AIM team – especially Mac skills
• Preserving digital records in our collections
Thank you
Lindsay Ould
Information Manager and Digital Archivist
E-mail: digital-archive@kcl.ac.uk