Watching the Detectives: Forensic Information in Digital Objects (FIDO)

KCL Facts
• 5 million archives (including artefacts, images, sound recordings and databases)
• 295,000 rare/special books
• Spans 6 centuries (most from the 18th century onwards)
• Wide range of subjects, formats and languages
• Internationally and nationally recognised
• Whole collection valued at £81,000,000
• Liddell Hart Centre for Military Archives and Foyle Special Collections Library

Information Management Team
Responsible for advice and support for:
• Content creation
• Active management during business use
• Retention for legal or business purposes
• Digital archiving and preservation

JISC FIDO Project
• 6-month project in 2011
• Investigation of tools to aid data acquisition, file identification & process documentation
• Case study to report findings & lessons learnt
• Mapping of forensic terms to archival terms
• Addressing the ethical issues of the approach
• Establishing suitable computer hardware and tools to assist in the newly defined digital acquisition process

Why digital forensics?
• Forensic investigation is an emerging profession developing tools that map user activity to legal admissibility standards
• Digital collections can be large and difficult to appraise – forensic tools can analyse file characteristics and document what is done & when
• Forensic tools can provide contextual information, such as a timeline or a breakdown of file types, for initial appraisal
• Authenticity – archivists need to capture authentic digital collections, and forensic tools can support this process

Digital forensics vs digital appraisal
• Different language – terms mean different things to each practitioner
• Confidence & skills – digital archive skills are much closer to forensics or IT than to traditional archival skills
• Forensic investigators deal with a potential crime scene – archivists work with the co-operation of the depositor
• Forensic investigators want all available information, including deleted documents & browser history, whereas archivists may only have consent to take the files defined by the donor

Ethical Issues
• Does the depositor know the collection?
• A forensic image will capture everything!
• Is e-mail included in the deposit?
• Do all family members agree to the deposit?
• Does the depositor own the copyright?
• Is there unpublished work that might be published after deposit?
• Are computers included, or just their contents?

Technical Issues
• Data transfer or recovery
• Level of rights required for tasks
• Additional hardware/software familiarisation
• New skills for archives staff
• Redaction
• Finding new software for particular tasks

Data handling workflow
1. Acquire – obtain data from the depositor/donor
2. Analyse – examine the acquired data to locate user-generated content
3. Appraise – select data of potential value to the institution
4. Archive – transfer the selected data into the digital repository for curation & preservation

Data Acquisition Methods
1. File copy: files are copied/moved from the donor's media to AIM-owned storage, e.g. via FTP, DVD-R or hard disk
2. Disk clone: a bit copy of the source disk is written to a mirror disk
3. Disk image: a bit copy of the disk is created and stored as a file on other media

Different Hardware, Different Media
[Flowchart: choosing how to image the media]
What type of media do you wish to image?
• Removable media (e.g. floppy, CD-ROM, USB stick) → locate a media reader & create a disk image
• Hard disk → Is the disk installed in a computer?
  – No → What type of connectors does it have?
    – ATA/IDE or SATA → install into a portable disk enclosure
    – Other → obtain an appropriate reader device
  – Yes → Do you have permission to remove the disk from the machine, and is it physically possible?
    – Yes → What type of connectors does it have? (as above)
    – No → Does the machine possess appropriate ports (e.g. USB/FireWire) to allow connection of an external HD?
      – Yes → Are you able to boot from disk/optical media & perform the capture?
        – Yes → boot from media & perform imaging
        – No → perform capture via the host system
      – No → Are you able to perform a network capture?
        – Yes → capture the disk image using network capture
        – No → copy files to disk, and notify the donor that some content may be missed

Data held on digital media
• Types:
  – Operating system files, e.g. Windows has 30,000+ after a fresh install
  – Software: applications, utilities, games, etc.
  – Log data: Windows Registry, browser cache, cookies, temp files
  – User-generated content: documents, images, sound, emails, etc.
• Data layers:
  1. Active data: information normally seen by the operating system
  2. Inactive/residual data: deleted or modified data
     – Deleted files located in unallocated space that have yet to be overwritten (retrieved using an undelete application)
     – Data fragments that contain information from a partially deleted file (retrieved through carving)
• The usefulness of inactive data is still to be seen

Active Data Analysis
Common techniques:
• Navigate the directory structure to get a 'feel' for the data files held on disk
• Search by:
  – File name, e.g. *report*
  – File type, e.g. *.doc, *.pdf, etc.
  – Creation/modification date
  – Content type, e.g. word usage
  – File size
• Windows search does not identify everything
• The investigation process itself leaves artefacts behind, e.g. thumbs.db

OS Forensic search interface for active files
Sort by:
• Name
• Folder
• Size
• Type
• Creation date
• Modification date

Recovering deleted files
Recovering partial/complete files
• Undelete/file recovery software searches unallocated space and makes the files it finds available.
Recovering data fragments
• Data carving technique – the raw bits of the disk are analysed to identify recognisable patterns that may indicate a data file, e.g. a header/footer or semantic information.
  – Carving software is designed to take a linear approach to locating data files, which makes it ineffective on fragmented disks
  – It creates Franken-Files! Incomplete files, large files containing information from multiple sources, images extracted from PowerPoint presentations, etc.

Keyword Search
• Scan the content of a disk, including all emails, documents and other text content, to locate a particular search term
• Commonly used by police to identify evidence, e.g. bank account numbers, telephone numbers
Archival use:
• Does the disk contain references to topic X?
• What trends may be identified in the use of a concept – when did a term appear and disappear?

Analysis of research behaviour
• A hard disk may contain other information:
  – Websites visited/bookmarked for research
  – Chat logs indicating discussion with colleagues
  – Other digital media that may have been used to store data
• This may be useful for understanding the researcher's work process, but consider the ethical issues

[Flowchart: locating information on the drive]
What type of information do you wish to locate on the drive?
• Specific information on a topic → Do you know what keywords should be used?
  – Yes → create & search an index
  – No → contact/research the donor
• Information about other media on which data may be stored → examine event logs for devices connected/disconnected
• User-created data files → What level of analysis are you permitted to perform?
  – Full search, including active, deleted & fragments
  – Available & deleted files → perform a search of active & inactive (deleted) files
  – Only readily available files (active files) → Do you have any additional criteria for user content?
    – None → perform a file search of common file types
    – Specific object types/formats → perform a file search of specific file types
    – Data created/modified before/after/between set dates → perform a file search with additional date parameters

Forensic Hardware
1) Desktop PC: Intel Pentium Dual Core E5800 CPU (3.20 GHz), 2 GB DDR RAM, 500 GB hard disk, Super Multi DVD-RW drive
2) USB write blocker: prevents the OS writing to connected devices
3) Drive enclosure: enables connection of internal ATA/SATA disks via USB
4) KryoFlux: USB floppy disk controller enabling attachment of disparate disk devices & forensic imaging

Access to digital collections
• Publication of a summary guide
• Folder hierarchy to give an overview of the collection
• Ability for researchers to search across file lists/index to identify information
• Access to the whole digital collection?
• Policy on number of files, type of access and copies still to be determined

Next steps
• Working with desktop support to capture images
• Drafting new advice for depositors
• Encouraging depositors to deposit their digital records
• Working with College senior staff to capture their personal papers and research data throughout their careers
• Improving skills within the AIM team – especially Mac skills
• Preserving digital records in our collections

Thank you
Lindsay Ould
Information Manager and Digital Archivist
E-mail: digital-archive@kcl.ac.uk
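The "Search by" criteria under Active Data Analysis (file name, file type, modification date) and the date-bounded file searches in the second flowchart can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the FIDO project. It assumes the disk image has already been mounted read-only, for example at a hypothetical /mnt/image, and find_files and its parameters are hypothetical names. The sketch filters on last-modification time only, because creation time is not reported consistently across operating systems.

```python
import fnmatch
import os
from datetime import datetime


def find_files(root, name_pattern="*", extensions=None,
               modified_after=None, modified_before=None):
    """Hypothetical helper: walk `root` and return paths matching every given criterion.

    Assumption: `root` is a read-only mounted copy of the donor's media,
    never the original disk.
    name_pattern: shell-style pattern for the file name, e.g. "*report*" (case-insensitive).
    extensions: optional set of lower-case extensions, e.g. {".doc", ".pdf"}.
    modified_after / modified_before: optional datetimes bounding the
    last-modification time (after is inclusive, before is exclusive).
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Lower-case both sides so matching behaves the same on every OS.
            if not fnmatch.fnmatch(name.lower(), name_pattern.lower()):
                continue
            if extensions is not None and os.path.splitext(name)[1].lower() not in extensions:
                continue
            path = os.path.join(dirpath, name)
            # os.stat reads metadata only; it does not open the file's contents.
            mtime = datetime.fromtimestamp(os.stat(path).st_mtime)
            if modified_after is not None and mtime < modified_after:
                continue
            if modified_before is not None and mtime >= modified_before:
                continue
            matches.append(path)
    return sorted(matches)
```

For example, find_files("/mnt/image", "*report*", {".doc", ".pdf"}, modified_after=datetime(2005, 1, 1)) would list Word and PDF files with "report" in their name that were modified since 2005. os.walk silently skips directories it cannot read, so a short result is not proof that nothing else is there.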
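The header/footer carving described under Recovering data fragments can be illustrated with a deliberately simple, linear carver for JPEG images. It is a sketch under stated assumptions. carve_jpegs is a hypothetical helper that relies only on the standard JPEG start-of-image (FF D8) and end-of-image (FF D9) markers, and it assumes each image is stored contiguously. That contiguity assumption is exactly why linear carving struggles on fragmented disks. Stopping at the first end marker can also truncate images that contain an embedded thumbnail, which is one source of Franken-Files.

```python
# Hypothetical helper. Assumption: each JPEG is stored contiguously (linear carving).
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker (FF D8) followed by the next marker's FF byte
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(data, max_size=10 * 1024 * 1024):
    """Return byte runs that start with a JPEG header and end at the first
    JPEG footer found within max_size bytes of that header."""
    carved = []
    pos = 0
    while True:
        start = data.find(JPEG_HEADER, pos)
        if start == -1:
            break
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end == -1:
            # No footer close enough: treat this header as a false hit and keep scanning.
            pos = start + 1
            continue
        carved.append(data[start:end + len(JPEG_FOOTER)])
        pos = end + len(JPEG_FOOTER)
    return carved
```

In practice the bytes would be read from an image file, e.g. with open("disk.img", "rb") as f: data = f.read() (a hypothetical file name). Reading a whole image into memory is a simplification that only suits small media such as floppy disks.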
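The archival question under Keyword Search ("when did a term appear and disappear?") can be approximated with a small script. The script records which files mention a term and when those files were last modified. This is a sketch, not a forensic tool. keyword_timeline is a hypothetical helper that reads only plain-text files. It also treats each file's modification date as a stand-in for when the term was used, which will be misleading for files that were copied or edited later. Opening files can update their access times, so the script should be run against a copy or a read-only mounted image rather than the donor's original media.

```python
import os
from datetime import datetime


def keyword_timeline(root, term, extensions=(".txt",)):
    """Hypothetical helper: return (first_seen, last_seen, hits) for `term`.

    Assumption: a file's last-modified time approximates when the term was
    used. `extensions` must be a tuple of lower-case suffixes.
    Returns (None, None, []) if no file contains the term.
    """
    term = term.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going over badly encoded text.
            with open(path, encoding="utf-8", errors="replace") as f:
                if term in f.read().lower():
                    mtime = datetime.fromtimestamp(os.stat(path).st_mtime)
                    hits.append((mtime, path))
    hits.sort()  # chronological order by modification time
    if not hits:
        return None, None, []
    return hits[0][0], hits[-1][0], hits
```

The first and last dates give a rough window in which a concept was in use. The full hit list lets an archivist check each file before drawing conclusions.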