- ULCC Publications Archive

Securing your digital heritage:
Practical tips and solutions for smaller archives
Joanne Anthony,
University of London
Computer Centre
CC: Working Together Teamwork Puzzle Concept, by
lumaxart, Flickr,
http://www.flickr.com/photos/lumaxart/2137737248/
USAGE RIGHTS:
The contents of this PowerPoint presentation are provided under the following
open source licence: http://creativecommons.org/licenses/by-sa/3.0/
In summary, you are free:
- to share: to copy, distribute and transmit the work
- to remix: to adapt the work
Under the following conditions:
- Attribution. You must attribute the work by referring to ULCC
http://www.ulcc.ac.uk/ (but not in any way that suggests that we endorse
you or your use of the work).
- Share Alike. If you alter, transform, or build upon this work, you may
distribute the resulting work only under the same, similar or a compatible
license.
Coverage






Smaller archives: sustainability challenges
Digital Preservation: Why it matters
Salvaging your website: some options
[Digital Preservation] What you can do: some practical solutions
The Future?
Useful resources for small archives
Smaller archives: sustainability challenges








Lack of resources e.g. technical, fundraising and bid writing skills;
small time-specific or project-based budgets, skills vacuum with staff
turnover
Limited technological infrastructure and technical expertise to
implement tools/software
Not always linked to or supported by broader policy / mandate; and
varying or non-existent levels of commitment by an institutional
partner
Project-based funding: difficult to integrate digital preservation as
mainstream
Limited resources and plans to actively curate digital assets over the
long-term
Audience Sustainability: “After the launch”
Access: online archive portals: resources to undertake updates &
remain ‘linked up’
Organisations themselves: already stretched resources in addition to
core operations
Source: Bernie Grant Archives website: Home Page: http://www.berniegrantarchive.org.uk/
Source: Bernie Grant Archives website: http://www.berniegrantarchive.org.uk/archive/showcase.asp, Audio clips within
‘The Archive; Showcase’.
Source: Bernie Grant Archives website: http://www.berniegrantarchive.org.uk/archive/showcase.asp, Video clips
within ‘The Archive; Showcase’.
Digital Preservation: Why bother?
CC: Computer Says No_; by Benjibot: http://www.flickr.com/photos/benjibot/3141128891/sizes/m/
Digital Preservation: Why it matters?

Increasing dependence on digital materials

They won’t take care of themselves…

Increasing risk of loss - rapid loss of cultural/corporate/community memories

Timeframe to salvage is short – most digital objects survive less than 5
years!

Archives, regardless of format, reveal what a society chooses to remember,
and what it chooses to forget!

Community practitioners can make a vital contribution to informing &
shaping archival practice (including digital preservation) in this digital age.
Web Archiving Options:
….What are they?
Before you start:
ACKNOWLEDGE you own this web resource and that
something needs to be done – it won’t take care of
itself.
Options:

Do it yourself

Look for outside support
Do it yourself:
… it’s within your reach!

It’s technically possible: web harvesting tools exist that are free and open
source - more flexibility and control

Salvage the website’s core content/format types, not functionality

Save underlying databases as non-proprietary ‘CSV’ format

Keep supporting contextual documentation e.g. web
specifications/contracts, database design documents etc.

Take screenshots of web interface to show original ‘look and feel’

Save flat html files i.e. context of links between pages

Save image/audio/video files separately in preservation formats
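A minimal sketch of the "save underlying databases as CSV" step. It assumes the website's database is (or has been converted to) a SQLite file, which the slides do not state; a site built on Access or MySQL would need that system's own export tool instead. The function name and file paths are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path: str, out_dir: str) -> list[str]:
    """Write every user table in a SQLite database to its own CSV file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        # List user tables, skipping SQLite's internal bookkeeping tables.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )]
        for table in tables:
            # Quote the table name so unusual names do not break the query.
            quoted = '"{}"'.format(table.replace('"', '""'))
            cursor = conn.execute(f"SELECT * FROM {quoted}")
            header = [col[0] for col in cursor.description]
            target = out / f"{table}.csv"
            with target.open("w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                writer.writerow(header)  # column names first, for context
                writer.writerows(cursor)
            written.append(str(target))
    finally:
        conn.close()
    return written
```

Keep the database design documents alongside the CSV files: CSV keeps the values but not the relationships between tables.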
Look for outside support

Nominate your site:

UK Web Archive, British Library
http://www.webarchive.org.uk/ukwa/info/nominate
See tips on "Making Your Website Crawler-Friendly”
http://www.webarchive.org.uk/ukwa/info/technical

Huge burden lifted when national institutions are committed to
capturing and sustaining these resources

“Selection policies may be inadequate, reactive or too broad in
selection; so you must be proactive in archiving your sites.”
[nb: personal opinion, not that of ULCC]
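One common reason a harvester skips content is a robots.txt rule that blocks it. The sketch below uses Python's standard robots.txt parser to test whether a given crawler may fetch a page. The robots.txt text, the crawler name "example-archive-bot" and the example.org URLs are all made up for illustration; whether the UK Web Archive guidance covers robots.txt in these terms is not stated on the slide.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that hides the archive showcase from all crawlers.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /archive/
"""

showcase_ok = crawler_allowed(
    EXAMPLE_ROBOTS, "example-archive-bot",
    "http://www.example.org/archive/showcase.asp")
home_ok = crawler_allowed(
    EXAMPLE_ROBOTS, "example-archive-bot",
    "http://www.example.org/index.html")
```

With this example file the home page can be harvested but the showcase pages cannot, which is the kind of gap worth finding before nominating a site.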
Let other people do it?

Internet Archive: http://www.archive.org/ – they possibly have
snapshots already

Good to have another backup here – some room for flexibility over
the harvesting process

http://www.archive-it.org/ Sign up and improve harvesting

Snapshots may be incomplete or sporadic, and dynamic elements
missed in web-harvest e.g. databases, audio/video clips etc.
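To find out whether the Internet Archive already holds a snapshot, its Wayback Machine offers an availability lookup that returns JSON. The sketch below builds the lookup address and reads the reply. The response layout used here (an "archived_snapshots" entry containing a "closest" record) is my understanding of that service and should be checked against the Internet Archive's own documentation; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(site: str) -> str:
    """Build the lookup address for one website."""
    return WAYBACK_API + "?" + urlencode({"url": site})


def closest_snapshot(payload: dict):
    """Return (snapshot_url, timestamp) from a reply, or None if none is held."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url"), closest.get("timestamp")
    return None


def check_site(site: str, timeout: float = 10.0):
    """Ask the service over the network; needs an internet connection."""
    with urlopen(availability_url(site), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

A snapshot being listed does not mean it is complete, so open it and check that databases and audio/video clips were actually captured.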
Practical tips for digital preservation:

Identify digital objects and assess risks and solutions: What
have you got & do you need to keep everything? (e.g. identify formats,
survey records, selection/appraisal, keep inventory/capture metadata, link
cataloguing details to objects)

“Lots of copies keeps stuff safe”! Different media, in different places
- spread the risks; but also document where you’ve put copies vs. masters,
and link them to their contextual inventory/cataloguing details

Different preservation solutions for different resources
(migration, emulation, refresh, replicate etc.). There’s more than one way!

Develop a preservation plan: Document everything e.g. digital capture
processes/workflow, formats/standards used, budget, responsibilities;
migration, test and refresh procedures etc.

Stick to best practice and widely accepted standards e.g.
metadata, formats etc. – more chance your collections will be used,
integrated with other resources, & preserved over the long term.
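A first inventory does not need special software. The sketch below walks a folder and writes one spreadsheet row per file (path, extension, size, last-modified date), giving a starting point for "what have you got?". The column names are my own choice, not a standard; write the inventory file outside the folder being surveyed so it does not list itself.

```python
import csv
import os
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["path", "extension", "size_bytes", "modified_utc"]


def build_inventory(root: str, inventory_csv: str) -> int:
    """Walk root and record one CSV row per file; return the row count."""
    root_path = Path(root)
    count = 0
    with open(inventory_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for dirpath, _dirs, files in os.walk(root_path):
            for name in sorted(files):
                item = Path(dirpath) / name
                info = item.stat()
                modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
                writer.writerow({
                    # Relative path, so the inventory survives a move.
                    "path": item.relative_to(root_path).as_posix(),
                    "extension": item.suffix.lower().lstrip("."),
                    "size_bytes": info.st_size,
                    "modified_utc": modified.isoformat(timespec="seconds"),
                })
                count += 1
    return count
```

The resulting CSV opens in any spreadsheet, where catalogue references and appraisal decisions can be added in extra columns.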
What you can do: take stock!
What digital assets have you got/are about to create?






Electronic documents - ‘digital paper’ (including email);
Spreadsheets;
Databases (e.g. collection management system,
underlying database of a website, research datasets);
Digital audio/video/images;
Websites; Web 2.0: wikis, blogs etc.;
Exotic forms: virtual worlds, games, programs etc.
Source: Dance Heritage Coalition (U.S): http://www.danceheritage.org/ NB: Of note, see ‘Digital
Video Preservation Reformatting Project’; and see link to ‘Dance Videotapes at Risk’ for inventory
guidance.
Good Digital Preservation depends on:
Taking stock of what you can and can’t control
operationally; and where you need outside help:
1. Organizational Infrastructure e.g. policies, preservation plans,
institutional commitment etc.
2. Technological Infrastructure e.g. hardware/software,
storage/formats, security, workflow, procedures, archival/technical skills
etc.
3. Resources Framework e.g. staff, technology, space, storage etc.
Preservation Strategies – there’s more
than one way…
Migration

Obsolescence is our enemy! Transfer content from one format (such as a
Word document) into a different format (such as PDF), so the resource remains
functional and accessible.
Refreshment

Copying data onto a new instance of the same type of storage medium (such as
from an old CD-ROM to a new CD-ROM). “Same file, new carrier”.
Emulation

Replicating functionality of an obsolete application (often as original system
is no longer available). E.g. playing vintage computer games on a
contemporary games emulator. Using virtual machines/programs to make
new computers behave like old ones.
[Diagram: refreshing copies a Word V2 file to another Word V2 file; migrating converts a Word V2 file to a PDF file]
Quick ways to reduce loss:

Replicate data: Another Preservation Strategy: Keep lots of copies
of digital objects on different storage media (and use different
brands)

Store any CD etc. produced in a secure, stable, and controlled
environment

Handle media properly

Ensure off-site storage of copies for security purposes

Store archival-quality digital images on a server, if possible

Store copies in various locations, using combination of offline and
online storage media
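Copying is only half the job; each copy should be checked against the master. The sketch below copies one master file to several locations and confirms every copy has the same SHA-256 checksum. The destination folders stand in for separate disks or sites; the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(master: str, destinations: list[str]) -> dict[str, bool]:
    """Copy master into each destination folder; map each copy to pass/fail."""
    source = Path(master)
    expected = sha256_of(source)
    results = {}
    for dest_dir in destinations:
        target_dir = Path(dest_dir)
        target_dir.mkdir(parents=True, exist_ok=True)
        copy = target_dir / source.name
        shutil.copy2(source, copy)  # copy2 also keeps file timestamps
        results[str(copy)] = sha256_of(copy) == expected
    return results
```

Any copy reported as False should be recopied and the medium treated as suspect.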
More ways…

Maintain and refresh data: e.g. implement regular refreshment
cycles to copy onto newer media

Migrate formats e.g. every 3 to 5 years, and quality check integrity of
data after each migration

At point of creation of object, make preservation copies (assuming
licensing/copyright permission i.e. engage with rights holders of
software and hardware etc.)

Subject media to management routines e.g. media testing, keep
inventory of what data is held where
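The "quality check integrity" and "keep inventory" routines can share one record: a manifest of checksums written when files are stored, then re-checked after every refresh or move. The sketch below is one way to do that, assuming a JSON manifest kept outside the folder it describes; the file layout and function names are my own.

```python
import hashlib
import json
from pathlib import Path


def checksum(path: Path) -> str:
    # Reads the whole file at once; fine for a sketch, but read in chunks
    # for large audio/video files.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(root: str, manifest_path: str) -> dict:
    """Record a checksum for every file under root."""
    base = Path(root)
    manifest = {p.relative_to(base).as_posix(): checksum(p)
                for p in sorted(base.rglob("*")) if p.is_file()}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2),
                                   encoding="utf-8")
    return manifest


def verify_manifest(root: str, manifest_path: str) -> dict:
    """Compare files under root with the manifest; list what is wrong."""
    base = Path(root)
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    report = {"missing": [], "changed": []}
    for rel, expected in manifest.items():
        item = base / rel
        if not item.is_file():
            report["missing"].append(rel)
        elif checksum(item) != expected:
            report["changed"].append(rel)
    return report
```

Note that a format migration changes the file, so its checksum will differ by design; the manifest verifies refreshed or moved copies, and a new manifest entry is needed for each migrated file.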
Warning:
Backups of networks aren’t preservation, and storage on disks etc. doesn’t mean
permanence (not even gold CDs!) - so, have more than one approach…
Don’t forget your source material!
Videodrome street theater: CC: by Jima: http://www.flickr.com/photos/jima/3711736520/
Storage and Re-use

As a minimum you need to create a high quality ‘master’
from which other versions of your digital material (for
example images you might make available over the
Internet) can be made. This digital master should be
stored independently e.g. offline.

Link digital objects to e.g. an inventory spreadsheet or
collection management database.
Create once, use many times!
Floppy Pencil Box, CC: by alwright1,
http://www.flickr.com/photos/alwright1/2791468894/
Nice Display: CC: by Mike-Andrews: http://www.flickr.com/photos/smaller-spaces/3284418116/sizes/m/
More to Preservation than Storage…
Curation of whole life cycle of a digital object







Ingest: accessioning/incoming donations,
selection/appraisal etc.
Data management: metadata/cataloguing
Access and Delivery: dissemination
Storage
Preservation Planning
Administration
“Designated communities” you are serving
A matter of formats…



Understanding formats is crucial to long-term
accessibility and preservation
Find out what you can about formats in use
Make informed decisions about preservation formats





Pick ones that conform to published standards
Contrast preservation with reuse
Databases, websites: keep supporting documentation to
allow reconstruction
Buzzwords: WAV, AVI, TIFF, PDF, CSV
Avoid ‘lossy compression’ for preservation e.g. JPEG, MP3
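A quick way to apply the format advice above to an existing file list is to flag each file by extension. The sketch below uses only the formats named on this slide; the two groupings are a simplification of mine (an AVI file, for instance, can still contain lossy-compressed video), and an extension is only a hint about what a file really is.

```python
from pathlib import PurePath

# Formats named on the slide as preservation "buzzwords".
PRESERVATION_FORMATS = {"wav", "avi", "tif", "tiff", "pdf", "csv"}
# Formats named on the slide as examples of lossy compression.
LOSSY_FORMATS = {"jpg", "jpeg", "mp3"}


def classify(filename: str) -> str:
    """Return 'preservation', 'lossy' or 'unknown' from the file extension."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext in LOSSY_FORMATS:
        return "lossy"
    if ext in PRESERVATION_FORMATS:
        return "preservation"
    return "unknown"


def flag_files(filenames):
    """Group file names by classification for review."""
    groups = {"preservation": [], "lossy": [], "unknown": []}
    for name in filenames:
        groups[classify(name)].append(name)
    return groups
```

Files in the "lossy" group are candidates for tracking down an uncompressed original; files in "unknown" need a closer look.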
Access Vs Preservation
“Access is about current fashion; preservation is
about enduring style”…
Papyrus.gif, 162k
CC: iPod shuffle 3G, by Ntr23: http://www.flickr.com/photos/ntr23/3348000167/sizes/s/
Audio.mp3
CC: RCA Dog, by HeatherL: http://www.flickr.com/photos/suzieblue/512881759/sizes/s/
Audio.wav
Have a cunning plan...a Preservation Plan?:

Identify what to preserve and define your selection criteria.

Identify roles and responsibilities

Determine requirements for donation/accessioning/ingest (formats,
metadata, storage media).

Degree of integration with storage, backup, and preservation for non-digital
resources.

Maintenance strategies (backups - online and/or offline, monitoring,
refreshing).

A prioritized plan is needed, with built-in review periods to assess potential
changes to technology and storage media.

In-house versus outsourcing options. Outline any reliance on outside
consulting and archiving services, including any contract negotiation.
Source: JISC Project Report: Digitisation Programme: Preservation Study April 2009
http://www.jisc.ac.uk/whatwedo/programmes/digitisation/digitisationpreservationstudy.aspx
Digital Preservation in short: Act now!

There is no access without preservation!

Needs active and ongoing management

Preservation and its strategies need to be led by our values:
research values, your users, your requirements/priorities - not simply
technology-led

Small steps usually better than no steps at all

Big institutions and DP players don’t have it all figured out – Must
act now!

Don’t need to do it all or know everything at once! Talk to ICT staff.
Seek external advice e.g. Digital Preservation Coalition (DPC)

Preservation should not be postponed until a perfect solution
appears…
The Future?

Need more joined-up thinking on digital preservation – between
creators, custodians, project managers, users, funders, ICT etc

DP is everyone’s responsibility

Get talking to other departments/similar organisations to
yours/natural partners: ICT staff, project managers, information
managers, users, partners like Local Authorities – all stakeholders

Digital Repositories / Digital Asset Management Systems
http://www.dcc.ac.uk/resource/briefing-papers/digital-repositories/

Archive Press: Blog self-archiving – ULCC/British Library led project,
JISC funded http://archivepress.ulcc.ac.uk/

The DP community needs you! Contribute to archival and DP
processes – you are close to, and know your users (“designated
communities”); and can advocate their needs and concerns.
Useful resources for smaller archives














JISC Digital Media http://www.jiscdigitalmedia.ac.uk/ - excellent source of introductory advice on still images, moving
images and sound, from file format selection to access and preservation. See Introduction to Digital Preservation:
http://www.jiscdigitalmedia.ac.uk/crossmedia/advice/an-introduction-to-digital-preservation/
Digital Preservation Coalition (see useful advice and publications) http://www.dpconline.org
Wordpress: Alan’s Notes and Thoughts on Digital Preservation: http://alanake.wordpress.com/so-you-want-to-keep-all-your-stuff/
DigitalNZ: Make It Digital website: http://makeit.digitalnz.org/guidelines/preserving-digital-content/
Managing and Preserving Community Archives, National Preservation Office Te Tari Tohu Taonga, June 2005,
http://www.natlib.govt.nz/catalogues/library-documents/managing-community-archives
"Rethinking Personal Digital Archiving, Part 1: Four Challenges from the Field", Catherine C. Marshall, D-Lib
Magazine, March/April 2008, Volume 14 Number 3/4, http://www.dlib.org/dlib/march08/marshall/03marshall-pt1.html
Digital Curation Centre Case Studies and Interviews: PrestoSpace: Preservation towards storage and access.
Standardised Practices for Audiovisual Contents in Europe (http://www.prestospace.org), March 2008
http://www.dcc.ac.uk/webfm_send/110
International Association of Sound and Audiovisual Archivists at: http://www.iasa-web.org/special_publications.asp
PARADIGM - Personal Archives Accessible in Digital Media (paradigm) project: can be aligned better to community
archives http://www.paradigm.ac.uk/workbook/appendices/guidelines-tips.html (e.g. practical tips and guidelines for
creators & donors of personal digital archives).
US-Based DuraSpace Blog: Small Archives: http://www.fedoracommons.org/confluence/display/FCCWG/Small+Archives
Around the World in 80 Gigabytes: Alexandra Eveleigh's Archives & Technology Blog: Tag on 'small archives'
http://80gb.wordpress.com/tag/small-archives/
Preserving Your Personal Digital Archives, Heather Louise Mae Bowden, June 24th, 2008, The Long Now Blog,
http://blog.longnow.org/2008/06/24/preserving-your-personal-digital-archives/
Wright, R (2008) Preservation of Digital Audiovisual Content. DPE Briefing Paper. Retrieved 19 October 2008 from:
http://www.digitalpreservationeurope.eu/publications/briefs/audiovisual_v3.pdf
Film Archive forum - http://bufvc.ac.uk/faf/guidance.htm
Source: Alan’s notes and thoughts on digital preservation: http://alanake.wordpress.com/so-you-want-to-keep-all-your-stuff/
Source: DigitalNZ : http://makeit.digitalnz.org/
General resources on digital preservation:















JISC Digital Media http://www.jiscdigitalmedia.ac.uk/
Digital Preservation Coalition (see useful advice and publications)
http://www.dpconline.org
Digital Curation Centre http://www.dcc.ac.uk/
Digital Preservation Training Programme http://www.ulcc.ac.uk/dptp/
Digital Preservation Management Online Tutorial (Cornell University)
http://www.icpsr.umich.edu/dpm/
AV Preservation: Prestospace http://prestospace.org/
AV Preservation: TAPE http://www.tape-online.net/
D-Lib Magazine http://www.dlib.org/
UKOLN http://www.ukoln.ac.uk/
Joint Information Systems Committee http://www.jisc.ac.uk
The National Archives http://www.nationalarchives.gov.uk/preservation/digital.htm
The British Library http://www.bl.uk/about/collectioncare/digpresintro.html
AHDS preservation handbooks http://ahds.ac.uk/preservation/ahds-preservation-documents.htm
Listserv for Digital Preservation https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=digital-preservation
PLANETS: http://www.planets-project.eu/
Additional resources to take home:
Video/Film & Audio Preservation
Significant characteristics to consider when storing,
transmitting, and preserving:

Film/Video: resolution, size, aspect ratio, frame rate and
fields, bit rate, bit depth and compression method
(codec)

Audio: bit depth, sampling rate, compression method
(codec), and number of channels
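For uncompressed WAV audio, the characteristics listed above can be read straight from the file header with Python's standard wave module. This sketch covers PCM WAV files only; other audio formats and video need dedicated tools. The dictionary keys are my own labels.

```python
import wave


def audio_characteristics(path: str) -> dict:
    """Read the significant characteristics of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,   # sample width is in bytes
            "sampling_rate_hz": rate,
            "duration_seconds": frames / rate if rate else 0.0,
            "compression": wav.getcomptype(),      # "NONE" for plain PCM
        }
```

Recording these values in the inventory when a file is accessioned makes it possible to confirm later that a copy or migration has not quietly reduced quality.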
DPE: Preservation of digital AV content
What to do - Despite the problems, some clear statements can be made about AV
preservation:




Preserve the artefact: Keep the ‘original’, even if compressed. ‘Preserve the
bits’, whatever else is done. AV content has one advantage: there is a lot of it, in
a relatively small number of formats. Methods to ‘play the bits’ may exist.
Decode to uncompressed and save as uncompressed (in addition to keeping the
original). This is a demanding requirement for video (100 GB/hr for 625-line
TV), but storage is now very inexpensive.
Enhance the metadata: A file extension (e.g. .wav, .avi) is not sufficient.
There are over 50 registered variants of encoding within the definition of .wav;
MPEG-1 and MPEG-2 both use the extension .mpg. Ideally, there will be a
metadata extraction tool; otherwise, manual testing and documentation is
needed.
You are not alone: Use the file-type registries, software repositories,
emulation platforms, and Preservation Guides listed in the [DPE] references.
Source:
Wright, R (2008) Preservation of Digital Audiovisual Content. Digital Preservation Europe (DPE) Briefing Paper. Retrieved 19
October 2008 from: http://www.digitalpreservationeurope.eu/publications/briefs/audiovisual_v3.pdf
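The storage figure of about 100 GB/hr for uncompressed 625-line video can be sanity-checked with simple arithmetic. The numbers below are my assumptions, not taken from the briefing paper: an active picture of 720 x 576 pixels, 25 frames per second, and 4:2:2 sampling at 10 bits per sample (20 bits per pixel on average).

```python
def uncompressed_gb_per_hour(width: int, height: int, fps: float,
                             bits_per_pixel: float) -> float:
    """Storage needed for one hour of uncompressed video, in gigabytes."""
    bytes_per_second = width * height * fps * bits_per_pixel / 8
    return bytes_per_second * 3600 / 1e9  # 1 GB taken as 10^9 bytes


# Assumed standard-definition 625-line parameters (see note above).
sd_625 = uncompressed_gb_per_hour(720, 576, 25, 20)
```

Under these assumptions the result is roughly 93 GB per hour, in line with the rounded figure quoted above; audio and any blanking data would add to it.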
Common reasons for data loss








Obsolete file formats / software / media
Insufficient catalogue information/context (“metadata”)
Corrupted files on portable media
Uncontrolled number of file formats
Insufficiently documented proprietary file formats
Inaccessible data at point of donation to archive
Software updates or emulations not fully compatible with
data
Data physically lost
Source: Digital Preservation Coalition, Survey 2005
Storage Media
Variety of online and offline storage media:







CD-ROM
DVD-ROM
LTO (Linear Tape Open)
DLT (Digital Linear Tape)
Networked/managed server storage
Hard drives e.g. online storage is often mirrored across multiple
disks using RAID (redundant array of independent disks).
Image and video hosting websites e.g. Flickr
http://www.flickr.com/photos/nga_researchlibrary/
Tips:
Never use rewritable discs for long-term storage. Don’t buy
media from one single supplier or name brand – spread risks.
Media testing








All media needs periodic testing e.g. random error checking
Use of brand names doesn’t guarantee longevity – (use variety of
brands/suppliers)
Verify initial transfer to new media
Confirm continued viability of stored files
Spot degradation prior to permanent loss
Spot trends in media degradation
Support media refreshing and migration decisions
Confidence in longevity requires:
- Initial testing of drives and media
- Proper handling and storage
- Periodic re-sampling
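Periodic re-sampling can be as simple as re-checking a random handful of files against checksums recorded earlier. The sketch below assumes a manifest (a mapping from relative file path to SHA-256 checksum) already exists; the function name and the sample size are hypothetical choices.

```python
import hashlib
import random
from pathlib import Path


def sample_check(root: str, manifest: dict, sample_size: int, seed=None):
    """Re-checksum a random sample of files.

    Returns (paths checked, paths that are missing or no longer match).
    """
    rng = random.Random(seed)  # pass a seed to repeat the same sample
    names = sorted(manifest)
    chosen = rng.sample(names, min(sample_size, len(names)))
    failures = []
    for rel in chosen:
        item = Path(root) / rel
        ok = (item.is_file() and
              hashlib.sha256(item.read_bytes()).hexdigest() == manifest[rel])
        if not ok:
            failures.append(rel)
    return chosen, failures
```

Logging the date, the sample and any failures for each medium over time is what allows trends in degradation to be spotted.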
Other Digital Preservation considerations:

Best Practice Standards: OAIS (Open Archival Information System), Trusted
Digital Repositories (TDR), PREMIS, DRAMBORA

Metadata: “data about data” (embedded like ‘TIFF’ or external like
cataloguing information). Note: preservation metadata needed to preserve
digital objects over time (PREMIS)

Intellectual Property Rights affect DP e.g. database rights (expire after
c.15 years); find out with website contractor: do you have the right(s) to
make a preservation copy or migrate etc?

Tools and technologies e.g. media testing, format identification/migration,
automatic metadata extraction

Formats – in one sense DP is about making calculated decisions about
formats – which will last, which we can trust, what do we know about them;
can we be assured to migrate to/from them in the future…
Preservation Planning Tools
Migration Decision-Making:
- PRONOM (UK National Archives)
- File format registry, offering e.g.:
  - Basic data about file formats
  - Searchable Web database by file format, product name, vendor name, date
  - Information about file formats and the software and hardware required to access them
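Format registries work alongside identification tools that look inside a file rather than trusting its extension. The sketch below illustrates the idea with a handful of well-known file signatures ("magic bytes"); it is not PRONOM or any tool built on it, and it covers only the few formats listed in the code.

```python
def identify_format(path: str):
    """Guess a file's format from its first bytes; None if not recognised."""
    with open(path, "rb") as fh:
        head = fh.read(12)
    if head.startswith(b"%PDF"):
        return "PDF"
    if head[:4] in (b"II*\x00", b"MM\x00*"):   # little- and big-endian TIFF
        return "TIFF"
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "WAV"
    if head[:4] == b"RIFF" and head[8:12] == b"AVI ":
        return "AVI"
    if head.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if head[:6] in (b"GIF87a", b"GIF89a"):
        return "GIF"
    return None
```

Comparing the result with the file extension in an inventory is a cheap way to catch mislabelled files at the point of donation.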
Example: Database Preservation
National Digital Archive of Datasets (NDAD)
http://www.ndad.nationalarchives.gov.uk/
Example: Storage Plan




Resources currently in use: kept online with regular backup,
refreshment, and migration.
Whether online or not: all archival versions (highest resolution,
fullest capture, lossless compression) are written to approved
storage media and stored off-line in the Library Digital Program
Division, with a schedule for regular refreshment, and migration.
For archival versions which are not currently online: a duplicate offline copy is created for storage at a different site.
All versions, online and offline, are tracked through the CUL local
asset management system.
Source: Columbia University Libraries
Policy for Preservation of Digital Resources
July 2000 (rev. 2006) - http://www.columbia.edu/cu/lweb/services/preservation/dlpolicy.html
Before a Preservation Plan:






List resources: All types of digital resources that you either currently or
plan to create, own or subscribe to
Identification: Document risks for each resource type – e.g. website
changes, software version changes, media degradation, hardware failure
etc.
Implications: Consider implications for your service in worst case
scenario. Which resources are ephemeral Vs permanent?
Assessment: Assess the value of groups of resources and the impact on
your service if these no longer exist or are inaccessible.
Solutions: For each case, identify what the options are, how much they
will cost and what they will require in terms of staff time and skills.
Decide: Decide on the strategies which are most appropriate for each type
of resource.
Source: CC: Adapted from UKOLN “Developing your Digital Preservation Policy”
http://www.ukoln.ac.uk/cultural-heritage/documents/briefing-42/
Examples of Preservation Policies:






UKOLN Guidance: Developing Your Digital Preservation Policy
http://www.ukoln.ac.uk/cultural-heritage/documents/briefing-42/briefing42.doc
Yale University Library: Digital Preservation Policy
http://www.library.yale.edu/iac/DPC/final1.html
Columbia University Libraries Policy for Preservation of Digital Resources
http://www.columbia.edu/cu/lweb/services/preservation/dlpolicy.html
Preservation Policy of “DiscoverArchive”
http://discoverarchive.vanderbilt.edu/bitstream/handle/1803/2361/PreservationPolicy.pdf?sequence=1
Moving Here: Digitisation Guidelines (Audio and Video) (See page 13)
http://www.movinghere.org.uk/help/documents/audiovideo_guidelines_2005.pdf
OCLC Digital Archive Preservation Policy and Supporting Documentation
http://www.oclc.org/support/documentation/digitalarchive/preservationpolicy.pdf