Archiving
archiving in context ▫ principles & processes ▫ examples
DocLing 2016
David Nathan

Archiving in context
Where does archiving fit in?
“traditionally”: education/research institutions
libraries, archives, museums and galleries are “memory institutions”

Archiving skill inputs (to THE ARCHIVE)
Sources: speakers/performers, authors, historical and “legacy” providers
Curators: content/area specialists, cataloguers
Recordists: audio and video experts, data collectors/annotators/analysts
Data managers: data scientists
Technical practitioners: IT, media & communications; IT practitioners, programmers, installers
Co-ordinators: managers, governance
IT systems & software: cataloguing, storage, preservation & access systems

A definition of archiving
a commitment by an organization to:
appraise the value of a resource
preserve the resource
make known the existence of the resource
enable access to the resource (or its ‘content’)

Archiving principles & processes
Diagram: Archiving comprises Acquisition & curation, Storage & preservation, and Access & usage, forming “the virtuous loop we hope to achieve through serving community and through community participation”.
Concerns common to all three areas: goals ▫ policies ▫ resourcing ▫ management ▫ documentation ▫ security ▫ usability ▫ organisation/technology changes ▫ evaluation & reporting

Acquisition & curation
Diagram labels include: creation, foster creation, evaluation & selection, seek resources, collect & record, work with providers, collaboration with providers & users, advice, rights & protocol, content agreements, description, metadata, provenance, completeness, formats, languages, change history, curation requirements, good practice, promotion, sharing & exhibiting, audiences, reach & help users, research outcomes, usages, funding & community, sustainability.

Storage & preservation
Diagram labels include: analogue (things), packing, environment, players, carrier formats, certification, A→D, digital storage, copy/backup, integrity, integrity check, media migration, hardware, digital formats, file formats, filenames, locations, catalogue, provider identifiers, metadata, documentation, number of users.

Access & usage
Diagram labels include: catalogue → acquisition, delivery, management, acquiring from users, user relationships, protocols, usability, monitoring, user capabilities, user needs, access methods, accuracy, completeness, record keeping, statistics & reports, communications (archive ↔ users, providers; providers ↔ users), research, costs, business model, formulation, implementation, negotiation, manage responses, share & exchange, community stakeholding.

Managing data and preparing for archiving

Software to help manage data and prepare for archiving
checking file names, sizes, folder structures etc. (Treesize, Everything)
changing or standardizing formats (especially of media files): Handbrake (video), Audacity (audio), XnView or paint.net (images), MS or Libre Office and Notepad++ (text)
creating and managing metadata: spreadsheets and databases, SIL’s SayMore, TLA’s Arbil, Miromaa

File formats: audio
WAV (what if original is not WAV??)
resolution: 16 bit, 44.1 kHz, stereo or better

File formats: video
changing frequently: MP4/MPEG-4 or MTS/H.264/AVCHD
aspect ratio, resolution: depends on project
get advice from the archive before depositing

File formats: images
TIFF OR the original format from the device
resolution: archive quality is 300 dpi or better

File formats: text
best is plain text
PDF/A often acceptable, but may pose problems
if MS Word or ODF, check with the archive
structured data (spreadsheets, databases): the original format should be supplied; provide a preservable derivative as well (e.g. CSV, PDF/A)
common linguistic software (ELAN, Transcriber, Toolbox, Praat etc.): their file formats are generally preservable

Can I still use MS Word?
most archives no longer accept MS Word files
but Word is still useful: quicker to type up; useful tables, functions, macros etc.
solutions:
think “text only”
tables as spreadsheets (are they bad too?)
(advanced) format complex materials with styles, then export as marked-up PDF/A – but not a perfect solution

Standards
we have already mentioned some standards – UTF-8, WAV etc.
there are other relevant standards, e.g. ISO 639-3 (language/dialect names)
metadata systems – OLAC, CMDI, METS/MODS and others
you can also establish project-local standards, e.g. to handle special characters (e.g. \e = schwa) or data field names
document them! – for your own usage and for correspondence to wider standards

Approaches to small-scale archive storage/backup
work with a large institution that can support/sponsor your storage/backup needs
partner with a number of similar centres to achieve a critical mass of materials and resources, and set up replication or a data centre
set up local storage/backup using a creative “appropriate technology” approach (e.g. a NAS unit with offsite replication to HD, SSD, tape or cloud)
use a commercial (cloud) provider (also a hybrid version – “cloud gateway”)

Examples

Archive examples – Aboriginal languages/protocol emphasis
http://www.atsida.edu.au/ (Aboriginal and Torres Strait Islander Data Archive) – research data related to Indigenous Australia; emphasis on return of Indigenous knowledge; can assist communities with repatriation, hosting and distribution
http://mira.canningstockrouteproject.com/ – an archive based on Mukurtu CMS; emphasis on culturally appropriate and controlled access and usage (see also http://plateauportal.wsulibs.wsu.edu/html/ppp/index.php)
http://elar.soas.ac.uk/ (Endangered Languages Archive) – international language documentation archive with 20 Australian deposits (http://elar.soas.ac.uk/deposit/0019); emphasis on protocol-based and negotiated access to recordings and annotations

Archive examples – Aboriginal languages
http://catalogue.aiatsis.gov.au/client/en_AU/external/ (AIATSIS) – archive and library catalogues merged into “Mura”; largest archive, but limited operationally
http://catalog.paradisec.org.au/ (Paradisec) – Pacific and regional, but with much Australian content; emphasis on digitization
http://laal.cdu.edu.au/ (Living Archive of Aboriginal Languages) – community-created literature gathered and “rescued” after the end of support for bilingual education; emphasis on an easy-to-use but powerful interface

Archive examples – records institutions
http://www.sro.wa.gov.au/archive-collection/collection/aboriginal-records (State Records Office WA) – demographic, school and other records
http://www.newnorcia.wa.edu.au/education-and-research/archives/ – archives of missionary correspondence, records and registers
https://www.library.uq.edu.au/fryer-library/ms/Flint/flint_cat_preface.html (Flint collection, UQ library) – emphasis on providing awareness of (audio and written) materials

In development or not publicly available
http://www.irititja.com/ – created by Pitjantjatjara Council to repatriate digital versions of cultural/community materials and to manage access to them (see also http://www.rightside.com.au/ara-irititja-kms); emphasis on usability by remote communities and detailed control of access
http://artsandmuseums.nt.gov.au/northern-territory-library/programs-andprojects/our_story_version_2_project – Community Stories, a version of Ara Irititja, enabling communities to establish digital collections by creating, adding and repatriating content related to their own culture and history
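The first task on the “Software to help manage data and prepare for archiving” slide, checking file names, sizes and folder structures, can also be scripted. The Python sketch below is a minimal illustration, not a replacement for Treesize or Everything: it walks a folder, reports each file's relative path and size, and flags names that break a filename rule. The rule itself (ASCII letters, digits, hyphen, underscore and dot only) and the function names `survey` and `total_size` are assumptions for illustration; the receiving archive's own naming requirements take precedence.

```python
import os
import re
from pathlib import Path

# Hypothetical project rule for "safe" file names: ASCII letters, digits,
# hyphen, underscore and dot. Check what your archive actually requires.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")

def survey(root):
    """Return (relative path, size in bytes, name_ok) for every file under root."""
    root = Path(root)
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a predictable order
        for name in sorted(filenames):
            path = Path(dirpath) / name
            rows.append((
                path.relative_to(root).as_posix(),  # e.g. "audio/session_01.wav"
                path.stat().st_size,
                bool(SAFE_NAME.match(name)),
            ))
    return rows

def total_size(rows):
    """Sum the sizes reported by survey()."""
    return sum(size for _, size, _ in rows)
```

For example, `[p for p, _, ok in survey("my_deposit") if not ok]` (with a hypothetical folder `my_deposit`) lists the files to rename before depositing.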
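The audio recommendation on the file formats slides (WAV, 16 bit, 44.1 kHz, stereo or better) can be checked automatically before depositing. This sketch uses Python's standard `wave` module, which reads uncompressed PCM WAV headers; files it cannot parse (for example some floating-point or compressed WAV variants) are reported as unreadable rather than judged, so they still need checking with other tools or with the archive. The thresholds mirror the slide; the name `check_wav` is hypothetical.

```python
import wave

# Minimums taken from the slide: 16 bit, 44.1 kHz, stereo or better.
MIN_SAMPLE_WIDTH_BYTES = 2   # 2 bytes per sample = 16 bit
MIN_RATE_HZ = 44_100
MIN_CHANNELS = 2

def check_wav(path):
    """Return a list of problems; an empty list means the minimums are met."""
    try:
        with wave.open(str(path), "rb") as w:
            width = w.getsampwidth()
            rate = w.getframerate()
            channels = w.getnchannels()
    except (wave.Error, EOFError) as exc:
        # Not RIFF/WAVE, truncated, or a WAV variant the wave module cannot read.
        return [f"not a readable PCM WAV file: {exc}"]
    problems = []
    if width < MIN_SAMPLE_WIDTH_BYTES:
        problems.append(f"{width * 8}-bit samples (want 16 bit or more)")
    if rate < MIN_RATE_HZ:
        problems.append(f"{rate} Hz sample rate (want 44100 Hz or more)")
    if channels < MIN_CHANNELS:
        problems.append(f"{channels} channel(s) (want stereo or more)")
    return problems
```

A file that fails only on channel count may be a deliberate mono field recording; whether that is acceptable is a question for the archive, not the script.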
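The “Standards” slide suggests project-local conventions such as typing \e for schwa. One way to keep such codes out of archived files is to convert them to real Unicode characters and save the result as UTF-8. In this sketch only the \e → ə (U+0259) mapping comes from the slide; the table is meant to hold your own documented codes. The function names `to_unicode` and `convert_file`, the assumption that input files are already UTF-8, and the added NFC normalisation step are all illustrative choices.

```python
import re
import unicodedata
from pathlib import Path

# Project-local conventions: typed code -> Unicode character.
# Only "\e" = schwa comes from the slide; add your project's documented codes.
CONVENTIONS = {
    "\\e": "\u0259",  # ə LATIN SMALL LETTER SCHWA
}

def to_unicode(text, conventions=CONVENTIONS):
    """Replace project-local codes with Unicode characters, then NFC-normalise."""
    if not conventions:
        return unicodedata.normalize("NFC", text)
    # Try longer codes first so a longer code is never split by a shorter one.
    codes = sorted(conventions, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(code) for code in codes))
    converted = pattern.sub(lambda m: conventions[m.group(0)], text)
    return unicodedata.normalize("NFC", converted)

def convert_file(src, dst, conventions=CONVENTIONS):
    """Read src (assumed UTF-8), convert the codes, write dst as UTF-8."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(to_unicode(text, conventions), encoding="utf-8")
```

Ordering codes longest-first matters once a project has codes that share a prefix; the mapping table itself is the documentation the slide asks for.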
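The “integrity check” item in the Storage & preservation diagram is commonly implemented with checksums: record a digest for every file when it enters storage, then recompute and compare after copying, backup or media migration. This sketch uses SHA-256 from Python's `hashlib`. The choice of algorithm, the manifest layout (digest, two spaces, relative path, similar to the output of the `sha256sum` utility) and the function names are assumptions, not a requirement of any particular archive.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large media files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def make_manifest(root):
    """Map each file's relative path (forward slashes) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def write_manifest(manifest, out_path):
    """Write one '<digest>  <relative path>' line per file, sorted by path."""
    lines = [f"{digest}  {name}\n" for name, digest in sorted(manifest.items())]
    Path(out_path).write_text("".join(lines), encoding="utf-8")

def verify(root, manifest):
    """Return (missing, changed): relative paths absent or altered since hashing."""
    root = Path(root)
    missing, changed = [], []
    for name, digest in manifest.items():
        path = root / name
        if not path.is_file():
            missing.append(name)
        elif sha256_of(path) != digest:
            changed.append(name)
    return missing, changed
```

Keep the manifest file outside the folder it describes (or exclude it); otherwise it would be listed, and then reported as changed, on the next run.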