CAVA
Human Communication Audio-Visual Archive

Co-funded by UCL and the JISC (Joint Information Systems Committee)
April 2009 – March 2010

Martin Moyle, UCL Library Services
Merle Mahon, Developmental Science, UCL
Suzanne Beeke, Language and Communication, UCL

Background
• Much research in human communication is based on experimental data, but a different understanding comes from examining natural audio-visual data
• Despite advances in online resources there is no centralised AV archive in the UK to support such work and to share data
  – UKDA (ESDS Qualidata): no video
  – TalkBank: prescribed formats for accompanying data
• A large group of cross-disciplinary researchers at UCL investigate typical and atypical communication, in children and in adults
  – Developmental Science and Language & Communication Research Departments
  – Much video data already accumulated, different formats, various stages of analysis etc.
• We set out to try to establish a data archive for ourselves

Pilot work: UCL Research Challenges Award 2007-2009
• found collaborators
  – Martin Moyle, Digital Curation Manager, UCL Library Services
  – Libby Bishop, ESDS Qualidata & UK Data Archive
• carried out a feasibility study to explore hardware and software issues
• set up a pilot to demonstrate access to one dataset
• started to address metadata issues
• prepared a grant proposal to fund the setting up of CAVA

The CAVA team
• Martin Moyle, Digital Curation Manager, UCL Library Services; Project Manager
• Dr Merle Mahon, Senior Lecturer, UCL Developmental Science
• Dr Suzanne Beeke, Head of Department, UCL Language and Communication
• Dr Libby Bishop, Manager, Economic and Social Data Service (ESDS) Qualidata (Essex University)
• Dr Paul Ayris, Director of UCL Library Services and UCL Copyright Officer (Chair of Project Steering Group)
• Stevie Russell, Site Librarian, UCL Language and Speech Science Library

CAVA project aims
• establish a digital video repository for human communication sciences,
  initially populated with an existing (and growing) body of rights-cleared digital content owned by UCL researchers
• house this within the UCL Library Services Digital Collections service which uses the Ex Libris DigiTool repository platform
• catalogue each video to a discipline-specific descriptive standard, IMDI
• deposit transcripts and other supporting material wherever available
• develop procedures and processes for managing access (restricted to bona fide researchers)
• look at options for long-term digital preservation of the master files, with help of UKDA

Current digital material for inclusion
• Past projects
  – Children 141 hours
  – Adults 32 hours
• Ongoing projects
  – Adults 58 hours
  – Children 7 hours
  – British Sign Language Corpus 360 hours
• Contributors from UCL and other institutions…

Data from past projects
Data                                        Project                                     Hours
Deaf children & teachers                    UCL/Mahon/Department of Health              45
Deaf children & parents                     UCL/Mahon/ESRC                              6
Children with language disorder & teachers  Institute of Education/Radford/PhD          14
Persons with autism-teacher interaction     Roehampton/Rae, Dickerson & Stribling/ESRC  16
Typically developing toddlers & parent      UCL/Corrin/PhD                              60
Typically developing toddlers & parent      Canterbury/Forrester/ESRC                   12
Children using AAC & peer                   UCL/Clarke/PhD                              4
People with MND & spouse                    UCL/Bloch/PhD                               6

Data expected from ongoing projects
Data                                        Project                                     Hours
Aphasia therapy                             UCL/Beeke/Stroke Association                13
Adults with neurological disease            UCL/S.Bloch/NHS HIHR/PI                     45
British Sign Language Corpus project        UCL/Schembri/ESRC                           360
Deaf children                               UCL/Mahon/British Academy                   7

Example of data: Deaf children & teachers, UCL/Mahon/Department of Health

Example of data: Children using AAC & peer, UCL/Clarke/PhD

Example of data: Aphasia therapy, UCL/Beeke/Stroke Association

Content creation
• Get video data
• Digitise it to avi for preservation
• Check consent
• Make copy to mpeg 3
• Get accompanying data
  – IMDI metadata
  – transcripts
• Upload to Repository
• Assign rights statement / licence

Consent and data protection issues
• Consent
  – Retrospective
  – Prospective
    • guidelines for depositors based on recent successful approval via UCL Research Ethics Committee/NHS multi-site ethics process
  – Data Protection Act (1998)
    • Adults and children
      – renewed consent at 18 years old
      – death of participant
• Authorising access for bona fide researchers
  – Item-specific rights
  – Application, authorisation and authentication procedures needed

The CAVA repository
• Will use UCL’s DigiTool repository platform
  – http://digital-collections.lib.ucl.ac.uk
• Metadata is openly searchable; video resources will have access restrictions
• Built-in technical metadata extraction (using JHOVE), checksums, change history metadata; access control capabilities (IP and/or username)
• Front-end: quick overview…

Keyword search…
“CAVA” link will also appear here. This will allow browse navigation.

Click title to see full record…

Two files are associated with this record (1 mpeg, 1 pdf)
Access is restricted. Authorised users may log in to view resources

Full IMDI record
IMDI metadata is fully indexed (transcripts are also indexed in full)

Click-through copyright reminder (optional)
Can be customised for different classes of resource

Click icon to switch between video and transcript…
Video is delivered to user’s browser. It can then be saved, edited, analysed, etc. locally.

Side by side viewing of video and transcript is possible…
…if we create an additional copy of the video in streaming format

Technical issues
• Interface improvements required
  – browse navigation
  – dedicated CAVA repository site, with DigiTool functionality embedded
• IMDI profile
• Formats
  – supply additional streaming format?
• Access and rights management
  – what technical work is needed to underpin access and rights management procedures?
• Long-term preservation
  – the ideal would be to retain the uncompressed master files in a managed environment, but these exceed current storage capacity.
  – will work with UK Data Archive on options

Post-project issues
• Sustaining maintenance and growth?
  – ongoing costs of access management, storage, user licences for repository, support/training, etc.
• Continuing deposit by UCL researchers is foreseen
  – DIY or mediated?
• Long-term future
  – Deposit by non-UCL researchers?
  – Possible synergies with UKDA
• Exit strategy will be required…

Further information
• CAVA website
  – http://www.ucl.ac.uk/ls/cava
• UCL Digital Collections
  – http://digital-collections.lib.ucl.ac.uk
• DigiTool
  – http://www.exlibrisgroup.com/category/DigiToolOverview
• Project Team
  – lib-cava@ucl.ac.uk
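
Illustration: checksums and master files
The repository slide lists checksums among the built-in features, and the preservation slide concerns keeping master files intact over time. The sketch below shows the general idea of a fixity check: record a checksum at deposit and recompute it later to detect change. It is a minimal illustration only. The choice of SHA-256 and the function names file_checksum and verify_fixity are assumptions made for this example; how DigiTool itself computes or stores checksums is not described in these slides and may differ.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat for large video masters.
    SHA-256 is an assumption for this sketch, not a documented CAVA choice.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded: str) -> bool:
    """Return True if the file still matches the checksum recorded at deposit."""
    return file_checksum(path) == recorded.lower()
```

In use, the digest returned by file_checksum would be stored alongside the item's metadata at upload, and verify_fixity would be run periodically against the stored value; a False result signals that the file has changed or been corrupted since deposit.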