CAVA
Human Communication Audio-Visual Archive
Co-funded by UCL and the JISC (Joint Information Systems Committee)
April 2009 – March 2010
Martin Moyle, UCL Library Services
Merle Mahon, Developmental Science, UCL
Suzanne Beeke, Language and Communication, UCL
Background
• Much research in human communication is based on experimental
data, but a different understanding comes from examining natural
audio-visual data
• Despite advances in online resources there is no centralised AV
archive in the UK to support such work and to share data
– UKDA (ESDS Qualidata): no video
– TalkBank: prescribed formats for accompanying data
• A large group of cross-disciplinary researchers at UCL investigate
typical and atypical communication, in children and in adults
– Developmental Science and Language & Communication Research
Departments
– Much video data already accumulated, in different formats and at various
stages of analysis
• We set out to establish a data archive for ourselves
Pilot work: UCL Research Challenges Award
2007-2009
• found collaborators
– Martin Moyle, Digital Curation Manager, UCL Library Services
– Libby Bishop, ESDS Qualidata & UK Data Archive
• carried out a feasibility study to explore hardware and
software issues
• set up a pilot to demonstrate access to one dataset
• started to address metadata issues
• prepared a grant proposal to fund the setting up of CAVA
The CAVA team
• Martin Moyle, Digital Curation Manager, UCL Library Services; Project
Manager
• Dr Merle Mahon, Senior Lecturer, UCL Developmental Science
• Dr Suzanne Beeke, Head of Department, UCL Language and
Communication
• Dr Libby Bishop, Manager, Economic and Social Data Service (ESDS)
Qualidata (Essex University)
• Dr Paul Ayris, Director of UCL Library Services and UCL Copyright
Officer (Chair of Project Steering Group)
• Stevie Russell, Site Librarian, UCL Language and Speech Science
Library
CAVA project aims
• establish a digital video repository for human communication sciences,
initially populated with an existing (and growing) body of rights-cleared
digital content owned by UCL researchers
• house this within the UCL Library Services Digital Collections service
which uses the Ex Libris DigiTool repository platform
• catalogue each video to a discipline-specific descriptive standard, IMDI
• deposit transcripts and other supporting material wherever available
• develop procedures and processes for managing access (restricted to
bona fide researchers)
• look at options for long-term digital preservation of the master files,
with the help of UKDA
Current digital material for inclusion
• Past projects
– Children 141 hours
– Adults 32 hours
• Ongoing projects
– Adults 58 hours
– Children 7 hours
– British Sign Language Corpus 360 hours
• Contributors from UCL and other institutions…
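The hours listed above can be totalled with a short calculation. The sketch below is illustrative only: the dictionary keys and function names are made up for this example, and the storage estimate takes a per-hour figure as a parameter because the slides do not state file sizes or bitrates.

```python
# Hours of digital material, copied from the list above.
HOURS = {
    "past": {"children": 141, "adults": 32},
    "ongoing": {"adults": 58, "children": 7, "bsl_corpus": 360},
}


def total_hours(hours=HOURS):
    """Sum the hours across past and ongoing projects."""
    return sum(sum(group.values()) for group in hours.values())


def estimated_storage_gb(hours, gb_per_hour):
    """Rough storage estimate.

    gb_per_hour is an assumption supplied by the caller; it depends on the
    codec and resolution actually used, which the slides do not specify.
    """
    return hours * gb_per_hour


if __name__ == "__main__":
    print("Total hours:", total_hours())
```

Calling `estimated_storage_gb(total_hours(), gb_per_hour)` with a measured figure for the master files and again for the access copies would show how far apart the two storage requirements are.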
Data from past projects
Data                                         Source (institution/researcher/funding)       Hours
Deaf children & teachers                     UCL/Mahon/Department of Health                45
Deaf children & parents                      UCL/Mahon/ESRC                                6
Children with language disorder & teachers   Institute of Education/Radford/PhD            14
Persons with autism-teacher interaction      Roehampton/Rae, Dickerson & Stribling/ESRC    16
Typically developing toddlers & parent       UCL/Corrin/PhD                                60
Typically developing toddlers & parent       Canterbury/Forrester/ESRC                     12
Children using AAC & peer                    UCL/Clarke/PhD                                4
People with MND & spouse                     UCL/Bloch/PhD                                 6
Data expected from ongoing projects
Data                                         Source (institution/researcher/funding)       Hours
Aphasia therapy                              UCL/Beeke/Stroke Association                  13
Adults with neurological disease             UCL/S. Bloch/NHS HIHR/PI                      45
British Sign Language Corpus project         UCL/Schembri/ESRC                             360
Deaf children                                UCL/Mahon/British Academy                     7
Example of data:
Deaf children & teachers
UCL/Mahon/Department of Health
Example of data:
Children using AAC & peer
UCL/Clarke/PhD
Example of data:
Aphasia therapy
UCL/Beeke/Stroke Association
Content creation
• Get video data
• Digitise it to avi for preservation
• Check consent
• Make copy to mpeg 3
• Get accompanying data
– IMDI metadata
– transcripts
• Upload to Repository
• Assign rights statement / licence
Consent and data protection issues
• Consent
– Retrospective
– Prospective
• guidelines for depositors based on recent successful approval via UCL
Research Ethics Committee/NHS multi-site ethics process
– Data Protection Act (1998)
• Adults and children
– renewed consent at 18 years old
– death of participant
• Authorising access for bona fide researchers
– Item-specific rights
– Application, authorisation and authentication procedures needed
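To make the issues above concrete, here is a minimal sketch of how an access check might combine them. It is not the project's procedure: the `Recording` fields, the function names, and the rule that a recording is withheld once a child participant reaches 18 without renewed consent are all assumptions made for illustration. The slides raise these points as issues and do not state how they are resolved.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Recording:
    """Hypothetical record of the consent facts held for one video."""
    title: str
    consent_on_file: bool
    # Birth date of a participant recorded as a child; None for adults.
    child_birth_date: Optional[date] = None
    consent_renewed_as_adult: bool = False


def age_on(birth_date: date, today: date) -> int:
    """Age in whole years on the given day."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def needs_renewed_consent(rec: Recording, today: date) -> bool:
    """True if a child participant is now 18 or over and has not renewed consent."""
    if rec.child_birth_date is None:
        return False
    is_adult_now = age_on(rec.child_birth_date, today) >= 18
    return is_adult_now and not rec.consent_renewed_as_adult


def may_view(rec: Recording, user_is_authorised: bool, today: date) -> bool:
    """Assumed rule: consent on file, authorised user, no renewal outstanding."""
    return (
        rec.consent_on_file
        and user_is_authorised
        and not needs_renewed_consent(rec, today)
    )
```

In this sketch `user_is_authorised` stands in for the whole application, authorisation and authentication procedure, which would be handled outside the function.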
The CAVA repository
• Will use UCL’s DigiTool repository platform
– http://digital-collections.lib.ucl.ac.uk
• Metadata is openly searchable; video resources
will have access restrictions
• Built-in technical metadata extraction (using
JHOVE), checksums, change history metadata;
access control capabilities (IP and/or username)
• Front-end: quick overview…
– Keyword search…
– A “CAVA” link will also appear here, allowing browse navigation.
– Click title to see full record…
– Two files are associated with this record (1 mpeg, 1 pdf)
– Access is restricted. Authorised users may log in to view resources
– Full IMDI record: IMDI metadata is fully indexed (transcripts are also
indexed in full)
– Click-through copyright reminder (optional); can be customised for
different classes of resource
– Click icon to switch between video and transcript…
– Video is delivered to the user’s browser. It can then be saved, edited,
analysed, etc. locally.
– Side-by-side viewing of video and transcript is possible… if we create an
additional copy of the video in streaming format
Technical issues
• Interface improvements required
– browse navigation
– dedicated CAVA repository site, with DigiTool functionality embedded
• IMDI profile
• Formats
– supply additional streaming format?
• Access and rights management
– what technical work is needed to underpin access and rights management
procedures?
• Long-term preservation
– the ideal would be to retain the uncompressed master files in a managed
environment, but these exceed current storage capacity.
• will work with UK Data Archive on options
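Whatever storage option is chosen for the master files, their integrity can be checked over time by comparing checksums. The sketch below shows the general technique using Python's standard `hashlib`; it is not a description of how DigiTool computes its built-in checksums, and the function names and the choice of SHA-256 are assumptions for this example.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks so that large
    video files do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_hex, algorithm="sha256"):
    """True if the file's current checksum matches the one recorded at deposit."""
    return file_checksum(path, algorithm) == expected_hex.lower()
```

A checksum recorded when a master file is deposited can be compared against a fresh one after any transfer or migration; a mismatch indicates the file has changed.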
Post-project issues
• Sustaining maintenance and growth?
– ongoing costs of access management, storage, user licences for
repository, support/training, etc
• Continuing deposit by UCL researchers is foreseen
– DIY or mediated?
• Long term future
– Deposit by non-UCL researchers?
– Possible synergies with UKDA
• Exit strategy will be required…
Further information
• CAVA website
– http://www.ucl.ac.uk/ls/cava
• UCL Digital Collections
– http://digital-collections.lib.ucl.ac.uk
• DigiTool
– http://www.exlibrisgroup.com/category/DigiToolOverview
• Project Team
– lib-cava@ucl.ac.uk