Exploring a million hours of sounds

Richard Ranft, The British Library
27 November 2014
Search Solutions 2014
Outline
• the British Library’s audio collections
• discovery and access
• finding one in a million
www.bl.uk
The British Library’s audio collections
• originated in 1955
• national collection of UK record industry
• selected publications from overseas
• radio broadcasts
• unpublished recordings
Subjects
• music
• spoken word
• environments & nature
Extent
• 6 million tracks
• from 1857 to this morning
• many formats
• 115 years of listening
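As a rough check on these figures, the sketch below converts a million hours of audio into years of continuous playback and into an average track length. The constant names are illustrative, and the round "million hours" from the talk title is assumed to be approximate.

```python
# Figures taken from the slides; both are rounded, so results are approximate.
HOURS_IN_COLLECTION = 1_000_000   # "a million hours" (talk title)
TRACKS_IN_COLLECTION = 6_000_000  # "6 million tracks"

HOURS_PER_YEAR = 24 * 365.25      # average year length, including leap years


def years_of_listening(hours):
    """Years needed to play `hours` of audio back to back, without pause."""
    return hours / HOURS_PER_YEAR


def average_track_minutes(hours, tracks):
    """Mean track duration in minutes."""
    return hours * 60 / tracks


years = years_of_listening(HOURS_IN_COLLECTION)
minutes = average_track_minutes(HOURS_IN_COLLECTION, TRACKS_IN_COLLECTION)
print(f"{years:.1f} years of continuous listening")
print(f"{minutes:.0f} minutes per track on average")
```

Exactly one million hours gives a little over 114 years, so the slide's figure of 115 years is consistent with a collection slightly larger than a million hours.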
Obstacles to exploring and access
• copyrights
• analogue or offline digital
• many non-digital tracks
• time-based = time consuming
• limited, text-based search
• no serendipity
• high expectations (cf. iTunes, Spotify)
Online consumer audio services
• ‘opacity’ of audio (no freeze-frames!)
Human-led enrichment
• description
• transcription
• annotation
• category tagging
• rating, recommendation & review
Machine enrichment/search
• Categorisation:
– Music genre, language/dialect detection, mood
• Synchronisation:
– Score following
– Transcript following
• Identification:
– Speaker/vocalist ID
– Melody recognition
– Query by humming/tapping
• Non-text browsing:
– Map browse
– Timeline browse
• Recommendation & matching:
– Melody matching
• Cross-media linking:
– Speaker/tune matching
• Feature extraction:
– Pitch, tempo, chord, time signature, rhythm
• Segmentation/event detection:
– Music/speech segments
– Speaker/lead instrument change
– Laughter, applause, emotion detection
• Transcription:
– Speech-to-text
– Score generation
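To make "query by humming" and melody matching concrete, here is a minimal sketch of one well-known approach: reduce each melody to its up/down/repeat contour (Parsons code) and rank catalogue tunes by edit distance to the query. The two-tune catalogue, the function names and the use of MIDI note numbers are assumptions made for illustration; the slides do not say which technique any BL system uses.

```python
def parsons_code(pitches):
    """Reduce a pitch sequence to its contour: U(p), D(own) or R(epeat)."""
    code = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            code.append("U")
        elif cur < prev:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)


def edit_distance(a, b):
    """Levenshtein distance between two strings, by dynamic programming."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # substitution
        prev_row = row
    return prev_row[-1]


def best_match(query_pitches, catalogue):
    """Return the catalogue title whose contour is closest to the query."""
    query = parsons_code(query_pitches)
    return min(
        catalogue,
        key=lambda title: edit_distance(query, parsons_code(catalogue[title])),
    )


# Hypothetical catalogue: opening notes as MIDI note numbers.
CATALOGUE = {
    "Twinkle Twinkle Little Star": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
}

# A hummed query in a different key, with a slightly sharp final note.
hummed = [62, 62, 69, 69, 71, 71, 70]
print(best_match(hummed, CATALOGUE))
```

Because only the direction of each interval is compared, the match is insensitive to the key the user hums in and tolerant of imprecise pitch, at the cost of being unable to separate tunes that share a contour.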
Discovery and access
• Sound & Moving Image Catalogue
sami.bl.uk
• onsite listening:
– Appointments service
– SoundServer (200,000 tracks, 3% of total)
• offsite listening:
– BL Sounds website (50,000 tracks, 1%)
• streaming
• downloading
Sound & Moving Image Catalogue
sami.bl.uk
BL Sounds
• Improving access and discovery
• http://sounds.bl.uk/
Visualisation and analysis
Current BL projects
• ‘Metable’ software: acquire / describe UK’s digital music,
searching via APIs across open music databases
(MusicBrainz, Decibel, Discogs)
• COMMA: cloud-based media analysis project with BBC
http://www.bbc.co.uk/rd/projects/comma
• Digital Music Lab: analysing and visualising big music data
collections http://dml.city.ac.uk/
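As an illustration of searching an open music database through its API, the sketch below builds a recording search URL for the public MusicBrainz web service. It is not taken from the Metable software; the function name and the example title and artist are hypothetical, and the request itself is left to the caller.

```python
from urllib.parse import urlencode

# Root of the MusicBrainz web service, version 2.
MUSICBRAINZ_ROOT = "https://musicbrainz.org/ws/2"


def recording_search_url(title, artist, limit=5):
    """Build a MusicBrainz recording search URL that asks for JSON results."""
    # Field-qualified search query: match on recording title and artist name.
    query = f'recording:"{title}" AND artist:"{artist}"'
    params = urlencode({"query": query, "fmt": "json", "limit": limit})
    return f"{MUSICBRAINZ_ROOT}/recording/?{params}"


# Hypothetical lookup; a real client should also send a descriptive
# User-Agent header and respect the service's rate limits.
url = recording_search_url("Example Song", "Example Artist")
print(url)
```

Fetching the URL and comparing the candidates returned by several databases would be the next step; how Metable reconciles such results is not described in the slides.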
Digital Music Lab example
Chord detection using Chordino VAMP Plugin (Queen Mary University of London)
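The basic idea behind chord detection can be sketched with simple template matching: summarise a short stretch of audio as a 12-bin chroma vector (energy per pitch class) and pick the major or minor triad whose template overlaps it most. This is a deliberately simplified illustration, not a description of how the Chordino plugin works internally; the chroma values below are made up.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_templates():
    """Binary chroma templates for the 12 major and 12 minor triads."""
    templates = {}
    for root in range(12):
        # A major triad has a third 4 semitones above the root, a minor one 3.
        for suffix, third in (("", 4), ("m", 3)):
            template = [0.0] * 12
            for interval in (0, third, 7):  # root, third, fifth
                template[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = template
    return templates


def detect_chord(chroma):
    """Return the triad whose template has the largest dot product with chroma."""
    templates = chord_templates()

    def score(name):
        return sum(c * t for c, t in zip(chroma, templates[name]))

    return max(templates, key=score)


# Made-up chroma vectors, indexed C, C#, D, ..., B.
c_major_like = [1.0, 0.0, 0.1, 0.0, 0.8, 0.0, 0.0, 0.9, 0.0, 0.1, 0.0, 0.0]
a_minor_like = [0.9, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

print(detect_chord(c_major_like))
print(detect_chord(a_minor_like))
```

Running this frame by frame over a recording yields a chord sequence; practical systems add smoothing over time so that isolated noisy frames do not produce spurious chord changes.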
English conversation: At the Tobacconist's (1929)
Linguaphone 78rpm shellac disc
http://sounds.bl.uk/Arts-literature-and-performance/Early-spoken-word-recordings/024M-1CS0011556XX-0200V0
Thanks for listening!
richard.ranft@bl.uk
http://sounds.bl.uk
@soundarchive