(a.k.a. Music Representation, Searching, and Retrieval)
Donald Byrd
School of Informatics & School of Music
Indiana University
16 January 2007
1. Introduction and Motivation
2. Basic Representations
3. Why is Musical Information Hard to Handle?
4. Music vs. Text and Other Media
5. OMRAS and Other Projects
6. Summary
rev. Jan. 2006
• Three basic forms (representations) of music are important
– Audio: most important for most people (general public)
• All Music Guide (www.allmusicguide.com) has info on >>230,000 CDs
– MIDI files: often best or essential for some musicians, especially for pop, rock, film/TV
• Hundreds of thousands of MIDI files on the Web
– CMN (Conventional Music Notation): often best, sometimes essential for musicians (even amateurs) and music researchers
• Music holdings of Library of Congress: over 10M items
– Includes over 6M pieces of sheet music and tens/hundreds of thousands of scores of operas, symphonies, etc.: all notation, especially Conventional Music Notation (CMN)
• Differences among the forms are profound
Digital Audio (e.g., CD, MP3): like speech
Time-stamped Events (e.g., MIDI file): like unformatted text
Music Notation: like text with complex formatting
Basic Representations of Music & Audio

Representation | Audio | Time-stamped Events | Music Notation
Common examples | CD, MP3 file | Standard MIDI File | Sheet music
Unit | Sample | Event | Note, clef, lyric, etc.
Explicit structure | none | little (partial voicing information) | much (complete voicing information)
Avg. rel. storage | 2000 | 1 | 10
Convert to left | - | OK job: easy; good job: hard | OK job: easy; good job: hard
Convert to right | 1 note: pretty easy; other: hard or very hard | OK job: hard | -
Ideal for | music, bird/animal sounds, sound effects, speech | music | music
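A little arithmetic gives a rough sense of where the storage gap between audio and time-stamped events comes from. The sketch below assumes uncompressed CD-quality audio (44.1 kHz, 16-bit, stereo) and a deliberately simplified MIDI encoding (3-byte note-on and note-off messages, each preceded by a 1-byte delta time). The function names are illustrative, and the result covers one isolated one-second note, not the average figures quoted above.

```python
# Rough storage comparison for a single one-second note (illustrative only).

CD_SAMPLE_RATE = 44_100   # samples per second per channel (CD audio)
CD_CHANNELS = 2           # stereo
CD_BYTES_PER_SAMPLE = 2   # 16-bit samples


def audio_bytes(seconds: float) -> int:
    """Bytes of uncompressed CD-quality audio for the given duration."""
    return round(seconds * CD_SAMPLE_RATE * CD_CHANNELS * CD_BYTES_PER_SAMPLE)


def midi_note_bytes() -> int:
    """Bytes for one note stored as a note-on plus note-off pair.

    Assumption: 3 bytes per message plus a 1-byte delta time each.
    Real MIDI files vary (running status, longer delta times, headers).
    """
    return 2 * (3 + 1)


one_second_audio = audio_bytes(1.0)
one_note_events = midi_note_bytes()
print(one_second_audio, one_note_events, one_second_audio // one_note_events)
```

The ratio for this toy case is far larger than the average relative storage figures, which describe real music with many notes per second.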
• Four basic parameters of a definite-pitched musical note
1. pitch: how high or low the sound is: perceptual analog of frequency
2. duration: how long the note lasts
3. loudness: perceptual analog of amplitude
4. timbre or tone quality
• Above is decreasing order of importance for most Western music
• …and decreasing order of explicitness in CMN!
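To make "pitch is the perceptual analog of frequency" concrete, here is a minimal sketch that maps a pitch to a frequency. It assumes MIDI note numbers, twelve-tone equal temperament, and A4 = 440 Hz; none of these conventions come from the slides, and the function name is hypothetical.

```python
def midi_to_hz(note_number: int, a4_hz: float = 440.0) -> float:
    """Frequency in Hz of a MIDI note number.

    Assumptions: equal temperament, and MIDI note 69 (A4) tuned to a4_hz.
    Each semitone multiplies the frequency by 2 ** (1 / 12).
    """
    return a4_hz * 2.0 ** ((note_number - 69) / 12)


for name, number in [("C4", 60), ("A4", 69), ("A5", 81)]:
    print(name, round(midi_to_hz(number), 2))
```

Equal steps in pitch correspond to equal ratios in frequency, so an octave (12 semitones) always doubles the frequency.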
• CMN shows at least six aspects of music:
– NP1. Pitches (how high or low): on vertical axis
– NP2. Durations (how long): indicated by note/rest shapes
– NP3. Loudness: indicated by signs like p, mf, etc.
– NP4. Timbre (tone quality): indicated with words like “violin”, “pizzicato”, etc.
– Start times: on horizontal axis
– Voicing: mostly indicated by staff; in complex cases also shown by stem direction, beams, etc.
• See “Essentials of Music Reading” musical example.
[Diagram: Query => (understanding) => Query concepts; Database => (understanding) => Database concepts; Query concepts and Database concepts => (matching) => Results]
• What user wants is almost always concepts…
• But computer can only recognize words
[Diagram: Query => (no understanding) => stemming, stopping, query expansion, etc.; Database => (no understanding); both => (matching) => Results]
• “Stemming, stopping, query expansion” are all tricks to increase precision & recall (avoid false positives & false negatives) in the face of synonyms, variant forms of words, etc.
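A toy sketch of two of these tricks, plus the two measures they aim to improve, may help. The stop-word list and suffix rules below are hypothetical and far cruder than what real text-IR systems use; they only illustrate the idea of matching on normalized word forms.

```python
# Hypothetical, deliberately tiny stop-word list and suffix rules.
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "in"}
SUFFIXES = ("ing", "ed", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> set[str]:
    """Lowercase, drop stop words ("stopping"), and stem what remains."""
    return {stem(w) for w in text.lower().split() if w not in STOP_WORDS}


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


print(normalize("The fugues of Bach"))        # variant form "fugues" maps to "fugue"
print(precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"}))
```

Stemming mainly helps recall, because variant word forms now match. Low precision means many false positives, and low recall means many false negatives.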
1. Units of meaning: not clear there are any—assuming music even has meaning! (all representations)
2. Polyphony: “parallel” independent voices, something like characters in a play (all representations)
3. Recognizing notes (audio only)
4. Other reasons
– Musician-friendly I/O is difficult
– Diversity: of styles of music, of people interested in music
• Handling text information nearly always via words
– “What we want is concepts; what we have is words”
• Not clear anything in music is analogous to words
– No explicit delimiters (like Chinese)
– Experts don’t agree on “word” boundaries (unlike Chinese)
– Music is always art => “meaning” much more subtle!
• Are notes like words?
• No. Relative, not absolute, pitch is important
• Are pitch intervals like words?
• No. They’re too low level: more like characters
rev. Jan. 2007
• Are pitch-interval sequences like words?
• In some ways, but
– Ignores rhythm
– Ignores relationships between voices (harmony)
– Probably little correlation with semantics
• Are chords like words? (Christy Keele)
– If so, chord progressions may be like sentences
– In some ways, but ignores melody & rhythm, most relevant for tonal music, etc.
• Anyway, in much music, pitch isn’t important, and/or notes aren’t important!
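The point that relative pitch matters more than absolute pitch can be illustrated with a short sketch. It assumes pitches are given as MIDI note numbers (an assumption, not something the slides specify) and uses a made-up four-note theme.

```python
def intervals(pitches: list[int]) -> list[int]:
    """Signed semitone intervals between successive pitches."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


theme = [60, 62, 64, 60]               # hypothetical theme: C D E C
transposed = [p + 7 for p in theme]    # same theme a fifth higher: G A B G

print(theme == transposed)                        # False: absolute pitches differ
print(intervals(theme), intervals(transposed))    # identical interval sequences
```

The interval sequence survives transposition, which is why it is a common basis for melodic matching. It still carries no rhythm and nothing about other voices, as noted above.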
J.S. Bach: “St. Anne” Fugue, beginning
MARLENE. What I fancy is a rare steak. Gret?
ISABELLA. I am of course a member of the / Church of England.*
GRET. Potatoes.
MARLENE. *I haven’t been to church for years. / I like Christmas carols.
ISABELLA. Good works matter more than church attendance.
--Caryl Churchill: “Top Girls” (1982), Act 1, Scene 1
Performance (time goes from left to right):
M: What I fancy is a rare steak. Gret?                                                   I haven’t been...
I:                                     I am of course a member of the Church of England.
G:                                                                    Potatoes.
• Relationship between notation and its sound is very subtle
• Not at all one symbol <=> one symbol
– Notes w/ornaments (trills, etc.) are one => many
– All symbols but notes are one => zero!
– Bach F-major Toccata example
• Style-dependent
– Swing (jazz), dotting (baroque art music)
– Improvisation (baroque art music, jazz)
– “Events” (20th-century art music)
– How well-defined is style-dependent
• Interpretation is difficult even for musicians
– Can take 50-90% of lesson time for performance students
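As a minimal sketch of the "one => many" case, the function below expands a single notated note with a trill into several sounding events. Everything about it is an assumption made for illustration: the function name, the (pitch, duration) event format, the trill starting on the main note, and the fixed number of alternations. Real performance practice differs by style, which is the point of the list above.

```python
def expand_trill(pitch: int, duration: float,
                 upper_step: int = 2, subdivisions: int = 8) -> list[tuple[int, float]]:
    """Expand one notated note into alternating main/upper-note events.

    Assumptions (hypothetical): the trill starts on the main note,
    alternates with the note upper_step semitones above, and divides
    the notated duration into equal parts.
    """
    unit = duration / subdivisions
    return [(pitch + (upper_step if i % 2 else 0), unit)
            for i in range(subdivisions)]


events = expand_trill(72, 1.0, subdivisions=4)
print(events)   # one notated symbol becomes four sounding events
```

A Baroque trill would often start on the upper note instead, so even this tiny mapping from symbol to sound depends on style.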
• Salience is affected by texture, loudness, etc.
– Inner voices in orchestral music rarely salient
• Streaming effects and cross-voice matching
– produced by timbre: Wessel’s illusion (Ex. 1, 2)
– produced by register: Telemann example (Ex. 3)
• Octave identities, timbre and texture
– Beethoven “Hammerklavier” Sonata example (Ex. 4, 5)
– Affects pitch-interval matching
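One way to read "affects pitch-interval matching" is that moving notes to other octaves changes the exact intervals even when listeners may hear the same tune. The sketch below assumes MIDI note numbers and a made-up three-note figure; reducing intervals modulo 12 is one possible workaround, not a method the slides prescribe.

```python
def intervals(pitches: list[int]) -> list[int]:
    """Signed semitone intervals between successive pitches."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def intervals_mod_octave(pitches: list[int]) -> list[int]:
    """Intervals reduced modulo 12, so octave displacement is ignored."""
    return [(b - a) % 12 for a, b in zip(pitches, pitches[1:])]


figure = [60, 64, 67]       # hypothetical figure: C4 E4 G4
displaced = [60, 76, 55]    # same pitch classes: C4 E5 G3

print(intervals(figure), intervals(displaced))
print(intervals_mod_octave(figure), intervals_mod_octave(displaced))
```

Exact-interval matching treats the two versions as unrelated, while the octave-reduced form treats them as identical, at the cost of discarding melodic direction.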
Kind of information | Least explicit structure | Medium | Most explicit structure | Salience increasers
Music | audio | events | notation | loud; thin texture
Text | audio (speech) | ordinary written text | text with markup | “headlining”: large, bold, etc.
Images | photo, bitmap | PostScript | drawing-program file | bright color
Video w/o sound | videotape | MPEG? | Premiere file | motion, etc.
Biological data | DNA sequences, 3D protein structures | MEDLINE abstracts | ?? |
• Simultaneous independent voices and texture
– Analogy in text: characters in a play
• Chords within a voice
– Analogy in text: character in a play writing something visible to the audience while saying something different out loud
• Rhythm
– Analogy in text: rhythm in poetry
• Notes and intervals
– Note pitches rarely important
– Intervals more significant, but still very low-level
– Analogy in text: interval = (very roughly!) letter, not word
• Words
– Analogy in music: for practical purposes, none
• Sentences
– Analogy in music: phrases (but much less explicit)
• Paragraphs
– Analogy in music: sections of a movement (but less explicit)
• Chapters
– Analogy in music: movements
• II. Organization of Musical Information (music representation)
– “What we want is concepts; what we have is words”
– Audio, MIDI, notation
• III. Finding Musical Information
– A Similarity Scale for Content-Based Music IR
• IV. Musical Similarity and Finding Music by Content
• V. Finding music via Metadata
– Digital music libraries (Variations2), iTunes, etc.
– Music recommender systems
• R is very interactive: can use as powerful calculator
• Assignments will be fairly simple
• Much help available: from Don & other students
• Why R?
– NOT because it's great for statistics!
– easy to do simple things with it, including graphs and handling audio files
– probably not good for complex programs
– free, & available for all popular operating systems
– very interactive => easy to experiment
– has good documentation
– In use in other Music Informatics classes, & standardizing is good
• Originally for statistics; good for far more
• How to get R
– Web site: http://cran.us.r-project.org/
– Versions for Linux, Mac OS X, Windows
– Already on STC Windows machines; will be in M373
• Tutorial:
• http://xavier.informatics.indiana.edu/~craphael/teach/symbolic_music/
• Can use R interactively as a powerful graphing, musicing, etc. calculator
• …but it’s not perfect: sometimes very cryptic
3 Sep. 2006
• Rainer Typke’s “MIR Systems: A Survey of Music Information Retrieval Systems” lists many systems
– http://mirsystems.info/
• Commercial system: Shazam
• Some research systems can be used over the Web, incl.:
– C-Brahms
– Meldex/Greenstone
– Mu-seek
– MusicSurfer
– Musipedia/Tuneserver/Melodyhound
– QBH at NYU
– Themefinder
Machinery to Evaluate Music-IR Research
• Problem: how do we know if one system is really better than another, or an earlier version?
• Solution: standardized tasks, databases, evaluation
– In use for speech recognition, text IR, question answering, etc.
• Important example: TREC (Text Retrieval Conference)
• For music IR, we now have...
• IMIRSEL (International Music Information Retrieval Systems Evaluation Laboratory) project
– http://www.music-ir.org/evaluation/
• MIREX (Music IR Evaluation eXchange) modeled on TREC
– 2005: audio only
– 2006: audio and symbolic
• Collections are improving, but very slowly
• For research: poor to fair
– “Candidate Music IR Test Collections”
• http://mypage.iu.edu/~donbyrd/MusicTestCollections.HTML
– Representation “CMN” vs. CMN
• For practical use: pathetic (symbolic) to good (pop audio)
– Most are commercial, especially audio
– Very little free/public domain
– …especially audio! (cf. RWC)
• IPR issues are a total mess
• Why is so little available?
– Symbolic form: no efficient way to enter
– Solution: OMR? AMR? research challenges
– Music is an art!
– Cf. “Searching CMN” slides: chicken & egg problem
– IPR issues are a total mess
• Basic representations of music: audio, events, notation
– Fundamental difference: amount of explicit structure
• Have very different characteristics => each is by far best for some users and/or applications
• Converting to reduce structure much easier than to add
• Music in all forms very hard to handle mostly because of:
– Units of meaning problem
– Polyphony
• Both problems are much less serious with text
• Projects include
– Audio-based: via recognition of polyphonic music (OMRAS, query-by-humming, etc.)
– CMN-based: monophonic query vs. polyphonic database (emphasis on UI) (OMRAS)
– Style-genre identification from audio
– Creative applications: music IR for improvisation, etc.
• Machinery to evaluate research is coming along (MIREX)
• Collections
– For research: poor to fair
– For practical use: pathetic (symbolic) to good (pop audio)
– Improving, but…
– Serious problems with IPR as well as technology