ieee2012_mir_webinar..

Music Information Retrieval
George Tzanetakis (gtzan@cs.uvic.ca)
Associate Professor, IEEE Senior Member
Tier II Canada Research Chair
Computer Science Department
(also in Music, ECE)
University of Victoria, Canada
Copyright 2011 G.Tzanetakis
MIR
‣ Interdisciplinary science of retrieving information from music
‣ ISMIR - Int. Symposium -> Int. Conf. on MIR -> Int. Conf. of the Society of MIR
‣ First ISMIR in 2000
‣ Increasing presence in ICASSP, ICME, ACMM, TMM, TASLP, MMTA
‣ All proceedings are freely available online
‣ Mailing list: music-ir@listes.ircam.fr
Connections
[Diagram: MUSIC linked to Machine Learning, Computer Science, Signal Processing, Information Science, Psychology, and Human-Computer Interaction]
Music today
‣ Music is produced, distributed and consumed digitally
‣ 2011 digital music sales > physical album sales
Industry
Music Collections
‣ Personal music collections ~ thousands
‣ Streaming music sites, stores ~ millions
‣ Great celestial jukebox in the sky ~ all of recorded music in human history
‣ A 5-minute music track is digitally represented using approximately 26 million floating point numbers
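The 26 million figure can be checked with a quick calculation. The sketch below assumes CD-quality stereo audio (44,100 samples per second, 2 channels); the slide does not state these parameters, so they are a guess that happens to reproduce the number.

```python
# Assumed CD-quality parameters (not stated on the slide).
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
DURATION_S = 5 * 60  # a 5-minute track


def sample_count(duration_s, sample_rate_hz=SAMPLE_RATE_HZ, channels=CHANNELS):
    """Number of sample values needed to store an uncompressed track."""
    return duration_s * sample_rate_hz * channels


total = sample_count(DURATION_S)
print(f"{total:,} samples (~{total / 1e6:.1f} million)")
```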
Overview
‣ Focus on signal processing and audio
‣ Audio Feature Extraction
  ‣ Timbre, Pitch, Rhythm
‣ Analysis
  ‣ Similarity, Classification, Modelling Time
‣ Tasks
  ‣ Similarity, Genre classification, Tag annotation, Query-by-Humming, Audio-Score Alignment
Audio Feature Extraction
‣ Sound and sine waves
‣ Timbral Features
  ‣ Short Time Fourier Transform (STFT)
  ‣ Mel-Frequency Cepstral Coefficients (MFCC), Perceptual Audio Compression
‣ Pitch and Harmony
‣ Rhythm
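Linear Systems and Sinusoids
‣ A sinusoid is described by its amplitude, frequency and phase (0 to 360 degrees)
‣ Period = 1 / Frequency
‣ True sine waves last forever
‣ sine wave -> LTI -> new sine wave (same frequency; only amplitude and phase change)
‣ Superposition: if in1 -> out1 and in2 -> out2, then in1 + in2 -> out1 + out2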
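Both properties of linear time-invariant (LTI) systems can be checked numerically. The sketch below uses a moving-average filter as a stand-in LTI system; the filter, sample rate and test frequencies are arbitrary choices for illustration, not taken from the slides.

```python
import numpy as np


def moving_average(x, width=5):
    """A simple LTI system: causal moving-average FIR filter."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel)[: len(x)]


sr = 8000                      # sample rate in Hz (assumed)
t = np.arange(sr) / sr         # one second of time stamps
in1 = np.sin(2 * np.pi * 220 * t)
in2 = 0.5 * np.sin(2 * np.pi * 440 * t + np.pi / 4)

out1 = moving_average(in1)
out2 = moving_average(in2)
out_sum = moving_average(in1 + in2)

# Superposition: filtering the sum equals summing the filtered signals.
superposition_error = np.max(np.abs(out_sum - (out1 + out2)))

# Sine in, sine out: the output's dominant frequency is still 220 Hz.
peak_hz = np.argmax(np.abs(np.fft.rfft(out1))) * sr / len(out1)
print(superposition_error, peak_hz)
```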
Fourier Transform
Joseph Fourier (1768-1830)
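Short Time Fourier Transform
‣ Input: amplitude over time, cut into successive frames (t, t+1, t+2)
‣ Each frame is transformed with the Fast Fourier Transform (FFT)
‣ Output: time-varying spectra (frequency content over time)
‣ [Other figure labels: Filters, Oscillators]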
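A minimal STFT sketch, assuming Hann-windowed frames of 512 samples with a hop of 256 samples (typical values, not given on the slide). The test signal switches from 500 Hz to 1500 Hz halfway through, so the spectra visibly change over time.

```python
import numpy as np

FRAME_SIZE = 512   # assumed frame length in samples
HOP_SIZE = 256     # assumed hop between frame starts


def stft_magnitude(x, frame_size=FRAME_SIZE, hop_size=HOP_SIZE):
    """Magnitude spectra of overlapping Hann-windowed frames.

    Returns an array of shape (n_frames, frame_size // 2 + 1).
    """
    window = np.hanning(frame_size)
    n_frames = 1 + (len(x) - frame_size) // hop_size
    frames = np.stack(
        [x[i * hop_size : i * hop_size + frame_size] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


sr = 8000
t = np.arange(sr) / sr
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

spec = stft_magnitude(x)
# Dominant frequency of each frame, in Hz.
peak_hz = spec.argmax(axis=1) * sr / FRAME_SIZE
print(spec.shape, peak_hz[0], peak_hz[-1])
```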
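Spectrum and Shape Descriptors
‣ Descriptors of the magnitude spectrum (magnitude M over frequency F): Centroid, Rolloff, Flux, Bandwidth, Moments, ...
‣ The descriptors of a frame form a feature vector, i.e. a point in feature space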
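Three of the descriptors named above can be sketched in a few lines. The definitions below are one common formulation (magnitude-weighted mean frequency, 85% rolloff point, squared difference of normalized spectra); the exact variants used in the talk are not given on the slide, so treat the details as assumptions.

```python
import numpy as np


def spectral_centroid(mag, freqs):
    """Magnitude-weighted mean frequency (the spectrum's center of gravity)."""
    return np.sum(freqs * mag) / np.sum(mag)


def spectral_rolloff(mag, freqs, fraction=0.85):
    """Lowest frequency below which `fraction` of the total magnitude lies."""
    cumulative = np.cumsum(mag)
    index = np.searchsorted(cumulative, fraction * cumulative[-1])
    return freqs[index]


def spectral_flux(mag, prev_mag):
    """Squared difference between successive magnitude-normalized spectra."""
    a = mag / np.sum(mag)
    b = prev_mag / np.sum(prev_mag)
    return np.sum((a - b) ** 2)


# Toy 4-bin spectra, purely illustrative.
freqs = np.array([0.0, 100.0, 200.0, 300.0])
mag = np.array([0.0, 1.0, 1.0, 0.0])
prev_mag = np.array([1.0, 1.0, 0.0, 0.0])

feature_vector = np.array(
    [spectral_centroid(mag, freqs), spectral_rolloff(mag, freqs), spectral_flux(mag, prev_mag)]
)
print(feature_vector)
```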
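Mel Frequency Cepstral Coefficients
‣ Pipeline: Mel-filtering -> Log -> DCT -> MFCCs
‣ Mel-scale filterbank
  ‣ 13 linearly-spaced filters, each spanning CF-130 to CF+130 around its center frequency CF
  ‣ 27 log-spaced filters, each spanning CF / 1.0718 to CF * 1.0718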
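The pipeline (mel-filtering, log, DCT) can be sketched as follows. The filter counts, the 130 spacing and the 1.0718 ratio come from the slide; the lowest center frequency (130 Hz), the triangular filter shape, the Hann window and the choice of 13 output coefficients are assumptions made only so that the example runs.

```python
import numpy as np

N_LINEAR, N_LOG = 13, 27      # from the slide
LINEAR_STEP_HZ = 130.0        # from the slide (unit assumed to be Hz)
LOG_RATIO = 1.0718            # from the slide
FIRST_CENTER_HZ = 130.0       # assumption: not given on the slide


def filter_edges():
    """(low, center, high) frequencies of the 40 filters."""
    linear = FIRST_CENTER_HZ + LINEAR_STEP_HZ * np.arange(N_LINEAR)
    log = linear[-1] * LOG_RATIO ** np.arange(1, N_LOG + 1)
    edges = [(cf - LINEAR_STEP_HZ, cf, cf + LINEAR_STEP_HZ) for cf in linear]
    edges += [(cf / LOG_RATIO, cf, cf * LOG_RATIO) for cf in log]
    return edges


def filterbank(freqs):
    """Triangular filter weights (assumed shape), one row per filter."""
    bank = np.zeros((N_LINEAR + N_LOG, len(freqs)))
    for i, (lo, cf, hi) in enumerate(filter_edges()):
        rising = (freqs - lo) / (cf - lo)
        falling = (hi - freqs) / (hi - cf)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mfcc(frame, sr, n_coeffs=13):
    """Mel-filtering -> log -> DCT-II of one audio frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    log_energies = np.log(filterbank(freqs) @ mag + 1e-10)
    n = len(log_energies)
    # DCT-II basis written out explicitly to avoid extra dependencies.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), np.arange(n) + 0.5) / n)
    return basis @ log_energies


frame = np.sin(2 * np.pi * 440 * np.arange(1024) / 44100)
coeffs = mfcc(frame, 44100)
print(coeffs.shape)
```

The highest filter in this layout reaches roughly 11.8 kHz, so the sketch needs a sample rate of about 24 kHz or more.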
Audio Feature Extraction
Traditional Music Representations
Pitch content
‣ Harmony, melody = pitch concepts
‣ Music Theory
‣ Bridge to symbolic MIR
‣ Automatic music transcription
‣ Non-transcriptive arguments
‣ Score = Music
‣ Split the octave into discrete, logarithmically spaced intervals
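As a concrete instance of splitting the octave into logarithmically spaced intervals, the sketch below assumes 12-tone equal temperament with A4 = 440 Hz and MIDI note numbering. None of these are stated on the slide; they are simply the most familiar convention.

```python
def midi_to_hz(midi_note, a4_hz=440.0):
    """Frequency of a MIDI note in 12-tone equal temperament.

    Each semitone multiplies the frequency by 2 ** (1 / 12), so twelve
    steps double it (one octave). MIDI note 69 is A4.
    """
    return a4_hz * 2.0 ** ((midi_note - 69) / 12)


c4 = midi_to_hz(60)   # middle C
a5 = midi_to_hz(81)   # one octave above A4
print(round(c4, 2), a5)
```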
Pitch Detection
‣ Approaches: Time-domain, Frequency-domain, Perceptual
‣ Pitch is a PERCEPTUAL attribute, correlated with but not equivalent to fundamental frequency
Time Domain
‣ Counting zero-crossings is sensitive to noise and needs a low-pass filter (LPF)
‣ [Figure: waveforms of a C4 sine wave and a C4 clarinet note]
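The noise sensitivity of zero-crossing counting is easy to reproduce. In this sketch the C4 frequency (261.63 Hz), the noise level, the random seed and the moving-average low-pass filter are all illustrative assumptions.

```python
import numpy as np


def zero_crossing_frequency(x, sr):
    """Estimate frequency from sign changes: a sine crosses zero twice per period."""
    negative = np.signbit(x)
    crossings = np.count_nonzero(negative[1:] != negative[:-1])
    return crossings * sr / (2.0 * len(x))


def moving_average(x, width=16):
    """Crude low-pass filter (assumed here; any LPF would do)."""
    return np.convolve(x, np.ones(width) / width, mode="same")


sr = 8000
t = np.arange(sr) / sr
c4_hz = 261.63
clean = np.sin(2 * np.pi * c4_hz * t + 0.1)
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(len(t))

est_clean = zero_crossing_frequency(clean, sr)
est_noisy = zero_crossing_frequency(noisy, sr)        # inflated by spurious crossings
est_filtered = zero_crossing_frequency(moving_average(noisy), sr)
print(est_clean, est_noisy, est_filtered)
```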
AutoCorrelation
F(f) = FFT(X(t))
S(f) = F(f) F*(f)
R(l) = IFFT(S(f))
Efficient computation is possible with the FFT when the transform length is a power of 2
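The three equations map directly onto code. The sketch below zero-pads to a power of two (which also avoids circular wrap-around) and then picks the strongest autocorrelation lag inside an assumed pitch range of 50 to 1000 Hz; the range and the test signal are illustrative choices.

```python
import numpy as np


def autocorrelation(x):
    """Autocorrelation R(l) for lags 0..len(x)-1, computed via the FFT."""
    n = len(x)
    size = 1 << (2 * n - 1).bit_length()      # power of two >= 2n - 1
    spectrum = np.fft.rfft(x, size)           # F(f) = FFT(X(t))
    power = spectrum * np.conj(spectrum)      # S(f) = F(f) F*(f)
    return np.fft.irfft(power, size)[:n]      # R(l) = IFFT(S(f))


def autocorrelation_pitch(x, sr, fmin=50.0, fmax=1000.0):
    """Pitch estimate from the strongest lag between sr/fmax and sr/fmin."""
    r = autocorrelation(x)
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(r[lo : hi + 1])
    return sr / lag


sr = 8000
n = 2048
t = np.arange(n) / sr
# Harmonic test tone with a 200 Hz fundamental (period of 40 samples).
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t) + 0.25 * np.sin(2 * np.pi * 600 * t)

r = autocorrelation(x)
pitch_hz = autocorrelation_pitch(x, sr)
print(pitch_hz)
```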
Frequency Domain
[Figure: spectra of a C4 sine wave and a C4 clarinet note]
The fundamental frequency (and with it the pitch) corresponds to a peak in the spectrum, but the fundamental does not necessarily have the highest amplitude.
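A small synthetic example of a fundamental that is not the strongest peak. The tone below is an assumption for illustration (a 262 Hz fundamental, roughly C4, with a louder second harmonic), and taking the lowest prominent peak as the fundamental is a deliberate simplification that fails when the fundamental is missing altogether.

```python
import numpy as np


def spectral_peaks(mag, threshold_ratio=0.1):
    """Indices of local maxima above threshold_ratio * the global maximum."""
    inner = mag[1:-1]
    is_peak = (inner > mag[:-2]) & (inner > mag[2:]) & (inner > threshold_ratio * mag.max())
    return np.flatnonzero(is_peak) + 1


sr, n = 8000, 4000             # 2 Hz per FFT bin
t = np.arange(n) / sr
f0 = 262.0                     # approximately C4, rounded to fall on a bin
x = (
    0.3 * np.sin(2 * np.pi * f0 * t)          # weak fundamental
    + 1.0 * np.sin(2 * np.pi * 2 * f0 * t)    # strongest partial
    + 0.5 * np.sin(2 * np.pi * 3 * f0 * t)
)

mag = np.abs(np.fft.rfft(x * np.hanning(n)))
peak_hz = spectral_peaks(mag) * sr / n
strongest_hz = np.argmax(mag) * sr / n
fundamental_hz = peak_hz[0]    # lowest prominent peak (simplification)
print(peak_hz, strongest_hz, fundamental_hz)
```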
Chroma – Pitch perception
Automatic Rhythm Description
Beat Histograms
‣ Tzanetakis et al., AMTA01
‣ Beat Histogram Features: max(h(i)), argmax(h(i))
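The two features are the height and the position of the histogram's main peak. The sketch below builds a simplified stand-in for the beat histogram from the autocorrelation of an onset-strength envelope; this is not the method of the cited paper, and the envelope, its rate and the 40 to 200 BPM range are all assumptions for illustration.

```python
import numpy as np


def beat_histogram(envelope, env_rate, bpm_min=40.0, bpm_max=200.0):
    """Simplified beat histogram: envelope autocorrelation indexed by tempo.

    Returns (bpms, h), where h[i] is the periodicity strength at bpms[i].
    """
    env = envelope - envelope.mean()
    r = np.correlate(env, env, mode="full")[len(env) - 1 :]
    lag_min = int(np.ceil(60.0 * env_rate / bpm_max))
    lag_max = int(60.0 * env_rate / bpm_min)
    lags = np.arange(lag_min, lag_max + 1)
    return 60.0 * env_rate / lags, np.clip(r[lags], 0.0, None)


def beat_histogram_features(bpms, h):
    """max(h(i)) and the tempo at argmax(h(i))."""
    i = int(np.argmax(h))
    return {"peak_strength": float(h[i]), "peak_bpm": float(bpms[i])}


env_rate = 100                       # envelope samples per second (assumed)
envelope = np.zeros(10 * env_rate)   # 10 seconds
envelope[::50] = 1.0                 # one onset every 0.5 s, i.e. 120 BPM

bpms, h = beat_histogram(envelope, env_rate)
features = beat_histogram_features(bpms, h)
print(features)
```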
Analysis Overview
[Figure: a musical piece represented in feature space as a trajectory, a point, or a cloud of points]
Content-based Similarity Retrieval (or query-by-example)
‣ Input: Query example
‣ Output: Ranked list of similar audio files based on feature vector similarity
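Query-by-example reduces to ranking a collection by distance to the query's feature vector. The sketch below assumes Euclidean distance on standardized features, which is one simple choice among many; the tiny two-dimensional collection is made up for illustration.

```python
import numpy as np


def rank_by_similarity(query, collection):
    """Rank collection rows from most to least similar to the query.

    Features are standardized first so that no single feature dominates
    the Euclidean distance. Returns (indices, distances), both sorted.
    """
    mean = collection.mean(axis=0)
    std = collection.std(axis=0) + 1e-12
    distances = np.linalg.norm((collection - mean) / std - (query - mean) / std, axis=1)
    order = np.argsort(distances)
    return order, distances[order]


# Hypothetical feature vectors, one row per audio file.
collection = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]])
query = np.array([1.0, 1.0])

order, distances = rank_by_similarity(query, collection)
print(order, distances)
```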
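Classification
‣ Partitioning of feature space
‣ Generative vs discriminative models
‣ Bayes' rule: P(class | features) = p(features | class) * P(class) / p(features)
‣ [Figure: decision boundary separating Music from Speech in feature space]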
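A generative classifier applies Bayes' rule directly: model how each class generates feature values, then invert. The sketch below assumes a single feature with a Gaussian distribution per class; the priors, means and standard deviations are made-up numbers, not measurements.

```python
import math


def gaussian_pdf(x, mean, std):
    """Class-conditional likelihood p(x | class) under a Gaussian model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x, classes):
    """P(class | x) = p(x | class) * P(class) / p(x) for every class.

    `classes` maps a class name to (prior, mean, std).
    """
    joint = {name: gaussian_pdf(x, mean, std) * prior for name, (prior, mean, std) in classes.items()}
    evidence = sum(joint.values())           # p(x), summed over classes
    return {name: value / evidence for name, value in joint.items()}


# Hypothetical one-dimensional models for a single audio feature.
CLASSES = {"music": (0.5, 0.2, 0.1), "speech": (0.5, 0.6, 0.1)}

print(posterior(0.25, CLASSES))
print(posterior(0.40, CLASSES))   # on the decision boundary
```

With equal priors and equal variances, the decision boundary sits halfway between the two class means.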
Classification
‣ Genre/Style
‣ Emotion/Mood
‣ Artist
‣ Instrument
‣ MIREX 2007: 10 genres, 700 30-second clips / genre
Multi-tag annotation
‣ Free-form tags (female voice, woman singing)
‣ Multi-label classification problems with twists
‣ Issues: synonyms, subpart relations, sparse, noisy
‣ Cold start problem
‣ Typically each tag is treated independently as a classification problem
‣ Inverse also interesting (query-by-keywords)
Stacking
Polyphonic Audio-Score Alignment
‣ Representation
  ‣ Time Series of Chroma
‣ Matching Procedure
  ‣ Dynamic Time Warping
Dynamic Time Warping
[Figures: aligned performances of the same orchestral piece; an attempt to align two different orchestral pieces]
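A minimal dynamic time warping sketch, assuming Euclidean distance between frames and the basic step pattern (diagonal, vertical, horizontal). In the alignment setting each frame would be a 12-dimensional chroma vector; one-dimensional toy sequences are used here to keep the example readable.

```python
import numpy as np


def dtw(a, b):
    """Align sequences a (n, d) and b (m, d).

    Returns the total alignment cost and the warping path as a list of
    (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])

    # Backtrack from the end to recover the optimal path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(acc[i - 1, j - 1], i - 1, j - 1), (acc[i - 1, j], i - 1, j), (acc[i, j - 1], i, j - 1)]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    return acc[n, m], path[::-1]


melody = np.array([[0.0], [1.0], [2.0], [3.0]])
stretched = np.array([[0.0], [0.0], [1.0], [2.0], [2.0], [3.0]])   # same melody, slower
reversed_melody = melody[::-1]                                     # a different "piece"

same_cost, same_path = dtw(melody, stretched)
diff_cost, _ = dtw(melody, reversed_melody)
print(same_cost, same_path, diff_cost)
```

A low total cost with a path close to a smooth diagonal suggests two renditions of the same material; a high cost suggests different pieces.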
Query-by-humming
‣ User sings a melody
‣ Computer searches database for song containing the melody
‣ The challenge of difficult queries
The MUSART system
‣ Query preprocessing
  ‣ Pitch contour extraction (audio)
  ‣ Note segmentation (symbolic)
‣ Target preprocessing (symbolic)
  ‣ Theme extraction
  ‣ Model-forming, representation
‣ Search to find approximate match
  ‣ Dynamic Time Warping, HMMs
Conclusions
‣ Through a combination of digital signal processing and machine learning techniques, a variety of music information retrieval tasks have been explored in the literature
‣ The tasks covered in this presentation are representative of existing work, and there are already commercial implementations for them. Many more are being actively investigated.
‣ Music is a complex and fascinating signal, and we are just beginning to understand it better using computers