Music Information Retrieval
George Tzanetakis (gtzan@cs.uvic.ca)
Associate Professor, IEEE Senior Member
Tier II Canada Research Chair
Computer Science Department (also in Music, ECE)
University of Victoria, Canada
Copyright 2011 G.Tzanetakis

MIR
‣ Interdisciplinary science of retrieving information from music
‣ ISMIR: Int. Symposium on MIR -> Int. Conf. on MIR -> Int. Conf. of the Int. Society for MIR
‣ First ISMIR held in 2000
‣ Increasing presence in ICASSP, ICME, ACMM, TMM, TASLP, MMTA
‣ All ISMIR proceedings are freely available online
‣ Mailing list: music-ir@listes.ircam.fr

Connections
‣ MIR connects music with Machine Learning, Computer Science, Signal Processing, Information Science, Psychology and Human-Computer Interaction

Music today
‣ Music is produced, distributed and consumed digitally
‣ 2011: digital music sales > physical album sales

Industry
[Figures not available]
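
Music Collections
‣ Personal music collections: ~thousands of tracks
‣ Streaming music sites and stores: ~millions of tracks
‣ "Great celestial jukebox in the sky": all of the recorded music in human history
‣ A 5-minute music track is digitally represented using approximately 26 million floating point numbers (e.g., CD-quality stereo: 5 × 60 s × 44,100 samples/s × 2 channels ≈ 26.5 million samples)

Overview
(Focus on signal processing and audio)
‣ Audio Feature Extraction: Timbre, Pitch, Rhythm
‣ Analysis: Similarity, Classification, Modelling Time
‣ Tasks: Similarity, Genre classification, Tag annotation, Query-by-Humming, Audio-Score Alignment

Audio Feature Extraction
‣ Sound and sine waves
‣ Timbral features: Short Time Fourier Transform (STFT), Mel-Frequency Cepstral Coefficients (MFCC), perceptual audio compression
‣ Pitch and harmony
‣ Rhythm

Linear Systems and Sinusoids
‣ A sine wave x(t) = A sin(2πft + φ) is described by its amplitude A, frequency f and phase φ (0 to 360 degrees)
‣ True sine waves last forever
‣ Period = 1 / Frequency
‣ A sine wave passed through a linear time-invariant (LTI) system produces a new sine wave of the same frequency (only amplitude and phase change)
‣ Superposition: if in1 -> out1 and in2 -> out2, then in1 + in2 -> out1 + out2

Fourier Transform
‣ Named after Joseph Fourier (1768-1830)

Short Time Fourier Transform
‣ Input: the signal (amplitude vs. time) is cut into short frames at times t, t+1, t+2, ...
‣ Each frame is transformed with the Fast Fourier Transform (FFT)
‣ Output: time-varying spectra (amplitude vs. frequency for each frame)
‣ Filterbank view: analysis as a bank of filters, resynthesis as a bank of oscillators

Spectrum and Shape Descriptors
‣ Computed from the magnitude spectrum M[n] of each frame (n = frequency bin)
‣ Centroid = Σ n·M[n] / Σ M[n], the "center of mass" of the spectrum
‣ Rolloff: the frequency below which a fixed fraction (e.g., 85%) of the magnitude distribution is concentrated
‣ Flux: squared difference between the normalized magnitude spectra of successive frames
‣ Bandwidth: spread of the spectrum around the centroid
‣ Moments, ...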
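
The STFT and the spectral shape descriptors above can be illustrated with a small, dependency-free Python sketch. It computes magnitude frames with a Hann window and a naive DFT (chosen for readability; a real system would use an FFT), then the centroid, rolloff and flux of those frames. The frame size, hop size, the 85% rolloff fraction, normalizing spectra to unit sum before computing flux, and all function names are assumptions for illustration, not the exact definitions used in the slides.

```python
import cmath
import math


def stft_magnitudes(signal, frame_size=256, hop=128):
    """Magnitude spectra of Hann-windowed frames (naive DFT, kept simple for clarity)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        # Keep only the non-negative frequency bins 0 .. frame_size / 2.
        frames.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ])
    return frames


def centroid_hz(mag, sr, frame_size):
    """Centroid = sum(k * M[k]) / sum(M[k]), converted from bins to Hz."""
    total = sum(mag)
    if total == 0:
        return 0.0
    return sum(k * m for k, m in enumerate(mag)) / total * sr / frame_size


def rolloff_hz(mag, sr, frame_size, fraction=0.85):
    """Lowest bin frequency at which the cumulative magnitude reaches `fraction` of the total."""
    threshold = fraction * sum(mag)
    cumulative = 0.0
    for k, m in enumerate(mag):
        cumulative += m
        if cumulative >= threshold:
            return k * sr / frame_size
    return (len(mag) - 1) * sr / frame_size


def flux(prev_mag, mag):
    """Squared difference between successive spectra, each normalized to unit sum."""
    prev_total = sum(prev_mag) or 1.0
    total = sum(mag) or 1.0
    return sum((c / total - p / prev_total) ** 2 for p, c in zip(prev_mag, mag))


# Usage: a 1000 Hz tone at 8 kHz falls exactly on bin 32 of a 256-point frame.
sr, n_fft = 8000, 256
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(1024)]
frames = stft_magnitudes(tone, n_fft, hop=128)
print(round(centroid_hz(frames[0], sr, n_fft), 1))  # 1000.0
print(flux(frames[0], frames[1]) < 1e-9)            # True: a steady tone has ~zero flux
```

A steady tone gives near-zero flux, while a signal whose frequency jumps halfway through produces a flux spike around the change, which is why flux is useful for tracking how quickly the spectrum evolves.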

Feature Space
‣ The descriptors of a frame (e.g., Centroid) form a feature vector F, i.e., a point in a multidimensional feature space

Mel Frequency Cepstral Coefficients
‣ Mel-filtering: the magnitude spectrum is passed through a mel-scale filterbank
‣ 13 linearly-spaced filters, each spanning CF-130 Hz to CF+130 Hz around its center frequency CF
‣ 27 log-spaced filters, each spanning CF / 1.0718 to CF * 1.0718
‣ Log of each filter output
‣ Discrete Cosine Transform (DCT) -> MFCCs

Audio Feature Extraction
[Figure not available]

Traditional Music Representations
[Figure not available]

Pitch Content
‣ Harmony and melody are pitch concepts (music theory)
‣ Bridge to symbolic MIR: automatic music transcription
‣ Non-transcriptive arguments: is Score = Music?
‣ The octave is split into discrete, logarithmically spaced intervals (e.g., 12 equal-tempered semitones, each a frequency ratio of 2^(1/12))

Pitch Detection
‣ Approaches: time-domain, frequency-domain, perceptual
‣ Pitch is a PERCEPTUAL attribute, correlated with but not equivalent to fundamental frequency

Time Domain
‣ Count the number of zero-crossings: sensitive to noise, so it needs a low-pass filter (LPF)
‣ [Figure: waveforms of a C4 sine wave and a C4 clarinet note]

AutoCorrelation
‣ F(f) = FFT(X(t))
‣ S(f) = F(f) F*(f), where F* is the complex conjugate of F (S is the power spectrum)
‣ R(l) = IFFT(S(f)), the autocorrelation at lag l
‣ Efficient computation using the FFT for lengths that are powers of 2; zero-padding X(t) to at least twice its length avoids circular wrap-around
‣ The lag of the highest peak of R(l) (excluding l = 0) estimates the fundamental period

Frequency Domain
‣ [Figure: spectra of a C4 sine wave and a C4 clarinet note]
‣ The fundamental frequency (as well as pitch) corresponds to peaks in the spectrum
‣ The fundamental does not necessarily have the highest amplitude
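
The autocorrelation recipe above (FFT, multiply by the conjugate, inverse FFT) can be sketched in plain Python with a recursive radix-2 FFT, which is why power-of-2 lengths are convenient. The signal is zero-padded to a power of 2 at least twice its length so the result is a linear rather than circular autocorrelation. The peak-picking step, with hypothetical `fmin`/`fmax` search limits, is a simple assumption rather than a complete pitch tracker, and it estimates fundamental frequency, not perceived pitch.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddled = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)] +
            [even[k] - twiddled[k] for k in range(n // 2)])


def ifft(X):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / N."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]


def autocorrelation(x):
    """R(l) = IFFT(F(f) F*(f)) with F = FFT(x), zero-padded to avoid circular wrap-around."""
    size = 1
    while size < 2 * len(x):
        size *= 2
    F = fft(list(x) + [0.0] * (size - len(x)))
    S = [f * f.conjugate() for f in F]          # power spectrum S(f)
    return [r.real for r in ifft(S)[:len(x)]]   # lags 0 .. len(x) - 1


def estimate_f0(x, sr, fmin=50.0, fmax=1000.0):
    """Pick the autocorrelation peak among lags that correspond to fmin..fmax Hz."""
    r = autocorrelation(x)
    lo, hi = int(sr / fmax), min(int(sr / fmin), len(r) - 1)
    best_lag = max(range(lo, hi + 1), key=lambda lag: r[lag])
    return sr / best_lag


# Usage: both tones repeat every 40 samples at 8 kHz, i.e. a 200 Hz fundamental.
sr = 8000
pure = [math.sin(2 * math.pi * 200 * n / sr) for n in range(1024)]
# Harmonic tone whose 400 Hz partial is louder than its 200 Hz fundamental.
rich = [0.3 * math.sin(2 * math.pi * 200 * n / sr)
        + 1.0 * math.sin(2 * math.pi * 400 * n / sr)
        + 0.8 * math.sin(2 * math.pi * 600 * n / sr) for n in range(1024)]
print(estimate_f0(pure, sr), estimate_f0(rich, sr))  # 200.0 200.0
```

For the harmonic tone the strongest spectral peak sits at 400 Hz, yet the autocorrelation still points to the 200 Hz period, consistent with the observation that the fundamental need not have the highest amplitude.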

Chroma: Pitch Perception
‣ Chroma: the pitch class of a note, independent of octave (12 classes in Western music)
[Figure not available]

Automatic Rhythm Description
[Figure not available]

Beat Histograms (Tzanetakis et al., AMTA01)
‣ Beat histogram features, e.g., max(h(i)) and argmax(h(i)): the strength and the location of the dominant histogram peak

Analysis Overview
‣ A musical piece can be represented as a trajectory of feature vectors over time or as a point cloud in feature space

Content-based Similarity Retrieval (or query-by-example)
‣ Input: query example
‣ Output: ranked list of similar audio files, based on the similarity of their feature vectors (each file represented as a point in feature space)

Classification
‣ Partitioning of the feature space by decision boundaries (e.g., Music vs. Speech)
‣ Generative vs. discriminative models
‣ Bayes rule: P(class | x) = p(x | class) * P(class) / p(x), where x is the feature vector

Classification
‣ Tasks: genre/style, emotion/mood, artist, instrument
‣ MIREX 2007: 10 genres, 700 30-second clips per genre

Multi-tag annotation
‣ Free-form tags (e.g., "female voice", "woman singing")
‣ A multi-label classification problem with twists
‣ Issues: synonyms, subpart relations, sparse and noisy labels
‣ Cold start problem
‣ Typically each tag is treated independently as a binary classification problem
‣ The inverse problem is also interesting (query-by-keywords)

Stacking
[Figure not available]

Polyphonic Audio-Score Alignment
‣ Representation: time series of chroma vectors
‣ Matching procedure: Dynamic Time Warping

Dynamic Time Warping
‣ [Figure panels: aligned performances of the same orchestral piece; attempting to align two different orchestral pieces]

Query-by-humming
‣ User sings a melody
‣ Computer searches a database for a song containing the melody
‣ The challenge of difficult queries

The MUSART system
‣ Query preprocessing (audio): pitch contour extraction, note segmentation
‣ Target preprocessing (symbolic): theme extraction
‣ Model forming and representation (symbolic)
‣ Search to find an approximate match: Dynamic Time Warping, HMMs

Conclusions
‣ Through a combination of digital signal processing and machine learning techniques, a variety of music information retrieval tasks have been explored in the literature
‣ The tasks covered in this presentation are representative of existing work, and commercial implementations of them already exist
‣ Many more tasks are actively being investigated
‣ Music is a complex and fascinating signal, and we are just beginning to understand it better using computers
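
Both the audio-score alignment and the MUSART query-by-humming slides rely on Dynamic Time Warping. A minimal Python sketch of classic DTW follows, assuming each input is a sequence of feature vectors (e.g., chroma frames). Cosine distance as the local cost, the equal-weight horizontal/vertical/diagonal step pattern, and the toy one-hot "chroma" vectors (via the hypothetical helper `one_hot_chroma`) are illustrative assumptions, not the settings of the systems shown.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two feature (e.g., chroma) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def dtw(seq_a, seq_b, dist=cosine_distance):
    """Classic DTW: returns (total alignment cost, warping path as (i, j) index pairs)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # both sequences advance
                                 cost[i - 1][j],      # only seq_a advances
                                 cost[i][j - 1])      # only seq_b advances
    # Backtrack from the end, preferring the diagonal step on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    return cost[n][m], path[::-1]


def one_hot_chroma(pitch_class):
    """Toy 12-bin 'chroma' vector with all energy in one pitch class (hypothetical helper)."""
    v = [0.0] * 12
    v[pitch_class] = 1.0
    return v


score = [one_hot_chroma(pc) for pc in (0, 4, 7)]                 # C, E, G
performance = [one_hot_chroma(pc) for pc in (0, 0, 4, 4, 4, 7)]  # same notes, played slower
total, path = dtw(score, performance)
print(total, path)  # 0.0 [(0, 0), (0, 1), (1, 2), (1, 3), (1, 4), (2, 5)]
```

The cost is 0 because the slower "performance" only repeats notes of the "score"; the warping path shows each score note absorbing several performance frames. Aligning two unrelated sequences yields a positive cost, mirroring the contrast between the two orchestral alignment panels.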