20060907

Automatic Genre Classification
of Music Content
[A survey]
Nicolas Scaringella, Giorgio Zoia, Daniel Mlynek,
IEEE SIGNAL PROCESSING MAGAZINE
MARCH 2006
By Yi-Tang Wang
Outline
• Introduction
• Feature extraction techniques
• Genre classification paradigms
• Classification results
• Future directions & Conclusion
Introduction
• EMD (electronic music distribution)
– Restoration of analog archives
– New content
– music catalogues become huge
• What do you want to listen to?
– 1 million tracks online
– Efficient ways to browse & organize
Introduction (cont.)
• Music Genres
– Categories to characterize similarities
– Boundaries are fuzzy
• Automatic Classification
– Finding a taxonomy
– Hierarchical set of categories
– Nontrivial task
Critical issues
• Artists, Albums, or Titles
– One song to one genre(?)
– Albums - heterogeneous material
– Artists - several albums
– Same Titles?
• Nonagreement on Taxonomies
– Allmusic, Amazon, Mp3
[2] F. Pachet and D. Cazaly, “A taxonomy of musical genres,” in Proc. Content-Based Multimedia Information Access (RIAO), Paris, France, 2000.
Critical issues (cont.)
• Ill-defined genre labels
– Varied criteria (geographical, historical, etc.)
– Dependent on culture
• Scalability of genre taxonomies
– New genres appear frequently
– Merging or splitting
– Automatic system
Feature extraction techniques
• High-level model
– Event-like format (MIDI)
– Symbolic format (MusicXML)
– Rarely available
• Low-level
– Audio samples
– Low level and low density of info
• Do feature extraction
– Timbre, Melody, Harmony, Rhythm
Timbre
• Same pitch and loudness, yet the sounds differ
• Features to characterize timbre
– Temporal features
– Energy features
– Spectral shape features
– Perceptual features
– Some have been standardized in MPEG-7
Timbre (cont.)
• Transformations
– new feature or increase dimensionality
– Transforming into a logarithmic decibel scale is suggested
• Texture window
– Larger window
– Reduce computation
– Increase classification accuracy
– 1 s
– Variant size and positions
Timbre (cont.)
• Texture model
– model of features over texture window:
• 1) simple modeling with low-order statistics
• 2) modeling with autoregressive model
• 3) modeling with distribution estimation
algorithms (for example, EM estimation of a
GMM of frames)
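A minimal sketch of option 1 (low-order statistics over a texture window), assuming frame-level features are already available as a NumPy array. The function name, the window and hop sizes, and the random toy input are illustrative assumptions, not taken from the survey.

```python
import numpy as np

def texture_statistics(frame_features, window, hop):
    """Summarize frame-level features over sliding texture windows.

    frame_features: array of shape (n_frames, n_features)
    window, hop: texture window length and step, counted in frames
                 (hypothetical parameter names)
    Returns an array of shape (n_windows, 2 * n_features) holding the
    per-window means followed by the per-window variances.
    """
    frame_features = np.asarray(frame_features, dtype=float)
    n_frames = frame_features.shape[0]
    rows = []
    for start in range(0, n_frames - window + 1, hop):
        chunk = frame_features[start:start + window]
        rows.append(np.concatenate([chunk.mean(axis=0), chunk.var(axis=0)]))
    return np.array(rows)

# Toy input: 100 frames of 3 made-up features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 3))
stats = texture_statistics(feats, window=40, hop=20)
```

Each row of `stats` describes one texture window, so a classifier sees one vector per window instead of one per frame.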
Melody & Harmony
• Melody
– succession of pitched events
– Horizontal element
• Harmony
– pitch simultaneity, chords
– Vertical element
Melody & Harmony (cont.)
• Pitch function
– Characterizing pitch distribution
– Amplitude, position of main peak, …
– Unfolded
• Contains pitch content and info of its range
– Folded
• Mapped to a single octave
• Harmonic content
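The folded and unfolded views can be sketched as two histograms. This sketch assumes pitch estimates are already given as MIDI note numbers (0 to 127), which is a simplification: the pitch function in the slides is computed from audio. The function name and the toy notes are hypothetical.

```python
import numpy as np

def pitch_histograms(midi_notes, amplitudes=None):
    """Build unfolded and folded pitch histograms.

    midi_notes: integer MIDI note numbers in the range 0..127 (assumed input)
    amplitudes: optional weight per note; every note counts as 1 if omitted
    """
    notes = np.asarray(midi_notes, dtype=int)
    if amplitudes is None:
        weights = np.ones(len(notes))
    else:
        weights = np.asarray(amplitudes, dtype=float)
    # Unfolded: one bin per pitch, so the pitch range is preserved.
    unfolded = np.bincount(notes, weights=weights, minlength=128)
    # Folded: all octaves mapped onto a single octave (12 pitch classes).
    folded = np.bincount(notes % 12, weights=weights, minlength=12)
    return unfolded, folded

# Toy input: C4, C5, E4, G4, C3.
notes = [60, 72, 64, 67, 48]
unfolded, folded = pitch_histograms(notes)
main_pitch_class = int(np.argmax(folded))  # position of the main peak
```

In the toy input the three C notes land in different unfolded bins but in the same folded bin, which is why the folded version relates to harmonic content while the unfolded version keeps the range.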
Rhythm
• No precise definition
• Generically, all of the temporal aspects
• Periodicity function
– Low level approach as pitch function
• 1) tempo: periodicities typically in the range 0.3–1.5 s (i.e., 200–40 bpm)
• 2) musical pattern: periodicities between 2 and 6 s (corresponding to the length of one or more measures)
– Gouyon et al. derive MFCC-like descriptors
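The relation between a periodicity and a tempo is bpm = 60 / period in seconds, which is how 0.3 s and 1.5 s map to 200 and 40 bpm. A small sketch follows; the function names and the "other" label are hypothetical, and the range boundaries are the ones quoted above.

```python
def period_to_bpm(period_seconds):
    """Convert a periodicity in seconds to beats per minute."""
    return 60.0 / period_seconds

def periodicity_kind(period_seconds):
    """Label a periodicity using the two ranges quoted on the slide."""
    if 0.3 <= period_seconds <= 1.5:
        return "tempo"
    if 2.0 <= period_seconds <= 6.0:
        return "musical pattern"
    return "other"  # hypothetical label for anything outside both ranges

fast = period_to_bpm(0.3)   # upper end of the tempo range
slow = period_to_bpm(1.5)   # lower end of the tempo range
kind = periodicity_kind(4.0)
```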
Extracting from segments
• A small segment may contain sufficient information
• Reduces the required computation
• Typically a 30 s segment
– taken 30 s after the beginning
• Artist classification
– Voice is easier to identify than music alone
Local conclusion
• Extracting high-level descriptors from polyphonic audio signals is not yet state of the art
• Focus on timbre modeling
• Timbre may contain sufficient info
– 250 ms: 53%, 3 s: 72%
– Among 10 genres
Local conclusion (cont.)
• Another point of view (pessimistic)
– Timbre similarity measure & 20,000
titles distributed over 18 genres
– Little correlation
– May not be scalable
– Take cultural features into account
Genre classification
• Expert systems
• Unsupervised approach
– clustering
• Supervised approach
– Machine learning algorithms
Expert systems
• A knowledge-based system made up of a set of rules
• No model based on it so far
• Expensive to implement and maintain
• May yield unexpected interactions
Expert systems (cont.)
• Pachet and Cazaly’s work
– Differences between genres are stated with language-based descriptors, e.g. instrumentation
Unsupervised approach
• Clustering with similarity measures
• Similarity measures
– If time invariant
• Euclidean distance or cosine distance
– Otherwise
• Build statistical model (Gaussian or GMMs)
– Kullback-Leibler divergence, relative entropy
– Sampling, Earth mover's distance, asymptotic likelihood approximation
• Shao et al. use HMMs
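A sketch of three of the measures named above: Euclidean and cosine distance for fixed feature vectors, and the Kullback-Leibler divergence between two Gaussians. The diagonal-covariance form and the symmetrized sum are simplifying assumptions made here (KL divergence is not symmetric on its own); they are not taken from the survey, and all function names are hypothetical.

```python
import numpy as np

def euclidean_distance(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariance (assumed form)."""
    mu_p, var_p = np.asarray(mu_p, dtype=float), np.asarray(var_p, dtype=float)
    mu_q, var_q = np.asarray(mu_q, dtype=float), np.asarray(var_q, dtype=float)
    terms = np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    return float(0.5 * np.sum(terms))

def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrized divergence: KL(p || q) + KL(q || p)."""
    return (kl_diag_gaussians(mu_p, var_p, mu_q, var_q)
            + kl_diag_gaussians(mu_q, var_q, mu_p, var_p))

# Two songs, each summarized by a 1-D Gaussian over a made-up feature.
d = symmetric_kl([0.0], [1.0], [1.0], [1.0])
```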
Unsupervised approach (cont.)
• Clustering algorithms
– K-means
– Shao et al.’s work
• agglomerative hierarchical clustering
– SOM (self-organizing map)
• Artificial neural network
• High dim onto lower dim
• GHSOM (growing hierarchical SOM)
– Rauber et al.
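A minimal K-means sketch in NumPy, assuming each song has already been reduced to one fixed-length feature vector and that Euclidean distance is used. The initialization scheme, the iteration cap, and the toy points are illustrative assumptions.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain K-means with Euclidean distance and random initial centroids."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if it has none).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy input: two well-separated groups of made-up 2-D feature vectors.
songs = [[0, 0], [0, 1], [1, 0], [1, 1],
         [10, 10], [10, 11], [11, 10], [11, 11]]
labels, centroids = kmeans(songs, k=2)
```

No genre labels are used anywhere: the groups emerge from the similarity measure alone, which is the point of the unsupervised approach.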
Supervised approach
• A taxonomy of genres is given
• vs. expert systems
– No rules (or descriptions of genres)
• Supervised machine learning algorithms
– KNN (K-Nearest Neighbor)
– GMMs (Gaussian Mixture Models)
– HMM (Hidden Markov Models)
– LDA (Linear Discriminant Analysis)
– SVMs (Support Vector Machines)
– ANNs (Artificial Neural Networks)
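As an illustration of the simplest algorithm in the list, here is a K-nearest-neighbor sketch. It assumes a labeled training set of fixed-length feature vectors and Euclidean distance; the feature values and the genre labels are made up for the example.

```python
from collections import Counter

import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training vectors."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: made-up 2-D features with hypothetical genre labels.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["classical"] * 3 + ["rock"] * 3

first = knn_predict(train_x, train_y, [0.2, 0.1])
second = knn_predict(train_x, train_y, [5.5, 5.2])
```

Unlike an expert system, nothing here describes what makes a genre: the taxonomy is given only through the labels attached to the training examples.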
Classification results
• MIREX genre classification contest
– 1,005 / 510 songs over ten genres
– 940 / 447 songs over six genres
Future directions
• Classification into perceptual categories
– Moods, emotions
• Novelty Detection
– New or unknown data (not belonging to any class)
• Classification with multiple labels
– Probably closer to human experience
• From taxonomies to folksonomies
– Does the taxonomy fit users?
Conclusion
• Definitions of music genres are
convoluted
• Features → classification → result
• Research is evolving from purely
objective machine calculations to
techniques
• Machine learning plays a fundamental
role in classification domains
Thank You