Exploration on audio-visual mappings

Daniel Defays¹
Keywords: Dissimilarities, Distances, Metric spaces, Isometric mapping, Pattern recognition
Summary
[Figure 1: five image sequences, labelled Sequence 1 to Sequence 5, under the question "Which of these 5 visual sequences of images corresponds to the Happy Birthday song?"]
Figure 1. Which of these visual images corresponds best to the Happy Birthday song? And why?
Most people will probably choose the third sequence, with the cake, glasses of champagne, birds and balloons, which evokes a birthday party. Those familiar with music notation, however, could choose sequence 1 or 2, which encode the music of the song. It is unlikely that the last two sequences will appeal to anybody, even though they are linked to the four parts of the song, as will be explained in the talk.
Figure 1 illustrates the topic of an exploration of trans-sensory mappings: how can inputs from different senses be mapped onto each other? More specifically here, how can a set of images be associated with a piece of music?
Numerous findings suggest that inputs to different sensory channels share some common patterns, or at least activate the same areas at some stage of their processing by the nervous system. If this is the case, matching songs, images and odours could make some sense. In fact, software already exists to bridge music and images, for example in the work of the Analema group or in the Media Player application.
The paper focuses on one particular aspect of that exploration: the "mathematical" mapping of images onto songs.
The songs are decomposed into segments, which are then represented in a multidimensional space through a kind of spectral analysis based on Mel-Frequency Cepstral Coefficients (MFCCs), widely used in automatic speech and speaker recognition [Berenzweig et al, 2003]. In the area of automated image processing, Nguyen-Khang Pham, Annie Morin, Patrick Gros and Quyet-Thang Le have used local descriptors obtained through filters to quantify the content of images, and Factorial Correspondence Analysis (FCA) to reduce the number of dimensions [Pham et al, 2009]. This makes it possible to represent images in Cartesian spaces as well.

¹ University of Liège, ddefays@ulg.ac.be
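Once segments and images are points in Cartesian spaces, their pairwise dissimilarities can be read off directly. The following is a minimal sketch, assuming Euclidean distance (the summary does not say which dissimilarity is used) and made-up vectors standing in for real MFCC or FCA outputs; `dissimilarity_matrix` is a hypothetical helper name.

```python
import math


def dissimilarity_matrix(vectors):
    """Pairwise Euclidean dissimilarities between feature vectors.

    vectors[k] stands for a hypothetical feature vector of item k, for
    example the averaged MFCCs of a song segment or the first FCA
    coordinates of an image (Euclidean distance is an assumption).
    """
    return [[math.dist(u, v) for v in vectors] for u in vectors]


# Hypothetical 3-dimensional descriptors of three song segments.
segments = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 8.0, 0.0)]
d_c = dissimilarity_matrix(segments)
print(d_c[0][1], d_c[0][2])  # 5.0 10.0
```

The same helper applies unchanged to image descriptors, so both sets end up described only by square, symmetric dissimilarity matrices, which is all the matching step needs.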
Once both sets have been embedded in such structures, a morphism between them, that is, a mapping that preserves their structure, can be constructed.
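One way to make this precise, in line with the keyword "isometric mapping", is to look for a one-to-one assignment of images to song segments under which dissimilarities are preserved. The notation below (σ, d_C, d_I, c_1, ..., c_n) is ours, not the paper's, and exact equality is an idealization: with real data an exact isometry will rarely exist, which is why the quality of an approximate match has to be measured.

```latex
% Hypothetical notation: C = \{c_1, \dots, c_n\} are the song segments with
% dissimilarity d_C, and I is the image collection with dissimilarity d_I.
\sigma : C \to I \ \text{injective}, \qquad S = \sigma(C),
\qquad
d_I\bigl(\sigma(c_i), \sigma(c_j)\bigr) = d_C(c_i, c_j)
\quad \text{for all } 1 \le i < j \le n .
```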
A method is presented for finding the subset S of images, characterized by their dissimilarities, that best matches the musical segments C of a song, which are also characterized by dissimilarities. The quality of the match is assessed by comparing the dissimilarities between the elements of the target C with the corresponding dissimilarities in S: the closer they are, the better the fit. Two different algorithms are presented and discussed.
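The summary does not spell out the two algorithms. Purely as an illustration, the sketch below assumes a squared-difference criterion (the sum, over all pairs of segments, of the squared gap between their dissimilarity and that of the images assigned to them) and pairs it with two generic search strategies of our own choosing: exhaustive enumeration and a greedy heuristic. All function names are hypothetical, and neither strategy should be read as the author's method.

```python
import itertools
import math


def mismatch(d_c, d_i, assignment):
    """Squared-difference criterion (an assumption for illustration).

    d_c[a][b]: dissimilarity between musical segments a and b.
    d_i[p][q]: dissimilarity between images p and q.
    assignment[a]: index of the image assigned to segment a.
    """
    n = len(assignment)
    total = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            diff = d_c[a][b] - d_i[assignment[a]][assignment[b]]
            total += diff * diff
    return total


def match_exhaustive(d_c, d_i):
    """Exact search over every ordered choice of distinct images.

    Cost grows quickly: picking 4 images out of 55 in order already
    means 55 * 54 * 53 * 52 = 8,185,320 candidates.
    """
    best, best_cost = None, math.inf
    for cand in itertools.permutations(range(len(d_i)), len(d_c)):
        cost = mismatch(d_c, d_i, cand)
        if cost < best_cost:
            best, best_cost = list(cand), cost
    return best, best_cost


def match_greedy(d_c, d_i):
    """Greedy heuristic: pick the image pair that best fits segments 0 and 1,
    then, for each further segment, the unused image adding the least mismatch
    with the images already chosen. Assumes at least two segments and at
    least as many images as segments."""
    first = min(itertools.permutations(range(len(d_i)), 2),
                key=lambda pq: (d_c[0][1] - d_i[pq[0]][pq[1]]) ** 2)
    chosen = list(first)
    for k in range(2, len(d_c)):
        unused = [p for p in range(len(d_i)) if p not in chosen]
        chosen.append(min(unused, key=lambda p: sum(
            (d_c[a][k] - d_i[chosen[a]][p]) ** 2 for a in range(k))))
    return chosen, mismatch(d_c, d_i, chosen)


# Toy data: hypothetical 2-D feature vectors for six images.
points = [(0, 0), (1, 0), (0, 2), (3, 1), (5, 5), (2, 7)]
d_i = [[math.dist(p, q) for q in points] for p in points]
# Target segments whose dissimilarities are exactly those of images 3, 1, 4.
sel = [3, 1, 4]
d_c = [[d_i[a][b] for b in sel] for a in sel]

print(match_exhaustive(d_c, d_i))  # ([3, 1, 4], 0.0)
print(match_greedy(d_c, d_i))
```

The exhaustive search is exact, but the number of candidates grows as n_I!/(n_I - n)! for n segments and n_I images, so it only suits short songs or small collections; the greedy search is fast but can miss the optimum. Because the two matrices come from different feature spaces, they would in practice also need to be put on a comparable scale (for instance by dividing each by its largest entry) before being compared; that normalization is likewise an assumption.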
The images in the fifth sequence of Figure 1 were selected from a set of 55 photos using this method.
References
Berenzweig A., Ellis D., Lawrence S. (2003). "Anchor space for classification and similarity measurement of music", http://www.ee.columbia.edu/~dpwe/pubs/icme03-anchor.pdf/.
Defays D. (1978). "A short note on a method of seriation", British Journal of Mathematical and Statistical Psychology 31, pp. 49-53.
Diday E., Noirhomme-Fraiture M. (eds) (2008). Symbolic Data Analysis and the SODAS Software, Wiley.
Pham N-K., Morin A., Gros P., Le Q-T. (2009). "Utilisation de l'analyse factorielle des correspondances pour la recherche d'images à grande échelle", Actes d'EGC, RNTI-E-15, Revue des Nouvelles Technologies de l'Information - Série Extraction et Gestion des Connaissances, Cépaduès Editions, pp. 283-294.
Widmer G., Dixon S., Goebl W., Pampalk E., Tobudic A. (2003). "In search of the Horowitz factor", AI Magazine 24(3), pp. 111-130.