Exploration on audio-visual mappings

Daniel Defays¹

Keywords: Dissimilarities, Distances, Metric spaces, Isometric mapping, Pattern recognition

Abstract

Figure 1. Which of these visual sequences of images corresponds best to the Happy Birthday song? And why? [five panels: Sequence 1, Sequence 2, Sequence 3, Sequence 4, Sequence 5]

Most people will probably choose the third sequence, in the middle, with the cake, glasses of champagne, three birds and balloons, which evokes a birthday party. Those familiar with music notation, however, might choose sequence 1 or 2, which encode the music of the song. The last two sequences are unlikely to appeal to anybody, even though they are linked to the four parts of the song, as will be explained in the talk.

Figure 1 illustrates the topic of an exploration of trans-sensory mappings: how can inputs from different senses be mapped onto each other? More specifically, how can a set of images be associated with a piece of music? Numerous observations suggest that inputs to different sensory channels share some common patterns, or at least activate the same areas at some stage of their processing by the nervous system. If this is the case, matching songs, images and odours could make some sense. In fact, software already exists to bridge music and images, for example in the work of the Analema group or in the Media Player application.
The paper will focus on one particular aspect of that exploration: the « mathematical » mapping of images onto songs. The songs are decomposed into segments, which are then represented in a multidimensional space through a kind of spectral analysis (Mel-Frequency Cepstral Coefficients, MFCCs) widely used in automatic speech and speaker recognition [Berenzweig et al., 2003]. In the area of automated image processing, Nguyen-Khang Pham, Annie Morin, Patrick Gros and Quyet-Thang Le have used local descriptors obtained through filters to quantify the content of images, and Factorial Correspondence Analysis (FCA) to reduce the number of dimensions [Pham et al., 2009]. This makes it possible to represent images in Cartesian spaces as well.

Once both sets have been embedded in such structures, a morphism between them can be built. A method will be presented that finds the subset of images (S), characterized by their dissimilarities, which optimally matches the musical segments (C) of a song, also characterized by their dissimilarities. The quality of the match is assessed by comparing the dissimilarities between the elements of the target C with the corresponding dissimilarities in S: the closer they are, the better the fit. Two different algorithms will be presented and discussed. The images in the 5th sequence of Figure 1 were extracted from a set of 55 photos using this method.

¹ University of Liège, ddefays@ulg.ac.be

References

Berenzweig A., Ellis D., Lawrence S. (2003). « Anchor space for classification and similarity measurement of music », http://www.ee.columbia.edu/~dpwe/pubs/icme03-anchor.pdf/.

Defays D. (1978). « A short note on a method of seriation », British Journal of Mathematical Psychology 31, pp. 49-53.

Diday E., Noirhomme-Fraiture M. (eds) (2008). Symbolic Data Analysis and the SODAS Software, Wiley.

Pham N-K., Morin A., Gros P., Le Q-T. (2009). « Utilisation de l'analyse factorielle des correspondances pour la recherche d'images à grande échelle » [Using factorial correspondence analysis for large-scale image retrieval], Actes d'EGC, RNTI-E-15, Revue des Nouvelles Technologies de l'Information - Série Extraction et Gestion des Connaissances, Cépaduès Editions, pp. 283-294.

Widmer G., Dixon S., Goebl W., Pampalk E., Tobudic A. (2003). « In search of the Horowitz factor », AI Magazine 24(3), pp. 111-130.
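The matching criterion summarized in the abstract (choose the images S whose mutual dissimilarities best reproduce those of the song segments C) can be illustrated with a minimal sketch. It is not the author's method: the two algorithms of the talk are not described in this abstract, so `exhaustive_match` (brute force over ordered subsets) and `greedy_match` (a simple greedy heuristic) are hypothetical stand-ins. Further assumptions: the 2-D points are made-up toy descriptors standing in for MFCC-based segment vectors and FCA image coordinates; dissimilarities are plain Euclidean distances already on comparable scales (no normalization is done); and the fit is scored as a sum of squared differences between corresponding dissimilarities, one plausible reading of "the closer they are, the better the fit".

```python
import math
from itertools import permutations


def dissimilarities(points):
    """Pairwise Euclidean distances between feature vectors.

    Assumption: each point stands for a descriptor vector, e.g. mean MFCCs
    of a song segment or FCA coordinates of an image."""
    return [[math.dist(p, q) for q in points] for p in points]


def mismatch(dc, ds, assignment):
    """Sum of squared differences between the target dissimilarities dc
    (song segments C) and those of the chosen images in ds;
    assignment[i] is the image matched to segment i. Lower is better."""
    k = len(assignment)
    return sum(
        (dc[i][j] - ds[assignment[i]][assignment[j]]) ** 2
        for i in range(k)
        for j in range(i + 1, k)
    )


def exhaustive_match(dc, ds):
    """Exact search over every ordered choice of len(dc) distinct images.
    Needs len(ds) >= len(dc); the number of candidates is n!/(n-k)!."""
    k, n = len(dc), len(ds)
    best = min(permutations(range(n), k), key=lambda a: mismatch(dc, ds, a))
    return best, mismatch(dc, ds, best)


def greedy_match(dc, ds):
    """Hypothetical heuristic: from each possible first image, give each
    next segment the unused image that best fits the dissimilarities
    already fixed, then keep the best complete assignment."""
    k, n = len(dc), len(ds)
    best, best_cost = None, math.inf
    for start in range(n):
        assignment = [start]
        for i in range(1, k):
            unused = [j for j in range(n) if j not in assignment]
            assignment.append(min(
                unused,
                key=lambda j: sum(
                    (dc[p][i] - ds[assignment[p]][j]) ** 2 for p in range(i)
                ),
            ))
        cost = mismatch(dc, ds, assignment)
        if cost < best_cost:
            best, best_cost = tuple(assignment), cost
    return best, best_cost


if __name__ == "__main__":
    # Toy 2-D descriptors (made up): 3 song segments and 6 images.
    segments = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
    images = [(1.0, 1.0), (10.0, 10.0), (20.0, 0.0),
              (13.0, 10.0), (5.0, 5.0), (10.0, 14.0)]
    dc, ds = dissimilarities(segments), dissimilarities(images)
    print(exhaustive_match(dc, ds))  # ((1, 3, 5), 0.0)
    print(greedy_match(dc, ds))      # ((1, 3, 5), 0.0)
```

Images 1, 3 and 5 form a translated copy of the segment configuration, so both searches recover them with zero mismatch. Exhaustive search is exact but expensive: if each of the four parts of the song were one segment, choosing among 55 photos would mean 55 × 54 × 53 × 52 ≈ 8.2 million ordered subsets, which is why a heuristic is worth having.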