Computerised musical intelligence
Ph.D. research project
Kristian Nymoen
Vibesgate 24
0356 Oslo
kristian.nymoen@imv.uio.no
21.11.2008

1 Introduction
Short project summary
Musical performances hold complex information which is interpreted by humans through
our cognition. In a musical performance, both the sound and the movement of the performer are significant factors in how we interpret the musical expression [17]. It has
been suggested that a deeper knowledge of these complex musical expressions may
be attained by studying the lower-level technical features of the movements of the
performer [16] [6].
In this Ph.D. project I shall investigate the possibility of making computers understand these expressions. I aim to develop prototypes of intelligent computer systems
for music, i.e. systems capable of understanding and responding to a musical input
based on detection and classification of action and sound features found in performance.
Main research question
How can the semantics of a musical performance be captured and classified by a machine learning system through audio, video, motion capture and other sensor data?
By proposing this research question I aim to complement the Sensing Music-related
Actions (SMA)1 project goals of developing machine-learning techniques and segmentation methods for extracting semantic actions from a continuous stream of sensor data,
and developing prototypes of enactive media devices that allow continuous control of
music based on the actions of the user. Where the SMA project focuses mainly on the
user/listener, I shall focus mainly on the musical performer.
1 The Sensing Music-related Actions research project is a collaboration between the Departments of Musicology and
Informatics at the University of Oslo. The SMA project aims to develop sensor technologies and intelligent systems for
musical interaction. Such systems will require a capability of classifying musical expressions. I will collaborate closely
with this project.
Primary project goals
• Develop a computerised system capable of segmenting and classifying action and
sound features of a musical performance.
• Develop new intelligent interfaces to be used in musical performance. By an
intelligent interface I mean an interface capable of learning new control actions.
• Develop intelligent systems for computerised music generation, e.g. systems for
sound synthesis, robot control and animation.
2 Theoretical Background
Music as an interdisciplinary research field
From a research perspective, I believe music holds an interesting position. To achieve
a deeper understanding of it, one needs to include several academic disciplines. For
instance, without the humanities and aesthetics perspectives, music would merely be
calculations of physical relationships between sounds. And without the perspective of
the natural sciences, we would be very limited in terms of understanding and expanding
the aesthetic field of music. Physical laws and mathematical models, together with
psycho-acoustics and cognitive science, make up the foundation for our knowledge
of sound and music perception. Sensor technologies and computer algorithms have
enabled development of new musical instruments, new ways of musical interaction,
and new methods to gain knowledge of human actions and reactions, in musical and
other contexts.
Music, technology and interface design
New Interfaces for Musical Expression (NIME) has over the last decade become one
of the major research fields within music technology. New ways of interacting with
musical instruments and other musical and non-musical interfaces have also caught the
attention of commercial interests, in particular the computer gaming industry, as shown
by games and consoles like Guitar Hero and the Nintendo Wii.
Human musical interaction with computers has evolved from being constrained
to communicating MIDI data between a keyboard interface and a sound generator, to
a variety of protocols and interfaces. These interfaces may be controllers, sending data
based on a variety of sensors on the interface (e.g. the T-Stick [11]). Interfaces may
also be augmented instruments, i.e. interfaces resembling the playing technique of a
traditional musical instrument, augmented with sensor technology for sound modification,
like the Overtone Violin [14]. Interfaces may also be extended with vibrotactile
feedback or force-feedback, like in the Cellomobo and the Haptic Drum [1].
The International Society for Music Information Retrieval (ISMIR) is concerned
with extracting musical information from data files (e.g. audio files and MIDI files).
Musical information here denotes music theory, meta-information, emotive content,
etc. Goals within the music information retrieval community include classifying audio
files by genre and comparing melodies in audio files using computer algorithms [4].
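As a minimal sketch of this kind of task, the Python fragment below classifies audio clips with a nearest-centroid rule over a few coarse spectral features; the feature set, function names and thresholds are invented placeholders, far simpler than the methods actually used in the field.

# Hypothetical sketch: naive genre classification of audio clips with a
# nearest-centroid classifier on coarse spectral features (illustration only).
import numpy as np

def spectral_features(signal, sr=44100):
    """Return a small feature vector: spectral centroid, rolloff and RMS energy."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    power = spectrum ** 2
    centroid = np.sum(freqs * power) / (np.sum(power) + 1e-12)
    cumulative = np.cumsum(power)
    rolloff = freqs[np.searchsorted(cumulative, 0.85 * cumulative[-1])]
    rms = np.sqrt(np.mean(signal ** 2))
    return np.array([centroid, rolloff, rms])

def train_centroids(clips, labels):
    """Average the feature vectors of all training clips per genre label."""
    feats = np.array([spectral_features(c) for c in clips])
    return {g: feats[np.array(labels) == g].mean(axis=0) for g in set(labels)}

def classify(clip, centroids):
    """Assign the genre whose centroid is closest in feature space."""
    f = spectral_features(clip)
    return min(centroids, key=lambda g: np.linalg.norm(f - centroids[g]))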
It has been suggested that the study of movement and motor modalities will enhance
our understanding of sound and music perception [6] [8]. New sensor technologies
and methods for storing, streaming and analysing movement data provide increased
knowledge of cognitive processes in musical contexts. For instance, in our own studies
at the Department of Musicology, we have used the Sound Description Interchange Format
to work with movement data at several sample rates simultaneously when studying the
phenomenon of co-articulation2 in musical movement [9]. I believe this knowledge may
be applied to the development of a computerised musical intelligence by using machine
learning algorithms to classify this hierarchical structure of superordinate and
subordinate movements.
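A minimal sketch of the multi-rate problem, with invented stream names and rates rather than the actual SDIF/GDIF tooling, could look like this: two movement streams recorded at different sample rates are interpolated onto a common timebase so that their features can be compared frame by frame.

# Hypothetical sketch: aligning movement streams recorded at different sample
# rates (e.g. 100 Hz motion capture vs. 25 Hz video analysis) on a common timebase.
import numpy as np

def resample(values, rate, target_times):
    """Linearly interpolate a uniformly sampled stream at the target time points."""
    source_times = np.arange(len(values)) / rate
    return np.interp(target_times, source_times, values)

# Invented example data: 2 s of wrist height from motion capture and video.
mocap = np.random.rand(200)   # 100 Hz
video = np.random.rand(50)    # 25 Hz
common_times = np.arange(0, 2, 1 / 100)        # resample everything to 100 Hz
aligned = np.column_stack([resample(mocap, 100, common_times),
                           resample(video, 25, common_times)])
print(aligned.shape)          # (200, 2): one synchronised frame per row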
3 Method
This study will build on current research in sound and movement analysis, where
the Sound Description Interchange Format (SDIF) and the Gesture Description Interchange Format (GDIF) have been suggested as bridges between technological aspects
(e.g. sensor data or audio analysis) and human descriptive terms (e.g. hand movement
or timbre). A wide taxonomy for these descriptions has already been developed through
previous research (e.g. [18], [10] and [3]), and constitutes a good foundation for the
development of a computerised musical intelligence. The classification of musical
features in my project will build on previous work within this field, as well as be part
of the ongoing GDIF development [9].
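A simplified illustration of the kind of layered description such formats aim at, with invented field names that do not follow the actual GDIF or SDIF namespaces, might be structured as follows: raw sensor streams, computed features and a human descriptive term stored side by side for one movement segment.

# Hypothetical sketch of a layered description record: raw data, computed
# features and a descriptive label side by side (field names are invented,
# not the actual GDIF/SDIF namespaces).
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MovementSegment:
    start_time: float                    # seconds into the performance
    end_time: float
    raw_streams: Dict[str, List[float]] = field(default_factory=dict)   # e.g. "accelerometer_x"
    features: Dict[str, float] = field(default_factory=dict)            # e.g. "peak_velocity"
    label: str = ""                      # human descriptive term, e.g. "upward hand lift"

segment = MovementSegment(
    start_time=12.4,
    end_time=13.1,
    raw_streams={"accelerometer_x": [0.01, 0.12, 0.35, 0.20]},
    features={"peak_velocity": 0.35, "duration": 0.7},
    label="upward hand lift",
)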
My goal of developing an intelligent system for classifying segments of a musical
performance and my goal of developing systems for intelligent music generation are
closely linked. Within the field of audio analysis, the concept of analysis-by-synthesis
has been a widely used method for understanding the structure of signals [15]. I believe
that this may be extended to several other disciplines, including the simultaneous
analysis of movement and audio. If a computer system can learn to generate a
human-like musical expression, this system is likely to be a good model of human
cognitive processing of a musical input.
2 Co-articulation theory suggests that human actions are performed with certain goal-points on which we put our
attention, and that the trajectory between these goal-points is made up of a superordinate goal-directed movement
and smaller subordinate movements [7].
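A toy sketch of the analysis-by-synthesis idea, much simpler than the procedures described in [15], could look like this: a recorded tone is analysed for its strongest partial, a sinusoid is resynthesised from that estimate, and the analysis is judged by how closely the synthetic signal matches the original.

# Hypothetical sketch of analysis-by-synthesis: estimate parameters from a
# signal, resynthesise from the estimate, and judge the analysis by the error.
import numpy as np

def analyse(signal, sr):
    """Crude analysis: frequency and amplitude of the strongest partial."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    peak = np.argmax(spectrum)
    amplitude = 2 * spectrum[peak] / len(signal)
    return freqs[peak], amplitude

def synthesise(freq, amplitude, n, sr):
    """Resynthesise a pure sinusoid from the estimated parameters."""
    t = np.arange(n) / sr
    return amplitude * np.sin(2 * np.pi * freq * t)

sr = 44100
t = np.arange(sr) / sr
original = 0.8 * np.sin(2 * np.pi * 440 * t) + 0.05 * np.random.randn(sr)
freq, amp = analyse(original, sr)
reconstruction = synthesise(freq, amp, len(original), sr)
error = np.mean((original - reconstruction) ** 2)   # low error -> analysis captures the signal
print(freq, amp, error)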
In similar research where music is used as a framework for studying robots or other
computerised systems and musical interaction, the main focus is typically on features
like beat tracking [13] or on the expression of predefined emotive features in music
[2], [12]. Developing a self-evaluating system, where the evaluation is based on rules
evolved through evolutionary computing, could eventually lead to a system that evolves
an understanding of musical emotions, rather than one that is trained on human
descriptions of the emotive features of music.
I shall approach this project by creating a "virtual perceiver", i.e. a computer
system obtaining audio, video, motion capture and sensor data, and classifying the data
using models inspired by human cognition. This may be achieved by applying models that
simulate the limitations of the human visual and auditory senses (e.g. masking of sound)
to the input, and by using evolutionary algorithms to simulate the cognitive processing
of the input data. The necessary facilities and equipment for collecting this raw data
are available in the FourMs lab at the University of Oslo, where I will do much of my
research. I will investigate ways to analyse these data, both by looking at existing
methods and by experimenting with new methods of analysis in collaboration with the SMA
project.
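One deliberately crude example of what such perceptual filtering could mean is sketched below: spectral components that fall far below the strongest component in their neighbourhood are discarded, loosely inspired by simultaneous masking. The neighbourhood size and threshold are invented placeholders, not psychoacoustic constants.

# Hypothetical sketch of a crude "perceptual filter": discard spectral bins that
# are much weaker than the strongest bin in their neighbourhood, loosely
# inspired by simultaneous masking (thresholds are invented, not psychoacoustic).
import numpy as np

def masked_spectrum(signal, neighbourhood=8, threshold_db=-30.0):
    spectrum = np.abs(np.fft.rfft(signal))
    kept = np.zeros_like(spectrum)
    for k in range(len(spectrum)):
        lo, hi = max(0, k - neighbourhood), min(len(spectrum), k + neighbourhood + 1)
        local_max = spectrum[lo:hi].max()
        # Keep the bin only if it is within threshold_db of the local maximum.
        if spectrum[k] > local_max * 10 ** (threshold_db / 20):
            kept[k] = spectrum[k]
    return kept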
By applying techniques from evolutionary computing, the system may evolve to
"understand" the musical performance and simulate the performance through animation
or by controlling a robot. A first step, as illustrated in Figure 1, could be to analyse
data from a single system (e.g. motion capture) for a single musical action, and then
use this analysis as input to an evolutionary algorithm designed to evolve raw data
that corresponds to the raw data from the original system.
[Figure 1: flow diagram. A musical action is recorded with Optitrack motion capture; the raw data passes through perceptual filtering and analysis. Genomes are generated as candidate solutions for a reverse model of the perceptual filter and analysis; the resulting phenotypes are evaluated against the raw data (fitness evaluation). If no satisfactory solution is found, new genomes are generated; if a satisfactory solution is found, the result is a good reverse model of the analysis.]
Figure 1: Preliminary model of a simple system for evolving a reverse model of the analysis used for
analysing the musical action. This model only considers simulation of the original performance based
on motion capture input. A number of such evolved "reverse models" may constitute a set of rules or
possibilities for a more complex system.
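One possible, heavily simplified reading of the loop in Figure 1 is sketched below: a small evolutionary algorithm searches for a linear reverse model that maps analysis features back to raw motion-capture frames, with fitness defined as the negative reconstruction error against the original raw data. The representation, population size and mutation scheme are placeholders chosen for illustration only.

# Hypothetical sketch of the Figure 1 loop: evolve a linear reverse model that
# maps analysis features back to raw motion-capture data, scored by how closely
# the reconstruction matches the original recording.
import numpy as np

rng = np.random.default_rng(0)

def analyse(frames):
    """Stand-in analysis: mean position and mean frame-to-frame velocity per axis."""
    return np.concatenate([frames.mean(axis=0), np.diff(frames, axis=0).mean(axis=0)])

def reconstruct(genome, features, n_frames, n_axes):
    """Phenotype: a linear map from the feature vector back to a block of raw frames."""
    weights = genome.reshape(n_frames * n_axes, features.size)
    return (weights @ features).reshape(n_frames, n_axes)

def fitness(genome, features, raw):
    """Negative mean squared error between reconstructed and original raw data."""
    return -np.mean((reconstruct(genome, features, *raw.shape) - raw) ** 2)

def evolve(raw, generations=200, pop_size=30, sigma=0.1):
    features = analyse(raw)
    genome_len = raw.size * features.size
    population = rng.normal(0.0, 1.0, (pop_size, genome_len))
    for _ in range(generations):
        scores = np.array([fitness(g, features, raw) for g in population])
        parents = population[np.argsort(scores)[-(pop_size // 2):]]   # keep the best half
        children = parents + rng.normal(0.0, sigma, parents.shape)    # mutated copies
        population = np.vstack([parents, children])
    scores = np.array([fitness(g, features, raw) for g in population])
    return population[np.argmax(scores)]

# Invented example: 50 frames of 3-axis marker positions for one musical action.
raw_action = np.cumsum(rng.normal(0.0, 0.01, (50, 3)), axis=0)
best_genome = evolve(raw_action)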
I will investigate how a set of such evolved rules and models may be used in
the development of a larger system that is able to understand and generate music based
on input from several different devices. As a final application, near the end of
the research period, I aim to develop a system that is capable of performing with a
the research period, I aim to develop a system which is capable of performing with a
musician. I shall investigate animation and robotics as ways for this system to express
a musical movement. Figure 2 shows a high-level schematic illustration of what such
a system with a robot and sound synthesis may look like.
[Figure 2: flow diagram. A musical performance and the system's own robot control and sound synthesis each produce motion capture data, video, audio and sensor data. Both streams pass through perceptual and cognitive filtering and are compared through analysis and classification based on evolved rules.]
Figure 2: High-level model of a computer system for performing together with a musician. The robot
control and sound synthesis are continuously evaluated against the input from the human musician.
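A possible skeleton for the control loop behind Figure 2 is sketched below. The filtering, classification and output stages are placeholder functions standing in for components to be developed during the project, so only the overall data flow is meant literally.

# Hypothetical skeleton of the Figure 2 loop: filter and classify the incoming
# performance data, generate a response, and evaluate that response against the
# same kind of filtered input. All component functions are placeholders.
import time

def read_inputs():                  # motion capture, video, audio and sensor frames
    return {"mocap": [], "video": None, "audio": [], "sensors": []}

def perceptual_filter(data):        # placeholder for perceptual/cognitive filtering
    return data

def classify(filtered):             # placeholder for classification with evolved rules
    return "sustained bowing"       # an invented example label

def generate_response(action_class):        # placeholder for robot control / synthesis
    return {"robot_gesture": action_class, "synth_params": {}}

def evaluate(response, filtered):           # placeholder continuous evaluation
    return 0.0

def performance_loop(duration_s=10.0, frame_rate=30.0):
    end = time.time() + duration_s
    while time.time() < end:
        filtered = perceptual_filter(read_inputs())
        action_class = classify(filtered)
        response = generate_response(action_class)
        evaluate(response, filtered)        # feedback for adapting the evolved rules
        time.sleep(1.0 / frame_rate)

performance_loop(duration_s=1.0)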
Co-evolution is currently one of the research topics in the Robotics and Intelligent
Systems group [5]. In this technique, evolutionary computation is performed on a
model of the real environment before solutions are executed in the real environment.
The model is continuously evaluated and continuously adapted to the real environment.
Because music, on both short and long time scales, is a continuously developing
phenomenon, it will be interesting to investigate the use of co-evolution in my applications.
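A toy sketch of this idea, with an invented environment function and surrogate model, could look like the following: candidate solutions are evaluated cheaply on a model that is repeatedly re-fitted to new measurements from the real, drifting environment.

# Hypothetical sketch of evolution on a continuously adapted model of the
# environment (the environment function and surrogate model are invented).
import numpy as np

rng = np.random.default_rng(1)

def real_environment(x):
    """Stand-in for the real, slowly drifting environment being approximated."""
    return -np.sum((x - real_environment.target) ** 2)
real_environment.target = np.array([0.5, -0.2])

def fit_model(samples, scores):
    """Surrogate model: quadratic bowl centred on the best sample seen so far."""
    centre = samples[np.argmax(scores)]
    return lambda x: -np.sum((x - centre) ** 2)

samples = [rng.normal(size=2)]
scores = [real_environment(samples[0])]
population = rng.normal(size=(20, 2))
for generation in range(50):
    model = fit_model(np.array(samples), np.array(scores))
    fitness = np.array([model(x) for x in population])            # cheap model evaluations
    parents = population[np.argsort(fitness)[-10:]]
    population = np.vstack([parents, parents + rng.normal(0, 0.1, parents.shape)])
    best = population[np.argmax([model(x) for x in population])]
    samples.append(best)                                          # occasional real evaluation
    scores.append(real_environment(best))                         # keeps the model adapted
    real_environment.target += rng.normal(0, 0.01, 2)             # environment drifts over time
print(samples[-1], scores[-1])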
4 Relevance to current international research
The main foundation of this project is in Human-Computer Interaction in combination
with Machine Learning (e.g. Evolutionary Computing), Music Technology and
Music Cognition. Our research group is already working internationally in the projects
Structured Understanding of Music (SUM)3 and the EU-funded Sonic Interaction Design
(COST-SID).4 My Ph.D. project is, as previously stated, also relevant to the NIME
and ISMIR communities on a global scale, and to the SMA project at the University
of Oslo on a local scale. I believe that the envisaged results of my project will also be
of interest to commercial actors such as the gaming industry.
3 http://www.re-new.dk/index.php?pid=62
4 http://www.cost-sid.org/
5 Progress plan and milestones
I will submit at least one paper as first author in 2009. This paper will be based on my
master's degree project, and is therefore not directly linked to this Ph.D. project. I am
also already working as a co-author on material for publications during the next semester.
I shall publish papers for each major step in the process, with the most productive
period expected to be 2010-2011. An exact plan is hard to make, but I believe the schedule
outlined below is sustainable. I plan to spend at least one semester abroad, most
likely in 2010; I have not yet determined the exact time and location.
2009
  spring: Investigate the potential of the technologies that I am going to use for collecting performance data, in particular the Optitrack motion capture system, as well as audio and video analysis.
  autumn: Develop a system for simple classification of sound and movement data. Writing papers.
  Milestone: develop the initial classification system.
2010
  spring: Case studies of performing musicians. Use the results for improving the system. Do the initial work on intelligent musical interfaces.
  autumn: Continued development of intelligent interfaces. Use the interfaces in concerts. Initial work on animation and sound generation for a system simulating a musical performance. Writing papers.
  Milestone: develop intelligent musical interfaces.
2011
  spring: Further development of the system: implement robot control, animation, sound generation and music theory. Writing papers.
  autumn: Case study of musicians playing together with the system. Evaluations, redesign. Using the system in concerts. Writing papers.
  Milestone: develop a system for performing music together with human musicians.
2012
  spring: Completing the system.
  autumn: Dissertation writing.
  Milestone: dissertation completion.
References
[1] Edgar Berdahl, Hans-Christoph Steiner, and Collin Oldham. Practical hardware
and algorithms for creating haptic musical instruments. In NIME ’08: Proceedings
of the 8th international conference on New interfaces for musical expression, pages
61–66, Genova, Italy, 2008. Casa Paganini.
[2] Birgitta Burger and Roberto Bresin. Displaying expression in musical performance
by means of a mobile robot. In Affective Computing and Intelligent Interaction,
volume 4738/2007 of Lecture Notes in Computer Science, pages 753–754. Springer,
Berlin / Heidelberg, 2007.
[3] Claude Cadoz and Marcelo Wanderley. Gesture-music. In Marcelo M. Wanderley
and Marc Battier, editors, Trends in Gestural Control of Music, pages 71–94.
IRCAM — Centre Pompidou, Paris, France, 2000.
[4] J. Stephen Downie. The music information retrieval evaluation exchange (MIREX).
D-Lib Magazine, 12(12), 2006.
[5] Marcus Furuholmen, Kyrre Harald Glette, Jim Tørresen, and Mats Erling Høvin.
Indirect Online Evolution - A Conceptual Framework for Adaptation in Industrial
Robotic Systems, pages 165–176. Springer, 2008.
[6] Rolf Inge Godøy. Gestural-sonorous objects: embodied extensions of Schaeffer's
conceptual apparatus. Organised Sound, 11(2):149–157, 2006.
[7] Rolf Inge Godøy, Alexander Refsum Jensenius, and Kristian Nymoen. Production and perception of goal-points and coarticulations in music. In ASA-EAA
Conference, Paris, France, 2008.
[8] Alexander Refsum Jensenius. Action–Sound: Developing Methods and Tools for
Studying Music-Related Bodily Movement. PhD thesis, University of Oslo, 2007.
[9] Alexander Refsum Jensenius, Kristian Nymoen, and Rolf Inge Godøy. A multilayered GDIF-based setup for studying coarticulation in the movements of musicians. In Proceedings of the 2008 International Computer Music Conference,
Belfast, Northern Ireland, 2008. ICMA, San Francisco.
[10] Tellef Kvifte. Instruments and the electronic age. Solum, Oslo, 1989.
[11] Joseph Malloch and Marcelo M. Wanderley. The T-Stick: from musical interface to
musical instrument. In NIME ’07: Proceedings of the 7th international conference
on New interfaces for musical expression, pages 66–70, New York, NY, USA, 2007.
ACM.
[12] Maurizio Mancini, Roberto Bresin, and Catherine Pelachaud. From acoustic cues
to an expressive agent. Gesture in Human-Computer Interaction and Simulation,
3881/2006:280–291, 2006.
[13] Marek P. Michalowski, Selma Sabanovic, and Hideki Kozima. A dancing robot for
rhythmic social interaction. In HRI ’07: Proceedings of the ACM/IEEE international conference on Human-robot interaction, pages 89–96, New York, NY, USA,
2007. ACM.
[14] Dan Overholt. The Overtone Violin: A new computer music instrument. In
Proceedings of the 2005 International Computer Music Conference, pages 604–
607, Barcelona, Spain, 2005.
[15] Jean-Claude Risset. Timbre analysis by synthesis: representations, imitations,
and variants for musical composition. In Representations of musical signals, pages
7–43. MIT Press, Cambridge, MA, USA, 1991.
[16] Pierre Schaeffer and Guy Reibel. Solfège de l'objet sonore. ORTF, Paris, France,
1967.
[17] B. Vines, M. Wanderley, C. Krumhansl, R. Nuzzo, and D. Levitin. Performance
gestures of musicians: What structural and emotional information do they convey?
In Gesture-Based Communication in Human-Computer Interaction 5th International Gesture Workshop, GW 2003, Genova, Italy, April 15-17, 2003, Selected
Revised Papers, pages 468–478. Springer Verlag, 2004.
[18] David L. Wessel. Timbre space as musical control structure. Computer Music
Journal, 3(2):45–52, 1979.
6 Signatures
Kristian Nymoen
Ph.D. candidate
Jim Tørresen
Principal supervisor
Rolf Inge Godøy
Subsidiary supervisor
Mats E. Høvin
Subsidiary supervisor
Alexander Refsum Jensenius
Subsidiary supervisor