Computerised musical intelligence
Ph.D. research project
Kristian Nymoen
Vibesgate 24
0356 Oslo
kristian.nymoen@imv.uio.no
21.11.2008

1 Introduction
Short project summary
Musical performances hold complex information which is interpreted by humans through
our cognition. In a musical performance, both the sound and the movement of the performer are significant factors in how we interpret the musical expression [17]. It has
been suggested that a deeper knowledge of these complex musical expressions may
be attained by studying the lower-level technical features of the movements of the
performer [16] [6].
In this Ph.D. project I shall investigate the possibility of making computers understand these expressions. I aim to develop prototypes of intelligent computer systems
for music, i.e. systems capable of understanding and responding to a musical input
based on detection and classification of action and sound features found in performance.
Main research question
How can the semantics of a musical performance be captured and classified by a machine learning system through audio, video, motion capture and other sensor data?
By proposing this research question I aim to complement the Sensing Music-related
Actions (SMA)1 project goals of developing machine-learning techniques and segmentation methods for extracting semantic actions from a continuous stream of sensor data,
and developing prototypes of enactive media devices that allow continuous control of
music based on the actions of the user. Where the SMA project focuses mainly on the
user/listener, I shall focus mainly on the musical performer.
1 The Sensing Music-related Actions research project is a collaboration between the Departments of Musicology and
Informatics at the University of Oslo. The SMA project aims to develop sensor technologies and intelligent systems for
musical interaction. Such systems will require a capability of classifying musical expressions. I will collaborate closely
with this project.
Primary project goals
• Develop a computerised system capable of segmenting and classifying action and
sound features of a musical performance.
• Develop new intelligent interfaces to be used in musical performance. By an
intelligent interface I mean an interface capable of learning new control actions.
• Develop intelligent systems for computerised music generation, e.g. systems for
sound synthesis, robot control and animation.
2 Theoretical Background
Music as an interdisciplinary research field
From a research perspective, I believe music holds an interesting position. To achieve
a deeper understanding of it, one needs to include several academic disciplines. For
instance, without the humanities and aesthetics perspectives, music would merely be
calculations of physical relationships between sounds. And without the perspective of
the natural sciences, we would be very limited in terms of understanding and expanding
the aesthetic field of music. Physical laws and mathematical models, together with
psycho-acoustics and cognitive science, make up the foundation for our knowledge
of sound and music perception. Sensor technologies and computer algorithms have
enabled development of new musical instruments, new ways of musical interaction,
and new methods to gain knowledge of human actions and reactions, in musical and
other contexts.
Music, technology and interface design
New Interfaces for Musical Expression (NIME) has over the last decade become one
of the major research fields within music technology. New ways of interacting with
musical instruments and other musical and non-musical interfaces have also caught the
attention of commercial interests, in particular the computer gaming industry, as shown
by games and consoles like Guitar Hero and the Nintendo Wii.
Human musical interaction with computers has evolved from being constrained
to communicating MIDI data between a keyboard interface and a sound generator, to
a variety of protocols and interfaces. These interfaces may be controllers, sending data
based on a variety of sensors on the interface (e.g. the T-Stick [11]). Interfaces may
also be augmented instruments, i.e. interfaces resembling the playing technique of a
traditional musical instrument, augmented with sensor technology for sound modification,
like the Overtone Violin [14]. Interfaces may also be extended with vibrotactile
feedback or force-feedback, like in the Cellomobo and the Haptic Drum [1].
The International Society for Music Information Retrieval (ISMIR) is concerned
with extracting musical information from data files (e.g. audio files and MIDI files).
Musical information here denotes music theory, meta-information, emotive content,
etc. Goals within the music information retrieval community include classifying audio
files by genre and comparing melodies in audio files using computer algorithms [4].
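As a minimal sketch of this kind of task, the Python fragment below classifies audio clips with a nearest-centroid rule over a few coarse spectral features; the feature set, function names and thresholds are invented placeholders, far simpler than the methods actually used in the field.

# Hypothetical sketch: naive genre classification of audio clips with a
# nearest-centroid classifier on coarse spectral features (illustration only).
import numpy as np

def spectral_features(signal, sr=44100):
    """Return a small feature vector: spectral centroid, rolloff and RMS energy."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    power = spectrum ** 2
    centroid = np.sum(freqs * power) / (np.sum(power) + 1e-12)
    cumulative = np.cumsum(power)
    rolloff = freqs[np.searchsorted(cumulative, 0.85 * cumulative[-1])]
    rms = np.sqrt(np.mean(signal ** 2))
    return np.array([centroid, rolloff, rms])

def train_centroids(clips, labels):
    """Average the feature vectors of all training clips per genre label."""
    feats = np.array([spectral_features(c) for c in clips])
    return {g: feats[np.array(labels) == g].mean(axis=0) for g in set(labels)}

def classify(clip, centroids):
    """Assign the genre whose centroid is closest in feature space."""
    f = spectral_features(clip)
    return min(centroids, key=lambda g: np.linalg.norm(f - centroids[g]))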
It has been suggested that the study of movement and motor modalities will enhance
our understanding of sound and music perception [6] [8]. New sensor technologies
and methods for storing, streaming and analysing movement data provide increased
knowledge of cognitive processes in musical contexts. For instance, in our own studies
at the Department of Musicology, we have used the Sound Description Interchange Format
to work with movement data at several sample rates simultaneously when studying the
phenomenon of co-articulation2 in musical movement [9]. I believe this knowledge may
be applied to the development of a computerised musical intelligence by using machine
learning algorithms to classify this hierarchical structure of superordinate and
subordinate movements.
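A minimal sketch of the multi-rate problem, with invented stream names and rates rather than the actual SDIF/GDIF tooling, could look like this: two movement streams recorded at different sample rates are interpolated onto a common timebase so that their features can be compared frame by frame.

# Hypothetical sketch: aligning movement streams recorded at different sample
# rates (e.g. 100 Hz motion capture vs. 25 Hz video analysis) on a common timebase.
import numpy as np

def resample(values, rate, target_times):
    """Linearly interpolate a uniformly sampled stream at the target time points."""
    source_times = np.arange(len(values)) / rate
    return np.interp(target_times, source_times, values)

# Invented example data: 2 s of wrist height from motion capture and video.
mocap = np.random.rand(200)   # 100 Hz
video = np.random.rand(50)    # 25 Hz
common_times = np.arange(0, 2, 1 / 100)        # resample everything to 100 Hz
aligned = np.column_stack([resample(mocap, 100, common_times),
                           resample(video, 25, common_times)])
print(aligned.shape)          # (200, 2): one synchronised frame per row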
3 Method
This study will build on current research in sound and movement analysis, where
the Sound Description Interchange Format (SDIF) and the Gesture Description Interchange Format (GDIF) have been suggested as bridges between technological aspects
(e.g. sensor data or audio analysis) and human descriptive terms (e.g. hand movement
or timbre). A wide taxonomy for these descriptions has already been developed through
previous research (e.g. [18], [10] and [3]), and constitutes a good foundation for the
development of a computerised musical intelligence. The classification of musical
features in my project will build on previous work within this field, as well as be part
of the ongoing GDIF development [9].
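A simplified illustration of the kind of layered description such formats aim at, with invented field names that do not follow the actual GDIF or SDIF namespaces, might be structured as follows: raw sensor streams, computed features and a human descriptive term stored side by side for one movement segment.

# Hypothetical sketch of a layered description record: raw data, computed
# features and a descriptive label side by side (field names are invented,
# not the actual GDIF/SDIF namespaces).
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MovementSegment:
    start_time: float                    # seconds into the performance
    end_time: float
    raw_streams: Dict[str, List[float]] = field(default_factory=dict)   # e.g. "accelerometer_x"
    features: Dict[str, float] = field(default_factory=dict)            # e.g. "peak_velocity"
    label: str = ""                      # human descriptive term, e.g. "upward hand lift"

segment = MovementSegment(
    start_time=12.4,
    end_time=13.1,
    raw_streams={"accelerometer_x": [0.01, 0.12, 0.35, 0.20]},
    features={"peak_velocity": 0.35, "duration": 0.7},
    label="upward hand lift",
)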
My goal of developing an intelligent system for classifying segments of a musical
performance and my goal of developing systems for intelligent music generation are
closely linked. Within the field of audio analysis, the concept of analysis-by-synthesis
has been a widely used method for understanding the structure of signals [15]. I believe
that this may be extended to several other disciplines, including the simultaneous
analysis of movement and audio. If a computer system can learn to generate a
human-like musical expression, this system is likely to be a good model of human
cognitive processing of a musical input.
2 Co-articulation theory suggests that human actions are performed with certain goal-points on which we put our
attention, and that the trajectory between these goal-points is made up of a superordinate goal-directed movement
and smaller subordinate movements [7].
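A toy sketch of the analysis-by-synthesis idea, much simpler than the procedures described in [15], could look like this: a recorded tone is analysed for its strongest partial, a sinusoid is resynthesised from that estimate, and the analysis is judged by how closely the synthetic signal matches the original.

# Hypothetical sketch of analysis-by-synthesis: estimate parameters from a
# signal, resynthesise from the estimate, and judge the analysis by the error.
import numpy as np

def analyse(signal, sr):
    """Crude analysis: frequency and amplitude of the strongest partial."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    peak = np.argmax(spectrum)
    amplitude = 2 * spectrum[peak] / len(signal)
    return freqs[peak], amplitude

def synthesise(freq, amplitude, n, sr):
    """Resynthesise a pure sinusoid from the estimated parameters."""
    t = np.arange(n) / sr
    return amplitude * np.sin(2 * np.pi * freq * t)

sr = 44100
t = np.arange(sr) / sr
original = 0.8 * np.sin(2 * np.pi * 440 * t) + 0.05 * np.random.randn(sr)
freq, amp = analyse(original, sr)
reconstruction = synthesise(freq, amp, len(original), sr)
error = np.mean((original - reconstruction) ** 2)   # low error -> analysis captures the signal
print(freq, amp, error)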
In similar research where music is used as a framework for studying robots or other
computerised systems and musical interaction, the main focus is typically on features
like beat tracking [13] or on the expression of predefined emotive features in music
[2], [12]. Developing a self-evaluating system, where the evaluation is based on rules
evolved through evolutionary computing, could eventually lead to a system that evolves
an understanding of musical emotions, rather than one that is trained on human
descriptions of the emotive features of music.
I shall approach this project by creating a "virtual perceiver", i.e. a computer
system obtaining audio, video, motion capture and sensor data, and classifying the data
using models inspired by human cognition. This may be achieved by applying models that
simulate the limitations of the human visual and auditory senses (e.g. masking of sound)
to the input, and by using evolutionary algorithms to simulate the cognitive processing
of the input data. The necessary facilities and equipment for collecting this raw data
are available in the FourMs lab at the University of Oslo, where I will do much of my
research. I will investigate ways to analyse these data, both by looking at existing
methods and by experimenting with new methods of analysis in collaboration with the SMA
project.
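One deliberately crude example of what such perceptual filtering could mean is sketched below: spectral components that fall far below the strongest component in their neighbourhood are discarded, loosely inspired by simultaneous masking. The neighbourhood size and threshold are invented placeholders, not psychoacoustic constants.

# Hypothetical sketch of a crude "perceptual filter": discard spectral bins that
# are much weaker than the strongest bin in their neighbourhood, loosely
# inspired by simultaneous masking (thresholds are invented, not psychoacoustic).
import numpy as np

def masked_spectrum(signal, neighbourhood=8, threshold_db=-30.0):
    spectrum = np.abs(np.fft.rfft(signal))
    kept = np.zeros_like(spectrum)
    for k in range(len(spectrum)):
        lo, hi = max(0, k - neighbourhood), min(len(spectrum), k + neighbourhood + 1)
        local_max = spectrum[lo:hi].max()
        # Keep the bin only if it is within threshold_db of the local maximum.
        if spectrum[k] > local_max * 10 ** (threshold_db / 20):
            kept[k] = spectrum[k]
    return kept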
By applying techniques from evolutionary computing, the system may evolve to
"understand" the musical performance and simulate the performance through animation
or by controlling a robot. A first step, as illustrated in Figure 1, could be to analyse
data from a single system (e.g. motion capture) for a single musical action, and then
use this analysis as input to an evolutionary algorithm designed to evolve raw data
that corresponds to the raw data from the original system.
[Figure 1: flow diagram. A musical action is recorded with Optitrack motion capture; the raw data passes through perceptual filtering and analysis. Genomes are generated as candidate solutions for a reverse model of the perceptual filter and analysis; the resulting phenotypes are evaluated against the raw data (fitness evaluation). If no satisfactory solution is found, new genomes are generated; if a satisfactory solution is found, the result is a good reverse model of the analysis.]
Figure 1: Preliminary model of a simple system for evolving a reverse model of the analysis used for
analysing the musical action. This model only considers simulation of the original performance based
on motion capture input. A number of such evolved "reverse models" may constitute a set of rules or
possibilities for a more complex system.
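One possible, heavily simplified reading of the loop in Figure 1 is sketched below: a small evolutionary algorithm searches for a linear reverse model that maps analysis features back to raw motion-capture frames, with fitness defined as the negative reconstruction error against the original raw data. The representation, population size and mutation scheme are placeholders chosen for illustration only.

# Hypothetical sketch of the Figure 1 loop: evolve a linear reverse model that
# maps analysis features back to raw motion-capture data, scored by how closely
# the reconstruction matches the original recording.
import numpy as np

rng = np.random.default_rng(0)

def analyse(frames):
    """Stand-in analysis: mean position and mean frame-to-frame velocity per axis."""
    return np.concatenate([frames.mean(axis=0), np.diff(frames, axis=0).mean(axis=0)])

def reconstruct(genome, features, n_frames, n_axes):
    """Phenotype: a linear map from the feature vector back to a block of raw frames."""
    weights = genome.reshape(n_frames * n_axes, features.size)
    return (weights @ features).reshape(n_frames, n_axes)

def fitness(genome, features, raw):
    """Negative mean squared error between reconstructed and original raw data."""
    return -np.mean((reconstruct(genome, features, *raw.shape) - raw) ** 2)

def evolve(raw, generations=200, pop_size=30, sigma=0.1):
    features = analyse(raw)
    genome_len = raw.size * features.size
    population = rng.normal(0.0, 1.0, (pop_size, genome_len))
    for _ in range(generations):
        scores = np.array([fitness(g, features, raw) for g in population])
        parents = population[np.argsort(scores)[-(pop_size // 2):]]   # keep the best half
        children = parents + rng.normal(0.0, sigma, parents.shape)    # mutated copies
        population = np.vstack([parents, children])
    scores = np.array([fitness(g, features, raw) for g in population])
    return population[np.argmax(scores)]

# Invented example: 50 frames of 3-axis marker positions for one musical action.
raw_action = np.cumsum(rng.normal(0.0, 0.01, (50, 3)), axis=0)
best_genome = evolve(raw_action)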
I will investigate how a set of such evolved rules and models may be used in
the development of a larger system that is able to understand and generate music based
on input from several different devices. As a final application, near the end of
the research period, I aim to develop a system that is capable of performing with a
the research period, I aim to develop a system which is capable of performing with a
musician. I shall investigate animation and robotics as ways for this system to express
a musical movement. Figure 2 shows a high-level schematic illustration of what such
a system with a robot and sound synthesis may look like.
[Figure 2: flow diagram. A musical performance and the system's own robot control and sound synthesis each produce motion capture data, video, audio and sensor data. Both streams pass through perceptual and cognitive filtering and are compared through analysis and classification based on evolved rules.]
Figure 2: High-level model of a computer system for performing together with a musician. The robot
control and sound synthesis are continuously evaluated against the input from the human musician.
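A possible skeleton for the control loop behind Figure 2 is sketched below. The filtering, classification and output stages are placeholder functions standing in for components to be developed during the project, so only the overall data flow is meant literally.

# Hypothetical skeleton of the Figure 2 loop: filter and classify the incoming
# performance data, generate a response, and evaluate that response against the
# same kind of filtered input. All component functions are placeholders.
import time

def read_inputs():                  # motion capture, video, audio and sensor frames
    return {"mocap": [], "video": None, "audio": [], "sensors": []}

def perceptual_filter(data):        # placeholder for perceptual/cognitive filtering
    return data

def classify(filtered):             # placeholder for classification with evolved rules
    return "sustained bowing"       # an invented example label

def generate_response(action_class):        # placeholder for robot control / synthesis
    return {"robot_gesture": action_class, "synth_params": {}}

def evaluate(response, filtered):           # placeholder continuous evaluation
    return 0.0

def performance_loop(duration_s=10.0, frame_rate=30.0):
    end = time.time() + duration_s
    while time.time() < end:
        filtered = perceptual_filter(read_inputs())
        action_class = classify(filtered)
        response = generate_response(action_class)
        evaluate(response, filtered)        # feedback for adapting the evolved rules
        time.sleep(1.0 / frame_rate)

performance_loop(duration_s=1.0)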
Co-evolution is currently one of the research topics in the Robotics and Intelligent
Systems group [5]. In this technique, evolutionary computation is performed on a
model of the real environment before solutions are executed in the real environment.
The model is continuously evaluated and continuously adapted to the real environment.
Because music, on both short and long time scales, is a continuously developing
phenomenon, it will be interesting to investigate the use of co-evolution in my applications.
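A toy sketch of this idea, with an invented environment function and surrogate model, could look like the following: candidate solutions are evaluated cheaply on a model that is repeatedly re-fitted to new measurements from the real, drifting environment.

# Hypothetical sketch of evolution on a continuously adapted model of the
# environment (the environment function and surrogate model are invented).
import numpy as np

rng = np.random.default_rng(1)

def real_environment(x):
    """Stand-in for the real, slowly drifting environment being approximated."""
    return -np.sum((x - real_environment.target) ** 2)
real_environment.target = np.array([0.5, -0.2])

def fit_model(samples, scores):
    """Surrogate model: quadratic bowl centred on the best sample seen so far."""
    centre = samples[np.argmax(scores)]
    return lambda x: -np.sum((x - centre) ** 2)

samples = [rng.normal(size=2)]
scores = [real_environment(samples[0])]
population = rng.normal(size=(20, 2))
for generation in range(50):
    model = fit_model(np.array(samples), np.array(scores))
    fitness = np.array([model(x) for x in population])            # cheap model evaluations
    parents = population[np.argsort(fitness)[-10:]]
    population = np.vstack([parents, parents + rng.normal(0, 0.1, parents.shape)])
    best = population[np.argmax([model(x) for x in population])]
    samples.append(best)                                          # occasional real evaluation
    scores.append(real_environment(best))                         # keeps the model adapted
    real_environment.target += rng.normal(0, 0.01, 2)             # environment drifts over time
print(samples[-1], scores[-1])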
4 Relevance to current international research
The main foundation of this project is in Human-Computer Interaction in combination
with Machine Learning (e.g. Evolutionary Computing), Music Technology and
Music Cognition. Our research group is already working internationally in the projects
Structured Understanding of Music (SUM)3 and the EU-funded Sonic Interaction Design
(COST-SID).4 My Ph.D. project is, as previously stated, also relevant to the NIME
and ISMIR communities on a global scale, and to the SMA project at the University
of Oslo on a local scale. I believe that the envisaged results of my project will also be
of interest to commercial actors such as the gaming industry.
3 http://www.re-new.dk/index.php?pid=62
4 http://www.cost-sid.org/
5 Progress plan and milestones
I will submit at least one paper as first author in 2009. This paper will be based on my
master's degree project, and is therefore not directly linked to this Ph.D. project. I am
also already working as a co-author on material for publications during the next semester.
I shall publish papers for each major step in the process, with the most productive
period expected to be 2010-2011. An exact plan is hard to make, but I believe the schedule
outlined below is sustainable. I plan to spend at least one semester abroad, most
likely in 2010; I have not yet determined the exact time and location.
2009
  spring: Investigate the potential of the technologies that I am going to use for collecting performance data, in particular the Optitrack motion capture system, as well as audio and video analysis.
  autumn: Develop a system for simple classification of sound and movement data. Writing papers.
  Milestone: develop the initial classification system.
2010
  spring: Case studies of performing musicians. Use the results for improving the system. Do the initial work on intelligent musical interfaces.
  autumn: Continued development of intelligent interfaces. Use the interfaces in concerts. Initial work on animation and sound generation for a system simulating a musical performance. Writing papers.
  Milestone: develop intelligent musical interfaces.
2011
  spring: Further development of the system: implement robot control, animation, sound generation and music theory. Writing papers.
  autumn: Case study of musicians playing together with the system. Evaluations, redesign. Using the system in concerts. Writing papers.
  Milestone: develop a system for performing music together with human musicians.
2012
  spring: Completing the system.
  autumn: Dissertation writing.
  Milestone: dissertation completion.
References
[1] Edgar Berdahl, Hans-Christoph Steiner, and Collin Oldham. Practical hardware
and algorithms for creating haptic musical instruments. In NIME ’08: Proceedings
of the 8th international conference on New interfaces for musical expression, pages
61–66, Genova, Italy, 2008. Casa Paganini.
[2] Birgitta Burger and Roberto Bresin. Displaying expression in musical performance
by means of a mobile robot. In Affective Computing and Intelligent Interaction,
volume 4738/2007 of Lecture Notes in Computer Science, pages 753–754. Springer,
Berlin / Heidelberg, 2007.
[3] Claude Cadoz and Marcelo Wanderley. Gesture-music. In Marcelo M. Wanderley
and Marc Battier, editors, Trends in Gestural Control of Music, pages 71–94.
IRCAM — Centre Pompidou, Paris, France, 2000.
[4] J. Stephen Downie. The music information retrieval evaluation exchange (MIREX).
D-Lib Magazine, 12(12), 2006.
[5] Marcus Furuholmen, Kyrre Harald Glette, Jim Tørresen, and Mats Erling Høvin.
Indirect Online Evolution - A Conceptual Framework for Adaptation in Industrial
Robotic Systems, pages 165–176. Springer, 2008.
[6] Rolf Inge Godøy. Gestural-sonorous objects: embodied extensions of Schaeffer's
conceptual apparatus. Organised Sound, 11(2):149–157, 2006.
[7] Rolf Inge Godøy, Alexander Refsum Jensenius, and Kristian Nymoen. Production and perception of goal-points and coarticulations in music. In ASA-EAA
Conference, Paris, France, 2008.
[8] Alexander Refsum Jensenius. Action–Sound: Developing Methods and Tools for
Studying Music-Related Bodily Movement. PhD thesis, University of Oslo, 2007.
[9] Alexander Refsum Jensenius, Kristian Nymoen, and Rolf Inge Godøy. A multilayered GDIF-based setup for studying coarticulation in the movements of musicians. In Proceedings of the 2008 International Computer Music Conference,
Belfast, Northern Ireland, 2008. ICMA, San Francisco.
[10] Tellef Kvifte. Instruments and the electronic age. Solum, Oslo, 1989.
[11] Joseph Malloch and Marcelo M. Wanderley. The T-Stick: from musical interface to
musical instrument. In NIME ’07: Proceedings of the 7th international conference
on New interfaces for musical expression, pages 66–70, New York, NY, USA, 2007.
ACM.
[12] Maurizio Mancini, Roberto Bresin, and Catherine Pelachaud. From acoustic cues
to an expressive agent. Gesture in Human-Computer Interaction and Simulation,
3881/2006:280–291, 2006.
[13] Marek P. Michalowski, Selma Sabanovic, and Hideki Kozima. A dancing robot for
rhythmic social interaction. In HRI ’07: Proceedings of the ACM/IEEE international conference on Human-robot interaction, pages 89–96, New York, NY, USA,
2007. ACM.
[14] Dan Overholt. The Overtone Violin: A new computer music instrument. In
Proceedings of the 2005 International Computer Music Conference, pages 604–
607, Barcelona, Spain, 2005.
[15] Jean-Claude Risset. Timbre analysis by synthesis: representations, imitations,
and variants for musical composition. In Representations of musical signals, pages
7–43. MIT Press, Cambridge, MA, USA, 1991.
[16] Pierre Schaeffer and Guy Reibel. Solfège de l'objet sonore. ORTF, Paris, France,
1967.
[17] B. Vines, M. Wanderley, C. Krumhansl, R. Nuzzo, and D. Levitin. Performance
gestures of musicians: What structural and emotional information do they convey?
In Gesture-Based Communication in Human-Computer Interaction 5th International Gesture Workshop, GW 2003, Genova, Italy, April 15-17, 2003, Selected
Revised Papers, pages 468–478. Springer Verlag, 2004.
[18] David L. Wessel. Timbre space as musical control structure. Computer Music
Journal, 3(2):45–52, 1979.
6 Signatures
Kristian Nymoen
Ph.D. candidate
Jim Tørresen
Principal supervisor
Rolf Inge Godøy
Subsidiary supervisor
Mats E. Høvin
Subsidiary supervisor
Alexander Refsum Jensenius
Subsidiary supervisor