IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 48, NO. 6, JUNE 2018
Modeling and Analysis of Indian Carnatic
Music Using Category Theory
Sarala Padi, Spencer Breiner, Eswaran Subrahmanian, and Ram D. Sriram, Fellow, IEEE
Abstract—This paper presents a category theoretic ontology
of Carnatic music. Our goals here are twofold. First, we will
demonstrate the power and flexibility of conceptual modeling
techniques based on a branch of mathematics called category
theory (CT), using the structure of Carnatic music as an example.
Second, we describe a platform for collaboration and research
sharing in this area. The construction of this platform uses formal methods of CT (colimits) to merge our Carnatic ontology
with a generic model of music information retrieval tasks. The
latter model allows us to integrate multiple analytical methods,
such as hidden Markov models, machine learning algorithms, and
other data mining techniques like clustering, bagging, etc., in the
analysis of a variety of different musical features. Furthermore,
the framework facilitates the storage of musical performances
based on the proposed ontology, making them available for
additional analysis and integration. The proposed framework
is extensible, allowing future work in the area of rāga recognition to build on our results, thereby facilitating collaborative
research. Generally speaking, the methods presented here are
intended as an exemplar for designing collaborative frameworks supporting reproducibility of computational analysis and
simulation.
Index Terms—Categorical framework for Carnatic music,
categorical structure for rāga, category theory (CT),
ontology.
I. INTRODUCTION

AROUND the world, a tremendous number of audio music
files are accessed every day for personal usage, research,
and analysis purposes. The area of music information retrieval
(MIR) studies techniques and methods for helping users find
the music files they want. Thus, MIR concentrates on archival
methods, providing metadata, annotations, search algorithms,
and analysis of music to help the users filter an ocean of musical content. Especially as listening habits migrate from hard
drives to the cloud, MIR is becoming an increasingly critical
area of research [1].
There are two general classes of MIR: 1) content-based
and 2) metadata-based [2], [3]. The former involves retrieval
based on musical content, which a user might provide by
Manuscript received May 12, 2016; revised August 29, 2016; accepted
November 4, 2016. Date of publication January 31, 2017; date of current
version May 15, 2018. This paper was recommended by Associate Editor
L. Sheremetov.
The authors are with the Department of ITL, National Institute
of Standards and Technology, Gaithersburg, MD 20899 USA (e-mail:
sarala.padi@nist.gov).
This paper has supplementary downloadable multimedia material available
at http://ieeexplore.ieee.org provided by the authors. This includes a brief
introduction to Category Theory. This material is 116 KB in size.
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSMC.2016.2631130
humming or singing. However, the vast majority of MIR is
metadata-based; we retrieve music segments by searching for
text related to the segment, such as its title or artist. In this
paper, we will focus exclusively on metadata-based MIR.
Although metadata-based retrieval is convenient for users
and easy to implement, the metadata on which these searches
depend must be expressive enough and accurate enough to
support retrieval tasks. This makes metadata creation and
evaluation one of the most significant challenges in metadata-based MIR. In many cases, metadata collection requires explicit
supervision to ensure accurate results. It may also be difficult
to maintain a consistent format (including the choice of data
fields) in our music databases, especially when these grow
rapidly.
One way to ease, though not eliminate, these challenges is
through the use of domain modeling. If we have some rules
about the way that different pieces of metadata relate to one
another, then we can evaluate new data against these rules to
recognize some incorrect inferences. Similarly, modeling the
structure of a musical domain can allow us to cross-check
related inferences.
There is a significant body of literature modeling Western
classical music, including tempo estimation, beat tracking,
instrumental music segmentation, instruments classification,
and transcription [4]–[9]. By comparison, there has been
little effort to analyze Indian classical music and, correspondingly, most MIR technologies have not been applied in this
context [10]–[12]. Here, we try to address this gap.
Indian classical music has two main traditions, namely,
Carnatic and Hindustani. Carnatic music is popular in the
southern part of India while Hindustani is popular in the north.
Though these two traditions are similar in certain respects,
in this paper, we choose to focus only on Carnatic music,
leaving the extension to Hindustani music and the relationship between the two branches for future work. A reader
interested in learning more about Hindustani music should
consult [13], [14].
A song in Carnatic music is composed in a specific
melody (rāga) but, unlike the case in Western classical music,
the rhythm can be rendered differently by different musicians [13], [15]. This is primarily because Carnatic music has
been handed down from teacher to student in an oral tradition. Consequently, Carnatic music typically does not have
standardized notations as are available in Western music, and
what does exist varies from school to school. This adds an
additional difficulty for the archiving and analysis of Indian
classical music.
2168-2216 © 2017 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Despite all this flexibility, Carnatic music is still constrained
by rules, and our goal here is to model these rules using a
mathematical discipline called category theory (CT). CT provides a precise mathematical representation for these aspects
of Carnatic music so that these properties and characteristics
can be clearly elucidated. This will allow us to check MIR
metadata for consistency with these characteristics.
There are, of course, many different modeling languages and
techniques; CT is noteworthy for its focus on the relationships
between informational entities, rather than the entities themselves. Because of this, categorical models are largely agnostic
with respect to specific choices of representational form or formalism. The success of these methods is demonstrated by the
impressive breadth of categorical applications, which have provided insights into fields as varied as music modeling [16],
computer science [17]–[20], engineering [21]–[23], theoretical
physics [24], and biosciences [25], [26]. Importantly for our
purposes, CT models are closely related to database schemas, a
fact which facilitates MIR activities, such as storage, retrieval,
and analysis.
We hope that this paper will find interested readers in several
different areas. First of all, we hope that researchers interested
in Carnatic music will be able to use and extend our model
to assist in their own work, and thereby facilitate collaboration in this area. More generally, we offer this example as a
fairly detailed study in the development and application of category theoretic models, of interest in any area of collaborative
science. In the interests of these readers we do not assume
familiarity with CT methods, and include a short introduction
in the supplementary material. However, we also hope that
experienced practitioners of CT may also enjoy this paper,
and find in it some small progress toward a methodology of
applied CT, an area of study still largely undeveloped.
This paper begins in Section II, where we describe the individual building blocks and structure of Carnatic music that will
be used for MIR tasks. In Section III, we translate this description into a categorical model. The basis of this translation is
provided as supplementary material giving a brief introduction to CT; readers without a CT background are strongly
advised to read the supplemental material before proceeding
to Section III. Section IV broadens the scope of our models by
merging the individual building blocks into a single conceptual framework to model Carnatic music as a whole. Finally,
we discuss some of the ways that categorical methods can
facilitate collaboration and group research.
II. INTRODUCTION TO CARNATIC MUSIC
Indian classical music is one of the world’s oldest musical traditions. It has been developed over centuries, and been
influenced by many religions and cultures. The present system
of Indian music is based on two pillars: 1) rāga and 2) tāla.
The first, rāga, is the melodic component of the music, corresponding to the “mode” or “scale” in Western classical music.
The tāla, by contrast, describes the rhythmic component of the
music. Indian classical music contains two traditions, Carnatic
and Hindustani, associated with southern and northern India,
respectively. Though both traditions involve the concepts of
Fig. 1. Basic svaras that make up a melody in Carnatic music tradition.
rāga and tāla, the interpretations of these notions vary based
on the performance practices of the two traditions [27]. In this
paper, we focus on the Carnatic tradition, leaving Hindustani
music for future analysis.
The most basic component in Carnatic music is the svara,
which is roughly analogous to a note in western music.
However, as we will see, the relationship between svara and
pitch is more complicated than in western music. In addition, these svaras determine the articulation of a note, the
way that it is sung or played. Fundamentally there are seven
svaras in Carnatic music, namely, Sadja (S), Rishaba (R),
Ghandhara (G), Madhyama (M), Panchama (P), Daivatha (D),
and Nishada (N) [13]. As shown in Fig. 1, the first note,
Sadja, is the tonic, meaning that the frequency of the other
svaras is measured relative to the pitch1 of the Sadja. Thus,
the frequency of the Panchama svara is always one and a half
times the frequency of the Sadja.
However, the remaining svaras (R, G, M, D, and N) may
each occur in one of two or three variations. Rishaba, for
example, may have a frequency ratio of (16/15), (9/8),
or (6/5). We refer to these variations as R1, R2, and R3,
respectively. Moreover, some different svaras may share the
same pitch frequency: G2 also has a relative pitch frequency
of (6/5). Altogether there are 16 svara variations spread across
12 pitch frequencies (svarastanas). The fixed svaras, S and P,
are called shudh svaras.
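For concreteness, the ratio arithmetic above can be expressed in a short script. This is only a sketch: it collects just the ratios quoted in this section (the full list appears in Table I), and the 240 Hz tonic used in the example is purely illustrative.

```python
# Relative pitch ratios for the svara variants named in the text.
# Only ratios explicitly stated above are included; see Table I
# for the complete list of 16 variants.
from fractions import Fraction

FREQ_RATIO = {
    "S":  Fraction(1, 1),    # tonic (Sadja)
    "R1": Fraction(16, 15),
    "R2": Fraction(9, 8),
    "R3": Fraction(6, 5),
    "G2": Fraction(6, 5),    # shares a svarastana with R3
    "P":  Fraction(3, 2),    # always 1.5 times the tonic
}

def svara_frequency(svara: str, tonic_hz: float) -> float:
    """Absolute frequency of a svara, given the performer's tonic."""
    return float(FREQ_RATIO[svara]) * tonic_hz
```

With the tonic fixed at 240 Hz, for instance, Panchama sounds at 360 Hz, and R3 and G2 yield the same frequency even though they are distinct svaras.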
The situation is similar to that in Western classical music,
where E and B are pure notes in the sense that they do not have
sharp and flat modifications, whereas the remaining notes, C,
D, F, G, and A, may be modulated by sharp and flat notations
C♯, D♭, etc. It is also worth noting that when the S svara in
Carnatic music is fixed to C major in Western classical music
then the svaras S, R, G, M, P, D, and N correspond to the seven
notes of the C major scale, C, D, E, F, G, A, and B [28].
Table I lists the pitch ratios associated with each svara.
See [10], [29] for discussions of the allowed frequencies
for the 12 svarastanas and their frequency ratios relative to
the tonic. Notice that some svaras share the same relative
pitch; these are nonetheless distinguished by their articulation. This is displayed in Fig. 1, which shows the grouping of svaras by pitch (vertical groupings) and articulation
(horizontal groupings).
1 Here, we use “pitch” to refer to the frequency of a tone (in hertz) and we
use these two terms interchangeably. In later sections, pitch will be the named
feature that measures the frequency in Hz for rāga recognition tasks.
TABLE I
CARNATIC MUSIC SVARAS AND THEIR FREQUENCY RATIOS

Fig. 2. Svara sequence (both ascending and descending) and its generating regular expression for a Melakartha rāga system.
A. Rāga
The melodic component of a piece of Carnatic music is
called its rāga. Rāgas are constructed by grouping together
the svaras introduced above in different ways, and each of
these groupings is associated with one of nine corresponding emotions [15]. The rāgas are developed to elicit these
emotions and, in fact, the word “rāga” is derived from the
Sanskrit word for “color” or “passion.”
A rāga is uniquely determined by a sequence of svaras.
Additionally, this sequence can be decomposed into an
aarohana (ascending) sequence followed by an avarohana
(descending) sequence. In the aarohana sequence the pitch
tends to increase, beginning at the tonic and ending again at
the tonic, one full scale higher (i.e., at twice the frequency of
the initial tonic). The avarohana sequence, on the other hand,
tends to decrease, beginning at the high tonic and ending at
the low.
Rāgas are broadly grouped into two classes, namely,
Melakartha and Janya rāgas. While both classes contain
aarohana and avarohana sequences, the Melakartha rāgas are
more restrictive in the sequences they allow. Rāgas may be
further classified as sampoorna (complete) or asampoorna.
A sampoorna rāga includes exactly seven svaras in both
the aarohana and the avarohana sequence. Any other rāga
is asampoorna. Therefore, all Melakartha rāgas are sampoorna rāgas while Janya rāgas may be either sampoorna
or asampoorna rāgas.
In Melakartha rāgas, both the aarohana and avarohana
sequences are required to contain exactly one note from each
svara class (i.e., the horizontal groupings as shown in Fig. 1).
Moreover, in the aarohana sequence the pitch of each svara
must be higher than the one before. Thus, e.g., the svara R2
can be followed by G2 or G3, but not by G1. Similar (but
reversed) requirements hold for the avarohana sequence. This
means that the frequency of notes in a rāga gradually increases
from the tonic to its middle note (one full scale higher), and
then gradually descends to return to the tonic at the end of
the rāga. Fig. 2(a) gives a finite state machine and a regular
expression which will produce all of the aarohana sequences
in Melakartha rāgas.
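The constraint that Fig. 2(a) encodes can be sketched as a small validator: exactly one svara from each class, in the fixed class order, with strictly ascending pitch. This is a hedged illustration rather than the paper's implementation; the 12-svarastana pitch positions below follow the standard layout, which the text only partially specifies (e.g., that R3 and G2 share a pitch).

```python
import re

# Position of each svara variant on the 12 svarastanas (0 = tonic).
# These positions are the standard layout (an assumption beyond the
# text, which notes only that some variants share a pitch).
PITCH_POS = {
    "S": 0,
    "R1": 1, "R2": 2, "R3": 3,
    "G1": 2, "G2": 3, "G3": 4,
    "M1": 5, "M2": 6,
    "P": 7,
    "D1": 8, "D2": 9, "D3": 10,
    "N1": 9, "N2": 10, "N3": 11,
    "S'": 12,  # upper tonic, at twice the frequency of S
}

# Regular expression for the svara-class order, mirroring Fig. 2(a).
AAROHANA_RE = re.compile(r"^S R[123] G[123] M[12] P D[123] N[123] S'$")

def is_melakartha_aarohana(seq):
    """True iff seq is a legal Melakartha ascending sequence."""
    if not AAROHANA_RE.match(" ".join(seq)):
        return False  # wrong classes or wrong class order
    pos = [PITCH_POS[s] for s in seq]
    return all(a < b for a, b in zip(pos, pos[1:]))  # strictly ascending
```

Under this check, the sequence S R2 G3 M1 P D1 N2 S' is accepted, while S R2 G1 ... is rejected because G1 does not lie above R2 in pitch.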
By contrast, Janya rāgas are much more flexible. Their
sequences usually contain fewer than seven notes, although
sometimes they may have seven or more notes. A given
sequence in a Janya rāga may also repeat certain svaras, and
need not be strictly ascending or descending. For a comparison
of Melakartha and Janya rāgas, see Fig. 2(b).
There are further restrictions on Melakartha rāgas which
Janya rāgas do not share. For Melakartha rāgas, both the
aarohana and avarohana sequences must contain exactly the
same svaras, and these sequences must be exactly the reverse
of one another. Thus, all together there are 72 possible
Melakartha rāgas constructed from the allowed combinations
of svaras indicated in Fig. 2.
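The count of 72 can be checked by direct enumeration: 6 ascending R–G pairs, 2 choices of M, and 6 ascending D–N pairs. The sketch below uses the same assumed 12-svarastana pitch positions as the standard layout.

```python
from itertools import product

# 12-svarastana positions (standard layout; an assumption beyond the text).
R = {"R1": 1, "R2": 2, "R3": 3}
G = {"G1": 2, "G2": 3, "G3": 4}
M = {"M1": 5, "M2": 6}
D = {"D1": 8, "D2": 9, "D3": 10}
N = {"N1": 9, "N2": 10, "N3": 11}

def ascending_pairs(lo, hi):
    """Pairs (x, y) whose pitches strictly ascend, as the aarohana requires."""
    return [(x, y) for x, y in product(lo, hi) if lo[x] < hi[y]]

rg = ascending_pairs(R, G)   # 6 valid combinations
dn = ascending_pairs(D, N)   # 6 valid combinations
melakartha = [(r, g, m, d, n)
              for (r, g) in rg for m in M for (d, n) in dn]
# len(melakartha) is 6 * 2 * 6 = 72, matching the traditional count.
```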
There is, however, an important relationship between the
two types of rāgas: every Janya rāga is derived from a
Melakartha rāga. Starting from a Melakartha rāga, one usually drops or adds a small number of svaras to arrive at an
associated Janya rāga.
This relationship is also reflected in an additional characteristic of rāgas: the emotion associated with a rāga.
Traditionally, each rāga is associated with one of nine emotions, including Bhakti (ritual or devotional) and Viram (bravery
or fury). A thorough discussion is given in [30]. Here, the
important observation is that a Janya rāga shares the same
emotion as the Melakartha rāga from which it was derived.
B. Tāla
In the Carnatic music tradition, tāla and rāga are concepts
of equal importance in rendering a composition. Tāla defines
the rhythmic structure or framework within which music is
performed. Literally, tāla means "clap"; it is used to keep
time, which in turn determines the rhythmic structure
or pattern of a musical composition. Generally, no
special instrument maintains the tāla in an Indian classical
music performance. It is usually maintained by the musician
by tapping a hand on the lap or by clapping both hands
together. As shown in Table II, seven tālas are characterized
TABLE II
Tāla NAMES AND CORRESPONDING STRUCTURES (PATTERNS) IN CARNATIC MUSIC TRADITION. THE NOTATIONS IN THE TABLE ARE: laghu—I, drutam—O, AND anudrutam—U

TABLE IV
LIST OF GATIS AND THE NUMBER OF NOTES PER BEAT

TABLE III
JATI NAMES AND NUMBER OF BEATS FOR laghu (I) SPECIFIED IN tāla PATTERN
using three different notations, namely, laghu (I), drutam (O),
and anudrutam (U).2
Any piece of Carnatic music has a fixed cycle called an
avartana, and each cycle is divided into basic time intervals
called units or aksharas. The number of aksharas in a cycle
is determined by two pieces of information: 1) a tāla and
2) a jati. A tāla is built up as a combination of three different clapping styles. Anudrutam (indicated using U) consists of
a single beat counted with an ordinary (palm-to-palm) clap.
Drutam (O) consists of two beats: one ordinary clap followed
by an “empty” clap placing the back of one hand into the palm
of the other. Finally laghu (I) consists of a variable number of
beats. The first is an ordinary clap followed by counting the
remaining beats on the fingers of one hand. There are seven
different tālas, each corresponding to a different sequence of
these components. Thus, for example, the Jampa tāla corresponds to the sequence IUO while Ata tāla corresponds to
IIOO. A list of all these tālas, together with the associated
sequences, is given in Table II.
The variability of the laghu component of a tāla is determined by a second characteristic of Carnatic rhythm: the jati.
There are five traditional options for jati, each of which determines a different number of beats for the laghu component of
a tāla. So the Tisra jati involves three beats for each laghu
component while Misra involves seven. These various jatis,
along with their number of beats, are indicated in Table III.
The rhythm of a Carnatic composition is determined by a
tāla and a jati, a pair which we call a tāla structure. Any
combination is valid, so there are 7 × 5 = 35 tāla structures.
From a tāla structure we can compute the number of beats in
a full musical cycle. For example, the ata tāla (with pattern
IIOO) in the tisra jati would consist of 3 + 3 + 2 + 2 = 10
beats per cycle, whereas the same tāla in the misra jati would
consist of 7 + 7 + 2 + 2 = 18 beats per cycle.
2 laghu—indicates one beat of the palm followed by counting on the fingers;
drutam—indicates one beat of the palm and turning it over; and anudrutam—
indicates just one beat of the palm. The number of aksharas (beats) is 2 for
drutam and 1 for anudrutam; for laghu it depends on the type of jati.
Fig. 3. Illustration of tāla structure in Carnatic music tradition.
The final component of Carnatic rhythm is called its gati.
This determines the number of notes which are played or sung
for each beat (akshara). Combining this with the number of
aksharas per cycle we can also compute the number of notes
per cycle. One point of potential confusion here arises from
the fact that the gati names are the same as the jati names,
although the two can vary independently. In addition, the number of notes per akshara for a given gati is the same as the
number of beats per laghu (I) for the jati of the same name
(see Table IV). Thus, a performance in tisra gati will have
three notes per beat, regardless of its jati.
A full example of this rhythmic structure is illustrated in
Fig. 3 for Rupaka tāla Kanda jati with Tisra gati. Because
Rupaka tāla has the pattern OI, the entire cycle is divided
into two parts: 1) drutam (O) and 2) laghu (I). As always,
drutam has a fixed number of beats (2), while the number of
beats for laghu is determined by the Kanda jati (5), making
seven in total. Finally, the Tisra gati entails three notes per
akshara, meaning that the full cycle will consist of 21 notes.
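The beat and note counts worked out in the last few paragraphs follow mechanically from the tāla pattern, the jati, and the gati. The following sketch reproduces that arithmetic; the Chatusra and Sankeerna beat counts are not stated in this section and are taken from the traditional list, so treat them as assumptions.

```python
# Beats per laghu for each jati (Table III). Only the Tisra, Kanda,
# and Misra counts are quoted in the text; Chatusra and Sankeerna
# follow the traditional list.
JATI_BEATS = {"Tisra": 3, "Chatusra": 4, "Kanda": 5,
              "Misra": 7, "Sankeerna": 9}

def beats_per_cycle(pattern: str, jati: str) -> int:
    """Aksharas in one avartana, e.g. Ata tala 'IIOO' in Tisra jati."""
    # anudrutam (U) = 1 beat, drutam (O) = 2 beats,
    # laghu (I) = the jati's beat count.
    component = {"U": 1, "O": 2, "I": JATI_BEATS[jati]}
    return sum(component[c] for c in pattern)

def notes_per_cycle(pattern: str, jati: str, gati: str) -> int:
    """Total notes per cycle; gati names share the jati beat counts."""
    return beats_per_cycle(pattern, jati) * JATI_BEATS[gati]
```

This reproduces the examples above: Ata tāla (IIOO) gives 3 + 3 + 2 + 2 = 10 beats in Tisra jati and 18 in Misra, while Rupaka tāla (OI) in Kanda jati with Tisra gati gives 7 beats and 21 notes per cycle.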
C. Performance in Carnatic Music
In the Carnatic music tradition, performance can be vocal
or instrumental. In the case of vocal performances, the lead
musician is a vocalist and in the case of instrumental performances, the lead musician is an instrumentalist (violin, veena,
mridangam, etc). In Carnatic music performances the lead
musician is generally accompanied by instruments, namely,
violin, mridangam, ghatam, morsing, and tanpura. The tanpura is
used to maintain the basic pitch throughout the concert, usually
referred to as the tonic of the performance. All these instruments
are tuned to the basic pitch of the lead performer. In general, any
concert or musical performance is a combined effort from the
lead musician and various instrumentalists who are accompanying the lead performer. Therefore, a piece of music is usually
Fig. 5. Methods and features used for the rāga recognition task in the Carnatic music tradition.

Fig. 4. Research tasks in the MIR community for Indian classical music.
tagged with song name, rāga name, tāla name, and composer
name along with the names of all the musicians involved in
a performance. This data is referred to as metadata; it is
critical for MIR, and providing it is the main archival goal.
The following section gives an overview of current
research on archival methods and retrieval of Carnatic music.
1) Research in Carnatic Music (State of the Art): This section gives an overview of the state of the research in both the
Carnatic and the Hindustani musical traditions. In particular,
various methods and features used in the literature to recognize the rāga of a given music segment are described fully
to help understand the categorical models discussed in later
sections. Fig. 4 illustrates the state of research useful for MIR
purposes in Indian classical music, an area which has grown
substantially in the last ten years.
One aspect of this research, CompMusic,3 is a project
focused on a wide variety of musical traditions,
including Indian classical music, Chinese music, Turkish
music, etc. “Dunya,” a culture-specific toolbox developed
under this project, includes tools for music segmentation,
feature extraction, metadata extraction, and metadata-based
retrieval of music segments in a wide variety of musical traditions [31]–[38]. Apart from the CompMusic project, other
efforts have focused on analyzing Carnatic and Hindustani
music for various processing tasks (both vocal and instrumental music) which are useful for MIR [39]–[44].
For most MIR-related tasks, metadata plays a dominant role;
for example, a composer’s name is among the most useful
metadata for information retrieval. Thus, composer identification is one of the most critical tasks among those given
in Fig. 4. In Carnatic music, composers often have a unique
signature (mudra) which they weave into their compositions.
Thus, one possible way to identify the composer of a music
segment is to identify its mudra, which occurs near the end
of a composition. As yet, no technique or feature has been
developed to find the mudra of a composition segment for either Carnatic or Hindustani music. The different
borders around the tasks in Fig. 4 divide them into
3 http://compmusic.upf.edu/
three classes: 1) those which have not yet been attempted;
2) those which are in progress; and 3) those which have been
successfully completed.
Fig. 5 elaborates on the rāga recognition task by giving an
overview of the features and methods used for identifying the
rāga of a music segment. Such a task involves a particular
method which is used to analyze a particular feature in hopes
of identifying the rāga of a music segment. For example, two
recognition procedures considered in this paper are the application of hidden Markov models (the method HMMs) [45]
to mel frequency cepstral coefficients features [46], and the
application of the K-nearest neighbor [47] method to the pitch
feature [48]. Many other features (pitch class distributions and
pitch class dyad distributions) and methods (support vector
machines and linear discriminant analysis) are available for
these recognition tasks.
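As an illustration of the nearest-neighbor style of recognition mentioned above, the sketch below classifies a pitch-class distribution (one bin per svarastana) with 1-NN under cosine distance. The feature vectors and rāga labels here are invented purely for illustration and do not come from any corpus or from the methods evaluated in this paper.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two nonzero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def knn_predict(query, training, k=1):
    """Majority label among the k nearest (vector, label) examples."""
    ranked = sorted(training, key=lambda ex: cosine_distance(query, ex[0]))
    labels = [label for _, label in ranked[:k]]
    return max(set(labels), key=labels.count)

# Toy 12-bin pitch-class histograms; vectors and labels are illustrative.
train = [
    ([5, 0, 3, 0, 4, 2, 0, 5, 0, 3, 0, 2], "Sankarabharanam"),
    ([5, 2, 0, 0, 4, 3, 0, 5, 2, 0, 0, 4], "Mayamalavagowla"),
]
query = [4, 0, 3, 0, 4, 2, 0, 5, 0, 2, 0, 2]
# knn_predict(query, train) returns the label of the nearer example.
```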
At present, there is no common platform used to store,
share, and integrate the results obtained from these tasks. This
state of affairs is due to the lack of a common model of
Carnatic music needed to integrate these studies. The
categorical model that we propose will allow users to develop
and work with such common database collections and to easily share their results with a larger group. In a broad sense, the
model proposed in this paper provides the complete structure
necessary for a group of researchers to work together, share their
data and results, and store the data in a collective database.
The proposed categorical framework is intended to facilitate
a platform for collaborative research and it can be adapted to
other domains as well.
III. CATEGORY THEORETIC REPRESENTATION OF CARNATIC MUSIC
Knowledge representation plays a fundamental role in
describing any domain problem in which a computer must
interpret and process data for further analysis. In many real
world applications, modeling the objects and relationships in
a domain constitutes a major contribution to such knowledge
representation. The most important criterion for representing
any domain problem is that the knowledge be properly
defined, so that it is consistent with the problem domain.
Therefore, the primary challenges of knowledge representation
are as follows.
1) Choosing the problem to solve.
972
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 48, NO. 6, JUNE 2018
2) Representing a problem and providing the required
knowledge to solve the problem.
3) Validating the appropriateness of the knowledge before
solving the problem.
4) Solving the problem computationally and evaluating the
correctness of the solution to the problem.
5) Comprehensive representation of the knowledge so that a
computer can solve the problem for a particular domain.
There are many models for knowledge representation. One
class of such models is called ontologies, which provide an
explicit specification and abstraction of knowledge [49]–[52]
in either human-readable form or a formal language. These
typically begin by specifying the types of entities which occur
in the problem domain, which are then supplemented with a collection
of rules that the entities are expected to obey.
The most popular approach to ontologies relies on a family of
Web ontology languages (OWL) [53]–[55]. In MIR, these have
been applied to model music for segmentation and the extraction of semantic information [56], [57]. The OWL approach
is based on description logic and is often used in conjunction with other technologies, such as the resource description
framework and the SPARQL query language, and can also
be mapped to relational database schemas for storage and
retrieval [58], [59].
One shortcoming of existing ontological approaches is a
lack of extensibility. It can be difficult to modify an existing
ontology in order to extend or debug its domain. In particular, it is difficult to relate different ontologies to one another,
particularly in cases where neither ontology embeds into the
other. A lack of formal relationships between ontologies also
leads to difficulties when migrating data from one ontology to
another.
Similar issues arise for ontological integration. Domain-specific OWL ontologies are often specialized from large and
complex "upper" ontologies, e.g., [60]. The advantage is that
this method can align multiple small ontologies derived from
the same upper ontology. The disadvantage is that these upper
ontologies can be difficult for developers to write and debug
and for users to navigate. A more modular approach, based on
identification of overlap between small ontologies, depends on
a concrete representation of that overlap.
In this paper, we investigate an alternative approach to
ontologies based on CT. CT was developed in the 1940s
by Eilenberg and MacLane [61] to study the relationship
between two areas of mathematics: 1) topology and 2) algebra.
Subsequent research has led to applications throughout mathematics as well as in theoretical physics and computer science.
In the 1990s, Rosebrugh and Wood [62] showed that a database
schema can be viewed as a category, and a database instance
as a functor on this category [63]. More recently, Spivak [64]
has used this point of view to apply the methods of CT to
problems of data management. Generally speaking, this line
of work provides a dictionary of categorical interpretations
for the standard vocabulary of databases, such as schemas,
instances, queries, updates, and data migration. This line of work has
also been implemented in the functorial query language, software for building and analyzing databases from the categorical
perspective [65].
There is also a broad and substantial literature on the
use of CT in formal modeling. One line of inquiry following on this database work can be found in the work
of Johnson and Rosebrugh [63], who provide a categorical interpretation of entity–attribute–relation diagrams. This
work was further elaborated by Diskin [66], [67], and
Rutle et al. [68], with a particular emphasis on software
engineering. Most importantly, whereas earlier work was
largely theoretical, much of this more recent work has been
implemented and is directly informed by engineering practice.
Along broadly similar lines, CT can be used to understand object-oriented class modeling as found, for example,
in the Unified Modeling Language (UML). Although it is
ubiquitous in (especially the early stages of) software engineering, UML lacks well-defined semantics. Both Diskin [69]
and Padi et al. [70] have associated (fragments of) the UML
class diagram syntax with constructions in CT. Turning this
relationship around, CT straightforwardly inherits many of
the methods used in object-oriented class modeling.
The formal mathematics of CT allow us to sidestep some of
the difficulties of other ontological representations. For example, one advantage of categorical ontologies is that we can
express their inter-relationships using maps called functors.
More generally, a diagram of functors can describe sophisticated relationships between families of ontologies, which can
then be merged together using a construction called a colimit.
Kan extensions provide a formal method for data migration
between schemas. These powerful formal methods allow us to
manipulate categorical ontologies in a way that is both correct
and mathematically justified [71]–[73].
The previous section described some fundamental aspects of
Carnatic music. In this section, we translate that description
into a category-theoretic ontology. Our supplementary materials provide a brief introduction to the methods of CT (as
well as our notation); we strongly advise those without a
background in CT to study that material before proceeding.
In our categorical model, each object represents a set of
entities, and each arrow represents a function between these
sets. In diagrams, the objects will be represented as a box containing text which describes a typical element of the set. For
example, ⌜a svara⌝ represents the set of svaras (notes) in Carnatic music. For
readability, we use corner braces rather than full boxes when
discussing objects in the text: e.g., ⌜a svara⌝.
As discussed in Section II, a Carnatic melody is built up from 16 basic notes, spread across 12 pitches and 7 articulations. Each of these will become an object in our categorical representation. Since each of these objects represents a single individual, they can all be modeled by terminal objects in our category. We then group these notes by articulation, as in the horizontal groupings of Fig. 1. As shown in Figs. 6 and 7, these groupings categorically take the form of coproducts, allowing us to represent the objects Rishabham, Gandharam, Dhaivata, Nishada, Madhyama, Sadja, and Panchama as finite sets (also called enumerations).
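As a concrete, set-theoretic reading of this construction, the coproduct of the seven articulation groups can be computed as a tagged disjoint union. The sketch below is illustrative; variant names such as R1–R3 follow common svara notation rather than a definition given in this paper.

```python
# A coproduct (disjoint union) of finite enumerations, mirroring the
# grouping of the 16 Carnatic svaras by articulation.  The variant
# counts follow the paper's tally: two articulations with one variant,
# one with two, and four with three (2*1 + 1*2 + 4*3 = 16 notes).
SVARA_VARIANTS = {
    "Sadja":     ["S"],
    "Panchama":  ["P"],
    "Madhyama":  ["M1", "M2"],
    "Rishabham": ["R1", "R2", "R3"],
    "Gandharam": ["G1", "G2", "G3"],
    "Dhaivata":  ["D1", "D2", "D3"],
    "Nishada":   ["N1", "N2", "N3"],
}

# The coproduct object "a svara": tagged elements (injection label, value).
A_SVARA = [(artic, v) for artic, variants in SVARA_VARIANTS.items()
           for v in variants]

assert len(A_SVARA) == 16  # the 16 basic notes of Section II
```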
PADI et al.: MODELING AND ANALYSIS OF INDIAN CARNATIC MUSIC USING CT
973
Fig. 6. Categorical representation of notes in the Carnatic music tradition.

Fig. 7. Categorical representation of notes with respect to isomorphism.

Fig. 8. Projections from an M-arohana sequence to its svaras.

Fig. 9. Cartesian product formed from individual svara objects.

Fig. 10. Universal mapping property (UMP) of the product used to compare frequency ratios between svaras.

Fig. 11. Illustrating the universal mapping property of svaras shown in Fig. 10 with an example.
A. Modeling Rāgas
Let a svara denote the coproduct of the seven objects Sadja, Rish., etc., as described in Table I. All together this will be a coproduct of 16 terminal objects (2 × 1 element + 1 × 2 elements + 4 × 3 elements). Table I also defines a function4 freq_ratio : a svara → Q. As a minor abuse of notation, we will use the same name freq_ratio to denote the composition of this map with the coproduct injections, e.g., Rish. → a svara → Q.
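Under the same set-theoretic reading, freq_ratio can be assembled from its component functions via the coproduct's universal property. The rational values below are illustrative just-intonation ratios, not the entries of Table I:

```python
from fractions import Fraction

# freq_ratio on the coproduct "a svara" -> Q, assembled from component
# functions via the coproduct's universal property.  The ratios below
# are illustrative just-intonation values, NOT the paper's Table I.
freq_ratio_by_articulation = {
    "Sadja":     {"S": Fraction(1, 1)},
    "Rishabham": {"R1": Fraction(16, 15), "R2": Fraction(9, 8)},
    "Gandharam": {"G1": Fraction(9, 8), "G2": Fraction(6, 5)},
}

def freq_ratio(svara):
    """Map a tagged element of the coproduct to a rational number."""
    articulation, variant = svara
    return freq_ratio_by_articulation[articulation][variant]

# Composing with a coproduct injection gives, e.g., Rish. -> a svara -> Q:
def freq_ratio_rishabham(variant):
    return freq_ratio(("Rishabham", variant))

print(freq_ratio_rishabham("R2"))  # -> 9/8
```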
These svara objects are then connected into two sequences:
1) the arohana and 2) avarohana, to form a rāga. Here, we
must distinguish between Melakartha and Janya rāgas, as their
mathematical structure is quite different. For a Melakartha
rāga, we can identify its Sadja svara, its Rishabham svara, etc.
Mathematically, this corresponds to a family of functions from
a sequence object into each of the svara objects as displayed
in Fig. 8 for the Melakartha arohana (M-arohana) sequence.
Together, these functions define a single map into the
product object (see Fig. 9).
Indeed, the arohana sequence is completely determined by the choice of these seven svaras, so the map an M-arohana sequence → svara product will be a monomorphism (i.e., an injection). We can say that these arrows are jointly monic.

4 Here, Q, the rational numbers, is the set of fractions.
However, not every choice of svaras is allowed as an M-arohana sequence; the rules for identifying those which are acceptable were described in Section II. We can model these rules using truth functions and pullbacks. One of these rules says that the frequency of the Rishabham svara is less than that of the Gandharam svara (as displayed in Table I).
To say this categorically, we make use of a truth function less_than : Q × Q → {True, False}, defined by the rule

less_than(p, q) = True if p < q, and False otherwise.
Using the universal mapping property of the product we
can use this to define a truth function on Rish. × Gand.
as in Fig. 10.
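Computationally, the UMP of the product pairs two functions into a single componentwise map, after which less_than can be composed on. A minimal sketch, again with illustrative ratios (chosen so that R2 and G1 share a pitch, as in the standard svara layout):

```python
from fractions import Fraction

# The truth function less_than : Q x Q -> {True, False}, composed with
# (freq_ratio x freq_ratio) obtained from the universal mapping property
# of the product, as in Fig. 10.  The ratio table is a stand-in for the
# Table I assignment (values are illustrative).
def less_than(p, q):
    return p < q

freq_ratio = {"R1": Fraction(16, 15), "R2": Fraction(9, 8),
              "G1": Fraction(9, 8), "G2": Fraction(6, 5)}

def pair(f, g):
    """UMP of the product: from f : X -> A and g : Y -> B build
    f x g : X x Y -> A x B, acting componentwise."""
    return lambda xy: (f(xy[0]), g(xy[1]))

# Truth function on Rish. x Gand. = less_than o (freq_ratio x freq_ratio)
ratios = pair(freq_ratio.get, freq_ratio.get)

def allowed(rish, gand):
    return less_than(*ratios((rish, gand)))

print(allowed("R1", "G1"))  # -> True  (16/15 < 9/8)
print(allowed("R2", "G1"))  # -> False (equal ratios, not strictly less)
```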
Notice that the object Q appears twice in the preceding
diagram. Usually when this occurs it means that the same
object is involved in two or more different arrows; we write it
multiple times for readability, even though it really is the same
object. The best way to understand such diagrams is by tracing
an element through the diagram; since different functions give
different values, this allows us to see why the object appears
twice. For example, we might trace the pair (R1, G1) through
the diagram in Fig. 11.
Since (R1, G1) maps to True, this is an allowable transition in Melakartha rāgas. If we traced (R2, G1) through the diagram, we would end up at False, so this pair is not allowed. Distinguishing these cases requires the use of the pullback operation.
Note that there is a map True : 1 → {True, False} which sends the unique element of 1 to True. As discussed in the supplementary section on CT, pulling our truth
function back along this map will yield a subset of Rish. ×
Gand. consisting of only those pairs which map to the
value True. We denote this pullback by Rish. < Gand..
See Fig. 12.
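On finite sets, this pullback is simply the subset of the product on which the truth function returns True, i.e., a filter. A sketch, using relative chromatic positions as an illustrative stand-in for freq_ratio:

```python
# Pulling the truth function back along True : 1 -> {True, False} carves
# out the subobject Rish. < Gand. of the product: exactly the pairs
# mapped to True.  Computationally, the pullback is a filter.
RISHABHAM = ["R1", "R2", "R3"]
GANDHARAM = ["G1", "G2", "G3"]

# Relative chromatic positions (R2 and G1 share a pitch, as do R3 and
# G2); illustrative stand-in for the frequency ratios of Table I.
semitone = {"R1": 1, "R2": 2, "R3": 3, "G1": 2, "G2": 3, "G3": 4}

def truth(p):
    r, g = p
    return semitone[r] < semitone[g]

product = [(r, g) for r in RISHABHAM for g in GANDHARAM]
pullback = [p for p in product if truth(p)]   # the object Rish. < Gand.

assert ("R1", "G1") in pullback and ("R2", "G1") not in pullback
```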
Fig. 12. Defining frequency-ordered pairs of svaras using a pullback property.

Fig. 13. Representation of the M-arohana sequence as a Cartesian product.

Fig. 14. Commutative diagram expressing the reversal symmetry of Melakartha rāga.

Fig. 15. An additional relationship between Janya and Melakartha rāgas.

Fig. 16. Using the UMP of the coproduct to define the type of a rāga.

Fig. 17. Illustrating the relationship between Melakartha rāga and Janya rāga.
Similar reasoning applies to the restriction between
Dhaivata and Nishada svaras. This gives us a final definition
shown in Fig. 13.
Next, we turn to the relationship between arohana and
avarohana sequences. In Melakartha rāgas, each is the reverse
of the other. Using products and coproducts one can define a
list operator ∗ which acts on objects in the category; given a
set A, A∗ is the set of lists whose elements come from A. In
particular, each arohana sequence is a list of svaras, so we
will have a monic arrow from an M-arohana sequence into
a list of svaras.
The advantage of this point of view is that list objects come
equipped with a variety of operations. In particular, we may
reverse any sequence, corresponding to a map A∗ → A∗ . We
can then specify the relationship between arohana and avarohana in Melakartha rāgas as an equation between the two
paths such that the diagram in Fig. 14 commutes.
As discussed in Section II, every Melakartha rāga is a sampoorna rāga; in our categorical model, this is expressed by the commutativity of the diagram in Fig. 14.
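The commuting condition of Fig. 14 can be checked mechanically on any candidate pair of sequences; the example below uses the svara spelling of one sampoorna scale (that of Mayamalavagowla):

```python
# The Melakartha reversal symmetry of Fig. 14 as an equation between two
# paths: embedding the arohana into the list object (svaras)* and then
# reversing must agree with the avarohana's embedding.
def reverse(xs):
    """The list-object operation A* -> A*."""
    return list(reversed(xs))

# Svara spelling of the Mayamalavagowla scale (Melakartha 15).
arohana   = ["S", "R1", "G3", "M1", "P", "D1", "N3", "S"]
avarohana = ["S", "N3", "D1", "P", "M1", "G3", "R1", "S"]

# The diagram commutes: each sequence is the reverse of the other.
assert reverse(arohana) == avarohana
assert reverse(avarohana) == arohana
```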
Next consider Janya rāgas; these are much less structured
than Melakartha rāgas and, consequently, there is much less
that we can say about them at a categorical level. It is true that
Janya rāgas contain an arohana and an avarohana sequence.
We can also classify which Janya rāgas are sampoorna using
an analysis very similar to the one above. There is also a
relationship between the two types of rāgas: every Janya rāga
is derived from a particular Melakartha rāga. In our category,
this corresponds to the map provided in Fig. 15.
In general, a rāga may be either a Melakartha rāga or a Janya rāga (but not both), so we can represent the object a rāga as the coproduct of the a Melakartha rāga and a Janya rāga objects.
We also find it useful to represent this information as a typing map. Consider the two-element set {J-type, M-type} ≅ 1 + 1. Using the fact that any object has a unique map to the terminal object, the universal property of the coproduct a rāga yields our typing map. See Fig. 16.
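Set-theoretically, the typing map simply reads off the coproduct tag. The rāga names below are real, but the tagged representation is supplied by hand for illustration:

```python
# The typing map a raga -> {J-type, M-type} induced by the coproduct's
# universal property: each summand has a unique map to the terminal
# object 1, so a raga = a Melakartha raga + a Janya raga maps to 1 + 1.
def raga_type(raga):
    tag, _ = raga                # coproduct element: (injection label, value)
    return "M-type" if tag == "Melakartha" else "J-type"

shankarabharanam = ("Melakartha", "Shankarabharanam")
mohanam          = ("Janya", "Mohanam")

assert raga_type(shankarabharanam) == "M-type"
assert raga_type(mohanam) == "J-type"
```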
One final piece of information usually associated with a rāga is its emotion; historically, this is usually classified into one of nine categories. This corresponds to an arrow has : a rāga → an emotion. In addition, a Janya rāga has the same emotion as the Melakartha rāga from which it is derived, corresponding to a final condition requiring that the two paths a Janya rāga → an emotion in the diagram in Fig. 17 agree.
The above descriptions can be regarded as a recipe for defining a category which models the relationships between pieces
and types of Carnatic rāgas. One begins by assembling all of
the objects and arrows referred to above into a single graph
(which we omit here for reasons of space). Next, one assembles a list of declared path equations and, finally, a list of
all the categorical constructions (i.e., products, pullbacks, monic arrows, coproducts, and finite sets) specified in our description. Mathematically, these lists of categorical constructions
are called a sketch for our model, and there are well-known
methods for generating a category from these basic constructions (see [74]); a complete description is beyond the
scope of this paper, but is close in spirit to the categories
of Examples 2 and 2.1 in the supplementary section.
B. Modeling Rhythm Structure
As discussed in Section II, in order to describe a piece of
music we must specify both its melody and its rhythm. In
Carnatic music the rāga provides the first, while the second is
determined by what we call a Carnatic rhythm structure. This
can be broken down into three pieces: 1) the tāla; 2) the jati;
and 3) the gati; the first two pieces determine a tāla structure.
Fig. 18. A tāla determines a sequence (list) of beats (of type O, U, or I).

Fig. 19. A jati associates each type of beat (O, U, or I) with a natural number, its beat count.

Fig. 20. Categorical representation of tāla structure.

Fig. 21. Categorical representation of an instance of a tāla structure.

Fig. 22. Categorical representation of Carnatic music rhythm structure.

Fig. 23. Jatis and gatis with the same name also have related beat counts.
As explained in Section II-B, there are a fixed number of tālas (seven), of jatis (five), and of gatis (five), so each of these can be represented as a finite set (i.e., a coproduct of terminal objects). Each of these varies independently, so the a Carnatic rhythm structure object will be a product of these three (and consequently a coproduct of 7 × 5 × 5 = 175 terminal objects). We also distinguish the object of pairs (tāla, jati), which we call a tāla structure.
The structural pattern associated with each tāla (as displayed in Table II) can be represented as a function, as shown
in Fig. 18. Here, we use the “Kleene star” A∗ to denote the
set of lists with elements from A.
As given on the left-hand side of the same table, the number of laghu (I) beats associated with each jati determines a function from a jati to the natural numbers. More usefully, because the beat counts for drutam (O) and anudrutam (U) are always fixed (two and one, respectively), each jati determines a function {O, U, I} → N.
For example, the Misra jati (I = 7) would be associated
with the function in Fig. 19.
As discussed in the supplementary material, applying this function entry by entry determines a related map between lists, i.e., an element of (N∗)^({O,U,I}∗). This is essentially the functoriality of the list operator.
This shows that a tāla structure determines both an element of {O, U, I}∗ and a function in (N∗)^({O,U,I}∗). Since one is a function and the other is an input, we can pair these together and apply the evaluation function to obtain a list of natural numbers. Summing the resulting list, we obtain the number of beats per cycle associated with a tāla structure, as shown in Fig. 20.
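The whole computation of Fig. 20 is short enough to sketch end to end. The tāla patterns and jati counts below follow the standard tables summarized in Section II, and Adi tāla (Chatusra-jati Triputa) serves as a check:

```python
# Beats-per-cycle for a tala structure (tala, jati), following Fig. 20:
# the tala gives a list over {O, U, I}, the jati gives a beat-count
# function {O, U, I} -> N, list functoriality applies it entrywise, and
# summing the resulting list gives beats per cycle.
TALA_PATTERN = {              # element of {O, U, I}*
    "Dhruva":  ["I", "O", "I", "I"],
    "Matya":   ["I", "O", "I"],
    "Rupaka":  ["O", "I"],
    "Jhampa":  ["I", "U", "O"],
    "Triputa": ["I", "O", "O"],
    "Ata":     ["I", "I", "O", "O"],
    "Eka":     ["I"],
}

JATI_LAGHU = {"Tisra": 3, "Chatusra": 4, "Khanda": 5, "Misra": 7, "Sankeerna": 9}

def beat_count(jati):
    """Each jati determines a function {O, U, I} -> N; O and U are fixed."""
    return {"O": 2, "U": 1, "I": JATI_LAGHU[jati]}

def beats_per_cycle(tala, jati):
    f = beat_count(jati)
    per_beat = [f[b] for b in TALA_PATTERN[tala]]  # list functoriality + eval
    return sum(per_beat)

# Adi tala is Chatusra-jati Triputa: 4 + 2 + 2 = 8 beats per cycle.
assert beats_per_cycle("Triputa", "Chatusra") == 8
# Combining with a gati (notes per beat) gives notes per cycle (Fig. 22),
# e.g., Chatusra gati (4 notes per beat):
assert beats_per_cycle("Triputa", "Chatusra") * 4 == 32
```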
Just as we did for rāgas, we can examine this diagram by tracking an element through it (Fig. 21) [although this is made more difficult by the fact that a function {O, U, I}∗ → N∗ contains an infinite number of values].
Finally, we can combine the beats/cycle information generated by a tāla structure with the notes/beat information
given by the gati in order to determine the total number of
notes/cycle in the segment (Fig. 22).
Although we do not need to use it, we should also note
that there is an isomorphism between a gati and a jati,
implicit in Tables III and IV, and that this isomorphism commutes with the relevant counts associated with these objects, as shown in Fig. 23.
IV. SINGLE CONCEPTUAL MODEL FOR ANALYSIS
In this section, we describe a method of integrating the
categorical models just defined (and others) into a single conceptual framework. We have already seen applications of limits and colimits inside a category. Now, we employ some of the same ideas to study relationships between categories.

Fig. 24. Categorical representation for analysis of Carnatic music.
First consider the following simple category (Fig. 24) which
contains metadata related to a musical segment. A raw file5
records a Carnatic music segment, and this segment has five
relevant pieces of metadata, including its title, artist, and
composer.
This is acceptable as a structure for storing data, but for
analysis and validation we would like to integrate it with the
categorical models from the previous sections. We can do this
using pushouts of categories.
Let R denote the categorical ontology for rāgas which we developed in the previous section, and M the metadata category presented in Fig. 24. Notice that both of these categories contain an object called a rāga.6 We would like to glue these two categories together by identifying their rāga objects. In CT, functors are used to relate structures in one category to analogous constructions in another [75]. Technically, these functors must preserve the categorical structures (e.g., products and coproducts) included in our model.
Let 1 denote the category which contains only a single object and its identity arrow. Because of its particularly simple form, a functor from 1 into some other category C is determined by the choice of a single object in C. In particular, the a rāga objects in R and M define two functors ragaM : 1 → M and ragaR : 1 → R.
Consider the pushout diagram below. The functors into RM
show that this composite category contains both M and R
as subcategories. Moreover, the two paths 1 → RM agree,
meaning that the rāga objects in the two categories have been
identified. By its universal property, RM is the smallest category satisfying these two properties: it contains nothing except
the objects and arrows which come from R and M. This gives
a formal description of our intuitive goal: we have glued our
models together along their common piece. In particular, RM
is fully specified by categories M and R and the two maps
ragaM and ragaR (see Fig. 25).
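At the level of objects (ignoring arrows and equations, which the full pushout also glues), this construction can be sketched as a tagged union with a single identification. The metadata objects listed come from Fig. 24, the rāga-side objects from Section III:

```python
# A toy pushout of schemas: glue the object sets of categories M and R
# along the common object selected by 1 -> M and 1 -> R.  On objects,
# the pushout is the disjoint union modulo identifying the two chosen
# objects; nothing else is merged.
def pushout_objects(objs_m, objs_r, glue_m, glue_r):
    """Objects of the pushout M +_1 R: tag each side to keep the unions
    disjoint, then identify the two glued objects as one element."""
    out = set()
    for o in objs_m:
        out.add("glued" if o == glue_m else ("M", o))
    for o in objs_r:
        out.add("glued" if o == glue_r else ("R", o))
    return out

M = {"a raw file", "a title", "an artist", "a raga"}
R = {"a raga", "an arohana sequence", "an emotion"}

RM = pushout_objects(M, R, "a raga", "a raga")

# |RM| = |M| + |R| - 1: the raga objects are identified, nothing else.
assert len(RM) == len(M) + len(R) - 1
```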
Similarly, let T denote the categorical ontology for tāla structures; exactly the same sort of argument allows us to integrate M and T. Moreover, these two pushouts form the legs of a third diagram, allowing us to iterate the pushout procedure. This yields a new category integrating both ontologies into our metadata category. Of course, given ontologies for the other metadata, such as database specifications for artist and composer data, we could glue these into our framework in exactly the same fashion (see Fig. 26).
5 We limit our attention to raw files that are Carnatic music segments.
6 Note, however, that unlike some other ontological approaches it is not
important that the naming of these objects agree.
Fig. 25. Integrating rāga and metadata models using pushout.

Fig. 26. Integrating rāga, tāla, and metadata models through iterated pushouts.

Fig. 27. Categorical representation of a task.

Fig. 28. The task type object is a finite set consisting of five tasks.

Fig. 29. Each piece of MIR metadata addresses a specific task type.

Fig. 30. Categorical model for an experiment evaluation.
Finally, we need to evaluate our model for various experimental conditions. Fig. 27 shows the categorical representation
of a task. The object a task is a product consisting of three
pieces of information: 1) a feature to be used; 2) a method
of analysis required; and 3) the type of information to be
identified. The last of these, a task type, is a finite set
consisting of elements shown in Fig. 28.
Fig. 31. A commutative diagram expressing coherence between the task and result of an experiment.

If we let meta-data for MIR denote the coproduct of the metadata objects, then attached to this object there is a typing map constructed in the same fashion as the rāga typing map introduced in Section III (see Fig. 29).
Next, we have the notion of an experiment as depicted
in Fig. 30. Roughly speaking, an experiment is an instance
of a particular task which analyzes a raw file to generate a
single result. This result can be mapped to meta-data for MIR,
corresponding to an arrow (see Fig. 32).
There is an obvious consistency check here, which can be represented as a commutative diagram: the type of result from
an experiment is the same as the type of the task which
the experiment performed. This is shown diagrammatically
in Fig. 31.
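This consistency square can be enforced as a simple runtime check. The field names below are illustrative, and of the five task types only four are named in the text (rāga recognition, tāla recognition, singer identification, song name identification); the fifth here, tonic identification, is a placeholder assumption:

```python
# The commutative square of Fig. 31 as a runtime check: the task type
# of an experiment's task must equal the type of its result's metadata.
# Field names and the set of task types are illustrative, not the
# paper's exact schema.
TASK_TYPES = {"raga recognition", "tala recognition", "singer identification",
              "song name identification", "tonic identification"}

def consistent(experiment):
    task_path   = experiment["task"]["task_type"]        # experiment -> task -> type
    result_path = experiment["result"]["metadata_type"]  # experiment -> result -> type
    return task_path == result_path and task_path in TASK_TYPES

exp = {"task":   {"task_type": "raga recognition",
                  "feature": "pitch", "method": "HMM"},
       "result": {"metadata_type": "raga recognition", "value": "Mohanam"}}

assert consistent(exp)
```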
As shown in Fig. 30, we can also keep track of some additional information, such as the researcher and the date on which the experiment was conducted. This categorical model allows a group of people to run experiments on a given music segment with different methods and features, and different users can store, retrieve, and update the results. This framework can be used by a group of people for collaborative analysis and implementation purposes. Notice that the models of an experiment and a task may be glued together along their overlap using the same colimit methods discussed above. Moreover, in any particular domain, the feature parameters, method parameters, and results objects may be expanded in a similar way to represent structured data.
Fig. 32 illustrates the complete categorical model for analysis of Carnatic music with experimental evaluation, including
the tasks performed on music segments. As suggested by the
figure, an experiment is performed on a raw file which contains
a recording of a Carnatic music segment. Every experiment
involves a task which is performed using features and methods with appropriate parameters. The parameters for feature
and method vary based on the type of feature and the type of method
with respect to a particular task.
The idea behind this categorical model is that any user can
execute the model and store his/her outputs for a particular task using different features and methods. These can be
accessed or cross-verified by other people who work on the same music segment. The model shares information between a group of people and allows them to collaborate on research using the same music databases. This collaborative platform allows for sharing a common model for modeling music for any processing task, sharing code, and using existing results for analysis and verification. This platform creates a common workspace, where anyone can use a shared database and perform analysis in a consistent manner.

Notice, in particular, the dotted line represents a ground truth. Every file under consideration records a particular piece of Carnatic music, so there is such
a function. However, this information is, in general, not available to us. In some sense, our goal is to identify (the metadata
of) the recorded segment based on characteristics of the file.
Alternatively, in situations where we know the ground truth of
a music segment, this provides us with additional paths from
an experiment to meta-data for MIR (refer to Fig. 33).
These alternate paths can help to verify our metadata. Ideally, these paths should agree (for any particular task type), but in practical experimental evaluations this cannot be expected in all cases. One path is derived from the ground truth about the music segment, and the other comes from the experimental evaluation. We can compare the results of the two paths to judge the accuracy of our experimental methods.
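Comparing the two paths over a test set yields the usual accuracy figure. The outcomes below are fabricated purely for illustration:

```python
# Comparing the two paths from an experiment to meta-data for MIR: one
# through the experimental result, one through the ground truth of the
# raw file.  Agreement across a test set estimates method accuracy.
def accuracy(experiments):
    """Fraction of experiments whose result path agrees with the
    ground-truth path (for the same task type)."""
    agree = sum(1 for e in experiments
                if e["predicted"] == e["ground_truth"])
    return agree / len(experiments)

# Hypothetical raga-recognition outcomes (illustrative data only):
runs = [
    {"predicted": "Mohanam", "ground_truth": "Mohanam"},
    {"predicted": "Kalyani", "ground_truth": "Kalyani"},
    {"predicted": "Mohanam", "ground_truth": "Hamsadhwani"},
    {"predicted": "Todi",    "ground_truth": "Todi"},
]

print(accuracy(runs))  # 3 of 4 paths agree -> 0.75
```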
For example, Fig. 35 provides an instance of the task model for the rāga recognition task, using the HMM method applied to
the pitch feature. It also gives an example of one element in
the model, traced through the various maps. Other tasks may
involve different parameters, but this basic model provides a
framework enabling end users to integrate various pieces of
information for implementing other tasks like tāla recognition,
singer identification, song name identification, etc.
V. USE CASES
This section provides experimental evaluation and database
schemas for the categorical models discussed in Section IV.
A database is a collection of data that is stored and organized
so that the end user can easily and quickly access, manage, and
update the collection. In particular, relational database models
involve the storage of data in tables (also called relations).
In relational models, each column in a table is called an attribute and each row is called an entity or record. Each
record is uniquely determined by an identifier called a primary
key. Columns of other tables which refer to this primary key
are called foreign keys, and these can be regarded as arrows
in a category.
In fact, Spivak has shown [64], [65] that (finitely presented)
categories can be converted into database specifications (and
vice versa), so in defining our categorical models we have
already provided a database model. The fact that categorical
models can be translated into database schemas facilitates the storage, analysis, and fine-tuning of the data based on our models. The resulting database provides users access to the data
for further analysis and processing tasks. See [64], [65] for an
explanation of how a categorical model can be represented as
a database schema along with its integrity constraints.
Via this approach we can use our existing models of rāga,
tāla, etc., to define our underlying database models. In fact, in
most cases we do not want to store all of the data associated
Fig. 32. Single conceptual model for analysis of Carnatic music.
Fig. 33. Categorical representation of an experimental analysis of a music
segment.
Fig. 33. Categorical representation of an experimental analysis of a music segment.

Fig. 34. Categorical representation of rāga task.

Fig. 35. Categorical representation of rāga task with example elements.

Fig. 36. Database model for the Carnatic music conceptual structure.
with our model. For example, the database schema depicted
in Fig. 36 includes only some of the objects and arrows from
the detailed model in Section III; this might be the only data
needed for metadata-based retrieval purposes.
The decision of which data to store and which to throw
away can be encoded as a functor from a reduced database
schema into the full categorical model. It sends each table in
Fig. 36 to the object of the categorical model which has the
same name. Based on this mapping, there is a canonical way
to migrate data from the complex schema into the simpler one.
These are called Δ-migrations, and are discussed in [65].
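Under the assumption that the schema mapping is an inclusion of objects, migrating data from the full model to the reduced schema amounts to a projection; this is only a toy analogue of the data migration functors of [65]. The record below describes a real composition, but its fields are supplied for illustration:

```python
# Data migration along a schema mapping: the reduced schema of Fig. 36
# names a subset of the full model's objects, and pulling data back
# along that inclusion projects away the unstored fields.  A much
# simplified sketch of the migration machinery of [65]; table and
# column names are illustrative.
FULL_RECORD = {
    "title": "Vatapi Ganapatim", "raga": "Hamsadhwani",
    "tala": "Adi", "composer": "Muthuswami Dikshitar",
    "arohana": ["S", "R2", "G3", "P", "N3", "S"],  # not kept in the reduced schema
}

REDUCED_SCHEMA = ["title", "raga", "tala", "composer"]  # columns we keep

def migrate(record, schema):
    """Pull the record back along the schema inclusion (a projection)."""
    return {col: record[col] for col in schema}

row = migrate(FULL_RECORD, REDUCED_SCHEMA)
assert "arohana" not in row and row["raga"] == "Hamsadhwani"
```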
Fig. 37 depicts a generic model for the implementation and analysis of music segments. This generic model can be
Fig. 37. Graphical interpretation of a generic model for analysis of Carnatic music.
visualized as a three-layered structure. The first layer concentrates on a categorical representation of the characteristics of
Carnatic music. This layer also details the methods, features,
and tasks which are used to perform a particular analysis.
While implementing such a task, we may pull characteristics
of music from its corresponding models.
After concluding a task, its results are stored in the database
of the second layer.
As explained in Fig. 36, the second layer in the generic model is a database schema which mediates between our category-theoretic ontology and the query processing system presented to users. This layer allows us to create a database schema based on (a fragment of) the CT models provided in the first layer, so that the end user can access the required information from the database through the query processing system.
The last layer of the model supports query analysis for information retrieval purposes. This layer also integrates the processing of results based on different methods, which may be evaluated on different features. It allows the end user to access or retrieve information about desired music segments, and to find the methods and features used for a particular task for further analysis. This layer extracts the data from the database for any query analysis. In this way, the unified conceptual model allows a group of people to share results and to get complete details about the tasks, features, and results of a music segment in the database, enabling collaborative research.
VI. CONCLUSION

In the area of MIR for Carnatic music, developing metadata for music segments is an important and challenging task. Presently, some efforts have been made to provide metadata for music segments, but these efforts have been somewhat disparate, making it difficult to compare their results. This paper discusses a categorical approach for modeling Carnatic music for various processing tasks in MIR applications. Despite the complexity of traditional Carnatic music and its basis in rāga and tāla structures, we have formally characterized these complexities using the categorical approach. The conceptual model proposed in this paper provides a means to model any music segment for finding metadata for information retrieval purposes.

With respect to collaborative research, the proposed categorical model provides a common platform that allows users or researchers to share data and verify results on a common database. This paper also addresses the mapping of categorical structures into database schemas for storage and retrieval purposes. The framework developed in this paper will allow a group of people to conduct analysis collaboratively, using a categorical model, on a common corpus of music segments.

This paper provides a framework to formalize the relationship between Carnatic music structure, prior research, tasks previously attempted, and the methods and features used for different music processing tasks. The general model proposed in this paper can be adapted to other domains by replacing our model of Carnatic music with models applicable to other domain problems. As future work, the proposed categorical model can be extended to characterize other musical traditions, including Western classical music and Hindustani music.

ACKNOWLEDGMENT

The authors would like to thank D. Spivak, who helped to improve the CT models defined in this paper, especially the models for tāla structure. The authors also thank R. Wisnesky and S. Adhinarayanan for their valuable comments and suggestions for improving the consistency and readability of this paper.
REFERENCES
[1] N. Orio, “Music retrieval: A tutorial and review,” Found. Trends Inf.
Retrieval, vol. 1, no. 1, pp. 1–90, 2006.
[2] M. A. Casey et al., “Content-based music information retrieval:
Current directions and future challenges,” Proc. IEEE, vol. 96, no. 4,
pp. 668–696, Apr. 2008.
[3] R. Typke, F. Wiering, and R. C. Veltkamp, “A survey of music
information retrieval systems,” in Proc. ISMIR, London, U.K., 2005,
pp. 153–160.
[4] E. D. Scheirer, “Tempo and beat analysis of acoustic musical signals,”
J. Acoust. Soc. America, vol. 103, no. 1, pp. 588–601, 1998.
[5] M. A. Alonso, G. Richard, and B. David, “Tempo and beat estimation of
musical signals,” in Proc. ISMIR, Barcelona, Spain, 2004, pp. 158–163.
[6] P. Herrera-Boyer, G. Peeters, and S. Dubnov, “Automatic classification of
musical instrument sounds,” J. New Music Res., vol. 32, no. 1, pp. 3–21,
2003.
[7] T. Kitahara, M. Goto, K. Komatani, T. Ogata, and H. G. Okuno,
“Instrument identification in polyphonic music: Feature weighting to
minimize influence of sound overlaps,” EURASIP J. Appl. Signal
Process., vol. 1, no. 1, p. 155, 2007.
[8] B. L. Sturm, M. Morvidone, and L. Daudet, “Musical instrument
identification using multiscale mel-frequency cepstral coefficients,” in
Proc. 18th Eur. Signal Process. Conf., Aalborg, Denmark, Aug. 2010,
pp. 477–481.
[9] M. Marolt, “SONIC: Transcription of polyphonic piano music with
neural networks,” in Proc. Workshop Current Res. Directions Comput.
Music, Barcelona, Spain, 2001, pp. 217–224.
[10] H. G. Ranjani, S. Arthi, and T. V. Sreenivas, “Carnatic music analysis: Shadja, swara identification and raga verification in alapana using
stochastic models,” in Proc. IEEE Workshop Appl. Signal Process. Audio
Acoust. (WASPAA), New Paltz, NY, USA, Oct. 2011, pp. 29–32.
[11] R. Sridhar and T. V. Geetha, “Raga identification of Carnatic music for
music information retrieval,” Int. J. Recent Trends Eng., vol. 1, no. 1,
pp. 571–574, 2009.
[12] R. Sridhar and T. V. Geetha, “Music information retrieval of Carnatic
songs based on Carnatic music singer identification,” in Proc. Int. Conf.
Comput. Elect. Eng., 2008, pp. 407–411.
[13] L. Pesch, The Oxford Illustrated Companion to South Indian Classical
Music. New Delhi, India: Oxford Univ. Press, 2009.
[14] R. M. Rangayyan, “An introduction to the classical music of India,”
Dept. Elect. Comput. Eng., Univ. Calgary, Calgary, AB, Canada.
[15] T. M. Krishna, A Southern Music: The Karnatik Story, HarperCollins,
Noida, India, 2013.
[16] M. Andreatta, A. Ehresmann, R. Guitart, and G. Mazzola, “Towards
a categorical theory of creativity for music, discourse, and cognition,” in Mathematics and Computation in Music. Heidelberg, Germany:
Springer, 2013, pp. 19–37.
[17] E. Dennis-Jones and D. E. Rydeheard, “Categorical ML—Category-theoretic modular programming,” Formal Aspects Comput., vol. 5, no. 4,
pp. 337–366, 1993.
[18] K. Williamson, M. Healy, and R. Barker, “Industrial applications of
software synthesis via category theory—Case studies using Specware,”
Autom. Softw. Eng., vol. 8, no. 1, pp. 7–30, 2001.
[19] R. Tate, M. Stepp, and S. Lerner, “Generating compiler optimizations
from proofs,” ACM SIGPLAN Notices, vol. 45, no. 1, pp. 389–402,
2010.
[20] J. C. Reynolds, “Using category theory to design implicit conversions
and generic operators,” in Semantics-Directed Compiler Generation.
New York, NY, USA: Springer, 1980, pp. 211–258.
[21] Z. Diskin and T. Maibaum, “Category theory and model-driven engineering: From formal semantics to design patterns and beyond,”
in Model-Driven Engineering of Information Systems: Principles,
Techniques, and Practice. Toronto, ON, Canada: Apple Acad. Press,
2014, p. 173.
[22] T. Giesa, D. I. Spivak, and M. J. Buehler, “Category theory based solution for the building block replacement problem in materials design,”
Adv. Eng. Mater., vol. 14, no. 9, pp. 810–817, 2012.
[23] D. I. Spivak, T. Giesa, E. Wood, and M. J. Buehler, “Category theoretic
analysis of hierarchical protein materials and social networks,” PLoS
ONE, vol. 6, no. 9, pp. 1–15, Sep. 2011.
[24] J. C. Baez and A. Lauda, “A prehistory of n-categorical physics,” in
Deep Beauty: Understanding the Quantum World Through Mathematical
Innovation. 2009, pp. 13–128.
[25] J.-C. Letelier, J. Soto-Andrade, F. G. Abarzua, A. Cornish-Bowden, and
M. Luz Cárdenas, “Organizational invariance and metabolic closure:
Analysis in terms of (M,R) systems,” J. Theor. Biol., vol. 238, no. 4,
pp. 949–961, 2006.
[26] D. Ellerman, “Determination through universals: An application of category theory in the life sciences,” unpublished paper. [Online]. Available:
https://arxiv.org/abs/1305.6958
[27] V. G. Paranjape, “Indian music and aesthetics,” J. Music Acad. Madras,
vol. XXVIII, pp. 68–71, 1957.
[28] P. Lavezzoli, The Dawn of Indian Music in the West. New York, NY,
USA: Bhairavi, Continuum, 2006.
[29] A. Bellur, V. Ishwar, X. Serra, and H. A. Murthy, “A knowledge based
signal processing approach to tonic identification in Indian classical
music,” in Proc. 2nd CompMusic Workshop, Istanbul, Turkey, 2012,
pp. 113–116.
[30] G. K. Koduri and B. Indurkhya, “A behavioral study of emotions in
South Indian classical music and its implications in music recommendation systems,” in Proc. ACM Workshop Soc. Adapt. Pers. Multimedia
Interact. Access, Florence, Italy, 2010, pp. 55–60.
[31] P. Sarala, V. Ishwar, A. Bellur, and H. A. Murthy, “Applause identification and its relevance to archival of Carnatic music,” in Proc. Workshop
Comput. Music, Istanbul, Turkey, Jul. 2012, pp. 66–71.
[32] V. Ishwar, A. Bellur, and H. A. Murthy, “Motivic analysis and its relevance to raga identification in Carnatic music,” in Proc. Workshop
Comput. Music, Istanbul, Turkey, Jul. 2012, pp. 153–157.
[33] S. Gulati, J. Salamon, and X. Serra, “A two-stage approach for tonic
identification in Indian art music,” in Proc. Workshop Comput. Music,
Istanbul, Turkey, Jul. 2012, pp. 119–127.
[34] G. K. Koduri, J. Serrá, and X. Serra, “Characterization of intonation in Karnataka music by parametrizing context-based Svara
Distributions,” in Proc. Workshop Comput. Music, Istanbul, Turkey,
Jul. 2012, pp. 128–132.
[35] J. C. Ross and P. Rao, “Detection of raga-characteristic phrases from
Hindustani classical music audio,” in Proc. Workshop Comput. Music,
Istanbul, Turkey, Jul. 2012, pp. 133–138.
[36] A. Vidwans, K. K. Ganguli, and P. Rao, “Classification of Indian classical vocal styles from melodic contours,” in Proc. Workshop Comput.
Music, Istanbul, Turkey, Jul. 2012, pp. 139–146.
[37] B. Bozkurt, “Features for analysis of Makam music,” in Proc. Workshop
Comput. Music, Istanbul, Turkey, Jul. 2012, pp. 61–65.
[38] A. Srinivasamurthy, S. Subramanian, T. Gregoire, and P. Chordia,
“A beat tracking approach to complete description of rhythm in Indian
classical music,” in Proc. Workshop Comput. Music, Istanbul, Turkey,
Jul. 2012, pp. 72–78.
[39] P. Sarala and H. A. Murthy, “Cent filter-banks and its relevance
to identifying the main song in Carnatic music,” in Proc. Comput.
Music Multidscipl. Res. (CMMR), Marseilles, France, Oct. 2013,
pp. 659–681.
[40] V. Ishwar, S. Dutta, A. Bellur, and H. A. Murthy, “Motif spotting in an alapana in Carnatic music,” in Proc. Int. Soc. Music Inf.
Retrieval (ISMIR), Curitiba, Brazil, Nov. 2013, pp. 499–504.
[41] A. Bellur and H. A. Murthy, “A novel application of group delay function
for identifying tonic in Carnatic music,” in Proc. EUSIPCO, Marrakesh,
Morocco, 2013, pp. 1–5.
[42] J. C. Ross, T. P. Vinutha, and P. Rao, “Detecting melodic motifs from
audio for Hindustani classical music,” in Proc. Int. Soc. Music Inf.
Retrieval (ISMIR), Porto, Portugal, Oct. 2012, pp. 193–198.
[43] P. Sarala and H. A. Murthy, “Inter and intra item segmentation of
continuous audio recordings of Carnatic music for archival,” in Proc.
Int. Soc. Music Inf. Retrieval (ISMIR), Curitiba, Brazil, Nov. 2013,
pp. 487–492.
[44] J. Kuriakose, J. C. Kumar, P. Sarala, H. A. Murthy, and U. K. Sivaraman,
“Akshara transcription of mrudangam strokes in Carnatic music,” in
Proc. 21st IEEE Nat. Conf. Commun. (NCC), Mumbai, India, 2015,
pp. 1–6.
[45] CUED. (2002). HTK Speech Recognition Toolkit. [Online]. Available:
http://htk.eng.cam.ac.uk
[46] B. Logan, “Mel frequency cepstral coefficients for music modeling,” in
Proc. Int. Symp. Music Inf. Retrieval, Plymouth, MA, USA, 2000.
[47] M.-L. Zhang and Z.-H. Zhou, “ML-KNN: A lazy learning approach to
multi-label learning,” Pattern Recognit., vol. 40, no. 7, pp. 2038–2048,
2007.
[48] D. Gerhard, “Pitch extraction and fundamental frequency: History and
current techniques,” Dept. Comput. Sci., Univ. Regina, Regina, SK,
Canada, Tech. Rep. TR-CS 2003-6, 2003.
[49] B. Chandrasekaran, J. R. Josephson, and V. R. Benjamins, “What are
ontologies, and why do we need them?” IEEE Intell. Syst., vol. 14, no. 1,
pp. 20–26, Jan. 1999.
[50] P. M. Simons, Parts: A Study in Ontology. Oxford, U.K.: Oxford Univ.
Press, 1987.
[51] D. Fensel, Ontologies: A Silver Bullet for Knowledge Management and
Electronic Commerce, 2nd ed. Heidelberg, Germany: Springer-Verlag,
2003.
[52] D. Fensel, “Ontology-based knowledge management,” Computer,
vol. 35, no. 11, pp. 56–59, 2002.
[53] G. Antoniou and F. Van Harmelen, “Web ontology language: OWL,”
in Handbook on Ontologies. Heidelberg, Germany: Springer, 2004,
pp. 67–92.
[54] G. Antoniou, E. Franconi, and F. Van Harmelen, “Introduction to semantic Web ontology languages,” in Reasoning Web. Heidelberg, Germany:
Springer, 2005, pp. 1–21.
[55] D. L. McGuinness and F. Van Harmelen, “OWL Web ontology language overview,” W3C Recommendation, Feb. 2004. [Online].
Available: https://www.w3.org/TR/owl-features/
[56] S. Song, M. Kim, S. Rho, and E. Hwang, “Music ontology for mood
and situation reasoning to support music retrieval and recommendation,”
in Proc. 3rd Int. Conf. Digit. Soc. (ICDS), Cancún, Mexico, Feb. 2009,
pp. 304–309.
[57] B. Fields, K. Page, D. De Roure, and T. Crawford, “The segment ontology: Bridging music-generic and domain-specific,” in Proc. IEEE Int.
Conf. Multimedia Expo (ICME), Barcelona, Spain, 2011, pp. 1–6.
[58] A. Gali, C. X. Chen, K. T. Claypool, and R. Uceda-Sosa, “From
ontology to relational databases,” in Conceptual Modeling for
Advanced Application Domains. Heidelberg, Germany: Springer, 2004,
pp. 278–289.
[59] B. Motik, I. Horrocks, and U. Sattler, “Bridging the gap between OWL
and relational databases,” Web Semantics Sci. Services Agents World
Wide Web, vol. 7, no. 2, pp. 74–89, 2009.
[60] A. Gangemi, N. Guarino, C. Masolo, A. Oltramari, and L. Schneider,
“Sweetening Ontologies with DOLCE,” in Proc. Int. Conf. Knowl. Eng.
Knowl. Manag., Sigüenza, Spain, 2002, pp. 166–181.
[61] S. Eilenberg and S. MacLane, “General theory of natural equivalences,”
Trans. Amer. Math. Soc., vol. 58, no. 2, pp. 231–294, 1945.
[62] R. Rosebrugh and R. J. Wood, “Relational databases and indexed categories,” in Proc. Int. Category Theory Meeting CMS Conf., vol. 13.
1992, pp. 391–407.
[63] M. Johnson and R. Rosebrugh, “Sketch data models, relational schema
and data specifications,” Electron. Notes Theor. Comput. Sci., vol. 61,
pp. 51–63, Jan. 2002.
[64] D. I. Spivak, Category Theory for the Sciences. Cambridge, MA, USA:
MIT Press, 2014.
[65] H. Forssell, H. R. Gylterud, and D. I. Spivak, “Type theoretical
databases,” in Proc. Int. Symp. Logic. Found. Comput. Sci., 2016,
pp. 117–129.
[66] Z. Diskin, “Generalized sketches as an algebraic graph-based framework
for semantic modeling and database design,” Faculty Phys. Math., Univ.
Latvia, Riga, Latvia, Tech. Rep. M-97, 1997.
[67] Z. Diskin and U. Wolter, “A diagrammatic logic for object-oriented
visual modeling,” Electron. Notes Theor. Comput. Sci., vol. 203, no. 6,
pp. 19–41, 2008.
[68] A. Rutle, A. Rossini, Y. Lamo, and U. Wolter, “A category-theoretical
approach to the formalisation of version control in MDE,” in Proc.
Int. Conf. Fundam. Approaches Softw. Eng., York, U.K., 2009,
pp. 64–78.
[69] Z. Diskin, “Mathematics of UML,” in Practical Foundations of Business
System Specifications. Amsterdam, The Netherlands: Springer, 2003,
pp. 145–178.
[70] S. Padi, S. Breiner, E. Subrahmanian, and R. D. Sriram, “Category
theoretical approaches for deconstructing the semantics of UML class
diagrams,” in preparation.
[71] I. Cafezeiro and E. H. Haeusler, “Semantic interoperability via category theory,” in Proc. 26th Int. Conf. Conceptual Model. Tuts. Posters
Panels Ind. Contribut. (ER), vol. 83. Auckland, New Zealand, 2007,
pp. 197–202.
[72] F. McNeill, A. Bundy, and C. Walton, “Diagnosing and repairing ontological mismatches,” in Proc. Stairs 2nd Starting Ai Res. Symp., vol. 109.
Amsterdam, The Netherlands, 2004, p. 241.
[73] F. McNeill, “Dynamic ontology refinement,” Ph.D. dissertation, Div.
Informat., Univ. of Edinburgh, Edinburgh, U.K., 2006.
[74] M. Barr and C. Wells, Eds., Category Theory for Computing Science,
2nd ed. Hertfordshire, U.K.: Prentice-Hall, 1995.
[75] S. Mac Lane, Categories for the Working Mathematician (Graduate Texts
in Mathematics), vol. 5. New York, NY, USA: Springer-Verlag, 1971.
Sarala Padi received the Ph.D. degree from the
Indian Institute of Technology Madras, Chennai,
India, in 2014.
She is currently a Guest Researcher with the
National Institute of Standards and Technology,
Gaithersburg, MD, USA. Her areas of specialization are machine learning and signal processing techniques for efficient music information retrieval and automatic indexing, and text mining and NLP methods for processing text documents for various applications. Her current research interests include applying category theory to modeling and knowledge representation, and applying deep learning models to medical image analysis.
Spencer Breiner received the master’s and
Ph.D. degrees from Carnegie Mellon University,
Pittsburgh, PA, USA, in 2010 and 2013,
respectively.
He joined the U.S. National Institute of Standards
and Technology, Gaithersburg, MD, USA, in 2015
under a Post-Doctoral Grant from the National
Research Council. His current research interest
includes potential applications of the mathematical
field of category theory to real-world applications.
Eswaran Subrahmanian received the Ph.D. degree
from Carnegie Mellon University, Pittsburgh, PA,
USA, in 1987.
He is a Research Professor with the Institute
for Complex Engineered Systems and Engineering
and Public Policy, Carnegie Mellon University,
Pittsburgh, PA, USA. He has published over
100 refereed papers and co-edited three books on
empirical studies in design, design engineering, and
the organizational implications of knowledge management. He also co-authored a book setting the
global research agenda for ICT in development. He has also co-edited three
special issues in the areas of design theory, engineering informatics, and annotations in engineering design. His current research interests include design
theory, design support systems, information modeling for engineering, collaborative engineering, and engineering education.
Dr. Subrahmanian was a recipient of the Steven Fenves Award for Systems
Engineering at CMU. He is a Distinguished Scientist of the Association
for Computing Machinery, a Fellow of the American Association for the
Advancement of Science, and a member of the Design Society.
Ram D. Sriram (S’82–M’85–SM’00–F’17) was
a member of the engineering faculty at the Massachusetts
Institute of Technology, Cambridge, MA, USA,
from 1986 to 1994 and was instrumental in setting
up the Intelligent Engineering Systems Laboratory.
He is currently a Division Chief of the Software
and Systems Division with the National Institute
of Standards and Technology, Gaithersburg, MD,
USA. He is a Distinguished Alumnus of the
Indian Institute of Technology and Carnegie Mellon
University, Pittsburgh, PA, USA. He has authored
or co-authored nearly 250 papers, books, and reports. His
current research interests include developing knowledge-based expert systems,
natural language interfaces, machine learning, object-oriented software development, life-cycle product and process models, geometrical modelers, object-oriented databases for industrial applications, health care informatics, bioinformatics, and bioimaging.
Dr. Sriram was a recipient of the NSF’s Presidential Young Investigator
Award in 1989, the ASME Design Automation Award in 2011, the ASME
CIE Distinguished Service Award in 2014, and the Washington Academy of
Sciences’ Distinguished Career in Engineering Sciences Award in 2015.