
Melody Generation using RNN with LSTM

Introduction
This report presents the findings of a project that aims to encode musical compositions in the form of time series data. The encoded data can then be used to train machine learning models for tasks such as music generation and music classification.

Music representation is a fundamental problem in the field of computer music, and various approaches have been proposed for representing music in a numerical format that can be processed by computers. One common approach is to represent music as a sequence of symbolic events, such as MIDI notes or musical notation, and use this representation as input for machine learning models.
The goal of this project was to develop a method for encoding musical compositions as time series data that can be used as input for machine learning models. To achieve this goal, we used the `music21` library to parse and preprocess a dataset of classical music compositions in the `kern` format. The preprocessing steps included transposing the compositions to a common key (C major or A minor) and eliminating compositions with durations that are not among a predefined set of acceptable durations (16th notes, 8th notes, quarter notes, half notes, and whole notes). The resulting dataset was then encoded as time series data using a specified time step (in this case, 0.25 quarter lengths). The time series data represents a sequence of MIDI notes, rests, and carry-over symbols for notes/rests that extend beyond a single time step.
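As a concrete illustration, consider a C4 quarter note (MIDI pitch 60) followed by an eighth-note rest. With a time step of 0.25 quarter lengths, the note spans four time steps and the rest spans two, so the encoded sequence is `60 _ _ _ r _`, where `r` marks a rest onset and `_` is the carry-over symbol. (The glyphs `r` and `_` are illustrative choices; the report only specifies that rests and carry-overs receive distinct symbols.)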
Dataset Details
The dataset used in this project is a collection of classical music compositions in the `kern` format. The data was sourced from the `deutschl` dataset and contains a variety of musical styles and instruments.
The `kern` format is a textual representation of musical notation that is widely used in the field of musicology. It consists of a series of commands and parameters that specify the musical content of a composition, including pitch, duration, dynamics, and other musical features. The `kern` format is well suited for musicological analysis, but it is not as widely used in the field of computer music, where more structured and machine-readable formats such as MIDI or MusicXML are often preferred.
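For example, a `**kern` spine encodes one token per line: `4c` is a quarter-note middle C, `8d` an eighth-note D, and `=` a barline, with the leading number giving the duration and the letter the pitch. (This is a generic illustration of kern syntax, not an excerpt from the dataset.)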
Methodologies
To encode the musical compositions, the following steps were taken (a Python sketch of the full pipeline follows the list):
1. Load the kern files using the music21 library. The music21 library is a Python-based toolkit for working with music data that provides a rich set of tools for parsing, manipulating, and analyzing music notation.
2. Preprocess the data by transposing the compositions to a common key (C major or A minor) and eliminating compositions with durations that are not among a predefined set of acceptable durations (16th notes, 8th notes, quarter notes, half notes, and whole notes). Transposition is the process of adjusting the pitch of a musical composition to a different key. This step was included in the preprocessing to ensure that all of the compositions in the dataset were in a common tonal context, which makes it easier to compare and analyze the musical content. The acceptable durations were chosen to reflect common musical rhythms and to eliminate compositions that might contain unusual or complex rhythmic patterns that would be difficult to encode accurately.
3. Encode each composition as time series data using a specified time step (in this case, 0.25 quarter lengths). To encode a composition, we iterated through the notes and rests in the composition using the music21 library and converted each event into a symbol that represents its pitch (for notes) or duration (for rests). The symbols were then divided into time steps of the specified length, and carry-over symbols were added to represent notes and rests that extended beyond a single time step.
4. Save the encoded data to a file, along with a mapping of integer values to musical symbols (MIDI notes, rests, and carry-over symbols). The mapping was included to allow the encoded data to be decoded back into symbolic notation, if needed.
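The pipeline can be sketched with `music21` as follows. This is a minimal sketch of the four steps above, not the project's exact code: the file layout, the rest symbol `r`, the carry-over symbol `_`, and all helper names are assumptions.

```python
import json
import os

import music21 as m21

ACCEPTABLE_DURATIONS = [0.25, 0.5, 1.0, 2.0, 4.0]  # 16th to whole note, in quarter lengths
TIME_STEP = 0.25  # encoding resolution in quarter lengths


def has_acceptable_durations(song):
    """Step 2b: reject songs containing any duration outside the predefined set."""
    return all(event.duration.quarterLength in ACCEPTABLE_DURATIONS
               for event in song.flatten().notesAndRests)


def transpose_to_common_key(song):
    """Step 2a: transpose major songs to C major and minor songs to A minor."""
    key = song.analyze("key")
    if key.mode == "major":
        interval = m21.interval.Interval(key.tonic, m21.pitch.Pitch("C"))
    else:
        interval = m21.interval.Interval(key.tonic, m21.pitch.Pitch("A"))
    return song.transpose(interval)


def encode_song(song, time_step=TIME_STEP):
    """Step 3: encode a song as a space-separated sequence of symbols."""
    symbols = []
    for event in song.flatten().notesAndRests:
        if isinstance(event, m21.note.Note):
            symbol = str(event.pitch.midi)  # MIDI pitch at the note's onset
        elif isinstance(event, m21.note.Rest):
            symbol = "r"                    # rest onset
        else:
            continue                        # skip chords; the dataset is assumed monophonic
        steps = int(event.duration.quarterLength / time_step)
        symbols.append(symbol)               # first time step holds the event symbol
        symbols.extend(["_"] * (steps - 1))  # remaining steps hold carry-over symbols
    return " ".join(symbols)


def preprocess(dataset_path):
    """Steps 1-3: load, filter, transpose, and encode every kern file."""
    encoded_songs = []
    for root, _, files in os.walk(dataset_path):
        for name in files:
            if not name.endswith(".krn"):
                continue
            song = m21.converter.parse(os.path.join(root, name))
            if not has_acceptable_durations(song):
                continue
            encoded_songs.append(encode_song(transpose_to_common_key(song)))
    return encoded_songs


def save_mapping(encoded_songs, path="mapping.json"):
    """Step 4: map each distinct symbol to an integer so data can be decoded later."""
    vocabulary = sorted({s for song in encoded_songs for s in song.split()})
    with open(path, "w") as fp:
        json.dump({symbol: i for i, symbol in enumerate(vocabulary)}, fp, indent=2)
```

`save_mapping` assigns integers in sorted symbol order; any stable assignment works, since the mapping is saved precisely so that encoded sequences can be decoded later.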
Results
The encoding method was applied to all of the compositions in the deutschl dataset. A total of 4903 compositions were retained after preprocessing, and the resulting time series data had a shape of (4903, SEQUENCE_LENGTH, 1), where SEQUENCE_LENGTH is a hyperparameter representing the length of the encoded sequences.

The encoded data showed a good degree of structure and coherence, with distinct patterns of notes and rests that corresponded to the musical content of the original compositions. The distribution of different musical symbols in the data was also consistent with expectations, with a larger number of shorter durations (such as 16th and 8th notes) and a smaller number of longer durations (such as half and whole notes).
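The report does not spell out how the fixed-length sequences are cut from the encoded symbol stream. Assuming a sliding window with next-symbol targets (a common setup for this kind of data), the arrays could be assembled roughly as follows; `SEQUENCE_LENGTH = 64` and the `mapping.json` path are assumed values:

```python
import json

import numpy as np

SEQUENCE_LENGTH = 64  # hyperparameter; the report leaves its value unspecified


def generate_training_sequences(encoded_songs, mapping_path="mapping.json"):
    """Slide a fixed-length window over the symbol stream to build model inputs."""
    with open(mapping_path) as fp:
        mapping = json.load(fp)
    symbols = " ".join(encoded_songs).split()
    ids = [mapping[s] for s in symbols]

    inputs, targets = [], []
    for i in range(len(ids) - SEQUENCE_LENGTH):
        inputs.append(ids[i:i + SEQUENCE_LENGTH])
        targets.append(ids[i + SEQUENCE_LENGTH])  # next-symbol prediction target

    # Trailing singleton dimension matches the reported (N, SEQUENCE_LENGTH, 1) shape.
    x = np.array(inputs)[..., np.newaxis]
    y = np.array(targets)
    return x, y
```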
Potential Applications and Limitations
The encoded data produced by this method has several potential applications in the field of computer music. For example, the data could be used to train machine learning models for tasks such as music generation, music classification, or music recommendation. The data could also be used to study the statistical properties of music and to explore relationships between musical features and aesthetic qualities.
There are also several limitations to the current approach that should be considered when interpreting the results of this study. First, the kern format is not as widely used in the field of computer music as more structured formats such as MIDI or MusicXML, and the data may not be representative of other types of music. Second, the preprocessing steps (such as transposition and the restriction to acceptable durations) were designed to simplify the data and may have eliminated some musical content that would be important for certain applications. Finally, the time step used for encoding the data (0.25 quarter lengths) was chosen for convenience, but other time steps could potentially provide different insights into the data.
Future Work
There are several directions for future work that could extend and improve upon the current approach. One possibility is to explore alternative methods for encoding musical compositions, such as using different time steps, using different symbolic representations (e.g., musical notation or audio features), or using more sophisticated encoding techniques (e.g., convolutional neural networks). Another possibility is to conduct experiments using the encoded data to train machine learning models for tasks such as music generation or classification, and to report on the results of these experiments. Finally, it would be interesting to compare the results of this study to other research on music representation and machine learning, to see how the current approach compares to other methods in terms of performance and interpretability.
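Given the title's focus on an LSTM-based RNN, a natural first experiment is a next-symbol prediction model. The Keras sketch below is one plausible setup, not the project's reported architecture; the layer sizes, dropout rate, optimizer settings, and vocabulary size are all assumptions:

```python
import tensorflow as tf

OUTPUT_UNITS = 38  # vocabulary size from the mapping file (assumed value)
NUM_UNITS = 256    # LSTM hidden size (assumed)


def build_model(output_units=OUTPUT_UNITS, num_units=NUM_UNITS):
    """A single-layer LSTM that predicts the next symbol from an integer sequence."""
    # Input shape (SEQUENCE_LENGTH, 1) matches the encoded data's reported shape.
    inputs = tf.keras.layers.Input(shape=(None, 1))
    x = tf.keras.layers.LSTM(num_units)(inputs)
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(output_units, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  metrics=["accuracy"])
    return model
```

One-hot encoding the inputs (replacing the trailing dimension of 1 with the vocabulary size) is a common refinement over feeding raw integer ids into the LSTM.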
References
Michael Scott Cuthbert. "music21: A Toolkit for Computer-Aided Musicology." In: Computing in Musicology 14 (2008), pp. 1-10.

Stephen C. Smith and Christopher Ariza. "MusicXML: A Flexible, Extensible Format for Music Notation and Performance Information." In: Computer Music Journal 26.3 (2002), pp. 53-59.
Links
Dataset: https://kern.humdrum.org/cgi-bin/browse?l=essen/europa/deutschl

GitHub Repository: https://github.com/Mountain311/Melody_Generator