Expressing Music Composition as an Optimization Problem
by Neil Dickson for D-Wave Systems Inc.

Table of Contents
1 Introduction
2 Defining Common Practice Harmony
  2.1 Notation and basic definitions
  2.2 Basic Voice Leading Rules
  2.3 Definitions and Rules for Chords
3 Artistic Style
  3.1 Identifying a Style
  3.2 Stylistic Composition
4 Putting it all together
5 Conclusion

1 Introduction

Composition of music in the Common Practice Period (Baroque, Classical, Romantic) was governed by a large set of well-defined rules and recommendations on harmony and melody. Many of the harmonization rules are sufficiently well-defined that one can express them in terms of formal constraints on integer variables, each integer representing the pitch of a certain voice in a certain chord. As such, one could develop a harmonization based on these rules.
From this, a melody can be developed, or vice versa, developing a harmony subject to a given melody. Additionally, much more specific constraints can be added by a user, such as “there must be exactly one deceptive cadence”, or “there must be at least two seventh chords in inversion”. Constraints of this type may not make sense on their own; however, combined with melody and other constraints, they can be used to roughly define larger structures like phrases with certain inflections. Then one can theoretically apply species counterpoint and/or theme development to construct multi-phrase structures. Beyond that, forms such as Sonata-Allegro Form can be used to stitch the sets of phrases into a full piece.

First, I will present a method of formally defining the common practice harmony rules in terms of constraints on integer variables. Second, I will present a possible approach to impose a style onto the creation of a piece given a set of pieces as a style reference. Third, I will present a possible approach (although quite informally-defined) to create a fugue given a theme.

2 Defining Common Practice Harmony

2.1 Notation and basic definitions

The description or explanation of chords and other music theory concepts is beyond the scope of this document, so it is assumed that the reader has some music theory background. Start by defining integers to represent each note:

  Octave  C   C#  D   Eb  E   F   F#  G   G#  A   Bb  B
  1       0   1   2   3   4   5   6   7   8   9   10  11
  2       12  13  14  15  16  17  18  19  20  21  22  23
  3       24  25  26  27  28  29  30  31  32  33  34  35
  4       36  37  38  39  40  41  42  43  44  45  46  47
  5       48  49  50  51  52  53  54  55  56  57  58  59
  6       60  61  62  63  64  65  66  67  68  69  70  71
  7       72  73  74  75  76  77  78  79  80  81  82  83

Since the definitions and rules are in terms of semitones, here is a table of the semitone (ST) numbers for each scale degree, and the triad built on each scale degree, in the major and harmonic minor scales.

  Degree  Major ST  H. Minor ST  Major Triad  H. Minor Triad
  1̂       0         0            I            i
  2̂       2         2            ii           ii°
  3̂       4         3            iii          III+
  4̂       5         5            IV           iv
  5̂       7         7            V            V
  6̂       9         8            vi           VI
  7̂       11        11           vii°         vii°

Many of the rules apply specifically to four-part voice leading, so for simplicity, the variables are defined as:

  $s_i$: Soprano note at time $i$
  $a_i$: Alto note at time $i$
  $t_i$: Tenor note at time $i$
  $b_i$: Bass note at time $i$

Where a rule applies only to four-part voice leading, an asterisk marks this. Also, in some cases, the rules become recommendations when dealing with other instrumentations, but the relative importances are highly debatable, so they have not been marked. The rules as they are should work sufficiently for a string quartet (2 violins, viola, and cello).

The following notation has been added for simplicity of expression:

  $x$ or $y$: Any of soprano, alto, tenor, bass
  $\bar{d}$: $(d - k) \bmod 12$, where $k$ is the current tonic (0 to 11); i.e. semitone relative to key
  $d \equiv e$: $d = e \pmod{12}$
  $x \succ y$: $x$ is a higher part than $y$ (e.g. $x = s$ and $y = t$)

2.2 Basic Voice Leading Rules

Voice Ranges: *
  $36 \le s_i \le 60$
  $29 \le a_i \le 53$
  $24 \le t_i \le 48$
  $15 \le b_i \le 38$

No Voice-Crossing:
  $s_i \ge a_i \ge t_i \ge b_i$

Maximum Spacing:
  $s_i - a_i \le 12$    Soprano and Alto apart by at most an octave
  $a_i - t_i \le 12$    Alto and Tenor apart by at most an octave
  $t_i - b_i \le 19$    Tenor and Bass apart by at most a perfect twelfth

No Voice Overlapping:
  $x_i \ge y_{i+1}\ \forall x \succ y$    Can’t move to a note higher than the previous note of higher part
  $x_{i+1} \ge y_i\ \forall x \succ y$    Can’t move to a note lower than the previous note of lower part

Illegal Leaps:
  $|s_{i+1} - s_i| \ne 6 \lor \bar{s}_{i+1} = 11$    No tritone leap in soprano except to leading tone
  $|a_{i+1} - a_i| \ne 6$    No tritone leap in alto
  $|t_{i+1} - t_i| \ne 6$    No tritone leap in tenor
  $|b_{i+1} - b_i| \ne 6$    No tritone leap in bass
  $|x_{i+1} - x_i| \ne 10\ \forall x$    No m7 leap in soprano, alto, tenor, or bass
  $|x_{i+1} - x_i| \ne 11\ \forall x$    No M7 leap in soprano, alto, tenor, or bass
  $|x_{i+1} - x_i| \le 12\ \forall x$    No leaps larger than an 8ve in soprano, alto, tenor, or bass

Parallel (or Consecutive even if not parallel motion) 5ths and 8ves:
  $x_i - y_i \equiv 7 \Rightarrow x_{i+1} - y_{i+1} \not\equiv 7\ \forall x \succ y$    No parallel P5s (and can have parallel P4s)
  $x_i - y_i \equiv 0 \Rightarrow x_{i+1} - y_{i+1} \not\equiv 0\ \forall x \succ y$    No parallel P1s/P8s

Resolution of the Leading Tone:
  $\bar{x}_i = 11 \Rightarrow x_{i+1} = x_i + 1\ \forall x$    Any LT must proceed up one semitone

No Doubling of the Leading Tone:
  $\bar{x}_i = 11 \Rightarrow \bar{y}_i \ne 11\ \forall x \ne y$    Only up to one voice can have the LT at a time

2.3 Definitions and Rules for Chords

First some basic rules about defining what it means to be a certain chord:
  • A triad must have a root and a third, but not necessarily a fifth
  • A 7th chord must have a root, a third, and a seventh, but not necessarily a fifth

For clarity, here is a summary of the doubling rules for triads in root position, in order of preference (noting that vii° is never used in root position):
  • Double the root (note that in a large cadence it is acceptable to triple the root)
  • Double 1̂, 5̂, or 4̂ (in that order of preference)
  • Double the fifth in V (2̂)

Here is a summary of the doubling rules for triads in first inversion, in order of preference:
  • Double 1̂, 5̂, or 4̂ (in that order of preference)
  • Double the soprano

Second inversion triads always have the bass note doubled.

Here is a summary of the doubling rules for 7th chords, in order of preference:
  • Don’t double any notes
  • Double the root
  • A couple of very specific exceptions to avoid parallel 5ths/8ves

As an interesting example of how different rules interact, whenever two root position 7th chords with no doubled notes are next to each other, there are almost always parallel 5ths caused by the required resolution of the 7th downward by step.
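To make the rules of Section 2.2 concrete, the following is a minimal sketch of how they might be checked for a pair of consecutive four-part chords. It assumes each chord is a tuple (s, a, t, b) of integers from the note table; the function names and the tuple layout are illustrative assumptions, not part of the formulation above.

```python
# Sketch of the Section 2.2 rules as predicates on integer pitches.
# A chord is a tuple (s, a, t, b); this layout and all names are
# illustrative assumptions.

RANGES = {"s": (36, 60), "a": (29, 53), "t": (24, 48), "b": (15, 38)}
VOICES = ("s", "a", "t", "b")  # ordered from highest part to lowest


def rel(pitch, tonic):
    """Semitone of a pitch relative to the tonic k (0 to 11)."""
    return (pitch - tonic) % 12


def chord_ok(chord):
    """Voice ranges, no voice-crossing, and maximum spacing for one chord."""
    s, a, t, b = chord
    in_range = all(RANGES[v][0] <= p <= RANGES[v][1]
                   for v, p in zip(VOICES, chord))
    no_crossing = s >= a >= t >= b
    spacing = s - a <= 12 and a - t <= 12 and t - b <= 19
    return in_range and no_crossing and spacing


def transition_ok(prev, nxt, tonic):
    """Leap, leading-tone, overlap, and parallel 5th/8ve rules between chords."""
    for v, (p, q) in enumerate(zip(prev, nxt)):
        leap = abs(q - p)
        # No m7 or M7 leaps, and nothing larger than an octave.
        if leap in (10, 11) or leap > 12:
            return False
        # Tritone leaps are illegal, except in the soprano to the leading tone.
        if leap == 6 and not (v == 0 and rel(q, tonic) == 11):
            return False
        # A leading tone must proceed up one semitone.
        if rel(p, tonic) == 11 and q != p + 1:
            return False
    for hi in range(4):
        for lo in range(hi + 1, 4):
            # No voice overlapping between a higher and a lower part.
            if prev[hi] < nxt[lo] or nxt[hi] < prev[lo]:
                return False
            # No consecutive perfect 5ths (7) or unisons/octaves (0).
            for interval in (7, 0):
                if ((prev[hi] - prev[lo]) % 12 == interval
                        and (nxt[hi] - nxt[lo]) % 12 == interval):
                    return False
    return True
```

For example, in C major (tonic 0), moving from I voiced as (48, 40, 31, 24) to V voiced as (47, 38, 31, 19) passes these checks, while shifting every voice of the I chord up by a tone fails because of the consecutive octaves and fifths. No doubling of the leading tone is a single-chord rule and is left out of this sketch.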
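2.3.1 I

Definition of $I_i$ (1̂, 3̂, 5̂):
  $\bar{b}_i = 0 \land \exists x: \bar{x}_i = 4 \land \forall y: (\bar{y}_i = 0 \lor \bar{y}_i = 4 \lor \bar{y}_i = 7)$
Doubling root preferred:
  $\bar{t}_i = 0 \lor \bar{a}_i = 0 \lor \bar{s}_i = 0$
Can double fifth (5̂) instead:
  $(\bar{t}_i = 7 \land \bar{a}_i = 7) \lor (\bar{t}_i = 7 \land \bar{s}_i = 7) \lor (\bar{a}_i = 7 \land \bar{s}_i = 7)$

2.3.2 I6

Definition of $I^6_i$ (1̂, 3̂, 5̂):
  $\bar{b}_i = 4 \land \exists x: \bar{x}_i = 0 \land \forall y: (\bar{y}_i = 0 \lor \bar{y}_i = 4 \lor \bar{y}_i = 7)$
Doubling 1̂ preferred:
  $(\bar{t}_i = 0 \land \bar{a}_i = 0) \lor (\bar{t}_i = 0 \land \bar{s}_i = 0) \lor (\bar{a}_i = 0 \land \bar{s}_i = 0)$
Can double 5̂ instead:
  $(\bar{t}_i = 7 \land \bar{a}_i = 7) \lor (\bar{t}_i = 7 \land \bar{s}_i = 7) \lor (\bar{a}_i = 7 \land \bar{s}_i = 7)$
Can double soprano instead:
  $\bar{t}_i = \bar{s}_i \lor \bar{a}_i = \bar{s}_i$

2.3.3 I64

Definition of $(I^6_4)_i$ (1̂, 3̂, 5̂):
  $\bar{b}_i = 7 \land \exists x: \bar{x}_i = 0 \land \exists x: \bar{x}_i = 4 \land \forall y: (\bar{y}_i = 0 \lor \bar{y}_i = 4 \lor \bar{y}_i = 7)$
Double bass:
  $\bar{t}_i = 7 \lor \bar{a}_i = 7 \lor \bar{s}_i = 7$
Allowed 6/4 Functions:
  Arpeggio 6/4:
    $(I_{i-1} \lor I^6_{i-1}) \land (I_{i+1} \lor I^6_{i+1})$
  Cadential 6/4, a.k.a. $V^{6-5}_{4-3}$:
    $V_{i+1} \land b_{i+1} = b_i \land \exists x, y: (x_{i+1} = x_i - 2 \land y_{i+1} = y_i - 1)$
  Passing 6/4 Up:
    $IV_{i-1} \land IV^6_{i+1} \land b_i = b_{i-1} + 2 \land b_{i+1} = b_i + 2$
  Passing 6/4 Down:
    $IV^6_{i-1} \land IV_{i+1} \land b_i = b_{i-1} - 2 \land b_{i+1} = b_i - 2$

2.3.4 i

This in a minor key acts the same as I in a major key, except with 3̂ being 3 instead of 4.

2.3.5 i6

This in a minor key acts the same as I6 in a major key, except with 3̂ being 3 instead of 4.

2.3.6 i64

This in a minor key acts the same as I64 in a major key, except with 3̂ being 3 instead of 4, and the Passing 6/4 Up and Down involve iv instead of IV (meaning that in both cases the transition to/from the iv6 is by a semitone instead of a tone, which changes the equations as well).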
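The tonic chord definitions above all share one pattern: a fixed bass degree, one or more required chord tones, and a set of allowed chord tones. The following is a minimal sketch of that pattern, assuming a chord is a tuple (s, a, t, b) of integer pitches and the tonic k is given; the helper names `is_triad` and `doubled` are hypothetical.

```python
# Sketch of the I, I6, and I64 definitions as predicates on one chord.
# chord = (s, a, t, b) as integer pitches; tonic = k (0 to 11).
# All function names here are illustrative assumptions.

TONIC_TRIAD = {0, 4, 7}  # semitones of scale degrees 1, 3, 5 in a major key


def rel(pitch, tonic):
    """Semitone of a pitch relative to the tonic."""
    return (pitch - tonic) % 12


def is_triad(chord, tonic, bass, required, allowed):
    """True if the bass has the given semitone, every required semitone
    appears in some voice, and every voice is one of the allowed semitones."""
    rels = [rel(p, tonic) for p in chord]
    return (rels[3] == bass
            and all(r in rels for r in required)
            and all(r in allowed for r in rels))


def is_I(chord, tonic):
    return is_triad(chord, tonic, bass=0, required=[4], allowed=TONIC_TRIAD)


def is_I6(chord, tonic):
    return is_triad(chord, tonic, bass=4, required=[0], allowed=TONIC_TRIAD)


def is_I64(chord, tonic):
    return is_triad(chord, tonic, bass=7, required=[0, 4], allowed=TONIC_TRIAD)


def doubled(chord, tonic):
    """Set of relative semitones that appear in more than one voice."""
    rels = [rel(p, tonic) for p in chord]
    return {r for r in rels if rels.count(r) > 1}
```

With this layout, the doubling preferences become comparisons on the result of `doubled`, and the minor-key chords i, i6, and i64 would use the set {0, 3, 7} with a required third of 3 instead of 4.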
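2.3.7 V

Definition of $V_i$ (5̂, 7̂, 2̂):
  $\bar{b}_i = 7 \land \exists x: \bar{x}_i = 11 \land \forall y: (\bar{y}_i = 7 \lor \bar{y}_i = 11 \lor \bar{y}_i = 2)$
Doubling root (5̂) preferred:
  $\bar{t}_i = 7 \lor \bar{a}_i = 7 \lor \bar{s}_i = 7$
Can double fifth (2̂) instead:
  $(\bar{t}_i = 2 \land \bar{a}_i = 2) \lor (\bar{t}_i = 2 \land \bar{s}_i = 2) \lor (\bar{a}_i = 2 \land \bar{s}_i = 2)$

2.3.8 V6

Definition of $V^6_i$ (5̂, 7̂, 2̂):
  $\bar{b}_i = 11 \land \exists x: \bar{x}_i = 7 \land \forall y: (\bar{y}_i = 7 \lor \bar{y}_i = 11 \lor \bar{y}_i = 2)$
Doubling 5̂ preferred:
  $(\bar{t}_i = 7 \land \bar{a}_i = 7) \lor (\bar{t}_i = 7 \land \bar{s}_i = 7) \lor (\bar{a}_i = 7 \land \bar{s}_i = 7)$
Can double soprano instead:
  $\bar{t}_i = \bar{s}_i \lor \bar{a}_i = \bar{s}_i$

2.3.9 V64

Definition of $(V^6_4)_i$ (5̂, 7̂, 2̂):
  $\bar{b}_i = 2 \land \exists x: \bar{x}_i = 7 \land \exists x: \bar{x}_i = 11 \land \forall y: (\bar{y}_i = 7 \lor \bar{y}_i = 11 \lor \bar{y}_i = 2)$
Double bass:
  $\bar{t}_i = 2 \lor \bar{a}_i = 2 \lor \bar{s}_i = 2$
Allowed 6/4 Functions:
  Passing 6/4 Up:
    $I_{i-1} \land I^6_{i+1} \land b_i = b_{i-1} + 2 \land b_{i+1} = b_i + 2$
  Passing 6/4 Down:
    $I^6_{i-1} \land I_{i+1} \land b_i = b_{i-1} - 2 \land b_{i+1} = b_i - 2$

2.3.10 V7

Definition of $V^7_i$ (5̂, 7̂, 2̂, 4̂):
  $\bar{b}_i = 7 \land \exists x: \bar{x}_i = 11 \land \exists x: \bar{x}_i = 5 \land \forall y: (\bar{y}_i = 7 \lor \bar{y}_i = 11 \lor \bar{y}_i = 2 \lor \bar{y}_i = 5)$
Double root (5̂):
  $\bar{t}_i = 7 \lor \bar{a}_i = 7 \lor \bar{s}_i = 7$
Seventh must resolve downward by step:
  $\bar{x}_i = 5 \Rightarrow x_{i+1} = x_i - 1$ (or in a minor key: $\bar{x}_i = 5 \Rightarrow x_{i+1} = x_i - 2$)

2.3.11 IV

Definition of $IV_i$ (4̂, 6̂, 1̂):
  $\bar{b}_i = 5 \land \exists x: \bar{x}_i = 9 \land \forall y: (\bar{y}_i = 5 \lor \bar{y}_i = 9 \lor \bar{y}_i = 0)$
Doubling root (4̂) preferred:
  $\bar{t}_i = 5 \lor \bar{a}_i = 5 \lor \bar{s}_i = 5$
Can double 1̂ instead:
  $(\bar{t}_i = 0 \land \bar{a}_i = 0) \lor (\bar{t}_i = 0 \land \bar{s}_i = 0) \lor (\bar{a}_i = 0 \land \bar{s}_i = 0)$

2.3.12 IV6

Definition of $IV^6_i$ (4̂, 6̂, 1̂):
  $\bar{b}_i = 9 \land \exists x: \bar{x}_i = 5 \land \forall y: (\bar{y}_i = 5 \lor \bar{y}_i = 9 \lor \bar{y}_i = 0)$
Doubling 1̂ preferred:
  $(\bar{t}_i = 0 \land \bar{a}_i = 0) \lor (\bar{t}_i = 0 \land \bar{s}_i = 0) \lor (\bar{a}_i = 0 \land \bar{s}_i = 0)$
Can double 4̂ instead:
  $(\bar{t}_i = 5 \land \bar{a}_i = 5) \lor (\bar{t}_i = 5 \land \bar{s}_i = 5) \lor (\bar{a}_i = 5 \land \bar{s}_i = 5)$
Can double soprano instead:
  $\bar{t}_i = \bar{s}_i \lor \bar{a}_i = \bar{s}_i$

2.3.13 IV64

Definition of $(IV^6_4)_i$ (4̂, 6̂, 1̂):
  $\bar{b}_i = 0 \land \exists x: \bar{x}_i = 5 \land \exists x: \bar{x}_i = 9 \land \forall y: (\bar{y}_i = 5 \lor \bar{y}_i = 9 \lor \bar{y}_i = 0)$
Double bass:
  $\bar{t}_i = 0 \lor \bar{a}_i = 0 \lor \bar{s}_i = 0$
Allowed 6/4 Functions:
  Pedal I - IV64 - I:
    $I_{i-1} \land I_{i+1} \land b_i = b_{i-1} \land b_{i+1} = b_i$

2.3.14 Other chords

The full list of chords is too large to reasonably detail in this document. Also, formalizing the rules of non-chord tones would require a slightly more elaborate system, and so is also beyond the scope of this document.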
The following table is of the other most common triads in a major key:

  $ii_i$:          $\bar{b}_i = 2 \land \exists x: \bar{x}_i = 5 \land \forall y: (\bar{y}_i = 2 \lor \bar{y}_i = 5 \lor \bar{y}_i = 9)$
  $ii^6_i$:        $\bar{b}_i = 5 \land \exists x: \bar{x}_i = 2 \land \forall y: (\bar{y}_i = 2 \lor \bar{y}_i = 5 \lor \bar{y}_i = 9)$
  $iii_i$:         $\bar{b}_i = 4 \land \exists x: \bar{x}_i = 7 \land \forall y: (\bar{y}_i = 4 \lor \bar{y}_i = 7 \lor \bar{y}_i = 11)$
  $iii^6_i$:       $\bar{b}_i = 7 \land \exists x: \bar{x}_i = 4 \land \forall y: (\bar{y}_i = 4 \lor \bar{y}_i = 7 \lor \bar{y}_i = 11)$
  $vi_i$:          $\bar{b}_i = 9 \land \exists x: \bar{x}_i = 0 \land \forall y: (\bar{y}_i = 9 \lor \bar{y}_i = 0 \lor \bar{y}_i = 4)$
  $vi^6_i$:        $\bar{b}_i = 0 \land \exists x: \bar{x}_i = 9 \land \forall y: (\bar{y}_i = 9 \lor \bar{y}_i = 0 \lor \bar{y}_i = 4)$
  $vii^{\circ 6}_i$: $\bar{b}_i = 2 \land \exists x: \bar{x}_i = 11 \land \forall y: (\bar{y}_i = 11 \lor \bar{y}_i = 2 \lor \bar{y}_i = 5)$

The following is a more-complete (though far from exhaustive) list of 121 common chords in any single key:

  [Table of 121 chord symbols. It covers the triads and seventh chords on each scale degree of the major and minor keys, in root position and in inversion (for example I, I6, I64, I7, I65, I43, I42 and i, i6, i64, i#7, i6#5, i4#3, i42), together with secondary chords such as V/ii, V/iii, V/IV, V/V, V/vi, and vii°/V in the same inversions.]

3 Artistic Style

The method of style analysis and definition described here is based on stylometry used for literary style analysis, but more elaborate, relating also to object analysis in computer vision. One starts with the concept of a feature. In literary style analysis, for example, words and “rare pairs” of words can be used as features. The frequency of the chosen features in chunks is then used to find a set of multi-dimensional data points (or possibly a single data point if analysing an entire author) determining the style of the given text. To compare two texts, some function of the distances between data points in the two sets is evaluated.

In music, chords, short sequences of chords, rhythmic/melodic motifs, instrumentations, form structures, and various other aspects can be used as features. Then, a similar approach could theoretically be used to compare styles of two works. Well-chosen features could in fact be more representative of style in music than in text, due to the simpler inherent relations between the features. For example, in English text, the words “red”, “orange”, “yellow”, “green”, etc. are completely different words and yet all are simple colours, but finding such relations would require extensive analysis of a large sampling of English texts. In music, however, many such relations are quite well-defined and thoroughly studied. Things like the expected frequencies of common chords are easily “guessed” and could be found quite accurately by a relatively small sampling of well-selected music.

The following is an example (albeit with made-up data) of a possible spread of data points for the frequencies of 2 high-level features (dimensions) in pieces by 4 composers:

  [Figure: scatter plot of pieces by Bach, Mozart, Beethoven, and Wagner. Horizontal axis: ratio of notes by wind instruments to notes by string instruments. Vertical axis: percentage of transitions that use excessive arpeggios or scales.]

Assuming that these features completely represent the style of any given piece, one can see that Mozart and Beethoven had similar styles for some of their pieces, but that Beethoven also wrote in a different style. Likewise, one can see that Wagner and Bach had similar styles in one feature, but very different styles in the other. However, the real goal of this analysis is to determine, for instance, what it means to be a piece that sounds distinctly like something that Bach would write. More interesting with the plot above would be determining a piece that sounds distinctly like something that Beethoven would write, since although there are two clear clusters, taking a simple average isn’t sufficient, because of the space between the clusters.
In higher dimensions (more features) this type of clustering would happen more because, for example, piano concertos would have a different spectrum of features than symphonies, string quartets, etc. So then the question becomes how to determine the set of pieces in the same style that are most easily identifiable as being in the style of Beethoven. Note that one must also take into account how well-known each piece is, assigning it a weighting for inclusion into the set chosen. In small numbers of dimensions, this may be fairly easy, but in an arbitrary number of dimensions, this is NP-hard in the worst case.

3.1 Identifying a Style

The analysis proceeds as follows:

1. Formally define many features of the music in terms of numbers within a finite range, possibly discrete, of real numbers (including features that may be uncorrelated with the style of a composer).
2. Determine data points in this feature space.
3. Remove each feature dimension in which points are trivially distributed (e.g. uniform, normal) identically through all other features (i.e. it is an independent feature). These features should not affect the search for the most recognizable style, but this distribution information should be saved for when imposing the style, since it may still determine the style of the composer. One could also consider removing a dimension from any pair of dimensions that correlate very highly and correlate very similarly with all other dimensions (i.e. they are identical features), again saving the information.
4. Normalize each dimension by the standard deviation of the corresponding coordinate in the set of points.
5. Calculate the Euclidean distances between each pair of points in the normalized feature space.
6. Find $x \in \{0,1\}^n$ such that $\sum_i p_i x_i - \alpha \sum_{ij} d_{ij} x_i x_j$ is maximized, where $n$ is the number of pieces, $p_i$ is the popularity of piece $i$ from 0 to 1, $d_{ij}$ is the distance between the points of piece $i$ and piece $j$, and $\alpha$ is a positive factor chosen to represent the relative importance of the points being within a certain distance of each other.

$\{x_i \mid x_i = 1\}$ is the set of pieces representing the style most recognizable as that of the composer, so this is an indication of where this composer’s style lies in the feature space.

The Boolean quadratic program of any real symmetric matrix whose off-diagonal elements satisfy $0 \le a_{ik} \le a_{ij} + a_{jk}\ \forall i \ne j \ne k,\ i \ne k$ can be transformed in polynomial time into a style analysis problem of a sort similar to the above, meaning that if any of these matrices are arbitrarily large and not positive definite, the style analysis problem is NP-hard. The polynomial-time transformation is to build an n-dimensional irregular simplex whose nodes represent the rows (or columns) of the matrix and whose edge lengths are the off-diagonal matrix elements. The diagonal elements represent the popularities of the pieces.

3.2 Stylistic Composition

Beginning with an extremely rigid and highly-simplified example, suppose that a style dictates that:
  • each feature A-F must appear exactly once
  • feature A can immediately precede feature B or D, but not feature C, E, or F
  • feature B can immediately precede feature C, D or E, but not A or F
  • feature C can immediately precede feature A or F, but not B, D, or E
  • feature D can immediately precede feature B or C, but not A, E, or F
  • feature E can immediately precede feature D or F, but not A, B, or C
  • feature F can immediately precede feature A or E, but not B, C, or D

The objective is to find an order of the features that follows the style requirements.
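For an instance this small, the orderings can simply be enumerated. The following is a minimal brute-force sketch that transcribes the rules listed above into a successor table and tests every permutation of the six features; the names `ALLOWED` and `valid_orderings` are illustrative assumptions, and exhaustive search of this kind is only practical for a handful of features.

```python
from itertools import permutations

# Allowed successors of each feature, transcribed from the rules above.
ALLOWED = {
    "A": "BD",
    "B": "CDE",
    "C": "AF",
    "D": "BC",
    "E": "DF",
    "F": "AE",
}


def valid_orderings(allowed):
    """Yield every ordering that uses each feature exactly once and in which
    each feature is immediately followed only by an allowed successor."""
    for order in permutations(sorted(allowed)):
        if all(nxt in allowed[cur] for cur, nxt in zip(order, order[1:])):
            yield "".join(order)


orderings = list(valid_orderings(ALLOWED))
```

The number of permutations grows factorially with the number of features, so this approach says nothing about how larger instances should be solved.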
These restrictions can be represented by the following matrix (rows: the preceding feature; columns: the feature that follows):

       A  B  C  D  E  F
  A    0  1  0  1  0  0
  B    0  0  1  1  1  0
  C    1  0  0  0  0  1
  D    0  1  1  0  0  0
  E    0  0  0  1  0  1
  F    1  0  0  0  1  0

or, as a directed graph:

  [Figure: directed graph on the nodes A to F, with one edge per allowed transition.]

The set of all valid orderings is then the set of all Hamiltonian paths in the graph, two of which are EDBCFA and BEFADC in this case.

  [Figure: the paths EDBCFA and BEFADC drawn on the graph.]

Since this mapping is bijective and has a polynomial-time inverse, any Hamiltonian path problem can be transformed in polynomial time into an extremely rigid, highly simplified stylistic composition problem. Such a stylistic composition problem is therefore NP-hard.

How, though, does this compare to stylistic composition problems approaching slightly more reasonable instances? Suppose that the given style instead dictates the exact number of times that any feature i must appear immediately preceding feature j. This is then the Eulerian path problem on a directed multigraph (if the graph is Eulerian), which is not NP-hard. So what’s the difference? The Hamiltonian path problem has the option of taking an edge or not and can visit each node only once, whereas the Eulerian path problem must take all edges and can visit a node more than once.

The issue is that, supposing such information was given from a style analysis, such a graph is very unlikely to be Eulerian, in which case the problem becomes much harder: find the path with the most edges. This no longer has the restriction that all edges must be taken. In fact, the more realistic case is that the number of such transitions from the style information is not discrete and the resulting path must be. Then, no such multigraph can be constructed, and the problem becomes to find the path that has the number of transitions of each type as close as possible to the given values. One can also factor in a standard deviation to favour some values over others.
Likewise, since this relaxes the number of times that any given feature occurs, one can factor in the mean and standard deviation of the number of times each feature occurs, and/or the mean and standard deviation of the number of times each feature occurs within a certain distance in the order. Many other feature statistics, including those for complex, hierarchically-defined features (since few features are actually separate in time), are necessary, but make the optimization even more difficult. However, there is still the ability to rate the success of such a composition based on these style features. As such it is quite likely that stylistic composition in a realistic case is NP-hard.

4 Putting it all together

Now that low-level rules and style have been established, the high-level rules to put the two together must be defined. There are of course medium-level rules dealing with what it means to be a phrase, modulation, etc. but in this case, high-level refers to defining the form of a piece.

The example shown here is that of composing a fugue. This was chosen because it is a case in which there is a fairly well-defined form and the harmonization rules are in effect throughout the piece. A similar approach could be taken with regard to other forms, but much more research would be required to clearly define another form.

From the fugue article on Wikipedia, the single-theme, three-voice, Baroque fugue has a usual form that can be expressed as follows:

  Section          Key      Soprano  Alto  Bass
  Exposition       T        S        _     _
                   D        1        A     _
                   Codetta
                   T        2        1     S
                   (D)      A        2     1
  E
  Middle Entry 1   Rel. T   1        S     2
                   Rel. D   2        1     A
  E
  Middle Entry 2   SD       S        2     1
  E
  Final Entries    T        1        S     2
                   T        F        1     S
  Coda

  E = Episode (transition)
  T = Section in the tonic key
  D = Section in the dominant key
  Rel. T = Section in the relative major/minor’s tonic key
  Rel. D = Section in the relative major/minor’s dominant key
  SD = Section in the subdominant key
  S = Subject (main theme)
  1 = 1st Counter-subject
  2 = 2nd Counter-subject
  A = Answer
  F = Free counterpoint

The description of composition in this section will assume that at least the subject is provided as input. There are various approaches to creating a subject, but the only difficult part of that would be finding a subject that is sufficiently different from themes in pieces that have already been written, which is more related to the style analysis problem. The goal is mostly then to create the answer, 1st and 2nd counter-subjects, the codetta and coda, episodes and smaller transitions, and the free counterpoint near the end, making variations on the various components based on the key and larger section of the fugue.

For simplicity, if this form is taken as an exact and solid requirement with no overlapping sections (which is likely never the case in a fugue), some of what it imposes is:
  • There must be a valid harmonization of the following in (Soprano, Alto, Bass, key): (1,A,_,D), (2,1,S,T), (A,2,1,D), (1,S,2,Rel. T), (2,1,A,Rel. D), (S,2,1,SD), (1,S,2,T), (F,1,S,T)
  • It must be possible to validly and quickly modulate:
      o from (S,_,_,T) to (1,A,_,D)
      o from (2,1,S,T) to (A,2,1,D)
      o from (1,S,2,Rel. T) to (2,1,A,Rel. D)
      o from (1,S,2,T) to (F,1,S,T)
  • Each section must be of approximately a certain length.
  • The episodes must be valid counterpoint that modulates as needed (and validly harmonizes with the preceding and following material of course)

However, there are more difficult requirements and preferences in the more realistic version:
  • The answer and counter-subjects must (at least loosely) follow rules and preferences for melodies (the subject too if generated instead of supplied), since they are effectively counter-melodies, but counter-subjects must not overshadow the subject in complexity.
  • The answer must be “a response” to the subject, in that the subject should be an antecedent phrase (or small set of phrases) and the answer should be the subject’s consequent phrase (or small set of phrases). They should contain similar motifs but resolve differently.
  • The 1st and 2nd counter-subjects must be similar to the subject in style, possibly with more similarity, but they must be sufficiently different from the subject, the answer, and each other to be clearly distinguishable (at least distinguishable by thoroughly analysing the score).
  • The episodes must have similar style to the components, and should have some motif(s) in common with one or more components, but should be a gradual modulation (i.e. it should be difficult to determine a single exact point at which the key changes). Various special patterns (e.g. canon) can come into play here (dependent on the composer’s style).
  • Components should rarely (if ever) enter at exactly the same time. Likewise, “overlap” of components, including over section boundaries, can also be desirable, but this must not ruin any modulation between sections.
  • Components don’t need to be exactly the same every time that they occur, but they should be mostly consistent in the placement within the bar, and must be recognizable as being the same component.
  • There are several specific preferences towards different variations of the fugue’s form depending on the subject, answer, and counter-subjects.
  • The piece must fit the composer’s style, but hopefully be sufficiently different from existing compositions that as few sections as possible are easily confused with sections in the existing compositions.

Although many of these are somewhat subjective (no pun intended), they can be roughly defined in terms of optimization objective functions.
The full objective function may be very complicated when expressed in terms of the individual notes, but since there is a lot of structure to the problem, creating a good composition is not necessarily an impossible task, just very difficult.

5 Conclusion

It does appear to be possible to express music composition in terms of constraint satisfaction and optimization. The problems thereof are likely very difficult to solve. More research is definitely required before implementing such a program, but given collaboration with music theorists, this should be feasible.