March 2003

Music Analyser
MSc Project by Jean-Marie FROUX
Preliminary report
Project supervised by DJ Styles

Table of contents:
Introduction
1/ The software
2/ The problems to be solved
   a) How to capture the sound?
   b) How to find the note?
   c) How to save the work in the different formats, and in particular in MIDI?
   d) Other problems
3/ Planning
Conclusion
References
Annexes
   Annex 1: The table of frequencies
   Annex 2: A MIDI file example
MSc Project By JM FROUX Music analyser City University

Introduction:

Computers and music have worked well together for a long time, and the MIDI standard has established itself in computer-assisted music, in particular as an aid to score writing. However, a musician who wishes to benefit from it must own a MIDI instrument, which in practice almost always means a keyboard, and many musicians who are not pianists would like to enjoy note recognition without having to invest in such equipment. This is why it is interesting to consider recognising musical notes with only a microphone and a sound card as the interface to the instrument. That is the goal of this project: to create a piece of software, written in C#, that recognises notes played on a flute. In this preliminary report we will look at what exactly the software will do, take a first pass over the problems to be solved, and sketch a schedule.

1/ The software.

The first stage of the software will be to tune the instrument so that the played notes are at the correct frequencies. The user will then play, and the software will write the notes onto a score in real time. It will be possible to change some parameters of the spectrum analysis, such as the sampling frequency. Once that is done, the user will be able to listen both to what has been played and to what has been written. The next stage will be editing the score: correcting any wrong note, or adding a new note or rest, a title or other information, a key signature, bar lines and so on. Finally, the user will be able to print the work, save it as an image file (such as JPEG) or as a MIDI file, and save the recorded sound as a wave file.

2/ The problems to be solved.

a) How to capture the sound?

I will use DirectX®, which is designed with hardware acceleration in mind.
It aims to provide low-level access to the hardware while still remaining a generic interface. DirectSound® and DirectMusic® are separate components of DirectX® with some overlapping functionality. Both play WAV sounds, and DirectMusic® ultimately synthesizes all sounds into waveforms that are played through DirectSound buffers. We can use the DirectSound API independently to play WAV sounds, even in applications that use DirectMusic to play other content, and we can also use DirectSound to manipulate sound buffers that are managed by DirectMusic. The following table summarizes the functionality offered by the two APIs (DirectMusic / DirectSound):

- Play WAV sounds: Yes / Yes
- Play MIDI: Yes / No
- Play DirectMusic Producer segments: Yes / No
- Load content files and manage objects: Yes / No, but some support in sample code
- Control musical parameters at run time: Yes / No
- Manage timeline for cuing sounds: Yes / No
- Use downloadable sounds (DLS): Yes / No
- Set volume, pitch, and pan of individual sounds: Yes, through DirectSound API / Yes
- Set volume on multiple sounds (audiopaths): Yes / No
- Apply effects (DMOs): Yes, through DirectMusic Producer content or DirectSound API / Yes
- Chain buffers for mix-in (send) effects: Yes, through DirectMusic Producer content / No
- Capture WAV sounds: No / Yes
- Implement full duplex: No / Yes
- Capture MIDI: Yes / No

So by using the Microsoft® DirectMusic® and Microsoft® DirectSound® interfaces in my application, I can capture wave sound from a microphone or another input and analyse it in real time. This works as follows. First, I create a capture buffer for waveform audio by calling the IDirectSoundCapture8::CreateCaptureBuffer method. Then I start the buffer by calling the IDirectSoundCaptureBuffer8::Start method; the buffer will keep running continuously rather than stopping when it reaches the end.
The program will wait until the desired amount of data is available, using notifications set up with the IDirectSoundNotify8::SetNotificationPositions method. When sufficient data is available, the program locks a portion of the capture buffer by calling the IDirectSoundCaptureBuffer8::Lock method, passing as parameters the offset and size of the block of memory it wants to read. It then copies the data out of the buffer, using the addresses and block sizes returned by Lock, and unlocks the buffer with the IDirectSoundCaptureBuffer8::Unlock method. The frequency analysis takes place at this stage. All of this is repeated until the user stops the recording, at which point the program calls the IDirectSoundCaptureBuffer8::Stop method.

b) How to find the note?

We can find the note by making a spectral analysis of the signal with an FFT. A frequency spectrum is an ordered graph of the magnitudes of the sinusoidal components of an acoustic vibration as a function of frequency. As an indication, the ear is sensitive to sounds whose frequency lies between 20 Hz (low register) and 20 kHz (high register), that is about 10 octaves, and whose level is above 30 dB. A flute, however, can only play notes between 130.81 Hz and 1046.50 Hz (see the table of frequencies in Annex 1), which limits our analysis. Moving from one octave to the one below divides the frequency by 2, and the scale divides the octave into 12 intervals of 2^(1/12) each (about 1.0594). This gives a system for recognising the notes: the reference note is the "A3" at 440 Hz, and each note is deduced from the preceding one by multiplication by T = 2^(1/12). In fact it is more complicated than that: the emitted note is made up of several components:
- the base component, called the fundamental, which is the first term of the series and whose frequency characterizes the note.
It is in the majority of cases the component with the largest magnitude, but sometimes it is not, which can cause problems in recognising the octave.
- the harmonics, whose frequencies are integer multiples of the fundamental. The number and intensity of the different harmonics determine the timbre of the instrument. And as if that were not enough, for the same instrument the set of harmonics changes with the pitch of the sound.

[Figure: theoretical spectrum of the reference note "A3" (440 Hz), with peaks at 440, 880, 1320 and 1760 Hz.]

Thus the main problem will be to find which peak corresponds to the note. The problem consists in selecting from the spectrum the note having the most power, knowing that a note corresponds to one fundamental frequency plus harmonics at whole multiples of that fundamental. The difficulty is identifying which harmonics in the spectrum are attached to which fundamental. Above all, harmonics must not be mistaken for notes, because they have never actually been played. First, we have to clean the spectrum of all the very small peaks due to noise. It is then necessary to associate fundamentals with harmonics: for each frequency, we look for a possible fundamental in the cleaned spectrum. We can build a list of all the possible fundamentals, with, for each one, the list of its associated harmonics. To select the right fundamental, we can look for the note with the most harmonics, or the note with the most power, or even use both criteria at once by weighting their relative importance. Another method could be the following: we walk through the table containing the peaks. For index i of the table, we look for the frequencies j which, multiplied by 1, 2, 3, 4, 5 or 6, can give the frequency i; i would then be a harmonic of those frequencies.
Whenever a peak j could generate i, we add the amplitude of i to that of j and store the result in slot j of a results table. In this way the energies of the harmonics of a note are accumulated on its fundamental, whereas isolated harmonics are not reinforced. We can then pick out the note with the most energy, which should be the fundamental.

c) How to save the work in the different formats, and in particular in MIDI?

A MIDI file is made up of a number of 'chunks' of data. The first of these is the header chunk, which contains information such as the format type, the number of tracks and the timing resolution. This is followed by one or more track chunks, each of which contains the information for one complete track of MIDI data. All chunks, regardless of type, have the same basic components: a four-byte indicator signifying the chunk type, another four-byte word containing the length in bytes of the data that follows, and lastly the data itself. See Annex 2 for an example.

Header chunk

The header chunk type is always the same: the letters 'M', 'T', 'h' and 'd' in ASCII. Following this, four more bytes indicate the length of the header chunk data. The six bytes of header chunk data are grouped into three sixteen-bit words representing, in order:
- the file format number;
- the number of tracks;
- the timing resolution. This is the time increment unit used in the track data to represent MIDI event durations, and it can be one of two types. If the MSB (most significant bit, i.e. the leftmost bit) is set to 0, the unit is 'ticks' per quarter note (or crotchet), the actual number being expressed in the remaining 15 bits of the word. If the MSB is set to 1, however, the unit is 'ticks' per time code frame.

So the header chunk of a four-track Format 1 MIDI file with a timing resolution of 1024 (&0400) ticks per crotchet will appear as follows:

&4d &54 &68 &64 &00 &00 &00 &06 &00 &01 &00 &04 &04 &00
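The header chunk described above can also be assembled programmatically. As a sketch (Python here for brevity, since the project itself will be written in C#; the function name is illustrative), using the standard struct module:

```python
import struct

def midi_header(file_format, num_tracks, ticks_per_crotchet):
    """Build the 14-byte MThd chunk: the 4-byte type, a 4-byte length
    (always 6), then three big-endian 16-bit words giving the format
    number, the track count and the timing resolution."""
    # '>' = big-endian byte order, as required by the MIDI file format
    return struct.pack(">4sIHHH", b"MThd", 6,
                       file_format, num_tracks, ticks_per_crotchet)

# Four-track Format 1 file, 1024 (&0400) ticks per crotchet:
header = midi_header(1, 4, 1024)
# -> bytes 4d 54 68 64  00 00 00 06  00 01  00 04  04 00
```

The same `struct.pack` pattern extends naturally to the 'MTrk' track chunks, whose layout is described below.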
Track chunk

A track chunk contains a complete description of one sequencer track. Like the header chunk, it consists of a type ('MTrk' in ASCII), a data length indicator, and the data itself. The data consists of a series of track events. A track event can be anything that can be sent down a MIDI cable: NOTE ON, NOTE OFF, PROGRAM CHANGE and so on. Alternatively, it can represent what is known as a meta-event, that is to say, information about the piece of music: the key or time signature, the name of the track, or tempo information. A meta-event always starts with a three-byte header, the first byte of which is &FF. The second byte indicates the meta-event type, and the third indicates the number of bytes of data that follow. A C major key signature meta-event, for example, would have the following format:

&ff &59 &02 &00 &00

Track events, whether they are MIDI messages or meta-events, are prefixed with a number of bytes representing the delta time. This is the amount of time, in ticks, which has elapsed since the last event, and it is used to denote note and rest durations.

d) Other problems.

There are some other, less important, problems for this project: how to find the duration of a note, and how to write the score.

3/ Planning.

The project will begin on the 19th of May. I expect to be able to capture sound by the start of June and to detect a note by the start of July. After that I think I will have to run a lot of tests to improve the detection. If I still have plenty of time at that stage, I will try to apply my program to other instruments such as the guitar or the piano. I want the project itself to be finished by mid-August, and then to write the report so as to be able to present it at the beginning of September, because I then want to continue my studies at a French engineering school.

Conclusion.
This project is multidisciplinary: it needs some knowledge of programming, digital signal processing, mathematics, the physics of sound and so on. After writing this report, the project and its realization are now clearer in my mind. I am certain that this work will be enthralling.

References:
http://villemin.gerard.free.fr/CultureG/MusNote.htm
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/directx/htm/gettingstartedwithdirectsound.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/directx/htm/directsoundanddirectmusic.asp
http://chv.chez.tiscali.fr/jm/musique/
http://courses.ece.uiuc.edu/ece291/books/labmanual/io-devices-speaker.html
http://www.lgu.ac.uk/~seago/SMF.html
http://sunlightd.virtualave.net/Windows/DirectX.NET/

Annexes:

Annex 1: The table of frequencies.

The following table lists the frequencies of the three octaves which a flute can play:

Note        Frequency (Hz)
C           130.81
C#          138.59
D           146.83
D#          155.56
E           164.81
F           174.61
F#          185.00
G           196.00
G#          207.65
A           220.00
A#          233.08
B           246.94
Middle C    261.63
C#          277.18
D           293.66
D#          311.13
E           329.63
F           349.23
F#          369.99
G           392.00
G#          415.30
A           440.00
A#          466.16
B           493.88
C           523.25
C#          554.37
D           587.33
D#          622.25
E           659.26
F           698.46
F#          739.99
G           783.99
G#          830.61
A           880.00
A#          932.33
B           987.77
C           1046.50

Annex 2: A MIDI file example.

The following Format 0 file encodes a short example melody. (Format 0 is the simplest, and is used for recording a single multichannel track of MIDI data, together with tempo information.)
Delta time   Data                              Interpretation
             &4d &54 &68 &64                   Header chunk
             &00 &00 &00 &06                   Six data bytes
             &00 &00                           Format 0
             &00 &01                           One track
             &04 &00                           1024 ticks/crotchet
             &4d &54 &72 &6b                   Track chunk
             &00 &00 &00 &59                   89 data bytes
&00          &ff &58 &04 &04 &02 &18 &08       Time signature (4/4)
&00          &ff &59 &02 &00 &00               Key signature (C)
&00          &ff &51 &03 &09 &89 &68           Tempo (crotchet = 96)
&00          &90 &48 &40                       C (NOTE ON)
&84 &00      &80 &48 &00                       NOTE OFF
&00          &90 &43 &40                       G
&82 &00      &80 &43 &00
&00          &90 &43 &40                       G
&82 &00      &80 &43 &00
&00          &90 &44 &40                       A flat
&84 &00      &80 &44 &00
&00          &90 &43 &40                       G
&84 &00      &80 &43 &00
&84 &00      &90 &47 &40                       B
&84 &00      &80 &47 &00
&00          &90 &48 &40                       C
&88 &00      &80 &48 &00
&01          &ff &2f &00                       End of track
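To close the annexes, the harmonic-summing detection method described in section 2b can be sketched as follows. This is Python for brevity (the project itself will be in C#), all names are illustrative, and the real program would run a proper FFT over the captured buffer rather than this naive single-frequency DFT; the sketch only shows the idea of crediting each candidate fundamental with the energy of its harmonics.

```python
import math

SAMPLE_RATE = 8192  # Hz; an illustrative value, not the project's final choice

# A few candidate fundamentals from the flute range (see Annex 1).
NOTE_FREQS = {"C": 261.63, "E": 329.63, "G": 392.00, "A": 440.00}

def magnitude_at(samples, freq, rate):
    """Naive single-bin DFT: magnitude of the component at `freq` Hz."""
    re = sum(s * math.cos(2 * math.pi * freq * i / rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / rate)
             for i, s in enumerate(samples))
    return math.hypot(re, im) / len(samples)

def detect_note(samples, rate, harmonics=6):
    """Harmonic summing: credit each candidate fundamental with the energy
    found at its integer multiples, then keep the best-scoring note."""
    scores = {}
    for name, f in NOTE_FREQS.items():
        scores[name] = sum(magnitude_at(samples, k * f, rate)
                           for k in range(1, harmonics + 1))
    return max(scores, key=scores.get)

# Synthetic "A" (440 Hz) whose 2nd harmonic is stronger than the fundamental,
# which is exactly the awkward case mentioned in section 2b:
tone = [0.4 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)
        + 0.8 * math.sin(2 * math.pi * 880 * i / SAMPLE_RATE)
        + 0.3 * math.sin(2 * math.pi * 1320 * i / SAMPLE_RATE)
        for i in range(SAMPLE_RATE)]  # one second of samples
```

Even though the 880 Hz harmonic dominates the spectrum, summing the energy at 440, 880, 1320 Hz and so on onto the 440 Hz candidate makes "A" the winner, rather than the harmonic at 880 Hz being mistaken for a played note.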