The evaluation and optimisation of multiresolution FFT parameters for use in automatic music transcription algorithms

Automatic music transcription (AMT)

AMT Algorithms

Time & Frequency Resolution
- Short window: time resolution increases, frequency resolution decreases.
- Long window: time resolution decreases, frequency resolution increases.

Multiresolution FFT (MRFFT)
- High time resolution
- High frequency resolution
[Figure: four FFTs (FFT A, FFT B, FFT C, FFT D), one per sub-band, with cut-off frequencies FcA, FcB, FcC and FcD.]

Time-Frequency Plane: Dressler
[Figure: time-frequency plane tiled with data lengths of 256, 512, 1024 and 2048 samples. Axes: frequency against time (window length).]

Bin Alignment
Note-bin alignment: the position of a fundamental frequency relative to an FFT bin frequency.
[Figure: Note bin alignment. A 2048 FFT decomposition of a 376.83 Hz sine wave. x-axis: FFT bin (Hz), 215.33 to 495.26; y-axis: FFT bin magnitude, 0 to 250.]
[Figure: Note bin alignment. A 2048 FFT decomposition of a 366.06 Hz sine wave. x-axis: FFT bin (Hz), 215.33 to 495.26; y-axis: FFT bin magnitude, 0 to 600.]
366.06 Hz coincides with a bin frequency, whereas 376.83 Hz lies midway between the 366.06 Hz and 387.60 Hz bins.

MRFFT Optimisation
- Cut-off frequencies
- Sub-band FFT length
- Optimised on three characteristics determined by window length:
  - Time resolution
  - Frequency resolution
  - Note-bin alignment

Scoring
- Calculate a score for time, frequency and note-bin alignment in each sub-band.
- Weight the score according to the number of notes in the sub-band.
- Range-correct the score to lie between 0 and 1.
- Sum the scores across all bands to generate the MRFFT score.

Note Bin Scoring
Weighted sub-band FFT bin score = sub-band FFT bin score * (notes in sub-band / total notes across all bands)
[Figure: Error Scores. Error score on a 0 to 1 scale plotted between the Bin A value and the Bin B value, with the halfway point marked.]
- If two note frequencies fall within the same bin, the FFT length is discounted as unsuitable.

Scoring Process
The algorithm moves the cut-off frequencies A, B and C through all combinations of positions.
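The note-bin rules described above (alignment of a fundamental with a bin, the same-bin discount, and the note-count weighting) can be illustrated with a minimal sketch. The 44.1 kHz sample rate, the function names, and the choice to measure alignment as a fraction of the bin width are assumptions made for illustration; the slides do not give the exact error function used.

```python
FS = 44100.0  # assumed sample rate in Hz; the slides do not state it


def bin_position(freq_hz, n_fft, fs=FS):
    """Frequency expressed in units of FFT bins (may be fractional)."""
    return freq_hz * n_fft / fs


def alignment_error(freq_hz, n_fft, fs=FS):
    """Distance from the nearest bin centre, as a fraction of the bin width.

    0.0 means the note sits exactly on a bin; 0.5 means it sits halfway
    between two bins. This is a hypothetical measure, not the authors' one.
    """
    pos = bin_position(freq_hz, n_fft, fs)
    return abs(pos - round(pos))


def has_bin_collision(note_freqs, n_fft, fs=FS):
    """True if two notes share the same nearest bin (FFT length unsuitable)."""
    bins = [round(bin_position(f, n_fft, fs)) for f in note_freqs]
    return len(set(bins)) < len(bins)


def weighted_bin_score(band_score, notes_in_band, total_notes):
    """Weight a sub-band score by its share of the notes, as in the slides."""
    return band_score * notes_in_band / total_notes


if __name__ == "__main__":
    for freq in (366.06, 376.83):
        print(freq, round(alignment_error(freq, 2048), 4))
    print(has_bin_collision([98.00, 103.83], 256))
    print(has_bin_collision([98.00, 103.83], 8192))
```

Under these assumptions the 366.06 Hz tone has an alignment error close to 0 and the 376.83 Hz tone close to 0.5, which matches the two sine-wave figures; a 256-point FFT places 98.00 Hz and 103.83 Hz in the same bin and would be discounted.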
For each position of the cut-off frequencies, all FFT lengths between 256 and 8192 samples, in increments of 128, are evaluated on each sub-band. All combinations of FFT lengths on all combinations of sub-bands are evaluated and scored.
[Figure: Sub-bands A, B, C and D along the frequency axis from 80 Hz to 5 kHz, with cut-off frequencies FcA, FcB, FcC and FcD.]

Solutions
Solution  Description                                 FFT length range
1         4 band MRFFT                                256-8192
2         3 band MRFFT                                256-8192
3         Dressler 4 band MRFFT                       256-2048
4         Dressler fixed FFT length, variable bands   256-2048
5         4 band MRFFT                                256-2048
6         1 band FFT                                  8192

Results: Subband Divisions Determined by MRFFT Scoring System
The frequency axis lists note frequencies in semitone steps from 98.00 Hz to 4978.09 Hz.

Karin Dressler MRFFT 4 Band Parameters
Band  FFT length  Cut-off  Frequency (Hz)
A     2048        FcA      493.89
B     1024        FcB      1244.52
C     512         FcC      2637.05
D     256         FcD      4978.09

[Figure: sub-band divisions (Bands A to D) for Solution 1, Solution 2, Solution 4 and Solution 5 against the same frequency axis. Labels in extraction order: 6016 FFT FcA, 6016 FFT FcA, 3328 FFT FcB, Not Analysed, Not Analysed, 1664 FFT FcA, 2048 FFT FcA, 1408 FFT FcB, 2688 FFT FcC, 1792 FFT FcC, 1024 FFT FcB, 896 FFT FcC, 1408 FFT FcD, 1408 FFT FcC, 512 FFT FcC, FcD, 256 FFT, 768 FFT FcD.]

Results: MRFFT Score
[Figure: MRFFT scores.]

Transcription Test: Low F Bands
[Figure: transcriptions for Original, Solution 6 and Solution 1, with FcA and FcB marked.]
- The high frequency resolution of Solution 6 is reflected in its low-frequency transcription accuracy.

Transcription Test: High F Bands
[Figure: transcriptions for Solution 1, Solution 3 and Solution 6.]

F-Measure Results
Recall refers to the fraction of the relevant notes that were retrieved, i.e. how many of the correct notes the system extracted.
Precision refers to the fraction of relevant notes retrieved, relative to the total number retrieved, i.e.
how many of the extracted notes were correct.
F-Measure is the weighted harmonic mean of precision and recall.
[Figure: recall, precision and F-measure scores (0.000 to 1.000) for Solutions 1 to 6.]

Peak Picker
A threshold is set dynamically for each analysis window of the STFT as a percentage of the maximum magnitude within the window, subject to a heuristically chosen minimum threshold. If a bin magnitude exceeds the threshold, a note is transcribed at that point.

Peak Picker Robustness
[Figure: Solution 1 vs Solution 6, peak picker at a 25% threshold.]

MRFFT Implementation
A 6016 FFT is performed on the entire frequency spectrum. The spectral information is then filtered to include only the frequencies required by that band.
[Figure: 6016 FFT spectrum with FcA marked; part of the spectrum is labelled "Not Used in MRFFT".]
An F note frequency (orange magnitude) that is not in the frequency band considered generates cross-channel interference (red magnitudes) that contributes to the magnitudes in the sub-band of interest.

Cross talk indicators
Solution  Average power  Average power     Average power in non-note bins
          in note bins   in non-note bins  as % of power in note bins
1         538.79         23.4338906        4.35
2         594.48         32.7189717        5.50
3         157.7          29.0526256        18.42
4         216.75         45.368934         20.93
5         208.73         35.0150222        16.78
6         1527           27.4607458        1.80

Adjacent bins
Note frequency (Hz)  Bin frequency (Hz)
98.00                96.90
103.83               102.28
110.00               107.67
-                    113.05
116.54               118.43
123.47               123.82
130.81               129.20
-                    134.58
138.59               139.97
146.83               145.35
-                    150.73
155.57               156.12
-                    161.50
164.82               166.88
174.62               172.27
-                    177.65
185.00               183.03
-                    188.42
196.00               193.80
-                    199.18
-                    204.57
207.65               209.95
-                    215.33
220.00               220.72
-                    226.10
233.08               231.48
-                    236.87
-                    242.25
246.94               247.63
-                    253.02
-                    258.40
261.63               263.78
-                    269.17
-                    274.55

Adjacent bins in the optimised MRFFT represent fundamental frequencies. Any cross-channel interference will therefore contribute to the energy contained in FFT bins that represent note frequencies.
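The adjacency observation above can be checked numerically. The sketch below maps each note to its nearest FFT bin and counts how many consecutive notes land in directly neighbouring bins. The 44.1 kHz sample rate, the nearest-bin assignment and all function names are assumptions for illustration; the slides do not say how notes were assigned to bins.

```python
FS = 44100.0  # assumed sample rate in Hz; the slides do not state it


def nearest_bin(freq_hz, n_fft, fs=FS):
    """Index of the FFT bin whose centre frequency is closest to freq_hz."""
    return round(freq_hz * n_fft / fs)


def note_bin_table(note_freqs, n_fft, fs=FS):
    """Pair each note with the centre frequency of its nearest bin."""
    return [(f, nearest_bin(f, n_fft, fs) * fs / n_fft) for f in note_freqs]


def adjacent_note_pairs(note_freqs, n_fft, fs=FS):
    """Count consecutive notes whose nearest bins are direct neighbours."""
    bins = [nearest_bin(f, n_fft, fs) for f in sorted(note_freqs)]
    return sum(1 for a, b in zip(bins, bins[1:]) if b - a == 1)


# Lowest nine note frequencies from the slides (Hz).
LOW_NOTES = [98.00, 103.83, 110.00, 116.54, 123.47,
             130.81, 138.59, 146.83, 155.57]

if __name__ == "__main__":
    for n_fft in (8192, 6016):
        pairs = adjacent_note_pairs(LOW_NOTES, n_fft)
        print(n_fft, pairs, "of", len(LOW_NOTES) - 1, "pairs adjacent")
```

Under these assumptions, all eight consecutive pairs fall in neighbouring bins at 6016 points, against five of eight at 8192 points, so leakage from one note is more likely to land in a bin that represents another note.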
Note frequency (Hz)  Bin frequency (Hz)
98.00                95.30
103.83               102.63
110.00               109.96
116.54               117.29
123.47               124.62
130.82               131.95
138.59               139.28
146.84               146.61
155.57               153.94
-                    161.27
164.82               168.60
174.62               172.27
185.00               185.52
196.00               198.77
207.66               212.02
220.01               225.27
233.09               238.52
246.95               251.77
261.63               265.02

This may contribute to false positives.

F-Measure Conclusions
The F-Measure results are largely disappointing. They can be attributed to the inability of the implemented peak picker to handle fluctuations in the magnitude of local maxima. Characteristics of the MRFFT, such as adjacent bins that each represent a note, and interference generated by the sub-band division method, contribute to this problem. Large variations in spectral magnitude also contribute.

Conclusions
The theoretical scoring of MRFFT parameters favoured the optimised FFT. The 'real world' sinusoidal extraction test initially gave disappointing F-Measure results for the MRFFT solutions compared with the single-band 8192 FFT. Closer analysis of the transcribed files, however, showed a positive aspect of the MRFFT analysis: performance improved at the higher frequencies. Further investigation of the results revealed inadequacies in the implemented peak picker and indicated issues with the construction of the MRFFT that require further investigation.
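Because the conclusions hinge on the peak picker and the F-Measure, a minimal sketch of both may help make them concrete. It follows the description in the slides (a per-window threshold set as a percentage of the window maximum, with a minimum floor) and uses the balanced F1 form of the F-Measure. The function names, the default floor of 1.0 and the use of equal weighting are assumptions, not the authors' implementation.

```python
def pick_peaks(frame_mags, ratio=0.25, min_threshold=1.0):
    """Return indices of bins whose magnitude exceeds the window threshold.

    The threshold is `ratio` times the largest magnitude in this analysis
    window, but never lower than `min_threshold` (a hypothetical floor).
    """
    if not frame_mags:
        return []
    threshold = max(ratio * max(frame_mags), min_threshold)
    return [i for i, mag in enumerate(frame_mags) if mag > threshold]


def precision_recall_f(extracted, reference):
    """Precision, recall and balanced F-measure for two collections of notes."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    frame = [1.0, 10.0, 2.0, 40.0, 9.0]
    print(pick_peaks(frame))  # threshold is 25% of 40 = 10
    print(precision_recall_f([3, 7], [3, 5, 9, 11]))
```

A relative threshold of this kind is sensitive to large differences between peak magnitudes within a window: a strong note raises the threshold and can hide weaker ones, which is consistent with the robustness problem the slides report.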