The optimisation of multiresolution FFT Parameters

The evaluation and optimisation of
multiresolution FFT Parameters
For use in automatic music transcription algorithms
Automatic music transcription (AMT)
AMT Algorithms
Time & Frequency Resolution
Short Window
Time Resolution Increases
Frequency Resolution Decreases
Long Window
Time Resolution Decreases
Frequency Resolution Increases
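The trade-off can be made concrete with a short calculation. The sketch below assumes a 44.1 kHz sample rate, which is not stated on the slides but agrees with the bin frequencies they quote; the function name is illustrative only.

```python
SAMPLE_RATE = 44100  # Hz; assumed, not stated in the slides

def resolutions(window_length, sample_rate=SAMPLE_RATE):
    """Return (window duration in ms, bin spacing in Hz) for one FFT window."""
    duration_ms = 1000.0 * window_length / sample_rate
    bin_spacing_hz = sample_rate / window_length
    return duration_ms, bin_spacing_hz

for n in (256, 512, 1024, 2048, 8192):
    ms, hz = resolutions(n)
    print(f"{n:5d} samples: {ms:7.2f} ms window, {hz:7.2f} Hz per bin")
```

Doubling the window length doubles the time span covered by each analysis frame and halves the spacing between FFT bins.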
Multiresolution FFT (MRFFT)
[Figure: sub-band FFTs A to D with cut-off frequencies FcA to FcD, ranging from high time resolution to high frequency resolution.]
Time Freq Plane - Dressler
[Figure: time-frequency plane tiled with data lengths of 256, 512, 1024 and 2048 samples. Axes: Time, Frequency.]
Window Length - Bin Alignment
 Note-bin alignment: the position of a fundamental
frequency relative to an FFT bin frequency.
Note bin alignment
A 2048 FFT Decomposition of a 376.83 Hz Sine Wave
[Figure: FFT bin magnitude (0 to 250) against FFT bin frequency, 215.33 Hz to 495.26 Hz in steps of about 21.53 Hz.]
Note bin alignment
A 2048 FFT Decomposition of a 366.06 Hz Sine Wave
[Figure: FFT bin magnitude (0 to 600) against FFT bin frequency, 215.33 Hz to 495.26 Hz in steps of about 21.53 Hz.]
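The effect shown in the two plots can be reproduced with a minimal sketch. It assumes a 44.1 kHz sample rate, a unit-amplitude sine and a rectangular window, none of which are stated on the slides, so the absolute magnitudes are not expected to match the figures; only the drop in peak height for the misaligned tone is the point.

```python
import numpy as np

SAMPLE_RATE = 44100  # Hz; assumed
N_FFT = 2048

def peak_magnitude(freq_hz, n_fft=N_FFT, sample_rate=SAMPLE_RATE):
    """Return (bin index, magnitude) of the largest FFT bin for a unit sine."""
    t = np.arange(n_fft) / sample_rate
    spectrum = np.abs(np.fft.rfft(np.sin(2 * np.pi * freq_hz * t)))
    k = int(np.argmax(spectrum))
    return k, float(spectrum[k])

bin_spacing = SAMPLE_RATE / N_FFT
aligned = peak_magnitude(17 * bin_spacing)       # about 366.06 Hz, centre of bin 17
misaligned = peak_magnitude(17.5 * bin_spacing)  # about 376.83 Hz, between bins 17 and 18
print(aligned, misaligned)
```

When the tone sits half-way between two bins, its energy is split across both neighbours and leaks further along the spectrum, so the tallest bin is markedly lower than in the aligned case.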
MRFFT Optimisation
 Cut-off frequencies
 Subband FFT length
 Optimised on three characteristics determined by
window length:
 Time Resolution
 Frequency Resolution
 Note Bin Alignment
Scoring
 Calculate a score for time resolution, frequency resolution and
note-bin alignment in each subband
 Weight each score according to the number of notes in the subband
 Normalise each score to the range 0 to 1
 Sum the scores across all bands to generate the MRFFT score
Note Bin Scoring
Weighted sub-band FFT bin score =
sub-band FFT bin score × (notes in sub-band / total notes across all bands)
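The weighting and the final summation can be sketched as follows. The function names, the example scores and the note counts are hypothetical; only the weighting formula itself comes from the slide.

```python
def weighted_subband_score(subband_score, notes_in_subband, total_notes):
    """Weight a sub-band score by the share of all notes that fall in that sub-band."""
    return subband_score * (notes_in_subband / total_notes)

def mrfft_score(subbands):
    """Sum the weighted scores; `subbands` is a list of (score, note_count) pairs."""
    total_notes = sum(count for _, count in subbands)
    return sum(weighted_subband_score(s, c, total_notes) for s, c in subbands)

# Hypothetical sub-band scores (already in the range 0 to 1) and note counts.
example = [(0.8, 10), (0.6, 30), (0.9, 20), (0.5, 9)]
print(round(mrfft_score(example), 3))
```

Because the weights sum to 1, the combined score stays between 0 and 1 whenever each sub-band score does.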
Error Scores
[Figure: error score scale (0, 0.5, 1) for a note lying between Bin A and Bin B, with the half-way point between the two bin values marked.]
If two note frequencies fall within the same bin, the FFT length is discarded as unsuitable.
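A minimal sketch of both checks is given below. It assumes a 44.1 kHz sample rate, treats "the same bin" as "the same nearest bin", and assumes the error is 0 on a bin centre and 1 at the half-way point; the slides do not confirm any of these details, and all function names are illustrative.

```python
SAMPLE_RATE = 44100  # Hz; assumed

def nearest_bin(freq_hz, n_fft, sample_rate=SAMPLE_RATE):
    """Index of the FFT bin whose centre frequency is closest to freq_hz."""
    return round(freq_hz * n_fft / sample_rate)

def alignment_error(freq_hz, n_fft, sample_rate=SAMPLE_RATE):
    """0 when the note sits on a bin centre, 1 at the half-way point between bins."""
    position = freq_hz * n_fft / sample_rate  # note position in units of bins
    return 2.0 * abs(position - round(position))

def fft_length_is_suitable(note_freqs, n_fft, sample_rate=SAMPLE_RATE):
    """Reject the FFT length if any two notes share the same nearest bin."""
    bins = [nearest_bin(f, n_fft, sample_rate) for f in note_freqs]
    return len(set(bins)) == len(bins)

low_notes = [98.00, 103.83, 110.00]
print(fft_length_is_suitable(low_notes, 256), fft_length_is_suitable(low_notes, 8192))
```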
Scoring Process
 The algorithm moves the cut-off frequencies A, B and C through
all combinations of positions. For each position, every FFT length
from 256 to 8192 samples, in increments of 128, is evaluated on
each sub-band. All combinations of FFT lengths on all combinations
of sub-bands are evaluated and scored.
[Figure: frequency axis from 80 Hz to 5 kHz divided into Subbands A, B, C and D by the cut-off frequencies FcA, FcB, FcC and FcD.]
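The size of this search space can be sketched as below. The candidate FFT lengths follow the slide; placing the cut-offs between adjacent equal-tempered notes and the MIDI range used to generate the notes are assumptions made for illustration.

```python
from itertools import combinations
from math import comb

FFT_LENGTHS = list(range(256, 8192 + 1, 128))  # 256, 384, ..., 8192

def note_frequencies(low_midi=43, high_midi=111):
    """Equal-tempered notes; MIDI 43 to 111 spans roughly 98 Hz to 4978 Hz."""
    return [440.0 * 2 ** ((m - 69) / 12) for m in range(low_midi, high_midi + 1)]

def cutoff_positions(notes):
    """Yield (FcA, FcB, FcC) index triples; each index is the first note of the next band."""
    return combinations(range(1, len(notes)), 3)

notes = note_frequencies()
n_cutoffs = comb(len(notes) - 1, 3)        # ways to place three cut-offs
n_length_choices = len(FFT_LENGTHS) ** 4   # one FFT length per sub-band, four sub-bands
print(len(FFT_LENGTHS), n_cutoffs, n_cutoffs * n_length_choices)
```

Under these assumptions a sub-band's score depends only on its own edges and FFT length, so an implementation could cache per-band scores rather than rescore every full combination.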
Solutions
1. 4 band MRFFT: 256-8192 range
2. 3 band MRFFT: 256-8192 range
3. Dressler 4 band MRFFT: 256-2048 range
4. Dressler fixed FFT length, variable bands: 256-2048 range
5. 4 band MRFFT: 256-2048 range
6. 1 band FFT: 8192
Results – Subband Divisions
Determined by MRFFT Scoring System
Frequency (Hz)   Karin Dressler MRFFT 4 Band Parameters
98.00
103.83
110.00
116.54
123.47
130.81
138.59
146.83
155.57
164.82
174.62
185.00
196.00
207.65
220.00
233.08
246.94
261.63
277.19
293.67
311.13
329.63
349.23
370.00
392.00
415.31
440.01
466.17 2048 FFT
493.89 FcA
523.26
554.37
587.34
622.26
659.26
698.46
740.00
784.00
830.62
880.01
932.34
987.78
1046.51
1108.74
1174.67 1024 FFT
1244.52 FcB
1318.53
1396.93
1479.99
1568.00
1661.24
1760.02
1864.68
1975.56
2093.03
2217.49
2349.35
2489.04 512 FFT
2637.05 FcC
2793.86
2959.99
3136.00
3322.48
3520.04
3729.35
3951.11
4186.06
4434.97
4698.69 256 FFT
4978.09 FcD
[Figure: sub-band divisions for Solutions 1, 2, 4 and 5 across Bands A to D. Sub-band labels in reading order: 6016 FFT (FcA); 6016 FFT (FcA); 3328 FFT (FcB); Not Analysed; Not Analysed; 1664 FFT (FcA); 2048 FFT (FcA); 1408 FFT (FcB); 2688 FFT (FcC); 1792 FFT (FcC); 1024 FFT (FcB); 896 FFT (FcC); 1408 FFT (FcD); 1408 FFT (FcC); 512 FFT (FcC); 256 FFT (FcD); 768 FFT (FcD).]
Results – MRFFT Score
Transcription Test – Low F Bands
[Figure panels: Original, Solution 6 and Solution 1, with FcA and FcB marked.]
The high frequency resolution of Solution 6 is reflected in its
low-frequency transcription accuracy.
Transcription Test – High F Bands
[Figure panels: Solution 1, Solution 3 and Solution 6.]
F-Measure Results
Recall is the fraction of the relevant notes that were retrieved,
i.e. how many of the correct notes the system extracted.
Precision is the fraction of the retrieved notes that are relevant,
i.e. how many of the extracted notes were correct.
F-Measure is the weighted harmonic mean of precision and recall.
[Figure: Recall, Precision and F-measure scores (0.000 to 1.000) for Solutions 1 to 6.]
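The three measures can be computed as in the sketch below. Representing a note as an (onset frame, MIDI pitch) pair and requiring an exact match are simplifying assumptions; the slides do not say how transcribed notes were matched to the reference.

```python
def precision_recall_f(reference_notes, transcribed_notes, beta=1.0):
    """Return (precision, recall, F-measure) for two collections of notes."""
    reference, transcribed = set(reference_notes), set(transcribed_notes)
    correct = len(reference & transcribed)
    precision = correct / len(transcribed) if transcribed else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # beta = 1 weights precision and recall equally (the usual F1 score).
    f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f

# Hypothetical notes as (onset frame, MIDI pitch) pairs.
reference = [(0, 60), (0, 64), (1, 67), (2, 72)]
transcribed = [(0, 60), (1, 67), (2, 71)]
print(precision_recall_f(reference, transcribed))
```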
Peak Picker
 For each analysis window of the STFT, a threshold is set
dynamically as a percentage of the maximum magnitude within
that window, subject to a heuristically chosen minimum
threshold. If a bin magnitude exceeds the threshold, a
note is transcribed at that point.
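A minimal sketch of such a threshold-based picker follows. The 25% fraction matches the threshold quoted on the robustness slides; the minimum threshold value and the function name are hypothetical.

```python
import numpy as np

def pick_peaks(stft_magnitudes, fraction=0.25, min_threshold=1.0):
    """Boolean mask of bins whose magnitude exceeds the per-window threshold.

    stft_magnitudes: array of shape (n_windows, n_bins).
    """
    mags = np.asarray(stft_magnitudes, dtype=float)
    # Per-window threshold: a fraction of that window's maximum,
    # never allowed to drop below min_threshold.
    thresholds = np.maximum(fraction * mags.max(axis=1, keepdims=True), min_threshold)
    return mags > thresholds

frames = [[0.0, 10.0, 2.0, 100.0],   # loud window: threshold is 25.0
          [0.1, 0.5, 0.2, 0.3]]      # quiet window: minimum threshold applies
print(pick_peaks(frames))
```

Note that every bin above the threshold is marked, not only local maxima, which follows the description above literally.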
Peak Picker Robustness
[Figure: 25% threshold.]
Solution 1 vs Solution 6 Picker
[Figure: two panels, each with a 25% threshold.]
MRFFT Implementation
A 6016 FFT is performed on the entire frequency spectrum. The spectral information is then filtered to
include only the frequencies required by that band.
[Figure: 6016 FFT spectrum along the frequency axis (F), with FcA and a region labelled "Not Used in MRFFT".]
A note frequency (orange magnitude) outside the frequency band considered generates cross-channel
interference (red magnitudes) that contributes to the magnitudes in the sub-band of interest.
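The band-limiting step and the resulting cross-channel interference can be illustrated with the sketch below. The 44.1 kHz sample rate, the rectangular window, the band edges of 80 Hz and 170 Hz and the 1 kHz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def band_limited_spectrum(frame, n_fft, f_low, f_high, sample_rate=44100):
    """FFT the whole frame, then keep only the bins inside [f_low, f_high)."""
    spectrum = np.abs(np.fft.rfft(frame, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    keep = (freqs >= f_low) & (freqs < f_high)
    return freqs[keep], spectrum[keep]

n_fft = 6016
t = np.arange(n_fft) / 44100
out_of_band_tone = np.sin(2 * np.pi * 1000.0 * t)  # lies well above the band
freqs, mags = band_limited_spectrum(out_of_band_tone, n_fft, 80.0, 170.0)
print(freqs.round(2), mags.round(2))
```

Although the tone is outside the band, the retained bins are not zero: leakage from the out-of-band component remains in the sub-band after the filtering step.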
Cross talk indicators
Solution   Average power (magnitude)   Average power in    Average power in non-note bins
           in note bins                non-note bins       as % of power in note bins
1          538.79                      23.4338906          4.35
2          594.48                      32.7189717          5.50
3          157.7                       29.0526256          18.42
4          216.75                      45.368934           20.93
5          208.73                      35.0150222          16.78
6          1527                        27.4607458          1.80
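The percentage column follows directly from the other two, as this short check shows. The values are copied from the cross-talk figures above; the function name is illustrative.

```python
# (solution, average power in note bins, average power in non-note bins)
cross_talk = [
    (1, 538.79, 23.4338906),
    (2, 594.48, 32.7189717),
    (3, 157.7, 29.0526256),
    (4, 216.75, 45.368934),
    (5, 208.73, 35.0150222),
    (6, 1527.0, 27.4607458),
]

def non_note_percentage(note_power, non_note_power):
    """Average non-note-bin power as a percentage of average note-bin power."""
    return 100.0 * non_note_power / note_power

for solution, note_power, non_note_power in cross_talk:
    print(solution, round(non_note_percentage(note_power, non_note_power), 2))
```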
Adjacent bins
Note Frequency   Bin Frequency
98.00            96.90
103.83           102.28
110.00           107.67
                 113.05
116.54           118.43
123.47           123.82
130.81           129.20
                 134.58
138.59           139.97
146.83           145.35
                 150.73
155.57           156.12
                 161.50
164.82           166.88
174.62           172.27
                 177.65
185.00           183.03
                 188.42
196.00           193.80
                 199.18
                 204.57
207.65           209.95
                 215.33
220.00           220.72
                 226.10
233.08           231.48
                 236.87
                 242.25
246.94           247.63
                 253.02
                 258.40
261.63           263.78
                 269.17
                 274.55
 Adjacent bins in the optimised MRFFT
represent fundamental frequencies.
Therefore any cross-channel
interference will contribute to the energy
contained in FFT bins representing note
frequencies.
Note Frequency   Bin Frequency
98.00            95.30
103.83           102.63
110.00           109.96
116.54           117.29
123.47           124.62
130.82           131.95
138.59           139.28
146.84           146.61
155.57           153.94
                 161.27
164.82           168.60
174.62           172.27
185.00           185.52
196.00           198.77
207.66           212.02
220.01           225.27
233.09           238.52
246.95           251.77
261.63           265.02
 This may contribute to false positives.
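How often neighbouring notes land in neighbouring bins can be counted with the sketch below. It assumes a 44.1 kHz sample rate and maps each note to its nearest bin; comparing a 6016-point FFT with an 8192-point FFT is an illustrative choice based on lengths mentioned in the slides, not a confirmed description of the two tables.

```python
SAMPLE_RATE = 44100  # Hz; assumed

def note_bins(note_freqs, n_fft, sample_rate=SAMPLE_RATE):
    """Nearest FFT bin index for each note frequency."""
    return [round(f * n_fft / sample_rate) for f in note_freqs]

def adjacent_note_pairs(note_freqs, n_fft, sample_rate=SAMPLE_RATE):
    """Count consecutive notes whose nearest bins are direct neighbours."""
    bins = note_bins(note_freqs, n_fft, sample_rate)
    return sum(1 for a, b in zip(bins, bins[1:]) if b - a == 1)

notes = [98.00, 103.83, 110.00, 116.54, 123.47, 130.81, 138.59, 146.83, 155.57]
for n_fft in (6016, 8192):
    print(n_fft, note_bins(notes, n_fft), adjacent_note_pairs(notes, n_fft))
```

With the shorter FFT every bin in this range corresponds to a note, so leakage into a neighbouring bin always lands on another note frequency; with the longer FFT some bins between notes are left empty.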
F Measure conclusions
 The F-Measure results are largely disappointing. This can
be attributed to the inability of the implemented peak
picker to handle fluctuations in the magnitude of local
maxima. Characteristics of the MRFFT, such as adjacent bins
that each represent a note and interference generated by the
sub-band division method, contribute to this problem.
 Large variations in spectral magnitude also contribute.
Conclusions
 The theoretical scoring of MRFFT parameters produced
favourable results for the optimised FFT.
 The ‘real world’ sinusoidal extraction test demonstrated
initially disappointing F-Measure results for the MRFFT
solutions compared to the single band 8192 FFT. However,
upon closer analysis of the transcribed files, positive aspects
of the MRFFT analysis were found as performance improved
in the higher frequencies.
 Further investigation of the results revealed inadequacies of
the peak picker implemented and also indicated issues with
the construction of the MRFFT that require further
investigation.