1.130 Final Presentation by Stephen Geiger
What is pitch recognition?
Well, what is pitch? . . .
How HIGH or LOW a sound is
Which note?
Perceived Frequency
Relationship Between Pitch and Frequency
Pitch corresponds to Fundamental Frequency
For Example:
For Middle C:
Frequency = 262 Hz
MATLAB CODE:
fs = 22050;                  % Sampling Frequency.
f = 262;                     % Fundamental Freq of Middle C.
t = 0:1/fs:1;                % Time range of 0 to 1 seconds.
sound(cos(2*pi*f*t)/2, fs);  % Make some noise!
For a Chromatic Scale Starting on A:
A  = 220*2^( 0/12) = 220 Hz
A# = 220*2^( 1/12) = 233 Hz
B  = 220*2^( 2/12) = 247 Hz
C  = 220*2^( 3/12) = 262 Hz
C# = 220*2^( 4/12) = 277 Hz
D  = 220*2^( 5/12) = 294 Hz
D# = 220*2^( 6/12) = 311 Hz
E  = 220*2^( 7/12) = 330 Hz
F  = 220*2^( 8/12) = 349 Hz
F# = 220*2^( 9/12) = 370 Hz
G  = 220*2^(10/12) = 392 Hz
G# = 220*2^(11/12) = 415 Hz
A  = 220*2^(12/12) = 440 Hz
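The table above follows one formula, f = 220*2^(n/12), with n counting semitones above A. A minimal Python sketch of that formula (Python rather than the MATLAB used elsewhere in these slides; the function and variable names are illustrative, not from the presentation):

```python
# Equal-tempered note frequencies, following the slide's formula
# f = 220 * 2^(n/12), where n counts semitones above A (220 Hz).
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def note_frequency(semitones_above_a, base_hz=220.0):
    """Frequency in Hz of the note n semitones above the base A."""
    return base_hz * 2.0 ** (semitones_above_a / 12.0)

def chromatic_scale(base_hz=220.0):
    """Return (name, rounded Hz) pairs for one octave, A up to the next A."""
    return [(NOTE_NAMES[n % 12], round(note_frequency(n, base_hz)))
            for n in range(13)]

for name, hz in chromatic_scale():
    print(f"{name:2s} {hz} Hz")
```

Each step of 12 semitones doubles the frequency, which is why the last row lands on 440 Hz.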
An Octave Up:
For C5:
Frequency = 524 Hz
MATLAB CODE:
fs = 22050;                  % Sampling Frequency.
f = 524;                     % Fundamental Freq of C5.
t = 0:1/fs:1;                % Time range of 0 to 1 seconds.
sound(cos(2*pi*f*t)/2, fs);  % Make some noise!
A Sum with 2 Frequencies:
Frequency = 262 Hz and
Frequency = 524 Hz
MATLAB CODE:
fs = 22050;                  % Sampling Frequency.
f1 = 262;                    % Fundamental Freq of Middle C.
f2 = 524;                    % Fundamental Freq of C5.
t = 0:1/fs:1;                % Time range of 0 to 1 seconds.
sound((cos(2*pi*f1*t) + ...
       0.25*cos(2*pi*f2*t))/2, fs);
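To see that this sum really contains two frequency components, one can measure the amplitude at each frequency directly. The following is a rough Python sketch, not part of the original presentation: it rebuilds the same signal (using exactly fs samples, one fewer than MATLAB's 0:1/fs:1) and evaluates a single-bin DFT at a few frequencies. The helper names are hypothetical.

```python
import cmath
import math

FS = 22050  # sampling frequency, as in the slides

def two_tone(f1=262.0, f2=524.0, seconds=1.0):
    """Middle C plus a quieter C5, scaled as in the MATLAB snippet."""
    n_samples = int(FS * seconds)
    return [(math.cos(2 * math.pi * f1 * n / FS)
             + 0.25 * math.cos(2 * math.pi * f2 * n / FS)) / 2
            for n in range(n_samples)]

def amplitude_at(signal, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    step = -2j * math.pi * freq_hz / FS
    total = sum(x * cmath.exp(step * n) for n, x in enumerate(signal))
    return 2 * abs(total) / len(signal)

signal = two_tone()
for f in (262, 393, 524):
    print(f, "Hz:", round(amplitude_at(signal, f), 3))
```

The component at 524 Hz should come out at one quarter of the component at 262 Hz, and a frequency in between (393 Hz here) should show essentially nothing.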
Frequencies in a Piano: Middle C
(Figure: spectrum; x-axis: Frequency, Hz)
FFT of an Oboe Middle C
(Figure: spectrum; x-axis: Frequency, Hz)
Mono vs. Poly
Monophonic
one note at a time
(e.g. trumpet)
Polyphonic
multiple notes at a time
(e.g. piano, orchestra)
Polyphony creates a problem for pitch recognition
(especially octaves, where the upper note's harmonics coincide with harmonics of the lower note!)
Some Existing Methods
Time Domain – Pitch Period estimation
With wavelets.
With auto-correlation function.
Freq. Domain – Find Fundamental
Auditory Scene Analysis
Blackboard Systems
Neural Networks
Perceptual Models
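As an illustration of the time-domain idea in the list above (pitch period estimation with the auto-correlation function), here is a minimal Python sketch. It is a generic textbook-style version, not the method used in this presentation; the function name, the search range, and the window length are all assumptions.

```python
import math

def autocorr_pitch(signal, fs, fmin=150.0, fmax=400.0, window=2048):
    """Estimate pitch (Hz) as fs / lag, where lag maximizes the autocorrelation.

    Only lags for periods between 1/fmax and 1/fmin are searched, so the
    range must bracket the expected pitch (an assumption of this sketch).
    """
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)
    if len(signal) < window + max_lag:
        raise ValueError("signal too short for the requested lag range")
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        value = sum(signal[n] * signal[n + lag] for n in range(window))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag

fs = 22050
tone = [math.cos(2 * math.pi * 220.0 * n / fs) for n in range(4096)]
print(round(autocorr_pitch(tone, fs), 1), "Hz")
```

Because the lag is a whole number of samples, the estimate is quantized: near 220 Hz at this sampling rate, neighbouring lags differ by about 2 Hz. The narrow search range also sidesteps the usual ambiguity between a period and its multiples, which a practical detector would have to handle.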
What applications are there?
Transcription of Music
Modeling of Musical Instruments
Speech Analysis
Besides, it's an interesting problem
A Novel Wavelet Approach
Based on an observation made by Jeremy Todd, that:
For a piano playing these notes, a CWT could be used to identify a ‘G’ with certain scale/wavelet combinations.
Even with some polyphony!
Finding a G in a C Scale
(Figure panels: original signal, and the CWT at a specific “scale”.)
The Continuous Wavelet Transform
Definition of a CWT:
$$C(a,b) = \int_{-\infty}^{\infty} f(t)\,\frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)\,dt$$
Where:
a = scaling factor
b = shift factor
f(t) = function we start with
\psi(t) = Mother wavelet
What is Scale?
LOW SCALE
Compressed Wavelet
Lots of Detail
High Frequency
HIGH SCALE
Stretched Wavelet
Coarse Features
Low Frequency
(You are here) (And here)
Gaussian 2nd Order Wavelet
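To make the CWT definition and the scale behaviour concrete, the sketch below evaluates a single CWT coefficient as a discrete sum, using a second-derivative-of-Gaussian wavelet. This is a Python illustration under stated assumptions, not the presentation's MATLAB code: the wavelet is unnormalized and sign-flipped into the familiar "Mexican hat" shape, scale and shift are measured in samples, and the wavelet is truncated to a finite support. All names are hypothetical.

```python
import math

def gaussian2_wavelet(t):
    """Second derivative of a Gaussian, sign-flipped ("Mexican hat" shape).

    Unnormalized; library versions may scale or flip it differently.
    """
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt_coefficient(signal, scale, shift, support=5.0):
    """Approximate C(a, b) = sum_n f[n] * psi((n - b) / a) / sqrt(a).

    scale (a) and shift (b) are in samples; the wavelet is truncated to
    |t| <= support, which is an assumption of this sketch.
    """
    half_width = int(support * scale)
    start = max(0, shift - half_width)
    stop = min(len(signal), shift + half_width + 1)
    total = sum(signal[n] * gaussian2_wavelet((n - shift) / scale)
                for n in range(start, stop))
    return total / math.sqrt(scale)

def max_response(signal, scale, shifts):
    """Largest |C(a, b)| over the given shifts."""
    return max(abs(cwt_coefficient(signal, scale, b)) for b in shifts)

# A sinusoid with a 50-sample period responds strongly at the scale whose
# wavelet peak frequency equals the tone's frequency, and barely at all at
# a scale four times larger (a stretched, lower-frequency wavelet).
period = 50
tone = [math.cos(2 * math.pi * n / period) for n in range(1000)]
shifts = range(450, 550)
matched = math.sqrt(2) * period / (2 * math.pi)
print(max_response(tone, matched, shifts) > max_response(tone, 4 * matched, shifts))
```

The comparison at the end is the "low scale = high frequency, high scale = low frequency" picture in miniature: stretching the wavelet moves its sensitivity away from the tone.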
Initial Work
Took an empirical approach.
Ran a number of CWTs at varying scales, and looked at the results.
Picked out a CWT scale for each note in the C scale.
Finding Notes in a C Scale
(Figure panels: original signal, then the CWT at scales 594, 530, 472, 446, 394, 722, 642, and 606.)
Finding Notes w/ Polyphony
(Figure panels: original signal, then the CWT at scales 594, 530, 472, 446, 394, 722, 642, and 606.)
More Complex Polyphony
(Figure panels: original signal, then the CWT at scales 594, 530, 472, 446, 394, 722, 642, and 606.)
Testing with different timbre
(Figure panels: original signal, then the CWT at scales 594, 530, 472, 446, 394, 722, 642, and 606.)
Why does this work?
The scale parameter in the CWT affects frequency response.
However, our “scales” that work don’t seem to follow a clear pattern.
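The first point can be made concrete for one particular wavelet. For psi(t) = (1 - t^2) exp(-t^2/2), the Fourier transform is proportional to w^2 exp(-w^2/2), so the wavelet stretched by scale a acts as a band-pass filter whose gain peaks where a*w = sqrt(2). The Python sketch below encodes just that relationship. It assumes scale is measured in samples and ignores the 1/sqrt(a) factor and any library-specific normalization, so it should not be read as a prediction of which scales the empirical search found.

```python
import math

def wavelet_gain(scale, freq_hz, fs):
    """Relative gain of psi(t/a), psi(t) = (1 - t^2) exp(-t^2 / 2), at freq_hz.

    The Fourier transform of this psi is proportional to w^2 exp(-w^2 / 2),
    so stretching by `scale` samples gives (a*w)^2 exp(-(a*w)^2 / 2).
    """
    aw = scale * 2 * math.pi * freq_hz / fs
    return aw * aw * math.exp(-0.5 * aw * aw)

def peak_frequency(scale, fs):
    """Frequency (Hz) where the gain peaks, i.e. where a*w = sqrt(2)."""
    return math.sqrt(2) * fs / (2 * math.pi * scale)

fs = 22050
for scale in (10, 20, 40):
    print(scale, round(peak_frequency(scale, fs), 1), "Hz")
```

Under these assumptions, doubling the scale halves the peak frequency, which is the smooth pattern one might have expected the working scales to follow.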
Training Algorithm
Again, took an empirical approach.
Ran CWTs at varying scales on sample files, each containing one note.
Picked out scales where: maximum of the CWT for one note >> maximum for the other notes
(and collected the results).
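The selection step above can be sketched in a few lines of Python. This is only one plausible reading of the slide, not the presentation's actual code: it assumes the CWT maxima have already been tabulated per note and per scale, that every note was tested at the same scales, and it replaces the informal ">>" with a hypothetical ratio threshold (min_ratio). The numbers in the example table are made up.

```python
def pick_scales(max_response, min_ratio=10.0):
    """Pick, for each note, the scale that best separates it from the others.

    max_response[note][scale] is the maximum |CWT| seen at that scale for a
    sample file containing only that note. min_ratio stands in for the
    slide's ">>" and is a hypothetical threshold.
    """
    chosen = {}
    for note, by_scale in max_response.items():
        best_scale, best_ratio = None, min_ratio
        for scale, own in by_scale.items():
            # Strongest response any *other* note produces at this scale.
            others = max(resp[scale] for other, resp in max_response.items()
                         if other != note)
            ratio = own / others if others > 0 else float("inf")
            if ratio >= best_ratio:
                best_scale, best_ratio = scale, ratio
        chosen[note] = best_scale  # None means no scale passed the test
    return chosen

# Made-up numbers, for illustration only.
table = {
    "C": {400: 9.0, 500: 0.2, 600: 0.1},
    "D": {400: 0.3, 500: 8.0, 600: 0.2},
    "E": {400: 0.5, 500: 0.5, 600: 0.4},
}
print(pick_scales(table))
```

In this toy table no scale isolates "E" well enough, so it maps to None, the same kind of outcome as a training run that cannot find a usable scale for every pitch.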
Longer C Scale: Trained on 3 Octaves of Notes
A Fragment by Chopin*
*From Right Hand of Prelude in C, Op. 28 No. 1
Training on a ‘Real’ Guitar
Only able to find 5 of 8 pitches for the C scale training case (with a limited attempt).
Results on a test file were not completely accurate.
Expected to be a more difficult case than a piano.
Could merit a more thorough try.
Entire 88 Keys on a Piano
Work in progress.
It takes a long time to run many CWTs on 88 different sound files.
Initial results: able to identify notes 70-88.
Frequency Response Revisited
(Figure: Frequency Response of a 2nd Order Gaussian Wavelet)
Resulting Scales for 22 Piano Notes
(Figure: SCALE, 0 to 2500, vs. NOTE NUMBER, 0 to 22)
Resulting Scales for 8 Sinusoidal Notes
(Figure: SCALE, 0 to 14000, vs. NOTE NUMBER, 0 to 8)
Conclusions
The novel wavelet approach isn’t perfect.
Requiring “training” is a handicap.
Most likely not suited to sources with varying timbre (e.g. guitar, voice).
Some interesting results.
The mechanism of detection could be further investigated and better understood.