Pitch Recognition with Wavelets


1.130 Final Presentation by Stephen Geiger

What is pitch recognition?

Well, what is pitch? . . .

How HIGH or LOW a sound is

Which note?

Perceived Frequency

Relationship Between Pitch and Frequency

Pitch corresponds to the Fundamental Frequency

For Example:

For Middle C:

Frequency = 262 Hz

MATLAB CODE:

fs = 22050;                   % Sampling Frequency.
f = 262;                      % Fundamental Freq of Middle C.
t = 0:1/fs:1;                 % Time range of 0 to 1 seconds.
sound(cos(2*pi*f*t)/2, fs);   % Make some noise!

For a Chromatic Scale Starting on A (220 Hz):

A  = 220*2^( 0/12) = 220 Hz
A# = 220*2^( 1/12) = 233 Hz
B  = 220*2^( 2/12) = 247 Hz
C  = 220*2^( 3/12) = 262 Hz
C# = 220*2^( 4/12) = 277 Hz
D  = 220*2^( 5/12) = 294 Hz
D# = 220*2^( 6/12) = 311 Hz
E  = 220*2^( 7/12) = 330 Hz
F  = 220*2^( 8/12) = 349 Hz
F# = 220*2^( 9/12) = 370 Hz
G  = 220*2^(10/12) = 392 Hz
G# = 220*2^(11/12) = 415 Hz
A  = 220*2^(12/12) = 440 Hz
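The table above follows one rule: each semitone multiplies the frequency by 2^(1/12). As a small sketch (the variable names are illustrative and not from the original slides), the same values can be generated in MATLAB:

```matlab
% Equal-tempered frequencies for the 13 semitone steps from A (220 Hz) to A (440 Hz).
fA = 220;                       % starting frequency in Hz, as on the slide
k = 0:12;                       % number of semitones above the starting A
names = {'A','A#','B','C','C#','D','D#','E','F','F#','G','G#','A'};
freqs = fA * 2.^(k/12);         % each semitone is a factor of 2^(1/12)

for i = 1:numel(k)
    fprintf('%-2s = 220*2^(%2d/12) = %3.0f Hz\n', names{i}, k(i), freqs(i));
end
```

Rounding `freqs` to the nearest Hz reproduces the numbers listed on the slide.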

An Octave Up:

For C5:

Frequency = 524 Hz

MATLAB CODE:

fs = 22050;                   % Sampling Frequency.
f = 524;                      % Fundamental Freq of C5.
t = 0:1/fs:1;                 % Time range of 0 to 1 seconds.
sound(cos(2*pi*f*t)/2, fs);   % Make some noise!

A Sum with 2 Frequencies:

Frequency = 262 Hz and

Frequency = 524 Hz

MATLAB CODE:

fs = 22050;                   % Sampling Frequency.
f1 = 262;                     % Fundamental Freq of Middle C.
f2 = 524;                     % Fundamental Freq of C5.
t = 0:1/fs:1;                 % Time range of 0 to 1 seconds.
sound((cos(2*pi*f1*t) + ...
       0.25*cos(2*pi*f2*t))/2, fs);

Frequencies in a Piano - Middle C

[Figure: spectrum of a piano Middle C; x-axis: Frequency, Hz]

FFT of an Oboe Middle C

[Figure: FFT of an oboe Middle C; x-axis: Frequency, Hz]
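The instrument recordings behind these spectra are not part of this text, so the following is only a sketch of how such a spectrum can be computed. It uses the synthetic two-frequency signal from the earlier slide as a stand-in; the variable names are illustrative.

```matlab
% Magnitude spectrum of a synthetic "note": 262 Hz plus a weaker 524 Hz partial.
fs = 22050;                          % sampling frequency, as on the earlier slides
t = 0:1/fs:1-1/fs;                   % exactly 1 second, so FFT bins are 1 Hz apart
x = cos(2*pi*262*t) + 0.25*cos(2*pi*524*t);

N = numel(x);
X = abs(fft(x));                     % magnitude of the FFT
fAxis = (0:N-1) * fs / N;            % frequency of each bin in Hz

half = 1:floor(N/2);                 % keep only the non-negative frequencies
[~, k] = max(X(half));               % strongest spectral peak
fPeak = fAxis(k);                    % 262 Hz for this synthetic signal

% plot(fAxis(half), X(half)); xlabel('Frequency, Hz');   % uncomment to view
```

For this synthetic signal the strongest peak is the fundamental. For real instruments the strongest peak is not guaranteed to be the fundamental, which is one reason that picking peaks alone is not a complete pitch detector.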

Mono vs. Poly

• Monophonic
    • one note at a time (e.g. trumpet)
• Polyphonic
    • multiple notes at a time (e.g. piano, orchestra)

Polyphony creates a problem for pitch recognition, especially with octaves: every harmonic of the upper note coincides with a harmonic of the lower note.

Some Existing Methods

• Time Domain - Pitch Period estimation
    • With wavelets.
    • With auto-correlation function.
• Freq. Domain - Find Fundamental
• Auditory Scene Analysis
• Blackboard Systems
• Neural Networks
• Perceptual Models
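To make the time-domain, auto-correlation item concrete, here is a minimal sketch of pitch-period estimation on a synthetic monophonic signal. It is not taken from the slides: the 50 to 1000 Hz search range and all variable names are assumptions, and the autocorrelation is computed with an FFT so that only base MATLAB is needed.

```matlab
% Pitch-period estimation from the autocorrelation of a short frame.
fs = 22050;
t = 0:1/fs:0.1;                                % 100 ms frame
x = cos(2*pi*262*t) + 0.25*cos(2*pi*524*t);    % synthetic Middle C with one overtone
N = numel(x);

% Autocorrelation via the FFT, zero-padded to 2N to avoid circular wrap-around.
R = real(ifft(abs(fft(x, 2*N)).^2));
R = R(1:N);                                    % R(1) is lag 0, R(2) is lag 1, ...

% Search only lags that correspond to an assumed pitch range of 50 to 1000 Hz.
minLag = floor(fs/1000);
maxLag = ceil(fs/50);
[~, k] = max(R(minLag+1 : maxLag+1));
lag = minLag + k - 1;                          % pitch period in samples

f0 = fs / lag;                                 % estimated fundamental in Hz
```

The estimate is quantized to whole-sample lags, so it lands close to 262 Hz but not exactly on it. With several simultaneous notes the autocorrelation has many competing peaks, which is the polyphony problem noted earlier.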

What applications are there?

• Transcription of Music
• Modeling of Musical Instruments
• Speech Analysis
• Besides, it’s an Interesting Problem

My Work . . .

A Novel Wavelet Approach

Based on an observation made by Jeremy Todd, that:

For a piano playing these notes, a CWT could be used to identify a ‘G’ with certain scale/wavelet combinations.

Even with some polyphony!

Finding a G in a C Scale

[Figure: Original Signal (top) and its CWT @ a Specific “Scale” (bottom)]

The Continuous Wavelet Transform

Definition of a CWT:

$$C(a,b) = \int_{-\infty}^{\infty} f(t)\,\frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)\,dt$$

Where:
a = scaling factor
b = shift factor
f(t) = function we start with
ψ(t) = Mother wavelet
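As a rough illustration of this definition, the sketch below evaluates the CWT at one scale by direct discrete convolution, assuming the usual 1/sqrt(a) normalization. It is not the Wavelet Toolbox implementation. The wavelet `psi` is an unnormalized second derivative of a Gaussian, the scale `a` is measured in samples, and `cwtAtScale` is a hypothetical helper, so scale values here are not comparable to the scale numbers shown on the slides.

```matlab
% CWT at a single scale: C(a,b) = sum_n f(n) * (1/sqrt(a)) * psi((n - b)/a).
% psi is even, so the sum over n is an ordinary convolution with the sampled wavelet.
psi = @(u) (1 - u.^2) .* exp(-u.^2 / 2);       % assumed mother wavelet (unnormalized)
cwtAtScale = @(f, a) conv(f, psi((-ceil(5*a):ceil(5*a)) / a) / sqrt(a), 'same');

fs = 22050;
n = 0:round(0.2*fs);                           % 0.2 s of samples
f = cos(2*pi*262*n/fs);                        % Middle C as a pure sinusoid

aMatched = fs*sqrt(2) / (2*pi*262);            % scale whose response peaks at 262 Hz
aOff = 4 * aMatched;                           % a much more stretched wavelet

cMatched = cwtAtScale(f, aMatched);
cOff = cwtAtScale(f, aOff);

% Compare the two responses away from the edges, where the wavelet fits inside the signal.
m = ceil(5*aOff);
interior = (m+1):(numel(f)-m);
ratio = max(abs(cMatched(interior))) / max(abs(cOff(interior)));
```

The matched scale responds far more strongly than the stretched one, which is the basic sense in which a CWT at one scale can single out one note.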

What is Scale?

LOW SCALE

Compressed Wavelet

Lots of Detail

High Frequency

HIGH SCALE

Stretched Wavelet

Coarse Features

Low Frequency

(You are here) (And here)

Gaussian 2nd Order Wavelet

Initial Work

Took an empirical approach.

Ran a number of CWTs at varying scales, and looked at the results.

Picked out a CWT scale for each note in the C scale.

Finding Notes in a C Scale

[Figure: Original signal, followed by CWT outputs at Scale: 594, 530, 472, 446, 394, 722, 642, 606]

Finding Notes w/ Polyphony

[Figure: Original signal, followed by CWT outputs at Scale: 594, 530, 472, 446, 394, 722, 642, 606]

More Complex Polyphony

[Figure: Original signal, followed by CWT outputs at Scale: 594, 530, 472, 446, 394, 722, 642, 606]

Testing with different timbre

[Figure: Original signal, followed by CWT outputs at Scale: 594, 530, 472, 446, 394, 722, 642, 606]

Why does this work?

The scale parameter in the CWT affects frequency response.

However, our “scales” that work don’t seem to follow a clear pattern.
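For reference, the dependence of frequency response on scale can be written out for an idealized case. The block below assumes an unnormalized second derivative of a Gaussian as the mother wavelet and the Fourier convention $\hat{\psi}(\omega)=\int \psi(t)e^{-i\omega t}\,dt$. The sign and normalization of a toolbox wavelet may differ, so this is a sketch, not a description of the exact wavelet used in the experiments.

```latex
\psi(t) = (1 - t^2)\, e^{-t^2/2}
\quad\Longrightarrow\quad
\hat{\psi}(\omega) = \sqrt{2\pi}\,\omega^2 e^{-\omega^2/2}

\psi_a(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t}{a}\right)
\quad\Longrightarrow\quad
\hat{\psi}_a(\omega) = \sqrt{a}\,\hat{\psi}(a\omega)
 = \sqrt{2\pi a}\,(a\omega)^2 e^{-(a\omega)^2/2}

\frac{d}{d\omega}\left[\omega^2 e^{-a^2\omega^2/2}\right] = 0
\quad\Longrightarrow\quad
\omega_{\text{peak}} = \frac{\sqrt{2}}{a},
\qquad
f_{\text{peak}} = \frac{\sqrt{2}}{2\pi a}
```

Under these assumptions the response is a smooth band-pass curve whose peak frequency is inversely proportional to the scale. This is only the single-sinusoid picture; it does not by itself explain which scales separate real piano notes.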

Training Algorithm

Again, took an empirical approach.

Ran CWTs at varying scales, on sample files containing one note.

Picked out scales where the maximum of the CWT for one note >> the maximum for the other notes (and collected the results).
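The selection rule above can be sketched as follows. This is not the author's training code: it uses synthetic sinusoidal notes instead of sample files, a hand-rolled single-scale CWT (`cwtAtScale`, a hypothetical helper with an assumed unnormalized second-derivative-of-Gaussian wavelet), and an assumed grid of candidate scales measured in samples.

```matlab
% For each note, pick the candidate scale where its CWT maximum most exceeds
% the largest CWT maximum among all the other notes.
psi = @(u) (1 - u.^2) .* exp(-u.^2 / 2);       % assumed mother wavelet
cwtAtScale = @(x, a) conv(x, psi((-ceil(5*a):ceil(5*a)) / a) / sqrt(a), 'same');

fs = 8000;                                     % assumed low rate to keep the sketch fast
n = 0:round(0.25*fs);
noteFreqs = 220 * 2.^([3 5 7 8 10 12 14 15] / 12);   % C major scale from Middle C
scales = 2:0.5:40;                             % assumed candidate scales (in samples)

% peak(i,j) = max |CWT| of note i at scale j, measured away from the signal edges.
peak = zeros(numel(noteFreqs), numel(scales));
for i = 1:numel(noteFreqs)
    x = cos(2*pi*noteFreqs(i)*n/fs);           % stand-in for a one-note sample file
    for j = 1:numel(scales)
        m = ceil(5*scales(j));                 % half-width of the sampled wavelet
        c = cwtAtScale(x, scales(j));
        peak(i, j) = max(abs(c(m+1:end-m)));
    end
end

bestScale = zeros(1, numel(noteFreqs));
bestRatio = zeros(1, numel(noteFreqs));
for i = 1:numel(noteFreqs)
    others = peak([1:i-1, i+1:end], :);        % responses of every other note
    ratio = peak(i, :) ./ max(others, [], 1);  % target note vs. strongest competitor
    [bestRatio(i), j] = max(ratio);
    bestScale(i) = scales(j);
end
```

`bestRatio` indicates how well each chosen scale separates its note from the rest; values close to 1 mean weak separation, so a threshold on it would be needed to implement the ">>" criterion.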

Results of Training Algorithm

. . .

Longer C Scale - Trained on 3 Octaves of Notes

A Fragment by Chopin*

*From Right Hand of Prelude in C, Op. 28 No. 1

Training on a ‘Real’ Guitar

• Only able to find 5 of 8 pitches for the C Scale training case (with only a limited attempt).
• Results on a test file were not completely accurate.
• Expected to be a more difficult case than a piano.
• Could merit a more thorough try.

Entire 88 Keys on a Piano

Work in progress.

It takes a long time to run many CWTs on 88 different sound files.

Initial results: able to identify notes 70-88.

Frequency Response Revisited

[Figure: Frequency Response of a 2nd Order Gaussian Wavelet]

[Figure: Resulting Scales for 22 Piano Notes; x-axis: NOTE NUMBER (0 to 22), y-axis: SCALE (0 to 2500)]

[Figure: Resulting Scales for 8 Sinusoidal Notes; x-axis: NOTE NUMBER (1 to 8), y-axis: SCALE (0 to 14000)]

Conclusions

• The novel wavelet approach isn’t perfect.
• Requiring “training” is a handicap.
• Most likely not suited to sources with varying timbre (e.g. guitar, voice).
• Some interesting results.
• The mechanism of detection could be further investigated and better understood.
