Running head: PITCH DETECTION ALGORITHM TO TUNE A BASS GUITAR 1

Implementing a Pitch Detection Algorithm to Tune a Bass Guitar
Abigail Lira
El Paso Community College
Implementing a Pitch Detection Algorithm to Tune a Bass Guitar
Aspiring computer scientists can be motivated by implementing algorithms that
solve real-world problems; unfortunately, many interesting algorithms are beyond the reach
of beginning students because they are often explained in journal articles aimed at
experienced computer scientists. In this paper I aim to illuminate one algorithm, pitch
detection using autocorrelation, and to use it to tune an electric bass guitar, in the hope that
other students can reproduce my steps to create their own bass or guitar tuners. In doing so, I
will explain the basics of digital audio and the differences between pitch detection algorithms,
distill the algorithm into pseudocode, and show the code that I have created.
Digital Audio
Computers process data in their own language: binary code. Sound is a variation in
air pressure, which is commonly represented as a waveform. Before a computer can process a
sound, its waveform needs to be converted into binary code. This conversion from an analog
waveform to binary code is done by a method called sampling, in which a computer records
the sound pressure at discrete time points; the sound pressure value at a given time is called
a sample. At each step of the sampling process, “the amplitude [is] recorded” (Media, n.d.).
In my research, I found a common misconception: that the higher the number of samples, the
better the “accuracy” or “resolution” of the recording. This is probably because people are
familiar with digital photography, where a higher number of pixels amounts to a
higher-resolution picture. However, in digital audio, the number of samples per second does
not act as a “resolution” in this sense. For example, Jisc Digital Media states that “[t]he closer
together [the] samples are[,] the more accurate the recording becomes” (Media, n.d.), which is
false. In contrast, the Nyquist-Shannon theorem states that the number of samples per unit
time (the sampling rate) simply determines the highest frequency that can be represented,
which is half the sampling rate (the Nyquist frequency). Figure 1 shows the flow of a sound
input through a digital system and back to an audio output.
Figure 1. The analog-to-digital converter (ADC) converts the analog waveform to
discrete samples, which are then stored in the computer; in a traditional digital sound
setup (not used in my bass tuner), the digital-to-analog converter (DAC) reconstructs an
analog waveform from the discrete samples. In my system, the original recording (OR) is
the bass guitar signal.
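The sampling step can be sketched in Java, the language of the program later in this paper. This is a minimal illustration under stated assumptions, not part of the tuner: the 44100 Hz sampling rate, the pure sine test tone, and the class and method names (SamplingDemo, sampleSine, nyquistFrequency) are hypothetical. Each sample is the value of the waveform at time t = n / fs, and the highest representable frequency is fs / 2.

```java
// Minimal sketch: sampling a pure tone and computing the Nyquist frequency.
// Assumptions (not from the paper): a sine test tone and a 44100 Hz sampling rate.
class SamplingDemo {

    // Highest frequency representable at a given sampling rate (Nyquist-Shannon).
    static double nyquistFrequency(double sampleRate) {
        return sampleRate / 2.0;
    }

    // Records the "sound pressure" of a sine tone at discrete times t = n / sampleRate.
    static double[] sampleSine(double frequency, double sampleRate, int count) {
        double[] samples = new double[count];
        for (int n = 0; n < count; n++) {
            double t = n / sampleRate;                 // time of the n-th sample, in seconds
            samples[n] = Math.sin(2.0 * Math.PI * frequency * t);
        }
        return samples;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0;                   // hypothetical recording rate
        double[] lowE = sampleSine(41.204, sampleRate, 5);
        System.out.println("Nyquist frequency: " + nyquistFrequency(sampleRate) + " Hz");
        for (double s : lowE) {
            System.out.println(s);
        }
    }
}
```

At 44100 Hz the Nyquist frequency is 22050 Hz, far above the 130.813 Hz of the highest open string in Table 1, so for a bass tuner the sampling rate is not the limiting factor.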
Background on Pitch Detection
For many years computer scientists and electrical engineers have studied pitch detection
and have analyzed it extensively. Rabiner, Cheng, Rosenberg, and McGonegal state that “[p]itch
detect[ion] is an essential component in a variety of speech processing systems [...] the pitch
contour of an utterance is useful for recognizing speakers [...] for speech instruction to the
hearing impaired [...] and is a requirement in almost all speech analysis-synthesis (vocoder)
systems” (Rabiner, Cheng, Rosenberg, & McGonegal, 1976). Because pitch detection is so widely
used, a variety of algorithms have been developed, each with its own approach to estimating
pitch. The algorithms compared in that study are the following: the autocorrelation method,
the cepstral method, the simplified inverse filtering technique (SIFT) method, the parallel
processing time-domain method, the data reduction method, the spectral flattening linear
predictive coding (LPC) method, and the average magnitude difference function (AMDF)
method. The seven pitch detection algorithms each deal with pitch detection in their own way,
but they all share the same goal.
Pitch detection is complicated. Rabiner, Cheng, Rosenberg, and McGonegal state
that “[a]ccurate and reliable measurement of the pitch period of a speech signal from the
acoustic pressure waveform alone is often exceedingly difficult for several reasons” (Rabiner et
al., 1976). One reason is that the glottal excitation is a quasiperiodic signal, and “measuring
the period of a speech waveform, which varies both in period and in the detailed structure of
the waveform within a period, can be quite difficult” (Rabiner et al., 1976). In other words,
the human voice is never perfectly periodic. Rabiner, Cheng, Rosenberg, and McGonegal also
state that “[a] second difficulty in measuring pitch period is the interaction between the vocal
tract and the glottal excitation. In some instances the formants of the vocal tract can alter
significantly the structure of the glottal waveform so that the actual pitch period is difficult to
detect” (Rabiner et al., 1976). This is especially challenging when the vocal tract moves
rapidly, which can change the results significantly. A third difficulty is determining exactly
where each pitch period begins and ends ... Lastly, a fourth difficulty is distinguishing
low-level voiced sounds from unvoiced sounds or silence.
The autocorrelation method has its pros and cons for pitch detection.
Rabiner states that “[t]he autocorrelation computation is made directly on the waveform and
is a fairly straightforward (albeit time consuming) computation” (Rabiner, 1977). Although
the algorithm is not the most efficient, it “is simply amenable to digital hardware
implementation generally requiring only a single multiplier and an accumulator as the
computational elements” (Rabiner, 1977). The method has been used successfully for pitch
detection in digital communications and copes well with distortion. Despite its many
strengths, the autocorrelation method has problems. Rabiner states that “one problem is to
decide which of several autocorrelation peaks corresponds to the pitch period” (Rabiner, 1977).
Windowing causes two further problems. Rabiner states that “[first] there is the problem of
choosing an appropriate window. Second there is the problem that[...] no matter which window
is selected, the effect of the window is to taper the autocorrelation function smoothly to 0 as
the autocorrelation index increases” (Rabiner, 1977). In other words, peaks at longer lags are
attenuated relative to the value at lag zero, which can make the peak that marks the true
period harder to pick out. Finally, there is the problem of “choosing an appropriate analysis
frame (window) size. The ideal analysis frame should contain from 2 to 3 complete pitch
periods” (Rabiner, 1977). A frame of this size yields an estimate of the average pitch over
those few periods.
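Rabiner's rule of thumb translates directly into a frame length in samples: one period of a tone with fundamental F0 lasts fs / F0 samples. The Java sketch below assumes a 44100 Hz sampling rate and uses hypothetical names (FrameSize, frameLength); the frequencies are the open-string values listed in Table 1.

```java
// Minimal sketch: how many samples an analysis frame needs to hold a given
// number of pitch periods. The 44100 Hz sampling rate is an assumption.
class FrameSize {

    // Number of samples spanning `periods` cycles of a tone with fundamental f0 (Hz).
    static int frameLength(double f0, double sampleRate, double periods) {
        // One period lasts 1/f0 seconds, i.e. sampleRate/f0 samples; round up.
        return (int) Math.ceil(periods * sampleRate / f0);
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0;   // hypothetical recording rate
        double[] openStrings = {30.868, 41.204, 55.000, 73.416, 97.999, 130.813};
        for (double f0 : openStrings) {
            System.out.printf("%8.3f Hz: %d samples for 3 periods%n",
                    f0, frameLength(f0, sampleRate, 3));
        }
    }
}
```

Under these assumptions the low B string (30.868 Hz) needs about 4286 samples, roughly 97 ms, for three periods, so the lowest string sets the minimum frame length for the whole tuner.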
The aim of my algorithm is to detect the fundamental frequency of every string of a
bass guitar. My implementation works only with standard tuning. Table 1 shows the
fundamental frequencies of the open strings of an electric bass guitar.
The Implemented Algorithm
The algorithm outlined in Figure 2 is explained in this section. The sampled
waveform comes out of the analog-to-digital converter (ADC) and is split into frames. Each
frame is windowed and then processed by the autocorrelation procedure, which compares the
frame with copies of itself delayed by increasing lags to expose the peaks that indicate the
fundamental frequency. Finally, the peak finder procedure selects the peak most likely to give
an estimate of F0.
Figure 2 illustrates how pitch detection starts with a sampled waveform, passes through
the processing procedures, and ends with an estimate of the fundamental frequency F0.
sampled waveform → Split Procedure → frames → Windowing Procedure → tapered frames → Autocorrelation Procedure → autocorrelation → Peak Finder Procedure → F0 estimate
Figure 2. Algorithm overview.
Windowing and Filtering Procedure
Programs that deal with streams of data usually have a windowing procedure that splits
an incoming stream into smaller parts called windows, so that algorithms can process the
data piecemeal. This enables them to handle indefinitely long streams, such as the continuous
audio input that my tuning program uses. Windowing procedures may also modify the data to
make them easier to process.
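A split procedure of this kind can be sketched in a few lines of Java. The class name FrameSplitter, the frame length, and the hop between frame starts are hypothetical (the paper does not specify them); a hop smaller than the frame length makes consecutive frames overlap, and any incomplete frame at the end of the buffer is left for later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of a split procedure: cuts a sample stream into fixed-length
// frames. The overlapping hop size is an assumption, not taken from the paper.
class FrameSplitter {

    static List<double[]> split(double[] stream, int frameLength, int hop) {
        List<double[]> frames = new ArrayList<>();
        // Each frame starts `hop` samples after the previous one; an incomplete
        // frame at the end of the stream is skipped.
        for (int start = 0; start + frameLength <= stream.length; start += hop) {
            double[] frame = new double[frameLength];
            System.arraycopy(stream, start, frame, 0, frameLength);
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) {
        double[] stream = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        for (double[] frame : split(stream, 4, 2)) {
            System.out.println(Arrays.toString(frame));
        }
    }
}
```

With a 10-sample stream, a frame length of 4 and a hop of 2, the sketch produces four frames starting at samples 0, 2, 4 and 6.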
Two windowing functions are commonly used: the Hanning window, which works well
with 3 periods per analysis window, and the Gaussian window, which works better with a
larger analysis window (Boersma, 1993). My windowing procedure applies a Gaussian window
function, which tapers the samples toward zero at the edges of the window; this reduces the
abrupt jumps at the frame boundaries and facilitates analysis. The procedure
Gaussian-Filter(A) below takes an array A of the samples in a window and applies the window
by multiplying each sample by the corresponding value of the Gaussian function.
Gaussian-Filter(A)
N = A.length
σ = 0.5
for n = 0 to N − 1
    x = ((N − 1)/2 − n) / (σ(N − 1)/2)
    A[n] = e^(−x²/2) · A[n]
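A Java version of Gaussian-Filter(A) might look like the sketch below. It assumes the standard Gaussian window w[n] = exp(−x²/2), where x is the distance from the frame centre measured in units of σ(N − 1)/2, with σ = 0.5 as in the procedure; the class name GaussianWindow is hypothetical.

```java
import java.util.Arrays;

// Sketch of Gaussian-Filter(A): multiplies each sample in place by a Gaussian
// bump centred on the frame. The exp(-x^2/2) form is an assumed standard window.
class GaussianWindow {

    static void apply(double[] a, double sigma) {
        int n = a.length;
        if (n < 2) {
            return;                                   // nothing to taper
        }
        double centre = (n - 1) / 2.0;
        double halfWidth = sigma * (n - 1) / 2.0;
        for (int i = 0; i < n; i++) {
            double x = (i - centre) / halfWidth;      // 0 at the centre, ±1/sigma at the edges
            a[i] *= Math.exp(-0.5 * x * x);
        }
    }

    public static void main(String[] args) {
        double[] frame = {1, 1, 1, 1, 1};
        apply(frame, 0.5);
        System.out.println(Arrays.toString(frame));
    }
}
```

With σ = 0.5 the edge samples are scaled by e^(−2) ≈ 0.135, while the centre sample is left unchanged.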
Autocorrelation Procedure
The correlation of two waveforms is a measure of their similarity. Waveforms are
compared against each other by keeping one of them fixed and delaying the other by τ
samples. Autocorrelation is the correlation of a waveform with itself.
The mathematical definition of the autocorrelation function is shown in the equation
below:
$$R_x(\tau) = \sum_{n=0}^{N-1-\tau} x[n]\,x[n+\tau] \qquad (1)$$
where N is the total number of samples in the window, τ is the delay, and x[n] is the nth
sample in the window. The autocorrelation is expected to reach its maximum at τ = 0, where
the waveform is compared with an identical copy of itself, and to decrease as τ increases, with
local maxima wherever the delay lines the waveform up with itself again, that is, at multiples
of its period.
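Equation (1) can be transcribed almost literally into Java. This sketch (hypothetical class AutocorrelationDirect) computes R_x(τ) for every lag τ = 0, ..., N − 1; because the sum stops at N − 1 − τ, fewer products contribute as the lag grows, which is one reason the values tend to decrease with τ.

```java
import java.util.Arrays;

// Direct transcription of equation (1): R(tau) = sum_{n=0}^{N-1-tau} x[n] x[n+tau].
class AutocorrelationDirect {

    static double[] autocorrelate(double[] x) {
        int n = x.length;
        double[] r = new double[n];
        for (int tau = 0; tau < n; tau++) {
            double sum = 0.0;
            // Only pairs that overlap inside the window contribute.
            for (int i = 0; i + tau < n; i++) {
                sum += x[i] * x[i + tau];
            }
            r[tau] = sum;
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(autocorrelate(new double[] {1, 2, 3})));
    }
}
```

For the toy frame {1, 2, 3} it prints [14.0, 8.0, 3.0]: R(0) = 1 + 4 + 9 = 14 is the maximum, as expected at τ = 0.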
I have defined the Autocorrelation0(A, τ) procedure below, which takes an
array A of samples in a window and an integer delay τ. Note that for every increase in τ, an
additional sample is needed; thus, for a window of length N and a τ that can go up to N,
2N samples are needed. To allow for this without requiring complicated look-ahead logic,
the procedure correlates only the first half of the provided array of samples, so that τ can
range up to half the window length.
Autocorrelation0(A, τ)
sum = 0
for i = 0 to ⌊A.length/2⌋ − 1
    sum = sum + A[i] · A[i + τ]
return sum
I would like to store the autocorrelation results for τ = 0, 1, ..., ⌊A.length/2⌋ in order to
later find the local maxima, which are the F0 candidates. Thus, I defined the
Autocorrelation(A) procedure below, which returns an array containing the autocorrelated
waveform.
Autocorrelation(A)
R = new array of size ⌊A.length/2⌋ + 1
for τ = 0 to ⌊A.length/2⌋
    R[τ] = Autocorrelation0(A, τ)
return R
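The two procedures above can be sketched together in Java (hypothetical class HalfWindowAutocorrelation). As in the pseudocode, every lag uses the same ⌊N/2⌋ products, and the loop bound i < ⌊N/2⌋ keeps the index i + τ inside the frame for every τ ≤ ⌊N/2⌋.

```java
import java.util.Arrays;

// Sketch of Autocorrelation0 and Autocorrelation: every lag uses the same number
// of products (the first half of the frame), so no look-ahead is needed.
class HalfWindowAutocorrelation {

    // Correlates the first half of the frame with the samples tau positions later.
    static double autocorrelation0(double[] a, int tau) {
        int half = a.length / 2;
        double sum = 0.0;
        for (int i = 0; i < half; i++) {              // i + tau <= 2*half - 1 <= a.length - 1
            sum += a[i] * a[i + tau];
        }
        return sum;
    }

    // Returns R[0..half], one value per lag.
    static double[] autocorrelation(double[] a) {
        int half = a.length / 2;
        double[] r = new double[half + 1];
        for (int tau = 0; tau <= half; tau++) {
            r[tau] = autocorrelation0(a, tau);
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(autocorrelation(new double[] {1, 2, 3, 4})));
    }
}
```

For the toy frame {1, 2, 3, 4} the result is [5.0, 8.0, 11.0]. Unlike equation (1), the values here grow with the lag, because every lag sums the same number of products and the later samples of this made-up frame are larger.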
Peak Finding Procedure
Once I have computed the autocorrelation, the next task is to find a local maximum
R[τmax] where τmax > 0 (King, n.d.). In the ideal case, τmax is the period of the original
waveform in samples, and from it I can easily estimate
$$F_0 = \frac{1}{\tau_{\max} \times T_s} = \frac{f_s}{\tau_{\max}},$$
where T_s is the sampling period and f_s = 1/T_s is the sampling frequency. More commonly,
though, the chosen maximum will correspond to a harmonic of F0 rather than to F0 itself
(Rabiner et al., 1976). Better peak-finding algorithms exist that can ignore harmonics (Tan &
Karnjanadecha, 2003), but for simplicity and to avoid premature optimization I implemented
the simplest possible peak-finding algorithm, shown below as procedure Find-Peak(A), where A
is the array containing the autocorrelations.
Find-Peak(A)
τmax = 1
maxPeak = A[1]
for τ = 2 to A.length − 1
    if A[τ] > maxPeak
        maxPeak = A[τ]
        τmax = τ
return τmax
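Find-Peak and the conversion F0 = fs / τmax can be sketched in Java as follows (hypothetical class PeakFinder). The autocorrelation values and the 11000 Hz sampling rate in the example are made up for illustration.

```java
// Sketch of Find-Peak followed by the lag-to-frequency conversion.
class PeakFinder {

    // Returns the lag tau >= 1 with the largest autocorrelation value (needs r.length >= 2).
    static int findPeak(double[] r) {
        int tauMax = 1;
        double maxPeak = r[1];
        for (int tau = 2; tau < r.length; tau++) {
            if (r[tau] > maxPeak) {
                maxPeak = r[tau];
                tauMax = tau;
            }
        }
        return tauMax;
    }

    // F0 = fs / tauMax: a period of tauMax samples lasts tauMax / fs seconds.
    static double lagToFrequency(int tauMax, double sampleRate) {
        return sampleRate / tauMax;
    }

    public static void main(String[] args) {
        double[] r = {10, 4, -3, 8, 2};                // made-up autocorrelation values
        System.out.println("tauMax = " + findPeak(r));
        System.out.println("F0 = " + lagToFrequency(200, 11000.0) + " Hz");
    }
}
```

findPeak returns 3 for the made-up values {10, 4, −3, 8, 2}, skipping the lag-0 maximum, and a peak at 200 samples at 11000 Hz corresponds to 55 Hz, the open A string. Because it simply takes the largest value at any positive lag, this version may return a lag just above 0 on signals whose autocorrelation falls off slowly from lag 0, which is one reason better peak-finding methods exist.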
As Table 1 shows, the fundamental frequencies of the open strings range from 30.868 Hz
to 130.813 Hz. Tan and Karnjanadecha state that “[s]ince, the range of F0 is
generally in the range of 80-500 Hz, then the frequency components above 500 Hz is useless
[...] [t]hus a low-pass filter [...] above 500 Hz would be useful in improving the performance
[...] we use the lowpass-filter with 900 Hz” (Tan & Karnjanadecha, 2003). Low-pass filtering
the signal before autocorrelation is therefore an important step when estimating F0. Note that
the 80-500 Hz range describes speech; the lowest open strings of a bass guitar fall below
80 Hz, and all of them lie far below 500 Hz.
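A low-pass stage could be sketched in Java with a first-order (one-pole) filter, y[n] = y[n − 1] + α(x[n] − y[n − 1]). This is a deliberately simple stand-in, not the filter used by Tan and Karnjanadecha; the class name LowPassFilter and the 200 Hz cutoff (chosen here to sit above the highest open string, 130.813 Hz) are assumptions.

```java
// Minimal sketch of a first-order (one-pole) low-pass filter. This is a
// stand-in for illustration, not the filter used by Tan and Karnjanadecha.
class LowPassFilter {

    static double[] lowPass(double[] x, double cutoffHz, double sampleRate) {
        double dt = 1.0 / sampleRate;
        double rc = 1.0 / (2.0 * Math.PI * cutoffHz);  // RC time constant for the cutoff
        double alpha = dt / (rc + dt);                 // smoothing factor in (0, 1)
        double[] y = new double[x.length];
        double previous = 0.0;
        for (int n = 0; n < x.length; n++) {
            previous = previous + alpha * (x[n] - previous);
            y[n] = previous;
        }
        return y;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0;                   // hypothetical recording rate
        double[] nyquistTone = new double[2000];       // alternates +1, -1: highest possible frequency
        for (int n = 0; n < nyquistTone.length; n++) {
            nyquistTone[n] = (n % 2 == 0) ? 1.0 : -1.0;
        }
        double[] filtered = lowPass(nyquistTone, 200.0, sampleRate);
        double peak = 0.0;
        for (double v : filtered) {
            peak = Math.max(peak, Math.abs(v));
        }
        System.out.println("Peak amplitude after filtering: " + peak);
    }
}
```

A steady (DC) input passes through almost unchanged, while the alternating input is reduced to a few percent of its amplitude. A one-pole filter rolls off slowly (about 6 dB per octave), so a real tuner might prefer a steeper design.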
Table 1
Fundamental frequencies of open strings in bass guitars (Green, n.d.).
Note    F0 (Hz)
B       30.868
E       41.204
A       55.000
D       73.416
G       97.999
C       130.813
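The values in Table 1 follow twelve-tone equal temperament with A4 = 440 Hz: each semitone multiplies the frequency by 2^(1/12). The Java sketch below (hypothetical class StringFrequencies) recomputes them from MIDI note numbers, where A4 = 69; the note names with octave numbers (B0 to C3) are scientific pitch notation, added here for illustration.

```java
// Recomputes Table 1 from twelve-tone equal temperament with A4 = 440 Hz,
// using MIDI note numbers (A4 = 69). Both conventions are assumptions here.
class StringFrequencies {

    static double frequency(int midiNote) {
        return 440.0 * Math.pow(2.0, (midiNote - 69) / 12.0);
    }

    public static void main(String[] args) {
        String[] names = {"B0", "E1", "A1", "D2", "G2", "C3"};
        int[] midi = {23, 28, 33, 38, 43, 48};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%s  %.3f Hz%n", names[i], frequency(midi[i]));
        }
    }
}
```

The computed values agree with Table 1 to within about 0.001 Hz; for example, E1 comes out as 41.203 Hz against 41.204 Hz in the table.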
Much of the pitch detection literature discusses speech rather than music, but the basis
of both is sound. De Cheveigné and Kawahara state that “[s]ounds may be periodic yet ‘outside
the existence region’ of pitch [...] Conversely, a sound may not be periodic, but yet evoke a
pitch [...] However, over a wide range pitch and period are in a one-to-one relation, to the
degree that the word ‘pitch’ is often [...] F0” (de Cheveigné & Kawahara, 2002). In other words,
for both speech and music, detecting pitch largely amounts to estimating the fundamental
frequency. To see why autocorrelation can recover F0, consider the autocorrelation function
itself. Tan and Karnjanadecha state that “[g]iven a discrete time signal x(n), defined for all n,
the auto-correlation function is generally defined as:
$$R_x(m) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\,x(n+m)$$
[...] Thus, for pitch detection, if we assume x(n) is exactly periodic with period P, i.e.,
x(n) = x(n+P) for all n, then it is easily shown that:
$$R_x(m) = R_x(m+P),$$
i.e., the autocorrelation is also periodic with the same period” (Tan & Karnjanadecha,
2003). Because the autocorrelation of a periodic signal repeats with the signal's period, its
peaks reveal that period, and the result can be analyzed further. To explain this concept,
note that any periodic sound has a fundamental period T0, the time one cycle takes before the
waveform repeats, and its fundamental frequency is F0 = 1/T0. This period is the lag at which
a shifted copy of the signal best matches the original. De Cheveigné and Kawahara state that
“[t]he autocorrelation method compares the signal to its shifted self” (de Cheveigné &
Kawahara, 2002). In other words, the original waveform is compared with a shifted version of
itself. King states that “auto means –itself, and correlation means –similarity” (King, n.d.).
The purpose of this comparison is to estimate the fundamental frequency with the
autocorrelation formula: each sample of the original waveform is multiplied by the sample of
the copy that lies the current lag (shift) further along, and the products are summed. In
addition, Boersma states that “[f]or zero lag, we have $r_x(0) = r_H(0) + r_N(0)$, and if the
noise is white[...] we find a local maximum at a lag $\tau_{\max} = T_0$ [...]” (Boersma, 1993).
That is, the global maximum is always at lag 0, where the waveform is compared with itself
unshifted, while the local maximum at lag T0 reveals the fundamental period.
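The periodicity property quoted above follows in one step from the definition, since x(n + m + P) = x(n + m) for every n:

$$R_x(m+P) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\,x(n+m+P) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\,x(n+m) = R_x(m).$$

In particular, R_x(kP) = R_x(0) for every integer k, so the autocorrelation returns to its lag-zero value at every multiple of the period. If the period is P samples at sampling frequency f_s, then T_0 = P / f_s seconds and F_0 = f_s / P.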
The following Java program shows a similar approach:
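import java.io.File;
import java.io.IOException;
import java.util.Locale;
import java.util.Scanner;

public class Autocorrelation {
    public static void main(String[] args) throws IOException {
        File file = new File("SampleOriginal.txt");
        double[] sample = new double[6];            // original samples
        double[] copy = new double[sample.length];  // copy that is shifted by the lag
        // try-with-resources closes the Scanner; Locale.US parses "30.8" in any locale
        try (Scanner inputFile = new Scanner(file).useLocale(Locale.US)) {
            int i = 0;
            // Read at most sample.length values so extra lines cannot overflow the arrays
            while (inputFile.hasNextDouble() && i < sample.length) {
                double num = inputFile.nextDouble();
                sample[i] = num;
                copy[i] = num;
                i++;
            }
        }
        // Lags 0 .. (N - 1) / 2 keep at least half of the samples overlapping
        int maxLag = (sample.length - 1) / 2;
        for (int lag = 0; lag <= maxLag; lag++) {
            double autocorrelation = auto(sample, copy, lag);
            System.out.printf(Locale.US, "%.2f%n", autocorrelation);
        }
    }

    // Sum of sample[n] * copy[n + lag] over all overlapping positions, as in equation (1)
    public static double auto(double[] sample, double[] copy, int lag) {
        int last = sample.length - 1;
        double itself = 0;
        int inc = 0;
        for (int a = lag; a <= last; a++) {
            itself += sample[inc] * copy[a];
            inc++;
        }
        return itself;
    }
}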
SampleOriginal.txt (bass F0 values, one per line, saved in Notepad):
30.8
41.2
55.0
73.2
98.0
130.8
Output:
37741.96
27552.96
19674.40
The test input is an array of the six open-string fundamental frequencies rather than a
sampled waveform, so the program demonstrates the mechanics of the autocorrelation sum
rather than producing an F0 estimate from a recording. Notice that the program first reads the
values into the array sample and stores an identical copy in the array copy. The for-loop in
main then calls the auto method once for each lag from 0 to ⌊(N − 1)/2⌋, which keeps at least
half of the samples overlapping. Within auto, the shift starts at the current lag (0 on the first
call, where the waveform is compared with itself unshifted), each sample is multiplied by the
copied sample lag positions later, and the products are summed. Unlike the Autocorrelation0
procedure, which always uses half of the window, auto multiplies every overlapping pair, so the
number of products shrinks as the lag grows, exactly as in equation (1). As expected, the
largest output occurs at lag 0. De Cheveigné and Kawahara describe the advantages of their
autocorrelation-based YIN estimator as follows: “[t]he algorithm has few parameters, and these
do not require fine tuning. In contrast to most other methods, no upper limit need be put
on the F0 search range[...] [Making this method] relatively simple[...] [And] implemented
efficiently” (de Cheveigné & Kawahara, 2002).
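Putting the pieces together, the following Java sketch estimates F0 for a single frame of a waveform, in contrast to the program above, which autocorrelates a list of frequencies. It is a minimal illustration under stated assumptions: a pure 55 Hz sine stands in for the open A string, the sampling rate is 11000 Hz so that one period is exactly 200 samples, the frame holds three periods (600 samples), and the windowing and filtering stages are omitted. The class and method names are hypothetical.

```java
// End-to-end sketch: half-window autocorrelation plus simple Find-Peak on one frame.
// Assumptions (not from the paper): a pure 55 Hz sine instead of a recorded bass note,
// fs = 11000 Hz (one period = 200 samples), a 600-sample frame, no windowing or filtering.
class PitchEstimateDemo {

    // Synthesizes `count` samples of a sine tone.
    static double[] sine(double frequency, double sampleRate, int count) {
        double[] x = new double[count];
        for (int n = 0; n < count; n++) {
            x[n] = Math.sin(2.0 * Math.PI * frequency * n / sampleRate);
        }
        return x;
    }

    // Half-window autocorrelation, as in Autocorrelation(A): lags 0..N/2.
    static double[] autocorrelation(double[] a) {
        int half = a.length / 2;
        double[] r = new double[half + 1];
        for (int tau = 0; tau <= half; tau++) {
            double sum = 0.0;
            for (int i = 0; i < half; i++) {
                sum += a[i] * a[i + tau];
            }
            r[tau] = sum;
        }
        return r;
    }

    // Simple Find-Peak: the largest value at any lag >= 1.
    static int findPeak(double[] r) {
        int tauMax = 1;
        for (int tau = 2; tau < r.length; tau++) {
            if (r[tau] > r[tauMax]) {
                tauMax = tau;
            }
        }
        return tauMax;
    }

    // F0 = fs / tauMax.
    static double estimateF0(double[] frame, double sampleRate) {
        return sampleRate / findPeak(autocorrelation(frame));
    }

    public static void main(String[] args) {
        double sampleRate = 11000.0;
        double[] frame = sine(55.0, sampleRate, 600);
        System.out.println("Estimated F0: " + estimateF0(frame, sampleRate) + " Hz");
    }
}
```

For this idealized input the peak lands at lag 200 and the program prints Estimated F0: 55.0 Hz. A recorded bass note contains harmonics and decays over time, so on real input this naive peak picker may lock onto a lag other than the true period, as discussed in the Peak Finding Procedure section.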
References
Boersma, P. (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Proceedings of the Institute of Phonetic Sciences, 17, 97-110.
de Cheveigné, A., & Kawahara, H. (2002, April). YIN, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am., 111, 1917-1930.
Green, G. (n.d.). Frequencies and ranges. Retrieved 2015-11-05, from http://www.contrabass.com/pages/frequency.html
King, S. (n.d.). Autocorrelation for estimating F0. Retrieved 2015-11-05, from http://speech.zone/autocorrelation/
Media, J. D. (n.d.). An introduction to digital audio. Retrieved 2015-11-05, from http://www.jiscdigitalmedia.ac.uk/guide/an-introduction-to-digital-audio
Rabiner, L. R. (1977, February). On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-25(1), 24-33.
Rabiner, L. R., Cheng, M. J., Rosenberg, A. E., & McGonegal, C. A. (1976, October). A comparative performance study of several pitch detection algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(6), 399-418.
Tan, L., & Karnjanadecha, M. (2003). Pitch detection algorithm: Autocorrelation method and AMDF. Proceedings of the 3rd International Symposium on Communications and Information Technology, 2, 551-556.