Running head: PITCH DETECTION ALGORITHM TO TUNE A BASS GUITAR

Implementing a Pitch Detection Algorithm to Tune a Bass Guitar

Abigail Lira

El Paso Community College

Aspiring computer scientists can be motivated by implementing algorithms that solve real-world problems; unfortunately, many interesting algorithms are beyond the reach of beginning students because they are often explained in journal articles aimed at the experienced computer scientist. In this paper I aim to illuminate one algorithm, pitch detection using autocorrelation, and use it to tune an electric bass guitar, in the hope that other students can reproduce my steps to create their own bass or guitar tuners. In doing so, I will explain the basics of digital audio and the differences between pitch detection algorithms, distill the algorithm into pseudocode, and show the code that I have created.

Digital Audio

Computers process data in their own language: binary code. Sound is a variation in sound pressure, which is commonly represented as a waveform. Before a computer can process a sound, its waveform needs to be converted into binary code. This conversion of an analog waveform to binary code is done by a method called sampling, in which a computer records the sound pressure at discrete points in time; the sound pressure value at a given time is called a sample. At each step of the sampling process “the amplitude [is] recorded [...] knowing that [t]he closer together [the] samples are[,] the more accurate the recording becomes” (Media, n.d.). In my research, I found a common misconception: that the higher the number of samples, the better the “accuracy” or “resolution” of the recording. This is probably because people are familiar with digital photography, where a greater number of pixels yields a higher-resolution picture.
However, digital audio has no comparable concept of “resolution”. For example, Jisc Digital Media states that “[t]he closer together [the] samples are[,] the more accurate the recording becomes” (Media, n.d.), which is not accurate. In contrast, the Nyquist-Shannon theorem states that the number of samples per unit time in a recording simply determines the highest frequency that can be represented, which is half the sampling rate. Figure 1 shows the flow of sound from the input, through a digital system, and back to an audio output.

Figure 1. The analog-to-digital converter (ADC) converts the analog waveform to discrete samples, which are then stored in the computer; in a traditional digital sound setup (not used in my bass tuner), the digital-to-analog converter (DAC) reconstructs an analog waveform from the discrete samples. In my system, the original recording (OR) is the bass guitar signal.

Background on Pitch Detection

For many years computer scientists and electrical engineers have studied pitch detection and have analyzed it extensively. Rabiner, Cheng, Rosenberg, and McGonegal state that “[p]itch detect[ion] is an essential component in a variety of speech processing systems[...]the pitch contour of an utterance is useful for recognizing speakers[...]for speech instruction to the hearing impaired[...]and is a requirement in almost all speech analysis-synthesis (vocoder) systems” (Rabiner, Cheng, Rosenberg, & McGonegal, 1976). Because pitch detection is so widely used, a variety of algorithms have been developed, each with its own approach to obtaining the pitch. The algorithms compared by Rabiner et al. (1976) are the following: the autocorrelation method, the cepstral method, the simplified inverse filtering technique, the parallel processing time-domain method, the data reduction method, the spectral flattening linear predictive coding method, and the average magnitude difference function method.
The seven pitch detection algorithms each approach the problem in their own way, but they all have the same goal.

Pitch detection is complicated. Rabiner, Cheng, Rosenberg, and McGonegal state that “[a]ccurate and reliable measurement of the pitch period of a speech signal from the acoustic pressure waveform alone is often exceedingly difficult for several reasons” (Rabiner et al., 1976). One reason is that the glottal excitation is a quasiperiodic signal, and “measuring the period of a speech waveform, which varies both in period and in the detailed structure of the waveform within a period, can be quite difficult” (Rabiner et al., 1976). In other words, the voice is never perfectly periodic. Rabiner, Cheng, Rosenberg, and McGonegal state that “[a] second difficulty in measuring pitch period is the interaction between the vocal tract and the glottal excitation. In some instances the formants of the vocal tract can alter significantly the structure of the glottal waveform so that the actual pitch period is difficult to detect” (Rabiner et al., 1976). The difficulty is greatest during rapid movements of the vocal tract, when the waveform changes quickly from one period to the next. The third problem is deciding exactly where each pitch period begins and ends. Lastly, the fourth difficulty is distinguishing a low-level voiced signal from one that has no pitch at all.

The autocorrelation method has its pros and cons when dealing with pitch detection. Rabiner states that “[t]he autocorrelation computation is made directly on the waveform and is a fairly straightforward (albeit time consuming) computation” (Rabiner, 1977). Although the algorithm does not have the best time complexity, it “is simply amenable to digital hardware implementation generally requiring only a single multiplier and an accumulator as the computational elements” (Rabiner, 1977).
The method works well for pitch detection on signals sent over communication channels, and it copes well with the distortion that transmission introduces. Despite the many capabilities of the autocorrelation method, several problems may arise when using it. Rabiner states that “one problem is to decide which of several autocorrelation peaks corresponds to the pitch period” (Rabiner, 1977). Windowing raises two further problems. Rabiner states that “[first] there is the problem of choosing an appropriate window. Second there is the problem that[...] no matter which window is selected, the effect of the window is to taper the autocorrelation function smoothly to 0 as the autocorrelation index increases” (Rabiner, 1977). In other words, peaks at larger lags are attenuated relative to the peak at lag zero, which makes them harder to identify. Finally, there is a problem with “choosing an appropriate analysis frame (window) size. The ideal analysis frame should contain from 2 to 3 complete pitch periods” (Rabiner, 1977). A frame of that length yields an estimate that is averaged over the periods it contains.

The aim of my algorithm is to detect the fundamental frequency of every string of a bass guitar. My implementation works only with standard tuning. Table 1 shows the fundamental frequencies of the open strings of an electric bass guitar.

The Implemented Algorithm

This section explains the algorithm outlined in Figure 2. The sampled waveform comes out of the analog-to-digital converter (ADC) and is split into frames. Each frame is then processed by the autocorrelation procedure, which works over the samples in the frame and evaluates a range of lags to expose the peaks that are candidates for the fundamental frequency. Finally, the peak finder procedure picks the peak most likely to correspond to F0.
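As an illustration of the first stage, the splitting step could be sketched in Java as below. This is only a sketch under stated assumptions, not the paper's implementation: the class and method names are hypothetical, the 44,100 Hz sampling rate is an assumed value, and the frame length is chosen to hold three periods of the lowest open string (30.868 Hz), following the two-to-three-period guideline quoted from Rabiner (1977).

```java
import java.util.Arrays;

public class FrameSplitter {
    // Assumed values for illustration; the paper does not state a sampling rate.
    static final double SAMPLE_RATE_HZ = 44100.0;
    static final double LOWEST_F0_HZ = 30.868; // open B string

    /** Number of samples needed to hold the given number of periods of f0Hz. */
    public static int frameLength(double sampleRateHz, double f0Hz, int periods) {
        return (int) Math.ceil(periods * sampleRateHz / f0Hz);
    }

    /** Splits samples into consecutive, non-overlapping frames; a trailing partial frame is dropped. */
    public static double[][] split(double[] samples, int frameLength) {
        int count = samples.length / frameLength;
        double[][] frames = new double[count][];
        for (int f = 0; f < count; f++) {
            frames[f] = Arrays.copyOfRange(samples, f * frameLength, (f + 1) * frameLength);
        }
        return frames;
    }

    public static void main(String[] args) {
        int length = frameLength(SAMPLE_RATE_HZ, LOWEST_F0_HZ, 3);
        double[] oneSecond = new double[(int) SAMPLE_RATE_HZ]; // silent placeholder input
        System.out.println("Frame length in samples: " + length);
        System.out.println("Full frames in one second: " + split(oneSecond, length).length);
    }
}
```

Sizing the frame for the lowest string means every higher string fits at least three periods as well, at the cost of a slower update rate for the tuner.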
Figure 2 illustrates how pitch detection starts with a sampled waveform, passes through the processing procedures, and ends with an estimate of the fundamental frequency F0.

sampled waveform -> Split Procedure -> frames -> Windowing Procedure -> tapered frames -> Autocorrelation Procedure -> autocorrelation -> Peak Finder Procedure -> F0 estimate

Figure 2. Algorithm overview.

Windowing and Filtering Procedure

Programs that deal with streams of data usually have a windowing procedure that splits an incoming stream into smaller parts called windows, so that algorithms can process the data piecemeal. This makes it possible to handle indefinitely long streams, such as the continuous audio input used by my tuning program. Windowing procedures may also modify the data to make them easier to process. Two types of windowing functions are commonly used: the Hanning window, which is more responsive when working with 3 periods per analysis window, and the Gaussian window, which is better when working with a larger analysis window (Boersma, 1993). My windowing procedure applies a Gaussian window function, which decreases the energy at the edges of the window and thereby facilitates analysis. The procedure Gaussian-Filter(A) below takes an array A holding the samples of one window and applies the filter by multiplying each sample by the value of the Gaussian function at that position.

Gaussian-Filter(A)
    N = A.length
    σ = 0.5
    for n = 0 to N − 1
        x = −(1/2) · (((N − 1)/2 − n) / (σ(N − 1)/2))^2
        A[n] = e^x · A[n]

Autocorrelation Procedure

The correlation of two waveforms is a measure of their similarity. The waveforms are compared by keeping one of them fixed and delaying the other by τ samples. Autocorrelation is the correlation of a waveform with itself.
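Before turning to the definition, here is how the Gaussian-Filter procedure from the windowing section might look in Java. It is a minimal sketch that assumes the standard Gaussian window centered on the middle of the frame, with σ expressed as a fraction of half the frame length; the class and method names are hypothetical.

```java
public class GaussianWindow {
    /** Multiplies each sample in place by a Gaussian window centered on the frame. */
    public static void apply(double[] a, double sigma) {
        int n = a.length;
        if (n < 2) {
            return; // a single sample has no edges to taper
        }
        double half = (n - 1) / 2.0; // position of the window center
        for (int i = 0; i < n; i++) {
            // Distance from the center, scaled so the edges sit at +/- 1 / sigma.
            double z = (i - half) / (sigma * half);
            a[i] *= Math.exp(-0.5 * z * z);
        }
    }

    public static void main(String[] args) {
        double[] frame = {1, 1, 1, 1, 1};
        apply(frame, 0.5);
        // The center sample is unchanged and the values fall off symmetrically.
        for (double v : frame) {
            System.out.printf("%.4f%n", v);
        }
    }
}
```

With σ = 0.5 the edge samples are scaled by e^(−2), roughly 0.135, so the frame fades toward its boundaries without being forced all the way to zero.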
The mathematical definition of the autocorrelation function is shown in the equation below:

\[ R_x(\tau) = \sum_{n=0}^{N-1-\tau} x[n]\, x[n+\tau] \tag{1} \]

where N is the total number of samples in the window, τ is the delay, and x[n] is the nth sample in the window. The autocorrelation is expected to have its maximum at τ = 0 (when the two copies are identical), and the dissimilarity is expected to increase as τ increases, with local maxima along the way.

I have defined the procedure Autocorrelation0(A, τ) below, which takes an array A holding the samples of one window and an integer delay τ. Note that every increase in τ requires one additional sample; thus, for a window of length N and a τ that can go up to N, 2N samples are needed. To allow for this without requiring complicated look-ahead logic, the procedure correlates only the first half of the provided array against the delayed samples, so that the index i + τ never leaves the array.

Autocorrelation0(A, τ)
    sum = 0
    for i = 0 to ⌊A.length/2⌋ − 1
        sum = sum + A[i] · A[i + τ]
    return sum

I would like to store the autocorrelation results for every τ from 0 to ⌊A.length/2⌋ in order to later find the maxima, which will be the F0 candidates. Thus, I defined the procedure Autocorrelation(A) below, which returns an array containing the autocorrelated waveform.

Autocorrelation(A)
    R is a new array of size ⌊A.length/2⌋ + 1
    for τ = 0 to ⌊A.length/2⌋
        R[τ] = Autocorrelation0(A, τ)
    return R

Peak Finding Procedure

Once I have computed the autocorrelation, the next task is to find a local maximum R[τmax] where τmax > 0 (King, n.d.). In the ideal case, the lag τmax of that local maximum is the period of the original waveform measured in samples, and from it I can easily estimate

\[ F_0 = \frac{\text{sampling frequency}}{\tau_{max}} \]

More commonly, though, the local maximum will correspond to a harmonic of F0 (Rabiner et al., 1976).
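The two autocorrelation procedures and the conversion from lag to frequency can be sketched in Java as follows. This is an illustrative sketch, not the tuner's actual code: the class and method names are hypothetical, the 8,000 Hz sampling rate in main is an assumed value, and the inner loop stops one short of the midpoint so that i + τ always stays inside the array.

```java
public class AutocorrelationSketch {
    /** Correlates the first half of a with itself delayed by tau (0 <= tau <= a.length / 2). */
    public static double autocorrelation0(double[] a, int tau) {
        int half = a.length / 2;
        double sum = 0.0;
        for (int i = 0; i < half; i++) {
            sum += a[i] * a[i + tau];
        }
        return sum;
    }

    /** Returns the autocorrelation for every lag from 0 to a.length / 2 inclusive. */
    public static double[] autocorrelation(double[] a) {
        int half = a.length / 2;
        double[] r = new double[half + 1];
        for (int tau = 0; tau <= half; tau++) {
            r[tau] = autocorrelation0(a, tau);
        }
        return r;
    }

    /** Converts a lag in samples to a frequency in Hz. */
    public static double lagToFrequency(int tauMax, double sampleRateHz) {
        return sampleRateHz / tauMax;
    }

    public static void main(String[] args) {
        // A toy waveform that repeats every 4 samples.
        double[] wave = {1, 0, -1, 0, 1, 0, -1, 0};
        double[] r = autocorrelation(wave);
        for (int tau = 0; tau < r.length; tau++) {
            System.out.println("R[" + tau + "] = " + r[tau]);
        }
        // With an assumed sampling rate of 8000 Hz, a period of 4 samples is 2000 Hz.
        System.out.println(lagToFrequency(4, 8000.0) + " Hz");
    }
}
```

For the toy waveform the value at lag 4 equals the value at lag 0, because a delay of one full period lines the waveform up with itself again.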
Better peak-finding algorithms exist that can ignore harmonics (Tan & Karnjanadecha, 2003), but for simplicity and to avoid premature optimization I implemented the simplest possible peak-finding algorithm, shown below as the procedure Find-Peak(A), where A is the array containing the autocorrelations. The search starts at lag 1 because the value at lag 0 is always the largest.

Find-Peak(A)
    τmax = 1
    maxPeak = A[1]
    for τ = 2 to A.length − 1
        if A[τ] > maxPeak
            maxPeak = A[τ]
            τmax = τ
    return τmax

As Table 1 shows, the fundamental frequencies of the open strings range from about 30.9 Hz to 130.8 Hz. Tan and Karnjanadecha state that “[s]ince, the range of F0 is generally in the range of 80-500 Hz, then the frequency components above 500 Hz is useless [...] [t]hus a low-pass filter [...] above 500 Hz would be useful in improving the performance [...] we use the lowpass-filter with 900 Hz” (Tan & Karnjanadecha, 2003). Low-pass filtering of this kind is an important companion to the autocorrelation method when estimating F0.

Table 1
Fundamental frequencies of open strings in bass guitars (Green, n.d.).

Note    F0 (Hz)
B        30.868
E        41.204
A        55.000
D        73.416
G        97.999
C       130.813

There may be some confusion about whether a method applies to speech or to music, but the basis of both is sound. De Cheveigné and Kawahara state that “[s]ounds may be periodic yet “outside the existence region” of pitch [...] Conversely, a sound may not be periodic, but yet evoke a pitch [...] However, over a wide range pitch and period are in a one-to-one relation, to the degree that the word ‘pitch’ is often [...] F0” (de Cheveigné & Kawahara, 2002). In other words, speech and music can be treated alike when estimating the fundamental frequency. To see why this is true, one needs to understand how the autocorrelation function itself works.
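The frequency range of the open strings also suggests one simple refinement to the peak search: only lags between (sampling frequency / 130.813) and (sampling frequency / 30.868) can correspond to an open string, so the search can be limited to that range. The sketch below illustrates this idea; it is a variation offered for illustration, not the Find-Peak procedure itself, and the class name, the 8,000 Hz sampling rate, and the synthetic 55 Hz sine input are all assumptions.

```java
public class BoundedPeakFinder {
    /** Returns the lag in [minLag, maxLag] (clamped to valid indices >= 1) with the largest value. */
    public static int findPeak(double[] r, int minLag, int maxLag) {
        int lo = Math.max(1, minLag);
        int hi = Math.min(r.length - 1, maxLag);
        if (lo > hi) {
            throw new IllegalArgumentException("empty lag range");
        }
        int tauMax = lo;
        for (int tau = lo + 1; tau <= hi; tau++) {
            if (r[tau] > r[tauMax]) {
                tauMax = tau;
            }
        }
        return tauMax;
    }

    public static void main(String[] args) {
        double sampleRateHz = 8000.0;                          // assumed sampling rate
        int minLag = (int) Math.floor(sampleRateHz / 130.813); // highest open string
        int maxLag = (int) Math.ceil(sampleRateHz / 30.868);   // lowest open string

        // Synthetic test input: a 55 Hz sine standing in for the open A string.
        int n = 2 * maxLag + 2;
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = Math.sin(2 * Math.PI * 55.0 * i / sampleRateHz);
        }

        // Autocorrelation of the first half of x for lags 0 .. n / 2.
        int half = n / 2;
        double[] r = new double[half + 1];
        for (int tau = 0; tau <= half; tau++) {
            for (int i = 0; i < half; i++) {
                r[tau] += x[i] * x[i + tau];
            }
        }

        int tau = findPeak(r, minLag, maxLag);
        System.out.println("lag " + tau + " -> " + (sampleRateHz / tau) + " Hz");
    }
}
```

Because the lag is a whole number of samples, the printed frequency can only approximate 55 Hz; a lower bound on the lag also keeps the search away from the large values next to lag 0.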
Tan and Karnjanadecha state that “[g]iven a discrete time signal x(n), defined for all n, the auto-correlation function is generally defined as:

\[ R_x(m) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\, x(n+m) \]

[...]Thus, for pitch detection, if we assume x(n) is exactly periodic with period P, i.e., x(n) = x(n+P) for all n, then it is easily shown that: Rx(m) = Rx(m + P), i.e, the autocorrelation is also periodic with the same period” (Tan & Karnjanadecha, 2003). The autocorrelation of a periodic signal is therefore itself a periodic waveform, which can be analyzed further. To understand this concept, one must know that any periodic sound has a fundamental period: the time that one cycle of the sound takes to complete. This period determines the lag at which a shifted copy of the signal lines up with the original again. De Cheveigné and Kawahara state that “[t]he autocorrelation method compares the signal to its shifted self” (de Cheveigné & Kawahara, 2002). In other words, the original waveform is compared with a shifted version of itself. King states that “auto means –itself, and correlation means –similarity” (King, n.d.). The purpose of the comparison is to estimate the fundamental frequency using the autocorrelation formula: each sample of the original waveform is multiplied by the sample of the copy that sits one lag (or shift) further along, and the products are added together. In addition, Boersma states that “[f]or zero lag, we have rx (0) = rH (0) + rN (0), and if the noise is white[...]we find a local maximum at a lag τmax = T0 [...]” (Boersma, 1993). That is, the global maximum is always at lag 0, where the waveform is compared with an unshifted copy of itself, and the next prominent local maximum falls at a lag equal to the period T0.
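The periodicity property quoted from Tan and Karnjanadecha can be checked in one line from their definition. The short derivation below uses only the assumption stated in the quotation, namely that x(n) = x(n + P) for all n; the last line adds the observation that the value at lag P equals the value at lag 0.

```latex
\begin{align*}
R_x(m + P) &= \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\, x(n + m + P) \\
           &= \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\, x(n + m)
              && \text{since } x(n + m + P) = x(n + m) \\
           &= R_x(m), \\
\text{and in particular}\quad R_x(P) &= R_x(0).
\end{align*}
```

This is why a peak as high as the one at lag 0 is expected at a lag of one period for a perfectly periodic signal; for windowed, finite signals the later peak is lower, as noted in the discussion of window tapering.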
The following Java program applies the autocorrelation function to a small array of samples:

    import java.io.File;
    import java.io.IOException;
    import java.util.Locale;
    import java.util.Scanner;

    public class Autocorrelation {
        public static void main(String[] args) throws IOException {
            File file = new File("SampleOriginal.txt");
            double[] sample = new double[6];
            double[] copy = new double[sample.length];

            // Read up to six values and keep an identical copy to shift against.
            try (Scanner input = new Scanner(file).useLocale(Locale.US)) {
                int i = 0;
                while (i < sample.length && input.hasNextDouble()) {
                    double num = input.nextDouble();
                    sample[i] = num;
                    copy[i] = num;
                    i++;
                }
            }

            // Only lags up to half of the last index are evaluated.
            int maxLag = (sample.length - 1) / 2;
            for (int lag = 0; lag <= maxLag; lag++) {
                System.out.printf(Locale.US, "%.2f%n", auto(sample, copy, lag));
            }
        }

        // Sums sample[n] * copy[n + lag] over every n for which both indices exist.
        public static double auto(double[] sample, double[] copy, int lag) {
            double sum = 0;
            for (int n = 0; n + lag < copy.length; n++) {
                sum += sample[n] * copy[n + lag];
            }
            return sum;
        }
    }

Contents of SampleOriginal.txt (bass F0 values, saved with Notepad):

    30.8
    41.2
    55.0
    73.2
    98.0
    130.8

Output:

    37741.96
    27552.96
    19674.40

The program was tested on an array of six values, the fundamental frequencies of the open strings, which stand in here for the samples of a waveform; the three numbers printed are the autocorrelation values at lags 0, 1, and 2. Notice that the program begins by reproducing the input in a second array. Next, it sets the largest lag to half of the last index of the array. This limit ensures that every autocorrelation value is computed from a substantial share of the samples. As stated before, the lags therefore cover at most half of the window. The for-loop in main repeats the call to the auto method once per lag and never exceeds this limit. Within the method, the index into the copy starts at the current lag (0 on the first call, where the copy lines up exactly with the original), while the index into the original starts at 0.
Notice that inside the method the loop runs to the end of the copied samples rather than stopping at the half-length limit used for the lags. This is necessary so that every available pair of samples contributes to each autocorrelation value, as in Equation 1. The shifted copy is thus what makes the comparison possible. De Cheveigné and Kawahara state of their own autocorrelation-based estimator, YIN, that “[t]he algorithm has few parameters, and these do not require fine tuning. In contrast to most other methods, no upper limit need be put on the F0 search range[...] [Making this method] relatively simple[...] [And] implemented efficientl[y]” (de Cheveigné & Kawahara, 2002).

References

Boersma, P. (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Proceedings of the Institute of Phonetic Sciences (17), 97-110.

de Cheveigné, A., & Kawahara, H. (2002, April). YIN, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am., 111, 1917-1930.

Green, G. (n.d.). Frequencies and ranges. Retrieved 2015-11-05, from http://www.contrabass.com/pages/frequency.html

King, S. (n.d.). Autocorrelation for estimating F0. Retrieved 2015-11-05, from http://speech.zone/autocorrelation/

Media, J. D. (n.d.). An introduction to digital audio. Retrieved 2015-11-05, from http://www.jiscdigitalmedia.ac.uk/guide/an-introduction-to-digital-audio

Rabiner, L. R. (1977, February). On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-25 (1), 24-33.

Rabiner, L. R., Cheng, M. J., Rosenberg, A. E., & McGonegal, C. A. (1976, October). A comparative performance study of several pitch detection algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24 (6), 399-418.

Tan, L., & Karnjanadecha, M. (2003). Pitch detection algorithm: Autocorrelation method and AMDF. Proceedings of the 3rd International Symposium on Communications and Information Technology, 2, 551-556.