Lab: Objective Measurement of Breathy Voice

Background

Breathy voice (also known as murmur) is one of the most common symptoms of voice disorders, both organic and functional (Aronson, 1971; Aronson, 1990; Boone & McFarlane, 1988; Colton & Casper, 1990). There is also evidence that breathiness is associated with aging (Hollien, 1987; Ryan & Burk, 1974), and there is a tendency for women, on average, to produce somewhat breathier voices than men (Klatt & Klatt, 1990; McKay, 1987). In terms of the underlying laryngeal vibratory pattern, two major features that are associated with breathy voice are described below.

1. Aspiration noise. The most obvious aerodynamic feature of breathy voice is that air escapes during the “closed” phase of the vibratory cycle. (The word closed is in quotation marks because, in the case of breathy voice, the vocal folds do not entirely meet at midline during the portion of the phonatory cycle in which glottal area reaches a minimum.) There is a surprising variety of laryngeal configurations that can be responsible for this air leakage, but the most commonly described (e.g., Södersten et al., 1995) is the posterior glottal (or glottic) chink, in which the anterior portion of the vocal folds periodically meets at (or near) midline (producing the buzzy component of breathy voice) while the posterior portion of the vocal folds remains open (producing the aperiodic, hissy component of breathy voice). This hissy component is called aspiration or aspiration noise. (This use of the term aspiration is entirely distinct from the use of this word in swallowing, where it refers to the entry of liquids or other unwanted gunk into the trachea and lungs.) Air leakage during the phonatory cycle results in turbulence, which is heard as noise.

Figure 1. Glottal source waveforms for breathy and non-breathy voice.
The spectrum of aspiration noise (the hissy component of breathy voice) is stronger in the mid and high frequencies than it is in the lower frequencies. The spectrum of the buzzy (periodic) component of breathy voice, on the other hand, is exactly the opposite; i.e., it is stronger in the lower frequencies than it is in the highs. For this reason, aspiration is more easily seen in the mid and high frequencies, while the opposite is true of the periodic/harmonic component. These features can be seen in the spectra of Figure 2.

The presence of aspiration can also be seen in the time domain: the waveform of a clearly phonated (modal) voice will appear highly periodic, and the degree of periodicity will decrease as the voice becomes more breathy. However, these differences in waveform periodicity are not always easy to see by eye, whereas they can usually be seen easily in the spectrum. For example, compare the narrow band spectra for the clear (non-breathy) and breathy voices in Figure 3. Notice that the degree of harmonic organization is much greater for the non-breathy voice; i.e., most of the energy is at harmonically related frequencies. The breathy voice, on the other hand, shows a reasonable degree of harmonic organization mainly in the low frequencies (see the glottal source spectra in the lower part of Figure 2). Differences in the degree of harmonic organization can also be seen in the output spectra (as opposed to source spectra) of Figure 3.

Figure 2. Airflow functions and spectra for non-breathy (left) and breathy (right) voice. Notice the more rounded glottal wave for breathy voice, which produces a spectrum with a stronger 1st harmonic or, to state it the other way around, less energy spread into the higher frequencies.

Figure 3. Narrow band spectra for clear phonation (left) and for breathy voice (right).
Notice that the degree of harmonic organization is much greater for the non-breathy voice on the left; i.e., most of the energy is at harmonically related frequencies, while the breathy voice shows a reasonable degree of harmonic organization mainly in the low frequencies. Also, note the strong 1st harmonic in the breathy signal (more on this below).

2. Rounding of the glottal source waveform for breathy voice. In non-breathy phonation in modal register, airflow rises gradually to a peak during the opening phase, but typically falls more abruptly during the closing phase (Figure 1). In breathy voice, however, the closing phase of the glottal source function is more gradual, producing a more rounded source signal. This can be seen in the top of Figure 2, which shows a more rounded (i.e., more sinusoidal) source waveform for the breathy voice. Once again, these breathiness-related differences in the degree of rounding of the glottal signal are more easily observed in the spectrum than in the time domain, in this case by measuring the relative amplitude of the first harmonic (H1).

Here is the reasoning: For a perfect sinusoid, all of the energy in the signal is at H1, with no spread of energy into the higher frequencies. (One definition of a sinusoid is that changes over time are as smooth as they can possibly be. As we discussed in the section on basic acoustics, the sinusoid is the extension over time of motion around a circle, with a circle being the smoothest shape possible.) As the source waveform becomes more abrupt (i.e., more like an impulse), the spread of energy to higher frequencies increases, meaning that the relative amplitude of H1 will decrease. The bottom line is that we would generally expect to see higher amplitude first harmonics for more breathy voices and weaker first harmonics for less breathy voices. Compare the first harmonic amplitudes for the breathy and non-breathy voices in Figures 2 and 3.
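The H1 reasoning above can be checked numerically. The sketch below is only an illustration under stated assumptions: the two pulse shapes (a raised cosine standing in for a rounded, breathy-like source and a slow-rise, fast-fall pulse standing in for a more abrupt, modal-like source) and the names `harmonic_amplitudes` and `h1_share` are made up for this example and are not taken from the lab software.

```python
import cmath
import math


def harmonic_amplitudes(cycle, n_harmonics):
    """DFT magnitudes of harmonics 1..n_harmonics for one period of a waveform."""
    n = len(cycle)
    amps = []
    for k in range(1, n_harmonics + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(cycle))
        amps.append(abs(total) / n)
    return amps


def h1_share(cycle, n_harmonics=20):
    """Fraction of the energy in harmonics 1..n_harmonics that sits at H1."""
    energy = [a * a for a in harmonic_amplitudes(cycle, n_harmonics)]
    return energy[0] / sum(energy)


N = 200  # samples per cycle (arbitrary choice for the illustration)

# Rounded pulse: one period of a raised cosine (assumed breathy-like shape).
rounded = [0.5 * (1.0 - math.cos(2.0 * math.pi * i / N)) for i in range(N)]


def abrupt_pulse(i, rise_end=100, fall_end=120):
    """Slow linear rise, fast linear fall, then zero flow (assumed modal-like shape)."""
    if i < rise_end:
        return i / rise_end
    if i < fall_end:
        return 1.0 - (i - rise_end) / (fall_end - rise_end)
    return 0.0


abrupt = [abrupt_pulse(i) for i in range(N)]

print("H1 share, rounded pulse:", round(h1_share(rounded), 4))
print("H1 share, abrupt pulse: ", round(h1_share(abrupt), 4))
```

The rounded pulse should put essentially all of its harmonic energy at H1, while the abrupt pulse should spread a noticeable part of it into the higher harmonics, which is the pattern the paragraph above predicts.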
In Figures 2 and 3, which of the spectra show stronger 1st harmonics: the breathy or the non-breathy?

The Lab

The lab uses ten sustained [a] vowels out of 25 voice samples that were used many years ago in a study of breathy vocal quality in dysphonic speakers (Hillenbrand & Houde, 1996). These voice samples, in turn, were drawn from a large database of recordings that were made at Massachusetts Eye and Ear Infirmary by Robert Hillman. The samples that we picked were intended to represent a fairly broad range of breathiness percepts, ranging from clear phonation to very breathy voice.

Procedure

1. Open SpeechTool/Ztool, then use the File menu to open ‘br01.wav’ (‘c:\ztool\br01.wav’; on the LRC/Open Lab machines, it’s ‘r:\ztool\br01.wav’).

2. Play the signal as many times as you wish and rate how breathy the voice is on a scale of 1 to 5, with 5 being the most breathy. Record your rating for this signal. So, the row of data for this signal will look like this:

   br01.wav 3 (or whatever)

*** LRC/Open Lab Users: Go to the page with the heading For LRC/Open Lab Users ***

3. Toward the end of the string of buttons at the top, you will see one called ‘CPP’. Clicking this button will run a program that estimates how periodic the signal is by measuring the degree of harmonic organization in its narrow band spectrum. The very last number that the program gives you is called “Mean CPPS”. Larger CPP values indicate a higher degree of harmonic organization (i.e., a more periodic signal). Consequently, small values of CPP should be associated with breathier voices. Write this number down in the same row as your breathiness rating. Your row of data should now look something like this:

   br01.wav 3 0.71

4. Do the same thing for the remaining nine signals. In your table of results, you should have your breathiness rating (1-5) and a CPP value for each test signal.

5. Below is a table of breathiness ratings for each signal.
   br01.wav   8.22
   br02.wav   5.21
   br03.wav   2.12
   br04.wav   2.86
   br05.wav   5.33
   br06.wav   1.51
   br07.wav   7.94
   br08.wav   3.42
   br09.wav   4.07
   br10.wav   4.87

These are very much like the breathiness ratings (BR) that you made, except that these ratings are averages from a panel of 21 listeners doing pretty much what you did. (These values vary from ~1.5 to ~8.2 instead of 1-5, but this doesn’t matter, since a correlation does not depend on the units of either scale.) Copy these breathiness ratings into a new column of the table you created. So, each row in your table will have, in this order: (1) the name of the signal, (2) your breathiness rating, (3) the CPP value, and (4) the average rating from the 21-listener panel.

Using Word, create a file called ‘brdata.txt’ in the ztool folder containing all of this information (filename, your BR, CPP, and panel BR, for all 10 signals). Set the font to Courier and use the space bar only, not the Tab key. SAVE YOUR FILE AS PLAIN TEXT (File > Save as > choose plain text from the drop-down menu under Save as type, using the name ‘brdata.txt’. If Word asks you about “text encoding”, just leave it at the Windows default setting.). Your results file should be in this format:

   br01.wav 2 0.54 8.22
   br02.wav 4 3.44 5.21
   br03.wav 1 8.54 2.12
   . . .

Notes: (1) The numbers in columns 2 and 3 are just made up. (2) Do not create a Word-formatted table; just type the file name and the three numbers on each line. (3) Do not use any column headings and do not enter any text (e.g., “dB”) other than the filename and the three numbers in each row.

6. The last step is to measure correlations between: (1) your BR and the panel BR (columns 2 and 4), (2) your BR and CPP (columns 2 and 3), and (3) the panel BR and CPP (columns 3 and 4). A correlation is a measure of the strength of the relationship between two sets of numbers (see footnote 1). The easiest way to measure a correlation happens to be the most arcane, but it’s not that bad:

a.
Hold down the Windows key (the one with the flag-looking thing on it) and hit ‘R’.

b. Type ‘cmd’ into the text box that pops up.

c. Put your cursor in the black window that appears and type:

      cd c:\ztool <ENTER>

   On the LRC/Open Lab machines:

      r: <ENTER>
      cd r:\ztool <ENTER>

d. Let’s assume you want to measure the correlation between your BR (col 2) and CPP (col 3). Type this arcane thing:

      .\tcor brdata.txt 2 brdata.txt 3

   This measures the correlation between column 2 and column 3. (The weird ‘.\’ thing has to be there. It needs to be a backslash (‘\’) and not a forward slash (‘/’).) ‘tcor’ will type out a bunch of stuff; the only numbers you need are the values for ‘r’ and ‘rsq’ (r², aka variance explained); e.g.:

      r:   -0.92022
      rsq:  0.84680
      see:  0.52258

   Do the same thing for the two other correlations that you need.

Results:

                                                    r          r²
   correlation between your BR and the panel BR   _______   _______
   correlation between your BR and CPP            _______   _______
   correlation between the panel BR and CPP       _______   _______

Footnote 1: Correlations vary from 0 to +1 for positive relationships (large values on one variable tend to be associated with large values on the other variable) or from 0 to -1 for negative relationships (large values on one variable are associated with small values on the other variable).

Questions:

1. How well do your breathiness ratings agree with the panel ratings? Note that the more important measure of the strength of a relationship is rsq (r²) rather than r: for example, an r value of 0.8 is not 80% of perfect (it corresponds to an rsq of only 0.64), but an rsq value of 0.8 is 80% of perfect.

2. How well do the CPP measures predict your breathiness ratings?

3. How well do the CPP measures predict the panel breathiness ratings?

4. Why is the correlation between breathiness ratings and CPP negative? (If you’re not sure, see footnote 1.)

5. What do you make of all this? For example, is there any advantage to using this measure of periodicity in place of your own subjective estimate?

6.
Look at the figures on the last two pages of this document and read the description at the top of the page. Pick the two spectra that seem to show the most harmonic organization, and the two spectra that seem to show the least harmonic organization. Record your results below, along with the panel breathiness rating and the CPP value for each signal:

   Signal with the most harmonic organization
      File name (e.g., br09)   Panel BR   CPP
      ________________         ______     ____

   Signal with the second most harmonic organization
      File name                Panel BR   CPP
      ________________         ______     ____

   Signal with the least harmonic organization
      File name                Panel BR   CPP
      ________________         ______     ____

   Signal with the second least harmonic organization
      File name                Panel BR   CPP
      ________________         ______     ____

7. Last question: Do the voices that you judged to have the most harmonic organization tend to be among the signals with: (a) the lowest breathiness ratings and/or (b) the largest CPP values?

For LRC/Open Lab Users

Before you start: You can only run this lab on machines with an R: drive. Right now I don’t know which machines have one, so for now you’re stuck with trial and error: Click the Computer icon on the desktop and look for an R: drive. If you don’t see it, try another machine. Somebody at the front desk might be able to help you, but I wouldn’t count on it.

1. While holding down the Windows key (the one with the flag), type ‘R’.

2. In the text box that pops up, type:

      cmd <Enter>

   A black window will appear. Type:

   a. r: <Enter> (this switches you to the ‘r:’ disk, which is where the stuff is)
   b. cd r:\ztool <Enter>
   c. cpps br01.wav vowel <Enter> (or br02.wav or br03.wav … br10.wav)

3. The program will type out a bunch of numbers. The value that you need is the last one that cpps reports: Mean CPPS. Write it down. (For ‘br01.wav’, it should be a small number pretty close to zero, meaning that the signal is only marginally periodic.)

4. Do the same thing for br02.wav, br03.wav … br10.wav.

5. Return to step 5 under Procedure.
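If you would like to double-check the r and rsq values from step 6 of the Procedure, here is a minimal sketch of the standard Pearson correlation applied to text in the ‘brdata.txt’ format. This is an assumption-laden illustration: the handout does not describe how ‘tcor’ works internally, the function names are made up for this example, and the three data rows are the made-up sample rows shown in step 5, not real measurements.

```python
import math


def pearson_r(xs, ys):
    """Standard Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def read_rows(text):
    """Split space-separated text into rows of fields, skipping blank lines."""
    return [line.split() for line in text.splitlines() if line.strip()]


def column(rows, col):
    """Return column `col` as floats, counting columns from 1 as the lab does."""
    return [float(row[col - 1]) for row in rows]


# The made-up sample rows from step 5: filename, your BR, CPP, panel BR.
sample = """\
br01.wav 2 0.54 8.22
br02.wav 4 3.44 5.21
br03.wav 1 8.54 2.12
"""

rows = read_rows(sample)
r = pearson_r(column(rows, 3), column(rows, 4))  # CPP vs. panel BR
rsq = r * r
print("r:  ", round(r, 5))
print("rsq:", round(rsq, 5))
```

To use it on your own data, replace the `sample` string with the contents of your ‘brdata.txt’ file and change the two column numbers to the pair you want to compare.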
REFERENCES

Aronson, A.E. (1971). Early motor unit disease masquerading as psychogenic breathy dysphonia: A clinical case presentation. Journal of Speech and Hearing Disorders, 36, 115-124.

Aronson, A.E. (1990). Clinical voice disorders (3rd ed). New York: Thieme.

Boone, D.R., and McFarlane, S.C. (1988). The voice and voice therapy (4th ed). Englewood Cliffs, NJ: Prentice Hall.

Colton, R.A., and Casper, J.K. (1990). Understanding voice problems: A physiological perspective for diagnosis and treatment. Baltimore: Williams and Wilkins.

Hillenbrand, J.M., and Houde, R.A. (1996). Acoustic characteristics of breathy vocal quality: Dysphonic voices and continuous speech. Journal of Speech and Hearing Research, 39, 311-321.

Hollien, H. (1987). "Old voices": What do we really know about them? Journal of Voice, 1, 2-17.

Ryan, W.J., and Burk, K.W. (1974). Perceptual and acoustic correlates of aging in the speech of males. Journal of Communication Disorders, 1, 181-192.

Södersten, M., Hertegård, S., and Hammarberg, B. (1995). Glottal closure, transglottal airflow, and voice quality in healthy middle-aged women. Journal of Voice, 9, 182-197.

Narrow Band Spectra of the Test Signals

The figures below are narrow band amplitude spectra of the ten test signals. Notice that the spectra vary quite a bit in the degree of harmonic organization, which reflects how periodic the signal is. For example, for br06 nearly all of the energy is at harmonic frequencies (whole number multiples of f0). The same is true of br03, though to a somewhat lesser extent. The spectra of some of the other signals, however, show a good deal of energy at non-harmonic frequencies; e.g., br01, br05, br07, and br10. The CPP algorithm attempts to measure these variations in harmonic organization, with large CPP values reflecting a high degree of harmonic organization (i.e., high periodicity). The assumption is that signals with large CPP values tend to be less breathy, and vice versa.
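To make the idea of harmonic organization concrete, here is a rough sketch that measures what fraction of a signal's spectral energy falls at whole number multiples of f0. To be clear, this is not the CPP algorithm used by the lab software; it is a much cruder stand-in, and the synthetic signals, the white-noise "hiss" (real aspiration noise is weighted toward the mid and high frequencies), and the names `dft_bin` and `harmonic_fraction` are all assumptions made for this illustration.

```python
import cmath
import math
import random


def dft_bin(signal, k):
    """Complex DFT value of `signal` at bin k."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n)
               for i, x in enumerate(signal))


def harmonic_fraction(signal, cycles_per_frame):
    """Share of spectral energy (bins 1 up to n/2 - 1) at multiples of f0.

    Assumes the frame holds a whole number of cycles, so harmonic k*f0
    falls exactly on bin k * cycles_per_frame.
    """
    n = len(signal)
    harmonic = 0.0
    total = 0.0
    for k in range(1, n // 2):
        power = abs(dft_bin(signal, k)) ** 2
        total += power
        if k % cycles_per_frame == 0:
            harmonic += power
    return harmonic / total


N = 400       # samples in the analysis frame (arbitrary)
CYCLES = 10   # whole cycles of f0 in the frame, so f0 sits at bin 10

# "Buzz": five harmonics whose amplitude falls off with harmonic number.
buzz = [sum(math.sin(2.0 * math.pi * h * CYCLES * i / N) / h
            for h in range(1, 6))
        for i in range(N)]

# "Hiss": white Gaussian noise from a seeded generator for repeatability.
rng = random.Random(0)
hiss = [rng.gauss(0.0, 0.5) for _ in range(N)]

clear = buzz
breathy = [b + h for b, h in zip(buzz, hiss)]

print("harmonic fraction, clear:  ", round(harmonic_fraction(clear, CYCLES), 3))
print("harmonic fraction, breathy:", round(harmonic_fraction(breathy, CYCLES), 3))
```

The clear signal should come out with nearly all of its energy at harmonic frequencies and the noisy one with less, which is the same direction of difference that the narrow band spectra show between signals such as br06 and br01.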