Lab: Objective Measurement of Breathy Voice
Background
Breathy voice (also known as murmur) is one of the most common symptoms of voice
disorders, both organic and functional (Aronson, 1971; Aronson, 1990; Boone & McFarlane,
1988; Colton & Casper, 1990). There is also evidence that breathiness is associated with aging
(Hollien, 1987; Ryan & Burk, 1974), and women tend, on average, to
produce somewhat breathier voices than men (Klatt & Klatt, 1990; McKay, 1987).
In terms of the underlying laryngeal vibratory pattern, two major features that are associated
with breathy voice are described below.
1. Aspiration noise. The most obvious aerodynamic feature of breathy voice is that air escapes
during the “closed” phase of the vibratory cycle. (The word closed is in quotation marks
because, in the case of breathy voice, the vocal folds do not entirely meet at midline during
the portion of the phonatory cycle in which glottal area reaches a minimum.) There is a
surprising variety of laryngeal configurations that can be responsible for this air leakage, but
the most commonly described (e.g., Södersten et al., 1995) is the posterior glottal (or glottic)
chink, in which the anterior portion of the vocal folds periodically meet at (or near) midline
(producing the buzzy component of breathy voice) while the posterior portion of the vocal
folds remains open (producing the aperiodic, hissy component of breathy voice). This hissy
component is called aspiration or aspiration noise. (This use of the term aspiration is
entirely distinct from the use of this word in swallowing, referring to the entry of liquids or
other unwanted gunk into the trachea and lungs.) Air leakage during the phonatory cycle
results in turbulence, which is heard as noise.
Figure 1. Glottal source waveforms for breathy and non-breathy voice.
The spectrum of aspiration noise – the hissy component of breathy voice – is stronger in the
mid- and high-frequencies than it is in the lower frequencies. The spectrum of the buzzy
(periodic) component of breathy voice, on the other hand, is exactly the opposite; i.e., it is
stronger in the lower frequencies than it is in the highs. For this reason, aspiration is more easily
seen in the mid- and high-frequencies, while the opposite is true of the periodic/harmonic
component. These features can be seen in the spectra of Figure 2.
The presence of aspiration can be seen in the time domain: the waveform of a clearly
phonated (modal) voice will appear highly periodic, and the degree of periodicity will decrease
as the voice becomes more breathy. However, these differences in waveform periodicity are not
always easy to see by eye – but they can usually be easily seen in the spectrum. For example,
compare the narrow band spectra for the clear (non-breathy) and breathy voices in Figure 3.
Notice that the degree of harmonic organization is much greater for the non-breathy voice; i.e.,
most of the energy is at harmonically related frequencies. The breathy voice, on the other hand,
shows a reasonable degree of harmonic organization mainly in the low frequencies (see the
glottal source spectra in the lower part of Figure 2). Differences in the degree of harmonic
organization can also be seen in the output spectra (as opposed to source spectra) of Figure 3.
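The idea of "harmonic organization" can be made concrete with a small synthetic sketch. This is not the lab software and not real voice data: the sampling rate, the 100 Hz fundamental, the 1/k harmonic rolloff, and the use of differenced white noise as a stand-in for aspiration are all assumptions chosen for illustration. The sketch mixes a periodic "buzz" with a "hiss" whose energy is tilted toward the higher frequencies, then measures what fraction of the spectral power falls exactly on multiples of the fundamental.

```python
import numpy as np

FS = 8000      # sampling rate in Hz (assumed)
F0 = 100       # fundamental frequency in Hz (assumed)
N = FS         # one second of signal, so FFT bins are exactly 1 Hz apart

def buzz():
    """Periodic component: harmonics of F0 with amplitude falling as 1/k."""
    t = np.arange(N) / FS
    ks = np.arange(1, FS // (2 * F0))            # harmonic numbers below Nyquist
    return sum(np.sin(2 * np.pi * k * F0 * t) / k for k in ks)

def hiss(rng):
    """Aperiodic component: differenced white noise (energy tilted upward)."""
    return np.diff(rng.standard_normal(N + 1))

def harmonic_fraction(x, lo_hz=0, hi_hz=FS // 2):
    """Fraction of power in (lo_hz, hi_hz] that sits on multiples of F0."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.arange(power.size)                # bin index equals Hz here
    in_band = (freqs > lo_hz) & (freqs <= hi_hz)
    on_harmonic = in_band & (freqs % F0 == 0)
    return power[on_harmonic].sum() / power[in_band].sum()

rng = np.random.default_rng(0)
clear = buzz() + 0.05 * hiss(rng)                # mostly buzz
breathy = buzz() + 1.0 * hiss(rng)               # buzz plus strong hiss

clear_all = harmonic_fraction(clear)
breathy_all = harmonic_fraction(breathy)
breathy_low = harmonic_fraction(breathy, 0, 1000)
breathy_high = harmonic_fraction(breathy, 2000, 4000)

print("clear, whole spectrum:  ", round(clear_all, 3))
print("breathy, whole spectrum:", round(breathy_all, 3))
print("breathy, below 1 kHz:   ", round(breathy_low, 3))
print("breathy, above 2 kHz:   ", round(breathy_high, 3))
```

In this toy example the "breathy" mixture keeps most of its low-frequency power on the harmonics, while the upper band is dominated by noise, which is the pattern the text describes for the breathy spectra.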
Figure 2. Airflow functions and spectra for non-breathy (left) and breathy voice. Notice the more
rounded glottal wave for breathy voice, which produces a spectrum with a stronger 1st harmonic
or, to state it the other way around, less energy spread into the higher frequencies.
Figure 3. Narrow band spectra for clear phonation (left) and for breathy voice. Notice that the degree of harmonic
organization is much greater for the non-breathy voice on the left; i.e., most of the energy is at harmonically related
frequencies, while the breathy voice shows a reasonable degree of harmonic organization mainly in the low
frequencies. Also, note the strong 1st harmonic in the breathy signal – more on this below.
2. Rounding of the glottal source waveform for breathy voice. In non-breathy phonation in
modal register, airflow rises gradually to a peak during the opening phase, but typically falls
more abruptly during the closing phase (Figure 1). However, in breathy voice, the closing
phase of the glottal source function is more gradual, producing a more rounded source signal.
This can be seen in the top of Figure 2, which shows a more rounded (i.e., more sinusoidal)
source waveform for the breathy voice. Once again, these breathiness-related differences in
the degree of rounding of the glottal signal are more easily observed in the spectrum rather
than in the time domain – in this case, by measuring the relative amplitude of the first
harmonic (H1). Here is the reasoning: For a perfect sinusoid, all of the energy in the signal is
at H1, with no spread of energy into the higher frequencies. (One definition of a sinusoid is
that changes over time are as smooth as they can possibly be. As we discussed in the section
on basic acoustics, the sinusoid is the extension over time of motion around a circle, with a
circle being the smoothest shape possible.) As the source waveform becomes more abrupt
(i.e., more like an impulse), the spread of energy to higher frequencies increases – meaning
that the relative amplitude of H1 will decrease. The bottom line is that we would generally
expect to see higher amplitude first harmonics for more breathy voices and weaker first
harmonics for less breathy voices. Compare the first harmonic amplitudes for the breathy and
non-breathy voices in Figures 2 and 3. Which of the spectra show stronger 1st harmonics –
the breathy or the non-breathy?
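The reasoning about H1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the three waveform shapes (a raised cosine, a pulse with a gradual opening and an abrupt closing, and a one-sample impulse) are invented stand-ins, not measured glottal flow. For each shape it repeats one cycle many times and reports the share of non-DC spectral power carried by the first harmonic.

```python
import numpy as np

PERIOD = 200          # samples per cycle (assumed)
CYCLES = 50           # number of repeated cycles

def h1_share(one_cycle):
    """Share of non-DC spectral power carried by the first harmonic."""
    x = np.tile(one_cycle, CYCLES)
    power = np.abs(np.fft.rfft(x)) ** 2
    return power[CYCLES] / power[1:].sum()       # H1 sits at bin number CYCLES

phase = np.arange(PERIOD) / PERIOD               # 0 up to just under 1

# Rounded flow: a raised cosine, i.e. a sinusoid plus a constant offset.
rounded = 0.5 - 0.5 * np.cos(2 * np.pi * phase)

# Gradual opening (60% of the cycle), abrupt closing (10%), closed (30%).
abrupt = np.where(
    phase < 0.6,
    0.5 - 0.5 * np.cos(np.pi * phase / 0.6),
    np.clip((0.7 - phase) / 0.1, 0.0, 1.0),
)

# Extreme case: a single-sample impulse once per cycle.
impulse = np.zeros(PERIOD)
impulse[0] = 1.0

share_rounded = h1_share(rounded)
share_abrupt = h1_share(abrupt)
share_impulse = h1_share(impulse)

for name, share in [("rounded", share_rounded),
                    ("abrupt closing", share_abrupt),
                    ("impulse", share_impulse)]:
    print(f"{name:15s} H1 share of power: {share:.3f}")
```

The ordering of the three shares follows the argument in the text: the smoother the cycle, the more of the energy stays at H1.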
The Lab
The lab uses ten sustained [a] vowels out of 25 voice samples that were used many years
ago in a study of breathy vocal quality in dysphonic speakers (Hillenbrand & Houde, 1996).
These voice samples, in turn, were drawn from a large database of recordings that were made at
Massachusetts Eye and Ear Infirmary by Robert Hillman. The samples that we picked were
intended to represent a fairly broad range of breathiness percepts ranging from clear phonation to
very breathy voice.
Procedure
1. Open SpeechTool/Ztool, then use the File menu to open ‘br01.wav’ (‘c:\ztool\br01.wav’ – on
the LRC/Open Lab machines, it’s ‘r:\ztool\br01.wav’).
2. Play the signal as many times as you wish and rate how breathy the voice is on a scale of 1 to
5, with 5 being the most breathy. Record your rating for this signal. So, the row of data for
this signal will look like this:
br01.wav    3 (or whatever)
*** LRC/Open Lab Users: Go to the page with the heading For LRC/Open Lab Users ***
3. Toward the end of the string of buttons at the top, you will see one called ‘CPP’. Clicking
this button will run a program that estimates how periodic the signal is by measuring the
degree of harmonic organization in its narrow band spectrum. The very last number that the
program gives you is called “Mean CPPS”. Larger CPP values indicate a higher degree of
harmonic organization (i.e., a more periodic signal). Consequently, small values of CPP
should be associated with breathier voices. Write this number down in the same row as your
breathiness rating. Your row of data should now look something like this:
br01.wav    3    0.71
4. Do the same thing for the remaining nine signals. In your table of results, you should have
your breathiness rating (1-5) and a CPP value for each test signal.
5. Below is a table of breathiness ratings for each signal.
br01.wav    8.22
br02.wav    5.21
br03.wav    2.12
br04.wav    2.86
br05.wav    5.33
br06.wav    1.51
br07.wav    7.94
br08.wav    3.42
br09.wav    4.07
br10.wav    4.87
These are very much like the breathiness ratings (BR) that you made, except that these
ratings are averages from a panel of 21 listeners doing pretty much what you did. (These
values vary from ~1.5 to ~8.2 instead of 1-5, but this doesn’t matter.) Copy these breathiness
ratings into a new column of the table you created. So, each row in your table will have, in
this order: (1) the name of the signal, (2) your breathiness rating, (3) the CPP value, and (4)
the average rating from the 21-listener panel. Using Word, create a file called ‘brdata.txt’ in the
ztool folder containing all of this information (filename, your BR, CPP, and panel BR, for all
10 signals). Set the font to Courier and use the space bar only, not the Tab key. SAVE YOUR
FILE AS PLAIN TEXT (File>Save as>Choose plain text from the drop-down menu under
Save as type, using the name ‘brdata.txt’). If Word asks you about “text encoding”, just leave
it at the Windows default setting. Your results file should be in this format:
br01.wav 2 0.54 8.22
br02.wav 4 3.44 5.21
br03.wav 1 8.54 2.12
.
.
.
Notes: (1) The numbers in columns 2 and 3 are just made up. (2) Do not create a Word-formatted
table; just type the filename and the three numbers on each line. (3) Do not use any
column headings and do not enter any text (e.g., “dB”) other than the filename and the three
numbers in each row.
6. The last step is to measure correlations between: (1) your BR and the panel BR (columns 2
and 4), (2) your BR and CPP (columns 2 and 3), and (3) the panel BR and CPP (columns 3
and 4). A correlation is a measure of the strength of the relationship between two sets of
numbers (see footnote 1). The easiest way to measure a correlation happens to be the most arcane, but it’s
not that bad:
a. Hold down the Windows key (the one with the flag-looking thing on it) and hit ‘R’.
b. Type ‘cmd’ into the text box that pops up.
c. Put your cursor in the black window that appears and type: ‘cd c:\ztool’ <ENTER>
On the LRC/Open lab machines:
r: <Enter>
cd r:\ztool <ENTER>
d. Let’s assume you want to measure the correlation between your BR (col 2) and CPP (col 3).
Type this arcane thing:
.\tcor brdata.txt 2 brdata.txt 3 (measure the correlation between column 2 and column 3)
(The weird ‘.\’ thing has to be there. It needs to be a backslash (‘\’) and not a forward
slash (‘/’).)
‘tcor’ will type out a bunch of stuff; the only numbers you need are the values for ‘r’ and
‘rsq’ (r², aka variance explained); e.g.:
r:     -0.92022
rsq:    0.84680
see:    0.52258
Do the same thing for the two other correlations that you need.
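If you are curious what a correlation program of this kind is computing, here is a minimal sketch of a Pearson correlation over two columns of a file laid out like ‘brdata.txt’. It is not the ‘tcor’ program and makes no claim about how ‘tcor’ works internally; the function names are made up for illustration, and the three data rows are the made-up example rows from the format description above. Column numbers are 1-based, matching the way columns are named on the ‘tcor’ command line.

```python
import math

# Three illustrative rows in the brdata.txt layout (columns 2 and 3 are made up).
SAMPLE = """\
br01.wav 2 0.54 8.22
br02.wav 4 3.44 5.21
br03.wav 1 8.54 2.12
"""

def read_column(text, col):
    """Return column `col` (1-based) of space-separated text as floats."""
    return [float(line.split()[col - 1])
            for line in text.splitlines() if line.strip()]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

your_br = read_column(SAMPLE, 2)     # column 2: your breathiness rating
cpp = read_column(SAMPLE, 3)         # column 3: CPP value

r = pearson_r(your_br, cpp)
rsq = r * r                          # variance explained
print(f"r:   {r:.5f}")
print(f"rsq: {rsq:.5f}")
```

With a real ‘brdata.txt’ you would read the file contents in place of SAMPLE. Note that rsq is simply r multiplied by itself, which is why the example r of -0.92022 above goes with an rsq of 0.84680. Also, r does not change if every number in a column is multiplied by a positive constant or shifted by a constant, so the different rating scales (1-5 versus roughly 1.5-8.2) have no effect on the result.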
Results:
                                                r          r²
correlation between your BR and the panel BR    _______    _______
correlation between your BR and CPP             _______    _______
correlation between the panel BR and CPP        _______    _______
Questions:
1. How well do your breathiness ratings agree with the panel ratings? Note that the more
important measure of the strength of a relationship is rsq (r²) rather than r: for example, an r
value of 0.8 is not 80% of perfect, but an rsq value of 0.8 is 80% of perfect.
Footnote 1: Correlations vary from 0 to +1 for positive relationships (large values on one variable tend to be associated with
large values on the other variable) or from 0 to -1 for negative relationships (large values on one variable are
associated with small values on the other variable).
2. How well do the CPP measures predict your breathiness ratings?
3. How well do the CPP measures predict the panel breathiness ratings?
4. Why is the correlation between breathiness ratings and CPP negative? (If you’re not sure, see
footnote 1).
5. What do you make of all this? For example, is there any advantage to using this measure of
periodicity in place of your own subjective estimate?
6. Look at the figures on the last two pages of this document and read the description at the top
of the page. Pick the two spectra that seem to show the most harmonic organization, and the
two spectra that seem to show the least harmonic organization. Record your results below,
along with the panel breathiness rating and the CPP value for each signal:
Signal with the most harmonic organization
File name (e.g., br09)    Panel BR    CPP
________________          ______      ____
Signal with the second most harmonic organization
File name                 Panel BR    CPP
________________          ______      ____
Signal with the least harmonic organization
File name                 Panel BR    CPP
________________          ______      ____
Signal with the second least harmonic organization
File name                 Panel BR    CPP
________________          ______      ____
7. Last question: Do the voices that you judged to have the most harmonic organization tend to
be among the signals with: (a) the lowest breathiness ratings and/or (b) the largest CPP
values?
For LRC/Open Lab Users
Before you start: You can only run this lab on machines with an R: drive. Right now I
don’t know which is which, so for now you’re stuck with trial and error: Click the Computer
icon on the desktop and look for an R: drive. If you don’t see it, try another one. Somebody
at the front desk might be able to help you, but I wouldn’t count on it.
1. While holding down the Windows key (the one with the flag), type ‘R’.
2. In the text box that pops up, type:
cmd <Enter>
A black window will appear. Type:
a. r: <Enter> (this switches you to the ‘r:’ disk, which is where the stuff is)
b. cd r:\ztool <Enter>
c. cpps br01.wav vowel <Enter> (or br02.wav or br03.wav … br10.wav)
3. The program will type out a bunch of numbers. The value that you need is the last one that
cpps reports: Mean CPPS. Write it down. (For ‘br01.wav’, it should be a small number
pretty close to zero, meaning that the signal is only marginally periodic.)
4. Do the same thing for br02.wav, br03.wav … br10.wav.
5. Return to step 5 under Procedure.
REFERENCES
Aronson, A.E. (1971). Early motor unit disease masquerading as psychogenic breathy
dysphonia: A clinical case presentation. Journal of Speech and Hearing Disorders, 36, 115-124.
Aronson, A.E. (1990). Clinical voice disorders (3rd ed). New York: Thieme.
Boone, D.R., and McFarlane, S.C. (1988). The voice and voice therapy (4th ed). Englewood
Cliffs, NJ: Prentice Hall.
Colton, R.A., and Casper, J.K. (1990). Understanding voice problems: A physiological
perspective for diagnosis and treatment. Baltimore: Williams and Wilkins.
Hillenbrand, J.M., and Houde, R.A. (1996). Acoustic characteristics of breathy vocal quality:
Dysphonic voices and continuous speech. Journal of Speech and Hearing Research, 39,
311-321.
Hollien, H. (1987). "Old voices": What do we really know about them? Journal of Voice, 1, 2-17.
Ryan, W.J., and Burk, K.W. (1974). Perceptual and acoustic correlates of aging in the speech of
males. Journal of Communication Disorders, 1, 181-192.
Södersten, M., Hertegård, S., and Hammarberg, B. (1995). Glottal closure, transglottal airflow, and
voice quality in healthy middle-aged women. Journal of Voice, 9, 182-197.
Narrow Band Spectra of the Test Signals
The figures below are narrow band amplitude spectra of the ten test signals. Notice that
the spectra vary quite a bit in the degree of harmonic organization, which reflects how periodic
the signal is. For example, for br06 nearly all of the energy is at harmonic frequencies (whole
number multiples of f0). The same is true of br03, though to a somewhat lesser extent. The
spectra of some of the other signals, however, show all kinds of energy at non-harmonic
frequencies; e.g., br01, br05, br07, and br10.
The CPP algorithm attempts to measure these variations in harmonic organization, with
large CPP values reflecting a high degree of harmonic organization (i.e., high periodicity). The
assumption is that signals with large CPP values tend to be less breathy – and vice versa.
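For readers who want a feel for how a measure like this can be computed, here is a rough sketch of a cepstral peak prominence calculation on a single frame. CPP is taken here to mean cepstral peak prominence, which this handout does not spell out. The sketch is not the ‘cpps’ program: it does no smoothing, and the frame length, sampling rate, f0 search range, 1 ms regression start, and synthetic test signals are all assumptions. Its output is not on the same scale as the Mean CPPS values in this lab. The idea is that regularly spaced harmonics make the log spectrum look periodic, which produces a peak in the spectrum of the log spectrum (the cepstrum); the height of that peak above a straight line fitted to the cepstrum is the prominence.

```python
import numpy as np

FS = 16000   # sampling rate in Hz (assumed)
N = 4096     # frame length in samples (assumed)

def cpp_db(x, fs=FS, f0_min=60.0, f0_max=300.0):
    """Simplified, unsmoothed cepstral peak prominence of one frame, in dB."""
    n = len(x)
    spectrum_db = 20 * np.log10(np.abs(np.fft.fft(x * np.hanning(n))) + 1e-12)
    cepstrum_db = 20 * np.log10(np.abs(np.fft.fft(spectrum_db)) + 1e-12)
    quefrency = np.arange(n) / fs                     # seconds
    half = np.arange(n) < n // 2                      # ignore the mirrored half
    # Look for the peak where a voice pitch period could plausibly fall.
    search = half & (quefrency >= 1 / f0_max) & (quefrency <= 1 / f0_min)
    # Fit a straight line to the cepstrum from 1 ms onward.
    fit = half & (quefrency >= 0.001)
    slope, intercept = np.polyfit(quefrency[fit], cepstrum_db[fit], 1)
    peak = np.argmax(np.where(search, cepstrum_db, -np.inf))
    return cepstrum_db[peak] - (slope * quefrency[peak] + intercept)

def vowel_like(noise_gain, rng, f0=125.0):
    """Synthetic stand-in: harmonics of f0 (1/k rolloff) plus white noise."""
    t = np.arange(N) / FS
    ks = np.arange(1, int(FS / 2 / f0))               # harmonics below Nyquist
    buzz = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in ks)
    return buzz + noise_gain * rng.standard_normal(N)

rng = np.random.default_rng(1)
cpp_clear = cpp_db(vowel_like(0.01, rng))             # nearly noise-free
cpp_breathy = cpp_db(vowel_like(3.0, rng))            # heavily noisy

print("clear-like signal CPP (dB):  ", round(float(cpp_clear), 1))
print("breathy-like signal CPP (dB):", round(float(cpp_breathy), 1))
```

The nearly noise-free signal should come out with the larger value, in line with the assumption stated above that large CPP values go with less breathy voices.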