Chapter 2

ENG 528: Language Change
Research Seminar
Sociophonetics: An Introduction
Chapter 2: Production
Acoustic Concepts
• 3 dimensions of sound:
  ▪ frequency
  ▪ amplitude
  ▪ time
  ▪ phase might be considered a fourth dimension
Frequency
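• Frequency is the number of times per second that the wave
goes through its pattern; measured in cycles per second (cps),
or Hertz (Hz)
• The time one cycle takes is the period; frequency is the
reciprocal of the period
[Figure: waveform; amplitude against time in ms (0 to 10)]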
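The relation between the period of a wave and its frequency can be sketched in a few lines of Python. The function names are made up for illustration, and the 5 ms cycle is only an example value.

```python
def frequency_hz(period_ms: float) -> float:
    """Frequency in Hz of a wave whose single cycle lasts period_ms milliseconds."""
    return 1000.0 / period_ms


def period_ms(freq_hz: float) -> float:
    """Duration in milliseconds of one cycle of a wave at freq_hz."""
    return 1000.0 / freq_hz


# A wave that completes one cycle every 5 ms repeats 200 times per second.
print(frequency_hz(5.0))
# One cycle of a 100 Hz wave takes 10 ms.
print(period_ms(100.0))
```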
Amplitude
• Amplitude is the degree to which a wave
deviates from its resting (zero) sound pressure
during its course; usually measured in
decibels (dB)
[Figure: waveform; amplitude against time in ms (0 to 10)]
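As a rough illustration of the decibel scale, the sketch below converts a sound pressure to dB using the conventional reference pressure of 20 micropascals. The function name and the example pressures are assumptions for illustration only.

```python
import math

# Conventional reference pressure for sound in air: 20 micropascals.
P_REF_PA = 20e-6


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB relative to P_REF_PA."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


quiet = 0.02   # example pressure in pascals
louder = 0.04  # twice the pressure

print(spl_db(quiet))
# Doubling the pressure adds about 6 dB.
print(spl_db(louder) - spl_db(quiet))
```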
Phase
• Waves of the same frequency are in different
phases when their zero-crossing points are
different
• Waves of the same frequency and amplitude in
completely opposite phase cancel each other out,
creating antiformants or zeroes
[Figure: waveforms; amplitude against time in ms (0 to 10)]
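Cancellation between waves in opposite phase can be checked numerically. This is a minimal sketch; the 200 Hz frequency, the 10 kHz sampling rate, and the helper name `sine` are arbitrary choices for illustration.

```python
import math


def sine(freq_hz, phase_rad, n_samples=100, rate_hz=10000):
    """Samples of a unit-amplitude sine wave with the given starting phase."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz + phase_rad)
            for n in range(n_samples)]


a = sine(200, 0.0)
b_opposite = sine(200, math.pi)  # shifted by half a cycle
b_same = sine(200, 0.0)          # identical phase

# Opposite phase: the sum is (numerically) zero everywhere.
cancelled_peak = max(abs(x + y) for x, y in zip(a, b_opposite))
# Same phase: the waves reinforce each other.
reinforced_peak = max(abs(x + y) for x, y in zip(a, b_same))

print(cancelled_peak, reinforced_peak)
```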
Fourier Analysis (1)
• Fourier showed that all complex periodic waves can be
broken down into a series of simple (sine) waves
[Figure: complex waveform; amplitude against time in ms (0 to 10)]
Fourier Analysis (2)
• At one time, it was done through painstaking
mathematical calculations
• Discrete Fourier Transform (DFT): uses a window
and analyzes only what’s in the window; windows
have various shapes, depending on how the signal
is attenuated toward the window’s edges (we’ll
discuss that next)
• Fast Fourier Transform (FFT): an efficient way of
computing the DFT by computer; the signal has to be
digitized, so you’ll get quantization error, but the
advantages of the FFT make up for it
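To make the idea concrete, here is a minimal sketch of a DFT written directly from its definition (slow, but easy to read; an FFT gives the same result faster). The test signal, the sampling rate, and the function name are assumptions chosen so that each component falls exactly on an analysis frequency.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Amplitude of each frequency component from 0 Hz up to half the sampling rate.

    The 2/n scaling recovers the amplitude of a sine component; it is not
    the right scaling for the 0 Hz bin, which is not used in this example.
    """
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples))
        mags.append(abs(s) * 2 / n)
    return mags


RATE = 8000  # samples per second (assumed)
N = 400      # window length, so components are spaced RATE / N = 20 Hz apart

# A complex wave: a 200 Hz sine plus a weaker 600 Hz sine.
wave = [1.0 * math.sin(2 * math.pi * 200 * i / RATE)
        + 0.5 * math.sin(2 * math.pi * 600 * i / RATE)
        for i in range(N)]

mags = dft_magnitudes(wave)
peaks = [(k * RATE / N, round(m, 3)) for k, m in enumerate(mags) if m > 0.1]
print(peaks)  # the two simple waves, with their frequencies and amplitudes
```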
Fourier Analysis (3)
• Steps in Fourier Analysis are shown here
[Figure: four panels (a-d) showing waveforms (amplitude against time in ms, 0 to 10) and spectra (amplitude in dB against frequency in Hz, 0 to 2200)]
Windowing (1)
• Windowing is a necessary part of Fourier Analysis
• You have to chop the signal into pieces, called
windows, to analyze it
• Windows vary in two important ways:
a) Their length—this is how you get wideband
and narrowband spectra and spectrograms
b) Their shape—this has to do with the
windowing method
Windowing (2)
Rectangular Window
Triangular (Bartlett) Window
Windowing (3)
Hanning Window
Hamming Window
Windowing (4)
Blackman Window
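The window shapes named on these slides are commonly defined by the formulas sketched below. These are the usual textbook definitions (symmetric form), not necessarily the exact ones used for the diagrams; the function names are chosen for illustration.

```python
import math

# Each function returns the weight applied to sample n of a window of N samples.


def rectangular(n, N):
    return 1.0


def bartlett(n, N):
    """Triangular window: rises linearly to 1 at the center, then falls."""
    return 1.0 - abs(2.0 * n / (N - 1) - 1.0)


def hann(n, N):
    """Hanning window: a raised cosine that reaches 0 at both edges."""
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1))


def hamming(n, N):
    """Like the Hanning window, but the edges stop at 0.08 instead of 0."""
    return 0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1))


def blackman(n, N):
    return (0.42
            - 0.5 * math.cos(2 * math.pi * n / (N - 1))
            + 0.08 * math.cos(4 * math.pi * n / (N - 1)))


N = 11
for window in (rectangular, bartlett, hann, hamming, blackman):
    weights = [round(window(n, N), 3) for n in range(N)]
    print(window.__name__, weights)
```

Multiplying a stretch of signal by one of these sets of weights, sample by sample, is what "windowing" means in practice.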
Digitization (1)
• Digitization involves sampling the waveform at
even intervals of time
• The computer reconstructs a waveform from the
samples
[Figure: sampled waveform; amplitude against time in ms (0 to 1.0)]
Digitization (2)
• However, the computer can reconstruct phony
waves if the sampling rate is too low for the
frequencies in the signal; this is called aliasing
[Figure: aliased waveform; amplitude against time in ms (0 to 0.25)]
Digitization (3)
• To avoid aliasing, you have to sample at a rate
at least twice that of the highest frequency in
the signal
• That is, the highest frequency in the signal can
be no more than half the sampling rate
• Half the sampling rate is called the Nyquist
frequency
• E.g., if sampling rate is 44.1 kHz, Nyquist
frequency is 22.05 kHz
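A short sketch of the arithmetic: the Nyquist frequency for a given sampling rate, and the frequency at which a too-high component would show up after aliasing. The function names and the 30 kHz example are assumptions for illustration.

```python
def nyquist_hz(sampling_rate_hz: float) -> float:
    """Highest frequency that can be represented at this sampling rate."""
    return sampling_rate_hz / 2


def alias_hz(freq_hz: float, sampling_rate_hz: float) -> float:
    """Apparent frequency of a component after sampling.

    Components at or below the Nyquist frequency come out unchanged;
    higher ones are folded back into the range below it.
    """
    folded = freq_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)


print(nyquist_hz(44100))        # 22050.0
print(alias_hz(1000, 44100))    # below Nyquist: unchanged
print(alias_hz(30000, 44100))   # above Nyquist: shows up as a phony 14100 Hz wave
```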
Digitization (4)
• Problem: any natural recording will have high
frequencies
• Solution: filter them out with a lowpass filter
• Note how filters have a transition band, so you have to
set the filter lower than the Nyquist frequency
Digitization (5)
• Next problem: in speech, amplitude falls off as
frequency goes up
• Solution: pre-emphasis of the signal, which
amplifies higher frequencies so they show up
on spectrograms
• Typically a 6 dB per octave increase in amplitude
• Usually applied with a pre-emphasis factor, such as 0.85
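One common way to apply pre-emphasis to a digitized signal is a first-order filter in which each sample has a fraction of the previous sample subtracted from it. The sketch below assumes that the factor mentioned above (0.85) is the coefficient of such a filter; the function name is made up for illustration.

```python
def pre_emphasize(samples, factor=0.85):
    """First-order pre-emphasis: y[n] = x[n] - factor * x[n-1].

    Slowly changing (low-frequency) content is largely subtracted away,
    while rapidly changing (high-frequency) content is boosted.
    """
    if not samples:
        return []
    out = [samples[0]]  # the first sample has no predecessor
    for i in range(1, len(samples)):
        out.append(samples[i] - factor * samples[i - 1])
    return out


steady = [1.0, 1.0, 1.0, 1.0]          # lowest possible frequency (no change)
alternating = [1.0, -1.0, 1.0, -1.0]   # highest possible frequency

print(pre_emphasize(steady))       # shrinks toward 0.15
print(pre_emphasize(alternating))  # grows toward 1.85 in magnitude
```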
Digitization (6)
• The whole process can be done in two orders:
  ▪ pre-emphasis, lowpass filtering, digitization
  ▪ lowpass filtering, digitization, pre-emphasis
Visual Displays
• Power spectrum: one point in time; shows
frequency against amplitude
[Figure: power spectrum; amplitude in dB against frequency in Hz (0 to 10000)]
• Wideband and narrowband spectrograms: we
discussed them last week. Note that a
spectrogram is a series of power spectra, one
per window, lined up side by side in time
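The idea of a spectrogram as power spectra lined up side by side can be sketched as follows: cut the signal into overlapping windows, compute one spectrum per window, and collect the results in order. The sampling rate, window length, hop size, and test signal are all assumptions for illustration, and the plain DFT used here is slow compared with an FFT.

```python
import cmath
import math

RATE = 8000      # samples per second (assumed)
FRAME_LEN = 80   # 10 ms window, so spectral points are 100 Hz apart
HOP = 40         # start a new window every 5 ms


def power_spectrum_db(frame):
    """One power spectrum: Hanning-windowed DFT magnitudes in dB."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(20 * math.log10(abs(s) + 1e-12))  # avoid log of 0
    return spectrum


def spectrogram(samples, frame_len=FRAME_LEN, hop=HOP):
    """A list of power spectra, one per window, in time order."""
    return [power_spectrum_db(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


def peak_hz(spectrum):
    """Frequency of the strongest component in one spectrum."""
    k = max(range(len(spectrum)), key=spectrum.__getitem__)
    return k * RATE / FRAME_LEN


# Test signal: 500 Hz for the first 50 ms, then 1500 Hz for the next 50 ms.
signal = [math.sin(2 * math.pi * (500 if i < 400 else 1500) * i / RATE)
          for i in range(800)]

spec = spectrogram(signal)
print(len(spec), peak_hz(spec[0]), peak_hz(spec[-1]))
```

A longer window gives finer frequency detail (narrowband); a shorter one gives finer time detail (wideband).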
The Source-Filter Theory
• Vocal fold vibration is the source
  ▪ Depends on time between vibrations (resulting in F0 in
cycles/s, i.e., Hz)
  ▪ Harmonics at all multiples of F0 because they have
zero-crossing points at the same places
• Configuration of tongue, lips, etc. is the filter; depends on
length of cavities, as we’ll see
[Figure: panels labeled first harmonic (fundamental frequency), second harmonic, third harmonic, and tenth harmonic, combined with + and = signs; axes: amplitude against frequency in Hz (0 to 2200)]
Formants (1)
• Tube open at one end: Fn = (2n-1)c/4L, where
n = resonance number (1, 2, 3, ...), c = speed of sound
(~34,300 cm/s), and L = length of the tube
• Lots of vowels have cavities like this
[Figure: tubes open at one end, with the glottis (source of sound) at the closed end and the lips at the open end, shown for [i] and other vowels; the lowest and second lowest resonances are drawn with compression and rarefaction marked. Both (or all) sine waves have an antinode at the closed end and a node at the open end]
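The formula for a tube open at one end, Fn = (2n-1)c/4L, can be worked through for a concrete case. The 17.5 cm tube length is an assumed example value (roughly the length often cited for an adult male vocal tract), and the function name is made up for illustration.

```python
SPEED_OF_SOUND_CM_S = 34300.0  # value given on the slide


def open_tube_resonances(length_cm, count=3):
    """First `count` resonances (Hz) of a tube closed at one end, open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * length_cm)
            for n in range(1, count + 1)]


# A uniform 17.5 cm tube resonates at about 490, 1470, and 2450 Hz.
print(open_tube_resonances(17.5))
# A shorter tube has higher resonances.
print(open_tube_resonances(14.0))
```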
Formants (2)
• Tube closed at both ends: Fn = nc/2L
• Back cavity often looks like this; front cavity
can too when the lips are rounded
[Figure: tubes closed at both ends, between glottis (source of sound) and lips, shown for [u], [i], and another vowel; the lowest and second lowest resonances are drawn with compression and rarefaction marked. Both (or all) sine waves have antinodes at both ends. For [u], the back cavity and constriction together make a Helmholtz resonator]
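The formula for a tube closed at both ends, Fn = nc/2L, can be worked through the same way. The 10 cm cavity length is a hypothetical value chosen only for the arithmetic, and the function name is made up for illustration.

```python
SPEED_OF_SOUND_CM_S = 34300.0  # same value as for the tube open at one end


def closed_tube_resonances(length_cm, count=3):
    """First `count` resonances (Hz) of a tube closed at both ends."""
    return [n * SPEED_OF_SOUND_CM_S / (2 * length_cm)
            for n in range(1, count + 1)]


# A hypothetical 10 cm cavity: about 1715, 3430, and 5145 Hz.
print(closed_tube_resonances(10.0))
```

Unlike the tube open at one end, the resonances here fall at every multiple of the lowest one, not just the odd multiples.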
Formants (3)
• Helmholtz resonator: any jug-shaped cavity
• Just one resonance: F = (c/2π)·√(An/(Vb·Ln)),
where An = cross-sectional area of neck,
Vb = volume of body, Ln = length of neck
[Figure: the vocal tract diagrams for [u], [i], and another vowel shown again; for [u], the back cavity and constriction together make a Helmholtz resonator]
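A worked example of the Helmholtz resonance, using the standard form of the formula, F = (c/2π)·√(An/(Vb·Ln)). The neck and body dimensions below are hypothetical values chosen only to show the arithmetic, not measurements of any real vocal tract.

```python
import math

SPEED_OF_SOUND_CM_S = 34300.0


def helmholtz_resonance(neck_area_cm2, body_volume_cm3, neck_length_cm):
    """Single resonance (Hz) of a jug-shaped cavity."""
    return (SPEED_OF_SOUND_CM_S / (2 * math.pi)) * math.sqrt(
        neck_area_cm2 / (body_volume_cm3 * neck_length_cm))


# Hypothetical dimensions: 1 cm^2 constriction, 50 cm^3 back cavity, 2 cm long neck.
f = helmholtz_resonance(1.0, 50.0, 2.0)
print(round(f, 1))  # roughly 546 Hz

# Narrowing the constriction lowers the resonance.
print(helmholtz_resonance(0.5, 50.0, 2.0) < f)
```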
Measuring F0 (1)
• Autocorrelation is the matching of sections of
a waveform with each other to see where they
match up best
[Figure: waveform; amplitude against time in ms (0 to 20), with two successive sections bracketed and labeled "These two sections resemble each other"]
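Autocorrelation can be sketched as follows: slide a copy of the waveform along itself, score how well the overlapping sections match at each shift (lag), and take the best-matching lag as the period. This is a bare-bones illustration, not how any particular analysis program does it; the search range of 75 to 500 Hz, the sampling rate, and the test signal are assumptions.

```python
import math


def estimate_f0(samples, rate_hz, f0_min=75.0, f0_max=500.0):
    """Estimate F0 (Hz) as rate / lag for the lag with the best match."""
    min_lag = int(rate_hz / f0_max)
    max_lag = int(rate_hz / f0_min)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        first = samples[:len(samples) - lag]   # earlier section
        second = samples[lag:]                 # same-length section, shifted by lag
        cross = sum(a * b for a, b in zip(first, second))
        energy = math.sqrt(sum(a * a for a in first) * sum(b * b for b in second))
        score = cross / energy if energy > 0 else 0.0  # 1.0 means a perfect match
        if score > best_score:
            best_lag, best_score = lag, score
    return rate_hz / best_lag


RATE = 8000
# 40 ms of a wave with F0 = 100 Hz and a weaker second harmonic at 200 Hz.
wave = [math.sin(2 * math.pi * 100 * i / RATE)
        + 0.5 * math.sin(2 * math.pi * 200 * i / RATE)
        for i in range(320)]

print(estimate_f0(wave, RATE))  # the sections match best 10 ms apart
```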
Measuring F0 (2)
• Other ways to measure F0:
Measuring formants (1)
• One old method: estimation from wideband
FFT power spectra
[Figure: wideband FFT power spectrum with peaks labeled F1, F2, F3, and F4; amplitude in dB against frequency in Hz (0 to 5000)]
Measuring formants (2)
• Another old method: estimation from
narrowband FFT power spectra
[Figure: narrowband FFT power spectrum; amplitude in dB against frequency in Hz (0 to 5000). F1, F2, and F3 each correspond closely to a single harmonic; F4 is about halfway between two harmonics]
Measuring formants (3)
• The most common method today: Linear
Predictive Coding (LPC)
• You’ll get results such as this:
Time_s    F1_Hz       F2_Hz        F3_Hz        F4_Hz
0.241506  571.972189  1785.357253  2473.494437  3200.979236
[Figure: LPC spectrum plotted together with FFT spectrum; amplitude in dB against frequency in Hz (0 to 5000)]
Measuring formants (4)
• In LPC, you set the number of poles, or
coefficients, which determines the number of
formants the program expects to find
• An improper setting results in bad readings,
such as missed or spurious formants
Bandwidth
• One other thing to pay attention to
• Two ways to define bandwidth: a) the frequency range that
contains half the energy of the curve or b) the width of the
peak at 3 dB below its maximum
• Larger bandwidths can indicate poor recording quality or
other factors such as nasality
• You’ll get readings such as:
67.96889751316604 Hertz (nearest B1 to CURSOR)
[Figure: two resonance curves, one with narrow bandwidth and one with large ("wide") bandwidth; amplitude against frequency]
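The 3 dB definition of bandwidth can be sketched as a measurement on a spectrum: find the peak, then find how far apart the curve is where it has dropped 3 dB on either side. The function name and the made-up spectrum below are assumptions for illustration; real analysis programs estimate bandwidth in their own ways.

```python
def bandwidth_3db(freqs_hz, levels_db):
    """Width (Hz) of the highest peak, measured 3 dB below its maximum.

    Uses straight-line interpolation between neighboring spectrum points.
    """
    peak = max(range(len(levels_db)), key=levels_db.__getitem__)
    target = levels_db[peak] - 3.0

    def crossing(step):
        # Walk away from the peak until the next point is at or below the target.
        i = peak
        while 0 <= i + step < len(levels_db) and levels_db[i + step] > target:
            i += step
        j = i + step
        if not 0 <= j < len(levels_db):
            return freqs_hz[i]  # the curve never drops 3 dB on this side
        frac = (levels_db[i] - target) / (levels_db[i] - levels_db[j])
        return freqs_hz[i] + frac * (freqs_hz[j] - freqs_hz[i])

    return crossing(+1) - crossing(-1)


# Made-up spectrum: a peak at 500 Hz falling off by 10 dB per 100 Hz on each side.
freqs = [100.0 * k for k in range(11)]
levels = [-10.0 * abs(f - 500.0) / 100.0 for f in freqs]

print(bandwidth_3db(freqs, levels))  # about 60 Hz (from 470 Hz to 530 Hz)
```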
Vowel Formant Exercise #1
• Record yourself saying the following words two
ways—first, in a normal voice, and second, while
yawning: heed, hid, head, had, hod, hawed, HUD,
hood, who’d, hold, heard
• Measure the first three formants and the
fundamental frequency at the center of each
vowel and put these measurements in a
spreadsheet
• Plot F1 and F2 in a graph
• Turn in the spreadsheet and the graph two class
periods from now
Vowel Plot Practice
These first two plots are the ones from last week, in case we didn’t have time
to discuss them then.
[Figure: two vowel plots, each with F1 on the vertical axis and F2 on the horizontal axis (both in Hz); points are labeled with vowel symbols, plus the keyword <tour>]
Vowel Plot Practice
Here are some new plots. Where is each one of these speakers from? How
do you know?
[Figure: two vowel plots, each with F1 on the vertical axis and F2 on the horizontal axis (both in Hz); points are labeled with vowel symbols, plus the keywords <fields>, <cool>, and <still>]
References
• The windowing diagrams on slides 10-12 came
from:
• Haddad, Richard A., and Thomas W. Parsons.
1991. Digital Signal Processing: Theory,
Applications, and Hardware. New
York/Oxford: W. H. Freeman.
• The diagram on slide 16 came from:
• http://crca.ucsd.edu/~msp/techniques/v0.11/book-html/node129.html