ENG 528: Language Change Research Seminar
Sociophonetics: An Introduction
Chapter 2: Production

Acoustic Concepts
• 3 dimensions of sound:
  frequency
  amplitude
  time
• Phase might be considered a fourth dimension

Frequency
• Frequency is the number of times per second that the wave goes through its pattern; measured in cycles per second (cps), or Hertz (Hz). The time one cycle takes is the period, and frequency is its inverse
[Figure: waveform; amplitude against time in ms, 0 to 10]

Amplitude
• Amplitude is the degree to which a wave deviates from zero sound pressure during its course; usually measured in decibels (dB)
[Figure: waveform; amplitude against time in ms, 0 to 10]

Phase
• Waves of the same frequency are in different phases when their zero-crossing points are different
• Waves in completely opposite phase cancel each other out, creating antiformants or zeroes
[Figure: waveforms; amplitude against time in ms, 0 to 10]

Fourier Analysis (1)
• Fourier showed that all complex waves can be broken down into a series of simple waves
[Figure: waveforms; amplitude against time in ms, 0 to 10]

Fourier Analysis (2)
• At one time, it was done through painstaking mathematical calculations
• Discrete Fourier Transforms (DFT): use a window and analyze only what’s in the window; the window has various shapes, depending on how it’s attenuated (we’ll discuss that next)
• Fast Fourier Transforms (FFT): done by computer; everything is digitized; note that you’ll get quantization error, but the advantages of FFT make up for it

Fourier Analysis (3)
• Steps in Fourier Analysis are shown here
[Figure, four panels: a. and b. waveforms, amplitude against time in ms, 0 to 10; c. and d. spectra, amplitude (in dB in panel c) against frequency in Hz, 0 to 2200]
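A minimal sketch of the idea behind the Fourier Analysis slides, assuming a made-up complex wave built from a 200 Hz and a 600 Hz sine wave. The sampling rate, window length, function names, and component values are illustrative choices, not taken from the slides. It uses a plain (slow) DFT over one rectangular window whose length holds a whole number of cycles of both components, so each component lands in a single frequency bin.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (illustrative choice)
N_SAMPLES = 320      # one 40 ms analysis window at 8000 Hz


def make_complex_wave(components, sample_rate, n_samples):
    """Sum sine waves given as (frequency_hz, amplitude) pairs."""
    return [
        sum(amp * math.sin(2 * math.pi * freq * t / sample_rate)
            for freq, amp in components)
        for t in range(n_samples)
    ]


def dft_amplitudes(samples):
    """Plain DFT: amplitude of each bin from 0 Hz up to the Nyquist bin."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        # Scale so that a sine of amplitude A shows up as A in its bin
        # (this scaling is not exact for the 0 Hz and Nyquist bins).
        amplitudes.append(2 * abs(total) / n)
    return amplitudes


def strongest_components(samples, sample_rate, count):
    """Return the `count` strongest bins as (frequency_hz, amplitude), low to high."""
    amplitudes = dft_amplitudes(samples)
    bin_width = sample_rate / len(samples)  # Hz between neighboring bins
    top = sorted(range(len(amplitudes)),
                 key=lambda k: amplitudes[k], reverse=True)[:count]
    return [(k * bin_width, amplitudes[k]) for k in sorted(top)]


# Hypothetical complex wave: 200 Hz at amplitude 1.0 plus 600 Hz at amplitude 0.5.
wave = make_complex_wave([(200, 1.0), (600, 0.5)], SAMPLE_RATE, N_SAMPLES)
for freq, amp in strongest_components(wave, SAMPLE_RATE, 2):
    print(f"{freq:.0f} Hz, amplitude {amp:.2f}")
```

With a 40 ms window the bins are 25 Hz apart, so the analysis recovers the two simple waves that were summed. A real analysis program would use an FFT and a tapered window instead of this direct sum.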
Windowing (1)
• Windowing is a necessary part of Fourier Analysis
• You have to chop the signal into pieces, called windows, to analyze it
• Windows vary in two important ways:
  a) Their length: this is how you get wideband and narrowband spectra and spectrograms
  b) Their shape: this has to do with the windowing method

Windowing (2)
[Figures: Rectangular Window; Triangular (Bartlett) Window]

Windowing (3)
[Figures: Hanning Window; Hamming Window]

Windowing (4)
[Figure: Blackman Window]

Digitization (1)
• Digitization involves sampling the waveform at even intervals of time
• The computer reconstructs a waveform from the samples
[Figure: sampled waveform; amplitude against time in ms, 0 to 1.0]

Digitization (2)
• However, the computer can reconstruct phony waves if the digitization is done wrong; this is called aliasing
[Figure: sampled waveform; amplitude against time in ms, 0 to 0.25]

Digitization (3)
• To avoid aliasing, you have to sample at a rate at least twice that of the highest frequency in the signal
• That is, the highest frequency in the signal can be no more than half the sampling rate
• Half the sampling rate is called the Nyquist frequency
• E.g., if the sampling rate is 44.1 kHz, the Nyquist frequency is 22.05 kHz

Digitization (4)
• Problem: any natural recording will have high frequencies
• Solution: filter them out with a lowpass filter
• Note how filters have a transition band, so you have to set the filter cutoff lower than the Nyquist frequency

Digitization (5)
• Next problem: in speech, amplitude falls off as frequency goes up
• Solution: pre-emphasis of the signal, which amplifies higher frequencies so they show up on spectrograms
• 6 dB per octave increase in amplitude
• Usually applied with a factor, such as 0.85

Digitization (6)
• The whole process can be done in two orders:
  pre-emphasis, lowpass filtering, digitization
  lowpass filtering, digitization, pre-emphasis

Visual Displays
• Power
spectrum: one point in time; shows frequency against amplitude
[Figure: power spectrum; amplitude in dB against frequency in Hz, 0 to 10000]
• Wideband and narrowband spectrograms: we discussed them last week. Note that a spectrogram is a bunch of power spectra lined up side-by-side

The Source-Filter Theory
• Vocal fold vibration is the source
  Depends on time between vibrations (resulting in F0 in cycles/s, i.e., Hz)
  Harmonics at all multiples of F0 because they have zero-crossing points at the same places
• Configuration of tongue, lips, etc. is the filter; depends on length of cavities, as we’ll see
[Figure: spectra, amplitude against frequency in Hz, 0 to 2200, with the first harmonic (fundamental frequency), second harmonic, third harmonic, and tenth harmonic labeled]

Formants (1)
• Tube open at one end: F_n = (2n - 1)c / 4L, where c = speed of sound, ~34,300 cm/s, and L = length of the tube
• Lots of vowels have cavities like this
[Figure: tubes open at one end, running from the glottis (source of sound) to the lips, for several vowels including [i] (with a constriction); shows the lowest and second lowest resonances, with compression and rarefaction marked. Both (or all) sine waves have an antinode at the closed end and a node at the open end]

Formants (2)
• Tube closed at both ends: F_n = nc / 2L
• Back cavity often looks like this; front cavity can with lip rounding
[Figure: tubes closed at both ends, running from the glottis (source of sound) to the lips, for several vowels including [u] and [i]; shows the lowest and second lowest resonances, with compression and rarefaction marked. Both (or all) sine waves have antinodes at both ends. For [u], the back cavity and constriction together make a Helmholtz resonator]

Formants (3)
• Helmholtz resonator: any jug-shaped cavity
• Just one resonance: F = (c / 2π) √(A_n / (V_b L_n)), where A_n = cross-sectional area of neck, V_b = volume of body, L_n = length of neck
[Figure: the tubes closed at both ends again, for several vowels including [u] and [i]; for [u], the back cavity and constriction together make a Helmholtz resonator]
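A worked sketch of the three resonance formulas from the Formants slides, offered only as an illustration. The function names and the 17.5 cm tube length are assumptions (17.5 cm is a commonly used round figure for an adult vocal tract, not a value from the slides). The Helmholtz function uses the standard form of that formula, with 2π and a square root.

```python
import math

SPEED_OF_SOUND = 34300.0  # cm/s, the approximate value given on the slide


def open_at_one_end(n, length_cm, c=SPEED_OF_SOUND):
    """nth resonance of a tube closed at one end and open at the other: (2n - 1)c / 4L."""
    return (2 * n - 1) * c / (4 * length_cm)


def closed_at_both_ends(n, length_cm, c=SPEED_OF_SOUND):
    """nth resonance of a tube closed at both ends: nc / 2L."""
    return n * c / (2 * length_cm)


def helmholtz(neck_area_cm2, body_volume_cm3, neck_length_cm, c=SPEED_OF_SOUND):
    """Single resonance of a Helmholtz resonator: (c / 2*pi) * sqrt(A / (V * L))."""
    return (c / (2 * math.pi)) * math.sqrt(
        neck_area_cm2 / (body_volume_cm3 * neck_length_cm)
    )


# Assumption: a uniform 17.5 cm tube open at the lips, closed at the glottis.
TRACT_LENGTH_CM = 17.5
neutral_formants = [open_at_one_end(n, TRACT_LENGTH_CM) for n in (1, 2, 3)]
print(neutral_formants)  # [490.0, 1470.0, 2450.0]
```

The odd-multiple spacing (490, 1470, 2450 Hz) follows directly from the (2n - 1) term. Halving the tube length doubles every resonance, which is one reason shorter vocal tracts have higher formants.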
Measuring F0 (1)
• Autocorrelation is the matching of sections of a waveform with each other to see where they match up best
[Figure: waveform; amplitude against time (ms), 0 to 20, with two bracketed sections marked "These two sections resemble each other"]

Measuring F0 (2)
• Other ways to measure F0:

Measuring formants (1)
• One old method: estimation from wideband FFT power spectra
[Figure: wideband FFT power spectrum; amplitude in dB against frequency in Hz, 0 to 5000, with peaks labeled F1, F2, F3, F4]

Measuring formants (2)
• Another old method: estimation from narrowband FFT power spectra
[Figure: narrowband FFT power spectrum; amplitude in dB against frequency in Hz, 0 to 5000. F1, F2, and F3 each correspond closely to a harmonic; F4 is about halfway between two harmonics]

Measuring formants (3)
• The most common method today: Linear Predictive Coding (LPC)
• You’ll get results such as this:
  Time_s     F1_Hz       F2_Hz        F3_Hz        F4_Hz
  0.241506   571.972189  1785.357253  2473.494437  3200.979236
[Figure: LPC spectrum overlaid on FFT spectrum; amplitude in dB against frequency in Hz, 0 to 5000]

Measuring formants (4)
• In LPC, you set the number of poles or coefficients, which determines the number of formants the program expects to find
• Improper setting results in bad readings

Bandwidth
• One other thing to pay attention to
• Two ways to define bandwidth: a) area with half the energy of the curve or b) frequency range at 3 dB below peak
• Larger bandwidths can indicate poor recording quality or other factors such as nasality
• You’ll get readings such as:
  67.96889751316604 Hertz (nearest B1 to CURSOR)
[Figure: amplitude against frequency; one curve with narrow bandwidth and one curve with large ("wide") bandwidth]

Vowel Formant Exercise #1
• Record yourself saying the following words two ways (first in a normal voice, and second while yawning): heed, hid, head, had, hod, hawed, HUD, hood, who’d, hold,
heard
• Measure the first three formants and the fundamental frequency at the center of each vowel and put these measurements in a spreadsheet
• Plot F1 and F2 in a graph
• Turn in the spreadsheet and the graph two class periods from now

Vowel Plot Practice
These first two plots are the ones from last week, in case we didn’t have time to discuss them then.
[Figure: two vowel plots, F1 against F2 in Hz; labeled tokens include <tour>]

Vowel Plot Practice
Here are some new plots. Where is each one of these speakers from? How do you know?
[Figure: two vowel plots, F1 against F2 in Hz; labeled tokens include <fields>, <cool>, and <still>]

References
• The windowing diagrams on slides 10-12 came from:
  Haddad, Richard A., and Thomas W. Parsons. 1991. Digital Signal Processing: Theory, Applications, and Hardware. New York/Oxford: W. H. Freeman.
• The diagram on slide 16 came from:
  http://crca.ucsd.edu/~msp/techniques/v0.11/book-html/node129.html