Acoustic Analysis of Speech
Robert A. Prosek, Ph.D.
CSD 301
Acoustic Analysis
• Instrumental acoustical analyses have been
used for over 100 years
• Analog techniques dominated the first 60 of
these years
• More recently, digital techniques have
dominated the field
• We will begin by introducing a few of the
important analog methods, then turn to the
digital
Oscillograph/Oscillogram
• Any device that can display a waveform is an
oscillograph
• The output (display or hardcopy) is an oscillogram
• There is limited information available in a waveform
• silence
• burst
• noise
• periodicity
Filter Bank Analysis
• In this procedure, a filter bank or a single filter is used to
divide the signal energy into frequency bands
• The output energy is displayed for each band
• This is a form of spectral analysis
• The output is typically displayed in the form of a
histogram
• The technique is very common in audiology and hearing
applications
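A digital stand-in for the filter bank can be sketched by summing spectral power into frequency bands. This is only an illustration: the function name `band_energies`, the 8 kHz sampling rate, the band edges, and the two-tone test signal are all assumptions, and a real filter bank would use band-pass filters rather than grouped DFT bins.

```python
import cmath
import math

def band_energies(samples, fs, band_edges_hz):
    """Sum DFT power into contiguous bands; a crude stand-in for a filter bank."""
    n = len(samples)
    energies = [0.0] * (len(band_edges_hz) - 1)
    for k in range(n // 2 + 1):                      # non-negative frequencies only
        freq = k * fs / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        for b in range(len(energies)):
            if band_edges_hz[b] <= freq < band_edges_hz[b + 1]:
                energies[b] += abs(coeff) ** 2
                break
    return energies

# Hypothetical test signal: a strong 500 Hz tone plus a weaker 2500 Hz tone.
fs = 8000
signal = [math.sin(2 * math.pi * 500 * i / fs)
          + 0.5 * math.sin(2 * math.pi * 2500 * i / fs) for i in range(160)]
edges = [0, 1000, 2000, 3000, 4000]
energies = band_energies(signal, fs, edges)

# Text histogram: one bar per band, scaled to the strongest band.
for lo, hi, e in zip(edges, edges[1:], energies):
    bar = "#" * round(40 * e / max(energies))
    print(f"{lo:4d}-{hi:4d} Hz | {bar}")
```

The printed bars play the role of the histogram display: the 0-1000 Hz band is the tallest, the 2000-3000 Hz band is shorter, and the other two bands are essentially empty.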
Sound Spectrograph/Spectrogram
• The instrument is called a spectrograph
• The output (usually a hardcopy) is a spectrogram
• This is the most commonly used device in speech
research
• The spectrograph can capture the dynamics of speech
• Acoustic signals vary only in frequency, amplitude and
time
• The sound spectrograph captures all of these
Sound Spectrogram
• Abscissa is time
• Ordinate is frequency
• Intensity is shown as shades of gray
• Black areas indicate the highest amplitudes
• White areas indicate the noise floor
• Amplitudes between these extremes are shown in
varying shades of gray
• the more intense the signal is at a particular
frequency and time, the darker the trace
Digital Signal Processing (1)
• In the late 1960s, general-purpose digital computers
made it possible to analyze acoustic signals by
computer
• These techniques are necessarily discrete as well as
digital
• Once in discrete form, the signal can be stored
conveniently and analyzed in many ways that were not
possible with analog techniques
Digital Signal Processing (2)
• Presampling or brickwall filtering
• Nyquist Theorem
• In order to represent a signal faithfully, it must be
sampled at a rate of at least twice its highest
frequency
• The brickwall filter removes all of the energy above
the Nyquist frequency
• The clinician/researcher determines the Nyquist
frequency
• Some knowledge of speech, and of speech and
language disorders, is required
Digital Signal Processing (3)
• Sampling
• Analog-to-digital conversion
• Signal must be sampled at or above the Nyquist rate
• Sampling decides the times at which the signal will be
measured
• Sampling converts the acoustic signal into a series of
numbers
• Instead of amplitudes at all instants of time, no
matter how small the time interval, amplitudes in the
digital world exist only at the sampling interval
• Aliasing
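Aliasing can be illustrated with a short sketch. The 10 kHz sampling rate and the 7 kHz tone are assumed values chosen for the demonstration, and `sample_cosine` is a hypothetical helper. Because 7 kHz lies above half the sampling rate, its samples are indistinguishable from those of a 3 kHz tone.

```python
import math

def sample_cosine(freq_hz, fs, n_samples):
    """Sample a unit-amplitude cosine of the given frequency at rate fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]

fs = 10_000                 # assumed sampling rate in Hz
nyquist_hz = fs / 2         # highest frequency that can be represented

too_high = sample_cosine(7_000, fs, 20)        # above nyquist_hz
alias = sample_cosine(fs - 7_000, fs, 20)      # 3000 Hz, below nyquist_hz

# The two sample sequences agree to within floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(too_high, alias))
print(f"Nyquist frequency: {nyquist_hz:.0f} Hz, largest difference: {max_diff:.2e}")
```

This is why the presampling filter has to remove energy above half the sampling rate before conversion: once the samples are taken, the 7 kHz component cannot be told apart from a genuine 3 kHz component.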
Digital Signal Processing (4)
• Quantization
• Discrete number of amplitude levels
• The more quantizer levels available, the more the
discrete signal represents the original analog signal
• In our applications, 16-bit quantizers over a 20-volt
range are typical
• This yields an amplitude resolution of about 300 μV
and a signal-to-noise ratio of about 96 dB
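The two figures quoted for a 16-bit quantizer can be checked with a few lines of arithmetic. The sketch assumes the usual rule of thumb that the signal-to-noise ratio is 20·log10 of the number of levels, roughly 6 dB per bit. The variable names are illustrative.

```python
import math

bits = 16
voltage_range = 20.0                      # volts, from the slide

levels = 2 ** bits                        # 65536 discrete amplitude levels
step_volts = voltage_range / levels       # size of one quantizer step
snr_db = 20 * math.log10(levels)          # rule-of-thumb dynamic range

print(f"levels: {levels}")
print(f"step size: {step_volts * 1e6:.1f} microvolts")
print(f"signal-to-noise ratio: {snr_db:.1f} dB")
```

The step size comes out near 305 μV and the ratio near 96.3 dB, which the slide rounds to 300 μV and 96 dB.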
Digital Signal Processing (5)
• After A/D conversion
• the signal is stored as a stream of numbers
• time is recovered from the sample index and the
sampling rate
• the amplitude is the stored number
• in this form, many operations can be performed
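The relation between sample index and time can be sketched as below. The 22,050 Hz sampling rate and the stored amplitudes are made-up values, and the two helper names are hypothetical.

```python
def index_to_time(index, fs):
    """Time in seconds of the sample stored at position `index`."""
    return index / fs

def time_to_index(time_s, fs):
    """Nearest sample index for a time given in seconds."""
    return round(time_s * fs)

fs = 22_050                                  # assumed sampling rate in Hz
samples = [0, 12, 31, 45, 38, 17]            # made-up stored amplitudes

for n, amplitude in enumerate(samples):
    print(f"n={n}  t={index_to_time(n, fs) * 1000:.3f} ms  amplitude={amplitude}")
```

Only the amplitudes are stored. The time axis is implied by the position of each number in the stream.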
Waveform Display
• Duration measurements
• speech changes gradually
• some consistent rules need to be adopted
• Signal editing
• again, some consistent rules need to be adopted
• Amplitude measurements
• rms is the most common
• vocal fundamental frequency
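An rms amplitude measurement on stored samples might look like the following sketch. The test tone (200 Hz, amplitude 2, sampled at 8 kHz) and the decibel reference of 1.0 are assumptions chosen so the result is easy to check by hand.

```python
import math

def rms(samples):
    """Root-mean-square amplitude: square, average, then take the square root."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

fs = 8000
# Two complete periods of a 200 Hz sinusoid with peak amplitude 2.0.
tone = [2.0 * math.sin(2 * math.pi * 200 * i / fs) for i in range(80)]

tone_rms = rms(tone)
level_db = 20 * math.log10(tone_rms / 1.0)   # dB re an arbitrary reference of 1.0

print(f"rms = {tone_rms:.4f}, level = {level_db:.2f} dB")
```

For a sinusoid measured over whole periods, the rms value is the peak amplitude divided by the square root of 2, so this tone gives about 1.414.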
Digital Spectrum Analysis
• The Fourier Transform revisited (FFT)
• Periodic waveforms can be thought of as a sum of
sinusoids
• each sinusoid has its own amplitude and phase
• The Fourier Transform and the Inverse Fourier
transform allow powerful analysis-by-synthesis
techniques
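The analysis and resynthesis round trip can be sketched with a direct (slow) discrete Fourier transform. The FFT is a faster way of computing the same result. The function names `dft` and `idft` and the test sinusoid are assumptions made for this illustration.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform computed directly from its definition."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform; returns the real part of the resynthesized signal."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * i / n)
                for k in range(n)).real / n
            for i in range(n)]

n = 64
# Four cycles of a cosine with amplitude 3 and a phase of pi/4 radians.
x = [3.0 * math.cos(2 * math.pi * 4 * i / n + math.pi / 4) for i in range(n)]

spectrum = dft(x)
amplitude = 2 * abs(spectrum[4]) / n     # amplitude of the component in bin 4
phase = cmath.phase(spectrum[4])         # phase of that component in radians

resynth = idft(spectrum)
max_error = max(abs(a - b) for a, b in zip(x, resynth))
print(f"amplitude={amplitude:.3f}  phase={phase:.3f} rad  max error={max_error:.1e}")
```

The transform recovers the amplitude and phase of the sinusoid, and the inverse transform rebuilds the waveform to within rounding error.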
Digital Spectrograph
• This is a series of spectra based on the FFT
or LPC (see below)
• The amplitude is depicted as shades of gray
• PRAAT is an example of a digital
spectrograph
• Speech Filing System, Speech Station 2,
Wavesurfer, and many other free or
commercial spectrographs are available
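The idea of a spectrogram as a series of spectra can be sketched as follows. This is not how PRAAT or any other named program is implemented. The helper `short_time_spectra`, the Hann window, the 10 ms frame, and the synthetic signal that jumps from 500 Hz to 1500 Hz are all assumptions made for the illustration.

```python
import cmath
import math

def short_time_spectra(samples, frame_len, hop):
    """Return one magnitude spectrum (bins 0..frame_len/2) per analysis frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]                      # Hann window
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        seg = [samples[start + i] * window[i] for i in range(frame_len)]
        frames.append([abs(sum(seg[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                               for i in range(frame_len)))
                       for k in range(frame_len // 2 + 1)])
    return frames

fs = 8000
# Synthetic signal: 500 Hz for the first 50 ms, then 1500 Hz for the next 50 ms.
signal = [math.sin(2 * math.pi * (500 if i < 400 else 1500) * i / fs)
          for i in range(800)]

frames = short_time_spectra(signal, frame_len=80, hop=80)

# Frequency of the strongest component in each frame (the darkest point of each column).
peak_hz = [spec.index(max(spec)) * fs / 80 for spec in frames]
print(peak_hz)
```

Each entry of `frames` is one column of the spectrogram. Mapping the magnitudes to shades of gray, with larger values darker, would give the familiar display.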
Linear Predictive Coding (1)
• Speech is highly predictable over the short term
• It is not hard to predict the amplitude of the next time
sample of the speech waveform from a knowledge of the
previous amplitudes
• As few as 10 to 15 previous samples are
required
LPC (2)
• From statistics, we know that:
• $y = a_0 + a_1 x_{-1} + a_2 x_{-2} + \dots + a_n x_{-n}$
• where y is the amplitude of the next sample
• and $x_{-k}$ is the amplitude of the sample k steps earlier
• This is linear prediction
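A minimal sketch of linear prediction is given below. It fits the weights by ordinary least squares, which is one of several ways to estimate them, and it assumes a zero-mean signal so that the constant term a0 is dropped. The helper names and the synthetic decaying oscillation are illustrative assumptions, not the lecture's method.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * a = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (aug[i][n] - sum(aug[i][j] * a[j] for j in range(i + 1, n))) / aug[i][i]
    return a

def fit_predictor(x, order):
    """Least-squares fit of a_1..a_order in x[n] ~ sum over k of a_k * x[n-k]."""
    normal = [[0.0] * order for _ in range(order)]
    rhs = [0.0] * order
    for n in range(order, len(x)):
        past = [x[n - 1 - i] for i in range(order)]   # x[n-1], x[n-2], ...
        for i in range(order):
            rhs[i] += x[n] * past[i]
            for j in range(order):
                normal[i][j] += past[i] * past[j]
    return solve_linear(normal, rhs)

# Synthetic signal: each sample is 1.3 times the previous one minus 0.8 times the one before.
x = [1.0, 1.3]
for n in range(2, 50):
    x.append(1.3 * x[n - 1] - 0.8 * x[n - 2])

coeffs = fit_predictor(x, order=2)
predicted = coeffs[0] * x[48] + coeffs[1] * x[47]     # prediction of x[49]
print(coeffs, predicted, x[49])
```

Because the synthetic signal was built from exactly two previous samples, the fit recovers weights close to 1.3 and -0.8. Real speech is only approximately predictable, so the prediction error would not be zero.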
LPC (3)
• Linear Predictive Coding (LPC) is one of the most
powerful techniques in speech analysis
• The a’s in the previous equation can be used to
estimate the resonances of the vocal tract
• They can also be related to sections of the vocal tract
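How predictor weights relate to a resonance can be sketched for the simplest case, a second-order predictor with a single resonance. The function name, the 10 kHz sampling rate, and the 500 Hz target are assumptions. For the higher orders used with speech, the same idea applies to the roots of the full prediction polynomial, which would need a general root finder.

```python
import cmath
import math

def resonance_from_coeffs(a1, a2, fs):
    """Frequency and bandwidth (Hz) of the pole pair of x[n] = a1*x[n-1] + a2*x[n-2]."""
    # The poles are the roots of z^2 - a1*z - a2 = 0.
    pole = (a1 + cmath.sqrt(a1 * a1 + 4 * a2)) / 2
    freq_hz = abs(cmath.phase(pole)) * fs / (2 * math.pi)
    bandwidth_hz = -math.log(abs(pole)) * fs / math.pi
    return freq_hz, bandwidth_hz

fs = 10_000                          # assumed sampling rate in Hz
r, target_hz = 0.97, 500.0           # pole radius and intended resonance frequency
theta = 2 * math.pi * target_hz / fs

# Weights of a predictor whose poles sit at radius r and angle +/- theta.
a1, a2 = 2 * r * math.cos(theta), -r * r

freq_hz, bandwidth_hz = resonance_from_coeffs(a1, a2, fs)
print(f"resonance near {freq_hz:.1f} Hz, bandwidth near {bandwidth_hz:.1f} Hz")
```

The angle of the pole gives the resonance frequency, and its distance from the unit circle gives the bandwidth. Poles closer to the unit circle correspond to sharper resonances.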
Wideband versus Narrowband Spectrograms
• Wideband (window lengths of 0.005, 0.007, 0.009 s)
• Short time window
• Good for measuring formant frequencies
• Narrowband (window lengths of 0.1, 0.05 s)
• Long time window
• Good for showing and measuring harmonics
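The trade-off between the two settings follows from a rule of thumb: the frequency resolution of an analysis window is roughly the reciprocal of its duration. The sketch below assumes the listed values are window lengths in seconds and uses a 100 Hz fundamental as an example voice. The exact bandwidth depends on the window shape, so the numbers are approximate.

```python
def approx_resolution_hz(window_s):
    """Rule-of-thumb frequency resolution of a window lasting window_s seconds."""
    return 1.0 / window_s

def resolves_harmonics(window_s, f0_hz):
    """True when harmonics spaced f0_hz apart are farther apart than the resolution."""
    return approx_resolution_hz(window_s) < f0_hz

f0_hz = 100.0                                   # assumed fundamental frequency
for window_s in (0.005, 0.007, 0.009, 0.05, 0.1):
    kind = "narrowband" if resolves_harmonics(window_s, f0_hz) else "wideband"
    print(f"{window_s * 1000:5.1f} ms window -> about "
          f"{approx_resolution_hz(window_s):5.0f} Hz resolution ({kind})")
```

With the short windows the resolution is coarser than the harmonic spacing, so harmonics blur together and the formants stand out. With the long windows the individual harmonics are separated.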