Analyzing the Speech Signal Julia Hirschberg CS 6998 7/15/2016

Basic Acoustics
• What is sound?
  • Pressure fluctuations in the air caused by a musical instrument, a car horn, a voice
  • These cause the eardrum to move
  • The auditory system translates the movement into neural impulses
  • The brain interprets the impulses as sound
• How does it travel?
  • Via a sound wave of air molecules that 'travels' through the air
  • Molecules don't travel, but pressure fluctuations do
  • Sound waves lose energy as they travel: it takes energy to move those molecules
  • Molecules also move for reasons other than, e.g., the sound of my voice: noise
  • Ratio of speech-generated molecular motion to other motion: signal-to-noise ratio

Types of Sound: Periodic Waves
• Simple periodic waves (sine waves) are defined by
  • Frequency: how often the pattern repeats per time unit
    • Cycle: one repetition
    • Period: duration of one cycle
    • Frequency = number of cycles per time unit, e.g. frequency in Hz = 1 / period in seconds
    • Read along the horizontal (time) axis of the waveform
  • Amplitude: peak deviation of pressure from normal atmospheric pressure
  • Phase: timing of the waveform relative to a reference point
• Complex periodic waves
  • Cyclic but composed of two or more sine waves
  • Fundamental frequency (F0): rate at which the largest pattern repeats (also the GCD of the component frequencies)
  • Components are not always easily identifiable: a power spectrum graphs amplitude vs. frequency

Fourier's Theorem
• Any complex waveform can be analyzed into a set of sine waves, each with its own frequency, amplitude, and phase
• Fourier analysis produces a power spectrum from a complex periodic wave
• Potential problems:
  • Assumes an infinite waveform when we have only a small window for analysis
  • The waveform itself may be inaccurately represented

Types of Sound: Aperiodic Waves
• Waveforms with random or non-repeating patterns
  • Random aperiodic waveforms: white noise
    • Flat spectrum: equal amplitude for all frequency components
  • Transients: sudden bursts of pressure (clicks, pops, door slams)
    • Waveform shows a single impulse
    • Fourier analysis shows a flat spectrum

Sample Analyses
• Wavesurfer
• Download from http://www.speech.kth.se/wavesurfer/download.html

Filters
• Acoustic filters block out certain frequencies of sounds
  • Low-pass filter blocks the high-frequency components of a waveform
  • High-pass filter blocks the low frequencies
  • Reject band (what to block) vs. pass band (what to let through)

Production of Speech
• Voiced and voiceless sounds
• Vocal fold vibration produces a complex periodic waveform
  • Cycles per second of the lowest-frequency component of the signal = fundamental frequency (F0)
  • Fourier analysis yields a power spectrum with component frequencies and amplitudes
    • F0 is the first (lowest-frequency) peak
    • Harmonics are components of the vocal fold vibration at integer multiples of F0
• The vocal tract filters the basic voicing waveform to create the complex speech wave

Digital Signal Processing
• Analog devices store and analyze continuous air pressure variations (speech) as a continuous signal
• Digital devices (e.g. computers) first convert continuous signals into discrete signals (A-to-D conversion)
  • Sampling: how many time points in the signal to consider?
  • Quantization: how accurately do we want to measure amplitude at the sampling points?

Sampling
• Sampling rate: how often do we need to sample?
  • At least 2 samples per cycle are needed to capture the periodicity of a waveform component at a given frequency
    • A 100 Hz waveform needs 200 samples per second
  • Nyquist frequency: highest-frequency component captured with a given sampling rate (half the sampling rate)

Sampling/storage tradeoff
• Human hearing: top frequency of about 20 kHz
• But do we really need to store 40K samples per second of speech?
  • Telephone speech: 300 Hz to 4 kHz (8 kHz sampling)
  • But fricatives have energy above 4 kHz
  • A sampling rate of 16-22 kHz is usually good enough

Sampling Errors
• Aliasing: the signal's frequency is higher than half the sampling rate
• Solutions:
  • Increase the sampling rate
  • Filter out frequencies above half the sampling rate (anti-aliasing filter)

Quantization
• Measuring the amplitude at sampling points: what resolution to choose?
  • Integer representation: 8, 12 or 16 bits per sample
• Noise due to quantization steps is reduced by higher resolution, but higher resolution requires more storage
  • The choice depends on what kind of analysis is to be done
• But clipping occurs when the input volume is greater than the range representable in the digitized waveform; it shows up as transients

Perception of Pitch
• The auditory system's perception of pitch is nonlinear
  • Two sounds at lower frequencies sound more different than two sounds at higher frequencies with the same difference in absolute frequency
• Bark scale (Zwicker) models the perceived difference

Pitch-Tracking
• Autocorrelation techniques
• Goal: estimate F0 over time as a function of vocal fold vibration
  • A periodic waveform is correlated with itself
    • One period looks much like another
  • Find the period by finding the 'lag' (offset) between two windows on the signal for which the correlation of the windows is highest
  • The lag duration (T) is 1 period of the waveform
  • Its inverse is F0 (1/T)
• Errors:
  • Halving: the shortest lag calculated is too long (underestimates pitch)
  • Doubling: the shortest lag is too short (overestimates pitch)

Pitch Track Headers
  version      1
  type_code    4
  frequency    12000.000000
  samples      160768
  start_time   0.000000
  end_time     13.397333
  bandwidth    6000.000000
  dimensions   1
  maximum      9660.000000
  minimum      -17384.000000
  time         Sat Nov 2 15:55:50 1991
  operation    record: padding xxxxxxxxxxxx

Pitch Track Data
  F0        Pvoicing   Energy    A/C Score
  147.896   1          2154.07   0.902643
  140.894   1          1544.93   0.967008
  138.05    1          1080.55   0.92588
  130.399   1          745.262   0.595265
  0         0          567.153   0.504029
  0         0          638.037   0.222939
  0         0          670.936   0.370024
  0         0          790.751   0.357141
  141.215   1          1281.1    0.904345

RMS Amplitude
• Energy is closely correlated experimentally with perceived loudness
• For each window, square the amplitude values of the samples, take their mean, and take the root of that mean
• What size window?
  • Longer windows produce smoother amplitude traces but miss sudden acoustic events

Perception of Loudness
• Non-linear: described in sones or decibels (dB)
  • Differences between soft sounds are more salient than differences between loud sounds
  • Intensity is proportional to the square of amplitude, so the intensity of a sound with pressure x relative to a reference sound with pressure r is $x^2 / r^2$
  • bel: base-10 log of the ratio
  • decibel: one tenth of a bel
  • $\mathrm{dB} = 10 \log_{10}(x^2 / r^2) = 20 \log_{10}(x / r)$
  • Reference pressure: absolute (20 µPa, the lowest audible pressure fluctuation of a 1000 Hz tone) or the typical threshold level for a tone at a given frequency

Pressure of Common Sounds
  Event          Pressure (µPa)   dB
  Absolute       20               0
  Whisper        200              20
  Quiet office   2K               40
  Conversation   20K              60
  Bus            200K             80
  Subway         2M               100
  Thunder        20M              120
  *DAMAGE*       200M             140

Speech Analysis Gives us Information About
• Variation in
  • Loudness
  • Pitch (contours, accent, phrasing, range)
  • Timing (rate, pauses)
  • Style (articulation, disfluencies)
• This can be correlated with other features
  • Syntax, semantics, discourse context, words

Now and Next Week
• Now: turn in discussion questions and project ideas
• Read HLT96 (Ch. 5)
• Try out some TTS systems; exercises
• Bring 3 discussion questions to class
• Decide which week you would like to help with class

Vocal fold vibration
• [UCLA Phonetics Lab demo]

Places of articulation
• dental
• labial
• alveolar
• post-alveolar/palatal
• velar
• uvular
• pharyngeal
• laryngeal/glottal
• http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html
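
Worked sketch for the Periodic Waves and Fourier's Theorem slides: a complex periodic wave is built from two sine components, and a naive discrete Fourier transform recovers their amplitudes. The component frequencies (200 Hz and 300 Hz), the 8000 Hz sampling rate, and the function names are illustrative assumptions, not taken from the lecture. The analysis window is chosen to hold a whole number of cycles, which sidesteps the small-window problem noted on the Fourier slide.

```python
import cmath
import math


def complex_wave(components, sample_rate, n_samples):
    """Sum of sine waves; components is a list of (frequency_hz, amplitude)."""
    return [
        sum(a * math.sin(2 * math.pi * f * i / sample_rate) for f, a in components)
        for i in range(n_samples)
    ]


def amplitude_spectrum(samples, sample_rate):
    """Naive DFT: map each analysis frequency (Hz) to the amplitude found there."""
    n = len(samples)
    spectrum = {}
    for k in range(n // 2 + 1):
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)
        )
        # The DC and Nyquist bins appear once; every other bin is mirrored.
        scale = 1.0 if k == 0 or 2 * k == n else 2.0
        spectrum[k * sample_rate / n] = scale * abs(coeff) / n
    return spectrum


# Hypothetical example: 200 Hz and 300 Hz components, so F0 = GCD = 100 Hz.
components = [(200, 1.0), (300, 0.5)]
wave = complex_wave(components, sample_rate=8000, n_samples=80)
spectrum = amplitude_spectrum(wave, sample_rate=8000)
f0 = math.gcd(200, 300)

peaks = {f: round(a, 3) for f, a in spectrum.items() if a > 0.01}
print("F0 (GCD of components):", f0, "Hz")
print("Spectral peaks (Hz -> amplitude):", peaks)
```

Note that no energy appears at 100 Hz itself: F0 is the repetition rate of the whole pattern, not necessarily a component with energy in the spectrum.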
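
Worked sketch for the Sampling, Sampling Errors, and Quantization slides: the Nyquist frequency, the frequency an undersampled tone aliases to, and signed-integer quantization with clipping. The function names, the 5000 Hz test tone, and the symmetric signed-integer coding are assumptions made for illustration; real A-to-D converters may use other conventions.

```python
import math


def nyquist_frequency(sample_rate_hz):
    """Highest frequency captured at this sampling rate (half the rate)."""
    return sample_rate_hz / 2


def alias_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling (folds at Nyquist)."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


def sample_tone(freq_hz, sample_rate_hz, n_samples):
    """Sample a unit-amplitude sine wave at the given rate."""
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n_samples)
    ]


def quantize(value, bits, full_scale=1.0):
    """Clip to [-full_scale, full_scale], then round to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1
    clipped = max(-full_scale, min(full_scale, value))
    return round(clipped / full_scale * max_code)


# Telephone-style 8 kHz sampling: anything above 4 kHz is misrepresented.
rate = 8000
too_high = sample_tone(5000, rate, 16)
alias = sample_tone(alias_frequency(5000, rate), rate, 16)

# The 5000 Hz samples match a 3000 Hz tone with inverted phase.
indistinguishable = all(abs(a + b) < 1e-9 for a, b in zip(too_high, alias))
print("Nyquist:", nyquist_frequency(rate), "Hz")
print("5000 Hz aliases to:", alias_frequency(5000, rate), "Hz")
print("Sample values coincide (up to sign):", indistinguishable)
print("Clipped 8-bit code for input 1.5:", quantize(1.5, 8))
```

Once sampled, the aliased tone cannot be told apart from a genuine 3000 Hz tone, which is why the anti-aliasing filter has to be applied before sampling rather than after.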
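
Worked sketch for the Pitch-Tracking slide: a minimal autocorrelation F0 estimator that slides one window against another and picks the lag with the highest normalized correlation. This is an illustration under stated assumptions, not the tracker that produced the pitch track data shown earlier. The 50-500 Hz search range, the tolerance parameter, and the synthetic 125 Hz test tone are all hypothetical choices.

```python
import math


def synth_tone(freq_hz, sample_rate, n_samples):
    """Synthetic unit-amplitude sine wave standing in for voiced speech."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]


def estimate_f0(samples, sample_rate, min_hz=50.0, max_hz=500.0, tolerance=1e-6):
    """Return (F0 estimate in Hz, correlation score) for one stretch of signal."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    window = len(samples) - max_lag
    if window <= 0:
        raise ValueError("signal too short for the requested pitch range")

    energy_ref = sum(s * s for s in samples[:window])
    scores = {}
    for lag in range(min_lag, max_lag + 1):
        # Correlate the reference window with the window offset by `lag`.
        cross = sum(samples[i] * samples[i + lag] for i in range(window))
        energy_lag = sum(s * s for s in samples[lag:lag + window])
        denom = math.sqrt(energy_ref * energy_lag)
        scores[lag] = cross / denom if denom > 0 else 0.0

    best = max(scores.values())
    # Lags of 1, 2, 3... periods all correlate well; prefer the shortest lag
    # within `tolerance` of the best score to avoid a pitch-halving error.
    period_lag = min(lag for lag, score in scores.items() if score >= best - tolerance)
    return sample_rate / period_lag, scores[period_lag]


f0, score = estimate_f0(synth_tone(125.0, 8000, 800), 8000)
print("Estimated F0:", f0, "Hz, correlation:", round(score, 3))
```

The correlation score plays the same role as the A/C Score column in the pitch track data: low values suggest an unvoiced or unreliable frame.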
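
Worked sketch for the RMS Amplitude and Perception of Loudness slides: RMS over non-overlapping windows, and the decibel formula applied to the pressures in the Pressure of Common Sounds table. The function names and the use of non-overlapping windows are assumptions for illustration; the 20 µPa reference is the absolute reference from the slide.

```python
import math


def rms(window):
    """Square the samples, take their mean, then take the root of that mean."""
    return math.sqrt(sum(s * s for s in window) / len(window))


def rms_trace(samples, window_size):
    """RMS amplitude for each consecutive non-overlapping window."""
    return [
        rms(samples[start:start + window_size])
        for start in range(0, len(samples) - window_size + 1, window_size)
    ]


def pressure_to_db(pressure_upa, reference_upa=20.0):
    """dB = 10 * log10(x^2 / r^2), with pressures in micropascals."""
    return 10 * math.log10(pressure_upa ** 2 / reference_upa ** 2)


# A quiet stretch followed by a louder one: the trace steps up between windows.
trace = rms_trace([1, -1, 1, -1, 2, -2, 2, -2], window_size=4)
print("RMS trace:", trace)

# Pressures from the table, in micropascals.
for pressure in (20, 200, 2_000, 20_000, 200_000):
    print(pressure, "uPa ->", round(pressure_to_db(pressure), 1), "dB")
```

Each tenfold increase in pressure adds 20 dB, which matches the step between rows in the table.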
