Speech Perception
CS4706
7/15/2016

Pitch Perception
• But do pitch trackers capture what humans perceive?
• The auditory system's perception of pitch is non-linear
  – Two sounds at lower frequencies sound more different than two sounds at higher frequencies separated by the same absolute frequency difference (male vs. female speech)
  – Bark scale (Zwicker) and other models of perceived difference

How do we capture loudness/intensity?
• Is one utterance louder than another?
• Energy is closely correlated experimentally with perceived loudness
• For each window, square the amplitude values of the samples, take their mean, and take the square root of that mean (RMS energy)
  – What size window?
  – Longer windows produce smoother amplitude traces but miss sudden acoustic events

Perception of Loudness
• But the relation is non-linear: sones or decibels (dB)
  – Differences in soft sounds are more salient than differences in loud sounds
  – Intensity is proportional to the square of amplitude, so the intensity of a sound with pressure x relative to a reference sound with pressure r is x^2/r^2
  – bel: base-10 log of the ratio
  – decibel: one tenth of a bel (10 decibels = 1 bel)
  – dB = 10 log10(x^2/r^2) = 20 log10(x/r)
  – Reference pressure r: absolute (20 µPa, the lowest audible pressure fluctuation of a 1000 Hz tone) or the typical threshold level for a tone at that frequency

How do we capture…
• For utterances X and Y
• Pitch contour: same or different?
• Pitch range: is X's range larger than Y's?
• Duration: is utterance X longer than utterance Y?
• Speaker rate: is the speaker of X speaking faster than the speaker of Y?
• Voice quality…
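
The RMS energy and decibel definitions from the loudness slides can be sketched in a few lines of Python. This is only an illustration, not a method given in the slides: the function names rms_energy and to_decibels, and the window_size and hop_size parameters, are assumptions made for the example. The slides do not say whether windows overlap, so the hop size is left to the caller.

```python
import math


def rms_energy(samples, window_size, hop_size):
    """Return one RMS value per complete window of `samples`.

    window_size and hop_size are counted in samples (hypothetical
    parameter names; the slides only ask "What size window?").
    """
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    values = []
    # Slide the window over the signal, skipping any trailing partial window.
    for start in range(0, len(samples) - window_size + 1, hop_size):
        window = samples[start:start + window_size]
        # Square each amplitude, average the squares, take the square root.
        mean_square = sum(s * s for s in window) / window_size
        values.append(math.sqrt(mean_square))
    return values


def to_decibels(x, reference):
    """Level of pressure (or amplitude) x relative to `reference`, in dB.

    Implements dB = 10 * log10(x^2 / r^2), which equals 20 * log10(x / r)
    for positive x and r.
    """
    if reference == 0:
        raise ValueError("reference must be non-zero")
    if x == 0:
        return float("-inf")  # silence is infinitely far below any reference
    return 10.0 * math.log10((x * x) / (reference * reference))


if __name__ == "__main__":
    # A silent signal with a single click: a sudden acoustic event.
    click = [0.0] * 16
    click[5] = 1.0
    short = rms_energy(click, window_size=4, hop_size=4)
    long_ = rms_energy(click, window_size=16, hop_size=16)
    print("short windows:", short)
    print("long window:  ", long_)
    print("10x pressure in dB:", to_decibels(10.0, 1.0))
```

In the click example, the short windows isolate the event in one window with a comparatively high RMS value, while the single long window averages it away. That is the trade-off the slide describes: longer windows give smoother traces but miss sudden events.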