Speech Perception CS4706

Pitch Perception
• But do pitch trackers capture what humans perceive?
• Auditory system’s perception of pitch is non-linear
– The same difference in absolute frequency (Hz) sounds
larger at lower frequencies than at higher frequencies
(male vs. female speech)
– Bark scale (Zwicker) and other models of perceived
difference
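
To illustrate the non-linearity, here is a minimal sketch that converts Hz to Bark. It assumes the Zwicker and Terhardt (1980) arctangent approximation, which is one of several published fits to the Bark scale; the function names `hz_to_bark` and `bark_difference` are hypothetical and chosen only for this example.

```python
import math


def hz_to_bark(freq_hz: float) -> float:
    """Approximate Bark value of a frequency in Hz.

    Assumption: uses the Zwicker & Terhardt (1980) approximation
    z = 13 * atan(0.00076 f) + 3.5 * atan((f / 7500)^2).
    Other approximations of the Bark scale give slightly different values.
    """
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def bark_difference(f1_hz: float, f2_hz: float) -> float:
    """Absolute distance between two frequencies on the Bark scale."""
    return abs(hz_to_bark(f2_hz) - hz_to_bark(f1_hz))


# The same 100 Hz step, once at low and once at high frequency.
low_gap = bark_difference(100.0, 200.0)
high_gap = bark_difference(5000.0, 5100.0)
print(f"100 -> 200 Hz:   {low_gap:.2f} Bark")
print(f"5000 -> 5100 Hz: {high_gap:.2f} Bark")
```

Under this approximation the 100 Hz step near the bottom of the range spans several times more Bark than the same step near 5 kHz, which matches the claim that equal Hz differences are perceived as larger at low frequencies.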
7/15/2016
How do we capture
loudness/intensity?
• Is one utterance louder than another?
• Energy closely correlated experimentally with
perceived loudness
• For each window, square the amplitude values
of the samples, take their mean, and take the
square root of that mean (RMS energy)
– What size window?
– Longer windows produce smoother amplitude
traces but miss sudden acoustic events
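
The windowed RMS computation described above can be sketched as follows. This is an illustration in plain Python, not the course's implementation; `rms_energy`, `window_size`, and `hop_size` are hypothetical names, and the synthetic signal (silence with a short burst) is made up to show the window-size trade-off.

```python
import math


def rms_energy(samples, window_size, hop_size=None):
    """Return one RMS value per window of `window_size` samples.

    Assumption: windows that would run past the end of the signal are dropped.
    If `hop_size` is None, windows do not overlap.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if hop_size is None:
        hop_size = window_size
    if hop_size <= 0:
        raise ValueError("hop_size must be positive")

    energies = []
    for start in range(0, len(samples) - window_size + 1, hop_size):
        window = samples[start:start + window_size]
        mean_square = sum(s * s for s in window) / window_size  # square, then mean
        energies.append(math.sqrt(mean_square))                 # then root
    return energies


# Synthetic signal: silence, a 10-sample burst at full amplitude, silence.
signal = [0.0] * 40 + [1.0] * 10 + [0.0] * 50

short_trace = rms_energy(signal, window_size=10)
long_trace = rms_energy(signal, window_size=50)
print("short windows:", [round(e, 3) for e in short_trace])
print("long windows: ", [round(e, 3) for e in long_trace])
```

With 10-sample windows the burst shows up as a single window at full energy; with 50-sample windows it is averaged together with the surrounding silence, so the trace is smoother but the sudden event is much less prominent.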
Perception of Loudness
• But the relation is non-linear: sones or decibels (dB)
– Differences in soft sounds are more salient than in loud ones
– Intensity is proportional to the square of amplitude, so the
intensity of a sound with pressure x relative to a reference
sound with pressure r is x²/r²
– bel: base-10 log of that ratio
– decibel: one tenth of a bel
– dB = 10 log10(x²/r²) = 20 log10(x/r)
– Absolute reference (20 µPa, lowest audible pressure fluctuation
of a 1000 Hz tone), typical threshold level for tone at frequency
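
As a worked example of the decibel computation, the sketch below converts a sound pressure to dB relative to a reference pressure. It assumes the standard absolute reference of 20 µPa in air; `pressure_to_db` and `REFERENCE_PRESSURE_PA` are hypothetical names used only here.

```python
import math

# Assumption: standard absolute reference pressure in air, 20 micropascals.
REFERENCE_PRESSURE_PA = 20e-6


def pressure_to_db(pressure_pa: float, reference_pa: float = REFERENCE_PRESSURE_PA) -> float:
    """Level in dB of a pressure x relative to a reference r: 10 * log10(x^2 / r^2)."""
    if pressure_pa <= 0 or reference_pa <= 0:
        raise ValueError("pressures must be positive")
    intensity_ratio = (pressure_pa / reference_pa) ** 2  # intensity ~ amplitude squared
    return 10.0 * math.log10(intensity_ratio)            # bels times 10 gives decibels


print(pressure_to_db(20e-6))    # the reference itself
print(pressure_to_db(200e-6))   # ten times the reference pressure
print(pressure_to_db(2.0, 1.0)) # doubling the pressure relative to any reference
```

Because the ratio is squared before the logarithm, a tenfold increase in pressure corresponds to a 20 dB increase, and a doubling of pressure to roughly 6 dB.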
How do we capture….
• For utterances X and Y
• Pitch contour: Same or different?
• Pitch range: Is X larger than Y?
• Duration: Is utterance X longer than utterance
Y?
• Speaker rate: Is the speaker of X speaking
faster than the speaker of Y?
• Voice quality….