Auditory Perception Hillenbrand SPPA 2060

Auditory perception is one branch of
a larger science called psychophysics.
Psychophysics studies the
relationships that exist between
perceptual dimensions (also called
psychological, subjective, or mental)
and the physical properties of stimuli.
The distinction between perceptual
dimensions and physical dimensions
is all-important.
Physical dimensions: Any aspect of a
physical stimulus that could be
measured in a straightforward way
with an instrument (e.g., a light meter,
a sound level meter, a spectrum
analyzer, a fundamental frequency
meter, etc.)
Perceptual dimensions: These are the
mental experiences that occur inside
the mind of the observer. These
experiences are actively created by the
sensory system and brain based on an
analysis of the physical properties of
the stimulus. Perceptual dimensions
can be measured, but not with a
meter. Measuring perceptual
dimensions requires an observer (e.g.,
a listener, a “looker”, a smeller, a
taster …).
For example, in vision:
The percept of hue is created by the eye
and brain based (in part) on the visual
system’s analysis of the wavelength
composition of the stimulus.
But: hue wavelength
wavelength: physical dimension (can be
measured with a meter)
hue: psychological dimension (can be measured,
but that requires an observer)
Visual Psychophysics
Perceptual Dimensions    Physical Properties of Light
Hue                      Wavelength
Brightness               Luminance
Shape                    Contour/Contrast
Both dimensions can be measured – the
physical dimensions can be measured with the
right instrument; measuring psychological
dimensions requires an observer.
6/20/2016
Auditory Psychophysics
(aka psychoacoustics or auditory perception)
Perceptual Dimensions      Physical Properties of Sound
Pitch                      Fund. Freq. (f0)
Loudness                   Intensity
Timbre (sound quality)     Spectrum env./Amplitude env.
Perceptual Experiences are Actively Created,
Not Passively Received
Subjective contour: The triangles, circles
and squares are “seen” not so much
because they are “there” in the physical
sense, but because they are inferred.
Unconscious inference lies at the heart of
perception. In some sense, “I’ll see it when
I believe it” is more true than “I’ll believe
it when I see it.”
Reversible Figures
Reversible figures reveal the active
organization of percepts – the drawing on
the left is organized by your brain into a bird,
then reorganized into a rabbit, then back to a
bird, … Same with the old lady/young lady.
Another duck-rabbit, just for yucks.
Which is bigger? Bottom one, eh? Nah.
They’re the same. (This is the Jastrow Illusion.)
The Müller-Lyer Illusion
Which horizontal line is longer?
Surprise, surprise. They’re the same. Duh, everything in this field is always the same. It
gets on your nerves.
The corridor illusion: Which cylinder is larger? The
cylinder to the right appears larger because the
visual system infers that it is further away. The
inference is unconscious, automatic and obligatory
(i.e., you can’t help yourself – even when you know
the trick).
The McGurk Effect
(McGurk & MacDonald, 1976*)
*McGurk, H., and MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
Some History on the McGurk Illusion
The most striking demonstration of the combined (bimodal) nature of speech understanding appeared by
accident. Harry McGurk, a senior developmental psychologist at the University of Surrey in England, and his
research assistant John MacDonald were studying how infants perceive speech during different periods of
development. For example, they placed a videotape of a mother talking in one location while the sound of her
voice played in another. For some reason, they asked their recording technician to create a videotape with the
audio syllable "ba" dubbed onto a visual "ga." When they played the tape, McGurk and MacDonald perceived
"da." Confusion reigned until they realized that "da" resulted from a quirk in human perception, not an error on
the technician's part. After testing children and adults with the dubbed tape, the psychologists reported this
phenomenon in a 1976 paper humorously titled "Hearing Lips and Seeing Voices," a landmark in the field of
human sensory integration. This audio-visual illusion has become known as the McGurk effect or McGurk
illusion.
Further reading:
Dominic W. Massaro & David G. Stork, "Speech Recognition and Sensory Integration", American Scientist,
1998, vol. 86, p. 236-244. The McGurk effect has played an important role in audio-visual speech integration
and speech reading.
McGurk links on the web include the following:
http://www.psych.ucr.edu/faculty/rosenblum/AVspeech.html
http://www.theshop.net/campbell/mcgurk1.htm
http://www.amsci.org/amsci/articles/98articles/massaro.html
http://www.sys.uea.ac.uk/~iam/newav/newav.html
http://www.media.uio.no/personer/arntm/McGurk_english.html
http://macserver.haskins.yale.edu/Haskins/HEADS/BIBLIOGRAPHY/bibliomcgurk.html
The Three Main Perceptual
Attributes of Sound
• Pitch (not fundamental frequency)
• Loudness (not intensity)
• Timbre (not spectrum envelope or
amplitude envelope)
The terms pitch, loudness, and timbre refer not to
the physical characteristics of sound, but to the
mental experiences that occur in the minds of
listeners.
Pitch and Fundamental Frequency
Rule 1: All else being equal, the higher the f0, the higher
the perceived pitch.
Lower f0, lower pitch
Higher f0, higher pitch
Rule 2: The ear is more sensitive to f0 differences in
the low frequencies than the higher frequencies.
This means that:
300 vs. 350 3000 vs. 3050
That is, the difference in perceived pitch (not f0)
between 300 and 350 Hz is NOT the same as the
difference in pitch between 3000 and 3050 Hz, even
though the physical differences in f0 are the same.
300-350 vs. 3000-3050
Which f0 difference is larger? (A: They’re the same.)
Which pitch difference is larger? (A: 300 vs. 350 – by a lot)
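One rough way to put numbers on this comparison is sketched below. It assumes, as a simplification that is not stated on the slides, that perceived pitch differences track frequency ratios (musical intervals) rather than differences in Hz. The function name `semitones` is just an illustrative helper.

```python
import math

def semitones(f_low, f_high):
    """Size of the interval between two frequencies, in equal-tempered semitones."""
    return 12 * math.log2(f_high / f_low)

# Same 50 Hz physical difference, placed low and high on the frequency scale
low_pair = semitones(300, 350)
high_pair = semitones(3000, 3050)

print(f"300 vs. 350 Hz:   {low_pair:.2f} semitones")
print(f"3000 vs. 3050 Hz: {high_pair:.2f} semitones")
```

Under that assumption the 300 vs. 350 Hz step is several semitones wide, while the 3000 vs. 3050 Hz step is only a fraction of one semitone.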
Lower f0, lower pitch
Higher f0, higher pitch
Three ways to measure f0
1. Frequency domain: Measure H1 (i.e., the lowest
frequency harmonic).
2. Frequency domain: Measure the harmonic spacing.
3. Time domain: Measure the fundamental period.
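The three measurements listed above can be sketched on a synthetic signal. This is a minimal illustration, not a general-purpose f0 tracker. The sampling rate, the 200 Hz test tone, the simple threshold used to find harmonics, and the use of autocorrelation to find the period are all assumptions made for the example.

```python
import numpy as np

fs = 16000                            # sampling rate in Hz (assumed)
f0 = 200.0                            # fundamental of the test signal (assumed)
t = np.arange(int(0.1 * fs)) / fs     # 0.1 s of sample times

# Harmonic complex: harmonics 1-5 with amplitudes 1, 1/2, ..., 1/5
x = sum((1.0 / k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 6))

# Frequency domain: amplitude spectrum and the frequencies of its harmonics.
# The threshold works here only because every harmonic falls exactly on an
# FFT bin; real recordings would need proper peak picking.
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
harmonic_freqs = freqs[spectrum > 0.1 * spectrum.max()]

# 1. Measure H1, the lowest frequency harmonic
f0_from_h1 = harmonic_freqs[0]

# 2. Measure the harmonic spacing
f0_from_spacing = np.median(np.diff(harmonic_freqs))

# 3. Time domain: measure the fundamental period as the lag of the largest
#    autocorrelation peak between 1/500 s and 1/50 s
ac = np.correlate(x, x, mode="full")[len(x) - 1:]
min_lag, max_lag = int(fs / 500), int(fs / 50)
period_samples = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
f0_from_period = fs / period_samples

print(f0_from_h1, f0_from_spacing, f0_from_period)
```

For this clean harmonic signal all three estimates agree on 200 Hz. The later slides ask what happens when they are made to disagree.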
The “Problem” of the Missing Fundamental
Normal f0:
f0 Removed:
Conclusion: The fundamental does not need to be
physically present in the signal for a listener to hear
a pitch corresponding to where f0 ought to be.
What Explains This?
Even with the 1st harmonic removed, a signal remains
periodic at the original f0.
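The claim that the signal stays periodic at the original f0 can be checked with a short sketch. The 200 Hz fundamental, the sampling rate, the equal-amplitude harmonics, and the autocorrelation-based period estimate are assumptions chosen for illustration. `harmonic_complex`, `period_in_samples` and `level_at` are hypothetical helper names.

```python
import numpy as np

fs = 16000                            # sampling rate in Hz (assumed)
f0 = 200.0                            # fundamental frequency (assumed)
t = np.arange(int(0.1 * fs)) / fs

def harmonic_complex(harmonic_numbers):
    """Sum of equal-amplitude sinusoids at the given multiples of f0."""
    return sum(np.sin(2 * np.pi * k * f0 * t) for k in harmonic_numbers)

def period_in_samples(x, lo_hz=50, hi_hz=500):
    """Lag of the largest autocorrelation peak within the search range."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    min_lag, max_lag = int(fs / hi_hz), int(fs / lo_hz)
    return min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))

def level_at(x, freq_hz):
    """Amplitude of the spectral component at freq_hz (must lie on an FFT bin)."""
    spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2)
    return spectrum[int(round(freq_hz * len(x) / fs))]

with_f0 = harmonic_complex(range(1, 6))      # harmonics 1-5
without_f0 = harmonic_complex(range(2, 6))   # harmonics 2-5, fundamental removed

for name, signal in (("normal", with_f0), ("f0 removed", without_f0)):
    period = period_in_samples(signal)
    print(f"{name}: level at 200 Hz = {level_at(signal, 200.0):.3f}, "
          f"period = {period} samples ({fs / period:.0f} Hz)")
```

In both cases the waveform repeats every 80 samples (1/200 s), even though the second signal has essentially no energy at 200 Hz.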
[Figure: The “Pitch Shift” Effect. Two amplitude spectra; x-axis: Frequency (Hz), 0 to 3000; y-axis: Amplitude.]
First panel: Harmonics at 1200, 1400, 1600, 1800 ... Harmonic spacing is 200 Hz. Signal is periodic at 200 Hz.
Second panel: 'Harmonics' at 1240, 1440, 1640, 1840 ... 'Harmonic' spacing is 200 Hz. Signal is periodic at ~205 Hz. (Don't worry about why. It just is.)
If the auditory system evaluated pitch by measuring the
harmonic spacing, these 2 signals (1200, 1400, 1600 … and 1240,
1440, 1640 …) would have the same pitch. They do not have
the same pitch, so we can rule out harmonic spacing.
Which theory is left? Measuring the fundamental period.
What does all this mean?
Rule 3: The sensation of pitch is probably
based on a measurement of the
fundamental period. It is definitely not
based on a measurement of either (a) the
lowest frequency harmonic in a
harmonic spectrum (because of the “missing
fundamental” effect), or (b) harmonic
spacing (because of the “pitch shift” effect).
Loudness and Intensity
Rule 1: All else being equal, the higher the
intensity, the greater the loudness.
Higher intensity, higher loudness
Lower intensity, lower loudness
Rule 2: The relationship between intensity and
loudness is seriously nonlinear. Doubling
intensity does not double loudness. In order to
double loudness, intensity must be increased
by a factor of 10, or by 10 dB [10 x log10 (10) =
10 x 1 = 10 dB]. This is called the 10 dB rule.
Two signals differing by 10 dB:
(500 Hz sinusoids)
Note that the more intense sound is NOT
10 times louder, even though it is 10 times
more intense.
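The decibel arithmetic in Rule 2 can be reproduced with a few lines. This is only a sketch of the 10 x log10 formula quoted above; the helper name `intensity_ratio_to_db` is made up for the example.

```python
import math

def intensity_ratio_to_db(ratio):
    """Level difference in dB for a given intensity ratio: 10 x log10(ratio)."""
    return 10 * math.log10(ratio)

tenfold = intensity_ratio_to_db(10)    # the 10 dB step that doubles loudness
doubling = intensity_ratio_to_db(2)    # doubling intensity is only about 3 dB

print(f"10x intensity: {tenfold:.1f} dB")
print(f" 2x intensity: {doubling:.1f} dB")
```

Doubling intensity adds only about 3 dB, well short of the 10 dB needed to double loudness.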
The 10 dB rule means that a 70 dB signal
will be twice as loud as a 60 dB signal,
four times as loud as a 50 dB signal,
eight times as loud as a 40 dB signal, etc.
A 30 dB hearing loss is considered mild –
just outside the range of normal hearing.
Based on the 10 dB rule, how much is
loudness affected by a 30 dB hearing
loss?
(Answer: 1/8th. But note that this does not mean that
someone with a 30 dB loss will have 8 times more
difficulty with speech understanding than someone
with normal hearing.)
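The 10 dB rule can be written as a small function. It treats the rule as exact and extends it smoothly between multiples of 10 dB; that is an assumption made for this sketch, since the rule is a rule of thumb. The name `loudness_ratio` is illustrative.

```python
def loudness_ratio(db_difference):
    """Loudness ratio implied by the 10 dB rule: every +10 dB doubles loudness."""
    return 2 ** (db_difference / 10)

# A 70 dB signal compared with quieter signals
for reference_db in (60, 50, 40):
    ratio = loudness_ratio(70 - reference_db)
    print(f"70 dB vs. {reference_db} dB: {ratio:g} times as loud")

# Effect of a 30 dB hearing loss on loudness
hearing_loss_ratio = loudness_ratio(-30)
print(f"30 dB loss: loudness reduced to {hearing_loss_ratio:g} of normal")
```

The last line reproduces the 1/8 answer given above (0.125).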
Rule 3: Loudness is strongly affected by the
frequency of the signal. If intensity is held
constant, a mid-frequency signal (in the
range from ~1000-4000 Hz) will be louder
than lower or higher frequency signals.
250 Hz, 3000 Hz, 8000 Hz
The 3000 Hz signal should sound louder
than the 250 Hz or the 8000 Hz signal, despite the
fact that their intensities are (about) equal.
(Remember that this is the reason for the
dBHL scale.)
Timbre (also sound quality or tone color)
Timbre, also known as sound quality or
tone color, is oddly defined in terms of
what it is not:
When two sounds are heard that match for
pitch, loudness, and duration, and a
difference can still be heard between the
sounds, that difference is called timbre
(also called sound quality or tone color).
Example: a clarinet, a saxophone, and a
piano all play a middle C at the same
loudness and same duration. Each of
these instruments has a unique sound
quality. This difference is called timbre,
tone color, or sound quality.
There are also many examples of timbre
difference in speech. For example, two
vowels (e.g., [ɑ] and [i]) spoken at the same
loudness and same pitch differ from one
another in timbre.
There are two physical correlates of
timbre:
• spectrum envelope
• amplitude envelope
spectrum envelope: Smooth line drawn to
enclose an amplitude spectrum.
amplitude envelope: Smooth line drawn
to enclose a sound wave (time domain
representation).
Timbre and Spectrum Envelope
Timbre differences between one musical instrument and another are
partly related to differences in spectrum envelope -- differences in the
relative amplitudes of the individual harmonics. In the examples
above, we would expect all of these sounds to have the same pitch
because the harmonic spacing is the same in all cases. The timbre
differences that you would hear are controlled in part by the
differences in the shape of the spectrum envelope.
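The idea of the same harmonic spacing with different relative harmonic amplitudes can be sketched by synthesizing two tones. The two envelope shapes (`falling` and `peaked`), the 200 Hz fundamental, and the sampling rate are hypothetical choices for illustration and are not the sounds shown on the slide.

```python
import numpy as np

fs = 16000                            # sampling rate in Hz (assumed)
f0 = 200.0                            # shared fundamental frequency (assumed)
t = np.arange(int(0.1 * fs)) / fs
harmonics = np.arange(1, 11)          # harmonic numbers 1-10

# Two hypothetical spectrum envelopes, one amplitude per harmonic
falling = 1.0 / harmonics                              # strongest at harmonic 1
peaked = np.exp(-0.5 * ((harmonics - 6) / 1.5) ** 2)   # bump near 1200 Hz

def synthesize(amplitudes):
    """Sum of harmonics of f0 with the given relative amplitudes."""
    return sum(a * np.sin(2 * np.pi * k * f0 * t)
               for k, a in zip(harmonics, amplitudes))

def harmonic_levels(x):
    """Read the amplitude of each harmonic back from the FFT."""
    spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2)
    bins = np.round(harmonics * f0 * len(x) / fs).astype(int)
    return spectrum[bins]

tone_a = synthesize(falling)
tone_b = synthesize(peaked)

# Same harmonic frequencies (200, 400, ... Hz) in both tones,
# but the strongest harmonic sits in a different place.
print("strongest harmonic, tone A:", harmonics[np.argmax(harmonic_levels(tone_a))])
print("strongest harmonic, tone B:", harmonics[np.argmax(harmonic_levels(tone_b))])
```

Both tones share the 200 Hz harmonic spacing, so the same pitch would be expected. Only the envelope drawn over the harmonic amplitudes differs.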
Six Synthesized Sounds Differing in
Spectrum Envelope
Note the similarities in pitch (due to
constant f0/harmonic spacing) and the
differences in timbre or sound quality.
Vowels Also Differ in Spectrum Envelope
Shown here are the smoothed envelopes only (i.e., the harmonic fine structure is
not shown) of 10 American-English vowels.* Note that each vowel has a unique
shape to its spectrum envelope. Perceptually, these sounds differ from one another
in timbre. Purely as a matter of convention, the term timbre is seldom used by
phoneticians, although it applies just as well here as it does in music. In
phonetics, timbre differences among vowels are typically referred to as
differences in vowel quality or vowel color.
* From Hillenbrand and Houde (2003). “A narrow band pattern-matching model of vowel perception,” Journal of the
Acoustical Society of America, 113, 1044-1055.
Aperiodic sounds can also differ in
spectrum envelope, and the perceptual
differences are properly described as
timbre differences.
Amplitude Envelope
Timbre is also affected by amplitude envelope. Amplitude envelope is
a smooth line drawn to enclose a sound wave. It is also sometimes
called the amplitude contour of the sound wave. These are both
good terms since the amplitude envelope shows how overall signal
amplitude varies over time. Amplitude envelope refers mainly to the
characteristics of the way sounds are turned on and turned off. The
four signals below are sinusoids that differ in their amplitude envelopes.
Leading edge = attack
Trailing edge = decay
The attack especially has a large effect on timbre.
Same melody, same spectrum envelope (if
sustained), different amplitude envelopes (i.e.,
different attack and decay characteristics).
Note differences in timbre or sound quality as
the amplitude envelope varies.
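A minimal sketch of how one sinusoid can be given different amplitude envelopes is shown below. The linear attack and decay ramps, the 500 Hz carrier, and the particular attack and decay times are assumptions for illustration. They are not the envelopes used in the demonstrations.

```python
import numpy as np

fs = 16000                                   # sampling rate in Hz (assumed)
t = np.arange(int(0.5 * fs)) / fs            # 0.5 s of sample times
carrier = np.sin(2 * np.pi * 500 * t)        # 500 Hz sinusoid

def attack_decay_envelope(attack_s, decay_s):
    """Linear rise over attack_s, steady at 1.0, then linear fall over decay_s."""
    env = np.ones_like(t)
    n_attack = int(attack_s * fs)
    n_decay = int(decay_s * fs)
    env[:n_attack] = np.linspace(0.0, 1.0, n_attack, endpoint=False)
    env[len(t) - n_decay:] = np.linspace(1.0, 0.0, n_decay)
    return env

def rise_time(env, level=0.9):
    """Time in seconds for the envelope to first reach the given level."""
    return np.argmax(env >= level) / fs

abrupt_env = attack_decay_envelope(attack_s=0.005, decay_s=0.3)
gradual_env = attack_decay_envelope(attack_s=0.2, decay_s=0.05)

# Same sinusoid, two different amplitude envelopes
abrupt_tone = carrier * abrupt_env
gradual_tone = carrier * gradual_env

print(f"abrupt attack reaches 90% in  {rise_time(abrupt_env) * 1000:.1f} ms")
print(f"gradual attack reaches 90% in {rise_time(gradual_env) * 1000:.1f} ms")
```

The two tones have the same frequency content while sustained. Only the way they are turned on and off differs.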
Timbre differences related to amplitude
envelope also play a role in speech. Note the
differences in the shape of the attack for [bɑ] vs.
[wɑ] (top) and [ʃɑ] vs. [tʃɑ].
[bɑ]: abrupt attack; [wɑ]: more gradual attack
[ʃɑ]: more gradual attack; [tʃɑ]: abrupt attack