PSYC3702 Perception Lecture Slides
PSYC3702 B: Perception
The Auditory System (Hearing, Speech and Music Perception)
Textbook Readings: Chaudhuri, Chs. 5, 6 & 7
Dr. Brian W. Tansley
Physical limits define 'sound'
•  Hearing is a 'telesense'...
•  Sound is a highly processed representation of a complex physical (acoustic) stimulus...
•  Humans experience a broad range of auditory experiences:
   – e.g., 'soundscapes' (acoustic environments)
   – speech, vocalizations
   – music
•  Frequency: a "band" with upper and lower limits:
   – Infrasonic = sounds that have a frequency below 20 Hz
   – Ultrasonic = sounds with frequencies above 20,000 Hz
•  Amplitude: a lower limit only (the upper limit remains undefined, or is defined by pain, not audition).
Molecular collisions vary in intensity and in time
•  Sound exists when:
   – Large populations of molecules collide in synchrony (within the frequency band of roughly 20 Hz – 20,000 Hz)
   – Molecular collisions occur through space and time in a medium (e.g., air, water, wood, brass, bone); in air, this is referred to as pressure changes
   – The collisions are transduced into neural patterns within the CNS
   – The patterns are processed at the cortical level by an awake organism
(Figure: a waveform, with its amplitude and frequency indicated)
Physical properties of Sound
•  "Perturbation" of the instantaneous compression of the molecules of a "medium" (e.g., air, water, wood, brass, etc.)
•  Resulting in a 'single-valued' function of time...

Physical properties of Sound
•  The "Amplitude Envelope" of the instantaneous compression of the molecules of a "medium"
•  Sound is heard when the air pressure in which the listener lives FLUCTUATES above and below ambient* at frequencies > 20 Hz and < 20 kHz
(Figure: pressure as a function of time, showing the amplitude envelope and its Attack, Decay, Sustain, Release (ADSR) phases)
*Ambient (baseline) pressure (today's ambient pressure in Ottawa = 101.9 kilopascals)
At 1000 Hz (1 kHz), it takes a deviation of 2 × 10⁻⁵ pascals from the ambient pressure of ~100 kilopascals for a person with 'normal' hearing to just detect the presence of a sound...
The sound measurement scale is related to Fechner's Law, S = k log I:
•  Bel = log [I(stim) / I(ref)]
   – A Bel is a LARGE quantity – typically too large for psychoacoustic research.
•  Instead, we use the decibel, which is 0.1 Bel (closer to the difference threshold for loudness):
   – decibel (dB) = 10 log [I(stim) / I(ref)]
The "Island of Audibility"...
Sound intensity is related to the pressure created by the vibrating energy source. Pressure varies as the square root of intensity, so the Sound Pressure Level (SPL) of a stimulus S is:
SPL = 10 log (P²s / P²ref) = 20 log (Ps / Pref)
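As a worked check (our example, using values from the table below): a 2 Pa sound against the standard 20 µPa reference gives SPL = 20 log (2 / 0.00002) = 20 log (100,000) = 100 dB – the jackhammer entry in the table.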
Source of sound | Sound pressure (Pa RMS) | Sound pressure level (dB re 20 µPa)
Shockwave (distorted sound waves > 1 atm; waveform valleys are clipped at zero pressure) | >101,325 Pa | >194 dB
Theoretical limit for undistorted sound at 1 atm environmental pressure | 101,325 Pa | ~194.094 dB
Stun grenades | 6,000–20,000 Pa | 170–180 dB
Rocket launch equipment acoustic tests | ~4,000 Pa | ~165 dB
Simple open-ended thermoacoustic device | 12,619 Pa | 176 dB
.30-06 rifle being fired 1 m to shooter's side | 7,265 Pa | 171 dB (peak)
M1 Garand rifle being fired at 1 m | 5,023 Pa | 168 dB
Jet engine at 30 m | 632 Pa | 150 dB
Threshold of pain | 63.2 Pa | 130 dB
Vuvuzela horn at 1 m | 20 Pa | 120 dB(A)
Hearing damage (possible) | 20 Pa | approx. 120 dB
Jet engine at 100 m | 6.32–200 Pa | 110–140 dB
Jack hammer at 1 m | 2 Pa | approx. 100 dB
Traffic on a busy roadway at 10 m | 2×10⁻¹–6.32×10⁻¹ Pa | 80–90 dB
Hearing damage (over long-term exposure, need not be continuous) | 0.356 Pa | 85 dB
Passenger car at 10 m | 2×10⁻²–2×10⁻¹ Pa | 60–80 dB
EPA-identified maximum to protect against hearing loss and other disruptive effects from noise (sleep disturbance, stress, learning detriment, etc.) | — | 70 dB
Handheld electric mixer | — | 65 dB
TV (set at home level) at 1 m | 2×10⁻² Pa | approx. 60 dB
Washing machine, dishwasher | — | 50–53 dB
Normal conversation at 1 m | 2×10⁻³–2×10⁻² Pa | 40–60 dB
Very calm room | 2×10⁻⁴–6.32×10⁻⁴ Pa | 20–30 dB
Light leaf rustling, calm breathing | 6.32×10⁻⁵ Pa | 10 dB
Auditory threshold at 1 kHz | 2×10⁻⁵ Pa | 0 dB
Adding decibels isn't like adding linear values (i.e., two 20 dB sounds played together don't result in a 40 dB sound – they result in a 26 dB sound...).
Hint: convert back to linear values, add them, and then convert to logarithms once again... (see the sketch below the table)

Number of (identical) sources | Increase in Sound Pressure Level (dB)
2 | 6
3 | 9.6
4 | 12
5 | 14
10 | 20
15 | 23.6
20 | 26
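The hint, sketched in Python (our example, not from the slides). Note that the slide's figures assume identical, in-phase sources, so pressure amplitudes add linearly (a gain of 20 log N dB); uncorrelated sources add in power instead, giving only +3 dB per doubling:

    import math

    def combine_levels_coherent(levels_db):
        """Combine dB levels of in-phase sources: convert each level to a
        linear pressure ratio, sum, then convert back with 20*log10."""
        total_pressure = sum(10 ** (L / 20) for L in levels_db)
        return 20 * math.log10(total_pressure)

    def combine_levels_incoherent(levels_db):
        """Combine dB levels of uncorrelated sources: intensities
        (pressure squared) add, so sum the powers and use 10*log10."""
        total_power = sum(10 ** (L / 10) for L in levels_db)
        return 10 * math.log10(total_power)

    print(round(combine_levels_coherent([20, 20]), 1))    # 26.0 dB (the slide's example)
    print(round(combine_levels_incoherent([20, 20]), 1))  # 23.0 dB (+3 dB, uncorrelated)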
The properties of sound:
•  Sound is information presented to the brain by acoustic perturbations.
•  A 'medium' is required to carry the perturbations.
•  A medium is defined by a minimum # of molecules (enough to knock each other along a line from source to the auditory system or microphone).
•  Speed of sound: the density and elasticity of the medium affect sound speed differently...
   – In air, it is 331.5 metres/second.
   – The greater the elasticity* of the medium, the greater the speed of sound propagation.
   – Inelastic substances cannot carry sound since, once they have been perturbed (displaced), they cannot return to their former state.
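A common way to make this quantitative (our addition; the slides state only the qualitative claim): the speed of sound in a medium is v = √(elastic modulus / density). Stiffness and density pull in opposite directions, which is why water (~1,480 m/s) and steel (~5,900 m/s) carry sound faster than air (331.5 m/s) despite being denser: their far greater stiffness dominates.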
"Complex" sounds (sounds containing energy at more than one frequency):
•  "Periodic" sounds:
   – This occurs when the pattern of pressure change repeats itself at regular intervals over time.
   – e.g., music, speech (vowel) sounds
   – Complex tones can be composed of a fundamental frequency and exact multiples of that frequency, which are known as its harmonics; the intensity of each harmonic can vary.
   – Harmonics are 'integer multiples' of the fundamental frequency, f, as in 2f (400 Hz), 3f (600 Hz), 4f (800 Hz), 5f (1 kHz), 6f (1,200 Hz)
•  "Aperiodic" sounds:
   – The pattern of pressure change does not repeat itself.
   – e.g., traffic, white noise, noise from a crowd, fricative and plosive speech sounds, etc.
Complex sounds:
•  Fourier analysis:
   – Decomposing a complex waveform (represented as a single-valued function of time) into a series of sine wave "elements" (in the analytic tradition)
   – Fourier spectrum = a plot of the amplitude of each single sine wave component as a function of its frequency
•  Complex periodic sound waves are produced by summing two or more pure tones; the resulting waveform is the algebraic sum of the component amplitudes at each moment in time.

Complex sounds:
•  Ohm's law of hearing:
   – The ear actually functions as a type of Fourier analyzer, in that it decomposes complex sound waveforms into their spectral components.
   – It converts a single-valued function of time into an ARRAY of single-valued functions of time, where the index on the array is FREQUENCY...
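A minimal sketch of such a decomposition in Python (our example; the 200 Hz fundamental matches the slide's 2f = 400 Hz series, and the amplitudes are arbitrary):

    import numpy as np

    fs = 8000                      # sampling rate (Hz), assumed for the demo
    t = np.arange(fs) / fs         # one second of time samples
    freqs = [200, 400, 600, 800, 1000, 1200]    # f, 2f, ... 6f
    amps = [1.0, 0.5, 0.33, 0.25, 0.2, 0.17]    # arbitrary harmonic amplitudes
    wave = sum(a * np.sin(2 * np.pi * f * t) for a, f in zip(amps, freqs))

    # Fourier spectrum: amplitude of each sine-wave element vs. its frequency
    spectrum = np.abs(np.fft.rfft(wave)) / (len(wave) / 2)
    axis = np.fft.rfftfreq(len(wave), 1 / fs)
    print(axis[spectrum > 0.1])    # -> [ 200. 400. 600. 800. 1000. 1200.]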
Sound transmission:
•  Interaction of sound with objects:
   – Reflection = when much of the sound bounces back off the second medium
   – Absorption = when the sound is transmitted through the second medium
   – Diffraction = when the sound waves bend around the object
      • Easier for low frequency sounds
      • Acoustic mirrors are everywhere...
•  The "inverse square law": sound intensity falls off with the square of the distance from the source; this sets limits on the speech intelligibility range... (see the sketch below)
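A minimal sketch of the law (our example; the 60 dB starting level matches the normal-conversation entry in the table above):

    import math

    def spl_at_distance(spl_ref_db, r_ref, r):
        """Inverse square law: intensity falls as 1/r^2, so SPL drops by
        20*log10(r/r_ref) dB -- about 6 dB per doubling of distance."""
        return spl_ref_db - 20 * math.log10(r / r_ref)

    print(round(spl_at_distance(60, 1, 2)))   # 54: conversation heard at 2 m
    print(round(spl_at_distance(60, 1, 10)))  # 40: barely above a calm room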
Anatomical components of the human ear
•  The outer ear:
   – Pinna = the visible part of the ear that serves as a sound funnel
   – External auditory canal
   – Tympanic membrane = eardrum
      • A thin elastic membrane that vibrates with incoming sound waves

Anatomical components of the human ear
•  The middle ear:
   – Eustachian tube = connects the middle ear with the back of the throat to equalize air pressure differences across the eardrum
   – Ossicles = three tiny bones that conduct the sound vibration to the inner ear at the oval window
Anatomical components of the human ear
•  The inner ear:
   – Semicircular canals = fluid-filled bony cavities that are part of the vestibular system
      • Used to transmit information about balance and body position in space
   – Cochlea = a fluid-filled bony structure that contains the sensory transduction apparatus of hearing
   – The cochlea is divided into three separate fluid-filled channels.
•  Sound transmission through the ear:
   – Sound waves are funneled to the eardrum, which vibrates.
   – The vibrational pattern is conducted by the ossicles into the cochlea through the oval window.

Auditory Processing of Sound—Biological Mechanisms
•  Auditory transduction:
   – Organ of Corti—structural features (see Figure 5.18 in text):
      • It lies on top of the basilar membrane in the cochlear duct and spans the length of the cochlea.
      • Mechanical stimulation produces the initial bioelectric response.
      • Now the vibrational pattern is traveling through liquid rather than air.
Auditory Processing: Neural Mechanisms
•  Auditory transduction:
   – Organ of Corti—structural features:
      • The arch of Corti divides the organ of Corti into an inner and outer portion.
      • The inner arch contains a single row of inner hair cells: 3,000–3,500 inner hair cells.
      • The outer arch contains three rows of outer hair cells: 12,000 outer hair cells.

Auditory Transduction
•  Organ of Corti—structural features: both types of hair cells contain stereocilia:
   – Stereocilia are fine filaments that protrude from the upper surface of hair cells.
   – Inner hair cells have 40 stereocilia, while outer hair cells have 150.
   – The tectorial membrane lies above the organ of Corti:
      • It connects with the stereocilia of the outer hair cells.

•  Auditory transduction:
   – Organ of Corti—mechanical response to sound stimulation:
      • A traveling wave in the basilar membrane produces movement of the tectorial membrane and the arches of Corti:
         – This produces a shearing force on the stereocilia of hair cells, causing them to bend.
•  Auditory transduction (continued):
   – Transductional mechanism in hair cells:
      • Bending of the stereocilia on hair cells causes tension on the tip links, which opens ionic pores and causes subsequent depolarization or hyperpolarization.
      • Tip links connect individual filaments of the stereocilia.
   – Most neural output from the cochlea arises from inner hair cells.
   – Role of outer hair cells:
      • They influence the movement of the tectorial membrane through two active mechanisms:
         – Mechanical response: physical changes in the structure of hair cells in response to sound stimuli
         – Electrical response: nerve stimulation of outer hair cells, e.g., phase-locking of action potentials to the peak amplitude of the acoustic stimulus
•  Frequency representation:
   – Frequency theory:
      • The basilar membrane is capable of vibrating within the full range of frequencies that are audible to humans.
      • Nerve fibres stimulated by the basilar membrane fire at a rate that conveys sound frequency information.
      • But:
         – Neuronal firing rates rarely exceed 500 impulses per second.
         – The basilar membrane is not a constant width; it is wider at the apex and narrower at the basal end.
Cochlear Processing of the Acoustic Stimulus
•  Frequency representation:
   – Tonotopic organization:
      • The basilar membrane's resonant frequency is organized in a topographic manner.
      • This representation is called a tonotopic map.

Auditory Processing
•  Frequency representation:
   – Place theory:
      • Sounds of different frequencies produce a vibrational pattern whose maximum amplitude occurs at different places along the basilar membrane:
         – A high frequency sound produces maximum vibration at the basal end of the basilar membrane, whereas a low frequency sound produces maximum vibration at the apical end.
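This place-to-frequency mapping can be made concrete with the Greenwood function, a standard model of the human cochlear map (not named in the slides; our illustration):

    def greenwood_frequency(x):
        """Greenwood map for the human cochlea: x is the fractional distance
        from the apex (0.0) to the base (1.0) of the basilar membrane."""
        return 165.4 * (10 ** (2.1 * x) - 0.88)

    print(round(greenwood_frequency(0.0)))  # ~20 Hz at the apex (low frequencies)
    print(round(greenwood_frequency(1.0)))  # ~20677 Hz at the base (high frequencies)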
Periodic acoustic stimuli are 'decomposed' into a spatiotemporal activation pattern along the basilar membrane of the cochlea...
•  Neural capture of auditory signals:
   – Anatomical organization:
      • Cochlear nerve fibres originate from cell bodies in the cochlear ganglion (a.k.a. spiral ganglion):
         – These are bipolar neurons: one branch innervates the hair cells, and the other branch carries auditory signals to higher brain centres.
      • 90% of cochlear nerve fibres terminate upon inner hair cells:
         – The remaining fibres innervate outer hair cells.
The cochlea acts as a natural 'spectrum analyzer', converting complex waveforms into simpler constituent components and measuring each one's contribution to the whole (much like Fourier's Theorem).
Auditory Processing: Neural Mechanisms
•  Neural capture of auditory signals:
   – Afferent versus efferent nerve fibres:
      • Each afferent fibre terminates on a single inner hair cell:
         – Each inner hair cell can direct its output through nearly 10 separate nerve fibres.
      • Most of the efferent fibres terminate on the outer hair cells:
         – Inhibitory effect on afferent fibres
The growth of sensation magnitude as a function of intensity varies with test frequency...
•  Neural capture of auditory signals:
   – Neural coding of sound intensity:
      • Afferent nerve fibres have different sensitivities.
      • A given inner hair cell can respond to a particular sound intensity by releasing different amounts of neurotransmitter at its 10 different synapses.
      • Sound intensity is coded by afferent fibres with different sensitivity profiles:
         – Very sensitive fibres have a high spontaneous firing rate and saturate at low intensities.
         – Low sensitivity fibres have lower spontaneous firing and saturate at higher intensities.
         – Analogous to simultaneous exposure "bracketing" in photography
Auditory Processing: Neural Mechanisms
•  Neural capture of auditory signals:
   – Neural coding of sound frequency—place theory mechanisms:
      • A particular afferent fibre carries signals from a hair cell that was depolarized by the particular sound frequency associated with its place in the tonotopic map.
      • With this place code, frequency information can be interpreted by the brain according to which fibre was activated.
   – Tuning curve = a plot of the minimum sound intensity required to obtain a sufficient neural response, as a function of sound frequency:
      • Characteristic frequency = the lowest point in the curve
Auditory Processing: Neural Mechanisms
•  Neural capture of auditory signals:
   – Neural coding of sound frequency—place theory mechanisms (see Figure 5.25 in text):
      • Frequency response curves (FRCs) = a plot of the neural discharge rate as a function of sound frequency
   – Neural coding of sound frequency—frequency theory mechanisms:
      • Phase-locked response = neural firing that has a precise timing relationship to the sound waveform, being linked either to each compressive portion or spaced out to every few cycles
      • Can occur for frequencies up to 4,000 Hz:
         – Beyond that, only the place code applies.
Auditory Processing of Sound—Biological Mechanisms
•  Subcortical auditory structures:
   – Ascending pathways:
      • SG (spiral ganglion) neurons form a nerve trunk that enters the brainstem and terminates in the cochlear nucleus.
      • The cochlear nucleus is where auditory signals branch out in a bilateral fashion.
      • Neurons from there project to the superior olive and inferior colliculus:
         – Binaural projections allow integration of auditory signals from both ears.
      • From there, projections go to the medial geniculate nucleus (a thalamic nucleus).
      • From there, a projection goes to the auditory cortex.

Auditory Processing: Biological Mechanisms
•  Response characteristics of subcortical structures:
   – Tonotopicity:
      • The place code is propagated through the subcortical pathways all the way up to the cortex.
   – Laterality:
      • Responses in the cochlear nucleus are monaural: no crossover of signals.
      • Auditory neurons that reside in structures above the cochlear nucleus integrate information from the two ears.
   – Response modification:
      • Auditory signals are processed to some degree within a localized neural network.

Auditory Processing—Biological Mechanisms
•  Subcortical auditory structures:
   – Descending pathways:
      • Signals from the cortex flow down through the same structures to modulate the auditory response to sound in two ways:
         – Efferent fibres that innervate the cochlea, originating in the superior olive (called olivocochlear neurons), can influence firing rates in the ascending system.
         – The efferent system can also activate two small muscles in the middle ear, called the tensor tympani and the stapedius, to protect from damage due to intense sounds.
•  The auditory cortex:
   – Anatomical organization (see Figure 5.27 in text):
      • Ascending fibres leave the medial geniculate nucleus and terminate in area A1, the primary auditory cortex, in the temporal lobe.
      • Area A2 surrounds area A1 and processes higher-order acoustic information.
      • Wernicke's area is involved in speech analysis.
Auditory Processing: Biological Mechanisms
•  The auditory cortex:
   – Functional organization:
      • Tonotopic organization:
         – A result of the tonotopic projection pattern found in the subcortical structures
      • Combination of binaural inputs:
         – Summation response = inputs from both ears are excitatory, and therefore one type of neuron in A1 can be driven by sound stimulation of either ear
         – Suppression response = input from the opposite side excites one type of neuron in A1, while input from the same side inhibits the neuron

Auditory Processing of Sound: Biological Mechanisms
•  The auditory cortex:
   – Parallel auditory pathways:
      • Area A1 is organized to represent sound frequency and binaural interaction.
      • This helps with sound localization and with processing temporal patterns.
      • These two functions may be processed in separate but parallel neural pathways of the higher auditory system.
Psychoacoustics:
•  Intensity and loudness
•  Frequency and pitch
•  Auditory space perception

Intensity and Loudness
•  Absolute audibility thresholds:
   – Procedural aspects:
      • Minimum audible field (MAF):
         – An open-field technique for obtaining detection thresholds by providing sound stimuli through a loudspeaker in front of the subject
      • Minimum audible pressure (MAP):
         – A closed-ear method for obtaining detection thresholds whereby sound is provided through a set of earphones
Intensity and Loudness
•  Audibility thresholds:
   – Absolute detection thresholds:
      • Minimal audibility curve = a psychophysical function that relates threshold sound detection levels to frequency
      • Hearing sensitivity is very dependent upon frequency.
•  Anatomical and physiological determinants of the minimal audibility curve:
   – Three possible sources can account for the U-shaped profile of the curve:
      • Conductive elements of the outer and middle ear (maximal enhancement occurs at the middle frequencies)
      • Mechanical response function of the cochlea (not this one: the basilar membrane responds similarly across a range of sound frequencies)
      • Physiological properties of the auditory nervous system (higher frequencies, up to a point, produce a stronger neural signal)
•  Audibility thresholds:
   – Factors that affect absolute sensitivity:
      • The way in which the data are collected:
         – The MAF technique yields lower thresholds than the MAP technique, although the profile of the audibility curve is the same.
      • The duration of the sound stimulus:
         – Temporal summation = the way the auditory system integrates sound energy over time
      • Whether the sound is binaural or monaural:
         – Stimulation of both ears yields lower thresholds than either one alone.
         – Binaural summation = the way the auditory system integrates sound information from both ears
•  Loudness perception—relationship to intensity:
   – Loudness grows as a power function of sound pressure level, with an exponent of about 2/3:
     L = k · SPL^0.67
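One numerical consequence of this power law (our arithmetic, not in the slides): a 10 dB increase roughly doubles loudness:

    pressure_ratio = 10 ** (10 / 20)          # +10 dB -> pressure x ~3.16
    loudness_ratio = pressure_ratio ** 0.67   # slide's exponent; k cancels in the ratio
    print(round(loudness_ratio, 2))           # 2.16: ~10 dB about doubles loudness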
•  Loudness perception—relationship to intensity:
   – Determinants of loudness perception:
      • Cochlear neurons encode discrete intensity regions:
         – The brain can map sound intensity into loudness by assessing which pool of neurons is responsive to a stimulus and what the firing rate is within that pool.
•  Loudness perception—relationship to frequency:
   – Equal loudness contours:
      • These provide data on the intensity values that produce equal loudness perception across a range of frequencies.
      • Progressively higher intensities are needed to achieve equal loudness perception toward the extreme ends of the frequency spectrum.
      • Equal loudness contours flatten out at higher intensities.

•  Loudness perception—relationship to intensity:
   – Determinants of loudness perception:
      • Higher sound intensities produce a spread of excitation along the basilar membrane:
         – The greater the range of frequency-tuned fibres that are active, the larger the spectral bandwidth of the acoustic stimulus.
      • Thus, there are two ways to make a sound louder:
         – Increase its amplitude
         – Broaden the timbral properties of the stimulus (make the sound more spectrally 'rich'...)

•  Loudness perception—relationship to frequency:
   – The phon scale of loudness:
      • Phon = the unit of measurement used to indicate loudness level across frequencies
•  Masking and noise:
   – Masking—general properties:
      • Masking occurs when the detection threshold of a particular sound is raised because of other sounds.
      • Masking studies have provided considerable insight into auditory function.
   – Tonal masking—frequency effects:
      • Tonal masking = the masking of a single tone by another
      • The most effective masking occurs when the masking and test tone frequencies are the same.
      • Masking tones of other frequencies require higher intensities to mask the test tone.
Frequency and Pitch
•  Coding mechanisms in pitch perception:
   – Place versus frequency theories:
      • Place theory is better suited to explain other instances of pitch perception:
         – Dichotically presented harmonics

Auditory Space Perception
•  Acoustic cues for localization:
   – Physical considerations:
      • Azimuth = the horizontal angle around the head
      • Elevation = the vertical angle with respect to the head (the pinna imposes time-distortion products onto the incoming acoustic signal)
Auditory Space Perception
•  Acoustic cues for localization:
   – Interaural intensity difference:
      • When sound reaching the farther ear has slightly less intensity than sound reaching the nearer ear
      • Occurs because:
         – Sound pressure decreases with increasing distance.
         – Head shadow effect = the head reflects incoming sounds:
            • Much greater with high frequency sounds
            • Much greater when the sound path difference between the two ears has the largest value
Auditory Space Perception
•  Acoustic cues for localization:
   – Interaural time difference:
      • Sounds require extra time to travel to the farther ear.
      • Stimulation needs to be dichotic.
      • The path difference of sound transmission to the two ears will depend on the azimuth angle.
      • It is less susceptible to changes in frequency than interaural intensity differences.
Auditory Space Perception
•  Acoustic cues for localization:
   – Localization under ambiguous conditions:
      • There are two possible angles for any given value of IID and ITD.
      • This can be resolved by turning or tilting the head.
      • The pinna also contributes to auditory localization under ambiguous conditions.

Auditory Space Perception
•  Neural mechanisms:
   – The binaural cross-correlation model:
      • A neural network model based on interaural time differences
Auditory Space Perception
•  Neural mechanisms:
   – Neural correlation mechanisms in the brainstem:
      • The superior olive is the primary site where binaural differences are coded in a systematic way.
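A minimal sketch of the cross-correlation idea (our illustration, with assumed parameters; the 0.5 ms ITD is simulated): slide one ear's signal against the other's and pick the interaural delay with the best match:

    import numpy as np

    fs = 44100
    t = np.arange(int(0.05 * fs)) / fs
    left = np.sin(2 * np.pi * 500 * t)           # 500 Hz tone at the left ear
    itd_samples = int(0.0005 * fs)               # simulate a 0.5 ms ITD
    right = np.roll(left, itd_samples)           # delayed copy at the right ear

    # Correlate over a plausible range of interaural delays (+/- 1 ms)
    max_lag = int(0.001 * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.dot(left, np.roll(right, -lag)) for lag in lags]
    best = lags[int(np.argmax(corr))]
    print(round(best / fs * 1000, 2))            # ~0.5 ms: the ITD is recovered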
Auditory Space Perception
•  Binaural hearing and perceptual organization:
   – Masking effects of background noise:
      • The binaural system is very important for distinguishing signal from noise.
      • The masking effect of noise is at a minimum when it has a different spatial location from the signal.
Auditory Space Perception
•  Binaural hearing and perceptual organization:
   – Auditory scene analysis:
      • This is the manner by which the ear achieves an effortless segregation of concurrent sounds.
      • First, the mixture of sounds undergoes primitive grouping.
      • The subsequent processing stage involves schema-driven grouping.
Testing hearing function:
•  Psychophysical tests:
   – The 'pure tone' audiogram
   – Speech recognition tests
•  Acoustic reflexes:
   – Stapedius reflex
•  Electrophysiologic tests:
   – Brainstem auditory evoked responses (BAERs)
   – Auditory evoked potentials (AEPs)

Typical behavioural hearing test:
•  Absolute detection thresholds:
   – Minimal audibility curve = a psychophysical function that relates threshold sound detection levels to frequency
The "normal" audiogram shows the relative change in sensitivity (Hearing Level, in dB) at each frequency of testing. Symbols for the air conduction test: X = left ear, O = right ear.
Note that 0 dB HL (Hearing Level) refers to the RATIO of the sound pressure level at a given subject's detection threshold divided by the SPL of the normal population at that same frequency.
http://www.phys.unsw.edu.au/jw/hearing.html
Testing hearing function (continued):
•  Electrophysiologic tests:
   – Brainstem auditory evoked responses (BAERs)
   – Auditory evoked potentials (AEPs)
(Figures: Brainstem Evoked Auditory Response; Auditory Evoked Potentials)
Auditory Dysfunction
•  Causes of hearing loss:
   – Conductive loss:
      • When the mechanical conduction of sound through the ear is affected
      • Otitis media = a middle ear infection that can change the vibrational characteristics of the eardrum
      • Otosclerosis = an inherited bone disease that produces abnormal development and function of the ossicles

Auditory Dysfunction
•  Causes of hearing loss:
   – Sensorineural loss:
      • When the cochlea or the neural elements of the inner ear malfunction
      • Usually occurs due to damaged hair cells:
         – e.g., ingestion of ototoxic drugs, injury, tumors, disease, exposure to intense environmental noise
      • Genetic causes:
         – Usher syndrome, Waardenburg syndrome
      • Aging:
         – Presbycusis = hearing loss that gradually occurs due to aging
            • Affects high frequencies first
•  Diagnosis and treatment:
   – Use the bone conduction test to distinguish between conductive and sensorineural loss.
     (Figure: audiogram from a patient with a sensorineural hearing loss in the left ear; X = air conduction, > = bone conduction)
   – Hearing aids:
      • Amplify incoming sounds
      • Are effective as long as sensory function in the cochlea is not lost
   – Cochlear implant:
      • Converts the sound signal into an electric current that is then delivered directly to the auditory nerve fibres
Different causes of deafness result in different symptoms:
•  In acoustic neuroma, the pure tone audiogram shows an ipsilateral sensorineural deafness.
•  'Glue ear' results in conduction deafness. The pure tone audiogram shows hearing loss particularly at low frequencies.
•  Meniere's disease is characterised by a low frequency sensorineural hearing loss.
•  Noise-induced hearing loss results in (typically) bilateral sensorineural hearing loss centred at 4 kHz...
•  Presbycusis (due to aging) results in high frequency hearing loss...
•  Presbycusis is a cumulative deficit (i.e., an expression of the cumulative effects of hearing damage over the life span).
The Puzzle of Recruitment in Hearing Loss (or: why are hearing aids not a 'fashion item'?):
•  A patient has a 40 dB hearing loss at 4 kHz, measured by pure tone audiometry.
•  This means that at 4 kHz the patient requires 40 dB more sound pressure than a 'normal' hearing individual to just detect the presence of the 4 kHz tone.
•  However, when the same tone is presented to the patient above threshold, it is rated much louder than one would expect from a person with a 40 dB hearing loss...
•  The auditory system "recruits" more neurons in response to higher intensities (i.e., the hearing loss decreases with increasing SPL above threshold).
•  Thus, the corrective "gain" needed for louder sounds is less than for quieter ones... (problematic for hearing aids)
The Psychological Effects Associated with Hearing Loss
"Deafness is a much worse misfortune for it means the loss of the most vital stimulus – the sound of the voice that brings language, sets thoughts astir, and keeps us in the intellectual company of man. Blindness separates us from things but deafness separates us from people."
-- Helen Keller (1880–1968)
Speech samples with selected frequency bands removed:
•  Speech sample (full bandwidth – all frequencies present)
•  Speech sample (low-pass filtered – all frequencies < 2000 Hz)
•  Speech sample (low-pass filtered – all frequencies < 500 Hz)
•  Speech sample (high-pass filtered – all frequencies > 2000 Hz)
•  Speech sample (high-pass filtered – all frequencies > 500 Hz)
Applications:
•  Hearing aid design
•  Digital sound coding / compression
•  High fidelity music reproduction
•  Cochlear implantation technology
•  Speech recognition technology
•  Room/hall acoustics design
•  Musical instrument design
•  Auditory alarm design
•  Industrial noise suppression
•  Echolocation
•  Acoustic spectrography / 'fingerprinting'
•  Automatic voice recognition
Applications: hearing aid design (including the cochlear implant)

Applications: digital sound coding / compression
Perceptual Audio Coding: Basic Ideas
•  Decompose a signal into separate frequency bands by using a filter bank.
•  Analyze the signal energy in the different bands and determine the total masking threshold of each band due to signals in other bands/times.
•  Quantize the samples in different bands with accuracy proportional to the masking level:
   – Any signal below the masking level does not need to be coded.
   – Signals above the masking level are quantized with a quantization step size set according to the masking level, and bits are assigned across bands so that each additional bit provides the maximum reduction in perceived distortion.
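A toy sketch of the first two steps (our illustration only – far simpler than any real codec; the 32-band split and the 60 dB threshold are arbitrary assumptions standing in for a proper masking model):

    import numpy as np

    fs = 8000
    t = np.arange(fs) / fs
    # a strong 440 Hz tone plus a very weak 3 kHz tone
    signal = np.sin(2 * np.pi * 440 * t) + 0.01 * np.sin(2 * np.pi * 3000 * t)

    spectrum = np.fft.rfft(signal)
    bands = np.array_split(np.arange(len(spectrum)), 32)   # crude "filter bank"

    # Energy per band, expressed in dB relative to the strongest band
    energy = np.array([np.sum(np.abs(spectrum[b]) ** 2) for b in bands])
    level_db = 10 * np.log10(energy / energy.max() + 1e-12)

    coded = spectrum.copy()
    for b, level in zip(bands, level_db):
        if level < -60:          # stand-in for "below the masking threshold"
            coded[b] = 0         # signal below the threshold is not coded

    print(np.count_nonzero(coded == 0), "of", len(spectrum), "samples dropped")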
•  YouTube presentation on the cochlear implant
•  Digital reproduction of speech sounds demo
Applications: high fidelity music reproduction
•  Acoustic reproduction (music, soundtracks, voice)
•  Applications for cinema, live theatre, and conferences

Applications: speech recognition technology
•  Automated voice dictation systems
•  'Hands-free' telephony
•  Automated voice-band surveillance, etc.
•  Voice-operated machine control
•  Speech-based demand management systems
Applications: room/hall acoustics design (standards)
•  Speech intelligibility may be expressed by a single-number value. Two scales are most commonly used: the Speech Transmission Index (STI) and the Common Intelligibility Scale (CIS).
•  Both involve measuring the input/output relationship between a set of test signals broadcast into a room and the factors affecting signal fidelity, including speech level, background noise level, masking effects, reverberation effects, etc.

Music Perception
•  The challenge to define music...
•  Music is sonic information that is:
   – temporally correlated with respect to pitch, harmony, rhythm and loudness
   – spatially correlated with respect to timbre, loudness and ...
•  Music is not:
Music Perception
•  Musical pitch and timbre:
   – Pitch sensations:
      • Various harmonic components produce an orderly pattern of peak wave activity on the basilar membrane.
      • The brain can read this spatial pattern to perceive both a pitch sensation and a timbral sensation.
      • Temporal properties of the stimulus are important.
      • Removing the attack transient on a note makes it more difficult to identify the musical instrument that created it.
•  Struck, plucked, and bowed strings have a number of different preferred states of vibration, or resonance, which produce harmonics.
Music Perception
•  Musical pitch and timbre:
   – Musical scales—chromatic intervals and tone height:
      • Tone chroma = the relative position of a tone within an octave
      • Tone height = the series of successive octave registers spanning the full musical pitch spectrum
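A small illustration of the chroma/height split (our example, not from the slides), using MIDI note numbers:

    NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

    def chroma_and_height(midi_note):
        """Split a MIDI note number into tone chroma (position within the
        octave) and tone height (octave register)."""
        return NOTE_NAMES[midi_note % 12], midi_note // 12 - 1

    print(chroma_and_height(69))  # ('A', 4): concert A, 440 Hz
    print(chroma_and_height(81))  # ('A', 5): same chroma, one octave higher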
Music Perception
•  Musical pitch and timbre:
   – Perfect pitch:
      • Relative pitch = the ability to accurately identify tone intervals
      • Perfect pitch = the ability to identify an isolated tone by name
         – Arises from a combination of genetic and environmental factors

Music Perception
•  Perceptual aspects of music:
   – Tonal superpositions—monophonic versus polyphonic tones:
      • The auditory system is able to identify individual tones when they are presented together in a chord.
      • It can discriminate two notes of the same pitch but different timbre.
      • It can also discriminate two notes of different pitches.
      • Adding notes together results in a perceptually segregated set (unlike adding colours together)...
        *Recall the distinction between prothetic and metathetic perceptual continua...
Music Perception
•  Perceptual aspects of music:
   – Tonal superpositions—consonance and dissonance:
      • Two or more simultaneously played notes generate a consonant sensation if they are perceived as pleasant, and a dissonant sensation if unpleasant.
      • Consonant sounds occur when:
         – Two or more tones contain harmonic frequencies that coincide and therefore reinforce each other.
         – A distinct and separable frequency pattern is present among two or more tones played simultaneously.
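A quick numerical check of the 'coinciding harmonics' idea (our example; 262 Hz and 393 Hz approximate C4 and G4, a 3:2 perfect fifth):

    f1, f2 = 262.0, 393.0                        # C4 and G4, an exact 3:2 ratio
    h1 = {round(f1 * n) for n in range(1, 9)}    # first 8 harmonics of each note
    h2 = {round(f2 * n) for n in range(1, 9)}
    print(sorted(h1 & h2))                       # [786, 1572]: shared, reinforcing harmonics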
Music Perception
•  Perceptual aspects of music:
   – Tonal sequences—melody, tempo, and rhythm:
      • Melody = a group of musical tones that forms a sequential appearance in time (with statistical properties related to an intermediate information level); melody refers to a temporal sequence of pitches.
      • Harmony refers to a SCALE CONTEXT into which the melody is placed...
      • Melodic perception is invariant (within limits) under many transformations of key (base pitch), meter, timbre, harmonic content, and lyrics!
      • Tempo = the perceived speed of tonal presentations in a melodic sequence
      • Rhythm = the perceptual organization of a melodic sequence based on temporal grouping of musical tones
      • Contour = the nature of pitch changes with time
Music Perception
•  The ORDINAL PATTERN of a transposed melody remains constant:
   S + _ _ + _ _ _ _
   S + _ _ + _ _ _ _
•  Perceptual aspects of music:
   – Tonal sequences—cognitive theories:
      • Gestalt principles are believed to be responsible for generating the perceived wholeness of music:
         – Proximity = tones appearing close together in time will be grouped as a perceptual unit
         – Similarity = tones that have a similar pitch or timbre are perceptually bound together in the same group
         – Common fate = tonal sequences whose pitch or intensity patterns change in the same direction form a common unit
Gestalt Principles associated with music perception:
Melodic components "reveal" themselves due to spatiotemporal "grouping"... (auditory "stream segregation")
•  Proximity:
   – When sequential tones are close in pitch, they are perceived as 'belonging' together...
   – When sequential tones are close in time, they are perceived as belonging together...
   – As you separate either pitch or time (or both), the stream 'segregates' into two or more separately perceived streams...
•  "Growth" of segregation with repetition
•  Segregation of a melody from an auditory note stream – more effective when the pitch of the tones in the melody is different from the rest of the stream...
Auditory demonstrations downloaded from Dr. Al Bregman's website, McGill University:
http://webpages.mcgill.ca/staff/Group2/abregm1/web/snd/Track05.mp3
http://webpages.mcgill.ca/staff/Group2/abregm1/web/downloadsdl.htm
Gestalt Principles associated with music perception:
•  Stream segregation as a melodic construction in music – using pitch grouping (excerpt from a sonata by Telemann)
•  The sum is greater than its parts (under the right conditions): segregation of rhythmic components
•  The sum is NOT greater than its parts (under the right conditions): melodic segregation by differences in timbre (commonly used as a compositional mechanism in music)

Music Perception
•  Music and the brain:
   – Pitch perception is processed in the planum temporale.
   – Different patterns of brain activation occur when listening to non-melodic compared to melodic sequences.
Speech Perception
•  All normally developing humans have the capacity for speech and language acquisition.
•  Speech and hearing evolve and develop together.
•  Speech and hearing experiences modify brain function and structure.

Excitation of the vocal apparatus
(Figure: a sequence of vocal fold positions during speech)
•  Rapid pulses of air pressure into the oral/nasal cavities produce the 'vocal carrier' waveform for all speech sounds...
   (Male: ~125 pulses/sec, f0 ≈ 125 Hz; Female: ~250 pulses/sec, f0 ≈ 250 Hz)
•  Voiced excitation (phonation): exhalatory air streams through the larynx, passing through the vocal cords (glottis). At appropriate tension, these are excited to quasi-periodic vibration (mostly constant, but can be somewhat variable in period and amplitude).
•  Voiceless excitation: air streams through the open glottis and passes through a constriction in the oral cavity or in the pharynx. At this point, the friction produces turbulence and hence a noise whose spectrum is determined by the position of the constriction.
•  Transient excitation (plosive excitation): air congests at an occlusion in the oral cavity or in the pharynx. A sudden opening of the occlusion releases the pressure.
Modulation of the vocal 'carrier'
•  Modulation (electrical engineering): the act or process of varying the amplitude, frequency, phase, etc., of a 'carrier' signal under the control of another signal.
•  For speech, this means varying the vocal carrier (created by the glottal pulse or by exhaled air) under the control of the brain. This results in changes in the SHORT-TIME FREQUENCY SPECTRUM of the vocal stream.
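A minimal source-filter sketch of this modulation idea (our illustration, using SciPy; the formant values are assumed, roughly /a/-like): a 125 Hz glottal pulse train (the slide's male f0) is the carrier, and two resonators stand in for the vocal tract:

    import numpy as np
    from scipy.signal import lfilter

    fs = 16000
    f0 = 125                                  # glottal pulse rate (male, per the slide)
    source = np.zeros(fs // 2)                # half a second of samples
    source[::fs // f0] = 1.0                  # impulse train: the vocal 'carrier'

    def formant(signal, freq, bw):
        """Second-order resonator centred on one formant frequency."""
        r = np.exp(-np.pi * bw / fs)
        a = [1, -2 * r * np.cos(2 * np.pi * freq / fs), r ** 2]
        return lfilter([1], a, signal)

    # Assumed /a/-like formants: F1 ~ 700 Hz, F2 ~ 1200 Hz
    vowel = formant(formant(source, 700, 100), 1200, 100)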
From M1, 6 of the 12 cranial nerves send motor fibers to the muscles involved in the production of speech:
•  Trigeminal (CN5)
•  Facial (CN7)
•  Glossopharyngeal (CN9)
•  Vagus (CN10)
•  Spinal accessory (CN11)
•  Hypoglossal (CN12)
These control the muscles of the pharynx, palate, tongue, and face, and the muscles of mastication.
Tongue positions for articulating vowels…
The intelligence (message) in speech is "carried" in the modulations of the vocal apparatus
•  Thus, limits on the rate of production of speech sounds set the upper limit on the amount of information that can be carried per unit time... (about 10 speech events/sec ≈ 50–60 bits/sec; realistically closer to 30 bits/sec)
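A rough check of these numbers (our arithmetic, not in the slides): English has about 40 phonemes, so each carries at most log2(40) ≈ 5.3 bits; at 10 speech events per second that is at most ≈ 53 bits/sec, and the redundancy between successive phonemes pulls the realistic figure down toward 30 bits/sec.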
•  Bilateral cerebral and cerebellar regions are involved in the motor act of articulation, irrespective of the type of speech.
•  Three additional, left-lateralized regions adjacent to the Sylvian sulcus are activated in common: the most posterior part of the supratemporal plane, the lateral part of the pars opercularis in the posterior inferior frontal gyrus, and the anterior insula.
Various speech components use different parts of the 'auditory island'...
•  Vowel sounds are largely represented at the lower frequency end.
•  Consonants cover a wider range of frequencies.
•  Fricatives are largely represented at the higher frequency end.
The "waterfall plot" or continuous speech spectrogram:
•  A graph showing how the sound spectrum varies with time (due to articulation).
•  (Note the plot of the amplitude envelope as a function of time, in red.)
•  The darker the blue pixels, the more energy is located at that point in frequency/time...
•  Note the 'formant' shapes (ridges of energy – dark bands – located at different frequencies, parallel to the time axis), labelled F1, F2 and F3...
•  A spectrogram of four words reveals discrete bands of resonant frequencies, known as formants.
•  Formants (indicated by white lines) are numbered from the lowest frequency upward.
•  The position of a formant on the frequency axis depends on tongue position...
Formants are produced by specific positions of the tongue during vocalization.
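A minimal sketch of computing such a spectrogram (our example using SciPy; a rising frequency sweep stands in for real speech):

    import numpy as np
    from scipy.signal import spectrogram

    fs = 8000
    t = np.arange(2 * fs) / fs
    sweep = np.sin(2 * np.pi * (300 + 200 * t) * t)    # frequency rising over time

    f, times, Sxx = spectrogram(sweep, fs=fs, nperseg=256, noverlap=192)
    # Sxx[i, j] = energy at frequency f[i] and time times[j]; in the slide's
    # plot, darker pixels correspond to larger values of Sxx.
    print(Sxx.shape)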
•  Speech comprehension:
   – Perception of phonetic segments—consonants:
      • The sounds that occur before or after a phoneme can affect the way it is perceived.
      • The timing sequence during the articulation of certain consonants is also an important feature of a sound.
      • Speech perception is an example of categorical perception – a many-to-one mapping of the properties of the acoustic stimulus to perceptual experience.

Examples of articulation components of speech
•  A spectrogram of "connected" (continuous) speech shows that the acoustical signals of individual words merge into a continuous pattern where the beginnings and endings of words cannot be easily determined (visually). This is one of the most important reasons why it has been so difficult to build 'automatic' speech recognition systems. ("Disconnected" speech recognizers came first...)
•  How does the brain solve this problem? Answer: through "top-down" influences, such as knowledge of a language, knowledge of a dialect or accent, facial features during speaking, linguistic context, and the information density of the utterance, etc.
•  Formant transitions occur in consonant-vowel pairings. Transitions represent abrupt shifts in acoustic energy from one frequency to another over a short time (< ~50 msec).

Three Problems for Speech Perception
•  Segmentation problem = the difficulty in parsing speech due to the relative lack of acoustical boundaries between individual words in free-flowing speech
•  Variation problem = the difficulty in recognizing different acoustic stimuli as carrying the same information...
•  Coarticulation problem = the difficulty in parsing speech due to overlapping speech segments that occur from the contextual relationship between adjacent phonetic segments
Speech Perception
•  Voice-onset time (VOT): a VOT of 30 ms represents the transition point between the voiced [da] and voiceless [ta] sounds.
   – Discrimination ability is poor if VOTs fall within a single category.
   – Excellent discrimination ability occurs for sounds whose VOTs fall on either side of the phonetic boundary.
•  Speech comprehension:
   – Perception of words and word sequences:
      • A crucial element is having a sufficient lexical database.
      • Semantics and grammatical rules also play a critical role.
      • The phonemic restoration effect showed that the context of the speech segment is important too.
      • The McGurk effect shows that speech perception also includes visual information regarding articulation.
Speech Perception is Categorical:
•  Variations in the production of speech sounds result in acoustic variability in the speech signal.
•  Thus, a large set of acoustic stimuli (speech sounds) must map to a small set of speech categories which, in turn, maps to an even smaller set called 'intelligence' or 'meaning'.
•  The amount of intelligence carried in a speech sample is not large...
   – The maximum phoneme inventory is the !Xóõ language (Botswana): 112 phonemes
   – The minimum is the Rotokas language (New Guinea): 11 phonemes
   – The English language has ~40 phonemes
   – In part, this is why translation is such a challenge...

•  Neural processing of speech begins in the primary auditory cortex.
•  Signals from there feed into Wernicke's area—a key area for speech comprehension.
•  Signals then feed into Broca's area—a key area for speech production.
•  The motor cortex finally receives signals to coordinate the motor movements necessary for the mouth and tongue movements involved in speech.

•  Compare the information-processing task of speech perception to that of receptors on the post-synaptic membrane (also a receiver in a communication system...).
•  Is the relative recency of speech (on the evolutionary time scale) possible only because of the information-processing capacity of the auditory cortex?
•  Speech comprehension:
   – Interpretation of the speech signal:
      • A combination of bottom-up and top-down processing
      • A form of translation...
"Still it seems to me that translation from one language into another, if it be not from the queens of languages, the Greek and the Latin, is like looking at Flemish tapestries on the wrong side; for though the figures are visible, they are full of threads that make them indistinct, and they do not show with the smoothness and brightness of the right side." (Cervantes, 1615)
Natural human languages (i.e., languages that evolved naturally) are redundant (i.e., they contain less than the maximum information per unit symbol theoretically possible).
•  For example, English texts can be compressed by a factor of 4 without loss of information.
•  Spoken English carries ~30–40 bits/sec of information.
•  Digital reproduction of a hi-fidelity recording of a speaker's voice requires roughly a thousand times more information (~44.1 kbit/sec).
•  Thus, MOST of what is spoken in speech isn't necessary to transmit what is meant...
•  Brain regions near the auditory cortex are involved in language and speech processing.
•  Brain asymmetry and lateralization:
   – Much of the neural processing underlying language production and comprehension is restricted to the left hemisphere in most people.
   – The right hemisphere performs high-level functions such as language interpretation, drawing inferences, and resolving linguistic ambiguities.
   – Wernicke's aphasia = a deficit in understanding language following damage to Wernicke's area in the temporal lobe
   – Broca's aphasia = a deficit in speech production ability following damage to Broca's area in the frontal lobe