PSYC3702 B Perception: The Auditory System (Hearing, Speech and Music Perception)
Textbook readings: Chaudhuri, Chs. 5, 6 & 7
Dr. Brian W. Tansley

Physical limits define 'sound'
- Hearing is a 'telesense'...
- Sound is a highly processed representation of a complex physical (acoustic) stimulus...
- Humans experience a broad range of auditory experiences, e.g., 'soundscapes' (acoustic environments), speech and vocalizations, and music.
- Frequency: a "band" with upper and lower limits:
  - Infrasonic = sounds with frequencies below 20 Hz
  - Ultrasonic = sounds with frequencies above 20,000 Hz
- Amplitude: a lower limit only (the upper limit remains undefined, or is set by pain rather than by audition).

Molecular collisions vary in intensity and in time
- Sound exists when:
  - Large populations of molecules collide in synchrony (within the frequency band of ~20 Hz to 20,000 Hz)
  - The molecular collisions propagate through space and time in a medium (e.g., air, water, wood, brass, bone); in air this is referred to as pressure change
  - The pattern is transduced into neural patterns within the CNS
  - It is processed at the cortical level by an awake organism
(Figure: a waveform illustrating amplitude and frequency.)

Physical properties of sound
- A "perturbation" of the instantaneous compression of the molecules of a "medium" (e.g., air, water, wood, brass, etc.), resulting in a 'single-valued' function of time...
- The "amplitude envelope" of that compression over time: Attack, Decay, Sustain, Release (ADSR), plotted as pressure against time.
- Sound is heard when the air pressure in which the listener lives FLUCTUATES above and below ambient* at frequencies above 20 Hz and below 20 kHz.
  *Ambient (baseline) pressure; today's ambient pressure in Ottawa = 101.9 kilopascals.
- At 1000 Hz (1 kHz), it takes a deviation of only 2 x 10^-5 pascals from the ambient pressure of ~100 kilopascals for a person with 'normal' hearing to just detect the presence of a sound...

Measuring sound
- The sound measurement scale is related to Fechner's Law, S = k log I:
  - Bel = log10 [I(stim) / I(ref)]; a Bel is a LARGE quantity, typically too large for psychoacoustic research.
  - Instead we use the decibel (dB), which is 0.1 Bel (closer to the difference threshold for loudness): dB = 10 log10 [I(stim) / I(ref)].
  - (Figure: the "Island of Audibility".)
- Sound intensity is related to the pressure created by the vibrating energy source. Intensity varies as the square of pressure (equivalently, pressure varies as the square root of intensity), so the Sound Pressure Level (SPL) of a stimulus s is:
  SPL = 10 log10 (P_s^2 / P_ref^2) = 20 log10 (P_s / P_ref)

Source of sound | Sound pressure (Pa RMS) | SPL (dB re 20 µPa)
Shockwave (distorted sound waves > 1 atm; waveform valleys clipped at zero pressure) | >101,325 | >194
Theoretical limit for undistorted sound at 1 atm environmental pressure | 101,325 | ~194.094
Stun grenades | 6,000–20,000 | 170–180
Rocket-launch equipment acoustic tests | ~4,000 | ~165
Simple open-ended thermoacoustic device | 12,619 | 176
.30-06 rifle fired 1 m to the shooter's side | 7,265 | 171 (peak)
M1 Garand rifle fired at 1 m | 5,023 | 168
Jet engine at 30 m | 632 | 150
Threshold of pain | 63.2 | 130
Vuvuzela horn at 1 m | 20 | 120 (dBA)
Hearing damage (possible) | 20 | ~120
Jet engine at 100 m | 6.32–200 | 110–140
Jackhammer at 1 m | 2 | ~100
Traffic on a busy roadway at 10 m | 0.2–0.632 | 80–90
Hearing damage (over long-term exposure; need not be continuous) | 0.356 | 85
Passenger car at 10 m | 0.02–0.2 | 60–80
EPA-identified maximum to protect against hearing loss and other disruptive effects (sleep disturbance, stress, learning detriment, etc.) | – | 70
Handheld electric mixer | – | 65
TV (set at home level) at 1 m | 0.02 | ~60
Washing machine, dishwasher | – | 50–53
Normal conversation at 1 m | 0.002–0.02 | 40–60
Very calm room | 0.0002–0.000632 | 20–30
Light leaf rustling, calm breathing | 0.0000632 | 10
Auditory threshold at 1 kHz | 0.00002 | 0
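To make the scale concrete, here is a minimal Python sketch of the pressure-to-dB conversions above (not from the original slides; the function names are mine). It spot-checks a few table rows and previews the decibel-addition rule discussed on the next slide.

```python
import math

P_REF = 20e-6  # reference pressure: 20 micropascals = 0 dB SPL

def spl_from_pressure(p_pa):
    """SPL in dB re 20 uPa: 20 * log10(P / Pref)."""
    return 20.0 * math.log10(p_pa / P_REF)

def pressure_from_spl(spl):
    """Inverse mapping: RMS pressure (Pa) for a given SPL (dB)."""
    return P_REF * 10.0 ** (spl / 20.0)

# Spot-checking rows of the table above:
print(spl_from_pressure(2e-5))    # auditory threshold at 1 kHz -> 0.0 dB
print(spl_from_pressure(63.2))    # threshold of pain -> ~130 dB
print(spl_from_pressure(101325))  # ~1 atm -> ~194.09 dB, the undistorted limit

# The decibel-addition rule from the next slide: convert to linear
# pressure, add, convert back. Two identical in-phase 20 dB sources:
print(spl_from_pressure(2 * pressure_from_spl(20.0)))  # ~26 dB, not 40 dB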
Adding decibels isn't like adding linear values: two 20 dB sounds played together don't produce a 40 dB sound; they produce a 26 dB sound. Hint: convert back to linear values, add them, and then convert to logarithms once again (exactly what the code sketch above does).

Number of (identical) sources | Increase in Sound Pressure Level (dB)
2 | 6
3 | 9.6
4 | 12
5 | 14
10 | 20
15 | 23.6
20 | 26
(For n identical, in-phase sources the pressures add, so the increase is 20 log10 n.)

The properties of sound
- Sound is information presented to the brain by acoustic perturbations.
- A 'medium' is required to carry the perturbations; a medium is defined by a minimum number of molecules (enough to knock each other along a line from the source to the auditory system or microphone).
- Speed of sound: the density and elasticity of the medium affect sound speed differently.
  - In air, sound is carried at 331.5 metres/second.
  - The greater the elasticity of the medium, the greater the speed of sound propagation.
  - Inelastic substances cannot carry sound: once they have been perturbed (displaced), they cannot return to their former state.

"Complex" sounds (sounds containing energy at more than one frequency)
- Complex tones can be composed of a fundamental frequency and exact multiples of that frequency, known as its harmonics; the intensity of each harmonic can vary.
- Harmonics are 'integer multiples' of the fundamental frequency f; for f = 200 Hz: 2f (400 Hz), 3f (600 Hz), 4f (800 Hz), 5f (1 kHz), 6f (1200 Hz).
- "Periodic" sounds: the pattern of pressure change repeats itself at regular intervals over time (e.g., music, speech (vowel) sounds).
- "Aperiodic" sounds: the pattern of pressure change does not repeat itself (e.g., traffic, white noise, crowd noise, fricative and plosive speech sounds).
- Complex periodic sound waves are produced by summing two or more pure tones; the resulting waveform is the algebraic sum of the component amplitudes at each moment in time.

Complex sounds: Fourier analysis
- Fourier analysis decomposes a complex waveform (represented as a single-valued function of time) into a series of sine-wave "elements" (in the analytic tradition).
- Fourier spectrum = a plot of the amplitude of each component sine wave as a function of its frequency.

Complex sounds: Ohm's law of hearing
- The ear actually functions as a type of Fourier analyzer, in that it decomposes complex sound waveforms into their spectral components.
- It converts a single-valued function of time into an ARRAY of single-valued functions of time, where the index on the array is FREQUENCY...
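The Fourier/Ohm's-law idea lends itself to a short demonstration. The sketch below is mine, not from the slides: it builds a complex periodic tone as the algebraic sum of a 200 Hz fundamental and two harmonics, then recovers each component's amplitude as a function of frequency, i.e., the 'array indexed by frequency'.

```python
import numpy as np

fs = 8000                       # sample rate (Hz); one second of signal
t = np.arange(fs) / fs

# A complex periodic tone: fundamental plus harmonics at integer
# multiples, each with its own intensity.
wave = (1.00 * np.sin(2 * np.pi * 200 * t) +
        0.50 * np.sin(2 * np.pi * 400 * t) +
        0.25 * np.sin(2 * np.pi * 600 * t))  # algebraic sum at each moment

# Fourier analysis recovers the components. With a 1-second signal,
# FFT bin k corresponds to k Hz exactly.
amps = np.abs(np.fft.rfft(wave)) / (len(wave) / 2)
for f in (200, 400, 600):
    print(f, "Hz:", round(amps[f], 2))   # -> 1.0, 0.5, 0.25
```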
Sound transmission: interaction of sound with objects
- The "inverse square law": sound intensity falls off with the square of the distance from the source; this sets limits on the speech intelligibility range. (A short numerical sketch follows at the end of this section.)
- Reflection = much of the sound bounces back off the second medium. Acoustic mirrors are everywhere...
- Absorption = the sound is transmitted through the second medium.
- Diffraction = the sound waves bend around the object (easier for low-frequency sounds).

Anatomical components of the human ear
- The outer ear:
  - Pinna = the visible part of the ear, which serves as a sound funnel
  - External auditory canal
  - Tympanic membrane (eardrum) = a thin elastic membrane that vibrates with incoming sound waves
- The middle ear:
  - Eustachian tube = connects the middle ear with the back of the throat to equalize air-pressure differences across the eardrum
  - Ossicles = three tiny bones that conduct the sound vibration to the inner ear at the oval window
- The inner ear:
  - Semicircular canals = fluid-filled bony cavities that are part of the vestibular system, used to transmit information about balance and body position in space
  - Cochlea = a fluid-filled bony structure that contains the sensory transduction apparatus of hearing; it is divided into three separate fluid-filled channels
- Sound transmission through the ear: sound waves are funneled to the eardrum, which vibrates; the vibrational pattern is conducted by the ossicles into the cochlea through the oval window.

Auditory transduction: the organ of Corti, structural features (see Figure 5.18 in text)
- It lies on top of the basilar membrane in the cochlear duct and spans the length of the cochlea.
- Mechanical stimulation produces the initial bioelectric response; at this stage the vibrational pattern is traveling through liquid rather than air.
- The arch of Corti divides the organ of Corti into an inner and an outer portion:
  - The inner portion contains a single row of inner hair cells (3,000–3,500 inner hair cells).
  - The outer portion contains three rows of outer hair cells (12,000 outer hair cells).
- Both types of hair cells carry stereocilia: fine filaments that protrude from the upper surface of the hair cells. Inner hair cells have ~40 stereocilia; outer hair cells have ~150.
- The tectorial membrane lies above the organ of Corti and connects with the stereocilia of the outer hair cells.

Auditory transduction: the organ of Corti, mechanical response to sound stimulation
- The traveling wave in the basilar membrane produces movement of the tectorial membrane and arches of Corti, producing a shearing force on the stereocilia of the hair cells and causing them to bend.
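As flagged on the sound-transmission slide above, here is a short numerical sketch of the inverse square law (assuming an idealized point source in a free field; the function name and scenario are mine):

```python
import math

def spl_at_distance(spl_ref_db, d_ref_m, d_m):
    """Inverse square law: intensity falls as 1/d^2, so SPL drops by
    20*log10(d/d_ref) dB, about 6 dB per doubling of distance."""
    return spl_ref_db - 20.0 * math.log10(d_m / d_ref_m)

# The jet engine from the earlier table: 150 dB at 30 m.
print(round(spl_at_distance(150, 30, 60), 1))   # doubled distance -> ~144 dB
print(round(spl_at_distance(150, 30, 100), 1))  # ~139.5 dB, within the
                                                # table's 110-140 dB range at 100 m
```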
Auditory transduction: mechanisms
- Role of the outer hair cells: they influence the movement of the tectorial membrane through two active mechanisms:
  - Mechanical response: physical changes in the structure of the hair cells in response to sound stimuli.
  - Electrical response: nerve stimulation of the outer hair cells, e.g., phase-locking of action potentials to the peak amplitude of the acoustic stimulus.
- Transduction mechanism in hair cells: most neural output from the cochlea arises from the inner hair cells.
  - Bending of the stereocilia causes tension on the tip links, which opens ionic pores and causes subsequent depolarization or hyperpolarization.
  - Tip links connect individual filaments of the stereocilia.

Frequency representation: frequency theory
- The basilar membrane is capable of vibrating within the full range of frequencies audible to humans.
- Nerve fibres stimulated by the basilar membrane fire at a rate that conveys sound-frequency information.
- But:
  - Neuronal firing rates rarely exceed 500 impulses per second.
  - The basilar membrane is not of constant width; it is wider at the apex and narrower at the basal end.

Cochlear processing of the acoustic stimulus
Frequency representation: place theory and tonotopic organization
- The basilar membrane's resonant frequency is organized in a topographic manner; the representation is called a tonotopic map.
- Sounds of different frequencies produce a vibrational pattern whose maximum amplitude occurs at different places along the basilar membrane: a high-frequency sound produces maximum vibration at the basal end, whereas a low-frequency sound produces maximum vibration at the apical end.
- Periodic acoustic stimuli are thus 'decomposed' into a spatiotemporal activation pattern along the basilar membrane of the cochlea...
- The cochlea acts as a natural 'spectrum analyzer', converting complex waveforms into simpler constituent components and measuring each one's contribution to the whole (much like Fourier's theorem).

Neural capture of auditory signals: anatomical organization
- Cochlear nerve fibres originate from cell bodies in the cochlear ganglion (a.k.a. spiral ganglion): bipolar neurons with one branch innervating hair cells and the other carrying auditory signals to higher brain centres.
- 90% of cochlear nerve fibres terminate on inner hair cells; the remaining fibres innervate outer hair cells.

Neural capture of auditory signals: afferent versus efferent nerve fibres
- Each afferent fibre terminates on a single inner hair cell; each inner hair cell can direct its output through nearly 10 separate nerve fibres.
- Most of the efferent fibres terminate on the outer hair cells and have an inhibitory effect on afferent fibres.

Neural capture of auditory signals: neural coding of sound intensity
- The growth of sensation magnitude as a function of intensity varies with test frequency...
- Afferent nerve fibres have different sensitivities; a given inner hair cell can respond to a particular sound intensity by releasing different amounts of neurotransmitter at its ~10 different synapses.
- Sound intensity is coded by afferent fibres with different sensitivity profiles:
  - Very sensitive fibres have a high spontaneous firing rate and saturate at low intensities.
  - Low-sensitivity fibres have a lower spontaneous firing rate and saturate at higher intensities.
  - This is analogous to simultaneous exposure "bracketing" in photography.
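A toy sketch of the "bracketing" idea just described. Everything here (the sigmoid shape, rates, thresholds) is invented for illustration; real rate-level functions are messier. The point is that two fibre populations with different sensitivities tile the intensity range between them.

```python
import math

def rate(spl_db, spont, max_rate, threshold_db, slope=0.3):
    """Toy sigmoidal rate-level function for one afferent fibre."""
    drive = 1.0 / (1.0 + math.exp(-slope * (spl_db - threshold_db)))
    return spont + (max_rate - spont) * drive

# Hypothetical populations: sensitive/high-spontaneous vs low-spontaneous.
for spl in (0, 20, 40, 60, 80, 100):
    sensitive = rate(spl, spont=60, max_rate=250, threshold_db=20)
    insensitive = rate(spl, spont=5, max_rate=250, threshold_db=70)
    print(f"{spl:3d} dB SPL: sensitive {sensitive:5.1f} sp/s, "
          f"low-sensitivity {insensitive:5.1f} sp/s")
```

The sensitive fibre saturates by ~60 dB while the low-sensitivity fibre is still climbing, so together they cover a range neither could code alone.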
Neural capture of auditory signals: coding of sound frequency, place-theory mechanisms
- A particular afferent fibre carries signals from a hair cell that is depolarized by the particular sound frequency associated with its place in the tonotopic map. With this place code, frequency information can be interpreted by the brain according to which fibre was activated.
- Tuning curve = a plot of the minimum sound intensity required to obtain a sufficient neural response, as a function of sound frequency.
  - Characteristic frequency = the frequency at the lowest point of the tuning curve.
- Frequency response curves (FRCs) (see Figure 5.25 in text) = plots of neural discharge rate as a function of sound frequency.

Neural capture of auditory signals: coding of sound frequency, frequency-theory mechanisms
- Phase-locked response = neural firing with a precise timing relationship to the sound waveform, linked either to each compressive portion of the wave or spaced out to every few cycles.
- Phase locking can occur for frequencies up to about 4000 Hz; beyond that, only the place code applies. (A toy sketch appears at the end of this section.)

Subcortical auditory structures: ascending pathways
- Spiral ganglion neurons form a nerve trunk that enters the brainstem and terminates in the cochlear nucleus.
- The cochlear nucleus is where auditory signals branch out in a bilateral fashion.
- From there, neurons project to the superior olive and inferior colliculus; binaural projections allow integration of auditory signals from both ears.
- From there, projections go to the medial geniculate nucleus (a thalamic nucleus), and from there to the auditory cortex.

Subcortical auditory structures: response characteristics
- Tonotopicity: the place code is propagated through the subcortical pathways all the way up to the cortex.
- Laterality: responses in the cochlear nucleus are monaural (no crossover of signals); auditory neurons residing in structures above the cochlear nucleus integrate information from the two ears.
- Response modification: auditory signals are processed to some degree within localized neural networks.

Subcortical auditory structures: descending pathways
- Signals from the cortex flow down through the same structures to modulate the auditory response to sound in two ways:
  - Efferent fibres that innervate the cochlea (olivocochlear neurons, originating in the superior olive) can influence firing rates in the ascending system.
  - The efferent system can also activate two small muscles in the middle ear, the tensor tympani and the stapedius, to protect against damage from intense sounds.
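As flagged above, a toy model of phase locking (all parameter values are mine): a fibre fires on only a fraction of cycles so its rate never exceeds ~500 spikes/s, and its spike times stay locked to the compressive peaks until timing jitter becomes large relative to the stimulus period.

```python
import random

def phase_locked_spikes(freq_hz, duration_s=1.0, max_rate=500.0, jitter_s=1e-4):
    """A fibre fires on every k-th cycle (k chosen so its rate stays
    under max_rate), each spike jittered around a compressive peak."""
    period = 1.0 / freq_hz
    skip = max(1, round(freq_hz / max_rate))  # fire on every skip-th cycle
    times, t = [], 0.0
    while t < duration_s:
        times.append(t + random.gauss(0.0, jitter_s))
        t += skip * period
    return times

for f in (100, 1000, 4000, 8000):
    n = len(phase_locked_spikes(f))
    period_ms = 1000.0 / f
    cue = "usable" if period_ms > 0.2 else "lost in jitter"
    print(f"{f:5d} Hz: {n:4d} spikes/s, period {period_ms:6.3f} ms, timing cue {cue}")
```

With ~0.1 ms of jitter the timing cue degrades right around a few kHz, which is consistent with the slide's ~4000 Hz limit for the frequency code.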
The auditory cortex: anatomical organization (see Figure 5.27 in text)
- Ascending fibres leave the medial geniculate nucleus and terminate in area A1, the primary auditory cortex, in the temporal lobe.
- Area A2 surrounds area A1 and processes higher-order acoustic information.
- Wernicke's area is involved in speech analysis.

The auditory cortex: functional organization
- Tonotopic organization: the result of the tonotopic projection pattern found in the subcortical structures.
- Combination of binaural inputs:
  - Summation response = inputs from both ears are excitatory, so one type of neuron in A1 can be driven by sound stimulation of either ear.
  - Suppression response = input from the opposite side excites one type of neuron in A1, while input from the same side inhibits it.

The auditory cortex: parallel auditory pathways
- Area A1 is organized to represent sound frequency and binaural interaction.
- This helps with sound localization and with processing temporal patterns.
- These two functions may be processed in separate but parallel neural pathways of the higher auditory system.

Psychoacoustics
- Intensity and loudness
- Frequency and pitch
- Auditory space perception

Intensity and loudness: absolute audibility thresholds
- Procedural aspects:
  - Minimum audible field (MAF): an open-field technique for obtaining detection thresholds by presenting sound stimuli through a loudspeaker in front of the subject.
  - Minimum audible pressure (MAP): a closed-ear method for obtaining detection thresholds in which sound is delivered through a set of earphones.
- Absolute detection thresholds:
  - Minimal audibility curve = a psychophysical function that relates threshold sound-detection levels to frequency.
  - Hearing sensitivity is very dependent upon frequency.

Anatomical and physiological determinants of the minimal audibility curve
- Three possible sources could account for the U-shaped profile of the curve:
  - Conductive elements of the outer and middle ear (maximal enhancement occurs at the middle frequencies).
  - The mechanical response function of the cochlea (not this one: the basilar membrane responds similarly across a range of sound frequencies).
  - Physiological properties of the auditory nervous system (higher frequencies, up to a point, produce a stronger neural signal).

Audibility thresholds: factors that affect absolute sensitivity
- The way in which the data are collected: the MAF technique yields lower thresholds than the MAP technique, although the profile of the audibility curve is the same.
- The duration of the sound stimulus: temporal summation = the way the auditory system integrates sound energy over time.
- Whether the sound is binaural or monaural: stimulation of both ears yields lower thresholds than either one alone; binaural summation = the way the auditory system integrates sound information from both ears.

Loudness perception: relationship to intensity
- Loudness grows as a power function of sound pressure level, with an exponent of about 2/3: L = k · SPL^0.67.
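The power function above can be read directly as code. A minimal sketch (mine, not from the slides; k is arbitrary, so only loudness ratios are meaningful here):

```python
def loudness(spl_db, k=1.0, exponent=0.67):
    """The slide's power law, L = k * SPL**0.67, taken at face value."""
    return k * spl_db ** exponent

# Doubling the dB value far less than doubles the loudness estimate:
print(loudness(80) / loudness(40))  # 2**0.67 ~ 1.59, not 2.0
```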
Loudness perception: relationship to intensity
- Determinants of loudness perception:
  - Cochlear neurons encode discrete intensity regions: the brain can map sound intensity onto loudness by assessing which pool of neurons is responsive to a stimulus and what the firing rate is within that pool.
  - Higher sound intensities produce a spread of excitation along the basilar membrane; the greater the range of frequency-tuned fibres that are active, the larger the apparent spectral bandwidth of the acoustic stimulus.
  - Thus, there are two ways to make a sound louder: increase its amplitude, or broaden the timbral properties of the stimulus (make the sound more spectrally 'rich'...).

Loudness perception: relationship to frequency
- Equal loudness contours provide data on the intensity values that produce equal loudness perception across a range of frequencies.
  - Progressively higher intensities are needed to achieve equal loudness toward the extreme ends of the frequency spectrum.
  - These contours flatten out at higher intensities.
- The phon scale of loudness: the phon is the unit of measurement used to indicate loudness level across frequencies.

Masking and noise
- Masking, general properties:
  - Masking occurs when the detection threshold of a particular sound is raised because of other sounds.
  - Masking studies have provided considerable insight into auditory function.
- Tonal masking, frequency effects:
  - Tonal masking = the masking of a single tone by another tone.
  - The most effective masking occurs when the masking and test tone frequencies are the same.
  - Masking tones of other frequencies require higher intensities to mask the test tone.

Frequency and pitch: coding mechanisms in pitch perception
- Place versus frequency theories: place theory is better suited to explain certain instances of pitch perception, such as dichotically presented harmonics.

Auditory space perception: acoustic cues for localization
- Physical considerations:
  - Azimuth = the horizontal angle of the source around the head.
  - Elevation = the vertical angle of the source with respect to the head (the pinna imposes time-distortion products on the incoming acoustic signal).
- Interaural intensity difference (IID):
  - The sound reaching the farther ear has slightly less intensity than the sound reaching the nearer ear.
  - This occurs because sound pressure decreases with increasing distance, and because of the head shadow effect (the head reflects incoming sounds). The head shadow is much greater for high-frequency sounds, and greatest when the sound-path difference between the two ears is largest.
- Interaural time difference (ITD):
  - Sounds require extra time to travel to the farther ear; stimulation needs to be dichotic.
  - The path difference of sound transmission to the two ears depends on the azimuth angle.
  - The ITD cue is less susceptible to changes in frequency than interaural intensity differences are.
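A common back-of-envelope model for the ITD cue is Woodworth's spherical-head approximation. The sketch below is illustrative (the head radius and speed of sound are typical values, not from the slides):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature
HEAD_RADIUS = 0.0875    # m, a typical adult value

def itd_seconds(azimuth_deg):
    """Woodworth's approximation for a distant source:
    ITD = (r / c) * (theta + sin(theta)), theta in radians."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(f"azimuth {az:2d} deg -> ITD {itd_seconds(az) * 1e6:3.0f} microseconds")
```

The maximum ITD, roughly 650 microseconds at 90 degrees, is tiny, which is why the binaural system needs the precise timing machinery described next.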
Auditory space perception: localization under ambiguous conditions
- There are two possible source angles for any given pair of IID and ITD values.
- This ambiguity can be resolved by turning or tilting the head.
- The pinna also contributes to auditory localization under ambiguous conditions.

Auditory space perception: neural mechanisms
- The binaural cross-correlation model: a neural-network model based on interaural time differences. (A toy sketch appears at the end of this section.)
- Neural correlation mechanisms in the brainstem: the superior olive is the primary site where binaural differences are coded in a systematic way.

Binaural hearing and perceptual organization
- Masking effects of background noise:
  - The binaural system is very important for distinguishing signal from noise.
  - The masking effect of noise is at a minimum when the noise has a different spatial location from the signal.
- Auditory scene analysis:
  - The manner by which the ear achieves an effortless segregation of concurrent sounds.
  - First, the mixture of sounds undergoes primitive grouping; the subsequent processing stage involves schema-driven grouping.
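As flagged above, here is a toy version of the binaural cross-correlation idea (the signal, the delay, and the window sizes are invented; real models operate on neural firing patterns rather than raw waveforms, via Jeffress-style delay lines and coincidence detectors):

```python
import numpy as np

def estimate_itd(left, right, fs, max_lag_s=0.001):
    """Slide one ear's signal against the other and return the lag
    (seconds) giving maximum correlation; positive = left ear leads."""
    m = int(max_lag_s * fs)
    corrs = [np.dot(left[m + lag: len(left) - m + lag],
                    right[m: len(right) - m])
             for lag in range(-m, m + 1)]
    return -(int(np.argmax(corrs)) - m) / fs

fs = 48000
tone = np.sin(2 * np.pi * 500 * np.arange(int(0.05 * fs)) / fs)
left, right = tone[10:], tone[:-10]   # right ear delayed by 10 samples
print(estimate_itd(left, right, fs) * 1e6)  # -> ~+208 microseconds
```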
Testing hearing function (the typical behavioural hearing test)
- Psychophysical tests: the 'pure tone' audiogram; speech recognition tests.
- Acoustic reflexes: the stapedius reflex.
- Electrophysiologic tests: brainstem auditory evoked responses (BAERs); auditory evoked potentials (AEPs).

The "normal" audiogram
- Shows the relative change in sensitivity at each frequency of testing (recall: the minimal audibility curve relates threshold detection levels to frequency). Symbols for the air-conduction test: X = left ear, O = right ear.
- Note that 0 dB HL (Hearing Level) refers to the RATIO of the sound pressure level at a given subject's detection threshold to the threshold SPL of the normal population at that same frequency; the audiogram's vertical axis is therefore Hearing Level (dB).
- (Source: http://www.phys.unsw.edu.au/jw/hearing.html)
(Figure slides: Brainstem Evoked Auditory Response; Auditory Evoked Potentials.)

Auditory dysfunction: causes of hearing loss
- Conductive loss: the mechanical conduction of sound through the ear is affected.
  - Otitis media = a middle-ear infection that can change the vibrational characteristics of the eardrum.
  - Otosclerosis = an inherited bone disease that produces abnormal development and function of the ossicles.
- Sensorineural loss: the cochlea or the neural elements of the inner ear malfunction.
  - Usually occurs due to damaged hair cells, e.g., from ingestion of ototoxic drugs, injury, tumours, disease, or exposure to intense environmental noise.
  - Genetic causes: Usher syndrome, Waardenburg syndrome.
  - Aging: presbycusis = hearing loss that occurs gradually with age, affecting high frequencies first.

Auditory dysfunction: diagnosis and treatment
- Use the bone-conduction test to distinguish between conductive and sensorineural loss. (Example: an audiogram from a patient with a sensorineural hearing loss in the left ear; X = air conduction, > = bone conduction.)
- Hearing aids: amplify incoming sounds; they are effective as long as sensory function in the cochlea is not lost.
- Cochlear implant: converts the sound signal into an electric current that is delivered directly to the auditory nerve fibres.

Different causes of deafness result in different symptoms
- In acoustic neuroma, the pure-tone audiogram shows an ipsilateral sensorineural deafness.
- 'Glue ear' results in conduction deafness; the pure-tone audiogram shows hearing loss particularly at low frequencies.
- Meniere's disease is characterised by a low-frequency sensorineural hearing loss.
- Noise-induced hearing loss results in (typically) bilateral sensorineural hearing loss centred at 4 kHz.
- Presbycusis (due to aging) results in high-frequency hearing loss; it is a cumulative deficit, i.e., an expression of the cumulative effects of hearing damage over the life span.

The psychological effects associated with hearing loss (or: why are hearing aids not a 'fashion item'?)

The puzzle of recruitment in hearing loss
- A patient has a 40 dB hearing loss at 4 kHz, measured by pure-tone audiometry: at 4 kHz the patient requires 40 dB more sound pressure than a 'normal'-hearing individual to just detect the presence of the 4 kHz tone.
- However, when the same tone is presented to the patient well above threshold, it is rated much louder than one would expect from a person with a 40 dB hearing loss.
- The auditory system "recruits" more neurons in response to higher intensities; in effect, the hearing loss decreases with increasing SPL above threshold.
- Thus, the corrective "gain" needed for louder sounds is less than for quieter ones (problematic for hearing aids).
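Recruitment is why hearing aids use level-dependent (compressive) gain rather than flat amplification. A toy input-output sketch; all parameter names and values here are illustrative, not clinical:

```python
def compressive_gain(input_spl, threshold_shift=40.0, knee=45.0, ratio=2.0):
    """Toy wide-dynamic-range compression for a recruiting ear:
    quiet sounds get the full gain implied by the threshold shift,
    but the gain shrinks as input level rises, because loudness
    growth above threshold is abnormally steep (recruitment)."""
    if input_spl <= knee:
        gain = threshold_shift
    else:
        gain = max(0.0, threshold_shift - (input_spl - knee) * (1 - 1 / ratio))
    return input_spl + gain

for spl in (30, 45, 60, 75, 90):
    print(f"{spl} dB in -> {compressive_gain(spl):.1f} dB out")
```

The printed gain falls from 40 dB for quiet inputs toward ~17 dB for loud ones, matching the slide's point that loud sounds need far less correction.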
"Deafness is a much worse misfortune, for it means the loss of the most vital stimulus: the sound of the voice that brings language, sets thoughts astir, and keeps us in the intellectual company of man. Blindness separates us from things but deafness separates us from people."
-- Helen Keller (1880–1968)

Speech samples with selected frequency bands removed
- Speech sample (full bandwidth: all frequencies present)
- Speech sample (low-pass filtered: all frequencies < 2000 Hz)
- Speech sample (low-pass filtered: all frequencies < 500 Hz)
- Speech sample (high-pass filtered: all frequencies > 2000 Hz)
- Speech sample (high-pass filtered: all frequencies > 500 Hz)

Applications
- Hearing aid design
- Digital sound coding / compression
- High-fidelity music reproduction
- Cochlear implantation technology
- Speech recognition technology
- Room/hall acoustics design
- Musical instrument design
- Auditory alarm design
- Industrial noise suppression
- Echolocation
- Acoustic spectrography / 'fingerprinting'
- Automatic voice recognition

Application: hearing aid design and the cochlear implant
- (YouTube presentation on the cochlear implant; digital reproduction of speech sounds demo.)

Application: digital sound coding / compression. Perceptual audio coding, basic ideas
- Decompose the signal into separate frequency bands using a filter bank.
- Analyze the signal energy in the different bands and determine each band's total masking threshold, given the signals in the other bands and at other times.
- Quantize the samples in each band with an accuracy proportional to the masking level:
  - Any signal below the masking level does not need to be coded.
  - Signals above the masking level are quantized with a step size set by the masking level, and bits are assigned across bands so that each additional bit provides the maximum reduction in perceived distortion.
- (A toy sketch of this idea appears at the end of this section.)

Application: high-fidelity music reproduction
- Acoustic reproduction (music, soundtracks, voice)
- Applications for cinema, live theatre, and conferences

Application: speech recognition technology
- Automated voice dictation systems
- 'Hands-free' telephony
- Automated voice-band surveillance, etc.
- Voice-operated machine control
- Speech-based demand management systems

Application: room/hall acoustics design (standards)
- Speech intelligibility may be expressed as a single-number value. Two scales are most commonly used: the Speech Transmission Index (STI) and the Common Intelligibility Scale (CIS).
- Both involve measuring the input/output relationship between a set of test signals broadcast into a room and the factors affecting signal fidelity, including speech level, background noise level, masking effects, reverberation effects, etc.

Music Perception
- The challenge of defining music... Music is sonic information that is:
  - temporally correlated with respect to pitch, harmony, rhythm and loudness
  - spatially correlated with respect to timbre, loudness and ...
- Music is not: ...

Music perception: musical pitch and timbre
- Pitch sensations:
  - The various harmonic components produce an orderly pattern of peak wave activity on the basilar membrane.
  - The brain can read this spatial pattern to perceive both a pitch sensation and a timbral sensation.
- Temporal properties of the stimulus are also important: removing the attack transient from a note makes it more difficult to identify the musical instrument that created it.
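As flagged on the perceptual-audio-coding slide above, here is a heavily reduced sketch of the "don't code what the ear can't hear" idea. The band split and the energy-based threshold rule are stand-ins of my own; real codecs use proper filter banks and psychoacoustic masking models:

```python
import numpy as np

def toy_subband_coder(x, n_bands=8):
    """Split the spectrum into bands, derive a stand-in 'masking
    threshold' from each band's energy, and zero every component
    below it (signal under the mask needs no bits at all)."""
    spectrum = np.fft.rfft(x)
    edges = np.linspace(0, len(spectrum), n_bands + 1, dtype=int)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = spectrum[lo:hi]
        if len(band):
            mask = 3.0 * np.sqrt(np.mean(np.abs(band) ** 2))  # toy threshold
            band[np.abs(band) < mask] = 0.0
    return np.fft.irfft(spectrum, n=len(x)), int(np.count_nonzero(spectrum))

fs = 16000
x = np.sin(2 * np.pi * 440 * np.arange(1600) / fs)   # a tone...
x += 0.01 * np.random.randn(len(x))                  # ...plus faint noise
decoded, kept = toy_subband_coder(x)
print(f"kept {kept} of {len(np.fft.rfft(x))} spectral lines")
```

A real coder would then quantize the surviving components with step sizes tied to the masking level, which is where the actual bit savings come from.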
- Struck, plucked and bowed strings have a number of different preferred states of vibration, or resonances, which produce harmonics.

Music perception: musical scales, chromatic intervals and tone height
- Tone chroma = the relative position of a tone within an octave.
- Tone height = the series of successive octave registers spanning the full musical pitch spectrum.

Music perception: perfect pitch
- Relative pitch = the ability to accurately identify tone intervals.
- Perfect pitch = the ability to identify an isolated tone by name; it arises from a combination of genetic and environmental factors.

Perceptual aspects of music: tonal superpositions, monophonic versus polyphonic tones
- The auditory system is able to identify individual tones when they are presented together in a chord.
- It can discriminate two notes of the same pitch but different timbre, and two notes of different pitches.
- Adding notes together results in a perceptually segregated set (unlike adding colours together)...
- (Recall the distinction between prothetic and metathetic perceptual continua...)

Perceptual aspects of music: tonal superpositions, consonance and dissonance
- Two or more simultaneously played notes generate a consonant sensation if they are perceived as pleasant, and a dissonant sensation if unpleasant.
- Consonant sounds occur when:
  - Two or more tones contain harmonic frequencies that coincide and therefore reinforce each other.
  - A distinct and separable frequency pattern is present among two or more tones played simultaneously.

Perceptual aspects of music: tonal sequences, melody, tempo, and rhythm
- Melody refers to a temporal sequence of pitches: a group of musical tones forming a sequential appearance in time (with statistical properties related to an intermediate information level).
- Harmony refers to a SCALE CONTEXT into which the melody is placed...
- Tempo = the perceived speed of tonal presentations in a melodic sequence.
- Rhythm = the perceptual organization of a melodic sequence based on the temporal grouping of musical tones.
- Contour = the nature of pitch changes over time.
- Melodic perception is invariant (within limits) under many transformations: key (base pitch), meter, timbre, harmonic content, and lyrics!
- The ORDINAL PATTERN of a transposed melody remains constant.
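That invariance is easy to make concrete: transposing a melody shifts every pitch, but the interval pattern (and hence the ordinal pattern and contour) is untouched. A tiny sketch of mine, using MIDI note numbers:

```python
# A melody as MIDI note numbers (the opening of "Twinkle, Twinkle").
melody = [60, 60, 67, 67, 69, 69, 67]

def intervals(notes):
    """Successive pitch changes in semitones: the melody's contour
    and ordinal pattern live here, not in the absolute pitches."""
    return [b - a for a, b in zip(notes, notes[1:])]

transposed = [n + 5 for n in melody]  # shift to a new key (up a fourth)
print(intervals(melody))       # [0, 7, 0, 2, 0, -2]
print(intervals(transposed))   # identical: transposition preserves the pattern
```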
(Schematic: "S + _ _ + _ _ _ _" shown at two pitch levels: the same ordinal pattern at two transpositions.)

Perceptual aspects of music: tonal sequences, cognitive theories
- Gestalt principles are believed to be responsible for generating the perceived wholeness of music:
  - Proximity = tones appearing close together in time are grouped as a perceptual unit.
  - Similarity = tones with a similar pitch or timbre are perceptually bound together in the same group.
  - Common fate = tonal sequences whose pitch or intensity patterns change in the same direction form a common unit.

Gestalt principles associated with music perception
- Melodic components "reveal" themselves through spatiotemporal "grouping" (auditory "stream segregation"):
  - When sequential tones are close in pitch, they are perceived as 'belonging' together.
  - When sequential tones are close in time, they are perceived as belonging together.
  - As you separate either pitch or time (or both), the stream 'segregates' into two or more separately perceived streams.
- Auditory demonstrations (downloaded from Dr. Al Bregman's website, McGill University: http://webpages.mcgill.ca/staff/Group2/abregm1/web/downloadsdl.htm):
  - "Growth" of segregation with repetition (proximity demo: http://webpages.mcgill.ca/staff/Group2/abregm1/web/snd/Track05.mp3)
  - Segregation of a melody from an auditory note stream; more effective when the pitch of the melody's tones differs from the rest of the stream.
  - Stream segregation as a melodic construction in music, using pitch grouping (excerpt from a sonata by Telemann).
  - The sum is greater than its parts (under the right conditions), and the sum is NOT greater than its parts (under the right conditions): segregation of rhythmic components.
  - Melodic segregation by differences in timbre (commonly used as a compositional mechanism in music).

Music and the brain
- Pitch perception is processed in the planum temporale.
- Different patterns of brain activation occur when listening to non-melodic compared with melodic sequences.

Speech Perception
- All normally developing humans have the capacity for speech and language acquisition.
- Speech and hearing evolve and develop together.
- Speech and hearing experiences modify brain function and structure.

Excitation of the vocal apparatus
- (Figure: a sequence of vocal-fold positions during speech.)
- Rapid pulses of air pressure into the oral/nasal cavities produce the 'vocal carrier' waveform for all speech sounds...
- (Male: ~125 pulses/sec, f0 ~ 125 Hz; female: ~250 pulses/sec, f0 ~ 250 Hz.)
- Voiced excitation (phonation): exhalatory air streams through the larynx, passing through the vocal cords (glottis). At appropriate tension, these are excited into quasi-periodic vibration (mostly constant, but somewhat variable in period and amplitude).
- Voiceless excitation: air streams through the open glottis and passes through a constriction in the oral cavity or in the pharynx. At that point the friction produces turbulence and hence a noise whose spectrum is determined by the position of the constriction.
- Transient (plosive) excitation: air congests at an occlusion in the oral cavity or in the pharynx; a sudden opening of the occlusion releases the pressure.
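A minimal source-filter sketch of the voiced-excitation idea above (mine, not from the slides; the sampling rate, resonance frequencies and bandwidths are illustrative, with the two resonances standing in for the vocal-tract 'modulation' described next):

```python
import numpy as np

fs = 16000
f0 = 125                           # male glottal pulse rate from the slide

# Voiced excitation: a quasi-periodic pulse train at f0 (the 'carrier').
source = np.zeros(fs // 2)         # half a second of signal
source[::fs // f0] = 1.0           # one pulse every fs/f0 = 128 samples

def resonance(x, fc, bw, fs):
    """Second-order IIR resonator: a crude vocal-tract 'formant'."""
    r = np.exp(-np.pi * bw / fs)
    a1, a2 = -2 * r * np.cos(2 * np.pi * fc / fs), r * r
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] - a1 * (y[n-1] if n > 0 else 0) - a2 * (y[n-2] if n > 1 else 0)
    return y

# Cascade two resonances, roughly in the F1/F2 range for a vowel.
vowel = resonance(resonance(source, 700, 120, fs), 1200, 120, fs)
spec = np.abs(np.fft.rfft(vowel))
peak = spec[1:].argmax() + 1       # skip the DC bin
print(f"strongest component near {peak * fs / len(vowel):.0f} Hz")  # near 700 Hz
```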
Modulation of the vocal 'carrier'
- Modulation (electrical engineering): the act or process of varying the amplitude, frequency, phase, etc., of a 'carrier' signal under the control of another signal.
- For speech, this means varying the vocal carrier (created by the glottal pulse or by exhaled air) under the control of the brain. This results in changes in the SHORT-TIME FREQUENCY SPECTRUM of the vocal stream.

Motor control of speech
- From M1, 6 of the 12 cranial nerves send motor fibres to the muscles involved in the production of speech: trigeminal (CN5), facial (CN7), glossopharyngeal (CN9), vagus (CN10), spinal accessory (CN11), hypoglossal (CN12).
- These control the muscles of the pharynx, palate, tongue, and face, and of mastication.
- (Figure: tongue positions for articulating vowels.)

The intelligence (message) in speech is "carried" in the modulations of the vocal apparatus
- Thus, limits on the rate of production of speech sounds set the upper limit on the amount of information that can be carried per unit time: about 10 speech events/sec, ~50–60 bits/sec, realistically closer to 30 bits/sec.
- Bilateral cerebral and cerebellar regions are involved in the motor act of articulation, irrespective of the type of speech. Three additional left-lateralized regions adjacent to the Sylvian sulcus are activated in common: the most posterior part of the supratemporal plane, the lateral part of the pars opercularis in the posterior inferior frontal gyrus, and the anterior insula.

Various speech components use different parts of the 'auditory island'
- Vowel sounds are largely represented at the lower-frequency end.
- Consonants cover a wider range of frequencies.
- Fricatives are largely represented at the higher-frequency end.

The "waterfall plot", or continuous speech spectrogram
- A graph showing how the sound spectrum varies with time (due to articulation); the amplitude envelope is plotted as a function of time (in red in the figure). The darker the pixels, the more energy is located at that point in frequency/time. (A minimal sketch of how such a plot is computed appears at the end of this section.)
- Note the 'formant' shapes: ridges of energy (dark bands) located at different frequencies, running parallel to the time axis, labelled F1, F2 and F3.
- A spectrogram of four words reveals discrete bands of resonant frequencies, known as formants; formants (indicated by white lines in the figure) are numbered upward from the lowest frequency.
- The position of a formant on the frequency axis depends on tongue position: formants are produced by specific positions of the tongue during vocalization.

Speech comprehension: perception of phonetic segments (consonants)
- The sounds that occur before or after a phoneme can affect the way it is perceived.
- The timing sequence during the articulation of certain consonants is also an important feature of a sound.
- Speech perception is an example of categorical perception: a many-to-one mapping from the properties of the acoustic stimulus to perceptual experience.
- (Figure: examples of the articulation components of speech.)

Spectrogram of "connected" (continuous) speech
- The acoustical signals of individual words merge into a continuous pattern in which the beginnings and endings of words cannot easily be determined (visually). This is one of the most important reasons why it has been so difficult to build 'automatic' speech recognition systems. ("Disconnected"-speech recognizers came first...)
- How does the brain solve this problem? Through "top-down" influences, such as knowledge of the language, knowledge of a dialect or accent, facial features during speaking, linguistic context, and the information density of the utterance.
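As flagged on the waterfall-plot slide above, a spectrogram is nothing more than the magnitudes of windowed FFTs laid out over time. A minimal sketch (all signal parameters are invented; the frequency 'glide' merely stands in for a formant transition):

```python
import numpy as np

def spectrogram(x, fs, win=256, hop=128):
    """Minimal short-time Fourier transform: rows are time frames,
    columns are frequency bins of width fs/win."""
    window = np.hanning(win)
    frames = [x[i:i + win] * window for i in range(0, len(x) - win, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 8000
t = np.arange(fs) / fs
# A toy 'formant transition': a tone gliding from 400 Hz up to 1200 Hz.
glide = np.sin(2 * np.pi * (400 * t + 400 * t ** 2))
S = spectrogram(glide, fs)
peak_hz = S.argmax(axis=1) * fs / 256
print(peak_hz[::10].round(-1))  # the spectral peak rises frame by frame
```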
Formant transitions
- Formant transitions occur in consonant-vowel pairings. Transitions represent abrupt shifts in acoustic energy from one frequency to another over a short time (< ~50 msec).

Three problems for speech perception
- Segmentation problem = the difficulty in parsing speech due to the relative lack of acoustical boundaries between individual words in free-flowing speech.
- Variation problem = the difficulty in recognizing different acoustic stimuli as carrying the same information.
- Coarticulation problem = the difficulty in parsing speech due to overlapping speech segments that arise from the contextual relationship between adjacent phonetic segments.

Speech comprehension: voice-onset time and categorical perception
- A voice-onset time (VOT) of 30 ms represents the transition point between the voiced [da] and voiceless [ta] sounds.
- Discrimination ability is poor if the VOTs of two stimuli fall within a single category.
- Excellent discrimination occurs for sounds whose VOTs fall on either side of the phonetic boundary. (A toy model appears at the end of this section.)

Speech comprehension: perception of words and word sequences
- A crucial element is having a sufficient lexical database.
- Semantics and grammatical rules also play a critical role.
- The phonemic restoration effect shows that the context of the speech segment is important too.
- The McGurk effect shows that speech perception also incorporates visual information about articulation.

Speech perception is categorical
- Variation in the production of speech sounds results in acoustic variability in the speech signal. Thus a large set of acoustic stimuli (speech sounds) must map onto a small set of speech categories, which in turn maps onto an even smaller set called 'intelligence' or 'meaning'. The amount of intelligence carried in a speech sample is not large...
- Phoneme inventories vary widely: the maximum is the !Xóõ language (Botswana) with 112 phonemes; the minimum is the Rotokas language (New Guinea) with 11 phonemes; English has ~40 phonemes. In part, this is why translation is such a challenge...

The neural pathway for speech
- Neural processing of speech begins in the primary auditory cortex.
- Signals from there feed into Wernicke's area, a key area for speech comprehension.
- Signals then feed into Broca's area, a key area for speech production.
- Finally, the motor cortex receives signals to coordinate the mouth and tongue movements involved in speech.
- Compare the information-processing task of speech perception with that of receptors on the post-synaptic membrane (also a receiver in a communication system...). Is the relative recency of speech (on the evolutionary time scale) possible only because of the information-processing capacity of the auditory cortex?
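As flagged above, a toy model of categorical perception along the VOT continuum. The logistic labeling function and its steepness are invented; only the ~30 ms boundary comes from the slide:

```python
import math

def p_voiceless(vot_ms, boundary=30.0, steepness=0.5):
    """Probability of hearing /ta/ rather than /da/ as a logistic
    function of voice-onset time, with the boundary at ~30 ms."""
    return 1.0 / (1.0 + math.exp(-steepness * (vot_ms - boundary)))

def discriminability(vot_a, vot_b):
    """In this toy model two stimuli are only as discriminable as
    their category labels differ (the many-to-one mapping)."""
    return abs(p_voiceless(vot_a) - p_voiceless(vot_b))

print(discriminability(10, 20))  # same side of the boundary -> ~0.007 (poor)
print(discriminability(25, 35))  # straddles the boundary    -> ~0.85 (excellent)
```

The same 10 ms physical difference is nearly invisible within a category and highly salient across the boundary.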
Spoken English carries ~ 30-40 bits/sec of information Digital reproduction of a hi-fidelity recording of a speaker’s voice Requires a thousand times more information…44.1kBits/sec. Thus, MOST of what is spoken in speech isn’t necessary to transmit What is meant… u Brain regions near the auditory cortex involved in language and speech processing u Brain asymmetry and lateralization: « Much of the neural processing underlying language production and comprehension are restricted to the left hemisphere in most people. « Right hemisphere performs high-level functions such as language interpretation, drawing inferences, and resolving linguistic ambiguities. Wernicke’s aphasia = following damage to Wernicke’s area in the temporal lobe, the deficit in understanding language « Broca’s aphasia = following damage to Broca’s area in the frontal lobe, the deficit in speech production ability « 28