Auditory Perception
Hillenbrand, SPPA 2060
6/20/2016

Auditory perception is one branch of a larger science called psychophysics. Psychophysics studies the relationships between perceptual dimensions (also called psychological, subjective, or mental dimensions) and the physical properties of stimuli. The distinction between perceptual dimensions and physical dimensions is all-important.

Physical dimensions: Any aspect of a physical stimulus that can be measured in a straightforward way with an instrument (e.g., a light meter, a sound level meter, a spectrum analyzer, a fundamental frequency meter).

Perceptual dimensions: The mental experiences that occur inside the mind of the observer. These experiences are actively created by the sensory system and brain based on an analysis of the physical properties of the stimulus. Perceptual dimensions can be measured, but not with a meter. Measuring perceptual dimensions requires an observer (e.g., a listener, a “looker”, a smeller, a taster …).

For example, in vision: The percept of hue is created by the eye and brain based (in part) on the visual system’s analysis of the wavelength composition of the stimulus. But: hue ≠ wavelength.
wavelength: physical dimension (can be measured with a meter)
hue: psychological dimension (can be measured, but that requires an observer)

Visual Psychophysics
Perceptual Dimensions        Physical Properties of Light
Hue                          Wavelength
Brightness                   Luminance
Shape                        Contour/Contrast

Both kinds of dimension can be measured: the physical dimensions can be measured with the right instrument; measuring psychological dimensions requires an observer.

Auditory Psychophysics (aka psychoacoustics or auditory perception)
Perceptual Dimensions        Physical Properties of Sound
Pitch                        Fund. Freq. (f0)
Loudness                     Intensity
Timbre (sound quality)       Spectrum env./Amplitude env.

Perceptual Experiences are Actively Created, Not Passively Received

Subjective contour: The triangles, circles and squares are “seen” not so much because they are “there” in the physical sense, but because they are inferred. Unconscious inference lies at the heart of perception. In some sense, “I’ll see it when I believe it” is more true than “I’ll believe it when I see it.”

Reversible Figures

Reversible figures reveal the active organization of percepts – the drawing on the left is organized by your brain into a bird, then reorganized into a rabbit, then back to a bird, … Same with the old lady/young lady.

Another duck-rabbit, just for yucks.

Which is bigger? Bottom one, eh? Nah. They’re the same. (This is the Jastrow Illusion.)

The Müller-Lyer Illusion

Which horizontal line is longer? Surprise, surprise. They’re the same. Duh, everything in this field is always the same. It gets on your nerves.

The corridor illusion: Which cylinder is larger? The cylinder to the right appears larger because the visual system infers that it is further away. The inference is unconscious, automatic and obligatory (i.e., you can’t help yourself – even when you know the trick).

The McGurk Effect (McGurk & MacDonald, 1976*)
*McGurk, H., and MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.

Some History on the McGurk Illusion

"The most striking demonstration of the combined (bimodal) nature of speech understanding appeared by accident. Harry McGurk, a senior developmental psychologist at the University of Surrey in England, and his research assistant John MacDonald were studying how infants perceive speech during different periods of development. For example, they placed a videotape of a mother talking in one location while the sound of her voice played in another.
For some reason, they asked their recording technician to create a videotape with the audio syllable "ba" dubbed onto a visual "ga." When they played the tape, McGurk and MacDonald perceived "da." Confusion reigned until they realized that "da" resulted from a quirk in human perception, not an error on the technician's part. After testing children and adults with the dubbed tape, the psychologists reported this phenomenon in a 1976 paper humorously titled "Hearing Lips and Seeing Voices," a landmark in the field of human sensory integration. This audio-visual illusion has become known as the McGurk effect or McGurk illusion."

Further reading: Dominic W. Massaro & David G. Stork, "Speech Recognition and Sensory Integration", American Scientist, 1998, vol. 86, pp. 236-244.

The McGurk effect has played an important role in research on audio-visual speech integration and speech reading. McGurk links on the web include the following:
http://www.psych.ucr.edu/faculty/rosenblum/AVspeech.html
http://www.theshop.net/campbell/mcgurk1.htm
http://www.amsci.org/amsci/articles/98articles/massaro.html
http://www.sys.uea.ac.uk/~iam/newav/newav.html
http://www.media.uio.no/personer/arntm/McGurk_english.html
http://macserver.haskins.yale.edu/Haskins/HEADS/BIBLIOGRAPHY/bibliomcgurk.html

The Three Main Perceptual Attributes of Sound
• Pitch (not fundamental frequency)
• Loudness (not intensity)
• Timbre (not spectrum envelope or amplitude envelope)

The terms pitch, loudness, and timbre refer not to the physical characteristics of sound, but to the mental experiences that occur in the minds of listeners.

Pitch and Fundamental Frequency

Rule 1: All else being equal, the higher the f0, the higher the perceived pitch. Lower f0, lower pitch. Higher f0, higher pitch.

Rule 2: The ear is more sensitive to f0 differences in the low frequencies than in the higher frequencies. Compare 300 vs. 350 Hz with 3000 vs. 3050 Hz: the difference in perceived pitch (not f0) between 300 and 350 Hz is NOT the same as the difference in pitch between 3000 and 3050 Hz, even though the physical differences in f0 are the same.

300-350 vs. 3000-3050
Which f0 difference is larger? (A: They’re the same.)
Which pitch difference is larger? (A: 300 vs. 350 – by a lot)

Lower f0, lower pitch. Higher f0, higher pitch.

Three ways to measure f0
1. Frequency domain: Measure H1 (i.e., the lowest frequency harmonic).
2. Frequency domain: Measure the harmonic spacing.
3. Time domain: Measure the fundamental period.

The “Problem” of the Missing Fundamental

Normal f0 vs. f0 removed.

Conclusion: The fundamental does not need to be physically present in the signal for a listener to hear a pitch corresponding to where f0 ought to be.

What Explains This? Even with the 1st harmonic removed, a signal remains periodic at the original f0.

The “Pitch Shift” Effect

[Figure: two amplitude spectra, amplitude vs. frequency (Hz), 0-3000 Hz.
Panel 1: Harmonics at 1200, 1400, 1600, 1800 ... Harmonic spacing is 200 Hz. Signal is periodic at 200 Hz.
Panel 2: 'Harmonics' at 1240, 1440, 1640, 1840 ... 'Harmonic' spacing is 200 Hz. Signal is periodic at ~205 Hz. (Don't worry about why. It just is.)]

If the auditory system evaluated pitch by measuring the harmonic spacing, these 2 signals (1200, 1400, 1600 … and 1240, 1440, 1640 …) would have the same pitch. They do not have the same pitch, so we can rule out harmonic spacing. Which theory is left? Measuring the fundamental period.

What does all this mean?

Rule 3: The sensation of pitch is probably based on a measurement of the fundamental period.
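
The argument from the missing fundamental and pitch shift effects can be checked with a little arithmetic. The Python sketch below is only an illustration under stated assumptions, not a model of the auditory system: it assumes equal-amplitude components, uses the fact that the long-term autocorrelation of such a signal is (up to a constant) a sum of cosines at the component frequencies, and treats the lag with the largest autocorrelation between 1/300 s and 1/150 s as the "fundamental period." The function names and the 150-300 Hz search range are hypothetical choices made for this example.

```python
import math

def autocorrelation(freqs_hz, lag_s):
    """Long-term autocorrelation (up to a constant factor) of a sum of
    equal-amplitude sinusoids: a sum of cosines evaluated at the lag."""
    return sum(math.cos(2 * math.pi * f * lag_s) for f in freqs_hz)

def period_pitch_hz(freqs_hz, lo_hz=150, hi_hz=300):
    """Estimate pitch as 1 / (lag with the largest autocorrelation),
    searching lags from 1/hi_hz to 1/lo_hz in 1-microsecond steps."""
    shortest_us = round(1e6 / hi_hz)
    longest_us = round(1e6 / lo_hz)
    best_us = max(range(shortest_us, longest_us + 1),
                  key=lambda us: autocorrelation(freqs_hz, us * 1e-6))
    return 1e6 / best_us

def spacing_hz(freqs_hz):
    """Spacing between adjacent components (assumed constant here)."""
    return freqs_hz[1] - freqs_hz[0]

harmonic = [1200, 1400, 1600, 1800]  # no energy at 200 Hz itself
shifted = [1240, 1440, 1640, 1840]   # every component moved up by 40 Hz

for name, freqs in (("harmonic", harmonic), ("shifted", shifted)):
    print(name, "spacing:", spacing_hz(freqs), "Hz,",
          "period-based pitch: %.1f Hz" % period_pitch_hz(freqs))
```

Both signals have a 200 Hz spacing and neither has any energy at 200 Hz. The period-based estimate is 200 Hz for the harmonic signal and lands a few Hz higher for the shifted one, in the neighborhood of the ~205 Hz quoted above, which is the pattern that a spacing-based account cannot produce.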
Pitch is definitely not based on a measurement of either (a) the lowest frequency harmonic in a harmonic spectrum (because of the “missing fundamental” effect), or (b) harmonic spacing (because of the “pitch shift” effect).

Loudness and Intensity

Rule 1: All else being equal, the higher the intensity, the greater the loudness. Higher intensity, higher loudness. Lower intensity, lower loudness.

Rule 2: The relationship between intensity and loudness is seriously nonlinear. Doubling intensity does not double loudness. In order to double loudness, intensity must be increased by a factor of 10, or by 10 dB [10 x log10(10) = 10 x 1 = 10 dB]. This is called the 10 dB rule.

Two signals differing by 10 dB (500 Hz sinusoids): Note that the more intense sound is NOT 10 times louder, even though it is 10 times more intense.

The 10 dB rule means that a 70 dB signal will be twice as loud as a 60 dB signal, four times as loud as a 50 dB signal, eight times as loud as a 40 dB signal, etc.

A 30 dB hearing loss is considered mild – just outside the range of normal hearing. Based on the 10 dB rule, how much is loudness affected by a 30 dB hearing loss? (Answer: loudness is reduced to 1/8th. But note that this does not mean that someone with a 30 dB loss will have 8 times more difficulty with speech understanding than someone with normal hearing.)

Rule 3: Loudness is strongly affected by the frequency of the signal. If intensity is held constant, a mid-frequency signal (in the range from ~1000-4000 Hz) will be louder than lower or higher frequency signals.

250 Hz, 3000 Hz, 8000 Hz: The 3000 Hz signal should appear louder than the low-frequency or the 8000 Hz signal, despite the fact that their intensities are (about) equal. (Remember that this is the reason for the dBHL scale.)
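
The arithmetic of the 10 dB rule can be written out as a short Python sketch. This is just the rule of thumb stated above turned into two formulas, not a full loudness model (it ignores frequency, for instance), and the function names are hypothetical.

```python
def intensity_ratio(delta_db):
    """Physical intensity ratio for a level difference in dB."""
    return 10 ** (delta_db / 10)

def loudness_ratio(delta_db):
    """Approximate loudness ratio under the 10 dB rule:
    loudness doubles for every 10 dB increase."""
    return 2 ** (delta_db / 10)

print(intensity_ratio(10))      # 10.0  (10 dB = 10 times the intensity)
print(loudness_ratio(70 - 60))  # 2.0   (70 dB vs. 60 dB: twice as loud)
print(loudness_ratio(70 - 40))  # 8.0   (70 dB vs. 40 dB: eight times as loud)
print(loudness_ratio(-30))      # 0.125 (30 dB loss: 1/8th the loudness)
```

Note the two different bases: the same 10 dB step multiplies intensity by 10 but loudness by only 2.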

Timbre (also sound quality or tone color)

Timbre is oddly defined in terms of what it is not: When two sounds are heard that match for pitch, loudness, and duration, and a difference can still be heard between the sounds, that difference is called timbre.

Example: a clarinet, a saxophone, and a piano all play a middle C at the same loudness and same duration. Each of these instruments has a unique sound quality. This difference is called timbre, tone color, or sound quality. There are also many examples of timbre difference in speech. For example, two vowels (e.g., [ɑ] and [i]) spoken at the same loudness and same pitch differ from one another in timbre.

There are two physical correlates of timbre:
• spectrum envelope
• amplitude envelope

spectrum envelope: Smooth line drawn to enclose an amplitude spectrum.
amplitude envelope: Smooth line drawn to enclose a sound wave (time domain representation).

Timbre and Spectrum Envelope

Timbre differences between one musical instrument and another are partly related to differences in spectrum envelope, that is, differences in the relative amplitudes of the individual harmonics. In the examples above, we would expect all of these sounds to have the same pitch because the harmonic spacing is the same in all cases. The timbre differences that you would hear are controlled in part by the differences in the shape of the spectrum envelope.

Six Synthesized Sounds Differing in Spectrum Envelope

Note the similarities in pitch (due to constant f0/harmonic spacing) and the differences in timbre or sound quality.

Vowels Also Differ in Spectrum Envelope

Shown here are the smoothed envelopes only (i.e., the harmonic fine structure is not shown) of 10 American-English vowels.* Note that each vowel has a unique shape to its spectrum envelope.
Perceptually, these sounds differ from one another in timbre. Purely as a matter of convention, the term timbre is seldom used by phoneticians, although it applies just as well here as it does in music. In phonetics, timbre differences among vowels are typically referred to as differences in vowel quality or vowel color.
* From Hillenbrand and Houde (2003). “A narrow band pattern-matching model of vowel perception,” Journal of the Acoustical Society of America, 113, 1044-1055.

Aperiodic sounds can also differ in spectrum envelope, and the perceptual differences are properly described as timbre differences.

Amplitude Envelope

Timbre is also affected by amplitude envelope. Amplitude envelope is a smooth line drawn to enclose a sound wave. It is also sometimes called the amplitude contour of the sound wave. These are both good terms since the amplitude envelope shows how overall signal amplitude varies over time. Amplitude envelope refers mainly to the characteristics of the way sounds are turned on and turned off. The four signals below are sinusoids that differ in their amplitude envelopes.

Leading edge = attack. Trailing edge = decay. The attack especially has a large effect on timbre.

Same melody, same spectrum envelope (if sustained), different amplitude envelopes (i.e., different attack and decay characteristics). Note differences in timbre or sound quality as the amplitude envelope varies.

Timbre differences related to amplitude envelope also play a role in speech. Note the differences in the shape of the attack for [bɑ] vs. [wɑ] (top) and [ʃɑ] vs. [tʃɑ] (bottom): [bɑ] has an abrupt attack and [wɑ] a more gradual attack; [ʃɑ] has a more gradual attack and [tʃɑ] an abrupt attack.
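
To make the idea of attack and decay concrete, here is a minimal Python sketch of how sinusoids differing only in amplitude envelope might be synthesized. It is an illustration under assumptions, not how the demonstration signals in these slides were made: the envelope is a simple straight-line ramp up and ramp down, and the function names, the 8000 Hz sample rate, and the attack and decay times are all hypothetical choices.

```python
import math

def envelope(n_samples, attack, decay):
    """Piecewise-linear amplitude envelope: ramp up over `attack` samples,
    hold at 1.0, then ramp down over the last `decay` samples."""
    env = []
    for i in range(n_samples):
        rise = (i + 1) / attack if attack > 0 else 1.0
        fall = (n_samples - i) / decay if decay > 0 else 1.0
        env.append(min(1.0, rise, fall))
    return env

def shaped_tone(freq_hz, dur_s, attack_s, decay_s, rate_hz=8000):
    """Sinusoid multiplied sample by sample by the envelope above."""
    n = round(dur_s * rate_hz)
    env = envelope(n, round(attack_s * rate_hz), round(decay_s * rate_hz))
    return [e * math.sin(2 * math.pi * freq_hz * i / rate_hz)
            for i, e in enumerate(env)]

# Same frequency and duration; only the attack and decay differ.
abrupt = shaped_tone(500, 0.5, attack_s=0.005, decay_s=0.2)
gradual = shaped_tone(500, 0.5, attack_s=0.2, decay_s=0.005)

# Peak amplitude reached within the first 10 ms (80 samples) of each tone.
print(max(abs(x) for x in abrupt[:80]))
print(max(abs(x) for x in gradual[:80]))
```

The two tones share the same frequency and duration, so any difference a listener hears between them would be attributed to the amplitude envelope: the first reaches nearly full amplitude within 10 ms, while the second is still very quiet at that point.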