Lecture 7: Auditory Scene Analysis: I Slide 1
Recap
  Timbre
    Spectral Components
    Critical bandwidth
    Envelope: Attack, Steady State and Release
    Judgements of Timbre: Grey
  Shepard Tones: never-ending ascent/descent
  Sound Localisation
    Inter Aural Time Difference
    Inter Aural Intensity Difference
    Haas or Precedence Effect
CS5611 - Psychoacoustics - Niall Griffith – Semester 1 Computer Science and Information Systems, University of Limerick

Slide 2
Auditory Scene Analysis: 1
  The Identification of Auditory Objects
  The Perception of Temporal Patterns
  Gestalt Psychology
  Gestalt Principles
    Proximity
    Similarity
    Common Fate
    Good Continuation
    Disjoint Allocation (Belongingness)
    Closure

Slide 3
The Identification of Auditory Objects
Thus far we have looked at the perceptual processes involved in the perception of pitch, loudness and timbre. These are all cues to the identity of things and processes.
Pitch, loudness and timbre are fundamental, but the auditory world that we live in is highly complex and involves multiple sounds originating from multiple sources, occurring both at one instant and over time.
Our hearing has to be able to separate out these sounds and say this is a car, that is a leaf rustling in the gutter, or a violin, or a piano, etc.
Research indicates that the following cues are used to identify and distinguish the sounds associated with events:
  Fundamental Frequency.
  Onset Timing: how synchronised sound sources are.
  Contrast with Prior Sounds (i.e. changes in pitch, loudness and timbre).
  Correlated Modulation or Change, e.g. in amplitude or frequency: many sounds have signatures that are reflected in synchronised patterns of AM and FM.
  Location: sounds coming from two locations are unlikely to be from one source, while sounds from one location are probably from one source.

Slide 4
The Identification of Auditory Objects
This richness is useful. Along any one dimension we have only a limited capacity to categorise: about 5 to 6 categories. If we want more categories we need to use more dimensions.
Fundamental Frequency
  If two sounds share their fundamental frequency then they fuse into a single percept.
  Experiments with synthetic vowels showed that if sets of harmonics are split into two groups that don't share a fundamental, then we hear more than one pitch.
  If one harmonic in a series is mistuned enough, it will stand out and be heard as a separate pure tone.

Slide 5
The Identification of Auditory Objects
Onset Timing
Rasch (1978) played two tones: a masker Fm and a test tone Ft.
Ft was just higher than Fm, i.e. it was at the JND level.
Rasch observed three things when the tones were played more or less synchronously:
  When played synchronously, if Ft is 20 dB (or more) less intense than Fm it is not heard.
  If Ft is played slightly (say 30 ms) before Fm, then Ft can be up to 60 dB less powerful than the masker before it becomes inaudible.
  If Ft is started before the masker and then stopped, it seems to continue.
Thus, sounds that are difficult to separate when played at exactly the same time become separable if there is a slight asynchrony, i.e. time difference, between them.
Also, given proximal or overlapping stimuli, the auditory system can be fooled into thinking that something is occurring when it isn't.
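As a rough worked example of the level figures quoted from Rasch, the decibel differences can be converted into amplitude ratios, and the 30 ms lead into a sample count. This is only an arithmetic sketch: the function names and the 44.1 kHz sample rate are illustrative assumptions and are not part of Rasch's procedure.

```python
def db_to_amplitude_ratio(level_difference_db):
    """Amplitude ratio of a tone that is level_difference_db below a reference tone."""
    return 10 ** (-level_difference_db / 20)


def ms_to_samples(duration_ms, sample_rate_hz=44100):
    """Number of samples in duration_ms (sample rate is an assumed value)."""
    return round(duration_ms * sample_rate_hz / 1000)


# Synchronous case: Ft is lost once it is about 20 dB below Fm.
sync_ratio = db_to_amplitude_ratio(20)

# Ft leading by about 30 ms: Ft can be up to 60 dB below Fm.
lead_ratio = db_to_amplitude_ratio(60)

# The 30 ms onset lead expressed in samples at the assumed rate.
lead_samples = ms_to_samples(30)

print(sync_ratio, lead_ratio, lead_samples)
```

In amplitude terms, the onset lead lets the test tone be a further factor of 100 smaller (40 dB) and still be heard.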

Slide 6
The Influence of Onset Timing on Auditory Objects
Rasch also observed that if Fm and Ft are played synchronously, or up to 30 ms asynchronously, but the rise time (onset of energy) of Ft is faster than that of Fm, they are heard as separate but simultaneous.
Simultaneity in hearing is approx. < 15 ms, and pitch fusion is approx. 60 ms, i.e. we don't have to be consciously aware of the onset differences to be able to use them to distinguish tones.
Example: ASA, Demonstration 21
  In the example, 4 tones have slightly different onset times and also vary in rise time. This affects the clarity with which their order can be heard. The order is either M L H M or M H L M.
This kind of effect is important in ensemble playing, where the onsets of the instruments are never completely synchronised.

Slide 7
The Influence of Contrast with Prior Sounds on Auditory Objects
The auditory system uses change in a signal to identify and distinguish sounds.
  Steady state: leads to adaptation.
  Change: differences between stimuli stand out. Or, when change occurs, adaptation is removed.
Experiments with spectral sounds and noise (Zwicker 1964)
  Noise by itself is colourless.
  If noise is played after a spectral tone then it sounds coloured: the coloration is the inverse of the spectral sound.
  If a notched noise is played, followed by a noise with a flat spectrum, the flat noise sounds as if it has a spectral peak at the position of the notch.
Experiments with complex tones
  Generally we hear synthetically: we are unaware of the individual components.
  If one of the harmonics is changed (made higher or lower in energy) it stands out; after a while our awareness of it fades.
Changes in sound are important for noticing the arrival of a new source, e.g. you are lying on the grass listening to the wind when a car drives up.

Slide 8
The Influence of Coherent Changes in Amplitude or Frequency on Auditory Objects
Rasch (1975) showed that synchronous tones could be separated by
  modulating their frequency
  modulating their amplitude
FM reduces the level loss of F0 against the masker by 17 dB.
This also affects the fusion of sounds (McAdams 1982, 1989).
  If a subset of the harmonics of a complex tone constructed from random harmonics varies in AM or FM, these harmonics stand out like a figure against a ground. This is the Gestalt Principle of Common Fate.
It also affects whether we hear complex tones analytically or synthetically.
  Synchronised onset: synthetic
  Asynchronous onset: analytical

Slide 9
The Influence of Location on Auditory Objects
Our hearing assumes that sounds coming from different places are probably coming from different sources.
This also affects masking: the Masking Level Difference (MLD).
  If sounds are in phase then masking is greater than if they are out of phase.
  Because masking is suppressed when sounds are out of phase, the auditory system seems to treat out-of-phase sounds as coming from different sources (Kubovy 1974).
The cues outlined above are all capable of distinguishing sounds.
In the world, patterns of change in these parameters are regular, i.e. consistent.
The auditory system has adapted to extract and organise these regularities; this process is called Auditory Scene Analysis (ASA).
ASA is thought to follow Gestalt Principles of Organisation.

Slide 10
Gestalt Psychology
Gestalt means pattern.
Gestalt Psychology took shape in the 1910s and 1920s (Köhler, Koffka and Wertheimer).
The basic principles underlying Gestalt psychology are:
  The whole is greater than the sum of the parts.
  The parts are defined by the whole as much as vice versa.
Gestalt psychologists are best known for their work in vision, but their principles are also applicable to auditory perception.
They argued that the brain's ability to organise is innate and originates in patterns of electrical activity in the brain (never proven).
They systematically developed a set of principles of perceptual organisation that they thought determine how we assemble or associate components in a perceptual field. These principles are:
  Bottom Up
    1. Proximity
    2. Similarity
    3. Common Fate (Common Direction)
    4. Good Continuation
  Top Down
    5. Disjoint Allocation (Belongingness)
    6. Closure

Slide 11
Gestalt Principles
Two sets of principles
  Bottom Up (BU): hard-wired, pre-attentive, not learned
  Top Down (TD): plastic, schematic, learned
It can be difficult to know exactly which effects are operating.
Gestalt principles are rules of thumb that seem to have been incorporated into the auditory system over evolution because they give approximately right answers most of the time.
Bregman (1978) and Bregman and Pinker (1978) distinguished between two concepts in auditory processing:
  Source: a physical entity producing sound
  Stream: a coherent set of ordered or simultaneous events that indicates a source
In this view hearing is a process of auditory parsing.
  Simultaneous events = a single source
  Connected sequences indicate what is happening over time

Slide 12
Gestalt Principles of Perceptual Organisation
Some Visual Analogies

Slide 13
Gestalt Principle of Proximity
In vision, when elements in an image are close together they are perceived as belonging together and as separate from others that are further away, even though they are similar, e.g. [figure]
In hearing, sounds occurring close together in time are clustered, e.g. [figure]

Slide 14
Gestalt Principle of Similarity
Two or more auditory events are grouped if they are similar in timbre, pitch or loudness, or close in apparent location or time.
Pure tones: tones closer in pitch are grouped.
Van Noorden (1975): sequences of complex tones, but with missing fundamental
  Fundamentals in the same region but harmonics not: leads to fission, i.e. different timbres but same pitch = unfused.
  Harmonics in the same region but fundamentals not: leads to fusion, i.e. different pitches but same timbre = fused.
  This is not clear-cut: it depends on individual differences.
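The grouping of pure tones by closeness in pitch can be illustrated with a toy rule: each tone joins the stream whose most recent tone is nearest in pitch, or starts a new stream if none is near enough. This is only a sketch and not a model from the literature. The function names and the 5-semitone threshold are assumptions made for illustration, and the rule ignores tempo, timbre and loudness.

```python
import math


def semitone_distance(freq_a_hz, freq_b_hz):
    """Absolute pitch distance between two frequencies, in semitones."""
    return abs(12 * math.log2(freq_a_hz / freq_b_hz))


def group_into_streams(tone_freqs_hz, max_jump_semitones=5.0):
    """Toy grouping by pitch proximity (the threshold is an assumed value)."""
    streams = []
    for freq in tone_freqs_hz:
        best_stream = None
        best_distance = None
        for stream in streams:
            # Compare with the most recent tone already in this stream.
            distance = semitone_distance(freq, stream[-1])
            if distance <= max_jump_semitones and (
                best_distance is None or distance < best_distance
            ):
                best_stream, best_distance = stream, distance
        if best_stream is None:
            streams.append([freq])  # nothing close enough: start a new stream
        else:
            best_stream.append(freq)
    return streams


# Alternating low and high tones split into two streams.
print(group_into_streams([400, 1600, 420, 1500, 410, 1550]))
# Tones that stay close in pitch remain in one stream.
print(group_into_streams([400, 440, 420, 450]))
```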
Loud and soft tones (Van Noorden)
  If the difference in loudness is large enough they form different streams; either can be attended to.
  Same dB: a single stream at twice the tempo.

Slide 15
Gestalt Principle of Common Fate
Components in a sound act together.
  They tend to start and finish together.
  They tend to change in pitch or intensity together.
Therefore, if we have a complex sound and its components are coordinated, they are fused. The relevant cues include onset disparities, and AM and FM (tremolo and vibrato).
For example, if the frequencies of harmonics 2, 4 and 8 are modulated (FM), they separate from harmonics 3, 5, 6 and 7.
  Example: ASA, Demonstration 19, FM
Or, if the frequency of the 1st harmonic is modulated (FM) at a different rate, it separates from harmonics 3, 4 and 5.
  Example: ASA, Demonstration 20, FM

Slide 16
Gestalt Principle of Good Continuation
Natural sound sources tend to change gradually rather than abruptly in frequency, intensity, location or timbre.
Therefore: abrupt change = new stream = new source.
Low and high tones tend to split into streams; this can be suppressed by putting glides in between (Bregman & Dannenbring 1973).
In speech, if there are oscillations in frequency, it gives the impression that there are two speakers saying the one word.
In music in general, if a note is near in pitch to the one just before it, it will be heard as the next note in the melody rather than as a separate note, higher or lower.
  Example: ASA, Demonstration 12

Slide 17
Gestalt Principle of Disjoint Allocation (Belongingness)
A component can only come from one source, i.e. hearing tries to use each component only once (Bregman & Rudnicky 1975).
Say we have two tones, A and B, at slightly different pitches, and these can either be heard in isolation or embedded in another series of tones, thus: [figure]
  In isolation, the order AB or BA is easily judged.
  The addition of tones (X's) close to the pitch of AB acts as a distracter, making it difficult to order AB. (This is thought to be because we attend more to the start and end of sequences.)
  But if more X's are added, they form a stream that is separate from AB, and again the order of AB is easily judged.
This is not hard and fast: ambiguity is possible, and this shows that this level of organisation is on the boundary between being pre-attentive and attentive.
It also shows how the addition of new elements changes the perceptual organisation of the stimulus.
  Example: ASA, Demonstration 16

Slide 18
Gestalt Principle of Closure through Apparent Continuity
A source may be obscured or absent, but its percept continues.
  e.g. FM radio with disturbance from the ignition of passing cars: we hear a click over the continuing sound, whereas in fact at that moment the radio is producing only a click.
A pitched sound that is broken, but with the gap filled by noise, seems unbroken.
  Example: ASA, Demonstration 28
Similarly, a glide that is broken, but with the gap filled by noise, seems unbroken.
  Example: ASA, Demonstration 29
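
A stimulus of the kind described for apparent continuity can be sketched as a tone, a gap, and the tone again, with the gap either silent or filled with noise. This is a minimal sketch and not the procedure used in the ASA demonstrations. The function name and all parameter values (sample rate, durations, amplitudes) are assumptions chosen for illustration, and the noise is made louder than the tone on the assumption that it must be intense enough to have masked the tone.

```python
import math
import random


def tone_with_gap(freq_hz=440.0, sample_rate_hz=8000, tone_ms=200, gap_ms=100,
                  fill_with_noise=True, tone_amplitude=0.3,
                  noise_amplitude=1.0, seed=0):
    """Return samples for tone, gap, tone; the gap is noise or silence."""
    rng = random.Random(seed)
    tone_len = sample_rate_hz * tone_ms // 1000
    gap_len = sample_rate_hz * gap_ms // 1000

    def tone(start_index, length):
        # Phase follows the absolute sample index, so the second tone
        # segment is where an uninterrupted tone would have been.
        return [tone_amplitude
                * math.sin(2 * math.pi * freq_hz * (start_index + i) / sample_rate_hz)
                for i in range(length)]

    if fill_with_noise:
        gap = [rng.uniform(-noise_amplitude, noise_amplitude) for _ in range(gap_len)]
    else:
        gap = [0.0] * gap_len
    return tone(0, tone_len) + gap + tone(tone_len + gap_len, tone_len)


broken = tone_with_gap(fill_with_noise=False)  # silent gap
filled = tone_with_gap(fill_with_noise=True)   # gap filled with noise
print(len(broken), len(filled))
```

Writing both lists to audio files and comparing them by ear would be the natural next step; that is left out here to keep the sketch free of external dependencies.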