
Speech perception

1. How do people understand speech?
When we talk, we produce sounds with our cheeks, lips, jaw and vocal cords. These sounds travel through the air to the listener's eardrums, and then the brain works on them.
Do you know how many English words you recognise? The average 20-year-old native English speaker knows between 27,000 and 52,000 different words. But no matter how many words we know, every time we speak out loud most of these words last less than a second. With every word, our brain has a quick decision to make: which of those thousands of options matches the signal? About 98% of the time, the brain chooses the correct word.
When we listen to a new language we don't know well, it is very hard to understand the message. But in a language we are fluent in, we don't even need to catch every sound, or even every single word. That's because our brain retrieves the stored meaning connected to a word as soon as we hear its sounds (as the writer mentions in the middle of the 2nd paragraph of this page).
How do our brains manage this and process speech? First, we have to figure out how we identify the sounds of the language. The following section covers how we identify continuous sounds and how we segment them.
Why is perception difficult
Speech comprehension is different from reading comprehension. Most theories assume that each word
we know is represented by a separate processing unit that has just one job: to assess the likelihood of
incoming speech matching that particular word. In the brain, the processing unit that
represents a word is likely a pattern of firing activity across a group of neurons in the cortex.
When we hear the beginning of a word, several thousand such units may become active, because with just
the beginning of a word, there are many possible matches, then as the word goes on, more and more units
register that some vital piece of information is missing and lose activity. Possibly well before the end of
the word, just one firing pattern remains active, corresponding to one word. This is called the recognition
point. In the process of homing in on one word, the active units suppress the activity of others, saving vital
milliseconds. Most people can comprehend up to about 8 syllables per second, but the goal in
understanding speech is not only to recognise the word, but also to access its stored meaning.
Therefore, we face challenges of pace, blending, automaticity of processing and the invariance problem.
• Why is speech perception difficult?
o Spoken words often presented briefly
1 second of conversation contains about 8-10 phonemes, 3-4 syllables and 2-3 words.
o Written word always presented in front of you
2. The invariance problem: an issue in language comprehension due to variation in how phonemes are produced (the same phoneme sounds different depending on context)
One cause is assimilation: phonemes take on some qualities of their neighbours
The way in which we segment speech depends on the language we speak.
Savin & Bever (1970)
We identify words based on syllable content, not phonemes.
Foss & Blank (1980)
We have a dual-route system where info is processed prelexically and postlexically at the same time
phoneme restoration effect
the use of top-down processing to comprehend fragmented language (context is helpful)
Initial contact
Spoken word identification time course PART 1; sensory input makes initial contact with the lexicon
lexical selection
Spoken word identification time course PART 2 ; sensory input continues to accumulate until one
lexical entry is selected
word recognition
Spoken word identification time course PART 3 ; word is recognized (usually occurs before the
complete word has been heard)
cohort model
in lexical access, a large number of spoken words are initially considered as candidates but words
get eliminated as more evidence accumulates (DEDUCTIVE)
TRACE model
Connectionist model that consists of 3 levels ; auditory features, phonemes, and words
Speech recognition
18% of brain damage patients have a problem with this
Brain damage affects this
Early acoustic-phonetic processing
Word-deaf patients cannot do this ____ but can do this _____
Understand spoken language ; read, write, & speak
Speech perception
• Speech perception
o How we understand and perceive sounds of language
o Knowing the word helps to identify the constituent sounds
o Might not need to hear all the sounds of word to know the word
• Recognising speech
o Can distinguish the pre-lexical code (the sound representation used prior to identification of a word) from the post-lexical code (info only available after lexical access)
o Need to specify the nature of the prelexical code to understand speech
• Why is speech perception difficult?
o Spoken words often presented briefly
o Written word always presented in front of you
o Segmenting speech into component sounds is not as easy as segmenting written words into letters - sounds and
whole words tend to slur into each other
o Despite problems, good at recognising speech
o Automatic and fast process
o Clark and Clark (1977)
When people were presented with a buzz, hiss, tone and vowel, they could distinguish the sounds only when presented at a rate slower than 1.5 per second
o Yet we understand speech at a rate of about 20 phonemes per second
o Marslen-Wilson (1984)
Identify spoken words in context from about 200ms after onset
o Miller, Heise and Lichten (1951)
Found that the more words there were to choose from in a predetermined set, the louder the signal had to be relative to the noise for ppts to identify them equally well
o Bruce (1958)
Words in meaningful context are recognised better than words out of context
Words take about twice as long to recognise when presented in isolation
Acoustic signals and phonetic segments: how do we segment speech?
o Acoustic properties of phonemes not fixed
o Vary with what context they are in
o The 'b' sound in ball, bill and able = acoustically distinct
Makes phoneme ID difficult
o Physical acoustic signal and sound conveyed by signal = relation between them is complex
o Miller and Jusczyk (1989)
Complexity arises because two main features must act as major constraints on theories of speech perception
Both features are facets of the lack of identity/isomorphism between the acoustic and phonemic levels of
language - called the segmentation and invariance problems
o Invariance problem
Same phoneme can sound different depending on context
o Segmentation problem
Sounds slur together = cannot be easily separated
Not easy to separate sounds in speech as they run together
Doesn't apply just to sounds within words - words run into each other
• 'I scream' and 'ice cream' = almost indistinguishable
An obvious constraint on segmenting speech: we prefer to segment speech so that each speech segment is
accounted for by a possible word
o Acoustic invariance
Arises because the detailed realisation of a phoneme varies depending on the surrounding context
Phonemes take on some acoustic properties of their neighbours
• Process called assimilation
Co-articulation effects
• As we produce one sound, our vocal apparatus has just moved into position from making another sound and is preparing to change again to make the subsequent sound
• Has advantages for both speaker and listener
• Speaker
o Speech produced more quickly
• Listener
o Info about identity of phonetic segments may be spread over several acoustic segments
o Other strategies for segmenting speech develop depending on exposure to a particular language
Strong syllables never shortened to unstressed neutral vowel sounds
Weak syllables are
Cutler and Butterfield (1992)
• In English, strong syllables likely initial syllables of main content-bearing words
• Weak syllables are either not word initial or start a function word
• Strategy that uses this type of info = metrical segmentation strategy
o Experiments can be used to violate these expectations, causing listeners to mishear
Another segmentation procedure = stress based segmentation
• Listeners segment speech by identifying stressed syllables
Alternative mechanism = based on detecting syllables called syllable based segmentation
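The metrical segmentation strategy above can be sketched as a toy program (the syllable list, stress marks and `segment` function are all invented for illustration; this is not Cutler and Butterfield's procedure):

```python
# Toy sketch of the metrical segmentation strategy: posit a word boundary
# before every strong (stressed) syllable in a stream of syllables.

def segment(syllables):
    """Group a flat list of (syllable, is_strong) pairs into candidate words.

    A new word is hypothesised whenever a strong syllable is encountered;
    weak syllables attach to the current word.
    """
    words, current = [], []
    for syl, strong in syllables:
        if strong and current:      # strong syllable -> start a new word
            words.append(current)
            current = []
        current.append(syl)
    if current:
        words.append(current)
    return ["".join(w) for w in words]

# hypothetical stream: "conduct an experiment", with stress marked in caps
stream = [("con", False), ("DUCT", True), ("an", False),
          ("ex", False), ("PE", True), ("ri", False), ("ment", False)]
print(segment(stream))  # -> ['con', 'DUCTanex', 'PEriment']
```

Note how a word that begins with a weak syllable gets mis-segmented; this is exactly the kind of mishearing the strategy predicts when expectations are violated.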
o Bilingual speakers segment languages using a segmentation process determined by their dominant language
o Cutler, Mehler, Norris and Segui (1992)
Tested English-French bilinguals on segmenting English and French materials
Used syllable monitoring task - ppt responded as quickly as they could if they heard particular
sequence of sounds
Native French speakers find it easy to detect 'ba' in balance and 'bal' in balcon
• They take longer to find 'bal' in balance and 'ba' in balcon
• i.e. they respond on the basis of syllables
The syllable structure of the English word balance is far less clear - it is uncertain which syllable the 'l' sound belongs to
• The time English speakers take to detect 'ba' and 'bal' doesn't vary with the syllable structure of the word
French-dominant bilinguals, segmenting on the basis of their primary language, used syllables
• Showed syllable-based segmentation
English-dominant bilinguals did not use syllables, segmenting on the basis of their primary language
• Never showed syllable-based segmentation
Categorical Perception (CP)
• Categorical perception (CP)
o Classify speech sounds as one phoneme
o Phenomenon called categorical perception of phonemes
o Liberman et al
Used a speech synthesiser to create a continuum of artificial syllables that differed in place of articulation
Ppts placed the syllables into 3 distinct categories beginning with /b/, /d/ and /g/
o Another example of CP is voice onset time (VOT)
VOT is the delay between the release of the vocal tract closure and the start of vocal cord vibration
Voicing lies on continuum
Categorise sounds as voiced or unvoiced
o Boundaries between categories not fixed - sensitive to contextual factors e.g. rate of speech
Perceptual system able to adjust to fast rates of speech
A short interval can be treated as a relatively long one if the surrounding speech is fast enough
(Summerfield, 1981)
Not learned - infants sensitive to speech rate
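How a voiced/unvoiced boundary could shift with speech rate, as described above, can be made concrete with a toy classifier (the boundary value, the scaling rule and the function are illustrative assumptions, not measured values):

```python
# Toy illustration of categorical perception of voice onset time (VOT):
# sounds are labelled by which side of a boundary they fall on, and the
# boundary shifts with speech rate.

def classify_vot(vot_ms, speech_rate=1.0, base_boundary_ms=30.0):
    """Label a stop consonant as voiced or unvoiced.

    speech_rate > 1 means faster surrounding speech; the boundary is scaled
    down so that a short interval counts as relatively long.
    """
    boundary = base_boundary_ms / speech_rate
    return "unvoiced" if vot_ms >= boundary else "voiced"

print(classify_vot(25))                   # below the boundary at normal rate -> voiced
print(classify_vot(25, speech_rate=1.5))  # same VOT in fast speech -> unvoiced
```

The same physical VOT gets a different label depending on the surrounding rate, mirroring the Summerfield (1981) point that the boundary is contextually adjusted.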
o Pisoni and Tash (1974)
Ppts faster to say that two /ba/ syllables were same if the /b/ sounds were acoustically identical
than if /b/ sounds differed slightly in VOT
Sensitive to differences within category
The importance of CP has come into question
o Massaro
Argued the apparently poor discrimination within categories doesn't result from early perceptual processes
It arises from a bias of ppts to say that items from the same category are identical
What is the nature of the prelexical code?
o Savin and Bever (1970)
Ppts responded as soon as they heard a target unit - either a single phoneme or a syllable
Found ppts responded more slowly to phoneme targets than to syllable targets
Concluded phoneme ID is subsequent to perception of the syllable
We do not recognise words through perceiving individual phonemes but instead recognise them
through perceiving more fundamental units - syllables
o Foss and Swinney (1973)
Challenged this
Argued that the phoneme and syllable monitoring tasks used by the above study do not directly tap into the
perception process
Just because we become consciously aware of higher unit first - doesn't mean processed
perceptually earlier
o Foss and Blank (1980)
Proposed the dual code theory
Speech processing employs both a prelexical (phonetic) code and a postlexical (phonemic) code
• The prelexical code is computed directly from perceptual analysis of the acoustic input
• The postlexical code is derived from info from higher-level units such as words
Phoneme monitoring times for words and nonwords are approximately the same
Frequency of target words doesn't affect phoneme monitoring times
Manipulating the semantic context of a word leads to people responding on the basis of the postlexical code
People respond to the prelexical code when the phoneme monitoring task is made easy - to the postlexical code if the task
is made hard
o Marslen-Wilson and Warren (1994)
Evidence on range of tasks that phoneme classification doesn't have to be finished before lexical
activation can begin
Nonwords constructed from words are more difficult to reject in an auditory lexical decision task than nonwords
constructed from other nonwords
'Smog' (word), 'smod' (nonword)
• Take the original consonant and splice it with 'b' = 'smob'
'Smob' spliced from the word is more difficult to reject as a nonword because co-articulation info from the vowel is consistent with the word 'smog'
Lexical representations are directly accessed from featural info in the sound signal
Co-articulation info from vowels is used early to identify the following consonant - and therefore the word
What role does context play in identifying sounds?
o speech recognition - bottom up or top down?
o Top down
If the word in which a sound occurs, or the meaning of the whole sentence, can influence recognition of that sound,
then processing is top-down
In this case we have to show that speech recognition is at least partly an interactive process
• Knowledge about whole words influencing perception of component sounds
o First evidence
Based on CP of sounds varying along continuum
Word context affects where boundary between two lies
o Ganong (1980)
Varied an ambiguous phoneme along an appropriate continuum (e.g. /k/ to /g/)
Inserted this in front of a context provided by the word ending
Found the context affected the perceptual changeover point
Ppts were willing to put a sound into a category they wouldn't otherwise choose if the result makes a word ('kiss' is a word
but 'giss' isn't)
o Lexical identification shift
Influence our categorical perception of ambiguous phoneme
Word context influencing categorisation
o Connine and Clifton (1987)
Strengthened the argument that lexical knowledge is available to the categorical perception of ambiguous phonemes
Other processing advantages accrue to ambiguous stimuli when lexical knowledge is invoked
o Signal detection theory
Provides a means of describing the identification of imperfectly discriminable stimuli, separating sensitivity from response bias
Lexical context effects did not behave like a simple response-bias manipulation
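Signal detection theory's split between sensitivity and bias can be shown with a short calculation: d′ and the criterion c are computed from hit and false-alarm rates (the rates below are made-up numbers for illustration):

```python
# Signal detection theory: sensitivity (d') measures how well a listener
# discriminates signal from noise; criterion (c) measures response bias.
from statistics import NormalDist

def d_prime_and_criterion(hit_rate, false_alarm_rate):
    z = NormalDist().inv_cdf                          # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(false_alarm_rate)       # sensitivity
    c = -0.5 * (z(hit_rate) + z(false_alarm_rate))    # bias (0 = neutral)
    return d_prime, c

# illustrative rates: 85% hits, 20% false alarms
d, c = d_prime_and_criterion(0.85, 0.20)
print(round(d, 2), round(c, 2))
```

A context manipulation that changes d′ is affecting perception; one that only moves c is affecting the listener's willingness to respond, which is the distinction drawn on below for lexical vs sentential context.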
o Connine (1990)
Found sentential context behaves differently from lexical context (the context provided by the word in which
the ambiguous phoneme occurs)
Sentential context has an effect similar to the postperceptual effect of monetary payoff - certain
responses lead to greater rewards
o Phoneme restoration effect
Evidence of contextual involvement in sound ID (Obusek and Warren, 1973)
Ppts were presented with sentences in which an asterisk marked a point in the middle of the sentence
• A cough was presented where the asterisk appeared, replacing a phoneme
Ppts couldn't detect that a sound was missing from the sample
Ppts use semantic and syntactic info beyond the individual phonemes in processing speech
o Warren and Warren (1970)
Ppts listened to tapes constructed so the only thing that differed was the last word
A phoneme at the beginning of a word (*eel) was replaced with a cough
Found the phoneme ppts restored depended on the semantic context given by the final word of the sentence
E.g. given the original sentence 'It was found that the *eel was on the orange', ppts restored a phoneme that
made the sentence appropriate: 'It was found that the peel was on the orange'
o Fodor (1983)
Asked whether restoration occurs at phonological level or at some higher level
Ppts could just guess the deleted phoneme
o Samuel (1981;1987;1990;1996)
Examined effects of adding noise to segment instead of replacing segment with noise
If restoration is truly perceptual - ppts should not be able to detect any differences between these conditions
If the effect is postperceptual - ppts should be able to discriminate between the 2 conditions
Concluded that lexical context does lead to true phoneme restoration and effect is prelexical
Sentence context does not affect phoneme recognition and affects only postlexical processing
o Samuel (1997)
Investigated the suggestion that people merely guess the phoneme in the restoration task rather than truly restoring it at a
perceptual level
Combined the phoneme restoration technique with the selective adaptation technique of Eimas and
Corbit (1973)
Listeners identified sounds from a /bI/ - /dI/ continuum; the sounds acting as adaptors were
the third syllable of words beginning with /b/ or /d/
With repeated presentation of an adaptor, ppts were less likely to classify a subsequent sound as /b/
Adaptation occurred even if the critical phoneme in the adaptor word was replaced with a loud burst of noise
o Argued that lexical context doesn't seem to improve the perceptibility of the phoneme (sensitivity)
but just affects ppts' responses (bias)
o Top-down info does not really affect sensitivity in word recognition
The time course of spoken word recognition
o Recognising spoken word begins when some representation of sensory input makes initial
contact with lexicon - initial contact phase
o Once lexical entries begin to match the contact representations, they change state - they
become activated
o Activation can be all-or-none, or the relative activation level can depend on properties of words
o Activation accumulates until one lexical entry selected
o Word recognition is end point of selection phase
o A word's recognition point often corresponds to its uniqueness point - the point at which the word's initial sequence stops being common to several words and identifies it uniquely
o Recognition can be delayed until after the uniqueness point - or we might recognise a word before its uniqueness point
o If this occurs - isolation point
The point where a proportion of listeners identify the word correctly even though they are not yet confident in it
o By the isolation point, a single candidate word has been isolated
o Recognition point - lexical access refers to the point where all info about the word becomes available and
ppts are able to recognise the word
o The process of integration follows: the semantic and syntactic
properties of the word are integrated into a higher-level sentence representation
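The uniqueness point can be illustrated with a toy computation over a small lexicon (letters stand in for phonemes; the lexicon and the function are invented for illustration):

```python
# Sketch: the uniqueness point of a word is the first position at which its
# initial sound sequence matches no other word in the lexicon.

def uniqueness_point(word, lexicon):
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        competitors = [w for w in lexicon if w != word and w.startswith(prefix)]
        if not competitors:
            return i   # 1-based position at which the word becomes unique
    return None        # the word is a prefix of another word (e.g. 'cap' in 'captain')

lexicon = ["trespass", "tread", "treasure", "treat"]
print(uniqueness_point("trespass", lexicon))  # 'tres' rules out all competitors -> 4
print(uniqueness_point("treat", lexicon))     # not unique until its final sound -> 5
```

'Trespass' can in principle be recognised four sounds in, well before it ends, whereas 'treat' cannot be separated from 'treasure' and 'tread' until its last sound, so its recognition is delayed.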
When does frequency affect spoken word recognition?
Dahan, Magnuson and Tanenhaus (2001)
Examined eye movements while ppts looked at pictures on a screen
Ppts had to follow spoken instructions about which object to click on
Ppts looked first at objects with higher frequency names compared with pictures with lower
frequency names but the same initial sounds
Needed to look for less time at targets with higher frequency names
o Word frequency is important from the earliest stages of processing - its effects persist for some time
Context effects on word recognition
o Context = includes info available from the previous sensory input & from higher knowledge
o Nature of context depends on level of analysis
o To show context affects recognition, we need to demonstrate top down influences on bottom up
processing of acoustic signal
o Autonomous position
Context cannot have an effect prior to word recognition
Only contribute to evaluation and integration of output of lexical processing - not its generation
However, lateral flow of info is permitted in these models
o Interactive position
Allow different types of info to interact with one another
May be feedback from later levels of processing to earlier ones
• E.g. info about meaning of sentence or pragmatic context might affect perception
o Because of huge differences between models - can be difficult to test between them
o Strong evidence for interactionist view - if context has an effect before or during the access and
selection phases
o Autonomous model - context can only have influence after word has emerged as the best fit to
sensory input
o Frauenfelder and Tyler (1987)
Distinguished between 2 types of context
• Non structural
o Context can be thought of as info from the same level of processing as that currently being processed
o An example is the facilitation in processing arising from intra-lexical context (e.g. an associative relation
between 2 words)
o Can be explained in terms of relations within a single level of processing - need not violate the
principle of autonomy
o Associative facilitation can be thought of as occurring due to hardwired connections between
similar things at the same level
• Structural
o Context affects the combination of words into higher level units - involves higher level of info
o It's top down processing
o Number of possible types of structural context
o Word knowledge (lexical context) might be used to identify phonemes
o Sentence level knowledge (sentence and syntactic context) used to help identify individual words
o Most interesting structural context = meaning
o Frauenfelder and Tyler (1987)
Distinguished 2 subtypes (semantic and interpretative)
Semantic = based on word meanings
• Evidence to suggest this affects word processing
o Not clear whether non structural and semantic structural context effects can be distinguished
o The delay between stimulus and response should not be too long - otherwise ppts have a chance to reflect
on and alter their decisions, reflecting late-stage, post-access mechanisms
o Interpretative structural context = involves more high level info e.g. pragmatic info, discourse info
and knowledge about world
• Some evidence that non-linguistic context can have an effect on word recognition
• Tanenaus, Spivey-Knowlton, Eberhard and Sedivy (1995)
o Studied people's eye movements while they examined a visual scene and followed spoken instructions
o Found visual context can facilitate spoken word recognition
o E.g. candy and candle - ppts' eyes moved faster to the object mentioned if only the candle was in the scene
than if the candle and candy were presented at the same time
o When no confusion occurs - ppts identified the object before hearing the end of the word
o Suggests that interpretative structural context can affect word recognition
Models of speech recognition
o Speech perception concerned with early stage of processing
o Early models of speech recognition examined the possibility that word recognition occurred by template matching
o Target words stored as templates - identification occurs when a match is found
o A template is an exact description of the sound/word for which one is searching
o However, there is far too much variation in speech for this to be a plausible account except in the most
restricted domains
o Speakers differ in dialect, basic pitch, basic speed of talking
o Person can produce same phoneme in many different ways
o Number of templates that would be stored would be large
o Template models not considered plausible accounts in psycholinguistics
Models of speech recognition: Analysis by synthesis (Halle & Stevens, 1962)
Early model of speech perception
Recognise speech by reference to actions necessary to produce sound
Important idea underlying this model - when we hear speech, we produce/synthesise succession of
speech sounds until we match what we hear
The synthesiser doesn't randomly generate candidates for matching against the input
It creates an initial best guess constrained by acoustic cues in the input, then attempts to minimise the difference
between this and the input
• Uses our capacity for speech production to cope with speech recognition
• Copes easily with intra-speaker differences - listeners generating own candidates
• Easy to show how constraints of all levels might have an effect
• Synthesiser only generates candidates that are plausible
• Does not generate sequences of sounds that are illegitimate within the language
Variant of model = motor theory
• Proposes the speech synthesiser models the articulatory apparatus and motor movements of the speaker
• Effectively computes which motor movements would be necessary to create the sounds
• Evidence - the way sounds are made provides a perfect description of them
Problems with analysis by synthesis model
• There is no apparent way of translating the articulatory hypotheses generated by the production system into the
same format as heard speech in order for a potential match to be assessed
• We are adept at recognising clearly articulated words that are impossible in their context, suggesting speech recognition is primarily a data-driven process
• Clark and Clark (1977) - argued this theory is underspecified and has little predictive power
This theory doesn't explain the whole story of speech perception, but it appears that motor processes
play some role
Cohort model
o Cohort model
Marslen-Wilson and Welsh (1978)
• Emphasise bottom up nature of word rec.
• Distinguish between early and later version
• As we hear speech, we set up a cohort of possible items it could be
• Items are eliminated from this set until only one is left
• That item is then taken as the word currently being recognised
• The early version of the model permitted more interaction than the later version
(in the later version processing is more autonomous and the recognition system is better able to recover if the beginnings of words
are degraded)
• 3 stages of processing in this model
o 1. Access stage - the perceptual representation is used to activate lexical items, thereby generating a
candidate set of items
Candidates = cohort
Beginning of word important in generating cohort
o 2. Selection stage - one item is chosen from the set
o 3. Integration stage - semantic and syntactic properties of chosen word utilised (integrating word
into complete representation of whole sentence)
• The access and selection stages are prelexical; the integration stage is postlexical
• Parallel, interactive, direct access
• Uniqueness point - around this point when most intense processing occurs
o Recognition point doesn't have to coincide with uniqueness point
• Context only affects integration stage
• Bottom up priority - context cannot be used to restrict which items form the initial cohort
o Context is not used to eliminate candidates at an early stage
• In the later version - elimination of candidates is no longer all-or-none
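The access/selection idea can be sketched as a toy simulation (an illustrative simplification with an invented mini-lexicon, not Marslen-Wilson's actual model): the first sound sets up the cohort, later sounds eliminate mismatching candidates, and recognition occurs when one candidate is left:

```python
# Toy cohort-model sketch: candidates are eliminated sound by sound until
# a single word remains - the recognition point, often before the word ends.

def recognise(input_sounds, lexicon):
    # access stage: the first sound activates the initial cohort
    cohort = [w for w in lexicon if w.startswith(input_sounds[0])]
    # selection stage: each further sound eliminates mismatching candidates
    for i, sound in enumerate(input_sounds[1:], start=2):
        cohort = [w for w in cohort if len(w) >= i and w[i - 1] == sound]
        if len(cohort) == 1:
            return cohort[0], i   # recognised after i sounds
    return (cohort[0], len(input_sounds)) if len(cohort) == 1 else (None, None)

lexicon = ["elephant", "elegant", "element", "echo"]
word, position = recognise("eleph", lexicon)
print(word, position)  # 'elephant' is the sole survivor after its 4th sound
```

Here the 4th sound ('p') eliminates 'elegant' and 'element', so the word is selected well before its end has been heard, matching the idea that the recognition point can precede the end of the word.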
Experimental tests of cohort model
• Marslen-Wilson and Welsh (1978)
o Used a technique called shadowing to examine how syntax and semantics interact in word recognition
o Ppts listened to continuous speech and repeated back as quickly as possible
o Speech samples contained deliberate mistakes - distorted sounds so that words were mispronounced
o Ppts were not told there were mispronunciations, only that they had to repeat back the passage of speech
o 50% of the time ppts repeated words back as they should be rather than as they actually were (correcting the
mispronunciation) - fluent restorations
o The more distorted a sound is, the more likely ppts are to give an exact repetition
o Most fluent restorations were made when the distortion was slight, when the distortion was in the final syllable,
and when the word was highly predictable from its context
o Most exact reproductions occur with greater distortion when the word is unconstrained by context
o In a suitably constraining context - listeners make fluent restorations even when deviations are large
• Not pay attention to all parts of word
• Beginning of word (first syllable) = salient
• Cole (1973) demonstrated this
o Ppts listened to speech where sound is distorted (boot changed to poot)
o Detect changes
o Consistent with shadowing task - ppts more sensitive to changes to beginning of words
• Word fragments that match word from onset - nearly as effective a prime as word itself
• Rhyme fragments of words produce little priming
o E.g. 'cattle' primed by 'yattle'
Gating task
• Grosjean (1980)
o Involves presenting increasing amounts of word
o Enables the isolation points of words to be found
o Measures the mean time from word onset needed to guess the word correctly
o The task demonstrates the importance of context
o Ppts need on average 333ms to identify a word in isolation - only 199ms in an appropriate context
'At the zoo the kids rode the ca....' (camel)
• The task showed that candidates are generated that are compatible with the perceptual representation up to
a point but not compatible with the context
• Showed context can assist in selecting semantically appropriate candidates before the word's
recognition point
Imaging data support the idea that semantics play a role in selecting candidates
• In a lexical decision task, high-imageability words generate stronger activation than low-imageability
words in competitive contexts (Zhuang, Randall, Stamatakis, Marslen-Wilson & Tyler, 2011)
The influence of lexical neighbourhoods
• Goldinger, Luce and Pisoni (1989) suggest that cohort size doesn't affect the time course of word recognition
• Luce et al. found the structure of a word's neighbourhood affects the speed and accuracy of auditory word
recognition on a range of tasks
o Including identifying words and performing auditory lexical decision tasks
• Number and characteristics of word competitors are important
• Marslen-Wilson (1990)
o Examined effect of frequency of competitors on recognising words
o Found the time it takes to recognise a word doesn't depend on the relative uniqueness points of
competitors in the cohort
• Phonological neighbourhood is not the only factor that can affect auditory recognition
• Orthographic neighbourhood also affects auditory recognition, and does so in a facilitatory fashion
• Spoken words with visually similar neighbours are faster to identify
• Printed words sometimes affect spoken word recognition - because sublexical units or word units (or
both) for the different modalities are linked
Evaluation of cohort model
• Early version
o Context cannot affect access stage
o Can affect selection and integration stages
• Later version
o Context cannot affect selection but only affects integration
• The model doesn't distinguish between provisional and definite identification
• A problem is its reliance on knowing when words start, without having an explicit mechanism for finding the
start of words
Trace model
An interactive model that allows feedback between levels of processing
Emphasises the role of top-down processing (context) in word recognition
Lexical context can directly assist acoustic-perceptual processing, and info above the word level can
directly influence word processing
Connectionist model
• Consists of many simple processing units connected together
• Units are arranged in 3 levels of processing
Assumes early, perceptual processing of the acoustic signal
Input-level units represent phonological features
• These are connected to phoneme units, which in turn are connected to output units that represent words
All connections between levels are bidirectional
• Info flows along them in both directions
• This means both bottom-up and top-down processing occurs
Inhibitory connections exist between units within each level - once a unit is activated it tends to
inhibit its competitors
Emphasises the concept of competition between units at the same level
Units are represented independently in each time slot
Model implemented in the form of computer simulations
Shows how lexical knowledge aids perception
Categorical perception arises in the model as a consequence of within-level inhibition between phoneme units
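The core TRACE ideas - bidirectional excitation between levels and inhibition within a level - can be caricatured in a few lines (the two-word lexicon, all parameter values and the update rule are invented for illustration; this is not the published model):

```python
# Caricature of interactive activation: word units get bottom-up support from
# phoneme evidence, feed activation back to their phonemes (top-down), and
# inhibit competing word units within their own level.

WORDS = {"cat": ["k", "a", "t"], "cap": ["k", "a", "p"]}

def step(word_act, phoneme_evidence, excite=0.1, inhibit=0.05, feedback=0.05):
    new_act = {}
    for word, phonemes in WORDS.items():
        support = sum(phoneme_evidence.get(p, 0.0) for p in phonemes)  # bottom-up
        rivals = sum(a for w, a in word_act.items() if w != word)      # within-level
        new_act[word] = max(0.0, word_act[word] + excite * support - inhibit * rivals)
    # top-down feedback: active words boost the evidence for their own phonemes
    for word, phonemes in WORDS.items():
        for p in phonemes:
            phoneme_evidence[p] = phoneme_evidence.get(p, 0.0) + feedback * new_act[word]
    return new_act

# ambiguous final sound, slightly favouring /t/ over /p/
evidence = {"k": 1.0, "a": 1.0, "t": 0.6, "p": 0.4}
acts = {"cat": 0.0, "cap": 0.0}
for _ in range(10):
    acts = step(acts, evidence)
print(acts["cat"] > acts["cap"])  # 'cat' wins the within-level competition
```

Even a small bottom-up advantage for /t/ is amplified over cycles by the feedback and inhibition, which is the flavour of how TRACE produces lexical context effects and sharp, category-like outcomes.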
Evaluation of TRACE Model
Handles context effects in speech perception well
Copes with some acoustic variability
Gives an account of findings such as the phoneme restoration effect, co-articulation effects and lexical
context effects
Good at finding word boundaries and copes well with noisy input
• Has many parameters that can be manipulated = TRACE is so powerful that it can accommodate almost any result
o The way model deals with time - simulating it as discrete slices is implausible
• Massaro (1989)
o Carried out an experiment in which listeners had to make forced-choice decisions about which phoneme
they heard
o The sounds occurred in context
o One context favoured identification of /l/ - there are a number of English words that begin with /sli/ but no words
beginning with /sri/
o Another context favoured /r/ - there are words beginning with /tri/ but not /tli/
o Context biases performance - but listeners' behaviour in this task differed from the behaviour of the TRACE model
o Shows that it is possible to make falsifiable predictions about connectionist models
• In TRACE, context has its biggest effects when the speech signal is most ambiguous - less effect when the signal is
less ambiguous
• Main problem - based on idea that top down context permeates recognition process
o The extent to which top-down context influences speech perception is controversial
o There is experimental evidence against the types of top-down processing that TRACE predicts occur in speech perception
o Context effects are observed with perceptually degraded stimuli
In support
• Elman and McClelland (1988)
o Reported an experiment showing interactive effects on speech recognition of the sort predicted by TRACE
o Argued between-level processes were demonstrated - they can affect within-level processes at a lower level
o Illusory phonemes created top-down can affect compensation for co-articulation (the influence of one sound on a
neighbouring sound) operating at basic sound perception, as predicted by simulations in TRACE
o Listeners are sensitive to co-articulation effects in speech
The lexicon appears to be influencing a prelexical effect (compensation)
• Accounts of data compatible with autonomous model
• Not necessary to invoke lexical knowledge
• Connectionist simulations using only bottom-up processing can learn the difference between /g/ and /s/
Pitt and McQueen (1998)
• Demonstrated sequential info can be used in speech perception
• Found compensation for co-articulation effects on the categorisation of stop consonants when preceded by
ambiguous sounds at the end of nonwords
o TRACE is poor at detecting mispronunciations
TRACE is a single-output model
The only way it identifies phonemes is to see which phonemes are identified at the phoneme level
Multiple-output models
There are 2 sources of info - the stored and maintained prelexical analysis of the word, and the word's lexical entry
• These compete for output
A decision is made on the basis of which route produces an answer first
Lexical effects on phoneme processing should be maximised when people pay attention to the lexical output
• Minimised when they pay attention to the prelexical output
This pattern of behaviour is difficult to accommodate in the TRACE model
o Norris et al (2000)
Argued feedback is never necessary in speech recognition
Top-down feedback would hinder recognition
Feedback cannot improve the accuracy of processing, only speed it up
o Frauenfelder, Segui and Dijkstra (1990)
No evidence was found of top-down inhibition on phonemes in a task involving phoneme monitoring of
unexpected phonemes late in words compared to nonwords
TRACE predicts that once a word is accessed, phonemes not in it should be subject to top-down inhibition
TRACE predicts targets in nonwords derived from changed words should be identified more slowly
• Because the actual phoneme competes with the phoneme in the real word due to top-down feedback
o Trace unable to account for findings from subcategorical mismatch experiments
The task involves cross-splicing initial consonants and consonant clusters from matched pairs of words
Marslen-Wilson and Warren
• Examined the effect of splicing on lexical decision and phoneme categorisation
• The effect of a cross-splice on nonwords was much greater when the spliced material came from a word
• Difficult for TRACE, because simulations in TRACE show that words should be affected as well as nonwords
• TRACE does poorly - it cannot use data about the mismatch between the 2 items
o TRACE is successful in accounting for a number of phenomena in speech recognition
Good at explaining context effects.
categorical perception
the perception of speech sounds as belonging to discrete categories
voicing
when speech sounds are produced while the vocal cords are vibrating
top-down influences
can fill out the gaps and correctly interpret obscure sentences by using logic (top-down)
visual cues
cues from vision can play a role in accurate comprehension
Models of Speech Perception
attempt to explain how information coming in from the continuous stream of speech makes contact with
our store