CS664 Handout for Feb 26-28


Arbib: 664 notes for February 26 and 28, 2002 1

The HEARSAY Paradigm for Speech Understanding 1

There are important parallels between visual perception and speech understanding on the one hand, and between speech production and motor control on the other. The basic notion is that visual perception, like speech understanding, requires the segmentation of the input into regions, the recognition that certain regions may be aggregated as portions of a single structure of known type, and the understanding of the whole in terms of the relationship between these parts. In Visual Scene Interpretation, we used the term "schema" to refer to the internal representation of some meaningful structure and viewed the animal's internal model of its visually-defined environment as an appropriate assemblage of schemas. We now view the human being's internal model of the state of discourse as also forming a schema assemblage (for a similar linguistic perspective see Fillmore, 1976). Similarly, the generation of movement requires the development of a plan, on the basis of the internal model of goals and environment, to yield a temporally ordered, feedback-modulated pattern of overlapping activation of a variety of effectors. We thus view the word-by-word generation of speech in relation to the general problem of motor control.

The action/perception cycle corresponds to the role of the speaker in ongoing discourse. While there are no direct one-to-one correspondences between sensory-motor and language computations, there are overall similarities that allow us to investigate aspects of the neural mechanisms of language by examining the neural mechanisms relevant to perceptual-motor activity. We believe that the pursuit of these connections will require a framework of cooperative computation in which, in addition to interactions among the components of language itself, there are important interactions between components of linguistic and nonlinguistic systems.

We have argued for "cooperative computation" as a style for cognitive modeling in general, and for neurolinguistics in particular. As a prelude to a (currently preliminary) extension of neurolinguistics beyond the earlier brief overview of Broca's and Wernicke's aphasias, we discuss how this style might incorporate lessons learnt from AI systems for language processing such as HEARSAY-II (Erman & Lesser, 1980; Lesser, Fennel, Erman, and Reddy, 1975), even though it is far from the state of the art for current computational systems for speech understanding. Figure 32 shows how the system handles the ambiguities of a speech stream corresponding to the pronunciation "Wouldja" of the English phrase "Would you?".

1. This approximates material in TMB2 Section 4.2. Note link to VISIONS (TMB2, Section 5.2).


Figure 32. Cooperative computation. The HEARSAY paradigm for understanding the spoken "English" phrase "Woodja?". Multiple hypotheses at different levels of the HEARSAY blackboard: "L" supports "will", which supports a question; "D" supports "would", which supports a modal question. [Lesser et al., 1975].

The input to the system is the spectrogram showing the variation of energy in different frequency bands of the spoken input as it varies over time. Even expert phonologists are unable to recognize with certainty a phoneme based on just the corresponding segment of the speech stream. Correspondingly, HEARSAY replaces the speech stream at the lowest level of its "blackboard" (a multi-level database, indexed by time flowing from left to right) with a set of time-located hypotheses as to what phonemes might be present, with confidence levels associated with the evidence for such phonemes in the spectrogram. This relation between fragments of the spectrogram and possible phonemes is mediated by a processor which the HEARSAY team called a knowledge source. Another knowledge source hypothesizes words consistent with phoneme sequences and computes their confidence values in turn. Yet another knowledge source applies syntactic knowledge to group such words into phrases. In addition to these "bottom-up" processes, knowledge sources may also act "top-down", e.g., by trying to complete a phrase in which a verb has been recognized as plural by seeking evidence for a missing "s" (or variant phoneme) at the end of a noun which precedes it. As the result of such processing, an overall interpretation of the utterance – both of the words that constitute it and their syntactical relationship – may emerge with a confidence level significantly higher than that of other interpretations.
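The blackboard mechanics described in this paragraph can be sketched in a few lines. This is a toy illustration, not HEARSAY-II's actual data structures: the `Hypothesis` and `Blackboard` names, the two-entry lexicon, and the rule that a word's confidence is the product of the phoneme confidence and a lexicon strength are all assumptions of ours.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    level: str          # e.g. "phoneme", "word", "phrase"
    label: str          # e.g. "d", "would", "modal question"
    start: float        # time interval within the utterance
    end: float
    confidence: float   # credibility of the evidence for this hypothesis

class Blackboard:
    """Multi-level database of time-located hypotheses."""
    def __init__(self):
        self.hypotheses = []

    def post(self, hyp):
        self.hypotheses.append(hyp)

    def at_level(self, level):
        return [h for h in self.hypotheses if h.level == level]

# A toy "bottom-up" knowledge source: hypothesize words consistent with
# phoneme hypotheses already on the blackboard (lexicon data is invented).
LEXICON = {"d": ("would", 0.9), "l": ("will", 0.7)}

def word_ks(bb):
    for ph in bb.at_level("phoneme"):
        if ph.label in LEXICON:
            word, strength = LEXICON[ph.label]
            bb.post(Hypothesis("word", word, ph.start, ph.end,
                               ph.confidence * strength))

bb = Blackboard()
bb.post(Hypothesis("phoneme", "d", 0.0, 0.1, 0.8))
bb.post(Hypothesis("phoneme", "l", 0.0, 0.1, 0.6))
word_ks(bb)
print([(h.label, round(h.confidence, 2)) for h in bb.at_level("word")])
# [('would', 0.72), ('will', 0.42)]
```

A top-down KS would run in the other direction, posting hypotheses at a lower level that a partially recognized phrase predicts should be found there.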

For example, in Figure 32 we see a situation in which there are two surface-phonemic hypotheses, "L" and "D", consistent with the raw data at the parameter level, with the "L" supporting the lexical hypothesis "will" which in turn supports the phrasal hypothesis "question", while the "D" supports "would" which in turn supports the "modal question" hypothesis at the phrasal level. Each hypothesis is indexed not only by its level but also by the time segment over which it is posited to occur, though this is not explicitly shown in the figure. We also do not show the "credibility rating" which is assigned to each hypothesis.

Figure 33. HEARSAY-II (1976): a serial implementation of a distributed architecture.

The blackboard is divided into levels. Each KS interacts with just a few levels. A KS becomes a candidate for application if its precondition is met. However, to avoid a "combinatorial explosion" of hypotheses on the blackboard, a scheduler is used to restrict the number of KS's which are allowed to modify the blackboard [Lesser & Erman, 1979].
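The scheduler's throttling role can be sketched as a rated agenda: triggered KS invocations wait in a priority queue, the best-rated one runs at each step, and a step bound caps total activity. Every name and rating below is hypothetical.

```python
import heapq

def run(agenda, max_steps=100):
    """Pop and run the highest-rated triggered KS until the agenda
    empties or the step bound is hit (a caricature of the scheduler)."""
    heapq.heapify(agenda)
    steps = 0
    while agenda and steps < max_steps:
        _neg_rating, _name, action = heapq.heappop(agenda)
        action(agenda)   # a KS's blackboard changes may trigger more KS's
        steps += 1
    return steps

log = []  # order in which KS's actually ran

def make_ks(name, rating, spawns=()):
    def act(agenda):
        log.append(name)
        for r, n in spawns:  # model "modification triggers further KS's"
            heapq.heappush(agenda, make_ks(n, r))
    return (-rating, name, act)  # negated rating: heapq pops smallest first

agenda = [make_ks("word_ks", 0.7, spawns=[(0.9, "phrase_ks")]),
          make_ks("syllable_ks", 0.5)]
run(agenda)
print(log)  # ['word_ks', 'phrase_ks', 'syllable_ks']
```

Note that the newly spawned, highly rated `phrase_ks` jumps ahead of the already waiting `syllable_ks`: the agenda is re-sorted by rating at every step, which is exactly how the scheduler keeps effort focused and the hypothesis count bounded.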

Consider how it might relate to the interaction of multiple brain regions.

HEARSAY also embodies a strict notion of constituent processes, and provides scheduling processes whereby the activity of these processes and their interaction through the blackboard database is controlled. Each process is called a knowledge source (KS), and is viewed as an agent which embodies some area of knowledge and can take action based on that knowledge. Each KS can make errors and create ambiguities. Other KS's cooperate to limit the ramifications of these mistakes. Some knowledge sources are grouped as computational entities called modules in the final version of the HEARSAY-II system. The knowledge sources within a module share working storage and computational routines which are common to the procedural computations of the grouped KS's. HEARSAY is based on the "hypothesize-and-test" paradigm, which views solution-finding as an iterative process, with each iteration involving the creation of a hypothesis about some aspect of the problem and a test of the plausibility of the hypothesis. Each step rests on a priori knowledge of the problem, as well as on previously generated hypotheses. The process terminates when the best consistent hypothesis is generated satisfying the requirements of an overall solution.

As we have seen, the KS's cooperate via the blackboard in this iterative formation of hypotheses. In HEARSAY no KS "knows" what or how many other KS's exist. This ignorance is maintained to achieve a completely modular KS structure that enhances the ability to test various representations of a KS as well as possible interactions of different combinations of KS's.

The current state of the blackboard contains all current hypotheses. Each hypothesis has an associated set of attributes, some optional, others required. Several of the required attributes are: the name of the hypothesis and its level; an estimate of its time interval relative to the time span of the entire utterance; information about its structural relationships with other hypotheses; and validity ratings. Subsets of hypotheses are defined relative to a contiguous time interval. A given subset may compete with other partial solutions or with subsets having time intervals that overlap the given subset.
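The notion of competing subsets can be made concrete with a time-overlap check: two partial interpretations compete exactly when their intervals overlap. The interval data below are invented for illustration.

```python
def overlaps(a, b):
    """Half-open intervals (start, end) overlap iff neither wholly precedes the other."""
    return a[0] < b[1] and b[0] < a[1]

# Each partial solution spans a contiguous time interval of the utterance.
partials = {
    "would-you": (0.0, 0.6),
    "will-you":  (0.1, 0.6),
    "wood":      (0.7, 0.9),
}

def competitors(name):
    span = partials[name]
    return [other for other, s in partials.items()
            if other != name and overlaps(span, s)]

print(competitors("would-you"))  # ['will-you']: "wood" occupies a later interval
```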

We thus regard the task of the system as a search problem. The search space is the set of all possible networks of hypotheses that sufficiently span the time interval of the utterance, connecting hypotheses directly derived from the acoustic input to those that describe the semantic content of the utterance. The state of the blackboard at any time, then, comprises a set of possibly overlapping, partial elements of the search space. No KS can single-handedly generate an entire network to provide an element of the search space. Rather, we view HEARSAY as an example of "cooperative computation": the KS's cooperate to provide hypotheses for the network that provides an acceptable interpretation of the acoustic data. Each KS may read data and add, delete, or modify hypotheses and attribute values of hypotheses on the blackboard. It may also establish or modify explicit structural relations among hypotheses. The generation and modification of hypotheses on the blackboard is the exclusive means of communication between KS's.

Each KS includes both a precondition and a procedure. When the precondition detects a configuration of hypotheses to which the KS's knowledge can be applied, it invokes the KS procedure, that is, it schedules a blackboard-modifying operation by the KS. The scheduling does not imply that the KS will be activated at that time, or that the KS will indeed be activated with this particular triggering precondition, because HEARSAY uses a "focus of attention" mechanism to stop the KS's from forming an unworkably large number of hypotheses. The blackboard modifications may trigger further KS activity - acting on hypotheses both at different levels and at different times. Any newly generated hypothesis would be connected by links to the seminal hypothesis to indicate the implicative or evidentiary relation between them.

Changes in validity ratings reflecting creation and modification of hypotheses are propagated automatically throughout the blackboard by a rating policy module called RPOL. The actual activation of the knowledge sources occurs under control of an external scheduler. The scheduler constrains KS activation by functionally assessing the current state of the blackboard with respect to the solution space and the set of KS invocations that have been triggered by KS preconditions. The KS most highly rated by the scheduler is the one that is next activated (Hayes-Roth and Lesser, 1977).
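RPOL's automatic propagation can be caricatured as a breadth-first re-rating along support links. The link table, the attenuation factor, and the multiplicative rule are our own simplifications for illustration, not HEARSAY-II's actual rating policy.

```python
ratings = {"D": 0.8, "would": 0.0, "modal question": 0.0}
supports = {"D": ["would"], "would": ["modal question"]}  # evidentiary links
ATTENUATION = 0.9  # assumed: each inferential step weakens the evidence a little

def propagate(changed):
    """Re-rate everything downstream of a changed hypothesis."""
    queue = [changed]
    while queue:
        h = queue.pop(0)
        for dependent in supports.get(h, []):
            new = ratings[h] * ATTENUATION
            if abs(new - ratings[dependent]) > 1e-9:  # only propagate real changes
                ratings[dependent] = new
                queue.append(dependent)

propagate("D")   # the phoneme-level rating just changed
print(round(ratings["modal question"], 3))  # 0.648
```

The point of doing this in a dedicated module is that no individual KS has to know who depends on its hypotheses; the rating policy walks the support links itself.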


This account by no means exhausts the details of HEARSAY-II, but it does make explicit a number of properties suggesting that it contains the seeds of a proper methodology for combining the best features of the faculty and process models with those of the representational models.

• Explicit specification of the different levels of representation: hypotheses are posed at explicitly specified levels of representation, and the interpretive process yields a network of interconnected hypotheses, spanning these levels, that supports a satisfactory interpretation of the original utterance.

• An interpretive strategy whereby components interact via the generation and modification of multiple tentative hypotheses: HEARSAY exhibits a style of "cooperative computation" (Arbib, 1975, sect. 5). Through data-directed activation, KS's can exhibit a high degree of asynchronous activity and parallelism. HEARSAY explicitly excludes the direct calling of one KS by another, even if both are grouped as a module. It also excludes an explicitly predefined centralized control scheme. The multilevel representation attempts to provide for efficient sequencing of the activity of the KS's in a nondeterministic manner that can make use of multiprocessing, with computation distributed across a number of concurrently active processors. The decomposition of knowledge into sufficiently simple-acting KS's is intended to simplify and localize the relationships in the blackboard.

#The connectionists' (and brain theorists'?) questions: Can different representations be explicitly separated? Can we view neural activity as encoding hypotheses?#

Two other observations come from the studies of AI models in general and from recent psycholinguistic approaches.

• A grammar interacts with and is constrained by processes for understanding or production.

• Linguistic representations and the processes whereby they are evoked interact in a "translation" process with information about how an utterance is to be used. AI and psycholinguistics thus provide a framework for considering an ever-widening domain of concern, beginning with the narrowly constrained mediation by the linguistic code between sound and meaning, and extending to include processing and intentional concerns.

# Contrast the C2 search (distributed and/or hierarchical) of HEARSAY and VISIONS with the classic searches of serial AI. Relate to NNs settling towards an attractor and GAs "evolving". How does this relate to Newell's growing network of problem spaces in SOAR?#

# Note that focus of attention is necessary even in parallel systems. Cf. discussion of FoA in my paper with Goodale.#

A "Neurologized" HEARSAY

As Figure 33 shows, the 1976 implementation of HEARSAY-II was serial – with a master scheduler analyzing the state of the blackboard at each iteration to determine which knowledge source to apply next and to which data on the blackboard to apply it. However, for our purposes the reader is to imagine what would happen if the passive blackboard of this implementation were replaced by a set of working memories distributed across the brain, with each knowledge source a neural circuit which could continually sample the states of other circuits, transform them, and update parts of the working memory accordingly. The result would be a style of "cooperative computation" which I believe is characteristic of the brain.

The HEARSAY model is a well-defined example of a cooperative computation model of language comprehension. Following Arbib & Caplan [1979], we now suggest ways in which it lets us more explicitly hypothesize how language understanding might be played out across interacting subsystems in a human brain. We again distinguish AI (artificial intelligence) from BT (brain theory), where we go beyond the general notion of a process model that simulates the overall input-output behavior of a system to one in which various processes are mapped onto anatomically characterizable portions of the brain. We predict that future modeling will catalyze the interactive definition of region and function – which will be necessary in neurolinguistic theory no matter what the fate of our current hypotheses may prove to be. In what follows, we distinguish the KS as a unit of analysis of some overall functional subsystem from the schema unit of analysis, which corresponds more to individual percepts, action strategies, or units of the lexicon.

First, we have seen that the processes in HEARSAY are represented as KS's. It would be tempting, then, to suggest that in computational implementations of neurolinguistic process models, each brain region would correspond to either a KS or a module. Schemas would correspond to much smaller units both functionally and structurally - perhaps at the level of application of a single production in a performance grammar (functionally), or the activation of a few cortical columns (neurally). A major conceptual problem arises because in a computer implementation, a KS is a program, and it may be called many times - the circuitry allocated to working through each "instantiation" being separate from the storage area where the "master copy" is stored. But a brain region cannot be copied ad libitum, and so if we identify a brain region with a KS we must ask "How can the region support multiple simultaneous activations of its function?" #We return to this in our new analysis (Goodale paper) of schema instances in the VISIONS system.#

HEARSAY is a program implemented on serial computers. Thus, unlike the brain, which can support the simultaneous activity of myriad processes, HEARSAY has an explicit scheduler which determines which hypothesis will be processed next, and which KS will be invoked to process it. This determination is based on assigning validity ratings to each hypothesis, so that resources can be allocated to the most "promising" hypotheses. After processing, a hypothesis will be replaced by new hypotheses which are either highly rated and thus immediately receive further processing, or else have a lower rating which ensures that they are processed later, if at all. In HEARSAY, changes in validity ratings reflecting creation and modification of hypotheses are propagated throughout the blackboard by a single processor called the rating policy module, RPOL. HEARSAY's use of a single scheduler seems "undistributed" and "non-neural". In analyzing a brain region, one may explore what conditions lead to different patterns of activity, but it is not in the "style of the brain" to talk of scheduling different circuits. However, the particular scheduling strategy used in any AI "perceptual" system is a reflection of the exigencies of implementing the system on a serial computer. Serial implementation requires us to place a tight upper bound on the number of activations of KS's, since they must all be carried out on the same processor. In a parallel "implementation" of a perceptual system in the style of the brain, we may view each KS as having its own "processor" in a different portion of the structure. We would posit that rather than there being a global process in the brain to set ratings, the neural subsystems representing each schema or KS would have activity levels serving the functions of such ratings in determining the extent to which any process could affect the current dynamics of other processes, and that propagation of changes in these activity levels can be likened to relaxation procedures. #cf. Rumelhart et al.'s PDP view of schemas.#
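The suggestion that ratings become activity levels settling by relaxation can be sketched with a tiny network. The units, link weights, update rule, and rate value are all invented for illustration; this is not a brain model, just the flavor of a relaxation procedure.

```python
units = {"D": 0.8, "L": 0.4, "would": 0.5, "will": 0.5}
# (source, target, weight): positive links support, negative links compete
links = [("D", "would", 0.6), ("L", "will", 0.6),
         ("would", "will", -0.4), ("will", "would", -0.4)]

def relax(units, links, rate=0.2, iters=50):
    """Each unit's activity drifts toward the weighted sum of its inputs."""
    targets = {tgt for _, tgt, _ in links}
    for _ in range(iters):
        for target in targets:
            net = sum(w * units[src] for src, tgt, w in links if tgt == target)
            units[target] += rate * (net - units[target])
            units[target] = min(1.0, max(0.0, units[target]))  # keep in [0, 1]
    return units

relax(units, links)
print(units["would"] > units["will"])  # True: the stronger "D" evidence wins
```

No global scheduler appears here: each unit's level rises or falls only under local support and competition, which is the sense in which activity levels can play the role HEARSAY assigns to centrally computed ratings.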

The third "non-neural" feature is the use of a centralized blackboard in HEARSAY. This is not, perhaps, such a serious problem. For each level, we may list those KS's that write on that level ("input" KS's) and those that read from that level ("output" KS's). From this point of view, it is quite reasonable to view the blackboard as a distributed structure, made up of those pathways which link the different KS's. One conceptual problem remains. If we think of a pathway carrying phonemic information, say, then the signals passing along it will encode just one phoneme at a time. But our experience with HEARSAY suggests that a memoryless pathway alone is not enough to fill the computational role of a level on the blackboard; rather, the pathway must be supplemented by neural structures which can support a short-term memory of multiple hypotheses over a suitably extended time interval. #Dualize this, so that levels are regions with working memory, and KS's are composed of pathways and nuclei which link them. Reconcile with the new approach to schema instances in VISIONS.#

An immediate research project for computational neurolinguistics, then, might be to approach the programming of a truly distributed speech understanding system (free of the centralized scheduling in the current implementation of HEARSAY) with the constraint that it include subsystems meeting constraints such as those in the reanalysis of Luria's data offered above. Gigley (1985?) offers a cooperative computation model which, without serial scheduling, uses interactions between phonemic, semantic, categorial-grammatical, and pragmatic representations in analyzing phonetically-encoded sentences generable by a simple grammar.

Luria on Neurolinguistics

We now emphasize the finer grain of the components in linguistic modeling, presenting an outdated (!) overview of neurolinguistics in the form of a summary diagram developed by Arbib and Caplan (1979) based on Luria's (1973) analyses of object naming, speech production, speech comprehension, and repetition. Each box in the following diagrams corresponds to a brain region and to functions suggested by clinical data. When the components are embedded in the overall system, we find that a great deal of overlap arises – e.g., mechanisms involved in speech perception are also necessary for flawless speech production. (Boxes are labeled with the same capital letter if they correspond to the same brain region; they differ in the number of primes if they are attributed to different hypothetical functions. In each figure, the functional attribution of a region is taken from Luria, whereas the arrows are our own indication of a plausible information flow.) Recent work in neurolinguistics has done surprisingly little to improve our understanding of how specific brain regions engage in "cooperative computation" to serve language.

Caveat for all such analyses: do not say that, because a lesion of some region yields impaired behavior, the same region (mechanism) must be postulated to compensate for the deficit.

My point in putting a 20-year-old model here is not so much to say "use it" as to say "we must do something that is demonstrably better."

Luria transfers the emphasis from the brain-damaged patient to how brain regions interact in some normal performance - but using data on abnormal behavior to provide clues about neurological processes. Luria's approach is founded upon the idea of a functional system (Anokhin, 1935) in which an invariant task can be performed by variable mechanisms to bring the process to an invariant result, with the set of mechanisms being complex and, to an important degree, interchangeable - in other words, different resources can be deployed to achieve the same goal. (Similarly, we have the notion of motor equivalence in motor control, cf. Bernstein.) From such a background, as well as from the developmental studies of Vygotsky (1934, trans 1962) and his own wide experience in neurology and psychology, Luria formulated the following program for neuropsychology:

"It is accordingly our fundamental task not to localize higher human psychological processes in limited areas of the cortex, but to ascertain by careful analysis which groups of concertedly working zones of the brain are responsible for the performance of complex mental activity; what contribution is made by each of the zones to the complex functional system and how the relationship between these concertedly working parts of the brain in the performance of complex mental activity changes in the various stages of its development." (Luria, 1973, pp. 33-34)

Earlier theorists considered that the function of Broca's area was the production of speech, and that the linguistic representations in this area were the motor programs for speech. Similar functional/representational pairings characterized all centers. Geschwind's analysis of naming follows the pattern inasmuch as a function (object naming) is associated with a representational system (associating modality-specific information about objects and words) at an anatomical site (the inferior lobe). It is reasonable to consider that this center may function as part of these overt tasks, because it retrieves the full lexical form of words from semantic, auditory, and other cues, as each task requires. Geschwind (1965) notes the relative sparing of number reading and object naming in the presence of impairments of color naming, word reading, and the deciphering of musical notation. Such an analysis proceeds from a recognition of the subprocesses (e.g., tactile association) that figure in linguistic behaviors and cannot be considered a wholesale acceptance of the received categories of linguistic analysis. However, Luria stresses the notion of hierarchical relations among functioning systems, relations that can be altered through development or dissolution of neural structures.

[Boxes in the diagram: M – (Broca's) motor speech center; (Wernicke's) auditory speech center; A – angular gyrus; B – visual association area; O – auditory cortex.]

Figure. Geschwind's (1972a) pathways involved in naming. (Top) Saying the name of a seen object. (Bottom) Understanding the spoken name of an object. Geschwind postulates that the angular gyrus contains the rules for going back and forth between the spoken and the visual pattern.

MA: a) Discuss the terms function, representation, and anatomical site. b) Discuss the issue of local vs. distributed representations at the levels of brain regions and neurons. And what is the "full lexical form"? This account does not address the features that modern G-B theory, for example, includes in the lexicon (whether we view it in rule-based or connectionist terms). More generally, "naming" per se ignores the issues of using words in sentences.

Relate these models to the psycholinguistic data from slips of the tongue.

What is the current status of Geschwind's hypothesis on the role of the angular and supramarginal gyri?

Explicitly compare the Wernicke, Lichtheim, and Geschwind schemes with that for Luria.

Note the reference to Luria's Vygotskian concern with development.

[Boxes in the diagram: A – Visual Perception (L. temporo-occipital zones); B – Selective Naming (tertiary L. parieto-occipital zone); C – Switching Control (inferior zone of L. premotor cortex); D – Articulatory System (inferior zone of L. postcentral cortex); plus Visual Input.]

Figure 35. Naming of objects. [Arbib and Caplan (1979) based on Luria's (1973) analysis.]

In the object-naming task, the subject looks at an object and is to name it by an appropriate spoken word. Clearly, object naming requires reasonably precise visual perception. Luria singles out the left temporo-occipital zone (A), where lesions disturb both the ability to name objects and the ability to evoke visual images in response to a given word, as the anatomical site of this component. A patient with such a lesion cannot draw a named object, even though he can copy a drawing line-by-line. In short, lesions here seem to impair the transformation between an array of isolated visual features and a perceptual unity into which the features are integrated. The next step (boxes B and C) is to discover the appropriate name and to inhibit irrelevant alternatives. Lesions of the left tertiary parieto-occipital zones yield verbal paraphasias – the appearance of irrelevant words that resemble the required word in morphology, meaning, or phonetic composition. Irrelevant sensory features of the object, or of the articulatory or phonetic information associated with its name, can evoke a response as easily as correct features. It is as if the inhibitory constraints were removed in a competitive process. Such lesions do not disturb the phonological representation of language: prompting with the first sound of a name does trigger its recall.

Goodglass: Semantically and phonologically motivated paraphasias may be associated with lesions elsewhere in the posterior language zone, particularly the superior temporal gyrus; moreover, the ability to discover the appropriate name may be severely damaged without paraphasia as a product, as in anomic aphasia.

How do we reconcile this with the view that the temporal lobe provides the "what", whereas the parietal lobe provides the "how" and so, in particular, is not implicated in naming? Also: Need a view of "the many auditory systems" to set against that of Van Essen et al. for vision.

Dell et al. (Psychological Review, 1997) model this paraphasia without boxes B and C. They change global network parameters such as decay and connection strength: they present a normal model which they then lesion to fit data from 21 aphasic patients, and derive predictions which were confirmed experimentally.

Luria also includes phonemic analysis (box E) in the naming of objects. Lesions of the left temporal region disturb the phonemic organization of naming, yielding literal paraphasias, in which words of similar phonemic organization are substituted. In strong contrast with the verbal paraphasias induced by box B lesions, prompting with the initial sound of the name does not help the patient with a left temporal lesion. # cf. Wernicke and Lichtheim.# This model exemplifies Luria's view of the brain as a functional system. It is clear that box E is not just for sensory phonemic analysis and that box D is not purely for motor articulatory analysis. Rather, both systems participate in all brain functions that require exploitation of the network of representations that define a word within the brain. Convergence on the proper word can be accelerated by the cooperative exploitation of both phonemic and articulatory features, and of others as well.
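The point about cooperative exploitation of phonemic, articulatory, and other features can be illustrated by set intersection: each cue type alone leaves several candidates, but their conjunction converges on one. The cue tables are invented for illustration.

```python
# Candidate words suggested by three independent "knowledge sources"
# (all data invented for illustration).
phonemic_cue     = {"would", "will", "wood", "win"}   # initial /w/ sound
articulatory_cue = {"would", "wood", "woo"}           # rounded-vowel gesture
semantic_cue     = {"would", "will", "can"}           # modal context

candidates = phonemic_cue & articulatory_cue & semantic_cue
print(candidates)  # {'would'}
```

No single cue narrows the field below three candidates, yet their conjunction leaves one; this is the sense in which adding knowledge sources speeds convergence.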

Discuss the issue of how multiple "knowledge sources" can speed convergence. Relate to the motor theory of speech perception.

Luria's view is that the linguistic system utilizes multiple components to arrive at the appropriate item. Some of these components (e.g. box D related to articulatory analysis) include representations at remote linguistic levels. The retrieval of the appropriate word involves the exploitation of the entire network of representations that define a word, and convergence on the proper word is accelerated by cooperation of the entirety of the components of the language device. Similarly, one can imagine a comparable cooperative mechanism for the assignment of a representation to an object from visual data.

A fuller analysis of such a process would involve, for example, distinguishing the 3-D shape of an object from its functional role, and might utilize these two different sorts of information in different routines (one of which, in this example, might plausibly involve information of a "motor" sort). Cf. recent imaging data distinguishing brain activity for tools from that for other objects.

[Boxes in the diagram: F – Plan Formation (frontal lobes); G – Formation of the Linear Scheme (inferior zone of L. fronto-temporal cortex); J – Logical Scheme (L. parieto-temporo-occipital zones).]

Figure 38. Verbal expression of motives [Arbib and Caplan (1979) based on Luria's (1973) analysis.]

Luria's description of the processes involved in speech production, "verbal expression of motives", is brief. The frontal lobes are essential for the creation of active intentions or the forming of plans. Frontal lesions (box F) do not disturb the phonemic, lexical, or logico-grammatical functions of speech, but they do disturb its regulatory role. Lesions of the left inferior fronto-temporal zone (box G) yield "dynamic aphasia" - the patient can repeat words or simple sentences and can name objects, but is unable to formulate a sentence beyond "Well...this...but how?..." Luria thus views the task of this region as being to recode the plan (formulated by box F) into the "linear scheme of the sentence," which makes clear its predicative structure.

Analyze the notion of "regulatory role." Can we integrate this with our “neurologized HEARSAY”?

[Boxes in the diagram: H – Lexical Analysis (posterior zone of L. temporo-occipital region); I – Speech Memory (middle zones of L. temporal region; deep zones of L. temporal lobe); J – Logical Scheme (L. parieto-temporo-occipital zones); F' – Active Analysis of Most Significant Elements (frontal lobes).]

Figure 37. Speech understanding [Arbib and Caplan (1979) based on Luria's (1973) analysis.]


Turning to speech understanding, we can follow Luria's analysis of the process whereby the spoken expression is converted, in the brain of the hearer, into its "linear scheme," from which the general idea and the underlying conversational motive can be extracted. Box E performs its usual role of phonemic analysis, supplying input to box H. Lesions here, in the posterior zones of the temporal or temporo-occipital region of the left hemisphere, leave phonemic analysis unimpaired, but grossly disturb the recognition of meaning. #This seems more consonant with the view of a temporal "what" system than is the role of Box B in naming.# Luria very tentatively suggests that this may be due to impairment of the concerted working of the auditory and visual analyzers. The intriguing suggestion here seems to be that phonological representations serve to evoke a modality-specific representation (akin to a visual image) rather than directly evoking a linguistic semantic representation; the former representation then aids the evocation of the appropriate semantic and syntactic representation for further processing.

Contrast this notion of "concerted working of the auditory and visual analyzers" to the purely linguistic structure of the lexicon in Chomsky's LGB.

# Summarize the key "components" of sentence production and comprehension. Relate to abstract formulations of grammatical competence. Contrast the "effort" required to recognize versus generate a word or grammatical construct. #

Luria identifies three subsystems involved in syntactic-semantic analysis: speech memory, logical scheme, and active analysis of most significant elements. Lesions of the parieto-temporo-occipital zones of the left hemisphere (box J) impair perception of spatial relations, constructional activity, complex arithmetical operations, and the understanding of logico-grammatical relations. A sentence with little reliance on subtle syntax, such as

(1) Father and mother went to the cinema but grandmother and the children stayed at home,

is still understood, whereas a sentence like

(2) A lady came from the factory to the school where Nina worked

cannot be understood. Understanding a sentence requires not only the retention of its elements, but their simultaneous synthesis into a single, logical scheme. Luria argues that data on parieto-temporo-occipital lesions give neurological evidence of a system specifically adapted to this synthesis for those constructions where identical words in different relationships receive different values. Box J plays a role when the grammatical codes - case relations, prepositions, word order, and so forth - are decisive in determining how the words of the sentence combine to give its overall meaning.

How can the effects of the lost capacity be isolated from the other processing capacities, in such a way as to provide insight into how the deviant comprehension system can understand (1) but not (2)? What are the general empirical constraints that affect the 'translation' of computational processes into neural mechanisms? In the case of phonological processes it might be assumed that there are enough empirical data to generate a reasonably precise model, one yielding an illuminating simulation of neural mechanisms. But for syntactic and semantic processes the data, such as they are, would seem to be neither as stable nor as transparent. Hence, one might wonder whether there is an adequate empirical base to generate a simulation that can be similarly illuminating.

Distinguish neural and connectionist modeling here.

[Figure 36 diagram: box E, Phonemic Analysis (Secondary Zone of L. Temporal Cortex); ...; box F", Updating the Plan of the Expression (Frontal Lobes).]

Figure 36. Speech repetition. [Arbib and Caplan (1979) based on Luria's (1973) analysis.]

Luria also analyzes repetitive speech, surveying the brain regions involved when a subject repeats sentences spoken to him.

Boxes are labeled not in terms of what role they might play as a subsystem of the whole functional system, but rather in terms of the deficit associated with a lesion of the subsystem. For example, Luria does not speak of a subsystem whose removal blocks the proper implementation of switching, but rather calls it the subsystem for switching control.

Critique: The "diagrammatization" of Luria (Figure 7) once more relabels the old notation, leading one to ask for the evidence that Lexical Analysis (Box H), say, is a theoretically meaningful component at any level of description; worse, the notation fails to distinguish such presumably disparate "functions" as Phonemic Analysis (Box E) and Switching Control (Box C).

[Figure 34 diagram: Visual Input; Auditory Input; box A, Visual Perception (L. Temporo-Occipital Zones); box B, Selective Naming (Tertiary L. Parieto-Occipital Zone); box C, Switching Control (Inferior Zone of L. Premotor Cortex); box D, Articulatory System (Inferior Zone of L. Postcentral Cortex); box E, Phonemic Analysis (Secondary Zone of L. Temporal Cortex); box F, Plan Formation (Frontal Lobes); box F', Active Analysis of Most Significant Elements (Frontal Lobes); box F", Updating the Plan of the Expression (Frontal Lobes); box G, Formation of the Linear Scheme (Inferior Zone of L. Fronto-Temporal Cortex); box H, Lexical Analysis (Posterior Zone of L. Temporo-Occipital Region); box I, Speech Memory (Middle Zones of L. Temporal Region; Deep Zones of L. Temporal Lobe); box J, Logical Scheme (L. Parieto-Temporo-Occipital Zones).]

Figure 34: A summary of diagrams developed by Arbib and Caplan (1979) based on Luria's (1973) analyses of object naming, speech production, speech comprehension, and repetition. Color code: naming of objects (blue); verbal expression of motives (green); speech understanding (red); and speech repetition (pink).

[Diagram: Visual Input; Auditory Input; box A, Visual Perception; box B, Selective Naming; box C, Switching Control; box D, Articulatory System; box E, Phonemic Analysis; box F, Plan Formation; box F', Analysis of Significant Elements; box F", Updating the Plan of the Expression; box G, Formation of the Linear Scheme; box H, Lexical Analysis; box I, Speech Memory; box J, Logical Scheme.]

Figure: Previous figure redrawn without anatomical labels.

Each psycholinguistic task is performed by several components acting in parallel and sequential fashion. Many components are involved in several tasks. One might consider whether more shared components could be derivable from the system (G and J, for instance, might interact through some shared process) and whether some components could not be partitioned (I seems overburdened). It also seems clear that additional components might profitably be considered (the relation of "planning" to "linear schemes," J - F in and F - G out, seems simplistic).

Basically, this 1979 analysis of Luria’s earlier findings is just a placeholder for a new analysis that addresses more recent data and ties into our analysis of brain evolution.

Despite his concept of the "functional system", Luria still talked of the functions of a region in terms of the deficits resulting from lesions which were localized there, and we have indicated how a functional analysis may yield a reinterpretation of Luria's data which leads to a characterization of a region as part of a network of dynamic interactions, quite different from the role-in-isolation of Luria's original analysis.

However, there is insufficient specification of the input-output codes of the components to allow a clear conception of exactly how the components function individually or how they utilize information from each other. Several components, such as E, are relatively well specified in this regard (though none are fully specified to the point, for instance, where they could be directly transposed to a computer), but others, such as G and J, are grossly underspecified and seem to involve the construction of a variety of different linguistic representations.


What are the "rules", or consistent patterns, that describe how partially analyzed packages of behaviorally related data are transferred from one area of the brain to another? There probably are significant differences between the transcallosal connections of primary and secondary association areas (Pandya & Vignolo, 1969).

Cf. the work of Van Essen on connections of the various visual cortices.

Recall Luria's analysis of lesions related to naming. The arrows have no particular significance, for despite Luria's emphasis on functional systems, he paid little explicit attention to patterns of functional interaction between subsystems. The roles ascribed to the boxes labeled "articulatory system" and "phonemic analysis" seem to correspond to Wernicke's analysis suggesting that speech production will be defective without a phonemic base on which to match the shape of the words. This dispels the naive view that phonemic analysis is irrelevant to the naming task of going from a visual input to a speech output.

Recent research in developmental dyslexia strongly suggests that phonological problems are a major cause of dyslexia.

The box labeled "visual perception" takes an array of retinal stimulation and integrates it into a percept (or percepts), whether or not a name can be given to that percept. A person with a lesion here still has vision in the sense of being able, shown a drawing, to copy it line by line, but is not able to name the object and - crucially, showing that this is a visual perception deficit rather than a naming deficit - is unable, shown a drawing, to recreate it in the way that, having seen a drawing of a cat, one can draw another cat thereafter even if the second cat is graphically quite different from the first.

Luria says that if a person has a lesion of the region called "selective naming" they will come up with a name, but it is as likely as not to be the wrong name, and the error may be semantically or phonemically similar to the target. Finally, a patient with a lesion of "switching control" will have no particular problem with a single test; but, if shown object after object, will be very likely to perseverate with a particular name, using it repeatedly.

We now offer a re-presentation of Luria's analysis of naming using concepts such as "winner-takes-all" (WTA). Box A, "visual perception", is given the crudest possible reanalysis on the analogy with foodness and relative foodness. Basic processing activates an array of possible schemas, the internal representations of objects. The system is to choose for utterance a name associated with just one of these schemas, and so we posit an array in which various processes are carried out in interaction with other arrays to bring one schema up to peak activity and suppress others. We posit "linguistic S-cells" that take semantic, contextual, and syntactic cues into account and which monitor activity and on that basis provide local inhibition and ensure that only one of a range of alternatives would be emitted. A lesion to these S-cells would remove control over which schema-name would first reach the motor output, and so an S-cell lesion is precisely analogous to what Luria calls a lesion of selective naming. Note that, in distinction from Luria, we no longer view "selective naming" as the function of a separate "box", but rather as the outcome of intimate dynamic interaction between several "boxes". To complete this exposition, we add N-cells that respond to novel stimuli. These correspond to the function of "switching control" called for in Luria's analysis. Implementing the circuit along lines suggested by Didday (1976) and Amari and Arbib (1977) offers precise hypotheses on the neural interactions whereby switching control is achieved, and reveals hysteresis in the selective naming circuitry as responsible for lack of switching when the N-cells are lesioned. In this way, we have initiated a more sophisticated neurolinguistic analysis which stresses how a pattern of dynamic interaction could achieve the effects of selective naming and switching control.
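The WTA re-presentation can be sketched computationally. The following toy model is only in the spirit of Didday (1976) and Amari and Arbib (1977), not their actual equations, and every constant and function name is an illustrative assumption: S-cell-style inhibition forces a single winning schema, intact N-cells reset the field when a new stimulus arrives, and lesioned N-cells leave residual activity whose hysteresis yields perseveration.

```python
def settle(a, s_cells=True, steps=30):
    """Relax schema activities. With S-cells intact, each schema is inhibited
    in proportion to total activity elsewhere, so only one survives."""
    a = list(a)
    for _ in range(steps):
        if s_cells:
            total = sum(a)
            a = [x - 0.2 * (total - x) for x in a]  # S-cell-style inhibition
        a = [max(x, 0.0) for x in a]                # activities stay nonnegative
    return a

def trial(field, drive, n_cells=True, s_cells=True):
    """One naming trial over a persistent activity field. Intact N-cells detect
    the novel stimulus and clear the field; lesioned N-cells leave residual
    activity, so hysteresis lets the previous winner perseverate."""
    if n_cells:
        for i in range(len(field)):
            field[i] = 0.0                          # novelty signal resets field
    for i in range(len(field)):
        field[i] += drive[i]
    new = settle(field, s_cells=s_cells)
    for i in range(len(field)):
        field[i] = new[i]
    return max(range(len(field)), key=lambda i: field[i])

field = [0.0, 0.0, 0.0]
print(trial(field, [1.0, 0.4, 0.3]))                 # → 0: schema 0 named
print(trial(field, [0.4, 1.0, 0.3]))                 # → 1: schema 1 named
print(trial(field, [0.3, 0.4, 1.0], n_cells=False))  # → 1: perseveration
print(trial(field, [0.3, 0.4, 1.0]))                 # → 2: N-cells reset the field
```

The third trial shows the claimed effect: the input now favors schema 2, but with N-cells "lesioned" the residual activity of schema 1 dominates the competition, mimicking the perseveration Luria attributes to loss of switching control.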

Towards a Mirror Neurolinguistics

The MNS model shows that the monkey needs many brain regions for the mirror system for grasping.

We will need many more brain regions for an account of language-readiness that goes "beyond the mirror" to develop a full neurolinguistic model that extends the linkages far beyond the F5 → Broca's area homology. To set the stage for the future development of such a model, we briefly link our view of AIP and F5 in monkey to data on human abilities. Studies of the visual system of monkey led Ungerleider and Mishkin (1982) to distinguish inferotemporal mechanisms for object recognition ("What") from parietal mechanisms for localizing objects ("Where"). Goodale and Milner (1992) extended this to human, studying a patient (DF) with damage to the inferotemporal pathway who could grasp and orient objects appropriately for manipulating them but could not report - either verbally or by pantomime - on how big an object was or what the orientation of a slot was. They thus viewed location as just one parameter relevant to how one interacts with an object, re-naming the "Where" pathway as the "How" pathway.

Another patient (AT; Castiello, Paulignan & Jeannerod, 1991), with damage to the parietal pathway, could communicate the size of a cylinder but not preshape appropriately. However, she could preshape appropriately if the "semantics" of an object indicated its size - suggesting the path from inferotemporal cortex (IT) to the controller for grasping shown in Figure 12.

Let us now try to reconcile these observations with our mirror-system based approach to language.

Our evolutionary theory suggests a progression from action to action recognition to language as follows:

1) object → AIP → F5 canonical: pragmatics

2) action → PF → F5 mirror: action understanding

3) scene → Wernicke's → Broca's: utterance

The "zero order" model of the Figure 12 data is:

4) Parietal "affordances" → preshape

5) IT "perception of object" → pantomime or verbally describe size

However, (5) seems to imply that one cannot directly pantomime or verbalize an affordance; one needs the "unified view of the object" (IT) before one can communicate attributes. The problem with this is that the "language" path as shown in (5) is completely independent of the parietal → F5 system, and so the data seem to contradict our view in (3).
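The zero-order model (4)-(5) can be written out as two independent routes. This is a minimal sketch with invented names, in which "lesioning" one route reproduces the DF and AT patterns described above.

```python
# Illustrative sketch of the "zero order" model: two independent routes from
# visual input, with communication drawing only on the IT route.
# All function names and the object encoding are invented for illustration.

def dorsal_affordance(visual_input):
    """Parietal route: extract a grasp parameter (here, just an aperture)."""
    return {"aperture": visual_input["width"]}

def ventral_percept(visual_input):
    """IT route: a unified view of the object, usable for report."""
    return {"kind": visual_input["kind"], "size": visual_input["width"]}

def preshape(visual_input, dorsal_intact=True, semantic_default=None):
    """(4): preshape from parietal affordances. With the dorsal route lesioned
    (patient AT), only a semantics-supplied default size remains."""
    if dorsal_intact:
        return dorsal_affordance(visual_input)["aperture"]
    return semantic_default

def describe_size(visual_input, ventral_intact=True):
    """(5): report size from the IT percept. With the ventral route lesioned
    (patient DF), no size can be reported verbally or by pantomime."""
    if ventral_intact:
        return ventral_percept(visual_input)["size"]
    return None

obj = {"kind": "cylinder", "width": 6.0}
print(preshape(obj), describe_size(obj))                          # 6.0 6.0
print(preshape(obj, dorsal_intact=False))                         # None (AT, no semantics)
print(preshape(obj, dorsal_intact=False, semantic_default=6.0))   # 6.0 (AT, known object)
print(describe_size(obj, ventral_intact=False))                   # None (DF)
```

The sketch makes the paradox visible in the code itself: `describe_size` never touches the dorsal route, which is exactly the independence of the "language" path from the parietal → F5 system that seems to contradict (3).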

[Figure 12 diagram: Visual Cortex feeds Parietal Cortex (reach and grasp programming) via the How (dorsal) pathway, and Inferotemporal Cortex via the What (ventral) pathway.]

Figure 12. "What" versus "How". Lesion of the inferotemporal pathway yields inability to verbalize or pantomime size or orientation; lesion of the parietal pathway yields inability to preshape (except for objects with size "in the semantics").

To resolve this paradox, we must augment Figure 12 with psychophysical data (Bridgeman, Peery & Anand, 1997; Bridgeman, 1999). In their experiments, an observer sees a target in one of several possible positions, and a frame either centered before the observer or deviated left or right. Verbal judgments of the target position are altered by the background frame's position but "jabbing" at the target never misses, regardless of the frame's position. The data demonstrate independent representations of visual space in the two systems, with the observer aware only of the spatial values in the cognitive (inferotemporal) system (Castiello et al., 1991). Bridgeman et al. have also shown that a symbolic message about which of two targets to jab can be communicated from the cognitive to the sensorimotor (parietal) system without communicating the cognitive system's spatial bias as well. To make this consistent with the cross-talk from IT to posterior parietal cortex (PP) postulated in Figure 12, I would suggest that the IT "size-signal" has a diffuse effect on PP - it is enough to bias a choice between 2 alternatives, or provide a default value when PP cannot offer a value itself, but not strong enough to perturb a single sharply defined value when it has been established in PP by other means. In any case, the crucial point for our discussion is that communication must be based on the size estimate generated by IT, not that generated by PP.
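The suggested diffuse IT → PP interaction can be made concrete with a toy sketch. All function names and the gain constant are hypothetical assumptions: a sharp PP value is only weakly perturbed by the IT size-signal, IT supplies a default when PP has none, and a symbolic choice can be passed to PP without passing along IT's spatial bias.

```python
# Illustrative sketch of the proposed diffuse IT -> PP cross-talk.
# Function names and the gain constant are invented for illustration.

def pp_size_estimate(pp_value, it_value, it_gain=0.1):
    """Return the size used for preshaping.
    pp_value: sharp parietal estimate, or None if PP has none (cf. patient AT).
    it_value: cognitive/IT estimate, possibly biased by context (e.g. a frame)."""
    if pp_value is None:
        return it_value                  # IT supplies a default via "semantics"
    # A sharply established PP value is only weakly perturbed by the
    # diffuse IT signal.
    return pp_value + it_gain * (it_value - pp_value)

def jab_target(targets, it_symbolic_choice):
    """The cognitive system passes a symbolic 'which one' message to PP without
    passing its spatial bias: PP jabs using its own coordinates, so it never
    misses even when the IT/cognitive estimate of position is frame-shifted."""
    return targets[it_symbolic_choice]

print(pp_size_estimate(None, 8.0))   # no PP value: IT default is used (8.0)
print(pp_size_estimate(5.0, 8.0))    # sharp PP value barely moves (≈ 5.3)
print(jab_target({"left": -3.0, "right": 3.0}, "right"))  # PP's own coordinate
```

The design choice mirrors the text: the IT signal can settle a 2-way choice or fill a gap, but it cannot drag an established parietal value, which is why verbal judgments show the frame-induced bias while jabbing does not.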

[Figure 13 diagram components: Visual Input; AIP; PF; STS; IT; Prefrontal Cortex (PFC) with Memory; F5 canonical (Choosing an Action); F5 mirror; Wernicke's Area; Broca's Area (Describing an Episode, Object or Action); Recognizing an Object or an Action.]

Figure 13. An Early Pass on a mirror-system based neurolinguistics.

Given these data, we may now recall (Figure 5) that although AIP extracts a set of affordances, it is IT and PFC that are crucial to F5's selection of the affordance to execute, and then offer the scheme shown in Figure 13. Here we now emphasize the crucial role of IT-mediated functioning of PFC in the activity of Broca's area. This is the merest of sketches. For example, we do not tease apart the different roles of different subdivisions of PFC in modulating F5 canonical, F5 mirror, and Broca's area. However, the crucial point is that, just as F5 mirror receives its parietal input from PF rather than AIP, so Broca's area receives (I hypothesize) its size data as well as object identity data from IT via PFC, rather than via a side path from AIP.

Turning this into a well-formed neurolinguistic model - and developing an integrated view of syntax and semantics to go with it - will require both analysis of neurological data and subtle modeling. This work constitutes a central aim for going "Beyond the Mirror" to establish the mechanisms of the MNS circuitry of Figure 7 as the evolutionary heart of a fully articulated model which links neurolinguistics (Arbib and Caplan 1979) to the basic neural mechanisms for the recognition of the interactions of actors and objects, and for the elaboration of suitable motor plans for interacting with the environment so perceived.
