

Thomas, M.S.C. & de Wet, N.M. (1998). Stochastic double dissociations in distributed models of semantic memory. In: G. Humphreys & D. Heinke (Eds.), Proceedings of the 5th Neural Computation and Psychology Workshop. Springer.

Stochastic double dissociations in distributed models of semantic memory

Michael S. C. Thomas

Neurocognitive Development Unit,

Institute of Child Health,

University College London Medical School,

London, U.K.

Nicola M. de Wet

Department of Psychology, King Alfred’s College,

Winchester, UK

Abstract

We present a preliminary set of connectionist models of impairments to semantic memory, exploring the conditions under which double dissociations between knowledge of living and non-living entities occur. Small et al [1] argue that category specific impairments are a consequence of semantic feature based representations in a fully distributed memory system. Farah and McClelland [2] argue that category specific impairments arise due to modular structure in semantic memory, albeit structure that is specific to modality; they hypothesise that living and non-living entities have a differential reliance on perceptual and functional features. We evaluated these respective claims by lesioning a simple autoassociative model of semantic memory, using a 2×2 design: fully distributed architecture versus partially modular architecture with modality specific channels; Small et al's training set versus a training set constructed according to Farah and McClelland's perceptual/functional scheme. One thousand stochastic lesions were applied to each network 'subject'. The results supported Farah and McClelland: on average, double dissociations required modular structure and differential reliance on modalities. However, by choosing select (i.e. rare) lesions from each set of 1000, double dissociations of living versus non-living knowledge were found in both networks using both training sets. We discuss the idea that statistical distributions of impairments in patients with similar lesions are necessary to compare against the predictions of functional models, and thus that single case studies may be insufficient to distinguish distributed and modular architectures.

1 Introduction

The functional structure of semantic memory has been explored through impairments found in patients with brain damage. Of particular interest have been category specific impairments of semantic knowledge, found across a range of tasks including reading [3], object naming [4, 5], and comprehension [6, 7], implying that it is semantic memory itself which has been functionally damaged. Selective impairments have been found for categories such as animate objects [6], inanimate objects [8], concrete objects [3], animals, fruits and vegetables [5], fruits and vegetables [4], and animals [9]. Such category specific impairments were generally caused by bilateral temporal lobe injury, arising through infection by herpes encephalitis [1]. However, these selective impairments are rare [1]. In this paper, we will seek to evaluate two accounts of this pattern of selective impairments, both based on parallel distributed processing models of semantic memory. First we will examine a model proposed by Farah and McClelland in 1991, and then a model proposed by Small et al in 1996.

1.1 Farah and McClelland (1991)

Farah and McClelland (F&M) sought to resolve a conflict between two types of neuropsychological evidence. On the one hand, most selective losses of knowledge after brain damage appear to be tied to specific modalities. This would imply that semantic memory is subdivided by modality of knowledge, which F&M claimed is consistent with neurophysiological data showing that most cortical neurons are modality specific. On the other hand, there are the more infrequent cases of category specific impairment. Pursuing an idea proposed by Warrington and colleagues, F&M resolved this conflict by suggesting that semantic memory is subdivided by modality, but that categories are differentially reliant on different modalities (for example, visual information vs motor associations vs auditory information vs tactile information, and so on). By having subjects rate dictionary definitions, they showed that living things on average have 7.7 times as many perceptually based as functionally based features, whereas non-living things have only 1.4 times as many perceptual features as functional features. They then put forward a connectionist model, which used distributed representations to map interactively between verbal information (words), semantic information, and visual information (pictures). Semantic information was split into 60 perceptual and 20 functional features. Representations for 10 living and 10 non-living objects were defined, reflecting the ratios of perceptual and functional features generated from their empirical study. On lesioning the network, F&M found that living things were affected more by lesions to perceptual features, and non-living things by lesions to functional features.

The modality specific hypothesis suffers from one inconsistency: if impairment to knowledge of living things is caused by an impairment to perceptual knowledge alone, why is it that patients also show an impairment to their knowledge of the functional attributes of living things [5, 6, 10, 11, 12]? However, F&M's model also showed this characteristic. The authors explained this effect in terms of a 'critical mass'. The claim was that distributed representations need a certain critical mass to become activated: if this critical mass (in this case, of perceptual features) is eliminated in the representations of living things, the preserved part of the representation (i.e. the functional features) would not become fully activated.

1.2 Small, Hart, Nguyen, and Gordon (1996)

Small et al, on the other hand, resist the idea that category specific impairments necessitate any explicit partitioning or organisation of semantic memory. Their hypothesis is that 'a significant degree of semantic memory organisation could devolve naturally from the demands of a system that encodes features of objects' (p. 127). In this view, categories emerge in a fully distributed semantic memory from the structure inherent within the information being encoded. Small et al explored this idea using connectionist models which either autoassociated semantic features of objects, or mapped between semantic features and a (localist) name for each object. They defined semantic codes for 57 objects, over a set of 77 semantic features. The objects were taken from pictures used by Hart and Gordon [9] in their study of a patient with a category specific deficit. Small et al's feature set was initially based on the features used by Hinton and Shallice [13] in their connectionist model of deep dyslexia, but was also 'derived from several studies of patients with category specific anomias, where explicit knowledge of the features of items were assessed [4, 9]. The feature set was chosen to depict adequately the specific items in the study based upon their perceptual qualities, physical attributes, motor associations, and functional/associative attributes deemed pertinent to describe the objects verbally.' (p. 111).

Although Small et al performed no lesions on their networks, they established that their feature set generated internal representations which demonstrated precisely those categories which are found in specific impairments after brain damage. Hierarchical cluster analysis, principal components analysis, and dependency analysis all revealed emergent categories of animals, fruits and vegetables, tools, and vehicles.
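As an informal illustration of the kind of analysis involved, the sketch below clusters a hypothetical binary object-by-feature matrix hierarchically and reports which objects fall together. The six objects and ten features are invented for the example (they are not Small et al's 57×77 feature set), and scipy is assumed to be available.

```python
# Toy illustration of hierarchical cluster analysis over binary feature codes:
# do items from the same category (animals, fruit/vegetables, artefacts) end
# up on the same branches? The matrix below is made up, not Small et al.'s.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

objects = ["dog", "cat", "apple", "carrot", "hammer", "car"]   # hypothetical
features = np.array([
    [1, 1, 1, 0, 0, 1, 0, 0, 0, 0],   # dog
    [1, 1, 1, 0, 0, 1, 0, 0, 0, 0],   # cat (deliberately similar to dog)
    [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],   # apple
    [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],   # carrot
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0],   # hammer
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 1],   # car
])

# Average-linkage clustering on Hamming distances between feature vectors.
tree = linkage(pdist(features, metric="hamming"), method="average")

# Cut the tree into three clusters and report which objects fall together.
labels = fcluster(tree, t=3, criterion="maxclust")
for obj, lab in zip(objects, labels):
    print(f"{obj:8s} -> cluster {lab}")
```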

Small et al further found that an unsupervised network trained on the pattern set developed clusters of units responsible for representing each category. Selectively removing such clusters of units would cause category specific impairments (see footnotes 1 and 2). Small et al concluded that their computational study provided 'one explanation for the cognitive neurological manifestations of diseases such as herpes simplex encephalitis that affect primarily the temporal neocortex and limbic system' (p. 127).

Footnote 1: However, such a model would imply that category specific deficits can only be generated by focal lesions. Patient data suggest that they also regularly occur through diffuse damage, such as in Alzheimer's disease [23].

Footnote 2: In a similar vein, French and Mareschal [24] demonstrated that an autoassociative network trained on Living and Non-living patterns, but using a training algorithm that progressively compressed hidden unit representations, developed certain hidden units specialising in representing particular categories. Removal of these units would cause selective deficits for the corresponding category.

1.3 Comparison

We thus have two competing hypotheses. The first accepts the notion that the double dissociation of living and non-living categories must derive from some structural partitioning of semantic memory, albeit by modality rather than category. The second maintains that a fully distributed semantic memory will generate category specific impairments simply through latent structure in the knowledge that the system is representing. Although, in the absence of lesioning evidence from their models, the Small et al claim must be considered the weaker, we were nevertheless interested in comparing these approaches. It seemed to us important to establish whether the organisation of distributed representations revealed by network analysis techniques such as PCA could ever appear as selective impairments. If not, then it seems reasonable to accept Farah and McClelland's line that selective impairments must reveal structural partitioning of the system in question. Given the reliance of standard cognitive neuropsychology on deducing structure from single case studies, we were also interested in exploring whether there is any difference between specific case studies and the average effect of lesioning networks. Thus, following an approach recently suggested by Juola and Plunkett when lesioning a connectionist model of inflectional morphology [14], we lesioned our models many times and examined the distribution of the pattern of results.

2 Simulations

2.1 Design

We based our simulations around Small et al's autoassociative semantic memory network, shown in Figure 1a. While autoassociation can only be a component of a semantic memory system that must be accessed in different ways, it nevertheless allowed us to develop a distributed system representing the knowledge base and to investigate the effect of lesions to this system (see footnote 3). We trained the network to autoassociate Small et al's training set: 57 objects (23 living, 34 non-living) represented using 59 named perceptual features (e.g. colour, size, texture) and 18 named functional features (e.g. makes_noise, propelled, for_cooking). Features were binary: 1 if the feature was present, 0 if not. We then lesioned this network.
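The sketch below shows the kind of autoassociator this implies, assuming the layer sizes of Figure 1a (77 inputs, hidden layers of 36 and 12 units, 77 outputs), sigmoid units, and the ±0.5 weight initialisation given in the simulation details. The binary patterns are random stand-ins rather than Small et al's actual feature codes, and bias units are omitted for brevity.

```python
# Minimal sketch of the fully distributed autoassociator (Figure 1a), assuming
# layer sizes 77 -> 36 -> 12 -> 77 and sigmoid units; biases omitted for brevity.
import numpy as np

rng = np.random.default_rng(0)

N_PERCEPTUAL, N_FUNCTIONAL = 59, 18
N_FEATURES = N_PERCEPTUAL + N_FUNCTIONAL      # 77 input and 77 output units
H1, H2 = 36, 12                               # two hidden layers

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_weights(n_in, n_out):
    # Weights randomised uniformly between -0.5 and +0.5.
    return rng.uniform(-0.5, 0.5, size=(n_in, n_out))

# Three banks of connections; lesion Sites 1-3 correspond to W1, W2, W3.
W1 = init_weights(N_FEATURES, H1)
W2 = init_weights(H1, H2)
W3 = init_weights(H2, N_FEATURES)

def forward(patterns, W1, W2, W3):
    """Map binary feature vectors back onto themselves (autoassociation)."""
    h1 = sigmoid(patterns @ W1)
    h2 = sigmoid(h1 @ W2)
    return sigmoid(h2 @ W3)

# Stand-in training set: 23 'living' and 34 'non-living' binary patterns.
living = (rng.random((23, N_FEATURES)) < 0.3).astype(float)
nonliving = (rng.random((34, N_FEATURES)) < 0.3).astype(float)
print(forward(np.vstack([living, nonliving]), W1, W2, W3).shape)   # (57, 77)
```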

Lesioning was performed by damaging the connections at various points in the network (see Bullinaria and Chater [15] for a discussion of the implications of using different forms of damage). We examined the effect of lesioning the Small et al network at 3 different sites: between the input layer and the first hidden layer, between the two hidden layers, and between the second hidden layer and the output layer (see Figure 1a). We used 6 network 'subjects', trained using different random seeds. For each subject, we lesioned each site 1000 times, and took the mean impairment at each site. We then averaged across our six subjects to find the results at each site. We repeated this with 30%, 50%, 75%, and 100% impairment to the connections at each site. The different degrees of damage were implemented by subjecting each weight to a .30, .50, .75, or 1.00 chance of being set to zero. In effect, this schedule of lesionings is equivalent to assessing 72,000 patients with damage to their semantic memory systems.

Footnote 3: If semantic memory were accessed in different ways, this would change the nature of the computations required of the distributed model. For example, a naming network might take a featural representation as input and map to localist word nodes as output [1]. Mappings of this sort would alter the similarity structure established over internal representations: the nature of the mappings is thus likely to change the patterns of deficits which arise through damaging such representations. In this preliminary work, we chose to start with the simpler case of autoassociation.
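A minimal sketch of this lesioning procedure, reusing the hypothetical forward pass, weight matrices and pattern arrays from the sketch above: every connection at the chosen site is set to zero independently with the given probability, and the increase in error is averaged over many such lesions. The function names are ours.

```python
# Sketch of stochastic lesioning: each connection at the chosen site is zeroed
# independently with probability p; the procedure is repeated n_lesions times.
# `forward`, `W1`-`W3`, `living` and `nonliving` come from the sketch above.
import numpy as np

def lesion(weights, p, rng):
    """Return a copy of `weights` with each entry zeroed with probability p."""
    survives = rng.random(weights.shape) >= p
    return weights * survives

def rms_error(outputs, targets):
    """Per-pattern root-mean-square output error."""
    return np.sqrt(np.mean((outputs - targets) ** 2, axis=1))

def mean_impairment(patterns, W1, W2, W3, site, p, n_lesions=1000, seed=1):
    """Mean increase in RMS error over n_lesions independent lesions at a site."""
    rng = np.random.default_rng(seed)
    baseline = rms_error(forward(patterns, W1, W2, W3), patterns).mean()
    increases = []
    for _ in range(n_lesions):
        Ws = [W1.copy(), W2.copy(), W3.copy()]
        Ws[site] = lesion(Ws[site], p, rng)       # site 0, 1 or 2 (Sites 1-3)
        damaged = rms_error(forward(patterns, *Ws), patterns).mean()
        increases.append(damaged - baseline)
    return float(np.mean(increases))

# e.g. mean impairment on Living patterns after 30% lesions at Site 2:
# print(mean_impairment(living, W1, W2, W3, site=1, p=0.30))
```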

Secondly, we modified the original Small et al model to implement the assumptions employed by F&M. F&M proposed modality specific structure for the semantic memory system. We implemented this by splitting the first hidden layer in two and having channels specialised for processing perceptual and functional knowledge. The first hidden layer was split in proportion to the ratio of the number of perceptual and functional input units. This architecture is shown in Figure 1b. We now needed a training set which embodied F&M's assumptions about the ratio of perceptual features to functional features for living and non-living things. Accordingly, binary patterns were generated at random under the following constraints: there were 23 living patterns and 34 non-living patterns; 58 perceptual features and 19 functional features; patterns were generated around prototypes, where each exemplar had a 20% chance of each of its features differing from those of its prototype; living exemplars were based around 4 prototypes and non-living exemplars around 6 prototypes; the overall ratio of perceptual to functional features in the living exemplars was 7.8:1, and in the non-living exemplars it was 1.8:1. Living exemplars had on average 33.9 perceptual features and 4.3 functional features active; non-living exemplars had on average 15.1 perceptual features and 8.5 functional features active. (The equivalent values for the Small et al training set were, for living, 1.6:1, with 8.9 perceptual and 5.5 functional features; for non-living, 2.9:1, with 10.3 perceptual and 3.6 functional features.) As before, we took 6 network 'subjects', trained using different random seeds. For each subject, we lesioned each of the 5 sites 1000 times (compared to F&M's 5 times), and took the mean impairment at each site in this semi-modular network. We then averaged across our six subjects. We repeated this at 30%, 50%, 75%, and 100% impairment. This is equivalent to assessing 120,000 patients with damage to their semantic memory systems.

Figure 1. Architecture of (a) the fully distributed Small et al model and (b) the semi-modular 'Farah and McClelland' model. [In (a), the 59 perceptual and 18 functional input features feed a first hidden layer of 36 units and a second hidden layer of 12 units before reproducing the 77 features at output; lesion Sites 1, 2, and 3 are the three successive banks of connections. In (b), the first hidden layer is split into a perceptual channel of 27 units, fed only by the perceptual features (Site 1), and a functional channel of 9 units, fed only by the functional features (Site 2); both channels project to a common hidden layer of 12 units (Sites 3 and 4 respectively), whose connections to the output features form Site 5.]
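The sketch below illustrates one way such a ratio training set could be generated under the stated constraints (prototypes plus a 20% chance of flipping each feature). The prototype feature densities are reverse-engineered from the reported average feature counts and are purely illustrative; the paper does not give the actual prototypes or how exemplars were allocated to them.

```python
# Illustrative generation of an 'F&M ratio' training set: each exemplar copies
# a prototype and then flips every feature independently with probability 0.2.
# The prototype densities below are our own guesses, tuned so that the average
# numbers of active features come out near the values reported in the text.
import numpy as np

rng = np.random.default_rng(0)
N_PERC, N_FUNC = 58, 19            # feature counts reported for the ratio set

def make_prototype(p_perc_on, p_func_on):
    perc = (rng.random(N_PERC) < p_perc_on).astype(int)
    func = (rng.random(N_FUNC) < p_func_on).astype(int)
    return np.concatenate([perc, func])

def make_exemplar(prototype, p_flip=0.2):
    flips = (rng.random(prototype.size) < p_flip).astype(int)
    return np.abs(prototype - flips)              # flip the selected bits

# 4 living prototypes biased towards perceptual features, 6 non-living
# prototypes with a more even balance (illustrative densities).
living_protos = [make_prototype(0.65, 0.05) for _ in range(4)]
nonliving_protos = [make_prototype(0.10, 0.40) for _ in range(6)]

living = np.array([make_exemplar(living_protos[i % 4]) for i in range(23)])
nonliving = np.array([make_exemplar(nonliving_protos[i % 6]) for i in range(34)])

# Check the average counts and perceptual:functional ratio of active features.
for name, pats in [("living", living), ("non-living", nonliving)]:
    perc_on = pats[:, :N_PERC].sum(axis=1).mean()
    func_on = pats[:, N_PERC:].sum(axis=1).mean()
    print(f"{name}: {perc_on:.1f} perceptual, {func_on:.1f} functional "
          f"(ratio {perc_on / func_on:.1f}:1)")
```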

Finally, we trained the Small et al architecture on the F&M training set, and the F&M architecture on the Small et al training set, yielding 192,000 further 'patients'.

This design allowed us to investigate separately the effect of training set and of architecture.
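For the architecture factor, one way to realise the split first hidden layer of Figure 1b is to mask out the cross-channel connections, as sketched below. The masking trick and all names are our own illustration of the layer sizes shown in the figure; the original simulations may have been implemented differently.

```python
# One way to build the semi-modular architecture of Figure 1b: the first hidden
# layer (27 + 9 = 36 units) is split into perceptual and functional channels by
# zeroing the cross-channel connections with a block mask.
import numpy as np

N_PERC, N_FUNC = 59, 18
H_PERC, H_FUNC, H_COMMON = 27, 9, 12
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Input -> first hidden layer, masked so that perceptual features reach only
# the 27 perceptual-channel units (lesion Site 1) and functional features only
# the 9 functional-channel units (lesion Site 2). During training the same
# mask would also be applied to the weight updates to keep the zeros at zero.
W_in = rng.uniform(-0.5, 0.5, size=(N_PERC + N_FUNC, H_PERC + H_FUNC))
mask = np.zeros_like(W_in)
mask[:N_PERC, :H_PERC] = 1.0
mask[N_PERC:, H_PERC:] = 1.0
W_in *= mask

# Both channels project to a common hidden layer: rows 0-26 of W_common carry
# Site 3 (perceptual channel) and rows 27-35 carry Site 4 (functional channel).
# The common layer's connections to the 77 output features form Site 5.
W_common = rng.uniform(-0.5, 0.5, size=(H_PERC + H_FUNC, H_COMMON))
W_out = rng.uniform(-0.5, 0.5, size=(H_COMMON, N_PERC + N_FUNC))

def forward_semi_modular(patterns):
    h1 = sigmoid(patterns @ W_in)          # split first hidden layer
    h2 = sigmoid(h1 @ W_common)            # shared representation
    return sigmoid(h2 @ W_out)

x = (rng.random((5, N_PERC + N_FUNC)) < 0.3).astype(float)
print(forward_semi_modular(x).shape)       # (5, 77)
```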

Simulation details: Networks employed sigmoid processing units; weights were initially randomised between ±0.5; networks were trained using a learning rate of 0.01 and no momentum, using backpropagation with a cross-entropy error measure [16]. Training was terminated when all the features for all patterns fell on the correct side of 0.5. For the fully distributed networks this took on average 1514 epochs for the Small et al semantic feature set and 1400 epochs for the F&M ratio set; for the semi-modular networks, it took 1556 and 832 epochs respectively. Piloting of the above scheme revealed that the F&M training set, with less internal structure, required more hidden units to learn to criterion. Accordingly, the distributed and semi-modular networks trained on this set used double the number of hidden units in each layer (see Discussion).
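A minimal sketch of this training regime under two simplifying assumptions the paper does not specify (batch weight updates and no bias units): with sigmoid outputs and a cross-entropy error measure, the output-layer error term reduces to output minus target.

```python
# Sketch of the training regime: two hidden layers of sigmoid units, cross-
# entropy error, backpropagation with learning rate 0.01 and no momentum,
# training until every feature of every pattern is on the correct side of 0.5.
# Batch updates and the absence of biases are simplifications of our own.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoassociator(patterns, h1_size=36, h2_size=12,
                         lr=0.01, max_epochs=50000, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = patterns.shape[1]
    W1 = rng.uniform(-0.5, 0.5, (n_feat, h1_size))
    W2 = rng.uniform(-0.5, 0.5, (h1_size, h2_size))
    W3 = rng.uniform(-0.5, 0.5, (h2_size, n_feat))

    for epoch in range(max_epochs):
        # Forward pass.
        h1 = sigmoid(patterns @ W1)
        h2 = sigmoid(h1 @ W2)
        y = sigmoid(h2 @ W3)

        # Stop once all outputs are on the correct side of 0.5.
        if np.all(np.abs(y - patterns) < 0.5):
            return W1, W2, W3, epoch

        # Backward pass: for sigmoid outputs with cross-entropy error, the
        # output delta is simply (output - target).
        d3 = y - patterns
        d2 = (d3 @ W3.T) * h2 * (1.0 - h2)
        d1 = (d2 @ W2.T) * h1 * (1.0 - h1)

        W3 -= lr * (h2.T @ d3)
        W2 -= lr * (h1.T @ d2)
        W1 -= lr * (patterns.T @ d1)

    return W1, W2, W3, max_epochs

# e.g. with the stand-in patterns from the earlier sketches:
# W1, W2, W3, epochs = train_autoassociator(np.vstack([living, nonliving]))
```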

2.2 Assessment of Performance

We evaluated the performance of the networks under lesioning in terms of the mean increase in RMS error across the living and non-living patterns. In terms of the response of the network, it may seem more appropriate to perform a nearest neighbour transform on the network output and determine whether the output most resembles the target from among the training set; that is, is the network still closest to the correct pattern? However, performing such a transform on the raw activations risks warping the pattern of results: it discriminates against those patterns which are tightly clustered in the pattern space, and in favour of those which have few close neighbours. Since we are interested in the effects on the underlying representations, we concentrate on changes in output accuracy. Note that it may well be that some dissociations in the patient data result from differences in the density of the pattern space; here, however, our emphasis remains on the underlying representations.
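A sketch of this measure, reusing the hypothetical forward pass and weight matrices from the earlier sketches: the increase in mean RMS output error, relative to the unlesioned network, computed separately for the living and non-living patterns.

```python
# Sketch of the performance measure: increase in mean RMS output error after a
# lesion, computed separately over Living and Non-living patterns. `forward`,
# the weight matrices and the pattern arrays come from the earlier sketches.
import numpy as np

def mean_rms(outputs, targets):
    """Mean (over patterns) of the per-pattern RMS output error."""
    return np.sqrt(np.mean((outputs - targets) ** 2, axis=1)).mean()

def impairment(living, nonliving, intact_weights, lesioned_weights):
    """Increase in mean RMS error for each category after a lesion."""
    scores = {}
    for name, pats in [("Living", living), ("Non-living", nonliving)]:
        before = mean_rms(forward(pats, *intact_weights), pats)
        after = mean_rms(forward(pats, *lesioned_weights), pats)
        scores[name] = after - before
    return scores
```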

2.3 Results

2.3.1 Analyses of the training sets

A principal component analysis was used to explore the structure of the two training sets. A plot of the first two principal components showed that in both the Small et al pattern set and the F&M ratio training set, living and non-living patterns were separated. In the Small et al set, there were additional clusters for animals, fruit and vegetables, cooking ingredients, tools, and vehicles. In the F&M ratio training set there was loose clustering of the exemplars generated around each prototype.
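A sketch of this analysis on the hypothetical pattern arrays from the earlier sketches: centre the object-by-feature matrix, project onto its first two principal components via a singular value decomposition, and compare where the Living and Non-living patterns fall.

```python
# Sketch of the principal component analysis of a training set: centre the
# object-by-feature matrix, project onto the first two principal components,
# and see whether Living and Non-living patterns occupy different regions.
# `living` and `nonliving` are the binary pattern arrays from earlier sketches.
import numpy as np

patterns = np.vstack([living, nonliving]).astype(float)
labels = ["L"] * len(living) + ["NL"] * len(nonliving)

centred = patterns - patterns.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
pc_scores = centred @ Vt[:2].T            # coordinates on the first two PCs

for lab in ("L", "NL"):
    idx = [i for i, l in enumerate(labels) if l == lab]
    centroid = pc_scores[idx].mean(axis=0)
    print(f"{lab}: centroid on PC1/PC2 = {centroid.round(2)}")
```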

2.3.2 Mean population results

Table 1 shows the mean error on Living and Non-living patterns in each combination of network architecture and training set, as increasing levels of lesioning were applied. Figures are averaged over sets of 1000 lesions within subjects, and then across the set of 6 network subjects. Asterisks mark dissociations between Living and Non-living patterns at each site. Figure 2 shows these data graphically for 3 sites in the semi-modular network trained on the F&M ratio pattern set.

Table 1: Mean error for Living (L) and Non-living (NL) patterns for each network and training set after lesions at each lesion site. Scores are averaged over the 6 network 'subjects'. Figures marked * show where a dissociation has occurred between the two types of patterns, defined here as a difference greater than 0.01.

Distributed Network, Small et al Training Set

  %     Site 1 L  Site 1 NL  Site 2 L  Site 2 NL  Site 3 L  Site 3 NL
  0       0.02      0.02       0.02      0.02       0.02      0.02
  30      0.05      0.05       0.08      0.08       0.11      0.11
  50      0.07      0.07       0.11      0.11       0.16      0.16
  75      0.12      0.11       0.16      0.16       0.22      0.21
  100     0.18      0.17       0.21      0.21       0.26      0.26

Distributed Network, F&M Ratio Training Set

  %     Site 1 L  Site 1 NL  Site 2 L  Site 2 NL  Site 3 L  Site 3 NL
  0       0.05      0.04       0.05      0.04       0.05      0.04
  30      0.20*     0.18*      0.23      0.23       0.22      0.23
  50      0.26*     0.24*      0.29      0.28       0.29      0.30
  75      0.35*     0.30*      0.37*     0.34*      0.37      0.38
  100     0.48*     0.41*      0.49*     0.42*      0.47      0.48

Semi-Modular Network, Small et al Training Set

          Perceptual Channel                       Functional Channel                       Common
  %     Site 1 L  Site 1 NL  Site 3 L  Site 3 NL  Site 2 L  Site 2 NL  Site 4 L  Site 4 NL  Site 5 L  Site 5 NL
  0       0.02      0.02       0.02      0.02       0.02      0.02       0.02      0.02       0.02      0.02
  30      0.04      0.04       0.06      0.06       0.03      0.02       0.04      0.03       0.10      0.10
  50      0.05      0.05       0.08      0.07       0.04      0.03       0.04      0.04       0.15      0.15
  75      0.07      0.08       0.10      0.10       0.05*     0.03*      0.05      0.05       0.21      0.21
  100     0.11*     0.13*      0.13      0.13       0.08*     0.04*      0.06      0.06       0.25      0.26

Semi-Modular Network, F&M Ratio Training Set

          Perceptual Channel                       Functional Channel                       Common
  %     Site 1 L  Site 1 NL  Site 3 L  Site 3 NL  Site 2 L  Site 2 NL  Site 4 L  Site 4 NL  Site 5 L  Site 5 NL
  0       0.04      0.04       0.04      0.04       0.04      0.04       0.04      0.04       0.04      0.04
  30      0.13*     0.09*      0.15      0.14       0.06      0.06       0.07      0.07       0.21      0.21
  50      0.20*     0.12*      0.20      0.19       0.06      0.07       0.09      0.09       0.28      0.28
  75      0.32*     0.18*      0.28*     0.25*      0.08*     0.10*      0.10      0.11       0.37      0.37
  100     0.50*     0.25*      0.37*     0.31*      0.09*     0.11*      0.12      0.13       0.48      0.48

Notes: a) % refers to the percentage of connections removed at the relevant site during lesioning. b) Scores are averaged over the 6 network 'subjects'; the standard errors of these means are all ≤ 0.01.

Four main points may be taken from these results. (1) The distributed network trained on the Small et al pattern set showed no dissociations. Small et al's claim that latent structure in the training set may explain category specific deficits is not supported by these data. (2) The distributed network trained on the F&M ratio training set showed a single dissociation, in which knowledge of living objects was more vulnerable to damage. This may reflect the additional hidden units used in this network, or, more likely, that the structure of the ratio pattern set itself leaves Living patterns more open to damage. Specifically, Living patterns are much more similar to each other over their perceptual features, and thus harder to distinguish in compressed internal representations. (3) In the semi-modular network trained on the Small et al pattern set, a double dissociation between knowledge of living objects and non-living objects was found at high levels of damage. Damage to the perceptual channel impaired Non-living more than Living. Damage to the functional channel damaged Living more than Non-living. In this model, we see that damage to modular structure did lead to a double dissociation of category specific knowledge (albeit in the opposite direction to Farah and McClelland's hypothesis, since Living patterns appeared to rely more on the functional channel than the perceptual channel). However, these results came at high levels of damage at sites close to the input. Performance after such damage reflects a fixed response, irrespective of the varying input, driven only by the resting activations of the hidden and output units. Thus the double dissociation reflects specialisation in the resting activations of the hidden units in the perceptual and functional channels. (4) The semi-modular network trained on the F&M ratio pattern set replicated the results of Farah and McClelland's initial study. Double dissociations were found between categories of knowledge when modality specific structures were damaged. No such dissociation was found when the common processing channel was damaged. The pattern of damage reflected the asymmetry found in patient data, with Living patterns suffering most impairment. For Living patterns, damage to the perceptual channel also impaired functional knowledge, due to the common processing stage.

Figure 2. Increases in error with increasing lesions to network connections for the semi-modular network trained on the F&M ratio training set: (a) lesions at Site 1, in the perceptual channel; (b) lesions at Site 2, in the functional channel; (c) lesions at Site 5, in shared representations. [Each panel plots error (0 to 0.5) against level of impairment (none, 30%, 50%, 75%, 100%) for Living and Non-living patterns.]

In summary, the results supported the idea that the pattern of impairments found in patient data reflects an interaction between a modality specific modular structure and the latent structure of a particular problem domain. Averaged over populations, distributed networks cannot demonstrate double dissociations.

2.3.3 Individual ‘case study’ results

When multiple stochastic lesions are applied to a given network, some lesions will cause greater impairment than others. Moreover, some lesions will by chance lead to greater impairment of Living patterns than Non-living patterns, and some to greater impairment of Non-living patterns than Living patterns. If we allow ourselves to pick out the maximal dissociations in each direction, as interesting 'case studies', we may explore whether the findings for populations of 'patients' hold for individual cases.

Figure 3 shows plots of these selected dissociations from Sites 1, 2, and 3 of the distributed network trained on the Small et al pattern set. Picking any 'patient' from the left hand column and any 'patient' from the right hand column, we may demonstrate a double dissociation between the Living and Non-living categories of knowledge. This characteristic emerged for all lesions to all networks and sites, except for the semi-modular network trained on the F&M pattern set. In this latter network, damage to the functional channel never caused Living patterns to be more impaired than Non-living; and in the perceptual channel, damage to Site 1 never caused Non-living patterns to be more impaired than Living patterns. Nevertheless, dissociations in both directions could be found in the common hidden layer of this network. Figure 4 demonstrates that the ability to find dissociations depends on the distribution of deficits that each model can generate under stochastic lesioning of weights. Note that each such distribution is determined both by the network architecture and by the nature of the training set.
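A sketch of how such 'case studies' and the underlying deficit distribution can be extracted, reusing the hypothetical functions and arrays from the earlier sketches: record the pair of error increases (Living, Non-living) for every lesion, then pick out the lesion with the largest difference in each direction.

```python
# Sketch of the 'case study' selection: apply many stochastic lesions, record
# the increase in error on Living and Non-living patterns for each, and pick
# the lesion with the largest difference in each direction. The full set of
# (Living, Non-living) pairs is the kind of distribution plotted in Figure 4.
# `forward`, `lesion`, `mean_rms`, the weights and patterns come from the
# earlier sketches.
import numpy as np

def deficit_distribution(living, nonliving, W1, W2, W3, site, p,
                         n_lesions=1000, seed=2):
    rng = np.random.default_rng(seed)
    base_l = mean_rms(forward(living, W1, W2, W3), living)
    base_nl = mean_rms(forward(nonliving, W1, W2, W3), nonliving)
    pairs = []
    for _ in range(n_lesions):
        Ws = [W1.copy(), W2.copy(), W3.copy()]
        Ws[site] = lesion(Ws[site], p, rng)
        dl = mean_rms(forward(living, *Ws), living) - base_l
        dnl = mean_rms(forward(nonliving, *Ws), nonliving) - base_nl
        pairs.append((dl, dnl))
    return np.array(pairs)

pairs = deficit_distribution(living, nonliving, W1, W2, W3, site=1, p=0.30)
worst_for_living = pairs[np.argmax(pairs[:, 0] - pairs[:, 1])]
worst_for_nonliving = pairs[np.argmax(pairs[:, 1] - pairs[:, 0])]
print("Selected 'patients':", worst_for_living, worst_for_nonliving)
```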

In sum, these results show that, thus far, our dismissal of distributed models as candidates to generate double dissociations depends on averaging over populations of lesions (equivalent to populations of patients). However, cognitive neuropsychology traditionally relies upon single case studies rather than population data to deduce functional structure. The results from this analysis suggest that such an approach would not exclude distributed models as candidate functional structures, even in the face of double dissociation evidence. It is therefore important for us to consider whether the presence of stochastic double dissociations in these distributed models of semantic memory is a central or irrelevant feature.

Figure 3. Selected 'case studies' from network lesions. All lesions are from the distributed network trained on the Small et al pattern set. (a) and (d) are lesions at Site 3, (b) and (e) are lesions at Site 2, and (c) and (f) are lesions at Site 1. The left hand column plots, for each set of 1000 lesions, the instance where damage to Living patterns was maximally greater than that to Non-living patterns. The right hand column plots the opposite dissociation.

[Figure 3 panels (a)-(f): error (0 to 0.3) plotted against level of impairment (none, 30%, 50%, 75%, 100%) for Living and Non-living patterns.]

Figure 4. Increases in error under 30% lesioning for Living and Non-living patterns, for 1000 lesions of a single network. (a) Site 2 in the distributed model trained on the Small et al pattern set; (b) Site 1 in the perceptual channel of the semi-modular network trained on the F&M ratio set.

[Two scatter plots, one point per lesion; the horizontal axis is the increased error on Living patterns, with both axes running from 0.0 to 0.2.]

3 Discussion

The generation of double dissociations from distributed networks is an established finding. Wood [17] and Sartori [18] have shown that lesions to small networks may independently impair performance on separate patterns stored in the network. However, these were simple theoretical demonstrations rather than models, and the dissociations at stake here are not of individual patterns but of categories. Nevertheless, Shallice [19] and Plaut [20] have argued that individual dissociations of categories like those found in the current set of models are artefacts of scale: in properly sized models, such dissociations will disappear. For example, double dissociations in the common layer of the semi-modular network would arise because, with only 24 hidden units, certain units may end up playing significant roles in generating Living or Non-living patterns. Chance elimination of connections from these important units to the outputs would cause dissociations. With many more hidden units, no such significant roles should be attributable to individual hidden units.

How many hidden units must a model contain? Juola and Plunkett [14] trained a model with 130 inputs, 200 hidden units, and 160 outputs to inflect 3200 nouns and verbs to plurals and past tenses. Using 1065 stochastic lesions per subject and selected case studies, they found double dissociations between regular and irregular inflections for both nouns and verbs. They also found dissociations between regular noun plurals and irregular verb past tenses, regular verbs and irregular nouns, and arbitrarily defined categories, although the distribution of deficits was different in each case. Two hundred is thus still too few hidden units for a realistic model.

Bullinaria and Chater [15] examined such double dissociations between regular and irregular mappings in three layer networks in some detail. Their view was that unless many more hidden units were used than were necessary to learn the training set, representations would not be "fully distributed" and spurious double dissociations would be found. Given that the brain has many neurons and there is no a priori way to establish the right number to use for a given problem, they suggested that realistic, 'psychologically and biologically relevant' models should employ large numbers of hidden units. Bullinaria and Chater trained a network with 9 input units and 9 output units to learn 256 mappings. Ten hidden units were sufficient to learn the problem. Using 20 lesions per subject, they determined that spurious double dissociations did not disappear until 600 hidden units were employed. Using this multiple, Juola and Plunkett's model should have used 12,000 hidden units to avoid spurious dissociations.

We would like to conclude with the following points. Firstly, the models used in the current simulations are small and unrealistic. In this preliminary work, our aim was to compare the claims of Small et al and Farah and McClelland as directly as possible. In future work we will explore the implications of using many more hidden units and, where possible, more realistic training sets (see footnote 4).

Footnote 4: We are currently engaged in this work. Provisional results show that when the Small et al architecture employed 1000 hidden units in each hidden layer, stochastic double dissociations between semantic categories still occurred. Indeed, we found no tightening in the distribution of deficits.

Secondly, while it seems likely Bullinaria and Chater’s finding will extend to the domain of models of semantic memory, it is worth pointing out that their findings relate to quasi-regular domains. In models of such domains, single dissociations of irregular patterns are easy to find, and damage to rules is characteristic of small networks without “fully distributed” representations. In our study, double dissociations were found between two sub-clusters of patterns distinguished by different regularities.

Thirdly, Shallice, Plaut, and Bullinaria and Chater [19, 20, 15] argue that distributed networks will not show double dissociations so long as they have fully distributed internal representations. Bullinaria and Chater claim that realistic models will have fully distributed representations because (i) they will have many hidden units, and (ii) a learning algorithm will develop representations that are fully distributed, so that hidden units will seek to participate in as many representations as possible. However, Bullinaria and Chater do not justify (ii), and in fact there is neurobiological evidence [21, 22] that argues against it. This work suggests that neural representations are sparsely coded, so that units tend to participate in few representations. If more realistic representations are indeed sparse, then clusters of processing units may end up playing an exaggerated role in producing outputs (particularly if they are organised topographically), thus increasing the likelihood of atypical double dissociations in distributed networks. In short, the addition of more units does not necessarily lead to more fully distributed representations.

Lastly, analyses such as those presented here suggest that in any stochastic lesioning of a distributed or partially distributed system, a distribution of deficits will be created. The exact nature of the distribution (and thus the likelihood of finding chance dissociations) will depend on the architecture of the system and the knowledge it is storing (as shown in Figure 4). Only in certain types of model will the distribution be very tight (such as the 'realistic' models proposed by Bullinaria and Chater which employ fully distributed representations). If various functional models of the cognitive system which use distributed representations are distinguished by their different distributions of deficits, then it might seem appropriate to test the predicted deficit distributions of models against distributions of deficits in the population of patients with similar lesions. We may draw two conclusions. Firstly, the current use of single case studies to derive functional structure in the cognitive system relies heavily on a certain class of models turning out to be the correct one, namely the class that generates very tight distributions of deficits (tight enough, that is, to rule out chance double dissociations). Secondly, if the correct models turn out to be those that generate wide distributions of deficits, the relevant empirical data to test competing models may be difficult to come by: we must find patients who have similar damage, yet who can be diagnosed independently of their pattern of behavioural deficits (because such models predict that the behavioural deficits may vary widely). This would place a heavy onus on techniques that can diagnose patients according to the type of damage they have sustained, before we can collect the relevant empirical data to distinguish functional models.

References

1. Small SL, Hart J, Nguyen T, and Gordon B. Distributed representations of semantic knowledge in the brain: Computational experiments using feature based codes. In: Reggia J, Ruppin E, and Berndt RS (Eds.), Neural modelling of brain and cognitive disorders. World Scientific, 1996.
2. Farah MJ and McClelland JL. A computational model of semantic memory impairment: Modality specificity and emergent category specificity. Journal of Experimental Psychology 1991, 120(4), 339-357.
3. Warrington EK. Concrete word dyslexia. British Journal of Psychology 1981, 72, 175-196.
4. Hart J Jr, Berndt RS, and Caramazza A. Category-specific naming deficit following cerebral infarction. Nature 1985, 316, 439-440.
5. Sartori G and Job R. The oyster with four legs: A neuropsychological study on the interaction of visual and semantic information. Cognitive Neuropsychology 1988, 5(1), 105-132.
6. Warrington EK and Shallice T. Category specific semantic impairments. Brain 1984, 106, 859-878.
7. Warrington EK and McCarthy R. Categories of knowledge: Further fractionation and an attempted integration. Brain 1987, 110, 1273-1296.
8. Warrington EK and McCarthy R. Category specific access dysphasia. Brain 1983, 106, 859-878.
9. Hart J Jr and Gordon B. Neural subsystems for object knowledge. Nature 1992, 359, 60-64.
10. Silveri MC and Gainotti G. Interaction between vision and language in category-specific semantic impairment. Cognitive Neuropsychology 1988, 5, 677-709.
11. Basso A, Capitani E, and Laiacona M. Progressive language impairment without dementia: A case with isolated category specific semantic defect. Journal of Neurology, Neurosurgery and Psychiatry 1988, 51, 1201-1207.
12. Farah MJ, Hammond KH, Mehta Z, and Ratcliff G. Category-specificity and modality-specificity in semantic memory. Neuropsychologia 1989, 8, 1-19.
13. Hinton GE and Shallice T. Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review 1991, 98(1), 74-95.
14. Juola P and Plunkett K. Why double dissociations don't mean much. In: Gernsbacher MA and Derry SJ (Eds.), Proceedings of the 20th Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates, 1998, 561-566.
15. Bullinaria JA and Chater N. Connectionist modelling: Implications for cognitive neuropsychology. Language and Cognitive Processes 1995, 10(3/4), 227-264.
16. Hinton GE. Connectionist learning procedures. Artificial Intelligence 1989, 40, 185-234.
17. Wood CC. Variations on a theme of Lashley: Lesion experiments on the neural model of Anderson, Silverstein, Ritz, and Jones. Psychological Review 1978, 85, 582-591.
18. Sartori G. From neuropsychological data to theory and vice versa. In: Denes G, Bisiacchi P, Semenza C, and Andrewsky E (Eds.), Perspectives in cognitive neuropsychology. Hillsdale, NJ: Erlbaum, 1988.
19. Shallice T. From neuropsychology to mental structure. Cambridge: Cambridge University Press, 1988.
20. Plaut DC. Double dissociation without modularity: Evidence from connectionist neuropsychology. Journal of Clinical and Experimental Neuropsychology 1995, 17(2), 291-321.
21. Baddeley RJ, Abbott LF, Booth MJA, Sengpiel F, Freeman T, Wakeman EA, and Rolls ET. Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proceedings of the Royal Society B 1998, in press.
22. Rolls ET and Treves A. Neural networks and brain function. Oxford University Press, 1998.
23. Gonnerman LM, Andersen ES, Devlin JT, Kempler D, and Seidenberg MS. Double dissociation of semantic categories in Alzheimer's disease. Brain and Language 1997, 57, 254-279.
24. French RM and Mareschal D. Could category-specific semantic deficits reflect differences in the distributions of features within a unified semantic memory? In: Proceedings of the 20th Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates, 1998, 374-379.
