Connectionism
Connectionism, the alternative paradigm, holds (like computationalism, the
"symbolic paradigm") that the brain is a large neural network wherein
computations over representations take place, a computation being a mapping
from an input vector to an output vector.
The essential difference between these approaches concerns the nature of
representations: in connectionist theory, in the most important neural networks, the
representations are distributed.
Exclusive OR (XOR) problem: the Perceptron Convergence Procedure (Rosenblatt 1962),
applied to a two-layer network with the Hebbian rule (or its more powerful variant, the delta rule),
cannot solve the XOR problem. (See also Perceptrons, Minsky and Papert 1969.)
The solution: internal representations (hidden units), as in the sketch below. (Elman et al., pp. 60-66)
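A minimal sketch of the point (assuming Python with numpy; the weights are hand-chosen for illustration, not taken from the cited texts): a single threshold unit cannot compute XOR, but a layer of hidden units that re-represents the input makes the problem solvable.

```python
import numpy as np

def step(x):
    """Threshold activation: the unit fires (1) when net input exceeds 0."""
    return (x > 0).astype(int)

# Hand-set weights for XOR using two hidden units (an OR and an AND detector).
# These values are illustrative; any weights realizing the same logic work.
W_hidden = np.array([[1.0, 1.0],    # hidden unit 1 ~ OR(x1, x2)
                     [1.0, 1.0]])   # hidden unit 2 ~ AND(x1, x2)
b_hidden = np.array([-0.5, -1.5])   # thresholds for OR and AND
W_out = np.array([1.0, -2.0])       # XOR = OR and not AND
b_out = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W_hidden @ np.array(x) + b_hidden)   # internal representation
    y = step(W_out @ h + b_out)
    print(x, "->", int(y))                        # prints 0, 1, 1, 0
```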
According to Clark, there are three generations of connectionist networks.
I. First generation: “The input units send signals to the hidden units. Each hidden
unit computes its own outcome and then sends the signal to the output units. If a
neural net were to model the whole human nervous system, the input units would be
analogous to the sensory neurons, the output units to the motor neurons, and the
hidden units to all other neurons.” (Garson 2007)
Other networks: (Elman et al., p. 51)
The pattern of activation in the network is determined by the weights on the connections. "It
is in the weights that knowledge is progressively built up in a network." (Elman et al.,
p. 51) Each input unit receives input external to the net. Input units send their
activation value to hidden units. "Each of these hidden units calculates its own
activation value depending on the activation values it receives from the input units."
(Garson 2007)
each hidden unit is sensitive to complex, often subtle, regularities that connectionists call
microfeatures. Each layer of hidden units can be regarded as providing a particular distributed
encoding of the input pattern (that is, an encoding in terms of a pattern of microfeatures).
(Bechtel & Abrahamsen 2002, p. 42, or Hinton, McClelland and Rumelhart, 1986, PDP:3, pp. 80-1, in Bechtel 2002, p. 51)
The same phenomena take place between hidden units and output units. Essential for
neural nets is that the activation of a net is determined by its weights, which can be
positive (excitatory) or negative (inhibitory).
The activation value for each receiving unit: “The function sums together the
contributions of all sending units, where the contribution of a unit is defined as the
weight of the connection between the sending and receiving units times the sending
unit's activation value.”
aj = the activation of node j, which sends output to node i; the weight of the connection is wij. The single input
from j is wij·aj.
Ex.: a node j with output 0.5 and a connection to i with weight −2.0 → (0.5 × −2.0) = −1.0
One node receives inputs from many nodes. The net input to node i:
neti = ∑j wij aj
- the total input received by the node.
(Elman et al. 1996, pp. 51-2)
The output, as with neurons, is not the same as the input: what a node "does" (its response
function) determines the node's activation value, which can be a linear or nonlinear function
of the net input (a sigmoid activation function or others). (Elman et al., p. 53) Nonlinear = "the
numerical value of the output is not directly proportional to the sum of the inputs."
(Clark 2001, p. 63)
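A minimal sketch of these two formulas (assuming numpy; the activation values and weights are illustrative, reusing the 0.5 × −2.0 example above):

```python
import numpy as np

def net_input(weights, activations):
    """net_i = sum_j w_ij * a_j : the weighted sum over all sending units."""
    return np.dot(weights, activations)

def sigmoid(net):
    """A standard nonlinear response function: the output is not
    directly proportional to the summed input."""
    return 1.0 / (1.0 + np.exp(-net))

a = np.array([0.5, 1.0, 0.0])      # activations of three sending units
w = np.array([-2.0, 0.75, 1.5])    # weights into the receiving unit
net = net_input(w, a)              # (0.5 * -2.0) + (1.0 * 0.75) + 0 = -0.25
print(net, sigmoid(net))           # nonlinear squashing into (0, 1)
```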
Rumelhart and McClelland 1982: three layers of nodes, for words, letters, and orthographic
features. (Ex.: the node "trap" receives positively weighted input from the letter nodes
"t", "r", "a", and "p"; it is inhibited by other word nodes. Elman et al., p. 55)
One basic principle: similarity. If a network has classified a pattern (11110000) in a
certain way, then it will tend to classify a novel pattern (11110001) in a similar way,
as in the sketch below. (Elman et al., p. 59)
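A minimal sketch of why this happens (assuming numpy; the weight matrix is a random stand-in for any trained layer): the two patterns share seven of eight bits, so the weighted sums they produce, and hence the responses, nearly coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))        # stand-in weights of a trained layer

p_old = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # familiar pattern
p_new = np.array([1, 1, 1, 1, 0, 0, 0, 1])   # novel pattern, one bit changed

out_old, out_new = W @ p_old, W @ p_new
print(np.linalg.norm(out_old - out_new))     # the difference is small...
print(np.linalg.norm(out_old))               # ...relative to the response itself
```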
Similarity → generalizations vs. the "tyranny of similarity". (McLeod, Rolls, Plunkett
1996)
The task for neural nets is to find the weights that correspond to a particular task. The
most widely used training method is the backpropagation rule. (It does not
correspond to human learning processes! Other methods: self-supervised learning,
unsupervised learning.) (Elman et al., pp. 66-7)
Step I: The error is the difference between the activation of a given output unit (actual
output) and the activation it is supposed to have (target output).
Step II: Adjust the weights leading into those output units → decrease the error.
“The local feedback … is provided by the supervisory system that determines whether
a slight increase or decrease in a given weight would improve performance (assuming
the other weights remain fixed). This procedure, repeated weight by weight and layer
by layer, effectively pushes the system down a slope of decreasing error.” (Clark
2001, p. 65) The algorithm: "We propagate the error information (… = error signal)
backwards in the network from output units to hidden units." (Elman et al. 1996, p. 67)
“… the ability to learn may change over time - not as a function of any explicit change
in the mechanism, but rather as an intrinsic consequence of learning itself. The
network learns, just as children do.” (Elman et al, p. 70)
Learning as gradient descent in weight space. (Elman et al, pp. 71-2)
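A minimal sketch of Steps I and II for a single output unit (assuming numpy and sigmoid units; the values and learning rate are illustrative, and a full backpropagation pass would push the same error signal back through the hidden layers):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One output unit fed by two sending (hidden) units; illustrative values.
w = np.array([0.3, -0.8])     # weights into the output unit
a = np.array([0.9, 0.4])      # activations of the sending units
target, eta = 1.0, 0.5        # target output and learning rate

for _ in range(20):
    actual = sigmoid(w @ a)
    error = target - actual                  # Step I: actual vs. target output
    delta = error * actual * (1 - actual)    # error signal (sigmoid derivative)
    w += eta * delta * a                     # Step II: adjust the weights,
                                             # descending the error surface
print(w, sigmoid(w @ a))      # the output has moved toward the target
```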
NETtalk (Sejnowski and Rosenberg 1986, 1987). The task: to turn written input into coding
for speech (grapheme-to-phoneme conversion). (Clark 2001, p. 63)
- DECtalk vs. NETtalk: a classical program (explicitly programmed, rules and
exceptions) vs. a net that "learned to solve the problem using a learning algorithm and …
example cases…" (p. 63) During learning, the speech output progressed "from initial
babble to semirecognizable words and syllable structure, to … a fair simulacrum of
human speech." (p. 63) However, it had no semantic depth.
Superpositional storage: “Two representations are fully superposed if the resources
used to represent item 1 are coextensive with those used to represent item 2.” (Clark
1997, p. 169) A network achieves superpositional storage of two items if, having encoded item 1,
… it then goes on to encode the information about item 2 by amending the set of original weightings in
a way that preserves the functionality (some desired input-output pattern) required to represent item 1
while simultaneously exhibiting the functionality required to represent item 2. (Clark 1997, p. 170)
Superposition results from the combination of two characteristics:
(1) the use of distributed representations and (2) the use of a learning rule that imposes a
semantic metric on the acquired representations. (Clark 1997, p. 170)
"… semantically related items are represented by syntactically related (partially
overlapping) patterns of activation." (Clark 2001, p. 66) (Ex.: cat and panther vs. fox)
Or “The semantic (…) similarity between representational contents is echoed as a
similarity between representational vehicle.” (Clark 1997, p. 171) → Prototype
extraction (category or concept) + generalization.
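A minimal sketch of superposition in its simplest form, a Hebbian outer-product store (assuming numpy; the patterns are illustrative and chosen orthogonal so recall is exact): both items end up encoded in one and the same weight matrix, the second by amending the weights laid down for the first.

```python
import numpy as np

# Two input -> output associations stored superpositionally in ONE matrix.
x1, y1 = np.array([1., -1., 1., -1.]), np.array([1., 1., -1., -1.])
x2, y2 = np.array([1., 1., -1., -1.]), np.array([-1., 1., -1., 1.])

W = np.outer(y1, x1) / 4          # encode item 1
W += np.outer(y2, x2) / 4         # amend the SAME weights to encode item 2

print(W @ x1)   # ~ y1: the functionality for item 1 is preserved
print(W @ x2)   # ~ y2: item 2 is carried by the very same resources
```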
Against the symbolic paradigm (Chomsky, Fodor, etc.), neural nets do not work with innate
rules (and representations). Such rules appear as a natural effect of
training (learning): the learning rules impose the semantic metric on the
acquired representations. (Clark 1997, p. 171) In Bechtel & Abrahamsen 2002, the
whole of Chapter 5, "Are rules required to process representations?", is dedicated to the
same topic.

Intrinsic context-sensitivity, or Smolensky's "subsymbolic paradigm":
Physical symbol system approaches display semantic transparency (familiar words and
ideas rendered as simple inner symbols).
Fodor and Pylyshyn (1988) - against connectionism (Bechtel & Abrahamsen 2002,
Chapter 6): LOT with compositionality, systematicity, and productivity. (See Week 4)
“Symbolic representations have a combinatorial syntax and semantics.” (Bechtel &
Abrahamsen 2002, p. 157) Dennett: “The syntactic engine mimics a semantic engine.”
(Bechtel & Abrahamsen 2002, p. 157)
Fodor and Pylyshyn consider connectionism to lack a combinatorial syntax
and semantics. For them, connectionism is merely an implementation of a symbolic
system. (Fodor and Pylyshyn 1988)
vs.
Connectionism: "fine-grained context sensitivity".
A representation of an item is given by a
distributed pattern of activity that contains sub-patterns appropriate to the feature-set involved… [A]
network will be able to represent several instances of such an item, which may differ in respect of one
or more features. …[S]uch “near neighbors” will be represented by similar internal representational
structures, that is, the vehicles of the several representations (activation patterns) will be similar to each
other in ways that echo the semantic similarity of the cases – that is the semantic metric (see above) in
operation. (Clark 1997, p. 174)
and thus
"… the contentful elements in a subsymbolic program do not directly recapitulate the
concepts we use 'to consciously conceptualize the task domain' (Smolensky, 1988, p.
5) and that 'the units do not have the same semantics as words of natural language'
(p. 6)." (Clark 2001, p. 67) In Clark's words, the unit-level activation differences can
mirror the details of various mental functions in interactions with "real-world
contexts". The knowledge comes from the training data → "post-training analysis" (statistical
analysis and systematic interference).
Smolensky (1988): A connectionist state is a pattern of activity (within an activation
space), which contains constituent subpatterns. A pattern of activity cannot be
decomposed into conceptual constituents as in the symbolic paradigm. The
connectionist decomposition is an approximate one: a complex pattern contains
constituent subpatterns that are not defined precisely and exactly, but depend on
context. The constituent structure of a subpattern is strongly influenced by the inner
structure included within it. (See the example with a cup with coffee in Smolensky
1988). The conceptual constituents of mental states are vectors of activity with a
special kind of constituent structure: the activation of individual units. The
connectionist representations have constituents, but these constituents are functional
parts of the complex representations, not effective parts of a concatenative scheme; the
constituent relations are not instantiated in a part-whole type of relation. While the
classical approach deals with a concatenative type of compositionality, connectionism
stresses functional compositionality (van Gelder 1990). For van Gelder, concatenation
means "linking or ordering successive constituents without altering them in any way",
and concatenative representations "must preserve tokens of an expression's constituents
(and the sequential relations among tokens)" (p. 360). Functional compositionality means
that the constituents of a representation can be recovered from it through certain
operations. (van Gelder 1990, p. 360, in Bechtel & Abrahamsen 2002, p. 170 and section 6.3.1) His examples
are Pollack’s RAAM nets, Hinton’s (1990) reduced descriptions of levels in
hierarchical trees, and Smolensky’s (1990) tensor product representations of binding
relations.
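A minimal sketch of the tensor-product idea (assuming numpy; the role and filler vectors are illustrative): constituents are bound to roles by an outer product and superposed, then recovered functionally, by an operation, rather than read off as concatenated parts.

```python
import numpy as np

# Orthonormal role vectors (illustrative) and arbitrary filler vectors.
r_agent, r_patient = np.array([1., 0.]), np.array([0., 1.])
f_john, f_mary = np.array([0.9, 0.1, 0.3]), np.array([0.2, 0.8, 0.5])

# Bind each filler to its role by an outer product; superpose the bindings.
T = np.outer(f_john, r_agent) + np.outer(f_mary, r_patient)

# Unbind: project the tensor back onto a role vector.
print(T @ r_agent)    # ~ f_john: the agent filler is recovered
print(T @ r_patient)  # ~ f_mary: the patient filler is recovered
```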
The difference between classical approach and connectionism is that
In the symbolic paradigm the context of a symbol is manifest around it and consists of other symbols; in
the subsymbolic paradigm the context of a symbol is manifest inside it, and consists of subsymbols.
(Smolensky 1988, p. 17)
About the coffee example:
The compositional structure is there, but it's there in an approximate sense. It's not equivalent to taking a
context-independent representation of coffee and a context-independent representation of cup - and
certainly not equivalent to taking a context-independent representation of the relationship in or with -
and sticking them all together in a symbolic structure, concatenating them together to form syntactic
compositional structure like "with (cup, coffee)." (Smolensky, 1991, p. 208)
(Clark 1997, p. 175)
Such nets “do not involve computations defined over symbols. Instead, any given
accurate (i.e., fully predictive) picture of the system's processing will need to be given
at the numerical level of units and weights and activation-evolution equation…” and
so
there are no syntactically identifiable elements that both have a symbolic interpretation and can figure
in a full explanation of the totality of the system’s semantic good behaviour, that is, “There is no
account of the architecture in which the same elements carry both the syntax and the semantics”
(Smolensky, 1991, p. 204).
(Clark 1997, p. 175)
and
Mental representations and mental processes are not supported by the same formal entities - there are
not "symbols" that can do both jobs. The new cognitive architecture is fundamentally two-level;
formal, algorithmic specification of processing mechanisms on the one hand, and semantic
interpretation on the other, must be done at two different levels of description. (Smolensky, 1991, p. 203)
(Clark 1997, p. 175)
In Smolensky's words, on one level, mental processes are represented by "numerical
level descriptions of units, weights and activation-evolution equation". (Clark, p. 175)
At this level we cannot find the semantic interpretation. On the other level, the "large
scale activity of such systems allows interpretation but the patterns thus fixed on are
not capable of figuring in accurate descriptions of the actual course of processing.
(See Smolensky, op. cit., p. 204)" (Clark, p. 176) The semantic metric of the system
imposes a similarity of content where there is a similarity of vehicle (similar
patterns).
Clark emphasizes that such coding systems exploit "more highly structured syntactic
vehicles than words." →
- Economical use of representational resources.
- "Free" generalization: a new input, if it resembles an old one, will yield a response
rooted in that partial overlap → sensible responses to new inputs are possible.
- Graceful degradation (the ability to produce sensible responses given some systemic
damage), pattern completion, damage tolerance. (Clark, pp. 66-7)
Fodor and McLaughlin 1990, McLaughlin 1993: against Smolensky; vs. Hadley and
Hayward 1997, Christiansen and Chater 1994.
II. Second generation: the temporal structure. In 1990, 1991 and 1993, Elman
created recurrent neural nets that have something more than classical feedforward nets:
besides the signal passing from the inputs to the hidden units and finally to the outputs,
a signal is sent back from the output or hidden units to context units at the input level.
In this way, the recurrent net stands in for human short-term memory.
According to Bechtel & Abrahamsen (2002), sentences have two features related to time:
"(1) they are processed sequentially in time; (2) they exhibit long-distance dependencies,
that is, the form of one word (or larger constituent) may depend on another that is located
at an indeterminate distance." (Verbs must agree with their subjects even when a relative
clause intervenes between the subject and the verb.) (p. 179) To produce such sentences,
the net has to incorporate such relationships without using explicit representations of
linguistic structures.
(Elman et al., pp. 74-5, p. 81)
Elman's simple recurrent network (1990) (150 hidden units linked to context units that
incorporate information about previous words → the net processes sentences
sequentially in time by grasping the dependencies between nonadjacent words;
Bechtel & Abrahamsen 2002, p. 181): the task is to predict successive words in a sentence.
The input is one word (a localist representation) at a time. The output of the network is a
prediction of the next word. After the network's output (the predicted word), the
backpropagation rule adjusts the weights. Then the next word is given as input. This
process is reiterated over thousands of sentences.
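A minimal sketch of one processing step of such a network (assuming numpy; the sizes, weights, and word sequence are toy stand-ins for Elman's actual 150-hidden-unit net):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_words, n_hidden = 10, 8      # toy sizes; Elman's net had 150 hidden units
W_in = rng.normal(0, 0.1, (n_hidden, n_words))    # input -> hidden
W_ctx = rng.normal(0, 0.1, (n_hidden, n_hidden))  # context -> hidden
W_out = rng.normal(0, 0.1, (n_words, n_hidden))   # hidden -> output

context = np.zeros(n_hidden)           # context units start empty
for word in [3, 1, 7]:                 # a toy "sentence" of word indices
    x = np.zeros(n_words)
    x[word] = 1.0                      # localist representation: one unit per word
    hidden = sigmoid(W_in @ x + W_ctx @ context)  # current word + previous state
    context = hidden.copy()            # copy hidden activations into context units
    prediction = sigmoid(W_out @ hidden)          # next-word prediction; training
                                                  # would backpropagate the error
                                                  # against the actual next word
```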
The hidden units define a high-dimensional space (a 150-dimensional hypercube): "the
network would learn to represent words which 'behave' in similar ways (i.e., have
similar distributional properties) with vectors which are close in this internal
representation space." (Elman et al., p. 94) This space cannot be visualized; therefore: a
hierarchical clustering tree of the words' hidden-unit activation patterns. (Elman et al.,
p. 96, or Clark 2001, pp. 68-73) It means "capturing the hidden unit activation pattern
corresponding to each word, and then measuring the distance between each pattern
and every other pattern. These inter-pattern distances are nothing more than the
Euclidean distance between vectors in activation space" → the hierarchical clustering
tree, "placing similar patterns close and low on the tree, and more distant groups on
different branches." (pp. 94-5) The resulting clusters: VERBS and NOUNS, with animates
and inanimates as subclusters among the nouns. (Context-sensitivity: "tokens of the same
type are all spatially proximal, and closer to each other than to tokens of any other type."
Elman et al., p. 97) A sketch of the analysis follows below.
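A minimal sketch of this analysis (assuming numpy, SciPy, and matplotlib; the activation vectors are stand-ins for the patterns a trained net would produce):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

# Stand-in hidden-unit activation vectors, one per word (Elman's were 150-d).
words = ["cat", "dog", "eat", "chase"]
patterns = np.array([[0.9, 0.1, 0.2],   # "cat"   animate nouns end up close...
                     [0.8, 0.2, 0.1],   # "dog"
                     [0.1, 0.9, 0.8],   # "eat"   ...and verbs on another branch
                     [0.2, 0.8, 0.9]])  # "chase"

distances = pdist(patterns, metric="euclidean")  # inter-pattern distances
tree = linkage(distances, method="average")      # hierarchical clustering tree
dendrogram(tree, labels=words)                   # similar patterns sit close
plt.show()                                       # and low on the tree
```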
The net “discovered” categories - verbs, nouns, animate, inanimate - “properties that
were good clues to grammatical role in the training corpus used.” (Clark, p. 71)
Elman uses "cluster analysis" and "principal component analysis" (PCA) to
determine what the networks learned. For NETtalk, cluster analysis was used
(the network learned a set of static distributed symbols → the relations of similarity
and difference between static states); for an SRN, PCA additionally shows what
"can promote or impede movement into future states" = "temporally rich information-processing
detail" → dynamic representation. (Clark 2001, pp. 71-2)
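A minimal sketch of the PCA step (assuming numpy and scikit-learn; the recorded hidden states are random stand-ins for a trained SRN's trajectory):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in trajectory: the hidden-state vector recorded after each word
# as a sentence is processed (rows = time steps, columns = hidden units).
rng = np.random.default_rng(1)
trajectory = rng.normal(size=(12, 8))   # 12 words, 8 hidden units (toy sizes)

pca = PCA(n_components=2)               # keep the two main axes of variation
path = pca.fit_transform(trajectory)    # each row: position in reduced state space
print(path)                             # successive rows trace how each word
                                        # moves the net through its state space
```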
There is no separate stage of lexical retrieval. There are no representations of words in isolation. The
representations of words (the internal states following input of a word) always reflect the input taken
together with the prior state ... the representations are not propositional and their information content
changes constantly over time in accord with the demands of the current task. Words serve as guideposts
which help establish mental states that support (desired) behaviour.
(Elman, 1991b, p. 378, in Clark 2001, p. 72)
SRN: without any knowledge of semantics, the net learns to group the encodings of
animate objects together only because they were distributed similarly in the training
corpus. (Bechtel & Abrahamsen, p. 182)
Strong Representational Change
vs. weak representational change (Fodor) = the concepts are innate, not learned. For
Fodor, concept learning involves two processes: "(1) triggering of an innate
representational atom and/or (2) the deployment of such atoms in generate and test
style learning."
Connectionism is against this picture. Its models acquire "domain knowledge" only
through learning mechanisms. (See the past-tense learning net, Rumelhart and
McClelland 1986.) However, the initial architecture (units, layers) amounts to a kind of "little
knowledge", though not innate symbols. In a net, the weights are essential, and their
content depends on training the net. Bates and Elman consider the encoding 90% learned
and 10% innate. (1992, in Clark, p. 183) In a net, the training environment determines
"both the knowledge and the processing profile acquired by a network." There can be a
kind of functional modularity: sets of units "that are powerfully connected among
themselves and relatively weakly connected to units outside the set". (Rumelhart and
McClelland 1986b, p. 141, in Clark, p. 182) Through training there are qualitative
changes in a network. (See the U-curve effect, Plunkett and Marchman 1991.)
Essential is the "deep interpenetration of knowledge and processing
characteristics" in a network. Processing involves the weights creating patterns of
activation → outputs. But these weights are the stored knowledge of the network. "And
new knowledge has to be stored superpositionally, that is, by amending existing
weights." (p. 184) "Text (knowledge) and process (the use and alteration of
knowledge) are thus inextricably intertwined." And "Where the classicist thinks of
mind as essentially static, recombinable text, the connectionist thinks of a highly fluid,
environmentally coupled dynamic process." (p. 184)
Neural nets can perform various tasks. “Experiments on models of this kind have
demonstrated an ability to learn such skills as face recognition, reading, and the
detection of simple grammatical structure.” (Garson 2007) The first important task for
a net after its training was to predict the irregular past tense of verbs (PDP Group,
1986). Other nets can recognize faces or associate images with labels. (Plunkett &
Marchman 1991, 1993, in Elman et al., pp. 124-129)
“Another influential early connectionist model was a net trained by Rumelhart and
McClelland (1986) to predict the past tense of English verbs.” [Elman et al, Chapter
3, p. 131-7] → Regular vs. irregular verbs.
Classical approach: there are two mechanisms (one for regular verbs, the other for
irregular verbs).
vs.
Connectionism: only one mechanism (a single set of connections for regular and
irregular verbs). Rumelhart and McClelland used a single-layered network and the
perceptron convergence procedure → such nets are capable of learning only linearly
separable problems, but the past-tense problem is a nonlinear one. (Elman et al. 1996, p. 137)
Rumelhart and McClelland (1986, PDP:18)
Lawful behaviour and judgments may be produced by a mechanism in which there is no explicit
representation of the rule. Instead, we suggest that the mechanisms that process language and make
judgments of grammaticality are constructed in such a way that their performance is characterizable by
rule, but that the rules themselves are not written in explicit form anywhere in the mechanism. (1986a, p. 217)
(Bechtel & Abrahamsen 2002, p. 121)
We have shown that a reasonable account of the acquisition of past tense can be provided without
recourse to the notion of a "rule" as anything more than a description of the language. We have shown
that, for this case, there is no induction problem. The child need not figure out what the rules are, nor
even that there are rules. The child need not decide whether a verb is regular or irregular. … A
uniform procedure is supplied as input to the past-tense network and the resulting pattern of activation
is interpreted as a phonological representation of the past form of that verb. This is the procedure
whether the verb is regular or irregular, familiar or novel. (1986a, p. 267)
(Bechtel & Abrahamsen 2002, p. 135)
Debates: Pinker & Prince (1988) – "… a poor job of generalizing to some novel
regular verbs. … Nets may be good at making associations and matching patterns, but
- fundamental limitations in mastering general rules such as the formation of the
regular past tense." (Garson 2007) (See section 5.3 in Bechtel & Abrahamsen 2002, pp. 135 ff.)
“Despite Pinker and Prince's objections, many connectionists believe that
generalization of the right kind is still possible (Niklasson and van Gelder 1994).”
(Garson 2007)
Plunkett and Marchman (1991, 1993): a net with hidden units → the U-shaped curve,
reproducing the patterns of error observed in children. (Elman et al., pp. 137-47, or
Bechtel & Abrahamsen, section 5.4)
Differentiation at the behavioral level need not necessarily imply differentiation at the level of
mechanism. Regular and irregular verbs can behave quite differently even though represented and
processed similarly in the same device.
(Elman et al, p. 139)
The net was first trained on a set containing a large number of irregular verbs, and later on a set of 460
verbs containing mostly regulars. The net learned the past tenses of the 460 verbs in about 200 rounds
of training, and it generalized fairly well to verbs not in the training set. It even showed a good
appreciation of "regularities" to be found among the irregular verbs (‘send’ / ‘sent’, ‘build’ / ‘built’;
‘blow’ / ‘blew’, ‘fly’ / ‘flew’). During learning, as the system was exposed to the training set
containing more regular verbs, it had a tendency to overregularize, i.e., to combine both irregular and
regular forms: (‘break’ / ‘broked’, instead of ‘break’ / ‘broke’). This was corrected with more training.
It is interesting to note that children are known to exhibit the same tendency to overregularize during
language learning.
(Garson 2007)
III. Third generation: "dynamical connectionism" (Wheeler 1994, Port and van
Gelder 1995) adds neurobiologically realistic features to the basic units and weights.
(Clark, pp. 72-3)
Some philosophers: neural nets, with their distributed representations, are similar to the
brain's structure; neural nets = an implementation of the mind.
Others: neural nets = the mind.
Neural nets have strengths (motor control, pattern recognition) and weaknesses (planning
and sequential logical derivation). (Clark, p. 73)
Connectionism vs. classical approach
- Connectionism eliminates the homunculus.
The Belousov-Zhabotinsky reaction: a classical example of emergent behaviour.
“Connectionist models are attractive because they provide a computational framework
for exploring the conditions under which such emergent properties occur.” (Elman et
al, p. 85)
- Innate (Chomsky, Fodor- mental representations and rules are innate) vs. learning: 3
classes of constraints (in Rethinking Innateness): representations, architecture, timing.
- Connectionist representations: local or distributed representations. (Elman p. 90-2)
Distributed representations computationally encode information concerning
similarities and differences. (Clark, p. 66) "A distributed pattern of activity can
encode 'microstructural' information such that variations in the overall pattern reflect
variations in the content." (Clark, p. 66)
- No modularity for neural nets at the beginning of training. (Elman, pp. 100-1) → "There is a
huge difference between starting modular and becoming modular." (p. 101)
- Rules for neural nets: even if they are not capable of productive and systematic behavior, they
do have certain "rules", "since networks are function approximators and functions are
nothing if not rules." (Elman, p. 102)
Against connectionism- (Clark 2001)
(a) Mental causation (Ramsey, Stich, and Garon, 1991 in Clark, p. 73-6)
- “Propositional modularity”: in common talk- “functionally discrete semantically
interpretable states that play a causal role in the production of behaviour” (p. 204,
their emphasis in Clark, p. 73)
- "Propositional modularity": individual beliefs function as discrete causes of
specific actions. (Clark, p. 74)
- No such “propositional modularity” for distributed connectionist processing- mainly
because of the use of “superpositional” information storage that produces “total
causal holism”. (Clark, p. 74-5)
(b) Systematicity (Fodor and Pylyshyn 1988)
- Fodor’s LOT: compositionality, systematicity and productivity.
The systematicity of thought is an effect of the compositionally structured inner base, which includes
manipulable inner expressions meaning “John” “loves” “Mary” and resources for combining them.
(Clark, p. 77)
Replies: (1) The classical approach is not the only way to support systematicity; (2) this
property derives from the grammatical structure of human language. (Clark, p. 77)
- Examples for systematicity in nn: Smolensky’s tensor product, Chalmers’s net (that
uses recursive autoassociative memory – RAAM)
(c) Biological reality
1. The use of artificial tasks and choice of input and output representations
- The choice of problem domain and training materials
- “Horizontal microworlds”: parts of human cognition (past tense, simple grammars,
etc.) (Clark, 79-80)
2. Small resources of units and connections vs. brain (Clark, p. 80)
3. The enormous differences between nn and the brain. (p. 81)
The sentences were formed from a simple vocabulary of 23 words using a subset of English grammar.
The grammar, though simple, posed a hard test for linguistic awareness. It allowed unlimited formation
of relative clauses while demanding agreement between the head noun and the verb. So for example, in
the sentence "Any man that chases dogs that chase cats … runs," the singular 'man' must agree with
the verb 'runs' despite the intervening plural nouns ('dogs', 'cats') which might cause the selection of
‘run’. One of the important features of Elman's model is the use of recurrent connections. The values at
the hidden units are saved in a set of so called context units, to be sent back to the input level for the
next round of processing. This looping back from hidden to input layers provides the net with a
rudimentary form of memory of the sequence of words in the input sentence. Elman's nets displayed an
appreciation of the grammatical structure of sentences that were not in the training set. The net's
command of syntax was measured in the following way. Predicting the next word in an English
sentence is, of course, an impossible task. However, these nets succeeded, at least by the following
measure. At a given point in an input sentence, the output units for words that are grammatical
continuations of the sentence at that point should be active and output units for all other words should
be inactive. After intensive training, Elman was able to produce nets that displayed perfect
performance on this measure including sentences not in the training set.” (Garson 2007) “Marcus
(1998, 2001) argues that Elman's nets are not able to generalize this performance to sentences formed
from a novel vocabulary. This, he claims, is a sign that connectionist models merely associate
instances, and are unable to truly master abstract rules. On the other hand, Phillips (2002) argues that
classical architectures are no better off in this respect. The purported inability of connectionist models
to generalize performance in this way has become an important theme in the systematicity debate.”
(Garson 2007)
Distributed representations for complex expressions like ‘John loves Mary’ can be constructed that do
not contain any explicit representation of their parts (Smolensky 1991). The information about the
constituents can be extracted from the representations, but neural network models do not need to
explicitly extract this information themselves in order to process it correctly (Chalmers 1990). This
suggests that neural network models serve as counterexamples to the idea that the language of thought
is a prerequisite for human cognition. However, the matter is still a topic of lively debate (Fodor 1997).
(Garson 2007)