poster5 - Department of Psychological Sciences

What is modularity good for?
Michael S. C. Thomas, Neil A. Forrester, Fiona M. Richardson
(m.thomas@bbk.ac.uk, n.forrester@bbk.ac.uk, f.richardson@psychology.bbk.ac.uk)
Developmental Neurocognition Lab, Birkbeck College, University of London, UK.
Results
Abstract
Note 2: Pinker (1999) proposed a Revised Dual Mechanism model, in which the regular mechanism learns
regulars but the exception mechanism attempts all verbs. Results also shown for this architecture.
Figure 2. Interference errors
[Plot residue: panels "Training set" and "Novel items"; series Modular, Emergent, Redundant, Revised DM; items Talked, Hit, Drank, Went, Wugged, Gktted; tick values 1, 5, 25, 100, 500; percentage axis 0% to 60%.]
No level of exception signal boosting gave the modular solution an advantage
over emergent or redundant architectures.
Figure 4. Developmental trajectories while boosting the signal strength from the
exception mechanism (biasing factor x1 to x1000)
[Plot residue: panels "Training set: Modular (high exception resources)", "Training set: Modular (low exception resources)" and "Novel items: Modular (high exception resources)"; series for biasing factors 1, 2, 5, 10, 50, 100, 200, 250, 1000; items Talked, Hit, Drank, Went, Wugged, Gktted, Frinked, Frank; tick values 1, 5, 25, 100, 500; percentage axis 0% to 100%.]
The modular solution was the least efficient use of the computational
resources. How general is this finding? It is suggestive that in adaptive
systems, co-operative use of resources to drive outputs is better than
competitive use when the mechanisms receive the same input. The modular
solution may be superior when a common input must drive separate outputs,
and the two output tasks rely on independent information present in the input.
• Use a 2-layer connectionist network for the
optimised rule-learning device (Pinker’s rule-learning
device is not specified in sufficient detail to implement)
Redundant => 2-layer and 3-layer both separately
trained on whole past tense problem; strongest signal
drives output
Discussion & Conclusions
Phonological specification of past tense problem
(Plunkett & Marchman, 1991)
Emergent specialisation => 2-layer network and 3-layer network both adapt to reduce error at output;
networks demonstrate partial emergent specialisation
of function (regulars+rule to 2-layer, exceptions to 3-layer)
Method
Pre-specified modularity => 2-layer network trained
on regular verbs; 3-layer network trained on
exceptions; strongest signal drives output
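One way to read the training regimes on this poster (including the Revised Dual Mechanism variant from Note 2) is as a rule for deciding which verbs each network is trained on. The sketch below is illustrative only: the function name `training_sets`, the dictionary keys, and the four-verb list (the poster's own example verbs) are assumptions, not the simulation code.

```python
from typing import Dict, List, Tuple

# (stem, past tense, verb type); types are "regular" or "exception".
Verb = Tuple[str, str, str]

# Example verbs taken from the poster text; a real training set is far larger.
VERBS: List[Verb] = [
    ("talk", "talked", "regular"),
    ("hit", "hit", "exception"),
    ("drink", "drank", "exception"),
    ("go", "went", "exception"),
]


def training_sets(architecture: str, verbs: List[Verb]) -> Dict[str, List[Verb]]:
    """Return the verbs each network is trained on under a given architecture."""
    regulars = [v for v in verbs if v[2] == "regular"]
    exceptions = [v for v in verbs if v[2] == "exception"]
    if architecture == "modular":
        # Pre-specified modularity: each mechanism sees only its own verb type.
        return {"two_layer": regulars, "three_layer": exceptions}
    if architecture == "revised_dm":
        # Note 2: the exception mechanism attempts all verbs.
        return {"two_layer": regulars, "three_layer": list(verbs)}
    if architecture in ("redundant", "emergent"):
        # Both networks see the whole problem. They are trained separately
        # (redundant) or adapt jointly to the shared output error (emergent).
        return {"two_layer": list(verbs), "three_layer": list(verbs)}
    raise ValueError(f"unknown architecture: {architecture}")


modular_sets = training_sets("modular", VERBS)
```

Here `modular_sets["two_layer"]` holds only the regular verb, while `modular_sets["three_layer"]` holds the three exceptions.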
The modular solution had fast rule
learning and strong generalisation of
the rule. But when the signal from the
exception mechanism was boosted to
allow exceptions to drive the output,
so that these verbs could be learned to
ceiling, the advantage on rule learning
was lost.
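The competition step, in which the strongest signal drives the output, and the boosting of the exception signal can be sketched as follows. The poster does not say how signal strength was measured, so `signal_strength` (mean distance of the output activations from 0.5) is an assumption, as are the function names and the example activations. The `exception_bias` parameter stands in for the biasing factor of Figure 4.

```python
from typing import List, Tuple


def signal_strength(activations: List[float]) -> float:
    """Assumed measure: mean distance of the activations from 0.5."""
    return sum(abs(a - 0.5) for a in activations) / len(activations)


def compete(
    rule_out: List[float],
    exception_out: List[float],
    exception_bias: float = 1.0,
) -> Tuple[str, List[float]]:
    """Return the winning mechanism and its output vector.

    The exception signal is multiplied by `exception_bias` before the
    comparison; ties go to the rule mechanism (an arbitrary choice).
    """
    rule_signal = signal_strength(rule_out)
    exception_signal = exception_bias * signal_strength(exception_out)
    if exception_signal > rule_signal:
        return "exception", exception_out
    return "rule", rule_out


# Made-up activations: the rule mechanism responds more crisply.
rule_out = [0.95, 0.05, 0.9]
exception_out = [0.7, 0.4, 0.6]

unboosted_winner, _ = compete(rule_out, exception_out)
boosted_winner, _ = compete(rule_out, exception_out, exception_bias=5.0)
```

With these example values the rule mechanism wins without boosting and the exception mechanism wins once its signal is multiplied by 5. This mirrors the trade-off described above, where a boost large enough to let exceptions drive the output also weakens the rule mechanism's control of it.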
Can use these same resources in three ways:
[Plot residue: panel labels "Novel items: Modular (low exception resources)", "High resources - Exception route", "Training set" and "High exception resources"; items Talked, Hit, Drank, Went.]
• Use a 3-layer connectionist network for the
optimal learning of arbitrary associations
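The 2-layer and 3-layer components named on this poster differ only in whether a hidden layer sits between input and output. A minimal forward-pass sketch, assuming sigmoid units, no bias weights, and made-up layer sizes and input vector (none of these are given in the poster), might look like this:

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random initial weights; the range is an arbitrary choice."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def feed(inputs: List[float], weights: Matrix) -> List[float]:
    """One layer of sigmoid units: each row of `weights` feeds one unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


class TwoLayerNet:
    """Input units connected directly to output units (no hidden layer)."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random) -> None:
        self.w = random_matrix(n_out, n_in, rng)

    def forward(self, x: List[float]) -> List[float]:
        return feed(x, self.w)


class ThreeLayerNet:
    """Input -> hidden -> output; n_hidden sets the exception resources."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, rng: random.Random) -> None:
        self.w_hidden = random_matrix(n_hidden, n_in, rng)
        self.w_out = random_matrix(n_out, n_hidden, rng)

    def forward(self, x: List[float]) -> List[float]:
        return feed(feed(x, self.w_hidden), self.w_out)


rng = random.Random(0)
stem = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]  # hypothetical phonological features
rule_net = TwoLayerNet(n_in=6, n_out=8, rng=rng)
exception_net = ThreeLayerNet(n_in=6, n_hidden=4, n_out=8, rng=rng)
rule_out = rule_net.forward(stem)
exception_out = exception_net.forward(stem)
```

Varying `n_hidden` is the natural place to model the low and high exception resource conditions mentioned in Note 1. Training (weight updates) is deliberately left out of this sketch.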
Table 1. Architectures with different modular commitments
[Table residue: numbered architectures classified by INPUT (Common or Separate), PROCESSING RESOURCES (Common or Separate) and OUTPUT (Common or Separate); the cell layout was lost in extraction.]
[Plot residue: items Talked, Hit-ed, Drink-ed, Go-ed; series Correct response and Interference errors; vertical axis "Proportion of responses", 0% to 80%; panel label "Low exception resources"; tick values 1, 5, 25, 100, 500.]
We explored the developmental trajectory of a modular approach to past
tense acquisition (Table 1, #3), and contrasted it with non-modular ways of
using the same initial computational resources. Does the modular solution
show the predicted advantage?
Pinker (1991) proposed that modularity would aid language development. E.g.,
in the English past tense, there is a duality between regular verbs (talk-talked)
+ rule generalisation (wug-wugged) and exceptions (hit-hit, drink-drank, go-went).
When children learn the past tense, they show intermittent over-application of
the rule to exceptions (e.g., *drinked). Pinker argued for a modular architecture,
with a rule-learning mechanism and an exception-learning mechanism.
Over-application errors arise as the child learns to coordinate the mechanisms.
However, the model has never been implemented.
How do modular systems learn their abilities? Table 1 shows some simple
architectures with different modular commitments. Calabretta et al. (2003)
trained a network with a common visual input to output ‘what’ and ‘where’
information about objects presented on the retina. They found modular
processing channels were the optimal architecture for learning (Table 1, #7
was better than #5). ‘What’ and ‘where’ information is independent and
modularity prevents interference.
[Plot residue: panel label "Low resources - Exception route"; series Emergent, Redundant, Revised DM; vertical axis "Proportion correct".]
Modularity was initially proposed as an efficient architecture for low-level
perceptual processing (Fodor, 1983). Latterly it was extended as a principle
that might apply to high-level cognition, with the architecture shaped by
natural selection (Pinker, 1997).
[Plot residue: label "Over-application of the rule to exception verbs"; series Modular.]
Introduction
Figure 3. Developmental trajectories
What’s the problem? The modules try to drive the common output in different
ways and the competition between them must be resolved. Co-operative
processing is more efficient.
Note 1: Results were sensitive to hidden unit resource levels in the (3-layer) exception mechanism.
Results for both low and high resources shown.
Modularity is bad for: When components receive information from a common
input and have to drive a common output
Was the modular solution best? No, it was worse than both emergent and
redundant solutions, and indeed failed to learn the exception verbs to ceiling
(Fig. 3). The modular solution struggled to resolve the competition between the
different output responses of the two modules. Indeed, because the regular
mechanism was learning a simpler function, it produced a stronger response
than the exception mechanism and generally overrode it.
Modularity is good for: When computational components drive separate
outputs and the information required by each output is independent
All architectures exhibited a phase of interference (*drinked) errors (Fig. 2).
These were not solely diagnostic of the modular solution.
Modular systems have been proposed as efficient architectures for high-level
cognition. However, such architectures are rarely implemented as
developmental systems. Taking the example of past tense, we find that
pre-specified modularity is less efficient for learning than using the same
computational resources in different ways (to produce emergent or redundant
systems).
Acknowledgements
This research was supported by UK MRC CE Grant G0300188 to Michael Thomas.
Figure 1. Computational resources used to learn the past tense problem.
(S) A selection mechanism allows components to learn domain-specific
mappings. (C) A competition mechanism determines which mechanism drives
the output (cf. Pinker’s ‘blocking’ device). The modular solution uses S+C.
The emergent solution uses neither. The redundant solution uses only C.
[Diagram labels: Input, Hidden, Output, S, C.]
References
1. Calabretta, R., Di Ferdinando, A., Wagner, G. P., & Parisi, D. (2003). What does it take to evolve
behaviorally complex organisms? Biosystems, 69, 245-262.
2. Fodor, J. A. (1983). The modularity of mind. Cambridge, MA: MIT Press.
3. Pinker, S. (1991). Rules of language. Science, 253, 530-535.
4. Pinker, S. (1997). How the mind works. Allen Lane.
5. Pinker, S. (1999). Words and rules. London: Weidenfeld & Nicolson.
6. Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered
perceptron. Cognition, 38, 43-102.