Lecture 7: Basics of Neural Nets and Past-Tense model

COM1070: Introduction to
Artificial Intelligence: week 9
Yorick Wilks
Computer Science Department
University of Sheffield
www.dcs.shef.ac.uk/-yorick
Rule-system
Linguists: stress importance of rules in describing
human behaviour.
We know the rules of language, in that we are able to
speak grammatically, or even to make judgements of
whether a sentence is or is not grammatical.
But this does not mean we know these rules the way we know
the rule ‘i before e except after c’: we may not be able to
state them explicitly.
But it has been held (e.g. Pinker, 1984, following
Chomsky) that our knowledge of language is stored
explicitly as rules. Only we cannot describe them
verbally, because they are written in a special code
that only the language processing system can
understand:
Explicit inaccessible rule view
Alternative view: no explicit inaccessible rules. Our
performance is characterisable by rules, but they are
emergent from the system, and are not explicitly
represented anywhere.
e.g. honeycomb: structure could be described by a rule,
but this rule is not explicitly coded. Regular structure
of honeycomb arises from interaction of forces that
wax balls exert on each other when compressed.
Parallel distributed processing view: no explicit (albeit
inaccessible) rules.
Advantages of using NNs to model aspects of human
behaviour.
- Neurally plausible, or at least ‘brain-style computing’.
- Learned: not explicitly programmed.
- No explicit rules; permits new explanation of
phenomenon.
- Model both produces the behaviour and fits the data:
errors emerge naturally from the operation of the
model.
Contrast to symbolic models in all 4 respects (above)

Rumelhart and McClelland:
‘…lawful behaviour and judgements may be produced by
a mechanism in which there is no explicit
representation of the rule. Instead, we suggest that
the mechanisms that process language and make
judgements of grammaticality are constructed in such
a way that their performance is characterizable by
rules, but that the rules themselves are not written in
explicit form anywhere in the mechanism.’
Important counter-argument to linguists, who tend to
think that people are applying syntactic rules.
Point: can have syntactic rules that describe language,
but that doesn’t mean that when we speak
syntactically (as if we were following those rules)
we literally are following rules.
Many philosophers have made a similar point against
the reality of explicit rules, e.g. Wittgenstein.
The ANN approach provides a computational model of
how that might be possible in practice: to have the
same behavioural effect as rules but without there
being any rules anywhere in the system.
On the other hand, the standard model of science is of
many possible rule systems describing the same
phenomenon; that also allows that the real rules (in a
brain) could be quite different from the ones we
invent to describe a phenomenon.
Some computer scientists (e.g. Charniak) refuse to
accept incomprehensible explanations.
Specific criticisms of the model:
Criticism 1
Performance of model depends on use of Wickelfeature
representation, and this is an adaptation of standard
linguistic featural analysis, i.e. it relies on a symbolic
input representation (cf. phonemes in NETtalk).
I.e. what is the contribution of the architecture?
Criticism 2
Pinker and Prince (1988): role of input and U-shaped
curve.
Model’s entry to Stage 2 due to addition of 410 medium
frequency verbs.
This change is more abrupt than is the case with
children--there may be no relation between this
method of partitioning the training data and what
happens to children.
But later research (Plunkett and Marchman 1989) shows
that U-shaped curves can be achieved without abrupt
changes in input: the net was trained on all examples
together (using a backpropagation net).
They presented more irregular verbs, but still found
regularization, and other Stage 2 phenomena, for
certain verbs.
Criticism 3
Nets are not simply exposed to data, so that we can
then examine what they learn.
They are programmed in a sense: decisions have to be
made about several things, including
- Training algorithm to be used
- Number of hidden units
- How to represent the task in question
- Input and output representation
- Training examples, and manner of presentation
Criticism 4
At some point after or during learning this kind of thing,
humans become able to articulate the rule,
e.g. regular past tenses end in –ed.
They can also control and alter these rules, e.g. a child
could pretend to be younger and say ‘runned’ even
though she knows it is incorrect (cf. some speakers use
learned and some learnt; lit is UK and lighted US).
Hard to see how this kind of behaviour would emerge
from a set of interconnected neurons.
Conclusions
Although the Past-tense model can be criticised, it is
best to evaluate it in the context of the time (1986)
when it was first presented.
At the time, it provided a tangible demonstration that
- it is possible to use a neural net to model an aspect of
human learning
- it is possible to capture apparently rule-governed
behaviour in a neural net
Contrasting Neural Computing with Symbolic Artificial
Intelligence


Overview of main differences between them.
Relationship to the brain
(a) Similarities between Neural Computing and
the brain
(b) Differences between brain and Symbolic AI –
evidence that brain does not have a von Neumann
architecture.

Ability to provide an account of thought and
cognition
(a) Argument by symbolicists that only symbol
system can provide an account of cognition
(b) Counter-argument that neural computing
(subsymbolic) can also provide an account of
cognition
(c) Hybrid account?
Main differences between Connectionism and
Symbolic AI
Knowledge: knowledge represented by weights and
activations versus explicit propositions.
Rules: rule-like behaviour without explicit rules versus
explicit rules.
Learning: Connectionist nets trained versus
programmed. But there are now many machine
learning algorithms that are wholly symbolic; both
kinds only work in a specialised domain.
Examinability: Can examine symbolic program to ‘see
how it works’. Less easy in the case of Neural
Computing: problems with black box nature, since the set of
weights is opaque.
Relationship to the brain: Brain-style computing
versus manipulation of symbols. Different models of
human abilities.
Ability to provide an account of human thought: see
following discussion about need for symbol system to
account for thought.
Applicability to problems: Neural computing more
suited to pattern recognition problems, Symbolic
computing to systems characterisable by rules.
But for a different view, that stresses similarities
between GOFAI and NN approaches, see Boden, M.
(1991) ‘Horses of a different colour’, in Ramsey, W.,
Stich, S.P. and Rumelhart, D.E. (eds.), ‘Philosophy and
Connectionist Theory’, Lawrence Erlbaum
Associates: Hillsdale, New Jersey, pp 3-19. See also YW
in Foundations of AI book, on web course list.
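The Knowledge, Rules and Learning contrasts listed above can be made concrete with a toy case. The sketch below is only an illustration under stated assumptions: it uses a single threshold unit trained with the classic perceptron learning rule on logical AND, not the Rumelhart and McClelland past-tense network, and all names (`explicit_rule`, `train_perceptron`, `EXAMPLES`) are hypothetical. The symbolic version states its rule in the code; the trained version ends up behaving as if it followed the same rule, but the "knowledge" is just three numbers.

```python
def explicit_rule(a, b):
    """Symbolic version: the rule is written down and can be read off."""
    return 1 if (a == 1 and b == 1) else 0


def net_output(weights, bias, a, b):
    """Threshold unit: fires (1) if the weighted sum plus bias exceeds 0."""
    return 1 if weights[0] * a + weights[1] * b + bias > 0 else 0


def train_perceptron(examples, epochs=20):
    """Perceptron learning rule: nudge weights by the error on each example."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for (a, b), target in examples:
            error = target - net_output(weights, bias, a, b)
            weights[0] += error * a
            weights[1] += error * b
            bias += error
    return weights, bias


# Training data only: input pairs with desired outputs, no rule stated.
EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(EXAMPLES)

for (a, b), _ in EXAMPLES:
    print(a, b, explicit_rule(a, b), net_output(weights, bias, a, b))
```

Inspecting `weights` and `bias` afterwards shows the examinability point as well: the numbers reproduce the rule's behaviour, but nothing in them reads as "output 1 only if both inputs are 1".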
Fashions: historical tendency to model brain on
fashionable technology.
Mid 17th century: water clocks and hydraulic puppets
popular.
Descartes developed hydraulic theory of brain.
Early 18th century: Leibniz likened brain to a factory.
Freud: relied on electromagnetics and hydraulics in
descriptions of mind.
Sherrington: likened nervous system to telegraph.
Brain also modelled as telephone switchboard.
We use a computer to model the human brain; but is
human brain itself a computer?
Differences between Brains and von Neumann
machines
McCulloch and Pitts: simplified account of neurons as
On/Off switch.
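A minimal sketch of the on/off idea, assuming the usual textbook simplification of a McCulloch-Pitts unit as a binary threshold device (the function names and the use of numeric weights are illustrative assumptions, not taken from the lecture): the unit is "on" when the summed input reaches a threshold, and "off" otherwise.

```python
def mp_unit(inputs, weights, threshold):
    """Return 1 (on) if the weighted input sum reaches the threshold, else 0 (off)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def logical_and(a, b):
    # Both inputs must be on to reach a threshold of 2.
    return mp_unit([a, b], [1, 1], threshold=2)


def logical_or(a, b):
    # A single active input is enough to reach a threshold of 1.
    return mp_unit([a, b], [1, 1], threshold=1)


def logical_not(a):
    # An inhibitory (negative) connection switches the unit off.
    return mp_unit([a], [-1], threshold=0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b), logical_or(a, b))
```

Seen this way the unit looks like a logic gate, which is why the comparison with computer hardware seemed natural at first.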
In early days seemed that neurons were like flip-flops in
computers.
Flip-flops: can be thought of as tiny switches that can be
either off or on. But it is now clear that there are
differences:
- Rate of firing of neuron is important, as well as the on/off
feature.
- Neuron has an enormous number of input and output
connections, compared to logic gates.
- Speed: neuron is much slower. It takes a thousandth of a
second to respond, whereas a flip-flop can shift
position from 0 to 1 in a thousand-millionth of a second.
I.e. the neuron takes a million times longer.
Thus if brain running an AI program, stepping through
instructions, would take at least 1000th sec for each
instruction.
Brain can extract meaning from sentence, or recognise
visual pattern in about 1/10th second.
So, if this is being accomplished by stepping through
program, program can only be 100 instructions long.
But current AI programs contain 1000s of instructions!
Suggests brain operates in parallel, rather than as
sequential processor.
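The arithmetic behind this argument can be checked directly. The sketch below uses only the figures given above (a thousandth of a second per neuron response, a thousand-millionth of a second per flip-flop switch, about a tenth of a second per recognition task); the variable names are illustrative, and times are held as whole nanoseconds to keep the division exact.

```python
NS_PER_SECOND = 1_000_000_000

neuron_step_ns = NS_PER_SECOND // 1_000             # 1/1000 s per neuron response
flip_flop_step_ns = NS_PER_SECOND // 1_000_000_000  # 1/1,000,000,000 s per switch
task_ns = NS_PER_SECOND // 10                       # about 1/10 s to recognise a pattern

# How many times slower a neuron is than a flip-flop.
slowdown = neuron_step_ns // flip_flop_step_ns

# Longest strictly sequential program that fits into one recognition task,
# if each instruction costs one neuron response time.
max_serial_steps = task_ns // neuron_step_ns

print(slowdown)          # 1000000
print(max_serial_steps)  # 100
```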
- Symbol manipulators: most (NOT ALL) are sequential,
carrying out instructions in sequence.
- human memories: content-addressable. Access to
memory via its content.
E.g. can retrieve memory via description
(e.g. could refer to Turing Test either as ‘Turing Test’, or
as ‘assessment of intelligence based on Victorian
parlour game’, and would still access memory).
But memory in computer has unique address: cannot
get at memory without knowing its address (at the
bottom level that is!).
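The contrast above can be sketched as follows. This is a toy illustration only: the list `RAM`, the feature sets in `FEATURES` and both function names are made up for the example, and retrieval by word overlap stands in for whatever the brain actually does. As the slide notes, the content-based version still sits on top of address-based storage at the bottom level.

```python
# Address-based store: the memory can only be reached via its position.
RAM = ["Past-tense model", "Turing Test", "Honeycomb"]


def recall_by_address(address):
    return RAM[address]


# Content-based store: each memory is described by (illustrative) features.
FEATURES = {
    "Past-tense model": {"rumelhart", "mcclelland", "verbs", "past", "tense"},
    "Turing Test": {"turing", "test", "assessment", "intelligence",
                    "victorian", "parlour", "game"},
    "Honeycomb": {"wax", "bees", "hexagon", "structure"},
}


def recall_by_content(cue):
    """Return the memory whose features overlap most with the words in the cue."""
    words = set(cue.lower().split())
    best = max(FEATURES, key=lambda name: len(words & FEATURES[name]))
    # No overlap at all means nothing is retrieved.
    return best if words & FEATURES[best] else None


print(recall_by_address(1))
print(recall_by_content("Turing Test"))
print(recall_by_content("assessment of intelligence based on Victorian parlour game"))
```

Either description of the Turing Test reaches the same memory, whereas `recall_by_address` is useless without knowing that the item happens to sit at position 1.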
- memory distribution
In a computer a string of symbol tokens exists at
specific physical location in hardware.
But our memories do not seem to function like that.
E.g. Lashley and search for the engram
Trained rats to learn route through maze to food.
Destroyed different areas of brain. As long as only 10
percent destroyed, no loss of memory, regardless of
which area of brain destroyed.
Lashley (1950) ‘…There are no special cells reserved
for special memories… The same neurons which
retain memory traces of one experience must also
participate in countless other activities…’
and conversely a single memory must be stored in
many places across a brain. There was a brief fashion
for ‘the brain as a hologram’ because of the way a
hologram stores information.
Graceful degradation
With injury, brain performance degrades gradually, but
computers crash.
E.g. Phineas Gage, railway worker. Speeding iron rod
crashed through the anterior and middle left lobes of
his cerebrum, but within minutes he was conscious,
collected and speaking.
He lived for almost twelve years afterwards.
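The points about distributed memory and graceful degradation can be sketched with a small Hopfield-style associative memory. This is a swapped-in textbook technique, not a model from the lecture, and it makes no claim about how Lashley's rats or real brains store memories. All names (`P1`, `P2`, `train`, `lesion`, `recall`) and the choice to knock out connections where `(i + j) % 10 == 0` are assumptions made so the example is deterministic. Two patterns are stored across the same set of connections, a fraction of the connections is destroyed, and both patterns are still recalled.

```python
# Two memories stored over the same 16 units (values are +1 / -1).
P1 = [1] * 8 + [-1] * 8
P2 = [1, -1] * 8


def train(patterns):
    """Hebbian storage: every pattern adds a little to every connection."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def lesion(w, every=10):
    """Destroy a scattered subset of connections (set them to 0)."""
    damaged = [row[:] for row in w]
    for i in range(len(w)):
        for j in range(len(w)):
            if i != j and (i + j) % every == 0:
                damaged[i][j] = 0
    return damaged


def recall(w, cue):
    """One synchronous update: each unit takes the sign of its summed input."""
    return [1 if sum(wij * s for wij, s in zip(row, cue)) >= 0 else -1
            for row in w]


damaged = lesion(train([P1, P2]))

noisy = P1[:]
noisy[3] = -noisy[3]  # corrupt one element of the cue

print(recall(damaged, P1) == P1)
print(recall(damaged, P2) == P2)
print(recall(damaged, noisy) == P1)
```

No single connection holds either memory, so losing some of them weakens the input to each unit rather than deleting a stored item. In a conventional store with one memory per location, destroying the wrong cell would lose that memory outright.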
BUT cannot conclude because of the differences
between the brain, and the von Neumann machine,
that the brain is not a computer.
All the above is taken to show that brain does not have
von Neumann architecture, but we could choose to
store the same data all over a computer.