Modeling with neural networks
Garrison W. Cottrell
Gary's Unbelievable Research Unit (GURU)
Computer Science and Engineering Department
Temporal Dynamics of Learning Center
Institute for Neural Computation
UCSD
Ways to understand
how the brain works
• Behavioral measures
• Choices
• Reaction times
• Eye movements
• Brain imaging
• PET
• fMRI
• MEG
• EEG
• NIRS
• DTI
• Neural recording
• Single cell recording
• Multicell recording
• Optical imaging
• Voltage-sensitive dyes
• Optogenetics
• ECOG
• Modeling
• Neural networks
• Bayesian Models
• Abstract Mathematical Models
Why model?
• Models rush in where theories fear to tread.
• Models can be manipulated in ways people cannot.
• Models can be analyzed in ways people cannot.
Models rush in
where theories fear to tread
• Theories are high level descriptions of the
processes underlying behavior.
• They are often not explicit about the
processes involved.
• They are difficult to reason about if no
mechanisms are explicit -- they may be
too high level to make explicit predictions.
• Theory formation itself is difficult.
Models rush in where theories fear to tread
• Using machine learning techniques, one can often build a
working model of a task for which we have no theories or
algorithms (e.g., expression recognition).
• A working model provides an “intuition pump” for how
things might work, especially if they are “neurally
plausible” (e.g., development of face processing - Dailey
and Cottrell).
• A working model may make unexpected predictions (e.g.,
the Interactive Activation Model and SLNT).
Your first neural net:
The Interactive Activation Model:
A model of reading from print
• Word level
• Letter level
• Feature level
Operation of the model
[Figures showing the model in operation]
Example of data accounted for…
Pseudoword effect
(Letters in pronounceable nonwords such as MAVE are perceived better than letters in unrelated letter strings; the model reproduces this advantage.)
Example of data predicted
• What about non-pronounceable non-words like SLNT?
• SLNT has a lot of friends at the word level.
• The model predicts that there should be a superiority effect for SLNT.
• They tested this in UCSD Psychology sophomores and got the predicted effect.
Summary
• Why model?
• Models make assumptions explicit
• Models (because they are run on a computer and can be
highly non-linear) can make unexpected predictions
• While no model is “correct”, the more data a model
predicts, the more we “believe” that model…
Models can be manipulated in ways
people cannot
• We can see the effects of variations in cortical
architecture (e.g., split (hemispheric) vs. non-split
models (Shillcock and Monaghan word perception
model)).
• We can see the effects of variations in processing
resources (e.g., variations in number of hidden
units in Plaut et al. models).
Models can be manipulated in ways
people cannot
• We can see the effects of variations in environment
(e.g., what if our parents were cans, cups or books
instead of humans? I.e., is there something special
about face expertise versus visual expertise in
general? (Sugimoto and Cottrell, Joyce and
Cottrell)).
• We can see variations in behavior due to different
kinds of brain damage within a single “brain” (e.g.,
Juola and Plunkett, Hinton and Shallice).
Models can be analyzed in ways
people cannot
In the following, I specifically refer to neural network models.
• We can do single unit recordings.
• We can selectively ablate and restore parts of the network,
even down to the single unit level, to assess the contribution
to processing.
• We can measure the individual connections -- e.g., the
receptive and projective fields of a unit.
• We can measure responses at different layers of processing
(e.g., which level accounts for a particular judgment:
perceptual, object, or categorization?) (Dailey et al., J Cog
Neuro 2002).
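
As an illustration of what such analyses look like on a neural network model, here is a hedged Python sketch on a tiny, randomly initialized network; the architecture, weights, and stimuli are made up for the example (a real analysis would of course use a trained model).

import numpy as np

# Toy sketch of "analyzing the model like a brain": single-unit recording,
# receptive/projective fields, and ablation. All weights here are random placeholders.
rng = np.random.default_rng(0)
W_in = rng.normal(0, 1, (10, 5))      # input -> hidden weights
W_out = rng.normal(0, 1, (5, 3))      # hidden -> output weights
stimuli = rng.random((20, 10))        # 20 made-up input patterns

def hidden(x, ablate=None):
    h = np.tanh(x @ W_in)
    if ablate is not None:
        h[..., ablate] = 0.0          # "lesion" the chosen hidden unit
    return h

# "Single unit recording": hidden unit 2's response to every stimulus.
responses = hidden(stimuli)[:, 2]

# "Receptive field" of unit 2: its incoming weights; "projective field": its outgoing weights.
receptive_field = W_in[:, 2]
projective_field = W_out[2, :]

# "Ablation": compare the network's output with and without hidden unit 2.
out_intact = hidden(stimuli) @ W_out
out_lesioned = hidden(stimuli, ablate=2) @ W_out
print(np.abs(out_intact - out_lesioned).mean())   # unit 2's contribution to the output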
How (I like) to build Cognitive Models
• In a domain where there is a lot of data and controversy!
• I like to be able to relate them to the brain, so “neurally
plausible” models are preferred -- neural nets.
• The model should be a working model of the actual task,
rather than a cartoon version of it.
• Of course, the model should nevertheless be simplifying
(i.e. it should be constrained to the essential features of the
problem at hand):
• Do we really need to model the (supposed) translation invariance
and size invariance of biological perception?
• As far as I can tell, NO!
• Then, take the model “as is” and fit the experimental data:
zero fitting parameters is to be preferred over 1, 2, or 3.
The other way (I like) to build
Cognitive Models
• In domains where there is little data and much mystery
• Use them as exploratory models -- in domains where there
is little direct data (e.g. no single cell recordings in infants
or undergraduates) to suggest what we might find if we
could get the data. These can then serve as “intuition
pumps.”
• Examples:
• Why we might get specialized face processors
• Why those face processors get recruited for other tasks
A few giants
Frank Rosenblatt invented the
perceptron:
• One of the first neural
networks to learn by
supervised training
• Still in use today!
A few giants
Dave E. Rumelhart with Geoff
Hinton and Ron Williams
invented back-propagation
• Many had invented back
propagation before; few could
appreciate as deeply as Dave
did what they had when they
discovered it.
A few giants
• Hal White was a theoretician of
neural networks
• Hal White’s paper with Kurt Hornik and Max Stinchcombe:
“Multilayer feedforward networks are
universal approximators” is his
second most-cited paper, at 8,114
cites.
A few giants
• In yet another paper (in Neural Computation, 1989), he wrote:
“The premise of this article is that
learning procedures used to train
artificial neural networks are
inherently statistical techniques. It
follows that statistical theory can
provide considerable insight into the
properties, advantages, and
disadvantages of different network
learning methods…”
This was one of the first papers to make
the connection between neural
networks and statistical models - and
thereby put them on a sound
statistical foundation.
What is backpropagation, and why
is/was it important?
• We have billions and billions of neurons
that somehow work together to create the
mind.
• These neurons are connected by 10^14 - 10^15
synapses, which we think encode the
“knowledge” in the network - too many for
us to explicitly program them in our
models
• Rather we need some way to indirectly
set them via a procedure that will achieve
some goal by changing the synaptic
strengths (which we call weights).
• This is called learning in these systems.
Learning: A bit of history
• Frank Rosenblatt studied a simple version of a neural net
called a perceptron:
• A single layer of processing
• Binary output
• Can compute simple things like (some) boolean functions (OR,
AND, etc.)
Learning: A bit of history
[Figure: a perceptron unit, showing its net input and its output]
Learning: A bit of history
• Rosenblatt (1962) discovered a learning rule for perceptrons called
the perceptron convergence procedure.
• Guaranteed to learn anything computable (by a two-layer
perceptron)
• Unfortunately, not everything was computable (Minsky & Papert,
1969)
Perceptron Learning Demonstration
• Output activation rule:
• First, compute the net input to the output unit:
net = Σ_i w_i x_i
• Then, compute the output as:
If net ≥ θ then output = 1
else output = 0
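
As a concrete illustration (not from the original slides), here is a minimal Python sketch of this activation rule; the weights, threshold, and inputs are made-up values chosen so the unit computes OR.

# Minimal sketch of the perceptron activation rule (illustrative values only).
def perceptron_output(weights, inputs, theta):
    """Return 1 if the net input reaches the threshold theta, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs))   # net = sum_i w_i * x_i
    return 1 if net >= theta else 0

# Hand-set weights and threshold that make the unit compute OR of two binary inputs.
weights, theta = [1.0, 1.0], 0.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(weights, x, theta))      # prints 0, 1, 1, 1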
Perceptron Learning Demonstration
• Output activation rule:
• First, compute the net input to the output unit:
net = Σ_i w_i x_i
If net ≥ θ then output = 1
else output = 0
• Learning rule:
If output is 1 and should be 0, then lower weights to active inputs
and raise the threshold (θ)
If output is 0 and should be 1, then raise weights to active inputs
and lower the threshold (θ)
(“active input” means x_i = 1, not 0)
Characteristics of perceptron learning
• Supervised learning: give it a set of input-output examples
for it to model the function (a teaching signal)
• Error correction learning: only correct it when it is wrong.
• Random presentation of patterns.
• Slow! Learning on some patterns ruins learning on others.
Perceptron Learning Made Simple
• Output activation rule:
• First, compute the net input to the output unit:
net = Σ_i w_i x_i
If net ≥ θ then output = 1
else output = 0
• Learning rule:
If output is 1 and should be 0, then lower weights to
active inputs and raise the threshold (θ)
If output is 0 and should be 1, then raise weights to
active inputs and lower the threshold (θ)
Perceptron Learning Made Simple
• Learning rule:
If output is 1 and should be 0, then lower weights to
active inputs and raise the threshold (θ)
If output is 0 and should be 1, then raise weights to
active inputs and lower the threshold (θ)
• Learning rule:
w_i(t+1) = w_i(t) + α·(teacher - output)·x_i
(α is the learning rate)
Perceptron Learning Made Simple
• Learning rule:
If output is 1 and should be 0, then lower weights to active inputs
and raise the threshold (θ)
If output is 0 and should be 1, then raise weights to active inputs
and lower the threshold (θ)
• Learning rule:
w_i(t+1) = w_i(t) + α·(teacher - output)·x_i
(α is the learning rate)
• This is known as the delta rule because learning is based
on the delta (difference) between what you did and what
you should have done: δ = (teacher - output)
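
A minimal Python sketch of the delta rule as stated above (illustrative, not from the original slides): the learning rate, number of epochs, and the OR training set are arbitrary choices, and the threshold θ is adjusted explicitly as the rule describes.

import random

# Sketch of perceptron learning with the delta rule on the (linearly separable) OR task.
# delta = teacher - output; weights change by alpha * delta * x_i,
# and the threshold theta moves in the opposite direction.

def output(weights, theta, x):
    net = sum(w * xi for w, xi in zip(weights, x))
    return 1 if net >= theta else 0

def train(patterns, alpha=0.1, epochs=50, seed=0):
    random.seed(seed)
    patterns = list(patterns)                 # copy, so the caller's list keeps its order
    weights, theta = [0.0, 0.0], 0.0
    for _ in range(epochs):
        random.shuffle(patterns)              # random presentation of patterns
        for x, teacher in patterns:
            delta = teacher - output(weights, theta, x)
            weights = [w + alpha * delta * xi for w, xi in zip(weights, x)]
            theta -= alpha * delta            # raise theta when the output was too high
    return weights, theta

or_patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, t = train(or_patterns)
print([output(w, t, x) for x, _ in or_patterns])   # converges to [0, 1, 1, 1]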
Problems with perceptrons
• The learning rule comes with a great guarantee: anything a
perceptron can compute, it can learn to compute.
• Problem: Lots of things were not computable,
e.g., XOR (Minsky & Papert, 1969) -- see the sketch after this list.
• Minsky & Papert said:
• if you had hidden units, you could compute any boolean function.
• But no learning rule exists for such multilayer networks, and we
don’t think one will ever be discovered.
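
The XOR failure is easy to demonstrate with the same delta-rule procedure sketched earlier (again an illustrative Python sketch): no matter how long a single threshold unit trains, it cannot get all four XOR patterns right.

import random

# Illustrative check: the delta rule on XOR never reaches 100% accuracy,
# because no single weight vector plus threshold separates the two classes.

def output(weights, theta, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= theta else 0

random.seed(0)
weights, theta, alpha = [0.0, 0.0], 0.0, 0.1
xor_patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
for _ in range(10000):
    x, teacher = random.choice(xor_patterns)
    delta = teacher - output(weights, theta, x)
    weights = [w + alpha * delta * xi for w, xi in zip(weights, x)]
    theta -= alpha * delta

correct = sum(output(weights, theta, x) == t for x, t in xor_patterns)
print(correct, "of 4 patterns correct")   # at most 3 of 4, never all 4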
Aside about perceptrons
• They didn’t have hidden units - but Rosenblatt assumed
nonlinear preprocessing!
• Hidden units compute features of the input
• The nonlinear preprocessing is a way to choose features by
hand (see the sketch after this list).
• Support Vector Machines essentially do this in a principled
way, followed by a (highly sophisticated) perceptron
learning algorithm.
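
Here is a small illustrative Python sketch of that idea: with one hand-chosen feature added to the raw inputs (their product, an AND-like feature), XOR becomes solvable by an ordinary threshold unit with hand-set weights.

# Illustrative sketch: hand-chosen nonlinear preprocessing makes XOR separable.

def threshold_unit(weights, features, theta):
    net = sum(w * f for w, f in zip(weights, features))
    return 1 if net >= theta else 0

def preprocess(x1, x2):
    # Hand-chosen features: the raw inputs plus their product (x1 AND x2).
    return [x1, x2, x1 * x2]

# In the new feature space, XOR = x1 + x2 - 2*(x1 AND x2), thresholded at 0.5.
weights, theta = [1.0, 1.0, -2.0], 0.5
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), threshold_unit(weights, preprocess(x1, x2), theta))   # 0, 1, 1, 0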
Enter Rumelhart, Hinton, & Williams (1985)
• Discovered a learning rule for networks with hidden units.
• Works a lot like the perceptron algorithm:
• Randomly choose an input-output pattern
• Present the input, let activation propagate through the network
• Give the teaching signal
• Propagate the error back through the network (hence the name
back propagation)
• Change the connection strengths according to the error
Enter Rumelhart, Hinton, & Williams (1985)
[Figure: a layered network - INPUTS at the bottom, Hidden Units in the middle, OUTPUTS at the top; activation flows forward (up) and error flows back (down)]
• The actual algorithm uses the chain rule of calculus to go downhill in
an error measure with respect to the weights
• The hidden units must learn features that solve the problem
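
The slides do not include code, but a minimal back-propagation sketch for XOR might look like the following (Python/numpy, sigmoid units, squared error); the network size, learning rate, iteration count, and random seed are illustrative choices, and with some seeds such a tiny net can land in a local minimum.

import numpy as np

# Minimal back-propagation sketch: a 2-2-1 sigmoid network learning XOR.
# The forward pass propagates activation; the backward pass uses the chain rule
# to propagate error and moves the weights downhill in squared error.

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(0, 1, (2, 2)), np.zeros(2)           # input -> hidden
W2, b2 = rng.normal(0, 1, (2, 1)), np.zeros(1)           # hidden -> output

alpha = 0.5
for _ in range(20000):
    h = sigmoid(X @ W1 + b1)                              # forward: hidden activations
    y = sigmoid(h @ W2 + b2)                              # forward: outputs
    delta_out = (y - T) * y * (1 - y)                     # error signal at the output
    delta_hid = (delta_out @ W2.T) * h * (1 - h)          # error propagated back to hidden
    W2 -= alpha * h.T @ delta_out                         # gradient-descent updates
    b2 -= alpha * delta_out.sum(axis=0)
    W1 -= alpha * X.T @ delta_hid
    b1 -= alpha * delta_hid.sum(axis=0)

print(np.round(y, 2))   # should be close to [[0], [1], [1], [0]]

Depending on the random initialization, the two hidden units may end up computing AND-like and OR-like features (as in the next slide) or some other pair of features that works.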
Back Propagation Learning: XOR
[Figure: a random initial network and the trained XOR network; the two hidden units compute AND and OR]
• Here, the hidden units learned AND and OR - two features
that, when combined appropriately, can solve the problem.
Back Propagation Learning: XOR
[Figure: another run from a different random initial network, again trained into an XOR network with AND-like and OR-like hidden units]
But, depending on initial conditions, there are an infinite
number of ways to do XOR - backprop can surprise you
with innovative solutions.
Why is/was this wonderful?
• Efficiency
• Learns internal representations
• Learns internal representations
• Learns internal representations
• Generalizes to recurrent networks
Hinton’s Family Trees example
• Idea: Learn to represent relationships between people that
are encoded in a family tree:
Hinton’s Family Trees example
• Idea 2: Learn distributed representations of concepts:
[Figure: network architecture - localist people units and localist relation units as input, hidden layers that learn features of these entities useful for solving the task, and localist outputs]
Localist: one unit “ON” to represent each item (sketched in code below)
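
To make “localist” concrete, here is a hypothetical Python sketch of how an input pattern might be encoded; the names and vocabulary sizes are placeholders, not Hinton's exact training set.

import numpy as np

# Sketch of localist ("one unit ON per item") input encoding for the family-trees task.
# The people and relation lists below are illustrative placeholders.

people = ["Christopher", "Penelope", "Colin", "Charlotte"]
relations = ["father", "mother", "son", "daughter"]

def one_hot(item, vocabulary):
    """Return a vector with a single 1 at the item's position and 0s elsewhere."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(item)] = 1.0
    return vec

# Network input: a localist person vector concatenated with a localist relation vector.
# The network is asked: which person stands in this relation to the input person?
x = np.concatenate([one_hot("Christopher", people), one_hot("father", relations)])
print(x)   # [1. 0. 0. 0. 1. 0. 0. 0.]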
People hidden units: Hinton diagram
• What is unit 1 encoding?
People hidden units: Hinton diagram
• What is unit 2 encoding?
People hidden units: Hinton diagram
• What is unit 6 encoding?
People hidden units: Hinton diagram
When all three are on, these units pick out Christopher and Penelope;
other combinations pick out other parts of the trees.
Relation units
What does the lower middle one code?
Lessons
• The network learns features in the service of the
task - i.e., it learns features on its own.
• This is useful if we don’t know what the features
ought to be.
• Can explain some human phenomena.
Thanks to funders, GURONS, and you!
Questions?