WhatsNextPDPLancaster

What’s next for Parallel Distributed Processing?
Mathematical Cognition and Other New
Directions
Jay McClelland
Stanford University
Core features of the PDP approach to
representation and learning
• The knowledge is in the
connections
– It’s intrinsically implicit
• It is acquired by a blind automatic
procedure
– By performing gradient descent
rather than by making explicit
inferences or engaging in any kind
of reasoning process
• It can approximate systems of rules
– Without ever having any
• It captures the gradual nature of
developmental change
– And emphasizes the importance of
the gradual accumulation of small
changes
[Figure: network with letter units H I N T and phoneme units /h/ /i/ /n/ /t/]
Second and Third Waves of
Neural Networks
• Some classical applications of PDP models:
– Reading and morphology
– Sentence processing
– Semantic cognition
– Intuitive physics
• Some recent breakthroughs in machine learning:
– Object classification
– Speech recognition
– Language processing
– Surpassing human performance in Atari games
Sentiment Analysis (Socher et al, 2013)
What’s Next?
My lab’s new direction:
Mathematical Cognition
Why is Math so Hard to Learn?
• Late grade-school-aged kids misunderstand equations
– What goes in the blank: 7 + 3 + 4 = __ + 4
• Many middle-school-aged kids misunderstand fractions
– Is 19/20 closer to 1 or 21?
• Most Stanford undergraduates don’t understand the
rudiments of trigonometry
– Which expression below has the same value as
cos(-30°)?
sin(30°) -sin(30°) cos(30°) -cos(30°)
Failure to attach the appropriate
meaning to mathematical expressions
• A fraction N/D represents a
certain number N of pieces of a
unit whole divided into D equal
parts
• An equation represents an
equivalence relation between two
quantities, one to the left and one
to the right of the equals sign
• The sine / cosine of an angle θ in
degrees represents
– the projection of a point on the
unit circle specified by θ onto the
vertical / horizontal axis through
the center of the circle,
– or equivalently, the coordinates of
the point on the circle
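The unit-circle definition above can be checked numerically. The sketch below is only an illustration, not part of the talk: the helper name `unit_circle_point` is hypothetical, and it uses the 30° angle from the example question earlier in the slides.

```python
import math

def unit_circle_point(theta_degrees):
    """Return the (x, y) coordinates of the unit-circle point at angle theta.

    x is the projection onto the horizontal axis (cosine),
    y is the projection onto the vertical axis (sine).
    """
    theta = math.radians(theta_degrees)
    return math.cos(theta), math.sin(theta)

x_pos, y_pos = unit_circle_point(30)
x_neg, y_neg = unit_circle_point(-30)

# Negating the angle mirrors the point across the horizontal axis:
# the horizontal projection (cosine) is unchanged, the vertical one (sine) flips sign.
same_cosine = math.isclose(x_neg, x_pos)
flipped_sine = math.isclose(y_neg, -y_pos)
print(same_cosine, flipped_sine)
```

Seen this way, cos(-30°) = cos(30°) follows from the picture rather than from a memorized identity.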
[Figure: results for problem types cos(70), cos(–70+0), sin(-θ), cos(-θ), grouped by Reported Circle Use: “A Lot” vs. “A Little” or “Not at all”]
Why are these things hard to
learn?
Learning Depends on the
Prepared Mind
• Algebra for eighth graders?
– “though strong math students can benefit from taking
algebra in eighth grade, it is "decidedly harmful" for
weaker math students to be rushed into advanced
math concepts”
• Failure to appreciate what X/Y means
– Setting up the appropriate encoding habits
• Failure to rely on the unit circle?
– Failure of a module for visuospatial cognition or
failure to develop the habit of mapping numbers into
a multi-faceted coordinate framework?
Habits of Mind (Margolis, 1987)
• Learning to encode expressions automatically so
that their meaning is readily apparent in the mind
requires gradual connection adjustments that
occur incrementally over repeated opportunities
to learn
– This is no different in principle from learning to read
words aloud
• We quickly lose awareness that we are engaging
in these processes – once we understand well,
meaning is a habit of mind, and we cannot readily
appreciate that others do not have it
Margolis, H. (1987). Patterns, Thinking and Cognition. U. of Chicago Press.
Case Study in Readiness:
The Balance Scale
Balance Scale Model
Training involved more cases in which the weight varied than cases
in which the distance varied
Network’s task was to activate the unit corresponding to the side that
should go down, or (if the sides are in balance) to set the activation of
both output units to .5
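A minimal sketch of the target the network is trained toward, under stated assumptions: the side that goes down is taken to be the one with the larger torque (weight × distance, the physics of a balance scale), and the "active" unit is coded as 1.0 and the other as 0.0. Only the .5/.5 balance target is stated on the slide; the function name and the 1.0/0.0 coding are hypothetical.

```python
def balance_scale_target(left_weight, left_distance, right_weight, right_distance):
    """Return target activations (left_down, right_down) for the two output units."""
    left_torque = left_weight * left_distance
    right_torque = right_weight * right_distance
    if left_torque > right_torque:
        return (1.0, 0.0)   # left side should go down
    if right_torque > left_torque:
        return (0.0, 1.0)   # right side should go down
    return (0.5, 0.5)       # sides balance: both units at .5

# A conflict problem: more weight on the left, more distance on the right.
print(balance_scale_target(3, 2, 2, 3))
```

A child using only weight would say "left" for this conflict problem, while the torque rule says the sides balance.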
Siegler’s Readiness Experiment
• Two groups of children
– 5 year old Rule 1 children
– 7-8 year old Rule 1 children
• After pretest:
– Children saw 15 conflict problems with feedback
• 1/3: the side with greater weight would go down
• 1/3: the side with greater distance would go down
• 1/3: the two sides balance
• Most 7-8 yr olds progressed to rule 2
• Most 5 yr olds showed no change or reverted to
guessing
[Figure panels: Rule 1 Start; Rule 1 End]
Benefit From a Brief Lesson in the Unit
Circle Depends on a Prepared Mind
Applying these ideas to
Mathematical Cognition
• Application 1: Representation of approximate number
– Stoianov & Zorzi, 2012; Zou & McClelland (poster)
• Application 2: Learning to correctly solve equivalence problems
– Mickey & McClelland (talk)
• Application 3: Incremental improvements in strategies for adding small
numbers
– Hanson, McKenzie & McClelland (talk)
• Application 4: Learning geometry
– Me and anyone who is willing to help me!
The Approximate Number Problem
• Must we really imagine that a system
for representing number
approximately is innate, or can the
problem be solved using a generic
neural network?
• Can we account for the developmental
improvement in acuity of the
approximate number system?
• Can we understand why our
representations of approximate
number have the properties that they
do?
– Specifically, why does our sensitivity to
approximate numbers approximately
conform to Weber’s law?
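Weber’s law says that discriminability depends on the ratio of two magnitudes rather than their difference. One common way to formalize this, used here purely as an illustrative assumption and not taken from the talk, is to give each numerosity a noisy representation on a log scale with a fixed standard deviation `w` (a hypothetical acuity parameter; smaller means sharper).

```python
import math

def p_correct(n1, n2, w=0.2):
    """Probability of correctly picking the larger of two numerosities.

    Assumes each numerosity is represented as a Gaussian on a log scale
    with standard deviation w, so only the ratio n1/n2 matters.
    """
    delta = abs(math.log(n1) - math.log(n2))
    # Difference of two independent Gaussians has sd w * sqrt(2);
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))) gives the expression below.
    return 0.5 * (1.0 + math.erf(delta / (2.0 * w)))

# Same ratio, very different absolute differences.
print(p_correct(8, 16), p_correct(16, 32))
# Harder ratio at a similar magnitude.
print(p_correct(8, 9))
```

In this sketch, developmental improvement in acuity would correspond to a gradual decrease in `w`.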
Why Neural Networks?
• Deep (unsupervised) learning can create
an invertible internal representation
that is driven solely by the goal of
capturing the content of its inputs
– As Stoianov & Zorzi (2012) showed, this
is sufficient to support human level
performance in numerosity judgment.
• Using ‘stochastic gradient descent’
instead of batch learning allows us to
explore both the initial state and
progressive refinement of
representations.
– Zou & McClelland explore the
developmental trajectory, also explored
in subsequent work by S & Z.
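To illustrate why example-by-example updating exposes a developmental trajectory, here is a toy sketch. It is not the deep network discussed above: it fits a single hypothetical weight to made-up data with stochastic gradient descent and records a snapshot after every update, so both the initial state and each stage of refinement can be inspected.

```python
import random

def sgd_trajectory(examples, learning_rate=0.05, epochs=5, seed=0):
    """Train one weight on (x, y) pairs and return the weight after every update."""
    rng = random.Random(seed)
    weight = 0.0                 # initial state
    snapshots = [weight]
    for _ in range(epochs):
        order = examples[:]
        rng.shuffle(order)       # visit examples in random order each epoch
        for x, y in order:
            error = weight * x - y
            weight -= learning_rate * error * x   # gradient of 0.5 * error**2
            snapshots.append(weight)
    return snapshots

# Made-up data generated by y = 2x, so the ideal weight is 2.0.
examples = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
trajectory = sgd_trajectory(examples)
print(trajectory[0], trajectory[len(trajectory) // 2], trajectory[-1])
```

With batch learning there would be one update per pass through the data; here every single experience leaves a small trace that can be examined.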
Errors on Equivalence Problems
• Children are reliably incorrect in answering problems of the form:
a = b + __
• They tend to put the sum of a and b in the blank, rather than the
correct answer, which is a – b.
• When given such equations in a brief presentation, and asked to
reproduce them, they tend to reproduce them as
a + b = __
• Children’s experiences are biased in ways that are consistent with
these errors.
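A small worked example of the two responses to a = b + __, treating the equals sign as an equivalence relation so that the blank is a minus b. The function names and the numbers 9 and 4 are made up for illustration.

```python
def correct_blank(a, b):
    """Solve a = b + __ by keeping both sides equal: the blank is a - b."""
    return a - b

def add_all_error(a, b):
    """The typical error: put the sum of a and b in the blank."""
    return a + b

a, b = 9, 4
print(correct_blank(a, b))   # 5, since 9 = 4 + 5
print(add_all_error(a, b))   # 13, which would claim 9 = 4 + 13
```

The error response is the right answer to a + b = __, the form children tend to reconstruct from memory.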
Why a Neural Network?
• Gradually learns in a way that depends on statistics
of training set
• Exhibits ‘pattern completion’
biases that capture both
math errors and problem
reconstruction errors
• Gradually learns its way out of its errors, capturing
patterns in the data
These two models are great, but…
• When we solve mathematical problems, we
often perform a sequence of operations.
• These operations are not rigidly structured, so
we need flexibility
• And as we gain facility, we can
(spontaneously) develop more efficient
strategies
Strategy change in simple addition
5+2=7
• Children appear to gradually progress through a series
of alternative ‘strategies’, with strategy choice being
probabilistic and with the probabilities changing
gradually over age
• Children can be induced to change strategies if given
problems that give a clear advantage to one strategy
• Children’s strategies seem constrained to be consistent
with the principles of addition, even though children
can’t necessarily articulate such principles.
Incremental, Hierarchical, Supervised
Reinforcement Learning in a Neural Network
• A strategy is a sequence of steps, and reward only comes at the end
• The time it takes as well as the outcome are automatically taken into account in
reinforcement learning.
• The use of a neural network as function approximator supports generalization.
• Re-use of number skills previously acquired leads to selection of task-appropriate
rather than task-inappropriate strategies.
• The ‘strategy’ as a whole emerges as an assemblage of strategy chunks, each
associated with a component skill relevant to addition.
• A key idea is that learning is curriculum based:
The culture and educational system provide early experiences in initial
components that then provide the previously acquired skills.
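A toy sketch of how reward at the end plus a cost for time can favor a more efficient strategy. The two strategies compared (counting on from the first addend versus counting on from the larger addend) are borrowed from the developmental literature as an assumption; the slides do not name them, and the reward and step-cost values are hypothetical.

```python
def count_on(start, steps):
    """Count up from `start` one unit at a time; return (answer, steps taken)."""
    total = start
    for _ in range(steps):
        total += 1
    return total, steps

def episode_return(answer, correct, steps, reward=1.0, step_cost=0.05):
    """Reward arrives only at the end; each step taken costs a little time."""
    return (reward if answer == correct else 0.0) - step_cost * steps

a, b = 2, 5
from_first = count_on(a, b)                      # start at 2, count 5 steps
from_larger = count_on(max(a, b), min(a, b))     # start at 5, count 2 steps

return_first = episode_return(from_first[0], a + b, from_first[1])
return_larger = episode_return(from_larger[0], a + b, from_larger[1])
print(from_first, return_first)
print(from_larger, return_larger)
```

Both strategies reach the same answer, so a learner that maximizes return would drift toward the shorter one without being told a rule.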
Intuitive Geometry Project:
Motivations
• Geometrical intuition as developing gradually
with age, through a series of ‘levels’.
– A year’s course in Geometry has no special impact
on students’ ‘level’.
• Lessons learned from presenting students with
the Socratic Dialog uncovering the supposed
prior understanding of how to create a square
with twice the area of a given square.
– In spite of profession of ‘understanding’ after
walking through the dialog, those with many
misconceptions can’t demonstrate the solution on
a new square.
• Geometry as grounded in Intuition but
ultimately connected to proof
– Carmenga, Transforming Geometric Proof with
Reflections, Rotations and Translations
– Henderson, Experiencing Geometry
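For reference, the doubled-square problem mentioned above has a short solution, sketched here on the assumption that the dialog is the familiar one in which the new square is built on the diagonal of the given square. With side length s and diagonal d (symbols introduced here for illustration):

```latex
\[
d = \sqrt{s^{2} + s^{2}} = s\sqrt{2},
\qquad
d^{2} = 2s^{2},
\qquad
\text{whereas } (2s)^{2} = 4s^{2}.
\]
```

The last term is the usual misconception: doubling the side quadruples the area rather than doubling it.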
• Given: ∠A≅∠A’,
AC≅A’C’, ∠C≅∠C’
• Prove: △ABC≅△A’B’C’
• Idea:
◦ translate A to A’
◦ rotate △ABC until AC
coincides with A’C’
◦ reflect over A’C’ if
necessary.
Then the whole triangle
coincides!
Given: ∠A≅∠A’, AC≅A’C’, ∠C≅∠C’
• Translate △ABC so that A coincides with A’.
• Rotate △ABC so that ray AC coincides with
ray A’C’. Since AC≅A’C’, C coincides with
C’.
• If B and B’ are on different sides of line AC,
reflect △ABC over line AC.
◦ Since ∠A≅∠A’ and AC and A’C’ coincide and
are on the same side of the angle, ∠A
coincides with ∠A’.
◦ Since the angles coincide, the other rays AB
and A’B’ coincide.
◦ Similarly, since ∠C≅∠C’ and AC and A’C’
coincide, ∠C coincides with ∠C’ and the other
rays CB and C’B’ coincide.
◦ Since ray AB coincides with ray A’B’ and ray
CB with ray C’B’ and two lines intersect in at
most one point, B coincides with B’.
• Since all sides and angles coincide,
△ABC≅△A’B’C’.
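The three moves in the proof (translate, rotate, reflect if necessary) can be tried out numerically. The sketch below is an illustration, not a proof and not the project’s software: points are represented as complex numbers, the function name `superimpose` is hypothetical, and the example triangle and its mirrored copy are made up.

```python
import cmath

def superimpose(triangle, target):
    """Move `triangle` (A, B, C) onto `target` (A', B', C') using the proof's three moves."""
    a, b, c = triangle
    a2, b2, c2 = target
    # 1. Translate so that A coincides with A'.
    shift = a2 - a
    a, b, c = a + shift, b + shift, c + shift
    # 2. Rotate about A' so that ray AC coincides with ray A'C'.
    turn = cmath.phase(c2 - a2) - cmath.phase(c - a)
    rotation = cmath.exp(1j * turn)
    b, c = a + (b - a) * rotation, a + (c - a) * rotation
    # 3. If B and B' are on different sides of line A'C', reflect over that line.
    axis = (c2 - a2) / abs(c2 - a2)          # unit vector along A'C'

    def side(p):
        """Signed distance of p from line A'C' (sign tells which side)."""
        return ((p - a2) / axis).imag

    if side(b) * side(b2) < 0:
        b = a2 + axis * ((b - a2) / axis).conjugate()
    return a, b, c

# Made-up example: the target is a flipped, rotated, shifted copy of the triangle.
triangle = (0j, 4 + 0j, 1 + 3j)
target = tuple(p.conjugate() * cmath.exp(0.7j) + (2 + 1j) for p in triangle)
moved = superimpose(triangle, target)
coincides = all(abs(p - q) < 1e-9 for p, q in zip(moved, target))
print(coincides)
```

Because the example target is a mirror image, all three moves are exercised, including the reflection.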
How Can We Begin to Make Progress
on this Ambitious Project?
• Create a simulated agent that must carry out tasks in a virtual world
– Similar to the DeepMind Atari project
– Agent has a few actions it can perform
• Change its point of view on its (2-D) world
• Move, rotate and flip objects
• Measure, copy, and construct objects to instruction using geometry tools
– E.g., adjustable straightedge and compass
» Demo some time during the conference!
• Train the agent using incremental supervised learning
– Initial tasks:
• Find named objects, find objects that have the same shape as a given target
• Translate, rotate, and flip objects to fit them through shaped holes
• Learn to measure length and angle
• Learn to impose alternative frames of reference to identify congruent shapes
under rotations and flips
Two Relevant Ideas
• Use eye movements to
bring objects to the center
of gaze, where they can be
recognized in canonical
position.
– Plaut, McClelland, &
Seidenberg, NCPW 1
– Mnih et al, 2014
• Use transforming
autoencoders to learn
effects of transformations
on the way objects look.
– Hinton et al, 2011
Later Stages
V. Carry out Euclidean constructions to instruction
- Based on given diagrams
- Purely from instruction
VI. Determine perimeter and area of polygons
and circles in given units
VII. Establish correspondence between figures A
and B
VIII. Solve complex geometry problems requiring
several intermediate inferences and
computations
One Example Problem
Challenges, Open Questions, and
Broader Directions
• Explicit cognition, metacognitive knowledge, and their
relation to knowledge in connections
• Abstract mathematics, proof, and justification
– Are they, at least in part, extensions of concrete embodied
reasoning?
• A broader understanding of understanding in
embodied terms
– We don’t just map mathematical expressions onto learned
conceptual structures, we do the same in general when we
understand ideas expressed in language as well
• These are the questions for the next ten years