Sensory-Motor Integration in HTM Theory
Jeff Hawkins
jhawkins@Numenta.com
San Jose Meetup, 3/14/14

Numenta: catalyst for machine intelligence
- Research: theory, algorithms
- NuPIC: open source community
- Products for streaming analytics, based on cortical algorithms
What is the Problem?
Sensory-motor integration, sensorimotor contingencies, embodied A.I., symbol grounding
- Most changes in sensor data are due to our own actions.
- Our model of the world is a “sensory-motor model.”
1) How does the cortex learn this sensory-motor model?
2) How does the cortex generate behavior?
Hierarchical Temporal Memory (Review)
1) Hierarchy of nearly identical regions
   - across species
   - across modalities
   [Diagram: retina, cochlea, and somatic data streams feeding the hierarchy]
2) Regions are mostly sequence memory
   - inference
   - motor
3) Feedforward: temporal stability
   - inference
   - the world seems stable
   Feedback: unfold sequences
   - motor/prediction
Cortical Learning Algorithm (Review)
A Model of a Layer of Cells in Cortex
Sequence Memory
- Converts input to sparse representations
- Learns sequences
- Makes predictions and detects anomalies
Capabilities
- On-line learning
- Large capacity
- High-order
- Simple local learning rules
- Fault tolerant
Cortex and Behavior (The big picture)
[Diagram: retina, cochlea, somatic, vestibular, and proprioceptive sensory data flow to the cortex, along with a copy of motor commands; the old brain drives the muscles]
Old brain has complex behaviors.
- spinal cord
- brain stem
- basal ganglia
Cortex evolved “on top” of the old brain.
Cortex receives both sensory data and copies of motor commands.
- It has the information needed for sensory-motor inference.
Cortex learns to “control” the old brain (via associative linking).
- It has the connections needed to generate goal-oriented behavior.
Layers & Mini-columns
[Diagram: a cortical region, about 2.5 mm thick, with cellular layers 2/3, 4, 5, and 6, each drawn as a sequence memory built from mini-columns]
1) The cellular layer is the unit of cortical computation.
2) Each layer implements a variant of the CLA.
3) Mini-columns play a critical role in how layers learn sequences.
L4 and L2/3: Feedforward Inference
[Diagram: sensor/afferent data and a copy of motor commands enter L4 (learns sensory-motor transitions), which feeds L2/3 (learns high-order sequences, e.g. A-B-C-?D vs. X-B-C-?Y); predicted input yields a stable output, while un-predicted changes pass through as unique representations]
Layer 4 learns changes in input due to behavior.
Layer 3 learns sequences of L4 remainders.
These are universal inference steps; they apply to all sensory modalities.
L5 and L6: Feedback, Behavior and Attention
[Diagram: the four-layer stack (L2/3 learns high-order sequences, L4 learns sensory-motor transitions, L5 recalls motor sequences, L6 attention), with feedforward output to the next higher region and feedback down to the old brain. Annotations on the slide: “well understood (CLA), tested / commercial”, “90% understood”, “50% understood, testing in progress”, “10% understood”]
L3: Existing CLA Learns High-order Sequences
[Diagram: the layer stack (L2/3 learns high-order sequences, L4 learns sensory-motor transitions, L5 recalls motor sequences, L6 attention)]
L4: CLA With Motor Command Input Learns Sensory-motor Transitions
[Diagram: the layer stack again; L4 receives sensor/afferent data plus a copy of motor commands, shown as movement directions (+N, +NE, +E, +SE, +S, +SW, +W, +NW)]
With motor context, the CLA will form unique representations.
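To make this concrete, here is a minimal first-order sketch (not NuPIC's L4 implementation) of a transition memory whose context includes the motor command: the same sensory input paired with different movements predicts different next inputs. The class name and the toy SDRs are made up for illustration.

```python
# Sketch of a first-order sensory-motor transition memory (illustrative, not NuPIC's L4).
# "SDRs" here are plain sets of active-bit indices.

class SensorimotorMemory:
    def __init__(self):
        # (previous sensory SDR, motor command) -> predicted next sensory SDR
        self.transitions = {}

    def learn(self, prev_sdr, motor, next_sdr):
        key = (frozenset(prev_sdr), motor)
        self.transitions.setdefault(key, set()).update(next_sdr)

    def predict(self, current_sdr, motor):
        # With motor context, the same sensory input yields different predictions.
        return self.transitions.get((frozenset(current_sdr), motor), set())

mem = SensorimotorMemory()
A, B, C = {1, 2, 3}, {10, 11, 12}, {20, 21, 22}   # toy sensory SDRs
mem.learn(A, "+E", B)    # while sensing A, saccade east, then sense B
mem.learn(A, "+W", C)    # while sensing A, saccade west, then sense C
print(mem.predict(A, "+E") == B)   # True
print(mem.predict(A, "+W") == C)   # True
```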
How Does L5 Generate Motor Commands?
[Diagram: the layer stack (L2/3 learns high-order sequences, L4 learns sensory-motor transitions, L5 recalls motor sequences, L6 attention), with L5 projecting to the old brain]
One extreme: dynamic world, no behavior
- sensory encoder feeding a CLA high-order sequence memory
- Grok: detects anomalies in servers
Other extreme: static world
- Behavior is needed for any change in the sensory stream:
  1) Move the sensor: saccade, turn head, walk
  2) Interact with the world: push, lift, open, talk
Review: Mammals have pre-wired, sub-cortical behaviors
[Diagram: body + old brain with pre-wired behavior generators]
- walking, running
- eye movements, head turning
- grasping, reaching
- breathing, blinking, etc.
Review: Cortex models behavior of body and old brain
[Diagram: cortex (CLA) sitting on top of the body + old brain with its pre-wired behavior generators]
- Cortex learns sensory patterns.
- Cortex also learns sensory-motor patterns.
- It makes much better predictions when it knows the behavior.
How does cortex control behavior?
[Diagram: L4/L3 and L5 in the cortex, sitting above the body + old brain with its pre-wired behavior generators]
- L4/L3 is a sensory-motor model of the world.
- L3 projects to L5 and they share column activations; think of L5 as a copy of L3.
- L5 associatively links to sub-cortical behavior (the L3 -> L5 -> old brain associative link is learned).
- Cortex can now invoke behaviors (L5 generates behavior before feedforward inference).
Q. Why L5 in addition to L3?
A. To separate inference from behavior.
   - Inference: L4/L3
   - Behavior: L5
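As a very rough caricature of the associative-linking idea above (and only that, not a claim about the real circuitry), one can picture a link that is strengthened whenever an L5 state co-occurs with an old-brain behavior; once the link is strong enough, activating the state can invoke the behavior. All names here are hypothetical.

```python
from collections import Counter, defaultdict

class L5BehaviorLink:
    """Toy associative link from L5 states to pre-wired sub-cortical behaviors."""

    def __init__(self, min_count=3):
        self.counts = defaultdict(Counter)   # L5 state -> behavior -> co-occurrence count
        self.min_count = min_count

    def observe(self, l5_state, behavior):
        # During exploration, L5 states repeatedly co-occur with old-brain behaviors.
        self.counts[l5_state][behavior] += 1

    def invoke(self, l5_state):
        # Later, activating the L5 state can trigger the most strongly linked behavior.
        linked = self.counts[l5_state].most_common(1)
        if linked and linked[0][1] >= self.min_count:
            return linked[0][0]
        return None

link = L5BehaviorLink()
for _ in range(4):
    link.observe("l5_state_grasp", "grasp")   # repeated co-occurrence builds the link
print(link.invoke("l5_state_grasp"))          # grasp: cortex can now invoke the behavior
print(link.invoke("l5_state_novel"))          # None: no link has been learned
```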
Cortical Hierarchy
[Diagram: stacked regions, each containing L4/L3, L5, and L6; stable feedback flows down from the region above, a copy of the motor command flows up, and the lowest region sits on the sub-cortical behavior generator (body + old brain)]
Each cortical region treats the region below it in the hierarchy as the first region treats the pre-wired areas.
Goal Oriented Behavior via Feedback
[Diagram: a stable/invariant pattern from the higher region feeds back into the layer stack (2/3 complex cells, 4 simple cells, 5 complex cells, 6 simple cells); sensor/afferent data and a copy of motor commands arrive from below; output goes to sub-cortical motor areas, with a copy of the motor command sent up]
- Stable feedback invokes a union of sparse states in multiple sequences (the “goal”).
- Feedforward input selects one state and plays back the motor sequence from there.
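A toy sketch of the playback idea just described, under the assumption that a “goal” can be modeled as the union of states drawn from several learned motor sequences: the current feedforward state selects one member of that union and the rest of the sequence is replayed from there. The sequence names and states are invented for illustration.

```python
# Illustrative only: a "goal" as the union of states from several learned motor sequences.
sequences = {
    "reach_from_left":  ["s1", "s2", "s3", "grasp"],
    "reach_from_right": ["s7", "s8", "s3", "grasp"],
}

def goal_union(seqs):
    # Stable feedback invokes every state belonging to any sequence that reaches the goal.
    return {state for seq in seqs.values() for state in seq}

def play_back(seqs, goal, current_state):
    # Feedforward input selects the one state that is both sensed and in the goal union,
    # then the motor sequence is played back from that point.
    for seq in seqs.values():
        if current_state in goal and current_state in seq:
            return seq[seq.index(current_state):]
    return []

goal = goal_union(sequences)
print(play_back(sequences, goal, "s8"))   # ['s8', 's3', 'grasp']
print(play_back(sequences, goal, "s1"))   # ['s1', 's2', 's3', 'grasp']
```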
Short term goal: Build a one-region sensorimotor system
[Diagram: L4/L3 and L5 on top of the body + old brain with its pre-wired behavior generators]
Hypothesis:
The only way to build machines with intelligent, goal-oriented behavior is to use the same principles as the neocortex.
[Diagram: retina, cochlea, and somatic sensory data, plus vestibular and proprioceptive input, feed the old brain (spine, brainstem, basal ganglia), which generates behaviors; cortical regions 1 and 2 sit on top]
Cortical Principles
[Diagram: retina, cochlea, and somatic data streams feeding a hierarchy of memory regions]
1) On-line learning from streaming data
2) Hierarchy of memory regions
3) Sequence memory
   - inference
   - motor
4) Sparse Distributed Representations
5) All regions are sensory and motor
6) Attention
These six principles are necessary and sufficient for biological and machine intelligence.
What the Cortex Does
[Diagram: retina, cochlea, and somatic data streams entering the neocortex]
The neocortex learns a model from sensory data and produces:
- predictions
- anomalies
- actions
The neocortex learns a sensory-motor model of the world.
Attributes of Solution
- Independent of sensory modality and motor modality
- Should work with biological and non-biological senses
  - Biological: vision, audition, touch, proprioception
  - Non-biological: M2M data, IT data, financial data, etc.
- Can be understood in a single region
- Must work in a hierarchy
- Will be based on CLA principles
How does a layer of neurons learn sequences?
[Diagram: the layer stack, with L2/3 (learns high-order sequences) marked]
First:
- Sparse Distributed Representations
- Neurons
Dense Representations
- Few bits (8 to 128)
- All combinations of 1’s and 0’s
- Example: 8-bit ASCII
  01101101 = m
- Bits have no inherent meaning; they are arbitrarily assigned by the programmer
Sparse Distributed Representations (SDRs)
- Many bits (thousands)
- Few 1’s, mostly 0’s
- Example: 2,000 bits, 2% active
- Each bit has semantic meaning, and that meaning is learned
01000000000000000001000000000000000000000000000000000010000…………01000
SDR Properties
1) Similarity: shared bits = semantic similarity
2) Store and Compare: store only the indices of active bits (e.g. indices 1, 2, 3, 4, 5, …, 40); subsampling is OK (comparing against just indices 1, 2, …, 10 still identifies the pattern)
3) Union membership: OR together many SDRs (each ~2% active) into a union (e.g. 10 SDRs, ~20% active) and ask “is this SDR a member?”
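The three properties above translate directly into code. Here is a minimal sketch using plain Python sets of active-bit indices, with the 2,000-bit / 2%-active figures from the slide; the thresholds are arbitrary choices, not NuPIC parameters.

```python
import random

N_BITS, N_ACTIVE = 2000, 40          # 2,000 bits, 2% active (figures from the slide)

def random_sdr():
    return set(random.sample(range(N_BITS), N_ACTIVE))

# 1) Similarity: shared bits = semantic similarity
def overlap(a, b):
    return len(a & b)

# 2) Store and compare: keep only the indices of active bits; subsampling is OK
def matches(stored, candidate, subsample=10, threshold=8):
    sample = list(stored)[:subsample]
    return sum(bit in candidate for bit in sample) >= threshold

# 3) Union membership: OR many SDRs together, then test membership
def union(sdrs):
    u = set()
    for s in sdrs:
        u |= s
    return u

stored = [random_sdr() for _ in range(10)]
u = union(stored)                        # roughly 20% of bits active for 10 stored SDRs
print(overlap(stored[0], stored[1]))     # chance overlap of two random SDRs is tiny
print(matches(stored[0], stored[0]))     # True: a 10-bit subsample still identifies it
print(stored[3] <= u)                    # True: every member's bits are in the union
print(random_sdr() <= u)                 # almost certainly False for a non-member
```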
Neurons: Active Dendrites
Each cell can recognize 100’s of unique patterns.
- Feedforward: 100’s of synapses acting as pattern detectors, the “classic” receptive field
- Context: 1,000’s of synapses; a recognized context pattern depolarizes the neuron, putting it in a “predicted state”
This is the HTM model neuron.
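A small sketch of this model neuron: each dendrite segment stores a handful of synapses and “recognizes” its pattern when enough of them are active, putting the cell into a predicted (depolarized) state. Segment sizes and the activation threshold are illustrative.

```python
class ModelNeuron:
    """Sketch of an HTM-style model neuron: many dendrite segments, each a pattern detector."""

    def __init__(self, segment_threshold=10):
        self.segments = []                  # each segment: a set of presynaptic cell indices
        self.segment_threshold = segment_threshold

    def add_segment(self, presynaptic_cells):
        # A segment stores a subsample of the pattern it should recognize.
        self.segments.append(set(presynaptic_cells))

    def predicted(self, active_cells):
        # If any segment sees enough of its synapses active, the cell is
        # depolarized and enters the "predicted state".
        return any(len(seg & active_cells) >= self.segment_threshold
                   for seg in self.segments)

n = ModelNeuron()
n.add_segment(range(0, 20))                # one learned context pattern
print(n.predicted(set(range(0, 15))))      # True: 15 matching synapses >= threshold of 10
print(n.predicted(set(range(100, 120))))   # False: no learned segment matches
```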
Learning Transitions
- Feedforward activation, then inhibition, yields sparse cell activation.
- From Time = 1 to Time = 2, cells form connections to previously active cells and use them to predict future activity.
- Multiple predictions can occur at once (A-B, A-C, A-D).
This is a first-order sequence memory.
It cannot learn A-B-C-D vs. X-B-C-Y.
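Here is what those steps look like as a minimal first-order transition memory: newly active cells connect to the cells that were active on the previous time step and later use those connections to predict, and several predictions can be active at once. Because prediction depends only on the current input, it is first-order.

```python
from collections import defaultdict

class FirstOrderMemory:
    """Illustrative first-order transition memory over SDRs (sets of active cell indices)."""

    def __init__(self):
        self.connections = defaultdict(set)   # previously active cell -> cells it predicts
        self.prev_active = set()

    def compute(self, active_cells, learn=True):
        if learn:
            for prev in self.prev_active:
                self.connections[prev] |= set(active_cells)   # connect to previously active cells
        predicted = set()
        for cell in active_cells:
            predicted |= self.connections[cell]               # predict future activity
        self.prev_active = set(active_cells)
        return predicted

A, B, C = {1, 2}, {10, 11}, {20, 21}
m = FirstOrderMemory()
for seq in ([A, B], [A, C]):                 # learn the transitions A->B and A->C
    m.prev_active = set()                    # reset context between sequences
    for sdr in seq:
        m.compute(sdr)
print(m.compute(A, learn=False))             # {10, 11, 20, 21}: multiple simultaneous predictions
```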
High-order Sequences Require Mini-columns
A-B-C-D vs. X-B-C-Y
[Diagram: before training, an input such as B activates every cell in its columns; after training, the same columns are active but only one cell per column, so the sequence after A (B′, C′, D′) differs from the sequence after X (B″, C″, Y″)]
IF 40 active columns, 10 cells per column,
THEN 10^40 ways to represent the same input in different contexts.
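A compact sketch of the mini-column mechanism: an input activates a set of columns; if some cells in a column were predicted they fire alone, otherwise the column “bursts” and one cell is chosen to learn the context. With a different cell per column per context, A-B-C-D and X-B-C-Y stay distinct. The sizes are tiny and the winner-selection rule is simplified relative to any real implementation.

```python
from collections import defaultdict

CELLS_PER_COLUMN = 4                     # the slide uses 10 cells per column; 4 keeps this small

class MiniColumnMemory:
    """Toy high-order sequence memory: same columns, but one cell per column per context."""

    def __init__(self):
        self.segments = defaultdict(list)   # cell -> list of learned contexts (frozensets of cells)
        self.predicted = set()
        self.winners = set()

    def reset(self):
        self.predicted, self.winners = set(), set()

    def compute(self, active_columns, learn=True):
        prev_winners, new_winners = self.winners, set()
        for col in active_columns:
            cells = [(col, i) for i in range(CELLS_PER_COLUMN)]
            correct = [c for c in cells if c in self.predicted]
            if correct:
                new_winners.update(correct)            # predicted cells fire alone
            else:                                      # un-predicted input: the column "bursts"
                winner = min(cells, key=lambda c: (len(self.segments[c]), c))
                new_winners.add(winner)                # least-used cell represents this context
                if learn and prev_winners:
                    self.segments[winner].append(frozenset(prev_winners))
        # a cell becomes predictive if any of its learned contexts is active now
        self.predicted = {c for c, segs in self.segments.items()
                          if any(seg <= new_winners for seg in segs)}
        self.winners = new_winners
        return self.predicted

A, B, C, D, X, Y = [0], [1], [2], [3], [4], [5]        # one column per input symbol
tm = MiniColumnMemory()
for seq in ([A, B, C, D], [X, B, C, Y]):
    tm.reset()
    for cols in seq:
        tm.compute(cols)

for test in ([A, B, C], [X, B, C]):
    tm.reset()
    for cols in test:
        pred = tm.compute(cols, learn=False)
    print(pred)   # {(3, 0)} after A-B-C (a cell in D's column); {(5, 0)} after X-B-C (Y's column)
```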
Cortical Learning Algorithm (CLA), aka Cellular Layer
- Converts input to sparse representations in columns
- Learns transitions
- Makes predictions and detects anomalies
Applications
1) High-order sequence inference (L2/3)
2) Sensory-motor inference (L4)
3) Motor sequence recall (L5)
Capabilities
- On-line learning
- High capacity
- Simple local learning rules
- Fault tolerant
- No sensitive parameters
The basic building block of the neocortex and of machine intelligence.
Anomaly Detection Using CLA
[Diagram: for each of metrics 1…N, an encoder converts the metric into an SDR and a CLA makes a prediction; a point anomaly is computed from prediction vs. actual input, a time average and historical comparison produce an anomaly score per metric, and the per-metric scores combine into a system anomaly score]
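A hedged sketch of the per-metric pipeline in the diagram: an encoder turns each value into an SDR, a predictor (here a trivial stand-in for the CLA) supplies the expected SDR, the point anomaly is the fraction of active bits that were not predicted, and a time average smooths it into an anomaly score. The encoder parameters and the last-value predictor are assumptions, not Grok's design.

```python
def encode(value, n_bits=400, n_active=21, min_v=0.0, max_v=100.0):
    """Simple scalar encoder: a block of contiguous active bits positioned by value."""
    span = n_bits - n_active
    start = int(round((value - min_v) / (max_v - min_v) * span))
    return set(range(start, start + n_active))

def point_anomaly(actual_sdr, predicted_sdr):
    """Fraction of active bits that were not predicted (0 = fully expected, 1 = novel)."""
    if not actual_sdr:
        return 0.0
    return 1.0 - len(actual_sdr & predicted_sdr) / len(actual_sdr)

class AnomalyScore:
    """Time-averaged anomaly score for one metric (historical comparison omitted)."""
    def __init__(self, alpha=0.2):
        self.alpha, self.avg = alpha, 0.0
    def update(self, raw):
        self.avg = self.alpha * raw + (1 - self.alpha) * self.avg
        return self.avg

# Toy usage: a "predict the last value" rule stands in for the CLA's prediction.
scorer, predicted = AnomalyScore(), set()
for value in [10, 11, 10, 12, 11, 95, 11, 10]:            # 95 is the anomaly
    actual = encode(value)
    print(value, round(scorer.update(point_anomaly(actual, predicted)), 2))
    predicted = actual                                      # next step's prediction
```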
Grok for Amazon AWS
“Breakthrough Science for Anomaly Detection”
- Automated model creation
- Ranks anomalous instances
- Continuously updated
- Continuously learning
Unique Value of Cortical Algorithms
The technology can be applied to any kind of data: financial, manufacturing, web sales, etc.
Application: CEPT Systems
100K “Word SDRs” (128 x 128 bits each), built from a document corpus (e.g. Wikipedia)
Example: “Apple” − “Fruit” = Computer, Macintosh, Microsoft, Mac, Linux, Operating system, …
Sequences of Word SDRs
Training set (Word 1, Word 2, Word 3):
frog      eats   flies
cow       eats   grain
elephant  eats   leaves
goat      eats   grass
wolf      eats   rabbit
cat       likes  ball
elephant  likes  water
sheep     eats   grass
cat       eats   salmon
wolf      eats   mice
lion      eats   cow
dog       likes  sleep
elephant  likes  water
cat       likes  ball
coyote    eats   rodent
coyote    eats   rabbit
wolf      eats   squirrel
dog       likes  sleep
cat       likes  ball
Sequences of Word SDRs
(same training set as above)
“fox”  eats  ?
Sequences of Word SDRs
(same training set as above)
“fox”  eats  rodent
1) Word SDRs are created unsupervised.
2) Semantic generalization
   - SDR: lexical
   - CLA: grammatical
3) Commercial applications
   - Sentiment analysis
   - Abstraction
   - Improved text-to-speech
   - Dialog, reporting, etc.
www.Cept.at
CEPT and Grok use the exact same code base.
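A toy illustration of why the system can answer “fox eats ?” without ever training on “fox”: word SDRs are semantic, so the SDR for fox shares bits with wolf and coyote, and a transition memory keyed on overlapping SDRs generalizes to the unseen word. The hand-made SDRs below are purely illustrative, and the intermediate word “eats” is omitted for brevity.

```python
# Hand-made "word SDRs": shared bit indices stand for shared meaning (illustrative only).
wolf   = {1, 2, 3, 4, 10, 11}
coyote = {1, 2, 3, 5, 12, 13}
fox    = {1, 2, 3, 6, 14, 15}      # never seen in training, but overlaps wolf and coyote
rodent = {40, 41, 42, 43}
grass  = {60, 61, 62, 63}

# Learned transitions (loosely based on the training set above): subject SDR -> object SDR.
transitions = [(wolf, rodent), (coyote, rodent)]

def predict(subject_sdr, threshold=3):
    """Union of the objects whose learned subjects overlap the query SDR enough."""
    prediction = set()
    for learned_subject, obj in transitions:
        if len(learned_subject & subject_sdr) >= threshold:
            prediction |= obj
    return prediction

print(predict(fox))     # {40, 41, 42, 43}, the "rodent" SDR: fox shares bits 1, 2, 3
print(predict(grass))   # set(): no semantic overlap, nothing is predicted
```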
NuPIC Open Source Project (created July 2013)
Source code for:
- Cortical Learning Algorithm
- Encoders
- Support libraries
Single source tree (used by Grok), GPLv3
Active and growing community (as of 2/2014)
- 80 contributors
- 533 mailing list subscribers
Education resources / hackathons
www.Numenta.org
Goals For 2014
Research
- Implement and test L4 sensory/motor inference
- Introduce hierarchy
- Publish
NuPIC
- Grow open source community
- Support partners, e.g. IBM Almaden, CEPT
Grok
- Demonstrate commercial value for CLA
  - Attract resources (people and money)
  - Provide a “target” and market for HW
- Explore new application areas
The Limits of Software
Current CLA implementation (highly optimized):
- One layer, no hierarchy
- 2,000 mini-columns
- 30 cells per mini-column
- Up to 128 dendrite segments per cell
- Up to 40 active synapses per segment
- Thousands of potential synapses per segment
- Roughly one millionth of human cortex
- 50 msec per inference/training step
We need HW for the usual reasons:
- Speed
- Power
- Volume
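A back-of-the-envelope reading of those numbers; the bytes-per-synapse and potential-synapse figures are assumptions, not values from the slide, but they make the speed/power/volume argument concrete.

```python
# Figures from the slide
columns, cells_per_column = 2000, 30
segments_per_cell = 128                        # "up to"
potential_synapses_per_segment = 1000          # assumption: "thousands" per segment
bytes_per_synapse = 4                          # assumption: packed index + permanence

cells = columns * cells_per_column                                     # 60,000 cells
synapses = cells * segments_per_cell * potential_synapses_per_segment  # ~7.7 billion potential synapses
memory_gb = synapses * bytes_per_synapse / 1e9                         # ~31 GB in the worst case
steps_per_second = 1 / 0.050                                           # 20 steps/s at 50 ms per step

print(cells, synapses, round(memory_gb, 1), steps_per_second)
# At one millionth of human cortex, scaling this up in software alone is the problem:
# hence HW for speed, power, and volume.
```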
Start of supplemental material
“If you invent a breakthrough so computers can learn, that is worth 10 Microsofts.”
"If you invent a
Machine
Intelligence
breakthrough so
computers that learn
that is worth 10 Micros
1) What principles will we use to build intelligent machines?
2) What applications will drive adoption in the near and long term?

Machine intelligence will be built on the principles of the neocortex.
1) Flexible (universal learning machine)
2) Robust
3) If we knew how the neocortex worked, we would be in a race to build such machines.