Advances in Modeling Neocortex and its impact on machine intelligence Jeff Hawkins

Advances in Modeling Neocortex
and its impact on machine intelligence
Jeff Hawkins
Numenta Inc.
VS265 Neural Computation
December 2, 2010
Documentation for the algorithms described in this talk can be found at
www.numenta.com (papers)
Premise
1) The principles of brain function can be understood.
2) We can build machines that work on these principles.
3) Many machine learning, A.I., and robotics problems
can only be solved this way.
Neocortex Is Our Focus
- 75% of volume of human brain
- All high level vision, audition, motor, language, thought
- Composed of a repetitive element
  - complex
  - hierarchical
Process
[Diagram: Neurobiology (anatomy, physiology); Algorithmic needs; Biological model; Computer model; empirical results]
Our computer models are biologically and empirically driven.
Neocortex (large scale architecture)
Neocortex overall
- 1000 cm^2, 2 mm thick
- 30 billion cells
- 100 trillion synapses
Regions
- Nearly identical architecture
- Differentiated by connectivity
- Common algorithms
Hierarchy
- Convergence
- Temporal slowness
Hierarchical Temporal Memory (Basic)
Regions
- Learn common spatial patterns
  (Sparse Distributed Representations of input)
- Learn sequences of common spatial patterns
  (variable order transitions of SDRs)
- Pass stable representations up hierarchy
- Unfold sequences going down hierarchy
Hierarchy
- Reduces memory and training time
- Provides means of generalization
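As a minimal sketch of what a Sparse Distributed Representation could look like in code, the Python below stores an SDR as the set of its active bit indices and compares two SDRs by counting shared active bits. The 2048-bit width, the 40 active bits, and all function names are illustrative assumptions, not values from the talk.

```python
import random

def random_sdr(n_bits, n_active, rng):
    """Return an SDR as a frozenset of active bit indices (assumed encoding)."""
    return frozenset(rng.sample(range(n_bits), n_active))

def overlap(a, b):
    """Number of active bits two SDRs share."""
    return len(a & b)

def add_noise(sdr, n_bits, n_flip, rng):
    """Move n_flip active bits to randomly chosen inactive positions."""
    dropped = set(rng.sample(sorted(sdr), n_flip))
    inactive = [i for i in range(n_bits) if i not in sdr]
    added = set(rng.sample(inactive, n_flip))
    return frozenset((sdr - dropped) | added)

rng = random.Random(0)
N_BITS, N_ACTIVE = 2048, 40          # hypothetical sizes
a = random_sdr(N_BITS, N_ACTIVE, rng)
b = random_sdr(N_BITS, N_ACTIVE, rng)
noisy_a = add_noise(a, N_BITS, 10, rng)

print(overlap(a, noisy_a))  # 30: the 30 bits that were not moved
print(overlap(a, b))        # expected to be small for unrelated random SDRs
```

Even with a quarter of its bits moved, the noisy pattern still shares far more bits with the original than an unrelated pattern is likely to.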
Sequence Memory is Key
Why is sequence memory so important?
- Prediction
- Motor behavior
- Time-based inference
- Spatial inference
Attributes
- High capacity
- Context dependent
- Robust
- Multiple simultaneous predictions
- Form stable representations of sequences
- On-line learning
Neocortical regions
Biology
[Diagram: cortical layers 1, 2/3, 4, 5, 6; to higher region; from lower region]
- Five layers of cells
- Densely packed
- Massively interconnected
- Cells in columns have similar response properties
- Majority of connections are within layer
- Feed forward connections are few but strong
- Layers 4 and 3 are primary feed forward layers
- Layer 4 disappears as you ascend hierarchy
Hypothesis
- Common mechanism is used in each layer
  Each layer is a sequence memory
  Learns transitions of sparse distributed patterns
- Layer 4 learns first order transitions
  Ideal for spatial inference, “simple cells”
- Layer 3 learns variable order transitions
  Ideal for time-based inference, “complex cells”
- Layer 5 motor
  specific timing
- Layers 2, 6 feedback, attention
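To illustrate the difference between first order and variable order transitions, here is a toy Python sketch. It uses symbol strings and lookup tables rather than cells and SDRs, so it is a simplification for intuition only; the sequences "ABCD" and "XBCY" and both function names are made up for this example.

```python
from collections import defaultdict

def learn_first_order(sequences):
    """Map each symbol to the set of symbols seen immediately after it."""
    table = defaultdict(set)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            table[prev].add(nxt)
    return table

def learn_with_context(sequences):
    """Map each full prefix (the context so far) to the symbols seen next."""
    table = defaultdict(set)
    for seq in sequences:
        for i in range(1, len(seq)):
            table[seq[:i]].add(seq[i])
    return table

sequences = ["ABCD", "XBCY"]   # made-up training sequences
first = learn_first_order(sequences)
ctx = learn_with_context(sequences)

print(sorted(first["C"]))    # ['D', 'Y']: two simultaneous predictions
print(sorted(ctx["ABC"]))    # ['D']
print(sorted(ctx["XBC"]))    # ['Y']
```

With only first order transitions, "C" alone cannot say which sequence is in progress; carrying prior context resolves the prediction.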
Neurons
Real neuron
- Proximal dendrites
  Linear summation
  Feed forward connections
- Distal dendrites
  Dozens of regions
  Non-linear integration
  Connections to other cells in layer
- Synapses
  Thousands on distal dendrites
  Hundreds on proximal dendrites
  Numerous learning rules
  Forming and un-forming constantly
- Output
  Variable spike rate
  Bursts of spikes
  Projects laterally and inter-layer
Not a neuron
- Sum of weighted synapses
- Non-linear function
- Scalar output
HTM neuron
- Proximal dendrite
  Linear summation
  Feed forward connections
- Distal dendrites
  Dozens of regions
  Threshold coincidence detectors
  Connections to other cells in layer
- Synapses
  Thousands on distal dendrites (dozens per segment)
  Hundreds on proximal dendrite
  Scalar permanence
  Binary weight
- Output
  Active state (fast or burst)
  Predictive state (slow)
  Projects laterally and inter-region
HTM Regions
What is an HTM region?
- A set of neurons arranged in columns
- Cells in column have same feed forward activation
- Cells in column have different response in context
What does a region do?
1) Creates a sparse distributed representation of input
2) Creates a representation of input in the context of prior inputs
3) Learns sequences of representations from 2)
4) Forms a prediction based on the current input in the context of previous inputs
This prediction is a slow changing representation of sequence.
It is the output of the region.
Cellular layer - 1 cell per column
[Figure sequence]
- Internal potential of cells
  (via feed forward input to proximal synapses)
- Cells with highest potential fire first, inhibit neighbors
- Sparse Distributed Representation of input (time = 1)
- Sparse Distributed Representation of input (time = 2)
- Prediction (via lateral connections to distal dendrite segments)
- With 1 cell per column, all transitions are first order
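The first steps of the one-cell-per-column case (potential from feed forward input, then the strongest columns firing and inhibiting the rest) can be sketched in Python as below. This is a hedged illustration: inhibition is applied across the whole region rather than among neighbors, and the synapse layout, sizes, and function names are assumptions made for the example.

```python
def column_potentials(input_bits, proximal_synapses):
    """Linear summation: count each column's synapses that sit on active inputs."""
    return [sum(1 for i in syn if i in input_bits) for syn in proximal_synapses]

def active_columns(potentials, n_winners):
    """Columns with the highest potential win; all others are inhibited.

    Assumption: one global competition stands in for local inhibition.
    """
    ranked = sorted(range(len(potentials)), key=lambda c: (-potentials[c], c))
    return {c for c in ranked[:n_winners] if potentials[c] > 0}

# Hypothetical region: 6 columns, each connected to a few of 8 input bits.
proximal_synapses = [{0, 1, 2}, {1, 2, 3}, {3, 4, 5}, {4, 5, 6}, {5, 6, 7}, {0, 7}]
input_bits = {1, 2, 3}

potentials = column_potentials(input_bits, proximal_synapses)
print(potentials)                      # [2, 3, 1, 0, 0, 0]
print(active_columns(potentials, 2))   # {0, 1}
```

The set of winning columns is the sparse distributed representation of the input at that time step.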
Cellular layer - 4 cells per column
[Figure sequence]
- No context: Sparse Distributed Representation of input (if unpredicted, all cells in a column fire)
- With prior context: Sparse Distributed Representation of input (if predicted, one cell in a column fires)
  Represents input in the context of prior states (variable order sequence memory)
- Prediction
  In the context of prior states
[Figure labels: Unpredicted column; Predicted columns]
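The rule for multiple cells per column (a predicted column fires only its predicted cell, an unpredicted column fires all of its cells) can be sketched as follows. Representing cells as (column, cell) pairs and the function name `activate_cells` are assumptions for illustration.

```python
def activate_cells(active_cols, predictive_cells, cells_per_column):
    """Return active cells as (column, cell) pairs.

    A column with predicted cells activates only those cells; a column with no
    predicted cell activates every cell it contains.
    """
    active = set()
    for col in active_cols:
        predicted = {(c, i) for (c, i) in predictive_cells if c == col}
        if predicted:
            active |= predicted
        else:
            active |= {(col, i) for i in range(cells_per_column)}
    return active

predictive = {(2, 1), (5, 3)}            # cells put in predictive state earlier
cells = activate_cells({2, 7}, predictive, cells_per_column=4)
print(sorted(cells))
# [(2, 1), (7, 0), (7, 1), (7, 2), (7, 3)]
```

Column 2 was predicted, so a single cell encodes the input in its context; column 7 was not, so all four of its cells fire.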
HTM Neuron
[Diagram labels: Feed forward input; Lateral connections from other cells in region]
Distal dendrite segments
- Act like coincidence detectors
  Recognize state of region
- When segment active, cell enters predictive state
- Typical threshold, 15 active synapses, sufficient!
- Synapses formed from “potential synapse” pool
- Each segment can learn several patterns without error
- One cell can participate in many different sequences
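A distal segment acting as a threshold coincidence detector can be sketched in Python as below. The threshold of 15 and the "permanence > 0.2 is valid" rule come from the slides; storing a segment as a dictionary from presynaptic cell id to permanence, and both function names, are assumptions for illustration.

```python
CONNECTED_PERMANENCE = 0.2   # from the talk: permanence > 0.2 counts as valid
ACTIVATION_THRESHOLD = 15    # from the talk: typical segment threshold

def segment_is_active(segment, active_cells,
                      threshold=ACTIVATION_THRESHOLD,
                      connected=CONNECTED_PERMANENCE):
    """segment maps presynaptic cell id -> permanence (assumed layout)."""
    hits = sum(1 for cell, perm in segment.items()
               if perm > connected and cell in active_cells)
    return hits >= threshold

def cell_is_predictive(segments, active_cells):
    """A cell enters the predictive state if any of its segments is active."""
    return any(segment_is_active(s, active_cells) for s in segments)

# Hypothetical segment with 20 valid synapses onto cells 0..19.
segment = {cell: 0.5 for cell in range(20)}
print(segment_is_active(segment, set(range(15))))   # True  (15 active inputs)
print(segment_is_active(segment, set(range(14))))   # False (only 14)
```

Because each synapse contributes a binary vote, the segment responds to a pattern of coincident activity rather than to a weighted sum.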
Proximal dendrite
- Shared by all cells in a column
- Linear summation of input
- Boosted by duty cycle
- Synapses formed from “potential synapse” pool
- Leads to self-adjusting representations
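The slide does not say how boosting by duty cycle is computed, so the Python below is only one plausible sketch. The running-average period, the linear boost rule, the `max_boost` value, and all function names are assumptions, not the talk's method.

```python
def update_duty_cycle(duty_cycle, was_active, period=1000):
    """Running average of how often a column has been active (assumed form)."""
    return (duty_cycle * (period - 1) + (1.0 if was_active else 0.0)) / period

def boost_factor(duty_cycle, min_duty_cycle, max_boost=10.0):
    """Assumed rule: 1.0 at or above the minimum duty cycle,
    rising linearly to max_boost as the duty cycle falls to zero."""
    if min_duty_cycle <= 0 or duty_cycle >= min_duty_cycle:
        return 1.0
    return max_boost - (max_boost - 1.0) * duty_cycle / min_duty_cycle

def boosted_potential(raw_sum, duty_cycle, min_duty_cycle):
    """Scale a column's linear input sum by its boost factor."""
    return raw_sum * boost_factor(duty_cycle, min_duty_cycle)

print(boost_factor(0.02, 0.01))          # 1.0: column is active often enough
print(boost_factor(0.0, 0.01))           # 10.0: column has never been active
print(boosted_potential(3, 0.0, 0.01))   # 30.0
```

Columns that rarely win get a larger effective input, which is one way representations could adjust themselves so that all columns take part.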
Learning rules
- If a segment is active
  Modify synapses on active segment
  Modify synapses on segment best matching t=-1
- Modify permanence for all potential synapses
  Increase for active synapses
  Decrease for inactive synapses
  Permanence ranges from 0.0 to 1.0; a synapse is valid when permanence > 0.2
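The permanence rule can be sketched in Python as below. The 0.0 to 1.0 range and the 0.2 validity threshold come from the slide; the increment and decrement sizes, the dictionary layout, and the function names are assumptions for illustration.

```python
CONNECTED_PERMANENCE = 0.2  # from the talk: > 0.2 = valid synapse

def update_permanences(potential_synapses, active_inputs, inc=0.05, dec=0.02):
    """Raise permanence of active synapses, lower inactive ones, clip to [0, 1].

    inc and dec are hypothetical step sizes.
    """
    updated = {}
    for source, perm in potential_synapses.items():
        delta = inc if source in active_inputs else -dec
        updated[source] = min(1.0, max(0.0, perm + delta))
    return updated

def valid_synapses(potential_synapses):
    """Synapses whose permanence is above the threshold have binary weight 1."""
    return {s for s, p in potential_synapses.items() if p > CONNECTED_PERMANENCE}

pool = {"a": 0.18, "b": 0.21, "c": 0.99, "d": 0.01}
pool = update_permanences(pool, active_inputs={"a", "c"})
print(sorted(valid_synapses(pool)))   # ['a', 'c']
```

In this run synapse "a" crosses the threshold and forms, "b" drops below it and un-forms, and "c" and "d" are clipped at the ends of the range.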
HTM Cortical Learning Algorithm
Variable order sequence memory
Time-based and static inference
Massive predictive ability
Uses sparse distributed representations
High capacity
High noise immunity
On-line learning
Deep biological mapping
Self adjusting representations
What’s next
Commercial Applications
Numenta is applying these new algorithms to
data analytics problems, e.g.
- Credit card fraud
- Large sensor environments
- Web click prediction
Research
Full documentation plus pseudo code at
www.numenta.com (papers)
We will release software in 2011
All use is free for research purposes
Engage Numenta for further discussions
Employment
Interns and full time positions
jhawkins@numenta.com