Advances in Modeling Neocortex and Its Impact on Machine Intelligence
Jeff Hawkins, Numenta Inc.
VS265 Neural Computation, December 2, 2010
Documentation for the algorithms described in this talk can be found at www.numenta.com (papers).

Premise
1) The principles of brain function can be understood.
2) We can build machines that work on these principles.
3) Many machine learning, A.I., and robotics problems can only be solved this way.

Neocortex Is Our Focus
- 75% of the volume of the human brain
- All high-level vision, audition, motor, language, and thought
- Composed of a repetitive element
  - Complex
  - Hierarchical

Process
Neurobiology (anatomy, physiology) informs a biological model, which informs a computer model; algorithmic needs and empirical results feed back into the loop.
Our computer models are biologically and empirically driven.

Neocortex (Large-Scale Architecture)
Overall
- 1000 cm^2, 2 mm thick
- 30 billion cells
- 100 trillion synapses
Regions
- Nearly identical architecture
- Differentiated by connectivity
- Common algorithms
Hierarchy
- Convergence
- Temporal slowness

Hierarchical Temporal Memory (Basic)
Regions
- Learn common spatial patterns (Sparse Distributed Representations of the input)
- Learn sequences of common spatial patterns (variable-order transitions of SDRs)
- Pass stable representations up the hierarchy
- Unfold sequences going down the hierarchy
Hierarchy
- Reduces memory and training time
- Provides a means of generalization

Sequence Memory Is Key
Why is sequence memory so important?
- Prediction
- Motor behavior
- Time-based inference
- Spatial inference
Attributes
- High capacity
- Context dependent
- Robust
- Multiple simultaneous predictions
- Forms stable representations of sequences
- On-line learning

Neocortical Regions
Biology
- Five layers of cells (1, 2/3, 4, 5, 6), densely packed and massively interconnected; feed-forward input arrives from lower regions and output projects to higher regions
- Cells in columns have similar response properties
- The majority of connections are within a layer
- Feed-forward connections are few but strong
- Layers 4 and 3 are the primary feed-forward layers
- Layer 4 disappears as you ascend the hierarchy
Hypothesis
- A common mechanism is used in each layer
- Each layer is a sequence memory: it learns transitions of sparse distributed patterns
- Layer 4 learns first-order transitions: ideal for spatial inference ("simple cells")
- Layer 3 learns variable-order transitions: ideal for time-based inference ("complex cells")
- Layer 5: motor, specific timing
- Layers 2 and 6: feedback, attention

Neurons
Not a neuron (the classic point neuron)
- Sum of weighted synapses
- Non-linear function
- Scalar output
Real neuron
- Proximal dendrites: linear summation of feed-forward connections
- Distal dendrites: dozens of segments, non-linear integration, connections to other cells in the layer
- Synapses: thousands on distal dendrites, hundreds on proximal dendrites, numerous learning rules, forming and un-forming constantly
- Output: variable spike rate, bursts of spikes, projects laterally and between layers
HTM neuron
- Proximal dendrite: linear summation of feed-forward connections
- Distal dendrites: dozens of segments, threshold coincidence detectors, connections to other cells in the layer
- Synapses: thousands on distal dendrites (dozens per segment), hundreds on the proximal dendrite, scalar permanence, binary weight
- Output: active state (fast or burst), predictive state (slow), projects laterally and between regions

HTM Regions
What is an HTM region?
- A set of neurons arranged in columns
- Cells in a column have the same feed-forward activation
- Cells in a column respond differently depending on context
What does a region do?
1) Creates a sparse distributed representation of the input
2) Creates a representation of the input in the context of prior inputs
3) Learns sequences of the representations from 2)
4) Forms a prediction based on the current input in the context of previous inputs
This prediction is a slowly changing representation of the sequence; it is the output of the region.

Cellular Layer: 1 Cell per Column
- Feed-forward input to the proximal synapses sets the internal potential of the cells
- Cells with the highest potential fire first and inhibit their neighbors
- The result is a Sparse Distributed Representation of the input at each time step (time = 1, time = 2, ...)
- Prediction arises via lateral connections to distal dendrite segments
- With 1 cell per column, all transitions are first order

Cellular Layer: 4 Cells per Column
- No prior context: if the input is unpredicted, all cells in a column fire
- With prior context: if the input is predicted, one cell in a column fires
- This represents the input in the context of prior states (variable-order sequence memory)
- Prediction is likewise made in the context of prior states, mixing predicted and unpredicted columns

HTM Neuron
Distal dendrite segments
- Act like coincidence detectors: they recognize the state of the region via lateral connections from other cells in the region
- When a segment is active, the cell enters the predictive state
- A typical threshold of 15 active synapses is sufficient
- Synapses are formed from a "potential synapse" pool
- Each segment can learn several patterns without error
- One cell can participate in many different sequences
Proximal dendrite
- Shared by all cells in a column
- Linear summation of feed-forward input, boosted by duty cycle
- Synapses are formed from a "potential synapse" pool
- Leads to self-adjusting representations
Learning rules
- If a segment is active: modify synapses on the active segment and on the segment best matching the state at t-1
- Modify the permanence of all potential synapses: increase for active synapses, decrease for inactive synapses
- Permanence ranges from 0.0 to 1.0; a permanence > 0.2 makes a synapse valid (connected)
(Code sketches illustrating the neuron model and these learning rules follow the summary below.)

HTM Cortical Learning Algorithm
- Variable-order sequence memory
- Time-based and static inference
- Massive predictive ability
- Uses sparse distributed representations
- High capacity
- High noise immunity
- On-line learning
- Deep biological mapping
- Self-adjusting representations
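To make the neuron comparison above concrete, here is a minimal, hypothetical sketch of an HTM-style neuron in Python. It is not Numenta's implementation (see the pseudo code at www.numenta.com for that); the class names (Synapse, DendriteSegment, HTMNeuron) and the data layout are assumptions, while the 15-synapse segment threshold and the 0.2 permanence cutoff come from the slides above.

```python
# Hypothetical sketch of the HTM neuron described in this talk (not Numenta's code).
from dataclasses import dataclass, field

CONNECTED_PERM = 0.2      # permanence > 0.2 = valid (connected) synapse
SEGMENT_THRESHOLD = 15    # ~15 active synapses activate a distal segment

@dataclass
class Synapse:
    source: int            # index of a presynaptic cell (or input bit)
    permanence: float      # scalar permanence in [0.0, 1.0]

    @property
    def connected(self) -> bool:
        return self.permanence > CONNECTED_PERM

@dataclass
class DendriteSegment:
    """A distal segment: a threshold coincidence detector on lateral activity."""
    synapses: list = field(default_factory=list)

    def active(self, active_cells: set) -> bool:
        hits = sum(1 for s in self.synapses
                   if s.connected and s.source in active_cells)
        return hits >= SEGMENT_THRESHOLD

@dataclass
class HTMNeuron:
    proximal: list = field(default_factory=list)   # feed-forward synapses (binary weight)
    segments: list = field(default_factory=list)   # dozens of distal segments
    boost: float = 1.0                              # duty-cycle boost

    def feedforward_overlap(self, input_bits: set) -> float:
        """Linear summation on the proximal dendrite (shared by a column)."""
        overlap = sum(1 for s in self.proximal
                      if s.connected and s.source in input_bits)
        return overlap * self.boost

    def predictive(self, active_cells: set) -> bool:
        """Cell enters the predictive state if any distal segment is active."""
        return any(seg.active(active_cells) for seg in self.segments)
```

As in the slides, the effective synaptic weight here is binary: a synapse either is connected (permanence above the cutoff) or is not, and learning moves permanences rather than weights.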
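The column behavior and permanence learning rule described above can also be sketched. The following is a simplified, hypothetical single time step of a region: it computes feed-forward overlaps, keeps the most active columns (inhibition), lets predicted columns activate only their predicted cells while unpredicted columns burst, and applies the increase/decrease permanence rule. For brevity the learning step is shown only for the feed-forward (proximal) synapses; the sizes, the 5% sparsity, and the 0.05/0.02 permanence increments are illustrative assumptions.

```python
# Hypothetical sketch of one time step in an HTM region, following the
# description above. A simplification, not Numenta's released algorithm.
import numpy as np

rng = np.random.default_rng(0)

N_COLUMNS, CELLS_PER_COLUMN, N_INPUTS = 64, 4, 128
SPARSITY = 0.05                                   # fraction of columns active per step
CONNECTED_PERM, PERM_INC, PERM_DEC = 0.2, 0.05, 0.02

# Proximal (feed-forward) permanences: one pool of potential synapses per column.
proximal_perm = rng.uniform(0.0, 0.4, size=(N_COLUMNS, N_INPUTS))

def active_columns(input_bits: np.ndarray) -> np.ndarray:
    """Columns with the highest feed-forward overlap fire and (implicitly)
    inhibit their neighbors, yielding a sparse set of active columns."""
    connected = proximal_perm > CONNECTED_PERM
    overlap = (connected & input_bits.astype(bool)).sum(axis=1)
    k = max(1, int(SPARSITY * N_COLUMNS))
    return np.argsort(overlap)[-k:]

def activate_cells(cols: np.ndarray, predictive: np.ndarray) -> np.ndarray:
    """With several cells per column: if a column was predicted, only the
    predicted cell(s) fire; if it was unpredicted, every cell bursts."""
    active = np.zeros((N_COLUMNS, CELLS_PER_COLUMN), dtype=bool)
    for c in cols:
        if predictive[c].any():
            active[c] = predictive[c]      # predicted: context-specific cells
        else:
            active[c] = True               # unpredicted: the whole column bursts
    return active

def learn(cols: np.ndarray, input_bits: np.ndarray) -> None:
    """Permanence update on the active columns' potential synapses:
    increase for active inputs, decrease for inactive ones, clipped to [0, 1]."""
    on = input_bits.astype(bool)
    proximal_perm[cols] += np.where(on, PERM_INC, -PERM_DEC)
    np.clip(proximal_perm, 0.0, 1.0, out=proximal_perm)

# Example step with a random binary input and no prior prediction.
x = (rng.random(N_INPUTS) < 0.1).astype(int)
cols = active_columns(x)
cells = activate_cells(cols, predictive=np.zeros((N_COLUMNS, CELLS_PER_COLUMN), bool))
learn(cols, x)
print("active columns:", np.sort(cols))
print("cells firing:", int(cells.sum()))
```

A fuller implementation would also grow and reinforce distal segments, so that the predictive state computed at one step selects the context-specific cells at the next, as the slides describe.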
What's Next

Commercial Applications
Numenta is applying these new algorithms to data analytics problems, e.g.:
- Credit card fraud
- Large sensor environments
- Web click prediction

Research
- Full documentation plus pseudo code at www.numenta.com (papers)
- We will release software in 2011
- All use is free for research purposes
- Engage Numenta for further discussions

Employment
- Intern and full-time positions: jhawkins@numenta.com

The Future of Machine Intelligence