Sensory-Motor Integration in HTM Theory
Jeff Hawkins, jhawkins@Numenta.com
San Jose Meetup, 3/14/14

Numenta: catalyst for machine intelligence
- Research: HTM theory and algorithms; NuPIC open source community
- Products: streaming analytics based on cortical algorithms

What is the Problem?
Sensory-motor integration, sensorimotor contingencies, embodied A.I., symbol grounding.
Most changes in sensor data are due to our own actions, so our model of the world is a "sensory-motor model."
1) How does the cortex learn this sensory-motor model?
2) How does the cortex generate behavior?

Hierarchical Temporal Memory (Review)
1) Hierarchy of nearly identical regions - across species and across modalities (retina, cochlea, somatic data streams)
2) Regions are mostly sequence memory - inference and motor
3) Feedforward: temporal stability - inference - the world seems stable
   Feedback: unfold sequences - motor/prediction

Cortical Learning Algorithm (Review)
A model of a layer of cells in cortex: a sequence memory that
- converts input to sparse representations,
- learns sequences,
- makes predictions and detects anomalies.
Capabilities: on-line learning, large capacity, high-order, simple local learning rules, fault tolerant.

Cortex and Behavior (the big picture)
- The old brain (spinal cord, brain stem, basal ganglia) has complex behaviors.
- The cortex evolved "on top" of the old brain.
- The cortex receives both sensory data (retina, cochlea, somatic, vestibular, proprioceptive) and copies of motor commands, so it has the information needed for sensory-motor inference.
- The cortex learns to "control" the old brain via associative linking, so it has the connections needed to generate goal-oriented behavior.

Layers & Mini-columns
A region (about 2.5 mm deep) contains cellular layers 2/3, 4, 5, and 6, each acting as a sequence memory.
1) The cellular layer is the unit of cortical computation.
2) Each layer implements a variant of the CLA.
3) Mini-columns play a critical role in how layers learn sequences.

L4 and L2/3: Feedforward Inference
- Inputs: sensor/afferent data plus a copy of motor commands.
- Layer 4 learns sensory-motor transitions: the changes in input due to behavior. Layer 3 learns high-order sequences of the L4 remainders.
- The L2/3 output to the next higher region is stable and unique; un-predicted changes pass through, predicted ones do not.
- Example: after learning A-B-C-D and X-B-C-Y, A-B-C- predicts D while X-B-C- predicts Y.
- These are universal inference steps; they apply to all sensory modalities.

Layer roles and research status (feedforward and feedback)
- L2/3: learns high-order sequences - well understood (the existing CLA), tested and commercial
- L4: learns sensory-motor transitions - 90% understood, testing in progress
- L5: recalls motor sequences - 50% understood
- L6: attention - 10% understood
Feedforward paths carry sensor data and motor-command copies up the hierarchy; feedback paths descend toward the old brain.

L3: Existing CLA Learns High-order Sequences
The high-order sequence memory of L2/3 is the CLA as already implemented.

L4: CLA With Motor Command Input Learns Sensory-motor Transitions
- Inputs: sensor/afferent data plus a copy of motor commands (e.g., saccade directions +N, +NE, +E, +SE, +S, +SW, +W, +NW).
- With motor context, the CLA will form unique representations (a minimal sketch follows).
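The L4 idea above can be made concrete with a toy transition memory. This is a minimal sketch, not NuPIC code: the class and method names are hypothetical, strings stand in for SDRs, and compass strings stand in for motor-command copies. The point it illustrates is that conditioning transitions on the motor copy makes otherwise ambiguous sensory transitions unique.

    # A minimal sketch (not NuPIC) of the L4 idea: transitions are learned
    # jointly over the sensory input and a copy of the motor command, so the
    # same sensory pattern gets a unique successor in each motor context.

    class SensorimotorMemory:
        def __init__(self):
            # (current sensory pattern, motor command) -> predicted next patterns
            self.transitions = {}

        def learn(self, sensory, motor, next_sensory):
            self.transitions.setdefault((sensory, motor), set()).add(next_sensory)

        def predict(self, sensory, motor):
            return self.transitions.get((sensory, motor), set())

    m = SensorimotorMemory()
    # Fixating "A", a saccade east lands on "B"; a saccade west lands on "C".
    m.learn("A", "+E", "B")
    m.learn("A", "+W", "C")

    print(m.predict("A", "+E"))   # {'B'} - unambiguous given the motor context
    print(m.predict("A", "+W"))   # {'C'}
    # Without the motor copy, "A" alone would predict {'B', 'C'}: exactly the
    # ambiguity L4 avoids by conditioning on behavior.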
How Does L5 Generate Motor Commands?
(Recall the layer roles: L2/3 learns high-order sequences, L4 learns sensory-motor transitions, L5 recalls motor sequences, L6 attention; L5 projects to the old brain.)

One extreme: a dynamic world, no behavior.
sensory encoder -> CLA high-order sequence memory.
Example: Grok detects anomalies in servers.

The other extreme: a static world.
Behavior is needed for any change in the sensory stream:
1) Move the sensor: saccade, turn the head, walk.
2) Interact with the world: push, lift, open, talk.

Review: mammals have pre-wired, sub-cortical behaviors.
The body plus old brain contains pre-wired behavior generators: walking, running, eye movements, head turning, grasping, reaching, breathing, blinking, etc.

Review: the cortex models the behavior of the body and old brain.
The cortex (CLA) learns sensory patterns, and it also learns sensory-motor patterns; it makes much better predictions knowing behavior.

How does the cortex control behavior?
- L4/L3 is a sensory-motor model of the world.
- L3 projects to L5, and they share column activations; think of L5 as a copy of L3.
- L5 associatively links to sub-cortical behavior generators, so the cortex can now invoke behaviors.
Q. Why L5 in addition to L3?
A. To separate inference from behavior:
- Inference: L4/L3.
- Behavior: L5 (the L3 -> L5 -> old brain pathway learns the associative link; L5 generates behavior before feedforward inference).

Cortical Hierarchy
Each cortical region treats the region below it in the hierarchy the way the first region treats the pre-wired sub-cortical behavior generators: stable feedback flows down to L4/L3 and L5 of the lower region, and a copy of the motor command flows up along with the sensory data from the body and old brain.

Goal Oriented Behavior via Feedback
(Layer roles in this circuit: 2/3 complex cells, 4 simple cells, 5 complex cells, 6 simple cells; inputs are sensor/afferent data, a copy of motor commands, and a stable/invariant pattern fed back from the higher region; L5 drives sub-cortical motor centers.)
- Stable feedback invokes a union of sparse states in multiple sequences (the "goal").
- Feedforward input selects one state and plays back the motor sequence from there (see the sketch below).

Short term goal: build a one-region sensorimotor system (body + old brain with pre-wired behavior generators, plus L4/L3 and L5).

Hypothesis: the only way to build machines with intelligent, goal-oriented behavior is to use the same principles as the neocortex.
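The feedback mechanism just described can be sketched in a few lines, under loose assumptions: strings stand in for sparse states and motor commands, and the stored sequences, goal names, and function are all hypothetical illustrations rather than the actual L5 mechanism. Stable feedback (the goal) makes every state in the goal's stored sequences a candidate; the feedforward input selects one, and the remaining motor commands play back from there.

    # Hypothetical sketch: goal-driven playback of stored motor sequences.
    # Each goal maps to one or more stored sequences of
    # (expected sensory state, motor command emitted at that state).
    SEQUENCES = {
        "reach_kitchen": [
            [("at_desk", "stand_up"), ("standing", "walk_to_kitchen")],
            [("in_hallway", "turn_left"), ("facing_kitchen", "walk_forward")],
        ],
    }

    def playback(goal, current_state):
        # Feedback invokes the union of sparse states across all of the
        # goal's sequences; feedforward input selects the matching state.
        for seq in SEQUENCES[goal]:
            for i, (state, _) in enumerate(seq):
                if state == current_state:
                    # Play back the motor sequence from the selected state.
                    return [motor for _, motor in seq[i:]]
        return []  # no stored state matches the current input

    print(playback("reach_kitchen", "standing"))    # ['walk_to_kitchen']
    print(playback("reach_kitchen", "in_hallway"))  # ['turn_left', 'walk_forward']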
[Figure: sensory data (retina, cochlea, somatic, vestibular, proprioceptive) and behaviors flow between the old brain (basal ganglia, brainstem, spine) and a hierarchy of cortical regions (Cortex 1, Cortex 2).]

Cortical Principles
1) On-line learning from streaming data (retina, cochlea, somatic data streams)
2) Hierarchy of memory regions
3) Sequence memory - inference and motor
4) Sparse distributed representations
5) All regions are sensory and motor
6) Attention
These six principles are necessary and sufficient for biological and machine intelligence.

What the Cortex Does
The neocortex learns a model from sensory data, yielding predictions, anomalies, and actions. In short, the neocortex learns a sensory-motor model of the world.

Attributes of a Solution
- Independent of sensory modality and motor modality.
- Should work with biological and non-biological senses. Biological: vision, audition, touch, proprioception. Non-biological: M2M data, IT data, financial data, etc.
- Can be understood in a single region, but must work in a hierarchy.
- Will be based on CLA principles.

How does a layer of neurons learn sequences?
(This is the job of L2/3, which learns high-order sequences.) First, two ingredients: sparse distributed representations and neurons.

Dense Representations
- Few bits (8 to 128); all combinations of 1's and 0's are used.
- Example: 8-bit ASCII, 01101101 = m.
- Bits have no inherent meaning; they are arbitrarily assigned by the programmer.

Sparse Distributed Representations (SDRs)
- Many bits (thousands); few 1's, mostly 0's.
- Example: 2,000 bits with 2% active, e.g. 010000...0001000...000010...01000.
- Each bit has semantic meaning, and that meaning is learned.

SDR Properties (see the first sketch below)
1) Similarity: shared bits = semantic similarity.
2) Store and compare: store only the indices of the active bits; subsampling is OK (e.g., storing 10 of 40 active indices still identifies the pattern reliably).
3) Union membership: OR-ing, say, 10 SDRs of 2% sparsity gives a union about 20% dense, and one can still ask "is this SDR a member of the union?"

Neurons
- Active dendrites: each cell can recognize 100's of unique patterns.
- Feedforward: 100's of synapses form the "classic" receptive field; dendrites act as pattern detectors.
- Context: 1,000's of synapses that depolarize the neuron, putting it in a "predicted state."
- These properties define the model neuron used here.

Learning Transitions
- Feedforward activation arrives; inhibition yields sparse cell activation.
- From time = 1 to time = 2, cells form connections to previously active cells and thereby predict future activity.
- Multiple predictions can occur at once (A-B, A-C, A-D).
This is a first-order sequence memory. It cannot learn A-B-C-D vs. X-B-C-Y (see the second sketch below).
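Two of the claims above are easy to make concrete. First, the SDR properties, as a minimal sketch using Python sets of active-bit indices; the sizes match the slide's example (2,000 bits at 2% sparsity), everything else is illustrative.

    # SDR properties with sets of active-bit indices: 2,000 bits, 40 active.
    import random

    def random_sdr(n=2000, active=40):
        return set(random.sample(range(n), active))

    a, b = random_sdr(), random_sdr()
    print(len(a & b))            # shared bits = similarity; near 0 for
                                 # unrelated random SDRs

    # Store and compare: a subsample of 10 of the 40 indices is enough,
    # because a chance match on 10 specific bits is astronomically unlikely.
    stored = set(random.sample(sorted(a), 10))
    print(stored <= a)           # True - the subsample still matches its source

    # Union membership: OR 10 SDRs together (~20% dense), then test membership.
    members = [random_sdr() for _ in range(10)]
    union = set().union(*members)
    print(members[3] <= union)   # True  - members are recognized
    print(random_sdr() <= union) # False (with overwhelming probability)

Second, the first-order limitation: a toy transition table keyed only on the current input is enough to show why A-B-C-D vs. X-B-C-Y defeats it. Again a hypothetical sketch, not the CLA itself.

    # Why first-order transition memory fails on A-B-C-D vs. X-B-C-Y:
    # predictions are keyed only on the current input, so the two sequences
    # collapse onto each other once they share an element.
    from collections import defaultdict

    transitions = defaultdict(set)   # current input -> predicted next inputs

    def train(sequence):
        for cur, nxt in zip(sequence, sequence[1:]):
            transitions[cur].add(nxt)

    train("ABCD")
    train("XBCY")

    print(sorted(transitions["B"]))  # ['C']      - both sequences agree here
    print(sorted(transitions["C"]))  # ['D', 'Y'] - ambiguous: A-B-C or X-B-C?
    # Mini-columns (next slide) fix this by letting the same input "C" be
    # represented by different cells depending on context: C' after A-B,
    # C'' after X-B.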
High-order Sequences Require Mini-columns
A-B-C-D vs. X-B-C-Y:
- Before training, each input activates all the cells in its mini-columns, so B and C look the same in both sequences.
- After training, the same columns are active, but only one cell per column: A-B'-C'-D' in one context and X-B''-C''-Y'' in the other.
- With 40 active columns and 10 cells per column, there are 10^40 ways to represent the same input in different contexts.

Cortical Learning Algorithm (CLA), a.k.a. the cellular layer
- Converts input to sparse representations in columns.
- Learns transitions.
- Makes predictions and detects anomalies.
Applications:
1) High-order sequence inference (L2/3)
2) Sensory-motor inference (L4)
3) Motor sequence recall (L5)
Capabilities: on-line learning, high capacity, simple local learning rules, fault tolerance, no sensitive parameters.
The CLA is the basic building block of the neocortex and of machine intelligence.

Anomaly Detection Using CLA (see the sketch below)
For each metric: encoder -> SDR -> CLA -> prediction -> point anomaly -> time average -> anomaly score, with historical comparison. The per-metric scores (metric 1 ... metric N) combine into a system anomaly score.

Grok for Amazon AWS
"Breakthrough science for anomaly detection":
- Automated model creation
- Ranks anomalous instances
- Continuously updated
- Continuously learning

Unique Value of Cortical Algorithms
The technology can be applied to any kind of data: financial, manufacturing, web sales, etc.

Application: CEPT Systems
100K "word SDRs" (128 x 128 bits each), built from a document corpus (e.g., Wikipedia).
Example of word-SDR arithmetic: Apple - Fruit = Computer (Macintosh, Microsoft, Mac, Linux, operating system, ...).

Sequences of Word SDRs
Training set (Word 1, Word 2, Word 3):

  frog      eats   flies
  cow       eats   grain
  elephant  eats   leaves
  goat      eats   grass
  wolf      eats   rabbit
  cat       likes  ball
  elephant  likes  water
  sheep     eats   grass
  cat       eats   salmon
  wolf      eats   mice
  lion      eats   cow
  dog       likes  sleep
  elephant  likes  water
  cat       likes  ball
  coyote    eats   rodent
  coyote    eats   rabbit
  wolf      eats   squirrel
  dog       likes  sleep
  cat       likes  ball
  ...       ...    ...

Asked to complete "fox eats ?", a sequence never seen in training, the system answers "fox eats rodent."
1) The word SDRs are created unsupervised.
2) Semantic generalization: the SDRs capture the lexical level, the CLA the grammatic level.
3) Commercial applications: sentiment analysis, abstraction, improved text-to-speech, dialog, reporting, etc.
www.Cept.at - Cept and Grok use the exact same code base.

NuPIC Open Source Project (created July 2013)
Source code for the Cortical Learning Algorithm, encoders, and support libraries.
Single source tree (used by Grok), GPLv3.
Active and growing community: 80 contributors and 533 mailing list subscribers as of 2/2014.
Education resources and hackathons: www.Numenta.org
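The per-metric anomaly score in the pipeline above can be sketched minimally; the function names here are hypothetical (the real logic lives in NuPIC/Grok). The point anomaly is the fraction of active columns the CLA failed to predict, a time average smooths it, and per-metric scores combine into a system score (taking the max is an assumed, simple choice).

    # Hypothetical sketch of the anomaly pipeline: prediction -> point anomaly
    # -> time average -> system anomaly score. Sets of column indices stand in
    # for the CLA's active and predicted columns.

    def point_anomaly(active, predicted):
        """Fraction of active columns that were NOT predicted last step."""
        return len(active - predicted) / len(active) if active else 0.0

    def time_average(prev, new, alpha=0.1):
        """Exponential moving average; smooths isolated point anomalies."""
        return (1 - alpha) * prev + alpha * new

    def system_anomaly(metric_scores):
        """Combine per-metric scores; the max is one simple choice."""
        return max(metric_scores)

    # Example: 40 active columns, 36 of which the CLA had predicted.
    active, predicted = set(range(40)), set(range(36))
    score = point_anomaly(active, predicted)        # 0.1
    smoothed = time_average(prev=0.02, new=score)   # 0.028
    print(score, smoothed, system_anomaly([smoothed, 0.5, 0.01]))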
Goals For 2014
Research:
- Implement and test L4 sensory-motor inference
- Introduce hierarchy
- Publish
NuPIC:
- Grow the open source community
- Support partners, e.g. IBM Almaden, CEPT
Grok:
- Demonstrate commercial value of the CLA
- Attract resources (people and money)
- Provide a "target" and market for hardware
- Explore new application areas

Current CLA implementation scale:
- One layer, no hierarchy
- 2,000 mini-columns, 30 cells per mini-column
- Up to 128 dendrite segments per cell
- Up to 40 active synapses per segment; thousands of potential synapses per segment
- About one millionth of a human cortex
- 50 msec per inference/training step

The Limits of Software
Even highly optimized, 2,000 columns at 50 msec per step is the practical ceiling. We need hardware for the usual reasons: speed, power, and volume (see the supplemental arithmetic at the end).

Start of supplemental material

"If you invent a breakthrough so computers can learn, that is worth 10 Microsofts."

1) What principles will we use to build intelligent machines?
2) What applications will drive adoption in the near and long term?

Machine intelligence will be built on the principles of the neocortex:
1) It is flexible (a universal learning machine).
2) It is robust.
3) If we knew how the neocortex worked, we would be in a race to build them.
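As supplemental arithmetic on the implementation scale listed above, a short sketch of the upper bounds those parameters imply; only the per-layer numbers come from the talk, and the aggregation is an added illustration.

    # Upper-bound arithmetic for the single-layer CLA described above.
    columns  = 2000   # mini-columns
    cells    = 30     # cells per mini-column
    segments = 128    # max dendrite segments per cell
    synapses = 40     # max active synapses per segment
    step_ms  = 50     # per inference/training step

    total_cells  = columns * cells           # 60,000 cells
    max_segments = total_cells * segments    # 7,680,000 segments
    max_synapses = max_segments * synapses   # 307,200,000 active synapses

    print(total_cells, max_segments, max_synapses)
    print(1000 / step_ms, "steps per second")   # 20.0

    # The talk calls this roughly one millionth of a human cortex; scaling it
    # up while running in real time is why speed, power, and volume argue
    # for hardware.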