Ubiquitous Cognitive Computing: A Vector Symbolic Approach
BLERIM EMRULI
EISLAB, Luleå University of Technology

Outline
Context and motivation
Aims
Background (concepts and methods)
Summary of appended papers
Conclusions and future work

Context and motivation

Conventional computing
1 + 2/3 = 1.6667
1010 XOR 1000 = 0010
variables of 1-64 bits

Cognitive computing
Concepts, relations, sequences, actions, perceptions, learning ...
Some concepts: man ≅ woman? man ≅ lake?

Cognitive computing
Bridging of dissimilar concepts:
man - fisherman - fish - lake
man - plumber - water - lake
Relations between concepts and sequences:
5 : 10 : 15 : 20
5 : 10 : 15 : 30

"...invisible, everywhere computing that does not live on a personal device of any sort, but is in the woodwork everywhere" (Weiser, 1994).
- Mark Weiser, widely considered the father of ubiquitous computing

Ubiquitous cognitive computing is cognitive computing for ubiquitous systems, i.e., systems that in principle can appear "everywhere and anywhere" as part of the physical infrastructure that surrounds us.

Intuition
[Diagram: low-level processing (sensory integration) feeding high-level "symbol-like" representations and high-level processing]

Aims
Investigate mathematical concepts and develop computational principles with cognitive qualities, which can enable digital systems to function more like brains in terms of:
learning/adaptation
generalization
association
prediction ...
Other desirable properties:
computationally lightweight
suitable for distributed and parallel computation
robust, degrading gracefully

Related approaches
service-oriented architecture (SOA)
traditional artificial intelligence techniques
the cognitive approach (Giaffreda, 2013; Wu et al., 2014)

Geometric approach to cognition
What can we do with words of 1 kilobyte or more?
[Illustration: a binary word with components indexed 1, 2, 3, 4, ..., 10000]
Pentti Kanerva started to explore this idea in the 1980s, taking an engineering perspective with inspiration from biological neural circuits and human long-term memory.
Since the 1990s similar ideas have also been developed by Peter Gärdenfors, Professor at Lund University.

Sparse Distributed Memory (SDM)
inspired by circuits in the brain
a model of human long-term memory
an associative memory
KEY IDEA: Similar or related concepts in memory correspond to nearby points in a high-dimensional space (Kanerva, 1988, 1993).
SDM can be interpreted as a computer memory or as a feedforward neural network; a minimal code sketch of the read/write idea follows.
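The sketch below, in Python/NumPy, assumes the standard Kanerva (1988) construction: fixed random binary hard locations, activation of all locations within a Hamming radius of the address, and integer counters as contents. The parameter values and names are illustrative toy choices, not those used in the thesis.

    import numpy as np

    rng = np.random.default_rng(0)
    D = 1000        # dimensionality of addresses and data
    M = 2000        # number of hard locations
    RADIUS = 480    # activation radius (Hamming distance)

    hard_locations = rng.integers(0, 2, size=(M, D))  # fixed random addresses
    counters = np.zeros((M, D), dtype=int)            # content counters

    def activated(address):
        # indices of hard locations within RADIUS of the address
        dist = np.sum(hard_locations != address, axis=1)
        return np.where(dist <= RADIUS)[0]

    def write(address, data):
        # increment counters for 1-bits of the data, decrement for 0-bits
        counters[activated(address)] += np.where(data == 1, 1, -1)

    def read(address):
        # sum counters over activated locations, threshold at zero
        sums = counters[activated(address)].sum(axis=0)
        return (sums > 0).astype(int)

    # Autoassociative use: store a pattern at its own address, then
    # retrieve it from a version of itself with 10% of the bits flipped.
    x = rng.integers(0, 2, size=D)
    write(x, x)
    noisy = x.copy()
    noisy[rng.choice(D, size=100, replace=False)] ^= 1
    print(np.sum(read(noisy) != x))   # 0: the clean pattern is recovered

The same mechanism gives graceful degradation: the more the address is corrupted, the fewer written locations are activated, and retrieval fails gradually rather than abruptly.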
Vector symbolic architectures (VSAs)
Concepts and their interrelationships correspond to points in a high-dimensional space.
VSAs can represent concepts, relations, sequences, ...; learn, generalize, associate, ...; and perform analogy-making using vector representations, based on sound mathematical concepts and principles (Plate, 1994).
VSAs were developed to address some early criticisms of neural networks (Fodor and Pylyshyn, 1988) while retaining useful properties such as learning, generalization, pattern recognition, robustness and noise immunity (roughly 30% corruption is tolerable).
The VSA framework includes mathematical operators to construct, operate on, and query compositional structures; a code sketch of these operators follows the analogical-mapping discussion below.

Analogy-making
Analogy-making is a central element of cognition that enables animals to identify and manage new information by generalizing past experiences, possibly from only a few learned examples.
Present theories of analogy-making usually divide the process into three or four stages (Eliasmith and Thagard, 2001).
My work focuses mainly on the challenging mapping stage.

Analogical mapping
Analogical mapping is the process of mapping relations and concepts that describe one situation (a source), x, to another (a target), y; M : x → y.
Example: the source "the circle is above the square" maps to the target "the square is below the circle", and a mapping learned from such pairs should extend to novel "above-below" relations.

Generalization via analogical mapping (Neumann, 2001)
[Figure sequence: generalization via analogical mapping, after Neumann (2001)]

A difficult computational problem
If analogical mapping is considered as a graph-comparison problem, it is computationally challenging.
VSAs use compressive representations, not graphs. The ability to encode symbol-like approximate representations makes VSAs computationally feasible and psychologically plausible; see Gentner and Forbus (2011) and Eliasmith (2013).
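As promised above, here is a minimal sketch of the VSA operators using Binary Spatter Codes in Python/NumPy, assuming the standard choices: elementwise XOR for binding, a bitwise majority vote for bundling, and normalized Hamming distance for similarity. The role and filler names are illustrative only.

    import numpy as np

    rng = np.random.default_rng(1)
    D = 10000                       # dimensionality of the hypervectors

    def vec():                      # a random D-bit vector
        return rng.integers(0, 2, size=D)

    def bind(a, b):                 # XOR binding; self-inverse: bind(a, bind(a, b)) == b
        return a ^ b

    def bundle(*vs):                # bitwise majority vote, random tie-breaking
        s = np.sum(vs, axis=0)
        out = (2 * s > len(vs)).astype(int)
        ties = 2 * s == len(vs)
        out[ties] = rng.integers(0, 2, size=ties.sum())
        return out

    def dist(a, b):                 # normalized Hamming distance: 0 = identical, ~0.5 = unrelated
        return float(np.mean(a != b))

    # Encode "the circle is above the square" as a role-filler structure.
    ROLE_ABOVE, ROLE_BELOW = vec(), vec()
    CIRCLE, SQUARE = vec(), vec()
    scene = bundle(bind(ROLE_ABOVE, CIRCLE), bind(ROLE_BELOW, SQUARE))

    # Query the structure: unbinding the "above" role yields a noisy
    # vector that is far closer to CIRCLE than to anything else.
    answer = bind(ROLE_ABOVE, scene)
    print(dist(answer, CIRCLE), dist(answer, SQUARE))   # ~0.25 versus ~0.5

Note that the compositional structure and its parts all live in the same fixed-width space, which is what keeps the operations lightweight and suitable for resource-constrained devices.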
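Continuing the sketch above (reusing vec, bind, bundle, dist and the two role vectors), the next fragment illustrates the mapping-vector idea that Papers A and B build on: with XOR binding, a single example pair gives a mapping vector bind(x, y) that maps x exactly onto y, and bundling several example mappings yields a vector that maps a novel source approximately onto the corresponding target. The scene encoding and example counts are my illustrative choices; the thesis models additionally store and retrieve such mapping vectors with an SDM, which is omitted here.

    def scene_above(top, bottom):
        # "top is above bottom", with the role-filler encoding from above
        return bundle(bind(ROLE_ABOVE, top), bind(ROLE_BELOW, bottom))

    # Training examples: each maps "X above Y" onto the swapped "Y above X".
    pairs = [(vec(), vec()) for _ in range(5)]
    M = bundle(*[bind(scene_above(a, b), scene_above(b, a)) for a, b in pairs])

    # Apply the bundled mapping vector to a scene of two NOVEL shapes.
    STAR, TRIANGLE = vec(), vec()
    source = scene_above(STAR, TRIANGLE)
    predicted = bind(M, source)
    print(dist(predicted, scene_above(TRIANGLE, STAR)),  # ~0.3: close to the swapped target
          dist(predicted, source))                       # ~0.5: unrelated to the source

The prediction is noisy, so in practice it is cleaned up against a memory of known items; that clean-up role is one reason for combining mapping vectors with SDM.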
Sum-up
I have adopted a vector-based geometric approach to cognitive computation because it appears to be sufficiently potent and suitable for implementation in resource-constrained devices.
A central part of the work deals with analogy-making and learning as a key mechanism enabling interoperability between heterogeneous systems, much like ontologies play a central role in service-oriented architecture and the semantic web.
Raad and Evermann (2014): Is Ontology Alignment like Analogy?

Thesis – Appended papers
A. Emruli, B. and Sandin, F. (2014): Analogical Mapping with Sparse Distributed Memory: A Simple Model that Learns to Generalize from Examples
B. Emruli, B., Gayler, R. W. and Sandin, F. (2013): Analogical Mapping and Inference with Binary Spatter Codes and Sparse Distributed Memory
C. Emruli, B., Sandin, F. and Delsing, J. (2014): Vector Space Architecture for Emergent Interoperability of Systems by Learning from Demonstration
D. Sandin, F., Emruli, B. and Sahlgren, M. (2014): Random Indexing of Multi-dimensional Data
Papers A and B concern cognitive computation, Paper C a cognitive architecture for ubiquitous systems, and Paper D the encoding of vector representations.

Paper A
Cognitive Computation 6(1):74–88, 2014. Emruli, B. and Sandin, F.
Q1: Is it possible to extend the sparse distributed memory model so that it can store multiple mapping examples of compositional structures and make correct analogies from novel inputs?
The analogical mapping unit (AMU) combines mapping vectors with an SDM.
[Figures: size of the memory versus generalization; minimum probability of error]

Paper B
IJCNN 2013, Dallas, TX, Aug. 4-9, 2013. Emruli, B., Gayler, R. W. and Sandin, F.
Q2: If such an extended sparse distributed memory model is developed, can it learn and infer novel patterns in sequences such as those encountered in widely used intelligence tests like Raven's Progressive Matrices?
Topics: bidirectionality of mapping vectors and the bidirectionality problem; Raven's Progressive Matrices (cf. Rasmussen, R. and Eliasmith, C., Topics in Cognitive Science 3(1), 2011); learning mapping vectors with the SDM; prediction; results.

Paper C
Biologically Inspired Cognitive Architectures 9:33–45, 2014. Emruli, B., Sandin, F. and Delsing, J.
Q3: Could extended sparse distributed memory and vector-symbolic methodologies such as those considered in Q1 and Q2 be used to address the problem of designing an architecture that enables heterogeneous IoT devices and systems to interoperate autonomously and adapt to instructions in dynamic environments?
The proposed communication architecture assumes no shared operational semantics (Sheth, 1999; Obrst, 2003; Baresi et al., 2013).
Scenario: an automation system with learning by demonstration. Alice and Bob interact with the four systems to achieve a particular goal; the instructions of Alice and Bob are the same.
Results: [figure] one instruction per day by Alice and Bob.

Paper D
Submitted to Knowledge and Information Systems. Sandin, F., Emruli, B. and Sahlgren, M.
Q4: Is it possible to extend the traditional method of random indexing to handle matrices and higher-order arrays in the form of N-way random indexing, so that more complex data streams and semantic relationships can be analyzed? What are the other implications of this extension?

Random indexing (RI)
Random indexing is (or, given Paper D, was) an approximate method for dimension reduction and semantic analysis of pairwise relationships.
Main properties:
concepts and their interrelationships correspond to random points in a high-dimensional space
incremental coding/learning
lightweight, suitable for processing of streaming data
accuracy comparable to standard methods for dimension reduction
Applications:
natural language processing
search engines
pattern recognition (e.g., event detection in blogs)
graph searching (e.g., social network analysis)
other machine learning applications
A minimal sketch of the classic one-way method follows.
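The sketch below shows one-way random indexing for word co-occurrence in Python/NumPy, the method that Paper D generalizes to matrices and higher-order arrays. Sparse ternary index vectors and a sliding context window are the standard choices; the corpus, parameter values and names are illustrative.

    import numpy as np

    rng = np.random.default_rng(2)
    D = 2000        # reduced dimensionality
    NNZ = 10        # non-zero entries (+1/-1) in each index vector
    WINDOW = 2      # context window: two words to the left and right

    def index_vector():
        # sparse ternary random vector: NNZ/2 entries +1 and NNZ/2 entries -1
        v = np.zeros(D)
        pos = rng.choice(D, size=NNZ, replace=False)
        v[pos[:NNZ // 2]] = 1.0
        v[pos[NNZ // 2:]] = -1.0
        return v

    corpus = ("the boat sailed on the lake while the fisherman "
              "caught a fish in the lake").split()

    index = {w: index_vector() for w in set(corpus)}    # fixed random labels
    context = {w: np.zeros(D) for w in set(corpus)}     # learned representations

    # One incremental pass: add the index vectors of neighboring words.
    # Fixed memory, no matrix factorization, suitable for streaming data.
    for i, w in enumerate(corpus):
        for j in range(max(0, i - WINDOW), min(len(corpus), i + WINDOW + 1)):
            if j != i:
                context[w] += index[corpus[j]]

    def sim(a, b):   # cosine similarity of two context vectors
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    # Words that occur in similar contexts accumulate similar vectors;
    # a corpus this small only demonstrates the mechanics.
    print(sim(context["lake"], context["fish"]))

Paper D's N-way extension applies the same idea along each dimension of a matrix or higher-order array, keeping the representation at an approximately fixed size.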
Results: one-way versus two-way random indexing (RI)
[Figure: comparison of one-way and two-way RI]

Anecdote
"As an engineer, this can feel like a deal with the devil, as you have to accept error and uncertainty in your results. But the alternative is no results at all!"
- Pete Warden, data scientist and former Apple engineer

Results: two-way RI versus PCA
[Figure: comparison of two-way RI and PCA]

Gavagai AB: Opinion mining (2012)
Artist             Viewer votes    Gavagai forecast
Danny Saucedo      30 %            33 %
Thorsten Flinck    22 %            8 %
Loreen             12 %            22 %

Summary
The proposed AMU integrates the idea of mapping vectors with sparse distributed memory.
Demonstration of transparent learning and application of multiple analogical mappings.
The AMU solves a particular type of Raven's matrix.
The SDM breaks the commutative (bidirectionality) property of the binary mapping vectors.

Summary (cont'd)
Outline of a communication architecture that enables system interoperability by learning, without reference to a shared operational semantics; this is a novel approach to a challenging problem.
Extension of random indexing (RI) to multiple dimensions in an approximately fixed-size representation.
Comparison of two-way RI with the traditional (one-way) RI and with PCA.

Limitations
Hand-coding of the representations.
The examples addressed in Paper C are relatively simple; more complex examples and symbolic representation schemes are needed to further test the architecture.
An attention mechanism needs to be developed.
Extension to higher-order Markov chains.
In Paper D only one- and two-way RI are investigated; the problems considered are relatively small in scale and are not demonstrated on streaming data.

Future work
To apply the architecture outlined in Paper C in a "Living Lab" equipped with technology similar to that described in the hypothetical automation scenario.
To improve and further investigate, both empirically and theoretically, the implications of the N-way random indexing (NRI) extension.
Is the mathematical framework sufficiently general?

"A beloved child has many names."
Holographic Reduced Representation (HRR) - 1994
Context-Dependent Thinning (CDT) - 2001
Vector Symbolic Architecture (VSA) - 2003
Hyperdimensional Computing (HC) - 2009
Analogical Mapping Unit (AMU) - 2013
Semantic Pointer Architecture (SPA, implemented in Spaun) - 2013
Matrix Binding of Additive Terms (MBAT) - 2014

Key readings
Sparse Distributed Memory (Kanerva, 1988)
Conceptual Spaces (Gärdenfors, 2000)
Holographic Reduced Representation (Plate, 2003)
Geometry and Meaning (Widdows, 2004)
How to Build a Brain (Eliasmith, 2013)
The Geometry of Meaning (Gärdenfors, 2014)

Credits
Supervisors: JERKER DELSING, FREDRIK SANDIN, LENNART GUSTAFSSON
Coauthors: ROSS GAYLER, MAGNUS SAHLGREN
Discussions and inspiration: ASAD KHAN, PENTTI KANERVA, BRUNO OLSHAUSEN, CHRIS ELIASMITH
Financial support: STINT, ARROWHEAD PROJECT, NORDEAS NORRLANDSSTIFTELSE, AND THE WALLENBERG FOUNDATION
Colleagues, family and friends

THE END ... or perhaps the beginning