Implementing HMAX with an Integrate-&-Fire Array Transceiver
Ralph Etienne-Cummings, Fope Folowosele, R. Jacob Vogelstein, Gert Cauwenberghs*
The Johns Hopkins University
*UC San Diego

Outline
Introduction
Neural Arrays
Our Integrate-&-Fire Array Transceivers
Visual Object Recognition Pathways
› Models: HMAX
HMAX with IFAT
Conclusion

Introduction
Object detection, recognition and tracking are computationally difficult tasks
Primates excel at these tasks
Engineered systems cannot match their proficiency, flexibility and speed
Robots and other artificial systems are therefore limited in their ability to interact with the environment

Big Picture
Our overall goal is to work toward a real-time autonomous intelligent system that can detect, recognize and track objects under various viewing conditions
• Detect: sense the presence of an object (cross-correlation)
• Recognize: identify and categorize the object (spiking HMAX)
• Track: monitor the object's movement (neural Kalman filter)

The Approach
Emulate the cortical functions of primates to design more intelligent artificial systems
› Mimic the information processing of the primate visual system
› Model computationally intensive algorithms in neural hardware

Potential Applications
Population surveillance and visual search engines
Visual prostheses and ocular implants
Research tool for neuroscientists
Sources: Techarena 2009; Future Predictions 2008; R.
Friedman, Biomedical Computation Review 2009

Project Plan
Develop a spike-based processing platform on which we can demonstrate object detection, recognition and tracking
› Design the next-generation neural array transceiver
› Realize silicon facsimiles of cortical simple cells, complex cells, composite feature cells and MAX operations
› Implement spike-based classification
› Implement neural algorithms analogous to cross-correlation and Kalman filtering for object detection and tracking, respectively

Software vs. Hardware Models
Software models run slower than real time and are unable to interact with the environment
Silicon designs take a few months to fabricate, after which they are constrained by limited flexibility
Sources: IBM 2004; Tenore 2008

Solution: Reconfigurable Models
Neural array transceivers are reconfigurable systems consisting of large arrays of silicon neurons
Useful for studying the real-time operation of large-scale cortical neural networks
› Able to leverage known fundamental building blocks, such as the operation of neurons and synapses
› Flexible enough for testing out unknowns
[Figure: neural array transceivers grouped as Digital, Application-Specific and General Purpose]

Application-Specific Neural Array Transceivers
Specific to particular neural processes, such as
› Spatial frequency and orientation (Choi et al. 2005)
› Acoustic localization (Horiuchi & Hynna 2001)
› Retinotopic self-organization (Taba & Boahen 2006)
› Learning and memory (Arthur & Boahen 2004, 2006)

Digital Neural Array Transceivers
Use digital logic as an alternative to analog VLSI designs
› FPGA conductance-based neuron model (Graas et al. 2004)
› FPGA leaky integrate-and-fire neuron model (Pearson et al. 2005)
› DSP and FPGA populations of cortical cells for retinotopic maps (Shi et al. 2006)
› FPGA spike response neuron model (Ros et al.
2006)
› FPGA Izhikevich neural models (Cassidy & Andreou 2008)

General Purpose Neural Array Transceivers
More easily applied to multiple tasks
› Integrate-and-fire cooperative-competitive ring of neurons (Chicca & Indiveri 2006)
› Integrate-and-fire neural array with stop learning (Mitra & Indiveri 2008)
› Hodgkin-Huxley type neural array (Zou et al. 2006)
› Integrate-and-fire array transceiver (Goldberg et al. 2001; Vogelstein et al. 2004; Folowosele et al. 2008)

Why an Integrate-and-Fire Array Transceiver?
Flexible
› No local or hardwired connectivity
Reprogrammable
› Virtual synaptic connections with programmable weight and equilibrium potential, allowing any connection topology
Expandable
› Multiple chips can be connected together

Integrate-and-Fire Array Transceiver (IFAT)
One of the earliest designs was by D.H. Goldberg et al. in 2001
The chip was designed in a 0.5-micron process on a 1.5 mm x 1.5 mm die
› 1024 integrate-and-fire neurons
› 128 probabilistic synapses with two sets of fixed parameters
Source: D.H. Goldberg, Neural Networks, 2001

2nd Generation Integrate-and-Fire Array Transceiver (IFAT)
Each neuron implements a discrete-time model of a single-compartment neuron using a switched-capacitor architecture
Synapses have two internal parameters
› Synaptic weight
› Equilibrium potential
2400 neurons per chip; 4,194,304 synapses
Source: R.J. Vogelstein et al., IEEE Trans. Neural Networks 2007a

IFAT Operation
Incoming and outgoing address events are communicated through the digital I/O port (DIO)
The MCU looks up the synaptic parameters (conductance and driving potential) and the target neuron address in RAM
It then delivers the parameters (the driving potential via the DAC) to the appropriate neuron on the I&F chip
Source: R.J. Vogelstein et al., IEEE Trans. Neural Networks 2007a

IFAT Operation (figure)
Source: R.J. Vogelstein et al., IEEE Trans.
Neural Networks 2007a

Spike-Based CMOS Cameras: Octopus
[Figure: pixel schematic (Vdd_r, reset, event, Ic); imaging concept; sample image]
Other approaches:
› W. Yang, “Oscillator in a Pixel,” 1994
› J. Harris, “Time to First Spike,” 2002
› A. Bermak, “Arbitrated Time to First Spike,” 2007
Source: Culurciello, Etienne-Cummings & Boahen, 2001, 2003

IFAT Results
Source: R.J. Vogelstein et al., NIPS, 2005

IFAT 3G: 3D Design in 150 nm CMOS
Three tiers (Tier A, Tier B, Tier C) containing:
• Address Event Representation (AER) communication circuits
• Receiver
• Transmitter
• Synapse
• Bursting circuit
• Control circuit
• Neuron
• Spike-generating circuit
In collaboration with the Sensory Communication and Microsystems Lab

Visual Pathways
Primary visual cortex (V1) transmits information along two main pathways
› Dorsal stream
› Ventral stream
The dorsal pathway is associated with motion
The ventral pathway mediates the visual identification of objects
Sources: T. Poggio, NIPS, 2007; Wikipedia, The Free Encyclopedia

Object Recognition for Computer Vision
Source: T. Poggio, NIPS 2007

Neurobiological Software Models
VisNet (Wallis & Rolls 1997)
› Homogeneous architecture for invariance and specificity
HMAX (Riesenhuber & Poggio 1999)
› Feature complexity and invariance are increased in alternating layers of a processing hierarchy
› Uses different computational mechanisms to attain invariance and specificity

VisNet
VisNet is a four-layer feedforward network
› A series of hierarchical competitive networks with local graded inhibition
› Convergent connections to each neuron from a topologically corresponding region of the preceding layer
› Synaptic plasticity based on a modified Hebbian learning rule with a temporal trace of each cell’s previous activity
Source: E. Rolls & T.
Milward, Neural Computation 2000

HMAX
Summarizes and integrates a large amount of data from different levels of understanding (from biophysics to physiology to behavior)
Two main operations occur in the model
› Gaussian-like tuning operation in the S layers
› Nonlinear MAX-like operation in the C layers
Source: M. Riesenhuber & T. Poggio, Nature Neuroscience 1999

An Implementation
Source: Serre et al. 2007

System Layers
S1
› Corresponds to the classical simple cells of Hubel and Wiesel found in V1
› Gaussian-like tuning to one of four possible orientations, with different filter sizes
C1
› Corresponds to the complex cells of Hubel and Wiesel
› MAX pooling over S1 cells with the same orientation and scale band
S2
› Pools over C1 units from a local spatial neighborhood
› Behaves as radial basis function units: Gaussian-like dependence on the Euclidean distance between a new input and a stored prototype
C2
› Global maximum over all scales and positions for each S2 type over the entire S2 lattice
Source: Serre et al. 2007

Learning and Classification Stages
Learning
› During training, extract prototypes at the C1 level from target images across all orientations
Classification
› At run time, extract C1 and C2 standard model features (SMFs) and pass them to a simple linear classifier
Source: Serre et al. 2007

Scene Understanding System
Source: Serre et al. 2007

Object Recognition in Clutter
C2 responses computed over a new input image are passed to a linear classifier
Superior to previous approaches on the MIT-CBCL data sets
Comparable to previous approaches on the CalTech5 data sets

Data set                 Benchmark   C2 features + Boost   C2 features + SVM
Leaves (CalTech5)        84.0        97.0                  95.9
Cars (CalTech5)          84.8        99.7                  99.8
Faces (CalTech5)         96.4        98.2                  98.1
Airplanes (CalTech5)     94.0        96.7                  94.9
Motorcycles (CalTech5)   95.0        98.0                  97.4
Faces (MIT-CBCL)         90.4        95.9                  95.3
Cars (MIT-CBCL)          75.4        95.1                  93.3
Source: Serre et al.
2007

Summary
Benefits of using the fine information in low-level SMFs
› C1 SMFs are superior for shape-based object recognition
Benefits of using the more invariant high-level SMFs
› C2 SMFs are suitable for semi-supervised recognition of objects in clutter
› C2 SMFs excel at recognizing texture-based objects that lack a geometric structure
Limitation: software HMAX is too slow for real-time applications

HMAX on IFAT
The system receives its inputs from silicon retinas
Each simple cell receives inputs from four consecutive retinal cells
› Two with excitatory connections
› Two with inhibitory connections
Excitatory and inhibitory synaptic weights are balanced so that the simple cells do not respond to uniform light
Source: R.J. Vogelstein et al., NIPS 2007

C1, S2 and Beyond
Implement the C1, S2 and possibly C2 stages of the HMAX model
The HMAX model provides a generic high-level computational function in a quantitative way
Source: T. Serre, Dissertation 2006

Preliminary Results: S1 and C1 Stages
S1 neurons are oriented spatial filters that detect local changes in contrast
C1 neurons take the MAX of similarly oriented simple cells over a region of space
› Each S1 cell integrates inputs from a 4x1 retinal receptive field
› Each C1 cell integrates inputs from a 5x5 array of similarly oriented S1 cells
Source: F. Folowosele et al., BioCAS 2008

Canonical Models
Biologically plausible neural circuits for implementing both Gaussian-like and MAX-like operations
Source: Kouh 2007

MAX Operation
A nonlinear saturating pooling function on a set of inputs, such that the output codes the amplitude of the largest input regardless of the strength and number of the other inputs
A set of input neurons {X} causes the output neuron Z to generate spikes at a rate proportional to that of the input with the fastest firing rate
Source: R.J. Vogelstein et
al., NIPS 2007

Test 1: Test Images and Resulting Simple Cells
(A1-4) Generated test images
(B1-4) Horizontally oriented simple cells that respond to light-to-dark transitions
(C1-4) Vertically oriented simple cells that respond to dark-to-light transitions
Source: F. Folowosele et al., ISCAS 2007

Test 1: MAX Network Computation Results
The ratio k obtained is approximately constant across all the simple cells, with a mean of 0.068 and a standard deviation of 0.0006
Source: F. Folowosele et al., ISCAS 2007

Test 2: Test Images and Resulting Simple and Complex Cells
Checkerboard test image
The cells within each square of the overlaid checkerboard pattern represent the 5x5 array of simple cells that are pooled to form a complex cell
› 2400 simple cells
› 80 complex cells
› MAX ratio: 0.1085 ± 0.02
› After outliers are removed, MAX ratio: 0.1179 ± 0.01
Source: F. Folowosele et al., BioCAS 2008

Future: Attention-Modulated HMAX
Source: Riesenhuber, 2004

Conclusion
General-purpose I&F array transceivers
› Allow the implementation of spike-based algorithms
› Digital implementations may end up being more effective than the mixed-signal version
Object recognition
› HMAX provides a biologically plausible hierarchical model of visual processing from V1 to PFC
› Outperforms some benchmark approaches
Implementation with IFAT
› Preliminary results on the early layers
› Future work must also include attention

Acknowledgments
Telluride Neuromorphic Engineering Workshop
UNCF-Merck Fellowship
National Science Foundation

References
R.R. Murphy and E. Rogers, “Cooperative assistance for remote robot supervision,” Presence: Teleoperators and Virtual Environments, vol. 5, no. 2, pp. 224-240, 1996.
T. Serre, M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman, and T. Poggio, “A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex,” AI Memo, MIT, Cambridge, 2005.
M. Riesenhuber and T.
Poggio, “Computational models of object recognition in cortex: a review,” Technical Report, Artificial Intelligence Laboratory and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 2000b.
R.J. Vogelstein, U. Mallik, E. Culurciello, G. Cauwenberghs, and R. Etienne-Cummings, “A multichip neuromorphic system for spike-based visual information processing,” Neural Computation, vol. 19, pp. 2281-2300, 2007a.
D.H. Goldberg, G. Cauwenberghs, and A.G. Andreou, “Probabilistic synaptic weighting in a reconfigurable network of VLSI integrate-and-fire neurons,” Neural Networks, vol. 14, pp. 781-793, 2001.
T.Y.W. Choi, P.A. Merolla, J.V. Arthur, K.A. Boahen, and B.E. Shi, “Neuromorphic implementation of orientation hypercolumns,” IEEE ISCAS, 2005.
R.J. Vogelstein, U. Mallik, J.T. Vogelstein, and G. Cauwenberghs, “Dynamically reconfigurable silicon array of spiking neurons with conductance-based synapses,” IEEE Transactions on Neural Networks, 2007b.
A. Cassidy, S. Denham, P. Kanold, and A.G. Andreou, “FPGA-based silicon spiking neural array,” IEEE BioCAS, 2007.
B.E. Shi, E.K.C. Tsang, S.Y.M. Lam, and Y. Meng, “Expandable hardware for computing cortical maps,” IEEE ISCAS, 2006.
D.H. Hubel and T.N. Wiesel, “Receptive fields, binocular interaction and functional architecture in the cat's visual cortex,” Journal of Physiology, vol. 160, no. 1, 1962.
L.G. Ungerleider and J.V. Haxby, “What and where in the human brain,” Curr. Opin. Neurobiol., pp. 157-165, 1994.
E. Rolls and T. Milward, “A model of invariant object recognition in the visual system: Learning rules, activation functions, lateral inhibition, and information-based performance measures,” Neural Computation, vol. 12, pp. 2547-2572, 2000.
P. Merolla and K. Boahen, “A recurrent model of orientation maps with simple and complex cells,” Advances in Neural Information Processing Systems (NIPS) 16, S. Thrun and L. Saul, Eds., MIT Press, pp. 995-1002, 2004.
R.P.N.
Rao, “Robust Kalman filters for prediction, recognition, and learning,” Technical Report 645, Computer Science Department, University of Rochester, 1996.
J. Licklider, “A duplex theory of pitch perception,” Cellular and Molecular Life Sciences (CMLS), vol. 7, no. 4, pp. 128-134, 1951.
J. Tapson, “Autocorrelation properties of single neurons,” Proceedings of the 1998 South African Symposium on Communication and Signal Processing, 1998.
J. Tapson, C. Jin, A. van Schaik, and R. Etienne-Cummings, “A First-Order Nonhomogeneous Markov Model for the Response of Spiking Neurons Stimulated by Small Phase-Continuous Signals,” Neural Computation, vol. 21, no. 6, pp. 1554-1588, June 2009.
T. Lacey, “Tutorial: The Kalman filter,” Lecture Notes, Department of Computer Science, Georgia Institute of Technology, 1998.
R. Linsker, “Neural network learning of optimal Kalman prediction and control,” Neural Networks, vol. 21, no. 9, pp. 1328-1343, 2008.
R.E. Kalman, “A new approach to linear filtering and prediction problems,” Transactions of the ASME, Journal of Basic Engineering (Series D), pp. 35-45, 1960.
S. Mihalas and E. Niebur, “A generalized linear integrate-and-fire neural model produces diverse spiking behaviors,” Neural Computation, 2008, in press.
C. Cadieu, M. Kouh, A. Pasupathy, C.E. Connor, M. Riesenhuber, and T. Poggio, “A model of V4 shape selectivity and invariance,” J. Neurophysiol., 2007.
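The IFAT Operation and 2nd Generation IFAT slides describe virtual synapses. For each incoming address event, a RAM lookup returns the target neurons together with each synapse's weight and equilibrium potential, and each neuron then integrates in discrete time. The Python sketch below shows that event-routing idea. It is not the IFAT firmware or circuit. The conductance-like update V <- V + w(E - V), the threshold and reset values, and the names VirtualIFArray, connect and process are all hypothetical.

```python
from collections import deque
from collections.abc import Hashable
from dataclasses import dataclass, field


@dataclass
class Synapse:
    target: int          # address of the postsynaptic neuron
    weight: float        # assumed: fraction of the gap to E closed per event (0..1)
    equilibrium: float   # synaptic equilibrium (driving) potential E


@dataclass
class VirtualIFArray:
    """Hypothetical IFAT-style array whose synapses live in a lookup table (the 'RAM')."""
    n_neurons: int
    threshold: float = 1.0   # assumed firing threshold
    v_reset: float = 0.0     # assumed reset (and initial) membrane potential
    table: dict = field(default_factory=dict)  # presynaptic address -> list of Synapse

    def __post_init__(self):
        self.v = [self.v_reset] * self.n_neurons

    def connect(self, source: Hashable, target: int, weight: float, equilibrium: float):
        """Add a virtual synapse. Nothing is hardwired, so any topology can be programmed."""
        self.table.setdefault(source, []).append(Synapse(target, weight, equilibrium))

    def process(self, events, max_events=10_000):
        """Route address events through the table and return the addresses that spiked.

        Output spikes are re-injected as new events, so neurons can drive each other.
        max_events bounds the loop in case recurrent excitation never dies out.
        """
        queue = deque(events)
        spikes = []
        handled = 0
        while queue and handled < max_events:
            source = queue.popleft()
            handled += 1
            for syn in self.table.get(source, []):
                v = self.v[syn.target]
                v += syn.weight * (syn.equilibrium - v)  # move V part of the way toward E
                if v >= self.threshold:
                    spikes.append(syn.target)
                    v = self.v_reset
                    queue.append(syn.target)  # the output spike becomes a new address event
                self.v[syn.target] = v
        return spikes


# Usage: a retinal input excites neuron 0, which in turn excites neuron 1.
array = VirtualIFArray(n_neurons=2)
array.connect("retina0", 0, weight=0.5, equilibrium=1.5)
array.connect(0, 1, weight=0.9, equilibrium=1.5)
print(array.process(["retina0", "retina0"]))  # [0, 1]
```

Because each synapse stores its own equilibrium potential, an inhibitory connection is simply an entry with E below the resting level. Such an entry pulls V down by an amount that depends on V itself, which is the behavior the "weight and equilibrium potential" parameters are meant to capture.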
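The HMAX on IFAT and Preliminary Results slides define S1 cells with 4x1 receptive fields (two excitatory and two inhibitory inputs, balanced so uniform light gives no response) and C1 cells that take the MAX over a 5x5 array of similarly oriented S1 cells. The minimal rate-based sketch below illustrates that arithmetic. On the IFAT these stages are computed with spiking neurons, not floating point. The weights (+1, +1, -1, -1), the rectification, the non-overlapping pooling, and the function names are assumptions for illustration.

```python
def s1_response(image, row, col, orientation):
    """Rate-based S1 cell with a 4x1 receptive field of consecutive pixels.

    Assumed weights (+1, +1, -1, -1): two excitatory and two inhibitory inputs.
    They sum to zero, so a uniform patch gives no response. Negative drive is
    rectified to zero because a firing rate cannot be negative.
    """
    weights = (1.0, 1.0, -1.0, -1.0)
    if orientation == "horizontal":    # field runs down a column: detects horizontal edges
        pixels = [image[row + i][col] for i in range(4)]
    elif orientation == "vertical":    # field runs along a row: detects vertical edges
        pixels = [image[row][col + i] for i in range(4)]
    else:
        raise ValueError(f"unknown orientation: {orientation}")
    drive = sum(w * p for w, p in zip(weights, pixels))
    return max(0.0, drive)


def s1_map(image, orientation):
    """Evaluate the S1 cell at every position where its 4x1 field fits."""
    height, width = len(image), len(image[0])
    rows = height - 3 if orientation == "horizontal" else height
    cols = width if orientation == "horizontal" else width - 3
    return [[s1_response(image, r, c, orientation) for c in range(cols)]
            for r in range(rows)]


def c1_map(s1, pool=5):
    """C1: MAX over non-overlapping pool x pool blocks of same-orientation S1 cells."""
    rows, cols = len(s1) // pool, len(s1[0]) // pool
    return [[max(s1[r * pool + i][c * pool + j]
                 for i in range(pool) for j in range(pool))
             for c in range(cols)]
            for r in range(rows)]


# Usage: 6 bright rows above 7 dark rows (a horizontal light-to-dark edge).
image = [[1.0] * 10 for _ in range(6)] + [[0.0] * 10 for _ in range(7)]
print(c1_map(s1_map(image, "horizontal")))  # [[2.0, 2.0], [1.0, 1.0]]
print(c1_map(s1_map(image, "vertical")))    # [[0.0], [0.0]]
```

The vertically oriented cells stay silent on this image because every 4x1 row segment is uniform. That is the balanced-weight property the slide relies on.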
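The System Layers slide describes S2 units as radial basis functions of the Euclidean distance between a C1 input and a stored prototype, and C2 as a global MAX over all positions and scales for each S2 type. The sketch below illustrates those two steps. It is not the Serre et al. code. The Gaussian width sigma and the representation of C1 patches as flat vectors are assumptions; one common way to write the tuning is exp(-beta * ||x - p||^2), which corresponds here to beta = 1 / (2 * sigma^2). The function names are also assumptions.

```python
import math


def s2_response(c1_patch, prototype, sigma=1.0):
    """S2 unit: Gaussian radial basis function of the Euclidean distance
    between a C1 patch and a stored prototype (both flattened to vectors)."""
    if len(c1_patch) != len(prototype):
        raise ValueError("patch and prototype must have the same length")
    dist_sq = sum((x - p) ** 2 for x, p in zip(c1_patch, prototype))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))


def c2_features(c1_patches, prototypes, sigma=1.0):
    """C2: for each prototype (S2 type), the global MAX of its S2 responses over
    every patch, i.e. over all positions and scales the patches were taken from."""
    return [max(s2_response(patch, proto, sigma) for patch in c1_patches)
            for proto in prototypes]


# Usage: three C1 patches (from any positions or scales) and two stored prototypes.
patches = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
prototypes = [[1.0, 1.0], [0.0, 1.0]]
features = c2_features(patches, prototypes)
print(features)  # first entry is 1.0 because one patch matches its prototype exactly
```

The resulting vector has one entry per prototype regardless of image size or object position. That position and scale invariance makes it suitable input for the simple linear classifier mentioned on the Learning and Classification slide.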