Deconstructing the Sense of Smell, June 19, 2015
The pros and cons of the computational design of the olfactory system
Ramon Huerta, BioCircuits Institute, University of California, San Diego

Looking at the problem as an engineer
• What's the computational problem?
• The role of fan-in/fan-out structure in the brain.
• The equivalence with machine learning algorithms.
• Gain control: what for and how?

What is the computational problem?
• What do we want to recognize?
• How is the information transferred?

[Figure: Response times of metal-oxide sensors to gas exposure. Panels (a)-(g) plot the sensor response against time (s) during the chemical analyte adsorption (gas injection) phase and the chemical analyte desorption (cleaning) phase. Features considered in the rising portion of the sensor response: the maximum values of the exponential moving average (ema) with smoothing factors 0.1, 0.01, and 0.001. Features considered in the decaying portion: the minimum values of the ema with the same smoothing factors. Steady-state feature: ΔR = R − R0.]

Fig. 7 Average accuracy of the models trained in one position landmark and validated in the rest of the positions. The models are trained and validated at the same sensors' temperature and wind speed (60 RPM, 0.21 m/sec). Models trained in position lines #1 and #2 show poor ...
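The ema features sketched in the figure can be computed in a few lines. This is a minimal illustration, not the paper's exact pipeline: the function names, the choice of baseline R0, and the injection/cleaning index ranges are assumptions made here for the sketch.

```python
import numpy as np

def ema(x, alpha):
    """Exponential moving average: y[t] = (1 - alpha) * y[t-1] + alpha * x[t]."""
    y = np.empty(len(x))
    y[0] = x[0]
    for t in range(1, len(x)):
        y[t] = (1 - alpha) * y[t - 1] + alpha * x[t]
    return y

def sensor_features(r, inject, clean, alphas=(0.1, 0.01, 0.001)):
    """Extract the slide's feature set from one sensor response r(t).

    inject / clean are index slices covering the gas-injection (rising)
    and cleaning (decaying) portions of the response.
    """
    r = np.asarray(r, dtype=float)
    r0 = r[0]                                  # baseline before injection
    feats = {"dR": r.max() - r0}               # steady-state feature ΔR = R - R0
    for a in alphas:
        e = ema(r - r0, a)
        feats[f"ema_max_{a}"] = e[inject].max()  # rising portion
        feats[f"ema_min_{a}"] = e[clean].min()   # decaying portion
    return feats
```

A larger smoothing factor tracks the transient more closely, so on the rising portion ema_max grows with the smoothing factor; the three time scales together summarize the dynamics of the transient.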
Alexander Vergara, Jordi Fonollosa, Jonas Mahiques, Marco Trincavelli, Nikolai Rulkov, Ramón Huerta. Sensors and Actuators B: Chemical, Volume 185, 2013, 462-477. http://dx.doi.org/10.1016/j.snb.2013.05.027

Sensory neuron representations
• Sensor response / sensor feature ↔ "olfactory receptor".
• ORN population response (24 of 51 ORN types) to a single odor: evoked spike rate per ORN type.
Hallem and Carlson, Cell 2006.

Main computational tasks
• Classification: What is the machinery used for gas discrimination?
• Regression: How do they estimate gas concentration or distance to the source?

The simplified insect brain: model
• Antenna → Antennal Lobe (AL): feature extraction, spatio-temporal coding.
• Mushroom body (MB): sparse code; main location of learning.
• Output neurons.
• High divergence-convergence ratios from layer to layer.

What models do we use?
• Level 1: McCulloch-Pitts
    y_i(t+1) = F( Σ_{j=1}^{N} w_ij x_j(t) )
  It helps to determine how to build the connections and the neural code to solve pattern recognition problems.
• Level 2: Grossberg-type or Wilson-Cowan
    dy_i/dt = F( Σ_{j=1}^{N} w_ij x_j ) − y_i
  It helps to understand time because it can generate complex dynamics.
• Level 3: Hodgkin-Huxley
    dy_i/dt = Σ_k I_k − Σ_{j=1}^{N} w_ij r(x_j) (y_i − V*)
  It teaches you how to add circuits to implement Level 1 discrimination.

[Diagram: AL → Calyx (Display Layer: Kenyon Cells, sparse code) → MB lobes (Decision layer: Output neurons). Stage I: Transformation into a large display; no learning required. Stage II: Learning "perception" of odors; Hebbian plasticity; inhibition. PNs (2%), iKC (35%), Output (0.1%).]

Fan-out systems
• PROS or CONS: What do you want to know first?

The inherent instability of the fan-out structure
    y_i(t+1) = F( Σ_{j=1}^{N_AL} c_ij x_j(t) )

Fan-out: solvable inconveniences
• The projection from the PNs to the KCs amplifies noise.
• Fit a model of integrate-and-fire neurons to data.

Main message of the cons
• Fan-out systems amplify everything, even the bad stuff.
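The Level 1 update combined with the AL→KC fan-out can be sketched as follows. The layer sizes, connection probability, and firing threshold are illustrative choices, not values from the talk; F is taken as a Heaviside step.

```python
import numpy as np

def mcculloch_pitts(x, w, theta):
    # y_i(t+1) = F( sum_j w_ij x_j(t) - theta ), with F a Heaviside step
    return (w @ x - theta > 0).astype(int)

rng = np.random.default_rng(0)
n_al, n_kc = 50, 2000                         # strong fan-out: #KCs >> #AL neurons
w = (rng.random((n_kc, n_al)) < 0.1) * 1.0    # sparse random AL -> KC connectivity
x = (rng.random(n_al) < 0.3).astype(int)      # one AL activity pattern

y = mcculloch_pitts(x, w, theta=3.0)
print(f"active KCs: {100 * y.mean():.1f}%")
```

Raising theta makes the KC code sparser; without such gain control the fan-out amplifies input noise just as readily as signal, which is the instability the cons slide is about.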
• Gain control or gain modulation systems are needed if one wants to use them.

PROS! Classification is easier in higher dimensions
    y_i(t+1) = F( Σ_{j=1}^{N_AL} c_ij x_j(t) )
    z_i(t+2) = F( Σ_{j=1}^{N_KC} w_ij y_j(t+1) )

Linear versus nonlinear classifiers?
Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research, 15(1), 3133-3181.
• The authors evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines, and other methods).
• The authors use 121 data sets, which represent the whole UCI database (excluding the large-scale problems) plus their own real-world problems, in order to achieve significant conclusions.
• The classifiers most likely to be the best are the random forest (RF) versions, the best of which achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. The SVM with Gaussian kernel achieves 92.3% of the maximum accuracy.

What are the best known classification methods?

    Rank   Accuracy (%)   Classifier (family)
    32.9   82.0           parRF_t (RF)
    33.1   82.3           rf_t (RF)
    36.8   81.8           svm_C (SVM)
    38.0   81.2           svmPoly_t (SVM)
    39.4   81.9           rforest_R (RF)
    39.6   82.0           elm_kernel_m (NNET)
    40.3   81.4           svmRadialCost_t (SVM)
    42.5   81.0           svmRadial_t (SVM)

Stage I
• No evidence of learning.
• Large ratio: #KCs / #PNs.
• Sparse code: 1-5% active KCs for a given odor.
Perez-Orive et al., Science 2002 Jul 19; 297(5580): 359-65.
Paul Szyszka et al., J. Neurophysiology 94 (2005).
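The claim that classification gets easier in higher dimensions can be demonstrated with a toy example: XOR is not linearly separable in 2-D, but after a random nonlinear expansion (playing the role of the AL→KC projection) a simple perceptron on the expanded code solves it. The sizes and seeds here are arbitrary.

```python
import numpy as np

# XOR labels: no linear classifier in the original 2-D space gets all 4 right.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, 1, 1, -1])

rng = np.random.default_rng(1)
n_kc = 200
C = rng.normal(size=(n_kc, 2))               # random "AL -> KC" weights
b = rng.normal(size=n_kc)
Z = (X @ C.T + b > 0).astype(float)          # high-dimensional binary KC code

# Perceptron rule on the expanded code: update only on mistakes.
w = np.zeros(n_kc)
for _ in range(1000):
    mistakes = 0
    for zi, yi in zip(Z, y):
        if yi * (w @ zi) <= 0:
            w += yi * zi
            mistakes += 1
    if mistakes == 0:                        # converged: all 4 patterns separated
        break

print("XOR solved:", bool((np.sign(Z @ w) == y).all()))
```

The same trick is what kernel SVMs exploit: a fixed nonlinear expansion followed by a linear readout, trained with a mistake-driven rule.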
Thanks to Glen Turner

[Figure: Linear Discriminant Analysis (LDA) used to assign odor identity (3-octanol vs. 4-methylcyclohexanol) on a trial-by-trial basis from the responses of two mushroom body neurons (MBNs, Cell One and Cell Two). Classification accuracy: 72%. Rob Campbell & Kyle Honegger.]

Evidence of learning in the MB:
• Heisenberg et al. (1985) J Neurogenet 2, pp. 1-30.
• Mauelshagen J. (1993) J Neurophysiol. 69(2): 609-25.
• de Belle and Heisenberg (1994) Science 263, pp. 692-695.
• Connolly et al. (1996) Science 274(5295): 2104.
• Zars et al. (2000) Science 288(5466): 672-5.
• Pascual and Preat (2001) Science 294(5544): 1115-7.
• Dubnau et al. (2001) Nature 411(6836): 476-80.
• Menzel & Manz (2005) J. Experimental Biol. 208: 4317-4332.
• Okada, Rybak, & Menzel (2007) J. of Neuroscience 27(43): 11736-47.
• Stijn Cassenaer & Laurent (2007) Nature 448: 709-713.
• Strube-Bloss MF, Nawrot MP, and Menzel R (2011): Mushroom Body Output Neurons Encode Odor-Reward Association. The Journal of Neuroscience, 31(8): 3129-3140.

Key elements:
1. Hebbian plasticity in w.
2. Competition via inhibition (gain control).

So what about the inhibition?
LN1, LN2 (Thanks to Stijn Cassenaer)

So, what about the plasticity?
Hebbian rule:
    w_ij(t+1) = w_ij(t) + Δw_ij, with
    Δw_ij = (1/2) y_j sgn(z_i) R(e) with probability p_e, and Δw_ij = 0 the rest of the time.
R(e) = +1 for positive reward and −1 for negative reward.
(Dehaene, Changeux, 2000) and (Houk, Adams, Barto, 1995)

So, what about the reinforcement?
Ventral Unpaired Median cell mx1 (VUMmx1)
• Broadly arborizes brain regions associated with olfactory processing, sensory integration, and premotor areas.
• Receives input from gustatory input regions.
• VUMmx1 responds to sucrose application to the proboscis and/or antennae.

Another advantage
• Robustness
MB performance on the MNIST dataset
• Huerta R, Nowotny T. Fast and robust learning by reinforcement signals: explorations in the insect brain. Neural Comput. 2009 Aug; 21(8): 2123-51.
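The Hebbian rule above, read as a stochastic reward-gated update, can be sketched as below. The probability p_e, the weight bounds, and the array shapes are illustrative assumptions, not values from the talk.

```python
import numpy as np

def hebbian_update(w, y, z, reward, p_e, rng):
    """w_ij(t+1) = w_ij(t) + (1/2) * y_j * sgn(z_i) * R(e), applied to each
    synapse with probability p_e; otherwise the synapse stays as it is.

    reward: R(e) = +1 (positive) or -1 (negative).
    Clipping to [0, 1] is an added assumption that keeps synapses bounded.
    """
    dw = 0.5 * np.outer(np.sign(z), y) * reward
    mask = rng.random(w.shape) < p_e             # stochastic per-synapse gate
    return np.clip(w + dw * mask, 0.0, 1.0)

rng = np.random.default_rng(0)
w = np.full((2, 4), 0.5)         # 2 output neurons x 4 Kenyon cells
y = np.array([1., 0., 1., 0.])   # active KCs
z = np.array([1., -1.])          # output-neuron activity
w_new = hebbian_update(w, y, z, reward=+1, p_e=1.0, rng=rng)
```

With a positive reward, synapses from active KCs onto the output neuron with z > 0 are strengthened and those onto the other weakened; a negative reward reverses the sign, which is how the VUMmx1-style reinforcement signal steers learning.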
Testing MB resilience on the MNIST dataset

[Diagram: Kenyon cells connect to two output neurons driving proboscis extension and retraction; sucrose is the reinforcement. Option 1: an active Kenyon cell has a (+) connection to the extension neuron and a (−) connection to the retraction neuron, and both output neurons are active. Option 2: an active Kenyon cell has a (+) connection to the extension neuron, and the retraction neuron is inactive.]

Bazhenov, Maxim, Ramon Huerta, and Brian H. Smith. "A computational framework for understanding decision making through integration of basic learning rules." The Journal of Neuroscience 33.13 (2013): 5686-5697.

Analogy with machine learning devices: Support Vector Machines (SVM)
• Given a training set {x_i, y_i}, i = 1, …, N, with x_i ∈ R^M (an odorant in the AL coding space) and y_i ∈ {−1, +1} (good or bad?).
• How many samples?

SVM
• SVMs often use an expansion function (a Calyx) Φ: R^M → Σ, with Σ the feature space (the KC neural coding space).
• The classification function, the odor recognition function, or the pattern recognition function is
    f(x) = ⟨w, Φ(x)⟩
  where x is the AL neural coding, Φ(x) is the Calyx neural coding, w represents the connections from the Calyx to the output neurons (what we are trying to learn), and f plays the role of the output neurons (the β-lobe neurons, or the extrinsic neurons).

[Diagram: AL → Calyx (Display Layer: intrinsic Kenyon Cells) → MB lobes (Decision layer: extrinsic neurons, competition via inhibition).]

SVM
• We want to solve the classification problem:
    min_w (1/2) ||w||² + C Σ_{i=1}^{N} max(1 − f(x_i) y_i, 0)
  The first term minimizes the strength of the connections; the second minimizes the errors.

SVM stochastic gradient algorithm
• Change the connections only if the sample is not correctly classified with enough margin ("almost incorrect"); leave them alone if it is "strongly correct":
    w ← w + C Φ(x_i) y_i   if f(x_i) y_i < 1,
  while always shrinking the connections toward 0.
• Make the connections as small as possible: connection removal is necessary to generalize better, i.e., to avoid overfitting.

Hebbian version:
    Δw_ij = Φ(x)_j f_i R(e) with probability p_e, and Δw_ij = 0 the rest of the time.

Remarkable similarities
1. Structural organization: AL → Calyx → MB lobes.
2. Connection removal and Hebbian learning: perceptron rule.
3. Inhibition provides robustness and allows learning from fewer examples.

Thank you!
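The stochastic-gradient view of the SVM objective can be sketched as below, with Φ taken as the identity for brevity; the learning rate, epoch count, and data are illustrative choices.

```python
import numpy as np

def svm_sgd(X, y, C=1.0, lr=0.01, epochs=200, seed=0):
    """Stochastic subgradient descent on (1/2)||w||^2 + C * sum_i hinge(x_i, y_i)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            # "Almost incorrect" sample (margin < 1): the error term pushes w
            # toward classifying it; "strongly correct" samples contribute
            # only the shrinkage that keeps the connections small.
            err = C * y[i] * X[i] if y[i] * (w @ X[i]) < 1 else 0.0
            w -= lr * (w - err)     # subgradient of the regularized objective
    return w

X = np.array([[2., 1.], [1., 2.], [-1., -2.], [-2., -1.]])
y = np.array([1, 1, -1, -1])
w = svm_sgd(X, y)
```

The w term of the gradient is the connection shrinkage ("make the connections as small as possible"); the hinge term fires only for samples classified incorrectly or with too small a margin, which is the perceptron-like, Hebbian-looking part of the rule.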
Kerem Muezzinoglu (UCSD-Biocircuits, now )
Alex Vergara (UCSD-Biocircuits)
Shankar Vembu (UCSD-Biocircuits)
Thomas Nowotny (Sussex, UK)
Amy Ryan (JPL-NASA)
Margie Homer (JPL-NASA)
Brian Smith (ASU)
Gilles Laurent (CALTECH-Max Planck)
Nikolai Rulkov (UCSD-Biocircuits)
Mikhail Rabinovich (UCSD-Biocircuits)
Travis Wong (ELINTRIX, San Diego)
Drew Barnett (ELINTRIX, San Diego)
Marco Trincavelli (Örebro, Sweden)
Pablo Varona (UAM, Spain)
Francisco Rodriguez (UAM, Spain)
Marta Garcia Sanchez (UAM, Spain)