Kipt_Huerta_summary_June_2015

Deconstructing the Sense of Smell — June 19, 2015

The pros and cons of the computational design of the olfactory system
Ramon Huerta
BioCircuits Institute, University of California, San Diego
Looking at the problem as an engineer
• What's the computational problem?
• The role of fan-in/fan-out structure in the brain.
• The equivalence with machine learning algorithms.
• Gain control: what for and how?
What is the computational problem?
• What do we want to recognize?
• How is the information transferred?
Response times of metal-oxide sensors to gas exposure
[Figure: sensor response ΔR = R − R0 over time (s), panels (a)-(g). During chemical analyte adsorption (the gas injection phase), the features considered in the rising portion of the sensor response are the maximum values of the EMA (exponential moving average), computed for smoothing parameters 0.1, 0.01, and 0.001. During chemical analyte desorption (the cleaning phase), the features considered in the decaying portion of the sensor response are the minimum values of the EMA. Steady-state feature: ΔR = R − R0.]
(60 RPMs, wind speed 0.21 m/sec)
Fig. 7: Average accuracy of the models trained at one position landmark and validated at the rest of the positions. The models are trained and validated at the same sensor temperature and wind speed. Models trained at position lines #1 and #2 show poor ...
Alexander Vergara, Jordi Fonollosa, Jonas Mahiques, Marco Trincavelli, Nikolai Rulkov, Ramón Huerta. Sensors and Actuators B: Chemical, Volume 185, 2013, pp. 462-477. http://dx.doi.org/10.1016/j.snb.2013.05.027
Sensor response
Feature # / sensor feature / "olfactory receptor"

Sensory neuron representations
[Figure: evoked spike rate by ORN type; ORN population response (24 of 51 ORN types) to a single odor.]
Hallem and Carlson, Cell 2006
Main computational tasks
• Classification: What is the machinery used for gas discrimination?
• Regression: How do they estimate gas concentration or distance to the source?
The simplified insect brain: model 0
• Antenna → Antennal Lobe (AL): feature extraction, spatio-temporal coding.
• Mushroom body (MB): sparse code; main location of learning.
• Output neurons.
• High divergence-convergence ratios from layer to layer.
What models do we use?
• Level 1: McCulloch-Pitts

$$y_i(t+1) = F\Big(\sum_{j=1}^{N} w_{ij}\, x_j(t) - \theta\Big)$$

It helps to determine how to build the connections and the neural code to solve pattern recognition problems.
• Level 2: Grossberg-type or Wilson-Cowan

$$\frac{dy_i}{dt} = F\Big(\sum_{j=1}^{N} w_{ij}\, x_j - \theta\Big) - y_i$$

It helps to understand timing because it can generate complex dynamics.
• Level 3: Hodgkin-Huxley

$$\frac{dy_i}{dt} = -\sum_k I_k - \Big(\sum_{j=1}^{N} w_{ij}\, r(x_j)\Big)(y_i - V^*)$$

It teaches you how to add circuits that implement Level 1 discrimination.
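As a rough illustration (a sketch of my own, not code from the talk), the Level 1 and Level 2 updates can be transcribed directly from the equations above; the layer sizes, random weights, and the particular choices of F here are assumptions for the example only:

```python
import numpy as np

def mcculloch_pitts_step(x, w, theta):
    """One Level 1 update: y_i(t+1) = F(sum_j w_ij x_j(t) - theta),
    with F taken here as the Heaviside step function."""
    return (w @ x - theta > 0).astype(float)

def wilson_cowan_step(y, x, w, theta, dt=0.1):
    """One Euler step of the Level 2 rate equation
    dy_i/dt = F(sum_j w_ij x_j - theta) - y_i, with a smooth F."""
    F = 1.0 / (1.0 + np.exp(-(w @ x - theta)))  # logistic nonlinearity
    return y + dt * (F - y)

# Toy usage with arbitrary sizes and random weights (illustration only).
rng = np.random.default_rng(0)
x = rng.random(10)              # input layer activity
w = rng.normal(size=(5, 10))    # connection matrix
y1 = mcculloch_pitts_step(x, w, theta=0.5)
y2 = wilson_cowan_step(np.zeros(5), x, w, theta=0.5)
```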
Stage I: Transformation into a large display
Stage II: Learning "perception" of odors
[Diagram: Antennal lobe (AL) → Calyx (display layer; Kenyon cells; sparse code; no learning required) → MB lobes (decision layer; output neurons; Hebbian plasticity; inhibition). PNs (2%), iKC (35%), Output (0.1%).]
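A minimal sketch of Stage I (my own illustration; the connectivity density, layer sizes, and the k-winners-take-all stand-in for inhibition are assumptions, not the talk's model):

```python
import numpy as np

rng = np.random.default_rng(1)
n_pn, n_kc = 50, 2500        # high divergence: far more KCs than PNs
k_active = int(0.05 * n_kc)  # sparse code: ~5% of KCs active per odor

# Fixed random AL -> Calyx connectivity; Stage I needs no learning.
C = (rng.random((n_kc, n_pn)) < 0.1).astype(float)

def calyx_display(x):
    """Stage I: random expansion of the AL pattern into the Kenyon cell
    layer, with global inhibition approximated as k-winners-take-all so
    that only the most strongly driven KCs fire (the sparse code)."""
    drive = C @ x
    threshold = np.sort(drive)[-k_active]
    return (drive >= threshold).astype(float)

x = rng.random(n_pn)         # one AL activity pattern ("odor")
y = calyx_display(x)
print(int(y.sum()), "of", n_kc, "KCs active")  # ~5%
```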
Fan-out systems
• PROS or CONS: which do you want to know first?

The inherent instability of the fan-out structure:

$$y_i(t+1) = F\Big(\sum_{j=1}^{N_{AL}} c_{ij}\, x_j(t) - \theta\Big)$$
Fan-out: solvable inconveniences
• The projection from the PNs to the KCs amplifies noise.
• Fit a model of integrate-and-fire neurons to data.
Main message of the cons
• Fan-out systems amplify everything, even the bad stuff.
• Gain control or gain modulation systems are needed if one wants to use them.
PROS!
Classification is easier in higher dimensions:

$$y_i(t+1) = F\Big(\sum_{j=1}^{N_{AL}} c_{ij}\, x_j(t) - \theta\Big)$$

$$z_i(t+2) = F\Big(\sum_{j=1}^{N_{KC}} w_{ij}\, y_j(t+1) - \theta_O\Big)$$
Linear versus nonlinear classifiers?
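A toy demonstration of the claim (mine, not from the slides): XOR is not linearly separable in the two-dimensional input space, but after a random fan-out expansion into many threshold units a plain perceptron readout solves it. All sizes and weights here are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# XOR: the classic pattern a linear readout cannot solve in input space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, 1, 1, -1])

# Random fan-out expansion (the "Calyx"): 2 inputs -> 200 threshold units.
C = rng.normal(size=(200, 2))
b = rng.normal(size=200)
Y = (X @ C.T - b > 0).astype(float)

# Perceptron readout on the expanded code: update only on errors.
w = np.zeros(200)
for _ in range(100):
    for y, ti in zip(Y, t):
        if ti * (w @ y) <= 0:
            w += ti * y

print([int(np.sign(w @ y)) for y in Y])  # matches t once separable
```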
Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research, 15(1), 3133-3181.
• The authors evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines, and other methods).
• The authors use 121 data sets, which represent the whole UCI database (excluding the large-scale problems) plus their own real-world problems, in order to achieve conclusions that do not depend on the data set collection.
• The classifiers most likely to be the best are the random forest (RF) versions, the best of which achieves 94.1% of the maximum accuracy, exceeding 90% on 84.3% of the data sets. The SVM with Gaussian kernel achieves 92.3% of the maximum accuracy.
What are the best known classification methods?

Friedman rank   Mean acc. (%)   Classifier        (Family)
32.9            82.0            parRF_t           (RF)
33.1            82.3            rf_t              (RF)
36.8            81.8            svm_C             (SVM)
38.0            81.2            svmPoly_t         (SVM)
39.4            81.9            rforest_R         (RF)
39.6            82.0            elm_kernel_m      (NNET)
40.3            81.4            svmRadialCost_t   (SVM)
42.5            81.0            svmRadial_t       (SVM)
Stage I
• No evidence of learning.
• Large ratio: #KCs / #PNs.
• Sparse code: 1-5% of KCs active for a given odor.
Perez-Orive et al. (2002) Science 297(5580): 359-65.
Szyszka et al. (2005) J Neurophysiology 94.
Thanks to Glen Turner
[Figure: MBN responses (Cell One, Cell Two) across trials for 3-octanol and 4-methylcyclohexanol. Linear Discriminant Analysis (LDA) is used to assign odor identity on a trial-by-trial basis; classification accuracy 72%.]
Rob Campbell & Kyle Honegger
Evidence of learning in the MB:
Heisenberg et al. (1985) J Neurogenet 2: 1-30.
Mauelshagen J. (1993) J Neurophysiol 69(2): 609-25.
de Belle and Heisenberg (1994) Science 263: 692-695.
Connolly et al. (1996) Science 274(5295): 2104.
Zars et al. (2000) Science 288(5466): 672-5.
Pascual and Preat (2001) Science 294(5544): 1115-7.
Dubnau et al. (2001) Nature 411(6836): 476-80.
Menzel & Manz (2005) J Experimental Biol 208: 4317-4332.
Okada, Rybak, & Menzel (2007) J Neuroscience 27(43): 11736-47.
Cassenaer & Laurent (2007) Nature 448: 709-713.
Strube-Bloss MF, Nawrot MP and Menzel R (2011) Mushroom body output neurons encode odor-reward association. J Neuroscience 31(8): 3129-3140.
Key elements:
1. Hebbian plasticity in w.
2. Competition via inhibition (gain control).
So, what about the inhibition?
[Figure: LN1 and LN2 traces. Thanks to Stijn Cassenaer.]
So, what about the plasticity?
Hebbian rule:

$$w_{ij}(t+1) = w_{ij}(t) + \begin{cases} y_j\,\mathrm{sgn}(z_i)\,R(e) & \text{with probability } P_e \le \tfrac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$

R(e) is +1 for a positive reward and -1 for a negative reward.
(Dehaene, Changeux, 2000) and (Houk, Adams, Barto, 1995)
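One way to read the rule in code (my sketch, not the talk's implementation; the shapes and the gating probability are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def hebbian_update(w, y, z, reward, p=0.5):
    """Reward-gated Hebbian rule from the slide: with probability p each
    synapse takes the step w_ij <- w_ij + y_j * sgn(z_i) * R(e), and is
    otherwise left unchanged. R(e) = +1 (reward) or -1 (punishment)."""
    gate = rng.random(w.shape) < p           # stochastic application
    dw = np.outer(np.sign(z), y) * reward    # y_j * sgn(z_i) * R(e)
    return w + gate * dw

# Toy usage: 2 output neurons, 5 presynaptic (KC-like) inputs.
w = np.zeros((2, 5))
w = hebbian_update(w, y=np.array([1., 0., 1., 0., 1.]),
                   z=np.array([0.7, -0.2]), reward=+1)
```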
So, what about the reinforcement?
Ventral Unpaired Median cell mx1 (VUMmx1):
• Arborizes broadly in brain regions associated with olfactory processing, sensory integration, and premotor areas.
• Receives input from gustatory input regions.
• VUMmx1 responds to sucrose application to the proboscis and/or antennae.
Another advantage: robustness
Testing MB performance and resilience on the MNIST dataset:
Huerta R, Nowotny T (2009). Fast and robust learning by reinforcement signals: explorations in the insect brain. Neural Comput 21(8): 2123-51.
[Diagram: two wiring options for Kenyon cells onto the output neurons controlling proboscis extension vs. retraction, with sucrose as the reinforcement signal. Option 1: an active Kenyon cell drives both output neurons (+ extension, - retraction), and both the extension and retraction neurons are active. Option 2: an active Kenyon cell drives only the extension neuron (+); the retraction neuron is inactive.]
Bazhenov, Maxim, Ramon Huerta, and Brian H. Smith (2013). A computational framework for understanding decision making through integration of basic learning rules. The Journal of Neuroscience 33(13): 5686-5697.
Analogy with machine learning devices: Support Vector Machines (SVM)
• Given a training set

$$\{x_i, y_i\},\quad i = 1,\dots,N,\quad x_i \in \mathbb{R}^M,\quad y_i \in \{-1, +1\}$$

[Diagram: odorants as points in the AL coding space, labeled good or bad. How many samples do we need?]
SVM
• SVMs often use an expansion function (a Calyx)

$$\Phi(\cdot): \mathbb{R}^M \to \mathcal{F},$$

with $\mathcal{F}$ the feature space (the KC neural coding space).
• The classification function, the odor recognition function, or the pattern recognition function is

$$f(x) = \langle w, \Phi(x) \rangle,$$

where x is the AL neural coding, $\Phi(x)$ is the Calyx neural coding, w represents the connections from the Calyx to the output neurons (what we are trying to learn), and f is read out by the output neurons: the β-lobe neurons, or the extrinsic neurons.
[Diagram: AL → Calyx (display layer, intrinsic Kenyon cells) → MB lobes (decision layer, extrinsic neurons), with the connections w subject to competition via inhibition.]
SVM
• We want to solve the classification problem:

$$\min_w\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \max\big(1 - f(x_i)\, y_i,\, 0\big)$$

The first term minimizes the strength of the connections; the second minimizes the errors.
SVM stochastic gradient algorithm:

$$w \leftarrow \lambda w + C\, \begin{cases} \Phi(x_i)\, y_i & \text{if } f(x_i)\, y_i < 1 \text{ (almost incorrect)} \\ 0 & \text{if } f(x_i)\, y_i \ge 1 \text{ (strongly correct)} \end{cases}$$

• Make the connections as small as possible: connection removal is necessary to generalize better, i.e., to avoid overfitting.
• Change the connections if the sample is not correctly classified.
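A textbook primal subgradient step for the hinge-loss objective above (a generic sketch, not code from the talk; lam, C, and eta are arbitrary hyperparameters):

```python
import numpy as np

def svm_sgd_step(w, phi_x, y, lam=1e-3, C=1.0, eta=0.1):
    """One stochastic (sub)gradient step on the hinge loss.
    Shrinking w is the "connection removal" (weight decay); the additive
    term is the Hebbian-like correction, applied only when the sample is
    not classified with sufficient margin."""
    w = (1.0 - eta * lam) * w        # make the connections small
    if y * (w @ phi_x) < 1.0:        # margin violated ("almost incorrect")
        w = w + eta * C * y * phi_x  # change the connections
    return w
```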
Hebbian analog:

$$\Delta w_{ij} = \begin{cases} \Phi(x)_j\, f_i\, R(e) & \text{with probability } P_e \\ 0 & \text{otherwise} \end{cases}$$
Remarkable similarities:
1. Structural organization: AL → Calyx → MB lobes.
2. Connection removal and Hebbian learning: the perceptron rule.
3. Inhibition provides robustness and allows learning from fewer examples.
Thank you!
Kerem Muezzinoglu (UCSD-BioCircuits, now ...)
Alex Vergara (UCSD-BioCircuits)
Shankar Vembu (UCSD-BioCircuits)
Thomas Nowotny (Sussex, UK)
Amy Ryan (JPL-NASA)
Margie Homer (JPL-NASA)
Brian Smith (ASU)
Gilles Laurent (Caltech / Max Planck)
Nikolai Rulkov (UCSD-BioCircuits)
Mikhail Rabinovich (UCSD-BioCircuits)
Travis Wong (ELINTRIX, San Diego)
Drew Barnett (ELINTRIX, San Diego)
Marco Trincavelli (Orebro, Sweden)
Pablo Varona (UAM, Spain)
Francisco Rodriguez (UAM, Spain)
Marta Garcia Sanchez (UAM, Spain)