Exploring a sampling-based framework for probabilistic representation and computation in the cortex

József Fiser, Pietro Berkes (Brandeis University)
Máté Lengyel, Gergő Orbán (University of Cambridge)

Outline
• Challenges to the standard model of vision in the cortex
• Proposal: the sampling hypothesis
• How viable the sampling model is
• Empirical evidence explained

Challenges
• Behavioral: In many cases, animals' and humans' perception, action, and learning are best captured by models that assume probabilistic representation and computation in the brain.
• Physiological: The high level of structured spontaneous activity in the cortex questions the feed-forward nature of information processing and the role of ongoing activity.
• Coding: High trial-to-trial variability of cell responses, even in primary sensory cortices, strongly interferes with reliable neural computation. Systematic variations of neural response variance are not explained by current theories.

The sampling hypothesis
The cortex performs probabilistic computation for inference and learning, and to achieve this it uses a sampling-based neural representation in which spontaneous activity represents the prior information used for inference.
• Probabilistic computation: represents and computes with probability distributions in order to handle uncertainty.
• Sampling-based: distributions are represented by samples.
• Spontaneous = prior: spontaneous activity is not noise.

Logic of sampling-based representation
[Figure: neurons correspond to variables (e.g., orientation and intensity of a feature); the joint activity of the network at each moment is one sample, and samples accumulated over trials / time trace out the represented distribution of firing rates across neurons.]

Functional role of spontaneous activity
Brightness coding in the primary visual cortex is factorized (Rossi et al. Science 1996).

Spontaneous activity = samples from the prior

Formalizing the relationship between SA and EA
Activity p(y | x) in the absence of an input stimulus x represents samples from the prior p(y), and this is the spontaneous activity. After learning from P_data(x),

    P(y) = ∫ P(y | x) P_data(x) dx,   hence   P_SA(y) ≈ ∫ P_EA(y | x) P_data(x) dx

(P_SA: spontaneous activity; P_EA: evoked activity; P_data: natural scenes).

Evidence of sampling-based probabilistic representation in the awake cortex: Visual
Data collection
• Awake, head-restrained ferrets
• 4 age groups (P30, P45, P90, P120)
• Multi-electrode recording with 200 µm spacing from layers 2-3 of V1
• Conditions: complete dark (spontaneous activity), drifting grating, natural-scene movie, random noise

Results I
SA and EA in the Movie condition converge with age; in the adult animal, they do not differ significantly.

Results II
The match between SA and EA is specific to natural scenes.

Results III
The match between SA and EA is not specific to the visual modality.

FAQ: Why sampling just cannot be the answer
• It is not clear how the neural circuitry can do it.
• It requires way too much time to collect a sufficient number of samples.
• It must break down if the input is non-stationary.
• A limited number of samples must lead to bias in learning.
Well, let's see…

How can sampling be implemented in neural networks?
Quite naturally: Gibbs sampling; Hamiltonian Monte Carlo / Langevin sampling.

A simple model of cue combination based on sampling (Hamiltonian Monte Carlo / Langevin sampling)
Task: given auditory and visual information, where is the target?

How many samples are needed for accurate estimation?
Surprisingly few: with 1 sample, the variance of the estimate is just twice the optimal (ML) variance. Sampling might be optimized by the cortex by minimizing the autocorrelation of successive samples.

What happens with non-static input?
Things actually get better, not worse: as long as the internal model is correct, it will have no problem tracking the signal. Why is this happening?
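The cue-combination model above can be sketched with an unadjusted Langevin sampler. Everything here (cue noise levels, step size, chain length) is a hypothetical toy, not the actual model from the talk; the sketch also checks the one-sample claim, i.e., that the error variance of a single posterior sample is about twice that of the optimal (ML) estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cue-noise parameters: visual and auditory observations of
# the same target position, corrupted by independent Gaussian noise.
sigma_v, sigma_a = 1.0, 2.0
n_trials = 4000

x_true = rng.normal(0.0, 5.0, n_trials)            # target positions
x_v = x_true + rng.normal(0.0, sigma_v, n_trials)  # visual cue
x_a = x_true + rng.normal(0.0, sigma_a, n_trials)  # auditory cue

# With a flat prior, the posterior over position is Gaussian with a
# precision-weighted mean (the optimal / ML combined estimate).
prec = 1.0 / sigma_v**2 + 1.0 / sigma_a**2
post_var = 1.0 / prec
post_mu = (x_v / sigma_v**2 + x_a / sigma_a**2) * post_var

# Unadjusted Langevin dynamics targeting the posterior:
#   x <- x + (eps/2) * d/dx log p(x | cues) + sqrt(eps) * noise
eps = 0.1 * post_var
x = x_v.copy()                                     # start each chain at a cue
for _ in range(300):
    grad = -(x - post_mu) / post_var
    x += 0.5 * eps * grad + np.sqrt(eps) * rng.normal(size=n_trials)

var_ml = np.var(post_mu - x_true)   # error variance of the optimal estimate
var_one = np.var(x - x_true)        # error variance of a single sample
print(var_one / var_ml)             # close to 2
```

The factor of two arises because a single sample's error is the posterior mean's error plus the sample's spread around the posterior mean, two roughly independent terms of equal variance.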
No need for burn-in, plus a tighter prior.

Can we learn with such a small number of samples?
Yes… Learning tuning-curve positions and variances: Results.

• It is not clear how the neural circuitry can do it ✔
• It requires way too much time to collect a sufficient number of samples ✔
• It must break down if the input is non-stationary ✔
• A limited number of samples must lead to bias in learning ✔
Sampling can be the answer.

Experimental results explained: Simulations
• Receptive fields; u = membrane potential
• Gain: [u − Thres]^1.2; r_i = Poisson rate; n_i = spike count
• Trained on 80,000 patches of size 8×8 from the van Hateren image database
• Maximum-likelihood learning using Expectation Maximization

Specific questions
• Can we reproduce our KL results?
• Does the model agree with the characteristics of variability measured in neural activity?
  • Contrast dependence of variability
  • Stimulus specificity of variability
  • Onset effects of variability
  • Stimulus specificity of co-variability

Sanity check
Sampling reproduces the KL divergence results.

Contrast-dependent variability of u
[Figure: posterior P(z | x) over (u1, u2) for blank vs. high-contrast stimuli.]
Sampling gives a natural link between contrast and variance.

Stimulus-dependent variability of u and the effect of stimulus onset
[Simulations: nonpreferred vs. preferred stimulus, u over time (ms).]
Sampling can reproduce both the stimulus independence of the variability change and the quenching effect of stimulus onset.

Stimulus-dependent co-variability of u
[Experiment vs. simulation (Yu & Ferster 2010 Neuron).]
Sampling produces a reduced co-variability of the membrane potential at stimulus presentation, just as found in vivo.

Stimulus-dependent co-variability of u
[Experiment vs. simulation, for cells with similar vs. dissimilar orientation preference (Yu & Ferster 2010 Neuron).]
Sampling can reproduce these correlational effects, too.

Conclusions
• Sampling-based probabilistic schemes are viable alternatives for neural representation in the cortex, providing a unified framework to investigate learning and instantaneous perception.
• Measures based on comparing SA and EA support the idea that the cortex uses a sampling-based representation and that SA has the functional role of providing prior information for probabilistic perceptual inference.
• Sampling-based representations corroborate the idea that, rather than using mostly first moments, cortical coding is strongly related to signal co-variances and higher-order moments of the neural signal.

Thank you!

The standard view on vision
The fundamental method of information processing in the visual system can be described as
• deterministic computation with added neural noise
• feed-forward
• based on mean firing rates of cells
The function of the primary visual cortex in this framework is
• restricted to representing only visual information
• to create a faithful representation of the sensory stimulus with a neural code that is as sparse and independent as possible

Challenges
• Behavioral: In many cases, animals' and humans' perception, action, and learning are best captured by models that assume probabilistic representation and computation in the brain.

Life is uncertain and ambiguous… (2D to 3D)
[Figure: s = stimulus; orientation and intensity of the stimulus.]

Proposal
The brain encodes both the value and the uncertainty of the stimulus during perception.

Probabilistic approaches in …
• Classical conditioning (Courville et al. TICS 2006)
• Perceptual processes (Kersten et al. Ann. Rev. Psych. 2006)
• Visuo-motor coordination (Körding & Wolpert Nature 2004)
• Cue combination (Atkins et al. Vis. Res. 2001; Ernst & Banks Nature 2002)
• Decision making (Trommershäuser et al. TICS 2008)
• High-level cognitive processes (Griffiths & Tenenbaum TICS 2006)
• Visual statistical learning (Orbán et al. PNAS 2008)

Is the neural computation deterministic?
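The spiking model from the simulation section (membrane potential u, gain [u − Thres]^1.2, Poisson rate r_i, spike count n_i) can be sketched as follows. The gain constant, threshold, counting window, and the two membrane-potential conditions are hypothetical placeholders; the sketch only illustrates how quenched membrane-potential variability at stimulus onset can coexist with a rising spike-count mean and a falling Fano factor.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical constants for the threshold-power-law spiking model.
GAIN, THRES, EXPONENT, DT = 10.0, 0.0, 1.2, 0.1   # DT: count window (s)

def firing_rate(u):
    # Gain function from the simulations: rate = GAIN * [u - THRES]_+ ** 1.2
    return GAIN * np.maximum(u - THRES, 0.0) ** EXPONENT

def spike_counts(u):
    # n_i ~ Poisson(r_i * DT), given membrane potentials u_i
    return rng.poisson(firing_rate(u) * DT)

# Two hypothetical conditions: blank (low mean u, high trial-to-trial
# variability) vs. stimulus (high mean u, quenched variability), as in
# the sampling account of stimulus onset.
u_blank = rng.normal(0.8, 0.5, 100_000)
u_stim  = rng.normal(2.0, 0.3, 100_000)

for name, u in [("blank", u_blank), ("stim", u_stim)]:
    n = spike_counts(u)
    print(name, n.mean(), n.var() / n.mean())   # mean count, Fano factor
```

With these toy numbers the stimulus condition shows a higher mean count and a lower Fano factor than the blank condition, mirroring the relation between membrane-potential variance, spike-count mean, and Fano factor discussed later.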
Probabilistic computation in the brain
Evidence for computation (inference vs. learning) by type (behavioral vs. neural):
• Inference (behavioral): Ernst & Banks 2002; Körding & Wolpert 2004; Weiss et al. 2002; Tenenbaum et al. 2006
• Inference (neural): Platt & Glimcher 1999; Yang & Shadlen 2007; Kepecs et al. 2008
• Learning (behavioral): Atkins et al. 2001; Dayan et al. 2000; Orbán et al. 2008
• Learning (neural): ?

Challenges
• Behavioral: In many cases, animals' and humans' perception, action, and learning are best captured by models that assume probabilistic representation and computation in the brain.
• Physiological: The high level of structured spontaneous activity in the cortex questions the feed-forward nature of information processing and the role of ongoing activity.

Spontaneous activity in the awake brain: is it really noise?
Example: stimulus onset quenches trial-to-trial variability (Churchland et al. Nat. Neurosci. 2010).
Why?

The sampling hypothesis
Probabilistic representation in the cortex:
• Parametric: Probabilistic Population Codes (PPCs); predecessors: kernel density estimation, distributional population codes.
• Sampling-based: earlier proposals: Lee & Mumford 2003; Hoyer & Hyvärinen 2003; implicit examples: Boltzmann machine, Helmholtz machine; special cases: Olshausen & Field 1996; Karklin & Lewicki 2009.

A simple model of the early visual system (e.g., Olshausen & Field 1996)
Sparseness and independence; probabilistic formulation.

Perception is an inference process that is always based on internally available prior information.

Looking for supporting evidence: Predictions
• With accumulating visual experience, the distribution of spontaneous activity should become increasingly similar to the distribution of evoked activity averaged over natural stimuli.
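The logic behind this prediction, that spontaneous activity (the prior) should match evoked activity (the posterior) averaged over natural stimuli, follows from a basic identity of generative models. A sketch with a hypothetical discrete model (all numbers made up):

```python
import numpy as np

# Toy discrete generative model: latent y with prior p(y),
# observation x with likelihood p(x|y).
p_y = np.array([0.2, 0.5, 0.3])            # prior over 3 latent states
p_x_given_y = np.array([[0.7, 0.2, 0.1],   # rows: y, cols: x
                        [0.1, 0.6, 0.3],
                        [0.2, 0.2, 0.6]])

p_x = p_y @ p_x_given_y                    # marginal over observations
# Posterior p(y|x) by Bayes' rule, one column per observation x.
p_y_given_x = (p_x_given_y * p_y[:, None]) / p_x[None, :]

# Average the posterior ("evoked activity") over the data distribution:
avg_posterior = p_y_given_x @ p_x
print(avg_posterior)   # equals the prior p_y: [0.2, 0.5, 0.3]
```

The identity Σ_x p(y | x) p(x) = p(y) holds exactly whenever the averaging distribution over x is the model's own marginal, which is what the developmental prediction assumes the adult cortex has learned to match.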
• Probability of exhibiting a given neural activity pattern should be identical.
• Probability of transitioning between particular patterns should also match.
• Similarity should be specific to natural scenes but not to other stimulus ensembles.

Data analysis
Divergence between the movie-evoked and spontaneous pattern distributions:

    KL(P_Movie ‖ P_Spont) = Σ_x P_Movie(x) log [ P_Movie(x) / P_Spont(x) ]

Results II
Both SA and EA_Movie become increasingly correlated; spatial correlations are similar between SA and EA_Movie (the joint pattern distribution P(y) compared against the factorized Π_{i=1..16} P(y_i)).

Results III
Temporal correlations are similar between SA and EA_Movie: the state-transition distributions P(y_{t+Δt} | y_t) match. Control: P(y_{t+Δt} | y_t) = P(y), i.e., no temporal dependence.

Evidence of sampling-based probabilistic representation in the awake cortex: Auditory
Data collection (Shamma lab, University of Maryland)
• Awake, head-restrained ferrets
• One age group (adults)
• Single-electrode, single-unit recording in A1
• Conditions: complete silence (spontaneous activity), white noise (0-20 kHz), natural speech

Stimulus dependence of spike-count correlations
Sampling can reproduce these correlational effects, too.

Evidence against sparseness and independence
• No support for active sparsification or an increase of independence in the cortex.

Relation between MP, mean, variance, and Fano factor
MP variance decreases while both spike-count mean and variance increase; the mean increases more strongly, therefore the Fano factor still decreases.

Standard models of visual recognition
Neural bases: Hubel & Wiesel 1962. System: Ohki et al. 2006.

Example: Cue combination (Atkins, Fiser & Jacobs Vis. Res. 2001; Ernst & Banks Nature 2002)
Is the neural computation deterministic?

Example 2: Decision making (Trommershäuser, Maloney & Landy TICS 2008)

Probabilistic inference and learning are coupled
A fact: if instantaneous perception is a process of probabilistic inference, learning must also be probabilistic.
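The KL measure above is computed over distributions of binarized multi-unit activity patterns. A sketch with synthetic spike patterns (the Bernoulli rates and the smoothing constant are hypothetical; the real analysis used recorded activity):

```python
import numpy as np

rng = np.random.default_rng(3)

def pattern_dist(binary_words, n_units, alpha=1.0):
    """Empirical distribution over the 2**n_units activity patterns,
    with additive (Laplace) smoothing so the KL stays finite."""
    idx = binary_words @ (1 << np.arange(n_units))   # pattern -> integer code
    counts = np.bincount(idx, minlength=2**n_units).astype(float)
    counts += alpha
    return counts / counts.sum()

n_units, n_samples = 8, 20_000
# Synthetic stand-ins for binarized movie-evoked and spontaneous activity,
# drawn from slightly different per-unit firing probabilities.
movie = (rng.random((n_samples, n_units)) < 0.30).astype(int)
spont = (rng.random((n_samples, n_units)) < 0.25).astype(int)

p_movie = pattern_dist(movie, n_units)
p_spont = pattern_dist(spont, n_units)

kl = np.sum(p_movie * np.log(p_movie / p_spont))
print(kl)   # small positive number; 0 would mean identical distributions
```

On real data the same quantity is tracked across age groups and stimulus ensembles, which is what the convergence results report.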
Probabilistic inference and learning

Part 3: Evidence
Probabilistic representation in the cortex: neural?
• Inference (behavioral): Ernst & Banks 2002; Körding & Wolpert 2004; Weiss et al. 2002; Tenenbaum et al. 2006
• Inference (neural): Platt & Glimcher 1999; Yang & Shadlen 2007; Kepecs et al. 2008
• Learning (behavioral): Atkins et al. 2001; Dayan et al. 2000; Orbán et al. 2008
• Learning (neural): ?

Alternative representational schemes
• Parametric: Probabilistic Population Codes (PPCs); predecessor: kernel density estimation.
• Sampling-based: implicit examples: Boltzmann machine, Helmholtz machine; special cases: Olshausen & Field 1996; Karklin & Lewicki 2009.

Practical issues in probabilistic representation
s: stimulus. Parametric (PPC) vs. sampling-based.

Comparing the two schemes (PPC vs. sampling-based)
• Neurons correspond to: parameters vs. variables.
• Network dynamics required (beyond the first layer): deterministic vs. stochastic (self-consistent).
• Representable distributions: must correspond to a particular parametric form vs. can be arbitrary.
• Critical factor in the accuracy of encoding a distribution: number of neurons vs. time allowed for sampling.
• Instantaneous representation of uncertainty: complete (the whole distribution is represented at any time) vs. partial (a sequence of samples is required).
• Neurons needed for representing multimodal distributions: scales exponentially vs. linearly with the number of dimensions.
• Implementation of learning: unknown vs. well-suited.

Logic of the two types of representation
[Figure: in a PPC, many neurons jointly parameterize the distribution of each variable at every instant; in a sampling-based code, each neuron corresponds to one variable and its activity across trials / time traces out the distribution.]
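The exponential-vs-linear scaling claim in the comparison above can be made concrete: a parametric code that grids each dimension needs on the order of k^d basis neurons, while a sampling code needs d neurons read out over time. A toy count (grid resolution and dimensionalities hypothetical):

```python
# Resource count for representing a distribution over d variables.
def ppc_units(d, k=10):
    """Parametric code with a dense grid: one unit per grid point."""
    return k ** d

def sampling_units(d):
    """Sampling code: one unit per variable; time supplies the samples."""
    return d

for d in (1, 2, 3, 6):
    print(d, ppc_units(d), sampling_units(d))
# d = 6 already needs 1,000,000 grid units but only 6 sampling units
```

The sampling code pays instead in time: its accuracy grows with the number of samples collected, which is the trade-off listed in the table.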
Bottom line
Sampling-based probabilistic representational schemes are promising alternatives, but there is no neural evidence supporting the hypothesis that the cortex uses such a scheme. In order to obtain such evidence we take a detour…

Part 4: The role of spontaneous activity in visual coding
• Raichle lab (humans, entire cortex, fMRI): Fox & Raichle Nat. Rev. Neurosci. 2007; Vincent et al. Nature 2007; Fox et al. Nat. Neurosci. 2005
• Grinvald lab (cat, visual cortex, optical imaging): Kenet et al. Nature 2004; Tsodyks et al. Science 1999; Arieli et al. Science 1996
• Buzsáki lab (rat, hippocampus, tetrode recording): Buzsáki & Draguhn Science 2004; Harris et al. Nature 2003; Leinekugel et al. Science 2002
• McCormick lab (ferret, cat, rat slices): Shu et al. Nature 2003; Shu et al. J. Neurosci. 2003; McCormick et al. Cereb. Cortex 2003
• Yuste & Ferster labs (in vivo and in vitro single-cell recording, 2-photon Ca imaging): Ikegaya et al. Science 2004; Anderson et al. Nat. Neurosci. 2000; Cossart et al. Nature 2003

Spontaneous activity in the awake brain (Fiser et al. Nature 2004)
• Awake, head-restrained ferrets
• 3 age groups (P30, P45, P90)
• Multi-electrode recording from V1
• Three interleaved trial types: complete dark (spontaneous activity), random noise stimuli, natural-scene movie
Goal: compare neural activity under the three different conditions ("The MATRIX").
Results:
• A systematic change in the correlational structure with age
• Minimal difference between spontaneous and evoked activities

Framework
Not perceptual learning in the classic sense, but representational learning of the structure of the visual world through detecting "suspicious coincidences" à la Barlow.
• Classical perceptual learning: supervised / reinforcement learning; a signal-detection problem.
• Representational learning: unsupervised learning; a combinatorial problem.
Not higher cognition or concept learning, but visual perception: a plausible method for infants and animals, too.

Experiments with pairs
Inventory of base pairs; test: 2-AFC, base pair vs. non-base pair, with the non-base element in either a novel or a familiar grid position.
Result: subjects automatically extracted both position-dependent and position-independent statistics from the scenes.

Frequency-balanced inventory
Rare vs. frequent base pairs; frequency-balanced non-base pairs.
Results: subjects automatically preferred the rare but more predictable pairs; at the same time, they were also sensitive to single-element appearance frequency.

Embedded inventory: how do we learn hierarchical structures?
Method: two base pairs, two base quadruples, six elements per scene; testing embedded and non-embedded base pairs against non-base pairs.
Embedded results (1st and 2nd round):
• Significant preference for base quadruples and non-embedded pairs
• No significant preference for embedded pairs

Results with balanced appearance of each element: Experiment 1, p < 0.007.
Results with frequency-balanced non-base pairs: Experiment 2, pairs: p < 0.012; Experiment 3, singles: p = n.s.
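A minimal sketch of the statistics subjects could be extracting in the pair experiments: from a set of toy scenes (a hypothetical inventory of letter elements, not the actual shape inventory), count element and pair frequencies, then compare the joint probability and the predictability (conditional probability) of base pairs against chance pairs.

```python
from collections import Counter
from itertools import combinations

# Toy scenes: each scene is a set of elements; 'A'+'B' and 'C'+'D' are
# base pairs that always co-occur, other elements are fillers.
scenes = [
    {"A", "B", "E"}, {"A", "B", "F"}, {"C", "D", "E"},
    {"C", "D", "G"}, {"A", "B", "G"}, {"C", "D", "F"},
]

elem = Counter()
pair = Counter()
for s in scenes:
    elem.update(s)
    pair.update(frozenset(p) for p in combinations(sorted(s), 2))

n = len(scenes)

def joint(a, b):
    # P(a, b): co-occurrence frequency across scenes
    return pair[frozenset((a, b))] / n

def cond(a, b):
    # P(b | a): how well a predicts b
    return pair[frozenset((a, b))] / elem[a]

print(joint("A", "B"), cond("A", "B"))   # base pair: 0.5 1.0
print(joint("A", "E"), cond("A", "E"))   # chance pair: lower on both counts
```

Frequency-counting models track only the joint counts; the frequency-balanced results above are the evidence that subjects are sensitive to predictability as well.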
Model performances I: basic experiments with pairs
Frequency-counting models fail.

Model performances II: embedded experiment with triplets
All three naïve statistical models fail.

Summary
Correct prediction without any parameter tweaking.

Part 3: The capacity of working memory
External world → visual stimulus → working memory ↔ long-term memory.
What is the effect of long-term memory on working memory? Now we have a tool to investigate this!

Issues with the capacity of WM
• WM is defined as a limited-capacity memory buffer.
• The capacity of WM is traditionally given as the maximum number of 'chunks' stored in WM.
• There is a controversy around the capacity of WM: 4 or 2 or 1? (Luck & Vogel Nature 1997; Olsson & Poom PNAS 2005)

Idea: long-term memory is used to compress data in WM
External world → visual stimulus → encoding signal → working memory, with an LTM-dependent capacity.
Learning yields a predictive distribution, and via the source coding theorem a description length (DL):
• Likely stimulus, encoding signal: b a a
• Unlikely stimulus, encoding signal: a a b d b a c

Experimental paradigm
Familiarization, then a change-detection task and a 2-AFC task.
Learning the predictive distribution → compression → description length; this allows assessing learning efficiency.
Key idea: modulating DL by building displays from different chunks.

Overall performance of humans
True inventory vs. naïve observers, on the 2-AFC and change-detection tasks.
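The description-length idea above can be sketched via the source-coding bound: under a learned predictive distribution over chunks, a stimulus built from likely chunks gets a shorter code, summing −log2 p over its chunks. The chunk inventory and probabilities here are hypothetical.

```python
import math

# Hypothetical learned predictive distribution over chunks.
p_chunk = {"AB": 0.4, "CD": 0.4, "E": 0.1, "F": 0.1}

def description_length(stimulus):
    """Shannon code length in bits: sum of -log2 p over the chunks
    the stimulus is parsed into."""
    return sum(-math.log2(p_chunk[c]) for c in stimulus)

likely = ["AB", "CD"]        # built from frequent chunks
unlikely = ["E", "F", "E"]   # built from rare chunks
print(description_length(likely))    # about 2.64 bits
print(description_length(unlikely))  # about 9.97 bits
```

Modulating DL independently of set size, as in the experiments, amounts to choosing displays whose chunk parses differ in total code length while the raw item count stays fixed.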
Overall performance of humans: effect of set size
Learning decouples set size and description length (controlling for set size; controlling for DL).
Performance change of trained subjects relative to naïve subjects on low-, intermediate-, and high-DL trials: conditioning on DL eliminates most of the set-size dependence.

Part 4: The role of spontaneous activity in visual coding
An abstract computational model of representational learning:
• Probabilistic representation of data and uncertainty
• Use of a generative model of the world
A partial model of visual coding in the brain, with open issues:
• Ongoing activity, trial-by-trial response variability
• Context dependency, attention, illusions
Can we reconcile these issues?

A Bayesian generative framework for the visual system
Generative modeling hypothesis: the visual cortex implements approximate inference in a generative model of the visual input. Causes (objects, x) generate pixels (y) through p(x) and p(y | x); learning adjusts the model.
Representation = probability distributions; recognition = inference:

    p(x | y) ∝ p(y | x) p(x)

Sampling hypothesis: neural activity represents Monte Carlo samples from the probability distribution over possible causes, p(x | y). This also accommodates top-down effects.

Confirmation
Similarity measured by the Kullback-Leibler divergence between the distributions of spontaneous and evoked activity:

    KL(P_Movie ‖ P_Spont) = Σ_x P_Movie(x) log [ P_Movie(x) / P_Spont(x) ]

The probabilistic model
Familiarization/learning, then testing by calculating predictive probabilities. Components: graphical model, likelihood, model posterior, choice probability.
Computational frameworks: Sigmoid Belief Network; associative learner.

Embedded pairs: Bayesian model comparison
Compare evidences for inventories. This gives an automatic Occam's razor: a simpler inventory spreads its probability over fewer possible data sets, so it is preferred whenever the experienced data set is among them.

Overall performance of humans: effect of set size (fitting the 2-AFC data).

How about temporal correlations?
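The automatic Occam's razor can be illustrated with a toy evidence comparison; the inventories, probabilities, and data here are hypothetical, and the models' parameters are held fixed rather than integrated over. A structured 'pairs' inventory concentrates probability on pair-built scenes, while an unstructured inventory spreads it over all element combinations.

```python
import math
from itertools import combinations

elements = list("ABCD")
scenes = [frozenset(c) for c in combinations(elements, 2)]   # all 6 pairs

# Model 1 ('pairs' inventory): only scenes AB and CD can occur.
pairs_model = {frozenset("AB"): 0.5, frozenset("CD"): 0.5}
# Model 2 (unstructured): every 2-element scene equally likely.
flat_model = {s: 1.0 / len(scenes) for s in scenes}

observed = [frozenset("AB"), frozenset("CD"), frozenset("AB")]

def log_evidence(model, data):
    """Log likelihood of the data under a fixed inventory model;
    unrepresentable scenes get a negligible floor probability."""
    return sum(math.log(model.get(s, 1e-12)) for s in data)

print(log_evidence(pairs_model, observed))  # log(0.5)*3, about -2.08
print(log_evidence(flat_model, observed))   # log(1/6)*3, about -5.38
```

Because the pairs inventory "wastes" no probability on scenes that never occur, it wins the comparison on pair-structured data, and would lose badly the moment a non-pair scene appeared.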
• Adults can rapidly learn to segment (and group) sequences of shapes based solely on statistical information, i.e., joint and/or transitional probabilities (Fiser & Aslin JEP:LMC 2002).
• Infants can do the same (Fiser, McCrink & Aslin, in preparation).
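The transitional-probability statistic mentioned above can be sketched as follows: in a continuous stream built from hypothetical two-shape 'words' (letters stand in for shapes), P(next | current) is high within a word and low across word boundaries, which is the signal that supports segmentation.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical shape 'words'; the familiarization stream concatenates them.
words = ["AB", "CD", "EF"]
stream = "".join(random.choice(words) for _ in range(500))

bigram = Counter(zip(stream, stream[1:]))   # successive-element counts
unigram = Counter(stream[:-1])              # occurrences with a successor

def tp(a, b):
    """Transitional probability P(b | a) estimated from the stream."""
    return bigram[(a, b)] / unigram[a]

print(tp("A", "B"))   # within-word transition: 1.0
print(tp("B", "C"))   # across a word boundary: about 1/3
```

Segmenting at dips in the transitional probability recovers the word boundaries, the same computation attributed to adults and infants in the studies cited above.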