Neural coding and information geometry Hiroyuki Nakahara Lab for Integrated Theoretical Neuroscience RIKEN Brain Science Institute Outline 1. Overview 2. Hierarchical model of neural interactions Section I Overview Introduction Intelligent behavior (e.g. higher, cognitive, adaptive) Achieved by collective neural activities Decision making and reward-oriented behavior — Neural computations -- “Computations by collective neural activities” Introduction Neural computation – computational neuroscience: At heart, interactions of neural activities 1015 bits/sec (#synapse x 1 bit/sec) Terry Sejnowski Most experimental findings so far -- mostly by approach of “isolation” (e.g., lesion, fMRI and single-unit studies) (Bartels, Zeki 04) How to go beyond the ‘isolation’ approach? Introduction “modeling” – rather abstract form e.g. Amari-Hopfield model, Boltzmann machine, gradient disecent (natural ….), em-algorithm, graphical model, etc “modeling” – neural coding / population coding Tuning curve (spike counts), their variability ri = φi ( x; ci ) + ni ( x) ni ( x) N (0,σ i2 ( x)) Q: About input x, given the activities r ? e.g. Fisher information etc Our past works * Faithful and unfaithful models -- Wu et al. (’01, ’02a,b, ’04) * Attention modulation -- Nakahara et al (’01, ’02) * Singularity in decoding – Amari & Nakahara (’05) * Neural dynamics in SC – Nakahara et al (’06) Introduction “data analysis” “method of data analysis” Spike coding (Neural spike as binary variable) ηi = E[ xi ] Most research with data so far: 1st-order or 2nd-order model log Q pair ( X ) = ∑ θ i xi + ∑ θij xi x j −ψ i i, j How can we reliably detect usefulness and meanings of higher-than-pairwise order interactions of neural activities? Information geometry on binary random vectors Information Geometric Approach Basic needs : Systematic treatment for higher-order interaction Use good properties of the dual coordinates log P ( X ) = ∑ θi xi + ∑ θij xi x j + ∑ θijk xi x j xk + K + θ12Kn x1 x2 K xn −ψ i ij ijk η -coordinates: η = (ηi , ηij , ...) = (η1 ,η2 ,...,ηN ), ηA = E[ X A ] θ -coordinates: θ = (θ i , θ ij ,...) = (θ1 ,θ 2 ,...,θ N ) k-cut mixed coord: ς k = (η1 , η2 ,...,ηk ;θ k +1 ,θ k + 2 ,...,θ N ) = (ηk- ;θ k + ) ⎡ Aζ k FI is given as Gζ k = ⎢ ⎢⎣ O O ⎤ ⎡ Aη ⎥ , where Gη = ⎢ T Dζ k ⎥⎦ ⎣ Bη Bη ⎤ , Dη ⎥⎦ ⎡ Aθ Gθ = ⎢ T ⎣ Bθ Aζ k = Aθ−1 , Pythagorian: D[ P : Q] = D[ P : R] + D[ R : Q] where ς kR = ( ηkP− ; ς Qk + ) Past works Bθ ⎤ Dθ ⎥⎦ Dζ k = Dη−1 * IG framework for spike analysis -- Nakahara & Amari (‘02) * Detection of interaction cascade -- Nakahara et al. (‘03) * Emergent higher-order synchrony -- Amari et al. (‘03) * Temporal domain — Nakahara et al (’06) * Others etc…. Section II “Theory in practice?” Hierarchical Interaction Structure of Neural Activities in Cortical Slice Cultures (Santos, Gireesh, Plenz & Nakahara, J. Neurosci, 2010) Introduction • Previous studies suggested that 2nd-order maximum entropy model (“pairwise model”) adequately explains activity patterns. log Q pair ( X ) = ∑ θ i xi + ∑ θij xi x j −ψ i i, j • If generally true, a significant simplication for describing neural interactions Schneidman et al., 2006 Other works (Shlens et al 06, 09, Tang et al 08 etc) • Certainly, “appropriately simple” models are crucial for improving our understanding of neural interactions and thereby functions. • But is only up to 2nd-order really sufficient for any underlying network systems? cf. for computations as well as scalability • There may be other simplified models that correspond better to an underlying network interaction structure, thereby being adequate for large-scale cortical activity ---- what are appropriate units for interactions? Our dataset (Plenz Lab, NIMH) (hn note) “neural avalanches” power law • Cultures of coronal slices from rat somatosensory cortex and the VTA • LFP signal (1-200 Hz), thresholded to obtain negative LFP (nLFP) peaks • nLFP peaks binned at 4, 10, or 20 ms Pairwise model results Pairwise model on a group of 10 electrodes (randomly-sampled) Good results in 12 datasets (200 groups / dataset) The score called Fpair is frequently used : Fpair = 1 - ( ( Pˆ ( X ) Q ) ( X)) D KL Pˆ ( X ) Q pair ( X ) D KL ind Hierarchical Model: Proposal Our proposal - Define a new functional unit for large-scale activity - Create a hierarchical model for different spatial scales Intuition: low resolution of correlation structure COR of every electrode pair may not be needed Electrodes as units Clusters as units • How to define cluster activities? • How to find clusters? Cluster activity = magnitude of electrode activity in cluster Measures of cluster activity: Linear Examples n lin I = ∑ xi log I = ⎢⎣log 2 (1 + cIlin ) ⎥⎦ c i =1 Log c Binary cIbin = ( cIlin > 0 ) Results shown later, mostly for log cluster activity Clustering criterion: homogeneity of activity A backward searching procedure was used. (note: caveat) Hierarchical Model: Formulation Hierarchical model integrates interactions over different scales m c P ( X ) ≈ Qhier ( X ) = Q pair (C) ∏ I =1 e Q pair ( XI ) e Q pair ( CI ) C = ( c1 , c2 ,K , cm ) • Pairwise models for both local (unit = electrode) & long-range (unit = cluster) interactions, with conditional independence assumption on electrode activity across clusters given cluster activity. Results: clusters Clusters - example array: Homogeneous clusters characterized by strong correlation and electrode proximity within cluster across clusters • Physical distance on array • Correlation coefficients Results: 10-electrode – comparison with pairwise model One example of a 10 electrode group Cluster information used for 10-electrode groups Example results for 1 group (Fpair = 0.89, Fhier = 0.93) Comparison with pairwise model (10-electrode groups) Hierarchical model has better accuracy than pairwise model, even with fewer parameters Summary (12 datasets, 200 groups / dataset) Accuracy and # of param (over 12 datasets) Note: also confirmed with the results using cross-validation also confirmed w.r.t. ‘shuffled’ clusters Results: accuracy on full array (~ 60 electrodes) Hierarchical model with log clusters predicts array-wide synchronized states Summary Example (one dataset; for a ‘fraction’) P(f) (with log – colored, binary -- gray) Until here is in Santos et al., J. Neurosci, 2010 Ongoing results: Getting back to interactions at electrode level m c P ( X ) ≈ Qhier ( X ) = Q pair (C) ∏ I =1 e Q pair ( XI ) e Q pair ( CI ) C = ( c1 , c2 ,K , cm ) Example Original param. e Q pair (xα ) :{θiα , θijα } c Q pair (c) :{ζ α |a , ζ α ( β )α ′( β )|ab } θ − coordinates (electrode level) Q (xα ) :{θ% , θ% , θ% ,..., θ% } hier i ij ijk 123...n (Note: possible to write analytically) Discussion • The hierarchical model captures two levels of interactions using two units: electrodes and clusters. • The model captures cortical LFP activity patterns better than the pairwise model. Æ clusters -- underlying cortical layer structure? Æ Identifying the appropriate units of interaction of a network may enable the network interactions to be better characterized, with better pasimony and scalability • Significant higher-order interactions are embedded in a specific way. A new hierarchical model further helps us examine those properties. cf. across/within cluster interactions Follow-up (Q&A session) • I make a note here regarding “context specific graphical model” and/or “a weak definition of conditional independence”, which I mentioned in Q & A session. Here is the paper I was referring to; Nakahara et al. (2003) Bioinformatics. (you can also download it from www.itn.brain.riken.jp). Any comments and feedbacks will be greatly appreciated. • Taking advantage of adding this slide after my presentation, I would like to express my sincere gratitude to all the particiants in the workshop and the organizers who have all of us get together. END