Computational and Physiological Models, Part 1
Ekaterina Lomakina
Computational Psychiatry Seminar: Computational Neuropharmacology
7 March, 2014

Entropy and the theory of life
“The general struggle for existence of animate beings is not a struggle for raw materials – these, for organisms, are air, water and soil, all abundantly available – nor for energy which exists in plenty in any body in the form of heat, but a struggle for [negative] entropy, which becomes available through the transition of energy from the hot sun to the cold earth.”
Ludwig Boltzmann, 1875
“[...] if I had been catering for them [physicists] alone I should have let the discussion turn on free energy instead. It is the more familiar notion in this context. But this highly technical term seemed linguistically too near to energy for making the average reader alive to the contrast between the two things.”
Erwin Schrödinger, 1944

Let’s go a bit more formally – 1
Minimize entropy → Minimize surprise
• The defining characteristic of biological systems is that they maintain their states and form in the face of a constantly changing environment.
• Mathematically, this means that the probability distribution over the sensory states in which a biological agent can be found must have low entropy.
• Entropy is also the average self-information, or surprise. Low entropy means that the agent is likely to be in one of very few states, so observing any of these states causes little surprise (both emotionally and mathematically).
• To stay ‘alive’, biological agents must therefore minimize their long-term average surprise, which ensures that their sensory entropy remains low.

Let’s go a bit more formally – 2
• So the system must avoid surprise. But how? It does not know everything that is going to happen…
• The system can only evaluate what is available to it: its sensory experience and its own properties, i.e. energy efficiency and robustness.
• That’s where free energy comes into play.
• Thermodynamic (Helmholtz) free energy is the work obtainable from a closed system at constant temperature: FE = Energy − Temperature × Entropy.
• Statistical free energy is the negative sum of the predictive accuracy of the model (the energy of the model) and the entropy of the model.
• As we show next, free energy can be seen as an upper bound on surprise which, unlike surprise itself, the system can optimize.
Minimize entropy → Minimize surprise → Minimize free energy

And a bit more mathematically – 1
Let’s look at the log version of Bayes’ formula, where s(t) are the sensory states and ϑ are the representational parameters of the agent:
[Equation, with one term labelled “Surprise!”]
If we knew the true generative model of the sensorium, we could compute the model evidence, or equivalently its average surprise. This is sometimes also called the evidence for the agent’s own existence.

And a bit more mathematically – 2
• However, the true model is almost always unavailable, or at least computationally expensive to evaluate, so the brain is unlikely to be able to perform such an operation.
• Instead, one can propose a simpler model q and optimize it to be as similar to the true model as possible.
• D is the KL divergence between the proposed model q and the true model p, which is always non-negative.
• The smaller D becomes (the closer q gets to p), the lower the free energy F becomes and the closer F gets to the negative log model evidence. Thus F is an upper bound on surprise.

And a bit more mathematically – 3
• However, we cannot optimize D directly, as it requires knowledge of the true model p. But after some magic, F can be written as a sum of two terms:
[Equation, with terms labelled “Expected energy (accuracy)” and “Entropy (complexity)”]
• This can now be optimized efficiently using Variational Bayes techniques.
• They (usually) provide fast and efficient update and decision rules that are likely to be implementable in the brain, in contrast to sampling or numerical integration.
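The equations on the three slides above did not survive extraction. The following is a hedged reconstruction of the standard steps they most likely showed, not a copy of the original formulas. The model index m and the notation q(ϑ) for the proposed distribution are assumptions; s and ϑ are as defined on the slides. The last line gives the alternative complexity/accuracy split that the later slides refer to.

```latex
\begin{align}
% Log form of Bayes' rule; the left-hand side is the surprise
-\ln p(s \mid m) &= -\ln p(s, \vartheta \mid m) + \ln p(\vartheta \mid s, m) \\
% Free energy = surprise + divergence D, hence an upper bound on surprise
F(s, q) &= -\ln p(s \mid m)
  + D_{\mathrm{KL}}\big[\, q(\vartheta) \,\|\, p(\vartheta \mid s, m) \,\big]
  \;\ge\; -\ln p(s \mid m) \\
% The same quantity written without the unknown posterior
F(s, q) &= \underbrace{\mathbb{E}_{q}\big[ -\ln p(s, \vartheta \mid m) \big]}_{\text{expected energy}}
  \;-\; \underbrace{H\big[ q(\vartheta) \big]}_{\text{entropy}} \\
% Equivalent split into complexity and accuracy
F(s, q) &= \underbrace{D_{\mathrm{KL}}\big[\, q(\vartheta) \,\|\, p(\vartheta \mid m) \,\big]}_{\text{complexity}}
  \;-\; \underbrace{\mathbb{E}_{q}\big[ \ln p(s \mid \vartheta, m) \big]}_{\text{accuracy}}
\end{align}
```

The second line follows from the first by taking the expectation under q and adding and subtracting E_q[ln q(ϑ)]. The third and fourth lines contain only q and the generative model, which is why they can be optimized without knowing the true posterior.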
And one last bit of formulas
Free energy can be minimized in two ways:
• By changing the mental representation (optimizing it so that it explains the data better and becomes more compact).
• By performing actions that reduce potential surprise.

Bayesian brain hypothesis
[Diagram: sensations x update beliefs through probabilistic causality; p(ϑ|x) is the posterior belief about causes.]
• The Bayesian brain hypothesis uses Bayesian probability theory to formulate perception as a constructive process based on internal or generative models; the brain is viewed as an inference machine that actively predicts and explains its sensations.
• A probabilistic model generates predictions, against which sensory samples are tested to update beliefs about their causes.
• The brain is an inference engine that tries to optimize probabilistic representations of what caused its sensory input.
• This optimization can be finessed using a (variational free-energy) bound on surprise.
Complexity: p(ϑ), prior beliefs
Accuracy: p(x|ϑ), generative model (likelihood)

Bayesian brain hypothesis
Two key questions in this hypothesis are:
1. How to choose the form of the generative model and the prior beliefs? Answer: use hierarchical models in which the priors themselves are optimized.
2. How to choose the form of the proposed model or distribution q? Answer: it can take any form, and this choice determines the optimization procedure, but the simplest assumption is that it is Gaussian (the Laplace approximation). Minimizing free energy then simply amounts to explaining away prediction error.

Bayesian brain hypothesis
• Under the Gaussian assumption, the scheme is known as predictive coding.
• It is a popular framework for understanding neuronal message passing among different levels of cortical hierarchies.
• This scheme has been used to explain many features of early visual responses and can plausibly explain repetition suppression and mismatch responses in electrophysiology.
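Under the Gaussian assumption described above, minimizing free energy reduces to suppressing precision-weighted prediction errors. The sketch below illustrates this for a single hidden cause ϑ with a Gaussian prior and an identity mapping to the sensation x. The function names, default variances and step size are illustrative assumptions, not taken from the slides, and a linear model is chosen only so that the result can be checked against the exact posterior mean.

```python
def prediction_errors(phi, x, mu_prior, var_prior, var_x):
    """Precision-weighted prediction errors for the current estimate phi."""
    eps_x = (x - phi) / var_x              # sensory prediction error
    eps_p = (phi - mu_prior) / var_prior   # deviation from the prior expectation
    return eps_x, eps_p


def infer_cause(x, mu_prior=0.0, var_prior=1.0, var_x=1.0, lr=0.1, n_steps=200):
    """Gradient descent on the free energy of a linear Gaussian model.

    Up to a constant, F(phi) = 0.5 * ((x - phi)**2 / var_x
                                      + (phi - mu_prior)**2 / var_prior),
    so dF/dphi = -eps_x + eps_p.
    """
    phi = mu_prior  # start from the prior expectation
    for _ in range(n_steps):
        eps_x, eps_p = prediction_errors(phi, x, mu_prior, var_prior, var_x)
        phi += lr * (eps_x - eps_p)  # move downhill on F
    return phi


def exact_posterior_mean(x, mu_prior=0.0, var_prior=1.0, var_x=1.0):
    """Closed-form posterior mean: a precision-weighted average of prior and data."""
    weight = (1.0 / var_x) / (1.0 / var_x + 1.0 / var_prior)
    return mu_prior + weight * (x - mu_prior)


estimate = infer_cause(2.0, var_prior=2.0, var_x=0.5)
reference = exact_posterior_mean(2.0, var_prior=2.0, var_x=0.5)
```

In this toy case the iterative scheme settles on the same value as the closed-form posterior mean, and the estimate moves further toward the sensation when the sensory precision is high relative to the prior precision.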
The principle of efficient coding
[Diagram labels: representing variables, efficient coding, sensory states]
• The principle of efficient coding suggests that the brain optimizes the mutual information (that is, the mutual predictability) between the sensory states and its internal representation, under constraints on the efficiency of those representations.
• The infomax principle says that neuronal activity should encode sensory information in an efficient and parsimonious fashion.

The principle of efficient coding
• The infomax principle can be presented as a special case of the free-energy principle, which arises when we ignore uncertainty in probabilistic representations.
• The infomax principle can be understood in terms of the decomposition of free energy into complexity and accuracy: mutual information is optimized when conditional expectations maximize accuracy (or minimize prediction error), and efficiency is assured by minimizing complexity.
Complexity: coding length
Accuracy: mutual information

The cell assembly theory
• ‘Cells that fire together wire together’.
• Conditional expectations about states of the world are encoded by synaptic activity.
• Learning under the free-energy principle is the optimization of the connection strengths in hierarchical models of the sensory states.
• It appears that a gradient descent on free energy is formally identical to Hebbian plasticity.

The cell assembly theory
• When predictions and prediction errors are highly correlated, the connection strength increases, so that predictions can suppress prediction errors more efficiently.
• The synaptic gain of prediction error units is modulated by the precision of those units.
• The most obvious candidates for controlling gain are classical neuromodulators like dopamine and acetylcholine.
Neural Darwinism
[Diagram: epigenetic mechanisms → primary repertoire of neuronal connections → experience-dependent plasticity → secondary repertoire of neuronal connections]
• This theory focuses on how selection and reinforcement of action policies are performed within the boundaries of cell assembly theory.
• Only neuronal assemblies that increase evolutionary value are reinforced through interaction with the environment (natural selection). Plasticity is thus modulated by value.
• Neuronal value systems reinforce connections to themselves, thereby enabling the brain to label a sensory state as valuable if, and only if, it leads to another valuable state.
• This theory has deep connections with reinforcement learning and related approaches in engineering, such as dynamic programming and temporal difference models.

Neural Darwinism
• Value is inversely proportional to surprise: the probability of an agent being in a particular state increases with the value of that state.
• The evolutionary value of an agent is the negative surprise averaged over all the states it experiences, which is simply its negative entropy.
• Prior expectations (that is, the primary repertoire) can prescribe a small number of attractive states with innate value. They can also affect the way the world is sampled, i.e. force the agent to explore until states with innate value are found.
• Neural Darwinism uses selective processes to explain brain evolution.
• The free-energy formulation considers the optimization of ensemble or population dynamics in terms of entropy and surprise.
Complexity: priors on a small number of innate ‘good’ states, plus priors on exploration
Accuracy: ‘survival’, or rate of change of value

Optimal control theory
• Optimal control theory describes how optimal actions should be selected to optimize expected cost.
• Free energy is an upper bound on expected cost.
• According to the principle of optimality, cost is the rate of change of value, which depends on changes in sensory states.
• An optimized policy ensures that the next state is the most valuable of the available states.
• Priors specify a small number of fixed-point attractors; when the states arrive at a fixed point, value stops changing and cost is minimized. Additional priors on motion through state space enforce exploration until an attractive state is found.
• Action under the free-energy principle is meant to suppress sensory prediction errors that depend on predicted (expected or desired) movement trajectories.
Complexity: priors on a small number of fixed-point attractors at which the system should arrive, plus priors on motion
Accuracy: rate of change of value

Overview of the free-energy principle
• Many global theories of brain function can be united under the free-energy principle.
• The commonality is that the brain optimizes a (free-energy) bound on surprise or its complement, value.
• This manifests as perception (so as to change predictions) or action (so as to change the sensations that are predicted).
• Crucially, these predictions depend on prior expectations (that furnish policies), which are optimized at different (somatic and evolutionary) timescales and define what is valuable.

Dopamine as a prediction error encoder
• Measurements show that dopamine responds sensitively to the reward itself or, once the association has been learnt, to the conditional stimuli predicting reward.
• However, the dopamine level decreases when a predicted reward fails to arrive.
• This gave rise to the hypothesis that dopamine encodes prediction error.
• Optimal control theory might provide a framework to explain this.

Temporal difference algorithm
• The computational goal of learning is to maximize the expected discounted reward V (the value function, related to negative surprise).
• It can be computed incrementally from the rewards at previous time points.
• The prediction error δ (TD error) can be defined as a linear combination of the reward and the change in value V (and thus in surprise).
• M1 and M2 are two cortical modalities whose input arrives at the VTA as a temporal derivative of the value V(t).
• The reward r(t) also converges on the VTA.
• The VTA output, taken as a simple linear sum, is the prediction error that informs the structures constructing the prediction.

Representing a stimulus through time
• To allow prediction over time, the model treats the same event at different time points as different stimuli.
• The predicted value is then modeled as a weighted linear combination of the sensory input.
• The weights are updated using the prediction error.

Simulations
[Figure: the prediction of the model.]
• The conditional stimuli were presented at time steps 10 and 20, followed by reward at time step 60.
• Absence of reward on one of the intermediate trials causes a large negative fluctuation of the prediction error.
• Overall, however, the model learns the dependency between conditional stimulus and reward well, while also blocking the response to the secondary stimulus, which carries no additional information.
• The behavior of the prediction error accurately mimics the dopamine response measured in monkeys in a similar situation.

Hierarchical Gaussian filtering
[Diagram labels: input, perception, volatility, probability, learning]
• One particular model within the Bayesian brain hypothesis is hierarchical Gaussian filtering.
• It consists of a hierarchical Bayesian model of learning through perception.
• The response model can deal with states and inputs that are discrete or continuous, uni- or multivariate, as well as with deterministic and probabilistic relationships between environment and perception.
• Parameters of the model can account for individual differences between agents and can be used to simulate and explain maladaptive behavior.
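Returning briefly to the temporal difference model of dopamine from the preceding slides: the sketch below is a minimal illustration of how a TD error can move from the time of reward to the time of the predictive stimulus. It is a simplification under stated assumptions, not the simulation shown on the slides. It uses a single conditional stimulus at time step 10 and a reward at time step 60 (both from the slides), while the trial length, learning rate, number of trials, the absence of discounting and all function names are assumptions.

```python
import numpy as np

T = 80            # time steps per trial (assumption)
CS_ONSET = 10     # stimulus onset (from the slides)
REWARD_TIME = 60  # reward delivery (from the slides)


def stimulus_features(t):
    """One-hot code of time since stimulus onset: each delay is its own stimulus."""
    x = np.zeros(T)
    if t >= CS_ONSET:
        x[t - CS_ONSET] = 1.0
    return x


def run_trial(w, rewarded=True, lr=0.5, learn=True):
    """Run one trial and return the TD error at every time step.

    The value is a weighted linear combination of the input, V(t) = w @ x(t),
    and the TD error is the reward plus the change in value (no discounting).
    """
    delta = np.zeros(T)
    for t in range(1, T):
        r = 1.0 if (rewarded and t == REWARD_TIME) else 0.0
        x_prev, x_now = stimulus_features(t - 1), stimulus_features(t)
        delta[t] = r + w @ x_now - w @ x_prev
        if learn:
            w += lr * delta[t] * x_prev  # in-place update of the weights
    return delta


def simulate(n_trials=300):
    """Return TD errors on the first trial, after training, and on an omission trial."""
    w = np.zeros(T)
    first = run_trial(w)
    for _ in range(n_trials):
        run_trial(w)
    trained = run_trial(w, learn=False)
    omitted = run_trial(w, rewarded=False, learn=False)
    return first, trained, omitted


first, trained, omitted = simulate()
```

On the first trial the error appears at the time of reward. After training it appears at stimulus onset instead, and omitting the reward produces a negative error at the time the reward was expected, which is the qualitative pattern the slides attribute to dopamine.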
Hierarchical Gaussian filtering
• This is the simplest example (with a univariate, binary, deterministic response model).
• Each layer of the model performs a Gaussian random walk, with parameters governing the coupling between layers and the width of the random walks.
• As many layers as needed can be added on top.
• Priors on the parameters χ = {κ, ω, ϑ} allow full Bayesian inference.
• Inverting this model corresponds to optimizing the posterior densities over the unknown (hidden) states x = {x1, x2, x3} and the parameters χ. This corresponds to perceptual inference and learning, respectively.

Inversion under the free-energy principle
• Exact inference in such a model would involve expensive numerical optimization.
• Instead, a variational Bayesian approach was used (which involves minimization of free energy).
• The key assumption is a factorization of the recognition distribution q (the mean-field approximation).

Update equations
The resulting update equations are not only efficient and easy to compute but also resemble results from the field of reinforcement learning.
Rescorla-Wagner model: prediction(k) − prediction(k−1) = learning rate × prediction error

Precision updates
• The variances σ1, σ2 and σ3 are also updated at every step, in the form of precisions (inverse variances).
• The precision updates account for two types of uncertainty: ‘informational’ (the lack of knowledge about x2) and ‘environmental’ (the volatility on the third level).
• It has been proposed that dopamine might encode the value of prediction error, i.e., the precision-weighting of prediction errors. This is encoded by the parameters κ and ω.
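The resemblance between the update equations and the Rescorla-Wagner rule can be made concrete with a small sketch. The first function is the Rescorla-Wagner rule with a fixed learning rate. The second is a deliberately reduced two-level filter for binary inputs in which the learning rate is the posterior variance of the second level. It assumes no third level (equivalent to κ = 0), so the random-walk step variance is fixed at exp(ω); this is a simplification for illustration, not the full three-level model from the slides. Function names, starting values and the input sequence are assumptions; ω = −2.2 is the reference value quoted in the slides.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rescorla_wagner(inputs, alpha=0.1, v0=0.5):
    """Fixed learning rate: new prediction = old prediction + alpha * error."""
    v, trace = v0, []
    for u in inputs:
        v = v + alpha * (u - v)
        trace.append(v)
    return trace


def two_level_filter(inputs, omega=-2.2, mu2=0.0, sigma2=1.0):
    """Reduced binary filter with a precision-weighted prediction error.

    Level 2 performs a Gaussian random walk with fixed step variance
    exp(omega); the third (volatility) level is left out in this sketch.
    Returns a list of (predicted probability, posterior variance) pairs.
    """
    trace = []
    for u in inputs:
        mu1_hat = sigmoid(mu2)                       # predicted probability that u = 1
        delta1 = u - mu1_hat                         # prediction error
        pi2_hat = 1.0 / (sigma2 + math.exp(omega))   # predicted precision
        pi2 = pi2_hat + mu1_hat * (1.0 - mu1_hat)    # posterior precision
        sigma2 = 1.0 / pi2                           # posterior variance
        mu2 = mu2 + sigma2 * delta1                  # learning rate = sigma2
        trace.append((sigmoid(mu2), sigma2))
    return trace


rw_trace = rescorla_wagner([1] * 100)
hgf_trace = two_level_filter([1] * 100)
```

The design point is that the second function has the same form as the first, but its learning rate is no longer a free constant: it is recomputed on every trial from the current uncertainty, which is where the precision-weighting discussed above enters.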
Meaning and the effect of the parameters
Reference scenario: ϑ = 0.5, ω = −2.2, κ = 1.4

Meaning and the effect of the parameters
Reduced ϑ = 0.05 (unchanged ω = −2.2, κ = 1.4)

Meaning and the effect of the parameters
Reduced ω = −4 (unchanged ϑ = 0.5, κ = 1.4)

Meaning and the effect of the parameters
Reduced κ = 0.2 (unchanged ϑ = 0.5, ω = −2.2)

Potential for real data
Conventional observed behavioural data fail to detect differences between subjects.
However, the computational model reveals striking differences in the underlying mechanisms.
[Figure: per-subject behavioural measures over trials: fraction of correct responses (running average), trial-wise reward score (running average) and reaction time.]
[Figure: inferred volatility and probability trajectories over trials, for a healthy participant and for a participant with prodromal schizophrenia.]

Outcome
• Using the free-energy principle, we derive fast and efficient update equations that could be implemented by the brain.
• The resulting update equations show a clear connection to the field of reinforcement learning.
• The parameterization of these equations accounts for individual differences and can model a wide variety of maladaptive behavior.
• There is evidence that some of the parameters may correspond to neuromodulators, in particular dopamine.

Conclusions
• Computational models inspired by expert knowledge about the brain can provide powerful insights into how the brain works in healthy and maladaptive ways.
• The free-energy principle provides a powerful framework that generalizes many of the existing theories of brain function.
• However, “all models are wrong but some are useful”. To establish how good a model is, we have to see whether it predicts real data (more next week).
• Certain parameters of the models can be predicted to correspond to neuromodulators. For example, dopamine seems to play a key role in reward-driven learning. Studies combining pharmacological manipulations with computational models can provide deeper mechanistic insight into its particular role.