KTH/CSC

The max-entropy fallacy

Erik Aurell, KTH & Aalto U
International Mini-workshop on Collective Dynamics in Information Systems 2014
Beijing, October 13, 2014
Kavli Institute for Theoretical Physics China (KITPC)

Loosely based on
G. Del Ferraro & E.A., J. Phys. Soc. Japan 83 084001 (2014)
C. Feinauer, M. Skwark, A. Pagnani & E.A., PLoS Comp Biol 10 e1003847 (2014)

What entropy?
By entropy I will mean the Shannon entropy of a probability distribution:
$S[p] = -\sum_i p_i \log p_i$

What maximization?
Maximizing $S[p]$ subject to the constraint
$E = \sum_i p_i E_i = \text{const.}$
gives
$p_i = \frac{1}{Z} e^{-\beta E_i}$

What max-entropy?
The idea that probability distributions other than those of equilibrium statistical mechanics can be derived by maximizing entropy given suitable constraints.

Two reasons to give this talk

Max-entropy: E. T. Jaynes proposed in 1957 that both equilibrium and non-equilibrium statistical mechanics be based upon this criterion.
[...] the probability distribution over microscopic states which has maximum entropy, subject to whatever is known, provides the most unbiased representation of our knowledge of the system.
E. T. Jaynes, “Information Theory and Statistical Mechanics II”, Physical Review 108 171-190 (1957)

Max-entropy inference: in the last decade considerable attention has been given to learning pairwise interaction models from data, motivated by max-entropy arguments. This research is highly interesting, but does it support max-entropy?

Why oppose max-entropy?

Are probabilities in Physics objective or subjective?
”[...]one must recognize the fact that probability theory has developed in two very different directions as regards fundamental notions.”
”[..] the ’objective’ school of thought ..”
”[..] the ’subjective’ school of thought... ”
”[...]
the probability of an event is merely a formal expression of our expectation that the event did or will occur, based on whatever information is available”
E. T. Jaynes, “Information Theory and Statistical Mechanics I”, Physical Review 106 620-630 (1957)

(1) is it a practical method to study non-equilibrium processes (say, on graphs)?
(2) does it give the right answers in principle?
(3) is it necessary to explain the recent successes in inference?
(4) is max-entropy inference a scientific methodology, e.g. in the sense of Popper?

(1) is it practical?

We consider continuous-time dynamics on graphs. The dynamics could be driven out of equilibrium, or relaxing towards equilibrium.

True distribution [master equation]
$\frac{d}{dt} P(s_1, s_2, \ldots) = \sum_j \left[ w_j(F_j(\mathbf{s}))\, P(F_j(\mathbf{s})) - w_j(\mathbf{s})\, P(\mathbf{s}) \right]$

Observables in the sense of max-entropy
$\Omega_1 = \frac{1}{N} O_1(\mathbf{s}), \quad \Omega_2 = \frac{1}{N} O_2(\mathbf{s}), \quad \Omega_3 = \frac{1}{N} O_3(\mathbf{s}), \ldots$

Auxiliary maximum entropy distribution

This is a dimensional reduction

Dynamics of the observables according to the master equation
$\frac{d\Omega_l^{(T)}}{dt} = \frac{1}{N} \sum_j \left\langle w_j(\mathbf{s}) \left[ O_l(F_j(\mathbf{s})) - O_l(\mathbf{s}) \right] \right\rangle$

Dynamics of the observables according to the auxiliary distribution
$\frac{d\Omega_l^{(M)}}{dt} = \frac{1}{N} \sum_k \dot{\beta}_k \left[ \langle O_l O_k \rangle - \langle O_l \rangle \langle O_k \rangle \right]$

If the auxiliary distribution is a good model, both ways of computing the dynamics must agree. In this way the changes of the β’s can be computed and the master equation reduced to a (complicated) finite-dimensional ODE. The averages have to be computed by the cavity method (or something else).

[Figure panels: graph of dynamics; overlaid possible terms; factor graph of auxiliary model]

The approach is not new.
But not (much) tested on single graphs

Simplest non-trivial case: the 1D Ising spin chain
Simple ferromagnetic Hamiltonian
Obeys detailed balance
Essentially solved 51 years ago
Roy J. Glauber, “Time-dependent statistics of the Ising model”, Journal of Mathematical Physics 4:294 (1963)

Simplest max-entropy theory built on observing magnetization and energy
Already this is not totally trivial to do… because of the averages…

…plus, in every time-step, an implicit three-variable equation

Change from master equation computed by cavity
Solving equations by Newton
…works reasonably well…

[Figure panels: energy vs time; difference to the Glauber theory, in energy vs time]

Joint spin-field distribution…

A. C. C. Coolen, S. N. Laughton, and D. Sherrington, Physical Review B 53:8184 (1996)
In principle similar, but needs three cavity fields and solving an eight-dimensional implicit equation at every step…
…and works better, though not perfectly.
The longer the range in the auxiliary distribution, the more complicated the cavity calculation and equation.

[Figure: difference to the Glauber theory]

Internal consistency check…

Consider again the two ways of computing the changes of observables:
$\frac{d\Omega_l^{(T)}}{dt} = \frac{1}{N} \sum_j \left\langle w_j(\mathbf{s}) \left[ O_l(F_j(\mathbf{s})) - O_l(\mathbf{s}) \right] \right\rangle$
$\frac{d\Omega_l^{(M)}}{dt} = \frac{1}{N} \sum_k \dot{\beta}_k \left[ \langle O_l O_k \rangle - \langle O_l \rangle \langle O_k \rangle \right]$

They work also if $O_l$ is not in the theory. But then they do not have to agree, and the discrepancy between the two sides is an internal consistency check.
Simplest tests are for longer-range pairwise correlations:
$c_{i,i+k} = \langle s_i s_{i+k} \rangle$

[Figure panels: magnetization-energy theory; joint spin-field distribution theory]

Answer to: (1) is it practical?

No. It is complicated to implement, even in a simple 1D model of a dynamics relaxing towards equilibrium.
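For orientation, the reference dynamics that the max-entropy closure is compared against can be simulated directly. The following is a minimal sketch, not the code behind the slides: it assumes heat-bath Glauber flip probabilities ½[1 − s_j tanh(βJ(s_{j−1} + s_{j+1}))], periodic boundaries and random-sequential updates, and the function and parameter names (`glauber_chain`, `n_spins`, `coupling`, ...) are hypothetical. It records the two observables of the simplest theory, magnetization and energy per spin.

```python
import numpy as np

def glauber_chain(n_spins=2000, beta=0.5, coupling=1.0, n_sweeps=50, seed=0):
    """Random-sequential Glauber dynamics on a periodic 1D Ising chain.

    Returns magnetization and energy per spin after each sweep
    (one sweep = n_spins single-spin update attempts, i.e. one unit of time).
    """
    rng = np.random.default_rng(seed)
    s = rng.choice(np.array([-1, 1]), size=n_spins)  # random initial condition
    mags, energies = [], []
    for _ in range(n_sweeps):
        sites = rng.integers(0, n_spins, size=n_spins)
        unif = rng.random(n_spins)
        for j, u in zip(sites, unif):
            # local field from the two neighbours (s[-1] wraps around for j = 0)
            field = coupling * (s[j - 1] + s[(j + 1) % n_spins])
            flip_prob = 0.5 * (1.0 - s[j] * np.tanh(beta * field))
            if u < flip_prob:
                s[j] = -s[j]
        mags.append(s.mean())
        energies.append(-coupling * np.mean(s * np.roll(s, 1)))
    return np.array(mags), np.array(energies)

mags, energies = glauber_chain()
# At long times the energy per spin should approach the equilibrium value
# -J tanh(beta J) of the infinite chain, up to finite-size fluctuations.
print(energies[-1], -np.tanh(0.5))
```

A max-entropy closure would replace this 2000-spin simulation by an ODE for a handful of β's; the simulation (or Glauber's exact solution) is what such a closure has to reproduce.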
One can consider successive approximations with longer “interactions”, but the complexity grows very quickly.

Which brings us to the next question: (2) does max-entropy give right answers outside equilibrium?

[Figure labels: δQ forward: ε+Lθ; δQ backward: ε]

Traditionally there were no exact (relevant) results

If max-entropy is relevant for non-equilibrium then the probability distributions should, like the Gibbs-Boltzmann distribution, be exponential.
$\Pr_N(X) \sim e^{-N f(X)}$ [Gibbs-Boltzmann distribution]
$\Pr_{N,t}(X) \sim e^{-N V[\ldots] + \text{sub-leading}}$ [putative non-equilibrium distribution]

It has now been known for 10-15 years that this is the case, but the functional V is very non-trivial.

SSEP
$B(\rho, r) = (1-\rho) \log\frac{1-\rho}{1-r} + \rho \log\frac{\rho}{r}$
$V = \int_0^1 dx \left[ B(\rho(x), F(x)) + \log\frac{F'(x)}{\rho_b - \rho_a} \right]$
$\rho = F + \frac{F(1-F)F''}{F'^2}$
B. Derrida, J. Stat. Mech. (2007) P07023

Recently extended to multidimensional systems, for the related question of fluctuations of the current.
E. Akkermans et al, EPL 103 20001 (2013)

Answer to: (2) does max-entropy give right answers outside equilibrium?

No, because there is no way that a complicated long-range effective interaction potential can be deduced from maximizing entropy and a limited number of simple constraints.

For the experts: both systems relaxing to equilibrium, such as the Ising spin chain, and the SSEP (and other such solved models) are covered by the macroscopic fluctuation theory of Jona-Lasinio and co-workers. But only SSEP-like systems lead to long-range effective interactions. For the relaxing Ising spin chain the max-entropy approach should hence probably eventually work, though it will remain computationally cumbersome.

(3) is it necessary to explain the recent successes in inference?

The main success is contact prediction in proteins.
Folding proteins in silico is hard, and not a solved problem – unless you have an already solved structure as template.
Predicting which amino acids are in contact in a structure can be done from co-variation in similar proteins.

Relation between positional correlation and structure known for 20 years

Neher (1994)
Göbel, Sander, Schneider, Valencia (1994)
Lapedes et al 2001
Weigt et al PNAS 2009
Burger & van Nimwegen 2010
Balakrishnan et al 2011
Morcos et al PNAS 2011
Hopf et al Cell 2012
Jones et al Bioinformatics 2012
Ekeberg et al Phys Rev E 2013
Skwark et al Bioinformatics 2013
Kamisetty et al PNAS 2013

[Figure: graph with nodes X1 to X7]

The recent success is to learn a Potts model from data

”To disentangle direct and indirect couplings, we aim at inferring a statistical model P(A1, ...,AL) for entire protein sequences (A1, ...,AL) […] aim at the most general, least-constrained model […] achieved by applying the maximum-entropy principle ”
F. Morcos et al, PNAS 108:E1293–E1301 (2011)

”The prediction method applies a maximum entropy approach to infer evolutionary covariation […]”
T. Hopf et al, Cell 149:1607-21 (2012)

”The maximum-entropy approach to potentially solving the problem of protein structure prediction from residue covariation patterns […]”
D. Marks et al, Nat Biotechnol. 30:1072-80 (2012)

Actually we have all the data

It is a choice to reduce multiple sequence alignments to nucleotide frequencies and correlations for data analysis.
But we start from all the data. The conceptual basis of max-entropy is therefore not there.
Furthermore, the best available methods to learn these Potts models use all the data.
M. Ekeberg et al, Phys. Rev. E (2013)
M. Ekeberg et al, J. Comp. Phys. (2014)
http://plmdca.csc.kth.se/

Learning better models...

Multiple sequence alignments generally have stretches of gaps. These are not generated with high probability from a Potts model.
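To make "learning a Potts model from all the data" concrete: pseudolikelihood-based methods such as plmDCA score each full sequence through the conditional probability of every position given the rest of the sequence, rather than through frequencies and correlations alone. The sketch below only evaluates that objective for given parameters. It is an illustration under assumptions, not the plmDCA implementation: there is no regularization, sequence reweighting or optimizer, and the names (`neg_log_pseudolikelihood`, `fields`, `couplings`) and the integer encoding of the alignment are hypothetical.

```python
import numpy as np

def neg_log_pseudolikelihood(msa, fields, couplings):
    """Average negative log-pseudolikelihood of an alignment under a Potts model.

    msa:       (B, L) integer array with entries in 0..q-1
    fields:    (L, q) array, h_i(a)
    couplings: (L, L, q, q) array, J_ij(a, b); the i == j blocks are ignored
    """
    n_seqs, length = msa.shape
    q = fields.shape[1]
    J = couplings.copy()
    idx = np.arange(length)
    J[idx, idx] = 0.0                        # a position does not couple to itself
    onehot = np.eye(q)[msa]                  # (B, L, q)
    # logits[n, i, a] = h_i(a) + sum_{j != i} J_ij(a, msa[n, j])
    logits = fields[None, :, :] + np.einsum('ijab,njb->nia', J, onehot)
    # numerically stable log of the per-position normalization
    top = logits.max(axis=2, keepdims=True)
    log_norm = top[..., 0] + np.log(np.exp(logits - top).sum(axis=2))
    observed = np.take_along_axis(logits, msa[..., None], axis=2)[..., 0]
    return float(-(observed - log_norm).sum(axis=1).mean())

rng = np.random.default_rng(0)
msa = rng.integers(0, 3, size=(5, 4))        # toy alignment: 5 sequences, L = 4, q = 3
nll_uniform = neg_log_pseudolikelihood(msa, np.zeros((4, 3)), np.zeros((4, 4, 3, 3)))
print(nll_uniform, 4 * np.log(3))            # the two agree for a parameter-free model
```

Every sequence in the alignment enters the objective individually, which is the sense in which such methods "use all the data". For protein alignments one would take q = 21 if the gap is treated as an additional state.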
This (and the previous) slide show the – real, but admittedly not very large – improvement in contact prediction by learning two models (gplmDCA and plmDCA20) which do take into account gap stretches.
Marcin Skwark and Christoph Feinauer, AISTATS (2014)
C. Feinauer, M. Skwark, A. Pagnani & E.A., PLoS Comp Biol 10 e1003847 (2014)

Answer to: (3) is max-entropy necessary to explain the recent successes in inference?

No. We are back to the objective / subjective interpretations of probability, from the start the most contentious issue surrounding max-entropy.

The successes are better explained by the fact that the distribution of amino acids in homologous proteins, as a result of all evolution of life, is actually in an exponential family, and rather close to a Potts model.

Why is that, or why should it be so? Nobody knows! Perhaps an important problem for evolutionary theory? And perhaps it has other uses?

(4) is max-entropy inference a scientific methodology?

According to Popper, science is based on falsifiability. This same basic idea has been stated by many others, before and after.

We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress
R. P. Feynman, as on famousquotes.org.

You cannot falsify anything by a single experiment or single data set with no theory of prediction beforehand to falsify.

Also according to Popper, scientific knowledge is built as a collective enterprise of scientists. Therefore, Jaynes’ conditional
... […]subject to whatever is known [..] ...
implicitly includes all human knowledge up to that time – which is not a simple constraint.

A similar philosophical objection can be made against Rissanen’s Minimum Description Length principle.

Thanks to

Gino Del Ferraro
Alexander Mozeika
Marcin Skwark
Christoph Feinauer
Andrea Pagnani
Magnus Ekeberg
Angelo Vulpiani