The max-entropy fallacy
Erik Aurell
International Mini-workshop on
Collective Dynamics in Information Systems 2014
Beijing, October 13, 2014
Kavli Institute for Theoretical Physics China (KITPC)
Loosely based on
G. Del Ferraro & E.A. J. Phys. Soc. Japan 83 084001 (2014)
C. Feinauer, M. Skwark, A. Pagnani & E.A. PLoS Comp Biol 10 e1003847 (2014)
What entropy?
By entropy I will mean the Shannon entropy of a probability distribution:
S[p] = -\sum_i p_i \log p_i
What maximization?
Maximizing S[p] subject to the constraint \langle E \rangle = \sum_i p_i E_i = \text{const.} gives

p_i = \frac{1}{Z}\, e^{-\beta E_i}
What max-entropy?
The idea that probability distributions other than those of equilibrium statistical mechanics can also be derived by maximizing entropy, given suitable constraints.
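As a concrete illustration (not from the slides): a minimal numerical sketch of this maximization in Python, with made-up energy levels E and a made-up target mean energy. The constrained optimum reproduces the Boltzmann form p_i \propto e^{-\beta E_i}.

    import numpy as np
    from scipy.optimize import minimize

    E = np.array([0.0, 1.0, 2.0, 3.0])    # made-up energy levels
    E_target = 1.2                        # made-up mean-energy constraint

    def neg_entropy(p):
        p = np.clip(p, 1e-12, 1.0)
        return np.sum(p * np.log(p))      # -S[p]

    constraints = [
        {"type": "eq", "fun": lambda p: p.sum() - 1.0},    # normalization
        {"type": "eq", "fun": lambda p: p @ E - E_target}  # fixed mean energy
    ]
    p0 = np.ones_like(E) / len(E)
    res = minimize(neg_entropy, p0, bounds=[(0, 1)] * len(E),
                   constraints=constraints)
    # res.x matches the Boltzmann form exp(-beta*E)/Z for the beta
    # that reproduces E_target
    print(res.x)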
Two reasons to give this talk
Max-entropy: E.T. Jaynes proposed in 1957 that both equilibrium and non-equilibrium statistical mechanics be based upon this criterion.

Max-entropy inference: in the last decade considerable attention has been given to learning pairwise interaction models from data, motivated by max-entropy arguments. This research is highly interesting, but does it support max-entropy?
[...] the probability distribution
over microscopic states which
has maximum entropy, subject to
whatever is known, provides the
most unbiased representation of
our knowledge of the system.
E.T. Jaynes, "Information Theory and Statistical Mechanics II", Physical Review 108:171-190 (1957)
Why oppose max-entropy?
Are probabilities in Physics objective or subjective?

"[...] one must recognize the fact that probability theory has developed in two very different directions as regards fundamental notions."
"[...] the 'objective' school of thought ..."
"[...] the 'subjective' school of thought ..."
"[...] the probability of an event is merely a formal expression of our expectation that the event did or will occur, based on whatever information is available"
E.T. Jaynes, "Information Theory and Statistical Mechanics I", Physical Review 106:620-630 (1957)
(1) is it a practical method
to study non-equilibrium
processes (say, on graphs)?
(2) does it give the right
answers in principle?
(3) is it necessary to explain
the recent successes in
inference?
(4) is max-entropy inference
a scientific methodology
e.g. in the sense of Popper?
(1) is it practical?
We consider continuous-time dynamics on graphs. Dynamics could be
driven out of equilibrium, or relaxing towards equilibrium.
dP 1 ,  2 , 
   F j  P F j      F j P 
dt
j


[master equation]
True distribution: P(s, t), evolving under the master equation.

Observables in the sense of max-entropy:
\frac{1}{N} \langle O_1(s) \rangle, \quad \frac{1}{N} \langle O_2(s) \rangle, \quad \frac{1}{N} \langle O_3(s) \rangle, \ldots
each with its conjugate Lagrange multiplier \beta_1, \beta_2, \beta_3, \ldots

Auxiliary maximum entropy distribution: P_\beta(s) = \frac{1}{Z(\beta)} \exp\Big( \sum_l \beta_l O_l(s) \Big)
This is a dimensional reduction
Dynamics of the observables according to the master equation:
\frac{d}{dt} \langle O_l \rangle^{(T)} = \frac{1}{N} \sum_j \big\langle r_j(s)\, [\, O_l(F_j(s)) - O_l(s)\,] \big\rangle

Dynamics of the observables according to the auxiliary distribution:
\frac{d}{dt} \langle O_l \rangle^{(M)} = \frac{1}{N} \sum_k \dot{\beta}_k \big( \langle O_l O_k \rangle - \langle O_l \rangle \langle O_k \rangle \big)
If the auxiliary distribution is a good model, both ways of computing the dynamics must agree. In this way the changes of the β's can be computed, and the master equation reduced to a (complicated) finite-dimensional ODE. The averages have to be computed by the cavity method (or something else).
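Spelling out the reduction step (in the notation reconstructed above): equating the two time derivatives for every observable in the theory gives a linear system for the multiplier velocities,

\sum_k \big( \langle O_l O_k \rangle - \langle O_l \rangle \langle O_k \rangle \big)\, \dot{\beta}_k = \sum_j \big\langle r_j(s)\, [\, O_l(F_j(s)) - O_l(s)\,] \big\rangle ,

i.e. \dot{\beta} = C^{-1} R with C the connected correlation matrix of the observables. The averages on both sides, taken under the auxiliary distribution, are what the cavity method must supply.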
[Figure: the graph of the dynamics, with possible interaction terms overlaid, and the factor graph of the auxiliary model]
The approach is not new, but not (much) tested on single graphs
Simplest non-trivial case: the 1D Ising spin chain
Simple ferromagnetic Hamiltonian
Obeys detailed balance
Essentially solved 51 years ago
Roy J. Glauber, "Time-dependent statistics of the Ising model", Journal of Mathematical Physics 4:294 (1963)
Simplest max-entropy theory built on observing magnetization and energy
Already this is not totally trivial to do…because of the averages…
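For concreteness, a minimal sketch (not from the talk) of Glauber dynamics for the ferromagnetic chain, tracking the two observables of the simplest theory. Periodic boundaries and all parameter values are my choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    N, J, beta_T = 100, 1.0, 0.8   # chain length, coupling, inverse temperature (illustrative)
    s = rng.choice([-1, 1], size=N)

    def glauber_step(s):
        # flip a random spin with the standard Glauber rate
        # w_j = (1/2) * (1 - s_j * tanh(beta_T * h_j)), h_j the local field
        j = rng.integers(N)
        h = J * (s[(j - 1) % N] + s[(j + 1) % N])   # periodic chain
        if rng.random() < 0.5 * (1.0 - s[j] * np.tanh(beta_T * h)):
            s[j] = -s[j]

    for t in range(200 * N):
        glauber_step(s)
        if t % N == 0:
            m = s.mean()                          # magnetization per spin
            e = -J * np.mean(s * np.roll(s, 1))   # energy per spin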
…plus, in every time-step, an implicit three-variable equation
The change from the master equation is computed by the cavity method.
Solving the equations by Newton's method works reasonably well…
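The per-time-step structure can be sketched as follows. The actual three-variable residual (built from cavity averages) is not reproduced on the slide, so `residual` below is only a labeled placeholder, and scipy's fsolve stands in for the Newton solver.

    import numpy as np
    from scipy.optimize import fsolve

    # Schematic time-stepping structure only: at each step the new parameters
    # beta_new of the auxiliary distribution solve an implicit equation
    # residual(beta_new) = 0. The real residual would equate the
    # master-equation change of the observables with the change predicted
    # by the auxiliary model; this one is a dummy.
    def residual(beta_new, beta_old, dt):
        return beta_new - beta_old - dt * np.sin(beta_new)   # placeholder dynamics

    beta, dt = np.array([0.1, 0.2, 0.3]), 0.01
    for _ in range(1000):
        beta = fsolve(residual, beta, args=(beta, dt))   # Newton-type solve per step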
[Figures: energy vs. time, and the difference to the Glauber theory in energy vs. time]
Joint spin-field distribution…
A. C. C. Coolen, S. N. Laughton, and D. Sherrington, Physical Review B 53:8184 (1996)
In principle similar, but needs three cavity fields and solving an eight-dimensional implicit equation at every step…
…and works better, though not perfectly. The longer the range in the auxiliary distribution, the more complicated the cavity calculation and the equation.
[Figure: difference to the Glauber theory]
Internal consistency check…
Consider again the two ways of computing the changes of observables
\frac{d}{dt} \langle O_l \rangle^{(T)} = \frac{1}{N} \sum_j \big\langle r_j(s)\, [\, O_l(F_j(s)) - O_l(s)\,] \big\rangle

\frac{d}{dt} \langle O_l \rangle^{(M)} = \frac{1}{N} \sum_k \dot{\beta}_k \big( \langle O_l O_k \rangle - \langle O_l \rangle \langle O_k \rangle \big)
They apply also if O_l is not in the theory. But then they do not have to agree, and the discrepancy between the two sides is an internal consistency check.
Simplest tests are for longer-range pairwise correlations, c_{i,i+k} = \langle s_i s_{i+k} \rangle.
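A sketch of how such a test could be scripted (my construction, not the paper's code): estimate c_{i,i+k} from configurations recorded along the true dynamics and compare with the value the auxiliary theory predicts.

    import numpy as np

    def pair_correlation(samples, k):
        # samples: (n_samples, N) array of +/-1 configurations, e.g. recorded
        # along the Glauber simulation sketched earlier (periodic chain).
        # Returns the estimate of c_{i,i+k} averaged over sites and samples.
        return np.mean(samples * np.roll(samples, -k, axis=1))

    # Idea of the check: track c_{i,i+k} for k = 2, 3, ... in the true
    # dynamics and compare with the prediction of the auxiliary theory,
    # which does not contain these observables.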
[Figures: consistency check for the magnetization-energy theory and for the joint spin-field distribution theory]
Answer to: (1) is it practical?
No. It is complicated to implement, even in a simple 1D model of a
dynamics relaxing towards equilibrium. One can consider successive
approximations with longer “interactions”, but the complexity grows
very quickly. Which brings us to the next question:
(2) does max-entropy give the right answers outside equilibrium?
[Figure: δQ forward: ε+Lθ; δQ backward: ε]
Traditionally there were no
exact (relevant) results
If max-entropy is relevant for non-equilibrium, then the probability distributions should, like the Gibbs-Boltzmann distribution, be exponential.

\Pr_N(X) = e^{-N\beta\,(f(X) - f)}    [Gibbs-Boltzmann distribution]

\Pr_{N,t}(X) = e^{-N V(X, t)\, +\, \text{sub-leading}}    [putative non-equilibrium distribution]

It has now been known for 10-15 years that this is the case, but the functional V is very non-trivial.
SSEP (symmetric simple exclusion process):

V[\rho] = \int_0^1 dx \left[ B\big(\rho(x), F(x)\big) + \log \frac{F'(x)}{\rho_b - \rho_a} \right]

B(\rho, r) = (1-\rho) \log \frac{1-\rho}{1-r} + \rho \log \frac{\rho}{r}

where F(x) is the monotone solution of F(1-F)\, F'' = (\rho - F)\, (F')^2 with F(0) = \rho_a, F(1) = \rho_b.
B. Derrida, J. Stat. Mech. (2007) P07023
Recently extended to multidimensional systems, for the related
question of fluctuations of the current.
E. Akkermans et al, EPL 103 20001 (2013)
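To make the "very non-trivial" concrete: a minimal numerical sketch (my construction, not from the talk) that evaluates Derrida's SSEP functional for one made-up profile, solving the boundary-value problem for F with scipy. The reservoir densities and test profile are illustrative choices.

    import numpy as np
    from scipy.integrate import solve_bvp

    rho_a, rho_b = 0.8, 0.2                           # reservoir densities (made up)
    rho = lambda x: 0.5 + 0.1 * np.sin(np.pi * x)     # test profile (made up)

    def ode(x, y):
        # y[0] = F, y[1] = F'; F'' = (rho - F) (F')^2 / (F (1 - F))
        F, Fp = y
        return np.vstack([Fp, (rho(x) - F) * Fp**2 / (F * (1.0 - F))])

    def bc(ya, yb):
        return np.array([ya[0] - rho_a, yb[0] - rho_b])

    x = np.linspace(0.0, 1.0, 101)
    y0 = np.vstack([rho_a + (rho_b - rho_a) * x,      # linear initial guess for F
                    np.full_like(x, rho_b - rho_a)])
    sol = solve_bvp(ode, bc, x, y0)

    F, Fp = sol.sol(x)
    r = rho(x)
    B = (1 - r) * np.log((1 - r) / (1 - F)) + r * np.log(r / F)
    V = np.trapz(B + np.log(Fp / (rho_b - rho_a)), x)
    print(V)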
Answer to: (2) does max-entropy give the right answers outside equilibrium?
No, because there is no way that a complicated long-range effective interaction potential can be deduced from maximizing entropy and a limited number of simple constraints.
For the experts: both systems relaxing to equilibrium, such as the Ising spin chain, and the SSEP (and other such solved models) are covered by the macroscopic fluctuation theory of Jona-Lasinio and co-workers. But only SSEP-like systems lead to long-range effective interactions. For the relaxing Ising spin chain the max-entropy approach should hence probably eventually work, though it remains computationally cumbersome.
(3) is it necessary to explain the
recent successes in inference?
The main success is contact prediction in proteins. Folding proteins in
silico is hard, and not a solved problem – unless you have an already
solved structure as template. Predicting which amino acids are in
contact in a structure can be done from co-variation in similar proteins.
Relation between positional correlation and structure known for 20 years
[Figure: graph of alignment positions X1-X7, illustrating correlations between positions]

Neher (1994)
Göbel, Sander, Schneider, Valencia (1994)
Lapedes et al (2001)
Weigt et al, PNAS (2009)
Burger & van Nimwegen (2010)
Balakrishnan et al (2011)
Morcos et al, PNAS (2011)
Hopf et al, Cell (2012)
Jones et al, Bioinformatics (2012)
Ekeberg et al, Phys Rev E (2013)
Skwark et al, Bioinformatics (2013)
Kamisetty et al, PNAS (2013)
The recent success is to learn a
Potts model from data
"To disentangle direct and indirect couplings, we aim at inferring a statistical model P(A1, ..., AL) for entire protein sequences (A1, ..., AL) […] aim at the most general, least-constrained model […] achieved by applying the maximum-entropy principle"
F. Morcos et al, PNAS 108:E1293–E1301 (2011)
”The prediction method applies a maximum entropy approach to infer
evolutionary covariation […]”
T. Hopf et al, Cell 149:1607-21 (2012)
”The maximum-entropy approach to potentially solving the problem of protein
structure prediction from residue covariation patterns […]”
D. Marks et al, Nat Biotechnol. 30:1072-80 (2012)
Actually we have all the data
It is a choice to reduce multiple sequence alignments to amino-acid frequencies and correlations for data analysis. But we start from all the data. The conceptual basis of max-entropy is therefore not there.
Furthermore, the best available methods to learn these Potts models use all the data.
M Ekeberg et al, Phys Rev. E (2013)
M Ekeberg et al, J Comp Phys (2014)
http://plmdca.csc.kth.se/
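To show what "using all the data" means in practice: a sketch of pseudo-likelihood maximization in the spirit of plmDCA, written for the simpler two-state (Ising) case rather than the 21-state Potts case, and not the published implementation.

    import numpy as np
    from scipy.optimize import minimize

    def neg_pseudo_loglik(J_flat, S, lam=0.01):
        # S: (M, N) array of +/-1 samples. For the Ising model,
        # -log P(s_i | s_rest) = log(1 + exp(-2 s_i h_i)), h_i = sum_j J_ij s_j,
        # so the pseudo-likelihood is evaluated on every sample directly,
        # not just on pairwise frequencies.
        M, N = S.shape
        J = J_flat.reshape(N, N)
        J = 0.5 * (J + J.T)              # symmetrize couplings
        np.fill_diagonal(J, 0.0)         # no self-couplings
        H = S @ J                        # local fields
        return np.sum(np.log1p(np.exp(-2.0 * S * H))) / M + lam * np.sum(J**2)

    rng = np.random.default_rng(1)
    S = rng.choice([-1.0, 1.0], size=(500, 10))   # stand-in for a real alignment
    res = minimize(neg_pseudo_loglik, np.zeros(100), args=(S,), method="L-BFGS-B")
    J_fit = res.x.reshape(10, 10)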
Learning better models...
Multiple sequence alignments generally have stretches of gaps, which are not generated with high probability by a Potts model.
This (and the previous) slide show the real, but admittedly not very large, improvement in contact prediction from learning two models (gplmDCA and plmDCA20) which do take gap stretches into account.
Marcin Skwark and Christoph Feinauer, AISTATS (2014)
C. Feinauer, M. Skwark, A. Pagnani & E.A. PLoS Comp Biol 10 e1003847 (2014)
Answer to: (3) is max-entropy
necessary to explain the recent
successes in inference?
No.
We are back to the objective / subjective interpretations of probability, from the start the most contentious issue surrounding max-entropy.
The successes are better explained by the fact that the distribution of amino acids in homologous proteins, as a result of the whole evolution of life, actually is in an exponential family, and rather close to a Potts model.
Why that is or should be so, nobody knows! Perhaps an important problem for evolutionary theory? And perhaps it has other uses?
(4) is max-entropy inference a
scientific methodology?
According to Popper, science is based on falsifiability. The same basic idea has been stated by many others, before and after:
"We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress."
R.P. Feynman, as quoted on famousquotes.org
You cannot falsify anything by a single experiment or a single data set with no theory of prediction beforehand to falsify.
Also according to Popper,
scientific knowledge is built as a
collective enterprise of scientists.
Therefore, Jaynes’ conditional ...
"[…] subject to whatever is known […]"
...implicitly includes all human
knowledge up to that time –
which is not a simple constraint.
A similar philosophical objection
can be made against Rissanen’s
Minimum Description Length
principle.
Thanks to
Gino Del Ferraro
Alexander Mozeika
Marcin Skwark
Christoph Feinauer
Andrea Pagnani
Magnus Ekeberg
Angelo Vulpiani