Discrete Probability on Graphs: Estimation, Reconstruction of & Optimization on Networks

Elchanan Mossel
UC Berkeley
At: IPAM Mar 2007
Outline: Stochastic Models on Networks
• Disclaimer: this is a big field, and the choice of examples is biased toward an applied view.
• Part 0: Two types of network problems.
• Part I: Estimation of statistical quantities in Gibbs measures / Markov Random Fields.
• Part II: Reconstruction of stochastic networks from observations.
– Tree networks.
– Directed acyclic graphs.
• Part III: Optimization over stochastic models defined on networks.
– Which functions of stochastic models can be (approximately) optimized efficiently?
January 15, 2007
Part 0: Two Types of Network Problems
Two types of Network problems
• Type 1: Structural Network problems.
• Type 2: Distributional Network problems.
• This talk: Mostly Distributional network problems.
• Examples of Structural Network problems:
• “Clustering”: partition the vertex set of a graph G = (V,E) as $V = V_1 \cup \dots \cup V_k$ such that each $V_i$ is “big” and there are few edges between $V_i$ and $V_j$ for $i \neq j$.
• “Ranking”: given a random walk on a finite set, find its stationary distribution.
• Spectral techniques are applicable to both problems.
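A minimal sketch of the “ranking” problem, assuming the random walk is given as a dense row-stochastic matrix P and is irreducible and aperiodic, so that plain power iteration $\pi \leftarrow \pi P$ converges to the stationary distribution. The function name `stationary_distribution` is illustrative, not from the talk.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration pi <- pi P for a row-stochastic matrix P (list of rows).

    Assumes the chain is irreducible and aperiodic, so that pi P^t converges
    to the unique stationary distribution from any starting distribution.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# A two-state chain whose stationary distribution is (1/3, 2/3).
P = [[0.5, 0.5],
     [0.25, 0.75]]
pi = stationary_distribution(P)
```

For large sparse graphs one would replace the double loop by a sparse matrix-vector product; the iteration itself stays the same.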
A hard Structural Network Problem
• The “Graph Isomorphism Problem”
Given two graphs (G,E) and (H,F), is there an “isomorphism”, i.e. a bijection $f : G \to H$ such that $(v_1,v_2) \in E$ iff $(f(v_1),f(v_2)) \in F$?
• Clearly, if two graphs are isomorphic, then they have the same spectral structure, but this is not enough: non-isomorphic graphs can have the same spectrum.
• Other open problems exist in this area …
• Example of recent work:
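To illustrate why equal spectra do not settle the isomorphism question, here is a small Python check (assuming NumPy is available) on the star $K_{1,4}$ and the 4-cycle plus an isolated vertex: both have adjacency spectrum $\{-2, 0, 0, 0, 2\}$, yet they are not isomorphic, since the star has a vertex of degree 4. The brute-force `isomorphic` helper is hypothetical and only usable on tiny graphs.

```python
import itertools

import numpy as np


def adjacency(n, edges):
    """Adjacency matrix of a simple undirected graph on vertices 0..n-1."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1
    return A


def cospectral(n, E, F):
    """True if the two graphs have the same adjacency spectrum (eigvalsh sorts ascending)."""
    return np.allclose(np.linalg.eigvalsh(adjacency(n, E)),
                       np.linalg.eigvalsh(adjacency(n, F)))


def isomorphic(n, E, F):
    """Brute force over all n! bijections f: only usable for tiny graphs."""
    E = {frozenset(e) for e in E}
    F = {frozenset(e) for e in F}
    if len(E) != len(F):
        return False
    # f maps E into F edge by edge; equal sizes make this an "iff"
    return any({frozenset(f[v] for v in e) for e in E} == F
               for f in itertools.permutations(range(n)))


star = [(0, 1), (0, 2), (0, 3), (0, 4)]                 # K_{1,4}
square_plus_point = [(0, 1), (1, 2), (2, 3), (3, 0)]    # C_4 plus isolated vertex 4
```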
Part I: Estimation in Markov Random Fields
Gibbs Measures / Graphical Models
A Gibbs measure on a (finite) graph G = (V,E) is given by
– node potentials $(\psi_v : v \in V)$ and
– edge potentials $(\psi_e : e \in E)$.
The probability of $\sigma = (\sigma(v) : v \in V) \in A^{|V|}$ is given by
$$P[\sigma] = Z^{-1} \prod_{v \in V} \psi_v[\sigma(v)] \prod_{e=(v,u) \in E} \psi_e[\sigma(v),\sigma(u)],$$
where Z is the normalizing constant (the partition function).
• Gibbs measures were introduced in statistical physics.
• They are essential in machine learning.
• Also known as Markov Random Fields, Graphical Models, etc.
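A minimal sketch that evaluates the formula above literally, summing over all $|A|^{|V|}$ configurations to obtain Z and the marginal $P[\sigma(v_0) = a]$. It is exponential in |V| and only meant to make the definition concrete; the Ising-type potentials in the example (coupling `beta`, field `h` at vertex 0) are illustrative assumptions.

```python
import itertools
import math


def gibbs_marginal(V, E, A, psi_v, psi_e, v0, a):
    """P[sigma(v0) = a] for P[sigma] = Z^-1 prod_v psi_v(v, sigma(v)) prod_(v,u) psi_e(v, u, sigma(v), sigma(u)).

    Enumerates all |A|^|V| configurations: tiny graphs only.
    """
    Z = 0.0      # partition function
    mass = 0.0   # total weight of configurations with sigma(v0) = a
    for values in itertools.product(A, repeat=len(V)):
        sigma = dict(zip(V, values))
        w = math.prod(psi_v(v, sigma[v]) for v in V)
        w *= math.prod(psi_e(v, u, sigma[v], sigma[u]) for v, u in E)
        Z += w
        if sigma[v0] == a:
            mass += w
    return mass / Z


# Example: Ising model on a triangle, spins A = {-1, +1},
# coupling beta on each edge and an external field h at vertex 0 only.
beta, h = 0.5, 0.3
V, E = [0, 1, 2], [(0, 1), (1, 2), (0, 2)]
psi_v = lambda v, s: math.exp(h * s) if v == 0 else 1.0
psi_e = lambda v, u, s, t: math.exp(beta * s * t)
p_plus = gibbs_marginal(V, E, (-1, 1), psi_v, psi_e, 0, 1)
```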
Diffusion of Influence in Social Networks
Message Passing Algorithms / The Replica Method
• Statistical problem: given a Gibbs measure, estimate $P[\sigma(0) = a]$.
• This is equivalent to many other inference problems.
• Computational view: the problem can be NP-hard (even to approximate) in very simple cases.
• Statistical physics view: find dynamics / Markov chains that have P as their stationary measure.
• Statistical physics insight: rapid convergence of the dynamics ⇔ spatial decay of correlations.
– A very active area of research, with fascinating challenges.
• Artificial intelligence / neuroscience / replica view: solve the problem by “message passing”.
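To make the “dynamics” view concrete, here is a sketch of heat-bath Glauber dynamics for an Ising model, $P[\sigma] \propto \exp(\beta \sum_{(u,v) \in E} \sigma(u)\sigma(v) + h \sum_v \sigma(v))$, i.e. a Gibbs measure with $\psi_v[s] = e^{hs}$ and $\psi_e[s,t] = e^{\beta st}$. Each step resamples one spin from its conditional law given its neighbours, so P is stationary, and the long-run frequency of $\sigma(0) = +1$ estimates $P[\sigma(0) = +1]$. The model, burn-in and step counts are assumptions; how many steps are really needed is exactly the rapid-mixing question.

```python
import math
import random


def glauber_estimate(adj, beta, h, v0, steps=100_000, burn_in=1_000, seed=0):
    """Estimate P[sigma(v0) = +1] for the Ising model on the graph adj
    (dict vertex -> list of neighbours) by heat-bath Glauber dynamics."""
    rng = random.Random(seed)
    nodes = list(adj)
    sigma = {v: rng.choice((-1, 1)) for v in nodes}
    hits = 0
    for t in range(burn_in + steps):
        v = rng.choice(nodes)
        # local field at v given its neighbours
        m = h + beta * sum(sigma[u] for u in adj[v])
        # conditional law: P[sigma(v) = +1 | rest] = e^m / (e^m + e^-m)
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * m))
        sigma[v] = 1 if rng.random() < p_plus else -1
        if t >= burn_in and sigma[v0] == 1:
            hits += 1
    return hits / steps


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
p = glauber_estimate(cycle, beta=0.4, h=0.2, v0=0)
```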
Message Passing Algorithms / The Replica Method
• Message passing algorithms are used to estimate probabilities on graphical models.
• Examples: Warning Propagation, Sum-Product, Belief Propagation, etc.
• All of these algorithms perform an exact calculation on an associated computation tree.
• Example: Belief Propagation (BP) is a popular method in AI/coding for estimating the marginal probabilities $P[\sigma(0) = a]$ of a Gibbs measure on G.
• It is equivalent [Tatikonda-Jordan’02] to calculating the marginal probabilities $P[\sigma(0) = a]$ on the computation tree T(G).
• Question: why do message passing algorithms work in practice?
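A minimal sketch of sum-product BP for a pairwise Gibbs measure. It assumes G is itself a tree, in which case the computation tree T(G) is just G and BP returns exact marginals; the `brute_marginal` helper is included only to check this on a small Ising example with illustrative parameters. On graphs with cycles the same message updates are iterated instead, and exactness is lost.

```python
import itertools
import math
from functools import lru_cache


def bp_marginal(adj, states, psi_v, psi_e, root):
    """Sum-product BP on a tree. adj: dict node -> neighbours (must be a tree);
    psi_e is assumed symmetric: psi_e(u, v, a, b) == psi_e(v, u, b, a).
    Returns {x: P[sigma(root) = x]}."""
    @lru_cache(maxsize=None)
    def message(u, v):
        # m_{u->v}(x_v) = sum_{x_u} psi_u(x_u) psi_uv(x_u, x_v) prod_{w ~ u, w != v} m_{w->u}(x_u)
        out = {}
        for xv in states:
            total = 0.0
            for xu in states:
                term = psi_v(u, xu) * psi_e(u, v, xu, xv)
                for w in adj[u]:
                    if w != v:
                        term *= message(w, u)[xu]
                total += term
            out[xv] = total
        return out

    belief = {x: psi_v(root, x) * math.prod(message(u, root)[x] for u in adj[root])
              for x in states}
    Z = sum(belief.values())
    return {x: b / Z for x, b in belief.items()}


def brute_marginal(adj, states, psi_v, psi_e, root):
    """Exact marginal by enumerating all configurations (for comparison only)."""
    nodes = sorted(adj)
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    weight = {x: 0.0 for x in states}
    for values in itertools.product(states, repeat=len(nodes)):
        s = dict(zip(nodes, values))
        w = math.prod(psi_v(v, s[v]) for v in nodes)
        w *= math.prod(psi_e(u, v, s[u], s[v]) for u, v in edges)
        weight[s[root]] += w
    Z = sum(weight.values())
    return {x: w / Z for x, w in weight.items()}


# Ising model on the tree 3 - 1 - 0 - 2, with a field at vertex 3.
tree = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
psi_v = lambda v, x: math.exp(0.7 * x) if v == 3 else 1.0
psi_e = lambda u, v, a, b: math.exp(0.8 * a * b)
bp = bp_marginal(tree, (-1, 1), psi_v, psi_e, root=0)
exact = brute_marginal(tree, (-1, 1), psi_v, psi_e, root=0)
```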
Message Passing Algorithms in Coding
• In coding:
– BP is used to decode Low Density Parity Check (LDPC) codes [Gallager’62].
– It is proved to be efficient [Luby-Mitzenmacher-Shokrollahi-Spielman’98], [Richardson-Urbanke’01].
• Message passing algorithms work “because”:
– LDPC factor graphs are locally “tree-like”, and
– individual constraints “push” toward the correct codeword.
• The actual analysis uses a recursion of random variables on the tree.
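On the binary erasure channel, BP decoding of a linear code reduces to a simple “peeling” rule (standard background, not taken from the slides): a parity check with exactly one erased bit determines that bit. The sketch below assumes that channel; the (7,4) Hamming parity-check matrix is only a small stand-in and is not actually low-density.

```python
def peel_decode(H, received):
    """Erasure decoding by iterative peeling on the Tanner graph of H.

    H: parity-check matrix as a list of 0/1 rows; received: list of 0, 1 or None
    (None marks an erased bit). Whenever a check involves exactly one erased bit,
    even parity forces its value. Bits that cannot be resolved stay None.
    """
    x = list(received)
    progress = True
    while progress:
        progress = False
        for row in H:
            support = [j for j, h in enumerate(row) if h]
            erased = [j for j in support if x[j] is None]
            if len(erased) == 1:
                j = erased[0]
                x[j] = sum(x[i] for i in support if i != j) % 2
                progress = True
    return x


# Toy example: parity-check matrix of the (7,4) Hamming code.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
codeword = [1, 0, 1, 1, 0, 1, 0]
decoded = peel_decode(H, [None, 0, 1, 1, 0, 1, None])
```

If every check touches at least two erased bits (a “stopping set”), peeling makes no progress and the erasures remain.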
Message Passing Algorithms: Random 3-SAT
[Figure: a random 3-SAT formula on n variables (x1, …, x8 shown) with m = αn clauses. The α axis starts at 0 and marks the values 1.63, 3.52, 3.95, 4.27 and 4.51, with regions labelled “satisfiable” / “not satisfiable” and the algorithms PLR, Myopic, Belief propagation, WalkSAT and Survey propagation.]
Message Passing Algorithms for Random 3-SAT
• Message passing algorithms work because:
– random SAT graphs are locally “tree-like”;
– far-away variables are uncorrelated.
• Speculation 1: for Belief Propagation, variables are uncorrelated in a “standard sense” when $\alpha \le 3.95$.
• Thm (Maneva-M-Wainwright’05): Survey Propagation is just Belief Propagation on an extended Markov Random Field.
• Speculation 2: for Survey Propagation, variables are uncorrelated in the extended Markov Random Field for all α.
• Speculations 1 & 2 are under heated discussion among physicists, computer scientists and mathematicians …
[Photos: M. Talagrand, G. Parisi, B. Selman]
Decay of correlation for 3-SAT extended MRF
[Figure: the assignments in $\{0,1\}^n$ (e.g. 0110, 1011) and the partial assignments of the extended MRF, in which some variables are set to “*” (e.g. 01101), arranged by the number of stars.]
Part II: Reconstructing Stochastic Networks from Observations
Main Problem:
• How to reconstruct the network topology from observations at a (sub)set of the nodes?
• Main example: reconstructing trees.
Two Tree Inference Problems
In evolution:
• Phase transition: given a tree of species / mothers, can we infer the ancestral sequence at the root from contemporary samples? What is the trade-off between noise and duplication?
• Reconstructing evolution: is it possible to reconstruct the evolutionary history from genetic sequences?
Defn: Markov Model on a Tree
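Ising/BSC/CFN model:
• Tree: T = (V,E), with n leaves.
• Node states: $s(v) \in \{0,1\}$ for $v \in V$, where 0 = purines (A,G) and 1 = pyrimidines (C,T).
• Mutation probabilities: $0 < p_e < 1/2$ for $e \in E$; along each edge e the state is flipped, independently, with probability $p_e$.
[Figure: a root sequence …001100011101000011000100… evolving down a tree with root r, internal nodes a, b, c, leaves 1–5 and edge mutation probabilities $p_{ra}, p_{rc}, p_{ab}, p_{a3}, p_{b1}, p_{b2}, p_{c4}, p_{c5}$.]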
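The model can be simulated directly. This sketch assumes a uniformly distributed root state (the slides do not specify one) and uses hypothetical mutation probabilities for the tree in the figure; `TREE`, `LEAVES` and `sample_cfn` are illustrative names.

```python
import random

# Hypothetical edge mutation probabilities for the tree in the figure:
# child -> (parent, p_e), with 0 < p_e < 1/2.
TREE = {"a": ("r", 0.10), "c": ("r", 0.15), "b": ("a", 0.10), "3": ("a", 0.20),
        "1": ("b", 0.05), "2": ("b", 0.10), "4": ("c", 0.10), "5": ("c", 0.20)}
LEAVES = ["1", "2", "3", "4", "5"]


def sample_cfn(tree, leaves, rng):
    """One draw from the CFN model: root state uniform on {0, 1} (assumed), and
    each edge e independently flips its parent's state with probability p_e."""
    state = {"r": rng.randint(0, 1)}
    pending = dict(tree)
    while pending:  # assign children once their parent's state is known
        for child, (parent, p) in list(pending.items()):
            if parent in state:
                state[child] = state[parent] ^ (rng.random() < p)
                del pending[child]
    return [state[v] for v in leaves]


rng = random.Random(0)
k = 20_000
samples = [sample_cfn(TREE, LEAVES, rng) for _ in range(k)]  # k i.i.d. samples at the leaves
```

For leaves 1 and 2 the path uses the edges (b1) and (b2), so $1 - 2p_{12} = (1 - 2 \cdot 0.05)(1 - 2 \cdot 0.1) = 0.72$, and the fraction of samples with $s(1) \neq s(2)$ should be close to $p_{12} = 0.14$.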
Defn: Phylogenetic Reconstruction Problem
Phylogenetic reconstruction:
• Given: k i.i.d. samples at the n leaves.
• Task: fully reconstruct the model, i.e. find the tree and the mutation probabilities (and, if possible, do so efficiently).
Studied in:
• Biology (dozens of books, 1000s of papers) [Felsenstein’04]
• TCS (learning): [Ambainis-Desper-Farach-Kannan’97], [Farach-Kannan’96], [Cryan-Goldberg-Goldberg’02], [M-Roch ]
• Combinatorial phylogeny: [Erdos-Steel-Szekely-Warnow’97,’98], [M’07]
[Figure: a table of k samples of the leaf states s(1), …, s(5), from which the tree and the mutation probabilities (e.g. $p_{rc}$, $p_{c5}$) are to be recovered.]
Phase Transition for the Ising model
• Low temperature, $2\theta^2 > 1$: a “typical” boundary carries a bias (information about the root).
• High temperature, $2\theta^2 < 1$: a “typical” boundary carries no bias.
Here $\theta = 1 - 2p$ is the correlation across an edge with mutation probability p (binary tree).
• The transition at $2\theta^2 = 1$ was proved by: [Bleher-Ruiz-Zagrebnov’95], [Ioffe’96], [Evans-Kenyon-Peres-Schulman’00], [Kenyon-Mossel-Peres’01], [Martinelli-Sinclair-Weitz’04], [Borgs-Chayes-M-Roch’06].
• Also, the “spin-glass” case was studied by [Chayes-Chayes-Sethna-Thouless’86].
• Solvability for $2\theta^2 > 1$ was first proved by [Higuchi’77] (and [Kesten-Stigum’66]).
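One way to see where $2\theta^2 = 1$ comes from is the census (leaf-count) estimator used in the Kesten-Stigum analysis. Assuming a complete binary tree of depth h with the same θ on every edge, ±1 spins and a uniform root, the squared correlation between the root and the sum of the leaves can be computed exactly, as in the sketch below. With $\lambda = 2\theta^2$ it equals $\lambda^h / (1 + \sum_{l=1}^{h} 2^{l-1}\theta^{2l})$, which tends to $2(\lambda - 1)/\lambda > 0$ when λ > 1 and to 0 when λ < 1.

```python
def census_corr2(theta, h):
    """Squared correlation between the root spin and the leaf sum S of a complete
    binary tree of depth h, with +-1 spins, uniform root and correlation
    theta = 1 - 2p on every edge.

    E[S * root] = (2 theta)^h, and each leaf has 2^(l-1) other leaves at distance 2l,
    so E[S^2] = 2^h * (1 + sum_{l=1}^{h} 2^(l-1) theta^(2l)).
    """
    cov = (2 * theta) ** h
    second_moment = 2 ** h * (1 + sum(2 ** (l - 1) * theta ** (2 * l) for l in range(1, h + 1)))
    return cov ** 2 / second_moment


for p in (0.05, 0.2):  # 2 theta^2 = 1.62 (> 1) and 0.72 (< 1)
    theta = 1 - 2 * p
    print(p, [round(census_corr2(theta, h), 4) for h in (5, 20, 80)])
```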
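Steel’s Favorite Conjecture
(n = # of leaves, k = # of samples)
• $2\theta^2 > 1$: the reconstruction problem is solvable (Y), and phylogeny needs only $k = \Theta(\log n)$ samples (conjectured, then proved):
– Random Cluster Model: [M-Steel’04 (Math. Biosciences)]
– CFN model: [M’04 (Transactions of the AMS)], [Daskalakis-M-Roch (STOC’06)]
• $2\theta^2 < 1$: the reconstruction problem is not solvable (N), and phylogeny needs $k = n^{\Omega(1)}$ samples (conjectured, then proved): [M’03 (J. Comp. Biol.)]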
Polynomial Lower Bound at High Mutations
• Proof: conditional independence + data processing lemma.
[Figure: proof sketch with labels “X = T”, “L”, “Known”, “q-L”, “?” and “*k”.]
• In fact:
– [M’06 (IEEE Comp. Bio. & BioInfo)]: the “shallow part” of the tree can be efficiently reconstructed when $k = O(\log n)$, for all mutation rates.
– This also works in practice [Daskalakis-Hill-Jaffe-Mihaescu-M-Rao (RECOMB’06)].
Reconstruction from short sequences
Th [Daskalakis-M-Roch (STOC’06)]: let T be a tree on n leaves such that, for all e, $\theta_{\min} < \theta(e) < \theta_{\max}$, where $2\theta_{\min}^2 > 1$ and $\theta_{\max} < 1$. Then there exists a polynomial-time algorithm that uses sequences of length $k = O(\log n - \log \delta)$ to reconstruct the topology with probability $1 - \delta$, where the constant depends on $(\theta_{\min}, \theta_{\max})$.
Proof: Distance Methods
• Associate to each edge e the weight $-\ln(1 - 2p_e)$.
• For any two leaves i and j:
$$-\ln(1 - 2p_{i,j}) = \sum_e -\ln(1 - 2p_e),$$
where $p_{i,j}$ is the probability that $s(i) \neq s(j)$ and the sum is over all edges e on the path connecting i to j.
• Reconstruction algorithm:
– Estimate $p_{i,j}$ from the sequences.
– Deduce the topology of the tree.
• Problem: this needs exponentially long sequences.
• ESSW: “log n”-radius neighborhoods determine the tree ⇒ poly(n) sequence length suffices.
[Figure: the example tree with root r, internal nodes a, b, c, leaves 1–5 and edge weights on (ra), (rc), (ab), (a3), (b1), (b2), (c4), (c5).]
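A sketch of the two ingredients of the distance method under the CFN model: the identity $1 - 2p_{i,j} = \prod_e (1 - 2p_e)$ along the path, which makes $-\ln(1 - 2p_{i,j})$ additive, and the plug-in estimate of $p_{i,j}$ from two aligned sequences. The mutation probabilities in the example are hypothetical.

```python
import math


def path_mismatch(path_probs):
    """P[s(i) != s(j)] for two leaves whose path has mutation probabilities
    path_probs, using 1 - 2 p_ij = prod_e (1 - 2 p_e)."""
    return (1 - math.prod(1 - 2 * p for p in path_probs)) / 2


def estimated_distance(seq_i, seq_j):
    """Plug-in estimate of d(i, j) = -ln(1 - 2 p_ij) from two aligned 0/1 sequences."""
    p_hat = sum(x != y for x, y in zip(seq_i, seq_j)) / len(seq_i)
    if p_hat >= 0.5:
        return math.inf  # saturated estimate: no usable signal
    return -math.log(1 - 2 * p_hat)


# Additivity along the path 1 - b - a - 3 (hypothetical p_e for (b1), (ab), (a3)).
path = [0.05, 0.10, 0.20]
d_path = -math.log(1 - 2 * path_mismatch(path))
d_sum = sum(-math.log(1 - 2 * p) for p in path)
```

The sketch also shows why long sequences are needed: $\hat p_{i,j}$ has standard deviation of order $1/\sqrt{k}$, while $p \mapsto -\ln(1 - 2p)$ has derivative $2/(1 - 2p)$. For far-apart leaves, $1 - 2p_{i,j}$ is a product of many factors below 1, hence exponentially small in the path length, and k must be exponentially large to estimate such distances accurately.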
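Four-Point Method
• For four leaves a, b, c, d, let $\Delta = D(a,c) + D(b,d) - D(a,b) - D(c,d)$. For an additive tree distance D, $\Delta > 0$ iff the quartet topology is ab|cd, in which case Δ is twice the length of the middle edge.
[Figure: the three possible quartet topologies ab|cd, ac|bd and ad|bc on the leaves a, b, c, d.]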
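The four-point method turns (estimated) distances into quartet topologies by comparing the three pairwise sums. A minimal sketch, assuming distances are stored in a dict keyed by unordered leaf pairs (an illustrative format), with hypothetical additive distances on the tree ab|cd: the three sums are 5.0, 6.4 and 6.4, so the method returns ab|cd.

```python
def four_point(D, a, b, c, d):
    """Return the quartet split ((x, y), (z, w)) of a, b, c, d minimizing
    D(x, y) + D(z, w). For an additive tree distance with a positive middle
    edge this is the true quartet topology.

    D: dict frozenset({u, v}) -> distance (illustrative input format).
    """
    def dist(u, v):
        return D[frozenset((u, v))]

    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(splits, key=lambda s: dist(*s[0]) + dist(*s[1]))


# Additive distances on the quartet tree ab|cd with pendant edges
# a: 1.0, b: 2.0, c: 1.5, d: 0.5 and middle edge 0.7 (hypothetical weights).
pendant = {"a": 1.0, "b": 2.0, "c": 1.5, "d": 0.5}
side = {"a": 0, "b": 0, "c": 1, "d": 1}
D = {frozenset((u, v)): pendant[u] + pendant[v] + (0.7 if side[u] != side[v] else 0.0)
     for u in pendant for v in pendant if u != v}
```

Picking the smallest sum, rather than testing Δ > 0 exactly, keeps the rule usable when the distances are only estimated.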
Balanced Trees
Two-Step Algorithm [M, 2004]:
1) Reconstruct one (or a few) level(s) of the tree.
2) Infer the sequences at the roots of the reconstructed subtrees.
3) Start over, treating these inferred sequences as new leaves.
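The slides do not say how step 2 infers the sequences. As an assumption, one simple choice is a site-by-site majority vote over the leaves of each reconstructed subtree, which is informative about the subtree root when $2\theta^2 > 1$; this is a sketch, not necessarily the estimator of [M, 2004].

```python
def majority_root_sequence(leaf_seqs):
    """Estimate the sequence at the root of a reconstructed subtree by a
    site-by-site majority vote over its leaf sequences (ties broken towards 0,
    an arbitrary choice)."""
    k = len(leaf_seqs[0])
    m = len(leaf_seqs)
    return [1 if 2 * sum(seq[i] for seq in leaf_seqs) > m else 0 for i in range(k)]


# Three hypothetical leaf sequences of length k = 4 below the same subtree root.
root_guess = majority_root_sequence([[0, 1, 1, 0],
                                     [0, 1, 0, 1],
                                     [1, 1, 0, 1]])
```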
General Trees [Daskalakis, M, Roch, 2006]
Blindfolded Cherry Picking
• Need “only” one extra step in the algorithm.
• Main loop:
1) Distance estimation
2) Identify cherries from the next level
3) Sequence reconstruction
4) Detect “fake cherries”
Blindfolded Cherry Picking I: Edge Disjointness
[Figure: a non edge-disjoint reconstruction compared with the true tree.]
Blindfolded Cherry Picking II: Weight Estimation
Blindfolded Cherry Picking III: Collisions
Tree Reconstruction in a Nutshell
• Tree reconstruction can be solved from very short sequences ⇔ there exists a good estimator for root reconstruction.
• Similar techniques apply to other tree networks, for example:
– reconstructing multicast networks (Liang-M-Yu, Bhamidi-Rajagopal-Roch).
Back to the General Problem:
• How to reconstruct the network topology from observations at a (sub)set of the nodes?
• Example 3: Reconstructing Markov Random Fields from observations at a subset of the nodes ???
Part III: Optimization over Stochastic Networks
Motivating Problem
• Problem:
– Optimization over stochastic models defined on networks.
• Examples:
– Which genes to knock out in order to kill a cancer cell?
– Which computers to immunize in order to make a network robust?
– Which computers to attack in order to make the network fail?
– Which individuals to immunize to stop a disease from spreading?
– Viral marketing: which individuals to expose to a product so as to maximize its distribution?
• One case study: influence in social networks.
• Joint work with Sebastien Roch.
models of collective behavior
• examples:
– joining a riot
– adopting a product
– going to a movie
• model features:
– binary decision
– cascade effect
– network structure
viral marketing
• referrals and word-of-mouth can be very effective
– ex.: Hotmail
• viral marketing
– goal: mining the network value of potential customers
– how: target a small set of trendsetters (seeds)
• example [Domingos-Richardson’02]
– collaborative filtering system
– use an MRF to compute the “influence” of each customer
independent cascade model
• when a node is activated
– it gets one chance to activate each neighbour
– the probability of success from u to v is $p_{u,v}$
[Figure: an example network with edge activation probabilities such as 0.25, 0.33, 0.5, 0.75 and 1.0.]
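A sketch of the independent cascade model together with a Monte Carlo estimate of the expected final number of active nodes. The adjacency format `graph[u] = [(v, p_uv), ...]` and the example probabilities are assumptions for illustration.

```python
import random


def ic_spread(graph, seeds, rng):
    """One run of the independent cascade model.

    graph: dict u -> list of (v, p_uv). A newly activated node u gets a single
    chance to activate each neighbour v, succeeding with probability p_uv.
    Returns the set of active nodes at termination.
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active


def estimate_influence(graph, seeds, runs=10_000, seed=0):
    """Monte Carlo estimate of the expected number of active nodes at termination."""
    rng = random.Random(seed)
    return sum(len(ic_spread(graph, seeds, rng)) for _ in range(runs)) / runs


# Hypothetical example: 0 -> 1 with p = 0.5, then 1 -> 2 with p = 1.0.
g = {0: [(1, 0.5)], 1: [(2, 1.0)]}
est = estimate_influence(g, {0})
```

In the example, node 1 is activated with probability 0.5 and then always activates node 2, so the expected number of active nodes is 1 + 2 · 0.5 = 2.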
generalized models
• graph G = (V,E); initial activated set $S_0$
• generalized threshold model [Kempe-Kleinberg-Tardos’03,’05]
– activation functions: $f_u(S)$, where S is the set of activated nodes
– threshold values: $\theta_u$ uniform in [0,1]
– dynamics: at time t, set $S_t$ to $S_{t-1}$ and add all nodes u with $f_u(S_{t-1}) \ge \theta_u$
(note that the process stops after at most n − 1 steps)
• generalized cascade model [KKT’03,’05]
– when node u is activated:
• it gets one chance to activate each neighbour
• the probability of success from u to v is $p_v(u,S)$, where S is the set of nodes that have already tried (and failed) to activate v
– assumption: the $p_v(u,\cdot)$’s are “order-independent”
• theorem [KKT’03]: the two models are equivalent
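The generalized threshold dynamics can be sketched in a few lines. This assumes thresholds drawn with Python's `random.random()` (uniform on [0, 1)) and activation functions passed as a callable `f(u, S)`; the linear-threshold weights in the example are hypothetical.

```python
import random


def threshold_process(nodes, f, seeds, rng):
    """Generalized threshold model: each node u gets a threshold theta_u drawn
    uniformly from [0, 1); at each step every inactive u with f(u, S_{t-1}) >= theta_u
    joins. Stops when no new node joins (after at most n - 1 steps)."""
    theta = {u: rng.random() for u in nodes}
    S = set(seeds)
    while True:
        new = {u for u in nodes if u not in S and f(u, frozenset(S)) >= theta[u]}
        if not new:
            return S
        S |= new


# Example activation functions (hypothetical weights): a linear threshold rule,
# f_u(S) = sum of w[v][u] over active v, with incoming weights summing to at most 1.
w = {0: {1: 0.6, 2: 0.3}, 1: {2: 0.5}, 2: {}}
f = lambda u, S: sum(w[v].get(u, 0.0) for v in S)
final = threshold_process([0, 1, 2], f, {0}, random.Random(0))
```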
influence maximization
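• definition: the influence σ(S) of the initial seed S is the expected size of the infected set at termination:
$$\sigma(S) = E_S\big[|S_{n-1}|\big]$$
• definition: in the influence maximization problem (IMP), we want to find the seed S of fixed size k that maximizes the influence:
$$S^* = \operatorname{argmax}\{\sigma(S) : S \subseteq V, |S| \le k\}$$
• theorem [KKT’03]: the IMP is NP-hard
– reduction from Set Cover: ground set $U = \{u_1,\dots,u_n\}$ and a collection of subsets $S_1,\dots,S_m$; build an independent cascade model on the bipartite graph with $(u_i,S_j) \in E \iff u_i \in S_j$, where $S_j$ activates $u_i$ with probability 1
– then k of the subsets cover U iff $\exists S, |S| \le k, \sigma(S) \ge n + k$
[Figure: the bipartite independent cascade graph, with the subsets $S_1, \dots, S_m$ on one side and the elements $u_1, \dots, u_n$ on the other.]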
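The Set Cover reduction can be made concrete by brute force, purely as an illustration (it is exponential and is not meant as an algorithm). With every edge firing with probability 1, seeding k subset-nodes activates exactly those k nodes plus every element they cover, so $\sigma(S) = n + k$ precisely when the chosen subsets cover U. The helper names and the instance are hypothetical.

```python
from itertools import combinations


def reduction_influence(subsets, chosen):
    """sigma(S) in the reduction graph when the seed S consists of the subset-nodes
    `chosen`: each edge S_j -> u_i fires with probability 1, so the final active
    set is the seeds plus every element they cover."""
    covered = set().union(*(subsets[j] for j in chosen))
    return len(chosen) + len(covered)


def has_set_cover(universe, subsets, k):
    """Decide Set Cover through the IMP question: is there S with |S| = k and
    sigma(S) >= n + k? Seeding element nodes never helps reach n + k, so only
    subset-nodes are tried."""
    n = len(universe)
    return any(reduction_influence(subsets, chosen) >= n + k
               for chosen in combinations(range(len(subsets)), k))


universe = {1, 2, 3, 4, 5}
subsets = [{1, 2}, {3}, {3, 4, 5}, {2, 4}]
```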
submodularity
• definition: a set function $f : 2^V \to \mathbb{R}$ is submodular if for all $A, B \subseteq V$
$$f(A) + f(B) \ge f(A \cup B) + f(A \cap B)$$
• example: $f(S) = g(|S|)$, where g is concave
• interpretation: “discrete concavity” or “diminishing returns”; indeed, submodularity is equivalent to
$$\forall S \subseteq T,\ v \in V \setminus T:\quad f(T \cup \{v\}) - f(T) \le f(S \cup \{v\}) - f(S)$$
• threshold models:
– it is natural to assume that the activation functions have diminishing returns
– this is supported by observations of [Leskovec-Adamic-Huberman’06] in the context of viral marketing
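A brute-force check of the definition, feasible only for small ground sets. It confirms that $f(S) = g(|S|)$ with g concave (here $g = \sqrt{\cdot}$) and a coverage function are submodular, while $g(x) = x^2$ (convex) is not. The example functions are illustrative.

```python
import math
from itertools import chain, combinations


def all_subsets(V):
    """All subsets of V, as frozensets."""
    V = list(V)
    return [frozenset(c) for c in chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))]


def is_submodular(f, V, tol=1e-12):
    """Brute-force check of f(A) + f(B) >= f(A | B) + f(A & B) for all A, B subsets of V."""
    subsets = all_subsets(V)
    return all(f(A) + f(B) >= f(A | B) + f(A & B) - tol
               for A in subsets for B in subsets)


V = range(4)
concave_of_size = lambda S: math.sqrt(len(S))   # g concave -> submodular
convex_of_size = lambda S: len(S) ** 2          # g convex  -> not submodular
cover_sets = {0: {"x", "y"}, 1: {"y"}, 2: {"y", "z"}, 3: {"w"}}
coverage = lambda S: len(set().union(*(cover_sets[i] for i in S)))  # a coverage function
```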
main result
• theorem [M-Roch’06; first conjectured in KKT’03]: in the generalized threshold model, if all activation functions are monotone and submodular, then the influence is also submodular
• corollary [M-Roch’06]: the IMP admits a $(1 - e^{-1} - \varepsilon)$-approximation algorithm (for all ε > 0)
– this follows from a general result on the approximation of submodular functions [Nemhauser-Wolsey-Fisher’78]
• known special cases [KKT’03,’05]:
– linear threshold model, independent cascade model
– decreasing cascade model, “normalized” submodular threshold model:
$$S \subseteq T \Rightarrow p_u(v,S) \ge p_u(v,T),$$
or equivalently
$$\frac{f_u(S \cup \{v\}) - f_u(S)}{1 - f_u(S)} \ge \frac{f_u(T \cup \{v\}) - f_u(T)}{1 - f_u(T)}$$
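A sketch of the greedy algorithm behind the $(1 - e^{-1})$ guarantee of [Nemhauser-Wolsey-Fisher’78], for a monotone submodular f with $f(\emptyset) = 0$, shown on a hypothetical coverage instance. For the IMP, f would be the influence σ, which in general has to be estimated by simulation (e.g. a Monte Carlo average); that estimation error is the usual source of the extra ε.

```python
def greedy_max(f, V, k):
    """Greedy maximization of a set function f subject to |S| <= k: repeatedly add
    the element with the largest marginal gain f(S + [v]) - f(S).
    For monotone submodular f with f(empty) = 0 [Nemhauser-Wolsey-Fisher'78]:
    f(result) >= (1 - 1/e) * max_{|S| <= k} f(S)."""
    S = []
    for _ in range(k):
        candidates = [v for v in V if v not in S]
        if not candidates:
            break
        S.append(max(candidates, key=lambda v: f(S + [v]) - f(S)))
    return S


# Hypothetical coverage instance (coverage functions are monotone and submodular).
sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}, "D": {1, 5}}
cover = lambda S: len(set().union(*(sets[x] for x in S)))
picked = greedy_max(cover, ["A", "B", "C", "D"], 2)
```

Greedy first picks A (3 new elements), then C (2 new elements), covering all 5 elements.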
related work
• sociology
– threshold models: [Granovetter’78], [Morris’00]
– cascades: [Watts’02]
• data mining
– viral marketing: [KKT’03,’05], [Domingos-Richardson’02]
– recommendation networks: [Leskovec-Singh-Kleinberg’05], [Leskovec-Adamic-Huberman’06]
• economics
– game-theoretic point of view: [Ellison’93], [Young’02]
• probability theory
– Markov random fields, Glauber dynamics
– percolation
– interacting particle systems: voter model, contact process
proof sketch
coupling
• we use the generalized threshold model
• arbitrary sets A, B; consider 4 processes:
– $(A_t)$ started at A
– $(B_t)$ started at B
– $(C_t)$ started at $A \cap B$
– $(D_t)$ started at $A \cup B$
• it suffices to couple the 4 processes (each with its correct law) in such a way that for all t
$$C_t \subseteq A_t \cap B_t, \qquad D_t \subseteq A_t \cup B_t$$
• indeed, at termination
$$|A_{n-1}| + |B_{n-1}| = |A_{n-1} \cup B_{n-1}| + |A_{n-1} \cap B_{n-1}| \ge |C_{n-1}| + |D_{n-1}|,$$
and taking expectations gives $\sigma(A) + \sigma(B) \ge \sigma(A \cup B) + \sigma(A \cap B)$
(note that this works with $|\cdot|$ replaced by any monotone, submodular w)
proof ideas
• our goal:
$C_t \subseteq A_t \cap B_t$ (1)
$D_t \subseteq A_t \cup B_t$ (2)
• antisense coupling
– obvious way to couple: use the same $\theta_u$’s for all 4 processes
– this satisfies (1) but not (2)
– “antisense”: using $\theta_u$ for $(A_t)$ and $1 - \theta_u$ for $(B_t)$ “maximizes the union”
– we combine both couplings
• piecemeal growth
– seed sets can be introduced in stages
– we add $A \cap B$, then $A \setminus B$ and finally $B \setminus A$
• need-to-know
– it is not necessary to pick all $\theta_u$’s at the beginning
– we can unveil only what we need to know: $\theta_v \in (f_v(S_{t-2}), f_v(S_{t-1})]$ ?
piecemeal growth
• process started at S: $(S_t)$
• partition of S: $S^{(1)},\dots,S^{(K)}$
• consider the process $(T_t)$:
– pick the $\theta_u$’s
– run the process with seed $S^{(1)}$ until termination
– add $S^{(2)}$ and continue until termination
– add $S^{(3)}$ and so on
• lemma: the sets $S_{n-1}$ and $T^K_{n-1}$ have the same distribution
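A small sketch of the piecemeal lemma, assuming every $f_u$ is monotone. In that case, running the dynamics with the same thresholds either all at once or in stages ends in the same set (the final set is the smallest “closed” superset of the seeds), which in particular gives equality in distribution. The activation functions in the example are hypothetical.

```python
import random


def run_to_termination(nodes, f, theta, seeds):
    """Threshold dynamics with fixed thresholds theta, run until no node joins."""
    S = set(seeds)
    while True:
        new = {u for u in nodes if u not in S and f(u, frozenset(S)) >= theta[u]}
        if not new:
            return S
        S |= new


def piecemeal(nodes, f, theta, parts):
    """Introduce the seed sets S(1), ..., S(K) one at a time, running to termination in between."""
    S = set()
    for part in parts:
        S = run_to_termination(nodes, f, theta, S | set(part))
    return S


# Hypothetical monotone activation functions on 6 nodes:
# f_u(S) = min(1, 0.3 * number of active neighbours of u).
nodes = list(range(6))
nbrs = {u: {(u - 1) % 6, (u + 1) % 6, (u + 3) % 6} for u in nodes}
f = lambda u, S: min(1.0, 0.3 * len(nbrs[u] & S))
rng = random.Random(0)
theta = {u: rng.random() for u in nodes}
all_at_once = run_to_termination(nodes, f, theta, {0, 3})
staged = piecemeal(nodes, f, theta, [{0}, {3}])
```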
antisense coupling
• disjoint sets: S, T
• partition of S: $S^{(1)},\dots,S^{(K)}$
• piecemeal process with seeds $S^{(1)},\dots,S^{(K)}, T$: $(S_t)$
• consider the process $(T_t)$:
– pick the $\theta_u$’s
– run the piecemeal process with seeds $S^{(1)},\dots,S^{(K)}$ until termination
– add T and continue with the threshold values
$$\theta'_v = 1 - \theta_v + f_v(T^K_{n-1})$$
• lemma: the sets $S^{(K+1)}_{n-1}$ and $T^{(K+1)}_{n-1}$ have the same distribution
need-to-know
• proof of lemma
– run the first K stages identically in both processes
– note that for all v not in $S^K_{n-1} = T^K_{n-1}$, $\theta_v$ is uniformly distributed in $[f_v(T^K_{n-1}), 1]$
– but $\theta'_v = 1 - \theta_v + f_v(T^K_{n-1})$, the reflection of $\theta_v$ within this interval, has the same distribution
$\theta_v \in (f_v(S_{t-2}), f_v(S_{t-1})]$ ?
[Figure: simulation 1 and simulation 2.]
proof I
[Figure: antisense (“ANTI”) coupling.]
proof II
[Figure: antisense (“ANTI”) coupling.]
proof III
• the new processes have the correct final distribution
• up to time 2n − 1, $B_t = C_t$ and $A_t = D_t$, so that
$C_t \subseteq A_t \cap B_t$
$D_t \subseteq A_t \cup B_t$
• for time 2n, note that
$B_{2n-1} \subseteq D_{2n-1}$
$B_{2n} = B_{2n-1} \cup (T \setminus S)$
$D_{2n} = D_{2n-1} \cup (T \setminus S)$
• so by monotonicity and submodularity
$$f_v(B_{2n}) - f_v(B_{2n-1}) \ge f_v(D_{2n}) - f_v(D_{2n-1})$$
• then proceed by induction
general result
• we have proved:
theorem [Mossel-R’06]: in the generalized threshold model, if all activation functions are monotone and submodular, then for any monotone, submodular function w, the generalized influence
$$\sigma_w(S) = E_S[w(S_{n-1})]$$
is submodular
• note: a closure property for submodular functions!
Future Research Directions
• Study optimization problems for other stochastic models defined on networks.
• And another annoying problem where discrete probability may help:
• Are there (easily computable? probabilistic?) invariants of unlabelled graphs that uniquely determine them?
• Motivation: can one efficiently check whether two graphs are isomorphic?