
Comp 260: Advanced Algorithms
Tufts University, Spring 2011
Prof. Lenore Cowen
Scribe: Eli Brown
Lecture 3: Introduction to Probabilistic Algorithms (MIS)¹

1 Review of Basic Probability
Definition 1.0.1 (Random Variable) A random variable X is a real number that is the outcome of a random event.
For example, X = number of spots when a six-sided die is rolled.
Definition 1.0.2 (Expected Value) The expectation of X, denoted E[X], is

    E[X] = Σ_i i · Pr[X = i]
That is, if someone promises to pay you a dollar per spot that comes up
on a die, the expected value is the amount you expect to be paid on average.
With the example above of random variable X,
    E[X] = Σ_{i=1}^{6} i · Pr[X = i] = 7/2

Note that E[X] is not necessarily a possible value of X: Pr[X = 7/2] = 0.
Also note that expected value is not the only way to judge a probabilistic
decision. Increasing the payout on an incredibly unlikely event will raise
the expected value, but not necessarily make, say, the lottery a good bet.
¹These notes were partially based on past lectures scribed by Adam Lewis and Jeremy Freeman.
Theorem 1.0.3 (Linearity of Expectation) If X and Y are any two random
variables, then E[X + Y] = E[X] + E[Y]. More generally, E[Σ_i X_i] = Σ_i E[X_i].
If c is a real number, then E[c · X] = cE[X]. Note that E[XY] ≠ E[X]E[Y] unless
X and Y are independent.
Definition 1.0.4 (Indicator Random Variable) Let w be some event (e.g.,
a six-sided die was rolled and turned up a 6). The indicator random variable for w is

    I_w = 1 if w happens, 0 otherwise
The probability of an event is the expectation of its indicator random
variable.
Theorem 1.0.5 (Markov's Inequality) Pr[|X| ≥ a] ≤ E[|X|]/a
This bounds the probability that a variable takes a value far from its
expected value.
Proof 1.0.6 Let

    I_{|X|≥a} = 1 if |X| ≥ a, 0 if |X| < a

It is always the case that |X|/a ≥ I_{|X|≥a}. Therefore E[|X|/a] ≥ E[I_{|X|≥a}],
and by Theorem 1.0.3,

    E[|X|]/a ≥ E[I_{|X|≥a}] = Pr[|X| ≥ a]
Definition 1.0.7 (Variance)

    Var[X] = E[(X − E[X])²]
           = E[X² − 2X·E[X] + (E[X])²]
           = E[X²] − 2E[X]·E[X] + (E[X])²
           = E[X²] − (E[X])²
Theorem 1.0.8 (Chebyshev's Inequality) Pr[|X − E[X]| ≥ a] ≤ Var[X]/a²

Chebyshev's inequality can be shown from Markov's inequality and the
definition of variance.
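For completeness, the standard derivation applies Markov's inequality to the nonnegative random variable (X − E[X])²:

    Pr[|X − E[X]| ≥ a] = Pr[(X − E[X])² ≥ a²] ≤ E[(X − E[X])²]/a² = Var[X]/a²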
2 Probabilistic Algorithms

2.1 A Toy Example
Suppose you are given two containers, one containing N blue balls and the
other containing N/2 blue balls and N/2 yellow balls. The goal is to determine
which container is which.
2.2 Algorithm 1
1. Pick a container at random
2. Draw 20 balls
3. If a yellow ball is found, stop and report “found the mixed container”
4. Else stop and report “found the blue container”
This algorithm is fast, but not always correct. However, it has a high
probability of being correct.
2.3 Algorithm 2
1. Pick a container at random and draw a single ball
2. If a yellow ball is found, stop and report “found the mixed container”
3. Else goto step 1
This algorithm is always correct, but not always fast. However, it has a
high probability of being fast.
2.4 Algorithm 3
1. Pick a container at random and start drawing balls
2. If a yellow ball is found, stop and report “found the mixed container”
3. Else continue until the container is empty and report “found the blue
container”
This algorithm is always correct, but with probability 1/2 it is slow.
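To make the contrast concrete, here is a minimal simulation sketch of Algorithms 1 and 2 (not from the lecture; the list encoding of the containers and N = 1000 are illustrative assumptions):

    import random

    N = 1000                                             # illustrative container size
    blue = ["blue"] * N                                  # all-blue container
    mixed = ["blue"] * (N // 2) + ["yellow"] * (N // 2)  # half blue, half yellow

    def algorithm1(container):
        # Monte Carlo flavor: always fast, correct with high probability.
        # It errs only if all 20 draws from the mixed container come up
        # blue, which happens with probability about (1/2)**20.
        sample = random.sample(container, 20)
        if "yellow" in sample:
            return "found the mixed container"
        return "found the blue container"

    def algorithm2():
        # Las Vegas flavor: always correct, fast with high probability.
        # Each iteration succeeds with probability 1/4, so the expected
        # number of iterations is 4.
        while True:
            container = random.choice([blue, mixed])   # pick a container at random
            if random.choice(container) == "yellow":   # draw a single ball
                return container is mixed              # always True: never wrong

    print(algorithm1(random.choice([blue, mixed])))
    print(algorithm2())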
2.5 Types of Probabilistic Algorithms
There are two classes of probabilistic algorithms we will discuss. Monte Carlo
algorithms are always fast though not always correct, but correct with high
probability. Las Vegas algorithms are always correct though not always fast,
but fast with high probability. Formal definitions follow.
Definition 2.5.1 (Monte Carlo Algorithm) A Monte Carlo algorithm is
a probabilistic algorithm M for which ∃ a polynomial P such that ∀x, M
terminates within P(x) steps on input x. Furthermore, Pr[M(x) is correct] > 2/3,
where the probability is taken over all coin tosses in algorithm M.
Note that it is easy to transform any such Monte Carlo algorithm into a
Monte Carlo algorithm that is correct with probability 1 − ε, by simply
running M multiple times and taking a majority vote. In fact, this strategy
works whenever the probability that the algorithm is correct is greater
than 1/2 by at least a fixed constant.
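A minimal sketch of that amplification, assuming a hypothetical one-shot routine that answers correctly with probability at least 2/3 (the names here are illustrative, not from the notes):

    import random
    from collections import Counter

    def amplify(monte_carlo, x, trials=101):
        # Run the Monte Carlo algorithm independently many times and
        # return the majority answer. If each run is correct with
        # probability 1/2 + delta for a fixed delta > 0, a Chernoff bound
        # makes the majority wrong with probability exponentially small
        # in the number of trials.
        votes = Counter(monte_carlo(x) for _ in range(trials))
        return votes.most_common(1)[0][0]

    # Hypothetical stand-in: answers correctly with probability 2/3.
    def noisy_is_even(x):
        truth = (x % 2 == 0)
        return truth if random.random() < 2 / 3 else not truth

    print(amplify(noisy_is_even, 42))   # True with overwhelming probability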
Definition 2.5.2 (Las Vegas Algorithm) A Las Vegas algorithm is a
probabilistic algorithm M for which ∃ a polynomial P such that ∀x,

    E[running time] = Σ_{t=1}^{∞} t · Pr[M(x) takes exactly t steps] < P(x)

Furthermore, the output of M is always correct.
You can think of randomized algorithms as deterministic via the trick of
considering an algorithm that tries each possible sequence of random choices,
assuming discrete random variables. Early primality-checking algorithms
were often Monte Carlo: if a given number was factorable, the algorithm
could say so; otherwise the number was probably prime. There have since
been Las Vegas and, eventually, deterministic algorithms.
It is still not known whether having a Monte Carlo algorithm implies
there is a Las Vegas algorithm, but the converse is true. It is also still an
open question whether or not there exist problems whose only polynomial
time algorithms require randomness.
3 The Max-Cut Problem
Given a graph, G = (V, E), we wish to partition V into two sets, A and B,
such that the number of edges crossing the cut is maximized. This problem
is NP-Hard.
3.1 The Erdős and Spencer Probabilistic Method
We now use the probabilistic method to show that for any graph, there is
guaranteed to exist a cut that contains at least half the edges of the graph.
Notice that such a cut is always a 1/2-approximation to max-cut, since no cut
can contain more than all the edges of a graph. It turns out this probabilistic
method yields a deterministic algorithm for 1/2-approximation of max-cut.
To perform the probabilistic algorithm, for each vertex i, flip a coin and
let X_i be a random variable:

    X_i = −1 if coin i flipped tails, +1 if coin i flipped heads

Then if X_i = −1, put i in A; if X_i = 1, put i in B. Note that E[X_i] = 0.
Define a variable that represents whether or not an edge crosses the cut
(1 or 0, respectively):

    E_ij = (1 − X_i X_j)/2   ∀(i, j) ∈ E
Let S be the number of edges that cross the cut. Then

    E[S] = E[ Σ_{(i,j)∈E} E_ij ]
         = Σ_{(i,j)∈E} E[E_ij]                       (by Linearity of Expectation)
         = Σ_{(i,j)∈E} E[(1 − X_i X_j)/2]
         = |E|/2 − (1/2) Σ_{(i,j)∈E} E[X_i X_j]
         = |E|/2 − (1/2) Σ_{(i,j)∈E} E[X_i]E[X_j]    (because coin flips were independent)
         = |E|/2                                     (because E[X_i] = 0 ∀i)
This probabilistic argument says that if E[S] ≥ |E|/2, then there must exist
some assignment of vertices to sets A and B that is at least this good. But
note that this just gives us the expectation. What we really want is that the
probability that we get a cut at least as good as the expectation is 1/c; you
can prove that with Chebyshev's inequality. Also, a proof that this algorithm
gives ≥ |E|/4 edges with Pr > 1/2 is left as an exercise to the reader.
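Here is a one-run sketch of the coin-flipping experiment above, together with an empirical check that the mean cut size is about |E|/2 (the small example graph is an illustrative assumption, not from the notes):

    import random

    def random_cut_size(vertices, edges):
        # Flip a fair coin X_i in {-1, +1} for each vertex; an edge (i, j)
        # crosses the cut exactly when E_ij = (1 - X_i X_j) / 2 equals 1.
        x = {v: random.choice([-1, 1]) for v in vertices}
        return sum((1 - x[i] * x[j]) // 2 for (i, j) in edges)

    vertices = range(5)
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
    trials = [random_cut_size(vertices, edges) for _ in range(10000)]
    print(sum(trials) / len(trials), "vs |E|/2 =", len(edges) / 2)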
From here, we will create a deterministic algorithm to do the same thing.
Note that the simplest version of that would be to enumerate all possible
configurations of the flipped coins and then check them until you find a
configuration that matches your expectation. The Erdős-Spencer method is
just that: the fact that E[S] ≥ |E|/2 means there exists some configuration
with |E|/2 edges that cross the cut.
A Digression on Pairwise Independence
    x_i   x_j   x_k = x_i ⊕ x_j
     1     1         0
     1     0         1
     0     1         1
     0     0         0
In the table above, any one variable (column) can be removed, leaving the
remaining two pairwise independent. Notice that in the proof above, we only
needed pairwise independence. In the context of a randomized algorithm we
just want a way to reduce the number of possible random strings of coin flips
while maintaining pairwise independence, that is:

    Pr[X_s = a | X_t = b] = Pr[X_s = a]   where s ≠ t ∈ {i, j, k}
We can take advantage of that notion of pairwise independence in constructing
a deterministic algorithm for the above problem that iterates over a sample
space of coin flips that is polynomial (rather than exponential) in size, and
thus runs in polynomial time. Suppose we have N variables and assume N = 2^k
for some natural number k (if not, just pad the set). We want a sequence of
length N over the set {−1, 1}. Let W = w_1 w_2 w_3 ... w_{log₂ N} be a random
(log N)-bit sequence of 0's and 1's. Let X_i = (−1)^{bin(i) ⊙ W}, where bin(i)
is the binary expansion of i and x ⊙ y is the count of bits set in the bitwise
exclusive-or of x and y. Each X_i ∈ {−1, 1}, and X_i, X_j are pairwise
independent. Notice X_i will be 1 if bin(i) ⊙ W is even and −1 if it is odd.
As an example, 11010010 ⊙ 00011011 = 4. Vertex 36 would be written as log N
bits, say as 00100100, and then XORed with the bits of W.
If two numbers differ in even a single bit, they will have different XOR
values against W, and there are 2^{log N} = N strings W to try. The algorithm
is to enumerate all the (log N)-bit strings W, XOR each with the binary
expansion of each vertex to obtain a partition, and check all the resulting
partitions to pick the best. Clearly, with only N strings to try, this is a
polynomial-time algorithm, and since the analysis shows the expectation over
this smaller sample space is also good, the partition of the best string will
give us the cut that we need in polynomial time.
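Below is a sketch of the enumeration. One caveat: the notes describe the sign as the parity of the XOR count, but for the signs in code I use the standard inner-product construction, X_i = (−1)^{popcount(bin(i) AND W)}, a common way to realize a pairwise-independent ±1 family; the enumerate-all-seeds-and-keep-the-best structure is the same either way, and the example graph and function names are illustrative assumptions:

    def derandomized_max_cut(n, edges):
        # Enumerate a pairwise-independent sample space of 2^k = O(n)
        # seeds instead of all 2^n coin-flip sequences. The expectation
        # of the cut size over this small space is still |E|/2, so the
        # best seed yields a cut with at least |E|/2 edges.
        k = max(1, (n - 1).bit_length())        # log N seed bits (pad if needed)
        best_cut, best_side = -1, None
        for w in range(2 ** k):                 # only 2^k seeds to try
            # side[i] = popcount(bin(i) AND w) mod 2, i.e. X_i = (-1)^side[i]
            side = [bin(i & w).count("1") % 2 for i in range(n)]
            cut = sum(1 for (i, j) in edges if side[i] != side[j])
            if cut > best_cut:
                best_cut, best_side = cut, side
        return best_cut, best_side

    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    print(derandomized_max_cut(4, edges))       # a cut with >= 5/2, i.e. >= 3 edges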
3.2 A Much Simpler Deterministic Algorithm
We now show, in fact, that there’s a much simpler deterministic procedure
that would yield the same bound.
A vertex can be labeled as “happy” or “unhappy”. Vertices are unhappy
if they have more direct neighbors on their side of the cut than on the other
side, and are happy otherwise. Any unhappy vertex can be made happy by
flipping it to the other side of the cut.
The greedy algorithm is to simply flip the side of unhappy vertices: while
there exists an unhappy vertex, pick an unhappy vertex and flip it across
the cut. While a flip may cause some neighbors to become unhappy, each flip
is guaranteed to increase the number of edges that cross the cut, thus
forward progress is made with each step and the algorithm will terminate.
In other words, at each step the number of edges with both endpoints in the
same set decreases.
Furthermore, when the algorithm terminates, all vertices are happy: locally,
each vertex has at least as many of its edges crossing the cut as staying on
its own side. Summing over all vertices, the number of edges that cross the
cut globally is at least the number of edges that stay on the same side of
the cut, and thus the algorithm terminates with at least half the edges
crossing the cut.
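A minimal sketch of the greedy procedure, assuming the graph is given as an adjacency map (the representation and names are illustrative):

    def greedy_local_cut(adj):
        # Local search: while some vertex has more neighbors on its own
        # side than across the cut (is "unhappy"), flip it. Every flip
        # strictly increases the number of crossing edges, so the loop
        # terminates with at least half of all edges crossing the cut.
        side = {v: 0 for v in adj}              # start with everyone in set A
        changed = True
        while changed:
            changed = False
            for v, nbrs in adj.items():
                same = sum(1 for u in nbrs if side[u] == side[v])
                if same > len(nbrs) - same:     # v is unhappy
                    side[v] ^= 1                # flip v across the cut
                    changed = True
        return side

    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    print(greedy_local_cut(adj))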
That solution has a running time of O(|E|) and it is still an open question
whether or not there is an algorithm with O(|V |) running time.
Both these methods can get a 1/2-optimal solution, but later in the class we
will see the Goemans-Williamson algorithm, which gives a 0.878-optimal
solution to the Max-Cut problem.
4 Maximal Independent Set (MIS)
Definition 4.0.1 A Maximal Independent Set (MIS) of a graph G =
(V, E) is a set of vertices I ⊆ V such that:

• Independent: x ∈ I ⇒ y ∉ I for all y such that (x, y) ∈ E

• Maximal: ∀x ∈ V, either x ∈ I or y ∈ I for some y such that (x, y) ∈ E
Figure 1: Examples of Maximal Independent Sets
In Figure 1, there are two examples of MIS graphs. Note that Maximal ≠
Maximum. The only restriction for maximality is that you cannot have a
vertex outside the set with none of its neighbors in the set.
Note that Maximum Independent Set is NP-hard, but MIS is easy. One
way to do it would be to create a tree from a breadth-first traversal and then
choose every other level for the set.
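For concreteness, here is a minimal sketch of an even simpler sequential method, a greedy scan (not the BFS idea above; the adjacency-map representation is an illustrative assumption):

    def sequential_mis(adj):
        # Greedy scan: add a vertex to I whenever none of its neighbors
        # is already in I. The result is independent by construction, and
        # maximal because every vertex left out was skipped only because
        # some neighbor was already in I.
        I = set()
        for v in adj:
            if not any(u in I for u in adj[v]):
                I.add(v)
        return I

    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(sequential_mis(adj))                  # {0, 2} on this path graph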
So deterministically finding a polynomial-time algorithm to compute an
MIS is easy. However, finding a good parallel algorithm to compute an MIS
is harder, and that's the problem we consider next.
4.1 Parallel Computing Application
To solve MIS with each node represented by its own processor, we approach
the problem in rounds. We begin with an empty set of vertices I and a graph
G = (V, E), and then in each round each of the vertices in V − I flips a
coin that goes towards determining whether it can enter I.
Let D be the maximum degree in G.
Call a vertex “unsatisfied” if neither it nor any of its neighbors has been
put in the MIS.
1. If vertex j is unsatisfied, it flips a coin to assign

       X_j = 1 with probability p = 1/(4D), and X_j = 0 otherwise

2. If X_j = 1 and X_k ≠ 1 for all neighbors k of j, then vertex j enters the
MIS

3. Update the list of which vertices are satisfied. If not all vertices are
satisfied, go to the next round. (A simulation sketch of one round follows
below.)
The calculation for how long this is likely to take is a bit more
complicated. First define an indicator variable

    Y_i = X_i × Π_{(i,j)∈E} (1 − X_j)

Random variable Y_i shows whether or not vertex i enters the MIS this round.
It is zero unless X_i is 1 and X_j is 0 for all neighbors j of i. To represent
whether or not vertex i is satisfied, define:

    Z_i = 1 iff (Y_i + Σ_{(i,j)∈E} Y_j) ≥ 1, and Z_i = 0 otherwise
Next we can sort the vertices into groups by degree. We can make log D
buckets:

    1–2, 2–4, 4–8, 8–16, ..., D/2–D

The last group is called the "big degree nodes," and we put into it all
vertices i that satisfy

    2^{⌈log D⌉−1} ≤ degree(i) ≤ 2^{⌈log D⌉}
Now we claim that in each round, we satisfy a constant fraction of the big
degree nodes. If that is true, we have shown that O(log² n) rounds will be
enough to have an MIS.
Figure 2: i becomes satisfied by j flipping a 1 and all the other neighbors
not flipping a 1
Proof 4.1.1 Let T be the number of big degree nodes that get satisfied in a
given round:

    T = Σ_{j ∈ big degree nodes} Z_j

We want to show that

    E[T] ≥ |big degree nodes| / constant

but we will actually bound based on something smaller.
Consider the probability that i becomes satisfied by j flipping 1 and none
of the other neighbors flipping 1 (i.e., the picture is Figure 2). Then we
define

    R_ij = X_j · Π_{(j,k)∈E} (1 − X_k) · Π_{(i,l)∈E, l≠j} (1 − X_l)
This R_ij is disjoint for every neighbor j of i, since they keep each other
from happening; note that no two R_ij can be 1 at the same time. So, with D
the maximum degree in G:

    Z_i ≥ Σ_{(i,j)∈E} R_ij

and therefore, taking expectations,

    E[T] = Σ_{i∈big} E[Z_i]
         ≥ Σ_{i∈big} Σ_{(i,j)∈E} E[R_ij]
         ≥ Σ_{i∈big} Σ_{(i,j)∈E} ( p − Σ_{(j,k)∈E} p² − Σ_{(i,l)∈E, l≠j} p² )
         ≥ |big| · (D/2)(p − 2Dp²)

With p = 1/(4D), the right-hand side is |big| · (D/2) · (1/(8D)) = |big|/16,
so in expectation a constant fraction of the big degree nodes is satisfied
each round, as claimed.
For further study, read Mike Luby's paper on parallel randomized MIS. For a
generalization to the setting where there is no clock that allows
synchronized rounds across processors, see the paper of Awerbuch, Cowen,
and Smith in the 1994 STOC conference.
5 Digression: Vertex Coloring

Definition 5.0.2 Let G = (V, E). A proper vertex coloring with k colors is
an assignment c : V → {1, ..., k} such that ∀(u, v) ∈ E, c(u) ≠ c(v).
Any planar graph can be 4-colored. More generally, let ∆ be the maximum
degree of G. Then the vertices of any G can be colored with ∆ + 1 colors.
The proof is by induction on the number of vertices.

Proof 5.0.3 Assume the theorem is true for any graph with k vertices. Then
for a graph with k + 1 vertices, take G − v for some vertex v, and what
remains is a k-vertex graph. Color it with the ∆ + 1 colors, and then color
v with a color not used by any of its at most ∆ neighbors.
Fact: any planar graph has average degree less than 6.
Corollary: in any planar graph, there is always some vertex of degree 5
or less.
Theorem 5.0.4 Any planar graph can be colored with 6 colors.

Proof 5.0.5 Using the above facts: take a vertex of degree 5 or less and
remove it. Color the remaining graph with 6 colors, then add the vertex back
in with a color not used by its at most 5 neighbors. Since removing a vertex
maintains planarity, the degree fact continues to apply at every step.
Distributed ∆ + 1 coloring can be done by applying MIS iteratively, assigning
a color to each MIS (a sketch of one round follows the list):

1. Each uncolored vertex chooses a color at random from its δ_i + 1 allowed
colors, where δ_i is the degree of vertex i

2. For every edge, if both endpoints have the same color, arbitrarily
uncolor one endpoint

3. Update the graph by removing all colored vertices and updating the list
of allowed colors for the remaining vertices
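Here is a sketch of one round on a single machine, under illustrative assumptions (colors are integers, color[v] is None while v is uncolored, and allowed[v] starts as the set {0, ..., δ_v} of δ_v + 1 colors):

    import random

    def coloring_round(adj, color, allowed):
        # Step 1: each uncolored vertex tries a random allowed color.
        for v in adj:
            if color[v] is None:
                color[v] = random.choice(sorted(allowed[v]))
        # Step 2: for each same-colored edge, arbitrarily uncolor one endpoint.
        for v in adj:
            for u in adj[v]:
                if v < u and color[v] is not None and color[v] == color[u]:
                    color[u] = None
        # Step 3: shrink each uncolored vertex's palette by the colors its
        # (now permanently) colored neighbors kept; since a vertex has at
        # most deg(v) neighbors, at least one of its deg(v)+1 colors remains.
        for v in adj:
            if color[v] is None:
                allowed[v] -= {color[u] for u in adj[v] if color[u] is not None}
        return all(c is not None for c in color.values())

    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}    # a triangle: needs 3 colors
    color = {v: None for v in adj}
    allowed = {v: set(range(len(adj[v]) + 1)) for v in adj}
    while not coloring_round(adj, color, allowed):
        pass
    print(color)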