Randomized Rounding – Overview
• Integer Programming
• The Probabilistic Method
• Randomization and Approximation
• The Covering Radius of a Code
• The Algorithm
• Probabilistic Analysis
• Covering and Packing Problems
• Scaling Variables
• Conditional Probabilities
• Pessimistic Estimators
• Positive Correlation
• Limited Dependence

Version 2.0 – April 2003
Andrew D. Smith
Randomized Rounding – Background

Linear and integer programming: maximizing or minimizing a linear function over a polyhedron. Given a matrix A ∈ R^{m×n} and vectors b ∈ R^m, c ∈ R^n, find a vector x optimizing

max { cx | Ax ≤ b }   or   min { cx | x ≥ 0; Ax ≥ b }

• x is the solution vector
• max cx or min cx is the objective function
• Ax ≤ b, Ax ≥ b, x ≥ 0 are constraints on the solution

linear programming: x ∈ R^n, solvable in polynomial time
integer programming: x ∈ Z^n, NP-complete (decision version)

Alexander Schrijver (1986) “Theory of linear and integer programming”
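The definitions above can be made concrete with a short sketch (not from the slides): for a tiny instance, a binary integer program max { cx | Ax ≤ b, x ∈ {0,1}^n } can be solved by exhaustive search. The instance data below are made up for illustration.

```python
from itertools import product

def solve_binary_ip(c, A, b):
    # maximize c.x subject to A x <= b over x in {0,1}^n, by brute force
    n = len(c)
    best_val, best_x = None, None
    for x in product([0, 1], repeat=n):
        feasible = all(
            sum(row[j] * x[j] for j in range(n)) <= bound
            for row, bound in zip(A, b)
        )
        if feasible:
            val = sum(c[j] * x[j] for j in range(n))
            if best_val is None or val > best_val:
                best_val, best_x = val, x
    return best_val, best_x

# made-up instance: three items with values 3, 2, 2; pick at most two
best_val, best_x = solve_binary_ip([3, 2, 2], [[1, 1, 1]], [2])
```

Brute force is exponential in n, which is exactly why the rest of the deck looks for something better.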
solving optimization problems over discrete structures

famous integer programs
• Facility Location
• Knapsack Problem
• Job Shop Scheduling
• Travelling Salesperson
all NP-hard!

some exact solution techniques
• Branch and Bound
• Dynamic Programming
• Cutting Planes
• Ellipsoid/Separation
exact algorithms fail eventually!

what happens when no exact algorithm is fast enough but some solution must be found?
Find an approximately optimal solution (more on this shortly . . . )
relaxation: remove constraints
geometrically: expand (and simplify) the polyhedron
translation: allow more vectors to be solutions

fractional relaxation: remove the integrality constraints, x ∈ Z^n ↦ x̄ ∈ R^n
a very important technique in solving integer programs
• recall that linear programs can be solved in polynomial time (e.g. primal dual interior point)
• fractional relaxations of NP-hard problems can be very easy (e.g. fractional knapsack reduces to sorting)
• they provide intuition about the (geometric) structure of the problem
• (important concept) integrality gap: the worst-case ratio between the optimal integral objective value cx and the optimal fractional objective value cx̄

true understanding of an NP-complete combinatorial optimization problem requires understanding its fractional relaxation
Randomized Algorithms
any algorithm with behavior depending on random bits (we assume a source of random bits)
quality measures: time complexity, number of random bits required, probability of success

Approximation Algorithms
a polynomial time algorithm for an NP-hard optimization problem
performance ratio: optimal/approximate (maximization) or approximate/optimal (minimization)
performance ratios are better when closer to 1 and cannot be < 1
quality measures: time complexity, performance ratio

combine the two: Randomized Approximation Algorithms

Motwani & Raghavan (1995) “Randomized algorithms”
D. Hochbaum (1995) “Approximation algorithms for NP-hard problems”
Trying to prove that a structure with certain desired properties exists, one defines an appropriate probability space of structures and then shows that the desired properties hold in this space with positive probability.
— Alon & Spencer

importance for algorithms
• can be used to show a solution exists
• and to show that a solution exists with particular qualities
• and to show that a solution exists where an algorithm will find it
• an essential tool in the design and analysis of randomized algorithms
caveat: non-constructive by definition

Alon & Spencer (2000) “The probabilistic method” 2nd edition
K5 is the complete graph on 5 vertices
A 5 player tournament is an orientation of K5 (each edge is given a direction)
A Hamiltonian path touches each vertex exactly once
Theorem
There is a tournament with n players and at least n!2^{1−n} Hamiltonian paths.

Proof
Let T be a random tournament on n players and let X be the number of Hamiltonian paths in T. For each permutation σ let X_σ be the indicator random variable for σ giving a Hamiltonian path, i.e., satisfying (σ(i), σ(i+1)) ∈ T for 1 ≤ i < n. Each of the n − 1 edges of σ is oriented the right way independently with probability 1/2, so E[X_σ] = 2^{1−n}. Then X = Σ_σ X_σ and

E[X] = Σ_σ E[X_σ] = n!2^{1−n}.

Thus some tournament has at least E[X] Hamiltonian paths. ∎
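The expectation in the proof can be checked by brute force for small n. The sketch below (illustration only, not from the slides) enumerates all 2^6 tournaments on n = 4 players and confirms that the average number of Hamiltonian paths equals n!2^{1−n} = 3.

```python
from itertools import combinations, permutations, product

def hamiltonian_paths(orient, n):
    # count permutations sigma with every edge (sigma(i), sigma(i+1)) in T
    return sum(
        all(orient[(sigma[i], sigma[i + 1])] for i in range(n - 1))
        for sigma in permutations(range(n))
    )

n = 4
edges = list(combinations(range(n), 2))
counts = []
for bits in product([0, 1], repeat=len(edges)):
    # one tournament: orient[(u, v)] is True when the edge points u -> v
    orient = {}
    for (u, v), b in zip(edges, bits):
        orient[(u, v)] = bool(b)
        orient[(v, u)] = not bool(b)
    counts.append(hamiltonian_paths(orient, n))

average = sum(counts) / len(counts)   # should equal 4! * 2**(1 - 4) = 3.0
```

Since the average is 3, at least one tournament attains 3 or more paths, which is exactly the probabilistic-method argument.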
The Covering Radius of a Code

formal problem statement
instance: a code F ⊆ {0,1}^n, F = {s_1, . . . , s_m}.
objective: find a vector z minimizing max_i d_H(s_i, z).

• d_H is the Hamming distance (number of mismatching positions)
• the vector z is called the center of F
• special case of the Closest String problem (highly approximable)
• an important problem in clustering and classification

example
F = {s_1, s_2, s_3, s_4}, z = 0110111100
s_1 = 0111111110
s_2 = 0000110110
s_3 = 1110111100
s_4 = 1111001111
d_H(s_4, z) = 6

Frances & Litman (1997) “On covering problems of codes”
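A few lines of code (illustrative, reusing the example instance above) verify the distances:

```python
def hamming(a, b):
    # Hamming distance: number of mismatching positions
    return sum(x != y for x, y in zip(a, b))

F = ["0111111110", "0000110110", "1110111100", "1111001111"]
z = "0110111100"
dists = [hamming(s, z) for s in F]   # s_4 is the farthest codeword from z
```

max(dists) is the quantity that z must minimize; here s_4 determines it.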
transformation to an integer program

identify vector s_i in F with row a_i of A; for each position j, row a_i has two entries a_{ij1}, a_{ij2}:
(s_{ij} = 0) ⇔ (a_{ij1} = 0) ∧ (a_{ij2} = 1)
(s_{ij} = 1) ⇔ (a_{ij1} = 1) ∧ (a_{ij2} = 0)

example transformation
s = 1 0 1 1 0 1 0 0 1 1
a = 10 01 10 10 01 10 01 01 10 10

the solution vector x encodes the center z with the opposite pairing, so that a_i x counts mismatches:
(z_j = 0) ⇔ (x_{j1} = 1) ∧ (x_{j2} = 0)
(z_j = 1) ⇔ (x_{j1} = 0) ∧ (x_{j2} = 1)

integer program:
minimize d
subject to Σ_{j=1}^n (a_{ij1} x_{j1} + a_{ij2} x_{j2}) ≤ d for all 1 ≤ i ≤ m,
∀ 1 ≤ j ≤ n: x_{j1} + x_{j2} = 1, x_{j1}, x_{j2} ∈ {0, 1}
fractional relaxation: x̄_{j1}, x̄_{j2} ∈ [0, 1]

example
F = {s_1, s_2, s_3, s_4}
s_1 = 0111111110
s_2 = 0000110110
s_3 = 1110111100
s_4 = 1111001111
A = [a_1, a_2, a_3, a_4]^T
a_1 = 01101010101010101001
a_2 = 01010101101001101001
a_3 = 10101001101010100101
a_4 = 10101010010110101010
x = 10010110010101011010 (encoding z = 0110111100)
a_4 x = 6
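The transformation can be sketched directly (an illustration, following the pairing in the example above): codeword bits map to the pairs (1,0)/(0,1) in a_i, the center uses the opposite pairing in x, and the dot product a_i · x then counts mismatches.

```python
def code_row(s):
    # codeword bit 1 -> pair (1, 0), bit 0 -> pair (0, 1)
    out = []
    for c in s:
        out += [1, 0] if c == "1" else [0, 1]
    return out

def center_vector(z):
    # opposite pairing: center bit 0 -> (1, 0), bit 1 -> (0, 1)
    out = []
    for c in z:
        out += [0, 1] if c == "1" else [1, 0]
    return out

a4 = code_row("1111001111")
x = center_vector("0110111100")
mismatches = sum(p * q for p, q in zip(a4, x))   # a_4 x = d_H(s_4, z)
```

Each position contributes 1 to the dot product exactly when s_{ij} ≠ z_j, so a_i x reproduces the Hamming distance.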
Randomized Rounding – The Basic Technique

input: a fractional vector x̄ ∈ R^n (a solution to the linear programming relaxation)
output: a random vector x ∈ Z^n distributed according to x̄

start with the solution to the linear programming relaxation

RANDOMIZED ROUNDING(x̄) {
    for each position j of x
        with probability x̄_{j1} set x_{j1} ← 1 and x_{j2} ← 0
        otherwise set x_{j1} ← 0 and x_{j2} ← 1
    return x
}

all solution vectors produced by the algorithm are valid solutions; some are just better than others

example: suppose x̄_{11} = 0.68 and x̄_{12} = 0.32; then
Pr[x_{11} = 1 ∧ x_{12} = 0] = 0.68 and Pr[x_{11} = 0 ∧ x_{12} = 1] = 0.32

key idea: treat the solution to the linear programming relaxation as a probability distribution

Raghavan & Thompson (1987) “Randomized rounding: a technique for provably good algorithms and algorithmic proofs”
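A minimal sketch of the rounding step (illustrative; the variable layout follows the integer program above, with x̄_{j1} as the probability):

```python
import random

def randomized_rounding(xbar, rng=random):
    # xbar[j] is the fractional value of x_{j1}, used as Pr[x_{j1} = 1]
    x = []
    for p in xbar:
        if rng.random() < p:
            x.append((1, 0))   # x_{j1} <- 1, x_{j2} <- 0
        else:
            x.append((0, 1))   # x_{j1} <- 0, x_{j2} <- 1
    return x
```

Every output is feasible for the covering-radius program because each pair sums to 1 by construction; only the objective value is random.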
three simple ideas used in probabilistic proofs

Linearity of Expectation
For any random variables X, Y and any a, b ∈ R,
E[aX + bY] = a E[X] + b E[Y].

Boole’s inequality
Let E be any set of events; then
Pr[∨_{E∈E} E] ≤ Σ_{E∈E} Pr[E],
regardless of any dependencies between the events.

Chernoff’s Bound
Let X = Σ_{j=1}^n X_j with the X_j ∈ {0, 1} independent, and let µ = E[X]; then for 0 < ε ≤ 1,
Pr[X > (1 + ε)µ] < (e^ε / (1 + ε)^{1+ε})^µ < exp(−ε²µ/3).
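For independent 0-1 variables the bound can be checked against the exact binomial tail; the parameters below are arbitrary illustrations, not from the slides.

```python
import math

def binom_tail_gt(n, p, k):
    # exact Pr[X > k] for X ~ Binomial(n, p)
    return sum(
        math.comb(n, j) * p**j * (1 - p) ** (n - j)
        for j in range(k + 1, n + 1)
    )

n, p, eps = 100, 0.3, 0.5
mu = n * p                                         # 30.0
exact = binom_tail_gt(n, p, int((1 + eps) * mu))   # Pr[X > 45]
bound = math.exp(-eps**2 * mu / 3)                 # exp(-eps^2 mu / 3)
```

The exact tail is far below the Chernoff estimate here; the bound trades tightness for a closed form that the later analysis can manipulate.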
What is the expected quality of the solution?

consider a vector s_i from F: what is the expected distance between s_i and z?
consider a row a_i from A: what is the expected value of a_i x?
by our transformation from Minimum Covering Radius to the integer program, these questions are equivalent

we can answer these questions because we know the probabilities: x̄ was used as the probability distribution when rounding

E[d_H(s_i, z)] = E[a_i x]
             = a_i E[x]
             = a_i (E[x_{11}], E[x_{12}], . . . , E[x_{n1}], E[x_{n2}])
             = a_i (x̄_{11}, x̄_{12}, . . . , x̄_{n1}, x̄_{n2})
             = a_i x̄

so we expect the randomly rounded solution vector to be of the same quality as the fractional solution, but it could be better or worse
What is the probability that the quality differs significantly from expectation?

the value of a_i x is a sum of 0-1 (Bernoulli) random variables, so we can apply Chernoff’s Bound:
Pr[X ≥ (1 + ε)µ] ≤ exp(−ε²µ/3)
Pr[a_i x ≥ (1 + ε) a_i x̄] ≤ exp(−ε² a_i x̄ / 3)

so now we have a bound on the probability that the product of x with any individual constraint row exceeds 1 + ε times its expectation

Hagerup & Rüb (1989) “A guided tour of Chernoff bounds”
How do we consider all rows simultaneously?

define the bad events E_i ≡ “a_i x > (1 + ε) a_i x̄”
a good solution must avoid all E_i, so we need to bound the probability of violating at least one constraint, Pr(∨_{i=1}^m E_i)

Boole’s inequality says Pr(∨_{i=1}^m E_i) ≤ Σ_{i=1}^m Pr(E_i)

for success with probability 1/2 it suffices that Pr(E_i) ≤ 1/(2m)

combining with Chernoff’s bound, we require
(e^ε / (1 + ε)^{1+ε})^{a_i x̄} < 1/(2m)
when we cannot bound the objective function from below
choosing 1 + ε = O(log m) makes Pr(E_i) < 1/(2m), so the probability of failure on a given attempt is < 1/2
conclusion: randomized rounding is a randomized O(log m)-approximation algorithm for Minimum Covering Radius.

with a large bound on the objective function, d ≥ 3 ln(2m)/ε²
then Pr(E_i) ≤ exp(−ε²d/3) ≤ 1/(2m), so the probability of failure on a given attempt is < 1/2
conclusion: for any ε > 0 we can derive the number of repetitions required to guarantee a performance ratio of 1 + ε.
for non-binary solution vectors, we can often use the fractional part as the probability:
rounding up:   Pr(x_i = ⌈x̄_i⌉) = x̄_i − ⌊x̄_i⌋
rounding down: Pr(x_i = ⌊x̄_i⌋) = ⌈x̄_i⌉ − x̄_i

• many NP-hard problems have fractional relaxations that are much easier than general linear programs: Fractional Knapsack reduces to sorting items in non-increasing order of value/weight
• the performance ratio is based on using the fractional relaxation as a lower bound; an exact solution is not required: a maximum matching algorithm can be used to obtain a fractional vertex cover that provides a lower bound on the optimal integral solution
• the method can be (and has been) applied to a very wide range of problems and has been shown to work well in practice: Akamai uses a variant of randomized rounding for performance-based load balancing
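The knapsack remark can be illustrated with a sketch (item data made up): the fractional optimum comes from a single sort by value/weight ratio, followed by a greedy pass.

```python
def fractional_knapsack(items, capacity):
    # items: (value, weight) pairs; fractions of items may be taken
    # greedy by value/weight ratio, best ratio first
    total = 0.0
    for value, weight in sorted(items, key=lambda vw: vw[0] / vw[1],
                                reverse=True):
        take = min(weight, capacity)
        total += value * take / weight
        capacity -= take
        if capacity == 0:
            break
    return total
```

This runs in O(k log k) for k items, while the 0-1 version of the same problem is NP-hard, which is the easy/hard contrast the bullet points to.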
Randomized Rounding – Feasibility

for Minimum Covering Radius, any solution produced by randomized rounding was valid; most integer programs are more difficult

a packing problem is the integer program max { cx | Ax ≤ b } with A, b, c, x ≥ 0

note: unlike Minimum Covering Radius, packing problems have constraints that are not trivially satisfied, because the right hand side b is fixed

examples
• Travelling Salesman: pack paths subject to degree 2 and cost constraints
• Job Shop Scheduling: pack jobs subject to machine load constraints

the point: it does not matter how good the solution is with respect to the objective function if a constraint a_i x ≤ b_i is violated; randomized rounding can lead to violation of a constraint
the solution is to scale the variables used as probabilities, prior to rounding, so that the rounding process is likely to result in a feasible solution

key idea: multiply the probabilities by some fraction less than 1

define the scaled vector x′ = x̄/(1 + ε) and use x′ instead of x̄ as the probabilities when rounding

analysis
Pr(a_i x > (1 + ε) a_i x′) = Pr(a_i x > a_i x̄) < exp(−ε² a_i x̄ / 3(1 + ε))

as before, the analysis proceeds using linearity of expectation, Chernoff’s bound and Boole’s inequality; to calculate the performance ratio we minimize ε subject to
ε²(1 − ε) > 2 ln(4m) / (a_i x̄)
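The scaling step changes one line of the earlier rounding sketch (illustrative, binary variables assumed):

```python
import random

def scaled_randomized_rounding(xbar, eps, rng=random):
    # use xbar_j / (1 + eps) as Pr[x_j = 1]: deflated probabilities make
    # it likely that every packing constraint a_i x <= b_i is satisfied
    return [1 if rng.random() < p / (1.0 + eps) else 0 for p in xbar]
```

The cost of scaling is a factor of 1/(1 + ε) in the expected objective value, which is what the performance-ratio calculation above accounts for.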
Randomized Rounding – Derandomization

two kinds of (algorithmic) derandomization

partial derandomization: reduce the number of coin flips (random bits) required; often the goal is to produce better pseudorandom number generators

full derandomization: eliminate the use of random bits (no coin flips); it is sufficient to reduce the random bits to O(log n) for a problem of size n, since the number of sequences of O(log n) random bits is bounded by a polynomial

remark: complexity theoretic derandomization means proving that certain randomized complexity classes are equivalent to deterministic classes they are known to contain; similar in flavor to proving P=NP, but much more plausible, especially given PRIMES ∈ P
if a particular coordinate is rounded to a specified value, is the probability of failure still < 1?
with an oracle for this question, we would not need random choices

define the event fail ≡ ∨_{i=1}^m E_i
prior to the first random choice, Pr(fail) < 1
the first decision requires selecting from 2 possible choices:
Pr(fail) = (Pr(fail | x_{11} = 1) + Pr(fail | x_{11} = 0)) / 2
⇒ min { Pr(fail | x_{11} = 1), Pr(fail | x_{11} = 0) } ≤ Pr(fail)

if we knew the conditional probabilities we could make each choice so that Pr(fail) does not increase
Pr(fail | all choices made) ∈ {0, 1}
since the probability of failure did not increase, all E_i have been avoided

key idea: use a probability function to guide a greedy algorithm

Erdos & Selfridge (1973) “On a combinatorial game”
the difficulty with conditional probabilities is computing them efficiently; the idea of a pessimistic estimator was introduced to characterize functions that can be used in place of the true conditional probability

3 defining properties
• the estimator always bounds the probability of failure from above
• the estimator initially indicates that the probability of failure is not 1
• the value of the estimator never increases as the computation proceeds

known pessimistic estimators are often discovered by careful examination of the probabilistic proofs for the randomized algorithm; pessimistic estimators hide in the proofs of bounds like Chernoff’s

surprising observation: the deterministic algorithm has performance at least as good as the expected performance of the randomized algorithm

Prabhakar Raghavan (1988) “Probabilistic construction of deterministic algorithms: approximating packing integer programs”
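A sketch of the scheme (illustration only; the estimator is the standard exponential-moment bound hiding in Chernoff's proof, and the instance data in the test are made up): variables are fixed one at a time, each time choosing the value that does not increase the estimator, which upper-bounds the probability that some constraint Σ_j A_ij x_j exceeds its threshold t_i.

```python
def derandomized_rounding(A, p, t, delta):
    # A: 0/1 constraint matrix (m rows, n columns)
    # p[j]: Pr[x_j = 1] from the fractional relaxation
    # t[i]: allowed load of constraint i; delta > 0: slack parameter
    # pessimistic estimator for a prefix of fixed values x_fixed:
    #   Phi = sum_i (1+delta)^(S_i) * prod_{unfixed j} (1 + delta p_j A_ij)
    #         / (1+delta)^(t_i)
    # each product term is E[(1+delta)^(A_ij x_j)] for the unfixed x_j
    m, n = len(A), len(p)

    def phi(x_fixed):
        k = len(x_fixed)
        total = 0.0
        for i in range(m):
            s = sum(A[i][j] * x_fixed[j] for j in range(k))
            rest = 1.0
            for j in range(k, n):
                rest *= 1.0 + delta * p[j] * A[i][j]
            total += (1 + delta) ** s * rest / (1 + delta) ** t[i]
        return total

    x = []
    for j in range(n):
        # Phi is an average over the two choices, so the minimum never
        # exceeds the current value: greedily keep Phi from increasing
        if phi(x + [1]) <= phi(x + [0]):
            x.append(1)
        else:
            x.append(0)
    return x
```

If the initial Φ is below 1, the final Φ (a sum of terms (1+δ)^{S_i − t_i}) is also below 1, forcing every S_i ≤ t_i, so the deterministic output avoids all bad events.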
Randomized Rounding – Recent Advances

a very active area of research, built on two probabilistic ideas:
• correlated events (e.g. planarity and hamiltonicity of graphs)
• events with limited dependence (e.g. coloring sparse graphs)
randomized algorithm: high probability
derandomized algorithm: positive probability
can we show positive probability when high probability does not hold?

let F be a family of subsets of N = {1, . . . , n}
F is monotone decreasing if (X ∈ F) ∧ (X′ ⊆ X) ⇒ (X′ ∈ F)
F is monotone increasing if (X ∈ F) ∧ (X ⊆ X′) ⇒ (X′ ∈ F)

Theorem
Given a set N = {1, . . . , n} and some p ∈ [0, 1]^n, suppose we select a random Y ⊆ N by placing each i ∈ N in Y independently with probability p_i. For any F ⊆ 2^N, let Pr(F) ≡ Pr(Y ∈ F). Let F_1, F_2, . . . , F_s ⊆ 2^N be any sequence of monotone decreasing families. Then

Pr(∧_{i=1}^s F_i) ≥ Π_{i=1}^s Pr(F_i).

If we assume a symmetric probability space with Pr(F_i) ≥ p for all i, then Pr(∧_{i=1}^s F_i) ≥ p^s.

Fortuin, Kasteleyn & Ginibre (1971) “Correlation inequalities on some partially ordered sets”
let N be the set of positions in the solution vector x
for S ⊆ N, let x_S be the 0-1 vector with ones exactly at the positions in S
let the families be F_i = { S ⊆ N : a_i x_S ≤ b_i }
each family F_i corresponds to the set of solutions avoiding the bad event E_i, and each F_i fits the definition of a monotone decreasing family

for covering and packing, the events of constraint satisfaction are positively correlated: we may treat them as though they are at least as probable as independent events

if each Pr(E_i) ≤ p, the union bound gives Pr(∨_{i=1}^m E_i) ≤ mp, which is useless when p ≥ 1/m; with positive correlation, if each Pr(F_i) ≥ q > 0, then
Pr(∨_{i=1}^m E_i) ≤ 1 − Pr(∧_{i=1}^m F_i) ≤ 1 − Π_{i=1}^m Pr(F_i) ≤ 1 − q^m < 1

this positive probability is too small for a randomized algorithm, but conditional probabilities can be used because a pessimistic estimator is known

Srinivasan (1995) “Improved approximations of packing and covering problems”
two forms of the Lovász Local Lemma: general and symmetric; the general form is more powerful but difficult to use in algorithms

Lovász Local Lemma (Symmetric Form)
Let Y = {y_1, . . . , y_m} be events in an arbitrary probability space. Suppose each event y_i is mutually independent of all other events y_j with the exception of at most d of them, and Pr(y_i) ≤ p for all 1 ≤ i ≤ m. If ep(d + 1) ≤ 1 then
Pr(∧_{i=1}^m ȳ_i) > 0.
“e” is the base of the natural logarithm

the symmetric form is easy to apply in existence proofs
bonus: most objects shown to exist by the symmetric Lovász Local Lemma can be found with a general algorithm known as Beck’s hypergraph coloring algorithm

J. Beck (1991) “An algorithmic approach to the Lovász local lemma I”
Erdos & Lovász (1975) “Problems and results on 3-chromatic hypergraphs and some related questions”
the Lovász Local Lemma applied to integer programming

the events of interest are the E_i previously defined
E_i and E_j are dependent ⇔ a_i and a_j share a variable

example: a_1 and a_3 are dependent (they share the variables at positions 8 and 9)
a_1 = 00001111100000000000000000
a_2 = 00000000000000100111100000
a_3 = 00000001111111000000000000

Very high level overview of Randomized Rounding with Beck’s Algorithm
1) do basic randomized rounding
2) remove all variables that do not appear in a violated constraint
3) remove all constraints for which all variables have been removed
4) form a graph on the set of remaining constraints, with vertices corresponding to constraints and edges between constraints sharing a variable
5) key idea: with high probability the connected components in the graph are small, and can therefore be treated independently and efficiently
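Steps 4 and 5 above can be sketched as follows (illustration; the violated-constraint data in the test are hypothetical): constraints become vertices, sharing a variable creates an edge, and a union-find yields the connected components that are then treated independently.

```python
from collections import defaultdict

def constraint_components(rows):
    # rows[i]: set of variable indices appearing in (violated) constraint i
    # returns the connected components of the graph whose vertices are
    # constraints, with an edge whenever two constraints share a variable
    var_to_rows = defaultdict(list)
    for i, row in enumerate(rows):
        for v in row:
            var_to_rows[v].append(i)

    # union-find over constraint indices, with path halving
    parent = list(range(len(rows)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for members in var_to_rows.values():
        for other in members[1:]:
            parent[find(members[0])] = find(other)

    comps = defaultdict(list)
    for i in range(len(rows)):
        comps[find(i)].append(i)
    return sorted(sorted(c) for c in comps.values())
```

Each returned component can be re-rounded or searched exhaustively on its own, which is what makes the high-probability smallness of components algorithmically useful.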
Chi-Jen Lu (1999) “A deterministic approximation algorithm for a minimax integer programming problem”
Thanks to Prof. Horton and Prof. Evans for helping me with this course.
There is lots more to say; ask about applying these methods to your problems (especially routing problems).