Sublinear Algorithms for Big Data
Lecture 1
Grigory Yaroslavtsev
http://grigory.us

Part 0: Introduction
• Disclaimers
• Logistics
• Materials
• …

Name
Correct:
• Grigory
• Gregory (easiest and highly recommended!)
Also correct:
• Dr. Yaroslavtsev (I bet it's difficult to pronounce)
Wrong:
• Prof. Yaroslavtsev (not any easier)

Disclaimers
• A lot of math!
• No programming!
• 10-15 times longer than "Fuerza Bruta", a soccer game, a milonga…

Big Data
• Data
• Programming and Systems
• Algorithms
• Probability and Statistics

Sublinear Algorithms
$n$ = size of the data; we want $o(n)$, not $O(n)$.
• Sublinear time
– Queries
– Samples
• Sublinear space
– Data streams
– Sketching
• Distributed algorithms
– Local and distributed computations
– MapReduce-style algorithms

Why is it useful?
• Algorithms for big data are used by big companies: ultra-fast randomized algorithms for approximate decision making.
– Networking applications (counting and detecting patterns in small space)
– Distributed computations (small sketches to reduce communication overheads)
• Aggregate Knowledge: a startup doing streaming algorithms, acquired for $150M.
• Today: applications to soccer.

Course Materials
• Will be posted at the class homepage: http://grigory.us/big-data.html
• Related and further reading:
– Sublinear Algorithms (MIT) by Indyk and Rubinfeld
– Algorithms for Big Data (Harvard) by Nelson
– Data Stream Algorithms (University of Massachusetts) by McGregor
– Sublinear Algorithms (Penn State) by Raskhodnikova

Course Overview
• Lectures 1-5
• 3 hours = 3 x (45-50 min lecture + 10-15 min break)

Puzzles
You see a sequence of values $x_1, \dots, x_n$, arriving one by one.

Puzzle #1 (easy, "Find a missing player")
• If all the $x_i$'s are different and have values between $1$ and $n+1$, which value is missing?
• You have $O(\log n)$ space.
• Example: there are 11 soccer players with numbers 1, …, 11. You see 10 of them one by one, and you can only remember a single number. Which number is missing?

Puzzle #2 (harder, "Keep a random team")
• How can you maintain a uniformly random sample of $S$ values out of those you have seen so far?
• You can store exactly $S$ items at any time.
• Example: you want a team of 11 players chosen uniformly at random from the set you have seen. Players arrive one at a time and you have to decide whether to keep them or not.

Puzzle #3 (very hard, "Count the number of players")
• What is the total number of values, up to error $\pm \epsilon n$?
• You have $O(\log \log n / \epsilon^2)$ space and can be completely wrong with some small probability.
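To make the streaming model behind these puzzles concrete, here is a minimal one-pass sketch in Python (an illustration added here, not from the slides; the name one_pass_stats is mine). The algorithm sees each item exactly once and keeps only $O(\log n)$ bits of state, a count and a running maximum, no matter how long the stream is.

    def one_pass_stats(stream):
        """Return (count, maximum) of a stream using O(log n) bits of state."""
        count, maximum = 0, None
        for x in stream:  # each item is seen exactly once, then discarded
            count += 1
            maximum = x if maximum is None else max(maximum, x)
        return count, maximum

    # Example: a "stream" of player numbers arriving one by one.
    print(one_pass_stats(iter([3, 7, 1, 11, 5])))  # -> (5, 11)

The puzzles above ask for exactly this kind of one-pass algorithm, with more interesting state.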
Part 1: Probability 101
"The bigger the data, the better you should know your probability."
• Basic Spanish: Hola, Gracias, Bueno, Por favor, Bebida, Comida, Jamon, Queso, Gringo, Chica, Amigo, …
• Basic probability:
– Probability, events, random variables
– Expectation, variance / standard deviation
– Conditional probability, independence, pairwise independence, mutual independence

Expectation
• $X$ = random variable with values $x_1, \dots, x_n, \dots$
• Expectation: $E[X] = \sum_{i=1}^{\infty} x_i \cdot \Pr[X = x_i]$
• Properties (linearity): $E[cX] = c\,E[X]$ and $E[X + Y] = E[X] + E[Y]$
• Useful fact: if all $x_i \ge 0$ and integer, then $E[X] = \sum_{i=1}^{\infty} \Pr[X \ge i]$

Expectation: Example
• A die takes values $1, 2, \dots, 6$, each with probability $1/6$:
$E[\mathrm{Value}] = \sum_{i=1}^{6} i \cdot \Pr[\mathrm{Value} = i] = \frac{1}{6}\sum_{i=1}^{6} i = \frac{21}{6} = 3.5$

Variance
• Variance: $Var[X] = E[(X - E[X])^2]$
$Var[X] = E[(X - E[X])^2] = E[X^2 - 2X \cdot E[X] + E[X]^2] = E[X^2] - 2E[X \cdot E[X]] + E[E[X]^2]$
• $E[X]$ is some fixed value (a constant), so:
– $2E[X \cdot E[X]] = 2E[X] \cdot E[X] = 2E^2[X]$
– $E[E[X]^2] = E^2[X]$
• Hence $Var[X] = E[X^2] - 2E^2[X] + E^2[X] = E[X^2] - E^2[X]$
• Corollary: $Var[cX] = c^2\,Var[X]$

Variance: Example
• Variance of a fair die, using $E[\mathrm{Value}] = 3.5$:
$Var[\mathrm{Value}] = E[(\mathrm{Value} - 3.5)^2] = \sum_{i=1}^{6} (i - 3.5)^2 \cdot \Pr[\mathrm{Value} = i]$
$= \frac{1}{6}\left[(1 - 3.5)^2 + (2 - 3.5)^2 + (3 - 3.5)^2 + (4 - 3.5)^2 + (5 - 3.5)^2 + (6 - 3.5)^2\right]$
$= \frac{1}{6}\left[6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25\right] = \frac{8.75}{3} \approx 2.917$
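Both moments are easy to check empirically. The sketch below (my addition, not from the slides; dice_moments is a made-up helper name) simulates a large number of rolls and compares the sample mean and variance against the exact values $3.5$ and $8.75/3 \approx 2.917$ computed above.

    import random

    def dice_moments(trials=200_000, seed=0):
        """Estimate E[Value] and Var[Value] of a fair die by simulation."""
        rng = random.Random(seed)
        rolls = [rng.randint(1, 6) for _ in range(trials)]
        mean = sum(rolls) / trials
        var = sum((r - mean) ** 2 for r in rolls) / trials
        return mean, var

    mean, var = dice_moments()
    print(f"empirical E[Value]   ~ {mean:.3f} (exact 3.5)")
    print(f"empirical Var[Value] ~ {var:.3f} (exact {8.75 / 3:.3f})")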
Independence
• Two random variables $X$ and $Y$ are independent if and only if (iff) for every $x, y$:
$\Pr[X = x, Y = y] = \Pr[X = x] \cdot \Pr[Y = y]$
• Variables $X_1, \dots, X_n$ are mutually independent iff
$\Pr[X_1 = x_1, \dots, X_n = x_n] = \prod_{i=1}^{n} \Pr[X_i = x_i]$
• Variables $X_1, \dots, X_n$ are pairwise independent iff for all pairs $i, j$:
$\Pr[X_i = x_i, X_j = x_j] = \Pr[X_i = x_i] \cdot \Pr[X_j = x_j]$

Independence: Example
Independent or not?
• Event $E_1$ = Argentina wins the World Cup
• Event $E_2$ = Messi becomes the best striker
Independent or not?
• Event $E_1$ = Argentina wins against Netherlands in the semifinals
• Event $E_2$ = Germany wins against Brazil in the semifinals

Independence: Example
• Ratings of mortgage securities:
– AAA = 1% probability of default (over X years)
– AA = 2% probability of default
– A = 5% probability of default
– B = 10% probability of default
– C = 50% probability of default
– D = 100% probability of default
• You are a portfolio holder with 1000 AAA securities.
– Are they all independent?
– Is the probability that they all default $0.01^{1000} = 10^{-2000}$?

Conditional Probabilities
• For two events $E_1$ and $E_2$:
$\Pr[E_2 \mid E_1] = \frac{\Pr[E_1 \text{ and } E_2]}{\Pr[E_1]}$
• If two random variables (r.v.s) are independent:
$\Pr[Y_2 = x_2 \mid Y_1 = x_1] = \frac{\Pr[Y_1 = x_1 \text{ and } Y_2 = x_2]}{\Pr[Y_1 = x_1]}$ (by definition)
$= \frac{\Pr[Y_1 = x_1] \cdot \Pr[Y_2 = x_2]}{\Pr[Y_1 = x_1]}$ (by independence)
$= \Pr[Y_2 = x_2]$

Union Bound
For any events $E_1, \dots, E_n$:
$\Pr[E_1 \text{ or } E_2 \text{ or } \dots \text{ or } E_n] \le \Pr[E_1] + \Pr[E_2] + \dots + \Pr[E_n]$
• Pro: works even for dependent events!
• Con: sometimes very loose, especially for mutually independent events, where
$\Pr[E_1 \text{ or } E_2 \text{ or } \dots \text{ or } E_n] = 1 - \prod_{i=1}^{n}(1 - \Pr[E_i])$

Union Bound: Example
The events "Argentina wins the World Cup" and "Messi becomes the best striker" are not independent, but:
Pr["Argentina wins the World Cup" or "Messi becomes the best striker"]
≤ Pr["Argentina wins the World Cup"] + Pr["Messi becomes the best striker"]

Independence and Linearity of Expectation/Variance
• Linearity of expectation (even for dependent variables!):
$E\left[\sum_{i=1}^{n} Y_i\right] = \sum_{i=1}^{n} E[Y_i]$
• Linearity of variance (only for pairwise independent variables!):
$Var\left[\sum_{i=1}^{n} Y_i\right] = \sum_{i=1}^{n} Var[Y_i]$

Part 2: Inequalities
• Markov's inequality
• Chebyshev's inequality
• Chernoff bound

Markov's Inequality
• For every $c > 0$ (and non-negative $X$): $\Pr[X \ge c\,E[X]] \le \frac{1}{c}$
• Proof (by contradiction): suppose $\Pr[X \ge c\,E[X]] > \frac{1}{c}$. Then:
$E[X] = \sum_i i \cdot \Pr[X = i]$ (by definition)
$\ge \sum_{i \ge c E[X]} i \cdot \Pr[X = i]$ (pick only some $i$'s)
$\ge \sum_{i \ge c E[X]} c\,E[X] \cdot \Pr[X = i]$ (since $i \ge c\,E[X]$)
$= c\,E[X] \sum_{i \ge c E[X]} \Pr[X = i]$ (by linearity)
$= c\,E[X] \cdot \Pr[X \ge c\,E[X]]$ (same sum as above)
$> c\,E[X] \cdot \frac{1}{c} = E[X]$ (by the assumption), a contradiction.

Markov's Inequality
• For every $c > 0$: $\Pr[X \ge c\,E[X]] \le \frac{1}{c}$
• Corollary ($c' = c\,E[X]$): for every $c' > 0$: $\Pr[X \ge c'] \le \frac{E[X]}{c'}$
• Pro: always works (for non-negative $X$)!
• Cons:
– Not very precise
– Doesn't work for the lower tail: $\Pr[X \le c\,E[X]]$

Markov's Inequality: Example
Markov 1: for every $c > 0$: $\Pr[X \ge c\,E[X]] \le \frac{1}{c}$
• Example:
$\Pr[\mathrm{Value} \ge 1.5 \cdot E[\mathrm{Value}]] = \Pr[\mathrm{Value} \ge 1.5 \cdot 3.5] = \Pr[\mathrm{Value} \ge 5.25] \le \frac{1}{1.5} = \frac{2}{3}$
$\Pr[\mathrm{Value} \ge 2 \cdot E[\mathrm{Value}]] = \Pr[\mathrm{Value} \ge 2 \cdot 3.5] = \Pr[\mathrm{Value} \ge 7] \le \frac{1}{2}$

Markov's Inequality: Example
Markov 2: for every $c' > 0$: $\Pr[X \ge c'] \le \frac{E[X]}{c'}$
• Example (true probabilities in parentheses):
$\Pr[\mathrm{Value} \ge 4] \le \frac{3.5}{4} = 0.875$ (true value: $0.5$)
$\Pr[\mathrm{Value} \ge 5] \le \frac{3.5}{5} = 0.7$ (true value: $\approx 0.33$)
$\Pr[\mathrm{Value} \ge 6] \le \frac{3.5}{6} \approx 0.58$ (true value: $\approx 0.17$)
$\Pr[\mathrm{Value} \ge 3] \le \frac{3.5}{3} \approx 1.17$ (true value: $\approx 0.67$; a bound above 1 is vacuous)

Markov's Inequality: Example
Markov 2 on the lower tail, via $\Pr[\mathrm{Value} \le z] = \Pr[(7 - \mathrm{Value}) \ge 7 - z]$ and $E[7 - \mathrm{Value}] = 3.5$:
$\Pr[\mathrm{Value} \le 3] \le \frac{E[7 - \mathrm{Value}]}{4} = \frac{3.5}{4} = 0.875$ (true value: $0.5$)
$\Pr[\mathrm{Value} \le 2] \le \frac{E[7 - \mathrm{Value}]}{5} = \frac{3.5}{5} = 0.7$ (true value: $\approx 0.33$)
$\Pr[\mathrm{Value} \le 1] \le \frac{E[7 - \mathrm{Value}]}{6} = \frac{3.5}{6} \approx 0.58$ (true value: $\approx 0.17$)
$\Pr[\mathrm{Value} \le 4] \le \frac{E[7 - \mathrm{Value}]}{3} = \frac{3.5}{3} \approx 1.17$ (true value: $\approx 0.67$)

Markov + Union Bound: Example
Markov 2: for every $c' > 0$: $\Pr[X \ge c'] \le \frac{E[X]}{c'}$
• Example:
$\Pr[\mathrm{Value} \ge 4 \text{ or } \mathrm{Value} \le 3] \le \Pr[\mathrm{Value} \ge 4] + \Pr[\mathrm{Value} \le 3] \le 0.875 + 0.875 = 1.75$,
a vacuous bound; we need a sharper tool.

Chebyshev's Inequality
• For every $c > 0$:
$\Pr\left[|X - E[X]| \ge c\sqrt{Var[X]}\right] \le \frac{1}{c^2}$
• Proof:
$\Pr\left[|X - E[X]| \ge c\sqrt{Var[X]}\right]$
$= \Pr\left[(X - E[X])^2 \ge c^2\,Var[X]\right]$ (by squaring)
$= \Pr\left[(X - E[X])^2 \ge c^2\,E[(X - E[X])^2]\right]$ (definition of $Var$)
$\le \frac{1}{c^2}$ (by Markov's inequality)

Chebyshev's Inequality
• For every $c > 0$: $\Pr\left[|X - E[X]| \ge c\sqrt{Var[X]}\right] \le \frac{1}{c^2}$
• Corollary ($c' = c\sqrt{Var[X]}$): for every $c' > 0$:
$\Pr[|X - E[X]| \ge c'] \le \frac{Var[X]}{c'^2}$

Chebyshev: Example
• For every $c' > 0$: $\Pr[|X - E[X]| \ge c'] \le \frac{Var[X]}{c'^2}$
• $E[\mathrm{Value}] = 3.5$; $Var[\mathrm{Value}] \approx 2.91$:
$\Pr[\mathrm{Value} \ge 4 \text{ or } \mathrm{Value} \le 3] = \Pr[|\mathrm{Value} - 3.5| \ge 0.5] \le \frac{2.91}{0.5^2} \approx 11.6$,
still vacuous for a single roll; averaging many rolls is what helps.

Chebyshev: Example
• Roll a die 10 times; let $\mathrm{Value}_{10}$ = average value over the 10 rolls.
$\Pr[\mathrm{Value}_{10} \ge 4 \text{ or } \mathrm{Value}_{10} \le 3] = \,?$
• Let $Y_i$ = value of the $i$-th roll, and $X = \frac{1}{10}\sum_{i=1}^{10} Y_i$.
• Variance (by linearity for independent r.v.s):
$Var[X] = Var\left[\frac{1}{10}\sum_{i=1}^{10} Y_i\right] = \frac{1}{100}\,Var\left[\sum_{i=1}^{10} Y_i\right] = \frac{1}{100}\sum_{i=1}^{10} Var[Y_i] \approx \frac{1}{100} \cdot 10 \cdot 2.91 = 0.291$
• So $Var[\mathrm{Value}_{10}] = 0.291$ (for $n$ rolls: $2.91/n$), and:
$\Pr[\mathrm{Value}_{10} \ge 4 \text{ or } \mathrm{Value}_{10} \le 3] \le \frac{0.291}{0.5^2} \approx 1.16$
$\Pr[\mathrm{Value}_{n} \ge 4 \text{ or } \mathrm{Value}_{n} \le 3] \le \frac{2.91}{n \cdot 0.5^2} = \frac{11.6}{n}$
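The $11.6/n$ bound is easy to compare against reality. The sketch below (my addition, not from the slides; deviation_probability is a made-up name) estimates $\Pr[\mathrm{Value}_n \ge 4 \text{ or } \mathrm{Value}_n \le 3]$ by simulation: for $n = 10$ the bound is vacuous, and for $n = 100$ it holds but is quite loose, which motivates the Chernoff bound that follows.

    import random

    def deviation_probability(n, trials=50_000, seed=0):
        """Estimate Pr[Value_n >= 4 or Value_n <= 3] for the average of n rolls."""
        rng = random.Random(seed)
        bad = 0
        for _ in range(trials):
            avg = sum(rng.randint(1, 6) for _ in range(n)) / n
            if avg >= 4 or avg <= 3:
                bad += 1
        return bad / trials

    for n in (10, 100):
        print(f"n={n}: empirical {deviation_probability(n):.4f}, "
              f"Chebyshev bound {11.6 / n:.3f}")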
Chernoff Bound
• Let $Y_1, \dots, Y_t$ be independent and identically distributed r.v.s with range $[0, 1]$ and expectation $\mu$.
• Then if $X = \frac{1}{t}\sum_i Y_i$ and $1 > \delta > 0$:
$\Pr[|X - \mu| \ge \delta\mu] \le 2\exp\left(-\frac{\mu t \delta^2}{3}\right)$

Chernoff Bound (corollary)
• Let $Y_1, \dots, Y_t$ be independent and identically distributed r.v.s with range $[0, c]$ and expectation $\mu$.
• Then if $X = \frac{1}{t}\sum_i Y_i$ and $1 > \delta > 0$:
$\Pr[|X - \mu| \ge \delta\mu] \le 2\exp\left(-\frac{\mu t \delta^2}{3c}\right)$

Chernoff: Example
• Roll a die 10 times; $\mathrm{Value}_{10}$ = average value over the 10 rolls.
$\Pr[\mathrm{Value}_{10} \ge 4 \text{ or } \mathrm{Value}_{10} \le 3] = \,?$
– $X = \mathrm{Value}_{10}$, $t = 10$, $c = 6$
– $\mu = E[Y_i] = 3.5$
– $\delta = \frac{0.5}{3.5} = \frac{1}{7}$
• $\Pr[\mathrm{Value}_{10} \ge 4 \text{ or } \mathrm{Value}_{10} \le 3] \le 2\exp\left(-\frac{3.5 \cdot 10}{3 \cdot 6 \cdot 49}\right) = 2\exp\left(-\frac{35}{882}\right) \approx 2 \cdot 0.96 = 1.92$

Chernoff: Example
• Roll a die 1000 times; $\mathrm{Value}_{1000}$ = average value over the 1000 rolls.
$\Pr[\mathrm{Value}_{1000} \ge 4 \text{ or } \mathrm{Value}_{1000} \le 3] = \,?$
– $X = \mathrm{Value}_{1000}$, $t = 1000$, $c = 6$
– $\mu = E[Y_i] = 3.5$, $\delta = \frac{1}{7}$
• $\Pr[\mathrm{Value}_{1000} \ge 4 \text{ or } \mathrm{Value}_{1000} \le 3] \le 2\exp\left(-\frac{3.5 \cdot 1000}{3 \cdot 6 \cdot 49}\right) = 2\exp\left(-\frac{3500}{882}\right) \approx 2\exp(-3.96) \approx 2 \cdot 0.02 = 0.04$

Chernoff vs. Chebyshev: Example
Let $\sigma^2 = Var[Y_i]$ and $X = \frac{1}{t}\sum_i Y_i$:
• Chebyshev: $\Pr[|X - \mu| \ge c'] \le \frac{Var[X]}{c'^2} = \frac{\sigma^2}{t\,c'^2}$
• Chernoff: $\Pr[|X - \mu| \ge \delta\mu] \le 2\exp\left(-\frac{\mu t \delta^2}{3c}\right)$
If $t$ is very big, the values $c, \mu, \delta, \sigma, c'$ are all constants, so:
– Chebyshev: $\Pr[|X - \mu| \ge z] = O\left(\frac{1}{t}\right)$
– Chernoff: $\Pr[|X - \mu| \ge z] = e^{-\Omega(t)}$

Chernoff vs. Chebyshev: Example
Large values of $t$ are exactly what we need!
• Chebyshev: $\Pr[|X - \mu| \ge z] = O\left(\frac{1}{t}\right)$
• Chernoff: $\Pr[|X - \mu| \ge z] = e^{-\Omega(t)}$
So is Chernoff always better for us?
• Yes, if we have i.i.d. variables.
• No, if we have dependent or only pairwise independent random variables.
• If the variables are independent but not identical, Chernoff-type bounds exist.

Answers to the Puzzles
You see a sequence of values $x_1, \dots, x_n$, arriving one by one:
• (Easy) If all the $x_i$'s are different and have values between $1$ and $n+1$, which value is missing? You have $O(\log n)$ space.
– Answer: missing value $= \sum_{i=1}^{n+1} i - \sum_{i=1}^{n} x_i$
• (Harder) How can you maintain a uniformly random sample of $S$ values out of those you have seen so far? You can store exactly $S$ values at any time.
– Answer: store the first values $x_1, \dots, x_S$. When you see $x_i$ for $i > S$, with probability $S/i$ replace a uniformly random value from your storage with $x_i$. (Both answers are sketched in code below.)
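Both answers fit in a few lines. Here is a minimal Python sketch of the two solutions (my code, following the answers above; the function names are mine). The second function is the reservoir-sampling scheme just described: keep the first $S$ items, then keep item $i$ with probability $S/i$, evicting a uniformly random stored item.

    import random

    def missing_value(stream, n):
        """Puzzle #1: n distinct values from {1, ..., n+1}; O(log n) space."""
        total = (n + 1) * (n + 2) // 2     # sum of 1, ..., n+1
        return total - sum(stream)         # the one value we never saw

    def reservoir_sample(stream, S, seed=0):
        """Puzzle #2: uniform sample of S items from a stream of unknown length."""
        rng = random.Random(seed)
        sample = []
        for i, x in enumerate(stream, start=1):
            if i <= S:
                sample.append(x)              # keep the first S items
            elif rng.random() < S / i:        # keep item i with probability S/i
                sample[rng.randrange(S)] = x  # evict a uniformly random slot
        return sample

    print(missing_value([1, 2, 4, 5, 6, 7, 8, 9, 10, 11], n=10))  # -> 3
    print(reservoir_sample(range(1, 100), S=11))                  # 11 random players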
Part 3: Morris's Algorithm
• (Very hard, "Count the number of players")
– What is the total number of values, up to error $\pm \epsilon n$?
– You have $O(\log \log n / \epsilon^2)$ space and can be completely wrong with some small probability.

Morris's Algorithm: Alpha-Version
Maintains a counter $X$ using $\log \log n$ bits:
• Initialize $X$ to $0$.
• When an item arrives, increase $X$ by $1$ with probability $\frac{1}{2^X}$.
• When the stream is over, output $2^X - 1$.
Claim: $E[2^X] = n + 1$

Morris's Algorithm: Alpha-Version
Claim: $E[2^X] = n + 1$
• Let $X_n$ be the value of the counter after seeing $n$ items:
$E[2^{X_n}] = \sum_{j=0}^{\infty} \Pr[X_{n-1} = j] \cdot E[2^{X_n} \mid X_{n-1} = j]$
$= \sum_{j=0}^{\infty} \Pr[X_{n-1} = j]\left(\frac{1}{2^j} \cdot 2^{j+1} + \left(1 - \frac{1}{2^j}\right)2^j\right)$
$= \sum_{j=0}^{\infty} \Pr[X_{n-1} = j]\left(2^j + 1\right) = E[2^{X_{n-1}}] + 1$
• Since $E[2^{X_0}] = 1$, induction gives $E[2^{X_n}] = n + 1$.

Morris's Algorithm: Alpha-Version
Claim: $E[2^{2X_n}] = \frac{3}{2}n^2 + \frac{3}{2}n + 1$
$E[2^{2X_n}] = \sum_{j} \Pr[2^{X_{n-1}} = j] \cdot E[2^{2X_n} \mid 2^{X_{n-1}} = j]$
$= \sum_{j} \Pr[2^{X_{n-1}} = j]\left(\frac{1}{j} \cdot 4j^2 + \left(1 - \frac{1}{j}\right)j^2\right)$
$= \sum_{j} \Pr[2^{X_{n-1}} = j]\left(j^2 + 3j\right) = E[2^{2X_{n-1}}] + 3E[2^{X_{n-1}}]$
$= \left(\frac{3}{2}(n-1)^2 + \frac{3}{2}(n-1) + 1\right) + 3n = \frac{3}{2}n^2 + \frac{3}{2}n + 1$

Morris's Algorithm: Alpha-Version
• $E[2^X] = n + 1$ and
$Var[2^X] = E[2^{2X}] - E^2[2^X] = \frac{3}{2}n^2 + \frac{3}{2}n + 1 - (n + 1)^2 = \frac{n^2 - n}{2} = O(n^2)$
• Is this good? Not yet: the standard deviation is $\Theta(n)$, so a single counter can be off by a constant factor.

Morris's Algorithm: Beta-Version
Maintains $t$ counters $X^1, \dots, X^t$ using $\log \log n$ bits each:
• Initialize each $X^i$ to $0$; when an item arrives, increase each $X^i$ by $1$ independently with probability $\frac{1}{2^{X^i}}$.
• Output $Z = \frac{1}{t}\sum_{i=1}^{t}\left(2^{X^i} - 1\right)$.
• $E[2^{X^i}] = n + 1$, $Var[2^{X^i}] = O(n^2)$, so
$Var[Z] = Var\left[\frac{1}{t}\sum_{i=1}^{t}\left(2^{X^i} - 1\right)\right] = O\left(\frac{n^2}{t}\right)$
• Claim: if $t \ge \frac{c}{\epsilon^2}$ for a suitable constant $c$, then $\Pr[|Z - n| > \epsilon n] < 1/3$.
– By Chebyshev: $\Pr[|Z - n| > \epsilon n] \le \frac{Var[Z]}{\epsilon^2 n^2} = O\left(\frac{1}{t\epsilon^2}\right)$
– For $t \ge \frac{c}{\epsilon^2}$ we can make this at most $\frac{1}{3}$.

Morris's Algorithm: Final
• What if I want the probability of error to be really small, i.e. $\Pr[|Z - n| > \epsilon n] \le \delta$?
• Same Chebyshev-based analysis: $t = O\left(\frac{1}{\epsilon^2 \delta}\right)$.
• Better: run the beta-version $m = O\left(\log\frac{1}{\delta}\right)$ times independently in parallel, each with $t = O(1/\epsilon^2)$ counters, and output the median answer $Z^m$.
• Total space: $O\left(\frac{\log\log n \cdot \log\frac{1}{\delta}}{\epsilon^2}\right)$

Morris's Algorithm: Final Analysis
Claim: $\Pr[|Z^m - n| > \epsilon n] \le \delta$
• Let $W_i$ be the indicator r.v. of the event $|Z^{(i)} - n| \le \epsilon n$, where $Z^{(i)}$ is the $i$-th trial, and let $W = \sum_{i=1}^{m} W_i$.
• Each trial succeeds with probability at least $2/3$, so $E[W] \ge \frac{2m}{3}$.
• The median $Z^m$ fails only if at least half of the trials fail, i.e. $W \le \frac{m}{2}$, which requires $|W - E[W]| \ge \frac{m}{6}$.
• By the Chernoff bound:
$\Pr[|Z^m - n| > \epsilon n] \le \Pr\left[|W - E[W]| \ge \frac{m}{6}\right] \le 2\exp(-\Omega(m)) = 2\exp\left(-\Omega\left(\log\frac{1}{\delta}\right)\right) < \delta$

Thank you!
• Questions?
• Next time:
– More streaming algorithms
– Testing distributions
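Bonus: Morris in Code
A toy Python sketch (added as an illustration; it is not part of the original slides) tying the three versions together: a single counter, the average of $t$ counters, and the median of $m$ averages. A real implementation would store only the $\log \log n$-bit exponents $X^i$, not the estimates themselves.

    import random
    import statistics

    def morris(stream, rng):
        """Alpha-version: increment X with probability 1/2^X; output 2^X - 1."""
        X = 0
        for _ in stream:
            if rng.random() < 2.0 ** (-X):
                X += 1
        return 2 ** X - 1

    def morris_mean(n, t, rng):
        """Beta-version: average of t independent counters."""
        return sum(morris(range(n), rng) for _ in range(t)) / t

    def morris_median(n, t, m, rng):
        """Final version: median of m independent averages of t counters."""
        return statistics.median(morris_mean(n, t, rng) for _ in range(m))

    rng = random.Random(0)
    n = 10_000
    print("true count:           ", n)
    print("single counter:       ", morris(range(n), rng))
    print("average of 50:        ", morris_mean(n, t=50, rng=rng))
    print("median of 9 averages: ", morris_median(n, t=50, m=9, rng=rng))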