Randomized Algorithms
CS648

Lecture 11
• BPWM: The final algorithm
• Hashing (part 1)

Boolean Product Witness Matrix (BPWM)
Problem: Given two Boolean matrices A and B, and their Boolean product C, compute a matrix W such that W_ij stores a witness for each (i, j) with C_ij = 1. (A witness for (i, j) is an index k with A_ik = B_kj = 1.)

Conclusion of last lecture
Theorem: Given two Boolean matrices A and B, and an integer t, there is a randomized Monte Carlo algorithm to compute witnesses for all those pairs which have t witnesses.
• The running time is O(n^ω log n).
• The error probability is at most 1/n^(c−2), for a constant c > 2 of our choice.

Questions:
1. How to compute witnesses for all pairs in O(n^ω log² n) time?
2. How to convert Monte Carlo to Las Vegas?

Randomized algorithm for computing witnesses for all pairs with exactly t witnesses

// The pseudocode for sampling the (indices of) columns
Sample(n, p)
{  R ← ∅;
   For each i ∈ [n] do: add i to R with probability p;
   return R;
}

Compute-Witnesses(A, B, t)
{  R ← Sample(n, 1/t);
   For each i, k ∈ [n]
      if k ∈ R then A′_ik ← k · A_ik; else A′_ik ← 0;
   D′ ← A′ · B;
   For each i, j ∈ [n]
      If D′_ij is a witness for (i, j) then W_ij ← D′_ij;
}

Time complexity: O(n^ω)
Probability of finding a witness for a single pair (i, j): > 1/8

Focus on a single pair (i, j)
Let there be w witnesses for (i, j) such that t ≤ w < 2t.
Question: If each column is selected independently with p = 1/t, what is the probability that exactly one out of the w witnesses for (i, j) survives?
Answer:
$$\binom{w}{1}\,\frac{1}{t}\left(1-\frac{1}{t}\right)^{w-1} \;\ge\; \left(1-\frac{1}{t}\right)^{w-1} \;\ge\; \left(1-\frac{1}{t}\right)^{2t-2} \;\ge\; \frac{1}{e^{2}} = 0.135\ldots > \frac{1}{8}$$
The first inequality uses w ≥ t, the second uses w − 1 ≤ 2t − 2, and the third uses (1 − 1/t)^(t−1) ≥ 1/e.

Randomized algorithm for computing witnesses for all pairs with witness count ∈ [t, 2t)
A single run of Compute-Witnesses(A, B, t) fails to find a witness for a single pair (i, j) with probability < 1 − 1/8 = 7/8.
How to reduce the error probability to < 1/n^c? Repeat the entire process c log_{8/7} n times.

Compute-Witnesses(A, B, t)
{  Repeat c log_{8/7} n times
   {  (the body of the single run above)
   }
}

Time complexity: O(n^ω log n)
Probability of failing to find a witness for a single pair (i, j): < (7/8)^(c log_{8/7} n) = 1/n^c

Let there be q pairs whose witness count is in [t, 2t). Apply the union bound:
Probability of failing to find a witness for any pair having witness count ∈ [t, 2t): < q/n^c ≤ 1/n^(c−2), since q ≤ n².

How to compute witnesses for all pairs in O(n^ω log² n) time?

Randomized algorithm for computing witnesses for all pairs

Compute-Witnesses(A, B)
{  For ℓ = 0 to ⌊log₂ n⌋
   {  Repeat c log_{8/7} n times
      {  R ← Sample(n, 1/2^ℓ);
         For each i, k ∈ [n]
            if k ∈ R then A′_ik ← k · A_ik; else A′_ik ← 0;
         D′ ← A′ · B;
         For each i, j ∈ [n]
            If D′_ij is a witness for (i, j) then W_ij ← D′_ij;
      }
   }
}

Every witness count w ∈ [1, n] lies in [2^ℓ, 2^(ℓ+1)) for exactly one ℓ in this range, so the analysis for [t, 2t) applies with t = 2^ℓ.

Time complexity: O(n^ω log² n)
Question: What is the probability that a witness is not found for a given pair (i, j)?
Answer: < 1/n^c
Question: What is the probability that a witness is not found for at least one pair?
Answer: < 1/n^(c−2)

How to transform to a Las Vegas algorithm?
If there is even a single pair with nonzero witnesses for which we fail to find a witness, run the O(n³) algorithm.
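The randomized witness computation and its Las Vegas fallback can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the course's reference implementation: it uses NumPy's ordinary integer matrix product in place of a fast O(n^ω) multiplication, labels columns 1..n internally so that a product entry of 0 means "no witness survived", reports witnesses as 0-based indices (−1 for none), and its fallback fills in only the missing pairs by direct search rather than rerunning a full O(n³) algorithm. The names `sample`, `compute_witnesses`, `c`, and `seed` are hypothetical.

```python
import math
import numpy as np

def sample(n, p, rng):
    """Boolean mask over columns: column k is kept independently with probability p."""
    return rng.random(n) < p

def compute_witnesses(A, B, c=3, seed=0):
    """Return W with W[i, j] = a 0-based witness for (i, j), or -1 if C[i, j] = 0."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=np.int64)
    B = np.asarray(B, dtype=np.int64)
    n = A.shape[0]
    C = (A @ B) > 0                       # Boolean product
    W = np.full((n, n), -1, dtype=np.int64)
    labels = np.arange(1, n + 1)          # 1-based column labels (assumption of this sketch)
    rows, cols = np.indices((n, n))
    reps = math.ceil(c * math.log(max(n, 2)) / math.log(8 / 7))
    for ell in range(int(math.log2(n)) + 1):      # t = 2^ell
        for _ in range(reps):
            keep = sample(n, 2.0 ** -ell, rng)
            A_prime = A * (labels * keep)         # column k scaled by its label, or zeroed
            D = A_prime @ B                       # sum of labels of surviving witnesses
            valid = (D >= 1) & (D <= n)           # D could be a single label
            k = np.where(valid, D - 1, 0)         # candidate witness, 0-based
            ok = valid & (A[rows, k] == 1) & (B[k, cols] == 1)
            W[ok] = k[ok]
    # Las Vegas step: repair any pair that has a witness but none was found.
    missing = C & (W < 0)
    for i, j in zip(*np.nonzero(missing)):
        W[i, j] = np.nonzero(A[i] & B[:, j])[0][0]
    return W
```

Because a candidate is accepted only after checking A[i, k] = B[k, j] = 1, a wrong witness is never stored; the randomness affects only whether the repair step has any work to do.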
Expected running time = O(n^ω log² n), since the O(n³) fallback is needed with probability < 1/n^(c−2), which for c ≥ 3 adds only O(n²) to the expectation.

Boolean Product Witness Matrix (BPWM)
Theorem: Given two Boolean matrices A and B, and their Boolean product C, there exists a Las Vegas algorithm for computing a matrix W such that W_ij stores a witness for each (i, j) with C_ij = 1.
The expected running time of the algorithm is O(n^ω log² n).

Homework: Show that the running time of the algorithm is concentrated around O(n^ω log² n). (Modify the algorithm if needed.)

Hashing (part 1)

Problem Definition
• U = {1, 2, …, m}, called the universe
• S ⊆ U and s = |S|
• s ≪ m
Examples: m = 10^18, s = 10^3

Aim
Maintain a data structure for storing S to support the search query "Does i ∈ S?" for any given i ∈ U.

Solutions
Solutions with worst case guarantees
Solution for static S:
• Array storing S in sorted order
Solution for dynamic S:
• Height balanced search trees (AVL trees, Red-Black trees, …)
Time per operation: O(log s), Space: O(s)
Alternative: Time per operation: O(1), Space: O(m)

Solutions used in practice with no worst case guarantees
Hashing.

Hashing
• Hash table: T, an array of size n.
• Hash function h : U → [n]
Answering a query "Does i ∈ S?":
1. k ← h(i);
2. Search the list stored at T[k].
Properties of h:
• h(i) is computable in O(1) time.
• Space required by h: O(1) (the number of words needed to encode h).
[Figure: hash table T with indices 0, 1, …, n−1, holding the elements of S]

Collision
Definition: Two elements i, j ∈ U are said to collide under hash function h if h(i) = h(j).
Worst case time complexity of searching an item i: number of elements in S colliding with i.
A discouraging fact: No hash function can be found which is good for all S.
Proof: At least ⌈m/n⌉ elements from U are mapped to a single index in T (pigeonhole principle). If S is drawn from these elements, which is possible whenever s ≤ m/n, every element of S collides with every other.
[Figure: hash table T with indices 0, 1, …, n−1; at least m/n elements of U map to one index]

Hashing
• A very popular heuristic since the 1950s
• Achieves O(1) search time in practice
• Worst case guarantee on search time: O(s)
Question: Can we have a hashing scheme ensuring
• O(1) worst case guarantee on search time,
• O(s) space,
• expected O(s) preprocessing time?
The following result answered this in the affirmative:
Michael Fredman, János Komlós, Endre Szemerédi. Storing a Sparse Table with O(1) Worst Case Access Time. Journal of the ACM (Volume 31, Issue 3), 1984.

WHY DOES HASHING WORK SO WELL IN PRACTICE?

Question: What is the simplest hash function h : U → [n]?
Answer: h(i) = i mod n
Hashing works so well in practice because the set S is usually a uniformly random subset of U. Let us give a theoretical reasoning for this fact.

Let y_1, y_2, …, y_s denote s elements selected uniformly at random from U to form S.
Question: What is the expected number of elements colliding with y_1?
Answer: Let y_1 take value i.
• How many possible values can y_j take? m − 1
• How many possible values can collide with i? At most ⌈m/n⌉ − 1 (the values i ± n, i ± 2n, … that lie in U)
[Figure: the universe 1, 2, …, m with the values …, i − n, i, i + n, i + 2n, i + 3n, … marked. Caption: Values which may collide with i under the hash function h(x) = x mod n]
$$P(y_j \text{ collides with } y_1) \;\le\; \frac{\lceil m/n \rceil - 1}{m-1}$$
$$\text{Expected number of elements of } S \text{ colliding with } y_1 \;\le\; \frac{\lceil m/n \rceil - 1}{m-1}\,(s-1) \;=\; O(1) \quad \text{for } s = O(n)$$

Conclusion
h(i) = i mod n works so well because, for a uniformly random subset of U, the expected number of collisions at an index of T is O(1).
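The expectation above can be evaluated exactly for concrete parameters. The following is a small sketch, assuming U = {1, …, m} and h(x) = x mod n as in the slides; the function names `colliding_values` and `expected_collisions` are hypothetical, and exact rational arithmetic is used to avoid rounding with m as large as 10^18.

```python
from fractions import Fraction

def colliding_values(m, n, i):
    """Number of x in {1..m}, x != i, with x mod n == i mod n."""
    r = i % n
    # Size of the residue class of i inside {1..m}.
    same_bucket = m // n if r == 0 else (m - r) // n + 1
    return same_bucket - 1            # exclude i itself

def expected_collisions(m, n, s, i):
    """Expected number of y_2..y_s colliding with y_1 = i when S is uniformly random."""
    # Each y_j is uniform over the m - 1 values other than i; sum over s - 1 elements.
    return Fraction(colliding_values(m, n, i) * (s - 1), m - 1)

# The example sizes from the problem definition, with a table of size n = s.
m, s = 10**18, 10**3
value = expected_collisions(m, n=s, s=s, i=1)
print(float(value))
```

For these sizes the value comes out just under 1 (about 0.999), in line with the O(1) bound for s = O(n).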
It is easy to fool this hash function so that it takes O(s) search time (do it as a simple exercise).
This makes us think: "How can we achieve worst case O(1) search time for a given set S?"

HOW TO ACHIEVE WORST CASE O(1) SEARCH TIME

Key idea to achieve worst case O(1) search time
Observation: Of course, no single hash function is good for every possible S. But we may strive for a hash function which is good for a given S.
A promising direction: Find a family of hash functions H such that
• For any given S, many of them are good. The notion of goodness is captured formally by the universal hash family defined in the following slide.
• Select a function randomly from H and try it for S.

UNIVERSAL HASH FAMILY

Universal Hash Family
Definition: A collection H of hash functions is said to be c-universal, for a constant c, if for any two distinct i, j ∈ U,
$$P_{h \in_r H}\big[\,h(i) = h(j)\,\big] \;\le\; \frac{c}{n}$$
Question: Does there exist a universal hash family whose hash functions have a compact encoding?
Answer: Yes, and it is very simple too.

STATIC HASHING
WORST CASE O(1) SEARCH TIME

The Journey
One milestone in our journey:
• A perfect hash function using a hash table of size O(s²)
Tools needed:
• Universal hash family where c is a small constant
• Elementary probability
Spend at least 30 minutes today to see how the milestone can be achieved. It is simple and beautiful.
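One way the milestone might be reached is sketched below. The lecture does not say which universal family it has in mind; this sketch assumes the well-known family h(x) = ((a·x + b) mod p) mod n with a prime p larger than every key, which is known to be universal with c = 1. With n = s², the expected number of colliding pairs in S is at most C(s, 2)/n < 1/2, so a random h is collision-free on S with probability more than 1/2 and a few retries suffice in expectation. The names `P`, `random_hash`, `build_perfect_table`, and `contains` are hypothetical.

```python
import random

P = (1 << 61) - 1   # the Mersenne prime 2^61 - 1; assumes every key is smaller than P

def random_hash(n, rng):
    """Draw h(x) = ((a*x + b) mod P) mod n uniformly from the assumed family."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % n

def build_perfect_table(S, rng=None):
    """Retry random hash functions until one has no collision on S (table size s^2)."""
    rng = rng or random.Random(0)
    keys = list(set(S))
    n = max(1, len(keys) ** 2)
    while True:
        h = random_hash(n, rng)
        table = [None] * n
        for x in keys:
            k = h(x)
            if table[k] is not None:
                break               # collision: discard this h and draw another
            table[k] = x
        else:
            return h, table         # every key landed in its own cell

def contains(h, table, x):
    """Worst case O(1) search: one hash evaluation and one comparison."""
    return table[h(x)] == x
```

A search touches exactly one cell of the table, which is the worst case O(1) guarantee; the price in this sketch is the O(s²) space named in the milestone.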