Data Structures: Range Queries - Space Efficiency
Pooya Davoodi
Aarhus University
PhD Defense, July 4, 2011

Thesis Overview
– Range Minimum Queries in Arrays (ESA 2010, Invited to Algorithmica)
– Path Minima Queries in Trees (WADS 2011)
– Range Diameter Queries in 2D Point Sets (Submitted to ISAAC 2011)
– Succinct 𝑘-ary Trees (TAMC 2011)

Range Minimum Queries
Database systems – Lowest average-salary:

Age \ Year   1990      1995      2000      2005      2010
60           80,000    85,000    115,000   120,000   118,000
50           81,000    73,000    90,000    100,000   94,000
40           65,000    75,000    86,000    92,000    95,000
30           72,000    91,000    89,000    103,000   102,000
20           35,000    45,000    42,000    50,000    41,000

Minimum: 65,000 at [3,1]

Definition
Input: an array 𝐴 = [1..𝑚] × [1..𝑛] (𝑚 ≤ 𝑛)
Query: where is the minimum in [𝑖₁..𝑖₂] × [𝑗₁..𝑗₂]?
[Figure: an 𝑚 × 𝑛 array with the query range 𝑖₁..𝑖₂, 𝑗₁..𝑗₂ marked]

Naïve Solution
Brute force search
– Query time: 𝑂(𝑠 ∙ 𝑠′) for an 𝑠 × 𝑠′ query range
– Worst case: 𝑂(𝑁) time (𝑁 = 𝑚 ∙ 𝑛)

Data Structures
Preprocess and store some information
Naïve: store the answers of all queries (tabulation)
– 𝑂(1) query time
– Size of the table: 𝑂(𝑁² log 𝑁) bits

Example: a 2 × 2 array with entries (1,1) = 12, (1,2) = 8, (2,1) = 5, (2,2) = 10

Top-Left   Bottom-Right   Minimum
(1,1)      (1,1)          (1,1): 12
(1,1)      (1,2)          (1,2): 8
(1,1)      (2,1)          (2,1): 5
(1,1)      (2,2)          (2,1): 5
(2,1)      (2,1)          (2,1): 5
(2,1)      (2,2)          (2,1): 5
(1,2)      (1,2)          (1,2): 8
(1,2)      (2,2)          (1,2): 8
(2,2)      (2,2)          (2,2): 10

Space-Efficient Data Structures

Reference                        Space (bits)                      Query Time
Tabulation                       𝑂(𝑁² log 𝑁)                       𝑂(1)
Tarjan et al. (STOC’84)          𝑂(𝑁 log² 𝑁)                       𝑂(log 𝑁)
Chazelle & Rosenberg (SoCG’89)   𝑂(𝑁 log 𝑁)                        𝑂(𝛼²(𝑁))
Lewenstein et al. (CPM’07)       𝑂(𝑁 log 𝑁) (𝑚 ≤ 𝑛)                𝑂(1)
Demaine et al. (ICALP’09)        Ω(𝑁 log 𝑁) (𝑚 = 𝑛)                -
Sadakane (ISAAC’07)              𝑂(𝑁) (𝑚 = 1)                      𝑂(1)
Our Result (ESA’10)              Ω(𝑁 log 𝑚) (𝑚 ≤ 𝑛)                -
Our Result (ESA’10)              𝑂(𝑁 ∙ min{𝑚, log 𝑁}) (𝑚 ≤ 𝑛)      𝑂(1)

1D vs. 2D
1D: Cartesian Trees
– 𝑂(log 𝑁) bits per element (Tarjan et al., STOC’84)
– 𝑂(1) bits per element (Sadakane, ISAAC’07)
– Lowest Common Ancestor
[Figure: an example array and its Cartesian tree]
2D: Nothing like Cartesian Trees
– Ω(log 𝑚) bits per element (Our Result, ESA’10)

Indexing Data Structures
Popular in Succinct Data Structures
Read-only input array (𝑚 × 𝑛 = 𝑁) plus an index
Size of input: 𝑂(𝑁 log 𝑁) bits

Size of Index                          Query Time
𝑂(𝑁) bits                              𝑂(1)
𝑂(𝑁/𝑐) bits (1/𝑐 bits per element)     𝑂(𝑐 log² 𝑐)
𝑂(𝑁/𝑐) bits                            Ω(𝑐)
(Our Results, ESA’10)

𝑂(𝑁) bits with 𝑂(1) query time
[Figure: example array cut into blocks; labels: Cartesian Trees, Tabulation, Atallah and Yuan (SODA’10); block dimensions log 𝑛, log log 𝑛, log 𝑚, log log 𝑚]

𝑂(𝑁/𝑐) bits
– 𝑁/𝑐 bits, i.e. 1/𝑐 bits per element
– 𝑂(𝑐 log² 𝑐) query time
– Ω(𝑐) query time
Proof: 𝑁/𝑐 queries distinguish the inputs in Ω(𝑁) time ⇒ ∃ query with Ω(𝑐) time
[Figure: a 0/1 array labelled C]

Outline
– Range Minimum Queries (ESA 2010, Invited to Algorithmica)
– Path Minima Queries (WADS 2011)
– Range Diameter Queries (Submitted to ISAAC 2011)

Path Minima/Maxima Queries
The most expensive connection between two given nodes?
– between b and k = (c,e)
– between e and k = (j,k)
[Figure: example tree on nodes a to k with edge weights]
Tree-Topology Networks
Update(c,e) = 4
Trees with Dynamic Weights

Naïve Structures
Brute Force Search
– Worst case query time: 𝑂(𝑛)
– Update time: 𝑂(1)
Tabulation
– Query time: 𝑂(1)
– Update time: 𝑂(𝑛²)

Dynamic Weights

Reference                      Query Time              Update Time             Comment
Tabulation                     𝑂(1)                    𝑂(𝑛²)
Brute Force Search             𝑂(𝑛)                    𝑂(1)
Sleator and Tarjan (STOC’81)   𝑂(log 𝑛)                𝑂(log 𝑛)
Our Result (WADS’11)           𝑂(log 𝑛 / log log 𝑛)    𝑂(log 𝑛)                Comparison Based
Our Result (WADS’11)           𝑂(log 𝑛 / log log 𝑛)    𝑂(log 𝑛 / log log 𝑛)    RAM

– Comparison based: query time optimal (Alstrup et al., FOCS’98); update time optimal (Brodal et al., SWAT’96)
– RAM: query time optimal (Alstrup et al., FOCS’98); update time optimal by conjecture (Patrascu and Thorup, STOC’06)
Reduction from Range Minimum Queries in 1D arrays
[Figure: a path with weights 𝐴[1], 𝐴[2], 𝐴[3], 𝐴[4], 𝐴[5], 𝐴[6]]

Dynamic Leaves

Reference                                                    Query Time   Update Time   Comment
Alstrup and Holm (ICALP’00) and Kaplan and Shafrir (ESA’08)  𝑂(1)         𝑂(1)          RAM
Our Results (WADS’11)                                        𝑂(𝛼(𝑛))      𝑂(1)          Comparison based; optimal: Pettie (FOCS’02)

Updates with link and cut
Example: cut(c,e), link(d,i,12)

Reference                      Query Time              Update Time   Comment
Sleator and Tarjan (STOC’81)   𝑂(log 𝑛)                𝑂(log 𝑛)      Comparison Based
Our Results (WADS’11)          Ω(log 𝑛)                𝑂(log 𝑛)      Cell Probe
Our Results (WADS’11)          Ω(log 𝑛 / log log 𝑛)    𝑂(logᶜ 𝑛)     Cell Probe

Proof: by reduction from connectivity problems in graphs

Outline
– Range Minimum Queries (ESA 2010, Invited to Algorithmica)
– Path Minima Queries (WADS 2011)
– Range Diameter Queries (Submitted to ISAAC 2011)

Range Diameter Queries
Farthest pair of points
A Difficult Problem

Known Results

Reference                             Query Time    Space
Tabulation                            𝑂(log 𝑛)      𝑂(𝑛⁴)
Smid et al. (CCCG’08)                 𝑂(log⁶ 𝑛)     𝑂(𝑛²)
Our Results (Submitted to ISAAC’11)   𝑂 𝑛 log 𝑛     𝑂(𝑛)

Reduction from Set Intersection
Set Intersection Problem
Cohen and Porat (2010) 𝑂 log 𝑛 𝑂 𝑛 log 𝑛 𝑂 𝑛2 𝑂 𝑛
Conjecture: Set Intersection problem is difficult (Patrascu and Roditty, FOCS’10)

Set Intersection Queries
𝑆₁ = {0.5, 1, 2}
𝑆₂ = {1, 1.5, 3}
𝑆₃ = {0.2, 3}
Query: 𝑆ᵢ ∩ 𝑆ⱼ = ∅ ?
– 𝑆₁ ∩ 𝑆₂ ≠ ∅
– 𝑆₁ ∩ 𝑆₃ = ∅

Reduction
[Figure: reduction example]
Diameter = 3     𝑆₁ ∩ 𝑆₂ ≠ ∅
Diameter < 5     𝑆₁ ∩ 𝑆₃ = ∅
Arithmetic on real numbers with unbounded precision

Points in Convex Position

Reference                             Query Time    Space
Our Results (Submitted to ISAAC’11)   𝑂(log 𝑛)      𝑂(𝑛 log 𝑛)

Publications
– Range Minimum Queries (ESA 2010, Invited to Algorithmica)
– Path Minima Queries (WADS 2011)
– Range Diameter Queries (Submitted to ISAAC 2011)
– Succinct 𝑘-ary Trees (TAMC 2011)
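
Appendix: the two naïve baselines for range minimum queries
The brute-force search and the tabulation baseline from the range minimum part of the talk can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the thesis: the function names `range_min` and `tabulate` are hypothetical, positions are 1-based (row, column) pairs, and the query range used on the salary table (ages 60 down to 30, all years) is an assumption chosen because it reproduces the answer shown on the slide.

```python
def range_min(A, i1, i2, j1, j2):
    """Brute-force search: position of the minimum in rows i1..i2, columns j1..j2.

    Indices are 1-based and inclusive. The scan takes O(s * s') time for an
    s x s' range, which is O(N) in the worst case. Ties go to the first
    position in row-major order (an arbitrary choice for this sketch).
    """
    best = (i1, j1)
    for i in range(i1, i2 + 1):
        for j in range(j1, j2 + 1):
            if A[i - 1][j - 1] < A[best[0] - 1][best[1] - 1]:
                best = (i, j)
    return best


def tabulate(A):
    """Tabulation: store the answer of every possible query for O(1) lookups.

    There are Theta(N^2) queries and each answer is a position of O(log N)
    bits, which gives the O(N^2 log N)-bit table size.
    """
    m, n = len(A), len(A[0])
    return {
        (i1, i2, j1, j2): range_min(A, i1, i2, j1, j2)
        for i1 in range(1, m + 1)
        for i2 in range(i1, m + 1)
        for j1 in range(1, n + 1)
        for j2 in range(j1, n + 1)
    }


# Salary table from the slides: rows are ages 60, 50, 40, 30, 20;
# columns are years 1990, 1995, 2000, 2005, 2010.
salaries = [
    [80000, 85000, 115000, 120000, 118000],
    [81000, 73000, 90000, 100000, 94000],
    [65000, 75000, 86000, 92000, 95000],
    [72000, 91000, 89000, 103000, 102000],
    [35000, 45000, 42000, 50000, 41000],
]

# Assumed query range: ages 60 to 30 (rows 1..4), all years (columns 1..5).
print(range_min(salaries, 1, 4, 1, 5))   # (3, 1), i.e. 65,000

table = tabulate(salaries)
print(table[(1, 4, 1, 5)])               # (3, 1), now a single lookup
print(len(table))                        # 225 stored answers for N = 25
```

Even for this 5 × 5 table the tabulation stores 225 answers for 25 elements, which is the quadratic blow-up that the space-efficient structures in the talk avoid.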