Allan Jørgensen All J g Aarhus University Range Median – Data Struc ctures and Batched Queries Range Median Data Structure Maintain a set of n points in the plane plane. Batched Range Median Extensions Given a set of n points and a set of k range r median queries queries, return the answer to the k queries. queries Select(i j s) Find the point with ss’th Select(i,j,s) th smallest y y-value value in [i,j] [i j] x [-∞ [ ,∞]]. Select(i j 4) Select(i,j,4) i In the static case the point set is fixed, fixed and in the dynamic case points can be inserted and deleted. deleted The objective is to use small space and support efficient range median queries queries. j Rank(i,j,e) Return the number of points within [i,j] x [-∞,e]. i1 i3 j3 i2 j1 j2 e Median(i j) Find the point with median y-value Median(i,j) y value in [i,j] [i j] x [[-∞,∞]. ∞ ∞] A batched range median query consissting of three queries: [i1,jj1],[i ] [i2,jj2],[i ] [i3,jj3]. ] Each query and its answer iss shown with a distinct color. color i Rank(i,j,e)=6 ( ,j, ) j P d Predecessor(i,j,e) (i j ) Find Fi d point i t with ith maximum i y-value l in i [i,j] [i j] x [[-∞,e]. ] e i j i Application: pp Maintaining g Histogram g A range g median data structure p presents a way y of storing g a histogram g such that interesting gq queries can be answered efficiently. y 140 Sale of Pilsner, DK (index 2000=100) 120 100 80 P d Predecessor(i,j,e) (i j ) Application: pp Analyzing y g Web Data Ap point set can be made from logs g of cclicks on internet ads. Records contain the time of the click and the pr price paid p by y the advertiser. Each advertiser ad e se runs u s se several e a ad ca campaigns pa g s sp spread ead o over e d different e e intervals e aso of time, and wish to compare the price to o the general ad market during the campaign. A typical comparison is with h the medium price for clicks during the time intervals. This computation co orresponds to the batched range median problem. j Results Median* Median S i Static Dynamic y Space Query Updates O( ) O(n) O(l n / llog log O(log l n)) - O(n ( log g n / log g log g n)) O((log (( g n / log g log g n))2) O((log (( g n / log g log g n))2) * Bounds also hold for Select/Rank/Predecessor 60 40 Batched Range g Median 20 S Space Ti Time O(n) ( ) O(n ( log g k + k log g n / log g log g k)) 1965 5 1967 7 1969 9 1971 1 1973 3 1975 5 1977 7 1979 9 1981 1 1983 3 1985 5 1987 7 1989 9 1991 1 1993 3 1995 5 1997 7 1999 9 2001 1 2003 3 2005 5 2007 7 0 R f References Source: Danmarks Statistik (dst.dk) Source:. (dst dk) In this histogram you can compare your beer consumption with that of the general public over any period of years. You could determine the median beer consumption in your college years and compare with the beer consumption in the college years of for instance your kids kids. [1] Gerth S. Brodal and Allan G. Jørgensen. Range Median – Data Structures and Batched Queries. 2009. MADALGO – Center for Massive Data Algorithmics, Algorithmics a Center of the Danish National Research Foundation