Range Median – Data Struc ctures and Batched Queries

advertisement
Allan Jørgensen
All
J g
Aarhus University
Range Median – Data Struc
ctures and Batched Queries
Range Median Data Structure
Maintain a set of n points in the plane
plane.
Batched Range Median
Extensions
Given a set of n points and a set of k range
r
median queries
queries, return the
answer to the k queries.
queries
Select(i j s) Find the point with ss’th
Select(i,j,s)
th smallest y
y-value
value in [i,j]
[i j] x [-∞
[ ,∞]].
Select(i j 4)
Select(i,j,4)
i
In the static case the point set is fixed,
fixed and in the dynamic case points can
be inserted and deleted.
deleted The objective is to use small space and support
efficient range median queries
queries.
j
Rank(i,j,e) Return the number of points within [i,j] x [-∞,e].
i1
i3
j3
i2
j1
j2
e
Median(i j) Find the point with median y-value
Median(i,j)
y value in [i,j]
[i j] x [[-∞,∞].
∞ ∞]
A batched range median query consissting of three queries: [i1,jj1],[i
] [i2,jj2],[i
] [i3,jj3].
]
Each query and its answer iss shown with a distinct color.
color
i
Rank(i,j,e)=6
( ,j, )
j
P d
Predecessor(i,j,e)
(i j ) Find
Fi d point
i t with
ith maximum
i
y-value
l in
i [i,j]
[i j] x [[-∞,e].
]
e
i
j
i
Application:
pp
Maintaining
g Histogram
g
A range
g median data structure p
presents a way
y of storing
g a histogram
g
such
that interesting
gq
queries can be answered efficiently.
y
140
Sale of Pilsner, DK (index 2000=100)
120
100
80
P d
Predecessor(i,j,e)
(i j )
Application:
pp
Analyzing
y g Web Data
Ap
point set can be made from logs
g of cclicks on internet ads. Records
contain the time of the click and the pr
price paid
p
by
y the advertiser. Each
advertiser
ad
e se runs
u s se
several
e a ad ca
campaigns
pa g s sp
spread
ead o
over
e d
different
e e intervals
e aso
of
time, and wish to compare the price to
o the general ad market during the
campaign. A typical comparison is with
h the medium price for clicks during
the time intervals. This computation co
orresponds to the batched range
median problem.
j
Results
Median*
Median
S i
Static
Dynamic
y
Space
Query
Updates
O( )
O(n)
O(l n / llog log
O(log
l n))
-
O(n
( log
g n / log
g log
g n)) O((log
(( g n / log
g log
g n))2) O((log
(( g n / log
g log
g n))2)
* Bounds also hold for Select/Rank/Predecessor
60
40
Batched Range
g Median
20
S
Space
Ti
Time
O(n)
( )
O(n
( log
g k + k log
g n / log
g log
g k))
1965
5
1967
7
1969
9
1971
1
1973
3
1975
5
1977
7
1979
9
1981
1
1983
3
1985
5
1987
7
1989
9
1991
1
1993
3
1995
5
1997
7
1999
9
2001
1
2003
3
2005
5
2007
7
0
R f
References
Source: Danmarks Statistik (dst.dk)
Source:.
(dst dk)
In this histogram you can compare your beer consumption with that of the
general public over any period of years. You could determine the median
beer consumption in your college years and compare with the beer
consumption in the college years of for instance your kids
kids.
[1] Gerth S. Brodal and Allan G. Jørgensen. Range Median – Data Structures and
Batched Queries. 2009.
MADALGO – Center for Massive Data Algorithmics,
Algorithmics a Center of the Danish National Research Foundation
Download