Succinct Orthogonal Range
Search Structures on a Grid with
Applications to Text Indexing
Prosenjit Bose, Carleton University
Meng He, University of Waterloo
Anil Maheshwari and Pat Morin,
Carleton University
2D Orthogonal Range Search
• A fundamental geometric query problem
• Data sets: A set, N, of n points in the plane
• Query: Given an orthogonal query rectangle R,
  return information about the points in N∩R
  - Orthogonal range counting queries
  - Orthogonal range reporting queries
• k: size of the output
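To make the two query types concrete, here is a minimal brute-force sketch in Python. It is a baseline for the definitions only, not one of the structures discussed in these slides; the function names and the sample points are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def range_report(points: List[Point], x1: int, x2: int, y1: int, y2: int) -> List[Point]:
    """Return every point inside the closed rectangle [x1, x2] x [y1, y2]."""
    return [(x, y) for (x, y) in points if x1 <= x <= x2 and y1 <= y <= y2]

def range_count(points: List[Point], x1: int, x2: int, y1: int, y2: int) -> int:
    """Return the number of points inside the closed rectangle."""
    return len(range_report(points, x1, x2, y1, y2))

# Hypothetical sample data: six points in the plane.
points = [(1, 3), (2, 7), (4, 1), (5, 5), (6, 2), (8, 6)]
print(range_count(points, 2, 6, 1, 5))
print(range_report(points, 2, 6, 1, 5))
```

Both functions scan all n points, so they take O(n) time per query; k is the length of the list returned by `range_report`.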
Example
Range counting query: 5
Range reporting query
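Classic Solutions
Data structures, with space in words and query time (counting or reporting):
• R-trees: O(n) space, O(n) time
• kd-trees: O(n) space, O(n^(1/2) + k) time (reporting)
• Chazelle 1988: O(n) space, O(lg n) time (counting),
  O(lg n + k lg^ε n) time (reporting)
• Range trees: O(n lg n) space, O(lg n + k) time (reporting)
• Chazelle 1988: O(n lg^ε n) space, O(lg n + k) time (reporting)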
Range Search on an n×n Grid
• A special case: point coordinates are from
  [1..n]×[1..n] (rank space)
• The general problem can be reduced to this
  special case using a standard approach
• Alstrup et al. 2000
• Orthogonal range search structures in the rank
  space and succinct data structures
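The slides do not spell out the "standard approach"; one common way to do the reduction, sketched here as an assumption, is to replace each coordinate by its rank in sorted order and to translate query bounds with binary search. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def to_rank_space(points):
    """Map n points with arbitrary coordinates to ranks in [1..n] x [1..n].

    Ties are broken by the other coordinate, so every point gets a distinct
    x-rank and a distinct y-rank.
    """
    n = len(points)
    by_x = sorted(range(n), key=lambda i: (points[i][0], points[i][1]))
    by_y = sorted(range(n), key=lambda i: (points[i][1], points[i][0]))
    x_rank, y_rank = [0] * n, [0] * n
    for r, i in enumerate(by_x, 1):
        x_rank[i] = r
    for r, i in enumerate(by_y, 1):
        y_rank[i] = r
    xs = [points[i][0] for i in by_x]  # sorted original x-coordinates
    ys = [points[i][1] for i in by_y]  # sorted original y-coordinates
    return list(zip(x_rank, y_rank)), xs, ys

def query_to_rank_space(xs, ys, x1, x2, y1, y2):
    """Translate a closed query rectangle into closed rank ranges."""
    return (bisect_left(xs, x1) + 1, bisect_right(xs, x2),
            bisect_left(ys, y1) + 1, bisect_right(ys, y2))

# Hypothetical sample data with a repeated x-coordinate.
points = [(10, 300), (25, 70), (25, 500), (90, 5)]
ranked, xs, ys = to_rank_space(points)
print(ranked)
print(query_to_rank_space(xs, ys, 20, 50, 60, 400))
```

The two binary searches add O(lg n) time per query, and the sorted coordinate arrays are extra space on top of the rank-space structure.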
Background: Succinct Data Structures
• What are succinct data structures (Jacobson 1989)
  - Representing data structures using ideally
    information-theoretic minimum space
  - Supporting efficient navigational operations
• Why succinct data structures
  - Large data sets in modern applications:
    textual, genomic, spatial or geometric
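As a small worked example of "information-theoretic minimum space", assume n points on an n×n grid with no two points sharing an x- or y-coordinate. Such a point set is a permutation of [1..n], so any representation needs at least ⌈lg n!⌉ = n lg n − O(n) bits. The sketch below computes that bound exactly and compares it with storing each y-rank in a fixed ⌈lg n⌉-bit field; the function names are illustrative.

```python
import math

def permutation_bits(n: int) -> int:
    """ceil(lg(n!)): bits needed to tell apart all n! permutations of [1..n]."""
    return (math.factorial(n) - 1).bit_length()

def fixed_width_bits(n: int) -> int:
    """Bits used when each of the n y-ranks is stored in ceil(lg n) bits."""
    return n * max(1, (n - 1).bit_length())

for n in (16, 1024, 65536):
    print(n, permutation_bits(n), fixed_width_bits(n))
```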
Succinct Orthogonal Range Search
Structures in rank space

• Wavelet Trees (Grossi et al. 2003)
  - Space: n lg n + o(n lg n) bits
  - Query time for orthogonal range search (Makinen and
    Navarro 2006):
    Counting: O(lg n)
    Reporting: O(k lg n)
  - Restriction: no two points share an x- or y-coordinate
• Applications
  - Space-efficient text indexes: Makinen and Navarro
    2006, Chien et al. 2008
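The following is a minimal pointer-based sketch of range counting with a binary wavelet tree, assuming the points are given as a sequence whose position is the x-coordinate and whose value is the y-coordinate. It uses plain prefix-count arrays where a succinct implementation would use rank structures on bit vectors, so it illustrates the O(lg n)-level traversal only, not the n lg n + o(n lg n) bit space bound. The class name and sample sequence are made up.

```python
class WaveletTree:
    """Binary wavelet tree over an integer sequence (non-succinct sketch)."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = (min(seq), max(seq)) if seq else (0, 0)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        # zeros[p]: how many of the first p entries go to the left child.
        self.zeros = [0]
        if lo == hi or not seq:
            return
        mid = (lo + hi) // 2
        for v in seq:
            self.zeros.append(self.zeros[-1] + (v <= mid))
        self.left = WaveletTree([v for v in seq if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, hi)

    def count(self, i, j, a, b):
        """Count positions p with i <= p < j (0-based) and a <= seq[p] <= b."""
        if i >= j or b < self.lo or self.hi < a:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i
        li, lj = self.zeros[i], self.zeros[j]
        return (self.left.count(li, lj, a, b)
                + self.right.count(i - li, j - lj, a, b))

# Hypothetical point set: the point at x = p + 1 has y = seq[p].
seq = [3, 7, 1, 5, 2, 6, 4]
wt = WaveletTree(seq)
print(wt.count(1, 6, 2, 6))  # x in [2..6], y in [2..6]
```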
Supporting Counting: an Overview
• Reduce orthogonal range counting to
  dominance counting
• Design a succinct data structure supporting
  dominance counting on a narrow grid, i.e. an n×t
  grid where t = O(lg^ε n) (0 < ε < 1). We also assume
  that each point has a distinct x-coordinate
• Recursively divide the n×n grid into narrow grids
  and use the above structure at each level
• Remove the restriction that each point has a
  distinct x-coordinate
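The first step of the overview can be sketched with inclusion-exclusion. This assumes integer coordinates and takes dominance counting to mean counting the points (x', y') with x' ≤ x and y' ≤ y, a convention the slides do not state explicitly. The brute-force `dominance_count` is only a placeholder for the succinct structure.

```python
def dominance_count(points, x, y):
    """Count points (px, py) with px <= x and py <= y (brute-force placeholder)."""
    return sum(1 for (px, py) in points if px <= x and py <= y)

def range_count_via_dominance(points, x1, x2, y1, y2):
    """Count points in [x1, x2] x [y1, y2] using four dominance queries."""
    return (dominance_count(points, x2, y2)
            - dominance_count(points, x1 - 1, y2)
            - dominance_count(points, x2, y1 - 1)
            + dominance_count(points, x1 - 1, y1 - 1))

# Hypothetical sample data.
points = [(1, 3), (2, 7), (4, 1), (5, 5), (6, 2), (8, 6)]
print(range_count_via_dominance(points, 2, 6, 1, 5))
```

A constant number of dominance queries answers one range counting query, so the two problems have the same asymptotic query time.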

Range counting on a Narrow Grid
S = 2 3 4 4 1 3 1 1 3 2 4 2 3…
• Divide the grid into blocks of size lg^2 n × t
• A 2D array A: A[i,j] stores the result of dominance counting when
  (i lg^2 n + 1, j) is given as the query point
• Divide each block into subblocks of size lg^λ n × t (0 < λ < ε)
• A 2D array B: B[i,j] stores, when (i lg^λ n + 1, j) is given as a query point,
  the result of dominance counting inside the block containing this point
• A table C that stores, for each possible set of lg^λ n points on a lg^λ n × t
  grid and each query point in the grid, the result of dominance counting
• Space: n lg t + o(n) bits
• Time: O(1)
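A simplified sketch of the block idea follows. It keeps only the top-level array A (counts at block boundaries) and replaces the subblock array B and the lookup table C with a linear scan inside one block, so it reaches neither the O(1) query time nor the n lg t + o(n) bit space of the slide. The block size is a free parameter here rather than lg^2 n, the boundary indexing is a convenient variant of the slide's, and the class name is hypothetical. The sample sequence is the visible prefix of S from the slide.

```python
class NarrowGridDominance:
    """Dominance counting on an n x t grid given as a sequence S of y-values."""

    def __init__(self, S, t, block):
        self.S, self.t, self.block = S, t, block
        # A[i][j]: number of points with x <= i * block and y <= j.
        self.A = [[0] * (t + 1)]
        counts = [0] * (t + 1)  # counts[v]: occurrences of value v so far
        for p, v in enumerate(S, 1):
            counts[v] += 1
            if p % block == 0:
                row = [0] * (t + 1)
                for j in range(1, t + 1):
                    row[j] = row[j - 1] + counts[j]
                self.A.append(row)

    def dominance(self, x, y):
        """Count points (x', y') with 1 <= x' <= x and 1 <= y' <= y."""
        i = x // self.block
        total = self.A[i][y]
        # The scan below stands in for the subblock array B and the table C.
        total += sum(1 for v in self.S[i * self.block:x] if v <= y)
        return total

S = [2, 3, 4, 4, 1, 3, 1, 1, 3, 2, 4, 2, 3]
grid = NarrowGridDominance(S, t=4, block=4)
print(grid.dominance(6, 3))
```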
Range Counting on an n×n Grid
• Transform the original grid into a narrow grid by
  grouping y-coordinates into ranges of size n/t
• Construct orthogonal range search structures
  for this narrow grid and recurse
• Number of levels: log_t n
• Space: n lg n + o(n lg n) bits
• Time: O(log_t n)
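The recursion can be sketched as a t-ary tree over the y-range, where each node groups its values into at most t ranges and answers a dominance query on the resulting narrow grid. In this sketch each node uses naive per-group prefix counts costing O(t) time per level, a stand-in for the O(1)-time narrow-grid structure, and the class name is hypothetical. With t = lg^ε n the number of levels is log_t n = lg n / (ε lg lg n), which is where a bound of the form O(lg n / lg lg n) comes from.

```python
class MultiaryNode:
    """One level of the recursion: values in [lo..hi] split into <= t groups."""

    def __init__(self, seq, lo, hi, t):
        assert t >= 2, "branching factor must be at least 2"
        self.lo, self.hi = lo, hi
        self.kids = []
        if lo == hi or not seq:
            return  # leaf or empty node: nothing left to split
        self.width = -(-(hi - lo + 1) // t)      # ceil(range / t)
        k = -(-(hi - lo + 1) // self.width)      # number of groups, k <= t
        # prefix[c][p]: how many of the first p entries fall into group c.
        self.prefix = [[0] * (len(seq) + 1) for _ in range(k)]
        buckets = [[] for _ in range(k)]
        for p, v in enumerate(seq, 1):
            g = (v - lo) // self.width
            for c in range(k):
                self.prefix[c][p] = self.prefix[c][p - 1] + (c == g)
            buckets[g].append(v)
        for c in range(k):
            c_lo = lo + c * self.width
            c_hi = min(hi, c_lo + self.width - 1)
            self.kids.append(MultiaryNode(buckets[c], c_lo, c_hi, t))

    def dominance(self, x, y):
        """Count entries among the first x whose value is <= y."""
        if x == 0 or y < self.lo:
            return 0
        if y >= self.hi:
            return x
        g = (y - self.lo) // self.width
        below = sum(self.prefix[c][x] for c in range(g))  # groups fully below y
        return below + self.kids[g].dominance(self.prefix[g][x], y)

    def levels(self):
        """Number of splitting levels below and including this node."""
        return 0 if not self.kids else 1 + max(k.levels() for k in self.kids)

# Hypothetical data: a permutation of [1..16], branching factor t = 4.
seq = [(7 * i) % 17 for i in range(1, 17)]
root = MultiaryNode(seq, 1, 16, 4)
print(root.levels(), root.dominance(8, 10))
```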
More results
• The restriction that each point has a distinct
  x-coordinate can be removed using 2n + o(n) extra bits
• The support for range reporting is based on
  similar ideas but is more complicated
• Our main result
  - Space: n lg n + o(n lg n) bits
  - Query time for orthogonal range search
    Counting: O(lg n / lg lg n)
    Reporting: O(k lg n / lg lg n)
Applications: Substring Search

• Notation:
  - T: text, n: text size, σ: alphabet size
  - P: pattern, m: pattern length
  - occ: number of occurrences
• Query: report the occurrences of P in T
• Chien et al. 2008: O(n lg σ) bits,
  O(m + lg n × (log_σ n + occ lg n)) time
• Our results: O(n lg σ) bits,
  O(m + lg n × (log_σ n + occ lg n) / lg lg n) time

Applications: Position-Restricted
Substring Search
• Query: Given a pattern P and a range [i, j],
  how many times does P occur in T[i, j]?
• Makinen and Navarro 2006
  - Space: 3n lg n + o(n lg n) bits
  - Time: O(m + occ lg n)
• Our results:
  - Space: 3n lg n + o(n lg n) bits
  - Time: O(m + occ lg n / lg lg n)
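One way to see why range search helps here, sketched under assumptions the slides do not spell out: build a suffix array of T, treat each entry as a point (rank, text position), find the interval of ranks whose suffixes start with P, and then ask a 2D range query restricted to positions inside [i, j]. The code below uses a naive suffix array construction and a linear scan in place of the succinct range search structure; all function names are illustrative, and positions are 0-based.

```python
def suffix_array(text):
    """Naive suffix array: starting positions sorted by suffix."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def sa_interval(text, sa, pattern):
    """Half-open interval [lo, hi) of ranks whose suffixes start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first rank whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first rank whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo

def position_restricted_count(text, sa, pattern, i, j):
    """Count occurrences of pattern lying entirely inside text[i..j]."""
    lo, hi = sa_interval(text, sa, pattern)
    last_start = j - len(pattern) + 1
    # A 2D range count over points (rank, sa[rank]); the scan is a placeholder.
    return sum(1 for r in range(lo, hi) if i <= sa[r] <= last_start)

text = "abracadabra"
sa = suffix_array(text)
print(position_restricted_count(text, sa, "abra", 0, 10))
print(position_restricted_count(text, sa, "a", 3, 7))
```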

Applications: Representing Small
Integers
• Data: A sequence S of n numbers in [1..s],
  where s = polylog(n)
• Ferragina et al. 2007
  - Space: nH_0(S) + o(n) bits
  - Operations: rank/select in O(1) time
• Our result:
  - New operation: Given a range of positions [p1..p2] and
    a range of values [v1..v2], retrieve the entries in
    S[p1..p2] whose values are in [v1..v2]
  - Time: O(1) for counting, O(1) per entry for reporting
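To pin down what the new operation computes, here is a non-succinct reference sketch. It stores a full table of prefix counts, which answers counting queries with four lookups but uses far more than nH_0(S) + o(n) bits, and it reports by scanning the position range rather than in O(1) time per entry. Function names are hypothetical, and positions are 1-based as on the slide.

```python
def build_prefix_counts(S, s):
    """counts[p][v]: number of entries among S[1..p] with value <= v."""
    counts = [[0] * (s + 1)]
    for x in S:
        row = counts[-1][:]
        for v in range(x, s + 1):
            row[v] += 1
        counts.append(row)
    return counts

def count_in_range(counts, p1, p2, v1, v2):
    """Count entries of S[p1..p2] whose values lie in [v1..v2]."""
    return ((counts[p2][v2] - counts[p1 - 1][v2])
            - (counts[p2][v1 - 1] - counts[p1 - 1][v1 - 1]))

def report_in_range(S, p1, p2, v1, v2):
    """Return (position, value) pairs of S[p1..p2] with value in [v1..v2]."""
    return [(p, S[p - 1]) for p in range(p1, p2 + 1) if v1 <= S[p - 1] <= v2]

# Hypothetical sample sequence with s = 4.
S = [2, 3, 4, 4, 1, 3, 1, 1, 3, 2]
counts = build_prefix_counts(S, 4)
print(count_in_range(counts, 3, 8, 1, 3))
print(report_in_range(S, 3, 8, 1, 3))
```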
Applications: A Restricted
Version of Range Search
• Restriction: the query rectangle is defined by two
  points in the given point set
• Notation:
  - c: the number of bits required to encode the
    coordinates of a point
• Space: cn + n lg n + o(n lg n) bits
• Time:
  - Counting: O(lg n / lg lg n)
  - Reporting: O(k lg n / lg lg n)
Conclusions

We designed a succinct data structure for
orthogonal range search on an n×n grid that
provides more efficient support for both counting
and reporting queries

This structure can be used to improve and
extend previous results on succinct data
structures, such as succinct text indexes and
sequence representation.
Thank you!