Online Interval Skyline Queries on Time Series

Bin Jiang, Jian Pei


Problem Definition
An On-the-fly Method
◦ Interval Skyline Query Answering Algorithm
◦ Online Interval Skyline Query Algorithm
▪ Radix Priority Search Tree

A View-Materialization Method
◦ Non-redundant skyline time series---NRSky[i:j]

Experiments

Some Notions in This Paper
◦ Time Series: A time series s consists of a set of (value, timestamp)
pairs. We denote the value of s at timestamp i by s[i], and write s as
a sequence of values s[1], s[2], …
◦ Time Interval: a range in time, denoted as [i:j]. We write
[i:j] ⊆ [i′:j′] if i′ ≤ i ≤ j ≤ j′;
[i:j] ⊂ [i′:j′] if [i:j] ⊆ [i′:j′] and [i:j] ≠ [i′:j′].

Interval Skyline
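◦ A time series s dominates another time series q in [i:j] if s[k] ≥ q[k]
for every timestamp k in [i:j], and s[k] > q[k] for at least one k in [i:j].
◦ Given a set S of time series and an interval [i:j], the interval skyline is
the set of time series that are not dominated by any other time
series in [i:j], denoted by Sky[i:j].
Example: suppose S={S1, S2, S3}.
(Figure: time series S1, S2 and S3)
S1 and S2 are in Sky[16:22], while S3 is dominated by S2.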
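The definitions above translate directly into a brute-force check. The sketch below is in Python (our choice; the slides contain no code). It assumes that larger values are better, that timestamps are 1-based and inclusive, and that every series is a list of the same length. `dominates` and `interval_skyline` are illustrative names, not taken from the paper. Because it compares every pair of series, it is only a reference point for the faster methods that follow.

```python
from typing import Dict, List, Set


def dominates(a: List[float], b: List[float], i: int, j: int) -> bool:
    """True if a dominates b in [i:j]: a >= b at every timestamp, a > b at one or more."""
    pairs = list(zip(a[i - 1:j], b[i - 1:j]))  # 1-based, inclusive interval
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def interval_skyline(series: Dict[str, List[float]], i: int, j: int) -> Set[str]:
    """Brute-force Sky[i:j]: keep every series that no other series dominates."""
    # A series never dominates itself (the strict '>' fails), so no self-check is needed.
    return {
        name
        for name, s in series.items()
        if not any(dominates(q, s, i, j) for q in series.values())
    }
```

For example, with S = {"s1": [1, 4, 2], "s2": [3, 3, 3], "s3": [2, 3, 1]}, s2 dominates s3 over [1:3], so Sky[1:3] = {s1, s2}.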

Interval Skyline
Property 1: If there exist timestamps k1,…,kl (i≤k1<…<kl≤j) such that
… and s is the only such time series, then time series s is in Sky[i:j].

Problem Definition
◦ Given a set of time series S such that each time series is in the
base interval W, we want to maintain a data structure D such that any
interval skyline query in an interval [i:j] ⊆ W can be answered
efficiently using D.

Methods
◦ An On-The-Fly Method
▪ Original Interval Skyline Query Algorithm
▪ Online Interval Skyline Query Algorithm
◦ A View-Materialization Method


An On-the-fly Method

Idea
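Using the maximum and minimum values of the time series in [i:j],
we can determine some dominance relationships without comparing
the time series timestamp by timestamp: if the maximum value of s
in [i:j] is smaller than the minimum value of q in [i:j], then q dominates s.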

Algorithm
1. Set the current skyline set Sky to ∅;
2. Sort the time series into a list L in descending order of their
   maximum value s.max[i:j];
3. Set maxmin to the maximum of the minimum values s.min[i:j] of the
   time series in Sky (−∞ while Sky is empty);
4. For each time series s in L that satisfies s.max[i:j] ≥ maxmin,
   determine whether it dominates or is dominated by time series in
   Sky; if it is not dominated:
5.    add it into Sky;
6.    delete the time series it dominates from Sky;
7.    update maxmin;
8. Return Sky;
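The following is a minimal Python sketch of steps 1 to 8. It assumes that larger values are better and that timestamps are 1-based and inclusive. The function names are hypothetical. The sketch recomputes max[i:j] and min[i:j] by scanning the interval, which is exactly the cost the improvement below targets.

```python
from typing import Dict, List


def dominates(a: List[float], b: List[float]) -> bool:
    # a dominates b: never smaller, strictly larger somewhere (larger is better).
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def on_the_fly_skyline(series: Dict[str, List[float]], i: int, j: int) -> List[str]:
    """Sky[i:j] with max/min pruning; i and j are 1-based and inclusive."""
    win = {name: s[i - 1:j] for name, s in series.items()}
    # Step 2: sort by the maximum value in [i:j], largest first.
    order = sorted(win, key=lambda n: max(win[n]), reverse=True)
    sky: List[str] = []                  # Step 1
    maxmin = float("-inf")               # Step 3: largest min[i:j] among series in Sky
    for name in order:
        s = win[name]
        # Step 4: if max[i:j](s) < maxmin, the series holding maxmin is above s at
        # every timestamp, so s and every later (smaller-max) series are dominated.
        if max(s) < maxmin:
            break
        if any(dominates(win[q], s) for q in sky):
            continue
        sky = [q for q in sky if not dominates(s, win[q])]  # Step 6
        sky.append(name)                                    # Step 5
        # Step 7: any removed series had min <= min(s), so this stays the true maximum.
        maxmin = max(maxmin, min(s))
    return sky                                              # Step 8
```

The early `break` is valid only because L is sorted by maximum value. Once one series fails the test, every later series fails it too.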

Example
Goal: compute the skyline in interval [2:3]
Steps:
1. s2->Sky, maxmin =1
2. s3->Sky, maxmin =2
3. s5->Sky, maxmin =4
4. s5 dominates s1, so s1 is discarded, maxmin =4
5. s4.min=3<4=maxmin, s4 is discarded.
Return Sky={s2,s3,s5}

Disadvantage
Computing max[i:j] and min[i:j] of every time series from scratch for
each query interval [i:j] requires scanning the interval, which is costly.

Improvement Idea
• Utilize a radix priority search tree to maintain min[i:j]
• Use a sketch to keep the max value for each time series

Radix Priority Search Tree
A radix priority search tree is a two-dimensional data structure,
a hybrid of a heap on one dimension and a binary search tree
on the other dimension.
Advantages:
• Insertion in O(h)
• Deletion in O(h)
• Query (minimum value within a timestamp range) in O(h)
h: the height of the tree

Radix Priority Search Tree
◦ Build
• Use the timestamps as the binary tree dimension X and
the data value as the heap dimension Y;
• Map W into a fixed domain of X, {0,1,...,w-1};
• The height of the tree is O(log w)
◦ Update
• One insertion: the newest value s[t]
• One deletion: the oldest value, which slides out of W
(t: the most recent timestamp)
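The following is a minimal Python sketch of a radix priority search tree. It rests on three assumptions:

- x values (timestamp slots) lie in {0, …, w−1}, with at most one point per x at a time.
- y (the data value) is kept as a min-heap, because the tree is used here to maintain min[i:j].
- `RadixPST`, `insert`, `delete` and `min_in_range` are hypothetical names.

Each node owns a fixed half of its parent's x range, so the height is about log2 w and every operation visits O(h) nodes.

```python
import math
from typing import Optional, Tuple

Point = Tuple[int, float]  # (x = timestamp slot, y = data value)


class _Node:
    def __init__(self, lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi            # fixed x range covered by this subtree
        self.point: Optional[Point] = None   # point with the smallest y in the subtree
        self.left: Optional["_Node"] = None
        self.right: Optional["_Node"] = None


class RadixPST:
    """Min-heap on y, radix split on x over {0, ..., w-1}."""

    def __init__(self, w: int) -> None:
        self.root = _Node(0, w - 1)

    def insert(self, x: int, y: float) -> None:
        node, point = self.root, (x, y)
        while node.point is not None:
            if point[1] < node.point[1]:
                node.point, point = point, node.point  # the smaller y stays higher
            mid = (node.lo + node.hi) // 2
            if point[0] <= mid:                       # descend along the carried point's x
                node.left = node.left or _Node(node.lo, mid)
                node = node.left
            else:
                node.right = node.right or _Node(mid + 1, node.hi)
                node = node.right
        node.point = point

    def delete(self, x: int) -> None:
        node = self.root
        # A point only ever lives on the root-to-leaf path of its own x.
        while node is not None and (node.point is None or node.point[0] != x):
            mid = (node.lo + node.hi) // 2
            node = node.left if x <= mid else node.right
        # Fill the hole by promoting the child point with the smaller y, level by level.
        while node is not None:
            kids = [c for c in (node.left, node.right) if c is not None and c.point is not None]
            if not kids:
                node.point = None
                return
            best = min(kids, key=lambda c: c.point[1])
            node.point, node = best.point, best

    def min_in_range(self, i: int, j: int) -> float:
        """Smallest y among points with i <= x <= j (math.inf if there is none)."""
        return self._query(self.root, i, j)

    def _query(self, node: Optional[_Node], i: int, j: int) -> float:
        if node is None or node.point is None or node.hi < i or node.lo > j:
            return math.inf
        if i <= node.lo and node.hi <= j:
            return node.point[1]  # subtree fully inside: its top point is the minimum
        here = node.point[1] if i <= node.point[0] <= j else math.inf
        return min(here, self._query(node.left, i, j), self._query(node.right, i, j))
```

In the online setting, each new timestamp is handled by a `delete` of the expiring value followed by an `insert` of the new one. The slides do not spell out how timestamps map to slots. A natural assumption is slot = timestamp mod w. Under that mapping a query interval can wrap around and would then need two `min_in_range` calls.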

Sketches
◦ A (value, timestamp) pair (v,t) is maintained if there is no other pair
(v1,t1) such that v1>v and t1>t;
◦ These pairs form the skyline of the points in the interval;
◦ The expected number of points in the skyline is
O(log w);
◦ With the sketches, finding the maximum value in W
costs O(1) time;
W=[1:3]
Sketches: (4,1),(3,2),(2,3)
W=[1:4]
Sketches: (5,4)
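The sketch works like the monotonic deque commonly used for sliding-window maxima. This is a minimal Python version. It assumes one value per timestamp, values arriving in increasing timestamp order, and a window made of the last w timestamps. `MaxSketch`, `append`, `max_value` and `max_since` are hypothetical names. The pairs are kept oldest first with non-increasing values, so the oldest surviving pair holds the maximum of the window.

```python
from collections import deque
from typing import Deque, Tuple


class MaxSketch:
    """Keeps (v, t) pairs that no later, strictly larger pair dominates."""

    def __init__(self, w: int) -> None:
        self.w = w                                      # window width
        self.pairs: Deque[Tuple[float, int]] = deque()  # oldest first, values non-increasing

    def append(self, v: float, t: int) -> None:
        # The new pair is later than every stored pair, so it dominates those with v1 < v.
        while self.pairs and self.pairs[-1][0] < v:
            self.pairs.pop()
        self.pairs.append((v, t))
        # Expire pairs that fell out of the window [t - w + 1 : t].
        while self.pairs[0][1] <= t - self.w:
            self.pairs.popleft()

    def max_value(self) -> float:
        return self.pairs[0][0]  # O(1): the oldest surviving pair holds the maximum

    def max_since(self, i: int) -> float:
        # Maximum over the suffix [i : t]: the first pair whose timestamp is >= i.
        for v, t in self.pairs:
            if t >= i:
                return v
        raise ValueError("no value at or after timestamp i")
```

Appending (4,1), (3,2) and (2,3) reproduces the sketch (4,1),(3,2),(2,3) above. Appending (5,4) then leaves only (5,4).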

Complexity
◦ Space (per time series)
▪ Radix priority search tree: O(w)
▪ Sketch of the max values: O(log w)
Total for n time series: O(nw)
◦ Time (per update of each time series)
▪ Radix priority search tree: O(log w)
▪ Sketch of the max values: O(log w)
Total for n time series: O(n log w)


A View-Materialization Method

Non-redundant interval skylines
A time series s is called a non-redundant skyline time
series in interval [i:j] if
1) s is in the skyline in interval [i:j], and
2) s is not in the skyline in any subinterval [i′:j′] ⊂ [i:j].
Each time series is non-redundant in at most w intervals. This can be
proved by the pigeonhole principle: if there were more than w such
intervals, at least two of them would share the same starting timestamp,
so one would be a proper subinterval of the other, and the time series
could not be non-redundant (minimal) in the larger one.
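The following Python brute force illustrates the definition and the at-most-w bound. It assumes larger-is-better dominance, 1-based inclusive timestamps and W = [1:w]. All names are hypothetical. It enumerates every subinterval, so it is far slower than the incremental maintenance described in the slides and is meant only for checking small cases.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]


def dominates(a: List[float], b: List[float], i: int, j: int) -> bool:
    # a dominates b in [i:j]: a >= b everywhere and a > b somewhere.
    pairs = list(zip(a[i - 1:j], b[i - 1:j]))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def sky(series: Dict[str, List[float]], i: int, j: int) -> Set[str]:
    # Brute-force Sky[i:j]; a series never dominates itself.
    return {n for n, s in series.items()
            if not any(dominates(q, s, i, j) for q in series.values())}


def all_nrsky(series: Dict[str, List[float]], w: int) -> Dict[Interval, Set[str]]:
    """NRSky[i:j] for every interval [i:j] inside [1:w]."""
    skies = {(i, j): sky(series, i, j) for i in range(1, w + 1) for j in range(i, w + 1)}
    nr: Dict[Interval, Set[str]] = {}
    for (i, j), members in skies.items():
        proper_subs = [(a, b) for (a, b) in skies if i <= a <= b <= j and (a, b) != (i, j)]
        # Condition 2): keep only members absent from every proper subinterval's skyline.
        nr[(i, j)] = {s for s in members if not any(s in skies[p] for p in proper_subs)}
    return nr


def max_intervals_per_series(nr: Dict[Interval, Set[str]]) -> int:
    # By the pigeonhole argument above, this never exceeds w.
    counts = Counter(s for members in nr.values() for s in members)
    return max(counts.values(), default=0)
```

With a = [3, 1], b = [1, 3], c = [2, 2] and d = [1, 1], the non-redundant skylines are NRSky[1:1] = {a}, NRSky[2:2] = {b} and NRSky[1:2] = {c}. Here a and b are already skyline series in a shorter interval, and d is dominated by c.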

Idea
Suppose all non-redundant interval skylines are materialized. To answer
a query in [i:j], we take the union of NRSky[i′:j′] over all subintervals
[i′:j′] ⊆ [i:j] and remove the time series that fail Lemma 2.
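Lemma 2 itself is not reproduced on these slides. The Python sketch below therefore substitutes a plain dominance check among the candidates; treat that filter as an assumption, not as the paper's lemma. The sketch also assumes that NRSky has been materialized for every subinterval, in a dict keyed by (i, j). `query_from_views` and `dominates` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple


def dominates(a: List[float], b: List[float], i: int, j: int) -> bool:
    # a dominates b in [i:j] (1-based, inclusive): a >= b everywhere, a > b somewhere.
    pairs = list(zip(a[i - 1:j], b[i - 1:j]))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def query_from_views(series: Dict[str, List[float]],
                     views: Dict[Tuple[int, int], Set[str]],
                     i: int, j: int) -> Set[str]:
    """Sky[i:j] from materialized NRSky views."""
    # Union of NRSky[i':j'] over all materialized subintervals [i':j'] of [i:j].
    candidates: Set[str] = set()
    for (a, b), members in views.items():
        if i <= a <= b <= j:
            candidates |= members
    # Every skyline member is non-redundant in some minimal subinterval, so none is
    # missed. A dominated candidate is dominated by some skyline member, which is
    # itself a candidate, so checking against the candidates alone is enough.
    return {s for s in candidates
            if not any(dominates(series[q], series[s], i, j) for q in candidates)}
```

Only candidates are compared, so the query cost depends on the size of the union rather than on the total number of time series.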

Algorithm

Example
W = [2:4]
Goal: compute the interval skyline in [3:4]
Steps:
1. s3->Sky
2. s4->Sky
3. s1->Sky (s2 is dominated by s1)
Return Sky={s1,s3,s4}
How can the non-redundant skylines be maintained?

Steps

Step 1
◦ Use the on-the-fly algorithm to obtain the interval
skyline in the new interval W′.
◦ Find possible false negatives.

Step 2: Shared Divide-and-Conquer (SDC) Algorithm
◦ This algorithm is an extension of the divide-and-conquer
algorithm (DC).
◦ In SDC, a space is defined as a time interval. Each
timestamp represents a dimension.
◦ The related spaces (intervals) are organized as a
path, e.g., [j:j], [j−1:j], ..., [i:j] (i<j).
(Figure: Divide Step and Merge Step of SDC)

Comparisons

Results

Step 3: Remove “redundant time series”


Experiments

Parameters

Synthetic Data Sets
◦ Data Set Properties
◦ Query Efficiency

Synthetic Data Sets
◦ Update Efficiency
◦ Space Cost

Stock Data Sets
◦ Query Time