Ranking Large Temporal Data Jeffrey Jestes Jeff M. Phillips Feifei Li

advertisement
Ranking Large Temporal Data
Jeffrey Jestes
Jeff M. Phillips
1
Feifei Li
School of Computing
University of Utah
August 29, 2012
Mingwang Tang
Introduction
Temporal data is important in numerous domains:
financial market
scientific applications
biomedical field
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Introduction
Temporal data is important in numerous domains:
financial market
scientific applications
biomedical field
Extensive efforts have been made towards efficiently storing,
processing, and querying temporal data.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Introduction
Temporal data is important in numerous domains:
financial market
scientific applications
biomedical field
Extensive efforts have been made towards efficiently storing,
processing, and querying temporal data.
Ranking temporal data has only recently been studied. [LYL10]
[LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Related Work
The instant top-k query returns objects oi s with the k highest scores
at query time t. [LYL10]
Score
o1
o2
o3
Time
[LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Related Work
The instant top-k query returns objects oi s with the k highest scores
at query time t. [LYL10]
rank 1
rank 2
Score
o1
o2
o3
t1
Time
[LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Related Work
The instant top-k query returns objects oi s with the k highest scores
at query time t. [LYL10]
rank 1
rank 2
Score
o1
o2
o3
t1
Time
What is a good value for t?
[LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Related Work
The instant top-k query returns objects oi s with the k highest scores
at query time t. [LYL10]
rank 1
rank 2
Score
o1
o2
o3
t1
Time
What is a good value for t?
[LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Related Work
The instant top-k query returns objects oi s with the k highest scores
at query time t. [LYL10]
rank 1
rank 2
Score
o1
o2
o3
t1
Time
What is a good value for t?
[LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Related Work
The instant top-k query returns objects oi s with the k highest scores
at query time t. [LYL10]
rank 1
rank 2
Score
o1
o2
o3
t1
t2
Time
Use aggregation within a temporal interval instead!!!
[LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Related Work
The instant top-k query returns objects oi s with the k highest scores
at query time t. [LYL10]
rank 1
rank 2
Score
o1
o2
o3
t1
t2
Time
Use aggregation within a temporal interval instead!!!
Example: Return top-10 weather stations with highest average
temperature from 1 Aug to 27 Aug.
[LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Problem Formulation
Score
o1
o2
o3
Time
Temporal database consists of m objects o1 , o2 , . . . , om
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Problem Formulation
Score
gi : R → R ( time → score )
g3 (t) = 90
t
o1
o2
n1 = 3
n2 = 5
n3 = 6
o3
Time
Temporal database consists of m objects o1 , o2 , . . . , om
oi is represented by piecewise linear function gi with ni segments.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Problem Formulation
Score
gi : R → R ( time → score )
g3 (t) = 90
t
o1
o2
n1 = 3
n2 = 5
n3 = 6
o3
Time
Temporal database consists of m objects o1 , o2 , . . . , om
oi is represented by piecewise linear function gi with ni segments.
top-k(t1 , t2 , σ) is an aggregate top-k query for aggregate function σ
gi (t1 , t2 ) represent all possible values of gi in [t1 , t2 ]
σ(gi (t1 , t2 )) (= σi (t1 , t2 )) is the aggregate score of oi in [t1 , t2 ]
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Problem Formulation
Score
gi : R → R ( time → score )
g3 (t) = 90
t
o1
o2
n1 = 3
n2 = 5
n3 = 6
o3
Time
Temporal database consists of m objects o1 , o2 , . . . , om
oi is represented by piecewise linear function gi with ni segments.
top-k(t1 , t2 , σ) is an aggregate top-k query for aggregate function σ
gi (t1 , t2 ) represent all possible values of gi in [t1 , t2 ]
σ(gi (t1 , t2 )) (= σi (t1 , t2 )) is the
R t2aggregate score of oi in [t1 , t2 ]
For σ = sum, σ(gi (t1 , t2 )) = t1 gi (t)dt
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Problem Formulation
Score
o1
o2
o3
Time
A(k, t1 , t2 ) : ordered top-k objects for top-k(t1 , t2 , σ)
R t2
Let σ = sum = t1 g (t)dt
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Problem Formulation
A(2, t1 , t2 ) = {o3 , o1 }
Score
o1
o2
o3
t1
t2
Time
A(k, t1 , t2 ) : ordered top-k objects for top-k(t1 , t2 , σ)
R t2
Let σ = sum = t1 g (t)dt
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Problem Formulation
A(1, t1 , t2 ) = {o1 }
Score
o1
o2
o3
t1
t2
Time
A(k, t1 , t2 ) : ordered top-k objects for top-k(t1 , t2 , σ)
R t2
Let σ = sum = t1 g (t)dt
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Outline
1
Introduction and Problem Formulation
2
Exact Solutions
Baseline Solution
Improved Solution using Prefix Sums and B-tree Forest
Improved Solution using Prefix Sums and Interval Tree
3
Approximate Solutions
Overview
Breakpoints
Approaches for Approximation Queries
Combining Breakpoints with Queries
4
Experiments
5
Conclusions
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Baseline Solution
Compute σi (t1 , t2 ) for all objects by scanning each segment.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Baseline Solution
Compute σi (t1 , t2 ) for all objects by scanning each segment.
Simple improvement: use B-tree to avoid segments outside query
interval.
Pm
qi
Query cost: O(logB N + i=1
+ (m/B)logB k)
B
qi = number of segments overlapping [t1 , t2 ]
We denote this query Exact1.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
We can avoid scanning all overlapping segments with [t1 , t2 ] by
using prefix sums:
Index segment and prefix sums for an object in a B-tree.
Compute σi (t1 , t2 ) by retrieving two segments from B-tree.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
We can avoid scanning all overlapping segments with [t1 , t2 ] by
using prefix sums:
Index segment and prefix sums for an object in a B-tree.
Compute σi (t1 , t2 ) by retrieving two segments from B-tree.
Pm
Query cost is O( i=1 logB ni + (m/B)logB k)
This solution is denoted Exact2.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and Interval Tree
I3,2
I3,1
Score
I3,4
I3,3
g3,4
g3,3
g3,2
g3,1
t3,0
t3,1
t3,2
t3,3
t3,4
Time
Consider an object oi with intervals Ii,1 , . . . , Ii,ni
gi,j = jth segment of oi is ((ti,j−1 , vi,j−1 ), (ti,j , vi,j ))
Ii,` = [ti,0 , ti,` ] for ` = 1, . . . , ni
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and Interval Tree
I3,1
I3,2
I3,3
I3,4
Time
−
−
−
We define Ii,1
, . . . , Ii,n
s.t. Ii,`
= [Ii,`−1 , Ii,` ]
i
The data entries for i = 1, . . . , m and ` = 1, . . . , ni are
−
key: (Ii,`
) and value: (gi,` , σi (Ii,` ))
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and Interval Tree
I3,2
I3,1
−
I3,1
−
I3,2
−
I3,3
I3,3
I3,4
−
I3,4
Time
−
−
−
We define Ii,1
, . . . , Ii,n
s.t. Ii,`
= [Ii,`−1 , Ii,` ]
i
The data entries for i = 1, . . . , m and ` = 1, . . . , ni are
−
key: (Ii,`
) and value: (gi,` , σi (Ii,` ))
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and Interval Tree
I3,2
I3,1
−
I3,1
o1
o2
o3
..
.
om
g3,1
−
I3,2
−
I3,3
I3,3
I3,4
−
I3,4
Time
–
–
: σ3 (I3,1 )
...
–
−
−
−
We define Ii,1
, . . . , Ii,n
s.t. Ii,`
= [Ii,`−1 , Ii,` ]
i
The data entries for i = 1, . . . , m and ` = 1, . . . , ni are
−
key: (Ii,`
) and value: (gi,` , σi (Ii,` ))
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and Interval Tree
I3,2
I3,1
−
I3,1
−
I3,2
−
I3,3
I3,4
I3,3
−
I3,4
t1
t2
Time
Issue two stabbing queries: t1 , t2
−
−
−
We define Ii,1
, . . . , Ii,n
s.t. Ii,`
= [Ii,`−1 , Ii,` ]
i
The data entries for i = 1, . . . , m and ` = 1, . . . , ni are
−
key: (Ii,`
) and value: (gi,` , σi (Ii,` ))
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and Interval Tree
I3,2
I3,1
−
I3,1
−
I3,2
−
I3,3
I3,4
I3,3
−
I3,4
t1
o1
o2
o3
..
.
om
g3,2
–
–
: σ3 (I3,2 )
..
.
–
t2
o1
o2
o3
..
.
om
g3,4
Time
–
–
: σ3 (I3,4 )
..
.
–
Retrieve associated 2m data entries
−
−
−
We define Ii,1
, . . . , Ii,n
s.t. Ii,`
= [Ii,`−1 , Ii,` ]
i
The data entries for i = 1, . . . , m and ` = 1, . . . , ni are
−
key: (Ii,`
) and value: (gi,` , σi (Ii,` ))
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and Interval Tree
I3,2
I3,1
−
I3,1
−
I3,2
−
I3,3
We have g3,2 ; g3,4 ; I3,2 ; I3,4
−
I3,4
t1
Score
I3,4
I3,3
t2
g3,4
g3,3
Time
o3
g3,2
g3,1
t1
t2
Time
−
−
−
We define Ii,1
, . . . , Ii,n
s.t. Ii,`
= [Ii,`−1 , Ii,` ]
i
The data entries for i = 1, . . . , m and ` = 1, . . . , ni are
−
key: (Ii,`
) and value: (gi,` , σi (Ii,` ))
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and Interval Tree
I3,2
I3,1
−
I3,1
−
I3,2
−
I3,3
We have g3,2 ; g3,4 ; I3,2 ; I3,4
−
I3,4
t1
Score
I3,4
I3,3
t2
g3,4
g3,3
Compute as I3,4 − I3,2
Time
o3
g3,2
g3,1
t1
t2
Time
−
−
−
We define Ii,1
, . . . , Ii,n
s.t. Ii,`
= [Ii,`−1 , Ii,` ]
i
The data entries for i = 1, . . . , m and ` = 1, . . . , ni are
−
key: (Ii,`
) and value: (gi,` , σi (Ii,` ))
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and Interval Tree
I3,2
I3,1
−
I3,1
−
I3,2
−
I3,3
We have g3,2 ; g3,4 ; I3,2 ; I3,4
−
I3,4
t1
Score
I3,4
I3,3
t2
g3,4
g3,3
Time
Subtract using g3,4
o3
g3,2
g3,1
t1
t2
Time
−
−
−
We define Ii,1
, . . . , Ii,n
s.t. Ii,`
= [Ii,`−1 , Ii,` ]
i
The data entries for i = 1, . . . , m and ` = 1, . . . , ni are
−
key: (Ii,`
) and value: (gi,` , σi (Ii,` ))
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and Interval Tree
I3,2
I3,1
−
I3,1
−
I3,2
−
I3,3
We have g3,2 ; g3,4 ; I3,2 ; I3,4
−
I3,4
t1
Score
I3,4
I3,3
t2
g3,4
g3,3
Time
Subtract using g3,4
o3
g3,2
Add using g3,2
g3,1
t1
t2
Time
−
−
−
We define Ii,1
, . . . , Ii,n
s.t. Ii,`
= [Ii,`−1 , Ii,` ]
i
The data entries for i = 1, . . . , m and ` = 1, . . . , ni are
−
key: (Ii,`
) and value: (gi,` , σi (Ii,` ))
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and Interval Tree
I3,2
I3,1
−
I3,1
−
I3,2
−
I3,3
−
I3,4
t1
o1
o2
o3
..
.
om
g3,2
–
–
: σ3 (I3,2 )
..
.
–
I3,4
I3,3
t2
o1
o2
o3
..
.
om
g3,4
Time
–
–
: σ3 (I3,4 )
..
.
–
Retrieve associated 2m data entries
Total stabbing query cost is O(logB N + m/B).
Using priority queue to get top-k is O(logB N + (m/B)logB k).
We denote this query Exact3.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Outline
1
Introduction and Problem Formulation
2
Exact Solutions
Baseline Solution
Improved Solution using Prefix Sums and B-tree Forest
Improved Solution using Prefix Sums and Interval Tree
3
Approximate Solutions
Overview
Breakpoints
Approaches for Approximation Queries
Nested B-tree Approximate Query
Dyadic Interval Approximate Query
Combining Breakpoints with Queries
4
Experiments
5
Conclusions
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Approximate Solution Overview
N segments
m objects
BreakPoints1
Query1
BreakPoints2
Query2
Our most query-efficient technique costs O(logB N + m/B).
Must compute all m aggregates σi (t1 , t2 ).
Still too expensive for large datasets with large m.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Approximate Solution Overview
N segments
m objects
BreakPoints1
Query1
BreakPoints2
Query2
Our most query-efficient technique costs O(logB N + m/B).
Must compute all m aggregates σi (t1 , t2 ).
Still too expensive for large datasets with large m.
Our approximate methods construct breakpoints
B = {b1 , . . . , br }, bi ∈ [0, T ].
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Approximate Solution Overview
N segments
m objects
BreakPoints1
Query1
BreakPoints2
Query2
Our most query-efficient technique costs O(logB N + m/B).
Must compute all m aggregates σi (t1 , t2 ).
Still too expensive for large datasets with large m.
Our approximate methods construct breakpoints
B = {b1 , . . . , br }, bi ∈ [0, T ].
Queries are snapped to align to breakpoints.
A query snapped to (bi , bj ) uses σi (bi , bj ) as an object’s score.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Approximate Solution Notations
G is an (ε, α)-approximation algorithm if:
G returns σ
ei (t1 , t2 ) s.t.
σi (t1 , t2 )/α − εM ≤ σ
ei (t1 , t2 ) ≤ σi (t1 , t2 ) + εM
α ≥ 1,Pε > 0
M= m
i=1 σi (0, T )
Must hold for all objects and temporal intevals.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Approximate Solution Notations
Score
e
o2 A(j)
t1
t2
Time
e
e t1 , t2 ))
A(j) (A(j))
= the jth ranked object in A(k, t1 , t2 ) (A(k,
R is an (ε, α)-approximation algorithm of top-k(t1 , t2 , σ) if:
e t1 , t2 ) and σ
R returns A(k,
eA(j)
e (t1 , t2 ) for j ∈ [1, k], s.t.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Approximate Solution Notations
Score
e
o2 A(j)
t1
t2
Time
σ2 (t1 , t2 )/α − εM ≤ σe 2 (t1 , t2 ) ≤ σ2 (t1 , t2 ) + εM
e
e t1 , t2 ))
A(j) (A(j))
= the jth ranked object in A(k, t1 , t2 ) (A(k,
R is an (ε, α)-approximation algorithm of top-k(t1 , t2 , σ) if:
e t1 , t2 ) and σ
R returns A(k,
eA(j)
e (t1 , t2 ) for j ∈ [1, k], s.t.
1
σ
eA(j)
e (t1 , t2 ) is an (ε, α)-approximation of σA(j)
e (t1 , t2 )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Approximate Solution Notations
Score
e
o2 A(j)
o3 A(j)
t2
t1
Time
σ3 (t1 , t2 )/α − εM ≤ σe 2 (t1 , t2 ) ≤ σ3 (t1 , t2 ) + εM
e
e t1 , t2 ))
A(j) (A(j))
= the jth ranked object in A(k, t1 , t2 ) (A(k,
R is an (ε, α)-approximation algorithm of top-k(t1 , t2 , σ) if:
e t1 , t2 ) and σ
R returns A(k,
eA(j)
e (t1 , t2 ) for j ∈ [1, k], s.t.
1
2
σ
eA(j)
e (t1 , t2 ) is an (ε, α)-approximation of σA(j)
e (t1 , t2 )
σ
eA(j)
(t
,
t
)
is
an
(ε,
α)-approximation
of
σ
1 2
e
A(j) (t1 , t2 )
Must hold for all k and all temporal intervals.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Breakpoints
Starting from b0 and moving forward we have:
(P
m
in BreakPoints1(B1 )
i=1 σi (bj , bj+1 ) = εM,
bj+1 so
m
maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 )
Score
o1
o2
o3
Time
breakpoint
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Breakpoints
Starting from b0 and moving forward we have:
(P
m
in BreakPoints1(B1 )
i=1 σi (bj , bj+1 ) = εM,
bj+1 so
m
maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 )
Score
o1
o2
o3
Time
breakpoint
σ1 (bj , bj+1 ) + σ2 (bj , bj+1 ) + σ3 (bj , bj+1 ) = εM
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Breakpoints
Starting from b0 and moving forward we have:
(P
m
in BreakPoints1(B1 )
i=1 σi (bj , bj+1 ) = εM,
bj+1 so
m
maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 )
Score
o1
o2
o3
Time
breakpoint
σ1 (bj , bj+1 ) + σ2 (bj , bj+1 ) + σ3 (bj , bj+1 ) = εM
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Breakpoints
Starting from b0 and moving forward we have:
(P
m
in BreakPoints1(B1 )
i=1 σi (bj , bj+1 ) = εM,
bj+1 so
m
maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 )
Score
o1
o2
o3
Time
breakpoint
σ1 (bj , bj+1 ) + σ2 (bj , bj+1 ) + σ3 (bj , bj+1 ) = εM
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Breakpoints
Starting from b0 and moving forward we have:
(P
m
in BreakPoints1(B1 )
i=1 σi (bj , bj+1 ) = εM,
bj+1 so
m
maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 )
Score
o1
o2
o3
Time
breakpoint
σ1 (bj , bj+1 ) + σ2 (bj , bj+1 ) + σ3 (bj , bj+1 ) = εM
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Breakpoints
Starting from b0 and moving forward we have:
(P
m
in BreakPoints1(B1 )
i=1 σi (bj , bj+1 ) = εM,
bj+1 so
m
maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 )
Score
o1
o2
o3
Time
breakpoint
max{σ1 (bj , bj+1 ), σ2 (bj , bj+1 ), σ3 (bj , bj+1 } = εM
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Breakpoints
Starting from b0 and moving forward we have:
(P
m
in BreakPoints1(B1 )
i=1 σi (bj , bj+1 ) = εM,
bj+1 so
m
maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 )
Score
o1
o2
o3
Time
breakpoint
max{σ1 (bj , bj+1 ), σ2 (bj , bj+1 ), σ3 (bj , bj+1 } = εM
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Properties of Breakpoints
Starting from b0 and moving forward we have:
(P
m
in BreakPoints1(B1 )
i=1 σi (bj , bj+1 ) = εM,
bj+1 so
m
maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 )
We show how to efficiently construct both types of breakpoints
A cost of O((N/B)logB N) IOs for both types.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Properties of Breakpoints
Starting from b0 and moving forward we have:
(P
m
in BreakPoints1(B1 )
i=1 σi (bj , bj+1 ) = εM,
bj+1 so
m
maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 )
We show how to efficiently construct both types of breakpoints
A cost of O((N/B)logB N) IOs for both types.
The theoretical number of breakpoints is O(1/ε) for both types.
BreakPoints2 has much fewer breakpoints than BreakPoints1
in practice.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Answering Queries with Breakpoints
Score
o1
o2
o3
Time
breakpoint
We show how to answer queries using B1 or B2 approximately.
∀(t1 , t2 ), let (B(t1 ), B(t2 )) be the approximate interval
B(t1 ) = minbi ∈B s.t. B(t1 ) ≥ t1
B(t2 ) = minbi ∈B s.t. B(t2 ) ≥ t2
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Answering Queries with Breakpoints
Score
t1
t2
o1
o2
o3
Time
breakpoint
We show how to answer queries using B1 or B2 approximately.
∀(t1 , t2 ), let (B(t1 ), B(t2 )) be the approximate interval
B(t1 ) = minbi ∈B s.t. B(t1 ) ≥ t1
B(t2 ) = minbi ∈B s.t. B(t2 ) ≥ t2
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Answering Queries with Breakpoints
Score
t1
t2
o1
o2
o3
breakpoint
B(t1 )
B(t2 )
Time
We show how to answer queries using B1 or B2 approximately.
∀(t1 , t2 ), let (B(t1 ), B(t2 )) be the approximate interval
B(t1 ) = minbi ∈B s.t. B(t1 ) ≥ t1
B(t2 ) = minbi ∈B s.t. B(t2 ) ≥ t2
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Answering Queries with Breakpoints
Score
t1
t2
o1
o2
o3
breakpoint
B(t1 )
B(t2 )
Time
We show how to answer queries using B1 or B2 approximately.
∀(t1 , t2 ), let (B(t1 ), B(t2 )) be the approximate interval
B(t1 ) = minbi ∈B s.t. B(t1 ) ≥ t1
B(t2 ) = minbi ∈B s.t. B(t2 ) ≥ t2
Lemma
∀(t1 , t2 ) and its approximate interval (B(t1 ), B(t2 )): ∀oi ,
|σi (t1 , t2 ) − σi (B(t1 ), B(t2 ))| ≤ εM.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Answering Queries with Breakpoints
Score
t1
t2
o1
o2
o3
breakpoint
B(t1 )
B(t2 )
Time
We show how to answer queries using B1 or B2 approximately.
∀(t1 , t2 ), let (B(t1 ), B(t2 )) be the approximate interval
B(t1 ) = minbi ∈B s.t. B(t1 ) ≥ t1
B(t2 ) = minbi ∈B s.t. B(t2 ) ≥ t2
Lemma
∀(t1 , t2 ) and its approximate interval (B(t1 ), B(t2 )): ∀oi ,
|σi (t1 , t2 ) − σi (B(t1 ), B(t2 ))| ≤ εM.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Outline
1
Introduction and Problem Formulation
2
Exact Solutions
Baseline Solution
Improved Solution using Prefix Sums and B-tree Forest
Improved Solution using Prefix Sums and Interval Tree
3
Approximate Solutions
Overview
Breakpoints
Approaches for Approximation Queries
Nested B-tree Approximate Query
Dyadic Interval Approximate Query
Combining Breakpoints with Queries
4
Experiments
5
Conclusions
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
Left end-point index.
Time
Query1 indexes all
n
2
intervals of breakpoints B.
For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
Left end-point index.
Time
Right end-point index.
Time
Query1 indexes all
n
2
intervals of breakpoints B.
For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
Left end-point index.
Time
Right end-point index.
Time
Query1 indexes all
n
2
intervals of breakpoints B.
For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
Left end-point index.
Time
Right end-point index.
Time
Query1 indexes all
n
2
intervals of breakpoints B.
For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
Left end-point index.
Time
Right end-point index.
Time
Query1 indexes all
n
2
intervals of breakpoints B.
For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
t1
Left end-point index.
Time
Query1 indexes all
n
2
intervals of breakpoints B.
For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed.
At query time we probe first-level B-tree with t1 to get B(t1 ).
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
t1
Left end-point index.
Time
B(t1 )
Query1 indexes all
n
2
intervals of breakpoints B.
For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed.
At query time we probe first-level B-tree with t1 to get B(t1 ).
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
t1
Left end-point index.
B(t1 )
Time
t2
Right end-point index.
Time
Query1 indexes all
n
2
intervals of breakpoints B.
For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed.
At query time we probe first-level B-tree with t1 to get B(t1 ).
We probe B(t1 )’s associated nested B-tree to get B(t2 ).
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
t1
Left end-point index.
B(t1 )
Time
t2
Right end-point index.
B(t2 )
Query1 indexes all
n
2
Time
intervals of breakpoints B.
For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed.
At query time we probe first-level B-tree with t1 to get B(t1 ).
We probe B(t1 )’s associated nested B-tree to get B(t2 ).
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
t1
Left end-point index.
B(t1 )
Time
t2
Right end-point index.
B(t2 )
Time
A(kmax , B(t1 ), B(t2 ))
Query1 indexes all
n
2
intervals of breakpoints B.
For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed.
At query time we probe first-level B-tree with t1 to get B(t1 ).
We probe B(t1 )’s associated nested B-tree to get B(t2 ).
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
t1
oi
o`1
..
.
B(t1 )
o`kmax
t2
σi (B(t1 ), B(t2 ))
σ`1 (B(t1 ), B(t2 ))
..
.
σ`kmax (B(t1 ), B(t2 ))
Objects ordered in descending order of σi (.)
B(t2 )
A(kmax , B(t1 ), B(t2 ))
Query1 indexes all
n
2
intervals of breakpoints B.
For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed.
At query time we probe first-level B-tree with t1 to get B(t1 ).
We probe B(t1 )’s associated nested B-tree to get B(t2 ).
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
t1
oi
o`1
..
.
B(t1 )
o`kmax
t2
σi (B(t1 ), B(t2 ))
σ`1 (B(t1 ), B(t2 ))
..
.
σ`kmax (B(t1 ), B(t2 ))
Objects ordered in descending order of σi (.)
Take top-k
B(t2 )
f
A(k,
t1 , t2 )
A(kmax , B(t1 ), B(t2 ))
Query1 indexes all
n
2
intervals of breakpoints B.
For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed.
At query time we probe first-level B-tree with t1 to get B(t1 ).
We probe B(t1 )’s associated nested B-tree to get B(t2 ).
e t1 , t2 ) is returned.
The approximate answer A(k,
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
t1
oi
o`1
..
.
B(t1 )
o`kmax
t2
σi (B(t1 ), B(t2 ))
σ`1 (B(t1 ), B(t2 ))
..
.
σ`kmax (B(t1 ), B(t2 ))
Objects ordered in descending order of σi (.)
B(t2 )
A(kmax , B(t1 ), B(t2 ))
We prove Query1 has the following properties:
Index size O((1/ε)2 kmax /B).
Query cost O(k/B + logB (1/ε)).
(ε, 1)-approximation.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Nested B-trees
breakpoint
t1
oi
o`1
..
.
B(t1 )
o`kmax
t2
σi (B(t1 ), B(t2 ))
σ`1 (B(t1 ), B(t2 ))
..
.
σ`kmax (B(t1 ), B(t2 ))
Objects ordered in descending order of σi (.)
B(t2 )
A(kmax , B(t1 ), B(t2 ))
We prove Query1 has the following properties:
Index size O((1/ε)2 kmax /B).
Query cost O(k/B + logB (1/ε)).
(ε, 1)-approximation.
Query2 reduces space to O((1/ε)kmax /B).
(ε, 2log (1/ε))-approximation.
Query cost O(k log(1/ε) logB k).
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
breakpoint
Time
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
breakpoint
Time
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
breakpoint
Time
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
breakpoint
Time
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
breakpoint
Time
t1
t2
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Consider a query over [t1 , t2 ].
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
breakpoint
Time
t1
t2
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Consider a query over [t1 , t2 ].
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
breakpoint
Time
t1
t2
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Consider a query over [t1 , t2 ].
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
breakpoint
Time
t1
t2
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Consider a query over [t1 , t2 ].
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
breakpoint
Time
t1
t2
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Consider a query over [t1 , t2 ].
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
breakpoint
Time
t1
t2
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Consider a query over [t1 , t2 ].
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
breakpoint
Time
t1
t2
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Consider a query over [t1 , t2 ].
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
breakpoint
Time
t1
t2
A(kmax , b2 , b3 ) : σA(j) (b2 , b3 )∀j ∈ [1, ..., kmax ]
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Consider a query over [t1 , t2 ].
At each dyadic interval [bi , bj ] we store A(kmax , bi , bj ).
There are at most 2log (1/ε) intervals and 2klog (1/ε) candidates.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
candidate running sums
oi
o6
..
.
o19
running sum σi0
σ60
..
.
0
σ19
at most 2k log(1/ε) candidates!!!
t1
t2
A(kmax , b2 , b3 ) : σA(j) (b2 , b3 ), ∀j ∈ [1, ..., kmax ]
Query2 indexes all dyadic intervals over the breakpoints B
The intervals represent the span of nodes in a balanced binary tree.
Consider a query over [t1 , t2 ].
At each dyadic interval [bi , bj ] we store A(kmax , bi , bj ).
There are at most 2log (1/ε) intervals and 2klog (1/ε) candidates.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Querying Breakpoints with Dyadic Intervals
candidate running sums
oi
o6
..
.
o19
running sum σi0
σ60
..
.
0
σ19
at most 2k log(1/ε) candidates!!!
t1
t2
A(kmax , b2 , b3 ) : σA(j) (b2 , b3 ), ∀j ∈ [1, ..., kmax ]
We prove Query2 has the following properties:
Index size O((1/ε)kmax /B).
Query cost O(k log(1/ε) logB k).
(ε, 2 log(1/ε))-approximation.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Combining Breakpoints with Queries
N segments
m objects
BreakPoints1
Query1
BreakPoints2
Query2
We consider the following algorithms:
Appx1-B: (Query1, BreakPoints1)
Appx2-B: (Query2, BreakPoints1)
Appx1: (Query1, BreakPoints2)
Appx2: (Query2, BreakPoints2)
Appx2+: (Query2, BreakPoints2) and Discovers candidates’
exact aggregate score using B-tree from Exact2 (B-tree forest).
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Combining Breakpoints with Queries
N segments
m objects
BreakPoints1
Query1
BreakPoints2
Query2
We consider the following algorithms:
Appx1-B: (Query1, BreakPoints1)
Appx2-B: (Query2, BreakPoints1)
Appx1: (Query1, BreakPoints2)
Appx2: (Query2, BreakPoints2)
Appx2+: (Query2, BreakPoints2) and Discovers candidates’
exact aggregate score using B-tree from Exact2 (B-tree forest).
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Experiments: Setup
Our algorithms are designed to efficiently handle I/Os.
All algorithms are implemented in C++ using TPIE.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Experiments: Setup
Our algorithms are designed to efficiently handle I/Os.
All algorithms are implemented in C++ using TPIE.
All experiments performed on Linux machine with:
Intel Core i7-2600 3.4GHz CPU
8GB of memory
1TB hard drive
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Experiments: Setup
Our algorithms are designed to efficiently handle I/Os.
All algorithms are implemented in C++ using TPIE.
All experiments performed on Linux machine with:
Intel Core i7-2600 3.4GHz CPU
8GB of memory
1TB hard drive
We use two real large datasets:
Temp is a temperature dataset from the MesoWest Project.
contains measurements from Jan 1997 to Oct 2011.
there are m = 145, 628 objects with average navg = 17, 833.
Meme is obtained from the Memetracker Project.
tracks the frequency of popular quotes over time.
there are m = 1.5 million objects with navg = 67.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Experiments: Default Values
Parameter
dataset
number of objects
average object line segments
max top-k value
top-k value
number of breakpoints
query interval size
TPIE disk block size
Symbol
m
navg
kmax
k
r = (1/ε)
(t2 − t1 )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Default value
Temp
50,000
1,000
200
50
500
20% T
4KB
Ranking Large Temporal Data
Experiment: Index size.
12
Index size (bytes)
10
Exact1
Appx1
Exact2 Exact3
Appx2 Appx2+
10
10
8
10
6
10
4
10
10
30
50
100
3
Objects m (×10 )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
145
Build time (seconds)
Experiment: Build time.
4
10
Exact1
Appx1
Exact2 Exact3
Appx2 Appx2+
3
10
2
10
1
10
0
10
10
30
50
100
3
Objects m (×10 )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
145
Experiment: Query I/Os.
8
10
Exact1
Appx1
Exact2 Exact3
Appx2 Appx2+
6
I/Os
10
4
10
2
10
0
10
10
30
50
100
3
Objects m ( ×10 )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
145
Time (seconds)
Experiment: Query time.
Exact1
Appx1
2
10
Exact2 Exact3
Appx2 Appx2+
0
10
−2
10
10
30
50
100
3
Objects m (×10 )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
145
Experiment: Precision/Recall.
Precision/Recall
1.00
0.98
0.96
0.94
0.92
0.90
0
Appx1
10
Appx2
20
Appx2+
30
40
(t2 − t1 ) as % of T
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
50
Experiment: Ratio.
Approximation ratio
1.09
Appx1
Appx2
Appx2+
1.06
1.03
1.00
0.97
0.94
0
10
20
30
40
(t2 − t1 ) as % of T
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
50
Conclusions
We studied ranking large temporal data using aggregate scores over
a query interval.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Conclusions
We studied ranking large temporal data using aggregate scores over
a query interval.
Our most efficient exact technique Exact3 is more efficient than
baseline solutions.
Approximations offer even more improvements.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Conclusions
We studied ranking large temporal data using aggregate scores over
a query interval.
Our most efficient exact technique Exact3 is more efficient than
baseline solutions.
Approximations offer even more improvements.
Future work includes ranking with holistic aggregations and
extending to distributed settings.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
The End
Thank You
Q and A
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Baseline Solution
Computing σ(g3 (t1 , t2 ))
Score
o3
t1
1
t2
Initialize sum s3 = 0 for object o3
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Time
Baseline Solution
Computing σ(g3 (t1 , t2 ))
Score
(t3,2 , v3,2 )
(t3,1 , v3,1 )
o3
t3,1 t1
1
2
t3,2
t2
Time
Initialize sum s3 = 0 for object o3
For each segment ` of g3 defined by (t3,j , v3,j ), (t3,j+1 , v3,j+1 )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Baseline Solution
Computing σ(g3 (t1 , t2 ))
Score
(t3,2 , v3,2 )
(t3,1 , v3,1 )
o3
t3,1 t1
1
2
I
t3,2
t2
Time
Initialize sum s3 = 0 for object o3
For each segment ` of g3 defined by (t3,j , v3,j ), (t3,j+1 , v3,j+1 )
Define I = [t1 , t2 ] ∩ [t3,j , t3,j+1 ]
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Baseline Solution
Computing σ(g3 (t1 , t2 ))
Score
(t3,2 , v3,2 )
(t3,1 , v3,1 )
σ3 (I )
t3,1 t1
1
2
I
o3
t3,2
t2
Time
Initialize sum s3 = 0 for object o3
For each segment ` of g3 defined by (t3,j , v3,j ), (t3,j+1 , v3,j+1 )
Define I = [t1 , t2 ] ∩ [t3,j , t3,j+1 ]
Update s3 = s3 + σ3 (I)
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Baseline Solution
Computing σ(g3 (t1 , t2 ))
Score
o3
t1
1
2
t2
Time
Initialize sum s3 = 0 for object o3
For each segment ` of g3 defined by (t3,j , v3,j ), (t3,j+1 , v3,j+1 )
Define I = [t1 , t2 ] ∩ [t3,j , t3,j+1 ]
Update s3 = s3 + σ3 (I)
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Baseline Solution
Computing σ(g3 (t1 , t2 ))
Score
o3
t1
1
2
t2
Time
Initialize sum s3 = 0 for object o3
For each segment ` of g3 defined by (t3,j , v3,j ), (t3,j+1 , v3,j+1 )
Define I = [t1 , t2 ] ∩ [t3,j , t3,j+1 ]
Update s3 = s3 + σ3 (I)
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Baseline Solution
Computing A(k, t1 , t2 )
Score
o3
t2
t1
Compute si for all objects i ∈ [1, m].
Time
Insert si ’s into priority queue of size k to get A(k, t1 , t2 ).
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Baseline Solution
Computing A(k, t1 , t2 )
Score
o3
t2
t1
Compute si for all objects i ∈ [1, m].
Time
Insert si ’s into priority queue of size k to get A(k, t1 , t2 ).
Naive cost: O(N + mlogk)
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Baseline Solution using B-tree
Score
o3
t2
t1
For each line segment ` = {(ti,j , vi,j ), (ti,j+1 , vi,j+1 )}
Index left end-point ti,j in B-tree.
The value associated with ti,j is `.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Time
Improved Baseline Solution using B-tree
Score
o3
t2
t1
For each line segment ` = {(ti,j , vi,j ), (ti,j+1 , vi,j+1 )}
Index left end-point ti,j in B-tree.
The value associated with ti,j is `.
Query cost: O(logB N +
Pm
i=1
qi
B
+ (m/B)logB k)
qi = number of ` overlapping [t1 , t2 ]
We denote this query Exact1.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Time
Improved Solution using Prefix Sums and B-tree Forest
Score
g3,4
g3,3
g3,2
g3,5
g3,1
g3,6
o3
t3,5 t3,6
t3,0
t3,1
t3,2
t3,3
t3,4
Time
gi = ∪gi,j
gi,j is defined by ((ti,j−1 , vi,j−1 ), (ti,j , vi,j )) for j ∈ {1, . . . , ni }
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
I3,1
Score
g3,4
g3,3
g3,2
g3,5
g3,1
g3,6
σ3 (I3,1 )
o3
t3,5 t3,6
t3,0
t3,1
t3,2
t3,3
t3,4
Time
gi = ∪gi,j
gi,j is defined by ((ti,j−1 , vi,j−1 ), (ti,j , vi,j )) for j ∈ {1, . . . , ni }
Let Ii,` = [ti,0 , ti,` ] for ` = 1, . . . , ni and compute σi (Ii,` )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
I3,2
I3,1
Score
g3,4
g3,3
g3,2
g3,5
g3,1
g3,6
σ3σ(I
)
3,2
3 (I
3,2 )
t3,0
t3,1
o3
t3,5 t3,6
t3,2
t3,3
t3,4
Time
gi = ∪gi,j
gi,j is defined by ((ti,j−1 , vi,j−1 ), (ti,j , vi,j )) for j ∈ {1, . . . , ni }
Let Ii,` = [ti,0 , ti,` ] for ` = 1, . . . , ni and compute σi (Ii,` )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
I3,2
I3,1
Score
I3,6
I3,5
I3,4
I3,3
g3,4
g3,3
g3,2
g3,5
g3,1
g3,6
o3
t3,5 t3,6
t3,0
t3,1
t3,2
t3,3
t3,4
Time
gi = ∪gi,j
gi,j is defined by ((ti,j−1 , vi,j−1 ), (ti,j , vi,j )) for j ∈ {1, . . . , ni }
Let Ii,` = [ti,0 , ti,` ] for ` = 1, . . . , ni and compute σi (Ii,` )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
Score
g3,4
g3,3
g3,2
g3,5
g3,1
g3,6
o3
t3,5 t3,6
t3,0
t3,1 t1
t3,2
t3,3
t2 t3,4
Time
Let ti,L = successor (ti,1 ) and ti,R = successor (ti,2 )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
I3,4
I3,2
Score
g3,4
g3,3
g3,2
g3,5
g3,1
σ3 (t1 , t3,2 )
σ3 (t2 , t3,4 ) g3,6 o
3
t3,5 t3,6
t3,0
t3,1 t1
t3,2
t3,3
t2 t3,4
Time
Let ti,L = successor (ti,1 ) and ti,R = successor (ti,2 )
σi (t1 , t2 ) = σi (Ii,R ) − σi (Ii,L ) − σi (t2 , ti,R ) + σi (t1 , ti,L )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
I3,4
I3,2
Score
g3,4
g3,3
g3,2
g3,5
g3,1
σ3 (t1 , t3,2 )
σ3 (t2 , t3,4 ) g3,6 o
3
t3,5 t3,6
t3,0
t3,1 t1
t3,2
t3,3
t2 t3,4
Time
Let ti,L = successor (ti,1 ) and ti,R = successor (ti,2 )
σi (t1 , t2 ) = σi (Ii,R ) − σi (Ii,L ) − σi (t2 , ti,R ) + σi (t1 , ti,L )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
I3,4
I3,2
Score
g3,4
g3,3
g3,2
g3,5
g3,1
σ3 (t1 , t3,2 )
σ3 (t2 , t3,4 ) g3,6 o
3
t3,5 t3,6
t3,0
t3,1 t1
t3,2
t3,3
t2 t3,4
Time
Let ti,L = successor (ti,1 ) and ti,R = successor (ti,2 )
σi (t1 , t2 ) = σi (Ii,R ) − σi (Ii,L ) − σi (t2 , ti,R ) + σi (t1 , ti,L )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
I3,4
I3,2
Score
g3,4
g3,3
g3,2
g3,5
g3,1
σ3 (t1 , t3,2 )
σ3 (t2 , t3,4 ) g3,6 o
3
t3,5 t3,6
t3,0
t3,1 t1
t3,2
t3,3
t2 t3,4
Time
Let ti,L = successor (ti,1 ) and ti,R = successor (ti,2 )
σi (t1 , t2 ) = σi (Ii,R ) − σi (Ii,L ) − σi (t2 , ti,R ) + σi (t1 , ti,L )
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
I3,4
I3,2
Score
g3,4
g3,3
g3,2
g3,5
g3,1
σ3 (t1 , t3,2 )
σ3 (t2 , t3,4 ) g3,6 o
3
t3,5 t3,6
t3,0
t3,1 t1
t3,2
t3,3
t2 t3,4
Time
Let ti,L = successor (ti,1 ) and ti,R = successor (ti,2 )
σi (t1 , t2 ) = σi (Ii,R ) − σi (Ii,L ) − σi (t2 , ti,R ) + σi (t1 , ti,L )
Use a B-tree forest to index (t3,` , (gi,` , σi (Ii,` ))
Each oi indexed in
Pa separate B-tree
Query cost is O( m
i=1 logB ni + (m/B)logB k)
We denote this query Exact2.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
I3,4
I3,2
Score
g3,4
g3,3
g3,2
g3,5
g3,1
σ3 (t1 , t3,2 )
σ3 (t2 , t3,4 ) g3,6 o
3
t3,5 t3,6
t3,0
t3,1 t1
t3,2
t3,3
t2 t3,4
Time
Our B-tree forest solution requires m B-trees.
Query time improves from baseline.
Opening/Closing m B-trees expensive for large m.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Improved Solution using Prefix Sums and B-tree Forest
I3,4
I3,2
Score
g3,4
g3,3
g3,2
g3,5
g3,1
σ3 (t1 , t3,2 )
σ3 (t2 , t3,4 ) g3,6 o
3
t3,5 t3,6
t3,0
t3,1 t1
t3,2
t3,3
t2 t3,4
Time
Our B-tree forest solution requires m B-trees.
Query time improves from baseline.
Opening/Closing m B-trees expensive for large m.
We show how to solve a query using a single interval tree.
Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang
Ranking Large Temporal Data
Download