Ranking Large Temporal Data Jeffrey Jestes Jeff M. Phillips 1 Feifei Li School of Computing University of Utah August 29, 2012 Mingwang Tang Introduction Temporal data is important in numerous domains: financial market scientific applications biomedical field Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Introduction Temporal data is important in numerous domains: financial market scientific applications biomedical field Extensive efforts have been made towards efficiently storing, processing, and querying temporal data. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Introduction Temporal data is important in numerous domains: financial market scientific applications biomedical field Extensive efforts have been made towards efficiently storing, processing, and querying temporal data. Ranking temporal data has only recently been studied. [LYL10] [LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Related Work The instant top-k query returns objects oi s with the k highest scores at query time t. [LYL10] Score o1 o2 o3 Time [LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Related Work The instant top-k query returns objects oi s with the k highest scores at query time t. [LYL10] rank 1 rank 2 Score o1 o2 o3 t1 Time [LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Related Work The instant top-k query returns objects oi s with the k highest scores at query time t. [LYL10] rank 1 rank 2 Score o1 o2 o3 t1 Time What is a good value for t? [LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Related Work The instant top-k query returns objects oi s with the k highest scores at query time t. [LYL10] rank 1 rank 2 Score o1 o2 o3 t1 Time What is a good value for t? [LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Related Work The instant top-k query returns objects oi s with the k highest scores at query time t. [LYL10] rank 1 rank 2 Score o1 o2 o3 t1 Time What is a good value for t? [LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Related Work The instant top-k query returns objects oi s with the k highest scores at query time t. [LYL10] rank 1 rank 2 Score o1 o2 o3 t1 t2 Time Use aggregation within a temporal interval instead!!! [LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Related Work The instant top-k query returns objects oi s with the k highest scores at query time t. [LYL10] rank 1 rank 2 Score o1 o2 o3 t1 t2 Time Use aggregation within a temporal interval instead!!! Example: Return top-10 weather stations with highest average temperature from 1 Aug to 27 Aug. [LYL10] Li et al., Top-k queries on temporal data. In VLDBJ, 2010. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Problem Formulation Score o1 o2 o3 Time Temporal database consists of m objects o1 , o2 , . . . , om Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Problem Formulation Score gi : R → R ( time → score ) g3 (t) = 90 t o1 o2 n1 = 3 n2 = 5 n3 = 6 o3 Time Temporal database consists of m objects o1 , o2 , . . . , om oi is represented by piecewise linear function gi with ni segments. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Problem Formulation Score gi : R → R ( time → score ) g3 (t) = 90 t o1 o2 n1 = 3 n2 = 5 n3 = 6 o3 Time Temporal database consists of m objects o1 , o2 , . . . , om oi is represented by piecewise linear function gi with ni segments. top-k(t1 , t2 , σ) is an aggregate top-k query for aggregate function σ gi (t1 , t2 ) represent all possible values of gi in [t1 , t2 ] σ(gi (t1 , t2 )) (= σi (t1 , t2 )) is the aggregate score of oi in [t1 , t2 ] Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Problem Formulation Score gi : R → R ( time → score ) g3 (t) = 90 t o1 o2 n1 = 3 n2 = 5 n3 = 6 o3 Time Temporal database consists of m objects o1 , o2 , . . . , om oi is represented by piecewise linear function gi with ni segments. top-k(t1 , t2 , σ) is an aggregate top-k query for aggregate function σ gi (t1 , t2 ) represent all possible values of gi in [t1 , t2 ] σ(gi (t1 , t2 )) (= σi (t1 , t2 )) is the R t2aggregate score of oi in [t1 , t2 ] For σ = sum, σ(gi (t1 , t2 )) = t1 gi (t)dt Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Problem Formulation Score o1 o2 o3 Time A(k, t1 , t2 ) : ordered top-k objects for top-k(t1 , t2 , σ) R t2 Let σ = sum = t1 g (t)dt Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Problem Formulation A(2, t1 , t2 ) = {o3 , o1 } Score o1 o2 o3 t1 t2 Time A(k, t1 , t2 ) : ordered top-k objects for top-k(t1 , t2 , σ) R t2 Let σ = sum = t1 g (t)dt Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Problem Formulation A(1, t1 , t2 ) = {o1 } Score o1 o2 o3 t1 t2 Time A(k, t1 , t2 ) : ordered top-k objects for top-k(t1 , t2 , σ) R t2 Let σ = sum = t1 g (t)dt Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Outline 1 Introduction and Problem Formulation 2 Exact Solutions Baseline Solution Improved Solution using Prefix Sums and B-tree Forest Improved Solution using Prefix Sums and Interval Tree 3 Approximate Solutions Overview Breakpoints Approaches for Approximation Queries Combining Breakpoints with Queries 4 Experiments 5 Conclusions Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Baseline Solution Compute σi (t1 , t2 ) for all objects by scanning each segment. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Baseline Solution Compute σi (t1 , t2 ) for all objects by scanning each segment. Simple improvement: use B-tree to avoid segments outside query interval. Pm qi Query cost: O(logB N + i=1 + (m/B)logB k) B qi = number of segments overlapping [t1 , t2 ] We denote this query Exact1. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest We can avoid scanning all overlapping segments with [t1 , t2 ] by using prefix sums: Index segment and prefix sums for an object in a B-tree. Compute σi (t1 , t2 ) by retrieving two segments from B-tree. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest We can avoid scanning all overlapping segments with [t1 , t2 ] by using prefix sums: Index segment and prefix sums for an object in a B-tree. Compute σi (t1 , t2 ) by retrieving two segments from B-tree. Pm Query cost is O( i=1 logB ni + (m/B)logB k) This solution is denoted Exact2. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and Interval Tree I3,2 I3,1 Score I3,4 I3,3 g3,4 g3,3 g3,2 g3,1 t3,0 t3,1 t3,2 t3,3 t3,4 Time Consider an object oi with intervals Ii,1 , . . . , Ii,ni gi,j = jth segment of oi is ((ti,j−1 , vi,j−1 ), (ti,j , vi,j )) Ii,` = [ti,0 , ti,` ] for ` = 1, . . . , ni Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and Interval Tree I3,1 I3,2 I3,3 I3,4 Time − − − We define Ii,1 , . . . , Ii,n s.t. Ii,` = [Ii,`−1 , Ii,` ] i The data entries for i = 1, . . . , m and ` = 1, . . . , ni are − key: (Ii,` ) and value: (gi,` , σi (Ii,` )) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and Interval Tree I3,2 I3,1 − I3,1 − I3,2 − I3,3 I3,3 I3,4 − I3,4 Time − − − We define Ii,1 , . . . , Ii,n s.t. Ii,` = [Ii,`−1 , Ii,` ] i The data entries for i = 1, . . . , m and ` = 1, . . . , ni are − key: (Ii,` ) and value: (gi,` , σi (Ii,` )) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and Interval Tree I3,2 I3,1 − I3,1 o1 o2 o3 .. . om g3,1 − I3,2 − I3,3 I3,3 I3,4 − I3,4 Time – – : σ3 (I3,1 ) ... – − − − We define Ii,1 , . . . , Ii,n s.t. Ii,` = [Ii,`−1 , Ii,` ] i The data entries for i = 1, . . . , m and ` = 1, . . . , ni are − key: (Ii,` ) and value: (gi,` , σi (Ii,` )) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and Interval Tree I3,2 I3,1 − I3,1 − I3,2 − I3,3 I3,4 I3,3 − I3,4 t1 t2 Time Issue two stabbing queries: t1 , t2 − − − We define Ii,1 , . . . , Ii,n s.t. Ii,` = [Ii,`−1 , Ii,` ] i The data entries for i = 1, . . . , m and ` = 1, . . . , ni are − key: (Ii,` ) and value: (gi,` , σi (Ii,` )) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and Interval Tree I3,2 I3,1 − I3,1 − I3,2 − I3,3 I3,4 I3,3 − I3,4 t1 o1 o2 o3 .. . om g3,2 – – : σ3 (I3,2 ) .. . – t2 o1 o2 o3 .. . om g3,4 Time – – : σ3 (I3,4 ) .. . – Retrieve associated 2m data entries − − − We define Ii,1 , . . . , Ii,n s.t. Ii,` = [Ii,`−1 , Ii,` ] i The data entries for i = 1, . . . , m and ` = 1, . . . , ni are − key: (Ii,` ) and value: (gi,` , σi (Ii,` )) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and Interval Tree I3,2 I3,1 − I3,1 − I3,2 − I3,3 We have g3,2 ; g3,4 ; I3,2 ; I3,4 − I3,4 t1 Score I3,4 I3,3 t2 g3,4 g3,3 Time o3 g3,2 g3,1 t1 t2 Time − − − We define Ii,1 , . . . , Ii,n s.t. Ii,` = [Ii,`−1 , Ii,` ] i The data entries for i = 1, . . . , m and ` = 1, . . . , ni are − key: (Ii,` ) and value: (gi,` , σi (Ii,` )) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and Interval Tree I3,2 I3,1 − I3,1 − I3,2 − I3,3 We have g3,2 ; g3,4 ; I3,2 ; I3,4 − I3,4 t1 Score I3,4 I3,3 t2 g3,4 g3,3 Compute as I3,4 − I3,2 Time o3 g3,2 g3,1 t1 t2 Time − − − We define Ii,1 , . . . , Ii,n s.t. Ii,` = [Ii,`−1 , Ii,` ] i The data entries for i = 1, . . . , m and ` = 1, . . . , ni are − key: (Ii,` ) and value: (gi,` , σi (Ii,` )) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and Interval Tree I3,2 I3,1 − I3,1 − I3,2 − I3,3 We have g3,2 ; g3,4 ; I3,2 ; I3,4 − I3,4 t1 Score I3,4 I3,3 t2 g3,4 g3,3 Time Subtract using g3,4 o3 g3,2 g3,1 t1 t2 Time − − − We define Ii,1 , . . . , Ii,n s.t. Ii,` = [Ii,`−1 , Ii,` ] i The data entries for i = 1, . . . , m and ` = 1, . . . , ni are − key: (Ii,` ) and value: (gi,` , σi (Ii,` )) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and Interval Tree I3,2 I3,1 − I3,1 − I3,2 − I3,3 We have g3,2 ; g3,4 ; I3,2 ; I3,4 − I3,4 t1 Score I3,4 I3,3 t2 g3,4 g3,3 Time Subtract using g3,4 o3 g3,2 Add using g3,2 g3,1 t1 t2 Time − − − We define Ii,1 , . . . , Ii,n s.t. Ii,` = [Ii,`−1 , Ii,` ] i The data entries for i = 1, . . . , m and ` = 1, . . . , ni are − key: (Ii,` ) and value: (gi,` , σi (Ii,` )) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and Interval Tree I3,2 I3,1 − I3,1 − I3,2 − I3,3 − I3,4 t1 o1 o2 o3 .. . om g3,2 – – : σ3 (I3,2 ) .. . – I3,4 I3,3 t2 o1 o2 o3 .. . om g3,4 Time – – : σ3 (I3,4 ) .. . – Retrieve associated 2m data entries Total stabbing query cost is O(logB N + m/B). Using priority queue to get top-k is O(logB N + (m/B)logB k). We denote this query Exact3. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Outline 1 Introduction and Problem Formulation 2 Exact Solutions Baseline Solution Improved Solution using Prefix Sums and B-tree Forest Improved Solution using Prefix Sums and Interval Tree 3 Approximate Solutions Overview Breakpoints Approaches for Approximation Queries Nested B-tree Approximate Query Dyadic Interval Approximate Query Combining Breakpoints with Queries 4 Experiments 5 Conclusions Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Approximate Solution Overview N segments m objects BreakPoints1 Query1 BreakPoints2 Query2 Our most query-efficient technique costs O(logB N + m/B). Must compute all m aggregates σi (t1 , t2 ). Still too expensive for large datasets with large m. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Approximate Solution Overview N segments m objects BreakPoints1 Query1 BreakPoints2 Query2 Our most query-efficient technique costs O(logB N + m/B). Must compute all m aggregates σi (t1 , t2 ). Still too expensive for large datasets with large m. Our approximate methods construct breakpoints B = {b1 , . . . , br }, bi ∈ [0, T ]. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Approximate Solution Overview N segments m objects BreakPoints1 Query1 BreakPoints2 Query2 Our most query-efficient technique costs O(logB N + m/B). Must compute all m aggregates σi (t1 , t2 ). Still too expensive for large datasets with large m. Our approximate methods construct breakpoints B = {b1 , . . . , br }, bi ∈ [0, T ]. Queries are snapped to align to breakpoints. A query snapped to (bi , bj ) uses σi (bi , bj ) as an object’s score. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Approximate Solution Notations G is an (ε, α)-approximation algorithm if: G returns σ ei (t1 , t2 ) s.t. σi (t1 , t2 )/α − εM ≤ σ ei (t1 , t2 ) ≤ σi (t1 , t2 ) + εM α ≥ 1,Pε > 0 M= m i=1 σi (0, T ) Must hold for all objects and temporal intevals. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Approximate Solution Notations Score e o2 A(j) t1 t2 Time e e t1 , t2 )) A(j) (A(j)) = the jth ranked object in A(k, t1 , t2 ) (A(k, R is an (ε, α)-approximation algorithm of top-k(t1 , t2 , σ) if: e t1 , t2 ) and σ R returns A(k, eA(j) e (t1 , t2 ) for j ∈ [1, k], s.t. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Approximate Solution Notations Score e o2 A(j) t1 t2 Time σ2 (t1 , t2 )/α − εM ≤ σe 2 (t1 , t2 ) ≤ σ2 (t1 , t2 ) + εM e e t1 , t2 )) A(j) (A(j)) = the jth ranked object in A(k, t1 , t2 ) (A(k, R is an (ε, α)-approximation algorithm of top-k(t1 , t2 , σ) if: e t1 , t2 ) and σ R returns A(k, eA(j) e (t1 , t2 ) for j ∈ [1, k], s.t. 1 σ eA(j) e (t1 , t2 ) is an (ε, α)-approximation of σA(j) e (t1 , t2 ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Approximate Solution Notations Score e o2 A(j) o3 A(j) t2 t1 Time σ3 (t1 , t2 )/α − εM ≤ σe 2 (t1 , t2 ) ≤ σ3 (t1 , t2 ) + εM e e t1 , t2 )) A(j) (A(j)) = the jth ranked object in A(k, t1 , t2 ) (A(k, R is an (ε, α)-approximation algorithm of top-k(t1 , t2 , σ) if: e t1 , t2 ) and σ R returns A(k, eA(j) e (t1 , t2 ) for j ∈ [1, k], s.t. 1 2 σ eA(j) e (t1 , t2 ) is an (ε, α)-approximation of σA(j) e (t1 , t2 ) σ eA(j) (t , t ) is an (ε, α)-approximation of σ 1 2 e A(j) (t1 , t2 ) Must hold for all k and all temporal intervals. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Breakpoints Starting from b0 and moving forward we have: (P m in BreakPoints1(B1 ) i=1 σi (bj , bj+1 ) = εM, bj+1 so m maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 ) Score o1 o2 o3 Time breakpoint Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Breakpoints Starting from b0 and moving forward we have: (P m in BreakPoints1(B1 ) i=1 σi (bj , bj+1 ) = εM, bj+1 so m maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 ) Score o1 o2 o3 Time breakpoint σ1 (bj , bj+1 ) + σ2 (bj , bj+1 ) + σ3 (bj , bj+1 ) = εM Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Breakpoints Starting from b0 and moving forward we have: (P m in BreakPoints1(B1 ) i=1 σi (bj , bj+1 ) = εM, bj+1 so m maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 ) Score o1 o2 o3 Time breakpoint σ1 (bj , bj+1 ) + σ2 (bj , bj+1 ) + σ3 (bj , bj+1 ) = εM Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Breakpoints Starting from b0 and moving forward we have: (P m in BreakPoints1(B1 ) i=1 σi (bj , bj+1 ) = εM, bj+1 so m maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 ) Score o1 o2 o3 Time breakpoint σ1 (bj , bj+1 ) + σ2 (bj , bj+1 ) + σ3 (bj , bj+1 ) = εM Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Breakpoints Starting from b0 and moving forward we have: (P m in BreakPoints1(B1 ) i=1 σi (bj , bj+1 ) = εM, bj+1 so m maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 ) Score o1 o2 o3 Time breakpoint σ1 (bj , bj+1 ) + σ2 (bj , bj+1 ) + σ3 (bj , bj+1 ) = εM Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Breakpoints Starting from b0 and moving forward we have: (P m in BreakPoints1(B1 ) i=1 σi (bj , bj+1 ) = εM, bj+1 so m maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 ) Score o1 o2 o3 Time breakpoint max{σ1 (bj , bj+1 ), σ2 (bj , bj+1 ), σ3 (bj , bj+1 } = εM Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Breakpoints Starting from b0 and moving forward we have: (P m in BreakPoints1(B1 ) i=1 σi (bj , bj+1 ) = εM, bj+1 so m maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 ) Score o1 o2 o3 Time breakpoint max{σ1 (bj , bj+1 ), σ2 (bj , bj+1 ), σ3 (bj , bj+1 } = εM Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Properties of Breakpoints Starting from b0 and moving forward we have: (P m in BreakPoints1(B1 ) i=1 σi (bj , bj+1 ) = εM, bj+1 so m maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 ) We show how to efficiently construct both types of breakpoints A cost of O((N/B)logB N) IOs for both types. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Properties of Breakpoints Starting from b0 and moving forward we have: (P m in BreakPoints1(B1 ) i=1 σi (bj , bj+1 ) = εM, bj+1 so m maxi=1 σi (bj , bj+1 ) = εM, in BreakPoints2(B2 ) We show how to efficiently construct both types of breakpoints A cost of O((N/B)logB N) IOs for both types. The theoretical number of breakpoints is O(1/ε) for both types. BreakPoints2 has much fewer breakpoints than BreakPoints1 in practice. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Answering Queries with Breakpoints Score o1 o2 o3 Time breakpoint We show how to answer queries using B1 or B2 approximately. ∀(t1 , t2 ), let (B(t1 ), B(t2 )) be the approximate interval B(t1 ) = minbi ∈B s.t. B(t1 ) ≥ t1 B(t2 ) = minbi ∈B s.t. B(t2 ) ≥ t2 Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Answering Queries with Breakpoints Score t1 t2 o1 o2 o3 Time breakpoint We show how to answer queries using B1 or B2 approximately. ∀(t1 , t2 ), let (B(t1 ), B(t2 )) be the approximate interval B(t1 ) = minbi ∈B s.t. B(t1 ) ≥ t1 B(t2 ) = minbi ∈B s.t. B(t2 ) ≥ t2 Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Answering Queries with Breakpoints Score t1 t2 o1 o2 o3 breakpoint B(t1 ) B(t2 ) Time We show how to answer queries using B1 or B2 approximately. ∀(t1 , t2 ), let (B(t1 ), B(t2 )) be the approximate interval B(t1 ) = minbi ∈B s.t. B(t1 ) ≥ t1 B(t2 ) = minbi ∈B s.t. B(t2 ) ≥ t2 Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Answering Queries with Breakpoints Score t1 t2 o1 o2 o3 breakpoint B(t1 ) B(t2 ) Time We show how to answer queries using B1 or B2 approximately. ∀(t1 , t2 ), let (B(t1 ), B(t2 )) be the approximate interval B(t1 ) = minbi ∈B s.t. B(t1 ) ≥ t1 B(t2 ) = minbi ∈B s.t. B(t2 ) ≥ t2 Lemma ∀(t1 , t2 ) and its approximate interval (B(t1 ), B(t2 )): ∀oi , |σi (t1 , t2 ) − σi (B(t1 ), B(t2 ))| ≤ εM. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Answering Queries with Breakpoints Score t1 t2 o1 o2 o3 breakpoint B(t1 ) B(t2 ) Time We show how to answer queries using B1 or B2 approximately. ∀(t1 , t2 ), let (B(t1 ), B(t2 )) be the approximate interval B(t1 ) = minbi ∈B s.t. B(t1 ) ≥ t1 B(t2 ) = minbi ∈B s.t. B(t2 ) ≥ t2 Lemma ∀(t1 , t2 ) and its approximate interval (B(t1 ), B(t2 )): ∀oi , |σi (t1 , t2 ) − σi (B(t1 ), B(t2 ))| ≤ εM. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Outline 1 Introduction and Problem Formulation 2 Exact Solutions Baseline Solution Improved Solution using Prefix Sums and B-tree Forest Improved Solution using Prefix Sums and Interval Tree 3 Approximate Solutions Overview Breakpoints Approaches for Approximation Queries Nested B-tree Approximate Query Dyadic Interval Approximate Query Combining Breakpoints with Queries 4 Experiments 5 Conclusions Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint Left end-point index. Time Query1 indexes all n 2 intervals of breakpoints B. For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint Left end-point index. Time Right end-point index. Time Query1 indexes all n 2 intervals of breakpoints B. For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint Left end-point index. Time Right end-point index. Time Query1 indexes all n 2 intervals of breakpoints B. For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint Left end-point index. Time Right end-point index. Time Query1 indexes all n 2 intervals of breakpoints B. For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint Left end-point index. Time Right end-point index. Time Query1 indexes all n 2 intervals of breakpoints B. For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint t1 Left end-point index. Time Query1 indexes all n 2 intervals of breakpoints B. For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed. At query time we probe first-level B-tree with t1 to get B(t1 ). Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint t1 Left end-point index. Time B(t1 ) Query1 indexes all n 2 intervals of breakpoints B. For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed. At query time we probe first-level B-tree with t1 to get B(t1 ). Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint t1 Left end-point index. B(t1 ) Time t2 Right end-point index. Time Query1 indexes all n 2 intervals of breakpoints B. For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed. At query time we probe first-level B-tree with t1 to get B(t1 ). We probe B(t1 )’s associated nested B-tree to get B(t2 ). Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint t1 Left end-point index. B(t1 ) Time t2 Right end-point index. B(t2 ) Query1 indexes all n 2 Time intervals of breakpoints B. For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed. At query time we probe first-level B-tree with t1 to get B(t1 ). We probe B(t1 )’s associated nested B-tree to get B(t2 ). Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint t1 Left end-point index. B(t1 ) Time t2 Right end-point index. B(t2 ) Time A(kmax , B(t1 ), B(t2 )) Query1 indexes all n 2 intervals of breakpoints B. For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed. At query time we probe first-level B-tree with t1 to get B(t1 ). We probe B(t1 )’s associated nested B-tree to get B(t2 ). Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint t1 oi o`1 .. . B(t1 ) o`kmax t2 σi (B(t1 ), B(t2 )) σ`1 (B(t1 ), B(t2 )) .. . σ`kmax (B(t1 ), B(t2 )) Objects ordered in descending order of σi (.) B(t2 ) A(kmax , B(t1 ), B(t2 )) Query1 indexes all n 2 intervals of breakpoints B. For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed. At query time we probe first-level B-tree with t1 to get B(t1 ). We probe B(t1 )’s associated nested B-tree to get B(t2 ). Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint t1 oi o`1 .. . B(t1 ) o`kmax t2 σi (B(t1 ), B(t2 )) σ`1 (B(t1 ), B(t2 )) .. . σ`kmax (B(t1 ), B(t2 )) Objects ordered in descending order of σi (.) Take top-k B(t2 ) f A(k, t1 , t2 ) A(kmax , B(t1 ), B(t2 )) Query1 indexes all n 2 intervals of breakpoints B. For each interval [bj , bj0 ], A(kmax , bj , bj0 ) is computed. At query time we probe first-level B-tree with t1 to get B(t1 ). We probe B(t1 )’s associated nested B-tree to get B(t2 ). e t1 , t2 ) is returned. The approximate answer A(k, Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint t1 oi o`1 .. . B(t1 ) o`kmax t2 σi (B(t1 ), B(t2 )) σ`1 (B(t1 ), B(t2 )) .. . σ`kmax (B(t1 ), B(t2 )) Objects ordered in descending order of σi (.) B(t2 ) A(kmax , B(t1 ), B(t2 )) We prove Query1 has the following properties: Index size O((1/ε)2 kmax /B). Query cost O(k/B + logB (1/ε)). (ε, 1)-approximation. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Nested B-trees breakpoint t1 oi o`1 .. . B(t1 ) o`kmax t2 σi (B(t1 ), B(t2 )) σ`1 (B(t1 ), B(t2 )) .. . σ`kmax (B(t1 ), B(t2 )) Objects ordered in descending order of σi (.) B(t2 ) A(kmax , B(t1 ), B(t2 )) We prove Query1 has the following properties: Index size O((1/ε)2 kmax /B). Query cost O(k/B + logB (1/ε)). (ε, 1)-approximation. Query2 reduces space to O((1/ε)kmax /B). (ε, 2log (1/ε))-approximation. Query cost O(k log(1/ε) logB k). Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals breakpoint Time Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals breakpoint Time Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals breakpoint Time Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals breakpoint Time Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals breakpoint Time t1 t2 Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Consider a query over [t1 , t2 ]. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals breakpoint Time t1 t2 Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Consider a query over [t1 , t2 ]. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals breakpoint Time t1 t2 Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Consider a query over [t1 , t2 ]. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals breakpoint Time t1 t2 Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Consider a query over [t1 , t2 ]. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals breakpoint Time t1 t2 Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Consider a query over [t1 , t2 ]. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals breakpoint Time t1 t2 Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Consider a query over [t1 , t2 ]. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals breakpoint Time t1 t2 Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Consider a query over [t1 , t2 ]. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals breakpoint Time t1 t2 A(kmax , b2 , b3 ) : σA(j) (b2 , b3 )∀j ∈ [1, ..., kmax ] Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Consider a query over [t1 , t2 ]. At each dyadic interval [bi , bj ] we store A(kmax , bi , bj ). There are at most 2log (1/ε) intervals and 2klog (1/ε) candidates. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals candidate running sums oi o6 .. . o19 running sum σi0 σ60 .. . 0 σ19 at most 2k log(1/ε) candidates!!! t1 t2 A(kmax , b2 , b3 ) : σA(j) (b2 , b3 ), ∀j ∈ [1, ..., kmax ] Query2 indexes all dyadic intervals over the breakpoints B The intervals represent the span of nodes in a balanced binary tree. Consider a query over [t1 , t2 ]. At each dyadic interval [bi , bj ] we store A(kmax , bi , bj ). There are at most 2log (1/ε) intervals and 2klog (1/ε) candidates. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Querying Breakpoints with Dyadic Intervals candidate running sums oi o6 .. . o19 running sum σi0 σ60 .. . 0 σ19 at most 2k log(1/ε) candidates!!! t1 t2 A(kmax , b2 , b3 ) : σA(j) (b2 , b3 ), ∀j ∈ [1, ..., kmax ] We prove Query2 has the following properties: Index size O((1/ε)kmax /B). Query cost O(k log(1/ε) logB k). (ε, 2 log(1/ε))-approximation. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Combining Breakpoints with Queries N segments m objects BreakPoints1 Query1 BreakPoints2 Query2 We consider the following algorithms: Appx1-B: (Query1, BreakPoints1) Appx2-B: (Query2, BreakPoints1) Appx1: (Query1, BreakPoints2) Appx2: (Query2, BreakPoints2) Appx2+: (Query2, BreakPoints2) and Discovers candidates’ exact aggregate score using B-tree from Exact2 (B-tree forest). Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Combining Breakpoints with Queries N segments m objects BreakPoints1 Query1 BreakPoints2 Query2 We consider the following algorithms: Appx1-B: (Query1, BreakPoints1) Appx2-B: (Query2, BreakPoints1) Appx1: (Query1, BreakPoints2) Appx2: (Query2, BreakPoints2) Appx2+: (Query2, BreakPoints2) and Discovers candidates’ exact aggregate score using B-tree from Exact2 (B-tree forest). Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Experiments: Setup Our algorithms are designed to efficiently handle I/Os. All algorithms are implemented in C++ using TPIE. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Experiments: Setup Our algorithms are designed to efficiently handle I/Os. All algorithms are implemented in C++ using TPIE. All experiments performed on Linux machine with: Intel Core i7-2600 3.4GHz CPU 8GB of memory 1TB hard drive Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Experiments: Setup Our algorithms are designed to efficiently handle I/Os. All algorithms are implemented in C++ using TPIE. All experiments performed on Linux machine with: Intel Core i7-2600 3.4GHz CPU 8GB of memory 1TB hard drive We use two real large datasets: Temp is a temperature dataset from the MesoWest Project. contains measurements from Jan 1997 to Oct 2011. there are m = 145, 628 objects with average navg = 17, 833. Meme is obtained from the Memetracker Project. tracks the frequency of popular quotes over time. there are m = 1.5 million objects with navg = 67. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Experiments: Default Values Parameter dataset number of objects average object line segments max top-k value top-k value number of breakpoints query interval size TPIE disk block size Symbol m navg kmax k r = (1/ε) (t2 − t1 ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Default value Temp 50,000 1,000 200 50 500 20% T 4KB Ranking Large Temporal Data Experiment: Index size. 12 Index size (bytes) 10 Exact1 Appx1 Exact2 Exact3 Appx2 Appx2+ 10 10 8 10 6 10 4 10 10 30 50 100 3 Objects m (×10 ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data 145 Build time (seconds) Experiment: Build time. 4 10 Exact1 Appx1 Exact2 Exact3 Appx2 Appx2+ 3 10 2 10 1 10 0 10 10 30 50 100 3 Objects m (×10 ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data 145 Experiment: Query I/Os. 8 10 Exact1 Appx1 Exact2 Exact3 Appx2 Appx2+ 6 I/Os 10 4 10 2 10 0 10 10 30 50 100 3 Objects m ( ×10 ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data 145 Time (seconds) Experiment: Query time. Exact1 Appx1 2 10 Exact2 Exact3 Appx2 Appx2+ 0 10 −2 10 10 30 50 100 3 Objects m (×10 ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data 145 Experiment: Precision/Recall. Precision/Recall 1.00 0.98 0.96 0.94 0.92 0.90 0 Appx1 10 Appx2 20 Appx2+ 30 40 (t2 − t1 ) as % of T Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data 50 Experiment: Ratio. Approximation ratio 1.09 Appx1 Appx2 Appx2+ 1.06 1.03 1.00 0.97 0.94 0 10 20 30 40 (t2 − t1 ) as % of T Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data 50 Conclusions We studied ranking large temporal data using aggregate scores over a query interval. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Conclusions We studied ranking large temporal data using aggregate scores over a query interval. Our most efficient exact technique Exact3 is more efficient than baseline solutions. Approximations offer even more improvements. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Conclusions We studied ranking large temporal data using aggregate scores over a query interval. Our most efficient exact technique Exact3 is more efficient than baseline solutions. Approximations offer even more improvements. Future work includes ranking with holistic aggregations and extending to distributed settings. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data The End Thank You Q and A Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Baseline Solution Computing σ(g3 (t1 , t2 )) Score o3 t1 1 t2 Initialize sum s3 = 0 for object o3 Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Time Baseline Solution Computing σ(g3 (t1 , t2 )) Score (t3,2 , v3,2 ) (t3,1 , v3,1 ) o3 t3,1 t1 1 2 t3,2 t2 Time Initialize sum s3 = 0 for object o3 For each segment ` of g3 defined by (t3,j , v3,j ), (t3,j+1 , v3,j+1 ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Baseline Solution Computing σ(g3 (t1 , t2 )) Score (t3,2 , v3,2 ) (t3,1 , v3,1 ) o3 t3,1 t1 1 2 I t3,2 t2 Time Initialize sum s3 = 0 for object o3 For each segment ` of g3 defined by (t3,j , v3,j ), (t3,j+1 , v3,j+1 ) Define I = [t1 , t2 ] ∩ [t3,j , t3,j+1 ] Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Baseline Solution Computing σ(g3 (t1 , t2 )) Score (t3,2 , v3,2 ) (t3,1 , v3,1 ) σ3 (I ) t3,1 t1 1 2 I o3 t3,2 t2 Time Initialize sum s3 = 0 for object o3 For each segment ` of g3 defined by (t3,j , v3,j ), (t3,j+1 , v3,j+1 ) Define I = [t1 , t2 ] ∩ [t3,j , t3,j+1 ] Update s3 = s3 + σ3 (I) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Baseline Solution Computing σ(g3 (t1 , t2 )) Score o3 t1 1 2 t2 Time Initialize sum s3 = 0 for object o3 For each segment ` of g3 defined by (t3,j , v3,j ), (t3,j+1 , v3,j+1 ) Define I = [t1 , t2 ] ∩ [t3,j , t3,j+1 ] Update s3 = s3 + σ3 (I) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Baseline Solution Computing σ(g3 (t1 , t2 )) Score o3 t1 1 2 t2 Time Initialize sum s3 = 0 for object o3 For each segment ` of g3 defined by (t3,j , v3,j ), (t3,j+1 , v3,j+1 ) Define I = [t1 , t2 ] ∩ [t3,j , t3,j+1 ] Update s3 = s3 + σ3 (I) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Baseline Solution Computing A(k, t1 , t2 ) Score o3 t2 t1 Compute si for all objects i ∈ [1, m]. Time Insert si ’s into priority queue of size k to get A(k, t1 , t2 ). Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Baseline Solution Computing A(k, t1 , t2 ) Score o3 t2 t1 Compute si for all objects i ∈ [1, m]. Time Insert si ’s into priority queue of size k to get A(k, t1 , t2 ). Naive cost: O(N + mlogk) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Baseline Solution using B-tree Score o3 t2 t1 For each line segment ` = {(ti,j , vi,j ), (ti,j+1 , vi,j+1 )} Index left end-point ti,j in B-tree. The value associated with ti,j is `. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Time Improved Baseline Solution using B-tree Score o3 t2 t1 For each line segment ` = {(ti,j , vi,j ), (ti,j+1 , vi,j+1 )} Index left end-point ti,j in B-tree. The value associated with ti,j is `. Query cost: O(logB N + Pm i=1 qi B + (m/B)logB k) qi = number of ` overlapping [t1 , t2 ] We denote this query Exact1. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Time Improved Solution using Prefix Sums and B-tree Forest Score g3,4 g3,3 g3,2 g3,5 g3,1 g3,6 o3 t3,5 t3,6 t3,0 t3,1 t3,2 t3,3 t3,4 Time gi = ∪gi,j gi,j is defined by ((ti,j−1 , vi,j−1 ), (ti,j , vi,j )) for j ∈ {1, . . . , ni } Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest I3,1 Score g3,4 g3,3 g3,2 g3,5 g3,1 g3,6 σ3 (I3,1 ) o3 t3,5 t3,6 t3,0 t3,1 t3,2 t3,3 t3,4 Time gi = ∪gi,j gi,j is defined by ((ti,j−1 , vi,j−1 ), (ti,j , vi,j )) for j ∈ {1, . . . , ni } Let Ii,` = [ti,0 , ti,` ] for ` = 1, . . . , ni and compute σi (Ii,` ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest I3,2 I3,1 Score g3,4 g3,3 g3,2 g3,5 g3,1 g3,6 σ3σ(I ) 3,2 3 (I 3,2 ) t3,0 t3,1 o3 t3,5 t3,6 t3,2 t3,3 t3,4 Time gi = ∪gi,j gi,j is defined by ((ti,j−1 , vi,j−1 ), (ti,j , vi,j )) for j ∈ {1, . . . , ni } Let Ii,` = [ti,0 , ti,` ] for ` = 1, . . . , ni and compute σi (Ii,` ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest I3,2 I3,1 Score I3,6 I3,5 I3,4 I3,3 g3,4 g3,3 g3,2 g3,5 g3,1 g3,6 o3 t3,5 t3,6 t3,0 t3,1 t3,2 t3,3 t3,4 Time gi = ∪gi,j gi,j is defined by ((ti,j−1 , vi,j−1 ), (ti,j , vi,j )) for j ∈ {1, . . . , ni } Let Ii,` = [ti,0 , ti,` ] for ` = 1, . . . , ni and compute σi (Ii,` ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest Score g3,4 g3,3 g3,2 g3,5 g3,1 g3,6 o3 t3,5 t3,6 t3,0 t3,1 t1 t3,2 t3,3 t2 t3,4 Time Let ti,L = successor (ti,1 ) and ti,R = successor (ti,2 ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest I3,4 I3,2 Score g3,4 g3,3 g3,2 g3,5 g3,1 σ3 (t1 , t3,2 ) σ3 (t2 , t3,4 ) g3,6 o 3 t3,5 t3,6 t3,0 t3,1 t1 t3,2 t3,3 t2 t3,4 Time Let ti,L = successor (ti,1 ) and ti,R = successor (ti,2 ) σi (t1 , t2 ) = σi (Ii,R ) − σi (Ii,L ) − σi (t2 , ti,R ) + σi (t1 , ti,L ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest I3,4 I3,2 Score g3,4 g3,3 g3,2 g3,5 g3,1 σ3 (t1 , t3,2 ) σ3 (t2 , t3,4 ) g3,6 o 3 t3,5 t3,6 t3,0 t3,1 t1 t3,2 t3,3 t2 t3,4 Time Let ti,L = successor (ti,1 ) and ti,R = successor (ti,2 ) σi (t1 , t2 ) = σi (Ii,R ) − σi (Ii,L ) − σi (t2 , ti,R ) + σi (t1 , ti,L ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest I3,4 I3,2 Score g3,4 g3,3 g3,2 g3,5 g3,1 σ3 (t1 , t3,2 ) σ3 (t2 , t3,4 ) g3,6 o 3 t3,5 t3,6 t3,0 t3,1 t1 t3,2 t3,3 t2 t3,4 Time Let ti,L = successor (ti,1 ) and ti,R = successor (ti,2 ) σi (t1 , t2 ) = σi (Ii,R ) − σi (Ii,L ) − σi (t2 , ti,R ) + σi (t1 , ti,L ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest I3,4 I3,2 Score g3,4 g3,3 g3,2 g3,5 g3,1 σ3 (t1 , t3,2 ) σ3 (t2 , t3,4 ) g3,6 o 3 t3,5 t3,6 t3,0 t3,1 t1 t3,2 t3,3 t2 t3,4 Time Let ti,L = successor (ti,1 ) and ti,R = successor (ti,2 ) σi (t1 , t2 ) = σi (Ii,R ) − σi (Ii,L ) − σi (t2 , ti,R ) + σi (t1 , ti,L ) Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest I3,4 I3,2 Score g3,4 g3,3 g3,2 g3,5 g3,1 σ3 (t1 , t3,2 ) σ3 (t2 , t3,4 ) g3,6 o 3 t3,5 t3,6 t3,0 t3,1 t1 t3,2 t3,3 t2 t3,4 Time Let ti,L = successor (ti,1 ) and ti,R = successor (ti,2 ) σi (t1 , t2 ) = σi (Ii,R ) − σi (Ii,L ) − σi (t2 , ti,R ) + σi (t1 , ti,L ) Use a B-tree forest to index (t3,` , (gi,` , σi (Ii,` )) Each oi indexed in Pa separate B-tree Query cost is O( m i=1 logB ni + (m/B)logB k) We denote this query Exact2. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest I3,4 I3,2 Score g3,4 g3,3 g3,2 g3,5 g3,1 σ3 (t1 , t3,2 ) σ3 (t2 , t3,4 ) g3,6 o 3 t3,5 t3,6 t3,0 t3,1 t1 t3,2 t3,3 t2 t3,4 Time Our B-tree forest solution requires m B-trees. Query time improves from baseline. Opening/Closing m B-trees expensive for large m. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data Improved Solution using Prefix Sums and B-tree Forest I3,4 I3,2 Score g3,4 g3,3 g3,2 g3,5 g3,1 σ3 (t1 , t3,2 ) σ3 (t2 , t3,4 ) g3,6 o 3 t3,5 t3,6 t3,0 t3,1 t1 t3,2 t3,3 t2 t3,4 Time Our B-tree forest solution requires m B-trees. Query time improves from baseline. Opening/Closing m B-trees expensive for large m. We show how to solve a query using a single interval tree. Jeffrey Jestes, Jeff M. Phillips, Feifei Li, Mingwang Tang Ranking Large Temporal Data