STUN - Computer Science Department

1
STUN:
SPATIO-TEMPORAL UNCERTAIN
(SOCIAL) NETWORKS
Chanhyun Kang
Computer Science Dept.
University of Maryland, USA
chanhyun@cs.umd.edu
Andrea Pugliese
DEIS Dept.
University of Calabria, Italy
apugliese@deis.unical.it
John Grant, V.S. Subrahmanian
Computer Science Dept.
University of Maryland, USA
{grant,vs}@cs.umd.edu
2
Motivation
Consider a social network whose relationships carry spatio-temporal
information and certainty values.
[Figure: example social network with the region labels Maryland, Potomac, and Bethesda]
3
Motivation
• Query example
• Find all people who attended a party in Maryland
at time point 5 with certainty at least 0.5
• The query combines
• A common subgraph matching query
• A spatial constraint: within Maryland
• A temporal constraint: at time point 5
• A certainty constraint: at least 0.5 certainty
• The query therefore contains not only a common graph query but also
constraints on spatio-temporal information and certainty values
4
Motivation
• In graph query research
• Several subgraph matching algorithms and index structures have been
proposed
• These indexes and algorithms consider only the graph structure
• But to answer the query efficiently, we need to consider
• Graph structure
• Spatio-temporal information
• Certainty information
• So we propose a new index structure that takes all three properties into
account, and a query processing algorithm that uses the index.
5
In this paper
• Introduce STUN: Spatio-Temporal Uncertain (Social)
Networks
• Define STUN query language
• Develop STUN index, a disk based index structure
• Develop a query processing algorithm using STUN index
• Evaluate the algorithms
6
STUN
• A Spatio-Temporal Uncertain (Social) Network is an
extension of social networks
• Supports aspects of spatio-temporal uncertainty in
networks
• Where and when the relationships are/were true
• How certain we are that the relationships hold/held
• Defined by a set of STUN tuples
• STUN tuple : STUN quadruple + STUN annotation
• STUN quadruple : two vertices, a relationship and a certainty
value
• STUN annotation : spatio-temporal information
7
Syntax : STUN quadruple
• STUN quadruple : (v, l, v’; c)
• v, v’ ∈ V (vertices) and l ∈ L (labels)
• Certainty factor c ∈ [0,1]
For example, (Jim, Friend, Ed; 0.7)
[Figure: Jim --(Friend; 0.7)--> Ed]
“Jim” is a friend of “Ed” with certainty 0.7
8
Syntax : STUN annotation
• STUN annotation: [R,T]
• Expresses spatial information and temporal information
• R is a region, a set of space points in a spatial reference system S
• S ⊆ [0,M] × [0,N] with M, N ∈ ℝ (real numbers)
• A space point is a member of S
• T is a time interval, a pair (st, et) with st ≤ et
• st and et are time points to express the start and the end of a
specific period
• A time point is a member of a temporal reference system [L, U]
9
Syntax : STUN tuple
• STUN tuple : (v, l, v’; c) : [R, T]
• STUN quadruple + STUN annotation
Example: (Phil, Organized, Party2; 1):[Bethesda, (15,15)]
[Figure: Phil --(Organized; 1) [Bethesda, (15,15)]--> Party2]
“Phil” organized “Party2” with certainty 1 and the event occurred
at time 15 at some location within the region “Bethesda”
• A STUN knowledge base is a finite set of STUN tuples.
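The syntax above can be sketched as a small data model. This is only an illustration, not the authors' implementation: the class name `StunTuple`, the simplification of a region to an axis-aligned rectangle, and the coordinates given to "Bethesda" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: a region R is approximated by an axis-aligned rectangle
# (x_min, y_min, x_max, y_max) inside the spatial reference system S.
Rect = Tuple[float, float, float, float]


@dataclass(frozen=True)
class StunTuple:
    """One STUN tuple (v, l, v'; c) : [R, T]."""
    v: str                      # source vertex
    label: str                  # relationship label l
    v2: str                     # target vertex v'
    c: float                    # certainty factor in [0, 1]
    region: Rect                # R, simplified to a rectangle
    interval: Tuple[int, int]   # T = (st, et) with st <= et

    def __post_init__(self):
        # Enforce the two side conditions stated on the slides.
        if not 0.0 <= self.c <= 1.0:
            raise ValueError("certainty must lie in [0, 1]")
        st, et = self.interval
        if st > et:
            raise ValueError("interval must satisfy st <= et")


# Hypothetical coordinates for the region "Bethesda".
BETHESDA: Rect = (2.0, 2.0, 4.0, 4.0)

# A STUN knowledge base is a finite set of STUN tuples.
kb = {StunTuple("Phil", "Organized", "Party2", 1.0, BETHESDA, (15, 15))}
```

Making the tuple frozen keeps it hashable, so a knowledge base can be an ordinary Python set.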
10
STUN QUERY LANGUAGE
11
STUN Queries
• A STUN query q contains
• Graph part (Gq)
• Subgraph query
• Minimum certainty values for the relationships in the graph query
• Constraint Part (Cq)
• Constraints for spatial information
• Constraints for temporal information
Example.
Find all people who attended a party in Maryland at time point 5
with certainty at least 0.5
• Subgraph query: people who attended a party
• Minimum certainty value: at least 0.5
• Constraint for spatio-temporal information: in Maryland at time point 5
12
STUN Queries
• Graph part : Gq
• Subgraph query and Minimum certainty values
• A set of query graph tuples
• Variables are denoted using “?”; output variables are
underlined
• A query graph tuple is (v, l, v’; c) : [R, T] where
• v, v’ ∈ V ∪ VARV, l ∈ L ∪ VARL, c ∈ [0,1],
• R ∈ VARR and T ∈ VART
Example.
Find all people(?I) who attended a party(?P) in Maryland at time point 5
with certainty at least 0.5
Gq={(?I, attended, ?P; 0.5):[?s,?t]}
• Subgraph query: (?I, attended, ?P)
• Minimum certainty value: 0.5
13
STUN Queries
• Constraint part: Cq
• Specify spatial constraints and temporal constraints
• Expressed by
• Predicate symbols
• Represent a spatial relation or a temporal relation
• Parameters for the predicates
• Ground terms or variables in the graph part
Example.
Find all people(?I) who attended a party(?P) in Maryland at time point 5
with certainty at least 0.5
Cq ={inside(?s, Maryland), during(?t,[5,5])}
• Spatial constraint: inside(?s, Maryland)
• Temporal constraint: during(?t,[5,5])
14
STUN Query example
• Find all people(?I) who attended a party(?P) in Maryland
at time point 5 with certainty at least 0.5
Gq={(?I, attended, ?P; 0.5):[?s,?t]}
Cq ={inside(?s,Maryland), during(?t,[5,5])}
15
STUN Query example
• Find all people(?I)
• who have been a friend of ‘Jim’ in the time interval [10,20] with
certainty at least 0.9, as well as a friend of ‘Phil’ in the same interval
with certainty at least 0.6
• and who attended a party(?P) in Maryland organized by ‘Phil’ that
occurred during the time interval [0,20]
Gq={(?I, attended, ?P; 1.0):[?s1,?t1],
(?I, friend, Jim; 0.9):[?s2,?t2],
(?I, friend, Phil; 0.6):[?s2,?t2],
(Phil, organized, ?P; 1.0):[?s1,?t1]}
Cq={inside(?s1, Maryland),
during(?t1,[0,20]), during(?t2,[10,20])}
16
STUN query answer
• A substitution θ maps variables to ground terms
• Each ground term maps to itself
• Denote the application of θ to a term x as xθ
[Figure: the substitution θ = {?P → Party3} maps (Phil, Organized, ?P) to (Phil, Organized, Party3)]
• A substitution θ is an answer to a STUN query q = (Gq, Cq) if
• The tuples obtained by applying θ to Gq exist in the STUN KB
• The certainty values of those tuples in the STUN KB are at least the
minimum certainty values in Gq
• The spatio-temporal information of the tuples satisfies all constraints in
Cq
- $\forall\, (v, l, v'; c) : [R, T] \in G_q,\ \exists\, c' \ge c \ \text{s.t.}\ (v\theta, l\theta, v'\theta; c') : [R\theta, T\theta] \in KB$
- And $\bigwedge_{c_q \in C_q} c_q\theta$ is true
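The answer condition can be checked directly against a small knowledge base. The sketch below is a naive illustration, not the STUN algorithm: tuples are plain Python tuples, regions are assumed to be axis-aligned rectangles, the predicates `inside` and `during` are given simple containment semantics, and the names Ed and Party1 as well as all coordinates are hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Interval = Tuple[int, int]                 # (st, et)
# KB tuple:    (v, l, v', c, R, T)
# Query tuple: (v, l, v', c_min, ?s, ?t), variables start with "?"


def inside(r: Rect, outer: Rect) -> bool:
    """Assumed semantics: rectangle r lies within rectangle outer."""
    return (outer[0] <= r[0] and outer[1] <= r[1]
            and r[2] <= outer[2] and r[3] <= outer[3])


def during(t: Interval, outer: Interval) -> bool:
    """Assumed semantics: interval t lies within interval outer."""
    return outer[0] <= t[0] and t[1] <= outer[1]


def is_answer(theta, gq, cq, kb) -> bool:
    """Check both answer conditions for a substitution theta."""
    def sub(x):
        return theta.get(x, x)   # ground terms map to themselves

    for (v, l, v2, c_min, s, t) in gq:
        # Some KB tuple must match the substituted tuple with c' >= c_min.
        if not any(kv == sub(v) and kl == sub(l) and kv2 == sub(v2)
                   and kc >= c_min and kr == sub(s) and kt == sub(t)
                   for (kv, kl, kv2, kc, kr, kt) in kb):
            return False
    # Every constraint in Cq must hold after substitution.
    return all(pred(sub(var), const) for (pred, var, const) in cq)


MARYLAND: Rect = (0.0, 0.0, 10.0, 10.0)   # hypothetical coordinates
BETHESDA: Rect = (2.0, 2.0, 4.0, 4.0)     # hypothetical coordinates

kb = [("Ed", "attended", "Party1", 0.8, BETHESDA, (5, 5))]
gq = [("?I", "attended", "?P", 0.5, "?s", "?t")]
cq = [(inside, "?s", MARYLAND), (during, "?t", (5, 5))]
theta = {"?I": "Ed", "?P": "Party1", "?s": BETHESDA, "?t": (5, 5)}

result = is_answer(theta, gq, cq, kb)
```

Here `result` is true because the single KB tuple has certainty 0.8 ≥ 0.5, its region lies inside the assumed Maryland rectangle, and its interval is [5,5].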
17
STUN INDEX
18
STUN Index
• A balanced tree
• Each leaf node represents a portion of the STUN knowledge base.
• Each inner node captures the subgraph represented by its child
nodes.
19
STUN Index
• Each node occupies a disk page and contains
• MBR(minimum bounding rectangle)
• Envelops the regions associated with the STUN tuples in the subgraph
of child nodes
• MBI(minimum bounding interval)
• Envelops the time intervals associated with the STUN tuples in the
subgraph of child nodes
• On processing queries, MBRs and MBIs are used to prune nodes for
the answers using spatial constraints and temporal constraints
[Figure: a spatial reference system with regions R1, R2, R3 and nodes N1, N2, N3, showing the MBR of N1, the MBR of N2, and the MBR of N3]
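A minimal sketch of how an MBR and an MBI could be computed and then used for pruning is shown below. It assumes regions are axis-aligned rectangles and that a node may be skipped as soon as its MBR or MBI cannot intersect the region or interval named in a constraint; the function names are illustrative and do not come from the STUN prototype.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Interval = Tuple[int, int]                 # (st, et)


def mbr(rects: Iterable[Rect]) -> Rect:
    """Minimum bounding rectangle enveloping all given rectangles."""
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def mbi(intervals: Iterable[Interval]) -> Interval:
    """Minimum bounding interval enveloping all given intervals."""
    intervals = list(intervals)
    return (min(t[0] for t in intervals), max(t[1] for t in intervals))


def rects_overlap(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def can_prune(node_mbr: Rect, node_mbi: Interval,
              query_region: Rect, query_interval: Interval) -> bool:
    """A node is skipped if its MBR or MBI misses the query entirely."""
    return (not rects_overlap(node_mbr, query_region)
            or not intervals_overlap(node_mbi, query_interval))


node_mbr = mbr([(0, 0, 1, 1), (2, 3, 4, 5)])
node_mbi = mbi([(1, 3), (2, 8)])
```

The test is conservative: a node that overlaps the query region and interval is still read, and its tuples are checked individually against the constraints.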
20
STUN Index
• Reduce the number of nodes to read for answering
queries.
• Each index node should have
• Few cross edges with other nodes at the same level
• Small MBR(minimum bounding rectangle) and small MBI(minimum
bounding interval)
• Small MBR overlaps with other nodes at the same level
• Small MBI overlaps with other nodes at the same level.
• To satisfy these constraints
• Build a vertex and edge weighted undirected graph (WUG) from the
STUN KB
• Then use the weights when building the index
21
Building STUN Index
I. Initial Step
• Build a vertex and edge weighted undirected graph (WUG) from the
STUN KB
• The weights are used to satisfy the constraints
• Few cross edges
• Small MBR (minimum bounding rectangle) and small MBI (minimum
bounding interval)
• Small MBR overlaps and small MBI overlaps
II. Coarsening Step
• Merge vertices using the weights of vertices and edges
III. Partitioning Step
• Build a tree index from the coarsened graphs
22
Building Index- Initial Step
I. Initial Step
• Build a vertex and edge weighted undirected graph (WUG)
• Assign weight 1 to every vertex
• Calculate the weights of edges using a spatio-temporal vertex distance
function δ(v, v')
• Calculate MBRs (minimum bounding rectangles) and MBIs (minimum
bounding intervals) for edges
[Figure: a graph with vertices v0, v1, v2 and edges e0, e1, e2, where each edge contains spatio-temporal information with a certainty value, and the WUG built from it; in the WUG every vertex has weight 1 and every edge carries a weight δ, an MBR, an MBI, and a set of labels, with e1 and e2 sharing one WUG edge and e0 forming another]
23
Building Index- Initial Step
• Spatio-temporal vertex distance function δ(v, v')
• Looks at the neighborhood of the two vertices
• Measures the “amount” of space and time the vertices share with
each other with respect to their neighborhoods.
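\[
\delta(v, v') = \alpha \cdot c_T(v, v') \cdot \left( \frac{1}{n_T(v)} + \frac{1}{n_T(v')} \right) + \beta \cdot c_S(v, v') \cdot \left( \frac{1}{n_S(v)} + \frac{1}{n_S(v')} \right),
\]
\[
c_T(v, v') = \sum_{(v, l, v'; c):[R, T] \in KB} c \cdot length(T), \qquad
n_T(v) = \sum_{(v, l, v'; c):[R, T] \in KB} c \cdot length(T),
\]
\[
c_S(v, v') = \sum_{(v, l, v'; c):[R, T] \in KB} c \cdot area(R), \qquad
n_S(v) = \sum_{(v, l, v'; c):[R, T] \in KB} c \cdot area(R),
\]
\[
\alpha + \beta = 1, \text{ and } \alpha, \beta \in \mathbb{R}
\]
In c_T and c_S both v and v' are fixed; in n_T(v) and n_S(v) only v is fixed, so the sum ranges over the neighborhood of v.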
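One possible reading of the distance function is δ(v, v') = α · c_T(v, v') · (1/n_T(v) + 1/n_T(v')) + β · c_S(v, v') · (1/n_S(v) + 1/n_S(v')), where c_T and c_S sum c · length(T) and c · area(R) over the tuples between v and v', and n_T and n_S sum the same quantities over all tuples touching one vertex. The sketch below computes that reading. Several points are assumptions not stated on the slides: tuples are counted in both directions because the WUG is undirected, length(T) is taken as et - st + 1 so that a single time point has length 1, regions are rectangles, and α defaults to 0.5.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Interval = Tuple[int, int]                 # (st, et)
KBTuple = Tuple[str, str, str, float, Rect, Interval]  # (v, l, v', c, R, T)


def length(t: Interval) -> float:
    # Assumption: discrete time points, so [5, 5] has length 1.
    return t[1] - t[0] + 1


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def delta(v: str, w: str, kb: Iterable[KBTuple], alpha: float = 0.5) -> float:
    """Weight of the WUG edge between distinct vertices v and w."""
    beta = 1.0 - alpha
    c_t = c_s = 0.0                 # amounts shared by v and w
    n_t = {v: 0.0, w: 0.0}          # per-vertex neighborhood totals (time)
    n_s = {v: 0.0, w: 0.0}          # per-vertex neighborhood totals (space)
    for (a, _label, b, c, region, interval) in kb:
        for x in {a, b} & {v, w}:
            n_t[x] += c * length(interval)
            n_s[x] += c * area(region)
        if {a, b} == {v, w}:
            c_t += c * length(interval)
            c_s += c * area(region)

    def share(common: float, n_v: float, n_w: float) -> float:
        # A zero neighborhood total implies a zero shared amount,
        # so skipping it avoids a division by zero.
        return sum(common / n for n in (n_v, n_w) if n > 0)

    return (alpha * share(c_t, n_t[v], n_t[w])
            + beta * share(c_s, n_s[v], n_s[w]))


# Hypothetical knowledge base with vertices A, B, C.
kb = [("A", "friend", "B", 1.0, (0, 0, 2, 2), (1, 4)),
      ("A", "friend", "C", 0.5, (0, 0, 2, 2), (1, 4))]
d_ab = delta("A", "B", kb)
d_bc = delta("B", "C", kb)
```

In this example A and B share one tuple while B and C share none, so `d_bc` is 0 and `d_ab` is positive.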
24
Building Index- Coarsening
• Coarsen the graph until the size of the coarsened graph is less than 1 disk
page
• At each coarsening level l, the number of vertices in G_l is half of the
number of vertices in G_(l-1)
[Figure: coarsening levels and number of vertices; vertices are merged between consecutive levels]
Level 0: G0 (original graph), N vertices
Level 1: G1, N/2 vertices
Level 2: G2, N/4 vertices
…
Level k: Gk, N/2^k vertices
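Because each level halves the number of vertices, the number of coarsening levels grows logarithmically with the size of the graph. The short sketch below counts the levels under the assumption that the capacity of a disk page can be expressed as a maximum number of vertices; the slides only say that the coarsened graph must be smaller than one disk page, so `page_capacity` is a hypothetical parameter.

```python
import math


def coarsening_levels(n_vertices: int, page_capacity: int) -> int:
    """Number of halving steps until the graph fits within page_capacity."""
    levels = 0
    n = n_vertices
    while n > page_capacity:
        n = math.ceil(n / 2)   # each level keeps half of the vertices
        levels += 1
    return levels


k = coarsening_levels(1000, 100)
```

With 1000 vertices and a capacity of 100, the sizes are 1000, 500, 250, 125, 63, so four levels are needed.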
25
How to merge vertices
Choose a vertex v randomly to merge
Select a neighbor m of v with minimum edge weight
(v is merged into m)
Update the weight of vertex m: w(m) = w(m) + w(v)
Update the weight, MBR and MBI of edges of v and m
(If there is no edge between m and a neighbor of v, add
an edge between m and the neighbor)
Delete the edge between v and m and the vertex v
Update mapping information: μ(μ⁻¹(v))
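The merge step can be sketched on a plain adjacency-dictionary graph. The slides do not say how the weights of two edges are combined when v and m share a neighbor, so summing them is an assumption here, and the MBR and MBI updates are left out to keep the example short. The function and variable names are illustrative only.

```python
import random


def merge_into_lightest_neighbor(v, w, adj, mapping):
    """Merge vertex v into its neighbor m with minimum edge weight.

    w:       vertex weights, e.g. {"a": 1}
    adj:     symmetric adjacency, adj[u][x] = weight of edge {u, x}
    mapping: original vertex -> vertex that currently represents it
    """
    if not adj[v]:
        return None                       # isolated vertex, nothing to merge
    m = min(adj[v], key=adj[v].get)       # neighbor with minimum edge weight
    w[m] += w.pop(v)                      # w(m) = w(m) + w(v)
    for x, weight in adj[v].items():
        if x == m:
            continue
        # Assumption: edges that become parallel are combined by summing
        # their weights; a missing edge {m, x} is added.
        combined = adj[m].get(x, 0.0) + weight
        adj[m][x] = combined
        adj[x][m] = combined
        del adj[x][v]
    del adj[m][v]                         # delete the edge between v and m
    del adj[v]                            # delete the vertex v
    for original, current in mapping.items():
        if current == v:
            mapping[original] = m         # v is now represented by m
    return m


def coarsen_once(w, adj, mapping, rng=random):
    """Pick a vertex at random, as on the slide, and merge it."""
    v = rng.choice(sorted(adj))
    return merge_into_lightest_neighbor(v, w, adj, mapping)


w = {"a": 1, "b": 1, "c": 1}
adj = {"a": {"b": 0.25, "c": 0.75},
       "b": {"a": 0.25, "c": 0.5},
       "c": {"a": 0.75, "b": 0.5}}
mapping = {"a": "a", "b": "b", "c": "c"}
merged_into = merge_into_lightest_neighbor("a", w, adj, mapping)
```

After the call, vertex a has been merged into b, b has weight 2, and the edge between b and c carries the combined weight 1.25.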
26
Building Index- Partitioning
- Each edge already has an MBR and an MBI
1. Store Gk as the root page, with MBR(all edges of Gk) and MBI(all edges of Gk)
2. Partition
3. Induce subgraphs using the mapping information from Gk-1 to Gk
4. Store the subgraphs as child pages, each with the MBR and MBI of all its edges
5. Repeat these steps recursively down to the lowest coarsening level
[Figure: coarsened graphs Gk, Gk-1, Gk-2, …; the subgraphs a and b induced from Gk-1 are stored with MBR(all edges of a), MBI(all edges of a) and MBR(all edges of b), MBI(all edges of b)]
27
Query Answering
• STUN index is used to get candidates for variables
• Traverse the index tree using the mapping information and the ground
terms (constants) in the query
• MBRs (minimum bounding rectangles) and MBIs (minimum bounding
intervals) are used to filter out pages that cannot contribute to the query
answer with regard to the spatial and temporal constraints
[Figure: the query edges (?I, friend, Jim), (?I, friend, Phil), and (Phil, organized, ?P) are looked up in the STUN index; the MBRs and MBIs of pages are checked against the constraints for pruning]
28
Query Answering
• Overall algorithm
I. Get candidates for each variable of a query
II. Select a variable that has the smallest number of candidates
III. Substitute each candidate for the variable
IV. For each substitution, repeat steps II and III for the remaining variables
recursively in a depth-first manner
V. If no variable is left, return the substitutions
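The five steps can be sketched as a depth-first search that always branches on the variable with the fewest candidates. This sketch is a simplification, not the STUN prototype: candidates are found by scanning an in-memory list instead of the disk-based index, spatio-temporal constraints are omitted, and the knowledge base (including the names Ed and Ann and all certainty values) is hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def edge_matches(q_edge, kb_edge, theta):
    """True if kb_edge can satisfy q_edge under the partial substitution."""
    (a, l, b, c_min), (ka, kl, kb2, kc) = q_edge, kb_edge

    def ok(term, value):
        bound = theta.get(term, term)
        return is_var(bound) or bound == value   # unbound variables match

    return kl == l and kc >= c_min and ok(a, ka) and ok(b, kb2)


def candidates(var, theta, gq, kb):
    """Step I: values for var allowed by every query edge mentioning it."""
    result = None
    for q in gq:
        if var not in (q[0], q[2]):
            continue
        pos = 0 if q[0] == var else 2
        values = {k[pos] for k in kb if edge_matches(q, k, theta)}
        result = values if result is None else result & values
    return result or set()


def solve(variables, gq, kb, theta=None):
    theta = theta or {}
    if not variables:
        # Step V: no variable left; keep theta if every edge is satisfied.
        ok = all(any(edge_matches(q, k, theta) for k in kb) for q in gq)
        return [dict(theta)] if ok else []
    cands = {x: candidates(x, theta, gq, kb) for x in variables}
    # Step II: choose the variable with the smallest number of candidates.
    var = min(sorted(variables), key=lambda x: len(cands[x]))
    answers = []
    for value in sorted(cands[var]):
        # Steps III and IV: substitute and recurse depth first.
        answers += solve(variables - {var}, gq, kb, {**theta, var: value})
    return answers


kb = [("Ed", "friend", "Jim", 0.95), ("Ed", "friend", "Phil", 0.7),
      ("Ann", "friend", "Jim", 0.5), ("Phil", "organized", "Party3", 1.0),
      ("Ed", "attended", "Party3", 1.0)]
gq = [("?I", "attended", "?P", 1.0), ("?I", "friend", "Jim", 0.9),
      ("?I", "friend", "Phil", 0.6), ("Phil", "organized", "?P", 1.0)]
answers = solve({"?I", "?P"}, gq, kb)
```

Branching on the most constrained variable first tends to keep the search tree small, because dead ends are discovered before the other variables are expanded.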
29
EVALUATION
30
Experiment : Environment
• We developed a prototype implementation in about
10,600 lines of Java code
• Ran the code on a laptop
• A dual-core 2.8 GHz CPU with 8 GB of RAM running Windows 7
• Indexes are on disk (no explicit buffer to load the index)
• Experiments for the scalability of the STUN index by
varying
• The size of the graph
• The complexity of queries
• The number of constraints in queries
• Queries are randomly generated from STUN KBs
• Each query has at least one answer.
• More than 10,000 queries were tested
31
Experiment : Dataset
• YouTube dataset
• Vertices : people and groups
• 20% of groups have a region randomly assigned
• Edge relations
• ‘follow’ : person to person, a time interval
• ‘membership’ : person to group, a time interval
• ‘co-located’ : person to group, a time interval and a region
• Time intervals are randomly assigned to ‘follow’ and ‘membership’
relationships
• A ‘co-located’ edge is added between two members if
• They have ‘membership’ relationships with the same group
• And their time intervals with that group overlap
• And that group has an assigned region
32
Experiment: Result
• Every single data point was obtained by running 200 queries.
33
Experiment: Result
• The query processing time increases slightly super-linearly with the size
of the database, though the slope of the graph increases with the
complexity of the query.
34
Conclusion
• Introduce Spatio-Temporal Uncertain (Social) Networks (STUN)
• Define STUN query language
• Develop a disk based index structure
• Develop a query processing algorithm
• Evaluate the STUN system experimentally
35
Questions