An Event-based Framework for
Characterizing the Evolutionary
Behavior of Interaction Graphs
Sitaram Asur, Srinivasan Parthasarathy and
Duygu Ucar
Department of Computer Science
The Ohio State University
Copyright 2006, Data Mining Research Laboratory
Motivation
[Figure: Protein-protein interactions in yeast (Jeong et al., 2001)]
• Interaction Networks
– Represent scientific data
from various domains
– Nodes represent entities
– Edges represent
interactions among entities
– Examples:
• Biological networks - Protein-Protein Interaction (PPI) networks, gene expression networks
• Collaboration networks
• Social networks, online
communities, blog networks
[Figure: Physicist collaboration network (Newman and Girvan, 2004)]
Motivation
• Mining interaction networks is important
– Gain insight into structure,
properties and behavior of
these networks [Newman,
2001]
• The modular nature of interaction networks is important
– Co-expression networks: dense components -> functional modules
– Social networks: clusters -> community structure
Motivation
• A large number of earlier approaches focused on
mining static interaction networks
• Many important real-world networks are dynamic
[Figure: Temporal protein interaction network of the yeast mitotic cell cycle. Ulrik de Lichtenberg et al., Science 307, 724 (2005)]
Motivation
• Dynamic Interaction Networks
– Nodes and interactions change over time
– Structure changes in the network
• Need for a structured method to characterize and model
evolution
– Understand nature of change (evolution) in networks
– Consider evolution of individuals and communities
– Develop models for reasoning and inference of future
events
Workflow
[Figure: Evolving Graph -> Temporal Snapshots (S_i, S_{i+1}) -> Clustering (C_i, C_{i+1}) -> Event Detection -> Behavioral Patterns -> Analysis and Inference, iterating over i]
Temporal Snapshots
• Split the graph data into non-overlapping temporal
snapshots
– Each snapshot corresponds to a graph
– Consists of all nodes and interactions active in that time
period
– Nodes active if they have an interaction in a particular
time period
[Figure: example snapshot graphs at T1 and T2 over nodes A to G]
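The snapshot construction described on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes interactions arrive as hypothetical `(u, v, t)` tuples and that snapshots are fixed-width time windows. The name `split_into_snapshots` and the toy data are invented for the example.

```python
from collections import defaultdict

def split_into_snapshots(interactions, start, width):
    """Group (u, v, t) interactions into non-overlapping windows of `width`.

    Returns {window_index: (nodes, edges)}. A node is active in a window
    only if it takes part in at least one interaction in that window.
    """
    edges_by_window = defaultdict(set)
    for u, v, t in interactions:
        window = (t - start) // width                    # each timestamp falls in one window
        edges_by_window[window].add(frozenset((u, v)))   # undirected edge
    snapshots = {}
    for window, edges in sorted(edges_by_window.items()):
        nodes = set().union(*edges)                      # active nodes = endpoints of active edges
        snapshots[window] = (nodes, edges)
    return snapshots

# Hypothetical toy data: interactions stamped with a year
interactions = [("A", "B", 1997), ("E", "F", 1997),
                ("A", "B", 1998), ("C", "D", 1998), ("D", "G", 1998)]
snapshots = split_into_snapshots(interactions, start=1997, width=1)
```

Each snapshot is itself a graph, so it can be passed to any clustering routine independently of the others.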
Clustering
• Represent the snapshot graphs using clusters
– Clusters of a graph can provide structure information
– Examine the evolution of clusters over time
– Can provide insight on corresponding changes to the graph
[Figure: clustered snapshot graphs at T1 and T2 over nodes A to G]
– MCL clustering algorithm employed in this work
– Ensemble clustering approaches can be employed to obtain
robust clusters (Asur et al, ISMB 2007)
Community-based Event Detection
• Continue
• Merge
• Split
• Form
• Dissolve
[Figure: example clusters at snapshots T=1 to T=6 illustrating the five events]
Entity-based Event Detection
• Appear
• Disappear
• Join
• Leave
[Figure: nodes A, B and C moving between clusters at snapshots T=1 to T=4]
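The slides name the entity-based events without defining them. Under the natural reading, which is an assumption here, a node appears when it is active in the current snapshot but not in the previous one, and disappears in the opposite case. The sketch below covers only these two events. Join and Leave depend on how clusters are matched across snapshots, which the slides do not specify, so they are left out. The function name `appear_disappear` is hypothetical.

```python
def appear_disappear(prev_nodes, curr_nodes):
    """Return (appeared, disappeared) node sets between two consecutive snapshots.

    ASSUMPTION: Appear = active now but not before; Disappear = the reverse.
    """
    prev_nodes, curr_nodes = set(prev_nodes), set(curr_nodes)
    appeared = curr_nodes - prev_nodes
    disappeared = prev_nodes - curr_nodes
    return appeared, disappeared

# Hypothetical active-node sets for two consecutive snapshots
appeared, disappeared = appear_disappear({"A", "B", "E", "F"},
                                         {"A", "B", "C", "D", "G"})
```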
Event Detection
• Represent each set of snapshot clusters as a k × N binary
cluster-membership matrix
• Use bitwise operators to compute the events between each
successive pair of matrices (snapshots)
• Example: Continue event
Continue(C_j, C_k) = ( AND(S_i(j), S_{i+1}(k)) == OR(S_i(j), S_{i+1}(k)) )
– AND equals OR exactly when rows j and k have identical membership
• The event detection algorithm is linear in the number of nodes in
the graph: O(N)
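The Continue test can be sketched with Python integers used as bit vectors, one integer per cluster, so that each list of masks stands in for a k × N membership matrix. The helper names (`membership_rows`, `is_continue`, `continue_events`) and the toy clusters are assumptions made for illustration. The pairwise loop is a brute-force check and makes no claim to the O(N) bound stated on the slide.

```python
def membership_rows(clusters, node_index):
    """Encode each cluster as an int bitmask: bit n is set if node n is a member."""
    rows = []
    for cluster in clusters:
        mask = 0
        for node in cluster:
            mask |= 1 << node_index[node]
        rows.append(mask)
    return rows

def is_continue(row_j, row_k):
    """Continue holds when AND equals OR, i.e. the two rows are identical."""
    return (row_j & row_k) == (row_j | row_k)

def continue_events(rows_i, rows_next):
    """List (j, k) pairs where cluster j at snapshot i continues as cluster k at i+1."""
    return [(j, k)
            for j, row_j in enumerate(rows_i)
            for k, row_k in enumerate(rows_next)
            if is_continue(row_j, row_k)]

# Hypothetical clusters over nodes A..G in two consecutive snapshots
node_index = {name: position for position, name in enumerate("ABCDEFG")}
rows_1 = membership_rows([{"A", "B"}, {"E", "F"}], node_index)
rows_2 = membership_rows([{"C", "D", "G"}, {"A", "B"}], node_index)
events = continue_events(rows_1, rows_2)
```

Here cluster 0 of the first snapshot ({A, B}) reappears unchanged as cluster 1 of the second, so it is the only Continue pair reported.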
Temporal Analysis
• Use critical events for analysis
• Form and Dissolve events
– Used to study group formation and dissipation
• Merge and Split events
– Evolution of groups
• Continue events
– Stability of clusters/groups
– Evolution of topics in a collaboration network
Behavioral Analysis
• Use entity-based critical events discovered to compose
incremental measures for capturing behavioral patterns
• Behavioral measures can then be used to analyze
evolutionary behavior of nodes and clusters
• Four behavioral measures
– Stability Index
– Sociability Index
– Popularity Index
– Influence Index
Case Study 1 : DBLP Collaboration network
• Data from 28 key conferences in
databases/data mining/AI over
10 years
• Authors (nodes) connected by
collaborations (edges)
• 23136 nodes and 54989 edges
• Collaboration networks display
many of the structural features
of social networks (Kempe,
Kleinberg and Tardos 2003,
Newman 2001)
Case Study 2 : Clinical Trials Network
• Clinical Trials
– Can provide information on risks, benefits and optimal
dosage levels.
– Consists of observations of patients under drug use as well as
some under placebo
– Generally represented as a set of multivariate time series
• Evolving clinical trials network
– Nodes representing patients
– Correlations among patients modeled as edges
– Edges change over time as correlations change
• Motivation: Use evolution of correlation to identify
potential toxic effects of drugs
Stability Index
• Propensity of a node to interact with the same
group of people over time
• Stability for a node over time incrementally
computed based on the stability of the clusters it
belongs to
Stability for Clinical Trials data
• Nodes with low Stability Index values
represent patients with fluctuating correlation
values (outliers)
• Null Hypothesis:
– If the drug does not result in toxicity, then
outliers are likely to be flagged at random
from each group (drug and placebo).
• Experiment on clinical trials network for
diabetes patients
– 19 nodes (patients) had a Stability Index below the threshold.
– 18 of the 19 were on the drug.
– The drug under study was discontinued
due to possible toxic effects.
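To get a feel for how unlikely an 18-of-19 split is under the null hypothesis, one can compute a binomial tail probability. The slides do not give the sizes of the drug and placebo groups, so the sketch below assumes, purely for illustration, that a patient flagged at random is equally likely to come from either group (`p_drug = 0.5`). With a different allocation ratio the number changes.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# ASSUMPTION: equal-sized drug and placebo groups (not stated in the slides)
p_drug = 0.5
p_value = binomial_tail(19, 18, p_drug)
```

Under this assumed 50/50 split the tail probability is 20 / 2^19, roughly 3.8e-5, which is the kind of evidence that would lead one to reject the null hypothesis.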
Sociability Index
• Incremental measure of the different interactions a
node participates in
• Opposite of the Stability Index
• Does not represent degree
Sociability Index for Community Prediction
• Goal : To identify future cluster co-occurrences based
on history data for the DBLP dataset
• Key Intuition: If two authors have high sociability,
and they have not yet collaborated (not been clustered
together), there is a high chance they will.
• Setup : Use the data for 1997-2001 to predict cluster co-occurrences for 2002-2006
Experimental Results
• Comparison with other measures (Liben-Nowell and
Kleinberg, CIKM 2003)
– Common Neighbor
– Adamic-Adar
– Jaccard
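For reference, the three baseline scores can be sketched as follows, using their standard definitions over neighbor sets. The adjacency dictionary `adj` and the toy graph are hypothetical. The slides do not show how the baselines were configured in the experiments, so this is only an illustration of the measures themselves.

```python
from math import log

def common_neighbors(adj, x, y):
    """Number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])

def jaccard(adj, x, y):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0

def adamic_adar(adj, x, y):
    """Sum of 1/log(degree) over shared neighbors; rare neighbors count more.

    In a simple undirected graph a shared neighbor of two distinct nodes
    has degree >= 2, so the logarithm is positive.
    """
    return sum(1.0 / log(len(adj[z])) for z in adj[x] & adj[y])

# Hypothetical undirected graph stored as neighbor sets
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
scores = (common_neighbors(adj, "a", "b"),
          jaccard(adj, "a", "b"),
          adamic_adar(adj, "a", "b"))
```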
Popularity Index
• Measure of attraction of nodes to a cluster
• Influence measure of a cluster
• Does not reflect the size of the cluster
• DBLP dataset
– Can be used to identify hot topics
– If a large number of nodes join a cluster and they are all
working on a similar topic, it indicates a buzz around that
topic for that year
Application of Popularity Index
• Example : XML
• Year 1999 : 3 authors
(XML and web
applications)
• Year 2000 : 50 joins
– 30 of these
authors published
papers on XML
Influence Index
• Measure of influence of a node on others
• Influence in terms of participation in critical events
• Influence of a node initially computed as
• Follower nodes need to be pruned!
unless
Top Influential authors – DBLP dataset
Diffusion Models
• Study the spread of information in an evolving
interaction network (Kempe et al, 2003, 2005)
– Nodes activated with information
– Newly activated nodes become contagious briefly
– Information propagates through the network
– Activation function maps weights of the links of a node to determine if it is activated
[Figure: information spreading through a network over time steps t1 to t4]
• SUM Activation: If sum of weights > threshold,
activate
• MAX Activation: If any single weight > threshold,
activate
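The two activation rules can be written as small predicates. The sketch assumes that `weights` holds the weights on a node's links from currently contagious neighbors; that reading, the function names and the sample values are assumptions for illustration.

```python
def sum_activation(weights, threshold):
    """Activate if the combined link weight exceeds the threshold."""
    return sum(weights) > threshold

def max_activation(weights, threshold):
    """Activate if any single link weight exceeds the threshold."""
    return max(weights, default=0.0) > threshold

# Hypothetical weights on links from two contagious neighbors
incoming = [0.3, 0.4]
activated_by_sum = sum_activation(incoming, threshold=0.5)
activated_by_max = max_activation(incoming, threshold=0.5)
```

With these values SUM activates the node (the weights total more than 0.5) while MAX does not (no single weight exceeds 0.5), which shows how the choice of rule changes the spread.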
Diffusion Models – Influence Maximization
• Influence Maximization Problem : Find the initial set of nodes that
can activate the largest number of nodes over a time period
– Critical in applications such as viral marketing and for
epidemiological research
– Complicated in the case of dynamic interaction networks as
the network changes over time
• Need for dynamic measures that reflect the current status
of the network
– Sociability Index used to weight links
• Highly sociable nodes have high propensity to pass on information
– Influence Index to determine initial set of active nodes
– Comparison with random choice of nodes and degree-based
selection (Wasserman and Faust, 1994)
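Comparing seed-selection strategies comes down to simulating the spread from each candidate seed set and counting the nodes reached. The sketch below is one simple way to do that under stated assumptions: links are directed and stored in a hypothetical `weights` dictionary, activation uses the SUM rule, and a node is contagious only in the step right after it is activated. In a dynamic network the weights would be recomputed per snapshot (the slides suggest the Sociability Index), whereas here they are held fixed for brevity.

```python
def simulate_spread(weights, seeds, threshold, steps):
    """Count nodes activated from `seeds` under SUM activation.

    `weights[(u, v)]` is the weight of the link u -> v.
    ASSUMPTION: only nodes activated in the previous step are contagious.
    """
    active = set(seeds)
    contagious = set(seeds)
    for _ in range(steps):
        pressure = {}
        for (u, v), w in weights.items():
            if u in contagious and v not in active:
                pressure[v] = pressure.get(v, 0.0) + w
        newly = {v for v, total in pressure.items() if total > threshold}
        if not newly:
            break                      # spread has died out
        active |= newly
        contagious = newly
    return len(active)

# Hypothetical weighted links
weights = {("a", "b"): 0.6, ("a", "c"): 0.6,
           ("b", "d"): 0.3, ("c", "d"): 0.3, ("e", "d"): 0.2}
reach_from_a = simulate_spread(weights, {"a"}, threshold=0.5, steps=5)
reach_from_e = simulate_spread(weights, {"e"}, threshold=0.5, steps=5)
```

Seeding at `a` reaches four nodes, while seeding at `e` reaches only itself, so a selection measure that ranks `a` above `e` would do better on this toy graph.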
Conclusions
• Most real-world graphs are dynamic in nature
– Need for analysis, reasoning and inference
– Proposed an event-based framework
• Clusters to capture structure at different
snapshots
• Critical events over clusters to identify
dynamic properties of graphs
• Behavioral patterns incrementally composed
from critical events
– Proposed method useful in many application
domains
• Protein function prediction, drug design,
recommender systems, viral marketing,
epidemiology
Future Directions
• Extensions to large interaction graphs
• Use of semantic information for reasoning and inference
– Merge and Split Events
• If two clusters have high semantic similarity, probability of a Merge is
high
– Continue events
• Track the evolution of topics
• Sequences of Form, Continue, Continue …
• Multi-scale temporal modeling
• Analyze snapshots of different granularity
Thanks!
• Poster # 36, this evening (Mon 13th Aug, 6:15 – 9:15 pm)
• This work was supported by the following grants:
– DOE Early Career Principal Investigator Award No. DE-FG02-04ER25611
– NSF CAREER Grant IIS-0347662
• Contacts:
– Sitaram Asur : asur@cse.ohio-state.edu
– Dr Srinivasan Parthasarathy : srini@cse.ohio-state.edu
– Duygu Ucar : ucar@cse.ohio-state.edu
• Group Webpage : http://dmrl.cse.ohio-state.edu