Tools and Algorithms for
Querying and Mining Large Graphs
Hanghang Tong
Machine Learning Department
Carnegie Mellon University
htong@cs.cmu.edu
http://www.cs.cmu.edu/~htong
1
Thesis Committee
• Christos Faloutsos
• William Cohen
• Jeff Schneider
• Philip S. Yu
2
Graphs are everywhere!
3
Motivating Questions: (high level)
• Given a large graph, we want to
+Task A: Querying
[Figure: CePS on DBLP [Tong+ KDD 06]: co-authorship subgraph with weighted edges among H.V. Jagadish, Laks V.S. Lakshmanan, R. Agrawal, Jiawei Han, Heikki Mannila, Christos Faloutsos, Corinna Cortes, Padhraic Smyth, V. Vapnik, M. Jordan, Daryl Pregibon]
+Task B: Mining
[Figure: T3 on CIKM [Tong+ CIKM 08]]
Will return to this later…
4
Motivating Questions (in detail)
• Querying [Goal: query complex relationship]
– Q.1. Find complex user-specific patterns;
– Q.2. Link Prediction & Proximity Tracking;
– Q.3. Answer all the above questions quickly.
• Mining [Goal: find interesting patterns]
– M.1. Spot Anomalies;
– M.2. Mine time & space;
– M.3. Detect communities.
5
Thesis Overview
[Figure: overview diagram of the querying tasks (Q1, Q2, Q3) and the mining tasks (M1, M2, M3)]
6
Thesis Overview: Questions That We Ask
Completed
• Q1: CePS, G-Ray, ProSIN (KDD06, KDD07 a, ICDM08)
• Q2: DAP (KDD07 b); pTrack/cTrack (SDM08, SAM08)
• Q3: FastProx (ICDM06, KAIS07, KDD07 b, ICDM08; SDM08, SAM08)
• M1: Colibri-S (KDD08 b); Colibri-D (KDD08 b)
• M2: T3/MT3 (CIKM08)
Proposed
• P1: M3
• P2: M2
• P3: M1, M2, M3
7
Thesis Overview: Impact
Tasks: Impact, Applications
Querying
• Q1: Identify master-mind criminal; money-laundering ring; interactive search & summarization
• Q2: Predict who-calls-whom; trend analysis on graph level
• Q3: Scale all the above applications to large, disk-resident graphs
Mining
• M1: Efficient anomaly detection in an intuitive, dynamic way
• M2: Mine time/space in complex settings
• M3: Detect community w/ optional constraints
Footnote: Our work for Q1 has been transferred into an IBM product (Cyano)
8
Roadmap
• Introduction
• Completed Work
–Querying
  • Preliminary
  • Q1
  • Q2
  • Q3
–Mining
• Proposed Work
9
Preliminary: Proximity Measurement
[Figure: example graph with nodes A, B, D, E, F, G, H, I, J and unit edge weights]
a.k.a Relevance, Closeness, ‘Similarity’…
10
Completed work on Q1
• Goal: Find complex user-specific patterns
– Q1.1. Center-Piece Subgraph Discovery
– e.g., find the master-mind criminal given some suspects X, Y and Z
– Q1.2. Best-Effort Pattern Match
– e.g., a money-laundering ring
– Q1.3. Interactive Querying (e.g., Negation)
– e.g., find the conferences most similar to KDD, but not like ICML
12
Q1.1 Center-Piece Subgraph Discovery
[Tong+ KDD 06]
[Figure: Input: original graph with query nodes A, B, C (black); Output: CePS containing the CePS node (red)]
Q: How to find hub for the black nodes?
Red: Max (Prox(A, Red) x Prox(B, Red) x Prox(C, Red))
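The AND score above can be illustrated with a minimal sketch. It assumes the proximity scores Prox(query, node) are already available in a dictionary; the function name `best_center_piece` and the toy numbers are hypothetical and are not taken from CePS itself.

```python
from math import prod

def best_center_piece(prox, query_nodes):
    """Return the non-query node maximizing the product of proximities.

    prox[q][j] is assumed to hold Prox(q, j) for query node q and node j.
    """
    candidates = set().union(*(prox[q].keys() for q in query_nodes))
    candidates -= set(query_nodes)

    def and_score(node):
        # AND query: a node must be close to every query node to score high.
        return prod(prox[q].get(node, 0.0) for q in query_nodes)

    return max(candidates, key=and_score)

# Hypothetical proximity scores from query nodes A, B, C to candidates x, y, z.
prox = {
    "A": {"x": 0.30, "y": 0.10, "z": 0.05},
    "B": {"x": 0.20, "y": 0.40, "z": 0.05},
    "C": {"x": 0.25, "y": 0.10, "z": 0.50},
}
center = best_center_piece(prox, ["A", "B", "C"])
```

Node y is very close to B alone and z to C alone, yet x wins because the product rewards being reasonably close to all three query nodes.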
CePS: Example (AND Query)
[Figure: center-piece subgraph with weighted co-authorship edges among H.V. Jagadish, Laks V.S. Lakshmanan, R. Agrawal, Jiawei Han, Heikki Mannila, Christos Faloutsos, Corinna Cortes, Padhraic Smyth, V. Vapnik, M. Jordan, Daryl Pregibon]
DBLP co-authorship network:
- 400,000 authors, 2,000,000 edges
14
K_SoftAND: Relaxation of AND
• Noise
• Disconnected Communities
Asking AND query? No Answer!
15
CePS: 2 SoftAND
[Figure: result subgraph with weighted edges in two groups. DB: H.V. Jagadish, Laks V.S. Lakshmanan, R. Agrawal, Jiawei Han, Umeshwar Dayal. Stat.: Bernhard Scholkopf, V. Vapnik, Peter L. Bartlett, M. Jordan, Alex J. Smola]
16
Q1.2. Best-Effort Pattern Match
[Tong+ KDD 2007 b]
[Figure: Input: query graph (node attributes CEO, SEC, Accountant, Manager) and data graph; Output: matching subgraph, with an interception node marked]
Q: How to find matching subgraph?
G-Ray: How to?
[Figure: data graph with the four matching nodes 12, 4, 7, and 11 highlighted]
Goodness = Prox(12, 4) x Prox(4, 12) x Prox(7, 4) x Prox(4, 7) x Prox(11, 7) x Prox(7, 11) x Prox(12, 11) x Prox(11, 12)
Observation:
, etc.
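As a minimal sketch of the goodness score for the four matching nodes, assume the pairwise proximities are given in a dictionary keyed by ordered node pairs. The helper name `goodness` and the uniform proximity value 0.5 are made up for illustration.

```python
def goodness(matched_edges, prox):
    """Multiply Prox(u, v) x Prox(v, u) over every matched query edge."""
    score = 1.0
    for u, v in matched_edges:
        score *= prox[(u, v)] * prox[(v, u)]
    return score

# The four query edges mapped onto data nodes 12, 4, 7, 11.
matched_edges = [(12, 4), (7, 4), (11, 7), (12, 11)]

# Hypothetical proximity values; a real system would compute them on the data graph.
prox = {}
for u, v in matched_edges:
    prox[(u, v)] = 0.5
    prox[(v, u)] = 0.5

score = goodness(matched_edges, prox)
```

Because both directions of every edge enter the product, a match in which any pair of nodes is far apart in either direction is penalized heavily.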
18
Effectiveness: star-query
[Figure: star-shaped query and its result; node labels: Databases, Intelligent Agent, Bio-medical]
19
Effectiveness: line-query
[Figure: line-shaped query and its result; node labels: Theory, Databases, Learning, Bio-medical]
20
Q1.3: Interactive Querying
[Figure: querying loop that repeatedly incorporates user feedback]
21
Q1.3 ProSIN for Interactive Querying
[Tong+ ICDM 08]
Query: what are the most related conferences wrt KDD? (DBLP author-conference bipartite graph)
• Initial Results: 'ICDM', 'ICML', 'SDM', 'VLDB', 'ICDE', 'SIGMOD', 'NIPS', 'PKDD', 'IJCAI', 'PAKDD'
• No to 'ICML': 'ICDM', 'SDM', 'PKDD', 'ICDE', 'VLDB', 'SIGMOD', 'PAKDD', 'CIKM', 'SIGIR', 'WWW'
• Yes to 'SIGIR': 'SIGIR', 'TREC', 'CIKM', 'ECIR', 'CLEF', 'ICDM', 'JCDL', 'VLDB', 'ACL', 'ICDE'
Notes:
• Two main sub-communities in KDD: DBs (green) vs. Stat (red)
• Negative feedback on ICML will exclude other stats confs (NIPS, IJCAI)
• Positive feedback on SIGIR will bring more IR (brown) conferences
22
Q2.1 Link Prediction: direction [Tong+ KDD 07 a]
• Q: Given the existence of the link between i and j, what is the direction of the link?
• A: (DAP) Compare Prox(i→j) and Prox(j→i)
[Figure: density of Prox(i→j) - Prox(j→i) on a Web Link graph (4,000 nodes, 10,000 edges); annotation: >70%]
Q2.2 pTrack/cTrack: Challenge
[Tong+ SDM 08]
• Observations (CePS, G-Ray, ProSIN…)
– All for static graphs
– Proximity: main tool
• Graphs are evolving over time!
– New nodes/edges show up;
– Existing nodes/edges die out;
– Edge weights change…
Q: How to make everything incremental?
A: Track Proximity!
27
pTrack/cTrack: Trend analysis on graph level
[Figure: rank of influence vs. year for T. Sejnowski, C. Koch, G. Hinton, M. Jordan]
28
pTrack: Problem Definitions
• [Given]
– (1) a large, skewed time-evolving bipartite graph,
– (2) the query nodes of interest
• [Track]
– (1) top-k most related nodes for each query node
at each time step t;
– (2) the proximity score (or rank of proximity)
between any two query nodes at each time step t
29
pTrack: Philip S. Yu’s Top-5 conferences up to each year
• 1992: ICDE, ICDCS, SIGMETRICS, PDIS, VLDB
• 1997: CIKM, ICDCS, ICDE, SIGMETRICS, ICMCS
• 2002: KDD, SIGMOD, ICDM, CIKM, ICDCS
• 2007: ICDM, KDD, ICDE, SDM, VLDB
Themes: Databases, Performance, Distributed Sys. (earlier years); Databases, Data Mining (later years)
DBLP: (Au. x Conf.)
- 400k authors
- 3.5k conferences
- 20 years
30
KDD’s Rank wrt. VLDB over years
[Figure: proximity rank vs. year; lower rank means closer]
Data Mining and Databases are getting closer & closer
31
cTrack: 10 most influential authors in NIPS community up to each year
[Figure: ranked author lists per year, including T. Sejnowski and M. Jordan]
Author-paper bipartite graph from NIPS 1987-1999.
1740 papers, 2037 authors, spread over 13 years
32
Proximity is the main tool
• Q.1: CePS, G-Ray, ProSIN
• Q.2: DAP, pTrack/cTrack
[Figure: example graph with nodes A, B, D, E, F, G, H, I, J and unit edge weights]
a.k.a Relevance, Closeness, ‘Similarity’…
Q: What is a `good’ Score?
34
Random walk with restart [Pan+ KDD 2004]
[Figure: 12-node example graph with query Node 4; each node is labeled with its score. More red, more relevant]
Ranking vector r4:
Node 1: 0.13
Node 2: 0.10
Node 3: 0.13
Node 4: 0.22
Node 5: 0.13
Node 6: 0.05
Node 7: 0.05
Node 8: 0.08
Node 9: 0.04
Node 10: 0.03
Node 11: 0.04
Node 12: 0.02
Nearby nodes, higher scores
Why RWR is a good score?
$Q(i, j) \propto r_{i,j}$
$Q = (I - cW)^{-1} = I + cW + c^2 W^2 + c^3 W^3 + \dots$
W: adjacency matrix. c: damping factor
• $cW$: all paths from i to j with length 1
• $c^2 W^2$: all paths from i to j with length 2
• $c^3 W^3$: all paths from i to j with length 3
RWR summarizes all the weighted paths from i to j
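The path-summing view can be checked numerically with a small sketch: adding up the damped powers of W approaches the inverse of (I - cW). The 4-node graph, the column normalization, and the function name `rwr_by_path_sum` are assumptions made for illustration only.

```python
import numpy as np

def rwr_by_path_sum(W, c, max_len):
    """Truncated series I + cW + (cW)^2 + ... + (cW)^max_len."""
    n = W.shape[0]
    Q = np.eye(n)
    term = np.eye(n)
    for _ in range(max_len):
        term = c * (term @ W)  # contribution of paths one step longer
        Q += term
    return Q

# Hypothetical undirected 4-node graph, column-normalized so each column sums to 1.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=0, keepdims=True)
c = 0.5

Q_series = rwr_by_path_sum(W, c, max_len=60)
Q_exact = np.linalg.inv(np.eye(4) - c * W)
```

Each extra term is damped by another factor of c, so long paths contribute less and the series converges quickly for c well below 1.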
Computing RWR
• OnTheFly: $r_i[t+1] = cW r_i[t] + (1 - c) e_i$
– No Pre-Computation;
– Light Storage Cost (W)
– Slow On-Line Response: O(mE)
• Pre-Compute: $Q = (I - cW)^{-1}$
– Fast On-Line Response
– Prohibitive Pre-Compute Cost: $O(n^3)$
– Prohibitive Storage Cost: $O(n^2)$
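The two extremes can be compared in a short sketch. It assumes a column-normalized W and keeps the constant (1 - c) in the pre-computed matrix so that its i-th column equals the iterated vector; the function names and the toy graph are hypothetical.

```python
import numpy as np

def rwr_on_the_fly(W, i, c, tol=1e-12, max_iter=10_000):
    """Iterate r <- c W r + (1 - c) e_i until the update is below tol."""
    n = W.shape[0]
    e = np.zeros(n)
    e[i] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def rwr_precompute(W, c):
    """Dense inverse: O(n^3) time and O(n^2) storage, then column lookups."""
    n = W.shape[0]
    return (1 - c) * np.linalg.inv(np.eye(n) - c * W)

# Hypothetical undirected 4-node graph, column-normalized.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=0, keepdims=True)

r_iter = rwr_on_the_fly(W, i=3, c=0.9)
r_pre = rwr_precompute(W, c=0.9)[:, 3]
```

The iterative version touches every edge once per iteration, which is where the O(mE) on-line cost comes from; the pre-computed version answers a query with a single column lookup.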
37
Q: How to Balance?
[Figure: trade-off between on-line and off-line cost]
Goal: Efficiently Get (elements) of $Q = (I - cW)^{-1}$
38
B_Lin: Basic Idea
[Tong+ ICDM 2006]
[Figure: the 12-node example graph in three panels labeled "Find Community", "Fix the remaining", and "Combine"]
39
B_Lin: details
$\tilde{W} = \tilde{W}_1 + \tilde{W}_2$
$\tilde{W}_1$: within community
$\tilde{W}_2$: cross community
40
B_Lin: details
$(I - c\tilde{W})^{-1} \approx (I - c\tilde{W}_1 - cUSV)^{-1}$
• $I - c\tilde{W}_1$: easy to be inverted
• $cUSV$: LRA difference
Sherman–Morrison Lemma!
If
Then
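For reference, the standard matrix-inversion identity (Sherman–Morrison–Woodbury) and its application to the decomposition above can be written as follows. The shorthand $Q_1$ is introduced here only for illustration, and S is assumed to be invertible.

```latex
\begin{align*}
\text{If } X &= A + U C V, \\
\text{then } X^{-1} &= A^{-1} - A^{-1} U \left(C^{-1} + V A^{-1} U\right)^{-1} V A^{-1}. \\[6pt]
\text{With } Q_1 &= I - c\tilde{W}_1 \text{ and } C = -cS: \\
(Q_1 - cUSV)^{-1} &= Q_1^{-1} + c\, Q_1^{-1} U \left(S^{-1} - c\, V Q_1^{-1} U\right)^{-1} V Q_1^{-1}.
\end{align*}
```

Only $Q_1$ (block-diagonal, one block per community) and a small matrix whose size is the rank of the low-rank approximation need to be inverted.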
41
B_Lin: summary
• Pre-Compute Stage
• Q: Efficiently compute and store Q
• A: A few small, instead of ONE BIG, matrices inversions
• On-Line Stage
• Q: Efficiently recover one column of Q
• A: A few, instead of MANY, matrix-vector multiplications
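The two stages can be sketched as follows, under stated assumptions: the communities are given rather than found by a partitioning step, the cross-community part is approximated by a truncated SVD, and the names `blin_precompute` and `blin_query` are hypothetical. It is a toy illustration of the idea, not the authors' implementation.

```python
import numpy as np

def blin_precompute(W, communities, c, rank):
    """Off-line: invert each community block and the small rank x rank core."""
    n = W.shape[0]
    W1 = np.zeros_like(W)
    Q1_inv = np.zeros_like(W)
    for nodes in communities:
        idx = np.ix_(nodes, nodes)
        W1[idx] = W[idx]  # within-community part
        Q1_inv[idx] = np.linalg.inv(np.eye(len(nodes)) - c * W[idx])
    # Low-rank approximation of the cross-community part.
    U, s, Vt = np.linalg.svd(W - W1)
    U, S, V = U[:, :rank], np.diag(s[:rank]), Vt[:rank, :]
    core = np.linalg.inv(np.linalg.inv(S) - c * V @ Q1_inv @ U)
    return Q1_inv, U, core, V

def blin_query(i, pre, c):
    """On-line: a few matrix-vector products recover the ranking vector of node i."""
    Q1_inv, U, core, V = pre
    e = np.zeros(Q1_inv.shape[0])
    e[i] = 1.0
    base = Q1_inv @ e
    return (1 - c) * (base + c * Q1_inv @ (U @ (core @ (V @ base))))

# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
W = A / A.sum(axis=0, keepdims=True)
c = 0.9

pre = blin_precompute(W, communities=[[0, 1, 2], [3, 4, 5]], c=c, rank=2)
r_fast = blin_query(0, pre, c)
r_exact = (1 - c) * np.linalg.inv(np.eye(6) - c * W)[:, 0]
```

In this toy graph the cross-community part has rank exactly 2, so the result matches the exact ranking vector; with a lower rank on a real graph the answer is only approximate.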
42
Query Time vs. Pre-Compute Time
[Figure: log query time vs. log pre-compute time]
Our Results
• Quality: 90%+
• On-line: up to 150x speedup
• Pre-computation: two orders of magnitude saving
43
More on Scalability Issues for Querying
(the spectrum of ``FastProx’’)
• B_Lin: one large linear system
– [Tong+ ICDM06, KAIS08]
• BB_Lin: the intrinsic complexity is small
– [Tong+ KAIS08]
• FastUpdate: time-evolving linear system
– [Tong+ SDM08, SAM08]
• FastAllDAP: multiple linear systems
– [Tong+ KDD07 a]
• Fast-ProSIN: dealing w/ on-line feedback
– [Tong+ ICDM 2008]
44
Roadmap
• Introduction
• Completed Work
–Querying
–Mining
  • M1: Spotting Anomalies
  • M2: Mining Time
• Proposed Work
45
Motivation [Tong+ KDD 08 b]
• Q: How to find patterns?
– e.g., communities, anomalies, etc.
• A: Low-Rank Approximation (LRA) for Adjacency Matrix of the Graph.
$A \approx L \times M \times R$
47
LRA for Graph Mining: Example
[Figure: author-conference adjacency matrix $A \approx L \times M \times R$. Authors: John, Tom, Bob, Carl, Van, Roy. Conferences: ICDM, KDD, ISMB, RECOMB. L: author clusters; M: interaction; R: conference clusters]
Recon. error is high → ‘Carl’ is abnormal
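A minimal sketch of spotting an anomaly by reconstruction error, using a truncated SVD as a stand-in for the low-rank factors (L, M, R). The adjacency matrix below is hypothetical: it assumes Carl publishes in both KDD and ISMB, while the other authors stay within one conference cluster.

```python
import numpy as np

authors = ["John", "Tom", "Bob", "Carl", "Van", "Roy"]
confs = ["ICDM", "KDD", "ISMB", "RECOMB"]

# Hypothetical author-by-conference adjacency matrix.
A = np.array([[1, 1, 0, 0],   # John
              [1, 1, 0, 0],   # Tom
              [1, 1, 0, 0],   # Bob
              [0, 1, 1, 0],   # Carl bridges the two clusters
              [0, 0, 1, 1],   # Van
              [0, 0, 1, 1]],  # Roy
             dtype=float)

def low_rank_factors(A, rank):
    """Truncated SVD, returned in the A ~ L x M x R form."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :rank], np.diag(s[:rank]), Vt[:rank, :]

L, M, R = low_rank_factors(A, rank=2)
errors = np.linalg.norm(A - L @ M @ R, axis=1)  # per-author reconstruction error
most_abnormal = authors[int(np.argmax(errors))]
```

Two factors are enough to describe the two author/conference clusters, so the row that fits neither cluster well is left with the largest residual.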
Challenges: How to get (L, M, R)?
• Efficiently
• both time and space
• Intuitively
• easy for interpretation
• Dynamically
• track patterns over time
None of Existing Methods Fully Meets
Our Wish List!
49
Why Not SVD and CUR/CX?
• SVD: Optimal in L2 and LF
– Efficiency
• Time: $O(\min(n^2 m, n m^2))$
• Space: (L, R) are dense
– Interpretation
• Linear Combination of many columns
– Dynamic: Not Easy
• CUR: Example-based
– Efficiency
• Better than SVD
• Redundancy in L
– Interpretation
• Actual Columns from A
– Dynamic: Not Easy
50
Solutions: Colibri [Tong+ KDD 08 b]
• Colibri-S: for static graph
– Basic idea: remove linear redundancy
– Same accuracy as CUR/CX
– Significant savings in both time & space
• Up to 53x speed-up
• Colibri-D: for dynamic graph
– Basic idea: leverage smoothness between time steps
– Same accuracy as CUR/CMD
• Up to 112x speed-up
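The "remove linear redundancy" idea for the static case can be sketched as below. It assumes the sampled column indices are given (possibly with repeats) and keeps a column only if it is not already a linear combination of the kept ones; the function name `colibri_s_sketch`, the tolerance, and the choice M = (LᵀL)⁻¹, R = LᵀA are illustrative assumptions rather than a statement of the published algorithm.

```python
import numpy as np

def colibri_s_sketch(A, sampled_cols, tol=1e-10):
    """Keep only linearly independent sampled columns, then build L, M, R."""
    kept = []
    for j in sampled_cols:
        col = A[:, j]
        if kept:
            basis = A[:, kept]
            coef, *_ = np.linalg.lstsq(basis, col, rcond=None)
            residual = col - basis @ coef
        else:
            residual = col
        if np.linalg.norm(residual) > tol:  # column adds new information
            kept.append(j)
    L = A[:, kept]
    M = np.linalg.inv(L.T @ L)
    R = L.T @ A
    return L, M, R, kept

# Hypothetical matrix: column 2 equals column 0 + column 1.
A = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 2, 1],
              [0, 0, 0, 1]], dtype=float)
sampled = [0, 0, 1, 2]  # sampling with a repeat and a redundant column

L, M, R, kept = colibri_s_sketch(A, sampled)
approx = L @ M @ R

# Projection onto the span of all sampled columns, redundancy included.
C = A[:, sampled]
approx_all = C @ np.linalg.pinv(C) @ A
```

Only two of the four sampled columns are stored, yet the approximation equals the one built from all sampled columns, which is the sense in which accuracy is preserved while time and space are saved.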
51
A Pictorial Comparison
(for static graphs)
[Figure: four panels (SVD, CUR, CMD, Colibri-S); the SVD panel shows the 1st and 2nd singular vectors]
52
Comparison: SVD, CUR vs. Colibri
Methods: SVD [Golub+ 1989], CUR/CX [Drineas+ 2005], Colibri [Tong+ 2008]
Wish List: Efficiency, Interpretation, Dynamics
[Table: each method is marked against each wish-list item]
53
Performance of Colibri-S
[Figure: time and space of CUR, CMD, and ours]
• Accuracy
• Same 91%+
• Time
• 12x speed-up over CMD
• 28x speed-up over CUR
• Space
• ~1/3 of CMD
• ~10% of CUR
Data set: Network traffic
- 21,837 sources/destinations, 158,805 edges
54
Performance of Colibri-D
[Figure: time vs. # of changed cols for CMD, Colibri-S, and Colibri-D]
Network traffic
- 21,837 nodes
- 1,220 hours
- 22,800 edge/hr
Colibri-D achieves up to 112x speedups
55
M2: How to mine time in some
complex context?
[Tong+ CIKM 08]
57
A Motivating Example: Inputs
Time: Oct. 26, Oct. 27, Oct. 28, Oct. 29, Oct. 30, Oct. 31
Event (e.g., Session): Entity
• Link Analysis: Tom, Bob
• Clustering: Bob, Alan
• Classification: Bob, Alan
• Anomaly Detection: Alan, Beck
• Party: Beck, Dan
• Web Search: Dan, Jack
• Advertising: Jack, Peter
• Enterprise Search: Jack, Peter
• Q&A: Peter, Smith
58
A Motivating Example: Outputs
• Time Cluster: Oct. 26, Oct. 27. Rep. Entities: ``Tom’’, ``Bob’’, ``Alan’’
• Abnormal Time: Oct. 28. Rep. Entities: ``Beck’’, ``Dan’’
• Time Cluster: Oct. 29, Oct. 30. Rep. Entities: ``Jack’’, ``Peter’’, ``Smith’’
Problem Definitions
(How to mine time in such complex context)
• Given data sets collected at different time stamps;
• We want to find
+1: Time Clusters
+2: Abnormal Time stamps
+3: Interpretations
+4: Right time granularity
Our Solutions: T3, MT3
60
Data Sets
• CIKM: from CIKM proceedings
• Time: Publication year (1993-2007, 15)
• Event: Paper-published (952)
• Entities: Author (1895) & Session (279)
• Attribute: Keyword (158)
• DeviceScan: from MIT Reality Mining
• Time: the day scanning happened (1/1/2004 - 5/5/2005, 294)
• Event: Bluetooth device scanning person (114,046)
• Entities: Device (103) & Person (97)
• Attribute: NA
61
T3 on `CIKM’ Data Set
Rep. Authors: James P. Callan, W. Bruce Croft, James Allan, Philip S. Yu, George Karypis, Charles Clarke
Rep. Keywords: Web, Cluster, Classification, XML, Language, Stream

Rep. Authors: Elke Rundensteiner, Daniel Miranker, Andreas Henrich, Il-Yeol Song, Scott B Huffman, Robert J. Hall
Rep. Keywords: Knowledge, System, Unstructured, Rule, Object-oriented, Deductive
62
MT3 on `DeviceScan’ Data Set
Work day
Semester Break & Holiday
Apr. 2004 is anomaly
Aggregate by Day
Aggregate by Month
63
Roadmap
• Introduction
• Completed Work
–Querying
–Mining
• Proposed Work
–P1: Community detection
–P2: Mining Space
–P3: Diffusion Wavelets
64
P1
Detecting Communities
• Observations: two seemingly opposite efforts
in community detection
– E1: parameter-free (no user intervention)
– E2: cluster w/ constraints (listen to users)
• Challenge: How to fill the gap?
• Idea: MDL-based method, encoding the
constraints in descriptions.
66
P2
Mining Space
• Given the data sets collected at different
locations
• We want to
– Find similar locations
– Spot Abnormal locations
– Provide Interpretations
• Idea: extend T3/MT3 to 2-d case
67
P3
Diffusion Wavelets
• Observation #1: Graph Laplacian is the basis for many querying and mining techniques
• Observation #2: Diffusion wavelets focus on
local spectrum in multi-scales
• Conjecture: Diffusion wavelets (might)
provide an alternative/better way for
– Querying
– Mining
68
Time Line
• Dec. ‘08: Thesis Proposal
• Jan. – Feb. ‘09 (P1):
– Research on Community Detection
• Mar. – Apr. ‘09 (P2):
– Research on Mining Space
• May – Jul. ‘09 (P3):
– Research on Diffusion Wavelets
• Aug. ‘09: Thesis Write-up
• Sep. ‘09: Defense
69
Selected References
• H. Tong & C. Faloutsos. (2006) Center-piece subgraphs: problem definition and fast solutions. In KDD, 404-413, 2006.
• H. Tong, C. Faloutsos, & J.Y. Pan. (2006) Fast Random Walk with Restart and Its Applications. In ICDM, 613-622, 2006. (b.p. award)
• H. Tong, Y. Koren, & C. Faloutsos. (2007) Fast direction-aware proximity for graph mining. In KDD, 747-756, 2007.
• H. Tong, B. Gallagher, C. Faloutsos, & T. Eliassi-Rad. (2007) Fast best-effort pattern matching in large attributed graphs. In KDD, 737-746, 2007.
• H. Tong, S. Papadimitriou, P.S. Yu, & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. In SDM, 2008. (b.p. award)
• H. Tong, S. Papadimitriou, J. Sun, P.S. Yu, & C. Faloutsos. (2008) Fast Mining of Static and Dynamic Graphs. In KDD, 2008.
• H. Tong, Y. Sakurai, T. Eliassi-Rad, & C. Faloutsos. (2008) Fast Mining of Complex Time-Stamped Events. In CIKM, 2008.
• H. Tong, H. Qu, & H. Jamjoom. (2008) Measuring Proximity on Graphs with Side Information. In ICDM, 2008.
70
My other work during Ph.D study
• GhostEdge (w/ Brian, Christos and Tina, in KDD 08)
– Classification in Sparsely Labeled Network
• GMine (w/ Junio, Agma, Christos and Jure, in VLDB 06)
– Interactive Graph Visualization and Mining
• Graphite (w/ Polo, Christos, Jason, Brian and Tina, in ICDM 08)
– Visual Query System for Attributed Graphs
• TANGENT (w/ Kensuke and Christos)
– ``surprise-me’’ recommendation
• PaCK (w/ Jingrui, Spiros, Tina, Jaime and Christos)
– Community detection for heterogeneous graphs
71
Acknowledgements
(the old way)
• Christos Faloutsos, Jia-Yu Pan, Yehuda Koren,
Spiros Papadimitriou, Philip S. Yu, Jimeng Sun,
Huiming Qu, Hani Jamjoom, Tina Eliassi-Rad,
Brian Gallagher, Yasushi Sakurai,
• Kensuke Oonuma, Duen Horng (Polo) Chau,
Jason I. Hong, Jingrui He, Jaime Carbonell,
José Fernando Rodrigues Jr., Jure Leskovec
Agma J. M. Traina,
• Charalampos (Babis) Tsourakakis, Meng Su
72
A Graph Miner’s Way: My Collaboration Graph
(During Ph.D Study)
Legends: Green: Querying; Blue: Mining; Purple: Others; separate markers for Completed and Proposed
[Figure: collaboration graph; nodes shown: Q1, Q2, Q3, M1, M2, M3, P1, P2, P3, CePS, ProSIN, G-Ray, DAP, FastDAP, pTrack, cTrack, Fast-ProSIN, BLin, BBLin, Colibri, T3/MT3, GhostEdge, Graphite, GMine, PaCK, TANGENT]
Q&A
Thank you!
74