How Powerful are
Graph Neural Networks
Jure Leskovec
Tasks on Networks
Classical ML tasks in networks:
§ Node classification: predict the type of a given node
§ Link prediction: predict whether two nodes are linked
§ Community detection: identify densely linked clusters of nodes
§ Network similarity: measure how similar two (sub)networks are
ML on Graphs
[Figure: a graph with unlabeled nodes ("?") fed into a machine-learning model]
Example: Node Classification
Many possible ways to create node features:
§ Node degree, PageRank score, motifs, degree of neighbors, clustering coefficient, …
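A minimal sketch of building such hand-crafted features (assuming NetworkX is available; the feature set mirrors the bullet above and the toy graph is illustrative):

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()     # toy social network, 34 nodes

pagerank = nx.pagerank(G)      # dict: node -> PageRank score
clustering = nx.clustering(G)  # dict: node -> clustering coefficient

# One row per node: degree, PageRank, mean neighbor degree, clustering coefficient
features = np.array([
    [G.degree(v),
     pagerank[v],
     np.mean([G.degree(u) for u in G.neighbors(v)]),
     clustering[v]]
    for v in G.nodes()
])
print(features.shape)  # (34, 4)
```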
Deep Learning in Graphs
[Figure: a network mapped through a deep encoder to node embeddings z]
Input: network
Predictions: node labels, new links, generated graphs and subgraphs
Why is it Hard?
§ Modern deep learning toolbox is
designed for simple sequences or grids
§ CNNs for fixed-size images/grids
§ RNNs or word2vec for text/sequences
Why is it Hard?
But networks are far more complex!
§ Arbitrary size and complex topological
structure (i.e., no spatial locality like grids)
[Figure: text and images (sequences/grids) vs. networks]
§ No fixed node ordering or reference point
§ Often dynamic and have multimodal features
Setup
We have a graph $G$:
§ $V$ is the vertex set
§ $A$ is the (binary) adjacency matrix
§ $X \in \mathbb{R}^{d \times |V|}$ is a matrix of node features
§ Meaningful node features:
§ Social networks: User profile
§ Biological networks: Gene expression profiles,
gene functional information
Graph Neural Networks
Idea: Node’s neighborhood defines a
computation graph
[Figure: (1) determine node computation graph; (2) propagate and transform information]
Learn how to propagate information across
the graph to compute node features
The Graph Neural Network Model. F. Scarselli et al. IEEE Transactions on Neural Networks, 2009.
Semi-Supervised Classification with Graph Convolutional Networks. T. N. Kipf, M. Welling. ICLR 2017.
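A minimal numpy sketch of one such propagation step in the Kipf & Welling (GCN) form cited above; the weight matrix `W` and the toy graph are illustrative stand-ins:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # D^-1/2 as a vector
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # ReLU

# Tiny example: 3-node path graph, 2-d features, 4-d output
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.randn(3, 2)
W = np.random.randn(2, 4)
print(gcn_layer(A, H, W).shape)  # (3, 4)
```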
Graph Neural Networks
[Figure: target node A's computation graph, unrolled from the input graph, with a small neural network on every edge]
Each node defines a computation graph
§ Each edge in this graph is a transformation/aggregation function
Intuition: Nodes aggregate information from their neighbors using neural networks
Inductive Representation Learning on Large Graphs. W. Hamilton, R. Ying, J. Leskovec. NIPS 2017.
Idea: Aggregate Neighbors
Intuition: Network neighborhood
defines a computation graph
Every node defines a computation
graph based on its neighborhood!
Can be viewed as learning a generic linear combination
of graph low-pass and high-pass operators
[NIPS ‘17]
Our Approach: GraphSAGE
General form: $h_v = \sigma\big( \bigoplus_{u \in N(v)} W\, h_u \big)$, where $\sigma$ is the combine function and $\bigoplus$ is the aggregate function.
[Figure: node $v$ aggregating from neighbor features $x_A$, $x_C$]
Update for node $v$:
$h_v^{(l+1)} = \sigma\Big( W_l \cdot \big[\, h_v^{(l)},\ \bigoplus_{u \in N(v)} \sigma\big( Q_l\, h_u^{(l)} \big) \,\big] \Big)$, where $[\cdot,\cdot]$ denotes concatenation
§ $h_v^{(l)}$ is the level-$l$ embedding of node $v$
§ $W_l\, h_v^{(l)}$ transforms $v$'s own embedding from level $l$
§ $\bigoplus_{u \in N(v)} \sigma(Q_l\, h_u^{(l)})$ transforms and aggregates the embeddings of $v$'s neighbors $u$
§ Base case: $h_v^{(0)} =$ attributes $x_v$ of node $v$
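A minimal numpy sketch of this update with mean aggregation as the choice of $\bigoplus$; a sketch under those assumptions, not the released GraphSAGE implementation (`W`, `Q`, and the toy graph are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def graphsage_layer(A, H, W, Q):
    """One GraphSAGE update with mean aggregation.

    A: (n, n) binary adjacency; H: (n, d) level-l embeddings;
    W: (2*d, d_out) weights on [self, aggregated]; Q: (d, d) neighbor transform.
    """
    deg = A.sum(axis=1, keepdims=True).clip(min=1)
    neigh = relu(H @ Q.T)             # transform each neighbor embedding
    agg = (A @ neigh) / deg           # mean over N(v)
    concat = np.concatenate([H, agg], axis=1)
    return relu(concat @ W)           # h^(l+1)

n, d, d_out = 5, 8, 16
A = (np.random.rand(n, n) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T        # symmetric, no self-loops
H = np.random.randn(n, d)             # h^(0) = node attributes x_v
out = graphsage_layer(A, H, np.random.randn(2 * d, d_out), np.random.randn(d, d))
print(out.shape)  # (5, 16)
```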
[NIPS ‘17]
GraphSAGE: Training
§ Aggregation parameters are shared for all nodes
§ The number of model parameters is sublinear in |V|
§ Can use different loss functions:
§ Classification/regression: $\mathcal{L}(h_v) = \lVert y_v - f(h_v) \rVert^2$
§ Pairwise loss: $\mathcal{L}(h_u, h_v) = \max\big(0,\ 1 - \mathrm{dist}(h_u, h_v)\big)$
[Figure: the computation graphs for nodes A and B share the same aggregation parameters $W^{(k)}, Q^{(k)}$ at every level]
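A tiny sketch of these two losses (Euclidean distance is one plausible choice for `dist`; `f` stands for a downstream predictor):

```python
import numpy as np

def regression_loss(h_v, y_v, f):
    """Classification/regression: || y_v - f(h_v) ||^2."""
    return np.sum((y_v - f(h_v)) ** 2)

def pairwise_loss(h_u, h_v, margin=1.0):
    """max(0, margin - dist(h_u, h_v)): push a negative pair's embeddings apart."""
    return max(0.0, margin - np.linalg.norm(h_u - h_v))
```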
[NeurIPS ‘18]
Pooling for GNNs
Don’t just embed individual nodes. Embed the
entire graph.
Problem: Learn how to hierarchically pool the nodes to embed the entire graph
Our solution: DIFFPOOL
§ Learns a hierarchical pooling strategy
§ Sets of nodes are pooled hierarchically
§ Soft assignment of nodes to next-level nodes
Hierarchical Graph Representation Learning with Differentiable Pooling. R. Ying, et al. NeurIPS 2018.
DIFFPOOL Architecture
A different GNN is learned at every level of abstraction
Our approach: Use two sets of GNNs
§ GNN1 to learn how to pool the network
§ Learn cluster assignment matrix
§ GNN2 to learn the node embeddings
Next: Many successful extensions and domain adaptations
Hierarchical Graph Representation Learning with Differentiable Pooling. R. Ying, et al. NeurIPS 2018.
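A minimal sketch of one DIFFPOOL level following the paper's coarsening equations $X' = S^\top Z$ and $A' = S^\top A S$; the two GNNs are reduced to single hypothetical linear propagations for brevity:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def diffpool_level(A, X, W_embed, W_pool):
    """One DIFFPOOL coarsening step.

    A stand-in for GNN2 produces embeddings Z; a stand-in for GNN1
    produces the soft cluster assignment S; then X' = S^T Z and
    A' = S^T A S define the coarsened graph.
    """
    Z = np.tanh(A @ X @ W_embed)         # stand-in for GNN2: (n, d_out)
    S = softmax(A @ X @ W_pool, axis=1)  # stand-in for GNN1: (n, n_clusters)
    X_coarse = S.T @ Z                   # (n_clusters, d_out)
    A_coarse = S.T @ A @ S               # (n_clusters, n_clusters)
    return A_coarse, X_coarse

n, d, d_out, n_clusters = 6, 4, 8, 3
A = np.ones((n, n)) - np.eye(n)          # toy dense graph
X = np.random.randn(n, d)
A2, X2 = diffpool_level(A, X, np.random.randn(d, d_out), np.random.randn(d, n_clusters))
print(A2.shape, X2.shape)  # (3, 3) (3, 8)
```

Because the assignment $S$ is soft (a softmax), the whole coarsening is differentiable and can be trained end to end.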
Application (1): Pinterest
Task: Recommend related pins
Learn node embeddings $z_i$ such that $d(z_{\mathrm{cake}_1}, z_{\mathrm{cake}_2}) < d(z_{\mathrm{cake}_1}, z_{\mathrm{sweater}})$
§ Challenges:
§ Massive size: 3 billion nodes, 20 billion edges
§ Heterogeneous data: rich image and text features
[Figure: Mean Reciprocal Rank of related-pin recommendations from a source pin for visual, annotation, RW, and GCN-based methods]
Graph Convolutional Neural Networks for Web-Scale Recommender Systems. R. Ying, et al. KDD 2018.
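A minimal sketch of a max-margin objective with the property above; an illustration, not the exact PinSage loss (names like `z_query` are hypothetical):

```python
import numpy as np

def max_margin_loss(z_query, z_pos, z_neg, margin=0.1):
    """Encourage sim(query, related) > sim(query, unrelated) + margin,
    with inner-product similarity between embeddings."""
    sim_pos = z_query @ z_pos
    sim_neg = z_query @ z_neg
    return max(0.0, sim_neg - sim_pos + margin)
```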
(2) Drug Side Effects
§ Task: Given a pair of drugs, predict adverse side effects
46% of people ages 70-79 take more than 5 drugs
[Figure: multimodal graph of drug and protein nodes]
§ Link prediction on a multimodal graph
§ 36% improvement in AP@50 over the state of the art
Modeling Polypharmacy Side Effects with Graph Convolutional Networks. M. Zitnik, et al. Bioinformatics, 2018.
[NeurIPS ‘18]
(3) Targeted Molecule Generation
Goal: Generate molecules that optimize a
given property (Quant. energy, solubility)
Solution: Combination of
§ Graph representation learning
§ Adversarial training
§ Reinforcement learning
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. J. You, et al. NeurIPS 2018.
[NeurIPS ‘18]
(4) Knowledge Graphs
Embedding Logical Queries on Knowledge Graphs.
W. Hamilton, et al. NeurIPS, 2018.
How expressive are GNNs?
Theoretical framework: Characterize GNN’s
discriminative power by analyzing the
discriminative power of the multiset function:
§ Characterize upper bound of the
discriminative power of GNNs
§ Propose a maximally powerful GNN
§ Characterize discriminative power of popular
GNNs that use non-injective multiset
functions
How Powerful are Graph Neural Networks?
K. Xu, et al. ICLR 2019.
Discriminative Power of GNNs
$\sigma\big( \bigoplus_{u \in N(v)} W\, h_u \big)$
§ $\sigma$ … combine
§ $\bigoplus$ … aggregate
Theorem: The most discriminative GNN uses an injective multiset function for neighbor aggregation
If the aggregation function is injective, the GNN can fully capture/distinguish the rooted subtree structures
[Figure: an injective aggregation maps distinct rooted subtrees to distinct embeddings]
Discriminative Power of GNNs
Theorem: GNNs can be at most as
powerful as the Weisfeiler-Lehman
graph isomorphism test (a.k.a. canonical
labeling or color refinement)
What GNN achieves this upper bound?
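For reference, a minimal sketch of the 1-dimensional WL (color refinement) test the theorem compares against, assuming NetworkX and a fixed round count for brevity:

```python
import networkx as nx

def wl_colors(G, rounds=3):
    """1-WL color refinement: iteratively relabel each node by its color
    together with the sorted multiset of its neighbors' colors."""
    colors = {v: 0 for v in G.nodes()}  # uniform initial coloring
    for _ in range(rounds):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in G.neighbors(v))))
            for v in G.nodes()
        }
        # Relabel: identical signatures -> identical new colors
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in G.nodes()}
    return sorted(colors.values())

# Two graphs can be isomorphic only if their color histograms match
G1 = nx.cycle_graph(6)
G2 = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))
print(wl_colors(G1) == wl_colors(G2))  # True: 1-WL cannot tell them apart
```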
Injective Functions over Multisets
Idea: If the GNN's functions are injective, the GNN can capture/distinguish the rooted subtree structures
Theorem: Any injective multiset function can be expressed as $\phi\big( \sum_{x \in S} f(x) \big)$
§ Solution: Model $\phi(\cdot)$ and $f(\cdot)$ with an MLP
Consequence: The sum aggregator is the most expressive
How Powerful are Graph Neural Networks? K. Xu, et al. ICLR 2019.
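A minimal numpy sketch of the resulting maximally powerful update, the GIN layer from the cited paper: sum aggregation followed by an MLP (the two-layer MLP and ε = 0 are illustrative choices):

```python
import numpy as np

def mlp(x, W1, W2):
    """Two-layer MLP playing the role of phi∘f after the sum."""
    return np.maximum(x @ W1, 0.0) @ W2

def gin_layer(A, H, W1, W2, eps=0.0):
    """GIN update: h_v' = MLP((1 + eps) * h_v + sum_{u in N(v)} h_u)."""
    return mlp((1.0 + eps) * H + A @ H, W1, W2)

n, d = 5, 8
A = (np.random.rand(n, n) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T  # symmetric toy graph, no self-loops
H = np.random.randn(n, d)
print(gin_layer(A, H, np.random.randn(d, 16), np.random.randn(16, d)).shape)  # (5, 8)
```

With sum aggregation and an MLP, this layer can in principle match the 1-WL upper bound from the previous slide.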
Expressive Power of GNNs
Many popular aggregators are not injective:
§ The mean aggregator, $\mathrm{MEAN}(\{ f(x) : x \in S \})$, only captures distribution information, so it cannot distinguish multisets with the same proportions of elements
§ The max aggregator, $\mathrm{MAX}(\{ f(x) : x \in S \})$, only captures set information (no multiplicity), so it cannot distinguish multisets with the same distinct elements
Theorem (informal): sum captures the full multiset, mean captures the distribution, max captures the underlying set
[Figure 2 of Xu et al.: ranking by expressive power for sum, mean, and max-pooling aggregators over a multiset; sum captures the full multiset, mean the proportion/distribution of elements of a given type, and max ignores multiplicities (reduces the multiset to a simple set)]
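A tiny numeric illustration of the theorem: two different multisets with the same distribution and the same underlying set, which only the sum aggregator separates (values are illustrative):

```python
import numpy as np

S1 = np.array([1.0, 2.0])             # multiset {1, 2}
S2 = np.array([1.0, 1.0, 2.0, 2.0])   # multiset {1, 1, 2, 2}: same distribution, same set

print(S1.mean() == S2.mean())  # True  -> mean cannot distinguish them
print(S1.max() == S2.max())    # True  -> max cannot distinguish them
print(S1.sum() == S2.sum())    # False -> sum does
```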
Power of Aggregators
Failure cases for mean and max aggregators (Figure 3 of Xu et al.): examples of simple graph structures that mean and max-pooling aggregators fail to distinguish:
§ (a) Mean and max both fail
§ (b) Max fails
§ (c) Mean and max both fail
Ranking by discriminative power: sum > mean > max
Note (from the paper): many existing GNNs use a 1-layer perceptron rather than an MLP; there are network neighborhoods (multisets) that models with 1-layer perceptrons can never distinguish
Summary
§ Graph Convolutional Networks
§ Generalize beyond simple convolutions
§ Fuses node features & graph info
§ State-of-the-art accuracy for node
classification and link prediction.
§ Model size independent of graph size;
can scale to billions of nodes
§ Largest embedding to date (3B nodes, 17B edges)
§ Leads to significant performance gains
Conclusion
Results from the past 2-3 years have shown:
§ Representation learning paradigm can be
extended to graphs
§ No feature engineering necessary
§ Can effectively combine node attribute data
with the network information
§ State-of-the-art results in a number of
domains/tasks
§ Use end-to-end training instead of
multi-stage approaches for better performance
Future Work
§ Understand other/new GNN variants
§ Understand optimization and
generalization of GNNs
§ Towards more powerful DL on graphs
§ GNNs can only distinguish rooted subtree structures; non-isomorphic graphs whose nodes have identical rooted subtrees remain indistinguishable
References
§ Tutorial on Representation Learning on Networks, WWW 2018: http://snap.stanford.edu/proj/embeddings-www/
§ Inductive Representation Learning on Large Graphs. W. Hamilton, R. Ying, J. Leskovec. NIPS 2017.
§ Representation Learning on Graphs: Methods and Applications. W. Hamilton, R. Ying, J. Leskovec. IEEE Data Engineering Bulletin, 2017.
§ Graph Convolutional Neural Networks for Web-Scale Recommender Systems. R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, J. Leskovec. KDD 2018.
§ Modeling Polypharmacy Side Effects with Graph Convolutional Networks. M. Zitnik, M. Agrawal, J. Leskovec. Bioinformatics, 2018.
§ Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. J. You, B. Liu, R. Ying, V. Pande, J. Leskovec. NeurIPS 2018.
§ Embedding Logical Queries on Knowledge Graphs. W. Hamilton, P. Bajaj, M. Zitnik, D. Jurafsky, J. Leskovec. NeurIPS 2018.
§ How Powerful are Graph Neural Networks? K. Xu, W. Hu, J. Leskovec, S. Jegelka. ICLR 2019.
§ Code:
§ http://snap.stanford.edu/graphsage
§ http://snap.stanford.edu/decagon/
§ https://github.com/bowenliu16/rl_graph_generation
§ https://github.com/williamleif/graphqembed
§ https://github.com/snap-stanford/GraphRNN
Industry Partnerships & Funding: [sponsor logos]
PhD Students: Alexandra Porter, Camilo Ruiz, Claire Donnat, Emma Pierson, Jiaxuan You, Mitchell Gordon, Mohit Tiwari, Rex Ying
Post-Doctoral Fellows: Baharan Mirzasoleiman, Marinka Zitnik, Michele Catasta, Robert Palovics, Srijan Kumar
Research Staff: Adrijan Bradaschia, Rok Sosic
Collaborators:
Dan Jurafsky, Linguistics, Stanford University
Cristian Danescu-Niculescu-Mizil, Information Science, Cornell University
Stephen Boyd, Electrical Engineering, Stanford University
David Gleich, Computer Science, Purdue University
VS Subrahmanian, Computer Science, University of Maryland
Sarah Kunz, Medicine, Harvard University
Russ Altman, Medicine, Stanford University
Jochen Profit, Medicine, Stanford University
Eric Horvitz, Microsoft Research
Jon Kleinberg, Computer Science, Cornell University
Sendhil Mullainathan, Economics, Harvard University
Scott Delp, Bioengineering, Stanford University
Jens Ludwig, Harris Public Policy, University of Chicago