A Brief Overview of Spectral Graph
Convolutional Network
Min Li
Computational Social Sciences and Humanities (CSSH),
Rheinisch-Westfälische Technische Hochschule Aachen,
Templergraben 55, 52062 Aachen, Germany
Abstract. The success of Deep Learning has inspired the development
of learning methods on non-Euclidean domains. Recently, many approaches
have emerged, especially the spectral methods based on spectral graph
theory. In this paper, we provide a brief overview of the spectral methods,
focusing on the fundamental theories and algorithms of three spectral
methods.
Keywords: graph neural network · graph convolutional network · spectral methods.
1 Introduction
In recent years, neural networks have boomed in both research and application. Modern technologies such as pattern recognition, data mining, natural language processing (NLP), and computer vision benefit massively from the development of neural networks. Among the vast variety of neural networks, the convolutional neural network (CNN) was the first network to raise the accuracy of image recognition tasks to a satisfactory level[17]. CNN can extract spatial features from a local range of the data and raise the level of abstraction progressively through its multi-layer structure[18]. Compared with earlier machine learning techniques such as support vector machines (SVM) and decision trees, CNN has significantly reduced the manual work in design and training and promotes the reusability of a trained network through transfer learning. For example, users can take a trained network to extract low-level features and stack several new layers on top of it for the high-level classification task of their interest[34].
The success of CNN is restricted to Euclidean data such as numbers, 2D images, text, and audio. However, non-Euclidean data are ubiquitous in modern application domains such as 3D geometry and, most of all, graphs. A graph, which consists of nodes and edges, can represent objects and their interactional relations in a mathematical form. A graph can carry sophisticated underlying features behind its structural representation, which is used in social networks[31], chemistry[25], biology[21], and physics control[26]. As a result, finding an analogous convolutional method on graphs has attracted increasing attention since the success of CNN. However, several challenges prevent the direct application of CNN to graphs:
Triangle Inequality does not hold. In Euclidean data the shortest path between two points is always the straight connection between them. In contrast, this property does not necessarily hold in a non-Euclidean space such as a manifold. A manifold is a space that is locally similar to a Euclidean space near each point. One example of a manifold is the surface of the earth, which was considered flat in early history. For instance, suppose a man in Barcelona wants to take a journey to Beijing. If he looks at a world map, he may think that Beijing is located precisely to the east of Barcelona because the two cities have almost the same latitude on the map. If he forgets that the earth is not flat and sets out due east along the latitude line, he will travel a longer distance than on an alternative path where he first heads north and crosses the large longitude angle at higher latitudes instead. Therefore, the shortest path between Barcelona and Beijing is not the direct connection on the world map. As a result, the metric of similarity between two points has to be defined differently in a non-Euclidean space[3].
Translation Invariance does not hold. Unlike images and text, where each point has a fixed number of neighbors and connections, non-Euclidean data such as graphs have an irregular structure at each node. Consequently, if the center of the kernel moves from one center node to another between two convolution steps, the number of involved nodes and the structure of the calculation change dramatically. Therefore, convolution with a kernel, which is the fundamental operation of CNN, cannot be directly applied to a graph. Besides, the degree of a node can vary from zero to millions in a large graph, which makes the choice of the weight variables of the kernel difficult.
This paper gives a brief overview of developments in the domain of the graph convolutional network (GCN) and the basic theoretical ideas behind the spectral approaches. In Section 2, we give a categorization of the different approaches as well as introduce the history of the GNN and related works. In the long Section 3, we introduce the theoretical foundation of GCN: the definition of the Laplacian matrix on a graph in Subsec. 3.2, the Fourier transformation and decomposition on a graph in Subsec. 3.3, and the convolution method on a graph in Subsec. 3.4. After the introduction of the theories, we show three highly correlated approaches among the spectral methods, namely the Non-parametric GCN in Subsec. 3.5, the ChebyNet in Subsec. 3.6, and the Simplified ChebyNet in Subsec. 3.7. In Section 4, we show a benchmark of the Simplified ChebyNet on different datasets. Section 5 concludes this paper and gives a discussion of future directions.
2 Categorization, History and Related Works
Categorization The graph convolutional network (GCN) is related to, and in some papers categorized as, a sub-domain of graph embedding[10][32]. Graph embedding aims to represent the graph nodes in a derived vector space that preserves the information on the nodes, the structure of the graph, and the edges between nodes[10].
Because of the boost in research on graph embedding and the graph neural network (GNN), different approaches from different aspects have appeared in recent years, which makes the categorization complicated and ambiguous. This paper adopts a categorization method that separates network embedding and GNN[32]. The separation is due to the rapid development and massive diversity among different GNNs. Fig. 1 illustrates the differences between graph embedding and GNN. Matrix factorization[1], random walk[22], and LINE[28] belong to network embedding alone, while DNGR[30] and SDNE[5] lie in a domain shared between network embedding and GNN. GNN contains four different sub-domains, and this paper focuses only on GCN.
Fig. 1. The categorization method separates graph embedding and GNN. The graph auto-encoder, which contains the DNGR and SDNE algorithms, belongs to both categories due to its neural-network-like embedding strategy. The GCN, which is the focus of this paper, belongs to GNN. Reposted from [32].
Fig. 2. Timeline showing a brief history of the graph neural network. In 2005 Marco Gori finished the first graph neural network (GNN). In 2013 Joan Bruna built the foundation of the spectral methods. In 2016 Defferrard created the ChebyNet, and later Kipf and Welling simplified the ChebyNet. Among the spatial methods, Yujia Li introduced the GGNN algorithm in 2016 and William L. Hamilton published GraphSage in 2018.
The different approaches of GCN are distinguished by the convolution theory they are rooted in. The approaches rooted in graph spectral theory are widely categorized as spectral methods, while the remaining approaches are spatial methods[35]. A spectral method usually uses the eigenvalues and the eigenvectors of a graph,
which requires the information of all nodes and edges and is hard to parallelize and scale to large graphs[32]. A spatial method, in general, performs the convolution on the nodes directly and aggregates the information, which allows good parallelizability and scalability to large graphs[32].
A Brief History A timeline, shown in Fig. 2, lists several vital developments in the domain of GNN. Marco Gori finished the early work on the spatial method in 2005, which became the first GNN in history[9]. A further breakthrough, inspired by the huge success of AlexNet in computer vision, took place in 2013 when Joan Bruna introduced spectral graph theory[6] into this domain and built the foundation of all spectral methods[4]. However, the complexity and efficiency of the algorithm in Bruna's work were still not satisfactory until Defferrard suggested using the Chebyshev polynomial as the kernel function and created the ChebyNet[7]. In 2017 Kipf and Welling simplified the ChebyNet and produced the most cited paper in the domain of graph convolution[16]. On the side of the spatial methods, Yujia Li introduced the gated recurrent unit (GRU) from the recurrent neural network (RNN) into GNN and established the gated graph sequence neural networks (GGNN)[19]. In 2018 William L. Hamilton published the simple but well-performing GraphSage algorithm based on the spatial method[11]. There are also many other important developments which are not covered in this paper.
Related Works Several surveys are related to this paper. Bronstein et al.[3] give a description of learning in the non-Euclidean domain. Wu et al.[32] present a new taxonomy, which is adopted in this paper, and contribute an extensive introduction as well as benchmarks in the domain of GNN. Zhou et al.[36] summarize the frameworks of the state-of-the-art developments[36].
3 The Spectral Method of GCN

3.1 Notations
In this paper, a graph is denoted as G = (V, E), where V = {v_1, ..., v_N} is the set of all N nodes of G and E ⊆ V × V is the set of all edges of G. The adjacency matrix of the graph is denoted as A. This paper only considers undirected graphs. Most notations are listed in Table 1.
3.2 The Laplacian Matrix
As discussed in Sec. 1, the traditional convolution of CNN cannot be directly applied to non-Euclidean data. The Laplacian matrix L used by Bruna et al.[4] builds the foundation of the convolution in the spectral methods[6]. This section gives a brief description of the Laplacian matrix on a graph.
Notation   Description               | Notation    Description
G          Graph                     | N           Number of nodes
V          Set of all nodes          | f, g        Signal functions on nodes
E          Set of all edges          | Λ           A diagonal matrix
A          Adjacency matrix          | Â           Aggregated matrix
D          Degree matrix             | D̃           Renormalized degree matrix
L          Laplacian matrix          | g_θ(X)      The kernel/filter
L^comb     Combinatorial Laplacian   | d_G(i, j)   Distance between nodes
L^norm     Normalized Laplacian      | ∗           Convolution operator
F          Fourier transform         | ◦           Schur product

Table 1. Notations used in this paper.
The idea of the Laplacian matrix comes from an analogy with the Laplace-Beltrami operator on Riemannian manifolds[14]. In general, Norman Biggs shows that the Laplacian matrix of an undirected graph is equal to the degree matrix minus the adjacency matrix[2]. This Laplacian matrix is also called the Combinatorial Laplacian matrix and is defined as:

L^{comb} = D - A    (1)

where L^{comb} ∈ R^{N×N} is the Laplacian matrix of graph G, A ∈ R^{N×N} is the adjacency matrix of graph G, and D is the degree matrix of graph G. Equation 1 is proved by [2].
Because this paper only deals with undirected graphs, we have A(i, j) = A(j, i) and therefore the adjacency matrix A is symmetric. The value 1 in the cell A(i, j) represents the existence of an edge between node i and node j, while 0 refers to no connection. The degree matrix D is a diagonal matrix with D = diag(d_1, d_2, ..., d_N), where d_i represents the degree of node i.

On the diagonal of the Combinatorial Laplacian matrix, the i-th value represents the degree of node i. In the other positions L(i, j) outside of the diagonal, the value −1 indicates adjacency between node i and node j, while 0 stands for no connection. The following equation summarizes the meaning of the Combinatorial Laplacian matrix:

L^{comb}(i, j) = \begin{cases} d_i & \text{if } i = j, \\ -1 & \text{if } i \text{ and } j \text{ are adjacent}, \\ 0 & \text{otherwise} \end{cases}

as proved by [6].
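As a small worked example (our own illustration, not taken from [2] or [6]), consider a path graph with three nodes 1-2-3 and two edges:

A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad
D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
L^{comb} = D - A = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}

The diagonal carries the node degrees and every edge contributes a −1 off the diagonal, exactly as described above.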
Chung[6] shows that the Combinatorial Laplacian matrix only works well for regular graphs and suggests the use of a normalized Laplacian matrix instead, which is shown in the following equation:
L^{norm} = D^{-\frac{1}{2}} L^{comb} D^{-\frac{1}{2}} = I - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}    (2)

where L^{norm} ∈ R^{N×N} is the normalized Laplacian matrix, A ∈ R^{N×N} is the adjacency matrix, D is the degree matrix, and I ∈ R^{N×N} is the identity matrix. Equation 2 is proved by [6].
Compared to the Combinatorial Laplacian matrix, the diagonal values of the normalized Laplacian matrix are all ones (for nodes with non-zero degree). The degrees of two adjacent nodes are relocated to the denominator of L^{norm}(i, j), which is shown in the following equation:

L^{norm}(i, j) = \begin{cases} 1 & \text{if } i = j \text{ and } d_i \neq 0, \\ -\frac{1}{\sqrt{d_i d_j}} & \text{if } i \text{ and } j \text{ are adjacent}, \\ 0 & \text{otherwise} \end{cases}

as proved by [6].
In the rest of this paper, we refer to the normalized Laplacian matrix simply as the Laplacian matrix for brevity. The Laplacian matrix has some essential properties in its eigenvalues and eigenvectors, which are discussed in Sec. 3.3.
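As an illustration of Equations 1 and 2, the following NumPy sketch (our own code with a hypothetical helper name, not an implementation from [4] or [6]) builds both Laplacian matrices from an adjacency matrix:

import numpy as np

def laplacians(A):
    """Return the combinatorial and normalized Laplacians of an undirected graph.
    A is an (N, N) symmetric 0/1 adjacency matrix."""
    deg = A.sum(axis=1)                              # node degrees d_i
    L_comb = np.diag(deg) - A                        # Eq. (1): L^comb = D - A
    # D^{-1/2}, treating isolated nodes (degree 0) as 0 to avoid division by zero
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    D_inv_sqrt = np.diag(d_inv_sqrt)
    L_norm = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt   # Eq. (2)
    return L_comb, L_norm

# Path graph 1-2-3 from the worked example above
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L_comb, L_norm = laplacians(A)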
3.3 Fourier Transformation and Fourier Decomposition on Graph
The spectral graph theory uses Fourier analysis to solve the graph convolution
problem in the spectral domain instead of in the graph domain. The transform
between the spectral and graph domains is based on the Fourier Transformation
on Graph, which is mathematically inspired by the traditional Fourier Transformation[27].
The classical Fourier Transformation can be considered as the inner product of a function f and a term e^{-iωt}:

F(\omega) = \langle f(t), e^{-i\omega t} \rangle = \int_{\mathbb{R}} f(t)\, e^{-i\omega t}\, dt    (3)

where ω is a parameter and t is the time. Some papers write ω = 2πξ instead, where ξ is the frequency[27].
A generalized equation to calculate eigenvectors or eigenfunctions can be defined as follows:

AV = \lambda V    (4)

where A is a certain transformation operator or transformation matrix, V is an eigenfunction or an eigenvector, and λ is a scalar which represents the eigenvalue. Equation 4 is derived from [23].

If the transformation operator is the Laplacian operator ∆, the equation turns out to be:

AV = \Delta e^{-i\omega t} = \nabla^2 e^{-i\omega t} = \frac{\partial^2}{\partial t^2} e^{-i\omega t} = -\omega^2 e^{-i\omega t} = \lambda V
Therefore, e^{-iωt} is an eigenfunction of the Laplacian operator, and its eigenvalue −ω² is determined by the parameter ω, according to the generalized eigenvalue equation.
In analogy, the generalized eigenvalue equation can be applied on a graph with respect to the Laplacian matrix:

AV = L u_l = \lambda_l u_l = \lambda V    (5)

where L ∈ R^{N×N} is the Laplacian matrix, u_l ∈ R^N is the l-th eigenvector of L, and λ_l ∈ R is the l-th eigenvalue of L.
The eigenvalues and eigenvectors of the Laplacian matrix have three important properties (derived from [24]):
Trivial Eigenvector A trivial eigenvector u_1 = (1, ..., 1) ∈ R^N exists for every Laplacian matrix, and its corresponding eigenvalue λ_1 is 0.
Non-negative Eigenvalues The eigenvalues of the Laplacian matrix are non-negative real numbers.
Orthogonal Eigenvectors The eigenvectors of the Laplacian matrix are orthogonal and real.
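These properties can be checked numerically; the short NumPy sketch below (our own check on the small path graph of Sec. 3.2, using its combinatorial Laplacian) verifies all three:

import numpy as np

A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A                  # combinatorial Laplacian of the path graph 1-2-3
evals, evecs = np.linalg.eigh(L)                # eigh handles real symmetric matrices

print(np.allclose(L @ np.ones(3), 0.0))         # trivial eigenvector (1, ..., 1) with eigenvalue 0
print(np.all(evals >= -1e-12))                  # eigenvalues are non-negative real numbers
print(np.allclose(evecs.T @ evecs, np.eye(3)))  # eigenvectors are orthonormal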
A graph is a generic data representation form, and the data attached to each node of the graph is called a graph signal[27]. Inspired by the traditional Fourier Transform, the Fourier Transform on a graph can also be written as the inner product between a signal function f and the eigenvectors u_l:
F(\lambda_l) = \langle f, u_l \rangle = \sum_{i=1}^{N} f(i)\, u_l^*(i)

In matrix form:

\hat{f} = \begin{pmatrix} \hat{f}(\lambda_1) \\ \vdots \\ \hat{f}(\lambda_N) \end{pmatrix}
        = \begin{pmatrix} u_1^*(1) & \cdots & u_1^*(N) \\ \vdots & \ddots & \vdots \\ u_N^*(1) & \cdots & u_N^*(N) \end{pmatrix}
          \begin{pmatrix} f(1) \\ \vdots \\ f(N) \end{pmatrix}
        = U^* f = U^T f    (6)

where l denotes the index of the l-th pair of eigenvector and eigenvalue, \hat{f} ∈ R^N is the Fourier-transformed signal in the spectral domain, f ∈ R^N is a signal function on the nodes in the graph domain, u_l^* is the conjugate transpose of the eigenvector u_l in the complex space, and U ∈ R^{N×N} is the matrix of eigenvectors of the Laplacian matrix, which is a real matrix. Equation 6 is derived from [27].
With a similar derivation, the classical Inverse Fourier Transform is:

F(t) = \langle \hat{f}(\omega), e^{i\omega t} \rangle = \int_{\mathbb{R}} \hat{f}(\omega)\, e^{i\omega t}\, d\omega    (7)

and the Inverse Fourier Transform on a graph is:

f = \begin{pmatrix} f(1) \\ \vdots \\ f(N) \end{pmatrix}
  = \begin{pmatrix} u_1(1) & \cdots & u_N(1) \\ \vdots & \ddots & \vdots \\ u_1(N) & \cdots & u_N(N) \end{pmatrix}
    \begin{pmatrix} \hat{f}(\lambda_1) \\ \vdots \\ \hat{f}(\lambda_N) \end{pmatrix}
  = U \hat{f}    (8)
where f ∈ R^N denotes the back-transformed signal in the graph domain, \hat{f} ∈ R^N denotes the signal in the spectral domain, and U ∈ R^{N×N} denotes the matrix of eigenvectors of the Laplacian matrix. Equation 8 is derived from [27].
Equations 6 and 8 show that the transform between the spectral domain and the graph domain can be achieved by a left multiplication with the eigenvector matrix or its transpose. In essence, the Fourier transform is a change of bases between the orthonormal bases in the graph domain and the eigenvector bases in the spectral domain. The Fourier transform on a graph makes the signal analysis in the spectral domain possible, which leads to the Fourier Decomposition.
Equation 7 of the classical Inverse Fourier Transform can be interpreted as a linear combination of different Fourier bases, which is similar to the Fourier Decomposition. In analogy, each pair of u_l and λ_l in Equation 5 represents a Fourier basis. The eigenvalue represents the importance of the corresponding basis and can be interpreted as a high frequency when the eigenvalue is high and a low frequency when the eigenvalue is low[27]. This notion allows a generalization between the classical Fourier Decomposition and the Fourier Decomposition on Graph.
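To make Equations 6 and 8 concrete, the following NumPy sketch (our own illustration on a small random graph with its combinatorial Laplacian, not code from [27]) performs the Fourier transform and the inverse transform of a graph signal:

import numpy as np

rng = np.random.default_rng(0)
N = 6
# Random undirected graph and its combinatorial Laplacian
A = rng.integers(0, 2, size=(N, N)).astype(float)
A = np.triu(A, 1); A = A + A.T                   # symmetric adjacency with empty diagonal
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)       # eigenvalues (graph frequencies) and eigenvector matrix U
f = rng.normal(size=N)           # a graph signal, one value per node

f_hat = U.T @ f                  # Eq. (6): Fourier transform on graph
f_back = U @ f_hat               # Eq. (8): inverse Fourier transform
print(np.allclose(f, f_back))    # a change of basis, so the signal is recovered exactly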
3.4 Convolution on Graph
The traditional convolution theorem used in CNN can be summarized as: convolution in the time domain is equal to the pointwise product in the frequency domain.

F(f ∗ g) = F(f) · F(g)

where F is the Fourier transformation and f and g are two signal functions in the time domain. The operator ∗ is the convolution operator and · is the pointwise product operator[15].
In analogy, the convolution theorem can be extended to the graph:

(f * g)(i) = F^{-1}\{F(f) \cdot F(g)\}(i) = \sum_{l=1}^{N} \hat{f}(\lambda_l)\, \hat{g}(\lambda_l)\, u_l(i)

If we write the equation in matrix form:

f * g = \begin{pmatrix} (f*g)(1) \\ \vdots \\ (f*g)(N) \end{pmatrix}
      = \begin{pmatrix} u_1(1) & \cdots & u_N(1) \\ \vdots & \ddots & \vdots \\ u_1(N) & \cdots & u_N(N) \end{pmatrix}
        \begin{pmatrix} \hat{f}(\lambda_1) \cdot \hat{g}(\lambda_1) \\ \vdots \\ \hat{f}(\lambda_N) \cdot \hat{g}(\lambda_N) \end{pmatrix}
      = U \left( \hat{f} \circ \hat{g} \right)
f * g = U (\hat{f} \circ \hat{g}) = U \left( (U^T f) \circ (U^T g) \right) = U\, \mathrm{diag}\!\left( \hat{f}(\lambda_1), ..., \hat{f}(\lambda_N) \right) U^T g    (9)

where ◦ is the Schur product operator. The equation is derived from [4].
Equation 9 defines the convolution operation on a graph, which is the foundation of the spectral algorithms in Sec. 3.5, 3.6 and 3.7.
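The following NumPy sketch (our own numerical check of Equation 9 on a small random graph, not code from [4]) confirms that the element-wise form and the diag(·) form of the graph convolution coincide:

import numpy as np

def spectral_conv(U, f, g):
    """Graph convolution f * g: transform, Schur product, back-transform (Eq. 9)."""
    f_hat, g_hat = U.T @ f, U.T @ g
    return U @ (f_hat * g_hat)

rng = np.random.default_rng(1)
N = 5
A = np.triu(rng.integers(0, 2, size=(N, N)), 1).astype(float)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A            # combinatorial Laplacian as an example
lam, U = np.linalg.eigh(L)

f, g = rng.normal(size=N), rng.normal(size=N)
lhs = spectral_conv(U, f, g)
rhs = U @ np.diag(U.T @ f) @ U.T @ g      # the diag(...) form on the right-hand side of Eq. (9)
print(np.allclose(lhs, rhs))              # both forms agree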
3.5 Non-parametric GCN
Equation 9 defines the convolution on a graph. In 2013 Bruna et al. suggested using the term diag(\hat{f}(\lambda_1), ..., \hat{f}(\lambda_N)) in Equation 9 as the kernel, where each \hat{f}(\lambda_l) is a learnable weight. This algorithm, namely the Non-parametric GCN, became the first spectral method in the GCN domain. The following equation shows the forward propagation in one layer:

x_{k+1} = \sigma\!\left( U\, \mathrm{diag}(F_{k,1}, ..., F_{k,N})\, U^T x_k \right)    (10)

where F_{k,i} is the i-th learnable weight of the kernel in the k-th layer, σ is a non-linear activation function, and x_k is the input of the k-th layer. Equation 10 is derived from [4].
As an early approach among the spectral methods, the Non-parametric GCN has some remarkable weaknesses. The following weaknesses are summarised by [7]:
High Complexity In each forward propagation step, the calculation of the term U diag(F_{k,1}, ..., F_{k,N}) U^T has a complexity of O(N²). In addition, the eigenvalue decomposition needed to obtain the eigenvector matrix U is also expensive.
Localization Problem One of the key points leading to the success of CNN is its localized convolution, which allows feature extraction and aggregation within the size of the kernel. In contrast, the convolution kernel in the Non-parametric GCN is N-dimensional and takes information from all nodes in the graph. The absence of localization makes it harder to build a multi-layer model.
Curse of Dimensionality The size of the kernel is N, which is expensive to process on a large graph.
Because of the weaknesses above, this algorithm has never been popular in practice. However, the fundamental idea of its spectral convolution has inspired many other approaches such as the ChebyNet.
3.6 ChebyNet
After analysing the weaknesses of the Non-parametric GCN, Defferrard et al. published the ChebyNet in 2016[7]. The main idea behind the ChebyNet is to limit the size of the kernel by using a polynomial parametrization.
The generalized form of the forward propagation, filtered by g_θ, can be represented as:

x_{k+1} = \sigma\!\left( g_\theta(L)\, x_k \right) = \sigma\!\left( g_\theta(U \Lambda U^T)\, x_k \right) = \sigma\!\left( U\, g_\theta(\Lambda)\, U^T x_k \right)

where Λ = diag(λ_1, ..., λ_N) is the diagonal matrix of the eigenvalues of L and g_θ(Λ) is the generalized kernel; for the Non-Parametric GCN, g_θ(Λ) = diag(F_{k,1}, ..., F_{k,N})[7].
Suppose K is a given setting of the algorithm, which refers to the size of the kernel. If a polynomial kernel with a maximal power of K is applied to approximate the kernel, we get

g_\theta(\Lambda) = \sum_{k=0}^{K} \theta_k \Lambda^k

Suppose that the kernel is centered at node j and the value taken in the convolution calculation comes from node i; the corresponding entry can be written as

g_\theta(L)_{i,j} = \sum_{k=0}^{K} \theta_k \left( L^k \right)_{i,j}

According to a lemma from [12], if the shortest path between two nodes is larger than k, the polynomial term is zero, i.e. d_G(i, j) > k ⇒ (L^k)_{i,j} = 0. Therefore, the setting of K limits the perception range to nodes within distance K; a numerical check of this property is sketched below. In analogy, K is similar to the size of the kernel in CNN.
A further observation is based on the orthogonality of the matrix U:

L^k = (U \Lambda U^T)(U \Lambda U^T) \cdots (U \Lambda U^T) = U \Lambda^k U^T

Therefore, the forward propagation equation with a polynomial kernel can avoid using the eigenvector matrix U, which removes the cost of the eigenvalue decomposition:

x_{l+1} = \sigma\!\left( U \sum_{k=0}^{K} \theta_k \Lambda^k\, U^T x_l \right)
        = \sigma\!\left( \sum_{k=0}^{K} \theta_k\, U \Lambda^k U^T x_l \right)
        = \sigma\!\left( \sum_{k=0}^{K} \theta_k L^k x_l \right)
The polynomial kernel solves some problems of the Non-parametric GCN. Firstly, the polynomial kernel has only K + 1 weight variables, which is independent of the size N of the input data. Secondly, the expensive eigenvalue decomposition can be avoided. Thirdly, the kernel has a good spatial localization similar to CNN. The only remaining problem is that the complexity of the forward propagation is still O(N²)[7].
To reduce the complexity, Defferrard et al. suggested employing the Chebyshev polynomial in the approximation:

T_0(x) = 1, \quad T_1(x) = x, \quad T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)    (11)

where T_i(x) is the i-th Chebyshev term and x ∈ [−1, 1]. The Chebyshev polynomial is defined recursively.
Before the Chebyshev polynomial replaces the L^k term in the forward propagation equation, L has to be rescaled into the range [−1, 1] with \tilde{L} = \frac{2L}{\lambda_{max}} - I_N. Finally, the forward propagation equation becomes:
x_{l+1} = \sigma\!\left( \sum_{k=0}^{K} \theta_k\, T_k(\tilde{L})\, x_l \right)    (12)

T_0(\tilde{L}) = I, \quad T_1(\tilde{L}) = \tilde{L}, \quad T_{k+1}(\tilde{L}) = 2\tilde{L}\,T_k(\tilde{L}) - T_{k-1}(\tilde{L})
Because the Chebyshev terms are recursively calculable and a multiplication with the sparse matrix \tilde{L} has the complexity O(|E|), the use of the Chebyshev polynomial reduces the complexity of the convolution from O(N²) to O(K|E|)[7].
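A minimal sketch of the Chebyshev filtering step of Equation 12 in NumPy (our own illustration with dense matrices and arbitrary coefficients, assuming K ≥ 1; the reference implementation of [7] uses sparse operations):

import numpy as np

def cheby_filter(L_norm, x, theta):
    """Compute sum_k theta_k T_k(L_tilde) x with the recursion of Eq. (11)/(12).
    L_norm: (N, N) normalized Laplacian, x: (N,) signal, theta: (K+1,) coefficients, K >= 1."""
    N = L_norm.shape[0]
    lam_max = np.linalg.eigvalsh(L_norm).max()
    L_tilde = 2.0 * L_norm / lam_max - np.eye(N)    # rescale the spectrum into [-1, 1]

    T_prev, T_curr = x, L_tilde @ x                 # T_0(L~) x = x,  T_1(L~) x = L~ x
    out = theta[0] * T_prev + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_next = 2.0 * L_tilde @ T_curr - T_prev    # T_{k+1} = 2 L~ T_k - T_{k-1}
        out = out + theta[k] * T_next
        T_prev, T_curr = T_curr, T_next
    return out

With a sparse \tilde{L}, each recursion step is one sparse matrix-vector product, which is where the O(K|E|) cost comes from.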
3.7 Simplified ChebyNet
Kipf and Welling simplified the ChebyNet in 2017 and published their simplified ChebyNet, which has proved to be successful in graph learning. This network is sometimes simply called GCN due to its popularity.
The idea of Kipf & Welling is to limit the size of the kernel to K = 1 and to stack multiple convolution layers. Furthermore, the maximal eigenvalue is approximated as λ_max ≈ 2. Thirdly, the parameters θ_0 and θ_1 are merged into one parameter θ with θ = θ_0 = −θ_1. Finally, the term I_N + D^{-1/2} A D^{-1/2} is renormalized to improve the numerical stability. The steps of the simplification are shown in the following equations:

g_\theta(L)\, x = \sum_{k=0}^{K} \theta_k\, T_k(\tilde{L})\, x
             = \theta_0 x + \theta_1 \left( \frac{2L}{\lambda_{max}} - I_N \right) x
             \approx \theta_0 x + \theta_1 (L - I_N)\, x
             = \theta_0 x + \theta_1 \left( I_N - D^{-\frac{1}{2}} A D^{-\frac{1}{2}} - I_N \right) x
             = \theta_0 x - \theta_1 D^{-\frac{1}{2}} A D^{-\frac{1}{2}} x
             \approx \theta x + \theta D^{-\frac{1}{2}} A D^{-\frac{1}{2}} x
             = \theta \left( I_N + D^{-\frac{1}{2}} A D^{-\frac{1}{2}} \right) x
             = \theta\, \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} x    (13)

X_{l+1} = \mathrm{ReLU}\!\left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X_l W^{(l)} \right) = \mathrm{ReLU}\!\left( \hat{A} X_l W^{(l)} \right)

where \tilde{A} = A + I_N is the adjacency matrix with added self-loops, \tilde{D} is the renormalized degree matrix with \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}, \hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} is the aggregated matrix which can be calculated in preprocessing, X_l is the matrix of input data with different channels in layer l, and W^{(l)} is the matrix of the learnable weights of all filters in the l-th layer. The suggested activation function between layers is the ReLU function. Equation 13 is proved by [16].
The complexity of the forward propagation in each layer is therefore O(|E|F C), where |E| is the number of edges of the graph, F is the number of filters in the layer, and C is the number of input channels[16].
The remaining configurations are similar to CNN. Kipf and Welling suggest using softmax with cross-entropy after the last layer for classification tasks. In the training process, batch loss, dropout and standard gradient descent are recommended[16].
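A minimal sketch of one forward layer of the simplified ChebyNet (Equation 13) in NumPy, with randomly initialized weights; this is our own dense illustration rather than the reference implementation of [16]:

import numpy as np

def gcn_layer(A, X, W):
    """One simplified-ChebyNet layer: ReLU( D~^{-1/2} (A + I) D~^{-1/2} X W ), cf. Eq. (13)."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                      # add self-loops
    d_tilde = A_tilde.sum(axis=1)                # renormalized degrees
    D_inv_sqrt = np.diag(d_tilde ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # aggregated matrix, can be precomputed
    return np.maximum(A_hat @ X @ W, 0.0)        # ReLU activation

# Example: 4 nodes, 3 input channels, 2 filters
rng = np.random.default_rng(0)
A = np.array([[0., 1., 1., 0.],
              [1., 0., 0., 1.],
              [1., 0., 0., 1.],
              [0., 1., 1., 0.]])
X = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
H = gcn_layer(A, X, W)                           # hidden representation of shape (4, 2)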
4 Benchmark of Simplified ChebyNet
This benchmark focuses on the accuracy of document classification on three citation networks, namely Citeseer, Cora and Pubmed, following the same node classification setup stated by Yang et al.[33]. A citation network is composed of documents as nodes, bag-of-words feature vectors as node features, citation links as edges, and exactly one document label per node as the target. We treat the citation networks as undirected graphs and run various algorithms for classification.

Fig. 3 shows the results of node classification for five common methods from graph embedding and graph neural networks. The tendency of improvement in accuracy with the development of this domain is obvious. GraphSage, which is a spatial method, has already surpassed the spectral methods in the node classification task. The three colours represent the three different citation networks on which the training and validation are run. The result of SDNE on the Pubmed network is missing because of the lack of data in the cited paper[8][13][16][20][29].
Fig. 3. The accuracy of node classification. The vertical axis shows the accuracy of node classification of the different methods on the different citation networks. The horizontal axis shows the denotation of each method and the year of its release. The three colours represent the three different citation networks on which the training and validation are run. The tendency of improvement in accuracy with the development of this domain is obvious. GraphSage, which is a spatial method, has already surpassed the spectral methods in the node classification task. Note that the result of SDNE on the Pubmed network is missing because of the lack of data in the cited paper. The raw benchmark results are cited from [8][13][16][20][29].
5 Conclusion and Discussion
In this paper, we provided a brief overview of the spectral methods of GCN. We first reviewed the history of the domain and introduced a categorization of the existing methods. The introduction of the spectral methods mainly focused on the theoretical inference, which follows the logical structure between the different foundations. The first foundation is the two different Laplacian matrices of graph spectral theory, given by their definition and their interpretation on a graph. The second foundation we showed is the traditional Fourier transformation and its essence concerning the eigenvalues and eigenvectors of the Laplacian operator. In analogy, we presented the Fourier transformation on a graph between the spectral domain and the graph domain, which is closely related to the eigenvalues and eigenvectors of the Laplacian matrix. The Inverse Fourier transform equation can be interpreted as the equation of the Fourier decomposition, with the eigenvalues as the frequencies of the bases. After defining the Fourier transformation, we derived the convolution theorem on a graph and its equation in matrix form.

Based on these foundations, we introduced three approaches of the spectral methods. The first, the Non-Parametric GCN, takes the diagonal term in the convolution equation as its kernel, at the cost of computational complexity and the loss of localization. The ChebyNet uses the Chebyshev polynomial to approximate this diagonal term, which remarkably reduces the complexity and achieves the localization of the kernel. The simplified ChebyNet makes some further simplifications of the kernel and builds a multi-layer model like CNN. At last, we presented the benchmark of a node classification task and found progressive improvement in the domain. Above all, the accuracy of node classification has already reached a high level, as AlexNet did in the classification of images, which shows the effectiveness of GCN.
There are some future directions, primitive ideas and possible applications which I have noticed:
The future of spatial methods is more brilliant Although the first breakthrough of GCN came from the spectral approaches, the tendency of development has moved towards the spatial methods. One reason for the success of the simplified ChebyNet is the limitation of the perceptive field of a centered kernel. Therefore, the simplified ChebyNet can be regarded as an approximation of a spatial method in the spectral domain. In the benchmark section, we have shown that the GraphSage method, which is a spatial method, has surpassed the accuracy of the simplified ChebyNet. Furthermore, a spectral method is generally hard to parallelize because the Fourier transformation takes the whole graph into the calculation. The spectral methods can still have application scenarios in some cases, but the main focus of research has shifted to the spatial methods in recent years.
Different Graphs All methods introduced in this paper are for general undirected graphs. In the world of graphs, there are still more special graphs such as the heterogeneous graph, the directed graph, etc. Each special graph has its special structure and can have respective optimization methods.
The variable size of the perceptive field Each node in the graph has its own betweenness, which shows how important the node is for the network. From the aspect of information flow, a node with high betweenness has a higher possibility to receive information from all directions, while a node with low betweenness plays a localized role. Therefore, we could enlarge the receptive field of the high-betweenness nodes to convolve more information together for ambiguous but extensive knowledge, while reducing the receptive field of the low-betweenness nodes for localized but explicit knowledge.
Deep-Dream on graph Deep-Dream, a network that creates a new image from learned knowledge, has shown its success in computer vision. In analogy, the graph convolutional network should also have the ability to create a graph or network based on its learned knowledge. This technique could have application fields such as the design of an isomer of a medicine or the creation of a new agricultural product with gene editing.
Learning algorithms for learning algorithms The existing learning algorithms such as CNN, RNN and GCN are graphs in essence. It could be possible to train a network to generate a suitable learning algorithm according to the given problem and the training and validation data sets.
References
1. Ahmed, A., Shervashidze, N., Narayanamurthy, S., Josifovski, V., Smola, A.J.:
Distributed large-scale natural graph factorization pp. 37–48 (2013)
2. Biggs, N., Biggs, N., Norman, B., Press, C.U.: Algebraic Graph Theory. Cambridge
Mathematical Library, Cambridge University Press (1993)
3. Bronstein, M.M., Bruna, J., LeCun, Y., Szlam, A., Vandergheynst, P.: Geometric
deep learning: going beyond euclidean data. CoRR abs/1611.08097 (2016)
4. Bruna, J., Zaremba, W., Szlam, A., Lecun, Y.: Spectral networks and locally connected networks on graphs (12 2013)
5. Cao, S., Lu, W., Xu, Q.: Deep neural networks for learning graph representations
(2016)
6. Chung, F.R.K.: Spectral Graph Theory. American Mathematical Society (1997)
7. Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on
graphs with fast localized spectral filtering. CoRR abs/1606.09375 (2016)
8. Gao, H., Huang, H.: Deep attributed network embedding pp. 3364–3370
9. Gori, M., Monfardini, G., Scarselli, F.: A new model for learning in graph domains
2, 729–734 vol. 2 (July 2005)
10. Goyal, P., Ferrara, E.: Graph embedding techniques, applications, and performance: A survey. CoRR abs/1705.02801 (2017)
11. Hamilton, W.L., Ying, R., Leskovec, J.: Inductive representation learning on large
graphs. CoRR abs/1706.02216 (2017)
12. Hammond, D., Vandergheynst, P., Gribonval, R.: Wavelets on graphs via spectral
graph theory. Applied and Computational Harmonic Analysis 30, 129–150 (12
2009)
13. Huang, W., Zhang, T., Rong, Y., Huang, J.: Adaptive sampling towards fast graph
representation learning. CoRR abs/1809.05343 (2018)
14. Jakobson, D., Miller, S.D., Rivin, I., Rudnick, Z.: Eigenvalue Spacings for Regular
Graphs. Springer New York (1999)
15. Katznelson, Y.: An introduction to harmonic analysis (1976)
16. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional
networks. CoRR abs/1609.02907 (2017)
17. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks pp. 1097–1105 (2012)
18. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521, 436–44 (05 2015)
19. Li, Y., Tarlow, D., Brockschmidt, M., Zemel, R.S.: Gated graph sequence neural
networks. CoRR abs/1511.05493 (2016)
20. Ma, Z., Li, M., Wang, Y.: PAN: path integral based convolution for deep graph neural networks. CoRR abs/1904.10996 (2019), http://arxiv.org/abs/1904.10996
21. Mason, O., Verwoerd, M.: Graph theory and networks in biology. IET Systems
Biology 1(2), 89–119 (March 2007)
22. Perozzi, B., Al-Rfou, R., Skiena, S.: Deepwalk: Online learning of social representations. CoRR abs/1403.6652 (2014)
23. Putinar, M.: Generalized eigenfunction expansions and spectral decompositions.
Banach Center Publications 38 (1997)
24. Rajaraman, A., Ullman, J.D.: Mining of massive datasets. Cambridge University
Press, Cambridge (2012)
25. Randic, M.: Nenad trinajstic - pioneer of chemical graph theory. Croatica Chemica
Acta 77, 1–15 (05 2004)
26. Sanchez-Gonzalez, A., Heess, N., Springenberg, J.T., Merel, J., Riedmiller, M.A.,
Hadsell, R., Battaglia, P.: Graph networks as learnable physics engines for inference
and control. CoRR abs/1806.01242 (2018)
27. Shuman, D.I., Narang, S.K., Frossard, P., Ortega, A., Vandergheynst, P.: Signal
processing on graphs: Extending high-dimensional data analysis to networks and
other irregular data domains. CoRR abs/1211.0053 (2012)
28. Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. CoRR abs/1503.03578 (2015)
29. Tu, C., Zeng, X., Wang, H., Zhang, Z., Liu, Z., Sun, M., Zhang, B., Lin, L.: A
unified framework for community detection and network representation learning.
IEEE Transactions on Knowledge and Data Engineering 31(6), 1051–1065 (2019)
30. Wang, D., Cui, P., Zhu, W.: Structural deep network embedding pp. 1225–1234
(2016)
31. Wasserman, S., Faust, K.: Social network analysis: Methods and applications,
vol. 8. Cambridge university press (1994)
32. Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S.: A comprehensive survey
on graph neural networks. CoRR abs/1901.00596 (2019)
33. Yang, Z., Cohen, W.W., Salakhutdinov, R.: Revisiting semi-supervised learning
with graph embeddings. CoRR abs/1603.08861 (2016)
34. Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in
deep neural networks? CoRR abs/1411.1792 (2014)
35. Zhang, Z., Cui, P., Zhu, W.: Deep learning on graphs: A survey. CoRR
abs/1812.04202 (2018)
36. Zhou, J., Cui, G., Zhang, Z., Yang, C., Liu, Z., Sun, M.: Graph neural networks:
A review of methods and applications. CoRR abs/1812.08434 (2018)