1
CS6208 : Advanced Topics in Artificial Intelligence
Graph Machine Learning
Lecture 2 : Introduction to Graph Science
Semester 2 2022/23
Xavier Bresson
https://twitter.com/xbresson
Department of Computer Science
National University of Singapore (NUS)
Xavier Bresson
1
Course lectures
Introduction to Graph Machine Learning
Part 1: GML without feature learning
(before 2014)
Introduction to Graph Science
Graph Analysis Techniques without
Feature Learning
Graph Convolutional Networks
(spectral and spatial)
Weisfeiler-Lehman GNNs
Graph clustering
Classification
Benchmarking GNNs
Recommendation
Molecular science and generative GNNs
Part 2 : GML with shallow feature learning
(2014-2016)
Shallow graph feature learning
Xavier Bresson
Part 3 : GML with deep feature learning,
a.k.a. GNNs (after 2016)
Graph Transformer & Graph
ViT/MLP-Mixer
Dimensionality reduction
2
GNNs for combinatorial optimization
GNNs for recommendation
GNNs for knowledge graphs
Integrating GNNs and LLMs
2
Outline
3
Graph theory
Graph categories
Basic definitions
Curse of dimensionality and structure
Manifolds and graphs
Spectral graph theory
Graph construction
Conclusion
Xavier Bresson
3
Outline
4
Graph theory
Graph categories
Basic definitions
Curse of dimensionality and structure
Manifolds and graphs
Spectral graph theory
Graph construction
Conclusion
Xavier Bresson
4
5
Graphs
Graphs encode complex data structures.
They are everywhere! Internet, social networks,
customer-product relationships, etc
MNIST Image
Network
Social Network
Internet Network of
California
“Graphs are the most important discrete models in the world.”
- Gil Strang (MIT)
GTZAN Music
Network
Network of Text Documents
20newsgroups
Xavier Bresson
5
6
Graphs
Definition : (Simple) mathematical model representing pairwise relationships between data.
All pairwise
relationships
Graph
data1
Why are graphs useful?
data2
data3
data1
data2
Graphs offer a global view and analysis of data structures.
They possess meaningful patterns, i.e. insights about data properties.
Some tasks are exclusively designed for graphs, e.g. Google PageRank recommendation.
They can boost performance with additional priors, a.k.a. inductive bias.
They can benefit from GPUs as graphs are (sparse) matrices.
Xavier Bresson
6
7
Graph theory
When did it start?
History of graph theory : Graphs have been formally studied since 1736, starting with
Mathematician Leonhard Euler and the famous problem of “Seven Bridges of Königsberg” :
Q: Can we find a path through the city (starting from any place) that crosses each bridge
once and only once? Euler proved that it is not possible (it is only feasible if the graph has
even degree that allows cycles).
Simplification
Königsberg
City
Graph
representation
Source: Wikipedia.
Graph theory have since developed tools to analyze and process networks for all sort of
applications : clustering, classification, visualization, recommendation, etc.
Xavier Bresson
7
Outline
8
Graph theory
Graph categories
Basic definitions
Curse of dimensionality and structure
Manifolds and graphs
Spectral graph theory
Graph construction
Conclusion
Xavier Bresson
8
9
Graph categories
Natural Graphs
Social networks : Meta, LinkedIn, Twitter
Biological networks : Brain connectivity & functionality, gene regulatory networks
Communication networks : Internet, networking devices
Transportation networks : Trains, cars, airplanes, pedestrians
Power networks : Electricity, water
Natural graphs mean graphs that are not artificially hand-crafted/constructed.
Facebook
=
Minnesota Road Network
Xavier Bresson
Telecommunication
Network
US Electrical Network
Brain
Connectivity
Graphs
9
10
Graph categories
Graphs constructed from data :
MNIST Image Graph
MNIST image network
GTZAN music network
20NEWS text document network
3D mesh points
GTZAN Music Graph
Graph of Text Documents
20newsgroups
3D mesh points
Optimal graph construction : Unfortunately, no theoretical approach is available – it is
empiricial and depends mostly on domain expertise knowledge and good common practice
(later discussed).
What is the computational time needed to construct a graph from data features ?
Exact construction : O(n2.d), n = num data, d = num features.
for d = 1K, n = 1K
⇒ time < 1sec
n = 100K ⇒ time = 1 min
n = 1M
⇒ time > 1 hour
Approximate technique : kd-trees O(n log n.d) with e.g. FLANN[1], a library for fast
approximate nearest neighbor search in high-dimensional spaces.
Xavier Bresson
[1] https://github.com/flann-lib/flann
10
11
Graph categories
Mathematical/simulated graphs
Erdos-Renyi graphs[1]
Stochastic block models (SBM)[2]
Erdos-Renyi Network
Source: Wikipedia.
Paul Erdős
1913 – 1996
Lancichinetti-Fortunato-Radicchi (LFR) graphs[3]
Why using artificial networks?
Mathematical Modeling
LFR
Source: Kojaku, 2018
SBM
Source: Abbe, JMLR'17
Advantage : Precise control of your data model (estimate best performance given some
data assumptions). No need to perform extensive experiments.
Limitation : Most data assumptions are often too restrictive, and it is not guaranteed
that real-world data follow the given model assumptions.
[1] Erdos, Renyi, On random graph, 1959
[2] Anderson, Wasserman, Faust, Building stochastic blockmodels, 1992
[3] Lancichinetti, Fortunato, Filippo, Benchmark graphs for testing community detection algorithms, 2008
Xavier Bresson
11
Outline
12
Graph theory
Graph categories
Basic definitions
Curse of dimensionality and structure
Manifolds and graphs
Spectral graph theory
Graph construction
Conclusion
Xavier Bresson
12
13
Basic definitions
eij 2 E
Graphs are defined as G = (V,E,A) where
V is the set of vertices (or nodes) w/ |V| = n.
E is the set of edges.
G
i
j
A is the adjacency similarity matrix.
i2V
Directed or undirected graphs :
j2V
Aij = 0.9
<latexit sha1_base64="E6IvTFupaJuuUnDO4ywOZAhA8yE=">AAAB8XicbVDLSgNBEOyNrxhfUY9eBoPgaZmV4OMgRLx4jGAemCxhdjKbjJmdXWZmhbDkL7x4UMSrf+PNv3GS7EGjBQ1FVTfdXUEiuDYYfzmFpeWV1bXiemljc2t7p7y719Rxqihr0FjEqh0QzQSXrGG4EaydKEaiQLBWMLqe+q1HpjSP5Z0ZJ8yPyEDykFNirHR/1cv4w+QSuxe9cgW7eAb0l3g5qUCOeq/82e3HNI2YNFQQrTseToyfEWU4FWxS6qaaJYSOyIB1LJUkYtrPZhdP0JFV+iiMlS1p0Ez9OZGRSOtxFNjOiJihXvSm4n9eJzXhuZ9xmaSGSTpfFKYCmRhN30d9rhg1YmwJoYrbWxEdEkWosSGVbAje4st/SfPE9U7d6m21UsN5HEU4gEM4Bg/OoAY3UIcGUJDwBC/w6mjn2Xlz3uetBSef2YdfcD6+AXZekBQ=</latexit>
Directed graph
Undirected graph
Node degree :
For binary graphs, Aij = {0,1} ⇒ degree = num of edges connected to a node
For weighted graphs, Aij in [0,1] ⇒ degree is defined as di = ∑! ∈ # Aij
Xavier Bresson
1
1
1
0.1
0.2
0.4
1
0.6
degree = 4
degree = 1.3
Binary graph
Weighted graph
13
Algorithms
14
Standard graph algorithms (a.k.a. software engineering, no learning) :
Breadth-first search, depth-first search, minimum spanning tree, topological sorting, strongly
connected components, graph colouring, maximum flow/minimum cut, graph matching, etc.
Shortest path algorithm : Find a path on a graph with the smallest possible length.
Fast Algorithm : Dijkstra's algorithm[1]
Popular application is the road navigator product, e.g. from New York to Los Angeles.
Shortest path
j
i
Longest path
[1] Dijkstra, A note on two problems in connexion with graphs, 1959
Xavier Bresson
14
15
Dense vs. sparse graphs
Dense/complete/full graphs : Each vertex is connected to all other vertices.
|E| =
n(n
1)
2
= O(n2 )
Dense graphs
Sparse graphs : Each vertex is connected to a few other k ≪ n vertices.
|E| = O(kn) = O(n)
<latexit sha1_base64="WiQVtby8TdkErQGUlxnQ2gqSnrA=">AAAB+XicbVDLSsNAFL3xWeMr6tJNsAjtpiRS1I1QEMGdFewD2lAm00k7dDIJM5NCSfsnblwo4tY/ceffOGmz0NYD93I4517mzvFjRqVynG9jbX1jc2u7sGPu7u0fHFpHx00ZJQKTBo5YJNo+koRRThqKKkbasSAo9Blp+aPbzG+NiZA04k9qEhMvRANOA4qR0lLPsqZ305uH0oiXdedl0+xZRafizGGvEjcnRchR71lf3X6Ek5BwhRmSsuM6sfJSJBTFjMzMbiJJjPAIDUhHU45CIr10fvnMPtdK3w4ioYsre67+3khRKOUk9PVkiNRQLnuZ+J/XSVRw7aWUx4kiHC8eChJmq8jOYrD7VBCs2EQThAXVt9p4iATCSoeVheAuf3mVNC8q7mWl+lgt1pw8jgKcwhmUwIUrqME91KEBGMbwDK/wZqTGi/FufCxG14x85wT+wPj8AYNmkZQ=</latexit>
Which graph is better -- dense or sparse?
Sparse graph
Sparse networks are highly desirable for memory and computational efficiency.
For instance, Internet has n = 4.73 billion pages (as of August 2016)
|E| = n2 = 1018 if Internet was full.
|E| = k.n = 1011 as it is sparse with k ≈ 100 (mean degree).
Internet
Good news : Most real-world networks (e.g. social, brain, communication, etc) are sparse !
Because sparsity induces structure (a fully connected graph has no structure).
Xavier Bresson
15
16
Adjacency matrix
Definition : Matrix A in G = (V,E,A) represents structural/topological information about the
network. We have two classes of A :
Binary matrix : Aij ∈ {0,1}
<latexit sha1_base64="NwN8yQKYdK776nx0TQRevY3paHg=">AAAB7nicbVBNS8NAEJ34WetX1aOXxSJ4Kkkp6rHixWMF+wFtKJvttF272YTdjVBCf4QXD4p49fd489+4aXPQ1gcDj/dmmJkXxIJr47rfztr6xubWdmGnuLu3f3BYOjpu6ShRDJssEpHqBFSj4BKbhhuBnVghDQOB7WBym/ntJ1SaR/LBTGP0QzqSfMgZNVZq3/RT/jgr9ktlt+LOQVaJl5My5Gj0S1+9QcSSEKVhgmrd9dzY+ClVhjOBs2Iv0RhTNqEj7FoqaYjaT+fnzsi5VQZkGClb0pC5+nsipaHW0zCwnSE1Y73sZeJ/Xjcxw2s/5TJODEq2WDRMBDERyX4nA66QGTG1hDLF7a2EjamizNiEshC85ZdXSata8S4rtftauV7N4yjAKZzBBXhwBXW4gwY0gcEEnuEV3pzYeXHenY9F65qTz5zAHzifP81Wjy4=</latexit>
Aij =
W
⇢
1 if (i, j) 2 E
0 otherwise
2
1
1
G=
4
1
1
1
3
1
1 0
1
26
) W
=
A 36
4 0
4 0
2
<latexit sha1_base64="75R+9jLduDP6KhqPnafG/i/q8Yg=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lKUY8VLx6r2FpoQ9lsJ+3SzSbsboRS+g+8eFDEq//Im//GTZuDtj4YeLw3w8y8IBFcG9f9dgpr6xubW8Xt0s7u3v5B+fCoreNUMWyxWMSqE1CNgktsGW4EdhKFNAoEPgbjm8x/fEKleSwfzCRBP6JDyUPOqLHS/XWpX664VXcOskq8nFQgR7Nf/uoNYpZGKA0TVOuu5ybGn1JlOBM4K/VSjQllYzrErqWSRqj96fzSGTmzyoCEsbIlDZmrvyemNNJ6EgW2M6JmpJe9TPzP66YmvPKnXCapQckWi8JUEBOT7G0y4AqZERNLKFPc3krYiCrKjA0nC8FbfnmVtGtV76Jav6tXGrU8jiKcwCmcgweX0IBbaEILGITwDK/w5oydF+fd+Vi0Fpx85hj+wPn8AcVIjNI=</latexit>
2
1
0
1
1
3
0
1
0
1
4
3
0
1 7
7
1 5
0
Weighted matrix : Aij ∈ [0,1] (commonly normalized to 1)
<latexit sha1_base64="NwN8yQKYdK776nx0TQRevY3paHg=">AAAB7nicbVBNS8NAEJ34WetX1aOXxSJ4Kkkp6rHixWMF+wFtKJvttF272YTdjVBCf4QXD4p49fd489+4aXPQ1gcDj/dmmJkXxIJr47rfztr6xubWdmGnuLu3f3BYOjpu6ShRDJssEpHqBFSj4BKbhhuBnVghDQOB7WBym/ntJ1SaR/LBTGP0QzqSfMgZNVZq3/RT/jgr9ktlt+LOQVaJl5My5Gj0S1+9QcSSEKVhgmrd9dzY+ClVhjOBs2Iv0RhTNqEj7FoqaYjaT+fnzsi5VQZkGClb0pC5+nsipaHW0zCwnSE1Y73sZeJ/Xjcxw2s/5TJODEq2WDRMBDERyX4nA66QGTG1hDLF7a2EjamizNiEshC85ZdXSata8S4rtftauV7N4yjAKZzBBXhwBXW4gwY0gcEEnuEV3pzYeXHenY9F65qTz5zAHzifP81Wjy4=</latexit>
Aij =
W
⇢
wij 2 [0, 1] if (i, j) 2 E
0 otherwise
2
0.3
0.4
G=
1
1
3
Xavier Bresson
4
0.7
2
1 2 3 4 3
1
0 0.4 0
0
7
0.4
0
0.7
0.3
26
6
) W
=
A 3 4 0 0.7 0 1 75
0 0.3 1
0
4
<latexit sha1_base64="75R+9jLduDP6KhqPnafG/i/q8Yg=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lKUY8VLx6r2FpoQ9lsJ+3SzSbsboRS+g+8eFDEq//Im//GTZuDtj4YeLw3w8y8IBFcG9f9dgpr6xubW8Xt0s7u3v5B+fCoreNUMWyxWMSqE1CNgktsGW4EdhKFNAoEPgbjm8x/fEKleSwfzCRBP6JDyUPOqLHS/XWpX664VXcOskq8nFQgR7Nf/uoNYpZGKA0TVOuu5ybGn1JlOBM4K/VSjQllYzrErqWSRqj96fzSGTmzyoCEsbIlDZmrvyemNNJ6EgW2M6JmpJe9TPzP66YmvPKnXCapQckWi8JUEBOT7G0y4AqZERNLKFPc3krYiCrKjA0nC8FbfnmVtGtV76Jav6tXGrU8jiKcwCmcgweX0IBbaEILGITwDK/w5oydF+fd+Vi0Fpx85hj+wPn8AcVIjNI=</latexit>
16
17
Lab 1 : LFR social networks
Run code01.ipynb and synthesize LFR social networks.
prin
Play with the mixing parameter μ :
μ small : Communities are well separated.
μ large : Communities are mixed.
Xavier Bresson
µ=
prout
prout
prin
Community #1
Community #2
17
Outline
18
Graph theory
Graph categories
Basic definitions
Curse of dimensionality and structure
Manifolds and graphs
Spectral graph theory
Graph construction
Conclusion
Xavier Bresson
18
19
Curse of dimensionality
Rd , d is large
In high dimensions, Euclidean distance between data becomes meaningless.
!
What is the curse of dimensionality ?
xi
Theorem[1] : Suppose data are uniformly distributed in Rd,
pick any data xi ∈ V, we have :
⇣ d`2 (x , V \ x ) d`2 (x , V \ x ) ⌘
i
i
i
min i
lim Exi max
=0
`2
d!1
dmin (xi , V \ xi )
<latexit sha1_base64="YTnTXgjhqJooijg/zZaBEhb+axM=">AAACwXichVFda9swFJW9rzb7aLo99kUsDFLYgl3KtpeOsjLYYwdLWogyI8vXiagkG+m6TRD+k3va/s3kNA9bOugFwdE593Cle/JaSYdJ8juKHzx89PjJzm7v6bPnL/b6+y8nrmqsgLGoVGUvc+5ASQNjlKjgsrbAda7gIr866/SLa7BOVuY7rmqYaT43spSCY6Cy/i+mpM58waycL5BbW90waUpctUxzXOS5/9JmfpnJlrLPcj6krLRceMoQlmi1L4IaOpftD89AqeyoHYbmtxPmALU0jaPhekjfbRmkucfQbo+437F+4CGjJ5QmvV7WHySjZF30Lkg3YEA2dZ71f7KiEo0Gg0Jx56ZpUuPMc4tSKGh7rHFQc3HF5zAN0HANbubXCbT0TWAKWlY2HIN0zf7t8Fw7t9J56OzW6ra1jvyfNm2w/Djz0tQNghG3g8pGUaxoFyctpAWBahUAF1aGt1Kx4CEiDKF3S0i3v3wXTI5G6fvR8bfjwWmyWccOOSCvyZCk5AM5JV/JORkTEX2KikhHJj6LZVzH9rY1jjaeV+Sfiv0fhUvdLA==</latexit>
Interpretation : All data are far away to each other with the same distance value !
Besides, loss of intuition in high dimensions, e.g. with the Normal distribution :
In low-dim, most data are concentrated at the center.
In high-dim, most data are concentrated on the surface.
R1
R1M
Most data
Most data
[1] Beyer, Goldstein, Ramakrishnan, Shaft, When is “nearest neighbor” meaningful? 1999
Xavier Bresson
1-D Gaussian
1,000,000-D Gaussian
19
20
Blessing of structure
What is the blessing of structure ?
Previously, the assumption “data are uniformly” distributed is actually not true for realworld data. Data have always properties, i.e. structures or invariances, such that they
belong to a low-dimensional space called manifold where (geodesic) distances are
meaningful.
Rd , d
1
Rd , d
1
Low-dim space/
Manifold
Uniform distribution of data
No structure/randomness
Xavier Bresson
Non-Uniform distribution of data
Structure/inductive bias
20
Outline
21
Graph theory
Graph categories
Basic definitions
Curse of dimensionality and structure
Manifolds and graphs
Spectral graph theory
Graph construction
Conclusion
Xavier Bresson
21
Manifold learning
22
It can be challenging to identify structures hidden in data because of
The curse of dimensionality (i.e. high-dimensional data).
Some data have easy structures, but most have complex ones.
A class of algorithms that extracts low-dimensional patterns is manifold learning (later discussed).
MNIST Image
Graph
Xavier Bresson
Graph of Text Documents
20newsgroups
22
23
From manifolds to graphs
Manifold assumption : High-dimensional data are sampled from
a low-dimensional manifold.
d
R , d
1
x 2 Rd
Example : Let x be a movie, each movie is defined by d
features/attributes like genre, actors, release year, origin
country, etc such that x in Rd. We can decide to make the
assumption that all movies form a manifold M in Rd.
Assumption validity : The manifold is a good working
hypothesis for
Several types of data, including images, text documents,
music, etc.
x 2 Rd
Rd , d
1
Most machine learning tasks e.g. classification,
visualization, recommendation.
However, it can also be limited as it can be a too crude
approximation of the (true) data distribution.
Xavier Bresson
Manifold M of
all movies
dim M ≪ d
23
24
From manifolds to graphs
Graphs can be regarded as a manifold sampling process.
The manifold information is represented by a neighborhood graph (observe that the
manifold is never directly observed or constructed).
Graph
construction
Sampling
Manifold
Data points
Graph
G = (V,E,A)
Neighborhood graphs :
Most populars are k-NN graphs defined as :
(
dist(xi ,xj )2
2
e
if j 2 Nik
Aij =
W
0 otherwise
<latexit sha1_base64="NwN8yQKYdK776nx0TQRevY3paHg=">AAAB7nicbVBNS8NAEJ34WetX1aOXxSJ4Kkkp6rHixWMF+wFtKJvttF272YTdjVBCf4QXD4p49fd489+4aXPQ1gcDj/dmmJkXxIJr47rfztr6xubWdmGnuLu3f3BYOjpu6ShRDJssEpHqBFSj4BKbhhuBnVghDQOB7WBym/ntJ1SaR/LBTGP0QzqSfMgZNVZq3/RT/jgr9ktlt+LOQVaJl5My5Gj0S1+9QcSSEKVhgmrd9dzY+ClVhjOBs2Iv0RhTNqEj7FoqaYjaT+fnzsi5VQZkGClb0pC5+nsipaHW0zCwnSE1Y73sZeJ/Xjcxw2s/5TJODEq2WDRMBDERyX4nA66QGTG1hDLF7a2EjamizNiEshC85ZdXSata8S4rtftauV7N4yjAKZzBBXhwBXW4gwY0gcEEnuEV3pzYeXHenY9F65qTz5zAHzifP81Wjy4=</latexit>
where dist(xi,xj) is a distance (to be decided) between xi and xj,
σ is the scale parameter (value depends on the dataset), and
Nik is the neighborhood of data xi.
Xavier Bresson
i
Nik=4
24
Outline
25
Graph theory
Graph categories
Basic definitions
Curse of dimensionality and structure
Manifolds and graphs
Spectral graph theory
Graph construction
Conclusion
Xavier Bresson
25
26
Spectral graph theory
Given a graph G = (V,E,A), spectral graph theory (SGT) can
Fan Chung
SGT book
1997
Find meaningful patterns that reveal multi-scale graph structures.
Process data defined on the graph domain (a.k.a. graph signal processing).
Boost performance of learning tasks s.a. clustering, classification, recommendation, etc.
The most fundamental tool in SGT is the graph Laplacian operator L.
Why is the Laplacian useful?
It is the central operator of diffusion processes (on graphs).
Basis functions of this operator are the well-known Fourier modes
(later discussed).
Pierre-Simon Laplace
(1749–1827)
Heat propagation using
Laplacian operator :
Fourier
Transform
st+1 = Lst
s0 = Heat source
(Dirac function)
Xavier Bresson
26
27
Graph Laplacian
Un-normalized/combinatorial graph Laplacian :
<latexit sha1_base64="75R+9jLduDP6KhqPnafG/i/q8Yg=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lKUY8VLx6r2FpoQ9lsJ+3SzSbsboRS+g+8eFDEq//Im//GTZuDtj4YeLw3w8y8IBFcG9f9dgpr6xubW8Xt0s7u3v5B+fCoreNUMWyxWMSqE1CNgktsGW4EdhKFNAoEPgbjm8x/fEKleSwfzCRBP6JDyUPOqLHS/XWpX664VXcOskq8nFQgR7Nf/uoNYpZGKA0TVOuu5ybGn1JlOBM4K/VSjQllYzrErqWSRqj96fzSGTmzyoCEsbIlDZmrvyemNNJ6EgW2M6JmpJe9TPzP66YmvPKnXCapQckWi8JUEBOT7G0y4AqZERNLKFPc3krYiCrKjA0nC8FbfnmVtGtV76Jav6tXGrU8jiKcwCmcgweX0IBbaEILGITwDK/w5oydF+fd+Vi0Fpx85hj+wPn8AcVIjNI=</latexit>
Lun = D
<latexit sha1_base64="TlKP0iA0rKLByzOFcVvdtW0tbYk=">AAAB8XicbVDLSgNBEOyNrxhfUY9eBoPgxbAbgnoRAnrw4CGCeWCyhNnJbDJkdnaZhxCW/IUXD4p49W+8+TdOkj1oYkFDUdVNd1eQcKa06347uZXVtfWN/GZha3tnd6+4f9BUsZGENkjMY9kOsKKcCdrQTHPaTiTFUcBpKxhdT/3WE5WKxeJBjxPqR3ggWMgI1lZ6vOulRkyubs5avWLJLbszoGXiZaQEGeq94le3HxMTUaEJx0p1PDfRfoqlZoTTSaFrFE0wGeEB7VgqcESVn84unqATq/RRGEtbQqOZ+nsixZFS4yiwnRHWQ7XoTcX/vI7R4aWfMpEYTQWZLwoNRzpG0/dRn0lKNB9bgolk9lZEhlhiom1IBRuCt/jyMmlWyt55uXpfLdUqWRx5OIJjOAUPLqAGt1CHBhAQ8Ayv8OYo58V5dz7mrTknmzmEP3A+fwDrDpBi</latexit>
A
W
with D is the degree matrix :
n⇥n
D = diag(d1 , ..., dn ),
X
Aij
di =
W
ij
<latexit sha1_base64="NwN8yQKYdK776nx0TQRevY3paHg=">AAAB7nicbVBNS8NAEJ34WetX1aOXxSJ4Kkkp6rHixWMF+wFtKJvttF272YTdjVBCf4QXD4p49fd489+4aXPQ1gcDj/dmmJkXxIJr47rfztr6xubWdmGnuLu3f3BYOjpu6ShRDJssEpHqBFSj4BKbhhuBnVghDQOB7WBym/ntJ1SaR/LBTGP0QzqSfMgZNVZq3/RT/jgr9ktlt+LOQVaJl5My5Gj0S1+9QcSSEKVhgmrd9dzY+ClVhjOBs2Iv0RhTNqEj7FoqaYjaT+fnzsi5VQZkGClb0pC5+nsipaHW0zCwnSE1Y73sZeJ/Xjcxw2s/5TJODEq2WDRMBDERyX4nA66QGTG1hDLF7a2EjamizNiEshC85ZdXSata8S4rtftauV7N4yjAKZzBBXhwBXW4gwY0gcEEnuEV3pzYeXHenY9F65qTz5zAHzifP81Wjy4=</latexit>
n = |V |
j
Normalized Laplacian (most popular) :
Graph sampling of M
is unbalanced
Robust w.r.t. unbalanced sampling
L=D
1/2
Lun D
1/2
= In
D
1/2
W D 1/2
A
<latexit sha1_base64="75R+9jLduDP6KhqPnafG/i/q8Yg=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lKUY8VLx6r2FpoQ9lsJ+3SzSbsboRS+g+8eFDEq//Im//GTZuDtj4YeLw3w8y8IBFcG9f9dgpr6xubW8Xt0s7u3v5B+fCoreNUMWyxWMSqE1CNgktsGW4EdhKFNAoEPgbjm8x/fEKleSwfzCRBP6JDyUPOqLHS/XWpX664VXcOskq8nFQgR7Nf/uoNYpZGKA0TVOuu5ybGn1JlOBM4K/VSjQllYzrErqWSRqj96fzSGTmzyoCEsbIlDZmrvyemNNJ6EgW2M6JmpJe9TPzP66YmvPKnXCapQckWi8JUEBOT7G0y4AqZERNLKFPc3krYiCrKjA0nC8FbfnmVtGtV76Jav6tXGrU8jiKcwCmcgweX0IBbaEILGITwDK/w5oydF+fd+Vi0Fpx85hj+wPn8AcVIjNI=</latexit>
Random Walk Laplacian (for Google PageRank, later discussed) :
L=D
1
Lun = In
D
1
W
A
<latexit sha1_base64="75R+9jLduDP6KhqPnafG/i/q8Yg=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lKUY8VLx6r2FpoQ9lsJ+3SzSbsboRS+g+8eFDEq//Im//GTZuDtj4YeLw3w8y8IBFcG9f9dgpr6xubW8Xt0s7u3v5B+fCoreNUMWyxWMSqE1CNgktsGW4EdhKFNAoEPgbjm8x/fEKleSwfzCRBP6JDyUPOqLHS/XWpX664VXcOskq8nFQgR7Nf/uoNYpZGKA0TVOuu5ybGn1JlOBM4K/VSjQllYzrErqWSRqj96fzSGTmzyoCEsbIlDZmrvyemNNJ6EgW2M6JmpJe9TPzP66YmvPKnXCapQckWi8JUEBOT7G0y4AqZERNLKFPc3krYiCrKjA0nC8FbfnmVtGtV76Jav6tXGrU8jiKcwCmcgweX0IBbaEILGITwDK/w5oydF+fd+Vi0Fpx85hj+wPn8AcVIjNI=</latexit>
Manifold M
All Laplacians are diffusion operators.
Xavier Bresson
27
Graph spectrum
28
Spectral graph theory can extract the modes of variation of the graph system.
How? With the eigenvalue decomposition (EVD) of the Laplacian operator L :
L = U ⇤U T 2 Rn⇥n with
<latexit sha1_base64="Dd1aC5HYaDiDcSFBeITJa4ZrwXw=">AAADdXicfVJbb9MwFHZbLiPcOnhBQkhmhW6TqpBUE/ASaRIvIO1hoGabVLeR47ipFccJtsOoovwC/h1v/A1eeMVJ02lswJESfec71886Yc6Z0o7zo9Pt3bh56/bWHevuvfsPHva3H52orJCE+iTjmTwLsaKcCeprpjk9yyXFacjpaZi8q+OnX6hULBMTvcrpLMWxYAtGsDZUsN35dgQ96KMjUxJh6M8nQ4iYQCnWyzAsP1XzUiDNUqqgqCDS9KuWaQnPmV7CCiFr6HvTInBHtm2PikDM/lk7apLnE9+M+xCIEUQX3ZhNbWiacyxiTmERJHBk/mWya0i5Jj1oIU4XGpUWCmnMRImlxKuq5LyyXDiEiZfsmhGOgZu+mV5Sec4UrSxERdRWWEiyeKntZqFWt7cpiRiOqz2zSc22sjae2P+/OMfbZJYoZaK6cF2z+eeNM64d0/Yy1yTAsRX0B47tNAavA7cFA9DacdD/jqKMFCkVmnCs1NR1cj0zQjUjvJZdKJpjkuCYTg0U2Kw7K5urqeBLw0RwkUnzCQ0b9nJFiVOlVmloMmvR6mqsJv8WmxZ68XZWMpEXmgqyHrQoONQZrE8QRkxSovnKAEwkM7tCssQSE20OtX4E96rk6+BkbLuv7YOPB4PDcfscW+Ap2AF7wAVvwCF4D46BD0jnZ/dJ93l3p/ur96z3ojdcp3Y7bc1j8If1Xv0GK9IVkQ==</latexit>
U = [u1 , ..., un ] 2 Rn⇥n ,
T
U U = In , i.e. huk , uk0 i =
⇤ = diag( 1 , ...,
0=
Interpretation:
min =
1
n) 2 R
⇢
n⇥n
2 ...
1 k = k0
,
0 otherwise
,
1 2
uk : Laplacian eigenvectors a.k.a. Fourier functions, i.e. vibration modes of the graph.
λk : Laplacian eigenvalues a.k.a. frequencies of the Fourier functions, i.e. how fast uk vibrate.
EVD answers the famous question ‘Can One Hear the Shape of a Drum’?[1]
[1] Kac, Can One Hear the Shape of a Drum, 1966
Xavier Bresson
28
Lab 2 : Spectrum of point cloud and grid
29
Run code02.ipynb and visualize the Fourier functions of a human body graph and a regular grid.
What is the main property of the smallest and largest eigenvectors?
Smallest eigenvectors ⇒ Smoothest modes of vibration, i.e. low-frequency information.
Largest eigenvectors ⇒ Highest frequencies of the graph, i.e. details or noise.
Fourier mode from a Graph =
Set of points or meshes in graphics
(e.g. 2D/3D shape recognition)
Xavier Bresson
Fourier mode from a Graph =
Regular grid
(e.g. JPEG image compression,
most used technique)
29
Spectrum of human brain
Brain connectivity
(white matter tracts
connecting brain regions)
Goal : Find brain activation patterns from structural MRI,
a.k.a. brain resting states (brain structure implies brain
function).
Methodology : Given G (brain connectivity graph) ⇒ design
a temporal graph ⇒ compute Laplacian L ⇒ compute
eigenvectors uk ⇒ visualize brain temporal activation
patterns.
30
Brain connectivity
graph
Time series
at this location
Dynamic activity
of the brain
Result : uk represent the dynamic patterns related to basic
functional brain tasks s.a. vision, body motor, language, etc.
How to construct the brain temporal graph?
Temporal edges
Brain temporal
graph
=
Static graph of
brain connectivity
Xavier Bresson
t1
t2
t3 time
Laplacian eigenvectors uk
a.k.a. brain resting states
(video)
30
Outline
31
Graph theory
Graph categories
Basic definitions
Curse of dimensionality and structure
Manifolds and graphs
Spectral graph theory
Graph construction
Conclusion
Xavier Bresson
31
How to construct graphs from data?
32
Three basic questions
Which type of graphs?
Which data distance?
Which data features?
Optimal graph construction
No universal recipe is available (no theory).
It depends on data and analysis task (empirical).
Domain expertise and good practice are essential.
Xavier Bresson
32
Type of constructed graphs
33
ɛ-graphs : Fully connected graphs, not practical.
Neighborhood graphs
k-NN graphs, i.e. sparse graphs by design.
Parameters
k : number of nearest neighbors, a common value is between {5,50}.
σ : positive scale parameter
Two options to compute scale σ
Global scale : σ = mean distance of all kth neighbors.
Local scale[1] : σi = distance of the kth neighbor for node i.
W
A ij =
<latexit sha1_base64="NwN8yQKYdK776nx0TQRevY3paHg=">AAAB7nicbVBNS8NAEJ34WetX1aOXxSJ4Kkkp6rHixWMF+wFtKJvttF272YTdjVBCf4QXD4p49fd489+4aXPQ1gcDj/dmmJkXxIJr47rfztr6xubWdmGnuLu3f3BYOjpu6ShRDJssEpHqBFSj4BKbhhuBnVghDQOB7WBym/ntJ1SaR/LBTGP0QzqSfMgZNVZq3/RT/jgr9ktlt+LOQVaJl5My5Gj0S1+9QcSSEKVhgmrd9dzY+ClVhjOBs2Iv0RhTNqEj7FoqaYjaT+fnzsi5VQZkGClb0pC5+nsipaHW0zCwnSE1Y73sZeJ/Xjcxw2s/5TJODEq2WDRMBDERyX4nA66QGTG1hDLF7a2EjamizNiEshC85ZdXSata8S4rtftauV7N4yjAKZzBBXhwBXW4gwY0gcEEnuEV3pzYeXHenY9F65qTz5zAHzifP81Wjy4=</latexit>
(
dist(xi ,xj )2
2
e
if j 2 Nik
0 otherwise
Adjacency matrix
with global scale
W
Aij
ij =
<latexit sha1_base64="NwN8yQKYdK776nx0TQRevY3paHg=">AAAB7nicbVBNS8NAEJ34WetX1aOXxSJ4Kkkp6rHixWMF+wFtKJvttF272YTdjVBCf4QXD4p49fd489+4aXPQ1gcDj/dmmJkXxIJr47rfztr6xubWdmGnuLu3f3BYOjpu6ShRDJssEpHqBFSj4BKbhhuBnVghDQOB7WBym/ntJ1SaR/LBTGP0QzqSfMgZNVZq3/RT/jgr9ktlt+LOQVaJl5My5Gj0S1+9QcSSEKVhgmrd9dzY+ClVhjOBs2Iv0RhTNqEj7FoqaYjaT+fnzsi5VQZkGClb0pC5+nsipaHW0zCwnSE1Y73sZeJ/Xjcxw2s/5TJODEq2WDRMBDERyX4nA66QGTG1hDLF7a2EjamizNiEshC85ZdXSata8S4rtftauV7N4yjAKZzBBXhwBXW4gwY0gcEEnuEV3pzYeXHenY9F65qTz5zAHzifP81Wjy4=</latexit>
(
dist(xi ,xj )2
i j
e
if j 2 Nik
0 otherwise
Adjacency matrix
with local scale
[1] Zelnik-Manor, Perona, Self-tuning spectral clustering, 2004
Xavier Bresson
33
34
Distance definition
xi
Euclidean/L2 distance :
xj
Good distance for low-dim data, e.g. d < 10.
Good distance for high-dim data with linearly separable data (e.g. MNIST).
v
u d
uX
d (x , x ) = kx
x k =t
|x
x |2
`2
i
j
i
d `2
j 2
i,m
j,m
m=1
Cosine/dot product distance :
Good distance for high-dim sparse data (e.g. text documents)
xj
⇣ hx , x i ⌘
i
j
1
dcos (xi , xj ) = cos
= |✓ij |
kxi k2 kxj k2
xi
L2-normalization
1
xi
= Projection on
kxi k2
L2 unit sphere
xi |✓ij | = dcos
1
d
x1
X=
n⇥d
n
xi
xn
kxi k2 = 1
Other analytical distances : Kullback-Leibler distance (information theory), Wasserstein
distance (optimal transport), etc.
Xavier Bresson
34
35
Data features
Types of data features
Raw features (e.g. movie features such as genre, actors, year, etc)
Hand-crafted features (e.g. SIFT in computer vision, etc)
Learned features (PCA, NMF, sparse coding, deep learning, etc)
It is generally unsuccessful to directly use the raw features for graph construction.
Issues are noise, unbalanced scaling, lack of expressiveness, curse of dimentionality, etc
For successful graph construction, raw data should be transformed into meaningful data
representation by designing new features, s.a. handcrafted or learned features.
x i 2 Rd
Raw data
Ex: Image pixels
Feature
Extraction
!
zi 2 R e
Aijij = e
W
<latexit sha1_base64="NwN8yQKYdK776nx0TQRevY3paHg=">AAAB7nicbVBNS8NAEJ34WetX1aOXxSJ4Kkkp6rHixWMF+wFtKJvttF272YTdjVBCf4QXD4p49fd489+4aXPQ1gcDj/dmmJkXxIJr47rfztr6xubWdmGnuLu3f3BYOjpu6ShRDJssEpHqBFSj4BKbhhuBnVghDQOB7WBym/ntJ1SaR/LBTGP0QzqSfMgZNVZq3/RT/jgr9ktlt+LOQVaJl5My5Gj0S1+9QcSSEKVhgmrd9dzY+ClVhjOBs2Iv0RhTNqEj7FoqaYjaT+fnzsi5VQZkGClb0pC5+nsipaHW0zCwnSE1Y73sZeJ/Xjcxw2s/5TJODEq2WDRMBDERyX4nA66QGTG1hDLF7a2EjamizNiEshC85ZdXSata8S4rtftauV7N4yjAKZzBBXhwBXW4gwY0gcEEnuEV3pzYeXHenY9F65qTz5zAHzifP81Wjy4=</latexit>
New features
Ex: Car features
Flower features
+
zi
Histogram
of features
Xavier Bresson
dist(xi ,xj )2
2
Aij
W
ij = e
<latexit sha1_base64="NwN8yQKYdK776nx0TQRevY3paHg=">AAAB7nicbVBNS8NAEJ34WetX1aOXxSJ4Kkkp6rHixWMF+wFtKJvttF272YTdjVBCf4QXD4p49fd489+4aXPQ1gcDj/dmmJkXxIJr47rfztr6xubWdmGnuLu3f3BYOjpu6ShRDJssEpHqBFSj4BKbhhuBnVghDQOB7WBym/ntJ1SaR/LBTGP0QzqSfMgZNVZq3/RT/jgr9ktlt+LOQVaJl5My5Gj0S1+9QcSSEKVhgmrd9dzY+ClVhjOBs2Iv0RhTNqEj7FoqaYjaT+fnzsi5VQZkGClb0pC5+nsipaHW0zCwnSE1Y73sZeJ/Xjcxw2s/5TJODEq2WDRMBDERyX4nA66QGTG1hDLF7a2EjamizNiEshC85ZdXSata8S4rtftauV7N4yjAKZzBBXhwBXW4gwY0gcEEnuEV3pzYeXHenY9F65qTz5zAHzifP81Wjy4=</latexit>
dist(zi ,zj )2
2
35
36
Standard pre-processing
Center data (along feature dimension) : zero-mean property
<latexit sha1_base64="51ZsV5JiMT29voZGJY8/y9tbv7Q=">AAACJnicbVBNSwMxEM36WetX1aOXYBH0YNkVUS+C4MVjFatCt5ZsOqvBJLsks2pZ9td48a948aCIePOnmK09+PUg8PLeDDPzolQKi77/7o2Mjo1PTFamqtMzs3PztYXFU5tkhkOLJzIx5xGzIIWGFgqUcJ4aYCqScBZdH5T+2Q0YKxJ9gv0UOopdahELztBJ3dreXVfQUEKMzJjklpbfDRoi3KFRuQKmi7Uwd2pYrNNQaBoqhldRlB8XF71qt1b3G/4A9C8JhqROhmh2a89hL+GZAo1cMmvbgZ9iJ2cGBZdQVMPMQsr4NbuEtqOaKbCdfHBmQVed0qNxYtzTSAfq946cKWv7KnKV5ZL2t1eK/3ntDOPdTi50miFo/jUoziTFhJaZ0Z4wwFH2HWHcCLcr5VfMMI4u2TKE4PfJf8npZiPYbmwdbdX3N4dxVMgyWSFrJCA7ZJ8ckiZpEU7uySN5Ji/eg/fkvXpvX6Uj3rBnifyA9/EJOVylkg==</latexit>
xi
xi
mean({xi }) 2 Rd
Normalize data variance (along feature dimension) : z-scoring property
<latexit sha1_base64="QPoRt9j0O5E0AWpjVToYbXGQ3SU=">AAACmnicbVFNbxMxEPUuXyV8BThwgMOIiCocSHejquVSqQKJD3EpqGkrxenK6/UmprZ3Zc/SRKv9UfwVbvwbvGmooOlIIz2990Zjv0lLJR1G0e8gvHHz1u07G3c79+4/ePio+/jJkSsqy8WIF6qwJylzQkkjRihRiZPSCqZTJY7Ts/etfvxDWCcLc4iLUkw0mxqZS87QU0n35zyRQJXIkVlbnMMmUIAlB1u+KYo5Wl07zJo+rb1Am9dApQGqGc7StP7WnGaUdjYvrecSZ9BcP3k6hL1LRQtmWgloKqf9efId3qxpfxd6RzvscSfp9qJBtCxYB/EK9MiqDpLuL5oVvNLCIFfMuXEclTipmUXJlWg6tHKiZPyMTcXYQ8O0cJN6GW0DrzyTQV5Y3wZhyf47UTPt3EKn3tkG4q5qLXmdNq4wfzuppSkrFIZfLMorBVhAeyfIpBUc1cIDxq30bwU+Y5Zx9NdsQ4ivfnkdHA0H8c5g++t2b3+4imODPCcvSZ/EZJfsk0/kgIwID54Fe8GH4GP4InwXfg6/XFjDYDXzlPxX4eEfufjJtQ==</latexit>
xi
xi / std({xi }) 2 Rd
with std({xi })2 = mean({ xj
2
mean({xi }) })
d
Project data on L2-sphere (along feature dimension or data dimension) :
<latexit sha1_base64="M52OvkWRIoxqb4aQ+irUvKNk5pQ=">AAACIHicbVC7TsMwFHV4lvIKMLJYVEhMJakqyliJhbEg+pDqEjmu01p1nMh2gCrtp7DwKywMIAQbfA1O2wFajnSl43Pule89fsyZ0o7zZS0tr6yurec28ptb2zu79t5+Q0WJJLROIh7Jlo8V5UzQumaa01YsKQ59Tpv+4CLzm3dUKhaJGz2MaSfEPcECRrA2kmdXHjwGEaeBxlJG9xBO3vDUFBoZjkZeCSImIAqx7vt+ej2+7SKU9+yCU3QmgIvEnZECmKHm2Z+oG5EkpEITjpVqu06sOymWmhFOx3mUKBpjMsA92jZU4JCqTjo5cAyPjdKFQSRNCQ0n6u+JFIdKDUPfdGZrqnkvE//z2okOzjspE3GiqSDTj4KEQx3BLC3YZZISzYeGYCKZ2RWSPpaYaJNpFoI7f/IiaZSK7lmxfFUuVEuzOHLgEByBE+CCCqiCS1ADdUDAI3gGr+DNerJerHfrY9q6ZM1mDsAfWN8/V9uhzw==</latexit>
xi
xi / kxi k2 2 Rd
X=
Along feature
dimension
n
Normalize max and min of feature value :
<latexit sha1_base64="jeOwkUhUbTa0JNQ9rOZkVVVco4M=">AAACQHicbZDNS8MwGMbT+TXnV9Wjl+AQFHS0Q9TjwItHBeeEpo40S7ewJC1Jqo6yP82Lf4I3z148KOLVk+kcos4XAs/7PO9Lkl+UcqaN5z06panpmdm58nxlYXFpecVdXbvQSaYIbZKEJ+oywppyJmnTMMPpZaooFhGnrah/XOSta6o0S+S5GaQ0FLgrWcwINtZqu63bNoOI09hgpZIbiGKFSV6YexAJJrdR0aDhzjBHAt9+txMxhIhJGHi7fnjVQajtVr2aNyo4KfyxqIJxnbbdB9RJSCaoNIRjrQPfS02YY2UY4XRYQZmmKSZ93KWBlRILqsN8BGAIt6zTgXGi7JEGjtyfGzkWWg9EZCcFNj39NyvM/7IgM/FRmDOZZoZK8nVRnHFoEljQhB2mKDF8YAUmitm3QtLDFqGxzCsWgv/3y5Piol7zD2r7Z/vVRn2Moww2wCbYBj44BA1wAk5BExBwB57AC3h17p1n5815/xotOeOddfCrnI9PB8eutQ==</latexit>
xi
Xavier Bresson
xi min({xi })
2 [0, 1]d
max({xi }) min({xi })
Along data
dimension
36
Lab 3 : Graph construction for two-moon
37
Run code03.ipynb and study data pre-processing, construction of k-NN graphs, visualization of
distances, adjacency matrix, and graph quality with clustering accuracy.
Xavier Bresson
37
Lab 4 : Graph construction for text documents
38
Run code04.ipynb and construct graph text documents.
Xavier Bresson
38
Outline
39
Graph theory
Graph categories
Basic definitions
Curse of dimensionality and structure
Manifolds and graphs
Spectral graph theory
Graph construction
Conclusion
Xavier Bresson
39
Summary
40
Graphs can represent complex and heterogenous relationships between data.
They help to go beyond the standard assumption that data are i.i.d. (independent and
identically distributed) as they explicitly leverage the connections between pairs of data.
Any dataset and task that use explicit relations between data is a graph-based task.
When we begin looking for graphs, we discover they are ubiquitous! J
Graphs offer an augmented representation of data.
With data feature X and data relationship depicted by G = (V,E,A).
Xavier Bresson
40
Summary
41
First fundamental tool of graph science :
Adjacency matrix A
It reveals global structures in data relationship (graph spectrum).
It can visualize graphs in 2D or 3D Euclidean spaces (later discussed).
It can boost performance of machine learning techniques (later discussed).
Second fundamental tool of graph science :
Graph Laplacian matrix L
It is a diffusion operator -- It propagates information on graphs (parabolic PDEs).
It is used to reveal modes of variation of the graph system.
It is used for image compression (jpeg), neuroscience (brain activity), positional
encoding (later discussed), etc.
Xavier Bresson
41
42
Pipeline
Feature
Extraction
Step #1: Data ⇒ Graph
Skip this step if graph is given,
e.g. molecule, social network, etc
Step #2: Graph ⇒ Analysis
Spectral
Graph Theory/
others
High-dim
raw data
Humanengineered/
Learned
Graph
Analysis
Identify
Patterns
Graph
Construction
Data
Features
Graph
Domain
Expertise
Good
Practice
Unsupervised
Learning
Supervised
Learning
Graph
Make use of
graphs
Recommen
dation
Tasks
Visualization
Xavier Bresson
Feature
Extraction
42
43
Questions?
Xavier Bresson
43
0
You can add this document to your study collection(s)
Sign in Available only to authorized usersYou can add this document to your saved list
Sign in Available only to authorized users(For complaints, use another form )