Spectrum Based Fraud Detection in Social Networks
Xiaowei Ying, Xintao Wu, Daniel Barbara
Random Link Attack
Shrivastava et al., ICDE 2008
An abstraction of collaborative attacks such as spam, viral marketing, and individual re-identification via active/passive attacks.
The attacker creates some fake nodes and uses them to attack a large set of randomly selected regular nodes.
The fake nodes also mimic the real graph structure among themselves to evade detection.
Topology Approach
Shrivastava et al., ICDE 2008
Idea
Count the external triangles around each node: the neighbors of a regular user form many triangles, but an attacker's random victims do not.
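The triangle intuition can be sketched in Python. This is a simplified proxy, not Shrivastava et al.'s exact external-triangle statistic: for each node it counts the pairs of its neighbors that are themselves linked. The function `triangles_per_node` and the dict-of-sets graph format are our own illustrative choices.

```python
from itertools import combinations


def triangles_per_node(adj):
    """Count the triangles each node belongs to.

    adj: dict mapping each node to the set of its neighbors (undirected).
    A triangle through u is a pair of neighbors (v, w) of u that are linked.
    """
    counts = {}
    for u, nbrs in adj.items():
        counts[u] = sum(1 for v, w in combinations(sorted(nbrs), 2) if w in adj[v])
    return counts


# Toy graph: nodes 0-1-2 form a triangle; node 4 links to three nodes
# (3, 5, 6) that share no links, as an attacker's random victims would.
edges = [(0, 1), (1, 2), (0, 2), (0, 3), (4, 3), (4, 5), (4, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

counts = triangles_per_node(adj)
print(counts)  # {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0}
```

Node 4 plays the role of an attacker whose randomly chosen victims share no links, so it sits in no triangle.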
Algorithm
detecting suspects
clustering test and neighborhood independence test
detecting RLAs
GREEDY and TRWALK
Limitation
too many parameters
high computational cost
difficulty detecting multiple coexisting RLAs
Our Approach
Examine the spectral space of the graph topology.
Graph: undirected, unweighted, and unsigned; link/node attribute information is not considered.
Adjacency Matrix A (symmetric)
Adjacency Eigenspace
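A minimal sketch of building the symmetric adjacency matrix A for such a graph, assuming nodes are labeled 0..n-1 and the graph arrives as an edge list (the helper `adjacency_matrix` is hypothetical).

```python
import numpy as np


def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected, unweighted graph.

    Nodes are assumed to be labeled 0..n-1; duplicate edges collapse to a
    single 1 and self-loops are ignored.
    """
    A = np.zeros((n, n))
    for u, v in edges:
        if u != v:
            A[u, v] = A[v, u] = 1.0
    return A


A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 2)])
assert np.array_equal(A, A.T)   # undirected graph -> symmetric matrix
print(A.sum(axis=1))            # node degrees: [2. 2. 3. 1.]
```

Symmetry is what lets the eigenspace analysis use a real, orthonormal set of eigenvectors.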
Adjacency Eigenspace
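Spectral coordinate of node u: $\alpha_u = (x_{1u}, x_{2u}, \ldots, x_{ku})$ (Ying and Wu, SDM09)
$$
\begin{array}{c|cccc}
 & \lambda_1 & \lambda_2 & \cdots & \lambda_k \\ \hline
\text{node } 1 & x_{11} & x_{21} & \cdots & x_{k1} \\
\text{node } 2 & x_{12} & x_{22} & \cdots & x_{k2} \\
\vdots & \vdots & \vdots & & \vdots \\
\text{node } n & x_{1n} & x_{2n} & \cdots & x_{kn}
\end{array}
$$
Column i is the eigenvector $x_i$ of A for eigenvalue $\lambda_i$; row u is the spectral coordinate $\alpha_u$.
[Figure: Polbook Network]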
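The spectral coordinates can be computed with NumPy's symmetric eigensolver. A sketch, assuming the k largest (algebraic) eigenvalues are the ones kept; the helper name `spectral_coordinates` is hypothetical.

```python
import numpy as np


def spectral_coordinates(A, k):
    """Leading k eigenpairs of a symmetric adjacency matrix A.

    Returns (lam, X): lam holds the k largest eigenvalues in decreasing
    order and X is n x k, with column i the eigenvector x_i and row u the
    spectral coordinate alpha_u = (x_1u, ..., x_ku).
    """
    eigvals, eigvecs = np.linalg.eigh(A)        # ascending, orthonormal columns
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    return eigvals[order], eigvecs[:, order]


A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
lam, X = spectral_coordinates(A, k=2)
alpha_2 = X[2]   # spectral coordinate of node 2
```

Each eigenvector is determined only up to sign, so spectral plots may come out mirrored along some axes between runs or libraries.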
Spectrum Based Fraud Detection
Viewing an RLA as a matrix perturbation
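One way to write the attacked graph as a perturbation, sketched under our own assumptions: the n regular nodes are indexed first and the m attacking nodes last, A is the n x n adjacency matrix among regular nodes, C is the m x m matrix among attackers, and B is the m x n attacker-to-victim matrix with $b_{pu} = 1$ when attacker p links to victim u. The symbols $\tilde{A}$, B, C, and E are our notation, not necessarily the paper's.

$$
\tilde{A}
= \begin{pmatrix} A & B^{T} \\ B & C \end{pmatrix}
= \underbrace{\begin{pmatrix} A & 0 \\ 0 & C \end{pmatrix}}_{\text{no attack links}}
+ \underbrace{\begin{pmatrix} 0 & B^{T} \\ B & 0 \end{pmatrix}}_{E}
$$

The perturbation E collects exactly the attack links, so the attack can be studied by how E moves the eigenpairs of the unattacked block-diagonal matrix.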
Spectrum Based Fraud Detection
Approximate the spectral coordinate
Approximation
Approximate the eigenvectors under a random link attack:
Attacking nodes: first-order and second-order approximations
Regular nodes
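A sketch of where first- and second-order terms for the attacking nodes can come from. This is our own derivation for illustration and the paper's statement and conditions may differ. Assume regular nodes are indexed first and attackers last, so the attacked matrix is $\tilde{A} = \begin{pmatrix} A & B^{T} \\ B & C \end{pmatrix}$ with A among regular nodes, B the m x n attacker-victim links, and C among attackers. Write an eigenpair of $\tilde{A}$ as $\tilde{\lambda}_i$ with eigenvector $(\tilde{x}_i; \tilde{y}_i)$, where $\tilde{y}_i$ holds the attackers' entries. The attacker rows of the eigen-equation give

$$
B\tilde{x}_i + C\tilde{y}_i = \tilde{\lambda}_i \tilde{y}_i
\quad\Longrightarrow\quad
\tilde{y}_i = (\tilde{\lambda}_i I - C)^{-1} B \tilde{x}_i .
$$

Assuming $\tilde{\lambda}_i \approx \lambda_i > 0$, $\tilde{x}_i \approx x_i$ (the eigenpair of A), and $\|C\|_2 < \lambda_i$, expand the inverse as a Neumann series:

$$
\tilde{y}_i \approx \frac{1}{\lambda_i}\Big(I + \frac{C}{\lambda_i} + \frac{C^{2}}{\lambda_i^{2}} + \cdots\Big) B x_i
\approx \underbrace{\frac{1}{\lambda_i} B x_i}_{\text{first order}}
+ \underbrace{\frac{1}{\lambda_i^{2}} C B x_i}_{\text{second order}}
$$

Entry p of the first-order term is $\frac{1}{\lambda_i}\sum_{u \in \Gamma(p)} x_{iu}$, where $\Gamma(p)$ is the victim set of attacker p: to first order an attacker sits at the scaled sum of its victims' coordinates, and the attackers' inner structure C enters only from the second-order term on.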
Illustrating network data
Network of political blogs on the 2004 U.S. election (polblogs: 1,222 nodes, 16,714 edges)
The blogs were labeled either liberal or conservative.
Illustrating example
Political blogs (1,222 nodes, 16,714 edges): each node is labeled either liberal or conservative.
Add one RLA with 20 attacking nodes that have the same degree distribution as the regular nodes.
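A sketch of injecting one RLA into a graph, assuming regular nodes come first and attackers last. A random graph stands in for polblogs, and the attackers' inner structure C is a random graph rather than one matched to the regular degree distribution; `inject_rla`, `random_graph`, and all parameter values are hypothetical (20 attackers with 30 victims each mirrors the polblogs setting used later).

```python
import numpy as np


def inject_rla(A, C, victims_per_attacker, rng):
    """Append m attacking nodes to the n x n regular graph A.

    C is the m x m symmetric 0/1 matrix of links among the attackers (their
    inner structure). Each attacker links to `victims_per_attacker` regular
    nodes chosen uniformly at random without replacement.
    Returns the attacked matrix [[A, B^T], [B, C]] and the m x n matrix B.
    """
    n, m = A.shape[0], C.shape[0]
    B = np.zeros((m, n))
    for p in range(m):
        victims = rng.choice(n, size=victims_per_attacker, replace=False)
        B[p, victims] = 1.0
    return np.block([[A, B.T], [B, C]]), B


def random_graph(n, p, rng):
    """Symmetric 0/1 Erdos-Renyi graph used as a stand-in for real data."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)


rng = np.random.default_rng(0)
A = random_graph(200, 0.05, rng)    # stand-in for the regular graph
C = random_graph(20, 0.2, rng)      # hypothetical inner structure of 20 attackers
A_tilde, B = inject_rla(A, C, victims_per_attacker=30, rng=rng)
```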
Problem
We do not know which nodes in the graph topology are attackers or victims.
For random link attacks, however, we can derive the distribution of the attacking nodes' spectral coordinates.
Distribution of attackers' spectral coordinates
The spectral coordinate of attacking node p has a normal distribution whose mean and variance are bounded.
This gives the region of the spectral space where RLA attacking nodes appear with high probability.
The inner structure of the attackers does not affect this region.
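Under a first-order approximation, attacker p's i-th coordinate is $1/\lambda_i$ times the sum of $x_{iu}$ over its victims. If the d victims are drawn uniformly at random without replacement (our reading of the RLA model), the mean and variance of that quantity follow from finite-population sampling. The sketch below computes them and checks the formulas by enumerating every victim set; `first_order_moments` is a hypothetical name.

```python
from itertools import combinations


def first_order_moments(x_i, lam_i, d):
    """Mean and variance of (1 / lam_i) * (sum of x_i over d victims), where
    the d victims are drawn uniformly without replacement from n nodes."""
    n = len(x_i)
    mu = sum(x_i) / n
    sigma2 = sum((v - mu) ** 2 for v in x_i) / n        # population variance
    mean = d * mu / lam_i
    var = d * sigma2 * (n - d) / (n - 1) / lam_i ** 2    # finite-population correction
    return mean, var


# Check against exhaustive enumeration of all C(6, 3) = 20 victim sets.
x = [0.1, 0.5, -0.3, 0.7, 0.2, -0.4]
lam, d = 2.0, 3
vals = [sum(s) / lam for s in combinations(x, d)]
m_emp = sum(vals) / len(vals)
v_emp = sum((v - m_emp) ** 2 for v in vals) / len(vals)
mean, var = first_order_moments(x, lam, d)
```

If $x_i$ is a unit eigenvector, $\sum_u x_{iu}^2 = 1$ gives a population variance of at most $1/n$, and Cauchy-Schwarz gives $|\sum_u x_{iu}| \le \sqrt{n}$, so $|\text{mean}| \le d/(\sqrt{n}\,\lambda_i)$ and $\text{var} \le d/(n\lambda_i^2)$. These are illustrative bounds for this sketch; the paper's bounds may be stated differently.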
polblogs (1,222 nodes, 16,714 edges): 20 attackers, each attacking 30 randomly selected victims
Using node non-randomness
Checking every dimension one by one is tedious.
Instead, use the node non-randomness of RLA attackers.
We derive upper bounds on its mean and variance, which give the decision line.
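A sketch of computing node non-randomness for every node, assuming the definition $R_u = \sum_{i=1}^{k} \lambda_i x_{iu}^2$ (our reading of Ying and Wu, SDM09). The helper name `node_nonrandomness` is hypothetical, and the decision line itself is not reproduced here.

```python
import numpy as np


def node_nonrandomness(A, k):
    """R_u = sum_{i=1..k} lambda_i * x_iu**2 for every node u, using the k
    largest eigenpairs of the symmetric adjacency matrix A."""
    eigvals, eigvecs = np.linalg.eigh(A)
    top = np.argsort(eigvals)[::-1][:k]
    lam, X = eigvals[top], eigvecs[:, top]
    return (X ** 2) @ lam      # squares make R_u independent of eigenvector signs


A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
R = node_nonrandomness(A, k=2)
```

Because each eigenvector has unit length, the values sum to $\lambda_1 + \cdots + \lambda_k$; with k = n they reduce to the diagonal of A, which is zero for a graph without self-loops.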
Identifying suspects
The node non-randomness of RLA attackers
Nodes below the decision line are suspects
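A sketch of the suspect-selection step. Because the decision line's formula is not reproduced here, `line_values` is a hypothetical input holding the line evaluated at each node; the function `flag_suspects` is likewise hypothetical. Suspects are returned with the node furthest below the line first.

```python
import numpy as np


def flag_suspects(R, line_values):
    """Return the indices of nodes whose non-randomness lies below the
    decision line, most suspicious (furthest below the line) first.

    R: node non-randomness values, one per node.
    line_values: the decision line evaluated at each node (array or scalar).
    """
    R = np.asarray(R, dtype=float)
    gap = np.broadcast_to(np.asarray(line_values, dtype=float), R.shape) - R
    below = np.flatnonzero(gap > 0)
    return below[np.argsort(-gap[below])]


suspects = flag_suspects([2.1, 0.3, 1.8, 0.9], line_values=1.0)
print(suspects)  # [1 3]
```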
RLAs with varied inner structure
SPCTRA Algorithm
Evaluation
Topology-based RLA detection approach (Shrivastava et al., ICDE 2008)
clustering test and neighborhood independence test
GREEDY and TRWALK
Experimental Setting
Political blogs (1,222 nodes, 16,714 edges): add 1 RLA with 20 attackers
Web Spam Challenge data (114K nodes, 1.8M links): add a mix of 8 RLAs with varied sizes and connection patterns
Accuracy
Evaluation on Web Spam Challenge data
A snapshot of websites in the .uk domain (2007)
SPCTRA: based on the spectral space
GREEDY: based on outer triangles [Shrivastava, ICDE, 2008]
Execution time
TRWALK is 10 times faster than GREEDY (but less accurate), yet still 100 times slower than SPCTRA.
The complexity analysis is given in the paper.
Bipartite Core Attacks
The attacker creates two types of nodes:
Accomplices: behave like normal users, except that they connect heavily to fraudsters to boost the fraudsters' ratings.
Fraudsters: nodes that actually commit fraud; they connect mostly to accomplices.
There are no links among accomplices or among fraudsters.
Figure from: Duen Horng Chau et al., Detecting Fraudulent Personalities in Networks of Online Auctioneers
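A sketch of generating a bipartite core attack as described above. The node ordering, the link probability `p_core`, the accomplices' `accomp_degree` random links to regular nodes, the stand-in random graph, and the function name `inject_bipartite_core` are all hypothetical choices; 20 fraudsters and 30 accomplices match the experiment size reported for this attack.

```python
import numpy as np


def inject_bipartite_core(A, n_fraud, n_accomp, p_core, accomp_degree, rng):
    """Append accomplices and fraudsters to the n x n regular graph A.

    Node order in the result: regular nodes, then accomplices, then fraudsters.
    Fraudsters link only to accomplices (each pair with probability p_core);
    accomplices additionally link to `accomp_degree` random regular nodes.
    No links are added among accomplices or among fraudsters.
    """
    n = A.shape[0]
    N = n + n_accomp + n_fraud
    M = np.zeros((N, N))
    M[:n, :n] = A
    acc = np.arange(n, n + n_accomp)
    fra = np.arange(n + n_accomp, N)
    core = (rng.random((n_accomp, n_fraud)) < p_core).astype(float)
    M[np.ix_(acc, fra)] = core          # accomplice -> fraudster links
    M[np.ix_(fra, acc)] = core.T        # keep the matrix symmetric
    for q in acc:                       # accomplices mimic normal users
        nbrs = rng.choice(n, size=accomp_degree, replace=False)
        M[q, nbrs] = 1.0
        M[nbrs, q] = 1.0
    return M


rng = np.random.default_rng(1)
upper = np.triu(rng.random((300, 300)) < 0.03, k=1)
A = (upper | upper.T).astype(float)     # stand-in regular graph
M = inject_bipartite_core(A, n_fraud=20, n_accomp=30, p_core=0.6,
                          accomp_degree=10, rng=rng)
```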
Bipartite Core Attacks
[Figure: bipartite core]
20 fraudsters and 30 accomplices.
DDoS attacks
The attacker controls 10% of the normal nodes and uses them to attack one victim node.
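A sketch of this DDoS scenario, assuming the controlled nodes are picked uniformly at random among the other nodes (the slide only states the 10% fraction). The function `inject_ddos`, the random stand-in graph, and the choice of node 0 as victim are hypothetical.

```python
import numpy as np


def inject_ddos(A, victim, fraction, rng):
    """Link a randomly chosen `fraction` of the nodes (excluding the victim)
    to `victim`. Returns the modified copy of A and the controlled nodes."""
    n = A.shape[0]
    others = np.setdiff1d(np.arange(n), [victim])
    k = int(round(fraction * n))
    controlled = rng.choice(others, size=k, replace=False)
    M = A.copy()
    M[controlled, victim] = 1.0
    M[victim, controlled] = 1.0
    return M, controlled


rng = np.random.default_rng(2)
upper = np.triu(rng.random((500, 500)) < 0.02, k=1)
A = (upper | upper.T).astype(float)     # stand-in regular graph
M, controlled = inject_ddos(A, victim=0, fraction=0.10, rng=rng)
```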
Conclusion
Present a framework that exploits the spectral space of
graph topology to detect attacks.
Theoretical analysis shows that attackers lie in a different region of the spectral space from regular nodes.
Develop the SPCTRA algorithm for detecting RLAs.
Demonstrate its effectiveness and efficiency through
empirical evaluation.
Future Work
Explore other attack scenarios in both social networks and communication networks.
In Sybil attacks, attackers may choose victims deliberately rather than randomly.
Track how the graph evolves over time.
Thank You!
Questions?
Acknowledgments
This work was done in collaboration with Xiaowei Ying and Daniel Barbara and was supported in part by U.S. National Science Foundation grants IIS-0546027, CNS-0831204, and CCF-1047621.
Another Example
Adjacency Eigenspace
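Spectral coordinate of node i: $\alpha_i = (x_{i1}, x_{i2}, \ldots, x_{ik})$ (Ying and Wu, SDM09)
$$
\begin{array}{c|cccc}
 & \lambda_1 & \lambda_2 & \cdots & \lambda_k \\ \hline
\text{node } 1 & x_{11} & x_{12} & \cdots & x_{1k} \\
\text{node } 2 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & & \vdots \\
\text{node } n & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{array}
$$
[Figure: Polbook Network]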