Link Prediction Analysis through Machine Learning, Supervised and Unsupervised
Shashank Golla
Aaron Kemmer
1. What is Link Prediction?
Problem Definition
• Let graph G(V, E) represent a network of vertices
(V) and edges (E). Consider G₁ and G₂ to be two
instances of G at different times (which may
contain different edges or “relationships”), such that
Time(G₁) < Time(G₂). Using only information
from G₁, generate a set of edges E′ such that any e
∈ E′ does not exist in G₁ but is expected to exist in
G₂.
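A minimal sketch of this definition, assuming each graph instance is stored as a set of directed (source, target) pairs. The function names `candidate_edges` and `new_edges` are hypothetical and chosen only for illustration.

```python
def candidate_edges(nodes, edges_g1):
    """All ordered pairs (u, v) with u != v that are absent from G1.

    Any predicted edge set must be drawn from these candidates.
    """
    existing = set(edges_g1)
    return {
        (u, v)
        for u in nodes
        for v in nodes
        if u != v and (u, v) not in existing
    }


def new_edges(edges_g1, edges_g2):
    """Edges that appear in the later instance G2 but not in G1.

    This is the ground truth a predictor is trying to recover.
    """
    return set(edges_g2) - set(edges_g1)


nodes = ["a", "b", "c"]
g1 = [("a", "b")]
g2 = [("a", "b"), ("b", "c")]
candidates = candidate_edges(nodes, g1)
truth = new_edges(g1, g2)
```

In this toy example the only edge that forms between the two snapshots is ("b", "c"), and it is one of the five candidate pairs missing from G₁.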
Link Prediction example
◎ Consider Facebook friend
recommendations: the “people you may
know” list. That is, essentially, a
collection of edges between you and
other users that the system believes
may belong in the graph representing
the social network.
Example: Twitter Sample
A directed, observed network for a single
user or “Ego” node with a small follow
network
Methodology
We’ll show you the
right people to follow!
Basic Algorithm
The basic algorithm is simple: remove edges
from your observed graph, rank candidate new edges based
on heuristics, pick the top k edges, and evaluate
the effectiveness of the predictions.
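The steps above can be sketched as follows. This is an illustration under stated assumptions, not the presenters' implementation: the graph is a set of directed "follows" pairs, the helper names `hold_out` and `predict_top_k` are hypothetical, and the ranking heuristic (counting accounts w such that u follows w and w follows v) is one simple choice swapped in for demonstration.

```python
import random
from itertools import permutations


def hold_out(edges, fraction, seed=0):
    """Remove a random fraction of edges; return (observed, hidden)."""
    rng = random.Random(seed)
    edge_list = sorted(set(edges))
    n_hidden = min(len(edge_list), max(1, int(len(edge_list) * fraction)))
    hidden = set(rng.sample(edge_list, n_hidden))
    return set(edge_list) - hidden, hidden


def predict_top_k(observed, k):
    """Rank every missing directed pair and return the k best.

    Assumed heuristic: the score of (u, v) is the number of accounts w
    such that u follows w and w follows v.
    """
    nodes = sorted({n for edge in observed for n in edge})
    follows, followed_by = {}, {}
    for u, v in observed:
        follows.setdefault(u, set()).add(v)
        followed_by.setdefault(v, set()).add(u)

    scored = []
    for u, v in permutations(nodes, 2):
        if (u, v) in observed:
            continue
        score = len(follows.get(u, set()) & followed_by.get(v, set()))
        scored.append(((u, v), score))

    # Highest score first; ties broken by edge name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [edge for edge, _ in scored[:k]]


observed = {("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")}
top = predict_top_k(observed, k=1)
```

Here "a" follows both "b" and "d", and both of them follow "c", so ("a", "c") is ranked first. In a full experiment, `hold_out` would be applied first and the predictions compared against the hidden edges.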
Comparison of Heuristics
Testing Link Prediction
Here, E″ represents
the edges that exist
together with the edges you
have deleted. E_new
represents the edges
you’ve predicted in
your network. The
success region
is the overlap between the
two.
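One way to quantify the success region is sketched below, assuming the deleted edges and the predicted edges are both available as collections of (source, target) pairs. The function name `evaluate` and the choice of precision and recall as summary numbers are assumptions made for illustration.

```python
def evaluate(predicted, hidden):
    """Compare predicted edges with the edges that were deleted.

    Returns the overlap (the success region) plus two ratios:
    precision = hits / predictions made,
    recall    = hits / edges that were hidden.
    """
    predicted, hidden = set(predicted), set(hidden)
    hits = predicted & hidden
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(hidden) if hidden else 0.0
    return hits, precision, recall


predicted = [("a", "c"), ("b", "d")]
hidden = {("a", "c"), ("c", "a")}
hits, precision, recall = evaluate(predicted, hidden)
```

In this toy case one of the two predictions is a deleted edge, and one of the two deleted edges is recovered, so both ratios are 0.5.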
Our process
• Get Twitter data sets that have the feature
sets we feel are important.
• Run different heuristics/algorithms on those
data sets.
• Test whether the results from the algorithms are
accurate, and quantify that accuracy.
Feature Sets
• Proximity features: text analysis (data
permitting).
• Aggregate features: number of similar
“Following”, with more weight given to users
with more Followers.
• Topological features: clustering index.
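The aggregate and topological features could be computed roughly as below. This is a sketch under assumptions: the slides do not specify the weighting, so the logarithmic follower weight is a guess, "clustering index" is taken to mean the local clustering coefficient on an undirected, symmetric view of the graph, and both function names are hypothetical.

```python
import math


def weighted_shared_following(following, follower_count, u, v):
    """Aggregate feature: accounts that both u and v follow.

    Each shared account contributes log(1 + followers), so accounts
    with more followers weigh more (assumed weighting).
    """
    shared = following.get(u, set()) & following.get(v, set())
    return sum(math.log1p(follower_count.get(w, 0)) for w in shared)


def local_clustering(neighbors, u):
    """Topological feature: fraction of u's neighbor pairs that are
    themselves connected. Expects a symmetric adjacency dict."""
    nbrs = neighbors.get(u, set()) - {u}
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for a in nbrs
        for b in nbrs
        if a < b and b in neighbors.get(a, set())
    )
    return 2 * links / (k * (k - 1))


following = {"u": {"x", "y"}, "v": {"y", "z"}}
follower_count = {"y": 9}
shared_score = weighted_shared_following(following, follower_count, "u", "v")

neighbors = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
clustering_a = local_clustering(neighbors, "a")
```

Either value can then be used as a score for ranking candidate edges or as an input feature to a supervised classifier.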
Data Sets(Twitter)
• We are using Twitter data sets for research
purposes from Stanford SNAP and ASU. We
are only using an observed network of the data
available.
• We used the Twitter API to gather
more data ourselves. We intend to use data
from our own accounts for testing.
Results/Plan
• So far we’ve mined all the data sets that we
need using the Twitter API.
• We are in the process of writing the code
necessary for the algorithms.
• We already have a process in place to
test and benchmark each of the algorithms.
Conclusion
• In conclusion, we feel that link prediction
is a very important problem that needs to be
more domain-oriented rather than generalized.
• For Twitter specifically, we had to do a lot of
tweaking of the feature sets.
• In the future we hope to be able to do this
across different social networks.
Citations
1. “Lecture 24 – Santa Fe Institute.” 2013. 30 Mar. 2016
<http://tuvalu.santafe.edu/~aaaronc/courses/5454/csci5454_sprint2013_CSL10.pdf>