Analysis and modeling of social networks.

Foudalis Ilias
Introduction



Online social networks have become a ubiquitous part of everyday life
Opportunity to study social interactions in a large-scale worldwide environment
Why model such networks?
 Understand their evolution and formation
 Improve current systems and build better applications
 Advance the state of the art in closely related fields (such as diffusion of information)
Social and Information Networks

Social Networks
 Mainly undirected graphs
 Connect people
 Nodes with more similar degrees (limited capacity of social ties)

Information Networks
 Tend to be directed graphs
 Connect web pages or other units of information
 Few nodes with an extremely large number of incoming links
Statistical characteristics of social networks

Exhibit small diameter and small average path length
 Also known as the “small world phenomenon”
Clustering coefficients tend to be larger than in comparable random graphs
Degree distributions tend to exhibit fat tails
High degree nodes tend to be connected with other high degree nodes
Neighbors of a high degree node are less likely to be connected with each other
Related work

Internet
 Watts and Strogatz (1998), simple model that exhibits small world characteristics
 Barabási and Albert (1999), preferential attachment models, power law distributions
 Kumar et al. (2000), link copying model, power law distributions
 Klemm and Eguíluz (2002), preferential attachment with fertile nodes, small world properties
Social Networks
 Jackson and Rogers (2006), random meetings and local search
 Kumar et al. (2006), preferential attachment, different types of nodes
Our algorithm, General Description



People by default are part of certain groups
A person will have a high chance to connect to people in the same group
People also make connections to people they meet at random
 To capture this effect we introduce random walks
 In a random walk a person will have a higher chance to connect with social or famous persons
 As time passes “older” persons will do fewer random walks
Our algorithm, Group Formation

First Pass Clique Formation
[Figure: first-pass clique formation over nodes 1 to 6]
Our algorithm, Group Formation

Second Pass Clique Formation
[Figure: second-pass clique formation over nodes 1 to 6, shown in permuted orders]
Our Algorithm, Group Formation

Clique generation (Imaginary graph)

For FIRST_PASS times
 While the total number of nodes in cliques is less than N
  Get m nodes and put them in a clique
  m will be chosen according to a power law distribution with exponent γ
Let M be the number of cliques generated from the first pass
For M times
 Get m nodes and put them in a clique
 m will be chosen according to a power law distribution with exponent γ
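
The two-pass clique generation can be sketched in Python roughly as follows. This is only an illustration under several assumptions that the slides do not state: clique sizes are drawn from a discrete power law truncated to the hypothetical range [m_min, m_max], the first pass assigns nodes in random order so that each repetition covers all N nodes once, and the second pass picks its m nodes uniformly at random. The function and parameter names are made up for this sketch.

```python
import random


def sample_clique_size(gamma, m_min, m_max, rng):
    """Draw m from a discrete power law P(m) ~ m^(-gamma) on [m_min, m_max]."""
    sizes = list(range(m_min, m_max + 1))
    weights = [size ** (-gamma) for size in sizes]
    return rng.choices(sizes, weights=weights, k=1)[0]


def generate_cliques(n, first_pass, gamma, m_min=2, m_max=20, seed=0):
    """Return a list of cliques (lists of node ids) built in two passes."""
    rng = random.Random(seed)
    cliques = []
    # First pass: repeat FIRST_PASS times, each time covering all N nodes.
    for _ in range(first_pass):
        nodes = list(range(n))
        rng.shuffle(nodes)  # assumption: nodes are assigned in random order
        start = 0
        while start < n:
            m = min(sample_clique_size(gamma, m_min, m_max, rng), n - start)
            cliques.append(nodes[start:start + m])
            start += m
    # Second pass: M extra cliques, where M is the first-pass clique count.
    first_pass_count = len(cliques)
    for _ in range(first_pass_count):
        m = min(sample_clique_size(gamma, m_min, m_max, rng), n)
        cliques.append(rng.sample(range(n), m))  # assumption: uniform choice
    return cliques
```

Calling generate_cliques(100, 1, 2.5) returns 2M cliques: the first M partition the 100 nodes, the remaining M overlap them at random.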
Our Algorithm, Graph Generation

Connection to groups
 At each time step t a node will enter the graph
 The node will try to connect to all nodes with id < t with probability:
 1 - \frac{|MinCommonClique|}{|MaxCliqueInGraph|}
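
A possible reading of this connection rule, written as a small Python helper. It assumes the probability is 1 - |MinCommonClique| / |MaxCliqueInGraph|, that MinCommonClique is the smallest clique shared by the two nodes, that MaxCliqueInGraph is the largest clique overall, and that two nodes with no clique in common get probability 0. None of these details is spelled out on the slide, and the names below are hypothetical.

```python
def connection_probability(cliques_u, cliques_v, clique_sizes, max_clique_size):
    """Probability that a new node u links to an older node v.

    cliques_u, cliques_v: sets of clique ids that contain each node.
    clique_sizes: dict mapping clique id to its number of members.
    max_clique_size: size of the largest clique in the graph.
    """
    common = cliques_u & cliques_v
    if not common:
        return 0.0  # assumption: no shared clique, no group-based link
    min_common = min(clique_sizes[c] for c in common)
    # Smaller shared groups give a higher chance of a tie.
    return 1.0 - min_common / max_clique_size
```

Under this reading, two people who share a clique of 3 in a graph whose largest clique has 10 members would connect with probability 1 - 3/10.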
Our Algorithm, Graph Generation

Random walks



All nodes with id ≤ t will try RW_TIMES times to start a random walk, each time with probability 1/(t - id + 1)
During the random walk node i will try to connect with node j with probability social_i * quality_j
At each step the probability to stop will be (1 – 1/DEPTH)
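
The random-walk phase of a single time step might look like the sketch below. It follows the three rules as stated, with some assumptions of its own: the graph is an adjacency dict of sets over nodes 0..t, each step moves to a uniformly chosen neighbor, social and quality are per-node values in [0, 1], and a hypothetical max_steps cap guards against walks that never stop (for example when DEPTH = 1).

```python
import random


def random_walk_phase(adj, social, quality, t, rw_times, depth, rng,
                      max_steps=1000):
    """Run the random-walk phase of time step t on adj (node -> set)."""
    for i in range(t + 1):  # all nodes with id <= t
        for _attempt in range(rw_times):
            # Older nodes (smaller id) start walks less often.
            if rng.random() >= 1.0 / (t - i + 1):
                continue
            current = i
            for _step in range(max_steps):  # hypothetical safety cap
                if not adj[current]:
                    break
                # Assumption: move to a uniformly chosen neighbor.
                current = rng.choice(sorted(adj[current]))
                if current != i and current not in adj[i]:
                    if rng.random() < social[i] * quality[current]:
                        adj[i].add(current)
                        adj[current].add(i)
                # Stop probability as stated on the slide: 1 - 1/DEPTH.
                if rng.random() < 1.0 - 1.0 / depth:
                    break
```

The function mutates adj in place, so the same dict can be carried from one time step to the next.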
Metrics 1/3

Degree distribution
 Description of the relative frequencies of nodes that have different degrees
Diameter and average path length
 Diameter is the largest distance between any pair of nodes in the network
 Distance is defined as the length of the shortest path between two nodes
 Average path length is the average over all the shortest paths
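
For an unweighted graph, both quantities can be obtained from breadth-first searches. The sketch below assumes an undirected graph stored as an adjacency dict of sets and averages only over reachable pairs; the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_average_path_length(adj):
    """Return (diameter, average shortest-path length) over reachable pairs."""
    longest, total, pairs = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, (total / pairs if pairs else 0.0)
```

This runs one BFS per node, which is fine for graphs of university size but would need sampling on very large networks.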
Betweenness Centrality
 Gives information on how important a node is in terms of connecting other nodes
 Computed as:
 Ce_i = \sum_{k \neq j :\, i \notin \{j,k\}} \frac{P_i(k,j) / P(k,j)}{(N-1)(N-2)/2}
 Where P_i(k,j) denotes the number of shortest paths between k and j that i lies on, and P(k,j) the total number of shortest paths between k and j
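
A direct, unoptimized way to evaluate this betweenness definition is to count shortest paths with BFS from every node and then sum over unordered pairs (k, j). The sketch assumes an undirected, unweighted graph with at least three nodes, stored as an adjacency dict of sets; it runs in cubic time and is meant only to mirror the formula, not to replace a faster algorithm such as Brandes'.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Normalized betweenness of every node, following the slide's formula."""
    nodes = list(adj)
    n = len(nodes)
    info = {s: bfs_counts(adj, s) for s in nodes}
    norm = (n - 1) * (n - 2) / 2
    result = {}
    for i in nodes:
        dist_i, sigma_i = info[i]
        total = 0.0
        for k, j in combinations(nodes, 2):
            if i in (k, j):
                continue
            dist_k, sigma_k = info[k]
            if i not in dist_k or j not in dist_k or j not in dist_i:
                continue  # unreachable pairs contribute nothing
            if dist_k[i] + dist_i[j] == dist_k[j]:
                # Paths k..i times paths i..j, over all shortest paths k..j.
                total += sigma_k[i] * sigma_i[j] / sigma_k[j]
        result[i] = total / norm
    return result
```

On a star graph the center lies on every shortest path between leaves, so it gets the maximum value while the leaves get zero.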
Metrics 2/3

Clustering
 Indicates whether two neighbors of the same node are also connected with each other
 Clustering coefficient for each node i is:
 Cl_i = \frac{\#\{(j,k) \in g : j \in N_i,\ k \in N_i\}}{d_i (d_i - 1)/2}
 Where N_i is the set of neighbors of node i and d_i is its degree
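
The per-node clustering coefficient counts the links among a node's neighbors and divides by the number of neighbor pairs. A minimal sketch, assuming an undirected graph stored as an adjacency dict of sets and returning 0 for nodes with fewer than two neighbors (a convention chosen here, not stated in the slides):

```python
from itertools import combinations


def clustering_coefficient(adj, i):
    """Fraction of pairs of neighbors of i that are themselves connected."""
    neighbors = adj[i]
    d = len(neighbors)
    if d < 2:
        return 0.0  # convention: undefined case mapped to 0
    links = sum(1 for j, k in combinations(neighbors, 2) if k in adj[j])
    return links / (d * (d - 1) / 2)
```

For a triangle with one extra pendant node attached to one corner, that corner has three neighbors and a single link among them, so its coefficient is 1/3.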

Assortativity coefficient
 In real networks the degrees at the endpoints of any edge tend not to be independent
 This feature can be captured by computing the assortativity coefficient:
 r = \frac{\sum_{(i,j) \in g} (d_i - m)(d_j - m)}{\sum_{i \in N} (d_i - m)^2}
 Where m is the average degree of the graph
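
The coefficient can be computed in a few lines once the degrees are known. The sketch assumes the formula sums (d_i - m)(d_j - m) over each undirected edge once and divides by the sum of (d_i - m)^2 over all nodes, with the graph stored as an adjacency dict of sets without self-loops. Returning NaN for regular graphs, where the denominator vanishes, is a choice made for this example.

```python
def assortativity(adj):
    """Degree assortativity r as defined on the slide (each edge counted once)."""
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    m = sum(degree.values()) / len(degree)  # average degree
    # Every undirected edge appears twice in adj, hence the division by 2.
    numerator = sum(
        (degree[u] - m) * (degree[v] - m) for u in adj for v in adj[u]
    ) / 2
    denominator = sum((d - m) ** 2 for d in degree.values())
    return numerator / denominator if denominator else float("nan")
```

A star graph, where the single hub connects only to degree-1 leaves, gives a negative value, the opposite of the pattern reported for social networks.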
Metrics 3/3

Neighbor degree distribution
 Average degree of the nearest neighbors of a vertex with degree k:
 k_{nn}(k) = \sum_{k'=1}^{k_{max}} k' P(k' \mid k)
 Where P(k'|k) is the conditional probability that a node with degree k will be connected to a node with degree k'
 Positive assortativity is translated as an increasing knn(k) function
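
Empirically, knn(k) is the mean degree found at the other end of every link that leaves a degree-k node, which equals the sum of k' P(k'|k) when P(k'|k) is estimated from the observed links. A short sketch, assuming an undirected graph stored as an adjacency dict of sets:

```python
from collections import defaultdict


def average_neighbor_degree(adj):
    """Return {k: knn(k)} for every degree k that occurs in the graph."""
    degree_sum = defaultdict(int)  # sum of neighbor degrees, per degree k
    link_ends = defaultdict(int)   # number of links leaving degree-k nodes
    for u, neighbors in adj.items():
        k = len(neighbors)
        for v in neighbors:
            degree_sum[k] += len(adj[v])
            link_ends[k] += 1
    return {k: degree_sum[k] / link_ends[k] for k in sorted(degree_sum)}
```

Plotting the returned values against k shows an increasing curve for assortative networks and a decreasing one for disassortative networks.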
Data Description





Facebook data from 4 large U.S. universities
Number of nodes is small compared to the real Facebook graph
Nodes represent a closed society
Much better way to analyze a social network
Large samples present disadvantages
 Difficult to analyze
 How good is the sampling?
Results and Comparisons 1/5

Average degree does not depend on the size of the network
All networks present positive assortativity
 High degree nodes tend to connect with other high degree nodes
High clustering coefficients
Small diameter and average path length
Results and Comparisons 2/5



Increasing knn(k) functions
As expected due to positive assortativity
Nodes with high degree tend to be connected to each other
Results and Comparisons 3/5




Small betweenness values
Almost independent of node degree
No central authorities
Information flows are distributed
Results and Comparisons 4/5


No clear power law phenomena
On the log scale we see fat tails as expected
Results and Comparisons 5/5




Overall clustering is a simple summary characteristic
Clear clustering pattern emerges
Nodes with high degree have small clustering coefficients
Neighbors of high degree nodes are less likely to be connected to each other
Current Work

Analysis of information networks
 Very large datasets from LiveJournal, YouTube, Flickr
 As expected, different structure
 Clear power law distributions
 Introduction of a new metric: how close is PageRank to in-degree?
Future Work
Make our model mathematically tractable
Graph evolution over time
 Densification laws
 Shrinking diameters
Community detection and formation
 New focus on coevolutionary models

Thank you!
aiw.cs.aueb.gr/projects.html