3 - DCC | Departamento de Ciência da Computação

advertisement
Redes Complexas:
t i algoritmos
teoria,
l it
Chapter 3a
Virgílio A. F. Almeida
Março de 2010
Departamento de
D
d Ciência
Ciê i da
d Computação
C
ã
Universidade Federal de Minas Gerais
O que são as redes?
„ Redes são coleções de
pontos e linhas.
p
“Rede” ≡ “Grafo”
1
2
nó
3
aresta
4
5
pontos
linhas
vertices
Arcos, arestas
matematica
nós
Links, arestas
CC
atores
ligações relações
ligações,
sociologia
Elementos da Redes (Networks) : arestas
• Orientado
• A -> B
• Não orientado
• A <-> B or A – B
• Atributos das arestas
• Ponderação (ex. Frequencia de conexões entre 2 sites)
• Ranking-ordenamento: no roteador a ordem dos links de saída
• tipo
ti (ex.
(
em sociologia;
i l i amigo,
i
parente,
t co-autor)
t )
• Propiedades que dependem da estrutura do resto grafo, ex.:
betweenness
Grafos Ponderados (Weighted Graphs)
„ Cada vértice é anotado por um weight or capacity
„ Orientado ou não-orientado
„ Usado para modelar
„ Custo de transmissão, latência
„ Capacidade
C
id d d
do li
link
k
„ hubs and authorities (Google PageRank algorithm)
Matriz de Adjacência
„ Representação de arestas como matrix
„ Aij = 1 se nó i tem uma aresta para j
= 0 if nó i não tem aresta para j
j
i
„ Aii = 0 a não ser no caso de self-loops
i
„ Aij = Aji se a rede é não-orientada
não-orientada,
ou se i e j compartilham arestas recíprocas
Exemplos:
2
3
1
5
4
A=
i
0
0
0
0
0
0
0
1
1
0
0
1
0
1
0
0
0
0
0
1
1
1
0
0
0
j
Lista de Adjacência
„ Lista de arestas
„
„
„
„
„
„
„
23
24
32
34
45
52
51
„ Lista de Adjacência
„ Mais adequado
q
p
para redes g
grandes e
esparsas
„ Fácil recuperação de nós vizinhos
„ 1:
„ 2: 3 4
„ 3: 2 4
„ 4: 5
„ 5: 1 2
2
3
1
5
4
Nós
„ Propriedades dos nós da rede
„ Das conexões imediatas (locais)
„ indegree
quantas arestas orientadas que chegam a um nó
„ outdegree
quantas arestas orientadas originam em um nó
„ Grau: Degree (in ou out):
número de arestas que incidem em um nó
„ Do grafo inteiro (globais)
„ centrality
y ((betweenness,, closeness))
indegree=3
outdegree=2
degree=5
2
3
1
5
4
„ Outdegree =
Grau de Nós da Matrix de Valores
n
∑A
j =1
ij
A=
exemplo: o outdegree para nó 3 é 2, certo?
n
∑A
j =1
∑A
i =1
ij
exemplo:
l o indegree
i d
para nó
ó 3 é 1,
1
n
∑A
i =1
i3
0
0
0
0
0
0
1
1
0
0
1
0
1
0
0
0
0
0
1
1
1
0
0
0
3j
n
„ Indegree =
0
A=
0
0
0
0
0
0
0
1
1
0
0
1
0
1
0
0
0
0
0
1
1
1
0
0
0
Métricas das Redes: sequência dos graús e distribuição dos
graús
„ Sequência de graus: Uma lista ordenada dos (in,out) graus de cada nó
„ Sequência In-degree:
In degree:
„ [2, 2, 2, 1, 1, 1, 1, 0]
„ Sequência Out-degree:
„ [[2,, 2,, 2,, 2,, 1,, 1,, 1,, 0]]
„ Sequencia de graus não-orientados
„ [3, 3, 3, 2, 2, 1, 1, 1]
„ Distribuição dos graus: contagem de frequencia da ocorrência de cada grau
5
4
frequen
ncy
„ Di
Distribuição
t ib i ã d
de In-degree:
I d
„ [(2,3) (1,4) (0,1)]
„ Distribuição de Out-degree:
„ [(2,4) (1,3) (0,1)]
„ Distribuição não orientada:
„ [(3,3) (2,2) (1,3)]
3
2
1
0
0
1
indegree
2
Métricas das Redes: componentes conectados
„ Componentes fortemente conectados: Strongly connected components
(SCC): cada nó dentro do componente pode ser alcançado de outro nó
do componente seguindo arestas orientadas
orientadas.
B
„ SCC
„ BCDE
„ A
„ GH
„ F
F
G
C
A
E
H
D
„ Componentes fracamente conectatos (Weakly connected components
WCC): cada nó pode ser alcançado a partir de qualquer outro nó seguindo
arestas em qualquer direção
direção.
„ WCC
„ ABCDE
„ GHF
„ Em redes não orientadas, simplesmente
refere-se a componentes conectados
B
F
G
C
A
E
D
H
Métricas das Redes: caminho mínimo ‐ shortest paths
„ Caminho minimo: a menor sequência de arestas conectando dois nós.
„ Nem sempre única
B
3
„ A e C são conectados por 2 shortest
paths
„ A–E–B-C
„ A–E–D-C
A
C
2
1
3
E 2
D
„ Diametro:a maior distância geodésica no grafo
„ A distancia entre A e C é o máximo
para o grafo: 3
„ Obs: o termo ‘diametro’ é as vezes usado como sendo a distância do
caminho mínimo médio. Wikipedia: the distance between two vertices in a
graph
h is
i th
the number
b off edges
d
iin a shortest
h t t path
th connecting
ti them.
th
This
Thi iis
also known as the geodesic distance.
Cliques e grafos completos
„ Kn é um grafo completo (clique) com K vértices
„ Cada vértice é conectado a todos outros vértices.
K3
K5
K8
Clustering Coefficient
„ Transitividade
„ se A está conectado a B e B é conectado a C
qual a probabilidade que A seja conectado C?
„ Amigos dos meus amigos são provavelmente meus amigos...
„ Clustering coefficient
CC =
numero de triângulos conectados ao nó i
Número de triplas centradas no nó i
?
A
CC = [0 : 1]
B
C
Local clustering coefficient (Watts&Strogatz 1998)
„ Para um vertice i
„ A fração de pares de vizinhos do nó que estão eles próprios
conectados
„ Seja ni o número de vizinhos do vértice i
CCi =
number of connections between i’s neighbors
maximum number of p
possible connections between i’s neighbors
g
CCi directed =
CCi undirected =
# directed connections between i’s neighbors
ni * (ni -1)
# undirected
di t d connections
ti
b
between
t
i’i’s neighbors
i hb
ni * (ni -1)/2
Local clustering coefficient (Watts&Strogatz 1998)
g
(
g
)
„ Média sobre todos nós n
1
C = ∑ Ci
n i
i
link presente
link ausente
ni = 4
max número de conexões:
4*3/2 = 6
3 conexões presentes
Ci = 3/6 = 0.5
factors influencing diffusion
„ network structure (unweighted)
„ density
„ degree distribution
„ clustering
„ connected components
„ community structure
„ strength of ties (weighted)
strength of ties (weighted)
„ frequency of communication
„ strength of influence
st e gt o
ue ce
factors influencing diffusion: proximity
Scalable Proximity Estimation and Link Prediction in
Online Social Networks –ACM-IMC
ACM IMC 2009
„ A central concept in the computational analysis of social A central concept in the computational analysis of social
networks is proximity measure, which quantifies the closeness or similarity between nodes in a social network. Proximity measures form the basis for a wide range of important applications in social and natural sciences (e.g., modeling complex networks [6, 13, 25, 42]), business (e.g., viral l
k [6 13 25 42]) b i
(
i l
marketing [23], fraud detection [11]), information technology (e g improving Internet search [35] collaborativefiltering
(e.g., improving Internet search [35], collaborativefiltering [7]), computer networks (e.g., constructing overlay networks [45]), and cyber security (e.g., mitigating email spams [22], defending against Sybil attacks [56]).
Studies influencing diffusion
1. The Strength of Weak Ties MS, Granovetter 1973
Granovetter,1973
2 The Structure of Information Pathways in a 2.
The Structure of Information Pathways in a
Social Communication Network. Kossinets et al 2008
al, 2008
Artigo 1:
The Strength of Weak Ties
MS Granovetter - American Journal of Sociology
Sociology, 1973 UChicago Press
www.si.umich.edu/~rfrost/courses/SI110/readings/In_Out_and_Beyond/Granovetter.pdf
19
Artigo 2:
G Kossinets,
G.
K
i t J.
J Kleinberg,
Kl i b
D
D. Watts.
W tt
The Structure of Information Pathways in a Social
Communication Network.
Proc. 14th ACM SIGKDD Intl. Conf. on Knowledge
Discovery and Data Mining
Mining, 2008
2008.
http://www.cs.cornell.edu/home/kleinber/chrono.html
20
The Strength of Weak Ties
Mark S. Granovetter
The American Journal of Sociology,
1973
Introduction
„ “One of the most influential sociology papers ever
written” (Barabasi)
(
)
„ One of the most cited (Current Contents, 1986)
„ Interviewed people and asked:
“How did you find your job?”
„ Kept getting the the same answer:
“through an acquaintance, not a friend”
How does strength of a tie influence diffusion?
„ M. S. Granovetter: The Strength of Weak Ties, AJS, 1973:
„ finding a job through a contact that one saw
„ frequently
q
y ((2+ times/week)) 16.7%
„ occasionally (more than once a year but < 2x week) 55.6%
„ rarely 27.8%
„ but… length of path is short
„ contact
t t directly
di tl works
k for/is
f /i the
th employer
l
„ or is connected directly to employer
Context
„ Lots of studies of macro patterns
„ Social mobility
mobility, community organization
„ Data and studies for micro behavior
„ Interactions within small groups
„ Limited understanding of how micro behavior translates
into macro patterns
Network Analysis
„ Analysis of the interaction network
„ bridge the gap between micro and macro
„ Interaction network
„ Nodes: People
„ Edges: Between people with a social relationship
„ Weight: strength of connection
Quantize to either “weak” or “strong”
Basic argument
„ Classify interpersonal relations as “strong”,
“weak”
weak , or “absent”
absent
„ Strength
g is ((vaguely)
g
y) defined as “a (p
(probably
y linear))
combination of…
„ the amount of time,
„ the emotional intensity,
intensity
„ the intimacy (mutual confiding),
„ and the reciprocal services which characterize the tie
„ Negative and/or asymmetric ties (e.g. enemies or
relations with p
power imbalance)) are brushed aside for
now
27
Basic argument (cont.)
„ The stronger the tie between two individuals, the
larger the proportion of people to which they are
both tied (weakly or strongly)
„ In the extreme case, two p
people
p that are always
y
together will be tied to the same individuals
28
Forbidden triad
„ If person A has a strong tie to both B and C,
then it is unlikely for B and C not to share a tie.
tie
„ Granovetter (admittedly) exaggerates and supposes
such a triad never occurs
29
Bridges
„ A bridge is “a line in a network which provides
the only path between two points”
points
„ Therefore,
Therefore if the previous triad is in fact absent,
absent
no strong tie is a bridge
„ In other words,, all bridges
g are weak ties!
„ (realistically, bridges can be local rather than global, but
still weak)
30
Strength of weak ties
„ “Intuitively speaking, this means that whatever is
to be diffused can reach a larger number of
people, and traverse greater social distance (i.e.,
path length),
p
g ) when p
passed through
g weak ties
rather than strong.”
„ Consequences
„ Diffusion of information (rumours, innovations, getting a
j b!)
job!)
„ Homophily
p cohesion and trust
„ Group
„ Traversal of networks and node coverage
31
Bridges
„ Bridge: An edge that is part of every path between
two nodes
Bridge b/w red & green
Local Bridges
„ Local Bridge of degree N: An edge that is part of
every path of length less than N
„ Generalization of a bridge
Local bridge of deg 3
b/w A & B
A
B
Bridges
„ Bridges allow diffusion of information between otherwise
disconnected communities.
„ Local bridges bring otherwise distant communities
together
„ “Bridge” concept provides an important piece of the
micro => macro puzzle
„ What sort of relationships act as bridges?
Granovetter Transitivity
„ The stronger the tie between A and B, the larger the
overlap
p in their relationship
p circles
„ Strong tie =>
„ lots of time together => lots of opportunity for B to meet the A’s
ffriends
i d
„ similarity => greater chance that B will be “compatible” with A’s
friends
„ physiological need for congruence =>
B will have a natural affinity for A’s friends, based on A’s opinion
of them
Forbidden Triad
„ This triad will resolve to a fully connected triad
„ New edge need not be strong
„ Alternate: Any time strong tie A-B exists, then all of
A’ strong
A’s
t
tities will
ill b
be att lleastt weakly
kl connected
t d tto B
„ Supported by evidence
All Bridges are Weak Ties!
„ Proof:
„ If A
A-B
B and A-C
A C are strong, then forbidden triad implies that B-C
B C is
at least weak
„ If A-B is deleted, then A can still reach B via A-C-B
„ Small corner case: if both nodes have only a strong edge to
each other, and no other strong edges, than it is a bridge
„ Unlikely in reality
„ All local bridges are also weak ties
„ Proof is identical
Implications
„ Removal of weak ties raises path lengths more than
removal of strong
g ties
„ Assume: probability of info passing successfully between
two nodes
„ is proportional to the number of paths connecting the two nodes
„ is inversely proportional to length of those paths
„ Conclusion: Removal of a weak edge damages the
connectivityy more than the removal of a strong
g edge
g
Evidence
„ Junior High Experiment:
((Rapoport
p p and Horvath,, 1961))
„
„
„
„
„
Student writes down an ordered list of 8 friend
Pick a random starting student
B dth fi
Breadth
firstt search
h on 1
1stt and
d2
2nd
d ffriends
i d
Count number of students seen after each cycle
Repeat
p
using
g 3/4th, 5/6th, 7/8th
„ Largest number of people reached by using 7/8th,
smallest using 1/2nd
Ti S
Tie Strength and Social Media:
h d S i l M di
facebook and twitter
PEOPLE MAINTIAN HUNDREDS OF FRIENDSHIP LINKS: HOW MANY OF
THESE CORRESPOND TO STRONG TIES THAT INVOLVE FREQUENT
Q
CONTACT, AND HOW MANY OF THESE CORRESPOND TO WEAK TIES THAT
RAE ACTIVATED RELATIVELY RARELY?
Facebook
Facebook
T itt
Twitter under the microscope, Huberman 2009
d th
i
H b
2009
Twitter under the microscope, Huberman 2009
Exercícios
„ Exercícios 3-3 a 3-5, pagina 91, para entregar dia 30
„ Não haverá aula na próxima semana, conforme o
programa
„ Ler o Capítulo 3 e o capítulo 4 até pagina 117
Download