Complex Networks: theory, algorithms
Chapter 3a
Virgílio A. F. Almeida
March 2010
Departamento de Ciência da Computação, Universidade Federal de Minas Gerais

What are networks?
• Networks are collections of points and lines.
• "Network" ≡ "Graph"
(Figure: a five-node graph with one node and one edge labelled.)

Points      Lines              Field
vertices    arcs, edges        mathematics
nodes       links, edges       computer science
actors      ties, relations    sociology

Network elements: edges
• Directed: A -> B
• Undirected: A <-> B or A – B
• Edge attributes
  • Weight (e.g., frequency of connections between two sites)
  • Ranking or ordering (e.g., the order of the outgoing links at a router)
  • Type (e.g., in sociology: friend, relative, co-author)
• Properties that depend on the structure of the rest of the graph, e.g., betweenness

Weighted Graphs
• Each edge is annotated with a weight or capacity
• Directed or undirected
• Used to model
  • Transmission cost, latency
  • Link capacity
  • Link-analysis scores such as hubs and authorities (HITS) and Google's PageRank

Adjacency Matrix
• Represents the edges as a matrix
• $A_{ij} = 1$ if node i has an edge to node j; $A_{ij} = 0$ if node i has no edge to node j
• $A_{ii} = 0$ unless the node has a self-loop
• $A_{ij} = A_{ji}$ if the network is undirected, or if i and j share reciprocal edges
Example (five-node directed graph; rows are i, columns are j):

A =
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 0 0 0 1
1 1 0 0 0

Adjacency List
• Edge list
  2 3
  2 4
  3 2
  3 4
  4 5
  5 2
  5 1
• Adjacency list
  • Better suited to large, sparse networks
  • Neighbors of a node are easy to retrieve
  1:
  2: 3 4
  3: 2 4
  4: 5
  5: 1 2

Nodes
• Node properties
  • From the immediate (local) connections
    • indegree: the number of directed edges that arrive at a node
    • outdegree: the number of directed edges that originate at a node
    • degree (in or out): the number of edges incident on a node
  • From the whole graph (global)
    • centrality (betweenness, closeness)
(Figure: example node with indegree = 3, outdegree = 2, degree = 5.)

Node degrees from the adjacency matrix
• Outdegree of node i: $\sum_{j=1}^{n} A_{ij}$
  Example: the outdegree of node 3 is $\sum_{j=1}^{n} A_{3j} = 2$
• Indegree of node j: $\sum_{i=1}^{n} A_{ij}$
  Example: the indegree of node 3 is $\sum_{i=1}^{n} A_{i3} = 1$

A =
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 0 0 0 1
1 1 0 0 0

Network metrics: degree sequence and degree distribution
• Degree sequence: an ordered list of the (in, out) degrees of each node
  • In-degree sequence: [2, 2, 2, 1, 1, 1, 1, 0]
  • Out-degree sequence: [2, 2, 2, 2, 1, 1, 1, 0]
  • Undirected degree sequence: [3, 3, 3, 2, 2, 1, 1, 1]
• Degree distribution: a count of how often each degree occurs, given as (degree, count) pairs
  • In-degree distribution: [(2,3) (1,4) (0,1)]
  • Out-degree distribution: [(2,4) (1,3) (0,1)]
  • Undirected distribution: [(3,3) (2,2) (1,3)]
(Figure: histogram of the in-degree distribution; x-axis: indegree, y-axis: frequency.)

Network metrics: connected components
• Strongly connected components (SCC): every node in the component can be reached from every other node in the component by following directed edges.
  • Example SCCs: {B, C, D, E}, {A}, {G, H}, {F}
• Weakly connected components (WCC): every node can be reached from every other node by following edges in either direction.
  • Example WCCs: {A, B, C, D, E}, {G, H, F}
• In undirected networks one simply speaks of connected components.
(Figure: directed graph on nodes A to H, shown once with its SCCs and once with its WCCs.)

Network metrics: shortest paths
• Shortest path: the shortest sequence of edges connecting two nodes
  • Not always unique
  • A and C are connected by 2 shortest paths: A-E-B-C and A-E-D-C
• Diameter: the largest geodesic distance in the graph
  • The distance between A and C is the maximum for the graph: 3
• Note: the term "diameter" is sometimes used to mean the average shortest-path distance.
• Wikipedia: the distance between two vertices in a graph is the number of edges in a shortest path connecting them. This is also known as the geodesic distance.
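The adjacency-matrix, degree, and geodesic-distance definitions above can be sketched in Python as follows. This is a minimal illustration built on the five-node example's edge list, not code from the lecture: the function names (`adjacency_matrix`, `out_degree`, `in_degree`, `adjacency_list`, `bfs_distances`) are hypothetical, and distances are computed with a plain breadth-first search that counts edges and ignores weights.

```python
from collections import deque

# Directed edge list of the five-node example: (source, target), nodes numbered 1..5.
EDGES = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]
N = 5


def adjacency_matrix(edges, n):
    """Return A with A[i][j] = 1 if there is an edge i -> j (0-based indices)."""
    a = [[0] * n for _ in range(n)]
    for src, dst in edges:
        a[src - 1][dst - 1] = 1
    return a


def out_degree(a, node):
    """Row sum: number of edges leaving `node` (1-based label)."""
    return sum(a[node - 1])


def in_degree(a, node):
    """Column sum: number of edges entering `node` (1-based label)."""
    return sum(row[node - 1] for row in a)


def adjacency_list(a):
    """Map each node (1-based label) to the list of its out-neighbours."""
    return {i + 1: [j + 1 for j, v in enumerate(row) if v] for i, row in enumerate(a)}


def bfs_distances(adj, source):
    """Geodesic distance (number of edges) from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


A = adjacency_matrix(EDGES, N)
ADJ = adjacency_list(A)

print(out_degree(A, 3))       # 2
print(in_degree(A, 3))        # 1
print(ADJ)                    # {1: [], 2: [3, 4], 3: [2, 4], 4: [5], 5: [1, 2]}
print(bfs_distances(ADJ, 3))  # {3: 0, 2: 1, 4: 1, 5: 2, 1: 3}
```

The adjacency list is derived from the matrix here only to show that the two representations carry the same information; for a large, sparse network one would typically build the list directly from the edge list.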
Cliques and complete graphs
• $K_n$ is the complete graph (clique) on n vertices
• Every vertex is connected to every other vertex
(Figure: K3, K5, K8.)

Clustering Coefficient
• Transitivity: if A is connected to B and B is connected to C, what is the probability that A is connected to C?
• The friends of my friends are probably my friends...
• Clustering coefficient of node i:
  $C_i = \frac{\text{number of triangles connected to node } i}{\text{number of triples centered on node } i}$
• $C_i \in [0, 1]$
(Figure: triad A, B, C with the A-C link in question.)

Local clustering coefficient (Watts & Strogatz 1998)
• For a vertex i: the fraction of pairs of the node's neighbors that are themselves connected
• Let $n_i$ be the number of neighbors of vertex i
• $C_i = \frac{\text{number of connections between } i\text{'s neighbors}}{\text{maximum number of possible connections between } i\text{'s neighbors}}$
• Directed: $C_i = \frac{\#\,\text{directed connections between } i\text{'s neighbors}}{n_i (n_i - 1)}$
• Undirected: $C_i = \frac{\#\,\text{undirected connections between } i\text{'s neighbors}}{n_i (n_i - 1)/2}$

Local clustering coefficient (Watts & Strogatz 1998)
• Average over all n nodes: $C = \frac{1}{n} \sum_{i} C_i$
• Example (figure legend: link present, link absent):
  • $n_i = 4$
  • Maximum number of connections: $4 \cdot 3 / 2 = 6$
  • 3 connections present
  • $C_i = 3/6 = 0.5$

Factors influencing diffusion
• Network structure (unweighted)
  • density
  • degree distribution
  • clustering
  • connected components
  • community structure
• Strength of ties (weighted)
  • frequency of communication
  • strength of influence

Factors influencing diffusion: proximity
Scalable Proximity Estimation and Link Prediction in Online Social Networks, ACM IMC 2009
A central concept in the computational analysis of social networks is proximity measure, which quantifies the closeness or similarity between nodes in a social network.
Proximity measures form the basis for a wide range of important applications in social and natural sciences (e.g., modeling complex networks [6, 13, 25, 42]), business (e.g., viral marketing [23], fraud detection [11]), information technology (e.g., improving Internet search [35], collaborative filtering [7]), computer networks (e.g., constructing overlay networks [45]), and cyber security (e.g., mitigating email spams [22], defending against Sybil attacks [56]).

Studies on diffusion
1. The Strength of Weak Ties. M. S. Granovetter, 1973
2. The Structure of Information Pathways in a Social Communication Network. Kossinets et al., 2008

Article 1:
M. S. Granovetter. The Strength of Weak Ties. American Journal of Sociology, 1973. UChicago Press.
www.si.umich.edu/~rfrost/courses/SI110/readings/In_Out_and_Beyond/Granovetter.pdf

Article 2:
G. Kossinets, J. Kleinberg, D. Watts. The Structure of Information Pathways in a Social Communication Network. Proc. 14th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2008.
http://www.cs.cornell.edu/home/kleinber/chrono.html

The Strength of Weak Ties
Mark S. Granovetter
The American Journal of Sociology, 1973

Introduction
• "One of the most influential sociology papers ever written" (Barabasi)
• One of the most cited (Current Contents, 1986)
• Interviewed people and asked: "How did you find your job?"
• Kept getting the same answer: "through an acquaintance, not a friend"

How does strength of a tie influence diffusion?
• M. S. Granovetter, The Strength of Weak Ties, AJS, 1973: share of people who found a job through a contact that they saw
  • frequently (2+ times/week): 16.7%
  • occasionally (more than once a year but < 2x week): 55.6%
  • rarely: 27.8%
• but… the length of the path is short
  • the contact directly works for/is the employer
  • or is connected directly to the employer

Context
• Lots of studies of macro patterns
  • Social mobility, community organization
• Data and studies for micro behavior
  • Interactions within small groups
• Limited understanding of how micro behavior translates into macro patterns

Network Analysis
• Analysis of the interaction network bridges the gap between micro and macro
• Interaction network
  • Nodes: people
  • Edges: between people with a social relationship
  • Weight: strength of connection
    • Quantized to either "weak" or "strong"

Basic argument
• Classify interpersonal relations as "strong", "weak", or "absent"
• Strength is (vaguely) defined as "a (probably linear) combination of… the amount of time, the emotional intensity, the intimacy (mutual confiding), and the reciprocal services which characterize the tie"
• Negative and/or asymmetric ties (e.g., enemies or relations with a power imbalance) are brushed aside for now

Basic argument (cont.)
• The stronger the tie between two individuals, the larger the proportion of people to which they are both tied (weakly or strongly)
• In the extreme case, two people who are always together will be tied to the same individuals

Forbidden triad
• If person A has a strong tie to both B and C, then it is unlikely for B and C not to share a tie.
• Granovetter (admittedly) exaggerates and supposes such a triad never occurs

Bridges
• A bridge is "a line in a network which provides the only path between two points"
• Therefore, if the previous triad is in fact absent, no strong tie is a bridge
• In other words, all bridges are weak ties!
• (Realistically, bridges can be local rather than global, but they are still weak.)

Strength of weak ties
• "Intuitively speaking, this means that whatever is to be diffused can reach a larger number of people, and traverse greater social distance (i.e., path length), when passed through weak ties rather than strong."
• Consequences
  • Diffusion of information (rumours, innovations, getting a job!)
  • Homophily
  • Group cohesion and trust
  • Traversal of networks and node coverage

Bridges
• Bridge: an edge that is part of every path between two nodes
(Figure: bridge between the red and green groups.)

Local Bridges
• Local bridge of degree N: an edge that is part of every path of length less than N between its endpoints
• Generalization of a bridge
(Figure: local bridge of degree 3 between A and B.)

Bridges
• Bridges allow diffusion of information between otherwise disconnected communities.
• Local bridges bring otherwise distant communities together
• The "bridge" concept provides an important piece of the micro => macro puzzle
• What sort of relationships act as bridges?

Granovetter Transitivity
• The stronger the tie between A and B, the larger the overlap in their relationship circles
• Strong tie => lots of time together => lots of opportunity for B to meet A's friends
• Similarity => greater chance that B will be "compatible" with A's friends
• Psychological need for congruence => B will have a natural affinity for A's friends, based on A's opinion of them

Forbidden Triad
• This triad will resolve to a fully connected triad
• The new edge need not be strong
• Alternate: any time a strong tie A-B exists, all of A's strong ties will be at least weakly connected to B
• Supported by evidence

All Bridges are Weak Ties!
• Proof:
  • If A-B and A-C are strong, then the forbidden triad implies that B-C is at least weak
  • If A-B is deleted, then A can still reach B via A-C-B, so A-B is not a bridge
• Small corner case: if both nodes have only a strong edge to each other, and no other strong edges, then that edge is a bridge
  • Unlikely in reality
• All local bridges are also weak ties
  • The proof is identical

Implications
• Removal of weak ties raises path lengths more than removal of strong ties
• Assume: the probability of information passing successfully between two nodes
  • is proportional to the number of paths connecting the two nodes
  • is inversely proportional to the length of those paths
• Conclusion: removal of a weak edge damages the connectivity more than the removal of a strong edge

Evidence
• Junior High Experiment (Rapoport and Horvath, 1961)
  • Each student writes down an ordered list of 8 friends
  • Pick a random starting student
  • Breadth-first search on 1st and 2nd friends
  • Count the number of students seen after each cycle
  • Repeat using 3rd/4th, 5th/6th, 7th/8th friends
• The largest number of people is reached using 7th/8th friends, the smallest using 1st/2nd

Tie Strength and Social Media: Facebook and Twitter
• People maintain hundreds of friendship links: how many of these correspond to strong ties that involve frequent contact, and how many correspond to weak ties that are activated relatively rarely?
• Facebook
• Twitter: Twitter under the microscope, Huberman 2009

Exercises
• Exercises 3-3 to 3-5, page 91, to be handed in on the 30th
• There will be no class next week, as stated in the syllabus
• Read Chapter 3 and Chapter 4 up to page 117
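As a companion to the clustering-coefficient and bridge definitions in this chapter, the sketch below computes both on a small undirected graph. It is an illustration under stated assumptions, not material from the slides: the graph, the node labels, and the function names (`build_undirected`, `local_clustering`, `average_clustering`, `is_bridge`) are hypothetical, nodes with fewer than two neighbors are assigned a coefficient of 0 by convention, and the bridge test simply removes the edge and checks reachability with a breadth-first search. The graph is chosen so that node "i" reproduces the worked example: 4 neighbors, 3 of the 6 possible links present.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected graph: node "i" has neighbours a, b, c, d,
# three links exist among those neighbours, and "x" hangs off "d".
EDGES = [("i", "a"), ("i", "b"), ("i", "c"), ("i", "d"),
         ("a", "b"), ("b", "c"), ("c", "d"), ("d", "x")]


def build_undirected(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of `node`'s neighbours that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: no pairs of neighbours to check
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def is_bridge(adj, u, v):
    """True if the edge u-v is the only path between u and v."""
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if (x, y) in ((u, v), (v, u)):
                continue  # pretend the edge u-v has been removed
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return v not in seen


ADJ = build_undirected(EDGES)

print(local_clustering(ADJ, "i"))  # 0.5 (3 of 6 possible links)
print(is_bridge(ADJ, "d", "x"))    # True: x is reachable only through d
print(is_bridge(ADJ, "i", "a"))    # False: a is still reachable via b
print(average_clustering(ADJ))
```

In this toy graph the only bridge is the edge to the peripheral node "x"; every edge inside the tightly clustered neighborhood of "i" has an alternative path, which is the structural pattern behind the claim that bridges tend to be weak ties.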