Theory! Engage brain ;) 1 Network Analysis: Social Networks Network Analysis ‘Network’ is a heavily overloaded term- >‘network analysis’ means different things to different people. Specific forms of network analysis are used in the study of diverse structures such as: – – – – – – – 3 aspects of the Internet, interlocking directorates, transportation systems, epidemic spreading, metabolic pathways, the Web graph, electrical circuits, etc. There is, however, a broad methodological foundation which is quickly becoming a prerequisite for researchers and practitioners working with network models. - “Network Analysis” -Brandes, Ulrik; Erlebach, Thomas (Eds.) Hint: how to make this theory easier ... 4 Understand first – get the idea straight. Then look for the formal definitions and terms. Draw a mind map during the lectures. Keep track of all new terminology. Learn it! Organize your mental framework. Networks and Graphs Mathematically, social (and other) networks can be represented as graphs or matrices (different to the charts and graphs you created in Excel) Representing a problem as a graph: – – 5 can provide a different point of view can make a problem much simpler and provide the appropriate tools for solving the problem http://www.instituteforadvancedstudies.org.uk/Portals/50/ComplexNetworks_Jan/Estrada_tutorial1.ppt Terminology There are two components to a graph: – – – As a broad generalisation, in social networks: – 6 Nodes and edges Nodes also called points, vertices, actors (only in social networks) Edges also called ties, links, arcs (digraphs)) People are nodes and interactions between people are edges Formal definition A graph is defined as a set of nodes (a,b,c,d,e) and a set of links ({a,b}, {b,c}, {c,d}, {d,a}, {a,e}) that connect the nodes. This is sometimes written mathematically as G=(V,E) or G(V,E). The V and the E stand for vertices (nodes) and edges (links/ties) http://www.analytictech.com/mb021/graphtheory.htm 7 Social Network Analysis Social network analysis [SNA] is the mapping and measuring of relationships and flows between people, groups, organizations, computers, URLs, and other connected information/knowledge entities. The nodes in the network are the people and groups while the edges show relationships or flows between the nodes. SNA provides both a visual and a mathematical analysis of human relationships. Management consultants use this methodology with their business clients and call it Organizational Network Analysis [ONA]. http://www.orgnet.com/sna.html 8 Social Network Analysis Extremely fast growing field: – "supplier" to sociology, anthropology, psychology, management, health, etc – "client" of graph theory, algebra, statistics, sociometry, psychometry 9 Most complex systems are graph-like http://www.analytictech.com/networks/whatis.htm Social Network Examples: Friendship 10 http://www.instituteforadvancedstudies.org.uk/Portals/50/ComplexNetworks_Jan/Estrada_tutorial1.ppt Social Network Examples: Scientific Collaboration 11 http://www.instituteforadvancedstudies.org.uk/Portals/50/ComplexNetworks_Jan/Estrada_tutorial1.ppt Social Network Examples: Business ties in US biotech-industry 12 http://www.instituteforadvancedstudies.org.uk/Portals/50/ComplexNetworks_Jan/Estrada_tutorial1.ppt Social Network Analysis Problems: – IRL - lack of sufficient computing resources to handle large datasets. Problematic to bound a social network IRL, E.G. If we are looking at needle-sharing among drug users, we can artificially bound the network at some arbitrary boundary, such as city or neighbourhood, but this distorts the data – Online – SNSs allow us to use groups and communities membership lists to create boundaries – criticised as too theoretical but a proper understanding gives enormous power to control people and events, as well as understanding of history (e.g. rise of Moscow in 12th century Russia in terms of betweenness centrality) 13 http://www.analytictech.com/networks/whatis.htm 14 Basic Graph Theory 15 Graph theory gives us the tools to analyse social networks The length of the lines and the shape of the graph doesn’t usually mean anything because all it is representing is that there is or is not a relationship These three graphs show the exact same network: Example 16 A network depicting the sites on the Internet, then known as the Arpanet, inDecember 1970. (Image from F. Heart, A. McKenzie, J. McQuillian, and D. Walden [215]; on-line at http://som.csudh.edu/cis/lpress/history/arpamaps/.) Examples An alternate drawing of the 13-node Internet graph on the previous slide – looks different but expresses the same thing. 17 More on graphs 18 The nodes in a graph represent persons (or animals, organizations, cities, countries, etc) and the lines (edges) represent relationships among them. The edge between persons a and b is represented mathematically like this: (a,b). The network drawn below contains these edges: (a,b), (a,e), (a,d), (b,c), and (d,c). http://www.analytictech.com/mb021/graphtheory.htm Mathematically: 19 V is a finite set, called the vertices of G, E is a finite set, called the edges of G, and φ is a function with domain E and codomain P2(V ). Degree, & the handshaking lemma 20 The degree of a node is defined as the number of lines incident upon that node. The degree of B is 6 because it has 6 links The handshaking lemma is the statement that every finite undirected graph has an even number of nodes/vertices with odd degree Name comes from mathematical proof that in a party of people some of whom shake hands, an even number of people must have shaken an odd number of other people's hands (think about it ;) ). Isolates and pendants 21 How would you add an isolate to this graph? In a SNS on the web, when would you typically be an isolate and for how long? What is a pendant? Locate one on the graph. Undirected graphs 22 There are different kinds of graphs, e.g. directed and undirected undirected graphs: edges have no direction. For example below, there is a relationship between a and b, and this is the same thing as saying there is a relationship between b and a. We could refer to the edge as (a,b) or (b,a) -- it makes no difference. Directed graphs 23 In directed graphs (also known as digraphs), the edges do have direction. In such cases, we typically draw the graph with arrowheads, and refer to the lines as "arcs". Example below records the social relation "who likes whom". Persons b, d and e all say they like person a. Note that person does not say they like d or e, but they do reciprocate with b. Nobody says they like e. More on graphs 24 In a directed graph, a point has both indegree and outdegree. The outdegree is the number of arcs from that point to other points. E.G., below the outdegree of node a is 1. The indegree is the number of arcs coming in to the point from other points. E.G., below the indegree of node a in is 3. Paths 25 A path is an alternating sequence of points and lines, beginning at a point and ending at a point, and which does not visit any point more than once Walks 26 A Walk is an alternating sequence of nodes and edges, beginning and ending with a vertex A walk is closed if its first and last vertices are the same, and open if they are different. An open walk is also called a path. Examples 27 (a) www.airlineroutemaps.com/USA/Northwest Airlines asia pacic.shtml, (b) www.wmata.com/metrorail/systemmap.cfm, (c) www.cs.cornell.edu/ugrad/owchart.htm Examples 28 The depictions of airline and subway systems in (a) and (b) are examples of transportation networks, in which nodes are destinations and edges represent direct connections. Much of the terminology surrounding graphs derives from metaphors based on transportation through a network of roads, rail lines, or airline flights A Water Molecule 29 Examples 30 The prerequisites among college courses in (c) is an example of a dependency network, in which nodes are tasks and directed edges indicate that one task must be performed before another. The design of complex software systems and industrial processes often requires the analysis of enormous dependency networks, with important consequences for efficient scheduling in these settings. Examples 31 The Tank Street Bridge from Brisbane, Australia shown in (d) is an example of a structural network, with joints as nodes and physical linkages as edges. The internal frameworks of mechanical structures like buildings, vehicles, or human bodies are based on such networks Connectedness and reachability 32 A graph is connected if there exists a path (of any length) from every node to every other node. The longest possible path between any two points in a connected graph is n-1, where n is the number of nodes in the graph A node is reachable from another node if there exists a path of any length from one to the other Examples A graph with 3 connected components. The graph as a whole is NOT connected. a connected component of a graph is a subset of the nodes such that: – (i) every node in the subset has a path to every other; and – (ii) the subset is not part of some larger set with the property that every node can reach every other 33 Examples 34 A network in which the nodes are students in a large American high school, and an edge joins two who had a romantic relationship at some point during the 18-month period in which the study was conducted Geodesic distance 35 The graph-theoretic or geodesic distance between two points is defined as the length of the shortest path between them. Thus, a geodesic path is the shortest path between 2 points If something is flowing through a network (such as gossip, or a disease), the time that it takes to get from one point to another is partly a function of the graph-theoretic distance/length of the geodesic path between them. Pivotal nodes 36 We say that a node X is pivotal for a pair of distinct nodes Y and Z if X lies on every geodesic (shortest) path between Y and Z (and X is not equal to either Y or Z) E.G. in graph below, node B is pivotal for two pairs: the pair consisting of A and C, and the pair consisting of A and D. B is not pivotal for the pair consisting of D and E – why? Is node D is not pivotal for any pairs? Exercises 37 Give an example of a graph in which every node is pivotal for at least one pair of nodes. Explain your answer. Give an example of a graph in which every node is pivotal for at least two different pairs of nodes. Explain your answer. Give an example of a graph having at least four nodes in which there is a single node X that is pivotal for every pair of nodes (not counting pairs that include X). Explain your answer. Centrality The centrality of a node in a network is a measure of the structural importance of the node. A person's centrality in a social network affects the opportunities and constraints that they face. There are three important aspects of centrality: – degree/activity, – betweenness, and – closeness. 38 Centrality 39 Degree/Activity – the number of direct connections a node has, i.e. who knows whom. the greater a person's degree, the more potential influence they have on the network, and vice-versa (good/bad -> gossip, viruses) Centrality 40 Here Diane has the most direct connections, making hers the most active node in the network. She is a 'connector' or 'hub' in this network. Centrality Degree/Activity – Often "the more connections, the better”, but not always. – Where do those connections lead, how do they they connect the otherwise unconnected? – Diane has connections only to others in her immediate cluster -her clique. She connects only those who are already connected to each other. 41 Centrality Betweenness – Loosely speaking, betweenness centrality is defined as the number of geodesic paths that pass through a node (remember that a geodesic path is the shortest path between 2 points) – It is the number of "times" that any route needs go through a given node to reach any other node by the shortest path, i.e. for a given node it is the number of shortest paths in the network that pass through that node 42 Calculating betweenness 43 Betweenness with respect to specific nodes = #geodesic paths through node/total #geodesic paths Olaf’s Betweenness Score with respect to Eliza and Latisha: Geodesic paths from Eliza to Latisha: EJOBL and EAOBL Geodesic paths from Eliza to Latisha through Olaf: EJOBL and EAOBL Olaf’s betweeness score with respect to Eliza and Latisha is 1. Calculating betweenness 44 Betweenness with respect to specific nodes = #geodesic paths through node/total #geodesic paths Anna’s Betweenness Score with respect to Eliza and Latisha: Geodesic paths from Eliza to Latisha: EJOBL and EAOBL Geodesic paths from Eliza to Latisha through Anna: EAOBL Anna’s betweeness score with respect to Eliza and Latisha is 1/2. Centrality Betweenness – A node that has high betweenness can control the flow of information, acting as a gatekeeper. (e.g., executive secretaries, power) – benefits to being in the middle: the information benefit from being plugged into different camps or regions of the network – e.g. hear different versions of the story, more data to be able to predict outcomes the control benefit of being able to play one person against the other. 45 Centrality Betweenness – Diane has many direct ties. Heather has few direct fewer than the average in the network but has one of the best locations in the network -- she is between two important constituencies, plays a 'broker' role in the network. – Also single point of failure -great influence over what flows and does not in the network Location, Location, Location." 46 47 Betweenness: this example shades nodes with hues from Red (least) to Blue (most betweenness). [Wikipedia] Centrality Closeness – Closeness centrality is defined the inverse of the sum of the shortest distances between each individual and every other person in the network 48 Centrality 49 Closeness of Olaf d(O,B) = 1 d(O,T) = 2 d(O,L) = 2 d(O,A) = 1 d(O,J) = 1 d(O,D) = 1 d(O,G) = 2 d(O,E) = 2 d(O,R) = 2 Centrality 50 Closeness of Olaf d(O,B) = 1 d(O,T) = 2 d(O,L) = 2 d(O,A) = 1 d(O,J) = 1 d(O,D) = 1 d(O,G) = 2 d(O,E) = 2 d(O,R) = 2 Sum the reciprocals: – 1/1 + 1/2 + 1/2 + 1/1 + 1/1 + 1/1 + 1/2 + 1/2 + 1/2 – Olaf’s closeness score is 6.5. Centrality 51 Closeness of Anna d(A,B) = 2 d(A,T) = 3 d(A,L) = 3 d(A,O) = 1 d(A,J) = 1 d(A,D) = 2 d(A,G) = 1 d(A,E) = 1 d(A,R) = 1 Centrality 52 Closeness of Anna d(A,B) = 2 d(A,T) = 3 d(A,L) = 3 d(A,O) = 1 d(A,J) = 1 d(A,D) = 2 d(A,G) = 1 d(A,E) = 1 d(A,R) = 1 Sum the reciprocals: - 5(1/1) + 2(1/2) + 2(1/3) - Anna’s closeness score is 6.66 Centrality Closeness – When a node has a low closeness score it is highly central – nodes with low closeness scores tend to receive anything flowing through the network very quickly because the speed with which something spreads in a network is a function of the number of links in the paths traversed. Since nodes with low closeness scores are close to all nodes, they receive things quickly. – good/bad dependent on what’s flowing! 53 Centrality Closeness – Fernando and Garth (lowest closeness scores) have fewer connections than Diane, yet the pattern of their direct and indirect ties allow them to access all the nodes in the network more quickly than anyone else shortest paths 54 Centrality Network Centralization – a very centralized network is dominated by one or a few very central nodes. If these nodes are removed or damaged, the network quickly fragments into unconnected sub-networks. – highly central node can become a single point of failure. A network centralized around a well connected hub can fail abruptly if that hub is disabled or removed – a less centralized network has no single points of failure, resilient Networks of low centralization fail gracefully. 55