QUICK REFERENCE GUIDE

GRAPH CONCEPTS

Graph: A set of nodes and edges. Can be directed or undirected. A graph is connected if there is a path between any two of its vertices; otherwise it splits into connected components.
Multigraph: A graph that allows loops and multiple edges.
Weighted Graph: A graph with weighted edges.
Labelled Graph: A graph whose nodes or edges have properties (attributes).
Distance between 2 nodes: Length of the shortest path between the 2 nodes. Example: the distance between A & D is 2.
Simple Path: A path whose nodes are unique (no node is repeated).
Length of a path: Number of edges in the path.
Diameter of graph: Maximum distance in the graph.

NETWORK CHARACTERISTICS

Full Network: Contains all entities and the connections among them.
Ego: Node in focus.
Alter: Neighbor of the ego.
Egocentric Network: An ego and its connections.
Unimodal Network: Only one type of vertex.
Multimodal Network: Vertices of ≥ 2 types, e.g. person, document.
Multiplex Network: Edges of ≥ 2 types, e.g. people and modes of communication.

Graph Level Metrics

Size of network: The number of nodes in the network, or the number of edges in the network.
Density of network: Number of ties in the network over the number of ALL possible ties.
  Directed network of size $n$: no. of possible ties $= n(n-1)$
  Undirected network of size $n$: no. of possible ties $= \frac{n(n-1)}{2}$
Reachability: The ability to get from one vertex to another within a graph.
Usage: Used to gauge the connectedness of the network.

Vertex Metrics (Centrality)

Degree Centrality: Count of the total number of connections linked to a vertex. Note: for directed graphs, use In-degree Centrality and Out-degree Centrality.

Closeness Centrality: Any one of
  $(\text{sum of shortest distances to all other vertices})^{-1}$, OR
  average distance to all other vertices, OR
  $(\text{average distance to all other vertices})^{-1}$
  Example, using geodesic (shortest) distance and the first definition:
    Node A $= \frac{1}{1+1+1+1} = 0.25$
    Node B $= \frac{1}{1+2+2+2} = 0.14$
    Node C $= \frac{1}{1+2+1+2} = 0.17$
    Node D $= \frac{1}{1+2+1+1} = 0.2$
    Node E $= \frac{1}{1+2+2+1} = 0.17$

Betweenness Centrality: Measure of how often a given vertex $v$ lies on the shortest path between two other vertices.
  $\text{Betweenness}(v) = \sum \frac{\text{number of shortest paths passing through } v}{\text{number of shortest paths}}$, summed over pairs of other vertices.
  Note: Betweenness centrality of all nodes = 0 when network density = 1.

Eigenvector Centrality: Depends on both the number and the quality of a vertex's connections.

  Example values:
    Node          A      B      C      D      E
    Betweenness   0.5    1.5    0.0    0.5    1.5
    Eigenvector   0.162  0.241  0.194  0.162  0.241
  For Node B, betweenness = ABC + ABE/ADE = 1 + 0.5 = 1.5. The only shortest path between A and C runs through B (contributing 1), and B lies on one of the two shortest paths between A and E, namely A-B-E and A-D-E (contributing 0.5).

β-centrality metric:
  Small value: Analysis is weighted towards the local structure surrounding the ego.
  Large value: Analysis is weighted towards the wider network structure.
  Positive beta: It is good for ego to be connected to highly central people.
  Negative beta: It is a disadvantage for ego to be connected to others who are themselves well-connected.

Cut Vertex: A vertex whose removal disconnects a graph.
Bridge: An edge whose removal disconnects a graph. Note: See Structural Balance.

Vertex Characteristics (Pivotal, Gatekeeper)

Pivotal Node: A node X is pivotal for a pair of distinct nodes Y and Z if X lies on every shortest path between Y and Z. Example: Node B is pivotal for pairs A & C, and A & D.
Gatekeeper: A node X is a gatekeeper if, for a pair of nodes Y and Z, every path from Y to Z passes through X. Example: Node A is a gatekeeper.
Local Gatekeeper: A node V is a local gatekeeper if there are two neighbors of V, Y and Z, that are not connected by an edge. Example: Node D is a local gatekeeper, but not a gatekeeper.
Implications: Gatekeeper ⇒ Pivotal; Gatekeeper/Pivotal ⇒ Local Gatekeeper.

Comparison

Generally, the 3 centrality types (degree, closeness, betweenness) will be positively correlated. When they are not, it probably tells you something interesting about the network:
  High degree, low closeness: Ego is embedded in a cluster that is far from the rest of the network.
  High degree, low betweenness: Ego's connections are redundant; communication bypasses him/her. (Alters connect to each other.)
  High closeness, low degree: Key player tied to important/active alters.
  High closeness, low betweenness: Probably multiple paths in the network; ego is near many people, but so are many others.
  High betweenness, low degree: Ego's few ties are crucial for network flow.
  High betweenness, low closeness: Very rare cell. Would mean that ego monopolizes the ties from a small number of people to many others.
  Note (low-degree cases): The alter is very important and connected to a big chunk of the network.

SOCIAL GROUPS

Dyads (2 nodes)
  Undirected: 2 possibilities, tie Yes/No.
  Directed: No tie, 1-way (which way), 2-way.
Triads (3 nodes)
  Undirected: 0, 1, 2, or 3 ties.
  Directed: 16 possible relationships (see the triad codes at the end of this guide).

Reciprocity: Ratio of reciprocated relationships to all dyads, OR ratio of reciprocated relationships to all connected dyads.
  Example: total mutual = 2, total connected = 6, total dyads = 10 ⇒ reciprocity = 2/10 OR 2/6.

Transitivity (directed):
  $x \to y \to z$: No ⇒ Vacuous
  $x \to y \to z$: Yes, and $x \to z$: Yes ⇒ Transitive
  $x \to y \to z$: Yes, and $x \to z$: No ⇒ Intransitive
  See Triadic Closure under STRUCTURAL BALANCE.

Cliques: Every member of a clique knows everybody else, i.e. density = 1 ⇒ any subset of nodes from a clique also forms a clique.

N-Clique: Members within an N-clique are at most distance N away from each other.
  Example: { A, C, E } forms a 2-clique, BUT { A, B, C, E } is not a 2-clique because B-E is at distance 3.

Clans: An N-clan is an N-clique in which every pair has a path within the N-clique of distance ≤ N, i.e. an N-clan cannot use nodes outside the clique.
  Example: { A, B, C } is a 2-clan; { A, C, E } is not a 2-clan.
  Note: Nodes in an N-clique can depend on non-clique nodes to form the N-path.

Clustering

Clustering Coefficient of an ego: How well the alters are connected among themselves, i.e.
  $\text{clustering coefficient} = \frac{\text{actual ties}}{\text{max ties}}$ (density in the 1.5-degree egocentric network).
Clustering coefficient of the entire network: Average of the clustering coefficients of ALL the nodes.

Clustering Algorithm
  Agglomerative: Bottom up. Start from singletons and merge.
  Divisive: Top down. Start from one cluster and split recursively.

STRUCTURAL BALANCE

Triadic Closure: The idea that if two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future.

Strong Triadic Closure Property: If a node A has edges to nodes B and C, then the B-C edge is likely to form if A-B and A-C are both strong ties. Node A violates the Strong Triadic Closure Property if it has strong ties to B and C and there is no edge between B and C at all.

Bridge: An edge whose removal disconnects a graph.

Local Bridge: An edge whose removal leaves its endpoints, A & B, at a distance > 2, i.e. A & B have no common friends.
  ∴ if A-B is a strong-tie (local) bridge, neither A nor B can have a strong tie to another node without violating the Strong Triadic Closure Property.

Graded Measure for a local bridge:
  $\frac{\text{number of nodes who are neighbors of both A and B}}{\text{number of nodes who are neighbors of at least one of A and B}}$
  Example: the measure for A-E is lower than for A-F ∴ A-E is more of a local bridge.

Structural Balance Property: For every set of three nodes, considering the three edges connecting them, either all three are labeled +, or else exactly one of them is labeled +.

Weak Structural Balance Property: There is no set of three nodes such that the edges among them consist of exactly 2 +ve and 1 -ve.
  ⇒ If a graph is weakly balanced, its nodes can be divided into groups where every 2 nodes in the same group are friends and every 2 nodes in different groups are enemies.

Balance Theorem: If a labeled complete graph is balanced, then either all pairs of nodes are friends, or the nodes can be divided into two groups, X and Y, such that
  1) every pair of nodes in X like each other,
  2) every pair of nodes in Y like each other, and
  3) everyone in X is the enemy of everyone in Y,
  i.e. if a complete graph has 2 sets of mutual friends, with complete mutual antagonism between the two, it is balanced.

INFORMATION FLOW

Freeman's formula for Network Centralization

  $$C_D = \frac{\sum_{i=1}^{n} \left[ C_D(n^*) - C_D(n_i) \right]}{(n-1)(n-2)}$$

  $C_D$ is the centralization of the network
  $C_D(n_i)$ is the degree centrality of node $i$
  $C_D(n^*)$ is the degree centrality of the highest-centrality node
  $n$ is the number of nodes in the network

Flow Betweenness

Let $m_{jk}$ be the amount of flow between vertex $j$ and vertex $k$ which must pass through $i$ for any maximum flow. The flow betweenness of vertex $i$ is the sum of all $m_{jk}$ where $i$, $j$ and $k$ are distinct and $j < k$.
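Because flow betweenness is defined through maximum flows, it may help to see how a single maximum flow can be computed. The following is a minimal sketch, not UCINET's implementation: it repeatedly looks for a source-to-sink path with remaining capacity, pushes the bottleneck amount along it, and updates forward and backward capacities. Using breadth-first search to pick each path is an assumption made here (the Edmonds-Karp variant of the augmenting-path method); the function name `max_flow`, the `capacity` dictionary format, and the example network are all hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Return the maximum flow value from source to sink.

    capacity: dict mapping a directed arc (u, v) to its flow capacity.
    """
    if source == sink:
        raise ValueError("source and sink must be different nodes")

    # Remaining (residual) capacity per arc, with reverse arcs starting at 0.
    residual = {}
    neighbours = {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    total = 0
    while True:
        # Breadth-first search for a path with positive remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no path with remaining capacity is left

        # Walk back from the sink to recover the arcs of the path.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]

        # The bottleneck arc limits how much flow this path can carry.
        f = min(residual[arc] for arc in path)
        for u, v in path:
            residual[(u, v)] -= f  # forward direction
            residual[(v, u)] += f  # backward direction
        total += f


# Hypothetical 4-node network: the arcs leaving "s" have total capacity 5.
example = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
           ("a", "t"): 2, ("b", "t"): 3}
print(max_flow(example, "s", "t"))  # 5
```

In the example, the cut separating "s" from every other node has value 3 + 2 = 5, which matches the computed flow, as the max-flow min-cut theorem predicts.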
The flow betweenness is therefore a measure of the contribution of a vertex to all possible maximum flows.

  Example data, Node 2:
    1-2: 2    1-3: 3    1-4: 4(2)    1-5: 2(1)    1-6: 4(2)
    2-3: 0    2-4: 3    2-5: 1       2-6: 3
    3-4: 1    3-5: 1    3-6: 2
    4-5: 0    4-6: 2
    5-6: 3
  Example data, Node 3:
    1-2: 2    1-3: 3    1-4: 4(1)    1-5: 2(1)    1-6: 4(2)
    2-3: 0    2-4: 3    2-5: 1       2-6: 3
    3-4: 1    3-5: 1    3-6: 2
    4-5: 0    4-6: 2
    5-6: 3
  UCINET: Network > Centrality and Power > Flow Betweenness …

Network Centralization: Centralization shows the degree of inequality or variance in the network as a percentage of that of a perfect star network of the same size.
  Note: The star network is the most unequal network, with $C_D = 1$.

Max Flow/Min Cut (Flow, Capacity)

Max-flow min-cut theorem: For any network having a single origin node and a single destination node, the maximum possible flow from origin to destination equals the minimum cut value over all cuts in the network.

Finding Max Flow
  Bookkeeping Algorithm:
  1. Find any path from source to sink that has a positive flow capacity remaining. If there are no more such paths, exit.
  2. Determine $f$, the maximum flow along this path, which is equal to the smallest flow capacity on any arc in the path (the bottleneck arc).
  3. Subtract $f$ from the remaining flow capacity in the forward direction, and add $f$ to the remaining flow capacity in the backward direction, for each arc in the path (if needed).
  4. Go to Step 1.

Finding Min Cut
  A cut is any set of directed arcs containing at least one arc in every path from the source to the sink. The cut value is the sum of the flow capacities, in the source-to-sink direction, of all the arcs in the cut. By the max-flow min-cut theorem, the cut value of the min cut is equal to the max flow.
  UCINET: Network > Cohesion > Max Flow

Information Cascade

Occurs when a person observes the actions of others and then, despite possible contradictions in his/her own private information signals, engages in the same acts.

Conditional Probability:
  $$P(A \mid B) = \frac{P(A) \cdot P(B \mid A)}{P(B)}$$

There are four key conditions in an information cascade model:
  1. Agents make decisions sequentially.
  2. Agents make decisions rationally based on the information they have.
  3. Agents do not have access to the private information of others.
  4. A limited action space exists (e.g. an adopt/reject decision).

STUDY DESIGN

1. Basics: Measurements & Data

Variable: Characteristic or property.
Scales: Nominal, Ordinal, Interval, Ratio.
  Nominal: Categorical; qualitative. e.g. Male, Female; North, South, East, West.
  Ordinal: Ordered, but with no concept of gap size: $a > b > c$. e.g. first, second, third; primary, secondary, jc.
  Interval: Gaps measured in continuous units. Can perform +, − operations. e.g. Celsius.
  Ratio: Ratios can be compared. Can perform +, −, ×, ÷ operations. e.g. dollars.

What type of scale to use?
  Degree Centrality: Ratio
  Betweenness Centrality: Ratio
  Pivotal/Non-pivotal: Categorical
  Survey Ratings: Ratio
  Edge (Yes/No): Categorical
  Weighted edge (e.g., 1…10): Ratio

2. Data collection

Asking Respondents:
  1) Simple questions (e.g. age, education)
  2) Survey-type questions
  3) Open-ended questions
  4) Roster choice method, i.e. respondents are given a list (roster) of people and asked questions about them, e.g. which of the following would you regard as a friend
Experiments: Measure variables.
Web Access: Web crawling; blogs, forums, social media.
Secondary Data:
  1) Datasets on the internet (context)
  2) Reports
  3) Email records
  4) Company transaction records

3. Steps in doing a social network study

Decide what to study
  What to study? The hypothesis. See Notes for examples.
  Variables: Identify variables, consider independent variables, e.g. node properties, edge properties.
  Level of detail: e.g. team email: sender, receiver, etc.
Choose relevant population
  Sampling: Identify the population the study is interested in:
    Roles/positions (directors/politicians)
    Relationships (friends of …)
    Events (participation/communication)
    Time
    Location
  Complete population (census) VS random (ego) + snowball (alters).
Collect data
  Refer to 2. Data collection.
Analyse
  Mixture of qualitative analysis, descriptive statistics, and statistical tests.
Deduce findings
  Statistics, and compare with prior studies.
Report
  Clear, meaningful and obvious graphs.
  Introduction ⇒ Literature Review ⇒ Objective (Hypothesis) ⇒ Methodology ⇒ Analysis ⇒ Findings

UCINET CHEAT SHEET

Basic actions

Import text file
  UCINET steps: Data > Import text file > DL (Ctrl-I)
  Outputs: 1. Text log file; 2. Network files, .##h & .##d
Display dataset
  UCINET steps: Data > Display (Ctrl-D)
  Outputs: the matrix in the selected data file
NetDraw
  UCINET steps: Visualize > NetDraw
  To open a dataset: File > Open > Ucinet dataset > Network
Separate files with multiple matrices
  UCINET steps: Data > Unpack
Prepare data
  UCINET steps: Data > Data Editors > Matrix Editor (Ctrl-S)
Produce matrix from attributes
  UCINET steps: Data > Attribute to matrix
  Note: Refer to Notes
Display univariate statistics
  UCINET steps: Tools > Univariate Statistics (Ctrl-U)
  Note: Refer to Notes
Compute network metrics
  UCINET steps: Network > Centrality and power > Multiple measures (Ctrl-M)
  Input: .##h file

Tests

Test observed mean/density against a fixed value (find p-value against a fixed value)
  UCINET action: Network > Compare densities > Against theoretical parameter
  Analysis: Tests whether the density of the selected network is close to the expected density. In this case, the z-score is -3.7943, i.e. 3.79 s.d. to the left of the expected density ⇒ the observed density is significantly smaller than the expected density of 1.0, as p-value = 0.0002.
  Note: The actual density is as shown in the UCINET output of Display dataset.

Test of density difference between 2 networks (more than a comparison of means: takes variability into account); find p-value of 2 groups divided on node attributes
  UCINET action: Network > Compare densities > Paired (same nodes)
  Analysis: Used to compare the density difference between two networks. Good for testing the difference over time of the same network. t-statistic = 2.4089, p-value = 0.0052 < 0.05 ∴ the difference in density is significant.
  Compares Matrix VS Matrix.

Correlation between 2 networks with the same actors
  UCINET action: Tools > Testing Hypothesis > Dyadic (QAP) > QAP Correlation (Ctrl-Q)
  Analysis: Find r. Col 1: coefficient for the dataset. Col 2: p-value. Col 3: average coefficient of all sampled datasets. ∵ p-value > 0.05 ⇒ the correlation is not significant.
  Compares Matrix VS Matrix. NO dependency.

Regression (you have control over the independent variable)
  UCINET action: Tools > Testing Hypothesis > Dyadic (QAP) > QAP Regression > MR-QAP Linear Regression >
    Double Dekker: if no missing values
    Semi-partialling: missing values
  Analysis: Look at R-sq first to see if the model is a good fit. Then look at individual variables.
  Compares Matrix VS Matrix.

T-test of 2 group means
  UCINET action: Tools > Testing Hypothesis > Node-level > T-Test
  Analysis: The t-test is used to test whether there are differences between the means of two groups, in this case whether the govt and non-govt groups have different out-degree centrality (col 1). Is one group bigger than the other? Result: no difference across groups, as all p-values are > 0.05.
  Compares Column VS Column.

ANOVA for 2 or more groups
  UCINET action: Tools > Testing Hypothesis > Node-level > Anova
  Analysis: Look at the F-statistic and its significance. The significance is the same as that of a two-tailed test. Note: Refer to Notes.
  Compares Column VS Column.

Triad

[Figure: triad types; panel: Undirected]

Triad codes are written as three digits, optionally followed by a letter:
  1st digit: No. of mutual dyads
  2nd digit: No. of asymmetric dyads
  3rd digit: No. of null dyads
  D: Down    U: Up    T: Transitive    C: Cyclic
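
The three-digit part of a triad code can be obtained by classifying each of the triad's three dyads as mutual, asymmetric, or null and counting them. The following is a minimal sketch under stated assumptions: the directed network is given as a hypothetical set of (source, target) pairs, the function names `dyad_type` and `man_code` are illustrative only, and the D/U/T/C letter is not derived.

```python
from itertools import combinations


def dyad_type(edges, a, b):
    """Classify the dyad {a, b} in a directed network."""
    forward = (a, b) in edges
    backward = (b, a) in edges
    if forward and backward:
        return "mutual"
    if forward or backward:
        return "asymmetric"
    return "null"


def man_code(edges, triad):
    """Return the three-digit code (mutual, asymmetric, null) for three nodes."""
    counts = {"mutual": 0, "asymmetric": 0, "null": 0}
    for a, b in combinations(triad, 2):
        counts[dyad_type(edges, a, b)] += 1
    return f'{counts["mutual"]}{counts["asymmetric"]}{counts["null"]}'


# Hypothetical triad: x and y are tied both ways, y points to z, x and z are unconnected.
edges = {("x", "y"), ("y", "x"), ("y", "z")}
print(man_code(edges, ("x", "y", "z")))  # 111
```

Counting mutual dyads against connected or total dyads in the same way gives the two reciprocity ratios described under SOCIAL GROUPS.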