An Introduction to Social Network Analysis
James Moody
Department of Sociology
The Ohio State University

Introduction
The world we live in is connected:
[Figure labels: Jim Moody, Craig Calhoun, Isaias Afworki]

Introduction
These patterns of connection form a social space. Social network analysis maps and analyzes this social space.
[Figure: Adolescent Social Structure]

Introduction
Yet standard social science analysis methods do not take this space into account. Moreover, the complexity of the relational world makes it impossible (in most cases) to understand this connectivity using only our intuitive understanding of a setting.

Introduction
Why networks matter:
• Intuitive: information travels through contacts between actors, which can reflect a power distribution or influence attitudes and behaviors. Our understanding of social life improves if we account for this social space.
• Less intuitive: patterns of inter-actor contact can have effects on the spread of “goods” or power dynamics that could not be seen focusing only on individual behavior.

Introduction
Social network analysis is:
• a set of relational methods for systematically understanding and identifying connections among actors
• a body of theory relating to types of observable social spaces and their relation to individual and group behavior.

Introduction
Network analysis assumes that:
• How actors behave depends in large part on how they are linked together
  Example: Adolescents with peers that smoke are more likely to smoke themselves.
• The success or failure of organizations may depend on the pattern of relations within the organization
  Example: The ability of companies to survive strikes depends on how product flows through factories and storehouses.
(continued…)
Introduction
Network analysis assumes that:
• Patterns of relations reflect the power structure of a given setting, and clustering may reflect coalitions within the group
  Example: Overlapping voting patterns in a coalition government

Introduction
An information network: Email exchanges within the Reagan White House, early 1980s (source: Blanton, 1995)

Introduction
Power positions and potential influence

Overview
• Introduction
• Basic Concepts
• Flows within Networks
• Structure of Social Space
• Tools, Models & Methods For Flows and Structures
• Conclusions

Basic Concepts
• Actors are nodes
  Ideas, Papers, Events, Individuals, Organizations, Nations
• Relations are lines between pairs of nodes
  - Symmetric (shares a room with)
  - Asymmetric (gives an order to)
  - Valued (number of times seen together)

Basic Concepts
• Network data are familiar to you
• For example:
  - Personal, face-to-face contact
  - Telephone contact
  - Email contact
  - Contact through faxes or wires
  - Snail-mail contact
  - Membership in the same organization
  - Attendance at the same meetings
  - Graduates of the same university

Basic Concepts
For example, you might be tracking the activities of a number of people in related, but not identical cases, including meetings they attended. You may know little of the content of the event, or what they may have said to each other, only whether particular people were at the event. Your data might look like:

Basic Concepts
11.19.2001. Meeting at Brussels. Attending: Smith, Johnson, Davis, James, Jackson
12.22.2001. Meeting at Paris. Attending: Johnson, James, Jones, Wilson
1.12.2001. Meeting in New York. Attending: Jones, Carter, Burns
2.14.2001. Meeting in Denver. Attending: Wilson, Burns, Wilf, Newman
(Red bold indicates people who are the focus of an investigation)

Basic Concepts
While perhaps not immediately apparent when looking at the list of names, a simple algorithm reveals connections among these actors.
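One way such a "simple algorithm" might look is sketched below. It is an illustration only, not necessarily the procedure used for the slides: it assumes that two people are tied whenever they attended the same meeting, counts shared meetings as a valued tie, and then traces who can be reached from whom. The function names co_attendance and reachable are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Meeting attendance lists copied from the example slide.
meetings = {
    "Brussels": ["Smith", "Johnson", "Davis", "James", "Jackson"],
    "Paris": ["Johnson", "James", "Jones", "Wilson"],
    "New York": ["Jones", "Carter", "Burns"],
    "Denver": ["Wilson", "Burns", "Wilf", "Newman"],
}


def co_attendance(meetings):
    """Return {(a, b): number of meetings a and b both attended}."""
    ties = defaultdict(int)
    for attendees in meetings.values():
        # Every pair of attendees at the same meeting gets one tie.
        for a, b in combinations(sorted(set(attendees)), 2):
            ties[(a, b)] += 1
    return dict(ties)


def reachable(ties, start):
    """Return everyone connected to start, directly or indirectly."""
    neighbors = defaultdict(set)
    for a, b in ties:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = {start}
    frontier = [start]
    while frontier:
        person = frontier.pop()
        for other in neighbors[person]:
            if other not in seen:
                seen.add(other)
                frontier.append(other)
    return seen


ties = co_attendance(meetings)
connected_to_smith = reachable(ties, "Smith")
```

Under these assumptions Smith and Newman never attend the same meeting, yet they end up in the same connected set through intermediaries such as Johnson and Wilson.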
[Figure: network of connections among Smith, Johnson, Newman, Wilson, Jackson, Wilf, James, Jones, Burns, Davis, and Carter]

Basic concepts
Types of network data:
1) Ego-network
  - Have data on a respondent (ego) and the people they are connected to (alters)
  - May include estimates of connections among alters

Basic concepts
Types of network data:
2) Partial network
  - Ego networks plus some amount of tracing to reach contacts of contacts
  - Something less than a full account of connections among all pairs of actors in the relevant population
  - Example: CDC contact tracing data for STDs

Basic concepts
Types of network data:
3) Complete
  - Data on all actors within a particular (relevant) boundary
  - Never exactly complete, but boundaries are set
  - Example: Coauthorship data among all writers in the social sciences

Examples: linked levels of data
[Figure labels: Actor, Key contact, Contact’s contact; Primary Relation, Alter Relation, Trace Relation]

Why networks matter:
Consider the following (much simplified) scenario:
• Probability that actor i passes information to actor j ($p_{ij}$) is a constant over all relations = 0.6
• S & T are connected through the following structure:
  [Figure: network connecting S and T]
• The probability that S passes the information to T through either path would be: 0.09

Why networks matter:
Now consider the following (similar?) scenario:
[Figure: network connecting S and T]
• Every actor but one has the exact same number of contacts
• The category-to-category mixing is identical
• The distance from S to T is the same (7 steps)
• S and T have not changed their behavior
• Their contacts’ contacts have the same behavior
• But the probability of the information passing from S to T is: 0.148
• Different outcomes & different potentials for intervention

Overview
• Introduction
• Basic Concepts
• Flows within Networks
• Structure of Social Space
• Tools, Models & Methods For Flows and Structures
• Conclusions

Network Flow
In addition to the simple probability that one actor passes information on to another ($p_{ij}$), two factors affect flow through a network:
Topology - the shape, or form, of the network
  - Example: one actor cannot pass information to another unless they are either directly or indirectly connected
Time - the timing of contact matters
  - Example: an actor cannot pass on information he has not yet received

Topology
Two features of the network’s shape are known to be important: connectivity and centrality.
Connectivity refers to how actors in one part of the network are connected to actors in another part of the network.
• Reachability: Is it possible for actor i to reach actor j? This can only be true if there is a chain of contact from one actor to another.
• Distance: Given that they can be reached, how many steps are they from each other?
• Number of paths: How many different paths connect each pair?

Network topology: reachability
Without full network data, you can’t distinguish actors with limited information from those more deeply embedded in a setting.
[Figure: actors a, b, c]

Network topology: distance & number of paths
Given that ego can reach alter, distance determines the likelihood of information passing from one end of the chain to another.
• Because information spread is never certain, the probability of transfer decreases over distance.
• However, the probability of transfer increases with each alternative path connecting pairs of people in the network.

Network topology: distance & number of paths
Distance is measured by the (weighted) number of relations separating a pair:
Actor “a” is:
  1 step from 4 actors
  2 steps from 5 actors
  3 steps from 4 actors
  4 steps from 3 actors
  5 steps from 1 actor
[Figure: network with actor a]

Network topology: distance & number of paths
Paths are the different routes one can take. Node-independent paths are particularly important.
There are 2 independent paths connecting a and b. There are many non-independent paths.
[Figure: network with actors a and b]

[Figure: Probability of information transfer by distance and number of paths, assuming a constant $p_{ij}$ of 0.6. x-axis: path distance (2 to 6); y-axis: probability; curves for 1, 2, 5, and 10 paths]

Reachability in Colorado Springs (Sexual contact only)
• High-risk actors over 4 years
• 695 people represented
• Longest path is 17 steps
• Average distance is about 5 steps
• Average person is within 3 steps of 75 other people
• 137 people connected through 2 independent paths, core of 30 people connected through 4 independent paths
(Node size = log of degree)

Network topology: centrality
Centrality refers to (one dimension of) location, identifying where an actor resides in a network.
• For example, we can compare actors at the edge of the network to actors at the center.
• In general, this is a way to formalize intuitive notions about the distinction between insiders and outsiders.

Centrality example:
At the local level, we expect people like NSJMP and NSOLN to have greater access to information than others in the network. Network analysis gives us a set of tools to quantify this difference.

Centrality example:
Actors who appear very different when seen individually are comparable in the global network.
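The trade-off between distance and number of paths described earlier can be sketched numerically. The snippet below is a simplified illustration, not necessarily how the plotted curves were computed: it assumes every tie transmits independently with the same probability p, and that the k alternative paths are node-independent and all have the same length d. The function name transfer_probability is hypothetical.

```python
def transfer_probability(p, distance, n_paths):
    """Chance that information crosses at least one of n_paths routes.

    Assumes each route has `distance` steps, each step succeeds
    independently with probability p, and routes share no intermediaries.
    """
    # A single route works only if every step along it works.
    single_route = p ** distance
    # At least one route works unless all of them fail.
    return 1 - (1 - single_route) ** n_paths


# Tabulate the same ranges as the figure: distances 2 to 6; 1, 2, 5, 10 paths.
table = {
    k: [round(transfer_probability(0.6, d, k), 3) for d in range(2, 7)]
    for k in (1, 2, 5, 10)
}
for k, row in table.items():
    print(k, "path(s):", row)
```

With a single path the probability falls off quickly as distance grows, while adding independent paths pushes it back up, which is the qualitative pattern the two bullets describe.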
(In the centrality figure, node size is proportional to betweenness centrality.)

Information flows
Two factors that affect network flows:
Topology - the shape, or form, of the network
  - simple example: one actor cannot pass information to another unless they are either directly or indirectly connected
Time - the timing of contacts matters
  - simple example: an actor cannot pass on information he has not yet received

Timing in networks
A focus on contact structure often slights the importance of network dynamics.
Time affects networks in two important ways:
1) The structure itself goes through phases that are correlated with information spread
2) The timing of contact constrains information flow

Changes in Network Structure
Sexual relations in a syphilis outbreak
Rothenberg et al. map the pattern of sexual contact among youth involved in a syphilis outbreak in Atlanta over a one-year period. (Syphilis cases in red)
[Figure panels: Jan-June, 1995; July-Dec, 1995; July-Dec, 1995]

Data on drug users in Colorado Springs, over 5 years
[Figure panels: Drug Relations, Colorado Springs, Year 1 through Year 5. From Year 2 onward: current year in red, past relations in gray]

What impact does timing have on flow through the network?
In addition to changes in the shape over time, contact timing constrains how information can flow through the network.
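The timing constraint can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the method behind the slides: contacts are undirected and active over a closed period [begin, end], information passes instantly at any moment both parties are in contact, and the actors and periods below are made up for the example. The function name earliest_arrival is hypothetical.

```python
import math


def earliest_arrival(contacts, source, start_time=0):
    """Earliest time each actor could receive information from source.

    contacts: list of (u, v, begin, end) undirected contact periods.
    Actors missing from the result cannot be reached in time order.
    """
    arrival = {source: start_time}
    changed = True
    while changed:
        changed = False
        for u, v, begin, end in contacts:
            for giver, receiver in ((u, v), (v, u)):
                held_since = arrival.get(giver, math.inf)
                # The giver must hold the information before the contact ends.
                if held_since <= end:
                    candidate = max(held_since, begin)
                    if candidate < arrival.get(receiver, math.inf):
                        arrival[receiver] = candidate
                        changed = True
    return arrival


# Hypothetical chain: A and B are in contact first, B and C later.
contacts = [("A", "B", 1, 2), ("B", "C", 3, 4)]
from_a = earliest_arrival(contacts, "A")
from_c = earliest_arrival(contacts, "C")
```

In this made-up chain, information starting at A can reach C, but information starting at C cannot reach A, because the A-B contact has already ended by the time B hears from C. The same ties in a different order would give a different answer.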
Consider the following example:
A hypothetical contact network
[Figure: contact network among actors A, B, C, D, E, F. Numbers above lines indicate contact periods; legible labels: 2-5, 8-9, 3-5]

The path graph for the hypothetical contact network
[Figure: path graph among actors A, B, C, D, E, F]

Direct contact network of 8 people in a ring
[Figure: ring of 8 people with its adjacency matrix; cell = number of paths from row to column]

Implied contact network of 8 people in a ring
[Figure panels, each with a matrix of path counts:
  All contacts concurrent
  Mixed Concurrent, Density = 0.57
  Serial (1), Density = 0.73
  Serial (2), Density = 0.51
  Serial (3), Density = 0.43]

Information flows
Summary:
Topology:
  - Information requires connected communication chains
  - Real-world networks are too complex to map these without specialized tools.
Time:
  - Network topology changes over time. This has implications for information flow.
  - Because small changes in relationship timing can have dramatic effects on information flow, these effects cannot be judged by intuition alone.

Overview
• Introduction
• Basic Concepts
• Flows within Networks
• Structure of Social Space
• Tools, Models & Methods For Flows and Structures
• Conclusions

Structure of Social Space
Information flows are only one use of networks. It is also possible to characterize the key topological features of any social network. These features include things such as the extent of hierarchy and clustering.

Structure of Social Space
1) Identify core groups & patterns of relations among groups
  a. embeddedness in groups constrains action
  b. group structure affects stability & resource distribution
2) Locate tensions or inconsistencies in a relational structure that might indicate sources of social change.

Structure of Social Space
Two features of interest related to network structure:
1) Cohesive groups: Sets of people who interact frequently with each other. These are often groups that work together. Groups are often organized into positions within a network that indicate particular roles or access to resources.
2) Hierarchy: Relational structure can identify the leadership positions within a network, through either direction of ties or periphery status.

Structure of cohesive groups
A cohesive group is a set of actors with more interaction inside the group than outside the group, mutually connected through multiple paths.

Cohesive Group Structure: “Immaculate Preparatory High School”
[Figure panels: 3 types of positions; Group member; Bridge between groups; Outsider]

Cohesive Groups: Relevance
• Identify people who bridge important constituencies - people who are between groups have a unique ability to control information
• Such actors are said to bridge structural holes; the number of “holes” an actor bridges gives insight into that actor’s power position in the network.

Hierarchy and network position
Many cohesive groups are embedded within a hierarchy, which one can map using relational tools. Changes in hierarchical position indicate changes in the power structure.
Examples of Hierarchical Systems
[Figure panels: Linear Hierarchy (all triads transitive); Simple Hierarchy; Branched Hierarchy; Mixed Hierarchy]

Hierarchy and network position
If you don’t know the hierarchy of the network, asymmetry optimization techniques allow one to identify levels in a hierarchy.

Group structure through multiple relations
Start with some basic ideas of what a role is: an exchange of something (support, ideas, commands, etc.) between actors. Thus, we might represent a family as:
[Figure: family network with nodes H, W, C, C, C and relations: Romantic Love; Provides food for; Bickers with]
(and there are, of course, many other relations inside the family)

Group structure through multiple relations
The key idea is that we can express a role through a relation (or set of relations) and thus a social system by the inventory of roles. If roles equate to positions in an exchange system, then we need only identify particular aspects of a position. But what aspect?

Structural Equivalence
Two actors are structurally equivalent if they have the same types of ties to the same people.
[Figure panels: A single relation; Graph reduced to positions]

Alternative notions of equivalence
Instead of exact same ties to exact same alters, you look for nodes with similar ties to similar types of alters.

Overview
• Introduction
• Basic Concepts
• Flows within Networks
• Structure of Social Space
• Tools, Models & Methods For Flows and Structures
• Conclusions

Tools, Methods & Models
Data Representations
[Figure: one network of nodes 1 to 5 shown four ways: Graph, Adjacency Matrix, Arc List, Node List]
Arc List:
  Send  Recv
  1     2
  1     3
  2     4
  3     2
  4     1
  4     2
  4     3
  4     5
  5     1
  5     3
  5     4

Tools, Methods & Models
Graphical Display
Benefits:
• Intuitive way to display networks.
• Helps people see the social space – it is a map.
• A concise presentation of a great deal of data.
Costs:
• Lack of standards for how to display can create misleading images.
• Displays of large networks tend to reveal only the roughest properties of the network

Tools, Methods & Models
Graphical Display: Software
PAJEK
• Program for analyzing and plotting very large networks
• Intuitive Windows interface
• Used for most of the real data plots in this presentation
• Mainly a graphics program, but is expanding the analytic capabilities
• Free
• Available from:

Tools, Methods & Models
Graphical Display: Software
Cyram Netminer for Windows
• Very new: largely untested
• Price range depends on application
• Limited to smaller networks O(100)

Tools, Methods & Models
Graphical Display: Software
NetDraw
• Also very new, but by one of the best known names in network analysis software.
• Free
• Limited to smaller networks O(100)

Tools, Methods & Models
Analysis Methods: Descriptive / Measurement
The key text for methods and measurement is:
Wasserman, Stanley and Katherine Faust. 1994. Social Network Analysis. Cambridge: Cambridge University Press.
The basic network measures use graph theory to formalize aspects of the network, and always work from either an adjacency matrix (slow for large graphs) or an edge/node list.

Tools, Methods & Models
Analysis Methods: Descriptive / Measurement
Properties of interest include:
Individual Level:
  Degree: Number of contacts for each person - Sum over the row/column of the adjacency matrix.
  Closeness Centrality: Inverse of the distance to every other node in the network. Count path distances from ego to alters.
Sub-group Level:
  Group Membership: Which groups are there? Various search algorithms for identifying groups.
  Group Position: Where does a given group fit in the overall flow of relations? Various equivalence algorithms.
Graph Level:
  Density: Number of ties present as a percentage of all possible ties.
  Centralization: To what degree are edges focused through a small number of nodes? Various formulas for different centrality indices.
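A few of these measures can be sketched directly from an arc list. The example below uses the 5-node arc list from the Data Representations slide and is meant only as an illustration: ties are treated as directed, density counts arcs against the n(n-1) possible ordered pairs, and closeness is normalized as (n-1) divided by the summed distances, which is one common convention among several. Variable and function names are hypothetical.

```python
from collections import deque

# (sender, receiver) pairs from the arc list on the data slide.
arcs = [(1, 2), (1, 3), (2, 4), (3, 2), (4, 1), (4, 2),
        (4, 3), (4, 5), (5, 1), (5, 3), (5, 4)]

nodes = sorted({node for arc in arcs for node in arc})
index = {node: i for i, node in enumerate(nodes)}
n = len(nodes)

# Adjacency matrix: rows send, columns receive.
adjacency = [[0] * n for _ in range(n)]
for sender, receiver in arcs:
    adjacency[index[sender]][index[receiver]] = 1

# Degree: row sums (ties sent) and column sums (ties received).
out_degree = {v: sum(adjacency[index[v]]) for v in nodes}
in_degree = {v: sum(row[index[v]] for row in adjacency) for v in nodes}

# Density: ties present as a share of all possible directed ties.
density = len(arcs) / (n * (n - 1))


def closeness(node):
    """Normalized closeness: (n - 1) / sum of distances from node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search along outgoing ties
        current = queue.popleft()
        for other in nodes:
            if adjacency[index[current]][index[other]] and other not in dist:
                dist[other] = dist[current] + 1
                queue.append(other)
    if len(dist) < n:  # some node is unreachable from this one
        return 0.0
    return (n - 1) / sum(dist.values())


closeness_scores = {v: closeness(v) for v in nodes}
```

In this small graph node 4 sends a tie to every other node, so it has the highest out-degree and the highest closeness under this convention.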
Tools, Methods & Models
Analysis Methods: Descriptive / Measurement: Software
1) UCI-NET
• General network analysis program, runs in Windows
• Good for computing measures of network topography for single nets
• Input-output of data is a little clunky, but workable.
• Not optimal for large networks
• Available from: Analytic Technologies, Borgatti@mediaone.net
2) STRUCTURE
• “A General Purpose Network Analysis Program providing Sociometric Indices, Cliques, Structural and Role Equivalence, Density Tables, Contagion, Autonomy, Power and Equilibria In Multiple Network Systems.”
• DOS interface with somewhat awkward syntax
• Great for role and structural equivalence models
• Manual is a very nice, substantive introduction to network methods
• Available from a link at the INSNA web site: http://www.heinz.cmu.edu/project/INSNA/soft_inf.html

Tools, Methods & Models
Analysis Methods: Descriptive / Measurement: Software
3) NEGOPY
• Program designed to identify cohesive sub-groups in a network, based on the relative density of ties.
• DOS-based program; data must be in arc-list format
• Moving the results back into an analysis program is difficult.
• Available from: William D. Richards, http://www.sfu.ca/~richards/Pages/negopy.htm
4) SPAN - Sas Programs for Analyzing Networks (Moody, ongoing)
• A collection of IML and Macro programs that allow one to:
  a) create network data structures from nomination data
  b) import/export data to/from the other network programs
  c) calculate measures of network pattern and composition
  d) analyze network models
• Allows one to work with multiple, large networks
• Easy to move from creating measures to analyzing data
• All of the Add Health data are already in SAS
• Available by sending an email to: Moody.77@osu.edu

Tools, Methods & Models
Analysis Methods: Statistical Models
There are two general classes of statistical models for networks:
1) Models of the network itself
  The statistical question is how an observed network fits into the class of all possible random graphs with a given set of topological characteristics. The whole network is the substantive unit of analysis, though technically one works with the dyads from the network.
  Examples: p* models (Wasserman and Pattison), MCMC random graph models (Tom Snijders, Mark Handcock)
2) Models of individual behavior that incorporate network characteristics
  The statistical question is whether or not network properties affect individual behaviors.
  Examples: Network regressive-autoregressive models (Doreian), Peer influence models (Friedkin)

Tools, Methods & Models
Analysis Methods: Statistical Models
Exponential Random Graph Models
$$p(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)}$$
Where:
  $z$ is a collection of $r$ explanatory variables, calculated on $x$
  $\theta$ is a collection of $r$ parameters to be estimated
  $\kappa$ is a normalizing constant that ensures the probability sums to 1.
As it turns out, $\kappa$ is incredibly difficult to identify, introducing a number of complexities to the model.

Tools, Methods & Models
Analysis Methods: Statistical Models
Exponential Random Graph Models
To estimate the model, we work with the conditional probabilities $(X_{ij} \mid X_{ij}^{c})$ instead of the full graph.
This transforms the exponential model to a logit model on the dyads:
$$\frac{p(X_{ij} = 1 \mid X_{ij}^{c})}{p(X_{ij} = 0 \mid X_{ij}^{c})} = \frac{\exp\{\theta' z(x_{ij}^{+})\}}{\exp\{\theta' z(x_{ij}^{-})\}} = \exp\{\theta' [z(x_{ij}^{+}) - z(x_{ij}^{-})]\}$$
$$\omega_{ij} = \log\left[\frac{p(x_{ij} = 1 \mid X_{ij}^{c})}{p(x_{ij} = 0 \mid X_{ij}^{c})}\right] = \theta' [z(x_{ij}^{+}) - z(x_{ij}^{-})]$$
Here $\theta$ is the parameter vector, $x_{ij}^{+}$ denotes the graph with the tie from i to j present, and $x_{ij}^{-}$ denotes the same graph with that tie absent.

Analysis Methods: Statistical Models
Exponential Random Graph Models
Software for analyzing these models is available from:
Logit pseudo-likelihood estimation:
  http://kentucky.psych.uiuc.edu/pstar/index.html (SPSS programs)
  http://www.sfu.ca/~richards/Pages/pspar.html (Program for large graphs)
Empirically, these models are tricky to estimate, as the potential result space can easily become degenerate, particularly as z starts to include a more complicated range of dependencies.
MCMC estimation: Ongoing work by Mark Handcock, Tom Snijders and Co.

Tools, Methods & Models
Analysis Methods: Statistical Models
Network Effect Models
The question is whether or not being connected to a particular set of people affects an individual’s behavior. The key statistical point is that we have abandoned the assumption that our cases are independent. These models originated in spatial statistics – looking at the effect of an adjacent geographic area on outcomes for any given area.

Basic Peer Influence Model
Formal Model
$$Y^{(1)} = XB \qquad (1)$$
$$Y^{(t)} = \alpha W Y^{(t-1)} + (1 - \alpha) Y^{(1)} \qquad (2)$$
  $Y^{(1)}$ = an N x M matrix of initial opinions on M issues for N actors
  $X$ = an N x K matrix of K exogenous variables that affect Y
  $B$ = a K x M matrix of coefficients relating X to Y
  $\alpha$ = a weight of the strength of endogenous interpersonal influences
  $W$ = an N x N matrix of interpersonal influences

Basic Peer Influence Model
Formal Model
$$Y^{(1)} = XB \qquad (1)$$
This is the basic general linear model. It says that a dependent variable (Y) is some function (B) of a set of independent variables (X). At the individual level, the model says that:
$$Y_i = \sum_k X_{ik} B_k$$
Usually, one of the covariates is e, the model error term.
Basic Peer Influence Model
$$Y^{(t)} = \alpha W Y^{(t-1)} + (1 - \alpha) Y^{(1)} \qquad (2)$$
This part of the model taps social influence. It says that each person’s final opinion is a weighted average of their own initial opinions
$$(1 - \alpha) Y^{(1)}$$
and the opinions of those they communicate with (which can include their own current opinions)
$$\alpha W Y^{(t-1)}$$

Basic Peer Influence Model
The key to the peer influence part of the model is W, a matrix of interpersonal weights. W is a function of the communication structure of the network, and is usually a transformation of the adjacency matrix. In general:
$$0 \le w_{ij} \le 1, \qquad \sum_j w_{ij} = 1$$
Various specifications of the model change the value of $w_{ii}$, the extent to which one weighs their own current opinion, and the relative weight of alters.

Basic Peer Influence Model
Formal Properties of the model
If we allow the model to run over t, we can describe the model as:
$$Y^{(\infty)} = \alpha W Y^{(\infty)} + (1 - \alpha) XB$$
The model is directly related to spatial econometric models:
$$Y^{(\infty)} = \alpha W Y^{(\infty)} + X\tilde{\beta} + e$$
where the two coefficients ($\alpha$ and $\tilde{\beta}$) are estimated directly (see Doreian, 1982, SMR).

Overview
• Introduction
• Basic Concepts
• Flows within Networks
• Structure of Social Space
• Tools, Models & Methods For Flows and Structures
• Conclusions
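To make equation (2) of the peer influence model concrete, the sketch below iterates it for a single issue (M = 1) until opinions stop changing. It is a minimal illustration, not an estimation routine: the three-actor weight matrix W, the initial opinions, and alpha = 0.5 are made-up values, and the function names are hypothetical.

```python
def influence_step(W, y_prev, y_init, alpha):
    """One application of Y(t) = alpha * W * Y(t-1) + (1 - alpha) * Y(1)."""
    return [
        alpha * sum(w * y for w, y in zip(row, y_prev)) + (1 - alpha) * y0
        for row, y0 in zip(W, y_init)
    ]


def run_influence(W, y_init, alpha, tol=1e-12, max_steps=10_000):
    """Iterate the influence step until opinions change by less than tol."""
    y = list(y_init)
    for _ in range(max_steps):
        y_next = influence_step(W, y, y_init, alpha)
        if max(abs(a - b) for a, b in zip(y_next, y)) < tol:
            return y_next
        y = y_next
    return y


# Hypothetical weights: each row sums to 1 and includes a self-weight w_ii.
W = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
y_initial = [0.0, 0.5, 1.0]  # Y(1): initial opinions on one issue
alpha = 0.5                  # strength of interpersonal influence

y_final = run_influence(W, y_initial, alpha)
```

The resulting y_final approximately satisfies the long-run form Y = alpha * W * Y + (1 - alpha) * Y(1): the two extreme actors are pulled toward the middle but not all the way, because (1 - alpha) keeps each actor anchored to their initial opinion.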