Foundations of Network Analysis

Overview
Theory: A Structural Approach to Sociology
•Emirbayer
•Martin
Methods:
•Points and Lines
•Data formats
•Matrices
•Adjacency Lists
•Edge Lists
•Basic Graph Theory

Homework Results
JWM’s 3-step kinship neighborhood (plus in-laws for fun), N=70+

Foundations Theory
“A manifesto for Relational Sociology”
•“Substantialism vs Relationalism”
•Theoretical domains: Power, equality, freedom, agency
•Substantive domains (research):
 - Social Structure
 - Network analysis
 - Culture
 - Social Psychology
•Problems:
 - Boundary specification
 - Network dynamics
 - Causality
 - Normative implication

Foundations Theory
“Structural Analysis: from method and metaphor to theory and substance.” (Wellman, you didn’t read this)
H. White: “The presently existing, largely categorical descriptions of social structure have no solid theoretical grounding; furthermore, network concepts may provide the only way to construct a theory of social structure.” (p.25)
Form vs. Content
Integration of large-scale social systems

Foundations Theory
“Structural Analysis: from method and metaphor to theory and substance.”
Major Claims:
•Structured social relationships are a more powerful source of sociological explanation than personal attributes of system members.
•Norms emerge from location in structured systems of social relationships
•Social structures determine the operation of dyadic relationships
•The world is composed of networks, not groups
•Structural methods supplant and supplement individualistic methods

Foundations Theory
Social Structures
•Goal: To provide an analytic understanding of social structures “from the ground up” by asking what limitations are created by forms of relations.
An analytic approach to explaining institutions: imagine a noncontradictory aggregation process of individual actions that yields the observed institution.
So institutions are the “crystallization” of relationships:

Foundations Theory
Social Structures
•The first question is how to characterize “social relations”: by form, content, quality, quantity…?
•JLM focuses on a formal aspect of the base relation.
•Examples:
 - Symmetric: ab implies ba
 - Asymmetric: ab does not necessarily imply ba
 - Antisymmetric: ab forbids ba

Foundations Theory
Social Structures
•The second question is how to characterize “social structure”. Do so with respect to particular people, rather than roles/classes.

Foundations Data
The units of interest in a network are the combined sets of actors and their relations.
We represent actors with points and relations with lines.
Actors are referred to variously as: Nodes, vertices, actors or points
Relations are referred to variously as: Edges, Arcs, Lines, Ties
[Figure: example graph on nodes a, b, c, d, e]
(Review from last class…)

Foundations Data
Social network data consist of two linked classes of data:
a) Nodes: Information on the individuals (actors, nodes, points, vertices)
•Network nodes are most often people, but can be any other unit capable of being linked to another (schools, countries, organizations, personalities, etc.)
•The information about nodes is what we usually collect in standard social science research: demographics, attitudes, behaviors, etc.
•Often includes dynamic information about when the node is active
b) Edges: Information on the relations among individuals (lines, edges, arcs)
•Records a connection between the nodes in the network
•Can be valued, directed (arcs), binary or undirected (edges)
•One-mode (direct ties between actors) or two-mode (actors share membership in an organization)
•Includes the times when the relation is active
Graph theory notation: G(V,E)
(Review from last class…)

Foundations Data
In general, a relation can be:
(1) Binary or Valued
(2) Directed or Undirected
[Figure: four example graphs on nodes a to e, labeled "Undirected, binary", "Directed, binary", "Undirected, Valued" and "Directed, Valued"]
The social process of interest will often determine what form your data take. Almost all of the techniques and measures we describe can be generalized across data format.

Social Network Data
Basic Data Elements
[Figure: example graph on nodes a to e, labeled "Directed, Multiplex categorical edges"]
Conceptually, almost all of the techniques and measures we describe can be generalized across data format, but you may have to do some of the coding work yourself….

Foundations Data
[Figure: levels of network data, labeled "Best Friend Dyad", "Primary Group", "Ego-Net", "2-step Partial network" and "Global-Net"]

Foundations Data
We can examine networks across multiple levels:
1) Ego-network
- Have data on a respondent (ego) and the people they are connected to (alters).
- Example: 1985 GSS module
- May include estimates of connections among alters
2) Partial network
- Ego networks plus some amount of tracing to reach contacts of contacts
- Something less than a full account of connections among all pairs of actors in the relevant population
- Example: CDC contact tracing data for STDs

Foundations Data
We can examine networks across multiple levels:
3) Complete or “Global” data
- Data on all actors within a particular (relevant) boundary
- Never exactly complete (due to missing data), but boundaries are set
- Example: Coauthorship data among all writers in the social sciences, friendships among all students in a classroom

A Little Network Visualization History
Euler, 1741
Euler’s treatment of the “Seven Bridges of Königsberg” problem is one of the first moments of graph theory….

A Little Network Visualization History
The study of networks has depended on a graphical element since its first moments:
Or early representations of organizational relations (1921)

A Little Network Visualization History
..but Moreno’s sociograms from Who Shall Survive (1934) are typically seen as the beginnings of social network analysis (certainly if you were to ask Moreno!).

A Little Network Visualization History
Lundberg & Steele 1938 – Using a “Social Atom” representation
The flow of images continued over time, marking a wide range of potential styles….

A Little Network Visualization History
Charles Loomis – 1948
Loomis, 1940s

A Little Network Visualization History
Northway’s “Target Sociograms”
Bronfenbrenner, 1941
Northway 1952
A Little Network Visualization History
“Viral Marketing” is perhaps the most recent advocate, with this ad appearing in popular women’s magazines…

Foundations Graphs
A good network drawing allows viewers to come away from the image with an almost immediate intuition about the underlying structure of the network being displayed. However, because there are multiple ways to display the same information, and standards for doing so are few, the information content of a network display can be quite variable.
Consider the 4 graphs drawn at right. After asking yourself what intuition you gain from each graph, click on the screen. Now trace the actual pattern of ties. You will see that these 4 graphs are exactly the same.

Why Visualize Networks at all?
While the history is deeply rooted in visual analysis, why bother? Consider Anscombe’s answer in the 1973 American Statistician (replicated in Tufte)

 X     y1      y2      y3
 4    4.26    3.10    5.39
 5    5.68    4.74    5.73
 6    7.24    6.13    6.08
 7    4.82    7.26    6.42
 8    6.95    8.14    6.77
 9    8.81    8.77    7.11
10    8.04    9.14    7.46
11    8.33    9.26    7.81
12   10.84    9.13    8.15
13    7.58    8.74   12.74
14    9.96    8.10    8.84

These 3 series seem very similar when viewed statistically:
N = 11
Mean of Y = 7.5
Regression equation: Y = 3 + .5(X)
SE of slope estimate: 0.118
T = 4.24
Sum of squares of X about its mean, $\sum (X - \bar{X})^2$: 110
Regression SS: 27.5
Correlation coefficient: 0.82

Why Visualize Networks at all?
[Figure: scatterplot, both axes 0 to 15] We might expect a relation like this:

Why Visualize Networks at all?
[Figure: scatterplot, both axes 0 to 15] ..but could have this…
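The summary statistics quoted for the three y series can be checked directly. The sketch below is only an illustration in plain Python: the `summarize` helper and the dictionary layout are assumptions made for this example, not part of the original slides. It fits an ordinary least-squares line to each series and reports the mean, slope, intercept and correlation.

```python
# Check that the three y series share nearly identical summary statistics.
x = list(range(4, 15))  # X values 4, 5, ..., 14 from the table
series = {
    "y1": [4.26, 5.68, 7.24, 4.82, 6.95, 8.81, 8.04, 8.33, 10.84, 7.58, 9.96],
    "y2": [3.10, 4.74, 6.13, 7.26, 8.14, 8.77, 9.14, 9.26, 9.13, 8.74, 8.10],
    "y3": [5.39, 5.73, 6.08, 6.42, 6.77, 7.11, 7.46, 7.81, 8.15, 12.74, 8.84],
}


def summarize(x, y):
    """Return mean of y, OLS slope and intercept, correlation, and SS of x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)   # sum of squares of X
    ss_y = sum((yi - mean_y) ** 2 for yi in y)   # sum of squares of Y
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / ss_x
    intercept = mean_y - slope * mean_x
    r = s_xy / (ss_x * ss_y) ** 0.5
    return {"mean_y": mean_y, "slope": slope, "intercept": intercept,
            "r": r, "ss_x": ss_x}


summaries = {name: summarize(x, y) for name, y in series.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All three series give essentially the same line and correlation, which is the point of the example: the differences only show up when the points are plotted.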
[Figure: scatterplot, both axes 0 to 15] …or this… Or many more.

Why Visualize Networks at all?
[Figure: the three scatterplots side by side, all axes 0 to 15]
Visualization allows you to see the relations among elements “in the whole” – a complete macro-vision of your data in ways that summary statistics cannot. This is largely because a good summary statistic captures a single dimension, while visualization allows us to layer dimensionality and relations among them.

Why Visualize Networks at all?
But consider changing a key feature of the scatterplot: the scaled ordering of the axes.
[Figure: the same scatterplot drawn twice, once with the axis values in order from 0 to 15 ("Standard View") and once with the axis values in shuffled order ("Permuted View")]
Technically, all the information is retained – but the presentation provides no new information.

Why Visualize Networks at all?
Now consider network visualizations: We lack a determinate coordinate system, having only “adjacent” or “not” distinguished by a connecting line. Thus, there are many ways to represent the same data.
Consider the “Zachary Karate Club” data: 3 representations of the same underlying data
•Original (Zachary, 1977)
•White & Harary, 2001
•Kolaczyk, Eric D., Chua, David B., Barthélemy, Marc (2009)
The exact same data, presented in press in distinct ways. We’d never see this with a scatter plot…

Foundations Graphs
Network visualization helps build intuition, but you have to keep the drawing algorithm in mind:
Tree-Based layouts
•Most effective for very sparse, regular graphs.
•Very useful when relations are strongly directed, such as organization charts, internet connections.
Spring-embedder layouts
•Most effective with graphs that have a strong community structure (clustering, etc).
•Provides a very clear correspondence between social distance and plotted distance
[Figure: two images of the same network]

Foundations Graphs
Network visualization helps build intuition, but you have to keep the drawing algorithm in mind.
Hierarchy & Tree models
Use optimization routines to add meaning to the “Y-axis” of the plot. This makes it possible to easily see who is most central because of who is on the top of the figure. Usually includes some routine for minimizing line-crossing.
Spring Embedder layouts
Work on an analogy to a physical system: ties connecting a pair have ‘springs’ that pull them together. Unconnected nodes have springs that push them apart. The resulting image reflects the balance of these two features. This usually creates a correspondence between physical closeness and network distance.

Foundations Graphs
[Figure: network with nodes coded Male / Female]

Foundations Graphs
Using colors to code attributes makes it simpler to compare attributes to relations. Here we can assess the effectiveness of two different clustering routines on a school friendship network.

Foundations Graphs
As networks increase in size, the effectiveness of a point-and-line display diminishes, because you simply run out of plotting dimensions. I’ve found that you can still get some insight by using the ‘overlap’ that results from a space-based layout as information. Here you see the clustering evident in movie co-starring for about 8000 actors.

Foundations Graphs
This figure contains over 29,000 social science authors.
The two dense regions reflect different topics.

Foundations Graphs
Adding time to social networks is also complicated, as you run out of space to put time in most network figures. One solution is to animate the network. Here we see streaming interaction in a classroom, where the teacher (yellow square) has trouble maintaining order. The SoNIA software program (McFarland and Bender-deMoll) will produce these figures.
•Black ties: Teaching-relevant communication
•Blue ties: Positive social communication
•Red ties: Negative social communication
Source: Moody, James, Daniel A. McFarland and Skye Bender-deMoll (2005) “Dynamic Network Visualization: Methods for Meaning with Longitudinal Network Movies” American Journal of Sociology 110:1206-1241

Foundations Methods
Graphs are cumbersome to work with analytically, though there is a great deal of good work to be done on using visualization to build network intuition. I recommend using layouts that optimize on the feature you are most interested in. The two I use most are a hierarchical layout and a force-directed layout.
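The spring-embedder idea described earlier can be made concrete with a toy layout routine. This is a minimal sketch, not the algorithm used by any particular package: the function name `spring_layout`, its parameters, and the choice to model "push apart" as a long-rest-length spring between untied pairs are all assumptions made for illustration. The example graph (two triangles joined by one bridge) is also hypothetical.

```python
import math
import random


def spring_layout(nodes, ties, tie_length=1.0, gap_length=3.0,
                  steps=500, rate=0.05, seed=0):
    """Toy spring embedder. Every pair of nodes is joined by a spring:
    tied pairs get a short rest length, untied pairs a long one (assumption)."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    tied = {frozenset(t) for t in ties}
    for _ in range(steps):
        move = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                dist = math.hypot(dx, dy) or 1e-9
                rest = tie_length if frozenset((u, v)) in tied else gap_length
                # Positive when the pair is too far apart (pull together),
                # negative when it is too close (push apart).
                pull = rate * (dist - rest) / dist
                move[u][0] += pull * dx
                move[u][1] += pull * dy
                move[v][0] -= pull * dx
                move[v][1] -= pull * dy
        for v in nodes:
            pos[v][0] += move[v][0]
            pos[v][1] += move[v][1]
    return pos


def mean_distance(pos, pairs):
    """Average plotted distance over a list of node pairs."""
    return sum(math.dist(pos[u], pos[v]) for u, v in pairs) / len(pairs)


# Hypothetical example: two triangles joined by a single bridge (c-d).
nodes = ["a", "b", "c", "d", "e", "f"]
ties = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
        ("d", "e"), ("d", "f"), ("e", "f")]
pos = spring_layout(nodes, ties)

tie_set = {frozenset(t) for t in ties}
untied = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
          if frozenset((u, v)) not in tie_set]
print("mean plotted distance, tied pairs:  ", round(mean_distance(pos, ties), 2))
print("mean plotted distance, untied pairs:", round(mean_distance(pos, untied), 2))
```

Tied pairs should end up closer on the page than untied pairs, which is the correspondence between plotted distance and network distance that makes this family of layouts useful for graphs with community structure.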
Foundations Methods
From pictures to matrices
[Figure: the same five nodes a to e drawn as an undirected graph and as a directed graph]

Undirected, binary
   a b c d e
a  0 1 0 0 0
b  1 0 1 0 0
c  0 1 0 1 1
d  0 0 1 0 1
e  0 0 1 1 0

Directed, binary
   a b c d e
a  0 1 0 0 0
b  1 0 0 0 0
c  0 1 0 1 1
d  0 0 0 0 0
e  0 0 1 1 0

Foundations Methods
From matrices to lists

   a b c d e
a  0 1 0 0 0
b  1 0 1 0 0
c  0 1 0 1 1
d  0 0 1 0 1
e  0 0 1 1 0

Adjacency List
a: b
b: a c
c: b d e
d: c e
e: c d

Arc List
a b
b a
b c
c b
c d
c e
d c
d e
e c
e d

Foundations Basic Measures
Basic Measures & A little graph theory
For greater detail, see: http://www.analytictech.com/networks/graphtheory.htm
Volume
The first measure of interest is the simple volume of relations in the system, known as density, which is the average relational value over all dyads. Under most circumstances, it is calculated as:
$D = \frac{\sum X}{N(N-1)}$

Foundations Basic Measures
Basic Measures & A little graph theory
Volume
At the individual level, volume is the number of relations, sent or received, equal to the row and column sums of the adjacency matrix.

   a b c d e
a  0 1 0 0 0
b  1 0 0 0 0
c  0 1 0 1 1
d  0 0 0 0 0
e  0 0 1 1 0

Node   In-Degree   Out-Degree
a      1           1
b      2           1
c      1           3
d      2           0
e      1           2
Mean:  7/5         7/5

Foundations Data
Basic Measures & A little graph theory
Reachability
Indirect connections are what make networks systems. One actor can reach another if there is a path in the graph connecting them.
[Figure: two example graphs on nodes a to f]

Foundations Basic Matrix Operations
One of the key advantages to storing networks as matrices is that we can use all of the tools from linear algebra on the socio-matrix. Some of the basic matrix manipulations that we use are as follows:
1) Definition
A matrix is any rectangular array of numbers. We refer to the matrix dimension as the number of rows and columns.

Adjacency matrix (5 x 5)
   a b c d e
a  0 1 0 0 0
b  1 0 0 0 0
c  0 1 0 1 1
d  0 0 0 0 0
e  0 0 1 1 0

Category indicator (6 x 2)
W B
1 0
1 0
0 1
0 1
0 1
1 0

Age (6 x 1)
13
10
7
8
16
11

Foundations Basic Matrix Operations
Matrix operations work on the elements of the matrix in particular ways. To do so, the matrices must be conformable. That means the sizes allow the operation.
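The storage formats and volume measures from the preceding slides can be sketched in a few lines. The example below is an illustration only: it uses plain Python lists, the five-node directed example whose degree table appears above, and helper names (`adjacency_list`, `arc_list`, `in_degree`, `out_degree`, `density`) that are made up for this sketch.

```python
nodes = ["a", "b", "c", "d", "e"]

# Directed, binary adjacency matrix: X[i][j] = 1 if node i sends a tie to node j.
X = [
    [0, 1, 0, 0, 0],  # a -> b
    [1, 0, 0, 0, 0],  # b -> a
    [0, 1, 0, 1, 1],  # c -> b, d, e
    [0, 0, 0, 0, 0],  # d sends nothing
    [0, 0, 1, 1, 0],  # e -> c, d
]


def adjacency_list(X, nodes):
    """Map each sender to the list of nodes it sends ties to."""
    return {nodes[i]: [nodes[j] for j, x in enumerate(row) if x]
            for i, row in enumerate(X)}


def arc_list(X, nodes):
    """One (sender, receiver) pair per non-zero cell."""
    return [(nodes[i], nodes[j])
            for i, row in enumerate(X) for j, x in enumerate(row) if x]


def out_degree(X):
    """Row sums: ties sent."""
    return [sum(row) for row in X]


def in_degree(X):
    """Column sums: ties received."""
    return [sum(col) for col in zip(*X)]


def density(X):
    """Sum of all cells divided by N(N-1), the number of ordered dyads."""
    n = len(X)
    return sum(sum(row) for row in X) / (n * (n - 1))


print(adjacency_list(X, nodes))
print(arc_list(X, nodes))
print("in-degree: ", dict(zip(nodes, in_degree(X))))
print("out-degree:", dict(zip(nodes, out_degree(X))))
print("density:   ", density(X))
```

The three formats carry the same information; the list formats are far more compact than the matrix when the network is large and sparse.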
For addition (+), subtraction (-), or elementwise multiplication (#), both matrices must have the same number of rows and columns. For these operations, the matrix value is the operation applied to the corresponding cell values.

A =        B =        A+B =      A-B =      A#B =
1 3        2 3         3 6       -1 0        2  9
4 7        7 1        11 8       -3 6       28  7
2 5        0 4         2 9        2 1        0 20

Multiplication by a scalar:
3A =
 3  9
12 21
 6 15

Foundations Basic Matrix Operations
The transpose (` or T) of a matrix reverses the row and column dimensions.
$A^{T}_{ij} = A_{ji}$
So an M x N matrix becomes an N x M matrix.

a b   T
c d     =   a c e
e f         b d f

Foundations Basic Matrix Operations
The matrix multiplication (x) of two matrices involves all elements of the matrix, and will often result in a matrix of new dimensions.
In general, to be conformable, the inner dimensions of the two matrices must match. So:
A3x2 x B2x3 = C3x3
But A3x3 x B2x3 is not defined
Substantively, adding ‘names’ to the dimensions will help us keep track of what the resulting multiplications mean:
So multiplying (send x receive) x (send x receive) = (send x receive), giving us the two-step connections (the sender’s recipients’ receivers).

Foundations Basic Matrix Operations
The multiplication of two matrices Amxn and Bnxq results in Cmxq
$C_{mq} = \sum_{k=1}^{n} a_{mk} b_{kq}$

(2x2) x (2x2) = (2x2)
a b     e f     ae+bg  af+bh
c d  x  g h  =  ce+dg  cf+dh

(3x2) x (2x3) = (3x3)
a b                 ag+bj  ah+bk  ai+bl
c d  x  g h i   =   cg+dj  ch+dk  ci+dl
e f     j k l       eg+fj  eh+fk  ei+fl

Foundations Basic Matrix Operations
The powers (square, cube, etc) of a matrix are just the matrix times itself that many times.
A2 = AA or A3 = AAA
We often use matrix multiplication to find types of people one is tied to, since the ‘1’ in the adjacency matrix effectively captures just the people each row is connected to.
(Preview: This is also how we do compound relations: Mother x Brother = “Uncle”)

Foundations Data
Basic Measures & A little graph theory
Reachability
The distance from one actor to another is the length of the shortest path between them, known as the geodesic distance.
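The operations on these slides can be sketched with plain Python lists. The helper names below (`shape`, `elementwise`, `transpose`, `matmul`, `power`) are illustrative choices rather than any package's API, and the five-node directed matrix `X` is the example used earlier for the degree table.

```python
import operator

A = [[1, 3], [4, 7], [2, 5]]
B = [[2, 3], [7, 1], [0, 4]]


def shape(M):
    """(rows, columns) of a matrix stored as a list of rows."""
    return (len(M), len(M[0]))


def elementwise(A, B, op):
    """Apply op cell by cell; both matrices must have the same shape."""
    if shape(A) != shape(B):
        raise ValueError("not conformable: shapes differ")
    return [[op(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product; the inner dimensions must match."""
    if shape(A)[1] != shape(B)[0]:
        raise ValueError("not conformable: inner dimensions differ")
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def power(M, k):
    """M multiplied by itself k times (k >= 1)."""
    result = M
    for _ in range(k - 1):
        result = matmul(result, M)
    return result


print("A+B:", elementwise(A, B, operator.add))
print("A-B:", elementwise(A, B, operator.sub))
print("A#B:", elementwise(A, B, operator.mul))
print("A transposed:", transpose(A))
print("shape of A x B-transposed:", shape(matmul(A, transpose(B))))

# Two-step paths in the five-node directed example: cell (i, j) of X squared
# counts the ways i reaches j through exactly one intermediary.
X = [
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
]
print("X squared:", power(X, 2))
```

Note that `matmul(A, B)` raises an error here, because a 3x2 matrix cannot be multiplied by another 3x2 matrix; transposing `B` first makes the inner dimensions match.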
If there is at least one path connecting every pair of actors in the graph, the graph is connected and is called a component.
Two paths are independent if they only have the two end nodes in common. If a graph has two independent paths between every pair, it is biconnected, and called a bicomponent. Similarly for three paths, four, etc.

Foundations Data
Basic Measures & A little graph theory
Calculate reachability through matrix multiplication. (see p.162 of W&F)
[Figure: six-node example graph on nodes a to f]

X
   a b c d e f
a  0 1 0 0 0 1
b  1 0 1 0 0 0
c  0 1 0 1 1 1
d  0 0 1 0 1 0
e  0 0 1 1 0 0
f  1 0 1 0 0 0

X2
   a b c d e f
a  2 0 2 0 0 0
b  0 2 0 1 1 2
c  2 0 4 1 1 0
d  0 1 1 2 1 1
e  0 1 1 1 2 1
f  0 2 0 1 1 2

Distance (after two steps)
   a b c d e f
a  . 1 2 . . 1
b  1 . 1 2 2 2
c  2 1 . 1 1 1
d  . 2 1 . 1 2
e  . 2 1 1 . 2
f  1 2 1 2 2 .

X3
   a b c d e f
a  0 4 0 2 2 4
b  4 0 6 1 1 0
c  0 6 2 5 5 6
d  2 1 5 2 3 1
e  2 1 5 3 2 1
f  4 0 6 1 1 0

Distance (after three steps)
   a b c d e f
a  . 1 2 3 3 1
b  1 . 1 2 2 2
c  2 1 . 1 1 1
d  3 2 1 . 1 2
e  3 2 1 1 . 2
f  1 2 1 2 2 .

Foundations Data
Basic Measures & A little graph theory
Mixing patterns
Matrices make it easy to look at mixing patterns: connections among types of nodes. Simply multiply an indicator of category by the adjacency matrix.

X
   a b c d e f
a  0 1 0 0 0 1
b  1 0 1 0 0 0
c  0 1 0 1 1 1
d  0 0 1 0 1 0
e  0 0 1 1 0 0
f  1 0 1 0 0 0

Race
   R G
a  1 0
b  1 0
c  0 1
d  0 1
e  0 1
f  1 0

X(Race)
   R G
a  2 0
b  1 1
c  2 2
d  0 2
e  0 2
f  1 1

Race`(X)Race
   R G
R  4 2
G  2 6

Foundations Data
Basic Measures & A little graph theory
Matrix manipulations allow you to look at direction of ties, and distinguish symmetric from asymmetric ties.
To transform an asymmetric graph to a symmetric graph, add it to its transpose.
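The three manipulations just described (distances from matrix powers, mixing tables from an indicator matrix, and symmetrizing with the transpose) can be sketched together. This is an illustration in plain Python with made-up helper names, not the procedure of any particular package; `X6` is the six-node example and `X5` is a five-node directed example, both entered here by hand.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def geodesic_distances(X):
    """Distance = first power of X in which a cell becomes non-zero.
    Cells that never become non-zero stay None (unreachable)."""
    n = len(X)
    dist = [[0 if i == j else None for j in range(n)] for i in range(n)]
    walks = [row[:] for row in X]  # X to the power `step`
    for step in range(1, n):
        for i in range(n):
            for j in range(n):
                if dist[i][j] is None and walks[i][j] > 0:
                    dist[i][j] = step
        walks = matmul(walks, X)
    return dist


def mixing(X, groups):
    """Group-by-group tie counts: groups-transposed x X x groups."""
    return matmul(matmul(transpose(groups), X), groups)


def symmetrize(X, rule=max):
    """Combine X with its transpose cell by cell (max, min, or a sum)."""
    Xt = transpose(X)
    return [[rule(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(X, Xt)]


# Six-node undirected example, rows and columns in the order a, b, c, d, e, f.
X6 = [
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
]
race = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [1, 0]]  # columns: R, G

# Five-node directed example, rows send and columns receive.
X5 = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0],
]

print("distances:", geodesic_distances(X6))
print("mixing:   ", mixing(X6, race))
print("X + XT:   ", symmetrize(X5, rule=lambda a, b: a + b))
print("max sym:  ", symmetrize(X5, rule=max))
print("min sym:  ", symmetrize(X5, rule=min))
```

In the summed matrix a 2 marks a mutual tie and a 1 a one-way tie; taking the maximum keeps any tie, while taking the minimum keeps only mutual ties. For large networks a breadth-first search is usually a cheaper way to get distances than repeated matrix multiplication.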
X            XT           X + XT
0 1 0 0 0    0 1 0 0 0    0 2 0 0 0
1 0 1 0 0    1 0 0 0 0    2 0 1 0 0
0 0 0 0 1    0 1 0 1 1    0 1 0 1 2
0 0 1 0 1    0 0 0 0 0    0 0 1 0 1
0 0 1 0 0    0 0 1 1 0    0 0 2 1 0

Max Sym      Min Sym
0 1 0 0 0    0 1 0 0 0
1 0 1 0 0    1 0 0 0 0
0 1 0 1 1    0 0 0 0 1
0 0 1 0 1    0 0 0 0 0
0 0 1 1 0    0 0 1 0 0

Social Network Software
UCINET
•The standard network analysis program, runs in Windows
•Good for computing measures of network topography for single nets
•Input-output of data is a special 2-file format, but it is now able to read PAJEK files directly.
•Not optimal for large networks, but much better than it used to be!
•Available from: Analytic Technologies

Social Network Software
PAJEK
•Program for analyzing and plotting very large networks
•Intuitive Windows interface
•Used for most of the real data plots in this presentation
•Started mainly as a graphics program, but has expanded to a wide range of analytic capabilities
•Can link to the R & SPSS statistical packages
•Free
•Available from:

Social Network Software
Cyram NetMiner for Windows
•Newest product, not yet widely used
•Price range depends on application & size, but typically quite spendy ($4000+)
http://www.netminer.com/NetMiner/overview_01.jsp

Social Network Software
NetDraw
•A drawing program packaged with UCINET 6
•Free
•Works directly with UCINET files, so useful there…

Social Network Software
NEGOPY (no longer in production, but you may find a copy out there..)
•Program designed to identify cohesive sub-groups in a network, based on the relative density of ties.
•DOS-based program, need to have data in arc-list format
•Moving the results back into an analysis program is difficult.
•Available from: William D.
Richards http://www.sfu.ca/~richards/Pages/negopy.htm

Social Network Software
SPAN - SAS Programs for Analyzing Networks (Moody, ongoing)
•is a collection of IML and Macro programs that allow one to:
 a) create network data structures from nomination data
 b) import/export data to/from the other network programs
 c) calculate measures of network pattern and composition
 d) analyze network models
•Allows one to work with multiple, large networks
•Easy to move from creating measures to analyzing data
http://www.soc.duke.edu/~jmoody77/span/span.zip

Social Network Software
STATNET
•Program designed to estimate statistical models on networks in R.
Statnet Team http://csde.washington.edu/statnet/
Other R Resources:
Carter Butts (UC-Irvine, Sociology) – SNA & PermNet
•Program for general network analysis in R
•Does most of what we’ve discussed today…
iGraph

Social Network Software
Lots of Java-based programs
Both are flexible, fairly good at “drawing by hand” (but some quirks)

Social Network Software
CASOS – A collection of tools for networks, developed by the folks at Carnegie Mellon (Carley et al)
http://www.casos.cs.cmu.edu/index.php

Social Network Software
Homework Preview: Let’s open SAS, UCINET & PAJEK