Chapter 4: Methods for Analyzing Networks

This chapter discusses various methods for analyzing social networks, giving equal attention to traditionally important topics and newly emergent methods. In particular, we discuss centrality, cohesiveness measures, structural equivalence, clustering, multidimensional scaling, blockmodels, logit p*, affiliation networks, and the analysis of lattices. Because network data must be represented before they can be analyzed, the chapter starts with a description of the two methods commonly used to represent social networks: graphs and matrices.

Graphs and matrices

Graphs and matrices are two distinct methods for representing social network data. Graphs present a social network visually, whereas matrices are algebraic representations of network relations. Although network scholars may freely choose either graphs or matrices to present their data, each method has its advantages and disadvantages. Graphs are straightforward visual illustrations of network structures, but they do not support mathematical manipulation. In contrast, matrices are less user-friendly, but they facilitate mathematical and computer analyses of social network data.

Most often a network matrix is a square array of elements arranged in rows and columns. For example, the notation A (N, N) denotes a social network matrix A with N rows and N columns for N social actors. The row and column headings are arranged in the same sequence to identify the social actors in the network. The values in the matrix are measures of the relationship between a pair of actors. Normally, the actors in the rows are the senders of a specific relation, whereas the actors in the columns are the receivers of that relation.
Thus, the notation $X_{ijk} = 1$ indicates that actor i sends relation k to actor j in a binary network, whereas $X_{ijk} = 0$ indicates the absence of relation k from actor i to actor j. Note that in a non-directed graph $X_{ijk} = X_{jik}$; that is, the value of relation k between sender i and receiver j always equals the value of the relation between sender j and receiver i. We call such matrices symmetric. Empirical examples of symmetric matrices are marriage networks and communication channels. In contrast, many social networks, such as reporting or friendship networks, are asymmetric: $X_{ijk}$ and $X_{jik}$ often diverge, suggesting that actors i and j disagree in their assessment of the relation under scrutiny.

Many social networks contain integer values that reflect the intensity of a relationship, such as the frequency of contact or the strength and magnitude of an association. These networks are called valued graphs, in which the value of $X_{ijk}$ ranges from 0 to the maximum value observed in the network, as opposed to the binary graph's restricted range of 0 or 1. Like binary graphs, valued graphs are either non-directed and symmetric, in which $X_{ijk} = X_{jik}$, or directed and asymmetric, in which $X_{ijk}$ may or may not equal $X_{jik}$.

Sometimes network researchers use non-square matrices to record actors' attributes or their participation in certain events. The notation A (N, M) is commonly used to denote a non-square matrix A, in which N is the number of actors and M is the number of attributes, events, or locations under investigation. Freeman and Webster (1994), for example, observed 43 regular beachgoers and recorded 353 events over 31 days in which interaction among the 43 beachgoers took place. They created a 43 by 353 matrix, in which an entry of 1 in row i and column j indicates that person i was involved in interaction event j.
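As a minimal sketch of these matrix conventions, the following Python snippet stores a small directed binary network as a square matrix and an actor-by-event network as a non-square matrix. The actors, ties, and event memberships are invented for illustration, and the helper name `is_symmetric` is an assumption of this sketch rather than part of any established library.

```python
# Hypothetical 4-actor binary directed network: X[i][j] = 1 means actor i
# (row, sender) directs the relation to actor j (column, receiver).
X = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]


def is_symmetric(matrix):
    """Return True when matrix[i][j] == matrix[j][i] for every pair of actors."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


# Actor 0 sends a tie to actor 2 but receives none back, so X is asymmetric.
directed = not is_symmetric(X)

# Hypothetical 3-actor by 2-event matrix A (N, M):
# A[i][j] = 1 means actor i took part in event j.
A = [
    [1, 0],
    [1, 1],
    [0, 1],
]
events_per_actor = [sum(row) for row in A]        # row totals
actors_per_event = [sum(col) for col in zip(*A)]  # column totals
```

Row totals of the non-square matrix count the events each actor attended, and column totals count the actors present at each event.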
To illustrate how a matrix represents a social network structure, we discuss the network research by Feldman-Savelsberg et al. (2005) on Cameroonian women's hometown associations. To analyze how collective memory affects women's discussions of reproduction, Feldman-Savelsberg et al. (2005) interviewed 156 women belonging to 6 women's associations in Yaounde, Cameroon. Their in-depth interviews contained questions about the women's social networks, such as "please rank the strength of ties between you and other women in the same association according to the following schema: 1) confidant, 2) friends, 3) acquaintances, and 4) complete strangers." Because each woman was asked to rank her ties only with other women in the same association, the network data contain 6 network structures, one for each of the 6 associations. For a concise illustration, we use only the network structure of association 6, which has 6 women.

Table 4_1 shows the matrix representation of the women's social network in hometown association 6 in Yaounde, Cameroon. Because each of the 6 women was asked to rank her relation with the other 5 women, the matrix representing the network structure is valued and asymmetric. Women in the rows are the "senders" or "evaluators" of their relations with the other women, whereas women in the columns are the "receivers" or "evaluatees." For example, woman 1 ranks her tie with woman 3 at strength level 2 (friend), while woman 3 ranks her tie with woman 1 at strength level 3 (acquaintance). The two women thus disagree in their assessments of their relationship. Note that the diagonal values in the matrix are null: we do not treat a woman's assessment of her relation with herself as valid. Figure 4_1 shows the graph representation of the social network among the 6 women in association 6.
Of the 15 pairs formed by the 30 directed ties among the 6 women, only 4 are mutually agreed, meaning that both women give the relation the same rank. Women 1 and 2 rank each other as confidants, whereas women 5 and 6 rank each other as total strangers. In addition, women 2 and 5 rank each other as acquaintances, and women 3 and 5 rank each other as friends. The other pairs rank their relations differently. For example, woman 6 ranks woman 2 as a confidant, whereas woman 2 ranks woman 6 as a mere acquaintance. Although graph representations present an intuitive and straightforward picture of the network structure, they are poor visual illustrations of large networks with tens or even hundreds of actors. Even with only 6 actors and 30 directed relations, the graph appears overwhelmingly entangled. In contrast, a matrix can easily display network data with tens of actors.

Centrality, Prestige, and Power Measures for Ego-centric and Complete networks

One of the most important indicators in social network data analysis is centrality, measured at both the individual and the group level (Wasserman and Faust 1994: 169-219). Centrality measures at the individual level indicate the extent to which an actor's position approximates the central position of the network. Centrality may therefore suggest prestige and power, in the sense that central actors commonly receive the most "choices" from other actors (prestige) and, because of their central positions, receive and control a great amount of the information or commodities flowing through the network (power). However, Knoke and Burt (1983) cogently advised that centrality and prestige are not interchangeable concepts and may reflect disparate processes. Centrality, which measures the relative position of a network actor, is largely indifferent to the direction of relations, whereas prestige, which measures an actor's influence, is highly sensitive to relational direction.
Therefore, while centrality is an essential network indicator for both non-directed and directed graphs, the prestige measure is mostly relevant to directed graphs. Group-level centrality is normally called centralization; we defer the detailed discussion of group centralization to the next section. Centrality and prestige measures also differ between ego-centric networks and complete networks. Below, we first discuss centrality and prestige in complete networks and then discuss those measures in ego-centric networks.

Centrality and prestige in complete network

Actor centrality and group centralization include several different measures: the degree measure, the closeness measure, the betweenness measure, and the information measure. The computation and implementation of these measures vary with the type of social network data. Starting with the simplest case, we first discuss the measures of centrality and centralization in undirected graphs and then move on to directed graphs. Because most developments in centrality and centralization measures assume binary graphs, we restrict our discussion to binary data and encourage network scholars to pay more attention to centrality and centralization issues in valued graphs.

Measures of Actor Centrality and Group Centralization

Actor degree centrality measures the extent to which an individual actor is connected to other actors in a social network. In a social network with g actors, degree centrality for actor i aggregates i's connections to the other g - 1 actors:

$$C_D(N_i) = \sum_{j=1}^{g} X_{ij} \quad (i \neq j) \tag{4.1}$$

$C_D(N_i)$ denotes the degree centrality of node i, and $\sum_{j=1}^{g} X_{ij}$ aggregates the ties from node i to the other nodes j (j runs over the nodes from 1 to g, excluding i). Note, however, that degree centrality so measured reflects not only a node's connectivity with other nodes but also the size of the network: the larger the network, the higher the possible degree centrality.
Therefore, a given value of actor degree centrality may mean either that the actor is well connected in a small network or that the actor is connected to only a few other nodes in a large network. To eliminate the effect of network size on the degree centrality measure, researchers (Wasserman and Faust 1994: 179) recommended a normalized degree centrality:

$$C'_D(N_i) = \frac{\sum_{j=1}^{g} X_{ij}}{g - 1} \quad (i \neq j) \tag{4.2}$$

The normalized degree centrality divides the degree centrality by the maximum number of possible connections in a network of g actors (g - 1). Controlling for network size, normalized degree centrality reflects only the connections of a given node in a social network. It ranges from 0, indicating no connection with other nodes, to 1, indicating connections with all other nodes. Actor degree centrality measures the extent to which actors are involved in relationships. Actors with high normalized degree centrality are the most visible actors in the network: the closer the normalized degree centrality is to 1, the greater the actor's involvement in the relationship network. Researchers can readily apply this concept in measuring access, control, and brokerage in information networks, in which sheer involvement in the relationship is more important than the source and object of the relation (Knoke and Burt 1983: 195-222).

Unlike actor degree centrality, group degree centralization measures the extent to which actors in a social network differ from one another in their degree centralities. Group degree centralization closely resembles a measure of dispersion in statistics, indicating the variability or spread of individual actor degree centralities in a network. Freeman (1979) proposed a generic mathematical formula for such a group index of centralization:

$$C_A = \frac{\sum_{i=1}^{g} [C_A(n^*) - C_A(n_i)]}{\max \sum_{i=1}^{g} [C_A(n^*) - C_A(n_i)]} \tag{4.3}$$

$C_A(n^*)$ denotes the largest actor centrality observed in a network, whereas $C_A(n_i)$ denotes the centralities of the other actors in the network.
Thus, the numerator in equation 4.3 aggregates the differences in centrality between the node with the largest centrality and the other individual nodes. The denominator is the theoretical maximum possible sum of differences in actor centralities in a network. Based on this generic suggestion for computing a group-level index of centralization, Wasserman and Faust (1994: 180) proposed a method for computing group degree centralization:

$$C_D = \frac{\sum_{i=1}^{g} [C_D(n^*) - C_D(n_i)]}{(g - 1)(g - 2)} \tag{4.4}$$

The numerator sums the differences in degree centrality between the node with the highest degree centrality and the other nodes. The denominator is the maximum possible sum of differences between the node with the highest centrality and the other nodes. In a network of g nodes, the node with the highest possible centrality has a degree centrality of g - 1, and all other nodes then have a degree centrality of 1 (the other nodes must have a degree centrality of 1 rather than 0, because each must be tied to the central node for that node to achieve g - 1). Therefore, the distance between the highest node and each other node is g - 1 - 1 = g - 2. This distance repeats g - 1 times to cover the distance between the highest node and all other nodes. Thus, the maximum possible sum of degree centrality differences between the highest node and the other nodes is (g - 1)(g - 2).

This group index of degree centralization ranges from 0 to 1. When degree centrality is evenly spread, so that every node has the same degree centrality, the numerator $\sum_{i=1}^{g} [C_D(n^*) - C_D(n_i)]$ is 0, and group degree centralization is therefore 0. At the other extreme, degree centrality is completely uneven: one node has the highest degree centrality of g - 1 and all other nodes have a degree centrality of 1. The numerator $\sum_{i=1}^{g} [C_D(n^*) - C_D(n_i)]$ then equals the denominator (g - 1)(g - 2), and group degree centralization equals 1.
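The degree measures in equations 4.1, 4.2, and 4.4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the network is binary and symmetric, it is stored as a list of lists, and the function names and the two example networks (a five-node star and a five-node ring) are hypothetical choices for this sketch.

```python
def degree_centrality(X):
    """Equation 4.1: row sums of a binary symmetric matrix, skipping the diagonal."""
    g = len(X)
    return [sum(X[i][j] for j in range(g) if j != i) for i in range(g)]


def normalized_degree_centrality(X):
    """Equation 4.2: degree centrality divided by g - 1."""
    g = len(X)
    return [c / (g - 1) for c in degree_centrality(X)]


def group_degree_centralization(X):
    """Equation 4.4: summed gaps to the largest degree, over (g - 1)(g - 2)."""
    g = len(X)
    degrees = degree_centrality(X)
    largest = max(degrees)
    return sum(largest - c for c in degrees) / ((g - 1) * (g - 2))


# Hypothetical 5-node star: node 0 is tied to everyone, the others only to node 0.
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
# Hypothetical 5-node ring: every node has exactly two ties.
ring = [
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0],
]

star_centralization = group_degree_centralization(star)  # the uneven extreme
ring_centralization = group_degree_centralization(ring)  # the even extreme
```

The star reproduces the uneven extreme described above (one node with degree g - 1, all others with degree 1), while the ring reproduces the even extreme in which every node has the same degree.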
The closer the group index of degree centralization is to 1, the more uneven or hierarchical the distribution of degree centrality among the nodes in a social network.

Actor closeness centrality was developed to reflect how close each node is to the other nodes in a social network (Sabidussi 1966). The actor closeness centrality index is a function of an actor's geodesic distances to all other nodes in the network, where the geodesic distance is defined as the length of the shortest path between a pair of nodes. Following Sabidussi's (1966) suggestion, actor closeness centrality is computed with the following formula:

$$C_C(n_i) = \left[ \sum_{j=1}^{g} d(n_i, n_j) \right]^{-1} \quad (i \neq j) \tag{4.5}$$

Actor closeness centrality for actor i is the inverse of the sum of the geodesic distances between actor i and the other actors in the network. The index is therefore undefined when that sum is 0, because division by 0 is mathematically undefined. In empirical social network analysis, this restriction requires that every node in the network have at least one connection to other nodes. A completely isolated node with no connection to other nodes does not have a valid closeness centrality measure. In contrast, actor closeness centrality can be 1, as in a network of two actors connected with each other. A low value of actor closeness centrality, and thus a high value of the sum of the distances between a given node and the other nodes in the network, results either from the node being located in a relatively large network or from the node being located in a small network but at a relatively long distance from the other nodes.
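A minimal sketch of equation 4.5 follows, assuming an undirected binary network stored as adjacency lists. Geodesic distances are obtained with a breadth-first search, which is one common way to compute shortest path lengths; the function names and the three small networks are hypothetical. The sketch returns `None` whenever some node cannot be reached, which covers the isolated-node case described above.

```python
from collections import deque


def geodesic_distances(adj, source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centrality(adj, node):
    """Equation 4.5: inverse of the summed geodesic distances from node.

    Returns None when some other node is unreachable (for example, when the
    network contains an isolate), because the index is then not defined.
    """
    dist = geodesic_distances(adj, node)
    if len(dist) < len(adj):
        return None
    total = sum(d for other, d in dist.items() if other != node)
    return 1 / total if total > 0 else None


# Hypothetical undirected networks given as adjacency lists.
pair = {"A": ["B"], "B": ["A"]}
path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
with_isolate = {"A": ["B"], "B": ["A"], "C": []}

two_actor_value = closeness_centrality(pair, "A")     # two connected actors
path_end_value = closeness_centrality(path, "A")      # distances 1 and 2
isolate_value = closeness_centrality(with_isolate, "C")
```

In the two-actor network the index takes its largest value of 1, and at the end of the three-node path the distances 1 and 2 sum to 3.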
To control for the size of the network, and thus allow meaningful comparison of closeness centrality between nodes from different networks, Wasserman and Faust (1994: 186) recommended the following normalized closeness centrality:

$$C'_C(n_i) = \frac{g - 1}{\sum_{j=1}^{g} d(n_i, n_j)} \quad (i \neq j) \tag{4.6}$$

To compare and contrast closeness centrality and normalized closeness centrality, we produce two network illustrations. Figure 4_2 shows a 3-node network structure in which actor A is directly connected to B, with a geodesic distance of 1, and indirectly connected to C, with a geodesic distance of 2. Thus, the closeness centrality for A in Figure 4_2 is 1/3. Figure 4_3 shows a 4-node structure with direct connections between the nodes. Actor A has direct connections with actors B, C, and D, each with a geodesic distance of 1. Therefore, actor A again has a closeness centrality of 1/3. Even though actor A in Figure 4_3 is better connected than actor A in Figure 4_2, their closeness indices are the same, simply because the network depicted in Figure 4_3 has more nodes than the network in Figure 4_2. In contrast, the normalized closeness centrality distinguishes the two actors by taking network size into account. The normalized closeness centrality for actor A in Figure 4_3 is 3/3 = 1, whereas the value for actor A in Figure 4_2 is 2/3. Therefore, the closer an actor's normalized closeness centrality is to 1, the better the actor is connected to other nodes, in the sense that the actor can reach other nodes via the shortest geodesic distances.

Similar to group degree centralization, group closeness centralization is a dispersion measure, indicating the hierarchy of actors' closeness centralities in a given network. In particular, group closeness centralization measures the extent to which actors in a given network differ from one another in their closeness centralities. According to Freeman (1979), group closeness centralization is computed with the following formula.
$$C_C = \frac{\sum_{i=1}^{g} [C'_C(n^*) - C'_C(n_i)]}{[(g - 2)(g - 1)] / (2g - 3)} \tag{4.7}$$

Group closeness centralization reaches 1 when the distribution of actors' closeness centralities is completely uneven, with one actor having the highest closeness centrality and all others having the lowest. In contrast, group closeness centralization equals 0 when the distribution is completely even, with every actor having the same closeness centrality.

Actor betweenness centrality measures the extent to which an actor lies on the geodesic paths between pairs of other actors in the network. It is an important measure of control over the flow of information or resources between other actors in the network. Suppose that actor j has to go through actor i to reach actor k; actor i then has responsibility for, or control over, the content and timing of messages passed between actors j and k. The more often actor i is located on the geodesic paths between pairs of other actors, the more control actor i has over the information or resource flows in the network.

To quantify actor i's betweenness centrality, Freeman (1977) proposed the following procedure. Let $g_{jk}$ be the number of geodesic paths between two actors j and k, and let $g_{jk}(n_i)$ be the number of geodesic paths between j and k that contain actor i. Dividing $g_{jk}(n_i)$ by $g_{jk}$ measures the degree to which actor i sits on the geodesic paths connecting j and k. Aggregating $g_{jk}(n_i) / g_{jk}$ over all pairs of the remaining nodes reflects the extent to which actor i sits on the geodesic paths between all pairs of the remaining nodes in the network. The following formula expresses this logic:

$$C_B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}} \quad (i \neq j,\ j \neq k,\ i \neq k) \tag{4.8}$$

This index is 0 when node i falls on no geodesic path between any pair of the remaining g - 1 nodes.
The index reaches its maximum value of (g - 1)(g - 2)/2 when node i falls on the geodesic path between every pair of the remaining g - 1 nodes, assuming that each pair has only one geodesic path. For the remaining g - 1 nodes, excluding node i, the total number of geodesic paths between all pairs (again assuming one geodesic path per pair) is

$$C_{g-1}^{2} = \frac{(g - 1)!}{2!\,(g - 1 - 2)!} = \frac{(g - 1)!}{2!\,(g - 3)!} = \frac{(g - 1)(g - 2)}{2}.$$

We must add to this body of knowledge on actor betweenness centrality (Freeman 1977; Wasserman and Faust 1994: 190) that when pairs among the g - 1 nodes have more than one geodesic path, the theoretical maximum possible value for node i's betweenness centrality will be larger than (g - 1)(g - 2)/2, depending on how many geodesic paths are present between each of those pairs.

Wasserman and Faust (1994:110) recommended that an actor's betweenness centrality $C_B(n_i)$ be divided by its theoretical maximum value of (g - 1)(g - 2)/2 (assuming each pair has only one geodesic path) to produce the standardized actor betweenness centrality:

$$C'_B(n_i) = \frac{2\,C_B(n_i)}{(g - 1)(g - 2)} \tag{4.9}$$

The standardized betweenness centrality is 0 when the original betweenness centrality is 0, and it reaches 1 when the actor falls on the geodesic paths between all pairs of the remaining g - 1 nodes. Therefore, the closer the standardized actor betweenness centrality is to 1, the more often actor i falls on the geodesic paths between pairs of the remaining nodes in the network.

Much like group degree and closeness centralization, group betweenness centralization measures the extent to which actors in a network differ in their individual betweenness centralities. Following Freeman's (1979) generic method, Wasserman and Faust proposed the following equation to calculate the group betweenness centralization index.
$$C_B = \frac{2 \sum_{i=1}^{g} [C_B(n^*) - C_B(n_i)]}{(g - 1)^2 (g - 2)} \tag{4.10}$$

The numerator sums the differences between the actor with the highest betweenness centrality and the other actors with lower betweenness centralities. The denominator is the theoretical maximum possible value of that sum for all nodes in a network. Note that individual betweenness centrality reaches its theoretical maximum at (g - 1)(g - 2)/2. At the group level, a difference of this size can occur at most g - 1 times, when one dominant node serves as the intermediary on the geodesic paths between all pairs of the other nodes. Thus, the theoretical maximum for betweenness centralization in a network with g actors is $(g - 1)^2 (g - 2) / 2$. Again, we stress that this computation assumes that each dyadic pair has only one geodesic path. If multiple geodesic paths are present between any pair of actors, the individual maximum possible betweenness centrality will be larger than (g - 1)(g - 2)/2, which produces a corresponding change in the theoretical maximum possible value of group betweenness centralization.

Group betweenness centralization reaches 1 when there is one and only one dominant actor in the network, sitting on the geodesic paths between all pairs of the remaining actors. The difference between the dominant node and each of the remaining nodes is (g - 1)(g - 2)/2, and this difference repeats g - 1 times in a network with g nodes. Thus, the numerator reaches its theoretical maximum and equals the denominator, producing a result of 1. In contrast, in a completely "egalitarian" network in which every node has the same betweenness centrality, the numerator is 0, and the group-level centralization is therefore 0. The closer betweenness centralization is to 1, the more unequal the betweenness centralities of the different actors in the network.
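The betweenness measures in equations 4.8, 4.9, and 4.10 can be sketched as follows. This is a minimal, unoptimized illustration that assumes an undirected binary network stored as adjacency lists; it counts geodesic paths with a breadth-first search, and all function names and example networks (a star, a five-node ring, and a four-node square) are hypothetical.

```python
from collections import deque
from itertools import combinations


def count_geodesics(adj, source):
    """BFS giving, for each reachable node, its distance and number of geodesics from source."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                paths[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                paths[neighbor] += paths[node]
    return dist, paths


def betweenness_centrality(adj):
    """Equation 4.8: sum over pairs j < k of g_jk(n_i) / g_jk, for every node i."""
    info = {n: count_geodesics(adj, n) for n in adj}
    scores = {n: 0.0 for n in adj}
    for j, k in combinations(adj, 2):
        dist_j, paths_j = info[j]
        if k not in dist_j:
            continue  # j and k are not connected, so no geodesic exists
        dist_k, paths_k = info[k]
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            # i lies on a j-k geodesic only if the two legs add up to d(j, k).
            if dist_j[i] + dist_k[i] == dist_j[k]:
                scores[i] += paths_j[i] * paths_k[i] / paths_j[k]
    return scores


def standardized_betweenness(adj):
    """Equation 4.9: betweenness divided by (g - 1)(g - 2) / 2."""
    g = len(adj)
    maximum = (g - 1) * (g - 2) / 2
    return {n: c / maximum for n, c in betweenness_centrality(adj).items()}


def group_betweenness_centralization(adj):
    """Equation 4.10: twice the summed gaps to the largest score, over (g - 1)^2 (g - 2)."""
    g = len(adj)
    scores = betweenness_centrality(adj)
    largest = max(scores.values())
    return 2 * sum(largest - c for c in scores.values()) / ((g - 1) ** 2 * (g - 2))


# Hypothetical networks: a star centered on A, a ring, and a square.
star = {"A": ["B", "C", "D", "E"], "B": ["A"], "C": ["A"], "D": ["A"], "E": ["A"]}
ring = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D", "A"]}
square = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
```

In the star, node A lies on the single geodesic path between every pair of the other four nodes. In the square, the pair A and C has two geodesic paths, so B and D each receive a share of one half for that pair.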
Measures of Prestige in Directed Graphs

On many occasions, social interactions involve directions that specify the "senders" and "receivers" of the relations in the network. Social networks that record the direction of the relations between nodes are called directed graphs. In directed graphs, mere participation or involvement in a relation is less important than the role of being the receiver or being the sender of the relation. For example, in a workplace reporting network, lower-echelon employees routinely report their work activities and contributions to their managerial supervisors, whereas higher-level employees rarely report their work activities to their subordinates. In a friendship network, an actor who enthusiastically nominates many other actors as best friends may not receive a "best friend" nomination from those actors in return.

Here, we define prestige as an indicator of the extent to which a social actor in a network "receives" or "serves as an object of" the relations in the network. The distinction between the "senders" or "sources" and the "receivers" or "objects" of relations is strongly emphasized because it reflects inequalities in control over resources, as well as the authority and deference produced by those inequalities (Knoke and Burt 1983: 199). By definition, actor prestige can be measured simply by tallying the number of times an actor receives a nomination on a certain relation in a given network. Wasserman and Faust (1994: 202) proposed that this measure be called actor degree prestige, calculated with the following formula:

$$P_D(N_i) = \sum_{j=1}^{g} X_{ji} \quad (j \neq i) \tag{4.11}$$

Because this measure counts the number of times actor i is nominated by other actors in a network with g nodes, its maximum value is g - 1 and its minimum value is 0, indicating respectively that actor i is nominated by all g - 1 other actors or by none of them.
Therefore, a standardized actor degree prestige controls for the size of the network, making it possible to compare actors' degree prestige across different networks:

$$P'_D(N_i) = \frac{\sum_{j=1}^{g} X_{ji}}{g - 1} \quad (j \neq i) \tag{4.12}$$

The standardized actor degree prestige reaches 1 when all the other actors nominate actor i on a specific relation, and it is 0 when none of the other actors nominates actor i. Thus, the closer actor i's standardized degree prestige is to 1, the greater the actor's prestige in the network. One can easily picture a friendship network in which a high degree prestige for actor i means that many other actors in the network nominate actor i as their best friend.

We propose that actor degree prestige can be aggregated to produce an index of group-level degree prestige. We argued above that an individual actor's degree prestige reaches its maximum value of g - 1 when all other nodes nominate this actor. The group-level degree prestige can reach a maximum of g(g - 1), when every node in the network nominates all other nodes and is nominated by all other nodes on a specific relation. The sum of the nominations actually received by each node in a network can serve as the actual measure of reciprocity of the relations in the network. We assert that group-level degree prestige can be computed by dividing this actual reciprocity by the maximum group-level degree prestige, as shown in the following equation:

$$P_D(g) = \frac{\sum_{i=1}^{g} \sum_{j=1}^{g} X_{ji}}{g(g - 1)} \quad (j \neq i) \tag{4.13}$$

The equation suggests that a group-level index of degree prestige measures the extent to which either a given relation is reciprocated or actors are connected in a network. In a fully connected and completely reciprocated network, the index reaches 1, indicating that everybody nominates all other actors and is nominated by all other actors on a specific relation. The index reaches 0 when every node is completely isolated from the other actors, neither nominating anybody nor being nominated by anyone else in the network.
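Equations 4.11 through 4.13 amount to column sums of the directed matrix, which the following minimal Python sketch illustrates. It assumes a binary directed network stored as a list of lists with senders in rows and receivers in columns; the function names and the four-actor friendship matrix are hypothetical.

```python
def degree_prestige(X):
    """Equation 4.11: nominations received (column sums), skipping the diagonal."""
    g = len(X)
    return [sum(X[j][i] for j in range(g) if j != i) for i in range(g)]


def standardized_degree_prestige(X):
    """Equation 4.12: degree prestige divided by g - 1."""
    g = len(X)
    return [p / (g - 1) for p in degree_prestige(X)]


def group_degree_prestige(X):
    """Equation 4.13: all nominations received, divided by g (g - 1)."""
    g = len(X)
    return sum(degree_prestige(X)) / (g * (g - 1))


# Hypothetical friendship nominations among 4 actors (rows nominate columns).
friends = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]

received = degree_prestige(friends)            # nominations per actor
group_index = group_degree_prestige(friends)   # 5 of 12 possible nominations
```

Here the second actor is nominated by all three others and so has a standardized degree prestige of 1, even though that actor nominates no one.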
However, we must caution readers that group degree prestige computed with equation 4.13 does not distinguish connection from reciprocity between the nodes in a network. A high group degree prestige may therefore result either from nodes that are well connected or from a restricted set of nodes whose relations are highly reciprocated. Thus, we call for more refined studies of the index of group degree prestige that can distinguish connection and reciprocity as two separate sources.

Overall, we note that measures of centrality, centralization, and prestige are largely based on the assumption of binary graphs. Studies of the corresponding measures for valued graphs are scarce, with the exception of Freeman et al.'s (1991) discussion of betweenness centrality in valued graphs. Clearly, more studies are needed to advance our analytical techniques for centrality, centralization, and prestige in valued graphs. In addition, analyses of centrality and prestige were restricted to complete network data until very recently, when Marsden's (2002) landmark work extended centrality measures from complete network data to egocentric network data. We devote the following section to the topic of measuring centrality in egocentric data.

Centrality in Egocentric Network

Focusing only on binary symmetric data on a single relation, Marsden (2002) discussed centrality measures in egocentric network data. Because egocentric research designs ask survey respondents (egos) to nominate the alters with whom they have certain relations, each ego i has a data matrix $A_i$ of size $N_i \times N_i$, which includes the ego and all of its alters. Because ego i by definition has direct ties with all of its alters, $X_{ij} = 1$ $(1 \le j \le N_i,\ i \neq j)$ in the $A_i$ matrix. Actor degree centrality in a complete network measures the extent to which an actor is connected directly with the other actors in the network (see equation 4.1).
In an egocentric network, the ego is connected directly with all of its alters. Thus, ego i's degree centrality is the maximum possible value of actor degree centrality: g - 1 in a network with g actors. The standardized degree centrality for ego i is always 1, because $\frac{g - 1}{g - 1} = 1$.

In complete networks, actor closeness centrality is the inverse of the sum of the geodesic distances between actor i and the other actors in the network (see equation 4.5). The normalized closeness centrality controls for the size of the network (see equation 4.6). By definition, ego i in its egocentric network data is connected directly with all alters. Thus, its closeness centrality and normalized closeness centrality are $\frac{1}{g - 1}$ and 1 (because $\frac{g - 1}{g - 1} = 1$), respectively.

In a complete network, betweenness centrality measures the extent to which a given node i sits on the geodesic paths between all pairs of the other nodes in the network (see equation 4.8). Because ego node i by definition has a direct connection with all the alters in its own egocentric network, it serves as an intermediary node for every pair of alters, unless there is a direct link between the two alters. Marsden (2002:410) asserted that betweenness centrality for node i in its egocentric data differs from the measure in complete network data. On the one hand, the betweenness centrality for node i can be biased downward in its egocentric data. The egocentric betweenness measure does not reflect ego node i's intermediary location between pairs of nodes connected by geodesic paths of length 3 or more, which are counted in the complete network betweenness measure. On the other hand, egocentric betweenness centrality can exaggerate ego node i's betweenness centrality if two alters are connected via both i and another node outside the egocentric network.
In this case, the intermediary node outside the egocentric network is disregarded in the egocentric betweenness measure but counted in the complete network betweenness measure.

To illustrate the above discussion, Figure 4_4 shows the egocentric network of ego node i, who nominated A, B, C, and D as its alters on a specific relation. Note that node M is connected to two of i's alters, A and D, although it does not belong to ego i's egocentric network. The total number of pairs among i's alters is 6 (AB, AC, AD, BC, BD, and CD), each with one unique geodesic path. Because B and C are directly connected and the geodesic paths connecting all the other pairs of alters include ego i, ego i's betweenness centrality is 5/6. However, the presence of node M makes the egocentric betweenness centrality and the complete network betweenness centrality for ego node i differ. On the one hand, the egocentric betweenness measure for i is biased downward because it omits the intermediary location of i on the geodesic paths between M and B and between M and C. On the other hand, the egocentric betweenness for i is biased upward because it overlooks the geodesic path between A and D that goes through M, which competes with the A-D geodesic path that goes through i.

Despite these divergences between the egocentric and complete network betweenness measures, Marsden (2002) demonstrated empirically, by analyzing 17 network data sets, that the two measures correspond closely with each other. Therefore, egocentric betweenness centrality is a reliable substitute for actor betweenness centrality in a complete network when complete network data are difficult to come by. Marsden (2002) asserted that differences in data collection between egocentric and complete networks may account for the divergences in the betweenness measures between the two kinds of network data, pinpointing an important topic that deserves further systematic study.
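The 5/6 value for the Figure 4_4 example can be reproduced with a short Python sketch. The tie list below is an assumption reconstructed from the verbal description of the figure (i tied to A, B, C, and D; B tied to C; M tied to A and D), and the function name `ego_betweenness` is hypothetical. The sketch looks only at ego and its alters, so node M is ignored, exactly as in the egocentric measure described above.

```python
from itertools import combinations


def ego_betweenness(ego, ties):
    """Betweenness of ego computed only from ego's own network.

    ties maps each node to the set of nodes it is directly tied to. Only ego
    and its alters are considered; any other node in ties is ignored.
    Returns the raw score and the score divided by the number of alter pairs.
    """
    alters = ties[ego]
    members = alters | {ego}
    score = 0.0
    for a, b in combinations(sorted(alters), 2):
        if b in ties[a]:
            continue  # directly tied alters do not need an intermediary
        # Inside the ego network every geodesic between a and b has length 2,
        # so count the shared neighbors; ego is always one of them.
        shared = [m for m in members
                  if m not in (a, b) and a in ties[m] and b in ties[m]]
        score += 1 / len(shared)
    pairs = len(alters) * (len(alters) - 1) / 2
    return score, (score / pairs if pairs else 0.0)


# Ties assumed from the description of Figure 4_4.
ties = {
    "i": {"A", "B", "C", "D"},
    "A": {"i", "M"},
    "B": {"i", "C"},
    "C": {"i", "B"},
    "D": {"i", "M"},
    "M": {"A", "D"},
}

raw_score, standardized_score = ego_betweenness("i", ties)
```

Five of the six alter pairs depend on ego i, so the raw score is 5 and the standardized score is 5/6. Because M lies outside the egocentric network, the competing A-D path through M plays no part in this calculation.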
Cliques, Cohesion, and Connections

To the extent that cohesiveness among a subset of members can be measured by strong, direct, intense, and positive ties between them, cliques are an effective network indicator of such cohesiveness. The notions of cohesive subgroups and cliques are widely used in the social sciences to indicate frequent and intense interactions among members. These concepts help researchers better understand how cliques benefit their members by providing advice and instrumental support (Dunbar 1995), and how extensive reliance on cliques restricts the range of one's network contacts (Blau, Ruan and Aldelt 1991). Often the concepts of social group, subgroup, and clique are used interchangeably without a rigorous definition of each (Borgatti et al. 1990). Based on a vast literature on subgroups in social network studies, Wasserman and Faust (1994:251) stated four general properties that characterize cohesive subgroups: the mutuality of ties, the reachability of subgroup members, the frequency of ties among members, and the relative frequency of ties among subgroup members compared with non-members. This summary of the underlying characteristics of cohesive subgroups lays the foundation for an operational definition of the clique as a measure of those subgroups.

Cliques

Wasserman and Faust (1994: 254) proposed to define a clique as a maximal complete subgraph of three or more nodes, all of which are directly connected with each other, such that no other node is directly connected with all of the nodes in the subgraph. Thus, three conditions must hold simultaneously for a subgraph to qualify as a clique: 1) it has at least three nodes, 2) all the nodes in the subgraph are directly connected with each other, and 3) no node outside the subgraph is directly connected with all nodes in the subgraph.
To illustrate the concept of a clique, Figure 4_5 describes a binary and symmetric network structure among 6 nodes. Two cliques are formed, BCDE and ABE, because the connections among their members meet the three requirements for a clique: 1) there are three or more nodes; 2) all the nodes are directly connected with each other; and 3) no other node in the graph is directly connected with all the nodes in the clique. Note that although node F is connected directly with A and C, nodes A and C are not connected directly with each other, which disqualifies F as a member of either clique. Note also that the third requirement disqualifies EBC, EBD, BCD, and BDE from being cliques because there is always an outside node that is directly connected with all nodes in the group. For example, EBC is not a clique because node D is connected directly with E, B, and C.

As a representation of cohesive subgroups, the clique imposes very strict conditions, which has earned it a reputation for "being stingy" (Alba 1973). Because of the high threshold for clique formation, empirical researchers rarely detect cliques in their actual datasets (Wasserman and Faust 1994:256). Part of the reason for the lack of cliques in actual network data lies in the design of network data collection. For example, the fixed list approach restricts respondents' nominations on a specific relation to a certain maximum number. Thus, the size of a clique cannot exceed the number pre-imposed by researchers.

The clique also dichotomizes actors into members of the cohesive subgroup and non-members, thus overlooking the gradations between more central and more peripheral actors. Once a clique draws a boundary, no further distinction is made among clique members or among non-members. In reality, such a simplified dichotomy is often not very informative, because much more important distinctions occur among insiders and among outsiders.
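The three clique requirements described for Figure 4_5 can be applied mechanically. The sketch below is a brute-force illustration rather than an efficient algorithm, and its edge list is an assumption inferred from the prose (the memberships of BCDE and ABE, plus F's ties to A and C); the function name is hypothetical.

```python
from itertools import combinations

# Assumption: undirected ties inferred from the description of Figure 4_5.
EDGES = [("A", "B"), ("A", "E"), ("A", "F"), ("B", "C"), ("B", "D"),
         ("B", "E"), ("C", "D"), ("C", "E"), ("C", "F"), ("D", "E")]


def maximal_cliques(edges, min_size=3):
    """Return maximal complete subgraphs with at least min_size nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    found = []
    # Scan from the largest candidate size down, so that a complete group
    # contained in an already accepted clique is known not to be maximal.
    for size in range(len(nodes), min_size - 1, -1):
        for group in combinations(nodes, size):
            complete = all(v in adj[u] for u, v in combinations(group, 2))
            if complete and not any(set(group) < clique for clique in found):
                found.append(set(group))
    return ["".join(sorted(clique)) for clique in found]


cliques = maximal_cliques(EDGES)
print(cliques)
```

With these assumed ties the search returns BCDE and ABE only. Subsets such as BCD are complete but are rejected by the maximality test, which corresponds to the third requirement in the text.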
One of the determinants of the number of cliques is the size of the network. Small networks hardly yield any cliques, whereas large datasets often contain numerous cliques, many of which overlap with each other. This leads researchers to focus more on multiple, overlapping cliques than on a single stand-alone clique. Freeman (1992) developed lattices to describe overlapping cliques.

Although its rigid definition prevents the clique from being very informative, the concept has been a catalyst for many new measures of cohesive subgroups that relax some of its stringent requirements. Two general approaches have emerged as alternatives to cliques in measuring subgroup cohesiveness. One approach relaxes the requirements based on nodal degree, the number of lines incident on a node (Seidman 1983; Doreian and Woodard 1994). We discussed this approach using the k-core concept in chapter 3. Essentially, a subset is a k-core if every node in it has ties with at least k other nodes in the subset. By changing the value of k, a researcher can set more or less restrictive criteria for bounding a network. The other approach modifies the clique based on nodal connectivity, which we discuss in the sections below (Alba 1973; Mokken 1979).

n-clique

The n-clique relaxes the rigid requirement that the geodesic distance between two nodes in a clique be 1, which means that all pairs in a clique have to be connected with each other directly. In n-cliques, the maximum geodesic distance allowed between any pair becomes a variable, n. By varying the value of n, network researchers can differentiate subgroups with greater cohesiveness (lower n values) from those with lower cohesiveness (higher n values). For example, the 2-clique method identifies a clique if it has more than two nodes, the geodesic distances between its nodes are two or less, and no other node in the network is connected to all the clique nodes at a geodesic distance of two or less.
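The n-clique definition can be sketched by replacing the direct-tie test with a geodesic-distance test. The example below is a brute-force illustration under stated assumptions: the edge list is inferred from the prose description of Figure 4_5, distances are measured in the whole graph, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Assumption: undirected ties inferred from the description of Figure 4_5.
EDGES = [("A", "B"), ("A", "E"), ("A", "F"), ("B", "C"), ("B", "D"),
         ("B", "E"), ("C", "D"), ("C", "E"), ("C", "F"), ("D", "E")]


def geodesic_distances(edges):
    """All-pairs geodesic distances by breadth-first search."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {}
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[source] = seen
    return dist


def n_cliques(edges, n, min_size=3):
    """Maximal node sets whose pairwise geodesic distances are all <= n."""
    dist = geodesic_distances(edges)
    nodes = sorted(dist)
    found = []
    for size in range(len(nodes), min_size - 1, -1):
        for group in combinations(nodes, size):
            close = all(dist[u].get(v, float("inf")) <= n
                        for u, v in combinations(group, 2))
            if close and not any(set(group) < c for c in found):
                found.append(set(group))
    return ["".join(sorted(c)) for c in found]


one_cliques = n_cliques(EDGES, 1)  # the strict clique definition
two_cliques = n_cliques(EDGES, 2)  # geodesic distance relaxed to 2
print(one_cliques, two_cliques)
```

Setting n to 1 recovers the strict cliques, while n equal to 2 merges all six assumed nodes into a single 2-clique.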
Note that in the previous illustration of Figure 4_5, node F is not a member of either clique. When the geodesic distance is relaxed from 1 to 2, the 2-clique includes F because it is connected with all nodes of the cliques at geodesic distances of either 1 or 2. In fact, the n-clique concept specifies that the maximum geodesic distance between all pairs of nodes in the clique cannot exceed n. Therefore, the higher the n value, the more inclusive the clique and the less cohesive its members. The original strict definition of the clique is a special case of the n-clique in which n equals 1.

Cliques in Directed Graphs

Directed graphs distinguish relational senders from relational receivers. The emphasis in directed graphs is on the direction of a specific relation rather than the mere presence of a tie between two actors. For example, in a friendship network, a receiver of many "best friend" nominations is quite distinct from a sender of many such nominations who receives few nominations itself. As cliques measure the degree of cohesiveness between members of a subgraph, cliques in directed graphs take into account the reciprocity of ties. Extending the previous definition of a clique in non-directed graphs, we define a clique in a directed graph as having three or more nodes, all of which are connected directly with each other, with all ties between the nodes reciprocated.

To illustrate clique detection in directed graphs, Figure 4_6 shows a network configuration with directed ties between six nodes. Three cliques are formed: ABE, EBD, and BCD. Note that the relation from E to C is not reciprocated, which prevents EBCD from forming a clique. The lack of a direct and mutual tie between A and C also prevents AFC from forming a clique.
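The reciprocity requirement can be sketched by keeping only mutual ties and then searching for maximal complete subgraphs. The arcs below are hypothetical: they are one configuration consistent with the prose description of Figure 4_6 (an unreciprocated arc from E to C, no tie between A and C), not a transcription of the figure, and the function name is an assumption.

```python
from itertools import combinations

# Hypothetical arcs consistent with the description of Figure 4_6.
MUTUAL = [("A", "B"), ("A", "E"), ("B", "E"), ("B", "D"), ("D", "E"),
          ("B", "C"), ("C", "D"), ("A", "F"), ("C", "F")]
ONE_WAY = [("E", "C")]  # sent by E, not returned by C
ARCS = MUTUAL + [(v, u) for u, v in MUTUAL] + ONE_WAY


def reciprocated_cliques(arcs, min_size=3):
    """Maximal node sets in which every pair is tied in both directions."""
    arc_set = set(arcs)
    nodes = sorted({node for arc in arcs for node in arc})

    def mutual(u, v):
        return (u, v) in arc_set and (v, u) in arc_set

    found = []
    for size in range(len(nodes), min_size - 1, -1):
        for group in combinations(nodes, size):
            complete = all(mutual(u, v) for u, v in combinations(group, 2))
            if complete and not any(set(group) < c for c in found):
                found.append(set(group))
    return ["".join(sorted(c)) for c in found]


directed_cliques = reciprocated_cliques(ARCS)
print(directed_cliques)
```

Under these assumed arcs the three cliques are ABE, BCD, and BDE (written EBD in the text). The four-node set BCDE fails only because the tie between C and E is not mutual.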
Modeling on the n-clique method in non-directed graphs, researchers (Peay 1980, Wasserman and Faust 1994: 275) proposed that the rigid requirement for a clique be relaxed by varying the geodesic distance between mutually connected nodes in a directed graph. The method bears great similarity to the one for non-directed graphs, except for a special handling of the directions of ties. Two nodes can be n-connected in four distinct ways: weakly n-connected, unilaterally n-connected, strongly n-connected, and recursively n-connected (Peay 1980). In the strongest form of n-connection, recursive n-connectedness, the path from i to j uses the same nodes and connections as the path from j to i in reverse order, and the path length is n or less. Each of the four forms can be used to define an n-clique in directed graphs, producing four types of n-clique subgroup. In Figure 4_6, all the nodes belong to the same recursive 2-clique because all pairs in the graph are mutually reachable with path length 2 or less, and the paths connecting all pairs are reversible.

Structural Equivalence

Social scientists are often interested not only in the cohesiveness of network actors but also in the positional equivalence between two actors, in the sense that they both connect with the same set of actors. Structurally equivalent actors are in a competitive relation more than a cohesive relation. For example, vendors of a common good who connect with the same set of retailers are structurally equivalent and face stiff competition from each other. Structurally equivalent actors are completely substitutable for each other. That is, if one node withdraws from a network, its structurally equivalent node can easily replace the leaving node while maintaining the original network configuration.
For example, two vendors of the same goods for the same set of retailers are completely substitutable: should one of them leave the business, the other can quickly fill in for the departing vendor to maintain the original flow of merchandise. Such substitutability often produces fierce competition. Thus, network scholars who use structural equivalence to partition actors are mostly interested in competitive relations rather than cohesive ties (Burt 1992).

Much like the clique in identifying cohesive relations, the formal mathematical definition of structural equivalence is very strict. Two nodes are structurally equivalent if they have ties to and from the same set of other actors on a specific relation. In particular, actors i and j are structurally equivalent if, for all other actors N = 1, 2, …, g (N ≠ i, j), i has a tie to N if and only if j has a tie to N, and i has a tie from N if and only if j has a tie from N (Wasserman and Faust 1994:356). If there are multiple relations, this condition must hold for all relations for two nodes to be structurally equivalent. Note that the presence or absence of a tie between the two nodes themselves is not a factor in determining whether they are structurally equivalent. Rather, what determines whether two nodes are structurally equivalent is their connections with the other nodes in the network.

Note that the preceding definition of structural equivalence assumes directed binary graphs. In non-directed binary graphs, there is no distinction between senders and receivers of relations. Thus, extending the definition of structural equivalence in directed graphs, actors i and j are structurally equivalent in non-directed graphs if, for all other actors N = 1, 2, …, g (N ≠ i, j), i has a tie with N if and only if j has a tie with N. In addition, the definition of structural equivalence needs to take into account the values of the relations in valued graphs, where ranking scales, rather than binary values, measure the ties between nodes.
Strictly speaking, in valued graphs, two nodes are structurally equivalent if they have ties with identical values to and from identical other nodes (Wasserman and Faust 1994:359). If the relations in a valued graph are non-directed, two nodes are structurally equivalent if they have ties with identical values with identical other nodes.

The mathematical definition of structural equivalence is too rigid to be practical. Empirical network data rarely contain pairs that are structurally equivalent according to such a stringent definition. Rather, many pairs of nodes are approximately structurally equivalent, in the sense that their connections with other nodes overlap but are not identical (Wasserman and Faust 1994:366). To reflect such gradual approximation to structural equivalence between two actors, the measure of structural equivalence is based on the sum of distances between two nodes in their respective connections with other nodes, rather than on whether or not the two nodes are strictly structurally equivalent. The closer this sum of distances is to 0, the more structurally equivalent the two actors are.

Measurement of Structural Equivalence

Measurement of structural equivalence between two actors is based on the similarity of their patterns of relations with other network actors. Two actors are structurally equivalent if they share common connections to and from the same set of other actors in the network. Assuming a binary directed graph, to the extent that two actors are structurally equivalent, they should have identical entries in their corresponding rows and columns of the matrix. Operationalizing this structural characteristic, Burt (1978) proposed that the Euclidean distance between two actors be used to measure the structural equivalence between them.
$$d_{ij} = \sqrt{\sum_{k=1}^{g} \left[ (x_{ik} - x_{jk})^2 + (x_{ki} - x_{kj})^2 \right]}, \quad i \neq j \neq k \tag{4.14}$$

In the equation, $d_{ij}$ is the Euclidean distance between actors i and j, and $x_{ik}$ is the entry value for actors i and k in the matrix, which equals either 1 or 0 in a binary matrix. Because $d_{ij}$ is the square root of a sum of squared terms, $d_{ij} \geq 0$. If two actors are in perfect structural equivalence, $d_{ij} = 0$. The larger the $d_{ij}$, the less structurally equivalent actors i and j are.

To illustrate how to compute the structural equivalence between two actors, we produce Figure 4_7 and Table 4_2. Both depict the same five-node network structure: Figure 4_7 is the graph representation, whereas Table 4_2 is the matrix representation. Figure 4_7 shows that actors 1 and 2 are structurally equivalent, as they both connect to actors 3 and 4. In contrast, actors 4 and 5 are not structurally equivalent because, although they both connect to actor 3, actor 4 receives connections from actors 1 and 2, whereas actor 5 does not. The following shows how to apply the above equation to compute the Euclidean distance between actors 1 and 2.

$$d_{12} = \sqrt{\left[ (x_{13} - x_{23})^2 + (x_{31} - x_{32})^2 \right] + \left[ (x_{14} - x_{24})^2 + (x_{41} - x_{42})^2 \right] + \left[ (x_{15} - x_{25})^2 + (x_{51} - x_{52})^2 \right]} \tag{4.15}$$

Applying the entry values of $x_{13}, x_{23}, x_{31}, x_{32}, x_{14}, x_{24}, x_{41}, x_{42}, x_{15}, x_{25}, x_{51}, x_{52}$ shown in Table 4_2, $d_{12}$ equals 0, indicating that actors 1 and 2 are structurally equivalent. We leave it to readers as an exercise to compute the Euclidean distance between actors 4 and 5, which should be 2.

Computation of structural equivalence in binary non-directed graphs is simpler than in directed graphs because there is no distinction between senders and receivers of the relations.
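Equation 4.14 translates directly into a few lines of code. The sketch below is illustrative only: its arc list contains just the ties mentioned in the prose about Figure 4_7 (Table 4_2 itself may list others), and the helper names are hypothetical.

```python
import math

# Assumption: only the directed ties named in the text about Figure 4_7.
TIES = [(1, 3), (1, 4), (2, 3), (2, 4), (4, 3), (5, 3)]
G = 5  # number of actors


def adjacency(ties, g):
    """Build a 1-indexed binary sociomatrix; row = sender, column = receiver."""
    x = [[0] * (g + 1) for _ in range(g + 1)]
    for sender, receiver in ties:
        x[sender][receiver] = 1
    return x


def euclidean_distance(x, i, j):
    """Equation 4.14: compare rows and columns of i and j over all k != i, j."""
    g = len(x) - 1
    total = 0
    for k in range(1, g + 1):
        if k in (i, j):
            continue
        total += (x[i][k] - x[j][k]) ** 2 + (x[k][i] - x[k][j]) ** 2
    return math.sqrt(total)


x = adjacency(TIES, G)
d12 = euclidean_distance(x, 1, 2)
d13 = euclidean_distance(x, 1, 3)
print(d12, d13)
```

Actors 1 and 2 send to the same actors and receive from no one, so their distance is 0. Actors 1 and 3 differ in one sent tie and three received ties, giving a distance of 2 under these assumed arcs.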
The equation for structural equivalence in binary non-directed graphs is

$$d_{ij} = \sqrt{\sum_{k=1}^{g} (x_{ik} - x_{jk})^2}, \quad i \neq j \neq k \tag{4.16}$$

When multiple relations are present in the network, the computation of structural equivalence between two actors should take into account their connections with other actors on all the relations. With multiple relations, two actors are structurally equivalent if and only if they are structurally equivalent on every one of the relations. The equation reflecting this logic follows:

$$d_{ij} = \sqrt{\sum_{r=1}^{R} \sum_{k=1}^{g} \left[ (x_{ikr} - x_{jkr})^2 + (x_{kir} - x_{kjr})^2 \right]}, \quad i \neq j \neq k \tag{4.17}$$

Another important measure of structural equivalence between two actors in a network is Pearson's correlation coefficient, which is used in the CONCOR algorithm (to be discussed in detail under Blockmodeling). The correlation coefficient ($r_{ij}$) between two actors i and j is

$$r_{ij} = \frac{\sum_{k=1}^{g} (X_{ik} - \bar{R}_i)(X_{jk} - \bar{R}_j) + \sum_{k=1}^{g} (X_{ki} - \bar{C}_i)(X_{kj} - \bar{C}_j)}{\left[ \sum_{k=1}^{g} (X_{ik} - \bar{R}_i)^2 + \sum_{k=1}^{g} (X_{ki} - \bar{C}_i)^2 \right]^{1/2} \left[ \sum_{k=1}^{g} (X_{jk} - \bar{R}_j)^2 + \sum_{k=1}^{g} (X_{kj} - \bar{C}_j)^2 \right]^{1/2}}, \quad i \neq j \neq k \tag{4.18}$$

where

$$\bar{R}_i = \frac{1}{g} \sum_{k=1}^{g} X_{ik} \quad \text{and} \quad \bar{C}_i = \frac{1}{g} \sum_{k=1}^{g} X_{ki}, \quad i \neq k$$

$\bar{R}_i$ and $\bar{C}_i$ in the equation are the average entry values for row i and column i respectively. If the two actors i and j are structurally equivalent, the correlation between their respective rows and columns in the matrix will be 1. According to the formula, one can compute that $\bar{R}_1 = \bar{R}_2 = 2/5$ and $\bar{C}_1 = \bar{C}_2 = 0$, and that $r_{12}$ equals 1 in Figure 4_7.

The computation of correlation coefficients for pairs of nodes in a symmetric network is simpler than in an asymmetric network, because there is no distinction between $X_{ik}$ and $X_{ki}$, or between $\bar{R}_i$ and $\bar{C}_i$.
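The directed correlation of Equation 4.18 can be sketched in the same way as the Euclidean distance. As before, the arc list is an assumption containing only the ties mentioned in the prose about Figure 4_7, and the function names are hypothetical. Means are taken over all g entries of a row or column, which is what yields the 2/5 quoted in the text.

```python
import math

# Assumption: only the directed ties named in the text about Figure 4_7.
TIES = [(1, 3), (1, 4), (2, 3), (2, 4), (4, 3), (5, 3)]
G = 5


def adjacency(ties, g):
    """1-indexed binary sociomatrix; row = sender, column = receiver."""
    x = [[0] * (g + 1) for _ in range(g + 1)]
    for sender, receiver in ties:
        x[sender][receiver] = 1
    return x


def row_mean(x, a):
    return sum(x[a][1:]) / (len(x) - 1)


def col_mean(x, a):
    return sum(x[k][a] for k in range(1, len(x))) / (len(x) - 1)


def correlation(x, i, j):
    """Equation 4.18: correlate rows and columns of i and j over k != i, j."""
    ri, rj = row_mean(x, i), row_mean(x, j)
    ci, cj = col_mean(x, i), col_mean(x, j)
    others = [k for k in range(1, len(x)) if k not in (i, j)]
    num = sum((x[i][k] - ri) * (x[j][k] - rj) + (x[k][i] - ci) * (x[k][j] - cj)
              for k in others)
    ss_i = sum((x[i][k] - ri) ** 2 + (x[k][i] - ci) ** 2 for k in others)
    ss_j = sum((x[j][k] - rj) ** 2 + (x[k][j] - cj) ** 2 for k in others)
    den = math.sqrt(ss_i) * math.sqrt(ss_j)
    return num / den if den else float("nan")  # undefined without variation


x = adjacency(TIES, G)
r1_mean = row_mean(x, 1)
c1_mean = col_mean(x, 1)
r12 = correlation(x, 1, 2)
print(r1_mean, c1_mean, r12)
```

The row mean for actor 1 is 2/5, its column mean is 0, and the correlation between actors 1 and 2 is 1 up to floating-point rounding.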
The formula for computing correlation coefficients in a symmetric network is as follows:

$$r_{ij} = \frac{\sum_{k=1}^{g} (X_{ik} - \bar{R}_i)(X_{jk} - \bar{R}_j)}{\left[ \sum_{k=1}^{g} (X_{ik} - \bar{R}_i)^2 \right]^{1/2} \left[ \sum_{k=1}^{g} (X_{jk} - \bar{R}_j)^2 \right]^{1/2}}, \quad i \neq j \neq k \tag{4.19}$$

Using this formula, we developed a Java program to compute the correlation coefficients between pairs of nodes in the symmetric/undirected social network shown in Figure 4_5. Table 4_3 displays the results: the coefficient between B and E is 1, whereas it is -1 between C and F.

Visual Displays, Clustering, Multidimensional Scaling

Images of networks have been used commonly in social network studies to develop structural insights and to communicate those insights to others (Freeman 2005). Social network analysis has undergone three distinct phases in the development of visual displays of network structures (Freeman 2000). The first stage ranged from the 1930s to the 1950s, when network researchers relied on hand drawings to depict similarities and differences in the positions occupied by actors (Moreno 1953). The second stage began in the 1970s, which witnessed automatic graphing, a variety of software, and mainframe computers. In this stage, network scholars increasingly used standardized computation and graphing processes. In the latest stage, the advent of high-speed networks, browsers, the World Wide Web, and the widespread availability of PCs has opened a whole new array of opportunities for visual displays of network data.

Clustering

For the most part, visual displays for exploring social network data seek to uncover cohesive subgroups through partitioning methods. One of those partitioning methods is hierarchical agglomerative clustering, which groups network actors into subsets so that actors in the same subset are relatively similar to each other. Hierarchical agglomerative clustering normally processes N × N matrices, in which N denotes the number of actors in the network.
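As a sketch of the kind of N × N similarity matrix such clustering takes as input, the symmetric correlation of Equation 4.19 can be computed for the six-node network. The edge list is an assumption inferred from the prose description of Figure 4_5, the function names are hypothetical, and row means are taken over all g entries, as in the definition of the row mean given with Equation 4.18.

```python
import math

NODES = "ABCDEF"
# Assumption: undirected ties inferred from the description of Figure 4_5.
EDGES = [("A", "B"), ("A", "E"), ("A", "F"), ("B", "C"), ("B", "D"),
         ("B", "E"), ("C", "D"), ("C", "E"), ("C", "F"), ("D", "E")]


def adjacency(nodes, edges):
    """Symmetric binary sociomatrix stored as nested dictionaries."""
    x = {u: {v: 0 for v in nodes} for u in nodes}
    for u, v in edges:
        x[u][v] = x[v][u] = 1
    return x


def correlation(x, i, j):
    """Equation 4.19: correlate the rows of i and j over all k != i, j."""
    g = len(x)
    mean_i = sum(x[i].values()) / g
    mean_j = sum(x[j].values()) / g
    others = [k for k in x if k not in (i, j)]
    num = sum((x[i][k] - mean_i) * (x[j][k] - mean_j) for k in others)
    ss_i = sum((x[i][k] - mean_i) ** 2 for k in others)
    ss_j = sum((x[j][k] - mean_j) ** 2 for k in others)
    den = math.sqrt(ss_i * ss_j)
    return num / den if den else float("nan")


x = adjacency(NODES, EDGES)
matrix = {i: {j: (1.0 if i == j else correlation(x, i, j)) for j in NODES}
          for i in NODES}
r_be = matrix["B"]["E"]
r_cf = matrix["C"]["F"]
print(round(r_be, 3), round(r_cf, 3))
```

With the assumed ties, B and E have identical ties to the other four nodes and correlate at 1, whereas C and F have exactly opposite ties and correlate at -1, matching the two Table 4_3 values quoted in the text.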
Although hierarchical agglomerative clustering works on both binary and valued graphs, here we focus on binary graphs for simplicity.

Two measures, the correlation coefficient and Euclidean distance, are widely used to measure the similarity between a pair of actors in a social network. An empirical analysis using both measures suggests that they produce very similar results, although correlation coefficients are easier to interpret than Euclidean distances (Aldenderfer and Blashfield 1984: 24-28). Note that the computation of Euclidean distances or correlation coefficients depends on whether the matrix is directed/asymmetric or undirected/symmetric. Once the similarity measure has been computed, hierarchical agglomerative clustering can proceed to partition actors using some threshold value, α, which for a distance measure serves as the ceiling value for pairs of actors in the subsets being partitioned ($d_{ij} \leq \alpha$). Thus, actors within a subset are more structurally equivalent, or more strongly correlated, than actors across different subsets. The partitioning process continues with successively less restrictive (higher) values of α until every actor belongs to one big group.

Note that although hierarchical agglomerative clustering produces non-overlapping clusters, those clusters are nested, in that each cluster can be subsumed as a member of a larger and more inclusive cluster at a less restrictive threshold. Often, a tree diagram called a dendrogram is used to depict visually the sequence of cluster mergers. Hierarchical agglomerative clustering has three criteria for its merger rule: single linkage, complete linkage, and average linkage. At a given α level, the single linkage criterion includes an actor in a cluster if the actor has a correlation coefficient larger than α with at least one of the actors in the existing cluster.
In contrast, the complete linkage criterion operates on the opposite logic: any candidate for inclusion in an existing cluster must have a correlation coefficient larger than α with all the actors in the cluster. The third criterion, average linkage, was developed as an antidote to the extremes of the single linkage and complete linkage criteria (Aldenderfer and Blashfield 1984: 36-44). It states that a candidate actor for inclusion in an existing cluster must have an average correlation coefficient with all actors in the cluster that is larger than α. Empirical analyses of all three criteria report that each has its advantages and disadvantages, of which researchers planning to use any of these methods should be acutely aware (Aldenderfer and Blashfield 1984: 53-62).

Multidimensional Scaling

Multidimensional scaling (MDS) is yet another method to illustrate visually some hidden underlying structures of data. MDS has been highly instrumental in facilitating research in various disciplines such as psychology, sociology, economics, and education. In social network analysis, the primary goal of MDS is to detect meaningful underlying dimensions that reflect similarities or dissimilarities (distances) between the network actors. Commonly the input to MDS is an N × N symmetric matrix ($X_{ij} = X_{ji}$), in which N denotes the number of entities of any type, such as persons, communities, organizations, or countries. Depending on what the entries represent, the matrix can be a similarity matrix, if high numbers indicate great similarity between two actors, or a dissimilarity matrix, if high numbers indicate low similarity between two actors. The output of MDS is a visual map depicting actors' locations, in which actors with greater proximity are closer in space than actors with less proximity.
Although MDS output diagrams can be represented in any number of dimensions, the results are mostly presented in two-dimensional maps. An MDS map does not correspond directly to the entry values of the original matrix. Rather, it reflects the computed Euclidean distances between pairs of actors in the network. Thus, an indicator called stress reflects the level of discrepancy between the original matrix and the new matrix consisting of the Euclidean distances between all pairs (Kruskal and Wish 1978: 23-30).

$$\text{Stress} = \sqrt{\frac{\sum_{i} \sum_{j} \left( f(x_{ij}) - d_{ij} \right)^2}{\text{Scale}}} \tag{4.20}$$

Here $f(x_{ij})$ is a non-metric, monotonic function of the original entry values (Kruskal and Wish 1978: 29), whereas $d_{ij}$ refers to the Euclidean distance between actors i and j on the map. Scale is a scaling factor that constrains the stress indicator to lie between 0 and 1. When the MDS map perfectly reproduces the input data, $f(x_{ij}) = d_{ij}$ for all i and j, and stress is zero. Thus, the smaller the stress, the better the representation.

To illustrate the application of the two visualizing techniques, we produce a hierarchical dendrogram (shown in Figure 4_8) and a multidimensional scaling map (shown in Figure 4_9) using UCINET 5.0 with the network data from Figure 4_5. The input matrix for the hierarchical dendrogram is the correlation matrix shown in Table 4_3, whereas the input matrix for MDS is the original matrix corresponding to Figure 4_5. Figure 4_8 shows the dendrogram using the "average" option: a candidate actor for inclusion in an existing cluster must have an average correlation coefficient with all actors in the cluster that is larger than α, which is shown on the upper horizontal axis. For example, the average of the correlations between F and E (0.316) and between F and B (0.316) is 0.316. Thus, F is joined to the BE cluster at the level of 0.316. The dendrogram shows that at the most superficial level, the six nodes are divided into 2 clusters: ACD and BEF.
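The average-linkage join level quoted for F can be reproduced with a few lines. This is a sketch under stated assumptions: the edge list is inferred from the prose description of Figure 4_5, the correlation follows Equation 4.19 with row means taken over all g entries, and the function names are hypothetical. It computes one join level only and does not attempt to rebuild the whole dendrogram.

```python
import math

NODES = "ABCDEF"
# Assumption: undirected ties inferred from the description of Figure 4_5.
EDGES = [("A", "B"), ("A", "E"), ("A", "F"), ("B", "C"), ("B", "D"),
         ("B", "E"), ("C", "D"), ("C", "E"), ("C", "F"), ("D", "E")]

x = {u: {v: 0 for v in NODES} for u in NODES}
for u, v in EDGES:
    x[u][v] = x[v][u] = 1


def correlation(i, j):
    """Equation 4.19 on the symmetric matrix x, summing over k != i, j."""
    g = len(NODES)
    mean_i = sum(x[i].values()) / g
    mean_j = sum(x[j].values()) / g
    others = [k for k in NODES if k not in (i, j)]
    num = sum((x[i][k] - mean_i) * (x[j][k] - mean_j) for k in others)
    den = math.sqrt(sum((x[i][k] - mean_i) ** 2 for k in others)
                    * sum((x[j][k] - mean_j) ** 2 for k in others))
    return num / den if den else float("nan")


def average_link(candidate, cluster):
    """Mean correlation between a candidate actor and every cluster member."""
    return sum(correlation(candidate, member) for member in cluster) / len(cluster)


r_fe = correlation("F", "E")
r_fb = correlation("F", "B")
f_joins_be = average_link("F", ["B", "E"])
print(round(r_fe, 3), round(r_fb, 3), round(f_joins_be, 3))
```

Both correlations and their average round to 0.316, the level at which the text reports F joining the BE cluster.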
Further splits occur within each cluster as the threshold value α increases: within ACD, CD forms a cluster as opposed to A, and within BEF, BE forms a cluster as opposed to F. The MDS mapping of the 6 actors yields a slightly different configuration. The dimension specified by the X-axis clusters CBE together as the central set of actors, whereas the dimension specified by the Y-axis groups CBF as central actors. Note that the stress indicator of this MDS is 0, suggesting a perfect representation of the original data. However, when MDS is applied to large datasets, the stress indicator will inevitably increase, reflecting a certain level of discrepancy between the input data and the MDS mapping.

Blockmodels

Blockmodeling is an important method for partitioning network actors, developed initially by White and his associates (White, Boorman, and Breiger 1976; Boorman and White 1976; Schwartz 1977). Since the groundbreaking work on blockmodeling by White et al. (1976), researchers have fruitfully used the method to study various topics such as interorganizational networks (Knoke and Rogers 1979), diffusion of a new technology (Anderson and Jay 1985), and the positions and roles of cities belonging to a world city system (Alderson and Beckfield 2004). Studies have also extended the searching and partitioning methods of blockmodeling (Winship and Mandel 1983; Wu 1983; Nowicki and Snijders 2001). Space limitations prohibit an extensive discussion of the entire blockmodeling literature. Rather, this section focuses on core issues: what the blockmodeling method is, how to implement it with a suitable algorithm, and how to interpret the outputs of blockmodeling partitioning.

The Blockmodeling Method

Blockmodeling is a partitioning technique that divides a population into sets of structurally equivalent actors, called blocks.
In blockmodeling, a search process iteratively partitions a population, permuting rows and columns so that members of a block are grouped together. The term block refers to a rectangular submatrix consisting of structurally equivalent actors that have great density between themselves. Blockmodeling essentially is a data reduction technique for descriptive purposes, searching for patterns in network data by regrouping cases and presenting condensed aggregate-level information. Close and similar cases are grouped together to form a homogeneous block, which is distinguished from other blocks. By creating several blocks that are characterized by within-block homogeneity and across-block heterogeneity, blockmodeling reveals those regularities in the patterns of relations among actors that undergird social structure.

Blockmodeling can process matrices reflecting a social network with single or multiple relations, directed or undirected ties, and binary or valued graphs. Here we focus on binary matrices and refer readers to a more elaborate discussion of blockmodeling of both binary and valued graphs by Doreian, Batagelj, and Ferligoj (2005: 347-360). When input matrices reflect multiple relations, those matrices are stacked to produce a matrix of K × N × N, where K denotes the number of relations and N represents the number of cases in the matrix. Blockmodeling is implemented mainly through the CONCOR (Convergence of iterated Correlations) algorithm, developed by one of White's students, Schwartz (Schwartz 1977). The sections below discuss the CONCOR algorithm.

The CONCOR Algorithm

The CONCOR algorithm operates on rows, columns, or both rows and columns simultaneously. For simplicity in our illustration, we assume that the algorithm correlates columns. The first step of the algorithm calculates the Pearson correlation coefficients between all pairs of columns.
Two cases with exactly the same pattern of connections with other network actors should have a correlation coefficient of 1, whereas two cases with opposite patterns of connections would have a coefficient of -1. The result of this first step is a symmetric N × N matrix, in which N denotes the number of cases and the entries represent the correlation coefficients between all pairs of cases. The second step uses the similarity measures from the first step to group cases into structurally equivalent sets, the blocks, so that cases with great similarity belong to the same block, whereas cases with great dissimilarity are separated into different blocks.

If the first step processes a matrix in which columns are either perfectly positively correlated (1) or perfectly negatively correlated (-1), the second step of clustering is easy. All values in the result from the first step would be either 1 or -1, permitting a clear-cut division that groups all the pairs with 1 together, in contrast to the other group with all the pairs having -1. However, empirical social network data rarely fit such a profile, necessitating iterative processing of the result. When the first step does not produce a clearly divisible matrix, the CONCOR algorithm recalculates the correlation coefficients using the result of the previous round as the input matrix. This process is repeated for each successive matrix: correlating the coefficients of the coefficients, and so on. Such repeated computation of the correlation coefficients eventually produces a matrix containing only 1 or -1, allowing a distinct grouping of the cases into two different blocks. Each of the two blocks can be further divided using the same procedure: repeatedly computing the correlation coefficients until a clearly divisible matrix emerges. Researchers can decide when to stop the iteration of division, thus determining the number of blocks.
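The iteration described above can be sketched as a single CONCOR split. This is a minimal illustration, not Schwartz's implementation: the 4 × 4 matrix is a hypothetical toy example, the tolerance and iteration cap are arbitrary assumptions, and the function names are invented for this sketch. Columns are correlated repeatedly until every coefficient is close to 1 or -1, and the cases are then divided by the sign of their correlation with the first case.

```python
import math


def pearson(u, v):
    """Ordinary Pearson correlation between two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    num = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mean_u) ** 2 for a in u)
                    * sum((b - mean_v) ** 2 for b in v))
    return num / den if den else 0.0  # guard against constant columns


def column_correlations(matrix):
    """Correlate every pair of columns of a square matrix."""
    columns = list(zip(*matrix))
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def concor_split(matrix, tol=1e-6, max_iter=100):
    """Iterate correlations of correlations, then split by sign into 2 blocks."""
    corr = column_correlations(matrix)
    for _ in range(max_iter):
        if all(abs(abs(value) - 1.0) < tol for row in corr for value in row):
            break  # every entry is (numerically) +1 or -1
        corr = column_correlations(corr)
    first = [j for j, value in enumerate(corr[0]) if value > 0]
    second = [j for j, value in enumerate(corr[0]) if value <= 0]
    return first, second


# Hypothetical toy sociomatrix: cases 0 and 1 mainly send to cases 2 and 3,
# and vice versa, with one extra tie (0 -> 1) so the first pass is not clean.
X = [[0, 1, 1, 1],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [1, 1, 0, 0]]

blocks = concor_split(X)
print(blocks)
```

For this toy matrix the first correlation pass leaves intermediate values between columns 0 and 1, so further passes are needed before the cases separate into blocks {0, 1} and {2, 3}. To obtain more blocks, the same function could be applied again to the submatrix of each block.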
The CONCOR algorithm uses 1/-1 as the threshold for division during each round of iteration, which represents the strongest criterion for an unambiguous partition into structurally equivalent sets, the blocks. Those blocks, which contain structurally equivalent actors, constitute a square image matrix that contains binary values in its entries. The criteria for assigning 0 and 1 to the entries vary, as discussed in the following section.

The Output and the Interpretations

The output of blockmodeling, implemented through the CONCOR algorithm, is a square binary image matrix, which replaces the original submatrices with blocks. The entries of the image matrix are either 0 or 1, depending on the density of relations within each block. Two criteria have emerged to determine the binary value for each entry: (1) blocks with no ties among their actors are coded as 0 (zero-blocks) and blocks with one or more ties are coded as 1 (one-blocks), or (2) researchers choose a density cutoff point, an alpha value (α), such that entries with density below the cutoff are coded as 0 and those with density at or above it are coded as 1. The first criterion, which uses 0/1 block density to determine the entry value, is the most restrictive form, as such density patterns rarely occur in real data. The second criterion, in which researchers use the alpha value to dichotomize the entry values, is the more common practice. Often researchers choose the average density of the entire matrix as the cutoff point. However, because the choice of the alpha value inevitably involves researchers' discretionary judgment, researchers adopting the alpha value criterion are vulnerable to the criticism of being arbitrary. In response, researchers should always provide justifications based on theoretical and empirical grounds, rather than purely mathematical principles (Scott: 136-142).
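The two coding criteria can be compared on a small example. The sketch below uses a hypothetical 4 × 4 sociomatrix and a hypothetical two-block partition; excluding self-ties from the density of diagonal blocks is an assumption of this sketch, as are the function names. The cutoff α is set to the overall density of the matrix, one common choice mentioned in the text.

```python
def block_density(x, rows, cols):
    """Share of possible ties present from actors in rows to actors in cols."""
    cells = [(i, j) for i in rows for j in cols if i != j]  # skip self-ties
    if not cells:
        return 0.0
    return sum(x[i][j] for i, j in cells) / len(cells)


def image_matrix(x, blocks, alpha=None):
    """Binary image matrix: zero-block criterion if alpha is None,
    otherwise 1 when the block density is at or above alpha."""
    image = []
    for rows in blocks:
        image_row = []
        for cols in blocks:
            density = block_density(x, rows, cols)
            if alpha is None:
                image_row.append(1 if density > 0 else 0)
            else:
                image_row.append(1 if density >= alpha else 0)
        image.append(image_row)
    return image


# Hypothetical sociomatrix and partition, for illustration only.
X = [[0, 1, 1, 1],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [1, 1, 0, 0]]
BLOCKS = [[0, 1], [2, 3]]

overall_density = block_density(X, range(4), range(4))
zero_block_image = image_matrix(X, BLOCKS)
alpha_image = image_matrix(X, BLOCKS, alpha=overall_density)
print(overall_density, zero_block_image, alpha_image)
```

The single tie inside the first block is enough to code that block as 1 under the zero-block criterion, but its density of 0.5 falls below the overall density of 0.75, so the α criterion codes it as 0. The example shows how the choice of criterion can change the image matrix.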
Network Position Measures: Automorphic, Isomorphic, and Regular Equivalences

Roles and positions are central concepts in social network analysis. Structural equivalence is one method for identifying the roles and positions of individuals in a social network. However, its definition, that two actors must be connected to the same set of other actors to be structurally equivalent, is too stringent to be practical. Researchers have developed many alternative and more abstract measures to identify roles and positions (Everett 1985; Everett, Boyd, and Borgatti 1990; Borgatti and Everett 1992; Faust 1988; Pattison 1988). The following sections discuss those methods, including automorphic/isomorphic equivalence and regular equivalence. Note that with regard to the level of abstraction in describing properties of the relations, structural equivalence is the least abstract, regular equivalence is the most abstract, and automorphic/isomorphic equivalence lies in the middle. Therefore, structural equivalence guarantees automorphic/isomorphic equivalence, which in turn guarantees regular equivalence, whereas the reverse is not necessarily true. For simplicity, the following description presumes a binary, undirected graph defined by a single relation, although with some modifications automorphic equivalence and isomorphic structure can be used to partition directed and valued graphs too (Wasserman and Faust 1994: 461-502).

Automorphic Equivalence and Isomorphic Structure

Automorphic equivalence and isomorphic structure are closely related concepts, and researchers often use them interchangeably (Borgatti and Everett 1992). However, isomorphic structure characterizes two graphs, whereas automorphic equivalence describes relational properties between social actors within one graph.
Two graphs are structurally isomorphic if there is a one-to-one mapping of one set of nodes to another such that the relations among the original nodes are preserved. In other words, a graph isomorphism is a mapping of the nodes in one graph to corresponding nodes in another graph such that if two nodes are connected in one graph, then their counterparts in the second graph must also be connected (Borgatti and Everett 1992: 11). All graphs are isomorphic with themselves; an isomorphism of a structure with itself is called an automorphism. Two actors are automorphically equivalent if they are connected to corresponding other positions, that is, if some automorphism of the graph maps one actor onto the other. Automorphically equivalent nodes have identical graph-theoretic properties, such as centrality, ego-density, and clique size (Borgatti and Everett 1992). Automorphic equivalence relaxes the rigid requirement of structural equivalence in defining roles and positions in social networks. Structural equivalence defines positions by locating groups of similar individuals based on the extent to which they share identical ties with identical others. In contrast, automorphic equivalence identifies positions by grouping similar individuals based on the extent to which they share identical ties with counterparts who play the same roles. For example, two professors must have the same relations with the same set of students to be structurally equivalent, whereas automorphic equivalence requires only that the two professors have the same relations with their own students. Therefore, structurally equivalent actors are also automorphically equivalent, whereas the reverse is not necessarily true. By relaxing structural equivalence, automorphic equivalence proves very useful in empirical research corresponding to various theories. Borgatti and Everett (1992) summarize and clarify several studies that use structural equivalence to operationalize theories that, in fact, should be operationalized via automorphic equivalence.
For example, addressing Burt's (1979) proposal to define the industries or sectors of the economy as firms producing similar types of goods and occupying a single position in an interorganizational network, Borgatti and Everett (1992: 21) argue that structurally equivalent firms, which buy from the same providers or sell to the same clients, hardly constitute sectors, whereas automorphically equivalent firms, which buy from similar vendors and sell to similar clients, might.

Regular Equivalences

Regular equivalence is the least restrictive of the three most commonly used definitions of equivalence: structural equivalence, automorphic equivalence, and regular equivalence. However, it is the most important measure for the sociologist in capturing social roles and positions. Two persons are regularly equivalent if, whenever one has a relation with a person in a second position, the other has an identical relation with a counterpart in that position (White and Reitz 1983: 214). Mothers with children are regularly equivalent, as are doctors with nurses. The following paragraphs review studies on the definitions of equivalence (Borgatti and Everett 1992; Borgatti and Everett 1993; Everett 1985; Borgatti and Everett 1989; Doreian 1987; Everett, Boyd, and Borgatti 1990; Faust 1988), attempting to clarify the differences between the three types of equivalence. As the strictest form of equivalence, structural equivalence requires that a pair of actors connect with the same set of other actors on the same type of relation. In contrast, automorphic equivalence and regular equivalence require only that a pair of actors connect, on the same relation, with other actors who are themselves equivalent to each other. However, the distinction between automorphic equivalence and regular equivalence is not always clear.
Here, we state that automorphic equivalence requires that the substructures of the graph can be substituted for one another, whereas regular equivalence does not require a complete substitution between two subgraphs. To illustrate the differences, we provide an artificial organizational hierarchy. Figure 4_10 depicts an organizational hierarchy divided into four levels and linked by supervisory relations. The CEO supervises three executive-level managers (A, B, and C), who supervise four middle managers (D, E, F, and G), who in turn supervise the rank-and-file employees (H, I, J, K, L, M, and N). Actors B and C are structurally equivalent because they have identical ties (supervisory relations) with identical others (F and G). However, the other two pairs (A and B, A and C) are not structurally equivalent but regularly equivalent, because they are connected not with identical others but with role-similar others. Note that these two pairs are not automorphically equivalent either, because the subgraph headed by A is not substitutable for the subgraph headed by B and C. No structurally equivalent pairs are present at the middle-manager level among D, E, F, and G. Instead, several automorphically equivalent pairs surface, including DF, DG, and FG. The subgraphs headed by D (DHI), F (FKL), and G (GMN) are completely substitutable for each other. In addition, several regularly equivalent pairs emerge, including ED, EF, and EG. Although the subgraph headed by E is not substitutable for those headed by D, F, and G, actor E shares some similarities with D, F, and G in that they are all middle managers of rank-and-file employees in the organizational hierarchy. Figure 4_10 illustrates that among the three equivalences, the strictest is structural equivalence, the least strict is regular equivalence, and automorphic equivalence lies in the middle.
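The contrast between structural and automorphic equivalence can also be checked by brute force on a very small graph. The sketch below does not use Figure 4_10, whose full edge list is not given in the text; instead it assumes a hypothetical undirected graph based on the earlier professor-student illustration, with two professors (P1, P2) who are colleagues and two students each. The function names are ours, and enumerating all permutations is feasible only for a handful of nodes.

```python
from itertools import permutations

# Hypothetical undirected graph: two professors, each tied to their own students.
edges = {("P1", "S1"), ("P1", "S2"), ("P2", "S3"), ("P2", "S4"), ("P1", "P2")}
nodes = sorted({n for e in edges for n in e})
adj = {n: set() for n in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def structurally_equivalent(u, v):
    """Same ties to the same others (ignoring any tie between u and v)."""
    return adj[u] - {v} == adj[v] - {u}

def automorphisms():
    """Yield every relabeling of the nodes that maps the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        image = {frozenset((mapping[u], mapping[v])) for u, v in edges}
        if image == edge_set:
            yield mapping

def automorphically_equivalent(u, v):
    """True if some automorphism of the graph sends u to v."""
    return any(m[u] == v for m in automorphisms())

print(structurally_equivalent("S1", "S2"))     # students of the same professor
print(structurally_equivalent("P1", "P2"))     # professors with different students
print(automorphically_equivalent("P1", "P2"))  # same role, different alters
```

In this toy graph the two professors fail the structural test because their students differ, yet they pass the automorphic test because swapping the professors together with their students leaves the graph unchanged.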
In reflecting social roles and positions, defined as an aggregate class or category of individuals who share similarities in their relations with other categories in the rest of the social system (Faust 1988: 315), regular equivalence is also a better indicator than structural or automorphic equivalence.

Logit models (p*)

Most social network methods are descriptive, attempting to represent underlying social structures through data reduction techniques or to characterize network properties through algebraic computations. A recent wave of groundbreaking work moves beyond the descriptive analysis of social networks, providing an important statistical model, logit p*, to explain the presence of dyadic ties with a set of individual-level and graph-level explanatory factors. Wasserman and Pattison (1996) first proposed the logit p* model and logistic regression for social networks. Their work, however, built on several earlier treatises on Markov random graphs (Frank and Strauss 1986), the log-linear modeling of directed graphs (p1) (Holland and Leinhardt 1981), and the algorithmic implementation of pseudolikelihood estimation (Strauss and Ikeda 1990). In recent developments, logit p* and logistic regression were extended to analyze multivariate relations (Pattison and Wasserman 1999) and valued relations (Robins, Pattison, and Wasserman 1999). Focusing on the application of logit p*, two recent works describe in detail the data structure and the interpretation of the results (Crouch and Wasserman 1998; Anderson, Wasserman, and Crouch 1999). This section briefly discusses the mathematical basics of the logit p* model, while emphasizing the application of the method to an artificial network. Although the method can be used to analyze multivariate relations and valued graphs, for simplicity this section presumes a dichotomous directed graph with a single relation.
Interested readers may consult the above citations for more advanced topics.

Logistic Regression

Logit p* is closely related to the logistic model. Thus we start with a brief introduction to logistic regression and refer readers to Pampel (2000) for a detailed introduction. The logistic regression model is used to explain a dichotomous dependent variable coded as a binary variable (Y* = 1 or 0), which is often presumed to have a binomial distribution. Applying OLS regression to a binary dependent variable, which models the probabilities as a linear combination of a vector of explanatory variables, produces two major problems: 1) the predicted response value can be larger than 1 or lower than 0; and 2) the model induces heteroscedasticity, in that the variance of the error term varies with the value of the dependent variable. The logistic regression model corrects these problems by transforming the probabilities into logits. In particular,

$$\operatorname{Logit}(Y^*) = \log\left(\frac{\Pr(Y^* = 1)}{\Pr(Y^* = 0)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \tag{4.21}$$

One can interpret the parameters using logits, odds, or probabilities. Interpretation using odds is more common than the other two methods, possibly because it is more straightforward than the logit and less mathematical than the probability method (Pampel 2000). To obtain the effects of the independent variables in terms of odds, one takes the exponential of the linear equation,

$$\frac{\Pr(Y^* = 1)}{\Pr(Y^* = 0)} = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k) = e^{\beta_0} e^{\beta_1 X_1} \cdots e^{\beta_k X_k} \tag{4.22}$$

The equation shows that the independent variables exert a multiplicative impact on the odds of the response variable. Interpreting the impact of a given independent variable ($X_k$) involves taking the exponential of its parameter, $e^{\beta_k}$.

Logit p*

Logit p* is the application of logistic regression to the analysis of social network data.
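Because logit p* inherits the odds interpretation of equations 4.21 and 4.22, a brief numerical sketch may be useful before turning to the network case. The intercept and coefficients below are hypothetical values chosen only for illustration, and the function name is ours.

```python
import math

def logit_odds_prob(intercept, coefs, xs):
    """Return the logit, the odds, and the probability for one observation."""
    logit = intercept + sum(b * x for b, x in zip(coefs, xs))
    odds = math.exp(logit)            # equation 4.22: exponential of the logit
    prob = odds / (1.0 + odds)        # always lies strictly between 0 and 1
    return logit, odds, prob

# Hypothetical model with two explanatory variables.
b0, betas = -1.0, [0.7, -0.2]
_, odds_before, prob_before = logit_odds_prob(b0, betas, [1.0, 3.0])
_, odds_after, prob_after = logit_odds_prob(b0, betas, [2.0, 3.0])  # X1 up by one unit

odds_ratio = odds_after / odds_before
print(round(odds_ratio, 4), round(math.exp(betas[0]), 4))
```

Whatever the starting values of the explanatory variables, a one-unit increase in $X_1$ multiplies the odds by $e^{\beta_1}$, which is the multiplicative effect that the odds interpretation relies on.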
In a social network dataset with a single dichotomous, directed relation among g actors, the entry (i, j) in the g × g matrix X is a binary value

$$X_{ij} = \begin{cases} 1 & \text{if } i \to j \\ 0 & \text{otherwise} \end{cases}$$

From X, Wasserman and Pattison (1996) defined three additional matrices: 1) $X_{ij}^{+}$, the matrix in which the relational tie from i to j is forced to be present; 2) $X_{ij}^{-}$, the matrix in which the relational tie from i to j is forced to be absent; and 3) $X_{ij}^{C}$, the complement relation for the tie from i to j. With these three additional matrices, one can model the probability that the tie from i to j is present as follows,

$$\Pr(X_{ij} = 1 \mid X_{ij}^{C}) = \frac{\Pr(X = x_{ij}^{+})}{\Pr(X = x_{ij}^{+}) + \Pr(X = x_{ij}^{-})} = \frac{\exp\{\theta' z(x_{ij}^{+})\}}{\exp\{\theta' z(x_{ij}^{+})\} + \exp\{\theta' z(x_{ij}^{-})\}} \tag{4.23}$$

In the formula, $\theta$ is the vector of parameters to be estimated, whereas $z(x_{ij}^{+})$ and $z(x_{ij}^{-})$ are the vectors of network statistics when $X_{ij} = 1$ and $X_{ij} = 0$ respectively. The odds of the presence of a tie from i to j relative to its absence are

$$\frac{\Pr(X_{ij} = 1 \mid X_{ij}^{C})}{\Pr(X_{ij} = 0 \mid X_{ij}^{C})} = \frac{\exp\{\theta' z(x_{ij}^{+})\}}{\exp\{\theta' z(x_{ij}^{-})\}} \tag{4.24}$$

Taking the natural log of both sides and simplifying transforms the above equation into the following

$$\log\left(\frac{\Pr(X_{ij} = 1 \mid X_{ij}^{C})}{\Pr(X_{ij} = 0 \mid X_{ij}^{C})}\right) = \theta' \left[ z(x_{ij}^{+}) - z(x_{ij}^{-}) \right] \tag{4.25}$$

This equation is called logit p*. It contains a vector of parameters $\theta$ to be estimated and a vector of network statistics $z(x_{ij}^{+}) - z(x_{ij}^{-})$, the change that arises when the variable $X_{ij}$ changes from 1 to 0.

An Artificial Network Dataset

To illustrate the application of logit p*, we use a small artificial network dataset. Figure 4_8 shows a binary directed graph with 6 actors: 3 boys, drawn as squares, and 3 girls, drawn as circles. Assuming the directed lines in the graph represent "nomination of best friend," actors 1 and 5 name each other as best friend, whereas actor 2 names actor 1 as her best friend, but not vice versa.
Scrutinizing the graph, one can easily detect that nominating a best friend is gender-specific: best-friend nominations occur more frequently within the same sex (boy-boy or girl-girl) than across sexes (a boy and a girl). To model this "same-sex trend" and other network characteristics affecting the presence of ties, we chose five model parameters: 1) overall degree of choice (θ), 2) differential choice within sex (θw), 3) mutuality (ρ), 4) differential mutuality within sex (ρw), and 5) transitivity (τT). The vector of model parameters to be estimated is θ = {θ, θw, ρ, ρw, τT}. Computing the vector of explanatory variables for all pairs in the graph involves calculating the changes in the vector of network statistics z(x) when the tie between i and j changes from 1 to 0. In particular,

$z_{\theta} = \sum_{i,j} X_{ij}$ is the choice explanatory variable
$z_{\theta_w} = \sum_{i,j} X_{ij}\,\delta_{ij}$ is the choice within sex explanatory variable
$z_{\rho} = \sum_{i<j} X_{ij} X_{ji}$ is the mutuality variable
$z_{\rho_w} = \sum_{i<j} X_{ij} X_{ji}\,\delta_{ij}$ is the mutuality within sex explanatory variable
$z_{\tau_T} = \sum_{i,j,k} X_{ij} X_{jk} X_{ik}$ is the transitivity explanatory variable

The variable $\delta_{ij}$ is a binary indicator, which equals 1 if i and j are in the same sex group and 0 otherwise. Note that for a directed network dataset with g actors, the total number of cases for logit p* is g(g-1), derived from $P_g^2 = \frac{g!}{(g-2)!} = g(g-1)$. Thus our dataset with 6 actors has 30 cases of directed pairs in the logit p* model. Also note that Markov graph theory encompasses more variables than those in our model, such as individual expansiveness ($X_{i+}$), popularity ($X_{+j}$), and graph-level cyclicity. Space limitations prohibit presentation of the full logit p* model with all explanatory variables. Interested readers can consult Anderson et al. (1999) and Wasserman and Pattison (1996). Table 4_3 shows the input dataset for the logistic regression of the presence or absence of directed friendship ties between ordered pairs.
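A dataset of this kind can be assembled mechanically by forcing each tie present and absent in turn and differencing the network statistics. The sketch below is only an illustration under stated assumptions: it uses a hypothetical four-actor graph rather than the graph in Figure 4_8 (whose full edge list is not reproduced in the text), actors are indexed from 0, and the function names are ours.

```python
import numpy as np

def network_stats(x, same):
    """Return [choice, choice-within, mutuality, mutuality-within, transitivity]."""
    mutual = np.triu(x * x.T, k=1)       # 1 where i<j and ties run both ways
    return np.array([
        x.sum(),                         # sum of X_ij
        (x * same).sum(),                # sum of X_ij * delta_ij
        mutual.sum(),                    # sum over i<j of X_ij * X_ji
        (mutual * same).sum(),           # same, restricted to same-sex pairs
        ((x @ x) * x).sum(),             # sum of X_ij * X_jk * X_ik (zero diagonal)
    ])

def change_stats(x, same, i, j):
    """Statistics with tie i->j forced present minus forced absent."""
    plus, minus = x.copy(), x.copy()
    plus[i, j], minus[i, j] = 1, 0
    return network_stats(plus, same) - network_stats(minus, same)

def build_dataset(x, same):
    """One row per ordered pair: (i, j, tie, five change statistics)."""
    g = x.shape[0]
    return [(i, j, int(x[i, j]), *change_stats(x, same, i, j))
            for i in range(g) for j in range(g) if i != j]

# Hypothetical graph: actors 0 and 1 are boys, actors 2 and 3 are girls.
sex = np.array([0, 0, 1, 1])
same = (sex[:, None] == sex[None, :]).astype(int)
x = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 0), (2, 3), (0, 2), (1, 2)]:
    x[i, j] = 1

dataset = build_dataset(x, same)
print(len(dataset))                      # g(g-1) ordered pairs
print(change_stats(x, same, 0, 1))
```

The resulting rows have the same layout as a table of tie indicators and change statistics and could be passed to any logistic regression routine. Note that the change in the choice statistic is 1 for every ordered pair, a point that matters when the model is estimated.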
Below we illustrate the computation of the five explanatory variables with the ordered pair from actor 5 to actor 6.

Change in choice: $\sum_{i,j} X_{ij}^{+} - \sum_{i,j} X_{ij}^{-} = 10 - 9 = 1$
Change in choice within the same sex: $\sum_{i,j} X_{ij}^{+}\delta_{ij} - \sum_{i,j} X_{ij}^{-}\delta_{ij} = 9 - 8 = 1$
Change in mutuality: $\sum_{i<j} X_{ij}^{+} X_{ji}^{+} - \sum_{i<j} X_{ij}^{-} X_{ji}^{-} = 3 - 2 = 1$
Change in mutuality within the same sex: $\sum_{i<j} X_{ij}^{+} X_{ji}^{+}\delta_{ij} - \sum_{i<j} X_{ij}^{-} X_{ji}^{-}\delta_{ij} = 3 - 2 = 1$
Change in transitivity: $\sum_{i,j,k} X_{ij}^{+} X_{jk}^{+} X_{ik}^{+} - \sum_{i,j,k} X_{ij}^{-} X_{jk}^{-} X_{ik}^{-} = 4 - 2 = 2$

Note that in computing the transitivity, the original graph with the tie from actor 5 to actor 6 present has 4 transitive triples: (1→5, 5→6, 1→6), (1→6, 6→5, 1→5), (4→2, 2→3, 4→3), and (5→1, 1→6, 5→6). With the tie from actor 5 to actor 6 forced to be absent, the graph contains 2 transitive triples, (1→6, 6→5, 1→5) and (4→2, 2→3, 4→3). Thus, the change in transitivity is 4 - 2 = 2 when the tie from actor 5 to actor 6 changes from present to absent. Loading the dataset in Table 4_3 into commercial statistical software such as SPSS, SAS, or STATA, one can conduct a logistic regression with "tie" as the dependent variable and five independent variables: choice, choice-within, mutuality, mutuality-within, and transitivity. Regressing "tie" on the five independent variables, we found some anomalies in our results. First, the variable "choice" turns out to be a constant because every ordered pair has a value of 1. Thus the model discards the "choice" variable. Second, the remaining variables have unusually large parameter estimates and standard errors, indicating a potential problem of multicollinearity: high correlation between independent variables significantly distorts the regression estimates of the standard errors. To circumvent this problem, we ran four separate models, regressing the "tie" dependent variable on the four remaining independent variables one at a time. Third, even with separate models, we found that the standard error for transitivity is as large as 9262, for unknown reasons.
We thus drop the transitivity model and call for more investigation of this anomaly. Interpreting the logit p* results is similar to interpreting other standard output from logistic regression. Model 1 in Table 4_4 shows that each unit of increase in the choice between i and j, provided that i and j are of the same sex, multiplies the odds that i actually sends a tie to j by 51 (exp(3.932) = 51). Model 2 shows that the tendency for actors i and j to have a mutual tie multiplies the odds that actor i actually sends a tie to actor j by 6 (exp(1.792) = 6). Model 3 shows that the tendency for actors i and j to have a mutual tie multiplies the odds that actor i actually sends a tie to actor j by 8.5 (exp(2.140) = 8.49), provided that both actors are in the same sex group. Note that the fit of a logistic regression model is indicated by twice the negative of the log likelihood (-2 log likelihood). If the model fit perfectly, this measure would equal 0. Thus, a large -2 log likelihood suggests poor fit (Knoke, Bohrnstedt and Mee 2002: 287-314). Comparing the -2 log likelihoods of two nested equations, one less restrictive and one more restrictive, one can determine whether the additional predictors in the less restrictive equation significantly improve the model fit. Researchers have used this technique to search for the most parsimonious model (Anderson et al., 1999). The logit p* model marks an important step in advancing social network studies. It moves beyond representation and description, focusing on the explanation of relational ties between actors. It explicitly models the impacts of graph-level characteristics and individual idiosyncrasies on relational ties. Immense opportunities lie ahead for network researchers to use logit p* to analyze substantive social issues.
We suggest that more practical guides showing applications of logit p* are much needed to propagate this new technique among network researchers.

Affiliation Networks

Affiliation networks represent the affiliation of a set of actors with a set of social events (Wasserman and Faust 1994: 291-343). Social actors are linked through their joint participation in social events or membership in collectivities. Social events are linked to each other through the multiple memberships of actors. Affiliation networks illustrate those connections between actors and events. An affiliation network is also called a membership network (Breiger 1990) or a hypernetwork (McPherson 1982). An affiliation network consists of two elements, a set of actors and a set of events, which makes it a two-mode network. Substantive studies using affiliation networks are numerous (Wasserman and Faust 1994: 196). In chapter 3 we discuss Freeman et al.'s (1987) study of a group of university faculty and students who attend a series of nine colloquia. The researchers use an affiliation network to represent their data, in which faculty and students are the social actors and the colloquia are the social events. This chapter uses artificial network data consisting of five social actors and three social events to discuss the representation, analysis, and properties of affiliation networks.

The Affiliation Network Matrix and Bipartite Graph

An affiliation network can be represented with a matrix that records the affiliation of each actor with each event. Assuming that the affiliation network has g actors and h events, the matrix that represents the affiliation has g rows and h columns, whereby rows indicate actors and columns indicate events. If actor i attends event j, the (i, j) entry in the matrix is 1; otherwise the entry is 0.
Denoting such an affiliation matrix as A and the values in the matrix as $X_{ij}$, the following shows the condition

$$X_{ij} = \begin{cases} 1 & \text{if actor } i \text{ is affiliated with event } j \\ 0 & \text{otherwise} \end{cases}$$

Note that the row margins ($\sum_{j=1}^{h} X_{ij}$) of the matrix A give the number of social events an actor is affiliated with, whereas the column margins ($\sum_{i=1}^{g} X_{ij}$) give the number of actors in a social event. Thus the row margins range from 0 to h, indicating that the number of social events an actor attends can be anywhere from no event to all the events. Similarly, the column margins range from 0 to g, indicating that an event can attract anywhere from no actor to all actors in the network. In addition to the affiliation matrix A, the bipartite graph is another method to represent an affiliation network. A bipartite graph has two presentation forms: a visual bipartite graph and a bipartite matrix. A visual bipartite graph includes actors, events, and lines connecting the actors to the events with which they are affiliated. However, a bipartite graph does not permit lines connecting actors with each other or lines connecting events with each other. A bipartite matrix contains both the actors and the events in its row and column specification. Assuming an affiliation network with g actors and h events, the bipartite matrix is a (g + h) by (g + h) square matrix. We use an artificial network consisting of five actors and three events to illustrate how to represent an affiliation network using a visual bipartite graph and a bipartite matrix. Figure 4_12 shows the affiliation network, in which the five actors are represented as circles and the three events as squares. Lines connecting actors with events indicate that the actors attend the events, or that the events draw the actors.
From the actors' perspective, actor 1 attends both events 1 and 2, actor 2 attends only event 1, actor 3 attends only event 2, actor 4 attends only event 3, and actor 5 attends both events 2 and 3. From the events' perspective, event 1 draws actors 1 and 2, event 2 draws actors 1, 3, and 5, and event 3 draws actors 4 and 5. While the graph contains no line between actors or between events, it shows how actors can be connected through their common affiliation with certain events. For example, actor 1 is connected with actor 2 through their common affiliation with event 1. Actor 1 is also connected with actors 3 and 5 through their common affiliation with event 2. Table 4_5 shows the matrix representation of the bipartite graph. It is a square matrix with rows and columns representing both actors and events. Cell entries connecting actors with actors, or events with events, are all 0. The cell entries connecting the five actors in the rows and the three events in the columns are either 1 or 0, depending on whether the actor is affiliated with the event. For example, actor 1 has a 1 under both events 1 and 2, indicating that it attends both events. The five actors in the rows and the three events in the columns constitute a sub-matrix A, denoted with bold fonts and borders in the upper-right portion of the table. In contrast, the lower-left portion of the table contains another sub-matrix with events as rows and actors as columns. It is the transpose of A, denoted as A' ($X'_{ij} = X_{ji}$). Note that each row margin equals its corresponding column margin. Row margins (or column margins) of actors give the number of events the actors attend, whereas the row margins (or column margins) of events give the number of actors who attend the events.
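The affiliation matrix and its bipartite counterpart for this example can be written out directly. The following sketch encodes the five-actor, three-event network exactly as described above; the variable names are ours, and rows and columns are indexed from 0 in the code rather than from 1 as in the text.

```python
import numpy as np

# Rows are actors 1-5, columns are events 1-3, as described in the text.
A = np.array([
    [1, 1, 0],   # actor 1 attends events 1 and 2
    [1, 0, 0],   # actor 2 attends event 1
    [0, 1, 0],   # actor 3 attends event 2
    [0, 0, 1],   # actor 4 attends event 3
    [0, 1, 1],   # actor 5 attends events 2 and 3
])
g, h = A.shape

row_margins = A.sum(axis=1)   # number of events per actor
col_margins = A.sum(axis=0)   # number of actors per event

# (g + h) by (g + h) bipartite matrix: zeros for actor-actor and event-event cells.
bipartite = np.block([
    [np.zeros((g, g), dtype=int), A],
    [A.T, np.zeros((h, h), dtype=int)],
])

print(row_margins, col_margins)
print(bipartite)
```

Because the lower-left block is the transpose of the upper-right block, the bipartite matrix is symmetric, which is why each row margin equals the corresponding column margin.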
Succinctly, such a bipartite matrix can be represented in the following form

$$X^{A,E} = \begin{pmatrix} 0 & A \\ A' & 0 \end{pmatrix} \tag{4.26}$$

Multiplying the two sub-matrices (A and A') produces much information about the co-membership relations between actors and the co-actorship relations between events. Let us define $X^{A}$ as a symmetric, valued matrix describing the co-membership between actors.

$$X^{A} = AA' \tag{4.27}$$

Assuming an affiliation network has g actors and h events, A is a g × h matrix, whereas A' is an h × g matrix. Thus, $X^{A}$ is always a g × g matrix, in which the entry $X_{ij}$ indicates the number of events with which both actors i and j are affiliated. Meanwhile, if we let $X^{E}$ be a symmetric, valued matrix describing the number of actors that events have in common, then

$$X^{E} = A'A \tag{4.28}$$

One can easily see that $X^{E}$ is an h × h matrix, in which the entry $X_{ij}$ indicates the number of members shared by events i and j. Using our example to illustrate,

$$X^{A} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 2 \end{pmatrix} \tag{4.29}$$

Because we have 5 actors and 3 events, $X^{A}$ is a 5 by 5 symmetric matrix, in which $X_{ij}$ denotes the number of events both actors i and j attend. For example, actor 1 has 1 co-membership with each of actors 2, 3, and 5 through their common affiliations but no co-membership with actor 4. The diagonal values $X_{ii}$ in $X^{A}$ have substantive meaning: they indicate the number of events with which an actor is affiliated. For example, actor 5 is affiliated with 2 events. Using our example to compute $X^{E}$, one obtains the events' shared actors.

$$X^{E} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 2 \end{pmatrix} \tag{4.30}$$

Similar to $X^{A}$, $X^{E}$ has several interesting properties. First, it is a 3 by 3 symmetric matrix with rows and columns indicating the three events. Second, the diagonal values ($X_{ii}$) indicate the number of actors who are affiliated with event i. For example, event 2 has three actors (actors 1, 3, and 5).
Third, the off-diagonal entry for two events indicates the number of actors the two events share. For example, $X_{1,2} = 1$ indicates that events 1 and 2 share one actor (actor 1 attends both events), whereas $X_{1,3} = 0$ denotes that events 1 and 3 share no actors. Because the diagonal values in $X^{A}$ give the number of events with which each actor is affiliated, one can compute the average rate of affiliation at the network level by dividing the total number of event affiliations across all actors by the total number of actors ($\sum_{i=1}^{g} X_{ii}^{A} / g$). In our example, the average rate of affiliation is 1.4 (7/5), suggesting that, on average, each actor is affiliated with 1.4 events in the graph. Likewise, we can compute the average number of actors per event by dividing the diagonal total in $X^{E}$ by the total number of events ($\sum_{i=1}^{h} X_{ii}^{E} / h$). In our example, the average number of actors per event is 2.33 (7/3), indicating that, on average, each event attracts 2.33 actors.

Density and Centrality in Affiliation Network

Density and centrality are important measures of network properties. In particular, a density measure gives the proportion of ties that are present out of the maximum possible ties in a binary graph, or the average value attached to the lines in a valued graph. Similarly, the interpretation of the density measure in an affiliation network depends on whether the affiliation network is a binary or valued graph (Wasserman and Faust 1994: 316). Assuming that we have an actor by actor matrix $X^{A}$ that describes the co-membership relation, the density formula is as follows (note that our formula differs slightly from Wasserman and Faust 1994: 316 because we assume a symmetric matrix in which relations are undirected, whereas they assume asymmetric directed graphs).

$$D^{A} = \frac{\sum_{i=1}^{g} \sum_{j=2}^{g} X_{ij}^{A}}{g(g-1)/2} \quad (i < j) \tag{4.31}$$

$D^{A}$ denotes the density measure for the actor matrix in an affiliation network.
The numerator sums the entry values over all distinct pairs in the matrix, excluding the diagonal cells. The diagonal values in the actor matrix give the number of events with which an actor is affiliated; we do not consider a pair consisting of an actor and itself to be a valid unit. The denominator is the total number of pairs in a network with g actors, which equals g(g-1)/2 in a symmetric matrix. The formula for the density of the event by event matrix $X^{E}$ is as follows:

$$D^{E} = \frac{\sum_{i=1}^{h} \sum_{j=2}^{h} X_{ij}^{E}}{h(h-1)/2} \quad (i < j) \tag{4.32}$$

This formula divides the sum of the entry values over all distinct pairs in the matrix (excluding the diagonal cells) by the total number of pairs of different events. It gives the proportion of pairs of events that share one or more members in a binary graph, or the average number of actors who belong to each pair of events in a valued graph. In our example, both the actor matrix $X^{A}$ and the event matrix $X^{E}$ are binary graphs once the diagonal values are excluded. Thus the interpretation of the density of both graphs assumes a binary network. In particular, $D^{A}$ = 5/10 = 50 percent, suggesting that 50 percent of actor pairs share co-membership by attending at least one common event. From the event network perspective, $D^{E}$ = 2/3 = 66.67 percent, indicating that 66.67 percent of event pairs share at least one common actor. Looking at Figure 4_12, one finds that the actor pairs that share at least one event are a1a2, a1a3, a1a5, a3a5, and a4a5, whereas the event pairs that share at least one actor are e1e2, which share a1, and e2e3, which share a5. In contrast, e1 and e3 share no common actor. Social network analysts have been studying centrality at the actor level and centralization at the graph level for decades (Freeman 1979; Wasserman and Faust 1994: chpt. 5). In a previous section, we noted that actor centrality measures the importance or visibility of actors within a network.
Depending on how the concepts of importance or visibility are interpreted, researchers described four major types of centrality: degree 61 centrality, closeness centrality, betweenness centrality, and eigenvector centrality (Wasserman and Faust 1994: chpt. 5). To review briefly, degree centrality reflects the extent to which an actor is active in a network, closeness centrality measures the extent to which an actor is connected with other actors in a network via shortest paths, betweenness centrality captures the extent to which an actor mediates flows of information or resources between other actors in a network, and eigenvector centrality reflects the extent to which an actor is connected to other central actors in a network. A recent work discusses application of all four centrality measures in affiliation network (Faust 1997). Here, we focus on the computation and interpretation of actor degree centrality in affiliation network. A distinctive feature of affiliation networks is that it relates not only actors and events, but also between actors and between events. Thus, degree centrality for affiliation networks can reflect both actor’s and event’s activity. Drawing upon the notion that degree centrality measures the number of contacts an actor has, one can measure actor’s degree centrality in affiliation networks by looking at the number of contacts an actor has through co-membership with certain events (Faust 1997). Interestingly, the actor matrix XA that was derived from the bipartite graph describes the co-membership between a certain actor and other actors in a network. Taking advantage of this property of the actor matrix XA, we can obtain the actor centrality by computing actor’s row margins in XA. g C DA (ni ) X iA, j (i j ) (4.33) j 1 In our example, actor 1 shares co-membership with actors 2, 3, and 5. Its degree centrality of 3 reflects its connectivity. 
With only one co-membership (with actor 1), actor 2 has a lower degree centrality (1) than actor 1, reflecting its smaller number of contacts with other actors. Likewise, one can obtain an event's degree centrality in an affiliation network from the row margins of the event matrix $X^E$, which reflect the number of other events to which a given event is connected by sharing common actors:

$$C_D^E(e_i) = \sum_{j=1}^{h} X^E_{i,j} \quad (i \neq j) \tag{4.34}$$

The degree centralities of the three events in our example are 1, 2, and 1, respectively. Event 2 has a higher degree centrality than events 1 and 3 because it is connected with both of them by sharing at least one common actor, whereas events 1 and 3 are each connected only with event 2.

Analysis of Lattices

An affiliation network is essentially two-mode network data. One mode is a set of N actors $(a_i, a_j, \ldots, a_n)$, and the other mode is a set of M events $(e_i, e_j, \ldots, e_m)$. The two sets are linked by affiliations. When an actor $a_i$ participates in an event $e_j$, the binary entry $X_{i,j} = 1$ in the $N \times M$ matrix P, in which the actors define the rows and the events define the columns. One clear advantage of using an affiliation network to represent network structure is that it can illustrate three types of patterning: (1) the actor-event structure, in which actors are linked to events through their participation; (2) the actor-actor structure, in which actors are connected with each other through their common affiliations with certain events; and (3) the event-event structure, in which events are related to each other through their sharing of a common set of actors. However, bipartite graphs, which as described in a previous section permit links only between actors and events, fall short of visualizing the other two types of structure: the actor-actor structure and the event-event structure.
Lattices are developed to represent clearly all three types of structures in a single visual model (Freeman and White 1993; Wasserman and Faust 1994: 326-342). Assume that we have a finite nonempty set X = {x, y, z, …} and a binary relation "≤" on X that is reflexive ($x \le x$), antisymmetric (if $x \le y$ and $y \le x$, then $x = y$), and transitive (if $x \le y$ and $y \le z$, then $x \le z$). For a pair of elements x, y in X, an element m such that $m \le x$ and $m \le y$ is called a lower bound. It is the greatest lower bound, or meet, when there is no other element b such that $b \le x$, $b \le y$, and $m \le b$. An upper bound j is an element such that $x \le j$ and $y \le j$. It becomes the least upper bound, or join, when there is no other element b such that $x \le b$, $y \le b$, and $b \le j$. A lattice is formed when a partial order on a finite set X is such that every pair of elements in X has both a meet and a join (Freeman and White 1993: 131).

Galois lattice

A Galois lattice encompasses a dual ordering. It has two nonempty sets: an actor set A and an event set E. The two sets are linked by the affiliation patterns that assign actors to events. Therefore, a Galois lattice is defined by a triple (A, E, I), in which I is the binary relation in the matrix $A \times E$. The sub-matrix A, which is contained in the bipartite matrix in Table 4_3, displays such an actor-by-event matrix. Now consider P(A) = {A1, A2, …}, a collection of subsets of A, and P(E) = {E1, E2, …}, a collection of subsets of E. The relation I defines a mapping from P(A) to P(E) that sends a subset of actors B to a subset of events:

$$B \mapsto B^{\uparrow}: \quad B^{\uparrow} = \{e \in E \mid (a, e) \in I \text{ for all } a \in B\} \tag{4.35}$$

This expression indicates that the mapping identifies all the events with which a certain actor or set of actors is affiliated. For example, the sub-matrix A in Table 4_5 shows that actor 1 is affiliated with events 1 and 2, whereas actors 1 and 2 are jointly affiliated only with event 1.
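A minimal sketch of the mapping in (4.35), assuming the affiliations of the running example. The relation I is stored as a set of (actor, event) pairs; the event labels `"e1"`, `"e2"`, `"e3"` and the function name `up` are illustrative choices.

```python
# The binary relation I as (actor, event) pairs, assumed from the example.
I = {
    (1, "e1"), (1, "e2"),
    (2, "e1"),
    (3, "e2"),
    (4, "e3"),
    (5, "e2"), (5, "e3"),
}
EVENTS = {"e1", "e2", "e3"}

def up(actors):
    """Events with which every actor in the given subset is affiliated."""
    return {e for e in EVENTS if all((a, e) in I for a in actors)}

print(sorted(up({1})))     # events of actor 1
print(sorted(up({1, 2})))  # events shared by actors 1 and 2
```

Note that an empty subset of actors maps to the full event set, because the "for all" condition is then vacuously true.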
Conversely, the mapping can take place from P(E) to P(A), sending a subset of events F to a subset of actors:

$$F \mapsto F^{\downarrow}: \quad F^{\downarrow} = \{a \in A \mid (a, e) \in I \text{ for all } e \in F\} \tag{4.36}$$

This expression means that the mapping identifies all the actors that a certain event or set of events attracts. The sub-matrix A in Table 4_5 shows that event 2 attracts actors 1, 3, and 5, whereas events 2 and 3 jointly attract only actor 5. Combining both mappings, the Galois lattice shows how subsets of actors are affiliated with subsets of events. By convention, the universal upper bound of the lattice contains all the elements in A, and its universal lower bound contains all the elements in E.

Figure 4_13 displays the Galois lattice pictorially, using the affiliation matrix in Table 4_5 as an example. The graph labels the three events A, B, and C and the five actors 1, 2, 3, 4, and 5. Each point in the graph is labeled with both the actors and the events that define it. At the bottom, the lattice contains the largest collection of events. Moving up, it contains larger collections of actors and smaller collections of events. The starting point at the bottom has all events (ABC) but an empty set of actors (∅) because no actor attends all three events. Moving up, the lattice shows that events A and B share actor 1, and events B and C share actor 5. Moving further up, the subsets of events get smaller and the subsets of actors become larger: event A attracts actors 1 and 2, event B attracts actors 1, 3, and 5, and event C attracts actors 4 and 5. The top of the lattice lists all actors but no event, indicating that no event attracts all actors in the network.

In general, actors that are incident on a line descending from events are affiliated with those events. In Figure 4_13, actor 1 is incident on lines from events A and B, indicating that actor 1 is affiliated with both events. Conversely, events that are incident on the lines ascending from actors attract those actors.
For example, Figure 4_13 shows that event C is incident on the line ascending from actor 5, indicating that event C attracts actor 5. Event B receives lines from both actors 1 and 5, indicating that both actors attend event B.

The Galois lattice also shows affiliation patterns between events and between actors. For example, Figure 4_13 illustrates that actor 2 does not participate in any event without actor 1: actor 2 participates only in event A, with which actor 1 is also affiliated. Likewise, actor 4 participates only in event C, which also draws actor 5. In contrast, neither actor 1 nor actor 5 restricts itself to the events that draw actor 2 or actor 4, respectively. In other words, the participation of actors 1 and 5 in an event is not contingent upon the participation of other actors. From the events' perspective, Figure 4_13 also shows that the three events contain distinct sets of actors, in the sense that the three sets overlap but are not identical. A more elaborate network with more actors and events may reveal containment structures, in which the actors who are present at one event are always present at certain other events (Freeman and White 1993: 135). Thus, we can observe all three types of relations from the Galois lattice: (1) the actor-event relation, (2) the actor-actor relation, and (3) the event-event relation.

Despite the clear advantage of using a Galois lattice to examine the structural features of all three types of relations simultaneously, its usefulness for representing large datasets is limited. In this respect, the Galois lattice is similar to graphs, whose principal use is to represent data, not to reduce it. Large datasets commonly embrace highly complex structures that overwhelm the Galois lattice representation.
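To make the points of the example lattice concrete, the sketch below enumerates every (actor set, event set) pair that is closed under the two mappings. It is a brute-force illustration that tries every subset of events, so it is practical only for small examples, and it is not presented as the procedure used by Freeman and White. The memberships are assumed from the example, and all names are illustrative.

```python
from itertools import combinations

ACTORS = [1, 2, 3, 4, 5]
EVENTS = ["A", "B", "C"]
# Event -> actors it attracts, assumed from the example.
MEMBERS = {"A": {1, 2}, "B": {1, 3, 5}, "C": {4, 5}}

def actors_of(events):
    """Actors affiliated with every event in the subset."""
    result = set(ACTORS)
    for e in events:
        result &= MEMBERS[e]
    return result

def events_of(actors):
    """Events attended by every actor in the subset."""
    return {e for e in EVENTS if actors <= MEMBERS[e]}

def lattice_points():
    """Collect the (actor set, event set) pairs closed under both mappings."""
    points = set()
    for size in range(len(EVENTS) + 1):
        for subset in combinations(EVENTS, size):
            actors = actors_of(subset)
            events = events_of(actors)  # close the event set
            points.add((frozenset(actors), frozenset(events)))
    return points

points = lattice_points()
# List points from the bottom (fewest actors) to the top (all actors).
for actors, events in sorted(points, key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(actors), "".join(sorted(events)) or "(no events)")
```

The enumeration yields the bottom point (no actors, events ABC), the top point (all actors, no events), and the five intermediate points described in the text. The number of candidate subsets doubles with every added event, which is one way to see why large datasets overwhelm the representation.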
Even with the reduced labeling suggested by Freeman and White (1993), observers may have great difficulty disentangling the complex images that lattices generate when representing large datasets.

Galois Lattice of Network Cliques

Researchers suggest that statistical or algebraic data reduction techniques can be used to simplify the visual representation of Galois lattices (White 1996; Duquenne 1996). Freeman (1996) cogently recommends that Galois lattices be used to facilitate the representation of cliques among social actors. The classic Luce-Perry definition of a clique (Luce and Perry 1949) presumes a binary symmetric square matrix ($A \times A$) on a social relation R. A clique C is a maximal subset containing three or more actors among whom all pairs are linked by R. The term “maximal” means that no clique can be contained in a larger clique. In practice, however, cliques are often too small, too numerous, or too overlapping to reflect intuitive social group structures (Freeman 1996: 174).

The application of Galois lattices to cliques involves replacing the collection of events with a collection of cliques. In a previous section we stated that a Galois lattice is defined by a triple (A, E, I), in which A is a set of actors, E is a set of events, and I is a binary relation in $A \times E$. When applied to cliques, the Galois lattice is redefined by another triple (A, C, I), in which A and I retain their original meanings and C is a set of cliques. Following the same layout as the Galois lattice of events, the Galois lattice of cliques places the larger collections of cliques toward the bottom and the larger collections of actors toward the top. Assuming that Figure 4_13 shows a Galois lattice of cliques instead of a lattice of events, at the bottom lie the three cliques ABC with the null set, indicating that no actor belongs to all three cliques. Moving up, the graph shows fewer cliques and larger sets of actors.
Actor 1 belongs to both cliques A and B, whereas actor 5 belongs to both cliques B and C. Moving further up, clique A contains actors 1 and 2, clique B includes actors 1, 3, and 5, and clique C has actors 4 and 5. The entire set of actors lies at the top of the graph, with the null set indicating that no clique encompasses all five actors.

Freeman (1996) discussed several important structural properties of the Galois lattice of network cliques. First, two overlapping cliques are linked by descending lines that converge at some labeled point lower in the lattice, whereas two non-overlapping cliques are linked only at the unlabeled universal lower bound with the null set (∅). Assuming that Figure 4_13 describes a Galois lattice of network cliques, cliques A and B converge at a lower point labeled with actor 1, indicating that the two cliques overlap by sharing actor 1. In contrast, cliques A and C do not overlap, as indicated by their converging only at the unlabeled universal lower bound.

Second, Freeman (1996) characterized the position of individual actors in the clique lattice along several key dimensions: chain, length, height, and depth. A chain is a sequence made up entirely of ascending lines or entirely of descending lines leading from one element to another. The length of a chain is the number of lines it contains. The height of an actor is the length of the chain ascending from the universal lower bound to that actor. In contrast, the depth of an actor is the length of the chain descending from the universal upper bound to that actor. Therefore, actors who appear near the bottom of the lattice have great depth but low height, whereas actors near the top of the lattice have great height and low depth. Those with great depth but low height are deeply embedded in the network. They are involved in several cliques, but their affiliations with those cliques do not depend on others' affiliations.
In this sense, they are the core members of the cliques. In contrast, those with great height but low depth are involved with the network only superficially. Their affiliations with certain cliques depend on others' affiliations. In this sense, they are the peripheral members of the cliques. Examining Figure 4_13 with those insights, we can ascertain that actors 1 and 5 are core members of cliques AB and BC, respectively, whereas actors 2, 3, and 4 are peripheral actors in those cliques.

Correspondence Analysis

Whereas lattices represent an algebraic approach to displaying affiliation networks, correspondence analysis uses a scaling technique to achieve the “joint display” of actors and events in an affiliation network (Wasserman and Faust 1994: 291-343; Faust 2005). Correspondence analysis is accomplished mainly through a mathematical technique called Singular Value Decomposition (SVD). Here we provide only a brief sketch of SVD, focusing on the issues that are directly relevant to affiliation matrices (for details on SVD, see Strang 1996). SVD is a decomposition of a matrix A of size $g \times h$:

$$A = X \Lambda Y^T \tag{4.37}$$

The equation contains $\Lambda$, a diagonal matrix of singular values $\{\lambda_k\}$; X, the matrix of left singular vectors, of size $g \times h$; and Y, the matrix of right singular vectors, of size $h \times h$. The number of nonzero singular values equals the rank of the matrix, commonly denoted W. The SVD uses the rank W to scale the actors and events in a graphic display that approximates their entries in A.

The SVD in correspondence analysis involves decomposing a normalized version of A. There are two ways to produce the entries of the normalized A. One is to divide the entries of the original matrix A by the square root of the product of the row and column marginal totals. The other is to compute the product of three matrices: two diagonal matrices, $R^{-1/2}$ and $C^{-1/2}$, and the matrix A (Faust 2005).
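The equivalence of the two normalization routes, and the shapes of the resulting decomposition, can be checked numerically. The sketch below assumes the five-actor, three-event affiliation matrix of the running example and uses NumPy's `numpy.linalg.svd`; variable names such as `N1`, `N2`, `R_inv_sqrt`, and `C_inv_sqrt` are illustrative.

```python
import numpy as np

# Affiliation matrix assumed from the example: rows are actors 1-5, columns are events 1-3.
A = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
], dtype=float)

row_totals = A.sum(axis=1)  # number of events per actor
col_totals = A.sum(axis=0)  # number of actors per event

# Route 1: divide each entry by sqrt(row total * column total).
N1 = A / np.sqrt(np.outer(row_totals, col_totals))

# Route 2: pre- and post-multiply A by diagonal matrices of inverse square roots.
R_inv_sqrt = np.diag(1.0 / np.sqrt(row_totals))
C_inv_sqrt = np.diag(1.0 / np.sqrt(col_totals))
N2 = R_inv_sqrt @ A @ C_inv_sqrt

print(np.allclose(N1, N2))  # both routes give the same normalized matrix

# Thin SVD: X is g x h, singular_values has length h, Yt (Y transposed) is h x h.
X, singular_values, Yt = np.linalg.svd(N1, full_matrices=False)
# The leading singular value of a matrix normalized this way is the trivial value 1;
# the remaining values correspond to the substantive dimensions.
print(np.round(singular_values, 3))
```

Requesting the thin decomposition with `full_matrices=False` keeps the left singular vectors at size g by h, matching the dimensions stated for X above.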
In particular,

$$R^{-1/2} = \mathrm{diag}\left(\frac{1}{\sqrt{a_{i+}}}\right) \tag{4.38}$$

$$C^{-1/2} = \mathrm{diag}\left(\frac{1}{\sqrt{a_{+j}}}\right) \tag{4.39}$$

where $a_{i+}$ is the marginal total of row i and $a_{+j}$ is the marginal total of column j. Multiplying the three matrices, $R^{-1/2} A C^{-1/2}$, produces the normalized version of A, which can also be obtained by dividing the entries of the original matrix A by the square root of the product of the row and column marginal totals. Correspondence analysis involves the singular value decomposition of the resulting matrix:

$$R^{-1/2} A C^{-1/2} = X \Lambda Y^T \tag{4.40}$$

Correspondence analysis produces three sets of information: a set of g scores for the rows of the matrix, $U = \{u_{ik}\}$, for i = 1, 2, …, g and k = 1, 2, …, w; a set of h scores for the columns of the matrix, $V = \{v_{jk}\}$, for j = 1, 2, …, h and k = 1, 2, …, w; and the singular values $\Lambda = \{\lambda_k\}$, for k = 1, 2, …, w, which indicate the importance of each dimension. To achieve a joint display of row and column entries, correspondence analysis computes the score for an actor as a function of the weighted average of the scores of the events with which it is affiliated, and the score for an event as a function of the weighted average of the scores of its constituent actors. The formulas are as follows:

$$\lambda_k u_{ik} = \sum_{j=1}^{h} \frac{a_{ij}}{a_{i+}} v_{jk} \tag{4.41}$$

and

$$\lambda_k v_{jk} = \sum_{i=1}^{g} \frac{a_{ij}}{a_{+j}} u_{ik} \tag{4.42}$$

In both equations, $a_{ij}$ is the entry in the ith row and the jth column of the original matrix A.

To illustrate, we use our previous example of the affiliation network data consisting of five actors and three events. Table 4_6 shows the matrix A in both its original and normalized versions. Table 4_7 shows the UCINET solution to the Singular Value Decomposition of the normalized A, which produces three sets of scores: $u_{ik}$, $v_{jk}$, and $\lambda_k$. Note that the actors' scores are a function of the scores of the events with which the actors are affiliated. Likewise, the events' scores are a function of the scores of the actors they attract. In particular, actor 1's score (-0.661) in the first dimension is the weighted average of the scores of the two events with which it is affiliated (-1.146 for event 1 and 0 for event 2), divided by the singular value ($\lambda_1$ = 0.866):

$$\frac{\frac{1}{2}(-1.146) + \frac{1}{2}(0)}{0.866} = -0.661$$

Likewise, the score for event 1 in dimension 1 (-1.146) is the weighted average of the scores of the actors it attracts (actors 1 and 2), divided by the singular value:

$$\frac{\frac{1}{2}(-0.661) + \frac{1}{2}(-1.323)}{0.866} = -1.146$$

Interested readers can verify that the scores of the other actors and events match the values computed from the scores of the events with which each actor is affiliated, or from the scores of the actors each event attracts.

Figure 4_14 shows the graph of the correspondence analysis of the affiliation network. Numbers in parentheses are the scores in both dimensions, which should be the same as the corresponding numbers in Table 4_7. The X-axis and Y-axis denote the first and the second dimension (λ), respectively. From the graphic display, we determine that actor 1, who is affiliated with events 1 and 2, and actor 5, who is affiliated with events 2 and 3, are central in both dimensions and are thus located in the center of the graph. Event 2 appears at the center of the first dimension, possibly because it attracts the most actors (actors 1, 3, and 5), but it is slightly further from the center of the second dimension than events 1 and 3, each of which attracts only two actors. Actor 3 is located at the center of the first dimension because of its affiliation with the central event (event 2), but compared with all other actors, actor 3 also occupies the most peripheral position on the second dimension, possibly because it attends only one event. Actors 2 and 4 are in peripheral locations in both dimensions, reflecting their affiliations with only one event each. Likewise, events 1 and 3 are peripheral in both dimensions, suggesting that they have attracted fewer party-goers than event 2.

References

Alba, Richard D. 1973. “A Graph-Theoretic Definition of a Sociometric Clique” Journal of Mathematical Sociology 3: 113-126
Aldenderfer, Mark S. and Roger K. Blashfield. 1984. Cluster Analysis.
Beverly Hills: Sage Publications
Alderson, Arthur and Jason Beckfield. 2004. “Power and Position in the World City System” American Journal of Sociology 109/4: 811-851
Anderson, Carolyn, Stanley Wasserman and Bradley Crouch. 1999. “A p* Primer: Logit Models for Social Networks” Social Networks 21: 37-66
Anderson, James G. and Stephen Jay. 1985. “Computers and Clinical Judgement: The Role of Physician Networks” Social Science and Medicine 20/10: 969-979
Blau, Peter M., Danching Ruan and Monika Ardelt. 1991. “Interpersonal Choice and Networks in China” Social Forces 69/4: 1037-1062
Boorman, Scott and Harrison White. 1976. “Social Structure from Multiple Networks. II. Role Structures” American Journal of Sociology 81/6: 1384-1446
Borgatti, Stephen and Martin Everett. 1992. “Notions of Position in Social Network Analysis” Sociological Methodology 22: 1-35
Borgatti, Stephen and Martin Everett. 1993. “Two Algorithms for Computing Regular Equivalence” Social Networks 15/4: 361-376
Borgatti, Stephen, Martin Everett and Paul Shirey. 1990. “LS Sets, Lambda Sets and Other Cohesive Subsets” Social Networks 12: 337-357
Borgatti, Stephen and Martin Everett. 1989. “The Class of All Regular Equivalences: Algebraic Structure and Computation” Social Networks 11/1: 65-88
Breiger, Ronald L. 1990. “Social Control and Social Networks: A Model from Georg Simmel” Pp. 453-476 in Craig Calhoun, Marshall W. Meyer, and W. Richard Scott (eds.), Structures of Power and Constraint: Papers in Honor of Peter M. Blau. New York: Cambridge University Press
Burt, Ronald S. 1992. Structural Holes: The Social Structure of Competition. Cambridge, Mass.: Harvard University Press
Burt, Ronald S. 1979. “Disaggregating the Effect on Profits in Manufacturing Industries of Having Imperfectly Competitive Consumers and Suppliers” Social Science Research 8/2: 120-143
Burt, Ronald S. 1978. “Cohesion versus Structural Equivalence as a Basis for Network Subgroups” Sociological Methods and Research 7/2: 189-212
Crouch, Bradley and Stanley Wasserman. 1998. “A Practical Guide to Fitting Social Network Models via Logistic Regression” Connections 21: 87-101
Doreian, Patrick, Vladimir Batagelj and Anuska Ferligoj. 2005. Generalized Blockmodeling. Cambridge, U.K.: Cambridge University Press
Doreian, Patrick and Katherine Woodard. 1994. “Defining and Locating Cores and Boundaries of Social Networks” Social Networks 16/4: 267-293
Dunbar, R.I.M. and M. Spoor. 1995. “Social Networks, Support Cliques and Kinship” Human Nature 6: 273-290
Duquenne, Vincent. 1996. “On Lattice Approximations: Syntactic Aspects” Social Networks 18/3: 189-199
Everett, Martin. 1985. “Role Similarity and Complexity in Social Networks” Social Networks 7/4: 353-359
Everett, Martin, John Boyd and Stephen Borgatti. 1990. “Ego-Centered and Local Roles: A Graph Theoretic Approach” The Journal of Mathematical Sociology 15: 163-172
Faust, Katherine. 2005. “Using Correspondence Analysis for Joint Displays of Affiliation Networks” in Models and Methods in Social Network Analysis, edited by Peter J. Carrington, John Scott and Stanley Wasserman. New York: Cambridge University Press
Faust, Katherine. 1997. “Centrality in Affiliation Networks” Social Networks 19/2: 157-191
Faust, Katherine. 1988. “Comparison of Methods for Positional Analysis: Structural and General Equivalences” Social Networks 10/4: 313-341
Feldman-Savelsberg, Pamela, Flavien Ndonko and Song Yang. 2005. “Remembering ‘the troubles:’ Reproductive Insecurity and the Management of Memory in Cameroon” Africa 75/1: 10-29
Frank, Ove and David Strauss. 1986. “Markov Graphs” Journal of the American Statistical Association 81: 832-842
Freeman, Linton. 2005. “Graphic Techniques for Exploring Social Network Data” Pp. 248-270 in Models and Methods in Social Network Analysis, edited by Peter J. Carrington, John Scott and Stanley Wasserman. Cambridge MA: Cambridge University Press
Freeman, Linton. 2000. “Visualizing Social Networks” Journal of Social Structure 1: 1-15
Freeman, Linton C. 1992. “The Resurrection of Cliques: Application of Galois Lattices” BMS, Bulletin de Methodologie Sociologique 37: 3-24
Freeman, Linton. 1979. “Centrality in Social Networks: I. Conceptual Clarification” Social Networks 1: 215-239
Freeman, Linton. 1977. “A Set of Measures of Centrality Based Upon Betweenness” Sociometry 40: 35-41
Freeman, Linton and Cynthia Webster. 1994. “Interpersonal Proximity in Social and Cognitive Space” Social Cognition 12/3: 223-247
Freeman, Linton C. and Douglas White. 1993. “Using Galois Lattices to Represent Network Data” Sociological Methodology 23: 127-146
Freeman, Linton, Stephen Borgatti and Douglas White. 1991. “Centrality in Valued Graphs: A Measure of Betweenness Based on Network Flow” Social Networks 13: 141-154
Freeman, Linton C., Kimball Romney and Sue Freeman. 1987. “Cognitive Structure and Informant Accuracy” American Anthropologist 89/2: 310-325
Holland, Paul and Samuel Leinhardt. 1981. “An Exponential Family of Probability Distributions for Directed Graphs” Journal of the American Statistical Association 76: 33-65
Knoke, David, George W. Bohrnstedt and Alisa Potter Mee. 2002. Statistics for Social Data Analysis. 4th Edition. Wadsworth Publishing
Knoke, David and Ronald Burt. 1983. “Prominence” Pp. 195-222 in Applied Network Analysis: A Methodological Introduction, edited by Ronald Burt and Michael J. Minor. Beverly Hills CA: Sage
Knoke, David and David Rogers. 1979. “A Blockmodel Analysis of Interorganizational Networks” Sociology and Social Research 64/1: 28-52
Kruskal, Joseph B. and Myron Wish. 1978. Multidimensional Scaling. Beverly Hills, Calif.: Sage Publications
Luce, Duncan and Albert D. Perry. 1949. “A Method of Matrix Analysis of Group Structure” Psychometrika 14: 95-116
Marsden, Peter. 2002. “Egocentric and Sociocentric Measures of Network Centrality” Social Networks 24: 407-422
McPherson, Miller. 1982. “Hypernetwork Sampling: Duality and Differentiation among Voluntary Organizations” Social Networks 3/9: 225-249
Mokken, Robert J. 1979. “Cliques, Clubs and Clans” Quality and Quantity 13: 161-173
Moreno, Jacob L. 1953 (Revised Edition). Who Shall Survive? Foundations of Sociometry, Group Psychotherapy, and Sociodrama. Beacon, NY: Beacon House
Nowicki, Krzysztof and Tom Snijders. 2001. “Estimation and Prediction for Stochastic Blockstructures” Journal of the American Statistical Association 96/455: 1077-1088
Pampel, Fred C. 2000. Logistic Regression: A Primer. Thousand Oaks, Calif.: Sage Publications
Pattison, Philippa and Stanley Wasserman. 1999. “Logit Models and Logistic Regressions for Social Networks: II. Multivariate Relations” British Journal of Mathematical and Statistical Psychology 52/2: 169-193
Peay, Edmund R. 1980. “Connectedness in a General Model for Valued Networks” Social Networks 2: 385-410
Robins, Garry, Philippa Pattison and Stanley Wasserman. 1999. “Logit Models and Logistic Regressions for Social Networks: III. Valued Relations” Psychometrika 64/3: 371-394
Sabidussi, Gert. 1966. “The Centrality Index of a Graph” Psychometrika 31: 581-603
Seidman, Stephen. 1983. “Network Structure and Minimum Degree” Social Networks 5: 269-284
Strang, Gilbert. 1988. Linear Algebra and its Applications. 3rd ed. San Diego: Harcourt, Brace, Jovanovich
Strauss, David and Michael Ikeda. 1990. “Pseudolikelihood Estimation for Social Networks” Journal of the American Statistical Association 85/409: 204-212
Wasserman, Stanley and Philippa Pattison. 1996. “Logit Models and Logistic Regressions for Social Networks: I. An Introduction to Markov Graphs and p*” Psychometrika 61/3: 401-425
Wasserman, Stanley and Katherine Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge; New York: Cambridge University Press
White, Douglas and Karl Reitz. 1983. “Graph and Semigroup Homomorphisms on Networks of Relations” Social Networks 5/2: 193-234
White, Douglas. 1996. “Statistical Entailments and the Galois Lattice” Social Networks 18/3: 201-215
White, Harrison, Scott Boorman and Ronald Breiger. 1976. “Social Structure from Multiple Networks. I. Blockmodels of Roles and Positions” American Journal of Sociology 81/4: 730-780
Winship, Christopher and Michael Mandel. 1983. “Roles and Positions: A Critique and Extension of the Blockmodeling Approach” Sociological Methodology 14: 314-344
Wu, Lawrence L. 1983. “Local Blockmodel Algebras for Analyzing Social Networks” Sociological Methodology 14: 272-313