Persons Through Groups
2-mode networks

Overview
Breiger: Duality of Persons and Groups
• Argument
• Method
• Sociology Examples
• Moody: Coauthorship
• Methods:
• Finish ego-networks
• Working with 2-mode data
• Constructing a PTG network
• Constructing a GTP network
• (Bipartite graphs)

Breiger (1974): Duality of Persons and Groups
Argument:
Metaphor: people intersect through their associations, which defines (in part) their individuality.
Duality implies that relations among groups imply relations among individuals.

An Example:
[Figure: the interpersonal network among persons A-F and the intergroup network among groups 1-5, shown with their adjacency matrices (equations 4.3 and 4.4)]
Problem: These two representations, though clearly related, are not easily compared.

An Example:
To compare them, construct a person-to-group adjacency matrix:

A =       1   2   3   4   5
     A    0   0   0   0   1
     B    1   0   0   0   0
     C    1   1   0   0   0
     D    0   1   1   1   1
     E    0   0   1   0   0
     F    0   0   1   1   0

Each column is a group, each row a person, and the cell = 1 if the person in that row belongs to the group in that column.
You can tell how many groups two people both belong to by comparing their rows: identify every place where both rows = 1, sum them, and you have the overlap.

An Example: Compare persons A and F

Group     A   F   AF
1         0   0   0
2         0   0   0
3         0   1   0
4         0   1   0
5         1   0   0
Sum       1   2   0

Or persons D and F

Group     D   F   DF
1         0   0   0
2         1   0   0
3         1   1   1
4         1   1   1
5         1   0   0
Sum       4   2   2

Person A is in 1 group, Person F is in 2 groups, and they are in no groups together.
Person D is in 4 groups, Person F is in 2 groups, and they are in 2 groups together.
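The row comparison can be sketched in a few lines of Python. This is only an illustration, assuming the example memberships are stored one row per person (A through F) with one column per group (1 through 5); the names MEMBERSHIP and overlap are made up for this sketch.

```python
# Hypothetical container for the example: rows = persons, columns = groups 1..5.
MEMBERSHIP = {
    "A": [0, 0, 0, 0, 1],
    "B": [1, 0, 0, 0, 0],
    "C": [1, 1, 0, 0, 0],
    "D": [0, 1, 1, 1, 1],
    "E": [0, 0, 1, 0, 0],
    "F": [0, 0, 1, 1, 0],
}


def overlap(person_i, person_j, membership=MEMBERSHIP):
    """Count shared groups: multiply the two rows cell by cell, then sum."""
    row_i = membership[person_i]
    row_j = membership[person_j]
    return sum(a * b for a, b in zip(row_i, row_j))


print("A and F:", overlap("A", "F"))  # no shared groups
print("D and F:", overlap("D", "F"))  # groups 3 and 4
print("D and D:", overlap("D", "D"))  # a person's own row gives their group count
```

Comparing a person with themselves returns the number of groups they belong to, which is the diagonal of the person-to-person matrix.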
An Example: Similarly for groups:

Person    1   2   12
A         0   0   0
B         1   0   0
C         1   1   1
D         0   1   0
E         0   0   0
F         0   0   0
Sum       2   2   1

• Group 1 has 2 members, group 2 has 2 members, and they overlap by 1 member (C).

In general, you can get the overlap for any pair of persons (or groups) by summing the multiplied elements of the corresponding rows (or columns) of the persons-to-groups adjacency matrix. That is:

Persons-to-Persons:  $P_{ij} = \sum_{k=1}^{g} A_{ik} A_{jk}$
Groups-to-Groups:    $G_{ij} = \sum_{k=1}^{p} A_{ki} A_{kj}$

where g is the number of groups and p is the number of persons.

One can get these easily with a little matrix multiplication. First define A^T as the transpose of A (simply interchange the rows and columns). If A is of size P x G, then A^T will be of size G x P.

$A^{T}_{ij} = A_{ji}$

A (6x5) =     1   2   3   4   5
         A    0   0   0   0   1
         B    1   0   0   0   0
         C    1   1   0   0   0
         D    0   1   1   1   1
         E    0   0   1   0   0
         F    0   0   1   1   0

A^T (5x6) =   A   B   C   D   E   F
         1    0   1   1   0   0   0
         2    0   0   1   1   0   0
         3    0   0   0   1   1   1
         4    0   0   0   1   0   1
         5    1   0   0   1   0   0

P = A(A^T)        G = A^T(A)

P (6x6) =     A   B   C   D   E   F
         A    1   0   0   1   0   0
         B    0   1   1   0   0   0
         C    0   1   2   1   0   0
         D    1   0   1   4   1   2
         E    0   0   0   1   1   1
         F    0   0   0   2   1   2

See: Breiger_ex.sas for an IML example.
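As a rough check on the two products, here is a minimal pure-Python sketch. It assumes A is entered as persons by groups (6 x 5), in line with "each row a person, each column a group"; the helper names transpose and matmul are illustrative and are not taken from Breiger_ex.sas.

```python
# Persons-by-groups membership matrix (rows A..F, columns groups 1..5).
A = [
    [0, 0, 0, 0, 1],  # A
    [1, 0, 0, 0, 0],  # B
    [1, 1, 0, 0, 0],  # C
    [0, 1, 1, 1, 1],  # D
    [0, 0, 1, 0, 0],  # E
    [0, 0, 1, 1, 0],  # F
]


def transpose(m):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product: each cell is a row of x times a column of y."""
    y_cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


A_T = transpose(A)
P = matmul(A, A_T)  # person-to-person overlap, 6 x 6
G = matmul(A_T, A)  # group-to-group overlap, 5 x 5

for name, matrix in (("P", P), ("G", G)):
    print(name)
    for row in matrix:
        print(" ", row)
```

The diagonal of P gives each person's number of groups and the diagonal of G gives each group's size. For real data a matrix library would be the usual choice; plain lists are used here only to keep the sketch self-contained.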
G (5x5) =     1   2   3   4   5
         1    2   1   0   0   0
         2    1   2   1   1   1
         3    0   1   3   2   1
         4    0   1   2   2   1
         5    0   1   1   1   2

Dimensions: A (6x5) * A^T (5x6) = P (6x6), and A^T (5x6) * A (6x5) = G (5x5).

Theoretically, these two equations define what Breiger means by duality:
“With respect to the membership network,…, persons who are actors in one picture (the P matrix) are with equal legitimacy viewed as connections in the dual picture (the G matrix), and conversely for groups.” (p.87)

The resulting network:
1) Is always symmetric
2) The diagonal tells you how many groups (persons) a person (group) belongs to (has)

In practice, most network software (UCINET, PAJEK) will do all of these operations. It is also simple to do the matrix multiplication in programs like SAS or SPSS.

Persons Through Groups: DuPRI Example

A =
Name                         Health  Fam  Devlp  Ineq
Alessandro Tarozzi             0      0     1     1
Alexander Pfaff-Talikoff       1      0     0     0
Amar Hamoudi                   1      1     0     0
Anatoli Yashin                 1      0     0     1
Angela M ORand                 1      0     0     0
Anna Gassman-Pines             0      1     1     1
Asia Maselko                   1      0     0     0
Avshalom Caspi                 1      0     1     0
Charlie Cloffelter             0      0     0     1
Christina M. Gibson-Davis      0      1     1     1
Duncan Thomas                  1      1     1     1
Elizabeth Frankenberg          1      1     0     1
Elizabeth Oltmans Ananat       0      1     0     1
Frank A. Sloan                 1      0     0     0
Jacob L. Vigdor                0      0     1     1
James Moody                    0      0     1     1
James S Clark                  1      0     0     0
James W. Vaupel                1      0     0     0
Jennan Read                    1      0     0     1
Jerry Reiter                   1      0     0     0
Kim Blankenship                1      0     0     0
Kathleen Sikkema               1      0     1     0
Keith E Whitfield              1      0     0     0
Kenneth A Dodge                1      1     1     1
Kenneth C Land                 1      0     1     1
Linda K George                 1      0     0     0
Linda M Burton                 1      1     1     1
Lisa A Keister                 0      0     0     1
M. Giovanna Merli              1      0     0     0
Manoj Mohanan                  1      0     0     0
Marie Lynn Miranda             1      0     1     0
Marjorie B McElroy             0      1     0     0
P. J. Eric Stallard            1      0     0     0
Patrick Bayer                  0      0     0     1
Peter Arcidiacono              0      1     0     1
Phil Morgan                    0      1     0     0
Philip J. Cook                 0      0     0     1
Philip R Costanzo              1      0     1     0
Rachel Kranton                 0      0     0     1
Sabrendu Pattanayak            1      0     0     0
Seth Gary Sanders              1      1     1     1
Sherman James                  1      0     0     1
Terrie E Moffitt               0      0     1     0
V. Joseph Hotz                 0      1     0     1
William "Sandy" Darity         0      0     0     1
Zeng Yi                        1      1     0     0

G = (A^T)A
          Health  Fam  HDev  Ineqy
Health      29     7     9     9
Fam          7    14     6    10
HDev         9     6    15    10
Ineqy        9    10    10    23

[Figure: Area Overlap Among DuPRI Faculty, P = A(A^T); areas shown: Health, Family, Human Dev, Inequality]

Persons Through Groups: Sociology Example
Or consider ties formed by sharing membership on a student committee (MA, exams, etc.).
[Figure: all committee memberships; line thickness proportional to number of joint appearances]
[Figure: Duke English Department, all committee memberships; line thickness proportional to number of joint appearances]

Persons Through Groups: Sociology Coauthorship
Sociology Coauthorship Networks
[Figure: sociology coauthorship network, (2-mode) and (1-mode projection)]
LSL reaches 533 people in 3 steps.
[Figure: 3 degrees of Lynn Smith-Lovin (LSL)]
The likelihood of coauthorship varies by type of work.
[Figure: Largest bicomponent, n = 29,462]

Persons Through Groups: Director Interlocks
Val Burris: Interlocks & Political Cohesion
[Figure: Effect size of indirect ties, by dependent variable (Party Contribution, Presidential Match, Presidential Correlation), for direct through 6-step ties]

Persons Through Groups: Ecology Co-authorship

Persons Through Groups: Physician Networks
Construct networks of physicians who share patients. Note that we sampled patients from 5 states; shown here are the resulting physicians from all the PA patients.

Table 1. Network Sample Construction

Year     Patient Visits   Unique Patients   Unique Physicians
2008     12,263,448       922,189           138,375
2009     12,977,008       924,387           134,863
2010     12,167,013       871,993           135,212
Total    37,407,477       963,899           190,785

Persons Through Groups: Constructing large 2-mode nets
• The direct matrix multiplication approach is (highly) inefficient for large 2-mode networks.
  - We couldn't even hold the physician indicator matrix in memory.
The solution is to construct the bipartite list, then construct the edges as a summary over that list.
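The list-based idea can be sketched in Python as follows. This is an illustrative stand-in for the deck's SAS workflow, not a reproduction of it: the (rid, auid) pairs are the ten example observations from this section, and project_edges is a hypothetical helper name. Each group's members are paired up and the pairs are counted, so the full indicator matrix is never built.

```python
from collections import Counter
from itertools import combinations

# Bipartite list: one (paper id, author id) row per membership.
BIPARTITE = [
    (1, 60242),
    (2, 1961), (2, 16006), (2, 47741),
    (3, 50009), (3, 51417),
    (4, 30417), (4, 49612),
    (5, 8396), (5, 11500),
]


def project_edges(pairs):
    """Return {(snd, rcv): weight}, where weight = number of shared groups."""
    members = {}
    for group, person in pairs:
        members.setdefault(group, set()).add(person)

    weights = Counter()
    for people in members.values():
        # Sorting stores each dyad once, smaller id first (snd < rcv).
        for snd, rcv in combinations(sorted(people), 2):
            weights[(snd, rcv)] += 1
    return dict(weights)


edges = project_edges(BIPARTITE)
for (snd, rcv), val in sorted(edges.items()):
    print(snd, rcv, val)
```

Memory use scales with the number of memberships and realized dyads rather than with persons times groups, which is the point of working from the list.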
For example, a bipartite list of papers (rid) and authors (auid):

Obs   rid   auid
1     1     60242
2     2     1961
3     2     16006
4     2     47741
5     3     50009
6     3     51417
7     4     30417
8     4     49612
9     5     8396
10    5     11500

In SAS, I then transpose the data by the mode I want to link by. So here, if I want an author-to-author network, I transpose by papers (rid).

Then write a loop to construct the edge parts:

    data edges;
      set auplev;                     /* one row per paper; col1-col82 hold author ids */
      array aus(82) col1-col82;
      do i = 1 to 81;
        if aus(i) = . then leave;     /* no more authors on this paper */
        do j = i + 1 to 82;
          if aus(j) = . then leave;   /* no more coauthors to pair with */
          snd = min(aus(i), aus(j));  /* store each dyad one way: smaller id first */
          rcv = max(aus(i), aus(j));
          val = 1;
          output;
        end;
      end;
      keep snd rcv rid val;
    run;

This produces all the edge parts; then sum by dyad to get the valued network.

    proc means data=edges noprint;
      class snd rcv;
      var val;
      output out=edgesum (where=(_type_=3)) sum=;
    run;

In PAJEK, you can define a bipartite input graph as:

    *Vertices 8 3
    1 "Actor 1"
    2 "Actor 2"
    3 "Actor 3"
    4 "Event 1"
    5 "Event 2"
    6 "Event 3"
    7 "Event 4"
    8 "Event 5"
    *Edges
    1 4
    1 5
    2 4
    2 5
    2 6
    2 8
    3 4
    3 7
    3 8

So the first line has two numbers: the total number of nodes (8) and the number in the first (“row”) mode (3). The edges then all run from mode 1 to mode 2.

Persons Through Groups: Bipartite “Two-Mode” graphs
It is possible to construct a network that links people and their groups directly in a single network. In this case, the nodes are of 2 types: persons and groups.
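To make the single-network idea concrete, the sketch below turns the PAJEK example (3 actors, 5 events) into one 8 x 8 adjacency matrix. It assumes the edge pairs read as 1-4, 1-5, 2-4, 2-5, 2-6, 2-8, 3-4, 3-7, 3-8; the function names are hypothetical and nothing here is part of PAJEK itself.

```python
N_VERTICES = 8  # total nodes
N_MODE1 = 3     # nodes in the first ("row") mode: the actors

# Actor-to-event ties, using the 1-based vertex numbers from the listing.
EDGES = [(1, 4), (1, 5), (2, 4), (2, 5), (2, 6), (2, 8), (3, 4), (3, 7), (3, 8)]


def bipartite_adjacency(n_vertices, edges):
    """Symmetric 0/1 adjacency matrix over both modes together."""
    adj = [[0] * n_vertices for _ in range(n_vertices)]
    for u, v in edges:
        adj[u - 1][v - 1] = 1
        adj[v - 1][u - 1] = 1
    return adj


def within_mode_blocks_empty(adj, n_mode1):
    """True if no tie connects two nodes of the same mode."""
    size = len(adj)
    for i in range(size):
        for j in range(size):
            same_mode = (i < n_mode1) == (j < n_mode1)
            if same_mode and adj[i][j]:
                return False
    return True


adjacency = bipartite_adjacency(N_VERTICES, EDGES)
for row in adjacency:
    print(" ".join(str(cell) for cell in row))
print("within-mode blocks all zero:", within_mode_blocks_empty(adjacency, N_MODE1))
```

Ordering the vertices so that one mode comes first is what puts all the ties in the two off-diagonal blocks.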
Consider the classic example of the Southern Women’s data:
[Figure]

The classic treatment of this network would create a person-to-person or a group-to-group network:
[Figure]

Instead, you could analyze the network as a joint network, with two types of nodes:
[Figure]

                 1  2  3  4  5  6  7  8
             --------------------------
Actor 1   1.     0  0  0  1  1  0  0  0
Actor 2   2.     0  0  0  1  1  1  0  1
Actor 3   3.     0  0  0  1  0  0  1  1
Event 1   4.     1  1  1  0  0  0  0  0
Event 2   5.     1  1  0  0  0  0  0  0
Event 3   6.     0  1  0  0  0  0  0  0
Event 4   7.     0  0  1  0  0  0  0  0
Event 5   8.     0  1  1  0  0  0  0  0

It is always possible to arrange a 2-mode network so that the adjacency matrix has all zeros in the block-diagonal cells.

Galois Lattices
A new way to think about bipartite networks is as a collection of ordered sets, and then to use some of the tools from discrete mathematics to map the collection of sets.
For example, consider the set of all possible combinations of {1,2,3}. This can be represented in a network as:
[Figure]
This is known as a Galois Lattice.

Imagine you had the following data on actors and events:
[Figure]

The Davis data in lattice form:
[Figure]

Topic / Text Models
To uncover topics, we apply a similar process across papers and words.
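One way to picture "a similar process across papers and words" is to treat a word-count table as a two-mode matrix and compare papers by the words they use. The sketch below is illustrative only. It uses the small four-paper word-count example from this section and cosine similarity as one possible similarity score; the names COUNTS, paper_vectors and cosine are made up here, and no weighting or text parsing is attempted.

```python
import math

# Word counts (rows = words, columns = Paper 1 .. Paper 4).
COUNTS = {
    "Obedient": [5, 10, 0, 0],
    "Loyal":    [6, 5, 1, 0],
    "Friendly": [8, 9, 0, 0],
    "Aloof":    [0, 1, 9, 15],
    "Proud":    [0, 0, 5, 4],
    "Dog":      [2, 1, 0, 0],
    "Cat":      [0, 0, 1, 1],
}


def paper_vectors(counts):
    """Turn the word-by-paper table into one count vector per paper."""
    return [list(column) for column in zip(*counts.values())]


def cosine(u, v):
    """Cosine similarity: shared word mass scaled by each paper's length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vectors = paper_vectors(COUNTS)
similarity = [[cosine(u, v) for v in vectors] for u in vectors]

for row in similarity:
    print(" ".join(f"{value:.2f}" for value in row))
```

Papers 1 and 2 come out close to each other, as do Papers 3 and 4, while the cross pairs are near zero.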
Basically, a corpus is nothing more than a big two-mode network of papers containing words:

            Paper 1   Paper 2   Paper 3   Paper 4
Obedient       5        10         0         0
Loyal          6         5         1         0
Friendly       8         9         0         0
Aloof          0         1         9        15
Proud          0         0         5         4
Dog            2         1         0         0
Cat            0         0         1         1

Comparing across columns tells us whether two papers are similar.

Similarity matrix:

            Paper 1   Paper 2   Paper 3   Paper 4
Paper 1       --        Hi        Low       Low
Paper 2       Hi        --        Low       Low
Paper 3       Low       Low       --        Hi
Paper 4       Low       Low       Hi        --

Topic / Text Models
Key differences are:
a) We typically need to parse the text first for unimportant words, parts of speech, or other particular features we care about.
b) Weight words differently based on their importance in the corpus.
   - Most common is the tf-idf formulation, which gives higher weight to rare words.
c) Then define a similarity score rather than a simple count/volume of overlap.

[Figure: Term “key” result]
[Figure: Tgparse output]
Weighting is applied by tmutil. These are all “under the hood” in the SAS “TextMiner” application.

Background: Mining Science Products: Topic structure
To uncover topics, we apply a similar process across papers.
Example: One-step neighborhood of “More information, better jobs?”

Network Ecology Topic Map
Borrett, Stuart R., James Moody & Achim Edelmann. 2014.
“The Rise of Network Ecology: Maps of the topic diversity and scientific collaboration.” Ecological Modelling (DOI: 10.1016/j.ecolmodel.2014.02.019)

Man Made Pathogen Debate

Community of Science Foundations: Topic Structures
The collaboration space is based on published papers, and we’re curious how the papers are topically clustered. Here we used the Latent Dirichlet Allocation (LDA) topic modeling routine on the full corpus of papers.
LDA does not assign papers to topics exactly, but rather provides a degree of association: topic loadings that depend on the paper’s distribution of terms.

We settled on an eight-topic solution:
[Figure: Paper similarity matrix, sorted by topic loadings]

Titles of papers with the top five topic loadings on each topic

top1     Title: Virology (emphasis on Influenza)
0.99363  Growth of H5N1 influenza A viruses in the upper respiratory tracts of mice
0.99274  Transmission of Influenza Virus in a Mammalian Host Is Increased by PB2 Amino Acids 627K or 627E/701N
0.99236  The M Segment of the 2009 New Pandemic H1N1 Influenza Virus Is Critical for Its High Transmission Efficiency in the Guinea Pig Model
0.99137  Insertion of a multibasic cleavage site in the haemagglutinin of human influenza H3N2 virus does not increase pathogenicity in ferrets
0.99127  Reverse genetics demonstrates that proteolytic processing of the Ebola virus glycoprotein is not essential for replication in cell culture.

top2     Title: Evolutionary Genetics
0.99457  Identifying Signatures of Selection in Genetic Time Series
0.99441  A spatially explicit model of sex ratio evolution in response to sex-biased dispersal
0.99431  The magnitude of local adaptation under genotype-dependent dispersal
0.99427  The advantages of segregation and the evolution of sex.
0.99414  DISENTANGLING THE EFFECTS OF EVOLUTIONARY, DEMOGRAPHIC, AND ENVIRONMENTAL FACTORS INFLUENCING GENETIC STRUCTURE OF NATURAL POPULATIONS: ATLANTIC HERRING AS A CASE STUDY

top3     Title: Genetic Sequencing
0.99445  Sequence and organization of coelacanth neurohypophysial hormone genes: evolutionary history of the vertebrate neurohypophysial hormone gene locus
0.99357  Characterization of the neurohypophysial hormone gene loci in elephant shark and the Japanese lamprey: origin of the vertebrate neurohypophysial hormone genes
0.99308  Sequence Data from New Plastid and Nuclear COSII Regions Resolves Early Diverging Lineages in Coffea (Rubiaceae)
0.99267  Sequence characterization and comparative analysis of three plasmids isolated from environmental Vibrio spp.
0.99244  Large Linear Plasmids of Borrelia Species That Cause Relapsing Fever

top4     Title: Immunology
0.99368  Cholinergic agonists regulate JAK2/STAT3 signaling to suppress endothelial cell activation
0.99295  CD4 expression on activated NK cells: Ligation of CD4 induces cytokine expression and cell migration
0.99281  Reduced DEAF1 function during type 1 diabetes inhibits translation in lymph node stromal cells by suppressing Eif4g3
0.99259  Persistent expression of Pax3 in the neural crest causes cleft palate and defective osteogenesis in mice
0.99252  Critical Role of the Tumor Suppressor Tuberous Sclerosis Complex 1 in Dendritic Cell Activation of CD4 T Cells by Promoting MHC Class II Expression via IRF4 and CIITA

top5     Title: Public Health (emphasis on HIV)
0.99639  Opportunities for health promotion education in child care.
0.99515  To Fund or Not to Fund: Development of a Decision-Making Framework for the Coverage of New Health Technologies
0.99445  Community-based research in AIDS-service organizations: what helps and what doesn't?
0.99418  Sustaining chronic disease management in primary care: Lessons from a demonstration project
0.99418  Strengthening biostatistics resources in sub-Saharan Africa: Research collaborations through U.S. partnerships

top6     Title: Biochemistry (cellular)
0.99315  Functional and structural roles of the N-terminal extension in Methanosarcina acetivorans protoglobin
0.99288  The effects of an ideal beta-turn on beta-2 microglobulin fold stability
0.99267  The Juxtamembrane Linker of Full-length Synaptotagmin 1 Controls Oligomerization and Calcium-dependent Membrane Binding.
0.99236  Structure, conformational stability, and enzymatic properties of acylphosphatase from the hyperthermophile Sulfolobus solfataricus.
0.99220  The Escherichia coli Lpt Transenvelope Protein Complex for Lipopolysaccharide Export Is Assembled via Conserved Structurally Homologous Domains

top7     Title: HIV Vaccines & Drugs
0.99418  Efficacy of zidovudine compared to stavudine, both in combination with lamivudine and indinavir, in human immunodeficiency virus-infected nucleoside-experienced patients with no prior exposure to lamivudine, stavudine, or protease inhibitors (novavir trial).
0.99302  Stavudine, nevirapine and ritonavir in stable antiretroviral therapy-experienced children with human immunodeficiency virus infection.
0.98871  Effect of HIV Infection Status and Anti-Retroviral Treatment on Quantitative and Qualitative Antibody Responses to Pneumococcal Conjugate Vaccine in Infants
0.98816  Prior meningococcal A/C polysaccharide vaccine does not reduce immune responses to conjugate vaccine in young adults.
0.98403  Long-Term Efficacy and Safety of Raltegravir Combined with Optimized Background Therapy in Treatment-Experienced Patients with Drug-Resistant HIV Infection: Week 96 Results of the BENCHMRK 1 and 2 Phase III Trials

top8     Title: Social Aspects of Health Care
0.99572  Impact of admission hyperglycemia on hospital mortality in various intensive care unit populations.
0.99553  High prevalence of chronic kidney disease in population-based patients diagnosed with type 2 diabetes in downtown Shanghai
0.99488  Does socioeconomic status affect mortality subsequent to hospital admission for community acquired pneumonia among older persons?
0.99461  F-18-FDG PET/CT Identifies Patients at Risk for Future Vascular Events in an Otherwise Asymptomatic Cohort with Neoplastic Disease
0.99457  Preinjury warfarin use among elderly patients with closed head injuries in a trauma center.

[Figure: topic map; red-blue scale is size, circle is proportional to distribution in 2D space]

Community of Science Foundations: Topics predict Debate Side
Assign each node to the area they write in most:
• Virology / Influenza
• Evolutionary Genetics
• Immunology
• Genetic Sequencing
• Public Health
• Cellular BioChem
• HIV/Drugs
• Social Aspects of Health

Extending Text beyond “bag of words”
A key issue with text models is that they “chop up” language, so subtle differences get lost: the “country music problem”.
Solutions:
• Link words (k-word phrases). This adds in a little localized context.
• Sentiment models: add a content-specific weight to each term, based on prior knowledge.
• Implication models: the goal here is to link terms/concepts to each other by the narrative implication implied in the sentence/corpus.

Extending Text beyond “bag of words”: Blocking the Future
[Figure: One villager's life story]
[Figure: Combined narratives from multiple interviews]
Bearman, Peter S., Robert Farris, and James Moody. “Blocking the Future: New Solutions for Old Problems in Historical Social Science.” Social Science History 23: 501-535.

Methods: Review Ego-Networks.
1) Go over network drawing programs
2) Go over ego-network creation programs
3) Go over ego-network measures programs
4) Go over persons-through-groups creation programs