KliqueFinder: Identifying Clusters in Network Data
Kenneth A. Frank
Michigan State University

Based on:
• Frank, K.A. 1995. Identifying Cohesive Subgroups. Social Networks 17: 27-56.
• Frank, K. 1996. Mapping interactions within and between cohesive subgroups. Social Networks 18: 93-119.
• *Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. (2006). "Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events." Social Networks 28: 97-123. (* co-first authors)
• https://www.msu.edu/user/k/e/kenfrank/web/research.htm#representation

Overview
• Clustering and Graphical Representations of Networks
• Running KliqueFinder...
  – Step 1) Criteria for Determining Group Membership
  – Step 2) Maximizing Criterion
  – Step 3) Examine evidence of clusters
  – Step 4) Evaluating the Performance of the Algorithm: Did...
• Make Sociogram in Netdraw
• Confidentiality/Ethical issues in Collecting Network Data
• Modifying the Image: Adding Node Data or Relations...
• Two mode
• Software Challenge...
• Batch KliqueFinder
• Prepping
• Converting data
• A Priori Clusters

Clustering and Graphical Representations of Networks
Video: (26:09-31:41); ID: kenfrank@msu.edu; PW: kenfrank2014
Goal: to identify patterns in the network
• Rearrange rows and columns of social network matrix to reveal clustering
• Plot actors and ties in two dimensions to reveal clustering

Theory for defining cluster membership
• Cohesion (clusters are called subgroups): an actor should be in a cluster if the actor has demonstrated a preference for engaging in ties with members of the cluster.
  – Result: ties are concentrated within subgroups.
• Structural equivalence (blocks): an actor should be in a cluster if the actor engages in a similar pattern of ties as members of that cluster.
  – Result: blocks represent positions, but ties are not necessarily concentrated within blocks.
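The first goal above, rearranging rows and columns of the sociomatrix so that clustering becomes visible, can be illustrated with a small sketch. This is not KliqueFinder code: the six-actor network, the subgroup labels, and the function names `block_order` and `rearrange` are all hypothetical and chosen only for illustration.

```python
# Toy sociomatrix (hypothetical data): ties[i][j] = 1 if the i-th actor
# nominates the j-th actor, in the order given by `actors`.
actors = [1, 2, 3, 4, 5, 6]
ties = [
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
]
# Hypothetical subgroup labels (in practice these come from a clustering run).
group = {1: "A", 2: "B", 3: "A", 4: "B", 5: "A", 6: "B"}


def block_order(actors, group):
    """Sort actors so that members of the same subgroup are adjacent."""
    return sorted(actors, key=lambda a: (group[a], a))


def rearrange(ties, actors, order):
    """Apply the same permutation to the rows and the columns of the matrix."""
    idx = {a: k for k, a in enumerate(actors)}
    return [[ties[idx[r]][idx[c]] for c in order] for r in order]


order = block_order(actors, group)
blocked = rearrange(ties, actors, order)

# Print in a style similar to blocked network output: "." marks no tie.
for a, row in zip(order, blocked):
    print(group[a], a, "".join("1" if v else "." for v in row))
```

In the original order the ties look scattered; after sorting by subgroup, all ties fall in the two diagonal blocks, which is the pattern "ties are concentrated within subgroups" describes.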
Crystallized Sociogram: Friendships Among the French Financial Elite
Lines indicate friendships: solid within subgroups, dotted between subgroups. Numbers represent actors.
Rgt, Cen, Soc, Non = political parties; B = Banker; T = Treasury; E = École Nationale d'Administration
Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology, Volume 104, No 3, pages 642-686.

Crystallized Sociogram: Clusters in Foodwebs
Krause, A., Frank, K.A., Mason, D.M., Ulanowicz, R.E. and Taylor, W.M. (2003). "Compartments exposed in food-web structure." Nature 426: 282-285.

Data Input
• File name must be less than 20 characters. Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list.
• Example row: actor 1 interacts with actor 2 at a level of 3.
• Extent of relation can be binary or weighted.
• New: flexible columns. Old: 10 spaces for each entry. Same results.
• IDs should be 6 digits or fewer.
• See: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.

Data Edgelist
• First two rows do not appear in the data; I put them there to show the format: 10 spaces for each entry.
• Example row: actor 1 interacts with actor 2 at a level of 3.
• Extent of relation can be binary or weighted.
• Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list.
• The new version of KliqueFinder is more flexible about the 10-column widths.
• IDs should be 6 digits or fewer.
• See: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.

Steps for finding clusters
Video: (31:41-43:30); ID: kenfrank@msu.edu; PW: kenfrank2014
1) Determine criterion for defining clusters
2) Maximize criterion
3) Examine evidence of clusters
4) Evaluate performance of the algorithm
5) Interpret clusters
  – commonality of attributes
  – focal experiences
  – subsequent behavior

Step 1) Criteria for Determining Group Membership
• Structural Equivalence:
  – Factor analyze the sociomatrix (Katz & Kahn)
  – Iteratively rearrange and revalue rows and columns (CONCOR; White et al., 1976)
• Cohesion:
  – Utilize fixed criteria (e.g., must be connected to at least k others in clusters, or must be minimal path length from k others, etc.)
  – Use a flexible criterion: preference relative to group sizes and number of ties

Model Based Cohesion
• W_ii' = 1 if there is a tie between actors i and i', 0 otherwise.
• samegroup_ii' = 1 if actors i and i' are members of the same subgroup, 0 otherwise.
• Then θ1, the effect of samegroup_ii' on W_ii', represents the salience of subgroups.
• So: maximize θ1 (odds ratio).

Odds Ratio for Association Between Common Subgroup Membership and the Occurrence of Ties Between Actors

Step 2) Maximizing Criterion
1) Find a subgroup seed (3 actors who interact with each other, and with similar others).
2) Add to the cluster to maximize θ1 until you cannot do any more.
3) Start a new subgroup with a new seed.
4) Shuffle between existing subgroups.
5) Make new subgroups as necessary; dissolve existing ones as necessary.

KliqueFinder Algorithm: Phase I
(Computationally intensive; modify for large networks.)
• Initialize: assign each actor to its own subgroup.
• Find a subgroup seed of 2 or 3.
• Identify the single move that most increases the objective function θ1.
• Does the move increase the function? (No / Yes)
• If yes: reassign the actor that makes the best move.
• If the assignment moves an actor out of a group of 3, reassign the remaining 2 to their next best groups.
• For finding the best subgroup seed:
  1) Can only choose from unaffiliated actors.
  2) Each actor can only be a seed once.

KliqueFinder Algorithm: Phases II and III
• Phase II: if the best move does not increase the objective function and there are fewer than 3 actors available for subgroups, then:
  – Attach all isolated (or singleton) actors to the best existing subgroups, even if this reduces the objective function.
• Phase III: shuffle actors between existing subgroups without seeding new ones or disbanding existing ones.
  – Number of subgroups is fixed.
  – This is simple hill climbing and can be cast as an EM algorithm.

Running KliqueFinder
Video: (43:30-1:01:00); ID: kenfrank@msu.edu; PW: kenfrank2014
• Download KliqueFinder at http://hlmsoft.net/wkf/
  – Follow the instructions to install. Put it in c:\kliqfind
  – Mac users: VMware Fusion, Windows 7, 32 bit: http://store.vmware.com/store/vmware/pd/productID.165310200/Currency.USD/
• Click on the "Browse…" button to specify the directory where the data file is located.

KliqueFinder
• Choose "Basic setup" and then click the "Run setup file" button.

KliqueFinder
• Click on the "Browse" button to choose a data file.
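The criterion from Step 1 (θ1) and the Phase III shuffle (reassign one actor at a time between a fixed set of subgroups while θ1 keeps increasing) can be sketched as follows. This is a minimal illustration, not KliqueFinder's implementation: it takes θ1 to be half the log odds ratio of the tie-by-same-subgroup table (the deck states "θ1 = Log odds/2" on the Monte Carlo slide), it adds 0.5 to every cell to avoid division by zero (an assumption of this sketch), and the function names and toy network are hypothetical. Seeding (Phase I) and attaching singletons (Phase II) are not shown.

```python
import math


def theta1(ties, actors, group):
    """Half the log odds ratio of the 2x2 table (same subgroup?) x (tie?)."""
    # cells[(same, tie)] counts ordered pairs; the 0.5 added to every cell is
    # an assumption of this sketch so that empty cells do not divide by zero.
    cells = {(s, t): 0.5 for s in (False, True) for t in (False, True)}
    for i in actors:
        for j in actors:
            if i != j:
                cells[(group[i] == group[j], (i, j) in ties)] += 1
    a, b = cells[(False, False)], cells[(False, True)]
    c, d = cells[(True, False)], cells[(True, True)]
    return math.log((a * d) / (b * c)) / 2


def best_move(ties, actors, group):
    """Return (theta1, actor, new_group) for the best single reassignment,
    or (current theta1, None, None) if no move increases theta1."""
    best = (theta1(ties, actors, group), None, None)
    labels = sorted(set(group.values()))
    for actor in actors:
        home = group[actor]
        if sum(1 for x in actors if group[x] == home) == 1:
            continue  # moving the last member would disband the subgroup
        for label in labels:
            if label == home:
                continue
            trial = dict(group)
            trial[actor] = label
            value = theta1(ties, actors, trial)
            if value > best[0] + 1e-12:
                best = (value, actor, label)
    return best


def hill_climb(ties, actors, group):
    """Repeat the best single move until no move increases theta1."""
    group = dict(group)
    while True:
        value, actor, label = best_move(ties, actors, group)
        if actor is None:
            return group, value
        group[actor] = label


# Hypothetical network: two triangles (1-2-3 and 4-5-6) joined by the tie 3-4.
actors = [1, 2, 3, 4, 5, 6]
undirected = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
ties = {(i, j) for i, j in undirected} | {(j, i) for i, j in undirected}

# Start with actor 3 deliberately placed in the wrong subgroup.
start = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B"}
final, value = hill_climb(ties, actors, start)
print(final, round(value, 3))
```

Because each accepted move strictly increases θ1 and there are finitely many assignments, the loop must stop, but only at a local maximum. That is consistent with the deck's later remark that KliqueFinder does not search for optimal subgroups.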
Run Analysis
Data file

New Version of Data Input more Flexible
• File name must be less than 20 characters.
• IDs should be 6 digits or fewer.
• Example row: actor 1 interacts with actor 2 at a level of 3.
• Extent of relation can be binary or weighted.
• New: flexible columns. Old: 10 spaces for each entry. Same results.
• See: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.

View Clusters Output

Blocked Network Data
N 24
Group And Actor Id
            |AAAA|BBBBBB|CCCCCCCC|DDDDDD|
            |    |      |        |      |
            | 2 1|221  1|  11   2|111122|
Group     ID|7445|612214|98133560|796037|
------------+----+------+--------+------+
1 A        7|A213|......|........|...1..|
1 A       24|4A3.|......|.4......|......|
1 A        4|33A.|......|........|......|
1 A       15|433A|......|........|......|
------------+----+------+--------+------+
2 B       26|.2..|B443..|........|......|
2 B       21|.1..|4B....|...4....|....2.|
2 B       12|....|4.B...|........|......|
2 B        2|....|33.B..|........|...1..|
2 B        1|..3.|3..3B.|........|.3..2.|
2 B       14|....|....1B|........|......|
------------+----+------+--------+------+
3 C        9|....|......|C...3.33|.3....|
3 C        8|.4..|..4...|.C.4..4.|4.....|
3 C       11|....|......|33C.4.3.|..4...|
3 C       13|.4..|.4....|444C....|......|
3 C        3|3...|.4....|4.44C...|......|
3 C        5|.1..|.....4|3.2.3C..|......|
3 C        6|....|......|444..4C4|......|
3 C       20|....|......|3..3.44C|......|
------------+----+------+--------+------+
4 D       17|.1..|......|.1......|D.1...|
4 D       19|....|......|4.3.....|3D4...|
4 D       16|....|......|4..4...4|44D...|
4 D       10|..3.|...1..|........|...D3.|
4 D       23|....|.3....|........|.343D.|
4 D       27|.1..|.1....|........|.3..3D|
θ1 = 1.1738

Step 3) Examine evidence of clusters
1) Randomly redistribute ties
2) Apply algorithm
3) Record value of odds ratio and θ1
4) Repeat 1000 times to generate distribution
5) Use mean of distribution as baseline for comparison

Randomly Redistributing Ties

Apply Algorithm to Random Data
θ1 = .81822

Monte Carlo Sampling Distribution
Video: (1:06:35-1:18:50); ID: kenfrank@msu.edu; PW: kenfrank2014
• Data can include weights.
• Indicate simulate data.
• Output in sampdist.dat (θ1 and odds ratio).
• θ1 = Log odds/2
• Set up sampling.
• Remember to do "new data" setup when done, to prepare for the next analysis.

Code for Reading in Sample Distribution Data

SPSS:
    GET DATA
      /TYPE=TXT
      /FILE="C:\KLIQFIND\sampdist.dat"
      /FIXCASE=1
      /ARRANGEMENT=FIXED
      /FIRSTCASE=1
      /IMPORTCASE=ALL
      /VARIABLES=
      /1 theta1 0-29 F30.10
      oddsratio 30-59 F30.10
      samplesize 60-89 F30.10.
    CACHE.
    EXECUTE.
    DATASET NAME DataSet9 WINDOW=FRONT.
    DATASET ACTIVATE DataSet9.
    GRAPH /HISTOGRAM=theta1.

SAS:
    title "Sampling distribution for theta1";
    data one;
    infile "sampdist.dat" missover;
    input theta1 odds1;
    proc univariate plot;
    var theta1;
    run;

Stata:
    *This command imports the data file
    import delimited C:\KLIQFIND\sampdist.dat, delimiter(" ", asstring)
    *These commands perform data management:
    drop v1
    rename v2 theta1
    rename v3 oddsratio
    rename v4 samplesize
    *This command plots histogram for theta1:
    hist theta1, freq

Comparison of Sampling Distributions

Distribution of θ1base From Application of the Algorithm to Data Simulated Without Regard for Subgroup Membership
Observed value: 1.1738

Sampling Distribution Parameters
• Edit simulation parameters.
• First element is the number of replications.
• Must keep the number of replications in the first 5 columns.

Approximate p-value Based on Previous Simulations
    PREDICTED THETA (1 base) BASED ON SIMULATIONS.
    VALUE BASED ON UNWEIGHTED DATA.
        0.76985
    ESTIMATE OF THETA (1 subgroup processes)
        0.40397
    THE TOTAL THETA1 IS:
        1.1738
    APPROXIMATE TEST OF CONCENTRATION OF TIES WITHIN SUBGROUPS
    BASED ON SIZE OF THETA1 subgroup processes:
    THETA1            |          |
    SUBGROUP          | APPROX   | APPROX
    PROCESSES         | LRT      | P-VALUE
        0.40            34.82      0.00
• Total minus predicted = evidence of groups: 1.1738 - .76985 = .40397
• Reject the null hypothesis of no clusters: H0: θ1 subgroup processes = 0

Step 4) Evaluating the Performance of the Algorithm: Did the Algorithm Recover the Correct Subgroups?
• Many algorithms search for optimal subgroups.
KliqueFinder does not, but how different are the subgroups it finds from the optimal or known subgroups?

Output for Recovery of Subgroups
    PREDICTED ACCURACY: LOG ODDS OF COMMON SUBGROUP MEMBERSHIP,
    + OR - .5734 (FOR A 95% CI)
        1.4989
    The Log odds applies to the following table:

                        OBSERVED SUBGROUP
                      DIFFERENT     SAME
                     ___________________
                    |         |         |
          DIFFERENT |    A    |    B    |
    KNOWN           |         |         |
    SUBGROUP        |---------|---------|
                    |         |         |
          SAME      |    C    |    D    |
                    |         |         |
                     -------------------

    THE LOGODDS TRANSLATES TO AN ODDS RATIO OF 4.4766
    WHICH INDICATES THE INCREASE IN THE ODDS THAT KLIQUEFINDER
    WILL ASSIGN TWO ACTORS TO THE SAME SUBGROUP IF THEY ARE TRULY
    IN THE SAME SUBGROUP.
• Specific accuracy for a given data set is not known; results are predicted from thousands of simulations (see next slide).

Odds of Recovery (Toy Example)
[Two 6 x 6 matrices for actors 1-6: simulated data with known subgroups, and observed subgroups identified by KliqueFinder]

                        OBSERVED SUBGROUP
                      DIFFERENT     SAME
                     ___________________
          DIFFERENT |  A (6)  |  B (3)  |
    KNOWN           |---------|---------|
    SUBGROUP  SAME  |  C (2)  |  D (4)  |
                     -------------------

• Cell A: 6 pairs correctly assigned to different subgroups: (1,5; 2,5; 3,5; 1,6; 2,6; 3,6)
• Cell D: 4 pairs correctly assigned to the same subgroup: (1,2; 1,3; 2,3; 5,6)
• Misassignment of actor 4 contributes 3 to cell B and 2 to cell C.
• Odds of recovery = (AD)/(BC) = 6x4/(3x2) = 4.00

Make Sociogram in Netdraw
Video: (1:01:00-1:06:22); ID: kenfrank@msu.edu; PW: kenfrank2014

Sometimes Netdraw can't find the file; retrieve it manually.

Modifying Image in Netdraw

Group And Actor Id (blocked network data, N 24, as shown under "Blocked Network Data"), with annotations:
• Density of a block, e.g., ties from group A to group C: Density = 4/(4x8) = 1/8.
• KliqueFinder uses Density = 4/(4x5) = .20 because the maximum number of nominations is 5.
• Data used for multidimensional scaling within subgroups: Distance = maximum value / cell entry. E.g., the maximum value is 4, so a tie of 2 gives 4/2 = 2, a distance of 2.
• Distance in multidimensional scaling between subgroups = maximum value / density.
• Densities are reported in xxxxxx.clusters:

    DIRECT ASSOCIATIONS
    GROUP        1      2      3      4
    LABEL        A      B      C      D
    N            4      6      8      6
    GROUP
      1       2.42   0.00   0.20   0.05
      2       0.25   1.07   0.13   0.27
      3       0.38   0.40   2.40   0.28
      4       0.21   0.17   0.67   1.17

Frank, K. 1996. Mapping interactions within and between cohesive subgroups. Social Networks 18: 93-119.

Cohesion; structural similarity
Video: (1:19:15-1:23:40); ID: kenfrank@msu.edu; PW: kenfrank2014

Choosing lines: Groups

Confidentiality/Ethical issues in Collecting Network Data
• Need names on survey.
• Data can be confidential but not anonymous (especially for longitudinal data).
• R.L. Breiger, "Ethical Dilemmas in Social Network Research: Introduction to Special Issue." Social Networks 27/2 (2005): 89-93. Read it online.
  http://www.u.arizona.edu/~breiger/2005BreigerIntroEthics.pdf
  – (All issues of Social Networks are available via ScienceDirect.)
• Who benefits from network analysis? Who bears the cost?
  – Kadushin, Charles. "Who benefits from network analysis: ethics of social network research." Social Networks 27/2 (2005): 139-153.
• Issues to raise when dealing with a Human Subjects Board:
  – Klovdahl, Alden S. "Social network research and human subjects protection: Towards more effective infectious disease control." Pages 119-137.
• Hint on Human Subjects boards: they like precedents. Once you have one network study accepted, refer to it when submitting others!
• https://www.msu.edu/~kenfrank/social%20network/irb%20with%20network%20data.htm
Video: (1:23:41-1:28); ID: kenfrank@msu.edu; PW: kenfrank2014

The SRI/KliqueFinder Solution to confidentiality: aggregate to subgroups
1) Provide information about who is in which cluster, as well as information regarding the resources embedded in each cluster. Resources could be information, expertise, material resources, etc.
  – Benefit: reveals the location of resources relative to social structure.
  – Protection: does not reveal specific responses, because all information is at the cluster level.
2) Provide locations from a sociogram, unique for each respondent, indicating where that person is located ("you are here"). The figure does not include the lines from a sociogram, so respondents cannot infer others' responses.
  – Benefit: respondents then use this as a guide to individual behavior for identifying further resources or information.
  – Protection: specific responses of others are not revealed, so confidentiality is preserved.
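Option 1 above (report resources at the cluster level only) amounts to a simple aggregation. The sketch below assumes a mapping from actor ID to cluster, such as the one stored in xxxxxx.place, and a numeric resource score per actor; the function name `cluster_summary`, the variable names, and all values are hypothetical.

```python
from collections import defaultdict


def cluster_summary(placement, resource):
    """Aggregate an actor-level resource score to the cluster level.

    placement: dict of actor ID -> cluster number
    resource:  dict of actor ID -> numeric score (e.g., self-reported expertise)
    Returns, per cluster, only the member count and the mean score, so that
    no individual response appears in the result.
    """
    values = defaultdict(list)
    for actor, cluster in placement.items():
        if actor in resource:
            values[cluster].append(resource[actor])
    return {
        cluster: {"n": len(v), "mean": sum(v) / len(v)}
        for cluster, v in sorted(values.items())
    }


# Hypothetical placements and scores, for illustration only.
placement = {1: 2, 2: 2, 4: 1, 15: 1, 19: 4, 23: 4}
expertise = {1: 3, 2: 5, 4: 2, 15: 4, 19: 1, 23: 1}

summary = cluster_summary(placement, expertise)
for cluster, stats in summary.items():
    print(f"cluster {cluster}: n={stats['n']}, mean expertise={stats['mean']:.2f}")
```

With very small clusters a mean can still come close to identifying an individual, so whether cluster-level reporting is protective enough depends on cluster sizes in the data at hand.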
Using subgroups for feedback to respondents and in a proposal
• Can even include names of actors

Choosing Lines: Actor Level Within

Choosing Lines: Actor Level
• Remove group nodes

Choosing Lines: Actor Level Between

Choosing Lines: Group Level

Modifying the Image: Adding Node Data or Relations
Video: (1:49:35-2:07:48); ID: kenfrank@msu.edu; PW: kenfrank2014
• http://www.analytictech.com/ucinet/download.htm
• http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CB0QFjAA&url=http%3A%2F%2Fwww.analytictech.com%2FNetdraw%2FNetdrawGuide.doc&ei=6pC4Tp29Men3sQLv99WoCA&usg=AFQjCNHg_NTjlHOclmeJkwQs2xRaiPYgXQ&sig2=WLwXKSjJq_Yinpfkwv0m4w
• http://faculty.ucr.edu/~hanneman/nettext/C4_netdraw.html#data

Files for KliqueFinder
Input data:
  – Network data: xxxxxx.list
  – Node data: xxxxxx.ilabel
  – Alternative network data: xxxxxx.xnet
Parameters:
  – Kliqfind.par
  – Printo
  – Simulate.par
KliqueFinder output:
  – xxxxxx.place: data containing actor IDs and subgroup placement
  – xxxxxx.clusters: diagnostics and matrix formatted data
  – xxxxxx.vna: for Netdraw

Modifying node data by Editing [datafile].vna
The file is read by Netdraw. Copy the relevant data into Excel, edit, and replace.

    *node data
    id type group gender
    "0A " 2 1 0
    "0B " 2 2 0
    "0C " 2 3 0
    "0D " 2 4 0
    1 1 2 1
    2 1 2 2

Add the new node variable to the header here (e.g., gender), then add the data as a new column.

    *Node properties
    ID x y color shape size shortlabel active
    "0A " -2.01889 -15.04530 16777215 1 30 A TRUE
    "0B " -9.41864 15.75047 16777215 1 85 B TRUE
    "0C " 2.06574 2.09162 16777215 1 52 C TRUE
    "0D " 8.54812 10.10988 16777215 1 79 D TRUE
    1 -10.52314 14.16442 16711680 1 10 1 TRUE
    2 -8.29999 13.27802 16711680 1 10 2 TRUE

    *Tie data
    from to any strength actor group between within technology
    1 2 1 3 1 0
    1 4 1 3 1 0
    1 19 1 3 1 0
    1 23 1 2 1 0
    1 26 1 3 1 0
    2 26 1 3 1 0
    2 10 1 1 1 0

(Rows show the first six columns. Values of the technology column for these rows, in order: 0 1 1 1 0 0 1.)

    *Tie properties
    FROM TO color size headcolor headsize active
    "0A " "0B " 12632256 1 12632256 0 TRUE
    "0A " "0C " 12632256 9 12632256 0 TRUE
    1 2 0 3 0 8 TRUE
    1 4 12632256 3 0 8 TRUE

Adding Node Attributes with Extra File
• KliqueFinder will put the attributes into the vna file.
• File = xxxxxx.ilabel, where xxxxxx is the first 6 characters of your data file (e.g., stanne.list goes with stanne.ilabel).
• Format: 10 columns for ID; skip a space; name; node attributes 1-5.
• Cut and paste into stanne.ilabel:

             1 Jacob 1 3 5
             2 Stan 1 2 5
             3 Linton 1 2 5
             4 Charles 1 3 3
             5 Mark 1 3 3
             6 Tom 2 3 3
             7 Ronald 2 3 5
             8 Nan 2 1 3
             9 Elizabeth 2 1 4
            10 Barry 2 2 3
            11 Martin 2 3 1
            12 Steve 2 3 1
            13 PeterC 2 1 5
            14 Patrick 1 1 1
            15 Katy 1 1 3
            16 Kathleen 3 3 3
            17 Ove 2 2 2
            18 JamesC 5 5 5
            19 Robert 4 4 4
            20 JamesM 1 2 3 4
            21 Noah 4 3 2 1
            22 Marijtje 1 2 1 2
            23 Ronald 2 1 2 1
            24 Harrison 3 1 3 1
            25 Duncan 4 1 4 1

Interactive: adding node data or

Include Node Data in Image

Modifying Links
Lines indicate friendships: solid within subgroups, dotted between subgroups. Numbers represent actors.
Rgt, Cen, Soc, Non = political parties; B = Banker; T = Treasury; E = École Nationale d'Administration
Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups."
American Journal of Sociology, Volume 104, No 3, pages 642-686.

Hostile Actions

Supportive Actions

[Figure: crystallized sociogram of teachers, with subgroups labeled A through E]
• Each number is a teacher.
• G_ indicates the grade in which the teacher teaches.
• Lines connecting two numbers indicate teachers who are close colleagues: solid lines within subgroups, dashed between.
• Circles indicate cohesive subgroups.

Ripple Plot
• Overlay talk about technology on the social geography of the crystallized sociogram.
• Lines indicate talk about technology.
• Size of dot indicates the teacher's use of technology at time 1.
• Ripples indicate the increase in use from time 1 to time 2.

Frank, K. A. and Zhao, Y. (2005). "Subgroups as a Meso-Level Entity in the Social Organization of Schools." Chapter 10, pages 279-318. Book honoring Charles Bidwell's retirement, edited by Larry Hedges and Barbara Schneider. New York: Sage Publications.

Modifying Links by Editing [datafile].vna
The file is read by Netdraw. Copy the relevant data into Excel, edit, and replace. The *node data and *Node properties sections are the same as shown under "Modifying node data by Editing [datafile].vna". Add the new relation to the *Tie data header here (e.g., technology), then add the data as a new column.

    *Tie data
    from to any strength actor group between within technology
    1 2 1 3 1 0
    1 4 1 3 1 0
    1 19 1 3 1 0
    1 23 1 2 1 0
    1 26 1 3 1 0
    2 26 1 3 1 0
    2 10 1 1 1 0

(Rows show the first six columns. Values of the technology column for these rows, in order: 0 1 1 1 0 0 1.)

    *Tie properties
    FROM TO color size headcolor headsize active
    "0A " "0B " 12632256 1 12632256 0 TRUE
    "0A " "0C " 12632256 9 12632256 0 TRUE
    1 2 0 3 0 8 TRUE
    1 4 12632256 3 0 8 TRUE

Modifying Links with Extra File
• KliqueFinder will put the attributes into the vna file.
• File = xxxxxx.xnet, where xxxxxx is the first 6 characters of your data file (e.g., stanne.list goes with stanne.xnet).
• File containing the extra network (stanne.xnet):

    Nominator  nominee  strength of tie
            1        2        4
           19       15        3
           22       26        1

Modifying Links: Interactive – Finicky

Interactive Modifying Links

Two mode
*Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. "Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events." Social Networks 28: 97-123. (* co-first authors)
Data source

Video: (1:39:25-1:49:35); ID: kenfrank@msu.edu; PW: kenfrank2014
Copy homact.list from c:\kliqfind/setups to c:\kliqfind

Two-mode Data Edgelist
• First two rows do not appear in the data; I put them there to show the format: 10 spaces for each entry.
• Example row: actor 1 participates in event 19 at a level of 1.
• Extent of relation can be binary or weighted.
• The new version of KliqueFinder is more flexible about the 10-column widths.
• IDs should be 6 digits or fewer.
• See: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.

Two mode Clusters output

Blocked Two-Mode Network Data

Two-mode Crystallized Sociogram

Centralization & Centrality in KliqueFinder
• KliqueFinder produces a measure of Warp.
• Starts with distances defined by:
  – Maximum value in network / observed value
• E.g., if the maximum is 4 and a particular tie is 1, then the distance is 4/1 = 4.
  – These are the distances used in the MDS to produce the sociograms (see "running KliqueFinder ppt").
• Obtains eigenvalues:
  – Within each cluster, based on raw data within the cluster
  – Between clusters, based on 1/density of ties between clusters
• Density = average value in a given block
• Warp = sum of positive eigenvalues / sum of all eigenvalues
  – Note it does not use the square root of the eigenvalues (variances are more additive).
• Output goes into xxxxxx.bcord (9th element) and into Netdraw as a node attribute for groups, called "centrality".
• Centrality for individuals is the distance to the center of their subgroup (radius).

Running on a Large Data File (more than 1000 actors)
• If you start the program and it just sits there, it is looking for the best seed for the first subgroup.
• The seed is 3 actors, but the program looks at all combinations of 3 that share common ties in the network. This is intensive, and unnecessary for large data (the 1st subgroup does not matter so much).
• To shortcut: change value from 12. Save & run.

Software Challenge
Video: (2:07:57-2:08:15); ID: kenfrank@msu.edu; PW: kenfrank2014
• Analyze nonpr1.list
  – Evidence of clusters?
  – Performance of algorithm?
• Replace lines with nonpr2
• Describe the KliqueFinder algorithm

KliqueFinder Applications: Adding Individual Attributes in SAS
• Run KliqueFinder
  – Data file: collt1.list
  – Make graph
  – Use ID from other file? Yes:
    sas file name: c:\kliqfind\indiv [be sure to include full path]
    id variable: nominator
    string variable: gradelev
  – Save
• In SAS, run socgramz in the working directory.

KliqueFinder Applications: Adding Individual Attributes
• Select "Yes" for "User ID (character) from other SAS file?"

KliqueFinder Applications: Adding Individual Attributes
• Type the following information in the corresponding boxes.
• Then click "Save".

Choosing an ID Variable

With ID based on Grade

KliqueFinder Applications: Replacing Lines
• Run KliqueFinder
  – Data file: collt1.list
  – Make graph
  – Save
• Retrieve socgramz.sas in the working directory.
• Replace all occurrences of collt1.list with collt2.list.
• Run.

Opening socgramz.sas

Changing lines

Change lines to different source

New Lines based on Collt2

Batch KliqueFinder

Basics
• Program runs KliqueFinder on multiple files.
• Input:
  – List of filenames
  – Files containing data
  – BACK UP YOUR DATA FIRST!
• Output:
  – Clustering output (.place, .clusters, .vna) for each list file

Files
• File containing names of data files: testb.txt
• Data file: stanne.list
• Data file: ffe.list
• BACK UP YOUR DATA FIRST!

KliqueFinder
• Browse to the directory you want to work in.
• Choose "Basic setup" and then click the "Run setup file" button.

Running Batch Mode
• BACK UP DATA FILES BEFORE RUNNING!
• File with names of data files
• Click here to run as batch

Prepping data in Excel
Video: (1:28-1:39); ID: kenfrank@msu.edu; PW: kenfrank2014
• Name your file xxxxxx.list, e.g., test01.list
• Right click
• Choose Formatted text (space delimited)

Prepping Data in UCINET
• Navigate to the UCINET data.
• Navigate to where you want to save: c:\kliqfind
• Must remove "!" from file.
• There may be several "!"s; they are there because of multiple data sets.

Converting data using SAS
Video: (2:10:43-2:19); ID: kenfrank@msu.edu; PW: kenfrank2014

    data one;
    infile "badform.list";
    input chooser chosen wt;
    data two;
    set one;
    file "ready1.list";
    if wt ne . then put (chooser chosen wt) (10.);
    run;

A Priori Clusters
• A line with 99999 in the data file indicates in which a priori cluster an actor is placed. For example, actor 1 is in a priori cluster 3.
• Run the repeat2 setup, and then proceed as usual.
• Remember to do "new data" setup when done.
• KliqueFinder will make pictures based on a priori clusters.

Comparison of A Priori Clusters and Identified Solution
• Run as new data.
• Data with a priori cluster assignments.
• Run as usual, then look at the cluster output:

    SIMILARITY BETWEEN THE START AND END GROUPS:
    ACTUAL 52.   POSS 88.   STANDARDIZED 9.55565

• STANDARDIZED is a QAP standardized measure; compare with the normal distribution.

Data Containing Cluster Assignments
File called stanne.place [datafile.place]. There may be slightly different numeric formats depending on the version of KliqueFinder.

    Internal ID   User ID   Cluster   (ignore: for simulation only)
         1.0         1.0      2.0        1.0    3.0
         2.0         2.0      2.0        1.0    3.0
         3.0         4.0      1.0        1.0    3.0
         4.0        19.0      4.0        1.0    3.0
         5.0        23.0      4.0        1.0    3.0
         6.0        26.0      2.0        1.0    3.0
        17.0         6.0      3.0        1.0    3.0
        18.0         8.0      3.0        1.0    3.0
        19.0        20.0      3.0        1.0    3.0
        20.0        15.0      1.0        1.0    3.0
        21.0        12.0      2.0        1.0    3.0
        22.0        17.0      4.0        1.0    3.0
        23.0        16.0      4.0        1.0    3.0
        24.0        27.0      4.0        1.0    3.0
       -27.0        28.0      4.0        1.0    3.0

If the first number (internal ID) is negative, this indicates a tagalong: an actor connected to only one other. In this case, the last line should be read as the tagee, tagger, and group. So, actor 28 is connected to only one other actor (27) and is therefore assigned to actor 27's cluster, which is cluster 4.

Including Cluster Membership in Influence Model

SPSS:
    DATA LIST / intid 1-10 nominee 11-20 cluster 21-30 simx 31-40 extra 41-50.
    BEGIN DATA
           1.0       1.0       1.0       1.0       3.0
           2.0       2.0       1.0       1.0       3.0
           3.0       3.0       1.0       1.0       3.0
           4.0       4.0       2.0       1.0       3.0
           5.0       5.0       2.0       1.0       3.0
           6.0       6.0       2.0       1.0       3.0
    END DATA.
    DATASET NAME clusters WINDOW=FRONT.
    SORT CASES BY nominee(A).
    EXECUTE.
    MATCH FILES /FILE=yvar1
      /FILE='indeg'
      /FILE=clusters
      /BY nominee.
    EXECUTE.

SAS:
    data clusters; *groups from KliqueFinder;
    input intid nominator cluster simx extra;
    cards;
    1.0 1.0 1.0 1.0 3.0
    2.0 2.0 1.0 1.0 3.0
    3.0 3.0 1.0 1.0 3.0
    4.0 4.0 2.0 1.0 3.0
    5.0 5.0 2.0 1.0 3.0
    6.0 6.0 2.0 1.0 3.0
    ;
    proc sort data=clusters; by nominator;
    data withinfl;
    merge yvar2 yvar1 infl expanse clusters attract(rename=(nominee=nominator));
    by nominator;
    drop nominee _type_ _freq_;
    run;

Advanced: run the influence model for technology
• Identify clusters from talkt2.
• Include cluster membership in the influence model.

Adding Patches
• Patch for one-mode
• Patch for two-mode

Alternative community detection algorithms
• http://cs.stanford.edu/people/jure/pubs/communities-www10.pdf
• http://www.uvm.edu/~pdodds/files/papers/others/2009/lancichinetti2009a.pdf
• http://fatweasel.net/analytics/networkanalysis/community-detection-in-networks/