quick power point on how to use KliqueFinder

KliqueFinder:
Identifying Clusters in Network Data
Kenneth A. Frank
Michigan State University
Based on:
• Frank, K.A. 1995. Identifying Cohesive Subgroups. Social Networks 17: 27-56.
• Frank, K.A. 1996. Mapping interactions within and between cohesive subgroups. Social Networks 18: 93-119.
• *Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. "Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events." Social Networks 28: 97-123. * co-first authors.
• https://www.msu.edu/user/k/e/kenfrank/web/research.htm#representation
Overview
• Clustering and Graphical Representations of Networks
• Running KliqueFinder...
– Step 1) Criteria for Determining Group Membership
– Step 2) Maximizing Criterion
– Step 3) Examine evidence of clusters
– Step 4) Evaluating the Performance of the Algorithm: Did...
• Make Sociogram in Netdraw
• Confidentiality/Ethical issues in Collecting Network Data
• Modifying the Image: Adding Node Data or Relations...
• Two mode
• Software Challenge...
• Batch KliqueFinder
• Prepping/Converting data
• A Priori Clusters
Clustering and Graphical Representations of Networks
video : (26:09-31:41): ID: kenfrank@msu.edu PW:kenfrank2014
Goal: to identify patterns in the network
• Rearrange rows and columns of social
network matrix to reveal clustering
• Plot actors and ties in two dimensions to
reveal clustering
Theory for defining cluster membership
• Cohesion (clusters are called subgroups): an actor should be in a cluster if the actor has demonstrated a preference for engaging in ties with members of the cluster.
– Result: ties are concentrated within subgroups.
• Structural equivalence (clusters are called blocks): an actor should be in a cluster if the actor engages in a pattern of ties similar to that of the members of that cluster.
– Result: blocks represent positions, but ties are not necessarily concentrated within blocks.
Crystallized Sociogram: Friendships Among the French Financial Elite
Lines indicate friendships: solid within subgroups, dotted between subgroups. Numbers represent actors.
Rgt, Cen, Soc, Non = political parties; B = Banker; T = Treasury; E = Ecole Nationale d’Administration.
Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology, Volume 104, No 3, pages 642-686.
Crystallized Sociogram: Clusters in Foodwebs
Krause, A., Frank, K.A., Mason, D.M., Ulanowicz, R.E. and Taylor, W.M. (2003). "Compartments exposed
in food-web structure." Nature 426:282-285
Data Input
File name must be less than 20 characters. Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list.
Each row is one tie, e.g.: actor 1 interacts with actor 2 at a level of 3.
Extent of relation can be binary or weighted.
New version: flexible columns. Old version: 10 spaces for each entry. Both give the same results.
IDs should be 6 digits or less.
See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.
Data: Edgelist
The first two rows of the example do not appear in the data; they are there to show the format: 10 spaces for each entry.
Each row is one tie, e.g.: actor 1 interacts with actor 2 at a level of 3.
Extent of relation can be binary or weighted.
Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list.
The new version of KliqueFinder is more flexible about the 10-column widths.
IDs should be 6 digits or less.
See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.
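The fixed-width layout described above (three entries per row, 10 spaces each) can be produced with a few lines of Python. This is only a sketch: the function name format_edgelist is made up for illustration, and skipping ties with a missing weight is an assumption borrowed from the SAS conversion step later in the deck.

```python
def format_edgelist(edges, width=10):
    """Return .list text: one 'sender receiver weight' row per tie.

    Each entry is right-aligned in `width` columns (10 in the old
    KliqueFinder format). Ties whose weight is None are skipped.
    """
    rows = []
    for sender, receiver, weight in edges:
        if weight is None:
            continue  # assumption: drop ties with a missing weight
        fields = (str(sender), str(receiver), str(weight))
        rows.append("".join(f.rjust(width) for f in fields) + "\n")
    return "".join(rows)


# Hypothetical ties: actor 1 interacts with actor 2 at a level of 3, etc.
print(format_edgelist([(1, 2, 3), (1, 4, 3), (2, 26, None)]), end="")
```

The returned text can be written to a file such as stanne.list in the KliqueFinder working directory.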
Steps for finding clusters
video: (31:41-43:30): ID: kenfrank@msu.edu PW:kenfrank2014
1) Determine criterion for defining clusters
2) Maximize criterion
3) Examine evidence of clusters
4) Evaluate performance of the algorithm
5) Interpret clusters
– commonality of attributes
– focal experiences
– subsequent behavior
Step 1) Criteria for Determining Group Membership
Structural Equivalence:
– Factor analyze sociomatrix (Katz & Kahn)
– Iteratively rearrange and revalue rows and columns (CONCOR -- White et al., 1976)
Cohesion:
– Utilize fixed criteria (e.g., must be connected to at least k others in the cluster, or must be within a minimal path length of k others, etc.).
– Use flexible criterion -- preference relative to group sizes and number of ties:
Model Based Cohesion
W_ii’ = 1 if there is a tie between actors i and i’, 0 otherwise.
samegroup_ii’ = 1 if actors i and i’ are members of the same subgroup, 0 otherwise.
Then θ1 represents subgroup salience: the association between samegroup_ii’ and W_ii’.
So: maximize θ1 (odds ratio).
Odds Ratio for Association Between Common Subgroup
Membership and
The Occurrence of Ties Between Actors
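As a rough illustration of the odds ratio named in the slide title, the sketch below cross-tabulates "tie present" by "same subgroup" over all ordered pairs of actors. It assumes directed ties and takes θ1 as half the log of the odds ratio, following the note "θ1=Log odds/2" on the Monte Carlo slide. The function name and the toy data are hypothetical, not KliqueFinder's own code.

```python
import math
from itertools import permutations


def theta1(ties, group_of):
    """Half the log odds ratio of tie by common subgroup membership.

    ties: set of (sender, receiver) pairs; group_of: dict actor -> subgroup.
    Cells: a = different subgroup, no tie; b = different subgroup, tie;
           c = same subgroup, no tie;      d = same subgroup, tie.
    """
    a = b = c = d = 0
    for i, j in permutations(group_of, 2):
        same = group_of[i] == group_of[j]
        tied = (i, j) in ties
        if same and tied:
            d += 1
        elif same:
            c += 1
        elif tied:
            b += 1
        else:
            a += 1
    if 0 in (a, b, c, d):
        raise ValueError("odds ratio undefined when a cell is empty")
    return 0.5 * math.log((a * d) / (b * c))


# Toy data (made up): two subgroups of two actors each.
groups = {1: "A", 2: "A", 3: "B", 4: "B"}
ties = {(1, 2), (2, 1), (3, 4), (1, 3)}
print(theta1(ties, groups))  # odds ratio is (7 x 3)/(1 x 1) = 21
```

Empty cells are reported as an error here rather than smoothed, since the slides do not say how KliqueFinder treats them.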
Step 2: Maximizing Criterion
• 1) find a subgroup seed (3 actors who
interact with each other, and with similar
others)
• 2) add to the cluster to maximize θ1 until
you cannot do any more
• 3) start new subgroup with new seed
• 4) shuffle between existing subgroups
• 5) make new subgroups as necessary,
dissolve existing ones as necessary.
KliqueFinder Algorithm: Phase I
(Computationally intensive; modify for large networks.)
1) Initialize: assign each actor to own subgroup.
2) Find subgroup seed of 2 or 3.
3) Identify the single move that most increases the objective function θ1.
4) Does the move increase the function? (No / Yes)
5) If yes: reassign the actor that makes the best move.
6) If the assignment moves an actor out of a group of 3, reassign the remaining 2 to their next best groups.
For finding best subgroup seed:
1) can only choose from unaffiliated actors
2) each actor can only be a seed once
KliqueFinder Algorithm: Phases II and III
• Phase II: If best move does not increase objective function and there are fewer than 3 actors available for subgroups, then
– Attach all isolated (or singleton) actors to best existing subgroups, even if this reduces objective function
• Phase III: shuffle actors between existing subgroups without seeding new ones or disbanding existing ones
– Number of subgroups is fixed
– This is simple hill climbing and can be cast as an EM algorithm
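The Phase III idea (shuffle actors between a fixed set of subgroups, always taking the single move that most increases θ1) can be sketched as plain hill climbing. This is an illustration under stated assumptions, not KliqueFinder's implementation: θ1 is taken as half the log odds ratio, 0.5 is added to every cell so the objective stays finite (an assumption of this sketch), and the names theta1 and shuffle_phase are hypothetical.

```python
import math
from itertools import permutations


def theta1(ties, group_of):
    """Half the log odds ratio of tie by same subgroup (0.5 added per cell)."""
    a = b = c = d = 0
    for i, j in permutations(group_of, 2):
        same = group_of[i] == group_of[j]
        tied = (i, j) in ties
        if same and tied:
            d += 1
        elif same:
            c += 1
        elif tied:
            b += 1
        else:
            a += 1
    return 0.5 * math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))


def shuffle_phase(ties, group_of, max_passes=100):
    """Move one actor at a time to the subgroup that most increases theta1."""
    group_of = dict(group_of)
    labels = sorted(set(group_of.values()))
    best = theta1(ties, group_of)
    for _ in range(max_passes):
        best_move = None
        for actor, current in list(group_of.items()):
            size = sum(1 for g in group_of.values() if g == current)
            if size == 1:
                continue  # number of subgroups is fixed: never empty one
            for label in labels:
                if label == current:
                    continue
                group_of[actor] = label          # try the move
                value = theta1(ties, group_of)
                if value > best + 1e-12:
                    best, best_move = value, (actor, label)
                group_of[actor] = current        # undo the trial move
        if best_move is None:
            break  # no single move increases the objective
        group_of[best_move[0]] = best_move[1]
    return group_of, best


# Toy data (made up): two fully connected triads, actor 4 starts misplaced.
triads = [(1, 2, 3), (4, 5, 6)]
ties = {(i, j) for t in triads for i in t for j in t if i != j}
start = {1: "A", 2: "A", 3: "A", 4: "A", 5: "B", 6: "B"}
final, value = shuffle_phase(ties, start)
print(final[4])  # B
```

Recomputing θ1 from scratch for every trial move keeps the sketch short but is slow; an implementation meant for real networks would update the four cell counts incrementally.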
Running KliqueFinder
video: (43:30-1:01:00): ID: kenfrank@msu.edu PW:kenfrank2014
• Download KliqueFinder at
– http://hlmsoft.net/wkf/
– Follow instructions to install. Put in c:\kliqfind
– Mac users: VMware Fusion, Windows 7, 32 bit: http://store.vmware.com/store/vmware/pd/productID.165310200/Currency.USD/
• Click on the “Browse…” button to specify the directory where the data file is located.
KliqueFinder
• Choose “Basic setup” and then click the “Run setup file” button.
KliqueFinder
• Click on the “Browse” button to choose a data file.
Run Analysis
Data file
New Version of Data Input More Flexible
New: flexible columns. Old: 10 spaces for each entry. Both give the same results.
File name must be less than 20 characters; IDs should be 6 digits or less.
View Clusters Output
Blocked Network Data
N = 24
Group And Actor Id
            |AAAA|BBBBBB|CCCCCCCC|DDDDDD|
            | 2 1|221  1|  11   2|111122|
Group     ID|7445|612214|98133560|796037|
------------+----+------+--------+------+
  1 A      7|A213|......|........|...1..|
  1 A     24|4A3.|......|.4......|......|
  1 A      4|33A.|......|........|......|
  1 A     15|433A|......|........|......|
------------+----+------+--------+------+
  2 B     26|.2..|B443..|........|......|
  2 B     21|.1..|4B....|...4....|....2.|
  2 B     12|....|4.B...|........|......|
  2 B      2|....|33.B..|........|...1..|
  2 B      1|..3.|3..3B.|........|.3..2.|
  2 B     14|....|....1B|........|......|
------------+----+------+--------+------+
  3 C      9|....|......|C...3.33|.3....|
  3 C      8|.4..|..4...|.C.4..4.|4.....|
  3 C     11|....|......|33C.4.3.|..4...|
  3 C     13|.4..|.4....|444C....|......|
  3 C      3|3...|.4....|4.44C...|......|
  3 C      5|.1..|.....4|3.2.3C..|......|
  3 C      6|....|......|444..4C4|......|
  3 C     20|....|......|3..3.44C|......|
------------+----+------+--------+------+
  4 D     17|.1..|......|.1......|D.1...|
  4 D     19|....|......|4.3.....|3D4...|
  4 D     16|....|......|4..4...4|44D...|
  4 D     10|..3.|...1..|........|...D3.|
  4 D     23|....|.3....|........|.343D.|
  4 D     27|.1..|.1....|........|.3..3D|
θ1 = 1.1738
Step 3) Examine evidence of clusters
1) randomly redistribute ties
2) apply algorithm
3) record value of odds ratio and θ1
4) repeat 1000 times to generate
distribution
5) use mean of distribution as baseline for
comparison
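The five steps above amount to a Monte Carlo baseline. The sketch below shows one way to set it up. The slides do not say exactly how ties are redistributed, so this version assumes each sender keeps its number of ties and receives new random receivers, with no self-ties. The fit argument stands in for "apply algorithm and return θ1" and is a hypothetical callable, as are the function names.

```python
import random


def redistribute_ties(ties, actors, rng):
    """Give each sender the same number of ties to randomly chosen others."""
    out_degree = {}
    for sender, _receiver in set(ties):
        out_degree[sender] = out_degree.get(sender, 0) + 1
    new_ties = set()
    for sender, k in out_degree.items():
        others = [a for a in actors if a != sender]  # no self-ties
        for receiver in rng.sample(others, k):
            new_ties.add((sender, receiver))
    return new_ties


def baseline_distribution(ties, actors, fit, reps=1000, seed=0):
    """Steps 1-4: redistribute ties, apply `fit`, record the value, repeat."""
    rng = random.Random(seed)
    return [fit(redistribute_ties(ties, actors, rng)) for _ in range(reps)]


# Toy usage with a placeholder statistic (number of ties) instead of theta1.
actors = [1, 2, 3, 4, 5]
ties = {(1, 2), (1, 3), (2, 3), (4, 5)}
values = baseline_distribution(ties, actors, fit=len, reps=20)
baseline = sum(values) / len(values)  # step 5: mean of the distribution
print(baseline)
```

In a real run, fit would cluster the randomized network and return its θ1, and the mean of the returned values would serve as the baseline for comparison.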
Randomly Redistributing Ties
Apply Algorithm to Random Data
θ1 = .81822
Monte Carlo Sampling Distribution
video: (1:06:35-1:18:50) ID: kenfrank@msu.edu PW:kenfrank2014
Data can include weights
Indicate simulated data
Output in sampdist.dat
θ1 = log odds / 2
Set up sampling. Remember to do the “new data” setup when done, to prepare for the next analysis.
Odds Ratio
Code for Reading in Sample Distribution Data

SPSS
GET DATA
  /TYPE=TXT
  /FILE="C:\KLIQFIND\sampdist.dat"
  /FIXCASE=1
  /ARRANGEMENT=FIXED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
  /1 theta1 0-29 F30.10
  oddsratio 30-59 F30.10
  samplesize 60-89 F30.10.
CACHE.
EXECUTE.
DATASET NAME DataSet9 WINDOW=FRONT.
DATASET ACTIVATE DataSet9.
GRAPH
  /HISTOGRAM=theta1.

SAS
title "Sampling distribution for theta1";
data one;
infile "sampdist.dat" missover;
input theta1 odds1;
proc univariate plot;
var theta1;
run;

Stata
*This command imports the data file
import delimited C:\KLIQFIND\sampdist.dat, delimiter(" ", asstring)
*These commands perform data management:
drop v1
rename v2 theta1
rename v3 oddsratio
rename v4 samplesize
*This command plots histogram for theta1:
hist theta1,freq
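For readers without SPSS, SAS, or Stata, the same file can be read with plain Python. The sketch assumes sampdist.dat holds whitespace-separated rows with theta1 in the first column (followed by the odds ratio and sample size, as in the SPSS layout above). The function names and the three sample rows are made up for illustration.

```python
from statistics import mean


def read_theta1(text):
    """Return the first (theta1) column of whitespace-separated rows."""
    values = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            values.append(float(fields[0]))
    return values


def compare_to_baseline(observed, simulated):
    """Return (baseline mean, observed - baseline, share of sims >= observed)."""
    baseline = mean(simulated)
    share = sum(1 for t in simulated if t >= observed) / len(simulated)
    return baseline, observed - baseline, share


# Made-up rows standing in for the contents of sampdist.dat.
sample = "0.75 4.48 24\n0.80 4.95 24\n0.85 5.47 24\n"
baseline, evidence, share = compare_to_baseline(1.1738, read_theta1(sample))
print(round(baseline, 2), round(evidence, 4), share)
```

With a real run, the text would come from open(r"C:\KLIQFIND\sampdist.dat").read(), and the difference between the observed θ1 and the baseline mean corresponds to the "evidence of groups" quantity reported by KliqueFinder.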
Comparison of Sampling Distributions
Distribution of θ1base From Application of the Algorithm to Data Simulated Without Regard for Subgroup Membership
Observed value: 1.1738
Sampling Distribution Parameters
Edit simulation parameters.
First element is number of replications
Must keep # of reps in first 5 columns
Approximate p-value Based on Previous Simulations
PREDICTED THETA (1 base) BASED ON SIMULATIONS.
VALUE BASED ON UNWEIGHTED DATA.
     0.76985
ESTIMATE OF THETA (1 subgroup processes)
     0.40397
(total - predicted = evidence of groups): 1.1738 - .76985 = .40397
THE TOTAL THETA1 IS:
     1.1738
APPROXIMATE TEST OF CONCENTRATION OF TIES
WITHIN SUBGROUPS BASED ON
SIZE OF THETA1 subgroup processes:
THETA1    |         |
SUBGROUP  | APPROX  | APPROX
PROCESSES | LRT     | P-VALUE
   0.40   |  34.82  |  0.00
Reject null hypothesis of no clusters:
H0: θ1 subgroup processes = 0
Step 4) Evaluating the Performance of the Algorithm: Did the Algorithm Recover the Correct Subgroups?
• Many algorithms search for optimal subgroups. KliqueFinder does not, but how different are the subgroups it finds from the optimal or known subgroups?
Output for Recovery of Subgroups
PREDICTED ACCURACY: LOG ODDS OF COMMON SUBGROUP
MEMBERSHIP, + OR - .5734 (FOR A 95% CI)
     1.4989
The Log odds applies to the following table:
                 OBSERVED SUBGROUP
               DIFFERENT    SAME
              ___________________
              |        |        |
    DIFFERENT |   A    |   B    |
 KNOWN        |        |        |
 SUBGROUP     |--------|--------|
              |        |        |
         SAME |   C    |   D    |
              |        |        |
              -------------------
THE LOGODDS TRANSLATES TO AN ODDS RATIO OF
     4.4766
WHICH INDICATES THE INCREASE IN THE ODDS
THAT KLIQUEFINDER WILL ASSIGN TWO ACTORS TO
THE SAME SUBGROUP IF THEY ARE TRULY IN THE
SAME SUBGROUP.
Specific accuracy for a given data set is not known; results are predicted from thousands of simulations (see next slide).
Odds of Recovery (Toy Example)
[Two 6 x 6 sociomatrices for actors 1-6: simulated data with known subgroups, and observed subgroups identified by KliqueFinder]
From the cell listings below: the known subgroups are {1, 2, 3} and {4, 5, 6}; KliqueFinder places actor 4 with actors 1, 2 and 3.
Cell A: 6 pairs correctly assigned to different subgroups: 1,5; 2,5; 3,5; 1,6; 2,6; 3,6
Cell D: 4 pairs correctly assigned to same subgroup: 1,2; 1,3; 2,3; 5,6
Misassignment of actor 4 contributes 3 to cell B and 2 to cell C.
                 OBSERVED SUBGROUP
               DIFFERENT    SAME
              ___________________
    DIFFERENT |  A (6) |  B (3) |
 KNOWN        |--------|--------|
 SUBGROUP SAME|  C (2) |  D (4) |
              -------------------
Odds of recovery = (AD)/(BC) = (6 x 4)/(3 x 2) = 4.00
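The toy example can be checked by counting pairs directly. In this sketch the two partitions are reconstructed from the slide's cell listings (known subgroups {1,2,3} and {4,5,6}, with actor 4 observed alongside 1, 2 and 3). The function name recovery_table is hypothetical.

```python
from itertools import combinations


def recovery_table(known, observed):
    """Count unordered pairs by known vs. observed common membership.

    Returns (a, b, c, d): a = different in both, b = known different but
    observed same, c = known same but observed different, d = same in both.
    """
    a = b = c = d = 0
    for i, j in combinations(sorted(known), 2):
        known_same = known[i] == known[j]
        observed_same = observed[i] == observed[j]
        if known_same and observed_same:
            d += 1
        elif known_same:
            c += 1
        elif observed_same:
            b += 1
        else:
            a += 1
    return a, b, c, d


known = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
observed = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}
a, b, c, d = recovery_table(known, observed)
print(a, b, c, d, (a * d) / (b * c))  # 6 3 2 4 4.0
```

The four counts sum to 15, the number of unordered pairs among six actors.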
Make Sociogram in Netdraw
video : (1:01:00-1:06:22):
ID: kenfrank@msu.edu PW:kenfrank2014
Sometimes Netdraw can’t find the file; retrieve it manually.
Modifying Image in Netdraw
Densities and distances (same blocked matrix as in “Blocked Network Data”, N = 24)
Density from subgroup A to subgroup C: one tie of value 4 in a block of 4 x 8 cells, so Density = 4/(4x8) = 1/8.
KliqueFinder uses Density = 4/(4x5) = .20 because the maximum number of nominations is 5.
DIRECT ASSOCIATIONS (in xxxxxx.clusters)
GROUP          1      2      3      4
LABEL          A      B      C      D
N              4      6      8      6
GROUP
  1         2.42   0.00   0.20   0.05
  2         0.25   1.07   0.13   0.27
  3         0.38   0.40   2.40   0.28
  4         0.21   0.17   0.67   1.17
Data used for multidimensional scaling within subgroups: distance = maximum value / cell entry. E.g., the maximum value is 4, so a tie of 2 gives 4/2 = 2, a distance of 2.
Distance in multidimensional scaling between subgroups = maximum value / density.
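The density and distance rules on this slide reduce to a few divisions. The sketch below is an interpretation, not KliqueFinder's code. The denominator rule (number of senders times the smaller of the available receivers and the maximum number of nominations) is inferred from the slide's 4/(4x5) example and checked against entries of the DIRECT ASSOCIATIONS table. All function names are hypothetical.

```python
def block_density(block_sum, n_senders, n_receivers, max_nominations,
                  within=False):
    """Sum of tie values in a block divided by the possible nominations.

    Assumption: each sender can nominate at most `max_nominations` others,
    and within a subgroup an actor cannot nominate itself.
    """
    available = n_receivers - 1 if within else n_receivers
    return block_sum / (n_senders * min(available, max_nominations))


def within_distance(max_value, tie_value):
    """MDS distance within a subgroup: maximum value / cell entry."""
    return max_value / tie_value


def between_distance(max_value, density):
    """MDS distance between subgroups: maximum value / density."""
    return max_value / density


# Block A -> C of the blocked matrix: one tie of value 4, 4 senders, 8 receivers.
print(block_density(4, 4, 8, 5))                          # 0.2
# Block A -> A: the tie values in the matrix sum to 29.
print(round(block_density(29, 4, 4, 5, within=True), 2))  # 2.42
print(within_distance(4, 2))                              # 2.0
```

Under this reading, the A to C entry of 0.20 would translate into a between-subgroup distance of about 4/0.20 = 20.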
Frank, K. 1996. Mapping interactions within and between
cohesive subgroups. Social Networks 18: 93-119.
cohesion
Structural similarity
video: (1:19:15-1:23:40)) ID: kenfrank@msu.edu PW:kenfrank2014
Choosing lines: Groups
Confidentiality/Ethical issues in Collecting Network Data
• Need names on survey
• Data can be confidential but not anonymous (especially for longitudinal)
• R.L. Breiger, “Ethical Dilemmas in Social Network Research: Introduction to Special Issue.” Social Networks 27 / 2 (2005): 89 – 93. Read it online: http://www.u.arizona.edu/~breiger/2005BreigerIntroEthics.pdf
– (All issues of Social Networks available via ScienceDirect)
• Who benefits from network analysis? Who bears the cost?
– Kadushin, Charles. “Who benefits from network analysis: ethics of social network research.” Social Networks 27 / 2 (2005): Pages 139-153.
• Issues to raise when dealing with Human Subjects Board:
– Klovdahl, Alden S. “Social network research and human subjects protection: Towards more effective infectious disease control.” Pages 119-137.
• Hint on Human Subjects boards: they like precedents. Once you have one network study accepted, refer to it when submitting others!
• https://www.msu.edu/~kenfrank/social%20network/irb%20with%20network%20data.htm
video : (1:23:41-1:28) ID: kenfrank@msu.edu PW:kenfrank2014
The SRI/KliqueFinder Solution to confidentiality: aggregate to subgroups
1) Provide information about who is in which cluster as well as information regarding the resources embedded in each cluster. Resources could be information, expertise, material resources, etc.
Benefit: reveals location of resources relative to social structure.
Protection: does not reveal specific responses because all information is at the cluster level.
2) Provide each respondent with a unique sociogram indicating where that person is located (“you are here”). The figure does not include the lines from a sociogram, so respondents cannot infer others’ responses.
Benefit: respondents then use this as a guide to individual behavior for identifying further resources or information.
Protection: specific responses of others are not revealed, so confidentiality is preserved.
Using subgroups for feedback to respondents and in a proposal
Can even include names of actors
Choosing Lines: Actor Level Within
Choosing Lines: Actor Level (remove group nodes)
Choosing Lines: Actor Level Between
Choosing Lines: Group Level
Modifying the Image:
Adding Node Data or Relations
video : ID: kenfrank@msu.edu PW:kenfrank2014 : (1:49:35-2:07:48)
http://www.analytictech.com/ucinet/download.htm
http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CB0QFjAA&url=http%3A%2F%2Fwww.analytictech.com%2FNetdraw%2FNetdrawGuide.doc&ei=6pC4Tp29Men3sQLv99WoCA&usg=AFQjCNHg_NTjlHOclmeJkwQs2xRaiPYgXQ&sig2=WLwXKSjJq_Yinpfkwv0m4w
http://faculty.ucr.edu/~hanneman/nettext/C4_netdraw.html#data
Files for KliqueFinder
Input data:
– Network data: xxxxxx.list
– Node data: xxxxxx.ilabel
– Alternative network data: xxxxxx.xnet
Parameters:
– Kliqfind.par
– Printo
– Simulate.par
KliqueFinder Output:
– xxxxxx.place: data containing actor IDs and subgroup placement
– xxxxxx.clusters: diagnostics and matrix formatted data
– xxxxxx.vna: for Netdraw
Modifying node data by Editing [datafile].vna:
File is read by Netdraw. Copy relevant data into Excel, edit, and replace.
Add new node variable in the header of *node data (e.g. gender), then add data.
*node data
id type group gender
"0A " 2 1 0
"0B " 2 2 0
"0C " 2 3 0
"0D " 2 4 0
1 1 2 1
2 1 2 2
*Node properties
ID x y color shape size shortlabel active
"0A " -2.01889 -15.04530 16777215 1 30 A TRUE
"0B " -9.41864 15.75047 16777215 1 85 B TRUE
"0C " 2.06574 2.09162 16777215 1 52 C TRUE
"0D " 8.54812 10.10988 16777215 1 79 D TRUE
1 -10.52314 14.16442 16711680 1 10 1 TRUE
2 -8.29999 13.27802 16711680 1 10 2 TRUE
*Tie data (only the first seven columns of each row are visible in the slide)
from to any strength actor group between within technology
1 2 1 3 1 0 0
1 4 1 3 1 0 1
1 19 1 3 1 0 1
1 23 1 2 1 0 1
1 26 1 3 1 0 0
2 26 1 3 1 0 0
2 10 1 1 1 0 1
*Tie properties
FROM TO color size headcolor headsize active
"0A " "0B " 12632256 1 12632256 0 TRUE
"0A " "0C " 12632256 9 12632256 0 TRUE
1 2 0 3 0 8 TRUE
1 4 12632256 3 0 8 TRUE
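Instead of editing in Excel, the "add a node variable, then add data" step can be scripted. The sketch below appends one column to the *node data section of .vna text and leaves every other section untouched. It assumes the layout shown on the slide (a section line starting with *, then a header line, then one row per node). The function name and the default value for unlisted nodes are hypothetical choices.

```python
import shlex


def add_node_attribute(vna_text, name, values, default="0"):
    """Append attribute `name` to the *node data section of .vna text.

    values: dict mapping node id (as text, surrounding spaces stripped)
    to the attribute value; nodes not listed receive `default`.
    """
    out = []
    section = None
    header_pending = False
    for line in vna_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("*"):
            section = stripped.lower()
            header_pending = section == "*node data"
            out.append(line)
        elif section == "*node data" and stripped:
            if header_pending:
                out.append(f"{line} {name}")   # extend the header row
                header_pending = False
            else:
                node_id = shlex.split(stripped)[0].strip()  # handles "0A "
                out.append(f"{line} {values.get(node_id, default)}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


vna = ('*node data\nid type group\n"0A " 2 1\n1 1 2\n2 1 2\n'
       '*Tie data\nfrom to strength\n1 2 3\n')
print(add_node_attribute(vna, "gender", {"1": "1", "2": "2"}), end="")
```

Group nodes such as "0A " are not in the values dictionary here, so they receive the default 0, matching the gender column shown on the slide.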
Adding Node Attributes with Extra File
KliqueFinder will put attributes into the vna file.
Files: xxxxxx.ilabel and xxxxxx.list
File = xxxxxx.ilabel, where xxxxxx is the first 6 characters of your data file (e.g. stanne.ilabel for stanne.list).
Layout: 10 columns for ID; skip a space; name; node attributes 1-5.
Cut and paste into stanne.ilabel:
         1 Jacob 1 3 5
         2 Stan 1 2 5
         3 Linton 1 2 5
         4 Charles 1 3 3
         5 Mark 1 3 3
         6 Tom 2 3 3
         7 Ronald 2 3 5
         8 Nan 2 1 3
         9 Elizabeth 2 1 4
        10 Barry 2 2 3
        11 Martin 2 3 1
        12 Steve 2 3 1
        13 PeterC 2 1 5
        14 Patrick 1 1 1
        15 Katy 1 1 3
        16 Kathleen 3 3 3
        17 Ove 2 2 2
        18 JamesC 5 5 5
        19 Robert 4 4 4
        20 JamesM 1 2 3 4
        21 Noah 4 3 2 1
        22 Marijtje 1 2 1 2
        23 Ronald 2 1 2 1
        24 Harrison 3 1 3 1
        25 Duncan 4 1 4 1
Interactive: adding node data
Include Node Data in Image
Modifying Links
Hostile Actions
Supportive Actions
[Crystallized sociogram of teachers, with subgroups labeled A-E]
• Each number is a teacher
• G_ indicates grade in which teacher teaches
• Lines connecting two numbers indicate teachers who are close colleagues (solid lines within subgroups, dashed between)
• Circles indicate cohesive subgroups
Ripple Plot
• Overlay talk about technology on social
geography of crystallized sociogram
• Lines indicate talk about technology
• Size of dot indicates teacher’s use of
technology at time 1
• Ripples indicate increase in use from time
1 to time 2
Frank, K. A. and Zhao, Y. (2005). "Subgroups as a Meso-Level Entity in the Social Organization of Schools." Chapter 10, pages 279-318. Book honoring Charles Bidwell's retirement, edited by Larry Hedges and Barbara Schneider. New York: Sage publications.
Modifying Links by Editing [datafile].vna:
File is read by Netdraw. Copy relevant data into Excel, edit, and replace.
The *node data and *Node properties sections are the same as under “Modifying node data by Editing [datafile].vna”.
Add new relation in the header of *Tie data (e.g. technology), then add data.
*Tie data (only the first seven columns of each row are visible in the slide)
from to any strength actor group between within technology
1 2 1 3 1 0 0
1 4 1 3 1 0 1
1 19 1 3 1 0 1
1 23 1 2 1 0 1
1 26 1 3 1 0 0
2 26 1 3 1 0 0
2 10 1 1 1 0 1
*Tie properties
FROM TO color size headcolor headsize active
"0A " "0B " 12632256 1 12632256 0 TRUE
"0A " "0C " 12632256 9 12632256 0 TRUE
1 2 0 3 0 8 TRUE
1 4 12632256 3 0 8 TRUE
Modifying Links with Extra File
KliqueFinder will put attributes into the vna file.
Files: xxxxxx.xnet and xxxxxx.list
File = xxxxxx.xnet, where xxxxxx is the first 6 characters of your data file (e.g. stanne.xnet for stanne.list).
File containing extra network:
Nominator   nominee   strength of tie
        1         2         4
       19        15         3
       22        26         1
Modifying Links: Interactive – Finicky
Interactive Modifying Links
Two mode
*Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. “Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events.” Social Networks 28:97-123. * co-first authors.
Data source
video : ID: kenfrank@msu.edu PW:kenfrank2014:(1:39:25-1:49:35)
Copy homact.list from c:\kliqfind\setups to c:\kliqfind
Two-mode Data: Edgelist
The first two rows of the example do not appear in the data; they are there to show the format: 10 spaces for each entry.
Each row is one tie, e.g.: actor 1 participates in event 19 at a level of 1.
Extent of relation can be binary or weighted.
The new version of KliqueFinder is more flexible about the 10-column widths.
IDs should be 6 digits or less.
See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.
Two mode: Clusters output
Blocked Two-Mode Network Data
Two-mode Crystallized Sociogram
Centralization & Centrality in KliqueFinder
• KliqueFinder produces a measure of Warp.
• Starts with distances defined by
– Maximum value in network / observed value
• E.g., if the maximum is 4 and a particular tie is 1, then the distance is 4/1 = 4.
– These are the distances used in the MDS to produce the sociograms (see “running KliqueFinder ppt”)
• Obtains eigenvalues
– Within each cluster, based on raw data within cluster
– Between clusters, based on 1/density of ties between clusters
• Density = average value in a given block
• Warp = sum of positive eigenvalues / sum of all eigenvalues
– Note it does not use the square root of the eigenvalues (variances are more additive)
• Output into xxxxxx.bcord (9th element) and into Netdraw as node attribute for groups, called “centrality”
• Centrality for individuals is distance to the center of their subgroup (radius).
Running on a Large Data File (more than 1000 actors)
If you start the program and it just sits there, it is looking for the best seed for the first subgroup. A seed is 3 actors, but the program looks at all combinations of 3 that share common ties in the network.
This is intensive, and unnecessary for large data (the 1st subgroup does not matter so much). To shortcut: change value from 12. Save & run.
Software Challenge
video : ID: kenfrank@msu.edu PW:kenfrank2014 :(2:07:57-2:08:15)
• Analyze nonpr1.list
– Evidence of clusters?
– Performance of algorithm?
• Replace lines with nonpr2
• Describe the KliqueFinder algorithm
KliqueFinder Applications: Adding Individual Attributes in SAS
run KliqueFinder
data file collt1.list
make graph
use ID from other file? Yes:
sas file name: c:\kliqfind\indiv [be sure to include full path]
id variable: nominator
string variable: gradelev
Save
In SAS, run socgramz in the working directory
KliqueFinder Applications: Adding Individual Attributes
• Select “Yes” for “User ID (character) from other SAS file?”
• Type the following information in the corresponding boxes
• Then click “Save”
Choosing an ID Variable
With ID based on Grade
KliqueFinder Applications: Replacing Lines
run KliqueFinder
data file collt1.list
make graph
save
retrieve socgramz.sas in the working directory
replace all occurrences of collt1.list with collt2.list
run
Opening socgramz.sas
Changing lines
Change lines to different source
New Lines based on Collt2
Batch KliqueFinder
Basics
• Program runs KliqueFinder on multiple files
• Input
– List of filenames
– Files containing data
– BACK UP YOUR DATA FIRST!
• Output
– Clustering output (.place, .clusters, .vna) for each list file
Files
File containing names of data files: testb.txt
Data file: stanne.list
Data file: ffe.list
KliqueFinder
• Browse to directory you want to work in
• Choose “Basic setup” and then click the “Run setup file” button.
Running Batch Mode
BACK UP DATA FILES BEFORE RUNNING!
Specify the file with the names of the data files, then click the button to run as batch.
Prepping data in Excel
video : ID: kenfrank@msu.edu PW:kenfrank2014 :Time: (1:28-1:39)
Name your file xxxxxx.list, e.g., test01.list
Right click and choose Formatted text (space delimited)
Prepping Data in UCINET
Navigate to UCINET data
Navigate to where you want to save: c:\kliqfind
Must remove “!” from the file. There may be several; the exclamation points are there because of multiple data sets.
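Converting data using SAS
video : ID: kenfrank@msu.edu PW:kenfrank2014 : Time: (2:10:43-2:19)
* Read the badly formatted edgelist as free-format columns;
data one;
infile "badform.list";
input chooser chosen wt;
run;
* Write each tie in three 10-column fields, skipping missing weights;
data two; set one;
file "ready1.list";
if wt ne . then put (chooser chosen wt) (10.);
run;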
A Priori Clusters
A line with 99999 in the data file indicates the a priori cluster in which an actor is placed.
For example, actor 1 is in a priori cluster 3.
Run repeat2 setup, and then proceed as usual.
Remember to do “new data” setup when done.
KliqueFinder will make pictures based on a priori clusters.
Comparison of A Priori Clusters and Identified Solution
Data with a priori cluster assignments
Run as new data
Run as usual, then look at cluster output:
SIMILARITY BETWEEN THE START AND END GROUPS:
ACTUAL           52.
POSS             88.
STANDARDIZED     9.55565
QAP standardized measure; compare with normal distribution.
Data Containing Cluster Assignments
File called stanne.place [datafile.place]
There may be slightly different numeric formats depending on the version of KliqueFinder.
Internal ID   User ID   Cluster   (last two columns: ignore, for simulation only)
       1.0       1.0       2.0       1.0       3.0
       2.0       2.0       2.0       1.0       3.0
       3.0       4.0       1.0       1.0       3.0
       4.0      19.0       4.0       1.0       3.0
       5.0      23.0       4.0       1.0       3.0
       6.0      26.0       2.0       1.0       3.0
      17.0       6.0       3.0       1.0       3.0
      18.0       8.0       3.0       1.0       3.0
      19.0      20.0       3.0       1.0       3.0
      20.0      15.0       1.0       1.0       3.0
      21.0      12.0       2.0       1.0       3.0
      22.0      17.0       4.0       1.0       3.0
      23.0      16.0       4.0       1.0       3.0
      24.0      27.0       4.0       1.0       3.0
     -27.0      28.0       4.0       1.0       3.0
If the first number (internal ID) is negative, this indicates a tagalong: an actor connected to only one other.
In this case, the last line should be read as the tagee, tagger, and group.
So, actor 28 is connected to only one other actor (27) and is therefore assigned to actor 27’s cluster, which is cluster 4.
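A short reader for the .place layout described above, useful when cluster membership is needed outside SAS or SPSS. It assumes whitespace-separated columns in the order internal ID, user ID, cluster (later columns ignored). It treats a negative internal ID as a tagalong whose absolute value identifies the actor it is attached to, following the slide's reading of the last line. The function name read_place is hypothetical.

```python
def read_place(text):
    """Parse .place text into (clusters, tagalongs).

    clusters:  dict user ID -> cluster number
    tagalongs: dict user ID -> the actor the tagalong is attached to
    """
    clusters = {}
    tagalongs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        internal_id, user_id, cluster = (int(float(x)) for x in fields[:3])
        clusters[user_id] = cluster
        if internal_id < 0:
            # negative first number marks a tagalong (see slide text)
            tagalongs[user_id] = -internal_id
    return clusters, tagalongs


sample = (
    "1.0 1.0 2.0 1.0 3.0\n"
    "24.0 27.0 4.0 1.0 3.0\n"
    "-27.0 28.0 4.0 1.0 3.0\n"
)
clusters, tagalongs = read_place(sample)
print(clusters)   # {1: 2, 27: 4, 28: 4}
print(tagalongs)  # {28: 27}
```

The values are read through float first because the file stores IDs as 1.0, 2.0 and so on, and the numeric format may differ slightly between KliqueFinder versions.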
Including Cluster Membership in Influence Model

SPSS
DATA LIST / intid 1-10 nominee 11-20 cluster 21-30 simx 31-40 extra 41-50.
BEGIN DATA
       1.0       1.0       1.0       1.0       3.0
       2.0       2.0       1.0       1.0       3.0
       3.0       3.0       1.0       1.0       3.0
       4.0       4.0       2.0       1.0       3.0
       5.0       5.0       2.0       1.0       3.0
       6.0       6.0       2.0       1.0       3.0
END DATA.
DATASET NAME clusters WINDOW=FRONT.
SORT CASES BY nominee(A).
EXECUTE.
MATCH FILES /FILE=yvar1
  /FILE='indeg'
  /FILE=clusters
  /BY nominee.
EXECUTE.

SAS
data clusters;
*groups from KliqueFinder;
input intid nominator cluster simx extra;
cards;
1.0 1.0 1.0 1.0 3.0
2.0 2.0 1.0 1.0 3.0
3.0 3.0 1.0 1.0 3.0
4.0 4.0 2.0 1.0 3.0
5.0 5.0 2.0 1.0 3.0
6.0 6.0 2.0 1.0 3.0
;
proc sort data=clusters;
by nominator;
data withinfl;
merge yvar2 yvar1 infl expanse clusters
attract(rename=(nominee=nominator));
by nominator;
drop nominee _type_ _freq_;
run;

Advanced:
– run influence model for technology
– identify clusters from talkt2
– include cluster membership in the influence model
Adding Patches
Patch for one-mode
Patch for two-mode
Alternative community detection algorithms
• http://cs.stanford.edu/people/jure/pubs/communities-www10.pdf
• http://www.uvm.edu/~pdodds/files/papers/others/2009/lancichinetti2009a.pdf
• http://fatweasel.net/analytics/networkanalysis/community-detection-in-networks/