Foundations of network analysis

Foundations of Network Analysis
Overview
Theory: A Structural Approach to Sociology
•Emirbayer
•Martin
Methods:
•Points and Lines
•Data formats
•Matrices
•Adjacency Lists
•Edge Lists
•Basic Graph Theory
Homework Results
JWM’s 3-step kinship neighborhood (plus in-laws for fun)
N=70+
Foundations
Theory
“A manifesto for Relational Sociology”
•“Substantialism vs Relationalism”
•Theoretical Domains:
Power, equality, freedom, agency
•Substantive domains (research):
Social Structure
Network analysis
Culture
Social Psychology
•Problems
Boundary specification
Network dynamics
Causality
Normative implication
Foundations
Theory
“Structural Analysis: from method and
metaphor to theory and substance.” (Wellman,
you didn’t read this)
H. White: “The presently existing, largely categorical
descriptions of social structure have no solid theoretical
grounding; furthermore, network concepts may provide
the only way to construct a theory of social structure.”
(p.25)
Form Vs. Content
Integration of large-scale social
systems
Foundations
Theory
“Structural Analysis: from method and
metaphor to theory and substance.”
Major Claims:
•Structured social relationships are a more powerful source of
sociological explanation than personal attributes of system members.
•Norms emerge from location in structured systems of social
relationships
•Social Structures determine the operation of dyadic relationships
•The world is composed of networks, not groups
•Structural methods supplant and supplement individualistic methods
Foundations
Theory
Social Structures
•Goal: To provide an analytic understanding of social structures “from
the ground up”  by asking what limitations are created by forms of
relations.
An analytic approach to explaining institutions: imagine a non-contradictory aggregation process of individual actions that yields the
observed institution. So institutions are the “crystallization” of
relationships:
Foundations
Theory
Social Structures
•The first question is how to characterize “social relations” – by form,
content, quality, quantity…?
•JLM focuses on a formal aspect of the base relation:
•Examples:
•Symmetric: ab implies bA
•Asymmetric: ab does not necessarily imply ba
•Antisymmetric: ab forbids ba
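The three formal types above can be checked mechanically on a set of observed ties. The following is a minimal sketch, assuming a relation is stored as a set of (sender, receiver) pairs; the relation names (`spouse_of`, `likes`, `boss_of`) and their ties are hypothetical illustrations, not data from the readings.

```python
def is_symmetric(ties):
    """True if every tie a->b is matched by a tie b->a."""
    return all((b, a) in ties for a, b in ties)


def is_antisymmetric(ties):
    """True if no tie a->b (with a != b) is ever matched by b->a."""
    return not any((b, a) in ties for a, b in ties if a != b)


# Hypothetical relations, each a set of (sender, receiver) pairs.
spouse_of = {("a", "b"), ("b", "a")}           # symmetric
likes = {("a", "b"), ("b", "a"), ("b", "c")}   # asymmetric: ab need not imply ba
boss_of = {("a", "b"), ("b", "c")}             # antisymmetric: ab forbids ba

for name, ties in [("spouse_of", spouse_of), ("likes", likes), ("boss_of", boss_of)]:
    print(name, "symmetric:", is_symmetric(ties), "antisymmetric:", is_antisymmetric(ties))
```

A relation that passes neither check (like `likes` here) is the asymmetric case: some ties are reciprocated and some are not.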
Foundations
Theory
Social Structures
•The second question is how to characterize “social structure”?
Do so w. respect to particular people, rather than roles/classes.
Foundations
Data
The unit of interest in a network is the combined set of
actors and their relations.
We represent actors with points and relations with lines.
Actors are referred to variously as:
Nodes, vertices, actors or points
Relations are referred to variously as:
Edges, Arcs, Lines, Ties
Example:
[Figure: example graph on nodes a, b, c, d, e]
(Review from last class…)
Foundations
Data
Social Network data consists of two linked classes of data:
a) Nodes: Information on the individuals (actors, nodes, points, vertices)
•Network nodes are most often people, but can be any other unit capable of
being linked to another (schools, countries, organizations, personalities, etc.)
•The information about nodes is what we usually collect in standard social
science research: demographics, attitudes, behaviors, etc.
•Often includes dynamic information about when the node is active
b) Edges: Information on the relations among individuals (lines, edges, arcs)
•Records a connection between the nodes in the network
•Can be valued, directed (arcs), binary or undirected (edges)
•One-mode (direct ties between actors) or two-mode (actors share membership
in an organization)
•Includes the times when the relation is active
Graph theory notation: G(V,E)
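One way to hold the two linked classes of data is sketched below. This is a minimal illustration, assuming small in-memory data; the `Node` and `Edge` classes, the helper `neighbors`, and the example values are all hypothetical names introduced here, not part of any package used in class.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    attributes: dict = field(default_factory=dict)  # demographics, attitudes, etc.


@dataclass
class Edge:
    sender: str
    receiver: str
    value: float = 1.0      # 1.0 for a binary tie, other values for a valued tie
    directed: bool = True   # True for an arc, False for an undirected edge


# Hypothetical data: V is the node set and E the edge set of G(V, E).
V = [Node("a", {"age": 13}), Node("b", {"age": 10}), Node("c", {"age": 7})]
E = [Edge("a", "b"), Edge("b", "c", value=3.0), Edge("a", "c", directed=False)]


def neighbors(name, edges):
    """Names reachable in one step from `name`, honoring tie direction."""
    out = set()
    for e in edges:
        if e.sender == name:
            out.add(e.receiver)
        elif e.receiver == name and not e.directed:
            out.add(e.sender)
    return out


print(neighbors("a", E))  # the nodes a is tied to
print(neighbors("c", E))  # c reaches a only through the undirected edge
```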
(Review from last class…)
Foundations
Data
In general, a relation can be: (1) Binary or Valued (2) Directed or Undirected
[Figure: four example graphs on nodes a to e, one per combination: Directed, binary; Undirected, binary; Undirected, valued; Directed, valued]
The social process of interest will often determine what form your data take. Almost all of the
techniques and measures we describe can be generalized across data format.
Social Network Data
Basic Data Elements
In general, a relation can be: (1) Binary or Valued (2) Directed or Undirected
[Figure: example graph on nodes a to e with directed, multiplex categorical edges]
The social process of interest will often determine what form your data take. Conceptually, almost
all of the techniques and measures we describe can be generalized across data format, but you may
have to do some of the coding work yourself….
Foundations
Data
[Figure: nested levels of network data, labeled: Best Friend Dyad, Primary Group, Ego-Net, 2-step Partial network, Global-Net]
Foundations
Data
We can examine networks across multiple levels:
1) Ego-network
- Have data on a respondent (ego) and the people they are connected to
(alters). Example: 1985 GSS module
- May include estimates of connections among alters
2) Partial network
- Ego networks plus some amount of tracing to reach contacts of
contacts
- Something less than full account of connections among all pairs of
actors in the relevant population
- Example: CDC Contact tracing data for STDs
Foundations
Data
We can examine networks across multiple levels:
3) Complete or “Global” data
- Data on all actors within a particular (relevant) boundary
- Never exactly complete (due to missing data), but boundaries are set
-Example: Coauthorship data among all writers in the social
sciences, friendships among all students in a classroom
A Little network Visualization History
Euler, 1741
Euler’s treatment of the “Seven Bridges of Königsberg” problem is one of the
first moments of graph theory….
A Little network Visualization History
The study of networks has depended on a graphical element since its first
moments:
Or early representations of organizational relations (1921)
A Little network Visualization History
The study of networks has depended on a graphical element since its first
moments:
..but Moreno’s sociograms from Who Shall Survive (1934) are typically seen as
the beginnings of social network analysis (certainly if you were to ask
Moreno!).
A Little network Visualization History
Lundberg & Steele 1938 – Using a “Social Atom” representation
The flow of images continued over time, marking a wide range of potential
styles….
A Little network Visualization History
Charles Loomis – 1948
Loomis, 1940s
The flow of images continued over time, marking a wide range of potential
styles….
A Little network Visualization History
Northway’s – “Target Sociograms”
Bronfenbrenner, 1941
Northway 1952
The flow of images continued over time, marking a wide range of potential
styles….
A Little network Visualization History
“Viral Marketing” is perhaps the most recent advocate, with this
ad appearing in popular women’s magazines…
The flow of images continued over time, marking a wide range of potential
styles….
Foundations
Graphs
A good network drawing allows viewers to come away from the image with an almost
immediate intuition about the underlying structure of the network being displayed.
However, because there are multiple ways to display the same information, and standards
for doing so are few, the information content of a network display can be quite variable.
Consider the 4 graphs drawn at right.
After asking yourself what intuition
you gain from each graph, click on
the screen.
Now trace the actual pattern of ties.
You will see that these 4 graphs are
exactly the same.
Why Visualize Network at all?
While the history is deeply rooted in visual analysis, why bother? Consider
Anscombe’s answer in the 1973 American Statistician (replicated in Tufte)
 X      y1      y2      y3
 4     4.26    3.10    5.39
 5     5.68    4.74    5.73
 6     7.24    6.13    6.08
 7     4.82    7.26    6.42
 8     6.95    8.14    6.77
 9     8.81    8.77    7.11
10     8.04    9.14    7.46
11     8.33    9.26    7.81
12    10.84    9.13    8.15
13     7.58    8.74   12.74
14     9.96    8.10    8.84
These 3 series seem very similar, when
viewed statistically
N=11
Mean of Y = 7.5
Reg Equation: Y = 3 + .5(X)
SE of slope estimate: 0.118
T=4.24
Sum of squares of X about its mean: 110
Regression SS: 27.5
Correlation Coeff: 0.82
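The summary statistics above can be recomputed directly from the three series. This is a minimal sketch using only the values in the table; the function name `summarize` is a hypothetical helper, and the regression is ordinary least squares of each y on X.

```python
from math import sqrt

x = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
series = {
    "y1": [4.26, 5.68, 7.24, 4.82, 6.95, 8.81, 8.04, 8.33, 10.84, 7.58, 9.96],
    "y2": [3.10, 4.74, 6.13, 7.26, 8.14, 8.77, 9.14, 9.26, 9.13, 8.74, 8.10],
    "y3": [5.39, 5.73, 6.08, 6.42, 6.77, 7.11, 7.46, 7.81, 8.15, 12.74, 8.84],
}


def summarize(x, y):
    """Mean of y, OLS slope and intercept of y on x, correlation, and SS of x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_y": my,
        "slope": slope,
        "intercept": my - slope * mx,
        "r": sxy / sqrt(sxx * syy),
        "sxx": sxx,
    }


results = {name: summarize(x, y) for name, y in series.items()}
for name, stats in results.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All three series round to the same mean, regression line, and correlation, which is exactly why the plots are needed to tell them apart.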
Why Visualize Network at all?
While the history is deeply rooted in visual analysis, why bother? Consider
Anscombe’s answer in the 1973 American Statistician (replicated in Tufte)
We might expect a relation like this:
[Figure: scatterplot, both axes running 0 to 15]
Why Visualize Network at all?
While the history is deeply rooted in visual analysis, why bother? Consider
Anscombe’s answer in the 1973 American Statistician (replicated in Tufte)
…but could have this…
[Figure: scatterplot, both axes running 0 to 15]
Why Visualize Network at all?
While the history is deeply rooted in visual analysis, why bother? Consider
Anscombe’s answer in the 1973 American Statistician (replicated in Tufte)
…or this… Or many more.
[Figure: scatterplot, both axes running 0 to 15]
Why Visualize Network at all?
While the history is deeply rooted in visual analysis, why bother? Consider
Anscombe’s answer in the 1973 American Statistician (replicated in Tufte)
[Figure: the three scatterplots side by side, each with axes running 0 to 15]
Visualization allows you to see the relations among elements “in the whole” – a
complete macro-vision of your data in ways that summary statistics cannot.
This is largely because a good summary statistic captures a single dimension,
while visualization allows us to layer dimensionality and relations among them.
Why Visualize Network at all?
But consider changing a key feature of the scatterplot: the scaled ordering of the
axes.
[Figure: the same scatterplot drawn twice. Standard View: axis values in order from 0 to 15. Permuted View: the same points with the axis values in scrambled order]
Technically, all the information is retained – but the presentation provides no
new information.
Why Visualize Network at all?
Now consider network visualizations: We lack a determinate coordinate system, having
only “adjacent” or “not” distinguished by a connecting line. Thus, there are many ways to
represent the same data. Consider the “Zachary Karate Club” data:
3 representations of the same underlying data
Original (Zachary, 1977)
White & Harary, 2001
Kolaczyk, Eric D., Chua, David B.,
Barthélemy, Marc (2009)
The exact same data, presented in three distinct ways. We’d never see
this with a scatter plot…
Foundations
Graphs
Network visualization helps build intuition, but you have to keep the drawing
algorithm in mind:
Spring-embedder layouts
Most effective with graphs that have a strong
community structure (clustering, etc.). Provides a very
clear correspondence between social distance and
plotted distance.
Tree-based layouts
Most effective for very sparse,
regular graphs. Very useful
when relations are strongly
directed, such as organization
charts and internet connections.
Two images of the same network
Foundations
Graphs
Network visualization helps build intuition, but you have to keep the drawing
algorithm in mind:
Tree-Based layouts
Spring-embedder layouts
Two images of the same network
Foundations
Graphs
Network visualization helps build intuition, but you have to keep the drawing
algorithm in mind.
Hierarchy & Tree models
Use optimization routines to add meaning to the “Y-axis” of the plot. This
makes it possible to easily see who is most central because of who is on the
top of the figure. Usually includes some routine for minimizing line crossing.
Spring Embedder layouts
Work on an analogy to a physical system: ties connecting a pair have
‘springs’ that pull them together. Unconnected nodes have springs that push
them apart. The resulting image reflects the balance of these two features.
This usually creates a correspondence between physical closeness and
network distance.
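The spring analogy can be sketched in a few lines. This is a simplified force-directed routine in the style of Fruchterman and Reingold, not the algorithm of any particular package: as a simplifying assumption it makes every pair of nodes repel (not only unconnected pairs), while only tied pairs attract. The function name `spring_layout`, its parameters, and the example edge list are illustrative choices.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: (x, y)} after balancing pairwise repulsion and tie attraction."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = math.sqrt(1.0 / len(nodes))  # rough ideal spacing between nodes
    for step in range(iterations):
        temp = 0.1 * (1.0 - step / iterations)  # cap on movement, cooling toward 0
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion: every pair of nodes pushes apart.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                push = k * k / d
                disp[u][0] += dx / d * push
                disp[u][1] += dy / d * push
                disp[v][0] -= dx / d * push
                disp[v][1] -= dy / d * push
        # Attraction: each tie acts as a spring pulling its endpoints together.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            pull = d * d / k
            disp[u][0] -= dx / d * pull
            disp[u][1] -= dy / d * pull
            disp[v][0] += dx / d * pull
            disp[v][1] += dy / d * pull
        # Move each node along its net force, limited by the current temperature.
        for v in nodes:
            length = max(math.hypot(disp[v][0], disp[v][1]), 1e-9)
            move = min(length, temp)
            pos[v][0] += disp[v][0] / length * move
            pos[v][1] += disp[v][1] / length * move
    return {v: (p[0], p[1]) for v, p in pos.items()}


# Illustrative six-node graph, given as an undirected edge list.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "f"), ("b", "c"), ("c", "d"), ("c", "e"), ("c", "f"), ("d", "e")]
layout = spring_layout(nodes, edges)
for name, (px, py) in layout.items():
    print(name, round(px, 3), round(py, 3))
```

Because the starting positions are random, different seeds give different but structurally similar pictures, which is one more reason to keep the drawing algorithm in mind when reading a plot.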
Foundations
Graphs
[Figure: network plot with nodes coded by sex. Legend: Male, Female]
Foundations
Graphs
Using colors to code
attributes makes it simpler to
compare attributes to
relations.
Here we can assess the
effectiveness of two different
clustering routines on a
school friendship network.
Foundations
Graphs
As networks increase in size, the
effectiveness of a point-and-line
display diminishes, because you
simply run out of plotting
dimensions.
I’ve found that you can still get
some insight by using the
‘overlap’ that results from a
space-based layout as
information.
Here you see the clustering
evident in movie co-starring for
about 8000 actors.
Foundations
Graphs
As networks increase in size, the
effectiveness of a point-and-line
display diminishes, because you
simply run out of plotting
dimensions.
I’ve found that you can still get
some insight by using the
‘overlap’ that results from a
space-based layout as
information.
This figure contains over 29,000
social science authors. The two
dense regions reflect different
topics.
Foundations
Graphs
Adding time to social networks is
also complicated, as you run out
of space to put time in most
network figures.
One solution is to animate the
network.
Here we see streaming interaction
in a classroom, where the teacher
(yellow square) has trouble
maintaining order.
The SONIA software program
(McFarland and Bender-deMoll)
will produce these figures.
Black ties: Teaching relevant communication
Blue ties: Positive social communications
Red ties: Negative social communication
Source: Moody, James, Daniel A. McFarland and Skye Bender-DeMoll (2005) "Dynamic Network Visualization: Methods for Meaning with Longitudinal Network
Movies” American Journal of Sociology 110:1206-1241
Foundations
Methods
Graphs are cumbersome to work with analytically, though there is a
great deal of good work to be done on using visualization to build network
intuition.
I recommend using layouts that optimize on the feature you are most interested
in. The two I use most are hierarchical layouts and force-directed layouts.
Foundations
Methods
From pictures to matrices
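[Figure: the undirected and directed example graphs on nodes a to e, each shown next to its adjacency matrix]
Undirected, binary
    a  b  c  d  e
a   0  1  0  0  0
b   1  0  1  0  0
c   0  1  0  1  1
d   0  0  1  0  1
e   0  0  1  1  0
Directed, binary
    a  b  c  d  e
a   0  1  0  0  0
b   1  0  0  0  0
c   0  1  0  1  1
d   0  0  0  0  0
e   0  0  1  1  0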
Foundations
Methods
From matrices to lists
Adjacency Matrix
    a  b  c  d  e
a   0  1  0  0  0
b   1  0  1  0  0
c   0  1  0  1  1
d   0  0  1  0  1
e   0  0  1  1  0
Adjacency List
a: b
b: a c
c: b d e
d: c e
e: c d
Arc List
a b
b a
b c
c b
c d
c e
d c
d e
e c
e d
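The conversion from a matrix to the two list formats is mechanical. A minimal sketch, assuming the matrix is a list of rows and using the undirected example on nodes a to e; the helper names `to_adjacency_list` and `to_arc_list` are hypothetical.

```python
labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]


def to_adjacency_list(matrix, labels):
    """One entry per node, listing every node it sends a tie to."""
    return {
        labels[i]: [labels[j] for j, cell in enumerate(row) if cell]
        for i, row in enumerate(matrix)
    }


def to_arc_list(matrix, labels):
    """One (sender, receiver) pair per non-zero cell of the matrix."""
    return [
        (labels[i], labels[j])
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
        if cell
    ]


adjacency = to_adjacency_list(matrix, labels)
arcs = to_arc_list(matrix, labels)
print(adjacency)
print(arcs)
```

For sparse networks the list formats are far smaller than the matrix, since they store only the ties that exist.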
Foundations
Basic Measures
Basic Measures & A little graph theory
For greater detail, see:
http://www.analytictech.com/networks/graphtheory.htm
Volume
The first measure of interest is the simple volume of
relations in the system, known as density, which is the
average relational value over all dyads. Under most
circumstances, it is calculated as:
$$D = \frac{\sum X}{N(N-1)}$$
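As a worked check of the density formula, here is a minimal sketch that sums the off-diagonal cells of an adjacency matrix and divides by N(N-1). The function name `density` is a hypothetical helper; the two matrices are the directed and undirected five-node examples on nodes a to e.

```python
def density(matrix):
    """Sum of off-diagonal cells divided by the N(N-1) ordered pairs."""
    n = len(matrix)
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


directed = [
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
]
undirected = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]

print(density(directed))    # 7 arcs over 20 ordered pairs
print(density(undirected))  # 10 non-zero cells over 20 ordered pairs
```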
Foundations
Basic Measures
Basic Measures & A little graph theory
Volume
At the individual level, volume is the number of relations sent
or received: out-degree is the row sum and in-degree the column sum
of the adjacency matrix.
    a  b  c  d  e
a   0  1  0  0  0
b   1  0  0  0  0
c   0  1  0  1  1
d   0  0  0  0  0
e   0  0  1  1  0
Node   In-Degree   Out-Degree
a      1           1
b      2           1
c      1           3
d      2           0
e      1           2
Mean:  7/5         7/5
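The degree table can be reproduced from the row and column sums. A minimal sketch using the directed example matrix, where rows are senders and columns are receivers:

```python
labels = ["a", "b", "c", "d", "e"]
X = [
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
]

# Out-degree: ties sent, the sum across each row.
out_degree = {labels[i]: sum(row) for i, row in enumerate(X)}
# In-degree: ties received, the sum down each column.
in_degree = {labels[j]: sum(row[j] for row in X) for j in range(len(X))}
mean_degree = sum(out_degree.values()) / len(labels)

for node in labels:
    print(node, "in:", in_degree[node], "out:", out_degree[node])
print("mean:", mean_degree)
```

The mean in-degree and mean out-degree are always equal, because every arc is sent by one node and received by another.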
Foundations
Data
Basic Measures & A little graph theory
Reachability
Indirect connections are what make networks systems. One
actor can reach another if there is a path in the graph
connecting them.
[Figure: two example graphs on nodes a to f illustrating reachability]
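Reachability can be checked by following ties outward from a starting node until nothing new turns up. The sketch below uses a breadth-first search; the function name `reachable_from` and the small directed tie list are hypothetical, chosen only to show that reachability need not be mutual.

```python
from collections import deque


def reachable_from(start, ties):
    """Set of nodes that `start` can reach by following directed ties."""
    successors = {}
    for sender, receiver in ties:
        successors.setdefault(sender, []).append(receiver)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Hypothetical directed ties: a chain a -> b -> c -> d, plus e -> c.
ties = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "c")]
print(reachable_from("a", ties))  # a reaches b, c and d
print(reachable_from("d", ties))  # d sends no ties, so it reaches no one
print(reachable_from("e", ties))  # e reaches c and, through c, d
```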
Foundations
Basic Matrix Operations
One of the key advantages to storing networks as matrices is that we can use all of the
tools from linear algebra on the socio-matrix.
Some of the basic matrix manipulations that we use are as follows:
1)
Definition
A matrix is any rectangular array of numbers. We refer to the matrix dimension as
the number of rows and columns
Adjacency matrix (5 x 5)
    a  b  c  d  e
a   0  1  0  0  0
b   1  0  0  0  0
c   0  1  0  1  1
d   0  0  0  0  0
e   0  0  1  1  0
Category indicator matrix (6 x 2)
W  B
1  0
1  0
0  1
0  1
0  1
1  0
Age vector (6 x 1)
Age
13
10
7
8
16
11
Foundations
Basic Matrix Operations
Matrix operations work on the elements of the matrix in particular ways. To do so,
the matrices must be conformable. That means the sizes allow the operation.
For addition (+), subtraction (-), or elementwise multiplication (#), both matrices
must have the same number of rows and columns. For these operations, the matrix
value is the operation applied to the corresponding cell values.
A =
 1  3
 4  7
 2  5
B =
 2  3
 7  1
 0  4
A+B =
 3  6
11  8
 2  9
A-B =
-1  0
-3  6
 2  1
A#B =
 2  9
28  7
 0 20
Multiplication by a scalar: 3A =
 3  9
12 21
 6 15
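The cell-by-cell operations above can be verified with a short sketch. It assumes matrices are stored as lists of rows; the helper `elementwise` is a hypothetical name, and it raises an error when the two matrices are not conformable.

```python
from operator import add, mul, sub

A = [[1, 3], [4, 7], [2, 5]]
B = [[2, 3], [7, 1], [0, 4]]


def elementwise(op, X, Y):
    """Apply `op` to corresponding cells; X and Y must have identical shape."""
    if len(X) != len(Y) or any(len(r) != len(s) for r, s in zip(X, Y)):
        raise ValueError("matrices are not conformable")
    return [[op(x, y) for x, y in zip(r, s)] for r, s in zip(X, Y)]


total = elementwise(add, A, B)       # A + B
difference = elementwise(sub, A, B)  # A - B
product = elementwise(mul, A, B)     # A # B (elementwise, not matrix, product)
scaled = [[3 * x for x in row] for row in A]  # 3A

for name, M in [("A+B", total), ("A-B", difference), ("A#B", product), ("3A", scaled)]:
    print(name, M)
```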
Foundations
Basic Matrix Operations
The transpose (` or T) of a matrix reverses the row and column
dimensions.
$A^T_{ij} = A_{ji}$
So an M x N matrix becomes an N x M matrix.
a b  T
c d     =   a c e
e f         b d f
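In code the transpose is a one-liner. A minimal sketch, assuming a matrix stored as a list of rows; the function name `transpose` and the numeric example are illustrative.

```python
def transpose(M):
    """Swap rows and columns: cell (i, j) of the result is cell (j, i) of M."""
    return [list(column) for column in zip(*M)]


A = [[1, 3], [4, 7], [2, 5]]  # a 3 x 2 matrix
At = transpose(A)             # becomes 2 x 3
print(At)
print(transpose(At) == A)     # transposing twice restores the original
```

For an adjacency matrix, the transpose reverses the direction of every tie: senders become receivers.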
Foundations
Basic Matrix Operations
The matrix multiplication (x) of two matrices involves all elements of the
matrix, and will often result in a matrix of new dimensions. In general, to
be conformable, the inner dimension of both matrices must match. So:
A (3 x 2) x B (2 x 3) = C (3 x 3)
But
A (3 x 3) x B (2 x 3) is not defined
Substantively, adding ‘names’ to the dimensions will help us keep track of
what the resulting multiplications mean:
So multiplying (send x receive) x (send x receive) = (send x receive) gives
us the two-step connections (the sender’s recipients’ receivers).
Foundations
Basic Matrix Operations
The multiplication of two matrices A (m x n) and B (n x q) results in C (m x q):
$$C_{mq} = \sum_{k=1}^{n} a_{mk} b_{kq}$$
Example, (2 x 2) times (2 x 2):
a b       e f       ae+bg  af+bh
c d   x   g h   =   ce+dg  cf+dh
Example, (3 x 2) times (2 x 3) gives (3 x 3):
a b       g h i       ag+bj  ah+bk  ai+bl
c d   x   j k l   =   cg+dj  ch+dk  ci+dl
e f                   eg+fj  eh+fk  ei+fl
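The sum formula translates directly into code. A minimal sketch, assuming matrices stored as lists of rows; the function name `matmul` and the numeric matrices are illustrative, and the conformability check mirrors the rule that the inner dimensions must match.

```python
def matmul(A, B):
    """Cell (i, j) of the product is the sum over k of A[i][k] * B[k][j]."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[1, 3], [4, 7], [2, 5]]  # 3 x 2
B = [[2, 7, 0], [3, 1, 4]]    # 2 x 3
C = matmul(A, B)              # 3 x 3
print(C)

try:
    matmul(C, B)              # (3 x 3) times (2 x 3) is not defined
except ValueError as err:
    print("not conformable:", err)
```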
Foundations
Basic Matrix Operations
The powers (square, cube, etc) of a matrix are just the matrix times
itself that many times.
$A^2 = AA$ or $A^3 = AAA$
We often use matrix multiplication to find types of people one is
tied to, since the ‘1’ in the adjacency matrix effectively captures
just the people each row is connected to.
(Preview: This is also how we do compound relations: Mother x Brother → “Uncle”)
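The compound-relation idea can be shown with two tiny relation matrices. This is a sketch with a hypothetical four-person family; cell (i, j) of `mother` is 1 when j is i's mother, and cell (i, j) of `brother` is 1 when j is i's brother.

```python
people = ["ego", "mom", "uncle", "dad"]  # hypothetical family members

mother = [
    [0, 1, 0, 0],  # ego's mother is mom
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
brother = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],  # mom's brother is uncle
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]


def matmul(A, B):
    """Standard matrix product for square lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# Mother x Brother: a 1 in cell (i, j) means j is i's mother's brother.
mothers_brother = matmul(mother, brother)
for i, row in enumerate(mothers_brother):
    for j, cell in enumerate(row):
        if cell:
            print(people[j], "is the maternal uncle of", people[i])
```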
Foundations
Data
Basic Measures & A little graph theory
Reachability
The distance from one actor to another is the shortest path
between them, known as the geodesic distance. If there is
at least one path connecting every pair of actors in the
graph, the graph is connected and is called a component.
Two paths are independent if they only have the two end nodes in common. If a graph has two independent paths
between every pair, it is biconnected, and called a
bicomponent. Similarly for three paths, four, etc.
Foundations
Data
Basic Measures & A little graph theory
Calculate reachability through matrix multiplication.
(see p.162 of W&F)
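[Figure: example graph on nodes a to f]
X
    a  b  c  d  e  f
a   0  1  0  0  0  1
b   1  0  1  0  0  0
c   0  1  0  1  1  1
d   0  0  1  0  1  0
e   0  0  1  1  0  0
f   1  0  1  0  0  0
X2
    a  b  c  d  e  f
a   2  0  2  0  0  0
b   0  2  0  1  1  2
c   2  0  4  1  1  0
d   0  1  1  2  1  1
e   0  1  1  1  2  1
f   0  2  0  1  1  2
Distance after two steps (0 = not yet reached)
    a  b  c  d  e  f
a   .  1  2  0  0  1
b   1  .  1  2  2  2
c   2  1  .  1  1  1
d   0  2  1  .  1  2
e   0  2  1  1  .  2
f   1  2  1  2  2  .
X3
    a  b  c  d  e  f
a   0  4  0  2  2  4
b   4  0  6  1  1  0
c   0  6  2  5  5  6
d   2  1  5  2  3  1
e   2  1  5  3  2  1
f   4  0  6  1  1  0
Distance after three steps
    a  b  c  d  e  f
a   .  1  2  3  3  1
b   1  .  1  2  2  2
c   2  1  .  1  1  1
d   3  2  1  .  1  2
e   3  2  1  1  .  2
f   1  2  1  2  2  .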
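The procedure can be sketched as follows: raise X to successive powers and record, for each pair, the first power at which the cell becomes non-zero. The function names `matmul` and `distances_by_powers` are hypothetical helpers; as a labeled choice, the sketch writes 0 on the diagonal and `None` for pairs that are never reached, where the slide uses "." and 0.

```python
def matmul(A, B):
    """Standard matrix product for lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def distances_by_powers(X):
    """Geodesic distance = first power of X whose (i, j) cell is non-zero."""
    n = len(X)
    dist = [[0 if i == j else None for j in range(n)] for i in range(n)]
    power = [row[:] for row in X]  # starts as X to the first power
    for steps in range(1, n):      # a shortest path never needs more than n-1 steps
        for i in range(n):
            for j in range(n):
                if dist[i][j] is None and power[i][j] > 0:
                    dist[i][j] = steps
        power = matmul(power, X)   # move on to the next power
    return dist


# The six-node example graph, nodes in the order a, b, c, d, e, f.
X = [
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
]
X2 = matmul(X, X)
distance = distances_by_powers(X)
for row in distance:
    print(row)
```

Repeated multiplication is easy to follow but slow on large networks; a breadth-first search from each node finds the same distances with far less work.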
Foundations
Data
Basic Measures & A little graph theory
Mixing patterns
Matrices make it easy to look at mixing patterns: connections
among types of nodes. Simply multiply an indicator of
category by the adjacency matrix.
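[Figure: example graph on nodes a to f, with nodes marked by category]
X
    a  b  c  d  e  f
a   0  1  0  0  0  1
b   1  0  1  0  0  0
c   0  1  0  1  1  1
d   0  0  1  0  1  0
e   0  0  1  1  0  0
f   1  0  1  0  0  0
Race
    R  G
a   1  0
b   1  0
c   0  1
d   0  1
e   0  1
f   1  0
X(Race)
    R  G
a   2  0
b   1  1
c   2  2
d   0  2
e   0  2
f   1  1
Race`(X)Race =
    R  G
R   4  2
G   2  6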
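The mixing-pattern products can be reproduced with two matrix multiplications. A minimal sketch on the six-node example, with the category indicator taken from the Race matrix (columns R and G); the helper names `matmul` and `transpose` are hypothetical.

```python
def matmul(A, B):
    """Standard matrix product for lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(M):
    return [list(column) for column in zip(*M)]


# Nodes in the order a, b, c, d, e, f.
X = [
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
]
race = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [1, 0]]  # columns: R, G

# X(Race): for each node, the number of ties to each category.
ties_to_category = matmul(X, race)
# Race`(X)Race: ties counted from each category (rows) to each category (columns).
mixing = matmul(transpose(race), ties_to_category)

print(ties_to_category)
print(mixing)
```

Because this graph is undirected, each within-category tie is counted twice on the diagonal of the mixing matrix (once from each end).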
Foundations
Data
Basic Measures & A little graph theory
Matrix manipulations allow you to look at direction of ties,
and distinguish symmetric from asymmetric ties.
To transform an asymmetric graph to a symmetric graph, add
it to its transpose.
X
    a  b  c  d  e
a   0  1  0  0  0
b   1  0  0  0  0
c   0  1  0  1  1
d   0  0  0  0  0
e   0  0  1  1  0
XT
    a  b  c  d  e
a   0  1  0  0  0
b   1  0  1  0  0
c   0  0  0  0  1
d   0  0  1  0  1
e   0  0  1  0  0
X + XT
    a  b  c  d  e
a   0  2  0  0  0
b   2  0  1  0  0
c   0  1  0  1  2
d   0  0  1  0  1
e   0  0  2  1  0
Max Sym
    a  b  c  d  e
a   0  1  0  0  0
b   1  0  1  0  0
c   0  1  0  1  1
d   0  0  1  0  1
e   0  0  1  1  0
Min Sym
    a  b  c  d  e
a   0  1  0  0  0
b   1  0  0  0  0
c   0  0  0  0  1
d   0  0  0  0  0
e   0  0  1  0  0
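The symmetrizing steps can be sketched directly from the sum of the matrix and its transpose: a cell of 1 or more means a tie in at least one direction (Max Sym), and a cell of 2 means the tie is reciprocated (Min Sym). The variable names are illustrative.

```python
# Directed example matrix, nodes in the order a, b, c, d, e.
X = [
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
]
n = len(X)

# X + XT: 0 = no tie, 1 = tie in one direction only, 2 = mutual tie.
summed = [[X[i][j] + X[j][i] for j in range(n)] for i in range(n)]
# Max Sym: keep a tie if it exists in either direction.
max_sym = [[1 if cell >= 1 else 0 for cell in row] for row in summed]
# Min Sym: keep a tie only if it exists in both directions.
min_sym = [[1 if cell == 2 else 0 for cell in row] for row in summed]

for name, M in [("X + XT", summed), ("Max Sym", max_sym), ("Min Sym", min_sym)]:
    print(name)
    for row in M:
        print(" ", row)
```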
Social Network Software
UCINET
•The Standard network analysis program, runs in Windows
•Good for computing measures of network topography for single nets
•Input-Output of data is a special 2-file format, but is now able to read
PAJEK files directly.
•Not optimal for large networks, but much better than it used to be!
•Available from:
Analytic Technologies
Social Network Software
PAJEK
•Program for analyzing and plotting very large networks
•Intuitive windows interface
•Used for most of the real data plots in this presentation
•Started mainly a graphics program, but has expanded to a wide range of
analytic capabilities
•Can link to the R & SPSS statistical packages
•Free
•Available from:
Social Network Software
Cyram Netminer for Windows
•Newest Product, not yet widely used
•Price range depends on application & size, but
typically quite spendy ($4000+)
http://www.netminer.com/NetMiner/overview_01.jsp
Social Network Software
NetDraw
•A drawing program packaged w. UCINET 6
•Free
•Works directly w. UCINET files, so useful there…
Social Network Software
NEGOPY (no longer in production, but you may find a copy out there..)
•Program designed to identify cohesive sub-groups in a network,
based on the relative density of ties.
•DOS based program, need to have data in arc-list format
•Moving the results back into an analysis program is difficult.
•Available from:
William D. Richards
http://www.sfu.ca/~richards/Pages/negopy.htm
SPAN - SAS Programs for Analyzing Networks (Moody, ongoing)
•is a collection of IML and Macro programs that allow one to:
a) create network data structures from nomination data
b) import/export data to/from the other network programs
c) calculate measures of network pattern and composition
d) analyze network models
•Allows one to work with multiple, large networks
•Easy to move from creating measures to analyzing data
http://www.soc.duke.edu/~jmoody77/span/span.zip
Social Network Software
STATNET
•Program designed to estimate statistical models on networks in R.
Statnet Team
http://csde.washington.edu/statnet/
Other R Resources:
Carter Butts (UC-Irvine, Sociology) – SNA & PermNet
•Program for general network analysis in R
•Does most of what we’ve discussed today…
Social Network Software
Other R Resources:
iGraph
Social Network Software
Lots of Java-Based programs
Both are flexible, fairly good at “drawing by hand” (but some quirks)
Social Network Software
CASOS – A collection of tools for networks, developed by the folks at
Carnegie Mellon (Carley et al)
http://www.casos.cs.cmu.edu/index.php
Social Network Software
Homework Preview: Let’s open SAS, UCINET & PAJEK