An Introduction to Social Network Analysis

James Moody
Department of Sociology
The Ohio State University
Introduction
The world we live in is connected:
[Figure: network diagram with nodes labeled Jim Moody, Craig Calhoun, and Isaias Afworki]
Introduction
These patterns of connection form a social space.
Social network analysis maps and analyzes this social space.
Adolescent Social Structure
Introduction
Yet standard social science analysis methods do not
take this space into account.
Moreover, the complexity of the relational world
makes it impossible (in most cases) to understand
this connectivity using only our intuitive
understanding of a setting.
Introduction
Why networks matter:
• Intuitive: information travels through contacts
between actors, which can reflect a power
distribution or influence attitudes and behaviors. Our
understanding of social life improves if we account
for this social space.
• Less intuitive: patterns of inter-actor contact can
have effects on the spread of “goods” or power
dynamics that could not be seen focusing only on
individual behavior.
Introduction
Social network analysis is:
•a set of relational methods for systematically
understanding and identifying connections
among actors
•a body of theory relating to types of
observable social spaces and their relation to
individual and group behavior.
Introduction
Network analysis assumes that:
• How actors behave depends in large part on how they are
linked together
Example: Adolescents with peers that smoke are more
likely to smoke themselves.
• The success or failure of organizations may depend on the
pattern of relations within the organization
Example: The ability of companies to survive strikes
depends on how product flows through factories and
storehouses.
(continued…)
Introduction
Network analysis assumes that:
• Patterns of relations reflect the power structure of a
given setting, and clustering may reflect coalitions
within the group
Example: Overlapping voting patterns in a
coalition government
Introduction
An information network: email exchanges within the Reagan
White House, early 1980s (source: Blanton, 1995)
Introduction
Power positions and potential influence
Overview
Introduction
Basic Concepts
Flows within Networks
Structure of Social Space
Tools, Models & Methods
For Flows and Structures
Conclusions
Basic Concepts
• Actors are nodes
Ideas, Papers, Events, Individuals,
Organizations, Nations
• Relations are lines between pairs of nodes
Symmetric (shares a room with)
Asymmetric (gives an order to)
Valued (number of times seen together)
Basic Concepts
• Network data are familiar to you
• For example:
- Personal, face-to-face contact
- Telephone contact
- Email contact
- Contact through faxes or wires
- Snail-mail contact
- Membership in the same organization
- Attendance at the same meetings
- Graduates of the same university
Basic Concepts
For example, you might be tracking the activities of a number of
people in related, but not identical cases, including meetings they
attended. You may know little of the content of the event, or what
they may have said to each other, only whether particular people
were at the event.
Your data might look like:
Basic Concepts
11.19.2001. Meeting at Brussels. Attending:
Smith, Johnson, Davis, James, Jackson
12.22.2001. Meeting at Paris. Attending:
Johnson, James, Jones, Wilson
1.12.2001. Meeting in New York. Attending:
Jones, Carter, Burns
2.14.2001. Meeting in Denver. Attending:
Wilson, Burns, Wilf, Newman
(Red bold indicates people who are the focus of an investigation)
Basic Concepts
While perhaps not immediately apparent when looking at the list of names,
a simple algorithm reveals connections among these actors.
[Figure: network diagram connecting Smith, Johnson, Newman, Wilson, Jackson, Wilf, James, Jones, Burns, Davis, and Carter]
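One simple way to reveal these connections is to link every pair of people who attended the same meeting. The sketch below does that for the four meetings listed above. The function name `co_attendance` and the dictionary layout are illustrative choices, not part of the original slides, and the slides do not say which algorithm produced the figure.

```python
from itertools import combinations
from collections import defaultdict

# Attendance lists copied from the meeting records above.
meetings = {
    "Brussels": ["Smith", "Johnson", "Davis", "James", "Jackson"],
    "Paris": ["Johnson", "James", "Jones", "Wilson"],
    "New York": ["Jones", "Carter", "Burns"],
    "Denver": ["Wilson", "Burns", "Wilf", "Newman"],
}

def co_attendance(meetings):
    """Map each pair of people to the number of meetings both attended."""
    ties = defaultdict(int)
    for attendees in meetings.values():
        # Every pair of attendees at the same meeting gets (or strengthens) a tie.
        for a, b in combinations(sorted(set(attendees)), 2):
            ties[frozenset((a, b))] += 1
    return dict(ties)

ties = co_attendance(meetings)
for pair, count in sorted(ties.items(), key=lambda item: -item[1])[:3]:
    print(sorted(pair), count)
```

The result is a valued, symmetric network: the tie value counts shared meetings, so pairs seen together more than once stand out.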
Basic concepts
Types of network data:
1) Ego-network
- Have data on a respondent (ego) and the people they are
connected to (alters)
- May include estimates of connections among alters
Basic concepts
Types of network data:
2) Partial network
- Ego networks plus some amount of tracing to reach
contacts of contacts
- Something less than full account of connections among
all pairs of actors in the relevant population
- Example: CDC Contact tracing data for STDs
Basic concepts
Types of network data:
3) Complete
- Data on all actors within a particular (relevant) boundary
- Never exactly complete, but boundaries are set
- Example: Coauthorship data among all writers in the
social sciences
Examples: linked levels of data
[Figure legend: nodes are Actor, Key contact, and Contact’s contact; ties are Primary Relation, Alter Relation, and Trace Relation]
Why networks matter:
Consider the following (much simplified) scenario:
•Probability that actor i passes information to actor j (pij) is
a constant over all relations = 0.6
•S & T are connected through the following structure:
[Figure: network diagram connecting S and T]
•The probability that S passes the information to T through
either path would be: 0.09
Why networks matter:
Now consider the following (similar?) scenario:
[Figure: a second network diagram connecting S and T]
•Every actor but one has the exact same number of contacts
•The category-to-category mixing is identical
•The distance from S to T is the same (7 steps)
•S and T have not changed their behavior
•Their contacts’ contacts have the same behavior
•But the probability of the information passing from S to T is 0.148
•Different outcomes & different potentials for intervention
Overview
Introduction
Basic Concepts
Flows within Networks
Structure of Social Space
Tools, Models & Methods
For Flows and Structures
Conclusions
Network Flow
In addition to the simple probability that one actor
passes information on to another (pij), two factors
affect flow through a network:
Topology
- the shape, or form, of the network
- Example: one actor cannot pass information to another
unless they are either directly or indirectly connected
Time
- the timing of contact matters
- Example: an actor cannot pass information he has not
received yet
Topology
Two features of the network’s shape are known to be
important: connectivity and centrality
Connectivity refers to how actors in one part of the network are
connected to actors in another part of the network.
• Reachability: Is it possible for actor i to reach actor j? This
can only be true if there is a chain of contact from one actor
to another.
• Distance: Given they can be reached, how many steps are
they from each other?
• Number of paths: How many different paths connect each
pair?
Network topology: reachability
Without full network data, you can’t distinguish actors with
limited information from those more deeply embedded in a
setting.
[Figure: network diagram with actors labeled a, b, and c]
Network topology: distance & number of paths
Given that ego can reach alter, distance determines the
likelihood of information passing from one end of the
chain to another.
• Because information spread is never certain, the
probability of transfer decreases over distance.
• However, the probability of transfer increases
with each alternative path connecting pairs of
people in the network.
Network topology: distance & number of paths
Distance is measured by the (weighted) number of relations separating a pair.
Actor “a” is:
1 step from 4 actors
2 steps from 5 actors
3 steps from 4 actors
4 steps from 3 actors
5 steps from 1 actor
[Figure: network diagram with actor a marked]
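A distance profile like the one above can be computed with a breadth-first search. The sketch below is a minimal version that treats every tie as one step (unweighted). The small network and the function name `distances_from` are hypothetical and are not the network drawn on the slide.

```python
from collections import deque, Counter

def distances_from(graph, source):
    """Breadth-first search: number of steps from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:  # first visit is the shortest route
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Hypothetical undirected network, stored as adjacency lists.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
    "f": [],  # an isolate: not reachable from "a"
}

dist = distances_from(graph, "a")
# How many actors sit at each distance from "a" (excluding "a" itself).
profile = Counter(d for node, d in dist.items() if node != "a")
print(dict(profile))
```

Actors missing from `dist` are unreachable from the source, which answers the reachability question at the same time as the distance question.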
Network topology: distance & number of paths
Paths are the different routes one can take. Node-independent paths are
particularly important.
There are 2 independent paths connecting a and b.
There are many non-independent paths.
[Figure: network diagram with actors a and b marked]
Probability of information transfer
by distance and number of paths, assuming a constant pij of 0.6
[Figure: line chart of probability (y-axis, 0 to 1.2) against path distance (x-axis, 2 to 6), with one line each for 1, 2, 5, and 10 paths]
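The relationship between distance, number of paths, and transfer probability can be sketched with a simple formula. It assumes that every link passes information independently with the same probability p, and that the k paths are node-independent and of equal length d. A single path then succeeds with probability p^d, and at least one of k paths succeeds with probability 1 - (1 - p^d)^k. Whether the chart was drawn from exactly this formula is an assumption, and the function name is illustrative.

```python
def transfer_probability(p, distance, n_paths=1):
    """Chance that information arrives over n_paths independent paths.

    Assumes each link transmits independently with probability p and
    every path has the same length (`distance` links).
    """
    single_path = p ** distance            # all links on one path must succeed
    return 1 - (1 - single_path) ** n_paths  # at least one path succeeds

for k in (1, 2, 5, 10):
    row = [round(transfer_probability(0.6, d, k), 3) for d in range(2, 7)]
    print(f"{k:>2} paths:", row)
```

With p = 0.6, a single two-step path succeeds with probability 0.36. The probability falls as distance grows and rises as paths are added, which is the pattern the two bullets on distance and alternative paths describe.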
Reachability in Colorado Springs
(Sexual contact only)
•High-risk actors over 4 years
•695 people represented
•Longest path is 17 steps
•Average distance is about 5 steps
•Average person is within 3 steps
of 75 other people
•137 people connected through 2
independent paths, core of 30
people connected through 4
independent paths
(Node size = log of degree)
Network topology: centrality
Centrality refers to (one dimension of) location,
identifying where an actor resides in a network.
• For example, we can compare actors at the edge of
the network to actors at the center.
• In general, this is a way to formalize intuitive
notions about the distinction between insiders and
outsiders.
Centrality example:
At the local level, we
expect people like
NSJMP and NSOLN
to have greater access
to information than
others in the network.
Network analysis
gives us a set of tools
to quantify this
difference.
Centrality example:
Actors that appear
very different when
seen individually, are
comparable in the
global network.
(Node size proportional to betweenness centrality )
Information flows
Two factors that affect network flows:
Topology
- the shape, or form, of the network
- simple example: one actor cannot pass information to
another unless they are either directly or indirectly
connected
Time
- the timing of contacts matters
- simple example: an actor cannot pass information he has
not received yet
Timing in networks
A focus on contact structure often slights the importance
of network dynamics
Time affects networks in two important ways:
1) The structure itself goes through phases that are
correlated with information spread
2) The timing of contact constrains information flow
Changes in Network
Structure
Sexual relations in a syphilis outbreak
Rothenberg et al. map the
pattern of sexual contact
among youth involved in
a syphilis outbreak in
Atlanta over a one-year
period.
(Syphilis cases in red)
Jan - June, 1995
Sexual relations in a syphilis outbreak
July-Dec, 1995
Data on drug users in
Colorado Springs, over
5 years
[Figures: Drug Relations, Colorado Springs, Years 1 through 5, one panel per year (current year in red, past relations in gray)]
What impact does timing have on flow through the network?
In addition to changes in the shape over time, contact timing
constrains how information can flow through the network.
Consider the following example:
A hypothetical contact network
[Figure: contact network among actors A, B, C, D, E, and F. Numbers above lines indicate contact periods; the labels that survive are 2-5, 8-9, and 3-5]
The path graph for the hypothetical contact network
[Figure: path graph among actors A, B, C, D, E, and F]
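The way timing constrains flow can be sketched as an earliest-arrival computation: information crosses a contact only while that contact is active, and only if the sender already holds it. The three-actor chain and its contact periods below are made up for illustration (the slide's own network is not recoverable here). The function name `earliest_arrival` is hypothetical, and the sketch assumes information can be relayed within the same period it arrives.

```python
def earliest_arrival(contacts, source, start=0):
    """Earliest period at which each actor can hold information from `source`.

    contacts: list of (u, v, first_period, last_period); contacts are undirected.
    """
    arrival = {source: start}
    changed = True
    while changed:  # repeat until no contact can deliver anything earlier
        changed = False
        for u, v, first, last in contacts:
            for sender, receiver in ((u, v), (v, u)):
                # The sender must hold the information before the contact ends.
                if sender in arrival and arrival[sender] <= last:
                    t = max(arrival[sender], first)
                    if t < arrival.get(receiver, float("inf")):
                        arrival[receiver] = t
                        changed = True
    return arrival

# Hypothetical chain A - B - C: A and B meet in periods 1-2, B and C in periods 3-5.
contacts = [("A", "B", 1, 2), ("B", "C", 3, 5)]

from_a = earliest_arrival(contacts, "A")
from_c = earliest_arrival(contacts, "C")
print(from_a)
print(from_c)
```

In this example A's information can reach C, but C's cannot reach A, because the A-B contact has ended before B hears from C. The same set of ties therefore yields an asymmetric path graph.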
Direct contact network of 8 people in a ring
[Figure: ring of 8 people with its adjacency matrix; every cell shown is 1]
(adjacency matrix: cell =
number of paths from row to
column)
Implied contact network of 8 people in a ring
All contacts concurrent
[Figure: implied contact matrix; cells are 1 or 2]
Implied contact network of 8 people in a ring
Mixed Concurrent
[Figure: ring diagram and implied contact matrix]
Density = 0.57
Implied contact network of 8 people in a ring
Serial (1)
[Figure: ring diagram and implied contact matrix]
Density = 0.73
Implied contact network of 8 people in a ring
Serial (2)
[Figure: ring diagram and implied contact matrix]
Density = 0.51
Implied contact network of 8 people in a ring
Serial (3)
[Figure: ring diagram and implied contact matrix]
Density = 0.43
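The density of an implied contact network can be sketched as the share of ordered pairs (i, j) where information starting at i can reach j through time-ordered contacts. The example below compares an 8-person ring with all contacts concurrent against one hypothetical serial ordering, in which the ties 1-2, 2-3, ..., 8-1 occur one after another in periods 1 through 8. The timings and function names are assumptions; the slides' own timings are only partly recoverable.

```python
def reachable_from(contacts, source):
    """Actors that information starting at `source` can reach over time.

    contacts: list of (u, v, period); each undirected contact is active in one period.
    """
    arrival = {source: 0}
    changed = True
    while changed:
        changed = False
        for u, v, period in contacts:
            for sender, receiver in ((u, v), (v, u)):
                # Usable only if the sender holds the information by this period.
                if sender in arrival and arrival[sender] <= period:
                    if period < arrival.get(receiver, float("inf")):
                        arrival[receiver] = period
                        changed = True
    return set(arrival) - {source}

def implied_density(contacts, actors):
    """Share of ordered pairs (i, j), i != j, where i can reach j."""
    reached = sum(len(reachable_from(contacts, a)) for a in actors)
    n = len(actors)
    return reached / (n * (n - 1))

actors = list(range(1, 9))
ring = [(i, i % 8 + 1) for i in actors]  # ties 1-2, 2-3, ..., 7-8, 8-1

concurrent = [(u, v, 1) for u, v in ring]                        # all in period 1
serial = [(u, v, k) for k, (u, v) in enumerate(ring, start=1)]   # one tie per period

print(implied_density(concurrent, actors))
print(round(implied_density(serial, actors), 2))
```

With concurrent contacts every actor can reach every other, so the density is 1. Under this particular serial ordering 41 of the 56 ordered pairs are connected (about 0.73), the same value reported above for Serial (1), although that match does not establish that the slide used this ordering.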
Information flows
Summary:
Topology:
- Information requires connected communication chains
- Real-world networks are too complex to map these
without specialized tools.
Time:
- Network topology changes over time. This has
implications for information flow.
- Because small changes in relationship timing can have
dramatic effects on information flow, it is impossible to
know this intuitively.
Overview
Introduction
Basic Concepts
Flows within Networks
Structure of Social Space
Tools, Models & Methods
For Flows and Structures
Conclusions
Structure of Social Space
Information flows are only one use of networks
It is also possible to characterize the key topological
features of any social network. These features
include things such as the extent of hierarchy and
clustering.
Structure of Social Space
1) Identify core groups & patterns of relations among
groups
a. embeddedness in groups constrains action
b. group structure affects stability & resource distribution
2) Locate tensions or inconsistencies in a relational
structure that might indicate sources of social change.
Structure of Social Space
Two features of interest related to network structure:
1) Cohesive groups: Sets of people who interact frequently
with each other. These are often groups that work together.
Groups are often organized into positions within a network
that indicate particular roles or access to resources.
2) Hierarchy: Relational structure can identify the leadership
positions within a network, through either the direction of ties or
peripheral status.
Structure of cohesive groups
A cohesive group is a set of
actors with more
interaction inside the group
than outside the group,
mutually connected
through multiple paths.
Cohesive Group Structure
“Immaculate Preparatory High School”
Cohesive Group Structure: 3 types of positions
“Immaculate Preparatory High School”
Cohesive Group Structure: Group member
“Immaculate Preparatory High School”
Cohesive Group Structure: Bridge between groups
“Immaculate Preparatory High School”
Cohesive Group Structure: Outsider
“Immaculate Preparatory High School”
Cohesive Groups: Relevance
• Identify people who bridge important constituencies
- people who are between groups have a unique ability to
control information
•Such actors are said to bridge structural holes. The number of
“holes” an actor bridges gives insight into that actor’s power
position in the network.
Hierarchy and network position
Many cohesive groups are embedded within a hierarchy,
which one can map using relational tools.
Changes in the hierarchical position indicate changes in
the power structure.
Examples of Hierarchical Systems
Linear Hierarchy
(all triads transitive)
Simple
Hierarchy
Branched
Hierarchy
Mixed
Hierarchy
Hierarchy and network position
If you don’t know
the hierarchy of the
network, asymmetry
optimization
techniques allow one
to identify levels in a
hierarchy
Group structure through multiple relations
Start with some basic ideas
of what a role is: An
exchange of something
(support, ideas, commands,
etc.) between actors. Thus,
we might represent a family
as:
[Figure: family diagram with nodes H, W, and three nodes C, joined by three kinds of ties: Romantic Love, Provides food for, Bickers with]
(and there are, of course, many other relations inside the family)
Group structure through multiple relations
The key idea is that we can express a role through a relation (or set
of relations) and thus a social system by the inventory of roles. If
roles equate to positions in an exchange system, then we need only
identify particular aspects of a position. But what aspect?
Structural Equivalence
Two actors are structurally equivalent if they have
the same types of ties to the same people.
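The definition can be checked directly on an adjacency matrix: two actors are structurally equivalent when their rows (ties sent) and columns (ties received) match for every third actor. The sketch below uses a hypothetical four-actor directed network. Ignoring the tie between the two actors being compared is one common convention and is an assumption here, as is the function name.

```python
from itertools import combinations

def structurally_equivalent(adj, i, j):
    """True if i and j send and receive identical ties to every third actor."""
    others = [k for k in range(len(adj)) if k not in (i, j)]
    same_out = all(adj[i][k] == adj[j][k] for k in others)  # ties sent
    same_in = all(adj[k][i] == adj[k][j] for k in others)   # ties received
    return same_out and same_in

# Hypothetical directed network: actor 0 sends a tie to actors 1, 2, and 3.
adj = [
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

pairs = [(i, j) for i, j in combinations(range(len(adj)), 2)
         if structurally_equivalent(adj, i, j)]
print(pairs)
```

Actors 1, 2, and 3 all receive the same tie from actor 0 and send none, so they occupy one position and the graph reduces to two positions: the sender and the receivers.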
Structural Equivalence
A single relation
Structural Equivalence
Graph reduced to positions
Alternative notions of equivalence
Instead of exact same ties to exact same alters, you look for nodes
with similar ties to similar types of alters
Overview
Introduction
Basic Concepts
Flows within Networks
Structure of Social Space
Tools, Models & Methods
For Flows and Structures
Conclusions
Tools, Methods & Models
Data Representations
[Figure: the same graph on nodes 1 to 5 shown four ways: as a drawing, an adjacency matrix, an arc list, and a node list]
Arc List
Send  Recv
1     2
1     3
2     4
3     2
4     1
4     2
4     3
4     5
5     1
5     3
5     4
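The representations are interchangeable. As a minimal sketch, the Send/Recv pairs from the arc list above can be converted into an adjacency matrix and a node list. The function names are illustrative, and the node-list layout (each sender followed by the nodes it sends to) is an assumption about the format intended on the slide.

```python
# Send/Recv pairs taken from the arc list above.
arcs = [(1, 2), (1, 3), (2, 4), (3, 2), (4, 1), (4, 2),
        (4, 3), (4, 5), (5, 1), (5, 3), (5, 4)]

def to_adjacency_matrix(arcs, n):
    """n x n matrix with a 1 in row `send`, column `recv` for each arc."""
    matrix = [[0] * n for _ in range(n)]
    for send, recv in arcs:
        matrix[send - 1][recv - 1] = 1  # node labels start at 1
    return matrix

def to_node_list(arcs, n):
    """Map each node to the list of nodes it sends ties to."""
    nodes = {i: [] for i in range(1, n + 1)}
    for send, recv in arcs:
        nodes[send].append(recv)
    return nodes

matrix = to_adjacency_matrix(arcs, 5)
node_list = to_node_list(arcs, 5)

for row in matrix:
    print(row)
print(node_list)
```

The arc and node lists store only the ties that exist, which is why they scale better than the full matrix for large, sparse networks.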
Tools, Methods & Models
Graphical Display
Benefits:
•Intuitive way to display networks.
•Helps people see the social space – it is a map.
•A concise presentation of a great deal of data.
Costs:
•Lack of standards for how to display can create
misleading images.
•Displays of large networks tend to reveal only the
roughest properties of the network
Tools, Methods & Models
Graphical Display: Software
PAJEK
•Program for analyzing and plotting very large networks
•Intuitive Windows interface
•Used for most of the real data plots in this presentation
•Mainly a graphics program, but its analytic capabilities are expanding
•Free
•Available from:
Tools, Methods & Models
Graphical Display: Software
Cyram Netminer for Windows
•Very new: largely untested
•Price range depends on application
•Limited to smaller networks O(100)
Tools, Methods & Models
Graphical Display: Software
NetDraw
•Also very new, but by one of the best known names
in network analysis software.
•Free
•Limited to smaller networks O(100)
Tools, Methods & Models
Analysis Methods: Descriptive / Measurement
The key text for methods and measurement is:
Wasserman, Stanley and Katherine Faust. 1994.
Social Network Analysis. Cambridge: Cambridge
University Press.
The basic network measures use graph theory to formalize
aspects of the network, and always work from either an
adjacency matrix (slow for large graphs) or an edge/node list.
Tools, Methods & Models
Analysis Methods: Descriptive / Measurement
Properties of interest include:
Individual Level:
Degree: Number of contacts for each person - Sum over the
row/column of the adjacency matrix.
Closeness Centrality: Inverse of the distance to every other
node in the network. Count path distances from ego to alters.
Sub-group Level:
Group Membership: Which groups are there? Various search
algorithms for identifying groups.
Group Position: Where does a given group fit in the overall
flow of relations? Various Equivalence algorithms.
Graph Level:
Density: Number of ties present as a percentage of all
possible ties.
Centralization: To what degree are edges focused through a
small number of nodes. Various formulas for different
centrality indices.
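Three of these measures can be sketched directly from an adjacency matrix. The example uses a hypothetical undirected star network. Closeness is taken here as the inverse of the summed path distances to all other nodes and assumes a connected graph; other normalizations exist, and the function names are illustrative.

```python
from collections import deque

def degree(adj, i):
    """Number of contacts: the sum over row i of the adjacency matrix."""
    return sum(adj[i])

def density(adj):
    """Ties present as a share of all possible ties (diagonal excluded)."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))

def closeness(adj, i):
    """Inverse of the summed path distances from i to every other node."""
    n = len(adj)
    dist = {i: 0}
    queue = deque([i])
    while queue:  # breadth-first search from i
        u = queue.popleft()
        for v in range(n):
            if adj[u][v] and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    # Return 0.0 if some node is unreachable or the graph has one node.
    return 1 / total if len(dist) == n and total > 0 else 0.0

# Hypothetical undirected star: node 0 is tied to nodes 1, 2, and 3.
adj = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

print([degree(adj, i) for i in range(4)])
print(density(adj))
print([round(closeness(adj, i), 3) for i in range(4)])
```

The hub has the highest degree and closeness, while density summarizes the whole graph in one number. A centralization index would compare the hub's score with everyone else's, which this sketch does not compute.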
Tools, Methods & Models
Analysis Methods: Descriptive / Measurement: Software
1) UCI-NET
•General Network analysis program, runs in Windows
•Good for computing measures of network topography for single nets
•Input-Output of data is a little clunky, but workable.
•Not optimal for large networks
•Available from:
Analytic Technologies
Borgatti@mediaone.net
2) STRUCTURE
•“A General Purpose Network Analysis Program providing
Sociometric Indices, Cliques, Structural and Role Equivalence,
Density Tables, Contagion, Autonomy, Power and Equilibria In
Multiple Network Systems.”
•DOS interface with somewhat awkward syntax
•Great for role and structural equivalence models
•Manual is a very nice, substantive, introduction to network methods
•Available from a link at the INSNA web site:
http://www.heinz.cmu.edu/project/INSNA/soft_inf.html
Tools, Methods & Models
Analysis Methods: Descriptive / Measurement: Software
3) NEGOPY
•Program designed to identify cohesive sub-groups in a network,
based on the relative density of ties.
•DOS based program, need to have data in arc-list format
•Moving the results back into an analysis program is difficult.
•Available from:
William D. Richards
http://www.sfu.ca/~richards/Pages/negopy.htm
4) SPAN - SAS Programs for Analyzing Networks (Moody, ongoing)
•is a collection of IML and Macro programs that allow one to:
a) create network data structures from nomination data
b) import/export data to/from the other network programs
c) calculate measures of network pattern and composition
d) analyze network models
•Allows one to work with multiple, large networks
•Easy to move from creating measures to analyzing data
•All of the Add Health data are already in SAS
•Available by sending an email to:
Moody.77@osu.edu
Tools, Methods & Models
Analysis Methods: Statistical Models
There are two general classes of statistical models for networks:
1) Models of the network itself
The statistical question is how an observed network fits into the class
of all possible random graphs with a given set of topological
characteristics. The whole network is the substantive unit of analysis,
though technically one works with the dyads from the network.
Examples: p* models (Wasserman and Pattison), MCMC random
graph models (Tom Snijders, Mark Handcock)
2) Models of individual behavior that incorporate network characteristics
The statistical question is whether or not network properties affect
individual behaviors.
Examples: Network regressive-autoregressive models (Doreian), Peer
influence models (Friedkin)
Tools, Methods & Models
Analysis Methods: Statistical Models
Exponential Random Graph Models
\[ p(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)} \]
Where:
z is a collection of r explanatory variables, calculated on x
θ is a collection of r parameters to be estimated
κ is a normalizing constant that ensures the probability sums to 1.
As it turns out, κ is incredibly difficult to identify, introducing a number of
complexities to the model.
Tools, Methods & Models
Analysis Methods: Statistical Models
Exponential Random Graph Models
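To estimate the model, we work with the conditional probabilities (X_ij | X^c_ij)
instead of the full graph. This transforms the exponential model to a logit
model on the dyads:
\[ \frac{p(X_{ij} = 1 \mid X_{ij}^{c})}{p(X_{ij} = 0 \mid X_{ij}^{c})} = \frac{\exp\{\theta' z(x_{ij}^{+})\}}{\exp\{\theta' z(x_{ij}^{-})\}} = \exp\{\theta' [z(x_{ij}^{+}) - z(x_{ij}^{-})]\} \]
\[ \omega_{ij} = \log\left[ \frac{p(X_{ij} = 1 \mid X_{ij}^{c})}{p(X_{ij} = 0 \mid X_{ij}^{c})} \right] = \theta' [z(x_{ij}^{+}) - z(x_{ij}^{-})] \]
Here x^+_ij is the graph with the tie from i to j present, x^-_ij is the same graph
with that tie absent, and X^c_ij is the rest of the graph. θ and z are the parameters
and explanatory variables of the full model.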
Analysis Methods: Statistical Models
Exponential Random Graph Models
Software for analyzing these models is available from:
Logit Pseudo-Likelihood estimation:
http://kentucky.psych.uiuc.edu/pstar/index.html (SPSS programs)
http://www.sfu.ca/~richards/Pages/pspar.html (Program for Large graphs)
Empirically, these models are tricky to estimate, as the potential result space
can easily become degenerate, particularly as z starts to include a more
complicated range of dependencies.
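The logit form works from change statistics: for each dyad, the difference in z between the graph with the tie present and the graph with it absent. The sketch below computes them for a hypothetical undirected four-node graph, taking z(x) to be the number of ties and the number of triangles. That choice of statistics, the graph, and the function names are all assumptions for illustration. Fitting the logistic regression itself is not shown.

```python
from itertools import combinations

def statistics(edges, n):
    """z(x): (number of ties, number of triangles) for an undirected graph."""
    triangles = sum(
        1 for a, b, c in combinations(range(n), 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= edges
    )
    return (len(edges), triangles)

def change_statistics(edges, n, i, j):
    """z with tie i-j forced present minus z with tie i-j forced absent."""
    tie = frozenset((i, j))
    with_tie = statistics(edges | {tie}, n)
    without_tie = statistics(edges - {tie}, n)
    return tuple(p - m for p, m in zip(with_tie, without_tie))

# Hypothetical graph: ties 0-1, 1-2, 0-2 (a triangle) plus 2-3.
edges = {frozenset(p) for p in [(0, 1), (1, 2), (0, 2), (2, 3)]}

# One row per dyad: (observed tie, change in ties, change in triangles).
rows = [
    (int(frozenset((i, j)) in edges), *change_statistics(edges, 4, i, j))
    for i, j in combinations(range(4), 2)
]
for row in rows:
    print(row)
```

In pseudo-likelihood estimation, the first column would be the outcome and the remaining columns the predictors of a logistic regression across dyads.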
MCMC Estimation:
Ongoing work by Mark Handcock, Tom Snijders and Co.
Tools, Methods & Models
Analysis Methods: Statistical Models
Network Effect Models
The question is whether or not being connected to a particular set of people
affects an individual’s behavior. The key statistical point is that we
have abandoned the assumption that our cases are independent.
These models originated in spatial statistics – looking at the effect of an
adjacent geographic area on outcomes for any given area.
Basic Peer Influence Model
Formal Model
\[ Y^{(1)} = XB \tag{1} \]
\[ Y^{(t)} = \alpha W Y^{(t-1)} + (1 - \alpha) Y^{(1)} \tag{2} \]
Y(1) = an N x M matrix of initial opinions on M issues for N
actors
X = an N x K matrix of K exogenous variables that affect Y
B = a K x M matrix of coefficients relating X to Y
α = a weight of the strength of endogenous interpersonal
influences
W = an N x N matrix of interpersonal influences
Basic Peer Influence Model
Formal Model
\[ Y^{(1)} = XB \tag{1} \]
This is the basic general linear model.
It says that a dependent variable (Y) is some function (B) of a set of independent
variables (X). At the individual level, the model says that:
\[ Y_i = \sum_k X_{ik} B_k \]
Usually, one of the covariates is e, the model error term.
Basic Peer Influence Model
\[ Y^{(t)} = \alpha W Y^{(t-1)} + (1 - \alpha) Y^{(1)} \tag{2} \]
This part of the model taps social influence. It says that each person’s final opinion is
a weighted average of their own initial opinions
\[ (1 - \alpha) Y^{(1)} \]
and the opinions of those they communicate with (which can include their own current
opinions)
\[ \alpha W Y^{(t-1)} \]
Basic Peer Influence Model
The key to the peer influence part of the model is W, a matrix of
interpersonal weights. W is a function of the communication structure of the
network, and is usually a transformation of the adjacency matrix. In general:
\[ 0 \le w_{ij} \le 1, \qquad \sum_j w_{ij} = 1 \]
Various specifications of the model change the value of w_ii, the extent to which
one weighs one’s own current opinion, and the relative weight of alters.
Basic Peer Influence Model
Formal Properties of the model
If we allow the model to run over t, we can describe the model as:
\[ Y^{(\infty)} = \alpha W Y^{(\infty)} + (1 - \alpha) XB \]
The model is directly related to spatial econometric models:
\[ Y^{(\infty)} = \alpha W Y^{(\infty)} + X\tilde{\beta} + e \]
where the two coefficients (α and \tilde{\beta}) are estimated directly (see Doreian,
1982, SMR).
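The long-run behavior of the model can be sketched by iterating equation (2) until opinions stop changing. The example below simplifies to a single issue (M = 1) and two hypothetical actors who each listen only to the other. The weights, initial opinions, value of α, and the function name are all made up for illustration.

```python
def peer_influence(W, y1, alpha, tol=1e-12, max_iter=10_000):
    """Iterate Y(t) = alpha * W Y(t-1) + (1 - alpha) * Y(1) until it settles."""
    y = list(y1)
    for _ in range(max_iter):
        # Weighted average of alters' current opinions for each actor.
        wy = [sum(w * v for w, v in zip(row, y)) for row in W]
        new = [alpha * a + (1 - alpha) * b for a, b in zip(wy, y1)]
        if max(abs(a - b) for a, b in zip(new, y)) < tol:
            return new
        y = new
    return y

# Hypothetical two-actor network: each row of W sums to 1.
W = [[0.0, 1.0],
     [1.0, 0.0]]
y1 = [0.0, 1.0]  # initial opinions on one issue

final = peer_influence(W, y1, alpha=0.5)
print([round(v, 4) for v in final])
```

The two actors start at 0 and 1 and settle near 1/3 and 2/3: each is pulled toward the other but remains anchored to their own initial opinion. The settled values satisfy the equilibrium form of the model, with the initial opinions standing in for XB.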
Overview
Introduction
Basic Concepts
Flows within Networks
Structure of Social Space
Tools, Models & Methods
For Flows and Structures
Conclusions