Introduction to Networks

What are networks?
Brian D. Fath
Towson University, Towson, MD, USA
International Institute for Applied Systems Analysis, Laxenburg, Austria
A new paradigm
Environmental concerns have become of paramount importance.
Certain global problems may soon be irreversible (for example,
deforestation, extinction, soil loss, climate change): we can’t turn
back the clock.
These are systemic problems that can’t be understood in isolation
but rather are interconnected and interdependent.
Current problem: Pest management
Conventional response to crop pest is to spray a pesticide designed to kill
that insect. Imagine a perfect pesticide that kills all target insects and
which has no side effects on air, water, or soil.
Is using this pesticide likely to make the farmer better off?
Representing the thinking used by those applying the pesticides would
look like this:
Unfortunately, what frequently happens is that in following years the
problem of crop damage gets worse and worse and the pesticide that
formerly seemed so effective does not seem to help anymore.
E.g., the pest was controlling another insect population, either by
predation or competition. The effective pesticide eliminates the
control that those insects were applying on the population of the other
insects. Then non-target insect populations explode and cause more
damage than the insects killed by the pesticide.
In other words, the action intended to solve the problem
actually makes it worse, because unintended side effects
change the system and end up exacerbating the problem.
Studies suggest a majority of the 25 insects that cause the
most crop damage became problems because of this cycle.
Many important problems today are complex, involve
multiple actors, and are at least partly the result of past
actions that were taken to alleviate them. Dealing with
such problems is difficult and the results of conventional
solutions are often poor enough to create discouragement
about the prospects of ever effectively addressing them.
If everything is connected to everything else,
then how can we ever know anything?
There is a need for scientific
methodologies that deal with
whole systems:
Systems modelling and
Network analysis are such
approaches
System theory
Core Assumptions and Statements
System theory is the transdisciplinary study of the abstract
organization of phenomena, independent of their substance, type,
or spatial or temporal scale of existence. It investigates both the
principles common to all complex entities, and the (usually
mathematical) models which can be used to describe them.
History of Systems theory
•Ludwig von Bertalanffy - biologist (1940s) (General Systems Theory, 1968)
•Ross Ashby (Introduction to Cybernetics, 1956).
•Jay Forrester founded System dynamics in 1956 - way of testing
new ideas about systems, in the same way we test ideas in
engineering.
•Club of Rome – think tank developed world models
•Donella Meadows et al. “Limits to Growth”
•George Klir (Facets of Systems Science, 1991) discusses conceptual
foundations and philosophy (e.g. Bunge, Bohm and Laszlo);
•Fritjof Capra “popularized” systems theory ideas through mass
media books and application to social system
von Bertalanffy was both reacting against reductionism and
attempting to revive the unity of science.
The approach of systems thinking is fundamentally different from that
of traditional forms of analysis, which focus on separating the
individual pieces of the study object.
Rather than reducing an entity to the properties of its parts, systems
theory focuses on the arrangement of and relations between the parts
which connect them into a whole.
This results in sometimes strikingly different conclusions, especially
when what is being studied is dynamically complex or has a great
deal of feedback from internal or external sources.
Investigating Biological Systems
Haeckel – 1866 “Ecology” (oikos) study of Earth household.
von Uexkull – Umwelt means "environment" or "surrounding world"
Lotka – energy flow in ecology
Elton – feeding relations
Tansley 1935 coined term “ecosystem”
Lindeman 1942 – trophic dynamic concept
Vernadsky – Biosphere: life partly creates and partly controls the
planetary environment
Lovelock and Margulis: Gaia
A system can be said to consist of four things:
1. the parts or elements of the system. These may be physical or
abstract or both, depending on the nature of the system.
2. attributes: the qualities or properties of the system and its objects.
3. internal relationships among its objects.
4. systems exist in an environment.
A system, then, is a set of things that affect one another
within an environment and form a larger pattern that is
different from any of the parts.
Systems as Networks
Introduction to Networks
Fundamental Concepts in Network Analysis
Concerned with understanding linkages among
actors/objects and the implications of them.
Actor/Object – a discrete individual or collective
unit (people, departments, nations, corporate sectors,
species, trophic groups, cells, organelles).
Connections/Ties – links between two actors/agents
Transaction – exchange of material or information
• transfer of material resources (financial, energetic)
• movement (migration)
• behavioral interaction (talking, messaging)
Pattern – structure of organization
• evaluation of one person by another (friendship, respect)
• association or affiliation (social groups, trophic groups)
• physical connection (road, river, bridge)
• formal relation (authority)
• biological relation (kinship)
Communicable disease
Syphilis Outbreak in Rockdale County, Georgia 1996
Terrorism network
High School Friendship
High School Dating
The Internet
Ecological Food Web
Oyster Reef Model (Dame and Patten 1981)
[Figure: flow diagram of six compartments; xi = storage, z1 = boundary input, yi = boundary output, fij = flow from compartment j to compartment i]
Compartment             Storage         Input          Output
1 Filter Feeders        x1 = 2000.00    z1 = 41.4697   y1 = 25.1646
2 Deposited Detritus    x2 = 1000.00                   y2 = 6.1759
3 Microbiota            x3 = 2.4121                    y3 = 5.7600
4 Meiofauna             x4 = 24.1214                   y4 = 3.5794
5 Deposit Feeders       x5 = 16.2740                   y5 = 0.4303
6 Predators             x6 = 69.2367                   y6 = 0.3594
Flows: f21 = 15.7915, f61 = 0.5135, f32 = 8.1721, f42 = 7.2745, f52 = 0.6431,
f43 = 1.2060, f53 = 1.2060, f54 = 0.6609, f24 = 4.2403, f25 = 1.9076,
f65 = 0.1721, f26 = 0.3262
Network analysis is a tool that allows you to formally
(i.e., not just intuitively) investigate and interpret
systems.
A big part of this class will be in learning to
recognize, construct, analyze and interpret
(socio)-ecological networks!!!
What are network data?
Boundary specification
What is your population?
Must have a finite set of actors
(company, sports league, ecosystem, group).
Who are the relevant actors?
Identify the population.
How are they connected?
Identify the connections. Be consistent.
How do you get the data?
Data measurement and collection
Questionnaires
Interviews
Observations
Archival records
Experiments
Other techniques
Examples:
Ecosystems (from field or from literature)
Economies
Employment
Kinship
Social relations
Sports leagues
Let’s construct a network of the students in the class…
Notation for network data: Graphs
Why graphs?
A graph is a model of the system.
Model - a simplified representation
Graphs provide:
• a common vocabulary
• known mathematical operations
• one can prove theorems about graphs and hence
about representations of network structures.
A graph consists of two sets of information:
a set of nodes, X = {x1, x2, …, xn}, and
a set of lines, L = {l1, l2, …, lL}, between pairs
of nodes.
There are n nodes and L lines.
Graph – undirected pairwise connection (“is
kin to”, “lives near”, “works with”).
No direction implied.
Two nodes are adjacent if the line lk = (xi, xj) is in
the set of lines L.
Each line is an unordered pair of distinct nodes,
lk = (xi, xj); since it is unordered, lk = (xi, xj) = (xj, xi).
[Figure: nodes xi and xj joined by line lk]
Loop – single edge starting and ending on same
node
Simple graph – no multiple edges or loops
Special Cases:
A trivial graph is one with only one node.
An empty graph is one with no lines.
Actor          Connection (lives near)
X1 Allison     Ross, Sarah
X2 Drew        Eliot
X3 Eliot       Drew
X4 Keith       Ross, Sarah
X5 Ross        Allison, Keith, Sarah
X6 Sarah       Allison, Keith, Ross
[Figure: graph of the six actors joined by lines l1 to l6]
l1 = (x1, x6)
l2 = (x1, x5)
l3 = (x2, x3)
l4 = (x4, x5)
l5 = (x4, x6)
l6 = (x5, x6)
Degree of a node is given by the number of
nodes that are adjacent to it.
Degree ranges from 0 to n–1
Each node could have its own degree
Mean nodal degree is a statistic that reports
the average degree of the nodes in the graph.
\[ \bar{d} = \frac{\sum_{i=1}^{n} d(x_i)}{n} = \frac{2L}{n} \]
Node           Degree
X1 Allison     d(x1) = 2
X2 Drew        d(x2) = 1
X3 Eliot       d(x3) = 1
X4 Keith       d(x4) = 2
X5 Ross        d(x5) = 3
X6 Sarah       d(x6) = 3
Total = 12, n = 6
Mean nodal degree = 2
\[ \bar{d} = \frac{\sum_{i=1}^{n} d(x_i)}{n} = \frac{12}{6} = \frac{2L}{n} = \frac{2(6)}{6} = 2 \]
If all node degrees are equal, the graph is said to
be d-regular, a measure of uniformity.
If it is not d-regular, the variance of degrees
is calculated as:
\[ S_D^2 = \frac{\sum_{i=1}^{n} \left( d(x_i) - \bar{d} \right)^2}{n} \]
For the six-student graph:
\[ S_D^2 = \frac{\sum_{i=1}^{n} \left( d(x_i) - \bar{d} \right)^2}{n}
 = \frac{(2-2)^2 + (1-2)^2 + (1-2)^2 + (2-2)^2 + (3-2)^2 + (3-2)^2}{6}
 = \frac{0^2 + (-1)^2 + (-1)^2 + 0^2 + 1^2 + 1^2}{6}
 = \frac{4}{6} \approx 0.667 \]
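The degree, mean nodal degree, and degree variance for the six-student "lives near" graph can be checked with a short Python sketch. The variable names are illustrative choices, not part of the original slides; the lines l1 to l6 are taken from the example above.

```python
# Undirected "lives near" graph from the six-student example.
nodes = ["Allison", "Drew", "Eliot", "Keith", "Ross", "Sarah"]
lines = [
    ("Allison", "Sarah"),  # l1 = (x1, x6)
    ("Allison", "Ross"),   # l2 = (x1, x5)
    ("Drew", "Eliot"),     # l3 = (x2, x3)
    ("Keith", "Ross"),     # l4 = (x4, x5)
    ("Keith", "Sarah"),    # l5 = (x4, x6)
    ("Ross", "Sarah"),     # l6 = (x5, x6)
]

# In a simple graph, the degree of a node equals the number of lines touching it.
degree = {x: 0 for x in nodes}
for a, b in lines:
    degree[a] += 1
    degree[b] += 1

n = len(nodes)
mean_degree = sum(degree.values()) / n  # same value as 2L / n
variance = sum((d - mean_degree) ** 2 for d in degree.values()) / n

print(degree)
print(mean_degree, variance)
```

Each line contributes to the degree of two nodes, which is why the degree total (12) is twice the number of lines (6).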
Which graph has the higher standard deviation of nodal degree?
[Figure: two graphs, A and B, each on nodes X1 to X5]
Graph Density – proportion of the possible lines that are present in the graph
Since there are n nodes, and excluding
loops, there are n(n–1)/2 possible lines in
the graph.
\[ \Delta = \frac{L}{n(n-1)/2} = \frac{2L}{n(n-1)} \]
Relation between density and mean degree.
Can combine equations to get:
\[ \Delta = \frac{\bar{d}}{n-1} \]
[Figure: the six-student "lives near" graph (X1 Allison, X2 Drew, X3 Eliot, X4 Keith, X5 Ross, X6 Sarah)]
\[ \Delta = \frac{2L}{n(n-1)} = \frac{2(6)}{6(5)} = \frac{12}{30} = 0.40 \]
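As a quick check of the two density formulas, the following Python sketch computes the density of the six-student graph both directly and from the mean degree. The function name `density` is an illustrative choice.

```python
def density(n, L):
    """Density of an undirected graph with n nodes and L lines (no loops)."""
    return 2 * L / (n * (n - 1))

n, L = 6, 6                           # six students, six "lives near" lines
mean_degree = 2 * L / n               # mean nodal degree, 2L / n
delta = density(n, L)                 # 2L / (n(n-1))
delta_from_mean = mean_degree / (n - 1)

print(delta, delta_from_mean)
```

Both routes give 0.40, so four tenths of the 15 possible lines are present.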
If all lines are present, then the graph is called
a complete graph, denoted Kn; it has n(n–1)/2 undirected edges.
Example Florentine Families
\( \bar{d} = 2.5;\ S_D^2 = 2.120;\ \Delta = 0.1667 \)
Family          Nodal Degree
 1 Acciaiuol    1
 2 Albizzi      3
 3 Barbadori    2
 4 Bischeri     3
 5 Castellan    3
 6 Ginori       1
 7 Guadagni     4
 8 Lambertes    1
 9 Medici       6
10 Pazzi        1
11 Peruzzi      3
12 Pucci        0
13 Ridolfi      3
14 Salviati     2
15 Strozzi      4
16 Tornabuon    3
Walk, trail, and path
Walk is a sequence of nodes and lines, starting
and ending with nodes, in which each node is incident
with the lines before and after it in the sequence.
Length of a walk is number of occurrences of
lines in it.
If a line is included more than once on the walk,
then it is counted each time it occurs.
Walk, trail, and path (cont)
Trail is a walk in which all lines are distinct
Path is a walk in which all nodes and all lines are
distinct
If there is a path between two nodes xi and xj then
xi and xj are said to be reachable.
A graph is connected if there is a path between
every pair of nodes, i.e., all nodes are reachable.
Distance and Diameter
Distance, d(i,j), is the length of the shortest path
between a pair of nodes
Diameter of a connected graph is the
length of the largest distance between
any pair of nodes.
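Distances, components, and diameter can be found with a breadth-first search. The sketch below applies this to the six-student "lives near" graph; the helper names `distances_from`, `components`, and `diameter` are illustrative, and breadth-first search is one standard way to obtain shortest path lengths in an unweighted graph, not necessarily the method used in the course.

```python
from collections import deque

nodes = ["Allison", "Drew", "Eliot", "Keith", "Ross", "Sarah"]
lines = [("Allison", "Sarah"), ("Allison", "Ross"), ("Drew", "Eliot"),
         ("Keith", "Ross"), ("Keith", "Sarah"), ("Ross", "Sarah")]

# Neighbor sets for the undirected graph.
adj = {x: set() for x in nodes}
for a, b in lines:
    adj[a].add(b)
    adj[b].add(a)

def distances_from(source):
    """Shortest path length from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def components():
    """Sets of mutually reachable nodes."""
    seen, comps = set(), []
    for x in nodes:
        if x not in seen:
            comp = set(distances_from(x))
            comps.append(comp)
            seen |= comp
    return comps

def diameter(component):
    """Largest distance between any pair of nodes in a connected component."""
    return max(max(distances_from(x).values()) for x in component)

comps = components()
print(comps)
print([diameter(c) for c in comps])
```

The graph has two components, so it is not connected; Drew and Eliot cannot be reached from the other four students, and the diameter is only defined within each component.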
Graph vs. Subgraph
Node and line generated subgraphs:
selecting nodes or lines to generate a subgraph
Maximal connected subgraphs in a graph are called components
Graph Connectivity
Cutpoints: A node xi is a cutpoint if the number of
components in the graph with xi is fewer than the
number of components in the subgraph that results
from deleting xi from the graph.
Bridge, analogous to cutpoint. A bridge is a line that is
critical to the connectedness of the graph.
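The cutpoint definition translates directly into a deletion test: remove a node (or a line) and see whether the number of components goes up. The Python sketch below does this on a small hypothetical five-node graph (a triangle a-b-c with a tail c-d-e) that is not from the slides; the function names are also illustrative. Faster algorithms exist, but this brute-force version follows the definition step by step.

```python
def count_components(nodes, lines):
    """Number of components, found by a depth-first search from each unseen node."""
    adj = {x: set() for x in nodes}
    for a, b in lines:
        adj[a].add(b)
        adj[b].add(a)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return count

def cutpoints(nodes, lines):
    """Nodes whose deletion increases the number of components."""
    base = count_components(nodes, lines)
    found = []
    for x in nodes:
        rest = [y for y in nodes if y != x]
        kept = [(a, b) for a, b in lines if x not in (a, b)]
        if count_components(rest, kept) > base:
            found.append(x)
    return found

def bridges(nodes, lines):
    """Lines whose deletion increases the number of components."""
    base = count_components(nodes, lines)
    return [l for l in lines
            if count_components(nodes, [m for m in lines if m != l]) > base]

# Hypothetical example graph: triangle a-b-c plus the tail c-d-e.
nodes = ["a", "b", "c", "d", "e"]
lines = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]

print(cutpoints(nodes, lines))
print(bridges(nodes, lines))
```

Nodes inside the triangle are protected by an alternative route, while every node and line on the tail is critical to connectedness.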
Florentine example revisited
A vulnerable graph is one that is more
likely to become disconnected if a few
nodes or lines are removed.
Cutpoints:
Albizzi
Guadagni
Medici
Salviati
Bridges:
Albizzi-Ginori
Guadagni-Lambertes
Medici-Salviati
Pazzi-Salviati
Medici-Acciaiuol
Isomorphic graphs – one-to-one mapping of nodes
that preserves the adjacency of the nodes.
If two graphs are isomorphic, then they are
identical on all graph theoretic properties.
[Figure: two isomorphic graphs, each on nodes x1, x2, x3, x4]
Cyclic and acyclic graphs
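A cycle is a closed walk of at least three nodes in which all
lines are distinct and all nodes except the beginning and
ending node are distinct. A graph with no cycles is acyclic.
A graph that is connected and is acyclic is
called a tree.
A disconnected graph with no cycles is called a
forest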
DIRECTED GRAPHS
Many connections are directional, meaning
the connection is oriented from one actor to another.
A directed graph, or digraph, has a set of nodes and
a set of arcs. Each arc is an ordered pair of distinct
nodes. The arc <xi, xj> is directed from xi (the
origin or sender) to xj (the terminus or receiver).
In <xi, xj>, node xi is adjacent to xj, and node xj
is adjacent from xi
The arc is represented by an arrow.
Three types of directed dyads
1. Null dyads have no arcs, in either
direction, between the two nodes.
2. Asymmetric dyad has an arc going
in one direction or the other, but
not both
3. A mutual or reciprocal dyad has
two arcs, one going in one direction
and the other going in the opposite
direction.
[Figure: nodes x1 and x2 drawn with no arc (null), with one arc in either direction (asymmetric), and with arcs in both directions (mutual)]
Actor          Connection (likes at beginning of year)
X1 Allison     Drew, Ross
X2 Drew        Eliot, Sarah
X3 Eliot       Drew
X4 Keith       Ross
X5 Ross        Sarah
X6 Sarah       Drew
[Figure: digraph of the six actors with one arrow per "likes" connection]
Indegree, dI(xi), is the number of nodes
that are adjacent to xi, i.e., the number of arcs
terminating at xi.
Outdegree, dO(xi), is the number of nodes
that are adjacent from xi, i.e., the number of
arcs originating at xi.
Outdegrees are measures of expansiveness
Indegrees are measures of receptivity or
popularity
Mean indegree and outdegree
\[ \bar{d}_I = \frac{\sum_{i=1}^{n} d_I(x_i)}{n}, \qquad
   \bar{d}_O = \frac{\sum_{i=1}^{n} d_O(x_i)}{n}, \qquad
   \bar{d}_I = \bar{d}_O = \frac{L}{n} \]
Variance of indegree and outdegree
\[ S_{D_I}^2 = \frac{\sum_{i=1}^{n} \left( d_I(x_i) - \bar{d}_I \right)^2}{n}, \qquad
   S_{D_O}^2 = \frac{\sum_{i=1}^{n} \left( d_O(x_i) - \bar{d}_O \right)^2}{n} \]
Measures how unequal the actors are in a
network with respect to originating or receiving connections
Types of nodes in a directed graph
Isolate if dI(xi) = dO(xi) = 0,
Transmitter if dI(xi) = 0 and dO(xi) > 0,
Receiver if dI(xi) >0 and dO(xi) = 0,
Carrier or ordinary if dI(xi) >0 and dO(xi) > 0.
Density of a directed graph:
\[ \Delta = \frac{L}{n(n-1)} \]
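The digraph measures above can be tried on the "likes at beginning of year" example. The Python sketch below is a minimal illustration; the arc list is read off the actor table, and the names `node_type` and `variance` are illustrative choices.

```python
nodes = ["Allison", "Drew", "Eliot", "Keith", "Ross", "Sarah"]
# One (sender, receiver) arc per "likes" connection in the actor table.
arcs = [
    ("Allison", "Drew"), ("Allison", "Ross"),
    ("Drew", "Eliot"), ("Drew", "Sarah"),
    ("Eliot", "Drew"),
    ("Keith", "Ross"),
    ("Ross", "Sarah"),
    ("Sarah", "Drew"),
]

indegree = {x: 0 for x in nodes}
outdegree = {x: 0 for x in nodes}
for sender, receiver in arcs:
    outdegree[sender] += 1   # arc originates at sender
    indegree[receiver] += 1  # arc terminates at receiver

def node_type(x):
    """Classify a node by its indegree and outdegree."""
    d_in, d_out = indegree[x], outdegree[x]
    if d_in == 0 and d_out == 0:
        return "isolate"
    if d_in == 0:
        return "transmitter"
    if d_out == 0:
        return "receiver"
    return "carrier or ordinary"

def variance(values):
    """Population variance (divide by n), as in the degree variance formulas."""
    values = list(values)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

n, L = len(nodes), len(arcs)
mean_in = sum(indegree.values()) / n
mean_out = sum(outdegree.values()) / n
delta = L / (n * (n - 1))  # density of a directed graph

for x in nodes:
    print(x, indegree[x], outdegree[x], node_type(x))
print(mean_in, mean_out, delta)
print(variance(indegree.values()), variance(outdegree.values()))
```

Every arc adds one to exactly one indegree and one outdegree, so both means equal L/n. The indegree variance comes out larger than the outdegree variance, meaning these actors differ more in popularity than in expansiveness.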
Distance and Diameter of digraph
Distance is the length of the shortest directed path from xi to xj
Diameter is the length of the longest distance
between any ordered pair of nodes.
Valued graphs and valued directed graphs
Weighted graphs: the values can record frequency of interaction,
dollar amount of exchange, energy flow in
ecosystem.
A special case is the set of graphs whose values are probabilities.
These graphs are known as Markov chains
and their corresponding matrices are
referred to as transition matrices.
“For the last thirty years, empirical social research has been
dominated by the sample survey. But as usually practiced, using
random samplings of individuals, the survey is a sociological
meatgrinder, tearing the individual from his social context and
guaranteeing that nobody in the study interacts with anyone else in it.
It is a little like a biologist putting his experimental animals through a
hamburger machine and looking at every hundredth cell through a
microscope; anatomy and physiology get lost, structure and function
disappear, and one is left with cell biology… If our aim is to
understand people’s behavior rather than simply record it, we want to
know about primary groups, neighborhoods, organizations, social
circles, and communities; about interaction, communication, role
expectations, and social control.”
Barton 1968 reprinted in Freeman 2004.
Freeman defines Social Network Analysis as a defined
paradigm of research, having the following:
1. Social network analysis is motivated by a structural
intuition based on ties linking social actors,
2. It is grounded in systematic empirical data,
3. It draws heavily on graphic imagery, and
4. It relies on the use of mathematical and/or
computational models.
Graph information can be expressed as a Matrix.
Useful for presenting, manipulating, and analyzing
data
Adjacency Matrix, A = (aij) – rows and columns labeled by
nodes, with a 1 in position (i, j) iff xi and xj are adjacent, and 0
otherwise. For a graph with no loops, the adjacency matrix must have
0s on the diagonal.
In undirected graphs the adjacency matrix is symmetric: aij = aji
Actor          Connection (lives near)
X1 Allison     Ross, Sarah
X2 Drew        Eliot
X3 Eliot       Drew
X4 Keith       Ross, Sarah
X5 Ross        Allison, Keith, Sarah
X6 Sarah       Allison, Keith, Ross
Rows and columns ordered Allison, Drew, Eliot, Keith, Ross, Sarah:
\[ A = \begin{bmatrix}
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 1 & 0
\end{bmatrix} \]
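Building the adjacency matrix from a list of lines is mechanical. The Python sketch below does it for the "lives near" graph using plain nested lists; the variable names are illustrative.

```python
nodes = ["Allison", "Drew", "Eliot", "Keith", "Ross", "Sarah"]
lines = [("Allison", "Sarah"), ("Allison", "Ross"), ("Drew", "Eliot"),
         ("Keith", "Ross"), ("Keith", "Sarah"), ("Ross", "Sarah")]

index = {name: i for i, name in enumerate(nodes)}  # node name -> row/column
n = len(nodes)

A = [[0] * n for _ in range(n)]
for a, b in lines:
    i, j = index[a], index[b]
    A[i][j] = 1  # an undirected line fills both (i, j) and (j, i)
    A[j][i] = 1

for row in A:
    print(row)

is_symmetric = all(A[i][j] == A[j][i] for i in range(n) for j in range(n))
row_sums = [sum(row) for row in A]  # row sums give the nodal degrees
print(is_symmetric, row_sums)
```

For a directed graph, only `A[i][j]` would be set for an arc from node i to node j, and the matrix would in general no longer be symmetric.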
Actor          Connection (likes at beginning of year)
X1 Allison     Drew, Ross
X2 Drew        Eliot, Sarah
X3 Eliot       Drew
X4 Keith       Ross
X5 Ross        Sarah
X6 Sarah       Drew
Rows (senders) and columns (receivers) ordered Allison, Drew, Eliot, Keith, Ross, Sarah:
\[ A = \begin{bmatrix}
0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0
\end{bmatrix} \]
Matrix Vocabulary
Size (or order) is defined as the number of rows and
columns in the matrix.
Adjacency matrices have the same number of rows
and columns and thus are square.
Each entry in a matrix is called a cell or element
Main Diagonal – consists of the entries in which the row
and column index are the same (aii).
A symmetric matrix is one with aij=aji for all i,j
Matrix Vocabulary (cont)
Matrix addition is possible if the matrices are the same size,
Z=X+Y, where zij=xij+yij
Matrix Vocabulary (cont)
Matrix multiplication is used to study walks and reachability
Z=YW
Number of columns of Y must equal the number of rows of W
Identity matrix (I) is defined such that IX ≡ X
\[ I = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix} \]
Matrix Vocabulary (cont)
Powers of a matrix: XX = X^2
XX^2 = X^3
XX^3 = X^4
in general, X^m (X to the mth power) is the matrix product of m
copies of X
Powers of a matrix!!
The matrix X^m gives exactly the number of walks of length m
between each pair of nodes.
X^1 are the direct walks.
X^2 are the walks that take two steps
X^3 are the walks that take three steps, etc.
Notice that some elements which were zero originally get
filled in.
In other words we have a way to identify the indirect, i.e.,
m>1, walks in the matrix, and hence in the graph.
Example 1 - digraph
[Figure: digraph on nodes x1, x2, x3]
\[ A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \end{bmatrix} \]
Higher Order (Indirect) Pathways
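A^m, where m > 1
What happens to aij as m → ∞?
[Figure: digraph on nodes x1, x2, x3]
\[ A^2 = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix} \qquad
   A^3 = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \]
\[ A^4 = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 2 & 1 & 1 \end{bmatrix} \qquad
   A^5 = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 1 \\ 2 & 1 & 2 \end{bmatrix} \]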
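The powers of the three-node adjacency matrix can be reproduced with a few lines of Python. This is a minimal sketch using nested lists; `matmul` and `matpow` are illustrative helper names rather than functions from any particular library.

```python
def matmul(Y, W):
    """Matrix product Z = YW; the columns of Y must match the rows of W."""
    assert len(Y[0]) == len(W)
    return [[sum(Y[i][k] * W[k][j] for k in range(len(W)))
             for j in range(len(W[0]))]
            for i in range(len(Y))]

def matpow(X, m):
    """X to the mth power (m >= 1) by repeated multiplication."""
    result = X
    for _ in range(m - 1):
        result = matmul(result, X)
    return result

# Adjacency matrix of the three-node digraph in Example 1.
A = [[0, 0, 1],
     [1, 0, 0],
     [1, 1, 0]]

for m in range(1, 6):
    print(m, matpow(A, m))
```

Entries that are zero in A become positive in higher powers, which is how the indirect walks show up; the entries keep growing as m increases because this digraph contains cycles that walks can traverse repeatedly.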
Powers of a matrix!!
Over all walk lengths, if there is a way to get between any
two nodes then they are reachable, so one can sum the
powers of the matrices to see if there are any gaps in the
connectedness.
\[ X^{[R]} = X + X^2 + X^3 + \cdots + X^{n-1} \]
Two nodes xi and xj are reachable if and only if the (i, j) entry
of X^[R] is ≥ 1, and not reachable if it is 0.
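The summed-powers test for reachability can be sketched in Python as follows, here applied to the "lives near" adjacency matrix of the six students. The helper names `matmul` and `reachability` are illustrative.

```python
def matmul(Y, W):
    """Matrix product Z = YW."""
    return [[sum(Y[i][k] * W[k][j] for k in range(len(W)))
             for j in range(len(W[0]))]
            for i in range(len(Y))]

def reachability(X):
    """Return X + X^2 + ... + X^(n-1) for an n-by-n matrix X."""
    n = len(X)
    total = [row[:] for row in X]
    power = [row[:] for row in X]
    for _ in range(n - 2):  # adds X^2 up to X^(n-1)
        power = matmul(power, X)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total

# "Lives near" graph; rows/columns: Allison, Drew, Eliot, Keith, Ross, Sarah.
X = [[0, 0, 0, 0, 1, 1],
     [0, 0, 1, 0, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1],
     [1, 0, 0, 1, 0, 1],
     [1, 0, 0, 1, 1, 0]]

R = reachability(X)
reachable = [[entry >= 1 for entry in row] for row in R]
for row in reachable:
    print(row)
```

The zeros that remain in the sum mark the gap between the two components: Drew and Eliot are reachable from each other but from nobody else.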
MATRIX CALCULATIONS
PRACTICE:
1. BY HAND
2. WITH MATLAB
Oyster reef model
[Figure: the Dame and Patten (1981) oyster reef flow diagram, repeated from earlier, with the same storages (x), boundary input (z1), boundary outputs (y) and flows (fij) among Filter Feeders, Deposited Detritus, Microbiota, Meiofauna, Deposit Feeders and Predators]
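Adjacency matrix of the oyster reef model and its powers (rows and columns ordered x1 to x6; aij = 1 where flow fij is present):
\[ A = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0
\end{bmatrix} \]
\[ A^2 = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
1 & 2 & 2 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 0 & 1 & 1 & 1 \\
1 & 2 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 0 & 0
\end{bmatrix} \]
\[ A^3 = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
2 & 4 & 2 & 3 & 2 & 2 \\
1 & 2 & 2 & 1 & 1 & 0 \\
2 & 2 & 2 & 2 & 2 & 1 \\
3 & 3 & 2 & 3 & 3 & 2 \\
1 & 2 & 1 & 1 & 1 & 1
\end{bmatrix} \]
\[ A^4 = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
6 & 7 & 5 & 6 & 6 & 4 \\
2 & 4 & 2 & 3 & 2 & 2 \\
3 & 6 & 4 & 4 & 3 & 2 \\
5 & 8 & 6 & 6 & 5 & 3 \\
3 & 3 & 2 & 3 & 3 & 2
\end{bmatrix} \]
\[ A^5 = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
11 & 17 & 12 & 13 & 11 & 7 \\
6 & 7 & 5 & 6 & 6 & 4 \\
8 & 11 & 7 & 9 & 8 & 6 \\
11 & 17 & 11 & 13 & 11 & 8 \\
5 & 8 & 6 & 6 & 5 & 3
\end{bmatrix} \]
\[ A^{10} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
519 & 759 & 518 & 595 & 519 & 354 \\
241 & 354 & 241 & 277 & 241 & 165 \\
353 & 519 & 354 & 406 & 353 & 241 \\
518 & 760 & 519 & 595 & 518 & 353 \\
242 & 353 & 241 & 277 & 242 & 165
\end{bmatrix} \]
\[ A^{20} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
1083304 & 1587660 & 1083305 & 1243524 & 1083304 & 739168 \\
504356 & 739168 & 504355 & 578949 & 504356 & 344136 \\
739169 & 1083304 & 739168 & 848491 & 739169 & 504356 \\
1083304 & 1587660 & 1083304 & 1243524 & 1083304 & 739169 \\
504355 & 739169 & 504356 & 578949 & 504355 & 344135
\end{bmatrix} \]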
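The oyster reef powers can be recomputed from the adjacency matrix alone. The Python sketch below assumes the convention used for the flows fij, i.e. row i receives from column j; the helper names are illustrative, and plain Python integers are used so that the large entries of the 20th power stay exact.

```python
def matmul(Y, W):
    """Matrix product Z = YW."""
    return [[sum(Y[i][k] * W[k][j] for k in range(len(W)))
             for j in range(len(W[0]))]
            for i in range(len(Y))]

def matpow(X, m):
    """X to the mth power (m >= 1) by repeated multiplication."""
    result = X
    for _ in range(m - 1):
        result = matmul(result, X)
    return result

# Oyster reef adjacency matrix: A[i][j] = 1 if compartment j+1 feeds compartment i+1.
A = [[0, 0, 0, 0, 0, 0],   # x1 Filter Feeders (no inputs from other compartments)
     [1, 0, 0, 1, 1, 1],   # x2 Deposited Detritus
     [0, 1, 0, 0, 0, 0],   # x3 Microbiota
     [0, 1, 1, 0, 0, 0],   # x4 Meiofauna
     [0, 1, 1, 1, 0, 0],   # x5 Deposit Feeders
     [1, 0, 0, 0, 1, 0]]   # x6 Predators

for m in (2, 5, 10, 20):
    print("A^%d" % m)
    for row in matpow(A, m):
        print(row)
```

The first row stays zero at every power because no compartment feeds the filter feeders, while the other entries grow rapidly with m: cycling through detritus creates ever more indirect pathways as walk length increases.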
THANK YOU FOR YOUR ATTENTION