Graph Theory and Social Network Analysis

Theory!
Engage brain ;)
Network Analysis:
Social Networks
Network Analysis


‘Network’ is a heavily overloaded term, so ‘network
analysis’ means different things to different people.
Specific forms of network analysis are used in the study of
diverse structures such as:
– aspects of the Internet,
– interlocking directorates,
– transportation systems,
– epidemic spreading,
– metabolic pathways,
– the Web graph,
– electrical circuits, etc.
There is, however, a broad methodological foundation
which is quickly becoming a prerequisite for researchers
and practitioners working with network models.
- “Network Analysis” -Brandes, Ulrik; Erlebach, Thomas (Eds.)
Hint: how to make this theory easier ...

Understand first – get the idea straight. Then look for the
formal definitions and terms.

Draw a mind map during the lectures.

Keep track of all new terminology. Learn it!

Organize your mental framework.
Networks and Graphs


Mathematically, social (and other) networks can be
represented as graphs or matrices (different to the charts
and graphs you created in Excel)
Representing a problem as a graph:
– can provide a different point of view
– can make a problem much simpler and provide the appropriate
tools for solving the problem
http://www.instituteforadvancedstudies.org.uk/Portals/50/ComplexNetworks_Jan/Estrada_tutorial1.ppt
Terminology

There are two components to a graph:
– Nodes and edges
– Nodes are also called points, vertices, or actors (only in social networks)
– Edges are also called ties, links, or arcs (in digraphs)

As a broad generalisation, in social networks:
– People are nodes and interactions between people are edges
Formal definition

A graph is defined as a set of nodes (a,b,c,d,e) and a set of
links ({a,b}, {b,c}, {c,d}, {d,a}, {a,e}) that connect the nodes.

This is sometimes written mathematically as G=(V,E) or
G(V,E). The V and the E stand for vertices (nodes) and
edges (links/ties)
http://www.analytictech.com/mb021/graphtheory.htm
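A minimal sketch of the definition G=(V,E) in Python, using the node and link sets from the slide. The helper name `neighbours` is an illustrative choice, not standard terminology from the lecture.

```python
# Vertex set V and edge set E for the example graph G = (V, E).
V = {"a", "b", "c", "d", "e"}

# Each edge is a frozenset, so {a, b} and {b, a} are the same (undirected) edge.
E = {
    frozenset(pair)
    for pair in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "e")]
}


def neighbours(node, edges):
    """Return the set of nodes that share an edge with `node`."""
    return {other for edge in edges if node in edge for other in edge if other != node}


print(sorted(neighbours("a", E)))   # ['b', 'd', 'e']
print(frozenset(("b", "a")) in E)   # True: edge order does not matter
```

Storing edges as unordered pairs mirrors the curly-brace notation {a,b} used in the definition.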
Social Network Analysis
Social network analysis [SNA] is the mapping and
measuring of relationships and flows between people,
groups, organizations, computers, URLs, and other
connected information/knowledge entities.
 The nodes in the network are the people and groups while
the edges show relationships or flows between the nodes.
 SNA provides both a visual and a mathematical analysis of
human relationships. Management consultants use this
methodology with their business clients and call it
Organizational Network Analysis [ONA].

http://www.orgnet.com/sna.html
Social Network Analysis

Extremely fast growing field:
– "supplier" to sociology, anthropology,
psychology, management, health, etc
– "client" of graph theory, algebra, statistics,
sociometry, psychometry

Most complex systems are graph-like
http://www.analytictech.com/networks/whatis.htm
Social Network Examples:
Friendship
http://www.instituteforadvancedstudies.org.uk/Portals/50/ComplexNetworks_Jan/Estrada_tutorial1.ppt
Social Network Examples:
Scientific Collaboration
http://www.instituteforadvancedstudies.org.uk/Portals/50/ComplexNetworks_Jan/Estrada_tutorial1.ppt
Social Network Examples:
Business ties in US biotech-industry
http://www.instituteforadvancedstudies.org.uk/Portals/50/ComplexNetworks_Jan/Estrada_tutorial1.ppt
Social Network Analysis

Problems:
– IRL: lack of sufficient computing resources to handle large
datasets. It is also problematic to bound a social network IRL, e.g. if we
are looking at needle-sharing among drug users, we can artificially
bound the network at some arbitrary boundary, such as city or
neighbourhood, but this distorts the data
– Online: SNSs allow us to use group and community
membership lists to create boundaries
– SNA is criticised as too theoretical, but a proper understanding gives
enormous power to control people and events, as well as an
understanding of history (e.g. the rise of Moscow in 12th century
Russia in terms of betweenness centrality)
http://www.analytictech.com/networks/whatis.htm
Basic Graph Theory



Graph theory gives us the tools to analyse social networks.
The length of the lines and the shape of the graph don’t
usually mean anything, because all the drawing represents is whether
there is or is not a relationship.
These three graphs show the exact same network:
Example
A network depicting the sites on the Internet, then known as the
Arpanet, in December 1970. (Image from F. Heart, A. McKenzie, J.
McQuillian, and D. Walden [215]; on-line at
http://som.csudh.edu/cis/lpress/history/arpamaps/.)
Examples
An alternate drawing of the 13-node Internet graph on the
previous slide – looks different but expresses the same
thing.
More on graphs


The nodes in a graph represent persons (or animals,
organizations, cities, countries, etc) and the lines (edges)
represent relationships among them. The edge between
persons a and b is represented mathematically like this:
(a,b).
The network drawn below contains these edges: (a,b),
(a,e), (a,d), (b,c), and (d,c).
http://www.analytictech.com/mb021/graphtheory.htm
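The same edge list can also be written as a matrix, the other representation mentioned earlier. A small sketch, assuming an undirected graph and a fixed node order a..e (the variable names are illustrative):

```python
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "e"), ("a", "d"), ("b", "c"), ("d", "c")]

# Map each node name to a row/column position.
index = {name: i for i, name in enumerate(nodes)}

# Start with an all-zero matrix, then mark each edge in both directions
# because the graph is undirected.
matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1

for name, row in zip(nodes, matrix):
    print(name, row)
# a [0, 1, 0, 1, 1]
# b [1, 0, 1, 0, 0]
# c [0, 1, 0, 1, 0]
# d [1, 0, 1, 0, 0]
# e [1, 0, 0, 0, 0]
```

Each row sum is the degree of that node, and the matrix is symmetric because every relationship runs both ways.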
Mathematically, a graph G is a triple (V, E, φ), where:

V is a finite set, called the vertices of G,
E is a finite set, called the edges of G, and
φ is a function with domain E and codomain P2(V), the set of
all two-element subsets of V.
Degree, & the handshaking lemma


The degree of a node is defined as
the number of lines incident upon
that node. The degree of B is 6
because it has 6 links
The handshaking lemma is the
statement that every finite
undirected graph has an even
number of nodes/vertices with
odd degree
Name comes from mathematical proof that in a party of
people some of whom shake hands, an even number of
people must have shaken an odd number of other people's
hands (think about it ;) ).
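A quick way to convince yourself of the handshaking lemma is to check it on a small graph. The sketch below uses the five-node example graph from the earlier slides (not the graph in this slide's figure, which is not reproduced in the text); `degrees` is an illustrative helper name.

```python
from collections import Counter


def degrees(edges):
    """Count how many edge ends touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = [("a", "b"), ("a", "e"), ("a", "d"), ("b", "c"), ("d", "c")]
deg = degrees(edges)
odd_nodes = sorted(n for n, d in deg.items() if d % 2 == 1)

print(dict(deg))            # {'a': 3, 'b': 2, 'e': 1, 'd': 2, 'c': 2}
print(sum(deg.values()))    # 10, i.e. twice the number of edges
print(odd_nodes)            # ['a', 'e'] -> an even number of odd-degree nodes
```

Every edge adds 1 to the degree of two nodes, so the degree total is always even, which forces the count of odd-degree nodes to be even.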
Isolates and pendants

How would you add an
isolate to this graph?

In a SNS on the web,
when would you typically
be an isolate and for how
long?

What is a pendant?
Locate one on the graph.
Undirected graphs


There are different kinds of graphs, e.g. directed and
undirected.
Undirected graphs: edges have no direction. For example
below, there is a relationship between a and b, and this is
the same thing as saying there is a relationship between b
and a. We could refer to the edge as (a,b) or (b,a) -- it
makes no difference.
Directed graphs


In directed graphs (also known as digraphs), the edges do
have direction. In such cases, we typically draw the graph
with arrowheads, and refer to the lines as "arcs".
The example below records the social relation "who likes
whom". Persons b, d and e all say they like person a. Note
that person a does not say they like d or e, but they do
reciprocate with b. Nobody says they like e.
More on graphs


In a directed graph, a point has both indegree and
outdegree. The outdegree is the number of arcs from that
point to other points. E.g., below the outdegree of node a
is 1.
The indegree is the number of arcs coming in to the point
from other points. E.g., below the indegree of node a is
3.
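Indegree and outdegree can be counted directly from a list of arcs. The arcs below are reconstructed from the "who likes whom" description (b, d and e like a; a likes b); the figure itself may contain further arcs that the text does not mention, so treat this as a partial sketch.

```python
from collections import Counter

# Each arc is (source, target): "source says they like target".
arcs = [("b", "a"), ("d", "a"), ("e", "a"), ("a", "b")]

outdegree = Counter(source for source, _ in arcs)
indegree = Counter(target for _, target in arcs)

print(outdegree["a"])  # 1: a likes only b
print(indegree["a"])   # 3: b, d and e like a
print(indegree["e"])   # 0: nobody says they like e
```

A `Counter` returns 0 for nodes that never appear, which is why the lookup for `e` works without extra code.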
Paths

A path is an alternating sequence of points and lines,
beginning at a point and ending at a point, and which
does not visit any point more than once
Walks


A walk is an alternating sequence of nodes and edges,
beginning and ending with a node.
A walk is closed if its first and last nodes are the same,
and open if they are different. An open walk that does not
visit any node more than once is a path.
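The difference between walks and paths is easy to test mechanically. A sketch under the assumption that a walk is given as a list of nodes and that the graph is the undirected five-node example from the earlier slides; `classify` is a hypothetical helper name.

```python
edges = {frozenset(e) for e in [("a", "b"), ("a", "e"), ("a", "d"), ("b", "c"), ("d", "c")]}


def is_walk(sequence, edges):
    """Every pair of consecutive nodes must be joined by an edge."""
    return all(frozenset((u, v)) in edges for u, v in zip(sequence, sequence[1:]))


def classify(sequence, edges):
    """Label a node sequence as a closed walk, a path, or a plain open walk."""
    if not is_walk(sequence, edges):
        return "not a walk"
    if len(sequence) > 1 and sequence[0] == sequence[-1]:
        return "closed walk"
    if len(set(sequence)) == len(sequence):
        return "path"            # open walk with no repeated node
    return "open walk"           # open, but some node is visited twice


print(classify(["e", "a", "b", "c"], edges))        # path
print(classify(["a", "b", "c", "d", "a"], edges))   # closed walk
print(classify(["e", "a", "b", "a", "d"], edges))   # open walk (a is repeated)
print(classify(["e", "c"], edges))                  # not a walk (no edge e-c)
```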
Examples
(a) www.airlineroutemaps.com/USA/Northwest Airlines asia pacific.shtml, (b) www.wmata.com/metrorail/systemmap.cfm, (c) www.cs.cornell.edu/ugrad/flowchart.htm
Examples

The depictions of airline and subway systems in (a) and (b)
are examples of transportation networks, in which nodes
are destinations and edges represent direct connections.
Much of the terminology surrounding graphs derives from
metaphors based on transportation through a network of
roads, rail lines, or airline flights
A Water Molecule
Examples


The network of prerequisites among college
courses in (c) is an example of a
dependency network, in which
nodes are tasks and directed edges
indicate that one task must be
performed before another.
The design of complex software
systems and industrial processes
often requires the analysis of
enormous dependency networks,
with important consequences for
efficient scheduling in these
settings.
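Scheduling a dependency network means finding an order in which every task comes after its prerequisites (a topological order). A minimal sketch using Python's standard `graphlib` module (Python 3.9+); the course names are made up for illustration and are not taken from the Cornell flowchart.

```python
from graphlib import TopologicalSorter

# Hypothetical courses: each key maps to the set of courses that must come first.
prerequisites = {
    "Algorithms": {"Data Structures", "Discrete Maths"},
    "Data Structures": {"Intro Programming"},
    "Discrete Maths": set(),
    "Intro Programming": set(),
}

# static_order() yields the nodes so that prerequisites always appear first.
order = list(TopologicalSorter(prerequisites).static_order())
print(order)
```

Several valid orders usually exist, so the exact output may differ between runs; what matters is that each course appears after everything it depends on.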
Examples


The Tank Street Bridge from Brisbane, Australia shown in (d) is an
example of a structural network, with joints as nodes and physical
linkages as edges.
The internal frameworks of mechanical structures like buildings,
vehicles, or human bodies are based on such networks
Connectedness and reachability


A graph is connected if there exists a path (of any length)
from every node to every other node. The longest
possible path between any two points in a connected
graph has length n-1 (counted in edges), where n is the number of nodes in the graph.
A node is reachable from another node if there exists a
path of any length from one to the other
Examples


A graph with 3 connected components. The graph as a
whole is NOT connected.
A connected component of a graph is a subset of the
nodes such that:
– (i) every node in the subset has a path to every other; and
– (ii) the subset is not part of some larger set with the
property that every node can reach every other
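Connected components can be found by repeatedly picking an unvisited node and collecting everything reachable from it. The sketch below uses breadth-first search on a small made-up graph with three components (it is not the graph in the slide's figure).

```python
from collections import deque


def connected_components(nodes, edges):
    """Return a list of node sets, one per connected component."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical example graph with three separate pieces.
nodes = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("b", "c"), ("d", "e"), ("f", "g")]

parts = connected_components(nodes, edges)
print(len(parts))        # 3
print(len(parts) == 1)   # False: the graph as a whole is not connected
```

A graph is connected exactly when this function returns a single component.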
Examples

A network in which the nodes are students in a large American high
school, and an edge joins two who had a romantic relationship at
some point during the 18-month period in which the study was
conducted
Geodesic distance



The graph-theoretic or geodesic distance between two
points is defined as the length of the shortest path
between them.
Thus, a geodesic path is a shortest path between two
points (there may be more than one).
If something is flowing through a network (such as gossip,
or a disease), the time that it takes to get from one point
to another is partly a function of the graph-theoretic
distance/length of the geodesic path between them.
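In an unweighted graph the geodesic distance can be computed with breadth-first search, which explores nodes in order of increasing distance. A sketch on the five-node example graph from the earlier slides; the function name `geodesic_distance` is illustrative.

```python
from collections import deque


def geodesic_distance(adjacency, source, target):
    """Number of edges on a shortest path, or None if target is unreachable."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return distance[current]
        for neighbour in adjacency[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return None


adjacency = {
    "a": {"b", "d", "e"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
    "e": {"a"},
}

print(geodesic_distance(adjacency, "e", "c"))  # 3 (e-a-b-c or e-a-d-c)
print(geodesic_distance(adjacency, "b", "d"))  # 2 (b-a-d or b-c-d)
```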
Pivotal nodes




We say that a node X is pivotal for a pair
of distinct nodes Y and Z if X lies on
every geodesic (shortest) path between
Y and Z (and X is not equal to either Y or
Z)
E.g. in the graph below, node B is pivotal
for two pairs: the pair consisting of A
and C, and the pair consisting of A and
D.
B is not pivotal for the pair consisting of
D and E – why?
Is node D pivotal for any pairs?
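The definition of a pivotal node can be checked by counting shortest paths: X is pivotal for Y and Z when X lies on a geodesic and the number of geodesics through X equals the total number of geodesics. The sketch below uses the lowercase five-node example graph from the earlier slides, since the figure for this slide is not reproduced in the text; the function names are illustrative.

```python
from collections import deque


def distances_and_counts(adjacency, source):
    """BFS that records distance and number of shortest paths from source."""
    distance = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                count[neighbour] = 0
                queue.append(neighbour)
            if distance[neighbour] == distance[current] + 1:
                count[neighbour] += count[current]
    return distance, count


def is_pivotal(adjacency, x, y, z):
    """True if x lies on every shortest path between y and z."""
    if x in (y, z) or y == z:
        return False
    dist_y, count_y = distances_and_counts(adjacency, y)
    dist_x, count_x = distances_and_counts(adjacency, x)
    if z not in dist_y or x not in dist_y or z not in dist_x:
        return False
    on_a_geodesic = dist_y[x] + dist_x[z] == dist_y[z]
    # Geodesics through x = (geodesics y..x) * (geodesics x..z).
    return on_a_geodesic and count_y[x] * count_x[z] == count_y[z]


adjacency = {
    "a": {"b", "d", "e"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
    "e": {"a"},
}

print(is_pivotal(adjacency, "a", "e", "b"))  # True: e-a-b is the only geodesic
print(is_pivotal(adjacency, "a", "b", "d"))  # False: b-c-d avoids a
```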
Exercises

Give an example of a graph in which every node is pivotal
for at least one pair of nodes. Explain your answer.

Give an example of a graph in which every node is pivotal
for at least two different pairs of nodes. Explain your
answer.

Give an example of a graph having at least four nodes in
which there is a single node X that is pivotal for every pair
of nodes (not counting pairs that include X). Explain your
answer.
Centrality
The centrality of a node in a network is a measure
of the structural importance of the node.
 A person's centrality in a social network affects
the opportunities and constraints that they face.
 There are three important aspects of centrality:

– degree/activity,
– betweenness, and
– closeness.
Centrality


Degree/Activity – the number of direct connections a
node has, i.e. who knows whom.
The greater a person's degree, the more potential
influence they have on the network, and vice versa
(good/bad -> gossip, viruses)
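Degree centrality is just a count of ties per node. The sketch below assumes the network discussed in the following slides is the well-known Krackhardt "kite" network, whose member names (Diane, Fernando, Garth, Heather, ...) match the names used here; the edge list is an assumption, because the figure is not reproduced in the text.

```python
from collections import Counter

# Assumed edge list: the Krackhardt kite network (10 people, 18 ties).
edges = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"), ("Andre", "Fernando"),
    ("Beverly", "Diane"), ("Beverly", "Ed"), ("Beverly", "Garth"),
    ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"),
    ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"),
    ("Heather", "Ike"),
    ("Ike", "Jane"),
]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# most_common() lists people from highest to lowest degree.
for person, ties in degree.most_common():
    print(person, ties)

print(max(degree, key=degree.get))  # Diane
```

Under this assumed edge list Diane has 6 direct connections, more than anyone else.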
Centrality

Here Diane has the most direct connections, making hers the
most active node in the network. She is a 'connector' or 'hub' in
this network.
Centrality

Degree/Activity
– Often "the more connections, the better", but not always.
– Where do those connections lead, and how do they connect the
otherwise unconnected?
– Diane has connections only to others in her immediate cluster, her clique. She connects only those who are already connected to
each other.
Centrality

Betweenness
– Loosely speaking, betweenness centrality is defined as the number of
geodesic paths that pass through a node (remember that a geodesic
path is a shortest path between two points)
– It is the number of "times" that any route needs to go through a given
node to reach any other node by the shortest path, i.e. for a given node
it is the number of shortest paths in the network that pass through that
node
Calculating betweenness

Betweenness with respect to specific nodes
= #geodesic paths through node / total #geodesic paths
Olaf’s betweenness score with respect to Eliza and Latisha:
– Geodesic paths from Eliza to Latisha: EJOBL and EAOBL
– Geodesic paths from Eliza to Latisha through Olaf: EJOBL and EAOBL
– Olaf’s betweenness score with respect to Eliza and Latisha is 2/2 = 1.
Calculating betweenness

Betweenness with respect to specific nodes
= #geodesic paths through node / total #geodesic paths
Anna’s betweenness score with respect to Eliza and Latisha:
– Geodesic paths from Eliza to Latisha: EJOBL and EAOBL
– Geodesic paths from Eliza to Latisha through Anna: EAOBL
– Anna’s betweenness score with respect to Eliza and Latisha is 1/2.
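The two calculations above can be reproduced by listing every geodesic between a pair of nodes and counting how many pass through the node of interest. The sketch below assumes only the six ties implied by the paths EJOBL and EAOBL (E-J, E-A, J-O, A-O, O-B, B-L); the full network in the figure has more nodes and ties, which are not given in the text.

```python
from collections import deque
from fractions import Fraction


def geodesics(adjacency, source, target):
    """Return every shortest path from source to target as a list of nodes."""
    distance = {source: 0}
    predecessors = {source: []}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                predecessors[neighbour] = []
                queue.append(neighbour)
            if distance[neighbour] == distance[current] + 1:
                predecessors[neighbour].append(current)
    if target not in distance:
        return []

    def unwind(node):
        # Walk back along shortest-path predecessors until the source.
        if node == source:
            return [[source]]
        return [path + [node] for p in predecessors[node] for path in unwind(p)]

    return unwind(target)


def pair_betweenness(adjacency, node, source, target):
    """Fraction of source-target geodesics that pass through `node`."""
    paths = geodesics(adjacency, source, target)
    if not paths:
        return Fraction(0)
    through = [p for p in paths if node in p[1:-1]]
    return Fraction(len(through), len(paths))


# Assumed sub-network: only the ties that appear in EJOBL and EAOBL.
ties = [("E", "J"), ("E", "A"), ("J", "O"), ("A", "O"), ("O", "B"), ("B", "L")]
adjacency = {}
for u, v in ties:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

print(sorted("".join(p) for p in geodesics(adjacency, "E", "L")))  # ['EAOBL', 'EJOBL']
print(pair_betweenness(adjacency, "O", "E", "L"))  # 1
print(pair_betweenness(adjacency, "A", "E", "L"))  # 1/2
```

A node's overall betweenness is obtained by adding these fractions over all pairs of other nodes.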
Centrality

Betweenness
– A node that has high betweenness can control the
flow of information, acting as a gatekeeper. (e.g.,
executive secretaries, power)
– benefits to being in the middle:
 the information benefit from being plugged into
different camps or regions of the network – e.g.
hear different versions of the story, more data to
be able to predict outcomes
the control benefit of being able to play one person
against the other.
Centrality

Betweenness
– Diane has many direct ties. Heather has few direct ties, fewer than the
average in the network, but has one of the best locations in the
network: she is between two important constituencies and plays a
'broker' role in the network.
– She is also a single point of failure, with great influence over what flows
and does not flow in the network. "Location, Location, Location."

Betweenness: this example shades
nodes with hues from Red (least) to
Blue (most betweenness). [Wikipedia]
Centrality

Closeness
– Closeness centrality is defined as the inverse of the sum of the
shortest distances between each individual and every other
person in the network
Centrality

Closeness of Olaf
d(O,B) = 1
d(O,T) = 2
d(O,L) = 2
d(O,A) = 1
d(O,J) = 1
d(O,D) = 1
d(O,G) = 2
d(O,E) = 2
d(O,R) = 2
Sum the reciprocals:
– 1/1 + 1/2 + 1/2 + 1/1 + 1/1 + 1/1 + 1/2 + 1/2 + 1/2
– Olaf’s closeness score is 6.5.
Centrality

Closeness of Anna
d(A,B) = 2
d(A,T) = 3
d(A,L) = 3
d(A,O) = 1
d(A,J) = 1
d(A,D) = 2
d(A,G) = 1
d(A,E) = 1
d(A,R) = 1
Sum the reciprocals:
– 5(1/1) + 2(1/2) + 2(1/3)
– Anna’s closeness score is approximately 6.67
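The slides use two slightly different closeness calculations: the definition (the inverse of the sum of distances) and the worked examples (the sum of reciprocal distances, sometimes called harmonic closeness). The sketch below computes both from the distances listed for Olaf and Anna; the function names are illustrative.

```python
# Geodesic distances copied from the slides.
olaf = {"B": 1, "T": 2, "L": 2, "A": 1, "J": 1, "D": 1, "G": 2, "E": 2, "R": 2}
anna = {"B": 2, "T": 3, "L": 3, "O": 1, "J": 1, "D": 2, "G": 1, "E": 1, "R": 1}


def reciprocal_sum(distances):
    """Sum of 1/d over all other nodes (the worked examples on the slides)."""
    return sum(1 / d for d in distances.values())


def inverse_of_sum(distances):
    """1 / (sum of d) over all other nodes (the stated definition)."""
    return 1 / sum(distances.values())


for name, distances in [("Olaf", olaf), ("Anna", anna)]:
    print(
        name,
        "total distance:", sum(distances.values()),
        "reciprocal sum:", round(reciprocal_sum(distances), 2),
        "inverse of sum:", round(inverse_of_sum(distances), 4),
    )
# Olaf total distance: 14 reciprocal sum: 6.5 inverse of sum: 0.0714
# Anna total distance: 15 reciprocal sum: 6.67 inverse of sum: 0.0667
```

Note that the two variants need not rank nodes identically: here Anna scores slightly higher on the reciprocal sum, while Olaf scores higher on the inverse of the sum, so it is worth stating which version is being used.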
Centrality

Closeness
– When a node has a small total distance to all other nodes (that is, a
high closeness score as defined above), it is highly central
– such nodes tend to receive anything flowing through the
network very quickly, because the speed with which
something spreads in a network is a function of the
number of links in the paths traversed. Since they are
close to all nodes, they receive things quickly.
– good/bad dependent on what’s flowing!
Centrality

Closeness
– Fernando and Garth (smallest total distances to everyone else) have fewer
connections than Diane, yet the pattern of their direct and
indirect ties allows them to access all the nodes in the network
more quickly than anyone else, via the shortest paths
Centrality

Network Centralization
– a very centralized network is dominated by one or a few very
central nodes. If these nodes are removed or damaged, the
network quickly fragments into unconnected sub-networks.
– highly central node can become a single point of failure. A
network centralized around a well connected hub can fail
abruptly if that hub is disabled or removed
– a less centralized network has no single points of failure and is resilient.
Networks of low centralization fail gracefully.
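The contrast between centralized and decentralized networks can be illustrated by removing one node and counting the pieces that remain. The sketch below compares two made-up six-node networks, a star (one hub) and a ring (no hub); both are illustrative assumptions, not networks from the slides.

```python
def count_components(nodes, edges):
    """Count connected components with a simple depth-first search."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            current = stack.pop()
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def without(nodes, edges, removed):
    """Return the network with one node (and its ties) deleted."""
    kept_nodes = [n for n in nodes if n != removed]
    kept_edges = [(u, v) for u, v in edges if removed not in (u, v)]
    return kept_nodes, kept_edges


# Highly centralized: every leaf is tied only to the hub.
leaves = ["n1", "n2", "n3", "n4", "n5"]
star_nodes = ["hub"] + leaves
star_edges = [("hub", leaf) for leaf in leaves]

# Decentralized: each node is tied to its two neighbours in a ring.
ring_nodes = ["r0", "r1", "r2", "r3", "r4", "r5"]
ring_edges = [(ring_nodes[i], ring_nodes[(i + 1) % 6]) for i in range(6)]

print(count_components(*without(star_nodes, star_edges, "hub")))  # 5 isolated pieces
print(count_components(*without(ring_nodes, ring_edges, "r0")))   # 1: still connected
```

Losing the hub shatters the star into isolated nodes, while the ring survives the loss of any single node.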