Revision Pack

QUICK REFERENCE GUIDE
GRAPH CONCEPTS
Graph: A set of nodes & edges. Can be directed or undirected. A graph is connected if there is a path between any two of its vertices; otherwise it breaks into connected components
Multigraph: A graph that allows loops and multiple edges
Weighted Graph: Graph with weighted edges
Labelled Graph: A graph whose nodes or edges have properties (attributes)
Distance between 2 nodes: Length of the shortest path between the 2 nodes. Example: Distance between A & D is 2
Simple Path: A path in which the nodes are unique (no node is repeated)
Length of a path: Number of edges
Diameter of graph: Maximum distance in the graph
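A minimal Python sketch of the distance and diameter definitions above, assuming an unweighted, undirected, connected graph stored as a dict of neighbor sets (a representation chosen here for illustration; the example graph is hypothetical). Breadth-first search gives shortest-path lengths because it reaches nodes in order of edge count from the source.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path length (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:  # first visit in BFS = shortest distance
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def diameter(graph):
    """Maximum distance between any two nodes; assumes the graph is connected."""
    return max(max(bfs_distances(graph, s).values()) for s in graph)

# Hypothetical undirected graph: edges A-B, B-C, B-D, C-E
graph = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "E"},
    "D": {"B"},
    "E": {"C"},
}
print(bfs_distances(graph, "A")["D"])  # 2
print(diameter(graph))                  # 3 (e.g. A to E)
```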
NETWORK CHARACTERISTICS
Full Network: Contains all entities and the connections among them
Ego: Node in focus
Alter: Neighbor of the ego
Egocentric Network: An ego and its connections
Unimodal Network: Only one type of vertex
Multimodal Network: Vertices have ≥ 2 types, e.g. person, document
Multiplex Network: Edges of ≥ 2 types, e.g. people and modes of communication
Graph Level Metrics
Size of network: The number of nodes in the network, or the number of edges in the network
Density of network: Number of ties in the network over the number of ALL possible ties
Directed network of size $n$: no. of possible ties = $n(n-1)$
Undirected network of size $n$: no. of possible ties = $\frac{n(n-1)}{2}$
Reachability: The ability to get from one vertex to another within a graph
Vertex Metrics (Centrality)
Degree Centrality: Count of the total number of connections linked to a vertex
In-degree / Out-degree Centrality: For directed graphs, the number of incoming / outgoing ties
Closeness Centrality = $\left(\sum_{u \ne v} d(v,u)\right)^{-1}$, i.e. 1 / (sum of shortest distances to all other vertices)
OR (normalized) $\left(\frac{1}{n-1}\sum_{u \ne v} d(v,u)\right)^{-1}$, i.e. (average distance to all other vertices)$^{-1}$
Note: Some tools report the sum or average distance itself (farness) instead; for farness, a smaller value means a more central vertex
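The density formulas and the in/out-degree note can be checked with a short Python sketch; the tie counts and arc list below are hypothetical.

```python
from collections import Counter

def density(num_nodes, num_ties, directed):
    """Observed ties divided by all possible ties."""
    possible = num_nodes * (num_nodes - 1)  # ordered pairs: n(n-1)
    if not directed:
        possible //= 2  # unordered pairs: n(n-1)/2
    return num_ties / possible

def in_out_degree(arcs):
    """In- and out-degree counts from a list of directed (source, target) ties."""
    indeg, outdeg = Counter(), Counter()
    for source, target in arcs:
        outdeg[source] += 1
        indeg[target] += 1
    return indeg, outdeg

print(density(5, 6, directed=False))  # 0.6
print(density(5, 6, directed=True))   # 0.3
indeg, outdeg = in_out_degree([("A", "B"), ("A", "C"), ("C", "A")])
print(outdeg["A"], indeg["A"])  # 2 1
```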
Usage (graph-level metrics): Used to compute the connectedness of the network
Closeness example, using geodesic (shortest) distance:
Node A = $\frac{1}{1+1+1+1} = 0.25$
Node B = $\frac{1}{1+2+2+2} = 0.14$
Node C = $\frac{1}{1+2+1+2} = 0.17$
Node D = $\frac{1}{1+2+1+1} = 0.2$
Node E = $\frac{1}{1+2+2+1} = 0.17$
Betweenness Centrality
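The closeness example can be reproduced in Python. The figure is not included in this guide, so the edge list below (A-B, A-C, A-D, A-E, C-D, D-E) is an assumption inferred from the distances listed; it is one graph consistent with all five sums.

```python
from collections import deque

def distances_from(graph, source):
    """Geodesic (shortest-path) distances from source via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def closeness(graph):
    """1 / (sum of shortest distances to all other vertices); assumes a connected graph.
    The source's own distance (0) is included in the sum but adds nothing."""
    return {v: 1 / sum(distances_from(graph, v).values()) for v in graph}

# Edge list inferred from the worked example (assumption; figure not reproduced)
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("C", "D"), ("D", "E")]
graph = {v: set() for edge in edges for v in edge}
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

scores = closeness(graph)
print({v: round(scores[v], 2) for v in sorted(scores)})
# {'A': 0.25, 'B': 0.14, 'C': 0.17, 'D': 0.2, 'E': 0.17}
```

Multiplying each score by $n - 1$ gives the normalized variant (average distance)$^{-1}$.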
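Measure of how often a given vertex lies on the shortest paths between two other vertices
$C_B(v) = \sum_{\{s,t\},\ s,t \ne v} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the number of shortest paths between $s$ and $t$, and $\sigma_{st}(v)$ is the number of those paths that pass through $v$
Note: Betweenness centrality of all nodes = 0 when network density = 1 (every pair is directly tied, so no shortest path passes through a third node)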
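A Python sketch of the betweenness formula, counting shortest paths with BFS and summing over unordered pairs. The graph used for the betweenness/eigenvector table in this section is not drawn in the guide; the edge list below (A-B, B-C, B-E, C-E, D-E, A-D) is an assumption inferred from the table's values, which it reproduces.

```python
from collections import deque
from itertools import combinations

def shortest_path_counts(graph, source):
    """BFS returning (distance, number of shortest paths) from source to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                sigma[nbr] = 0
                queue.append(nbr)
            if dist[nbr] == dist[node] + 1:  # node is a predecessor of nbr
                sigma[nbr] += sigma[node]
    return dist, sigma

def betweenness(graph):
    """Sum over unordered pairs {s, t} of the fraction of s-t shortest paths through v."""
    info = {v: shortest_path_counts(graph, v) for v in graph}
    score = {v: 0.0 for v in graph}
    for s, t in combinations(sorted(graph), 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in graph:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            # v is on a shortest s-t path iff d(s,v) + d(v,t) == d(s,t)
            if dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score

edges = [("A", "B"), ("B", "C"), ("B", "E"), ("C", "E"), ("D", "E"), ("A", "D")]
graph = {v: set() for edge in edges for v in edge}
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)
bc = betweenness(graph)
print({v: bc[v] for v in sorted(bc)})
# {'A': 0.5, 'B': 1.5, 'C': 0.0, 'D': 0.5, 'E': 1.5}

# Density = 1 (complete graph): every betweenness score is 0
complete = {v: set("WXYZ") - {v} for v in "WXYZ"}
print(betweenness(complete))
```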
Eigenvector Centrality
Depends on both the number and the quality of a vertex's connections: ties to well-connected vertices count for more
Example (centrality scores for nodes A to E):
Node          A      B      C      D      E
Betweenness   0.5    1.5    0.0    0.5    1.5
Eigenvector   0.162  0.241  0.194  0.162  0.241
For Node B, betweenness = 1 (B lies on the only shortest path A-B-C) + 0.5 (B lies on one of the two shortest paths A-B-E and A-D-E) = 1.5
𝛽-centrality Metric
Small 𝛽 value: Analysis weighted towards the local structure surrounding the ego
Large 𝛽 value: Analysis weighted towards the wider network structure
Positive 𝛽: Good for the ego to be connected to highly central people
Negative 𝛽: A disadvantage for the ego to be connected to others who are themselves well-connected
Note: See Structural Balance
Cut Vertex: A vertex whose removal disconnects a graph
Bridge: An edge whose removal disconnects a graph
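Eigenvector centrality can be sketched with power iteration. Two assumptions here: the table's scores appear to be scaled to sum to 1 (0.162 + 0.241 + 0.194 + 0.162 + 0.241 = 1.000), so the sketch uses that scaling, and the edge list is the one inferred from the table (A-B, B-C, B-E, C-E, D-E, A-D), not one given in the guide.

```python
def eigenvector_centrality(graph, iterations=200):
    """Power iteration on (A + I): the +I shift keeps the leading eigenvector
    unchanged but avoids oscillation. Scores are scaled to sum to 1."""
    nodes = sorted(graph)
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # each node's new score = its own score + sum of its neighbours' scores
        new = {v: x[v] + sum(x[u] for u in graph[v]) for v in nodes}
        total = sum(new.values())
        x = {v: new[v] / total for v in nodes}
    return x

edges = [("A", "B"), ("B", "C"), ("B", "E"), ("C", "E"), ("D", "E"), ("A", "D")]
graph = {v: set() for edge in edges for v in edge}
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

ec = eigenvector_centrality(graph)
print({v: round(s, 3) for v, s in ec.items()})
# {'A': 0.162, 'B': 0.241, 'C': 0.194, 'D': 0.162, 'E': 0.241}
```

C has a lower degree than B and E but a higher score than A and D because both of its neighbours are the best-connected nodes.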
Vertex Characteristics (Pivotal, Gatekeeper)
Pivotal Node: A node X is pivotal for a pair of distinct nodes Y and Z if X lies on every shortest path between Y and Z
Example: B is pivotal for pairs A & C, and A & D
Gatekeeper Node: A node X is a gatekeeper if, for some pair of nodes Y and Z, every path from Y to Z passes through X
Example: Node A is a gatekeeper
Local Gatekeeper: A node V is a local gatekeeper if there are two neighbors of V, Y and Z, that are not connected by an edge
Example: Node D is a local gatekeeper, but not a gatekeeper
Gatekeeper ⇒ Pivotal
Gatekeeper/Pivotal ⇒ Local Gatekeeper
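The three definitions translate into simple checks. A minimal Python sketch on a hypothetical graph (the guide's figure is not reproduced): X is pivotal for Y, Z when the number of shortest Y-Z paths through X equals the total number of shortest Y-Z paths; X is a gatekeeper when deleting it disconnects some connected pair.

```python
from collections import deque
from itertools import combinations

def paths_from(graph, source, removed=None):
    """BFS distances and shortest-path counts from source, optionally skipping one node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr == removed:
                continue
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                sigma[nbr] = 0
                queue.append(nbr)
            if dist[nbr] == dist[node] + 1:
                sigma[nbr] += sigma[node]
    return dist, sigma

def is_pivotal(graph, x, y, z):
    """X lies on every shortest Y-Z path."""
    dist_y, sigma_y = paths_from(graph, y)
    dist_x, sigma_x = paths_from(graph, x)
    if z not in dist_y or x not in dist_y:
        return False
    on_path = dist_y[x] + dist_x[z] == dist_y[z]
    return on_path and sigma_y[x] * sigma_x[z] == sigma_y[z]

def is_gatekeeper(graph, x):
    """Some pair Y, Z is connected, but only through X."""
    others = [v for v in graph if v != x]
    for y, z in combinations(others, 2):
        if z in paths_from(graph, y)[0] and z not in paths_from(graph, y, removed=x)[0]:
            return True
    return False

def is_local_gatekeeper(graph, v):
    """Two neighbours of V are not connected by an edge."""
    return any(z not in graph[y] for y, z in combinations(graph[v], 2))

# Hypothetical graph: A-B, A-C, and a 4-cycle C-D-E-F-C
graph = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D", "F"},
         "D": {"C", "E"}, "E": {"D", "F"}, "F": {"C", "E"}}
print(is_pivotal(graph, "C", "B", "E"))                     # True
print(is_gatekeeper(graph, "A"), is_gatekeeper(graph, "D"))  # True False
print(is_local_gatekeeper(graph, "D"))                      # True
```

Here D is a local gatekeeper but not a gatekeeper, mirroring the example above: its neighbours C and E are not tied, yet C-F-E bypasses it.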
Comparison
Generally, the 3 centrality types will be positively correlated. When they are not, it probably tells you something interesting about the network:
Low Degree, High Closeness: Key player tied to important/active alters (the alters are super important, connected to a big chunk of the network)
Low Degree, High Betweenness: Ego's few ties are crucial for network flow
High Degree, Low Closeness: Embedded in a cluster that is far from the rest of the network
High Degree, Low Betweenness: Ego's connections are redundant (the alters are connected to each other), so communication bypasses him/her
High Closeness, Low Betweenness: Probably multiple paths in the network; ego is near many people, but so are many others
Low Closeness, High Betweenness: Very rare cell. Would mean that ego monopolizes the ties from a small number of people to many others
SOCIAL GROUPS
Dyads (2 nodes)
Undirected: 2 states, tie Yes/No
Directed: No tie, 1-way (note which way), or 2-way
Reciprocity: two common measures
Reciprocated (mutual) dyads / all dyads
Reciprocated (mutual) dyads / all connected dyads
Example: total mutual = 2, total connected = 6, total dyads = 10
Reciprocity = 2/10 (over all dyads) OR 2/6 (over connected dyads)
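Both reciprocity ratios can be computed by classifying every dyad. A minimal Python sketch; the arc list is hypothetical, chosen only to reproduce the example's counts (2 mutual, 6 connected, 10 dyads), since the original network is not shown.

```python
from itertools import combinations

def reciprocity(nodes, arcs):
    """Return (mutual / all dyads, mutual / connected dyads) for a directed network."""
    arcs = set(arcs)
    mutual = connected = 0
    for u, v in combinations(nodes, 2):
        forward, backward = (u, v) in arcs, (v, u) in arcs
        if forward or backward:
            connected += 1
        if forward and backward:
            mutual += 1
    total_dyads = len(nodes) * (len(nodes) - 1) // 2
    return mutual / total_dyads, mutual / connected

# Hypothetical arcs: 2 mutual dyads (A-B, C-D) and 4 one-way dyads
nodes = ["A", "B", "C", "D", "E"]
arcs = [("A", "B"), ("B", "A"), ("C", "D"), ("D", "C"),
        ("A", "C"), ("B", "E"), ("D", "E"), ("E", "A")]
print(reciprocity(nodes, arcs))  # (0.2, 0.3333333333333333)
```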
Cliques
Every member of a clique knows everybody else, i.e. Density = 1
⇒ Any subset of nodes from a clique also forms a clique
N-Clique
Members of an N-clique are at most distance N away from each other
Example: { A, C, E } forms a 2-clique, BUT { A, B, C, E } is not a 2-clique because B and E are at distance 3
Triads (3 nodes)
Undirected: 0, 1, 2, or 3 ties
Directed: 16 possible relationships (See below)
Transitivity (for a triad x, y, z):
x → y → z? No: Vacuous
x → y → z? Yes, and x → z? Yes: Transitive
x → y → z? Yes, and x → z? No: Intransitive
Triadic Closure: See Structural Balance
Clans
An N-clan is an N-clique in which every pair has a path of length ≤ 𝑁 that stays within the N-clique, i.e., an N-clan cannot use nodes outside the clique
Example: { A, B, C } is a 2-clan; { A, C, E } is not a 2-clan
Note: Nodes in an N-clique can depend on non-clique nodes to form the N-path
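The N-clique/N-clan distinction comes down to where paths may run: anywhere in the graph (N-clique) or only through members (N-clan). A minimal Python sketch on a hypothetical graph built so the results match the examples above; it checks the distance conditions only and ignores the maximality usually required of cliques. A clique (density = 1) is the case N = 1.

```python
from collections import deque
from itertools import combinations

def distances(graph, source, allowed=None):
    """BFS distances from source; if `allowed` is given, paths may only use those nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist and (allowed is None or nbr in allowed):
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def within_n(graph, members, n, allowed=None):
    """Every pair of members is at distance <= n (optionally using only `allowed` nodes)."""
    for a, b in combinations(members, 2):
        d = distances(graph, a, allowed).get(b)
        if d is None or d > n:
            return False
    return True

def is_n_clique(graph, members, n):
    return within_n(graph, members, n)  # paths may pass through any node

def is_n_clan(graph, members, n):
    return is_n_clique(graph, members, n) and within_n(graph, members, n, allowed=set(members))

# Hypothetical graph: A-B, B-C, A-D, D-E, C-F, F-E
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E"), ("C", "F"), ("F", "E")]
graph = {v: set() for e in edges for v in e}
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

print(is_n_clique(graph, {"A", "C", "E"}, 2), is_n_clan(graph, {"A", "C", "E"}, 2))  # True False
print(is_n_clan(graph, {"A", "B", "C"}, 2))         # True
print(is_n_clique(graph, {"A", "B", "C", "E"}, 2))  # False (B-E distance is 3)
```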
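Clustering
Clustering Coefficient
Clustering coefficient of an ego: How well the alters are connected among themselves, i.e.
$CC(\text{ego}) = \frac{\text{actual ties among alters}}{\text{max possible ties among alters}}$, where the max for $k$ alters is $\frac{k(k-1)}{2}$
= Density of the 1.5-degree egocentric network, counting only the ties among the alters (the ego's own ties excluded)
Clustering coefficient of the entire network: The average of the clustering coefficients of ALL the nodes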
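A Python sketch of the ego and network clustering coefficients on a hypothetical graph. Nodes with fewer than 2 alters are given 0 here; that is an assumed convention, and some tools leave the value undefined instead.

```python
from itertools import combinations

def local_clustering(graph, ego):
    """Actual ties among the ego's alters / maximum possible ties among them."""
    alters = graph[ego]
    k = len(alters)
    if k < 2:
        return 0.0  # assumed convention for fewer than 2 alters
    actual = sum(1 for a, b in combinations(alters, 2) if b in graph[a])
    return actual / (k * (k - 1) / 2)

def network_clustering(graph):
    """Average of the local clustering coefficients of ALL nodes."""
    return sum(local_clustering(graph, v) for v in graph) / len(graph)

# Hypothetical graph: triangle A-B-C plus a pendant node D attached to A
graph = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(local_clustering(graph, "A"))  # 1 of 3 possible alter ties (B-C) -> 0.333...
print(network_clustering(graph))     # (1/3 + 1 + 1 + 0) / 4 = 0.583...
```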
Clustering Algorithm
Agglomerative: Bottom up: start from singletons and merge
Divisive: Top down: start from one cluster and split recursively
STRUCTURAL BALANCE
Triadic Closure is the idea that if two people in a social network have a friend in common,
then there is an increased likelihood that they will become friends themselves at some
point in the future.
Structural Balance Property: For every set of three nodes, considering the three edges
connecting them, either all three are labeled +, or else exactly one of them is labeled +.
Strong Triadic Closure Property: if a node A has edges to nodes B and C, then the B-C edge
is likely to form if A-B and A-C are both strong ties.
Node A violates the Strong Triadic Closure Property as there is no
edge between B and C at all.
Bridge: An edge whose removal disconnects a graph
Local Bridge: An edge A-B whose endpoints have no friends in common, i.e. removing it would increase the distance between A and B to more than 2
∴ if A-B is a strong-tie local bridge, neither A nor B can have a strong tie to another node, or it violates the Strong Triadic Closure Property
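A Python sketch of the bridge, local bridge and Strong Triadic Closure definitions on a hypothetical graph; strong ties are passed in as a set of node pairs, a representation assumed here for illustration.

```python
from collections import deque
from itertools import combinations

def connected_without_edge(graph, a, b):
    """Is b still reachable from a after deleting the edge a-b?"""
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if {node, nbr} == {a, b}:
                continue  # skip the deleted edge
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return b in seen

def is_bridge(graph, a, b):
    return not connected_without_edge(graph, a, b)

def is_local_bridge(graph, a, b):
    """Edge a-b whose endpoints have no friends in common."""
    return b in graph[a] and not (graph[a] & graph[b])

def stc_violators(graph, strong_ties):
    """Nodes with strong ties to two nodes that have no edge between them."""
    violators = set()
    for a in graph:
        strong = [n for n in graph[a] if frozenset((a, n)) in strong_ties]
        for b, c in combinations(strong, 2):
            if c not in graph[b]:
                violators.add(a)
    return violators

# Hypothetical graph: triangle A-B-C, triangle D-E-F, joined by edge C-D
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
         "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
print(is_bridge(graph, "C", "D"), is_local_bridge(graph, "C", "D"))  # True True
print(is_local_bridge(graph, "A", "B"))  # False (C is a common friend)
strong = {frozenset(("C", "D")), frozenset(("C", "A"))}
print(stc_violators(graph, strong))  # {'C'}: strong ties to A and D, but no A-D edge
```

The last line illustrates the ∴ statement: C-D is a strong-tie local bridge, so C's second strong tie (to A) forces a violation.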
Weak Structural Balance Property: There is no set of three nodes such that the edges among them consist of exactly 2 positive edges and 1 negative edge. If a complete graph is weakly balanced, its nodes can be divided into groups such that every 2 nodes in the same group are friends and every 2 nodes in different groups are enemies
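Both balance properties are triangle-by-triangle conditions on a complete signed graph, so they can be checked by counting the + edges in every triangle. A minimal Python sketch; the signed graphs are hypothetical and are built from groups (+ inside a group, - between groups), matching the group structure both properties describe.

```python
from itertools import combinations

def signs_from_groups(groups):
    """Complete signed graph: +1 inside a group, -1 between groups."""
    group_of = {v: i for i, g in enumerate(groups) for v in g}
    return {frozenset((u, v)): (1 if group_of[u] == group_of[v] else -1)
            for u, v in combinations(sorted(group_of), 2)}

def positive_counts(sign):
    """Number of + edges in each triangle of a complete signed graph."""
    nodes = sorted({v for pair in sign for v in pair})
    return [sum(sign[frozenset(p)] > 0 for p in combinations(tri, 2))
            for tri in combinations(nodes, 3)]

def is_balanced(sign):
    """Every triangle has exactly 3 or exactly 1 '+'."""
    return all(c in (1, 3) for c in positive_counts(sign))

def is_weakly_balanced(sign):
    """No triangle has exactly 2 '+' and 1 '-'."""
    return all(c != 2 for c in positive_counts(sign))

two_camps = signs_from_groups([{"A", "B"}, {"C", "D"}])
three_camps = signs_from_groups([{"A"}, {"B"}, {"C"}])
print(is_balanced(two_camps), is_weakly_balanced(two_camps))      # True True
print(is_balanced(three_camps), is_weakly_balanced(three_camps))  # False True
```

Three mutually hostile camps produce an all-negative triangle, which weak balance allows but (strong) balance does not.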
Balance Theorem: If a labeled complete graph is balanced, then either all pairs of nodes
are friends, or the nodes can be divided into two groups, X and Y, such that
1) every pair of nodes in X like each other,
2) every pair of nodes in Y like each other, and
3) everyone in X is the enemy of everyone in Y
i.e. if a complete graph has 2 sets of mutual friends, with complete mutual antagonism between the two, it is balanced
Graded Measure for a local bridge (neighborhood overlap of edge A-B):
$\frac{\text{number of nodes who are neighbors of both } A \text{ and } B}{\text{number of nodes who are neighbors of at least one of } A \text{ and } B}$ (not counting A and B themselves)
Example: $AE = \frac{1}{6}$, $AF = \frac{2}{5}$
∴ AE is more of a local bridge (lower overlap)
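A Python sketch of the graded measure (neighborhood overlap). The graph is hypothetical, constructed only so that the two overlaps equal the example's 1/6 and 2/5; the guide's own figure is not reproduced. An overlap of 0 means the edge is a local bridge.

```python
from fractions import Fraction

def neighborhood_overlap(graph, a, b):
    """Neighbours of both A and B / neighbours of at least one (excluding A and B).
    Raises ZeroDivisionError for an isolated edge with no other neighbours."""
    both = graph[a] & graph[b]
    either = (graph[a] | graph[b]) - {a, b}
    return Fraction(len(both), len(either))

# Hypothetical graph constructed so that the overlaps equal 1/6 and 2/5
edges = [("A", "E"), ("A", "F"), ("A", "B"), ("A", "C"),
         ("F", "B"), ("F", "C"), ("F", "G"), ("F", "H"),
         ("E", "B"), ("E", "I"), ("E", "J"), ("E", "K")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

print(neighborhood_overlap(graph, "A", "E"))  # 1/6
print(neighborhood_overlap(graph, "A", "F"))  # 2/5
```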
INFORMATION FLOW
Freeman’s formula for Network Centralization
$C_D = \frac{\sum_{i=1}^{g} [C_D(n^*) - C_D(n_i)]}{(g-1)(g-2)}$
$C_D$ is the centralization of the network
$C_D(n_i)$ is the degree centrality (number of ties) of node $i$
$C_D(n^*)$ is the degree centrality of the highest-centrality node
$g$ is the number of nodes in the network
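Freeman's formula in Python, assuming an undirected graph with raw degree counts (the case where the denominator $(g-1)(g-2)$ is the star network's value) and at least 3 nodes; the graphs are hypothetical.

```python
def degree_centralization(graph):
    """Freeman degree centralization from raw degree counts (undirected, g >= 3)."""
    degrees = [len(nbrs) for nbrs in graph.values()]
    g = len(degrees)
    max_degree = max(degrees)  # C_D(n*)
    return sum(max_degree - d for d in degrees) / ((g - 1) * (g - 2))

star = {"hub": {"a", "b", "c", "d"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
complete = {v: set("ABCD") - {v} for v in "ABCD"}
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(degree_centralization(star))      # 1.0 (most unequal)
print(degree_centralization(complete))  # 0.0 (everyone equal)
print(degree_centralization(path))      # (1 + 0 + 0 + 1) / (3 * 2) = 0.333...
```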
Flow Betweenness
Let $m_{jk}$ be the amount of flow between vertex $j$ and vertex $k$ that must pass through $i$ for any maximum flow. The flow betweenness of vertex $i$ is the sum of all $m_{jk}$ where $i$, $j$ and $k$ are distinct and $j < k$. Flow betweenness is therefore a measure of the contribution of a vertex to all possible maximum flows.
Example (pair: maximum flow between the pair; the number in brackets is the part of that flow that must pass through the node):
Node 2
1-2: 2
1-3: 3
1-4: 4(2)
1-5: 2(1)
1-6: 4(2)
2-3: 0
2-4: 3
2-5: 1
2-6: 3
3-4: 1
3-5: 1
3-6: 2
4-5: 0
4-6: 2
5-6: 3
Note: Centralization shows the degree of inequality or variance in the network as a percentage of that of a perfect star network of the same size. The star network is the most unequal network, with $C_D = 1$.
Node 3
1-2: 2
1-3: 3
1-4: 4(1)
1-5: 2(1)
1-6: 4(2)
2-3: 0
2-4: 3
2-5: 1
2-6: 3
3-4: 1
3-5: 1
3-6: 2
4-5: 0
4-6: 2
5-6: 3
Max Flow/Min Cut (Flow, Capacity)
Max-flow min-cut theorem: For any network having a single origin (source) node and a single destination (sink) node, the maximum possible flow from origin to destination equals the minimum cut value over all cuts in the network
Finding Max Flow
Bookkeeping Algorithm:
1. Find any path from source to sink that has positive flow capacity remaining on every arc. If there are no more such paths, exit
2. Determine 𝑓, the maximum flow along this path, which is equal to the smallest remaining flow capacity on any arc in the path (the bottleneck arc)
3. For each arc in the path, subtract 𝑓 from the remaining flow capacity in the forward direction and add 𝑓 to the remaining flow capacity in the backward direction
4. Go to Step 1
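The bookkeeping loop can be sketched in Python. Step 1 allows any augmenting path; this sketch picks a shortest one with BFS (the Edmonds-Karp choice), which is an assumption, not something the guide specifies. When no path remains, the nodes still reachable from the source give a minimum cut, whose value equals the max flow. The network is hypothetical.

```python
from collections import defaultdict, deque

def max_flow_min_cut(capacity, source, sink):
    """capacity maps each directed arc (u, v) to its flow capacity.
    Returns (max flow value, list of arcs in a minimum cut)."""
    residual = defaultdict(int)   # remaining capacity in each direction
    neighbours = defaultdict(set)
    for (u, v), cap in capacity.items():
        residual[(u, v)] += cap
        neighbours[u].add(v)
        neighbours[v].add(u)      # backward direction, used in step 3
    flow = 0
    while True:
        # Step 1: find a source-to-sink path with positive remaining capacity (BFS)
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break                 # no more such paths: exit
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        # Step 2: f is the bottleneck (smallest remaining capacity on the path)
        f = min(residual[arc] for arc in path)
        # Step 3: subtract f forwards, add f backwards
        for u, v in path:
            residual[(u, v)] -= f
            residual[(v, u)] += f
        flow += f                 # Step 4: repeat
    # Nodes still reachable from the source form the source side of a min cut
    source_side = set(parent)
    cut = [arc for arc in capacity if arc[0] in source_side and arc[1] not in source_side]
    return flow, cut

capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
flow, cut = max_flow_min_cut(capacity, "s", "t")
print(flow, cut, sum(capacity[arc] for arc in cut))  # 5 [('s', 'a'), ('s', 'b')] 5
```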
UCINET: Network > Centrality and Power > Flow Betweenness …
Finding Min Cut
A cut is any set of directed arcs containing at least one arc in every path from the source to the sink. The cut value is the sum of the flow capacities, in the source-to-sink direction, of all the arcs in the cut.
By the max-flow min-cut theorem, the cut value of the min cut equals the max flow.
UCINET: Network > Cohesion > Max Flow
Information Cascade
Occurs when a person observes the actions of others and then, despite possible contradictions in his/her own private information signals, engages in the same acts
Conditional Probability (Bayes' rule):
$P(A \mid B) = \frac{P(A) \cdot P(B \mid A)}{P(B)}$
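Bayes' rule applied to a cascade-style decision, in Python with exact fractions. The two-state setup (the option is Good or Bad; each private signal is High with probability q if Good and 1 - q if Bad) is an assumed illustrative model, not one given in this guide; $P(B)$ is expanded with the law of total probability.

```python
from fractions import Fraction

def bayes(p_a, p_b_given_a, p_b):
    """Bayes' rule: P(A|B) = P(A) * P(B|A) / P(B)."""
    return p_a * p_b_given_a / p_b

def posterior_good(prior_good, q, highs, lows):
    """P(option is Good | observed signals) in the assumed two-state model."""
    like_good = q**highs * (1 - q)**lows
    like_bad = (1 - q)**highs * q**lows
    p_signals = prior_good * like_good + (1 - prior_good) * like_bad  # total probability
    return bayes(prior_good, like_good, p_signals)

half, q = Fraction(1, 2), Fraction(2, 3)
print(posterior_good(half, q, highs=1, lows=0))  # 2/3
print(posterior_good(half, q, highs=2, lows=0))  # 4/5
print(posterior_good(half, q, highs=2, lows=1))  # 2/3 (one Low cancels one High)
```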
There are four key conditions in an information cascade model:
1. Agents make decisions sequentially
2. Agents make decisions rationally based on the information they have
3. Agents do not have access to the private information of others
4. A limited action space exists (e.g. an adopt/reject decision)
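A small Python simulation satisfying the four conditions. The decision rule is an assumed simplification (a counting rule with equal priors and ties broken toward the agent's own signal): while the inferred High/Low signals differ by less than 2, each agent follows its own signal, so its choice reveals that signal; once they differ by 2 or more, a cascade starts and later choices reveal nothing.

```python
def cascade_decisions(signals):
    """Sequential accept/reject decisions under an assumed counting rule.

    signals: private signals in arrival order, each "H" (high) or "L" (low).
    """
    highs = lows = 0  # signals that later agents can infer from earlier decisions
    decisions = []
    for signal in signals:
        if highs - lows >= 2:
            decisions.append("accept")  # cascade: own signal is ignored
        elif lows - highs >= 2:
            decisions.append("reject")  # cascade: own signal is ignored
        else:
            # not in a cascade: the agent follows its own signal,
            # so its decision reveals that signal to everyone after it
            decisions.append("accept" if signal == "H" else "reject")
            if signal == "H":
                highs += 1
            else:
                lows += 1
    return decisions

print(cascade_decisions(["H", "H", "L", "L"]))       # all four accept
print(cascade_decisions(["H", "L", "L", "H", "H"]))  # accept, reject, reject, accept, accept
```

In the first run, two early High signals start an accept cascade, so the later agents with Low signals still accept.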
STUDY DESIGN
1. Basics: Measurements & Data
Variable: Characteristic or property
Scales: Nominal, Ordinal, Interval, Ratio
Nominal: Categorical; qualitative. e.g. Male, Female; North, South, East, West
Ordinal: Ordered, but no concept of gap size: 𝑎 > 𝑏 > 𝑐. e.g. first, second, third; primary, secondary, JC
Interval: Gaps measured in continuous units; can perform +, − operations. e.g. Celsius
Ratio: Ratios can be compared; can perform +, −, ×, ÷ operations. e.g. dollars
What type of scale to use?
- Degree Centrality: Ratio
- Betweenness Centrality: Ratio
- Pivotal/Non-pivotal: Categorical
- Survey Ratings: Ratio
- Edge (Yes/No): Categorical
- Weighted edge (e.g., 1…10): Ratio
2. Data collection
Asking Respondents:
1) Simple Questions (e.g. age, education)
2) Survey Type Questions
3) Open-ended questions
4) Roster choice method, i.e., respondents are given a list (roster) of people and asked questions about them, e.g. which of the following would you regard as a friend
Experiments: Measure variables
Web Access: Web crawling; blogs, forums, social media
Secondary Data:
1) Datasets on the internet (context)
2) Reports
3) Email Records
4) Company transaction records
3. Steps in doing a social network study
Decide what to study → Choose relevant population → Collect data → Analyse → Deduce Findings → Report
What to study?
The Hypothesis: See Notes for examples
Variables: Identify variables, consider independent variables, e.g. node properties, edge properties
Level of Detail: e.g. team email: sender, receiver, etc.
Sampling
Identify the population the study is interested in, e.g. by:
Roles/positions (directors/politicians)
Relationships (friends of …)
Events (participation/communication)
Time
Location
Complete Population (Census) VS Random (ego) + snowball (alters)
Collect data: Refer to 2. Data Collection
Analyse: Mixture of qualitative, descriptive statistics, and statistical tests
Deduce Findings: Statistics, and compare with prior studies
Report: Clear, meaningful and obvious graphs
Introduction → Literature Review → Objective (Hypothesis) → Methodology → Analysis → Findings
UCINET CHEAT SHEET
Import text file: Data > Import text file > DL (Ctrl-I)
Outputs: 1. Text log file 2. Network files, .##h & .##d
Display dataset: Data > Display (Ctrl-D)
Output: The matrix in the selected data file
NetDraw: Visualize > NetDraw
To open a dataset: File > Open > Ucinet dataset > Network
Separate a file with multiple matrices into separate files: Data > Unpack
Prepare Data: Data > Data Editors > Matrix Editor (Ctrl-S)
Produce matrix from attributes: Data > Attribute to matrix
Note: Refer to Notes
Display Univariate Statistics: Tools > Univariate Statistics (Ctrl-U)
Note: Refer to Notes
Compute Network Metrics: Network > Centrality and power > Multiple measures (Ctrl-M)
Input: .##h file
Test observed mean/density against a fixed value
UCINET: Network > Compare densities > Against theoretical parameter
Note: Actual density as shown in the UCINET output of Display dataset
Analysis: Find the p-value against a fixed value. Tests whether the density of the selected network is close to the expected density. In this case, the z-score is −3.7943, i.e. 3.79 s.d. to the left of the expected density ⇒ the observed density is significantly smaller than the expected density of 1.0, as p-value = 0.0002
Test of density difference between 2 networks (more than comparing means: takes variability into account)
UCINET: Network > Compare densities > Paired (same nodes)
Analysis: Used to compare the density difference between two networks. Good for testing a time difference of the same network. Can also find the p-value of 2 groups divided on node attributes
t-statistic = 2.4089
p-value = 0.0052 < 0.05
∴ the difference in density is significant
Compares Matrix VS Matrix.
Correlation between 2 networks with the same actors
UCINET: Tools > Testing Hypothesis > Dyadic (QAP) > QAP Correlation (Ctrl-Q)
Analysis: Find r
Col 1: Observed coefficient for the dataset
Col 2: p-value
Col 3: Average coefficient of all sampled datasets
∵ p-value > 0.05 ⇒ the correlation is not significant
Compares Matrix VS Matrix.
NO dependency (neither matrix is treated as the dependent variable)
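The idea behind QAP correlation can be sketched in Python: correlate the off-diagonal cells of two matrices, then repeat with the rows and columns of one matrix permuted together (relabelling the nodes while keeping the network structure) to see how often chance does as well. The permutation count, seed, one-sided p-value with a +1 correction, and the matrices are all assumptions made here; UCINET's exact options may differ.

```python
import random
from math import sqrt

def off_diagonal(m):
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]

def pearson(x, y):
    """Pearson r; assumes neither list is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def qap_correlation(m1, m2, permutations=1000, seed=0):
    """Returns (observed r, one-sided p-value, average r over permuted datasets)."""
    rng = random.Random(seed)
    x = off_diagonal(m1)
    observed = pearson(x, off_diagonal(m2))
    n = len(m2)
    at_least, total = 0, 0.0
    for _ in range(permutations):
        p = list(range(n))
        rng.shuffle(p)
        # relabel nodes: permute rows and columns of m2 together
        permuted = [[m2[p[i]][p[j]] for j in range(n)] for i in range(n)]
        r = pearson(x, off_diagonal(permuted))
        total += r
        at_least += r >= observed
    return observed, (at_least + 1) / (permutations + 1), total / permutations

# Hypothetical friendship and advice networks on the same 4 actors
friend = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
advice = [[0, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 0]]
r, p_value, average_r = qap_correlation(friend, advice)
print(round(r, 3), p_value, round(average_r, 3))
```

The three returned values play the roles of the three output columns described above (observed coefficient, p-value, average over sampled datasets).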
Regression (you have control over the independent variable)
UCINET: Tools > Testing Hypothesis > Dyadic (QAP) > QAP Regression > MR-QAP Linear Regression >
Double Dekker: if there are no missing values
Semi-partialling: if there are missing values
Compares Matrix VS Matrix
Analysis: Look at R-sq first to see if the model is a good fit. Then look at the individual variables
T-test of 2 group means
UCINET: Tools > Testing Hypothesis > Node-level > TTest
Compares Column VS Column
Analysis: The t-test is used to test whether there are differences between the means of two groups; in this case, whether the govt and non-govt groups have different out-degree centrality (col 1). Is one group bigger than the other?
Result: No difference across groups; all p-values are > 0.05
ANOVA for 2 or more groups
UCINET: Tools > Testing Hypothesis > Node-level > Anova
Compares Column VS Column
Analysis: Look at the F-statistic and its significance. The significance is the same as that of a two-tailed test.
Note: Refer to Notes
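A permutation test for a difference between two group means, in Python. This is a generic technique shown for intuition; whether UCINET's node-level TTest uses exactly this procedure is not stated here. Group labels are shuffled to see how often a difference at least as large (in absolute value, i.e. two-sided) arises by chance. The data, seed and +1 correction are assumptions.

```python
import random

def permutation_mean_test(values, labels, permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means (labels are 0/1)."""
    def mean_diff(lab):
        g1 = [v for v, g in zip(values, lab) if g == 1]
        g0 = [v for v, g in zip(values, lab) if g == 0]
        return sum(g1) / len(g1) - sum(g0) / len(g0)

    rng = random.Random(seed)
    observed = mean_diff(labels)
    shuffled = list(labels)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any link between group and value
        if abs(mean_diff(shuffled)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (permutations + 1)

# Hypothetical out-degrees for govt (1) and non-govt (0) nodes
outdeg = [3, 5, 4, 6, 2, 5, 4, 3]
govt = [1, 1, 1, 1, 0, 0, 0, 0]
diff, p = permutation_mean_test(outdeg, govt)
print(diff)  # mean(govt) - mean(non-govt) = 4.5 - 3.5 = 1.0
print(p)     # compare with 0.05
```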
Triad census labels: 𝑋1 𝑋2 𝑋3, optionally followed by a letter (e.g. 021D)
𝑋1: No. of mutual dyads
𝑋2: No. of asymmetric dyads
𝑋3: No. of null dyads
D: Down
U: Up
T: Transitive
C: Cyclic
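The numeric part of a triad label can be computed by classifying the triad's three dyads. A minimal Python sketch (the letter suffix D/U/T/C is not computed here); the arc lists are hypothetical.

```python
from itertools import combinations

def man_label(nodes, arcs):
    """Numeric part of a triad's label: mutual, asymmetric, null dyad counts."""
    arcs = set(arcs)
    mutual = asymmetric = null = 0
    for u, v in combinations(nodes, 2):
        forward, backward = (u, v) in arcs, (v, u) in arcs
        if forward and backward:
            mutual += 1
        elif forward or backward:
            asymmetric += 1
        else:
            null += 1
    return f"{mutual}{asymmetric}{null}"

print(man_label("xyz", []))                                    # 003
print(man_label("xyz", [("x", "y"), ("y", "x")]))              # 102
print(man_label("xyz", [("x", "y"), ("y", "z"), ("x", "z")]))  # 030 (the transitive 030T)
```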