Exponential Random Graph Models Under Measurement Error

Zoe Rehnberg
with Dr. Nan Lin
Washington University in St. Louis
ARTU 2014
Examples of Social Networks
• Wikipedia pages
– Two pages are connected when one references the other.
• Article authorship
– Two statisticians are connected when they co-author a paper.
• Friendship
– Two high school students are connected when each indicates being friends with the other.
Network Data
• Nodes – individuals in a mock high school
• Edges – mutual friendships
• Adjacency matrix (W)
• n = number of nodes
• $w_{i,j} = 1$ if an edge is present, $0$ if the edge is absent
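As a small illustration of the definition above, the following sketch builds an adjacency matrix in base R. The edge list and the network size are hypothetical values chosen for the example, not data from the study.

```r
# Hypothetical edge list for an undirected friendship network on 4 nodes
edges <- rbind(c(1, 2), c(2, 4))
n <- 4

W <- matrix(0L, nrow = n, ncol = n)
W[edges] <- 1L          # set w[i, j] = 1 for each listed pair (i, j)
W[edges[, 2:1]] <- 1L   # mirror to w[j, i], since friendships are mutual
print(W)
```

Because the edges are mutual, W is symmetric with a zero diagonal.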
Exponential Random Graph
Models
• The configuration of edges in the adjacency matrix W is treated as random.
• Exponential random graph models (ERGMs) describe how likely it is that a specific configuration of edges will occur:
$P(W = w) \propto \exp\{\theta^{T} g(w)\}$
• w – a given set of edges in an adjacency matrix
• θ – vector of model coefficients
• g(w) – a vector of statistics for the given adjacency matrix
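To make the proportionality concrete, here is a minimal sketch that enumerates every undirected graph on 3 nodes and normalizes the weights by brute force. It assumes a model whose only statistic is the edge count, and the coefficient value is hypothetical. Enumeration like this is only feasible for very small n.

```r
# All 2^3 = 8 undirected graphs on 3 nodes, one row per graph;
# the columns are the dyads (1,2), (1,3), (2,3)
graphs <- as.matrix(expand.grid(w12 = 0:1, w13 = 0:1, w23 = 0:1))

theta <- -1                      # hypothetical coefficient for the edge count
g <- rowSums(graphs)             # g(w) = number of edges in each graph

weights <- exp(theta * g)        # unnormalized exp{theta^T g(w)}
probs <- weights / sum(weights)  # divide by the sum over all graphs

print(cbind(graphs, prob = round(probs, 4)))
```

With a negative edge coefficient, graphs with fewer edges receive higher probability, and the empty graph is the most likely configuration.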
Descriptive Statistics
• These statistics summarize how nodes are related to each other within the larger graph as a whole.
• These are used to form the ERG model for a given adjacency matrix.
• Examples
1. Degree: $k_i = \sum_j w_{i,j}$
2. Degree centrality: $C_D \propto \sum_i (k^* - k_i)$
3. Triangles
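The three example statistics can be computed directly from an adjacency matrix. The sketch below uses a hypothetical 4-node network and assumes that k* denotes the maximum degree, as in Freeman's centralization; the slides do not define k*, and the proportionality constant is left out.

```r
# Hypothetical undirected network: a triangle (1, 2, 3) plus node 4 tied to node 3
W <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

k <- rowSums(W)                      # degree of each node
cd_unnormalized <- sum(max(k) - k)   # sum_i (k* - k_i), assuming k* = max degree

# Each triangle yields 6 closed walks of length 3, counted on the diagonal of W^3
W3 <- W %*% W %*% W
triangles <- sum(diag(W3)) / 6

print(list(degree = k, centralization = cd_unnormalized, triangles = triangles))
```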
Estimating Model Coefficients
• The ergm() function in the statnet suite of R packages uses a maximum likelihood approach to estimate θ, the vector of model coefficients.
library("statnet")
data(faux.mesa.high)
dat <- faux.mesa.high

# fit the original ERG model
orig.model <- ergm(
  dat ~ edges + nodematch("Grade") + nodematch("Race") +
    nodematch("Sex") + gwesp(0.4, fixed = TRUE),
  control = control.ergm(MCMC.samplesize = 1e+5, seed = 123)
)

# simulate one network from the original model
sim.net <- simulate(orig.model, seed = 1534)
Possible Measurement Error
• Measurement error is the discrepancy between the observed network and the true network.
• We focused on missing (false negative) and spurious (false positive) edges in the network.
• Possible sources of error:
– Mistakes in collecting or coding data
– Differences in perception
Goal of Our Study
• Goal: understanding ERGMs under measurement error
• Method: study by simulation
1. Model and simulate friendship network
• g(w) – edges, assortative mixing, shared partners
2. Imitate measurement error
• Probability of adding a spurious edge: q = 0.001, 0.005, 0.01, 0.05
• Probability of removing a true edge: p = 0.01, 0.02, …, 0.20
3. Estimate ERGM coefficients and statistics
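Step 2 can be sketched in base R as follows. This is one plausible way to imitate the error process, assuming each dyad is perturbed independently; the function name perturb_network and the example matrix are hypothetical, and the study's actual implementation is not shown in the slides.

```r
# Remove each existing edge with probability p and add each absent edge
# with probability q, independently across dyads (an assumption).
perturb_network <- function(W, p, q) {
  n <- nrow(W)
  upper <- upper.tri(W)            # work on each undirected dyad once
  w <- W[upper]
  u <- runif(length(w))
  flip <- ifelse(w == 1, u < p, u < q)
  out <- matrix(0, n, n)
  out[upper] <- ifelse(flip, 1 - w, w)
  out + t(out)                     # restore symmetry
}

# Hypothetical true network on 4 nodes
W <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

set.seed(2014)
W_obs <- perturb_network(W, p = 0.10, q = 0.01)
```

Calling the function repeatedly at a fixed (p, q) pair yields a collection of perturbed networks to which an ERGM can be refitted.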
• Simulated measurement error by perturbing 100 networks at each probability combination
• Calculated root mean square error of the perturbed ERGM coefficients
$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (\hat{\theta}_i - \theta_i)^2}$
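A minimal sketch of the root mean square error computation, assuming the estimates from the perturbed networks are compared against a reference coefficient value. The function name rmse and the numbers are hypothetical and are not results from the study.

```r
# Root mean square error between estimates and reference values
rmse <- function(theta_hat, theta_ref) {
  sqrt(mean((theta_hat - theta_ref)^2))
}

# Hypothetical estimates of one coefficient from m = 4 perturbed networks
theta_hat <- c(1.1, 0.9, 1.2, 0.8)
theta_ref <- 1.0   # hypothetical coefficient from the unperturbed fit

print(rmse(theta_hat, theta_ref))
```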
Method of Spectral Denoising
• $W_{obs} = W_{true} + W_{noise}$
• Naïve estimator:
p = probability of missing edge
q = probability of spurious edge
WKn =
• Empirical estimator: create Ŵobs through spectral
decomposition of Wobs
• The approach requires the network statistics to be continuous functions of the adjacency matrix.
P. Balachandran, E. M. Airoldi, and E. D. Kolaczyk. Inference of network summary statistics through
nonparametric network denoising. Annals of Statistics, 2013. arXiv:1310.0423v3.
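The idea of denoising through a spectral decomposition can be illustrated with a generic truncated eigendecomposition. This is a sketch only: the function name spectral_denoise, the rank K, and the rule of keeping the K eigenvalues largest in magnitude are assumptions, and the exact estimator of Balachandran et al. is not reproduced here.

```r
# Rank-K approximation of a symmetric matrix from its eigendecomposition
spectral_denoise <- function(W_obs, K) {
  eig <- eigen(W_obs, symmetric = TRUE)
  # keep the K eigenvalues of largest magnitude (an assumed selection rule)
  keep <- order(abs(eig$values), decreasing = TRUE)[seq_len(K)]
  U <- eig$vectors[, keep, drop = FALSE]
  U %*% diag(eig$values[keep], nrow = K) %*% t(U)
}

# Hypothetical observed network on 4 nodes
W_obs <- matrix(c(0, 1, 1, 0,
                  1, 0, 1, 0,
                  1, 1, 0, 1,
                  0, 0, 1, 0), nrow = 4, byrow = TRUE)

W_hat <- spectral_denoise(W_obs, K = 2)
print(round(W_hat, 3))
```

The entries of the low-rank approximation are generally not 0 or 1, so it cannot be treated as an ordinary adjacency matrix without further processing.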
Challenges and Future Work
• The estimated adjacency matrices have non-integer values, which causes practical computational problems.
– The R function ergm() only accepts adjacency matrices with 0/1 entries.
$W_{obs} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$
• Instead of obtaining a single estimate of $W_{true}$, we want to simulate from the conditional distribution of $W_{true} \mid W_{obs}$ and fit ERGMs to each $W_{true}^{(i)}$.
– The final estimation of θ will then be based on this simulated distribution.
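One crude way to draw candidate true networks is sketched below. It is not the method proposed in the slides, which leave this step as future work. It assumes that dyads are independent, that p and q are known, and that a prior edge density is available; the function name sample_true_networks and all numeric values are hypothetical.

```r
# Draw binary networks from a dyad-independent posterior of W_true given W_obs.
# p = P(observe 0 | true edge), q = P(observe 1 | no true edge),
# density = assumed prior probability that a dyad is a true edge.
sample_true_networks <- function(W_obs, p, q, density, n_draws) {
  # Bayes' rule for a true edge given an observed 1 or an observed 0
  post1 <- (1 - p) * density / ((1 - p) * density + q * (1 - density))
  post0 <- p * density / (p * density + (1 - q) * (1 - density))
  n <- nrow(W_obs)
  upper <- upper.tri(W_obs)
  prob <- ifelse(W_obs[upper] == 1, post1, post0)
  lapply(seq_len(n_draws), function(i) {
    out <- matrix(0, n, n)
    out[upper] <- rbinom(length(prob), size = 1, prob = prob)
    out + t(out)   # symmetric 0/1 matrix
  })
}

W_obs <- matrix(c(0, 1, 0, 0,
                  1, 0, 0, 1,
                  0, 0, 0, 0,
                  0, 1, 0, 0), nrow = 4, byrow = TRUE)

set.seed(1)
draws <- sample_true_networks(W_obs, p = 0.10, q = 0.01, density = 0.2, n_draws = 5)
```

Each draw is a 0/1 matrix, so it is the kind of input a binary-network model can be fitted to, and the spread of the fitted coefficients across draws would reflect the uncertainty about the true network. A dyad-independent posterior ignores the dependence structure that ERGMs are meant to capture, so this is only a starting point.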
References
[1] A. Caimo and N. Friel. Bayesian inference for exponential random graph models. Social Networks, 2010. http://arxiv.org/abs/1007.5192.
[2] R. A. Hanneman and M. Riddle. Introduction to social network methods. Riverside, CA: University of California, Riverside, 2005. http://faculty.ucr.edu/~hanneman/nettext/.
[3] P. Balachandran, E. M. Airoldi, and E. D. Kolaczyk. Inference of network summary statistics through nonparametric network denoising. Annals of Statistics, 2013. arXiv:1310.0423v3.
[4] D. J. Wang et al. Measurement error in network data: A reclassification. Social Networks, 2012. doi:10.1016/j.socnet.2012.01.003.