Exponential Random Graph Models Under Measurement Error
Zoe Rehnberg with Dr. Nan Lin
Washington University in St. Louis
ARTU 2014

Examples of Social Networks
• Wikipedia pages – Individual pages are connected when one page contains a reference (link) to the other.
• Article authorship – Two statisticians are connected when they co-author a paper.
• Friendship – Two high school students are connected when they indicate that they are friends with each other.

Network Data
• Nodes – individuals in a mock high school
• Edges – mutual friendships
• Adjacency matrix W (n × n), where n = number of nodes and
$$w_{i,j} = \begin{cases} 1, & \text{if an edge is present between nodes } i \text{ and } j \\ 0, & \text{if the edge is absent} \end{cases}$$

Exponential Random Graph Models
• The configuration of edges among a fixed set of nodes in an adjacency matrix is treated as random.
• Exponential random graph models describe how likely it is that a specific configuration of edges will occur:
$$P(W = w) \propto \exp\{\theta^{T} g(w)\}$$
• w – a given set of edges in an adjacency matrix
• θ – vector of model coefficients
• g(w) – vector of statistics computed from the given adjacency matrix

Descriptive Statistics
• These statistics summarize how nodes are related to each other within the graph as a whole.
• They make up the statistic vector g(w) of the ERG model for a given adjacency matrix.
• Examples:
  1. Degree: $k_i = \sum_j w_{i,j}$
  2. Degree centrality: $C_D \propto \sum_i (k^* - k_i)$, where $k^*$ is the largest degree in the network
  3. Triangles: the number of sets of three nodes that are all connected to one another

Estimating Model Coefficients
• The ergm() function in the statnet suite for R estimates θ, the vector of model coefficients, by (MCMC-based) maximum likelihood.

    library("statnet")
    data(faux.mesa.high)
    dat <- faux.mesa.high

    # fit the original ERG model
    orig.model <- ergm(dat ~ edges + nodematch("Grade") + nodematch("Race") +
                         nodematch("Sex") + gwesp(0.4, fixed = TRUE),
                       control = control.ergm(MCMC.samplesize = 1e+5, seed = 123))

    # simulate from the original model
    sim.net <- simulate(orig.model, seed = 1534)

Possible Measurement Error
Measurement error refers to how well an observed network reflects the true network.
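Here is a minimal base-R sketch of the edge-level error that the study imitates. It assumes that errors occur independently on each dyad of an undirected network. Each true edge goes missing with probability p, and each absent edge is spuriously added with probability q.

The helpers perturb_network() and count_errors() are hypothetical illustrations, not part of statnet. The random 50-node test network is made up and is not the faux.mesa.high data.

```r
# Sketch (assumption): independent edge-level measurement error on an
# undirected network stored as a symmetric 0/1 adjacency matrix.
#   p = probability that a true edge is missing from the observed network
#   q = probability that a spurious edge appears where no true edge exists
perturb_network <- function(w_true, p, q) {
  w_obs <- w_true
  upper <- upper.tri(w_true)          # visit each dyad {i, j}, i < j, once
  u <- runif(sum(upper))              # one uniform draw per dyad
  true_vals <- w_true[upper]
  obs_vals <- ifelse(true_vals == 1,
                     as.numeric(u >= p),  # keep a true edge with prob 1 - p
                     as.numeric(u < q))   # add a non-edge with prob q
  w_obs[upper] <- obs_vals
  # mirror the upper triangle into the lower triangle to keep symmetry
  w_obs[lower.tri(w_obs)] <- t(w_obs)[lower.tri(w_obs)]
  diag(w_obs) <- 0                    # no self-loops
  w_obs
}

# Count missing (false negative) and spurious (false positive) edges
count_errors <- function(w_true, w_obs) {
  upper <- upper.tri(w_true)
  c(missing  = sum(w_true[upper] == 1 & w_obs[upper] == 0),
    spurious = sum(w_true[upper] == 0 & w_obs[upper] == 1))
}

# Usage on a made-up random network (not the faux.mesa.high data)
set.seed(1)
n <- 50
w_true <- matrix(0, n, n)
w_true[upper.tri(w_true)] <- rbinom(n * (n - 1) / 2, 1, 0.1)
w_true <- w_true + t(w_true)
w_obs <- perturb_network(w_true, p = 0.10, q = 0.01)
count_errors(w_true, w_obs)
```

Each dyad is perturbed once and then mirrored. The observed matrix therefore stays symmetric with an empty diagonal, so it is still a valid undirected 0/1 adjacency matrix. Such a matrix could, for example, be converted with network::as.network() before refitting an ERGM.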
• We focused on missing (false negative) and spurious (false positive) edges in the network.
• Possible sources of error:
  – Mistakes in collecting or coding data
  – Differences in perception

Goal of Our Study
• Goal: understand how ERGM estimates behave under measurement error
• Method: simulation study
  1. Model and simulate a friendship network
     – g(w): edges, assortative mixing, shared partners
  2. Imitate measurement error
     – Adding (spurious-edge) probability: q = 0.001, 0.005, 0.01, 0.05
     – Removing (missing-edge) probability: p = 0.01, 0.02, …, 0.20
  3. Estimate ERGM coefficients and statistics
     – Simulated measurement error by perturbing 100 networks at each (p, q) combination
     – Calculated the root mean square error of the ERGM coefficients estimated from the m perturbed networks:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\hat{\theta}_i - \theta_i\right)^2}$$

Method of Spectral Denoising
$$W_{obs} = W_{true} + W_{noise}$$
• Naïve estimator $W_{K_n}$, built from:
  – p = probability of a missing edge
  – q = probability of a spurious edge
• Empirical estimator: construct $\hat{W}_{obs}$ from a spectral decomposition of $W_{obs}$
• The statistics being estimated must satisfy a continuity requirement.
P. Balachandran, E. M. Airoldi, and E. D. Kolaczyk. Inference of network summary statistics through nonparametric network denoising. Annals of Statistics, 2013. arXiv:1310.0423v3.

Challenges and Future Work
• The estimated adjacency matrices have non-integer values, which causes practical computational problems.
  – The R function ergm() only accepts adjacency matrices with 0/1 entries, such as the observed matrix
$$W_{obs} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$
• Instead of obtaining a single estimate of $W_{true}$, we want to simulate draws $W_{true}^{(i)}$ from the conditional distribution of $W_{true}$ given $W_{obs}$ and fit an ERGM to each $W_{true}^{(i)}$.
  – The final estimate of θ will then be based on this simulated distribution.

References
[1] A. Caimo and N. Friel. Bayesian inference for exponential random graph models. Social Networks, 2010. http://arxiv.org/abs/1007.5192.
[2] R. A. Hanneman and M. Riddle. Introduction to Social Network Methods. Riverside, CA: University of California, Riverside, 2005. http://faculty.ucr.edu/~hanneman/nettext/.
[3] P. Balachandran, E. M. Airoldi, and E. D. Kolaczyk. Inference of network summary statistics through nonparametric network denoising. Annals of Statistics, 2013. arXiv:1310.0423v3.
[4] D. J. Wang et al. Measurement error in network data: A reclassification. Social Networks, 2012. doi:10.1016/j.socnet.2012.01.003.
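This supplementary sketch shows why corrected estimates of W_true leave the 0/1 setting that ergm() requires, as noted under Challenges and Future Work. It assumes the independent-error model in which true edges go missing with probability p and spurious edges appear with probability q. Under that model, E[w_obs,ij] = q + (1 - p - q) w_true,ij. It follows that (W_obs - q)/(1 - p - q) is an entry-wise unbiased moment correction whenever p + q < 1.

This correction and the helper moment_correct() are illustrative assumptions. They are not necessarily the naïve estimator W_Kn of Balachandran, Airoldi, and Kolaczyk. Applied to the 4 × 4 W_obs shown in that section, the correction makes every off-diagonal entry non-integer.

```r
# Sketch (assumption): moment correction under independent edge errors,
#   E[w_obs[i, j]] = (1 - p) * w_true[i, j] + q * (1 - w_true[i, j])
#                  = q + (1 - p - q) * w_true[i, j],
# so (w_obs - q) / (1 - p - q) is unbiased for w_true entry by entry.
# Hypothetical helper; not necessarily the W_Kn estimator of the poster.
moment_correct <- function(w_obs, p, q) {
  stopifnot(p + q < 1)                # correction is undefined otherwise
  w_hat <- (w_obs - q) / (1 - p - q)
  diag(w_hat) <- 0                    # no self-loops
  w_hat
}

# The 4 x 4 observed adjacency matrix from Challenges and Future Work
w_obs <- matrix(c(0, 1, 0, 0,
                  1, 0, 0, 1,
                  0, 0, 0, 0,
                  0, 1, 0, 0), nrow = 4, byrow = TRUE)

w_hat <- moment_correct(w_obs, p = 0.05, q = 0.01)
# off-diagonal 0s become about -0.011 and 1s become about 1.053
round(w_hat, 3)

# Not a 0/1 matrix, so it cannot be handed to ergm() as a binary network
all(w_hat %in% c(0, 1))
```

Rounding or thresholding w_hat back to 0/1 would discard the correction. This is one way to see why the future-work plan simulates 0/1 draws of W_true instead of fitting a single continuous estimate.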