Row and column effects
567 Statistical analysis of social networks
Peter Hoff
Statistics, University of Washington

Monk data

[Figure: outdegree and indegree distributions of the monk data.]

sd(rsum(Y))

## [1] 0.9002541

sd(csum(Y))

## [1] 2.698341

Testing the SRG

rERG<-function(n,s)
{
  Y<-matrix(0,n,n) ; diag(Y)<-NA
  Y[!is.na(Y)]<- sample( c(rep(1,s),rep(0,n*(n-1)-s)) )
  Y
}

n<-nrow(Y)
s<-sum(Y,na.rm=TRUE)

sdo<-sd(rsum(Y))
sdi<-sd(csum(Y))

S<-1000   # number of Monte Carlo simulations (value not shown on the slide)
DSD<-NULL
for(sim in 1:S)
{
  Ysim<-rERG(n,s)
  DSD<-rbind(DSD, c(sd(rsum(Ysim)),sd(csum(Ysim))))
}

Testing the SRG

[Figure: null distributions of the outdegree sd and the indegree sd under the ERG, with the observed values marked.]

Testing the SRG

Compared to any SRG,
• the nodes show less heterogeneity in outdegree;
• the nodes show more heterogeneity in indegree.

No SRG model provides a good representation of Y in terms of its degrees.

What kind of model would provide a better representation of the data?
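Aside: the helpers rsum, csum and mmean used throughout are course utilities that never appear on the slides; the definitions below are a plausible minimal version (an assumption, not the course's own code), followed by a tail-probability summary of the simulation above.

# Plausible definitions of the course helpers (an assumption; the
# slides never show them): row sums, column sums, and the overall
# mean of a sociomatrix, ignoring the NA diagonal.
rsum  <- function(Y) apply(Y, 1, sum, na.rm=TRUE)
csum  <- function(Y) apply(Y, 2, sum, na.rm=TRUE)
mmean <- function(Y) mean(Y, na.rm=TRUE)

# Empirical tail probabilities for the observed degree sds relative
# to the simulated null values in DSD:
mean(DSD[,1] <= sdo)   # simulated outdegree sd as small as observed: near 0
mean(DSD[,2] >= sdi)   # simulated indegree sd as large as observed: near 0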
ANOVA for factorial data

Consider the data matrix for a generic two-factor layout:

        1          2          ···   K2
  1     y_{1,1}    y_{1,2}    ···   y_{1,K2}
  2     y_{2,1}    y_{2,2}    ···   y_{2,K2}
  ...
  K1    y_{K1,1}   y_{K1,2}   ···   y_{K1,K2}

The standard "ANOVA" row and column effects (RCE) model for such data is

  y_{i,j} = \mu + a_i + b_j + \epsilon_{i,j}

• µ represents an overall mean;
• a_i represents the average deviation of row i from µ;
• b_j represents the average deviation of column j from µ.

More abstractly,
• a_i is the "effect" of factor one having level i;
• b_j is the "effect" of factor two having level j.
ANOVA for factorial designs

ML and OLS estimates are given by
• \hat{\mu} = \bar{y}_{\cdot\cdot} ;
• \hat{a}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot} ;
• \hat{b}_j = \bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot} .

Note that

  mean(\hat{a}_1, \ldots, \hat{a}_{K_1}) = \frac{1}{K_1}\sum_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})
                                         = \frac{1}{K_1}\sum_i \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}
                                         = \frac{1}{K_1 K_2}\sum_i \sum_j y_{i,j} - \bar{y}_{\cdot\cdot}
                                         = \bar{y}_{\cdot\cdot} - \bar{y}_{\cdot\cdot} = 0,

  var(\hat{a}_1, \ldots, \hat{a}_{K_1}) = var(\bar{y}_{1\cdot}, \ldots, \bar{y}_{K_1\cdot}).
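These closed-form estimates are simple to compute directly; a minimal sketch for a complete two-way layout (the matrix ymat here is made up for illustration):

# ANOVA row/column effect estimates via row and column means
# (ymat is a hypothetical K1 x K2 data matrix):
ymat <- matrix(rnorm(4*3, mean=2), 4, 3)
mu.hat <- mean(ymat)                  # grand mean
a.hat <- rowMeans(ymat) - mu.hat      # row effects
b.hat <- colMeans(ymat) - mu.hat      # column effects
round(c(sum(a.hat), sum(b.hat)), 12)  # both sum to zero, as shown above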
RCE model interpretation

For the RCE model,
• the model is parameterized in terms of differences among row means and differences among column means;
• heterogeneity in the observed row/column means is represented by heterogeneity in the \hat{a}_i 's and \hat{b}_j 's;
• tests for row and column heterogeneity can be made via F-tests.

Back to binary directed network data:
• the data take the form of a two-way layout;
• we've seen empirically the existence of row and column heterogeneity.

Is there an analogous RCE model for network data?

Odds and probabilities

Let Pr(Y_{i,j} = 1) = θ. Then

  odds(Y_{i,j} = 1) = \theta/(1-\theta)
  \log odds(Y_{i,j} = 1) = \log\{\theta/(1-\theta)\} = \mu .

Now write the probability in terms of the log odds µ:

  \log odds(Y_{i,j} = 1) = \mu
  odds(Y_{i,j} = 1) = e^{\mu}
  \theta/(1-\theta) = e^{\mu}
  \theta = e^{\mu} - \theta e^{\mu}
  \theta(1+e^{\mu}) = e^{\mu}
  \theta = \Pr(Y_{i,j} = 1) = \frac{e^{\mu}}{1+e^{\mu}} .

Logistic regression

Let {Y_{i,j} : i ≠ j} be independent binary random variables with Pr(Y_{i,j} = 1) = θ_{i,j}, and write µ_{i,j} = log odds(Y_{i,j} = 1).

• If µ_{i,j} = µ for all i, j, then we have the SRG(n, e^{µ}/(1+e^{µ})) model.
• Our hypothesis tests generally reject this model, and thus the hypothesis of homogeneous µ_{i,j}'s is rejected.

What heterogeneity in the µ_{i,j}'s will help us capture the observed data patterns?
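As a quick numerical check of the log-odds/probability correspondence above (the helper names logit and expit are my own, not from the slides):

# Round trip between a probability and its log odds:
logit <- function(p) log(p/(1-p))
expit <- function(mu) exp(mu)/(1+exp(mu))   # inverse of logit
theta <- 0.25
mu <- logit(theta)   # log odds of theta
expit(mu)            # recovers 0.25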
Binary RCE model

RCE model: {Y_{i,j} : i ≠ j} are independent with

  \Pr(Y_{i,j} = 1) = \frac{e^{\mu_{i,j}}}{1 + e^{\mu_{i,j}}} , \qquad \mu_{i,j} = \mu + a_i + b_j ,

or more compactly,

  \Pr(Y_{i,j} = 1) = \frac{e^{\mu + a_i + b_j}}{1 + e^{\mu + a_i + b_j}} .

Interpretation:

  odds(Y_{i,j} = 1) = e^{\mu + a_i + b_j}
  odds ratio(Y_{i,j} = 1 : Y_{k,j} = 1) = e^{\mu + a_i + b_j} / e^{\mu + a_k + b_j}
  \log odds ratio(Y_{i,j} = 1 : Y_{k,j} = 1) = [\mu + a_i + b_j] - [\mu + a_k + b_j] = a_i - a_k
Binary RCE model

  \Pr(Y_{i,j} = 1) = \frac{e^{\mu + a_i + b_j}}{1 + e^{\mu + a_i + b_j}}

The differences among the a_i 's represent heterogeneity among nodes in terms of the probability of sending a tie, and similarly for the b_j 's.

• The a_i 's can represent heterogeneity in nodal outdegree;
• The b_j 's can represent heterogeneity in nodal indegree.

Caution: The row and column effects a and b are only identified up to differences. Suppose c + d + e = 0:

  \Pr(Y_{i,j} = 1) = \frac{e^{\mu + a_i + b_j}}{1 + e^{\mu + a_i + b_j}}
                   = \frac{e^{\mu + a_i + b_j + c + d + e}}{1 + e^{\mu + a_i + b_j + c + d + e}}
                   = \frac{e^{(\mu + c) + (a_i + d) + (b_j + e)}}{1 + e^{(\mu + c) + (a_i + d) + (b_j + e)}}
                   = \frac{e^{\tilde{\mu} + \tilde{a}_i + \tilde{b}_j}}{1 + e^{\tilde{\mu} + \tilde{a}_i + \tilde{b}_j}} .

It is standard to restrict the estimates somehow to reflect this:
• Sum-to-zero side conditions: \sum_i a_i = 0, \sum_j b_j = 0;
• Set-to-zero side conditions: a_1 = b_1 = 0.
Fitting the binary RCE in R

To make an analogy to the two-way factorial design,
• sender index = row factor;
• receiver index = column factor.

To fit the RCE model with logistic regression in R, we need to
• code the senders as a row factor;
• code the receivers as a column factor;
• convert everything to vectors.
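Aside: the same dyadic layout can also be built with expand.grid rather than the matrix indexing used on the next slide (an alternative sketch, not the slides' approach):

# Equivalent construction of the dyadic design with expand.grid;
# its column-major ordering matches c(Y):
dyads <- expand.grid(ridx=1:nrow(Y), cidx=1:ncol(Y))
dyads$y <- c(Y)                              # response, in the same order
dyads <- dyads[dyads$ridx != dyads$cidx, ]   # drop the diagonal self-ties
head(dyads)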
Fitting the binary RCE in R

Ridx<-matrix((1:nrow(Y)),nrow(Y),nrow(Y))
Cidx<-t(Ridx)

Ridx[1:4,1:4]

##      [,1] [,2] [,3] [,4]
## [1,]    1    1    1    1
## [2,]    2    2    2    2
## [3,]    3    3    3    3
## [4,]    4    4    4    4

Cidx[1:4,1:4]

##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    1    2    3    4
## [3,]    1    2    3    4
## [4,]    1    2    3    4

Y[1:4,1:4]

##         ROMUL BONAVEN AMBROSE BERTH
## ROMUL      NA       1       1     0
## BONAVEN     1      NA       0     0
## AMBROSE     1       1      NA     0
## BERTH       0       0       0    NA

Fitting the binary RCE in R

y<-c(Y)
ridx<-c(Ridx)
cidx<-c(Cidx)

y[1:22]

##  [1] NA  1  1  0  1  1  1  1  0  0  0  1  0  1  1  1  0  1  1 NA  1  0

ridx[1:22]

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18  1  2  3  4

cidx[1:22]

##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2

Fitting the binary RCE in R

fit<-glm( y ~ factor(ridx) + factor(cidx), family=binomial)
summary(fit)

## Call:
## glm(formula = y ~ factor(ridx) + factor(cidx), family = binomial)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.5975  -0.8044  -0.5725   0.9091   2.2146
##
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)
## (Intercept)      1.1154     0.7558   1.476  0.14001
## factor(ridx)2   -0.3175     0.7780  -0.408  0.68319
## factor(ridx)3   -0.4545     0.7823  -0.581  0.56124
## factor(ridx)4   -0.7633     0.8136  -0.938  0.34816
## factor(ridx)5   -0.3557     0.7820  -0.455  0.64920
## factor(ridx)6   -0.8156     0.8103  -1.007  0.31415
## factor(ridx)7   -0.3954     0.7839  -0.504  0.61394
## factor(ridx)8   -0.1036     0.7654  -0.135  0.89232
## factor(ridx)9   -0.7818     0.8128  -0.962  0.33616
## factor(ridx)10  -0.1668     0.7624  -0.219  0.82686
## factor(ridx)11  -0.4545     0.7823  -0.581  0.56124
## factor(ridx)12  -0.7231     0.8133  -0.889  0.37395
## factor(ridx)13  -1.1720     0.8627  -1.358  0.17433
## factor(ridx)14  -0.3954     0.7839  -0.504  0.61394
## factor(ridx)15  -0.1453     0.7641  -0.190  0.84917
## factor(ridx)16  -0.4545     0.7823  -0.581  0.56124
## ...
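One detail worth noting: glm() silently drops rows with a missing response, so the NA diagonal is excluded automatically (a quick check):

# The n self-ties are NA and are dropped by glm(); the fit uses the
# n*(n-1) ordered dyads:
sum(is.na(y))   # n missing self-ties
nobs(fit)       # n*(n-1) dyads used in the fit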
Fitting the binary RCE in R

fit<-glm( y ~ C(factor(ridx),sum) + C(factor(cidx),sum) , family=binomial)
summary(fit)

## Call:
## glm(formula = y ~ C(factor(ridx), sum) + C(factor(cidx), sum),
##     family = binomial)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.5975  -0.8044  -0.5725   0.9091   2.2146
##
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)             -1.03516    0.14270  -7.254 4.04e-13 ***
## C(factor(ridx), sum)1    0.47000    0.52409   0.897 0.369829
## C(factor(ridx), sum)2    0.15250    0.54948   0.278 0.781374
## C(factor(ridx), sum)3    0.01546    0.55376   0.028 0.977721
## C(factor(ridx), sum)4   -0.29326    0.59177  -0.496 0.620204
## C(factor(ridx), sum)5    0.11430    0.55418   0.206 0.836593
## C(factor(ridx), sum)6   -0.34557    0.58751  -0.588 0.556408
## C(factor(ridx), sum)7    0.07456    0.55617   0.134 0.893358
## C(factor(ridx), sum)8    0.36639    0.53352   0.687 0.492248
## C(factor(ridx), sum)9   -0.31179    0.59078  -0.528 0.597672
## C(factor(ridx), sum)10   0.30324    0.52919   0.573 0.566632
## C(factor(ridx), sum)11   0.01546    0.55376   0.028 0.977721
## C(factor(ridx), sum)12  -0.25315    0.59185  -0.428 0.668850
## C(factor(ridx), sum)13  -0.70197    0.64986  -1.080 0.280056
## C(factor(ridx), sum)14   0.07456    0.55617   0.134 0.893358
## ...

Fitting the binary RCE in R

mu.hat<-fit$coef[1]
a.hat<- fit$coef[1+1:(nrow(Y)-1)] ; a.hat<-c(a.hat,-sum(a.hat) )
b.hat<- fit$coef[nrow(Y)+1:(nrow(Y)-1)] ; b.hat<-c(b.hat,-sum(b.hat) )

[Figure: a.hat plotted against the outdegrees rsum(Y), and b.hat plotted against the indegrees csum(Y).]

• Why do nodes with the same outdegrees have different â_i 's?
• Why do nodes with the same indegrees have different b̂_j 's?
Understanding the MLEs

muij.mle<- mu.hat+ outer(a.hat,b.hat,"+")
p.mle<-exp(muij.mle)/(1+exp(muij.mle)) ; diag(p.mle)<-NA

[Figure: rsum(Y) plotted against rsum(p.mle), and csum(Y) against csum(p.mle); the points fall on the 45-degree line.]

Understanding MLEs

  \Pr(Y \mid \mu, a, b) = \prod_{i \neq j} \frac{e^{(\mu + a_i + b_j) y_{i,j}}}{1 + e^{\mu + a_i + b_j}}
                        = \exp\Big( \mu y_{\cdot\cdot} + \sum_i a_i y_{i\cdot} + \sum_j b_j y_{\cdot j} \Big) \prod_{i \neq j} (1 + e^{\mu + a_i + b_j})^{-1}

Let's find the maximizer of the log-likelihood in a_i:

  \log \Pr(Y \mid \mu, a, b) = \mu y_{\cdot\cdot} + \sum_i a_i y_{i\cdot} + \sum_j b_j y_{\cdot j} - \sum_{i \neq j} \log(1 + e^{\mu + a_i + b_j})

  \frac{\partial}{\partial a_i} \log \Pr(Y \mid \mu, a, b) = y_{i\cdot} - \sum_{j : j \neq i} \frac{e^{\mu + a_i + b_j}}{1 + e^{\mu + a_i + b_j}}

The MLE occurs at values (µ̂, â, b̂) for which

  y_{i\cdot} = \sum_{j : j \neq i} \frac{e^{\hat{\mu} + \hat{a}_i + \hat{b}_j}}{1 + e^{\hat{\mu} + \hat{a}_i + \hat{b}_j}} ,

i.e., y_{i\cdot} = \sum_j \hat{p}_{i,j}.

Model evaluation: Best case scenario

How does the "best" RCE model compare to the data?

  \log odds(Y_{i,j} = 1) = \mu_{i,j} , \quad \mu_{i,j} = \mu + a_i + b_j , \quad \hat{\mu}_{i,j} = \hat{\mu} + \hat{a}_i + \hat{b}_j

muij.mle<- mu.hat+ outer(a.hat,b.hat,"+")
p.mle<-exp(muij.mle)/(1+exp(muij.mle))

t.M<-NULL
for(s in 1:S)
{
  Ysim<-matrix(rbinom(n^2,1,p.mle),n,n) ; diag(Ysim)<-NA
  t.M<-rbind(t.M, c(mmean(Ysim),sd(rsum(Ysim)),sd(csum(Ysim))) )
}

Model evaluation: Best case scenario

[Figure: simulated distributions of the density, outdegree sd and indegree sd under the fitted RCE model, with observed values marked.]

• improvement in terms of indegree heterogeneity;
• still a discrepancy in terms of outdegree heterogeneity (why?)
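Note that the fitted model matches the observed degrees exactly in expectation, as the score equations above require; this can be checked numerically (a sketch reusing Y, p.mle and the rsum/csum helpers). The remaining outdegree discrepancy therefore concerns the spread of the simulated degrees around these matched expectations, not their means.

# At the MLE, expected degrees equal observed degrees
# (excluding the undefined diagonal):
diag(p.mle) <- NA
max(abs( rsum(Y) - rsum(p.mle) ))   # essentially zero
max(abs( csum(Y) - csum(p.mle) ))   # essentially zero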
Comparison to the i.i.d. model

[Figure: simulated distributions of the density, outdegree sd and indegree sd under the fitted RCE model and under the i.i.d. SRG model, with observed values marked.]

Model comparison

Often it is desirable to have a numerical comparison of two models.

SRG : Simple model, shows lack of fit.
RCE : Complex model, shows less lack of fit.

Is the RCE model better?
Is the added complexity "worth it"?

Selecting between two false models requires a balance between model fit and model complexity:
• Overly simple model: high explanatory power, poor fit to data, poor generalizability;
• Overly complex model: low explanatory power, great fit to data, poor generalizability;
• Sufficiently complex model: good explanatory power, good fit to data, and good generalizability.

These notions can be made precise in various contexts:
• communication and information theory;
• out-of-sample predictive performance.
Model comparison

  model performance = model fit - model complexity

Various numerical criteria have been suggested for fit and complexity. One proposal is
• fit = maximized log likelihood;
• complexity = number of parameters;

which leads to

  \widetilde{AIC} = \log p(Y \mid \hat{\theta}) - p .

For historical reasons, this is often multiplied by -2:

  AIC = -2 \log p(Y \mid \hat{\theta}) + 2p ,

so that a low value of the AIC is better.

Model comparison

fit.0<-glm( y ~ 1, family=binomial)
fit.rc<-glm( y ~ C(factor(ridx),sum)+C(factor(cidx),sum), family=binomial)

AIC(fit.0)

## [1] 369.183

AIC(fit.rc)

## [1] 397.4326

This isn't too surprising:
• fit.rc gives a better fit, but
• fit.rc has n + n - 2 = 34 more parameters than fit.0.

Can you think of a better model? Recall that fit.rc only seemed to improve the indegree fit.
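AIC can also be computed directly from the definition above, as a check on R's AIC() (a small sketch reusing fit.rc):

# -2 log-likelihood plus twice the parameter count:
p <- length(coef(fit.rc))
-2*as.numeric(logLik(fit.rc)) + 2*p   # matches AIC(fit.rc)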
fit.r<-glm( y ~ C(factor(ridx),sum) , family=binomial)
fit.c<-glm( y ~ C(factor(cidx),sum) , family=binomial)

AIC(fit.r)

## [1] 399.1471

AIC(fit.c)

## [1] 368.308

Model comparison with BIC

  model performance = model fit - model complexity

• fit = maximized log likelihood;
• complexity increasing in the number of parameters.

An alternative complexity measure yields the Bayes information criterion (BIC):

  \widetilde{BIC} = \log p(Y \mid \hat{\theta}) - \tfrac{1}{2} p \log n .

Again, for historical reasons, this is often multiplied by -2:

  BIC = -2 \log p(Y \mid \hat{\theta}) + p \log n ,

so that a low value of the BIC is better.

As compared to the AIC,
• the BIC penalizes complexity more;
• the BIC is generally better at model selection, the AIC at prediction.
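BIC can likewise be computed from its definition; here the n in the penalty is the number of observations used in the fit, i.e., the number of ordered dyads (a sketch reusing fit.c):

# -2 log-likelihood plus p * log(number of observations):
p <- length(coef(fit.c))
-2*as.numeric(logLik(fit.c)) + p*log(nobs(fit.c))   # matches BIC(fit.c)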
Model comparison with BIC

BIC(fit.0)

## [1] 372.9065

BIC(fit.rc)

## [1] 527.7581

BIC(fit.r)

## [1] 466.1716

BIC(fit.c)

## [1] 435.3325

Caution: Theoretical justification of the BIC assumes one of the models considered is correct.
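For a side-by-side view, both criteria for all four fits can be tabulated in one small data frame (a sketch reusing fit.0, fit.r, fit.c and fit.rc):

# Tabulate AIC and BIC for the four fits:
fits <- list(intercept=fit.0, row=fit.r, col=fit.c, row.col=fit.rc)
data.frame(AIC=sapply(fits,AIC), BIC=sapply(fits,BIC))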
Model fit

mu.hat<-fit.c$coef[1]
b.hat<-c(fit.c$coef[2:18],-sum(fit.c$coef[2:18]))

muij.mle<- mu.hat+ outer(rep(0,n),b.hat,"+")
p.mle<-exp(muij.mle)/(1+exp(muij.mle))

t.M<-NULL
for(s in 1:S)
{
  Ysim<-matrix(rbinom(n^2,1,p.mle),n,n) ; diag(Ysim)<-NA
  t.M<-rbind(t.M, c(mmean(Ysim),sd(rsum(Ysim)),sd(csum(Ysim))) )
}

[Figure: simulated distributions of the density, outdegree sd and indegree sd under the fitted column effects model, with observed values marked.]

Model comparison

For these monk data,
• among these models, the column effects model has the best AIC;
• all models show lack of fit in terms of outdegree heterogeneity.

The lack of fit is not surprising, as we know an independence model is wrong:
• monks were limited to three nominations per time period.

Ideally, such design features of the data would be incorporated into the model.
We will discuss such models towards the end of the quarter.

Conflict in the 90s

[Figure: outdegree and indegree distributions for the 1990s interstate conflict network.]

sd(rsum(Y))

## [1] 3.589398

sd(csum(Y))

## [1] 1.984451

Model selection

fit.0<-glm( y ~ 1, family=binomial)
fit.r<-glm( y ~ C(factor(ridx),sum) , family=binomial)
fit.c<-glm( y ~ C(factor(cidx),sum) , family=binomial)
fit.rc<-glm( y ~ C(factor(ridx),sum)+C(factor(cidx),sum), family=binomial)

AIC(fit.0)

## [1] 2197.674

AIC(fit.r)

## [1] 1947.604

AIC(fit.c)

## [1] 2176.021

AIC(fit.rc)

## [1] 1897.398

For these data, the full RCE model is best among these four.

Model selection with BIC

BIC(fit.0)

## [1] 2205.401

BIC(fit.r)

## [1] 2952.159

BIC(fit.c)

## [1] 3180.576

BIC(fit.rc)

## [1] 3898.78

Best case comparison

[Figure: simulated distributions of the density, outdegree sd and indegree sd under the fitted full RCE model for the conflict data, with observed values marked.]

Row effects only

mu.hat<-fit.r$coef[1]
a.hat<- fit.r$coef[1+1:(nrow(Y)-1)] ; a.hat<-c( a.hat,-sum(a.hat) )

theta.mle<- mu.hat+ outer(a.hat,rep(0,nrow(Y)),"+")
p.mle<-exp(theta.mle)/(1+exp(theta.mle))

[Figure: simulated distributions of the density, outdegree sd and indegree sd under the fitted row effects model.]

Fit comparison

[Figure: goodness-of-fit distributions for the full RCE model and the row effects model, shown side by side.]
Conditional testing

Recall that the "best case scenario" evaluation is somewhat ad hoc. Compare it to the conditional evaluation:

Suppose Y ~ SRG(n, θ):
• {Y | y_{··}} is not distributed as SRG(n, θ);
• {Y | y_{··}} ~ ERG(n, y_{··}).

Similarly, suppose Y ~ RCE(µ, a, b):
• {Y | y_{··}, {y_{i·}}, {y_{·i}}} is not distributed as RCE(µ̂, â, b̂);
• {Y | y_{··}, {y_{i·}}, {y_{·i}}} ~ ?