Row and column effects

567 Statistical analysis of social networks
Peter Hoff
Statistics, University of Washington
Monk data

[Figure: outdegree and indegree distributions of the monk network]

sd(rsum(Y))

## [1] 0.9002541

sd(csum(Y))

## [1] 2.698341
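The helper functions rsum, csum, and mmean used throughout these slides are not defined here; a minimal sketch consistent with their use (margins and mean of Y with the NA diagonal dropped) is:

rsum <- function(Y) apply(Y, 1, sum, na.rm=TRUE)   # outdegrees (row sums)
csum <- function(Y) apply(Y, 2, sum, na.rm=TRUE)   # indegrees (column sums)
mmean <- function(Y) mean(Y, na.rm=TRUE)           # network density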
Testing the SRG

rERG<-function(n,s)
{
  # simulate a uniform draw from the ERG: a directed graph on n nodes
  # with exactly s edges placed at random among the n*(n-1) dyads
  Y<-matrix(0,n,n) ; diag(Y)<-NA
  Y[!is.na(Y)]<- sample( c(rep(1,s),rep(0,n*(n-1)-s )) )
  Y
}

n<-nrow(Y)
s<-sum(Y,na.rm=TRUE)
sdo<-sd(rsum(Y))   # observed outdegree sd
sdi<-sd(csum(Y))   # observed indegree sd

DSD<-NULL
for(sim in 1:S)    # S = number of null simulations, set elsewhere
{
  Ysim<-rERG(n,s)
  DSD<-rbind(DSD, c(sd(rsum(Ysim)),sd(csum(Ysim))))
}
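A sketch of how the observed statistics might be compared with the simulated null distribution, as tail fractions (the direction of each tail matches the findings on the next slide):

mean(DSD[,1] <= sdo)   # how often the null gives as little outdegree spread
mean(DSD[,2] >= sdi)   # how often the null gives as much indegree spread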
Testing the SRG

[Figure: null distributions of the outdegree sd and indegree sd under the ERG simulation, with the observed values]
Testing the SRG

Compared to any SRG,
• the nodes show less heterogeneity in outdegree;
• the nodes show more heterogeneity in indegree.

No SRG model provides a good representation of Y in terms of its degrees.

What kind of model would provide a better representation of the data?
ANOVA for factorial data

Consider the data matrix for a generic two-factor layout:

        1        2        ···   K2
 1      y1,1     y1,2     ···   y1,K2
 2      y2,1     y2,2     ···   y2,K2
 ⋮               ⋮
 K1     yK1,1    yK1,2    ···   yK1,K2

The standard “ANOVA” row and column effects (RCE) model for such data is

yi,j = µ + ai + bj + εi,j

• µ represents an overall mean;
• ai represents the average deviation of row i from µ;
• bj represents the average deviation of column j from µ.

More abstractly,
• ai is the “effect” of factor one having level i;
• bj is the “effect” of factor two being level j.
ANOVA for factorial designs

ML and OLS estimates are given by
• µ̂ = ȳ·· ;
• âi = ȳi· − ȳ·· ;
• b̂j = ȳ·j − ȳ·· .

Note that

mean(â1 , . . . , âK1) = (1/K1) Σi (ȳi· − ȳ··)
                       = (1/K1) Σi ȳi· − ȳ··
                       = (1/(K1 K2)) Σi Σj yi,j − ȳ··
                       = ȳ·· − ȳ·· = 0,

var(â1 , . . . , âK1) = var(ȳ1· , . . . , ȳK1·).
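A minimal numerical sketch of these estimates, using a hypothetical 3×4 layout:

Ytab <- matrix(rnorm(12, mean=5), 3, 4)   # hypothetical two-factor data
mu.hat <- mean(Ytab)                      # grand mean
a.hat <- rowMeans(Ytab) - mu.hat          # row effects; mean(a.hat) is zero
b.hat <- colMeans(Ytab) - mu.hat          # column effects; mean(b.hat) is zero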
RCE model interpretation

For the RCE model
• the model is parameterized in terms of differences among row means and differences among column means;
• heterogeneity in the observed row/column means is represented by heterogeneity in the âi ’s and b̂j ’s;
• tests for row and column heterogeneity can be made via F -tests.

Back to binary directed network data:
• the data take the form of a two-way layout;
• we’ve seen empirically the existence of row and column heterogeneity.

Is there an analogous RCE model for network data?
Odds and probabilities

Let Pr(Yi,j = 1) = θ. Then

odds(Yi,j = 1) = θ/(1 − θ)
log odds(Yi,j = 1) = log θ/(1 − θ) = µ

Now write the probability in terms of the log odds µ:

log odds(Yi,j = 1) = µ
odds(Yi,j = 1) = e^µ
θ/(1 − θ) = e^µ
θ = e^µ − θe^µ
θ(1 + e^µ) = e^µ
θ = Pr(Yi,j = 1) = e^µ / (1 + e^µ)
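In R, plogis and qlogis implement exactly this map between log odds and probabilities:

mu <- 0.5
exp(mu)/(1+exp(mu))    # probability from log odds; same as plogis(mu)
qlogis( plogis(mu) )   # log odds of that probability; recovers mu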
Logistic regression

Let {Yi,j : i ≠ j} be independent binary random variables with Pr(Yi,j = 1) = θi,j, and let µi,j denote the log odds of Yi,j = 1.

• If µi,j = µ for all i, j, then we have the SRG(n, e^µ/(1 + e^µ)) model.
• Our hypothesis tests generally reject this model.

Thus the hypothesis of homogeneous µi,j ’s is rejected.

What heterogeneity in the µi,j ’s will help us capture the observed data patterns?
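For reference, simulating one draw from the homogeneous model is a one-liner (the value of µ here is arbitrary and illustrative):

mu <- -1 ; n <- 18
Ysrg <- matrix( rbinom(n^2, 1, exp(mu)/(1+exp(mu))), n, n )   # SRG draw
diag(Ysrg) <- NA                                              # no self-ties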
Binary RCE model

RCE model: {Yi,j : i ≠ j} are independent with

Pr(Yi,j = 1) = e^µi,j / (1 + e^µi,j),   µi,j = µ + ai + bj

or more compactly,

Pr(Yi,j = 1) = e^(µ+ai+bj) / (1 + e^(µ+ai+bj))

Interpretation:

odds(Yi,j = 1) = e^(µ+ai+bj)
odds ratio(Yi,j = 1 : Yk,j = 1) = e^(µ+ai+bj) / e^(µ+ak+bj)
log odds ratio(Yi,j = 1 : Yk,j = 1) = [µ + ai + bj] − [µ + ak + bj] = ai − ak
Binary RCE model

Pr(Yi,j = 1) = e^(µ+ai+bj) / (1 + e^(µ+ai+bj))

The differences among the ai ’s represent heterogeneity among nodes in terms of the probability of sending a tie, and similarly for the bj ’s and receiving a tie.

• The ai ’s can represent heterogeneity in nodal outdegree;
• The bj ’s can represent heterogeneity in nodal indegree.

Caution: The row and column effects a and b are only identified up to differences. Suppose c + d + e = 0:

Pr(Yi,j = 1) = e^(µ+ai+bj) / (1 + e^(µ+ai+bj))
             = e^(µ+ai+bj+c+d+e) / (1 + e^(µ+ai+bj+c+d+e))
             = e^((µ+c)+(ai+d)+(bj+e)) / (1 + e^((µ+c)+(ai+d)+(bj+e)))
             = e^(µ̃+ãi+b̃j) / (1 + e^(µ̃+ãi+b̃j))

It is standard to restrict the estimates somehow to reflect this:
• Sum-to-zero side conditions: Σ ai = 0, Σ bj = 0
• Set-to-zero side conditions: a1 = b1 = 0
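A small sketch, with made-up effect values, showing that recentering moves between the two parameterizations without changing any probability:

mu <- 1 ; a <- c(0,-0.3,0.4) ; b <- c(0,0.2,-0.1)   # set-to-zero form
mu.s <- mu + mean(a) + mean(b)                      # recentered intercept
a.s <- a - mean(a) ; b.s <- b - mean(b)             # sum-to-zero effects
range( (mu + outer(a,b,"+")) - (mu.s + outer(a.s,b.s,"+")) )   # all zero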
Fitting the binary RCE in R

To make an analogy to the two-way factorial design,
• sender index = row factor;
• receiver index = column factor.

To fit the RCE model with logistic regression in R, we need to
• code the senders as a row factor;
• code the receivers as a column factor;
• convert everything to vectors.
Fitting the binary RCE in R

Ridx<-matrix((1:nrow(Y)),nrow(Y),nrow(Y))
Cidx<-t(Ridx)

Ridx[1:4,1:4]

##      [,1] [,2] [,3] [,4]
## [1,]    1    1    1    1
## [2,]    2    2    2    2
## [3,]    3    3    3    3
## [4,]    4    4    4    4

Cidx[1:4,1:4]

##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    1    2    3    4
## [3,]    1    2    3    4
## [4,]    1    2    3    4

Y[1:4,1:4]

##         ROMUL BONAVEN AMBROSE BERTH
## ROMUL      NA       1       1     0
## BONAVEN     1      NA       0     0
## AMBROSE     1       1      NA     0
## BERTH       0       0       0    NA
Fitting the binary RCE in R

y<-c(Y)
ridx<-c(Ridx)
cidx<-c(Cidx)

y[1:22]

##  [1] NA  1  1  0  1  1  1  1  0  0  0  1  0  1  1  1  0  1  1 NA  1  0

ridx[1:22]

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18  1  2  3  4

cidx[1:22]

##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2
Fitting the binary RCE in R

fit<-glm( y ~ factor(ridx) + factor(cidx), family=binomial)
summary(fit)

## Call:
## glm(formula = y ~ factor(ridx) + factor(cidx), family = binomial)
##
## Deviance Residuals:
##     Min      1Q  Median      3Q     Max
## -1.5975 -0.8044 -0.5725  0.9091  2.2146
##
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)
## (Intercept)      1.1154     0.7558   1.476  0.14001
## factor(ridx)2   -0.3175     0.7780  -0.408  0.68319
## factor(ridx)3   -0.4545     0.7823  -0.581  0.56124
## factor(ridx)4   -0.7633     0.8136  -0.938  0.34816
## factor(ridx)5   -0.3557     0.7820  -0.455  0.64920
## factor(ridx)6   -0.8156     0.8103  -1.007  0.31415
## factor(ridx)7   -0.3954     0.7839  -0.504  0.61394
## factor(ridx)8   -0.1036     0.7654  -0.135  0.89232
## factor(ridx)9   -0.7818     0.8128  -0.962  0.33616
## factor(ridx)10  -0.1668     0.7624  -0.219  0.82686
## factor(ridx)11  -0.4545     0.7823  -0.581  0.56124
## factor(ridx)12  -0.7231     0.8133  -0.889  0.37395
## factor(ridx)13  -1.1720     0.8627  -1.358  0.17433
## factor(ridx)14  -0.3954     0.7839  -0.504  0.61394
## factor(ridx)15  -0.1453     0.7641  -0.190  0.84917
## factor(ridx)16  -0.4545     0.7823  -0.581  0.56124
## (remaining coefficients truncated on the slide)
Fitting the binary RCE in R

fit<-glm( y ~ C(factor(ridx),sum) + C(factor(cidx),sum) , family=binomial)
summary(fit)

## Call:
## glm(formula = y ~ C(factor(ridx), sum) + C(factor(cidx), sum),
##     family = binomial)
##
## Deviance Residuals:
##     Min      1Q  Median      3Q     Max
## -1.5975 -0.8044 -0.5725  0.9091  2.2146
##
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)
## (Intercept)            -1.03516    0.14270  -7.254 4.04e-13 ***
## C(factor(ridx), sum)1   0.47000    0.52409   0.897 0.369829
## C(factor(ridx), sum)2   0.15250    0.54948   0.278 0.781374
## C(factor(ridx), sum)3   0.01546    0.55376   0.028 0.977721
## C(factor(ridx), sum)4  -0.29326    0.59177  -0.496 0.620204
## C(factor(ridx), sum)5   0.11430    0.55418   0.206 0.836593
## C(factor(ridx), sum)6  -0.34557    0.58751  -0.588 0.556408
## C(factor(ridx), sum)7   0.07456    0.55617   0.134 0.893358
## C(factor(ridx), sum)8   0.36639    0.53352   0.687 0.492248
## C(factor(ridx), sum)9  -0.31179    0.59078  -0.528 0.597672
## C(factor(ridx), sum)10  0.30324    0.52919   0.573 0.566632
## C(factor(ridx), sum)11  0.01546    0.55376   0.028 0.977721
## C(factor(ridx), sum)12 -0.25315    0.59185  -0.428 0.668850
## C(factor(ridx), sum)13 -0.70197    0.64986  -1.080 0.280056
## C(factor(ridx), sum)14  0.07456    0.55617   0.134 0.893358
## (remaining coefficients truncated on the slide)
Fitting the binary RCE in R

mu.hat<-fit$coef[1]
a.hat<- fit$coef[1+1:(nrow(Y)-1)]      ; a.hat<-c(a.hat,-sum(a.hat) )
b.hat<- fit$coef[nrow(Y)+1:(nrow(Y)-1)] ; b.hat<-c(b.hat,-sum(b.hat) )

[Figure: a.hat plotted against rsum(Y), and b.hat plotted against csum(Y)]

• Why do nodes with the same outdegrees have different âi ’s?
• Why do nodes with the same indegrees have different b̂j ’s?
Understanding the MLEs

muij.mle<- mu.hat+ outer(a.hat,b.hat,"+")
p.mle<-exp(muij.mle)/(1+exp(muij.mle)) ; diag(p.mle)<-NA

[Figure: rsum(Y) plotted against rsum(p.mle), and csum(Y) plotted against csum(p.mle)]
Understanding MLEs

Pr(Y|µ, a, b) = ∏_{i≠j} e^((µ+ai+bj) yi,j) / (1 + e^(µ+ai+bj))
             = exp( µ y·· + Σi ai yi· + Σj bj y·j ) ∏_{i≠j} (1 + e^(µ+ai+bj))^(−1)

Let’s find the maximizer of the log-likelihood in ai :

log Pr(Y|µ, a, b) = µ y·· + Σi ai yi· + Σj bj y·j − Σ_{i≠j} log(1 + e^(µ+ai+bj))

∂/∂ai log Pr(Y|µ, a, b) = yi· − Σ_{j:j≠i} e^(µ+ai+bj) / (1 + e^(µ+ai+bj))

The MLE occurs at values (µ̂, â, b̂) for which

yi· = Σ_{j:j≠i} e^(µ̂+âi+b̂j) / (1 + e^(µ̂+âi+b̂j)),

i.e., yi· = Σj p̂i,j .
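This property is easy to verify numerically; a sketch reusing p.mle from the previous slide:

range( rsum(Y) - rsum(p.mle) )   # ~0: fitted row margins equal outdegrees
range( csum(Y) - csum(p.mle) )   # ~0: fitted column margins equal indegrees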
Model evaluation: Best case scenario

How does the “best” RCE model compare to the data?

log odds(Yi,j = 1) = µi,j
µi,j = µ + ai + bj
µ̂i,j = µ̂ + âi + b̂j

muij.mle<- mu.hat+ outer(a.hat,b.hat,"+")
p.mle<-exp(muij.mle)/(1+exp(muij.mle))

t.M<-NULL
for(s in 1:S)
{
  Ysim<-matrix(rbinom(n^2,1,p.mle),n,n) ; diag(Ysim)<-NA
  t.M<-rbind(t.M, c(mmean(Ysim),sd(rsum(Ysim)),sd(csum(Ysim))) )
}
Model evaluation: Best case scenario

[Figure: simulated distributions of the density, outdegree sd, and indegree sd under the fitted RCE model, with the observed values]

• improvement in terms of indegree heterogeneity;
• still a discrepancy in terms of outdegree heterogeneity (why?)
Comparison to the i.i.d. model

[Figure: the simulated distributions of the density, outdegree sd, and indegree sd under the fitted RCE model alongside those under the i.i.d. (SRG) model]
Model comparison

Often it is desirable to have a numerical comparison of two models.

SRG : Simple model, shows lack of fit.
RCE : Complex model, shows less lack of fit.

Is the RCE model better? Is the added complexity “worth it”?

Selecting between two false models requires a balance between model fit and model complexity:
• Overly simple model: high explanatory power, poor fit to data, poor generalizability;
• Overly complex model: low explanatory power, great fit to data, poor generalizability;
• Sufficiently complex model: good explanatory power, good fit to data, and good generalizability.

These notions can be made precise in various contexts:
• communication and information theory;
• out-of-sample predictive performance.
Model comparison

model performance = model fit − model complexity

Various numerical criteria have been suggested for fit and complexity. One proposal is
• fit = maximized log likelihood
• complexity = number of parameters
which leads to

ÃIC = log p(Y|θ̂) − p

For historical reasons, this is often multiplied by −2:

AIC = −2 log p(Y|θ̂) + 2p,

so that a low value of the AIC is better.
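For a glm fit, this quantity can be computed by hand and checked against R’s AIC (a sketch using the fit object from earlier):

p <- length(coef(fit))              # number of parameters
-2*as.numeric(logLik(fit)) + 2*p    # equals AIC(fit) for a binomial glm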
Model comparison

fit.0<-glm( y ~ 1, family=binomial)
fit.rc<-glm( y ~ C(factor(ridx),sum)+C(factor(cidx),sum), family=binomial)

AIC(fit.0)

## [1] 369.183

AIC(fit.rc)

## [1] 397.4326

This isn’t too surprising:
• fit.rc gives a better fit, but
• fit.rc has n + n − 2 = 34 more parameters than fit.0 .

Can you think of a better model? Recall fit.rc only seemed to improve the indegree fit.

fit.r<-glm( y ~ C(factor(ridx),sum) , family=binomial)
fit.c<-glm( y ~ C(factor(cidx),sum) , family=binomial)

AIC(fit.r)

## [1] 399.1471

AIC(fit.c)

## [1] 368.308
Model comparison with BIC

model performance = model fit − model complexity
• fit = maximized log likelihood
• complexity increasing in the number of parameters

An alternative complexity measure results in the Bayes information criterion (BIC):

B̃IC = log p(Y|θ̂) − (1/2) p × log n

Again, for historical reasons, this is often multiplied by −2:

BIC = −2 log p(Y|θ̂) + p × log n,

so that a low value of the BIC is better.

As compared to the AIC,
• the BIC penalizes complexity more;
• the BIC is generally better at model selection, the AIC at prediction.
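As with the AIC, this can be computed by hand; note that R’s BIC uses the number of observations actually used in the fit, here the non-missing dyads:

nobs <- sum(!is.na(y))                       # dyads used by glm
p <- length(coef(fit))
-2*as.numeric(logLik(fit)) + p*log(nobs)     # equals BIC(fit)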
Model comparison with BIC

BIC(fit.0)

## [1] 372.9065

BIC(fit.rc)

## [1] 527.7581

BIC(fit.r)

## [1] 466.1716

BIC(fit.c)

## [1] 435.3325

Under the BIC’s heavier penalty, the simplest model fit.0 is preferred.

Caution: Theoretical justification of the BIC assumes one of the models considered is correct.
Model fit

mu.hat<-fit.c$coef[1]
b.hat<-c(fit.c$coef[2:18],-sum(fit.c$coef[2:18]))
muij.mle<- mu.hat+ outer(rep(0,n),b.hat,"+")
p.mle<-exp(muij.mle)/(1+exp(muij.mle))

t.M<-NULL
for(s in 1:S)
{
  Ysim<-matrix(rbinom(n^2,1,p.mle),n,n) ; diag(Ysim)<-NA
  t.M<-rbind(t.M, c(mmean(Ysim),sd(rsum(Ysim)),sd(csum(Ysim))) )
}

[Figure: simulated distributions of the density, outdegree sd, and indegree sd under the fitted column effects model]
Model comparison

For these monk data,
• among these models, the column effects model has the best AIC;
• all models show lack of fit in terms of outdegree heterogeneity.

The lack of fit is not surprising; we know an independence model is wrong:
• monks were limited to three nominations per time period.

Ideally, such design features of the data would be incorporated into the model. We will discuss such models towards the end of the quarter.
Conflict in the 90s

[Figure: outdegree and indegree distributions for the 1990s conflict network]

sd(rsum(Y))

## [1] 3.589398

sd(csum(Y))

## [1] 1.984451
Model selection

fit.0<-glm( y ~ 1, family=binomial)
fit.r<-glm( y ~ C(factor(ridx),sum) , family=binomial)
fit.c<-glm( y ~ C(factor(cidx),sum) , family=binomial)
fit.rc<-glm( y ~ C(factor(ridx),sum)+C(factor(cidx),sum), family=binomial)

AIC(fit.0)

## [1] 2197.674

AIC(fit.r)

## [1] 1947.604

AIC(fit.c)

## [1] 2176.021

AIC(fit.rc)

## [1] 1897.398

For these data, the full RCE model is best among these four.
Model selection with BIC

BIC(fit.0)

## [1] 2205.401

BIC(fit.r)

## [1] 2952.159

BIC(fit.c)

## [1] 3180.576

BIC(fit.rc)

## [1] 3898.78

Under the BIC, the null model is again preferred over all of the effect models.
Best case comparison

[Figure: simulated distributions of the density, outdegree sd, and indegree sd under the fitted RCE model for the conflict data]
Row effects only

mu.hat<-fit.r$coef[1]
a.hat<- fit.r$coef[1+1:(nrow(Y)-1)] ; a.hat<-c( a.hat,-sum(a.hat) )
theta.mle<- mu.hat+ outer(a.hat,rep(0,nrow(Y)),"+")
p.mle<-exp(theta.mle)/(1+exp(theta.mle))

[Figure: simulated distributions of the density, outdegree sd, and indegree sd under the fitted row effects model]
Fit comparison

[Figure: simulated distributions of the density, outdegree sd, and indegree sd under the row effects and full RCE models, side by side]
Conditional testing

Recall the “best case scenario” evaluation is somewhat ad hoc. Compare to the conditional evaluation:

Suppose Y ∼ SRG(n, θ):
• {Y|y·· } ≁ SRG(n, θ);
• {Y|y·· } ∼ ERG(n, y·· ).

Similarly, suppose Y ∼ RCE(µ, a, b):
• {Y|y·· , {yi· }, {y·i }} ≁ RCE(µ̂, â, b̂);
• {Y|y·· , {yi· }, {y·i }} ∼ ?