Association models

Today’s topics:
1. Introduction
2. Linear-by-linear association model
3. Association models
4. Correlation model
5. Correspondence Analysis
6. Interpretation of graphical displays
Introduction
A main disadvantage of the loglinear models presented in previous weeks is that
1. they do not take ordered categories into account;
2. for two-way tables the choice is between the independence model and the saturated model.
Association models are models in between independence and saturated. They have the advantage that the association is described by a 'simple' structure and that the model can be tested (as opposed to the saturated model).
Linear-by-linear model
For two-way tables with ordinal variables a simple model assigns (ordered)
scores to both the row (u1 ≤ u2 ≤ . . . ≤ uI ) and column categories (v1 ≤ v2 ≤
. . . ≤ vJ ).
The linear-by-linear association model (L × L) is

log μ_ij = λ + λ_i^X + λ_j^Y + β u_i v_j

It is a special case of the saturated model with λ_ij^XY = β u_i v_j. Only β is a parameter, so instead of the (I − 1)(J − 1) association parameters of the saturated model there is 1 parameter for the linear-by-linear association model. When β = 0 the independence model results.
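Since the L × L model is a Poisson loglinear model, it can be fitted by Newton scoring. A minimal numpy sketch, with hypothetical scores and parameters; because the "counts" here are generated exactly from the model, the fit recovers β:

```python
import numpy as np

# Hypothetical 4x3 table generated exactly from the L x L model
I, J = 4, 3
u = np.arange(1.0, I + 1)                    # row scores u_i
v = np.arange(1.0, J + 1)                    # column scores v_j
lam_x = np.array([0.2, -0.1, 0.3, 0.0])      # last level = reference
lam_y = np.array([0.1, -0.2, 0.0])
beta = 0.15
mu = np.exp(1.0 + lam_x[:, None] + lam_y[None, :] + beta * np.outer(u, v))
y = mu.ravel()                               # "observed" counts = expected counts

# Design matrix: intercept, I-1 row dummies, J-1 column dummies, u_i * v_j
r, c = np.divmod(np.arange(I * J), J)
X = np.zeros((I * J, I + J))
X[:, 0] = 1.0
for k in range(I * J):
    if r[k] < I - 1:
        X[k, 1 + r[k]] = 1.0
    if c[k] < J - 1:
        X[k, I + c[k]] = 1.0
X[:, -1] = u[r] * v[c]

# Newton scoring for the Poisson log-link model:
# step = (X'WX)^{-1} X'(y - mu) with W = diag(mu)
b = np.zeros(I + J)
b[0] = np.log(y.mean())
for _ in range(50):
    m = np.exp(X @ b)
    b = b + np.linalg.solve(X.T @ (X * m[:, None]), X.T @ (y - m))
```

The last coefficient b[-1] is the estimate of β; the model uses I + J parameters in total (1 + (I − 1) + (J − 1) + 1).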
Linear-by-linear model
At a given level of X the deviation from independence is linear in the Y -scores
(and vice versa).
When β > 0, Y tends to increase as X increases (positive relation), when
β < 0, Y decreases as X increases (negative relationship).
The likelihood-ratio statistic

G²(I | L × L) = G²(I) − G²(L × L)

is designed to detect positive or negative trends. This statistic has more power than G²(I), since in general the power of chi-squared tests increases when df decreases.
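For reference, G²(I) for the independence model can be computed directly from a table; a short sketch with hypothetical counts (G²(L × L) would come from fitting the L × L model as above, and the 1-df difference tests β = 0):

```python
import numpy as np

# Hypothetical 3x3 table of counts
n = np.array([[25, 15, 10],
              [10, 20, 15],
              [ 5, 10, 30]])

mu_hat = np.outer(n.sum(axis=1), n.sum(axis=0)) / n.sum()  # independence fit
G2_I = 2 * (n * np.log(n / mu_hat)).sum()                  # G2(I)
df_I = (n.shape[0] - 1) * (n.shape[1] - 1)                 # (I-1)(J-1)
```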
Linear-by-linear model
The odds ratios for row categories a and c and column categories b and d given the linear-by-linear model are

log (μ_ab μ_cd) / (μ_ad μ_cb) = β(u_c − u_a)(v_d − v_b)
When the row categories are equidistant, u_2 − u_1 = u_3 − u_2 = … = u_I − u_{I−1}, and the column categories are as well, the local odds ratios have a common value. When the distances are 1 this value is e^β (the uniform association model).
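The uniform association property is easy to verify numerically; a small sketch with hypothetical main effects and unit-spaced scores:

```python
import numpy as np

# L x L expected counts with unit-spaced scores (hypothetical effects)
I, J, beta = 4, 5, 0.3
u, v = np.arange(I, dtype=float), np.arange(J, dtype=float)
lam_x = np.array([0.4, -0.2, 0.1, 0.0])
lam_y = np.array([0.2, 0.0, -0.3, 0.1, 0.0])
mu = np.exp(0.5 + lam_x[:, None] + lam_y[None, :] + beta * np.outer(u, v))

# All (I-1)(J-1) local odds ratios share the common value exp(beta)
local_or = (mu[:-1, :-1] * mu[1:, 1:]) / (mu[:-1, 1:] * mu[1:, :-1])
```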
Disadvantage: the choice of scores affects the interpretation of β. Sometimes it is not very difficult to assign scores, but in specific applications this might be cumbersome.
Linear-by-linear model
Often it is useful to standardize the scores, subtracting the mean and dividing by the standard deviation, i.e.

Σ_i u_i π_{i+} = Σ_j v_j π_{+j} = 0
Σ_i u_i² π_{i+} = Σ_j v_j² π_{+j} = 1
Then β represents the log odds ratio for standard-deviation distances in the X and Y distributions. The model usually fits well when the underlying continuous distribution is bivariate normal. For standardized scores β ≈ ρ/(1 − ρ²), where ρ is the underlying correlation.
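The standardization itself is a two-line computation; a sketch with hypothetical row marginals and raw scores:

```python
import numpy as np

pi_row = np.array([0.2, 0.3, 0.35, 0.15])   # hypothetical row marginals pi_i+
u_raw = np.array([1.0, 2.0, 3.0, 4.0])      # hypothetical raw scores

mean = (u_raw * pi_row).sum()
sd = np.sqrt(((u_raw - mean) ** 2 * pi_row).sum())
u = (u_raw - mean) / sd                      # weighted mean 0, weighted variance 1
```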
Association models
The linear-by-linear model can be generalized by changing scores into parameters. These generalizations are called association models since they focus
on the association structure.
The linear-by-linear association model (L × L) is

log μ_ij = λ + λ_i^X + λ_j^Y + β u_i v_j
If we only assign scores to the categories of Y (v1 ≤ v2 ≤ . . . ≤ vJ ), but
estimate the scores of X, we obtain the following model
log μ_ij = λ + λ_i^X + λ_j^Y + μ_i v_j
called the row-effects model or parallel odds model.
It is appropriate if Y is ordinal but X is nominal, since the row effects are not necessarily ordered. The row effects must be constrained such that μ_I = 0 or Σ_i μ_i = 0.
It uses I − 1 more parameters than the independence model.
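The name "parallel odds" can be checked numerically: under the row-effects model the local log odds ratio for rows i, i+1 and columns j, j+1 equals (μ_{i+1} − μ_i)(v_{j+1} − v_j), so with unit-spaced column scores each pair of adjacent rows has the same log odds ratio across all column pairs. A sketch with hypothetical, non-monotone row effects:

```python
import numpy as np

# Row-effects model: unordered row effects mu_row, ordered column scores v
I, J = 4, 3
mu_row = np.array([0.5, -0.2, 0.1, 0.0])    # hypothetical, not monotone
v = np.array([1.0, 2.0, 3.0])               # unit-spaced column scores
lam_x = np.array([0.3, 0.0, -0.1, 0.2])
lam_y = np.array([0.1, -0.2, 0.0])
m = np.exp(0.4 + lam_x[:, None] + lam_y[None, :] + mu_row[:, None] * v[None, :])

# Local log odds ratios: constant within each row pair ("parallel odds")
lor = np.log((m[:-1, :-1] * m[1:, 1:]) / (m[:-1, 1:] * m[1:, :-1]))
```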
Association models in multiway tables
For multiway tables with ordinal variables many generalizations exist of the
linear-by-linear association model, the row-effects model, etc.
In the three-way case, for example, each of the two-way effects can be reparameterized using the linear-by-linear association term.
But also three-way effects (λ_ijk^XYZ) can be replaced by simpler terms. For example,

log μ_ijk = λ + λ_i^X + λ_j^Y + λ_k^Z + λ_ik^XZ + λ_jk^YZ + β_k u_i v_j

where β_k u_i v_j replaces λ_ij^XY + λ_ijk^XYZ. In this case the XY association may differ depending on the level of variable Z.
RC-Association model
The row effects model and column effects model are generalizations of the
linear-by-linear association model. A further generalization is the row and
column effects model, abbreviated the RC-association model:
log μ_ij = λ + λ_i^X + λ_j^Y + β μ_i ν_j
For identification, location and scaling constraints are needed, for example
Σ_i μ_i = Σ_j ν_j = 0
Σ_i μ_i² = Σ_j ν_j² = 1
The degrees of freedom equal
df = (I − 2)(J − 2).
This is not a loglinear model since the predictor is a multiplicative function
of the parameters. It is a log-bilinear model.
RC(M∗)-association model
The interaction form of the RC-association model is λ_ij^XY = β μ_i ν_j. This can easily be generalized to a multidimensional form

λ_ij^XY = Σ_{m=1}^{M∗} β_m μ_im ν_jm

The maximum value of M∗ is min(I − 1, J − 1), in which case the model equals the saturated model. The degrees of freedom equal df = (I − 1 − M∗)(J − 1 − M∗).
The RC(M∗)-association model:

log μ_ij = λ + λ_i^X + λ_j^Y + Σ_{m=1}^{M∗} β_m μ_im ν_jm
For identification, location, scaling, and orthogonality constraints are needed,
for example
Σ_i μ_im = Σ_j ν_jm = 0
Σ_i μ_im² = Σ_j ν_jm² = 1
Σ_i μ_im μ_in = Σ_j ν_jm ν_jn = 0   (m ≠ n)
Correlation model
A closely related model is the correlation model
π_ij = π_i+ π_+j (1 + λ μ_i ν_j)

For identification, location and scaling constraints are needed, for example

Σ_i μ_i π_{i+} = Σ_j ν_j π_{+j} = 0
Σ_i μ_i² π_{i+} = Σ_j ν_j² π_{+j} = 1
The λ is the correlation between the scores for the joint distribution. This
model is also called the canonical correlation model. A generalization is
π_ij = π_i+ π_+j (1 + Σ_{k=1}^{M} λ_k μ_ik ν_jk)
When M is min(I − 1, J − 1) the saturated model results.
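In the saturated case the decomposition coincides with the singular value decomposition of the standardized residuals (the computational core of correspondence analysis): the singular values give the λ_k, and rescaling the singular vectors by the marginals gives scores satisfying the constraints. A sketch with a hypothetical joint probability table:

```python
import numpy as np

P = np.array([[0.10, 0.05, 0.05],
              [0.05, 0.15, 0.10],
              [0.02, 0.08, 0.40]])           # hypothetical joint probabilities
r, c = P.sum(axis=1), P.sum(axis=0)          # marginals pi_i+ and pi_+j

# SVD of the standardized residuals yields lambda_k, mu_ik, nu_jk
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(S, full_matrices=False)
mu = U / np.sqrt(r)[:, None]                 # satisfies sum_i mu_ik^2 pi_i+ = 1
nu = Vt.T / np.sqrt(c)[:, None]

# With all components included the table is reproduced exactly:
# P_ij = pi_i+ pi_+j (1 + sum_k lambda_k mu_ik nu_jk)
P_recon = np.outer(r, c) * (1 + (mu * lam) @ nu.T)
```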
Correspondence Analysis
Correspondence Analysis is the least-squares version of the correlation model. It is generally used as a graphical device to represent the association in a contingency table, usually in two dimensions.
The results of the RC(M∗) association model, the correlation model, and correspondence analysis are most often very much alike.
The advantage of correspondence analysis is that it is very easy to estimate (the least-squares version). Maximum likelihood estimation of the correlation model is often difficult.
The RC(M∗)-association model can be fitted iteratively, but since the likelihood function is not concave, local optima can be found.
Interpretation of graphical displays
Row points that are close together have similar conditional distributions over
the columns.
Column points that are close together have similar conditional distributions
over the rows.
The interpretation of a row-column point relation is via inner products, i.e. a projection rule: the product of the lengths of the two vectors times the cosine of the angle between them,

Σ_{m=1}^{M∗} μ_im ν_jm = |μ_i| |ν_j| cos(μ_i, ν_j)
But what to do with the β’s or λ’s?
Interpretation of graphical displays
In general, all plots are mathematically correct with row coordinates μ∗_im = β_m^τ μ_im and column coordinates ν∗_jm = β_m^κ ν_jm, where τ + κ = 1.
In practice, one of the following graphical displays is often found:
1. Row principal normalization: the row categories are plotted as points with coordinates μ′_im = β_m μ_im, and the column categories as vectors with coordinates ν_jm. In row principal normalization, the Euclidean distances between the row points approximate (possibly weighted) Euclidean distances between the row entries of the contingency table, which are logarithmically transformed and corrected for the main effects. The column vectors have a direction and a length. The association with the row categories is reconstructed by projection, and the length indicates how well a column fits the chosen dimensionality.
2. Column principal normalization: vice versa
Interpretation of graphical displays
3. Symmetric normalization: the row categories are plotted as vectors with coordinates μ″_im = β_m^{1/2} μ_im, and the column categories as vectors with coordinates ν″_jm = β_m^{1/2} ν_jm. This normalization spreads the intrinsic association terms symmetrically over the rows and columns. Note that neither the distances between the row points nor those between the column points are approximations to data-related distances. This plot can only be interpreted by projecting the row (column) points onto the direction indicated by any column (row) point.
4. Principal normalization: the row categories are plotted as points with coordinates β_m μ_im and the column categories as points with coordinates β_m ν_jm. Here the intrinsic association terms are spread twice in the solution, once over the row scores and once over the column scores. This is basically an incorrect graphical display since τ + κ ≠ 1. This method of normalization can only be used for making separate plots of row categories and column categories, respectively.
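The τ + κ = 1 condition is what keeps the inner products invariant across normalizations; a sketch with hypothetical scores showing that row principal and symmetric coordinates reproduce the same inner products, while principal normalization (β used twice) does not:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.normal(size=(5, 2))                 # hypothetical row scores, M* = 2
nu = rng.normal(size=(3, 2))                 # hypothetical column scores
beta = np.array([0.6, 0.2])                  # intrinsic association terms

row_principal = (mu * beta) @ nu.T                            # tau=1, kappa=0
symmetric = (mu * np.sqrt(beta)) @ (nu * np.sqrt(beta)).T     # tau=kappa=1/2
principal = (mu * beta) @ (nu * beta).T                       # beta spread twice
```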
Interpretation of graphical displays
An example with 5 row categories (a, b, c, d, e) and 3 column categories (1, 2, 3). Row principal normalization: the Euclidean distances between row points can be interpreted, as can the inner products.
[Figure: two-dimensional row principal plot (Dimension 1 × Dimension 2) with row points a–e and column vectors 1–3.]
Interpretation of graphical displays
The inner product equals
1. the product of the lengths of the two vectors times the cosine of the angle between them;
2. the product of the length of the vector and the length of the projection of the point onto the vector.
[Figure: the same two-dimensional plot, illustrating the projection interpretation.]
Interpretation of graphical displays
Greenacre (1984, p. 65) noted:
“There are advantages and disadvantages of the simultaneous display. Clearly an advantage is the very concise graphical display expressing a number of different features of the data in a single picture.
The display of each set of points indicates the nature of similarities
and dispersion within the set. Notice, however, that we should avoid
the danger of interpreting the distances between the points of different sets, since no such distances have been explicitly defined”.
As opposed to what Agresti writes in lines 3-5 on page 384!