Association models

Today's topics:
1. Introduction
2. Linear-by-linear association model
3. Association models
4. Correlation model
5. Correspondence analysis
6. Interpretation of graphical displays

10.1 Introduction

A main disadvantage of the loglinear models presented in previous weeks is that
1. they do not take ordered categories into account;
2. for two-way tables the only options are the independence model and the saturated model.

Association models lie in between independence and saturated. They have the advantage that the association is described by a 'simple' structure, and that the model can be tested (as opposed to the saturated model).

10.2 Linear-by-linear model

For two-way tables with ordinal variables, a simple model assigns (ordered) scores to both the row categories (u_1 ≤ u_2 ≤ … ≤ u_I) and the column categories (v_1 ≤ v_2 ≤ … ≤ v_J). The linear-by-linear association model (L×L) is

    log µ_ij = λ + λ_i^X + λ_j^Y + β u_i v_j.

It is a special case of the saturated model, with λ_ij^XY = β u_i v_j. Only β is a parameter, so instead of the (I−1)(J−1) association parameters of the saturated model, the linear-by-linear association model has just one. When β = 0 the independence model results.

10.3 Linear-by-linear model

At a given level of X, the deviation from independence is linear in the Y-scores (and vice versa). When β > 0, Y tends to increase as X increases (positive relationship); when β < 0, Y tends to decrease as X increases (negative relationship).

The likelihood-ratio statistic

    G²(I | L×L) = G²(I) − G²(L×L)

is designed to detect positive or negative trends. This statistic has more power than G²(I), since in general the power of chi-squared tests increases when the degrees of freedom decrease.

10.4 Linear-by-linear model

Under the linear-by-linear model, the log odds ratio for row categories a and c and column categories b and d is

    log (µ_ab µ_cd) / (µ_ad µ_cb) = β (u_c − u_a)(v_d − v_b).

When the row categories are equidistant (u_2 − u_1 = u_3 − u_2 = … = u_I − u_{I−1}) and the column categories are equidistant as well, the local odds ratios have a common value. When the distances are 1 this value is e^β (uniform association model).

Disadvantage: the choice of scores affects the interpretation of β. Sometimes it is not very difficult to assign scores, but in specific applications this may be cumbersome.

10.5 Linear-by-linear model

Often it is useful to standardize the scores, subtracting the mean and dividing by the standard deviation, i.e.

    Σ_i u_i π_i+ = Σ_j v_j π_+j = 0,
    Σ_i u_i² π_i+ = Σ_j v_j² π_+j = 1.

Then β represents the log odds ratio for standard-deviation distances in the X and Y distributions. The model usually fits well when the underlying continuous distribution is bivariate normal. For standardized scores, β ≈ ρ/(1 − ρ²), where ρ is the underlying correlation.

10.6 Association models

The linear-by-linear model can be generalized by turning scores into parameters. These generalizations are called association models, since they focus on the association structure. The linear-by-linear association model (L×L) is

    log µ_ij = λ + λ_i^X + λ_j^Y + β u_i v_j.

If we only assign scores to the categories of Y (v_1 ≤ v_2 ≤ … ≤ v_J), but estimate the scores of X, we obtain the model

    log µ_ij = λ + λ_i^X + λ_j^Y + µ_i v_j,

called the row-effects model or parallel odds model. It is appropriate if Y is ordinal but X is nominal, since the row effects are not necessarily ordered. The row effects must be constrained such that µ_I = 0 or Σ_i µ_i = 0. The model uses I − 1 more parameters than the independence model.

10.7 Association models in multiway tables

For multiway tables with ordinal variables, many generalizations of the linear-by-linear association model, the row-effects model, etc. exist. In the three-way case, for example, each of the two-way effects can be reparameterized using a linear-by-linear association term. But three-way effects (λ_ijk^XYZ) can also be replaced by simpler terms.
For example,

    log µ_ijk = λ + λ_i^X + λ_j^Y + λ_k^Z + λ_ik^XZ + λ_jk^YZ + β_k u_i v_j,

where β_k u_i v_j replaces λ_ij^XY + λ_ijk^XYZ. In this case the XY association may differ depending on the level of variable Z.

10.8 RC-Association model

The row-effects model and the column-effects model are generalizations of the linear-by-linear association model. A further generalization is the row-and-column-effects model, abbreviated the RC-association model:

    log µ_ij = λ + λ_i^X + λ_j^Y + β µ_i ν_j.

For identification, location and scaling constraints are needed, for example

    Σ_i µ_i = Σ_j ν_j = 0,    Σ_i µ_i² = Σ_j ν_j² = 1.

The degrees of freedom equal df = (I − 2)(J − 2). This is not a loglinear model, since the predictor is a multiplicative function of the parameters. It is a log-bilinear model.

10.9 RC(M*)-association model

The interaction term of the RC-association model is λ_ij^XY = β µ_i ν_j. This is easily generalized to the multidimensional form

    λ_ij^XY = Σ_{m=1}^{M*} β_m µ_im ν_jm.

The maximum value of M* is min(I − 1, J − 1), in which case the model equals the saturated model. The degrees of freedom equal df = (I − 1 − M*)(J − 1 − M*). The RC(M*)-association model is

    log µ_ij = λ + λ_i^X + λ_j^Y + Σ_{m=1}^{M*} β_m µ_im ν_jm.

For identification, location, scaling, and orthogonality constraints are needed, for example

    Σ_i µ_im = Σ_j ν_jm = 0,
    Σ_i µ_im² = Σ_j ν_jm² = 1,
    Σ_i µ_im µ_in = Σ_j ν_jm ν_jn = 0  (m ≠ n).

10.10 Correlation model

A closely related model is the correlation model

    π_ij = π_i+ π_+j (1 + λ µ_i ν_j).

For identification, location and scaling constraints are needed, for example

    Σ_i µ_i π_i+ = Σ_j ν_j π_+j = 0,    Σ_i µ_i² π_i+ = Σ_j ν_j² π_+j = 1.

The λ is the correlation between the scores for the joint distribution. This model is also called the canonical correlation model. A generalization is

    π_ij = π_i+ π_+j (1 + Σ_{k=1}^{M} λ_k µ_ik ν_jk).

When M equals min(I − 1, J − 1) the saturated model results.
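The least-squares solution of the generalized correlation model can be computed directly from a singular value decomposition of the standardized residuals from independence. The numpy sketch below (the 4×3 table is invented; the scaling follows one common convention) recovers the λ_k and the scores, and verifies the reconstruction formula π_ij = π_i+ π_+j (1 + Σ_k λ_k µ_ik ν_jk).

```python
# Least-squares fit of the generalized correlation model via SVD.
import numpy as np

N = np.array([[50, 20, 10],
              [30, 40, 20],
              [10, 30, 40],
              [ 5, 15, 50]], dtype=float)

P = N / N.sum()                 # correspondence matrix pi_ij
r = P.sum(axis=1)               # row margins pi_i+
c = P.sum(axis=0)               # column margins pi_+j

# Standardized residuals from independence:
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(S, full_matrices=False)

# Scores satisfying sum_i mu_ik^2 pi_i+ = 1 (and the analogue for nu):
mu = U / np.sqrt(r)[:, None]
nu = Vt.T / np.sqrt(c)[:, None]

# Reconstruction: pi_ij = pi_i+ pi_+j (1 + sum_k lambda_k mu_ik nu_jk)
P_hat = np.outer(r, c) * (1 + (mu * lam) @ nu.T)
print("singular values (lambda_k):", np.round(lam, 3))
print("max reconstruction error:", np.abs(P - P_hat).max())
```

Note that at most min(I − 1, J − 1) singular values are nonzero (here 2), matching the maximum dimensionality stated above; the trailing singular value is numerically zero.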
10.11 Correspondence Analysis

Correspondence analysis is the least-squares version of the correlation model. It is generally used as a graphical device to represent the association in a contingency table, and then uses two dimensions. The results of the RC(M*)-association model, the correlation model, and correspondence analysis are most often very much alike.

The advantage of correspondence analysis is that it is very easy to estimate (being the least-squares version). Maximum-likelihood estimation of the correlation model is often difficult. The RC(M*)-association model can be fitted iteratively, but since the likelihood function is not concave, local optima may be found.

10.12 Interpretation of graphical displays

Row points that are close together have similar conditional distributions over the columns. Column points that are close together have similar conditional distributions over the rows. The interpretation of a row-column relation is via inner products, i.e. a projection rule: the product of the lengths of the two vectors times the cosine of the angle between them,

    Σ_{m=1}^{M*} µ_im ν_jm = |µ_i| |ν_j| cos(µ_i, ν_j).

But what to do with the β's or λ's?

10.13 Interpretation of graphical displays

In general, all plots are mathematically correct with row coordinates µ*_im = β_m^τ µ_im and column coordinates ν*_jm = β_m^κ ν_jm, where τ + κ = 1. In practice, one of the following graphical displays is often found:

1. Row principal normalization: the row categories are plotted as points with coordinates µ'_im = β_m µ_im, and the column categories as vectors with coordinates ν_jm. In row principal normalization, the Euclidean distances between the row points approximate (possibly weighted) Euclidean distances between the row entries of the contingency table, logarithmically transformed and corrected for the main effects. The column vectors have a direction and a length. The association with the row categories is reconstructed by projection, and the length indicates how well a column fits the chosen dimensionality.

2. Column principal normalization: vice versa.

10.14 Interpretation of graphical displays

3. Symmetric normalization: the row categories are plotted as vectors with coordinates µ''_im = β_m^{1/2} µ_im, and the column categories as vectors with coordinates ν''_jm = β_m^{1/2} ν_jm. This normalization spreads the intrinsic association terms symmetrically over the rows and columns. Note that neither the distances between the row points nor those between the column points approximate data-related distances. This plot can only be interpreted by projecting the row (column) points onto the direction indicated by a column (row) point.

4. Principal normalization: the row categories are plotted as points with coordinates β_m µ_im and the column categories as points with coordinates β_m ν_jm. Here the intrinsic association terms enter the solution twice, once in the row scores and once in the column scores. This is basically an incorrect joint display, since τ + κ ≠ 1. This normalization can only be used for making separate plots of the row categories and the column categories, respectively.

10.15 Interpretation of graphical displays

An example with 5 row categories (a, b, c, d, e) and 3 column categories (1, 2, 3). Row principal normalization: the Euclidean distances between row points can be interpreted, as well as the inner products.

[Figure: row points a-e and column vectors 1-3, plotted against Dimension 1 (horizontal) and Dimension 2 (vertical).]

10.16 Interpretation of graphical displays

The inner product equals
1. the product of the lengths of the two vectors times the cosine of the angle between them;
2. the product of the length of the vector and the projection of the point onto that vector.

[Figure: the same row principal plot, illustrating the projection rule.]

10.17 Interpretation of graphical displays

Greenacre (1984, p. 65) noted: "There are advantages and disadvantages of the simultaneous display. Clearly an advantage is the very concise graphical display expressing a number of different features of the data in a single picture. The display of each set of points indicates the nature of similarities and dispersion within the set. Notice, however, that we should avoid the danger of interpreting the distances between the points of different sets, since no such distances have been explicitly defined."

As opposed to what Agresti writes in lines 3-5 on page 384!
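The two readings of the inner product above can be checked numerically. In this small sketch the coordinates of row point a and column vector 2 are invented for illustration:

```python
# Numerical check of the projection rule: the inner product of a row point
# and a column vector equals |mu||nu| cos(angle), and also |nu| times the
# signed length of the projection of the row point onto the column vector.
import numpy as np

mu_a = np.array([0.8, -0.3])    # row point a (row principal coordinates, invented)
nu_2 = np.array([0.4, 0.9])     # column vector 2 (invented)

inner = mu_a @ nu_2

# Reading 1: product of the lengths times the cosine of the angle
cos_angle = inner / (np.linalg.norm(mu_a) * np.linalg.norm(nu_2))
via_cos = np.linalg.norm(mu_a) * np.linalg.norm(nu_2) * cos_angle

# Reading 2: length of the vector times the projection of the point onto it
proj_len = mu_a @ nu_2 / np.linalg.norm(nu_2)
via_proj = np.linalg.norm(nu_2) * proj_len

print(inner, via_cos, via_proj)   # all three coincide
```

This is why, in a row principal plot, a row category scores high on a column exactly when its point projects far out along that column's vector.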