ESTIMATION OF ROW AND COLUMN SCORES IN THE LINEAR-BY-LINEAR ASSOCIATION MODEL
FOR TWO-WAY ORDINAL CONTINGENCY TABLES
Charles S. Davis, University of Iowa
Abstract

Consider the situation when both the column and row variables of a two-dimensional table are ordinal. A simple loglinear model that utilizes the orderings of the rows and the columns is the linear-by-linear association model. A disadvantage of this model is the necessity of assigning scores to the categories of the row and column variables. Although the integer scores are most commonly used in practice, there is no obvious choice of scores for many variables and the researcher may not wish to assume equal spacings.

In this paper, we consider the alternative of treating the scores as parameters to be estimated from the data rather than as numbers to be supplied by the researcher. The resulting model was first discussed by Goodman, who referred to it as the "RC model." The RC model has the same general appearance as the linear-by-linear association model, except that scores for ordinal variables are treated as parameters. An easy-to-use SAS® macro for determining maximum likelihood estimates of the row and column scores and the association parameter is described. The program uses PROC MATRIX to iteratively estimate the scores.

1. Introduction

While methods for analyzing cross-classified categorical data have received considerable attention in recent years, most of the well-known statistical techniques for analyzing such data treat all variables as nominal. Thus, the results are invariant to permutations of the categories of any of the variables. Examples of such methods include the Pearson chi-square test of independence and the traditional loglinear models (Bishop, et al., 1975). In much of the research conducted in various disciplines, these methods are routinely applied to both nominal and ordinal categorical data.

Recently, specialized methods and descriptive measures have been developed for contingency tables having ordered categories for at least one of the classifications. There are several advantages to be gained from using specialized models which efficiently use the information on ordering instead of the standard procedures appropriate for nominal data. Ordinal methods have greater power for detecting certain types of alternatives to null hypotheses such as the one of independence. In addition, ordinal methods can use a greater variety of models, most of which are more parsimonious and have simpler interpretations than the standard models for nominal variables. Finally, interesting ordinal models can be applied in settings where the standard nominal models are trivial or else have too many parameters to be tested for goodness of fit.

For example, in a two-way $r \times c$ contingency table with expected frequencies $\{m_{ij}\}$, it is quite common for the independence model

$$\log m_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$$

to provide a poor fit to a set of observed data. However, in the standard hierarchical system, the model of next greater complexity is the saturated one having an additional $(r-1)(c-1)$ independent $\lambda_{ij}^{XY}$ parameters. Thus, a nontrivial model does not exist for describing the association. In contrast, if one or both of the variables are ordinal, unsaturated association models exist. These models are more realistic than the independence model.

Consider the situation when both the column and row variables of a two-dimensional table are ordinal. A simple loglinear model that utilizes the orderings of the rows and the columns is the linear-by-linear association model (Agresti, 1984, pp. 76-80). Since this model has only one more parameter than the independence model, the degrees of freedom for testing goodness of fit is $(r-1)(c-1)-1$.

An obvious disadvantage of the linear-by-linear association model is the necessity of assigning scores to the categories of the row and column variables. In many applications, the choice of scores will reflect assumed distances between midpoints of categories for an underlying interval scale. The integer scores are most commonly used in practice, in which case the model is known as the uniform association model. However, there is no obvious choice of scores for many variables and the researcher may not wish to assume equal spacings. One solution is to assign scores in a variety of "reasonable" ways to check whether substantive conclusions concerning parameter estimates and the goodness of fit of the model depend on the actual choice.

In this paper, we consider the alternative of treating the scores as parameters to be estimated from the data rather than as numbers to be supplied by the researcher. This model was first discussed by Goodman (1979, 1981a, 1981b), who referred to it as "Model II" or the "RC model." Because the log expected frequency is a multiplicative (rather than linear) function of the model parameters, Agresti (1984, p. 139) calls it the log-multiplicative model. The RC model has the same general appearance as the linear-by-linear association model, except that scores for ordinal variables are treated as parameters. It is unnecessary for the user to assign the scores, since the estimation process provides estimated scores that yield the best fit for the linear-by-linear association.

In Section 2, the RC model is described and discussed. An easy-to-use SAS macro for determining maximum likelihood estimates (MLE's) of the row and column scores is described in Section 3. The program uses PROC MATRIX to iteratively estimate the scores. Finally, Section 4 contains an example demonstrating the use of the macro in fitting the RC model to an observed contingency table.

2. The RC Association Model for Two-Way Ordinal-Ordinal Contingency Tables

Let $\{n_{ij}\}$ denote the cell frequencies in an $r \times c$ cross-classification of ordinal variables X and Y. Let $\{m_{ij}\}$ denote the corresponding expected frequencies and let $\{\pi_{ij}\}$ denote the cell probabilities. A general structural form for modelling the association between X and Y is:

$$\log m_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \mu_i \nu_j, \tag{2.1}$$

where

$$\sum_i \lambda_i^X = \sum_j \lambda_j^Y = 0.$$

In this general model, the local log-odds ratio is

$$\log \theta_{ij} = \log \frac{m_{ij}\, m_{i+1,j+1}}{m_{i,j+1}\, m_{i+1,j}} = (\mu_{i+1} - \mu_i)(\nu_{j+1} - \nu_j).$$
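The local log-odds ratio above can be illustrated with a short Python sketch. This is not part of the original macro: the function names `local_log_odds` and `rc_local_log_odds` are hypothetical, and the counts are those of Table 1 in Section 4. The first function computes sample local log-odds ratios from any table of positive counts; the second evaluates the product form implied by model (2.1) for given scores.

```python
import math

# Counts from Table 1 (rows: mental health status; columns: parental SES A-F).
TABLE = [
    [64, 57, 57, 72, 36, 21],
    [94, 94, 105, 141, 97, 71],
    [58, 54, 65, 77, 54, 54],
    [46, 40, 60, 94, 78, 71],
]


def local_log_odds(m):
    """Return the (r-1) x (c-1) array of local log-odds ratios of table m."""
    r, c = len(m), len(m[0])
    return [
        [math.log(m[i][j] * m[i + 1][j + 1] / (m[i][j + 1] * m[i + 1][j]))
         for j in range(c - 1)]
        for i in range(r - 1)
    ]


def rc_local_log_odds(mu, nu):
    """Local log-odds ratios implied by scores mu and nu under model (2.1)."""
    return [
        [(mu[i + 1] - mu[i]) * (nu[j + 1] - nu[j]) for j in range(len(nu) - 1)]
        for i in range(len(mu) - 1)
    ]


# Sample local log-odds ratios for the observed table (3 x 5 array).
observed = local_log_odds(TABLE)
```

Under model (2.1) the main effects cancel in every 2 x 2 subtable of adjacent rows and columns, so only the score differences remain; applying `local_log_odds` to expected frequencies of the form exp(mu_i * nu_j) reproduces `rc_local_log_odds(mu, nu)`.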
Model (2.1) is the linear-by-linear association model when $\mu_i \nu_j = \beta u_i v_j$ with the $\{u_i\}$ and $\{v_j\}$ being fixed, strictly monotone scores (Agresti, 1984, p. 77). In addition, this general model is referred to as the row effects model when the $\{\mu_i\}$ are unknown parameters and the $\{\nu_j\}$ are fixed, strictly monotone scores (Agresti, 1984, p. 84); the column effects model when the $\{\nu_j\}$ are parameters and the $\{\mu_i\}$ are fixed, strictly monotone scores (Agresti, 1984, p. 85); and the RC model when both sets are parameters. While the linear-by-linear association, row effects, and column effects models are loglinear models, the RC model is not loglinear in the natural parameters.

It is common to rewrite the RC model as:

$$\log m_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \beta \mu_i \nu_j. \tag{2.2}$$

Since the basic form of the model is unchanged when the $\{\mu_i\}$ or $\{\nu_j\}$ are replaced by linear functions of themselves, without loss of generality, an arbitrary location and scale may be assumed for them. Let $p_{i+}$ and $p_{+j}$ denote the row and column sample marginal distributions, respectively. The constraints

$$\sum_i \mu_i p_{i+} = \sum_j \nu_j p_{+j} = 0, \qquad \sum_i \mu_i^2 p_{i+} = \sum_j \nu_j^2 p_{+j} = 1, \tag{2.3}$$

scale the scores to have means of zero and standard deviations of one with respect to the marginal distributions. With this choice of constraints a function of the estimated association parameter $\hat\beta$ can be interpreted as a correlation coefficient, as will be shown below.

Since $r-2$ of the $\{\mu_i\}$ and $c-2$ of the $\{\nu_j\}$ are linearly independent, the number of independent parameters in (2.2) is $2(r+c-2)$. Thus, the degrees of freedom (df) for testing goodness of fit is $(r-2)(c-2)$. Therefore, the table must have dimensions at least $3 \times 3$ for the RC model to be unsaturated.

Goodman (1981b) pointed out that the RC model for discrete variables has form similar to the bivariate normal density for continuous variables. If (X, Y) have a standardized bivariate normal distribution, then

$$f(x,y) = \left(2\pi\sqrt{1-\rho^2}\right)^{-1} \exp\left[-\frac{1}{2(1-\rho^2)}\left(x^2 - 2\rho x y + y^2\right)\right] = g(x)\,h(y)\,\exp\left[\frac{\rho}{1-\rho^2}\,x y\right].$$

Using the standardized scores as in (2.3), the RC model is:

$$\log \pi_{ij} = \lambda^* + \lambda_i^X + \lambda_j^Y + \beta \mu_i \nu_j,$$

where $\lambda^* = \lambda - \log N$ and $N = \sum_{i,j} n_{ij}$. Then

$$\pi_{ij} = \alpha_i \gamma_j \exp(\beta \mu_i \nu_j),$$

where $\alpha_i = \exp(\lambda^* + \lambda_i^X)$ and $\gamma_j = \exp(\lambda_j^Y)$. Thus the association parameter $\beta$ in the RC model corresponds to $\rho/(1-\rho^2)$ in the bivariate normal density.

Goodman (1981a) noted many similarities between the RC model and the canonical correlation method for the analysis of two-way contingency tables with ordered row and column categories (Fisher, 1940; Williams, 1952; Kendall and Stuart, 1979, pp. 599-609). In the canonical correlation approach, the scores $\{\mu_i\}$ and $\{\nu_j\}$ that produce the canonical (maximum) correlation for the joint distribution $\{\pi_{ij}\}$ are estimated. Again, the constraints (2.3) are used. Goodman (1981a) noted that the estimated parameter scores obtained for the RC model are often very close to the estimates obtained for the canonical correlation model.

3. Description of the SAS Macro

The log-multiplicative model cannot easily be fit using commonly available packages. Agresti (1984, p. 141) suggests the following iterative procedure:

(i) treat the column parameter scores as fixed and estimate the row scores as in a loglinear row effects model;
(ii) treat the resulting row scores as fixed and estimate the column scores as in a column effects model;
(iii) repeat steps (i) and (ii) until convergence.

MLE's of the parameters can be obtained using the iterative sequence of weighted least squares estimates described in Agresti (1984, p. 238). Although this procedure will directly produce estimated parameters from which predicted frequencies and the goodness of fit likelihood ratio chi-square statistic ($G^2$) can be calculated, calculation of the scaled scores and the estimate of $\beta$ is not described.

A macro RC_MODEL to fit the log-multiplicative model to a general $r \times c$ contingency table was written using PROC MATRIX; this macro is listed in the Appendix. The input contingency table is contained in a SAS data set with r observations and c variables. Observation i for variable j contains the (i,j) count of the table. The steps are documented in the program listing and are summarized as follows:

a. Determine the dimensions of the table, calculate marginal probabilities for rows and columns, and reshape the table into an $rc \times 1$ vector (called N).

b. Generate the fixed part of the design matrix ($r + c - 1$ columns):

(i) a column of 1's;

(ii) $r - 1$ row effects of the form (written in block notation, where $\mathbf{1}_c$ and $\mathbf{0}$ are $c \times 1$ vectors of ones and zeros and each block of c rows corresponds to one row of the table):

$$\begin{pmatrix} \mathbf{1}_c & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1}_c & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1}_c \\ -\mathbf{1}_c & -\mathbf{1}_c & \cdots & -\mathbf{1}_c \end{pmatrix}$$

(iii) $c - 1$ column effects of the form (r stacked copies of the same $c \times (c-1)$ block):

$$\begin{pmatrix} D \\ D \\ \vdots \\ D \end{pmatrix}, \qquad D = \begin{pmatrix} I_{c-1} \\ -\mathbf{1}_{c-1}' \end{pmatrix}.$$

c. Initialize the vectors of predicted values (M), estimated parameters (B), column scores (COLSCORE), and row scores (ROWSCORE). The column scores are initially set equal to j for $j = 1, \ldots, c$ and M is set equal to N. The other parameters are initialized to the value 1. The initial estimates are stored so they can be used in checking convergence.

d. Create two additional sets of columns of the design matrix: $r - 1$ columns for estimating the row scores and $c - 1$ columns for estimating the column scores. At this point, the iterative procedure starts. Steps e.-i. are repeated until the estimates converge or until the maximum number of iterations (currently set equal to 10) is reached.

e. Create $r - 1$ columns of the design matrix for estimating the row scores. These columns are of the form:

$$\begin{pmatrix} \mathbf{y} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{y} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{y} \\ -\mathbf{y} & -\mathbf{y} & \cdots & -\mathbf{y} \end{pmatrix}, \qquad \mathbf{y} = (y_1, y_2, \ldots, y_c)',$$

where $y_1, y_2, \ldots, y_c$ are the current estimates of the column scores. These columns are appended on the right of the fixed part of the design matrix.

f. MLE's of the row scores are then found using iterative weighted least squares (Agresti, 1984, p. 238, Equation B.5). Iteration continues until the absolute value of the relative difference is less than 0.001 for every component of the vector of estimated parameters, that is,

$$\max_k \left| \frac{b_k^{(t)} - b_k^{(t-1)}}{b_k^{(t-1)}} \right| < 0.001,$$

where $b_k^{(t)}$ denotes the k-th estimated parameter at iteration t. An absolute maximum of 5 iterations is also incorporated.

g. Create $c - 1$ columns of the design matrix for estimating the column scores. These columns are of the form:

$$\begin{pmatrix} x_1 D \\ x_2 D \\ \vdots \\ x_r D \end{pmatrix},$$

where D is the block defined in step b.(iii) and $x_1, \ldots, x_r$ are the current estimates of the row scores. These columns are appended on the right of the fixed part of the design matrix.

h. MLE's of the column scores are then found using the same procedure as was described in step f.

i. Convergence is checked. When the criterion of step f. is satisfied, the iterative step is finished.

j. The row and column scores are standardized to have mean values of zero and variances of one (with respect to the marginal distributions).

k. The MLE of $\beta$ is then calculated. It can be shown that $\hat\beta$ is equal to the product of the estimated standard deviations of the row and column scores. These values were obtained in step j.

l. The goodness of fit to the model is calculated ($G^2$, df, p-value) and printed. The row scores, the column scores, and the association parameter are printed. The correlation parameter $\rho$ is calculated from the relationship $\beta = \rho/(1-\rho^2)$ and printed.
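Steps a. to l. can be sketched in plain Python as follows. This is an illustrative re-implementation under stated assumptions, not the RC_MODEL macro itself: all function names (`solve`, `effect`, `irls`, `standardize`, `fit_rc`) are hypothetical, the inner weighted least squares loop always runs a fixed 5 steps, and the outer loop stops when the change in $G^2$ is negligible, which differs from the relative-difference criterion of step f. The counts are those of Table 1 in Section 4.

```python
import math

# Counts from Table 1 (rows: mental health status; columns: parental SES A-F).
TABLE = [
    [64, 57, 57, 72, 36, 21],
    [94, 94, 105, 141, 97, 71],
    [58, 54, 65, 77, 54, 54],
    [46, 40, 60, 94, 78, 71],
]


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    k = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, k):
            f = aug[i][col] / aug[col][col]
            for j in range(col, k + 1):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * k
    for i in reversed(range(k)):
        rest = sum(aug[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (aug[i][k] - rest) / aug[i][i]
    return x


def effect(index, size):
    """Sum-to-zero coding: the last category is minus the sum of the others."""
    if index == size - 1:
        return [-1.0] * (size - 1)
    return [1.0 if k == index else 0.0 for k in range(size - 1)]


def irls(x, n, m, steps=5):
    """Weighted least squares iterations for a Poisson loglinear model (step f)."""
    p, cells = len(x[0]), range(len(n))
    for _ in range(steps):
        z = [math.log(m[t]) + (n[t] - m[t]) / m[t] for t in cells]
        xwx = [[sum(m[t] * x[t][u] * x[t][v] for t in cells) for v in range(p)]
               for u in range(p)]
        xwz = [sum(m[t] * x[t][u] * z[t] for t in cells) for u in range(p)]
        coef = solve(xwx, xwz)
        m = [math.exp(sum(xv * bv for xv, bv in zip(x[t], coef))) for t in cells]
    return coef, m


def standardize(scores, probs):
    """Centre and scale scores using marginal probabilities (step j)."""
    mean = sum(s * p for s, p in zip(scores, probs))
    sd = math.sqrt(sum(p * (s - mean) ** 2 for s, p in zip(scores, probs)))
    return [(s - mean) / sd for s in scores], sd


def fit_rc(table, max_outer=100, tol=1e-8):
    """Alternate row-effects and column-effects fits until G^2 stabilises."""
    r, c = len(table), len(table[0])
    cells = [(i, j) for i in range(r) for j in range(c)]
    n = [float(table[i][j]) for i, j in cells]
    total = sum(n)
    fixed = [[1.0] + effect(i, r) + effect(j, c) for i, j in cells]  # step b
    cols = [float(j + 1) for j in range(c)]  # step c: integer starting scores
    m, g2_old = n[:], float("inf")
    for _ in range(max_outer):
        # Steps e-f: estimate row scores with the column scores held fixed.
        x = [fixed[t] + [cols[j] * e for e in effect(i, r)]
             for t, (i, j) in enumerate(cells)]
        coef, m = irls(x, n, m)
        rows = coef[r + c - 1:] + [-sum(coef[r + c - 1:])]
        # Steps g-h: estimate column scores with the row scores held fixed.
        x = [fixed[t] + [rows[i] * e for e in effect(j, c)]
             for t, (i, j) in enumerate(cells)]
        coef, m = irls(x, n, m)
        cols = coef[r + c - 1:] + [-sum(coef[r + c - 1:])]
        g2 = 2.0 * sum(nt * math.log(nt / mt) for nt, mt in zip(n, m))
        if abs(g2_old - g2) < tol:  # assumed stopping rule (not the macro's)
            break
        g2_old = g2
    row_p = [sum(table[i]) / total for i in range(r)]
    col_p = [sum(table[i][j] for i in range(r)) / total for j in range(c)]
    rows, row_sd = standardize(rows, row_p)
    cols, col_sd = standardize(cols, col_p)
    beta = row_sd * col_sd  # step k
    nu = 1.0 / (2.0 * beta)
    rho = -nu + math.sqrt(1.0 + nu * nu)  # root of beta = rho / (1 - rho^2)
    return {"rows": rows, "cols": cols, "beta": beta, "rho": rho,
            "g2": g2, "df": (r - 2) * (c - 2)}


fit = fit_rc(TABLE)
```

The sketch assumes all cell counts are positive, since the starting values set M equal to N and take logarithms. Because the standard deviations are positive, `beta` is positive by construction and the direction of the association is carried by the signs of the standardized scores.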
4. Example

Srole, et al. (1962) conducted a study which attempted to examine relationships between mental illness and socioeconomic status. Subjects were obtained from a probability sample of the resident midtown Manhattan population. Of 1911 persons contacted, 1660 permitted themselves to be interviewed. Table 1 displays the cross-classification of the respondents by mental health status and parental socioeconomic status (SES). These data have been previously analyzed by Haberman (1974, 1979) and Goodman (1979, 1985), among others.

Table 1
Subjects Cross-Classified by Mental Health Status and Parental Socioeconomic Status

                                  Parental SES
Mental Health Category      A     B     C     D     E     F    Total
Well                       64    57    57    72    36    21      307
Mild symptoms              94    94   105   141    97    71      602
Mod. symptoms              58    54    65    77    54    54      362
Impaired                   46    40    60    94    78    71      389
Total                     262   245   287   384   265   217     1660

The null hypothesis of independence between SES and mental health status is not supported, since the likelihood ratio chi-square ($G^2$) is 47.42, with 15 df. The necessary SAS statements for fitting the RC model are listed below:

DATA SES;
INPUT SES_A SES_B SES_C SES_D SES_E SES_F;
CARDS;
64 57 57 72 36 21
94 94 105 141 97 71
58 54 65 77 54 54
46 40 60 94 78 71
MACRO DATA_SET SES % RC_MODEL

Since $G^2 = 3.57$ with 8 df, the RC model fits the observed data very well. The estimated row scores are -1.68, -.14, .14, and 1.41, while the estimated column scores are -1.11, -1.12, -.37, 0.03, 1.01, and 1.82. The row scores indicate that the distance between the mental health status categories labelled "mild" and "moderate" is much less than the distances between other adjacent categories. Similarly, SES categories A and B have almost the same score and the distance between categories C and D is much less than the distances between B and C, D and E, and E and F.

The estimate of the association parameter is $\hat\beta = 0.166$ and the corresponding correlation parameter is $\hat\rho = 0.162$. In comparison, the correlation parameter from canonical correlation analysis is 0.163 (Goodman, 1985, p. 42). Thus, the two methods give very similar results.

Haberman (1981) presented the asymptotic theory for testing $H_0\colon \beta = 0$ in the RC model. Let $G^2(I)$ denote the likelihood ratio goodness of fit statistic from the independence model and $G^2(RC)$ denote the corresponding statistic from the RC model. Under the null hypothesis of independence, Haberman showed that $G^2(I) - G^2(RC)$ has the same asymptotic distribution as that of the maximum eigenvalue of the $(r-1) \times (r-1)$ central Wishart matrix with $(c-1)$ df. The upper 1 percent and 5 percent critical values for this statistic are given in Table 51 of Pearson and Hartley (1972).

In this case, $G^2(I) - G^2(RC) = 47.42 - 3.57 = 43.85$. The corresponding upper 5 and 1 percent points of the distribution of the maximum eigenvalue of the $3 \times 3$ central Wishart matrix with df=5 are 17.21 and 21.65. Thus, there is strong evidence for association between SES and mental health status.

5. Acknowledgements

This research was supported in part by Grant CA39065 from the National Cancer Institute.

SAS is the registered trademark of SAS Institute Inc., Cary, NC, USA.

6. References

Agresti, A. (1984). Analysis of Ordinal Categorical Data. New York: John Wiley and Sons.

Bishop, Y.M.M., Fienberg, S.E., and Holland, P.W. (1975). Discrete Multivariate Analysis. Cambridge, MA: MIT Press.

Fisher, R.A. (1940). The precision of discriminant functions. Ann. Eugenics, London 10, 422-429.

Goodman, L.A. (1979). Simple models for the analysis of association in cross-classifications having ordered categories. J. Amer. Statist. Ass. 74, 537-552.

Goodman, L.A. (1981a). Association models and canonical correlation in the analysis of cross-classifications having ordered categories. J. Amer. Statist. Ass. 76, 320-334.

Goodman, L.A. (1981b). Association models and the bivariate normal distribution in the analysis of cross-classifications having ordered categories. Biometrika 68, 347-355.

Goodman, L.A. (1985). The analysis of cross-classified data having ordered and/or unordered categories: association models, correlation models, and asymmetry models for contingency tables with or without missing entries. Ann. Statist. 13, 10-69.

Haberman, S.J. (1974). Log-linear models for frequency tables with ordered classifications. Biometrics 30, 589-600.

Haberman, S.J. (1979). Analysis of Qualitative Data: Volume 2. New Developments. New York: Academic Press.

Haberman, S.J. (1981). Tests for independence in two-way contingency tables based on canonical correlation and on linear-by-linear interaction. Ann. Statist. 9, 1178-1186.

Kendall, M.G. and Stuart, A. (1979). The Advanced Theory of Statistics: Volume 2. Inference and Relationship (4th Edition). New York: MacMillan.

Pearson, E.S. and Hartley, H.O. (1972). Biometrika Tables for Statisticians: Volume II. Cambridge: The University Press.

Srole, L., Langner, T.S., Michael, S.T., Opler, M.K., and Rennie, T.A.C. (1962). Mental Health in the Metropolis: The Midtown Manhattan Study. New York: McGraw-Hill.

Williams, E.J. (1952). Use of scores for the analysis of association in contingency tables. Biometrika 39, 274-280.
Appendix: Listing of Macro RC_MODEL

* THIS PROGRAM FITS THE RC ASSOCIATION MODEL TO A TWO-WAY TABLE.
  THE INPUT RXC CONTINGENCY TABLE SHOULD BE IN A SAS DATA SET NAMED 'DATA_SET'.
  (ALTERNATIVELY, THE DATA FILE NAME CAN BE SPECIFIED IN A MACRO 'DATA_SET'.)
  THIS DATA FILE SHOULD HAVE R OBSERVATIONS AND C VARIABLES.
  THE (I,J) COUNT OF THE CONTINGENCY TABLE IS OBS. I FOR VARIABLE J;
PROC MATRIX;
FETCH TABLE DATA=DATA_SET;
* DETERMINE THE DIMENSIONS OF THE CONTINGENCY TABLE;
R=NROW(TABLE); C=NCOL(TABLE);
* CALCULATE MARGINAL PROBABILITIES FOR ROWS AND COLUMNS;
ROWPROB=TABLE(,+)#/SUM(TABLE); COLPROB=TABLE(+,)#/SUM(TABLE);
* RESHAPE THE TWO-WAY TABLE INTO A VECTOR OF LENGTH RC=R*C;
N=SHAPE(TABLE,1); RC=R#C;
* GENERATE THE FIRST 1+(R-1)+(C-1) COLUMNS OF THE DESIGN MATRIX X;
* FIRST, THE R-1 ROW EFFECTS;
ROWDESIG=J(RC,R-1,0);
ROWDESIG((C#(R-1)+1):RC,)=J(C,R-1,-1);
ROWSTART=1;
DO COL=1 TO (R-1);
   ROWSTOP=ROWSTART+C-1;
   ROWDESIG(ROWSTART:ROWSTOP,COL)=J(C,1,1);
   ROWSTART=ROWSTART+C;
END;
* NEXT, THE C-1 COLUMN EFFECTS;
COLDESIG=J(RC,C-1,0);
COLBLOCK=I(C-1) // J(1,C-1,-1);
DO ROWSTART=1 TO (C#(R-1)+1) BY C;
   ROWSTOP=ROWSTART+C-1;
   COLDESIG(ROWSTART:ROWSTOP,)=COLBLOCK;
END;
* THE FIXED PART OF THE DESIGN MATRIX CONSISTS OF A COLUMN
  OF ONES, THE R-1 ROW EFFECTS, AND THE C-1 COLUMN EFFECTS;
FIXEDX=J(RC,1,1) || ROWDESIG || COLDESIG;
* INITIALIZE THE VECTOR OF PREDICTED VALUES TO THE VECTOR OF OBSERVED VALUES;
M=N;
* INITIALIZE THE PARAMETER VECTOR, THE COLUMN SCORES, AND THE ROW SCORES;
OLDB1=J(2#R+C-2,1); OLDB2=J(2#C+R-2,1);
COLSCOR=J(C,1);
DO ROW=1 TO C;
   COLSCOR(ROW,1)=ROW;
END;
ROWSCORE=J(R,1);
* CHECKB, CHECKR, AND CHECKC CONTAIN VALUES OF THE FIXED PARAMETERS,
  ROW SCORES, AND COLUMN SCORES, RESPECTIVELY. THESE VALUES WILL BE
  USED IN CHECKING FOR CONVERGENCE;
CHECKB=J(R+C-1,1); CHECKR=ROWSCORE; CHECKC=COLSCOR;
* CREATE TWO ADDITIONAL SETS OF COLUMNS OF THE DESIGN MATRIX, ONE SET FOR USE
  IN ESTIMATING ROW SCORES AND ONE SET FOR USE IN ESTIMATING COLUMN SCORES;
DRSCORE=J(RC,R-1); DCSCORE=J(RC,C-1);
* THE ITERATIVE PART OF THE PROGRAM CONSISTS OF TWO STEPS:
  1. THE COLUMN SCORES ARE TREATED AS FIXED AND ROW SCORES ARE ESTIMATED.
  2. THE RESULTING ROW SCORES ARE CONSIDERED TO BE FIXED AND THE COLUMN SCORES
     ARE REESTIMATED.
  THIS PROCESS CONTINUES UNTIL THE ESTIMATES CONVERGE OR UNTIL THE MAXIMUM
  NUMBER OF ITERATIONS IS REACHED;
DO ITERATE=1 TO 10;
   * CREATE R-1 COLUMNS OF THE DESIGN MATRIX FOR ESTIMATING ROW SCORES;
   DO ROWSTART=1 TO (C#(R-1)+1) BY C;
      ROWSTOP=ROWSTART+C-1;
      DO COL=1 TO (R-1);
         DRSCORE(ROWSTART:ROWSTOP,COL)=COLSCOR#ROWDESIG(ROWSTART:ROWSTOP,COL);
      END; END;
   X=FIXEDX || DRSCORE;
   * ITERATE TO FIND ESTIMATES OF THE ROW SCORES.
     THE PROCEDURE IS DESCRIBED IN AGRESTI (1984, P. 238, EQUATION B.5);
   DO I=1 TO 5;
      SINV=INV(DIAG(1#/M));
      Y=LOG(M)+((N-M)#/M);
      B=INV(X'*SINV*X)*X'*SINV*Y; M=EXP(X*B);
      IF ABS((B-OLDB1)#/OLDB1)<0.001 THEN GOTO L1;
      OLDB1=B;
   END;
   NOTE 'CONVERGENCE WAS NOT ACHIEVED IN THE ROW SCORES STEP';
   L1:ROWSCORE=B((R+C):(2#R+C-2),1) // (-SUM(B((R+C):(2#R+C-2),)));
   * CREATE C-1 COLUMNS OF THE DESIGN MATRIX FOR ESTIMATING COL. SCORES;
   DO I=1 TO R;
      ROWSTART=C#(I-1)+1; ROWSTOP=C#I; ROWMAT=J(C,C-1,ROWSCORE(I,1));
      DCSCORE(ROWSTART:ROWSTOP,)=ROWMAT#COLDESIG(ROWSTART:ROWSTOP,);
   END;
   X=FIXEDX || DCSCORE;
   * ITERATE TO FIND ESTIMATES OF THE COLUMN SCORES;
   DO I=1 TO 5;
      SINV=INV(DIAG(1#/M));
      Y=LOG(M)+((N-M)#/M);
      B=INV(X'*SINV*X)*X'*SINV*Y; M=EXP(X*B);
      IF ABS((B-OLDB2)#/OLDB2)<0.001 THEN GOTO L2;
      OLDB2=B;
   END;
   NOTE 'CONVERGENCE WAS NOT ACHIEVED IN THE COLUMN SCORES STEP';
   L2:COLSCOR=B((R+C):(2#C+R-2),1) // (-SUM(B((R+C):(2#C+R-2),)));
   * CHECK FOR CONVERGENCE;
   TEMPB=B(1:(R+C-1),1); FLAG=0;
   IF ABS((TEMPB-CHECKB)#/CHECKB)>0.001 THEN FLAG=1;
   IF ABS((ROWSCORE-CHECKR)#/CHECKR)>0.001 THEN FLAG=1;
   IF ABS((COLSCOR-CHECKC)#/CHECKC)>0.001 THEN FLAG=1;
   CHECKB=TEMPB; CHECKR=ROWSCORE; CHECKC=COLSCOR;
   IF FLAG=0 THEN GOTO L3;
END;
NOTE 'OVERALL CONVERGENCE WAS NOT ACHIEVED';
L3: * CONVERGENCE HAS BEEN OBTAINED;
* STANDARDIZE THE ROW AND COLUMN SCORES TO HAVE MEANS OF ZERO AND
  VARIANCES OF ONE WITH RESPECT TO THE MARGINAL DISTRIBUTIONS;
RBAR=SUM(ROWSCORE#ROWPROB);
RSCORE=ROWSCORE-RBAR;
RSD=SUM(RSCORE#RSCORE#ROWPROB)##0.5;
ROWSCORE=RSCORE#/RSD;
CBAR=SUM(COLSCOR#COLPROB');
CSCORE=COLSCOR-CBAR;
CSD=SUM(CSCORE#CSCORE#COLPROB')##0.5; COLSCOR=CSCORE#/CSD;
* CALCULATE THE GOODNESS-OF-FIT OF THE MODEL;
GSQUARED=2#N'*(LOG(N)-LOG(M)); DF=(R-2)#(C-2); P=1-PROBCHI(GSQUARED,DF);
LR=GSQUARED || DF || P;
NOTE 'GOODNESS-OF-FIT (G-SQUARED, DF, P-VALUE)'; PRINT LR;
COLSCORE=COLSCOR';
NOTE 'ROW SCORES AND COLUMN SCORES';
NOTE 'SCALED TO HAVE MEANS OF ZERO AND VARIANCES OF ONE';
PRINT ROWSCORE FORMAT=8.3; PRINT COLSCORE FORMAT=8.3;
* ESTIMATE THE ASSOCIATION PARAMETER BETA AND THE CORRELATION PARAMETER RHO;
BETA=RSD#CSD;
NOTE 'ASSOCIATION PARAMETER'; PRINT BETA FORMAT=8.3;
NU=1#/(2#BETA); TERM=(1+NU#NU)##0.5;
IF BETA(1,1)>0 THEN RHO=(-NU)+TERM; IF BETA(1,1)<0 THEN RHO=(-NU)-TERM;
NOTE 'CORRELATION PARAMETER'; PRINT RHO FORMAT=8.3;
STOP;
%
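
The last statements of the listing compute RHO from BETA through the quantities NU and TERM. The short derivation below is not in the original text; it is a sketch, under the relationship $\beta = \rho/(1-\rho^2)$ of Section 2, of why those expressions give the root lying between -1 and 1.

```latex
\begin{align*}
\beta = \frac{\rho}{1-\rho^{2}}
  \;&\Longrightarrow\; \beta\rho^{2} + \rho - \beta = 0
  \;\Longrightarrow\; \rho = \frac{-1 \pm \sqrt{1+4\beta^{2}}}{2\beta}. \\
\text{With } \nu = \frac{1}{2\beta}: \quad
  \frac{\sqrt{1+4\beta^{2}}}{2\lvert\beta\rvert} &= \sqrt{1+\nu^{2}},
  \quad\text{so the two roots are}\quad
  \rho = -\nu \pm \sqrt{1+\nu^{2}}. \\
\beta > 0 &:\quad \rho = -\nu + \sqrt{1+\nu^{2}} \in (0,1), \\
\beta < 0 &:\quad \rho = -\nu - \sqrt{1+\nu^{2}} \in (-1,0).
\end{align*}
```

In each case the other root has absolute value greater than one and cannot be a correlation. For the example of Section 4, $\hat\beta = 0.166$ gives $\nu \approx 3.012$ and $\hat\rho \approx 0.162$.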