Lipsitz, S., Williamson, J., Klar, N., Ibrahim, J. and Parzen, M.

J. R. Statist. Soc. A (2001)
164, Part 3, pp. 449–465
A simple method for estimating a regression model
for κ between a pair of raters
Stuart R. Lipsitz,
Medical University of South Carolina, Charleston, USA
John Williamson,
Centers for Disease Control, Atlanta, USA
Neil Klar,
Cancer Care Ontario, Toronto, Canada
Joseph Ibrahim
Harvard School of Public Health and Dana-Farber Cancer Institute, Boston, USA
and Michael Parzen
University of Chicago, USA
[Received November 1999. Revised December 2000]
Summary. Agreement studies commonly occur in medical research, for example, in the review of X-rays by radiologists, blood tests by a panel of pathologists and the evaluation of psychopathology by a panel of raters. In these studies, often two observers rate the same subject for some characteristic with a discrete number of levels. The κ-coefficient is a popular measure of agreement between the two raters. The κ-coefficient may depend on covariates, i.e. characteristics of the raters and/or the subjects being rated. Our research was motivated by two agreement problems. The first is a study of agreement between a pastor and a co-ordinator of Christian education on whether they feel that the congregation puts enough emphasis on encouraging members to work for social justice (yes versus no). We wish to model the κ-coefficient as a function of covariates such as political orientation (liberal versus conservative) of the pastor and co-ordinator. The second example is a spousal education study, in which we wish to model the κ-coefficient as a function of covariates such as the highest degree of the father of the wife and the father of the husband. We propose a simple method to estimate the regression model for the κ-coefficient, which consists of two logistic (or multinomial logistic) regressions and one linear regression for binary data. The estimates can be easily obtained in any generalized linear model software program.
Keywords: Generalized estimating equations; Generalized linear model; Measure of agreement
1. Introduction
Agreement studies are common in medical research. For example, researchers are interested in agreement between radiologists in the review of X-rays, pathologists in reviewing blood tests and raters in evaluating patients' mental condition. When two observers rate subjects on a categorical scale, the κ-coefficient proposed by Cohen (1960) is a popular chance-corrected measure of interobserver agreement. Although not always the case, it is often true in such studies that each subject is rated by only two raters. The κ-coefficient may depend on covariates, i.e. characteristics of the raters and/or the subjects being rated. The characteristics of the raters are called 'rater-specific' covariates; the characteristics of the subject are called 'subject-specific' covariates. In this paper, we propose a simple method to estimate the regression model of the κ-coefficient as a function of covariates, when each subject is rated by two raters. For ratings on a dichotomous scale, estimates can be obtained by computing two ordinary logistic regressions and one linear regression for binary data. If the rating is on a categorical scale with more than two levels, all that we need to do is to compute two ordinal or polytomous logistic regressions and one linear regression for binary data. These estimates can be easily obtained in any generalized linear model software program. We use the following two examples to illustrate the methods. We note that both data sets are selected subsamples downloaded from the World Wide Web. The interpretation and results given here do not necessarily reflect the results and views of the data collection and funding agencies for either data set.

Address for correspondence: Stuart R. Lipsitz, Department of Biometry and Epidemiology, Medical University of South Carolina, Suite 1148, 135 Rutledge Avenue, PO Box 250551, Charleston, SC 29425, USA.
E-mail: lipsitzs@musc.edu
© 2001 Royal Statistical Society
0964-1998/01/164449
1.1. Example 1: pastor and co-ordinator educator study
Our first example is from the 'Effective Christian education study' (Benson and Eklin, 1991), which was a study of US Protestant congregations. In this study, one pastor and one co-ordinator of Christian education from each of 395 randomly chosen congregations provided detailed data on faith, loyalty, congregational life and the dynamics of Christian education programming. In our example, both the pastor and the co-ordinator were asked whether they felt that the congregation puts enough emphasis on encouraging members to work for social justice (yes versus no); we are interested in agreement between a pastor and a co-ordinator on this variable. The 'subject' in this example is the congregation, with one rating from the pastor and the other rating from the co-ordinator.
The goal of this analysis was to determine whether agreement between pastors and co-ordinators depends on two subject level covariates and three rater level covariates. The two 'subject level' covariates are characteristics of the congregations: the Protestant denomination (Disciples of Christ, Lutheran, Presbyterian, Baptist, Church of Christ or Methodist) and whether the church offers community service projects for youths (yes versus no). The following three covariates can differ between the pastor and co-ordinator and thus are 'rater level' covariates: the number of hours (2 or under versus more than 2) spent in the last 30 days helping people who are poor, political orientation (liberal versus conservative) and the amount of money given to charities over the past year ($100 or more versus less than $100). Our hypothesis is that the κ-coefficient may be higher in certain denominations, may be higher in churches that offer community service projects for youths, and may be higher when the pastor and the co-ordinator spend the same number of hours helping the poor, have the same political orientation or give the same amount to charity.
The data of 15 of the 395 congregations in the data set are displayed in Table 1.
Table 1. Data from 15 selected congregations from the pastor and co-ordinator educator study†
Case  Church emphasizes     Hours helping poor    Donations             Liberal               Church offers        Religion
      social justice                                                                          community service
      Pastor  Coordinator   Pastor  Coordinator   Pastor  Coordinator   Pastor  Coordinator   projects
1     Y       N             >2      >2            <$100   <$100         N       N             N                    3
2     N       Y             >2      ≤2            ≥$100   ≥$100         N       N             N                    3
3     Y       Y             ≤2      >2            ≥$100   ≥$100         Y       Y             Y                    6
4     Y       Y             >2      >2            ≥$100   ≥$100         N       N             N                    4
5     Y       N             ≤2      >2            ≥$100   <$100         N       N             Y                    1
6     Y       Y             >2      ≤2            <$100   ≥$100         Y       N             Y                    3
7     Y       N             >2      >2            <$100   ≥$100         N       N             N                    2
8     N       Y             ≤2      >2            ≥$100   ≥$100         N       N             Y                    6
9     Y       Y             >2      >2            <$100   ≥$100         N       Y             N                    3
10    N       Y             ≤2      >2            ≥$100   ≥$100         Y       N             N                    3
11    Y       N             >2      >2            ≥$100   <$100         Y       Y             N                    1
12    Y       N             >2      ≤2            ≥$100   ≥$100         N       N             N                    2
13    N       Y             ≤2      >2            ≥$100   ≥$100         Y       N             N                    5
14    Y       N             >2      ≤2            <$100   <$100         N       Y             Y                    5
15    Y       Y             >2      >2            ≥$100   ≥$100         Y       Y             Y                    5
†Y, yes; N, no.

1.2. Example 2: spousal education study
In our second example, we are interested in the similarity of the highest educational degree (no high school degree, high school degree or college degree) of a husband and wife. The subject in this example is the husband–wife pair; the wife is considered one rater, and the husband the other rater. We use data from the 1991 General Social Survey (Smith, 1996), which is a personal interview survey of US households conducted by the National Opinion Research Center. The goal of this analysis was to determine whether agreement of the highest degree between the husband and wife can be predicted by two subject level (family level) covariates and three rater level covariates. The subject level covariates are the current family income (less than $25000, $25000–50000 or more than $50000) and the number of children in the family (0, 1, 2, 3 or 4). The rater level covariates are the ages at marriage for the wife and husband, the highest degree (no high school degree, high school degree or college degree) of the father of the wife and the father of the husband, and an indicator for living at home with both parents at age 16 years (yes versus no). In this example, we expect that families with lower incomes and more children may be more likely to have both husband and wife with lower education (and thus higher agreement). We also expect agreement to be higher for husbands and wives whose fathers have the same education level, husbands and wives who have similar ages at marriage and husbands and wives who both lived or did not live at home at age 16 years. The data of 15 of the 332 husband–wife pairs are displayed in Table 2.
Maximum likelihood (Shoukri and Mian, 1996) has recently been proposed for estimating the κ-coefficient for dichotomous ratings as a function of covariates. The method requires a special macro to compute the estimates. In this paper, we show that, for dichotomous ratings, the regression model for κ can be estimated by using two ordinary logistic regressions and one linear regression for binary data, and we show that the estimates can be obtained in any generalized linear models program, such as SAS procedure GENMOD (SAS Institute, 1997), STATA glm (Stata Corporation, 1995), S function glm (Hastie and Pregibon, 1993) or GLIM (Francis et al., 1993).
For dichotomous ratings, the two ordinary logistic regressions and the one linear regression for binary data completely specify the joint multinomial distribution of the two ratings on the same subject, so the main advantage of our approach versus maximum likelihood for dichotomous ratings is that writing the computer program needed to obtain the estimate is simpler. However, when estimating the κ-coefficient for polytomous ratings (more than two levels), maximum likelihood is much more complicated to extend since the full joint distribution of the two ratings must be specified, and this contains many unwanted nuisance parameters. Using our approach, the regression model for κ can be estimated by using two multinomial or ordinal logistic regressions for the polytomous ratings and one linear regression for binary data, and does not require the specification of the nuisance parameters that is required by maximum likelihood. Thus, for polytomous ratings, our method can be considered semiparametric.

Table 2. Data from 15 selected husband–wife pairs from the spousal education study†
Case  Highest degree      Father's highest    Age at marriage   Living with parents   Family          Number of
                          degree              (years)           at 16 years           income ($)      children
      Wife      Husband   Wife      Husband   Wife    Husband   Wife    Husband
1     HS        HS        HS        Below HS  24      20        Y       Y             25000–50000     0
2     HS        HS        HS        HS        21      21        Y       N             25000–50000     0
3     HS        HS        COL       HS        23      25        Y       Y             >50000          3
4     HS        HS        COL       HS        23      23        N       N             >50000          1
5     Below HS  Below HS  HS        HS        16      27        N       Y             <25000          0
6     Below HS  Below HS  Below HS  Below HS  18      21        Y       Y             >50000          0
7     COL       COL       HS        COL       28      27        Y       Y             25000–50000     1
8     HS        HS        Below HS  Below HS  23      25        Y       Y             25000–50000     0
9     HS        HS        Below HS  Below HS  21      23        N       N             25000–50000     0
10    Below HS  HS        Below HS  HS        28      28        Y       Y             >50000          0
11    COL       COL       COL       HS        24      27        Y       Y             >50000          0
12    COL       COL       COL       Below HS  27      29        Y       Y             >50000          3
13    COL       COL       HS        HS        21      23        Y       Y             >50000          3
14    COL       COL       COL       HS        26      27        Y       Y             >50000          1
15    COL       HS        HS        HS        22      20        Y       Y             25000–50000     2
†HS, high school; COL, college; Y, yes; N, no.
In Sections 2 and 3 we introduce some notation and briefly describe how the estimates are obtained, first for dichotomous ratings and then for general categorical ratings. In Section 4, we compare our proposed estimates with the maximum likelihood estimates (MLEs) by using the pastor and co-ordinator educator example. As stated above, a maximum likelihood method has yet to be developed for polytomous ratings, so we do not compare our estimate with maximum likelihood in the spousal education study.
2. Notation and model: dichotomous rating
We have $i = 1, \ldots, N$ independent subjects. Two raters assess each subject on a discrete characteristic. First, we assume that the rating is dichotomous. Thus, we let $Y_{ir}$ equal 1 if subject $i$ is judged positive by rater $r$ and equal 0 if judged negative, for $r = 1, 2$. Each individual also has a subject-specific covariate vector $x_i$, such as family income in the spousal education study. Also, there are two rater-specific covariates $x_{ir}$, $r = 1, 2$. A rater-specific covariate could be, for example, the age of the rater. We let $z_i' = (x_i', x_{i1}', x_{i2}')$. We can further define an indicator random variable $Y_i$, which equals 1 if both raters agree on their ratings for subject $i$ and equals 0 otherwise. In terms of $Y_{i1}$ and $Y_{i2}$,
$$Y_i = Y_{i1} Y_{i2} + (1 - Y_{i1})(1 - Y_{i2}).$$
To measure agreement between $Y_{i1}$ and $Y_{i2}$, Cohen (1960) proposed the κ-coefficient
$$\kappa_i = \frac{p_{\text{agree},i} - p_{\text{chance},i}}{1 - p_{\text{chance},i}}, \tag{1}$$
where $p_{\text{agree},i}$ is the probability that the two raters agree on subject $i$,
$$p_{\text{agree},i} = \Pr(Y_i = 1 \mid z_i) = \Pr(Y_{i1} = 1, Y_{i2} = 1 \mid z_i) + \Pr(Y_{i1} = 0, Y_{i2} = 0 \mid z_i),$$
and
$$p_{\text{chance},i} = p_{i1} p_{i2} + (1 - p_{i1})(1 - p_{i2})$$
is the probability that the two raters agree if they independently choose a category, where $p_{ir} = \Pr(Y_{ir} = 1 \mid x_i, x_{ir})$.
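As a concrete check on definition (1), the sketch below computes a single, covariate-free κ from a table of counts, i.e. the simple Cohen estimate of the kind reported for subgroups in Table 3. It is a minimal NumPy illustration; the function name `cohen_kappa` and the example counts are hypothetical. Because p_chance is formed from the row and column margins, the same function also covers the J-category definition of Section 3.

```python
import numpy as np


def cohen_kappa(table):
    """Cohen's (1960) kappa from a J x J table of counts.

    table[j][k] = number of subjects put in category j by rater 1 and
    category k by rater 2 (no covariates: one kappa for all subjects).
    """
    counts = np.asarray(table, dtype=float)
    n = counts.sum()
    p_agree = np.trace(counts) / n          # observed proportion of agreement
    p1 = counts.sum(axis=1) / n             # rater 1 marginal probabilities
    p2 = counts.sum(axis=0) / n             # rater 2 marginal probabilities
    p_chance = float(np.dot(p1, p2))        # agreement expected if raters are independent
    return (p_agree - p_chance) / (1.0 - p_chance)


# Hypothetical 2 x 2 table: 20 (yes, yes), 5 (yes, no), 10 (no, yes), 15 (no, no).
kappa = cohen_kappa([[20, 5], [10, 15]])
print(round(kappa, 3))  # 0.4
```

Here p_agree = 0.7 and p_chance = 0.5 × 0.6 + 0.5 × 0.4 = 0.5, so κ = 0.2/0.5 = 0.4.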
Also, for a given value of $p_{\text{chance},i}$, one can show that $\kappa_i$ is restricted to the interval
$$-\frac{p_{\text{chance},i}}{1 - p_{\text{chance},i}} \le \kappa_i \le 1; \tag{2}$$
further algebra can be used to show that $\kappa_i$ is restricted to the interval (−1, 1). For a given link function $g(\cdot)$, an adjustment for covariates associated with κ can be accomplished by using the model
$$g(\kappa_i) = z_i'\beta.$$
Since $\kappa_i$ is similar to a correlation coefficient, we could let $g(\kappa_i)$ be proportional to Fisher's z-transformation, i.e.
$$g(\kappa_i) = \log\left(\frac{1 + \kappa_i}{1 - \kappa_i}\right).$$
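Inverting this link shows why it needs no constraints, and the same back-transformation is what maps a confidence interval on the transformed scale back to the κ scale (as is done in Section 4.1). Writing $\eta_i = g(\kappa_i)$:

$$\eta_i = \log\frac{1 + \kappa_i}{1 - \kappa_i} \iff e^{\eta_i}(1 - \kappa_i) = 1 + \kappa_i \iff \kappa_i = \frac{e^{\eta_i} - 1}{e^{\eta_i} + 1} = \tanh(\eta_i/2),$$

so every real value of $\eta_i = z_i'\beta$ corresponds to some $\kappa_i \in (-1, 1)$.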
This link function avoids the need for constraints on the parameter $\beta$ to ensure that $-1 < \kappa_i < 1$. However, the parameters $\beta$ have no simple interpretation. Thus, we propose to use the identity (linear) link
$$\kappa_i = z_i'\beta, \tag{3}$$
so that the parameters $\beta$ can be directly interpreted as the change in κ for a 1-unit change in a covariate. In equation (3), there are constraints on $\beta$ so that $-1 < \kappa_i < 1$. However, we do not use constraints in the estimating procedure that we propose later in this section; the estimation procedure is much simpler without constraints, and the resulting estimates are asymptotically normal and consistent. Also, in all the data sets (and simulated data sets) that we have looked at, we have never had a problem with non-convergence of the algorithm described below to a unique $\hat\beta$; further, for each $\hat\beta$, all estimated $\hat\kappa_i$ were such that $-1 < \hat\kappa_i < 1$. Although we have not experienced problems with $\hat\kappa_i$ being greater than 1, if the true $\kappa_i$ is close to 1 for most combinations of covariates, then we expect that, depending on the sample size, there could be problems with $\hat\kappa_i$ being greater than 1.
We now show that we can estimate the parameter vector $\beta$ through modelling
$$p_{\text{agree},i} = E(Y_i \mid z_i) = \Pr(Y_i = 1 \mid z_i).$$
In particular, using equations (1) and (3), the model for $p_{\text{agree},i}$ in terms of $\kappa_i$ can be expressed as
$$\begin{aligned}
p_{\text{agree},i} &= p_{\text{chance},i} + \kappa_i (1 - p_{\text{chance},i}) \\
&= p_{\text{chance},i} + z_i'\beta\,(1 - p_{\text{chance},i}) \\
&= p_{\text{chance},i} + z_i^{*\prime}\beta,
\end{aligned} \tag{4}$$
where $z_i^* = (1 - p_{\text{chance},i})\, z_i$.
If $p_{\text{chance},i}$ was known, then the model for $p_{\text{agree},i}$ is a 'linear' model with
(a) known 'offset' $p_{\text{chance},i}$,
(b) known covariate vector $z_i^* = (1 - p_{\text{chance},i})\, z_i$ and
(c) parameter vector $\beta$.
Thus, instead of a logistic model for the probability $p_{\text{agree},i}$, we have a linear model for $p_{\text{agree},i}$.
To estimate $\beta$ if $p_{\text{chance},i}$ is known, we can use maximum likelihood based on the Bernoulli distribution of $Y_i$. The likelihood in this case is just
$$\prod_{i=1}^{N} p_{\text{agree},i}^{\,y_i}\,(1 - p_{\text{agree},i})^{1 - y_i}.$$
The MLE from this likelihood can be obtained by using any generalized linear models program which allows a linear model for the binary outcome $Y_i$.
Unfortunately, $p_{\text{chance},i}$ is rarely known. One possibility is to use maximum likelihood based on the joint distribution of $(Y_{i1}, Y_{i2})$ to estimate $(p_{i1}, p_{i2})$ and $\beta$ jointly. This is the approach taken by Shoukri and Mian (1996). We take a different approach, which will be easier to extend than maximum likelihood when the ratings are polytomous. In our approach, we replace $p_{\text{chance},i}$ in equation (4) with an estimate $\hat p_{\text{chance},i}$ and then estimate $\beta$ by using a linear model. In particular, we can estimate $p_{i1}$ and $p_{i2}$, denoted $\hat p_{i1}$ and $\hat p_{i2}$, by using ordinary logistic regression with the model
$$\operatorname{logit}(p_{ir}) = x_{ir}'\alpha_{1r} + x_i'\alpha_{2r}, \tag{5}$$
$r = 1, 2$, and we estimate $p_{\text{chance},i}$ with
$$\hat p_{\text{chance},i} = \hat p_{i1}\hat p_{i2} + (1 - \hat p_{i1})(1 - \hat p_{i2}).$$
Then, the linear model for $p_{\text{agree},i}$ is
$$p_{\text{agree},i} \approx \hat p_{\text{chance},i} + (1 - \hat p_{\text{chance},i})\, z_i'\beta \tag{6}$$
(we put '≈' in expression (6) since $\hat p_{\text{chance},i}$ is an estimate). Treating the estimated value $\hat p_{\text{chance},i}$ in expression (6) as known, $\beta$ can still be estimated by using a linear model for binary data.
For example, suppose that we wanted to use the SAS generalized linear models program PROC GENMOD to estimate $\beta$; then we can do the following.
(a) Use logistic regression (PROC GENMOD or LOGISTIC) of $Y_{i1}$ versus $(x_{i1}, x_i)$ to obtain $\hat p_{i1}$.
(b) Use logistic regression (PROC GENMOD or LOGISTIC) of $Y_{i2}$ versus $(x_{i2}, x_i)$ to obtain $\hat p_{i2}$.
(c) For each subject, use the results of the first two models to form $\hat p_{\text{chance},i} = \hat p_{i1}\hat p_{i2} + (1 - \hat p_{i1})(1 - \hat p_{i2})$.
(d) Use linear regression (PROC GENMOD) of the binary outcome $Y_i$ versus $z_i^*$ with offset $\hat p_{\text{chance},i}$.
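Steps (a)–(d) are not tied to SAS. The Python/NumPy code below is a minimal sketch under our own assumptions, not the authors' SAS macros: both regressions are fitted by plain iteratively reweighted least squares, the design matrices `X1`, `X2` and `Z` (each including an intercept column) are supplied by the caller, all function names are hypothetical, and, as in the text, nothing constrains the fitted agreement probabilities to lie in (0, 1). Model-based standard errors from step (d) would treat $\hat p_{\text{chance},i}$ as known, so none are returned; a jackknife as in equation (7) would be used instead.

```python
import numpy as np


def logistic_fit(X, y, n_iter=50, tol=1e-10):
    """Ordinary logistic regression by Newton-Raphson; X must contain an intercept column."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ coef))
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    return coef


def linear_binary_fit(y, Zstar, offset, n_iter=50, tol=1e-10):
    """Bernoulli maximum likelihood with identity link, E(y) = offset + Zstar @ beta,
    by iteratively reweighted least squares. Assumes every fitted mean stays in (0, 1)."""
    beta = np.linalg.lstsq(Zstar, y - offset, rcond=None)[0]   # least-squares start
    for _ in range(n_iter):
        mu = offset + Zstar @ beta
        w = 1.0 / (mu * (1.0 - mu))                             # inverse Bernoulli variance
        new = np.linalg.solve(Zstar.T @ (w[:, None] * Zstar), Zstar.T @ (w * (y - offset)))
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta


def kappa_regression(y1, y2, X1, X2, Z):
    """Steps (a)-(d). X1, X2: marginal design matrices built from (x_ir, x_i);
    Z: design matrix z_i for the linear kappa model (3)."""
    p1 = 1.0 / (1.0 + np.exp(-X1 @ logistic_fit(X1, y1)))    # (a)
    p2 = 1.0 / (1.0 + np.exp(-X2 @ logistic_fit(X2, y2)))    # (b)
    p_chance = p1 * p2 + (1.0 - p1) * (1.0 - p2)              # (c)
    y_agree = y1 * y2 + (1.0 - y1) * (1.0 - y2)               # agreement indicator Y_i
    z_star = (1.0 - p_chance)[:, None] * Z                    # z*_i = (1 - p_chance,i) z_i
    return linear_binary_fit(y_agree, z_star, p_chance)       # (d), offset p_chance,i
```

With intercept-only models for $p_{i1}$, $p_{i2}$ and $\kappa_i$, the single returned coefficient equals Cohen's simple estimate $(\hat p_{\text{agree}} - \hat p_{\text{chance}})/(1 - \hat p_{\text{chance}})$, because each fitted marginal probability is then just the sample proportion.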
SAS PROC GENMOD prints out an error message that convergence is questionable if
$$\widehat{\Pr}(Y_i = 1 \mid z_i^*) = \hat p_{\text{agree},i} = \hat p_{\text{chance},i} + z_i'\hat\beta\,(1 - \hat p_{\text{chance},i}) = \hat p_{\text{chance},i} + \hat\kappa_i (1 - \hat p_{\text{chance},i})$$
is not between 0 and 1. However, if
$$0 < \hat p_{\text{chance},i} + \hat\kappa_i (1 - \hat p_{\text{chance},i}) < 1,$$
then, by rearranging terms, one can show that condition (2) holds, and PROC GENMOD prints out 'converged'. Although it is surely possible that $\hat\kappa_i$ will be outside the range [−1, 1], we have never found this to be true in any data which we have analysed.
We note here that since the model for $p_{ir}$ in equation (5) is not a function of $\beta$, for a given model for $p_{ir}$, the estimate $\hat\alpha$ will be the same, regardless of the model for $\kappa_i$. The estimate of $\kappa_i$, however, depends on the model for $p_{ir}$ and thus could be sensitive to the model for $p_{ir}$ and be biased if we underfit the model for $p_{ir}$. Thus, if we were most interested in consistently estimating $\kappa_i$, we should fit the largest possible model for $p_{ir}$. In particular, to prevent biases in the estimate of $\kappa_i$, we would not worry about the type I error rate when fitting a model for $p_{ir}$; we would much rather overfit the model for $p_{ir}$ than underfit it. In all the examples that we have looked at, we have found that overfitting the model (i.e. including non-significant terms in the model) for $p_{ir}$ leads to very little increase in the estimated standard error of $\hat\beta$.
Suppose that we let $\alpha' = (\alpha_{11}', \alpha_{21}', \alpha_{12}', \alpha_{22}')$. Using Taylor series expansions similarly to Prentice (1988), assuming that the models for $p_{i1}$, $p_{i2}$ and $\kappa_i$ are correctly specified, $(\hat\alpha', \hat\beta')'$ is consistent for $(\alpha', \beta')'$, and $N^{1/2}\{(\hat\alpha - \alpha)', (\hat\beta - \beta)'\}'$ has an asymptotic distribution which is multivariate normal with mean vector 0 and a covariance matrix which can be consistently estimated by a robust variance estimator such as the jackknife (Wu, 1986) or the sandwich estimator of White (1982). One form of the jackknife variance estimate (Wu, 1986) is
$$\widehat{\operatorname{var}}\{(\hat\alpha', \hat\beta')'\} = \sum_{i=1}^{N} \{(\hat\alpha', \hat\beta')_{-i} - (\hat\alpha', \hat\beta')\}'\,\{(\hat\alpha', \hat\beta')_{-i} - (\hat\alpha', \hat\beta')\}, \tag{7}$$
where $(\hat\alpha', \hat\beta')_{-i}$ is the estimate of $(\alpha', \beta')$ obtained by deleting the two ratings on the $i$th individual and recalculating both $\alpha$ and $\beta$. As opposed to equation (7), the SAS PROC GENMOD variance estimate of $\hat\beta$, sometimes called a 'naïve' variance estimate, is calculated assuming that $\hat p_{\text{chance},i}$ is known. The SAS PROC GENMOD variance estimate tends to be somewhat conservative in simulations that we have performed; these standard errors tend to be conservative because of the covariance between $\hat\alpha$ and $\hat\beta$.
Since logistic and linear regression for binary data can be calculated quite quickly, the time that is needed to calculate the jackknife variance estimate is minimal. For example, it took 2 min on a SPARCstation Ultra 20 computer to calculate the variance for the first example with 395 observations.
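Equation (7) needs only a routine that recomputes the estimates with one subject removed. The sketch below is generic and assumes the caller supplies `estimator` (hypothetical here), a function mapping the rows for the remaining subjects to the stacked vector $(\hat\alpha', \hat\beta')$. Each row must hold both of a subject's ratings and covariates, so that deleting a row deletes both ratings on that subject, as the text requires.

```python
import numpy as np


def jackknife_variance(estimator, data):
    """Jackknife covariance in the form of equation (7).

    estimator: function mapping an array of subject rows to a 1-D parameter vector
    data: array with one row per subject (both ratings plus covariates)
    """
    data = np.asarray(data)
    theta_hat = np.atleast_1d(estimator(data))
    cov = np.zeros((theta_hat.size, theta_hat.size))
    for i in range(data.shape[0]):
        theta_minus_i = np.atleast_1d(estimator(np.delete(data, i, axis=0)))
        diff = theta_minus_i - theta_hat
        cov += np.outer(diff, diff)          # {theta_(-i) - theta_hat}' {theta_(-i) - theta_hat}
    return cov
```

Equation (7) carries no $(N - 1)/N$ factor, which other statements of the jackknife include; for $N = 395$ the two versions differ by a factor of only 394/395.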
As stated above, to use maximum likelihood (Shoukri and Mian, 1996) with $p_{i1}$ and $p_{i2}$ unknown, we must specify the joint multinomial distribution of $(Y_{i1}, Y_{i2})$. This joint multinomial distribution has three non-redundant probabilities. One can show that $(p_{i1}, p_{i2}, \kappa_i)$ completely specifies these three probabilities. Thus, we must estimate the same number of parameters with our method as with maximum likelihood, so the main advantage of our method is that a simpler program can be written to calculate the estimate. SAS macros to calculate our estimate can be found at the Web site of the fifth author:
http://gsbwww.uchicago.edu/fac/michael.parzen/research/
However, if the ratings have more than two levels, our method, described in the next section, extends more easily than maximum likelihood.
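The claim that $(p_{i1}, p_{i2}, \kappa_i)$ fixes the three free cell probabilities can be made explicit: solving equation (1) for $p_{\text{agree},i}$ and using $p_{\text{agree},i} = p_{11} + p_{00}$ with $p_{00} = 1 - p_{i1} - p_{i2} + p_{11}$ gives $p_{11} = (p_{\text{agree},i} - 1 + p_{i1} + p_{i2})/2$, and the other cells follow by subtraction. A short sketch (function name hypothetical); it also flags values of $\kappa_i$ that no joint distribution with the given margins can attain.

```python
def joint_cells(p1, p2, kappa):
    """Cell probabilities (p11, p10, p01, p00) of (Y_i1, Y_i2) implied by
    p_i1 = Pr(Y_i1 = 1), p_i2 = Pr(Y_i2 = 1) and kappa_i."""
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)
    p_agree = p_chance + kappa * (1 - p_chance)     # equation (1) solved for p_agree
    p11 = (p_agree - 1 + p1 + p2) / 2               # p_agree = p11 + p00, p00 = 1 - p1 - p2 + p11
    cells = (p11, p1 - p11, p2 - p11, 1 - p1 - p2 + p11)
    if min(cells) < 0:
        raise ValueError("kappa is not attainable with these marginal probabilities")
    return cells
```

For example, $p_{i1} = p_{i2} = 0.5$ and $\kappa_i = 0.4$ give cells (0.35, 0.15, 0.15, 0.35), and $\kappa_i = 0$ always returns the independence value $p_{11} = p_{i1}p_{i2}$.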
3. Extension to more than two categories
Suppose that $Y_{ir}$ can take levels $1, \ldots, J$ instead of 0 or 1. In this case, the κ-coefficient is again defined as
$$\kappa_i = \frac{p_{\text{agree},i} - p_{\text{chance},i}}{1 - p_{\text{chance},i}},$$
where $p_{\text{agree},i}$ is the probability that the two raters agree on subject $i$,
$$p_{\text{agree},i} = \sum_{j=1}^{J} \Pr(Y_{i1} = j, Y_{i2} = j \mid z_i)$$
and
$$p_{\text{chance},i} = \sum_{j=1}^{J} \Pr(Y_{i1} = j \mid x_i, x_{i1})\, \Pr(Y_{i2} = j \mid x_i, x_{i2}).$$
If we again define an indicator random variable $Y_i$, which equals 1 if both raters agree on their ratings for subject $i$ and equals 0 otherwise, then
$$\Pr(Y_i = 1 \mid z_i) = p_{\text{agree},i} = p_{\text{chance},i} + \kappa_i (1 - p_{\text{chance},i}) = p_{\text{chance},i} + z_i^{*\prime}\beta \tag{8}$$
for the linear model $\kappa_i = z_i'\beta$, where $z_i^* = z_i (1 - p_{\text{chance},i})$.
To estimate $\beta$ if $p_{\text{chance},i}$ is known, we can again use maximum likelihood based on the distribution of $Y_i$, with the likelihood
$$\prod_{i=1}^{N} p_{\text{agree},i}^{\,y_i}\,(1 - p_{\text{agree},i})^{1 - y_i}.$$
The MLE from this likelihood can again be obtained by using any generalized linear models program which allows a linear model for the binary outcome $Y_i$. However, $p_{\text{chance},i}$ is rarely known, so we must extend our method from the previous section. Extending the approach in the previous section, to estimate $\kappa_i$, we can
(a) use polytomous or ordinal logistic regression of $Y_{i1}$ versus $(x_i, x_{i1})$ to obtain $\widehat{\Pr}(Y_{i1} = j \mid x_i, x_{i1})$,
(b) use polytomous or ordinal logistic regression of $Y_{i2}$ versus $(x_i, x_{i2})$ to obtain $\widehat{\Pr}(Y_{i2} = j \mid x_i, x_{i2})$,
(c) for each subject, form
$$\hat p_{\text{chance},i} = \sum_{j=1}^{J} \widehat{\Pr}(Y_{i1} = j \mid x_i, x_{i1})\, \widehat{\Pr}(Y_{i2} = j \mid x_i, x_{i2})$$
and
(d) use linear regression of the binary outcome $Y_i$ versus $z_i^*$ with offset $\hat p_{\text{chance},i}$.
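Steps (c) and (d) differ from the binary case only in how $\hat p_{\text{chance},i}$ and $Y_i$ are formed. A minimal NumPy sketch, assuming the fitted category probabilities from steps (a) and (b) are already available as $N \times J$ arrays (for example from any multinomial or cumulative-logit routine); both function names are hypothetical. Step (d) then uses the same identity-link binary regression as in Section 2, with $z_i^* = (1 - \hat p_{\text{chance},i})\, z_i$.

```python
import numpy as np


def chance_agreement(P1, P2):
    """Step (c): row i of P1 (P2) holds Pr-hat(Y_i1 = j | x_i, x_i1), j = 1..J
    (resp. rater 2). Returns p_chance,i for every subject."""
    P1 = np.asarray(P1, dtype=float)
    P2 = np.asarray(P2, dtype=float)
    return np.sum(P1 * P2, axis=1)


def agreement_indicator(y1, y2):
    """Y_i = 1 when both raters put subject i in the same category, else 0."""
    return (np.asarray(y1) == np.asarray(y2)).astype(float)
```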
To use maximum likelihood with polytomous ratings, and $\Pr(Y_{i1} = j \mid x_i, x_{i1})$ and $\Pr(Y_{i2} = j \mid x_i, x_{i2})$ unknown, we must specify the joint multinomial distribution of $(Y_{i1}, Y_{i2})$. This joint multinomial distribution has $J^2 - 1$ non-redundant probabilities. Our method requires a model for the $J - 1$ non-redundant probabilities $\Pr(Y_{i1} = j \mid x_i, x_{i1})$ $(j = 1, \ldots, J - 1)$, a model for the $J - 1$ probabilities $\Pr(Y_{i2} = j \mid x_i, x_{i2})$ $(j = 1, \ldots, J - 1)$ and a model for $\kappa_i$. These models only specify $2(J - 1) + 1$ dimensions of the $(J^2 - 1)$-dimensional joint distribution of $(Y_{i1}, Y_{i2})$. Thus, compared with a maximum likelihood method based on the joint distribution of $(Y_{i1}, Y_{i2})$, our method requires far fewer 'nuisance' parameters. Further, the full joint distribution that is needed for maximum likelihood does not have a simple form. Thus, not only is the computational method simple with polytomous ratings, but the method is also semiparametric in that the full joint distribution of $(Y_{i1}, Y_{i2})$ need not be specified as in maximum likelihood.
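For a concrete count, take $J = 2$ (the binary case of Section 2) and $J = 3$ (the three education levels of the spousal study):

$$J = 2:\quad J^2 - 1 = 3,\quad 2(J - 1) + 1 = 3; \qquad J = 3:\quad J^2 - 1 = 8,\quad 2(J - 1) + 1 = 5.$$

With binary ratings the three models pin down the joint distribution completely, as noted in Section 2, whereas with three categories $8 - 5 = 3$ dimensions of each subject's joint distribution are left unspecified and would have to be modelled as nuisance parameters under full maximum likelihood.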
4. Examples
4.1. Pastor and co-ordinator educator study
For the pastor and co-ordinator educator study, we want to determine covariates predictive of agreement between a pastor and co-ordinator with respect to whether the congregation puts enough emphasis on encouraging members to work for social justice. We model the κ-coefficient as a function of the Protestant denomination (six different denominations), whether the church offers community service projects for youths (yes versus no), the number of hours (2 or under versus more than 2) spent in the last 30 days helping people who are poor, political orientation (liberal versus conservative) and the amount of money donated to charities over the past year ($100 or more versus less than $100). Table 3 gives Cohen's (1960) simple estimate of κ for various levels of these covariates; note that the estimates of κ for levels of one covariate are not adjusted for the other covariates. In Table 3, none of the covariates appears to predict chance-corrected agreement significantly. As we shall see later in this section, just as in these univariate results, when we formulate a regression model for κ, none of these covariates are significant.
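The footnote to Table 3 describes its P-values only as Wald tests for equal κs over subgroups. One standard form, which we assume here, compares independent subgroup estimates through the inverse-variance-weighted statistic $\sum_k w_k(\hat\kappa_k - \bar\kappa)^2$, with $w_k = 1/\mathrm{SE}_k^2$ and $\bar\kappa$ the weighted mean, referred to a chi-square distribution with $K - 1$ degrees of freedom. In the Python sketch below the function names are hypothetical, and the chi-square tail uses the closed-form series for integer degrees of freedom so that no statistics library is needed. Applied to the κ̂ and standard-error columns of Table 3, it gives P ≈ 0.39, 0.66 and 0.82 for youth service, religion and hours helping the poor, close to the tabulated 0.390, 0.664 and 0.818.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square_df > x) for a positive integer df."""
    if df % 2 == 0:
        term = total = math.exp(-x / 2)
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return total
    chi = math.sqrt(x)
    total = math.erfc(chi / math.sqrt(2))                    # 2 * (1 - Phi(chi))
    term = math.sqrt(2 / math.pi) * math.exp(-x / 2) * chi   # 2 * phi(chi) * chi
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def wald_equal_kappas(kappas, ses):
    """Chi-square test that independent subgroup kappas are equal.
    Returns (statistic, degrees of freedom, P-value)."""
    weights = [1.0 / se ** 2 for se in ses]                  # inverse-variance weights
    pooled = sum(w * k for w, k in zip(weights, kappas)) / sum(weights)
    stat = sum(w * (k - pooled) ** 2 for w, k in zip(weights, kappas))
    df = len(kappas) - 1
    return stat, df, chi2_sf(stat, df)


# Religion rows of Table 3 (kappa-hat and standard error for denominations 1-6).
stat, df, p = wald_equal_kappas([0.100, 0.120, 0.224, -0.064, 0.244, 0.195],
                                [0.114, 0.109, 0.119, 0.161, 0.114, 0.121])
print(df, round(p, 2))  # 5 0.66
```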
If we let $Y_{i1}$ be the pastor rating (1 if yes; 0 if no), and $Y_{i2}$ be the co-ordinator rating, then the models for the marginal probabilities are, for the pastor,
$$\operatorname{logit}(p_{i1}) = \alpha_{01} + \alpha_{11} c_{1i} + \cdots + \alpha_{51} c_{5i} + \alpha_{61}\mathrm{SERV}_i + \alpha_{71}\mathrm{HOUR}_{i1} + \alpha_{81}\mathrm{LIB}_{i1} + \alpha_{91}\mathrm{DONATE}_{i1} \tag{9}$$
and, for the co-ordinator,
$$\operatorname{logit}(p_{i2}) = \alpha_{02} + \alpha_{12} c_{1i} + \cdots + \alpha_{52} c_{5i} + \alpha_{62}\mathrm{SERV}_i + \alpha_{72}\mathrm{HOUR}_{i2} + \alpha_{82}\mathrm{LIB}_{i2} + \alpha_{92}\mathrm{DONATE}_{i2}, \tag{10}$$
where $\{c_{1i}, \ldots, c_{5i}\}$ are five church indicators, $\mathrm{SERV}_i$ equals 1 if the church offers community service projects for youths and equals 0 otherwise, $\mathrm{HOUR}_{ir}$ equals 1 if rater $r$ spent more than 2 h in the last 30 days helping people who are poor and equals 0 otherwise, $\mathrm{LIB}_{ir}$ equals 1 if rater $r$ is liberal and equals 0 otherwise, and $\mathrm{DONATE}_{ir}$ equals 1 if rater $r$ donated at least $100 to charities in the past year and equals 0 otherwise.
As stated earlier, to prevent biases in the estimate of $\kappa_i$, we do not worry about the type I error rate when fitting a model for $p_{ir}$; we would much rather overfit the model for $p_{ir}$ than underfit it. Thus, our strategy for the model for $p_{ir}$ was to include all interaction terms that are significant at the 0.25 level. However, when fitting a step-up model, none of the pairwise interactions were significant using a Wald statistic (P-values greater than 0.25). The estimates for the logistic models (9) and (10) are given in Table 4. We see from Table 4 that the covariates have different effects on $p_{i1}$ in model (9) and $p_{i2}$ in model (10). For instance, DONATIONS and LIBERAL are significant for the pastor, but not for the co-ordinator. When further fitting the model for $\kappa_i$, we included the same covariates for both $p_{i1}$ in model (9) and $p_{i2}$ in model (10); however, this is not necessary, and we could have dropped the non-significant terms in either $p_{i1}$ or $p_{i2}$.

Table 3. Cohen's (1960) estimates of κ from the pastor and co-ordinator educator study for subgroups
Covariate                                       N      % agree   κ̂         Standard error
Church offers youth community service projects (P-value† = 0.390)
  No                                            217    62        0.201     0.067
  Yes                                           178    57        0.115     0.075
Religion (P-value† = 0.664)
  1                                             72     58        0.100     0.114
  2                                             85     61        0.120     0.109
  3                                             67     61        0.224     0.119
  4                                             37     51        −0.064    0.161
  5                                             69     62        0.244     0.114
  6                                             65     60        0.195     0.121
Hours helping poor (pastor, co-ordinator) (P-value† = 0.818)
  (≤2, ≤2)                                      47     65        0.227     0.145
  (≤2, >2)                                      38     53        0.053     0.158
  (>2, ≤2)                                      132    62        0.201     0.082
  (>2, >2)                                      178    58        0.148     0.074
Donations (pastor, co-ordinator) (P-value† = 0.455)
  (<$100, <$100)                                48     65        0.297     0.134
  (<$100, ≥$100)                                84     60        0.198     0.104
  (≥$100, <$100)                                62     60        0.185     0.126
  (≥$100, ≥$100)                                201    58        0.074     0.071
Liberal (pastor, co-ordinator) (P-value† = 0.877)
  (no, no)                                      244    62        0.165     0.065
  (no, yes)                                     31     58        0.176     0.144
  (yes, no)                                     85     53        0.071     0.105
  (yes, yes)                                    35     60        0.178     0.163
†Wald test for equal κs over subgroups.
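Our model for κ is
$$\begin{aligned}
\kappa_i = {} & \beta_0 + \beta_1 c_{1i} + \cdots + \beta_5 c_{5i} + \beta_6 \mathrm{SERV}_i + \beta_7 \mathrm{HOUR}_{i1}(1 - \mathrm{HOUR}_{i2}) \\
& + \beta_8 (1 - \mathrm{HOUR}_{i1})\mathrm{HOUR}_{i2} + \beta_9 (1 - \mathrm{HOUR}_{i1})(1 - \mathrm{HOUR}_{i2}) \\
& + \beta_{10} \mathrm{LIB}_{i1}(1 - \mathrm{LIB}_{i2}) + \beta_{11} (1 - \mathrm{LIB}_{i1})\mathrm{LIB}_{i2} + \beta_{12} (1 - \mathrm{LIB}_{i1})(1 - \mathrm{LIB}_{i2}) \\
& + \beta_{13} \mathrm{DONATE}_{i1}(1 - \mathrm{DONATE}_{i2}) + \beta_{14} (1 - \mathrm{DONATE}_{i1})\mathrm{DONATE}_{i2} + \beta_{15} (1 - \mathrm{DONATE}_{i1})(1 - \mathrm{DONATE}_{i2}).
\end{aligned} \tag{11}$$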
Table 4. Estimates from the pastor and co-ordinator educator study for the logistic marginal model for putting emphasis on encouraging members to work for social justice
Parameter        Estimate   Standard error   Z       P-value
Pastor
INTERCEPT        −0.635     0.412            −1.54   0.123
YOUTH SERVICE    −0.039     0.221            −0.18   0.859
CHURCH(1)        −0.090     0.394            −0.23   0.820
CHURCH(2)        −0.302     0.389            −0.78   0.438
CHURCH(3)        0.079      0.399            0.20    0.844
CHURCH(4)        −0.170     0.479            −0.36   0.723
CHURCH(5)        −0.057     0.403            −0.14   0.887
HOURS HELP       0.039      0.269            0.14    0.886
DONATIONS        0.583      0.230            2.54    0.011
LIBERAL          0.703      0.247            2.85    0.004
Co-ordinator
INTERCEPT        −0.481     0.357            −1.35   0.178
YOUTH SERVICE    0.194      0.225            0.87    0.387
CHURCH(1)        −0.799     0.404            −1.98   0.048
CHURCH(2)        −0.812     0.391            −2.08   0.038
CHURCH(3)        0.044      0.392            0.11    0.911
CHURCH(4)        −0.311     0.464            −0.67   0.502
CHURCH(5)        −0.420     0.399            −1.05   0.293
HOURS HELP       0.328      0.228            1.44    0.150
DONATIONS        0.346      0.245            1.41    0.157
LIBERAL          0.018      0.292            0.06    0.952
The effects $\{\beta_7, \ldots, \beta_{15}\}$ correspond to the various combinations of the rater-specific covariates. When fitting this model for $\kappa_i$, we fit the marginal models (9) and (10). The covariates in the marginal models and the model for κ need not be identical, and we see that the covariates in model (11) are different from those in models (9) and (10). We can think of the 'saturated model' as containing a different κ-coefficient for every possible combination of the covariates, which means that both the marginal model for $p_{ir}$ and the model for κ would have all possible interaction terms. Of course, this 'saturated' model usually cannot be fitted in practice, so we propose fitting step-up regression models for $p_{ir}$ and $\kappa_i$. Fitting a step-up model for $\kappa_i$, none of the pairwise interactions were significant using a Wald statistic (P-values greater than 0.25). Table 5 gives our estimate and the MLE (Shoukri and Mian, 1996) for $\beta$. Our approach and the MLE are similar, with the MLEs having slightly smaller estimated standard errors. Thus, our estimates appear highly efficient compared with the MLEs (the estimated efficiency is at least 85% for most effects). Unfortunately, none of the covariates are significant. For model (11), the estimated values of $\hat\kappa_i$ range from −0.35 to 0.51, which are well within the parameter space [−1, 1]. We note that it took 2 min on a SPARCstation Ultra 20 computer to calculate the jackknife variance for this example with 395 observations.
As stated earlier, the estimate of $\beta$ could be sensitive to the model for $p_{ir}$. Table 6 contains the estimated parameters for model (11) when including only intercepts in the marginal models for $p_{ir}$ as well as the marginal model which adds all possible pairwise interactions to models (9) and (10). In Table 6, the parameter estimates of $\beta$ and estimated standard errors are very similar. In this example, the estimates of $\beta$ are not very sensitive to underfitting or overfitting the marginal model. One possible reason for this is that none of the covariates are apparently predictive of κ; however, we could imagine that an underfitted marginal model could affect the best fitting model for $\kappa_i$.
Table 5. Regression parameter estimates for κ for the pastor and co-ordinator educator study
Effect                    Approach†   Estimate   Standard error   Z       P-value
INTERCEPT                 GLIM        0.345      0.209            1.65    0.099
                          MLE         0.376      0.201            1.87    0.061
YOUTH SERVICE             GLIM        −0.097     0.111            −0.87   0.384
                          MLE         −0.132     0.107            −1.23   0.219
CHURCH(1)                 GLIM        −0.127     0.178            −0.71   0.478
                          MLE         −0.154     0.170            −0.91   0.363
CHURCH(2)                 GLIM        −0.135     0.175            −0.77   0.441
                          MLE         −0.185     0.163            −1.14   0.254
CHURCH(3)                 GLIM        0.012      0.179            0.07    0.944
                          MLE         −0.086     0.160            −0.54   0.589
CHURCH(4)                 GLIM        −0.300     0.226            −1.33   0.184
                          MLE         −0.322     0.213            −1.51   0.131
CHURCH(5)                 GLIM        0.067      0.185            0.36    0.719
                          MLE         0.033      0.175            0.19    0.849
HOURS(≤2, >2)             GLIM        −0.282     0.244            −1.16   0.246
                          MLE         −0.308     0.233            −1.32   0.187
HOURS(>2, ≤2)             GLIM        −0.076     0.179            −0.43   0.667
                          MLE         −0.095     0.169            −0.56   0.575
HOURS(>2, >2)             GLIM        −0.112     0.174            −0.64   0.522
                          MLE         −0.130     0.165            −0.79   0.430
DONATIONS(<$100, ≥$100)   GLIM        0.099      0.157            0.63    0.529
                          MLE         0.125      0.145            0.86    0.390
DONATIONS(≥$100, <$100)   GLIM        0.116      0.133            0.87    0.384
                          MLE         0.116      0.128            0.90    0.368
DONATIONS(≥$100, ≥$100)   GLIM        0.180      0.166            1.09    0.276
                          MLE         0.163      0.157            1.04    0.298
LIBERAL(NO, YES)          GLIM        −0.122     0.217            −0.56   0.575
                          MLE         −0.035     0.189            −0.19   0.849
LIBERAL(YES, NO)          GLIM        −0.181     0.135            −1.34   0.180
                          MLE         −0.164     0.131            −1.25   0.211
LIBERAL(YES, YES)         GLIM        0.002      0.187            0.01    0.992
                          MLE         −0.050     0.171            −0.29   0.772
†GLIM is our approach, with jackknife standard errors.
Given that none of the covariates in the model for κ_i is significant, we are interested in an overall estimate of κ_i = κ. To obtain an overall estimate of the κ-coefficient, we refitted the marginal models (9) and (10) and modelled κ = κ_i = β_0. The estimate of κ is κ̂ = 0.1528. We computed the 95% confidence interval by first calculating the quantity

$$\hat\eta = \log\left(\frac{1 + \hat\kappa}{1 - \hat\kappa}\right)$$

and estimating its asymptotic standard error, denoted SE(η̂), by using the delta method. We then calculated the end points of the 95% confidence interval as

$$\frac{\exp\{\hat\eta \pm 1.96\,\mathrm{SE}(\hat\eta)\} - 1}{\exp\{\hat\eta \pm 1.96\,\mathrm{SE}(\hat\eta)\} + 1}.$$

The 95% confidence interval is [0.0541, 0.2484], suggesting agreement beyond chance. However, on the scale of Landis and Koch (1977), the estimated agreement between the pastors and co-ordinators is only 'slight', and even the upper confidence limit corresponds to no more than 'fair' agreement.
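The interval construction can be written out in a few lines. The sketch below applies the log{(1 + κ̂)/(1 − κ̂)} transform, the delta-method standard error SE(η̂) = 2 SE(κ̂)/(1 − κ̂²), and the back-transformation. The text does not report SE(κ̂); the value 0.0497 used here is our back-calculation from the published interval and serves only as an illustration.

```python
import math


def kappa_ci(kappa_hat, se_kappa, z=1.96):
    """Confidence interval for kappa computed on the eta = log{(1 + k) / (1 - k)} scale.

    Delta method: d eta / d kappa = 2 / (1 - kappa^2), so
    SE(eta) = 2 SE(kappa) / (1 - kappa^2). End points are mapped back with
    kappa = {exp(eta) - 1} / {exp(eta) + 1}.
    """
    eta = math.log((1 + kappa_hat) / (1 - kappa_hat))
    se_eta = 2 * se_kappa / (1 - kappa_hat ** 2)

    def back(e):
        # Inverse of the eta transform; always lies strictly inside (-1, 1).
        return (math.exp(e) - 1) / (math.exp(e) + 1)

    return back(eta - z * se_eta), back(eta + z * se_eta)


# Assumption: 0.0497 is back-calculated from the published interval, not reported.
lo, hi = kappa_ci(0.1528, 0.0497)
print(f"[{lo:.4f}, {hi:.4f}]")
```

This gives end points of about 0.054 and 0.248, close to the published [0.0541, 0.2484]. Working on the transformed scale keeps both end points inside (−1, 1), which a plain κ̂ ± 1.96 SE(κ̂) interval does not guarantee.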
Table 6. Regression parameter estimates for κ for the pastor and co-ordinator educator study, for the intercept-only and pairwise interaction marginal models

Effect                      Marginal model†   Estimate   Standard error      Z   P-value
INTERCEPT                   INT                  0.354            0.208   1.71     0.087
                            PAIR                 0.361            0.207   1.74     0.082
YOUTH SERVICE               INT                 −0.081            0.112  −0.72     0.472
                            PAIR                −0.098            0.111  −0.89     0.373
CHURCH(1)                   INT                 −0.071            0.185  −0.38     0.704
                            PAIR                −0.132            0.179  −0.74     0.459
CHURCH(2)                   INT                 −0.033            0.176  −0.19     0.849
                            PAIR                −0.141            0.176  −0.80     0.424
CHURCH(3)                   INT                  0.018            0.182   0.10     0.920
                            PAIR                 0.005            0.179   0.03     0.976
CHURCH(4)                   INT                 −0.248            0.226  −1.10     0.271
                            PAIR                −0.299            0.225  −1.33     0.184
CHURCH(5)                   INT                  0.102            0.189   0.54     0.589
                            PAIR                 0.059            0.185   0.32     0.749
HOURS(≤2, >2)               INT                 −0.259            0.241  −1.07     0.285
                            PAIR                −0.319            0.246  −1.30     0.194
HOURS(>2, ≤2)               INT                 −0.041            0.178  −0.23     0.818
                            PAIR                −0.089            0.177  −0.50     0.617
HOURS(>2, >2)               INT                 −0.121            0.173  −0.70     0.484
                            PAIR                −0.127            0.172  −0.74     0.459
DONATIONS(<$100, ≥$100)     INT                  0.056            0.156   0.36     0.719
                            PAIR                 0.098            0.157   0.62     0.535
DONATIONS(≥$100, <$100)     INT                  0.020            0.140   0.14     0.889
                            PAIR                 0.125            0.134   0.93     0.352
DONATIONS(≥$100, ≥$100)     INT                  0.127            0.170   0.75     0.453
                            PAIR                 0.175            0.167   1.05     0.294
LIBERAL(NO, YES)            INT                 −0.117            0.202  −0.58     0.562
                            PAIR                −0.113            0.213  −0.53     0.596
LIBERAL(YES, NO)            INT                 −0.254            0.142  −1.79     0.073
                            PAIR                −0.183            0.137  −1.34     0.180
LIBERAL(YES, YES)           INT                 −0.073            0.201  −0.36     0.719
                            PAIR                 0.003            0.189   0.02     0.984

† INT contains only an intercept in the marginal model; PAIR contains pairwise interactions in the marginal model.
4.2. Spousal education study
Social scientists are often interested in measuring the similarity of the highest educational degree (no high school degree, high school degree or college degree) of a husband and wife. We want to determine whether agreement between the spouses' highest educational degrees depends on the current family income (less than $25000, $25000–50000 and more than $50000), the number of children (0, 1, 2, 3 or 4), the ages at marriage of the wife and husband, the highest degree (no high school degree, high school degree or college degree) of the wife's father and of the husband's father, and an indicator for living at home with both parents at age 16 years (yes versus no). The data from 332 husband–wife pairs are taken from the 1991 US General Social Survey (Smith, 1996). Table 7 gives Cohen's (1960) simple estimate of κ for various levels of these covariates, unadjusted for the other covariates. From Table 7, κ appears to differ across combinations of the fathers' education.
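The subgroup estimates in Table 7 use Cohen's (1960) κ = (p_o − p_e)/(1 − p_e), where p_o is the observed proportion of agreement and p_e the agreement expected from the margins. Table 7 describes its P-values only as Wald tests for equal κs over subgroups; the sketch below assumes the usual inverse-variance form, Σ_g w_g(κ̂_g − κ̄)² with w_g = 1/SE_g² and G − 1 degrees of freedom. From the rounded table entries it reproduces the reported P-values for family income, number of children and father's education, but not for every covariate, so the authors' exact test may differ. The chi-square tail uses a standard recurrence in the degrees of freedom so that only the Python standard library is needed.

```python
import math


def cohen_kappa(table):
    """Cohen's (1960) kappa from a square table of counts (rows: rater 1, columns: rater 2)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = sum(table[j][j] for j in range(k)) / n
    rows = [sum(table[j]) / n for j in range(k)]
    cols = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(r * c for r, c in zip(rows, cols))
    return (p_o - p_e) / (1 - p_e)


def chi2_sf(x, df):
    """P(chi-square with integer df > x), using Q(df + 2) = Q(df) + (x/2)^(df/2) e^(-x/2) / Gamma(df/2 + 1)."""
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def wald_equal_kappas(kappas, ses):
    """Assumed inverse-variance Wald test that independent subgroup kappas are equal."""
    w = [1.0 / s ** 2 for s in ses]
    pooled = sum(wi * ki for wi, ki in zip(w, kappas)) / sum(w)
    stat = sum(wi * (ki - pooled) ** 2 for wi, ki in zip(w, kappas))
    return stat, chi2_sf(stat, len(kappas) - 1)


# Family income rows of Table 7 (kappa-hat, standard error).
stat, p_value = wald_equal_kappas([0.420, 0.312, 0.458], [0.110, 0.071, 0.069])
print(round(stat, 2), round(p_value, 3))
print(cohen_kappa([[20, 5], [10, 15]]))  # hypothetical 2 x 2 table of counts
```

The first print gives a statistic of about 2.25 with P ≈ 0.325, as in Table 7; the 2 × 2 table in the second call is hypothetical.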
Table 7. Cohen's (1960) estimates of κ from the spousal education study for subgroups†

Covariate                     N  % agree        κ̂  Standard error  P-value‡
Family income                                                         0.325
  < $25000                   50       64    0.420           0.110
  $25000–50000              127       61    0.312           0.071
  > $50000                  157       72    0.458           0.069
Number of children                                                    0.872
  0–1                       244       66    0.433           0.051
  2–4                        90       69    0.417           0.085
Age at marriage§ (years) (wife, husband)                              0.503
  (≤22, ≤22)                129       70    0.476           0.070
  (≤22, >22)                 80       64    0.400           0.090
  (>22, ≤22)                  9       56    0.053           0.296
  (>22, >22)                116       66    0.387           0.074
Father's education (wife, husband)                                    0.002
  (below HS, below HS)       80       70    0.518           0.081
  (below HS, HS)             31       48    0.192           0.134
  (below HS, COL)             8       44   −0.231           0.192
  (HS, below HS)             36       72    0.354           0.144
  (HS, HS)                   80       64    0.270           0.107
  (HS, COL)                  27       52   −0.073           0.171
  (COL, below HS)            15       73    0.492           0.199
  (COL, HS)                  28       79    0.569           0.143
  (COL, COL)                 29       83    0.356           0.220
Home at 16 years (wife, husband)                                      0.299
  (no, no)                    6       50   −0.200           0.150
  (no, yes)                  29       59    0.374           0.132
  (yes, no)                  27       67    0.449           0.148
  (yes, yes)                270       68    0.441           0.049

† HS, high school; COL, college.
‡ Wald test for equal κs over subgroups.
§ 22 years is the median age, collapsed over spouse.

Because the ratings are ordinal, we chose to model the marginal probabilities p_{ir,j} = pr(Y_{ir} = j | x_i, x_{ir}) with the cumulative logistic link function (McCullagh, 1980). In particular, we model the cumulative logits

$$\log\left(\frac{F_{ir,j}}{1 - F_{ir,j}}\right), \qquad j = 1, 2,$$

where F_{ir,j} = p_{ir,1} + ... + p_{ir,j} is a 'cumulative' probability of observing at most the jth level. For example, F_{ir,2} = p_{ir,1} + p_{ir,2} is the probability of at most a high school degree (j = 1, no high school degree, or j = 2, high school degree).
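For ordinal ratings, the κ step needs the category probabilities implied by the cumulative logits, p_{ir,1} = F_{ir,1}, p_{ir,j} = F_{ir,j} − F_{ir,j−1} and p_{ir,J} = 1 − F_{ir,J−1}, and from them the chance agreement. The sketch below assumes that the chance agreement is Cohen's Σ_j p_{i1,j} p_{i2,j}; the intercepts and linear predictors are hypothetical values, not estimates from Table 8.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def category_probs(intercepts, eta):
    """Category probabilities p_1..p_J from cumulative logits.

    logit F_j = intercepts[j-1] + eta for j = 1..J-1 (the sign convention of
    model (12)); p_1 = F_1, p_j = F_j - F_{j-1} and p_J = 1 - F_{J-1}.
    """
    if any(b <= a for a, b in zip(intercepts, intercepts[1:])):
        raise ValueError("cumulative-logit intercepts must be increasing")
    cum = [expit(a + eta) for a in intercepts] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def chance_agreement(p1, p2):
    """Assumed chance agreement: sum_j p1_j * p2_j (independent ratings)."""
    return sum(a * b for a, b in zip(p1, p2))


# Hypothetical couple, three education levels (J = 3).
wife = category_probs([-0.5, 2.0], eta=0.3)
husband = category_probs([-0.5, 2.0], eta=-0.2)
print([round(p, 3) for p in wife], [round(p, 3) for p in husband])
print("p_e =", round(chance_agreement(wife, husband), 3))
```

Requiring increasing intercepts guarantees F_{ir,1} ≤ F_{ir,2}, so every category probability is non-negative.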
The covariates are the number of children in the family (KIDS_i), current family income (INCOME2_i equals 1 if the income is $25000–50000 and 0 otherwise; INCOME3_i equals 1 if the income is more than $50000 and 0 otherwise), father's education (FATHED2_{ir} equals 1 if the father's education is a high school degree and 0 otherwise; FATHED3_{ir} equals 1 if the father's education is a college degree and 0 otherwise), age at marriage (AGE_{ir}) and living at home with both parents at age 16 years (HOME_{ir}). We fit a proportional odds model (McCullagh, 1980) to the cumulative logits, i.e.
$$
\begin{aligned}
\log\left(\frac{F_{ir,j}}{1 - F_{ir,j}}\right) ={}& \alpha_{0rj} + \alpha_{1r}\,\mathrm{KIDS}_i + \alpha_{2r}\,\mathrm{INCOME2}_i + \alpha_{3r}\,\mathrm{INCOME3}_i + \alpha_{4r}\,\mathrm{FATHED2}_{ir} \\
&+ \alpha_{5r}\,\mathrm{FATHED3}_{ir} + \alpha_{6r}\,\mathrm{AGE}_{ir} + \alpha_{7r}\,\mathrm{HOME}_{ir} + \alpha_{8r}\,\mathrm{INCOME2}_i\,\mathrm{FATHED2}_{ir} \\
&+ \alpha_{9r}\,\mathrm{INCOME2}_i\,\mathrm{FATHED3}_{ir} + \alpha_{10r}\,\mathrm{INCOME3}_i\,\mathrm{FATHED2}_{ir} \\
&+ \alpha_{11r}\,\mathrm{INCOME3}_i\,\mathrm{FATHED3}_{ir} + \alpha_{12r}\,\mathrm{INCOME2}_i\,\mathrm{AGE}_{ir} \\
&+ \alpha_{13r}\,\mathrm{INCOME3}_i\,\mathrm{AGE}_{ir}
\end{aligned}
\tag{12}
$$

for j = 1, 2 and r = 1, 2. The estimates of the parameters of the cumulative logistic model (12) are given in Table 8. All other interaction terms were non-significant at the 0.25 level.

Table 8. Estimates from the spousal education study for the cumulative logistic marginal model for education level

Parameter             Estimate   Standard error       Z   P-value
Wife's education level
INTERCEPT1              −0.347            1.202   −0.29     0.773
INTERCEPT2               3.489            1.235    2.83     0.005
KIDS                     0.100            0.096    1.04     0.299
INCOME2                 −1.136            1.402   −0.81     0.418
INCOME3                  0.845            1.559    0.54     0.588
FATHED2                 −2.079            0.700   −2.97     0.003
FATHED3                 −3.990            0.903   −4.42     0.000
AGE AT MAR               0.038            0.053    0.71     0.480
HOME AT 16              −0.767            0.392   −1.96     0.050
INCOME2 * FATHED2        1.218            0.810    1.50     0.133
INCOME2 * FATHED3        1.760            1.023    1.72     0.085
INCOME3 * FATHED2        1.741            0.814    2.14     0.032
INCOME3 * FATHED3        2.191            1.043    2.10     0.036
INCOME2 * AGE           −0.021            0.062   −0.34     0.737
INCOME3 * AGE           −0.193            0.069   −2.81     0.005
Husband's education level
INTERCEPT1               0.076            1.156    0.07     0.948
INTERCEPT2               2.981            1.175    2.54     0.011
KIDS                     0.112            0.090    1.24     0.214
INCOME2                 −0.978            1.493   −0.66     0.512
INCOME3                  0.378            1.578    0.24     0.811
FATHED2                 −2.006            0.622   −3.22     0.001
FATHED3                 −3.621            0.955   −3.79     0.000
AGE AT MAR               0.001            0.044    0.02     0.988
HOME AT 16              −0.355            0.373   −0.95     0.342
INCOME2 * FATHED2        1.245            0.718    1.73     0.083
INCOME2 * FATHED3        0.586            1.241    0.47     0.637
INCOME3 * FATHED2        1.941            0.727    2.67     0.008
INCOME3 * FATHED3        1.854            1.095    1.69     0.090
INCOME2 * AGE           −0.028            0.060   −0.46     0.646
INCOME3 * AGE           −0.126            0.065   −1.95     0.051
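To make the coding of model (12) concrete, the sketch below builds the covariate vector in the row order of Table 8 and plugs in the wife's estimates. Several details are assumptions rather than statements from the text: KIDS is a count, AGE is age at marriage in years with no centring, and HOME = 1 means living with both parents at age 16. The couple's profile is hypothetical, so the resulting probabilities only illustrate the mechanics.

```python
import math

# Wife's cumulative-logit estimates from Table 8 (coefficients in the table's row order).
WIFE_INTERCEPTS = [-0.347, 3.489]
WIFE_COEFS = [0.100, -1.136, 0.845, -2.079, -3.990, 0.038, -0.767,
              1.218, 1.760, 1.741, 2.191, -0.021, -0.193]


def design_row(kids, income, fathed, age, home):
    """Covariates of model (12), excluding intercepts, in Table 8 order.

    income: 1 (< $25000), 2 ($25000-50000), 3 (> $50000).
    fathed: 1 (no HS degree), 2 (HS degree), 3 (college degree).
    age: age at marriage in years (assumed uncentred).
    home: 1 if lived with both parents at 16 (assumed coding), else 0.
    """
    inc2, inc3 = float(income == 2), float(income == 3)
    fed2, fed3 = float(fathed == 2), float(fathed == 3)
    return [kids, inc2, inc3, fed2, fed3, age, home,
            inc2 * fed2, inc2 * fed3, inc3 * fed2, inc3 * fed3,
            inc2 * age, inc3 * age]


def education_probs(row, intercepts, coefs):
    """P(no HS degree), P(HS degree), P(college degree) under model (12)."""
    eta = sum(b * x for b, x in zip(coefs, row))
    cum = [1.0 / (1.0 + math.exp(-(a + eta))) for a in intercepts]
    return [cum[0], cum[1] - cum[0], 1.0 - cum[1]]


# Hypothetical wife: 2 children, income > $50000, father with an HS degree,
# married at 24, lived with both parents at 16.
row = design_row(kids=2, income=3, fathed=2, age=24, home=1)
print([round(p, 3) for p in education_probs(row, WIFE_INTERCEPTS, WIFE_COEFS)])
```

For this profile the linear predictor is −3.78, and just over half of the probability falls on a college degree; because the AGE and INCOME3 × AGE terms are large in absolute size, the result is sensitive to how AGE was actually coded.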
We fit the following model for the κ-coefficient:

$$\kappa_i = \beta_0 + \beta_1\,\mathrm{KIDS}_i + \beta_2\,\mathrm{INCOME2}_i + \beta_3\,\mathrm{INCOME3}_i + \beta_4\,\mathrm{FATHED}_i + \beta_5\,\mathrm{AGE}_i + \beta_6\,\mathrm{HOME}_i, \tag{13}$$

where FATHED_i equals 0 if the fathers of the husband and wife have the same level of education and 1 otherwise, AGE_i equals 0 if the husband and wife were the same age at marriage and 1 otherwise, and HOME_i equals 0 if the husband and wife have the same value of HOME_{ir} and 1 otherwise. Table 9 compares the estimates of β obtained with marginal model (12) and with an intercept-only marginal model, with jackknife standard errors. In Table 9, the parameter estimates and their estimated standard errors are very similar whichever marginal model is used, so in this example the estimates of β are again not very sensitive to underfitting or overfitting the marginal model. Moreover, for either marginal model, the results in Table 9 agree with the univariate results in Table 7 in that, among the covariates, only father's education is significant. From Table 9, when the fathers of the husband and wife have different levels of education, the κ-coefficient decreases by approximately 0.14.

Table 9. Regression parameter estimates for κ for the spousal education study, using the best fitting marginal model (Table 8) and the marginal model containing only intercepts

Parameter        Marginal model†    Estimate   Standard error       Z   P-value
INTERCEPT        (12)                  0.364            0.129    2.83     0.005
                 INT                   0.378            0.127    2.97     0.003
KIDS             (12)                  0.025            0.038    0.65     0.516
                 INT                   0.027            0.034    0.82     0.412
INCOME2          (12)                 −0.112            0.147   −0.76     0.447
                 INT                  −0.094            0.145   −0.64     0.522
INCOME3          (12)                  0.016            0.145    0.11     0.912
                 INT                   0.097            0.139    0.69     0.490
FATHED DIF       (12)                 −0.140            0.057   −2.47     0.014
                 INT                  −0.144            0.055   −2.62     0.009
AGE DIF          (12)                  0.005            0.010    0.55     0.582
                 INT                   0.004            0.009    0.47     0.638
HOME AT 16 DIF   (12)                 −0.037            0.116   −0.32     0.749
                 INT                  −0.031            0.106   −0.29     0.772

† INT, intercept-only model.
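Because model (13) is linear in its coefficients, fitted values of κ_i follow by simple arithmetic. The sketch below uses the Table 9 estimates obtained with marginal model (12) for a hypothetical couple, and shows that changing FATHED_i from 0 to 1 lowers κ_i by exactly the FATHED DIF coefficient, −0.140. The function name and the profile are illustrative.

```python
# Estimates for model (13) copied from Table 9 (rows for marginal model (12)).
BETA = {"INTERCEPT": 0.364, "KIDS": 0.025, "INCOME2": -0.112, "INCOME3": 0.016,
        "FATHED": -0.140, "AGE": 0.005, "HOME": -0.037}


def fitted_kappa(kids, income, fathed_dif, age_dif, home_dif):
    """kappa_i under model (13); income is 1, 2 or 3; *_dif are 0/1 pair indicators."""
    return (BETA["INTERCEPT"] + BETA["KIDS"] * kids
            + BETA["INCOME2"] * (income == 2) + BETA["INCOME3"] * (income == 3)
            + BETA["FATHED"] * fathed_dif + BETA["AGE"] * age_dif
            + BETA["HOME"] * home_dif)


# Hypothetical couple: two children, income above $50000, different ages at
# marriage, same HOME value; compare same versus different fathers' education.
same = fitted_kappa(2, 3, fathed_dif=0, age_dif=1, home_dif=0)
differ = fitted_kappa(2, 3, fathed_dif=1, age_dif=1, home_dif=0)
print(round(same, 3), round(differ, 3), round(differ - same, 3))
```

Nothing in a linear model forces fitted values into [−1, 1], so checking their range, as done for both examples in this section, is a cheap safeguard.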
We note that, using marginal model (12) and model (13), calculating the jackknife standard errors for this example, with 332 observations, took 2 min 26 s on a SPARCstation Ultra 20 computer. Further, under these same models, the fitted values κ̂_i range from −0.05 to 0.75, which is well within the parameter space [−1, 1].
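The jackknife standard errors reported above require refitting the whole estimator once per deleted subject, which explains the run times quoted. The sketch below is the standard delete-one jackknife; that the authors used exactly this form is an assumption, and the function name is illustrative.

```python
import numpy as np


def jackknife_se(estimator, *arrays):
    """Delete-one jackknife standard errors for a vector-valued estimator.

    estimator: callable taking the arrays (rows = subjects) and returning a 1-D
    array of parameter estimates. Each subject is deleted in turn and
    Var = (n - 1) / n * sum_i (theta_(-i) - mean of theta_(-i))^2.
    """
    n = len(arrays[0])
    loo = []
    for i in range(n):
        keep = np.arange(n) != i
        loo.append(np.atleast_1d(estimator(*(a[keep] for a in arrays))))
    loo = np.array(loo)                      # shape (n, number of parameters)
    centred = loo - loo.mean(axis=0)
    return np.sqrt((n - 1) / n * (centred ** 2).sum(axis=0))


data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
se_mean = jackknife_se(lambda y: np.array([y.mean()]), data)
print(se_mean)
```

For a sample mean the delete-one jackknife reproduces s/√n exactly, which is a convenient check; for the κ regression, the estimator would map the subject-level arrays (both ratings and the covariates) to the fitted β.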
Unfortunately, as discussed earlier, we do not compare this method with an MLE, because a likelihood method requires a full specification of the joint distribution of the two ratings on the same subject, which can be quite complicated to specify and has been developed only for dichotomous ratings (Shoukri and Mian, 1996).
5. Conclusion
In this paper we considered a simple method for estimating a linear regression model for the κ-coefficient when each subject is rated by two raters. Although it was not shown here, the estimates can be shown to be the solution of a generalized estimating equation (Liang and Zeger, 1986). The method is attractive because it can be implemented easily with existing general purpose statistical software. The method can also be easily extended to more than two raters. With more than two ratings per subject, we can still use a (possibly ordinal) logistic regression for each rating on a subject, and a linear regression model for a binary indicator of agreement between a pair of ratings. For more than two ratings, though,
the resulting estimator may not be the most efficient estimator of the κ-coefficient, which is a topic for future research. Although useful for dichotomous ratings, our method is especially attractive for polytomous ratings, in that it is not necessary to specify the full joint multinomial distribution of the pair of ratings on each subject.
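The conclusion summarises the estimator as a logistic regression for each rating plus a linear regression for the binary agreement indicator. The sketch below is our reading of that recipe for two raters and dichotomous ratings, not the authors' code: it assumes κ_i = x_i'β and uses E(agree_i) = p_ei + κ_i(1 − p_ei), which follows from κ = (p_o − p_e)/(1 − p_e), so that agree_i − p_ei is regressed on (1 − p_ei)x_i by ordinary least squares. It omits standard errors (which the paper obtains by the jackknife), and simulate_pairs is a hypothetical data generator with known κ_i.

```python
import numpy as np


def fit_logistic(Z, y, n_iter=25):
    """Logistic regression of a 0/1 response y on design matrix Z (Newton-Raphson)."""
    coef = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ coef))
        w = p * (1.0 - p)
        # Newton step: coef += (Z' W Z)^{-1} Z' (y - p)
        coef = coef + np.linalg.solve(Z.T @ (Z * w[:, None]), Z.T @ (y - p))
    return coef


def kappa_regression(y1, y2, Z1, Z2, X):
    """Two-stage estimate of beta in kappa_i = X_i' beta for binary ratings.

    Stage 1: a separate logistic marginal model for each rater.
    Stage 2: E(agree_i) = pe_i + kappa_i (1 - pe_i), so regress
    agree_i - pe_i on (1 - pe_i) X_i by ordinary least squares.
    """
    p1 = 1.0 / (1.0 + np.exp(-Z1 @ fit_logistic(Z1, y1)))
    p2 = 1.0 / (1.0 + np.exp(-Z2 @ fit_logistic(Z2, y2)))
    pe = p1 * p2 + (1.0 - p1) * (1.0 - p2)   # chance agreement for subject i
    agree = (y1 == y2).astype(float)         # 1 if the two raters agree
    design = X * (1.0 - pe)[:, None]
    beta, *_ = np.linalg.lstsq(design, agree - pe, rcond=None)
    return beta


def simulate_pairs(n, seed=0):
    """Hypothetical data: binary covariate x, P(Y = 1) = 0.3 or 0.6 for both
    raters, and true kappa_i = 0.2 + 0.3 x_i."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, n).astype(float)
    p = np.where(x == 1, 0.6, 0.3)
    pe = p * p + (1 - p) * (1 - p)
    delta = (0.2 + 0.3 * x) * (1 - pe) / 2          # mass moved onto the diagonal
    p11, p10 = p * p + delta, p * (1 - p) - delta   # P(1,1), P(1,0); P(0,1) = P(1,0)
    u = rng.random(n)
    y1 = (u < p11 + p10).astype(float)
    y2 = ((u < p11) | ((u >= p11 + p10) & (u < p11 + 2 * p10))).astype(float)
    Z = np.column_stack([np.ones(n), x])
    return y1, y2, Z


y1, y2, Z = simulate_pairs(40000)
beta_hat = kappa_regression(y1, y2, Z, Z, Z)
print(beta_hat)  # true values are [0.2, 0.3]
```

With 40000 simulated pairs the two estimates should land close to the true values; replacing the logistic fits by proportional odds fits and p_ei by Σ_j p_{i1,j} p_{i2,j} would give an ordinal version along the lines of the spousal education example.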
Acknowledgements
We are grateful for the support provided by National Institutes of Health grant CA 55576
and NIMH grant 1-R01-MH-54693-01A1.
References
Benson, P. L. and Eklin, C. H. (1991) Effective Christian Education: a National Study of Protestant Congregations. Minneapolis: Search Institute.
Cohen, J. (1960) A coefficient of agreement for nominal scales. Educ. Psychol. Measmnt, 20, 37–46.
Francis, B., Green, M. and Payne, C. (1993) The GLIM System, Release 4 Manual. New York: Clarendon.
Hastie, T. J. and Pregibon, D. (1993) Generalized linear models. In Statistical Models in S (eds J. M. Chambers and T. J. Hastie). Pacific Grove: Wadsworth and Brooks.
Landis, J. R. and Koch, G. (1977) The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Liang, K. Y. and Zeger, S. L. (1986) Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
McCullagh, P. (1980) Regression models for ordinal data (with discussion). J. R. Statist. Soc. B, 42, 109–142.
Prentice, R. L. (1988) Correlated binary regression with covariates specific to each binary observation. Biometrics, 44, 1033–1048.
SAS Institute (1997) SAS/STAT Software: Changes and Enhancements through Release 6.12. Cary: SAS Institute.
Shoukri, M. M. and Mian, I. U. (1996) Maximum likelihood estimation of the kappa coefficient from bivariate logistic regression. Statist. Med., 15, 1409–1419.
Smith, T. W. (1996) Who, What, When, Where, and Why: an Analysis of Usage of the General Social Survey. Chicago: National Opinion Research Center.
Stata Corporation (1995) Stata Reference Manual: Release 4.0, vol. 2, pp. 399–410. College Station: Stata Corporation.
White, H. (1982) Maximum likelihood estimation of misspecified models. Econometrica, 50, 1–25.
Wu, C. F. J. (1986) Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Statist., 14, 1261–1295.