3.2 Scales and modelling

3.2 Measurement scales and modeling
(a) General
There are two types of scales: pure scales and compound scales. A
bivariate response with one ordinal component and one continuous
component is an example of a compound scale. For pure scales, there
are several types:
1. nominal scales: the categories are regarded as exchangeable and
totally devoid of structure.
2. ordinal scales: the categories are ordered, much like the ordinal
numbers “first”, “second”, …. It does not make sense to talk of
“distance” or “spacing” between “first” and “second”, nor to compare
“spacings” between pairs of response categories.
3. interval scales: the categories are ordered and numerical labels
or scores are attached. The scores are treated as category averages,
medians or mid-points. Differences between scores are therefore
interpreted as a measure of separation of the categories.
Note:
In applications, the distinction between nominal and ordinal scales is
usually, but not always, clear. For example, hair color and eye color
can be ordered to a large extent on the grey-scale from light to dark
and are therefore ordinal. However, unless there is a clear connection
with the electromagnetic spectrum or a grey-scale, colors are best
regarded as nominal.
(b) Models for ordinal scales
Ordinal scales occur more frequently in applications than the other
types. The applications include food testing (bad, good, excellent,…),
classification of radiographs, determination of physical or mental
well-being, ….
Note:
It is essential that the same conclusions can be reached even if the
number or choice of response categories is changed. As a
consequence, if a new category is formed by combining adjacent
categories of the old scale, the form of the conclusions should be
unaffected. This is an important non-mathematical point that is
difficult to make mathematically rigorous. It leads fairly
directly to models based on the cumulative probabilities
$r_j = \pi_1 + \cdots + \pi_j$ rather than the category probabilities
$\pi_j$.
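As a small numerical illustration of why cumulative probabilities suit this requirement, the sketch below merges two adjacent categories and checks that the remaining cumulative probabilities are unchanged. The category probabilities are made up, and the helper names `cumulative` and `merge_adjacent` are hypothetical.

```python
from itertools import accumulate

def cumulative(pi):
    """Cumulative probabilities r_j = pi_1 + ... + pi_j."""
    return list(accumulate(pi))

def merge_adjacent(pi, j):
    """Combine categories j and j+1 (0-based positions) into one category."""
    return pi[:j] + [pi[j] + pi[j + 1]] + pi[j + 2:]

# Hypothetical probabilities for a 4-category ordinal response.
pi = [0.1, 0.2, 0.3, 0.4]
r = cumulative(pi)
r_merged = cumulative(merge_adjacent(pi, 1))  # second and third categories merged

# Merging only deletes the cut-point between the merged categories;
# every other cumulative probability is the same as before.
kept = [r[0], r[2], r[3]]
print(all(abs(a - b) < 1e-12 for a, b in zip(kept, r_merged)))
```

The category probabilities themselves do change under merging (0.2 and 0.3 become 0.5), which is why a model stated in terms of $r_j$ is the more natural choice here.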
Commonly used models:
There are two commonly used models that are found to work well in
practice. They are
1. logistic scale:
It is the simplest model. The form is
$$\log\left\{\frac{r_j(x)}{1 - r_j(x)}\right\} = \theta_j - \beta^T x.$$
This model is also known as the proportional-odds model since the
ratio of the odds at $x = x_1$ and $x = x_2$ is
$$\frac{r_j(x_1)/\{1 - r_j(x_1)\}}{r_j(x_2)/\{1 - r_j(x_2)\}} = \exp\{-\beta^T (x_1 - x_2)\},$$
which is independent of the choice of category ($j$). In addition, if
$$X = \begin{cases} 1, & \text{treatment group} \\ 0, & \text{control group,} \end{cases}$$
then
$$\frac{r_j(1)/\{1 - r_j(1)\}}{r_j(0)/\{1 - r_j(0)\}} = e^{-\beta}.$$
2. complementary log-log scale:
The form is
$$\log[-\log\{1 - r_j(x)\}] = \theta_j - \beta^T x.$$
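The two link scales can be compared numerically. The sketch below assumes a scalar covariate, the sign convention $\theta_j - \beta x$ on the right-hand side of both models, and made-up values for the cut-points and coefficient. It checks that the odds ratio under the logistic scale is the same for every $j$, and that under the complementary log-log scale the ratio of $\log\{1 - r_j(x)\}$ at two covariate values is likewise free of $j$, which follows directly from that form.

```python
import math

def cum_prob_logit(theta_j, beta, x):
    """r_j(x) when log{r_j / (1 - r_j)} = theta_j - beta * x."""
    return 1.0 / (1.0 + math.exp(-(theta_j - beta * x)))

def cum_prob_cloglog(theta_j, beta, x):
    """r_j(x) when log[-log{1 - r_j}] = theta_j - beta * x."""
    return 1.0 - math.exp(-math.exp(theta_j - beta * x))

def odds(p):
    return p / (1.0 - p)

thetas = [-1.0, 0.0, 1.0]   # hypothetical cut-points for k = 4 categories
beta = 0.8                  # hypothetical scalar coefficient
x1, x2 = 1.0, 0.0           # e.g. treatment versus control

# Logistic scale: the odds ratio does not depend on j.
odds_ratios = [odds(cum_prob_logit(t, beta, x1)) / odds(cum_prob_logit(t, beta, x2))
               for t in thetas]

# Complementary log-log scale: the ratio of log(1 - r_j) does not depend on j.
log_surv_ratios = [math.log(1.0 - cum_prob_cloglog(t, beta, x1))
                   / math.log(1.0 - cum_prob_cloglog(t, beta, x2))
                   for t in thetas]

target = math.exp(-beta * (x1 - x2))
print(odds_ratios, log_surv_ratios, target)
```

Both lists should agree with `target` up to rounding error, whatever cut-points are chosen.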
Note:
The model based on the logistic scale may be derived from the notion of a
tolerance distribution, or an underlying unobserved continuous
random variable $Z$ such that $Z - \beta^T x = \varepsilon$, where
$\varepsilon$ has the standard logistic distribution. If the
unobserved variable lies in the interval
$$\theta_{j-1} < Z \le \theta_j,$$
then $Y = j$ is recorded. That is,
$$\begin{aligned}
r_j(x) = P(Y \le j) &= P(Z \le \theta_j) \\
&= P(Z - \beta^T x \le \theta_j - \beta^T x) \\
&= P(\varepsilon \le \theta_j - \beta^T x) \\
&= \frac{\exp(\theta_j - \beta^T x)}{1 + \exp(\theta_j - \beta^T x)}
\end{aligned}$$
$$\Longrightarrow \quad \log\left\{\frac{r_j(x)}{1 - r_j(x)}\right\} = \theta_j - \beta^T x.$$
Note:
It is sometimes claimed that the models based on logistic scale and
complementary log-log scale and related models are appropriate only
if there exists a latent variable Z. This claim seems to be too strong
and, in any case, the existence of Z is usually unverifiable in practice.
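The latent-variable construction can be checked by simulation. The sketch below assumes a scalar covariate, a latent variable $Z = \beta x + \varepsilon$ with standard logistic $\varepsilon$, and made-up cut-points. It records the category whose interval contains $Z$ and compares the empirical cumulative frequencies with the logistic formula for $r_j(x)$. The function name `simulate_category` is hypothetical.

```python
import math
import random

def simulate_category(x, beta, thetas, rng):
    """Draw Z = beta * x + eps (eps standard logistic) and return the
    1-based category j with theta_{j-1} < Z <= theta_j."""
    u = rng.random()
    while u == 0.0:                    # avoid log(0)
        u = rng.random()
    eps = math.log(u / (1.0 - u))      # inverse-CDF draw from the standard logistic
    z = beta * x + eps
    for j, theta in enumerate(thetas, start=1):
        if z <= theta:
            return j
    return len(thetas) + 1             # top category

rng = random.Random(0)
thetas, beta, x = [-1.0, 0.5, 2.0], 0.8, 1.0   # hypothetical values
n = 200_000
draws = [simulate_category(x, beta, thetas, rng) for _ in range(n)]

for j, theta in enumerate(thetas, start=1):
    empirical = sum(d <= j for d in draws) / n
    model = 1.0 / (1.0 + math.exp(-(theta - beta * x)))
    print(j, round(empirical, 3), round(model, 3))
```

Agreement here only confirms that the construction implies the model. It does not show that a latent variable must exist for the model to be appropriate.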
Note:
The model
$$\frac{Z - \beta^T x}{\exp(\tau^T x)} = \varepsilon,$$
where $\varepsilon$ has the standard logistic distribution, is worthy
of serious consideration. The model leads to
$$\log\left\{\frac{r_j(x)}{1 - r_j(x)}\right\} = \frac{\theta_j - \beta^T x}{\exp(\tau^T x)},$$
where $\beta^T x$ plays the role of linear predictor for the mean and
$\tau^T x$ in the denominator plays the role of linear predictor for
the dispersion or variance. If
$$X = \begin{cases} 1, & \text{treatment group} \\ 0, & \text{control group,} \end{cases}$$
then
$$\frac{r_j(1)/\{1 - r_j(1)\}}{r_j(0)/\{1 - r_j(0)\}}
= \frac{\exp\{(\theta_j - \beta)/\sigma\}}{\exp(\theta_j)}
= \exp\left\{\left(\frac{1}{\sigma} - 1\right)\theta_j - \frac{\beta}{\sigma}\right\},$$
where $\sigma = \exp(\tau)$. The odds ratio is increasing in $j$ if
$\sigma < 1$ and decreasing in $j$ if $\sigma > 1$. This model is useful
for testing the proportional-odds assumption ($\tau = 0$) against the
alternative that the odds ratio is systematically increasing or
systematically decreasing in $j$.
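The behaviour of the odds ratio in the two-group case can be seen numerically. The sketch below assumes a scalar group indicator, the form $\mathrm{logit}\, r_j(x) = (\theta_j - \beta x)/\exp(\tau x)$, and made-up increasing cut-points. It evaluates the treatment-versus-control odds ratio for $\tau < 0$, $\tau = 0$ and $\tau > 0$.

```python
import math

def logit_cum(theta_j, beta, tau, x):
    """logit r_j(x) = (theta_j - beta * x) / exp(tau * x)."""
    return (theta_j - beta * x) / math.exp(tau * x)

def odds_ratio(theta_j, beta, tau):
    """Odds ratio of the event Y <= j, treatment (x = 1) versus control (x = 0)."""
    return math.exp(logit_cum(theta_j, beta, tau, 1) - logit_cum(theta_j, beta, tau, 0))

thetas = [-1.0, 0.5, 2.0]    # hypothetical increasing cut-points
beta = 0.8                   # hypothetical location effect

for tau in (-0.5, 0.0, 0.5):     # sigma = exp(tau) below, equal to, above 1
    print(tau, [round(odds_ratio(t, beta, tau), 4) for t in thetas])
```

With $\tau = 0$ every entry equals $e^{-\beta}$, the proportional-odds case. The other two rows are monotone in $j$ in opposite directions.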
Note:
Models in which the $k-1$ regression lines are not parallel can be
specified by
$$\log\left\{\frac{r_j(x)}{1 - r_j(x)}\right\} = \theta_j - \beta_j^T x.$$
(c) Models for interval scales
Interval scales are distinguished by the following properties:
1. The categories are of interest in themselves and are not chosen
arbitrarily.
2. It does not normally make sense to form a new category by
amalgamating adjacent categories.
3. Attached to the j’th category is a cardinal number or score,
$s_j$, such that the difference between scores is a measure of distance
between or separation of categories.
Note:
Genuine interval scales having these 3 properties are rare in practice
because, although properties 1 and 2 may be satisfied, it is rare to
find a response scale having well determined cardinal scores attached
to the categories.
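There are 3 options for model construction.
1. The model is
$$\log\left\{\frac{r_j(x)}{1 - r_j(x)}\right\}
= \varsigma_0 + \varsigma_1 \left(\frac{s_j + s_{j+1}}{2}\right) - \beta^T x - \zeta^T x\,(c_j - \bar{c}),$$
where
$$c_j = \frac{s_j + s_{j+1}}{2} \quad \text{or} \quad c_j = \mathrm{logit}\left(\frac{s_j + s_{j+1}}{2}\right).$$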
2. The probability $\pi_j$ can also be used. The model is
$$\pi_j(x_i) = \frac{\exp\{\eta_j(x_i)\}}{\sum_{l=1}^{k} \exp\{\eta_l(x_i)\}},$$
where
$$\eta_j(x_i) = \eta_j + (\beta^T x_i)\, s_j + \alpha_i.$$
Note:
The relative odds for category j over category k in the above model
are
$$\frac{\pi_j(x)}{\pi_k(x)} = \exp\{\eta_j - \eta_k + \beta^T x\,(s_j - s_k)\}.$$
Thus, for a scalar covariate $x$, the relative odds are increased
multiplicatively by the factor $\exp\{\beta (s_j - s_k)\}$ per unit
increase in $x$.
3. The model is
$$\sum_{j=1}^{k} \pi_j(x_i)\, s_j = \beta^T x_i.$$
In this model, instead of regarding $y$ as the response and the score
$s_j$ as a contrast of special interest, we may regard the observed
score as the response and $y$ as the set of observed multiplicities or
weights. Here $\sum_{j=1}^{k} \pi_j(x_i)\, s_j$ is the expected score.
The estimate of the expected score is
$$S_i = \sum_{j=1}^{k} \frac{s_j\, y_{ij}}{m_i},$$
where $m_i = \sum_j y_{ij}$ is the total count in group $i$.
If there are only two treatment groups, with observed counts
$\{y_{1j}, y_{2j}\}$, we may use the standardized difference as test
statistic
$$T = \frac{S_1 - S_2}{\left[\left\{\sum_{j=1}^{k} \tilde{\pi}_j s_j^2 - \left(\sum_{j=1}^{k} \tilde{\pi}_j s_j\right)^2\right\}\left(\frac{1}{m_1} + \frac{1}{m_2}\right)\right]^{1/2}},$$
where
$$\tilde{\pi}_j = \frac{y_{1j} + y_{2j}}{m_1 + m_2}.$$
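The standardized difference is simple to compute from two rows of counts. The sketch below uses made-up counts and scores, takes $m_i$ to be the row total, and uses hypothetical function names `mean_score` and `standardized_difference`.

```python
import math

def mean_score(counts, scores):
    """S_i = sum_j s_j * y_ij / m_i, with m_i the total count in group i."""
    m = sum(counts)
    return sum(s * y for s, y in zip(scores, counts)) / m

def standardized_difference(y1, y2, scores):
    """T = (S_1 - S_2) / sqrt(pooled score variance * (1/m1 + 1/m2))."""
    m1, m2 = sum(y1), sum(y2)
    # Pooled category proportions: (y_1j + y_2j) / (m1 + m2).
    pooled = [(a + b) / (m1 + m2) for a, b in zip(y1, y2)]
    mean = sum(p * s for p, s in zip(pooled, scores))
    var = sum(p * s * s for p, s in zip(pooled, scores)) - mean ** 2
    diff = mean_score(y1, scores) - mean_score(y2, scores)
    return diff / math.sqrt(var * (1.0 / m1 + 1.0 / m2))

# Made-up counts for two groups over three categories with scores 1, 2, 3.
scores = [1, 2, 3]
y1 = [10, 20, 30]
y2 = [30, 20, 10]
T = standardized_difference(y1, y2, scores)
print(T)   # about 4.47 for these counts
```

Swapping the two groups changes only the sign of `T`, and two groups with identical proportions give `T = 0`.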
(d) Models for nominal scales
The probability $\pi_j$ can be used. The model is
$$\pi_j(x_i) = \frac{\exp\{\eta_j(x_i)\}}{\sum_{l=1}^{k} \exp\{\eta_l(x_i)\}},$$
where
$$\eta_j(x_i) = \eta_j(x_0) + (x_i - x_0)^T \beta_j + \alpha_i.$$
Note:
The relative odds for category j over category k in the above model
are
$$\frac{\pi_j(x)}{\pi_k(x)} = \frac{\pi_j(x_0)}{\pi_k(x_0)} \exp\{(x - x_0)^T (\beta_j - \beta_k)\}.$$
Thus, for a scalar covariate $x$ and relative to the baseline odds
$\pi_j(x_0)/\pi_k(x_0)$, the relative odds are increased
multiplicatively by the factor $\exp(\beta_j - \beta_k)$ per unit
increase in $x$.
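A short numerical check of the relative-odds statement, assuming a scalar covariate, category-specific slopes $\beta_j$, and made-up baseline values $\eta_j(x_0)$. The term $\alpha_i$ is omitted because it cancels in the ratio, and the function name `nominal_probs` is hypothetical.

```python
import math

def nominal_probs(x, x0, eta0, betas):
    """pi_j(x) proportional to exp(eta_j(x0) + beta_j * (x - x0))."""
    lin = [e + b * (x - x0) for e, b in zip(eta0, betas)]
    top = max(lin)                       # subtract the max for numerical stability
    w = [math.exp(v - top) for v in lin]
    total = sum(w)
    return [v / total for v in w]

x0 = 2.0
eta0 = [0.0, 0.4, -0.1]     # hypothetical eta_j(x0)
betas = [0.0, 0.7, -0.3]    # hypothetical category-specific slopes

base = nominal_probs(x0, x0, eta0, betas)
shifted = nominal_probs(x0 + 1.0, x0, eta0, betas)

# Relative odds of the second category over the third, at x0 and at x0 + 1.
factor = (shifted[1] / shifted[2]) / (base[1] / base[2])
print(factor, math.exp(betas[1] - betas[2]))
```

The two printed numbers should agree: a unit increase in $x$ multiplies the baseline relative odds by $\exp(\beta_j - \beta_k)$.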
(e) Models for nested or hierarchical scales
Example:
Objective: we want to test the hypothesis that a winter diet
containing a high proportion of red clover has the effect of reducing
the fertility of milch cows.
To test the hypothesis, 80 cows were assigned at random to one of the
two diets. Most cows become pregnant at first insemination but a few
require a second or third insemination. The response variable is the
pregnancy rate. The response, probability and odds are summarized
in the following table:
Insemination   Response                  Probability         Odds
First          $Y_1 \mid m$              $\pi_1$             $\pi_1/(1 - r_1)$
Second         $Y_2 \mid m - y_1$        $\pi_2/(1 - r_1)$   $\pi_2/(1 - r_2)$
Third          $Y_3 \mid m - y_1 - y_2$  $\pi_3/(1 - r_2)$   $\pi_3/(1 - r_3)$
Then, a simple sequence of models having a constant treatment effect is
as follows:
$$g(\pi_1) = \alpha_1 - \beta^T x,$$
$$g\left(\frac{\pi_2}{1 - r_1}\right) = \alpha_2 - \beta^T x,$$
$$g\left(\frac{\pi_3}{1 - r_2}\right) = \alpha_3 - \beta^T x.$$
If the logistic link function is used, we have
$$\log\left(\frac{\pi_j}{1 - r_j}\right) = \alpha_j - \beta^T x.$$
The incidental parameters $\alpha_1, \alpha_2, \ldots, \alpha_{k-1}$
make allowance for the expected decline in fertility.
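The link between the stage-wise (conditional) probabilities and the odds $\pi_j/(1 - r_j)$ can be verified numerically. The sketch below assumes a logistic link, a scalar diet indicator $x$, the sign convention $\alpha_j - \beta x$, and made-up parameter values. The function name `sequential_probs` is hypothetical.

```python
import math

def expit(v):
    return 1.0 / (1.0 + math.exp(-v))

def sequential_probs(alphas, beta, x):
    """Category probabilities built from the conditional success
    probabilities p_j = expit(alpha_j - beta * x) at each stage."""
    probs, remaining = [], 1.0
    for a in alphas:
        p = expit(a - beta * x)       # P(success at stage j | no success before)
        probs.append(remaining * p)   # pi_j
        remaining *= 1.0 - p          # 1 - r_j
    probs.append(remaining)           # no success at any stage
    return probs

alphas = [1.0, 0.5, 0.0]   # hypothetical, declining over successive inseminations
beta, x = 0.6, 1.0         # hypothetical diet effect; x = 1 for the test diet

pi = sequential_probs(alphas, beta, x)
r = 0.0
for j, a in enumerate(alphas):
    r += pi[j]
    # log{pi_j / (1 - r_j)} should equal alpha_j - beta * x
    print(j + 1, math.log(pi[j] / (1.0 - r)), a - beta * x)
```

The two printed columns agree because applying the logistic link to $\pi_j/(1 - r_{j-1})$ gives $\log\{\pi_j/(1 - r_j)\}$, which is the Odds column of the table.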