3.2 Measurement scales and modeling

(a) General

There are two types of scales: pure scales and compound scales. A bivariate response with one component ordinal and the other continuous is an example of a compound scale. Pure scales are of several types:

1. nominal scales: the categories are regarded as exchangeable and totally devoid of structure.

2. ordinal scales: the categories are ordered much like the ordinal numbers, “first”, “second”, …. It does not make sense to talk of “distance” or “spacing” between “first” and “second”, nor to compare “spacings” between pairs of response categories.

3. interval scales: the categories are ordered and numerical labels or scores are attached. The scores are treated as category averages, medians or mid-points. Differences between scores are therefore interpreted as a measure of separation of the categories.

Note: In applications, the distinction between nominal and ordinal scales is usually but not always clear. For example, hair color and eye color can be ordered to a large extent on the grey-scale from light to dark and are therefore ordinal. However, unless there is a clear connection with the electromagnetic spectrum or a grey-scale, colors are best regarded as nominal.

(b) Models for ordinal scales

Ordinal scales occur more frequently in applications than the other types. Applications include food testing (bad, good, excellent, …), classification of radiographs, determination of physical or mental well-being, ….

Note: It is essential that the same conclusions can be reached even when the number or choice of response categories is changed. As a consequence, if a new category is formed by combining adjacent categories of the old scale, the form of the conclusions should be unaffected. This is an important non-mathematical point that is difficult to make mathematically rigorous. This point leads fairly directly to models based on the cumulative probabilities $\gamma_j = P(Y \le j)$ rather than the category probabilities $\pi_j$.
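The invariance requirement in the note above can be illustrated numerically. The following is a minimal sketch, assuming made-up category probabilities for a four-point scale (the numbers and the helper names `cumulative_probs` and `merge_adjacent` are hypothetical, not from the notes). It shows that combining two adjacent categories simply removes one cumulative probability and leaves the others unchanged.

```python
from itertools import accumulate


def cumulative_probs(pi):
    """Return gamma_j = pi_1 + ... + pi_j for each category j."""
    return list(accumulate(pi))


def merge_adjacent(pi, j):
    """Combine categories j and j+1 (0-based indices) into a single category."""
    return pi[:j] + [pi[j] + pi[j + 1]] + pi[j + 2:]


# Hypothetical category probabilities for a four-point ordinal scale.
pi = [0.1, 0.2, 0.3, 0.4]
gamma = cumulative_probs(pi)

# Merge the 2nd and 3rd categories: one cut-point disappears,
# the remaining cumulative probabilities keep their values.
pi_merged = merge_adjacent(pi, 1)
gamma_merged = cumulative_probs(pi_merged)

print(gamma)
print(gamma_merged)
```

A model stated in terms of the cumulative probabilities therefore keeps the same form on the coarser scale, whereas the individual category probabilities change when categories are merged.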
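Commonly used models: Two models are commonly used and are found to work well in practice.

1. logistic scale: This is the simplest model. The form is

$$\log\frac{\gamma_j(x)}{1-\gamma_j(x)} = \theta_j - \beta^T x, \qquad j = 1, \dots, k-1.$$

This model is also known as the proportional-odds model since the ratio of the odds at $x_1$ and $x_2$ is

$$\frac{\gamma_j(x_1)/\{1-\gamma_j(x_1)\}}{\gamma_j(x_2)/\{1-\gamma_j(x_2)\}} = \exp\{-\beta^T(x_1 - x_2)\},$$

which is independent of the choice of category ($j$). In addition, if

$$X = \begin{cases} 1, & \text{treatment group} \\ 0, & \text{control group} \end{cases}$$

then

$$\frac{\gamma_j(1)/\{1-\gamma_j(1)\}}{\gamma_j(0)/\{1-\gamma_j(0)\}} = e^{-\beta}.$$

2. complementary log-log scale: The form is

$$\log\left[-\log\{1-\gamma_j(x)\}\right] = \theta_j - \beta^T x.$$

Note: The model based on the logistic scale may be derived from the notion of a tolerance distribution, that is, an underlying unobserved continuous random variable $Z$ for which $Z - \beta^T x = \varepsilon$ follows the logistic distribution. If the unobserved variable lies in the interval $\theta_{j-1} < Z \le \theta_j$, then $Y = j$ is recorded. That is,

$$\begin{aligned}
\gamma_j(x) = P(Y \le j) &= P(Z \le \theta_j) = P(Z - \beta^T x \le \theta_j - \beta^T x) \\
&= P(\varepsilon \le \theta_j - \beta^T x) = \frac{\exp(\theta_j - \beta^T x)}{1 + \exp(\theta_j - \beta^T x)} \\
\Rightarrow \quad \log\frac{\gamma_j(x)}{1-\gamma_j(x)} &= \theta_j - \beta^T x.
\end{aligned}$$

Note: It is sometimes claimed that the models based on the logistic scale, the complementary log-log scale and related models are appropriate only if there exists a latent variable $Z$. This claim seems to be too strong and, in any case, the existence of $Z$ is usually unverifiable in practice.

Note: The model

$$\frac{Z - \beta^T x}{\exp(\tau^T x)} = \varepsilon,$$

where $\varepsilon$ follows the logistic distribution, is worthy of serious consideration. The model leads to

$$\log\frac{\gamma_j(x)}{1-\gamma_j(x)} = \frac{\theta_j - \beta^T x}{\exp(\tau^T x)},$$

where $\beta^T x$ plays the role of linear predictor for the mean and $\tau^T x$ in the denominator plays the role of linear predictor for the dispersion or variance. If

$$X = \begin{cases} 1, & \text{treatment group} \\ 0, & \text{control group} \end{cases}$$

then

$$\frac{\gamma_j(1)/\{1-\gamma_j(1)\}}{\gamma_j(0)/\{1-\gamma_j(0)\}} = \exp\left\{\frac{\theta_j - \beta}{\sigma} - \theta_j\right\} = \exp\left(-\frac{\beta}{\sigma}\right)\exp\left\{\left(\frac{1}{\sigma} - 1\right)\theta_j\right\},$$

where $\sigma = \exp(\tau)$. If $\sigma < 1$, that is $\tau < 0$, then the odds ratio is increasing in $j$, and decreasing otherwise. This model is useful for testing the proportional-odds assumption ($\tau = 0$) against the alternative that the odds ratio is systematically increasing or systematically decreasing in $j$.

Note: Models in which the $k-1$ regression lines are not parallel can be specified by

$$\log\frac{\gamma_j(x)}{1-\gamma_j(x)} = \theta_j - \beta_j^T x.$$

(c) Models for interval scales

Interval scales are distinguished by the following properties:

1. The categories are of interest in themselves and are not chosen arbitrarily.

2. It does not normally make sense to form a new category by amalgamating adjacent categories.

3. Attached to the $j$th category is a cardinal number or score, $s_j$, such that the difference between scores is a measure of distance between, or separation of, categories.

Note: Genuine interval scales having these 3 properties are rare in practice because, although properties 1 and 2 may be satisfied, it is rare to find a response scale having well determined cardinal scores attached to the categories.

There are 3 options for model construction.

1. A model for the cumulative probabilities in which the cut-points depend on the scores:

$$\log\frac{\gamma_j(x)}{1-\gamma_j(x)} = \zeta_0 + \zeta_1\,\frac{s_j + s_{j+1}}{2} - \beta^T x - \xi^T x\,(c_j - \bar{c}),$$

where

$$c_j = \frac{s_j + s_{j+1}}{2} \quad \text{or} \quad c_j = \operatorname{logit}\left(\frac{s_j + s_{j+1}}{2}\right).$$

2. The probability $\pi_j$ can also be used. The model is

$$\pi_j(x_i) = \frac{\exp\{\eta_j(x_i)\}}{\sum_{l=1}^{k}\exp\{\eta_l(x_i)\}},$$

where

$$\eta_j(x_i) = \eta_j + (\beta^T x_i)\,s_j + \alpha_i.$$

Note: The relative odds for category $j$ over category $j'$ in the above model are

$$\frac{\pi_j(x)}{\pi_{j'}(x)} = \exp\{\eta_j - \eta_{j'} + \beta^T x\,(s_j - s_{j'})\}.$$

Thus, for a scalar covariate $x$, the relative odds are increased multiplicatively by the factor $\exp\{\beta(s_j - s_{j'})\}$ per unit increase in $x$.

3. A model for the expected score:

$$\sum_{j=1}^{k}\pi_j(x_i)\,s_j = \beta^T x_i.$$

In this model, instead of regarding $y$ as the response and the score $s_j$ as a contrast of special interest, we may regard the observed score as the response and $y$ as the set of observed multiplicities or weights. The quantity $\sum_{j=1}^{k}\pi_j(x_i)\,s_j$ is the expected score. The estimate of the expected score is

$$S_i = \sum_{j=1}^{k} s_j\,y_{ij}/m_i.$$

If there are only two treatment groups, with observed counts $\{y_{1j}\}$ and $\{y_{2j}\}$, we may use the standardized difference as test statistic:

$$T = \frac{S_1 - S_2}{\left[\left(\dfrac{1}{m_1} + \dfrac{1}{m_2}\right)\left\{\sum_{j=1}^{k}\tilde{\pi}_j s_j^2 - \left(\sum_{j=1}^{k}\tilde{\pi}_j s_j\right)^2\right\}\right]^{1/2}},$$

where

$$\tilde{\pi}_j = \frac{y_{1j} + y_{2j}}{m_1 + m_2}.$$

(d) Models for nominal scales

The probability $\pi_j$ can be used. The model is

$$\pi_j(x_i) = \frac{\exp\{\eta_j(x_i)\}}{\sum_{l=1}^{k}\exp\{\eta_l(x_i)\}},$$

where

$$\eta_j(x_i) = \eta_j(x_0) + \beta_j^T(x_i - x_0) + \alpha_i.$$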
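The nominal-scale model above can be sketched in a few lines. This is only an illustration under stated assumptions: a single scalar covariate, three categories, and made-up values for the baseline predictors and slopes (the names `category_probs`, `linear_predictors`, `eta0` and `beta` are hypothetical). It also checks numerically that the odds of one category relative to another change by the factor exp(beta_j - beta_k) for a unit increase in x, and that the term alpha_i cancels from the probabilities.

```python
import math


def category_probs(eta):
    """pi_j = exp(eta_j) / sum_l exp(eta_l), computed stably."""
    m = max(eta)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(e - m) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


def linear_predictors(eta0, beta, x, x0=0.0, alpha=0.0):
    """eta_j(x) = eta_j(x0) + beta_j * (x - x0) + alpha, scalar covariate."""
    return [e + b * (x - x0) + alpha for e, b in zip(eta0, beta)]


# Hypothetical parameter values, chosen only for illustration.
eta0 = [0.0, 0.5, -0.2]  # baseline predictors eta_j(x0)
beta = [0.0, 0.8, -0.3]  # category-specific slopes beta_j

p0 = category_probs(linear_predictors(eta0, beta, x=0.0))
p1 = category_probs(linear_predictors(eta0, beta, x=1.0))

# Odds of category 2 relative to category 1, at x = 1 versus x = 0.
ratio = (p1[1] / p1[0]) / (p0[1] / p0[0])
print(ratio, math.exp(beta[1] - beta[0]))

# Adding a constant alpha to every eta_j leaves the probabilities unchanged.
p1_shifted = category_probs(linear_predictors(eta0, beta, x=1.0, alpha=3.0))
```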
Note: The relative odds for category $j$ over category $j'$ in the above model are

$$\frac{\pi_j(x)}{\pi_{j'}(x)} = \frac{\pi_j(x_0)}{\pi_{j'}(x_0)}\exp\{(\beta_j - \beta_{j'})^T(x - x_0)\}.$$

Thus, for a scalar covariate $x$, the relative odds are increased multiplicatively by the factor $\exp(\beta_j - \beta_{j'})$ per unit increase in $x$.

(e) Models for nested or hierarchical scales

Example:

Objective: we want to test the hypothesis that a winter diet containing a high proportion of red clover has the effect of reducing the fertility of milch cows.

To test the hypothesis, 80 cows were assigned at random to one of the two diets. Most cows become pregnant at the first insemination but a few require a second or third insemination. The response variable is the pregnancy rate. The response, probability and odds are summarized in the following table:

Insemination   Response                  Probability             Odds
First          $Y_1 \mid m$              $\pi_1$                 $\pi_1/(1-\gamma_1)$
Second         $Y_2 \mid m - y_1$        $\pi_2/(1-\gamma_1)$    $\pi_2/(1-\gamma_2)$
Third          $Y_3 \mid m - y_1 - y_2$  $\pi_3/(1-\gamma_2)$    $\pi_3/(1-\gamma_3)$

Then, a simple sequence of models having a constant treatment effect is as follows:

$$\begin{aligned}
g(\pi_1) &= \alpha_1 - \beta^T x, \\
g\left(\frac{\pi_2}{1-\gamma_1}\right) &= \alpha_2 - \beta^T x, \\
g\left(\frac{\pi_3}{1-\gamma_2}\right) &= \alpha_3 - \beta^T x.
\end{aligned}$$

If the logistic link function is used, we have

$$\log\frac{\pi_j}{1-\gamma_j} = \alpha_j - \beta^T x.$$

The incidental parameters $\alpha_1, \alpha_2, \dots, \alpha_{k-1}$ make allowance for the expected decline in fertility.
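The sequence of models with the logistic link can be turned into category probabilities by working through the stages in order. The sketch below assumes a scalar treatment indicator x, the sign convention alpha_j - beta * x, and made-up parameter values; it does not use the red clover data, and the names `expit` and `sequential_probs` are hypothetical helpers.

```python
import math


def expit(t):
    """Inverse of the logistic link: 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def sequential_probs(alpha, beta, x):
    """Category probabilities under log{pi_j / (1 - gamma_j)} = alpha_j - beta * x.

    At stage j the conditional probability of success, given no success
    at earlier stages, is expit(alpha_j - beta * x).
    """
    probs = []
    remaining = 1.0  # probability of reaching stage j, i.e. 1 - gamma_{j-1}
    for a in alpha:
        h = expit(a - beta * x)
        probs.append(remaining * h)
        remaining *= 1.0 - h
    probs.append(remaining)  # no success at any of the modelled stages
    return probs


# Hypothetical values: declining alpha_j mimics declining fertility.
alpha = [0.8, 0.4, 0.1]
beta = 0.5

control = sequential_probs(alpha, beta, x=0)
treated = sequential_probs(alpha, beta, x=1)
print(control)
print(treated)
```

With a positive beta under this sign convention, the treated group has lower odds of success at every stage by the same factor exp(-beta), which is the constant treatment effect the model assumes.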