A residual from updating based approach for multiple categorical ordinal responses

Giulio D’Epifanio
Department of Economy, Finance and Statistics, University of Perugia, via A.
Pascoli, 06100 Perugia, Italy, giulio@stat.unipg.it
Summary. To analyze multiple categorical ordinal responses, a pseudo-Bayes approach is proposed which uses estimating equations based on the Constrained Fixed
Point methodology.
Key words: constrained fixed point, multiple categorical ordinal data, pseudo-Bayes
1 Introduction
In social inquiries about perceived health, a typical questionnaire contains questions such as
the following: "How do you consider your health status?" Ordinal responses
$Y^{quest.} \in \{$"Very bad", "Bad", "Moderate", "Good", "Excellent"$\}$ are thus typically
recorded on an L-point (L := 5) verbal descriptor scale. When interest focuses on problem
diagnostics, the original categories of $Y^{quest.}$ may be recoded into those of "Severity",
$Y \in \{$None, Mild, Moderate, Severe, Very severe$\}$, so that higher levels of Y indicate
higher levels of difficulty.
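The recoding can be illustrated with a minimal sketch. It assumes that the verbal scale is scored 1 ("Very bad") to L = 5 ("Excellent") and that severity is the reversed score, Y = L + 1 − Y^quest.; the function names are hypothetical and only serve the illustration.

```python
# Verbal descriptor scale of Y^quest., scored 1..L from worst to best (assumed scoring).
QUEST_LEVELS = ["Very bad", "Bad", "Moderate", "Good", "Excellent"]
# Severity labels of Y, scored 1..L from least to most severe.
SEVERITY_LEVELS = ["None", "Mild", "Moderate", "Severe", "Very severe"]
L = len(QUEST_LEVELS)


def severity_score(quest_label):
    """Return Y = L + 1 - Y^quest. for a verbal answer to the health question."""
    y_quest = QUEST_LEVELS.index(quest_label) + 1
    return L + 1 - y_quest


def severity_label(quest_label):
    """Return the severity category that corresponds to a verbal answer."""
    return SEVERITY_LEVELS[severity_score(quest_label) - 1]
```

Under this scoring, the best health answer maps to the lowest severity level and the worst one to the highest.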
In a cross-sectional design, the policy maker would interpret the ordinal responses Y across
social domains using an aggregate data set such as that of table 1. Using conventional
numerical level labels, let $Y := L + 1 - Y^{quest.}$, so that $Y \in \{1, 2, \dots, L\}$.

Table 1. Reference table. Severity of perceived health

X: Sex   Age class                     Y (Difficulty)
                     None(1)  Mild(2)  Moderate(3)  Severe(4)  Very severe(5)
Male     45 - 55        17      160        164          9            1
         55 - 65        11      108        165         23            6
         65 - 75         7       64        181         44            3
         75 - 85         2       18         99         30           11
         85 - 110        0        0         20          7            4
Female   45 - 55         2      139        182         15            2
         55 - 65         8       91        197         40            5
         65 - 75         3       43        215         51           10
         75 - 85         3       17        131         58            9
         85 - 110        1        3         40         32            6

Over the ordinal categories of $Y^{quest.}$, assume the choice process for respondents (see
footnote 1) which is sketched in figure 1. Thus, reversing $Y^{quest.}$ to Y, the main
substantive concern is in detecting latent rules which govern, for any severity level
l := 2, ..., L of Y, the "resistance of lower severity levels against further aggravation up
to l", by considering both social strata (induced by sex and age) and current level effects.
Fig. 1. Choice process over $Y^{quest.}$: partitioning of categories
A recursively structured model would be necessary to interpret table 1.
Unfortunately, in the "constrained multinomial model" (CMM) setup (see footnote 2), the full
maximum likelihood approach may be difficult to implement.
In a non-standard framework, the Prior Feedback Setup has been proposed
(Casella & Robert, 2002, pp. 203-204), which uses pseudo-Bayes tools. Within this
type of inferential framework, D’Epifanio proposed (1996) the "Fixed Point"
(FP) approach and delineated (1999) the "Constrained Fixed Point" (CFP),
which has recently been applied (D’Epifanio, 2004) to analyze contingency
tables. Briefly, CFP matches "candidate priors", which are parametrized
over a conjectured belief-model, to their "posteriors", attempting to minimize the "residual
from updating". It provides a type of estimating equation which elicits a
data-dependent prior (Agresti, 1990, p. 466). Inferences are very close to
those of maximum likelihood (ML) whenever the likelihood function is sufficiently regular;
thus, typical ML diagnostic tools may also be considered.
But, since CFP is pseudo-Bayes, it could even work better than ML whenever the likelihood
function is pathological (Robert et al., 1993; D’Epifanio,
1996) or parameter smoothing is also required. On the other hand, CFP data
analysis is insensitive to the risk, which is instead real in the full Bayesian
analysis, of using some misleading specific prior.

Footnote 1. Consider the original question $Y^{quest.}$. Having started (see figure 1) from the
lowest ("Very bad") level and sequentially ascending the levels, suppose respondent i has
temporarily arrived at the r-th level of the scale of Y. Now, he processes the verbal
description at the r-th level against the descriptions of the higher levels r + 1, ..., L.
Then, he passes to the upper, (r + 1)-th, level whenever he perceives that the added
level-specific requirements $O_r$ of satisfaction are met.

Footnote 2. It provides a framework where the usual types (Lunt, 2005) of ordinal models (the
"proportional odds", the "continuation ratio", the "stereotype model", etc.) may be
specified.
Using CFP, powerful numerical procedures may be implemented. In practice, CFP is a machinery
which may be useful from both the classical and the Bayesian perspective. For classical
statisticians, it could provide advanced starting points for their ML-based procedures. For
Bayesians, it would provide an operative tool for the automatic elicitation of some "standard"
prior over a preliminary "reference data set", to be then used in their Bayesian logic.
This work is organized as follows. In section 2, we provide the formal
framework to interpret the data of table 1, conditional on working conjectures. In
section 3, we recall the CFP methodology and provide the specific estimating system.
In section 4, we briefly report the results of CFP on the data of table 1.
2 Conceptual framework and statistical modeling
We establish a formal framework to interpret the ordinal responses Y of table 1,
conditional on the choice process of figure 1 and on working conjectures about the effects
of both social strata and levels. The model is structured in two main parts: the
"manifest (observation) model" (1) and the underlying "belief carrier model"
(2-3-4). Working conjectures are depicted (see footnote 3) over the belief carrier model
(instead of directly over the manifest model (1); see also Agresti et al., 1989), up to
a certain degree of uncertainty which is (in our perception) physiological and
natural in social sciences.
Within the "belief container" model (2), which is a product of Dirichlet distributions used
as a formal "blackboard", the "belief carrier model" is
hierarchically structured to incorporate assumptions on the choice process of figure 1
(eq. 3) and, nested within them, specific working conjectures about
latent social and level effects (eq. 4) over the process.
From the observed responses, information passes through the manifest model
(1) to be then interpreted in the light of the belief model (2-3-4).
2.1 The manifest model
$$Y_i \mid \psi_i \overset{ind.}{\sim} \mathrm{Mult}(y_i; \psi_i, n_i), \quad i = 1, \dots, R \qquad (1)$$
Here, $Y_i$, with observed counts $(y_{i1}, \dots, y_{iL})$, is the (row-)profile (L := 5)
associated with the i-th of the R (R := 10) strata of the data in table 1.
$\mathrm{Mult}(y_i; \psi_i, n_i)$ denotes the multinomial distribution with parameters
$\psi_i := (\psi_{i1}, \psi_{i2}, \dots, \psi_{iL})$,
where $\psi_{il} := \Pr\{Y_i = \text{"}l\text{"}\}$, $l := 1, \dots, L$, and size $n_i$.
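As a concrete reading of the manifest model (1), the following sketch stores the counts of table 1 as one row per stratum and evaluates the multinomial log-probability of a row for a given profile ψ_i. The names `TABLE1`, `row_totals`, `sample_proportions` and `multinomial_logpmf` are hypothetical, and only the Python standard library is used.

```python
import math

# Counts of table 1: one row per stratum (Male 45-55, ..., Female 85-110),
# columns None(1), Mild(2), Moderate(3), Severe(4), Very severe(5).
TABLE1 = [
    [17, 160, 164, 9, 1],
    [11, 108, 165, 23, 6],
    [7, 64, 181, 44, 3],
    [2, 18, 99, 30, 11],
    [0, 0, 20, 7, 4],
    [2, 139, 182, 15, 2],
    [8, 91, 197, 40, 5],
    [3, 43, 215, 51, 10],
    [3, 17, 131, 58, 9],
    [1, 3, 40, 32, 6],
]


def row_totals(table):
    """Stratum sizes n_i."""
    return [sum(row) for row in table]


def sample_proportions(row):
    """Observed relative frequencies y_il / n_i of one stratum."""
    n = sum(row)
    return [count / n for count in row]


def multinomial_logpmf(y, psi):
    """log Mult(y; psi, n) with n = sum(y); cells with zero count add nothing."""
    n = sum(y)
    out = math.lgamma(n + 1)
    for y_l, psi_l in zip(y, psi):
        out -= math.lgamma(y_l + 1)
        if y_l > 0:
            out += y_l * math.log(psi_l)
    return out
```

For instance, `multinomial_logpmf(TABLE1[0], sample_proportions(TABLE1[0]))` evaluates the first stratum at its observed proportions.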
Footnote 3. Conjectures are structured by constraining the Dirichlet distributions in (2) to
have expected values which fulfill specified pattern-conjectures.
2.2 The belief carrier model
Let $\mathrm{Dirich}(\psi_i; m_i, a_i)$ denote the Dirichlet distribution associated with the
i-th stratum, where $m_i := (m_{i1}, m_{i2}, \dots, m_{iL}) := E[\Psi_i]$ and $a_i > 0$ is
the usual scalar hyper-parameter (see footnote 4). The belief-carrier model is structured as
follows:
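The belief-carrier "container":
$$\Psi_i \mid m_i, a_i \overset{ind.}{\sim} \mathrm{Dirich}(\psi_i; m_i, a_i), \quad i := 1, \dots, R, \qquad (2)$$
$$m_i := (m_{i1}, m_{i2}, \dots, m_{iL}), \quad 0 < m_{i1} < 1, \quad \sum_{r=1}^{L} m_{ir} = 1, \quad a_i := w_i, \quad w_i > 0$$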
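Conjecture on the binary choice process (figure 1):
$$v_{i2} := \frac{m_{i1}}{m_{i1} + m_{i2}} = \frac{e^{\eta_{i1}}}{1 + e^{\eta_{i1}}},$$
$$\dots$$
$$v_{il} := \frac{m_{i1} + \dots + m_{i(l-1)}}{m_{i1} + m_{i2} + \dots + m_{il}} = \frac{e^{\eta_{i(l-1)}}}{1 + e^{\eta_{i(l-1)}}},$$
$$\dots$$
$$v_{iL} := \frac{m_{i1} + \dots + m_{i(L-1)}}{m_{i1} + m_{i2} + \dots + m_{iL}} = \frac{e^{\eta_{i(L-1)}}}{1 + e^{\eta_{i(L-1)}}} \qquad (3)$$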
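Equations (3) can be inverted to recover the expected profile m_i from the linear predictors. Writing C_l := m_{i1} + ... + m_{il}, with C_L = 1, the definition v_{il} = C_{l-1} / C_l gives C_{l-1} = v_{il} C_l, so the cumulative sums follow by backward recursion. The sketch below is one possible implementation; the function names are hypothetical.

```python
import math


def expit(t):
    """Logistic function e^t / (1 + e^t)."""
    return 1.0 / (1.0 + math.exp(-t))


def means_from_eta(eta):
    """Map (eta_1, ..., eta_{L-1}) to (m_1, ..., m_L) through eq. (3).

    eta[k] is the linear predictor attached to v_{k+2}, i.e. the ratio
    (m_1 + ... + m_{k+1}) / (m_1 + ... + m_{k+2}).
    """
    v = [expit(e) for e in eta]
    n_levels = len(eta) + 1
    cum = [0.0] * n_levels          # cum[j] = m_1 + ... + m_{j+1}
    cum[n_levels - 1] = 1.0         # the whole profile sums to one
    for j in range(n_levels - 1, 0, -1):
        cum[j - 1] = v[j - 1] * cum[j]
    # Differences of consecutive cumulative sums give the cell means.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, n_levels)]
```

With all predictors equal to zero every ratio is 1/2, so for L = 3 the profile is (0.25, 0.25, 0.5).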
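Working conjecture about the latent phenomenon (see footnote 5) which governs the choice process:
$$\eta_{il} = \mu_0 + \sum_{s=3}^{L} \delta_l \, I_{(s=l)} + \beta^{Sex}_{(l-1)} \cdot I_{(Sex(i)=\text{"Fem."})} + \sum_{r=2}^{K} \beta^{Age}_{r(l-1)} \cdot I_{(Age(i)=r)}, \qquad (4)$$
$$i := 1, \dots, R, \quad l := 2, \dots, L$$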
Let $\gamma := (\mu_0, \delta, \beta)$ denote the full profile of parameters in (4), which are
implicitly interpreted through table 2.
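To make the reading of γ through table 2 concrete, the sketch below assembles the linear predictor that governs v_{il} for a given severity level, sex and age class. The numerical values are those reported in table 3; keying the dictionaries by the severity level l (rather than by the subscripts used in table 2) and the function name are conveniences of this sketch.

```python
# Estimates as reported in table 3, keyed by severity level l = 2, ..., 5.
MU0 = -2.3912506
DELTA = {3: 2.4975252, 4: 5.9474365, 5: 7.6910955}            # delta_1, delta_2, delta_3
BETA_SEX = {2: -0.7163176, 3: 0.2412238, 4: 0.2135347, 5: 0.5682983}
BETA_AGE = {  # BETA_AGE[level][age class r], r = 2 (55-65), ..., 5 (85-110)
    2: {2: 1.8580490, 3: -1.6587429, 4: -1.6096750, 5: -1.4365701},
    3: {2: -0.3793121, 3: -2.8332706, 4: -2.1834729, 5: -1.6056114},
    4: {2: -0.4271680, 3: -0.4351274, 4: -2.7817188, 5: -2.6903340},
    5: {2: -1.1240745, 3: -1.1023277, 4: 0.4742685, 5: -3.6125219},
}


def linear_predictor(level, female, age_class):
    """Cell (i, l) of table 2: the effects which govern v_il.

    level: severity level l in 2..5; female: True for Sex = female;
    age_class: 1 (45-55, reference) to 5 (85-110).
    """
    eta = MU0 + DELTA.get(level, 0.0)       # no level effect for l = 2
    if female:
        eta += BETA_SEX[level]
    if age_class >= 2:                      # age class 1 is the reference
        eta += BETA_AGE[level][age_class]
    return eta
```

For the reference stratum (male, 45 - 55) the predictor at level 3 reduces to µ0 + δ1.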
Notes.
• Across strata i := 1, ..., R, the parameter profiles $\psi_i := (\psi_{i1}, \psi_{i2}, \dots, \psi_{iL})$ of (1)
are considered as "virtual" random vectors which are conditionally independent.
The parametric class of products of Dirichlet priors provides a convenient coordinate
space $\{(m_i, a_i), i := 1, \dots, R\}$ to depict (see sketch in figure 2) working
conjectures as geometric sub-manifolds.
• Associated with the choice process (figure 1), the parameter
$v_{il} := \frac{m_{i1} + \dots + m_{i(l-1)}}{m_{i1} + \dots + m_{il}}$ is
interpreted as follows. For the i-th social stratum, at current difficulty level l, $v_{il}$
measures the "resistance" of lower severity levels against further aggravation up
to level l. Model (4) depicts the conjecture that $v_{il}$ is driven by sex and age, whose
effects are level-specific, and by the current level l itself.
Footnote 4. It weights the "virtual" number of observations that the prior information would
represent (Agresti et al., 1989).

Footnote 5. In eq. (4), K := 5 denotes the number of categories for Age, $I_{(s=l)}$ the
indicator function which selects the l-th category when s = l, and $I_{(Sex=\text{"Fem."})}$
and $I_{(Age=r)}$, respectively, the indicators which select "Sex = female" and the r-th age
interval.
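Table 2. Interpretation of parameter γ which depicts the working conjecture of eq. (4).
Cell (il) refers to the effects which govern the process parameter $v_{il}$

X: Sex  Age class  Mild ("2")                               Moderate ("3")                                      Severe ("4")                                        Very severe ("5")
Male    45 - 55    $\mu_0$                                  $\mu_0 + \delta_1$                                  $\mu_0 + \delta_2$                                  $\mu_0 + \delta_3$
        55 - 65    $\mu_0 + \beta^{Age}_{21}$               $\mu_0 + \delta_1 + \beta^{Age}_{22}$               $\mu_0 + \delta_2 + \beta^{Age}_{23}$               $\mu_0 + \delta_3 + \beta^{Age}_{24}$
        65 - 75    $\mu_0 + \beta^{Age}_{31}$               $\mu_0 + \delta_1 + \beta^{Age}_{32}$               $\mu_0 + \delta_2 + \beta^{Age}_{33}$               $\mu_0 + \delta_3 + \beta^{Age}_{34}$
        75 - 85    $\mu_0 + \beta^{Age}_{41}$               $\mu_0 + \delta_1 + \beta^{Age}_{42}$               $\mu_0 + \delta_2 + \beta^{Age}_{43}$               $\mu_0 + \delta_3 + \beta^{Age}_{44}$
        85 - 110   $\mu_0 + \beta^{Age}_{51}$               $\mu_0 + \delta_1 + \beta^{Age}_{52}$               $\mu_0 + \delta_2 + \beta^{Age}_{53}$               $\mu_0 + \delta_3 + \beta^{Age}_{54}$
Female  45 - 55    $\mu_0 + \beta^{Sex}_1$                  $\mu_0 + \delta_1 + \beta^{Sex}_2$                  $\mu_0 + \delta_2 + \beta^{Sex}_3$                  $\mu_0 + \delta_3 + \beta^{Sex}_4$
        55 - 65    $\mu_0 + \beta^{Age}_{21} + \beta^{Sex}_1$   $\mu_0 + \delta_1 + \beta^{Age}_{22} + \beta^{Sex}_2$   $\mu_0 + \delta_2 + \beta^{Age}_{23} + \beta^{Sex}_3$   $\mu_0 + \delta_3 + \beta^{Age}_{24} + \beta^{Sex}_4$
        65 - 75    $\mu_0 + \beta^{Age}_{31} + \beta^{Sex}_1$   $\mu_0 + \delta_1 + \beta^{Age}_{32} + \beta^{Sex}_2$   $\mu_0 + \delta_2 + \beta^{Age}_{33} + \beta^{Sex}_3$   $\mu_0 + \delta_3 + \beta^{Age}_{34} + \beta^{Sex}_4$
        75 - 85    $\mu_0 + \beta^{Age}_{41} + \beta^{Sex}_1$   $\mu_0 + \delta_1 + \beta^{Age}_{42} + \beta^{Sex}_2$   $\mu_0 + \delta_2 + \beta^{Age}_{43} + \beta^{Sex}_3$   $\mu_0 + \delta_3 + \beta^{Age}_{44} + \beta^{Sex}_4$
        85 - 110   $\mu_0 + \beta^{Age}_{51} + \beta^{Sex}_1$   $\mu_0 + \delta_1 + \beta^{Age}_{52} + \beta^{Sex}_2$   $\mu_0 + \delta_2 + \beta^{Age}_{53} + \beta^{Sex}_3$   $\mu_0 + \delta_3 + \beta^{Age}_{54} + \beta^{Sex}_4$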
3 The Constrained Fixed Point methodology
3.1 About the CFP setup
The underlying principle of FP, which is one of "least information", is that (D’Epifanio,
1996) "the less a candidate prior is updated (that is, the more insensitive it would be
to the Bayesian updating rule if it had actually been used as prior), the more it
intrinsically already accounted for the information added by current data", conditional
on the assumed model. This leads to a "what-if" type of criterion. Thus, by formally
interpreting this principle, CFP searches for the candidate prior which minimizes the,
properly weighted, "residual from updating":
$$\left\| \mathrm{Vec}\left( E(\Psi \mid y, x; \gamma, w) - E(\Psi \mid x; \gamma, w) \right) \right\| = \min_{\gamma} \qquad (5)$$
Here (see footnote 6), $E(\Psi \mid y, x; \gamma, w)$ is the response-updated predictive
expectation, given y, of the full parameter profile $\Psi$ over the design point x, whereas
$E(\Psi \mid x; \gamma, w)$ is its non-response-updated counterpart. Let
$\Delta_T(\gamma, w) := \mathrm{Vec}\left( E(\Psi \mid y, x; \gamma, w) - E(\Psi \mid x; \gamma, w) \right)$.
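For a single stratum of the Multinomial-Dirichlet model, the two expectations in (5) have a simple closed form: the prior mean is m_i, and the posterior mean is (w_i m_{is} + y_{is}) / (w_i + n_i). The sketch below computes the unweighted residual from updating for one stratum, assuming the Dirichlet is parametrized by its mean m and the scalar weight w; the function names are hypothetical.

```python
def posterior_mean(m, w, y):
    """E(Psi | y) for a Dirichlet prior with mean m and weight w, given counts y."""
    n = sum(y)
    return [(w * m_s + y_s) / (w + n) for m_s, y_s in zip(m, y)]


def residual_from_updating(m, w, y):
    """E(Psi | y) - E(Psi): how far the candidate prior mean is moved by the data."""
    return [p_s - m_s for p_s, m_s in zip(posterior_mean(m, w, y), m)]
```

A candidate prior whose mean already equals the observed proportions is left unchanged by the update, so its residual is zero in every cell.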
Of course, in model (2), the belief-parameters w might be assigned "a priori".
More interestingly, they could be determined by the actual data, to quantify data-dependent
uncertainty. A further criterion is then necessary, complementary
to that of "least updating". Over the coordinate space of the belief-carrier manifold (2),
we would require (D’Epifanio, 1996) that the "virtual information gain"
$\{Var(\Psi \mid x; \gamma, w) - Var(\Psi \mid y, x; \gamma, w)\}$ matches, as far as possible,
the "virtual loss" $Var(\Psi \mid y, x; \gamma, w)$ due to the updating induced by the
response y, given the design point x. In the full FP, there exists a large-sample
interpretation (D’Epifanio, 1996) of this principle, which is related to Fisher’s information.
Footnote 6. For simplicity, we consider here the belief-parameter w as assigned. CFP uses the
actual data (table 1) to determine the point-coordinate (see sketch in figure 2) which
elicits, within the proper coordinate space which depicts the working model, that potential
"candidate prior" which "would be, as far as possible, insensitive to the
Bayesian updating if it had actually been used as prior". It would thus conform, as far as
possible, to the specified (sub-manifold induced by the) working conjecture.
Fig. 2. CFP: geometrical sketch
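Thus, full CFP would search, in the proper coordinate space given the design point
x, for that candidate prior which minimizes the full residual:
$$\left\| \begin{pmatrix} \mathrm{Vec}\left( E(\Psi \mid y, x; \gamma, w) - E(\Psi \mid x; \gamma, w) \right) \\ \mathrm{Vec}\, Var(\Psi \mid y, x; \gamma, w) - \mathrm{Vec}\left\{ Var(\Psi \mid x; \gamma, w) - Var(\Psi \mid y, x; \gamma, w) \right\} \end{pmatrix} \right\| = \min_{\gamma, w} \qquad (6)$$
Let $\Delta_{y;x}$ denote this variation.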
The method is, generally, consistent and efficient. The asymptotics of unrestricted
FP have been delineated in other works (D’Epifanio, 1996, 1999). It is operationally
important to note here that the belief-container model has an
instrumental role, in that convergence (D’Epifanio, 1999) to maximum likelihood estimates
(under general regularity assumptions, when the conjectured model
is adequate) is very fast irrespective of the choice of some specific class of "candidate
priors". Thus, without relevant arbitrariness, a criterion may be considered which
uses convenient belief-container models, so that updating rules are as simple,
easy to implement and computationally efficient as possible.
3.2 Specific CFP formulation for categorical data
By adapting the general criterion (5) to categorical data (see footnote 7), we would search
for $\gamma^*$ such that, given $\gamma^*$, the "weighted residual from updating"
$$\sum_{i=1}^{R} \sum_{s=1}^{S} \left\{ \frac{E(\Psi_{is} \mid y, x; \gamma, w_i) - E(\Psi_{is} \mid x; \gamma, w_i)}{\sqrt{Var_{\gamma^*}[E(\Psi_{is} \mid y, x; \gamma^*, w_i)]}} \right\}^2 = \min_{\gamma} \qquad (7)$$
is minimized by $\gamma^*$ itself. Here, $Var_{\gamma^*}[\cdot]$ is intended as the predictive
variance given $\gamma^*$.
Notes.
• Using the manifest model (1) and the belief-container (2) (the Multinomial-Dirichlet
model), it is well known (O’Hagan, 1994) that analytic expressions exist which
provide exact calculations of the objects in (7). In particular, for the i-th stratum, recall
that
$$Var_{\gamma}[E(\Psi_{is} \mid y, x; \gamma, w_i)] = \{m_{is}(1 - m_{is})\}(\gamma) \left\{ \frac{1}{1 + w_i} - \frac{1}{1 + w_i + n_i} (1 - A_i^2) \right\},$$
where $A_i := \frac{n_i}{w_i + n_i}$ is the shrinkage effect.
• Let $E_{\gamma^*}[\cdot]$ denote predictive expectation. Use the identity
$$Var_{\gamma^*}[E(\Psi_{is} \mid y, x; \gamma^*, w_i)] = Var[\Psi_{is}; x, \gamma^*, w_i] - E_{\gamma^*}[Var(\Psi_{is} \mid y, x; \gamma^*, w_i)],$$
so that the weights in (7) may be interpreted (sect. 3.1) as the reciprocal of the "expected
virtual information gain due to updating".
• Suppose that a coordinate $\gamma^*$ exists such that
$E(\Psi_i \mid y, x; \gamma^*, w) \approx E(\Psi_i \mid x; \gamma^*, w)$. Then, we could check
that the "virtual information gain"
$\{Var(\Psi_i \mid x; \gamma, w_i) - Var(\Psi_i \mid y, x; \gamma, w_i)\}$ approximately matches
the "virtual cost" (sect. 3.1) $Var(\Psi_i \mid y, x; \gamma, w_i)$ whenever
$w_i := w_i^* = n_i - 1$, where $n_i$ is the sample size of the i-th stratum. Thus, the weights
in (7) are proportional, up to a constant, to the inverse of the predicted cell variance.

Footnote 7. This mimics a structural large-sample property of the hypothetical "true" value
$\gamma_0$, which would hold if the same structural equations (3-4) had been used to directly
model the parameters of the manifest model (1).
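The identity in the second note and the balance condition in the third note can be checked numerically in the two-category case, where the Multinomial-Dirichlet model reduces to the Beta-Binomial. The sketch below is an assumption-laden illustration, not the paper's procedure: it enumerates the predictive distribution exactly, uses the standard Beta variance m(1 − m)/(1 + w) for a prior with mean m and weight w, and compares the result with the closed form m(1 − m) n / ((1 + w)(w + n)), which is derived here from the standard Beta-Binomial variance. All function names are hypothetical.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """Predictive probability of k successes out of n under a Beta(a, b) prior."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def variance_of_posterior_mean(m, w, n):
    """Var[E(Psi | y)] by exact enumeration over the predictive distribution."""
    a, b = w * m, w * (1.0 - m)
    pairs = [(beta_binomial_pmf(k, n, a, b), (a + k) / (w + n)) for k in range(n + 1)]
    mean = sum(p * pm for p, pm in pairs)
    return sum(p * (pm - mean) ** 2 for p, pm in pairs)


def prior_var_minus_expected_posterior_var(m, w, n):
    """Var(Psi) - E[Var(Psi | y)], the right-hand side of the identity."""
    a, b = w * m, w * (1.0 - m)
    prior_var = m * (1.0 - m) / (1.0 + w)
    expected_post_var = 0.0
    for k in range(n + 1):
        pm = (a + k) / (w + n)
        expected_post_var += beta_binomial_pmf(k, n, a, b) * pm * (1.0 - pm) / (1.0 + w + n)
    return prior_var - expected_post_var


def closed_form(m, w, n):
    """m(1 - m) n / ((1 + w)(w + n)), from the Beta-Binomial variance of y."""
    return m * (1.0 - m) * n / ((1.0 + w) * (w + n))


def gain_and_loss(m, w, n):
    """Virtual gain and loss when the posterior mean is (approximately) m."""
    prior_var = m * (1.0 - m) / (1.0 + w)
    post_var = m * (1.0 - m) / (1.0 + w + n)
    return prior_var - post_var, post_var
```

When the posterior mean stays at m, gain equals loss exactly when 1 + w + n = 2(1 + w), that is, when w = n − 1, which is the choice of w used in the third note.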
3.3 The computational process
Let $\mathcal{P}(\gamma, w) := \left\langle \frac{\partial}{\partial(\gamma, w)}, \left[ \frac{\partial}{\partial(\gamma, w)} \right]^t \right\rangle^{-1} \cdot \left[ \frac{\partial}{\partial(\gamma, w)} \right]$
denote the coordinate projector of the full variation $\Delta_{y;x}(\gamma, w)$ (sect. 3.1)
upon the tangent space of the sub-manifold at $p(\gamma, w)$. Here,
$\left[ \frac{\partial}{\partial(\gamma, w)} \right]$ denotes the basic coordinate
(row-)vector system and $\langle \cdot \rangle$ the usual inner product. The operator
$$(\gamma, w) \longmapsto \mathcal{P}(\gamma, w)[W^{-1/2} \Delta_{y;x}](\gamma, w)$$
is a vector field which yields a vector over the tangent space at the coordinate-point
$(\gamma, w)$. This vector field induces a dynamic over the coordinate space, which yields
the following iterative process:
$$(\gamma, w)^{(q+1)} = (\gamma, w)^{(q)} + \rho \cdot \mathcal{P}((\gamma, w)^{(q)})[W^{-1/2} \Delta_{y;x}](m, \Sigma)((\gamma, w)^{(q)}).$$
Here, $\rho$ denotes the step-length. Due to the non-linearity, $\rho$ should be sufficiently
small to ensure convergence. Provided this process converges, the convergence-point
satisfies the orthogonality equation $\mathcal{P}(\gamma, w)[W^{-1/2} \Delta_{y;x}(\gamma, w)] = 0$.
The convergence-point is a CFP solution, provided one checks that the process progressively
reduces distances.
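As a toy illustration of the damped iteration above, the following sketch treats a deliberately simplified case that is not taken from the paper: binary responses (L = 2), an intercept-only conjecture m_i = expit(γ) shared by all strata, belief weights held fixed at w_i = n_i − 1, and weights computed from the Beta-Binomial predictive variance m(1 − m) A_i / (1 + w_i). The function names and the default step-length ρ = 1 are assumptions of the sketch.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def projected_step(gamma, y, n):
    """Coordinate of the weighted residual projected on the tangent direction."""
    m = expit(gamma)
    num = 0.0
    den = 0.0
    for y_i, n_i in zip(y, n):
        w_i = n_i - 1.0                              # fixed belief weight
        a_i = n_i / (w_i + n_i)                      # shrinkage effect A_i
        delta_i = a_i * (y_i / n_i - m)              # posterior mean - prior mean
        var_i = m * (1.0 - m) * a_i / (1.0 + w_i)    # predictive variance (assumed form)
        c_i = 1.0 / math.sqrt(var_i)                 # weight W^{-1/2}
        t_i = c_i * m * (1.0 - m)                    # weighted tangent, d m / d gamma
        num += t_i * c_i * delta_i
        den += t_i * t_i
    return num / den


def cfp_toy(y, n, gamma0=0.0, rho=1.0, n_iter=200):
    """Iterate gamma <- gamma + rho * projected residual for a fixed number of steps."""
    gamma = gamma0
    for _ in range(n_iter):
        gamma += rho * projected_step(gamma, y, n)
    return gamma


# "None" counts and stratum sizes of the first three male strata of table 1.
Y_NONE = [17, 11, 7]
N_STRATA = [351, 313, 299]
GAMMA_HAT = cfp_toy(Y_NONE, N_STRATA)
```

In this toy case the projected residual vanishes exactly where expit(γ) equals the pooled proportion Σ y_i / Σ n_i, which is also the ML estimate, so the fixed point and the ML solution coincide.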
4 Results of elaborations
Using the data of table 1, conditional on model (1-2-3-4), a Splus procedure (which is easy
to implement and very fast) was used to calculate estimates of the parameters
$\gamma$ given $w_i^* = n_i - 1$. In principle, from a frequentist perspective, the sampling
reliability of the estimates $\gamma^*$ may be evaluated through a simulation design which uses
the probabilistic model (1-2-3-4) given $\gamma^*$. Given the estimates $\gamma^*$, the
recovered and predicted tables are reported in figure 3. To help in accurate diagnostics, the
cell-specific "weighted residuals from updating" are reported. In addition, stratum-specific
chi-square-like measures are also reported. Here, for each stratum i, we calculated
$$\chi_i^2 := \sum_{r=1}^{L} \frac{\{y_{ir} - n_i \cdot E(\Psi_{ir}; \gamma^*, w_i^*)\}^2}{n_i \cdot E(\Psi_{ir}; \gamma^*, w_i^*)}, \quad i := 1, \dots, R.$$
Then, the global adequacy measure is $\chi^2 = \sum_{i=1}^{R} \chi_i^2 = 25.43357$. Therefore,
the conjectured model (3-4) seems reasonable to depict the "true" latent phenomena, although
perhaps some attention should be devoted to age in (65-75].
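The stratum-specific and global chi-square-like measures can be computed as in the following sketch, where `fitted` stands for the profile of expectations E(Ψ_ir; γ*, w_i*) of one stratum. The function names are hypothetical, and the fitted profiles must be supplied by the user.

```python
def chi_square_stratum(y, fitted):
    """Sum over cells of (y_r - n * e_r)^2 / (n * e_r), with n = sum(y)."""
    n = sum(y)
    return sum((y_r - n * e_r) ** 2 / (n * e_r) for y_r, e_r in zip(y, fitted))


def chi_square_total(table, fitted_table):
    """Global adequacy measure: sum of the stratum-specific measures."""
    return sum(chi_square_stratum(y, e) for y, e in zip(table, fitted_table))
```

Each stratum contributes its own term, so large individual values point to the strata where the conjectured model fits worst.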
Table 3. Estimates of parameters γ of model (4)

effects        base-line                 level 3                   level 4                   level 5
               $\mu_0 = -2.3912506$      $\delta_1 = 2.4975252$    $\delta_2 = 5.9474365$    $\delta_3 = 7.6910955$

effects        level 2                           level 3                           level 4                           level 5
sex=female     $\beta^{Sex}_1 = -0.7163176$      $\beta^{Sex}_2 = 0.2412238$       $\beta^{Sex}_3 = 0.2135347$       $\beta^{Sex}_4 = 0.5682983$
age: 55 - 65   $\beta^{Age}_{21} = 1.8580490$    $\beta^{Age}_{22} = -0.3793121$   $\beta^{Age}_{23} = -0.4271680$   $\beta^{Age}_{24} = -1.1240745$
age: 65 - 75   $\beta^{Age}_{31} = -1.6587429$   $\beta^{Age}_{32} = -2.8332706$   $\beta^{Age}_{33} = -0.4351274$   $\beta^{Age}_{34} = -1.1023277$
age: 75 - 85   $\beta^{Age}_{41} = -1.6096750$   $\beta^{Age}_{42} = -2.1834729$   $\beta^{Age}_{43} = -2.7817188$   $\beta^{Age}_{44} = 0.4742685$
age: 85 - 110  $\beta^{Age}_{51} = -1.4365701$   $\beta^{Age}_{52} = -1.6056114$   $\beta^{Age}_{53} = -2.6903340$   $\beta^{Age}_{54} = -3.6125219$
Fig. 3. Results of elaborations and diagnostics.
References
[A90] Agresti, A.: Categorical Data Analysis. Wiley, New York (1990)
[AC89] Agresti, A., Chiang, C.: Model-Based Bayesian Methods for Estimating Cell
Proportions in Cross-Classification Table having Ordered Categories. Computational Statistics & Data Analysis, 7, 245-258 (1989)
[CR02] Casella, G., Robert, C.P.: Monte Carlo Statistical Methods (third printing).
Springer, New York (2002)
[DG96] D’Epifanio, G.: Notes on A Recursive Procedure for Point Estimation. Test,
5, N. 1, 1-24 (1996)
[DG99] D’Epifanio, G.: Properties of a fixed point method. Annales de L’ISUP, Vol.
XXXXIII, Fasc. 2-3, 69-83 (1999)
[DG04] D’Epifanio, G.: Data Dependent Prior Modeling and Estimation in Contingency Tables. The Order-Restricted RC Model. In: Vichi M. et al. (ed) Cladag
2003, Studies in Classification, Data Analysis, and Knowledge Organization.
Springer-Verlag, Berlin Heidelberg New York (2004)
[LM05] Lunt, M.: Prediction of ordinal outcomes when the association between
predictors and outcome differs between outcome levels. Statistics in Medicine,
24, 1357-1369 (2005)
[OA94] O’Hagan, A.: Bayesian Inference. Kendall’s Advanced Theory of Statistics,
Vol. 2b, John Wiley & Sons, New York (1994)
[R93] Robert, C.P.: Prior Feedback: a Bayesian Approach to Maximum Likelihood
Estimation. J. Computational Statistics, 8, 279-294 (1993)