A General Sample Selection model with Skew-normal distribution

Emmanuel O. Ogundimu and Jane L. Hutton
Department of Statistics, University of Warwick, UK.
{E.O.Ogundimu, J.L.Hutton}@warwick.ac.uk
14/03/12
Abstract
Scores arising from questionnaires often follow asymmetric distributions, on a fixed
range. This can be due to scores clustering at one end of the scale or selective reporting.
Sometimes the scores are further subject to sample selection, so that they are only partially
observed. Thus, methods based on complete cases for skew data are inadequate for
the analysis of such data and a general sample selection model is required. Heckman
(1976) proposed a full maximum likelihood estimation method under the normality
assumption for sample selection problems, and parametric and non-parametric extensions have been proposed. We develop a general sample selection model with underlying
skew-normal distribution. A link is established between the continuous component of
our model log-likelihood function and an extended version of a generalized skew-normal
distribution. This link is used to derive the expected value of the model, which extends
Heckman's two-stage model. Finite sample performance of the maximum likelihood estimator of the model is studied via Monte-Carlo simulation. In the presence of moderate to extreme skewness, the model parameters are
estimated more precisely under the new model than under the Heckman selection models. The model is applied to data from
a study of neck injuries where the responses are substantially skew. Computational and
identification issues are discussed.
CRiSM Paper No. 12-05, www.warwick.ac.uk/go/crism
1 Introduction
For certain diseases, the patient’s perception of his or her well-being may be the most important outcome of interest. These are broadly termed quality of life (QoL) outcomes when
patient-reported outcome is measured. Scores arising from instruments designed to assess
QoL (e.g. screening questionnaires) often follow asymmetric distributions due to skewness
inherent in the Likert-scale type instruments. Indeed, skewness related studies are not uncommon in psychology literature due to the use of such instruments. Apart from this, the
realized samples from the underlying discrete process are further subjected to selective reporting and missing data, with the scores reflecting a selected population. Consequently,
there is a need for a general sample selection model that accommodates inherent skewness.
If a sample selection approach is taken to item nonresponse in questionnaires, the data
are assumed to be missing not at random (MNAR). This assumption is more realistic than
the missing at random (MAR) missing data mechanism assumption. For instance, patients
may refuse to answer sensitive questions (e.g. underlying health issues, drug addiction)
on a questionnaire for reasons related to the underlying true values for those questions.
In multivariate settings with arbitrary patterns of nonresponse, the MAR assumption is
convenient computationally, but it is often implausible (Robins and Gill, 1997). In this
setting, MAR means that a patient’s probabilities of responding to items may depend only
on his or her own set of observed items, which is an unreasonable assumption. However,
when we suspect that nonresponse may depend on missing values, then a proper analysis will
be to model jointly the population of complete data and the nonresponse process. Selection
models are therefore a viable tool.
A selection model was introduced by Heckman (1976), who proposed full maximum
likelihood estimation under the assumption of normality. The method was criticized for
its sensitivity to the normality assumption, which prompted him to develop the two-step
estimator (Heckman, 1979). Sample selection models, also referred to as models with incidental (hidden) truncation, arise in practice as a result of the partial observability of the
outcome of interest in a study. The data are missing not at random (MNAR) because the
observed data do not represent a random sample from the population, even after controlling
for covariates. Although the model originated in economics, it has been
applied extensively in other fields such as finance, sociology and political science, but sparingly
in medical research. A prominent application to treatment allocation for patients, and links
with the skew-normal distribution, were discussed by Copas and Li (1997).
The two most common deviations from normality are heavier tails and skewness. In
dealing with heavier tails in sample selection, Marchenko and Genton (2011) derived a model
using links between hidden truncation and sample selection, but with an underlying bivariate-t error distribution. They noted that a more flexible parametric model, one that can
accommodate these two deviations from normality simultaneously, still needed to be considered. A
skew-normal distribution could be a good candidate.
A continuous random variable Z is said to have a standard skew-normal distribution with
parameter λ ∈ R if its density is
$$
f(z; \lambda) = 2\phi(z)\Phi(\lambda z), \qquad z \in \mathbb{R}, \tag{1.1}
$$
where φ and Φ denote the standard normal density and corresponding distribution function
respectively. The component λ is called the shape parameter because it regulates the shape
of the density function. Although the tail behaviour of skew-normal distribution is similar
to the normal, its lower tail behavior becomes heavier with increase in truncation intensity.
An added advantage of using the skew normal distribution is that the model will contain the
normal one as a special case and a comparison of the two models can be used to study the
degree of deviation from normality.
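The density in (1.1) is easy to evaluate directly. The following is a minimal sketch in Python (standard library only); the helper names `phi`, `Phi`, `sn_pdf` and `simpson` are illustrative and not part of the paper. It checks numerically that the density integrates to one and that λ = 0 recovers the standard normal density.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sn_pdf(z, lam):
    """Standard skew-normal density of equation (1.1)."""
    return 2.0 * phi(z) * Phi(lam * z)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

# Area under the density and its mean for a few shape parameters.
# The range [-10, 10] is an assumption that stands in for the real line.
for lam in (0.0, 1.0, 5.0):
    area = simpson(lambda z: sn_pdf(z, lam), -10.0, 10.0)
    mean = simpson(lambda z: z * sn_pdf(z, lam), -10.0, 10.0)
    print(lam, area, mean)
```

The printed mean can be compared with the known skew-normal mean, sqrt(2/π) λ / sqrt(1 + λ²), which also appears in the moment equations of section 3.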
The article is organized as follows. In section 2, we describe the Copas and Li (1997)
model in relation to the general hidden truncation model formulation of Arnold and Beaver
(2002). Motivation for using the skew-normal as underlying process when Likert-scale type
questionnaires are used in medical research is discussed in section 3. In section 4, a new
model is derived using the general formulation of skew distributions arising from selection
and linked with the hidden truncation formulation of the model. The finite sample performance of
this model is studied. The model is applied to real data in section 5 and conclusions are
given in section 6.
2 Copas and Li (1997) Selection Model
In this section, we formulate the Copas and Li (1997) model from the unified framework
of skew-distributions arising from selections (Arellano-Valle et al., 2006). A link with the
hidden truncation formulation given by Arnold and Beaver (2002) is established. A general
representation of various classes of skew-normal distribution by using the closed skew-normal
distribution is presented.
2.1 Copas and Li (1997) model: Skew distributions arising from selection approach
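Let $Y^*$ be the outcome variable of interest, assumed linearly related to covariates $x_i$ through
the standard multiple regression
$$
Y_i^* = \beta' x_i + \sigma\varepsilon_{1i}, \qquad \varepsilon_{1i} \sim N(0, 1), \qquad i = 1, \ldots, N.
$$
Suppose the main model is supplemented by a selection (missingness) equation
$$
S_i^* = \gamma' x_i + \varepsilon_{2i}, \qquad \varepsilon_{2i} \sim N(0, 1), \qquad i = 1, \ldots, N,
$$
where β and γ are unknown parameters and $x_i$ are fixed observed characteristics not subject
to missingness. Suppose further that
$$
\begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{pmatrix}
\sim N_2\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).
$$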
Note that the variance of $S_i^*$ is set to 1 because only its sign is observed and the variance
is not identifiable in the model. It is assumed that $Y_i^*$ and $S_i^*$ are correlated with parameter
ρ in the underlying process. The parameter ρ ∈ [−1, 1] determines the severity of the selection
process. Due to the selection, $Y_i^*$ is observed only when $S_i^* > 0$ (the 0 threshold is arbitrary since no symmetry
is assumed), giving n observations out of N; i.e. $s_i = I(S_i^* > 0)$ and
$Y_i = Y_i^* s_i$. Thus the observation is on the conditional density
$$
f(y \mid x, S^* > 0) = \frac{f(y, x, S^* > 0)}{P(S^* > 0)} = \frac{P(S^* > 0 \mid y, x)\, f(y, x)}{P(S^* > 0)}. \tag{2.2}
$$
Equation (2.2) is the basis of the unification of selection problems as skew distributions
given by Arellano-Valle et al. (2006). It is straightforward to see that
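$$
f(y \mid x, S = 1) = \frac{\dfrac{1}{\sigma}\,\phi\left(\dfrac{y - \beta'x}{\sigma}\right)
\Phi\left(\dfrac{\gamma'x + \rho\,\dfrac{y - \beta'x}{\sigma}}{\sqrt{1 - \rho^2}}\right)}{\Phi(\gamma'x)}, \tag{2.3}
$$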
where φ and Φ are as defined in (1.1).
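As an illustration of (2.3), the sketch below evaluates the selected-sample density for one covariate pattern and checks numerically that it integrates to one. It is a minimal Python example, not code from the paper; `mu` stands for β'x and `gx` for γ'x, and the parameter values are arbitrary assumptions.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def selected_density(y, mu, sigma, gx, rho):
    """Equation (2.3), with mu = beta'x and gx = gamma'x."""
    z = (y - mu) / sigma
    skew = Phi((gx + rho * z) / math.sqrt(1.0 - rho * rho))
    return phi(z) / sigma * skew / Phi(gx)

# Arbitrary illustrative values (assumptions, not estimates from the paper).
mu, sigma, gx, rho = 1.0, 2.0, -0.3, 0.6
area = simpson(lambda y: selected_density(y, mu, sigma, gx, rho),
               mu - 12.0 * sigma, mu + 12.0 * sigma)
print(area)
```

Setting `rho = 0` makes the ratio of the two Φ terms equal to one, so the function returns the N(β'x, σ²) density, in line with the MAR discussion that follows.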
Model (2.3) can be related to the missing data mechanisms defined by Rubin (1976). Data
are missing completely at random (MCAR) when the probability of missing data on the
response variable Y is unrelated both to the measured covariates and to Y itself.
If the non-intercept terms in γ, as well as ρ, are 0 in (2.3), the data are MCAR. A complete
case analysis without any covariate adjustment will then give valid inference.
Data are missing at random (MAR) when the probability of missing data for Y is related
to other measured covariates in the analysis model but not to the values of Y. Thus,
if ρ = 0 in (2.3) the data are MAR, and valid inference about the conditional distribution
of Y given x can be made from the complete cases when covariates are used to adjust for
the missing data. The third missing data mechanism is the missing not at random (MNAR)
mechanism. Data are MNAR when the probability of missing data on Y is related to the
values of Y itself, even after adjusting for covariates. If ρ ≠ 0 in (2.3) then the data
are MNAR. In this case, the missing data process is said to be informative or non-ignorable:
it needs to be accounted for in order to
arrive at valid inference.
The complete density of a sample selection model comprises a continuous component
(the conditional density given by (2.3)) and a discrete component given by P(S). The
distribution of the discrete component determines the nature of the model to be fitted to the
selection process. In Copas and Li (1997), the model $P(S = s) = \{\Phi(\gamma'x)\}^{s}\{1 - \Phi(\gamma'x)\}^{1-s}$
(i.e. a probit model) is used. The log-likelihood function is therefore
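$$
\begin{aligned}
l(\beta, \sigma, \gamma, \rho) &= \sum_{i=1}^{N} s_i \ln f(y_i \mid x_i, S_i = 1) + \sum_{i=1}^{N} s_i \ln \Phi(\gamma' x_i) + \sum_{i=1}^{N} (1 - s_i) \ln \Phi(-\gamma' x_i) \\
&= -\frac{n}{2} \ln 2\pi - \frac{n}{2} \ln \sigma^2 - \frac{1}{2} \sum_{i=1}^{N} s_i \frac{(y_i - \beta' x_i)^2}{\sigma^2}
+ \sum_{i=1}^{N} s_i \ln \Phi\left(\frac{\gamma' x_i + \rho\,\dfrac{y_i - \beta' x_i}{\sigma}}{\sqrt{1 - \rho^2}}\right) \\
&\quad + \sum_{i=1}^{N} (1 - s_i) \ln \Phi(-\gamma' x_i),
\end{aligned} \tag{2.4}
$$
where $n = \sum_{i=1}^{N} s_i$ is the number of observed outcomes.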
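The log-likelihood (2.4) can be coded in a few lines. The sketch below is an illustrative Python version, not the authors' implementation; the data layout (a list of `(y, x, s)` triples, with `y` ignored when `s == 0`) and all numerical values are assumptions made for the example.

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_likelihood(beta, sigma, gamma, rho, data):
    """Selection-normal log-likelihood of equation (2.4).

    data is a list of (y, x, s) triples; x is a sequence of covariates
    shared by the outcome and selection equations (an assumption here).
    """
    total = 0.0
    for y, x, s in data:
        gx = sum(g * v for g, v in zip(gamma, x))
        if s == 1:
            z = (y - sum(b * v for b, v in zip(beta, x))) / sigma
            # Normal part: -0.5*ln(2*pi) - 0.5*ln(sigma^2) - 0.5*z^2.
            total += -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z
            # Selection correction for an observed outcome.
            total += math.log(Phi((gx + rho * z) / math.sqrt(1.0 - rho * rho)))
        else:
            # Unobserved outcome contributes P(S* <= 0) only.
            total += math.log(Phi(-gx))
    return total

# Hypothetical toy data: two observed outcomes and one missing outcome.
data = [(1.2, (1.0, 0.5), 1), (0.4, (1.0, -1.0), 1), (None, (1.0, 0.2), 0)]
print(log_likelihood((0.5, 1.0), 1.5, (0.3, 0.8), 0.4, data))
```

In practice this function would be passed (with its sign reversed) to a numerical optimizer; constraining σ > 0 and |ρ| < 1, for example through log and inverse hyperbolic tangent transformations, is a common design choice rather than something prescribed by (2.4).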
2.2 Copas and Li (1997) model: Hidden truncation formulation
The continuous component of the sample selection density given by (2.3) has a link with the
extended skew-normal distribution. Let $\mu = \beta'x$, $\lambda_0 = \gamma'x/\sqrt{1-\rho^2} \in \mathbb{R}$ and $\lambda_1 = \rho/\sqrt{1-\rho^2} \in \mathbb{R}$ in
(2.3); we then have the pdf written in the usual extended skew-normal form:
$$
f(y; \mu, \sigma^2, \lambda_0, \lambda_1) = \frac{\phi\left(\dfrac{y-\mu}{\sigma}\right)\Phi\left(\lambda_0 + \lambda_1\dfrac{y-\mu}{\sigma}\right)}{\sigma\,\Phi\left(\dfrac{\lambda_0}{\sqrt{1+\lambda_1^2}}\right)}, \tag{2.5}
$$
where λ0 and λ1 are shift and scale parameters respectively (see Capitanio et al. (2003)).
The Azzalini skew-normal distribution is recovered when λ0 = 0.
Now, equation (2.5) and hence equation (2.3) can be readily derived using the hidden
truncation formulation of Arnold and Beaver (2002). The idea is as follows: Suppose Y and
S are two independent random variables, not necessarily normal (the distribution of Y and
S can be different). Assume Y has density (distribution) function ψ1 (Ψ1 ) and S has density
(distribution) function ψ2 (Ψ2 ). The conditional density of Y |λ0 + λ1 Y > S is
$$
f(y; \lambda_0, \lambda_1) = \frac{\psi_1(y)\,\Psi_2(\lambda_0 + \lambda_1 y)}{P(\lambda_0 + \lambda_1 Y > S)}. \tag{2.6}
$$
In the case of Copas and Li (1997) model, Y and S are normal. Thus, we write
$$
f(y; \lambda_0, \lambda_1) = \frac{\phi(y)\,\Phi(\lambda_0 + \lambda_1 y)}{\Phi\left(\dfrac{\lambda_0}{\sqrt{1+\lambda_1^2}}\right)}.
$$
By performing location-scale transformation, this equation becomes equation (2.5). The
moment generating function of the pdf in (2.5) is
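$$
M_Y(t) = \exp\left(t\mu + \frac{\sigma^2 t^2}{2}\right)
\frac{\Phi\left(\dfrac{\lambda_0 + \sigma\lambda_1 t}{\sqrt{1+\lambda_1^2}}\right)}{\Phi\left(\dfrac{\lambda_0}{\sqrt{1+\lambda_1^2}}\right)},
$$
and the first moment is
$$
E(Y) = \mu + \sigma\,\frac{\lambda_1}{\sqrt{1+\lambda_1^2}}\,\Lambda\left(\frac{\lambda_0}{\sqrt{1+\lambda_1^2}}\right), \tag{2.7}
$$
where Λ(·) = φ(·)/Φ(·) is the inverse Mills ratio. If we substitute $\mu = \beta'x$, $\lambda_0 = \gamma'x/\sqrt{1-\rho^2} \in \mathbb{R}$
and $\lambda_1 = \rho/\sqrt{1-\rho^2} \in \mathbb{R}$ in (2.7) we have
$$
E(Y \mid x, S^* > 0) = \beta'x + \sigma\rho\,\Lambda(\gamma'x), \tag{2.8}
$$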
which is the expected value of the conditional distribution whose density is given by (2.3).
Equation (2.8) is the basis of Heckman's two-step procedure (Heckman, 1979). A standard
probit model is fitted to the selection indicator S, which gives an estimate of γ. The resulting estimate,
γ̂, is used to form $\Lambda(\hat{\gamma}'x)$ for each of the cases with S = 1. This quantity is then taken as
an additional covariate in equation (2.8), which is fitted by least squares. The coefficient of
the additional covariate gives an estimate of σρ.
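The conditional mean (2.8) can be checked numerically against the density (2.3). The sketch below is an illustrative Python check, not the two-step estimator itself (which would also need a probit fit and a least-squares fit); `mu` stands for β'x, `gx` for γ'x, and the parameter values are assumptions.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def inverse_mills(u):
    """Inverse Mills ratio, Lambda(u) = phi(u) / Phi(u)."""
    return phi(u) / Phi(u)

def heckman_mean(mu, sigma, rho, gx):
    """Closed form of equation (2.8)."""
    return mu + sigma * rho * inverse_mills(gx)

def numerical_mean(mu, sigma, rho, gx):
    """Mean of the density (2.3), obtained by numerical integration."""
    def dens(y):
        z = (y - mu) / sigma
        return phi(z) / sigma * Phi((gx + rho * z) / math.sqrt(1.0 - rho * rho)) / Phi(gx)
    return simpson(lambda y: y * dens(y), mu - 12.0 * sigma, mu + 12.0 * sigma)

# Arbitrary illustrative values (assumptions, not estimates from the paper).
mu, sigma, rho, gx = 1.0, 2.0, 0.6, -0.3
print(heckman_mean(mu, sigma, rho, gx), numerical_mean(mu, sigma, rho, gx))
```

The two printed numbers should agree to several decimal places, which is the sense in which the regression of the observed outcomes on x and Λ(γ̂'x) recovers β and σρ.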
The method given by equation (2.8) is more robust to departures from normality than the
likelihood method given in (2.4). However, when the outcome and selection equations contain
the same covariates, the method has been shown to perform poorly (Puhani, 2000). This
is due to collinearity among the covariates, which the inverse Mills ratio Λ(·) cannot
remove because Λ(·) is approximately linear over a wide range of its support. To circumvent this problem, the
so-called exclusion restriction (i.e. at least one extra variable which predicts selection is included
in the selection equation and excluded from the outcome equation) is used in practice. A
general sample selection model which has the Copas and Li (1997) model as a special case will be
formulated in section 4. The first moment of this model will be shown to extend Heckman's
two-step method, and it is expected to be less sensitive to collinearity among covariates. The
correction term equivalent to the inverse Mills ratio in this model is nonlinear over a wide
range of its support because it comes from the distribution function of a skew-normal
distribution.
We consider next a special tool that will simplify the formulation of this model and many
more general models in this category.
2.3 The closed skew-normal distribution
The CSN family is constructed in the multivariate framework because it is a generalization of
the multivariate skew-normal distribution such that some important properties of the normal
distribution are preserved. It is closed under:
• Marginalization
• Conditional distribution
• Linear transformations
• Sums of independent random variables from CSN family
• Joint distribution of independent random variables in CSN family.
We begin with a definition of the CSN distribution.
Definition 1: Consider p ≥ 1, q ≥ 1, $\mu \in \mathbb{R}^p$, $\nu \in \mathbb{R}^q$, D an arbitrary q × p matrix, Σ and
∆ positive definite matrices of dimensions p × p and q × q, respectively. Then the probability
density function (pdf) of the CSN distribution is given by
$$
f_{p,q}(y) = C\,\phi_p(y; \mu, \Sigma)\,\Phi_q(D(y - \mu); \nu, \Delta), \qquad y \in \mathbb{R}^p, \tag{2.9}
$$
with
$$
C^{-1} = \Phi_q(0; \nu, \Delta + D\Sigma D'), \tag{2.10}
$$
where $\phi_p(\cdot; \eta, \Psi)$ and $\Phi_p(\cdot; \eta, \Psi)$ are the pdf and cdf of a p-dimensional normal distribution with
mean $\eta \in \mathbb{R}^p$ and p × p covariance matrix Ψ. We write $Y \sim CSN_{p,q}(\mu, \Sigma, D, \nu, \Delta)$ if $y \in \mathbb{R}^p$
follows a CSN distribution with parameters q, µ, D, Σ, ν, ∆. The special case ν = 0
in (2.9) gives
$$
f_{p,q}(y) = 2^q\,\phi_p(y; \mu, \Sigma)\,\Phi_q(D(y - \mu); 0, \Delta). \tag{2.11}
$$
It is straightforward to see that the pdf in (2.9) includes the normal distribution as a special
case.
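For p = q = 1 all quantities in Definition 1 are scalars, and the density (2.9) with the constant (2.10) can be evaluated with the standard normal pdf and cdf alone. The following Python sketch is illustrative only; the function name `csn_pdf_1d` and the parameter values are assumptions. It checks that the normalizing constant makes the density integrate to one and that D = 0, ν = 0 returns the normal density.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def csn_pdf_1d(y, mu, Sigma, D, nu, Delta):
    """CSN_{1,1} density of (2.9); Sigma and Delta are variances, not standard deviations."""
    # Normalizing constant (2.10): Phi_1(0; nu, Delta + D*Sigma*D).
    c_inv = Phi((0.0 - nu) / math.sqrt(Delta + D * Sigma * D))
    normal_part = phi((y - mu) / math.sqrt(Sigma)) / math.sqrt(Sigma)
    skew_part = Phi((D * (y - mu) - nu) / math.sqrt(Delta))
    return normal_part * skew_part / c_inv

# Arbitrary illustrative parameters (assumptions).
mu, Sigma, D, nu, Delta = 1.0, 4.0, 1.5, 0.7, 2.0
sd = math.sqrt(Sigma)
area = simpson(lambda y: csn_pdf_1d(y, mu, Sigma, D, nu, Delta), mu - 12.0 * sd, mu + 12.0 * sd)
print(area)
```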
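The following properties of CSN distributions are required.
(1) Distribution function: Let $Y \sim CSN_{p,q}(\mu, \Sigma, D, \nu, \Delta)$. The distribution function of Y is
$$
F_{p,q}(y) = C\,\Phi_{p+q}\left( \begin{pmatrix} y \\ 0 \end{pmatrix}; \begin{pmatrix} \mu \\ \nu \end{pmatrix},
\begin{pmatrix} \Sigma & -\Sigma D' \\ -D\Sigma & \Delta + D\Sigma D' \end{pmatrix} \right), \tag{2.12}
$$
where C is as defined in (2.10).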
(2) Scalar multiplication: Let $Y \sim CSN_{p,q}(\mu, \Sigma, D, \nu, \Delta)$; then for any $c \in \mathbb{R}$
$$
cY \sim CSN_{p,q}(c\mu, c^2\Sigma, c^{-1}D, \nu, \Delta). \tag{2.13}
$$
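(3) Marginal density: Let $Y \sim CSN_{p,q}(\mu, \Sigma, D, \nu, \Delta)$ and partition $Y' = (Y_1', Y_2')$,
where $Y_1$ is k-dimensional and $Y_2$ is (p − k)-dimensional. Then
$$
Y_1 \sim CSN_{k,q}(\mu_1, \Sigma_{11}, D^*, \nu, \Delta^*), \tag{2.14}
$$
where $D^* = D_1 + D_2\Sigma_{21}\Sigma_{11}^{-1}$, $\Delta^* = \Delta + D_2\Sigma_{22.1}D_2'$, $\Sigma_{22.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, and $\mu_1$, $\Sigma_{11}$,
$\Sigma_{22}$, $\Sigma_{12}$, $\Sigma_{21}$ come from the corresponding partitions of µ and Σ, and $D_1$, $D_2$ from
$$
D = \begin{pmatrix} D_1 & D_2 \end{pmatrix}, \qquad D_1 \text{ of dimension } q \times k, \quad D_2 \text{ of dimension } q \times (p - k).
$$
(4) Conditional density: Let $Y \sim CSN_{p,q}(\mu, \Sigma, D, \nu, \Delta)$; then for two subvectors $Y_1$ and
$Y_2$, where $Y' = (Y_1', Y_2')$, $Y_1$ is k-dimensional, 1 ≤ k ≤ p, and µ, Σ, D are partitioned as
above, the conditional distribution of $Y_2$ given $Y_1 = y_{10}$ is
$$
CSN_{p-k,q}\left(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(y_{10} - \mu_1),\; \Sigma_{22.1},\; D_2,\; \nu - D^*(y_{10} - \mu_1),\; \Delta\right). \tag{2.15}
$$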
The properties of CSN given above are sufficient for our model formulation in section 4.
Further details of CSN can be found in Gonzalez-Farias et al. (2004). We consider next some
examples to motivate the use of bivariate skew-normal underlying process rather than the
normal distribution when dealing with outcome from Likert-scale type questionnaire.
3 Skew-normal and Normal approximation to Discrete distributions
In surveys such as market and opinion polls, test and questionnaire data are often used, with
scores measured on N units and P items. A common attribute of data realized from these
surveys is that they are skewed. Several articles on the use of sample Cronbach’s alpha (α̂)
to measure test reliability pointed to the need to take skewness of the data into account (see
for example Maydeu-Olivares et al. (2007) and Yuan et al. (2003)).
In health-related Quality of Life (QoL) studies, questionnaires are often used and the realized
samples from them are skewed. This can be because the underlying population
from which the samples are drawn (e.g. responses to Likert-scale type questionnaires) is discrete. Aside
from skewness due to discreteness, there are situations in which some
hidden truncation is already present in the underlying population from which the samples
are drawn. Thus, the assumption of underlying normality is not realistic.
To correct for possible misspecification of the distributional assumption in the parametric
framework, semi-parametric methods are often used for sample selection models. However,
in (most) clinical trial settings, the intercept of the regression model can be of interest for
prediction purposes, which may render these methods impractical.
We have shown in section 2.2 that the hidden truncation process almost always leads
to skewness. In this section, we will show that the skew-normal distribution gives a better
approximation to discrete distributions than the normal ones. The use of skew-normal distribution to approximate binomial distribution is presented in section 3.1. Section 3.2 gives
the approximation to negative binomial distribution.
3.1 Skew-Normal Approximation to Binomial distribution
Chang et al. (2008) presented an improved approximation to the binomial distribution by the
skew-normal distribution. Their aim was to obtain an approximation better than the normal
one, especially when the binomial distribution is asymmetric with probability p ≠ 0.5. For a
given binomial distribution B(n, p) (where n is the number of trials and p is the probability
of success in each trial), the parameters of the approximating skew-normal distribution $SN(\mu, \sigma^2, \lambda)$
are determined by the method of moments. For example, suppose X ∼ B(n, p)
and $Y \sim SN(\mu, \sigma^2, \lambda)$; the following moment matching can be used:
$$
\begin{aligned}
E(X) &= np = E(Y) = \mu + \sigma\sqrt{2/\pi}\,\frac{\lambda}{\sqrt{1+\lambda^2}}, \\
E(X - E(X))^2 &= np(1-p) = E(Y - E(Y))^2 = \sigma^2\left[1 - \frac{2}{\pi}\,\frac{\lambda^2}{1+\lambda^2}\right], \\
E(X - E(X))^3 &= np(p-1)(2p-1) = E(Y - E(Y))^3 = \sigma^3\sqrt{2/\pi}\left(\frac{\lambda}{\sqrt{1+\lambda^2}}\right)^3\left(\frac{4}{\pi} - 1\right).
\end{aligned} \tag{3.16}
$$
The simultaneous solution of equation (3.16) gives the desired values of λ, σ and µ. Further
details on the solution, and the mild restrictions on n and p under which the skew-normal approximation works, can be found in Chang et al. (2008). To illustrate the use of the skew-normal
approximation to the binomial distribution, suppose X1 ∼ B(20, 0.25) and X2 ∼ B(100, 0.25);
then the matching skew-normal distributions are $Y_1 \sim SN(3.365, 2.534^2, 1.374)$ and
$Y_2 \sim SN(22.205, 5.154^2, 0.927)$ respectively.
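The system (3.16) can be solved in closed form by first matching the skewness, which depends on λ only, and then solving for σ and µ. The Python sketch below uses this rearrangement; it is our own algebra under that assumption and not necessarily the procedure of Chang et al. (2008). The function names are illustrative.

```python
import math

def match_skew_normal(mean, var, third):
    """Solve the three moment equations of (3.16) for (mu, sigma, lambda).

    mean, var, third are the mean, variance and third central moment
    of the distribution to be approximated.
    """
    c23 = ((4.0 - math.pi) / 2.0) ** (2.0 / 3.0)
    g23 = (abs(third) / var ** 1.5) ** (2.0 / 3.0)   # |skewness|^(2/3)
    # delta = lambda / sqrt(1 + lambda^2); its square follows from the skewness.
    delta_sq = (math.pi / 2.0) * g23 / (g23 + c23)
    if delta_sq >= 1.0:
        raise ValueError("skewness too large for a skew-normal match")
    delta = math.copysign(math.sqrt(delta_sq), third)
    lam = delta / math.sqrt(1.0 - delta_sq)
    sigma = math.sqrt(var / (1.0 - 2.0 * delta_sq / math.pi))
    mu = mean - sigma * delta * math.sqrt(2.0 / math.pi)
    return mu, sigma, lam

def binomial_match(n, p):
    """Skew-normal parameters matching the first three moments of B(n, p)."""
    return match_skew_normal(n * p, n * p * (1.0 - p), n * p * (p - 1.0) * (2.0 * p - 1.0))

print(binomial_match(20, 0.25))
print(binomial_match(100, 0.25))
```

The `ValueError` branch reflects the restriction that a skew-normal distribution can only attain skewness below roughly 0.995 in absolute value, so very asymmetric binomials cannot be matched this way.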
Figure 1: B(20,0.25) pmf with matching normal and skew-normal pdf.
Figure 2: B(100,0.25) pmf with matching normal and skew-normal pdf.
Figure 1 shows the plot of the matching normal and skew-normal pdfs for B(20, 0.25).
The plot clearly shows that the skew-normal gives a far better approximation than the usual
normal approximation. With the larger number of trials (100) presented in Figure 2, both the normal and skew-normal pdfs give a good approximation. However, the peak of the binomial pmf is better
approximated by the skew-normal distribution.
3.2 Skew-Normal Approximation to Negative Binomial distribution
As with the binomial distribution, the normal approximation is usually used to approximate
the negative binomial distribution. However, Lin et al. (2010) showed by moment
matching (as in the binomial case) that the skew-normal distribution gives a better
approximation than the normal distribution for the negative binomial distribution. Suppose
$X \sim NB(r, p)$ and $Y \sim SN(\mu, \sigma^2, \lambda)$. Equating the
first three moments, with q = 1 − p, they obtained:
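$$
\begin{aligned}
\lambda &= \left\{ \left[ \frac{\sqrt{rq}\,\sqrt{2/\pi}\,\big((4/\pi) - 1\big)}{1 + q} \right]^{2/3} + \frac{2}{\pi} - 1 \right\}^{-1/2}, \\
\sigma &= \frac{\sqrt{rq}/p}{\left[ 1 - (2/\pi)\lambda^2/(1 + \lambda^2) \right]^{1/2}}, \\
\mu &= \frac{rq}{p} - \sigma\sqrt{2/\pi}\,\frac{\lambda}{\sqrt{1 + \lambda^2}}.
\end{aligned} \tag{3.17}
$$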
For example, suppose X1 ∼ NB(20, 0.75) and X2 ∼ NB(100, 0.75); then the matching
skew-normal distributions are $Y_1 \sim SN(3.411, 4.415^2, 2.422)$ and $Y_2 \sim SN(27.766, 8.686^2, 1.349)$.
The normal approximation under small sample size (20) is very poor compared to the
skew-normal approximation (see Figure 3). Although the normal approximation is good
when the sample size is large (100), the peak of the negative binomial pmf is
better approximated by the skew-normal, as it was for the binomial distribution.
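The closed-form solution (3.17) translates directly into code. The Python sketch below is an illustration, assuming q = 1 − p and the parametrization in which NB(r, p) has mean rq/p; the function name is ours.

```python
import math

def nb_skew_normal(r, p):
    """Skew-normal parameters (mu, sigma, lambda) from equation (3.17) for NB(r, p)."""
    q = 1.0 - p
    root_2_pi = math.sqrt(2.0 / math.pi)
    term = (math.sqrt(r * q) * root_2_pi * ((4.0 / math.pi) - 1.0) / (1.0 + q)) ** (2.0 / 3.0)
    inv_lam_sq = term + 2.0 / math.pi - 1.0
    if inv_lam_sq <= 0.0:
        # The negative binomial is too skewed for a skew-normal match.
        raise ValueError("no skew-normal match for these values of r and p")
    lam = inv_lam_sq ** -0.5
    shrink = (2.0 / math.pi) * lam * lam / (1.0 + lam * lam)
    sigma = (math.sqrt(r * q) / p) / math.sqrt(1.0 - shrink)
    mu = r * q / p - sigma * root_2_pi * lam / math.sqrt(1.0 + lam * lam)
    return mu, sigma, lam

print(nb_skew_normal(20, 0.75))
print(nb_skew_normal(100, 0.75))
```

The printed triples can be compared with the values quoted in the text for NB(20, 0.75) and NB(100, 0.75).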
In general, the plots clearly show that the skew-normal approximation, when applicable,
is superior to the usual normal approximation for both small and large samples,
for the binomial and negative binomial distributions. It should therefore be more efficient
to approximate a discrete process by the skew-normal distribution rather than the normal. This
further motivates the use of an underlying skew-normal distribution in the model we develop
next.
Figure 3: NB(20,0.75) pmf with matching normal and skew-normal pdf.
Figure 4: NB(100,0.75) pmf with matching normal and skew-normal pdf.
4 Skew-normal selection model (SNSM)
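Suppose we relax the assumption of bivariate normality given in section 2.1 so that the
underlying error distribution is bivariate skew-normal, i.e.
$$
\begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{pmatrix}
\sim SN_2\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \begin{pmatrix} \lambda_1 \\ \lambda_2 \end{pmatrix} \right),
$$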
where λ1 and λ2 are the skewness parameters for $Y_i^*$ and $S_i^*$ respectively. Then f(y|x, S = 1)
is still defined as in equation (2.2). Now, the pdf of the bivariate process can be written as
$$
f(y, s) = 2\,\phi_2\big((y, s); (\beta'x, \gamma'x), DRD\big)\,\Phi\big((\lambda_1, \lambda_2)\,D^{-1}\big((y, s) - (\beta'x, \gamma'x)\big)\big),
$$
where Σ = DRD, D denotes the diagonal matrix which has the square roots of the
diagonal entries of Σ on its diagonal, and R is the correlation matrix. This has a closed
skew-normal (CSN) representation given as
$$
(y, s) \sim CSN_{2,1}\left( \mu = (\beta'x, \gamma'x),\; \Sigma = \begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix},\; D = (\lambda_1/\sigma, \lambda_2),\; \nu = 0,\; \Delta = 1 \right). \tag{4.18}
$$
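To determine the expression $P(S^* > 0 \mid y, x)$ in equation (2.2), we determine the distribution of S|Y using the conditional distribution property given in equation (2.15). This
gives
$$
S \mid Y \sim CSN_{1,1}\left( \gamma'x + \rho\,\frac{y - \beta'x}{\sigma},\; 1 - \rho^2,\; \lambda_2,\; -(\lambda_1 + \lambda_2)\,\frac{y - \beta'x}{\sigma},\; 1 \right).
$$
Thus, $P(S^* > 0 \mid Y) = 1 - P(S^* < 0 \mid Y)$, which, using the scalar multiplication property given
in equation (2.13), becomes
$$
CSN_{1,1}\left( \gamma'x + \rho\,\frac{y - \beta'x}{\sigma},\; 1 - \rho^2,\; -\lambda_2,\; -(\lambda_1 + \lambda_2)\,\frac{y - \beta'x}{\sigma},\; 1 \right).
$$
This belongs to the ESN family with shift and scale parameters $-(\lambda_1 + \lambda_2)\frac{y - \beta'x}{\sigma}$ and
$-\lambda_2/\sqrt{1 - \rho^2}$ respectively. We denote this cdf as
$$
\Phi_{ESN}\left( \gamma'x + \rho\,\frac{y - \beta'x}{\sigma};\; 0,\; 1 - \rho^2,\; \frac{-\lambda_2}{\sqrt{1 - \rho^2}},\; -(\lambda_1 + \lambda_2)\,\frac{y - \beta'x}{\sigma} \right). \tag{4.19}
$$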
To determine the expression $P(S^* > 0)$ in equation (2.2) we need to extract its marginal
distribution from the bivariate process. Using the marginalization property of the CSN (see
equation (2.14)) one can easily write $P(S^* < 0)$ as
$$
P(S^* < 0) = CSN_{1,1}\left( \gamma'x,\; 1,\; \lambda_2 + \lambda_1\rho,\; 0,\; 1 + \lambda_1^2 - \lambda_1^2\rho^2 \right),
$$
which turns out to be a skew-normal distribution. Since $P(S^* > 0) = 1 - P(S^* < 0)$ we
write
$$
P(S^* > 0) = \Phi_{SN}\left( \gamma'x;\; 0,\; 1,\; \frac{-(\lambda_2 + \lambda_1\rho)}{\sqrt{1 + \lambda_1^2 - \lambda_1^2\rho^2}} \right), \tag{4.20}
$$
where $\Phi_{SN}$ denotes the cdf of a skew-normal random variable. By noting that
$Y \sim SN(\beta'x, \sigma^2, \lambda_1)$ before the selection process and substituting (4.19) and (4.20) into the general
sample selection equation (2.2) we have
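$$
f(y \mid x, S = 1) = \frac{\dfrac{2}{\sigma}\,\phi\left(\dfrac{y - \beta'x}{\sigma}\right)\Phi\left(\dfrac{\lambda_1(y - \beta'x)}{\sigma}\right)
\Phi_{ESN}\left( \gamma'x + \rho\,\dfrac{y - \beta'x}{\sigma};\; 0,\; 1 - \rho^2,\; \dfrac{-\lambda_2}{\sqrt{1 - \rho^2}},\; -(\lambda_1 + \lambda_2)\,\dfrac{y - \beta'x}{\sigma} \right)}
{\Phi_{SN}\left( \gamma'x;\; 0,\; 1,\; \dfrac{-(\lambda_2 + \lambda_1\rho)}{\sqrt{1 + \lambda_1^2 - \lambda_1^2\rho^2}} \right)}. \tag{4.21}
$$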
If λ1 and λ2 are set equal to zero in (4.21), Copas and Li (1997) model given by (2.3) is
recovered.
From now on, we shall restrict attention to a special case of the model given in (4.21).
Suppose only λ2 is set equal to zero, we get a simpler model:
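$$
f(y \mid x, S = 1) = \frac{\dfrac{2}{\sigma}\,\phi\left(\dfrac{y - \beta'x}{\sigma}\right)\Phi\left(\dfrac{\lambda_1(y - \beta'x)}{\sigma}\right)
\Phi\left( \dfrac{\gamma'x + \rho\,\dfrac{y - \beta'x}{\sigma}}{\sqrt{1 - \rho^2}} \right)}
{\Phi_{SN}\left( \gamma'x;\; 0,\; 1,\; \dfrac{-\lambda_1\rho}{\sqrt{1 + \lambda_1^2 - \lambda_1^2\rho^2}} \right)}. \tag{4.22}
$$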
This situation is possible in practice where the underlying mechanism governing selection
is not skewed before entering the joint process.
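Equation (4.22) describes another class of skew-normal distributions. To see this, suppose
a substitution similar to that of section 2.2 is used, i.e. if we put $\mu = \beta'x$, $\lambda_0 = \gamma'x/\sqrt{1 - \rho^2} \in \mathbb{R}$
and $\lambda = \rho/\sqrt{1 - \rho^2} \in \mathbb{R}$ in (4.22), we obtain
$$
f(y; \mu, \sigma^2, \lambda_0, \lambda_1, \lambda) = \frac{\dfrac{2}{\sigma}\,\phi\left(\dfrac{y - \mu}{\sigma}\right)\Phi\left(\dfrac{\lambda_1(y - \mu)}{\sigma}\right)\Phi\left(\lambda_0 + \lambda\,\dfrac{y - \mu}{\sigma}\right)}
{\Phi_{SN}\left( \dfrac{\lambda_0}{\sqrt{1 + \lambda^2}};\; 0,\; 1,\; \dfrac{-\lambda_1\lambda}{\sqrt{1 + \lambda_1^2 + \lambda^2}} \right)}. \tag{4.23}
$$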
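If λ0 = 0 in (4.23), we have
$$
f(y; \mu, \sigma^2, \lambda_1, \lambda) = \frac{\dfrac{2}{\sigma}\,\phi\left(\dfrac{y - \mu}{\sigma}\right)\Phi\left(\dfrac{\lambda_1(y - \mu)}{\sigma}\right)\Phi\left(\lambda\,\dfrac{y - \mu}{\sigma}\right)}
{\Phi_{SN}\left( 0;\; 0,\; 1,\; \dfrac{-\lambda_1\lambda}{\sqrt{1 + \lambda_1^2 + \lambda^2}} \right)}. \tag{4.24}
$$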
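Comparing equation (4.24) with equation 12 given in Jamalizadeh et al. (2008), and noting
that
$$
\frac{\pi}{\cos^{-1}\left( \dfrac{-\lambda_1\lambda}{\sqrt{1 + \lambda_1^2}\,\sqrt{1 + \lambda^2}} \right)}
= \frac{1}{\Phi_{SN}\left( 0;\; 0,\; 1,\; \dfrac{-\lambda_1\lambda}{\sqrt{1 + \lambda_1^2 + \lambda^2}} \right)}, \tag{4.25}
$$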
shows that the two models are the same. The L.H.S. of (4.25) was evaluated using the orthant
probability expression. In this case, the orthant probability is of the form P(Y1 > 0, Y2 > 0)
with (Y1, Y2) ∼ N2(0, Σ), where Σ is a 2 × 2 matrix with diagonal elements 1 and
off-diagonal element ρ12 (see a more general expression in Kotz et al. (2000)). However, the
R.H.S., although it requires evaluation of a two-dimensional integral, is a more general expression that applies when the centered orthant probability rule is not applicable, and it is readily available
in public statistical software (e.g. the ‘psn’ function in Azzalini’s skew-normal package).
Indeed, when µ = 0 and σ = 1 in equation (4.23), it can be referred to as an extended two-parameter generalized skew-normal distribution, denoted GSN(λ0, λ1, λ), since it extends
the two-parameter generalized skew-normal distribution discussed in Jamalizadeh et al. (2008)
with an extra parameter λ0.
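The identity (4.25) can be confirmed numerically. The sketch below is an illustrative Python check in which the skew-normal cdf is obtained by one-dimensional numerical integration of the density (1.1); the helper `sn_cdf` is a stand-in written for this example, not the `psn` function mentioned above.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sn_cdf(h, alpha, n=2000):
    """Skew-normal cdf by Simpson's rule on [-10, h]; -10 stands in for minus infinity."""
    a = -10.0
    step = (h - a) / n
    def f(t):
        return 2.0 * phi(t) * Phi(alpha * t)
    total = f(a) + f(h)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0

def lhs(lam1, lam):
    """Left-hand side of (4.25), from the orthant probability."""
    r = -lam1 * lam / (math.sqrt(1.0 + lam1 ** 2) * math.sqrt(1.0 + lam ** 2))
    return math.pi / math.acos(r)

def rhs(lam1, lam):
    """Right-hand side of (4.25), from the skew-normal cdf at zero."""
    alpha = -lam1 * lam / math.sqrt(1.0 + lam1 ** 2 + lam ** 2)
    return 1.0 / sn_cdf(0.0, alpha)

for lam1, lam in ((1.0, 0.5), (2.0, -1.5), (0.0, 3.0)):
    print(lam1, lam, lhs(lam1, lam), rhs(lam1, lam))
```

When λ1 = 0 both sides equal 2, which is the usual normalizing constant of the Azzalini skew-normal density.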
In general, equation (4.22) could be derived from the general hidden truncation formulation given by equation (2.6) with appropriate re-parametrization. Whichever route is
taken, the use of the CSN distribution is essential. If the skew distributions arising from selection approach (Arellano-Valle et al., 2006) is followed, then the required properties of
CSN distributions are the conditional distribution, marginal distribution and scalar multiplication properties. However, the hidden truncation formulation of Arnold and Beaver (2002)
makes use of the scalar multiplication and additive properties of the CSN distribution. Thus,
the link between skew distribution arising from selection and its counterpart through hidden
truncation formulation is not limited to the elliptical distributions. It is readily extended
to skew-elliptical distributions even though the correlation coefficient may not be adequate
in capturing association in the underlying process due to the non-elliptical contours in this
case.
4.1 Moment estimator of Skew-normal selection model
Now that we have established that the link between the skew-normal distribution arising from
selection and the hidden truncation formulation of the model is also applicable in the skew-symmetric family, the conditional expectation of the model in (4.22) can be readily derived.
Let $Z_{\lambda_0,\lambda_1,\lambda} \sim GSN(\lambda_0, \lambda_1, \lambda)$ have density
$$
k(\lambda_0, \lambda_1, \lambda)\,\phi(z)\,\Phi(\lambda_1 z)\,\Phi(\lambda_0 + \lambda z), \qquad z \in \mathbb{R}, \tag{4.26}
$$
where
$$
k(\lambda_0, \lambda_1, \lambda) = \frac{2}{\Phi_{SN}\left( \dfrac{\lambda_0}{\sqrt{1 + \lambda^2}};\; 0,\; 1,\; \dfrac{-\lambda_1\lambda}{\sqrt{1 + \lambda_1^2 + \lambda^2}} \right)}.
$$
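Theorem 1: If $M(t; \lambda_0, \lambda_1, \lambda)$ is the moment generating function of $Z_{\lambda_0,\lambda_1,\lambda} \sim GSN(\lambda_0, \lambda_1, \lambda)$,
then
$$
M(t; \lambda_0, \lambda_1, \lambda) = k(\lambda_0, \lambda_1, \lambda)\,e^{t^2/2}\,
\Phi_2\left( \frac{\lambda_1 t}{\sqrt{1 + \lambda_1^2}},\; \frac{\lambda_0 + \lambda t}{\sqrt{1 + \lambda^2}};\; \frac{\lambda_1\lambda}{\sqrt{1 + \lambda_1^2}\,\sqrt{1 + \lambda^2}} \right), \tag{4.27}
$$
where $k(\lambda_0, \lambda_1, \lambda)$ is as given in (4.26) and $\Phi_2(\cdot, \cdot; \rho)$ denotes the cdf of $N_2(0, 0, 1, 1, \rho)$.
The derivation of equation (4.27) can be found in the Appendix.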
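The moments of $Z_{\lambda_0,\lambda_1,\lambda}$ can be derived from (4.27). In particular, the first moment, after
some algebra, is
$$
\begin{aligned}
E(Z_{\lambda_0,\lambda_1,\lambda}) = \frac{2}{\Phi_{SN}\left( \dfrac{\lambda_0}{\sqrt{1 + \lambda^2}};\; 0,\; 1,\; \dfrac{-\lambda_1\lambda}{\sqrt{1 + \lambda_1^2 + \lambda^2}} \right)}
\Bigg\{ &\frac{1}{\sqrt{2\pi}}\,\frac{\lambda_1}{\sqrt{1 + \lambda_1^2}}\,\Phi\left( \frac{\lambda_0\sqrt{1 + \lambda_1^2}}{\sqrt{1 + \lambda_1^2 + \lambda^2}} \right) \\
&+ \frac{\lambda}{\sqrt{1 + \lambda^2}}\,\phi\left( \frac{\lambda_0}{\sqrt{1 + \lambda^2}} \right)\Phi\left( \frac{-\lambda_0\lambda_1\lambda}{\sqrt{1 + \lambda^2}\,\sqrt{1 + \lambda_1^2 + \lambda^2}} \right) \Bigg\}.
\end{aligned} \tag{4.28}
$$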
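If $Y = \mu + \sigma Z_{\lambda_0,\lambda_1,\lambda}$, then $E(Y) = \mu + \sigma E(Z_{\lambda_0,\lambda_1,\lambda})$. Using the link between equations
(2.7) and (2.8) in section 2.2 and noting the regression parametrization, the conditional mean
of the model given by equation (4.22) can be written as
$$
\begin{aligned}
E(Y \mid x, S^* > 0) = \beta'x + \sigma\,\frac{2}{\Phi_{SN}\left( \gamma'x;\; 0,\; 1,\; \dfrac{-\lambda\rho}{\sqrt{1 + \lambda^2 - \lambda^2\rho^2}} \right)}
\Bigg\{ &\frac{1}{\sqrt{2\pi}}\,\frac{\lambda}{\sqrt{1 + \lambda^2}}\,\Phi\left( \frac{\gamma'x\,\sqrt{1 + \lambda^2}}{\sqrt{1 + \lambda^2 - \lambda^2\rho^2}} \right) \\
&+ \rho\,\phi(\gamma'x)\,\Phi\left( \frac{-\gamma'x\,\lambda\rho}{\sqrt{1 + \lambda^2 - \lambda^2\rho^2}} \right) \Bigg\},
\end{aligned} \tag{4.29}
$$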
where from now on, we take λ1 = λ. When λ = 0 in equation (4.29), we have the
Heckman two-step model given in equation (2.8). To visualize the impact of using the selection-normal model when the correct model is the one given by equation (4.29), we plot the second
component of the expectation as a function of $\gamma'x$, the mean of the selection variable. We
take ρ = 0.5 and 0.9 for values of λ = 0, 1, 2 and 5. It should be noted that λ = 0 corresponds
to the inverse Mills ratio correction for (2.8). The standard deviation, σ, simply scales the
correction factor and ρ is the correlation between the outcome and the selection process.
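The correction factor in (4.29), taken with σ = 1, can be computed and cross-checked against direct numerical integration of the density (4.22). The Python sketch below is illustrative: `u` stands for γ'x, the skew-normal cdf is approximated by Simpson's rule, and the function names are assumptions made for this example.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def sn_cdf(h, alpha):
    """Skew-normal cdf; the lower limit -10 stands in for minus infinity."""
    return simpson(lambda t: 2.0 * phi(t) * Phi(alpha * t), -10.0, h)

def correction(u, rho, lam):
    """(E(Y | x, S* > 0) - beta'x) / sigma from equation (4.29), with u = gamma'x."""
    c = math.sqrt(1.0 + lam ** 2 - lam ** 2 * rho ** 2)
    first = lam / math.sqrt(2.0 * math.pi * (1.0 + lam ** 2)) * Phi(u * math.sqrt(1.0 + lam ** 2) / c)
    second = rho * phi(u) * Phi(-u * lam * rho / c)
    return 2.0 * (first + second) / sn_cdf(u, -lam * rho / c)

def correction_numeric(u, rho, lam):
    """Same quantity as the mean of the standardized density (4.22), by integration."""
    def kernel(z):
        return phi(z) * Phi(lam * z) * Phi((u + rho * z) / math.sqrt(1.0 - rho ** 2))
    return simpson(lambda z: z * kernel(z), -10.0, 10.0) / simpson(kernel, -10.0, 10.0)

# Correction factor at a few values of gamma'x, for rho = 0.5.
for lam in (0.0, 1.0, 2.0, 5.0):
    print(lam, [correction(u, 0.5, lam) for u in (-2.0, 0.0, 2.0)])
```

With `lam = 0` the function returns ρ φ(γ'x)/Φ(γ'x), the inverse Mills ratio correction of (2.8).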
Figure 5: Plot of correction factor, E(Y|x, S > 0) − β'x against γ'x, for different values of skewness parameter (ρ = 0.5), with λ = 0 corresponding to the normal case.
Figure 6: Plot of correction factor, E(Y|x, S > 0) − β'x against γ'x, for different values of skewness parameter (ρ = 0.9), with λ = 0 corresponding to the normal case.
It can be seen from Figure 5 (ρ = 0.5) that for positive values of the selection linear
predictor $\gamma'x$, the conditional expectation will be underestimated under the usual selection-normal model. This underestimation increases as the skewness increases. However, for
negative values of $\gamma'x$, the underestimation of the conditional expectation by the selection-normal model compared to the selection skew-normal model decreases, and the difference dies out
as $\gamma'x$ becomes more negative and missingness increases. This observation also holds for
ρ = 0.9, as the figures are similar (see Figure 6).
Sometimes, the marginal effect of the covariates (x_i) on the outcome Y_i in the observed
sample may be of interest. For the Heckman two-step model, the effect consists of two
components: the direct effect of the covariates on the mean of Y_i, which is captured by β, and
the indirect effect of the covariates in the selection equation. For the Heckman two-step model
(equation (2.8)), the marginal effect is given by
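$$
\frac{\partial E(Y \mid x, S^{*} > 0)}{\partial x_i}
= \beta_i - \rho\sigma\gamma_i
\left\{ \gamma' x\,\frac{\phi(\gamma' x)}{\Phi(\gamma' x)} + \frac{\phi(\gamma' x)^{2}}{\Phi(\gamma' x)^{2}} \right\}
\tag{4.30}
$$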
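Using a similar argument, the marginal effect corresponding to equation (4.29) can be written as
$$
\begin{aligned}
\frac{\partial E(Y \mid x, S^{*} > 0)}{\partial x_i}
= \beta_i - 2\sigma\gamma_i \Bigg[
& \frac{\rho\,(\gamma' x)\,\phi(\gamma' x)\,\Phi\!\left(\frac{-\gamma' x\,\lambda\rho}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\right)}
       {\Phi_{SN}\!\left(\gamma' x;\, 0, 1, \frac{-\lambda\rho}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\right)}
+ \frac{2\rho\,\phi(\gamma' x)^{2}\,\Phi\!\left(\frac{-\gamma' x\,\lambda\rho}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\right)^{2}}
       {\Phi_{SN}\!\left(\gamma' x;\, 0, 1, \frac{-\lambda\rho}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\right)^{2}} \\
& - \frac{1}{\sqrt{2\pi}}\,\frac{\lambda}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\,
  \frac{\phi\!\left(\frac{\gamma' x\,\sqrt{1+\lambda^{2}}}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\right)}
       {\Phi_{SN}\!\left(\gamma' x;\, 0, 1, \frac{-\lambda\rho}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\right)}
+ \frac{\lambda\rho^{2}\,\phi(\gamma' x)}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\,
  \frac{\phi\!\left(\frac{-\gamma' x\,\lambda\rho}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\right)}
       {\Phi_{SN}\!\left(\gamma' x;\, 0, 1, \frac{-\lambda\rho}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\right)} \\
& + \frac{1}{\sqrt{2\pi}}\,\frac{\lambda}{\sqrt{1+\lambda^{2}}}\,
  \Phi\!\left(\frac{\gamma' x\,\sqrt{1+\lambda^{2}}}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\right)
  \frac{2\,\phi(\gamma' x)\,\Phi\!\left(\frac{-\gamma' x\,\lambda\rho}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\right)}
       {\Phi_{SN}\!\left(\gamma' x;\, 0, 1, \frac{-\lambda\rho}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}\right)^{2}}
\Bigg].
\end{aligned}
\tag{4.31}
$$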
Equation (4.31) reduces to equation (4.30) when λ = 0.
From Figures 7 and 8, the conditional marginal effect of covariates x_i on the outcome
Y will be underestimated by the selection-normal model for values of γ′x between
(roughly) −4 and 4. When |γ′x| exceeds 4, this effect dies out since the correction factor
becomes zero for all the values of λ (including λ = 0).
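The normal-case marginal effect in equation (4.30) can be checked numerically against a finite-difference derivative of the Heckman conditional mean. The sketch below assumes, purely for illustration, a single covariate x that enters both equations with an intercept; the function names and parameter values are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def heckman_mean(x, beta, gamma, rho, sigma):
    """E(Y | x, S* > 0) under the selection-normal model, beta = (b0, b1), gamma = (g0, g1)."""
    gx = gamma[0] + gamma[1] * x
    return beta[0] + beta[1] * x + rho * sigma * norm_pdf(gx) / norm_cdf(gx)

def heckman_marginal_effect(x, beta, gamma, rho, sigma):
    """Equation (4.30) for the slope covariate."""
    gx = gamma[0] + gamma[1] * x
    imr = norm_pdf(gx) / norm_cdf(gx)  # inverse Mills ratio
    return beta[1] - rho * sigma * gamma[1] * (gx * imr + imr ** 2)

beta, gamma, rho, sigma = (0.5, 1.5), (1.0, 1.0), 0.5, 1.0
x0, h = 0.3, 1e-5
analytic = heckman_marginal_effect(x0, beta, gamma, rho, sigma)
numeric = (heckman_mean(x0 + h, beta, gamma, rho, sigma)
           - heckman_mean(x0 - h, beta, gamma, rho, sigma)) / (2.0 * h)
print(analytic, numeric)
```

The same finite-difference idea can be applied to equation (4.29) as a check on any implementation of equation (4.31).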
[Figures 7 and 8: curves for λ = 0, 1, 2 and 5; horizontal axis γ′x from −4 to 4, vertical axis ∂E(Y|x, S > 0)/∂x − β′; panels ρ = 0.5 and ρ = 0.9.]
Figure 7: Plot of marginal effect for different values of skewness parameter with λ = 0 corresponding to the normal case.
Figure 8: Plot of marginal effect for different values of skewness parameter with λ = 0 corresponding to the normal case.
4.2
Maximum likelihood estimation
The complete density of the selection skew-normal model, like the selection normal model,
is comprised of a continuous component given by (4.22) and a discrete component for P(S).
As stated earlier, the distribution of the selection process determines the nature of the model
to be fitted for the binary variable which in this case is given by
$$
P(S = s) = \{\Phi_{SN}(\gamma' x;\, 0, 1, \lambda^{*})\}^{s}\,\{1 - \Phi_{SN}(\gamma' x;\, 0, 1, \lambda^{*})\}^{1-s},
$$
where $\lambda^{*} = \frac{-\lambda\rho}{\sqrt{1+\lambda^{2}-\lambda^{2}\rho^{2}}}$. This is a probit model with skew-link.
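The log-likelihood function is therefore
$$
\begin{aligned}
l(\beta, \sigma, \gamma, \rho, \lambda)
&= \sum_{i=1}^{n} s_i \ln f(y_i \mid x_i, S_i = 1)
 + \sum_{i=1}^{n} s_i \ln \Phi_{SN}(\gamma' x_i;\, 0, 1, \lambda^{*})
 + \sum_{i=1}^{n} (1 - s_i) \ln \Phi_{SN}(-\gamma' x_i;\, 0, 1, -\lambda^{*}) \\
&= s_i \Bigg\{ n \ln 2 - \frac{n}{2} \ln 2\pi - \frac{n}{2} \ln \sigma^{2}
 - \frac{1}{2} \sum_{i=1}^{n} \frac{(y_i - \beta' x_i)^{2}}{\sigma^{2}}
 + \sum_{i=1}^{n} \ln \Phi\!\left(\lambda\,\frac{y_i - \beta' x_i}{\sigma}\right) \\
&\qquad\quad + \sum_{i=1}^{n} \ln \Phi\!\left(\frac{\gamma' x_i + \rho\,\frac{y_i - \beta' x_i}{\sigma}}{\sqrt{1 - \rho^{2}}}\right) \Bigg\}
 + \sum_{i=1}^{n} (1 - s_i) \ln \Phi_{SN}(-\gamma' x_i;\, 0, 1, -\lambda^{*}).
\end{aligned}
\tag{4.32}
$$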
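A minimal sketch of how equation (4.32) might be coded is given below, accumulating one contribution per observation. It is not the authors' implementation: the function and argument names are hypothetical, the outcome and selection covariates are passed separately (x and w) as an assumption so that an exclusion restriction can be accommodated, and Φ_SN is again obtained from Φ(x) − 2T(x, α) with Owen's T approximated by Simpson's rule.

```python
import math

def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def owens_t(h, a, n=2000):
    """Owen's T(h, a) by composite Simpson's rule on [0, a]; n must be even."""
    if a == 0.0:
        return 0.0
    step = a / n
    def f(t):
        return math.exp(-0.5 * h * h * (1.0 + t * t)) / (1.0 + t * t)
    total = f(0.0) + f(a)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(k * step)
    return total * step / 3.0 / (2.0 * math.pi)

def skewnorm_cdf(x, alpha):
    """Phi_SN(x; 0, 1, alpha) = Phi(x) - 2 T(x, alpha)."""
    return norm_cdf(x) - 2.0 * owens_t(x, alpha)

def dot(coefs, values):
    return sum(c * v for c, v in zip(coefs, values))

def snsm_loglik(beta, gamma, sigma, rho, lam, data):
    """Log-likelihood in the spirit of (4.32).

    data: iterable of (y, x, w, s); x holds outcome covariates, w holds
    selection covariates (both including the constant), s is 1 if y is
    observed and 0 otherwise (y is ignored when s == 0).
    """
    c = math.sqrt(1.0 + lam ** 2 - lam ** 2 * rho ** 2)
    total = 0.0
    for y, x, w, s in data:
        gw = dot(gamma, w)
        if s == 1:
            z = (y - dot(beta, x)) / sigma
            u = (gw + rho * z) / math.sqrt(1.0 - rho ** 2)
            total += (math.log(2.0) - math.log(sigma) + norm_logpdf(z)
                      + math.log(norm_cdf(lam * z)) + math.log(norm_cdf(u)))
        else:
            # -lambda* = lam * rho / c
            total += math.log(skewnorm_cdf(-gw, lam * rho / c))
    return total

toy = [(1.2, [1.0, 0.4], [1.0, 0.4], 1), (None, [1.0, -0.9], [1.0, -0.9], 0)]
print(snsm_loglik([0.5, 1.5], [1.0, 1.0], 1.0, 0.5, 1.0, toy))
```

A function of this shape can be handed to a general-purpose optimiser after negation. For extreme arguments the logarithms may underflow, which a production implementation would need to guard against.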
4.3
Monte Carlo Simulation
In this section we study the finite-sample properties of our selection skew-normal model
(SNSM). We compare its performance with the selection normal model (SNM; Copas and Li, 1997)
and Heckman's two-step method (TS), in a design similar to that of Marchenko and Genton
(2011). The outcome equation is Y_i* = 0.5 + 1.5x_i + ε_1i, where the x_i are iid N(0, 1) and i = 1, . . . , N =
1000. Two types of selection equation are considered: S_i* = 1 + x_i + 1.5w_i + ε_2i, with exclusion restriction,
where the w_i are iid N(0, 1), and S_i* = 1 + x_i + ε_2i, without the exclusion restriction. Hence,
β′ = (0.5, 1.5), and γ′ = (1, 1, 1.5) and (1, 1) for selection with and without the exclusion
restriction respectively. The covariates x_i and w_i are independent and are also independent
of the error terms ε_1i and ε_2i. The error terms are generated from a bivariate skew-normal
distribution with λ = 0, 0.5, 1, 2 and 5. The covariance matrix is
$$
\Sigma = \begin{pmatrix} \sigma^{2} & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix},
$$
where σ = 1 and the correlation ρ = 0.5. This simulation scenario implies that only Y_i* is skew, which
is in line with our model. It should be noted that the λ = 0 case corresponds to an underlying
bivariate normal process. We only observe values Y_i* when S_i* > 0. The degree of censoring is
about 30% with exclusion restriction, and about 20% in the absence of exclusion restriction.
Simulation results are based on 1000 replications.
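The design above can be sketched in a few lines of Python. The source does not say how the bivariate skew-normal errors were generated; the sketch assumes the usual reflection construction, in which a bivariate normal pair is kept or sign-flipped according to a hidden third normal variate, giving the density 2φ2(ε1, ε2; Σ)Φ(λε1/σ). The function names and the seed are hypothetical.

```python
import math
import random

def draw_errors(rng, lam, rho, sigma=1.0):
    """One draw of (eps1, eps2) with density 2 * phi2(e1, e2) * Phi(lam * e1 / sigma)."""
    u1 = rng.gauss(0.0, 1.0)
    u2 = rho * u1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    u0 = rng.gauss(0.0, 1.0)
    if u0 > lam * u1:          # reflect the pair when the hidden constraint fails
        u1, u2 = -u1, -u2
    return sigma * u1, u2

def simulate_sample(n, lam, rho=0.5, exclusion=True, seed=0):
    """Simulate one data set from the outcome and selection equations of Section 4.3."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        w = rng.gauss(0.0, 1.0) if exclusion else None
        e1, e2 = draw_errors(rng, lam, rho)
        y_star = 0.5 + 1.5 * x + e1
        s_star = 1.0 + x + (1.5 * w if exclusion else 0.0) + e2
        s = 1 if s_star > 0 else 0
        # The outcome is only recorded when the unit is selected.
        rows.append({"y": y_star if s else None, "x": x, "w": w, "s": s})
    return rows

sample = simulate_sample(1000, lam=2.0, exclusion=True, seed=1)
censored = 1.0 - sum(r["s"] for r in sample) / len(sample)
print(f"share censored: {censored:.3f}")
```

Repeating `simulate_sample` over 1000 seeds and fitting each data set would mirror the replication scheme described in the text.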
The results of the simulation in the presence of exclusion restriction are presented in Table
1. Even under the normality assumption (i.e. λ = 0), the performance of SNSM is comparable to
that of SNM and TS. For instance, SNM and TS showed slightly less bias in the estimation of the
intercept of the outcome equation than SNSM. However, this advantage is counterbalanced
when the intercept of the selection equation is considered, since SNSM has less bias there than SNM
and TS. In terms of MSE, SNM and TS are more efficient. Other parameters are comparable
across the three models. In effect, SNM and TS do not show a clear overall advantage
even when the underlying process is normal.
As the degree of skewness increases, the SNSM model gets better in precision of estimating
the intercept of the selection and the outcome equations whereas SNM and TS get worse.
When λ = 5 (which is almost a folded normal), the SNM and TS break down. However,
SNSM performs well but at a cost of non-convergence for some of the samples (in this case,
828 out of 1000 samples converged).
The results of the simulation in the absence of exclusion restriction are presented in Table
2. When the underlying process is normal, the SNSM estimate of the outcome intercept has a lower bias than SNM but a higher
bias than TS. For regression parameters of interest, the three models are comparable. Similar to
what we observed under exclusion restriction, the SNSM model appears useful even when
the underlying process is normal. When λ increases, the performance of SNSM gets better
both in bias and MSE. There were severe identifiability problems with the SNSM model when
λ = 5, as about 300 samples out of 1000 did not converge. However, the samples that did
converge converged to their MLE.
In addition, the SNSM estimates of σ and ρ are better than those of the SNM and TS models
when λ ≥ 1, both in the presence and absence of exclusion restriction. Since the variance
indicates the variability of the probability distribution of the outcomes Y_i, it follows that
correct prediction intervals for new observations will be obtained under the SNSM model. Further,
in applied settings (similar to the MINT Trials data we describe next), interest may be in
patients who do not return their questionnaire. This requires a correct model for the selection
process. As can be seen from Tables 1 and 2, the SNSM gave consistently smaller bias and
MSE than the SNM and TS models for the selection equation when λ ≥ 1. It should
also be noted that the bias in the parameter estimates of the selection equation when the SNSM
model is used is smaller even under the normality assumption, with or without the exclusion
restriction.
We also considered the effect of varying the underlying correlation in the presence of
exclusion restriction for λ = 1 and 2. The results (see Tables 5 and 6 in the Appendix) are
similar to the ones for ρ = 0.5.
5
Application to MINT Trials
We examine data from a multi-center randomized controlled trial of treatments for Whiplash
Associated Disorder (WAD) referred to as Managing Injuries of the Neck Trial (MINT), in
which two treatment regimes were compared: physiotherapy versus reinforcement of advice
in patients with continuing symptoms after three weeks of their initial visit to the Emergency
Department (ED)(Lamb et al., 2007). As with many longitudinal patient-reported outcome
or quality of life studies, the data were collected using questionnaires at regular intervals over
a follow-up period at 4, 8 and 12 months after the patient's ED attendance.
The main goal of the study is to determine if there is any meaningful difference between the two
treatments. The primary outcome of interest is return to normal function after the whiplash
injury, and is measured using the Neck Disability Index (NDI). The NDI is a self-completed
questionnaire which assesses pain-related activity restrictions in 10 areas including personal
care, lifting, sleeping, driving, concentration, reading and work, and results in a score between
0 and 50. It was developed in 1989 by Howard Vernon as a modification of the Oswestry
Table 1: Simulation results in the presence of exclusion restriction.

                         Bias                            MSE
                SNSM      SNM       TS        SNSM      SNM       TS
λ = 0.0   β0    0.0016   -0.0001    0.0002    0.0108    0.0024    0.0027
          β1   -0.0003   -0.0003   -0.0005    0.0019    0.0019    0.0019
          γ0    0.0061    0.0067    0.0073    0.0074    0.0050    0.0051
          γ1    0.0040    0.0052    0.0059    0.0060    0.0059    0.0060
          γ2    0.0080    0.0098    0.0106    0.0094    0.0093    0.0094
          σ     0.0028   -0.0009   -0.0007    0.0017    0.0009    0.0009
          ρ    -0.0007   -0.0006   -0.0021    0.0084    0.0084    0.0113
          λ    -0.0027    -         -         0.0175    -         -
λ = 0.5   β0    0.2071    0.3564    0.3564    0.1379    0.1289    0.1291
          β1    0.0001    0.0002    0.0002    0.0016    0.0016    0.0016
          γ0    0.1786    0.2091    0.2101    0.0517    0.0507    0.0514
          γ1    0.0203    0.0259    0.0269    0.0074    0.0075    0.0078
          γ2    0.0314    0.0398    0.0409    0.0126    0.0125    0.0130
          σ    -0.0444   -0.0654   -0.0652    0.0065    0.0050    0.0050
          ρ    -0.0173   -0.0243   -0.0248    0.0104    0.0102    0.0129
          λ    -0.0030    -         -         0.1267    -         -
λ = 1.0   β0    0.0445    0.5620    0.5624    0.0361    0.3173    0.3178
          β1    0.0004    0.0010    0.0007    0.0012    0.0012    0.0012
          γ0    0.0401    0.3516    0.3529    0.0282    0.1319    0.1330
          γ1    0.0108    0.0533    0.0547    0.0073    0.0098    0.0102
          γ2    0.0201    0.0835    0.0860    0.0138    0.0192    0.0199
          σ    -0.0110   -0.1697   -0.1696    0.0067    0.0293    0.0293
          ρ    -0.0072   -0.0636   -0.0658    0.0133    0.0155    0.0181
          λ    -0.0501    -         -         0.1471    -         -
λ = 2.0   β0    0.0013    0.7088    0.7098    0.0036    0.5034    0.5049
          β1    0.0007    0.0020    0.0014    0.0008    0.0009    0.0009
          γ0    0.0149    0.4706    0.4728    0.0302    0.2310    0.2333
          γ1    0.0086    0.0850    0.0877    0.0088    0.0151    0.0157
          γ2    0.0140    0.1275    0.1324    0.0171    0.0304    0.0317
          σ    -0.0006   -0.2879   -0.2881    0.0022    0.0833    0.0834
          ρ    -0.0065   -0.1087   -0.1145    0.0170    0.0250    0.0285
          λ     0.0311    -         -         0.0993    -         -
λ = 5.0   β0    0.0063    0.7753    0.7771    0.0069    0.6020    0.6047
          β1    0.0013    0.0029    0.0019    0.0004    0.0007    0.0007
          γ0    0.0184    0.5306    0.5337    0.0481    0.2918    0.2952
          γ1    0.0066    0.0997    0.1038    0.0114    0.0180    0.0188
          γ2    0.0134    0.1489    0.1566    0.0219    0.0368    0.0386
          σ    -0.0020   -0.3605   -0.3608    0.0025    0.1303    0.1305
          ρ    -0.0052   -0.1339   -0.1453    0.0216    0.0356    0.0383
          λ     0.1398    -         -         0.8776    -         -
Table 2: Simulation results in the absence of exclusion restriction.

                         Bias                            MSE
                SNSM      SNM       TS        SNSM      SNM       TS
λ = 0.0   β0    0.0143    0.0154    0.0049    0.0607    0.0084    0.0124
          β1   -0.0123   -0.0121    0.0036    0.0062    0.0062    0.0089
          γ0   -0.0002    0.0066    0.0066    0.0167    0.0038    0.0038
          γ1    0.0029    0.0100    0.0101    0.0055    0.0052    0.0052
          σ     0.0228   -0.0018    0.0059    0.0069    0.0012    0.0023
          ρ    -0.0359   -0.0427   -0.0237    0.0474    0.0452    0.0651
          λ     0.0018    -         -         0.1139    -         -
λ = 0.5   β0    0.2912    0.3675    0.3593    0.1334    0.1411    0.1372
          β1   -0.0108   -0.0088   -0.0020    0.0050    0.0048    0.0063
          γ0    0.1646    0.2036    0.2038    0.0463    0.0461    0.0462
          γ1    0.0157    0.0217    0.0220    0.0060    0.0058    0.0058
          σ    -0.0406   -0.0642   -0.0586    0.0059    0.0049    0.0050
          ρ    -0.0654   -0.0648   -0.0440    0.0604    0.0544    0.0683
          λ    -0.3782    -         -         0.2527    -         -
λ = 1.0   β0    0.0640    0.5580    0.5604    0.0381    0.3151    0.3187
          β1   -0.0076    0.0048    0.0025    0.0037    0.0036    0.0042
          γ0    0.0759    0.5261    0.5340    0.0527    0.2841    0.2926
          γ1    0.0091    0.0434    0.0490    0.0073    0.0084    0.0088
          σ    -0.0138   -0.1637   -0.1628    0.0067    0.0276    0.0276
          ρ    -0.0669   -0.0604   -0.0733    0.0768    0.0548    0.0721
          λ    -0.0761    -         -         0.1512    -         -
λ = 2.0   β0    0.0036    0.6812    0.7085    0.0045    0.4681    0.5051
          β1   -0.0017    0.0304    0.0037    0.0023    0.0049    0.0029
          γ0    0.0333    0.4451    0.4677    0.0884    0.2052    0.2251
          γ1   -0.0047    0.0507    0.0865    0.0121    0.0114    0.0141
          σ     0.0033   -0.2708   -0.2827    0.0019    0.0741    0.0805
          ρ    -0.0556    0.0100   -0.1245    0.0879    0.0869    0.0864
          λ     0.0165    -         -         0.0916    -         -
λ = 5.0   β0   -0.0036    0.7107    0.7744    0.0052    0.5103    0.6021
          β1    0.0057    0.0744    0.0057    0.0020    0.0115    0.0023
          γ0   -0.0232    0.3987    0.5280    0.1175    0.1738    0.2856
          γ1   -0.0234   -0.0769    0.1039    0.0211    0.0315    0.0178
          σ     0.0048   -0.3219   -0.3559    0.0020    0.1049    0.1272
          ρ    -0.0065   -0.1803   -0.1503    0.0065    0.1783    0.0979
          λ     0.1460    -         -         0.7745    -         -
[Figure 9: scatterplot matrix of NDI scores with panels Baseline, Month.4, Month.8 and Month.12; axes run from 0 to 40; pairwise correlations shown in the panels: 0.56, 0.72, 0.69, 0.57, 0.79, 0.61.]
Figure 9: Marginal distributions and Correlations at Baseline, Month 4, 8 and 12 for the
NDI scores
Low Back Pain Disability Index. The NDI has been shown to be reliable and valid (Vernon
and Mior, 1991), hence its use as a standard instrument for measuring self-rated disability
due to neck pain by clinicians and researchers.
The fact that the responses were derived from the use of a 10-item questionnaire posed
several challenges. These include, but are not limited to, the discreteness (Likert-scale type) of
the scores, item and unit nonresponse, and dropout with time. These might be responsible
for the skewness present in the observed data (see Figure 9).
There are 599 patients with a total of 1934 measurements, and 372 (62%) patients have
complete observations (i.e. scores at all measurement occasions). Further, approximately 50%
of the patients are in each of the two treatment groups, resulting from balanced randomisation. The
mean age is approximately 41 years with range 18 to 78 years. Vernon (2009) recommended
that a patient's replies with only 2 missed items should be considered complete, with mean
imputation used for adjustment. We follow this recommendation, and any patient with 3 or
more missing items is considered as unit missing. In effect, we have only unit nonresponse
left in the dataset. In what follows, we will identify predictors of dropout at each measurement
occasion before restricting attention to a single measurement occasion to illustrate our new model.
5.1
Use of Logistic Regression to identify Predictors of Dropout
In any model involving missing data, it is important to include covariates that are predictors
of dropout in the model. For the NDI scores, we use logistic regression to identify predictors
of dropout. Binary response variables were constructed with value 1 if the patient drops out
by months 4, 8 or 12 and 0 otherwise. The first step was to consider whether the baseline (y0)
measurements could influence dropout. We then considered whether any pre-randomization
variables give any further improvement. The two treatments under consideration were also
included, with the reinforcement of advice used as the reference category.
The results of these models are presented in Table 3. We focus on the missingness model
at month 8, which shows that age and sex of the patients are good predictors of missingness.
The odds of a female (other variables held constant) dropping out at this month are about 2
times those of a male. This implies that females drop out more than males. Similarly,
the odds of a patient dropping out increase by 4.1% with each additional year of age. Thus,
the older the patients are, the greater their tendency to drop out.
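The odds statements above follow from exponentiating the logistic regression coefficients. The short sketch below reproduces the arithmetic for the 8-month model using the sex and age estimates reported in Table 3 (0.700 and 0.040); the variable names are illustrative only.

```python
import math

# Coefficients read from the 8-month column of Table 3.
coef_sex_female = 0.700
coef_age = 0.040

# exp(coefficient) is the multiplicative change in the odds of dropout.
odds_ratio_sex = math.exp(coef_sex_female)
pct_change_per_year = 100.0 * (math.exp(coef_age) - 1.0)

print(f"odds ratio, female vs male: {odds_ratio_sex:.2f}")
print(f"change in odds per additional year of age: {pct_change_per_year:.1f}%")
```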
Table 3: Logistic regression for dropout at 4, 8 and 12 months using Vernon scores

               Missing at 4 months         Missing at 8 months         Missing at 12 months
               Estimate  S.E.   p-value    Estimate  S.E.   p-value    Estimate  S.E.   p-value
int             0.617    0.574  0.282      -0.119    0.653  0.856       1.468    0.736  0.046
trt(physio)     0.364    0.248  0.142      -0.415    0.278  0.135      -0.919    0.312  0.003
sex(female)     0.091    0.254  0.720       0.700    0.272  0.010       0.567    0.306  0.064
age             0.029    0.010  0.004       0.040    0.012  0.000       0.020    0.012  0.107
y0             -0.004    0.015  0.799      -0.003    0.021  0.901      -0.053    0.024  0.026
y4                                          0.025    0.021  0.233       0.018    0.028  0.496
y8                                                                      0.050    0.029  0.082

5.2
Application of selection skew-normal model to the NDI scores
To illustrate our model, we restrict attention to the measurement at 8 months. Table 3 shows
that sex and age are possible predictors of nonresponse for this month. These variables are
used in the selection equation.
The results of fitting the SNSM, SNM and TS models to the NDI scores at 8 months are
presented in Table 4. The intercept estimates differ substantially, as expected from the
simulation results. In addition, parameter estimates are similar for covariates in the outcome
equation for the three models. However, as observed in the simulation study, the coefficients
in the probit selection equation for SNM and TS are consistently larger. The degree of
this will depend on the intensity of the skewness present in the data. A similar effect is
also noticeable for the NDI scores. In particular, the skewness parameter (λ = 1.509) is
statistically significant in the SNSM model. This implies that neglecting the influence of λ
in the model, although it leads to the same qualitative conclusions for the covariate effects in
the outcome equation, will impair the predictive power of the model. The SNSM is more
general, with the advantage of having good predictive power whether or not there is skewness
in the data and, of course, has SNM as a special case.
Table 4: Fit of selection skew-normal model (SNSM), Selection-normal model (SNM), and
Heckman two-step model to the NDI scores at 8 months.

                       SNSM                        SNM                       Two Step
               Estimate  S.E.   p-value    Estimate  S.E.   p-value    Estimate  S.E.   p-value
Selection Equation
int             0.165    0.160  0.303       0.802    0.108  0.000       0.818    0.111  0.000
age             0.021    0.005  0.000       0.024    0.006  0.000       0.025    0.006  0.000
sex(female)     0.330    0.123  0.007       0.384    0.142  0.007       0.383    0.147  0.009
Outcome Equation
int            -4.072    0.906  0.000       0.491    0.732  0.502      -2.867    5.002  0.567
age             0.078    0.025  0.001       0.088    0.024  0.000       0.150    0.113  0.187
sex(female)     0.418    0.629  0.507       0.513    0.636  0.420       1.566    2.030  0.441
prev            0.676    0.035  0.000       0.684    0.035  0.000       0.708    0.033  0.000
trt(physio)     0.754    0.539  0.162       0.879    0.538  0.102       0.985    0.545  0.072
σ               7.764    0.544  0.000       6.188    0.290  0.000       -        -      -
ρ               0.787    0.131  0.000       0.802    0.072  0.000       -        -      -
λ               1.509    0.432  0.001       -        -      -           -        -      -

6
Conclusion
We introduced a sample selection model with underlying bivariate skew-normal distribution
which we called Skew-normal selection model (SNSM). This model is more flexible than the
conventional sample selection model since it has an extra parameter that regulates skewness
and has conventional sample selection model as a special case. Its moment estimator was
derived using the link between skew models arising from selection and hidden truncation
formulation of skew models. The moment estimator was shown to extend Heckman two-step
model. Maximum likelihood estimation for parameters of the model was considered. A Monte
Carlo study was used to compare the model with conventional sample selection models with
moderate correlation (ρ = 0.5) and varying degree of skewness between 0 and 5. We also fixed
λ to be 1 and 2, and considered the effect of varying the correlation ρ under the exclusion
restriction criteria. The simulation showed that the selection skew-normal model outperforms
the conventional sample selection models for all the skewness parameters considered. The
conventional sample selection model has a negligible advantage when λ = 0 with smaller bias
in the intercept of outcome equation. We also noted that the conventional sample selection
model breaks down as λ increases to 5 (which is almost a folded normal distribution), whereas
the SNSM works well if it converges. The model is very promising even in the absence of
exclusion restriction criteria.
In addition, the model has good estimates of the intercept both in the selection and
outcome equations and hence will give better predictions even when the underlying process
is bivariate normal. We believe that this model should perform well in modeling heavier
tailed data, which is also a prominent departure from normality.
It should be noted that the model presented here is very simple to use. In fact, the model
can be readily implemented using the sampleSelection package in R (see Toomet and
Henningsen (2008)). What is needed is an additional parameter λ to capture skewness, recoding of the log-likelihood function to reflect equation (4.32), and either the use of numerical
gradients or the addition of analytic gradients based on the new log-likelihood function. Starting
values can be obtained using the two-step method in the sampleSelection package. However, we recommend obtaining a starting value for λ by fitting the Azzalini skew-normal
model to complete cases with the intended covariates for the outcome equation. Further, the
optimization routine used was BFGS, but other numerical maximization algorithms can be
used as well.
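Where fitting the Azzalini skew-normal model to the complete cases is inconvenient, a cruder starting value for λ can be obtained from the sample skewness of complete-case residuals. The sketch below is a swapped-in method-of-moments shortcut, not the procedure recommended above: it inverts the skew-normal skewness formula γ1 = ((4 − π)/2) m³/(1 − m²)^{3/2}, with m = δ√(2/π) and λ = δ/√(1 − δ²). The function names and the clamping constant are hypothetical.

```python
import math

def sample_skewness(values):
    """Moment estimator of skewness, m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def lambda_start_from_skewness(g1, max_abs_delta=0.995):
    """Invert the skew-normal skewness formula to get a starting value for lambda.

    Sample skewness beyond the skew-normal maximum (about 0.995) would give
    |delta| >= 1, so delta is clamped to max_abs_delta (an arbitrary choice).
    """
    k = ((4.0 - math.pi) / 2.0) ** (2.0 / 3.0)
    a = abs(g1) ** (2.0 / 3.0)
    delta = math.sqrt((math.pi / 2.0) * a / (a + k))
    delta = min(delta, max_abs_delta)
    lam = delta / math.sqrt(1.0 - delta ** 2)
    return math.copysign(lam, g1)

# Illustrative residuals only; in practice these would be complete-case
# residuals from a regression of the outcome on its covariates.
residuals = [-0.8, -0.5, -0.3, -0.1, 0.0, 0.2, 0.4, 0.9, 1.7, 2.6]
print(lambda_start_from_skewness(sample_skewness(residuals)))
```

Such a value is only meant to initialise the optimiser; the maximum likelihood fit then refines it.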
On the issue of model identification, the model is well identified in the sense that for
any Θ1 ≠ Θ2, f(y, Θ1) ≠ f(y, Θ2), where Θ1 and Θ2 are model parameters. This is usually
the case with sample selection models since additional information comes into the model
through the selection process. However, in the absence of exclusion restriction and with
λ approaching infinity, the model is weakly identified. It is noteworthy that the model
will fail when λ and ρ equal zero simultaneously. This is not related to the identification
of the model parameters but to the usual singularity of the Fisher information and observed
information matrices suffered by Azzalini's skew-normal distribution at λ = 0. To make
the SNSM model generally useful in practice, a reparametrization of the model to circumvent
this problem may be required, and our future work will look into this.
We also note that model (4.21) is more general than the one presented here. However, it
is computationally complicated. Apart from this, the parameter ρ is no longer adequate to
capture the underlying association. The model therefore needs to be re-parameterized using
correlation curves. Our future development of ideas presented here will look into this and
the Heckman-like two-step estimator given by equation (4.29). An approach which requires
only the knowledge of marginal distribution of the error terms in both the selection and the
outcome equations was presented in Lee (1983). Our subsequent work will examine how
this model compares to the ones we have presented here. Methods for testing hypothesis of
selection bias (ρ = 0) when the underlying error distribution is bivariate skew-normal will
also be examined in the future development of the skew-normal selection model.
To apply this model in practice, we recommend that the model is fitted in conjunction
with the conventional sample selection model. This can be used to assess the degree of
departure from normality. The model could be of benefit in clinical trials and it has prospects
in fields where observational studies are conducted (econometrics, psychology, politics) and
respondents need to complete questionnaires.
References
Arellano-Valle, R. B., M. D. Branco, and M. G. Genton (2006). A unified view of skewed
distributions arising from selections. The Canadian Journal of Statistics 34, 581–601.
Arnold, B. C. and R. J. Beaver (2002). Skewed multivariate models related to hidden truncation and/or selective reporting. Test 11, 7–54.
Capitanio, A., A. Azzalini, and E. Stanghellini (2003). Graphical models for skew-normal
variates. Scandinavian Journal of Statistics 30, 129–144.
Chang, C., J. Lin, N. Pal, and M. Chiang (2008). A Note on Improved Approximation of
the Binomial Distribution by the Skew-Normal Distribution. American Statistician 62(2),
167–170.
Copas, J. B. and H. Li (1997). Inference for non-random samples. J. R. Statist. Soc. B 59,
55–95.
Gonzalez-Farias, G., J. A. Dominguez-Molina, and A. K. Gupta (2004). The closed skew-normal. In M. G. Genton (Ed.), Skew-Elliptical Distributions and Their Applications: A
Journey Beyond Normality, pp. 25–42. Boca Raton, Florida: Chapman & Hall, CRC.
Heckman, J. (1976). The common structure of statistical models of truncation, sample
selection and limited dependent variables and a simple estimator for such models. Annals
of Economic and Social Measurement 5, 475–492.
Heckman, J. (1979). Sample selection bias as a specification error. Econometrica 47,
153–161.
Jamalizadeh, A., J. Behboodian, and N. Balakrishnan (2008). A two-parameter generalized
skew-normal distribution. Statistics and Probability Letters 78, 1722–1728.
Kotz, S., N. Balakrishnan, and N. L. Johnson (2000). Continuous Multivariate Distributions, Vol. 1 (2 ed.). New York: John Wiley & Sons Ltd.
Lamb, S. E., S. Gates, M. R. Underwood, M. W. Cooke, D. Ashby, A. Szczepura, M. A.
Williams, E. M. Williamson, E. J. Withers, S. M. Isa, and A. Gumber (2007). Managing
Injuries of the Neck Trial (MINT): design of a randomised controlled trial of treatments
for whiplash associated disorders. BMC Musculoskeletal Disorders 8, 7.
Lee, L. (1983). Generalized econometric models with selectivity. Econometrica 51, 507–512.
Lin, J., C. Chang, and M. R. Jou (2010). A Note on Skew-Normal Distribution Approximation to the Negative Binomial Distribution. WSEAS Transactions on Mathematics 9(1),
32–41.
Marchenko, Y. V. and M. G. Genton (2011). A Heckman selection-t model. Institute for
Applied Maths and Computer Sci., Texas A & M University. Paper No. 171 − 2011.
Maydeu-Olivares, A., D. L. Coffman, and W. M. Hartmann (2007). Asymptotically
distribution-free (ADF) interval estimation of Coefficient alpha. Psychological Methods 12,
157–176.
Puhani, P. A. (2000). The Heckman correction for sample selection and its critique. Journal
of Economic Surveys 14, 53–68.
Robins, J. M. and R. D. Gill (1997). Non-response models for the analysis of nonmonotone
ignorable missing data. Statistics in Medicine 16, 39–56.
Rubin, D. B. (1976). Inference and missing data. Biometrika 63, 581–592.
Toomet, O. and A. Henningsen (2008). Sample selection models in R: Package sampleSelection. Journal of Statistical Software 27(7).
Vernon, H. (2009). The Neck Disability Index: An instrument for measuring self-rated disability due to neck pain or whiplash-associated disorder. Last accessed February 20, 2010 at
http://www.cmcc.ca/Portals/0/PDFs/Research_05_2009_NDI_Manual.pdf.
Vernon, H. and S. Mior (1991). The Neck Disability Index: a study of reliability and validity.
J. Manipulative Physiol Ther. 7, 409–415.
Yuan, K.-H., C. A. Guarnaccia, and B. Hayslip (2003). A study of the distribution of sample coefficient alpha with the Hopkins symptom checklist: Bootstrap versus asymptotics.
Educational and Psychological Measurement 63, 5–23.
7
Appendix
Derivation of Moment Generating function
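The moment generating function given by equation (4.27) is derived as follows:
$$
\begin{aligned}
E(e^{tz}) &= k(\lambda_0, \lambda_1, \lambda) \int_{-\infty}^{\infty} e^{tz}\,\phi(z)\,\Phi(\lambda_1 z)\,\Phi(\lambda_0 + \lambda z)\,dz \\
&= k(\lambda_0, \lambda_1, \lambda)\,e^{t^{2}/2} \int_{-\infty}^{\infty} \phi(z - t)\,\Phi(\lambda_1 z)\,\Phi(\lambda_0 + \lambda z)\,dz \\
&\qquad \text{(put } x = z - t\text{)} \\
&= k(\lambda_0, \lambda_1, \lambda)\,e^{t^{2}/2} \int_{-\infty}^{\infty} \phi(x)\,\Phi(\lambda_1 x + \lambda_1 t)\,\Phi(\lambda_0 + \lambda x + \lambda t)\,dx \\
&= k(\lambda_0, \lambda_1, \lambda)\,e^{t^{2}/2}\,E\{\Phi(\lambda_1 X + \lambda_1 t)\,\Phi(\lambda_0 + \lambda X + \lambda t)\} \\
&= k(\lambda_0, \lambda_1, \lambda)\,e^{t^{2}/2}\,P(Y_1 - \lambda_1 X < \lambda_1 t,\; Y_2 - \lambda X < \lambda_0 + \lambda t) \\
&= k(\lambda_0, \lambda_1, \lambda)\,e^{t^{2}/2}\,\Phi_2\!\left(\frac{\lambda_1 t}{\sqrt{1 + \lambda_1^{2}}},\, \frac{\lambda_0 + \lambda t}{\sqrt{1 + \lambda^{2}}};\, \frac{\lambda_1 \lambda}{\sqrt{1 + \lambda_1^{2}}\,\sqrt{1 + \lambda^{2}}}\right),
\end{aligned}
$$
where X, Y1, Y2 are iid N(0, 1), and
$$
P(Y_1 - \lambda_1 X < \lambda_1 t,\; Y_2 - \lambda X < \lambda_0 + \lambda t)
= \Phi_2\!\left(\frac{\lambda_1 t}{\sqrt{1 + \lambda_1^{2}}},\, \frac{\lambda_0 + \lambda t}{\sqrt{1 + \lambda^{2}}};\, \frac{\lambda_1 \lambda}{\sqrt{1 + \lambda_1^{2}}\,\sqrt{1 + \lambda^{2}}}\right).
$$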
Derivation of Gradients
The gradient of the selection skew-normal model log-likelihood given by (4.32) can be derived
as follows:
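$$
\begin{aligned}
\frac{\partial l}{\partial \gamma} &= s_i\,\frac{1}{\sqrt{1 - \rho^{2}}} \sum_{i=1}^{n} K_1 x_i + (1 - s_i) \sum_{i=1}^{n} (-2) K_2 x_i \\
\frac{\partial l}{\partial \beta} &= s_i \left\{ \frac{1}{\sigma^{2}} \sum_{i=1}^{n} (y_i - \beta' x_i)\,x_i - \frac{\lambda}{\sigma} \sum_{i=1}^{n} K_3 x_i - \frac{\rho}{\sigma\sqrt{1 - \rho^{2}}} \sum_{i=1}^{n} K_1 x_i \right\} \\
\frac{\partial l}{\partial \sigma} &= s_i \left\{ -\frac{n}{\sigma} + \frac{1}{\sigma^{3}} \sum_{i=1}^{n} (y_i - \beta' x_i)^{2} - \frac{\lambda}{\sigma^{2}} \sum_{i=1}^{n} K_3\,(y_i - \beta' x_i) - \frac{\rho}{\sigma^{2}\sqrt{1 - \rho^{2}}} \sum_{i=1}^{n} K_1\,(y_i - \beta' x_i) \right\} \\
\frac{\partial l}{\partial \rho} &= s_i\,\frac{1}{(1 - \rho^{2})^{3/2}} \sum_{i=1}^{n} K_1 \left( \rho\,\gamma' x_i + \frac{y_i - \beta' x_i}{\sigma} \right) + (1 - s_i)\,\frac{\lambda + \lambda^{3}}{(1 + \lambda^{2} - \lambda^{2}\rho^{2})^{3/2}} \sum_{i=1}^{n} 2 K_4 \\
\frac{\partial l}{\partial \lambda} &= s_i \sum_{i=1}^{n} K_3\,\frac{(y_i - \beta' x_i)}{\sigma} + (1 - s_i)\,\frac{\rho}{(1 + \lambda^{2} - \lambda^{2}\rho^{2})^{3/2}} \sum_{i=1}^{n} 2 K_4,
\end{aligned}
$$
where
$$
K_1 = \frac{\phi\!\left(\frac{\gamma' x_i + \rho\,\frac{(y_i - \beta' x_i)}{\sigma}}{\sqrt{1 - \rho^{2}}}\right)}{\Phi\!\left(\frac{\gamma' x_i + \rho\,\frac{(y_i - \beta' x_i)}{\sigma}}{\sqrt{1 - \rho^{2}}}\right)},
\qquad
K_2 = \frac{\phi(-\gamma' x_i)\,\Phi\!\left(\frac{-\gamma' x_i\,\lambda\rho}{\sqrt{1 + \lambda^{2} - \lambda^{2}\rho^{2}}}\right)}{\Phi_{SN}\!\left(-\gamma' x_i;\, 0, 1, \frac{\lambda\rho}{\sqrt{1 + \lambda^{2} - \lambda^{2}\rho^{2}}}\right)},
$$
$$
K_3 = \frac{\phi\!\left(\lambda\,\frac{(y_i - \beta' x_i)}{\sigma}\right)}{\Phi\!\left(\lambda\,\frac{(y_i - \beta' x_i)}{\sigma}\right)},
\qquad
K_4 = \frac{\frac{1}{\sqrt{2\pi}}\,\phi\!\left(-\gamma' x_i;\, 0, \frac{1 + \lambda^{2} - \lambda^{2}\rho^{2}}{1 + \lambda^{2}}\right)}{\Phi_{SN}\!\left(-\gamma' x_i;\, 0, 1, \frac{\lambda\rho}{\sqrt{1 + \lambda^{2} - \lambda^{2}\rho^{2}}}\right)}.
$$
Note that the derivative of $\Phi_{SN}\!\left(-\gamma' x_i;\, 0, 1, \frac{\lambda\rho}{\sqrt{1 + \lambda^{2} - \lambda^{2}\rho^{2}}}\right)$ w.r.t. γ follows the usual differentiation
of a cdf to get the pdf. However, the derivatives with respect to ρ and λ in this expression are not
a straightforward application of this principle. The approach we followed is to rewrite the
cdf above as a standard bivariate normal integral. We then used the general results given in
Chapter 46 of Kotz et al. (2000). That is, if Φ2(., .; ρ) and φ2(., .; ρ) are the standard bivariate
normal cdf and pdf respectively, then
$$
\frac{d\Phi_2(\cdot, \cdot;\, \rho)}{d\rho} = \phi_2(\cdot, \cdot;\, \rho).
$$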
Table 5: Simulation results for λ = 1 and varying ρ in the presence of exclusion restriction.

                         Bias                            MSE
                SNSM      SNM       TS        SNSM      SNM       TS
ρ = 0.0   β0    0.0990    0.5636    0.5637    0.0977    0.3196    0.3197
          β1    0.0007    0.0004    0.0003    0.0014    0.0014    0.0014
          γ0   -0.0035    0.0072    0.0073    0.0110    0.0054    0.0054
          γ1    0.0049    0.0075    0.0076    0.0063    0.0063    0.0063
          γ2    0.0109    0.0148    0.0149    0.0101    0.0102    0.0102
          σ    -0.0385   -0.1746   -0.1746    0.0121    0.0310    0.0310
          ρ     0.0042    0.0022    0.0015    0.0194    0.0145    0.0143
          λ    -0.2164    -         -         0.3274    -         -
ρ = 0.3   β0    0.0461    0.5627    0.5630    0.0384    0.3183    0.3187
          β1    0.0007    0.0009    0.0007    0.0013    0.0013    0.0013
          γ0    0.0209    0.1960    0.1965    0.0181    0.0448    0.0451
          γ1    0.0058    0.0226    0.0233    0.0068    0.0072    0.0073
          γ2    0.0114    0.0366    0.0376    0.0111    0.0119    0.0121
          σ    -0.0113   -0.1714   -0.1714    0.0071    0.0299    0.0299
          ρ    -0.0026   -0.0432   -0.0449    0.0173    0.0153    0.0158
          λ    -0.0541    -         -         0.1529    -         -
ρ = 0.7   β0    0.0484    0.5614    0.5619    0.0375    0.3164    0.3171
          β1    0.0001    0.0009    0.0006    0.0011    0.0011    0.0012
          γ0    0.0637    0.5395    0.5437    0.0478    0.3011    0.3065
          γ1    0.0185    0.1036    0.1078    0.0093    0.0187    0.0202
          γ2    0.0309    0.1583    0.1645    0.0173    0.0383    0.0412
          σ    -0.0123   -0.1684   -0.1683    0.0067    0.0289    0.0289
          ρ    -0.0093   -0.0656   -0.0683    0.0070    0.0113    0.0165
          λ    -0.0564    -         -         0.1518    -         -
Table 6: Simulation results for λ = 2 and varying ρ in the presence of exclusion restriction.

                         Bias                            MSE
                SNSM      SNM       TS        SNSM      SNM       TS
ρ = 0.0   β0    0.0009    0.7127    0.7130    0.0041    0.5093    0.5097
          β1    0.0010    0.0004    0.0004    0.0009    0.0030    0.0030
          γ0   -0.0004    0.0072    0.0073    0.0193    0.0054    0.0054
          γ1    0.0006    0.0074    0.0076    0.0062    0.0063    0.0063
          γ2    0.0045    0.0148    0.0149    0.0100    0.0102    0.0102
          σ    -0.0004   -0.2996   -0.2996    0.0027    0.0902    0.0902
          ρ     0.0013    0.0030    0.0013    0.0265    0.0138    0.0135
          λ     0.0339    -         -         0.1110    -         -
ρ = 0.3   β0    0.0006    0.7107    0.7114    0.0038    0.5063    0.5073
          β1    0.0010    0.0013    0.0030    0.0008    0.0009    0.0009
          γ0    0.0040    0.2538    0.2543    0.0256    0.0714    0.0717
          γ1    0.0022    0.0325    0.0333    0.0073    0.0080    0.0081
          γ2    0.0057    0.0509    0.0524    0.0125    0.0138    0.0140
          σ    -0.0001   -0.2918   -0.2919    0.0023    0.0856    0.0856
          ρ    -0.0014   -0.0739   -0.0776    0.0224    0.0200    0.0203
          λ     0.0331    -         -         0.1034    -         -
ρ = 0.7   β0    0.0024    0.7072    0.7079    0.0034    0.5010    0.5021
          β1    0.0005    0.0026    0.0021    0.0007    0.0008    0.0008
          γ0    0.0251    0.7605    0.7683    0.0311    0.5911    0.6038
          γ1    0.0141    0.1754    0.1832    0.0102    0.0400    0.0432
          γ2    0.0226    0.2629    0.2757    0.0198    0.0849    0.0922
          σ    -0.0012   -0.2851   -0.2850    0.0021    0.0817    0.00816
          ρ    -0.0086   -0.1162   -0.1194    0.0088    0.0236    0.0295
          λ     0.0284    -         -         0.0991    -         -