Non-Local Priors for High-Dimensional Estimation

David Rossell (University of Warwick, Department of Statistics)
Donatello Telesca (UCLA, Department of Biostatistics)

April 16, 2014
Abstract
Simultaneously achieving parsimony and good predictive power in high dimensions is a main challenge in statistics. Non-local priors (NLPs) possess appealing properties for high-dimensional model choice, but their use for estimation has not been studied in detail, mostly due to difficulties in characterizing the posterior on the parameter space. We give a general representation of NLPs as mixtures of truncated distributions. This enables simple posterior sampling and flexibly defining NLPs beyond previously proposed families. We develop posterior sampling algorithms and assess performance in p >> n setups. We observed low posterior serial correlation and notably accurate high-dimensional estimation for linear models. Relative to benchmark and hyper-g priors, SCAD and LASSO, combining NLPs with Bayesian model averaging provided substantially lower estimation error when p >> n. In gene expression data they achieved higher cross-validated R2 by using an order of magnitude fewer predictors than competing methods. Remarkably, these results were obtained without the need for methods to pre-screen predictors. Our findings contribute to the debate on whether different priors should be used for estimation and model selection, showing that selection priors may actually be desirable for high-dimensional estimation.
Keywords: Model Selection, MCMC, Non Local Priors, Bayesian Model
Averaging
1 Introduction
Developing high-dimensional methods that balance parsimony and good predictive power is one of the main challenges in statistics. Non-local prior
(NLP) distributions have appealing properties for Bayesian model selection. Relative to local priors, NLPs discard spurious covariates faster as the
sample size n grows, while preserving exponential learning rates to detect
non-zero coefficients [Johnson and Rossell, 2010]. This additional parsimony enforcement has important consequences in high dimensions. Denote
the observations by yn ∈ Yn , where Yn is the sample space. In Normal
regression models, when the number of variables p grows at rate O(n^α) with
0.5 ≤ α < 1 and the ratios between model prior probabilities are bounded,
the posterior probability P (Mt | yn ) assigned to the data-generating model
Mt converges in probability to 1 when using NLPs. In contrast, when using
local priors P (Mt | yn ) converges to 0 [Johnson and Rossell, 2012]. That
is, consistency of P (Mt | yn ) occurs if and only if NLPs are used. These
results make NLPs potentially very attractive for high-dimensional estimation, but this use has not been studied in detail. We remark that consistency
of P (Mt | yn ) can also be achieved using model prior probabilities which
increasingly encourage sparsity as p grows [Liang et al., 2013]. While useful,
the extra complexity penalty induced by this strategy is not guided by the
data but by a priori assumptions on the model space, and is complementary
to parameter-specific priors such as NLPs.
For simplicity we assume a parametric density f (yn | θ) indexed by
θ ∈ Θ ⊆ Rp , defined with respect to an adequate σ-algebra and σ-finite dominating measure. We entertain a collection of K models M1 , . . . , MK with
corresponding parameter spaces Θ1 , . . . , ΘK ⊆ Θ. We assume that models
are nested, where ΘK = Θ is the full model and for any k′ ∈ {1, . . . , K − 1}
there exists k such that Θk′ ⊂ Θk, dim(Θk′) < dim(Θk) and Θk ∩ Θk′ has 0
Lebesgue measure. We say that π(θ | Mk ), an absolutely continuous density
for θ ∈ Θk under model Mk , is a NLP if it converges to 0 as θ approaches
any value θ0 that would be consistent with a submodel Mk′.
Definition 1. Let ‖θ − θ0‖ be a suitable distance between θ ∈ Θk and θ0 ∈ Θk′ ⊂ Θk. π(θ | Mk) is a non-local prior under model Mk if for all θ0, k′ ≠ k and ε > 0 there exists η > 0 such that ‖θ − θ0‖ < η implies that π(θ | Mk) < ε.
Intuitively, NLPs define probabilistic separation of the models under
consideration, which is the basis for their improved learning rates. For preciseness, the definition also assumes that (as usual) the intersection between
any two models (k, k′) of equal dimensionality is included in another model k″, i.e. Θk ∩ Θk′ ⊆ Θk″.
To fix ideas, consider the Normal linear regression y ∼ N (Xθ, φI) with
unknown variance φ. The three following distributions are NLPs under Mk .
πM(θ | φ, Mk) = ∏_{i ∈ Mk} (θi² / (τφ)) N(θi; 0, τφ),                    (1)

πI(θ | φ, Mk) = ∏_{i ∈ Mk} ((τφ)^{1/2} / (√π θi²)) exp{−τφ / θi²},        (2)

πE(θ | φ, Mk) = ∏_{i ∈ Mk} exp{√2 − τφ / θi²} N(θi; 0, τφ),              (3)
where i ∈ Mk are the non-zero coefficients, N (θi ; 0, v) is the univariate
Normal density with mean 0 and variance v and πM , πI and πE are called
the product MOM, iMOM and eMOM priors (respectively).
Consider now an estimation problem where interest lies either on the posterior distribution conditional on a single model π(θ | Mk, yn) or the Bayesian model averaging (BMA) π(θ | yn) = Σ_{k=1}^K π(θ | Mk, yn) P(Mk | yn). Let Mt be the data-generating model. Whenever P(Mt | yn) → 1,
the oracle property π(θ | yn ) → π(θ | Mt , yn ) is achieved by direct application of Slutsky’s theorem. For instance, under the conditions in Johnson
and Rossell [2012] NLPs achieve P (Mt | yn ) → 1 and hence achieve the
oracle property. However, using NLPs for estimating θ presents important
challenges. Strategies to compute P (Mk | yn ) and explore the model space
are available, but π(θ | Mk, yn) has extreme multi-modalities in high dimensions, e.g. the posteriors induced by (1)-(3) have 2^{pk} modes, where pk = dim(Θk). Also, finding functional forms that lead to simple calculations limits the flexibility in choosing the prior.
The main contribution of this manuscript lies in the representation of
NLPs as mixtures of truncated distributions (Section 3). We prove that this
is a one-to-one characterization, in the sense that any NLP can be represented as such a mixture, and any non-degenerate mixture induces a NLP.
The representation provides an intuitive justification for NLPs, adds flexibility in prior choice and enables efficient posterior sampling even under
strong multi-modalities (Section 4). We also study finite-sample performance in case studies with growing p. (Section 5). Throughout we place
emphasis on high dimensions and devote our interest to proper priors, as
their shrinkage properties outperform improper priors even in moderate dimensions. See the early arguments in Stein [1956], the review on penalized
likelihood in Fan and Lv [2010], or related Bayesian perspectives in George
et al. [2012]. To provide intuition, we first introduce a simple example built
on basic principles.
2 Non-local priors for testing and estimation
In the fundamental Bayesian paradigm the prior depends only on one’s
knowledge (or lack thereof), but in practice the analysis goals may affect
the prior choice. For instance, different priors are often used for estimation
and model selection (although see Bernardo [2010] and references therein
for an alternative decision-theoretic approach). While the practice deviates
from the basic paradigm, it can be argued that some preferences are hard to
formalize in the utility function and may be easier to account for in the prior.
For instance, in high dimensions point masses may simplify interpretation
and computation, which could be otherwise unfeasible.
Suppose we wish to both estimate θ ∈ R and test H0: θ = 0 vs. H1: θ ≠ 0, and that the analyst is comfortable with specifying a (possibly vague)
prior for the estimation problem. As an example, the gray line in Figure
1 shows a Cauchy(0, 0.25) prior expressing confidence that θ is close to 0,
e.g. P (|θ| > 0.25) = 0.5 and P (|θ| < 3) = 0.947. Setting a prior for
the testing problem is more challenging, as the analyst wishes to assign
positive prior probability to H0 . To be consistent, she aims to preserve the
estimation prior as much as possible. This could indicate a belief that θ = 0
exactly or simply reflect that |θ| < λ are irrelevant, where λ is a practical
significance threshold. She sets λ = 0.25 and combines a point mass at 0
with a Cauchy(0, 0.25) truncated to exclude (−0.25, 0.25), each with weight
0.5. The black line in Figure 1 (top) shows the resulting prior. It assigns the
same P (|θ| > θ0 ) as the estimation prior for any θ0 ≥ 0.25 and concentrates
all probability in (−0.25, 0.25) at θ = 0.
The intuitive appeal of truncated priors for Bayesian tests has been argued before [Verdinelli and Wasserman, 1996, Rousseau, 2010, Klugkist and
Hoijtink, 2007]. They encourage coherency between estimation and testing.
As important limitations they require a practical significance threshold λ,
and even as n → ∞ there is no chance of detecting small but non-zero effects. In fact, most Bayesian approaches to hypothesis testing combine point
masses with untruncated priors. These include Jeffreys-Zellner-Siow priors
[Jeffreys, 1961, Zellner and Siow, 1980, 1984], g and hyper-g priors [Zellner, 1986, Liang et al., 2008], unit information priors [Kass and Wasserman,
1995] and conventional priors [Bayarri and Garcia-Donato, 2007]. Although
Figure 1: Marginal priors for θ ∈ R (estimation prior Cauchy(0, 0.0625)
shown in grey). Top: mixture of point mass at 0 and Cauchy(0, 0.0625)
truncated at λ = 0.25; Middle: same with untruncated Cauchy(0, 0.0625);
Bottom: same as top with λ ∼ IG(3, 10)
not strictly necessary, most objective Bayes approaches are also based on untruncated priors (e.g. O’Hagan [1995], Berger and Pericchi [1996], Moreno
et al. [1998], Berger and Pericchi [2001], Pérez and Berger [2002]). Figure
1 (middle) shows an untruncated Cauchy(0, 0.25) combined with a point
mass at 0. It is substantially more concentrated around 0 than the original
estimation prior, e.g. P (|θ| > 0.25) decreased from 0.5 to 0.25. Setting a
larger scale parameter for the Cauchy would not fix the issue, as the Cauchy
mode would remain at 0. We view this discrepancy between estimation and
testing priors as troublesome, as their underlying beliefs cannot be easily
reconciled.
Suppose that the analyst goes back to the truncated Cauchy but now
expresses her uncertainty in the truncation point by placing a prior λ ∼
G(2.5, 10), so that E(λ) = 0.25. Figure 1 (bottom) shows the marginal
prior on θ, obtained by integrating out λ. The prior under H1 is a smooth
version of the truncated Cauchy that goes to 0 as θ → 0, hence it is a NLP
(Definition 1). Relative to the estimation prior, most of the probability
assigned to θ ≈ 0 is absorbed by the point mass, and P (|θ| > θ0 ) is roughly
preserved for θ0 > 0.5. In contrast with the truncated Cauchy, it avoids
setting a fixed λ and aims to detect any θ ≠ 0. The example is meant to
show a case where a NLP arises as a mixture of truncated distributions, and
that this builds NLPs from first principles. We formalize this intuition in
the following section.
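To make the construction concrete, the marginal prior just described can be rendered numerically. Below is a minimal base-R sketch, assuming the Cauchy(0, 0.25) scale and the λ ∼ Gamma(2.5, 10) hyperprior quoted above; the Monte Carlo sample size is an arbitrary choice.

```r
## Monte Carlo rendering of the prior under H1: a Cauchy(0, 0.25) truncated to |theta| > lambda,
## with lambda ~ Gamma(2.5, 10) integrated out (a sketch; sample size is illustrative).
set.seed(1)
B      <- 1e5
lambda <- rgamma(B, shape = 2.5, rate = 10)          # random truncation points, E(lambda) = 0.25
p.out  <- 2 * pcauchy(-lambda, scale = 0.25)         # prior mass outside (-lambda, lambda)
u      <- runif(B)                                   # inverse-cdf draw from the truncated Cauchy
theta  <- ifelse(u < 0.5,
                 qcauchy(u * p.out,           scale = 0.25),   # left tail
                 qcauchy(1 - (1 - u) * p.out, scale = 0.25))   # right tail
## the resulting density vanishes at 0, i.e. the mixture over lambda is a non-local prior
plot(density(theta[abs(theta) < 3]), xlim = c(-2, 2), main = "", xlab = expression(theta))
```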
3 Non-local priors as truncation mixtures
We establish the correspondence between NLPs and mixtures of truncated
distributions. In Section 3.1 we show that the two classes are essentially
equivalent, and in Section 3.2 we study how the nature of the mixture determines some relevant NLP characteristics. Our subsequent discussion is
conditional on a given model Mk . Therefore, we simplify notation letting
π(θ) = π(θ | Mk ) and refer to dim(Θk ) as p.
We note that any NLP density under Mk can be written as π(θ) ∝ d(θ)πu(θ), where the penalty d(θ) → 0 as θ → θ0 for any θ0 ∈ Θk′ ⊂ Θk and πu(θ) is an arbitrary prior. To ensure that π(θ) is proper we assume ∫ d(θ)πu(θ) dθ < ∞. NLPs are often expressed in this form (e.g. (1) or (3)), but the representation is always possible: a NLP density π(θ) can be written as π(θ) = [π(θ)/πu(θ)] πu(θ) = d(θ)πu(θ), where πu(θ) is any local prior density and d(θ) = π(θ)/πu(θ).
3.1 Equivalence between NLPs and truncation mixtures
We first prove that truncation mixtures define valid NLPs, and subsequently
that any NLP may be represented via truncation schemes. Given that the
representation is not unique, we give two constructive approaches and discuss their relative merits.
Let πu (θ) be an arbitrary prior on θ (typically, a local prior) and let
λ ∈ R+ be a latent truncation point. Define the conditional prior π(θ | λ) ∝
πu(θ) I(d(θ) > λ), where as before d(θ) → 0 as θ → θ0 ∈ Θk′ ⊂ Θk, and let
π(λ) be a marginal prior for λ. The following proposition holds.
Proposition 1. Define π(θ | λ) ∝ πu(θ) I(d(θ) > λ), assume that π(λ) places no probability mass at λ = 0 and that πu(θ) is bounded in a neighbourhood around any θ0 ∈ Θk′ ⊂ Θk. Then π(θ) = ∫ π(θ | λ) π(λ) dλ defines a non-local prior.
Proof. See Appendix A.1.
As a corollary, whenever d(θ) can be expressed as the product of independent penalties di (θi ) NLPs may also be induced with multiple latent
truncation variables. This alternative representation can be convenient for
sampling (as illustrated later on) or to avoid the marginal dependency between elements in θ induced by a common truncation.
Corollary 1. Define πu(θ) as in Proposition 1 with d(θ) = ∏_{i=1}^p di(θi). Let the latent truncations λ = (λ1, . . . , λp)′ ∈ R+^p have an absolutely continuous prior π(λ). Then π(θ | λ) ∝ πu(θ) ∏_{i=1}^p I(di(θi) > λi) defines a non-local prior.

Proof. Replace I(d(θ) > λ) by ∏_{i=1}^p I(di(θi) > λi) in the proof of Proposition 1. Letting any λi go to 0 and applying the same argument delivers the result.
Example 1. Consider the linear regression model y ∼ N(Xθ, σ²I), where y = (y1, . . . , yn)′, θ ∈ R^p, σ² is known and I is the n × n identity matrix. Variable selection is conceptualized as the vanishing of any component θi of θ (i = 1, . . . , p). Accordingly, we define a NLP for θ that penalizes θi → 0 with a single truncation point as in Proposition 1, e.g. π(θ | λ) ∝ N(θ; 0, τI) I(∏_{i=1}^p θi² > λ). To complete the prior specification we choose some specific form for π(λ), e.g. Gamma or Inverse Gamma. Obviously, the choice of π(λ) affects the properties of the marginal prior π(θ). We study this issue in Section 3.2. An alternative prior based on Corollary 1 is π(θ | λ1, . . . , λp) ∝ N(θ; 0, τI) ∏_{i=1}^p I(θi² > λi). This prior results in marginal independence as long as the prior on (λ1, . . . , λp) has independent components.
We turn attention to the complementary question: is it possible to represent any given NLP with latent truncations? The following proposition
proves that such representation can always be achieved with a single truncation variable λ and an adequate prior choice π(λ).
Proposition 2. Let π(θ) ∝ d(θ)πu(θ) be an arbitrary non-local prior and denote h(λ) = Pu(d(θ) > λ), where Pu(·) is the probability under πu. Then π(θ) is the marginal prior associated to π(θ | λ) ∝ πu(θ) I(d(θ) > λ) and

π(λ) = h(λ) / Eu(d(θ)) ∝ h(λ),

where Eu(·) is the expectation with respect to πu(θ).
Proof. See Appendix A.2.
A corollary is that NLPs with product penalties d(θ) = ∏_{i=1}^p di(θi) can also be represented with multiple truncation variables.

Corollary 2. Let π(θ) ∝ πu(θ) ∏_{i=1}^p di(θi) be a non-local prior, denote h(λ) = Pu(d1(θ1) > λ1, . . . , dp(θp) > λp) and assume that h(λ) integrates to a finite constant. Then π(θ) is the marginal prior associated to π(θ | λ) ∝ πu(θ) ∏_{i=1}^p I(di(θi) > λi) and π(λ) ∝ h(λ).
Proof. See Appendix A.3.
The main advantage of Corollary 2 is that, in spite of introducing additional latent variables, the representation greatly facilitates sampling. The
technical condition that h(λ) has a finite integral is guaranteed when πu (θ)
has independent components, as the result then follows by applying Proposition 2 to each univariate marginal. We illustrate this issue in an example
where we seek to sample from the prior. Section 4 discusses the advantages
for posterior sampling.
Example 2. Consider the product MOM prior in (1), where τ is the prior dispersion. Setting d(θ) = ∏_{i=1}^p θi² and πu(θ) = N(θ; 0, τI) in Proposition 2, (1) can be represented using π(θ | λ) ∝ N(θ; 0, τI) I(∏_{i=1}^p θi² > λ) and the following marginal prior on λ:

π(λ) = P(∏_{i=1}^p θi²/τ^p > λ/τ^p) / E(∏_{i=1}^p θi²) = h(λ/τ^p) / τ^p,
Figure 2: 10,000 independent MOM prior draws (τ = 5). Lines indicate
true density.
where h(·) is the survival function for a product of independent chi-square random variables with 1 degree of freedom (Springer and Thompson [1970] give expressions for h(·)). Using this representation, prior draws are obtained as follows:

1. Draw u ∼ Unif(0, 1). Set λ = P^{-1}(u), where P(u) = Pπ(λ ≤ u) is the cdf associated to π(λ).
2. Draw θ ∼ N(0, τI) I(d(θ) > λ).

As important drawbacks, P(u) requires Meijer G-functions and is cumbersome to evaluate for large p. Furthermore, sampling from a multivariate Normal with truncation region ∏_{i=1}^p θi² > λ is also non-trivial.

As an alternative representation, given the product penalty ∏_{i=1}^p θi², we use Corollary 2 and sample from the univariate marginals. Let P(u) = P(λ < u) be the cdf associated to π(λ) = h(λ/τ)/τ, where h(·) is the survival of a χ²₁ distribution. For i = 1, . . . , p do:

1. Draw u ∼ Unif(0, 1) and set λi = P^{-1}(u).
2. Draw θi ∼ N(0, τ) I(θi² > λi).
The function P −1 (·) can be tabulated and quickly evaluated, rendering the
approach computationally efficient. Figure 2 shows 100,000 draws from univariate (left) and bivariate (right) product MOM priors with τ = 5.
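The two univariate steps above translate directly into code. Below is a minimal base-R sketch with τ = 5 (as in Figure 2); the grid used to tabulate P^{-1}(·) and its upper bound are illustrative choices, and the final histogram is only a check against the closed-form univariate pMOM density.

```r
set.seed(1)
tau <- 5                               # prior dispersion, as in Figure 2

## cdf of the latent truncation: P(lambda <= u) = (1/tau) * int_0^u P(chi^2_1 > s/tau) ds
lambda.cdf <- function(u, tau) sapply(u, function(ui)
  integrate(function(s) pchisq(s / tau, df = 1, lower.tail = FALSE) / tau, 0, ui)$value)

## tabulate and invert the cdf once (rule = 2 clamps the extreme upper tail of the grid)
grid <- seq(0, qchisq(0.9999, df = 1) * tau, length.out = 2000)
cdfg <- lambda.cdf(grid, tau)
qlambda <- function(u) approx(cdfg, grid, xout = u, rule = 2)$y

## per-coordinate sampler: draw lambda_i, then N(0, tau) restricted to theta_i^2 > lambda_i
rmom1 <- function(n, tau) {
  lambda <- qlambda(runif(n))                  # step 1
  lo <- pnorm(-sqrt(lambda), sd = sqrt(tau))   # Normal mass below -sqrt(lambda_i)
  u  <- runif(n)                               # step 2 by inverse cdf on the two tails
  ifelse(u < 0.5,
         qnorm(2 * u * lo, sd = sqrt(tau)),
         qnorm(1 - 2 * (1 - u) * lo, sd = sqrt(tau)))
}

theta <- rmom1(1e4, tau)
## check against the closed-form pMOM marginal (theta^2 / tau) * N(theta; 0, tau)
hist(theta, breaks = 100, freq = FALSE)
curve(x^2 / tau * dnorm(x, sd = sqrt(tau)), add = TRUE)
```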
3.2 Deriving NLP properties for a given mixture
Our results so far prove a correspondence between NLPs and mixtures of
truncated distributions. We now establish how the two most important
characteristics of a NLP functional form, namely the penalty d(θ) and its tail
behavior, depend on a given truncation scheme. It is necessary to distinguish
whether a single or multiple truncation variables are used. For a common
truncation variable λ, shared across θ1 , . . . , θp , the following holds.
Proposition 3. Let π(θ) be the marginal non-local prior for π(θ, λ) = [πu(θ)/h(λ)] ∏_{i=1}^p I(d(θi) > λ) π(λ), where h(λ) = Pu(d(θ1) > λ, . . . , d(θp) > λ) and π(λ) is absolutely continuous with respect to the Lebesgue measure on R+. Denoting dmin = min{d(θ1), . . . , d(θp)}, the following hold:

1. As dmin → 0, π(θ) ∝ πu(θ) dmin π(λ*), where λ* ∈ (0, dmin). If π(λ) ∝ 1 as λ → 0+ (including the case π(λ) ∝ h(λ)) then π(θ) ∝ πu(θ) dmin.
2. As dmin → ∞, π(θ) has tails at least as thick as πu(θ). In particular, if ∫ π(λ)/h(λ) dλ < ∞ then π(θ) ∝ πu(θ).
Proof. See Appendix A.4.
In words, the non-local penalty is given by dmin π(dmin ), i.e. only depends on the smallest individual penalty d(θ1 ), . . . , d(θp ). This property is
important as the asymptotic Bayes factor rates depend on the form of the
penalty. Proposition 3 also indicates that π(θ) inherits its tail behavior from
πu (θ), which can be important for parameter estimation and to avoid finite
sample inconsistencies [Liang et al., 2008]. Corollary 3 extends the results
to multiple truncation points.
Corollary 3. Let λ = (λ1, . . . , λp)′ be continuous positive variables and π(θ) be the marginal non-local prior for π(θ, λ) = [πu(θ)/h(λ)] ∏_{i=1}^p I(di(θi) > λi) π(λ), where h(λ) = Pu(d1(θ1) > λ1, . . . , dp(θp) > λp). The following hold:

1. As di(θi) → 0 for i = 1, . . . , p, π(θ) ∝ πu(θ) ∏_{i=1}^p di(θi) π(λi*), where π(λi*) is the marginal prior for λi at λi* ∈ (0, d(θi)).
2. As di(θi) → ∞ for i = 1, . . . , p, π(θ) has tails at least as thick as πu(θ). In particular, if E[h(λ)^{-1}] < ∞ under the prior on λ, then π(θ) ∝ πu(θ).
Proof. See Appendix A.5.
In words, multiple truncation variables induce a multiplicative non-local
penalty. This implies that assigning λi ∼ π(λ) for i = 1, . . . , p induces
stronger parsimony than a common λ ∼ π(λ). As an example, with multiple
λi it may suffice that each di(θi)π(λi) induces a quadratic penalty, but with
a single λ an exponential penalty might be preferable.
4 Posterior sampling
We use the latent truncation characterization to derive posterior sampling
algorithms, and show how setting the truncation mixture as in Proposition 2
and Corollary 2 leads to convenient simplifications. Section 4.1 provides two
generic Gibbs algorithms to sample from arbitrary posteriors, and Section
4.2 adapts the algorithm to linear models under product MOM, iMOM and
eMOM priors (1)-(3). As before, sampling is implicitly conditional on a
given model Mk and we drop Mk to keep notation simple.
4.1 General algorithm
First consider a NLP defined by a single latent truncation, i.e. π(θ | λ) =
πu (θ)I(d(θ) > λ)/h(λ), where h(λ) = Pu (d(θ) > λ) and π(λ) is an arbitrary
prior on λ ∈ R+ . The joint posterior can be expressed as
π(θ, λ | y) ∝ f(y | θ) [πu(θ) I(d(θ) > λ) / h(λ)] π(λ).                    (4)

Sampling from π(θ | y) directly is challenging as it has 2^p modes. However, straightforward algebra gives the following kth Gibbs iteration to sample from π(θ, λ | y).
Algorithm 1. Gibbs sampling with a single truncation

1. Draw λ^{(k)} ∼ π(λ | y, θ^{(k−1)}) ∝ I(d(θ^{(k−1)}) > λ) π(λ)/h(λ). When π(λ) ∝ h(λ) as in Proposition 2, λ^{(k)} ∼ Unif(0, d(θ^{(k−1)})).
2. Draw θ^{(k)} ∼ π(θ | y, λ^{(k)}) ∝ πu(θ | y) I(d(θ) > λ^{(k)}).
That is, λ(k) is sampled from a univariate distribution that reduces to
a uniform when setting π(λ) ∝ h(λ), and θ (k) from a truncated version of
πu (·). For instance, πu (·) may be a local prior that allows easy posterior
sampling. As a difficulty, the truncation region {θ : d(θ) > λ(k) } is nonlinear and non-convex so that jointly sampling θ = (θ1 , . . . , θp ) may be
challenging. One may apply a Gibbs step to each element in θ1 , . . . , θp
sequentially, which only requires univariate truncated draws from πu (·), but
the mixing of the chain may suffer.
The multiple truncation representation in Corollary 2 provides a convenient alternative. Consider π(θ | λ) = πu(θ) ∏_{i=1}^p I(di(θi) > λi) / h(λ), where h(λ) = Pu(d1(θ1) > λ1, . . . , dp(θp) > λp) and π(λ) is an arbitrary prior. The following steps define the kth Gibbs sampling iteration:

Algorithm 2. Gibbs sampling with multiple truncations

1. Draw λ^{(k)} ∼ π(λ | y, θ^{(k−1)}) = ∏_{i=1}^p Unif(λi; 0, di(θi^{(k−1)})) π(λ)/h(λ). If π(λ) ∝ h(λ) as in Corollary 2, λi^{(k)} ∼ Unif(0, di(θi^{(k−1)})).
2. Draw θ^{(k)} ∼ π(θ | y, λ^{(k)}) ∝ πu(θ | y) ∏_{i=1}^p I(di(θi) > λi^{(k)}).
Now the truncation region in Step 2 is defined by hyper-rectangles, which
facilitates sampling. As in Algorithm 1, by setting the prior conveniently
Step 1 avoids evaluating π(λ) and h(λ).
4.2 Linear models
We adapt Algorithm 2 to a Normal linear regression y ∼ N (Xθ, φI) with
unknown variance φ and the three priors in (1)-(3). We set the prior φ ∼
IG(aφ/2, bφ/2) and let τ be a user-specified prior dispersion. To set a hyperprior on τ or determine it in a data-adaptive manner see Rossell et al.
[2013], and for objective Bayes alternatives see e.g. Consonni and La Rocca
[2010].
For all three priors, Step 2 in Algorithm 2 samples from a multivariate
Normal with rectangular truncation around 0, for which we developed an
efficient algorithm. Kotecha and Djuric [1999] and Rodriguez-Yam et al.
[2004] proposed Gibbs after orthogonalization strategies that result in low
serial correlation, and Wilhelm and Manjunath [2010] implemented the approach in the R package tmvtnorm under restrictions of the type l ≤ θi ≤ u.
Here we require sampling under di(θi) ≥ l, which defines a non-convex region. Our adapted algorithm is in Appendix A.6 and implemented in our
R package mombf. An important property is that the algorithm produces
independent samples when the posterior probability of the truncation region
becomes negligible. Since NLPs only assign high posterior probability to a
model when the posterior for non-zero coefficients is well shifted from the
origin, the truncation region is indeed often negligible. We outline the full
algorithm separately for each prior.
4.2.1 Product MOM prior
Straightforward algebra shows that the full conditional posteriors are

π(θ | φ, y) ∝ (∏_{i=1}^p θi²) N(θ; m, φ S^{-1}),
π(φ | θ, y) = IG((aφ + n + 3p)/2, (bφ + s²R + θ′θ/τ)/2),                    (5)

where S = X′X + τ^{-1} I, m = S^{-1} X′y and s²R = (y − Xθ)′(y − Xθ) is the sum of squared residuals. Corollary 2 represents the product MOM prior in (1) as

π(θ | φ, λ) = N(θ; 0, τφI) ∏_{i=1}^p I(θi²/(τφ) > λi) / h(λi)               (6)

marginalized with respect to π(λi) = h(λi) = P(θi²/(τφ) > λi | φ), where h(·) is the survival of a chi-square with 1 degree of freedom. Algorithm 2 and straightforward algebra give the kth Gibbs iteration as

1. φ^{(k)} ∼ IG((aφ + n + 3p)/2, (bφ + s²R + (θ^{(k−1)})′θ^{(k−1)}/τ)/2).
2. λ^{(k)} ∼ π(λ | θ^{(k−1)}, φ^{(k)}, y) = ∏_{i=1}^p Unif(λi; 0, (θi^{(k−1)})²/(τφ^{(k)})).
3. θ^{(k)} ∼ π(θ | λ^{(k)}, φ^{(k)}, y) = N(θ; m, φ^{(k)} S^{-1}) ∏_{i=1}^p I(θi²/(τφ^{(k)}) > λi^{(k)}).
Step 1 samples unconditionally on λ, so that no efficiency is lost for introducing these latent variables. Step 3 requires truncated multivariate Normal
draws.
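The iteration can be written compactly in R. The following is a minimal sketch, not the mombf implementation: it assumes the default hyperparameters τ = 0.358 and aφ = bφ = 0.01 of Section 5, and replaces the orthogonalised truncated-Normal sampler of step 3 with a simple coordinate-wise update; the function and argument names are illustrative.

```r
## One iteration of the pMOM Gibbs sampler above, for the model containing all columns of X.
rtnorm.out <- function(mu, sd, c) {
  ## N(mu, sd^2) restricted to |x| > c, by inverse cdf on the two tails
  lo <- pnorm(-c, mu, sd); hi <- pnorm(c, mu, sd)
  u  <- runif(1) * (lo + 1 - hi)
  if (u < lo) qnorm(u, mu, sd) else qnorm(hi + (u - lo), mu, sd)
}

gibbs.pmom <- function(y, X, theta, phi, tau = 0.358, a.phi = 0.01, b.phi = 0.01) {
  n <- length(y); p <- ncol(X)
  S <- crossprod(X) + diag(1 / tau, p)
  m <- drop(solve(S, crossprod(X, y)))
  ## step 1: phi | theta, y ~ IG((a+n+3p)/2, (b + sR2 + theta'theta/tau)/2)
  sR2 <- sum((y - X %*% theta)^2)
  phi <- 1 / rgamma(1, (a.phi + n + 3 * p) / 2, (b.phi + sR2 + sum(theta^2) / tau) / 2)
  ## step 2: lambda_i | theta, phi ~ Unif(0, theta_i^2 / (tau * phi))
  lambda <- runif(p, 0, theta^2 / (tau * phi))
  ## step 3: theta | lambda, phi, y ~ N(m, phi * S^-1) truncated to theta_i^2/(tau*phi) > lambda_i,
  ##         updated one coordinate at a time given the others
  for (i in 1:p) {
    mu.i <- m[i] - sum(S[i, -i] * (theta[-i] - m[-i])) / S[i, i]
    theta[i] <- rtnorm.out(mu.i, sqrt(phi / S[i, i]), sqrt(lambda[i] * tau * phi))
  }
  list(theta = theta, phi = phi)
}
```

Iterating gibbs.pmom (e.g. starting from the least-squares solution) and storing theta yields draws from π(θ, φ | y) under the model containing all columns of X.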
4.2.2 Product iMOM prior
We focus on models with fewer than n parameters. The full conditional posteriors are

π(θ | φ, y) ∝ (∏_{i=1}^p (√(τφ)/θi²) e^{−τφ/θi²}) N(θ; m, φ S^{-1}),
π(φ | θ, y) ∝ e^{−τφ Σ_{i=1}^p θi^{-2}} IG(φ; (aφ + n − p)/2, (bφ + s²R)/2),   (7)

where S = X′X, m = S^{-1} X′y and s²R = (y − Xθ)′(y − Xθ). Now, the iMOM prior in (2) can be written as

πI(θ | φ) = N(θ; 0, τN φI) ∏_{i=1}^p [ (√(τφ)/(√π θi²)) e^{−τφ/θi²} / N(θi; 0, τN φ) ] = N(θ; 0, τN φI) ∏_{i=1}^p di(θi, φ).   (8)

While in principle any value of τN may be used, setting τN ≥ 2τ guarantees d(θi, φ) to be monotone increasing in θi², so that its inverse exists (Appendix A.7). By default we set τN = 2τ. Following Corollary 2 we represent (8) using latent variables λ, i.e.

π(θ | φ, λ) = N(θ; 0, τN φI) ∏_{i=1}^p I(d(θi, φ) > λi) / h(λi)             (9)

and π(λ) = ∏_{i=1}^p h(λi), where h(λi) = P(d(θi, φ) > λi), which we need not evaluate. Algorithm 2 gives the following MH within Gibbs procedure.

1. MH step:
   (a) Propose φ* ∼ IG(φ; (aφ + n − p)/2, (bφ + s²R)/2).
   (b) Set φ^{(k)} = φ* with probability min{1, exp((φ^{(k−1)} − φ*) τ Σ_{i=1}^p θi^{-2})}, else φ^{(k)} = φ^{(k−1)}.
2. λ^{(k)} ∼ ∏_{i=1}^p Unif(λi; 0, d(θi^{(k−1)}, φ^{(k)})).
3. θ^{(k)} ∼ N(θ; m, φ^{(k)} S^{-1}) ∏_{i=1}^p I(d(θi, φ^{(k)}) > λi^{(k)}).
.
Step 3 requires the inverse d−1 (·), which can be evaluated efficiently combining an asymptotic approximation with a linear interpolation search (Appendix A.7). As a token, 10,000 draws for p = 2 variables required 0.58
seconds on a 2.8 GHz processor running OS X 10.6.8.
14
CRiSM Paper No. 14-10, www.warwick.ac.uk/go/crism
4.2.3
Product eMOM prior
The full conditional posteriors are
p
Y
π(θ | φ, y) ∝
e
− τ 2φ
!
N (θ; m, φS −1 )
θ
i
i=1
−
Pp
τφ
i=1 θ 2
i
π(φ | θ, y) ∝ e
IG φ;
a∗ b∗
,
,
2 2
(10)
where S = X 0 X + τ −1 I, m = S −1 X 0 y, a∗ = aφ + n + p, b∗ = bφ + s2R + θ 0 θ/τ
and s2R = (y − Xθ)0 (y − Xθ). Corollary 2 represents the product eMOM
prior πE (θ) in (3) as
π(θ | φ, λ) = N (θ; 0, τ φI)
p
Y
√
I e
!
2− τ 2φ
θ
> λi
i
i=1
√
marginalized with respect to π(λi ) = h(λi ) = P
e
2− τ 2φ
θ
i
1
h(λi )
(11)
!
> λi | φ . Again
h(λi ) has no simple form but is not required by Algorithm 2, which gives
the k th Gibbs iteration
1.
φ(k)
−
Pp
τφ
i=1 θ 2
i
∼e
∗
∗
IG φ; a2 , b2
∗
∗
(a) Propose φ∗ ∼ IG φ; a2 , b2
n
(b) Set φ(k) = φ∗ with probability min 1, e(φ
φ(k)
=
5
2.
3.
θ (k)
Pp
θ−2
i=1 i
o
, else
φ(k−1) .
√
λ(k)
(k−1) −φ∗ )τ
2− τ 2φ
∼
Qp
∼
Q
N (θ; m, φ(k) S −1 ) pi=1 I
i=1 Unif
λi ; 0, e
θ
!
i
θi2
φτ √ > log(λ )− 2 .
i
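The indicator in step 3 can be inverted in closed form: e^{√2 − τφ/θi²} > λi is equivalent to θi² > τφ/(√2 − log λi), so the truncation region is again rectangular around the origin and the coordinate-wise truncated-Normal update sketched for the pMOM sampler applies unchanged. A small illustrative helper, reusing the hypothetical rtnorm.out from that earlier sketch:

```r
## Rectangular truncation bound for the peMOM step 3 (note log(lambda_i) < sqrt(2) always holds,
## since lambda_i is drawn below exp(sqrt(2) - tau*phi/theta_i^2) in step 2)
emom.bound <- function(lambda, tau, phi) sqrt(tau * phi / (sqrt(2) - log(lambda)))
## e.g. coordinate-wise update of theta_i, given its conditional mean mu.i and sd sd.i:
## theta[i] <- rtnorm.out(mu.i, sd.i, emom.bound(lambda[i], tau, phi))
```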
5 Examples
We assess the performance of the posterior sampling algorithms proposed
in Section 4, as well as the use of NLPs for high-dimensional parameter
estimation. Section 5.1 shows a bivariate example designed to illustrate the
main posterior sampling issues. Section 5.2 studies p ≥ n cases and compares
the BMA estimators induced by NLPs with benchmark [Fernández et al., 2001]
                        MOM         iMOM        eMOM
θ1 = 0.5, θ2 = 1
  θ1 = 0, θ2 = 0        0           0           0
  θ1 = 0, θ2 ≠ 0        2.8e-78     2.72e-78    6.86e-79
  θ1 ≠ 0, θ2 = 0        1.95e-191   3.82e-191   5.90e-191
  θ1 ≠ 0, θ2 ≠ 0        1           1           1
θ1 = 0, θ2 = 1
  θ1 = 0, θ2 = 0        1.69e-225   4.39e-225   1.08e-224
  θ1 = 0, θ2 ≠ 0        0.999       1           1
  θ1 ≠ 0, θ2 = 0        1.82e-193   1.64e-192   6.80e-192
  θ1 ≠ 0, θ2 ≠ 0        8.83e-05    3.30e-09    3.17e-09

Table 1: Posterior probabilities
and hyper-g priors [Liang et al., 2008], SCAD [Fan and Li, 2001]
and LASSO [Tibshirani, 1996]. In all examples we use the default prior
dispersions τ = 0.358, 0.133, 0.119 for product MOM, iMOM and eMOM
priors (respectively), all of which assign 0.01 prior probability to |θi/√φ| <
0.2 [Johnson and Rossell, 2010], and φ ∼ IG(0.01/2, 0.01/2).
We set P (Mk ) = 0 whenever dim(Θk ) > n, and adapt the Gibbs search
on the model space proposed in Johnson and Rossell [2012] to never visit
those models. We assess the relative merits attained by each method without
the help of any procedures to pre-screen covariates.
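These defaults are available through our R package mombf. The call below is a hypothetical end-to-end sketch: the functions modelSelection, momprior, modelunifprior and postProb are taken from mombf, but argument names and defaults may differ across package versions, so it should be checked against the installed documentation rather than read as the exact interface.

```r
## Hypothetical usage sketch (argument names are assumptions; consult the mombf documentation)
library(mombf)
set.seed(1)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1:2] %*% c(0.5, 1) + rnorm(n)
fit <- modelSelection(y = y, x = X,
                      priorCoef  = momprior(tau = 0.358),   # pMOM prior on coefficients
                      priorDelta = modelunifprior(),        # uniform prior over models
                      niter = 1000)
postProb(fit)   # posterior model probabilities estimated from the MCMC visits
```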
5.1 Posterior samples for a given model
We simulate 1,000 realizations from yi ∼ N (θ1 x1i +θ2 x2i , 1), where (x1i , x2i )
are drawn from a bivariate Normal with E(x1i ) = E(x2i ) = 0, V (x1i ) =
V (x2i ) = 2, Cov(x1i , x2i ) = 1.
We first consider a scenario where both variables have non-zero coefficients θ1 = 0.5, θ2 = 1, and compute posterior probabilities for the four
possible models. We assign equal a priori probabilities and obtain exact integrated likelihoods using functions pmomMarginalU, pimomMarginalU and
pemomMarginalU in the mombf package (the former is available in closed form, for the latter two we used 10^6 importance samples). The posterior
probability assigned to the full model under all three priors is 1 (up to
rounding) (Table 1). Figure 3 (left) shows 900 Gibbs draws (100 burn-in)
obtained under the full model. The posterior mass is well-shifted away from
0 and resembles an elliptical shape for the three priors, as expected. Table 2
Figure 3: 900 Gibbs draws after a 100 burn-in when θ = (0.5, 1)′ (left) and θ = (0, 1)′ (right). Contours indicate posterior density. Top: MOM (τ = 0.358); Middle: iMOM (τ = 0.133); Bottom: eMOM (τ = 0.119)
       θ1 = 0.5, θ2 = 1               θ1 = 0, θ2 = 1
       MOM      iMOM     eMOM         MOM      iMOM     eMOM
θ1     0.096    0.110    0.018        0.115    0.032    0.049
θ2     0.034    0.134    0.019        0.134    0.122    0.042
φ      -0.016   0.069    0.027        -0.040   0.327    0.353

Table 2: Serial correlation in Gibbs sampling algorithm
gives the first-order auto-correlations, which are very small. This example
reflects the advantages of the orthogonalization strategy, which is particularly efficient as the latent truncation becomes negligible.
We now set θ1 = 0, θ2 = 1 and keep n = 1000 and (x1i , x2i ) as before. We simulated several data sets and in most cases did not observe a
noticeable multi-modality in the posterior density. We portray a specific
simulation that did exhibit multi-modality, as this poses a greater challenge
from a sampling perspective. Table 1 shows that the data-generating model
adequately concentrated the posterior mass. Although the full model was
clearly dismissed in light of the data, as an exercise we drew from its posterior. Figure 3 (right) shows 900 Gibbs draws after a 100 burn-in, and Table
2 indicates the auto-correlation. The sampled values adequately captured
the multi-modal, non-elliptical posterior.
5.2 High-dimensional estimation
We perform a simulation study with n = 100 and growing dimensionality
p = 100, 500, 1000. We set θi = 0 for i = 1, . . . , p − 5, the remaining 5
coefficients to (0.6, 1.2, 1.8, 2.4, 3) and consider residual variances φ = 1, 4.
Covariates were sampled from x ∼ N (0, Σ), where Σ has unit variances and
all pairwise correlations set to ρ = 0 or ρ = 0.25. We remark that ρ are
population correlations, the maximum absolute sample correlations when
ρ = 0 being 0.37, 0.44, 0.47 for p = 100, 500, 1000 (respectively), and 0.54,
0.60, 0.62 when ρ = 0.25. We simulated 1,000 data sets under each setup.
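For reference, a single data set from this design can be generated as follows; a minimal sketch using MASS::mvrnorm, shown for one (p, ρ, φ) combination.

```r
## One simulated data set from the design above (p = 500, rho = 0.25, phi = 1)
library(MASS)
set.seed(1)
n <- 100; p <- 500; rho <- 0.25; phi <- 1
theta <- c(rep(0, p - 5), c(0.6, 1.2, 1.8, 2.4, 3))
Sigma <- matrix(rho, p, p); diag(Sigma) <- 1          # equicorrelated covariates
X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
y <- X %*% theta + rnorm(n, sd = sqrt(phi))
```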
Parameter estimates θ̂ were obtained via Bayesian model averaging. Let
δ be the model indicator. For each simulated data set, we performed 1,000 full Gibbs iterations (100 burn-in), which are equivalent to 1,000 × p birth-death moves.
Figure 4: Mean SSE = ∑_{i=1}^{p} (θ̂i − θi)² when residual variance φ = 1 (top) and φ = 4 (bottom).
Figure 5: Mean SSE for θi = 0 (left) and θi ≠ 0 (right).
                 p = 172                p = 10,172
            p̄         R2            p̄          R2
MOM         4.3       0.566          6.5        0.617
iMOM        5.3       0.560         10.3        0.620
BP          4.2       0.562        259.0        0.014
HG         10.5       0.559        116.6        0.419
SCAD       28         0.560         81          0.535
LASSO      42         0.586        159          0.570

Table 3: Expression data with p = 172 and p = 10,172 genes. p̄: mean number of predictors in the model (MOM, iMOM, BP, HG) or number of selected predictors (SCAD, LASSO). R2: leave-one-out cross-validated R2 between (yi, ŷi).
These provided approximate posterior samples δ1, . . . , δ1000
from P (δ | y). We estimated model probabilities from the proportion of
MCMC visits and obtained posterior draws for θ using the algorithms in
Section 4.2. For benchmark (BP) and hyper-g (HG) priors we used the
same strategy for δ and drew θ from their corresponding posteriors. For
comparison, we also used SCAD and LASSO to estimate θ (penalty parameter set via 10-fold cross-validation).
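The model-averaging step can be illustrated with the short sketch below (not the actual implementation): given stored draws of the model indicator and of θ with excluded coefficients set to zero, the BMA point estimate is the average of the θ draws and posterior model probabilities are estimated by MCMC visit frequencies. The array names are hypothetical.

```python
import numpy as np

def bma_summary(delta_draws, theta_draws):
    """delta_draws: (B, p) boolean array of sampled model indicators.
    theta_draws: (B, p) array of sampled coefficients, zero where delta is False."""
    theta_hat = theta_draws.mean(axis=0)            # BMA point estimate of theta
    incl_prob = delta_draws.mean(axis=0)            # marginal posterior inclusion probabilities
    models, counts = np.unique(delta_draws, axis=0, return_counts=True)
    post_prob = counts / delta_draws.shape[0]       # model probabilities = visit frequencies
    return theta_hat, incl_prob, models, post_prob
```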
Figure 4 shows the overall sum of squared errors (SSE), ∑_{i=1}^{p} (θ̂i − θi)², averaged across simulations for scenarios with φ = 1, 4 and ρ = 0, 0.25. MOM and
iMOM perform similarly and present an SSE between 1.15 and 10 times
lower than other methods in all scenarios. As p grows, differences between
methods become larger. To obtain more insight on how the lower SSE is
achieved, Figure 5 shows SSE separately for θi = 0 (left) and θi 6= 0 (right).
The largest differences between methods were observed for zero coefficients,
where SSE remains very stable as p grows for MOM and iMOM and is
smaller for the former. SCAD appears particularly sensitive to increasing p.
For non-zero coefficients differences in SSE are smaller, iMOM slightly outperforming MOM. Here SSE tends to be largest for LASSO. Most methods
show SSE remarkably close to the least squares oracle estimator (Figure 5,
right panels, black horizontal segments).
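For reference, the error measures plotted in Figures 4 and 5 reduce to the following simple computation (a sketch; theta_hat and theta_true stand for the estimated and true coefficient vectors).

```python
import numpy as np

def sse_split(theta_hat, theta_true):
    """Overall SSE and its split over truly zero and non-zero coefficients."""
    err2 = (np.asarray(theta_hat) - np.asarray(theta_true)) ** 2
    zero = np.asarray(theta_true) == 0
    return err2.sum(), err2[zero].sum(), err2[~zero].sum()
```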
5.3 Gene expression data
We assess predictive performance in high-dimensional gene expression data.
Calon et al. [2012] used mice experiments to identify 172 genes potentially
related to the gene TGFB, and showed that these were related to colon
cancer progression in an independent data set with n = 262 human patients. TGFB plays a crucial role in colon cancer, hence it is important
to understand its relation to other genes. Our goal is to predict TGFB in
the human data, first using only the p = 172 genes and then adding 10,000
extra genes. Both response and predictors were standardized to zero mean
and unit variance (data provided as Supplementary Material). We assessed
predictive performance via the leave-one-out cross-validated R2 coefficient
between predictions and observations. For Bayesian methods we report the
posterior expected number of variables in the model (i.e. the mean number of predictors used by BMA), and for SCAD and LASSO the number of
selected variables.
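The evaluation criterion just described can be computed with the following sketch, where fit_predict is a placeholder for whichever fitting routine is being assessed (BMA, SCAD, LASSO, etc.).

```python
import numpy as np

def loo_cv_r2(X, y, fit_predict):
    """Leave-one-out cross-validated R^2: squared correlation between each y_i
    and the prediction obtained after refitting without observation i."""
    n = len(y)
    yhat = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        yhat[i] = fit_predict(X[keep], y[keep], X[i:i + 1])
    return np.corrcoef(y, yhat)[0, 1] ** 2
```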
Table 3 shows the results. When using p = 172 predictors all methods achieve a similar R2, with LASSO slightly higher. We note that MOM, iMOM and the Benchmark Prior used substantially fewer predictors. These results appear reasonable in a moderately dimensional setting
where genes are expected to be related to TGFB. However, when using
p = 10, 172 predictors important differences between methods are observed.
The BMA estimates based on MOM and iMOM priors remain parsimonious
(6.5 and 10.3 predictors, respectively) and the cross-validated R2 increases
roughly by 0.05. In contrast, for the remaining methods the number of
predictors increased sharply (the smallest being 81 predictors for SCAD)
and a drop in R2 was observed. Predictors with large marginal inclusion
probabilities in MOM/iMOM included genes related to various cancer types
(ESM1, GAS1, HIC1, CILP, ARL4C, PCGF2), TGFB regulators (FAM89B)
and AOC3, which is used to alleviate certain cancer symptoms. These findings suggest that NLPs were extremely effective in detecting a parsimonious
subset of predictors in this high-dimensional example. Regarding the efficiency of our proposed posterior sampler, the correlation between θ̂ from
two independent chains with 1,000 iterations each was > 0.99.
6 Discussion
We represented NLPs as mixtures of truncated distributions, which enables
their use for estimation. This one-to-one characterization defines NLPs from
arbitrary local priors, which greatly reduces analytical restrictions in choosing the prior formulation. We provided sampling algorithms for three NLP
families (MOM, iMOM and eMOM) in linear regression. Beyond their computational appeal, latent truncations build NLPs from first principles.
We observed promising results. Posterior samples exhibited low serial
correlation, captured multi-modalities and independent runs of the chain delivered virtually identical parameter estimates. Also, combining NLPs with
BMA delivered remarkably precise parameter estimates in p >> n cases.
Applying NLPs to gene expression data resulted in extremely parsimonious
predictions that achieved higher cross-validated predictive performance than
competing methods. These results did not require procedures to pre-screen
covariates.
Johnson and Rossell [2012] showed that NLPs provided advantages to
select the data-generating model in p = n cases. Hence, our results show that using the same prior for estimation and selection is not only possible but may indeed be desirable. The shrinkage provided by BMA and point
masses appears extremely competitive. We remark that we used default
informative priors, which are relatively popular for testing, but perhaps less
readily adopted for estimation. Developing objective Bayes variations is an
interesting avenue for future research.
The proposed construction of NLPs remains feasible beyond linear regression. In particular, inference based on MCMC in several model classes
may be easily achieved by applying our results to extend available posterior
simulation strategies. Promising applications include, but are not limited
to, generalized linear, graphical and mixture models.
A Proofs and Miscellanea
A.1 Proof of Proposition 1
The goal is to show that for all ε > 0 there exists η > 0 such that d(θ) < η implies π(θ) < ε. By construction, the conditional prior density is π(θ | λ) = πu(θ)I(d(θ) > λ)/h(λ), where h(λ) = Pu(d(θ) > λ) = ∫ πu(θ)I(d(θ) > λ)dθ. Let θ be a value such that d(θ) < η, and express the prior density as

π(θ) = ∫_{λ≤η} [πu(θ)I(d(θ) > λ)/h(λ)] π(λ)dλ + ∫_{λ>η} π(θ | λ)π(λ)dλ = ∫_{λ≤η} [πu(θ)I(d(θ) > λ)/h(λ)] π(λ)dλ.   (12)
The second term in (12) is 0, as by assumption d(θ) < η. Now, consider
that for λ ≤ η, h(λ) = Pu (d(θ) > λ) is minimized at λ = η, and therefore
(12) can be bounded by

π(θ) ≤ [πu(θ) ∫_{λ≤η} I(d(θ) > λ)π(λ)dλ] / h(η) = πu(θ)P(λ < min{η, d(θ)}) / h(η).   (13)
Notice that the numerator can be made arbitrarily small by decreasing η: πu(θ) is bounded around θ0, by assumption there is no prior mass at λ = 0 so that the cdf in the numerator converges to 0 as η → 0, and the denominator converges to 1 as η → 0. That is, it is possible to choose η such that π(θ) ≤ ε, which gives the result.
A.2 Proof of Proposition 2
We first note that in order for π(θ) to be proper the random variable d(θ)
must have finite expectation with respect to πu (θ). Now, the marginal prior
for θ is

π(θ) = ∫ [πu(θ)I(d(θ) > λ) / Pu(d(θ) > λ)] π(λ)dλ = πu(θ) ∫_0^{d(θ)} [π(λ)/h(λ)] dλ.   (14)
Suppose we set π(λ) ∝ h(λ), which we can do as long as ∫ h(λ)dλ < ∞. Then π(θ) ∝ πu(θ)d(θ), which proves the result. The only step left is to show that indeed ∫ h(λ)dλ < ∞. In general
∫ h(λ)dλ = ∫ Pu(d(θ) > λ)dλ = ∫ S_{d(θ)}(λ)dλ,   (15)
where Sd(θ) (λ) is the survival function of the positive random variable d(θ)
and therefore (15) is equal to its expectation Eu (d(θ)) with respect to πu (θ),
which is finite as discussed at the beginning of the proof.
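As a quick numerical illustration of this representation (a sketch, not part of the proof): take πu(θ) = N(0, 1) and d(θ) = θ², so that with π(λ) ∝ h(λ) Proposition 2 gives π(θ) ∝ θ² N(θ; 0, 1), i.e. a MOM prior with unit scale. The joint density of (θ, λ) is then proportional to πu(θ)I(λ < θ²), so a two-step Gibbs sampler alternating λ | θ ∼ Uniform(0, θ²) and θ | λ ∼ N(0, 1) truncated to |θ| > √λ samples from that prior; under this prior E(θ²) = 3, which the chain recovers.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
theta, draws = 1.0, []
for it in range(5500):
    # lambda | theta ~ Uniform(0, d(theta)) with d(theta) = theta^2
    lam = rng.uniform(0.0, theta ** 2)
    # theta | lambda ~ N(0, 1) restricted to |theta| > sqrt(lambda) (outer truncation)
    t = np.sqrt(lam)
    tail = norm.cdf(-t)                      # probability mass of each retained tail
    u = rng.uniform(0.0, 2.0 * tail)
    theta = norm.ppf(u) if u < tail else -norm.ppf(u - tail)
    if it >= 500:                            # discard burn-in
        draws.append(theta)

print(np.mean(np.array(draws) ** 2))         # should be close to E(theta^2) = 3
```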
A.3 Proof of Corollary 2
Analogously to the proof of Proposition 2, the marginal prior for θ is

π(θ) = ∫ [πu(θ) ∏_{i=1}^p I(di(θi) > λi) / Pu(d1(θ1) > λ1, . . . , dp(θp) > λp)] π(λ) dλ1 · · · dλp = πu(θ) ∫_0^{d1(θ1)} · · · ∫_0^{dp(θp)} [π(λ)/h(λ)] dλ1 · · · dλp ∝ πu(θ) ∏_{i=1}^p di(θi),   (16)

as by assumption π(λ) ∝ h(λ).
A.4 Proof of Proposition 3
By definition, the marginal density is

π(θ) = πu(θ) ∫ [π(λ)/h(λ)] ∏_{i=1}^p I(d(θi) > λ) dλ = πu(θ) ∫ I(λ < dmin) [π(λ)/h(λ)] dλ = πu(θ)Pλ(dmin) ∫ [1/h(λ)] π(λ | λ < dmin) dλ,   (17)
where Pλ (dmin ) = P (λ < dmin ) is the cdf of λ evaluated at dmin . As
dmin → 0 we have that π(λ | λ < dmin ) converges to a point mass at zero
and hence the integral in the right hand side of (17) converges to 1/h(0) = 1.
To finish the proof of statement 1 notice that Pλ (dmin ) ∝ dmin π(λ∗ ) by the
Mean Value Theorem, as long as Pλ(·) is differentiable and continuous around
0+ , i.e. λ is a continuous random variable.
To prove statement 2, notice that Pλ (dmin ) → 1 as dmin → ∞ and that
the integral in the right hand side of (17) is m(dmin ) = E(1/h(λ) | λ < dmin ),
which is increasing with dmin as h(λ) is monotone decreasing in λ. Hence,
π(θ) ∝ πu(θ)m(dmin) where m(dmin) increases as dmin → ∞, i.e. π(θ) has tails at least as thick as πu(θ). Furthermore, if ∫ [π(λ)/h(λ)] dλ < ∞ the Monotone Convergence Theorem applies and m(dmin) converges to a finite constant, i.e. π(θ) ∝ πu(θ).
A.5 Proof of Corollary 3
Analogously to the proof of Proposition 3, the marginal density can be written as

π(θ) = πu(θ) ∏_{i=1}^p Pλi(di(θi)) × ∫ · · · ∫ [1/h(λ)] π(λ | λ1 < d1(θ1), . . . , λp < dp(θp)) dλ1 · · · dλp = πu(θ) E[h(λ)^{−1} | ∀λi < di(θi)] ∏_{i=1}^p Pλi(di(θi)),   (18)
where h(λ) is a multivariate survival function and decreases as di(θi) → 0. Hence E[h(λ)^{−1} | ∀λi < di(θi)] decreases as di(θi) → 0. To find the limit as di(θi) → 0 we note that the integral is bounded by the finite integral obtained by plugging di(θi) = 1 into the integrand. Hence, the Dominated Convergence Theorem applies, the integral converges to h(0)^{−1} = 1 and (18) converges to πu(θ) ∏_{i=1}^p Pλi(di(θi)). Since λ1, . . . , λp are continuous the Mean Value Theorem applies, so that Pλi(di(θi)) = di(θi)π(λ*_i) for some λ*_i ∈ (0, di(θi)). To prove statement 2, notice that Pλi(di(θi)) → 1 as di(θi) → ∞ and that E[h(λ)^{−1} | ∀λi < di(θi)] increases as di(θi) → ∞. Hence, π(θ) ∝ πu(θ)m(θ) where m(θ) is increasing as di(θi) → ∞, which proves statement 2. Further, if E(h(λ)^{−1}) < ∞ the Monotone Convergence Theorem applies and π(θ) ∝ πu(θ).
A.6 Multivariate Normal sampling under outer rectangular truncation
The goal is to sample θ ∼ N (µ, Σ)I (θ ∈ T ) with truncation region T =
{θ : θi < li or θi > ui , i = 1, . . . , p}. We generalize the Gibbs sampling of
Rodriguez-Yam et al. [2004] and importance sampling of Hajivassiliou [1993]
and Keane [1993] to the non-convex region T .
Let D = chol(Σ) be the Cholesky decomposition of Σ and K = D^{−1} its inverse, so that KΣK′ = KDD′K′ = I is the identity matrix, and define α = Kµ. The random variable Z = Kθ follows a N(α, I)I(Z ∈ S) distribution with truncation region S. Since θ = K^{−1}Z = DZ, denoting by di· the ith row of D we obtain the truncation region S = {Z : di·Z ≤ li or di·Z ≥ ui, i = 1, . . . , p}.
The full conditionals for Zi given Z(−i) = (Z1, . . . , Zi−1, Zi+1, . . . , Zp) needed for Gibbs sampling follow from straightforward algebra. Denote by djk the (j, k) element of D; then Zi | Z(−i) ∼ N(αi, 1) truncated so that either dji Zi ≤ lj − ∑_{k≠i} djk Zk or dji Zi ≥ uj − ∑_{k≠i} djk Zk holds, simultaneously for j = 1, . . . , p. We now adapt the algorithm to address the fact that this truncation region is non-convex.
The region excluded from sampling can be written as S_i^c = ∪_{j=1}^p (aj, bj), where aj = (lj − ∑_{k≠i} djk Zk)/dji when dji > 0 and aj = (uj − ∑_{k≠i} djk Zk)/dji when dji < 0 (analogously for bj). S_i^c as given is the union of possibly non-disjoint intervals, which complicates sampling. Fortunately, it can be expressed as a union of disjoint intervals S_i^c = ∪_{j=1}^K (l̃j, ũj) with the following algorithm. Relabel the interval endpoints so that (lj, uj) = (aj, bj) with the aj sorted increasingly, set l̃1 = l1, ũ1 = u1 and K = 1. For j = 2, . . . , p repeat the following two steps.
1. If lj > ũK set K = K + 1, l̃K = lj and ũK = uj; else if lj ≤ ũK and uj ≥ ũK set ũK = uj.
2. Set j = j + 1.
Finally, because (l̃1, ũ1), . . . , (l̃K, ũK) are disjoint and increasing, we may draw a uniform number u in (0, 1) excluding the intervals (Φ(l̃j), Φ(ũj)) and set Zi = Φ^{−1}(u), where Φ(·) is the Normal(αi, 1) cdf.
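A minimal sketch of one such full-conditional update follows (an illustration, not the paper's code), assuming the excluded intervals (aj, bj) for coordinate i have already been computed from D, l, u as described above: merge the intervals with the sweep just given, then draw Zi by an inverse-cdf construction that skips the excluded probability bands.

```python
import numpy as np
from scipy.stats import norm

def merge_intervals(intervals):
    """Merge possibly overlapping intervals (a_j, b_j) into disjoint ones,
    sweeping in increasing order of the left endpoints as described above."""
    ivs = sorted(intervals)
    merged = [list(ivs[0])]
    for a, b in ivs[1:]:
        if a > merged[-1][1]:            # starts beyond the current block: open a new one
            merged.append([a, b])
        elif b > merged[-1][1]:          # overlaps and extends the current block
            merged[-1][1] = b
    return merged

def sample_outside(alpha, excluded, rng):
    """Draw Z_i ~ N(alpha, 1) restricted to lie outside the disjoint, increasing
    intervals in `excluded`, skipping the cdf bands (Phi(a), Phi(b))."""
    bands = [(norm.cdf(a - alpha), norm.cdf(b - alpha)) for a, b in excluded]
    removed = sum(hi - lo for lo, hi in bands)
    u = rng.uniform(0.0, 1.0 - removed)
    for lo, hi in bands:                 # map u back to (0, 1), jumping over each band
        if u >= lo:
            u += hi - lo
        else:
            break
    return alpha + norm.ppf(u)

rng = np.random.default_rng(0)
excl = merge_intervals([(-1.0, 0.2), (0.0, 0.8), (1.5, 2.0)])   # -> [[-1.0, 0.8], [1.5, 2.0]]
z = sample_outside(0.0, excl, rng)
assert not any(a < z < b for a, b in excl)
```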
A.7 Monotonicity and inverse of iMOM prior penalty
Consider the product iMOM prior as given in (8). We first study the monotonicity of the penalty d(θi , λ), which for simplicity here we denote as d(θ),
and then provide an algorithm to evaluate its inverse function. Equivalently,
it is convenient to consider the log-penalty

log(d(θ)) = (1/2)[log(τ τN) + 2 log(φ) + log(2)] − log((θ − θ0)²) − τφ/(θ − θ0)² + (θ − θ0)²/(2τN φ),   (19)
as its inverse uniquely determines the inverse of d(θ). Denoting z = (θ − θ0)², (19) can be written as

g(z) = (1/2)[log(τ τN) + 2 log(φ) + log(2)] − log(z) − τφ/z + z/(2τN φ).   (20)
To show the monotonicity of (20) we compute its derivative g′(z) = −1/z + τφ/z² + 1/(2τN φ) and show that it is positive for all z. Clearly, both when z → 0 and when z → ∞ we have g′(z) > 0. Hence we just need to see that there is some τN for which all roots of g′(z) are imaginary, so that g′(z) > 0 for all z. Simple algebra (setting g′(z) = 0 and multiplying by 2τN φ z² gives the quadratic z² − 2τN φ z + 2τN τ φ² = 0) shows that the roots of g′(z) are z = τN φ ± τN φ √(1 − 2τ/τN), so that for τN ≤ 2τ there are no real roots. Hence, for τN ≤ 2τ, g(z) is monotone increasing.
We now provide an algorithm to evaluate the inverse. That is, given a
threshold t we seek z0 such that g(z0 ) = t. Our strategy is to obtain an
initial guess from an approximation to g(z) and then use continuity and
monotonicity to bound the desired z0 and conduct a linear interpolation
based search. Inspecting the expression for g(z) in (20) we see that the
term log(z) is dominated by τφ/z when z approaches 0 and by z/(2τN φ) when z is large. Hence, we approximate g(z) by dropping the log(z) term, obtaining

g(z) ≈ (1/2)[log(τ τN) + 2 log(φ) + log(2)] − τφ/z + z/(2τN φ).   (21)

Setting (21) equal to t and solving for z gives z0 = τN φ (−b + √(b² − 2τ/τN)) as an initial guess, where b = log(τ τN) + 2 log(φ) + log(2) − t.
If g(z0 ) < t we set a lower bound zl = z0 and an upper bound zu obtained
by increasing z0 by a factor of 2 until g(z0 ) > t. Similarly, if g(z0 ) > t
we set the upper bound zu = z0 and find a lower bound by successively
halving z0. Once (zl, zu) are determined, we use a linear
interpolation to update z0 , evaluate g(z0 ) and update either zl or zu . The
process continues until |g(z0 ) − t| is below some tolerance (we used 10−5 ).
In our experience the initial guess is often quite good and the algorithm
converges in very few iterations.
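The following sketch implements this search for the log-penalty g(z) in (20), assuming τN ≤ 2τ so that g is monotone. The closed-form initial guess is transcribed from (21) as printed, with a guard in case it is unusable, and the parameter values in the demonstration are arbitrary.

```python
import numpy as np

def g(z, tau, tau_N, phi):
    """Log iMOM penalty (20) as a function of z = (theta - theta0)^2."""
    return (0.5 * (np.log(tau * tau_N) + 2.0 * np.log(phi) + np.log(2.0))
            - np.log(z) - tau * phi / z + z / (2.0 * tau_N * phi))

def invert_g(t, tau, tau_N, phi, tol=1e-5, max_iter=100):
    """Solve g(z0) = t via the initial guess from (21), bracketing, and
    linear-interpolation (regula falsi) updates."""
    b = np.log(tau * tau_N) + 2.0 * np.log(phi) + np.log(2.0) - t
    disc = b * b - 2.0 * tau / tau_N
    z0 = tau_N * phi * (-b + np.sqrt(disc)) if disc > 0 else tau_N * phi
    if z0 <= 0:
        z0 = tau_N * phi                   # fallback when the approximate guess is unusable
    zl = zu = z0
    while g(zu, tau, tau_N, phi) < t:      # expand the upper bound by factors of 2
        zu *= 2.0
    while g(zl, tau, tau_N, phi) > t:      # shrink the lower bound by halving
        zl *= 0.5
    for _ in range(max_iter):
        g0 = g(z0, tau, tau_N, phi)
        if abs(g0 - t) < tol:
            break
        if g0 < t:
            zl = z0
        else:
            zu = z0
        gl, gu = g(zl, tau, tau_N, phi), g(zu, tau, tau_N, phi)
        z0 = zl + (t - gl) * (zu - zl) / (gu - gl)   # linear interpolation update
    return z0

tau, tau_N, phi = 0.133, 0.2, 1.0          # arbitrary values with tau_N <= 2 * tau
z_true = 0.7
print(invert_g(g(z_true, tau, tau_N, phi), tau, tau_N, phi))   # recovers ~0.7
```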
Acknowledgements
This research was partially funded by the NIH grant R01 CA158113-01.
References
M.J. Bayarri and G. Garcia-Donato. Extending conventional priors for testing general hypotheses in linear models. Biometrika, 94:135–152, 2007.
J.O. Berger and R.L Pericchi. The intrinsic Bayes factor for model selection
and prediction. Journal of the American Statistical Association, 91:109–
122, 1996.
J.O. Berger and R.L. Pericchi. Objective Bayesian methods for model selection: introduction and comparison, volume 38, pages 135–207. Institute
of Mathematical Statistics lecture notes - Monograph series, 2001.
J.M. Bernardo. Integrated objective Bayesian estimation and hypothesis testing. In J.M. Bernardo, M.J. Bayarri, J.O. Berger, A.P. Dawid,
D. Heckerman, A.F.M Smith, and M. West, editors, Bayesian Statistics
9 - Proceedings of the ninth Valencia international meeting, pages 1–68.
Oxford University Press, 2010.
A. Calon, E. Espinet, S. Palomo-Ponce, D.V.F. Tauriello, M. Iglesias, M.V.
Céspedes, M. Sevillano, C. Nadal, P. Jung, X.H.-F. Zhang, D. Byrom,
A. Riera, D. Rossell, R. Mangues, J. Massague, E. Sancho, and E. Batlle.
Dependency of colorectal cancer on a TGF-beta-driven programme in stromal cells for metastasis initiation. Cancer Cell, 22(5):571–584, 2012.
G. Consonni and L. La Rocca. On moment priors for Bayesian model
choice with applications to directed acyclic graphs. In J.M. Bernardo,
M.J. Bayarri, J.O. Berger, A.P. Dawid, D. Heckerman, A.F.M Smith, and
M. West, editors, Bayesian Statistics 9 - Proceedings of the ninth Valencia
international meeting, pages 119–144. Oxford University Press, 2010.
J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and
its oracle properties. Journal of the American Statistical Association, 96:
1348–1360, 2001.
J. Fan and J. Lv. A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20:101–140, 2010.
C. Fernández, E. Ley, and M.F.J. Steel. Benchmark priors for Bayesian
model averaging. Journal of Econometrics, 100:381–427, 2001.
E. George, F. Liang, and X. Xu. From minimax shrinkage estimation to
minimax shrinkage prediction. Statistical Science, 27:82–94, 2012.
V.A. Hajivassiliou. Simulating normal rectangle probabilities and their
derivatives: the effects of vectorization. Econometrics, 11:519–543, 1993.
H. Jeffreys. Theory of Probability. Oxford University Press, Oxford, England, third edition, 1961.
V.E. Johnson and D. Rossell. Prior densities for default Bayesian hypothesis
tests. Journal of the Royal Statistical Society B, 72:143–170, 2010.
V.E. Johnson and D. Rossell. Bayesian model selection in high-dimensional
settings. Journal of the American Statistical Association, 24(498):649–
660, 2012.
R.E. Kass and L. Wasserman. A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American
Statistical Association, 90:928–934, 1995.
M.P. Keane. Simulation estimation for panel data models with limited dependent variables. Econometrics, 11:545–571, 1993.
I. Klugkist and H. Hoijtink. The Bayes factor for inequality and about equality constrained models. Computational Statistics & Data Analysis, 51(12):6367–6379, 2007.
J.H. Kotecha and P.M. Djuric. Gibbs sampling approach for generation of
truncated multivariate Gaussian random variables. In Proceedings, 1999
IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1757–1760. IEEE Computer Society, 1999.
F. Liang, R. Paulo, G. Molina, M.A. Clyde, and J.O. Berger. Mixtures of g-priors for Bayesian variable selection. Journal of the American Statistical
Association, 103:410–423, 2008.
F. Liang, Q. Song, and K. Yu. Bayesian modeling for high-dimensional
generalized linear models. Journal of the American Statistical Association,
108(502):589–606, 2013.
E. Moreno, F. Bertolino, and W. Racugno. An intrinsic limiting procedure for model selection and hypotheses testing. Journal of the American
Statistical Association, 93:1451–1460, 1998.
A. O’Hagan. Fractional Bayes factors for model comparison. Journal of the
Royal Statistical Society, Series B, 57:99–118, 1995.
J.M. Pérez and J.O. Berger. Expected posterior prior distributions for model
selection. Biometrika, 89:491–512, 2002.
G. Rodriguez-Yam, R.A. Davis, and L.L. Scharf. Efficient Gibbs sampling
of truncated multivariate normal with application to constrained linear regression. PhD thesis, Department of Statistics, Colorado State University,
2004.
D. Rossell, D. Telesca, and V.E. Johnson. High-dimensional Bayesian classifiers using non-local priors. In Statistical Models for Data Analysis XV,
pages 305–314. Springer, 2013.
J. Rousseau. Approximating interval hypothesis: p-values and Bayes factors.
In J.M. Bernardo, M.J. Bayarri, J.O. Berger, and A.P. Dawid, editors,
Bayesian Statistics 8, pages 417–452. Oxford University Press, 2010.
M.D. Springer and W.E. Thompson. The distribution of products of beta,
gamma and Gaussian random variables. SIAM Journal of Applied Mathematics, 18(4):721–737, 1970.
C. Stein. Inadmissibility of the usual estimator for the mean of a multivariate distribution. In Proceedings of the Third Berkeley Symposium on
Mathematical Statistics and Probability, volume 1, pages 197–206, 1956.
R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of
the Royal Statistical Society, B, 58:267–288, 1996.
I. Verdinelli and L. Wasserman. Bayes factors, nuisance parameters and
imprecise tests. In J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M.
Smith, editors, Bayesian Statistics 5, pages 765–771. Oxford University
Press, 1996.
S. Wilhelm and B.G. Manjunath. tmvtnorm: a package for the truncated
multivariate normal distribution. The R Journal, 2:25–29, 2010.
A. Zellner. On assessing prior distributions and Bayesian regression analysis
with g-prior distributions. In Bayesian inference and decision techniques:
essays in honor of Bruno de Finetti, Amsterdam; New York, 1986. North-Holland/Elsevier.
A. Zellner and A. Siow. Posterior odds ratios for selected regression hypotheses. In J.M. Bernardo, M.H. DeGroot, D.V. Lindley, and A.F.M.
Smith, editors, Bayesian statistics: Proceedings of the first international
meeting held in Valencia (Spain), volume 1. Valencia: University Press,
1980.
A. Zellner and A. Siow. Basic issues in econometrics. University of Chicago
Press, Chicago, 1984.