Generalized Beta Mixture of Gaussians

Advances in Neural Information Processing Systems 24 (2011)
Artin Armagan, David B. Dunson and Merlise Clyde
Presented by Shaobo Han, Duke University
July 22, 2013
Outline

1. Introduction
2. The Horseshoe Estimator
3. Generalized Beta Mixture of Normals
4. Equivalent Conjugate Hierarchies
5. Experiments
Introduction

Two alternatives in Bayesian sparse learning:

Point mass mixture priors:

    θj ∼ (1 − π)δ0 + π gθ,   j = 1, . . . , p    (1)

Continuous shrinkage priors:
- Inducing regularization penalties
- Global-local shrinkage [N. G. Polson and J. G. Scott (2010)]¹ (see the sketch after this slide):

    θj ∼ N(0, ψj τ),   ψj ∼ f,   τ ∼ g    (2)

  e.g. the normal-exponential-gamma, the horseshoe and the generalized double Pareto.
- Computationally attractive
¹ Shrink globally, act locally: sparse Bayesian regularization and prediction, Bayesian Statistics, 2010
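
As a concrete illustration of the global-local form (2), here is a minimal Python sketch of one joint prior draw. The choice of half-Cauchy mixing densities for f and g (the horseshoe special case mentioned above) and all function names are illustrative assumptions, not the paper's.

```python
# Minimal sketch: one joint draw from the global-local prior in Eq. (2).
# Assumption (not from the paper): f and g are half-Cauchy on the
# standard-deviation scale, i.e. the horseshoe special case.
import numpy as np

rng = np.random.default_rng(0)

def half_cauchy(size, scale=1.0):
    """Draw |X| where X ~ Cauchy(0, scale)."""
    return np.abs(scale * rng.standard_cauchy(size))

def global_local_draw(p):
    """One draw of (theta_1, ..., theta_p) under Eq. (2)."""
    tau = half_cauchy(1)[0] ** 2   # global variance: tau^{1/2} ~ C+(0, 1)
    psi = half_cauchy(p) ** 2      # local variances: psi_j^{1/2} ~ C+(0, 1)
    return rng.normal(0.0, np.sqrt(psi * tau))

theta = global_local_draw(p=1000)
print(np.round(theta[:5], 3))
```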
The horseshoe estimator

[Sparse normal-means problem] Suppose a p-dimensional vector y|θ ∼ N(θ, I) is observed.

    θj|τj ∼ N(0, τj),   τj^{1/2} ∼ C+(0, φ^{1/2}),   j = 1, . . . , p    (3)

With an appropriate transformation ρj = 1/(1 + τj),

    θj|ρj ∼ N(0, 1/ρj − 1),   π(ρj|φ) ∝ ρj^{−1/2} (1 − ρj)^{−1/2} {1 + (φ − 1)ρj}^{−1}    (4)

For a fixed value of φ, the posterior mean is [C. M. Carvalho, et al. (2010)]²

    E(θj|y) = ∫₀¹ yj(1 − ρj) p(ρj|y) dρj = {1 − E(ρj|y)} yj    (5)

This form addresses two major issues: robustness to large signals and shrinkage of noise.

Horseshoe prior: φ = 1, ρj ∼ B(1/2, 1/2) (checked by simulation in the sketch after this slide).
Strawderman-Berger prior: φ = 1, ρj ∼ B(1, 1/2).

² The horseshoe estimator for sparse signals, Biometrika, 2010
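
The Beta(1/2, 1/2) shrinkage weight quoted above for the horseshoe is easy to verify by simulation. Below is a minimal sketch, purely illustrative and not code from the paper, that draws τj with τj^{1/2} ∼ C+(0, 1), transforms to ρj = 1/(1 + τj), and compares empirical quantiles with Beta(1/2, 1/2).

```python
# Minimal sketch (illustrative, not from the paper): with phi = 1,
# tau_j^{1/2} ~ C+(0, 1) should imply rho_j = 1/(1 + tau_j) ~ Beta(1/2, 1/2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

tau = rng.standard_cauchy(100_000) ** 2   # tau_j^{1/2} ~ C+(0, 1)
rho = 1.0 / (1.0 + tau)                   # shrinkage weight in (0, 1)

# Empirical quantiles of rho vs. Beta(1/2, 1/2) quantiles.
qs = np.linspace(0.05, 0.95, 10)
print(np.quantile(rho, qs).round(3))
print(stats.beta(0.5, 0.5).ppf(qs).round(3))
```

The two printed rows should agree to within Monte Carlo error, illustrating the horseshoe-shaped prior on the shrinkage weight.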
TPB normal scale mixture prior

Definition 1
The three-parameter beta (TPB) distribution for a random variable X is defined by the density function

    f(x; a, b, φ) = [Γ(a + b)/(Γ(a)Γ(b))] φ^b x^{b−1} (1 − x)^{a−1} {1 + (φ − 1)x}^{−(a+b)}    (6)

for 0 < x < 1, a > 0, b > 0 and φ > 0, and is denoted by TPB(a, b, φ).

Definition 2
The TPB normal scale mixture representation for the distribution of random variable θj is given by

    θj|ρj ∼ N(0, 1/ρj − 1),   ρj ∼ TPB(a, b, φ)    (7)

where a > 0, b > 0 and φ > 0. The resulting marginal distribution on θj is denoted by TPBN(a, b, φ).
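
As a sanity check on the density (6), the sketch below evaluates it on the log scale and confirms numerically that it integrates to one for a few (a, b, φ); the helper name tpb_pdf is hypothetical, not from the paper.

```python
# Minimal sketch of the TPB density in Eq. (6); tpb_pdf is a hypothetical
# helper name, not from the paper.
import numpy as np
from scipy.special import gammaln
from scipy.integrate import quad

def tpb_pdf(x, a, b, phi):
    """Three-parameter beta density f(x; a, b, phi) on 0 < x < 1."""
    log_norm = gammaln(a + b) - gammaln(a) - gammaln(b)
    log_f = (log_norm + b * np.log(phi) + (b - 1.0) * np.log(x)
             + (a - 1.0) * np.log1p(-x) - (a + b) * np.log1p((phi - 1.0) * x))
    return np.exp(log_f)

# Sanity check: the density should integrate to 1 for any valid (a, b, phi).
for a, b, phi in [(0.5, 0.5, 1.0), (1.0, 0.5, 1.0), (2.0, 2.0, 0.1)]:
    total, _ = quad(tpb_pdf, 0.0, 1.0, args=(a, b, phi))
    print(f"a={a}, b={b}, phi={phi}: integral = {total:.4f}")
```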
Shrinkage effects for different values of a, b and φ

    θj|ρj ∼ N(0, 1/ρj − 1),   ρj ∼ TPB(a, b, φ)    (8)

[Figure 1: six panels (a)-(f) showing shrinkage profiles for (a, b) = (1/2, 1/2), (1, 1/2), (1, 1), (1/2, 2), (2, 2) and (5, 2). φ ∈ {1/10, 1/9, 1/8, 1/7, 1/6, 1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10} considered for all pairs of a and b. The line corresponding to the lowest value of φ is drawn with a dashed line.]
Equivalence of hierarchies

Proposition 3
If θj ∼ TPBN(a, b, φ), then
1) θj ∼ N(0, τj), τj ∼ G(a, λj) and λj ∼ G(b, φ);
2) θj ∼ N(0, τj), π(τj) = [Γ(a + b)/(Γ(a)Γ(b))] φ^{−a} τj^{a−1} (1 + τj/φ)^{−(a+b)}, which implies that τj/φ ∼ β′(a, b), the inverted beta distribution with parameters a and b.

A simulation check of Proposition 3 is sketched after this slide.

Corollary 4
If a = 1 in Proposition 3.1, then TPBN ≡ NEG. If (a, b, φ) = (1, 1/2, 1) in Proposition 3.1, then TPBN ≡ SB ≡ NEG.

Corollary 5
If θj ∼ N(0, τj), τj^{1/2} ∼ C+(0, φ^{1/2}) and φ^{1/2} ∼ C+(0, 1), then θj ∼ TPBN(1/2, 1/2, φ), φ ∼ G(1/2, ω), and ω ∼ G(1/2, 1).
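
Proposition 3.1 is what makes the prior conjugate and Gibbs-friendly. Below is a minimal sketch of the simulation check referenced above, assuming the shape-rate convention for G(·, ·) and with all variable names my own: draw the hierarchy τj ∼ G(a, λj), λj ∼ G(b, φ), then compare the implied weights ρj = 1/(1 + τj) against direct TPB(a, b, φ) draws obtained through the inverted-beta relation of Proposition 3.2.

```python
# Minimal sketch checking Proposition 3 by simulation. Assumptions (mine):
# G(shape, rate) convention, so numpy's gamma uses scale = 1/rate.
import numpy as np

rng = np.random.default_rng(0)
a, b, phi, n = 0.5, 0.5, 2.0, 200_000

# Hierarchy of Proposition 3.1: lambda_j ~ G(b, phi), tau_j ~ G(a, lambda_j).
lam = rng.gamma(shape=b, scale=1.0 / phi, size=n)
tau = rng.gamma(shape=a, scale=1.0 / lam)
rho_hier = 1.0 / (1.0 + tau)

# Direct TPB(a, b, phi) draws via Proposition 3.2: tau_j/phi ~ beta'(a, b),
# using the fact that B/(1 - B) ~ beta'(a, b) when B ~ Beta(a, b).
B = rng.beta(a, b, size=n)
rho_tpb = 1.0 / (1.0 + phi * B / (1.0 - B))

# The two sets of shrinkage weights should match in distribution.
qs = np.linspace(0.1, 0.9, 9)
print(np.quantile(rho_hier, qs).round(3))
print(np.quantile(rho_tpb, qs).round(3))
```

Both quantile rows should agree to within Monte Carlo error, reflecting the equivalence that allows a fully conjugate sampler for the TPBN prior.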
Experiments: sparse linear regression

[Figure 2: Relative ME at different (a, b, φ) values for (a) Case 1, (n, p) = (50, 20), and (b) Case 2, (n, p) = (250, 100). Both contain on average 10 nonzero elements.]

[Figure 3: (n, p) = (100, 10000). Posterior mean by sampling (square) and by approximate inference (circle), (a, b, φ) = (1, 1/2, 10⁻⁴).]