
Advances in Neural Information Processing Systems 24 (2011)

Generalized Beta Mixture of Gaussians
Artin Armagan, David B. Dunson and Merlise Clyde

Presented by Shaobo Han, Duke University, July 22, 2013
A. Armagan et al., 2012, Generalized Beta Mixture of Gaussians

Outline
1 Introduction
2 The Horseshoe Estimator
3 Generalized Beta Mixture of Normals
4 Equivalent Conjugate Hierarchies
5 Experiments

Introduction

Two alternatives in Bayesian sparse learning:

Point mass mixture priors:

    θ_j ∼ (1 − π)δ_0 + π g_θ,   j = 1, …, p                        (1)

Continuous shrinkage priors:
  - induce regularization penalties;
  - global-local shrinkage [N. G. Polson and J. G. Scott (2010)][1]:

        θ_j ∼ N(0, ψ_j τ),   ψ_j ∼ f,   τ ∼ g                      (2)

    e.g. the normal-exponential-gamma, the horseshoe and the
    generalized double Pareto;
  - computationally attractive.

[1] Shrink globally, act locally: sparse Bayesian regularization and
prediction, Bayesian Statistics, 2010.

The horseshoe estimator

[Sparse normal-means problem] Suppose a p-dimensional vector
y | θ ∼ N(θ, I) is observed, with

    θ_j | τ_j ∼ N(0, τ_j),   τ_j^{1/2} ∼ C⁺(0, φ^{1/2}),   j = 1, …, p   (3)

With the transformation ρ_j = 1/(1 + τ_j),

    θ_j | ρ_j ∼ N(0, 1/ρ_j − 1),
    π(ρ_j | φ) ∝ ρ_j^{−1/2} (1 − ρ_j)^{−1/2} {1 + (φ − 1)ρ_j}^{−1}       (4)

For the fixed value φ = 1 [C. M. Carvalho et al. (2010)][2],

    E(θ_j | y) = ∫₀¹ y_j (1 − ρ_j) p(ρ_j | y) dρ_j = {1 − E(ρ_j | y)} y_j   (5)

This addresses the two major issues: robustness to large signals and
shrinkage of noise.

Horseshoe prior: φ = 1, ρ_j ∼ B(1/2, 1/2).
Strawderman-Berger prior: φ = 1, ρ_j ∼ B(1, 1/2).
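The horseshoe hierarchy in (3) is straightforward to simulate, since a half-Cauchy C⁺(0, s) draw is s times the absolute value of a standard Cauchy draw. A minimal numpy sketch (the function name is mine), drawing prior samples of θ_j together with the shrinkage weight ρ_j = 1/(1 + τ_j):

```python
import numpy as np

def sample_horseshoe_prior(p, phi=1.0, rng=None):
    """Prior draws from the horseshoe hierarchy:
    theta_j | tau_j ~ N(0, tau_j), tau_j^(1/2) ~ C+(0, phi^(1/2))."""
    rng = np.random.default_rng() if rng is None else rng
    # Half-Cauchy C+(0, s) = s * |standard Cauchy|
    tau_sqrt = np.sqrt(phi) * np.abs(rng.standard_cauchy(p))
    theta = rng.normal(0.0, tau_sqrt)        # sd = tau^(1/2)
    rho = 1.0 / (1.0 + tau_sqrt**2)          # shrinkage weight in (0, 1]
    return theta, rho

theta, rho = sample_horseshoe_prior(100_000, phi=1.0,
                                    rng=np.random.default_rng(0))
```

For φ = 1 the weights ρ_j follow B(1/2, 1/2): the mass piles up near 0 (large signals left alone) and near 1 (noise shrunk to zero), the U shape that gives the horseshoe its name.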
[2] The horseshoe estimator for sparse signals, Biometrika, 2010.

TPB normal scale mixture prior

Definition 1
The three-parameter beta (TPB) distribution for a random variable X is
defined by the density function

    f(x; a, b, φ) = [Γ(a + b)/(Γ(a)Γ(b))] φ^b x^{b−1} (1 − x)^{a−1} {1 + (φ − 1)x}^{−(a+b)}   (6)

for 0 < x < 1, a > 0, b > 0 and φ > 0, and is denoted by TPB(a, b, φ).

Definition 2
The TPB normal scale mixture representation for the distribution of a
random variable θ_j is given by

    θ_j | ρ_j ∼ N(0, 1/ρ_j − 1),   ρ_j ∼ TPB(a, b, φ)              (7)

where a > 0, b > 0 and φ > 0. The resulting marginal distribution on θ_j
is denoted by TPBN(a, b, φ).

Shrinkage effects for different values of a, b and φ

    θ_j | ρ_j ∼ N(0, 1/ρ_j − 1),   ρ_j ∼ TPB(a, b, φ)              (8)

[Figure 1: prior densities for (a, b) ∈ {(1/2, 1/2), (1, 1/2), (1, 1),
(1/2, 2), (2, 2), (5, 2)} (panels a-f), with φ ∈ {1/10, 1/9, 1/8, 1/7,
1/6, 1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10} considered for
all pairs of a and b. The line corresponding to the lowest value of φ is
drawn with a dashed line.]

Equivalence of hierarchies

Proposition 3
If θ_j ∼ TPBN(a, b, φ), then
1) θ_j ∼ N(0, τ_j), τ_j ∼ G(a, λ_j) and λ_j ∼ G(b, φ);
2) θ_j ∼ N(0, τ_j), π(τ_j) = [Γ(a + b)/(Γ(a)Γ(b))] φ^{−a} τ_j^{a−1} (1 + τ_j/φ)^{−(a+b)},
   which implies that τ_j/φ ∼ β′(a, b), the inverted beta distribution
   with parameters a and b.

Corollary 4
If a = 1 in Proposition 3.1, then TPBN ≡ NEG.
If (a, b, φ) = (1, 1/2, 1) in Proposition 3.1, then TPBN ≡ SB ≡ NEG.

Corollary 5
If θ_j ∼ N(0, τ_j), τ_j^{1/2} ∼ C⁺(0, φ^{1/2}) and φ^{1/2} ∼ C⁺(0, 1),
then θ_j ∼ TPBN(1/2, 1/2, φ) with φ ∼ G(1/2, ω) and ω ∼ G(1/2, 1).
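Proposition 3 can be sanity-checked by Monte Carlo: both routes must produce the same marginal for τ_j. A sketch assuming the rate parameterization G(shape, rate) (numpy's gamma takes scale = 1/rate) and using part 2's β′ representation, τ_j = φ X/(1 − X) with X ∼ B(a, b), for the second route; the values of a, b, φ and the tolerance are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, phi, n = 0.5, 0.5, 2.0, 400_000

# Route 1 (Proposition 3.1): tau ~ G(a, lam), lam ~ G(b, phi).
lam = rng.gamma(shape=b, scale=1.0 / phi, size=n)
tau1 = rng.gamma(shape=a, scale=1.0 / lam)

# Route 2 (Proposition 3.2): tau / phi ~ beta'(a, b), i.e.
# tau = phi * X / (1 - X) with X ~ Beta(a, b).
x = np.clip(rng.beta(a, b, size=n), 1e-12, 1.0 - 1e-12)
tau2 = phi * x / (1.0 - x)

# tau is heavy-tailed, so compare central quantiles only.
q = np.array([0.25, 0.5, 0.75])
print(np.quantile(tau1, q))
print(np.quantile(tau2, q))
```

The two printed quantile triples should agree up to Monte Carlo error, confirming that mixing a gamma rate over a gamma hyperprior reproduces the inverted-beta marginal.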
Experiments: sparse linear regression

(n, p) = (50, 20) and (n, p) = (250, 100):

[Figure 2: relative ME at different (a, b, φ) values for (a) Case 1 and
(b) Case 2. Both cases contain on average 10 nonzero elements.]

(n, p) = (100, 10000), (a, b, φ) = (1, 1/2, 10⁻⁴):

[Figure 3: posterior mean by sampling (squares) and by approximate
inference (circles).]
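The conjugate hierarchy of Proposition 3.1 is what makes sampling in such experiments routine: θ_j | τ_j, y is Gaussian, λ_j | τ_j is gamma, and τ_j | θ_j, λ_j is generalized inverse Gaussian (GIG). A sketch for the unit-variance normal-means case (not the authors' code; function names are mine, and scipy's geninvgauss is assumed for the GIG draw):

```python
import numpy as np
from scipy.stats import geninvgauss

def sample_gig(p, chi, psi, rng):
    """Draw from GIG(p, chi, psi), density prop. to
    x^(p-1) exp(-(psi*x + chi/x)/2), via scipy's two-parameter
    geninvgauss: b = sqrt(chi*psi), scale = sqrt(chi/psi)."""
    chi = np.maximum(chi, 1e-12)  # guard against theta_j ~ 0
    return np.sqrt(chi / psi) * geninvgauss.rvs(p, np.sqrt(chi * psi),
                                                random_state=rng)

def tpbn_normal_means(y, a=1.0, b=0.5, phi=1.0, n_iter=1500, burn=500, seed=0):
    """Gibbs sampler for y_j ~ N(theta_j, 1), theta_j ~ TPBN(a, b, phi),
    via the hierarchy theta|tau ~ N(0, tau), tau ~ G(a, lam),
    lam ~ G(b, phi) (rate parameterization). Returns posterior means."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    tau = np.ones_like(y)
    lam = np.ones_like(y)
    theta_sum = np.zeros_like(y)
    for it in range(n_iter):
        w = tau / (1.0 + tau)                    # = 1 - rho_j
        theta = rng.normal(w * y, np.sqrt(w))    # theta_j | tau_j, y
        tau = sample_gig(a - 0.5, theta**2, 2.0 * lam, rng)  # tau_j | theta_j, lam_j
        lam = rng.gamma(a + b, 1.0 / (tau + phi))            # lam_j | tau_j
        if it >= burn:
            theta_sum += theta
    return theta_sum / (n_iter - burn)
```

With (a, b, φ) = (1, 1/2, 1), the SB/NEG case of Corollary 4, large observations are left nearly unshrunk while near-zero observations are pulled strongly toward 0.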