An Extended BIC for Model Selection

JSM 2007 - Salt Lake City
Surajit Ray, Boston University (Dept of Mathematics and Statistics)
Joint work with James Berger, Duke University; Susie Bayarri, University of Valencia; Woncheol Jang, University of Georgia; Luis R. Pericchi, University of Puerto Rico

JSM: July-August 2007 - slide #1

Motivation and Outline

➲ Motivation and Outline
➲ Bayesian Information Criterion: BIC
➲ First Ambiguity: What is n?
➲ Second Ambiguity: Can n be different for different parameters?
➲ Third Ambiguity: What is the number of parameters?
➲ Example: Random effects group means
➲ Example: Common mean, differing variances
➲ Problems with n and p
➲ EBIC / The prior / EBIC* / Effective sample size / Simulation

- Initiated from a SAMSI Social Sciences working group: "getting the model right" in structural equation modeling.
- Introduced in the Wald Lecture by Jim Berger.
- Specific characteristics:
  - They have 'independent cases' (individuals).
  - Often 'multilevel' or 'random effects' models (p grows as n grows).
  - Regularity conditions may not be satisfied.
  ⇒ Problems with BIC.
- Can a generalization of BIC work?
  Goal: use only standard software output and closed-form expressions.
- Present a generalization of BIC ('extended BIC' or EBIC) and some examples in the linear regression scenario.
- Work in progress (comments are welcome!)

JSM: July-August 2007 - slide #2

Bayesian Information Criterion: BIC

  BIC = 2 l(θ̂ | x) − p log(n),

where
- l(θ̂ | x) is the log-likelihood of θ̂ given x,
- p is the number of estimated parameters,
- n is the cardinality of x.
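As a point of reference for what follows, the definition above can be evaluated directly from standard software output; a minimal sketch (the toy normal-mean data are illustrative, not from the talk):

```python
import math

def bic(loglik_hat, p, n):
    """BIC on the slide's scale, 2*l(theta_hat | x) - p*log(n).

    Larger is better under this sign convention; many references
    report the negated quantity instead.
    """
    return 2.0 * loglik_hat - p * math.log(n)

# Toy illustration: i.i.d. N(mu, 1) data, mu estimated by the sample mean.
x = [1.2, 0.8, 1.1, 0.9, 1.0]
n = len(x)
mu_hat = sum(x) / n
loglik_hat = sum(-0.5 * math.log(2 * math.pi) - 0.5 * (xi - mu_hat) ** 2
                 for xi in x)
print(bic(loglik_hat, p=1, n=n))
```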
Schwarz's result: as n → ∞ (with p fixed), this is an approximation (up to a constant) to twice the log of the marginal likelihood for the model,

  m(x) = ∫ f(x | θ) π(θ) dθ,  so that  m(x) = c_π e^{BIC/2} (1 + o(1)).

BIC approaches to Bayes factors further ignore the c_π's.

JSM: July-August 2007 - slide #3

First Ambiguity: What is n?

Consider time series y_it,

  y_it = β y_{i,t−1} + ε_it,  i = 1, …, n*,  t = 1, …, T,  ε_it ~ N(0, σ²).

When the goal is to estimate the mean level ȳ, consider two extreme cases:

1. β = 0: then the y_it are i.i.d., and the estimated parameter ȳ has sample size n = n* × T.
2. β = 1: then clearly the effective sample size for ȳ equals n = n*.

See Thiebaux & Zwiers (1984) for discussion of effective sample size in time series.

JSM: July-August 2007 - slide #4

Second Ambiguity: Can n be different for different parameters?

Consider data from r groups,

  y_ip = µ_p + ε_ip,  i = 1, …, n*,  p = 1, …, r,  ε_ip ~ N(0, σ²),

with µ_p and σ² both unknown.

  Group 1:  y_11  y_21  …  y_{n*1}
  Group 2:  y_12  y_22  …  y_{n*2}
  …
  Group r:  y_1r  y_2r  …  y_{n*r}
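The two extreme cases of the 'What is n?' slide can be checked by simulation; a rough sketch (series lengths and replication counts are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def grand_mean_var(beta, n_star=100, T=50, reps=400):
    """Monte Carlo variance of the grand mean of n_star AR(1) series of
    length T with coefficient beta and N(0, 1) innovations (sketch)."""
    means = np.empty(reps)
    for r in range(reps):
        y = np.zeros((n_star, T))
        y[:, 0] = rng.standard_normal(n_star)
        for t in range(1, T):
            y[:, t] = beta * y[:, t - 1] + rng.standard_normal(n_star)
        means[r] = y.mean()
    return means.var()

v_iid = grand_mean_var(0.0)  # beta = 0: roughly 1/(n_star*T); every point counts
v_rw = grand_mean_var(1.0)   # beta = 1: far larger; only ~n_star effective obs
print(v_iid, v_rw)
```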
Different parameters can have different sample sizes:
- the sample size for estimating each µ_p is n*;
- the sample size for estimating σ² is n = n* × r.

Computation for this example yields

  Î = ( (n*/σ̂²) I_{r×r}    0
        0                   n/(2σ̂⁴) ),

where σ̂² = (1/n) Σ_{p=1}^{r} Σ_{i=1}^{n*} (y_ip − ȳ_p)².

JSM: July-August 2007 - slide #5

Third Ambiguity: What is the number of parameters?

Example: Random effects group means. In the previous group-means example, with p groups and r observations X_il per group, X_il ~ N(µ_i, σ²) for l = 1, …, r, consider the multilevel version in which

  µ_i ~ N(ξ, τ²),

with ξ and τ² being unknown.

What is the number of parameters? (see also Pauler (1998))
(1) If τ² = 0, there are only two parameters: ξ and σ².
(2) If τ² is huge, is the number of parameters p + 3 — the means along with ξ, τ² and σ²?

Continued …

JSM: July-August 2007 - slide #6

Example: Random effects group means

(3) But, if one integrates out µ = (µ_1, …, µ_p), then

  f(x | σ², ξ, τ²) = ∫ f(x | µ, ξ, σ²) π(µ | ξ, τ²) dµ
                   ∝ σ^{−p(r−1)} exp(−S²/(2σ²)) ∏_{i=1}^{p} (σ²/r + τ²)^{−1/2} exp( −(x̄_i − ξ)² / (2(σ²/r + τ²)) ),

where S² = Σ_{i=1}^{p} Σ_{l=1}^{r} (x_il − x̄_i)², so p = 3 if one can work directly with f(x | σ², ξ, τ²).

Note: This seems to be common practice in the Social Sciences: latent variables are integrated out, so the remaining parameters are the only unknowns when applying BIC.

Note: In this example the effective sample sizes should be ≈ pr for σ², ≈ p for ξ and τ², and ≈ r for the µ_i's.

JSM: July-August 2007 - slide #7

Example: Common mean, differing variances (Cox mixtures)

Suppose each independent observation X_i, i = 1, …, n, has probability 1/2 of arising from N(θ, 1), and probability 1/2 of arising from N(θ, 1000). Clearly half of the observations are worthless, so the 'effective sample size' is roughly n/2.

JSM: July-August 2007 - slide #8

Problems with n and p

Problems with n:
- Is n the number of vector observations or the number of real observations?
- Different θ_i can have different effective sample sizes.
- Some observations can be more informative than others, as in
  - mixture models,
  - models with mixed, continuous and discrete observations.

Problems with p:
- What is p with random effects, latent variables, mixture models, etc.?
- Often p grows with n.

JSM: July-August 2007 - slide #9

EBIC: a proposed solution

Based on a modified Laplace approximation to m(x) for good priors:
- The good priors can deal with growing p.
- The Laplace approximation
  - retains the constant c_π in the expansion,
  - is often good even for moderate n,
  - implicitly uses 'effective sample size' (discussion later).
- Computable using standard software, using MLEs and observed information matrices.

JSM: July-August 2007 - slide #10
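The n/2 intuition of the Cox-mixture example (slide 8) can be checked numerically, since the per-observation Fisher information about θ under N(θ, v) is 1/v; a sketch in which the half-and-half variance assignment is simulated:

```python
import numpy as np

# Per-observation Fisher information about theta under N(theta, v) is 1/v,
# so observations from N(theta, 1000) contribute almost nothing.
rng = np.random.default_rng(1)
n = 10_000
variances = rng.choice([1.0, 1000.0], size=n)  # each with probability 1/2
total_info = (1.0 / variances).sum()

# One 'clean' N(theta, 1) observation carries information 1, so the
# information-based effective sample size is just the total information:
n_eff = total_info
print(n_eff / n)  # close to 1/2: half of the observations are nearly worthless
```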
Laplace approximation

1. Preliminary: 'nice' reparameterization.

2. Taylor expansion. By a Taylor series expansion of e^{l(θ)} about the MLE θ̂,

  m(x) = ∫ f(x | θ) π(θ) dθ = ∫ e^{l(θ)} π(θ) dθ
       ≈ ∫ exp[ l(θ̂) + (θ − θ̂)^t ∇l(θ̂) − (1/2)(θ − θ̂)^t Î (θ − θ̂) ] π(θ) dθ,

where ∇ denotes the gradient and Î = (Î_jk) is the observed information matrix.

3. Approximation to the marginal m(x). If θ̂ occurs on the interior of the parameter space, then ∇l(θ̂) = 0 (if not true, the analysis must proceed as in Haughton 1991, 1993), and mild conditions yield

  m(x) = e^{l(θ̂)} ∫ e^{−(1/2)(θ − θ̂)^t Î (θ − θ̂)} π(θ) dθ (1 + o_n(1)).

JSM: July-August 2007 - slide #11
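The quality of this Laplace approximation can be checked in one dimension; a sketch assuming an i.i.d. N(θ, 1) likelihood and a Cauchy(0, 1) prior (both illustrative choices, not from the talk):

```python
import math
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.3, 1.0, size=200)
n = len(x)
theta_hat = x.mean()
I_hat = float(n)  # observed information for theta under N(theta, 1)

def rel_loglik(theta):
    """l(theta) - l(theta_hat), to keep the quadrature numerically stable."""
    return -0.5 * np.sum((x - theta) ** 2) + 0.5 * np.sum((x - theta_hat) ** 2)

def prior(theta):  # Cauchy(0, 1) density
    return 1.0 / (math.pi * (1.0 + theta ** 2))

# 'Exact' integral of e^{l(theta) - l(theta_hat)} * pi(theta), trapezoid rule
grid = np.linspace(theta_hat - 1.0, theta_hat + 1.0, 8001)
vals = np.array([math.exp(rel_loglik(t)) * prior(t) for t in grid])
dx = grid[1] - grid[0]
exact = (vals.sum() - 0.5 * (vals[0] + vals[-1])) * dx

# Laplace with e^{l(theta_hat)} factored out: (2*pi/I_hat)^{1/2} * pi(theta_hat)
laplace = math.sqrt(2 * math.pi / I_hat) * prior(theta_hat)
print(exact, laplace)
```

Even at moderate n the two numbers agree closely, which is the point of retaining the constant c_π.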
Choosing a good prior

- Integrate out common parameters.
- Orthogonalize the parameters. Let O be orthogonal and D = diag(d_i), i = 1, …, p₁, such that Σ = O^t D O.
- Make the change of variables ξ = Oθ^{(1)}, and let ξ̂ = Oθ̂^{(1)}.
- If there is no θ^{(2)} to be integrated out, Σ = Î^{−1} = O^t D O.
- We now use a prior that is independent in the ξ_i, i.e., π(ξ) = ∏_{i=1}^{p₁} π_i(ξ_i).

The approximation to m(x) is then

  m(x) ≈ e^{l(θ̂)} (2π)^{p/2} |Î|^{−1/2} ∏_{i=1}^{p₁} ∫ (1/√(2π d_i)) e^{−(ξ_i − ξ̂_i)²/(2 d_i)} π_i(ξ_i) dξ_i,

where we recall that the d_i are the diagonal elements of D from Σ = O^t D O — that is, Var(ξ_i). They will appear often in what follows.

JSM: July-August 2007 - slide #12

Univariate priors

For the priors π_i(ξ_i), there are several possibilities:

1. Jeffreys recommended the Cauchy(0, b_i) density

  π_i^C(ξ_i) = ∫_0^∞ N(ξ_i | 0, b_i/λ_i) Ga(λ_i | 1/2, 1/2) dλ_i,

where (b_i)^{−1} = (d_i)^{−1}/n_i is the unit information for ξ_i, with n_i being the effective sample size for ξ_i (Kass & Wasserman, 1995).
Note: for i.i.d. scenarios, d_i is like σ_i²/n_i, so b_i is like the variance of one observation.

2. Use other sensible (objective) testing priors, like the intrinsic prior.

3. Goal is to get closed-form expressions: we use the priors proposed in Berger (1985), which are very close to both the Cauchy and the intrinsic priors and produce closed-form expressions for m(x) for normal likelihoods.

JSM: July-August 2007 - slide #13

Berger's robust priors

For b_i ≥ d_i, the prior is given by

  π_i^R(ξ_i) = ∫_0^1 N( ξ_i | 0, (d_i + b_i)/(2λ_i) − d_i ) (1/(2√λ_i)) dλ_i.

- Introduced by Berger in the context of robust analyses.
- No closed-form expression; not widely used in model selection.
- Behaves like a Cauchy prior with parameter b_i.

Gives a closed-form expression for the approximation to m(x):

  m(x) ≈ e^{l(θ̂)} (2π)^{p/2} |Î|^{−1/2} ∏_{i=1}^{p₁} [ (1 − e^{−ξ̂_i²/(d_i + b_i)}) / ( √(2π(d_i + b_i)) · √2 ξ̂_i²/(d_i + b_i) ) ],

where (d_i)^{−1} is the global information about ξ_i, and (b_i)^{−1} is the unit information.

JSM: July-August 2007 - slide #14
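The closed-form product factor can be sanity-checked by brute-force quadrature of the robust prior's normal-mixture representation against the N(ξ | ξ̂, d_i) kernel from the Laplace step; a sketch with arbitrary values of d_i, b_i and ξ̂_i:

```python
import math
import numpy as np

def trap(vals, step, axis=-1):
    """Trapezoid rule on an evenly spaced grid."""
    v = np.moveaxis(np.asarray(vals), axis, -1)
    return (v.sum(axis=-1) - 0.5 * (v[..., 0] + v[..., -1])) * step

d, b, xi_hat = 0.5, 4.0, 1.7  # arbitrary illustrative values with b >= d
v = xi_hat ** 2 / (d + b)
closed_form = (1.0 - math.exp(-v)) / (
    math.sqrt(2.0 * math.pi * (d + b)) * math.sqrt(2.0) * v)

# Brute force: build pi^R on a grid from its normal-mixture representation
# over lambda, then integrate it against the N(xi | xi_hat, d) kernel.
lam = np.linspace(1e-6, 1.0, 2001)              # mixing variable
xi = np.linspace(-12.0, 12.0, 1601)             # parameter grid
s = (d + b) / (2.0 * lam) - d                   # mixture variances
mix = (np.exp(-xi[:, None] ** 2 / (2.0 * s)) /
       np.sqrt(2.0 * math.pi * s) / (2.0 * np.sqrt(lam)))
prior_xi = trap(mix, lam[1] - lam[0], axis=1)   # robust prior density pi^R
kernel = (np.exp(-(xi - xi_hat) ** 2 / (2.0 * d)) /
          math.sqrt(2.0 * math.pi * d))
numeric = trap(kernel * prior_xi, xi[1] - xi[0])
print(closed_form, numeric)
```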
Proposal for EBIC

Finally, we have, as the approximation to 2 log m(x),

  EBIC ≡ 2 l(θ̂) + (p − p₁) log(2π) − log |Î₂₂| − Σ_{i=1}^{p₁} log(1 + n_i)
         + 2 Σ_{i=1}^{p₁} log[ (1 − e^{−v_i}) / (√2 v_i) ],  where v_i = ξ̂_i²/(b_i + d_i).

(The error as an approximation to 2 log m(x) is o_n(1); it is exact for normals.)

If no parameters are integrated out (so that p₁ = p), then

  EBIC = 2 l(θ̂) − Σ_{i=1}^{p} log(1 + n_i) + 2 Σ_{i=1}^{p} log[ (1 − e^{−v_i}) / (√2 v_i) ].

If all n_i = n, the dominant terms in the expression (as n → ∞) are 2 l(θ̂) − p log n. The third term (the 'constant' ignored by Schwarz) is negative.

JSM: July-August 2007 - slide #15

Comparison of EBIC and BIC

Compare this EBIC and BIC:

  EBIC = 2 l(θ̂) − Σ_{i=1}^{p} log(1 + n_i) − k_π,
  BIC  = 2 l(θ̂) − p log n,

where k_π > 0 is the 'ignored' constant from π.

It is easy to notice intuitively that 'BIC makes two mistakes':
- It penalizes too much with p log n. Note that n_i ≤ n.
- It penalizes too little with the prior (the third term, a new penalization, is absent).

The second 'mistake' actually helps the first one (in this sense, BIC is 'lucky'), but often the 'first mistake' completely overwhelms this little help.

JSM: July-August 2007 - slide #16

EBIC*: Modification More Favorable to Complex Models

- Priors centered at zero penalize complex models too much?
- Priors centered around the MLE (Raftery, 1996) favor complex models too much?
- An attractive compromise: Cauchy-type priors centered at zero, but with the scales, b_i, chosen so as to maximize the marginal likelihood of the model.
- An empirical Bayes alternative, popularized in the robust Bayesian literature (Berger, 1994).

The b_i that maximizes m(x) can easily be seen to be

  b̂_i = max{ d_i, ξ̂_i²/w − d_i },  with w s.t. e^w = 1 + 2w, or w ≈ 1.3.

Problem: when ξ_i = 0, this empirical Bayes choice can result in inconsistency as n_i → ∞.

Solution: prevent b̂_i from being less than n_i d_i. This results, after an accurate approximation (for fixed p), in

  EBIC* ≈ 2 l(θ̂) − Σ_{i=1}^{p} log(1 + n_i) − Σ_{i=1}^{p} log( 3 v_i + 2 max{v_i, 1} ),

again a very simple expression.

JSM: July-August 2007 - slide #17
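The EBIC and EBIC* expressions above (in the p₁ = p case, taking b_i = n_i d_i and v_i as on slide 15) can be sketched as:

```python
import math

def ebic(loglik_hat, xi_hat, d, n_eff):
    """EBIC of slide 15 with p1 = p (no parameters integrated out).

    xi_hat, d, n_eff: per-coordinate orthogonalized estimates, their
    variances d_i, and effective sample sizes n_i; b_i = n_i * d_i.
    """
    val = 2.0 * loglik_hat
    for xh, di, ni in zip(xi_hat, d, n_eff):
        vi = xh ** 2 / (ni * di + di)
        val -= math.log(1.0 + ni)
        # (1 - e^{-v})/v -> 1 as v -> 0, so guard the limit
        ratio = (1.0 - math.exp(-vi)) / vi if vi > 1e-12 else 1.0
        val += 2.0 * math.log(ratio / math.sqrt(2.0))
    return val

def ebic_star(loglik_hat, xi_hat, d, n_eff):
    """EBIC* of slide 17, with the empirical-Bayes scale choice."""
    val = 2.0 * loglik_hat
    for xh, di, ni in zip(xi_hat, d, n_eff):
        vi = xh ** 2 / (ni * di + di)
        val -= math.log(1.0 + ni)
        val -= math.log(3.0 * vi + 2.0 * max(vi, 1.0))
    return val

print(ebic(0.0, [1.0], [1.0], [10.0]), ebic_star(0.0, [1.0], [1.0], [10.0]))
```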
Effective Sample Size

Calculate I = (I_jk), the expected information matrix, at the MLE,

  I_jk = −E^{X|θ} [ ∂²/(∂θ_j ∂θ_k) log f(x | θ) ] |_{θ=θ̂}.

For each case x_i, compute I_i = (I_{i,jk}), having (j, k) entry

  I_{i,jk} = −E^{X_i|θ} [ ∂²/(∂θ_j ∂θ_k) log f_i(x_i | θ) ] |_{θ=θ̂}.

Define I_jj = Σ_{i=1}^{n} I_{i,jj}, and define n_j as follows:
- Define information weights w_ij = I_{i,jj} / Σ_{k=1}^{n} I_{k,jj}.
- Define the effective sample size for θ_j as

  n_j = I_jj / Σ_{i=1}^{n} w_ij I_{i,jj} = (I_jj)² / Σ_{i=1}^{n} (I_{i,jj})².

JSM: July-August 2007 - slide #18

Effective Sample Size: Continued

Intuitively, Σ_i w_ij I_{i,jj} is a weighted measure of the information 'per observation', and dividing the total information about θ_j by this information per case seems plausible as an effective sample size.
- This is just a tentative proposal.
- It works for most of the examples we discussed.
- Research in progress.

JSM: July-August 2007 - slide #19

A small linear regression simulation

  Y_{n×1} = X_{n×8} β_{8×1} + ε_{n×1},  ε ~ N(0, σ²),  β = (3, 1.5, 0, 0, 2, 0, 0, 0),

and design matrix X ~ N(0, Σ_x), where Σ_x has 1's on the diagonal and ρ in every off-diagonal entry.

JSM: July-August 2007 - slide #20

Linear regression Example

[Figure: percentage of true models selected under different situations, for methods abf2, aic, bic, Gbic, Gbic* and hbf, across σ ∈ {1, 2, 3}, n ∈ {30, 60, 200, 500, 1000} and ρ ∈ {0, .2, .5}.]

JSM: July-August 2007 - slide #21

Linear regression Example

[Figure: average number of 0's that were determined to be 0 (true model); same methods and settings.]

JSM: July-August 2007 - slide #22
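The simulation setup of slide 20 can be sketched as follows; for brevity only plain BIC is scored here, as one example criterion, with n, ρ and σ fixed to one point of the grid:

```python
import itertools
import numpy as np

# Sketch of the setup: n x 8 equicorrelated design (correlation rho),
# beta = (3, 1.5, 0, 0, 2, 0, 0, 0), noise sd sigma; all 2^8 submodels
# are scored, here by plain BIC as one example criterion.
rng = np.random.default_rng(3)
n, rho, sigma = 200, 0.5, 1.0
beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
Sigma_x = np.full((8, 8), rho) + (1.0 - rho) * np.eye(8)
X = rng.multivariate_normal(np.zeros(8), Sigma_x, size=n)
y = X @ beta + sigma * rng.standard_normal(n)

def bic_score(cols):
    """2 * max log-likelihood - p * log(n) for the Gaussian submodel 'cols'."""
    if cols:
        Xs = X[:, list(cols)]
        coef = np.linalg.lstsq(Xs, y, rcond=None)[0]
        resid = y - Xs @ coef
    else:
        resid = y
    sigma2_hat = float((resid ** 2).mean())
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2_hat) + 1.0)
    p = len(cols) + 1  # regression coefficients plus sigma^2
    return 2.0 * loglik - p * np.log(n)

best = max((c for k in range(9) for c in itertools.combinations(range(8), k)),
           key=bic_score)
print(best)  # the strong coefficients 0, 1, 4 are essentially always kept
```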
Linear regression Example

[Figure: average number of non-zeroes that were determined to be zero (small is good); same methods and settings.]

JSM: July-August 2007 - slide #23

Conclusions and future work

BIC can be improved by
- keeping rather than ignoring the prior,
- using a sensible, objective prior producing closed-form expressions,
- carefully acknowledging the "effective sample size" for each parameter, to guide the choice of the scale of the priors.

Work in progress:
- A satisfactory definition of 'effective sample size' (if possible). The current proposal works in most cases.
- Extension to the non-independent case.

JSM: July-August 2007 - slide #24

References

1. Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd edition. Springer-Verlag.
2. Berger, J.O., Ghosh, J.K., and Mukhopadhyay (2003). Approximations and consistency of Bayes factors as model dimension grows. Journal of Statistical Planning and Inference 112, 241–258.
3. Berger, J., Pericchi, L., and Varshavsky, J. (1998). Bayes factors and marginal distributions in invariant situations. Sankhya A 60, 307–321.
4. Bollen, K., Ray, S., and Zavisca, J. (2007). A scaled unit information prior approximation to the Bayes factor. Under preparation.

JSM: July-August 2007 - slide #25