An Extended BIC for Model Selection

at the JSM Meeting 2007 - Salt Lake City
Surajit Ray
Boston University (Dept of Mathematics and Statistics)
Joint work with
James Berger, Duke University; Susie Bayarri, University of Valencia;
Woncheol Jang, University of Georgia; Luis R. Pericchi, University of Puerto Rico
JSM: July-August 2007 - slide #1
Motivation and Outline
Initiated from a SAMSI Social Sciences working group: "getting the model right" in structural equation modeling.
Introduced in the Wald Lecture by Jim Berger.

Specific characteristics:
- They have 'independent cases' (individuals).
- Often 'multilevel' or 'random effects' models (p grows as n grows).
- Regularity conditions may not be satisfied.
- Problems with BIC.

Can a generalization of BIC work?
- Goal: use only standard software output and closed-form expressions.
- Present a generalization of BIC ('extended BIC', or EBIC) and some examples in the linear regression scenario.
- Work in progress (comments are welcome!)
Bayesian Information Criterion: BIC
BIC = 2 l(θ̂ | x) − p log(n),

where
- l(θ̂ | x) is the log-likelihood of θ̂ given x,
- p is the number of estimated parameters,
- n is the cardinality of x.

Schwarz's result: as n → ∞ (with p fixed) this is an approximation (up to a constant) to twice the log of the marginal likelihood of the model, m(x) = ∫ f(x | θ) π(θ) dθ, so that

m(x) = c_π e^{BIC/2} (1 + o(1)).

BIC approximations to Bayes factors further ignore the c_π's.
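As a minimal illustration of the formula above (a sketch, not part of the talk): BIC for an i.i.d. Gaussian model, where the MLE and the log-likelihood are available in closed form.

```python
import math

def gaussian_bic(x):
    """BIC = 2*loglik(theta_hat) - p*log(n) for an i.i.d. N(mu, sigma^2) sample.

    Both mu and sigma^2 are estimated, so p = 2; n is the number of observations.
    """
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n          # MLE of sigma^2
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)  # log-likelihood at the MLE
    p = 2
    return 2 * loglik - p * math.log(n)
```

The talk's point is precisely that the roles of `n` and `p` here are only unambiguous in such simple i.i.d. settings.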
First Ambiguity: What is n?
Consider time series y_it,

y_it = β y_{i,t−1} + ε_it,  i = 1, ..., n*,  t = 1, ..., T,  ε_it ∼ N(0, σ²).

When the goal is to estimate the mean level ȳ, consider two extreme cases:

1. β = 0: then the y_it are i.i.d., and the estimated parameter ȳ has sample size n = n* × T.
2. β = 1: then clearly the effective sample size for ȳ equals n = n*.

See Thiebaux & Zwiers (1984) for discussion of effective sample size in time series.
Second Ambiguity: Can n be different for different parameters?

Consider data from r groups,

y_ip = µ_p + ε_ip,  i = 1, ..., n*,  p = 1, ..., r,  ε_ip ∼ N(0, σ²),

with µ_p and σ² both unknown.

  Group 1:  y_11, y_21, ..., y_{n*1}
  Group 2:  y_12, y_22, ..., y_{n*2}
  ...
  Group r:  y_1r, y_2r, ..., y_{n*r}
Different parameters can have different sample sizes:
- the sample size for estimating each µ_p is n*;
- the sample size for estimating σ² is n* × r.
Computation for this example yields

$$\hat I = \begin{pmatrix} \dfrac{n^*}{\hat\sigma^2}\, I_{r\times r} & 0 \\ 0 & \dfrac{n}{2\hat\sigma^4} \end{pmatrix},$$

where $\hat\sigma^2 = \frac{1}{n}\sum_{p=1}^{r}\sum_{i=1}^{n^*} (y_{ip} - \bar y_p)^2$ and n = n* × r.
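A numerical sketch of this computation (an illustration of the model above, not code from the talk): build the block-diagonal information matrix for (µ_1, ..., µ_r, σ²) at the MLE and read off the two blocks.

```python
import numpy as np

def group_means_information(y):
    """Observed information for (mu_1, ..., mu_r, sigma^2) in the model
    y[i, p] = mu_p + eps, eps ~ N(0, sigma^2), evaluated at the MLE.

    y is an (n_star x r) array: n_star observations in each of r groups.
    """
    n_star, r = y.shape
    n = n_star * r
    sigma2_hat = ((y - y.mean(axis=0)) ** 2).sum() / n   # pooled MLE of sigma^2
    I_hat = np.zeros((r + 1, r + 1))
    I_hat[:r, :r] = (n_star / sigma2_hat) * np.eye(r)    # block for the r group means
    I_hat[r, r] = n / (2 * sigma2_hat ** 2)              # entry for sigma^2
    return I_hat, sigma2_hat
```

The mean block scales with n* while the σ² entry scales with n = n*r, which is exactly the "different sample sizes for different parameters" point.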
Third Ambiguity: What is the number of parameters?

Example: Random effects group means

In the previous group-means example, with p groups and r observations X_il per group, with X_il ∼ N(µ_i, σ²) for l = 1, ..., r, consider the multilevel version in which

µ_i ∼ N(ξ, τ²),

with ξ and τ² being unknown.

What is the number of parameters? (see also Pauler (1998))

(1) If τ² = 0, there are only two parameters: ξ and σ².
(2) If τ² is huge, is the number of parameters p + 3 (the means along with ξ, τ², and σ²)?

Continued ...
Example: Random effects group means
(3) But, if one integrates out µ = (µ_1, ..., µ_p), then

$$f(x \mid \sigma^2, \xi, \tau^2) = \int f(x \mid \mu, \xi, \sigma^2)\,\pi(\mu \mid \xi, \tau^2)\,d\mu \;\propto\; \sigma^{-p(r-1)} \exp\!\Big(-\frac{\hat S^2}{2\sigma^2}\Big) \prod_{i=1}^{p} \Big(\frac{\sigma^2}{r} + \tau^2\Big)^{-1/2} \exp\!\Big(-\frac{(\bar x_i - \xi)^2}{2(\sigma^2/r + \tau^2)}\Big),$$

where $\hat S^2 = \sum_{i=1}^{p}\sum_{l=1}^{r}(x_{il}-\bar x_i)^2$, so p = 3 if one can work directly with f(x | σ², ξ, τ²).

Note: this seems to be common practice in the Social Sciences: latent variables are integrated out, so the remaining parameters are the only unknowns when applying BIC.

Note: in this example the effective sample sizes should be ≈ p r for σ², ≈ p for ξ and τ², and ≈ r for the µ_i's.
Example: Common mean, differing variances:
(Cox mixtures)

Suppose each independent observation X_i, i = 1, ..., n, has probability 1/2 of arising from the N(θ, 1) distribution and probability 1/2 of arising from the N(θ, 1000) distribution.

Clearly half of the observations are worthless, so the 'effective sample size' is roughly n/2.
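One way to make the n/2 heuristic concrete (a numerical sketch, not from the talk) is to compute the Fisher information for θ in this mixture by quadrature: each mixture observation carries close to half the information of a clean N(θ, 1) observation (somewhat less, because component membership is not fully identifiable).

```python
import math

def mixture_fisher_info(var1=1.0, var2=1000.0, half_width=150.0, steps=200000):
    """Fisher information for the location theta (at theta = 0) of
    f(x) = 0.5*N(x; theta, var1) + 0.5*N(x; theta, var2), via the identity
    I = integral of (df/dtheta)^2 / f dx, using the trapezoid rule."""
    def phi(x, v):
        return math.exp(-x * x / (2 * v)) / math.sqrt(2 * math.pi * v)

    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        f = 0.5 * phi(x, var1) + 0.5 * phi(x, var2)
        # df/dtheta at theta = 0; (df/f)^2 * f integrates to the information
        df = 0.5 * (x / var1) * phi(x, var1) + 0.5 * (x / var2) * phi(x, var2)
        w = 0.5 if k in (0, steps) else 1.0
        total += w * (df * df / f) * h
    return total
```

A N(θ, 1) observation has information 1, so n mixture observations are worth about n × I ≈ n/2 clean observations.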
Problem with n and p
Problems with n:
- Is n the number of vector observations or the number of real observations?
- Different θ_i can have different effective sample sizes.
- Some observations can be more informative than others,
  - as in mixture models,
  - or in models with mixed continuous and discrete observations.

Problems with p:
- What is p with random effects, latent variables, mixture models, etc.?
- Often p grows with n.
EBIC: a proposed solution
Based on a modified Laplace approximation to m(x) for good priors:
- The good priors can deal with growing p.
- The Laplace approximation
  - retains the constant c_π in the expansion,
  - is often good even for moderate n.
- Implicitly uses 'effective sample size' (discussion later).
- Computable with standard software, using MLEs and observed information matrices.
Laplace approximation
1. Preliminary: 'nice' reparameterization.

2. Taylor expansion. By a Taylor series expansion of e^{l(θ)} about the MLE θ̂,

$$m(x) = \int f(x \mid \theta)\,\pi(\theta)\,d\theta = \int e^{l(\theta)}\,\pi(\theta)\,d\theta \approx \int \exp\!\Big[\, l(\hat\theta) + (\theta-\hat\theta)^t \nabla l(\hat\theta) - \tfrac{1}{2}\,(\theta-\hat\theta)^t \hat I\,(\theta-\hat\theta) \Big]\,\pi(\theta)\,d\theta,$$

where ∇ denotes the gradient and $\hat I = (\hat I_{jk})$ is the observed information matrix, with (j, k) entry $-\partial^2 l(\theta)/\partial\theta_j\partial\theta_k$ evaluated at θ̂.

3. Approximation to the marginal m(x). If θ̂ occurs on the interior of the parameter space, so that ∇l(θ̂) = 0 (if not true, the analysis must proceed as in Haughton 1991, 1993), mild conditions yield

$$m(x) = e^{l(\hat\theta)} \int e^{-\frac{1}{2}(\theta-\hat\theta)^t \hat I\,(\theta-\hat\theta)}\,\pi(\theta)\,d\theta\,\big(1 + o_n(1)\big).$$
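The quality of such a Laplace approximation can be checked on a toy case (a sketch, not part of the talk): a Binomial likelihood with a flat prior, where the marginal m(x) is a Beta integral known exactly.

```python
import math

def laplace_vs_exact(n, k):
    """Compare the Laplace approximation of m = integral of L(theta)*pi(theta)
    with the exact value, for a Binomial(n, theta) likelihood (without the
    binomial coefficient) and a Uniform(0, 1) prior.

    Exact: m = B(k+1, n-k+1).
    Laplace: L(theta_hat) * sqrt(2*pi / I_hat) * pi(theta_hat), theta_hat = k/n.
    """
    theta = k / n
    loglik = k * math.log(theta) + (n - k) * math.log(1 - theta)
    info = k / theta ** 2 + (n - k) / (1 - theta) ** 2   # -l''(theta_hat)
    laplace = math.exp(loglik) * math.sqrt(2 * math.pi / info)  # pi(theta_hat) = 1
    exact = math.exp(math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2))
    return laplace, exact
```

Already at n = 100 the relative error is around one percent, illustrating the "often good even for moderate n" bullet.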
Choosing a good prior
- Integrate out common parameters.
- Orthogonalize the parameters. Let O be orthogonal and D = diag(d_i), i = 1, ..., p_1, such that Σ = O^t D O. Make the change of variables ξ = Oθ^{(1)}, and let ξ̂ = Oθ̂^{(1)}. If there is no θ^{(2)} to be integrated out, Σ = Î^{−1} = O^t D O.
- We now use a prior that is independent in the ξ_i, i.e., π(ξ) = ∏_{i=1}^{p_1} π_i(ξ_i).
- The approximation to m(x) is then

$$m(x) \approx e^{l(\hat\theta)}\,(2\pi)^{p/2}\,|\hat I|^{-1/2} \left[\prod_{i=1}^{p_1} \int \frac{1}{\sqrt{2\pi d_i}}\; e^{-\frac{(\xi_i-\hat\xi_i)^2}{2 d_i}}\; \pi_i(\xi_i)\, d\xi_i \right],$$

where we recall that the d_i are the diagonal elements of D from Σ = O^t D O, that is, Var(ξ_i). They will appear often in what follows.
Univariate priors
For the priors π_i(ξ_i), there are several possibilities:

1. Jeffreys recommended the Cauchy(0, b_i) density

$$\pi_i^C(\xi_i) = \int_0^{\infty} N\!\Big(\xi_i \,\Big|\, 0, \frac{b_i}{\lambda_i}\Big)\; Ga\!\Big(\lambda_i \,\Big|\, \frac{1}{2}, \frac{1}{2}\Big)\, d\lambda_i,$$

where (b_i)^{−1} = (d_i)^{−1}/n_i is the unit information for ξ_i, with n_i being the effective sample size for ξ_i (Kass & Wasserman, 1995).

Note: for i.i.d. scenarios, d_i is like σ_i²/n_i, so b_i is like the variance of one observation.

2. Use other sensible (objective) testing priors, like the intrinsic prior.

3. The goal is to get closed-form expressions: we use priors proposed in Berger (1985), which are very close to both the Cauchy and the intrinsic priors and produce closed-form expressions for m(x) for normal likelihoods.
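The scale-mixture representation in item 1 can be checked numerically (a sketch, not from the talk; here b_i enters the mixture as a variance-like parameter, so the resulting Cauchy has scale √b_i):

```python
import math

def mixture_density(xi, b, steps=200000, lam_max=60.0):
    """Density of xi under: xi | lam ~ N(0, b/lam), lam ~ Gamma(1/2, rate 1/2),
    computed by trapezoid quadrature over lam.  This scale mixture equals the
    Cauchy density with location 0 and scale sqrt(b)."""
    h = lam_max / steps
    total = 0.0
    for k in range(1, steps + 1):           # start at lam = h (integrand finite at 0)
        lam = k * h
        normal = math.exp(-xi * xi * lam / (2 * b)) / math.sqrt(2 * math.pi * b / lam)
        gamma = lam ** (-0.5) * math.exp(-lam / 2) / (math.sqrt(2) * math.gamma(0.5))
        w = 0.5 if k == steps else 1.0
        total += w * normal * gamma * h
    return total

def cauchy_density(xi, scale):
    return scale / (math.pi * (scale * scale + xi * xi))
```

This is the classical Cauchy-as-Normal/Gamma(1/2, 1/2) mixture underlying Jeffreys' recommendation.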
Berger’s robust priors
Introduction and Motivation
EBIC
The prior
➲ Choosing a good prior
➲ Univariate priors
➲ Berger’s robust priors
➲ Proposal for EBIC
➲ Comparison of EBIC and BIC
For if bi
≥ di , the prior is given by
«
Z 1 „ ˛
1
1
˛
√ dλi ,
πiR (ξi ) =
N ξi ˛ 0,
(di + bi ) − di
2λi
2 λi
0
EBIC*
Effective sample size
Simulation
JSM: July-August 2007 - slide #14
Berger’s robust priors
Introduction and Motivation
EBIC
The prior
➲ Choosing a good prior
➲ Univariate priors
➲ Berger’s robust priors
➲ Proposal for EBIC
➲ Comparison of EBIC and BIC
EBIC*
Effective sample size
Simulation
For if bi
≥ di , the prior is given by
«
Z 1 „ ˛
1
1
˛
√ dλi ,
πiR (ξi ) =
N ξi ˛ 0,
(di + bi ) − di
2λi
2 λi
0
Introduced by Berger in the context of Robust Analyses
No closed form expression.
Not widely used in model selection
Behaves like Cauchy priors with parameters b i .
JSM: July-August 2007 - slide #14
Berger’s robust priors
If b_i ≥ d_i, the prior is given by

$$\pi_i^R(\xi_i) = \int_0^1 N\!\Big(\xi_i \,\Big|\, 0,\; \frac{d_i + b_i}{2\lambda_i} - d_i\Big)\; \frac{1}{2\sqrt{\lambda_i}}\, d\lambda_i .$$

- Introduced by Berger in the context of robust analyses.
- No closed-form expression.
- Not widely used in model selection.
- Behaves like Cauchy priors with parameters b_i.
- Gives a closed-form expression for the approximation to m(x):

$$m(x) \approx e^{l(\hat\theta)}\,(2\pi)^{p/2}\,|\hat I|^{-1/2} \left[\prod_{i=1}^{p_1} \frac{1 - e^{-\hat\xi_i^2/(d_i+b_i)}}{\sqrt{2\pi(d_i+b_i)}\;\sqrt{2\,\hat\xi_i^2/(d_i+b_i)}}\right],$$

where (d_i)^{−1} is global information about ξ_i, and (b_i)^{−1} is unit information.
Proposal for EBIC
Finally, we have, as the approximation to 2 log m(x),

$$\mathrm{EBIC} \equiv 2\,l(\hat\theta) + (p - p_1)\log(2\pi) - \log|\hat I_{22}| - \sum_{i=1}^{p_1} \log(1 + n_i) + 2 \sum_{i=1}^{p_1} \log\frac{1 - e^{-v_i}}{\sqrt{2 v_i}}, \qquad \text{where } v_i = \frac{\hat\xi_i^2}{b_i + d_i}.$$

(The error as an approximation to 2 log m(x) is o_n(1); exact for normals.)

If no parameters are integrated out (so that p_1 = p), then

$$\mathrm{EBIC} = 2\,l(\hat\theta) - \sum_{i=1}^{p} \log(1 + n_i) + 2 \sum_{i=1}^{p} \log\frac{1 - e^{-v_i}}{\sqrt{2 v_i}} .$$

If all n_i = n, the dominant terms in the expression (as n → ∞) are 2l(θ̂) − p log n. The third term (the 'constant' ignored by Schwarz) is negative.
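A direct transcription of the p_1 = p formula (a sketch, assuming the MLEs ξ̂_i, the variances d_i, and the effective sample sizes n_i have already been obtained, with b_i = n_i d_i from the unit-information relation on the prior slides):

```python
import math

def ebic(loglik, xi_hat, d, n_eff):
    """EBIC = 2*l - sum log(1+n_i) + 2*sum log((1-exp(-v_i))/sqrt(2*v_i)),
    with v_i = xi_hat_i^2/(b_i + d_i) and b_i = n_i * d_i.
    Assumes every xi_hat_i is nonzero, so that v_i > 0."""
    total = 2 * loglik
    for xi, di, ni in zip(xi_hat, d, n_eff):
        bi = ni * di                    # from (b_i)^-1 = (d_i)^-1 / n_i
        v = xi * xi / (bi + di)
        total -= math.log(1 + ni)
        total += 2 * math.log(-math.expm1(-v) / math.sqrt(2 * v))
    return total

def bic(loglik, p, n):
    return 2 * loglik - p * math.log(n)
```

Note that each prior term log((1 − e^{−v})/√(2v)) is negative, so it adds a penalty that plain BIC lacks, as the comparison slide emphasizes.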
Comparison of EBIC and BIC
Compare this EBIC and BIC:

$$\mathrm{EBIC} = 2\,l(\hat\theta) - \sum_{i=1}^{p} \log(1 + n_i) - k_\pi, \qquad \mathrm{BIC} = 2\,l(\hat\theta) - p \log n,$$

where k_π > 0 is the 'ignored' constant from π.

It is easy to see intuitively that 'BIC makes two mistakes':
- It penalizes too much with p log n (note that n_i ≤ n).
- It penalizes too little with the prior (the third term, a new penalization, is absent).

The second 'mistake' actually helps the first one (in this sense, BIC is 'lucky'), but often the 'first mistake' completely overwhelms this little help.
EBIC*: Modification More Favorable to Complex Models

- Priors centered at zero penalize complex models too much?
- Priors centered around the MLE (Raftery, 1996) favor complex models too much?
- An attractive compromise: Cauchy-type priors centered at zero, but with the scales, b_i, chosen so as to maximize the marginal likelihood of the model.
- An empirical Bayes alternative, popularized in the robust Bayesian literature (Berger, 1994).
The b_i that maximizes m(x) can easily be seen to be

$$\hat b_i = \max\Big\{ d_i,\; \frac{\hat\xi_i^2}{w} - d_i \Big\}, \quad \text{with } w \text{ s.t. } e^w = 1 + 2w, \text{ or } w \approx 1.3 .$$

Problem: when ξ_i = 0, this empirical Bayes choice can result in inconsistency as n_i → ∞.

Solution: prevent b̂_i from being less than n_i d_i. This results, after an accurate approximation (for fixed p), in

$$\mathrm{EBIC}^* \approx 2\,l(\hat\theta) - \sum_{i=1}^{p} \log(1 + n_i) - \sum_{i=1}^{p} \log\big(3 v_i + 2 \max\{v_i, 1\}\big),$$

again a very simple expression.
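The constant w above is the nonzero root of e^w = 1 + 2w; a quick bisection (a sketch, not from the talk) confirms the quoted value:

```python
import math

def solve_w(lo=1.0, hi=2.0, tol=1e-12):
    """Nonzero root of e^w = 1 + 2w by bisection.  (w = 0 is also a root, so we
    bracket away from it: f(1) < 0 < f(2) for f(w) = e^w - 1 - 2w.)"""
    f = lambda w: math.exp(w) - 1 - 2 * w
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

The root is about 1.256, consistent with the slide's w ≈ 1.3.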
Effective Sample size
Calculate I = (I_jk), the expected information matrix, at the MLE:

$$I_{jk} = -E^{X \mid \theta}\Big[ \frac{\partial^2}{\partial\theta_j \partial\theta_k} \log f(x \mid \theta) \Big]\Big|_{\theta = \hat\theta} .$$

For each case x_i, compute I_i = (I_{i,jk}), having (j, k) entry

$$I_{i,jk} = -E^{X_i \mid \theta}\Big[ \frac{\partial^2}{\partial\theta_j \partial\theta_k} \log f_i(x_i \mid \theta) \Big]\Big|_{\theta = \hat\theta} .$$

Define I_jj = Σ_{i=1}^n I_{i,jj}, and define n_j as follows:
- Define information weights w_ij = I_{i,jj} / Σ_{k=1}^n I_{k,jj}.
- Define the effective sample size for θ_j as

$$n_j = \frac{I_{jj}}{\sum_{i=1}^n w_{ij}\, I_{i,jj}} = \frac{(I_{jj})^2}{\sum_{i=1}^n (I_{i,jj})^2} .$$
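This definition is immediate to compute (a sketch, not code from the talk). Applied to the Cox-mixture idea, where half of the cases carry per-case information 1 and half carry 0.001, it returns roughly n/2, matching the earlier heuristic.

```python
def effective_sample_size(per_case_info):
    """n_j = (sum_i I_{i,jj})^2 / sum_i (I_{i,jj})^2, computed from the per-case
    diagonal information entries I_{i,jj} for one parameter theta_j."""
    total = sum(per_case_info)
    return total * total / sum(v * v for v in per_case_info)
```

Note the sanity check built into the formula: when every case carries the same information c, n_j = (nc)²/(nc²) = n, recovering the ordinary sample size.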
Effective Sample size: Continued
Intuitively, Σ_i w_ij I_{i,jj} is a weighted measure of the information 'per observation', and dividing the total information about θ_j by this information per case seems plausible as an effective sample size.

- This is just a tentative proposal.
- It works for most of the examples we discussed.
- Research in progress.
A small linear regression simulation:
$$Y_{n\times 1} = X_{n\times 8}\,\beta_{8\times 1} + \varepsilon_{n\times 1}, \qquad \varepsilon \sim N(0, \sigma^2 I), \qquad \beta = (3, 1.5, 0, 0, 2, 0, 0, 0),$$

and design matrix X ∼ N(0, Σ_x), with 1's on the diagonal of Σ_x and ρ in every off-diagonal entry:

$$\Sigma_x = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}.$$
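A minimal way to generate one replication of this design (a sketch of the setup above; the model-selection criteria themselves are not reproduced here):

```python
import numpy as np

def simulate(n, sigma, rho, seed=0):
    """One draw from the slide's simulation: rows of X are N(0, Sigma_x) with
    equicorrelation rho, and Y = X beta + eps with eps ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
    p = beta.size
    cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)   # 1 on diag, rho off diag
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    Y = X @ beta + rng.normal(0.0, sigma, size=n)
    return X, Y, beta
```

Looping this over n ∈ {30, ..., 1000}, σ ∈ {1, 2, 3}, ρ ∈ {0, .2, .5} reproduces the grid of settings shown in the following plots.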
Linear regression Example

[Figure: lattice plot of the percentage of true models selected, comparing the methods abf2, aic, bic, Gbic, Gbic*, hbf against the exact Bayes factor, across n ∈ {30, 60, 200, 500, 1000}, σ ∈ {1, 2, 3}, and ρ ∈ {0, .2, .5}.]

Percentage of true models selected under different situations.
Linear regression Example

[Figure: lattice plot for the methods abf2, aic, bic, Gbic, Gbic*, hbf, across n ∈ {30, 60, 200, 500, 1000}, σ ∈ {1, 2, 3}, and ρ ∈ {0, .2, .5}.]

Average number of 0's that were determined to be 0. (True Model)
Linear regression Example

[Figure: lattice plot for the methods abf2, aic, bic, Gbic, Gbic*, hbf, across n ∈ {30, 60, 200, 500, 1000}, σ ∈ {1, 2, 3}, and ρ ∈ {0, .2, .5}.]

Average number of non-zeroes that were determined to be zero (small is good).
Conclusions and future work
BIC can be improved by
- keeping rather than ignoring the prior,
- using a sensible, objective prior producing closed-form expressions,
- carefully acknowledging the "effective sample size" for each parameter to guide the choice of the scale of the priors.

Work in progress:
- A satisfactory definition of 'effective sample size' (if possible). The current proposal works in most cases.
- Extension to the non-independent case.
References
1. Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis. 2nd edition. Springer-Verlag.
2. Berger, J.O., Ghosh, J.K., and Mukhopadhyay (2003). Approximations and consistency of Bayes factors as model dimension grows. Journal of Statistical Planning and Inference 112, 241–258.
3. Berger, J., Pericchi, L., and Varshavsky, J. (1998). Bayes factors and marginal distributions in invariant situations. Sankhya A 60, 307–321.
4. Bollen, K., Ray, S., and Zavisca, J. (2007). A scaled unit information prior approximation to the Bayes factor. Under preparation.