Empirical Bayes Methods for Combining Likelihoods

Empirical Bayes Methods for Combining Likelihoods Bradley EFRON Supposethatseveralindependentexperimentsareobserved,each one yieldinga likelihoodLk (0k) for a real-valuedparameterof interestOk. For example,Ok mightbe the log-oddsratio for a 2 x 2 table relatingto the kth populationin a series of medical experiments.This articleconcernsthe followingempiricalBayes question:How can we combineall of the likelihoodsLk to get an intervalestimatefor any one of the Ok'S, say 01? The resultsarepresentedin the formof a realisticcomputational schemethat allowsmodelbuildingandmodelcheckingin the spiritof a regressionanalysis.No specialmathematicalformsarerequiredfor the priorsor the likelihoods.This schemeis designedto take advantageof recentmethodsthatproduceapproximatenumerical likelihoodsLk(6k) even in very complicatedsituations,with all nuisanceparameterseliminated.The empiricalBayes likelihood theoryis extendedto situationswherethe Ok'S have a regressionstructureas well as an empiricalBayes relationship.Most of the discussionis presentedin termsof a hierarchicalBayes model and concernshow such a modelcan be implementedwithout requiringlargeamountsof Bayesianinput.Frequentistapproaches,such as bias correctionandrobustness,play a centralrole in the methodology. KEY WORDS: ABC method;Confidenceexpectation;Generalizedlinearmixedmodels;HierarchicalBayes;Meta-analysisfor likelihoods;Relevance;Specialexponentialfamilies. 1. INTRODUCTION for A typical statistical analysis blends data from independent experimental units into a single combined inference for a parameter of interest 0. Empirical Bayes, or hierarchical or meta-analytic analyses, involve a second level of data acquisition. Several independent experiments are observed, each involving many units, but each perhaps having a different value of the parameter0. Inferences about one or all of the 0's are then made on the basis of the full two-level compound data set. This article concerns the construction of empirical Bayes interval estimates for the individual parameters 0, when the observed data consist of a separate likelihood for each case. Table 1, the ulcer data, exemplifies this situation. Fortyone randomized trials of a new surgical treatment for stomach ulcers were conducted between 1980 and 1989 (Sacks, Chalmers, Blum, Berrier, and Pagano 1990). The kth experiment's data are recorded as (k= 1,2,...,41), (ak,bk,ck,dk) (1) where ak and bk are the number of occurrences and nonoccurrences for the Treatment (the new surgery) and Ck and dk are the occurrences and nonoccurrences for Control (an older surgery). Occurrence here refers to an adverse eventrecurrent bleeding. The estimated log odds ratio for experiment k, 0k log bk dk 2 0k, also appearing in Table 1, is SDk ={ ak +.S + bb + .5 +k + -5 dk + .5 SD13 = .61, for example. The statistic 0k is an estimate of the true log-odds ratio 0k in the kth experimental population, Pk {OccurrencelTreatment} log ( Pk {NonoccurrencelTreatment} Pk {OccurrencelControl} N Pk{NonoccurrencelControl} (4) where Pk indicates probabilities for population k. The data (ak, bk, Ck, dk), considered as a 2 x 2 table with fixed margins, give a conditional likelihood for 0k, Lk(O) (a a + bk) (Ck + dk )okak/S(O) (5) (Lehmann 1959, sec. 4.6). Here Sk(Ok) is the sum of the numerator in (5) over the allowable choices of ak subject to the 2 x 2 table's marginal constraints. Figure 1 shows 10 of the 41 likelihoods (5), each normalized so as to integrate to 1 over the range of the graph. It seems clear that the 0k values are not all the same. For instance, L8(08) and L13(013) barely overlap. On the other hand, the 0k values are not wildly discrepant, most of the 41 likelihood functions being concentrated in the range (2) Ok E (-6,3). measures the excess occurrence of Treatment over Control. For example 013 = .60, suggesting a greater rate of occurrence for the Treatment in the 13th experimental population. But 08 =-4.17 indicates a lesser rate of occurrence in the 8th population. An approximate standard deviation Bradley Efron is Professor, Department of Statistics, Stanford University, Stanford, CA 94305. Research was supported by National Science Foundation Grant DMS92-04864 and National Institutes of Health Grant 5 ROI CA59039-20. 538 This article concerns making interval estimates for any one of the parameters, say 08, on the basis of all the data in Table 1. We could of course use only the data from experiment 8 to form a classical confidence interval for 08. This amounts to assuming that the other 40 experiments have no relevance to the 8th population. At the other extreme, we could assume that all of the 0k values were equal and form ? 1996 American Statistical Association Journal of the American Statistical Association June 1996, Vol. 91, No. 434, Theory and Methods Efron: Empirical Bayes Methods for Combining Likelihoods * 539 L4Asll Our empiricalBayes confidenceintervalswill be constructed directly from the likelihoods Lk (0k), k = 1, 2 ... K, with no other reference to the original data that gave these likelihoods.This can be a big advantage in situationswhere the individualexperimentsare more complicatedthan those in Table 1. Suppose,for instance, that we had observed d-dimensionalnormal vectors xk =1,2,... K and Nd(/,k,Zk) independentlyfor k that the parameterof interest was Ok = second-largest eigenvalue of Xlk. Currentresearchhas providedseveral good ways to constructapproximatelikelihoods LXk(Ok) for Ok alone, with all nuisance parameterseliminated (see Barndorff-Nielsen1986, Cox and Reid 1987, and Efron 1993).The set of approximatelikelihoods,Lxk (Ok), k 1,2, ... K, could serve as the input to our empirical Bayes analysis.In this way empiricalBayes methodscan be broughtto bear on very complicatedsituations. The emphasishere is on a generalapproachto empirical Bayes confidenceintervalsthatdoes not requiremathematically tractableforms for the likelihoodsor for the family of possible prior densities.Our results,which aresprimarily methodological,are presentedin the form of a realistic computationalalgorithmthat allowsmodelbuildingand model checking,in the spiritof a regressionanalysis.As a price for this degree of generality,all of our methodsare approximate.They can be thought of as computer-based generalizationsof the analyticresultsof CarlinandGelfand (1990, 1991), Laird and Louis (1987), Morris(1983), and otherworksreferredto by those authors.Section6 extends the empiricalBayes likelihoodmethods to problemsalso having a regressionstructure.This extensiongives a very generalversion of mixed models applyingto generalized linearmodel (GLM)analyses. The frameworkfor our discussion is the hierarchical Bayes model describedin Section 2. Philosophically,this , *. I 'cJ<L .'1~~' , L o _~~~~~~ _ V_It . , the oter40lieihod6 2 4 log odds ratio 4 e Figure 1. Tenof the 41 IndividualLikelihoodsLk(Ok), (5);k 1, 3, 5, to Satisfyf58Lk(6,0 Normalized 6, 7, 8, 10, 12, 13,and40; Likelihoods dok ==1. Starsindicate L40(040),whichis morenegativelylocatedthan the other40 likelihoods;L41(041), not shown, is perfectlyflat. a confidenceintervalfor the commonlog-oddsratio0 from the totals (a+, b+Ic+Id) d (170,736,352,556). The em- piricalBayes methodsdiscussedhere compromisebetween these extremes.Ourgoal is to make a correctBayesianassessmentof the informationconcerning08 in the other 40 experimentswithout having to specify a full Bayes prior distribution.Morris (1983) gives an excellent description of the empiricalBayes viewpoint. Table 1. Ulcer Data Experiment a b c d *1 2 *3 4 *5 *6 *7 *8 9 *10 11 *12 *13 14 15 16 17 18 19 20 7 8 5 7 3 4 4 1 3 2 6 2 9 7 3 4 2 1 4 7 8 11 29 29 9 3 13 15 11 36 6 5 12 14 22 7 8 30 24 36 11 8 4 4 0 4 13 13 7 12 8 7 7 5 11 6 8 4 15 16 2 8 35 27 12 0 11 3 15 20 0 2 17 20 21 4 2 23 16 27 0 -1.84 -.32 .41 .49 Inf -Inf -1.35 -4.17 -.54 -2.38 -Inf -2.17 .60 .69 -1.35 -.97 -2.77 -1.65 -1.73 -1.11 SD Experiment a b c d .86 .66 .68 .65 1.57 1.65 .68 1.04 .76 .75 1.56 1.06 .61 .66 .68 .86 1.02 .98 .62 .51 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 *40 41 6 4 14 6 0 1 5 0 0 2 1 8 6 0 4 10 3 4 7 0 0 34 14 54 15 6 9 12 10 22 16 14 16 6 20 13 30 13 30 31 34 9 13 5 13 8 6 5 5 12 8 10 7 15 7 5 2 12 2 5 15 34 0 8 34 61 13 0 10 10 2 16 11 6 12 2 18 14 8 14 14 22 0 16 0 -2.22 .66 .20 -.43 -Inf -1.50 -.18 -Inf -Inf -1.98 -2.79 -.92 -1.25 -Inf .77 -1.50 .48 -.99 -1.11 -Inf NA SD .61 .71 .42 .64 2.08 1.02 .73 1.60 1.49 .80 1.01 .57 .92 1.51 .87 .57 .91 .71 .52 2.01 2.04 NOTE: 41 independent experiments comparing Treatment, a new surgery for stomach ulcer, with Control, an older surgery; (a, b) = (occurrences, nonoccurrences) on Treatment; (c, d) = (occurrences, nonoccurrences) on Control; occurrence refers to an adverse event, recurrent bleeding. 4 is estimated log odds ratio (2); SD = estimated standard deviation for 4 (3). Experiments 40 and 41 are not included in the empirical Bayes analysis of Section 3. Stars indicate likelihoods shown in Figure 1. Data from Sacks et al. (1990). Carl Morris, in Part 8 of his Commentary following this article, points out discrepancies between Table 1 and the original data for cities 4, 24, 32, and 33. 540 Journal of the American Statistical Association, June 1996 hyperprior density h(-) 4 hyperparameter ____________________________ I 1 | nduced a1-? I prior density | induced I prior-jparameter of interest- ~ h(rjlx) X X I g(x0/) I | 00 1 foo(.) direct data xO oi 02 03 1 1 1 f02(.) f03(.) . X2 X3 * f0(l X1 ** unobserved parameters K 1 .. dl(x) | individual (K() sampling densities observed data XK I density ~~~~~~sannpling I x "other" data Figure 2. The Hierarchical Bayes Model Hyperparameter 77 Sampled From Hyperprior Density h(). Prior density gq() yields unobservable parameters 00, 01, 02, 03 .... OK; each Ok determines independent sampling density fok () for observed data xk; inferences about 00 are made on the basis of xo and the "other"data x = (x1, x2, x3,..., XK). It is convenient to assume that x is observed before xo (dashed lines), leading to the full Bayesian analysis (13). is a very attractive framework for the combination of independent likelihoods. In practice, however, it is difficult to apply hierarchical Bayes methods because of the substantial Bayesian inputs that they require. This article can be thought of as an attempt to make hierarchical Bayes analyses more flexible and practical, and more palatable to frequentists. Section 3 considers a databased approach to the choice of prior distributions, less rigid than the usual appeal to conjugate families. Section 5, on bias correction, concerns frequentist adjustments to the maximum likelihood estimation of hierarchical parameters. These adjustments correct a simple frequentist approach to better agree with what one would obtain from a full hierarchical Bayes analysis. Section 4 concerns robustness: How we can protect a genuinely unusual case, perhaps the one drug out of many tested that really works, from being overmodified by Bayesian shrinkage to the mean? Section 7 summarizes the various suggestions and results. 'q = (m1,?2) h( , 72 ) d?Id2 sampledfrom the vague hyperpriordensity = dlId?72/9 2< 72 > 0- The prior density g97(.) yields some unobservable realvalued parameters Ok, Ol,* 00 02, iid *... OK "'( ,*)q, (6) with iid meaning independent and identically distributed. (In this section the cases are indexed from zero to K instead of from 1 to k, with 00 representing a parameter of particular interest.) Each Ok determines a sampling density fok (.) that then yields some observable data Xk, Xk rc-' fok (.) independently for k = 0, 1, 2,. . ., K. (7) The Xk can be vector valued, and the densities fok (.) can differ in form for the different cases. In fact our methods use only the likelihoods Lk (Ok) CfOk (Xk), (8) with xk fixed as observed and c indicating an arbitrarypositive constant. In an earlier work (Efron 1993), I showed how likelihoods Lk (Ok) can be accurately approximated even in 2. HIERARCHICALAND EMPIRICALBAYES MODELS cases where nuisance parameters affect the distributions of Empirical Bayes confidence intervals can be thought of the xk. as approximations to a full hierarchical Bayes analysis of Our task here is to form interval estimates for any one a compound data set. That is the point of view taken by of the Ok, say 00, on the basis of all of the observed data. Carlin and Gelfand (1990, 1991), Laird and Louis (1987), The Bayes posterior interval for 00 can be conveniently deand this article. This section briefly reviews the connection scribed by assuming that the "other"data, between hierarchical and empirical Bayes analyses. X = (XI i X2 i... i XK), (9) Figure 2 diagrams the hierarchical Bayes model as used here. At the most remote level from the data, a hyperprior are observed before the direct data xo for 00. This route is density h(.) yields a hyperparametern that determines the illustrated by the dashed lines in Figure 2. form of a Bayes prior density g, (.). For example, g, (.) The marginal sampling density, d, (x), is obtained by incould correspond to a normal distribution N(7q1,q2), with tegrating out the unobserved vector of "other" parameters Efron: Empirical Bayes Methods for Combining Likelihoods 0 = (0, 02,... 541 Now n = (M, A). Marginally, the components xk of x are independently normal, , OK), dc4(x) gJ 9(o) fo(x) dO, (10) dM,A(xk) 9,(0=) = II 9,(Ok, a = fO, (Xk). Bayes's rule gives h(1Ix) = ch(r1)dq(x) N(M, A + 1), (18) so that the MLE 4 = (M, A) has K (11) M as the conditionaldensity of rj, given x, with c a positive constantas before. This providesan inducedprior density =x = EZxk/K k=1 and for 00: 9x(0o) Jh(rlIx)gn K (o) dr. (12) A = Z(xk -_.)2/K - 1. (19) k==1 A final applicationof Bayes's rule gives the posterior densityof 00 based on all of the data: Lo(0o) equals ce-(00o-x) /2. We assume that A in (19) is nonnegative. (13) PX(0oIxo) = cgx (0oo)fo0(xo) = cgx (0o) Lo (0o) Formula (16) gives a MLE posterior density for 00 that The genericpositive constantc may differin the two parts is normal, of (13). HierarchicalBayes interval estimates for 00 are computedfrom px(00Ixo). Notice that (13) generalizesthe pi,(Oolxo) N(x + (1 - B)(xo - t), 1 - B), usual form of Bayes's rule that would apply if i (and so g,) were known: [B_= Pn,(Oolxo)= cg?7(Oo)Lo(0o). (20) (14) level aointerval for 00 goes The generalization consists of averaging g,7(0o) over so an empirical Bayes one-sided from -oc to (1-B)(xo-x) - (1-x) /B , with h(r1jx). The hierarchicalBayes formula(13) neatly separatesthe informationfor 00:the likelihoodLo(0o) is the directinformationabout00 containedin xo; the inducedpriordensity g9(Oo)is the informationabout 0o in the other data x. In practice,though,(13) can be difficultto use becauseof the substantialBayesianinputsrequiredin Figure2. Empirical Bayes methodscan be thoughtof as practicalapproximations to a full hierarchicalBayes analysis. The maximumlikelihoodestimate(MLE)of rqbased on the otherdatax is fi = argmaxc hd7 (x)}* (15) 1.645 etc. Formula (20) suggests the empirical Bayes point estimator X(.95)= 0o=J+(I-B)(xo-x) (21) for 00. This is a little different and not as good as the JamesStein estimator So=Z + (1-B)(xo-X) { B(xo)/ZK(x1)2 B =(K -2)/ ' (Xk-) (22) A familiarempiricalBayes tactic is to replacethe induced priorgx(Oo)with the MLE prior gj7(00) in (13), giving the see Efron and Morris (1973, lem. 2). The diminished accu- MLE posterior density: pj,(00Ixo) - cg7 (0o)Lo(0o) (16) (see Morris 1983). Percentiles of the distributioncorrespondingto pf7(Golxo) providea simple form of empirical Bayes confidenceintervalsfor 00. The hyperpriordensity h(.) is no longer necessary.This approachand improvements on it are consideredin the next three sections. We follow similarideas of CarlinandGelfand(1990) andLaird andLouis (1987),particularlyin Section5, which concerns correctingthe biases in (16). As a simple example, suppose that both gYr() and fo,( ) in Figure2 are normal: Sk '-N(M, A) and xkISk -N(Ok, 1) for k =O, 1, 2, . ..,K. (17) racy comes from the fact that (20) does not use xo in the estimation of q = (M, A). We can imagine estimating (M, A) as in (19), except us.. . ,XK, in which xo is ing a data set xo,XO.... ,XO,X1,X2, repeated no times. For large K, straightforward calculations show that the choice no = 5, rather than no = 0 as in (19), makes the empirical Bayes point estimate (2.16) nearly equal the James-Stein estimate (22). Efron and Morris (1973, sec. 2) derived no = 3 as the optimal choice for a slightly different estimation problem. The point here is that xo can reasonably be included in the estimation of X in formula (16). This is done in the examples that follow, though it would also be feasible to carry out the calculations with x0 excluded. In these examples x indicates all of the data, with the cases numbered 1, 2, ... ., K as in Section 1. Journal of the American Statistical Association, June 1996 542 3. EXPONENTIALFAMILIESOF PRIOR DENSITIES The family of possible prior densities g, (.) in Figure 2 is an essential ingredient of our empirical Bayes analysis. This section discusses a wide class of priors based on exponential family theory. Exponential family priors are flexible enough to allow a model building and model checking approach to empirical Bayes problems, more in the spirit of a regression analysis than of a traditional conjugate prior Bayes solution. The price for this generality is solutions that are numerical and approximate rather than formulaic and exact. An exponential family .F of prior densities g,7(0) for a real-valued parameter 0 can be written as .F = { n(0) = e?7't(0)(23) 1(?) go(0), E A}. Here n is a p-dimensional parameter vector-the natural or canonical vector-and t(0) is the p-dimensional sufficient vector, depending on 0. For example, we might take p = 3 and t(0) = (0,02, 03)'. The nonnegative function go(0) is the carrier density. The moment-generating function 0(nQ) is chosen to make g,(0) integrate to 1 over 0, the range of 0: Ie g(0) dO= len dO= 1. t(0)-0(q)Yo(0) (25) If h(0) is a vector or a matrix, then definition (25) applies component-wise. Notice that the integrals in (25) are onedimensional no matter what the dimensions of r7,t, or h. This leads to easy numerical evaluation of I, (h). Existence of the integral (25) is assumed in what follows, as are other formal properties such as differentiation under the integral sign. Brown (1984, chap. 2) showed why these properties usually hold in exponential families. The marginal density of Xk, the kth component of x in Figure 2, can be expressed in terms of the likelihood Lk (0k) = CfOk (Xk ), dr,(Xk) = = g4 J (0k)fok(Xk) XI (0k 0go )en't(Ok a Letting In(h) m, (xk) m,(Xk) t(O)h(O)e,'t(O)go (0) dO= In(th). J be the marginal log-likelihood for xk, = I(1) + 1og c, log I(Lk)-1og we obtain the marginal score function (O/9r1)m,(xk) (29): a M?(Xk) = In (tL) -IT7 (t) from (31 a) The total score (a/arj)m,(x) is the sum of (31a) over the marginally independent components Xk, K mq(x)= E (31b) (Xk)_ a It is easy to compute the marginal MLE i, (15), from the estimating equation: a K [In(tLk) Inj(t)1 In the examples that follow, (32) was solved by NewtonRaphson iteration. The second derivative matrix is obtained from (29) and (31): - 02 m(x) K rFI (t2) _ (I(t) 7 {[ In(1L) I[*(Lk) ~2 In(1)Lk) (33) I( Ir(Lk))]} Here (02/aTr2)m,(x) is the p x p matrix with entries (a2/aT1iaTjj)M7(x), and for any p vector v, the notation v2 indicates the p x p matrix (vivj). A more intuitive explanation of (32) and (33), relating to the EM algorithm, appears in Remark B. Newton-Raphson iterates i(0), '(1),... converging to the solution of (32) are obtained from the usual updating equation, )Lk (0k) dSk/61'(4) * (26) 1)--j +(j n(j~~~~ ~~~~~ matrix_(2 The[observedinformation dO =lego eo(r)) (0)e,)'t(0) = I, (1) (27) according to (24), formula (26) gives = (30) dOk Because dn (Xk ) (29) -log(d,7(Xk)) (24) The convex set A is the subset of 7 vectors for which (24) is possible. All of these are standard definitions for an absolutely continuous exponential family on the real line, following Lehmann (1983, sec. 1.4), though (23) may look peculiar because 0 is the random variable rather than the parameter. It is easy to find the marginal MLE i, (15), in the exponential family F. For any function h(0), define I, (h) to be the integral, In(h)-lX h(0)er) t(0)go (0) dO. The constant c depends on xk but not on the choices of , t(0) or go(0) in (23). The gradient (a/0rj)Ir,(h) (i.e. the vector with components (&/r/Ak) I4(h) for k = 1, 2, ... ,p) is cIrB(Lk)/IrB(1) . (28) 2M(x) g(34) The observed information matrix -(02/0rn2)m~(x) l, gives an approximate covariance matrix for the MLE i, Efron: Empirical Bayes Methods for Combining Likelihoods 543 quad..r~~~~~~~~~ -. ~~~~~cubic A 00 ~ '7 ~ cbi .2~~~~~~~~12 e o - -5 5.--..~ (a) e 5 0 5 (b) Figure 3. Fitting a Prior Density gq (.) to the Ulcer Data Likelihoods. (a) Carriergo equals average of likelihoods (dotted line); quad (solid line) is gr4 from quadratic model t(O) = (0, 02)'; cubic (dashed line) is gr4 from cubic model t(O) = (0, 02, 03)1. (b) log(gj4) for quadratic (solid line) and cubic (dashed line) models. Cases k = 40, 41 in Table 1 were not included in this analysis. Notice that i and cov(i) are obtained from the likelihoods , LK, with no further reference to the original L1, .L2... data X1, X2.... ,XK. Jackknife estimates of covariance are used as a check on formula (35) in what follows, and in fact give larger estimated variance in our examples. This program was applied to the ulcer data of Table 1, with cases k = 40, 41 excluded for reasons discussed later, so K = 39. The carrier go(0) in (23) was taken to the average likelihood, indicates no improvement in going from the quadraticto the cubic model. Other possible improvements on the quadratic model fared no better. All these calculations excluded cases k = 40 and 41 in Table 1. L41(041) is perfectly flat, which drops it out of the formulas whether or not we intend to do so. Figure 1 shows that L40(040) is located in the far negative reaches of the 0 axis. When case k = 40 was included in the calculations, W significantly favored the cubic model over the quadratic. But looking at the individual summands on the right side K of (37), case k = 40 by itself accounted for almost all of (36) E Lk (0), 9?(o) = It was on these grounds that case 40 was W's significance. k=1 removed from the analysis as an overly influential outlier. with 58 Lk(0) dO= 1 as in Figure 1. The dottedcurve in In general, it is a good idea to consider the components of Figure 3a shows go. Other choices of go, discussed later, W rather than just W alone. made little difference to the estimation of g9. The MLE quadratic prior density g2(0) has expectation The results of two different choices of the exponential -1.22 and standard deviation 1.19. How accurate are these family F, (23), are shown in Figure 3: the quadratic model values? Jackknife standarderrors were computed by removt(0) - (0,02)' and the cubic model t(0) = (0,02,03)1. ing each case k = 1, 2,.. ., 39 in turn and recalculating g7 (We do not need an intercept term in t (0), for example, (which amounts to thinking of the cases xk as being ran(1, 0, 02)1, because constantsare absorbedinto 0(n).) The domly sampled from some superpopulation): cubic MLE g2 is centrally a little narrower than 9g for the quadratic model, but it is bimodal, with a second mode at Expectation =-1.22 ? .26 the left end of the 0 scale. The difference looks more draStandard deviation = 1.19 ? .31. (38) matic in Figure 3b, which plots the log densities. The quadratic model forces g2(0) to be nearly symmet- The ? numbers are one jackknife standarderror.These nonric, whereas the cubic model allows for asymmetry. Does parametricjackknife standarderrors were larger than those the cubic model provide a significantly better fit to the ul- obtained parametrically from (35). In doing the jackknife cer data? We can use Wilks's likelihood ratio criterion to calculations, it was possible to remove the deleted case from answer this question. Let I'(c) (h) and In(q)(h) indicate (25) the definition of go(0) in (36) and also remove it from the for the cubic and quadraticMLE models. According to (28), MLE equation (32), thereby accounting for the data-based Wilks's statistic is choice of carrier. But doing so made very little difference here. The choice of the average likelihood (36) for the carrier W = 21og d(c)(x) dn(q) (X) is convenient but not at all necessary. Smoother versions of Yo(0), obtained by Gaussian smoothing of (36), were tried (1)) (37) (Lk) on the ulcer data. These gave much different carriers go(0) = 2E og ([(c) In(q) but almost identical estimates of the prior density g, (0); see Remark A. Both here and in the toxoplasmosis example of We can compare W to a x2 distribution to test for a signif- Section 6, the choice of go seemed to be the least sensiicantly better fit. In this case the computed value W = .64 tive aspect of the prior fitting process. A flat density over 544 Journal of the American Statistical Association, June 1996 the rangeof 0 gave almost the same results,but of course this would not always be true. Recent work by the author extendsformula(35) to include a data-basedchoice of the carrier,but such formulascan be avoidedby the use of the jackknifeas in (38). In the normal-normalsituationof (17), the averagelikelihood go(0) approachesa N(M, A + 2) distributionas K -- oo, considerablymore broadlysupportedthan g (0) N(M, A), the prior density that we are trying to estimate. This wider supportwill usuallybe the case, making (36) at least a sufficientlybroadchoice for the carrier. RemarkA. Returningto Figure2, supposethat01, 02,... (but not 00) are observed without error, so Lk (Ok) is a delta function at Ok. In this case the average likelihood (36) is equivalentto the empiricaldistributionon {0o, 02,..*, OK}. Then it is easy to show that the estimated posteriordensity p,4(Oo xo), (16), is the discrete distribution puttingprobabilityLo (Ok)/ > Lo(0j) on Ok. This is a reasonableestimator,but we might prefer a smoother answer.Smoothnesscould be achievedby convolving(36) with a normalkernel,as mentionedpreviously. 9g, say 9B. Typically, YB would be a vague, uninformative prior density, not containing much information about 00. The choice C is assumed to be independent of all random mechanisms in Figure 2. This amounts to allocating a priori probability hB that the information in x is irrelevant to the estimation of 00. The statistician must choose hB, perhaps using a conventional value like .05, but this is the only nondata-based element required in the analysis. Having observed x, there are two possible prior densities for 00: - 00 9A with probability hA, or OK 00 - with probability hB= 9B 1 - hA, (41) where we have renamed gx(00), (12), as 9A (00) for convenient notation. These give xo the marginal densities or dB(xo) = fEY9B(Oo)foo dA(XO) = fgeA(00)foo(XO)dOo density being with the overall marginal (xo) dOo, d(xo) -hAdA(xo) + hBdB(xo). (42) After xo is observed, the a posteriori probabilities for the RemarkB. Formulas(32) and (33) can be rewrittenin two choices in (41) are termsof conditionalexpectationsand covariances: hA(Xo) = hAdA(xo)/d(xo) K ar m2 (x) -, 0977 ~ [E {t(0k)lXk} -En {t (0k) }] (39) and k=1 hB(xo) and (43) The overall posterior density for 00 is 02 -0 hBdB (xo)/d(xo). mq(x) hA (Xo)PA (So |xo) + hB (Xo)PB (So Ixo), K - S [COV, { t (0k) - COV, {t (0k) IXk}] (40) (44) where k=1 (45) PC(So|xo) = gc (Oo)fo.(xo)/dc (xo) These expressionsare relatedto the EM algorithmandthe missing-dataprinciple,the missing data in this case being for C equal A or B. the unobservedparameters01, 02, . . ., OK (see LittleandRuThere are two extreme cases of (44). If hA(xO) = 1, then bin 1987, sec. 7.2, and Tanner1991). we are certain that x is relevant to the estimation of 00. In this case the a posteriori density for Oois PA(OOIXO) 4. RELEVANCE A crucial issue in empiricalBayes applicationsis the relevanceof the "other"datax in Figure2 to the particular parameterof interest00. EmpiricalBayes methodsusually pull extremeresultstowardthe groupmean.If 00represents somethingdesirablelike a cure rate, then the relevanceof the other data may suddenlyseem debatable.This section concernsa schemefor assessingthe relevanceof the other data to 00, allowing 00 to opt out of the empiricalBayes schemeundercertaincircumstances.This idea is relatedto the limitedtranslationestimatorof EfronandMorris(1971, 1972). It is easy to modifythe hierarchicalBayes modelto allow for exceptionalcases. We assumethata randomchoice "C" has been madebetweentwo differentpriordensitiesfor 0o: C = A with probabilityhA, "A" meaning that Figure 2 appliesas shown;but C = B with probabilityhB = 1 - hA, "B" meaningthat0o is drawnfrom a differentdensitythan = Px(0oIxo), (13). If hA(xo) = 0, then x is completely irrelevant, and the a posteriori density is PB(Oo xO). The examplethat follows takes PB (O0IXO)to be the confidence density, meaning that by definition the a posteriori intervals based on PB (OoIXO) coincide with confidence intervals for 00 based only on xo; see Remark D. Bayes rule gives hA(XO) hA R(x) hB(Xo) hB where R(xo) = fe, 9A(O)Lo(O)dO dA(xo) dB(xo) -e 9B(Oo)Lo(00) dOo (46) At this point it looks like we could use (46) to evaluate hA(xo) and hB(xo), and then take (44) as our overall a postenioni density for Oo, an appealing compromise between Efron: Empirical Bayes Methods for Combining Likelihoods 545 Figure 4 shows the percentiles of the overall posterior distribution, density (44), as a function of hA, the prior probability that 08 should be included in the empirical Bayes scheme. The central .90 interval with hA = .95 is (-5.26, -1.787), compared to (-5.10, -1.76) for hA = 1.00, the full empirical Bayes inference. The likelihood ratio R(x8) = dA(x8)/dB(x8) was estimated as .47 from (47) and (50), so hA = .95 results in a posteriori probability hA(xo) = .90. As a point of comparison, the same computation gives R(x) = .077 and hA(xo) = .59 for xo representing experiment 40, the excluded case in Table 1. C') I 0.0 I 0.2 I 0.4 I 0.6 hA 0.8 1.0 Figure 4. Interval Estimates for 08, Ulcer Data, as a Function of hA, the Prior Probability that 08 is Drawn From the Same Density as the Other Parameters Ok. Curves indicate percentiles of the overall posterior density (55). Likelihood ratio R, (57), estimated to equal .47. fully including or fully excluding 00 from the empirical Bayes inference scheme. But there is a practical difficulty. Uninformative priors such as 9B(0O) are usually defined only up to an unspecified multiplicative constant. This produces a similar ambiguity in R(xo), say R(xo) = cr(xo), (47) where r(xo) is a known function but c is an unknown positive constant. What follows is a data-based scheme for evaluating c. Notice that Remark C. If we choose the uninformative prior gB correctly, then the percentiles of the posterior density PB (0o x0) = cgB (0o)Lo(0o) will nearlyequalthe confidence limits for 00 based on xo. The prior density (51) is correct in this sense for the likelihood L8(08) obtained by conditioning the eighth 2 x 2 ulcer data table on its marginals. This means that the percentiles at hA = 0 in Figure 4 are nearly the standard confidence limits for 08 based on (a8, b8, c8, d8). Remark D. Formula (51) is a Welch-Peers prior density, one that gives a posteriori intervals agreeing with ordinary confidence intervals to a second order of approximation. Of course, simpler choices, such as 9B(OO) = constant, might be used instead. Earlier work (Efron 1993, sec. 6) gives a straightforwardnumerical algorithm for calculating Welch-Peers densities in general exponential family situations. Formula (51) was actually obtained from the doubly adjusted ABC likelihood (6.26) of that paper. 5. BIAS CORRECTION The MLE prior density gi1(Oo) tends to be lighter-tailed than the induced prior gx(Oo), (12). This bias makes interI (48) hB(xo)d(xo)dxo = hB, val estimates based on the MLE posterior density narrower than they should be according to the corresponding hieraraccording to (43). This just says that the marginal expectachical Bayes analysis. Bias correction has become a major tion of hB(xo) is hB. Solving (46), (47) gives theme of recent empirical Bayes literature (see, for example, Carlin and Gelfand 1990, 1991, Laird and Louis 1987, + AA cr(xo) and Morris 1983). hB(xO) 1 This section introduces a new bias correction technique We can use (48) and (49) to define a method-of-moments specifically designed for hierarchical Bayes situations. The technique uses confidence interval calculations to approxiestimate c for c: mate the induced prior gx(0o) that we would have obtained starting from an appropriateuninformative hyperprior den(50) - hB. 1+ hA sity h(Qq). The actual form of h(Qq)is never required here, (Xk/ k=1 hB which is a great practical advantage when dealing with the Having determined c, we can now use (46) and (47) to eval- general exponential families of Section 3. uate the compromise a posteriori density (44) for 00. Suppose that having observed x from the multiparametric This scheme was applied to the estimation of 08 for the family of densities d,,(x), we wish to make inferences about ulcer data. The density 9A in (41) was a bias-corrected ver- -y(), a real-valued parameter of interest. In what follows, sion of the quadratic MLE g71shown in Figure 3 (9A = gt 'y(,q) is ga (0o) evaluated at a fixed value of 00. Let 'y[oa]be of Fig. 6 in the next section). The uninformative prior 9B the endpoint of an approximateone-sided level oaconfidence interval for -v having second-order accuracy, was taken to be gB(08) = ce 168608 for reasons discussed in Remarks C and D. (51) Prt7{?y< Yx [H]} = +??(kr). (52) 546 Journal of the American Statistical Association, June 1996 -~ ~ ~ ~~7 -S Here 8- is the usual estimate of standard error for a and (a, b, cq) are the three ABC constants described by DiCiccio and Efron (1992), Efron (1993), and Efron and Tibshirani (1993). The numerical calculations needed for (a, b, cq) are about twice those required to compare (8. The calculations here were done using abcnon, an S program available from statlib@lib.stat.cam.edu by typing the one-line message "send bootstrap funs from S". This is a nonparametric version of the ABC method that treats the K likelihoods Lk (0k) as iid observations from a space of random functions. Here is a simple example where the confidence expectation method can be directly compared to a genuine hierarchical Bayes analysis starting from an uninformative hyperprior.We consider the normal situation (17) and (18) with K = 20. The data vector x is taken to be a stylized normal sample, -: 0 442 4 2 e x = c0 (1-1(.5)/20), Figure 5. Estimates of Prior Density for the Normal Situation (17), (18); K = 20, Stylized Sample x, (55). All estimates are symmetric about 0 = 0. The solid line is the induced Bayes priorgx(0o) startingfrom vague hyperprior dmdA/(A + 1); stars show MLE gfi (0o); small dots indicate confidence expectation gt (0), nicely matching the Bayes curve; large dashes indicate density obtained from Carlin and Gelfand's (1990) calibration method, called cdf correction here. All of the curves are normalized to have area 1. Second-order accuracy says that the accuracy of the approximate interval is improving an order of magnitude faster than the 12Sz usual rate in the sample size K. The ABC intervals (DiCiccio and Efron 1992) are convenient secondorder-accurate intervals for the calculations that follow, though other schemes could be used. g( The confidence expectation of tYt= j a[a] given x is defined as doz. (53) This is the expectation of the confidence density, the density obtained by considering the confidence limits to be quantiles; for example, that n exists in ( [.z90],K[. 91) with probability .01. Efron (1993) shows that the confidence density derived from the ABC limits matches to second order the Bayes a posteriori distribution of y(,q) given x, starting from an appropriate uninformative prior densityh(a), the Welch-Peers density. The confidence expectationoft can be thought of as an adjusted version of the MLE [ =-d(t) that better matches tc of the a posteriori expectation given x, starting from an uninformative distribution for q. We will apply this to s = ot4(00) for a fixed value of 00, producing the confidence expectation gtA (0') according to definition (53). This th the confidence otroiexpectation of'ygvnx startin of bias correction frofor is expectation method BC is ityd(e) The appeal of (0l) that it is a direct approximafor Theo itral fixeld valu ofpl approduingatheonmfition to the induced priorgx(iso), (12), starting from a vague isthe confidence expectation mehd5fbiscrrcio:o Welch-Peers is the a posteriori prior gxt(y) h(t), because g11(Oo). The apelo_ +~C(O stat it i a direct appoxma 1-1(1.5/20), ... ., -1 (19.5/20)), (55) with the constant co chosen to make 20 x/20 = 2. Then (M, A), (19), equals (0, 1), so that the MLE prior gi,(00) is the standard normal density (1/v2ir)exp(-.502). This is shown as the starred curve on the left half of Figure 5. The Bayes curve in Figure 5 is the induced prior gx (0o) obtained from the uninformative hyperior dMdA/(A + 1), A > 0. It has heavier tails than the MLE g4(0o). The confidence expectation density gt (00), obtained from (54) using abcnon, nicely matches the Bayes curve. Another bias correction technique, called cdf correction here, appears in Figure 5. This is a bootstrap method proposed by Carlin and Gelfand (1990), following suggestions by Cox (1975) and Efron (1987). Cdf correction is a calibration technique introduced to achieve what Morris (1983) calls the empirical Bayes confidence property: for example, that a 95% empirical Bayes interval for 00 will contain 00 95% of the time when averaging over the random selection of 00 and (x0, x) in Figure 2. This is a somewhat different criterion than bias correcting g4 (0o) to match gx(0o), and in fact cdf correction does not match the Bayes curve very well in Figure 5. Details of these calculations were provided in earlier work (Efron 1994, sec. 4), along with another more favorable example of cdf correction. Laird and Louis' (1987) Bayesian bootstrap technique was also briefly discussed in that article, where it performed the poorest of the three methods. On the basis of admittedly limited experience, the author slightly prefers the confidence expectation method to cdf correction, and strongly prefers either of these to the Bayesian bootstrap. The confidence expectation method of bias correction is applied to the ulcer data in Figure 6, again using abcnon. Figure 6a shows that gtA(0) has a substantially heavier left tail than the uncorrected MLE prior g- (0), but there is not much difference in the right tail. Figure 6b compares the posterior MLE density pt7, (16), for populations 8 and 13 with the corresponding posterior ptAreplacing gt7(0) with 547 Efron: Empirical Bayes Methods for Combining Likelihoods II -5 ci-i g 4-i p'(O I lx~~~~~~~~~Welx3 and L' I R (a) -10 p~~~~~~~~~~~(O~~131x13) I , , -5 , I - 0 (b - - 0 -5 5-10 5 (b) (a) Figure 6. Application of Bias Correction to the Ulcer Data. (a) The solid line is the quadratic MLEgi) from Figure 3; the dotted line is confidence expectation density gt. It has a noticeably heavier left tail. Dashed lines indicate likelihoods for populations 8 and 13. (b) Solid lines show MLE aposteriori density for population 8 and population 13; dotted lines show a posteriori densities replacing g24 with gt. The bias correction makes a substantial difference for population 8, but not for population 13. gt (0). Bias correction has a large effect on the inference for population 8, but not for population 13. The cdf correction method was also applied to the ulcer data, producing a bias corrected prior density with tails moderately heavier than gi (0). Only minor changes resulted in the posterior densities for populations 8 and 13. Remark E. We could bias correct the posterior density = cg,(Oo)Lo(Oo) rather than the prior density pi)(0OOXO) gj1(0o).This is the route taken in all of the references. An advantage of the tactic here, bias correcting gi (00), is that it need be done only once rather than separately for each Ok. In the example of this article, it was numerically more stable than bias correcting pi (Ooxo) in cases where Lo(0o) was far removed from gj1(00). 6. on an unknown parameter vector r7, gn(6) = et/'t(6)>-0(W)go(6). As in (25), we define I,(h) = f h(6)e17't(6)go(6) d6 for any function h(6), the integral being over the range of 6. We observe the likelihood functions + O = zk ?k, as in (8), and we wish to estimate (R, 3). Define Lk(6k)- where the fixed effects Vk depend on known q-dimensional covariate vectors Ckand an unknown q-dimensional parameter vector 3, Vk = C3. (57) The random effects 8k are assumed to be iid draws from an exponential family like (23), with density g, (6) depending LXk(vk (60) + 6k), thought of as a function of 6k, with Vk and Xk held fixed. It is easy to verify that the marginal density of Xk is dn,(xk) (56) (59) = Cfok (Xk) Lxk (0k) REGRESSION AND EMPIRICALBAYES The individual parameters Ok in Figure 2 can be linked to each other by a regression equation, as well as having an empirical Bayes relationship. It is easy to incorporate regression structure into the empirical Bayes methodology of the preceding sections. Doing so results in a very general version of the classical mixed model. The random-effects portion of the mixed model is used to explain the overdispersion in the binomial regression example that follows. There is a large literature concerning normal theory random effects in generalized linear models (see Breslow and Clayton 1993). We write 0k as the sum of fixed and random effects, (58) (61) = cIh(Lk)/hIn (1), as in (28). Then (31a,b) follow, giving the maximum likelihood equation with respect to r7: o ,a logd,(x) a?1 k [h(tLk) I7 IK(t)i [In(Lk) I77(1) Letting Lk-k a DVk Lxk?&k) 6k k k and 2 Lk Lk (63) thought of as functions of Vk, with 6k and Xk held fixed, the maximum likelihood equation with respect to 3 is o = a9 logd, O(x) = a'3 ZCkI?(Lk)/I?(Lk)) (64) ~~~~k Together, (62) and (64) determine the maximum likelihood estimates (i, /), with their solutions being found by 548 Journal of the American Statistical Association, June 1996 Table 2. The Toxoplasmosis Data k r s n k r s n k r s n k r s n 1 2 3 4 5 6 7 8 9 1,620 1,650 1,650 1,735 1,750 1,750 1,756 1,770 1,770 5 15 0 2 2 2 2 6 33 18 30 1 4 2 8 12 11 54 10 11 12 13 14 15 16 17 18 1,780 1,796 1,800 1,800 1,830 1,834 1,871 1,890 1,900 8 41 3 8 0 53 7 24 3 13 77 5 10 1 75 16 51 10 19 20 21 22 23 24 25 26 27 1,918 1,920 1,920 1,936 1,973 1,976 2,000 2,000 2,050 23 3 0 3 3 1 1 0 7 43 6 1 10 10 6 5 1 24 28 29 30 31 32 33 34 2,063 2,077 2,100 2,200 2,240 2,250 2,292 46 7 9 4 4 8 23 82 19 13 22 9 11 37 356 697 Total: NOTE: Number of subjects testing positive (s), number tested (n), and annual rainfall in mm. (r), 34 cities in El Salvador. Newton-Raphsoniteration.The p x p secondderivativematrix -(02/0ri2)log _(02 is given by (33). The q x q matrix dc,(x) /&/32)log dc,p3(x) iS F,~~~~~~~~~~ ( 1i(-Lk) I_II(Lk) ) 02 0932 logd,7,(x) EZCk 0/32 In (Lk) n (Lk 1 k (65) and the p x q off-diagonalsecondderivativematrixis a-1og log dn, (x) z I_h(tLk) [h,(tLk) 1h,(Lk) Ck. The solid curvein the left panel of Figure7 is the MLE cubic logistic regression,transformedback to the probability scale 7r= 1/(1 + e-9). The cubic model is quite significantbut it fails to accountfor nearlyhalf of the city-to-city variability:the deviancesfrom the cubic fit are abouttwice as big as they shouldbe undermodel (67) and (68). This overdispersioncan be explained,or at least described,in terms of the randomeffects in model (56)-(58). Roughly speaking,the varianceof g9,(6)accountsfor abouthalf of the total deviance of the data points from the regression curve. The mixed model (56) was appliedto the toxoplasmosis data,with vk (66) =3o + /3lrk + 32rk + /3r 3 (69) using the quadraticmodel t(6) - (8,62)1 in (58). As a first Table2 shows the data from a disease prevalencestudy step, the vk in (69) were fit to the observedproportions in 34 cities andEl Salvador(Efron1986). In city k, Sk out Pk = sk/nk by ordinarylogistic regression,giving an initial of nk subjectstestedpositivefor toxoplasmosisantibodies, estimate /3(1). Notice that this amounts to assuming that the k = 1, 2,.. ., 34. We assume a binomial sampling model, all equalzero;thatis, g,(6) puts all of its probabilityon 8 = 0. The initial fitted values v(l) determine likelihoods (k = + ( I Bi(nk, rk), , Sk = Lk (k) LXk(Zk(') + &k). The carrierdensity go(6) used in (58) was ZkLk/34, with 55Lk(k) d8k =1. where Ok is the logit of the true positive rate irk in city Equation(62) was then solved to producean initialguess k. A cubic logistic regressionin rainfallwas suggestedby 14(1) for rj. The solid curve in the right panel of Figure 7 is previousanalyses: 8k rV gi)(1).Solving (64), with rq /3lrk + /32r2 + 33r3. (68) p(2) gives a second estimate r(1), - k = 10 + for 3, and so on, iteratingtowardthe MLE (1, /). 0~~~~~~~~~~~~~~~~ + o) 160 1600 0 20ri 180/0 1800 2000 (a) 2200 + rain-I -2-(0 t20 -2 -1 0 1 2 (b) Figure 7. Model (56) Applied to the Toxoplasmosis Data. (a): Rainfall versus proportion positive Pk Sk/lnk. The solid line is the MLE cubic logistic regression model, (69); the dashed line is the second iterative estimate. (b) The quadratic model estimates for gq; the solid line is the first estimate, and the dashed curve is the second estimate, nearly the same. 549 Efron: Empirical Bayes Methods for Combining Likelihoods Figure8 is the convolution 9 = 9(1) EDN(.209,.0972). (72) The It has standarddeviation.53 comparedto .43 for 9 ad hoc estimator(72) is reasonablehere, but can be improvedby taking into accountnonnormalityin the likelihood for v7. The solid curve in Figure 8 is an empiricalBayes posterior density for 07, likelihoodbased on .90 intervalis 07 E obtained by multiplying the binomial n7) = (2, 12) by 3. The resulting (S7, (-1.54, -.13) or 77 - (.18,.47), (73) which,for comparison,does not containthe usualbinomial point estimate7r7 2/12 = .17. 7. -6 4 -2 0 2 4 6 Figure 8. Bayes Analysis of the Toxoplasmosis Rate in City 7. Stars indicate binomial likelihood based on (S7, n7) = (2, 12); the dashed line is prior density g, (72), incorporating random effect plus variability in estimation of fixed effect V7; the solid line is the a posteriori density for 07. Dots indicate gi)(i), the prior density not including variabilityin V7. In fact, there w as never much change from qn1,z31) The dashed lines in Figure 7 show the second estimate, the latternearlyequallinggi)(i). 1/(1 + e-rk ) andg9t(2), instead of t(6) - (6, 62)' did t Using (6) (6,62,63)' not significantly improve the fit to the data. The bias corwere small rections Section 5otine ignored in enough to be acltosa yjckie Th vluof .97wa the carrier of choices case. Smoother this density go had tnaderrta h sa a lotdul in~* (3)In ahdcre3i h in Figure 7 thtmdl(9 assumes~ shown All~ in all, the'scret effect. little density gn(,) is a reasonable estimate of the random component in the mixed model (6.1). This density has expectation and standard deviation -. 13 1t.0 and .43 ? .17, (70) as in (38). These conclusions about the 6 distribution can be thought of as an overdispersion analysis of the toxoplasmosis data. The methods used here apply quite generally, though in this case the results are nearly the same as those obtained for the normal theory model of Anderson and Aitkin (1985) or of Breslow and Clayton (1993). Suppose that we wish to estimate the toxoplasmosis rate in any one city; for instance, city 7. This is similar to the empiricalBayes estimationof 08 in Figure 6, except that now an empiricalBayes confidenceintervalfor 07 = ZJ7 ' needs to include the variability in regression estimate 3(1 gave 67 1v7= .209 ?E.097. . The cubic logistic (71) SUMMARY This article proposes an empiricalBayes methodology for the practicalsolution of hierarchicalBayes problems. The emphasisis on computer-basedanalyses that do not requirespeciallychosen mathematicalforms for their success. Thereare four main aspectsto our approach: 1. The use of likelihoodsLk (0k) ratherthanthe Xkthemselves, as inputsto the empiricalBayes algorithm.This allows the componentdatasets xk to be complicatedin their own right,each perhapsinvolvingmany nuisanceparameters in additionto the parameterof interestOk. In the ulcer data example of Table 1, each Xk involves two or three nuisanceparameters,havingto do with the marginal2 x 2 probabilities.The conditionallikelihood(5) removesthese. Earlierwork (Efron 1993) shows how to get approximate likelihoodslike (5), with all nuisanceparametersremoved, in quite generalsituations. 2. The use of specially designed exponentialfamilies of prior densities, (23), ratherthan conjugateor normal priors. Modern computertechnology,combinedwith the fact that the componentlikelihoodsLk(Ok) are each onedimensional,makesthis numericallypractical.Besides being more flexible, the specially designed families allow model building and model checking in the spirit of a regressionanalysis,as in Figure 3. 3. Allowing an individualcomponentproblem to opt out of the empiricalBayes scheme if the relevanceof the "other"dataappearssufficientlydoubtful.This avoidsovershrinkageof particularlyinterestingresults,an unpleasant Bayes analyses(see featureof most empirical/hierarchical Efron 1981, sec. 8). The relevancecalculationsof Section4 robustifythe empiricalBayes estimationprocess in a way thatallowsgenuinelyunusualcases to standby themselves. Anotherway to do this would be by insisting on heavier trialsin the priorfamily (23), but the schemehere is easier to implementand to interpret. 4. A general nonparametricapproachto bias correcting the MLE priordensity g27.The confidenceexpectation method of Section 5 fits in well with the goal of emulating a full hierarchicalBayes analysis for Figure 2, under the usual assumptionthat the hyperprioris uninformative. 550 Journal of the American Statistical Association, June 1996 Earlier work (Efron 1994) also presents a jackknife implementation of Carlin and Gelfand's cdf correction method that is computationally competitive with the confidence expectation approach. Neither method is yet on a solid theoretical basis, but the specific cases considered here are encouraging. This methodology is capable of providing empirical Bayes analyses for much more complicated situations than the ulcer or toxoplasmosis examples. For instance, 0k might be the proportion of HIV-positive 30-year-old white females in city k, while xk is an extensive epidemiological survey of that city. A reasonable goal of this work is to allow efficient and practical meta-analyses of arbitrarilycomplicated parallel studies. [Received November 1993. Revised August 1995.] REFERENCES Anderson, D., and Aitken, M. (1985), "VarianceComponent Models With Binary Response: Interviewer Variability,"Journal of the Royal Statistical Society, Ser. B, 47, 203-210. Barndorff-Nielsen, 0. (1986), "Inference on Full or Partial Parameters Based on the Standardized Log Likelihood Ratio," Biometrika, 73, 307322. Breslow, N., and Clayton, D. (1993), "Approximate Inference in Generalized Linear Mixed Models," Journal of the American Statistical Association, 88, 9-25. Brown, L. (1984), Fundamentals of Statistical Exponential Families, Monograph 9, Institute of Mathematical Statistics Lecture Notes, Hayward, CA: Institute of Mathematical Statistics. Carlin, B., and Gelfand, A. (1990), "Approachesfor Empirical Bayes Confidence Intervals," Journal of the American Statistical Association, 85, 105-114. (1991), "A Sample Reuse Method for Account Parametric Empirical Bayes Confidence Intervals,"Journal of the Royal Statistical Society, Ser. B, 53, 189-200. Cox, D. R. (1975), "Prediction Intervals and Empirical Bayes Confidence Intervals,"in Perspectives in Probability and Statistics, ed. J. Gani, New York: Academic Press, pp. 47-55. Cox, D. R., and Reid, N. (1987), "ParameterOrthogonality and Approximate Conditional Inference" (with discussion), Journal of the Royal Statistical Society, Ser. B, 49, 1-39. DiCiccio, T., and Efron, B. (1992), "More Accurate Confidence Intervals in Exponential Families," Biometrika, 79, 231-245. Efron, B. (1982), "Maximum Likelihood and Decision Theory,"TheAnnals of Statistics, 10, 340-356. (1982), "The Jackknife, the Bootstrap, and Other Resampling Plans," SIAM CBMS-NSF Monograph 38. (1986), "Double-Exponential Families and Their Use in Generalized Linear Regression," Journal of the American Statistical Association, 81, 709-721. (1987), Comment on "EmpiricalBayes Confidence Intervals Based on Bootstrap Samples" by N. Laird and T. Louis, Journal of the American Statistical Association, 82, 754. (1993), "Bayes and Likelihood Calculations From Confidence Intervals," Biometrika, 80, 3-26. (1994), "Empirical Bayes Methods for Combining Likelihoods," Technical Report 166, Stanford University. Efron, B., and Morris, C. (1971), "Limiting the Risk of Bayes and Empirical Bayes Estimators-Part I: The Bayes Case," Journal of the American Statistical Association, 66, 807-815 (1971). (1972), "Limiting the Risk of Bayes and Empirical Bayes Estimators-Part II: The Empirical Bayes Case," Journal of the American Statistical Association, 67, 130-139. (1973), "Stein's Estimation Rule and Its Competitors-and Empirical Bayes Approach," Journal of the American Statistical Association, 68, 117. Efron, B., and Tibshirani, R. (1993), An Introduction to the Bootstrap, New York: Chapman and Hall. Laird, N., and Louis, T. (1987), "Empirical Bayes Confidence Intervals Based on Bootstrap Samples" (with discussion), Journal of the American Statistical Association, 82, 739-750. Lehmann, E. (1959), TestingStatistical Hypotheses, New York:John Wiley. (1983), Theory of Point Estimation, New York: John Wiley. Morris, C. (1983), "Parametric Empirical Bayes Inference: Theory and Applications," Journal of the American Statistical Association, 78, 4759. Sacks, H. S., Chalmers, T. C., Blum, A. L., Berrier, J., and Pagano, D. (1990), "Endoscopic Hemostasis, an Effective Therapy for Bleeding Peptic Ulcers," Journal of the American Medical Association, 264, 494499. Tanner, M. (1991), Tools for Statistical Inference-Observed Data and Data Augmentation, New York: Springer-Verlag.

Empirical Bayes Methods for Combining Likelihoods

Related documents

Products

Support

Empirical Bayes Methods for Combining Likelihoods

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib