Georgian Mathematical Journal
Volume 10 (2003), Number 2, 319–324

WEAK CONVERGENCE OF A DIRICHLET-MULTINOMIAL PROCESS

PIETRO MULIERE AND PIERCESARE SECCHI

Abstract. We present a random probability distribution which approximates, in the sense of weak convergence, the Dirichlet process and supports a Bayesian resampling plan called a proper Bayesian bootstrap.

2000 Mathematics Subject Classification: 62G09, 60B10.
Key words and phrases: Dirichlet process, proper Bayesian bootstrap.

1. Introduction

The purpose of this paper is to throw light on a random probability distribution called the Dirichlet-multinomial process that approximates, in the sense of weak convergence, the Dirichlet process. A Dirichlet-multinomial process is a particular mixture of Dirichlet processes: in two previous works [11, 12] we showed that the process supports a Bayesian resampling plan, which we called a proper Bayesian bootstrap, suitable for approximating the distribution of functionals of the Dirichlet process and therefore of interest in the context of Bayesian nonparametric inference.

Under different names, variants of the Dirichlet-multinomial model have recently been considered by other authors: see, for instance, [7] and the references therein. In fact, it has been pointed out that the Dirichlet-multinomial model is equivalent to Fisher's species sampling model [5], recently reconsidered by Pitman among those extending the Blackwell and MacQueen urn scheme [13]. However, none of these works alludes to a connection between the Dirichlet-multinomial model and Bayesian bootstrap resampling plans. Recent applications of our proper Bayesian bootstrap include those in [3] for the approximation of the posterior distribution of the overflow rate in discrete-time queueing models.

In Section 2 we define the Dirichlet-multinomial process and we show that it can be used to approximate a Dirichlet process.
Section 3 is dedicated to the proper Bayesian bootstrap algorithm and its connections with the Dirichlet-multinomial process.

© Heldermann Verlag www.heldermann.de, ISSN 1072-947X / $8.00

2. A Convergence Result

Let \(\mathcal{P}\) be the class of probability measures defined on the Borel \(\sigma\)-field \(\mathcal{B}\) of \(\mathbb{R}\); for simplicity we work with \(\mathbb{R}\), but all the arguments below still hold if \(\mathbb{R}\) is replaced by a separable metric space. Endow \(\mathcal{P}\) with the topology of weak convergence and write \(\sigma(\mathcal{P})\) for the Borel \(\sigma\)-field in \(\mathcal{P}\). With these assumptions \(\mathcal{P}\) becomes a separable and complete metric space [14].

A useful random probability measure \(P \in \mathcal{P}\) is the Dirichlet process introduced by Ferguson [4]. When \(\alpha\) is a finite, nonnegative, nonnull measure on \((\mathbb{R}, \mathcal{B})\) and \(P\) is a Dirichlet process with parameter \(\alpha\), we write \(P \in \mathcal{D}(\alpha)\).

We want to define a random element of \(\mathcal{P}\) that is a mixture of Dirichlet processes; according to [1] we thus need to specify a transition measure and a mixing distribution.

Given \(w > 0\), let \(\alpha_w : \mathcal{P} \times \mathcal{B} \to [0, +\infty)\) be defined by setting, for every \(P \in \mathcal{P}\) and \(B \in \mathcal{B}\),
\[ \alpha_w(P, B) = w P(B). \]
The function \(\alpha_w\) is a transition measure. Indeed, for every \(P \in \mathcal{P}\), \(\alpha_w(P, \cdot)\) is a finite, nonnegative and nonnull measure on \((\mathbb{R}, \mathcal{B})\) whereas, for every \(B \in \mathcal{B}\), \(\alpha_w(\cdot, B)\) is measurable on \((\mathcal{P}, \sigma(\mathcal{P}))\), since \(\sigma(\mathcal{P})\) is the smallest \(\sigma\)-field in \(\mathcal{P}\) such that the function \(P \mapsto P(B)\) is measurable for every \(B \in \mathcal{B}\).

Given a probability distribution \(P_0\), let \(X_1^*, \dots, X_m^*\) be an i.i.d. sample of size \(m > 0\) from \(P_0\). Let \(P_m^* \in \mathcal{P}\) be the empirical distribution of \(X_1^*, \dots, X_m^*\), defined by
\[ P_m^* = \frac{1}{m} \sum_{i=1}^{m} \delta_{X_i^*}, \]
where \(\delta_x\) denotes the point mass at \(x\). Write \(H_m^*\) for the distribution of \(P_m^*\) on \((\mathcal{P}, \sigma(\mathcal{P}))\).

Roughly, the following definition introduces a process \(P\) such that, conditionally on \(P_m^*\), \(P \in \mathcal{D}(w P_m^*)\).

Definition 2.1.
A random element \(P \in \mathcal{P}\) is called a Dirichlet-multinomial process with parameters \((m, w, P_0)\) (\(P \in \mathcal{DM}(m, w, P_0)\)) if it is a mixture of Dirichlet processes on \((\mathbb{R}, \mathcal{B})\) with mixing distribution \(H_m^*\) and transition measure \(\alpha_w\).

Remark 2.2. We call the process \(P\) defined above Dirichlet-multinomial since, as will be seen in a moment, given any finite measurable partition \(B_1, \dots, B_k\) of \(\mathbb{R}\), the distribution of \((P(B_1), \dots, P(B_k))\) is a mixture of Dirichlet distributions with multinomial weights. This process must not be confused with the Dirichlet-multinomial point process of Lo [9, 10], whose marginal distributions are mixtures of multinomial distributions with Dirichlet weights.

It follows from the definition that if \(P \in \mathcal{DM}(m, w, P_0)\), then for every finite measurable partition \(B_1, \dots, B_k\) of \(\mathbb{R}\) and \((y_1, \dots, y_k) \in \mathbb{R}^k\),
\[ \Pr\big(P(B_1) \le y_1, \dots, P(B_k) \le y_k\big) = \int_{\mathcal{P}} D\big(y_1, \dots, y_k \mid \alpha_w(u, B_1), \dots, \alpha_w(u, B_k)\big) \, dH_m^*(u), \]
where \(D(y_1, \dots, y_k \mid \alpha_1, \dots, \alpha_k)\) denotes the Dirichlet distribution function with parameters \((\alpha_1, \dots, \alpha_k)\) evaluated at \((y_1, \dots, y_k)\). With different notation, we may say that the vector \((P(B_1), \dots, P(B_k))\) has distribution
\[ \mathrm{Dirichlet}\Big(w \frac{M_1}{m}, \dots, w \frac{M_k}{m}\Big) \bigwedge_{(M_1, \dots, M_k)} \mathrm{multinomial}\big(m, (P_0(B_1), \dots, P_0(B_k))\big); \]
i.e., a mixture of Dirichlet distributions with multinomial weights.

For our purposes, the introduction of the Dirichlet-multinomial process is justified by the following theorem.

Theorem 2.3. For every \(m > 0\), let \(P_m \in \mathcal{P}\) be a Dirichlet-multinomial process with parameters \((m, w, P_0)\). Then, when \(m \to \infty\), \(P_m\) converges in distribution to a Dirichlet process with parameter \(w P_0\).

The result appears in [11] as well as in [13]. See also [8]. For ease of reference we sketch a simple argument, inspired by [16], that we consider a nice didactic illustration of Prohorov's Theorem.

Proof. Given any finite measurable partition \(B_1, \dots, B_k\) of \(\mathbb{R}\), the distribution of the vector \((P_m(B_1), \dots, P_m(B_k))\) converges weakly to a Dirichlet distribution with parameters \((w P_0(B_1), \dots, w P_0(B_k))\) when \(m \to \infty\). In order to prove that \(P_m\) converges weakly to a Dirichlet process with parameter \(w P_0\) it is therefore enough to show that the sequence of measures induced on \((\mathcal{P}, \sigma(\mathcal{P}))\) by the processes \(P_m\), \(m = 1, 2, \dots\), is tight.

Given \(\varepsilon > 0\), let \(K_r\), \(r = 1, 2, \dots\), be a compact set of \(\mathbb{R}\) such that \(P_0(K_r^c) \le \varepsilon / r^3\) and define
\[ M_r = \Big\{ P \in \mathcal{P} : P(K_r^c) \le \frac{1}{r} \Big\}. \]
The set \(M = \bigcap_{r=1}^{\infty} M_r\) is compact in \(\mathcal{P}\). For \(m = 1, 2, \dots\) and \(r = 1, 2, \dots\), \(E[P_m(K_r^c)] = P_0(K_r^c)\) and thus, by Markov's inequality,
\[ \Pr\Big(P_m(K_r^c) > \frac{1}{r}\Big) \le r P_0(K_r^c) \le \frac{\varepsilon}{r^2}. \]
Hence, for every \(m = 1, 2, \dots\),
\[ \Pr(P_m \in M) \ge 1 - \sum_{r=1}^{\infty} \Pr\Big(P_m(K_r^c) > \frac{1}{r}\Big) \ge 1 - \varepsilon \sum_{r=1}^{\infty} \frac{1}{r^2}. \qquad \square \]

3. Connections with the Proper Bayesian Bootstrap

Let \(T : \mathcal{P} \to \mathbb{R}\) be a measurable function and \(P \in \mathcal{D}(w P_0)\) with \(w > 0\), \(P_0 \in \mathcal{P}\). It is often difficult to work out analytically the distribution of \(T(P)\) even when \(T\) is a simple statistical functional like the mean [6, 2]. However, when \(P_0\) is discrete with finite support one may produce a reasonable approximation of the distribution of \(T(P)\) by a Monte Carlo procedure that obtains i.i.d. samples from \(\mathcal{D}(w P_0)\). If \(P_0\) is not discrete, we propose to approximate the distribution of \(T(P)\) by the distribution of \(T(P_m)\), where \(P_m\) is a Dirichlet-multinomial process with parameters \((m, w, P_0)\) and \(m\) is large enough.

Of course, since the Continuous Mapping Theorem does not apply to every function \(T\), the fact that \(P_m\) converges in distribution to \(P\) does not always imply that the distribution of \(T(P_m)\) is close to that of \(T(P)\). However, we proved in [12] that this is in fact the case when \(T\) belongs to a large class of linear functionals or when \(T\) is a quantile.
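The approximation of the distribution of T(P) by that of T(P_m) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation, assuming NumPy is available. It relies on the definition of Section 2: conditionally on the sample from P_0, the process is Dirichlet with parameter w P_m^*, so it puts Dirichlet(w/m, ..., w/m) weights on the sampled atoms, and these weights are built here by normalizing independent Gamma(w/m, 1) variables. The function names, the uniform choice of P_0, the mean functional, and the values of m, w and the number of draws are all illustrative assumptions.

```python
import numpy as np


def dm_weights(m, w, rng):
    """Weights of P_m on its m atoms: a Dirichlet(w/m, ..., w/m) vector,
    obtained by normalizing independent Gamma(w/m, 1) variables."""
    gammas = rng.gamma(shape=w / m, scale=1.0, size=m)
    return gammas / gammas.sum()


def mean_functional(atoms, weights):
    """T(P) = integral of x dP(x) for a discrete P with given atoms and weights."""
    return float(np.dot(weights, atoms))


def sample_functional(sample_p0, functional, m, w, size, rng):
    """Return `size` independent draws of T(P_m), with P_m in DM(m, w, P0).

    `sample_p0(n, rng)` is a user-supplied (hypothetical) sampler that
    returns n i.i.d. values from P0.
    """
    draws = np.empty(size)
    for j in range(size):
        atoms = sample_p0(m, rng)        # i.i.d. sample from P0
        weights = dm_weights(m, w, rng)  # conditional Dirichlet weights
        draws[j] = functional(atoms, weights)
    return draws


# Illustrative run: P0 uniform on (0, 1), T the mean (assumed values).
rng = np.random.default_rng(0)
draws = sample_functional(
    sample_p0=lambda n, r: r.uniform(0.0, 1.0, size=n),
    functional=mean_functional,
    m=200,
    w=5.0,
    size=2000,
    rng=rng,
)
print(draws.mean(), draws.std())
```

The empirical distribution of `draws` then serves as a Monte Carlo approximation of the distribution of T(P_m); any other functional of the atoms and weights can be passed in place of `mean_functional`.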
In [12] we also tested by means of a few numerical examples a bootstrap algorithm that generates an approximation of the distribution of \(T(P)\) in the following steps:

(1) Generate an i.i.d. sample \(X_1^*, \dots, X_m^*\) from \(P_0\).
(2) Generate an i.i.d. sample \(V_1, \dots, V_m\) from a \(\mathrm{Gamma}(w/m, 1)\).
(3) Compute \(T(P_m)\), where \(P_m \in \mathcal{P}\) is defined by
\[ P_m = \frac{1}{\sum_{i=1}^{m} V_i} \sum_{i=1}^{m} V_i \, \delta_{X_i^*}. \]
(4) Repeat steps (1)–(3) \(s\) times and approximate the distribution of \(T(P)\) with the empirical distribution of the values \(T_1, \dots, T_s\) generated at step (3).

It is easily seen that the probability distribution \(P_m\) generated in step (3) is in fact a trajectory of the Dirichlet-multinomial process with parameters \((m, w, P_0)\). We may therefore conclude that the previous algorithm aims at approximating the distribution of \(T(P)\) by the distribution of \(T(P_m)\), where \(P_m \in \mathcal{DM}(m, w, P_0)\), and approximates the latter by means of the empirical distribution of the values \(T_1, \dots, T_s\) generated in step (3).

Remark 3.1. Step (1) is unnecessary when \(P_0\) is discrete with finite support \(\{z_1, \dots, z_m\}\) and \(P_0(z_i) = p_i\), \(i = 1, \dots, m\), with \(\sum_{i=1}^{m} p_i = 1\). In fact, in this case one may generate at step (3) a trajectory of \(P \in \mathcal{D}(w P_0)\) by taking
\[ P_m = \frac{1}{\sum_{i=1}^{m} V_i} \sum_{i=1}^{m} V_i \, \delta_{z_i}, \]
where \(V_1, \dots, V_m\) are independent and \(V_i\) has distribution \(\mathrm{Gamma}(w p_i, 1)\), \(i = 1, \dots, m\).

We call the algorithm (1)–(4) the proper Bayesian bootstrap. To understand the reason for this name consider the following situation. A sample \(X_1, \dots, X_n\) from a process \(P \in \mathcal{D}(k Q_0)\), with \(k > 0\) and \(Q_0 \in \mathcal{P}\), has been observed and the problem is to compute the posterior distribution of \(T(P)\), where \(T\) is a given statistical functional. Ferguson [4] proved that the posterior distribution of \(P\) is again a Dirichlet process, with parameter \(k Q_0 + \sum_{i=1}^{n} \delta_{X_i}\). In order to approximate the posterior distribution of \(T(P)\) our algorithm generates an i.i.d. sample \(X_1^*, \dots, X_m^*\) from
\[ \frac{k}{k+n} Q_0 + \frac{n}{k+n} \Big( \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i} \Big) \]
and then, in step (3), produces a trajectory of a process that, given \(X_1^*, \dots, X_m^*\), is Dirichlet with parameter \((k+n) m^{-1} \sum_{i=1}^{m} \delta_{X_i^*}\) and evaluates \(T\) with respect to this trajectory.

The algorithm is therefore a bootstrap procedure, since it samples from a mixture of the empirical distribution function generated by \(X_1, \dots, X_n\) and \(Q_0\) which, together with the weight \(k\), elicits the prior opinions relative to \(P\). Because it takes into account prior opinions by means of a proper distribution function, the procedure was termed proper. The name proper Bayesian bootstrap also distinguishes the algorithm from the Bayesian bootstrap of Rubin [15], which approximates the posterior distribution of \(T(P)\) by means of the distribution of \(T(Q)\) with \(Q \in \mathcal{D}\big(\sum_{i=1}^{n} \delta_{X_i}\big)\). We already noticed in the previous work [12] that there are no proper priors for \(P\) which support Rubin's approximation and that the proper Bayesian bootstrap essentially becomes the Bayesian bootstrap of Rubin when \(k\) tends to 0 or \(n\) is very large.

Acknowledgements

We thank an anonymous referee for his helpful comments that greatly improved the paper. Both authors were financially supported by the Italian MIUR's research project Metodi Bayesiani nonparametrici e loro applicazioni.

References

1. C. Antoniak, Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist. 2(1974), 1152–1174.
2. D. M. Cifarelli and E. Regazzini, Distribution functions of means of Dirichlet process. Ann. Statist. 18(1990), No. 1, 429–442.
3. P. L. Conti, Bootstrap approximations for Bayesian analysis of Geo/G/1 discrete time queueing models. J. Statist. Plann. Inference, 2002 (in press).
4. T. S. Ferguson, A Bayesian analysis of some nonparametric problems. Ann. Statist. 1(1973), No. 2, 209–230.
5. R. A. Fisher, A. S. Corbet, and C. B.
Williams, The relation between the number of species and the number of individuals in a random sample of animal population. J. Animal Ecology 12(1943), 42–58.
6. R. C. Hannum, M. Hollander, and N. A. Langberg, Distributional results for random functionals of a Dirichlet process. Ann. Prob. 9(1981), 665–670.
7. H. Ishwaran and L. F. James, Gibbs sampling methods for stick-breaking priors. J. Amer. Statist. Assoc. 96(2001), 161–173.
8. H. Ishwaran and M. Zarepour, Exact and approximate sum-representations for the Dirichlet process. Canad. J. Statist. 30(2002), 269–283.
9. A. Y. Lo, Bayesian statistical inference for sampling a finite population. Ann. Statist. 14(1986), No. 3, 1226–1233.
10. A. Y. Lo, A Bayesian bootstrap for a finite population. Ann. Statist. 16(1988), No. 4, 1684–1695.
11. P. Muliere and P. Secchi, A note on a proper Bayesian bootstrap. Technical Report #18, Dip. di Economia Politica e Metodi Quantitativi, Università di Pavia, 1995.
12. P. Muliere and P. Secchi, Bayesian nonparametric predictive inference and bootstrap techniques. Ann. Inst. Statist. Math. 48(1996), No. 4, 663–673.
13. J. Pitman, Some developments of the Blackwell–MacQueen urn scheme. Statistics, probability and game theory, 245–267, IMS Lecture Notes Monogr. Ser., 30, Inst. Math. Statist., Hayward, CA, 1996.
14. Yu. V. Prohorov, Convergence of random processes and limit theorems in probability theory. Theory Probab. Appl. 1(1956), 157–214.
15. D. B. Rubin, The Bayesian bootstrap. Ann. Statist. 9(1981), No. 1, 130–134.
16. J. Sethuraman and R. C. Tiwari, Convergence of Dirichlet measures and the interpretation of their parameter. Statistical decision theory and related topics, III, Vol. 2 (West Lafayette, Ind., 1981), 305–315, Academic Press, New York–London, 1982.

(Received 3.10.2002; revised 23.01.2003)

Authors' addresses:

Pietro Muliere
Università L. Bocconi
Istituto di Metodi Quantitativi
Viale Isonzo 25, 20135 Milano
Italy
E-mail: pietro.muliere@uni-bocconi.it

Piercesare Secchi
Politecnico di Milano
Dipartimento di Matematica
Piazza Leonardo da Vinci 32, 20132 Milano
Italy
E-mail: secchi@mate.polimi.it