September 14-16, S.Co. 2009, Politecnico di Milano, Italy A generalized sequential construction of exchangeable Gibbs partitions with application Annalisa Cerquetti Collegio Nuovo, Pavia, Italy September 15, 2009 Outline • Motivation • BNP analysis of species sampling problems under Gibbs priors • Can be embedded in Pitman’s framework? • Background • Exchangeable Gibbs Partitions (EGPs) • Central and non-central Generalized Stirling numbers • Chinese Restaurant Process constructions • Contribution • Groups sequential CRP for Gibbs partitions • Embedding BNP results in the construction Motivation Background Contribution BNP analysis under Gibbs priors: can be embedded? Bayesian NP analysis of species sampling problems • Lijoi et al. (2007) Biometrika, 94, 769-786 • Lijoi et al. (2008) Ann. Appl. Probab. 18, 1519-1547 propose BNP analysis under Gibbs priors • given a sample from P a.s. discrete with atoms (Pi ) ∼ PK (ρα , γ) • k distinct species observed with frequencies (n1 , . . . , nk ) they obtain conditional posterior/predictive results for an additional sample w.r.t. • the number of new distinct species • the number of new observations belonging to new species • the random partition induced by the additional sample Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution BNP analysis under Gibbs priors: can be embedded? My question was: Is it possible to embed this analysis in the typical Pitman’s combinatorial framework for exchangeable partitions theory? Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Exchangeable Gibbs partitions Some combinatorial formulas and generalized Stirling numbers Basic CRP construction Exchangeable random partitions. [Kingman, 1975] • a sample (X1 , . . . , Xn , . . . ) from a.s. discrete p.m. P induces a random partition Π = {A1 , . . . , Ak , . . . } of N by i ≈ l ⇔ Xi = Xl • for each restriction to [n], for nj = |Aj |, Pr (Πn = {A1 , . . . , Ak }) = p(n1 , . . . , nk ) for some symmetric p called the exchangeable partition probability function (EPPF) of Π. Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Exchangeable Gibbs partitions Some combinatorial formulas and generalized Stirling numbers Basic CRP construction recall Poisson-Kingman models on P1↓ [Pitman, 2003] A large class of models for ERP (priors for BNP) is obtained by generalizing infinite sum construction of the Dirichlet process as follows: • for J1 ≥ J2 ≥ · · · ≥ 0 ranked points of a Poisson prox on (0, ∞) with general mean intensity ρ(x), (with some constraints) • (Pi ) = (Ji /T ) ∼ Poisson-Kingman (ρ) on P1↓ and even a larger class of mixed PK (ρ, γ) models is obtained as Z ∞ PK (ρ|t)γ(dt) PK (ρ, γ) := 0 for general mixing density γ. Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution EPPFs in Gibbs product form Exchangeable Gibbs partitions Some combinatorial formulas and generalized Stirling numbers Basic CRP construction [Gnedin and Pitman, 2005] → an interesting sub-class of PK (ρ, γ) for BNP arise for ρ = ρα the Levy density of the α-stable law for α ∈ (0, 1) → PK (ρα , γ) models are characterized by inducing EPPF in Gibbs form of type α i.e. k Y p(n1 , . . . , nk ) = Vn,k (1 − α)nj −1↑ . j=1 → consistency conditions for Π to be infinite imply the weights (Vn,k ) must satisfy the backward recursion Vn,k = (n − αk)Vn+1,k + Vn+1,k+1 Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Exchangeable Gibbs partitions Some combinatorial formulas and generalized Stirling numbers Basic CRP construction Some known combinatorial formulas For n = 0, 1, 2, . . . , and arbitrary real x and h, (x)n↑h denotes generalized rising factorial (x)n↑h := n−1 Y (x + ih) = hn (x/h)n↑ i=0 for which the following multiplicative law holds (x)n+r ↑h = (x)n↑h (x + nh)r ↑h (1) A binomial formula also holds (x + y )n↑h = n X n (x)k↑h (y )n−k↑h k k=0 Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Exchangeable Gibbs partitions Some combinatorial formulas and generalized Stirling numbers Basic CRP construction as well as a generalized version of the multinomial theorem, p X ( zj )n↑h = p X P nj ≥0, nj =n j=1 Y n! (zj )nj ↑h . n1 ! · · · np ! (2) j=1 An application of (1) yields (zj )nj +mj −1↑ = (zj )mj −1↑ (zj + mj − 1)nj ↑ (3) and by (2) X P nj ≥0, nj =n p p p Y Y X n! (zj )nj +mj −1↑ = (zj )mj −1↑ ( (zj + mj − 1))n↑ = n1 ! · · · np ! j=1 j=1 j=1 = p p Y X (zj )mj −1↑ (m + zj − p)n↑ . j=1 j=1 which agrees with a Lemma in Lijoi et al. (2008) Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Exchangeable Gibbs partitions Some combinatorial formulas and generalized Stirling numbers Basic CRP construction Partial Bell polynomials and generalized Stirling numbers The number of ways to partition [n] into k blocks and assign each block a combinatorial structure, Bn,k (w• ) = n! k! X k Y wn (n1 ,...,nk ) i=1 i ni ! . (4) Generalized Stirling numbers arise as Bn,k (w• ) for particular w• , η,β Sn,k = Bn,k ((β − η)•−1↓η ), where (x)n↓−h = (x)n↑h . In general for arbitrary distinct reals η and β, are the connection coefficients defined by (x)n↓η = n X η,β Sn,k (x)k↓β k=0 Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Exchangeable Gibbs partitions Some combinatorial formulas and generalized Stirling numbers Basic CRP construction −1,−α Hence for η = −1, β = −α, and α ∈ (−∞, 1), Sn,k is defined by (x)n↑1 = n X −1,−α Sn,k (x)k↑α , (5) k=0 and for wni = (1 − α)ni −1↑ and α ∈ [0, 1), equation (4) yields Bn,k ((1 − α)•−1↑ ) = n! k! X k Y (1 − α)n −1↑ i (n1 ,...,nk ) i=1 ni ! −1,−α = Sn,k . (6) i.e. the partial Bell we encounter with Gibbs EPPF of type α. Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Exchangeable Gibbs partitions Some combinatorial formulas and generalized Stirling numbers Basic CRP construction Generalized Stirling vs Generalized Factorial In Lijoi et al. (2007, 2008) the treatment is in terms of generalized α factorial coefficients, Cn,k defined by (αy )n↑1 = n X α Cn,k (y )k↑1 . k=0 From (5), if x = y α then (y α)n↑1 = n X −1,−α Sn,k (y α)k↑α = k=0 n X −1,−α k Sn,k α (y )k↑1 , k=0 hence −1,−α α Sn,k = Cn,k /αk . Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Exchangeable Gibbs partitions Some combinatorial formulas and generalized Stirling numbers Basic CRP construction non-central Generalized Stirling numbers [Hsu and Shiue (1998)] Moreover the following convolution relation holds, −1,−α,γ Sn,k = n X n −1,−α S (−γ)n−s↑1 , s s,k (7) s=k hence α,γ −1,−α,γ Cn,k = αk Sn,k = n X n α C (−γ)n−s↑1 , s s,k s=k and non-central generalized Stirling numbers are the connection −1,−α,γ coefficients Sn,k such that (y α − γ)n↑1 = n X −1,−α,γ k Sn,k α (y )k↑1 = k=0 Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano n X −1,−α,γ Sn,k (y α)k↑α . (8) k=0 Generalized CRP for Gibbs partitions with application Motivation Background Contribution Exchangeable Gibbs partitions Some combinatorial formulas and generalized Stirling numbers Basic CRP construction Sequential CRP construction of exchangeable partitions Given an infinite EPPF, p(n) = p(n1 , . . . , nk ), assume that • an unlimited number of customers arrives sequentially in a restaurant • for n ≥ 1, given (n1 , . . . , nk ), the placement of the first n customers at k tables, the (n + 1)th customer is: → seated at table j, for 1 ≤ j ≤ kn , with probability pj,n = pj (n) = p(nj+ ) p(n) → seated at a new table with probability p0,n = p0,n (n) = for l = kn + 1, Pk+1 j=1 p(nl+ ) p(n) pj,n + p0,n = 1, p(nj+ ) = p(n1 , . . . , nj + 1, . . . , nk ) Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Groups variation of CRP Embedding BNP analysis A groups sequential variation of the CRP [Cerquetti, 2008] Given an infinite EPPF, p(n) = p(n1 , . . . , nk ), assume that • an unlimited number of groups of customers arrives sequentially in a restaurant • for n ≥ 1, given the placement of the first group of n customers in a (n1 , . . . , nk ) configuration in k tables, the new group of m customers is: Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Groups variation of CRP Embedding BNP analysis → all seated at the old k tables in configuration (m1 , . . . , mk ), with probability p(n1 + m1 , . . . , nk + mk ) , p(m|n) = p(n1 , . . . , nk ) → all seated at k ∗ new tables in configuration (s1 , . . . , sk ∗ ) with probability p(n1 , . . . , nk , s1 , . . . , sk ∗ ) , p(s|n) = p(n1 , . . . , nk ) (9) (10) → s < m are seated at k ∗ new tables in configuration (s1 , . . . , sk ∗ ) and the remaining m − s customers at the k old tables in configuration (m1 , . . . , mk ) with probability p(m, s|n) = p(n1 + m1 , . . . , nk + mk , s1 , . . . , sk ∗ ) . p(n1 , . . . , nk ) Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano (11) Generalized CRP for Gibbs partitions with application Motivation Background Contribution Groups variation of CRP Embedding BNP analysis Groups sequential rules for EPPFs in Gibbs form For p(n1 , . . . , nk ) = Vn,k k Y (1 − α)nj −1↑ j=1 formulas (9), (10) and (11) specialize as follows, k Vn+m,k Y p(m|n) = (nj − α)mj ↑ , Vn,k (12) j=1 ∗ k Vn+m,k+k ∗ Y p(s|n) = (1 − α)sj −1↑ Vn,k (13) j=1 ∗ k k Y Vn+m,k+k ∗ Y p(m, s|n) = (nj − α)mj ↑ (1 − α)sj −1↑ . Vn,k j=1 Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano (14) j=1 Generalized CRP for Gibbs partitions with application Motivation Background Contribution Groups variation of CRP Embedding BNP analysis Embedding Lijoi et al.’s BNP analysis - some examples The distribution of the random partition (s1 , . . . , sk∗ ) induced by the additional sample [Lijoi et al. 2008, Prop. 1] arises marginalizing (14) with respect to (m1 , . . . , mk ): p(s1 , . . . , sk ∗ |n1 , . . . , nk ) = m Vn+m,k+k ∗ Vn,k m−s X P j (m1 ,...,mk ) mj =m−s,mj ≥0 m−s m1 , . . . , mk Y k k∗ Y (nj −α)mj ↑ (1−α)sj −1↑ = j=1 j=1 and then exploiting the generalized version of the multinomial theorem k∗ Y m Vn+m,k+k ∗ (n − kα)m−s↑ (1 − α)sj −1↑ . = Vn,k m−s (15) j=1 Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Groups variation of CRP Embedding BNP analysis Embedding BNP analysis (continued) P The number of new observations s = sj belonging to new species [Lijoi et al. 2008, Eq. 11] arises by (15) by summing over the space of all partitions of s elements in k ∗ blocks, and then marginalizing with respect to k ∗ , and resorting to the definition of Generalized Stirling numbers, s X 1 m −1,−α (n − kα)m−s↑ Vn+m,k+k ∗ Ss,k Pr (S = s|n1 , . . . , nk ) = . ∗ Vn,k s ∗ k =0 (16) Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Groups variation of CRP Embedding BNP analysis Embedding BNP analysis (continued) The distribution of the number k ∗ of new species in the additional sample [Lijoi et al. 2007, Eq. 4] arises marginalizing the joint conditional distribution of S and K ∗ with respect to S, and by the convolution relation defining non-central generalized Stirling numbers, Pr (K ∗ = k ∗ |n1 , . . . , nk ) = Vn+m,k+k ∗ −1,−α,−(n−kα) Sm,k ∗ . Vn,k (17) N.B. Some additional examples and a connection established between the notion of conditional Gibbs structures of Lijoi et al. (2008) and the deletion of classes property of Pitman (2003) may be found on the full paper on the ArXiv repository. Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Groups variation of CRP Embedding BNP analysis Final comments Is it possible to embed Lijoi et al.’s analysis in the typical Pitman’s combinatorial framework of exchangeable partitions theory? Yes, it is Is that interesting and worth a submission/a paper? I was not sure, so I checked Prof. Pitman’s opinion According to Prof. Jim Pitman (May 2008) it • provides some simplifications of Lijoi et al.’s work, • connects it better to the Gnedin-Pitman development, • provides a better platform for future work, hence it deserves a series of submission attempts. (still ongoing...) Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application Motivation Background Contribution Groups variation of CRP Embedding BNP analysis Selected references Cerquetti, A. (2008) Generalized Chinese restaurant construction of exchangeable Gibbs partitions and related results. ArXiv:0805.3853 [math.PR] Gnedin, S. and Pitman, J. (2005) Exchangeable Gibbs partitions and Stirling triangles arXiv:math.PR/0412494. Kingman, J.F.C. (1975) Random discrete distributions. JRSS B, 37, 1–22. Kingman, J.F.C (1978) The representation of partition structure. J. London Math. Soc. 2, 374–380. Lijoi, A., Mena, R. and Prünster, I. (2007) Bayesian nonparametric estimation of the probability of discovering new species. Biometrika, 94, 769-786 Lijoi, A. Prünster, I. and Walker, S.G. (2008) Bayesian nonparametric estimators derived from conditional Gibbs structures. Ann. Appl. Probab. 18, 1519-1547 Pitman, J. (2003) Poisson-Kingman partitions. IMS, LN 40, 1–34. Institute of Mathematical Statistics, Hayward, California. (arXiv:math.PR/0210396). Pitman, J. (2006) Combinatorial Stochastic Processes. Ecole d’Eté de Probabilité de Saint-Flour XXXII - 2002. Lecture Notes N. 1875, Springer. Annalisa Cerquetti S.Co. 2009, Sept 15 - Politecnico di Milano Generalized CRP for Gibbs partitions with application