Multivariate Jacobi and Laguerre polynomials, infinite-dimensional
extensions, and their probabilistic connections with multivariate
Hahn and Meixner polynomials.
Robert C. Griffiths∗ and Dario Spanò†
September 9, 2008
Abstract
Multivariate versions of classical orthogonal polynomials, such as the Jacobi, Hahn, Laguerre and Meixner polynomials, are reviewed, and their connections are explored by adopting a probabilistic approach. Hahn and Meixner
polynomials are interpreted as posterior mixtures of Jacobi and Laguerre polynomials, respectively. By
using known properties of Gamma point processes and related transformations, an infinite-dimensional
version of Jacobi polynomials is constructed with respect to the size-biased version of the Poisson-Dirichlet
weight measure and to the law of the Gamma point process from which it is derived.
1  Introduction.
The Dirichlet distribution Dα on d < ∞ points, where α = (α1, . . . , αd) ∈ R^d_+, is the probability distribution on the (d − 1)-dimensional simplex
\[
\Delta^{(d-1)} := \Big\{(x_1, \dots, x_{d-1}) \in [0,1]^{d-1} : \sum_{j=1}^{d-1} x_j \le 1\Big\},
\]
described by
\[
D_\alpha(dx_1, \dots, dx_{d-1}) = \frac{\Gamma(|\alpha|)}{\prod_{i=1}^d \Gamma(\alpha_i)} \prod_{i=1}^{d-1} x_i^{\alpha_i - 1}\, (1 - |x|)^{\alpha_d - 1}\, dx_1 \cdots dx_{d-1}, \tag{1}
\]
where, for every d ∈ N and z ∈ R^d, |z| := \sum_{j=1}^d z_j.
Such a distribution plays a central role in Bayesian Statistics as well as in Population Genetics. In Statistics, it is the most used class of so-called prior measures, assigned to the random parameter X = (X1, . . . , X_{d−1}, 1 − |X|) of a statistical d-dimensional Multinomial likelihood with probability mass function
\[
B_x(n) := P(n \mid X = x) = \binom{|n|}{n} x^n, \qquad n \in \mathbb{N}^d, \tag{2}
\]
where
\[
\binom{|n|}{n} := \frac{|n|!}{n_1! \cdots n_d!}, \qquad x^n := x_1^{n_1} \cdots x_d^{n_d}.
\]
∗ Department of Statistics, University of Oxford, griff@stats.ox.ac.uk
† Department of Statistics, University of Warwick, D.Spano@warwick.ac.uk
CRiSM Paper No. 08-24, www.warwick.ac.uk/go/crism
In Population Genetics, Dα arises as the stationary distribution of the so-called d-types Wright-Fisher
diffusion process (X(t) : t ≥ 0) on ∆(d−1) used to model the evolutionary behavior of d allele frequencies in
an infinite population of genes with parent-independent, neutral mutation. The generator of the diffusion is
\[
L_d = \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d x_i (\delta_{ij} - x_j) \frac{\partial^2}{\partial x_i \partial x_j} + \frac{1}{2} \sum_{i=1}^d (\alpha_i - |\alpha| x_i) \frac{\partial}{\partial x_i}, \tag{3}
\]
where δxy is the Kronecker delta, equal to 1 if x = y and to 0 otherwise. Both models in Bayesian Statistics
and Population Genetics are equivalently described in terms of what one expects to observe from a collection
of |n| individuals (genes) (|n| = 1, 2, . . .) sampled from the entire population (at a given time). When
the individuals are exchangeably sampled from Dirichlet random proportions X = (X1 , . . . , Xd−1 , 1 − |X|),
the probability of finding exactly n1 , . . . , nd individuals (genes) of type 1, . . . , d, respectively, is given by a
Dirichlet mixture of Multinomial distributions, defined, for every n ∈ Nd , by
\[
DM_\alpha(n) = \int_{\Delta^{(d-1)}} B_x(n)\, D_\alpha(dx) = \binom{|n|}{n} \frac{\prod_{i=1}^d \alpha_{i\,(n_i)}}{|\alpha|_{(|n|)}}, \tag{4}
\]
where
\[
a_{(z)} := \frac{\Gamma(a + z)}{\Gamma(a)}.
\]
There are several infinite-dimensional versions of the Dirichlet distribution, as d → ∞ and |α| → |θ| > 0,
which will be described in Section 2, with substantially similar applications in Statistics and Population
Genetics. Under a hypothesis of multinomial sampling, they all induce discrete measures which are a
modification of DMα .
In this paper we will review multivariate orthogonal polynomials, complete with respect to weight measures
given by Dα or DMα , that is, polynomials {Gn : n ∈ Nd } satisfying
\[
\int G_n G_m\, d\mu = \frac{1}{c_m}\, \delta_{nm}, \qquad n, m \in \mathbb{N}^d. \tag{5}
\]
We will call {Gn } multivariate Jacobi polynomials if (5) is satisfied with µ = Dα , and multivariate Hahn
polynomials if µ = DMα . Here cm are positive constants. Completeness means that, for every function f
with finite variance (under µ), there is an expansion
\[
f(x) = \sum_{n \in \mathbb{N}^d} c_n a_n G_n(x), \tag{6}
\]
where a_n = E[f(X) G_n(X)].
Systems of multivariate orthogonal polynomials are not unique, and a large number of characterizations of
d-dimensional Jacobi and Hahn polynomials exist in literature. We will focus on a construction of Jacobi
polynomials, based on a method originally proposed by Koornwinder [16] which has a strong probabilistic
interpretation, by means of which we will be able to: (1) describe multivariate Hahn polynomials as posterior
mixtures of Jacobi polynomials, in a sense which will become precise in section 5; (2) construct, in Section
4, a system of multiple Laguerre polynomials, orthogonal with respect to the product probability measure
\[
\gamma^d_{\alpha,|\beta|}(dy) := \frac{y^{\alpha - 1}\, e^{-|y|/|\beta|}}{\Gamma(\alpha)\, |\beta|^{|\alpha|}}\, dy, \qquad y \in \mathbb{R}^d_+, \tag{7}
\]
with
\[
\Gamma(\alpha) := \prod_{i=1}^d \Gamma(\alpha_i), \qquad y^{\alpha - 1} := \prod_{i=1}^d y_i^{\alpha_i - 1};
\]
(3) derive, in Section 6, a new class of multiple Meixner polynomials as posterior mixtures of the Laguerre polynomials mentioned in (2); (4) obtain polynomials in the multivariate Hypergeometric distribution by taking the parameters in the Hahn polynomials to be negative; (5) obtain (Section 3.4) asymptotic results as the dimension d goes to infinity with |α| → |θ| > 0.
The intricate relationship connecting all the mentioned systems of polynomials is entirely explained by the
relationship existing among the respective weight measures, which becomes more transparent under a probabilistic approach; with this in mind we will begin the paper with an introductory summary (Section 2)
of known facts from the theory of probability distributions. Section 3.1 is devoted to multivariate Jacobi
polynomials, whose structure will be the building block for the subsequent sections: Multiple Laguerre in
Section 4, Hahn in Section 5, Meixner in Section 6.
It is worth observing that the posterior-mixture representation of multivariate Hahn polynomials shown in Proposition 8 is obtained without imposing a priori any Bernstein-Bézier form on the Jacobi polynomials,
and nevertheless it agrees with recent interpretations of Hahn polynomials as Bernstein coefficients of Jacobi
polynomials in such a form ([22, 21]), a result for which a new, more probabilistic proof is offered in Section
5.2.1. Along the same lines one can view the Meixner polynomials obtained in Proposition 10 as re-scaled
Bernstein coefficients of our multiple Laguerre polynomials, as shown in Section 6.1.
On the other hand, our construction of Hahn polynomials is in terms of mixtures over polynomials in independent random variables, and as such our derivation is closely related to the original formulation of multivariate Hahn polynomials, as weighted products of univariate Hahn polynomials, proposed decades ago by Karlin and McGregor [12].
The original motivation for this study was to obtain some background material which can be used to characterize bivariate distributions, or transition functions, with fixed Dirichlet or Dirichlet-Multinomial marginals,
for which the following canonical expansions are possible:
\[
p(dx, dy) = \Big\{ 1 + \sum_{|n|=1}^{\infty} c_n \rho_n G_n(x) G_n(y) \Big\}\, D_\alpha(dx)\, \mu(dy), \qquad x, y \in \Delta^{(d-1)},
\]
for appropriate, positive-definite sequences \{\rho_n : n \in \mathbb{N}^d\}, called the canonical correlation coefficients of the model. Results on this particular problem will be published in a subsequent paper. Other possible applications in statistics are related to least-squares approximation and regression. An MCMC-Gibbs sampler use of orthogonal polynomials is explored, for example, in [20]. In this paper, however, we will focus merely on the construction of the mentioned systems of polynomials.
2  Distributions on the discrete and continuous simplex.

2.1  Conditional independence in the Dirichlet distribution.

2.1.1  Gamma sums.
Denote by γ_{|α|,|β|}(dz) the Gamma probability density function (pdf) with parameter (|α|, |β|) ∈ R²_+:
\[
\gamma_{|\alpha|,|\beta|}(dz) = \frac{z^{|\alpha| - 1}\, e^{-z/|\beta|}}{\Gamma(|\alpha|)\, |\beta|^{|\alpha|}}\, I(z > 0)\, dz.
\]
For every α ∈ R^d_+ and |β| > 0, let Y = (Y1, . . . , Yd) be a collection of d independent Gamma random variables with parameters (α_i, |β|), i = 1, . . . , d, respectively. Their joint distribution is the product measure γ^d_{α,|β|} defined by (7). Consider the mapping
\[
(Y_1, \dots, Y_d) \longmapsto (|Y|, X_1, \dots, X_{d-1}),
\]
where
\[
X_j := \frac{Y_j}{|Y|}, \qquad j = 1, \dots, d-1.
\]
It is easy to rewrite
\[
\gamma^d_{\alpha,|\beta|}(dy) = \gamma_{|\alpha|,|\beta|}(d|y|)\, D_\alpha(dx),
\]
that is: (i) |Y| := \sum_{i=1}^d Y_i is a Gamma(|α|, |β|) random variable, and (ii) X is independent of |Y| and has Dirichlet distribution with parameter α.
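This decomposition is easy to check by simulation. The following sketch (illustrative only; the parameter values are arbitrary, not from the paper) normalizes independent Gamma variables and checks the resulting Dirichlet mean and the lack of correlation between X and the total |Y|:

```python
import numpy as np

# Sketch: simulate d independent Gamma(alpha_i, |beta|) variables and check
# empirically that X = Y/|Y| has Dirichlet mean alpha/|alpha| and is
# uncorrelated with the total |Y| (a consequence of independence).
rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 5.0])          # hypothetical parameter vector
beta = 1.5                                  # common scale |beta|

Y = rng.gamma(shape=alpha, scale=beta, size=(200_000, 3))
total = Y.sum(axis=1)                       # |Y| ~ Gamma(|alpha|, |beta|)
X = Y / total[:, None]                      # X ~ Dirichlet(alpha)

print(X.mean(axis=0))                       # ~ alpha/|alpha| = [0.2, 0.3, 0.5]
print(np.corrcoef(total, X[:, 0])[0, 1])    # ~ 0
```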
2.1.2  Dirichlet as Right-Neutral Distribution.
Let X = (X1, . . . , Xd) be a random distribution on {1, . . . , d} with Dirichlet distribution Dα, α ∈ R^d_+. Consider the random cumulative frequencies S_j := \sum_{i=1}^j X_i, j = 1, . . . , d − 1. Then the increments
\[
B_j := \frac{X_j}{1 - S_{j-1}}, \qquad j = 1, \dots, d-1, \tag{8}
\]
are independent random variables. This property is known as right-neutrality ([5]). In particular, each B_j has a Beta distribution with parameters (α_j, \sum_{i=j+1}^d α_i) = (α_j, |α| − \sum_{i=1}^j α_i).
To see this, rewrite Dα in terms of the increments B_j defined by (8), for j = 1, . . . , d − 1. The change of variables induces
\[
\tilde{D}_\alpha(db_1, \dots, db_{d-1}) = \prod_{j=1}^{d-1} \frac{b_j^{\alpha_j - 1} (1 - b_j)^{\sum_{i=j+1}^d \alpha_i - 1}}{B\big(\alpha_j, \sum_{i=j+1}^d \alpha_i\big)}\, db_j = \prod_{j=1}^{d-1} D_{(\alpha_j,\, \sum_{i=j+1}^d \alpha_i)}(db_j), \tag{9}
\]
which is the distribution of d − 1 independent Beta random variables. Notice that such a structure holds, with different parameters, for any reordering of the atoms of X.
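The stick-breaking identity (8)-(9) can likewise be checked numerically. A minimal sketch, with a hypothetical parameter vector, rebuilds a Dirichlet vector from independent Beta increments:

```python
import numpy as np

# Sketch: reconstruct a Dirichlet vector from the independent Beta increments
# of (8)-(9) and compare its mean with the known Dirichlet mean alpha/|alpha|.
rng = np.random.default_rng(1)
alpha = np.array([1.0, 2.0, 3.0, 4.0])     # hypothetical parameters
n = 100_000

d = len(alpha)
X = np.zeros((n, d))
remaining = np.ones(n)                      # 1 - S_{j-1}
for j in range(d - 1):
    A_j = alpha[j + 1:].sum()               # tail parameter sum
    B = rng.beta(alpha[j], A_j, size=n)     # B_j ~ Beta(alpha_j, A_j), independent
    X[:, j] = remaining * B                 # X_j = (1 - S_{j-1}) B_j
    remaining *= 1.0 - B
X[:, -1] = remaining                        # X_d = 1 - S_{d-1}

print(X.mean(axis=0))                       # ~ alpha/|alpha| = [0.1, 0.2, 0.3, 0.4]
```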
2.2  Unordered Dirichlet frequencies and limit distributions.
In many applications the locations of the atoms of a Dirichlet population have no intrinsic, material meaning,
and it is preferable to look at the distribution of these frequencies in an order-independent framework. Two
possible ways of unordering the Dirichlet atoms are: (1) rearranging the frequencies in a size-biased random
order; (2) ranking them in order of magnitude. The main usefulness of considering size-biased or ranked distributions is that they admit sensible limits as the dimension d grows to infinity, whereas the original Dirichlet distribution is confined to finite dimensions. The two resulting distributions (known,
respectively, as the GEM and the Poisson-Dirichlet distribution) are in a one-to-one relation with each
other.
2.2.1  Size-biased order and the GEM distribution.
Let x be a point of ∆^{(d−1)}, with |x| = 1. Then x induces a probability distribution on the group G_d of all permutations of {1, . . . , d}:
\[
\sigma_x(\pi) = \prod_{i=1}^{d-1} \frac{x_{\pi_i}}{1 - \sum_{j=1}^{i-1} x_{\pi_j}}, \qquad \pi \in G_d.
\]
Let α ∈ R^d_+. The size-biased measure on ∆^{(d−1)} induced by a Dirichlet distribution Dα is given by
\[
\ddot{D}_\alpha(A) = \int \sigma_x(\pi : \pi x \in A)\, D_\alpha(dx).
\]
Note that \tilde{\sigma}_x\{y\} := \sigma_x(\pi : \pi x = y) is nonzero if and only if y is a permutation of x, and that
\[
\tilde{\sigma}_x\{y\} = \tilde{\sigma}_{\pi x}\{y\} = \tilde{\sigma}\{y\} \qquad \forall \pi \in G_d,
\]
hence the density of the size-biased measure is
\[
\frac{d\ddot{D}_\alpha}{dy}(y) = \tilde{\sigma}\{y\} \sum_{\pi \in G_d} \frac{D_\alpha(d(\pi^{-1} y))}{dy}.
\]
In particular, if α = (|θ|/d, . . . , |θ|/d) for some |θ| > 0 (symmetric Dirichlet), then its size-biased measure is
\[
\ddot{D}_{|\theta|,d}(dx) = d!\, \prod_{i=1}^{d-1} \frac{x_i}{1 - \sum_{j=1}^{i-1} x_j}\, D_\alpha(dx) \tag{10}
\]
\[
\propto \prod_{i=1}^{d-1} B_i^{|\theta|/d}\, (1 - B_i)^{\frac{d-i}{d}|\theta| - 1}\, dB_i, \tag{11}
\]
where B_i is defined as in (8). This is the distribution of d − 1 independent Beta random variables with parameters, respectively, (|θ|/d + 1, ((d − i)/d)|θ|), i = 1, . . . , d − 1.
The measure \ddot{D}_{|\theta|,d} is, again, a right-neutral measure.
Now, let d → ∞. Then \ddot{D}_{|\theta|,d} converges weakly to the law of a right-neutral sequence \ddot{X}^\infty = (\ddot{X}_1, \ddot{X}_2, \dots) such that
\[
\ddot{X}_j \stackrel{D}{=} \ddot{B}_j \prod_{i=1}^{j-1} (1 - \ddot{B}_i), \qquad j \ge 1, \tag{12}
\]
for a sequence \ddot{B} = (\ddot{B}_1, \ddot{B}_2, \dots) of independent and identically distributed (iid) Beta weights with parameter (1, |θ|) (here and in the following pages \stackrel{D}{=} means equality in distribution).

Definition 1. The random sequence \ddot{X}^\infty satisfying (12) for a sequence of Beta(1, |θ|) weights is called the GEM distribution with parameter |θ| (GEM(|θ|)).
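Definition 1 translates directly into a truncated sampler. The sketch below (the truncation depth and |θ| are arbitrary illustrative choices) draws GEM(|θ|) weights via (12):

```python
import numpy as np

# Sketch: stick-breaking sampler for GEM(theta) as in (12), truncated at a
# finite depth (an arbitrary choice for illustration).
def sample_gem(theta, depth, n, rng):
    B = rng.beta(1.0, theta, size=(n, depth))     # iid Beta(1, theta) weights
    sticks = np.cumprod(1.0 - B, axis=1)
    X = np.empty((n, depth))
    X[:, 0] = B[:, 0]
    X[:, 1:] = B[:, 1:] * sticks[:, :-1]          # X_j = B_j prod_{i<j}(1 - B_i)
    return X

rng = np.random.default_rng(2)
theta = 2.0
X = sample_gem(theta, depth=200, n=50_000, rng=rng)

# E[X_j] = (1/(1+theta)) * (theta/(1+theta))^(j-1); theta = 2 gives 1/3, 2/9, ...
print(X[:, 0].mean(), X[:, 1].mean())
print(X.sum(axis=1).min())   # close to 1: the truncated sticks are nearly exhausted
```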
2.2.2  Ranked Dirichlet frequencies and the Poisson-Dirichlet distribution.

Let Y = (Y1, . . . , Yd) be a vector with distribution γ^d_{α,|β|}. Consider the function ρ : R^d → R^d which rearranges the coordinates of y ∈ R^d in decreasing order. Then Y^↓ := ρ(Y) is known as the order statistics of Y. If all coordinates are iid with common parameter |θ|/d, then the law \tilde{\gamma}^d_{|\theta|/d} of Y^↓ is given by
\[
\tilde{\gamma}^d_{|\theta|/d} = \gamma^d_{|\theta|/d} \circ \rho^{-1} = d!\, \gamma^d_{|\theta|/d}.
\]
Since |Y| is stochastically independent of Y/|Y|, it is also independent of f(Y/|Y|) for any function f; hence |Y| is independent of X^↓ := Y^↓/|Y|. Denote the distribution of X^↓ by D^↓_{|θ|,d}. For a symmetric Dirichlet distribution (i.e. with α = (|θ|/d, . . . , |θ|/d), |θ| > 0) it is easy to verify that size-biased and ranked Dirichlet frequencies co-determine each other via the relations
\[
(\ddot{X})^\downarrow \stackrel{D}{=} X^\downarrow; \tag{13}
\]
\[
(\ddot{X^\downarrow}) \stackrel{D}{=} \ddot{X}, \tag{14}
\]
for any d = 2, 3, . . . .
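Relation (13) also holds pathwise: ranking a size-biased rearrangement of a fixed vector recovers its ranked version exactly. A small illustrative sketch (the Dirichlet parameters are hypothetical):

```python
import numpy as np

# Sketch: size-biased permutation of a finite probability vector, as in
# Section 2.2.1. Ranking the size-biased rearrangement recovers the ranked
# vector exactly, which is the pathwise content of relation (13).
def size_biased_permutation(x, rng):
    x = np.asarray(x, dtype=float)
    idx = np.arange(len(x))
    out = []
    while len(idx) > 0:
        p = x[idx] / x[idx].sum()        # pick i with prob x_i / remaining mass
        j = rng.choice(len(idx), p=p)
        out.append(x[idx[j]])
        idx = np.delete(idx, j)
    return np.array(out)

rng = np.random.default_rng(3)
x = rng.dirichlet([0.5] * 6)             # a symmetric Dirichlet draw
x_sb = size_biased_permutation(x, rng)

print(np.allclose(np.sort(x_sb)[::-1], np.sort(x)[::-1]))   # True
```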
Poisson point process construction ([15]).
Let Y^∞ = (Y1, Y2, . . .) be the sequence of points of a non-homogeneous Poisson point process with intensity measure
\[
N_{|\theta|}(dy) = |\theta|\, y^{-1} e^{-y}\, dy.
\]
The probability generating functional is
\[
F_{|\theta|}(\xi) = E_{|\theta|}\Big(\exp\Big\{\int \log \xi(y)\, N_{|\theta|}(dy)\Big\}\Big) = \exp\Big\{|\theta| \int_0^\infty (\xi(y) - 1)\, y^{-1} e^{-y}\, dy\Big\}, \tag{15}
\]
for suitable functions ξ : R → [0, 1]. Then |Y^∞| is a Gamma(|θ|) random variable and is independent of the sequence of ranked, normalized points
\[
X^{\downarrow\infty} = \frac{\rho(Y^\infty)}{|Y^\infty|}.
\]

Definition 2. The distribution of X^{↓∞} is called the Poisson-Dirichlet distribution with parameter |θ| > 0.
Remark 1. Obviously the GEM(|θ|) distribution can be redefined in a similar fashion: consider the same point process Y^∞ and reorder its jumps in size-biased random order, i.e. set Ÿ_1 = Y_{i_1} with (random) probability Y_{i_1}/|Y^∞|, and
\[
P\big(\ddot{Y}_{k+1} = Y_{i_{k+1}} \mid \ddot{Y}_1, \dots, \ddot{Y}_k\big) = \frac{Y_{i_{k+1}}}{|Y| - \sum_{j=1}^k \ddot{Y}_j}, \qquad k = 1, 2, \dots
\]
Denote the vector of all the size-biased jumps by Ÿ^∞. Then |Ÿ^∞| \stackrel{D}{=} |Y^∞| is independent of the normalized sequence
\[
\ddot{X}^\infty := \frac{\ddot{Y}^\infty}{|\ddot{Y}^\infty|},
\]
and \ddot{X}^\infty has the GEM(|θ|) distribution.
Finite-dimensional distributions.
An important role in determining the finite-dimensional distributions of the Poisson-Dirichlet process is played by its frequency spectrum, or factorial moment measure.

Proposition 1. (Watterson [23]) Let ξ be the random measure corresponding to a Poisson-Dirichlet point process. Then, for every |r| ∈ N and distinct x_1, . . . , x_{|r|} with \sum_{i=1}^{|r|} x_i < 1,
\[
P(\xi(dx_1) > 0, \dots, \xi(dx_{|r|}) > 0) = f^{(|r|)}_{|\theta|}(x_1, \dots, x_{|r|})\, dx_1 \cdots dx_{|r|},
\]
where
\[
f^{(|r|)}_{|\theta|}(x_1, \dots, x_{|r|}) = \frac{|\theta|^{|r|}}{x_1 \cdots x_{|r|}} \Big(1 - \sum_{i=1}^{|r|} x_i\Big)^{|\theta| - 1}. \tag{16}
\]
Let F^{(|r|)}_{|\theta|}(dx) be the measure with density f^{(|r|)}_{|\theta|} as in (16), restricted to 0 < x_1 < \dots < x_{|r|} < 1. Then
\[
F^{(|r|)}_{|\theta|}(dx) = \lim_{d \to \infty} d_{[|r|]}\, D_{(|\theta|/d, \dots, |\theta|/d,\ \frac{d - |r|}{d}|\theta|)}(dx), \qquad x \in \Delta_{|r|}, \tag{17}
\]
where
\[
a_{[x]} = \frac{\Gamma(a + 1)}{\Gamma(a + 1 - x)}, \qquad a, x \in \mathbb{R},\ a + 1 > x.
\]
Note that the finite-dimensional distributions of the size-biased permutation of F^{(|r|)}_{|\theta|} coincide with those of the GEM(|θ|) distribution:
\[
\prod_{j=1}^{|r|} \frac{x_j}{1 - \sum_{i=1}^{j-1} x_i}\, F^{(|r|)}_{|\theta|}(dx) = GEM_{|\theta|}(dx).
\]
The relationship between PD and GEM is more understandable if we notice that the probability generating functional of γ^d_{α,1}, for α = (|θ|/d, . . . , |θ|/d), is ([10])
\[
F_{|\theta|,d}(\xi) = \Big(\int_0^\infty \xi(y)\, \gamma_{|\theta|/d,1}(dy)\Big)^d = \Big(1 + \int_0^\infty (\xi(y) - 1)\, \frac{|\theta|}{d}\, \frac{y^{|\theta|/d - 1} e^{-y}}{\Gamma(|\theta|/d + 1)}\, dy\Big)^d \xrightarrow[d \to \infty]{} F_{|\theta|}(\xi), \tag{18}
\]
which, by continuity of the ordering function ρ, implies that if X^{↓d} has distribution D^↓_{|θ|,d}, then
\[
X^{\downarrow d} \xrightarrow{D} X^{\downarrow\infty}.
\]
Moreover, continuity of ρ and the fact that |X^{↓∞}| = 1 almost surely ensure that the relations (13)-(14) hold in the limit, that is: if \ddot{X}^\infty has a GEM(|θ|) distribution, then
\[
(\ddot{X}^\infty)^\downarrow \stackrel{D}{=} X^{\downarrow\infty}; \tag{19}
\]
\[
(\ddot{X^{\downarrow\infty}}) \stackrel{D}{=} \ddot{X}^\infty. \tag{20}
\]
Such a duality (of which several proofs are available, see [19] for an account and references) leads to a
double series representation for the most popular class of nonparametric prior measures on the space PE of
probability measures on any diagonal-measurable space E, the so-called Ferguson-Dirichlet class [8].
2.2.3  The Ferguson-Dirichlet class of random probability measures.

Definition 3. Let α be a diffuse measure on some Polish space E. A random probability measure F on E belongs to the Ferguson-Dirichlet class with parameter α (FD(α)) if its distribution is such that, for every integer d and any Borel-measurable partition A = (A_1, . . . , A_d) of E, the distribution of the vector (F(A_1), . . . , F(A_d)) is D_{α_A}, where α_A := (α(A_1), . . . , α(A_d)).
Theorem 1. Let α be a diffuse measure on some diagonal-measurable space E and denote α(E) = |θ|, ν = α/|θ|. A random probability distribution F on E is FD(α) if
\[
F(\cdot) \stackrel{D}{=} \sum_{j=1}^\infty X_j\, \delta_{Z_j}(\cdot) \tag{21}
\]
almost surely, where:
(i) X = (X_1, X_2, . . .) is independent of Z = (Z_1, Z_2, . . .);
(ii) Z is a collection of iid random variables with common law ν;
(iii) X has either a PD(|θ|) or a GEM(|θ|) distribution.
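Theorem 1 suggests a straightforward truncated sampler for an FD(α) random measure. In the sketch below the base law ν (standard normal) and the truncation depth are arbitrary illustrative choices, not prescribed by the paper:

```python
import numpy as np

# Sketch: a truncated draw from a Ferguson-Dirichlet random measure via (21),
# with GEM(theta) stick-breaking weights and iid locations Z_j from a base
# law nu (here N(0,1), an arbitrary illustrative choice).
rng = np.random.default_rng(4)
theta, depth = 5.0, 500

B = rng.beta(1.0, theta, size=depth)                  # Beta(1, theta) sticks
w = B * np.concatenate(([1.0], np.cumprod(1.0 - B)[:-1]))
Z = rng.normal(size=depth)                            # iid locations with law nu

# F ~ sum_j w_j delta_{Z_j}; the mean of F is a random quantity centered at
# the base mean of nu (here 0).
F_mean = np.sum(w * Z)
print(w.sum())     # close to 1 (the deficit is the unused stick mass)
print(F_mean)
```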
2.3  Sampling formulae

2.3.1  Negative Binomial sums.

We have already seen in the Introduction that the Dirichlet-Multinomial distribution arises as a Dirichlet mixture of Multinomial likelihoods. Another construction is possible, based on Negative-Binomial random sequences, which parallels the Gamma construction of the Dirichlet measure of Section 2.1.1.
For α > 0 and p ∈ (0, 1), let NB_{α,p} denote the Negative Binomial distribution with probability mass function
\[
NB_{\alpha,p}(k) = \frac{\alpha_{(k)}}{k!}\, p^k (1 - p)^\alpha, \qquad k = 0, 1, \dots \tag{22}
\]
When α ∈ N, such a measure describes the distribution of the number of failures occurring in a sequence of iid Bernoulli experiments (with success probability p) before the α-th success. Two features of NB_{α,p} will prove useful in Section 6 to connect multiple Meixner polynomials to multivariate Hahn polynomials.
The first feature is that Negative-Binomial distributions arise as Gamma mixtures of Poisson likelihoods:
\[
NB_{\alpha,p}(k) = \int_0^\infty Po_\lambda(k)\, \gamma_{\alpha,\frac{p}{1-p}}(d\lambda), \tag{23}
\]
where
\[
Po_\lambda(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
\]
We recapitulate the second feature in the next lemma.
Lemma 1. Consider any α ∈ R^d_+ and p ∈ (0, 1). Let R_1, . . . , R_d be independent Negative Binomial random variables with parameters (α_i, p), i = 1, . . . , d, respectively. Then
(i) |R| := \sum_{i=1}^d R_i is a Negative Binomial random variable with parameter (|α|, p);
(ii) Conditional on |R| = |r|, the vector R = (R_1, . . . , R_d) has a Dirichlet-Multinomial distribution with parameter (α, |r|).
For α = (|α|/d, . . . , |α|/d) it is now obvious that ρ(R)/|R| is independent of |R|.
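Lemma 1(i) can be verified numerically by convolving Negative Binomial probability mass functions. Note that the paper's NB_{α,p}(k) = α_{(k)}/k! p^k(1−p)^α corresponds to scipy's nbinom(n = α, p = 1 − p); the parameter values below are hypothetical:

```python
import numpy as np
from scipy.stats import nbinom

# Sketch: the sum of independent Negative Binomials NB(alpha_i, p) is
# NB(|alpha|, p). We check this exactly (up to truncation of the support)
# by convolving the pmfs.
alphas = [0.7, 1.5, 2.8]     # hypothetical parameters
p = 0.4
K = 200                      # support truncation for the convolution

pmfs = [nbinom.pmf(np.arange(K), a, 1 - p) for a in alphas]
conv = pmfs[0]
for f in pmfs[1:]:
    conv = np.convolve(conv, f)[:K]        # pmf of the sum, truncated at K

target = nbinom.pmf(np.arange(K), sum(alphas), 1 - p)
print(np.max(np.abs(conv - target)))       # ~ 0
```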
2.3.2  Partial right-neutrality.

For every r ∈ N^d and α ∈ R^d_+, denote as usual R_j = \sum_{i=j+1}^d r_i and A_j = \sum_{i=j+1}^d α_i. It is easy to see that
\[
DM_\alpha(r) = \int_{\Delta^{(d-1)}} B_x(r)\, D_\alpha(dx) = \prod_{j=1}^{d-1} \binom{R_{j-1}}{r_j} \int_0^1 z_j^{r_j} (1 - z_j)^{R_j}\, D_{\alpha_j, A_j}(dz_j) = \prod_{j=1}^{d-1} DM_{\alpha_j, A_j}(r_j; R_{j-1}). \tag{24}
\]
In other words: for every j = 1, . . . , d − 1, given R_{j−1}, the conditional distribution of r_j depends on r_1, . . . , r_{j−1} only through R_{j−1}. Such a property can be interpreted as a partial right-neutrality property, and we have just seen that it is a direct consequence of the right-neutrality property of the Dirichlet distribution. This feature is responsible for our construction of multivariate Hahn polynomials.
2.3.3  Hypergeometric distribution.

Consider the form of the probability mass function DM_α but now replace the parameter α with −ε = (−ε_1, . . . , −ε_d), where 0 ≤ n_j ≤ ε_j, j = 1, . . . , d. Then
\[
DM_{-\varepsilon}(n) = \frac{|n|!}{n_1! \cdots n_d!}\, \frac{(-\varepsilon)_{(n)}}{(-|\varepsilon|)_{(|n|)}} = \frac{\prod_{i=1}^d \binom{\varepsilon_i}{n_i}}{\binom{|\varepsilon|}{|n|}} =: H_\varepsilon(n). \tag{25}
\]
H_ε(n) is known as the multivariate Hypergeometric distribution with parameter ε.
The partial right-neutrality property of the Dirichlet-Multinomial distribution is preserved for the Hypergeometric law; however, the interpretation as a Dirichlet mixture of iid laws is lost, as the Dirichlet (as well as the Gamma and the Beta) integral is not defined for negative parameters.
2.3.4  Limit sample distributions.

Ranked sample frequencies and the Ewens' sampling formula.
Let R be an integer-valued, d-dimensional vector with Dirichlet-Multinomial DM_α distribution. As |R| → ∞, R/|R| converges to a random point in the simplex with distribution D_α. We have already seen that limit versions of D_α as d → ∞ exist only in their unlabeled or size-biased versions. The sampling formula converging to the former version is known as the Ewens' sampling formula.
For any |n|, d ∈ N, let R ∈ N^d have distribution DM_α, α ∈ R^d_+. The vector of order statistics R^↓ = ρ(R) has distribution
\[
DM^\downarrow_\alpha(r^\downarrow; |n|) = \sum_{\sigma \in G_d} DM_\alpha(\sigma r^\downarrow; |n|), \qquad r^\downarrow \in \rho(\mathbb{N}^d).
\]
For symmetric measures, with α = (|α|/d, . . . , |α|/d), |α| > 0, this is equal to
\[
\breve{DM}_{|\alpha|,d}(r; |n|) = \frac{d_{[k]}}{d^k} \binom{|n|}{r} \frac{1}{\prod_i b_i!}\, m_{|\alpha|,d}(r; |n|), \tag{26}
\]
where, for i = 1, . . . , |n|, b_i denotes the number of coordinates of r equal to i; here k = k(r) := \sum_i b_i is the number of strictly positive coordinates (hence |n| = \sum_i i\, b_i), and
\[
m_{|\alpha|,d}(r; |n|) := \frac{|\alpha|^k}{|\alpha|_{(|n|)}} \prod_{j=1}^k \Big(\frac{|\alpha|}{d} + 1\Big)_{(r_j - 1)}. \tag{27}
\]
It is possible to interpret the symmetric function m_{|α|,d}(r; |n|) in three ways:
\[
m_{|\alpha|,d}(r; |n|) = E(X^r) \tag{28}
\]
\[
= E\Big[\ddot{X}_1^{r_1 - 1} \cdots \ddot{X}_k^{r_k - 1} \prod_{i=1}^k \Big(1 - \sum_{j=1}^{i-1} \ddot{X}_j\Big)\Big] \tag{29}
\]
\[
= E\Big(X^{\downarrow\, r_1}_{i_1} \cdots X^{\downarrow\, r_k}_{i_k}\Big), \qquad \{i_1 < \dots < i_k\} \subseteq \{1, \dots, d\}, \tag{30}
\]
where \ddot{X} has the size-biased distribution \ddot{D}_{|\alpha|,d} as in (10), and X^↓ is the ranked vector with distribution D^↓_{|α|,d} as in Section 2.2.2. The full formula of DM^↓_{|α|,d} is obtained by summing over all equivalent choices of indices \{i_1 < \dots < i_k\} in (30).
As d → ∞, assuming |α| → |θ| > 0, the limit sampling distribution is
\[
ESF_{|\theta|}(r; |n|) = \binom{|n|}{r} \frac{1}{\prod_i b_i!}\, m_{|\theta|}(r; |n|), \tag{31}
\]
where
\[
m_{|\theta|}(r; |n|) := \frac{|\theta|^k}{|\theta|_{(|n|)}} \prod_{j=1}^k (r_j - 1)!, \tag{32}
\]
for r_j ≥ 1, j = 1, . . . , k, with \sum_{j=1}^k r_j = |n|. The measure ESF_{|θ|}(·; |n|) is known as the Ewens' sampling formula for the distribution of the allele frequency spectrum resulting from a sample of |n| genes, taken from a neutral Wright-Fisher population (i.e. with generator L_d given by (3)) at equilibrium.
The measure m_{|θ|}(r; |n|) still embodies all the parameters of the model, but the representation (28) no longer makes sense with d = ∞, as there is no positive limit for X; however, (29) and (30) still hold with \ddot{X} and X^↓ having, respectively, GEM(|θ|) and PD(|θ|) distributions. Another representation is
\[
m_{|\theta|}(r; |n|) = \int_{\Delta_k} x^r\, F^{(k)}_{|\theta|}(dx), \tag{33}
\]
with F^{(k)}_{|\theta|} given by (17).
In combinatorics, ESF^↓_{|θ|} describes the distribution of the cycle lengths in a random permutation with |θ| as a bias parameter (for other combinatorial interpretations, see [3], [19]). In Bayesian statistics, this is the distribution of the sizes of the unlabeled clusters arising from an iid sample taken from a Poisson-Dirichlet random measure. Still in the context of Bayesian nonparametrics, let X_1, . . . , X_n be iid F, where F is a Ferguson-Dirichlet(α) random distribution, for a diffuse, finite measure α on some Borel space (E, E) with α(E) = |θ|. By Fubini's theorem and (29)-(30), the probability of observing k distinct values z_1, . . . , z_k, respectively r_1, . . . , r_k times, is given by
\[
ESF_{|\theta|}(r; |n|) \prod_{j=1}^k \nu(dz_j),
\]
where ν = α/|θ|.
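As a consistency check, the Ewens sampling formula (31)-(32) should assign total mass one to the partitions of |n|. A sketch with hypothetical values of |θ| and |n|:

```python
from math import factorial, prod

# Sketch: check that the Ewens sampling formula (31)-(32) is a probability
# distribution on the partitions of n, for a hypothetical theta.
def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def esf(r, theta):
    """Ewens sampling formula for a partition r = (r_1 >= ... >= r_k) of n."""
    n, k = sum(r), len(r)
    rising = prod(theta + i for i in range(n))          # theta_(n)
    b = [r.count(i) for i in set(r)]                    # multiplicities b_i
    multinom = factorial(n) // prod(factorial(x) for x in r)
    m = theta**k * prod(factorial(x - 1) for x in r) / rising
    return multinom * m / prod(factorial(x) for x in b)

theta, n = 2.5, 8
total = sum(esf(r, theta) for r in partitions(n))
print(total)    # ~ 1.0
```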
2.4  Conjugacy properties

The Gamma and the Dirichlet distributions, and similarly the Negative Binomial and the Dirichlet-Multinomial distributions, are entangled by yet another property, known in Bayesian Statistics as conjugacy with respect to sampling.
A statistical model can be described by a probability triplet \{M, \mathcal{M}, l_\Lambda\}_{\Lambda \in E}, where the likelihood function l_Λ(x) depends on a random parameter Λ living in some probability space (E, \mathcal{E}, π). The distribution π of Λ is called the prior measure of the model. The posterior measure of the model is any version π_x(·) = π(· | X = x) of the conditional probability satisfying
\[
\int_A \pi(B \mid X = x) \Big[\int_E l_\lambda(dx)\, \pi(d\lambda)\Big] = \int_B l_\lambda(A)\, \pi(d\lambda) \qquad \text{a.s. } \forall A \in \mathcal{M},\ B \in \mathcal{E}. \tag{34}
\]

Definition 4. Let C be a family of prior measures for a statistical model with likelihood l_Λ. C is conjugate with respect to l_Λ if
\[
\pi \in C \implies \pi_x \in C \qquad \forall x.
\]
It is easy to check that both the Gamma and the Dirichlet measures form conjugate classes of prior measures. Bayes' theorem shows the role played as marginal distributions by NB_{α,p} and DM_α, respectively.

Example 1. The class of Gamma priors is conjugate with respect to l_λ = Po_λ on {0, 1, 2, . . .}. The posterior measure is
\[
\pi_x(d\lambda) = \frac{Po_\lambda(x)\, \gamma_{\alpha,\beta}(d\lambda)}{NB_{\alpha, \frac{\beta}{1+\beta}}(x)} = \gamma_{\alpha + x,\, \frac{\beta}{1+\beta}}(d\lambda). \tag{35}
\]
Similarly, the class of multivariate Gamma priors \{γ^d_{α,|β|} : α ∈ R^d_+, |β| > 0\} is conjugate with respect to \{Po^d_λ(x), λ ∈ R^d_+, x ∈ N^d\}.

Example 2. The class of Beta priors \{D_{|α|,|β|} : (|α|, |β|) ∈ R²_+\} is conjugate with respect to the Binomial likelihood l_λ = B_λ(·) on {0, 1, 2, . . . , |n|}, for any integer |n|. The posterior distribution is
\[
\pi_x(d\lambda) = \frac{B_\lambda(|r|, |n - r|)\, D_{|\alpha|,|\beta|}(d\lambda)}{DM_{|\alpha|,|\beta|}(|r|; |n| - |r|)} = D_{|\alpha| + |r|,\, |\beta| + |n| - |r|}(d\lambda). \tag{36}
\]
Similarly, the class of Dirichlet measures is conjugate with respect to multinomial sampling, and so are the Poisson-Dirichlet and the Ferguson-Dirichlet with respect to (possibly unordered) multinomial sampling.
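Example 1 (equation (35)) can be verified on a grid: the ratio (likelihood × prior)/marginal should coincide with the stated Gamma posterior density. The parameter values and observation below are hypothetical:

```python
import numpy as np
from scipy.stats import gamma, poisson, nbinom

# Sketch: numerical check of (35). With a Gamma(alpha, scale beta) prior and
# a Poisson likelihood, likelihood * prior / NB marginal should equal the
# Gamma(alpha + x, scale beta/(1+beta)) density pointwise.
alpha, beta, x = 2.3, 1.7, 4    # hypothetical prior parameters and observation

lam = np.linspace(0.05, 15.0, 300)
prior = gamma.pdf(lam, alpha, scale=beta)
lik = poisson.pmf(x, lam)
# NB_{alpha, beta/(1+beta)} in the paper's notation is nbinom(alpha, 1/(1+beta))
marginal = nbinom.pmf(x, alpha, 1.0 / (1.0 + beta))
posterior = lik * prior / marginal

target = gamma.pdf(lam, alpha + x, scale=beta / (1.0 + beta))
print(np.max(np.abs(posterior - target)))   # ~ 0
```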
3  Jacobi polynomials on the simplex.

If X, Y are independent random variables, their joint distribution W_{X,Y} is the product W_X W_Y of their marginal distributions, and therefore orthogonal polynomials Q_{n,k}(x, y) for W_{X,Y} are simply obtained as products P_n(x) R_k(y) of orthogonal polynomials with W_X and W_Y as weight measures, respectively.
The key idea for deriving multivariate polynomials with respect to Dirichlet measures on the simplex, and to all related distributions treated in the subsequent sections, exploits the several properties of conditional independence enjoyed by the increments of D_α, as pointed out in Section 2.1.1. A method for constructing orthogonal polynomials in the presence of a particular kind of conditional independence, where Y depends on X only through a polynomial ρ(x) of first order, is illustrated by the following modification of Koornwinder's method (see [16], 3.7.2).

Proposition 2. For l, d ∈ N, let (X, Y) be a random point of R^l × R^d with distribution W. Let ρ : R^l → R be a polynomial on R^l of order at most 1. Assume that the random variable
\[
Z := \frac{Y}{\rho(X)}
\]
is independent of X. Denote by W_X and W_Z the marginal distributions of X and Z, respectively. Then a system of multivariate polynomials, orthogonal with respect to W, is given by
\[
G_n(x, y) = P^{(N_l)}_{(n_1,\dots,n_l)}(x)\, (\rho(x))^{N_l}\, R_{(n_{l+1},\dots,n_{l+d})}\Big(\frac{y}{\rho(x)}\Big), \qquad (x, y) \in \mathbb{R}^l \times \mathbb{R}^d,\ n \in \mathbb{N}^{l+d}, \tag{37}
\]
where N_l = n_{l+1} + \cdots + n_{l+d}, and \{P^{(|m|)}_k\}_{k \in \mathbb{N}^l} and \{R_m\}_{m \in \mathbb{N}^d} are systems of orthogonal polynomials with weight measures given by (\rho(x))^{2|m|} W_X and W_Z, respectively.

Proof. When d = l = 1 this proposition is essentially a probabilistic reformulation of Koornwinder's construction ([16], 3.7.2). The proof is similar for any l, d. That G_n is a polynomial of degree |n| is evident, as the denominator of the term of maximum degree in R cancels with (\rho(x))^{n_{l+1} + \cdots + n_{l+d}}. To show orthogonality, note that the assumption of conditional independence implies that
\[
W(dx, dy) = W_X(dx)\, W_Z\Big(\frac{1}{(\rho(x))^d}\, dy\Big).
\]
Denote b_n = E[P_n^2] and c_n = E[R_n^2]. For k, r ∈ N^l and m, s ∈ N^d,
\[
\int\!\!\int G_{(k,m)}(x, y)\, G_{(r,s)}(x, y)\, W(dx, dy) = \int P^{(|m|)}_k(x)\, P^{(|s|)}_r(x)\, (\rho(x))^{|m| + |s|}\, W_X(dx) \int R_m(z)\, R_s(z)\, W_Z(dz)
\]
\[
= \int P^{(|m|)}_k(x)\, P^{(|m|)}_r(x)\, (\rho(x))^{2|m|}\, W_X(dx)\, c_m \delta_{ms} = b_k c_m \delta_{kr} \delta_{ms}.
\]

3.1  d = 2. Jacobi Polynomials on [0, 1].
For d = 2, D_α reduces to the Beta distribution, the weight measure of (shifted) Jacobi polynomials. These are functions of one variable living in ∆_1 ≡ [0, 1]. It is convenient to recall some known properties of such polynomials. Consider the measure
\[
\tilde{w}_{a,b}(dx) = (1 - x)^a (1 + x)^b\, I(x \in (-1, 1))\, dx, \qquad a, b > -1, \tag{38}
\]
where I(A) is the indicator function, equal to 1 if A holds and to 0 otherwise. This is the weight measure of the Jacobi polynomials, defined by
\[
\tilde{P}^{a,b}_n(x) := \frac{(a+1)_{(n)}}{n!}\, {}_2F_1\Big(-n,\, n + a + b + 1;\ a + 1;\ \frac{1 - x}{2}\Big),
\]
where {}_pF_q, p, q ∈ N, denotes the Hypergeometric function (see [1] for basic properties).
The normalization constants are given by the relation
\[
\int_{(-1,1)} \tilde{P}^{a,b}_n(x)\, \tilde{P}^{a,b}_m(x)\, \tilde{w}_{a,b}(dx) = \frac{2^{a+b+1}}{2n + a + b + 1}\, \frac{\Gamma(n + a + 1)\, \Gamma(n + b + 1)}{n!\, \Gamma(n + a + b + 1)}\, \delta_{mn}.
\]
The Jacobi polynomials are known to solve the second-order differential equation
\[
(1 - x^2)\, y''(x) + [b - a - x(a + b + 2)]\, y'(x) = -n(n + a + b + 1)\, y(x). \tag{39}
\]
By a simple shift of measure it is easy to see that, for α, β > 0 and θ := α + β, the modified polynomials
\[
P^{\alpha,\beta}_n(x) = \frac{n!}{(n + \theta - 1)_{(n)}}\, \tilde{P}^{\beta-1,\alpha-1}_n(2x - 1), \qquad \alpha, \beta > 0, \tag{40}
\]
are orthogonal with respect to the Beta distribution on [0, 1], which can be written as
\[
D_{\alpha,\beta}(dx) = \frac{\tilde{w}_{\beta-1,\alpha-1}(du)}{2^{\alpha+\beta-1}\, B(\alpha, \beta)}, \tag{41}
\]
where u = 2x − 1. Note that (39) implies that, for every n, P^{α,β}_n(x) solves
\[
L_2\, y(x) = -n(n + \theta - 1)\, y(x),
\]
where L_2 is the generator (3) with d = 2.
For the shifted system (P^{α,β}_n, D_{α,β}) the constants are
\[
\frac{1}{\eta_n(\alpha, \beta)} = \int_0^1 [P^{\alpha,\beta}_n(x)]^2\, D_{\alpha,\beta}(dx) = \frac{n!\, \alpha_{(n)}\, \beta_{(n)}}{\theta_{(2n)}\, (\theta + n - 1)_{(n)}}, \qquad n = 0, 1, \dots \tag{42}
\]
To prove this one just uses (40), (41) and the following property of Beta functions:
\[
B(\alpha + n, \beta + m) = \frac{\alpha_{(n)}\, \beta_{(m)}}{\theta_{(n+m)}}\, B(\alpha, \beta). \tag{43}
\]
The value at 1 of the Jacobi polynomials is
\[
\tilde{P}^{a,b}_n(1) = \frac{(a + 1)_{(n)}}{n!},
\]
which implies
\[
P^{\alpha,\beta}_n(1) = \frac{\beta_{(n)}}{(\theta + n - 1)_{(n)}}. \tag{44}
\]
Denote the standardized Jacobi polynomials by
\[
\tilde{R}^{a,b}_n(x) = \frac{\tilde{P}^{a,b}_n(x)}{\tilde{P}^{a,b}_n(1)} \qquad \text{and} \qquad R^{\alpha,\beta}_n(x) = \frac{P^{\alpha,\beta}_n(x)}{P^{\alpha,\beta}_n(1)}.
\]
Obviously
\[
R^{\alpha,\beta}_n(x) = \tilde{R}^{(\beta-1,\alpha-1)}_n(2x - 1). \tag{45}
\]
Then, by (42) and (44), the new constant of proportionality is
\[
\frac{1}{\zeta^{(\alpha,\beta)}_n} := \int_0^1 [R^{\alpha,\beta}_n(x)]^2\, D_{\alpha,\beta}(dx) = \Big(\frac{(\theta + n - 1)_{(n)}}{\beta_{(n)}}\Big)^2 \frac{1}{\eta_n(\alpha, \beta)} = \frac{n!\, \alpha_{(n)}}{(\theta + 2n - 1)\, \theta_{(n-1)}\, \beta_{(n)}}, \tag{46}
\]
where again we used (43) for the last equality. A symmetry relation is therefore
\[
R^{\alpha,\beta}_n(x) = \frac{R^{\beta,\alpha}_n(1 - x)}{R^{\beta,\alpha}_n(0)}. \tag{47}
\]
Note that, if \{P^{*\,\alpha,\beta}_n(x)\} is a system of orthonormal polynomials with weight measure D_{α,β}, then
\[
\zeta^{(\alpha,\beta)}_n = [P^{*\,\alpha,\beta}_n(1)]^2. \tag{48}
\]
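The shifted polynomials R_n^{α,β} and the constants (46) can be checked numerically with scipy's classical Jacobi polynomials (eval_jacobi(n, a, b, ·) evaluates P_n^{(a,b)}). A sketch with hypothetical parameter values:

```python
import numpy as np
from math import factorial
from scipy.integrate import quad
from scipy.special import eval_jacobi, gamma as G

# Sketch: R_n^{alpha,beta}(x) is the Jacobi polynomial P_n^{(beta-1,alpha-1)}
# evaluated at 2x-1 and normalized to equal 1 at x = 1; its squared norm under
# the Beta(alpha, beta) distribution is 1/zeta_n of (46).
def R(n, a, b, x):
    return eval_jacobi(n, b - 1, a - 1, 2 * x - 1) / eval_jacobi(n, b - 1, a - 1, 1.0)

def beta_pdf(x, a, b):
    return x**(a - 1) * (1 - x)**(b - 1) * G(a + b) / (G(a) * G(b))

def rising(a, n):
    return G(a + n) / G(a)          # a_(n)

a, b = 2.0, 3.0                     # hypothetical (alpha, beta)
t = a + b                           # theta
for n in range(1, 5):
    norm2, _ = quad(lambda x: R(n, a, b, x)**2 * beta_pdf(x, a, b), 0, 1)
    inv_zeta = (factorial(n) * rising(a, n)
                / ((t + 2 * n - 1) * rising(t, n - 1) * rising(b, n)))
    print(n, norm2, inv_zeta)       # the two columns agree, verifying (46)
```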
3.2  2 ≤ d < ∞. Multivariate Jacobi polynomials on the simplex.

3.3  Multivariate Jacobi from right-neutrality.

A system of multivariate polynomials with respect to a Dirichlet distribution on d ≤ ∞ points can be derived by using its right-neutrality property, via Proposition 2. Let N_{d,|m|} = \{n = (n_1, \dots, n_d) \in \mathbb{N}^d : |n| = |m|\}. For every n ∈ N^{d−1} and α ∈ R^d_+, denote N_j = \sum_{i=j+1}^{d-1} n_i and A_j = \sum_{i=j+1}^d \alpha_i.

Proposition 3. For d ≤ ∞, a system of multivariate orthogonal polynomials with respect to the Dirichlet distribution D_α is given by
\[
R^\alpha_n(x) = \prod_{j=1}^{d-1} R^{(\alpha_j,\, A_j + 2N_j)}_{n_j}\Big(\frac{x_j}{1 - s_{j-1}}\Big)\, (1 - s_{j-1})^{N_j}, \qquad x \in \Delta^{(d-1)}, \tag{49}
\]
where s_j = \sum_{i=1}^j x_i.

Notice that similar systems of orthogonal polynomials could be obtained by replacing, in (49), R_{n_j} with either P^*_{n_j} (orthonormal Jacobi polynomials) or P_{n_j} as defined in (40). The choice of R_{n_j} is useful because of the standardization property
\[
R^\alpha_n(e_d) = 1, \tag{50}
\]
where e_j := (δ_{ij} : i = 1, . . . , d).
A similar definition of polynomials for the Dirichlet distribution is proposed by [17], in terms of non-shifted Jacobi polynomials \tilde{R}_n. For an alternative choice of basis, see e.g. [6].
Proof. The polynomials in Rnα (x) given in Proposition 3 admit a recursive definition as follows:
µ
¶
x2
xd
α∗
Rnα1 ,...,nd−1 (x1 , . . . , xd ) = Rn(α11 ,A1 +2N1 ) (x1 )(1 − x1 )N1 Rn22,...,nd−1
,...,
,
1 − x1
1 − x1
(51)
where αj∗ = (αj , . . . , αd ) (j ≤ d − 1); so Proposition 2 is used with l = 1, ρ(x) = 1 − x and inductively
on d. The claim is a consequence of the neutral-to-the right property and Proposition 2, for consider the
orthogonality of a term
µ
¶Nj
µ
¶
Xj
Xj
αj ,Aj
1−
Rnj
(52)
1 − Sj−1
1 − Sj−1
α
in Rnα with a similar term in Rm
for some m = (m1 , . . . , md−1 )-polynomial. Assume without loss of generality
that for some j = 1, . . . , d − 1, mk = nk for k = j + 1, . . . , d − 1 and mj < nj . Then Nj = Mj and multiplying
the product of (52) by the corresponding Beta density Dαj ,Aj (dBj )/dBj , where Bj is as in (8), gives
α −1
Bj j
αj ,Aj +2Nj
(1 − Bj )Aj +2Nj −1 Rnαjj ,Aj +2Nj (Bj )Rm
(Bj ).
j
(53)
Since $R_{n_j}$ is orthogonal to all polynomials of degree less than $n_j$ with respect to the weight measure $D_{\alpha_j,A_j+2N_j}$, the integral of (53) with respect to $dB_j$ vanishes, which proves the orthogonality.
The normalization constant for $\{R_n^\alpha\}$ can be easily derived as
$$\frac{1}{\zeta_n^\alpha} := \int_{\Delta^{(d-1)}} \left(R_n^\alpha(x)\right)^2 D_\alpha(dx) = \prod_{j=1}^{d-1} \frac{1}{\zeta_{n_j}^{\alpha_j,\,A_j+2N_j}} = \prod_{j=1}^{d-1} \frac{n_j!\,(\alpha_j)_{(n_j)}}{(A_{j-1})_{(n_j-1)}\,(A_{j-1}+2N_{j-1}-1)\,(A_j+2N_j)_{(n_j)}}. \tag{54}$$
Notice that the same construction shown in Proposition 3 could be similarly expressed in terms of the polynomials $\{P_{n_j}^{\alpha_j,A_j+2N_j}\}$ or $\{P^{*\,\alpha_j,A_j+2N_j}_{n_j}\}$ instead of $\{R_{n_j}^{\alpha_j,A_j+2N_j}\}$, the only difference resulting in the orthogonality coefficients.
CRiSM Paper No. 08-24, www.warwick.ac.uk/go/crism
3.4 Limit polynomials on the GEM distribution.
Remember that the size-biased permutation of a Dirichlet distribution is still a right-neutral distribution, so that orthogonal polynomials can be constructed in much the same way as in Proposition 3, with a similar proof.
Proposition 4. A system of orthogonal polynomials in $\ddot{D}_{|\theta|,d}$ is
$$\ddot{R}_n^{(|\theta|,d)}(x) = \prod_{j=1}^{d-1} R_{n_j}^{(|\theta|/d+1,\;\frac{d-j}{d}\theta+2N_j)}\!\left(\frac{x_j}{1-s_{j-1}}\right)\left(1-\frac{x_j}{1-s_{j-1}}\right)^{N_j}, \qquad x \in \Delta^{(d-1)},\; n \in \mathbb{N}^d. \tag{55}$$
As $d \to \infty$, $\ddot{D}_{|\theta|,d}$ converges to the so-called GEM distribution, i.e. an infinite-dimensional right-neutral distribution whose iid stick-breaking weights are Beta random variables with parameter $(1, \theta)$. Let $\ddot{D}_{|\theta|,\infty} = \lim_{d\to\infty} \ddot{D}_{|\theta|,d}$ denote the GEM distribution with parameter $|\theta|$. An immediate consequence is
Corollary 1. For |θ| > 0, an orthogonal system with respect to the weight measure D̈|θ|,∞ is given by the
polynomials:
$$\ddot{R}_n^{|\theta|}(x) = \prod_{j=1}^{\infty} R_{n_j}^{(1,\;\theta+2N_j)}\!\left(\frac{x_j}{1-s_{j-1}}\right)\left(1-\frac{x_j}{1-s_{j-1}}\right)^{N_j}, \qquad x \in \Delta_\infty,\; n \in \mathbb{N}^\infty : |n| = 0,1,\ldots. \tag{56}$$

4 Multivariate Jacobi and Multiple Laguerre polynomials.
The Laguerre polynomials, defined by
$$L^{|\alpha|}_{|n|}(y) = \frac{(|\alpha|)_{(|n|)}}{|n|!}\, {}_1F_1(-|n|;\, |\alpha|;\, y), \qquad |\alpha| > 0, \tag{57}$$
are orthogonal with respect to the Gamma density $\gamma_{|\alpha|,1}$, with constant of proportionality
$$\int_0^\infty \left[L^{|\alpha|}_{|n|}(y)\right]^2 \gamma_{|\alpha|}(dy) = \frac{|\alpha|_{(|n|)}}{|n|!}. \tag{58}$$
(Note that the usual convention is to define Laguerre polynomials in terms of the parameter $|\alpha'| := |\alpha| - 1 > -1$. Here we prefer to use a positive parameter, for consistency with the parameters of the Gamma distribution.)
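With this parameterization, both (57) and the constant (58) can be checked in exact arithmetic, since all Gamma moments are rational in $|\alpha|$: $\mathbb{E}[Y^j] = (|\alpha|)_{(j)}$ for $Y \sim$ Gamma $(|\alpha|, 1)$. A small self-contained sketch (our own check, not from the paper):

```python
from fractions import Fraction as F

def rising(a, k):                       # (a)_(k) = a (a+1) ... (a+k-1)
    out = F(1)
    for i in range(k):
        out *= a + i
    return out

def laguerre(n, a):
    """Coefficient list of L_n^{a}(y) = ((a)_(n)/n!) 1F1(-n; a; y), as in (57)."""
    lead = rising(a, n) / rising(F(1), n)
    return [lead * rising(F(-n), k) / (rising(a, k) * rising(F(1), k))
            for k in range(n + 1)]

def gamma_ip(p, q, a):
    """E[p(Y) q(Y)] for Y ~ Gamma(a, 1), using E[Y^j] = (a)_(j)."""
    return sum(ci * cj * rising(a, i + j)
               for i, ci in enumerate(p) for j, cj in enumerate(q))
```

With $a = 5/2$, `gamma_ip(laguerre(n, a), laguerre(m, a), a)` is $0$ for $n \ne m$ and equals $(a)_{(n)}/n!$ for $n = m$, matching (58).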
Remark 2. If $Y$ is a Gamma $(|\alpha|)$ random variable then, for every scale parameter $|\beta| \in \mathbb{R}_+$, the distribution of $Z := |\beta| Y$ is $\gamma_{|\alpha|,|\beta|}(dz)$. Thus the system
$$\left\{ L^{|\alpha|}_n\!\left(\frac{z}{|\beta|}\right) \right\}_{n=0,1,\ldots}$$
is orthogonal with weight measure $\gamma_{|\alpha|,|\beta|}$.
Let $Y \in \mathbb{R}^d_+$ be a random vector with distribution $\gamma^d_{\alpha,|\beta|}$. By the stochastic independence of its coordinates, orthogonal polynomials of degree $|n|$ with the distribution of $Y$ as weight measure are simply
$$L^{\alpha,|\beta|}_n(y) = \prod_{i=1}^d L^{\alpha_i}_{n_i}\!\left(\frac{y_i}{|\beta|}\right), \qquad y \in \mathbb{R}^d,\; n \in \mathbb{N}^d, \tag{59}$$
with constants of proportionality
$$\frac{1}{\varphi_n} = \mathbb{E}\left(L^{\alpha}_n(Y)\right)^2 = \prod_{i=1}^d \frac{(\alpha_i)_{(n_i)}}{n_i!}. \tag{60}$$
Therefore, with the notation introduced in Section 2.1.1, because of the one-to-one mapping
$$(Y_1,\ldots,Y_d) \mapsto (|Y|, X_1,\ldots,X_d),$$
one can obtain an alternative system of orthogonal polynomials in $y_1,\ldots,y_d$:
Proposition 5. The polynomials defined by
$$L^{\alpha,|\beta|*}_n(y) = L^{|\alpha|+2|n'|}_{n_d}\!\left(\frac{|y|}{|\beta|}\right)\left(\frac{|y|}{|\beta|}\right)^{|n'|} R^{\alpha}_{n'}\!\left(\frac{y}{|y|}\right), \qquad n \in \mathbb{N}^d,\; y \in \mathbb{R}^d, \tag{61}$$
with $n' = (n_1,\ldots,n_{d-1})$ and $R^\alpha_m$ defined by (49), are orthogonal with respect to $\gamma^d_{\alpha,|\beta|}$.
Proof. The proof of (61) is straightforward and follows immediately from Proposition 2, with l = 1, X = |Y |
and ρ(x) = x (remember that |Y | is Gamma with parameter (|α|, |β|)).
From now on we will only consider the case $|\beta| = 1$, without much loss of generality. The constant of proportionality of the resulting system $\{L^{\alpha*}_n\}$ is
$$\begin{aligned}
\frac{1}{\varphi^*_n} :&= \int_{\mathbb{R}^d} \left[L^{\alpha*}_n(y)\right]^2 \prod_{i=1}^d \gamma_{\alpha_i}(dy_i) \\
&= \int_0^\infty \left[L^{|\alpha|+2(|n|-n_d)}_{n_d}(|y|)\,|y|^{|n|-n_d}\right]^2 \gamma_{|\alpha|}(d|y|) \int_{\Delta^{(d-1)}} \left[R^\alpha_{n'}(x)\right]^2 D_\alpha(dx) \\
&= \frac{|\alpha|_{(2|n'|)}}{\zeta^{\alpha}_{n'}} \int_0^\infty \left[L^{|\alpha|+2|n'|}_{n_d}(|y|)\right]^2 \gamma_{|\alpha|+2|n'|}(d|y|) \\
&= \frac{1}{n_d!}\,\frac{|\alpha|_{(2|n'|)}\,(|\alpha|+2|n'|)_{(n_d)}}{\zeta^{\alpha}_{n'}},
\end{aligned} \tag{62}$$
where $\zeta^\alpha_{n'}$ is as in (54).
The two systems $L^\alpha_n$ and $L^{\alpha*}_n$ can be expressed as linear combinations of each other:
$$L^{\alpha*}_n(y) = \sum_{|m|=|n|} \varphi_m\, c^*_m(n)\, L^{\alpha}_m(y) \tag{63}$$
and
$$L^{\alpha}_n(y) = \sum_{|m|=|n|} \varphi^*_m\, c_m(n)\, L^{\alpha*}_m(y), \tag{64}$$
where
$$c^*_m(n)\,\delta_{|m||n|} = \mathbb{E}\left[L^{\alpha*}_n(Y)\,L^{\alpha}_m(Y)\right] = c_n(m)\,\delta_{|m||n|}.$$
For general $m, n$ a representation for $c^*_m(n)$ can be derived in terms of a mixture of Lauricella functions of the first type (type A). Such functions are defined ([18]) as
$$F_A(|a|; b; c; z) = \sum_{m \in \mathbb{N}^d} \frac{|a|_{(|m|)}\, b_{(m)}}{c_{(m)}}\, \frac{z^m}{m_1! \cdots m_d!}, \qquad a, b, c, z \in \mathbb{C}^d,$$
where $v_{(r)} := \prod_{i=1}^d (v_i)_{(r_i)}$ for every $v, r \in \mathbb{R}^d$.
Proposition 6. For every $n \in \mathbb{N}^d$ denote $n' := (n_1,\ldots,n_{d-1})$. Then
$$c^*_m(n) = \delta_{|m||n|}\, \frac{|\alpha|_{(|n|)}}{|n|!}\, DM_\alpha(m) \sum_{j=0}^{|n|} d_j \int_{\Delta^{(d-1)}} R^\alpha_{n'}(t)\, F_A(|\alpha|; -m, -j;\, \alpha, |\alpha|;\, t, 1-|t|, 1)\, D_\alpha(dt), \tag{65}$$
where
$$d_j = \sum_{i=0}^{|n'|} \frac{(-|n'|)_{(i)}\, |\alpha|_{(|n'|)}\, (|\alpha|+2|n'|)_{(n_d)}}{i!\, n_d!}\, F_A(|\alpha|; -i, -n_d, -j;\, |\alpha|, |\alpha|+2|n'|, |\alpha|;\, 1, 1, 1). \tag{66}$$
Remark 3. An equivalent representation of $c^*_m(n)$ in terms of Hahn polynomials will be given in Section 5.2.2.
Proof. The building block of the proof is the following beautiful representation due to Erdélyi ([7]): for every $|a|, |z| \in \mathbb{R}$, $\alpha, k \in \mathbb{R}^d$ and $n \in \mathbb{N}^d$,
$$\prod_{j=1}^d L^{\alpha_j}_{n_j}(k_j |z|) = \sum_{s=0}^{|n|} \varphi_s(|a|; \alpha; n; k)\, L^{|a|}_s(|z|), \tag{67}$$
where
$$\varphi_s(|a|; \alpha; n; k) = F_A(|a|; -n, -s;\, \alpha, |a|;\, k, 1) \prod_{j=1}^d \frac{(\alpha_j)_{(n_j)}}{n_j!}.$$
Now
$$\begin{aligned}
c^*_m(n) &= \mathbb{E}\left[L^{\alpha*}_n(Y)\, L^{\alpha}_m(Y)\right] \\
&= \mathbb{E}\left[ L^{|\alpha|+2|n'|}_{n_d}(|Y|)\, |Y|^{|n'|}\, R^\alpha_{n'}(T) \prod_{l=1}^d L^{\alpha_l}_{m_l}(T_l |Y|) \right] \\
&= \mathbb{E}_\alpha\left[ R^\alpha_{n'}(T)\; \mathbb{E}_{|\alpha|}\left( L^{|\alpha|+2|n'|}_{n_d}(|Y|)\, |Y|^{|n'|} \prod_{l=1}^d L^{\alpha_l}_{m_l}(T_l |Y|) \,\Big|\, T \right) \right],
\end{aligned} \tag{68}$$
where $T \in \Delta^{(d-1)}$ has distribution $D_\alpha$. The inner expectation is with respect to the Gamma $(|\alpha|)$ measure, and the outer expectation is with respect to the Dirichlet distribution.
We start by evaluating the inner expectation. Since
$$|y|^{|n'|} = \sum_{i=0}^{|n'|} (-|n'|)_{(i)}\, \frac{\Gamma(|\alpha|+|n'|)}{\Gamma(|\alpha|+i)}\, L^{|\alpha|}_i(|y|)$$
(see [7], p. 156), then
$$\begin{aligned}
L^{|\alpha|+2|n'|}_{n_d}(|y|)\, |y|^{|n'|} &= \sum_{i=0}^{|n'|} (-|n'|)_{(i)}\, \frac{\Gamma(|\alpha|+|n'|)}{\Gamma(|\alpha|+i)}\, L^{|\alpha|}_i(|y|)\, L^{|\alpha|+2|n'|}_{n_d}(|y|) \\
&= \sum_{i=0}^{|n'|} (-|n'|)_{(i)}\, \frac{\Gamma(|\alpha|+|n'|)}{\Gamma(|\alpha|+i)} \sum_{j=0}^{|n|} c_{i,j}\, L^{|\alpha|}_j(|y|) \\
&= \sum_{j=0}^{|n|} d_j\, L^{|\alpha|}_j(|y|).
\end{aligned} \tag{69}$$
The second equality in (69) is obtained by applying (67) to $L^{|\alpha|}_i(|y|)\, L^{|\alpha|+2|n'|}_{n_d}(|y|)$. There
$$c_{i,j} = \varphi_j(|\alpha|;\, |\alpha|, |\alpha|+2|n'|;\, i, n_d;\, 1, 1)$$
vanishes for $j > i + n_d$, by orthogonality of $\{L^{|\alpha|}_j\}$ and (67). The last equality is obtained just by inverting the order of summation (note that $(-|n'|)_{(i)} = 0$ for $i > |n'|$).
Now apply (67) again to $\prod_{l=1}^d L^{\alpha_l}_{m_l}(t_l |y|)$: note that, for every $x \in \mathbb{R}^d$,
$$\varphi_j(|\alpha|; \alpha; m; x) = \mathbb{E}\left[ L^{|\alpha|}_j(|Y|) \prod_{l=1}^d L^{\alpha_l}_{m_l}(x_l |Y|) \right],$$
hence by (69)
$$\begin{aligned}
\mathbb{E}_{|\alpha|}\left( L^{|\alpha|+2|n'|}_{n_d}(|Y|)\, |Y|^{|n'|} \prod_{l=1}^d L^{\alpha_l}_{m_l}(T_l |Y|) \,\Big|\, T = t \right) &= \sum_{j=0}^{|n|} d_j\, \mathbb{E}\left[ L^{|\alpha|}_j(|Y|) \prod_{l=1}^d L^{\alpha_l}_{m_l}(t_l |Y|) \right] \\
&= \sum_{j=0}^{|n|} d_j\, \varphi_j(|\alpha|; \alpha; m; t, 1-|t|).
\end{aligned} \tag{70}$$
Therefore, taking the expectation over $T$ yields
$$\begin{aligned}
c^*_m(n) &= \sum_{j=0}^{|n|} d_j\, \mathbb{E}_\alpha\left( R^\alpha_{n'}(T)\, \varphi_j(|\alpha|; \alpha; m; T) \right) \\
&= \delta_{|m||n|}\, \frac{|\alpha|_{(|n|)}}{|n|!}\, DM_\alpha(m) \sum_{j=0}^{|n|} d_j \int_{\Delta^{(d-1)}} R^\alpha_{n'}(t)\, F_A(|\alpha|; -m, -j;\, \alpha, |\alpha|;\, t, 1-|t|, 1)\, D_\alpha(dt),
\end{aligned} \tag{71}$$
which is what we wanted to prove.
Remark 4. Note that when $|n'| = 0$, $c^*_m(0,\ldots,0,n_d) = 1$, which agrees with the known identity
$$L^{\alpha+\beta}_n(x+y) = \sum_{j=0}^{n} L^{\alpha}_j(x)\, L^{\beta}_{n-j}(y), \qquad x, y \in \mathbb{R} \tag{72}$$
(see [2], formula (6.2.35), p. 191), an identity with an obvious extension to the $d$-dimensional case.
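Since (72) is a finite polynomial identity, it can be spot-checked exactly at rational arguments. The sketch below (our own check) uses the positive-parameter convention of (57):

```python
from fractions import Fraction as F

def rising(a, k):                       # (a)_(k) = a (a+1) ... (a+k-1)
    out = F(1)
    for i in range(k):
        out *= a + i
    return out

def laguerre(n, a):                     # coefficients of L_n^a, as in (57)
    lead = rising(a, n) / rising(F(1), n)
    return [lead * rising(F(-n), k) / (rising(a, k) * rising(F(1), k))
            for k in range(n + 1)]

def ev(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

def convolution_holds(n, a, b, x, y):
    """Check L_n^{a+b}(x+y) == sum_j L_j^a(x) L_{n-j}^b(y), i.e. identity (72)."""
    lhs = ev(laguerre(n, a + b), x + y)
    rhs = sum(ev(laguerre(j, a), x) * ev(laguerre(n - j, b), y)
              for j in range(n + 1))
    return lhs == rhs
```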
Remark 5. It is immediate to verify that the coefficients $c^*_m(n)$ also satisfy
$$L^{|\alpha|+2|n'|}_{|n-n'|}\!\left(|\beta^{-1}y|\right)\, |\beta^{-1}y|^{|n'|}\, R^\alpha_{n'}\!\left(\frac{y}{|y|}\right) = \sum_{|m|=|n|} \varphi_m\, c^*_m(n)\, L^{\alpha}_m(\beta^{-1} y), \qquad \beta \in \mathbb{R}_+. \tag{73}$$
4.1 Infinite-dimensional multiple Laguerre.
From Remark 1 it is possible to derive an infinite-dimensional version of $\{L^{\alpha*}_n\}$, orthogonal with respect to the law of the size-biased point process $\ddot{Y}^\infty$ obtained from the $Y^\infty$ of Section 2.2.2. Remember that $\ddot{X}^\infty := \ddot{Y}^\infty / |\ddot{Y}^\infty|$ has GEM $(|\theta|)$ distribution and is independent of $|\ddot{Y}^\infty| \stackrel{D}{=} |Y^\infty|$, which has a Gamma $(|\theta|)$ law. The proof of the following corollary is, at this point, obvious from Corollary 1 and Proposition 5.
Corollary 2. Let $\ddot{\gamma}_{|\theta|}$ be the probability distribution of the size-biased sequence $\ddot{Y}^\infty$ obtained by rearranging in size-biased random order the sequence $Y^\infty$ of points of a Poisson process with generating functional (15). The polynomials defined by
$$L^{|\theta|*}_{(|m|,n')}(y) = L^{|\theta|+2|n'|}_{|m|}(|y|)\, |y|^{|n'|}\, \ddot{R}^{|\theta|}_{n'}\!\left(\frac{y}{|y|}\right), \tag{74}$$
for $|m| \in \mathbb{N}$, $n' \in \mathbb{N}^\infty : |n'| \in \mathbb{N}$, with $\{\ddot{R}_n\}$ as in (56), form an orthogonal system with respect to $\ddot{\gamma}_{|\theta|}$.
5 Multivariate Hahn Polynomials.
5.1 Hahn polynomials on {0, . . . , N}.
As for the Laguerre polynomials, we introduce the discrete Hahn polynomials on $\{0,\ldots,N\}$ with parameters shifted by 1, to make the notation consistent with the standard probabilistic notation for the corresponding weight measure. The Hahn polynomials, orthogonal with respect to $DM_{\alpha,\beta}(\cdot\,; N)$, are defined as the hypergeometric series:
$$h^{\alpha,\beta}_n(r; N) = {}_3F_2\!\left(\begin{matrix} -n,\; n+\theta-1,\; -r \\ \alpha,\; -N \end{matrix}\,;\, 1\right), \qquad n = 0, 1, \ldots, N. \tag{75}$$
The orthogonality constants are given by
$$\frac{1}{u^{\alpha,\beta}_{N,n}} := \sum_{r=0}^{N} \left[h^{\alpha,\beta}_n(r; N)\right]^2 DM_{\alpha,\beta}(r; N) = \frac{\beta_{(n)}}{\alpha_{(n)}}\,\frac{1}{\binom{N}{n}}\,\frac{(\theta+N)_{(n)}}{\theta_{(n-1)}\,(\theta+2n-1)}.$$
A special point value is ([13], (1.15))
$$h^{\alpha,\beta}_n(N; N) = (-1)^n\, \frac{\beta_{(n)}}{\alpha_{(n)}}. \tag{76}$$
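Both the orthogonality of (75) under $DM_{\alpha,\beta}(\cdot\,; N)$ and the point value (76) are finite rational computations, so they can be verified exactly. In this sketch (our own, not from the paper) $\theta = \alpha + \beta$, as in the rest of the paper's notation:

```python
from fractions import Fraction as F
from math import comb

def rising(a, k):                       # (a)_(k) = a (a+1) ... (a+k-1)
    out = F(1)
    for i in range(k):
        out *= a + i
    return out

def hahn(n, r, N, a, b):
    """h_n^{a,b}(r; N) = 3F2(-n, n+theta-1, -r; a, -N; 1) with theta = a+b,
    as in (75); the series terminates at k = min(n, r)."""
    th = a + b
    return sum(rising(F(-n), k) * rising(n + th - 1, k) * rising(F(-r), k)
               / (rising(a, k) * rising(F(-N), k) * rising(F(1), k))
               for k in range(min(n, r) + 1))

def dm(r, N, a, b):
    """Dirichlet-Multinomial (Beta-Binomial) pmf DM_{a,b}(r; N)."""
    return F(comb(N, r)) * rising(a, r) * rising(b, N - r) / rising(a + b, N)
```

For example, with $\alpha = 3/2$, $\beta = 2$, $N = 6$, the weighted sums of products $h_n h_m$ vanish exactly for $n \ne m$, and $h_n(N; N) = (-1)^n \beta_{(n)}/\alpha_{(n)}$.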
Thus if we consider the normalization
$$q^{\alpha,\beta}_n(r; N) := \frac{h^{\alpha,\beta}_n(r; N)}{h^{\alpha,\beta}_n(N; N)},$$
then the new constant is, from (76),
$$\frac{1}{w^{\alpha,\beta}_{N,n}} := \mathbb{E}\left[q^{\alpha,\beta}_n(R; N)\right]^2 = \frac{\alpha_{(n)}}{\beta_{(n)}}\,\frac{1}{\binom{N}{n}}\,\frac{(\theta+N)_{(n)}}{\theta_{(n-1)}\,(\theta+2n-1)} = \left[\frac{(\theta+N)_{(n)}}{N_{[n]}}\right] \frac{1}{\zeta^{\alpha,\beta}_n}, \tag{77}$$
where $\zeta^{\alpha,\beta}_n$ is the Jacobi orthogonality constant, given by (46).
A symmetry relation is
$$q^{\alpha,\beta}_n(r; N) = \frac{q^{\beta,\alpha}_n(N-r; N)}{q^{\beta,\alpha}_n(0; N)}. \tag{78}$$
A well-known relationship is in the limit:
$$\lim_{N\to\infty} h^{\alpha,\beta}_n(Nz; N) = \widetilde{R}^{\alpha-1,\beta-1}_n(1-2z), \qquad \alpha, \beta > 0 \tag{79}$$
(see [13]), where the $\widetilde{R}^{a,b}_n$, standardized so that $\widetilde{R}^{a,b}_n(1) = 1$, are Jacobi polynomials orthogonal on $[-1, 1]$ as defined in Section 3.1. Because of our definition (40), combining (47), (78) and (79) gives the equivalent limit: for every $n$,
$$\lim_{N\to\infty} q^{\alpha,\beta}_n(Nz; N) = R^{\alpha,\beta}_n(z), \qquad \alpha, \beta > 0. \tag{80}$$
Note that also
$$\lim_{N\to\infty} w^{\alpha,\beta}_{N,n} = \zeta^{\alpha,\beta}_n. \tag{81}$$
An inverse relation holds as well, which allows one to derive the Hahn polynomials as a mixture of Jacobi polynomials. Denote by $B_x(r; N) = B_{x,1-x}(r, N-r)$ the Binomial distribution.
Proposition 7. The functions
$$\widetilde{q}^{\,\alpha,\beta}_n(r; N) := \int_0^1 R^{\alpha,\beta}_n(x)\, \frac{B_x(r; N)}{DM_{\alpha,\beta}(r; N)}\, D_{\alpha,\beta}(dx) \tag{82}$$
$$= \int_0^1 R^{\alpha,\beta}_n(x)\, D_{\alpha+r,\beta+N-r}(dx), \qquad n = 0, 1, \ldots, N, \tag{83}$$
form the Hahn system of orthogonal polynomials with $DM_{\alpha,\beta}$ as weight function, and are such that
$$\widetilde{q}^{\,\alpha,\beta}_n(r; N) = \frac{(\theta+N)_{(n)}}{N_{[n]}}\, q^{\alpha,\beta}_n(r; N). \tag{84}$$
The representation (83), in particular, gives a Bayesian interpretation of the Hahn polynomials, as a posterior mixture of Jacobi polynomials evaluated at a random Bernoulli probability of success $X$, conditionally on having previously observed $r \in \{0,\ldots,N\}$ successes out of $N$ independent Bernoulli $(X)$ trials, where $X$ has a Beta $(\alpha, \beta)$ prior distribution.
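The posterior-mixture formula (83) is easy to exercise exactly, since all Beta posterior moments are rational. The sketch below (our own illustration) builds $R^{\alpha,\beta}_n$ by Gram–Schmidt — any orthogonal version works for this check, since (83) is linear in $R_n$ — and confirms that the resulting $\widetilde{q}_n$ are orthogonal under the Beta-Binomial weight $DM_{\alpha,\beta}(\cdot\,; N)$.

```python
from fractions import Fraction as F
from math import comb

def rising(a, k):                       # (a)_(k) = a (a+1) ... (a+k-1)
    out = F(1)
    for i in range(k):
        out *= a + i
    return out

def beta_moment(a, b, k):               # E[X^k], X ~ Beta(a, b)
    return rising(a, k) / rising(a + b, k)

def jacobi_poly(n, a, b):
    """Degree-n orthogonal polynomial for Beta(a, b) on [0, 1] (Gram-Schmidt)."""
    def ip(p, q):
        return sum(ci * cj * beta_moment(a, b, i + j)
                   for i, ci in enumerate(p) for j, cj in enumerate(q))
    polys = []
    for deg in range(n + 1):
        p = [F(0)] * deg + [F(1)]
        for q in polys:
            c = ip(p, q) / ip(q, q)
            p = [pi - c * (q[i] if i < len(q) else F(0)) for i, pi in enumerate(p)]
        polys.append(p)
    return polys[n]

def q_tilde(n, r, N, a, b):
    """(83): integrate R_n^{a,b} against the posterior Beta(a+r, b+N-r)."""
    return sum(c * beta_moment(a + r, b + N - r, k)
               for k, c in enumerate(jacobi_poly(n, a, b)))

def dm(r, N, a, b):                     # Beta-Binomial pmf DM_{a,b}(r; N)
    return F(comb(N, r)) * rising(a, r) * rising(b, N - r) / rising(a + b, N)
```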
Proof. The integral defined by (82) is a polynomial: consider
$$\int_0^1 x^n (1-x)^m\, \frac{B_x(r; N)}{DM_{\alpha,\beta}(r; N)}\, D_{\alpha,\beta}(dx) = \frac{\alpha_{(n+r)}\, \beta_{(N+m-r)}\, \theta_{(N)}}{\alpha_{(r)}\, \beta_{(N-r)}\, \theta_{(N+n+m)}} = \frac{(\alpha+r)_{(n)}\, (\beta+N-r)_{(m)}}{(\theta+N)_{(n+m)}}.$$
The numerator is a polynomial in $(r, N-r)$ of order $n + m$. Write
$$R^{\alpha,\beta}_n(x) = \sum_{j=0}^{n} c_j x^j;$$
then
$$\int_0^1 R^{\alpha,\beta}_n(x)\, \frac{B_x(r; N)}{DM_{\alpha,\beta}(r; N)}\, D_{\alpha,\beta}(dx) = \sum_{j=0}^{n} \frac{c_j}{(\theta+N)_{(j)}}\, (\alpha+r)_{(j)} = \sum_{j=0}^{n} \frac{c_j}{(\theta+N)_{(j)}}\, r_{[j]} + L, \tag{85}$$
where $L$ is a polynomial in $r$ of order less than $n$. Then $\widetilde{q}^{\,\alpha,\beta}_n(r; N)$ is a polynomial of order $n$ in $r$.
To show orthogonality it is sufficient to show that the $\widetilde{q}_n$ are orthogonal to the polynomials of the basis formed by the falling factorials $\{r_{[l]},\, l = 0, 1, \ldots\}$. For $l \le n$,
$$\sum_{r=0}^{N} DM_{\alpha,\beta}(r; N)\, r_{[l]}\, \widetilde{q}^{\,\alpha,\beta}_n(r; N) = \frac{N!}{(N-l)!} \int_0^1 x^l R^{\alpha,\beta}_n(x) \left[\sum_{r=l}^{N} \binom{N-l}{r-l} x^{r-l} (1-x)^{N-r}\right] D_{\alpha,\beta}(dx) = N_{[l]} \int_0^1 x^l R^{\alpha,\beta}_n(x)\, D_{\alpha,\beta}(dx). \tag{86}$$
The last integral is nonzero only if $l = n$, which proves the orthogonality of $\widetilde{q}^{\,\alpha,\beta}_n(r; N)$.
Now consider that, in $R^{\alpha,\beta}_n(x)$, the leading coefficient $c_n$ satisfies
$$c_n \int_0^1 x^n R^{\alpha,\beta}_n(x)\, D_{\alpha,\beta}(dx) = \int_0^1 \left[R^{\alpha,\beta}_n(x)\right]^2 D_{\alpha,\beta}(dx) = \frac{1}{\zeta^{\alpha,\beta}_n}.$$
That is,
$$\begin{aligned}
\frac{1}{\omega^{\alpha,\beta}_{N,n}} = \sum_{r=0}^{N} DM_{\alpha,\beta}(r; N)\, \widetilde{q}^{\,\alpha,\beta}_n(r; N)\, \widetilde{q}^{\,\alpha,\beta}_n(r; N) &= \sum_{r=0}^{N} DM_{\alpha,\beta}(r; N) \left[\sum_{j=0}^{n} \frac{c_j}{(\theta+N)_{(j)}}\, r_{[j]} + L'\right] \widetilde{q}^{\,\alpha,\beta}_n(r; N) \\
&= \frac{N_{[n]}}{(\theta+N)_{(n)}}\, c_n \int_0^1 x^n R^{\alpha,\beta}_n(x)\, D_{\alpha,\beta}(dx) = \frac{N_{[n]}}{(\theta+N)_{(n)}}\, \frac{1}{\zeta^{\alpha,\beta}_n}. 
\end{aligned} \tag{87}$$
Therefore
$$\omega^{\alpha,\beta}_{N,n} = \left[\frac{(\theta+N)_{(n)}}{N_{[n]}}\right]^2 w^{\alpha,\beta}_{N,n} \tag{88}$$
with $w^{\alpha,\beta}_{N,n}$ as in (77), and the identity (84) follows, completing the proof.
5.2 Multivariate polynomials on the Dirichlet-Multinomial distribution.
Multivariate polynomials orthogonal with respect to $DM_\alpha$ on the discrete $d$-dimensional simplex were first introduced by Karlin and McGregor [12], as eigenfunctions of the birth-and-death process with neutral mutation. Here we give an alternative derivation, as a posterior mixture of multivariate Jacobi polynomials, which extends Proposition 7 to a multivariate setting.
Proposition 8. For every $\alpha \in \mathbb{R}^d_+$, a system of polynomials orthogonal with respect to $DM_\alpha$ is given by
$$\widetilde{q}^{\,\alpha}_n(r; |r|) = \int_{\Delta^{(d-1)}} R^\alpha_n(x)\, \frac{B_x(r)}{DM_\alpha(r)}\, D_\alpha(dx) \tag{89}$$
$$= \int_{\Delta^{(d-1)}} R^\alpha_n(x)\, D_{\alpha+r}(dx), \qquad |n| = |r|, \tag{90}$$
$$= \left(\frac{\prod_{j=1}^{d-1} (A_j + R_j + N_{j+1})_{(n_{j+1})}}{(|\alpha|+|r|)_{(N_1)}}\right) \prod_{j=1}^{d} \widetilde{q}^{\,\alpha_j, A_j+2N_j}_{n_j}(r_j;\, R_{j-1} - N_j), \tag{91}$$
with constant of orthogonality given by
$$\frac{1}{\omega_n(\alpha; |r|)} := \mathbb{E}\left[\widetilde{q}^{\,\alpha}_n(R; |r|)\right]^2 = \frac{|r|_{[|n|]}}{(|\alpha|+|r|)_{(|n|)}}\, \frac{1}{\zeta^\alpha_n}. \tag{92}$$
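As with Proposition 7, the posterior-mixture form (90) can be exercised in exact arithmetic. The sketch below (our own check, for $d = 3$) builds $R^\alpha_n$ as in (49) from Gram–Schmidt factors, integrates it against the posterior Dirichlet $D_{\alpha+r}$, and verifies orthogonality under $DM_\alpha$ over all compositions $r$ of a fixed total $|r| = |n|$.

```python
from fractions import Fraction as F
from math import comb, factorial

def rising(a, k):                       # (a)_(k) = a (a+1) ... (a+k-1)
    out = F(1)
    for i in range(k):
        out *= a + i
    return out

def jacobi_poly(n, a, b):               # Gram-Schmidt, Beta(a, b) weight on [0, 1]
    def ip(p, q):
        return sum(ci * cj * rising(a, i + j) / rising(a + b, i + j)
                   for i, ci in enumerate(p) for j, cj in enumerate(q))
    polys = []
    for deg in range(n + 1):
        p = [F(0)] * deg + [F(1)]
        for q in polys:
            c = ip(p, q) / ip(q, q)
            p = [pi - c * (q[i] if i < len(q) else F(0)) for i, pi in enumerate(p)]
        polys.append(p)
    return polys[n]

def R(n1, n2, alpha):                   # bivariate R^alpha_{(n1,n2)} as in (49), d = 3
    a1, a2, a3 = alpha
    p1, p2 = jacobi_poly(n1, a1, a2 + a3 + 2 * n2), jacobi_poly(n2, a2, a3)
    poly = {}
    for i, ci in enumerate(p1):
        for k, ck in enumerate(p2):
            for m in range(n2 - k + 1):
                key = (i + m, k)
                poly[key] = poly.get(key, F(0)) + ci * ck * F((-1) ** m * comb(n2 - k, m))
    return poly

def q_tilde(n, r, alpha):
    """(90): integrate R_n^alpha against the posterior Dirichlet(alpha + r)."""
    a = [alpha[i] + r[i] for i in range(3)]
    return sum(c * rising(a[0], i) * rising(a[1], j) / rising(sum(a), i + j)
               for (i, j), c in R(n[0], n[1], alpha).items())

def dm(r, alpha):                       # Dirichlet-Multinomial pmf on compositions
    mult = F(factorial(sum(r))) / (factorial(r[0]) * factorial(r[1]) * factorial(r[2]))
    num = rising(alpha[0], r[0]) * rising(alpha[1], r[1]) * rising(alpha[2], r[2])
    return mult * num / rising(sum(alpha), sum(r))
```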
Proof. The identity between (89) and (90) is obvious from Section 2.4, and (91) follows from Proposition 7 and some simple algebra. For every $n \in \mathbb{N}^d$,
$$\int_{\Delta^{(d-1)}} x^n\, D_{\alpha+r}(dx) = \prod_{i=1}^{d-1} \frac{(\alpha_i+r_i)_{(n_i)}\, (A_i+R_i)_{(N_i)}}{(A_{i-1}+R_{i-1})_{(N_{i-1})}} = \frac{\prod_{i=1}^{d} (\alpha_i+r_i)_{(n_i)}}{(|\alpha|+|r|)_{(|n|)}} = \frac{1}{(|\alpha|+|r|)_{(|n|)}} \prod_{i=1}^{d} r_{i\,[n_i]} + L, \tag{93}$$
where $L$ is a polynomial in $r$ of order less than $|n|$. Therefore the $\widetilde{q}^{\,\alpha}_n(r; |r|)$ are polynomials of order $|n|$ in $r$.
To show that they are orthogonal, denote
$$p_l(r) := \prod_{i=1}^d (r_i)_{[l_i]}$$
and consider that, for every $l \in \mathbb{N}^d : |l| \le |n| = |r|$,
$$\sum_{|m|=|r|} DM_\alpha(m; |r|)\, p_l(m)\, \widetilde{q}^{\,\alpha}_n(m; |r|) = \frac{|r|!}{(|r|-|l|)!} \int_{\Delta^{(d-1)}} x^l R^\alpha_n(x) \left[\sum_{|m|=|r|} \binom{|r-l|}{m-l} x^{m-l}\right] D_\alpha(dx) = |r|_{[|l|]} \int_{\Delta^{(d-1)}} x^l R^\alpha_n(x)\, D_\alpha(dx), \tag{94}$$
which, by orthogonality of $R_n$, is nonzero only if $|l| = |n|$. Since it is always possible to write, for appropriate coefficients $c_{nm}$,
$$R^\alpha_n(x) = \sum_{|m|=|n|} c_{nm}\, x^m + C,$$
where $C$ is a polynomial of order less than $|n|$ in $x$, then
$$\widetilde{q}^{\,\alpha}_s(r; |r|) = \sum_{|m|=|s|} \frac{c_{sm}}{(|\alpha|+|r|)_{(|s|)}}\, p_m(r) + C'$$
and, by (94),
$$\begin{aligned}
\mathbb{E}\left[\widetilde{q}^{\,\alpha}_s(R; |r|)\, \widetilde{q}^{\,\alpha}_n(R; |r|)\right] &= \sum_{|k|=|s|} \frac{c_{sk}}{(|\alpha|+|r|)_{(|s|)}}\, \mathbb{E}\left[p_k(R)\, \widetilde{q}^{\,\alpha}_n(R; |r|)\right] + C'' \\
&= |r|_{[|n|]} \sum_{|k|=|s|} \frac{c_{sk}}{(|\alpha|+|r|)_{(|s|)}} \int_{\Delta^{(d-1)}} x^k R^\alpha_n(x)\, D_\alpha(dx) \\
&= \frac{|r|_{[|n|]}}{(|\alpha|+|r|)_{(|n|)}}\, \frac{1}{\zeta^\alpha_n}\, \delta_{sn}, \qquad |n| = |r|.
\end{aligned}$$
Remark 6. Note that the representation (91) holds also for negative parameters, so that, if we replace $\alpha$ with $-\epsilon$ ($\epsilon \in \mathbb{R}^d_+$), then (91) is a representation for polynomials orthogonal with respect to the Hypergeometric distribution (Section 2.3.3).
5.2.1 Bernstein-Bézier coefficients of Jacobi polynomials.
As anticipated in the Introduction, Proposition 8 gives a probabilistic proof of a recent result of [22], namely that Hahn polynomials are the Bernstein-Bézier coefficients of the multivariate Jacobi polynomials. Remember that the Bernstein polynomials, when taken on the simplex, are essentially the multinomial distributions $B_x(n) = \binom{|n|}{n} x^n$, seen as functions of $x$.
Corollary 3. For every $d \in \mathbb{N}$, $\alpha \in \mathbb{R}^d_+$, $r \in \mathbb{N}^d$,
$$R^\alpha_r(x) = \omega_r(\alpha; |r|) \sum_{|m|=|r|} \widetilde{q}^{\,\alpha}_r(m; |r|)\, B_x(m), \tag{95}$$
where $\omega_r(\alpha; |r|)$ is given by (92).
Proof. From Proposition 8,
$$DM_\alpha(m; |r|)\, \widetilde{q}^{\,\alpha}_r(m; |r|) = \mathbb{E}\left[B_X(m)\, R^\alpha_r(X)\right],$$
so
$$B_x(m) = DM_\alpha(m; |m|) \sum_{|n|=0}^{|m|} \widetilde{q}^{\,\alpha}_n(m; |m|)\, R^\alpha_n(x).$$
Hence
$$\sum_{m} \widetilde{q}^{\,\alpha}_r(m; |r|)\, B_x(m) = \sum_{|n|=0}^{|r|} \left[\sum_{|m|=|r|} DM_\alpha(m; |r|)\, \widetilde{q}^{\,\alpha}_r(m; |r|)\, \widetilde{q}^{\,\alpha}_n(m; |r|)\right] R^\alpha_n(x) = \sum_{|n|=0}^{|r|} \frac{1}{\omega_r(\alpha; |r|)}\, \delta_{rn}\, R^\alpha_n(x) = \frac{1}{\omega_r(\alpha; |r|)}\, R^\alpha_r(x), \tag{96}$$
which completes the proof.
Remark 7. By a similar argument it is easy to recover (89) from (95).
5.2.2 The connection coefficients of Proposition 6.
Consider again the connection coefficients $c^*_n(m)$ of Proposition 6 and their representation (65)-(66). An alternative representation can be given in terms of multivariate Hahn polynomials.
Corollary 4. Let $c^*_n(m)$ be the connection coefficients between $L^{\alpha*}_n$ and $L^{\alpha}_m$, as in Section 4. Then
$$c^*_n(m) = \delta_{|m||n|}\, b^{|\alpha|}_{|n|,n_d}\, DM_\alpha(m) \sum_{|r|=0}^{|n|} \frac{(-m)_{(r)}}{\prod_{l=1}^d r_l!}\, \widetilde{q}^{\,\alpha}_{n'}(r; |r|), \tag{97}$$
where $n' = (n_1,\ldots,n_{d-1})$,
$$b^{|\alpha|}_{|n|,n_d} = \frac{|\alpha|_{(|n|)}}{|n|!} \sum_{j=0}^{|n|} \frac{d_j}{j!\, |\alpha|_{(j)}},$$
and $d_j$ is as in (66).
Proof. It is sufficient to use the explicit expression of the Lauricella function $F_A$ in (65) to see that
$$\begin{aligned}
c^*_m(n) &= \delta_{|m||n|}\, \frac{|\alpha|_{(|n|)}}{|n|!}\, DM_\alpha(m) \left[\sum_{j=0}^{|n|} \frac{d_j}{j!\, |\alpha|_{(j)}}\right] \sum_{|r|=0}^{|n|} \frac{(-m)_{(r)}}{\prod_{l=1}^d r_l!} \int_{\Delta^{(d-1)}} \frac{\binom{|r|}{r}\, t^r\, R^\alpha_{n'}(t)}{DM_\alpha(r)}\, D_\alpha(dt) \\
&= \delta_{|m||n|}\, b^{|\alpha|}_{|n|,n_d}\, DM_\alpha(m) \sum_{|r|=0}^{|n|} \frac{(-m)_{(r)}}{\prod_{l=1}^d r_l!}\, \widetilde{q}^{\,\alpha}_{n'}(r; |r|).
\end{aligned} \tag{98}$$
6 Multivariate Hahn and multiple Meixner polynomials.
The Meixner polynomials on $\{0, 1, 2, \ldots\}$, defined by
$$M_n(k; \alpha, p) = {}_2F_1\!\left(\begin{matrix} -n,\; -k \\ \alpha \end{matrix}\,;\, \frac{p-1}{p}\right), \qquad \alpha > 0,\; p \in (0,1), \tag{99}$$
are orthogonal with respect to the Negative Binomial distribution $NB_{\alpha,p}$. The following representation of the Meixner polynomials comes from the interpretation of $NB_{\alpha,p}$ as a Gamma mixture of Poisson likelihoods (formula (23)).
Proposition 9. For $\alpha \in \mathbb{R}_+$ and $p \in (0,1)$, a system of orthogonal polynomials with the Negative Binomial $(\alpha, p)$ distribution as weight measure is given by
$$\widetilde{M}^{\alpha,p}_n(k) = \int_0^\infty L^\alpha_n\!\left(\lambda\, \frac{1-p}{p}\right) \frac{Po_\lambda(k)}{NB_{\alpha,p}(k)}\, \gamma_{\alpha,\frac{p}{1-p}}(d\lambda) \tag{100}$$
$$= \int_0^\infty L^\alpha_n\!\left(\lambda\, \frac{1-p}{p}\right) \gamma_{\alpha+k,p}(d\lambda), \qquad n = 0, 1, \ldots, \tag{101}$$
where the $L^\alpha_n$ are Laguerre polynomials with parameter $\alpha$.
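Since all the Gamma moments involved are rational, (101) can be compared directly with the hypergeometric form (99) in exact arithmetic. One finds (the proportionality constant $(\alpha)_{(n)} p^n / n!$ below is our own computation, not stated in the text) that the mixture representation reproduces (99) exactly up to that constant:

```python
from fractions import Fraction as F

def rising(a, k):                       # (a)_(k) = a (a+1) ... (a+k-1)
    out = F(1)
    for i in range(k):
        out *= a + i
    return out

def laguerre(n, a):                     # coefficients of L_n^a, as in (57)
    lead = rising(a, n) / rising(F(1), n)
    return [lead * rising(F(-n), k) / (rising(a, k) * rising(F(1), k))
            for k in range(n + 1)]

def meixner_2f1(n, k, a, p):
    """(99): M_n(k; a, p) = 2F1(-n, -k; a; (p-1)/p)."""
    z = (p - 1) / p
    return sum(rising(F(-n), j) * rising(F(-k), j) * z ** j
               / (rising(a, j) * rising(F(1), j))
               for j in range(min(n, k) + 1))

def meixner_mix(n, k, a, p):
    """(101): integrate L_n^a(lambda (1-p)/p) against Gamma(a+k, scale p),
    using the exact moments E[lambda^j] = (a+k)_(j) p^j."""
    c = (1 - p) / p
    return sum(cj * c ** j * rising(a + k, j) * p ** j
               for j, cj in enumerate(laguerre(n, a)))
```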
Proof. For every $n$, consider that
$$\int_0^\infty \lambda^n\, \gamma_{\alpha+k,p}(d\lambda) = \int_0^\infty \lambda^n\, \frac{\lambda^{\alpha+k-1}\, e^{-\lambda/p}}{\Gamma(\alpha+k)\, p^{\alpha+k}}\, d\lambda = (\alpha+k)_{(n)}\, p^n.$$
So every polynomial in $\lambda$ of order $n$ is mapped to a polynomial in $k$ of the same order.
To show orthogonality it is, again, sufficient to consider polynomials in the basis $\{k_{[m]} : m = 0, 1, \ldots\}$. Let $m \le n$. Then
$$\begin{aligned}
\sum_{k=0}^\infty NB_{\alpha,p}(k)\, k_{[m]}\, \widetilde{M}^{\alpha,p}_n(k) &= \sum_{k=0}^\infty \frac{\alpha_{(k)}}{k!}\, p^k (1-p)^\alpha\, k_{[m]} \int_0^\infty L^\alpha_n\!\left(\lambda\, \frac{1-p}{p}\right) \frac{\lambda^{\alpha+k-1}\, e^{-\lambda/p}}{\Gamma(\alpha+k)\, p^{\alpha+k}}\, d\lambda \\
&= \int_0^\infty L^\alpha_n\!\left(\lambda\, \frac{1-p}{p}\right) \left\{\sum_{k=0}^\infty k_{[m]}\, Po_\lambda(k)\right\} \gamma_{\alpha,\frac{p}{1-p}}(d\lambda) \\
&= \int_0^\infty L^\alpha_n\!\left(\lambda\, \frac{1-p}{p}\right) \lambda^m\, \gamma_{\alpha,\frac{p}{1-p}}(d\lambda),
\end{aligned} \tag{102}$$
where the last line comes from the fact that, if $K$ is a Poisson $(\lambda)$ random variable, then
$$\mathbb{E}_\lambda(K_{[n]}) = \lambda^n, \qquad n = 0, 1, 2, \ldots.$$
Now, consider the change of measure induced by
$$z := \lambda\, \frac{1-p}{p}.$$
The last line of (102) reads
$$\left(\frac{p}{1-p}\right)^m \int_0^\infty L^\alpha_n(z)\, z^m\, \gamma_{\alpha,1}(dz).$$
The integral vanishes for every $m < n$, and therefore the orthogonality is proved.
From Lemma 1, by using Propositions 9, 8 and 6, and Remark 5, it is possible to find the following alternative systems of multivariate Meixner polynomials, orthogonal with respect to $NB^d_{\alpha,p}(r)$.
Proposition 10. Let $\alpha \in \mathbb{R}^d_+$ and $p \in (0,1)$.
(i) Two systems of multivariate orthogonal polynomials with weight measure $NB^d_{\alpha,p}(r)$ are:
$$\widetilde{M}^{\alpha,p}_n(r) = \prod_{i=1}^d \widetilde{M}^{\alpha_i,p}_{n_i}(r_i), \qquad n \in \mathbb{N}^d, \tag{103}$$
and
$$^*\widetilde{M}^{\alpha,p}_n(r) = (1-p)^{|n'|}\, \widetilde{M}^{|\alpha|+2|n'|,p}_{n_d}(|r| - |n'|)\, (|\alpha+r|)_{(|n'|)}\, \widetilde{q}^{\,\alpha}_{n'}(r; |r|), \qquad n \in \mathbb{N}^d, \tag{104}$$
where $n' = (n_1,\ldots,n_{d-1})$, the $\{\widetilde{M}^{\alpha_i,p}_{n_i}\}$ are Meixner polynomials as in Proposition 9, and the $\widetilde{q}^{\,\alpha}$ are the multivariate Hahn polynomials defined in Proposition 8.
(ii) A representation for these polynomials is:
$$\widetilde{M}^{\alpha,p}_n(r) = \int_{\mathbb{R}^d_+} L^\alpha_n\!\left(\lambda\, \frac{1-p}{p}\right) \frac{Po^d_\lambda(r)}{NB^d_{\alpha,p}(r)}\, \gamma^d_{\alpha,\frac{p}{1-p}}(d\lambda) \tag{105}$$
$$= \int_{\mathbb{R}^d_+} L^\alpha_n\!\left(\lambda\, \frac{1-p}{p}\right) \gamma^d_{\alpha+r,p}(d\lambda), \tag{106}$$
and
$$^*\widetilde{M}^{\alpha,p}_n(r) = \int_{\mathbb{R}^d_+} L^{\alpha*}_n\!\left(\lambda\, \frac{1-p}{p}\right) \frac{Po^d_\lambda(r)}{NB^d_{\alpha,p}(r)}\, \gamma^d_{\alpha,\frac{p}{1-p}}(d\lambda) \tag{107}$$
$$= \int_{\mathbb{R}^d_+} L^{\alpha*}_n\!\left(\lambda\, \frac{1-p}{p}\right) \gamma^d_{\alpha+r,p}(d\lambda), \tag{108}$$
where $\{L^\alpha_n\}$ and $\{L^{\alpha*}_n\}$ are given by (59) and (61), and
$$\gamma^d_{\alpha,|\beta|}(dz) := \prod_{i=1}^d \gamma_{\alpha_i,|\beta|}(dz_i), \qquad |\beta| \in \mathbb{R}_+,\; z \in \mathbb{R}^d_+.$$
(iii) The connection coefficients between $\{\widetilde{M}^{\alpha}_n\}$ and $\{^*\widetilde{M}^{\alpha}_n\}$ are given by
$$\mathbb{E}\left[\,^*\widetilde{M}^{\alpha,p}_n(R)\, \widetilde{M}^{\alpha,p}_m(R)\right] = c^*_m(n), \tag{109}$$
where the $c^*_m(n)$ are as in (65) or (97).
Proof. (103) is trivial, and (105)-(106) follow from (100)-(101).
Let us first prove (107)-(108). For every $z \in \mathbb{R}^d_+$, denote $x = z/|z|$. Consider that
$$\gamma^d_{\alpha,|\beta|}(dz) = \gamma_{|\alpha|,|\beta|}(d|z|)\, D_\alpha(dx)$$
and that
$$Po^d_z(r) = Po_{|z|}(|r|)\, B_x(r).$$
Combining this with Lemma 1,
$$\begin{aligned}
\int_{\mathbb{R}^d_+} L^{\alpha*}_n&\!\left(\lambda\, \frac{1-p}{p}\right) \frac{Po^d_\lambda(r)}{NB^d_{\alpha,p}(r)}\, \gamma^d_{\alpha,\frac{p}{1-p}}(d\lambda) \\
&= \left(\int_{\mathbb{R}_+} \frac{Po_{|\lambda|}(|r|)}{NB_{|\alpha|,p}(|r|)}\, L^{|\alpha|+2|n'|}_{n_d}\!\left(|\lambda|\, \frac{1-p}{p}\right) \left[|\lambda|\, \frac{1-p}{p}\right]^{|n'|} \gamma_{|\alpha|,\frac{p}{1-p}}(d|\lambda|)\right) \times \left(\int_{\Delta^{(d-1)}} \frac{B_x(r)}{DM_\alpha(r; |r|)}\, R^\alpha_{n'}(x)\, D_\alpha(dx)\right).
\end{aligned} \tag{110}$$
From Proposition 8, the last integral in (110) is equal to $\widetilde{q}^{\,\alpha}_{n'}(r; |r|)$.
The first integral can be rewritten as
$$\begin{aligned}
\int_{\mathbb{R}_+} &L^{|\alpha|+2|n'|}_{n_d}\!\left(|\lambda|\, \frac{1-p}{p}\right) \left[|\lambda|\, \frac{1-p}{p}\right]^{|n'|} \gamma_{|\alpha|+|r|,p}(d|\lambda|) \\
&= (1-p)^{|n'|}\, (|\alpha+r|)_{(|n'|)} \int_0^\infty L^{|\alpha|+2|n'|}_{n_d}\!\left(|\lambda|\, \frac{1-p}{p}\right) \frac{|\lambda|^{|\alpha+r+n'|-1}\, e^{-|\lambda|/p}}{\Gamma(|\alpha+r+n'|)\, p^{|\alpha+r+n'|}}\, d|\lambda| \\
&= (1-p)^{|n'|}\, (|\alpha+r|)_{(|n'|)}\, \widetilde{M}^{|\alpha|+2|n'|,p}_{n_d}(|r| - |n'|).
\end{aligned} \tag{111}$$
The last line in (111) is obtained from (101) by rewriting $|n'| = 2|n'| - |n'|$ in the mixing measure. Thus the identities (107)-(108) are proved.
To prove part (iii), simply use (63), with coefficients given by Proposition 6, to see that (105)-(106) and (107)-(108) imply
$$\begin{aligned}
^*\widetilde{M}^{\alpha,p}_n(r) &= \mathbb{E}_{\alpha+r,p}\left[ L^{\alpha*}_n\!\left(\lambda\, \frac{1-p}{p}\right) \right] = \mathbb{E}_{\alpha+r,p}\left[ \sum_{|m|=|n|} c^*_m(n)\, L^{\alpha}_m\!\left(\lambda\, \frac{1-p}{p}\right) \right] \\
&= \sum_{|m|=|n|} c^*_m(n)\, \mathbb{E}_{\alpha+r,p}\left[ L^{\alpha}_m\!\left(\lambda\, \frac{1-p}{p}\right) \right] = \sum_{|m|=|n|} c^*_m(n)\, \widetilde{M}^{\alpha,p}_m(r).
\end{aligned}$$
This is equivalent to (109), because of the orthogonality of the $\widetilde{M}^{\alpha,p}_m(R)$.
But (109) also implies that $\{^*\widetilde{M}^{\alpha,p}_n(r)\}$ is an orthogonal system with $NB^d_{\alpha,p}$ as weight measure since, for every polynomial $r_{[l]}$ of degree $|l| \le |n|$,
$$\sum_{r \in \mathbb{N}^d} NB^d_{\alpha,p}(r)\, {}^*\widetilde{M}^{\alpha,p}_n(r)\, r_{[l]} = \sum_{|m|=|n|} c^*_m(n) \left[\sum_{r \in \mathbb{N}^d} NB^d_{\alpha,p}(r)\, \widetilde{M}^{\alpha,p}_m(r)\, r_{[l]}\right].$$
The term between brackets is non-zero only for $|l| = |m| = |n|$, which implies orthogonality, so the proof of the proposition is now complete.
6.1 The Bernstein-Bézier coefficients of the multiple Laguerre polynomials.
The representation of the Meixner polynomials given in Proposition 10 leads, not surprisingly, to interpreting them as the Bernstein-Bézier coefficients of the multiple Laguerre polynomials (for any choice of basis), up to proportionality constants. Note that, for products of Poisson distributions, we can write
$$Po^d_\lambda(r) = \prod_{i=1}^d \frac{e^{-\lambda_i}\, \lambda_i^{r_i}}{r_i!} = \frac{e^{-|\lambda|}}{|r|!}\, B_\lambda(r). \tag{112}$$
To simplify the notation, let $(L_m, M_m)$ denote either $(L^\alpha_m, \widetilde{M}^{\alpha,p}_m)$ or $(L^{\alpha*}_m, {}^*\widetilde{M}^{\alpha,p}_m)$, for some $\alpha \in \mathbb{R}^d_+$ and $p \in (0,1)$, and set $\rho_r(\alpha,p)^{-1} := \mathbb{E}[M_r^2]$.
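The bookkeeping in (112) — note the factorial acts on the integer $|r|$, not on the real $|\lambda|$ — is just the multinomial identity $\prod_i \lambda_i^{r_i}/r_i! = \frac{1}{|r|!}\binom{|r|}{r}\lambda^r$, which a short exact check (our own) confirms:

```python
from fractions import Fraction as F
from math import factorial, prod

def po_weight(lam, r):
    """Product-Poisson mass Po^d_lam(r) with the common e^{-|lam|} factor dropped."""
    return prod(l ** k / F(factorial(k)) for l, k in zip(lam, r))

def bernstein(lam, r):
    """B_lam(r) = (|r|! / prod r_i!) lam^r, the multinomial form of B_x(n) in (2)."""
    mult = F(factorial(sum(r))) / prod(factorial(k) for k in r)
    return mult * prod(l ** k for l, k in zip(lam, r))
```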
Corollary 5.
$$\frac{e^{-|\lambda|}}{|r|!}\, L_r\!\left(\lambda\, \frac{1-p}{p}\right) = \rho_r(\alpha,p) \sum_{m} M_r(m)\, B_\lambda(m). \tag{113}$$
Proof. The proof is along the same lines as for Corollary 3. From (105)-(107),
$$\mathbb{E}\left[ L_n\!\left(Y\, \frac{1-p}{p}\right) Po^d_Y(m) \right] = M_n(m)\, NB^d_{\alpha,p}(m), \qquad n, m \in \mathbb{N}^d.$$
Then from (112),
$$B_\lambda(m) = |m|!\, e^{|\lambda|}\, NB^d_{\alpha,p}(m) \sum_{n} M_n(m)\, L_n\!\left(\lambda\, \frac{1-p}{p}\right).$$
So for every $r \in \mathbb{N}^d$,
$$\begin{aligned}
\sum_{m} M_r(m)\, B_\lambda(m) &= |r|!\, e^{|\lambda|} \sum_{n} \left[\sum_{m} NB^d_{\alpha,p}(m)\, M_n(m)\, M_r(m)\right] L_n\!\left(\lambda\, \frac{1-p}{p}\right) \\
&= |r|!\, e^{|\lambda|} \sum_{n} \frac{1}{\rho_r(\alpha,p)}\, \delta_{nr}\, L_n\!\left(\lambda\, \frac{1-p}{p}\right) = \frac{|r|!\, e^{|\lambda|}}{\rho_r(\alpha,p)}\, L_r\!\left(\lambda\, \frac{1-p}{p}\right),
\end{aligned}$$
and the proof is complete.
References
[1] M. Abramowitz and I. A. Stegun. Handbook of mathematical functions with formulas, graphs, and
mathematical tables, volume 55 of National Bureau of Standards Applied Mathematics Series. For sale
by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C., 1964.
[2] G. E. Andrews, R. Askey, and R. Roy. Special functions, volume 71 of Encyclopedia of Mathematics
and its Applications. Cambridge University Press, Cambridge, 1999.
[3] R. Arratia, A. D. Barbour, and S. Tavaré. Logarithmic combinatorial structures: a probabilistic approach.
EMS Monographs in Mathematics. European Mathematical Society (EMS), Zürich, 2003.
[4] S. Bochner. Positive zonal functions on spheres. Proc. Nat. Acad. Sci. U. S. A., 40:1141–1147, 1954.
[5] K. Doksum. Tailfree and neutral random probabilities and their posterior distributions. Ann. Probability,
2:183–201, 1974.
[6] C. F. Dunkl and Y. Xu. Orthogonal polynomials of several variables, volume 81 of Encyclopedia of
Mathematics and its Applications. Cambridge University Press, Cambridge, 2001.
[7] A. Erdélyi. On some expansions in Laguerre polynomials. J. London Math. Soc., s1-13(2):154–156,
1938.
[8] T. S. Ferguson. A Bayesian analysis of some nonparametric problems. Ann. Statist., 1:209–230, 1973.
[9] G. Gasper. Banach algebras for Jacobi series and positivity of a kernel. Ann. of Math. (2), 95:261–280,
1972.
[10] R. C. Griffiths. On the distribution of allele frequencies in a diffusion model. Theoret. Population Biol.,
15(1):140–158, 1979.
[11] M. E. H. Ismail. Classical and quantum orthogonal polynomials in one variable, volume 98 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2005. With two
chapters by Walter Van Assche, With a foreword by Richard A. Askey.
[12] S. Karlin and J. McGregor. Linear growth models with many types and multidimensional Hahn polynomials. In Theory and application of special functions (Proc. Advanced Sem., Math. Res. Center, Univ.
Wisconsin, Madison, Wis., 1975), pages 261–288. Math. Res. Center, Univ. Wisconsin, Publ. No. 35.
Academic Press, New York, 1975.
[13] S. Karlin and J. L. McGregor. The Hahn polynomials, formulas and an application. Scripta Math.,
26:33–46, 1961.
[14] J. F. C. Kingman. The population structure associated with the Ewens sampling formula. Theoret.
Population Biology, 11(2):274–283, 1977.
[15] J. F. C. Kingman, S. J. Taylor, A. G. Hawkes, A. M. Walker, D. R. Cox, A. F. M. Smith, B. M. Hill,
P. J. Burville, and T. Leonard. Random discrete distribution. J. Roy. Statist. Soc. Ser. B, 37:1–22,
1975. With a discussion by S. J. Taylor, A. G. Hawkes, A. M. Walker, D. R. Cox, A. F. M. Smith, B.
M. Hill, P. J. Burville, T. Leonard and a reply by the author.
[16] T. Koornwinder. Two-variable analogues of the classical orthogonal polynomials. In Theory and application of special functions (Proc. Advanced Sem., Math. Res. Center, Univ. Wisconsin, Madison, Wis.,
1975), pages 435–495. Math. Res. Center, Univ. Wisconsin, Publ. No. 35. Academic Press, New York,
1975.
[17] T. H. Koornwinder and A. L. Schwartz. Product formulas and associated hypergroups for orthogonal
polynomials on the simplex and on a parabolic biangle. Constr. Approx., 13(4):537–567, 1997.
[18] G. Lauricella. Sulle funzioni ipergeometriche a più variabili. Rend. Circ. Mat. Palermo, 7:111–158,
1893.
[19] J. Pitman. Combinatorial stochastic processes, volume 1875 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2006. Lectures from the 32nd Summer School on Probability Theory held in Saint-Flour,
July 7–24, 2002, With a foreword by Jean Picard.
[20] L. Saloff-Coste, P. Diaconis, and K. Khare. Gibbs sampling, exponential families and orthogonal polynomials. To appear in Statistical Science, 2009.
[21] T. Sauer. Jacobi polynomials in Bernstein form. J. Comput. Appl. Math., 199(1):149–158, 2007.
[22] S. Waldron. On the Bernstein-Bézier form of Jacobi polynomials on a simplex. J. Approx. Theory,
140(1):86–99, 2006.
[23] G. A. Watterson. Correction to: “The stationary distribution of the infinitely-many neutral alleles
diffusion model”. J. Appl. Probability, 14(4):897, 1977.