Common distributions
The binomial distribution
A random variable X ∈ {0, 1, . . . , n} has a binomial(n, θ) distribution if θ ∈ [0, 1] and

$$\Pr(X = x \mid \theta, n) = \binom{n}{x}\,\theta^x (1-\theta)^{n-x} \quad \text{for } x \in \{0, 1, \ldots, n\}.$$
For this distribution,
E[X|θ] = nθ,
Var[X|θ] = nθ(1 − θ),
mode[X|θ] = ⌊(n + 1)θ⌋,
p(x|θ, n) = dbinom(x,n,theta).
If X1 ∼ binomial(n1, θ) and X2 ∼ binomial(n2, θ) are independent, then
X = X1 + X2 ∼ binomial(n1 + n2, θ). When n = 1 this distribution is called the
binary or Bernoulli distribution. The binomial(n, θ) model assumes that X is
(equal in distribution to) a sum of independent binary(θ) random variables.
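As a quick sketch of the additivity property (the values of n1, n2, θ, and x below are illustrative, not from the text), the convolution of two binomial pmfs with a common θ can be compared against dbinom directly:

# Sketch: Pr(X1 + X2 = x) via convolution equals dbinom(x, n1 + n2, theta)
n1 <- 3; n2 <- 5; theta <- 0.3; x <- 4     # illustrative values
conv <- sum(dbinom(0:x, n1, theta) * dbinom(x - 0:x, n2, theta))
conv - dbinom(x, n1 + n2, theta)           # zero up to rounding error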
The beta distribution
A random variable X ∈ [0, 1] has a beta(a, b) distribution if a > 0, b > 0 and

$$p(x \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1} (1-x)^{b-1} \quad \text{for } 0 \le x \le 1.$$
For this distribution,
E[X|a, b] = a/(a + b),
Var[X|a, b] = ab/[(a + b + 1)(a + b)²] = E[X] × E[1 − X] × 1/(a + b + 1),
mode[X|a, b] = (a − 1)/[(a − 1) + (b − 1)] if a > 1 and b > 1,
p(x|a, b) = dbeta(x,a,b).
The beta distribution is closely related to the gamma distribution. See the
paragraph on the gamma distribution below for details. A multivariate version
of the beta distribution is the Dirichlet distribution, described in Exercise 12.4.
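As a minimal check (with illustrative values for a, b, and x), the density formula above can be compared against dbeta:

# Sketch: the beta density formula versus R's dbeta (illustrative values)
a <- 2; b <- 5; x <- 0.3
manual <- gamma(a + b) / (gamma(a) * gamma(b)) * x^(a - 1) * (1 - x)^(b - 1)
manual - dbeta(x, a, b)   # zero up to rounding error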
The Poisson distribution
A random variable X ∈ {0, 1, 2, . . .} has a Poisson(θ) distribution if θ > 0 and

$$\Pr(X = x \mid \theta) = \theta^x e^{-\theta}/x! \quad \text{for } x \in \{0, 1, 2, \ldots\}.$$
For this distribution,
E[X|θ] = θ,
Var[X|θ] = θ,
mode[X|θ] = ⌊θ⌋,
p(x|θ) = dpois(x,theta).
If X1 ∼ Poisson(θ1) and X2 ∼ Poisson(θ2) are independent, then X1 + X2 ∼
Poisson(θ1 + θ2). The Poisson family has a “mean-variance relationship,” referring
to the fact that E[X|θ] = Var[X|θ] = θ. If the sample mean is observed to be
very different from the sample variance, then the Poisson model may not be
appropriate. If the sample variance is larger than the sample mean, then a
negative binomial model (Section 3.2.1) might be a better fit.
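A sketch of this diagnostic in R (the count data here are simulated purely for illustration):

# Sketch: compare the sample mean and variance of count data before
# committing to a Poisson model (data simulated for illustration only)
y <- rpois(100, 4.5)
c(mean = mean(y), var = var(y))   # roughly equal if the Poisson model fits
# A sample variance much larger than the sample mean suggests
# overdispersion, pointing toward a negative binomial model instead.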
The gamma and inverse-gamma distributions
A random variable X ∈ (0, ∞) has a gamma(a, b) distribution if a > 0, b > 0 and

$$p(x \mid a, b) = \frac{b^a}{\Gamma(a)}\, x^{a-1} e^{-bx} \quad \text{for } x > 0.$$
For this distribution,
E[X|a, b] = a/b,
Var[X|a, b] = a/b²,
mode[X|a, b] = (a − 1)/b if a ≥ 1, 0 if 0 < a < 1,
p(x|a, b) = dgamma(x,a,b).
If X1 ∼ gamma(a1, b) and X2 ∼ gamma(a2, b) are independent, then X1 +
X2 ∼ gamma(a1 + a2, b) and X1/(X1 + X2) ∼ beta(a1, a2). If X ∼ normal(0, σ²)
then X² ∼ gamma(1/2, 1/[2σ²]). The chi-square distribution with ν degrees
of freedom is the same as a gamma(ν/2, 1/2) distribution.
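Both of these relationships can be checked in R; a sketch with illustrative parameter values (the second check uses Monte Carlo draws):

# Sketch: chi-square with nu df is gamma(nu/2, 1/2) (exact check) ...
x <- 1.7; nu <- 5
dchisq(x, df = nu) - dgamma(x, shape = nu/2, rate = 1/2)   # zero up to rounding error
# ... and X1/(X1 + X2) ~ beta(a1, a2) (Monte Carlo check)
a1 <- 2; a2 <- 3; b <- 1.5
x1 <- rgamma(1e5, a1, rate = b); x2 <- rgamma(1e5, a2, rate = b)
mean(x1 / (x1 + x2))   # approximately a1/(a1 + a2) = 0.4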
A random variable X ∈ (0, ∞) has an inverse-gamma(a, b) distribution if
1/X has a gamma(a, b) distribution. In other words, if Y ∼ gamma(a, b) and
X = 1/Y, then X ∼ inverse-gamma(a, b). The density of X is

$$p(x \mid a, b) = \frac{b^a}{\Gamma(a)}\, x^{-a-1} e^{-b/x} \quad \text{for } x > 0.$$
For this distribution,
E[X|a, b] = b/(a − 1) if a > 1, ∞ if 0 < a ≤ 1,
Var[X|a, b] = b²/[(a − 1)²(a − 2)] if a > 2, ∞ if 0 < a ≤ 2,
mode[X|a, b] = b/(a + 1).
Note that the inverse-gamma density is not simply the gamma density with
x replaced by 1/x: There is an additional factor of x−2 due to the Jacobian
in the change-of-variables formula (see Exercise 10.3).
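A sketch of this point in R (illustrative values): evaluating the gamma density at 1/x and multiplying by the Jacobian factor x⁻² reproduces the inverse-gamma density above.

# Sketch: inverse-gamma density = gamma density at 1/x, times Jacobian 1/x^2
a <- 3; b <- 2; x <- 0.7   # illustrative values
manual <- b^a / gamma(a) * x^(-a - 1) * exp(-b / x)
manual - dgamma(1/x, shape = a, rate = b) / x^2   # zero up to rounding error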
The univariate normal distribution
A random variable X ∈ ℝ has a normal(θ, σ²) distribution if σ² > 0 and

$$p(x \mid \theta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}(x-\theta)^2/\sigma^2} \quad \text{for } -\infty < x < \infty.$$
For this distribution,
E[X|θ, σ²] = θ,
Var[X|θ, σ²] = σ²,
mode[X|θ, σ²] = θ,
p(x|θ, σ²) = dnorm(x,theta,sigma).
Remember that R parameterizes the normal distribution in terms of the standard
deviation σ, and not the variance σ². If X1 ∼ normal(θ1, σ1²) and X2 ∼ normal(θ2, σ2²)
are independent, then aX1 + bX2 + c ∼ normal(aθ1 + bθ2 + c, a²σ1² + b²σ2²).
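A one-line check of this parameterization (illustrative values): pass sqrt(sigma2) to dnorm, not sigma2.

# Sketch: dnorm takes the standard deviation, not the variance
theta <- 1; sigma2 <- 4; x <- 0.5   # illustrative values
manual <- exp(-(x - theta)^2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)
manual - dnorm(x, theta, sqrt(sigma2))   # zero up to rounding error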
A normal sampling model is often useful even if the underlying population
does not have a normal distribution. This is because statistical procedures
that assume a normal model will generally provide good estimates of the
population mean and variance, regardless of whether or not the population is
normal (see Section 5.5 for a discussion).
The multivariate normal distribution
A random vector X ∈ ℝᵖ has a multivariate normal(θ, Σ) distribution if Σ
is a positive definite p × p matrix and

$$p(x \mid \theta, \Sigma) = (2\pi)^{-p/2}\, |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2}(x - \theta)^T \Sigma^{-1} (x - \theta) \right\} \quad \text{for } x \in \mathbb{R}^p.$$
For this distribution,
E[X|θ, Σ] = θ,
Var[X|θ, Σ] = Σ,
mode[X|θ, Σ] = θ.
Just like the univariate normal distribution, if X1 ∼ normal(θ1, Σ1) and
X2 ∼ normal(θ2, Σ2) are independent, then aX1 + bX2 + c ∼ normal(aθ1 +
bθ2 + c, a²Σ1 + b²Σ2). Marginal and conditional distributions of subvectors of X
also have multivariate normal distributions: Let a ⊂ {1, . . . , p}
be a subset of variable indices, and let b = aᶜ be the remaining indices.
Then X[a] ∼ multivariate normal(θ[a], Σ[a,a]) and {X[b] | X[a]} ∼ multivariate
normal(θ_{b|a}, Σ_{b|a}), where

$$\theta_{b|a} = \theta_{[b]} + \Sigma_{[b,a]} (\Sigma_{[a,a]})^{-1} (X_{[a]} - \theta_{[a]})$$
$$\Sigma_{b|a} = \Sigma_{[b,b]} - \Sigma_{[b,a]} (\Sigma_{[a,a]})^{-1} \Sigma_{[a,b]}.$$
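These formulas translate directly into R; the following sketch uses an illustrative θ and Σ and conditions on the first coordinate:

# Sketch: conditional mean and variance of X[b] given X[a] = xa
# (theta, Sigma, and xa are illustrative)
p <- 3
theta <- c(0, 1, 2)
Sigma <- matrix(c(4, 1, 1,
                  1, 3, 1,
                  1, 1, 2), p, p)
a <- 1; b <- c(2, 3)   # condition on the first coordinate
xa <- 1.5              # observed value of X[a]
iSaa <- solve(Sigma[a, a, drop = FALSE])
theta.b.a <- theta[b] + Sigma[b, a, drop = FALSE] %*% iSaa %*% (xa - theta[a])
Sigma.b.a <- Sigma[b, b, drop = FALSE] -
             Sigma[b, a, drop = FALSE] %*% iSaa %*% Sigma[a, b, drop = FALSE]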
Simulating a multivariate normal random vector can be achieved by a linear
transformation of a vector of i.i.d. standard normal random variables. If Z
is the vector with elements Z1, . . . , Zp ∼ i.i.d. normal(0, 1) and AAᵀ = Σ,
then X = θ + AZ ∼ multivariate normal(θ, Σ). Usually A is the Cholesky
factorization of Σ. The following R code will generate an n × p matrix such
that the rows are i.i.d. samples from a multivariate normal distribution:
Z <- matrix(rnorm(n * p), nrow = n, ncol = p)   # n x p matrix of i.i.d. standard normals
X <- t(t(Z %*% chol(Sigma)) + c(theta))         # each row ~ multivariate normal(theta, Sigma)
The Wishart and inverse-Wishart distributions
A random p × p symmetric positive definite matrix X has a Wishart(ν, M)
distribution if the integer ν ≥ p, M is a p × p symmetric positive definite
matrix and

$$p(X \mid \nu, M) = \left[ 2^{\nu p/2}\, \Gamma_p(\nu/2)\, |M|^{\nu/2} \right]^{-1} |X|^{(\nu - p - 1)/2}\, \mathrm{etr}(-M^{-1}X/2),$$

where

$$\Gamma_p(\nu/2) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma[(\nu + 1 - j)/2],$$

and etr(A) = exp(Σⱼ aⱼ,ⱼ), the exponential of the sum of the diagonal elements (the trace) of A.
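The etr notation is easy to mirror in R; a one-line helper (the name etr is ours, for illustration):

etr <- function(A) exp(sum(diag(A)))   # exp of the trace of A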
For this distribution,
E[X|ν, M] = νM,
Var[X_{i,j}|ν, M] = ν × (m_{i,j}² + m_{i,i} m_{j,j}),
mode[X|ν, M] = (ν − p − 1)M.
The Wishart distribution is a multivariate version of the gamma distribution.
Just as the sum of squares of i.i.d. univariate normal variables has a
gamma distribution, the sum of squares of i.i.d. multivariate normal vectors
has a Wishart distribution. Specifically, if Y1, . . . , Yν ∼ i.i.d. multivariate
normal(0, M), then $\sum_i Y_i Y_i^T$ ∼ Wishart(ν, M). This relationship can be used
to generate a Wishart-distributed random matrix:
Z <- matrix(rnorm(nu * p), nrow = nu, ncol = p)   # standard normal
Y <- Z %*% chol(M)                                # rows have cov = M
X <- t(Y) %*% Y                                   # Wishart matrix
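As a sketch of a sanity check (ν and M below are illustrative, and the rwish wrapper is ours), averaging many such draws should reproduce E[X|ν, M] = νM:

# Sketch: Monte Carlo check that the draws average to nu * M
p <- 2; nu <- 5
M <- matrix(c(2, 0.5, 0.5, 1), p, p)   # illustrative scale matrix
rwish <- function(nu, M) {             # one Wishart(nu, M) draw, as above
  p <- ncol(M)
  Z <- matrix(rnorm(nu * p), nrow = nu, ncol = p)
  Y <- Z %*% chol(M)
  t(Y) %*% Y
}
draws <- replicate(1e4, rwish(nu, M))  # p x p x 10000 array
apply(draws, c(1, 2), mean)            # approximately nu * M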
A random p × p symmetric positive definite matrix X has an inverse-Wishart(ν, M)
distribution if X⁻¹ has a Wishart(ν, M) distribution. In other
words, if Y ∼ Wishart(ν, M) and X = Y⁻¹, then X ∼ inverse-Wishart(ν, M).
The density of X is

$$p(X \mid \nu, M) = \left[ 2^{\nu p/2}\, \Gamma_p(\nu/2)\, |M|^{\nu/2} \right]^{-1} |X|^{-(\nu + p + 1)/2}\, \mathrm{etr}(-M^{-1}X^{-1}/2).$$
For this distribution,
E[X|ν, M] = (ν − p − 1)⁻¹M⁻¹,
mode[X|ν, M] = (ν + p + 1)⁻¹M⁻¹.
The second moments (i.e. the variances) of the elements of X are given in
Press (1972). Since we often use the inverse-Wishart distribution as a prior
distribution for a covariance matrix Σ, it is sometimes useful to parameterize
the distribution in terms of S = M⁻¹. Then if Σ ∼ inverse-Wishart(ν0, S⁻¹),
we have mode[Σ|ν0, S] = (ν0 + p + 1)⁻¹S. If Σ0 were the most probable value
of Σ a priori, then we would set S = (ν0 + p + 1)Σ0, so that Σ ∼ inverse-Wishart(ν0, [(ν0 + p + 1)Σ0]⁻¹) and mode[Σ|ν0, S] = Σ0.
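A sketch of this parameterization in R (Σ0 and ν0 below are illustrative), combined with the Wishart construction above to draw one Σ whose prior mode is Σ0:

# Sketch: choose S so that the prior mode of Sigma is Sigma0, then draw
# Sigma ~ inverse-Wishart(nu0, solve(S)) via the Wishart construction above
p <- 3; nu0 <- p + 2          # illustrative dimension and degrees of freedom
Sigma0 <- diag(p)             # illustrative prior mode
S <- (nu0 + p + 1) * Sigma0   # so that mode[Sigma | nu0, S] = Sigma0
M <- solve(S)
Z <- matrix(rnorm(nu0 * p), nrow = nu0, ncol = p)
Y <- Z %*% chol(M)            # rows have cov = M
Sigma <- solve(t(Y) %*% Y)    # one inverse-Wishart(nu0, solve(S)) draw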
For more on the Wishart distribution and its relationship to the multivariate
normal distribution, see Press (1972) or Mardia et al. (1979).