Common distributions

The binomial distribution

A random variable X ∈ {0, 1, . . . , n} has a binomial(n, θ) distribution if θ ∈ [0, 1] and

\[
\Pr(X = x \mid \theta, n) = \binom{n}{x} \theta^x (1 - \theta)^{n - x} \quad \text{for } x \in \{0, 1, \ldots, n\}.
\]

For this distribution,

E[X|θ] = nθ,
Var[X|θ] = nθ(1 − θ),
mode[X|θ] = ⌊(n + 1)θ⌋,
p(x|θ, n) = dbinom(x,n,theta).

If X1 ∼ binomial(n1, θ) and X2 ∼ binomial(n2, θ) are independent, then X = X1 + X2 ∼ binomial(n1 + n2, θ). When n = 1 this distribution is called the binary or Bernoulli distribution. The binomial(n, θ) model assumes that X is (equal in distribution to) a sum of independent binary(θ) random variables.

The beta distribution

A random variable X ∈ [0, 1] has a beta(a, b) distribution if a > 0, b > 0 and

\[
p(x \mid a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} x^{a - 1} (1 - x)^{b - 1} \quad \text{for } 0 \le x \le 1.
\]

For this distribution,

E[X|a, b] = a/(a + b),
Var[X|a, b] = ab/[(a + b + 1)(a + b)²] = E[X] × E[1 − X] × 1/(a + b + 1),
mode[X|a, b] = (a − 1)/[(a − 1) + (b − 1)] if a > 1 and b > 1,
p(x|a, b) = dbeta(x,a,b).

The beta distribution is closely related to the gamma distribution; see the paragraph on the gamma distribution below for details. A multivariate version of the beta distribution is the Dirichlet distribution, described in Exercise 12.4.

The Poisson distribution

A random variable X ∈ {0, 1, 2, . . .} has a Poisson(θ) distribution if θ > 0 and

\[
\Pr(X = x \mid \theta) = \theta^x e^{-\theta} / x! \quad \text{for } x \in \{0, 1, 2, \ldots\}.
\]

For this distribution,

E[X|θ] = θ,
Var[X|θ] = θ,
mode[X|θ] = ⌊θ⌋,
p(x|θ) = dpois(x,theta).

If X1 ∼ Poisson(θ1) and X2 ∼ Poisson(θ2) are independent, then X1 + X2 ∼ Poisson(θ1 + θ2). The Poisson family has a “mean–variance relationship”: E[X|θ] = Var[X|θ] = θ. If the sample mean is observed to be very different from the sample variance, then the Poisson model may not be appropriate. If the sample variance is larger than the sample mean, then a negative binomial model (Section 3.2.1) might be a better fit.

The gamma and inverse-gamma distributions

A random variable X ∈ (0, ∞) has a gamma(a, b) distribution if a > 0, b > 0 and

\[
p(x \mid a, b) = \frac{b^a}{\Gamma(a)} x^{a - 1} e^{-bx} \quad \text{for } x > 0.
\]

For this distribution,

E[X|a, b] = a/b,
Var[X|a, b] = a/b²,
mode[X|a, b] = (a − 1)/b if a ≥ 1, 0 if 0 < a < 1,
p(x|a, b) = dgamma(x,a,b).

If X1 ∼ gamma(a1, b) and X2 ∼ gamma(a2, b) are independent, then X1 + X2 ∼ gamma(a1 + a2, b) and X1/(X1 + X2) ∼ beta(a1, a2). If X ∼ normal(0, σ²), then X² ∼ gamma(1/2, 1/[2σ²]). The chi-square distribution with ν degrees of freedom is the same as a gamma(ν/2, 1/2) distribution.

A random variable X ∈ (0, ∞) has an inverse-gamma(a, b) distribution if 1/X has a gamma(a, b) distribution. In other words, if Y ∼ gamma(a, b) and X = 1/Y, then X ∼ inverse-gamma(a, b). The density of X is

\[
p(x \mid a, b) = \frac{b^a}{\Gamma(a)} x^{-a - 1} e^{-b/x} \quad \text{for } x > 0.
\]

For this distribution,

E[X|a, b] = b/(a − 1) if a > 1, ∞ otherwise,
Var[X|a, b] = b²/[(a − 1)²(a − 2)] if a > 2, ∞ otherwise,
mode[X|a, b] = b/(a + 1).

Note that the inverse-gamma density is not simply the gamma density with x replaced by 1/x: there is an additional factor of x⁻² due to the Jacobian in the change-of-variables formula (see Exercise 10.3).

The univariate normal distribution

A random variable X ∈ ℝ has a normal(θ, σ²) distribution if σ² > 0 and

\[
p(x \mid \theta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2}(x - \theta)^2 / \sigma^2} \quad \text{for } -\infty < x < \infty.
\]

For this distribution,

E[X|θ, σ²] = θ,
Var[X|θ, σ²] = σ²,
mode[X|θ, σ²] = θ,
p(x|θ, σ²) = dnorm(x,theta,sigma).

Remember that R parameterizes the normal distribution in terms of the standard deviation σ, not the variance σ². If X1 ∼ normal(θ1, σ1²) and X2 ∼ normal(θ2, σ2²) are independent, then aX1 + bX2 + c ∼ normal(aθ1 + bθ2 + c, a²σ1² + b²σ2²).

A normal sampling model is often useful even if the underlying population does not have a normal distribution. This is because statistical procedures that assume a normal model will generally provide good estimates of the population mean and variance, regardless of whether or not the population is normal (see Section 5.5 for a discussion).
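As a quick numerical check of several of the identities above, here is a minimal R sketch (not from the text; the sample size and parameter values are arbitrary) comparing Monte Carlo estimates with the stated moments and verifying the gamma–beta and gamma–inverse-gamma relationships:

set.seed(1)
nsim <- 1e5 ; a1 <- 2 ; a2 <- 3 ; b <- 1.5
x1 <- rgamma(nsim, a1, b) ; x2 <- rgamma(nsim, a2, b)
c( mean(x1), a1/b )                # E[X|a,b] = a/b
c( var(x1), a1/b^2 )               # Var[X|a,b] = a/b^2
w <- x1/(x1 + x2)                  # should be beta(a1, a2)
c( mean(w), a1/(a1 + a2) )         # E[X|a,b] = a/(a+b)
y <- 1/rgamma(nsim, 4, b)          # inverse-gamma(4, b) via reciprocals
c( mean(y), b/(4 - 1) )            # E[X|a,b] = b/(a-1), since a = 4 > 1
x <- rpois(nsim, 3.7)
c( mean(x), var(x) )               # mean-variance relationship: both near theta = 3.7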
The multivariate normal distribution

A random vector X ∈ ℝᵖ has a multivariate normal(θ, Σ) distribution if Σ is a positive definite p × p matrix and

\[
p(x \mid \theta, \Sigma) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (x - \theta)^T \Sigma^{-1} (x - \theta) \right\} \quad \text{for } x \in \mathbb{R}^p.
\]

For this distribution,

E[X|θ, Σ] = θ,
Var[X|θ, Σ] = Σ,
mode[X|θ, Σ] = θ.

Just as with the univariate normal distribution, if X1 ∼ multivariate normal(θ1, Σ1) and X2 ∼ multivariate normal(θ2, Σ2) are independent, then aX1 + bX2 + c ∼ multivariate normal(aθ1 + bθ2 + c, a²Σ1 + b²Σ2).

Marginal and conditional distributions of subvectors of X also have multivariate normal distributions: Let a ⊂ {1, . . . , p} be a subset of variable indices, and let b = aᶜ be the remaining indices. Then X[a] ∼ multivariate normal(θ[a], Σ[a,a]) and {X[b] | X[a]} ∼ multivariate normal(θ_{b|a}, Σ_{b|a}), where

\[
\begin{aligned}
\theta_{b|a} &= \theta_{[b]} + \Sigma_{[b,a]} (\Sigma_{[a,a]})^{-1} (X_{[a]} - \theta_{[a]}), \\
\Sigma_{b|a} &= \Sigma_{[b,b]} - \Sigma_{[b,a]} (\Sigma_{[a,a]})^{-1} \Sigma_{[a,b]}.
\end{aligned}
\]

Simulating a multivariate normal random vector can be achieved by a linear transformation of a vector of i.i.d. standard normal random variables. If Z is the vector with elements Z1, . . . , Zp ∼ i.i.d. normal(0, 1) and AAᵀ = Σ, then X = θ + AZ ∼ multivariate normal(θ, Σ). Usually A is taken from the Cholesky factorization of Σ. The following R-code will generate an n × p matrix such that the rows are i.i.d. samples from a multivariate normal distribution:

Z <- matrix(rnorm(n*p), nrow=n, ncol=p)
X <- t( t(Z %*% chol(Sigma)) + c(theta) )
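As an illustration of the conditional distribution formulas above, the following R sketch (the values of θ, Σ, and the observed coordinate are made up for illustration, not from the text) computes θ_{b|a} and Σ_{b|a} for p = 3, conditioning on the first coordinate:

theta <- c(0, 1, 2)
Sigma <- matrix(c(2.0, 0.5, 0.3,
                  0.5, 1.0, 0.4,
                  0.3, 0.4, 1.5), nrow=3, ncol=3)
a <- 1 ; b <- 2:3                          # a = {1}, b = {2,3}
xa <- 0.7                                  # observed value of X[a]
Sba  <- Sigma[b, a, drop=FALSE]            # Sigma[b,a]
iSaa <- solve(Sigma[a, a, drop=FALSE])     # (Sigma[a,a])^{-1}
theta.b.a <- theta[b] + Sba %*% iSaa %*% (xa - theta[a])  # conditional mean
Sigma.b.a <- Sigma[b, b] - Sba %*% iSaa %*% t(Sba)        # conditional covariance

Note that t(Sba) equals Sigma[a, b] because Σ is symmetric, so the second line matches the formula for Σ_{b|a} exactly.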
The Wishart and inverse-Wishart distributions

A random p × p symmetric positive definite matrix X has a Wishart(ν, M) distribution if the integer ν ≥ p, M is a p × p symmetric positive definite matrix, and

\[
p(X \mid \nu, M) = \left[ 2^{\nu p/2} \Gamma_p(\nu/2) |M|^{\nu/2} \right]^{-1} \times |X|^{(\nu - p - 1)/2} \operatorname{etr}(-M^{-1} X / 2),
\]

where $\Gamma_p(\nu/2) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma[(\nu + 1 - j)/2]$ and $\operatorname{etr}(A) = \exp(\operatorname{tr}(A))$, the exponential of the sum of the diagonal elements of A. For this distribution,

E[X|ν, M] = νM,
Var[X_{i,j}|ν, M] = ν × (m_{i,j}² + m_{i,i} m_{j,j}),
mode[X|ν, M] = (ν − p − 1)M.

The Wishart distribution is a multivariate version of the gamma distribution. Just as the sum of squares of i.i.d. univariate normal variables has a gamma distribution, the sum of squares of i.i.d. multivariate normal vectors has a Wishart distribution. Specifically, if Y1, . . . , Yν ∼ i.i.d. multivariate normal(0, M), then $\sum_{i=1}^{\nu} Y_i Y_i^T$ ∼ Wishart(ν, M). This relationship can be used to generate a Wishart-distributed random matrix:

Z <- matrix(rnorm(nu*p), nrow=nu, ncol=p)  # standard normal
Y <- Z %*% chol(M)                         # rows have covariance M
X <- t(Y) %*% Y                            # Wishart matrix

A random p × p symmetric positive definite matrix X has an inverse-Wishart(ν, M) distribution if X⁻¹ has a Wishart(ν, M) distribution. In other words, if Y ∼ Wishart(ν, M) and X = Y⁻¹, then X ∼ inverse-Wishart(ν, M). The density of X is

\[
p(X \mid \nu, M) = \left[ 2^{\nu p/2} \Gamma_p(\nu/2) |M|^{\nu/2} \right]^{-1} \times |X|^{-(\nu + p + 1)/2} \operatorname{etr}(-M^{-1} X^{-1} / 2).
\]

For this distribution,

E[X|ν, M] = (ν − p − 1)⁻¹ M⁻¹ if ν > p + 1,
mode[X|ν, M] = (ν + p + 1)⁻¹ M⁻¹.

The second moments (i.e. the variances) of the elements of X are given in Press (1972). Since we often use the inverse-Wishart distribution as a prior distribution for a covariance matrix Σ, it is sometimes useful to parameterize the distribution in terms of S = M⁻¹. If Σ ∼ inverse-Wishart(ν, S⁻¹), then mode[Σ|ν, S] = (ν + p + 1)⁻¹ S. So if Σ0 is the most probable value of Σ a priori, we set S = (ν + p + 1)Σ0, giving Σ ∼ inverse-Wishart(ν, [(ν + p + 1)Σ0]⁻¹) and mode[Σ|ν, S] = Σ0. For more on the Wishart distribution and its relationship to the multivariate normal distribution, see Press (1972) or Mardia et al. (1979).
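Following the parameterization just described, here is a minimal R sketch (the values of ν and Σ0 are illustrative, not from the text) that centers an inverse-Wishart prior on a given matrix Σ0 and generates one draw using the Wishart construction above:

p <- 2 ; nu <- 10
Sigma0 <- matrix(c(1.0, 0.5,
                   0.5, 2.0), nrow=2, ncol=2)  # prior guess for Sigma
S <- (nu + p + 1) * Sigma0                     # so that mode[Sigma] = Sigma0
Z <- matrix(rnorm(nu*p), nrow=nu, ncol=p)      # standard normal
Y <- Z %*% chol(solve(S))                      # rows have covariance solve(S)
Sigma <- solve(t(Y) %*% Y)                     # one inverse-Wishart(nu, solve(S)) draw

Averaging many such draws approximates E[Σ] = S/(ν − p − 1), while the single most probable value under the prior remains Σ0.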