Continuous Distributions
by R.J. Reed
These notes are based on handouts for lecture modules given jointly by myself and the late Jeff Harrison.
The following continuous distributions are covered: gamma, χ², normal, lognormal, beta, arcsine, t, Cauchy, F, power, Laplace, Rayleigh, Weibull, Pareto, bivariate normal, multivariate normal, bivariate t, and multivariate t distributions.
Important. There are many theoretical results in the exercises—full solutions are provided at the end.
Please send details of any errors to my WordPress account: bcgts.wordpress.com.
Contents
1 Univariate Continuous Distributions: 3
1.1 Revision of some basic results: 3. 1.2 Order statistics: 3. 1.3 Exercises: 6. 1.4 The Gamma and χ² distributions: 8. 1.5 Exercises: 11. 1.6 The normal distribution: 12. 1.7 Exercises: 15. 1.8 The lognormal distribution: 16. 1.9 Exercises: 18. 1.10 The beta and arcsine distributions: 19. 1.11 Exercises: 22. 1.12 The t, Cauchy and F distributions: 23. 1.13 Exercises: 26. 1.14 Non-central distributions: 27. 1.15 Exercises: 29. 1.16 The power and Pareto distributions: 30. 1.17 Exercises: 33. 1.18 Laplace, Rayleigh and Weibull distributions: 35. 1.19 Exercises: 36. 1.20 Size, shape and related characterization theorems: 38. 1.21 Exercises: 41.
2 Multivariate Continuous Distributions: 43
2.1 Multivariate distributions—general results: 43. 2.2 Exercises: 44. 2.3 The bivariate normal: 44. 2.4 Exercises: 48. 2.5 The multivariate normal: 50. 2.6 Exercises: 55. 2.7 The bivariate t distribution: 56. 2.8 The multivariate t distribution: 57. 2.9 Exercises: 60.
Appendix: Answers: 61
Exercises 1.3: 61. Exercises 1.5: 64. Exercises 1.7: 65. Exercises 1.9: 67. Exercises 1.11: 69. Exercises 1.13: 70. Exercises 1.15: 72. Exercises 1.17: 72. Exercises 1.19: 77. Exercises 1.21: 79. Exercises 2.2: 80. Exercises 2.4: 81. Exercises 2.6: 84. Exercises 2.9: 85.
Bayesian Time Series Analysis by R.J. Reed
Jun 7, 2018(14:22)
CHAPTER 1
Univariate Continuous Distributions
1 Revision of some basic results
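1.1 Conditional variance and expectation. For any random vector (X, Y) such that E[Y²] is finite, the conditional variance of Y given X is defined to be
$$\operatorname{var}(Y \mid X) = E[Y^2 \mid X] - \big(E[Y \mid X]\big)^2 = E\Big[\big(Y - E[Y \mid X]\big)^2 \,\Big|\, X\Big] \tag{1.1a}$$
This is a function of X. Taking expectations gives $E[\operatorname{var}(Y \mid X)] = E[Y^2] - E\big[(E[Y \mid X])^2\big]$, and $\operatorname{var}(E[Y \mid X]) = E\big[(E[Y \mid X])^2\big] - (E[Y])^2$ because $E\big[E[Y \mid X]\big] = E[Y]$. Adding these gives
$$E\big[\operatorname{var}(Y \mid X)\big] + \operatorname{var}\big(E[Y \mid X]\big) = E[Y^2] - (E[Y])^2 = \operatorname{var}(Y) \tag{1.1b}$$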
Equation(1.1b) is often called the Law of Total Variance and is probably best remembered in the following form:
var(Y ) = E[conditional variance] + var(conditional mean)
which is similar to the decomposition in the analysis of variance.
Definition(1.1a). For any random vector (X, Y, Z) such that E[XY], E[X] and E[Y] are all finite, the conditional covariance between X and Y given Z is defined to be
$$\operatorname{cov}(X, Y \mid Z) = E[XY \mid Z] - E[X \mid Z]\,E[Y \mid Z]$$
An alternative definition is
$$\operatorname{cov}(X, Y \mid Z) = E\Big[\big(X - E[X \mid Z]\big)\big(Y - E[Y \mid Z]\big) \,\Big|\, Z\Big] \tag{1.1c}$$
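Note that cov(X, Y | Z) is a function of Z. Using the results cov(X, Y) = E[XY] − E[X]E[Y] and
$$\operatorname{cov}\big(E[X \mid Z], E[Y \mid Z]\big) = E\big[E[X \mid Z]\,E[Y \mid Z]\big] - E[X]\,E[Y]$$
gives the Law of Total Covariance
$$\operatorname{cov}(X, Y) = E\big[\operatorname{cov}(X, Y \mid Z)\big] + \operatorname{cov}\big(E[X \mid Z], E[Y \mid Z]\big) \tag{1.1d}$$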
This can be remembered as
cov(X, Y ) = E[conditional covariance] + cov(conditional means)
Note that setting X = Y in the Law of Total Covariance gives the Law of Total Variance.
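Both laws are exact identities, so they can be checked with exact rational arithmetic on any small discrete example. The following Python sketch uses a made-up joint pmf of (X, Y, Z) (the table `PMF` and all helper names are purely illustrative) together with the standard `fractions` module, so the two sides of (1.1d) agree exactly rather than approximately; replacing X by Y turns the same check into one of (1.1b).

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y, Z): keys are (x, y, z), values are probabilities.
PMF = {
    (0, 1, 0): Fraction(1, 8), (1, 2, 0): Fraction(1, 8), (2, 0, 0): Fraction(1, 4),
    (1, 1, 1): Fraction(1, 6), (3, 5, 1): Fraction(1, 6), (0, 2, 1): Fraction(1, 6),
}

def mean(dist, g):
    """E[g(X, Y)] for a pmf dict keyed by (x, y, z)."""
    return sum(p * g(x, y) for (x, y, _), p in dist.items())

def cov(dist):
    """cov(X, Y) = E[XY] - E[X]E[Y]."""
    return mean(dist, lambda x, y: x * y) - mean(dist, lambda x, y: x) * mean(dist, lambda x, y: y)

def given_z(dist):
    """Yield (P[Z = z], conditional pmf given Z = z) for each value z of Z."""
    for z0 in sorted({z for (_, _, z) in dist}):
        pz = sum(p for (_, _, z), p in dist.items() if z == z0)
        yield pz, {k: p / pz for k, p in dist.items() if k[2] == z0}

def total_covariance(dist):
    """E[cov(X, Y | Z)] + cov(E[X | Z], E[Y | Z]): the right side of (1.1d)."""
    parts = list(given_z(dist))
    e_cond_cov = sum(pz * cov(d) for pz, d in parts)
    mx = [mean(d, lambda x, y: x) for _, d in parts]   # E[X | Z = z]
    my = [mean(d, lambda x, y: y) for _, d in parts]   # E[Y | Z = z]
    pz = [p for p, _ in parts]                         # P[Z = z]
    e_mx = sum(p * a for p, a in zip(pz, mx))
    e_my = sum(p * b for p, b in zip(pz, my))
    cov_means = sum(p * a * b for p, a, b in zip(pz, mx, my)) - e_mx * e_my
    return e_cond_cov + cov_means

def with_x_equal_y(dist):
    """Pmf of (Y, Y, Z): setting X = Y turns (1.1d) into the Law of Total Variance."""
    out = {}
    for (_, y, z), p in dist.items():
        out[(y, y, z)] = out.get((y, y, z), 0) + p
    return out

print("cov(X, Y):", cov(PMF), " right side of (1.1d):", total_covariance(PMF))
vv = with_x_equal_y(PMF)
print("var(Y):", cov(vv), " right side of (1.1b):", total_covariance(vv))
```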
1.2 Conditional independence. Recall that X and Y are conditionally independent given Z iff
$$\Pr[X \le x, Y \le y \mid Z] = \Pr[X \le x \mid Z]\,\Pr[Y \le y \mid Z] \quad \text{a.e.}$$
for all x ∈ ℝ and all y ∈ ℝ.
Example(1.2a). Conditional independence does not imply independence.
Here is a simple demonstration: suppose box 1 contains two fair coins and box 2 contains two coins which have heads
on both sides. A box is chosen at random—denote the result by Z. A coin is selected from the chosen box and tossed—
denote the result by X; then the other coin from the chosen box is tossed independently of the first coin—denote the result
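by Y. Clearly X and Y are conditionally independent given Z. However
$$\Pr[X = H, Y = H] = \tfrac{1}{2}\cdot\tfrac{1}{4} + \tfrac{1}{2}\cdot 1 = \tfrac{5}{8} \quad\text{but}\quad \Pr[X = H] = \Pr[Y = H] = \tfrac{1}{2}\cdot\tfrac{1}{2} + \tfrac{1}{2}\cdot 1 = \tfrac{3}{4}$$
and so Pr[X = H, Y = H] = 5/8 ≠ 9/16 = Pr[X = H] Pr[Y = H]; hence X and Y are not independent.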
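The probabilities in example(1.2a) can be confirmed by enumerating every (box, first toss, second toss) outcome. A minimal Python sketch with exact fractions; the dictionary `P_HEAD` (the chance of a head for a coin from each box) and the function name are introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Box 1 holds two fair coins; box 2 holds two double-headed coins.
P_HEAD = {1: Fraction(1, 2), 2: Fraction(1)}

def joint_xy():
    """Exact joint pmf of (X, Y): choose a box Z with probability 1/2, then toss both coins."""
    pmf = {}
    for z in (1, 2):
        p = P_HEAD[z]
        for x, y in product("HT", repeat=2):
            px = p if x == "H" else 1 - p
            py = p if y == "H" else 1 - p
            # Given Z = z the two tosses are independent, so their probabilities multiply.
            pmf[(x, y)] = pmf.get((x, y), 0) + Fraction(1, 2) * px * py
    return pmf

pmf = joint_xy()
p_hh = pmf[("H", "H")]
p_xh = pmf[("H", "H")] + pmf[("H", "T")]   # P[X = H]
print(p_hh, p_xh, p_xh ** 2)               # prints: 5/8 3/4 9/16
```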
2 Order statistics
2.1 Basics. Suppose $X_1, X_2, \dots, X_n$ are i.i.d. random variables with a continuous distribution which has distribution function F and density f. Then $X_{1:n}, X_{2:n}, \dots, X_{n:n}$ denote the order statistics of $X_1, X_2, \dots, X_n$. This means
$$X_{1:n} = \min\{X_1, \dots, X_n\}, \qquad X_{n:n} = \max\{X_1, \dots, X_n\}$$
and the random variables $X_{1:n}, X_{2:n}, \dots, X_{n:n}$ consist of $X_1, X_2, \dots, X_n$ arranged in increasing order; hence
$$X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$$
2.2 Finding the density of $(X_{1:n}, \dots, X_{n:n})$. Let $g(y_1, \dots, y_n)$ denote the density of $(X_{1:n}, \dots, X_{n:n})$. Note that $(X_{1:n}, \dots, X_{n:n})$ can be regarded as a transformation T of the vector $(X_1, \dots, X_n)$.
• Suppose n = 2. Let $A_1 = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 < x_2\}$ and let $T_1$ denote the restriction of T to $A_1$. Similarly let $A_2 = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 > x_2\}$ and let $T_2$ denote the restriction of T to $A_2$. Clearly $T_1 : A_1 \to A_1$ is 1-1 and $T_2 : A_2 \to A_1$ is 1-1. Hence for all $(y_1, y_2) \in A_1$ (i.e. for all $y_1 < y_2$), the density $g(y_1, y_2)$ of $(X_{1:2}, X_{2:2})$ is
$$g(y_1, y_2) = \frac{f_{X_1,X_2}\big(T_1^{-1}(y_1, y_2)\big)}{\left|\frac{\partial(y_1, y_2)}{\partial(x_1, x_2)}\right|} + \frac{f_{X_1,X_2}\big(T_2^{-1}(y_1, y_2)\big)}{\left|\frac{\partial(y_1, y_2)}{\partial(x_1, x_2)}\right|} = \frac{f_{X_1,X_2}(y_1, y_2)}{|1|} + \frac{f_{X_1,X_2}(y_2, y_1)}{|-1|} = 2f(y_1)f(y_2)$$
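• Suppose n = 3. For this case, we need $A_1, A_2, A_3, A_4, A_5$ and $A_6$ where
$$A_1 = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 < x_2 < x_3\}, \qquad A_2 = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 < x_3 < x_2\}$$
etc. There are 3! = 6 orderings of $(x_1, x_2, x_3)$; on each $A_i$ the transformation just permutes the coordinates, so its Jacobian has absolute value 1. So this leads to
$$g(y_1, y_2, y_3) = 3!\,f(y_1)f(y_2)f(y_3) \quad\text{for } y_1 < y_2 < y_3$$
• For the general case of n ≥ 2, we have
$$g(y_1, \dots, y_n) = n!\,f(y_1)\cdots f(y_n) \quad\text{for } y_1 < y_2 < \cdots < y_n \tag{2.2a}$$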
2.3 Finding the distribution of $X_{r:n}$ by using distribution functions. Dealing with the maximum is easy:
$$F_{n:n}(x) = P[X_{n:n} \le x] = P[X_1 \le x, \dots, X_n \le x] = \prod_{i=1}^{n} P[X_i \le x] = \{F(x)\}^n$$
$$f_{n:n}(x) = n f(x)\{F(x)\}^{n-1}$$
Now for the minimum, $X_{1:n}$:
$$P[X_{1:n} > x] = P[X_1 > x, \dots, X_n > x] = \prod_{i=1}^{n} P[X_i > x] = \{1 - F(x)\}^n$$
$$F_{1:n}(x) = 1 - P[X_{1:n} > x] = 1 - \{1 - F(x)\}^n$$
$$f_{1:n}(x) = n f(x)\{1 - F(x)\}^{n-1}$$
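Now for the general case, $X_{r:n}$ where 2 ≤ r ≤ n − 1. The event $\{X_{r:n} \le x\}$ occurs iff at least r of the random variables $X_1, \dots, X_n$ are less than or equal to x. Hence
$$P[X_{r:n} \le x] = \sum_{j=r}^{n}\binom{n}{j}\{F(x)\}^j\{1 - F(x)\}^{n-j} = \sum_{j=r}^{n-1}\binom{n}{j}\{F(x)\}^j\{1 - F(x)\}^{n-j} + \{F(x)\}^n$$
Differentiating gives
$$f_{r:n}(x) = \sum_{j=r}^{n-1}\binom{n}{j} j f(x)\{F(x)\}^{j-1}\{1 - F(x)\}^{n-j} - \sum_{j=r}^{n-1}\binom{n}{j}(n - j) f(x)\{F(x)\}^{j}\{1 - F(x)\}^{n-j-1} + n f(x)\{F(x)\}^{n-1}$$
$$= \sum_{j=r}^{n}\frac{n!}{(j-1)!\,(n-j)!} f(x)\{F(x)\}^{j-1}\{1 - F(x)\}^{n-j} - \sum_{j=r}^{n-1}\frac{n!}{j!\,(n-j-1)!} f(x)\{F(x)\}^{j}\{1 - F(x)\}^{n-j-1}$$
$$= \frac{n!}{(r-1)!\,(n-r)!} f(x)\{F(x)\}^{r-1}\{1 - F(x)\}^{n-r} \tag{2.3a}$$
The last step holds because the term with index j in the second sum equals the term with index j + 1 in the first sum, so everything cancels except the j = r term of the first sum.
Note that equation(2.3a) is true for all r = 1, 2, …, n.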
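Equation(2.3a) can be sanity-checked numerically: differentiate the binomial-sum expression for $P[X_{r:n} \le x]$ by a central finite difference and compare the result with the closed form. The Python sketch below assumes, purely for illustration, a standard exponential parent distribution; all function names are hypothetical.

```python
import math

def order_stat_cdf(x, r, n, F):
    """P[X_{r:n} <= x]: at least r of the n observations are <= x (binomial tail sum)."""
    p = F(x)
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r, n + 1))

def order_stat_pdf(x, r, n, F, f):
    """Density of X_{r:n} from equation(2.3a)."""
    c = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    p = F(x)
    return c * f(x) * p ** (r - 1) * (1 - p) ** (n - r)

def F_exp(x):
    """Standard exponential distribution function (illustrative parent distribution)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def f_exp(x):
    """Standard exponential density."""
    return math.exp(-x) if x > 0 else 0.0

def central_difference(g, x, h=1e-5):
    """Approximate g'(x) by (g(x + h) - g(x - h)) / (2h)."""
    return (g(x + h) - g(x - h)) / (2 * h)

n, x = 7, 0.8
for r in range(1, n + 1):
    approx = central_difference(lambda t: order_stat_cdf(t, r, n, F_exp), x)
    print(r, approx, order_stat_pdf(x, r, n, F_exp, f_exp))
```

The check covers r = 1 and r = n as well, in line with the remark that (2.3a) holds for every r.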
2.4 Finding the distribution of $X_{r:n}$ by using the density of $(X_{1:n}, \dots, X_{n:n})$. Recall that the density of $(X_{1:n}, \dots, X_{n:n})$ is $g(y_1, \dots, y_n) = n!\,f(y_1)\cdots f(y_n)$ for $y_1 < \cdots < y_n$.
Integrating out $y_n$ gives
$$g(y_1, \dots, y_{n-1}) = n!\,f(y_1)\cdots f(y_{n-1})\int_{y_{n-1}}^{\infty} f(y_n)\,dy_n = n!\,f(y_1)\cdots f(y_{n-1})\big[1 - F(y_{n-1})\big]$$
and hence
$$g(y_1, \dots, y_{n-2}) = n!\,f(y_1)\cdots f(y_{n-2})\int_{y_{n-2}}^{\infty} f(y_{n-1})\big[1 - F(y_{n-1})\big]\,dy_{n-1} = n!\,f(y_1)\cdots f(y_{n-2})\,\frac{\big[1 - F(y_{n-2})\big]^2}{2!}$$
$$g(y_1, \dots, y_{n-3}) = n!\,f(y_1)\cdots f(y_{n-3})\,\frac{\big[1 - F(y_{n-3})\big]^3}{3!}$$
and by induction for r = 1, 2, …, n − 1
$$g(y_1, \dots, y_r) = n!\,f(y_1)\cdots f(y_r)\,\frac{\big[1 - F(y_r)\big]^{n-r}}{(n-r)!} \quad\text{for } y_1 < y_2 < \cdots < y_r.$$
Assuming r ≥ 3 and integrating over $y_1$ gives
$$g(y_2, \dots, y_r) = n!\int_{y_1=-\infty}^{y_2} f(y_1)\cdots f(y_r)\,\frac{\big[1 - F(y_r)\big]^{n-r}}{(n-r)!}\,dy_1 = n!\,F(y_2)f(y_2)\cdots f(y_r)\,\frac{\big[1 - F(y_r)\big]^{n-r}}{(n-r)!}$$
Integrating over $y_2$ gives
$$g(y_3, \dots, y_r) = n!\,\frac{\big[F(y_3)\big]^2}{2!}\,f(y_3)\cdots f(y_r)\,\frac{\big[1 - F(y_r)\big]^{n-r}}{(n-r)!} \quad\text{for } y_3 < \cdots < y_r.$$
And so on, leading to equation(2.3a).
2.5 Joint distribution of $X_{j:n}$ and $X_{r:n}$ by using the density of $(X_{1:n}, \dots, X_{n:n})$. Suppose $X_{1:n}, \dots, X_{n:n}$ denote the order statistics from the n random variables $X_1, \dots, X_n$ which have density f(x) and distribution function F(x). Suppose 1 ≤ j < r ≤ n; then the joint density of $(X_{j:n}, X_{r:n})$ is
$$f_{(j:n,r:n)}(u, v) = c\,f(u)f(v)\{F(u)\}^{j-1}\{F(v) - F(u)\}^{r-1-j}\{1 - F(v)\}^{n-r} \quad\text{for } u < v \tag{2.5a}$$
where
$$c = \frac{n!}{(j-1)!\,(r-1-j)!\,(n-r)!}$$
The method used to derive this result is the same as that used to derive the distribution of $X_{r:n}$ in the previous paragraph.
Example(2.5a). Suppose X1 , . . . , Xn are i.i.d. random variables with density f (x) and distribution function F (x). Find
expressions for the density and distribution function of Rn = Xn:n − X1:n , the range of X1 , . . . , Xn .
Solution. The density of $(X_{1:n}, X_{n:n})$ is
$$f_{(1:n,n:n)}(u, v) = n(n-1) f(u) f(v)\{F(v) - F(u)\}^{n-2} \quad\text{for } u < v.$$
Now use the transformation $R = X_{n:n} - X_{1:n}$ and $T = X_{1:n}$. The absolute value of the Jacobian is one. Hence
$$f_{(R,T)}(r, t) = n(n-1) f(t) f(r + t)\{F(r + t) - F(t)\}^{n-2} \quad\text{for } r > 0 \text{ and } t \in \mathbb{R}.$$
Integrating out T gives
$$f_R(r) = n(n-1)\int_{t=-\infty}^{\infty} f(t) f(r + t)\{F(r + t) - F(t)\}^{n-2}\,dt$$
The distribution function is, for v > 0,
$$F_R(v) = n(n-1)\int_{r=0}^{v}\int_{t=-\infty}^{\infty} f(t) f(r + t)\{F(r + t) - F(t)\}^{n-2}\,dt\,dr = n(n-1)\int_{t=-\infty}^{\infty} f(t)\int_{r=0}^{v} f(r + t)\{F(r + t) - F(t)\}^{n-2}\,dr\,dt$$
$$= n\int_{t=-\infty}^{\infty} f(t)\Big[\{F(r + t) - F(t)\}^{n-1}\Big]_{r=0}^{v}\,dt = n\int_{t=-\infty}^{\infty} f(t)\{F(v + t) - F(t)\}^{n-1}\,dt$$
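For a concrete parent distribution the final integral can be evaluated numerically. For U(0, 1), splitting the integral at t = 1 − v gives $F_R(v) = nv^{n-1} - (n-1)v^n$ for 0 < v < 1, and the following Python sketch (a plain midpoint rule; the function names are illustrative) reproduces that value from the general formula.

```python
def range_cdf(v, n, F, f, lo, hi, steps=100_000):
    """F_R(v) = n * integral of f(t) [F(v + t) - F(t)]^(n-1) dt, by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        total += f(t) * (F(v + t) - F(t)) ** (n - 1)
    return n * total * h

def F_unif(x):
    """U(0, 1) distribution function."""
    return min(max(x, 0.0), 1.0)

def f_unif(x):
    """U(0, 1) density."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

n, v = 5, 0.3
print(range_cdf(v, n, F_unif, f_unif, 0.0, 1.0), n * v ** (n - 1) - (n - 1) * v**n)
```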
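2.6 Joint distribution of $X_{j:n}$ and $X_{r:n}$ by using distribution functions. Suppose u < v and then define the counts $N_1$, $N_2$ and $N_3$ as follows:
$$N_1 = \sum_{i=1}^{n} I(X_i \le u), \qquad N_2 = \sum_{i=1}^{n} I(u < X_i \le v) \qquad\text{and}\qquad N_3 = n - N_1 - N_2 = \sum_{i=1}^{n} I(X_i > v)$$
Now $P[X_1 \le u] = F(u)$, $P[u < X_1 \le v] = F(v) - F(u)$ and $P[X_1 > v] = 1 - F(v)$. It follows that the vector $(N_1, N_2, N_3)$ has the multinomial distribution with probabilities $\big(F(u), F(v) - F(u), 1 - F(v)\big)$.
The joint distribution function of $(X_{j:n}, X_{r:n})$ is:
$$P[X_{j:n} \le u \text{ and } X_{r:n} \le v] = P[N_1 \ge j \text{ and } N_1 + N_2 \ge r] = \sum_{\ell=r}^{n}\sum_{k=j}^{\ell} P[N_1 = k \text{ and } N_1 + N_2 = \ell]$$
$$= \sum_{\ell=r}^{n}\sum_{k=j}^{\ell}\frac{n!}{k!\,(\ell-k)!\,(n-\ell)!}\{F(u)\}^k\{F(v) - F(u)\}^{\ell-k}\{1 - F(v)\}^{n-\ell}$$
The joint density of $(X_{j:n}, X_{r:n})$ is
$$f_{(j:n,r:n)}(u, v) = \frac{\partial^2}{\partial u\,\partial v}P[X_{j:n} \le u \text{ and } X_{r:n} \le v]$$
Using the abbreviations a = F(u), b = F(v) − F(u) and c = 1 − F(v), and noting that ∂a/∂u = f(u), ∂b/∂u = −f(u) and ∂c/∂u = 0, gives
$$\frac{\partial}{\partial u}P[X_{j:n} \le u \text{ and } X_{r:n} \le v] = f(u)\sum_{\ell=r}^{n}\left[\sum_{k=j}^{\ell}\frac{n!\,a^{k-1}b^{\ell-k}c^{n-\ell}}{(k-1)!\,(\ell-k)!\,(n-\ell)!} - \sum_{k=j}^{\ell-1}\frac{n!\,a^{k}b^{\ell-k-1}c^{n-\ell}}{k!\,(\ell-k-1)!\,(n-\ell)!}\right] = f(u)\sum_{\ell=r}^{n}\frac{n!\,a^{j-1}b^{\ell-j}c^{n-\ell}}{(j-1)!\,(\ell-j)!\,(n-\ell)!}$$
because the inner sums telescope, leaving only the k = j term. Differentiating with respect to v, where now ∂b/∂v = f(v) and ∂c/∂v = −f(v), the sum over ℓ telescopes in the same way and only the ℓ = r term survives; hence
$$\frac{\partial^2}{\partial u\,\partial v}P[X_{j:n} \le u \text{ and } X_{r:n} \le v] = f(u)f(v)\,\frac{n!}{(j-1)!\,(r-j-1)!\,(n-r)!}\,a^{j-1}b^{r-j-1}c^{n-r}$$
as required; see equation(2.5a) on page 5.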
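The double sum is easy to evaluate directly, and a simulation gives an independent check of it. This Python sketch assumes U(0, 1) observations, so that F(x) = x on [0, 1]; the function names, sample size and seed are arbitrary choices for illustration.

```python
import math
import random

def joint_order_cdf(u, v, j, r, n, F):
    """P[X_{j:n} <= u and X_{r:n} <= v] for u < v and j < r: the multinomial double sum."""
    a, b, c = F(u), F(v) - F(u), 1 - F(v)
    total = 0.0
    for ell in range(r, n + 1):          # ell = N1 + N2
        for k in range(j, ell + 1):      # k = N1
            coef = math.factorial(n) / (
                math.factorial(k) * math.factorial(ell - k) * math.factorial(n - ell))
            total += coef * a**k * b ** (ell - k) * c ** (n - ell)
    return total

def F_unif(x):
    """U(0, 1) distribution function."""
    return min(max(x, 0.0), 1.0)

def simulate(u, v, j, r, n, reps=100_000, seed=1):
    """Monte Carlo estimate of the same probability for U(0, 1) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = sorted(rng.random() for _ in range(n))
        if xs[j - 1] <= u and xs[r - 1] <= v:   # X_{j:n} and X_{r:n} (1-based in the text)
            hits += 1
    return hits / reps

u, v, j, r, n = 0.3, 0.7, 2, 4, 5
print(joint_order_cdf(u, v, j, r, n, F_unif), simulate(u, v, j, r, n))
```

With j = 1 and r = n the double sum collapses to $v^n - (v - u)^n$, which gives a quick exact check.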
3 Exercises
Revision exercises.
1. The following assumptions are made about the interest rates for the next three years. Suppose the interest rate for year 1
is 4% p.a. effective. Let V1 and V2 denote the interest rates in years 2 and 3 respectively. Suppose V1 = 0.04 + U1 and
V2 = 0.04 + 2U2 where U1 and U2 are independent random variables with a uniform distribution on [−0.01, 0.01]. Hence
V1 has a uniform distribution on [0.03, 0.05] and V2 has a uniform distribution on [0.02, 0.06].
(a) Find the expectation of the accumulated amount at the end of 3 years of £1,000 invested now.
(b) Find the expectation of the present value of £1,000 in three years’ time.
2. Uniform to triangular. Suppose X and Y are i.i.d. random variables with the uniform distribution U(−a, a), where a > 0.
Find the density of W = X + Y and sketch the shape of the density.
3. Suppose the random variable X has the density $f_X(x) = \tfrac{1}{2}$ for −1 < x < 1. Find the density of $Y = X^4$.
4. Suppose X is a random variable with X > 0 a.e. and such that both E[X] and E[1/X] exist. Prove that E[X] + E[1/X] ≥ 2.
5. Suppose X is a random variable with X ≥ 0 and density function f. Let F denote the distribution function of X. Show that
(a) $E[X] = \int_0^{\infty}[1 - F(x)]\,dx$
(b) $E[X^r] = \int_0^{\infty} r x^{r-1}[1 - F(x)]\,dx$ for r = 1, 2, ….
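6. Suppose $X_1, X_2, \dots, X_n$ are independent and identically distributed positive random variables with µ = E[X_1], and $S_n = X_1 + \cdots + X_n$.
(a) Show that
$$E\left[\frac{1}{S_n}\right] \ge \frac{1}{n\mu}$$
(b) Show that
$$E\left[\frac{1}{S_n}\right] = \int_0^{\infty}\big(E[e^{-tX_1}]\big)^n\,dt$$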
7. Suppose $X_1, X_2, \dots, X_n$ are independent and identically distributed positive random variables.
(a) Suppose $E[1/X_i]$ is finite for all i. Show that $E[1/S_j]$ is finite for all j = 2, 3, …, n where $S_j = X_1 + \cdots + X_j$.
(b) Suppose $E[X_i]$ and $E[1/X_i]$ both exist and are finite for all i. Show that
$$E\left[\frac{S_j}{S_n}\right] = \frac{j}{n} \quad\text{for } j = 1, 2, \dots, n.$$
8. Suppose X and Y are positive random variables with E[Y] > 0. Suppose further that X/Y is independent of X and X/Y is independent of Y.
(a) Suppose E[X²], E[Y²] and E[X²/Y²] are all finite. Show that E[X] = E[X/Y] E[Y]. Hence deduce that there exists b ∈ ℝ with X/Y = b almost everywhere.
(b) Use characteristic functions to prove there exists b ∈ ℝ with X/Y = b almost everywhere.
Conditional expectation.
9. The best predictor of the random variable Y. Given the random vector (X, Y) with E[X²] < ∞ and E[Y²] < ∞, find the random variable $\hat{Y} = g(X)$ which is a function of X and provides the best predictor of Y. Precisely, show that $\hat{Y} = E[Y \mid X]$, which is a function of X, minimizes
$$E\big[(Y - \hat{Y})^2\big]$$
10. Suppose the random vector (X, Y) satisfies 0 < E[X²] < ∞ and 0 < E[Y²] < ∞. Suppose further that E[Y | X = x] = a + bx a.e.
(a) Show that $\mu_Y = a + b\mu_X$ and $E[XY] = a\mu_X + bE[X^2]$. Hence show that cov[X, Y] = b var[X] and
$$E[Y \mid X] = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}(X - \mu_X) \quad\text{a.e.}$$
(b) Show that $\operatorname{var}\big(E(Y \mid X)\big) = \rho^2\sigma_Y^2$ and $E\big[\big(Y - E(Y \mid X)\big)^2\big] = (1 - \rho^2)\sigma_Y^2$.
(Hence if ρ ≈ 1 then Y is near E(Y | X) with high probability; if ρ = 0 then the variation of Y about E(Y | X) is the same as the variation about the mean $\mu_Y$.)
(c) Suppose E(X | Y) = c + dY a.e. where bd < 1 and d ≠ 0. Find expressions for E[X], E[Y], ρ² and $\sigma_Y^2/\sigma_X^2$ in terms of a, b, c and d.
11. Best linear predictor of the random variable Y. Suppose the random vector (X, Y) satisfies 0 < E[X²] < ∞ and 0 < E[Y²] < ∞.
Find a and b such that the random variable $\hat{Y} = a + bX$ provides the best linear predictor of Y. Precisely, find a ∈ ℝ and b ∈ ℝ which minimize $E\big[(Y - a - bX)^2\big]$.
Note. Suppose $E[Y \mid X] = a_0 + b_0X$. By exercise 9, we know that $E[Y \mid X] = a_0 + b_0X$ is the best predictor of Y. Hence $a_0 + b_0X$ is also the best linear predictor of Y.
12. Suppose the random vector (X, Y) has the density
$$f_{(X,Y)}(x, y) = \begin{cases} \tfrac{6}{7}(x + y)^2 & \text{for } x \in [0, 1] \text{ and } y \in [0, 1];\\ 0 & \text{otherwise.}\end{cases}$$
(a) Find the best predictor of Y.
(b) Find the best linear predictor of Y.
(c) Compare the plots of the answers to parts (a) and (b) as functions of x ∈ [0, 1].
Order statistics.
13. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the uniform U (0, 1) distribution.
(a) Find the distribution of Xj:n .
(b) Find E[Xj:n ].
14. Suppose k > r. It is known¹ that if the random variable X has an absolutely continuous distribution with distribution function F then the conditional distribution function $P[X_{k:n} < y \mid X_{r:n} = x]$ is the same as the distribution function of the (k − r)th order statistic in a sample of size (n − r) from the distribution function
$$F_1(y) = \begin{cases} \dfrac{F(y) - F(x)}{1 - F(x)} & \text{if } y > x;\\[1ex] 0 & \text{otherwise.}\end{cases}$$
Suppose $X_1$ and $X_2$ are i.i.d. absolutely continuous non-negative random variables with density function f(x) and distribution function F(x). By using the above result, show that $X_{2:2} - X_{1:2}$ is independent of $X_{1:2}$ if and only if $X_1 \sim$ exponential(λ).
15. Suppose $X_1, X_2, \dots, X_n$ are i.i.d. absolutely continuous non-negative random variables with density function f(x) and distribution function F(x). Define the vector $(Y_1, Y_2, \dots, Y_n)$ by
$$Y_1 = X_{1:n}, \quad Y_2 = \frac{X_{2:n}}{X_{1:n}}, \quad \dots, \quad Y_n = \frac{X_{n:n}}{X_{1:n}}$$
(a) Find an expression for the density of the vector $(Y_1, Y_2, \dots, Y_n)$ in terms of f and F.
(b) Hence derive expressions for the density of the vector $(Y_1, Y_2) = (X_{1:n}, X_{2:n}/X_{1:n})$ and the density of the random variable $Y_1 = X_{1:n}$.
4 The gamma and chi-squared distributions
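4.1 Definition of the Gamma distribution.
Definition(4.1a). Suppose n > 0 and α > 0. Then the random variable X has the Gamma(n, α) distribution iff X has density
$$f(x) = \frac{\alpha^n x^{n-1}e^{-\alpha x}}{\Gamma(n)} \quad\text{for } x > 0. \tag{4.1a}$$
By definition, $\Gamma(n) = \int_0^{\infty} x^{n-1}e^{-x}\,dx$ for all n ∈ (0, ∞). It follows, by substituting u = αx, that
$$\int_0^{\infty} x^{n-1}e^{-\alpha x}\,dx = \frac{\Gamma(n)}{\alpha^n} \quad\text{provided } \alpha > 0 \text{ and } n > 0. \tag{4.1b}$$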
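Equation(4.1b) can be confirmed numerically with `math.gamma` from the Python standard library. The sketch below uses a plain midpoint rule truncated at a large upper limit; it assumes n ≥ 1 so that the integrand stays bounded near 0, and the parameter values are arbitrary.

```python
import math

def gamma_integral(n, alpha, upper=60.0, steps=200_000):
    """Midpoint-rule approximation to the integral of x^(n-1) e^(-alpha x) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (n - 1) * math.exp(-alpha * x)
    return total * h

n, alpha = 2.5, 1.5
print(gamma_integral(n, alpha), math.gamma(n) / alpha**n)   # left and right sides of (4.1b)
```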
4.2 The distribution function. There is a simple expression only when n is a positive integer: if $S_n \sim \text{Gamma}(n, \alpha)$ then
$$G_n(x) = P[S_n \le x] = 1 - e^{-\alpha x}\left(1 + \frac{\alpha x}{1!} + \frac{(\alpha x)^2}{2!} + \cdots + \frac{(\alpha x)^{n-1}}{(n-1)!}\right)$$
This is easy to check: just differentiate $G_n(x)$ and obtain the density in equation(4.1a).
Note that $P[S_n \le x] = P[Y \ge n]$ where Y has a Poisson distribution with expectation αx. In terms of the Poisson process with rate α, this means that the nth event occurs before time x iff there are at least n events in [0, x].
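Both claims in this paragraph can be checked numerically. In this Python sketch (integer n only; the function names are illustrative) a central difference of $G_n$ is compared with the density (4.1a), and $G_n(x)$ is compared with the Poisson tail P[Y ≥ n] for a Poisson mean of αx.

```python
import math

def gamma_pdf(x, n, alpha):
    """Gamma(n, alpha) density, equation(4.1a)."""
    return alpha**n * x ** (n - 1) * math.exp(-alpha * x) / math.gamma(n)

def erlang_cdf(x, n, alpha):
    """G_n(x) = 1 - e^{-alpha x} * sum_{k=0}^{n-1} (alpha x)^k / k!, for integer n."""
    ax = alpha * x
    return 1 - math.exp(-ax) * sum(ax**k / math.factorial(k) for k in range(n))

def poisson_tail(n, mean):
    """P[Y >= n] for Y ~ Poisson(mean)."""
    return 1 - sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(n))

n, alpha, x, h = 4, 2.0, 1.3, 1e-5
slope = (erlang_cdf(x + h, n, alpha) - erlang_cdf(x - h, n, alpha)) / (2 * h)
print(slope, gamma_pdf(x, n, alpha))                          # derivative of G_n vs density
print(erlang_cdf(x, n, alpha), poisson_tail(n, alpha * x))    # P[S_n <= x] vs P[Y >= n]
```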
4.3 Multiple of a gamma distribution. Suppose n > 0 and α > 0 and X ∼ Gamma(n, α) with density $f_X(x)$. Suppose further that Y = βX where β > 0. Then the density of Y is given by:
$$f_Y(y) = \frac{f_X(x)}{\left|\frac{dy}{dx}\right|} = \frac{\alpha^n (y/\beta)^{n-1}\exp(-\alpha y/\beta)}{\beta\,\Gamma(n)}$$
Hence Y = βX ∼ Gamma(n, α/β).
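4.4 Moments and shape of the gamma distribution. Using the result that ∫ f(x) dx = 1 easily gives
$$E[X^k] = \frac{\Gamma(n+k)}{\alpha^k\,\Gamma(n)} \quad\text{for } n + k > 0 \tag{4.4a}$$
and so
$$E[X] = \frac{n}{\alpha}, \qquad \operatorname{var}(X) = \frac{n}{\alpha^2} \qquad\text{and}\qquad E\left[\frac{1}{X}\right] = \frac{\alpha}{n-1} \quad\text{for } n > 1.$$
¹ For example, page 38 of [Galambos & Kotz (1978)].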
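A simulation gives a quick check of these moments. Note that Python's `random.gammavariate(alpha, beta)` is parameterised by shape and *scale*, whereas the text uses shape n and *rate* α, so the sketch below passes scale 1/α. The parameter values, sample size and seed are arbitrary, and the function names are illustrative.

```python
import math
import random
from statistics import fmean

def gamma_moments_mc(n, alpha, reps=200_000, seed=2):
    """Simulated E[X], var(X) and E[1/X] for X ~ Gamma(n, alpha), with alpha a rate."""
    rng = random.Random(seed)
    xs = [rng.gammavariate(n, 1 / alpha) for _ in range(reps)]   # (shape, scale = 1/rate)
    m = fmean(xs)
    return m, fmean((x - m) ** 2 for x in xs), fmean(1 / x for x in xs)

def kth_moment(n, alpha, k):
    """Equation(4.4a): Gamma(n + k) / (alpha^k Gamma(n)), valid for n + k > 0."""
    return math.gamma(n + k) / (alpha**k * math.gamma(n))

n, alpha = 3.0, 2.0
print(gamma_moments_mc(n, alpha))
print(n / alpha, n / alpha**2, alpha / (n - 1))           # formulas from the text
print(kth_moment(n, alpha, 1), kth_moment(n, alpha, -1))  # the same values via (4.4a)
```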
Figure(4.4a). Plot of the gamma density function for n = 1/2, n = 1 and n = 2 (all with α = 1).
By §4.3, we know that if X ∼ Gamma(n, α) then Y = αX ∼ Gamma(n, 1). So without loss of generality, we consider the shape of the density of the Gamma(n, 1) distribution.
Let $g(x) = x^{n-1}e^{-x}$. If n ≤ 1, then $g(x) = e^{-x}/x^{1-n}$ is monotonic decreasing and hence the density of the Gamma(n, 1) distribution is monotonic decreasing.
If n > 1, then $g'(x) = e^{-x}x^{n-2}[n - 1 - x]$. Clearly, if x < n − 1 then g′(x) > 0; if x = n − 1 then g′(x) = 0 and if x > n − 1 then g′(x) < 0. Hence the density first increases to the maximum at x = n − 1 and then decreases.
By using §4.3, it follows that the maximum of the Gamma(n, α) density occurs at x = (n − 1)/α.
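4.5 The moment generating function of a gamma distribution. Suppose X ∼ Gamma(n, α). Then
$$M_X(t) = E[e^{tX}] = \int_0^{\infty} e^{tx}\,\frac{\alpha^n x^{n-1}e^{-\alpha x}}{\Gamma(n)}\,dx = \frac{\alpha^n}{\Gamma(n)}\int_0^{\infty} x^{n-1}e^{-(\alpha - t)x}\,dx = \frac{\alpha^n}{\Gamma(n)}\,\frac{\Gamma(n)}{(\alpha - t)^n} = \frac{1}{(1 - t/\alpha)^n} \quad\text{for } t < \alpha. \tag{4.5a}$$
Hence the characteristic function is $1/(1 - it/\alpha)^n$; in particular, if n = 1, the characteristic function of the exponential(α) distribution is α/(α − it).
Equation(4.5a) shows that for integral n, the Gamma(n, α) distribution is the distribution of the sum of n independent exponential(α) random variables, because the m.g.f. of such a sum is the nth power of the exponential m.g.f. 1/(1 − t/α). The next paragraph gives the long proof of this.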
4.6 Characterization of the gamma distribution for integral n. The following proposition shows that if $G_n \sim \text{Gamma}(n, \alpha)$, then $G_n$ has the same distribution as the waiting time for the nth event in a Poisson process with rate α.
Proposition(4.6a). Suppose $X_1, X_2, \dots, X_n$ are i.i.d. random variables with the exponential density $\alpha e^{-\alpha x}$ for x ≥ 0. Then $G_n = X_1 + \cdots + X_n$ has the Gamma(n, α) distribution.
Proof. By induction: let $g_n$ denote the density of $G_n$. Then for all t > 0 we have
$$g_{n+1}(t) = \int_0^t g_n(t - x)\,\alpha e^{-\alpha x}\,dx = \int_0^t \frac{\alpha^n (t - x)^{n-1}e^{-\alpha(t - x)}}{\Gamma(n)}\,\alpha e^{-\alpha x}\,dx$$
$$= \frac{\alpha^{n+1}e^{-\alpha t}}{\Gamma(n)}\int_0^t (t - x)^{n-1}\,dx = \frac{\alpha^{n+1}e^{-\alpha t}}{\Gamma(n)}\left[-\frac{(t - x)^n}{n}\right]_{x=0}^{x=t} = \frac{\alpha^{n+1}t^n e^{-\alpha t}}{\Gamma(n + 1)}$$
as required.
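Proposition(4.6a) can also be illustrated by simulation: sums of n independent exponential(α) variables, drawn with `random.expovariate` (whose argument is the rate), should have an empirical distribution function close to $G_n$ from §4.2. A Python sketch with arbitrary n, α, sample size and seed; the function names are illustrative.

```python
import math
import random

def erlang_cdf(x, n, alpha):
    """G_n(x) for integer n (section 4.2)."""
    ax = alpha * x
    return 1 - math.exp(-ax) * sum(ax**k / math.factorial(k) for k in range(n))

def empirical_cdf_of_sum(x, n, alpha, reps=100_000, seed=3):
    """Proportion of simulated G_n = X_1 + ... + X_n <= x, with X_i i.i.d. exponential(alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        g = sum(rng.expovariate(alpha) for _ in range(n))   # expovariate takes the rate
        if g <= x:
            hits += 1
    return hits / reps

n, alpha = 3, 0.5
for x in (2.0, 6.0, 12.0):
    print(x, empirical_cdf_of_sum(x, n, alpha), erlang_cdf(x, n, alpha))
```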
The result that the sum of n independent exponentials has the Gamma distribution is the continuous analogue of
the result that the sum of n independent geometrics has a negative binomial distribution.
It follows from proposition(4.6a) and the Central Limit Theorem that for n large,
$$\frac{G_n - n/\alpha}{\sqrt{n}/\alpha} \text{ is approximately } N(0, 1)$$
and hence
$$P[G_n \le x] \approx \Phi\left(\frac{\alpha x - n}{\sqrt{n}}\right)$$
The local central limit theorem² shows that
$$\lim_{n \to \infty}\frac{\sqrt{n}}{\alpha}\,f_{G_n}\!\left(\frac{n + z\sqrt{n}}{\alpha}\right) = \mathrm{n}(z) \quad\text{where}\quad \mathrm{n}(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2} \tag{4.6a}$$
See exercise 9 on page 12 below.
4.7 Lukacs' characterization of the gamma distribution.
Proposition(4.7a). Suppose X and Y are both positive, non-degenerate³ and independent random variables. Then X/(X + Y) is independent of X + Y iff there exist $k_1 > 0$, $k_2 > 0$ and α > 0 such that $X \sim \text{Gamma}(k_1, \alpha)$ and $Y \sim \text{Gamma}(k_2, \alpha)$.
Proof.
⇐ This is exercise 5 on page 12.
⇒ This is proved in [Lukacs (1955)] and [Marsaglia (1989)].
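Independence cannot be proved by simulation, but one consequence of it can be illustrated. If V = X/(X + Y) is independent of U = X + Y, the spread of V should be the same whether U is small or large. The Python sketch below compares the variance of V below and above the median of U for a gamma pair with a common rate (where V ∼ Beta(2, 3), with variance 0.04) and for a pair of U(0, 1) variables, where the two variances clearly differ. The choice of distributions, sample size and seed is illustrative only.

```python
import random

def variance(values):
    """Population variance of a list of floats."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def split_variances(draw, reps=100_000, seed=4):
    """Var(V) for draws with U = X + Y below, and above, the sample median of U."""
    rng = random.Random(seed)
    pairs = [draw(rng) for _ in range(reps)]
    # Sort (U, V) pairs by U so that the first half holds the smallest values of U.
    uv = sorted((x + y, x / (x + y)) for x, y in pairs)
    half = reps // 2
    lower = [v for _, v in uv[:half]]
    upper = [v for _, v in uv[half:]]
    return variance(lower), variance(upper)

def gamma_pair(rng):
    """X ~ Gamma(2, 1) and Y ~ Gamma(3, 1): a common rate, as in proposition(4.7a)."""
    return rng.gammavariate(2, 1), rng.gammavariate(3, 1)

def uniform_pair(rng):
    """X, Y ~ U(0, 1): positive and independent, but not gamma."""
    return 1.0 - rng.random(), 1.0 - rng.random()   # values in (0, 1], avoiding 0

print("gamma:  ", split_variances(gamma_pair))
print("uniform:", split_variances(uniform_pair))
```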
We can easily extend this result to n variables:
Proposition(4.7b). Suppose $X_1, X_2, \dots, X_n$ are positive, non-degenerate and independent random variables. Then $X_j/(X_1 + \cdots + X_n)$ is independent of $X_1 + \cdots + X_n$ for j = 1, 2, …, n iff there exist α > 0, $k_1 > 0, \dots, k_n > 0$ such that $X_j \sim \text{Gamma}(k_j, \alpha)$ for j = 1, 2, …, n.
Proof.
⇐ Now $W = X_2 + \cdots + X_n \sim \text{Gamma}(k_2 + \cdots + k_n, \alpha)$ and $X_1 \sim \text{Gamma}(k_1, \alpha)$. Also W and $X_1$ are independent positive random variables. Hence $X_1/(X_1 + \cdots + X_n)$ is independent of $X_1 + \cdots + X_n$ by proposition(4.7a). Similarly $X_j/(X_1 + \cdots + X_n)$ is independent of $X_1 + \cdots + X_n$ for j = 2, …, n.
⇒ Let $W_j = X_1 + \cdots + X_n - X_j$. Then $W_j$ and $X_j$ are independent positive random variables. Also $X_j/(W_j + X_j)$ is independent of $W_j + X_j$. By proposition(4.7a), there exist $k_j > 0$, $k_j^* > 0$ and $\alpha_j > 0$ such that $X_j \sim \text{Gamma}(k_j, \alpha_j)$ and $W_j \sim \text{Gamma}(k_j^*, \alpha_j)$. Hence $X_1 + \cdots + X_n = W_j + X_j \sim \text{Gamma}(k_j + k_j^*, \alpha_j)$ for every j. Since a gamma distribution determines its parameters, $\alpha_1 = \cdots = \alpha_n$. The result follows.
4.8 The χ² distribution. For n ∈ (0, ∞) the Gamma(n/2, 1/2) distribution has density:
$$f(x) = \frac{x^{n/2 - 1}e^{-x/2}}{2^{n/2}\,\Gamma(n/2)} \quad\text{for } x > 0.$$
This is the density of the $\chi^2_n$ distribution. If n is a positive integer, then n is called the degrees of freedom.
In particular, if n ∈ (0, ∞) and X ∼ Gamma(n/2, α) then $2\alpha X \sim \text{Gamma}(n/2, 1/2) = \chi^2_n$.
If $Y \sim \chi^2_n = \text{Gamma}(n/2, 1/2)$, then equation(4.4a) shows that the kth moment of Y is given by
$$E[Y^k] = \begin{cases} \dfrac{2^k\,\Gamma(k + n/2)}{\Gamma(n/2)} & \text{if } n > -2k;\\[1ex] \infty & \text{if } n \le -2k.\end{cases} \tag{4.8a}$$
In particular E[Y] = n, E[Y²] = n(n + 2), var[Y] = 2n, $E[\sqrt{Y}] = \sqrt{2}\,\Gamma\big((n+1)/2\big)/\Gamma(n/2)$ and E[1/Y] = 1/(n − 2) provided n > 2.
By equation(4.5a), the c.f. of the $\chi^2_n$ distribution is $1/(1 - 2it)^{n/2}$. It immediately follows that if $X \sim \chi^2_m$, $Y \sim \chi^2_n$ and X and Y are independent, then $X + Y \sim \chi^2_{m+n}$.
² The local central limit theorem. Suppose $Y_1, Y_2, \dots$ are i.i.d. random variables with mean 0, variance 1 and characteristic function $\phi_Y$. Suppose further that $|\phi_Y|^k$ is integrable for some positive k and $\sup\{|\phi_Y(t)| : |t| \ge \delta\} < 1$ for all δ > 0. Let $S_n = Y_1 + \cdots + Y_n$; then $S_n/\sqrt{n}$ has a bounded continuous density $f_n$ for all n ≥ k and $\sup_{x \in \mathbb{R}}|f_n(x) - \mathrm{n}(x)| \to 0$ as n → ∞.
This formulation is due to Michael Wichura: galton.uchicago.edu/~wichura/Stat304/Handouts/L16.limits.pdf. See also page 516 in William Feller (1971) An Introduction to Probability Theory and its Applications. Volume 2.
³ To exclude the trivial case that both X and Y are constant. In fact if one of X and Y is constant and X/(X + Y) is independent of X + Y, then the other must be constant also.
4.9 The generalized gamma distribution.
Definition(4.9a). Suppose n > 0, λ > 0 and b > 0. Then the random variable X has the generalized gamma distribution GGamma(n, λ, b) iff X has density
$$f(x) = \frac{b\lambda^n}{\Gamma(n)}\,x^{bn-1}e^{-\lambda x^b} \quad\text{for } x > 0. \tag{4.9a}$$
Note that if n = b = 1, then the generalized gamma is the exponential distribution; if b = 1, the generalized gamma is the gamma distribution; if n = 1, the generalized gamma is the Weibull distribution, introduced below in §18.3 on page 36; if n = 1, b = 2 and λ = 1/(2σ²), the generalized gamma is the Rayleigh distribution, introduced below in §18.2 on page 36; and finally if n = 1/2, b = 2 and λ = 1/(2σ²) then the generalized gamma is the half-normal distribution, introduced in exercise 1.7(7) on page 15.
It is left to an exercise (see exercise 10 on page 12) to check:
• The function f in equation(4.9a) integrates to 1 and so is a density.
• If X ∼ GGamma(n, λ, b) then $Y = X^b \sim \text{Gamma}(n, \lambda)$.
• The moments (about the origin) are given by the expression:
$$E[X^k] = \frac{\Gamma(k/b + n)}{\lambda^{k/b}\,\Gamma(n)}$$
The generalized gamma distribution is used in survival analysis and reliability theory to model lifetimes.
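The second bullet gives a simple way to simulate from the generalized gamma distribution: draw Y ∼ Gamma(n, λ) and return $Y^{1/b}$. The Python sketch below does this and compares sample moments with the moment formula; as before, `random.gammavariate` takes a scale, so it is passed 1/λ. Parameter values, seed and function names are illustrative.

```python
import math
import random

def ggamma_sample(n, lam, b, rng):
    """Draw X ~ GGamma(n, lam, b) as Y^(1/b) with Y ~ Gamma(n, lam), lam being a rate."""
    y = rng.gammavariate(n, 1 / lam)   # gammavariate takes (shape, scale)
    return y ** (1 / b)

def ggamma_moment(k, n, lam, b):
    """E[X^k] = Gamma(k/b + n) / (lam^(k/b) Gamma(n))."""
    return math.gamma(k / b + n) / (lam ** (k / b) * math.gamma(n))

rng = random.Random(5)
n, lam, b = 1.5, 2.0, 2.0
xs = [ggamma_sample(n, lam, b, rng) for _ in range(200_000)]
for k in (1, 2):
    sample_moment = sum(x**k for x in xs) / len(xs)
    print(k, sample_moment, ggamma_moment(k, n, lam, b))
```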
4.10 Summary. The gamma distribution.
• Density. X has the Gamma(n, α) density for n > 0 and α > 0 iff
$$f_X(x) = \frac{\alpha^n x^{n-1}e^{-\alpha x}}{\Gamma(n)} \quad\text{for } x > 0.$$
• Moments. E[X] = n/α; var[X] = n/α² and $E[X^k] = \Gamma(n+k)/\big(\alpha^k\,\Gamma(n)\big)$ for n + k > 0.
• M.g.f. and c.f.
$$M_X(t) = E[e^{tX}] = \frac{1}{(1 - t/\alpha)^n} \quad\text{for } t < \alpha, \qquad \phi_X(t) = E[e^{itX}] = \frac{1}{(1 - it/\alpha)^n}$$
• Properties.
Gamma(1, α) is the exponential(α) distribution.
If X ∼ Gamma(n, α) and β > 0 then βX ∼ Gamma(n, α/β).
For integral n, the Gamma(n, α) distribution is the distribution of the sum of n independent exponential(α) random variables.
If X ∼ Gamma(m, α), Y ∼ Gamma(n, α) and X and Y are independent, then X + Y ∼ Gamma(m + n, α).
The $\chi^2_n$ distribution.
• This is the Gamma(n/2, 1/2) distribution.
• If $X \sim \chi^2_n$, then E[X] = n, var[X] = 2n and the c.f. is $\phi(t) = 1/(1 - 2it)^{n/2}$.
• If $X \sim \chi^2_m$, $Y \sim \chi^2_n$ and X and Y are independent, then $X + Y \sim \chi^2_{m+n}$.
• The $\chi^2_2$ distribution is the exponential(1/2) distribution.
5 Exercises
1. The Gamma function. This is defined to be $\Gamma(x) = \int_0^{\infty} u^{x-1}e^{-u}\,du$ for x > 0. Show that
(a) Γ(x + 1) = x Γ(x) for all x > 0;
(b) Γ(1) = 1;
(c) Γ(n) = (n − 1)! for all integral n ≥ 2;
(d) Γ(1/2) = √π;
(e) $\Gamma\left(n + \tfrac{1}{2}\right) = \dfrac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2^n}\sqrt{\pi} = \dfrac{(2n)!}{2^{2n}\,n!}\sqrt{\pi}$ for integral n ≥ 1.
2. Suppose X ∼ Gamma(m, α) and Y ∼ Gamma(n, α) and X and Y are independent. Find E[ Y /X ].
3. By §4.4 on page 8, we know that if n > 1, the maximum of the Gamma(n, 1) density occurs at x = n − 1. Show that the maximum value of the density when n > 1 is approximately
$$\frac{1}{\sqrt{2\pi(n - 1)}}$$
Hint: Stirling's formula is $n! \sim n^n e^{-n}\sqrt{2\pi n}$ as n → ∞.
4. Gamma densities are closed under convolution. Suppose X ∼ Gamma(n1 , α), Y ∼ Gamma(n2 , α) and X and Y are
independent. Prove that X + Y has the Gamma(n1 + n2 , α) distribution.
5. Suppose X ∼ Gamma(m, α) and Y ∼ Gamma(n, α) and X and Y are independent.
(a) Show that U = X + Y and V = X/(X + Y) are independent.
(b) Show that U = X + Y and V = Y /X are independent.
In both cases, find the densities of U and V .
6. Suppose X ∼ Gamma(n, α). Show that
$$P\left[X \ge \frac{2n}{\alpha}\right] \le \left(\frac{2}{e}\right)^n$$
7. Suppose X ∼ Gamma(m, α) and Y ∼ Gamma(n, α) and X and Y are independent. Show that
$$E[X \mid X + Y = v] = \begin{cases} \dfrac{mv}{m + n} & \text{if } v > 0;\\[1ex] 0 & \text{otherwise.}\end{cases}$$
8. Suppose X ∼ exponential(λ), and given X = x, the n random variables $Y_1, \dots, Y_n$ are i.i.d. exponential(x).⁴ Find the distribution of $(X \mid Y_1, \dots, Y_n)$ and $E[X \mid Y_1, \dots, Y_n]$.
9. Suppose $G_n \sim \text{Gamma}(n, \alpha)$ and $S_n = \alpha(G_n - n/\alpha)$ where n > 1 is an integer. Hence
$$S_n = \alpha(G_n - n/\alpha) = \sum_{i=1}^{n}\alpha(X_i - 1/\alpha) = \sum_{i=1}^{n} Y_i$$
where each $X_i \sim$ exponential(α) and each $Y_i$ has mean 0 and variance 1.
Check that the conditions of the local central limit theorem (§4.6 on page 9) are satisfied and hence verify the limiting result (4.6a) on page 10.
10. The generalized gamma distribution.
(a) Show that the function f defined in equation(4.9a) is a density.
(b) Suppose X ∼ GGamma(n, λ, b). Show that Y = X b ∼ Gamma(n, λ).
(c) Suppose X ∼ GGamma(n, λ, b). Find the moments $E[X^k]$ for k = 1, 2, ….
6 The normal distribution
6.1 The density function.
Definition(6.1a). Suppose µ ∈ (−∞, ∞) and σ ∈ (0, ∞). Then the random variable X has the normal N(µ, σ²) distribution if it has density
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \quad\text{for } x \in \mathbb{R}. \tag{6.1a}$$
The normal density has the familiar "bell" shape. There are points of inflection at x = µ − σ and x = µ + σ; this means that f″(x) = 0 at these points and the curve changes from convex, when x < µ − σ, to concave and then to convex again when x > µ + σ.
Figure(6.1a). The graph of the normal density. Points A and B are points of inflection.
⁴ This means that $f_{(Y_1,\dots,Y_n)\mid X}(y_1, \dots, y_n \mid x) = \prod_{i=1}^{n} f_{Y_i \mid X}(y_i \mid x) = x^n e^{-x(y_1 + \cdots + y_n)}$.
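To check that the function $f_X$ defined in equation(6.1a) is a density function:
Clearly $f_X(x) \ge 0$ for all x ∈ ℝ. Using the substitution t = (x − µ)/σ gives
$$I = \int_{-\infty}^{\infty} f_X(x)\,dx = \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left(-t^2/2\right)dt = \frac{2}{\sqrt{2\pi}}\int_0^{\infty}\exp\left(-t^2/2\right)dt = \sqrt{\frac{2}{\pi}}\,J$$
where, changing to polar coordinates,
$$J^2 = \int_0^{\infty}\int_0^{\infty}\exp\left[-\tfrac{1}{2}(x^2 + y^2)\right]dy\,dx = \int_0^{\pi/2}\int_0^{\infty} r\exp\left[-\tfrac{1}{2}r^2\right]dr\,d\theta = \frac{\pi}{2}$$
and hence
$$J = \sqrt{\frac{\pi}{2}}$$
This shows that $f_X$ integrates to 1 and hence is a density function.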
6.2 The distribution function, mean and variance. The standard normal distribution is the normal distribution N(0, 1); its distribution function is
$$\Phi(x) = \int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}\exp\left[-\tfrac{1}{2}t^2\right]dt$$
This function is widely tabulated. Note that:
• Φ(−x) = 1 − Φ(x). See exercise 1 on page 15.
• If X has the N(µ, σ²) distribution, then for −∞ < a < b < ∞ we have
$$P[a < X \le b] = \int_a^b\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)dx = \frac{1}{\sqrt{2\pi}}\int_{(a-\mu)/\sigma}^{(b-\mu)/\sigma}\exp\left(-t^2/2\right)dt = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$$
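Φ has no elementary closed form, but it can be computed from the error function as $\Phi(x) = \tfrac{1}{2}\big(1 + \operatorname{erf}(x/\sqrt{2})\big)$. The Python sketch below uses `math.erf` to evaluate P[a < X ≤ b] by standardising, as in the bullet above, and cross-checks the result against `statistics.NormalDist` from the standard library (Python 3.8 or later). The example µ and σ are arbitrary and the function names are illustrative.

```python
import math
from statistics import NormalDist

def Phi(x):
    """Standard normal distribution function: (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def interval_prob(a, b, mu, sigma):
    """P[a < X <= b] for X ~ N(mu, sigma^2), standardising with t = (x - mu) / sigma."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

mu, sigma = 10.0, 2.0
p = interval_prob(mu - sigma, mu + sigma, mu, sigma)   # P[mu - sigma < X <= mu + sigma]
nd = NormalDist(mu, sigma)
print(p, nd.cdf(mu + sigma) - nd.cdf(mu - sigma))
```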
The mean of the N(µ, σ²) distribution:
$$E[X] = \int_{-\infty}^{\infty}\big[(x - \mu) + \mu\big]\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)dx = 0 + \mu = \mu$$
because the function $x \mapsto x\exp[-x^2/2]$ is odd.
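The variance of the N(µ, σ²) distribution: use integration by parts as follows
$$\operatorname{var}[X] = \int_{-\infty}^{\infty}(x - \mu)^2\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)dx = \frac{\sigma^2}{\sqrt{2\pi}}\int_{-\infty}^{\infty}t^2\exp\left[-t^2/2\right]dt = \frac{2\sigma^2}{\sqrt{2\pi}}\int_0^{\infty}t \cdot t\exp\left[-t^2/2\right]dt = \frac{2\sigma^2}{\sqrt{2\pi}}\int_0^{\infty}\exp\left[-t^2/2\right]dt = \sigma^2$$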
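6.3 The moment generating function and characteristic function. Suppose X ∼ N(µ, σ²); then X = µ + σY where Y ∼ N(0, 1). For s ∈ ℝ, the moment generating function of X is given by
$$M_X(s) = E[e^{sX}] = e^{s\mu}E[e^{s\sigma Y}] = e^{s\mu}\int_{-\infty}^{\infty}e^{s\sigma t}\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}t^2}\,dt = \frac{e^{s\mu}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left(-\frac{t^2 - 2\sigma st}{2}\right)dt = \frac{e^{s\mu}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left(-\frac{(t - \sigma s)^2}{2} + \frac{\sigma^2 s^2}{2}\right)dt = \exp\left(s\mu + \tfrac{1}{2}\sigma^2 s^2\right)$$
Similarly the characteristic function of X is $E[e^{itX}] = e^{i\mu t - \frac{1}{2}\sigma^2 t^2}$.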
Moments of a distribution can be obtained by expanding the moment generating function as a power series: $E[X^r]$ is the coefficient of $s^r/r!$ in the expansion of the moment generating function. It is easy to find the moments about the mean of a normal distribution in this way: if X ∼ N(µ, σ²) and Y = X − µ then
$$E[e^{sY}] = \exp\left(\tfrac{1}{2}\sigma^2 s^2\right) = \sum_{n=0}^{\infty}\frac{\sigma^{2n}s^{2n}}{2^n\,n!}$$
which contains only even powers of s. Hence
$$E\big[(X - \mu)^{2n+1}\big] = E\big[Y^{2n+1}\big] = 0 \quad\text{for } n = 0, 1, \dots$$
and
$$E\big[(X - \mu)^{2n}\big] = E\big[Y^{2n}\big] = \frac{(2n)!\,\sigma^{2n}}{2^n\,n!} \quad\text{for } n = 0, 1, \dots$$
For example, E[(X − µ)²] = σ² and E[(X − µ)⁴] = 3σ⁴.
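The power-series argument can be carried out mechanically. This Python sketch expands $\exp(\tfrac{1}{2}\sigma^2 s^2)$ as a truncated power series with exact fractions, reads off $E[Y^r]$ as r! times the coefficient of $s^r$, and compares the result with the closed form above. The value σ² = 3 and the truncation degree are arbitrary, and the helper functions are hypothetical names for illustration.

```python
from fractions import Fraction
from math import factorial

DEG = 10                 # expand up to the s^10 term
SIGMA2 = Fraction(3)     # sigma^2 = 3, an arbitrary illustrative value

def poly_mul(p, q, deg):
    """Product of two coefficient lists, truncated after the s^deg term."""
    out = [Fraction(0)] * (deg + 1)
    for i, a in enumerate(p):
        if a == 0:
            continue
        for j, b in enumerate(q):
            if i + j > deg:
                break
            out[i + j] += a * b
    return out

def exp_series(q, deg):
    """Coefficients of exp(q(s)) up to s^deg, assuming q(0) = 0."""
    result = [Fraction(0)] * (deg + 1)
    result[0] = Fraction(1)
    term = result[:]                      # q^0 / 0!
    for m in range(1, deg + 1):
        # q^m / m! = (q^(m-1) / (m-1)!) * q / m; q^m has no terms below s^m.
        term = [c / m for c in poly_mul(term, q, deg)]
        result = [a + b for a, b in zip(result, term)]
    return result

def closed_form(r, sigma2):
    """0 for odd r and (2n)! sigma^(2n) / (2^n n!) for r = 2n."""
    if r % 2:
        return Fraction(0)
    n = r // 2
    return factorial(2 * n) * sigma2**n / (2**n * factorial(n))

# The m.g.f. of Y = X - mu is exp(sigma^2 s^2 / 2), i.e. q(s) = (sigma^2 / 2) s^2.
q = [Fraction(0), Fraction(0), SIGMA2 / 2]
moments = [factorial(r) * c for r, c in enumerate(exp_series(q, DEG))]   # E[Y^r]
for r in range(DEG + 1):
    print(r, moments[r], closed_form(r, SIGMA2))
```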
Similarly we can show that (see exercise 9 on page 15):
$$E\big[|X - \mu|^n\big] = \frac{2^{n/2}\sigma^n}{\sqrt{\pi}}\,\Gamma\left(\frac{n + 1}{2}\right) \quad\text{for } n = 0, 1, \dots.$$
Complicated expressions for $E[X^n]$ and $E[|X|^n]$ are also available; for example, see [Winkelbauer (2014)].
6.4 Linear combination of independent normals.
Proposition(6.4a). Suppose $X_1, X_2, \dots, X_n$ are independent random variables with $X_i \sim N(\mu_i, \sigma_i^2)$ for i = 1, 2, …, n. Let $T = \sum_{i=1}^{n} d_iX_i$ where $d_i \in \mathbb{R}$ for i = 1, 2, …, n. Then
$$T \sim N\left(\sum_{i=1}^{n} d_i\mu_i,\ \sum_{i=1}^{n} d_i^2\sigma_i^2\right)$$
Proof. Using moment generating functions and the independence of $X_1, \dots, X_n$ gives
$$M_T(s) = E[e^{sT}] = \prod_{i=1}^{n} E[e^{sd_iX_i}] = \prod_{i=1}^{n} M_{X_i}(sd_i) = \prod_{i=1}^{n}\exp\left(sd_i\mu_i + \tfrac{1}{2}s^2d_i^2\sigma_i^2\right) = \exp\left(s\sum_{i=1}^{n} d_i\mu_i + \tfrac{1}{2}s^2\sum_{i=1}^{n} d_i^2\sigma_i^2\right)$$
which is the m.g.f. of $N\left(\sum_{i=1}^{n} d_i\mu_i, \sum_{i=1}^{n} d_i^2\sigma_i^2\right)$.
Corollary(6.4b). If $X_1, \dots, X_n$ are i.i.d. N(µ, σ²), then the sample mean $\overline{X}_n = (X_1 + \cdots + X_n)/n$ has the normal distribution N(µ, σ²/n).
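Python's `statistics.NormalDist` (Python 3.8 or later) supports multiplication by a constant and addition of two independent normal variables, so it encodes proposition(6.4a) directly. The sketch below uses a hypothetical example, T = 2X₁ − 3X₂ with X₁ ∼ N(1, 2²) and X₂ ∼ N(−1, 1²), and cross-checks the mean and variance by simulation; the seed and sample size are arbitrary.

```python
import random
from statistics import NormalDist, fmean

# Hypothetical example: X1 ~ N(1, 2^2) and X2 ~ N(-1, 1^2) independent, T = 2 X1 - 3 X2.
X1, X2 = NormalDist(mu=1, sigma=2), NormalDist(mu=-1, sigma=1)

# Scaling by a constant and adding independent normals gives another NormalDist:
# T ~ N(2*1 + (-3)*(-1), 2^2 * 2^2 + (-3)^2 * 1^2) = N(5, 25).
T = 2 * X1 + (-3) * X2
print(T.mean, T.variance)

# Monte Carlo cross-check with the same coefficients.
rng = random.Random(6)
ts = [2 * rng.gauss(1, 2) - 3 * rng.gauss(-1, 1) for _ in range(100_000)]
m = fmean(ts)
v = fmean((t - m) ** 2 for t in ts)
print(m, v)
```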
6.5 Sum of squares of independent N(0, 1) variables.
Proposition(6.5a). Suppose X1, ..., Xn are i.i.d. random variables with the N(0, 1) distribution. Let Z = X1² + ··· + Xn². Then Z ∼ χ²n.
Proof. Consider n = 1. Now X1 has density
$$f_{X_1}(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\quad\text{for } x\in\mathbb R.$$
Then Z = X1² has density
$$f_Z(z) = 2f_{X_1}(\sqrt z)\left|\frac{dx}{dz}\right| = 2\,\frac{1}{\sqrt{2\pi}}e^{-z/2}\,\frac{1}{2\sqrt z} = \frac{z^{-1/2}e^{-z/2}}{2^{1/2}\,\Gamma(\frac12)}\quad\text{for } z>0.$$
Thus Z ∼ χ²1. We know that if X ∼ χ²m, Y ∼ χ²n and X and Y are independent, then X + Y ∼ χ²n+m. Hence, by induction on n, Z ∼ χ²n in the general case.
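A quick Python check of Proposition(6.5a). The first loop confirms the n = 1 change of variables above. The simulation then compares the empirical distribution of X1² + ··· + X4² with the χ²4 distribution function. For even n that distribution function has the closed form 1 − e^{−z/2} Σ_{k<n/2} (z/2)^k/k!, which the sketch uses. The helper names, seed and sample size are illustrative choices.

```python
import math
import random


def chi2_pdf(z, n):
    """chi-squared density with n degrees of freedom: z^(n/2-1) e^(-z/2) / (2^(n/2) Gamma(n/2))."""
    return z ** (n / 2 - 1) * math.exp(-z / 2) / (2 ** (n / 2) * math.gamma(n / 2))


def chi2_cdf_even(z, n):
    """P[chi2_n <= z] for even n, using the Poisson-sum closed form."""
    half = z / 2
    return 1 - math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(n // 2))


def std_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# n = 1: f_Z(z) = 2 f_X(sqrt z) |dx/dz| = 2 f_X(sqrt z) / (2 sqrt z).
for z in (0.3, 1.0, 2.5):
    assert math.isclose(chi2_pdf(z, 1), 2 * std_normal_pdf(math.sqrt(z)) / (2 * math.sqrt(z)),
                        rel_tol=1e-12)

# General n by simulation: sums of squares of 4 independent N(0, 1) draws.
rng = random.Random(2)
samples = [sum(rng.gauss(0, 1) ** 2 for _ in range(4)) for _ in range(50_000)]
empirical = sum(s <= 3.0 for s in samples) / len(samples)
print(empirical, chi2_cdf_even(3.0, 4))
```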
6.6 Characterizations of the normal distribution. There are many characterizations of the normal distribution⁵; here are two of the most useful and interesting.
Proposition(6.6a). Cramér's theorem. Suppose X and Y are independent random variables such that Z = X + Y has a normal distribution. Then both X and Y have normal distributions, although one may have a degenerate distribution.
Proof. See, for example, page 298 in [MORAN(2003)].
Proposition(6.6b). The Skitovich-Darmois theorem. Suppose n ≥ 2 and X1, ..., Xn are independent random variables. Suppose a1, ..., an, b1, ..., bn are all in R and
L1 = a1X1 + ··· + anXn
L2 = b1X1 + ··· + bnXn
If L1 and L2 are independent, then all random variables Xj with aj bj ≠ 0 are normal.
Proof. See, for example, page 89 in [KAGAN et al.(1973)].
5 For example, see [MATHAI & PEDERZOLI(1977)] and [PATEL & READ(1996)].
1 Univariate Continuous Distributions
Jun 7, 2018(14:22)
Exercises 7 Page 15
6.7 Summary. The normal distribution.
• Density. X has the N(µ, σ²) distribution iff it has the density
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]\quad\text{for } x\in\mathbb R.$$
• Moments: E[X] = µ and var[X] = σ².
• The distribution function: P[X ≤ x] = Φ((x − µ)/σ), where Φ, the N(0, 1) distribution function, is tabulated.
• The moment generating function: M_X(t) = E[e^{tX}] = exp[tµ + ½t²σ²]
• The characteristic function: φ_X(t) = E[e^{itX}] = exp[iµt − ½σ²t²]
• A linear combination of independent normals has a normal distribution.
• The sum of squares of n independent N(0, 1) variables has the χ²n distribution.
7 Exercises
1. Show that Φ(−x) = 1 − Φ(x).
2. Suppose X ∼ N(µ, σ²). Suppose further that P[X ≤ 140] = 0.3 and P[X ≤ 200] = 0.6. Find µ and σ².
3. Suppose Y has the distribution function F_Y(y) with
$$F_Y(y) = \begin{cases}\tfrac12\Phi(y) & \text{if } y<0;\\ \tfrac12+\tfrac12\Phi(y) & \text{if } y\ge0.\end{cases}$$
Find E[Yⁿ] for n = 0, 1, ....
4. Suppose X is a random variable with density f_X(x) = ce^{−Q(x)} for all x ∈ R where Q(x) = ax² − bx and a ≠ 0.
(a) Find any relations that must exist between a, b and c and show that X must have a normal density.
(b) Find the mean and variance of X in terms of a and b.
5. (a) Suppose X and Y are i.i.d. random variables with the N(0, σ²) distribution. Find the density of Z = X² + Y².
(b) Suppose X1, ..., Xn are i.i.d. random variables with the N(0, σ²) distribution. Let Z = X1² + ··· + Xn². Find the distribution of Z.
6. Suppose X ∼ N(µ, σ²). Suppose further that, given X = x, the n random variables Y1, ..., Yn are i.i.d. N(x, σ1²).⁶ Find the distribution of (X|Y1, ..., Yn).
7. The half-normal distribution. Suppose X ∼ N(0, σ²).
(a) Find the density of |X|.
(b) Find E[|X|].
8. The folded normal distribution. Suppose X ∼ N(µ, σ²). Then |X| has the folded normal distribution, folded(µ, σ²). Clearly the half-normal is the folded(0, σ²) distribution.
Suppose Y ∼ folded(µ, σ²).
(a) Find the density of Y.
(b) Find E[Y] and var[Y].
(c) Find the c.f. of Y.
9. Suppose X ∼ N(µ, σ²). Show that
$$E[|X-\mu|^n] = \frac{2^{n/2}\sigma^n}{\sqrt\pi}\,\Gamma\left(\frac{n+1}{2}\right)\quad\text{for } n = 0, 1, \ldots$$
This also gives E[|X|ⁿ] for the half-normal distribution.
6 This means that f_{(Y1,...,Yn)|X}(y1, ..., yn|x) = ∏_{i=1}^n f_{Yi|X}(yi|x).
10. Suppose X and Y are i.i.d. N (0, 1).
(a) Let Z1 = X + Y and Z2 = X − Y. Show that Z1 and Z2 are independent.
(b) By using the relation XY = ((X + Y)/2)² − ((X − Y)/2)², find the characteristic function of Z = XY.
(c) By using the relation E[e^{itXY}] = ∫_{−∞}^{∞} E[e^{ityX}] f_Y(y) dy, find the characteristic function of Z = XY.
(d) Now suppose X and Y are i.i.d. N(0, σ²). Find the c.f. of Z = XY.
(e) Now suppose X and Y are i.i.d. N(µ, σ²). Find the c.f. of Z = XY.
11. Suppose X1 , X2 , X3 and X4 are i.i.d. N (0, 1). Find the c.f. of X1 X2 + X3 X4 and the c.f. of X1 X2 − X3 X4 . See also
exercise 1.19(31) on page 37.
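12. (a) Suppose b ∈ (0, ∞). Show that
$$\int_0^\infty\exp\left[-\frac12\left(u^2+\frac{b^2}{u^2}\right)\right]du = \sqrt{\frac\pi2}\,e^{-b}\qquad(7.12a)$$
(b) Suppose a ∈ R with a ≠ 0 and b ∈ R. Show that
$$\int_0^\infty\exp\left[-\frac12\left(a^2u^2+\frac{b^2}{u^2}\right)\right]du = \left(\frac{\pi}{2a^2}\right)^{1/2}e^{-|ab|}\qquad(7.12b)$$
This result is used in exercise 1.19(37) on page 37.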
8 The lognormal distribution
8.1 The definition.
Definition(8.1a). The random variable X has the lognormal(µ, σ²) distribution iff ln(X) ∼ N(µ, σ²).
We shall denote this distribution by logN(µ, σ²). Hence:
• if X ∼ logN(µ, σ²) then ln(X) ∼ N(µ, σ²);
• if Z ∼ N(µ, σ²) then e^Z ∼ logN(µ, σ²).
8.2 The density and distribution function. Suppose X ∼ logN(µ, σ²) and let Z = ln(X). Then
$$F_X(x) = P[X\le x] = P[Z\le\ln(x)] = \Phi\left(\frac{\ln(x)-\mu}{\sigma}\right)$$
hence the distribution function of the logN(µ, σ²) distribution is
$$F_X(x) = \Phi\left(\frac{\ln x-\mu}{\sigma}\right)\quad\text{for } x>0.$$
Differentiating the distribution function gives the density:
$$f_X(x) = \frac{1}{\sigma x}\,\phi\left(\frac{\ln x-\mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi}\,\sigma x}\exp\left[-\frac{(\ln x-\mu)^2}{2\sigma^2}\right]\quad\text{for } x>0.$$
The density can also be obtained by transforming the normal density. Now X = e^Z where Z ∼ N(µ, σ²). Hence |dx/dz| = e^z = x; hence f_X(x) = f_Z(z)|dz/dx| = f_Z(ln x)/x where f_Z is the density of N(µ, σ²).
Figure(8.2a). The graph of the lognormal density for (µ = 0, σ = 0.25), (µ = 0, σ = 0.5) and (µ = 0, σ = 1). In all 3 cases, we have median = 1, mode < 1 and mean > 1; see exercise 4 on page 19.
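The lognormal distribution function and density above translate directly into code. This sketch uses `statistics.NormalDist` from the Python standard library for Φ. It checks numerically that the density is the derivative of the distribution function and that it equals f_Z(ln x)/x. It also confirms that the median is e^µ = 1 when µ = 0. The function names are illustrative.

```python
import math
from statistics import NormalDist


def lognormal_cdf(x, mu, sigma):
    """F_X(x) = Phi((ln x - mu) / sigma) for x > 0, and 0 otherwise."""
    if x <= 0:
        return 0.0
    return NormalDist().cdf((math.log(x) - mu) / sigma)


def lognormal_pdf(x, mu, sigma):
    """f_X(x) = exp(-(ln x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma x) for x > 0."""
    if x <= 0:
        return 0.0
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma * x)


mu, sigma, h = 0.0, 0.5, 1e-5
for x in (0.7, 1.3, 2.0):
    # A central difference of F should reproduce f.
    slope = (lognormal_cdf(x + h, mu, sigma) - lognormal_cdf(x - h, mu, sigma)) / (2 * h)
    assert math.isclose(slope, lognormal_pdf(x, mu, sigma), rel_tol=1e-6)
    # Change of variables: f_X(x) = f_Z(ln x) / x with Z ~ N(mu, sigma^2).
    assert math.isclose(lognormal_pdf(x, mu, sigma),
                        NormalDist(mu, sigma).pdf(math.log(x)) / x, rel_tol=1e-12)

print(lognormal_cdf(1.0, 0.0, 0.25))   # 0.5: the median is e^mu = 1
```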
8.3 Moments. Suppose X ∼ logN(µ, σ²). Then E[Xⁿ] = E[e^{nZ}] = exp(nµ + ½n²σ²) for any n ∈ R. In particular
$$E[X] = \exp\left(\mu+\tfrac12\sigma^2\right)\qquad\text{var}[X] = E[X^2]-\{E[X]\}^2 = e^{2\mu+\sigma^2}\left(e^{\sigma^2}-1\right)\qquad(8.3a)$$
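Here is a simulation sketch of (8.3a). It uses the standard-library generator `random.lognormvariate(mu, sigma)`, whose logarithm is N(mu, sigma²). Sample moments from a seeded run are compared with E[X] = exp(µ + ½σ²) and var[X] = e^{2µ+σ²}(e^{σ²} − 1). The parameter values and sample size are arbitrary choices.

```python
import math
import random
import statistics


def lognormal_mean_var(mu, sigma):
    """E[X] and var[X] for X ~ logN(mu, sigma^2), from equation (8.3a)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    var = math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)
    return mean, var


def lognormal_raw_moment(n, mu, sigma):
    """E[X^n] = exp(n mu + n^2 sigma^2 / 2), valid for any real n."""
    return math.exp(n * mu + 0.5 * n ** 2 * sigma ** 2)


mu, sigma = 0.2, 0.5
rng = random.Random(3)
xs = [rng.lognormvariate(mu, sigma) for _ in range(200_000)]
mean, var = lognormal_mean_var(mu, sigma)
print(statistics.fmean(xs), mean)
print(statistics.variance(xs), var)
```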
8.4 Other properties.
• Suppose X1, ..., Xn are independent random variables with Xi ∼ logN(µi, σi²) for i = 1, ..., n. Then
$$\prod_{i=1}^nX_i = X_1\cdots X_n\sim\text{logN}\left(\sum_{i=1}^n\mu_i,\ \sum_{i=1}^n\sigma_i^2\right)$$
• Suppose X1, ..., Xn are i.i.d. with the logN(µ, σ²) distribution. Then
$$(X_1\cdots X_n)^{1/n}\sim\text{logN}\left(\mu,\ \frac{\sigma^2}{n}\right)$$
• If X ∼ logN(µ, σ²), b ∈ R and c > 0 then
$$cX^b\sim\text{logN}\left(\ln(c)+b\mu,\ b^2\sigma^2\right)\qquad(8.4a)$$
See exercises 5 and 2 below for the derivations of these results.
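The results in §8.4 all follow from taking logarithms: ln(X1···Xn) = Σ ln Xi is a sum of independent normals and ln(cX^b) = ln c + b ln X. The sketch below (hypothetical helper names, arbitrary parameters and seed) computes the lognormal parameters these rules predict. It checks them against the mean and variance of the logarithms of simulated values.

```python
import math
import random
import statistics


def product_params(mus, sigmas):
    """Parameters of X_1 ... X_n when X_i ~ logN(mu_i, sigma_i^2) are independent."""
    return sum(mus), sum(s ** 2 for s in sigmas)


def power_params(mu, sigma, b, c):
    """Parameters of c X^b when X ~ logN(mu, sigma^2), b real and c > 0 (equation (8.4a))."""
    return math.log(c) + b * mu, b ** 2 * sigma ** 2


rng = random.Random(4)
mus, sigmas = [0.1, -0.3, 0.5], [0.2, 0.4, 0.1]
logs = [math.log(math.prod(rng.lognormvariate(m, s) for m, s in zip(mus, sigmas)))
        for _ in range(100_000)]
print(product_params(mus, sigmas), statistics.fmean(logs), statistics.variance(logs))

b, c = -2.0, 3.0
logs_power = [math.log(c * rng.lognormvariate(0.1, 0.2) ** b) for _ in range(100_000)]
print(power_params(0.1, 0.2, b, c), statistics.fmean(logs_power), statistics.variance(logs_power))
```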
8.5 The multiplicative central limit theorem.
Proposition(8.5a). Suppose X1, X2, ... are i.i.d. positive random variables such that
E[ln(X1)] = µ and var[ln(X1)] = σ²
both exist and are finite. Then
$$\left(\frac{X_1\cdots X_n}{e^{n\mu}}\right)^{1/\sqrt n}\xrightarrow{D}\text{logN}(0,\sigma^2)\quad\text{as } n\to\infty.$$
Proof. Let Yi = ln(Xi) for i = 1, 2, ..., n. Then
$$\ln\left[\left(\frac{X_1\cdots X_n}{e^{n\mu}}\right)^{1/\sqrt n}\right] = \frac{\sum_{i=1}^n(Y_i-\mu)}{\sqrt n}\xrightarrow{D}N(0,\sigma^2)\quad\text{as } n\to\infty.^7$$
Now if Xn →D X as n → ∞ then g(Xn) →D g(X) as n → ∞ for any continuous function g. Taking g(x) = e^x proves the proposition.
Using equation(8.4a) shows that if X ∼ logN(0, σ²) then X^{1/σ} ∼ logN(0, 1). It follows that
$$\lim_{n\to\infty}P\left[\left(\frac{X_1\cdots X_n}{e^{n\mu}}\right)^{\frac{1}{\sigma\sqrt n}}\le x\right] = \Phi(\ln x)\quad\text{for all } x>0.$$
Also, if we let
$$W = \left(\frac{X_1\cdots X_n}{e^{n\mu}}\right)^{1/\sqrt n}$$
then (X1···Xn)^{1/√n} = e^{µ√n} W and (X1···Xn)^{1/n} = e^µ W^{1/√n}, and hence by equation(8.4a), (X1···Xn)^{1/n} is asymptotically logN(µ, σ²/n).
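Proposition(8.5a) can be illustrated by simulation. As an assumed example, take Xi ∼ Uniform(0, 1). Then −ln Xi is exponential with mean 1, so µ = E[ln Xi] = −1 and σ² = var[ln Xi] = 1. The proposition then says that W = (X1···Xn/e^{nµ})^{1/√n} is approximately logN(0, 1) for large n, that is, P[W ≤ x] ≈ Φ(ln x). The sketch works with ln W to avoid underflow in the product. The seed, n and number of replications are arbitrary.

```python
import math
import random
from statistics import NormalDist


def log_W(n, rng, mu=-1.0):
    """ln W = (sum ln X_i - n mu) / sqrt(n) for X_i ~ Uniform(0, 1), where mu = E[ln X_i] = -1."""
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    total = sum(math.log(1.0 - rng.random()) for _ in range(n))
    return (total - n * mu) / math.sqrt(n)


rng = random.Random(5)
draws = [log_W(200, rng) for _ in range(4_000)]
for x in (1.0, math.e):
    empirical = sum(lw <= math.log(x) for lw in draws) / len(draws)
    print(x, empirical, NormalDist().cdf(math.log(x)))
```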
We can generalise proposition (8.5a) as follows:
Proposition(8.5b). Suppose X1, X2, ... is a sequence of independent positive random variables such that for all i = 1, 2, ...
E[ln(Xi)] = µi,   var[ln(Xi)] = σi²   and   E[|ln(Xi) − µi|³] = ωi³
all exist and are finite. For n = 1, 2, ..., let
$$\mu_{(n)} = \sum_{i=1}^n\mu_i,\qquad s_{(n)}^2 = \sum_{i=1}^n\sigma_i^2\qquad\text{and}\qquad\omega_{(n)}^3 = \sum_{i=1}^n\omega_i^3$$
Suppose further that ω(n)/s(n) → 0 as n → ∞. Then
$$\left(\frac{X_1\cdots X_n}{e^{\mu_{(n)}}}\right)^{1/s_{(n)}}\xrightarrow{D}\text{logN}(0,1)\quad\text{as } n\to\infty.$$
7 The classical central limit theorem asserts that if X1, X2, ... is a sequence of i.i.d. random variables with finite expectation µ and finite variance σ² and Sn = (X1 + ··· + Xn)/n, then
$$\sqrt n\,(S_n-\mu)\xrightarrow{D}N(0,\sigma^2)\quad\text{as } n\to\infty.$$
See page 357 in [BILLINGSLEY(1995)].
Proof. Let Yi = ln(Xi) for i = 1, 2, ..., n. Then
$$\ln\left[\left(\frac{X_1\cdots X_n}{e^{\mu_{(n)}}}\right)^{1/s_{(n)}}\right] = \frac{\sum_{i=1}^n(Y_i-\mu_i)}{s_{(n)}}\xrightarrow{D}N(0,1)\quad\text{as } n\to\infty.^8$$
Using the transformation g(x) = e^x proves the proposition.
Also, if we let
$$W = \left(\frac{X_1\cdots X_n}{e^{\mu_{(n)}}}\right)^{1/s_{(n)}}$$
then X1···Xn = e^{µ(n)} W^{s(n)}, and hence by equation(8.4a), the random variable X1···Xn is asymptotically logN(µ(n), s²(n)).
8.6 Usage. The multiplicative central limit theorem suggests the following applications of the lognormal which
can be verified by checking available data.
• Grinding, where a whole is divided into a multiplicity of particles and the particle size is measured by volume,
mass, surface area or length.
• Distribution of farm size (which corresponds to a division of land)—where a 3-parameter lognormal can be
used. The third parameter would be the smallest size entertained.
• The size of many natural phenomena is due to the accumulation of many small percentage changes—leading to
a lognormal distribution.
8.7 Summary. The lognormal distribution.
• X ∼ logN(µ, σ²) iff ln(X) ∼ N(µ, σ²).
• Moments: if X ∼ logN(µ, σ²) then E[X] = exp(µ + ½σ²) and var[X] = e^{2µ+σ²}(e^{σ²} − 1).
• The product of independent lognormals is lognormal.
• If X ∼ logN(µ, σ²), b ∈ R and c > 0 then cX^b ∼ logN(ln(c) + bµ, b²σ²).
• The multiplicative central limit theorem.
9 Exercises
1. An investor forecasts that the returns on an investment over the next 4 years will be as follows: for each of the first
2 years he estimates that £1 will grow to £(1 + I) where I is a random variable with E[I] = 0.08 and var[I] = 0.001;
for each of the last 2 years he estimates that £1 will grow to £(1 + I) where I is a random variable with E[I] = 0.06 and
var[I] = 0.002.
Suppose he further assumes that the return Ij in year j is independent of the returns in all other years and that 1 + Ij has
a lognormal distribution, for j = 1, 2, . . . , 4.
Calculate the amount of money which must be invested at time t = 0 in order to ensure that there is a 95% chance that
the accumulated value at time t = 4 is at least £5,000.
2. Suppose X ∼ logN(µ, σ²).
(a) Find the distribution of 1/X.
(b) Suppose b ∈ R and c > 0. Find the distribution of cX^b.
8 Lyapunov central limit theorem with δ = 1. Suppose X1, X2, ... is a sequence of independent random variables such that E[Xi] = µi and var[Xi] = σi² are both finite. Let sn² = σ1² + ··· + σn² and suppose
$$\lim_{n\to\infty}\frac{1}{s_n^3}\sum_{i=1}^nE\left[|X_i-\mu_i|^3\right] = 0;\quad\text{then}\quad\frac{\sum_{i=1}^n(X_i-\mu_i)}{s_n}\xrightarrow{D}N(0,1)\quad\text{as } n\to\infty.$$
See page 362 in [BILLINGSLEY(1995)].
3. The geometric mean and geometric variance of a distribution. Suppose each xi in the data set {x1, ..., xn} satisfies xi > 0. Then the geometric mean of the data set is g = (x1···xn)^{1/n}, or ln(g) = (1/n) Σ_{i=1}^n ln(xi), or ln(g) = (1/n) Σ_j fj ln(xj) where fj is the frequency of the observation xj. This definition motivates the following.
Suppose X is a random variable with X > 0. Then GM_X, the geometric mean of X, is defined by ln(GM_X) = ∫_0^∞ ln(x) f_X(x) dx = E[ln(X)].
Similarly, we define the geometric variance, GV_X, by
ln(GV_X) = E[(ln X − ln GM_X)²] = var[ln(X)]
and the geometric standard deviation by GSD_X = √GV_X.
Suppose X ∼ logN(µ, σ²). Find GM_X and GSD_X.
4. Suppose X ∼ logN(µ, σ²).
(a) Find the median and mode and show that: mode < median < mean.
(b) Find expressions for the lower and upper quartiles of X in terms of µ and σ.
(c) Suppose αp denotes the p-quantile of X; this means that P[X ≤ αp] = p. Prove that αp = e^{µ+σβp} where βp is the p-quantile of the N(0, 1) distribution.
5. (a) Suppose X1, ..., Xn are independent random variables with Xi ∼ logN(µi, σi²) for i = 1, ..., n. Find the distribution of ∏_{i=1}^n Xi = X1···Xn.
(b) Suppose X1, ..., Xn are i.i.d. with the logN(µ, σ²) distribution. Find the distribution of (X1···Xn)^{1/n}.
(c) Suppose X1, ..., Xn are independent random variables with Xi ∼ logN(µi, σi²) for i = 1, ..., n. Suppose further that a1, ..., an are real constants. Show that
$$\prod_{i=1}^nX_i^{a_i}\sim\text{logN}(m_n,s_n^2)$$
for some mn and sn and find explicit expressions for mn and sn.
6. Suppose X1 and X2 are independent random variables with Xi ∼ logN(µi, σi²) for i = 1 and i = 2. Find the distribution of X1/X2.
7. Suppose X ∼ logN(µ, σ²). Suppose further that E[X] = α and var[X] = β. Express µ and σ² in terms of α and β.
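8. Suppose X ∼ logN(µ, σ²) and k > 0. Show that
$$E[X\mid X<k] = e^{\mu+\frac12\sigma^2}\,\frac{\Phi\left(\frac{\ln(k)-\mu-\sigma^2}{\sigma}\right)}{\Phi\left(\frac{\ln(k)-\mu}{\sigma}\right)}$$
and
$$E[X\mid X\ge k] = e^{\mu+\frac12\sigma^2}\,\frac{\Phi\left(\frac{\mu+\sigma^2-\ln(k)}{\sigma}\right)}{1-\Phi\left(\frac{\ln(k)-\mu}{\sigma}\right)}$$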
9. Suppose X ∼ logN(µ, σ²). Then the jth moment distribution function of X is defined to be the function G : [0, ∞) → [0, 1] with
$$G(x) = \frac{1}{E[X^j]}\int_0^xu^jf_X(u)\,du$$
(a) Show that G is the distribution function of the logN(µ + jσ², σ²) distribution.
(b) Suppose γ_X denotes the Gini coefficient of X (also called the coefficient of mean difference of X). By definition
$$\gamma_X = \frac{1}{2E[X]}\int_0^\infty\int_0^\infty|u-v|\,f_X(u)f_X(v)\,du\,dv$$
Hence γ_X = E|X − Y|/(2E[X]) where X and Y are independent with the same distribution. Prove that
γ_X = 2Φ(σ/√2) − 1
10 The beta and arcsine distributions
10.1 The density and distribution function.
Definition(10.1a). Suppose α > 0 and β > 0. Then the random variable X has a beta distribution, Beta(α, β), iff it has density
$$f(x;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}\quad\text{for } 0<x<1.\qquad(10.1a)$$
Note:
• Checking equation(10.1a) is a density function. Now Γ(α) = ∫_0^∞ u^{α−1}e^{−u} du by definition. Hence
$$\Gamma(\alpha)\Gamma(\beta) = \int_0^\infty\int_0^\infty u^{\alpha-1}v^{\beta-1}e^{-u-v}\,du\,dv$$
Now use the transformation x = u/(u + v) and y = u + v; hence u = xy and v = y(1 − x). Clearly 0 < x < 1 and 0 < y < ∞. Finally ∂(u,v)/∂(x,y) = y. Hence
$$\Gamma(\alpha)\Gamma(\beta) = \int_{x=0}^1\int_{y=0}^\infty y^{\alpha+\beta-1}x^{\alpha-1}(1-x)^{\beta-1}e^{-y}\,dx\,dy = \Gamma(\alpha+\beta)\int_{x=0}^1x^{\alpha-1}(1-x)^{\beta-1}\,dx$$
• The beta function is defined by
$$B(\alpha,\beta) = \int_0^1t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\quad\text{for all }\alpha>0\text{ and }\beta>0.$$
Properties of the beta and gamma functions can be found in most advanced calculus books. Recall that Γ(n) = (n − 1)! if n is a positive integer.
• The distribution function of the beta distribution, Beta(α, β), is
$$F(x;\alpha,\beta) = \int_0^xf(t;\alpha,\beta)\,dt = \frac{1}{B(\alpha,\beta)}\int_0^xt^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{I_x(\alpha,\beta)}{B(\alpha,\beta)}\quad\text{for } x\in(0,1).$$
The integral, I_x(α, β), is called the incomplete beta function.
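Here is a minimal sketch of the beta density and distribution function. It computes B(α, β) from `math.lgamma` and the incomplete integral I_x(α, β) by Simpson's rule. It assumes α ≥ 1 and β ≥ 1 so that the integrand is bounded on [0, x]. The closed form F(x) = 3x² − 2x³ for Beta(2, 2) serves as a check. The function names are illustrative.

```python
import math


def beta_fn(a, b):
    """B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b), computed on the log scale."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def beta_pdf(x, a, b):
    """Density (10.1a) of Beta(a, b) at x in [0, 1]."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)


def beta_cdf(x, a, b, n=2000):
    """F(x; a, b) = I_x(a, b) / B(a, b) by Simpson's rule; assumes a >= 1 and b >= 1."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    h = x / n
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * beta_pdf(k * h, a, b)
    return total * h / 3


print(beta_fn(2, 3))         # approximately 1/12
print(beta_cdf(0.3, 2, 2))   # approximately 3(0.3)^2 - 2(0.3)^3 = 0.216
```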
10.2 Moments. Using the fact that ∫_0^1 x^{α−1}(1 − x)^{β−1} dx = B(α, β), it is easy to check that
$$E[X] = \frac{\alpha}{\alpha+\beta},\qquad E[X^2] = \frac{(\alpha+1)\alpha}{(\alpha+\beta+1)(\alpha+\beta)}\qquad\text{and hence}\qquad\text{var}[X] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\qquad(10.2a)$$
By differentiation, f′(x; α, β) = 0 implies (α − 1)(1 − x) = (β − 1)x, that is, x(α + β − 2) = α − 1. This has a root for x in [0, 1] if either (a) α + β > 2, α ≥ 1 and β ≥ 1 or (b) α + β < 2, α ≤ 1 and β ≤ 1. By checking the second derivative, we see
$$\text{mode}[X] = \frac{\alpha-1}{\alpha+\beta-2}\quad\text{if }\alpha+\beta>2,\ \alpha\ge1\text{ and }\beta\ge1.$$
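The moment formulas (10.2a) and the mode can be checked with `random.betavariate(alpha, beta)` from the Python standard library. The mode is located crudely by maximising the unnormalised density on a grid. Parameter values, grid size, seed and sample size are illustrative.

```python
import random
import statistics


def beta_mean_var(a, b):
    """E[X] and var[X] for X ~ Beta(a, b), equation (10.2a)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def beta_mode(a, b):
    """(a - 1) / (a + b - 2), valid when a + b > 2, a >= 1 and b >= 1."""
    return (a - 1) / (a + b - 2)


a, b = 2.0, 5.0
rng = random.Random(6)
xs = [rng.betavariate(a, b) for _ in range(100_000)]
print(beta_mean_var(a, b), statistics.fmean(xs), statistics.variance(xs))

# Grid search for the maximiser of x^(a-1) (1-x)^(b-1); the normalising constant does not move the argmax.
grid = [i / 10_000 for i in range(1, 10_000)]
print(beta_mode(a, b), max(grid, key=lambda x: x ** (a - 1) * (1 - x) ** (b - 1)))
```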
10.3 Shape of the density. The beta density can take many different shapes.
Figure(10.3a). Shape of the beta density for various values of the parameters; the panels show (α = 1/2, β = 1/2), (α = 5, β = 1), (α = 1, β = 3), (α = 2, β = 2), (α = 2, β = 5) and (α = 5, β = 2).
10.4 Some distribution properties.
• Suppose X ∼ Beta(α, β), then 1 − X ∼ Beta(β, α).
• The Beta(1, 1) distribution is the same as the uniform distribution on (0, 1).
• Suppose X ∼ Gamma(n1 , α) and Y ∼ Gamma(n2 , α). Suppose further that X and Y are independent. Then
X/(X + Y ) ∼ Beta(n1 , n2 ). See exercise 5 on page 12.
In particular, if X ∼ χ²_{2k} = Gamma(k, 1/2), Y ∼ χ²_{2m} = Gamma(m, 1/2) and X and Y are independent, then X/(X + Y) ∼ Beta(k, m).
• If X ∼ Beta(α, 1) then − ln(X) ∼ Exponential(α). See exercise 1 on page 22.
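The following is a simulation sketch of the gamma-to-beta property. With X ∼ Gamma(n1, α) and Y ∼ Gamma(n2, α) independent, X/(X + Y) should have the Beta(n1, n2) mean and variance whatever the common rate α. Python's `random.gammavariate(alpha, beta)` is parameterised by shape and scale, so the sketch passes the rate α of these notes as scale 1/α. Shapes, rates, seed and sample size are arbitrary.

```python
import random
import statistics


def beta_ratio_sample(n1, n2, rate, size, seed):
    """Simulate X/(X+Y) with X ~ Gamma(n1, rate), Y ~ Gamma(n2, rate) independent."""
    rng = random.Random(seed)
    scale = 1.0 / rate   # random.gammavariate takes (shape, scale)
    out = []
    for _ in range(size):
        x = rng.gammavariate(n1, scale)
        y = rng.gammavariate(n2, scale)
        out.append(x / (x + y))
    return out


n1, n2 = 3.0, 1.5
target_mean = n1 / (n1 + n2)
target_var = n1 * n2 / ((n1 + n2) ** 2 * (n1 + n2 + 1))
for rate in (0.5, 3.0):
    zs = beta_ratio_sample(n1, n2, rate, 100_000, seed=7)
    print(rate, statistics.fmean(zs), target_mean, statistics.variance(zs), target_var)
```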
10.5 The beta prime distribution.
Definition(10.5a). Suppose α > 0 and β > 0. Then the random variable X is said to have the beta prime distribution, Beta′(α, β), iff it has density
$$f(x) = \frac{x^{\alpha-1}(1+x)^{-\alpha-\beta}}{B(\alpha,\beta)}\quad\text{for } x>0.\qquad(10.5a)$$
Properties of the beta prime distribution are left to the exercises. Its relation to the beta distribution is given by the next two observations.
• If X ∼ Beta(α, β), then X/(1 − X) ∼ Beta′(α, β). See exercise 2 on page 22.
• If X ∼ Beta(α, β), then 1/X − 1 ∼ Beta′(β, α). This follows from the previous result: just use Y = 1 − X ∼ Beta(β, α) and 1/X − 1 = (1 − X)/X = Y/(1 − Y).
We shall see later (see §12.8 on page 25) that the beta prime distribution is just a multiple of the F-distribution.
10.6 The arcsine distribution on (0, 1). The arcsine distribution is the distribution Beta(1/2, 1/2).
Definition(10.6a). The random variable X has the arcsine distribution iff X has density
$$f_X(x) = \frac{1}{\pi\sqrt{x(1-x)}}\quad\text{for } x\in(0,1).$$
The distribution function. Suppose X has the arcsine distribution; then
$$F_X(x) = P[X\le x] = \frac2\pi\arcsin(\sqrt x) = \frac{\arcsin(2x-1)}{\pi}+\frac12\quad\text{for } x\in[0,1].\qquad(10.6a)$$
Moments of the arcsine distribution. Using the results in equation(10.2a) on page 20 above and figure (10.6a) below we get
E[X] = 1/2,   E[X²] = 3/8,   var[X] = 1/8,   mode(X) = {0, 1}   and   median(X) = 1/2.
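Inverting F_X(x) = (2/π) arcsin(√x) gives X = sin²(πU/2) with U ∼ Uniform(0, 1), which is a convenient way to simulate the arcsine distribution. The sketch checks the identity in equation(10.6a) on a grid. It then compares simulated moments with E[X] = 1/2 and var[X] = 1/8. The inverse-transform sampler is an illustration, not a construction taken from the notes.

```python
import math
import random
import statistics


def arcsine_cdf(x):
    """F(x) = (2/pi) arcsin(sqrt(x)) for 0 <= x <= 1."""
    return 2 / math.pi * math.asin(math.sqrt(x))


def arcsine_cdf_alt(x):
    """The equivalent form arcsin(2x - 1)/pi + 1/2 from equation (10.6a)."""
    return math.asin(2 * x - 1) / math.pi + 0.5


for i in range(11):
    x = i / 10
    assert abs(arcsine_cdf(x) - arcsine_cdf_alt(x)) < 1e-12

rng = random.Random(8)
xs = [math.sin(math.pi * rng.random() / 2) ** 2 for _ in range(100_000)]   # inverse transform
print(statistics.fmean(xs), statistics.variance(xs))   # close to 0.5 and 0.125
```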
Shape of the distribution.
Figure(10.6a). Plot of the arcsine density.
10.7 The arcsine distribution on (a, b).
Definition(10.7a). Suppose −∞ < a < b < ∞. Then the random variable X has the arcsine distribution on (a, b), denoted arcsin(a, b), iff X has density
$$f_X(x) = \frac{1}{\pi\sqrt{(x-a)(b-x)}}\quad\text{for } x\in(a,b).$$
This means that the distribution defined in definition(10.6a) can also be described as the arcsin(0, 1) distribution.
The distribution function is
$$F(x) = \frac2\pi\arcsin\sqrt{\frac{x-a}{b-a}}\quad\text{for } a\le x\le b.$$
If X ∼ arcsin(a, b), k > 0 and m ∈ R, then kX + m ∼ arcsin(ka + m, kb + m). In particular,
if X ∼ arcsin(0, 1) then (b − a)X + a ∼ arcsin(a, b);
if X ∼ arcsin(a, b) then (X − a)/(b − a) ∼ arcsin(0, 1).
The proof of this and further properties of the arcsine distribution can be found in exercises 5 and 6 on page 22.
10.8 Summary.
The beta distribution. Suppose α > 0 and β > 0; then X ∼ Beta(α, β) iff X has density
$$f(x;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}\quad\text{for } 0<x<1.$$
• Moments: E[X] = α/(α + β) and var[X] = αβ/[(α + β)²(α + β + 1)].
• Suppose X ∼ Beta(α, β), then 1 − X ∼ Beta(β, α).
• The Beta(1, 1) distribution is the same as the uniform distribution on (0, 1).
• Suppose X ∼ Gamma(n1, α) and Y ∼ Gamma(n2, α). Suppose further that X and Y are independent. Then X/(X + Y) ∼ Beta(n1, n2).
• If X ∼ Beta(α, 1) then − ln(X) ∼ Exponential(α). If X ∼ Beta(m/2, n/2) then nX/[m(1 − X)] ∼ Fm,n.
The arcsine distribution. If X ∼ arcsin(0, 1) then X has density
$$f_X(x) = \frac{1}{\pi\sqrt{x(1-x)}}\quad\text{for } x\in(0,1).$$
• Moments: E[X] = 1/2 and var[X] = 1/8.
The beta prime distribution. Suppose α > 0 and β > 0; then X ∼ Beta′(α, β) iff the density is
$$f(x) = \frac{x^{\alpha-1}(1+x)^{-\alpha-\beta}}{B(\alpha,\beta)}\quad\text{for } x>0.$$
• If X ∼ Beta(α, β), then X/(1 − X) ∼ Beta′(α, β).
11 Exercises
The beta and beta prime distributions.
1. Suppose X ∼ Beta(α, 1). Show that − ln(X) ∼ Exponential(α).
2. Suppose X ∼ Beta(α, β). Show that X/(1 − X) ∼ Beta′(α, β).
3. The beta prime distribution. Suppose X has the beta prime distribution, Beta′(α, β).
(a) Show that E[X] = α/(β − 1) provided β > 1.
(b) Show that var[X] = α(α + β − 1)/[(β − 2)(β − 1)²] provided β > 2.
(c) Show that the mode occurs at (α − 1)/(β + 1) if α ≥ 1 and at 0 otherwise.
(d) Show that 1/X ∼ Beta′(β, α).
(e) Suppose X ∼ Gamma(n1, 1) and Y ∼ Gamma(n2, 1). Suppose further that X and Y are independent. Show that X/Y ∼ Beta′(n1, n2).
(f) Suppose X ∼ χ²_{n1}, Y ∼ χ²_{n2} and X and Y are independent. Show that X/Y ∼ Beta′(n1/2, n2/2).
The arcsine distribution.
4. Prove the equality in equation(10.6a) on page 21:
$$\frac2\pi\arcsin(\sqrt x) = \frac{\arcsin(2x-1)}{\pi}+\frac12\quad\text{for } x\in[0,1]$$
5. (a) Suppose X ∼ arcsin(a, b), k > 0 and m ∈ R. Prove that kX + m ∼ arcsin(ka + m, kb + m).
(b) Suppose X ∼ arcsin(−1, 1). Prove that X² ∼ arcsin(0, 1).
(c) Suppose X ∼ Uniform(−π, π). Prove that sin(X), sin(2X) and − cos(2X) all have the arcsin(−1, 1) distribution.
6. Suppose X ∼ Uniform(−π, π), Y ∼ Uniform(−π, π) and X and Y are independent.
(a) Prove that sin(X + Y ) ∼ arcsin(−1, 1).
(b) Prove that sin(X − Y ) ∼ arcsin(−1, 1).
12 The t, Cauchy and F distributions
12.1 Definition of the tn distribution.
Definition(12.1a). Suppose n ∈ (0, ∞). Then the random variable T has a t-distribution with n degrees of freedom iff
$$T = \frac{X}{\sqrt{Y/n}}\qquad(12.1a)$$
where X ∼ N(0, 1), Y ∼ χ²n, and X and Y are independent.
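Density: Finding the density is a routine calculation and is left to exercise 1 on page 26; that exercise shows that the density of the tn distribution is
$$f_T(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac n2\right)\sqrt{\pi n}}\left(1+\frac{t^2}{n}\right)^{-(n+1)/2} = \frac{1}{B\left(\frac12,\frac n2\right)\sqrt n}\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}\quad\text{for } t\in\mathbb R.\qquad(12.1b)$$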
We can check that the function f_T defined in equation(12.1b) is a density for any n ∈ (0, ∞) as follows. Clearly f_T(t) > 0; also, by using the transformation θ = 1/(1 + t²/n), it follows that
$$\int_{-\infty}^{\infty}\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}dt = 2\int_0^\infty\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}dt = \sqrt n\int_0^1\theta^{(n-2)/2}(1-\theta)^{-1/2}\,d\theta = \sqrt n\,B\left(\tfrac12,\tfrac n2\right)$$
Hence f_T is a density.
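The two expressions in (12.1b) and the normalisation can be checked numerically. Rather than the θ-substitution used above, this sketch substitutes t = √n tan φ (equivalently θ = cos²φ). That turns the total mass into ∫ cos^{n−1}φ dφ / B(½, n/2) over (−π/2, π/2), whose integrand is bounded when n ≥ 1. The midpoint rule is used so that φ = ±π/2, where tan is unbounded, is never evaluated. The helper names are illustrative.

```python
import math


def t_pdf_gamma(t, n):
    """Gamma-function form of (12.1b)."""
    return (math.gamma((n + 1) / 2) / (math.gamma(n / 2) * math.sqrt(math.pi * n))
            * (1 + t * t / n) ** (-(n + 1) / 2))


def t_pdf_beta(t, n):
    """Beta-function form of (12.1b), with B(1/2, n/2) computed from log-gammas."""
    log_b = math.lgamma(0.5) + math.lgamma(n / 2) - math.lgamma((n + 1) / 2)
    return (1 + t * t / n) ** (-(n + 1) / 2) / (math.exp(log_b) * math.sqrt(n))


def t_total_mass(n, steps=4000):
    """Integral of the density over R via t = sqrt(n) tan(phi), dt = sqrt(n) sec^2(phi) dphi."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        phi = -math.pi / 2 + (k + 0.5) * h   # midpoint rule: never hits +/- pi/2
        t = math.sqrt(n) * math.tan(phi)
        total += t_pdf_beta(t, n) * math.sqrt(n) / math.cos(phi) ** 2
    return total * h


for n in (1, 2.5, 3, 10):
    print(n, t_pdf_gamma(0.7, n), t_pdf_beta(0.7, n), t_total_mass(n))
```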
Now Y in equation(12.1a) can be replaced by Z1² + ··· + Zn² where Z1, Z2, ..., Zn are i.i.d. with the N(0, 1) distribution. Hence Y/n has mean 1 and variance 2/n, so its distribution becomes more clustered about the constant 1 as n becomes larger. Hence T is more spread out than the normal when n is small but tends to the normal as n → ∞. Figure(12.1a) shows that the density of the t-distribution has a shape similar to that of the normal density but with heavier tails. See exercise 3 on page 26 for a mathematical proof that the distribution of T tends to the normal as n → ∞.
Figure(12.1a). Plot of the t2, t10 and standard normal densities.
12.2 Moments of the tn distribution. Suppose T ∼ tn. Now it is well-known that the integral ∫_1^∞ x^{−j} dx converges if j > 1 and diverges if j ≤ 1. It follows that
$$\int_1^\infty\frac{t^r}{(n+t^2)^{(n+1)/2}}\,dt\ \text{ converges if } r<n.$$
Hence the function t^r f_T(t) is integrable iff r < n.
Provided n > 1, E[T] exists and equals √n E[X] E[1/√Y] = 0.
Provided n > 2, var(T) = E[T²] = n E[X²] E[1/Y] = n/(n − 2) by using equation(4.8a) on page 10 which gives E[1/Y] = 1/(n − 2).
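Here is a numerical check of var(T) = n/(n − 2) under an assumed substitution. With t = √n tan φ, E[T²] becomes (n/B(½, n/2)) ∫ sin²φ cos^{n−3}φ dφ over (−π/2, π/2), which is bounded for n ≥ 3. The sketch evaluates this with the midpoint rule. This route is only an illustration; the notes themselves use E[1/Y].

```python
import math


def t_second_moment(n, steps=4000):
    """E[T^2] for T ~ t_n as (n / B(1/2, n/2)) * integral of sin^2 cos^(n-3) over (-pi/2, pi/2)."""
    log_b = math.lgamma(0.5) + math.lgamma(n / 2) - math.lgamma((n + 1) / 2)
    h = math.pi / steps
    integral = sum(math.sin(phi) ** 2 * math.cos(phi) ** (n - 3)
                   for phi in (-math.pi / 2 + (k + 0.5) * h for k in range(steps))) * h
    return n * integral / math.exp(log_b)


for n in (3, 4, 7.5):
    print(n, t_second_moment(n), n / (n - 2))
```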
12.3 Linear transformation of the tn distribution. Suppose m ∈ R, s > 0 and V = m + sT. Then
$$f_V(v) = \frac{f_T(t)}{\left|\frac{dv}{dt}\right|} = \frac{1}{B\left(\frac12,\frac n2\right)s\sqrt n}\left[1+\frac1n\left(\frac{v-m}{s}\right)^2\right]^{-(n+1)/2}\qquad(12.3a)$$
Also E[V] = m for n > 1 and
$$\text{var}(V) = s^2\,\frac{n}{n-2}\quad\text{for } n>2\qquad(12.3b)$$
This is called a tn(m, s²) distribution.
12.4 The Cauchy distribution. The standard Cauchy distribution is the same as the t1 distribution; we shall denote its density by γ1. Hence
$$\gamma_1(t) = \frac{1}{\pi(1+t^2)}\quad\text{for } t\in\mathbb R.$$
More generally, the Cauchy distribution is the same as the t1(0, s²) distribution; we shall denote its density by γs. Hence, for s > 0,
$$\gamma_s(t) = \frac{s}{\pi(s^2+t^2)}\quad\text{for } t\in\mathbb R.$$
Figure(12.4a). Plot of the normal, standard Cauchy (γ1 = t1) and γ2 = t1(0, 4) densities.
12.5 Elementary properties of the Cauchy distribution.
• Moments. The expectation, variance and higher moments of the Cauchy distribution are not defined.
• The distribution function. This is
$$F_s(t) = \int_{-\infty}^t\gamma_s(u)\,du = \frac12+\frac1\pi\tan^{-1}\left(\frac ts\right)$$
where tan^{−1}(t/s) ∈ (−π/2, π/2).
• The characteristic function. Suppose the random variable T has the standard Cauchy distribution, which has density γ1. Then
$$\phi_T(t) = E[e^{itT}] = e^{-|t|}$$
and hence if W has the γs density, then W has the same distribution as sT and E[e^{itW}] = e^{−s|t|}.
Note. The characteristic function can be derived by using the calculus of residues, or by the following trick. Using integration by parts gives
$$\int_0^\infty e^{-y}\cos(ty)\,dy = 1-t\int_0^\infty e^{-y}\sin(ty)\,dy\quad\text{and}\quad\int_0^\infty e^{-y}\sin(ty)\,dy = t\int_0^\infty e^{-y}\cos(ty)\,dy$$
and hence
$$\int_0^\infty e^{-y}\cos(ty)\,dy = \frac{1}{1+t^2}$$
Now the characteristic function of the bilateral exponential (Laplace) density f(x) = ½e^{−|x|} for x ∈ R is
$$\phi(t) = \frac12\int_{-\infty}^{\infty}\left(\cos(ty)+i\sin(ty)\right)e^{-|y|}\,dy = \int_0^\infty e^{-y}\cos(ty)\,dy = \frac{1}{1+t^2}$$
Because this function is absolutely integrable, we can use the inversion theorem to get
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-ity}}{1+y^2}\,dy = \frac12e^{-|t|} = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{ity}}{1+y^2}\,dy$$
and hence ∫ e^{ity} γ1(y) dy = e^{−|t|}, as required.
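The sketch below makes two small numerical checks of this subsection. First, it evaluates the integration-by-parts result ∫_0^∞ e^{−y}cos(ty) dy = 1/(1 + t²) with Simpson's rule on [0, 40], where the neglected tail is below e^{−40}. Second, it tests the distribution function F_s(t) = ½ + (1/π)tan^{−1}(t/s) against simulated values sT. Here T = tan(π(U − ½)) with U uniform on (0, 1), which is standard Cauchy by exercise 5(a). The truncation point, seed and sample size are arbitrary choices.

```python
import math
import random


def simpson(f, a, b, n=4000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def laplace_cos_integral(t):
    """Integral of exp(-y) cos(t y) over [0, 40], approximating the integral over [0, infinity)."""
    return simpson(lambda y: math.exp(-y) * math.cos(t * y), 0.0, 40.0)


def cauchy_cdf(t, s):
    """F_s(t) = 1/2 + arctan(t/s)/pi for the gamma_s density."""
    return 0.5 + math.atan(t / s) / math.pi


for t in (0.0, 0.5, 2.0):
    print(t, laplace_cos_integral(t), 1 / (1 + t * t))

rng = random.Random(9)
s = 2.0
sample = [s * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(100_000)]
print(sum(x <= s for x in sample) / len(sample), cauchy_cdf(s, s))   # F_s(s) = 3/4
```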
Further properties of the Cauchy distribution can be found in exercises 4–9 on page 26.
12.6 Definition of the F distribution.
Definition(12.6a). Suppose m > 0 and n > 0. Suppose further that X ∼ χ²m, Y ∼ χ²n and X and Y are independent. Then
$$F = \frac{X/m}{Y/n}\ \text{ has an } F_{m,n}\text{ distribution.}$$
Finding the density of the Fm,n distribution is a routine calculation and is left to exercise 10 on page 26; that exercise shows that the density of the Fm,n distribution is
$$f_F(x) = \frac{\Gamma\left(\frac{m+n}{2}\right)m^{m/2}n^{n/2}}{\Gamma\left(\frac m2\right)\Gamma\left(\frac n2\right)}\,\frac{x^{m/2-1}}{[mx+n]^{(m+n)/2}}\quad\text{for } x\in(0,\infty).\qquad(12.6a)$$
Figure(12.6a). Plot of the F10,4 and F10,50 densities.
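The F density (12.6a) can be checked against Definition(12.6a) by simulation. This sketch evaluates (12.6a) through log-gammas and integrates it with Simpson's rule to get P[F ≤ x]. It compares that with the proportion of simulated ratios (X/m)/(Y/n) not exceeding x. The χ²k variables are drawn as Gamma(shape k/2, scale 2) using `random.gammavariate`. The values m = 10, n = 4 match one of the curves in Figure(12.6a); the seed and sample size are arbitrary.

```python
import math
import random


def f_pdf(x, m, n):
    """Density (12.6a) of the F_{m,n} distribution for x > 0."""
    log_c = (math.lgamma((m + n) / 2) - math.lgamma(m / 2) - math.lgamma(n / 2)
             + (m / 2) * math.log(m) + (n / 2) * math.log(n))
    return math.exp(log_c + (m / 2 - 1) * math.log(x) - ((m + n) / 2) * math.log(m * x + n))


def f_cdf(x, m, n, steps=2000):
    """P[F <= x] by Simpson's rule on [0, x]; assumes m > 2, so the density vanishes at 0."""
    h = x / steps
    total = f_pdf(x, m, n)   # the u = 0 endpoint contributes 0 when m > 2
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f_pdf(k * h, m, n)
    return total * h / 3


def f_sample(m, n, rng):
    """One draw of (X/m)/(Y/n) with X ~ chi2_m and Y ~ chi2_n (chi2_k = Gamma(shape k/2, scale 2))."""
    return (rng.gammavariate(m / 2, 2.0) / m) / (rng.gammavariate(n / 2, 2.0) / n)


rng = random.Random(10)
m, n = 10, 4
draws = [f_sample(m, n, rng) for _ in range(100_000)]
for x in (0.5, 1.0, 3.0):
    print(x, sum(d <= x for d in draws) / len(draws), f_cdf(x, m, n))
```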
12.7 The connection between the t and F distributions. Recall the definition of a tn distribution:
$$T_n = \frac{X}{\sqrt{Y/n}}$$
where X and Y are independent, X ∼ N(0, 1) and Y ∼ χ²n. Now X² ∼ χ²1; hence
$$T_n^2 = \frac{X^2}{Y/n}\sim F_{1,n}\qquad(12.7a)$$
Example(12.7a). Using knowledge of the F density and equation(12.7a), find the density of Tn.
Solution. Let W = Tn²; hence W ∼ F1,n. Then
$$f_W(w) = \frac{1}{2\sqrt w}\left[f_{T_n}(-\sqrt w)+f_{T_n}(\sqrt w)\right]$$
But the definition of Tn clearly implies that the distribution of Tn is symmetric about 0; hence for w > 0
$$f_W(w) = \frac{1}{\sqrt w}f_{T_n}(\sqrt w)$$
and
$$f_{T_n}(w) = wf_W(w^2) = w\,\frac{\Gamma\left(\frac{n+1}{2}\right)n^{n/2}w^{-1}}{\Gamma\left(\frac12\right)\Gamma\left(\frac n2\right)[w^2+n]^{(n+1)/2}} = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac n2\right)}\left(1+\frac{w^2}{n}\right)^{-(n+1)/2}$$
Finally, by symmetry, f_{Tn}(−w) = f_{Tn}(w).
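Example(12.7a) can be confirmed numerically. With f_W the F1,n density (12.6a), the relation f_{Tn}(w) = w f_W(w²) for w > 0 should reproduce the t density (12.1b). The sketch below (illustrative function names) evaluates both sides on the log scale for a few values of n and w.

```python
import math


def f_pdf(x, m, n):
    """F_{m,n} density (12.6a)."""
    log_c = (math.lgamma((m + n) / 2) - math.lgamma(m / 2) - math.lgamma(n / 2)
             + (m / 2) * math.log(m) + (n / 2) * math.log(n))
    return math.exp(log_c + (m / 2 - 1) * math.log(x) - ((m + n) / 2) * math.log(m * x + n))


def t_pdf(t, n):
    """t_n density (12.1b)."""
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(math.pi * n)
    return math.exp(log_c - ((n + 1) / 2) * math.log1p(t * t / n))


for n in (1, 3, 7.5):
    for w in (0.2, 1.0, 2.5):
        print(n, w, w * f_pdf(w * w, 1, n), t_pdf(w, n))
```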
12.8 Properties of the F distribution. The following properties of the F-distribution are considered in exercises 13–17 on pages 26–27.
• If X ∼ Fm,n then 1/X ∼ Fn,m.
• If X1 ∼ Gamma(n1, α1), X2 ∼ Gamma(n2, α2) and X1 and X2 are independent then
$$\frac{n_2\alpha_1X_1}{n_1\alpha_2X_2}\sim F_{2n_1,2n_2}\qquad(12.8a)$$
• If X ∼ Beta(m/2, n/2) then nX/[m(1 − X)] ∼ Fm,n. See exercise 14 on page 26.
• If X ∼ Fm,n then mX/(n + mX) ∼ Beta(m/2, n/2).
• If X ∼ Fm,n then mX/n ∼ Beta′(m/2, n/2).
• Suppose X ∼ Fm,n. Then mX →D χ²m as n → ∞.   (12.8b)
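The property mX/(n + mX) ∼ Beta(m/2, n/2) is easy to probe by simulation. The sketch below is seeded and draws the χ² variables via `random.gammavariate` as Gamma(shape k/2, scale 2). It compares the sample mean and variance of mX/(n + mX) with the Beta(m/2, n/2) values from (10.2a). The degrees of freedom and sample size are arbitrary.

```python
import random
import statistics


def f_draw(m, n, rng):
    """(U/m)/(V/n) with U ~ chi2_m, V ~ chi2_n independent."""
    return (rng.gammavariate(m / 2, 2.0) / m) / (rng.gammavariate(n / 2, 2.0) / n)


m, n = 6, 9
rng = random.Random(12)
zs = [m * x / (n + m * x) for x in (f_draw(m, n, rng) for _ in range(100_000))]

a, b = m / 2, n / 2
beta_mean = a / (a + b)
beta_var = a * b / ((a + b) ** 2 * (a + b + 1))
print(statistics.fmean(zs), beta_mean, statistics.variance(zs), beta_var)
```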
13 Exercises
The t distribution.
1. Using the definition of the tn distribution given in definition(12.1a) on page 23, show that the density of the tn distribution is given by equation(12.1b).
2. Suppose n > 1, s > 0 and α ∈ R. Show that
$$\int_{-\infty}^{\infty}\left[\frac{1}{1+(t-\alpha)^2/s^2}\right]^{n/2}dt = s\,B\left(\frac12,\frac{n-1}{2}\right)$$
3. Prove that the limit as n → ∞ of the tn density given in equation(12.1b) is the standard normal density.
The Cauchy distribution.
4. Suppose X and Y are i.i.d. with the N (0, σ 2 ) distribution. Find the distribution of:
(a) W = X/Y ;
(b) W = X/|Y |;
(c) W = |X|/|Y|.
5. (a) Suppose U has the uniform distribution U (− π/2, π/2). Show that tan(U ) has the Cauchy distribution, γ1 .
(b) Suppose U has the uniform distribution U (−π, π). Show that tan(U ) has the Cauchy distribution, γ1 .
6. Suppose X1, ..., Xn are i.i.d. with density γs. Show that
$$Y = \frac{X_1+\cdots+X_n}{n}$$
also has density γs.
7. (a) Suppose X has the density γs . Find the density of 2X. (This shows that 2X has the same distribution as X1 + X2
where X1 and X2 are i.i.d. with the same distribution as X.)
(b) Suppose U and V are i.i.d. with density γs. Let X = aU + bV and Y = cU + dV. Find the distribution of X + Y.
8. Suppose X and Y are i.i.d. with the N(0, 1) distribution. Define R and Θ by R² = X² + Y² and tan(Θ) = Y/X where R > 0 and Θ ∈ (−π, π). Show that R² has the χ²2 distribution, tan(Θ) has the Cauchy distribution γ1, and R and Θ are independent. Show also that the density of R is re^{−r²/2} for r > 0.
9. Suppose X has the Cauchy distribution, γ1. Find the density of
$$\frac{2X}{1-X^2}$$
Hint: tan(2θ) = 2 tan(θ)/(1 − tan²(θ)).
The F distribution.
10. Using definition(12.6a) on page 25, show that the density of the Fm,n distribution is given by equation(12.6a) on page 25.
11. Suppose X and Y are i.i.d. N(0, σ²). Find the density of Z where
$$Z = \begin{cases}Y^2/X^2 & \text{if } X\ne0;\\ 0 & \text{if } X=0.\end{cases}$$
12. Suppose F has the Fm,n distribution where n > 2. Find E[F ].
13. (a) Suppose X ∼ Fm,n. Show that 1/X ∼ Fn,m.
(b) Suppose X1 ∼ Gamma(n1, α1), X2 ∼ Gamma(n2, α2) and X1 and X2 are independent. Show that
$$\frac{n_2\alpha_1X_1}{n_1\alpha_2X_2}\sim F_{2n_1,2n_2}$$
14. Suppose X ∼ Beta(m/2, n/2). Show that nX/[m(1 − X)] ∼ Fm,n.
15. (a) Suppose X ∼ Fm,n. Show that mX/(n + mX) ∼ Beta(m/2, n/2).
(b) Suppose X ∼ F2α,2β where α > 0 and β > 0. Show that αX/β ∼ Beta′(α, β).
16. Suppose W ∼ Fm,n. Show that mW/n ∼ Beta′(m/2, n/2).
17. Suppose W ∼ Fm,n. Show that mW →D χ²m as n → ∞.
14 Non-central distributions
14.1 The non-central χ²-distribution with 1 degree of freedom. We know that if Z ∼ N(0, 1), then Z² ∼ χ²1. Now suppose
W = (Z + a)² where Z ∼ N(0, 1) and a ∈ R.
Then W is said to have a non-central χ²1 distribution with non-centrality parameter a². Hence W has the same distribution as Y² where Y ∼ N(a, 1).
The moment generating function of W.
$$E[e^{t(Z+a)^2}] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{t(z+a)^2}e^{-\frac12z^2}\,dz$$
But
$$t(z+a)^2-\tfrac12z^2 = z^2t+2azt+a^2t-\tfrac12z^2 = z^2(t-\tfrac12)+2azt+a^2t$$
$$= (t-\tfrac12)\left[z^2-\frac{4azt}{1-2t}-\frac{2a^2t}{1-2t}\right] = -\frac{1-2t}{2}\left[\left(z-\frac{2at}{1-2t}\right)^2-\frac{4a^2t^2}{(1-2t)^2}-\frac{2a^2t}{1-2t}\right]$$
$$= -\frac{1-2t}{2}\left[\left(z-\frac{2at}{1-2t}\right)^2-\frac{2a^2t}{(1-2t)^2}\right]$$
and hence, if α = 2at/(1 − 2t) and t < ½,
$$E[e^{t(Z+a)^2}] = \exp\left(\frac{a^2t}{1-2t}\right)\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left[-\frac{(1-2t)(z-\alpha)^2}{2}\right]dz = (1-2t)^{-1/2}\exp\left(\frac{a^2t}{1-2t}\right)\qquad(14.1a)$$
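Equation(14.1a) can be checked by integrating e^{t(z+a)²} against the N(0, 1) density numerically for several t < ½. After completing the square the integrand is a multiple of a normal density with mean α = 2at/(1 − 2t) and variance 1/(1 − 2t). This sketch therefore integrates over α ± 15 such standard deviations with Simpson's rule. That window, and the test values of t and a, are illustrative choices.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def noncentral_chi2_1_mgf(t, a):
    """(1 - 2t)^(-1/2) exp(a^2 t / (1 - 2t)), valid for t < 1/2 (equation (14.1a))."""
    return (1 - 2 * t) ** -0.5 * math.exp(a * a * t / (1 - 2 * t))


def mgf_by_integration(t, a):
    """E[exp(t (Z + a)^2)] for Z ~ N(0, 1), integrated around the peak of the integrand."""
    centre = 2 * a * t / (1 - 2 * t)
    sd = (1 - 2 * t) ** -0.5
    integrand = lambda z: math.exp(t * (z + a) ** 2 - 0.5 * z * z) / math.sqrt(2 * math.pi)
    return simpson(integrand, centre - 15 * sd, centre + 15 * sd)


for t, a in [(0.3, 1.0), (-1.0, 2.0), (0.1, -0.5)]:
    print(t, a, mgf_by_integration(t, a), noncentral_chi2_1_mgf(t, a))
```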
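The density of W. For w > 0 we have
$$f_W(w) = \frac{\phi(\sqrt w-a)+\phi(-\sqrt w-a)}{2\sqrt w} = \frac{\phi(\sqrt w-a)+\phi(\sqrt w+a)}{2\sqrt w}$$
$$= \frac{1}{2\sqrt{2\pi w}}\exp\left[-\tfrac12(w+a^2)\right]\left[\exp(a\sqrt w)+\exp(-a\sqrt w)\right]\qquad(14.1b)$$
$$= \frac{1}{\sqrt{2\pi w}}\exp\left[-\tfrac12(w+a^2)\right]\cosh(a\sqrt w)$$
because cosh(x) = (e^x + e^{−x})/2 for all x ∈ R.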
Using the standard expansion for cosh and the standard property of the Gamma function that Γ(n + ½) = (2n)!√π/(4ⁿn!) for all n = 0, 1, 2, ..., gives
$$f_W(w) = \frac{1}{\sqrt{2w}}\exp\left[-\tfrac12(w+a^2)\right]\sum_{j=0}^{\infty}\frac{(a^2w)^j}{\sqrt\pi\,(2j)!} = \frac{1}{\sqrt{2w}}\exp\left[-\tfrac12(w+a^2)\right]\sum_{j=0}^{\infty}\frac{(a^2w/4)^j}{j!\,\Gamma(j+\frac12)}$$
$$= \frac{1}{\sqrt{2w}}\exp\left[-\tfrac12(w+a^2)\right]\left(\frac{a\sqrt w}{2}\right)^{1/2}I_{-1/2}(a\sqrt w) = \frac12\left(\frac{a^2}{w}\right)^{1/4}\exp\left[-\tfrac12(w+a^2)\right]I_{-1/2}(a\sqrt w)\qquad(14.1c)$$
where, for all x > 0,
$$I_{-1/2}(x) = \sum_{j=0}^{\infty}\frac{1}{j!\,\Gamma(j+\frac12)}\left(\frac x2\right)^{2j-1/2}$$
is a modified Bessel function of the first kind.⁹
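The equivalent expressions (14.1b) and (14.1c) for the density of W can be compared numerically. In this sketch the Bessel function I_{−1/2} is evaluated from its series, truncated at 60 terms; that cut-off is an arbitrary choice that is ample for moderate arguments. The total mass is checked by substituting w = v², which removes the w^{−1/2} singularity at 0: the integrand becomes φ(v − a) + φ(v + a). The sketch assumes a > 0, as the Bessel form requires.

```python
import math


def density_cosh(w, a):
    """(14.1b): exp(-(w + a^2)/2) cosh(a sqrt(w)) / sqrt(2 pi w) for w > 0."""
    return math.exp(-0.5 * (w + a * a)) * math.cosh(a * math.sqrt(w)) / math.sqrt(2 * math.pi * w)


def bessel_i_minus_half(x, terms=60):
    """I_{-1/2}(x) = sum_j (x/2)^(2j - 1/2) / (j! Gamma(j + 1/2)) for x > 0 (truncated series)."""
    return sum((x / 2) ** (2 * j - 0.5) / (math.factorial(j) * math.gamma(j + 0.5))
               for j in range(terms))


def density_bessel(w, a):
    """(14.1c): (1/2) (a^2/w)^(1/4) exp(-(w + a^2)/2) I_{-1/2}(a sqrt(w)), assuming a > 0."""
    return (0.5 * (a * a / w) ** 0.25 * math.exp(-0.5 * (w + a * a))
            * bessel_i_minus_half(a * math.sqrt(w)))


def total_mass(a, steps=4000):
    """Integral of f_W over (0, inf); with w = v^2 the integrand is phi(v - a) + phi(v + a)."""
    upper = abs(a) + 12.0
    h = upper / steps
    g = lambda v: 2 * math.exp(-0.5 * (v * v + a * a)) * math.cosh(a * v) / math.sqrt(2 * math.pi)
    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3


for a in (0.5, 1.5, 3.0):
    print(a, density_cosh(1.0, a), density_bessel(1.0, a), total_mass(a))
```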
14.2 The non-central χ²-distribution with n degrees of freedom.
Suppose X1 ∼ N(µ1, 1), X2 ∼ N(µ2, 1), ..., Xn ∼ N(µn, 1) are independent. Then
$$W = \sum_{j=1}^nX_j^2$$
has a non-central χ²n distribution with non-centrality parameter λ = Σ_{j=1}^n µj².
Caution!!! Some authors define the non-centrality parameter to be λ/2.
The moment generating function of W. By equation(14.1a) the moment generating function of X1² is
$$E[e^{tX_1^2}] = \frac{1}{(1-2t)^{1/2}}\exp\left(\frac{\mu_1^2t}{1-2t}\right)$$
Hence, by independence,
$$E[e^{tW}] = \frac{1}{(1-2t)^{n/2}}\exp\left(\frac{\lambda t}{1-2t}\right)\quad\text{for } t<\tfrac12.\qquad(14.2a)$$
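Here is a simulation sketch of (14.2a). It draws W = Σ Xj² for two different mean vectors with the same λ = Σ µj², illustrating that the distribution depends on the means only through λ. For each, it compares the sample average of e^{tW} at t = −0.25 with (1 − 2t)^{−n/2} exp(λt/(1 − 2t)). Differentiating (14.2a) at t = 0 gives E[W] = n + λ, which is also checked. The mean vectors, t, seeds and sample sizes are illustrative.

```python
import math
import random
import statistics


def noncentral_chi2_mgf(t, n, lam):
    """(1 - 2t)^(-n/2) exp(lam t / (1 - 2t)) for t < 1/2, equation (14.2a)."""
    return (1 - 2 * t) ** (-n / 2) * math.exp(lam * t / (1 - 2 * t))


def simulate_W(mus, size, seed):
    """Draws of W = sum_j X_j^2 with independent X_j ~ N(mu_j, 1)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(m, 1.0) ** 2 for m in mus) for _ in range(size)]


t = -0.25
for mus in ([2.0, 0.0, 0.0], [1.0, 1.0, math.sqrt(2.0)]):   # both have n = 3 and lambda = 4
    ws = simulate_W(mus, 50_000, seed=13)
    n, lam = len(mus), sum(m * m for m in mus)
    print(mus, statistics.fmean(math.exp(t * w) for w in ws), noncentral_chi2_mgf(t, n, lam),
          statistics.fmean(ws), n + lam)
```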
14.3 The non-central χ²-distribution with n degrees of freedom—the basic decomposition theorem.
Proposition(14.3a). Suppose W has a non-central $\chi^2_n$ distribution with non-centrality parameter λ > 0. Then W has the same distribution as U + V where:
U has a non-central $\chi^2_1$ distribution with non-centrality parameter λ;
V has a $\chi^2_{n-1}$ distribution;
U and V are independent.
Proof. By equation(14.2a) the distribution of W depends on the means only through λ. So we may take $\mu_j = \sqrt{\lambda/n}$ for j = 1, …, n, and hence $\sum_{j=1}^n \mu_j^2 = \lambda$.
We are given that W has a non-central $\chi^2_n$ distribution with non-centrality parameter λ > 0. Hence $W \sim \sum_{j=1}^n X_j^2$ where $X_1, \dots, X_n$ are independent with $X_j \sim N(\mu_j, 1)$ for j = 1, …, n.
Let $e_1 = (1, 0, \dots, 0), \dots, e_n = (0, \dots, 0, 1)$ denote the standard basis of $\mathbb{R}^n$. Set $b_1 = (\mu_1/\sqrt{\lambda}, \dots, \mu_n/\sqrt{\lambda})$. Then $\{b_1, e_2, \dots, e_n\}$ form a basis of $\mathbb{R}^n$. Use the Gram-Schmidt orthogonalization procedure to create the orthonormal basis $\{b_1, \dots, b_n\}$. Define B to be the n × n matrix with rows $b_1, \dots, b_n$; then B is orthogonal.
Suppose $\mathbf{X} = (X_1, \dots, X_n)^T$ is n × 1 and set $\mathbf{Y} = B\mathbf{X}$. Then $\mathbf{Y} \sim N(B\mu, BIB^T = I)$ where $\mu = (\mu_1, \dots, \mu_n)^T = \sqrt{\lambda}\, b_1$. Hence $Y_1 \sim N(b_1^T\mu = \sqrt{\lambda}, 1)$ and $Y_j \sim N(b_j^T\mu = 0, 1)$ for j = 2, …, n, and $Y_1, \dots, Y_n$ are independent. Also $\mathbf{Y}^T\mathbf{Y} = \mathbf{X}^T\mathbf{X}$.
Finally, let $U = Y_1^2$ and $V = \sum_{j=2}^n Y_j^2$. Then $W = \mathbf{X}^T\mathbf{X} = \mathbf{Y}^T\mathbf{Y} = U + V$, where U and V are independent with the required distributions.
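The construction of B in the proof can be carried out numerically. This sketch builds the rows by Gram-Schmidt, starting from $b_1 = \mu/\sqrt{\lambda}$ and the standard basis vectors. It then confirms that B is orthogonal and that $B\mu = (\sqrt{\lambda}, 0, \dots, 0)$, which is what makes $Y_1 \sim N(\sqrt{\lambda}, 1)$ and $Y_j \sim N(0, 1)$ for j ≥ 2. Plain Python lists stand in for matrices, and the helper names are hypothetical.

```python
import math


def orthogonal_from_first_row(mu):
    """Rows b_1, ..., b_n of an orthogonal matrix B with b_1 = mu / |mu|.

    Gram-Schmidt applied to {b_1, e_2, ..., e_n}; assumes mu[0] != 0 so that
    this set is a basis."""
    n = len(mu)
    norm = math.sqrt(sum(m * m for m in mu))
    rows = [[m / norm for m in mu]]
    for k in range(1, n):
        v = [1.0 if i == k else 0.0 for i in range(n)]  # e_{k+1}
        for b in rows:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]  # remove the component along b
        length = math.sqrt(sum(vi * vi for vi in v))
        rows.append([vi / length for vi in v])
    return rows


def mat_vec(rows, x):
    return [sum(r_i * x_i for r_i, x_i in zip(r, x)) for r in rows]


n, lam = 4, 3.0
mu = [math.sqrt(lam / n)] * n  # mu_j = sqrt(lambda/n) as in the proof
B = orthogonal_from_first_row(mu)
print(mat_vec(B, mu))  # approximately [sqrt(lam), 0, 0, 0]
```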
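14.4 The non-central χ²-distribution with n degrees of freedom—the density function. We use proposition(14.3a). Now U has a non-central $\chi^2_1$ distribution with non-centrality parameter λ. Using equation(14.1b) gives
$$f_U(u) = \frac{1}{2^{3/2}\,\Gamma(1/2)\sqrt{u}}\, e^{-\frac12(u+\lambda)}\left[e^{\sqrt{\lambda u}} + e^{-\sqrt{\lambda u}}\right] \quad\text{for } u > 0.$$
Also, $V \sim \chi^2_{n-1}$ has density
$$f_V(v) = \frac{e^{-v/2}\,v^{(n-3)/2}}{2^{(n-1)/2}\,\Gamma\big((n-1)/2\big)} \quad\text{for } v > 0.$$
Using independence of U and V gives
$$f_{(U,V)}(u,v) = \frac{u^{-1/2}\,v^{(n-3)/2}\,e^{-(u+v)/2}\,e^{-\lambda/2}}{2^{(n+2)/2}\,\Gamma(1/2)\,\Gamma\big((n-1)/2\big)}\left[e^{\sqrt{\lambda u}} + e^{-\sqrt{\lambda u}}\right]$$
Now use the transformation X = U + V and Y = V. The Jacobian equals 1. Hence for y > 0 and x > y
$$f_{(X,Y)}(x,y) = \frac{e^{-x/2}\,e^{-\lambda/2}\,x^{(n-4)/2}}{2^{n/2}\,\Gamma(1/2)\,\Gamma\big((n-1)/2\big)}\left(\frac{y}{x}\right)^{(n-3)/2}\left(\frac{x}{x-y}\right)^{1/2}\frac{e^{\sqrt{\lambda(x-y)}} + e^{-\sqrt{\lambda(x-y)}}}{2}$$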
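Footnote 9. The general definition of a modified Bessel function of the first kind is
$$I_\nu(x) = \left(\frac{x}{2}\right)^\nu \sum_{j=0}^\infty \frac{x^{2j}}{4^j\, j!\,\Gamma(\nu+j+1)} \quad\text{for all } \nu \in \mathbb{R} \text{ and } x \in \mathbb{C}. \qquad (14.1d)$$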
1 Univariate Continuous Distributions
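Now
$$\left(\frac{x}{x-y}\right)^{1/2}\frac{e^{\sqrt{\lambda(x-y)}} + e^{-\sqrt{\lambda(x-y)}}}{2} = \left(\frac{x}{x-y}\right)^{1/2}\sum_{j=0}^\infty\frac{\lambda^j(x-y)^j}{(2j)!} = \sum_{j=0}^\infty \frac{(\lambda x)^j}{(2j)!}\left(1-\frac{y}{x}\right)^{j-1/2}$$
and so we have
$$f_{(X,Y)}(x,y) = \frac{e^{-x/2}\,e^{-\lambda/2}\,x^{(n-4)/2}}{2^{n/2}\,\Gamma(1/2)\,\Gamma\big((n-1)/2\big)}\sum_{j=0}^\infty\frac{(\lambda x)^j}{(2j)!}\left(\frac{y}{x}\right)^{(n-3)/2}\left(1-\frac{y}{x}\right)^{j-1/2}$$
for y > 0 and x > y.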
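We need to integrate out y. By setting w = y/x we get
$$\int_{y=0}^{x}\left(\frac{y}{x}\right)^{(n-3)/2}\left(1-\frac{y}{x}\right)^{j-1/2}dy = x\int_{w=0}^{1} w^{(n-3)/2}(1-w)^{j-1/2}\,dw = x\,B\big((n-1)/2,\ j+1/2\big) = x\,\frac{\Gamma\big((n-1)/2\big)\,\Gamma(j+1/2)}{\Gamma(n/2+j)}$$
and hence, using $\Gamma(j+1/2)/\big(\Gamma(1/2)\,(2j)!\big) = 1/(4^j j!)$, for x > 0
$$f_X(x) = \frac{e^{-x/2}\,e^{-\lambda/2}\,x^{(n-2)/2}}{2^{n/2}\,\Gamma(n/2)}\sum_{j=0}^\infty\frac{(\lambda x)^j}{(2j)!}\,\frac{\Gamma(j+1/2)}{\Gamma(1/2)}\,\frac{\Gamma(n/2)}{\Gamma(n/2+j)} = \frac{e^{-x/2}\,e^{-\lambda/2}\,x^{(n-2)/2}}{2^{n/2}}\sum_{j=0}^\infty\frac{(\lambda x)^j}{4^j\,j!\,\Gamma(n/2+j)} \qquad (14.4a)$$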
The expression for the modified Bessel function of the first kind in equation(14.1d) on page 28 gives
$$I_{n/2-1}\big(\sqrt{\lambda x}\big) = \frac{(\lambda x)^{(n-2)/4}}{2^{n/2-1}}\sum_{j=0}^\infty\frac{(\lambda x)^j}{4^j\,j!\,\Gamma(n/2+j)}$$
Hence an alternative expression for the density is
$$f_X(x) = \frac12\,e^{-x/2}\,e^{-\lambda/2}\left(\frac{x}{\lambda}\right)^{(n-2)/4} I_{n/2-1}\big(\sqrt{\lambda x}\big) \qquad (14.4b)$$
This is the same as equation(14.1c) if we set n = 1 and λ = a².
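The series (14.4a) and the Bessel form (14.4b) can be cross-checked numerically. The sketch below works on the log scale (via `math.lgamma`) to avoid overflowing the factorials and Gamma functions. It truncates both series after a fixed number of terms and checks that the density integrates to approximately 1 for one choice of n and λ. The truncation length, integration range and step count are assumptions suited only to moderate x and λ, and the function names are hypothetical.

```python
import math


def ncx2_pdf_series(x, n, lam, terms=60):
    """Non-central chi-squared density from the series (14.4a); needs x > 0, lam > 0."""
    log_prefix = -x / 2 - lam / 2 + (n - 2) / 2 * math.log(x) - (n / 2) * math.log(2)
    total = 0.0
    for j in range(terms):
        log_term = (j * math.log(lam * x) - j * math.log(4)
                    - math.lgamma(j + 1) - math.lgamma(n / 2 + j))
        total += math.exp(log_prefix + log_term)
    return total


def bessel_i(nu, z, terms=60):
    """I_nu(z) from the series (14.1d), for z > 0 and nu > -1."""
    return sum(math.exp((nu + 2 * j) * math.log(z / 2) - math.lgamma(j + 1) - math.lgamma(nu + j + 1))
               for j in range(terms))


def ncx2_pdf_bessel(x, n, lam):
    """Bessel form (14.4b)."""
    return 0.5 * math.exp(-(x + lam) / 2) * (x / lam) ** ((n - 2) / 4) * bessel_i(n / 2 - 1, math.sqrt(lam * x))


def midpoint_integral(f, a, b, steps):
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


print(ncx2_pdf_series(2.0, 3, 1.5), ncx2_pdf_bessel(2.0, 3, 1.5))
print(midpoint_integral(lambda x: ncx2_pdf_series(x, 4, 2.5), 0.0, 80.0, 4000))
```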
14.5 The non-central t distribution.
Definition(14.5a). Suppose n ∈ (0, ∞) and µ ∈ ℝ. Then the random variable T has a non-central t distribution with n degrees of freedom and non-centrality parameter µ iff
$$T = \frac{X+\mu}{\sqrt{Y/n}} \qquad (14.5a)$$
where X ∼ N(0, 1), $Y \sim \chi^2_n$, and X and Y are independent.
14.6 The non-central F distribution.
Definition(14.6a). Suppose m > 0 and n > 0. Suppose further that X has a non-central $\chi^2_m$ distribution with non-centrality parameter λ, $Y \sim \chi^2_n$, and X and Y are independent. Then
$$F = \frac{X/m}{Y/n}$$
has a non-central $F_{m,n}$ distribution with non-centrality parameter λ.
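Definitions (14.5a) and (14.6a) translate directly into samplers. The sketch uses Python's `random.gammavariate` (shape, scale) to draw $\chi^2_n$ variables. It compares sample means with the standard formulas $E[T] = \mu\sqrt{n/2}\,\Gamma((n-1)/2)/\Gamma(n/2)$ for n > 1 and $E[F] = n(m+\lambda)/(m(n-2))$ for n > 2; those formulas are not derived in this section. The non-central $\chi^2_m$ draw assumes integer m. The parameter values, seed and sample size are arbitrary.

```python
import math
import random


def noncentral_t(n, mu, rng):
    """One draw of T = (X + mu)/sqrt(Y/n) as in (14.5a)."""
    x = rng.gauss(0.0, 1.0)
    y = rng.gammavariate(n / 2, 2.0)  # chi^2_n is Gamma(shape n/2, scale 2)
    return (x + mu) / math.sqrt(y / n)


def noncentral_f(m, n, lam, rng):
    """One draw of F = (X/m)/(Y/n) as in definition (14.6a).

    Assumes integer m, so X can be built as a sum of m squared N(mu_j, 1)
    variables with mu_j = sqrt(lam/m); by (14.2a) only lam matters."""
    mu = math.sqrt(lam / m)
    x = sum(rng.gauss(mu, 1.0) ** 2 for _ in range(m))
    y = rng.gammavariate(n / 2, 2.0)
    return (x / m) / (y / n)


rng = random.Random(42)
reps = 100_000
n, mu = 10, 1.0
mean_t = sum(noncentral_t(n, mu, rng) for _ in range(reps)) / reps
exact_t = mu * math.sqrt(n / 2) * math.gamma((n - 1) / 2) / math.gamma(n / 2)  # needs n > 1

m, n_f, lam = 3, 10, 2.0
mean_f = sum(noncentral_f(m, n_f, lam, rng) for _ in range(reps)) / reps
exact_f = n_f * (m + lam) / (m * (n_f - 2))  # needs n_f > 2
print(mean_t, exact_t, mean_f, exact_f)
```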
15 Exercises
1. Suppose X1, …, Xn are independent random variables such that Xj has a non-central $\chi^2_{k_j}$ distribution with non-centrality parameter λj for j = 1, …, n. Find the distribution of Z = X1 + ··· + Xn.
2. Suppose W has a non-central $\chi^2_n$ distribution with non-centrality parameter λ.
(a) Find the expectation of W .
(b) Find the variance of W .
3. Suppose the random variable V has the Poisson distribution with mean λ/2. Suppose further that the distribution of W given V = j is the $\chi^2_{k+2j}$ distribution. Show that the distribution of W is the non-central $\chi^2_k$ with non-centrality parameter λ.
16 The power and Pareto distributions
There are many results about these distributions in the exercises.
16.1 The power distribution. Suppose $a_0 \in \mathbb{R}$, h > 0 and α > 0. Then the random variable X is said to have the power distribution Power(α, a0, h) iff X has density
$$f(x) = \frac{\alpha(x-a_0)^{\alpha-1}}{h^\alpha} \quad\text{for } a_0 < x < a_0 + h.$$
The distribution function is
$$F(x) = \frac{(x-a_0)^\alpha}{h^\alpha} \quad\text{for } a_0 < x < a_0 + h.$$
The standard power distribution is Power(α, 0, 1); this has density $f(x) = \alpha x^{\alpha-1}$ for 0 < x < 1 and distribution function $F(x) = x^\alpha$ for 0 < x < 1. If X ∼ Power(α, 0, 1), the standard power distribution, then E[X] = α/(α + 1), E[X²] = α/(α + 2) and var[X] = α/[(α + 1)²(α + 2)]; see exercise 1 on page 33.
Clearly, X ∼ Power(α, a0, h) iff (X − a0)/h ∼ Power(α, 0, 1).
Figure(16.1a). The density of the standard power distribution for α = 1/2, α = 2 and α = 4. [Plot: densities on 0 ≤ x ≤ 1; vertical axis from 0 to 4.]
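Because F has the explicit inverse $F^{-1}(u) = a_0 + h\,u^{1/\alpha}$, the power distribution is easy to simulate by the inverse-transform method. The sketch below draws from the standard power distribution and compares the sample mean and variance with α/(α + 1) and α/[(α + 1)²(α + 2)] quoted above. The function name is hypothetical, and the seed and sample size are arbitrary.

```python
import random


def power_sample(alpha, a0, h, rng):
    """Inverse-CDF draw from Power(alpha, a0, h): solve ((x - a0)/h)^alpha = u."""
    u = rng.random()
    return a0 + h * u ** (1 / alpha)


rng = random.Random(3)
alpha = 2.0
xs = [power_sample(alpha, 0.0, 1.0, rng) for _ in range(100_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print(mean, alpha / (alpha + 1))                      # E[X], standard case
print(var, alpha / ((alpha + 1) ** 2 * (alpha + 2)))  # var[X], standard case
```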
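16.2 A characterization of the power distribution. Suppose X ∼ Power(α, 0, h); then
$$f(x) = \frac{\alpha x^{\alpha-1}}{h^\alpha} \quad\text{and}\quad F(x) = \frac{x^\alpha}{h^\alpha} \quad\text{for } x \in (0,h).$$
Also, for all c ∈ (0, h) we have
$$E[X \mid X \le c] = \int_0^c x\,\frac{\alpha x^{\alpha-1}}{c^\alpha}\,dx = \frac{\alpha c}{\alpha+1} = \frac{c}{h}\,E[X]$$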
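The identity $E[X \mid X \le c] = \alpha c/(\alpha+1) = (c/h)E[X]$ can be checked numerically. The sketch integrates $x f(x)$ over (0, c) with Simpson's rule and divides by F(c). The parameter values and the step count are arbitrary choices, and the function name is hypothetical.

```python
def cond_mean_below(c, alpha, h, steps=4000):
    """E[X | X <= c] for X ~ Power(alpha, 0, h) and 0 < c <= h, by Simpson's rule.

    Integrates x f(x) = alpha x^alpha / h^alpha over (0, c) and divides by F(c)."""
    def x_f(x):
        return alpha * x ** alpha / h ** alpha

    step = c / steps  # steps is even, as Simpson's rule requires
    total = x_f(0.0) + x_f(c)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * x_f(k * step)
    return (total * step / 3) / (c / h) ** alpha


alpha, h = 2.0, 3.0
mean_x = alpha * h / (alpha + 1)  # E[X]
for c in (0.5, 1.5, 2.9):
    print(c, cond_mean_below(c, alpha, h), alpha * c / (alpha + 1), c / h * mean_x)
```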
The next proposition shows this result characterizes the power distribution (see [DALLAS(1976)]).
Proposition(16.2a). Suppose X is a non-negative absolutely continuous random variable such that there exists h > 0 with P[X ≤ h] = 1. Suppose further that for all c ∈ (0, h) we have
$$E[X \mid X \le c] = \frac{c}{h}\,E[X] \qquad (16.2a)$$
Then there exists α > 0 such that X ∼ Power(α, 0, h).
Proof. Let f denote the density and F denote the distribution function of X. Then equation(16.2a) becomes
$$\int_0^c \frac{x f(x)}{F(c)}\,dx = \frac{c}{h}\int_0^h x f(x)\,dx$$
Let $\delta = \frac{1}{h}\int_0^h x f(x)\,dx = E[X]/h$. Because X is absolutely continuous with 0 ≤ X ≤ h, we have 0 < E[X] < h. Hence δ ∈ (0, 1) and
$$\int_0^c x f(x)\,dx = c\,F(c)\,\delta \quad\text{for all } c \in (0,h). \qquad (16.2b)$$
Differentiating with respect to c gives
$$c f(c) = [F(c) + c f(c)]\,\delta$$
and hence
$$\frac{F'(c)}{F(c)} = \frac{\alpha}{c} \quad\text{where } \alpha = \frac{\delta}{1-\delta} > 0.$$
Integrating gives ln F(c) = A + α ln(c) or $F(c) = kc^\alpha$. Using F(h) = 1 gives $F(c) = c^\alpha/h^\alpha$ for c ∈ (0, h), as required.
The above result leads on to another characterization of the power distribution:
Proposition(16.2b). Suppose X1, X2, …, Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F and such that there exists h > 0 with F(h) = 1. Let $S_n = X_1 + \cdots + X_n$. Then
$$E\left[\frac{S_n}{X_{(n)}}\,\middle|\,X_{(n)} = x\right] = c \quad\text{with } c \text{ independent of } x \qquad (16.2c)$$
iff there exists α > 0 such that $F(x) = x^\alpha/h^\alpha$ for x ∈ (0, h).
Proof. ⇒ Writing $S_n = X_{(1)} + \cdots + X_{(n)}$ in equation(16.2c) gives
$$(c-1)x = E\big[X_{(1)} + \cdots + X_{(n-1)} \,\big|\, X_{(n)} = x\big]$$
It is easy to see that, given $X_{(n)} = x$, the vector $(X_{(1)}, \dots, X_{(n-1)})$ has the same distribution as the vector of n − 1 order statistics $(Y_{(1)}, \dots, Y_{(n-1)})$ from the density f(y)/F(x) for 0 < y < x. Hence $Y_{(1)} + \cdots + Y_{(n-1)} = Y_1 + \cdots + Y_{n-1}$ and
$$(c-1)x = (n-1)E[Y] \quad\text{where } Y \text{ has density } f(y)/F(x) \text{ for } y \in (0,x). \qquad (16.2d)$$
Hence
$$\int_0^x y f(y)\,dy = \frac{c-1}{n-1}\,x F(x)$$
Because $X_{(j)} < X_{(n)}$ for all j = 1, 2, …, n − 1, equation(16.2c) implies c < nx/x = n; also equation(16.2d) implies c > 1. Hence δ = (c − 1)/(n − 1) ∈ (0, 1). So we have equation(16.2b) again and we must have $F(x) = x^\alpha/h^\alpha$ for x ∈ (0, h).
⇐ See part (a) of exercise 6 on page 33.
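For the power distribution, (16.2d) with Y ∼ Power(α, 0, x) gives $c = 1 + (n-1)\alpha/(\alpha+1)$. The simulation sketch below splits the replications by the size of the maximum. It then checks that the average of $S_n/X_{(n)}$ is close to this c in both halves, as (16.2c) predicts. The seed, n, α and the median split are arbitrary illustrative choices, and the function name is hypothetical.

```python
import random


def ratio_by_max(alpha, n, reps, seed):
    """Simulate S_n / X_(n) for i.i.d. Power(alpha, 0, 1) samples.

    Returns the mean of the ratio among replications with the smallest maxima
    and among those with the largest maxima (split at the median)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(reps):
        xs = [rng.random() ** (1 / alpha) for _ in range(n)]  # inverse-CDF draws
        top = max(xs)
        pairs.append((top, sum(xs) / top))
    pairs.sort()  # order by the maximum
    half = reps // 2
    low = sum(r for _, r in pairs[:half]) / half
    high = sum(r for _, r in pairs[half:]) / (reps - half)
    return low, high


alpha, n = 2.0, 5
c = 1 + (n - 1) * alpha / (alpha + 1)  # from (16.2d) with E[Y] = alpha x/(alpha + 1)
print(ratio_by_max(alpha, n, 60_000, seed=11), c)
```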
The next result is an easy consequence of the last one; it was originally announced in [SRIVASTAVA(1965)] but the proof here is due to [DALLAS(1976)].
Proposition(16.2c). Suppose X1, X2, …, Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F and such that there exists h > 0 with F(h) = 1. Then
$$\frac{S_n}{\max\{X_1, \dots, X_n\}} \text{ is independent of } \max\{X_1, \dots, X_n\} \qquad (16.2e)$$
iff there exists α > 0 such that $F(x) = x^\alpha/h^\alpha$ for x ∈ (0, h).
Proof. ⇒ Clearly equation(16.2e) implies equation(16.2c).
⇐ See part (b) of exercise 6 on page 33.
16.3 The Pareto distribution. Suppose α > 0 and x0 > 0. Then the random variable X is said to have a Pareto distribution iff X has the distribution function
$$F_X(x) = 1 - \left(\frac{x_0}{x}\right)^\alpha \quad\text{for } x \ge x_0.$$
It follows that X has density
$$f_X(x) = \frac{\alpha x_0^\alpha}{x^{\alpha+1}} \quad\text{for } x \ge x_0.$$
More generally, the random variable X has the Pareto(α, a, x0) distribution iff
$$f_X(x) = \frac{\alpha x_0^\alpha}{(x-a)^{\alpha+1}} \quad\text{and}\quad F_X(x) = 1 - \frac{x_0^\alpha}{(x-a)^\alpha} \quad\text{for } x > a + x_0.$$
The standard Pareto distribution is Pareto(α, 0, 1), which has density $f(x) = \alpha/x^{\alpha+1}$ for x > 1 and distribution function $F(x) = 1 - 1/x^\alpha$ for x > 1. Clearly X ∼ Pareto(α, a, x0) iff (X − a)/x0 ∼ Pareto(α, 0, 1).
It is important to note that if X ∼ Pareto(α, 0, x0) then 1/X ∼ Power(α, 0, 1/x0); also, if X ∼ Power(α, 0, h) then 1/X ∼ Pareto(α, 0, 1/h). So results about one distribution can often be transformed into an equivalent result about the other.
Figure(16.3a). The Pareto distribution density for α = 1, α = 2 and α = 3 (all with x0 = 1). [Plot: densities on 0 ≤ x ≤ 5; vertical axis from 0 to 3.0.]
The Pareto distribution has been used to model the distribution of incomes, the distribution of wealth, the sizes of
human settlements, etc.
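Inverting $F_X$ gives $x = a + x_0(1-u)^{-1/\alpha}$, so the Pareto distribution can also be simulated by the inverse-transform method. The sketch below also checks the reciprocal relation empirically. If X ∼ Pareto(α, 0, x0), then 1/X should have the Power(α, 0, 1/x0) distribution function $(x_0 y)^\alpha$ on (0, 1/x0). The function name is hypothetical, and the seed and parameters are arbitrary.

```python
import random


def pareto_sample(alpha, a, x0, rng):
    """Inverse-CDF draw from Pareto(alpha, a, x0): solve 1 - x0^alpha/(x - a)^alpha = u."""
    u = rng.random()
    return a + x0 * (1.0 - u) ** (-1.0 / alpha)  # 1 - u lies in (0, 1], so no division by zero


rng = random.Random(5)
alpha, x0 = 3.0, 2.0
recips = [1.0 / pareto_sample(alpha, 0.0, x0, rng) for _ in range(100_000)]
for y in (0.2, 0.35, 0.45):
    empirical = sum(r <= y for r in recips) / len(recips)
    power_cdf = (x0 * y) ** alpha  # CDF of Power(alpha, 0, 1/x0) at y
    print(y, empirical, power_cdf)
```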
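16.4 A characterization of the Pareto distribution. Suppose X ∼ Pareto(α, 0, x0). Suppose further that α > 1 so that the expectation is finite. We have
$$f(x) = \frac{\alpha x_0^\alpha}{x^{\alpha+1}} \quad\text{and}\quad F(x) = 1 - \frac{x_0^\alpha}{x^\alpha} \quad\text{for } x > x_0.$$
Also, for all c > x0 we have, because the expectation is finite,
$$E[X \mid X > c] = \int_c^\infty x\,\frac{\alpha x_0^\alpha}{x^{\alpha+1}\,[1-F(c)]}\,dx = \alpha c^\alpha\int_c^\infty \frac{1}{x^\alpha}\,dx = \frac{\alpha c}{\alpha-1} = \frac{c}{x_0}\,E[X]$$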
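The identity $E[X \mid X > c] = \alpha c/(\alpha-1)$ can be checked by numerical integration. The sketch below maps the infinite range (c, ∞) onto (0, 1) with the substitution x = c/u and applies the midpoint rule. It assumes α > 1 (finite expectation). The step count and parameter values are arbitrary, and the function name is hypothetical.

```python
def cond_mean_above(c, alpha, x0, steps=20000):
    """E[X | X > c] for X ~ Pareto(alpha, 0, x0) with c > x0, by numerical integration.

    The substitution x = c/u maps (c, infinity) onto (0, 1); the midpoint rule
    never evaluates the integrand at u = 0."""
    tail = (x0 / c) ** alpha  # 1 - F(c)
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        x = c / u
        density = alpha * x0 ** alpha / x ** (alpha + 1)
        total += x * density * (c / u ** 2) * h  # x f(x) dx with dx = (c/u^2) du
    return total / tail


alpha, x0 = 3.0, 1.0
for c in (1.5, 4.0, 10.0):
    print(c, cond_mean_above(c, alpha, x0), alpha * c / (alpha - 1))
```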
The next proposition shows this result characterizes the Pareto distribution (see [DALLAS(1976)]).
Proposition(16.4a). Suppose X is a non-negative absolutely continuous random variable with a finite expectation and such that there exists x0 > 0 with P[X > x0] = 1. Suppose further that for all c > x0 we have
$$E[X \mid X > c] = \frac{c}{x_0}\,E[X] \qquad (16.4a)$$
Then there exists α > 1 such that X ∼ Pareto(α, 0, x0).
Proof. Let f denote the density and F denote the distribution function of X. Then equation(16.4a) becomes
$$\int_c^\infty \frac{x f(x)}{1-F(c)}\,dx = \frac{c}{x_0}\int_{x_0}^\infty x f(x)\,dx$$
Let $\delta = \frac{1}{x_0}\int_{x_0}^\infty x f(x)\,dx = E[X]/x_0$. We are assuming E[X] is finite, and E[X] > x0 because X > x0 almost surely. Hence δ ∈ (1, ∞) and
$$\int_c^\infty x f(x)\,dx = c\,[1-F(c)]\,\delta \quad\text{for all } c > x_0. \qquad (16.4b)$$
Differentiating with respect to c gives
$$-c f(c) = [1 - F(c) - c f(c)]\,\delta$$
and hence
$$c f(c)\,[\delta-1] = [1-F(c)]\,\delta, \quad\text{so}\quad \frac{F'(c)}{1-F(c)} = \frac{\alpha}{c} \quad\text{where } \alpha = \frac{\delta}{\delta-1} > 1.$$
Integrating gives $-\ln[1 - F(c)] = A + \ln(c^\alpha)$ or $1 - F(c) = k/c^\alpha$. Using F(x0) = 0 gives $1 - F(c) = x_0^\alpha/c^\alpha$ for c > x0, as required.
The above result leads on to another characterization of the Pareto distribution:
Proposition(16.4b). Suppose X1, X2, …, Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F with a finite expectation and such that there exists x0 > 0 with P[X > x0] = 1. Let $S_n = X_1 + \cdots + X_n$. Then
$$E\left[\frac{S_n}{X_{(1)}}\,\middle|\,X_{(1)} = x\right] = c \quad\text{with } c \text{ independent of } x \qquad (16.4c)$$
iff there exists α > 1 such that $F(x) = 1 - x_0^\alpha/x^\alpha$ for x > x0.
Proof. ⇒ Writing $S_n = X_{(1)} + \cdots + X_{(n)}$ in equation(16.4c) gives
$$(c-1)x = E\big[X_{(2)} + \cdots + X_{(n)} \,\big|\, X_{(1)} = x\big]$$
It is easy to see that, given $X_{(1)} = x$, the vector $(X_{(2)}, \dots, X_{(n)})$ has the same distribution as the vector of n − 1 order statistics $(Y_{(1)}, \dots, Y_{(n-1)})$ from the density f(y)/[1 − F(x)] for y > x. Hence $Y_{(1)} + \cdots + Y_{(n-1)} = Y_1 + \cdots + Y_{n-1}$ and
$$(c-1)x = (n-1)E[Y] \quad\text{where } Y \text{ has density } f(y)/[1-F(x)] \text{ for } y > x. \qquad (16.4d)$$
Hence
$$\int_x^\infty y f(y)\,dy = \frac{c-1}{n-1}\,x\,[1-F(x)]$$
Because $X_{(j)} > X_{(1)}$ for all j = 2, 3, …, n, equation(16.4c) implies c > nx/x = n. Hence δ = (c − 1)/(n − 1) ∈ (1, ∞). So we have equation(16.4b) again and we must have $F(x) = 1 - x_0^\alpha/x^\alpha$ for x ∈ (x0, ∞).
⇐ See part (b) of exercise 20 on page 35.
The next result is an easy consequence of the last one; it was originally announced in [SRIVASTAVA(1965)] but the proof here is due to [DALLAS(1976)].
Proposition(16.4c). Suppose X1, X2, …, Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F with finite expectation and such that there exists x0 > 0 with P[X > x0] = 1. Then
$$\frac{S_n}{\min\{X_1, \dots, X_n\}} \text{ is independent of } \min\{X_1, \dots, X_n\} \qquad (16.4e)$$
iff there exists α > 1 such that $F(x) = 1 - x_0^\alpha/x^\alpha$ for x ∈ (x0, ∞).
Proof. ⇒ Clearly equation(16.4e) implies equation(16.4c).
⇐ See part (c) of exercise 20 on page 35.
17 Exercises
The power distribution.
1. Suppose X has the Power(α, a, h) distribution. Find E[X], E[X²] and var[X].
2. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the Power(α, a, h) distribution. Find the distribution of
Mn = max(X1 , . . . , Xn ).
3. Suppose U1, U2, …, Un are i.i.d. random variables with the Uniform(0, 1) distribution.
(a) Find the distribution of Mn = max(U1, …, Un).
(b) Find the distribution of $Y = U_1^{1/n}$.
(c) Suppose X ∼ Power(α, a, h). Show that $X \sim a + hU^{1/\alpha}$ where U ∼ Uniform(0, 1). Hence show that
$$E[X^n] = \sum_{j=0}^n \binom{n}{j}\frac{\alpha}{\alpha+j}\,h^j a^{n-j} \quad\text{for } n = 1, 2, \dots$$
4. Transforming the power distribution to the exponential. Suppose X ∼ Power(α, 0, h). Let Y = −ln(X); equivalently Y = ln(1/X). Show that Y − ln(1/h) ∼ exponential(α).
5. Suppose X1, X2, …, Xn are i.i.d. random variables with the power distribution Power(α, 0, 1). By using the density of $X_{k:n}$, find $E[X_{k:n}]$ and $E[X_{k:n}^2]$.
6. Suppose X1, X2, …, Xn are i.i.d. random variables with the Power(α, 0, h) distribution.
(a) Show that $E\left[\frac{S_n}{X_{(n)}}\,\middle|\,X_{(n)} = x\right] = c$ where c is independent of x.
(b) Show that $\frac{S_n}{\max\{X_1, \dots, X_n\}}$ is independent of max{X1, …, Xn}.
7. Suppose r > 0 and X1, X2, …, Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F such that there exists h > 0 with F(h) = 1.
(a) Show that for some i = 1, 2, …, n − 1
$$E\left[\frac{X_{(i)}^r}{X_{(i+1)}^r}\,\middle|\,X_{(i+1)} = x\right] = c \quad\text{with } c \text{ independent of } x \text{ for } x \in (0,h)$$
iff there exists α > 0 such that $F(x) = x^\alpha/h^\alpha$ for x ∈ (0, h).
(b) Assuming the expectation is finite, show that for some i = 1, 2, …, n − 1
$$E\left[\frac{X_{(i+1)}^r}{X_{(i)}^r}\,\middle|\,X_{(i+1)} = x\right] = c \quad\text{with } c \text{ independent of } x \text{ for } x \in (0,h)$$
iff there exists α > 0 such that $F(x) = x^\alpha/h^\alpha$ for x ∈ (0, h).
[DALLAS(1976)]
8. Suppose X1, X2, …, Xn are i.i.d. random variables with the power distribution Power(α, 0, 1), which has distribution function $F(x) = x^\alpha$ for 0 < x < 1 where α > 0.
(a) Let
$$W_1 = \frac{X_{1:n}}{X_{2:n}},\quad W_2 = \frac{X_{2:n}}{X_{3:n}},\quad \dots,\quad W_{n-1} = \frac{X_{(n-1):n}}{X_{n:n}},\quad W_n = X_{n:n}$$
Prove that W1, W2, …, Wn are independent and find the distribution of Wk for k = 1, 2, …, n.
(b) Hence find $E[X_{k:n}]$ and $E[X_{k:n}^2]$.
The Pareto distribution.
9. Relationship with the power distribution. Recall that if α > 0, then U ∼ Uniform(0, 1) iff $Y = U^{1/\alpha}$ ∼ Power(α, 0, 1).
(a) Suppose α > 0. Show that U ∼ Uniform(0, 1) iff $Y = U^{-1/\alpha}$ ∼ Pareto(α, 0, 1).
(b) Suppose α > 0 and x0 > 0. Show that Y ∼ Pareto(α, a, x0) iff $Y = a + x_0 U^{-1/\alpha}$ where U ∼ Uniform(0, 1).
(c) Show that X ∼ Power(α, 0, 1) iff 1/X ∼ Pareto(α, 0, 1).
10. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the Pareto(α, a, x0 ) distribution. Find the distribution of
Mn = min(X1 , X2 , . . . , Xn ).
11. Suppose X ∼ Pareto(α, 0, x0 ).
(a) Find E[X] and var[X].
(b) Find the median and mode of X.
(c) Find E[X^n] for n = 1, 2, ….
(d) Find $M_X(t) = E[e^{tX}]$, the moment generating function of X, and $\phi_X(t) = E[e^{itX}]$, the characteristic function of X.
12. Show that the Pareto(1/2, 0, 1) distribution provides an example of a distribution with E[1/X] finite but E[X] infinite.
13. Transforming the Pareto to the exponential. Suppose X ∼ Pareto(α, 0, x0 ). Let Y = ln(X). Show that Y has a shifted
exponential distribution: Y − ln(x0 ) ∼ exponential (α).
14. Suppose X ∼ Pareto(α, 0, x0 ). Find the geometric mean of X and the Gini coefficient of X. The geometric mean of a
distribution is defined in exercise 3 on page 19 and the Gini coefficient is defined in exercise 9 on page 19.
15. Suppose X1, X2, …, Xn are i.i.d. random variables with the Pareto distribution Pareto(α, 0, 1). By using the density of $X_{k:n}$, find $E[X_{k:n}]$ and $E[X_{k:n}^2]$.
16. Suppose X1, X2, …, Xn are i.i.d. random variables with the Pareto(α, 0, 1) distribution.
(a) Let
$$W_1 = X_{1:n},\quad W_2 = \frac{X_{2:n}}{X_{1:n}},\quad \dots,\quad W_{n-1} = \frac{X_{(n-1):n}}{X_{(n-2):n}},\quad W_n = \frac{X_{n:n}}{X_{(n-1):n}}$$
Prove that W1, W2, …, Wn are independent and find the distribution of Wk for k = 1, 2, …, n.
(b) Hence find $E[X_{k:n}]$ and $E[X_{k:n}^2]$. See exercise 15 for an alternative derivation.
17. Suppose X1, X2, …, Xn are i.i.d. random variables with the Power(α, 0, 1) distribution. Suppose also Y1, Y2, …, Yn are i.i.d. random variables with the Pareto(α, 0, 1) distribution. Show that for k = 1, 2, …, n
$$X_{k:n} \quad\text{and}\quad \frac{1}{Y_{(n-k+1):n}}$$
have the same distribution.
18. Suppose X and Y are i.i.d. random variables with the Pareto(α, 0, x0) distribution. Find the distribution function and density of Y/X.
19. Suppose X and Y are i.i.d. random variables with the Pareto(α, 0, x0) distribution. Let M = min(X, Y). Prove that M and Y/X are independent.
20. Suppose X1, X2, …, Xn are i.i.d. random variables with the Pareto(α, 0, x0) distribution.
(a) Prove that the random variable $X_{1:n}$ is independent of the random vector $(X_{2:n}/X_{1:n}, \dots, X_{n:n}/X_{1:n})$.
(b) Show that $E\left[\frac{S_n}{X_{(1)}}\,\middle|\,X_{(1)} = x\right] = c$ where c is independent of x.
(c) Prove that $X_{1:n}$ is independent of $S_n/X_{1:n} = (X_1 + \cdots + X_n)/X_{1:n}$.
21. Suppose r > 0 and X1, X2, …, Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F with finite expectation and such that there exists x0 > 0 with P[X > x0] = 1.
(a) Show that for some i = 1, 2, …, n − 1
$$E\left[\frac{X_{(i+1)}^r}{X_{(i)}^r}\,\middle|\,X_{(i)} = x\right] = c \quad\text{with } c \text{ independent of } x \text{ for } x > x_0$$
iff there exists α > r/(n − i) such that $F(x) = 1 - x_0^\alpha/x^\alpha$ for x > x0.
(b) Show that for some i = 1, 2, …, n − 1
$$E\left[\frac{X_{(i)}^r}{X_{(i+1)}^r}\,\middle|\,X_{(i)} = x\right] = c \quad\text{with } c \text{ independent of } x \text{ for } x > x_0$$
iff there exists α > r/(n − i) such that $F(x) = 1 - x_0^\alpha/x^\alpha$ for x > x0.
[DALLAS(1976)]
22. A characterization of the Pareto distribution. It is known that if X and Y are i.i.d. random variables with an absolutely continuous distribution and min(X, Y) is independent of X − Y, then X and Y have an exponential distribution; see [CRAWFORD(1966)].
Now suppose X and Y are i.i.d. positive random variables with an absolutely continuous distribution and min(X, Y) is independent of Y/X. Prove that X and Y have a Pareto distribution.
Combining this result with exercise 19 gives the following characterization theorem: suppose X and Y are i.i.d. positive random variables with an absolutely continuous distribution; then min(X, Y) is independent of Y/X if and only if X and Y have the Pareto distribution.
23. Another characterization of the Pareto distribution. Suppose X1, X2, …, Xn are i.i.d. absolutely continuous non-negative random variables with density function f(x) and distribution function F(x). Suppose further that F(1) = 0 and f(x) > 0 for all x > 1, and let 1 ≤ i < j ≤ n. Show that $X_{j:n}/X_{i:n}$ is independent of $X_{i:n}$ if and only if there exists β > 0 such that each Xi has the Pareto(β, 0, 1) distribution.
Using the fact that X ∼ Pareto(α, 0, x0) iff X/x0 ∼ Pareto(α, 0, 1), it follows that if F(x0) = 0 and f(x) > 0 for all x > x0 where x0 > 0, then $X_{j:n}/X_{i:n}$ is independent of $X_{i:n}$ if and only if there exists β > 0 such that each Xi has the Pareto(β, 0, x0) distribution.
18 Laplace, Rayleigh and Weibull distributions
There are many results about these distributions in the exercises.
18.1 The Laplace or bilateral exponential distribution. Suppose µ ∈ ℝ and α > 0. Then the random variable X is said to have the Laplace(µ, α) distribution iff X has the density
$$f_X(x) = \frac{\alpha}{2}\,e^{-\alpha|x-\mu|} \quad\text{for } x \in \mathbb{R}.$$
Clearly if X ∼ Laplace(µ, α), then X − µ ∼ Laplace(0, α). As figure(18.1a) shows, the density consists of two exponential densities, each scaled by 1/2, spliced back to back at µ.
Figure(18.1a). The bilateral exponential density for µ = 0 and α = 6. [Plot: density on −2 ≤ x ≤ 2; vertical axis from 0 to 3.0.]
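The "back to back" description suggests a simple way to simulate Laplace(µ, α): draw |X − µ| from the exponential(α) distribution and attach a fair random sign. The sketch below uses `random.expovariate`, which takes the rate α. It compares the sample mean and variance with µ and 2/α², which are standard values for this distribution but are not derived here. The seed and parameters are arbitrary, and the function name is hypothetical.

```python
import math
import random


def laplace_sample(mu, alpha, rng):
    """Random sign times an exponential(alpha) distance from mu."""
    distance = rng.expovariate(alpha)  # |X - mu| ~ exponential with rate alpha
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return mu + sign * distance


rng = random.Random(9)
mu, alpha = 1.0, 2.0
xs = [laplace_sample(mu, alpha, rng) for _ in range(100_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
tail = sum(abs(x - mu) > 1.0 for x in xs) / len(xs)  # should be near exp(-alpha)
print(mean, var, tail, math.exp(-alpha))
```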
18.2 The Rayleigh distribution. Suppose σ > 0. Then the random variable R is said to have the Rayleigh(σ) distribution iff R has the density
$$f_R(r) = \frac{r}{\sigma^2}\,e^{-r^2/(2\sigma^2)} \quad\text{for } r > 0.$$
The Rayleigh distribution is used to model the lifetime of various items and the magnitude of vectors; see exercise 33 on page 37.
Figure(18.2a). The Rayleigh distribution density for σ = 0.5, σ = 1.5 and σ = 4. [Plot: densities on 0 ≤ r ≤ 10; vertical axis from 0 to 1.2.]
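The following sketch illustrates the "magnitude of vectors" interpretation, which exercise 33 makes precise. It simulates the length of a vector with independent N(0, σ²) coordinates and compares interval frequencies with integrals of $f_R$ computed by the midpoint rule. The seed, σ and the intervals are arbitrary, and the helper names are hypothetical. `math.hypot` computes $\sqrt{x^2+y^2}$.

```python
import math
import random


def rayleigh_pdf(r, sigma):
    return r / sigma ** 2 * math.exp(-r * r / (2 * sigma ** 2))


def integrate(f, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


sigma = 1.5
rng = random.Random(7)
N = 100_000
# Length of a vector whose two coordinates are independent N(0, sigma^2)
radii = [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(N)]
for a, b in [(0, 1), (1, 2), (2, 4)]:
    empirical = sum(a < r <= b for r in radii) / N
    exact = integrate(lambda r: rayleigh_pdf(r, sigma), a, b)
    print((a, b), round(empirical, 4), round(exact, 4))
```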
18.3 The Weibull distribution. Suppose β > 0 and γ > 0. Then the random variable X is said to have the Weibull(β, γ) distribution iff X has the density
$$f_X(x) = \frac{\beta x^{\beta-1}}{\gamma^\beta}\,e^{-(x/\gamma)^\beta} \quad\text{for } x > 0.$$
The distribution function is $F(x) = 1 - \exp(-x^\beta/\gamma^\beta)$ for x > 0. The Weibull distribution is frequently used to model failure times.
The density can take several shapes as figure(18.3a) illustrates.
Figure(18.3a). The Weibull density for β = 1/2, β = 1, β = 1.5 and β = 5; all with γ = 1. [Plot: densities on 0 ≤ x ≤ 2.5; vertical axis from 0 to 2.5.]
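Inverting the distribution function $F(x) = 1 - \exp(-(x/\gamma)^\beta)$ gives $x = \gamma(-\ln(1-u))^{1/\beta}$. This yields the inverse-transform sampler sketched below (compare exercise 42). The sketch compares empirical and exact distribution functions at a few points for the shapes β = 1/2, 1.5 and 5 from figure(18.3a). The seed, sample size and evaluation points are arbitrary, and the function names are hypothetical.

```python
import math
import random


def weibull_sample(beta, scale, rng):
    """Inverse-CDF draw: solve 1 - exp(-(x/scale)^beta) = u for x."""
    u = rng.random()
    return scale * (-math.log(1.0 - u)) ** (1.0 / beta)  # 1 - u lies in (0, 1]


def weibull_cdf(x, beta, scale):
    return 1.0 - math.exp(-(x / scale) ** beta)


rng = random.Random(17)
for beta in (0.5, 1.5, 5.0):
    xs = [weibull_sample(beta, 1.0, rng) for _ in range(50_000)]
    for x in (0.5, 1.0, 1.5):
        empirical = sum(v <= x for v in xs) / len(xs)
        print(beta, x, round(empirical, 4), round(weibull_cdf(x, beta, 1.0), 4))
```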
19 Exercises
The Laplace or bilateral exponential distribution.
24. (a) Suppose α > 0. Suppose further that X has the exponential density $\alpha e^{-\alpha x}$ for x > 0, Y has the density $\alpha e^{\alpha y}$ for y < 0, and X and Y are independent. Find the density of X + Y.
(b) Suppose α > 0 and the random variables X and Y have the exponential density $\alpha e^{-\alpha x}$ for x > 0. Suppose further that X and Y are independent. Find the density of X − Y.
25. Suppose X has the Laplace(µ, α) distribution.
(a) Find the expectation, median, mode and variance of X.
(b) Find the distribution function of X.
(c) Find the moment generating function of X.
26. Suppose X has the Laplace(0, α) distribution. Find the density of |X|.
27. (a) Suppose X ∼ Laplace(µ, α); suppose further that k > 0 and b ∈ ℝ. Show that kX + b ∼ Laplace(kµ + b, α/k).
(b) Suppose X ∼ Laplace(µ, α). Show that α(X − µ) ∼ Laplace(0, 1).
(c) Suppose X1, …, Xn are i.i.d. with the Laplace(µ, α) distribution. Show that $2\alpha\sum_{i=1}^n |X_i - \mu| \sim \chi^2_{2n}$.
28. Suppose X and Y are i.i.d. Laplace(µ, α). Show that
$$\frac{|X-\mu|}{|Y-\mu|} \sim F_{2,2}$$
29. Suppose X and Y are i.i.d. uniform U(0, 1). Show that ln(X/Y) ∼ Laplace(0, 1).
30. Suppose X and Y are independent random variables with X ∼ exponential(α) and Y ∼ Bernoulli(1/2). Show that X(2Y − 1) ∼ Laplace(0, α).
31. Suppose X1 , X2 , X3 and X4 are i.i.d. N (0, 1).
(a) Show that X1 X2 − X3 X4 ∼ Laplace(0, 1).
(b) Show that X1 X2 + X3 X4 ∼ Laplace(0, 1).
32. Exponential scale mixture of normals. Suppose X and Y are independent random variables with X ∼ exponential(1) and Y ∼ N(0, 1). Show that $Y\sqrt{2X} \sim$ Laplace(0, 1) and $\mu + Y\sqrt{2X}/\alpha \sim$ Laplace(µ, α).
Note. Provide two solutions: one using characteristic functions and one using densities.
The Rayleigh distribution.
33. (a) Suppose X and Y are i.i.d. with the N(0, σ²) distribution. Define R and Θ by $R = \sqrt{X^2 + Y^2}$, X = R cos Θ and Y = R sin Θ with Θ ∈ [0, 2π). Prove that R and Θ are independent and find the density of R and Θ.
(b) Suppose R has the Rayleigh(σ) distribution, Θ has the uniform U(−π, π) distribution, and R and Θ are independent. Show that X = R cos Θ and Y = R sin Θ are i.i.d. N(0, σ²).
34. Suppose R has the Rayleigh(σ) distribution:
$$f_R(r) = \frac{r}{\sigma^2}\,e^{-r^2/(2\sigma^2)} \quad\text{for } r > 0.$$
(a) Find the distribution function of R.
(b) Find an expression for E[R^n] for n = 1, 2, ….
(c) Find E[R] and var[R].
(d) Find the mode and median of R.
35. Suppose U has the uniform distribution U(0, 1) and $X = \sigma\sqrt{-2\ln U}$. Show that X has the Rayleigh(σ) distribution.
36. (a) Suppose R has the Rayleigh(σ) distribution. Find the distribution of R².
(b) Suppose R has the Rayleigh(1) distribution. Show that the distribution of R² is $\chi^2_2$.
(c) Suppose R1, …, Rn are i.i.d. with the Rayleigh(σ) distribution. Show that $Y = \sum_{i=1}^n R_i^2$ has the Gamma(n, 1/(2σ²)) distribution.
(d) Suppose X has the exponential(λ) = Gamma(1, λ) distribution. Find the distribution of $Y = \sqrt{X}$.
37. Suppose X ∼ Rayleigh (s) where s > 0, and Y |X ∼ N (µ, σ = X). Show that Y ∼ Laplace(µ, 1/s).
The Weibull distribution.
38. Suppose X has the Weibull(β, γ) distribution; hence $f_X(x) = \beta x^{\beta-1}e^{-(x/\gamma)^\beta}/\gamma^\beta$ for x > 0, where β is the shape and γ is the scale.
(a) Show that the Weibull (1, γ) distribution is the same as the exponential (1/γ) distribution.
(b) Show that the Weibull(2, γ) distribution is the same as the Rayleigh(γ/√2) distribution.
39. Suppose X has the exponential(1) distribution; hence $f_X(x) = e^{-x}$ for x > 0. Suppose further that β > 0 and γ > 0 and $W = \gamma X^{1/\beta}$. Find the density of W.
This is the Weibull(β, γ) distribution; β is called the shape and γ is called the scale.
40. Suppose X has the Weibull(β, γ) distribution; hence $f_X(x) = \beta x^{\beta-1}e^{-(x/\gamma)^\beta}/\gamma^\beta$ for x > 0.
(a) Suppose α > 0; find the distribution of Y = αX.
(b) Find an expression for E[X^n] for n = 1, 2, ….
(c) Find the mean, variance, median and mode of X.
(d) Find $E[e^{t\ln(X)}]$, the moment generating function of ln(X).
41. Suppose X has the Weibull (β, γ) distribution.
(a) Find hX (x) = f (x)/[1 − F (x)], the hazard function of X.
(b) Check that if β < 1 then hX decreases as x increases; if β = 1 then hX is constant; and if β > 1 then hX increases
as x increases.
42. Suppose U has the uniform U(0, 1) distribution. Show that $X = \gamma(-\ln U)^{1/\beta}$ has the Weibull(β, γ) distribution.
20 Size, shape and related characterization theorems
20.1 Size and shape: the definitions. The results in this section on size and shape are from [MOSSIMAN(1970)] and [JAMES(1979)].
Definition(20.1a). The function g : (0, ∞)^n → (0, ∞) is an n-dimensional size variable iff
$$g(ax) = a\,g(x) \quad\text{for all } a > 0 \text{ and all } x \in (0,\infty)^n.$$
Definition(20.1b). Suppose g : (0, ∞)^n → (0, ∞) is an n-dimensional size variable. Then the function z : (0, ∞)^n → (0, ∞)^n is the shape function associated with g iff
$$z(x) = \frac{x}{g(x)} \quad\text{for all } x \in (0,\infty)^n.$$
20.2 Size and shape: standard examples.
• The standard size function. This is g(x1, …, xn) = x1 + ··· + xn. The associated shape function is the function z : (0, ∞)^n → (0, ∞)^n with
$$z(x_1, \dots, x_n) = \left(\frac{x_1}{\sum_{j=1}^n x_j}, \dots, \frac{x_n}{\sum_{j=1}^n x_j}\right)$$
• Dimension 1 size. This is g(x1, …, xn) = x1. The associated shape function is
$$z(x_1, \dots, x_n) = \left(1, \frac{x_2}{x_1}, \dots, \frac{x_n}{x_1}\right)$$
• Dimension 2 size. This is g(x1, …, xn) = x2. The associated shape function is
$$z(x_1, \dots, x_n) = \left(\frac{x_1}{x_2}, 1, \frac{x_3}{x_2}, \dots, \frac{x_n}{x_2}\right)$$
• Volume. This is g(x1, …, xn) = (x1² + ··· + xn²)^{1/2}. The associated shape function is
$$z(x_1, \dots, x_n) = \left(\frac{x_1}{(x_1^2 + \cdots + x_n^2)^{1/2}}, \dots, \frac{x_n}{(x_1^2 + \cdots + x_n^2)^{1/2}}\right)$$
• The maximum. This is g(x1, …, xn) = max{x1, …, xn}. The associated shape function is
$$z(x_1, \dots, x_n) = \left(\frac{x_1}{\max\{x_1, \dots, x_n\}}, \dots, \frac{x_n}{\max\{x_1, \dots, x_n\}}\right)$$
• Root n size. This is g(x1, …, xn) = (x1x2⋯xn)^{1/n}. The associated shape function is
$$z(x_1, \dots, x_n) = \left(\frac{x_1}{(x_1 x_2 \cdots x_n)^{1/n}}, \dots, \frac{x_n}{(x_1 x_2 \cdots x_n)^{1/n}}\right)$$
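The size variables listed above and their shape functions are easy to code. The sketch below implements several of them in plain Python. It checks the two defining properties numerically: $g(ax) = a\,g(x)$, and hence $z(ax) = z(x)$, so shape is unaffected by rescaling. All function names are hypothetical.

```python
import math


def standard_size(x):
    return sum(x)


def first_coordinate_size(x):  # "dimension 1 size"
    return x[0]


def euclidean_size(x):  # (x_1^2 + ... + x_n^2)^{1/2}
    return math.sqrt(sum(v * v for v in x))


def max_size(x):
    return max(x)


def root_n_size(x):  # geometric mean, computed on the log scale
    return math.exp(sum(math.log(v) for v in x) / len(x))


def shape(g):
    """Shape function z(x) = x / g(x) associated with the size variable g."""
    def z(x):
        s = g(x)
        return [v / s for v in x]
    return z


SIZES = [standard_size, first_coordinate_size, euclidean_size, max_size, root_n_size]
x = [0.5, 2.0, 3.5, 1.25]
for g in SIZES:
    print(g.__name__, [round(v, 4) for v in shape(g)(x)])
```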
20.3 Size and shape: the fundamental result. We shall show that:
• if any one shape function z(X) is independent of the size variable g(X), then every shape function is independent of g(X);
• if two size variables g(X) and g*(X) are both independent of the same shape function z(X), then g(X)/g*(X) is almost surely constant.
First a specific example (footnote 10) of this second result:
Example(20.3a). Suppose X = (X1, X2, X3) has the logN(µ, Σ) distribution. By definition, this means that if Y1 = ln(X1), Y2 = ln(X2) and Y3 = ln(X3), then (Y1, Y2, Y3) ∼ N(µ, Σ).
Define the three size functions:
$$g_1(x) = x_1, \qquad g_2(x) = \sqrt{x_2 x_3}, \qquad g_3(x) = (x_1 x_2 x_3)^{1/3}$$
Footnote 10. Understanding this example is not necessary for the rest of the section. The example makes use of the definition of the multivariate normal and the fact that jointly normal random variables are independent if their covariance is zero. See 2-§3.6 on page 47.
and let z1, z2 and z3 denote the corresponding shape functions. Suppose g1(X) is independent of z1(X).
(a) Show that var[Y1] = cov[Y1, Y2] = cov[Y1, Y3].
(b) Show that g1(X) is independent of z2(X).
(c) Show that g1(X) is independent of g2(X)/g1(X).
Now suppose g3(X) is also independent of z1(X).
(d) Show that cov[Y1, S] = cov[Y2, S] = cov[Y3, S] where S = Y1 + Y2 + Y3.
(e) Show that var[Y2] + cov[Y2, Y3] = var[Y3] + cov[Y2, Y3] = 2var[Y1].
(f) Show that var[2Y1 − Y2 − Y3] = 0 and hence g1(X)/g3(X) is constant almost everywhere.
Solution. We are given that X1 is independent of (1, X2/X1, X3/X1). Taking logs shows that Y1 is independent of (Y2 − Y1, Y3 − Y1), and these are jointly normal. Hence cov[Y1, Y2 − Y1] = cov[Y1, Y3 − Y1] = 0 and hence (a).
(b) Taking logs of $z_2(X) = \big(X_1/\sqrt{X_2X_3},\ \sqrt{X_2/X_3},\ \sqrt{X_3/X_2}\big)$, it suffices that Y1 is independent of (Y1 − ½Y2 − ½Y3, ½Y2 − ½Y3, ½Y3 − ½Y2). This holds because, by (a), each component has zero covariance with Y1, and all are jointly normal.
(c) Now cov[Y1, ½(Y2 + Y3) − Y1] = ½cov[Y1, Y2] + ½cov[Y1, Y3] − var[Y1] = 0. By normality, ln(g1(X)) = Y1 is independent of ln(g2(X)) − ln(g1(X)). Because the exponential function is one-one, we have (c).
(d) The assumption that g3(X) is independent of z1(X) implies, by taking logs, that S is independent of (Y2 − Y1, Y3 − Y1), and these are jointly normal. Hence cov[S, Y2 − Y1] = cov[S, Y3 − Y1] = 0, which is (d).
(e) Expanding cov[Y1, S] and using part (a) shows that cov[Y1, S] = 3var[Y1]. Similarly, expanding cov[Y2, S] shows that var[Y2] + cov[Y2, Y3] + cov[Y1, Y2] = cov[Y2, S] = cov[Y1, S] = 3var[Y1]. Since cov[Y1, Y2] = var[Y1], this gives var[Y2] + cov[Y2, Y3] = 2var[Y1]. The same argument with Y3 gives the rest of (e).
(f) Now var[2Y1 − Y2 − Y3] = 4var[Y1] − 4cov[Y1, Y2] − 4cov[Y1, Y3] + var[Y2] + var[Y3] + 2cov[Y2, Y3] = 0 by (a) and (e). Hence var[Y1 − S/3] = 0, and hence var[ln(g1(X)/g3(X))] = 0. Hence (f).
Now for the general result:
Proposition(20.3b). Suppose g : (0, ∞)^n → (0, ∞) is an n-dimensional size variable and z* : (0, ∞)^n → (0, ∞)^n is any shape function. Suppose further that X is a random vector such that z*(X) is non-degenerate and independent of g(X). Then
(a) for any other shape function z1 : (0, ∞)^n → (0, ∞)^n, z1(X) is independent of g(X);
(b) if g2 : (0, ∞)^n → (0, ∞) is another size variable such that z*(X) is independent of both g2(X) and g(X), then
$$\frac{g_2(X)}{g(X)} \text{ is constant almost everywhere.}$$
Proof. Let g* and g1 denote the size variables which lead to the shape functions z* and z1. Hence
$$z^*(x) = \frac{x}{g^*(x)} \quad\text{and}\quad z_1(x) = \frac{x}{g_1(x)} \quad\text{for all } x \in (0,\infty)^n.$$
For all x ∈ (0, ∞)^n we have
$$g_1\big(z^*(x)\big) = g_1\left(\frac{x}{g^*(x)}\right) = \frac{g_1(x)}{g^*(x)} \quad\text{by using } g_1(ax) = a\,g_1(x).$$
Hence for all x ∈ (0, ∞)^n
$$z_1\big(z^*(x)\big) = \frac{z^*(x)}{g_1\big(z^*(x)\big)} = \frac{x}{g^*(x)} \times \frac{g^*(x)}{g_1(x)} = \frac{x}{g_1(x)} = z_1(x) \qquad (20.3a)$$
Equation(20.3a) shows that z1(X) is a function of z*(X); also, we are given that z*(X) is independent of g(X). Hence z1(X) is independent of g(X). This proves (a).
(b) Because of part (a), we can assume
$$z_2(X) = \frac{X}{g_2(X)} \text{ is independent of } g(X) \quad\text{and}\quad z(X) = \frac{X}{g(X)} \text{ is independent of } g_2(X)$$
Applying g to the first and g2 to the second gives
$$g\big(z_2(X)\big) = \frac{g(X)}{g_2(X)} \text{ is independent of } g(X) \quad\text{and}\quad g_2\big(z(X)\big) = \frac{g_2(X)}{g(X)} \text{ is independent of } g_2(X)$$
and hence
$$\frac{g_2(X)}{g(X)} \text{ is independent of both } g_2(X) \text{ and } g(X).$$
Hence result by part (b) of exercise 8 on page 7.
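Equation (20.3a), $z_1(z^*(x)) = z_1(x)$, is a purely deterministic identity, so it can be checked directly. The sketch below does this for a few of the size variables listed in 20.2. It is this identity that lets independence pass from one shape function to every other. The function names are hypothetical.

```python
import math


def shape(g):
    """Shape function z(x) = x / g(x) for a size variable g."""
    def z(x):
        s = g(x)
        return [v / s for v in x]
    return z


def total(x):
    return sum(x)


def largest(x):
    return max(x)


def geometric_mean(x):
    return math.exp(sum(math.log(v) for v in x) / len(x))


SIZES = (total, largest, geometric_mean)
x = [0.7, 1.9, 4.2]
for g_star in SIZES:
    for g1 in SIZES:
        lhs = shape(g1)(shape(g_star)(x))  # z1(z*(x))
        rhs = shape(g1)(x)                 # z1(x)
        print(g_star.__name__, g1.__name__, max(abs(p - q) for p, q in zip(lhs, rhs)))
```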
20.4 A characterization of the gamma distribution.
Proposition(20.4a). Suppose X1, X2, …, Xn are independent positive non-degenerate random variables. Suppose g* : (0, ∞)^n → (0, ∞) denotes the size variable
$$g^*(x) = \sum_{j=1}^n x_j$$
Then there exists a shape vector z(X) which is independent of g*(X) iff there exist α > 0, k1 > 0, …, kn > 0 such that Xj ∼ Gamma(kj, α) for j = 1, 2, …, n.
Proof.
⇐ Now $g^*(X) = \sum_{j=1}^n X_j \sim$ Gamma(k1 + ··· + kn, α). Proposition(4.7b) implies that Xj/(X1 + ··· + Xn) is independent of g*(X) = X1 + ··· + Xn for j = 1, 2, …, n. Hence the standard shape vector
$$z(X) = \left(\frac{X_1}{X_1 + \cdots + X_n}, \frac{X_2}{X_1 + \cdots + X_n}, \dots, \frac{X_n}{X_1 + \cdots + X_n}\right)$$
is independent of g*(X). Hence, by proposition(20.3b), all shape vectors are independent of g*(X).
⇒ By proposition(20.3b), if there exists one shape vector which is independent of g*(X), then all shape vectors are independent of g*(X). Hence Xj/(X1 + ··· + Xn) is independent of g*(X) = X1 + ··· + Xn for j = 1, 2, …, n. Hence, by proposition(4.7b), there exist α > 0 and kj > 0 such that Xj ∼ Gamma(kj, α) for j = 1, 2, …, n.
This result implies many others. For example, suppose $X_1, X_2, \ldots, X_n$ are independent random variables with $X_j \sim \text{Gamma}(k_j, \alpha)$. Then every shape vector is independent of $X_1 + X_2 + \cdots + X_n$; in particular,
$$X_1 + X_2 + \cdots + X_n \text{ is independent of } \frac{X_j}{\max\{X_1, X_2, \ldots, X_n\}}$$
and
$$X_1 + X_2 + \cdots + X_n \text{ is independent of } \frac{X_1 + X_2 + \cdots + X_n}{\max\{X_1, X_2, \ldots, X_n\}}$$
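As a rough numerical illustration of proposition(20.4a), the following Python sketch simulates independent gamma variables with a common α and estimates the correlation between the size $X_1 + \cdots + X_n$ and the two shape statistics above. It assumes α is a rate parameter, so Python's `random.gammavariate` (which takes shape and scale) is called with scale $1/\alpha$; the shape parameters, the rate and the sample size are arbitrary choices. A near-zero correlation is only a necessary consequence of independence, not a proof of it.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def gamma_size_and_shapes(shapes, rate, n_samples, seed=1):
    """Simulate X_j ~ Gamma(k_j, rate) independently (rate parametrization assumed).

    Returns the size S = X_1 + ... + X_n and the shape statistics
    X_1 / max(X) and S / max(X) for each draw.
    """
    rng = random.Random(seed)
    size, first_over_max, sum_over_max = [], [], []
    for _ in range(n_samples):
        # gammavariate(shape, scale): scale = 1 / rate
        xs = [rng.gammavariate(k, 1.0 / rate) for k in shapes]
        s, m = sum(xs), max(xs)
        size.append(s)
        first_over_max.append(xs[0] / m)
        sum_over_max.append(s / m)
    return size, first_over_max, sum_over_max


size, r1, r2 = gamma_size_and_shapes([0.5, 2.0, 3.0], rate=1.5, n_samples=60_000)
corr_first = pearson(size, r1)
corr_sum = pearson(size, r2)
print(round(corr_first, 3), round(corr_sum, 3))
```

Both printed correlations should be within a few hundredths of zero; replacing the gamma variables by some other positive distribution is a quick way to see the correlation move away from zero in practice.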
20.5 A characterization of the Pareto distribution.
Proposition(20.5a). Suppose $X_1, X_2, \ldots, X_n$ are independent positive non-degenerate random variables. Suppose $g^* : (0,\infty)^n \to (0,\infty)$ denotes the size variable
$$g^*(x) = \min\{x_1, \ldots, x_n\}$$
Then there exists a shape vector $z(X)$ which is independent of $g^*(X)$ iff there exist $x_0 > 0$ and $\alpha_j > 0$ for $j = 1, 2, \ldots, n$ such that $X_j \sim \text{Pareto}(\alpha_j, 0, x_0)$ for $j = 1, 2, \ldots, n$.
Proof.
⇐ Let $Y_j = \ln(X_j)$ for $j = 1, 2, \ldots, n$. Then $Y_1 - \ln(x_0) \sim \text{exponential}(\alpha_1)$ and $Y_2 - \ln(x_0) \sim \text{exponential}(\alpha_2)$, and $Y_1$ and $Y_2$ are independent. By exercise 5 on page 41, we know that if $Y_1 - a \sim \text{exponential}(\lambda_1)$ and $Y_2 - a \sim \text{exponential}(\lambda_2)$ and $Y_1$ and $Y_2$ are independent, then $\min\{Y_1, Y_2\}$ is independent of $Y_2 - Y_1$.
This establishes that $U = \min\{Y_1, Y_2\}$ is independent of $V = Y_2 - Y_1$. But $(Y_3, \ldots, Y_n)$ is independent of $(U, V)$. Hence $\min\{Y_1, \ldots, Y_n\}$ is independent of $Y_2 - Y_1$. Similarly $\min\{Y_1, \ldots, Y_n\}$ is independent of $Y_j - Y_1$ for $j = 2, \ldots, n$. Hence $\min\{Y_1, \ldots, Y_n\}$ is independent of the vector $(Y_2 - Y_1, Y_3 - Y_1, \ldots, Y_n - Y_1)$. And hence $g^*(X) = \min\{X_1, \ldots, X_n\}$ is independent of the shape vector $(1, X_2/X_1, \ldots, X_n/X_1)$ as required.
⇒ Suppose n = 2. Using the shape vector $(1, x_2/x_1)$ implies that we are given that $\min\{X_1, X_2\}$ is independent of $X_2/X_1$. Taking logs shows that $\min\{Y_1, Y_2\}$ is independent of $Y_2 - Y_1$ where $Y_1 = \ln(X_1)$ and $Y_2 = \ln(X_2)$.
It is known (see [CRAWFORD(1966)]) that if $Y_1$ and $Y_2$ are independent random variables with an absolutely continuous distribution and $\min\{Y_1, Y_2\}$ is independent of $Y_2 - Y_1$, then there exist $a \in \mathbb{R}$, $\alpha_1 > 0$ and $\alpha_2 > 0$ such that $Y_1 - a \sim \text{exponential}(\alpha_1)$ and $Y_2 - a \sim \text{exponential}(\alpha_2)$. Hence $f_{Y_1}(y_1) = \alpha_1 e^{-\alpha_1(y_1 - a)}$ for $y_1 > a$ and $f_{Y_2}(y_2) = \alpha_2 e^{-\alpha_2(y_2 - a)}$ for $y_2 > a$.
Hence $X_1 = e^{Y_1} \sim \text{Pareto}(\alpha_1, 0, x_0 = e^a)$ and $X_2 = e^{Y_2} \sim \text{Pareto}(\alpha_2, 0, x_0 = e^a)$ where $x_0 > 0$.
For n > 2 we are given that
$$\frac{X_j}{\min\{X_1, \ldots, X_n\}} \text{ is independent of } \min\{X_1, \ldots, X_n\} \qquad\text{for } j = 1, 2, \ldots, n.$$
But
$$\min\{X_1, \ldots, X_n\} = \min\{X_j, Z_j\} \qquad\text{where } Z_j = \min\{X_i : i \neq j\}.$$
Hence for some $x_{0j} > 0$, $\lambda_j > 0$ and $\lambda_j^* > 0$, $X_j \sim \text{Pareto}(\lambda_j, 0, x_{0j})$ and $Z_j \sim \text{Pareto}(\lambda_j^*, 0, x_{0j})$. Because $Z_j = \min\{X_i : i \neq j\}$ we must have $x_{0j} \le x_{0i}$ for $i \neq j$. It follows that all $x_{0j}$ are equal. Hence result.
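The ⇐ direction of proposition(20.5a) can be checked in the same rough way. The Python sketch below assumes that Pareto(α, 0, x0) has survival function $(x_0/x)^\alpha$ for $x > x_0$, so that $X_j = x_0 P_j$ where $P_j$ is drawn by `random.paretovariate(alpha_j)`; the values of $\alpha_j$, $x_0$ and the sample size are arbitrary. It works on the log scale, as the proof does, because $\ln(\min_j X_j)$ and $\ln(X_2/X_1)$ have finite variance even when the $X_j$ themselves do not.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pareto_log_min_and_log_ratio(alphas, x0, n_samples, seed=2):
    """Draw X_j = x0 * paretovariate(alpha_j), so P[X_j > x] = (x0 / x)**alpha_j for x > x0.

    Returns ln(min X) and ln(X_2 / X_1) for each draw.
    """
    rng = random.Random(seed)
    log_min, log_ratio = [], []
    for _ in range(n_samples):
        xs = [x0 * rng.paretovariate(a) for a in alphas]
        log_min.append(math.log(min(xs)))
        log_ratio.append(math.log(xs[1] / xs[0]))
    return log_min, log_ratio


lm, lr = pareto_log_min_and_log_ratio([1.5, 3.0, 0.8], x0=2.0, n_samples=60_000)
corr_signed = pearson(lm, lr)                      # ln(min) against ln(X2/X1)
corr_abs = pearson(lm, [abs(v) for v in lr])       # ln(min) against |ln(X2/X1)|
print(round(corr_signed, 3), round(corr_abs, 3))
```

Two different functions of the ratio are used because independence implies zero correlation with every such function, whereas a single near-zero correlation could be a coincidence.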
20.6 A characterization of the power distribution. Because the inverse of a Pareto random variable has the power distribution, the previous proposition can be transformed into a result about the power distribution.
Proposition(20.6a). Suppose $X_1, X_2, \ldots, X_n$ are independent positive non-degenerate random variables. Suppose $g^* : (0,\infty)^n \to (0,\infty)$ denotes the size variable
$$g^*(x) = \max\{x_1, \ldots, x_n\}$$
Then there exists a shape vector $z(X)$ which is independent of $g^*(X)$ iff there exist $x_0 > 0$ and $\alpha_j > 0$ for $j = 1, 2, \ldots, n$ such that $X_j \sim \text{Power}(\alpha_j, 0, x_0)$ for $j = 1, 2, \ldots, n$.
Proof.
⇐ Let $Y_j = \ln(1/X_j)$ for $j = 1, 2, \ldots, n$. By exercise 4 on page 33, $Y_1 - \ln(1/x_0) \sim \text{exponential}(\alpha_1)$ and $Y_2 - \ln(1/x_0) \sim \text{exponential}(\alpha_2)$, and $Y_1$ and $Y_2$ are independent. By exercise 5 on page 41, we know that if $Y_1 - a \sim \text{exponential}(\lambda_1)$ and $Y_2 - a \sim \text{exponential}(\lambda_2)$ and $Y_1$ and $Y_2$ are independent, then $\min\{Y_1, Y_2\}$ is independent of $Y_1 - Y_2$.
This establishes that $U = \min\{Y_1, Y_2\}$ is independent of $V = Y_1 - Y_2$. But $(Y_3, \ldots, Y_n)$ is independent of $(U, V)$. Hence $\min\{Y_1, \ldots, Y_n\}$ is independent of $Y_1 - Y_2$. Similarly $\min\{Y_1, \ldots, Y_n\}$ is independent of $Y_1 - Y_j$ for $j = 2, \ldots, n$. Hence $\min\{Y_1, \ldots, Y_n\}$ is independent of the vector $(Y_1 - Y_2, Y_1 - Y_3, \ldots, Y_1 - Y_n)$. And hence, because $\max\{X_1, \ldots, X_n\} = e^{-\min\{Y_1, \ldots, Y_n\}}$ and $X_j/X_1 = e^{Y_1 - Y_j}$, $g^*(X) = \max\{X_1, \ldots, X_n\}$ is independent of the shape vector $(1, X_2/X_1, \ldots, X_n/X_1)$ as required.
⇒ Suppose n = 2. Using the shape vector $(1, x_2/x_1)$ implies that we are given that $\max\{X_1, X_2\}$ is independent of $X_2/X_1$. Set $Y_1 = \ln(1/X_1)$ and $Y_2 = \ln(1/X_2)$. Hence $\min\{Y_1, Y_2\}$ is independent of $Y_2 - Y_1$.
It is known (see [CRAWFORD(1966)]) that if $Y_1$ and $Y_2$ are independent random variables with an absolutely continuous distribution and $\min\{Y_1, Y_2\}$ is independent of $Y_2 - Y_1$, then there exist $a \in \mathbb{R}$, $\alpha_1 > 0$ and $\alpha_2 > 0$ such that $Y_1 - a \sim \text{exponential}(\alpha_1)$ and $Y_2 - a \sim \text{exponential}(\alpha_2)$. Hence $f_{Y_1}(y_1) = \alpha_1 e^{-\alpha_1(y_1 - a)}$ for $y_1 > a$ and $f_{Y_2}(y_2) = \alpha_2 e^{-\alpha_2(y_2 - a)}$ for $y_2 > a$.
Hence $X_1 = e^{-Y_1} \sim \text{Power}(\alpha_1, 0, h = e^{-a})$ and $X_2 = e^{-Y_2} \sim \text{Power}(\alpha_2, 0, h = e^{-a})$ where $h > 0$.
For n > 2 we are given that
$$\frac{X_j}{\max\{X_1, \ldots, X_n\}} \text{ is independent of } \max\{X_1, \ldots, X_n\} \qquad\text{for } j = 1, 2, \ldots, n.$$
But
$$\max\{X_1, \ldots, X_n\} = \max\{X_j, Z_j\} \qquad\text{where } Z_j = \max\{X_i : i \neq j\}.$$
Hence for some $h_j > 0$, $\lambda_j > 0$ and $\lambda_j^* > 0$, $X_j \sim \text{Power}(\lambda_j, 0, h_j)$ and $Z_j \sim \text{Power}(\lambda_j^*, 0, h_j)$. Because $Z_j = \max\{X_i : i \neq j\}$ we must have $h_j \ge h_i$ for $i \neq j$. It follows that all $h_j$ are equal. Hence result.
21 Exercises
1. Suppose $X = (X_1, X_2)$ is a 2-dimensional random vector with $X_1 = aX_2$ where $a \in \mathbb{R}$. Show that if $z : (0,\infty)^2 \to (0,\infty)^2$ is any shape function, then $z(X)$ is constant almost everywhere.
2. Suppose $X = (X_1, X_2)$ is a 2-dimensional random vector with the distribution given in the following table:

          X2:  1     2     3     6
X1 = 1:        0     0     0    1/4
X1 = 2:        0     0    1/4    0
X1 = 3:        0    1/4    0     0
X1 = 6:       1/4    0     0     0

Define the size variables $g_1(x) = \sqrt{x_1 x_2}$ and $g_2(x) = x_1 + x_2$.
(a) Suppose z is any shape function $(0,\infty)^2 \to (0,\infty)^2$. Show that $z(X)$ cannot be almost surely constant. Also, show that $z(X)$ is independent of both $g_1(X)$ and $g_2(X)$.
(b) Find the distribution of $g_1(X)/g_2(X)$.
3. A characterization of the generalized gamma distribution. Prove the following result.
Suppose $X_1, X_2, \ldots, X_n$ are independent positive non-degenerate random variables. Suppose $g^* : (0,\infty)^n \to (0,\infty)$ denotes the size variable
$$g^*(x) = \left(\sum_{j=1}^n x_j^b\right)^{1/b} \qquad\text{where } b > 0.$$
Then there exists a shape vector $z(X)$ which is independent of $g^*(X)$ iff there exist $\alpha > 0$, $k_1 > 0, \ldots, k_n > 0$ such that $X_j \sim \text{GGamma}(k_j, \alpha, b)$ for $j = 1, 2, \ldots, n$.
Hint: use the result that $X \sim \text{GGamma}(n, \lambda, b)$ iff $X^b \sim \Gamma(n, \lambda)$, and proposition(20.4a).
4. Suppose $X_1 \sim \text{exponential}(\lambda_1)$, $X_2 \sim \text{exponential}(\lambda_2)$ and $X_1$ and $X_2$ are independent.
(a) Find $P[X_1 < X_2]$.
(b) By using the lack of memory property of the exponential distribution, find the distribution of $X_1 - X_2$.
(c) By using the usual convolution formula for densities, find the density of $X_1 - X_2$.
5. Suppose Y1 − a ∼ exponential (λ1 ) and Y2 − a ∼ exponential (λ2 ) and Y1 and Y2 are independent. Show that U =
min{Y1 , Y2 } is independent of V = Y2 − Y1 .
6. A generalization of proposition(20.5a) on page 40. Suppose $X_1, X_2, \ldots, X_n$ are independent positive non-degenerate random variables and $\theta_1, \theta_2, \ldots, \theta_n$ are positive constants. Suppose $g^* : (0,\infty)^n \to (0,\infty)$ denotes the size variable
$$g^*(x) = \min\left\{\frac{x_1}{\theta_1}, \ldots, \frac{x_n}{\theta_n}\right\}$$
Prove there exists a shape vector $z(X)$ which is independent of $g^*(X)$ iff there exist $x_0 > 0$ and $\alpha_j > 0$ for $j = 1, 2, \ldots, n$ such that $X_j \sim \text{Pareto}(\alpha_j, 0, \theta_j x_0)$ for $j = 1, 2, \ldots, n$.
[JAMES(1979)]
7. A generalization of proposition(20.6a) on page 40. Suppose $X_1, X_2, \ldots, X_n$ are independent positive non-degenerate random variables and $\theta_1, \theta_2, \ldots, \theta_n$ are positive constants. Suppose $g^* : (0,\infty)^n \to (0,\infty)$ denotes the size variable
$$g^*(x) = \max\left\{\frac{x_1}{\theta_1}, \ldots, \frac{x_n}{\theta_n}\right\}$$
Prove there exists a shape vector $z(X)$ which is independent of $g^*(X)$ iff there exist $x_0 > 0$ and $\alpha_j > 0$ for $j = 1, 2, \ldots, n$ such that $X_j \sim \text{Power}(\alpha_j, 0, \theta_j x_0)$ for $j = 1, 2, \ldots, n$.
[JAMES(1979)]
CHAPTER 2
Multivariate Continuous Distributions
1 Multivariate distributions—general results
1.1 The mean and variance matrices. If
$$\underset{n\times 1}{X} = \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}$$
is a random vector, then, provided the univariate expectations $E[X_1], \ldots, E[X_n]$ exist, we define
$$\underset{n\times 1}{E[X]} = \begin{pmatrix} E[X_1] \\ \vdots \\ E[X_n] \end{pmatrix}$$
Provided the second moments $E[X_1^2], \ldots, E[X_n^2]$ are finite, the variance matrix or covariance matrix of X is the $n \times n$ matrix
$$\underset{n\times n}{\operatorname{var}[X]} = E[(X - \mu)(X - \mu)^T] \qquad\text{where } \mu = E[X].$$
The $(i, j)$ entry in this matrix is $\operatorname{cov}[X_i, X_j]$. In particular, the ith diagonal entry is $\operatorname{var}[X_i]$.
Clearly:
• the variance matrix is symmetric;
• if $X_1, \ldots, X_n$ are i.i.d. with variance $\sigma^2$, then $\operatorname{var}[X] = \sigma^2 I$.
We shall often omit stating “when the second moments are finite” when it is obviously needed.
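1.2 Linear transformations. If $Y = X + a$ then $\operatorname{var}[Y] = \operatorname{var}[X]$.
More generally, if $Y = A + BX$ where A is $m \times 1$ and B is $m \times n$, then $\mu_Y = A + B\mu_X$ and
$$\operatorname{var}[Y] = E\left[(Y - \mu_Y)(Y - \mu_Y)^T\right] = E\left[B(X - \mu_X)(X - \mu_X)^T B^T\right] = B \operatorname{var}[X] B^T$$
In particular, if $a = (a_1, \ldots, a_n)$ is a $1 \times n$ vector, then $aX = \sum_{i=1}^n a_i X_i$ is a random variable and
$$\operatorname{var}[aX] = a \operatorname{var}[X] a^T = \sum_{i=1}^n \sum_{j=1}^n a_i a_j \operatorname{cov}[X_i, X_j]$$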
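The formulas $\operatorname{var}[BX] = B\operatorname{var}[X]B^T$ and $\operatorname{var}[aX] = a\operatorname{var}[X]a^T$ can be checked by simulation. The Python sketch below is only an illustration with assumed inputs: it builds $X = MW$ from a vector W of i.i.d. N(0, 1) variables, so that $\operatorname{var}[X] = MM^T$ by the result above, and compares $a\operatorname{var}[X]a^T$ with the sample variance of aX. The matrices M, B and the vector a are arbitrary; matrices are plain lists of rows so that the sketch needs no third-party packages.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Hypothetical example: X = M W with W = (W1, W2, W3) i.i.d. N(0, 1), so var[X] = M M^T.
M = [[1.0, 0.0, 0.0],
     [0.5, 2.0, 0.0],
     [-1.0, 0.3, 1.5]]
var_x = matmul(M, transpose(M))

B = [[1.0, -1.0, 0.0],
     [2.0, 0.0, 1.0]]                                   # a 2 x 3 matrix
var_bx = matmul(matmul(B, var_x), transpose(B))        # B var[X] B^T, a 2 x 2 matrix

a = [[1.0, 2.0, -1.0]]                                 # a 1 x 3 row vector
var_ax = matmul(matmul(a, var_x), transpose(a))[0][0]  # a var[X] a^T

# Monte Carlo estimate of var[aX]
rng = random.Random(0)
n = 100_000
values = []
for _ in range(n):
    w = [rng.gauss(0.0, 1.0) for _ in range(3)]
    x = [sum(M[i][k] * w[k] for k in range(3)) for i in range(3)]
    values.append(sum(a[0][i] * x[i] for i in range(3)))
mean = sum(values) / n
sample_var = sum((v - mean) ** 2 for v in values) / (n - 1)
print(var_ax, round(sample_var, 2))
```

The exact value here is $a\operatorname{var}[X]a^T = 24.94$, and the sample variance should agree with it to within about one percent.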
1.3 Positive definiteness of the variance matrix. Suppose X is an $n \times 1$ random vector. Then for any $n \times 1$ vector c we have
$$c^T \operatorname{var}[X] c = \operatorname{var}[c^T X] \ge 0$$
Hence var[X] is positive semi-definite (also called non-negative definite).
Proposition(1.3a). Suppose X is a random vector with finite second moments and such that no element of X is
a linear combination of the other elements. Then var[X] is a symmetric positive definite matrix.
Proof. We are given that if a is an $n \times 1$ vector with $a^T X = b$ where $b \in \mathbb{R}$, then $a = 0$. Suppose $\operatorname{var}[c^T X] = 0$; hence $c^T X$ is constant; hence $c = 0$. We have shown that $c^T \operatorname{var}[X] c = 0$ implies $c = 0$, and hence var[X] is positive definite.
Example(1.3b). Consider the random vector $Z = (X, Y, X + Y)^T$ where $\mu_X = E[X]$, $\mu_Y = E[Y]$ and $\rho = \operatorname{corr}[X, Y]$. Show that var[Z] is not positive definite.
Solution. Let $a = (1, 1, -1)$. Then $a \operatorname{var}[Z] a^T = \operatorname{var}[aZ] = \operatorname{var}[0] = 0$ with $a \neq 0$.
1.4 The square root of a variance matrix. Suppose C is real, symmetric and positive definite. Because C is real and symmetric, we can write $C = LDL^T$ where $D = \operatorname{diag}(d_1, \ldots, d_p)$ is diagonal and L is orthogonal¹. Because C is also positive definite, we have $d_1 > 0, \ldots, d_p > 0$. Hence we can write $C = (LD^{1/2}L^T)(LD^{1/2}L^T) = QQ$ where $Q = LD^{1/2}L^T$ is symmetric and nonsingular.
Now suppose X is a random vector with finite second moments and such that no element of X is a linear combination of the other elements; then var[X] is positive definite by proposition(1.3a), and so $\operatorname{var}[X] = QQ$ for such a Q. Hence if $Y = Q^{-1}X$ then $\operatorname{var}[Y] = Q^{-1} \operatorname{var}[X] (Q^{-1})^T = I$.
¹ Orthogonal means that $L^T L = I$ and hence $L^{-1} = L^T$.
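For a 2 × 2 variance matrix the construction $Q = LD^{1/2}L^T$ of §1.4 can be carried out explicitly, because the orthogonal matrix L is a rotation whose angle θ satisfies $\tan 2\theta = 2c_{12}/(c_{11} - c_{22})$. The Python sketch below (the function names and the matrix C are illustrative assumptions) computes Q this way and checks that $QQ = C$ and that $Q^{-1}C(Q^{-1})^T = I$, which is the statement $\operatorname{var}[Y] = I$ for $Y = Q^{-1}X$.

```python
import math


def sym_sqrt_2x2(c11, c12, c22):
    """Symmetric square root Q = L D^{1/2} L^T of C = [[c11, c12], [c12, c22]].

    L is the rotation whose columns are eigenvectors of C; D holds the eigenvalues.
    """
    theta = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    cs, sn = math.cos(theta), math.sin(theta)
    L = [[cs, -sn], [sn, cs]]                                   # orthogonal: L^T L = I
    d = [c11 * cs * cs + 2.0 * c12 * sn * cs + c22 * sn * sn,   # eigenvalue for (cs, sn)
         c11 * sn * sn - 2.0 * c12 * sn * cs + c22 * cs * cs]   # eigenvalue for (-sn, cs)
    if min(d) <= 0.0:
        raise ValueError("C is not positive definite")
    r = [math.sqrt(v) for v in d]
    return [[sum(L[i][k] * r[k] * L[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matmul2(A, B):
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]


def inv2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


C = [[4.0, 1.2], [1.2, 1.0]]                     # an assumed variance matrix
Q = sym_sqrt_2x2(C[0][0], C[0][1], C[1][1])
QQ = matmul2(Q, Q)                                # reproduces C
Qi = inv2(Q)
Qi_T = [[Qi[0][0], Qi[1][0]], [Qi[0][1], Qi[1][1]]]
var_y = matmul2(matmul2(Qi, C), Qi_T)             # the identity matrix
```

For larger matrices one would use a general eigen-decomposition routine instead of the explicit rotation angle.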
Bayesian Time Series Analysis by R.J. Reed
1.5 The covariance between two random vectors.
Definition(1.5a). If X is an m × 1 random vector with finite second moments and Y is an n × 1 random vector
with finite second moments, then ΣX,Y = cov[X, Y] is the m × n matrix with (i, j) entry equal to cov[Xi , Yj ].
It follows that:
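• by looking at the $(i, j)$ entries, we see that $\operatorname{cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)^T] = E[XY^T] - \mu_X \mu_Y^T$;
• if $n = m$, then cov[X, Y] is a square matrix, but it need not be symmetric;
• $\operatorname{cov}[X, Y] = \operatorname{cov}[Y, X]^T$ and $\operatorname{cov}[X, X] = \operatorname{var}[X]$.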
1.6 The correlation matrix. Suppose the random vector X has the variance matrix Σ. Let $D = \sqrt{\operatorname{diag}(\Sigma)}$, the diagonal matrix with entries $\sqrt{\Sigma_{11}}, \ldots, \sqrt{\Sigma_{nn}}$. Then the correlation matrix of X is given by
$$\operatorname{corr}[X] = D^{-1} \Sigma D^{-1}$$
Clearly, the $(i, j)$ element of corr[X] is $\operatorname{corr}[X_i, X_j]$. Also, corr[X] is the variance matrix of the random vector $Z = (Z_1, \ldots, Z_n)$ where $Z_j = (X_j - E[X_j])/\sqrt{\operatorname{var}[X_j]}$.
Conversely, given corr[X] we need the vector of standard deviations in order to determine the variance matrix. In fact, $\operatorname{var}[X] = D \operatorname{corr}[X] D$ where D is the diagonal matrix with entries $\operatorname{stdev}(X_1), \ldots, \operatorname{stdev}(X_n)$.
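The passage between var[X] and corr[X] in §1.6 amounts to scaling entry $(i, j)$ by the standard deviations of $X_i$ and $X_j$. The following Python sketch (helper names are illustrative) computes $D^{-1}\Sigma D^{-1}$ and then recovers Σ as $D\operatorname{corr}[X]D$ for an assumed 3 × 3 variance matrix.

```python
import math


def corr_from_var(sigma):
    """Return (corr, sd) where corr = D^{-1} Sigma D^{-1} and sd is the diagonal of D."""
    n = len(sigma)
    sd = [math.sqrt(sigma[i][i]) for i in range(n)]
    corr = [[sigma[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]
    return corr, sd


def var_from_corr(corr, sd):
    """Return D corr D, whose (i, j) entry is sd_i * corr_ij * sd_j."""
    n = len(sd)
    return [[sd[i] * corr[i][j] * sd[j] for j in range(n)] for i in range(n)]


sigma = [[4.0, 2.0, -1.0],
         [2.0, 9.0, 0.0],
         [-1.0, 0.0, 1.0]]
corr, sd = corr_from_var(sigma)
print(sd)  # [2.0, 3.0, 1.0]
back = var_from_corr(corr, sd)
```

Here $\operatorname{corr}[X_1, X_2] = 2/(2 \times 3) = 1/3$ and $\operatorname{corr}[X_1, X_3] = -1/2$, and `back` reproduces the original Σ.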
2 Exercises
1. Suppose X is an $n \times 1$ random vector with finite second moments. Show that for any $n \times 1$ vector $\alpha \in \mathbb{R}^n$ we have
$$E[(X - \alpha)(X - \alpha)^T] = \operatorname{var}[X] + (\mu_X - \alpha)(\mu_X - \alpha)^T$$
2. Suppose X is an $m \times 1$ random vector and Y is an $n \times 1$ random vector. Suppose further that all second moments are finite. Finally, suppose a is an $m \times 1$ vector and b is an $n \times 1$ vector. Show that
$$\operatorname{cov}[a^TX, b^TY] = a^T \operatorname{cov}[X, Y]\, b = \sum_{i=1}^m \sum_{j=1}^n a_i b_j \operatorname{cov}[X_i, Y_j]$$
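3. Properties of the covariance matrix. Suppose X is an $m \times 1$ random vector and Y is an $n \times 1$ random vector. Suppose further that all second moments are finite.
(a) Show that for any $m \times 1$ vector b and any $n \times 1$ vector d we have
$$\operatorname{cov}[X + b, Y + d] = \operatorname{cov}[X, Y]$$
(b) Show that for any $\ell \times m$ matrix A and any $p \times n$ matrix B we have
$$\operatorname{cov}[AX, BY] = A \operatorname{cov}[X, Y] B^T$$
(c) Suppose a, b, c and $d \in \mathbb{R}$; suppose further that V is an $m \times 1$ random vector and W is an $n \times 1$ random vector with finite second moments. Show that
$$\operatorname{cov}[aX + bV, cY + dW] = ac \operatorname{cov}[X, Y] + ad \operatorname{cov}[X, W] + bc \operatorname{cov}[V, Y] + bd \operatorname{cov}[V, W]$$
Both sides are $m \times n$ matrices.
(d) Suppose a and $b \in \mathbb{R}$. Show that
$$\operatorname{var}[aX + bV] = a^2 \operatorname{var}[X] + ab\left(\operatorname{cov}[X, V] + \operatorname{cov}[V, X]\right) + b^2 \operatorname{var}[V]$$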
4. Suppose Y1 , Y2 , . . . , Yn are independent random variables each with variance 1. Let X1 = Y1 , X2 = Y1 + Y2 , . . . , Xn =
Y1 + · · · + Yn . Find the n × n matrix var[X].
5. Suppose Y is an $n \times 1$ random vector with finite expectation $E[Y] = \mu$ and finite variance $\operatorname{var}[Y] = \Sigma$. Show that for any $n \times n$ matrix A we have $E[Y^TAY] = \operatorname{trace}(A\Sigma) + \mu^TA\mu$.
3 The bivariate normal
3.1 The density. Here is the first of several equivalent formulations of the density.
Definition(3.1a). The random vector $(X_1, X_2)$ has a bivariate normal distribution iff it has density
$$f_{X_1X_2}(x_1, x_2) = \frac{|P|^{1/2}}{2\pi} \exp\left(-\frac{(x - \mu)^T P (x - \mu)}{2}\right) \qquad (3.1a)$$
where $x = (x_1, x_2)^T$ and $\mu = (\mu_1, \mu_2)^T$ are $2 \times 1$ vectors in $\mathbb{R}^2$ and P is a $2 \times 2$ real symmetric positive definite matrix.
Suppose the entries in the $2 \times 2$ real symmetric matrix P are denoted as follows²:
$$P = \begin{pmatrix} a_1 & a_2 \\ a_2 & a_3 \end{pmatrix}$$
It follows that equation(3.1a) is equivalent to
$$f(x_1, x_2) = \frac{\sqrt{a_1 a_3 - a_2^2}}{2\pi} \exp\left(-\frac{a_1(x_1 - \mu_1)^2 + 2a_2(x_1 - \mu_1)(x_2 - \mu_2) + a_3(x_2 - \mu_2)^2}{2}\right) \qquad (3.1b)$$
To show that equation(3.1b) defines a density: clearly f ≥ 0, so it remains to check that f integrates to 1. Let $y_1 = x_1 - \mu_1$ and $y_2 = x_2 - \mu_2$. Then
$$\iint f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2 = \frac{|P|^{1/2}}{2\pi}\iint \exp\left(-\frac{a_1y_1^2 + 2a_2y_1y_2 + a_3y_2^2}{2}\right)dy_1\,dy_2 \qquad (3.1c)$$
$$= \frac{|P|^{1/2}}{2\pi}\iint \exp\left(-\frac{a_1}{2}\left(y_1 + \frac{a_2}{a_1}y_2\right)^2\right)\exp\left(-\frac{y_2^2}{2}\left(a_3 - \frac{a_2^2}{a_1}\right)\right)dy_1\,dy_2 \qquad (3.1d)$$
Now use the transformation $z_1 = \sqrt{a_1}\,y_1 + \frac{a_2}{\sqrt{a_1}}\,y_2$ and $z_2 = y_2\,\frac{\sqrt{a_1a_3 - a_2^2}}{\sqrt{a_1}}$. This transformation has Jacobian $\sqrt{a_1a_3 - a_2^2} = |P|^{1/2}$ and is a one-to-one map $\mathbb{R}^2 \to \mathbb{R}^2$; it gives
$$\iint f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2 = \frac{1}{2\pi}\iint \exp\left(-\frac{z_1^2 + z_2^2}{2}\right)dz_1\,dz_2 = 1$$
by using the fact that the integral of the standard normal density equals one.
3.2 The marginal distributions of X1 and X2. For the marginal density of X2 we need to find the following integral:
$$f_{X_2}(x_2) = \int f_{X_1X_2}(x_1, x_2)\,dx_1$$
First let $Y_1 = X_1 - \mu_1$ and $Y_2 = X_2 - \mu_2$ and find the density of $Y_2$:
$$f_{Y_2}(y_2) = \int f_{Y_1Y_2}(y_1, y_2)\,dy_1 = \frac{|P|^{1/2}}{2\pi}\int \exp\left(-\frac{a_1y_1^2 + 2a_2y_1y_2 + a_3y_2^2}{2}\right)dy_1$$
Using the decomposition in equation(3.1d) gives
$$f_{Y_2}(y_2) = \frac{|P|^{1/2}}{2\pi}\exp\left(-\frac{y_2^2}{2}\left(a_3 - \frac{a_2^2}{a_1}\right)\right)\int \exp\left(-\frac{a_1}{2}\left(y_1 + \frac{a_2}{a_1}y_2\right)^2\right)dy_1$$
$$= \frac{|P|^{1/2}}{2\pi}\sqrt{\frac{2\pi}{a_1}}\exp\left(-\frac{y_2^2}{2}\,\frac{a_1a_3 - a_2^2}{a_1}\right) = \frac{1}{\sqrt{2\pi\sigma_2^2}}\exp\left(-\frac{y_2^2}{2\sigma_2^2}\right)$$
where $\sigma_2^2 = a_1/(a_1a_3 - a_2^2) = a_1/|P|$. It follows that the density of $X_2 = Y_2 + \mu_2$ is
$$f_{X_2}(x_2) = \frac{1}{\sqrt{2\pi\sigma_2^2}}\exp\left(-\frac{(x_2 - \mu_2)^2}{2\sigma_2^2}\right) \qquad\text{where } \sigma_2^2 = \frac{a_1}{a_1a_3 - a_2^2} = \frac{a_1}{|P|}$$
We have shown that the marginal distributions are normal:
X2 has the $N(\mu_2, \sigma_2^2)$ distribution.
Similarly,
X1 has the $N(\mu_1, \sigma_1^2)$ distribution,
where
$$\sigma_1^2 = \frac{a_3}{a_1a_3 - a_2^2} = \frac{a_3}{|P|} \qquad\text{and}\qquad \sigma_2^2 = \frac{a_1}{a_1a_3 - a_2^2} = \frac{a_1}{|P|}$$
It is easy to check that the real symmetric matrix P is positive definite iff $a_1 > 0$ and $a_1a_3 - a_2^2 > 0$.
3.3 The covariance and correlation between X1 and X2. Of course, $\operatorname{cov}[X_1, X_2] = \operatorname{cov}[Y_1, Y_2]$ where $Y_1 = X_1 - \mu_1$ and $Y_2 = X_2 - \mu_2$. So it suffices to find $\operatorname{cov}[Y_1, Y_2] = E[Y_1Y_2]$. The density of $(Y_1, Y_2)$ is:
$$f_{Y_1Y_2}(y_1, y_2) = \frac{|P|^{1/2}}{2\pi}\exp\left(-\frac{a_1y_1^2 + 2a_2y_1y_2 + a_3y_2^2}{2}\right)$$
It follows that
$$\iint \exp\left(-\frac{a_1y_1^2 + 2a_2y_1y_2 + a_3y_2^2}{2}\right)dy_1\,dy_2 = \frac{2\pi}{\sqrt{a_1a_3 - a_2^2}}$$
Differentiating with respect to $a_2$ gives
$$\iint (-y_1y_2)\exp\left(-\frac{a_1y_1^2 + 2a_2y_1y_2 + a_3y_2^2}{2}\right)dy_1\,dy_2 = \frac{2\pi a_2}{(a_1a_3 - a_2^2)^{3/2}}$$
and hence
$$\operatorname{cov}[X_1, X_2] = E[Y_1Y_2] = -\frac{a_2}{a_1a_3 - a_2^2}$$
The correlation between X1 and X2 is
$$\rho = \frac{\operatorname{cov}[X_1, X_2]}{\sigma_1\sigma_2} = -\frac{a_2}{\sqrt{a_1a_3}}$$
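The results of §3.1 to §3.3 can be confirmed numerically. The Python sketch below applies the midpoint rule on a square grid to the centred density (3.1b) and checks that it integrates to 1, that $E[Y_2^2] = a_1/|P|$ and that $E[Y_1Y_2] = -a_2/|P|$. The precision entries $a_1 = 1$, $a_2 = -1/2$, $a_3 = 2$ are an assumed example (they are those of example(3.3a)), and the half-width and step are arbitrary choices that make truncation and discretization error negligible for these values.

```python
import math


def grid_moments(a1, a2, a3, half_width=9.0, step=0.04):
    """Midpoint-rule approximations of the integral of f, E[Y2^2] and E[Y1 Y2]
    for the centred bivariate normal density with precision [[a1, a2], [a2, a3]]."""
    det_p = a1 * a3 - a2 * a2
    const = math.sqrt(det_p) / (2.0 * math.pi)   # |P|^{1/2} / (2 pi)
    n = int(round(2.0 * half_width / step))
    mass = second_moment_y2 = cross_moment = 0.0
    for i in range(n):
        y1 = -half_width + (i + 0.5) * step
        for j in range(n):
            y2 = -half_width + (j + 0.5) * step
            f = const * math.exp(-0.5 * (a1 * y1 * y1 + 2.0 * a2 * y1 * y2 + a3 * y2 * y2))
            mass += f
            second_moment_y2 += y2 * y2 * f
            cross_moment += y1 * y2 * f
    area = step * step
    return mass * area, second_moment_y2 * area, cross_moment * area


a1, a2, a3 = 1.0, -0.5, 2.0
det_p = a1 * a3 - a2 * a2
total, ey2_sq, ey1y2 = grid_moments(a1, a2, a3)
print(round(total, 6), round(ey2_sq, 6), round(a1 / det_p, 6),
      round(ey1y2, 6), round(-a2 / det_p, 6))
```

The grid approach is only practical in low dimensions; its purpose here is to check the closed forms, not to replace them.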
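These results lead to an alternative expression for the density of a bivariate normal:
$$f_{X_1X_2}(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\exp\left(-\frac{1}{2(1 - \rho^2)}\left[\frac{(x_1 - \mu_1)^2}{\sigma_1^2} - 2\rho\frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right]\right) \qquad (3.3a)$$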
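We have also shown that
$$\operatorname{var}[X] = \begin{pmatrix} \sigma_1^2 & \operatorname{cov}[X_1, X_2] \\ \operatorname{cov}[X_1, X_2] & \sigma_2^2 \end{pmatrix} = P^{-1}$$
P is sometimes called the precision matrix: it is the inverse of the variance matrix var[X].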
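Summarizing some of these results:
$$P = \begin{pmatrix} a_1 & a_2 \\ a_2 & a_3 \end{pmatrix} \qquad |P| = a_1a_3 - a_2^2 \qquad P^{-1} = \frac{1}{a_1a_3 - a_2^2}\begin{pmatrix} a_3 & -a_2 \\ -a_2 & a_1 \end{pmatrix}$$
$$\Sigma = P^{-1} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \qquad |\Sigma| = (1 - \rho^2)\sigma_1^2\sigma_2^2 \qquad \Sigma^{-1} = P = \frac{1}{(1 - \rho^2)\sigma_1^2\sigma_2^2}\begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix} \qquad (3.3b)$$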
Example(3.3a). Suppose $(X, Y)$ has a bivariate normal distribution with density
$$f(x, y) = \frac{1}{k}\exp\left(-\frac12\left(x^2 + 2y^2 - xy - 3x - 2y + 4\right)\right)$$
Find the mean vector and the variance matrix of $(X, Y)$. What is the value of k?
Solution. Let $Q(x, y) = a_1(x - \mu_1)^2 + 2a_2(x - \mu_1)(y - \mu_2) + a_3(y - \mu_2)^2$. So we want $Q(x, y) = x^2 + 2y^2 - xy - 3x - 2y + 4$. Equating coefficients of $x^2$, $xy$ and $y^2$ gives $a_1 = 1$, $a_2 = -\frac12$ and $a_3 = 2$. Hence
$$P = \begin{pmatrix} 1 & -\frac12 \\ -\frac12 & 2 \end{pmatrix} \qquad\text{and}\qquad \Sigma = P^{-1} = \frac47\begin{pmatrix} 2 & \frac12 \\ \frac12 & 1 \end{pmatrix}$$
Also $|P| = \frac74$ and hence $k = 2\pi/|P|^{1/2} = 4\pi/\sqrt7$.
Now $\frac{\partial Q(x,y)}{\partial x} = 2a_1(x - \mu_1) + 2a_2(y - \mu_2)$ and $\frac{\partial Q(x,y)}{\partial y} = 2a_2(x - \mu_1) + 2a_3(y - \mu_2)$. If $\frac{\partial Q(x,y)}{\partial x} = 0$ and $\frac{\partial Q(x,y)}{\partial y} = 0$ then we must have $x = \mu_1$ and $y = \mu_2$ because $|P| = a_1a_3 - a_2^2 \neq 0$.
Applying this to $Q(x, y) = x^2 + 2y^2 - xy - 3x - 2y + 4$ gives the equations $2\mu_1 - \mu_2 - 3 = 0$ and $4\mu_2 - \mu_1 - 2 = 0$. Hence $(\mu_1, \mu_2) = (2, 1)$.
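The steps of example(3.3a) can be packaged as a small routine. The Python sketch below assumes the density is written as $(1/k)\exp(-Q/2)$ with $Q(x, y) = c_{xx}x^2 + c_{xy}xy + c_{yy}y^2 + c_x x + c_y y + c_0$; these coefficient names are hypothetical, introduced only for the sketch. It reads off $a_1 = c_{xx}$, $a_2 = c_{xy}/2$ and $a_3 = c_{yy}$, solves $\partial Q/\partial x = \partial Q/\partial y = 0$ for $(\mu_1, \mu_2)$ by Cramer's rule, inverts P using (3.3b), and sets $k = e^{-Q(\mu)/2}\,2\pi/|P|^{1/2}$, which reduces to $2\pi/|P|^{1/2}$ when $Q(\mu) = 0$, as in the example.

```python
import math


def fit_bivariate_normal(cxx, cxy, cyy, cx, cy, c0):
    """Mean, variance matrix and k for a density (1/k) exp(-Q/2), where
    Q(x, y) = cxx x^2 + cxy xy + cyy y^2 + cx x + cy y + c0 (hypothetical names)."""
    a1, a2, a3 = cxx, cxy / 2.0, cyy            # entries of the precision matrix P
    det_p = a1 * a3 - a2 * a2
    if a1 <= 0.0 or det_p <= 0.0:
        raise ValueError("P is not positive definite")
    # grad Q = 0:  2 a1 mu1 + 2 a2 mu2 = -cx  and  2 a2 mu1 + 2 a3 mu2 = -cy
    d = 4.0 * det_p
    mu1 = ((-cx) * 2.0 * a3 - 2.0 * a2 * (-cy)) / d
    mu2 = (2.0 * a1 * (-cy) - 2.0 * a2 * (-cx)) / d
    sigma = [[a3 / det_p, -a2 / det_p],
             [-a2 / det_p, a1 / det_p]]           # P^{-1}, see (3.3b)
    q_min = cxx * mu1**2 + cxy * mu1 * mu2 + cyy * mu2**2 + cx * mu1 + cy * mu2 + c0
    k = math.exp(-q_min / 2.0) * 2.0 * math.pi / math.sqrt(det_p)
    return (mu1, mu2), sigma, k


# Example(3.3a): Q = x^2 + 2y^2 - xy - 3x - 2y + 4
mu, sigma, k = fit_bivariate_normal(1, -1, 2, -3, -2, 4)
print(mu, sigma, k)
```

The returned variance matrix is $\frac47\begin{pmatrix} 2 & \frac12 \\ \frac12 & 1 \end{pmatrix}$ written out entry by entry.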
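3.4 The characteristic function. Suppose $X^T = (X_1, X_2)$ has the bivariate density defined in equation(3.1a). Then for all $2 \times 1$ vectors $t \in \mathbb{R}^2$, the characteristic function of X is
$$\phi(t) = E\left[e^{it^TX}\right] = \frac{|P|^{1/2}}{2\pi}\iint e^{it^Tx}\exp\left(-\frac{(x - \mu)^TP(x - \mu)}{2}\right)dx_1\,dx_2$$
$$= \frac{|P|^{1/2}}{2\pi}e^{it^T\mu}\iint \exp\left(\frac{2it^Ty - y^TPy}{2}\right)dy_1\,dy_2 \qquad\text{by setting } y = x - \mu.$$
But $y^TPy - 2it^Ty = (y - i\Sigma t)^TP(y - i\Sigma t) + t^T\Sigma t$ where $\Sigma = P^{-1} = \operatorname{var}[X]$. Hence
$$\phi(t) = \frac{|P|^{1/2}}{2\pi}e^{it^T\mu - \frac12 t^T\Sigma t}\iint \exp\left(-\frac{(y - i\Sigma t)^TP(y - i\Sigma t)}{2}\right)dy_1\,dy_2 = e^{it^T\mu - \frac12 t^T\Sigma t}$$
by using the fact that the integral of the density in equation(3.1a) is 1.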
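A Monte Carlo check of $\phi(t) = \exp(it^T\mu - \frac12 t^T\Sigma t)$ is sketched below in Python. It generates bivariate normal pairs as $X_1 = \mu_1 + \sigma_1Z_1$ and $X_2 = \mu_2 + \sigma_2(\rho Z_1 + \sqrt{1 - \rho^2}\,Z_2)$ with $Z_1, Z_2$ i.i.d. N(0, 1) (the construction of exercise 9 in §4), averages $e^{it^TX}$, and compares the result with the closed form. The parameter values, the point t and the sample size are arbitrary assumptions.

```python
import cmath
import math
import random


def sample_pair(mu1, mu2, s1, s2, rho, rng):
    """One draw of (X1, X2) with means mu1, mu2, sds s1, s2 and correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mu1 + s1 * z1, mu2 + s2 * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)


def empirical_cf(t, params, n, seed=0):
    """Average of exp(i t^T X) over n simulated pairs."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n):
        x1, x2 = sample_pair(*params, rng)
        total += cmath.exp(1j * (t[0] * x1 + t[1] * x2))
    return total / n


def normal_cf(t, mu1, mu2, s1, s2, rho):
    """Closed form exp(i t^T mu - t^T Sigma t / 2)."""
    quad = (s1 * s1 * t[0] ** 2 + 2.0 * rho * s1 * s2 * t[0] * t[1]
            + s2 * s2 * t[1] ** 2)
    return cmath.exp(1j * (t[0] * mu1 + t[1] * mu2) - 0.5 * quad)


params = (1.0, -0.5, 1.2, 0.8, 0.6)        # mu1, mu2, sigma1, sigma2, rho
t = (0.7, -0.4)
estimate = empirical_cf(t, params, n=100_000)
exact = normal_cf(t, *params)
print(estimate, exact)
```

With 100,000 draws the estimate typically agrees with the closed form to two or three decimal places in both the real and imaginary parts.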
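3.5 The conditional distributions. We first find the conditional density of $Y_1$ given $Y_2$ where $Y_1 = X_1 - \mu_1$ and $Y_2 = X_2 - \mu_2$. Now
$$f_{Y_1|Y_2}(y_1|y_2) = \frac{f_{Y_1Y_2}(y_1, y_2)}{f_{Y_2}(y_2)}$$
We use the following forms:
$$f_{Y_1Y_2}(y_1, y_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\exp\left(-\frac{y_1^2 - \frac{2\rho\sigma_1}{\sigma_2}y_1y_2 + \frac{\sigma_1^2}{\sigma_2^2}y_2^2}{2\sigma_1^2(1 - \rho^2)}\right) \qquad f_{Y_2}(y_2) = \frac{1}{\sqrt{2\pi\sigma_2^2}}\exp\left(-\frac{y_2^2}{2\sigma_2^2}\right)$$
and hence
$$f(y_1|y_2) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma_1^2(1 - \rho^2)}}\exp\left(-\frac{y_1^2 - \frac{2\rho\sigma_1}{\sigma_2}y_1y_2 + \frac{\rho^2\sigma_1^2}{\sigma_2^2}y_2^2}{2\sigma_1^2(1 - \rho^2)}\right) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma_1^2(1 - \rho^2)}}\exp\left(-\frac{\left(y_1 - \frac{\rho\sigma_1}{\sigma_2}y_2\right)^2}{2\sigma_1^2(1 - \rho^2)}\right)$$
which is the density of the $N\left(\frac{\rho\sigma_1}{\sigma_2}y_2,\ \sigma_1^2(1 - \rho^2)\right)$ distribution. It follows that the density of $X_1$ given $X_2$ is the $N\left(\mu_1 + \frac{\rho\sigma_1}{\sigma_2}(x_2 - \mu_2),\ \sigma_1^2(1 - \rho^2)\right)$ distribution, and hence
$$E[X_1|X_2] = \mu_1 + \frac{\rho\sigma_1}{\sigma_2}(X_2 - \mu_2) \qquad\text{and}\qquad \operatorname{var}[X_1|X_2] = \sigma_1^2(1 - \rho^2).$$
In terms of the original notation, $\sigma_1^2(1 - \rho^2) = 1/a_1$ and $\rho\sigma_1/\sigma_2 = -a_2/a_1$, and hence the distribution of $X_1$ given $X_2$ is $N\left(\mu_1 - \frac{a_2}{a_1}(x_2 - \mu_2),\ \frac{1}{a_1}\right)$.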
We have also shown that if the random vector (X1 , X2 ) is bivariate normal, then E[X1 |X2 ] is a linear function
of X2 and hence the best predictor and best linear predictor are the same—see exercises 9 and 12 on page 7.
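The two descriptions of the conditional distribution of $X_1$ given $X_2 = x_2$, one in terms of $(\sigma_1, \sigma_2, \rho)$ and one in terms of the precision entries $(a_1, a_2)$, must agree. The Python sketch below (function names are illustrative) converts $a_1, a_2, a_3$ to $\sigma_1, \sigma_2, \rho$ using §3.2 and §3.3 and evaluates both forms; the numbers are those of example(3.3a), with an arbitrary conditioning value $x_2 = 1.8$.

```python
import math


def moments_from_precision(a1, a2, a3):
    """sigma1, sigma2 and rho from the precision entries (sections 3.2 and 3.3)."""
    det_p = a1 * a3 - a2 * a2
    return math.sqrt(a3 / det_p), math.sqrt(a1 / det_p), -a2 / math.sqrt(a1 * a3)


def cond_x1_given_x2(x2, mu1, mu2, s1, s2, rho):
    """Mean and variance of X1 given X2 = x2 in the (sigma, rho) form."""
    return mu1 + rho * s1 / s2 * (x2 - mu2), s1 * s1 * (1.0 - rho * rho)


def cond_x1_given_x2_precision(x2, mu1, mu2, a1, a2):
    """The same quantities in the precision form N(mu1 - (a2/a1)(x2 - mu2), 1/a1)."""
    return mu1 - a2 / a1 * (x2 - mu2), 1.0 / a1


a1, a2, a3 = 1.0, -0.5, 2.0
mu1, mu2, x2 = 2.0, 1.0, 1.8
s1, s2, rho = moments_from_precision(a1, a2, a3)
form1 = cond_x1_given_x2(x2, mu1, mu2, s1, s2, rho)
form2 = cond_x1_given_x2_precision(x2, mu1, mu2, a1, a2)
print(form1, form2)
```

Both forms give conditional mean 2.4 and conditional variance 1 (up to rounding), illustrating that $\rho\sigma_1/\sigma_2 = -a_2/a_1$ and $\sigma_1^2(1 - \rho^2) = 1/a_1$.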
3.6 Independence of X1 and X2 .
Proposition(3.6a). Suppose (X1 , X2 ) ∼ N (µ, Σ). Then X1 and X2 are independent iff ρ = 0.
Proof. If ρ = 0 then fX1 X2 (x1 , x2 ) = fX1 (x1 )fX2 (x2 ). Conversely, if X1 and X2 are independent then cov[X1 , X2 ] = 0
and hence ρ = 0.
In terms of entries in the precision matrix: X1 and X2 are independent iff a2 = 0.
3.7 Linear transformation of a bivariate normal.
Proposition(3.7a). Suppose X has the bivariate normal distribution $N(\mu, \Sigma)$ and C is a $2 \times 2$ non-singular matrix. Then the random vector $Y = a + CX$ has the bivariate normal distribution $N(a + C\mu, C\Sigma C^T)$.
Proof. The easiest way is to find the characteristic function of Y. For $2 \times 1$ vectors $t \in \mathbb{R}^2$ we have
$$\phi_Y(t) = E\left[e^{it^TY}\right] = e^{it^Ta}\,E\left[e^{it^TCX}\right] = e^{it^Ta}\,e^{it^TC\mu - \frac12 t^TC\Sigma C^Tt}$$
which is the characteristic function of the bivariate normal $N(a + C\mu, C\Sigma C^T)$.
We need C to be non-singular in order to ensure the variance matrix of the result is non-singular.
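3.8 Summary. The bivariate normal distribution.
Suppose $X = (X_1, X_2) \sim N(\mu, \Sigma)$ where $\mu = E[X]$ and $\Sigma = \operatorname{var}[X]$.
• Density.
$$f_X(x) = \frac{|P|^{1/2}}{2\pi}\exp\left(-\frac{(x - \mu)^TP(x - \mu)}{2}\right) \qquad\text{where } P = \Sigma^{-1} \text{ is the precision matrix;}$$
$$f_X(x) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\exp\left(-\frac{1}{2(1 - \rho^2)}\left[\frac{(x_1 - \mu_1)^2}{\sigma_1^2} - 2\rho\frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right]\right)$$
• If $P = \begin{pmatrix} a_1 & a_2 \\ a_2 & a_3 \end{pmatrix}$ then $\Sigma = P^{-1} = \frac{1}{a_1a_3 - a_2^2}\begin{pmatrix} a_3 & -a_2 \\ -a_2 & a_1 \end{pmatrix}$.
• $P = \Sigma^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1 - \rho^2)}\begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}$ and $|\Sigma| = (1 - \rho^2)\sigma_1^2\sigma_2^2$.
• The marginal distributions. $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$.
• The characteristic function. $\phi(t) = \exp\left(it^T\mu - \frac12 t^T\Sigma t\right)$.
• The conditional distributions.
The distribution of $X_1$ given $X_2 = x_2$ is $N\left(\mu_1 + \frac{\rho\sigma_1}{\sigma_2}(x_2 - \mu_2),\ \sigma_1^2(1 - \rho^2)\right)$.
The distribution of $X_2$ given $X_1 = x_1$ is $N\left(\mu_2 + \frac{\rho\sigma_2}{\sigma_1}(x_1 - \mu_1),\ \sigma_2^2(1 - \rho^2)\right)$.
• $X_1$ and $X_2$ are independent iff $\rho = 0$.
• Linear transformation of a bivariate normal. If C is non-singular, then $Y = a + CX$ has a bivariate normal distribution with mean $a + C\mu$ and variance matrix $C\Sigma C^T$.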
4 Exercises
1. Suppose $(X, Y)$ has the density
$$f_{XY}(x, y) = c\,e^{-(x^2 - xy + y^2)/3}$$
(a) Find c.
(b) Are X and Y independent?
2. Suppose $(X, Y)$ has a bivariate normal distribution with density
$$f(x, y) = k\exp\left(-(x^2 + 2xy + 4y^2)\right)$$
Find the mean vector and the variance matrix of $(X, Y)$. What is the value of k?
3. Suppose $(X, Y)$ has a bivariate normal distribution with density
$$f(x, y) = \frac1k\exp\left(-\frac12\left(2x^2 + y^2 + 2xy - 22x - 14y + 65\right)\right)$$
Find the mean vector and the variance matrix of $(X, Y)$. What is the value of k?
4. Suppose the random vector $Y = (Y_1, Y_2)$ has the density
$$f(y_1, y_2) = \frac1k\exp\left(-\frac12\left(y_1^2 + 2y_2^2 - y_1y_2 - 3y_1 - 2y_2 + 4\right)\right) \qquad\text{for } y = (y_1, y_2) \in \mathbb{R}^2.$$
Find E[Y] and var[Y].
5. Evaluate the integral
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\left(-(y_1^2 + 2y_1y_2 + 4y_2^2)\right)dy_1\,dy_2$$
6. Suppose the random vector $Y = (Y_1, Y_2)$ has the density
$$f(y_1, y_2) = k\exp\left(-\frac{1}{12}\left(y_1^2 + 2y_1(y_2 - 1) + 4(y_2 - 1)^2\right)\right) \qquad\text{for } y = (y_1, y_2) \in \mathbb{R}^2.$$
Show that $Y \sim N(\mu, \Sigma)$ and find the values of µ and Σ.
7. Suppose $X = (X_1, X_2)$ has a bivariate normal distribution with $E[X_1] = E[X_2] = 0$ and variance matrix Σ. Prove that
$$X^TPX - \frac{X_1^2}{\sigma_1^2} \sim \chi^2_1$$
where P is the precision matrix of X.
8. (a) Suppose E[X1 ] = µ1 , E[X2 ] = µ2 and there exists α such that Y = X1 + αX2 is independent of X2 . Prove that
E[X1 |X2 ] = µ1 + αµ2 − αX2 .
(b) Use part (a) to derive E[X1 |X2 ] for the bivariate normal.
9. An alternative method for constructing the bivariate normal. Suppose X and Y are i.i.d. N(0, 1). Suppose $\rho \in (-1, 1)$ and $Z = \rho X + \sqrt{1 - \rho^2}\,Y$.
(a) Find the density of Z.
(b) Find the density of (X, Z).
(c) Suppose $\mu_1 \in \mathbb{R}$, $\mu_2 \in \mathbb{R}$, $\sigma_1 > 0$ and $\sigma_2 > 0$. Find the density of (U, V) where $U = \mu_1 + \sigma_1X$ and $V = \mu_2 + \sigma_2Z$.
10. Suppose $(X_1, X_2)$ has the bivariate normal distribution with density given by equation(3.3a). Define Q by:
$$f_{X_1X_2}(x_1, x_2) = \frac{e^{-Q(x_1, x_2)}}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}$$
Hence
$$Q(x_1, x_2) = \frac{1}{2(1 - \rho^2)}\left[\frac{(x_1 - \mu_1)^2}{\sigma_1^2} - 2\rho\frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right]$$
Define the random variable Y by $Y = Q(X_1, X_2)$. Show that Y has the exponential density.
11. (a) Suppose $(X_1, X_2)$ has a bivariate normal distribution with $E[X_1] = E[X_2] = 0$. Hence it has characteristic function
$$\phi_X(t) = \exp\left(-\frac12 t^T\Sigma t\right) = \exp\left(-\frac12\left(\sigma_1^2t_1^2 + 2\sigma_{12}t_1t_2 + \sigma_2^2t_2^2\right)\right)$$
Explore the situations when Σ is singular.
(b) Now suppose $(X_1, X_2)$ has a bivariate normal distribution without the restriction of zero means. Explore the situations when the variance matrix Σ is singular.
12. Suppose $T_1$ and $T_2$ are i.i.d. N(0, 1). Set $X = a_1T_1 + a_2T_2$ and $Y = b_1T_1 + b_2T_2$ where $a_1^2 + a_2^2 > 0$ and $a_1b_2 \neq a_2b_1$.
(a) Show that $E[Y|X] = X(a_1b_1 + a_2b_2)/(a_1^2 + a_2^2)$.
(b) Show that $E\left[\left(Y - E(Y|X)\right)^2\right] = (a_1b_2 - a_2b_1)^2/(a_1^2 + a_2^2)$.
13. Suppose (X, Y) has a bivariate normal distribution with E[X] = E[Y] = 0, var[X] = var[Y] = 1 and cov[X, Y] = ρ. Show that $X^2$ and $Y^2$ are independent iff ρ = 0.
(Note. If $\operatorname{var}[X] = \sigma_X^2$ and $\operatorname{var}[Y] = \sigma_Y^2$ then just set $X_1 = X/\sigma_X$ and $Y_1 = Y/\sigma_Y$.)
14. Suppose $(X_1, X_2)$ has a bivariate normal distribution with $E[X_1] = E[X_2] = 0$. Let $Z = X_1/X_2$. Show that
$$f_Z(z) = \frac{\sigma_1\sigma_2\sqrt{1 - \rho^2}}{\pi(\sigma_2^2z^2 - 2\rho\sigma_1\sigma_2z + \sigma_1^2)}$$
15. Suppose $(X_1, X_2)$ has a bivariate normal distribution with $E[X_1] = E[X_2] = 0$ and $\operatorname{var}[X_1] = \operatorname{var}[X_2] = 1$. Let $\rho = \operatorname{corr}[X_1, X_2] = \operatorname{cov}[X_1, X_2] = E[X_1X_2]$. Show that
$$\frac{X_1^2 - 2\rho X_1X_2 + X_2^2}{1 - \rho^2} \sim \chi^2_2$$
16. Suppose (X, Y) has a bivariate normal distribution with E[X] = E[Y] = 0 and $\rho = \operatorname{corr}[X, Y]$. Show that
$$P[X \ge 0, Y \ge 0] = P[X \le 0, Y \le 0] = \frac14 + \frac{1}{2\pi}\sin^{-1}\rho$$
$$P[X \le 0, Y \ge 0] = P[X \ge 0, Y \le 0] = \frac14 - \frac{1}{2\pi}\sin^{-1}\rho$$
5 The multivariate normal
5.1 The multivariate normal distribution. The $n \times 1$ random vector X has a (non-singular) multivariate normal distribution iff X has density
$$f(x) = C\exp\left(-\tfrac12(x - \mu)^TP(x - \mu)\right) \qquad\text{for } x \in \mathbb{R}^n \qquad (5.1a)$$
where
• C is a constant so that the density integrates to 1;
• µ is a vector in $\mathbb{R}^n$;
• P is a real symmetric positive definite $n \times n$ matrix called the precision matrix.
5.2 Integrating the density. Because P is a real symmetric positive definite matrix, we can write
$$P = L^TDL$$
where L is orthogonal and D is diagonal with entries $d_1 > 0, \ldots, d_n > 0$. This result is explained in §1.4 on page 43.
Consider the transformation $Y = L(X - \mu)$; this is a one-to-one transformation $\mathbb{R}^n \to \mathbb{R}^n$ which has a Jacobian with absolute value:
$$\left|\frac{\partial(y_1, \ldots, y_n)}{\partial(x_1, \ldots, x_n)}\right| = |\det(L)| = 1$$
Hence, using $x - \mu = L^Ty$ gives
$$f_Y(y) = C\exp\left(-\tfrac12 y^TLPL^Ty\right) = C\exp\left(-\tfrac12 y^TDy\right) = C\prod_{j=1}^n\exp\left(-\tfrac12 d_jy_j^2\right)$$
It follows that $Y_1, \ldots, Y_n$ are independent with distributions $N(0, 1/d_1), \ldots, N(0, 1/d_n)$ respectively, and
$$C = \frac{\sqrt{d_1\cdots d_n}}{(2\pi)^{n/2}} = \frac{\sqrt{\det(D)}}{(2\pi)^{n/2}} = \frac{\sqrt{\det(P)}}{(2\pi)^{n/2}}$$
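It follows that equation(5.1a) becomes
$$f(x) = \frac{\sqrt{\det(P)}}{(2\pi)^{n/2}}\exp\left(-\tfrac12(x - \mu)^TP(x - \mu)\right) \qquad\text{for } x \in \mathbb{R}^n$$
Note that the random vector Y satisfies $E[Y] = 0$ and $\operatorname{var}[Y] = D^{-1}$. Using $X = \mu + L^TY$ gives
$$E[X] = \mu \qquad\text{and}\qquad \operatorname{var}[X] = \operatorname{var}[L^TY] = L^T\operatorname{var}[Y]L = L^TD^{-1}L = P^{-1}$$
and hence P is the precision matrix, the inverse of the variance matrix. So equation(5.1a) becomes
$$f(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(\Sigma)}}\exp\left(-\tfrac12(x - \mu)^T\Sigma^{-1}(x - \mu)\right) \quad\text{for } x \in \mathbb{R}^n, \text{ where } \mu = E[X] \text{ and } \Sigma = \operatorname{var}[X]. \qquad (5.2a)$$
This is defined to be the density of the $N(\mu, \Sigma)$ distribution.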
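Equation(5.2a) can be evaluated without forming $\Sigma^{-1}$ explicitly. The Python sketch below uses a Cholesky factorization $\Sigma = \text{chol}\,\text{chol}^T$ with a lower-triangular factor (a standard numerical technique, not the one used in the text, and unrelated to the orthogonal matrix L of §5.2): solving $\text{chol}\,z = x - \mu$ gives $(x - \mu)^T\Sigma^{-1}(x - \mu) = z^Tz$ and $\det(\Sigma) = \prod_i \text{chol}_{ii}^2$. As a check it compares the result with the bivariate form (3.3a) at one arbitrary point; all parameter values are assumptions.

```python
import math


def cholesky(sigma):
    """Lower-triangular chol with sigma = chol chol^T (sigma symmetric positive definite)."""
    n = len(sigma)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][i] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return chol


def mvn_logpdf(x, mu, sigma):
    """Log of the N(mu, Sigma) density (5.2a) at the point x."""
    n = len(x)
    chol = cholesky(sigma)
    z = []
    for i in range(n):  # forward substitution for chol z = x - mu
        z.append((x[i] - mu[i] - sum(chol[i][k] * z[k] for k in range(i))) / chol[i][i])
    log_det = 2.0 * sum(math.log(chol[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def bivariate_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    """Density (3.3a)."""
    u, v = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (u * u - 2.0 * rho * u * v + v * v) / (2.0 * (1.0 - rho * rho))
    return math.exp(-q) / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho))


mu1, mu2, s1, s2, rho = 1.0, -0.5, 1.2, 0.8, 0.6
sigma = [[s1 * s1, rho * s1 * s2], [rho * s1 * s2, s2 * s2]]
general = math.exp(mvn_logpdf([0.3, -1.1], [mu1, mu2], sigma))
special = bivariate_pdf(0.3, -1.1, mu1, mu2, s1, s2, rho)
print(general, special)
```

Working with the log density avoids underflow when n is large or the point is far from µ; the Cholesky step also fails loudly if Σ is not positive definite.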
5.3 The characteristic function. Suppose the n-dimensional random vector X has the $N(\mu, \Sigma)$ distribution. We know that using the transformation $Y = L(X - \mu)$ leads to $Y \sim N(0, D^{-1})$. Because L is orthogonal we have $X = \mu + L^TY$, and the characteristic function of X is:
$$\phi_X(t) = E\left[e^{it^TX}\right] = E\left[e^{it^T\mu + it^TL^TY}\right] = e^{it^T\mu}E\left[e^{it^TL^TY}\right] \qquad\text{for all } n \times 1 \text{ vectors } t \in \mathbb{R}^n.$$
But $Y_1, \ldots, Y_n$ are independent with distributions $N(0, 1/d_1), \ldots, N(0, 1/d_n)$, respectively. Hence
$$E\left[e^{it^TY}\right] = E\left[e^{i(t_1Y_1 + \cdots + t_nY_n)}\right] = e^{-t_1^2/2d_1}\cdots e^{-t_n^2/2d_n} = e^{-\frac12 t^TD^{-1}t}$$
for all $n \times 1$ vectors $t \in \mathbb{R}^n$. Applying this result to the $n \times 1$ vector $Lt$ gives
$$E\left[e^{it^TL^TY}\right] = e^{-\frac12 t^TL^TD^{-1}Lt} = e^{-\frac12 t^T\Sigma t}$$
We have shown that if $X \sim N(\mu, \Sigma)$ then
$$\phi_X(t) = E\left[e^{it^TX}\right] = e^{it^T\mu - \frac12 t^T\Sigma t} \qquad\text{for } t \in \mathbb{R}^n.$$
5.4 The singular multivariate normal distribution. In the last section we saw that if $\mu \in \mathbb{R}^n$ and Σ is an $n \times n$ real positive definite matrix, then the function $\phi : \mathbb{R}^n \to \mathbb{C}$ with
$$\phi(t) = e^{it^T\mu - \frac12 t^T\Sigma t} \qquad\text{for } t \in \mathbb{R}^n$$
is a characteristic function. The conditions can be relaxed as follows:
Proposition(5.4a). Suppose $\mu \in \mathbb{R}^n$ and V is an $n \times n$ real non-negative definite matrix; then the function $\phi : \mathbb{R}^n \to \mathbb{C}$ with
$$\phi(t) = e^{it^T\mu - \frac12 t^TVt} \qquad\text{for } t \in \mathbb{R}^n$$
is a characteristic function.
Proof. For $k = 1, 2, \ldots$, set $V_k = V + \frac1k I$ where I is the $n \times n$ identity matrix. Then $V_k$ is positive definite and so
$$\phi_k(t) = e^{it^T\mu - \frac12 t^TV_kt}$$
is a characteristic function. Also $\phi_k(t) \to \phi(t)$ as $k \to \infty$ for all $t \in \mathbb{R}^n$. Finally, φ is continuous at $t = 0$. It follows by the multidimensional form of Lévy’s convergence theorem³ that φ is a characteristic function.
If V is positive definite, then we know that φ is the characteristic function of the $N(\mu, V)$ distribution.
If V is only non-negative definite and not positive definite, then by §1.3 there is a non-zero vector c with $c^TVc = 0$, so some non-trivial linear combination of the components has zero variance and is almost surely constant, and the density does not exist. In this case, we say that the distribution which has the characteristic function φ is a singular multivariate normal distribution.
5.5 Linear combinations of the components of a multivariate normal.
Proposition(5.5a). Suppose the n-dimensional random vector X has the $N(\mu, \Sigma)$ distribution. Then for any $n \times 1$ vector $\ell \in \mathbb{R}^n$ the random variable $Z = \ell^TX$ has a normal distribution.
Proof. Use characteristic functions:
$$\phi_Z(t) = E\left[e^{itZ}\right] = E\left[e^{it\ell^TX}\right] = \phi_X(t\ell) = \exp\left(it\ell^T\mu - \tfrac12 t^2\ell^T\Sigma\ell\right)$$
and hence $Z \sim N(\ell^T\mu, \ell^T\Sigma\ell)$.
Conversely:
Proposition(5.5b). Suppose X is an n-dimensional random vector such that for every $n \times 1$ vector $\ell \in \mathbb{R}^n$ the random variable $\ell^TX$ is univariate normal. Then X has the multivariate normal distribution.
Proof. The characteristic function of X is $\phi_X(t) = E[\exp(it^TX)]$. Now $Z = t^TX$ is univariate normal. Also $E[t^TX] = t^T\mu$ and $\operatorname{var}[t^TX] = t^T\Sigma t$ where $\mu = E[X]$ and $\Sigma = \operatorname{var}[X]$. Hence $Z \sim N(t^T\mu, t^T\Sigma t)$. Hence the characteristic function of Z is, for all $u \in \mathbb{R}$:
$$\phi_Z(u) = \exp\left(iut^T\mu - \tfrac12 u^2t^T\Sigma t\right)$$
Take $u = 1$; hence
$$\phi_Z(1) = \exp\left(it^T\mu - \tfrac12 t^T\Sigma t\right)$$
But $\phi_Z(1) = E[\exp(iZ)] = E[\exp(it^TX)]$. So we have shown that
$$E[\exp(it^TX)] = \exp\left(it^T\mu - \tfrac12 t^T\Sigma t\right)$$
and so $X \sim N(\mu, \Sigma)$.
Combining these two previous propositions gives a characterization of the multivariate normal distribution: the n-dimensional random vector X has a multivariate normal distribution iff every linear combination of the components of X has a univariate normal distribution.
³ Also called the “Continuity Theorem.” See, for example, page 361 in [FRISTEDT & GRAY(1997)].
5.6 The marginal distributions. Suppose the n-dimensional random vector X has the $N(\mu, \Sigma)$ distribution. Then the characteristic function of X is
$$\phi_X(t) = E\left[e^{i(t_1X_1 + \cdots + t_nX_n)}\right] = e^{it^T\mu - \frac12 t^T\Sigma t} \qquad\text{for } t \in \mathbb{R}^n,$$
and hence the characteristic function of $X_1$ is
$$\phi_{X_1}(t_1) = \phi_X(t_1, 0, \ldots, 0) = \exp\left(i\mu_1t_1 - \tfrac12 t_1^2\Sigma_{11}\right)$$
and so $X_1 \sim N(\mu_1, \Sigma_{11})$. Similarly, $X_j \sim N(\mu_j, \Sigma_{jj})$ where $\Sigma_{jj}$ is the $(j, j)$ entry in the matrix Σ.
Similarly, the random vector $(X_i, X_j)$ has the bivariate normal distribution with mean vector $(\mu_i, \mu_j)$ and variance matrix
$$\begin{pmatrix} \Sigma_{ii} & \Sigma_{ij} \\ \Sigma_{ij} & \Sigma_{jj} \end{pmatrix}$$
In general, we see that every marginal distribution of a multivariate normal is normal.
The converse is false!!
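Example(5.6a). Suppose X and Y are independent 2-dimensional random vectors with distributions N(µ, ΣX) and N(µ, ΣY) respectively, where
$$\mu = \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix}, \quad \Sigma_X = \begin{pmatrix}\sigma_1^2 & \rho_1\sigma_1\sigma_2\\ \rho_1\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix} \quad\text{and}\quad \Sigma_Y = \begin{pmatrix}\sigma_1^2 & \rho_2\sigma_1\sigma_2\\ \rho_2\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}$$
and $\rho_1 \neq \rho_2$. Let
$$Z = \begin{cases}X & \text{with probability } 1/2;\\ Y & \text{with probability } 1/2.\end{cases}$$
Show that Z has normal marginals but is not bivariate normal.
Solution. Let Z1 and Z2 denote the components of Z; hence Z = (Z1, Z2). Then $Z_1 \sim N(\mu_1, \sigma_1^2)$ and $Z_2 \sim N(\mu_2, \sigma_2^2)$. Hence every marginal distribution of Z is normal.
Now E[Z] = µ and $\mathrm{var}[Z] = E[(Z-\mu)(Z-\mu)^T] = \tfrac{1}{2}(\Sigma_X + \Sigma_Y)$. The density of Z is
$$f_Z(z) = \tfrac{1}{2}f_X(z) + \tfrac{1}{2}f_Y(z)$$
and this is not the density of $N\left(\mu, \tfrac{1}{2}(\Sigma_X + \Sigma_Y)\right)$; we can see this by comparing the values of the two densities at z = µ. Indeed, $f_Z(\mu) = \frac{1}{2\pi\sigma_1\sigma_2}\cdot\frac{1}{2}\left(\frac{1}{\sqrt{1-\rho_1^2}} + \frac{1}{\sqrt{1-\rho_2^2}}\right)$, whereas $\tfrac{1}{2}(\Sigma_X + \Sigma_Y)$ has correlation $(\rho_1+\rho_2)/2$, so the normal density at µ is $\frac{1}{2\pi\sigma_1\sigma_2}\cdot\frac{1}{\sqrt{1-(\rho_1+\rho_2)^2/4}}$. These differ because $\rho \mapsto 1/\sqrt{1-\rho^2}$ is strictly convex on (−1, 1) and $\rho_1 \neq \rho_2$.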
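The comparison at z = µ can be checked numerically. The short Python sketch below is an illustration added here, not part of the original notes; the function names and the parameter values (σ1 = 1, σ2 = 2, ρ1 = 0, ρ2 = 0.9) are arbitrary choices. It evaluates both densities at their common mean, where each bivariate normal density reduces to 1/(2πσ1σ2√(1 − ρ²)).

```python
import math

def bvn_density_at_mean(s1, s2, rho):
    """Bivariate normal density with sds s1, s2 and correlation rho, evaluated at its mean."""
    return 1.0 / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho))

def densities_at_mu(s1, s2, rho1, rho2):
    """Return (f_Z(mu), density of N(mu, (Sigma_X + Sigma_Y)/2) at mu) for Example(5.6a)."""
    mixture = 0.5 * bvn_density_at_mean(s1, s2, rho1) + 0.5 * bvn_density_at_mean(s1, s2, rho2)
    # (Sigma_X + Sigma_Y)/2 has variances s1^2, s2^2 and correlation (rho1 + rho2)/2
    matched = bvn_density_at_mean(s1, s2, 0.5 * (rho1 + rho2))
    return mixture, matched

mix, matched = densities_at_mu(1.0, 2.0, 0.0, 0.9)
print(mix, matched)  # the mixture value is the larger of the two
```

When ρ1 = ρ2 the two values coincide, as they must, since Z is then simply bivariate normal.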
5.7 Linear transformation of a multivariate normal.
Proposition(5.7a). Suppose the n-dimensional random vector X has the non-singular N(µ, Σ) distribution. Suppose further that B is an m × n matrix with m ≤ n and rank(B) = m; hence B has full rank. Let Z = BX. Then Z has the non-singular $N(B\mu, B\Sigma B^T)$ distribution.
Proof. We first establish that $B\Sigma B^T$ is positive definite. Suppose $x \in \mathbb{R}^m$ with $x^TB\Sigma B^Tx = 0$. Then $y^T\Sigma y = 0$ where $y = B^Tx$. Because Σ is positive definite, we must have y = 0. Hence $B^Tx = 0$; hence $x_1\alpha_1^T + \cdots + x_m\alpha_m^T = 0$ where $\alpha_1, \dots, \alpha_m$ are the m rows of B. But rank(B) = m, so the rows of B are linearly independent; hence x = 0 and hence $B\Sigma B^T$ is positive definite.
The characteristic function of Z is, for all $t \in \mathbb{R}^m$:
$$\phi_Z(t) = E[\exp(it^TZ)] = E[\exp(it^TBX)] = \phi_X(B^Tt) = \exp\left(it^TB\mu - \tfrac{1}{2}t^TB\Sigma B^Tt\right)$$
Hence $Z \sim N(B\mu, B\Sigma B^T)$.
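5.8 Partitioning a multivariate normal into two sub-vectors.
Proposition(5.8a). Suppose X is an n-dimensional random vector with the N(µ, Σ) distribution, and X is partitioned into two sub-vectors:
$$\underset{n\times 1}{X} = \begin{pmatrix}\underset{k\times 1}{Y}\\ \underset{\ell\times 1}{Z}\end{pmatrix} \quad\text{where } n = k + \ell.$$
Now partition µ and Σ conformably as follows:
$$\underset{n\times 1}{\mu} = \begin{pmatrix}\underset{k\times 1}{\mu_Y}\\ \underset{\ell\times 1}{\mu_Z}\end{pmatrix} \quad\text{and}\quad \underset{n\times n}{\Sigma} = \begin{pmatrix}\underset{k\times k}{\Sigma_Y} & \underset{k\times\ell}{A}\\ \underset{\ell\times k}{A^T} & \underset{\ell\times\ell}{\Sigma_Z}\end{pmatrix} \qquad (5.8a)$$
Then the random vectors Y and Z are independent iff A = 0. (Equivalently, Y and Z are independent iff cov[Y, Z] = 0.)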
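Proof.
⇒ If Y and Z are independent then cov[Yi, Zj] = 0 for all i = 1, ..., k and j = 1, ..., ℓ; this implies A = 0.
⇐ Because A = 0 we have
$$\Sigma = \begin{pmatrix}\Sigma_Y & 0\\ 0 & \Sigma_Z\end{pmatrix}$$
Now the characteristic function of X is
$$E\left[e^{it^TX}\right] = e^{it^T\mu - \frac{1}{2}t^T\Sigma t} \quad\text{for all } t \in \mathbb{R}^n.$$
Partitioning t conformably into t = (tY, tZ) gives
$$E\left[e^{it^TX}\right] = e^{it_Y^T\mu_Y - \frac{1}{2}t_Y^T\Sigma_Yt_Y}\,e^{it_Z^T\mu_Z - \frac{1}{2}t_Z^T\Sigma_Zt_Z}$$
Setting $t_Z = 0$ (respectively $t_Y = 0$) shows that the first (respectively second) factor is the characteristic function of Y (respectively Z), and hence
$$E\left[e^{it_Y^TY + it_Z^TZ}\right] = E\left[e^{it_Y^TY}\right]E\left[e^{it_Z^TZ}\right]$$
and hence Y and Z are independent.
Repeated application of this result shows that if Y = (Y1, ..., Yn) has a multivariate normal distribution, then Y1, ..., Yn are independent iff all covariances equal 0.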
5.9 Conditional distributions. To prove the following proposition, we use the following result: suppose
W1 is a k-dimensional random vector;
W2 is an ℓ-dimensional random vector;
W1 and W2 are independent;
h is a function $h : \mathbb{R}^\ell \to \mathbb{R}^k$.
Let $V = W_1 + h(W_2)$. Then the distribution of V given W2 has density
$$f_{V|W_2}(v|w_2) = \frac{f_{(V,W_2)}(v, w_2)}{f_{W_2}(w_2)} = \frac{f_{(W_1,W_2)}(v - h(w_2), w_2)}{f_{W_2}(w_2)} = f_{W_1}(v - h(w_2))$$
(the map $(w_1, w_2) \mapsto (w_1 + h(w_2), w_2)$ has Jacobian 1, and the last equality uses the independence of W1 and W2).
In particular, if $W_1 \sim N(\mu_1, \Sigma_1)$ then the conditional distribution of $V = W_1 + h(W_2)$ given $W_2 = w_2$ has the density of the $N(\mu_1 + h(w_2), \Sigma_1)$ distribution.
Proposition(5.9a). Suppose X is an n-dimensional random vector with the non-singular N(µ, Σ) distribution, and X is partitioned into two sub-vectors:
$$\underset{n\times 1}{X} = \begin{pmatrix}\underset{k\times 1}{Y}\\ \underset{\ell\times 1}{Z}\end{pmatrix} \quad\text{where } n = k + \ell.$$
Partition µ and Σ as in equation(5.8a). Then the conditional distribution of Y given Z = z has the density of the
$$N\left(\mu_Y + A\Sigma_Z^{-1}(z - \mu_Z),\ \Sigma_Y - A\Sigma_Z^{-1}A^T\right)$$
distribution.
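Proof. Let
$$\underset{n\times n}{B} = \begin{pmatrix}\underset{k\times k}{I} & \underset{k\times\ell}{-A\Sigma_Z^{-1}}\\ \underset{\ell\times k}{0} & \underset{\ell\times\ell}{I}\end{pmatrix}$$
Note that B is invertible with inverse
$$B^{-1} = \begin{pmatrix}I & A\Sigma_Z^{-1}\\ 0 & I\end{pmatrix}$$
By proposition(5.7a), we know that $BX \sim N(B\mu, B\Sigma B^T)$, where
$$BX = \begin{pmatrix}Y - A\Sigma_Z^{-1}Z\\ Z\end{pmatrix}, \quad B\mu = \begin{pmatrix}\mu_Y - A\Sigma_Z^{-1}\mu_Z\\ \mu_Z\end{pmatrix} \quad\text{and}\quad B\Sigma B^T = \begin{pmatrix}\Sigma_Y - A\Sigma_Z^{-1}A^T & 0\\ 0 & \Sigma_Z\end{pmatrix}$$
It follows by proposition(5.8a) that $Y - A\Sigma_Z^{-1}Z$ is independent of Z. Also
$$Y - A\Sigma_Z^{-1}Z \sim N\left(\mu_Y - A\Sigma_Z^{-1}\mu_Z,\ \Sigma_Y - A\Sigma_Z^{-1}A^T\right)$$
By the remark before the statement of the proposition, applied with $W_1 = Y - A\Sigma_Z^{-1}Z$, $W_2 = Z$ and $h(z) = A\Sigma_Z^{-1}z$, the conditional distribution of Y given Z = z has the density of
$$N\left(\mu_Y + A\Sigma_Z^{-1}(z - \mu_Z),\ \Sigma_Y - A\Sigma_Z^{-1}A^T\right)$$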
Note that $A\Sigma_Z^{-1}$ is called the matrix of regression coefficients of Y on Z; it is obtained by multiplying the k × ℓ matrix consisting of the entries cov[Yi, Zj] by the ℓ × ℓ precision matrix of Z.
Similarly, the matrix of regression coefficients of Z on Y is $A^T\Sigma_Y^{-1}$.
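As a hedged illustration of proposition(5.9a), the following Python/NumPy sketch computes the regression coefficients AΣZ⁻¹, the conditional mean and the conditional variance. The helper name conditional_normal is introduced here for illustration; the numerical check uses the mean and variance matrix of exercise 4 in section 6, taking the first component as Y.

```python
import numpy as np

def conditional_normal(mu, Sigma, k, z):
    """Mean and variance of Y | Z = z when X = (Y, Z) ~ N(mu, Sigma)
    and Y consists of the first k components (proposition 5.9a)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    z = np.asarray(z, dtype=float)
    mu_Y, mu_Z = mu[:k], mu[k:]
    Sigma_Y, A, Sigma_Z = Sigma[:k, :k], Sigma[:k, k:], Sigma[k:, k:]
    # Regression coefficients A Sigma_Z^{-1}: Sigma_Z is symmetric, so solve Sigma_Z W = A^T and transpose
    coef = np.linalg.solve(Sigma_Z, A.T).T
    cond_mean = mu_Y + coef @ (z - mu_Z)
    cond_var = Sigma_Y - coef @ A.T
    return coef, cond_mean, cond_var

# Data of exercise 4 in section 6: unit variances, all covariances 1/2
Sigma = 0.5 * np.ones((5, 5)) + 0.5 * np.eye(5)
mu = [1.0, 0.0, 0.0, 0.0, 0.0]
coef, m, v = conditional_normal(mu, Sigma, 1, [1.0, 2.0, 3.0, 4.0])
print(coef)  # each coefficient is 1/5
print(m, v)  # conditional mean 1 + (1+2+3+4)/5 = 3, conditional variance 0.6
```

Solving a linear system rather than forming ΣZ⁻¹ explicitly is the usual numerically safer choice; the result is the same matrix AΣZ⁻¹.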
Let $D_Y = \Sigma_Y - A\Sigma_Z^{-1}A^T$; then $D_Y$ is a k × k invertible matrix. Similarly, let $D_Z = \Sigma_Z - A^T\Sigma_Y^{-1}A$; then $D_Z$ is an ℓ × ℓ invertible matrix.
By postmultiplying the following partitioned matrix by Σ, it is easy to check that
$$\Sigma^{-1} = \begin{pmatrix}D_Y^{-1} & -D_Y^{-1}A\Sigma_Z^{-1}\\ -D_Z^{-1}A^T\Sigma_Y^{-1} & D_Z^{-1}\end{pmatrix}$$
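The partitioned form of Σ⁻¹ can be checked numerically. The sketch below assumes NumPy, uses the variance matrix of exercise 2 in section 6 purely as a test case, and assembles Σ⁻¹ from D_Y⁻¹ and D_Z⁻¹ before comparing it with a direct inverse; the function name partitioned_precision is hypothetical.

```python
import numpy as np

def partitioned_precision(Sigma, k):
    """Assemble Sigma^{-1} from the blocks D_Y^{-1}, D_Z^{-1} defined in section 5.9."""
    Sigma = np.asarray(Sigma, dtype=float)
    S_Y, A, S_Z = Sigma[:k, :k], Sigma[:k, k:], Sigma[k:, k:]
    S_Y_inv, S_Z_inv = np.linalg.inv(S_Y), np.linalg.inv(S_Z)
    D_Y_inv = np.linalg.inv(S_Y - A @ S_Z_inv @ A.T)   # D_Y = Sigma_Y - A Sigma_Z^{-1} A^T
    D_Z_inv = np.linalg.inv(S_Z - A.T @ S_Y_inv @ A)   # D_Z = Sigma_Z - A^T Sigma_Y^{-1} A
    top = np.hstack([D_Y_inv, -D_Y_inv @ A @ S_Z_inv])
    bottom = np.hstack([-D_Z_inv @ A.T @ S_Y_inv, D_Z_inv])
    return np.vstack([top, bottom])

Sigma = np.array([[4.0, 0.0, -1.0], [0.0, 5.0, 0.0], [-1.0, 0.0, 2.0]])
P = partitioned_precision(Sigma, 1)
print(np.allclose(P, np.linalg.inv(Sigma)))  # True
```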
5.10 Transformation to independent normals: relation to the χ² distribution. Suppose X is an n-dimensional random vector with the non-singular N(µ, Σ) distribution.
From §5.2 on page 50 we know that the precision matrix $P = \Sigma^{-1} = L^TDL$ where L is orthogonal and $D = \mathrm{diag}[d_1, \dots, d_n]$ with $d_1 > 0, \dots, d_n > 0$. Hence $\Sigma = L^TD^{-1}L$ where $D^{-1} = \mathrm{diag}[1/d_1, \dots, 1/d_n]$. Hence $\Sigma = QQ^T$ where $Q = L^T\mathrm{diag}[1/\sqrt{d_1}, \dots, 1/\sqrt{d_n}]$. Because L is non-singular, it follows that Q is also non-singular.
Consider the transformation
$$Z = Q^{-1}(X - \mu)$$
Now E[Z] = 0 and
$$\mathrm{var}[Z] = Q^{-1}\Sigma(Q^{-1})^T = Q^{-1}QQ^T(Q^{-1})^T = Q^T(Q^{-1})^T = (Q^{-1}Q)^T = I.$$
Hence Z ∼ N(0, I) and Z1, ..., Zn are i.i.d. N(0, 1). Also $\Sigma^{-1} = (Q^T)^{-1}Q^{-1} = (Q^{-1})^TQ^{-1}$, so $(X - \mu)^T\Sigma^{-1}(X - \mu) = Z^TZ = \sum_{i=1}^n Z_i^2 \sim \chi^2_n$. We have shown that
If X ∼ N(µ, Σ) then $(X - \mu)^T\Sigma^{-1}(X - \mu) \sim \chi^2_n$.
We can generalize this result:
Proposition(5.10a). Suppose X is an n-dimensional random vector with the non-singular N(µ, Σ) distribution. Then the random variable $X^T\Sigma^{-1}X$ has the non-central $\chi^2_n$ distribution with non-centrality parameter $\lambda = \mu^T\Sigma^{-1}\mu$.
Proof. Take $Z = Q^{-1}X$ where $\Sigma = QQ^T$. Then $Z \sim N(\mu_Z, I)$ where $\mu_Z = Q^{-1}\mu$. Also
$$X^T\Sigma^{-1}X = X^T(Q^{-1})^TQ^{-1}X = Z^TZ = Z_1^2 + \cdots + Z_n^2$$
Hence $X^T\Sigma^{-1}X$ has the non-central $\chi^2_n$ distribution with non-centrality parameter $\mu_Z^T\mu_Z = \mu^T\Sigma^{-1}\mu$.
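The construction of Q and the non-centrality parameter can be illustrated with a short NumPy sketch. It builds Q from an eigendecomposition of the precision matrix, as in §5.10 (numpy.linalg.eigh returns P = V diag(d) Vᵀ, so L = Vᵀ), and estimates E[XᵀΣ⁻¹X] by simulation; for a non-central χ²n variable with non-centrality λ this mean is n + λ. The function names and the test values of µ and Σ are illustrative assumptions.

```python
import numpy as np

def q_from_precision(Sigma):
    """Non-singular Q with Sigma = Q Q^T, built from P = Sigma^{-1} = L^T D L as in section 5.10."""
    P = np.linalg.inv(np.asarray(Sigma, dtype=float))
    d, V = np.linalg.eigh(P)                 # P = V diag(d) V^T, so L = V^T
    return V @ np.diag(1.0 / np.sqrt(d))     # Q = L^T diag(1/sqrt(d_1), ..., 1/sqrt(d_n))

def mean_quadratic_form(mu, Sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[X^T Sigma^{-1} X] for X ~ N(mu, Sigma)."""
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(mu, Sigma, size=n_samples)
    P = np.linalg.inv(Sigma)
    return np.einsum("ij,jk,ik->i", X, P, X).mean()

Sigma = np.array([[4.0, 0.0, -1.0], [0.0, 5.0, 0.0], [-1.0, 0.0, 2.0]])
mu = np.array([1.0, 0.0, 0.0])
Q = q_from_precision(Sigma)
Q_inv = np.linalg.inv(Q)
lam = mu @ np.linalg.inv(Sigma) @ mu      # non-centrality parameter, here 2/7
est = mean_quadratic_form(mu, Sigma)
print(lam, est)                           # est should be close to 3 + lam
```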
5.11 Independence of size and shape for the multivariate lognormal. See Chapter 1: §20 on page 38.
Proposition(5.11a). Suppose X = (X1, ..., Xn) ∼ logN(µ, Σ). This means that if $Y_j = \ln(X_j)$ for j = 1, 2, ..., n then Y = (Y1, ..., Yn) ∼ N(µ, Σ). Suppose $g_1 : (0, \infty)^n \to (0, \infty)$ denotes the size variable
$$g_1(x) = (x_1\cdots x_n)^{1/n}$$
Then g1(X) is independent of every shape vector z(X) iff there exists c ∈ R such that $\mathrm{cov}[Y_j, Y_1 + \cdots + Y_n] = c$ for all j = 1, 2, ..., n.
Proof. By proposition(20.3b) on page 39, we need only prove g1(X) is independent of one shape function. Consider the shape function $z^*(x) = (1, x_2/x_1, \dots, x_n/x_1)$.
Now g1(X) is independent of z*(X) iff $(X_1\cdots X_n)^{1/n}$ is independent of $(1, X_2/X_1, \dots, X_n/X_1)$. This occurs iff $Y_1 + \cdots + Y_n$ is independent of $(Y_2 - Y_1, \dots, Y_n - Y_1)$. But the vector $(Y_1 + \cdots + Y_n, Y_2 - Y_1, \dots, Y_n - Y_1)$ is a linear transformation of Y and so is multivariate normal; hence by proposition(5.8a) on page 52, this occurs iff $\mathrm{cov}\left[Y_i - Y_1, \sum_{j=1}^n Y_j\right] = 0$ for i = 2, 3, ..., n; and this occurs iff $\mathrm{cov}\left[Y_i, \sum_{j=1}^n Y_j\right] = \mathrm{cov}\left[Y_1, \sum_{j=1}^n Y_j\right]$ for i = 2, 3, ..., n.
This leads to the following characterization of the lognormal distribution.
Proposition(5.11b). Suppose X1, X2, ..., Xn are independent positive non-degenerate random variables. Suppose $g_1 : (0, \infty)^n \to (0, \infty)$ denotes the size variable
$$g_1(x) = (x_1\cdots x_n)^{1/n}$$
Then there exists a shape vector z(X) which is independent of g1(X) iff there exists σ > 0 such that every $X_j \sim \mathrm{logN}(\mu_j, \sigma^2)$.
Proof.
⇐ Let $Y_j = \ln(X_j)$. Then $Y_j \sim N(\mu_j, \sigma^2)$; also Y1, ..., Yn are independent. Hence $\mathrm{cov}[Y_j, Y_1 + \cdots + Y_n] = \sigma^2$ for j = 1, 2, ..., n. Hence the result follows by the previous proposition.
⇒ By proposition(20.3b), if there exists one shape vector which is independent of g1(X), then all shape vectors are independent of g1(X). Hence $(1, X_2/X_1, \dots, X_n/X_1)$ is independent of $g_1(X) = (X_1\cdots X_n)^{1/n}$. Hence $Y_k - Y_1$ is independent of $Y_1 + \cdots + Y_n$ for k = 2, ..., n. Hence, by the Skitovich-Darmois theorem (see proposition(6.6b)), every Yk is normal. Because the Yk are independent, $\mathrm{cov}[Y_k, Y_1 + \cdots + Y_n] = \mathrm{var}[Y_k]$, so proposition(5.11a) shows that all the Yk have the same variance σ², and σ > 0 because the Xk are non-degenerate.
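The criterion of proposition(5.11a) is easy to check mechanically: cov[Yj, Y1 + ··· + Yn] is the j-th row sum of Σ, so the condition says that every row of Σ has the same sum c, that is, (1, ..., 1) is an eigenvector of Σ. A minimal NumPy sketch of this check follows; the function name is hypothetical and the matrices in the usage lines are illustrative.

```python
import numpy as np

def size_shape_criterion(Sigma):
    """Check the condition of proposition(5.11a) for X ~ logN(mu, Sigma).

    Row j of Sigma sums to cov[Y_j, Y_1 + ... + Y_n]; the size variable g1(X) is
    independent of every shape vector iff all these row sums equal a common c.
    Returns (condition holds, vector of row sums)."""
    row_sums = np.asarray(Sigma, dtype=float).sum(axis=1)
    return bool(np.allclose(row_sums, row_sums[0])), row_sums

print(size_shape_criterion(2.0 * np.eye(3)))                          # independent, equal variances: holds, c = 2
print(size_shape_criterion(np.diag([1.0, 2.0, 3.0])))                 # independent, unequal variances: fails
print(size_shape_criterion(0.5 * np.ones((4, 4)) + 0.5 * np.eye(4)))  # equicorrelated: holds, c = 2.5
```

The second case matches proposition(5.11b): for independent lognormals the condition holds only when the log-variances are equal.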
This result implies many others. For example, suppose X1, X2, ..., Xn are independent random variables with $X_j \sim \mathrm{logN}(\mu_j, \sigma^2)$ for j = 1, 2, ..., n. Then
$$(X_1X_2\cdots X_n)^{1/n} \text{ is independent of } \frac{X_j}{\max\{X_1, X_2, \dots, X_n\}}$$
and
$$(X_1X_2\cdots X_n)^{1/n} \text{ is independent of } \frac{X_1 + X_2 + \cdots + X_n}{\max\{X_1, X_2, \dots, X_n\}}$$
etc.
6 Exercises
1. (a) Suppose the random vector X has the N (µ, Σ) distribution. Show that X − µ ∼ N (0, Σ).
(b) Suppose X1, ..., Xn are independent with distributions $N(\mu_1, \sigma^2), \dots, N(\mu_n, \sigma^2)$ respectively. Show that the random vector $X = (X_1, \dots, X_n)^T$ has the $N(\mu, \sigma^2 I)$ distribution where $\mu = (\mu_1, \dots, \mu_n)^T$.
(c) Suppose X ∼ N (µ, Σ) where X = (X1 , . . . , Xn )T . Suppose further that X1 , . . . , Xn are uncorrelated. Show that
X1 , . . . , Xn are independent.
(d) Suppose X and Y are independent n-dimensional random vectors with X ∼ N (µX , ΣX ) and Y ∼ N (µY , ΣY ).
Show that X + Y ∼ N (µX + µY , ΣX + ΣY ).
2. Suppose X ∼ N(µ, Σ) where
$$\mu = \begin{pmatrix}-3\\ 1\\ 4\end{pmatrix} \quad\text{and}\quad \Sigma = \begin{pmatrix}4 & 0 & -1\\ 0 & 5 & 0\\ -1 & 0 & 2\end{pmatrix}$$
(a) Are (X1, X3) and X2 independent?
(b) Are X1 − X3 and X1 − 3X2 + X3 independent?
(c) Are X1 + X3 and X1 − 2X2 − 3X3 independent?
3. From linear regression. Suppose a is an n × m matrix with n ≥ m and rank(a) = m. Hence a has full rank.
(a) Show that the m × m matrix $a^Ta$ is invertible.
(b) Suppose the n-dimensional random vector X has the $N(\mu, \sigma^2I)$ distribution. Let
$$\underset{m\times n}{B} = (a^Ta)^{-1}a^T \quad\text{and}\quad Y = B\underset{n\times 1}{X}$$
Show that
$$Y \sim N\left(B\mu,\ \sigma^2(a^Ta)^{-1}\right)$$
4. Suppose the 5-dimensional random vector Z = (Y, X1, X2, X3, X4) is multivariate normal with finite expectation E[Z] = (1, 0, 0, 0, 0) and finite variance var[Z] = Σ where
$$\Sigma = \begin{pmatrix}1 & 1/2 & 1/2 & 1/2 & 1/2\\ 1/2 & 1 & 1/2 & 1/2 & 1/2\\ 1/2 & 1/2 & 1 & 1/2 & 1/2\\ 1/2 & 1/2 & 1/2 & 1 & 1/2\\ 1/2 & 1/2 & 1/2 & 1/2 & 1\end{pmatrix}$$
Show that $E[Y|X_1, X_2, X_3, X_4] = 1 + \frac{1}{5}(X_1 + X_2 + X_3 + X_4)$.
5. Suppose X = (X1, X2, X3) has a non-singular multivariate normal distribution with $E[X_j] = \mu_j$ and $\mathrm{var}[X_j] = \sigma_j^2$ for j = 1, 2 and 3. Also
$$\mathrm{corr}[X] = \begin{pmatrix}1 & \rho_{12} & \rho_{13}\\ \rho_{12} & 1 & \rho_{23}\\ \rho_{13} & \rho_{23} & 1\end{pmatrix}$$
(a) Find E[X1|(X2, X3)] and var[X1|(X2, X3)].
(b) Find E[(X1, X2)|X3] and var[(X1, X2)|X3].
6. Continuation of proposition(5.11a).
(a) Show that c ≥ 0.
(b) Show that the size variable g1 (X) is independent of every shape vector z(X) iff the n-dimensional vector (1, . . . , 1 )
is an eigenvector of Σ.
(c) Suppose c = 0. Show that $(X_1\cdots X_n)^{1/n}$ is almost surely constant.
7. Suppose X = (X1, ..., Xn) has the multivariate normal distribution with $E[X_j] = \mu_j$, $\mathrm{var}[X_j] = \sigma^2$ and $\mathrm{corr}[X_j, X_k] = \rho^{|j-k|}$ for all j and k in {1, 2, ..., n}. Hence X ∼ N(µ, Σ) where
$$\mu = \begin{pmatrix}\mu_1\\ \vdots\\ \mu_n\end{pmatrix} \quad\text{and}\quad \Sigma = \sigma^2\begin{pmatrix}1 & \rho & \rho^2 & \cdots & \rho^{n-1}\\ \rho & 1 & \rho & \cdots & \rho^{n-2}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1\end{pmatrix}$$
Show that the sequence {X1, X2, ..., Xn} forms a Markov chain.
7 The bivariate t distribution
7.1 The bivariate t-distribution with equal variances. One possible version of the bivariate t-density is
$$f_{(X,Y)}(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}}\left(1 + \frac{x^2 - 2\rho xy + y^2}{\nu(1-\rho^2)}\right)^{-(\nu+2)/2} \qquad (7.1a)$$
for ν > 0, ρ ∈ (−1, 1), x ∈ R and y ∈ R.
If x denotes the 2 × 1 vector (x, y), then an alternative expression which is equivalent to equation(7.1a) is
$$f_X(x) = \frac{\nu^{(\nu+2)/2}}{2\pi\sqrt{1-\rho^2}}\left(\nu + x^TC^{-1}x\right)^{-(\nu+2)/2} \quad\text{where}\quad C = \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix} \quad\text{and}\quad C^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix}1 & -\rho\\ -\rho & 1\end{pmatrix} \qquad (7.1b)$$
This distribution is called the tν(0, C) distribution. We shall see below in §7.3 on page 57 that C = corr[X] and X and Y have equal variances.
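The equivalence of (7.1a) and (7.1b), and the fact that the density integrates to 1, can be checked numerically. The following NumPy sketch is an illustration only: the function names are hypothetical, ν = 5 and ρ = 0.6 are arbitrary, and the integral is a simple Riemann sum over the square [−50, 50]², which omits only a negligible amount of tail mass for these values.

```python
import math
import numpy as np

def bivariate_t_71a(x, y, nu, rho):
    """Density (7.1a); x and y may be scalars or NumPy arrays."""
    q = (x**2 - 2 * rho * x * y + y**2) / (nu * (1 - rho**2))
    return (1 + q) ** (-(nu + 2) / 2) / (2 * math.pi * math.sqrt(1 - rho**2))

def bivariate_t_71b(x, y, nu, rho):
    """Density (7.1b), written with the quadratic form x^T C^{-1} x."""
    C = np.array([[1.0, rho], [rho, 1.0]])
    v = np.array([x, y], dtype=float)
    quad = v @ np.linalg.solve(C, v)
    return nu ** ((nu + 2) / 2) * (nu + quad) ** (-(nu + 2) / 2) / (2 * math.pi * math.sqrt(1 - rho**2))

nu, rho = 5.0, 0.6
print(bivariate_t_71a(0.3, -1.2, nu, rho), bivariate_t_71b(0.3, -1.2, nu, rho))  # equal up to rounding

# Riemann sum of (7.1a) over a fine grid
h = 0.1
grid = np.arange(-50.0, 50.0 + h / 2, h)
X, Y = np.meshgrid(grid, grid)
total = bivariate_t_71a(X, Y, nu, rho).sum() * h * h
print(total)  # close to 1
```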
7.2 Characterization of the bivariate tν(0, C) distribution. The univariate tν-distribution is the distribution of $Z/\sqrt{W/\nu}$ where Z ∼ N(0, 1), $W \sim \chi^2_\nu$ and Z and W are independent. The generalisation to 2 dimensions is:
Proposition(7.2a). Suppose Z = (Z1, Z2) ∼ N(0, C) where
$$C = \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix} \quad\text{and}\quad \rho \in (-1, 1)$$
Suppose further that $W \sim \chi^2_\nu$ and Z and W are independent. Define X = (X, Y) by
$$X = \frac{Z}{(W/\nu)^{1/2}}$$
Then X = (X, Y) has the tν(0, C) density given in (7.1a).
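Proof. The density of (Z1, Z2, W) is
$$f_{(Z_1,Z_2,W)}(z_1, z_2, w) = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left[-\frac{z_1^2 - 2\rho z_1z_2 + z_2^2}{2(1-\rho^2)}\right]\frac{w^{\nu/2-1}}{2^{\nu/2}\Gamma(\nu/2)}\exp\left[-\frac{w}{2}\right]$$
Consider the transformation to (X, Y, W) where $X = Z_1/\sqrt{W/\nu}$, $Y = Z_2/\sqrt{W/\nu}$ and W = W. This is a one-to-one transformation and the absolute value of the Jacobian is
$$\left|\frac{\partial(x, y, w)}{\partial(z_1, z_2, w)}\right| = \frac{\nu}{w}$$
Hence
$$f_{(X,Y,W)}(x, y, w) = \frac{w}{\nu}f(z_1, z_2, w) = \frac{w^{\nu/2}}{2^{\nu/2+1}\pi\nu\Gamma(\nu/2)\sqrt{1-\rho^2}}\exp\left[-\frac{w}{2}\left(\frac{x^2 - 2\rho xy + y^2}{\nu(1-\rho^2)} + 1\right)\right]$$
$$= \frac{w^{\nu/2}}{2^{\nu/2+1}\pi\nu\Gamma(\nu/2)\sqrt{1-\rho^2}}\exp\left[-\frac{w\alpha}{2}\right] \quad\text{where}\quad \alpha = \frac{x^2 - 2\rho xy + y^2}{\nu(1-\rho^2)} + 1 \qquad (7.2a)$$
Now using the fact that the integral of the $\chi^2_n$ density is 1 gives
$$\int_0^\infty x^{n/2-1}\exp\left[-\frac{x}{2}\right]dx = 2^{n/2}\Gamma(n/2)$$
which implies
$$\int_0^\infty t^{\nu/2}\exp\left[-\frac{t\alpha}{2}\right]dt = \left(\frac{2}{\alpha}\right)^{\nu/2+1}\Gamma\left(\frac{\nu}{2}+1\right) = \left(\frac{2}{\alpha}\right)^{\nu/2+1}\frac{\nu}{2}\Gamma\left(\frac{\nu}{2}\right)$$
Integrating the variable w out of equation(7.2a) gives
$$f_{(X,Y)}(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}}\left(1 + \frac{x^2 - 2\rho xy + y^2}{\nu(1-\rho^2)}\right)^{-(\nu+2)/2} \quad\text{for } (x, y) \in \mathbb{R}^2$$
which is equation(7.1a) above.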
7.3 Properties of the bivariate tν(0, C) distribution.
• The marginal distributions. Both X and Y have t-distributions with ν degrees of freedom. The proof of this is left to exercise 1 on page 60.
• Moments. E[X] = E[Y] = 0 and var[X] = var[Y] = ν/(ν − 2) for ν > 2. The correlation is corr[X, Y] = ρ and the covariance is cov[X, Y] = ρν/(ν − 2). The proof of these results is left to exercise 2 on page 60. It follows that
$$\mathrm{var}[X] = \frac{\nu}{\nu-2}\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix} = \frac{\nu}{\nu-2}C \quad\text{and}\quad \mathrm{corr}[X] = C$$
• If ρ = 0, then equation(7.1a) becomes
$$f_{(X,Y)}(x, y) = \frac{1}{2\pi}\left(1 + \frac{x^2 + y^2}{\nu}\right)^{-(\nu+2)/2}$$
Note that $f_{(X,Y)}(x, y) \neq f_X(x)f_Y(y)$ and hence X and Y are not independent even when ρ = 0.
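Both statements about the case ρ = 0 can be checked numerically. The pure-Python sketch below (function names hypothetical; ν = 5 is an arbitrary choice) integrates the ρ = 0 density over y by a Riemann sum, which should reproduce the univariate tν density of exercise 1, and then compares the joint density at the origin with the product of the marginal densities.

```python
import math

def t_density(t, nu):
    """Univariate t_nu density."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def bivariate_t_rho0(x, y, nu):
    """Equation (7.1a) with rho = 0."""
    return (1 + (x * x + y * y) / nu) ** (-(nu + 2) / 2) / (2 * math.pi)

def marginal_at(x, nu, h=0.01, half_width=200.0):
    """Integrate bivariate_t_rho0(x, y) over y by a Riemann sum on [-half_width, half_width]."""
    n = int(round(half_width / h))
    return h * sum(bivariate_t_rho0(x, k * h, nu) for k in range(-n, n + 1))

nu = 5.0
print(marginal_at(1.0, nu), t_density(1.0, nu))                 # the marginal is t_nu
print(bivariate_t_rho0(0.0, 0.0, nu), t_density(0.0, nu) ** 2)  # joint differs from product of marginals
```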
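7.4 Generalisation to non-equal variances. Suppose T1 = aX and T2 = bY where a ≠ 0 and b ≠ 0 and X = (X, Y) ∼ tν(0, C). Thus
$$T = \begin{pmatrix}T_1\\ T_2\end{pmatrix} = \begin{pmatrix}a & 0\\ 0 & b\end{pmatrix}X \quad\text{and}\quad \Sigma = \mathrm{var}[T] = \frac{\nu}{\nu-2}\begin{pmatrix}a^2 & ab\rho\\ ab\rho & b^2\end{pmatrix} = \frac{\nu}{\nu-2}R$$
where⁴
$$R = \begin{pmatrix}a^2 & ab\rho\\ ab\rho & b^2\end{pmatrix}, \quad R^{-1} = \frac{1}{a^2b^2(1-\rho^2)}\begin{pmatrix}b^2 & -ab\rho\\ -ab\rho & a^2\end{pmatrix} \quad\text{and}\quad |R^{-1}| = \frac{1}{a^2b^2(1-\rho^2)}$$
The absolute value of the Jacobian $\partial(t_1, t_2)/\partial(x, y)$ is |ab|. Substituting in equation(7.1a) on page 56 gives
$$f_T(t) = \frac{1}{2\pi|ab|\sqrt{1-\rho^2}}\left(1 + \frac{b^2t_1^2 - 2\rho ab\,t_1t_2 + a^2t_2^2}{\nu a^2b^2(1-\rho^2)}\right)^{-(\nu+2)/2} = \frac{1}{2\pi|R|^{1/2}}\left(1 + \frac{t^TR^{-1}t}{\nu}\right)^{-(\nu+2)/2} = \frac{\nu^{(\nu+2)/2}}{2\pi|R|^{1/2}}\left(\nu + t^TR^{-1}t\right)^{-(\nu+2)/2}$$
This is the tν(0, R) distribution. Note that $\mathrm{var}[T] = \frac{\nu}{\nu-2}R$.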
8 The multivariate t distribution
8.1 The density of the multivariate t-distribution, tν(0, I). If we put ρ = 0 in equation(7.1b) we see that if T ∼ tν(0, I) then
$$f_T(t) = \frac{\nu^{(\nu+2)/2}}{2\pi}\left(\nu + t^Tt\right)^{-(\nu+2)/2}$$
Generalizing to p dimensions leads to the following definition.
Definition(8.1a). The p-dimensional random vector $\underset{p\times 1}{T}$ has the t-distribution $t_\nu(\underset{p\times 1}{0}, \underset{p\times p}{I})$ iff T has density
$$f(t) \propto \frac{1}{(\nu + t^Tt)^{(\nu+p)/2}}$$
where ν ∈ R and ν > 2.
An alternative expression is:
$$f(t_1, \dots, t_p) = \frac{\kappa}{\left(\nu + t_1^2 + \cdots + t_p^2\right)^{(\nu+p)/2}}$$
⁴ In general, the inverse of the 2 × 2 symmetric matrix $\begin{pmatrix}a & c\\ c & b\end{pmatrix}$ is $\frac{1}{ab - c^2}\begin{pmatrix}b & -c\\ -c & a\end{pmatrix}$ provided $ab \neq c^2$.
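The constant of proportionality, κ, can be determined by integration. Integrating out tp gives
$$f(t_1, \dots, t_{p-1}) = \kappa\int_{-\infty}^{\infty}\frac{dt_p}{\left(\nu + t_1^2 + \cdots + t_p^2\right)^{(\nu+p)/2}} = 2\kappa\int_0^\infty\frac{dt_p}{\left(\nu + t_1^2 + \cdots + t_p^2\right)^{(\nu+p)/2}}$$
$$= 2\kappa\int_0^\infty\frac{dt_p}{\left(\alpha + t_p^2\right)^{(\nu+p)/2}} \quad\text{where } \alpha = \nu + t_1^2 + \cdots + t_{p-1}^2$$
$$= \frac{2\kappa}{\alpha^{(\nu+p)/2}}\int_0^\infty\frac{dt_p}{\left(1 + t_p^2/\alpha\right)^{(\nu+p)/2}}$$
$$= \frac{2\kappa\sqrt{\alpha}}{\alpha^{(\nu+p)/2}\sqrt{\nu+p-1}}\int_0^\infty\frac{dx}{\left(1 + x^2/(\nu+p-1)\right)^{(\nu+p)/2}} \quad\text{where } x = t_p\sqrt{\nu+p-1}/\sqrt{\alpha}$$
Using the standard result that
$$\int_0^\infty\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}dt = \frac{\sqrt{n}}{2}B\left(\tfrac{1}{2}, \tfrac{n}{2}\right) = \frac{\sqrt{n}\,\Gamma(1/2)\Gamma(n/2)}{2\Gamma((n+1)/2)}$$
with n = ν + p − 1 and $\Gamma(1/2) = \sqrt{\pi}$ implies
$$f(t_1, \dots, t_{p-1}) = \frac{\kappa\sqrt{\pi}\,\Gamma((\nu+p-1)/2)}{\alpha^{(\nu+p-1)/2}\,\Gamma((\nu+p)/2)} = \frac{\kappa\sqrt{\pi}\,\Gamma((\nu+p-1)/2)}{\Gamma((\nu+p)/2)}\frac{1}{\left[\nu + t_1^2 + \cdots + t_{p-1}^2\right]^{(\nu+p-1)/2}}$$
By induction
$$f(t_1) = \frac{\kappa\pi^{(p-1)/2}\,\Gamma((\nu+1)/2)}{\Gamma((\nu+p)/2)\left(\nu + t_1^2\right)^{(\nu+1)/2}}$$
and, because this must be the univariate tν density $\nu^{\nu/2}\Gamma((\nu+1)/2)\big/\left[\sqrt{\pi}\,\Gamma(\nu/2)(\nu + t_1^2)^{(\nu+1)/2}\right]$, we obtain
$$\kappa = \frac{\nu^{\nu/2}\,\Gamma((\nu+p)/2)}{\pi^{p/2}\,\Gamma(\nu/2)}$$
It follows that the density of the p-dimensional tν(0, I) is
$$f(t) = \frac{\nu^{\nu/2}\,\Gamma((\nu+p)/2)}{\pi^{p/2}\,\Gamma(\nu/2)}\frac{1}{\left(\nu + t^Tt\right)^{(\nu+p)/2}} \qquad (8.1a)$$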
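The constant κ in (8.1a) can be sanity-checked in two ways: for p = 2 it should reproduce the ρ = 0 case of (7.1b), and for p = 1 the density should integrate to 1. A small pure-Python sketch follows; the function names are hypothetical, ν = 5 is arbitrary, and the integral is a Riemann sum over [−200, 200].

```python
import math

def kappa(nu, p):
    """Normalising constant of the p-dimensional t_nu(0, I) density, equation (8.1a)."""
    return nu ** (nu / 2) * math.gamma((nu + p) / 2) / (math.pi ** (p / 2) * math.gamma(nu / 2))

def t_density_8_1a(t, nu):
    """Equation (8.1a) evaluated at t, a sequence of p coordinates."""
    p = len(t)
    return kappa(nu, p) / (nu + sum(x * x for x in t)) ** ((nu + p) / 2)

nu = 5.0
# p = 2 should reproduce the rho = 0 case of (7.1b): nu^{(nu+2)/2} / (2 pi)
print(kappa(nu, 2), nu ** ((nu + 2) / 2) / (2 * math.pi))
# p = 1: a Riemann sum of the density over [-200, 200] should be close to 1
h = 0.01
integral_p1 = h * sum(t_density_8_1a([k * h], nu) for k in range(-20000, 20001))
print(integral_p1)
```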
8.2 Characterization of the tν(0, I) distribution.
Proposition(8.2a). Suppose Z1, Z2, ..., Zp are i.i.d. with the N(0, 1) distribution and W has the $\chi^2_\nu$ distribution. Suppose further that Z = (Z1, Z2, ..., Zp) and W are independent. Define T = (T1, T2, ..., Tp) by
$$T = \frac{Z}{(W/\nu)^{1/2}}$$
Then T has the density in equation(8.1a).
Proof. See exercise 3 on page 60.
8.3 Properties of the tν(0, I) distribution.
• $T^TT/p$ has the F(p, ν) distribution; see exercise 6 on page 60.
• The contours of the distribution are ellipsoidal (the product of independent t distributions does not have this property).
• The marginal distribution of an r-dimensional subset of T has the tν(0, I) distribution, with I now the r × r identity. In particular, each Ti has the tν distribution. These results follow immediately from the characterization in §8.2.
• E[T] = 0 and $\mathrm{var}[T] = E[TT^T] = \frac{\nu}{\nu-2}I$ for ν > 2. (Because $W \sim \chi^2_\nu$ implies E[1/W] = 1/(ν − 2).) Finally, corr[T] = I.
8.4 The p-dimensional t-distribution: tν(m, C).
Here C is a real, symmetric, positive definite p × p matrix.
The Cholesky decomposition implies there exists a real non-singular L with $C = LL^T$. Let
$$\underset{p\times 1}{V} = \underset{p\times 1}{m} + \underset{p\times p}{L}\,\underset{p\times 1}{T} \quad\text{where } T \sim t_\nu(0, I)$$
Then E[V] = m and $\mathrm{var}(V) = L\,\mathrm{var}(T)L^T = \frac{\nu}{\nu-2}LL^T = \frac{\nu}{\nu-2}C$. See exercise 4 on page 60 for the proof of the result
$$T^TT = (V - m)^TC^{-1}(V - m) \qquad (8.4a)$$
It follows that V has density:
$$f(v) = \frac{\kappa}{|L|\left[\nu + (v - m)^TC^{-1}(v - m)\right]^{(\nu+p)/2}} \quad\text{where}\quad \kappa = \frac{\nu^{\nu/2}\,\Gamma((\nu+p)/2)}{\pi^{p/2}\,\Gamma(\nu/2)} \quad\text{and}\quad |L| = |C|^{1/2} \qquad (8.4b)$$
A random variable which has the density given in equation(8.4b) is said to have the tν(m, C) distribution.
Definition(8.4a). Suppose C is a real, symmetric, positive definite p × p matrix and m is a p × 1 vector in $\mathbb{R}^p$. Then the p-dimensional random vector V has the tν(m, C) distribution iff V has the density
$$f(v) \propto \frac{1}{\left[\nu + (v - m)^TC^{-1}(v - m)\right]^{(\nu+p)/2}}$$
It follows that
$$E[V] = m \quad\text{and}\quad \mathrm{var}[V] = \frac{\nu}{\nu-2}C$$
and the constant of proportionality is given in equation(8.4b).
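Equation (8.4b) translates directly into code once a Cholesky factor L of C is available, since (v − m)ᵀC⁻¹(v − m) = tᵀt with t = L⁻¹(v − m), and |L| is the product of the diagonal entries of L. The NumPy sketch below is illustrative (the name mvt_density is hypothetical); for p = 2, m = 0 and C with unit diagonal it should agree with (7.1a).

```python
import math
import numpy as np

def mvt_density(v, m, C, nu):
    """t_nu(m, C) density of equation (8.4b), using the Cholesky factor L of C (C = L L^T)."""
    v = np.asarray(v, dtype=float)
    m = np.asarray(m, dtype=float)
    C = np.asarray(C, dtype=float)
    p = len(m)
    L = np.linalg.cholesky(C)              # lower triangular with C = L L^T
    t = np.linalg.solve(L, v - m)          # t = L^{-1}(v - m), so t^T t = (v - m)^T C^{-1} (v - m)
    det_L = float(np.prod(np.diag(L)))     # |L| = |C|^{1/2}
    kappa = nu ** (nu / 2) * math.gamma((nu + p) / 2) / (math.pi ** (p / 2) * math.gamma(nu / 2))
    return kappa / (det_L * (nu + t @ t) ** ((nu + p) / 2))

nu, rho = 4.0, -0.3
print(mvt_density([0.7, 1.1], [0.0, 0.0], [[1.0, rho], [rho, 1.0]], nu))
```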
8.5 Linear transformation of the tν(m, C) distribution. Suppose T ∼ tν(m, C); thus m is the mean vector and $\frac{\nu}{\nu-2}C$ is the covariance matrix of the random vector T. Suppose V = a + AT where A is non-singular. It follows that $T = A^{-1}(V - a)$, E[V] = a + Am and $\mathrm{var}[V] = \frac{\nu}{\nu-2}ACA^T$.
Let m1 = a + Am and $C_1 = ACA^T$. Then V has the tν(m1, C1) distribution; see exercise 5 on page 60.
8.6 Characterization of the tν(m, C) distribution.
Proposition(8.6a). Suppose Z has the non-singular multivariate normal distribution N(0, Σ) and W has the $\chi^2_\nu$ distribution. Suppose further that Z and W are independent. Then $T = m + Z/(W/\nu)^{1/2}$ has the tν(m, Σ) distribution.
Proof. Because Z has a non-singular distribution, Σ is positive definite and there exists a symmetric non-singular Q with Σ = QQ. Let $Y = Q^{-1}Z$. Then $\mathrm{var}[Y] = Q^{-1}\mathrm{var}[Z](Q^{-1})^T = I$. So Y ∼ N(0, I), and Y is independent of W. Hence, by proposition(8.2a),
$$T_1 = \frac{Y}{\sqrt{W/\nu}} \sim t_\nu(0, I)$$
Now $T = m + QT_1$, so §8.5 gives $T \sim t_\nu(m, QIQ^T) = t_\nu(m, \Sigma)$ as required.
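Proposition(8.6a) also gives a simple way to simulate from tν(m, Σ). The following NumPy sketch (function name and parameter values are illustrative assumptions) draws Z and W independently and checks that the sample mean and sample covariance are close to m and ν/(ν − 2)Σ; ν = 10 is chosen so that fourth moments exist and the sample covariance is stable.

```python
import numpy as np

def sample_mvt(m, Sigma, nu, size, seed=0):
    """Draw from t_nu(m, Sigma) via proposition(8.6a): T = m + Z / sqrt(W / nu)."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m, dtype=float)
    Z = rng.multivariate_normal(np.zeros(len(m)), Sigma, size=size)   # Z ~ N(0, Sigma)
    W = rng.chisquare(nu, size=size)                                   # W ~ chi^2_nu, independent of Z
    return m + Z / np.sqrt(W / nu)[:, None]

m = [1.0, -2.0]
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
nu = 10.0
T = sample_mvt(m, Sigma, nu, 400_000)
print(T.mean(axis=0))             # close to m
print(np.cov(T, rowvar=False))    # close to nu/(nu - 2) * Sigma = 1.25 * Sigma
```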
8.7 Summary.
• The bivariate t-distribution tν(0, R). This has E[T] = 0 and $\mathrm{var}[T] = \frac{\nu}{\nu-2}R$. The density is
$$f_T(t) = \frac{\nu^{(\nu+2)/2}}{2\pi|R|^{1/2}}\left(\nu + t^TR^{-1}t\right)^{-(\nu+2)/2}$$
Particular case:
$$f_T(t) = \frac{1}{2\pi\sqrt{1-\rho^2}}\left(1 + \frac{t_1^2 - 2\rho t_1t_2 + t_2^2}{\nu(1-\rho^2)}\right)^{-(\nu+2)/2} \quad\text{where}\quad \mathrm{var}[T] = \frac{\nu}{\nu-2}\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}$$
• The p-dimensional t-distribution tν(m, R). This has E[T] = m and $\mathrm{var}[T] = \frac{\nu}{\nu-2}R$. The density is
$$f(t) = \frac{\nu^{\nu/2}\,\Gamma((\nu+p)/2)}{\pi^{p/2}\,\Gamma(\nu/2)\,|R|^{1/2}}\frac{1}{\left[\nu + (t - m)^TR^{-1}(t - m)\right]^{(\nu+p)/2}}$$
• Characterization of the t-distribution. Suppose Z ∼ N(0, Σ) and W has the $\chi^2_\nu$ distribution. Suppose further that Z and W are independent. Then $T = m + Z/(W/\nu)^{1/2}$ has the tν(m, Σ) distribution.
9 Exercises
1. Suppose T has the bivariate t-density given in equation(7.1a) on page 56. Show that both marginal distributions are the tν-distribution and hence have density given in equation(12.1b) on page 23:
$$f(t) = \frac{1}{B(1/2, \nu/2)\sqrt{\nu}}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2} \quad\text{for } t \in \mathbb{R}.$$
2. Suppose T has the bivariate t-density given in equation(7.1a) on page 56 and ν > 2.
(a) Find E[X] and var[X].
(b) Find cov[X, Y ] and corr[X, Y ].
3. Prove proposition(8.2a) on page 58: Suppose Z1, Z2, ..., Zp are i.i.d. with the N(0, 1) distribution and W has the $\chi^2_\nu$ distribution. Suppose further that Z = (Z1, Z2, ..., Zp) and W are independent. Define T = (T1, T2, ..., Tp) by $T = Z/(W/\nu)^{1/2}$. Then T has the following density
$$f(t) = \frac{\nu^{\nu/2}\,\Gamma((\nu+p)/2)}{\pi^{p/2}\,\Gamma(\nu/2)}\frac{1}{\left(\nu + t^Tt\right)^{(\nu+p)/2}}$$
4. Prove equation(8.4a) on page 59: $T^TT = (V - m)^TC^{-1}(V - m)$.
5. See §8.5 on page 59. Suppose T ∼ tν (m, C) and V = a + AT where A is non-singular. Prove that V ∼ tν (m1 , C1 ) where
m1 = a + Am and $C_1 = ACA^T$.
6. Suppose the p-variate random vector T has the tν(0, I) distribution. Show that $T^TT/p$ has the F(p, ν) distribution.
APPENDIX
Chapter 1 Section 3 on page 6
1. (a) A = £1,000 × 1.04 × (1 + V1) × (1 + V2) = £1,000 × 1.04 × (1.04 + U1)(1.04 + 2U2). Hence E[A] = £1,000 × 1.04³ = £1,124.864 or £1,124.86.
(b) For this case
$$C = \frac{1{,}000}{1.04(1.04 + U_1)(1.04 + 2U_2)} \quad\text{and}\quad E[C] = \frac{1{,}000}{1.04}E\left[\frac{1}{1.04 + U_1}\right]E\left[\frac{1}{1.04 + 2U_2}\right]$$
Now
$$E\left[\frac{1}{1.04 + U_1}\right] = 50\int_{-0.01}^{0.01}\frac{du}{1.04 + u} = 50\Big[\ln(1.04 + u)\Big]_{-0.01}^{0.01} = 50(\ln 1.05 - \ln 1.03)$$
$$E\left[\frac{1}{1.04 + 2U_2}\right] = 50\int_{-0.01}^{0.01}\frac{du}{1.04 + 2u} = \frac{50}{2}\Big[\ln(1.04 + 2u)\Big]_{-0.01}^{0.01} = 25(\ln 1.06 - \ln 1.02)$$
Hence E[C] = 889.133375744 or £889.13.
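The arithmetic can be reproduced in a few lines of Python. The sketch assumes, as the factor 50 and the limits of the integrals indicate, that U1 and U2 are independent and uniform on (−0.01, 0.01); the Monte Carlo loop is an independent cross-check added here, not part of the original answer.

```python
import math
import random

# Assumption: U1 and U2 are independent and uniform on (-0.01, 0.01),
# which is what the factor 50 and the limits of the integrals indicate.
E_A = 1000 * 1.04 ** 3
E_inv1 = 50 * (math.log(1.05) - math.log(1.03))   # E[1/(1.04 + U1)]
E_inv2 = 25 * (math.log(1.06) - math.log(1.02))   # E[1/(1.04 + 2 U2)]
E_C = 1000 / 1.04 * E_inv1 * E_inv2
print(round(E_A, 3), round(E_C, 2))   # 1124.864 889.13

# Monte Carlo cross-check of E[C]
random.seed(1)
n = 200_000
total = 0.0
for _ in range(n):
    u1 = random.uniform(-0.01, 0.01)
    u2 = random.uniform(-0.01, 0.01)
    total += 1000 / (1.04 * (1.04 + u1) * (1.04 + 2 * u2))
mc_C = total / n
print(mc_C)
```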
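2. Clearly −2a < W < 2a. For w ∈ (−2a, 2a) we have
$$f_W(w) = \int f_X(x)f_Y(w - x)\,dx$$
Now −a < x < a and −a < w − x < a; hence w − a < x < w + a. Hence
$$f_W(w) = \int_{\max(-a,-a+w)}^{\min(a,w+a)} f_X(x)f_Y(w - x)\,dx = \frac{1}{4a^2}\left[\min(a, w + a) - \max(-a, -a + w)\right]$$
$$= \begin{cases}(2a - w)/4a^2 & \text{if } w > 0\\ (w + 2a)/4a^2 & \text{if } w < 0\end{cases} = \frac{1}{2a}\left(1 - \frac{|w|}{2a}\right)$$
Figure(2a). The shape of the triangular density: it rises linearly from 0 at w = −2a to its maximum 1/2a at w = 0 and falls linearly to 0 at w = 2a.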
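3. Clearly 0 ≤ Y < 1; also
$$\left|\frac{dy}{dx}\right| = 4|x|^3 = 4y^{3/4}.$$
Hence
$$f_Y(y) = \sum_x\frac{f_X(x)}{|dy/dx|} = \sum_x\frac{f_X(x)}{4y^{3/4}} = \sum_x\frac{1}{8y^{3/4}} = \frac{1}{4y^{3/4}}$$
where the sums are over the two values $x = \pm y^{1/4}$.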
4. Now (X − 1)2 ≥ 0; hence X 2 + 1 ≥ 2X. Because X > 0 a.e., we have X + 1/X ≥ 2 a.e. Hence result.
5. For parts (a) and (b):
$$\int_0^\infty rx^{r-1}[1 - F(x)]\,dx = \int_{x=0}^\infty\int_{t=x}^\infty rx^{r-1}f(t)\,dt\,dx = \int_{t=0}^\infty\int_{x=0}^t rx^{r-1}f(t)\,dx\,dt = \int_{t=0}^\infty t^rf(t)\,dt = E[X^r]$$
6. (a) Jensen’s Inequality is as follows: suppose X is a random variable with a finite expectation and φ : R → R is a convex function. Then φ(E[X]) ≤ E[φ(X)].
In particular, suppose φ(x) = 1/x; then φ is a convex function on (0, ∞). Hence if X is a positive random variable with finite expectation, then 1/E[X] ≤ E[1/X]. Trivially, the result is still true if E[X] = ∞. Hence $E[1/S_n] \ge 1/(n\mu)$.
(b)
$$\int_0^\infty\left(E[e^{-tX}]\right)^n dt = \int_0^\infty E\left[e^{-t(X_1+\cdots+X_n)}\right]dt = E\left[\int_0^\infty e^{-t(X_1+\cdots+X_n)}\,dt\right] = E\left[\frac{1}{S_n}\right]$$
by using the Fubini-Tonelli theorem: the order of integration can be changed for a non-negative integrand.
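7. (a) The arithmetic mean-geometric mean inequality gives
$$\frac{x_1 + \cdots + x_n}{n} \ge \sqrt[n]{x_1\cdots x_n} \quad\text{for all } x_1 > 0, \dots, x_n > 0.$$
Hence
$$\frac{1}{x_1 + \cdots + x_n} \le \frac{1}{n\,x_1^{1/n}\cdots x_n^{1/n}}$$
Using independence (and identical distributions) gives
$$E\left[\frac{1}{S_n}\right] \le \frac{1}{n}\left(E\left[\frac{1}{X_1^{1/n}}\right]\right)^n$$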
Bayesian Time Series Analysis by R.J. Reed
Now for x > 0 we have
$$\frac{1}{x^{1/n}} \le \begin{cases}1/x & \text{if } 0 < x \le 1;\\ 1 & \text{if } x \ge 1.\end{cases}$$
Hence E[1/Sn] is finite.
(b) Because they have identical distributions, E[X1/Sn] = ··· = E[Xn/Sn]. Hence
$$1 = E\left[\frac{S_n}{S_n}\right] = E\left[\frac{X_1 + \cdots + X_n}{S_n}\right] = nE\left[\frac{X_1}{S_n}\right] \quad\text{and hence}\quad E\left[\frac{S_j}{S_n}\right] = E\left[\frac{X_1 + \cdots + X_j}{S_n}\right] = jE\left[\frac{X_1}{S_n}\right] = \frac{j}{n}$$
8. (a) Recall $|\mathrm{cov}[X_1, X_2]| \le \sqrt{\mathrm{var}[X_1]\mathrm{var}[X_2]}$; hence cov[X/Y, Y] is finite. Hence E[X] = E[X/Y]E[Y]. Also $E[(X/Y)X] = E[(X^2/Y^2)Y] = E[X^2/Y^2]E[Y]$ because $X^2/Y^2$ is independent of Y. Hence
$$0 = \mathrm{cov}[X/Y, X] = E[(X/Y)X] - E[X/Y]E[X] = E[X^2/Y^2]E[Y] - \{E[X/Y]\}^2E[Y] = \mathrm{var}[X/Y]E[Y]$$
As E[Y] > 0, it follows that var[X/Y] = 0 as required.
(b) Clearly ln(Y/X) = ln(Y) − ln(X) is independent of ln(X). Using characteristic functions, $\phi_{\ln(Y/X)}(t)\phi_{\ln(X)}(t) = \phi_{\ln(Y)}(t)$. Also ln(X/Y) = ln(X) − ln(Y) is independent of ln(Y). Hence $\phi_{\ln(X/Y)}(t)\phi_{\ln(Y)}(t) = \phi_{\ln(X)}(t)$. Hence $\phi_{\ln(Y/X)}(t)\phi_{\ln(X/Y)}(t) = 1$. But for any characteristic function |φ(t)| ≤ 1. Hence $|\phi_{\ln(Y/X)}(t)| = 1$ everywhere. This implies¹ ln(Y/X) is constant almost everywhere and this establishes the result.
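9. Now
$$E\left[(Y - \hat{Y})^2\right] = E\left[\left(Y - E(Y|X) + E(Y|X) - \hat{Y}\right)^2\right] = E\left[\left(Y - E(Y|X)\right)^2\right] + 2E\left[\left(Y - E(Y|X)\right)\left(E(Y|X) - \hat{Y}\right)\right] + E\left[\left(E(Y|X) - \hat{Y}\right)^2\right]$$
By equation(1.1a) on page 3 and the law of total expectation, the first term is E[var(Y|X)]. Applying the law of total expectation to the second term, and using the fact that the predictor $\hat{Y}$ is a function of X, gives
$$2E\left[\left(Y - E(Y|X)\right)\left(E(Y|X) - \hat{Y}\right)\right] = 2E\left\{E\left[\left(Y - E(Y|X)\right)\left(E(Y|X) - \hat{Y}\right)\Big|X\right]\right\} = 2E\left\{\left(E(Y|X) - \hat{Y}\right)\times 0\right\} = 0$$
Hence
$$E\left[(Y - \hat{Y})^2\right] = E[\mathrm{var}(Y|X)] + E\left[\left(E(Y|X) - \hat{Y}\right)^2\right]$$
which is minimized when $\hat{Y} = E(Y|X)$.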
10. (a) For the second result, just use E(XY|X) = XE(Y|X) = aX + bX² and take expectations. Clearly cov[X, Y] = b var[X]. Then E(Y|X) = a + bX = a + bµX + b(X − µX) = µY + b(X − µX). Then use b = cov(X, Y)/var(X) = ρσY/σX.
(b) $\mathrm{var}[E(Y|X)] = b^2\sigma_X^2 = \rho^2\sigma_Y^2$. Finally
$$E\left[\left(Y - E(Y|X)\right)^2\right] = E\left[\left(Y - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X)\right)^2\right] = \sigma_Y^2 + \rho^2\sigma_Y^2 - 2\rho\frac{\sigma_Y}{\sigma_X}\mathrm{cov}[X, Y] = \sigma_Y^2 + \rho^2\sigma_Y^2 - 2\rho^2\sigma_Y^2 = (1 - \rho^2)\sigma_Y^2$$
as required.
(c) We have µX = c + dµY and µY = a + bµX. Hence µX = (c + ad)/(1 − bd) and µY = (a + bc)/(1 − bd). Now E[XY] = cµY + dE[Y²] and E[XY] = aµX + bE[X²]. Hence cov[X, Y] = d var[Y] and cov[X, Y] = b var[X]. Hence $\sigma_Y^2/\sigma_X^2 = b/d$. Finally $\rho = \mathrm{cov}[X, Y]/(\sigma_X\sigma_Y) = d\sigma_Y/\sigma_X$ and hence $\rho^2 = d^2b/d = bd$.
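11. Let $g(a, b) = E\left[(Y - a - bX)^2\right] = E[Y^2] - 2a\mu_Y + a^2 - 2bE[XY] + b^2E[X^2] + 2ab\mu_X$. Hence we need to solve
$$\frac{\partial g(a, b)}{\partial a} = -2\mu_Y + 2a + 2b\mu_X = 0 \quad\text{and}\quad \frac{\partial g(a, b)}{\partial b} = -2E[XY] + 2bE[X^2] + 2a\mu_X = 0$$
This gives
$$b = \frac{E[XY] - \mu_X\mu_Y}{E[X^2] - \mu_X^2} = \rho\frac{\sigma_Y}{\sigma_X} \quad\text{and}\quad a = \mu_Y - b\mu_X = \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X$$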
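12. (a)
$$f_X(x) = \int_0^1\frac{6}{7}(x + y)^2\,dy = \frac{2}{7}\left[(x + 1)^3 - x^3\right] = \frac{2}{7}(3x^2 + 3x + 1) \quad\text{for } x \in [0, 1].$$
Similarly
$$f_Y(y) = \frac{2}{7}(3y^2 + 3y + 1) \quad\text{for } y \in [0, 1].$$
Hence
$$f_{X|Y}(x|y) = \frac{3(x + y)^2}{3y^2 + 3y + 1} \quad\text{and}\quad f_{Y|X}(y|x) = \frac{3(x + y)^2}{3x^2 + 3x + 1} \quad\text{for } x \in [0, 1] \text{ and } y \in [0, 1]$$
and so the best predictor of Y is
$$E[Y|X = x] = \frac{3}{3x^2 + 3x + 1}\int_0^1(x^2 + 2xy + y^2)y\,dy = \frac{3}{3x^2 + 3x + 1}\left[\frac{x^2y^2}{2} + \frac{2xy^3}{3} + \frac{y^4}{4}\right]_0^1 = \frac{3}{3x^2 + 3x + 1}\left(\frac{x^2}{2} + \frac{2x}{3} + \frac{1}{4}\right) = \frac{6x^2 + 8x + 3}{4(3x^2 + 3x + 1)}$$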
¹ See, for example, pages 18–19 in [Lukacs(1970)] and exercise 4 on page 298 in [Ash(2000)].
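(b) Now $\mu_X = \mu_Y = \frac{9}{14}$, $E[X^2] = \frac{2}{7}\int_0^1(3x^4 + 3x^3 + x^2)\,dx = \frac{2}{7}\left[\frac{3x^5}{5} + \frac{3x^4}{4} + \frac{x^3}{3}\right]_{x=0}^{x=1} = \frac{101}{210}$ and
$$\sigma_X^2 = \sigma_Y^2 = E[X^2] - \left(\frac{9}{14}\right)^2 = \frac{101}{210} - \frac{81}{196} = \frac{199}{2940}.$$
Also
$$E[XY] = \frac{6}{7}\int_0^1\int_0^1 xy(x + y)^2\,dx\,dy = \frac{6}{7}\int_{y=0}^1 y\left(\int_{x=0}^1 x(x + y)^2\,dx\right)dy = \frac{6}{7}\int_{y=0}^1 y\left(\frac{1}{4} + \frac{2y}{3} + \frac{y^2}{2}\right)dy = \frac{6}{7}\left(\frac{1}{8} + \frac{2}{9} + \frac{1}{8}\right) = \frac{6}{7}\cdot\frac{17}{36} = \frac{17}{42}$$
Hence $\mathrm{cov}[X, Y] = \frac{17}{42} - \frac{81}{196} = -\frac{5}{588}$ and $\rho = -\frac{5}{588}\times\frac{2940}{199} = -\frac{25}{199}$. Hence the best linear predictor is
$$\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X) = \frac{9}{14}(1 - \rho) + \rho X = \frac{144}{199} - \frac{25}{199}X$$
(c) See figure(12a) below.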
Figure(12a). Plot of best predictor (solid line) and best linear predictor (dashed line) for exercise 12 (horizontal axis x from 0.0 to 1.0; vertical axis from 0.60 to 0.75).
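The exact fractions in parts (a) and (b) can be confirmed mechanically. The sketch below, using Python's fractions module, integrates monomials against the joint density (6/7)(x + y)² on the unit square (the density implied by the marginal fX in part (a)); the function names are introduced here for illustration.

```python
from fractions import Fraction

def moment(i, j):
    """E[X^i Y^j] when (X, Y) has density (6/7)(x + y)^2 on the unit square (exact arithmetic)."""
    def mono(a, b):  # integral of x^a y^b over [0, 1]^2
        return Fraction(1, a + 1) * Fraction(1, b + 1)
    # (x + y)^2 x^i y^j = x^(i+2) y^j + 2 x^(i+1) y^(j+1) + x^i y^(j+2)
    return Fraction(6, 7) * (mono(i + 2, j) + 2 * mono(i + 1, j + 1) + mono(i, j + 2))

mu_X, mu_Y = moment(1, 0), moment(0, 1)
var_X = moment(2, 0) - mu_X ** 2
var_Y = moment(0, 2) - mu_Y ** 2
cov_XY = moment(1, 1) - mu_X * mu_Y
# var_X == var_Y here, so rho = cov / sqrt(var_X var_Y) reduces to the exact fraction cov / var_X
rho = cov_XY / var_X
slope = cov_XY / var_X             # best linear predictor: intercept + slope * X
intercept = mu_Y - slope * mu_X

def best_predictor(x):
    """E[Y | X = x] = (6x^2 + 8x + 3) / (4(3x^2 + 3x + 1)) from part (a)."""
    x = Fraction(x)
    return (6 * x ** 2 + 8 * x + 3) / (4 * (3 * x ** 2 + 3 * x + 1))

print(mu_X, var_X, moment(1, 1), rho, intercept, slope)  # 9/14 199/2940 17/42 -25/199 144/199 -25/199
```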
13. (a) Using equation(2.3a) on page 4 gives the density
$$f(x) = \frac{n!}{(j-1)!(n-j)!}x^{j-1}(1 - x)^{n-j}.$$
Recall $B(j, n - j + 1) = \frac{\Gamma(j)\Gamma(n-j+1)}{\Gamma(n+1)} = \frac{(j-1)!(n-j)!}{n!}$. Hence the density of Xj:n is
$$f(x) = \frac{1}{B(j, n-j+1)}x^{j-1}(1 - x)^{n-j} \quad\text{for } x \in (0, 1)$$
which is the Beta(j, n − j + 1) distribution.
(b) E[Xj:n] = j/(n + 1) by using the standard result for the expectation of a Beta distribution.
14. ⇐ The joint density of (X1:2, X2:2) is $g(y_1, y_2) = 2f(y_1)f(y_2) = 2\lambda^2e^{-\lambda(y_1+y_2)}$ for $0 < y_1 < y_2$. Now consider the transformation to (W, Y) = (X2:2 − X1:2, X1:2). The absolute value of the Jacobian is $\left|\frac{\partial(w, y)}{\partial(y_1, y_2)}\right| = |-1| = 1$. Hence
$$f_{(W,Y)}(w, y) = 2\lambda^2e^{-\lambda(w+y+y)} = 2\lambda e^{-2\lambda y}\,\lambda e^{-\lambda w} = f_Y(y)f_W(w)$$
where the density of X1:2 is $f_Y(y) = 2\lambda e^{-2\lambda y}$. The fact that the joint density is the product of the marginal densities implies W and Y are independent.
⇒ $P[X_{2:2} - X_{1:2} > y\,|\,X_{1:2} = x] = P[X_{2:2} > x + y\,|\,X_{1:2} = x] = \frac{1 - F(x + y)}{1 - F(x)}$ and this is independent of x. Taking x = 0 gives 1 − F(x + y) = (1 − F(x))(1 − F(y)) and F is continuous. Hence there exists λ > 0 with $F(x) = 1 - e^{-\lambda x}$.
15. (a) By equation(2.2a) on page 4, the density of the vector (X1:n, X2:n, ..., Xn:n) is $g(x_1, \dots, x_n) = n!f(x_1)\cdots f(x_n)$ for $0 \le x_1 \le x_2 \le \cdots \le x_n$. The transformation to (Y1, Y2, ..., Yn) has Jacobian with absolute value
$$\left|\frac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)}\right| = \frac{1}{y_1^{n-1}}$$
Hence for $y_1 \ge 0$ and $1 \le y_2 \le \cdots \le y_n$, the density of the vector (Y1, Y2, ..., Yn) is
$$h(y_1, \dots, y_n) = n!\,y_1^{n-1}f(y_1)f(y_1y_2)f(y_1y_3)\cdots f(y_1y_n)$$
(b) Integrating $y_n$ from $y_{n-1}$ to ∞ gives
$$h(y_1, \dots, y_{n-1}) = n!\,y_1^{n-2}f(y_1)f(y_1y_2)f(y_1y_3)\cdots f(y_1y_{n-1})\left[1 - F(y_1y_{n-1})\right]$$
Then integrating $y_{n-1}$ over $y_{n-2}$ to ∞ gives
$$h(y_1, \dots, y_{n-2}) = n!\,y_1^{n-3}f(y_1)f(y_1y_2)f(y_1y_3)\cdots f(y_1y_{n-2})\frac{\left[1 - F(y_1y_{n-2})\right]^2}{2}$$
and by induction
$$h(y_1, y_2) = n!\,y_1f(y_1)f(y_1y_2)\frac{\left[1 - F(y_1y_2)\right]^{n-2}}{(n - 2)!} \quad\text{and}\quad h(y_1) = nf(y_1)\left[1 - F(y_1)\right]^{n-1}$$
as required.
Chapter 1 Section 5 on page 11
1. (a) Integrating by parts gives
$$\Gamma(x + 1) = \int_0^\infty u^xe^{-u}\,du = \left[-u^xe^{-u}\right]_0^\infty + x\int_0^\infty u^{x-1}e^{-u}\,du = x\Gamma(x).$$
(b) $\Gamma(1) = \int_0^\infty e^{-u}\,du = 1$.
(c) Use parts (a) and (b) and induction.
(d) Use the transformation $u = t^2/2$; hence
$$\Gamma(1/2) = \int_0^\infty u^{-1/2}e^{-u}\,du = \sqrt{2}\int_0^\infty e^{-t^2/2}\,dt = \sqrt{\pi}.$$
The final equality follows because the integral over (−∞, ∞) of the standard normal density is 1.
(e) Use induction. For n = 1 we have $\Gamma(3/2) = \sqrt{\pi}/2$ which is the right hand side. Now
$$\Gamma(n + 1 + 1/2) = (n + 1/2)\Gamma(n + 1/2) = \frac{2n + 1}{2}\Gamma(n + 1/2) = \frac{1\cdot 3\cdot 5\cdots(2n - 1)\cdot(2n + 1)}{2^{n+1}}\sqrt{\pi}$$
as required.
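Part (e) can be spot-checked against the standard library's math.gamma. The helper below is a hypothetical name, and math.prod requires Python 3.8 or later.

```python
import math

def gamma_half_integer(n):
    """Gamma(n + 1/2) = 1*3*5*...*(2n-1) * sqrt(pi) / 2^n, as in exercise 1(e)."""
    odd_product = math.prod(range(1, 2 * n, 2))   # 1*3*...*(2n-1); the empty product is 1 when n = 0
    return odd_product * math.sqrt(math.pi) / 2 ** n

for n in range(6):
    print(n, gamma_half_integer(n), math.gamma(n + 0.5))
```

The n = 0 case reproduces Γ(1/2) = √π from part (d).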
2. E[Y] = n/α and E[1/X] = α/(m − 1) provided m > 1. Hence E[Y/X] = n/(m − 1) provided m > 1.
3. $f_X(x) = x^{n-1}e^{-x}/\Gamma(n)$. At x = n − 1, $f_X(x) = (n-1)^{n-1}e^{-(n-1)}/\Gamma(n)$. Hence the result by Stirling’s formula.
4. The simplest way is to use the moment generating function: $M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}]E[e^{tY}] = 1/(1 - t/\alpha)^{n_1+n_2}$ which is the mgf of a Gamma(n1 + n2, α) distribution. Alternatively,
$$f_{X+Y}(t) = \int_0^t f_X(t - u)f_Y(u)\,du = \frac{\alpha^{n_1+n_2}e^{-\alpha t}}{\Gamma(n_1)\Gamma(n_2)}\int_{u=0}^t(t - u)^{n_1-1}u^{n_2-1}\,du$$
$$= \frac{\alpha^{n_1+n_2}t^{n_1+n_2-1}e^{-\alpha t}}{\Gamma(n_1)\Gamma(n_2)}\int_{w=0}^1(1 - w)^{n_1-1}w^{n_2-1}\,dw = \frac{\alpha^{n_1+n_2}t^{n_1+n_2-1}e^{-\alpha t}}{\Gamma(n_1)\Gamma(n_2)}\frac{\Gamma(n_1)\Gamma(n_2)}{\Gamma(n_1 + n_2)} = \frac{\alpha^{n_1+n_2}t^{n_1+n_2-1}e^{-\alpha t}}{\Gamma(n_1 + n_2)}$$
where we have used the transformation w = u/t.
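5. (a) Clearly u > 0 and 0 < v < 1; also x = uv and y = u(1 − v). Hence
$$\left|\frac{\partial(x, y)}{\partial(u, v)}\right| = u \quad\text{and}\quad f_{(U,V)}(u, v) = \frac{\alpha^{n+m}u^{n+m-1}e^{-\alpha u}v^{m-1}(1 - v)^{n-1}}{\Gamma(n)\Gamma(m)} = f_U(u)f_V(v)$$
where U ∼ Gamma(n + m, α) and V ∼ Beta(m, n).
(b) Clearly u > 0 and v > 0; also x = u/(1 + v) and y = uv/(1 + v). Hence $\left|\frac{\partial(u, v)}{\partial(x, y)}\right| = \frac{(1 + v)^2}{u}$ and
$$f_{(U,V)}(u, v) = \frac{u}{(1 + v)^2}\cdot\frac{\alpha^{n+m}u^{n+m-2}e^{-\alpha u}v^{n-1}}{(1 + v)^{m+n-2}\Gamma(n)\Gamma(m)} = \frac{\alpha^{n+m}u^{n+m-1}e^{-\alpha u}v^{n-1}}{(1 + v)^{m+n}\Gamma(n)\Gamma(m)} = \frac{\alpha^{n+m}u^{n+m-1}e^{-\alpha u}}{\Gamma(n + m)}\cdot\frac{\Gamma(n + m)}{\Gamma(n)\Gamma(m)}\frac{v^{n-1}}{(1 + v)^{m+n}} = f_U(u)f_V(v)$$
where U ∼ Gamma(n + m, α) and 1/(1 + V) ∼ Beta(m, n).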
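6. Now, for t ≥ 0, $\int e^{tX}\,dP \ge \int_{\{X \ge x\}}e^{tX}\,dP \ge e^{tx}P[X \ge x]$. Hence, for all x ≥ 0 and all 0 ≤ t < α we have $P[X \ge x] \le e^{-tx}E[e^{tX}]$. Hence
$$P[X \ge x] \le \inf_{0 \le t < \alpha}e^{-tx}E[e^{tX}] = \alpha^n\inf_{0 \le t < \alpha}\frac{e^{-tx}}{(\alpha - t)^n}$$
By differentiation, the infimum occurs at t = α − n/x (for x ≥ n/α). Hence
$$P[X \ge x] \le \alpha^n\frac{e^{n-\alpha x}}{(n/x)^n} \quad\text{and}\quad P\left[X \ge \frac{2n}{\alpha}\right] \le 2^ne^{-n} \text{ as required.}$$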
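For integer n the Gamma(n, α) tail has the closed form e^{−αx} Σ_{k<n} (αx)^k/k! (the Erlang tail), so the bound of exercise 6 can be compared with the exact probability. This is an illustrative sketch with hypothetical function names; α = 1.5 is arbitrary, since P[X ≥ 2n/α] does not depend on α.

```python
import math

def gamma_tail(n, alpha, x):
    """P[X >= x] for X ~ Gamma(n, alpha) with integer n: e^{-alpha x} * sum_{k<n} (alpha x)^k / k!."""
    ax = alpha * x
    term, total = 1.0, 0.0
    for k in range(n):
        total += term            # term equals (alpha x)^k / k!
        term *= ax / (k + 1)
    return math.exp(-ax) * total

def chernoff_bound(n):
    """The bound 2^n e^{-n} on P[X >= 2n/alpha] from exercise 6."""
    return (2 / math.e) ** n

alpha = 1.5
for n in (1, 5, 20):
    print(n, gamma_tail(n, alpha, 2 * n / alpha), chernoff_bound(n))  # exact tail, then the bound
```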
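7. Let V = X + Y. Then
$$f_{X|V}(x|v) = \frac{f_{(X,V)}(x, v)}{f_V(v)} = \frac{f_{(X,Y)}(x, v - x)}{f_V(v)} = \frac{\alpha(\alpha x)^{m-1}e^{-\alpha x}}{\Gamma(m)}\cdot\frac{\alpha(\alpha(v - x))^{n-1}e^{-\alpha(v-x)}}{\Gamma(n)}\cdot\frac{\Gamma(m + n)}{\alpha(\alpha v)^{m+n-1}e^{-\alpha v}} = \frac{\Gamma(m + n)}{\Gamma(m)\Gamma(n)}\frac{x^{m-1}(v - x)^{n-1}}{v^{m+n-1}} \quad\text{for } 0 < x < v.$$
Hence for v > 0 we have
$$E[X|X + Y = v] = \frac{\Gamma(m + n)}{\Gamma(m)\Gamma(n)}\int_{x=0}^v\frac{x^m(v - x)^{n-1}}{v^{m+n-1}}\,dx = \frac{\Gamma(m + n)}{\Gamma(m)\Gamma(n)}\,v\int_{t=0}^1 t^m(1 - t)^{n-1}\,dt = \frac{\Gamma(m + n)}{\Gamma(m)\Gamma(n)}\frac{\Gamma(m + 1)\Gamma(n)}{\Gamma(m + n + 1)}\,v = \frac{mv}{m + n}$$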
8.
$$f_{(X|Y_1,\dots,Y_n)}(x|y_1, \dots, y_n) \propto f(y_1, \dots, y_n, x) = f(y_1, \dots, y_n|x)f_X(x) = \prod_{i=1}^n xe^{-xy_i}\,\lambda e^{-\lambda x} = \lambda x^ne^{-x(\lambda+y_1+\cdots+y_n)}$$
Using the integral of the Gamma density gives
$$\int_0^\infty\lambda x^ne^{-x(\lambda+y_1+\cdots+y_n)}\,dx = \frac{\lambda\,n!}{(\lambda + y_1 + \cdots + y_n)^{n+1}}$$
and hence
$$f_{(X|Y_1,\dots,Y_n)}(x|y_1, \dots, y_n) = \frac{x^ne^{-x(\lambda+y_1+\cdots+y_n)}(\lambda + y_1 + \cdots + y_n)^{n+1}}{n!}$$
which is the Gamma(n + 1, λ + y1 + ··· + yn) distribution. Hence E[X|Y1 = y1, ..., Yn = yn] = (n + 1)/(λ + y1 + ··· + yn).
9. Now $\phi_X(t) = E[e^{itX}] = \alpha/(\alpha - it)$. Using Y = αX − 1 gives $\phi_Y(t) = E[e^{itY}] = e^{-it}E[e^{it\alpha X}] = e^{-it}/(1 - it)$. Hence $|\phi_Y(t)| = 1/(1 + t^2)^{1/2}$. Choose k > 2 and then $|\phi_Y(t)|^k \le 1/(1 + t^2)$ which is integrable. Also, for |t| ≥ δ we have $|\phi_Y(t)| \le 1/(1 + \delta^2)^{1/2} < 1$ for all δ > 0. It follows that $S_n/\sqrt{n}$ has a bounded continuous density $f_n$ which satisfies $\lim_{n\to\infty}f_n(z) = n(z)$, where n(z) denotes the standard normal density. Now $S_n/\sqrt{n} = (\alpha G_n - n)/\sqrt{n}$. Hence
$$f_n(z) = \frac{\sqrt{n}}{\alpha}f_{G_n}\left(\frac{n + z\sqrt{n}}{\alpha}\right)$$
and hence the required result.
10. (a) Use the transformation y = λxb . Then
Z ∞
Z
∞
f (x) dx =
0
(b) Straightforward.
0
y n−1 e−y
dy = 1
Γ(n)
(c) Use the transformation y = xb .
Chapter 1 Section 7 on page 15
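1. Using the transformation $v = -t$ gives
$$\Phi(-x) = \int_{-\infty}^{-x}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{t^2}{2}\right)dt = -\int_{\infty}^{x}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{v^2}{2}\right)dv = \int_x^\infty\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{v^2}{2}\right)dv = 1-\Phi(x)$$
2. $(140-\mu)/\sigma = \Phi^{-1}(0.3) = -0.5244005$ and $(200-\mu)/\sigma = 0.2533471$. Hence $\sigma = 77.14585$ and $\mu = 180.4553$.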
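A quick numerical check in Python. The value 0.2533471 equals $\Phi^{-1}(0.6)$, so the sketch assumes the exercise specifies $P[X < 140] = 0.3$ and $P[X < 200] = 0.6$; it uses the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

z1 = NormalDist().inv_cdf(0.3)  # (140 - mu)/sigma
z2 = NormalDist().inv_cdf(0.6)  # assumed: (200 - mu)/sigma = Phi^{-1}(0.6)

# Solve the two linear equations 140 = mu + z1*sigma and 200 = mu + z2*sigma.
sigma = (200 - 140) / (z2 - z1)
mu = 200 - z2 * sigma
print(z1, z2, sigma, mu)
```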
3. We can take $Y = 0$ with probability $1/2$ and $Y = Z$ with probability $1/2$, where $Z \sim N(0,1)$. Hence $E[Y^n] = \frac{1}{2}E[Z^n] = 0$ if $n$ is odd and $\frac{1}{2}E[Z^n] = n!/\left(2^{(n+2)/2}(n/2)!\right)$ if $n$ is even.
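4. (a) Clearly we must have $c > 0$; also $a > 0$ is necessary in order to ensure $f_X(x)$ can integrate to 1. Now
$$Q(x) = a\left(x^2 - \frac{b}{a}x\right) = a\left(x-\frac{b}{2a}\right)^2 - \frac{b^2}{4a}$$
and hence
$$f_X(x) = c\exp\left(-\frac{b^2}{4a}\right)\exp\left(-a\left(x-\frac{b}{2a}\right)^2\right) = c\exp\left(-\frac{b^2}{4a}\right)\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
where $\mu = \frac{b}{2a}$ and $\sigma^2 = \frac{1}{2a}$. Because $f_X(x)$ integrates to 1, we must also have $c\exp\left(-\frac{b^2}{4a}\right) = \frac{1}{\sigma\sqrt{2\pi}}$. This answers (a).
(b) $X \sim N\left(\frac{b}{2a}, \sigma^2 = \frac{1}{2a}\right)$.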
5. (a) Clearly $X/\sigma$ and $Y/\sigma$ are i.i.d. with the $N(0,1)$ distribution. Hence $(X^2+Y^2)/\sigma^2 \sim \chi^2_2 =$ Gamma$(1, 1/2)$, which is the exponential$(1/2)$ distribution with density $\frac{1}{2}e^{-x/2}$ for $x > 0$. Hence $X^2+Y^2 \sim$ exponential$(1/(2\sigma^2))$ with expectation $2\sigma^2$.
(b) Clearly $X_1/\sigma, \dots, X_n/\sigma$ are i.i.d. with the $N(0,1)$ distribution. Hence $Z^2/\sigma^2 \sim \chi^2_n =$ Gamma$(n/2, 1/2)$. Hence $Z^2 \sim$ Gamma$(n/2, 1/(2\sigma^2))$.
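6.
$$f_{(X|Y_1,\dots,Y_n)}(x|y_1,\dots,y_n) \propto f(y_1,\dots,y_n,x) = f(y_1,\dots,y_n|x)f_X(x) = \prod_{i=1}^n\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(y_i-x)^2}{2\sigma^2}\right)\,\frac{1}{\sigma_1\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma_1^2}\right)$$
$$\propto \exp\left(-\frac{\sum_{i=1}^n y_i^2}{2\sigma^2} + \frac{2x\sum_{i=1}^n y_i}{2\sigma^2} - \frac{nx^2}{2\sigma^2} - \frac{x^2}{2\sigma_1^2} + \frac{2\mu x}{2\sigma_1^2} - \frac{\mu^2}{2\sigma_1^2}\right)$$
$$\propto \exp\left(\frac{x\sum_{i=1}^n y_i}{\sigma^2} - \frac{nx^2}{2\sigma^2} - \frac{x^2}{2\sigma_1^2} + \frac{\mu x}{\sigma_1^2}\right) = \exp\left(-\frac{\alpha x^2}{2} + \beta x\right) \quad\text{where } \alpha = \frac{n}{\sigma^2} + \frac{1}{\sigma_1^2} \text{ and } \beta = \frac{\mu}{\sigma_1^2} + \frac{\sum_{i=1}^n y_i}{\sigma^2}$$
$$\propto \exp\left(-\frac{\alpha}{2}\left(x - \beta/\alpha\right)^2\right)$$
Hence the distribution of $(X|Y_1=y_1,\dots,Y_n=y_n)$ is $N(\beta/\alpha, \sigma^2 = 1/\alpha)$. Note that $\alpha$, the precision of the result, is the sum of the $(n+1)$ precisions. Also, the mean is a weighted average of the input means $\mu$ and $\bar{y} = \sum_{i=1}^n y_i/n$:
$$\frac{\beta}{\alpha} = \frac{\mu\,\sigma^2/n + \bar{y}\,\sigma_1^2}{\sigma^2/n + \sigma_1^2}$$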
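7. (a) $f_Y(y) = \frac{2}{\sigma\sqrt{2\pi}}e^{-y^2/2\sigma^2} = \frac{1}{\sigma}\sqrt{\frac{2}{\pi}}\,e^{-y^2/2\sigma^2}$ for $y \in (0,\infty)$.
(b) $E[|X|] = \frac{1}{\sigma}\sqrt{\frac{2}{\pi}}\int_0^\infty xe^{-x^2/2\sigma^2}\,dx = \frac{1}{\sigma}\sqrt{\frac{2}{\pi}}\left[-\sigma^2e^{-x^2/2\sigma^2}\right]_0^\infty = \frac{1}{\sigma}\sqrt{\frac{2}{\pi}}\,\sigma^2 = \sigma\sqrt{\frac{2}{\pi}}$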
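8. (a) $F_Y(x) = P[Y \le x] = P[-x \le X \le x] = F_X(x) - F_X(-x)$. Hence $f_Y(x) = f_X(x) + f_X(-x)$; hence
$$f_Y(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\left[\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) + \exp\left(-\frac{(x+\mu)^2}{2\sigma^2}\right)\right] = \sqrt{\frac{2}{\pi\sigma^2}}\exp\left(-\frac{x^2+\mu^2}{2\sigma^2}\right)\cosh\left(\frac{\mu x}{\sigma^2}\right)$$
for $x > 0$, by using $\cosh x = \frac{1}{2}(e^x + e^{-x})$.
(b)
$$E[Y] = \frac{1}{\sqrt{2\pi\sigma^2}}\left[\int_0^\infty x\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)dx + \int_0^\infty x\exp\left(-\frac{(x+\mu)^2}{2\sigma^2}\right)dx\right]$$
$$= \frac{1}{\sqrt{2\pi\sigma^2}}\left[\int_0^\infty(x-\mu)\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)dx + \int_0^\infty(x+\mu)\exp\left(-\frac{(x+\mu)^2}{2\sigma^2}\right)dx\right] + A$$
$$= \frac{1}{\sqrt{2\pi\sigma^2}}\left[\sigma^2\exp\left(-\frac{\mu^2}{2\sigma^2}\right) + \sigma^2\exp\left(-\frac{\mu^2}{2\sigma^2}\right)\right] + A$$
where
$$A = \frac{\mu}{\sqrt{2\pi\sigma^2}}\left[\int_0^\infty\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)dx - \int_0^\infty\exp\left(-\frac{(x+\mu)^2}{2\sigma^2}\right)dx\right] = \frac{\mu\sigma}{\sqrt{2\pi\sigma^2}}\left[\int_{-\mu/\sigma}^\infty e^{-y^2/2}\,dy - \int_{\mu/\sigma}^\infty e^{-y^2/2}\,dy\right]$$
$$= \mu\left[\Phi\left(\frac{\mu}{\sigma}\right) - \Phi\left(-\frac{\mu}{\sigma}\right)\right] = \mu\left[1 - 2\Phi\left(-\frac{\mu}{\sigma}\right)\right]$$
Hence
$$E[Y] = \sigma\sqrt{\frac{2}{\pi}}\exp\left(-\frac{\mu^2}{2\sigma^2}\right) + \mu\left[1 - 2\Phi\left(-\frac{\mu}{\sigma}\right)\right]$$
Clearly $\text{var}[Y] = \text{var}[|X|] = E[X^2] - \{E[|X|]\}^2 = \text{var}[X] + \{E[X]\}^2 - \mu_Y^2 = \sigma^2 + \mu^2 - \mu_Y^2$, where $\mu_Y = E[Y]$.
(c) The mgf, $E[e^{tY}]$, is
$$\frac{1}{\sqrt{2\pi\sigma^2}}\left[\int_0^\infty\exp\left(tx - \frac{(x-\mu)^2}{2\sigma^2}\right)dx + \int_0^\infty\exp\left(tx - \frac{(x+\mu)^2}{2\sigma^2}\right)dx\right]$$
$$= \exp\left(\frac{\sigma^2t^2}{2} + \mu t\right)\frac{1}{\sqrt{2\pi\sigma^2}}\int_0^\infty\exp\left(-\frac{(x-\mu-\sigma^2t)^2}{2\sigma^2}\right)dx + \exp\left(\frac{\sigma^2t^2}{2} - \mu t\right)\frac{1}{\sqrt{2\pi\sigma^2}}\int_0^\infty\exp\left(-\frac{(x+\mu-\sigma^2t)^2}{2\sigma^2}\right)dx$$
$$= \exp\left(\frac{\sigma^2t^2}{2} + \mu t\right)\left[1 - \Phi\left(-\frac{\mu}{\sigma} - \sigma t\right)\right] + \exp\left(\frac{\sigma^2t^2}{2} - \mu t\right)\left[1 - \Phi\left(\frac{\mu}{\sigma} - \sigma t\right)\right]$$
Hence the cf is
$$\phi_Y(t) = E[e^{itY}] = \exp\left(-\frac{\sigma^2t^2}{2} + i\mu t\right)\left[1 - \Phi\left(-\frac{\mu}{\sigma} - i\sigma t\right)\right] + \exp\left(-\frac{\sigma^2t^2}{2} - i\mu t\right)\left[1 - \Phi\left(\frac{\mu}{\sigma} - i\sigma t\right)\right]$$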
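9.
$$E\left[|X-\mu|^n\right] = \frac{1}{\sqrt{2\pi\sigma^2}}\left[\int_\mu^\infty(x-\mu)^n\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)dx + \int_{-\infty}^\mu(\mu-x)^n\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)dx\right] = \frac{2}{\sqrt{2\pi}}\int_0^\infty t^n\sigma^n\exp\left(-\frac{t^2}{2}\right)dt$$
$$= \frac{\sigma^n}{\sqrt{2\pi}}\int_0^\infty v^{(n-1)/2}\exp(-v/2)\,dv = \frac{\sigma^n}{\sqrt{2\pi}}\,2^{(n+1)/2}\,\Gamma\left(\frac{n+1}{2}\right) = \frac{2^{n/2}\sigma^n}{\sqrt{\pi}}\,\Gamma\left(\frac{n+1}{2}\right)$$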
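The closed form can be checked against direct numerical integration of the first line above. The Python sketch below is illustrative (the helper names and $\sigma = 1.7$ are arbitrary); it also confirms the special cases $n = 1, 2, 4$, which give $\sigma\sqrt{2/\pi}$, $\sigma^2$ and $3\sigma^4$.

```python
import math

def abs_moment_formula(n, sigma):
    """E|X - mu|^n = 2^(n/2) sigma^n Gamma((n+1)/2) / sqrt(pi) for X ~ N(mu, sigma^2)."""
    return 2 ** (n / 2) * sigma**n * math.gamma((n + 1) / 2) / math.sqrt(math.pi)

def abs_moment_numeric(n, sigma, upper=40.0, steps=100000):
    """Midpoint rule for (2/sqrt(2 pi)) * integral_0^upper (sigma t)^n exp(-t^2/2) dt."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += (sigma * t) ** n * math.exp(-t * t / 2)
    return 2 / math.sqrt(2 * math.pi) * total * h

sigma = 1.7
for n in range(1, 7):
    print(n, abs_moment_formula(n, sigma), abs_moment_numeric(n, sigma))
```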
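10. (a) Now
$$E[e^{itZ_1+isZ_2}] = E[e^{i(t+s)X}]E[e^{i(t-s)Y}] = e^{-\frac{1}{2}(t+s)^2}e^{-\frac{1}{2}(t-s)^2} = e^{-s^2}e^{-t^2} = E[e^{itZ_1}]E[e^{isZ_2}].$$
(b) Let $X_1 = \frac{X+Y}{2}$. Then $X_1 \sim N(0, \sigma^2 = 1/2)$. Let $X_2 = \frac{X-Y}{2}$. Then $X_2 \sim N(0, \sigma^2 = 1/2)$. Let $Z_1 = 2X_1^2$ and $Z_2 = 2X_2^2$. Now $X_1$ and $X_2$ are independent by part (a); hence $Z_1$ and $Z_2$ are independent. Hence $Z_1$ and $Z_2$ are i.i.d. $\chi^2_1 =$ Gamma$(1/2, 1/2)$ with c.f. $1/\sqrt{1-2it}$. Hence $-Z_2$ has the c.f. $1/\sqrt{1+2it}$. Because $Z_1$ and $Z_2$ are independent, the c.f. of $2XY = Z_1 - Z_2$ is $1/\sqrt{1+4t^2}$. Hence the c.f. of $XY$ is $1/\sqrt{1+t^2}$.
(c)
$$E[e^{itXY}] = \int_{-\infty}^\infty E[e^{ityX}]f_Y(y)\,dy = \int_{-\infty}^\infty E[e^{ityX}]\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}y^2}\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-\frac{1}{2}t^2y^2}e^{-\frac{1}{2}y^2}\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-\frac{1}{2}y^2(1+t^2)}\,dy = \frac{1}{\sqrt{1+t^2}}$$
(d) Now $X = \sigma X_1$ and $Y = \sigma Y_1$ where the c.f. of $X_1Y_1$ is $1/\sqrt{1+t^2}$. Hence the c.f. of $XY$ is $1/\sqrt{1+\sigma^4t^2}$.
(e) Take $\sigma = 1$. Then the m.g.f. is
$$E[e^{tXY}] = \int_{-\infty}^\infty E[e^{tyX}]f_Y(y)\,dy = \int_{-\infty}^\infty E[e^{tyX}]\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(y-\mu)^2}\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{\mu ty+\frac{1}{2}t^2y^2}e^{-\frac{1}{2}(y-\mu)^2}\,dy = \frac{e^{-\mu^2/2}}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-\frac{1}{2}y^2(1-t^2)+\mu y(1+t)}\,dy$$
$$= \frac{e^{-\mu^2/2}}{\sqrt{2\pi}}\int_{-\infty}^\infty\exp\left(-\frac{1-t^2}{2}\left(y^2 - \frac{2\mu y}{1-t}\right)\right)dy = \exp\left(-\frac{\mu^2}{2} + \frac{\mu^2(1+t)}{2(1-t)}\right)\frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty\exp\left(-\frac{1-t^2}{2}\left(y - \frac{\mu}{1-t}\right)^2\right)dy$$
$$= \frac{1}{\sqrt{1-t^2}}\exp\left(\frac{\mu^2t}{1-t}\right)$$
for $|t| < 1$. For the general case $E[e^{tXY}] = E[e^{t\sigma^2X_1Y_1}]$ where $X_1$ and $Y_1$ are i.i.d. $N(\mu/\sigma, 1)$ and hence
$$E[e^{tXY}] = \frac{1}{\sqrt{1-\sigma^4t^2}}\exp\left(\frac{\mu^2t}{1-\sigma^2t}\right) \quad\text{and the c.f. is}\quad E[e^{itXY}] = \frac{1}{\sqrt{1+\sigma^4t^2}}\exp\left(\frac{i\mu^2t}{1-i\sigma^2t}\right)$$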
11. Use the previous question. In both cases, the c.f. is 1/(1 + t2 ).
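12. (a) Now
$$\int_0^\infty\exp\left(-\frac{1}{2}\left(u^2+\frac{b^2}{u^2}\right)\right)du = \frac{1}{2}\int_0^\infty\left[\left(1-\frac{b}{u^2}\right) + \left(1+\frac{b}{u^2}\right)\right]\exp\left(-\frac{1}{2}\left(u^2+\frac{b^2}{u^2}\right)\right)du$$
Consider the integral
$$I_1 = \int_0^\infty\left(1+\frac{b}{u^2}\right)\exp\left(-\frac{1}{2}\left(u^2+\frac{b^2}{u^2}\right)\right)du$$
The transformation $u \to z$ with $z = u - \frac{b}{u}$ is a 1-1 transformation $(0,\infty) \to (-\infty,\infty)$. Also $\frac{dz}{du} = 1 + \frac{b}{u^2}$. Hence
$$I_1 = e^{-b}\int_{-\infty}^\infty e^{-\frac{1}{2}z^2}\,dz = e^{-b}\sqrt{2\pi}$$
Now consider the integral
$$I_2 = \int_0^\infty\left(1-\frac{b}{u^2}\right)\exp\left(-\frac{1}{2}\left(u^2+\frac{b^2}{u^2}\right)\right)du$$
Consider the transformation $z = u + \frac{b}{u}$. This is a 1-1 transformation $(0,\sqrt{b}) \to (\infty, 2\sqrt{b})$ and a 1-1 transformation $(\sqrt{b},\infty) \to (2\sqrt{b},\infty)$. Hence
$$I_2 = \left(\int_0^{\sqrt{b}} + \int_{\sqrt{b}}^\infty\right)\left(1-\frac{b}{u^2}\right)\exp\left(-\frac{1}{2}\left(u^2+\frac{b^2}{u^2}\right)\right)du = \int_\infty^{2\sqrt{b}}e^be^{-\frac{1}{2}z^2}\,dz + \int_{2\sqrt{b}}^\infty e^be^{-\frac{1}{2}z^2}\,dz = 0$$
as required.
(b) Just use the transformation $u = |a|v$ in part (a) and then set $b_1 = b/|a|$.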
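Adding the two pieces, part (a) gives $\int_0^\infty\exp\left(-\frac{1}{2}\left(u^2+\frac{b^2}{u^2}\right)\right)du = \frac{1}{2}(I_1+I_2) = \sqrt{\pi/2}\,e^{-b}$ for $b > 0$. The Python sketch below checks this numerically with the midpoint rule; the truncation point and step count are illustrative choices.

```python
import math

def integral_numeric(b, upper=30.0, steps=100000):
    """Midpoint rule for the integral over (0, upper) of exp(-(u^2 + b^2/u^2)/2)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += math.exp(-0.5 * (u * u + b * b / (u * u)))
    return total * h

for b in (0.5, 1.0, 2.0):
    print(b, integral_numeric(b), math.sqrt(math.pi / 2) * math.exp(-b))
```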
Chapter 1 Section 9 on page 18
1. Let $S_4$ denote the accumulated value at time $t = 4$ and let $s_0$ denote the initial amount invested. Then $S_4 = s_0(1+I_1)(1+I_2)(1+I_3)(1+I_4)$ and $\ln(S_4/s_0) = \sum_{j=1}^4\ln(1+I_j)$.
Recall that if $Y \sim$ lognormal$(\mu,\sigma^2)$ then $Z = \ln Y \sim N(\mu,\sigma^2)$. Also $E[Y] = E[e^Z] = e^{\mu+\frac{1}{2}\sigma^2}$ and $\text{var}[Y] = e^{2\mu+\sigma^2}(e^{\sigma^2}-1)$. Hence $e^{\sigma^2} = 1 + \text{var}[Y]/E[Y]^2$ and $e^\mu = E[Y]/\sqrt{1+\text{var}[Y]/E[Y]^2}$, or $\mu = \ln E[Y] - \sigma^2/2$.
Using mean $= 1.08$ and variance $= 0.001$ gives $\mu_1 = 0.0765325553785$ and $\sigma_1^2 = 0.000856971515297$.
Using mean $= 1.06$ and variance $= 0.002$ gives $\mu_2 = 0.0573797028389$ and $\sigma_2^2 = 0.00177841057009$.
Hence $\ln(S_4/s_0) \sim N(2\mu_1+2\mu_2, 2\sigma_1^2+2\sigma_2^2) = N(0.267824516435, 0.00527076417077)$. We want
$$0.95 = P[S_4 > 5000] = P[\ln(S_4/s_0) > \ln(5000/s_0)] = P\left[Z > \frac{\ln(5000/s_0) - (2\mu_1+2\mu_2)}{\sqrt{2\sigma_1^2+2\sigma_2^2}}\right]$$
where now $Z \sim N(0,1)$. Hence
$$0.05 = \Phi\left(\frac{\ln(5000/s_0) - (2\mu_1+2\mu_2)}{\sqrt{2\sigma_1^2+2\sigma_2^2}}\right) \quad\text{and so}\quad \frac{\ln(5000/s_0) - (2\mu_1+2\mu_2)}{\sqrt{2\sigma_1^2+2\sigma_2^2}} = \Phi^{-1}(0.05)$$
Hence $\ln(5000/s_0) = (2\mu_1+2\mu_2) + \Phi^{-1}(0.05)\sqrt{2\sigma_1^2+2\sigma_2^2} = 0.148408095871$.
Hence $s_0 = 5000\exp\left(-(2\mu_1+2\mu_2) - \Phi^{-1}(0.05)\sqrt{2\sigma_1^2+2\sigma_2^2}\right) = 4310.39616086$, or £4,310.40.
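The arithmetic above can be reproduced with the Python standard library. The sketch assumes, as the answer does, that two of the four yearly growth factors have mean 1.08 and variance 0.001, the other two have mean 1.06 and variance 0.002, and all four are independent; `lognormal_params` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def lognormal_params(mean, var):
    """Return (mu, sigma^2) of the lognormal distribution with the given mean and variance."""
    sigma2 = math.log(1 + var / mean**2)
    mu = math.log(mean) - sigma2 / 2
    return mu, sigma2

mu1, s1 = lognormal_params(1.08, 0.001)
mu2, s2 = lognormal_params(1.06, 0.002)
m = 2 * mu1 + 2 * mu2             # mean of ln(S_4/s_0)
sd = math.sqrt(2 * s1 + 2 * s2)   # standard deviation of ln(S_4/s_0)
z = NormalDist().inv_cdf(0.05)    # Phi^{-1}(0.05)
s0 = 5000 * math.exp(-m - z * sd)
print(mu1, s1, mu2, s2, s0)
```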
2. (a) Let $Z = 1/X$. Then $\ln(Z) = -\ln(X) \sim N(-\mu,\sigma^2)$. Hence $Z \sim$ logN$(-\mu,\sigma^2)$.
(b) Let $Z = cX^b$. Then $\ln(Z) = \ln(c) + b\ln(X) \sim N(\ln(c)+b\mu, b^2\sigma^2)$. Hence $Z \sim$ logN$(\ln(c)+b\mu, b^2\sigma^2)$.
3. (a) Now $X \sim$ logN$(\mu,\sigma^2)$; hence $\ln(X) \sim N(\mu,\sigma^2)$. Hence $\ln(GM_X) = E[\ln(X)] = \mu$; hence $GM_X = e^\mu$.
(b) Now $\ln(GV_X) = \text{var}[\ln(X)] = \sigma^2$. Hence $GV_X = e^{\sigma^2}$ and $GSD_X = e^\sigma$.
4. (a) The median is $e^\mu$ because $P[X < e^\mu] = P[\ln(X) < \mu] = 1/2$. Hence the median equals the geometric mean. The mean is $e^{\mu+\frac{1}{2}\sigma^2}$ by equation(8.3a) on page 17. For the mode, we need to differentiate the density function, which is
$$f_X(x) = \frac{c}{x}\exp\left(-\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right) \quad\text{for } x > 0.$$
Hence
$$\frac{df_X(x)}{dx} = \left[-\frac{c}{x^2} - \frac{c}{x}\,\frac{2(\ln(x)-\mu)}{2x\sigma^2}\right]\exp\left(-\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right)$$
which equals 0 when $x = e^{\mu-\sigma^2}$. Clearly mode $= e^{\mu-\sigma^2} <$ median $= e^\mu <$ mean $= e^{\mu+\frac{1}{2}\sigma^2}$.
(b) Lower quartile: $0.25 = P[X < q_1] = P[\ln(X) < \ln(q_1)] = \Phi\left(\frac{\ln(q_1)-\mu}{\sigma}\right)$ and hence $\frac{\ln(q_1)-\mu}{\sigma} = -0.6744898$ and hence $q_1 = e^{\mu-0.6744898\sigma}$. Similarly for the upper quartile, $q_3 = e^{\mu+0.6744898\sigma}$.
(c) $p = P[X \le \alpha_p] = P[\ln(X) \le \ln(\alpha_p)] = \Phi\left(\frac{\ln(\alpha_p)-\mu}{\sigma}\right)$. Hence $\ln(\alpha_p) = \mu + \sigma\beta_p$ as required.
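5. (a) Let $Z = X_1\cdots X_n$. Then $\ln(Z) = \sum_{i=1}^n\ln(X_i) \sim N\left(\sum_{i=1}^n\mu_i, \sum_{i=1}^n\sigma_i^2\right)$. Hence $Z \sim$ logN$\left(\sum_{i=1}^n\mu_i, \sum_{i=1}^n\sigma_i^2\right)$.
(b) Let $Z = (X_1\cdots X_n)^{1/n}$. Then $\ln(Z) = \sum_{i=1}^n\ln(X_i)/n \sim N(\mu,\sigma^2/n)$. Hence $Z \sim$ logN$(\mu,\sigma^2/n)$.
(c) Let $Z = \prod_{i=1}^n X_i^{a_i}$. Then $\ln(Z) = \sum_{i=1}^n a_i\ln(X_i) \sim N\left(\sum_{i=1}^n a_i\mu_i, \sum_{i=1}^n a_i^2\sigma_i^2\right)$. Hence $m_n = \sum_{i=1}^n a_i\mu_i$ and $s_n = \sqrt{\sum_{i=1}^n a_i^2\sigma_i^2}$.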
6. Let $Z = X_1/X_2$. Then $Z \sim$ logN$(\mu_1-\mu_2, \sigma_1^2+\sigma_2^2)$ by using the previous 2 questions.
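7. We know that $\alpha = e^{\mu+\frac{1}{2}\sigma^2}$ and $\beta = e^{2\mu+\sigma^2}(e^{\sigma^2}-1)$. Hence
$$\sigma^2 = \ln\left(1+\frac{\beta}{\alpha^2}\right) \quad\text{and}\quad e^\mu = \frac{\alpha}{\sqrt{1+\beta/\alpha^2}} = \frac{\alpha^2}{\sqrt{\beta+\alpha^2}} \quad\text{or}\quad \mu = \ln\frac{\alpha^2}{\sqrt{\beta+\alpha^2}}$$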
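8. Now for $x \in (0,k)$ we have
$$f(x|X<k) = \frac{1}{P[X<k]\,\sigma x\sqrt{2\pi}}\exp\left(-\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right) = \frac{1}{x\alpha}\exp\left(-\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right) \quad\text{where } \alpha = \sigma\sqrt{2\pi}\,\Phi\left(\frac{\ln(k)-\mu}{\sigma}\right)$$
and hence
$$E[X|X<k] = \frac{1}{\alpha}\int_0^k\exp\left(-\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right)dx$$
Using the transformation $w = (\ln(x)-\mu-\sigma^2)/\sigma$ gives $\frac{dw}{dx} = \frac{1}{x\sigma}$ and $(\ln(x)-\mu)^2/\sigma^2 = (w+\sigma)^2$. Hence
$$E[X|X<k] = \frac{1}{\alpha}\int_{-\infty}^{(\ln(k)-\mu-\sigma^2)/\sigma}\exp\left(-\frac{(w+\sigma)^2}{2}\right)x\sigma\,dw = \frac{\sigma}{\alpha}\int_{-\infty}^{(\ln(k)-\mu-\sigma^2)/\sigma}\exp\left(-\frac{w^2}{2} + \frac{\sigma^2}{2} + \mu\right)dw = e^{\mu+\frac{\sigma^2}{2}}\,\frac{\Phi\left(\frac{\ln(k)-\mu-\sigma^2}{\sigma}\right)}{\Phi\left(\frac{\ln(k)-\mu}{\sigma}\right)}$$
The other result is similar, or use $E[X|X<k]P[X<k] + E[X|X>k]P[X>k] = E[X] = e^{\mu+\frac{1}{2}\sigma^2}$.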
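The truncated mean can be checked numerically. Writing $W = \ln(X) \sim N(\mu,\sigma^2)$, we have $E[X|X<k] = E[e^W|W<\ln(k)]$, which the Python sketch below evaluates by the midpoint rule and compares with the closed form; the values $\mu = 0.3$, $\sigma = 0.6$ and the cut-off points $k$ are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf

def truncated_mean_formula(mu, sigma, k):
    """E[X | X < k] for X ~ logN(mu, sigma^2), from the answer above."""
    return (math.exp(mu + sigma**2 / 2)
            * Phi((math.log(k) - mu - sigma**2) / sigma)
            / Phi((math.log(k) - mu) / sigma))

def truncated_mean_numeric(mu, sigma, k, steps=100000):
    """E[e^W | W < ln k] with W ~ N(mu, sigma^2), by the midpoint rule in w."""
    lo, hi = mu - 12 * sigma, math.log(k)  # mass below mu - 12 sigma is negligible
    h = (hi - lo) / steps
    num = den = 0.0
    for j in range(steps):
        w = lo + (j + 0.5) * h
        weight = math.exp(-((w - mu) ** 2) / (2 * sigma**2))  # normal density up to a constant
        num += math.exp(w) * weight
        den += weight
    return num / den

for k in (0.8, 2.0, 5.0):
    print(k, truncated_mean_formula(0.3, 0.6, k), truncated_mean_numeric(0.3, 0.6, k))
```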
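9. (a)
$$G(x) = \exp\left(-j\mu - \frac{1}{2}j^2\sigma^2\right)\int_0^x u^j\,\frac{1}{u\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln(u)-\mu)^2}{2\sigma^2}\right)du = \int_0^x\frac{1}{u\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln(u)-\mu-j\sigma^2)^2}{2\sigma^2}\right)du$$
as required. Setting $j = 1$ in part (a) shows that $xf_X(x) = E[X]f_{X_1}(x)$ where $X_1 \sim$ logN$(\mu+\sigma^2,\sigma^2)$.
(b)
$$2E[X]\gamma_X = \int_{u=0}^\infty\int_{v=0}^u(u-v)f_X(u)f_X(v)\,dv\,du + \int_{u=0}^\infty\int_{v=u}^\infty(v-u)f_X(u)f_X(v)\,dv\,du$$
$$= \int_{u=0}^\infty\int_{v=0}^u(u-v)f_X(u)f_X(v)\,dv\,du + \int_{v=0}^\infty\int_{u=0}^v(v-u)f_X(u)f_X(v)\,du\,dv = 2\int_{u=0}^\infty\int_{v=0}^u(u-v)f_X(u)f_X(v)\,dv\,du$$
$$= 2\int_{u=0}^\infty uF_X(u)f_X(u)\,du - 2\int_{u=0}^\infty\left[\int_{v=0}^u vf_X(v)\,dv\right]f_X(u)\,du$$
$$= 2E[X]\left[\int_{u=0}^\infty F_X(u)f_{X_1}(u)\,du - \int_{u=0}^\infty F_{X_1}(u)f_X(u)\,du\right] \quad\text{where } X_1 \sim \text{logN}(\mu+\sigma^2,\sigma^2)$$
$$= 2E[X]\left(P[X \le X_1] - P[X_1 \le X]\right) = 2E[X]\left(P[X/X_1 \le 1] - P[X_1/X \le 1]\right)$$
where $X$ and $X_1$ are taken to be independent. But $X/X_1 \sim$ logN$(-\sigma^2, 2\sigma^2)$ and $P[X_1/X \le 1] = P[X \ge X_1] = 1 - P[X < X_1]$. Hence
$$\gamma_X = 2P[Y \le 1] - 1 \quad\text{where } Y \sim \text{logN}(-\sigma^2, 2\sigma^2)$$
$$= 2\Phi\left(\frac{\sigma}{\sqrt{2}}\right) - 1$$
as required.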
Chapter 1 Section 11 on page 22
1. $X$ has density $f_X(x) = \alpha x^{\alpha-1}$. Let $Y = -\ln(X)$. Then $Y \in (0,\infty)$ and $X = e^{-Y}$ and $\frac{dx}{dy} = -x$. Hence $f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = \alpha e^{-y(\alpha-1)} \times e^{-y} = \alpha e^{-\alpha y}$ for $y \in (0,\infty)$, as required.
2. Let $Y = X/(1-X)$; then $Y \in (0,\infty)$. Also $X = Y/(1+Y)$, $1-X = 1/(1+Y)$ and $\frac{dy}{dx} = (1+y)^2$. Hence
$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}\,\frac{1}{(1+y)^2} = \frac{1}{B(\alpha,\beta)}\,\frac{y^{\alpha-1}}{(1+y)^{\alpha+\beta}} \quad\text{for } y \in (0,\infty)$$
as required.
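3. (a) Suppose $\alpha > 0$ and $\beta > 0$. Then for all $x > 0$ we have
$$f_X(x) = \frac{x^{\alpha-1}}{B(\alpha,\beta)(1+x)^{\alpha+\beta}}$$
Hence, provided $\beta > 1$,
$$E[X] = \int_0^\infty\frac{x^\alpha}{B(\alpha,\beta)(1+x)^{\alpha+\beta}}\,dx = \frac{B(\alpha+1,\beta-1)}{B(\alpha,\beta)} = \frac{\alpha}{\beta-1}$$
using $\int_0^\infty x^\alpha(1+x)^{-\alpha-\beta}\,dx = B(\alpha+1,\beta-1)$ for all $\alpha > -1$ and $\beta > 1$.
(b) Similarly, for all $\beta > 2$ we have
$$E[X^2] = \int_0^\infty\frac{x^{\alpha+1}}{B(\alpha,\beta)(1+x)^{\alpha+\beta}}\,dx = \frac{B(\alpha+2,\beta-2)}{B(\alpha,\beta)} = \frac{\alpha(\alpha+1)}{(\beta-1)(\beta-2)}$$
Hence $\text{var}[X] = E[X^2] - \{E[X]\}^2 = \frac{\alpha(\alpha+\beta-1)}{(\beta-1)^2(\beta-2)}$.
(c) Suppose $\alpha \le 1$. Then $f_X(x)\uparrow$ as $x\downarrow$ and the mode is at 0. Now suppose $\alpha > 1$ and let $g(x) = x^{\alpha-1}/(1+x)^{\alpha+\beta}$. Then $g'(x) = g(x)\left(\alpha-1-x(1+\beta)\right)/\left(x(1+x)\right)$. Hence $g' > 0$ for $x < (\alpha-1)/(1+\beta)$ and $g' < 0$ when $x > (\alpha-1)/(1+\beta)$. So the mode is at $(\alpha-1)/(1+\beta)$.
(d) Let $Y = 1/X$; hence $\left|\frac{dy}{dx}\right| = 1/x^2$. Hence
$$f_Y(y) = f_X(x)\Big/\left|\frac{dy}{dx}\right| = x^2f_X(x) = \frac{x^{\alpha+1}}{B(\alpha,\beta)(1+x)^{\alpha+\beta}} = \frac{y^{\beta-1}}{B(\beta,\alpha)(1+y)^{\alpha+\beta}}$$
as required. (Note that $B(\alpha,\beta) = B(\beta,\alpha)$.)
(e) Let $V = X/Y$ and $W = Y$; then $\left|\frac{\partial(v,w)}{\partial(x,y)}\right| = \frac{1}{y}$. Now
$$f_{(X,Y)}(x,y) = \frac{x^{n_1-1}y^{n_2-1}e^{-(x+y)}}{\Gamma(n_1)\Gamma(n_2)} \quad\text{and}\quad f_{(V,W)}(v,w) = \frac{f_{(X,Y)}(x,y)}{\left|\frac{\partial(v,w)}{\partial(x,y)}\right|} = \frac{x^{n_1-1}y^{n_2}e^{-(x+y)}}{\Gamma(n_1)\Gamma(n_2)} = \frac{v^{n_1-1}w^{n_1+n_2-1}e^{-w(1+v)}}{\Gamma(n_1)\Gamma(n_2)}$$
and hence
$$f_V(v) = \frac{v^{n_1-1}}{\Gamma(n_1)\Gamma(n_2)}\int_{w=0}^\infty w^{n_1+n_2-1}e^{-w(1+v)}\,dw = \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}\,\frac{v^{n_1-1}}{(1+v)^{n_1+n_2}}$$
as required.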
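A numerical check of parts (a) and (b). The Python sketch below uses the substitution $x = t/(1-t)$, which turns $\int_0^\infty x^kf_X(x)\,dx$ into $\int_0^1 t^{\alpha+k-1}(1-t)^{\beta-k-1}\,dt/B(\alpha,\beta)$, and evaluates that by the midpoint rule; the values $\alpha = 2.5$, $\beta = 4.5$ are illustrative.

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b) computed via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def beta_prime_moment(k, a, b, steps=200000):
    """E[X^k] for X ~ Beta'(a, b) by the midpoint rule after substituting x = t/(1-t)."""
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += t ** (a + k - 1) * (1 - t) ** (b - k - 1)
    return total * h / beta_fn(a, b)

a, b = 2.5, 4.5  # b > 2 is needed for the variance to exist
mean = beta_prime_moment(1, a, b)
var = beta_prime_moment(2, a, b) - mean**2
print(mean, a / (b - 1))
print(var, a * (a + b - 1) / ((b - 1) ** 2 * (b - 2)))
```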
(f) Just use X/Y = (2X)/(2Y ) and part (e).
4. Throughout $x \in [0,1]$ and we take $\arcsin(x) \in [0,\pi/2]$.
Let $y = \arcsin(x)$; then $\sin(y) = x$ and $\sin(-y) = -x$. Hence $\arcsin(-x) = -\arcsin(x)$.
Let $y = \frac{\pi}{2} - \arccos(x)$. Then $\sin(y) = x$ and hence $\arccos(x) + \arcsin(x) = \frac{\pi}{2}$.
Now $\sin(y) = x$ and $1 - 2x^2 = \cos^2(y) - \sin^2(y) = \cos(2y)$; hence $\frac{1}{2}\arccos(1-2x^2) = y = \arcsin(x)$.
Combining gives $2\arcsin(\sqrt{x}) = \arccos(1-2x) = \frac{\pi}{2} - \arcsin(1-2x) = \frac{\pi}{2} + \arcsin(2x-1)$.
5. (a) Let $Y = kX + m$. Hence
$$f_Y(y) = \frac{f_X(x)}{\left|\frac{dy}{dx}\right|} = \frac{f_X(x)}{k} = \frac{1}{\pi\sqrt{(y-m-ak)(bk+m-y)}}$$
the density of an arcsine$(m+ak, m+bk)$ distribution.
(b) Let $Y = X^2$; then $Y \in (0,1)$. Also
$$f_Y(y) = 2\frac{f_X(x)}{\left|\frac{dy}{dx}\right|} = 2\frac{f_X(x)}{2|x|} = \frac{f_X(x)}{|x|} = \frac{1}{|x|\pi\sqrt{1-x^2}} = \frac{1}{\pi\sqrt{y(1-y)}}$$
as required.
(c) Let $Y = \sin(X)$. Then $Y \in (-1,1)$. Also
$$f_Y(y) = 2\frac{f_X(x)}{\left|\frac{dy}{dx}\right|} = 2\frac{f_X(x)}{|\cos(x)|} = \frac{2}{2\pi|\cos(x)|} = \frac{1}{\pi\sqrt{1-y^2}} = \frac{1}{\pi\sqrt{(1-y)(1+y)}}$$
as required.
Let $Y = \sin(2X)$. Then $Y \in (-1,1)$. Also
$$f_Y(y) = 4\frac{f_X(x)}{\left|\frac{dy}{dx}\right|} = 4\frac{f_X(x)}{2|\cos(2x)|} = \frac{1}{\pi|\cos(2x)|} = \frac{1}{\pi\sqrt{1-y^2}} = \frac{1}{\pi\sqrt{(1-y)(1+y)}}$$
as required.
Let $Y = -\cos(2X)$. Then $Y \in (-1,1)$. Also
$$f_Y(y) = 4\frac{f_X(x)}{\left|\frac{dy}{dx}\right|} = 4\frac{f_X(x)}{2|\sin(2x)|} = \frac{1}{\pi|\sin(2x)|} = \frac{1}{\pi\sqrt{1-y^2}} = \frac{1}{\pi\sqrt{(1-y)(1+y)}}$$
as required.
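6. (a) Now $V = X+Y$ has a triangular distribution on $(-2\pi,2\pi)$ with density
$$f_V(v) = \frac{1}{2\pi}\left(1 - \frac{|v|}{2\pi}\right) \quad\text{for } -2\pi < v < 2\pi.$$
Let $Z = \sin(V) = \sin(X+Y)$. Then $Z \in (-1,1)$. Also
$$f_Z(z) = \sum_v\frac{f_V(v)}{\left|\frac{dz}{dv}\right|} = \sum_v\frac{f_V(v)}{|\cos(v)|} = \frac{1}{\sqrt{1-z^2}}\,\frac{1}{2\pi}\sum_v\left(1 - \frac{|v|}{2\pi}\right)$$
Suppose $z > 0$ (the case $z < 0$ is similar). The 4 values of $v$ leading to $z$ are $\sin^{-1}(z)$ and $\pi - \sin^{-1}(z)$, which are both positive, and $-2\pi + \sin^{-1}(z)$ and $-\pi - \sin^{-1}(z)$, which are both negative. This gives $\sum_v|v| = 4\pi$. Hence $f_Z(z) = 1/\left(\pi\sqrt{1-z^2}\right)$.
(b) Now $-Y \sim$ Uniform$(-\pi,\pi)$. Hence the result follows by part (a).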
Chapter 1 Section 13 on page 26
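1. Use the transformation from $(X,Y)$ to $(W,Z)$ where
$$W = Y \quad\text{and}\quad Z = \frac{X}{\sqrt{Y/n}}$$
Hence $w \in (0,\infty)$ and $z \in \mathbb{R}$ and
$$\left|\frac{\partial(w,z)}{\partial(x,y)}\right| = \frac{\sqrt{n}}{\sqrt{y}} = \frac{\sqrt{n}}{\sqrt{w}}$$
Now
$$f_{(W,Z)}(w,z) = f_{(X,Y)}(x,y)\left|\frac{\partial(x,y)}{\partial(w,z)}\right| = f_{(X,Y)}(x,y)\frac{\sqrt{w}}{\sqrt{n}} = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,\frac{y^{n/2-1}e^{-y/2}}{2^{n/2}\Gamma(n/2)}\,\frac{\sqrt{w}}{\sqrt{n}} = \frac{1}{\pi^{1/2}2^{(n+1)/2}n^{1/2}\Gamma(n/2)}\,e^{-z^2w/2n}\,w^{(n-1)/2}e^{-w/2}$$
But
$$\int_0^\infty w^{(n-1)/2}e^{-\alpha w}\,dw = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\alpha^{(n+1)/2}}$$
Hence
$$f_Z(z) = \frac{1}{\pi^{1/2}2^{(n+1)/2}n^{1/2}\Gamma(n/2)}\,\frac{\Gamma\left(\frac{n+1}{2}\right)}{\alpha^{(n+1)/2}} \quad\text{where } \alpha = \frac{1}{2}\left(1+\frac{z^2}{n}\right)$$
$$= \frac{\Gamma\left(\frac{n+1}{2}\right)}{\pi^{1/2}\Gamma(n/2)}\,\frac{1}{\sqrt{n}}\left(1+\frac{z^2}{n}\right)^{-(n+1)/2}$$
as required.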
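The density just derived can be sanity-checked in Python: for $n = 1$ it should reduce to the Cauchy density $1/(\pi(1+z^2))$, for $n = 2$ it simplifies to $(2+z^2)^{-3/2}$, and it should integrate to 1. The sketch uses `math.lgamma` for the normalizing constant; the grid and the choice $n = 5$ for the total-mass check are illustrative.

```python
import math

def t_density(z, n):
    """Student t_n density Gamma((n+1)/2) / (sqrt(n pi) Gamma(n/2)) * (1 + z^2/n)^(-(n+1)/2)."""
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi)
    return math.exp(log_c) * (1 + z * z / n) ** (-(n + 1) / 2)

# Special cases: t_1 is the standard Cauchy density and t_2 equals (2 + z^2)^(-3/2).
for z in (-3.0, -0.5, 0.0, 1.2, 4.0):
    print(z, t_density(z, 1), 1 / (math.pi * (1 + z * z)), t_density(z, 2), (2 + z * z) ** -1.5)

# Total mass for n = 5 by the midpoint rule on [-200, 200]; the tails beyond are negligible.
h = 0.005
mass = sum(t_density(-200 + (k + 0.5) * h, 5) for k in range(80000)) * h
print(mass)
```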
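2. First, let $x = (t-\alpha)/s$:
$$\int_{-\infty}^\infty\left(\frac{1}{1+(t-\alpha)^2/s^2}\right)^{n/2}dt = s\int_{-\infty}^\infty\left(\frac{1}{1+x^2}\right)^{n/2}dx$$
But from equation(12.1b) on page 23, we know that
$$\int_{-\infty}^\infty\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}dt = B\left(\tfrac{1}{2},\tfrac{n}{2}\right)\sqrt{n}$$
Letting $x = t/\sqrt{n}$ gives
$$\int_{-\infty}^\infty(1+x^2)^{-(n+1)/2}\,dx = B\left(\tfrac{1}{2},\tfrac{n}{2}\right)$$
and hence the result.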
3. Proof 1. Now $Y = Z_1^2 + \cdots + Z_n^2$; hence $\sum_{i=1}^n Z_i^2/n \xrightarrow{P} 1$ as $n\to\infty$ by the Weak Law of Large Numbers. Using the simple inequality $|\sqrt{a}-1| = |a-1|/|\sqrt{a}+1| < |a-1|$ gives $\sqrt{\sum_{i=1}^n Z_i^2/n} \xrightarrow{P} 1$ as $n\to\infty$. One of the standard results by Slutsky (see for example p285 in Probability and Random Processes by Grimmett and Stirzaker) is: if $Z_n \xrightarrow{D} Z$ as $n\to\infty$ and $Y_n \xrightarrow{P} c$ as $n\to\infty$ where $c \ne 0$, then $Z_n/Y_n \xrightarrow{D} Z/c$ as $n\to\infty$. Hence the result.
Proof 2. Stirling's formula is $n! \sim n^ne^{-n}\sqrt{2\pi n}$ as $n\to\infty$. Using this we can show that
$$\lim_{n\to\infty}\frac{1}{B(\frac{1}{2},\frac{n}{2})\sqrt{n}} = \frac{1}{\sqrt{2\pi}}$$
Also
$$\lim_{n\to\infty}\left(1+\frac{t^2}{n}\right)^{-(n+1)/2} = e^{-t^2/2}$$
4. Now $W = X/Y = (X/\sigma)/(Y/\sigma)$. Hence we can take $\sigma = 1$ without loss of generality. (a) By the symmetry of $X$, $W = X/Y$ has the same distribution as $X/|Y| = X/\sqrt{Z}$, where $X \sim N(0,1)$, $Z = Y^2 \sim \chi^2_1$ and $X$ and $Z$ are independent. Hence $W \sim t_1 = \gamma_1$, the Cauchy density.
(b) As for part (a).
(c) The folded Cauchy density, which is $f_W(w) = 2/\left(\pi(1+w^2)\right)$ for $w > 0$.
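5. (a) Let $W = \tan(U)$. Then $f_W(w) = f_U(u)\left|\frac{\partial u}{\partial w}\right| = 1/\left(\pi(1+w^2)\right)$. (b) Let $W = \tan(U)$. Then $f_W(w) = \sum_u f_U(u)\left|\frac{\partial u}{\partial w}\right| = \frac{2}{2\pi}\times\frac{1}{1+w^2}$ as required.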
6. $\phi(t) = E[e^{itY}] = E\left[e^{i\frac{t}{n}(X_1+\cdots+X_n)}\right] = E\left[e^{i\frac{t}{n}X_1}\right]^n = \left[e^{-s\frac{|t|}{n}}\right]^n = e^{-s|t|}$ as required.
7. (a) $\phi_{2X}(t) = E[e^{it2X}] = E[e^{i(2t)X}] = e^{-2s|t|}$. This is the Cauchy $\gamma_{2s}$ distribution.
(b) $\phi_{X+Y}(t) = E[e^{it(X+Y)}] = E[e^{it(aU+cU+bV+dV)}] = E[e^{it(a+c)U}]E[e^{it(b+d)V}] = e^{-s(a+c)|t|}e^{-s(b+d)|t|} = e^{-s(a+b+c+d)|t|}$, which is the Cauchy distribution $\gamma_{s(a+b+c+d)}$.
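8. We have $Y = R\sin(\Theta)$ and $X = R\cos(\Theta)$. Also $\left|\frac{\partial(x,y)}{\partial(r,\theta)}\right| = r$. Hence
$$f_{(R,\Theta)}(r,\theta) = f_{(X,Y)}(x,y)\left|\frac{\partial(x,y)}{\partial(r,\theta)}\right| = \frac{1}{2\pi}e^{-(x^2+y^2)/2}\,r = \frac{1}{2\pi}re^{-r^2/2}$$
Hence $\Theta$ is uniform on $(-\pi,\pi)$, $R$ has density $re^{-r^2/2}$ for $r > 0$, and $R$ and $\Theta$ are independent.
If $W = R^2$ then the density of $W$ is $f_W(w) = \frac{1}{2}e^{-w/2}$ for $w > 0$; this is the $\chi^2_2$ distribution.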
9. Let $X = \tan(\Theta)$; hence $\Theta \in (-\pi/2,\pi/2)$. Then $P[\Theta \le \theta] = P[X \le \tan(\theta)] = 1/2 + \theta/\pi$ and $f_\Theta(\theta) = 1/\pi$. Hence $\Theta$ has the uniform distribution on $(-\pi/2,\pi/2)$. Now $2X/(1-X^2) = \tan(2\Theta)$. So we want the distribution of $W = \tan(Y)$ where $Y$ has the uniform distribution on $(-\pi,\pi)$. Hence
$$f_W(w) = \sum_y\frac{f_Y(y)}{\left|\frac{dw}{dy}\right|} = 2\times\frac{1}{2\pi}\times\frac{1}{1+w^2}$$
as required.
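10. Use the transformation from $(X,Y)$ to $(V,W)$ where
$$V = \frac{X/m}{Y/n} = \frac{nX}{mY} \quad\text{and}\quad W = Y$$
Hence $v \in (0,\infty)$ and $w \in (0,\infty)$ and
$$\left|\frac{\partial(v,w)}{\partial(x,y)}\right| = \frac{n}{my} = \frac{n}{mw}$$
Now
$$f_{(V,W)}(v,w) = f_{(X,Y)}(x,y)\left|\frac{\partial(x,y)}{\partial(v,w)}\right| = f_{(X,Y)}(x,y)\frac{mw}{n} = \frac{x^{m/2-1}e^{-x/2}}{2^{m/2}\Gamma(m/2)}\,\frac{y^{n/2-1}e^{-y/2}}{2^{n/2}\Gamma(n/2)}\,\frac{mw}{n}$$
$$= \frac{(mwv/n)^{m/2-1}e^{-mvw/2n}}{2^{m/2}\Gamma(m/2)}\,\frac{w^{n/2-1}e^{-w/2}}{2^{n/2}\Gamma(n/2)}\,\frac{mw}{n} = \frac{v^{m/2-1}(m/n)^{m/2}}{2^{(m+n)/2}\Gamma(m/2)\Gamma(n/2)}\,w^{(m+n)/2-1}e^{-\frac{w}{2}\left(1+\frac{mv}{n}\right)}$$
Using $\int_0^\infty w^{k-1}e^{-\alpha w}\,dw = \Gamma(k)/\alpha^k$ with $\alpha = \frac{1}{2}\left(1+\frac{mv}{n}\right)$ and integrating out $w$ gives
$$f_V(v) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\,\frac{v^{m/2-1}(m/n)^{m/2}}{2^{(m+n)/2}\alpha^{(m+n)/2}} = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\,\frac{v^{m/2-1}m^{m/2}n^{n/2}}{(n+mv)^{(m+n)/2}}$$
as required.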
11. Now $Z = X_1^2/Y_1^2$ where $X_1 = X/\sigma$ and $Y_1 = Y/\sigma$ are i.i.d. $N(0,1)$. Now $X_1^2$ and $Y_1^2$ are i.i.d. $\chi^2_1$. Hence $Z$ has the $F_{1,1}$ distribution, which has density $f_Z(z) = 1/\left(\pi\sqrt{z}(1+z)\right)$ for $z > 0$.
12. Now F = (nX)/(mY ); hence E[F ] = nE[X]E[1/Y ]/m = nmE[1/Y ]/m = nE[1/Y ] = n/(n − 2).
13. (a) By definition(14.6a) on page 29. (b) Using §4.3 on page 8 gives $2\alpha_1X_1 \sim$ Gamma$(n_1, 1/2) = \chi^2_{2n_1}$ and $2\alpha_2X_2 \sim$ Gamma$(n_2, 1/2) = \chi^2_{2n_2}$. Hence the result by definition(14.6a).
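14. Let $Y = nX/\left(m(1-X)\right)$; then $Y \in (0,\infty)$. Also $X = mY/(n+mY)$, $1-X = n/(n+mY)$ and
$$\frac{m}{n}\frac{dy}{dx} = \frac{1}{(1-x)^2} \quad\text{and hence}\quad \frac{dy}{dx} = \frac{n}{m(1-x)^2}$$
Hence
$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = \frac{x^{m/2-1}(1-x)^{n/2-1}}{B(\frac{m}{2},\frac{n}{2})}\,\frac{m(1-x)^2}{n} = \frac{m^{m/2}n^{n/2}}{B(\frac{m}{2},\frac{n}{2})}\,\frac{y^{m/2-1}}{(my+n)^{(m+n)/2}}$$
as required.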
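15. (a) This is the reverse of exercise 14. (b) Let $Y = \alpha X/\beta$; then $\left|\frac{dy}{dx}\right| = \alpha/\beta$. Now for $x \in (0,\infty)$ we have
$$f_X(x) = \frac{1}{B(\alpha,\beta)}\,\frac{(2\alpha)^\alpha(2\beta)^\beta x^{\alpha-1}}{2^{\alpha+\beta}[\alpha x+\beta]^{\alpha+\beta}} = \frac{1}{B(\alpha,\beta)}\,\frac{\alpha^\alpha\beta^\beta x^{\alpha-1}}{[\alpha x+\beta]^{\alpha+\beta}}$$
and hence
$$f_Y(y) = \frac{f_X(x)}{\left|\frac{dy}{dx}\right|} = \frac{1}{B(\alpha,\beta)}\,\frac{\alpha^{\alpha-1}\beta^{\beta+1}(\beta y/\alpha)^{\alpha-1}}{\beta^{\alpha+\beta}[1+y]^{\alpha+\beta}} = \frac{1}{B(\alpha,\beta)}\,\frac{y^{\alpha-1}}{[1+y]^{\alpha+\beta}} \quad\text{for } y \in (0,\infty).$$
Or use $X = (Y/2\alpha)/(Z/2\beta) = (\beta Y)/(\alpha Z)$ where $Y \sim \chi^2_{2\alpha}$, $Z \sim \chi^2_{2\beta}$ and $Y$ and $Z$ are independent. Hence $\alpha X/\beta = Y/Z \sim$ Beta$'(\alpha,\beta)$ by part (f) of exercise 3 on page 22.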
16. Now $W = nX/(mY)$; hence $mW/n = X/Y$ where $X \sim \chi^2_m$, $Y \sim \chi^2_n$ and $X$ and $Y$ are independent. Hence by part (f) of exercise 3 on page 22, $X/Y \sim$ Beta$'(m/2, n/2)$.
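17. Proof 1. Now $W = (X/m)/(Y/n)$. Now $Y = Z_1^2 + \cdots + Z_n^2$ where $Z_1,\dots,Z_n$ are i.i.d. $N(0,1)$. Hence $Y/n \xrightarrow{P} 1$ as $n\to\infty$. Hence, by Slutsky's theorem (see answer to exercise 3 on page 70), $mW = X/(Y/n) \xrightarrow{D} \chi^2_m$ as $n\to\infty$.
Proof 2. Now
$$f_W(w) = \frac{\Gamma(\frac{m+n}{2})\,m^{m/2}n^{n/2}\,w^{m/2-1}}{\Gamma(\frac{m}{2})\Gamma(\frac{n}{2})[mw+n]^{(m+n)/2}} \quad\text{for } w \in (0,\infty).$$
$$\lim_{n\to\infty}f_W(w) = \frac{m^{m/2}w^{m/2-1}}{\Gamma(\frac{m}{2})}\lim_{n\to\infty}\frac{\Gamma(\frac{m+n}{2})\,n^{n/2}}{\Gamma(\frac{n}{2})[mw+n]^{(m+n)/2}} = \frac{m^{m/2}w^{m/2-1}}{\Gamma(\frac{m}{2})}\lim_{n\to\infty}\frac{\Gamma(\frac{m+n}{2})}{\Gamma(\frac{n}{2})[mw+n]^{m/2}}\,\frac{1}{(1+mw/n)^{n/2}}$$
$$= \frac{m^{m/2}w^{m/2-1}e^{-mw/2}}{\Gamma(\frac{m}{2})}\lim_{n\to\infty}\frac{\Gamma(\frac{m+n}{2})}{\Gamma(\frac{n}{2})[mw+n]^{m/2}} = \frac{m^{m/2}w^{m/2-1}e^{-mw/2}}{\Gamma(\frac{m}{2})}\,\frac{1}{2^{m/2}}$$
by using Stirling's formula: $n! \sim \sqrt{2\pi}\,n^{n+\frac{1}{2}}e^{-n}$ as $n\to\infty$. Finally, convergence in densities implies convergence in distribution (Feller vII, page 252).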
Chapter 1 Section 15 on page 29
1. Use moment generating functions or characteristic functions.
$$E[e^{tZ}] = \prod_{j=1}^n\frac{1}{(1-2t)^{k_j/2}}\exp\left(\frac{\lambda_jt}{1-2t}\right) = \frac{1}{(1-2t)^{k/2}}\exp\left(\frac{\lambda t}{1-2t}\right)$$
Hence $Z$ has a non-central $\chi^2_k$ distribution with non-centrality parameter $\lambda$, where $k = k_1+\cdots+k_n$ and $\lambda = \lambda_1+\cdots+\lambda_n$.
2. (a) $E[W] = \sum_{j=1}^n E[X_j^2] = \sum_{j=1}^n(1+\mu_j^2) = n + \lambda$.
(b) Suppose $Z \sim N(0,1)$. Then $E[Z] = E[Z^3] = 0$, $E[Z^2] = 1$ and $E[Z^4] = 3$. Suppose $X = Z + \mu$. Then $E[X] = \mu$, $E[X^2] = 1+\mu^2$, $E[X^3] = 3\mu+\mu^3$, and $E[X^4] = 3+6\mu^2+\mu^4$. Hence $\text{var}[X^2] = 2 + 4\mu^2$.
Hence $\text{var}[W] = \text{var}[X_1^2] + \cdots + \text{var}[X_n^2] = 2n + 4\lambda$.
3. Rearranging equation(14.4a) on page 29 gives
$$f_X(x) = \sum_{j=0}^\infty\frac{e^{-\lambda/2}(\lambda/2)^j}{j!}\,\frac{e^{-x/2}x^{(n+2j-2)/2}}{2^{(n+2j)/2}\Gamma(n/2+j)} = \sum_{j=0}^\infty\frac{e^{-\lambda/2}(\lambda/2)^j}{j!}f_{n+2j}(x)$$
where $f_{n+2j}(x)$ is the density of the $\chi^2_{n+2j}$ distribution. Hence the result.
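Answers 2 and 3 can be cross-checked: the Poisson mixture in answer 3 must have mean $n+\lambda$ and variance $2n+4\lambda$. The Python sketch below sums the mixture moments using $E[\chi^2_k] = k$ and $\text{var}[\chi^2_k] = 2k$; it assumes $\lambda > 0$, and the values $n = 4$, $\lambda = 3.7$ and the truncation at 400 terms are illustrative.

```python
import math

def mixture_moments(n, lam, terms=400):
    """Mean and variance of the Poisson(lam/2) mixture of chi^2_{n+2j} densities (lam > 0)."""
    mean = 0.0
    second = 0.0
    for j in range(terms):
        # Poisson weight e^{-lam/2} (lam/2)^j / j!, computed on the log scale
        w = math.exp(-lam / 2 + j * math.log(lam / 2) - math.lgamma(j + 1))
        k = n + 2 * j                  # degrees of freedom of the j-th component
        mean += w * k                  # E[chi^2_k] = k
        second += w * (2 * k + k * k)  # E[(chi^2_k)^2] = var + mean^2 = 2k + k^2
    return mean, second - mean**2

n, lam = 4, 3.7
print(mixture_moments(n, lam), (n + lam, 2 * n + 4 * lam))
```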
Chapter 1 Section 17 on page 33
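1. Now $Y = (X-a)/h$ has the standard power distribution Power$(\alpha,0,1)$, which has density $f(y) = \alpha y^{\alpha-1}$ for $0 < y < 1$. For $j = 1, 2, \dots$, we have $E[Y^j] = \alpha\int_0^1 y^{\alpha+j-1}\,dy = \alpha/(\alpha+j)$. Hence
$$E[Y] = \frac{\alpha}{\alpha+1},\quad E[Y^2] = \frac{\alpha}{\alpha+2} \quad\text{and hence}\quad \text{var}[Y] = \frac{\alpha}{(\alpha+1)^2(\alpha+2)}$$
Now $X = a + hY$. Hence
$$E[X] = a + \frac{\alpha h}{\alpha+1},\quad E[X^2] = \frac{\alpha h^2}{\alpha+2} + \frac{2a\alpha h}{\alpha+1} + a^2 \quad\text{and}\quad \text{var}[X] = \frac{\alpha h^2}{(\alpha+1)^2(\alpha+2)}$$
2. $P[M_n \le x] = (x-a)^{n\alpha}/h^{n\alpha}$ and so $M_n \sim$ Power$(n\alpha, a, h)$.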
Suppose $X$ has density $f_X$ and $Y = \alpha(X)$ where the function $\alpha$ is increasing and has an inverse. Let $f_Y$ denote the density of $Y$. Then if $Z$ has density $f_Y$, $\alpha^{-1}(Z)$ has density $f_X$. This follows because $P[\alpha^{-1}(Z) \le z] = P[Z \le \alpha(z)] = P[Y \le \alpha(z)] = P[\alpha(X) \le \alpha(z)] = P[X \le z]$.
3. (a) $F_{M_n}(x) = P[M_n \le x] = x^n$ for $0 < x < 1$. This is the Power$(n,0,1)$ distribution with density $f_{M_n}(x) = nx^{n-1}$ for $0 < x < 1$. (b) $P[U^{1/n} \le x] = P[U \le x^n] = x^n$ for $0 < x < 1$. The same distribution as for part (a).
(c) Now $(X-a)/h \sim$ Power$(\alpha,0,1)$; hence by part (b) we have $(X-a)/h \sim U^{1/\alpha}$ and $X \sim a + hU^{1/\alpha}$. Then use the binomial theorem and $E[U^{j/\alpha}] = \alpha/(\alpha+j)$.
4. Now $X \in (0,h)$. Hence $Y \in (\ln(1/h),\infty)$ and $Y - \ln(1/h) \in (0,\infty)$.
Now $P[Y \le y] = P[\ln(X) \ge -y] = P[X \ge e^{-y}] = 1 - e^{-\alpha y}/h^\alpha = 1 - e^{-\alpha(y-\ln(1/h))}$ for $y > \ln(1/h)$. Hence the density is $\alpha e^{-\alpha(y-\ln(1/h))}$ for $y > \ln(1/h)$, a shifted exponential.
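5. Equation(2.3a) on page 4 gives
$$f_{k:n}(x) = \frac{n!}{(k-1)!(n-k)!}f(x)\{F(x)\}^{k-1}\{1-F(x)\}^{n-k} = \frac{n!}{(k-1)!(n-k)!}\alpha x^{k\alpha-1}(1-x^\alpha)^{n-k}$$
and using the transformation $v = x^\alpha$ gives
$$E[X_{k:n}] = \frac{n!}{(k-1)!(n-k)!}\int_0^1\alpha x^{k\alpha}(1-x^\alpha)^{n-k}\,dx = \frac{n!}{(k-1)!(n-k)!}\int_0^1 v^{k-1+\frac{1}{\alpha}}(1-v)^{n-k}\,dv$$
$$= \frac{n!}{(k-1)!(n-k)!}\,\frac{\Gamma(k+\frac{1}{\alpha})\Gamma(n-k+1)}{\Gamma(n+\frac{1}{\alpha}+1)} = \frac{n!}{(k-1)!}\,\frac{\Gamma(k+\frac{1}{\alpha})}{\Gamma(n+\frac{1}{\alpha}+1)}$$
and
$$E[X_{k:n}^2] = \frac{n!}{(k-1)!}\,\frac{\Gamma(k+\frac{2}{\alpha})}{\Gamma(n+\frac{2}{\alpha}+1)}$$
6. (a) Given $X_{(n)} = x$, the other $n-1$ observations $Y_1,\dots,Y_{n-1}$ are i.i.d. with density $f(y)/F(x)$ for $y \in (0,x)$. Hence
$$E\left[\frac{S_n}{X_{(n)}}\,\Big|\,X_{(n)} = x\right] = 1 + \frac{E[Y_1+\cdots+Y_{n-1}]}{x} = 1 + \frac{(n-1)E[Y]}{x} = 1 + \frac{n-1}{x}\,\frac{\int_0^x yf(y)\,dy}{F(x)} = 1 + \frac{(n-1)\alpha}{\alpha+1}$$
as required.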
(b) The density of $(X_{1:n}, X_{2:n}, \dots, X_{n:n})$ is $f(x_1,\dots,x_n) = n!\alpha^n(x_1x_2\cdots x_n)^{\alpha-1}/h^{n\alpha}$ for $0 \le x_1 \le x_2 \le \cdots \le x_n \le h$. Consider the transformation to $(W_1, W_2, \dots, W_n)$ where $W_1 = X_{1:n}/X_{n:n}$, $W_2 = X_{2:n}/X_{n:n}$, …, $W_{n-1} = X_{(n-1):n}/X_{n:n}$ and $W_n = X_{n:n}$. This has Jacobian with absolute value
$$\left|\frac{\partial(w_1,\dots,w_n)}{\partial(x_1,\dots,x_n)}\right| = \frac{1}{x_n^{n-1}}$$
Hence for $0 \le w_1 \le \cdots \le w_{n-1} \le 1$ and $0 < w_n < h$, the density of the vector $(W_1,\dots,W_n)$ is
$$h(w_1,\dots,w_n) = w_n^{n-1}f(w_1w_n,\dots,w_{n-1}w_n,w_n) = \frac{n!\alpha^n}{h^{n\alpha}}(w_1\cdots w_{n-1})^{\alpha-1}w_n^{n\alpha-1} = \left[(n-1)!\alpha^{n-1}(w_1\cdots w_{n-1})^{\alpha-1}\right]\left[\frac{n\alpha w_n^{n\alpha-1}}{h^{n\alpha}}\right]$$
Hence $(W_1,\dots,W_{n-1})$ is independent of $W_n$. Hence $W_1+\cdots+W_{n-1}$ is independent of $W_n$ as required.
7. (a) The distribution of $X_{(i)}$ given $X_{(i+1)} = x$ is the same as the distribution of the maximum of $i$ independent random variables from the density $f(y)/F(x)$ for $y \in (0,x)$; this maximum has distribution function $\{F(y)/F(x)\}^i$ and density $if(y)\{F(y)\}^{i-1}/\{F(x)\}^i$ for $y \in (0,x)$. Hence
$$E\left[\frac{X_{(i)}^r}{X_{(i+1)}^r}\,\Big|\,X_{(i+1)} = x\right] = \frac{i}{x^r\{F(x)\}^i}\int_0^x y^rf(y)\{F(y)\}^{i-1}\,dy \qquad (17.7a)$$
⇐ Substituting $f(y) = \alpha y^{\alpha-1}/h^\alpha$ and $F(y) = y^\alpha/h^\alpha$ in the right hand side of equation(17.7a) gives $i\alpha/(i\alpha+r)$ as required. ⇒ Equation(17.7a) gives $i\int_0^x y^rf(y)\{F(y)\}^{i-1}\,dy = cx^r\{F(x)\}^i$ for $x \in (0,h)$. Differentiating with respect to $x$ gives $ix^rf(x)\{F(x)\}^{i-1} = cx^{r-1}\{F(x)\}^{i-1}[rF(x) + xif(x)]$. Hence $f(x)/F(x) = cr/\left(ix(1-c)\right) > 0$ because $c < 1$. Hence the result.
(b)
$$E\left[\frac{X_{(i+1)}^r}{X_{(i)}^r}\,\Big|\,X_{(i+1)} = x\right] = \frac{ix^r}{\{F(x)\}^i}\int_0^x\frac{f(y)\{F(y)\}^{i-1}}{y^r}\,dy \qquad (17.7b)$$
and then as for part (a).
8. (a) By equation(2.2a), the density of the vector $(X_{1:n}, X_{2:n}, \dots, X_{n:n})$ is
$$g(x_1,\dots,x_n) = n!f(x_1)\cdots f(x_n) = n!\alpha^n(x_1x_2\cdots x_n)^{\alpha-1} \quad\text{for } 0 \le x_1 \le x_2 \le \cdots \le x_n \le 1.$$
The transformation to $(W_1, W_2, \dots, W_n)$ has Jacobian with absolute value
$$\left|\frac{\partial(w_1,\dots,w_n)}{\partial(x_1,\dots,x_n)}\right| = \frac{1}{x_2\cdots x_n} \quad\text{where } x_2\cdots x_n = w_2w_3^2\cdots w_{n-1}^{n-2}w_n^{n-1} \text{ and } x_1 = w_1w_2\cdots w_n$$
Hence for $0 < w_1 < 1, \dots, 0 < w_n < 1$, the density of the vector $(W_1,\dots,W_n)$ is
$$h(w_1,\dots,w_n) = n!\alpha^nx_1^{\alpha-1}(x_2\cdots x_n)^\alpha = (\alpha w_1^{\alpha-1})(2\alpha w_2^{2\alpha-1})\cdots(n\alpha w_n^{n\alpha-1}) = f_{W_1}(w_1)f_{W_2}(w_2)\cdots f_{W_n}(w_n)$$
Hence $W_1, W_2, \dots, W_n$ are independent. Also $f_{W_k}(w_k) = k\alpha w_k^{k\alpha-1}$, which is Power$(k\alpha,0,1)$.
(b) Now $X_{k:n} = W_kW_{k+1}\cdots W_n$; hence
$$E[X_{k:n}] = E[W_k]E[W_{k+1}]\cdots E[W_n] = \frac{k\alpha}{k\alpha+1}\cdot\frac{(k+1)\alpha}{(k+1)\alpha+1}\cdots\frac{n\alpha}{n\alpha+1} = \frac{n!}{(k-1)!}\cdot\frac{1}{k+\frac{1}{\alpha}}\cdot\frac{1}{k+1+\frac{1}{\alpha}}\cdots\frac{1}{n+\frac{1}{\alpha}} = \frac{n!}{(k-1)!}\,\frac{\Gamma(k+\frac{1}{\alpha})}{\Gamma(n+1+\frac{1}{\alpha})}$$
Similarly for $E[X_{k:n}^2] = E[W_k^2]E[W_{k+1}^2]\cdots E[W_n^2]$.
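The expressions for $E[X_{k:n}]$ in answers 5 and 8(b) can be cross-checked against each other and against direct numerical integration of $xf_{k:n}(x)$. The Python sketch below assumes the parent distribution is Power$(\alpha,0,1)$, so $f(x) = \alpha x^{\alpha-1}$ and $F(x) = x^\alpha$ on $(0,1)$; the helper names and test cases are illustrative.

```python
import math

def mean_gamma_form(k, n, alpha):
    """n! Gamma(k + 1/alpha) / ((k-1)! Gamma(n + 1 + 1/alpha)), as in answers 5 and 8(b)."""
    return math.exp(math.lgamma(n + 1) + math.lgamma(k + 1 / alpha)
                    - math.lgamma(k) - math.lgamma(n + 1 + 1 / alpha))

def mean_product_form(k, n, alpha):
    """E[W_k] E[W_{k+1}] ... E[W_n] with E[W_j] = j alpha / (j alpha + 1)."""
    p = 1.0
    for j in range(k, n + 1):
        p *= j * alpha / (j * alpha + 1)
    return p

def mean_numeric(k, n, alpha, steps=200000):
    """Midpoint rule for the integral over (0, 1) of x f_{k:n}(x) for a Power(alpha, 0, 1) parent."""
    c = math.exp(math.lgamma(n + 1) - math.lgamma(k) - math.lgamma(n - k + 1))
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += x * alpha * x ** (k * alpha - 1) * (1 - x**alpha) ** (n - k)
    return c * total * h

for k, n, alpha in [(3, 7, 2.5), (1, 5, 0.8), (5, 5, 1.0)]:
    print(k, n, alpha, mean_gamma_form(k, n, alpha),
          mean_product_form(k, n, alpha), mean_numeric(k, n, alpha))
```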
9. (a) Just use $P[Y \le y] = P[U^{-1/\alpha} \le y] = P[U^{1/\alpha} \ge 1/y] = P[U \ge 1/y^\alpha]$.
(b) Just use $Y \sim$ Pareto$(\alpha,a,x_0)$ iff $(Y-a)/x_0 \sim$ Pareto$(\alpha,0,1)$ and part (a).
(c) By part (a), $1/X \sim$ Pareto$(\alpha,0,1)$ iff $1/X \sim U^{-1/\alpha}$ where $U \sim$ Uniform$(0,1)$, hence iff $X \sim U^{1/\alpha}$ where $U \sim$ Uniform$(0,1)$, and hence iff $X \sim$ Power$(\alpha,0,1)$.
10. $P[M_n > x] = P[X_1 > x]^n = \left(\frac{x_0^\alpha}{(x-a)^\alpha}\right)^n = \frac{x_0^{\alpha n}}{(x-a)^{\alpha n}}$, which is Pareto$(\alpha n, a, x_0)$.
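11. Now $f(x) = \alpha x_0^\alpha/x^{\alpha+1}$ for $x > x_0$. (a) $E[X] = \alpha x_0/(\alpha-1)$ if $\alpha > 1$ and $\infty$ if $\alpha \le 1$. Also $E[X^2] = \alpha x_0^2/(\alpha-2)$ if $\alpha > 2$ and $\infty$ otherwise. Hence $\text{var}[X] = \alpha x_0^2/\left((\alpha-1)^2(\alpha-2)\right)$ provided $\alpha > 2$.
(b) The median is $x_0\,2^{1/\alpha}$. The mode is $x_0$.
(c) $E[X^n] = \alpha x_0^n/(\alpha-n)$ for $\alpha > n$ and $\infty$ otherwise.
(d) Suppose $t < 0$. Then $E[e^{tX}] = \int_{x_0}^\infty\alpha x_0^\alpha e^{tx}/x^{\alpha+1}\,dx$. Set $v = -tx$; hence $v > 0$. Then
$$E[e^{tX}] = \int_{-tx_0}^\infty\frac{\alpha x_0^\alpha e^{-v}(-t)^{\alpha+1}}{v^{\alpha+1}}\,\frac{dv}{(-t)} = \alpha(-x_0t)^\alpha\int_{-tx_0}^\infty\frac{e^{-v}}{v^{\alpha+1}}\,dv = \alpha(-x_0t)^\alpha\,\Gamma(-\alpha,-x_0t)$$
where $\Gamma(s,x) = \int_x^\infty t^{s-1}e^{-t}\,dt$ is the incomplete gamma function. Hence the c.f. is $E[e^{itX}] = \alpha(-ix_0t)^\alpha\,\Gamma(-\alpha,-ix_0t)$.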
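12. Part (a) of exercise 11 shows that $E[X]$ is infinite. Also
$$E\left[\frac{1}{X}\right] = \int_1^\infty\frac{1}{x}\,\frac{1}{2x^{3/2}}\,dx = \frac{1}{2}\int_1^\infty x^{-5/2}\,dx = \frac{1}{2}\left[-\frac{2}{3}x^{-3/2}\right]_1^\infty = \frac{1}{3}$$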
13. $P[Y \le y] = P[\ln(X) \le y] = P[X \le e^y] = 1 - \frac{x_0^\alpha}{e^{\alpha y}} = 1 - e^{-\alpha(y-\ln(x_0))}$ for $y > \ln(x_0)$. Hence the density is $\alpha e^{-\alpha(y-\ln(x_0))}$ for $y > \ln(x_0)$, a shifted exponential. In particular, if $X \sim$ Pareto$(\alpha,0,1)$, then the distribution of $Y = \ln(X)$ is the exponential$(\alpha)$ distribution.
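14. Now $GM_X$ is defined by $\ln(GM_X) = E[\ln X]$. Either use exercise 13 or directly:
$$E[\ln X] = \alpha x_0^\alpha\int_{x_0}^\infty\frac{\ln(x)}{x^{\alpha+1}}\,dx = \alpha x_0^\alpha\int_{\ln(x_0)}^\infty ye^{-\alpha y}\,dy = \ln(x_0) + \frac{1}{\alpha}$$
and hence $GM_X = x_0\exp\left(\frac{1}{\alpha}\right)$.
From the answer to exercise 1.9(9) on page 68 we have, where $E[X] = \alpha x_0/(\alpha-1)$,
$$2E[X]\gamma_X = 2\int_{u=0}^\infty uF(u)f(u)\,du - 2\int_{u=0}^\infty\left[\int_{v=0}^u vf(v)\,dv\right]f(u)\,du$$
$$= 2\int_{u=x_0}^\infty u\left[1-\frac{x_0^\alpha}{u^\alpha}\right]\frac{\alpha x_0^\alpha}{u^{\alpha+1}}\,du - 2\int_{u=x_0}^\infty\left[\int_{v=x_0}^u v\,\frac{\alpha x_0^\alpha}{v^{\alpha+1}}\,dv\right]\frac{\alpha x_0^\alpha}{u^{\alpha+1}}\,du$$
$$= 2\alpha x_0^\alpha\int_{u=x_0}^\infty\left[\frac{1}{u^\alpha} - \frac{x_0^\alpha}{u^{2\alpha}}\right]du - 2\alpha^2x_0^{2\alpha}\int_{u=x_0}^\infty\left[\int_{v=x_0}^u\frac{1}{v^\alpha}\,dv\right]\frac{1}{u^{\alpha+1}}\,du = \frac{2\alpha^2x_0}{(2\alpha-1)(\alpha-1)} - \frac{2\alpha x_0}{2\alpha-1}$$
and hence, for $\alpha > 1$,
$$\gamma_X = \frac{\alpha}{2\alpha-1} - \frac{\alpha-1}{2\alpha-1} = \frac{1}{2\alpha-1}$$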
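Here $\gamma_X = E|X-Y|/(2E[X])$ for independent copies $X, Y$, which is what the double integral from answer 1.9(9) computes. The Monte Carlo sketch below estimates it for Pareto$(\alpha,0,1)$ samples drawn with the standard-library `random.paretovariate` (which uses scale $x_0 = 1$; $\gamma_X$ does not depend on $x_0$) and compares the estimate with $1/(2\alpha-1)$. The seed and sample size are arbitrary, and the estimate is only accurate to roughly two decimal places.

```python
import random

def pareto_gini_mc(alpha, pairs=100000, seed=2018):
    """Monte Carlo estimate of E|X - Y| / (2 E[X]) for i.i.d. Pareto(alpha, 0, 1) X and Y."""
    rng = random.Random(seed)
    abs_diff = 0.0
    total = 0.0
    for _ in range(pairs):
        x = rng.paretovariate(alpha)
        y = rng.paretovariate(alpha)
        abs_diff += abs(x - y)
        total += x + y
    mean = total / (2 * pairs)
    return (abs_diff / pairs) / (2 * mean)

for alpha in (4.0, 6.0):
    print(alpha, pareto_gini_mc(alpha), 1 / (2 * alpha - 1))
```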
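15. Using equation(2.3a) on page 4 gives
$$f_{k:n}(x) = \frac{n!}{(k-1)!(n-k)!}f(x)\{F(x)\}^{k-1}\{1-F(x)\}^{n-k} = \frac{n!}{(k-1)!(n-k)!}\,\frac{\alpha}{x^{\alpha+1}}\left(1-\frac{1}{x^\alpha}\right)^{k-1}\frac{1}{x^{\alpha(n-k)}} = \frac{n!}{(k-1)!(n-k)!}\,\frac{\alpha}{x^{(n-k+1)\alpha+1}}\left(1-\frac{1}{x^\alpha}\right)^{k-1}$$
and hence, using the transformation $v = 1/x^\alpha$,
$$E[X_{k:n}] = \frac{n!}{(k-1)!(n-k)!}\int_1^\infty\frac{\alpha}{x^{(n-k+1)\alpha}}\left(1-\frac{1}{x^\alpha}\right)^{k-1}dx = \frac{n!}{(k-1)!(n-k)!}\int_0^1 v^{n-k-\frac{1}{\alpha}}(1-v)^{k-1}\,dv$$
$$= \frac{n!}{(k-1)!(n-k)!}\,\frac{\Gamma(n-k+1-\frac{1}{\alpha})\Gamma(k)}{\Gamma(n+1-\frac{1}{\alpha})} = \frac{n!}{(n-k)!}\,\frac{\Gamma(n-k+1-\frac{1}{\alpha})}{\Gamma(n+1-\frac{1}{\alpha})}$$
and
$$E[X_{k:n}^2] = \frac{n!}{(n-k)!}\,\frac{\Gamma(n-k+1-\frac{2}{\alpha})}{\Gamma(n+1-\frac{2}{\alpha})}$$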
16. (a) By equation(2.2a) on page 4, the density of the vector $(X_{1:n}, X_{2:n}, \dots, X_{n:n})$ is
$$g(x_1,\dots,x_n) = n!f(x_1)\cdots f(x_n) = \frac{n!\alpha^n}{(x_1x_2\cdots x_n)^{\alpha+1}} \quad\text{for } 1 \le x_1 \le x_2 \le \cdots \le x_n.$$
The transformation to $(W_1, W_2, \dots, W_n)$ has Jacobian with absolute value
$$\left|\frac{\partial(w_1,\dots,w_n)}{\partial(x_1,\dots,x_n)}\right| = \frac{1}{x_1\cdots x_{n-1}} \quad\text{where } x_1\cdots x_{n-1} = w_1^{n-1}w_2^{n-2}\cdots w_{n-1} \text{ and } x_n = w_1w_2\cdots w_n$$
Hence for $w_1 > 1, \dots, w_n > 1$, the density of the vector $(W_1,\dots,W_n)$ is
$$h(w_1,\dots,w_n) = \frac{n!\alpha^n}{(x_1\cdots x_{n-1})^\alpha x_n^{\alpha+1}} = \frac{n!\alpha^n}{w_1^{n\alpha+1}w_2^{(n-1)\alpha+1}\cdots w_{n-1}^{2\alpha+1}w_n^{\alpha+1}} = \frac{n\alpha}{w_1^{n\alpha+1}}\,\frac{(n-1)\alpha}{w_2^{(n-1)\alpha+1}}\cdots\frac{2\alpha}{w_{n-1}^{2\alpha+1}}\,\frac{\alpha}{w_n^{\alpha+1}} = f_{W_1}(w_1)f_{W_2}(w_2)\cdots f_{W_n}(w_n)$$
Hence $W_1, W_2, \dots, W_n$ are independent. Also $f_{W_k}(w_k) = \frac{(n-k+1)\alpha}{w_k^{(n-k+1)\alpha+1}}$, which is Pareto$((n-k+1)\alpha, 0, 1)$.
(b) Now $X_{k:n} = W_1W_2\cdots W_k$; hence
$$E[X_{k:n}] = E[W_1]E[W_2]\cdots E[W_k] = \frac{n\alpha}{n\alpha-1}\cdot\frac{(n-1)\alpha}{(n-1)\alpha-1}\cdots\frac{(n-k+1)\alpha}{(n-k+1)\alpha-1} = \frac{n!}{(n-k)!}\cdot\frac{1}{n-\frac{1}{\alpha}}\cdot\frac{1}{n-1-\frac{1}{\alpha}}\cdots\frac{1}{n-k+1-\frac{1}{\alpha}} = \frac{n!}{(n-k)!}\,\frac{\Gamma(n-k+1-\frac{1}{\alpha})}{\Gamma(n+1-\frac{1}{\alpha})}$$
Similarly for $E[X_{k:n}^2] = E[W_1^2]E[W_2^2]\cdots E[W_k^2]$.
17. Just use $(X_1,\dots,X_n) \sim \left(\frac{1}{Y_1},\dots,\frac{1}{Y_n}\right)$.
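18. Let $Z = Y/X$. Note that $F(x) = 1 - x_0^\alpha/x^\alpha$ for $x \ge x_0$ and 0 otherwise. Then for $z > 1$ we have
$$P[Z \le z] = \int_{x_0}^\infty F(zx)f(x)\,dx = \int_{x_0}^\infty\left(1-\frac{x_0^\alpha}{z^\alpha x^\alpha}\right)\frac{\alpha x_0^\alpha}{x^{\alpha+1}}\,dx = 1 - \frac{1}{z^\alpha}\int_{x_0}^\infty\frac{\alpha x_0^{2\alpha}}{x^{2\alpha+1}}\,dx = 1 - \frac{1}{2z^\alpha}$$
For $z < 1$ we have
$$P[Z \le z] = \int_{x_0/z}^\infty F(zx)f(x)\,dx = \int_{x_0/z}^\infty\left(1-\frac{x_0^\alpha}{z^\alpha x^\alpha}\right)\frac{\alpha x_0^\alpha}{x^{\alpha+1}}\,dx = z^\alpha - \frac{z^\alpha}{2} = \frac{z^\alpha}{2}$$
and hence
$$f_Z(z) = \begin{cases}\frac{1}{2}\alpha/z^{\alpha+1} & \text{if } z > 1;\\ \frac{1}{2}\alpha z^{\alpha-1} & \text{if } z < 1.\end{cases}$$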
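19.
$$P[M \le x, Y/X \le y] = P[M \le x, Y/X \le y, X \le Y] + P[M \le x, Y/X \le y, Y < X] = P[X \le x, Y \le yX, X \le Y] + P[Y \le x, Y \le yX, Y < X]$$
$$= \begin{cases}P[X \le x, Y \le yX, X \le Y] + P[Y \le x, Y < X] & \text{if } y > 1;\\ P[Y \le x, Y \le yX] & \text{if } y < 1;\end{cases} = \begin{cases}\int_{x_0}^xP[v \le Y \le yv]f(v)\,dv + P[Y \le x, Y < X] & \text{if } y > 1;\\ P[Y \le x, Y \le yX] & \text{if } y < 1;\end{cases}$$
$$= \begin{cases}\int_{x_0}^x[F(yv)-F(v)]f(v)\,dv + \int_{x_0}^x[1-F(v)]f(v)\,dv & \text{if } y > 1;\\ \int_{x_0}^x\left[1-F(v/y)\right]f(v)\,dv & \text{if } y < 1;\end{cases}$$
$$= x_0^\alpha I\int_{x_0}^x\frac{f(v)}{v^\alpha}\,dv \quad\text{where}\quad I = \begin{cases}1 - \frac{1}{y^\alpha} + 1 & \text{if } y > 1;\\ y^\alpha & \text{if } y < 1;\end{cases} \quad\text{and}\quad \int_{x_0}^x\frac{f(v)}{v^\alpha}\,dv = \frac{1}{2x_0^\alpha}\left(1 - \frac{x_0^{2\alpha}}{x^{2\alpha}}\right)$$
$$= \left(1 - \frac{x_0^{2\alpha}}{x^{2\alpha}}\right)\begin{cases}1 - \frac{1}{2y^\alpha} & \text{if } y > 1;\\ \frac{y^\alpha}{2} & \text{if } y < 1;\end{cases} = P[M \le x]\,P[Y/X \le y]$$
by using exercises 10 and 18.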
20. Define the vector $(Y_1, Y_2, \dots, Y_n)$ by
$$Y_1 = X_{1:n},\quad Y_2 = \frac{X_{2:n}}{X_{1:n}},\quad\dots,\quad Y_n = \frac{X_{n:n}}{X_{1:n}}$$
Exercise 15 on page 8 (with answer 1.3(15) on page 63) shows that for $y_1 > 0$ and $1 \le y_2 \le \cdots \le y_n$ the density of the vector $(Y_1,\dots,Y_n)$ is
$$h(y_1,\dots,y_n) = n!y_1^{n-1}f(y_1)f(y_1y_2)f(y_1y_3)\cdots f(y_1y_n) = n!\,\frac{\alpha^nx_0^{\alpha n}}{y_1^{\alpha n+1}}\,\frac{1}{y_2^{\alpha+1}y_3^{\alpha+1}\cdots y_n^{\alpha+1}}$$
$$= \frac{\alpha nx_0^{\alpha n}}{y_1^{\alpha n+1}}\,(n-1)!\alpha^{n-1}\frac{1}{y_2^{\alpha+1}y_3^{\alpha+1}\cdots y_n^{\alpha+1}} = g(y_1)\,(n-1)!\alpha^{n-1}\frac{1}{y_2^{\alpha+1}y_3^{\alpha+1}\cdots y_n^{\alpha+1}}$$
where $g$ is the density of $Y_1 = X_{1:n}$ (see answer 10 above). Hence part (a).
(b) Given $X_{(1)} = x$, the other $n-1$ observations, here denoted $Y_1,\dots,Y_{n-1}$, are i.i.d. with density $f(y)/[1-F(x)]$ for $y \in (x,\infty)$. Hence
$$E\left[\frac{S_n}{X_{(1)}}\,\Big|\,X_{(1)} = x\right] = 1 + \frac{E[Y_1+\cdots+Y_{n-1}]}{x} = 1 + \frac{(n-1)E[Y]}{x} = 1 + \frac{n-1}{x}\,\frac{\int_x^\infty yf(y)\,dy}{1-F(x)} = 1 + \frac{(n-1)\alpha}{\alpha-1}$$
as required.
(c) The result of part (a) implies $Y_1$ is independent of $Y_2+\cdots+Y_n = (S_n - X_{1:n})/X_{1:n} = S_n/X_{1:n} - 1$. Hence $Y_1$ is independent of $S_n/X_{1:n}$ as required.
21. The distribution of $X_{(i+1)}$ given $X_{(i)}=x$ is the same as the distribution of the minimum of $n-i$ independent random variables from the density $f(y)/[1-F(x)]$ for $y\in(x,\infty)$; this minimum has distribution function $1-\left\{\frac{1-F(y)}{1-F(x)}\right\}^{n-i}$ and density $(n-i)f(y)\{1-F(y)\}^{n-i-1}/\{1-F(x)\}^{n-i}$ for $y\in(x,\infty)$. Hence
$$E\left[\left(\frac{X_{(i+1)}}{X_{(i)}}\right)^r\,\middle|\,X_{(i)}=x\right]=\frac{n-i}{x^r\{1-F(x)\}^{n-i}}\int_x^\infty y^rf(y)\{1-F(y)\}^{n-i-1}\,dy\tag{17.21a}$$
⇐ Substituting $f(x)=\alpha x_0^\alpha/x^{\alpha+1}$ and $F(x)=1-x_0^\alpha/x^\alpha$ in the right hand side of equation (17.21a) gives $(n-i)\alpha/((n-i)\alpha-r)$ as required. The condition $\alpha>r/(n-i)$ ensures the integral is finite.
⇒ Equation (17.21a) gives
$$(n-i)\int_x^\infty y^rf(y)\{1-F(y)\}^{n-i-1}\,dy=cx^r\{1-F(x)\}^{n-i}\quad\text{for }x\in(x_0,\infty).$$
Differentiating with respect to $x$ gives $f(x)/\{1-F(x)\}=cr/[x(n-i)(c-1)]=\alpha/x$ where $\alpha=rc/[(n-i)(c-1)]>r/(n-i)$. Hence result. Part (b) is similar.
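The moment formula of exercise 21 can be checked numerically. The sketch below assumes a Pareto sample with $F(x)=1-(x_0/x)^\alpha$, sorts $n$ draws, and averages $(X_{(i+1)}/X_{(i)})^r$; it compares the result with $(n-i)\alpha/((n-i)\alpha-r)$. The helper names, seed and sample sizes are illustrative assumptions, and the simulated value only matches up to Monte Carlo error.

```python
import random


def ratio_moment_exact(alpha, n, i, r):
    """(n - i)*alpha / ((n - i)*alpha - r), finite only when alpha > r/(n - i)."""
    k = (n - i) * alpha
    if k <= r:
        raise ValueError("the moment is infinite unless alpha > r/(n - i)")
    return k / (k - r)


def ratio_moment_mc(alpha, x0, n, i, r, sims=100_000, seed=2):
    """Monte Carlo estimate of E[(X_(i+1)/X_(i))**r] for a Pareto sample of size n.

    i is 1-based as in the text, so after sorting X_(i) is xs[i - 1] and X_(i+1) is xs[i].
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(sims):
        xs = sorted(x0 * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n))
        total += (xs[i] / xs[i - 1]) ** r
    return total / sims
```

With $\alpha=2$, $n=5$, $i=2$ and $r=1$ the exact value is $6/5$; the estimate does not depend on $x_0$, in line with the conditional expectation in (17.21a) not depending on $x$.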
22. Let $X_1=\ln(X)$ and $Y_1=\ln(Y)$. Then $X_1$ and $Y_1$ are i.i.d. random variables with an absolutely continuous distribution. Also $\min(X_1,Y_1)=\ln[\min(X,Y)]$ is independent of $Y_1-X_1=\ln(Y/X)$. Hence there exists $\lambda>0$ such that $X_1$ and $Y_1$ have the exponential$(\lambda)$ distribution. Hence $X=e^{X_1}$ and $Y=e^{Y_1}$ have the Pareto$(\lambda,0,1)$ distribution.
23. ⇐ By equation (2.3a) on page 4 the density of $X_{i:n}$ is
$$f_{i:n}(t)=\frac{n!}{(i-1)!(n-i)!}f(t)\{F(t)\}^{i-1}\{1-F(t)\}^{n-i}=\frac{n!}{(i-1)!(n-i)!}\frac{\beta}{t^{\beta+1}}\left(1-\frac{1}{t^\beta}\right)^{i-1}\left(\frac{1}{t^\beta}\right)^{n-i}=\frac{n!}{(i-1)!(n-i)!}\frac{\beta}{t^{\beta n+1}}\left(t^\beta-1\right)^{i-1}\quad\text{for }t>1.$$
By equation (2.5a) on page 5, the joint density of $(X_{i:n},X_{j:n})$ is, where $c=n!/[(i-1)!(j-i-1)!(n-j)!]$,
$$\begin{aligned}
f_{(i:n,j:n)}(u,v)&=cf(u)f(v)\{F(u)\}^{i-1}\{F(v)-F(u)\}^{j-1-i}\{1-F(v)\}^{n-j}\\
&=c\frac{\beta^2}{u^{\beta+1}v^{\beta+1}}\left(1-\frac{1}{u^\beta}\right)^{i-1}\left(\frac{1}{u^\beta}-\frac{1}{v^\beta}\right)^{j-i-1}\left(\frac{1}{v^\beta}\right)^{n-j}\\
&=c\frac{\beta^2}{u^{\beta+1}v^{\beta(n-j+1)+1}}\left(\frac{u^\beta-1}{u^\beta}\right)^{i-1}\left(\frac{v^\beta-u^\beta}{u^\beta v^\beta}\right)^{j-i-1}\\
&=c\frac{\beta^2}{u^{\beta(j-1)+1}v^{\beta(n-i)+1}}\left(u^\beta-1\right)^{i-1}\left(v^\beta-u^\beta\right)^{j-i-1}\quad\text{for }1\le u<v.
\end{aligned}$$
Use the transformation $(T,W)=(X_{i:n},X_{j:n}/X_{i:n})$. The absolute value of the Jacobian is $\left|\frac{\partial(t,w)}{\partial(u,v)}\right|=|1/u|=1/t$. Hence
$$f_{(T,W)}(t,w)=c\beta^2\frac{\left(t^\beta-1\right)^{i-1}\left(w^\beta-1\right)^{j-i-1}}{t^{\beta n+1}w^{\beta(n-i)+1}}=f_{i:n}(t)f_W(w)$$
The fact that the joint density is the product of the marginal densities implies $W$ and $T=X_{i:n}$ are independent.
⇒ The joint density of $(X_{i:n},X_{j:n})$ is given by equation (2.5a) on page 5. The transformation to $T=X_{i:n}$, $W=X_{j:n}/X_{i:n}$ has Jacobian with absolute value $\left|\frac{\partial(t,w)}{\partial(u,v)}\right|=1/u=1/t$. Hence $(T,W)$ has density
$$f_{(T,W)}(t,w)=ctf(t)f(wt)\{F(t)\}^{i-1}\{F(wt)-F(t)\}^{j-i-1}\{1-F(wt)\}^{n-j}$$
Now $T=X_{i:n}$ has density given by equation (2.3a) on page 4:
$$f_T(t)=\frac{n!}{(i-1)!(n-i)!}f(t)\{F(t)\}^{i-1}\{1-F(t)\}^{n-i}$$
Hence the conditional density is, for all $t>1$ and $w>1$,
$$\begin{aligned}
f_{(W|T)}(w|t)=\frac{f_{(T,W)}(t,w)}{f_T(t)}&=\frac{(n-i)!}{(j-i-1)!(n-j)!}\,\frac{tf(tw)}{1-F(t)}\left[\frac{F(tw)-F(t)}{1-F(t)}\right]^{j-i-1}\left[\frac{1-F(tw)}{1-F(t)}\right]^{n-j}\\
&=\frac{(n-i)!}{(j-i-1)!(n-j)!}\left(-\frac{\partial q(t,w)}{\partial w}\right)\{1-q(t,w)\}^{j-i-1}\{q(t,w)\}^{n-j}\quad\text{where }q(t,w)=\frac{1-F(tw)}{1-F(t)}
\end{aligned}$$
and by independence, this must be independent of $t$. Hence there exists a function $g(w)$ with
$$\begin{aligned}
g(w)&=\frac{\partial q(t,w)}{\partial w}\{1-q(t,w)\}^{j-i-1}\{q(t,w)\}^{n-j}=\frac{\partial q(t,w)}{\partial w}\sum_{r=0}^{j-i-1}\binom{j-i-1}{r}(-1)^r\{q(t,w)\}^r\{q(t,w)\}^{n-j}\\
&=\sum_{r=0}^{j-i-1}\binom{j-i-1}{r}(-1)^r\{q(t,w)\}^{r+n-j}\frac{\partial q(t,w)}{\partial w}=\frac{\partial}{\partial w}\sum_{r=0}^{j-i-1}\binom{j-i-1}{r}(-1)^r\frac{\{q(t,w)\}^{r+n-j+1}}{r+n-j+1}
\end{aligned}$$
and hence
$$g_1(w)=\int g(w)\,dw=\sum_{r=0}^{j-i-1}\binom{j-i-1}{r}(-1)^r\frac{\{q(t,w)\}^{r+n-j+1}}{r+n-j+1}$$
$$0=\frac{\partial g_1(w)}{\partial t}=\sum_{r=0}^{j-i-1}\binom{j-i-1}{r}(-1)^r\{q(t,w)\}^{r+n-j}\frac{\partial q(t,w)}{\partial t}=g(w)\frac{\partial q(t,w)/\partial t}{\partial q(t,w)/\partial w}$$
and hence $\partial q(t,w)/\partial t=0$ and so $q(t,w)$ is a function of $w$ only. Setting $t=1$ shows that $q(t,w)=(1-F(tw))/(1-F(t))=q(1,w)=(1-F(w))/(1-F(1))$. Hence $1-F(tw)=(1-F(w))(1-F(t))/(1-F(1))$. But $F(1)=0$; hence we have the following equation for the continuous function $F$:
$$1-F(tw)=(1-F(t))(1-F(w))$$
for all $t\ge1$ and $w\ge1$ with boundary conditions $F(1)=0$ and $F(\infty)=1$. This is effectively Cauchy's logarithmic functional equation, and it leads to $F(x)=1-1/x^\beta$ for $x\ge1$, for some $\beta>0$.
Chapter 1 Section 19 on page 36
24. (a) Let $W=X+Y$. For $w>0$,
$$f_W(w)=\int_{x=w}^\infty f_X(x)f_Y(w-x)\,dx=\int_{x=w}^\infty\alpha e^{-\alpha x}\alpha e^{\alpha(w-x)}\,dx=\alpha^2e^{\alpha w}\int_{x=w}^\infty e^{-2\alpha x}\,dx=\frac{\alpha}{2}e^{-\alpha w}$$
For $w<0$,
$$f_W(w)=\int_{-\infty}^wf_Y(y)f_X(w-y)\,dy=\alpha^2e^{-\alpha w}\int_{-\infty}^we^{2\alpha y}\,dy=\frac{\alpha}{2}e^{\alpha w}$$
and hence $f_W(w)=\frac{\alpha}{2}e^{-\alpha|w|}$ for $w\in\mathbb{R}$; this is the Laplace$(0,\alpha)$ distribution.
(b) The Laplace$(0,\alpha)$ distribution by part (a).
25. (a) The expectation, median and mode are all $\mu$. Also $\operatorname{var}[X]=\operatorname{var}[X-\mu]=2/\alpha^2$ by using the representation of the Laplace as the difference of two independent exponentials given in exercise 24.
(b) The distribution function is
$$F_X(x)=\begin{cases}\frac12e^{-\alpha(\mu-x)}&\text{if }x<\mu;\\ 1-\frac12e^{-\alpha(x-\mu)}&\text{if }x\ge\mu.\end{cases}$$
(c) Using the representation in exercise 24 again implies the moment generating function is
$$E[e^{tX}]=e^{\mu t}E[e^{t(X-\mu)}]=e^{\mu t}\,\frac{\alpha}{\alpha-t}\,\frac{\alpha}{\alpha+t}=\frac{\alpha^2e^{\mu t}}{\alpha^2-t^2}\quad\text{for }|t|<\alpha.$$
26. For $x>0$ we have $P[|X|<x]=P[-x<X<x]=\int_{-x}^x\frac{\alpha}{2}e^{-\alpha|y|}\,dy=2\int_0^x\frac{\alpha}{2}e^{-\alpha y}\,dy=\int_0^x\alpha e^{-\alpha y}\,dy$. Hence $|X|$ has the exponential density $\alpha e^{-\alpha x}$ for $x>0$.
27. (a) Now $E[e^{tX}]=\alpha^2e^{\mu t}/(\alpha^2-t^2)$. Hence if $Y=kX+b$ then $E[e^{tY}]=e^{bt}E[e^{tkX}]=e^{bt}\alpha^2e^{\mu kt}/(\alpha^2-k^2t^2)=e^{t(k\mu+b)}\alpha_1^2/(\alpha_1^2-t^2)$ where $\alpha_1=\alpha/k$. This is the mgf of the Laplace$(k\mu+b,\alpha/k)$ distribution.
(b) Now $X-\mu\sim$ Laplace$(0,\alpha)$; hence $\alpha(X-\mu)\sim$ Laplace$(0,1)$.
(c) By part (b), $\alpha(X-\mu)\sim$ Laplace$(0,1)$; hence $\alpha|X-\mu|\sim$ exponential$(1)$; hence $\alpha\sum_{i=1}^n|X_i-\mu|\sim$ Gamma$(n,1)$; hence $2\alpha\sum_{i=1}^n|X_i-\mu|\sim$ Gamma$(n,1/2)=\chi^2_{2n}$.
28. Now $|X-\mu|\sim$ exponential$(\alpha)=$ Gamma$(1,\alpha)$. Hence result by equation (12.8a) on page 25.
29. Let $W=X$ and $Z=\ln(X/Y)$. Now $(X,Y)\in(0,1)\times(0,1)$. Clearly $W\in(0,1)$. Also $0<X<X/Y$; hence $e^Z=X/Y>X=W$. This implies: if $Z>0$ then $0<W<1$ and if $Z<0$ then $0<W<e^Z$.
Then $\left|\frac{\partial(w,z)}{\partial(x,y)}\right|=1/y=e^z/x$. Hence $f_{(W,Z)}(w,z)=y=we^{-z}$.
If $z>0$ then $f_Z(z)=\int_0^1we^{-z}\,dw=\frac12e^{-z}$. If $z<0$ then $f_Z(z)=\int_0^{e^z}we^{-z}\,dw=\frac12e^z$. This is the Laplace$(0,1)$ distribution.
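The representation in exercise 24 also gives a simple way to sample from the Laplace distribution. The sketch below assumes Laplace$(\mu,\alpha)$ means $\mu$ plus the difference of two independent exponential$(\alpha)$ variables, as in exercises 24 and 25, and compares sample statistics with $\operatorname{var}[X]=2/\alpha^2$, the distribution function of exercise 25(b) and the exponential law of $|X-\mu|$ from exercise 26. Names, seed and sample size are illustrative assumptions.

```python
import math
import random


def laplace_draw(rng, mu, alpha):
    """mu + (E1 - E2) with E1, E2 i.i.d. exponential(alpha), as in exercise 24."""
    return mu + rng.expovariate(alpha) - rng.expovariate(alpha)


def laplace_cdf(x, mu, alpha):
    """Distribution function from exercise 25(b)."""
    if x < mu:
        return 0.5 * math.exp(-alpha * (mu - x))
    return 1.0 - 0.5 * math.exp(-alpha * (x - mu))


def summarize(mu, alpha, x, n=200_000, seed=3):
    """Sample mean, sample variance, P[X <= x] and P[|X - mu| <= x - mu] estimates."""
    rng = random.Random(seed)
    xs = [laplace_draw(rng, mu, alpha) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((v - mean) ** 2 for v in xs) / n
    below = sum(v <= x for v in xs) / n
    near = sum(abs(v - mu) <= x - mu for v in xs) / n
    return mean, var, below, near
```

Note that `random.expovariate` takes the rate parameter, which matches the exponential$(\alpha)$ convention used here.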
30. Let $Z=X(2Y-1)$. Then
$$E[e^{tX(2Y-1)}]=\frac12E[e^{tX}]+\frac12E[e^{-tX}]=\frac12\,\frac{\alpha}{\alpha-t}+\frac12\,\frac{\alpha}{\alpha+t}=\frac{\alpha}{2}\,\frac{2\alpha}{\alpha^2-t^2}=\frac{\alpha^2}{\alpha^2-t^2}\quad\text{for }|t|<\alpha$$
as required.
31. (a) Let $Y_1=(X_1+X_2)/2$, $Y_2=(X_3+X_4)/2$, $Y_3=(X_1-X_2)/2$ and $Y_4=(X_3-X_4)/2$. Then $X_1X_2=Y_1^2-Y_3^2$ and $X_3X_4=Y_2^2-Y_4^2$. Hence $X_1X_2-X_3X_4=(Y_1^2+Y_4^2)-(Y_2^2+Y_3^2)$.
Now
$$\begin{aligned}
E\left[\exp\big(i(t_1Y_1+t_2Y_2+t_3Y_3+t_4Y_4)\big)\right]&=E\left[\exp\left(i\left(\frac{t_1+t_3}{2}X_1+\frac{t_1-t_3}{2}X_2+\frac{t_2+t_4}{2}X_3+\frac{t_2-t_4}{2}X_4\right)\right)\right]\\
&=\exp\left(-\frac{(t_1+t_3)^2}{8}-\frac{(t_1-t_3)^2}{8}-\frac{(t_2+t_4)^2}{8}-\frac{(t_2-t_4)^2}{8}\right)\\
&=\exp\left(-\frac{t_1^2+t_2^2+t_3^2+t_4^2}{4}\right)=E[e^{it_1Y_1}]\,E[e^{it_2Y_2}]\,E[e^{it_3Y_3}]\,E[e^{it_4Y_4}]
\end{aligned}$$
Hence $Y_1$, $Y_2$, $Y_3$ and $Y_4$ are i.i.d. $N(0,\sigma^2=1/2)$. Hence $2(X_1X_2-X_3X_4)=2(Y_1^2+Y_4^2)-2(Y_2^2+Y_3^2)$ is equal to the difference of two independent $\chi^2_2=$ exponential$(1/2)$ distributions, which is the Laplace$(0,1/2)$ distribution.
(b) $X_1X_2+X_3X_4=(Y_1^2+Y_2^2)-(Y_3^2+Y_4^2)$ and then as for part (a).
32. Using characteristic functions.
$$E[e^{itY\sqrt{2X}}]=\int_0^\infty E[e^{itY\sqrt{2x}}]e^{-x}\,dx=\int_0^\infty e^{-\frac12\cdot2xt^2}e^{-x}\,dx=\int_0^\infty e^{-x(1+t^2)}\,dx=\frac{1}{1+t^2}$$
and this is the c.f. of the Laplace$(0,1)$ distribution. Hence result.
Using densities. Let $Y_1=X$ and $Y_2=Y\sqrt{2X}$. Hence $Y_1>0$ and $Y_2\in\mathbb{R}$ and $\left|\frac{\partial(y_1,y_2)}{\partial(x,y)}\right|=\sqrt{2x}=\sqrt{2y_1}$. Hence
$$f_{(Y_1,Y_2)}(y_1,y_2)=\frac{f_{(X,Y)}(x,y)}{\sqrt{2y_1}}=\frac{1}{\sqrt{2y_1}}\,e^{-x}\,\frac{1}{\sqrt{2\pi}}e^{-y^2/2}=\frac{1}{2\sqrt{\pi y_1}}e^{-y_1}e^{-y_2^2/4y_1}$$
Using the substitution $y_1=z^2/2$ and equation (7.12b) on page 16 gives
$$f_{Y_2}(y_2)=\int_0^\infty\frac{1}{2\sqrt{\pi y_1}}e^{-(y_1+y_2^2/4y_1)}\,dy_1=\int_0^\infty\frac{1}{\sqrt{2\pi}}e^{-\left(\frac{z^2}{2}+\frac{y_2^2}{2z^2}\right)}\,dz=\frac12e^{-|y_2|}$$
as required.
33. (a) The absolute value of the Jacobian of the transformation $x=r\cos\theta$, $y=r\sin\theta$ is
$$\left|\frac{\partial(x,y)}{\partial(r,\theta)}\right|=\left|\det\begin{pmatrix}\cos\theta&-r\sin\theta\\ \sin\theta&r\cos\theta\end{pmatrix}\right|=r$$
Hence for $r\in(0,\infty)$ and $\theta\in(0,2\pi)$ we have
$$f_{(R,\Theta)}(r,\theta)=\frac{r}{2\pi\sigma^2}e^{-r^2/2\sigma^2}$$
Hence $\Theta$ is uniform on $(0,2\pi)$ with density $f(\theta)=1/2\pi$ and $R$ has density $f_R(r)=(r/\sigma^2)e^{-r^2/2\sigma^2}$ for $r>0$.
(b)
$$f_{(X,Y)}(x,y)=\frac{1}{2\pi}\,\frac1r\,\frac{r}{\sigma^2}e^{-r^2/2\sigma^2}=\frac{1}{2\pi\sigma^2}e^{-(x^2+y^2)/2\sigma^2}\quad\text{for }(x,y)\in\mathbb{R}^2.$$
Hence $X$ and $Y$ are i.i.d. $N(0,\sigma^2)$.
34. (a) $P[R\le r]=1-e^{-r^2/2\sigma^2}$ for $r\ge0$.
(b) Using the substitution $r^2=y$ shows
$$E[R^n]=\frac{1}{\sigma^2}\int_0^\infty r^{n+1}e^{-r^2/2\sigma^2}\,dr=\frac{1}{2\sigma^2}\int_0^\infty y^{n/2}e^{-y/2\sigma^2}\,dy=2^{n/2}\sigma^n\,\Gamma\left(\frac{n+2}{2}\right)$$
where the last equality comes from the integral of the gamma density: $\int_0^\infty x^{n-1}e^{-\lambda x}\,dx=\Gamma(n)/\lambda^n$.
(c) $E[R]=\sigma\sqrt{\pi/2}$; $E[R^2]=2\sigma^2$ and hence $\operatorname{var}[R]=(4-\pi)\sigma^2/2$.
(d) By differentiating the density, the mode is at $\sigma$. From part (a), the median is at $\sigma\sqrt{2\ln2}$.
35. $P[X\le x]=P[\sqrt{-2\ln U}\le x/\sigma]=P[-2\ln U\le x^2/\sigma^2]=P[\ln U\ge-x^2/2\sigma^2]=P[U\ge e^{-x^2/2\sigma^2}]=1-e^{-x^2/2\sigma^2}$ as required.
36. (a) Now $f_R(r)=re^{-r^2/2\sigma^2}/\sigma^2$ for $r>0$. Let $V=R^2$. Then $f_V(v)=e^{-v/2\sigma^2}/2\sigma^2$, which is the exponential$(1/2\sigma^2)$, or equivalently Gamma$(n=1,\alpha=1/2\sigma^2)$.
(b) The density of $V=R^2$ is $f_V(v)=e^{-v/2}/2$, which is the exponential$(1/2)$, or equivalently Gamma$(n=1,\alpha=1/2)=\chi^2_2$.
(b) Because the sum of i.i.d. exponentials is a Gamma distribution, the result follows.
(c) $X$ has density $f_X(x)=\lambda e^{-\lambda x}$ for $x>0$. Hence $f_Y(y)=2\lambda ye^{-\lambda y^2}$ for $y>0$. This is the Rayleigh$(1/\sqrt{2\lambda})$ distribution.
37. First consider the case $\mu=0$. Then we have
$$f_X(x)=\frac{x}{s^2}e^{-x^2/2s^2}\ \text{for }x>0.\qquad\text{Also}\qquad f_{Y|X}(y|x)=\frac{1}{\sqrt{2\pi x^2}}e^{-y^2/2x^2}\ \text{for }y\in\mathbb{R}.$$
Hence, by using the integration result in equation (7.12b) on page 16, we get
$$f_Y(y)=\int_{x=0}^\infty f_{Y|X}(y|x)f_X(x)\,dx=\int_{x=0}^\infty\frac{1}{s^2}\,\frac{1}{\sqrt{2\pi}}e^{-\frac12\left(x^2/s^2+y^2/x^2\right)}\,dx=\frac{1}{2s}e^{-|y|/s}$$
which is the Laplace$(0,1/s)$ distribution. Finally, if $Y|X\sim N(\mu,\sigma=X)$ then $(Y-\mu)|X\sim N(0,\sigma=X)$ and $(Y-\mu)\sim$ Laplace$(0,1/s)$. Hence result.
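The sampling representations in exercises 32 and 35 are easy to check by simulation. The sketch below assumes, as the answers above indicate, that in exercise 32 $X$ is exponential$(1)$ and $Y$ is $N(0,1)$ independent of $X$, and that in exercise 35 $X=\sigma\sqrt{-2\ln U}$ with $U$ uniform. It compares $P[|Y\sqrt{2X}|\le w]$ with the Laplace$(0,1)$ value $1-e^{-w}$, and the Rayleigh sample mean and median with $\sigma\sqrt{\pi/2}$ and $\sigma\sqrt{2\ln2}$ from exercise 34. Function names and tuning constants are hypothetical.

```python
import math
import random


def laplace_via_scale_mixture(rng):
    """Y * sqrt(2X) with X ~ exponential(1) and Y ~ N(0, 1) independent (exercise 32)."""
    return rng.gauss(0.0, 1.0) * math.sqrt(2.0 * rng.expovariate(1.0))


def rayleigh_via_uniform(rng, sigma):
    """sigma * sqrt(-2 ln U) with U uniform on (0, 1] (exercise 35)."""
    u = 1.0 - rng.random()  # avoids log(0)
    return sigma * math.sqrt(-2.0 * math.log(u))


def check_32_35(sigma=1.5, w=1.0, n=200_000, seed=4):
    """Returns the empirical P[|W| <= w] and the Rayleigh sample mean and median."""
    rng = random.Random(seed)
    p_laplace = sum(abs(laplace_via_scale_mixture(rng)) <= w for _ in range(n)) / n
    rs = sorted(rayleigh_via_uniform(rng, sigma) for _ in range(n))
    return p_laplace, sum(rs) / n, rs[n // 2]
```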
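38. (a) Setting $\beta=1$ gives $f_X(x)=e^{-x/\gamma}/\gamma$, which is the exponential$(1/\gamma)$ distribution.
(b) Setting $\beta=2$ gives $f_X(x)=2xe^{-(x/\gamma)^2}/\gamma^2$; this is the Rayleigh distribution with $\sigma^2=\gamma^2/2$.
39. Now $\frac{dw}{dx}=\frac{\gamma}{\beta}x^{1/\beta-1}$. Hence
$$f_W(w)=\frac{\beta}{\gamma}\left(\frac{w}{\gamma}\right)^{\beta-1}e^{-(w/\gamma)^\beta}=\frac{\beta w^{\beta-1}}{\gamma^\beta}e^{-(w/\gamma)^\beta}\quad\text{for }w>0.$$
40. (a) Now $\frac{dy}{dx}=\alpha$; hence $f_Y(y)=\beta y^{\beta-1}e^{-(y/\alpha\gamma)^\beta}/(\alpha\gamma)^\beta$, which is the Weibull$(\alpha\gamma,\beta)$ distribution.
(b) Using the substitution $y=x^\beta/\gamma^\beta$ gives
$$E[X^n]=\frac{\beta}{\gamma^\beta}\int_0^\infty x^{n+\beta-1}e^{-(x/\gamma)^\beta}\,dx=\gamma^n\int_0^\infty y^{n/\beta}e^{-y}\,dy=\gamma^n\,\Gamma\left(1+\frac{n}{\beta}\right)$$
(c) $E[X]=\gamma\Gamma(1+1/\beta)$ where $\beta$ is the shape and $\gamma$ is the scale. Also $\operatorname{var}[X]=\gamma^2\left[\Gamma(1+2/\beta)-\Gamma(1+1/\beta)^2\right]$. The median is $\gamma(\ln2)^{1/\beta}$. The mode is $\gamma(1-1/\beta)^{1/\beta}$ if $\beta>1$ and 0 otherwise.
(d) $E[e^{t\ln X}]=E[X^t]=\gamma^t\,\Gamma(1+t/\beta)$.
41. (a) It is $h(x)=f(x)/[1-F(x)]=\beta x^{\beta-1}/\gamma^\beta$. (b) Straightforward.
42. $(-\ln U)$ has the exponential$(1)$ distribution; hence result by exercise 39.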
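Exercises 39 and 42 together give an inverse-transform sampler for the Weibull distribution. The sketch below assumes $W=\gamma(-\ln U)^{1/\beta}$ with $U$ uniform, as those answers indicate, and compares sample statistics with the mean, variance and median from exercise 40(c). The parameter names `scale` ($\gamma$) and `shape` ($\beta$), the seed and the sample size are illustrative choices.

```python
import math
import random


def weibull_draw(rng, scale, shape):
    """scale * (-ln U)**(1/shape): Weibull(scale, shape) by exercises 39 and 42."""
    u = 1.0 - rng.random()  # u in (0, 1], so log(u) is finite
    return scale * (-math.log(u)) ** (1.0 / shape)


def weibull_mean(scale, shape):
    return scale * math.gamma(1.0 + 1.0 / shape)


def weibull_var(scale, shape):
    g1 = math.gamma(1.0 + 1.0 / shape)
    return scale ** 2 * (math.gamma(1.0 + 2.0 / shape) - g1 ** 2)


def weibull_median(scale, shape):
    return scale * math.log(2.0) ** (1.0 / shape)


def sample_stats(scale, shape, n=200_000, seed=5):
    """Sample mean, sample variance and sample median of n Weibull draws."""
    rng = random.Random(seed)
    xs = sorted(weibull_draw(rng, scale, shape) for _ in range(n))
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var, xs[n // 2]
```

Setting `shape=1.0` reduces the sampler to the exponential$(1/\gamma)$ case of exercise 38(a), whose mean is $\gamma$.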
Chapter 1 Section 21 on page 41
1. Consider the size variable $g_1:(0,\infty)^2\to(0,\infty)$ with $g_1(x)=x_1$. The associated shape function is $z_1(x)=(1,x_2/x_1)$, so $z_1(X)=(1,X_2/X_1)=(1,a)$. Hence $z_1(X)$ is constant. For any other shape function $z_2$, we know that $z_2(x)=z_2(z_1(x))$ (see the proof of equation (20.3a) on page 39). Hence result.
2. (a) Consider the size variable $g^*(x)=x_1$; the associated shape function is $z^*(x)=(1,x_2/x_1)$. Hence
$$z^*(X)=\begin{cases}(1,3)&\text{with probability }1/2;\\ (1,1/3)&\text{with probability }1/2.\end{cases}$$
If $z$ is any other shape function, then by equation (20.3a) on page 39 we have $z^*(X)=z^*(z(X))$. Hence $z(X)$ cannot be almost surely constant.
The possible values of the 3 quantities are as follows:
probability
z(X)
g1 (X)
g2 (X)
√
1/4
( 1/4, 3/4)
4
√3
1/4
( 1/4, 3/4)
12
4
√
1/4
4
( 3/4, 1/4)
√3
1/4
( 3/4, 1/4)
12
4
Clearly $z(X)$ is independent of both $g_1(X)$ and $g_2(X)$.
(b) By proposition (20.3b) on page 39 we know that $g_1(X)/g_2(X)$ is almost surely constant. It is easy to check that $g_1(X)/g_2(X)=\sqrt3/4$.
3. ⇐ Let $Y_j=X_j^b$ for $j=1,2,\ldots,n$. Then $Y_1,\ldots,Y_n$ are independent random variables with $Y_j\sim$ Gamma$(k_j,\alpha)$. By proposition (20.4a), the shape vector $(1,Y_2/Y_1,\ldots,Y_n/Y_1)$ is independent of $Y_1+\cdots+Y_n$. This means $(1,X_2^b/X_1^b,\ldots,X_n^b/X_1^b)$ is independent of $X_1^b+\cdots+X_n^b$. Hence $(1,X_2/X_1,\ldots,X_n/X_1)$ is independent of $(X_1^b+\cdots+X_n^b)^{1/b}$ as required.
⇒ We are given that $(1,X_2/X_1,\ldots,X_n/X_1)$ is independent of $(X_1^b+\cdots+X_n^b)^{1/b}$. Hence $(1,X_2^b/X_1^b,\ldots,X_n^b/X_1^b)$ is independent of $X_1^b+\cdots+X_n^b$. By proposition (20.4a), there exist $\alpha>0$, $k_1>0,\ldots,k_n>0$ such that $X_j^b\sim$ Gamma$(k_j,\alpha)$ and hence $X_j\sim$ GGamma$(k_j,\alpha,b)$.
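4. (a) $P[X_1<X_2]=\int_0^\infty P[X_2>x]\lambda_1e^{-\lambda_1x}\,dx=\int_0^\infty e^{-\lambda_2x}\lambda_1e^{-\lambda_1x}\,dx=\lambda_1/(\lambda_1+\lambda_2)$.
(b) The lack of memory property implies the distribution of $V=X_2-X_1$ given $X_2>X_1$ is the same as the distribution of $X_2$, and the distribution of $X_2-X_1$ given $X_2<X_1$ is the same as the distribution of $-X_1$. Hence
$$f_V(v)=f_V(v|X_2<X_1)\frac{\lambda_2}{\lambda_1+\lambda_2}+f_V(v|X_1<X_2)\frac{\lambda_1}{\lambda_1+\lambda_2}=\begin{cases}\lambda_1e^{\lambda_1v}\dfrac{\lambda_2}{\lambda_1+\lambda_2}&\text{if }v\le0;\\[1ex]\lambda_2e^{-\lambda_2v}\dfrac{\lambda_1}{\lambda_1+\lambda_2}&\text{if }v\ge0.\end{cases}$$
(c) Now $V=X_2-X_1$. Hence
$$f_V(v)=\int_{\{x:x+v>0\}}f_{X_2}(x+v)f_{X_1}(x)\,dx=\lambda_1\lambda_2e^{-\lambda_2v}\int_{x=\max\{-v,0\}}^\infty e^{-(\lambda_1+\lambda_2)x}\,dx$$
Hence if $v\ge0$ we have
$$f_V(v)=\lambda_1\lambda_2e^{-\lambda_2v}\int_0^\infty e^{-(\lambda_1+\lambda_2)x}\,dx=\frac{\lambda_1\lambda_2e^{-\lambda_2v}}{\lambda_1+\lambda_2}$$
and if $v\le0$ we have
$$f_V(v)=\lambda_1\lambda_2e^{-\lambda_2v}\int_{-v}^\infty e^{-(\lambda_1+\lambda_2)x}\,dx=\frac{\lambda_1\lambda_2e^{\lambda_1v}}{\lambda_1+\lambda_2}$$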
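5. Now $V=Y_2-Y_1$. By exercise 4, we know that
$$f_V(v)=\frac{\lambda_1\lambda_2}{\lambda_1+\lambda_2}\begin{cases}e^{\lambda_1v}&\text{if }v\le0;\\ e^{-\lambda_2v}&\text{if }v\ge0;\end{cases}\qquad\text{and}\qquad P[V\le v]=\begin{cases}\dfrac{\lambda_2}{\lambda_1+\lambda_2}e^{\lambda_1v}&\text{if }v\le0;\\[1ex]1-\dfrac{\lambda_1}{\lambda_1+\lambda_2}e^{-\lambda_2v}&\text{if }v\ge0.\end{cases}$$
Also
$$P[U\ge u]=e^{-\lambda_1(u-a)}e^{-\lambda_2(u-a)}=e^{a(\lambda_1+\lambda_2)}e^{-(\lambda_1+\lambda_2)u}\quad\text{for }u\ge a.$$
Now for $u\ge a$ and $v\in\mathbb{R}$ we have
$$P[U\ge u,V\le v]=P[Y_1\ge u,Y_2\ge u,Y_2-Y_1\le v]=\int_{y_1=u}^\infty P[u\le Y_2\le v+y_1]f_{Y_1}(y_1)\,dy_1=\lambda_1e^{\lambda_1a}\int_{y_1=u}^\infty P[u\le Y_2\le v+y_1]e^{-\lambda_1y_1}\,dy_1$$
where
$$P[u\le Y_2\le v+y_1]=\begin{cases}\lambda_2e^{\lambda_2a}\int_{y_2=u}^{v+y_1}e^{-\lambda_2y_2}\,dy_2=e^{\lambda_2a}\left[e^{-\lambda_2u}-e^{-\lambda_2(v+y_1)}\right]&\text{if }v+y_1>u;\\ 0&\text{if }v+y_1<u.\end{cases}$$
Hence for $v\ge0$ we have
$$\begin{aligned}
P[U\ge u,V\le v]&=\lambda_1e^{\lambda_1a}\int_{y_1=u}^\infty P[u\le Y_2\le v+y_1]e^{-\lambda_1y_1}\,dy_1=\lambda_1e^{(\lambda_1+\lambda_2)a}\int_{y_1=u}^\infty e^{-\lambda_1y_1}\left[e^{-\lambda_2u}-e^{-\lambda_2(v+y_1)}\right]dy_1\\
&=\lambda_1e^{(\lambda_1+\lambda_2)a}\left[e^{-\lambda_2u}\frac{e^{-\lambda_1u}}{\lambda_1}-e^{-\lambda_2v}\frac{e^{-(\lambda_1+\lambda_2)u}}{\lambda_1+\lambda_2}\right]=e^{(\lambda_1+\lambda_2)a}e^{-(\lambda_1+\lambda_2)u}\left[1-\frac{\lambda_1e^{-\lambda_2v}}{\lambda_1+\lambda_2}\right]\\
&=P[U\ge u]\,P[V\le v]
\end{aligned}$$
Similarly, for $v\le0$ we have
$$P[U\ge u,V\le v]=\lambda_1e^{\lambda_1a}\int_{y_1=u-v}^\infty P[u\le Y_2\le v+y_1]e^{-\lambda_1y_1}\,dy_1=\lambda_1e^{(\lambda_1+\lambda_2)a}\int_{y_1=u-v}^\infty e^{-\lambda_1y_1}\left[e^{-\lambda_2u}-e^{-\lambda_2(v+y_1)}\right]dy_1=P[U\ge u]\,P[V\le v]$$
Hence the result.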
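Exercises 4 and 5 can be checked with a small simulation. For simplicity the sketch below assumes no shift ($a=0$), so $X_1\sim$ exponential$(\lambda_1)$ and $X_2\sim$ exponential$(\lambda_2)$ are independent, $U=\min(X_1,X_2)$ and $V=X_2-X_1$. It compares empirical frequencies with $P[X_1<X_2]=\lambda_1/(\lambda_1+\lambda_2)$, the distribution function of $V$, and the factorization $P[U\ge u,V\le v]=P[U\ge u]\,P[V\le v]$. Names, seed and parameter values are illustrative.

```python
import math
import random


def v_cdf(v, lam1, lam2):
    """P[X2 - X1 <= v] for independent X1 ~ exponential(lam1), X2 ~ exponential(lam2)."""
    s = lam1 + lam2
    if v <= 0:
        return lam2 / s * math.exp(lam1 * v)
    return 1.0 - lam1 / s * math.exp(-lam2 * v)


def simulate_exponential_pair(lam1, lam2, u, v, n=200_000, seed=6):
    """Empirical P[X1 < X2], P[V <= v], P[U >= u] and P[U >= u, V <= v]."""
    rng = random.Random(seed)
    first = in_v = in_u = in_both = 0
    for _ in range(n):
        x1, x2 = rng.expovariate(lam1), rng.expovariate(lam2)
        a, b = min(x1, x2) >= u, x2 - x1 <= v
        first += x1 < x2
        in_v += b
        in_u += a
        in_both += a and b
    return first / n, in_v / n, in_u / n, in_both / n
```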
6. Suppose θj > 0 and x0 > 0. Then X ∼ Pareto(α, 0, θj x0 ) iff X/θj ∼ Pareto(α, 0, x0 ) and proceed as in the proof of
proposition(20.5a) on page 40.
7. Suppose θj > 0 and h > 0. Then X ∼ Power(α, 0, θj h) iff X/θj ∼ Power(α, 0, h) and proceed as in the proof of
proposition(20.6a) on page 40.
Chapter 2 Section 2 on page 44
1. Let $Y=X-\alpha$. Then $E[(X-\alpha)(X-\alpha)^T]=E[YY^T]=\operatorname{var}[Y]+\mu_Y\mu_Y^T=\operatorname{var}[X]+(\mu_X-\alpha)(\mu_X-\alpha)^T$.
2. $\operatorname{cov}[a^TX,b^TY]=E[(a^TX-a^T\mu_X)(b^TY-b^T\mu_Y)^T]=a^T\operatorname{cov}[X,Y]\,b$.
3. (a) Just use $E[X+a]=\mu_X+a$ and $E[Y+d]=\mu_Y+d$.
(b) $\operatorname{cov}[AX,BY]=E\left[(AX-A\mu_X)(BY-B\mu_Y)^T\right]=E\left[A(X-\mu_X)(Y-\mu_Y)^TB^T\right]=A\operatorname{cov}[X,Y]B^T$.
(c) $\operatorname{cov}[aX+bV,cY+dW]=E\left[(aX+bV-a\mu_X-b\mu_V)(cY+dW-c\mu_Y-d\mu_W)^T\right]$, which equals
$$E\left[\{a(X-\mu_X)+b(V-\mu_V)\}\{c(Y-\mu_Y)+d(W-\mu_W)\}^T\right]$$
which equals the right hand side.
(d) Similar to (c).
4.
$$\operatorname{var}[X]=\begin{pmatrix}1&1&1&\cdots&1\\1&2&2&\cdots&2\\1&2&3&\cdots&3\\\vdots&\vdots&\vdots&\ddots&\vdots\\1&2&3&\cdots&n\end{pmatrix}$$
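5. As usual let $\sigma_{ij}=\operatorname{cov}[Y_i,Y_j]$. Now $Y^TAY=\sum_{i=1}^n\sum_{j=1}^na_{ij}Y_iY_j$ and hence
$$\begin{aligned}
E[Y^TAY]&=\sum_{i=1}^n\sum_{j=1}^na_{ij}E[Y_iY_j]=\sum_{i=1}^n\sum_{j=1}^na_{ij}\left(\operatorname{cov}[Y_i,Y_j]+\mu_i\mu_j\right)\\
&=\sum_{i=1}^n\sum_{j=1}^na_{ij}\sigma_{ij}+\mu^TA\mu=\sum_{i=1}^n\sum_{j=1}^na_{ij}\sigma_{ji}+\mu^TA\mu=\operatorname{trace}(A\Sigma)+\mu^TA\mu
\end{aligned}$$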
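The identity $E[Y^TAY]=\operatorname{trace}(A\Sigma)+\mu^TA\mu$ only uses first and second moments, so any distribution with mean $\mu$ and variance $\Sigma$ can be used to check it. The sketch below assumes a bivariate normal $Y$, simulated through a hand-written Cholesky factor, and a deliberately non-symmetric $A$; the helper names and all numerical values are illustrative.

```python
import math
import random


def quad_form_mean(A, mu, Sigma):
    """trace(A Sigma) + mu^T A mu, the formula derived above."""
    n = len(mu)
    trace = sum(A[i][j] * Sigma[j][i] for i in range(n) for j in range(n))
    return trace + sum(mu[i] * A[i][j] * mu[j] for i in range(n) for j in range(n))


def quad_form_mc(A, mu, Sigma, n=200_000, seed=7):
    """Monte Carlo estimate of E[Y^T A Y] for a bivariate normal Y (2 x 2 case only)."""
    rng = random.Random(seed)
    l11 = math.sqrt(Sigma[0][0])  # lower-triangular Cholesky factor of Sigma
    l21 = Sigma[1][0] / l11
    l22 = math.sqrt(Sigma[1][1] - l21 ** 2)
    total = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y = (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)
        total += sum(y[i] * A[i][j] * y[j] for i in range(2) for j in range(2))
    return total / n
```

With $A=\begin{pmatrix}1&2\\0&3\end{pmatrix}$, $\mu=(1,-1)^T$ and $\Sigma=\begin{pmatrix}2&0.6\\0.6&1\end{pmatrix}$ the formula gives $6.2+2=8.2$.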
Chapter 2 Section 4 on page 48
For questions 1–8, see the method in the solution of example(3.3a) on page 46.
1. Setting $Q(x,y)=\frac23(x^2-xy+y^2)$ gives $a_1=2/3$, $a_2=-1/3$ and $a_3=2/3$. Hence
$$P=\begin{pmatrix}2/3&-1/3\\-1/3&2/3\end{pmatrix}\quad\text{and}\quad|P|=\frac13.\quad\text{Hence }c=\frac{1}{2\pi\sqrt3}.$$
(b) We have $\operatorname{cov}[X_1,X_2]=-a_2/(a_1a_3-a_2^2)=1$ and hence they are not independent.
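2. Setting $Q(x,y)=2(x^2+2xy+4y^2)$ gives $a_1=2$, $a_2=2$ and $a_3=8$. Clearly $(\mu_1,\mu_2)=(0,0)$. Hence
$$P=\begin{pmatrix}2&2\\2&8\end{pmatrix}\quad\text{and}\quad|P|=12.\quad\text{Also}\quad\Sigma=P^{-1}=\frac{1}{12}\begin{pmatrix}8&-2\\-2&2\end{pmatrix}$$
Finally $|P|^{1/2}=2\sqrt3$ and so $k=\sqrt3/\pi$.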
3. Setting $Q(x,y)=2x^2+y^2+2xy-22x-14y+65$ gives $a_1=2$, $a_2=1$ and $a_3=1$. Hence
$$P=\begin{pmatrix}2&1\\1&1\end{pmatrix}\quad\text{and}\quad|P|=1\quad\text{and}\quad\Sigma=P^{-1}=\begin{pmatrix}1&-1\\-1&2\end{pmatrix}$$
Finally $k=2\pi/|P|^{1/2}=2\pi$.
Now for the mean vector: setting $\frac{\partial Q(x,y)}{\partial x}=0$ and $\frac{\partial Q(x,y)}{\partial y}=0$ gives $4\mu_1+2\mu_2-22=0$ and $2\mu_2+2\mu_1-14=0$. Hence $(\mu_1,\mu_2)=(4,3)$.
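4. Setting $Q(y_1,y_2)=y_1^2+2y_2^2-y_1y_2-3y_1-2y_2+4$ gives $a_1=1$, $a_2=-1/2$ and $a_3=2$. Hence
$$P=\begin{pmatrix}1&-1/2\\-1/2&2\end{pmatrix}\quad\text{and}\quad|P|=\frac74\quad\text{and}\quad\Sigma=P^{-1}=\frac17\begin{pmatrix}8&2\\2&4\end{pmatrix}$$
Setting $\frac{\partial Q(y_1,y_2)}{\partial y_1}=0$ and $\frac{\partial Q(y_1,y_2)}{\partial y_2}=0$ gives $2\mu_1-\mu_2-3=0$ and $4\mu_2-\mu_1-2=0$. Hence $\mu_1=2$ and $\mu_2=1$.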
5. Proceed as in question 2 above. Hence integral = π/√3.
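6. Setting $Q(y_1,y_2)=\frac16\left[y_1^2+2y_1(y_2-1)+4(y_2-1)^2\right]$ gives $\mu_1=0$, $\mu_2=1$, $a_1=1/6$, $a_2=1/6$ and $a_3=2/3$. Hence
$$P=\frac16\begin{pmatrix}1&1\\1&4\end{pmatrix}\quad\text{and}\quad|P|=\frac1{12}\quad\text{and}\quad\Sigma=P^{-1}=\begin{pmatrix}8&-2\\-2&2\end{pmatrix}$$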
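7. Using equation (3.3b) on page 46 for $P$ gives
$$\begin{aligned}
X^TPX-\frac{X_1^2}{\sigma_1^2}&=\frac{\sigma_2^2X_1^2-2\rho\sigma_1\sigma_2X_1X_2+\sigma_1^2X_2^2}{\sigma_1^2\sigma_2^2(1-\rho^2)}-\frac{X_1^2}{\sigma_1^2}=\frac{\rho^2X_1^2/\sigma_1^2-2\rho X_1X_2/\sigma_1\sigma_2+X_2^2/\sigma_2^2}{1-\rho^2}\\
&=\frac{(\rho X_1/\sigma_1-X_2/\sigma_2)^2}{1-\rho^2}
\end{aligned}$$
Let $Z=(\rho X_1/\sigma_1-X_2/\sigma_2)/\sqrt{1-\rho^2}$. Then $Z$ is normal with $E[Z]=0$ and $\operatorname{var}[Z]=E[Z^2]=(\rho^2+1-2\rho\operatorname{cov}[X_1,X_2]/\sigma_1\sigma_2)/(1-\rho^2)=(\rho^2+1-2\rho^2)/(1-\rho^2)=1$. Hence $Z^2\sim\chi^2_1$ as required.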
8. (a) $E[X_1|X_2]=E[Y-\alpha X_2|X_2]=E[Y|X_2]-\alpha X_2=E[Y]-\alpha X_2=\mu_1+\alpha\mu_2-\alpha X_2$.
(b) Suppose $(X_1,X_2)$ has a bivariate normal distribution with $\mu_1=\mu_2=0$. Consider the random vector $(X_2,Y)$ with $Y=X_1-\rho\sigma_1X_2/\sigma_2$. This is a non-singular linear transformation and hence by proposition (3.7a) on page 48 the new vector has a bivariate normal distribution with mean $(0,0)$ and variance matrix
$$\begin{pmatrix}\sigma_2^2&0\\0&(1-\rho^2)\sigma_1^2\end{pmatrix}$$
Hence $Y$ and $X_2$ are independent. Hence by part (a) we have $E[X_1|X_2]=X_2\rho\sigma_1/\sigma_2$. Hence if $V_1=X_1+\mu_1$ and $V_2=X_2+\mu_2$ then $E[V_1|V_2]=\mu_1+(V_2-\mu_2)\rho\sigma_1/\sigma_2$.
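9. (a) $Z$ has the $N(0,1)$ distribution.
(b) Using
$$f_{XZ}(x,z)=\frac{f_{XY}(x,y)}{\left|\frac{\partial(x,z)}{\partial(x,y)}\right|}=\frac{f_{XY}(x,y)}{\sqrt{1-\rho^2}}=\frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left(-\frac{x^2-2\rho xz+z^2}{2(1-\rho^2)}\right)$$
(c) We now have $\left|\frac{\partial(u,v)}{\partial(x,z)}\right|=\sigma_1\sigma_2$ and hence the density of $(U,V)$ is that given in equation (3.3a) on page 46.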
10. Now
$$E[e^{itY}]=\int\!\!\int e^{itQ(x_1,x_2)}f_{X_1X_2}(x_1,x_2)\,dx_1dx_2=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\int_{x_1}\!\int_{x_2}e^{-(1-it)Q(x_1,x_2)}\,dx_1dx_2$$
Define $\alpha_1$ and $\alpha_2$ by $\sigma_1=(1-it)^{1/2}\alpha_1$ and $\sigma_2=(1-it)^{1/2}\alpha_2$. Hence
$$E[e^{itY}]=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\,2\pi\alpha_1\alpha_2\sqrt{1-\rho^2}=\frac{\alpha_1\alpha_2}{\sigma_1\sigma_2}=\frac{1}{1-it}$$
and hence $Y$ has an exponential$(1)$ distribution.
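11. (a) If $|\Sigma|=0$ then $\sigma_1^2\sigma_2^2=\sigma_{12}^2$. The possibilities are
(i) $\sigma_{12}=0$, $\sigma_1=0$ and $\sigma_2\ne0$; $\phi(t)=\exp(-\frac12\sigma_2^2t_2^2)$. Hence $(X_1,X_2)\stackrel{d}{=}(0,Z)$ where $Z\sim N(0,\sigma_2^2)$.
(ii) $\sigma_{12}=0$, $\sigma_1\ne0$ and $\sigma_2=0$; $\phi(t)=\exp(-\frac12\sigma_1^2t_1^2)$. Hence $(X_1,X_2)\stackrel{d}{=}(Z,0)$ where $Z\sim N(0,\sigma_1^2)$.
(iii) $\sigma_{12}=0$, $\sigma_1=0$ and $\sigma_2=0$; $\phi(t)=1$; $(X_1,X_2)\stackrel{d}{=}(0,0)$.
(iv) $\sigma_{12}\ne0$ and $\rho=\sigma_{12}/\sigma_1\sigma_2=+1$; $\phi(t)=E[e^{i(t_1X_1+t_2X_2)}]=\exp\left(-\frac12(\sigma_1t_1+\sigma_2t_2)^2\right)$. Hence if $Z=\sigma_2X_1-\sigma_1X_2$ then setting $t_1=t\sigma_2$ and $t_2=-t\sigma_1$ gives $\phi_Z(t)=E[e^{i(t\sigma_2X_1-t\sigma_1X_2)}]=\exp(-\frac12t^2\times0)=1$. Hence $\sigma_2X_1\stackrel{\text{a.e.}}{=}\sigma_1X_2$.
(v) $\sigma_{12}\ne0$ and $\rho=\sigma_{12}/\sigma_1\sigma_2=-1$; $\sigma_2X_1\stackrel{\text{a.e.}}{=}-\sigma_1X_2$.
(b) (i) $(X_1,X_2)\stackrel{d}{=}(\mu_1,Z)$ where $Z\sim N(\mu_2,\sigma_2^2)$.
(ii) $(X_1,X_2)\stackrel{d}{=}(Z,\mu_2)$ where $Z\sim N(\mu_1,\sigma_1^2)$.
(iii) $(X_1,X_2)\stackrel{\text{a.e.}}{=}(\mu_1,\mu_2)$.
(iv) $\sigma_2(X_1-\mu_1)\stackrel{\text{a.e.}}{=}\sigma_1(X_2-\mu_2)$.
(v) $\sigma_2(X_1-\mu_1)\stackrel{\text{a.e.}}{=}-\sigma_1(X_2-\mu_2)$.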
12. (a) Now $(X,Y)$ is bivariate normal with $E[X]=E[Y]=0$ and variance matrix
$$\begin{pmatrix}a_1^2+a_2^2&a_1b_1+a_2b_2\\a_1b_1+a_2b_2&b_1^2+b_2^2\end{pmatrix}$$
Hence $E[Y|X]=\rho\sigma_2X/\sigma_1=X(a_1b_1+a_2b_2)/(a_1^2+a_2^2)$.
(b) By simple algebra
$$Y-E[Y|X]=\frac{(a_2T_1-a_1T_2)(a_2b_1-a_1b_2)}{a_1^2+a_2^2}\qquad\text{and so}\qquad\left(Y-E[Y|X]\right)^2=\frac{(a_2T_1-a_1T_2)^2(a_2b_1-a_1b_2)^2}{(a_1^2+a_2^2)^2}$$
and $E\left[(a_2T_1-a_1T_2)^2\right]=E\left[a_2^2T_1^2-2a_1a_2T_1T_2+a_1^2T_2^2\right]=a_1^2+a_2^2$. Hence result.
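13. Clearly $\rho=0$ implies $(X,Y)$ are independent and hence $X^2$ and $Y^2$ are independent. Conversely, suppose $X^2$ and $Y^2$ are independent. Recall the characteristic function is
$$\phi(t)=\exp\left(-\frac12t^T\Sigma t\right)=\exp\left(-\frac12\left(t_1^2+2\rho t_1t_2+t_2^2\right)\right)$$
Now the coefficient of $t_1^2t_2^2$ in the expansion of $\phi(t)$ is $E[X^2Y^2]/4$. Only the terms of the exponential series up to the square contribute to this coefficient, so it equals the coefficient of $t_1^2t_2^2$ in the expansion of
$$1-\frac12\left(t_1^2+2\rho t_1t_2+t_2^2\right)+\frac18\left(t_1^2+2\rho t_1t_2+t_2^2\right)^2$$
and hence the coefficient of $t_1^2t_2^2$ in the expansion of $\frac18\left(t_1^2+2\rho t_1t_2+t_2^2\right)^2$, which is $(2+4\rho^2)/8$.
Hence $E[X^2Y^2]=2\rho^2+1$. Independence of $X^2$ and $Y^2$ implies $E[X^2Y^2]=E[X^2]E[Y^2]=1$. Hence $\rho=0$.
This could also be obtained by differentiating the characteristic function, but this is tedious. Here are some of the steps:
$$\begin{aligned}
\phi(t)&=\exp\left(-\tfrac12f(t_1,t_2)\right)\quad\text{where }f(t_1,t_2)=t_1^2+2\rho t_1t_2+t_2^2\\
\frac{\partial\phi(t)}{\partial t_1}&=-\frac12\frac{\partial f}{\partial t_1}\exp\left(-\tfrac12f(t_1,t_2)\right)\\
\frac{\partial^2\phi(t)}{\partial t_1^2}&=\left[-\frac12\frac{\partial^2f}{\partial t_1^2}+\frac14\left(\frac{\partial f}{\partial t_1}\right)^2\right]\exp\left(-\tfrac12f(t_1,t_2)\right)\\
\frac{\partial^2\phi(t)}{\partial t_1^2}&=g(t_1,t_2)\exp\left(-\tfrac12f(t_1,t_2)\right)\quad\text{where }g(t_1,t_2)=t_1^2+\rho^2t_2^2-1+2\rho t_1t_2\\
\frac{\partial^4\phi(t)}{\partial t_1^2\partial t_2^2}&=\left[\frac{\partial^2g}{\partial t_2^2}-\frac12g(t_1,t_2)\frac{\partial^2f}{\partial t_2^2}-\frac{\partial f}{\partial t_2}\frac{\partial g}{\partial t_2}+\frac14g(t_1,t_2)\left(\frac{\partial f}{\partial t_2}\right)^2\right]\exp\left(-\tfrac12f(t_1,t_2)\right)
\end{aligned}$$
Setting $t_1=t_2=0$ gives $2\rho^2-\frac12(-1)(2)=2\rho^2+1$ as above.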
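The moment $E[X^2Y^2]=1+2\rho^2$ used in exercise 13 is easy to confirm by simulation. The sketch below assumes a standard bivariate normal pair built as $X=Z_1$, $Y=\rho Z_1+\sqrt{1-\rho^2}\,Z_2$ with $Z_1,Z_2$ i.i.d. $N(0,1)$, which has unit variances and correlation $\rho$. The function name, seed and sample size are illustrative.

```python
import math
import random


def mixed_fourth_moment_mc(rho, n=200_000, seed=8):
    """Monte Carlo E[X^2 Y^2] for a standard bivariate normal (X, Y) with correlation rho."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + s * rng.gauss(0.0, 1.0)  # corr(x, y) = rho
        total += (x * y) ** 2
    return total / n
```

Only for $\rho=0$ does the estimate approach $E[X^2]E[Y^2]=1$, which is the point of the converse argument.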
14. Let $V_1=X_1/\sigma_1$ and $V_2=X_2/\sigma_2$. Then $(V_1,V_2)$ has a bivariate normal distribution with $E[V_1]=E[V_2]=0$ and $\operatorname{var}[V_1]=\operatorname{var}[V_2]=1$. Then
$$f_{V_1V_2}(v_1,v_2)=\frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left[-\frac{v_1^2-2\rho v_1v_2+v_2^2}{2(1-\rho^2)}\right]$$
Consider the transformation $(W,V)$ with $W=V_1$ and $V=V_1/V_2$. The range is effectively $\mathbb{R}^2$, apart from a set of measure 0. The Jacobian is
$$\frac{\partial(w,v)}{\partial(v_1,v_2)}=\det\begin{pmatrix}1&0\\1/v_2&-v_1/v_2^2\end{pmatrix},\qquad\left|\frac{\partial(w,v)}{\partial(v_1,v_2)}\right|=\frac{|v_1|}{v_2^2}$$
and hence
$$f_{WV}(w,v)=\frac{v_2^2}{|v_1|}f_{V_1V_2}(v_1,v_2)\quad\text{where }v_1=w\text{ and }v_2=w/v$$
Hence
$$f_{WV}(w,v)=\frac{|w|}{2\pi v^2\sqrt{1-\rho^2}}\exp\left[-\frac{w^2(v^2-2\rho v+1)}{2(1-\rho^2)v^2}\right]$$
We now integrate out $w$ (the integrand is even in $w$) to find the density of $V$:
$$\begin{aligned}
f_V(v)&=\int_0^\infty\frac{w}{\pi v^2\sqrt{1-\rho^2}}\exp\left[-\frac{w^2(v^2-2\rho v+1)}{2(1-\rho^2)v^2}\right]dw=\frac{1}{\pi v^2\sqrt{1-\rho^2}}\int_0^\infty w\exp(-\alpha w^2)\,dw\quad\text{where }\alpha=\frac{v^2-2\rho v+1}{2(1-\rho^2)v^2}\\
&=\frac{1}{2\pi v^2\alpha\sqrt{1-\rho^2}}=\frac{\sqrt{1-\rho^2}}{\pi(v^2-2\rho v+1)}
\end{aligned}$$
Now $Z=X_1/X_2=V(\sigma_1/\sigma_2)$, which is a one-to-one transformation. Hence $f_Z(z)=\sigma_2f_V(v)/\sigma_1$ and the result follows.
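Completing the square, $v^2-2\rho v+1=(v-\rho)^2+(1-\rho^2)$, so $V$ is Cauchy with location $\rho$ and scale $\sqrt{1-\rho^2}$, and $P[V\le v]=\frac12+\frac1\pi\tan^{-1}\big((v-\rho)/\sqrt{1-\rho^2}\big)$. The sketch below compares this with a simulation; it assumes standardized margins and avoids division by using $V_1/V_2\le v\iff V_1\le vV_2$ when $V_2>0$ (and the reverse inequality when $V_2<0$). Names, seed and sample size are illustrative.

```python
import math
import random


def normal_ratio_cdf(v, rho):
    """P[V1/V2 <= v] from the density sqrt(1-rho^2) / (pi (v^2 - 2 rho v + 1))."""
    return 0.5 + math.atan((v - rho) / math.sqrt(1.0 - rho * rho)) / math.pi


def normal_ratio_cdf_mc(v, rho, n=100_000, seed=9):
    """Empirical P[V1/V2 <= v] for a standard bivariate normal (V1, V2) with correlation rho."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n):
        v1 = rng.gauss(0.0, 1.0)
        v2 = rho * v1 + s * rng.gauss(0.0, 1.0)
        # V1/V2 <= v  <=>  V1 <= v V2 if V2 > 0, and V1 >= v V2 if V2 < 0
        hits += (v1 <= v * v2) if v2 > 0 else (v1 >= v * v2)
    return hits / n
```

For example, with $\rho=1/2$ the exact value $P[V\le0]=\frac12-\frac16=\frac13$ agrees with $1-2P[V_1\ge0,V_2\ge0]$ from exercise 16.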
15. The idea is to linearly transform $(X_1,X_2)$ to $(V_1,V_2)$ so that $V_1$ and $V_2$ are i.i.d. $N(0,1)$. In general, $\Sigma=AA^T$ and if we set $V=A^{-1}X$ then $\operatorname{var}[V]=A^{-1}\Sigma(A^{-1})^T=I$ and so $V_1$ and $V_2$ are independent.
Suppose
$$A=\begin{pmatrix}a&b\\b&a\end{pmatrix}\quad\text{with}\quad a=\frac12\left(\sqrt{1+\rho}+\sqrt{1-\rho}\right)\quad\text{and}\quad b=\frac12\left(\sqrt{1+\rho}-\sqrt{1-\rho}\right).$$
Then $a^2-b^2=\sqrt{1-\rho^2}$, $a^2+b^2=1$ and $2ab=\rho$. Hence $AA^T=\Sigma$.
Also
$$A^{-1}=\frac{1}{a^2-b^2}\begin{pmatrix}a&-b\\-b&a\end{pmatrix}.\quad\text{So let}\quad V_1=\frac{aX_1-bX_2}{a^2-b^2}\quad\text{and}\quad V_2=\frac{-bX_1+aX_2}{a^2-b^2}.$$
Then $E[V_1]=E[V_2]=0$, $\operatorname{var}[V_1]=\operatorname{var}[V_2]=1$ and $\operatorname{cov}(V_1,V_2)=0$. As $(V_1,V_2)$ is bivariate normal, this implies that $V_1$ and $V_2$ are i.i.d. $N(0,1)$. Hence $V_1^2+V_2^2\sim\chi^2_2$. But
$$V_1^2+V_2^2=\frac{X_1^2+X_2^2-2\rho X_1X_2}{1-\rho^2}$$
as required.
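The algebraic identities for the symmetric square root in exercise 15 can be checked numerically. The sketch below assumes unit variances, so $\Sigma=\begin{pmatrix}1&\rho\\\rho&1\end{pmatrix}$, and verifies $a^2+b^2=1$, $2ab=\rho$, $a^2-b^2=\sqrt{1-\rho^2}$ and the final expression for $V_1^2+V_2^2$ at one arbitrary point. The function names are hypothetical.

```python
import math


def symmetric_root(rho):
    """a, b with A = [[a, b], [b, a]] and A A^T = [[1, rho], [rho, 1]] (|rho| < 1)."""
    a = 0.5 * (math.sqrt(1 + rho) + math.sqrt(1 - rho))
    b = 0.5 * (math.sqrt(1 + rho) - math.sqrt(1 - rho))
    return a, b


def whiten(x1, x2, rho):
    """(V1, V2) = A^{-1} (x1, x2), using A^{-1} = [[a, -b], [-b, a]] / (a^2 - b^2)."""
    a, b = symmetric_root(rho)
    d = a * a - b * b  # equals sqrt(1 - rho^2)
    return (a * x1 - b * x2) / d, (-b * x1 + a * x2) / d
```

A symmetric root is used here rather than a Cholesky factor because it treats $X_1$ and $X_2$ symmetrically; any $A$ with $AA^T=\Sigma$ would give the same $V_1^2+V_2^2$.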
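16. Recall
$$f(x,y)=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{x^2}{\sigma_1^2}-\frac{2\rho xy}{\sigma_1\sigma_2}+\frac{y^2}{\sigma_2^2}\right)\right]$$
By using the transformation $v=x/\sigma_1$ and $w=y/\sigma_2$, which has $\left|\frac{\partial(v,w)}{\partial(x,y)}\right|=1/(\sigma_1\sigma_2)$, and then writing $x$ and $y$ for $v$ and $w$, we get
$$P[X\ge0,Y\ge0]=\int_{x=0}^\infty\int_{y=0}^\infty f(x,y)\,dxdy=\int_{x=0}^\infty\int_{y=0}^\infty\frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left[-\frac{x^2-2\rho xy+y^2}{2(1-\rho^2)}\right]dxdy\tag{4.16a}$$
$$P[X\le0,Y\ge0]=\int_{x=-\infty}^0\int_{y=0}^\infty f(x,y)\,dxdy=\int_{x=-\infty}^0\int_{y=0}^\infty\frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left[-\frac{x^2-2\rho xy+y^2}{2(1-\rho^2)}\right]dxdy\tag{4.16b}$$
Now use polar coordinates: $x=r\cos\theta$ and $y=r\sin\theta$. Hence $\tan\theta=y/x$ and $r^2=x^2+y^2$. Also $\left|\frac{\partial(x,y)}{\partial(r,\theta)}\right|=r$. Hence
$$\begin{aligned}
P[X\ge0,Y\ge0]&=\frac{1}{2\pi\sqrt{1-\rho^2}}\int_{\theta=0}^{\pi/2}\int_{r=0}^\infty r\exp\left[-\frac{r^2(1-\rho\sin2\theta)}{2(1-\rho^2)}\right]drd\theta\\
&=\frac{1}{2\pi\sqrt{1-\rho^2}}\int_{\theta=0}^{\pi/2}\int_{r=0}^\infty r\exp[-\alpha r^2]\,drd\theta\quad\text{where }\alpha=\frac{1-\rho\sin2\theta}{2(1-\rho^2)}\\
&=\frac{1}{2\pi\sqrt{1-\rho^2}}\int_{\theta=0}^{\pi/2}\left[\frac{\exp[-\alpha r^2]}{-2\alpha}\right]_{r=0}^\infty d\theta=\frac{1}{2\pi\sqrt{1-\rho^2}}\int_{\theta=0}^{\pi/2}\frac{d\theta}{2\alpha}=\frac{\sqrt{1-\rho^2}}{2\pi}\int_{\theta=0}^{\pi/2}\frac{d\theta}{1-\rho\sin2\theta}
\end{aligned}$$
Similarly, where the transformation $\theta\to\theta-\pi/2$ is used in the last equality,
$$P[X\le0,Y\ge0]=\frac{\sqrt{1-\rho^2}}{2\pi}\int_{\theta=\pi/2}^\pi\frac{d\theta}{1-\rho\sin2\theta}=\frac{\sqrt{1-\rho^2}}{2\pi}\int_{\theta=0}^{\pi/2}\frac{d\theta}{1+\rho\sin2\theta}$$
We now use the transformation $t=\tan\theta$. Hence $\frac{dt}{d\theta}=\sec^2\theta=1+t^2$. Also $\sin2\theta=2t/(1+t^2)$, $\cos2\theta=(1-t^2)/(1+t^2)$ and $\tan2\theta=2t/(1-t^2)$. Hence
$$\begin{aligned}
P[X\ge0,Y\ge0]&=\frac{\sqrt{1-\rho^2}}{2\pi}\int_{t=0}^\infty\frac{dt}{t^2-2\rho t+1}=\frac{\sqrt{1-\rho^2}}{2\pi}\int_{t=0}^\infty\frac{dt}{(t-\rho)^2+(1-\rho^2)}=\frac{\sqrt{1-\rho^2}}{2\pi}\int_{t=-\rho}^\infty\frac{dt}{t^2+(1-\rho^2)}\\
&=\frac{\sqrt{1-\rho^2}}{2\pi}\left[\frac{1}{\sqrt{1-\rho^2}}\tan^{-1}\frac{t}{\sqrt{1-\rho^2}}\right]_{t=-\rho}^\infty\quad\text{by using the standard result }\int\frac{dx}{x^2+a^2}=\frac1a\tan^{-1}\frac xa+c\\
&=\frac{1}{2\pi}\left[\frac\pi2+\tan^{-1}\frac{\rho}{\sqrt{1-\rho^2}}\right]=\frac14+\frac{1}{2\pi}\sin^{-1}\rho\quad\text{as required.}
\end{aligned}$$
The transformation $(X,Y)\to(-X,-Y)$ shows that $P[X\ge0,Y\ge0]=P[X\le0,Y\le0]$.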
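The orthant probability $\frac14+\frac{1}{2\pi}\sin^{-1}\rho$ can be checked by simulation. Because the event $\{X\ge0,Y\ge0\}$ is unchanged by rescaling $X$ and $Y$, the sketch below assumes standard normal margins, building $Y=\rho X+\sqrt{1-\rho^2}\,Z$ with $Z$ independent of $X$. Names, seed and sample size are illustrative.

```python
import math
import random


def orthant_exact(rho):
    """P[X >= 0, Y >= 0] = 1/4 + arcsin(rho) / (2 pi)."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)


def orthant_mc(rho, n=200_000, seed=11):
    """Empirical P[X >= 0, Y >= 0] for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + s * rng.gauss(0.0, 1.0)
        hits += x >= 0 and y >= 0
    return hits / n
```

The limiting cases behave as expected: $\rho=0$ gives $1/4$ (independence) and $\rho\to1$ gives $1/2$.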
Chapter 2 Section 6 on page 55
1. (a) The characteristic function of $X$ is $\phi_X(t)=\exp\left(it^T\mu-\frac12t^T\Sigma t\right)$. Hence the characteristic function of $Y=X-\mu$ is $\phi_Y(t)=E\left[\exp(it^TY)\right]=E\left[\exp(it^TX)\right]\exp(-it^T\mu)=\exp\left(-\frac12t^T\Sigma t\right)$. Hence $Y\sim N(0,\Sigma)$.
(b) $E[\exp(it^TX)]=E[\exp(it_1X_1)]\cdots E[\exp(it_nX_n)]=\exp(it_1\mu_1-\frac12\sigma^2t_1^2)\cdots\exp(it_n\mu_n-\frac12\sigma^2t_n^2)=\exp(it^T\mu-\frac12\sigma^2t^TIt)$ as required.
(c) $\Sigma=\operatorname{diag}[d_1,\ldots,d_n]$. Hence $\phi_X(t)=\exp\left(i\sum_{i=1}^nt_i\mu_i-\frac12\sum_{i=1}^nt_i^2d_i\right)=\exp(it_1\mu_1-\frac12t_1^2d_1)\cdots\exp(it_n\mu_n-\frac12t_n^2d_n)$, which means that $X_1,\ldots,X_n$ are independent with distributions $N(\mu_1,d_1),\ldots,N(\mu_n,d_n)$ respectively.
(d) $\phi_Z(t)=\phi_X(t)\phi_Y(t)=\exp\left(it^T(\mu_X+\mu_Y)-\frac12t^T(\Sigma_X+\Sigma_Y)t\right)$.
2. (a) The vectors $(X_1,X_3)$ and $X_2$ are independent iff the $2\times1$ matrix $\operatorname{cov}[(X_1,X_3),X_2]=0$ (by a property of the multivariate normal). But $\operatorname{cov}[X_1,X_2]=0$ and $\operatorname{cov}[X_3,X_2]=0$. Hence result.
(b) $\operatorname{cov}[X_1-X_3,X_1-3X_2+X_3]=\operatorname{var}[X_1]-3\operatorname{cov}[X_1,X_2]+\operatorname{cov}[X_1,X_3]-\operatorname{cov}[X_3,X_1]+3\operatorname{cov}[X_3,X_2]-\operatorname{var}[X_3]=4-0-1+1+0-2=2$. Hence not independent.
(c) $\operatorname{cov}[X_1+X_3,X_1-2X_2-3X_3]=4-0+3-1+0-6=0$. Hence independent.
3. (a) Suppose $x\in\mathbb{R}^m$ with $a^Tax=0$; then $x^Ta^Tax=0$. Hence $(ax)^T(ax)=0$; hence $ax=0$. Hence $x_1\alpha_1+\cdots+x_m\alpha_m=0$ where $\alpha_1,\ldots,\alpha_m$ are the columns of $a$. But $\operatorname{rank}(a)=m$. Hence $x=0$. Hence $\operatorname{rank}(a^Ta)=m$ and so $a^Ta$ is invertible.
(b) First note that if $G$ is any symmetric non-singular matrix, then $GG^{-1}=I$; hence $(G^{-1})^TG=I$; hence $G^{-1}=(G^{-1})^T$. Clearly $E[Y]=B\mu$ and $\operatorname{var}[Y]=\sigma^2BB^T=\sigma^2(a^Ta)^{-1}a^Ta(a^Ta)^{-1}=\sigma^2(a^Ta)^{-1}$ as required.
4. By proposition (5.9a) on page 53, we know there exist $a_0$, $a_1$, $a_2$, $a_3$ and $a_4$ in $\mathbb{R}$ such that $E[Y|X_1,X_2,X_3,X_4]=a_0+a_1X_1+a_2X_2+a_3X_3+a_4X_4$. Taking expectations gives $a_0=1$. Also, taking expectations of
$$E[YX_1|X_1,X_2,X_3,X_4]=X_1E[Y|X_1,X_2,X_3,X_4]=X_1+a_1X_1^2+a_2X_1X_2+a_3X_1X_3+a_4X_1X_4$$
gives $\frac12=a_1+\frac12(a_2+a_3+a_4)$. Similarly $\frac12=a_2+\frac12(a_1+a_3+a_4)$, $\frac12=a_3+\frac12(a_1+a_2+a_4)$, and $\frac12=a_4+\frac12(a_1+a_2+a_3)$.
Subtracting shows that $a_1=a_2=a_3=a_4$; indeed this was obvious from the symmetry in $\Sigma$. Combining this result with $\frac12=a_1+\frac12(a_2+a_3+a_4)$ gives $a_1=a_2=a_3=a_4=1/5$.
5. (a) The matrix of regression coefficients is
$$\underset{1\times2}{A}\,\underset{2\times2}{\Sigma_Z^{-1}}=\begin{bmatrix}\rho_{12}\sigma_1\sigma_2&\rho_{13}\sigma_1\sigma_3\end{bmatrix}\begin{bmatrix}\sigma_2^2&\rho_{23}\sigma_2\sigma_3\\\rho_{23}\sigma_2\sigma_3&\sigma_3^2\end{bmatrix}^{-1}=\begin{bmatrix}\rho_{12}\sigma_1\sigma_2&\rho_{13}\sigma_1\sigma_3\end{bmatrix}\frac{1}{\sigma_2^2\sigma_3^2(1-\rho_{23}^2)}\begin{bmatrix}\sigma_3^2&-\rho_{23}\sigma_2\sigma_3\\-\rho_{23}\sigma_2\sigma_3&\sigma_2^2\end{bmatrix}$$
$$=\frac{\sigma_1}{\sigma_2\sigma_3(1-\rho_{23}^2)}\begin{bmatrix}\sigma_3(\rho_{12}-\rho_{13}\rho_{23})&\sigma_2(\rho_{13}-\rho_{12}\rho_{23})\end{bmatrix}$$
Using the general result that $E[Y|Z]=\mu_Y+A\Sigma_Z^{-1}(Z-\mu_Z)$ and $\operatorname{var}[Y|Z]=\Sigma_Y-A\Sigma_Z^{-1}A^T$ gives
$$E[X_1|(X_2,X_3)]=\mu_1+\frac{\sigma_1}{1-\rho_{23}^2}\left[(\rho_{12}-\rho_{13}\rho_{23})\frac{X_2-\mu_2}{\sigma_2}+(\rho_{13}-\rho_{12}\rho_{23})\frac{X_3-\mu_3}{\sigma_3}\right]$$
$$\operatorname{var}[X_1|(X_2,X_3)]=\sigma_1^2\left(1-\frac{\rho_{12}^2-2\rho_{12}\rho_{23}\rho_{13}+\rho_{13}^2}{1-\rho_{23}^2}\right)$$
(b) The matrix of regression coefficients is
$$\underset{2\times1}{A}\,\underset{1\times1}{\Sigma_Z^{-1}}=\begin{bmatrix}\rho_{13}\sigma_1\sigma_3\\\rho_{23}\sigma_2\sigma_3\end{bmatrix}\frac{1}{\sigma_3^2}$$
Using the same general result gives
$$E[(X_1,X_2)|X_3]=\begin{bmatrix}\mu_1\\\mu_2\end{bmatrix}+\frac{1}{\sigma_3^2}\begin{bmatrix}\rho_{13}\sigma_1\sigma_3(X_3-\mu_3)\\\rho_{23}\sigma_2\sigma_3(X_3-\mu_3)\end{bmatrix}$$
$$\operatorname{var}[(X_1,X_2)|X_3]=\begin{bmatrix}\sigma_1^2&\rho_{12}\sigma_1\sigma_2\\\rho_{12}\sigma_1\sigma_2&\sigma_2^2\end{bmatrix}-\frac{1}{\sigma_3^2}\begin{bmatrix}\rho_{13}^2\sigma_1^2\sigma_3^2&\rho_{13}\rho_{23}\sigma_1\sigma_2\sigma_3^2\\\rho_{13}\rho_{23}\sigma_1\sigma_2\sigma_3^2&\rho_{23}^2\sigma_2^2\sigma_3^2\end{bmatrix}=\begin{bmatrix}(1-\rho_{13}^2)\sigma_1^2&(\rho_{12}-\rho_{13}\rho_{23})\sigma_1\sigma_2\\(\rho_{12}-\rho_{13}\rho_{23})\sigma_1\sigma_2&(1-\rho_{23}^2)\sigma_2^2\end{bmatrix}$$
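The closed forms in exercise 5(a) can be cross-checked against a direct matrix computation of $A\Sigma_Z^{-1}$ and $\Sigma_Y-A\Sigma_Z^{-1}A^T$. The sketch below does this with an explicit $2\times2$ inverse; the argument order `(s1, s2, s3, r12, r13, r23)` and the numerical values are illustrative assumptions, chosen so that the full $3\times3$ covariance matrix is positive definite.

```python
def regression_and_cond_var(s1, s2, s3, r12, r13, r23):
    """A Sigma_Z^{-1} and var[X1 | X2, X3] computed directly from the 2 x 2 inverse."""
    z11, z12, z22 = s2 * s2, r23 * s2 * s3, s3 * s3  # Sigma_Z = var of (X2, X3)
    det = z11 * z22 - z12 * z12
    inv = [[z22 / det, -z12 / det], [-z12 / det, z11 / det]]
    A = [r12 * s1 * s2, r13 * s1 * s3]  # cov of X1 with (X2, X3)
    coef = [A[0] * inv[0][0] + A[1] * inv[1][0], A[0] * inv[0][1] + A[1] * inv[1][1]]
    return coef, s1 * s1 - (coef[0] * A[0] + coef[1] * A[1])


def closed_forms(s1, s2, s3, r12, r13, r23):
    """The closed-form regression coefficients and conditional variance from part (a)."""
    k = s1 / (s2 * s3 * (1 - r23 ** 2))
    coef = [k * s3 * (r12 - r13 * r23), k * s2 * (r13 - r12 * r23)]
    var = s1 ** 2 * (1 - (r12 ** 2 - 2 * r12 * r23 * r13 + r13 ** 2) / (1 - r23 ** 2))
    return coef, var
```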
6. Let $\mathbf{1}$ denote the $n\times1$ vector with every entry equal to 1.
(a) Now $c=\operatorname{cov}[Y_j,Y_1+\cdots+Y_n]$ is the sum of the $j$th row of $\Sigma$. Adding over all $n$ rows gives $nc=\mathbf{1}^T\Sigma\mathbf{1}=\operatorname{var}[\mathbf{1}^TY]=\operatorname{var}[Y_1+\cdots+Y_n]\ge0$.
(b) Suppose there exists $c\in\mathbb{R}$ such that $\operatorname{cov}[Y_j,Y_1+\cdots+Y_n]=c$ for all $j=1,2,\ldots,n$. Then $\Sigma\mathbf{1}=c\mathbf{1}$ and hence $\mathbf{1}$ is an eigenvector of $\Sigma$. The converse is similar.
(c) We are given that $\operatorname{cov}[Y_j,Y_1+\cdots+Y_n]=0$ for $j=1,2,\ldots,n$. Consider the random vector $Z=(Y_1,\ldots,Y_n,Y_1+\cdots+Y_n)$. Because every linear combination $\ell^TZ$ of the components of $Z$ has a univariate normal distribution, it follows that $Z$ has a multivariate normal distribution. Also $\operatorname{cov}[Y,Y_1+\cdots+Y_n]=0$. Hence $Y=(Y_1,\ldots,Y_n)$ is independent of $Y_1+\cdots+Y_n$. But $Y_1+\cdots+Y_n$ is a function of $Y=(Y_1,\ldots,Y_n)$; hence $Y_1+\cdots+Y_n$ is almost surely constant³. Hence $(X_1\cdots X_n)^{1/n}$ is almost surely constant.
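7. We need to find the conditional density $f_{X_n|(X_{n-1},\ldots,X_2,X_1)}(x_n|x_{n-1},\ldots,x_2,x_1)$. This is the density of
$$N\left(\mu_Y+A\Sigma_Z^{-1}(z-\mu_Z),\;\Sigma_Y-A\Sigma_Z^{-1}A^T\right)\quad\text{where }Y=\{X_n\}\text{ and }Z=(X_{n-1},\ldots,X_1).$$
The matrix of regression coefficients is
$$\underset{1\times(n-1)}{A}\,\underset{(n-1)\times(n-1)}{\Sigma_Z^{-1}}=\sigma^2\begin{bmatrix}\rho&\rho^2&\cdots&\rho^{n-1}\end{bmatrix}\frac{1}{\sigma^2}\begin{bmatrix}1&\rho&\rho^2&\cdots&\rho^{n-2}\\\rho&1&\rho&\cdots&\rho^{n-3}\\\vdots&\vdots&\vdots&\ddots&\vdots\\\rho^{n-2}&\rho^{n-3}&\rho^{n-4}&\cdots&1\end{bmatrix}^{-1}$$
$$=\begin{bmatrix}\rho&\rho^2&\cdots&\rho^{n-1}\end{bmatrix}\frac{1}{1-\rho^2}\begin{bmatrix}1&-\rho&0&0&\cdots&0&0&0\\-\rho&1+\rho^2&-\rho&0&\cdots&0&0&0\\0&-\rho&1+\rho^2&-\rho&\cdots&0&0&0\\\vdots&\vdots&\vdots&\vdots&\ddots&\vdots&\vdots&\vdots\\0&0&0&0&\cdots&-\rho&1+\rho^2&-\rho\\0&0&0&0&\cdots&0&-\rho&1\end{bmatrix}=\begin{bmatrix}\rho&0&0&\cdots&0&0\end{bmatrix}$$
Thus the distribution of $X_n$ given $(X_{n-1},\ldots,X_1)$ is $N\left(\mu_n+\rho(X_{n-1}-\mu_{n-1}),\,\sigma^2(1-\rho^2)\right)$, proving the Markov property.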
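The tridiagonal form of $\Sigma_Z^{-1}$ used in exercise 7 can be verified numerically. The sketch below builds the $m\times m$ matrix with entries $\rho^{|i-j|}$ (that is, $\Sigma_Z/\sigma^2$ with $m=n-1$), builds the tridiagonal matrix displayed above, checks that their product is $(1-\rho^2)I$, and recovers the regression row $[\rho,0,\ldots,0]$. It assumes $m\ge2$, since for $m=1$ the tridiagonal pattern does not apply; the helper names are hypothetical.

```python
def ar1_corr(m, rho):
    """m x m matrix with (i, j) entry rho**|i - j|, i.e. Sigma_Z / sigma^2 above."""
    return [[rho ** abs(i - j) for j in range(m)] for i in range(m)]


def ar1_tridiag(m, rho):
    """The tridiagonal matrix above; should equal (1 - rho^2) * ar1_corr(m, rho)^{-1} for m >= 2."""
    M = [[0.0] * m for _ in range(m)]
    for i in range(m):
        M[i][i] = 1.0 if i in (0, m - 1) else 1.0 + rho * rho
        if i + 1 < m:
            M[i][i + 1] = M[i + 1][i] = -rho
    return M


def matmul(P, Q):
    """Plain-Python matrix product of lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def regression_row(m, rho):
    """[rho, rho^2, ..., rho^m] times ar1_tridiag(m, rho) / (1 - rho^2)."""
    row = [[rho ** (k + 1) for k in range(m)]]
    return [x / (1 - rho * rho) for x in matmul(row, ar1_tridiag(m, rho))[0]]
```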
Chapter 2 Section 9 on page 60
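1. Use the transformation $x_2^2=(\nu+1)(x-\rho y)^2/[(1-\rho^2)(\nu+y^2)]$. Then
$$\begin{aligned}
2\pi\sqrt{1-\rho^2}\,f_Y(y)&=\int_{-\infty}^\infty\left(1+\frac{x^2-2\rho xy+y^2}{\nu(1-\rho^2)}\right)^{-(\nu+2)/2}dx=\int_{-\infty}^\infty\left(\frac{(x-\rho y)^2+(y^2+\nu)(1-\rho^2)}{\nu(1-\rho^2)}\right)^{-(\nu+2)/2}dx\\
&=\left(1+\frac{y^2}{\nu}\right)^{-(\nu+2)/2}\sqrt{\frac{(1-\rho^2)(\nu+y^2)}{\nu+1}}\int_{-\infty}^\infty\left(1+\frac{x_2^2}{\nu+1}\right)^{-(\nu+2)/2}dx_2
\end{aligned}$$
and hence
$$f_Y(y)=\frac{1}{2\pi}\left(1+\frac{y^2}{\nu}\right)^{-(\nu+2)/2}B\left(\tfrac12,\tfrac{\nu+1}{2}\right)\sqrt{\nu+1}\sqrt{\frac{\nu+y^2}{\nu+1}}=\frac{B\left(\frac12,\frac{\nu+1}{2}\right)\sqrt\nu}{2\pi}\left(1+\frac{y^2}{\nu}\right)^{-(\nu+1)/2}$$
and then use $B\left(\tfrac12,\tfrac{\nu+1}{2}\right)B\left(\tfrac12,\tfrac\nu2\right)=2\pi/\nu$, which shows that $f_Y(y)=\dfrac{1}{\sqrt\nu\,B(\frac12,\frac\nu2)}\left(1+\dfrac{y^2}{\nu}\right)^{-(\nu+1)/2}$, the $t_\nu$ density.
2. (a) Provided $\nu>1$, $E[X]=0$. Provided $\nu>2$, $\operatorname{var}[X]=\nu/(\nu-2)$.
(b) By the usual characterization of the $t$-distribution, $E[XY]=E[Z_1Z_2]\,\nu E[1/W]=\rho\nu/(\nu-2)$. Hence $\operatorname{corr}[X,Y]=\rho$.
3. Now
$$f_{(Z_1,\ldots,Z_p,W)}(z_1,\ldots,z_p,w)=\frac{1}{(2\pi)^{p/2}}\exp\left[-\frac12z^Tz\right]\frac{w^{\nu/2-1}}{2^{\nu/2}\Gamma(\frac\nu2)}\exp\left[-\frac w2\right]$$
Use the transformation $T=Z/(W/\nu)^{1/2}$; this has Jacobian $\nu^{p/2}/w^{p/2}$. Also, $Z^TZ=WT^TT/\nu$. Hence
$$\begin{aligned}
f_{(T_1,\ldots,T_p,W)}(t_1,\ldots,t_p,w)&=\frac{w^{\nu/2-1}}{2^{(p+\nu)/2}\pi^{p/2}\Gamma(\frac\nu2)}\exp\left[-\frac{w}{2\nu}t^Tt-\frac w2\right]\frac{w^{p/2}}{\nu^{p/2}}\\
&=\frac{w^{(p+\nu)/2-1}}{2^{(p+\nu)/2}\pi^{p/2}\nu^{p/2}\Gamma(\frac\nu2)}\exp\left[-\frac w2\left(1+\frac{t^Tt}{\nu}\right)\right]
\end{aligned}$$
Integrating out $w$ gives
$$f_T(t)=\frac{1}{2^{(p+\nu)/2}\pi^{p/2}\nu^{p/2}\Gamma(\frac\nu2)}\int_0^\infty w^{(p+\nu)/2-1}\exp\left[-\frac w2\left(1+\frac{t^Tt}{\nu}\right)\right]dw$$
Now let $y=(1+t^Tt/\nu)w$. This leads to
$$\begin{aligned}
f_T(t)&=\frac{1}{2^{(p+\nu)/2}\pi^{p/2}\nu^{p/2}\Gamma(\frac\nu2)(1+t^Tt/\nu)^{(p+\nu)/2}}\int_0^\infty y^{(p+\nu)/2-1}\exp\left[-\frac y2\right]dy\\
&=\frac{\nu^{\nu/2}}{2^{(p+\nu)/2}\pi^{p/2}\Gamma(\frac\nu2)(\nu+t^Tt)^{(p+\nu)/2}}\int_0^\infty y^{(p+\nu)/2-1}\exp\left[-\frac y2\right]dy\\
&=\frac{\nu^{\nu/2}}{2^{(p+\nu)/2}\pi^{p/2}\Gamma(\frac\nu2)(\nu+t^Tt)^{(p+\nu)/2}}\times2^{(p+\nu)/2}\Gamma\left(\frac{p+\nu}{2}\right)=\frac{\nu^{\nu/2}\,\Gamma\left(\frac{p+\nu}{2}\right)}{\pi^{p/2}\Gamma(\frac\nu2)(\nu+t^Tt)^{(p+\nu)/2}}
\end{aligned}$$
³ By definition, the random variables $X$ and $Y$ are independent iff the generated σ-fields $\sigma(X)$ and $\sigma(Y)$ are independent. If $Y=f(X)$, then $\sigma(Y)\subseteq\sigma(X)$ and hence for every $A\in\sigma(Y)$ we have $A$ is independent of itself and so $P(A)=0$ or $P(A)=1$. Hence the distribution function of $Y$ satisfies $F_Y(y)=0$ or 1 for every $y\in\mathbb{R}$ and hence $Y$ is almost surely constant.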
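The density obtained in exercise 3 can be evaluated stably on the log scale. The sketch below implements $f_T(t)=\nu^{\nu/2}\Gamma(\frac{p+\nu}{2})/[\pi^{p/2}\Gamma(\frac\nu2)(\nu+t^Tt)^{(p+\nu)/2}]$ with `math.lgamma`, checks that for $p=1$ it coincides with the univariate $t_\nu$ density $\frac{1}{\sqrt\nu B(\frac12,\frac\nu2)}(1+x^2/\nu)^{-(\nu+1)/2}$, and (in the test) checks that for $p=2$ its radial integral is close to 1. The function names are hypothetical.

```python
import math


def mvt_density(t, nu):
    """nu^(nu/2) Gamma((p+nu)/2) / (pi^(p/2) Gamma(nu/2) (nu + t^T t)^((p+nu)/2))."""
    p = len(t)
    q = sum(x * x for x in t)
    log_f = (0.5 * nu * math.log(nu) + math.lgamma(0.5 * (p + nu))
             - 0.5 * p * math.log(math.pi) - math.lgamma(0.5 * nu)
             - 0.5 * (p + nu) * math.log(nu + q))
    return math.exp(log_f)


def t_density(x, nu):
    """Univariate t_nu density 1 / (sqrt(nu) B(1/2, nu/2)) * (1 + x^2/nu)^(-(nu+1)/2)."""
    log_beta = math.lgamma(0.5) + math.lgamma(0.5 * nu) - math.lgamma(0.5 * (nu + 1))
    return math.exp(-0.5 * math.log(nu) - log_beta - 0.5 * (nu + 1) * math.log1p(x * x / nu))
```

Working with logarithms avoids overflow in $\nu^{\nu/2}$ and the gamma functions when $\nu$ is large.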
4. Now $C=LL^T$ and $C$ is symmetric. Hence $L^TC=CL^T$; hence $C^{-1}L^T=L^TC^{-1}$. Of course $C^{-1}$ is also symmetric; hence $LC^{-1}=C^{-1}L$.
Using $(V-m)^T=T^TL^T$ gives $(V-m)^TL=T^TL^TL=T^TC$ and hence $(V-m)^TLC^{-1}=T^T$ and $T=C^{-1}L^T(V-m)$.
Hence $T^TT=(V-m)^TC^{-1}(V-m)$.
5. First, note that
$$(v-m_1)^TC_1^{-1}(v-m_1)=(v-a-Am)^TC_1^{-1}(v-a-Am)=(At-Am)^TC_1^{-1}(At-Am)=(t-m)^TA^TC_1^{-1}A(t-m)=(t-m)^TC^{-1}(t-m)$$
Then use
$$f_V(v)\propto\frac{1}{\left[\nu+(t-m)^TC^{-1}(t-m)\right]^{(\nu+p)/2}}$$
6. This follows from proposition(8.2a) on page 58.
Bayesian Time Series Analysis by R.J. Reed
Jun 7, 2018(14:22)