STATISTICS 330 COURSE NOTES
Cyntha A. Struthers
Department of Statistics and Actuarial Science, University of Waterloo
Spring 2022 Edition
Contents

1. Preview
   1.1 Example
   1.2 Example

2. Univariate Random Variables
   2.1 Probability
   2.2 Random Variables
   2.3 Discrete Random Variables
   2.4 Continuous Random Variables
   2.5 Location and Scale Parameters
   2.6 Functions of a Random Variable
   2.7 Expectation
   2.8 Inequalities
   2.9 Variance Stabilizing Transformation
   2.10 Moment Generating Functions
   2.11 Calculus Review
   2.12 Chapter 2 Problems

3. Multivariate Random Variables
   3.1 Joint and Marginal Cumulative Distribution Functions
   3.2 Bivariate Discrete Distributions
   3.3 Bivariate Continuous Distributions
   3.4 Independent Random Variables
   3.5 Conditional Distributions
   3.6 Joint Expectations
   3.7 Conditional Expectation
   3.8 Joint Moment Generating Functions
   3.9 Multinomial Distribution
   3.10 Bivariate Normal Distribution
   3.11 Calculus Review
   3.12 Chapter 3 Problems

4. Functions of Two or More Random Variables
   4.1 Cumulative Distribution Function Technique
   4.2 One-to-One Transformations
   4.3 Moment Generating Function Technique
   4.4 Chapter 4 Problems

5. Limiting or Asymptotic Distributions
   5.1 Convergence in Distribution
   5.2 Convergence in Probability
   5.3 Weak Law of Large Numbers
   5.4 Moment Generating Function Technique for Limiting Distributions
   5.5 Additional Limit Theorems
   5.6 Chapter 5 Problems

6. Maximum Likelihood Estimation - One Parameter
   6.1 Introduction
   6.2 Maximum Likelihood Method
   6.3 Score and Information Functions
   6.4 Likelihood Intervals
   6.5 Limiting Distribution of Maximum Likelihood Estimator
   6.6 Confidence Intervals
   6.7 Approximate Confidence Intervals
   6.8 Chapter 6 Problems

7. Maximum Likelihood Estimation - Multiparameter
   7.1 Likelihood and Related Functions
   7.2 Likelihood Regions
   7.3 Limiting Distribution of Maximum Likelihood Estimator
   7.4 Approximate Confidence Regions
   7.5 Chapter 7 Problems

8. Hypothesis Testing
   8.1 Test of Hypothesis
   8.2 Likelihood Ratio Tests for Simple Hypotheses
   8.3 Likelihood Ratio Tests for Composite Hypotheses
   8.4 Chapter 8 Problems

9. Solutions to Chapter Exercises
   9.1 Chapter 2
   9.2 Chapter 3
   9.3 Chapter 4
   9.4 Chapter 5
   9.5 Chapter 6
   9.6 Chapter 7
   9.7 Chapter 8

10. Solutions to Selected End of Chapter Problems
   10.1 Chapter 2
   10.2 Chapter 3
   10.3 Chapter 4
   10.4 Chapter 5
   10.5 Chapter 6
   10.6 Chapter 7
   10.7 Chapter 8

11. Summary of Named Distributions

12. Distribution Tables
Preface
In order to provide improved versions of these Course Notes for students in subsequent
terms, please email corrections, sections that are confusing, or comments/suggestions to
castruth@uwaterloo.ca.
1. Preview
The following examples will illustrate the ideas and concepts discussed in these Course
Notes. They also indicate how these ideas and concepts are connected to each other.
1.1 Example
The number of service interruptions in a communications system over 200 separate days is
summarized in the following frequency table:
Number of interruptions    0    1    2    3    4    5    >5    Total
Observed frequency         64   71   42   18   4    1    0     200
It is believed that a Poisson model will fit these data well. Why might this be a reasonable assumption? (PROBABILITY MODELS)
If we let the random variable X = number of interruptions in a day and assume that the Poisson model is reasonable then the probability function of X is given by

P(X = x) = θ^x e^{−θ} / x!   for x = 0, 1, ...

where θ is a parameter of the model which represents the mean number of service interruptions in a day. (RANDOM VARIABLES, PROBABILITY FUNCTIONS, EXPECTATION, MODEL PARAMETERS) Since θ is unknown we might estimate it using the sample mean

x̄ = [64(0) + 71(1) + ⋯ + 1(5)] / 200 = 230/200 = 1.15

(POINT ESTIMATION) The estimate θ̂ = x̄ is the maximum likelihood estimate of θ. It is the value of θ which maximizes the likelihood function. (MAXIMUM LIKELIHOOD ESTIMATION) The likelihood function is the probability of the observed data as a function of the unknown parameter(s) in the model. The maximum likelihood estimate is thus the value of θ which maximizes the probability of observing the given data.
In this example the likelihood function is given by

L(θ) = P(observing 0 interruptions 64 times, ..., >5 interruptions 0 times; θ)
     = [200! / (64! 71! ⋯ 1! 0!)] (θ⁰e^{−θ}/0!)^{64} (θ¹e^{−θ}/1!)^{71} ⋯ (θ⁵e^{−θ}/5!)^{1} (Σ_{x=6}^{∞} θ^x e^{−θ}/x!)^{0}
     = c θ^{64(0)+71(1)+⋯+1(5)} e^{−(64+71+⋯+1)θ}
     = c θ^{230} e^{−200θ}   for θ > 0

where

c = [200! / (64! 71! ⋯ 1! 0!)] (1/0!)^{64} (1/1!)^{71} ⋯ (1/5!)^{1}

The maximum likelihood estimate of θ can be found by solving dL/dθ = 0, or equivalently d log L/dθ = 0, and verifying that it corresponds to a maximum.
If we want an interval of values for θ which are reasonable given the data then we could construct a confidence interval for θ. (INTERVAL ESTIMATION) To construct confidence intervals we need to find the sampling distribution of the estimator. In this example we would need to find the distribution of the estimator

X̄ = (X₁ + X₂ + ⋯ + Xₙ)/n

where Xᵢ = number of interruptions on day i, i = 1, 2, ..., 200. (FUNCTIONS OF RANDOM VARIABLES: cumulative distribution function technique, one-to-one transformations, moment generating function technique) Since Xᵢ ∼ Poisson(θ) with E(Xᵢ) = θ and Var(Xᵢ) = θ, the distribution of X̄ for large n is approximately N(θ, θ/n) by the Central Limit Theorem. (LIMITING DISTRIBUTIONS)

Suppose the manufacturer of the communications system claimed that the mean number of interruptions was 1. Then we would like to test the hypothesis H: θ = 1. (TESTS OF HYPOTHESIS) A test of hypothesis uses a test statistic to measure the evidence based on the observed data against the hypothesis. A test statistic with good properties for testing H: θ = θ₀ is the likelihood ratio statistic, −2 log[L(θ₀)/L(θ̂)]. (LIKELIHOOD RATIO STATISTIC) For large n the distribution of the likelihood ratio statistic is approximately χ²(1) if the hypothesis H: θ = θ₀ is true.
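As a quick numerical companion to the calculations above (not part of the original notes), the following Python sketch computes θ̂ = x̄ and the likelihood ratio statistic for H: θ = 1 from the frequency table; the variable names are ours.

```python
import math

# Frequency table from Example 1.1: x = number of interruptions, f = observed frequency
counts = {0: 64, 1: 71, 2: 42, 3: 18, 4: 4, 5: 1}
n = sum(counts.values())                       # 200 days
total = sum(x * f for x, f in counts.items())  # 230 interruptions in total

theta_hat = total / n                          # maximum likelihood estimate = sample mean = 1.15

def log_likelihood(theta):
    # log L(theta) up to the additive constant log c:  230 log(theta) - 200 theta
    return total * math.log(theta) - n * theta

# Likelihood ratio statistic for H: theta = 1; approximately chi-squared(1) for large n
lr_stat = -2 * (log_likelihood(1.0) - log_likelihood(theta_hat))
print(theta_hat, lr_stat)   # 1.15 and about 4.29
```

The value of about 4.29 can then be compared with χ²(1) values such as those tabulated in Chapter 12.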
1.2 Example
The following are relief times in hours for 20 patients receiving a pain killer:
1.1  1.4  1.3  1.7  1.9  1.8  1.6  2.2  1.7  2.7
4.1  1.8  1.5  1.2  1.4  3.0  1.7  2.3  1.6  2.0
It is believed that the Weibull distribution with probability density function

f(x) = (β/θ^β) x^{β−1} e^{−(x/θ)^β}   for x > 0, θ > 0, β > 0
will provide a good fit to the data. (CONTINUOUS MODELS, PROBABILITY DENSITY FUNCTIONS) Assuming independent observations the (approximate) likelihood function is

L(θ, β) = ∏_{i=1}^{20} (β/θ^β) x_i^{β−1} e^{−(x_i/θ)^β}   for θ > 0, β > 0

where x_i is the observed relief time for the ith patient. (MULTIPARAMETER LIKELIHOODS) The maximum likelihood estimates θ̂ and β̂ are found by simultaneously solving

∂ log L/∂θ = 0   and   ∂ log L/∂β = 0

Since an explicit solution to these equations cannot be obtained, a numerical solution must be found using an iterative method. (NEWTON'S METHOD) Also, since the maximum likelihood estimators cannot be given explicitly, approximate confidence intervals and tests of hypothesis must be based on the asymptotic distributions of the maximum likelihood estimators. (LIMITING OR ASYMPTOTIC DISTRIBUTIONS OF MAXIMUM LIKELIHOOD ESTIMATORS)
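The following is a minimal numerical sketch of such an iterative fit, assuming scipy is available; the optimizer (a derivative-free Nelder-Mead search rather than Newton's method itself), the starting values, and the variable names are our choices, not the notes'.

```python
import numpy as np
from scipy.optimize import minimize

# Relief times from Example 1.2
x = np.array([1.1, 1.4, 1.3, 1.7, 1.9, 1.8, 1.6, 2.2, 1.7, 2.7,
              4.1, 1.8, 1.5, 1.2, 1.4, 3.0, 1.7, 2.3, 1.6, 2.0])

def neg_log_lik(params):
    theta, beta = params
    if theta <= 0 or beta <= 0:
        return np.inf
    n = len(x)
    # log L = n log(beta) - n beta log(theta) + (beta-1) sum(log x_i) - sum((x_i/theta)^beta)
    ll = (n * np.log(beta) - n * beta * np.log(theta)
          + (beta - 1) * np.sum(np.log(x)) - np.sum((x / theta) ** beta))
    return -ll

# Derivative-free search started from a rough guess
fit = minimize(neg_log_lik, x0=[2.0, 2.0], method="Nelder-Mead")
theta_hat, beta_hat = fit.x
print(theta_hat, beta_hat)
```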
2. Univariate Random Variables
In this chapter we review concepts that were introduced in a previous probability course such as STAT 220/230/240 as well as introducing new concepts. In Section 2.1 the concepts of random experiments, sample spaces, probability models, and rules of probability are reviewed. The concepts of a sigma algebra and probability set function are also introduced. In Section 2.2 we define a random variable and its cumulative distribution function. It is important to note that the definition of a cumulative distribution function is the same for all types of random variables. In Section 2.3 we define a discrete random variable and review the named discrete distributions (Hypergeometric, Binomial, Geometric, Negative Binomial, Poisson). In Section 2.4 we define a continuous random variable and review the named continuous distributions (Uniform, Exponential, Normal, and Chi-squared). We also introduce new continuous distributions (Gamma, Two Parameter Exponential, Weibull, Cauchy, Pareto). A summary of the named discrete and continuous distributions that are used in these Course Notes can be found in Chapter 11. In Section 2.6 we review the cumulative distribution function technique for finding the distribution of a function of a random variable and prove a theorem which can be used in the case of a monotone function. In Section 2.7 we review expectations of functions of random variables. In Sections 2.8-2.10 we introduce new material related to expectation such as inequalities, variance stabilizing transformations and moment generating functions. Section 2.11 contains a number of useful calculus results which will be used throughout these Course Notes.
2.1 Probability
To model real life phenomena for which we cannot predict exactly what will happen we
assign numbers, called probabilities, to outcomes of interest which reflect the likelihood of
such outcomes. To do this it is useful to introduce the concepts of an experiment and its
associated sample space. Consider some phenomenon or process which is repeatable, at
least in theory. We call the phenomenon or process a random experiment and refer to a
single repetition of the experiment as a trial. For such an experiment we consider the set
of all possible outcomes.
2.1.1 Definition - Sample Space
A sample space S is a set of all the distinct outcomes for a random experiment, with the
property that in a single trial, one and only one of these outcomes occurs.
To assign probabilities to the events of interest for a given experiment we begin by defining
a collection of subsets of a sample space S which is rich enough to define all the events of
interest for the experiment. We call such a collection of subsets a sigma algebra.
2.1.2 Definition - Sigma Algebra
A collection of subsets of a set S is called a sigma algebra, denoted by B, if it satisfies the following properties:
(1) ∅ ∈ B where ∅ is the empty set
(2) If A ∈ B then Ā ∈ B
(3) If A₁, A₂, ... ∈ B then ∪_{i=1}^{∞} Aᵢ ∈ B
Suppose A1 ; A2 ; : : : are subsets of the sample space S which correspond to events of interest
for the experiment. To complete the probability model for the experiment we need to assign
real numbers P (Ai ) ; i = 1; 2; : : :, where P (Ai ) is called the probability of Ai . To develop
the theory of probability these probabilities must satisfy certain properties. The following
Axioms of Probability are a set of axioms which allow a mathematical structure to be
developed.
2.1.3 Definition - Probability Set Function
Let B be a sigma algebra associated with the sample space S. A probability set function is a function P with domain B that satisfies the following axioms:
(A1) P(A) ≥ 0 for all A ∈ B
(A2) P(S) = 1
(A3) If A₁, A₂, ... ∈ B are pairwise mutually exclusive events, that is, Aᵢ ∩ Aⱼ = ∅ for all i ≠ j, then

P(∪_{i=1}^{∞} Aᵢ) = Σ_{i=1}^{∞} P(Aᵢ)
Note: The probabilities P(A₁), P(A₂), ... can be assigned in any way as long as they satisfy
these three axioms. However, if we wish to model real life phenomena we would assign the
probabilities such that they correspond to the relative frequencies of events in a repeatable
experiment.
2.1.4 Example
Let B be a sigma algebra associated with the sample space S and let P be a probability set function with domain B. If A, B ∈ B then prove the following:
(a) P(∅) = 0
(b) If A and B are mutually exclusive events then P(A ∪ B) = P(A) + P(B).
(c) P(Ā) = 1 − P(A)
(d) If A ⊆ B then P(A) ≤ P(B). Note: A ⊆ B means a ∈ A implies a ∈ B.
Solution
(a) Let A₁ = S and Aᵢ = ∅ for i = 2, 3, .... Since ∪_{i=1}^{∞} Aᵢ = S then by Definition 2.1.3 (A3) it follows that

P(S) = P(S) + Σ_{i=2}^{∞} P(∅)

and by (A2) we have

1 = 1 + Σ_{i=2}^{∞} P(∅)

By (A1) the right side is a series of non-negative terms which must converge to the left side, which is 1 and therefore finite. This results in a contradiction unless P(∅) = 0, as required.
(b) Let A₁ = A, A₂ = B, and Aᵢ = ∅ for i = 3, 4, .... Since ∪_{i=1}^{∞} Aᵢ = A ∪ B then by (A3)

P(A ∪ B) = P(A) + P(B) + Σ_{i=3}^{∞} P(∅)

and since P(∅) = 0 by the result in (a) it follows that

P(A ∪ B) = P(A) + P(B)
(c) Since S = A ∪ Ā and A ∩ Ā = ∅ then by (A2) and the result proved in (b) it follows that

1 = P(S) = P(A ∪ Ā) = P(A) + P(Ā)

or

P(Ā) = 1 − P(A)

(d) Since B = (A ∩ B) ∪ (Ā ∩ B) = A ∪ (Ā ∩ B) and A ∩ (Ā ∩ B) = ∅ then by (b), P(B) = P(A) + P(Ā ∩ B). But by (A1), P(Ā ∩ B) ≥ 0 so it follows that P(B) ≥ P(A).
2.1.5 Exercise
Let B be a sigma algebra associated with the sample space S and let P be a probability set function with domain B. If A, B ∈ B then prove the following:
(a) 0 ≤ P(A) ≤ 1
(b) P(A ∩ B̄) = P(A) − P(A ∩ B)
(c) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
For a given experiment we are sometimes interested in the probability of an event given that we know that the event of interest has occurred in a certain subset of S. For example, the experiment might involve people of different ages and we may be interested in an event only for a given age group. This leads us to define conditional probability.
2.1.6 Definition - Conditional Probability
Let B be a sigma algebra associated with the sample space S and suppose A, B ∈ B. The conditional probability of event A given event B is

P(A|B) = P(A ∩ B)/P(B)   provided P(B) > 0

2.1.7 Example
The following table of probabilities is based on data from the 2011 Canadian census. The probabilities are for Canadians aged 25-34.

Highest level of education attained:
                 No certificate,       High school              Postsecondary certificate,
                 diploma or degree     diploma or equivalent    diploma or degree
  Employed       0.066                 0.185                    0.683
  Unemployed     0.010                 0.016                    0.040
If a person is selected at random what is the probability the person
(a) is employed?
(b) has no certi…cate, diploma or degree?
(c) is unemployed and has at least a high school diploma or equivalent?
(d) has at least a high school diploma or equivalent given that they are unemployed?
Solution
(a) Let E be the event "employed", A₁ be the event "no certificate, diploma or degree", A₂ be the event "high school diploma or equivalent", and A₃ be the event "postsecondary certificate, diploma or degree". Then

P(E) = P(E ∩ A₁) + P(E ∩ A₂) + P(E ∩ A₃) = 0.066 + 0.185 + 0.683 = 0.934
(b)
P(A₁) = P(E ∩ A₁) + P(Ē ∩ A₁) = 0.066 + 0.010 = 0.076
(c)
P(unemployed and has at least a high school diploma or equivalent)
= P(Ē ∩ (A₂ ∪ A₃)) = P(Ē ∩ A₂) + P(Ē ∩ A₃) = 0.016 + 0.040 = 0.056
(d)
P(A₂ ∪ A₃ | Ē) = P(Ē ∩ (A₂ ∪ A₃))/P(Ē) = 0.056/0.066 = 0.848
If the occurrence of event B does not affect the probability of the event A, then the events
are called independent events.
2.1.8 Definition - Independent Events
Let B be a sigma algebra associated with the sample space S and suppose A, B ∈ B. A and B are independent events if

P(A ∩ B) = P(A)P(B)
2.1.9 Example
In Example 2.1.7 are the events "unemployed" and "no certificate, diploma or degree" independent events?
Solution
The events "unemployed" and "no certificate, diploma or degree" are not independent since

0.010 = P(Ē ∩ A₁) ≠ P(Ē)P(A₁) = (0.066)(0.076) = 0.0050
2.2 Random Variables
A probability model for a random experiment is often easier to construct if the outcomes of
the experiment are real numbers. When the outcomes are not real numbers, the outcomes
can be mapped to numbers using a function called a random variable. When the observed
data are numerical values such as the number of interruptions in a day in a communications
system or the length of time until relief after taking a pain killer, random variables are still
used in constructing probability models.
2.2.1 Definition of a Random Variable
A random variable X is a function from a sample space S to the real numbers ℝ, that is,

X : S → ℝ

such that P(X ≤ x) is defined for all x ∈ ℝ.
Note: 'X ≤ x' is an abbreviation for {ω ∈ S : X(ω) ≤ x} where {ω ∈ S : X(ω) ≤ x} ∈ B and B is a sigma algebra associated with the sample space S.

2.2.2 Example
Three friends Ali, Benita and Chen are enrolled in STAT 330. Suppose we are interested in whether these friends earn a grade of 70 or more. If we let A represent the event "Ali earns a grade of 70 or more", B represent the event "Benita earns a grade of 70 or more", and C represent the event "Chen earns a grade of 70 or more" then a suitable sample space is

S = {ABC, ĀBC, AB̄C, ABC̄, ĀB̄C, ĀBC̄, AB̄C̄, ĀB̄C̄}

Suppose we are mostly interested in how many of these friends earn a grade of 70 or more. We can define the random variable X = "number of friends who earn a grade of 70 or more". The range of X is {0, 1, 2, 3} with associated mapping

X(ABC) = 3
X(ĀBC) = X(AB̄C) = X(ABC̄) = 2
X(ĀB̄C) = X(ĀBC̄) = X(AB̄C̄) = 1
X(ĀB̄C̄) = 0
An important function associated with random variables is the cumulative distribution
function.
2.2.3 Definition - Cumulative Distribution Function
The cumulative distribution function (c.d.f.) of a random variable X is defined by

F(x) = P(X ≤ x)   for x ∈ ℝ

Note: The cumulative distribution function is defined for all real numbers.
2.2.4 Properties - Cumulative Distribution Function
(1) F is a non-decreasing function, that is, F(x₁) ≤ F(x₂) for all x₁ < x₂
(2) lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1
(3) F is a right-continuous function, that is, lim_{x→a⁺} F(x) = F(a)
(4) For all a < b, P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a)
(5) For all b, P(X = b) = F(b) − lim_{a→b⁻} F(a)

2.2.5 Example
Suppose X is a random variable with cumulative distribution function

F(x) = P(X ≤ x) =
  0     for x < 0
  0.1   for 0 ≤ x < 1
  0.3   for 1 ≤ x < 2
  0.6   for 2 ≤ x < 3
  1     for x ≥ 3

(a) Graph the function F(x).
(b) Determine the probabilities
(i) P(X ≤ 1)
(ii) P(X ≤ 2)
(iii) P(X ≤ 2.4)
(iv) P(X = 2)
(v) P(0 < X ≤ 2)
(vi) P(0 ≤ X ≤ 2).
[Figure 2.1: Graph of F(x) = P(X ≤ x) for Example 2.2.5]
Solution
(a) See Figure 2.1.
(b) (i) P(X ≤ 1) = F(1) = 0.3
(ii) P(X ≤ 2) = F(2) = 0.6
(iii) P(X ≤ 2.4) = P(X ≤ 2) = F(2) = 0.6
(iv) P(X = 2) = F(2) − lim_{x→2⁻} F(x) = 0.6 − 0.3 = 0.3
or
P(X = 2) = P(X ≤ 2) − P(X ≤ 1) = F(2) − F(1) = 0.6 − 0.3 = 0.3
(v) P(0 < X ≤ 2) = P(X ≤ 2) − P(X ≤ 0) = F(2) − F(0) = 0.6 − 0.1 = 0.5
(vi) P(0 ≤ X ≤ 2) = P(X ≤ 2) − P(X < 0) = F(2) − 0 = 0.6
2.2.6 Example
Suppose X is a random variable with cumulative distribution function
F(x) = P(X ≤ x) =
  0                       for x ≤ 0
  (2/5)x³                 for 0 < x < 1
  (1/5)(12x − 3x² − 7)    for 1 ≤ x < 2
  1                       for x ≥ 2

(a) Graph the function F(x).
(b) Determine the probabilities
(i) P(X ≤ 1)
(ii) P(X ≤ 2)
(iii) P(X ≤ 2.4)
(iv) P(X = 0.5)
(v) P(X = b), for b ∈ ℝ
(vi) P(1 < X ≤ 2.4)
(vii) P(1 ≤ X ≤ 2.4).
Solution
(a) See Figure 2.2.
[Figure 2.2: Graph of F(x) = P(X ≤ x) for Example 2.2.6]
(b) (i) P(X ≤ 1) = F(1) = (2/5)(1)³ = 2/5 = 0.4
(ii) P(X ≤ 2) = F(2) = 1
(iii) P(X ≤ 2.4) = F(2.4) = 1
(iv) P(X = 0.5) = F(0.5) − lim_{x→0.5⁻} F(x) = F(0.5) − F(0.5) = 0
(v) P(X = b) = F(b) − lim_{a→b⁻} F(a) = F(b) − F(b) = 0 for all b ∈ ℝ
(vi) P(1 < X ≤ 2.4) = F(2.4) − F(1) = 1 − 0.4 = 0.6
(vii) P(1 ≤ X ≤ 2.4) = P(1 < X ≤ 2.4) = 0.6 since P(X = 1) = 0
2.3 Discrete Random Variables
A set A is countable if the number of elements in the set is finite or the elements of the set
can be put into a one-to-one correspondence with the positive integers.
2.3.1 Definition - Discrete Random Variable
A random variable X defined on a sample space S is a discrete random variable if there is a countable subset A ⊆ ℝ such that P(X ∈ A) = 1.
2.3.2 Definition - Probability Function
If X is a discrete random variable then the probability function (p.f.) of X is given by

f(x) = P(X = x) = F(x) − lim_{ε→0⁺} F(x − ε)   for x ∈ ℝ

The set A = {x : f(x) > 0} is called the support set of X.
2.3.3 Properties - Probability Function
(1) f(x) ≥ 0 for x ∈ ℝ
(2) Σ_{x∈A} f(x) = 1
2.3.4 Example
In Example 2.2.5 find the support set A, show that X is a discrete random variable and determine its probability function.
Solution
The support set of X is A = {0, 1, 2, 3} which is a countable set. Its probability function is

f(x) = P(X = x) =
  0.1                                         if x = 0
  P(X ≤ 1) − P(X ≤ 0) = 0.3 − 0.1 = 0.2       if x = 1
  P(X ≤ 2) − P(X ≤ 1) = 0.6 − 0.3 = 0.3       if x = 2
  P(X ≤ 3) − P(X ≤ 2) = 1 − 0.6 = 0.4         if x = 3

or

x                  0     1     2     3     Total
f(x) = P(X = x)    0.1   0.2   0.3   0.4   1
Since P(X ∈ A) = Σ_{x=0}^{3} P(X = x) = 1, X is a discrete random variable.
In the next example we review four of the named distributions which were introduced in a
previous probability course.
2.3.5 Example
Suppose a box contains a red balls and b black balls. For each of the following find the probability function of the random variable X and show that Σ_{x∈A} f(x) = 1 where A = {x : f(x) > 0} is the support set of X.
(a) X = number of red balls among n balls drawn at random without replacement.
(b) X = number of red balls among n balls drawn at random with replacement.
(c) X = number of black balls selected before obtaining the first red ball if sampling is done at random with replacement.
(d) X = number of black balls selected before obtaining the kth red ball if sampling is done at random with replacement.
Solution
(a) If n balls are selected at random without replacement from a box of a red balls and b black balls then the random variable X = number of red balls has a Hypergeometric distribution with probability function

f(x) = P(X = x) = C(a,x) C(b,n−x) / C(a+b,n)   for x = max(0, n−b), ..., min(a, n)

By the Hypergeometric identity 2.11.6

Σ_{x∈A} f(x) = Σ_{x=max(0,n−b)}^{min(a,n)} C(a,x) C(b,n−x) / C(a+b,n)
             = [1/C(a+b,n)] Σ_{x=0}^{n} C(a,x) C(b,n−x)
             = C(a+b,n)/C(a+b,n)
             = 1
(b) If n balls are selected at random with replacement from a box of a red balls and b black balls then we have a sequence of Bernoulli trials and the random variable X = number of red balls has a Binomial distribution with probability function

f(x) = P(X = x) = C(n,x) p^x (1−p)^{n−x}   for x = 0, 1, ..., n

where p = a/(a+b). By the Binomial series 2.11.3(1)

Σ_{x∈A} f(x) = Σ_{x=0}^{n} C(n,x) p^x (1−p)^{n−x} = (p + 1 − p)^n = 1
(c) If sampling is done with replacement then we have a sequence of Bernoulli trials and the random variable X = number of black balls selected before obtaining the first red ball has a Geometric distribution with probability function

f(x) = P(X = x) = p(1−p)^x   for x = 0, 1, ...

By the Geometric series 2.11.1

Σ_{x∈A} f(x) = Σ_{x=0}^{∞} p(1−p)^x = p / [1 − (1−p)] = 1
(d) If sampling is done with replacement then we have a sequence of Bernoulli trials and the random variable X = number of black balls selected before obtaining the kth red ball has a Negative Binomial distribution with probability function

f(x) = P(X = x) = C(x+k−1, x) p^k (1−p)^x   for x = 0, 1, ...

Using the identity 2.11.4(1)

C(x+k−1, x) = C(−k, x) (−1)^x

the probability function can be written as

f(x) = P(X = x) = C(−k, x) p^k (p−1)^x   for x = 0, 1, ...
By the Binomial series 2.11.3(2)

Σ_{x∈A} f(x) = Σ_{x=0}^{∞} C(−k, x) p^k (p−1)^x
             = p^k Σ_{x=0}^{∞} C(−k, x) (p−1)^x
             = p^k (1 + p − 1)^{−k}
             = 1
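As an aside (not in the original notes), the four identities above can be checked numerically with scipy.stats; the values of a, b, n and k below are illustrative only.

```python
from scipy import stats

a, b, n, k = 6, 4, 5, 3        # illustrative values, not from the notes
p = a / (a + b)

# Hypergeometric: number of red balls in n draws without replacement
hyper = stats.hypergeom(M=a + b, n=a, N=n)   # scipy's parameterization: M total, n "successes", N draws
print(sum(hyper.pmf(x) for x in range(0, n + 1)))                # 1.0

# Binomial
print(sum(stats.binom(n, p).pmf(x) for x in range(0, n + 1)))    # 1.0

# Geometric (failures before the first success) and Negative Binomial
print(sum(stats.nbinom(1, p).pmf(x) for x in range(0, 200)))     # ~1.0
print(sum(stats.nbinom(k, p).pmf(x) for x in range(0, 200)))     # ~1.0
```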
2.3.6 Example
If X is a random variable with probability function

f(x) = θ^x e^{−θ} / x!   for x = 0, 1, ...;  θ > 0     (2.1)

show that Σ_{x=0}^{∞} f(x) = 1
Solution
By the Exponential series 2.11.7

Σ_{x=0}^{∞} f(x) = Σ_{x=0}^{∞} θ^x e^{−θ}/x! = e^{−θ} Σ_{x=0}^{∞} θ^x/x! = e^{−θ} e^{θ} = 1

The probability function (2.1) is called the Poisson probability function.
2.3.7 Exercise
If X is a random variable with probability function

f(x) = −(1−p)^x / (x log p)   for x = 1, 2, ...;  0 < p < 1

show that Σ_{x=1}^{∞} f(x) = 1
Hint: Use the Logarithmic series 2.11.8.
Important Note: A summary of the named distributions used in these Course Notes can
be found in Chapter 11.
2.4 Continuous Random Variables

2.4.1 Definition - Continuous Random Variable
Suppose X is a random variable with cumulative distribution function F. If F is a continuous function for all x ∈ ℝ and F is differentiable except possibly at countably many points then X is called a continuous random variable.
Note: The definition (2.2.3) and properties (2.2.4) of the cumulative distribution function
hold for the random variable X regardless of whether X is discrete or continuous.
2.4.2 Example
Suppose X is a random variable with cumulative distribution function
F(x) = P(X ≤ x) =
  0                           for x ≤ −1
  −(1/2)(x+1)² + x + 1        for −1 < x ≤ 0
  (1/2)(x−1)² + x             for 0 < x < 1
  1                           for x ≥ 1

Show that X is a continuous random variable.
Solution
The cumulative distribution function F is a continuous function for all x ∈ ℝ since it is a piecewise function composed of continuous functions and

lim_{x→a} F(x) = F(a)

at the break points a = −1, 0, 1.
The function F is differentiable for all x ≠ −1, 0, 1 since it is a piecewise function composed of differentiable functions. Since

lim_{h→0⁻} [F(−1+h) − F(−1)]/h = 0 ≠ lim_{h→0⁺} [F(−1+h) − F(−1)]/h = 1
lim_{h→0⁻} [F(0+h) − F(0)]/h = 0 = lim_{h→0⁺} [F(0+h) − F(0)]/h
lim_{h→0⁻} [F(1+h) − F(1)]/h = 1 ≠ lim_{h→0⁺} [F(1+h) − F(1)]/h = 0

F is differentiable at x = 0 but not differentiable at x = −1, 1. The set {−1, 1} is a countable set.
Since F is a continuous function for all x ∈ ℝ and F is differentiable except at countably many points, X is a continuous random variable.
2.4.3 Definition - Probability Density Function
If X is a continuous random variable with cumulative distribution function F(x) then the probability density function (p.d.f.) of X is f(x) = F′(x) if F is differentiable at x. The set A = {x : f(x) > 0} is called the support set of X.
Note: At the countably many points at which F′(a) does not exist, f(a) may be assigned any convenient value since the probabilities P(X ≤ x) will be unaffected by the choice. We usually choose f(a) ≥ 0 and most often we choose f(a) = 0.
2.4.4 Properties - Probability Density Function
(1) f(x) ≥ 0 for all x ∈ ℝ
(2) ∫_{−∞}^{∞} f(x) dx = lim_{x→∞} F(x) − lim_{x→−∞} F(x) = 1
(3) f(x) = lim_{h→0} [F(x+h) − F(x)]/h = lim_{h→0} P(x ≤ X ≤ x+h)/h, if this limit exists
(4) F(x) = ∫_{−∞}^{x} f(t) dt, x ∈ ℝ
(5) P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a) = ∫_a^b f(x) dx
(6) P(X = b) = F(b) − lim_{a→b⁻} F(a) = F(b) − F(b) = 0 (since F is continuous)
2.4.5 Example
Find and sketch the probability density function of the random variable X with the cumulative distribution function in Example 2.2.6.
Solution
By taking the derivative of F(x) we obtain

F′(x) =
  0                                                 for x < 0
  d/dx [(2/5)x³] = (6/5)x²                          for 0 < x < 1
  d/dx [(12x − 3x² − 7)/5] = (1/5)(12 − 6x)         for 1 < x < 2
  0                                                 for x > 2

We can assign any values to f(0), f(1), and f(2). For convenience we choose f(0) = f(2) = 0 and f(1) = 6/5.
The probability density function is

f(x) = F′(x) =
  (6/5)x²           for 0 < x ≤ 1
  (1/5)(12 − 6x)    for 1 < x < 2
  0                 otherwise

The graph of f(x) is given in Figure 2.3.
[Figure 2.3: Graph of the probability density function for Example 2.4.5]
Note: See Table 2.1 in Section 2.7 for a summary of the differences between the properties of a discrete and continuous random variable.
2.4.6 Example
Suppose X is a random variable with cumulative distribution function

F(x) =
  0                  for x ≤ a
  (x − a)/(b − a)    for a < x ≤ b
  1                  for x > b

where b > a.
(a) Sketch F(x), the cumulative distribution function of X.
(b) Find f(x), the probability density function of X, and sketch it.
(c) Is it possible for f(x) to take on values greater than one?
Solution
(a) See Figure 2.4 for a sketch of the cumulative distribution function.
[Figure 2.4: Graph of the cumulative distribution function for Example 2.4.6]
(b) By taking the derivative of F(x) we obtain

F′(x) =
  0            for x < a
  1/(b − a)    for a < x < b
  0            for x > b

The derivative of F(x) does not exist for x = a or x = b. For convenience we define f(a) = f(b) = 1/(b − a) so that

f(x) =
  1/(b − a)    for a ≤ x ≤ b
  0            otherwise

See Figure 2.5 for a sketch of the probability density function. Note that we could define f(a) and f(b) to be any values and the cumulative distribution function would remain the same since x = a and x = b are countably many points.
The random variable X is said to have a Uniform(a, b) distribution. We write this as X ∼ Uniform(a, b).
(c) If a = 0 and b = 0.5 then f(x) = 2 for 0 ≤ x ≤ 0.5. This example illustrates that the probability density function is not a probability and that the probability density function can take on values greater than one.
The important restriction for continuous random variables is

∫_{−∞}^{∞} f(x) dx = 1
[Figure 2.5: Graph of the probability density function of a Uniform(a, b) random variable]
2.4.7 Example
Consider the function

f(x) = β/x^{β+1}   for x ≥ 1

and 0 otherwise. For what values of β is this function a probability density function?
Solution
Using the result (2.8) from Section 2.11

∫_{−∞}^{∞} f(x) dx = ∫_1^∞ β/x^{β+1} dx
                   = lim_{b→∞} ∫_1^b β x^{−β−1} dx
                   = lim_{b→∞} [−x^{−β}]|_1^b
                   = lim_{b→∞} (1 − b^{−β})
                   = 1   if β > 0

Also f(x) ≥ 0 if β > 0. Therefore f(x) is a probability density function for all β > 0.
X is said to have a Pareto(1, β) distribution.
A useful function for evaluating integrals associated with several named random variables
is the Gamma function.
2.4.8 Definition - Gamma Function
The gamma function, denoted by Γ(α) for all α > 0, is given by

Γ(α) = ∫_0^∞ y^{α−1} e^{−y} dy
2.4.9 Properties - Gamma Function
(1) Γ(α) = (α − 1)Γ(α − 1) for α > 1
(2) Γ(n) = (n − 1)! for n = 1, 2, ...
(3) Γ(1/2) = √π

2.4.10 Example
Suppose X is a random variable with probability density function

f(x) = x^{α−1} e^{−x/β} / [Γ(α) β^α]   for x > 0, α > 0, β > 0

and 0 otherwise. X is said to have a Gamma distribution with parameters α and β and we write X ∼ Gamma(α, β).
(a) Verify that ∫_{−∞}^{∞} f(x) dx = 1
(b) What special probability density function is obtained for α = 1?
(c) Graph the probability density functions for
(i) α = 1, β = 3
(ii) α = 2, β = 1.5
(iii) α = 5, β = 0.6
(iv) α = 10, β = 0.3
on the same graph.
Note: See Chapter 11 - Summary of Named Distributions. Note that the notation for parameters used for named distributions is not necessarily the same in all textbooks. This is especially true for distributions with two or more parameters.
Solution
(a)

∫_{−∞}^{∞} f(x) dx = ∫_0^∞ x^{α−1} e^{−x/β} / [Γ(α) β^α] dx     let y = x/β
                   = [1/(Γ(α) β^α)] ∫_0^∞ (yβ)^{α−1} e^{−y} β dy
                   = [1/Γ(α)] ∫_0^∞ y^{α−1} e^{−y} dy
                   = Γ(α)/Γ(α)
                   = 1

(b) If α = 1 the probability density function is

f(x) = (1/β) e^{−x/β}   for x > 0, β > 0

and 0 otherwise, which is the Exponential(β) distribution.
(c) See Figure 2.6
[Figure 2.6: Gamma(α, β) probability density functions for (α, β) = (1, 3), (2, 1.5), (5, 0.6), (10, 0.3)]
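As a numerical aside (not part of the notes), the claim in part (a) can be checked for the four parameter pairs in part (c) by numerical integration; scipy is assumed to be available.

```python
import math
from scipy.integrate import quad

def gamma_pdf(x, alpha, beta):
    # Gamma(alpha, beta) density: x^(alpha-1) exp(-x/beta) / (Gamma(alpha) beta^alpha)
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

# Each density in part (c) should integrate to 1 over (0, infinity)
for alpha, beta in [(1, 3), (2, 1.5), (5, 0.6), (10, 0.3)]:
    total, _ = quad(gamma_pdf, 0, math.inf, args=(alpha, beta))
    print(alpha, beta, round(total, 6))   # each total is 1.0 up to numerical error
```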
2.4.11 Exercise
Suppose X is a random variable with probability density function

f(x) = (β/θ^β) x^{β−1} e^{−(x/θ)^β}   for x > 0, θ > 0, β > 0

and 0 otherwise. X is said to have a Weibull distribution with parameters β and θ and we write X ∼ Weibull(β, θ).
(a) Verify that ∫_{−∞}^{∞} f(x) dx = 1
(b) What special probability density function is obtained for β = 1?
(c) Graph the probability density functions for
(i) β = 1, θ = 0.5
(ii) β = 2, θ = 0.5
(iii) β = 2, θ = 1
(iv) β = 3, θ = 1
on the same graph.
2.4.12 Exercise
Suppose X is a random variable with probability density function

f(x) = β α^β / x^{β+1}   for x > α, α > 0, β > 0

and 0 otherwise. X is said to have a Pareto distribution with parameters α and β and we write X ∼ Pareto(α, β).
(a) Verify that ∫_{−∞}^{∞} f(x) dx = 1
(b) Graph the probability density functions for
(i) α = 1, β = 1
(ii) α = 1, β = 2
(iii) α = 0.5, β = 1
(iv) α = 0.5, β = 2
on the same graph.
2.5 Location and Scale Parameters
In Chapter 6 we will look at methods for constructing confidence intervals for an unknown parameter θ. If the parameter θ is either a location parameter or a scale parameter then a confidence interval is easier to construct.
2.5.1 Definition - Location Parameter
Suppose X is a continuous random variable with probability density function f(x; θ) where θ is a parameter of the distribution. Let F₀(x) = F(x; θ = 0) and f₀(x) = f(x; θ = 0). The parameter θ is called a location parameter of the distribution if

F(x; θ) = F₀(x − θ)   for θ ∈ ℝ

or equivalently

f(x; θ) = f₀(x − θ)   for θ ∈ ℝ
2.5.2 Definition - Scale Parameter
Suppose X is a continuous random variable with probability density function f(x; θ) where θ is a parameter of the distribution. Let F₁(x) = F(x; θ = 1) and f₁(x) = f(x; θ = 1). The parameter θ is called a scale parameter of the distribution if

F(x; θ) = F₁(x/θ)   for θ > 0

or equivalently

f(x; θ) = (1/θ) f₁(x/θ)   for θ > 0
2.5.3 Example
Suppose X is a continuous random variable with probability density function

f(x) = (1/β) e^{−(x−θ)/β}   for x ≥ θ, θ ∈ ℝ, β > 0

and 0 otherwise. X is said to have a Two Parameter Exponential distribution and we write X ∼ Two Parameter Exponential(θ, β).
(a) If X ∼ Two Parameter Exponential(θ, 1) show that θ is a location parameter for this distribution. Sketch the probability density function for θ = −1, 0, 1 on the same graph.
(b) If X ∼ Two Parameter Exponential(0, θ) show that θ is a scale parameter for this distribution. Sketch the probability density function for θ = 0.5, 1, 2 on the same graph.
Solution
(a) For X ∼ Two Parameter Exponential(θ, 1) the probability density function is

f(x; θ) = e^{−(x−θ)}   for x ≥ θ, θ ∈ ℝ

and 0 otherwise. Let

f₀(x) = f(x; θ = 0) = e^{−x}   for x > 0

and 0 otherwise. Then

f(x; θ) = e^{−(x−θ)} = f₀(x − θ)   for all θ ∈ ℝ

and therefore θ is a location parameter of this distribution.
See Figure 2.7 for a sketch of the probability density function for θ = −1, 0, 1.
[Figure 2.7: Exponential(θ, 1) probability density function for θ = −1, 0, 1]
(b) For X ∼ Two Parameter Exponential(0, θ) the probability density function is
f(x; θ) = (1/θ) e^{−x/θ}   for x > 0, θ > 0

and 0 otherwise, which is the Exponential(θ) probability density function.
Let

f₁(x) = f(x; θ = 1) = e^{−x}   for x > 0

and 0 otherwise. Then

f(x; θ) = (1/θ) e^{−x/θ} = (1/θ) f₁(x/θ)   for all θ > 0

and therefore θ is a scale parameter of this distribution.
See Figure 2.8 for a sketch of the probability density function for θ = 0.5, 1, 2.
[Figure 2.8: Exponential(θ) probability density function for θ = 0.5, 1, 2]

2.5.4 Exercise
Suppose X is a continuous random variable with probability density function

f(x) = 1 / (βπ{1 + [(x − θ)/β]²})   for x ∈ ℝ, θ ∈ ℝ, β > 0

and 0 otherwise. X is said to have a two parameter Cauchy distribution and we write X ∼ Cauchy(θ, β).
(a) If X ∼ Cauchy(θ, 1) then show that θ is a location parameter for the distribution. Graph the Cauchy(θ, 1) probability density function for θ = −1, 0 and 1 on the same graph.
(b) If X ∼ Cauchy(0, θ) then show that θ is a scale parameter for the distribution. Graph the Cauchy(0, θ) probability density function for θ = 0.5, 1 and 2 on the same graph.
2.6 Functions of a Random Variable
Suppose X is a continuous random variable with probability density function f and cumulative distribution function F and we wish to find the probability density function of the
random variable Y = h(X) where h is a real-valued function. In this section we look at
techniques for determining the distribution of Y .
2.6.1 Cumulative Distribution Function Technique
A useful technique for determining the distribution of a function of a random variable Y = h(X) is the cumulative distribution function technique. This technique involves obtaining an expression for G(y) = P(Y ≤ y), the cumulative distribution function of Y, in terms of F, the cumulative distribution function of X. The corresponding probability density function g of Y is found by differentiating G. Care must be taken to determine the support set of the random variable Y.
2.6.2 Example
If Z ∼ N(0, 1) find the probability density function of Y = Z². What type of random variable is Y?
Solution
If Z ∼ N(0, 1) then the probability density function of Z is

f(z) = (1/√(2π)) e^{−z²/2}   for z ∈ ℝ

Let G(y) = P(Y ≤ y) be the cumulative distribution function of Y = Z². Since the support set of the random variable Z is ℝ and Y = Z², the support set of Y is B = {y : y > 0}. For y ∈ B

G(y) = P(Y ≤ y) = P(Z² ≤ y)
     = P(−√y ≤ Z ≤ √y)
     = ∫_{−√y}^{√y} (1/√(2π)) e^{−z²/2} dz
     = 2 ∫_0^{√y} (1/√(2π)) e^{−z²/2} dz   since f(z) is an even function
For y ∈ B the probability density function of Y is

g(y) = (d/dy) G(y) = (d/dy) [2 ∫_0^{√y} (1/√(2π)) e^{−z²/2} dz]
     = (2/√(2π)) e^{−(√y)²/2} (d/dy)(√y)   by 2.11.10
     = (2/√(2π)) e^{−y/2} (1/(2√y))                    (2.2)
     = (1/√(2πy)) e^{−y/2}                             (2.3)

We recognize (2.3) as the probability density function of a Chi-squared(1) random variable so Y = Z² ∼ χ²(1).
The above solution provides a proof of the following theorem.

2.6.3 Theorem
If Z ∼ N(0, 1) then Z² ∼ χ²(1).
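A quick simulation (ours, not the notes') illustrates Theorem 2.6.3 by comparing the empirical cumulative distribution function of Z² with the χ²(1) cumulative distribution function; numpy and scipy are assumed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.standard_normal(200_000) ** 2          # squares of N(0,1) draws

# Compare the empirical c.d.f. of Z^2 with the chi-squared(1) c.d.f. at a few points
for q in [0.1, 0.5, 1.0, 2.0, 4.0]:
    print(q, np.mean(y <= q), stats.chi2.cdf(q, df=1))
```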
2.6.4 Example
Suppose X ∼ Exponential(θ). The cumulative distribution function for X is

F(x) = P(X ≤ x) =
  0               for x ≤ 0
  1 − e^{−x/θ}    for x > 0

Determine the distribution of the random variable Y = F(X) = 1 − e^{−X/θ}.
Solution
Since Y = 1 − e^{−X/θ}, X = −θ log(1 − Y) = F⁻¹(Y) for X > 0 and 0 < Y < 1.
For 0 < y < 1 the cumulative distribution function of Y is

G(y) = P(Y ≤ y) = P(1 − e^{−X/θ} ≤ y)
     = P(X ≤ −θ log(1 − y))
     = F(−θ log(1 − y))
     = 1 − e^{log(1−y)}
     = 1 − (1 − y)
     = y
If U ∼ Uniform(0, 1) then the cumulative distribution function of U is
P(U ≤ u) =
  0    for u ≤ 0
  u    for 0 < u < 1
  1    for u ≥ 1

Therefore the cumulative distribution function of Y = F(X) = 1 − e^{−X/θ} is the Uniform(0, 1) cumulative distribution function, that is, Y ∼ Uniform(0, 1).
This is an example of a result which holds more generally as summarized in the following theorem.
2.6.5 Theorem - Probability Integral Transformation
If X is a continuous random variable with cumulative distribution function F then the random variable

Y = F(X) = ∫_{−∞}^{X} f(t) dt     (2.4)

has a Uniform(0, 1) distribution.
Proof
Suppose the continuous random variable X has support set A = {x : f(x) > 0}. For all x ∈ A, F is an increasing function since F is a cumulative distribution function. Therefore for all x ∈ A the function F has an inverse function F⁻¹.
For 0 < y < 1, the cumulative distribution function of Y = F(X) is

G(y) = P(Y ≤ y) = P(F(X) ≤ y)
     = P(X ≤ F⁻¹(y))
     = F(F⁻¹(y))
     = y

which is the cumulative distribution function of a Uniform(0, 1) random variable. Therefore Y = F(X) ∼ Uniform(0, 1) as required.
Note: Because of the form of the function (transformation) Y = F(X) in (2.4), this transformation is called the probability integral transformation. This result holds for any cumulative distribution function F corresponding to a continuous random variable.
2.6.6 Theorem
Suppose F is a cumulative distribution function for a continuous random variable. If U ∼ Uniform(0, 1) then the random variable X = F⁻¹(U) also has cumulative distribution function F.
Proof
Suppose that the support set of the random variable X = F⁻¹(U) is A. For x ∈ A, the cumulative distribution function of X = F⁻¹(U) is

P(X ≤ x) = P(F⁻¹(U) ≤ x) = P(U ≤ F(x)) = F(x)

since P(U ≤ u) = u for 0 < u < 1 if U ∼ Uniform(0, 1). Therefore X = F⁻¹(U) has cumulative distribution function F.
Note: The result of the previous theorem is important because it provides a method for generating observations from a continuous distribution. Let u be an observation generated from a Uniform(0, 1) distribution using a random number generator. Then by Theorem 2.6.6, x = F⁻¹(u) is an observation from the distribution with cumulative distribution function F.
2.6.7 Example
Explain how Theorem 2.6.6 can be used to generate observations from a Weibull(β, 1) distribution.
Solution
If X has a Weibull(β, 1) distribution then the cumulative distribution function is

F(x; β) = ∫_0^x β y^{β−1} e^{−y^β} dy = 1 − e^{−x^β}   for x > 0

The inverse cumulative distribution function is

F⁻¹(u) = [−log(1 − u)]^{1/β}   for 0 < u < 1

If u is an observation from the Uniform(0, 1) distribution then x = [−log(1 − u)]^{1/β} is an observation from the Weibull(β, 1) distribution by Theorem 2.6.6.
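The following short Python sketch (ours) carries out exactly this recipe for an illustrative shape value and checks the result against F(x) = 1 − e^{−x^β}.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 2.0                                  # illustrative shape parameter

u = rng.uniform(0.0, 1.0, size=100_000)     # Uniform(0,1) observations
x = (-np.log(1.0 - u)) ** (1.0 / beta)      # x = F^{-1}(u), as in Theorem 2.6.6

# Sanity check: compare the empirical c.d.f. with F(x) = 1 - exp(-x^beta)
for q in [0.5, 1.0, 1.5]:
    print(q, np.mean(x <= q), 1 - np.exp(-q ** beta))
```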
If we wish to find the distribution of the random variable Y = h(X) and h is a one-to-one real-valued function then the following theorem can be used.

2.6.8 Theorem - One-to-One Transformation of a Random Variable
Suppose X is a continuous random variable with probability density function f and support set A = {x : f(x) > 0}. Let Y = h(X) where h is a real-valued function. Let B = {y : g(y) > 0} be the support set of the random variable Y. If h is a one-to-one function from A to B and (d/dx) h(x) is continuous for x ∈ A, then the probability density function of Y is

g(y) = f(h⁻¹(y)) |(d/dy) h⁻¹(y)|   for y ∈ B
Proof
We prove this theorem using the cumulative distribution function technique.
(1) Suppose h is an increasing function and (d/dx) h(x) is continuous for x ∈ A. Then h⁻¹(y) is also an increasing function and (d/dy) h⁻¹(y) > 0 for y ∈ B. The cumulative distribution function of Y = h(X) is

G(y) = P(Y ≤ y) = P(h(X) ≤ y)
     = P(X ≤ h⁻¹(y))   since h is an increasing function
     = F(h⁻¹(y))

Therefore

g(y) = (d/dy) G(y) = (d/dy) F(h⁻¹(y))
     = F′(h⁻¹(y)) (d/dy) h⁻¹(y)   by the Chain Rule
     = f(h⁻¹(y)) |(d/dy) h⁻¹(y)|,  y ∈ B,  since (d/dy) h⁻¹(y) > 0

(2) Suppose h is a decreasing function and (d/dx) h(x) is continuous for x ∈ A. Then h⁻¹(y) is also a decreasing function and (d/dy) h⁻¹(y) < 0 for y ∈ B. The cumulative distribution function of Y = h(X) is

G(y) = P(Y ≤ y) = P(h(X) ≤ y)
     = P(X ≥ h⁻¹(y))   since h is a decreasing function
     = 1 − F(h⁻¹(y))

Therefore

g(y) = (d/dy) G(y) = (d/dy) [1 − F(h⁻¹(y))]
     = −F′(h⁻¹(y)) (d/dy) h⁻¹(y)   by the Chain Rule
     = f(h⁻¹(y)) |(d/dy) h⁻¹(y)|,  y ∈ B,  since (d/dy) h⁻¹(y) < 0

These two cases give the desired result.
2.6.9 Example
The following two results were used extensively in your previous probability and statistics courses.
(a) If Z ∼ N(0, 1) then Y = μ + σZ ∼ N(μ, σ²).
(b) If X ∼ N(μ, σ²) then Z = (X − μ)/σ ∼ N(0, 1).
Prove these results using Theorem 2.6.8.
Solution
(a) If Z ∼ N(0, 1) then the probability density function of Z is

f(z) = (1/√(2π)) e^{−z²/2}   for z ∈ ℝ

Y = μ + σZ = h(Z) is an increasing function with inverse function Z = h⁻¹(Y) = (Y − μ)/σ. Since the support set of Z is A = ℝ, the support set of Y is B = ℝ. Since

(d/dy) h⁻¹(y) = (d/dy) [(y − μ)/σ] = 1/σ

then by Theorem 2.6.8 the probability density function of Y is

g(y) = f(h⁻¹(y)) |(d/dy) h⁻¹(y)|
     = (1/(σ√(2π))) e^{−[(y−μ)/σ]²/2}   for y ∈ ℝ

which is the probability density function of an N(μ, σ²) random variable. Therefore if Z ∼ N(0, 1) then Y = μ + σZ ∼ N(μ, σ²).
(b) If X ∼ N(μ, σ²) then the probability density function of X is

f(x) = (1/(σ√(2π))) e^{−[(x−μ)/σ]²/2}   for x ∈ ℝ

Z = (X − μ)/σ = h(X) is an increasing function with inverse function X = h⁻¹(Z) = μ + σZ. Since the support set of X is A = ℝ, the support set of Z is B = ℝ. Since

(d/dz) h⁻¹(z) = (d/dz) (μ + σz) = σ

then by Theorem 2.6.8 the probability density function of Z is

g(z) = f(h⁻¹(z)) |(d/dz) h⁻¹(z)|
     = (1/√(2π)) e^{−z²/2}   for z ∈ ℝ

which is the probability density function of an N(0, 1) random variable. Therefore if X ∼ N(μ, σ²) then Z = (X − μ)/σ ∼ N(0, 1).
2.6.10 Example
Use Theorem 2.6.8 to prove the following relationship between the Pareto distribution and the Exponential distribution.
If X ∼ Pareto(1, β) then Y = log(X) ∼ Exponential(1/β).

Solution
If X ∼ Pareto(1, β) then the probability density function of X is

f(x) = β/x^{β+1}   for x ≥ 1, β > 0

Y = log(X) = h(X) is an increasing function with inverse function X = e^Y = h⁻¹(Y). Since the support set of X is A = {x : x ≥ 1}, the support set of Y is B = {y : y > 0}. Since

(d/dy) h⁻¹(y) = (d/dy) e^y = e^y

then by Theorem 2.6.8 the probability density function of Y is

g(y) = f(h⁻¹(y)) |(d/dy) h⁻¹(y)|
     = [β/(e^y)^{β+1}] e^y = β e^{−βy}   for y > 0

which is the probability density function of an Exponential(1/β) random variable as required.
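A small simulation (not in the notes) illustrates the result: Pareto(1, β) observations are generated by the inverse c.d.f. method of Theorem 2.6.6, and their logarithms are compared with the Exponential(1/β) cumulative distribution function. The value of β is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
beta = 3.0                                   # illustrative shape parameter

u = rng.uniform(size=100_000)
x = (1.0 - u) ** (-1.0 / beta)               # Pareto(1, beta) via inverse c.d.f.: F(x) = 1 - x^(-beta)
y = np.log(x)

# Y should behave like Exponential(1/beta), i.e. P(Y <= q) = 1 - exp(-beta q)
for q in [0.2, 0.5, 1.0]:
    print(q, np.mean(y <= q), 1 - np.exp(-beta * q))
```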
2.6.11 Exercise
Use Theorem 2.6.8 to prove the following relationship between the Exponential distribution and the Weibull distribution.
If X ∼ Exponential(1) then Y = θX^{1/β} ∼ Weibull(β, θ) for θ > 0, β > 0.

2.6.12 Exercise
Suppose X is a random variable with probability density function

f(x) = β x^{β−1}   for 0 < x < 1, β > 0

and 0 otherwise.
Use Theorem 2.6.8 to prove that Y = −log X ∼ Exponential(1/β).
2.7 Expectation
In this section we define the expectation operator E which maps random variables to real numbers. These numbers have an interpretation in terms of long run averages for repeated independent trials of an experiment associated with the random variable. Much of this
section is a review of material covered in a previous probability course.
2.7.1 Definition - Expectation
Suppose h(x) is a real-valued function.
If X is a discrete random variable with probability function f(x) and support set A then the expectation of the random variable h(X) is defined by

E[h(X)] = Σ_{x∈A} h(x) f(x)

provided the sum converges absolutely, that is, provided

E(|h(X)|) = Σ_{x∈A} |h(x)| f(x) < ∞

If X is a continuous random variable with probability density function f(x) then the expectation of the random variable h(X) is defined by

E[h(X)] = ∫_{−∞}^{∞} h(x) f(x) dx

provided the integral converges absolutely, that is, provided

E(|h(X)|) = ∫_{−∞}^{∞} |h(x)| f(x) dx < ∞

If E(|h(X)|) = ∞ then we say that E[h(X)] does not exist.
E[h(X)] is also called the expected value of the random variable h(X).
2.7.2 Example
Find E(X) if X ∼ Geometric(p).
Solution
If X ∼ Geometric(p) then

f(x) = p q^x   for x = 0, 1, ...;  q = 1 − p, 0 < p < 1

and

E(X) = Σ_{x∈A} x f(x) = Σ_{x=0}^{∞} x p q^x = Σ_{x=1}^{∞} x p q^x
     = pq Σ_{x=1}^{∞} x q^{x−1}   which converges if 0 < q < 1
     = pq / (1 − q)²   by 2.11.2(2)
     = q/p   if 0 < q < 1
2.7.3 Example
Suppose X ∼ Pareto(1, β) with probability density function

f(x) = β/x^{β+1}   for x ≥ 1, β > 0

and 0 otherwise. Find E(X). For what values of β does E(X) exist?
Solution
E (X) =
Z1
xf (x)dx =
1
Z1
=
Z1
x
x
+1
dx
1
1
dx which converges for
x
> 1 by 2.8
1
=
lim
b!1
Zb
x
dx =
lim x
1
b!1
+1 b
j1
=
1
1
1
b
1
1
=
Therefore E (X) =
2.7.4
1
1
for
>1
and the mean exists only for
> 1.
Exercise
Suppose X is a nonnegative continuous random variable with cumulative distribution function F(x) and E(X) < ∞. Show that

E(X) = ∫_0^∞ [1 − F(x)] dx

Hint: Use integration by parts with u = 1 − F(x).
2.7.5 Theorem - Expectation is a Linear Operator
Suppose X is a random variable with probability (density) function f(x), a and b are real constants, and g(x) and h(x) are real-valued functions. Then

E(aX + b) = aE(X) + b
E[ag(X) + bh(X)] = aE[g(X)] + bE[h(X)]

Proof (Continuous Case)

E(aX + b) = ∫_{−∞}^{∞} (ax + b) f(x) dx
          = a ∫_{−∞}^{∞} x f(x) dx + b ∫_{−∞}^{∞} f(x) dx   by properties of integrals
          = aE(X) + b(1)   by Definition 2.7.1 and Property 2.4.4
          = aE(X) + b

E[ag(X) + bh(X)] = ∫_{−∞}^{∞} [ag(x) + bh(x)] f(x) dx
                 = a ∫_{−∞}^{∞} g(x) f(x) dx + b ∫_{−∞}^{∞} h(x) f(x) dx   by properties of integrals
                 = aE[g(X)] + bE[h(X)]   by Definition 2.7.1

as required.
The following named expectations are used frequently.
2.7.6 Special Expectations
(1) The mean of a random variable: E(X) = μ
(2) The kth moment (about the origin) of a random variable: E(X^k)
(3) The kth moment about the mean of a random variable: E[(X − μ)^k]
(4) The kth factorial moment of a random variable: E(X^{(k)}) = E[X(X−1)⋯(X−k+1)]
(5) The variance of a random variable: Var(X) = E[(X − μ)²] = σ² where μ = E(X)

2.7.7 Theorem - Properties of Variance
σ² = Var(X) = E(X²) − μ² = E[X(X−1)] + μ − μ²

Var(aX + b) = a² Var(X)

and

E(X²) = σ² + μ²
)2 ] = E X 2
V ar(X) = E[(X
= E X2
= E X
2
2 E (X) +
2
2
= E X2
+
2 X+
2
2
by Theorem 2.7.5
2
2
Also
V ar(X) = E X 2
2
= E [X (X
= E [X (X
1)] + E (X)
= E [X (X
2
1)] +
1) + X]
2
by Theorem 2.7.5
n
o
V ar (aX + b) = E [aX + b (a + b)]2
h
i
h
= E (aX a )2 = E a2 (X
= a2 E[(X
2
= E(X 2 )
2
gives
E(X 2 ) =
)2
i
)2 ] by Theorem 2.7.5
= a2 V ar(X) by de…nition
Rearranging
2
2
+
2
2.7.8 Example
If X ∼ Binomial(n, p) then show
E(X^{(k)}) = n^{(k)} p^k   for k = 1, 2, ...

and thus find E(X) and Var(X).
Solution

E(X^{(k)}) = Σ_{x=k}^{n} x^{(k)} C(n,x) p^x (1−p)^{n−x}
           = Σ_{x=k}^{n} n^{(k)} C(n−k, x−k) p^x (1−p)^{n−x}   by 2.11.4(1)
           = n^{(k)} p^k Σ_{y=0}^{n−k} C(n−k, y) p^y (1−p)^{(n−k)−y}   letting y = x − k
           = n^{(k)} p^k (p + 1 − p)^{n−k}   by 2.11.3(1)
           = n^{(k)} p^k   for k = 1, 2, ...
For k = 1 we obtain

E(X^{(1)}) = E(X) = n^{(1)} p¹ = np

For k = 2 we obtain

E(X^{(2)}) = E[X(X−1)] = n^{(2)} p² = n(n−1)p²

Therefore

Var(X) = E[X(X−1)] + μ − μ²
       = n(n−1)p² + np − n²p²
       = np(1 − p)
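These formulas are easy to confirm numerically; the sketch below (ours, with illustrative n and p) computes the moments directly from the Binomial probability function.

```python
from math import comb

n, p = 10, 0.3                 # illustrative values

pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mean = sum(x * f for x, f in enumerate(pmf))
var = sum(x**2 * f for x, f in enumerate(pmf)) - mean**2
fact2 = sum(x * (x - 1) * f for x, f in enumerate(pmf))   # second factorial moment

print(mean, n * p)                  # both 3.0
print(var, n * p * (1 - p))         # both 2.1
print(fact2, n * (n - 1) * p**2)    # both 8.1
```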
2.7.9 Exercise
Show the following:
(a) If X ∼ Poisson(θ) then E(X^{(k)}) = θ^k for k = 1, 2, ...
(b) If X ∼ Negative Binomial(k, p) then E(X^{(j)}) = (−k)^{(j)} [(p−1)/p]^j for j = 1, 2, ...
(c) If X ∼ Gamma(α, β) then E(X^p) = β^p Γ(α + p)/Γ(α) for p > −α
(d) If X ∼ Weibull(β, θ) then E(X^k) = θ^k Γ(k/β + 1) for k = 1, 2, ...
In each case find E(X) and Var(X).
Table 2.1 summarizes the differences between the properties of a discrete and a continuous random variable.

Property: c.d.f.
  Discrete: F(x) = P(X ≤ x) = Σ_{t≤x} P(X = t); F is a right-continuous function for all x ∈ ℝ
  Continuous: F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt; F is a continuous function for all x ∈ ℝ
Property: p.f. / p.d.f.
  Discrete: f(x) = P(X = x)
  Continuous: f(x) = F′(x) ≠ P(X = x) = 0
Property: probability of an event
  Discrete: P(X ∈ E) = Σ_{x∈E} P(X = x) = Σ_{x∈E} f(x)
  Continuous: P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx
Property: total probability
  Discrete: Σ_{x∈A} P(X = x) = Σ_{x∈A} f(x) = 1, where A = support set of X
  Continuous: ∫_{−∞}^{∞} f(x) dx = 1
Property: expectation
  Discrete: E[g(X)] = Σ_{x∈A} g(x) f(x), where A = support set of X
  Continuous: E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

Table 2.1: Properties of discrete versus continuous random variables
2.8 Inequalities
In Chapter 5 we consider limiting distributions of a sequence of random variables. The
following inequalities which involve the moments of a distribution are useful for proving
limit theorems.
2.8.1 Markov's Inequality

P(|X| ≥ c) ≤ E(|X|^k)/c^k   for all k, c > 0
Proof (Continuous Case)
Suppose X is a continuous random variable with probability density function f(x). Let

A = {x : |x/c|^k ≥ 1} = {x : |x| ≥ c}   since c > 0

Then

E(|X|^k)/c^k = E(|X/c|^k) = ∫_{−∞}^{∞} |x/c|^k f(x) dx
             = ∫_A |x/c|^k f(x) dx + ∫_{Ā} |x/c|^k f(x) dx
             ≥ ∫_A |x/c|^k f(x) dx   since |x/c|^k f(x) ≥ 0
             ≥ ∫_A f(x) dx   since |x/c|^k ≥ 1 for x ∈ A
             = P(|X| ≥ c)

as required. (The proof of the discrete case follows by replacing integrals with sums.)
2.8.2 Chebyshev's Inequality
Suppose X is a random variable with finite mean μ and finite variance σ². Then for any k > 0

P(|X − μ| ≥ kσ) ≤ 1/k²

2.8.3 Exercise
Use Markov's Inequality to prove Chebyshev's Inequality.
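Chebyshev's Inequality can also be illustrated empirically; the sketch below (ours) uses an Exponential sample, but any distribution with finite mean and variance would do.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=200_000)   # any distribution with finite mean and variance
mu, sigma = x.mean(), x.std()

for k in [1.5, 2.0, 3.0]:
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    print(k, empirical, 1 / k**2)   # the empirical tail probability sits below the bound 1/k^2
```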
2.9 Variance Stabilizing Transformation
In Chapter 6 we look at methods for constructing a confidence interval for an unknown parameter θ based on data X. To do this it is often useful to find a transformation, g(X), of the data X whose variance is approximately constant with respect to θ.
Suppose X is a random variable with finite mean E(X) = θ. Suppose also that X has finite variance Var(X) = σ²(θ) and standard deviation √Var(X) = σ(θ), both depending on θ. Let Y = g(X) where g is a differentiable function. By the linear approximation

Y = g(X) ≈ g(θ) + g′(θ)(X − θ)

Therefore

E(Y) ≈ E[g(θ) + g′(θ)(X − θ)] = g(θ)

since

E[g′(θ)(X − θ)] = g′(θ) E(X − θ) = 0

Also

Var(Y) ≈ Var[g′(θ)(X − θ)] = [g′(θ)]² Var(X) = [g′(θ) σ(θ)]²     (2.5)

If we want Var(Y) ≈ constant with respect to θ then we should choose g such that

[g′(θ)]² Var(X) = [g′(θ) σ(θ)]² = constant

In other words we need to solve the differential equation

dg/dθ = k/σ(θ)

where k is a conveniently chosen constant.
2.9.1 Example
If X ∼ Poisson(θ) then show that the random variable Y = g(X) = √X has approximately constant variance.
Solution
If X ∼ Poisson(θ) then Var(X) = σ²(θ) = θ. For g(X) = √X, g′(X) = (1/2)X^{−1/2}. Therefore by (2.5), the variance of Y = g(X) = √X is approximately

[g′(θ) σ(θ)]² = [(1/2) θ^{−1/2} √θ]² = 1/4

which is a constant.
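A simulation (ours) makes the point concrete: as θ grows, Var(X) grows with θ while Var(√X) stays near 1/4.

```python
import numpy as np

rng = np.random.default_rng(4)

# Var(sqrt(X)) stays close to 1/4 as the Poisson mean theta changes,
# while Var(X) = theta keeps growing
for theta in [2, 5, 10, 50]:
    x = rng.poisson(theta, size=200_000)
    print(theta, np.var(x), np.var(np.sqrt(x)))
```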
2.9.2 Exercise
If X ∼ Exponential(θ) then show that the random variable Y = g(X) = log X has approximately constant variance.
2.10 Moment Generating Functions
If we are given the probability (density) function of a random variable X or the cumulative
distribution function of the random variable X then we can determine everything there is to
know about the distribution of X. There is a third type of function, the moment generating
function, which also uniquely determines a distribution. The moment generating function is
closely related to other transforms used in mathematics, the Laplace and Fourier transforms.
Moment generating functions are a powerful tool for determining the distributions of
functions of random variables (Chapter 4), particularly sums, as well as determining the
limiting distribution of a sequence of random variables (Chapter 5).
2.10.1 Definition - Moment Generating Function
If X is a random variable then M(t) = E(e^{tX}) is called the moment generating function (m.g.f.) of X if this expectation exists for all t ∈ (−h, h) for some h > 0.
Important: When determining the moment generating function M(t) of a random variable the values of t for which the expectation exists should always be stated.
2.10.2 Example
(a) Find the moment generating function of the random variable $X \sim \text{Gamma}(\alpha, \beta)$.
(b) Find the moment generating function of the random variable $X \sim \text{Negative Binomial}(k, p)$.

Solution
(a) If $X \sim \text{Gamma}(\alpha, \beta)$ then

$$M(t) = \int_{0}^{\infty} e^{tx} f(x)\,dx = \int_{0}^{\infty} e^{tx}\,\frac{x^{\alpha-1}e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}\,dx
= \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}} \int_{0}^{\infty} x^{\alpha-1} e^{-x\left(\frac{1}{\beta}-t\right)}\,dx \quad \text{which converges for } t < \frac{1}{\beta}$$

Letting $y = x\left(\frac{1}{\beta}-t\right)$ gives

$$M(t) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}} \left(\frac{1}{\beta}-t\right)^{-\alpha} \int_{0}^{\infty} y^{\alpha-1} e^{-y}\,dy
= \frac{\Gamma(\alpha)}{\Gamma(\alpha)\,\beta^{\alpha}} \left(\frac{1}{\beta}-t\right)^{-\alpha}
= (1-\beta t)^{-\alpha} \quad \text{for } t < \frac{1}{\beta}$$

(b) If $X \sim \text{Negative Binomial}(k, p)$ then

$$M(t) = \sum_{x=0}^{\infty} e^{tx} \binom{-k}{x} p^{k} (-q)^{x} \quad \text{where } q = 1-p$$

$$= p^{k} \sum_{x=0}^{\infty} \binom{-k}{x} \left(-qe^{t}\right)^{x}
= p^{k} \left(1 - qe^{t}\right)^{-k} \quad \text{by 2.11.3(2), which converges for } qe^{t} < 1$$

$$= \left(\frac{p}{1-qe^{t}}\right)^{k} \quad \text{for } t < -\log(q)$$
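The closed form in (a) can be checked numerically. The sketch below is illustrative only; $\alpha = 3$, $\beta = 2$ and the $t$ values are arbitrary choices satisfying $t < 1/\beta$.

```python
# Monte Carlo check that E(e^{tX}) = (1 - beta*t)^(-alpha) for X ~ Gamma(alpha, beta).
import numpy as np

rng = np.random.default_rng(seed=3)
alpha, beta = 3.0, 2.0
x = rng.gamma(shape=alpha, scale=beta, size=10**6)   # X ~ Gamma(alpha, beta)

for t in [-0.5, 0.1, 0.2]:                 # all satisfy t < 1/beta = 0.5
    mc = np.mean(np.exp(t * x))            # Monte Carlo estimate of E(e^{tX})
    exact = (1 - beta * t) ** (-alpha)     # closed form derived above
    print(f"t={t:+.1f}: Monte Carlo {mc:.4f} vs (1 - beta t)^(-alpha) = {exact:.4f}")
```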
2.10.3 Exercise
(a) Show that the moment generating function of the random variable $X \sim \text{Binomial}(n, p)$ is $M(t) = \left(q + pe^{t}\right)^{n}$ for $t \in \Re$.
(b) Show that the moment generating function of the random variable $X \sim \text{Poisson}(\theta)$ is $M(t) = e^{\theta\left(e^{t}-1\right)}$ for $t \in \Re$.

If the moment generating function of a random variable $X$ exists then the following theorem gives us a method for determining the distribution of the random variable $Y = aX + b$ which is a linear function of $X$.

2.10.4 Theorem - Moment Generating Function of a Linear Function
Suppose the random variable $X$ has moment generating function $M_X(t)$ defined for $t \in (-h, h)$ for some $h > 0$. Let $Y = aX + b$ where $a, b \in \Re$ and $a \ne 0$. Then the moment generating function of $Y$ is

$$M_Y(t) = e^{bt} M_X(at) \quad \text{for } |t| < \frac{h}{|a|}$$

Proof

$$M_Y(t) = E\left(e^{tY}\right) = E\left(e^{t(aX+b)}\right) = e^{bt}\,E\left(e^{atX}\right) \quad \text{which exists for } |at| < h$$

$$= e^{bt} M_X(at) \quad \text{for } |t| < \frac{h}{|a|}$$

as required.
2.10.5 Example
(a) Find the moment generating function of $Z \sim \text{N}(0, 1)$.
(b) Use (a) and the fact that $X = \mu + \sigma Z \sim \text{N}(\mu, \sigma^2)$ to find the moment generating function of a $\text{N}(\mu, \sigma^2)$ random variable.

Solution
(a) The moment generating function of $Z$ is

$$M_Z(t) = \int_{-\infty}^{\infty} e^{tz}\,\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\,dz
= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-(z^2 - 2tz)/2}\,dz
= e^{t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-(z-t)^2/2}\,dz
= e^{t^2/2} \quad \text{for } t \in \Re$$

by 2.4.4(2) since $\frac{1}{\sqrt{2\pi}}\,e^{-(z-t)^2/2}$ is the probability density function of a $\text{N}(t, 1)$ random variable.

(b) By Theorem 2.10.4 the moment generating function of $X = \mu + \sigma Z$ is

$$M_X(t) = e^{\mu t} M_Z(\sigma t) = e^{\mu t}\,e^{(\sigma t)^2/2} = e^{\mu t + \sigma^2 t^2/2} \quad \text{for } t \in \Re$$

2.10.6 Exercise
If $X \sim \text{Negative Binomial}(k, p)$ then find the moment generating function of $Y = X + k$, $k = 1, 2, \ldots$
2.10.7 Theorem - Moments from Moment Generating Function
Suppose the random variable $X$ has moment generating function $M(t)$ defined for $t \in (-h, h)$ for some $h > 0$. Then $M(0) = 1$ and

$$M^{(k)}(0) = E\left(X^k\right) \quad \text{for } k = 1, 2, \ldots$$

where

$$M^{(k)}(t) = \frac{d^k}{dt^k} M(t)$$

is the $k$th derivative of $M(t)$.

Proof (Continuous Case)
Note that $M(0) = E\left(X^0\right) = E(1) = 1$ and also that

$$\frac{d^k}{dt^k} e^{tx} = x^k e^{tx} \quad k = 1, 2, \ldots \qquad (2.6)$$

The result (2.6) can be proved by induction.
Now if $X$ is a continuous random variable with moment generating function $M(t)$ defined for $t \in (-h, h)$ for some $h > 0$ then

$$M^{(k)}(t) = \frac{d^k}{dt^k} E\left(e^{tX}\right) = \frac{d^k}{dt^k} \int_{-\infty}^{\infty} e^{tx} f(x)\,dx
= \int_{-\infty}^{\infty} \frac{d^k}{dt^k} e^{tx} f(x)\,dx \quad k = 1, 2, \ldots$$

assuming the operations of differentiation and integration can be exchanged. (This interchange of operations cannot always be done but for the moment generating functions of interest in this course the result does hold.)
Using (2.6) we have

$$M^{(k)}(t) = \int_{-\infty}^{\infty} x^k e^{tx} f(x)\,dx = E\left(X^k e^{tX}\right) \quad t \in (-h, h) \text{ for some } h > 0$$

Letting $t = 0$ we obtain

$$M^{(k)}(0) = E\left(X^k\right) \quad k = 1, 2, \ldots$$

as required.
2.10.8 Example
If $X \sim \text{Gamma}(\alpha, \beta)$ then $M(t) = (1 - \beta t)^{-\alpha}$, $t < 1/\beta$. Find $E\left(X^k\right)$, $k = 1, 2, \ldots$ using Theorem 2.10.7.

Solution

$$M'(t) = \frac{d}{dt}(1 - \beta t)^{-\alpha} = \alpha\beta\,(1 - \beta t)^{-\alpha-1}$$

so

$$E(X) = M'(0) = \alpha\beta$$

$$M''(t) = \frac{d^2}{dt^2}(1 - \beta t)^{-\alpha} = \alpha(\alpha+1)\beta^2\,(1 - \beta t)^{-\alpha-2}$$

so

$$E\left(X^2\right) = M''(0) = \alpha(\alpha+1)\beta^2$$

Continuing in this manner we have

$$M^{(k)}(t) = \frac{d^k}{dt^k}(1 - \beta t)^{-\alpha} = \alpha(\alpha+1)\cdots(\alpha+k-1)\,\beta^k\,(1 - \beta t)^{-\alpha-k} \quad \text{for } k = 1, 2, \ldots$$

so

$$E\left(X^k\right) = M^{(k)}(0) = \alpha(\alpha+1)\cdots(\alpha+k-1)\,\beta^k = (\alpha+k-1)^{(k)}\beta^k \quad \text{for } k = 1, 2, \ldots$$
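The differentiation in this example can be reproduced symbolically. The following sketch (not part of the notes) assumes the sympy package is available.

```python
# Differentiating M(t) = (1 - beta*t)^(-alpha) at t = 0 reproduces
# E(X) = alpha*beta and E(X^2) = alpha*(alpha + 1)*beta^2.
import sympy as sp

t, alpha, beta = sp.symbols('t alpha beta', positive=True)
M = (1 - beta * t) ** (-alpha)

EX  = sp.simplify(sp.diff(M, t, 1).subs(t, 0))   # M'(0)
EX2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))   # M''(0)
print(EX)    # alpha*beta
print(EX2)   # alpha*beta**2*(alpha + 1)
```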
2.10.9 Important Idea
Suppose $M^{(k)}(t)$, $k = 1, 2, \ldots$ exists for $t \in (-h, h)$ for some $h > 0$. Then $M(t)$ has a Maclaurin series given by

$$\sum_{k=0}^{\infty} \frac{M^{(k)}(0)}{k!}\,t^k$$

where $M^{(0)}(0) = M(0) = 1$.
The coefficient of $t^k$ in this power series is equal to

$$\frac{M^{(k)}(0)}{k!} = \frac{E\left(X^k\right)}{k!}$$

Therefore if we can obtain a Maclaurin series for $M(t)$, for example, by using the Binomial series or the Exponential series, then we can find $E\left(X^k\right)$ by using

$$E\left(X^k\right) = k! \times \left[\text{coefficient of } t^k \text{ in the Maclaurin series for } M(t)\right] \qquad (2.7)$$
2.10.10 Example
Suppose $X \sim \text{Gamma}(\alpha, \beta)$. Find $E\left(X^k\right)$ by using the Binomial series expansion for $M(t) = (1 - \beta t)^{-\alpha}$, $t < 1/\beta$.

Solution

$$M(t) = (1 - \beta t)^{-\alpha} = \sum_{k=0}^{\infty} \binom{-\alpha}{k}(-\beta t)^k
= \sum_{k=0}^{\infty} \binom{-\alpha}{k}(-\beta)^k\,t^k \quad \text{for } |t| < \frac{1}{\beta} \quad \text{by 2.11.3(2)}$$

The coefficient of $t^k$ in $M(t)$ is $\binom{-\alpha}{k}(-\beta)^k$ for $k = 1, 2, \ldots$ Therefore

$$E\left(X^k\right) = k! \times \left[\text{coefficient of } t^k \text{ in the Maclaurin series for } M(t)\right] = k!\,\binom{-\alpha}{k}(-\beta)^k$$

$$= k!\,\frac{(-\alpha)(-\alpha-1)\cdots(-\alpha-k+1)}{k!}\,(-\beta)^k
= \alpha(\alpha+1)\cdots(\alpha+k-1)\,\beta^k
= (\alpha+k-1)^{(k)}\beta^k \quad \text{for } k = 1, 2, \ldots$$

which is the same result as obtained in Example 2.10.8.
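The series route of (2.7) can also be carried out symbolically. This sketch is illustrative and assumes sympy is available.

```python
# Expand M(t) = (1 - beta*t)^(-alpha) around t = 0 and recover
# E(X^k) as k! times the coefficient of t^k.
import sympy as sp

t, alpha, beta = sp.symbols('t alpha beta', positive=True)
M = (1 - beta * t) ** (-alpha)
poly = sp.series(M, t, 0, 4).removeO()       # Maclaurin series up to t^3

for k in range(1, 4):
    EXk = sp.factorial(k) * poly.coeff(t, k)
    print(f"E(X^{k}) =", sp.factor(EXk))
# E(X^1) = alpha*beta
# E(X^2) = alpha*beta**2*(alpha + 1)
# E(X^3) = alpha*beta**3*(alpha + 1)*(alpha + 2)
```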
Moment generating functions are particularly useful for finding distributions of sums of independent random variables. The following theorem plays an important role in this technique.

2.10.11 Uniqueness Theorem for Moment Generating Functions
Suppose the random variable $X$ has moment generating function $M_X(t)$ and the random variable $Y$ has moment generating function $M_Y(t)$. Suppose also that $M_X(t) = M_Y(t)$ for all $t \in (-h, h)$ for some $h > 0$. Then $X$ and $Y$ have the same distribution, that is,

$$P(X \le s) = F_X(s) = F_Y(s) = P(Y \le s) \quad \text{for all } s \in \Re$$

Proof
See Problem 18 for the proof of this result in the discrete case.
2.10.12 Example
If $X \sim \text{Exponential}(1)$ then find the distribution of $Y = \mu + \theta X$ where $\theta > 0$ and $\mu \in \Re$.

Solution
From Example 2.4.10 we know that if $X \sim \text{Exponential}(1)$ then $X \sim \text{Gamma}(1, 1)$ so

$$M_X(t) = \frac{1}{1-t} \quad \text{for } t < 1$$

By Theorem 2.10.4

$$M_Y(t) = e^{\mu t} M_X(\theta t) = \frac{e^{\mu t}}{1 - \theta t} \quad \text{for } t < \frac{1}{\theta}$$

By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a Two Parameter Exponential$(\theta, \mu)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $Y$ has a Two Parameter Exponential$(\theta, \mu)$ distribution.
2.10.13 Example
If $X \sim \text{Gamma}(\alpha, \beta)$, where $\alpha$ is a positive integer, then show $2X/\beta \sim \chi^2(2\alpha)$.

Solution
From Example 2.10.2 the moment generating function of $X$ is

$$M(t) = \frac{1}{(1-\beta t)^{\alpha}} \quad \text{for } t < \frac{1}{\beta}$$

By Theorem 2.10.4 the moment generating function of $Y = 2X/\beta$ is

$$M_Y(t) = M_X\!\left(\frac{2t}{\beta}\right) = \left[\frac{1}{1 - \beta\left(\frac{2t}{\beta}\right)}\right]^{\alpha} = \left(\frac{1}{1-2t}\right)^{\alpha} \quad \text{for } |t| < \frac{1}{2}$$

By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a $\chi^2(2\alpha)$ random variable if $\alpha$ is a positive integer. Therefore by the Uniqueness Theorem for Moment Generating Functions, $Y$ has a $\chi^2(2\alpha)$ distribution if $\alpha$ is a positive integer.
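A short simulation illustrates the result; $\alpha = 4$ and $\beta = 3$ are arbitrary illustrative choices and the comparison uses the known $\chi^2(2\alpha)$ mean and variance.

```python
# Sample moments of Y = 2X/beta for X ~ Gamma(alpha, beta) match those of
# a chi-squared distribution with 2*alpha degrees of freedom.
import numpy as np

rng = np.random.default_rng(seed=6)
alpha, beta = 4, 3.0
x = rng.gamma(shape=alpha, scale=beta, size=10**6)
y = 2 * x / beta

print("mean of Y:", y.mean(), " expected:", 2 * alpha)   # chi-squared(2*alpha) mean
print("var  of Y:", y.var(),  " expected:", 4 * alpha)   # chi-squared(2*alpha) variance
```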
2.10.14 Exercise
Suppose the random variable $X$ has moment generating function

$$M(t) = e^{t^2/2} \quad \text{for } t \in \Re$$

(a) Use (2.7) and the Exponential series 2.11.7 to find $E(X)$ and $Var(X)$.
(b) Find the moment generating function of $Y = 2X - 1$. What is the distribution of $Y$?

2.11 Calculus Review

2.11.1 Geometric Series

$$\sum_{x=0}^{\infty} a t^x = a + at + at^2 + \cdots = \frac{a}{1-t} \quad \text{for } |t| < 1$$

2.11.2 Useful Results

(1) $\displaystyle\sum_{x=0}^{\infty} t^x = \frac{1}{1-t}$ for $|t| < 1$

(2) $\displaystyle\sum_{x=1}^{\infty} x\,t^{x-1} = \frac{1}{(1-t)^2}$ for $|t| < 1$
2.11.3 Binomial Series
(1) For $n \in \mathbb{Z}^{+}$ (the positive integers)

$$(a+b)^n = \sum_{x=0}^{n} \binom{n}{x} a^x b^{n-x} \quad \text{where } \binom{n}{x} = \frac{n!}{x!\,(n-x)!} = \frac{n^{(x)}}{x!}$$

(2) For $n \in \mathbb{Q}$ (the rational numbers) and $|t| < 1$

$$(1+t)^n = \sum_{x=0}^{\infty} \binom{n}{x} t^x \quad \text{where } \binom{n}{x} = \frac{n^{(x)}}{x!} = \frac{n(n-1)\cdots(n-x+1)}{x!}$$
2.11.4 Important Identities

(1) $\displaystyle x^{(k)} \binom{n}{x} = n^{(k)} \binom{n-k}{x-k}$

(2) $\displaystyle \binom{x+k-1}{x} = \binom{x+k-1}{k-1} = (-1)^x \binom{-k}{x}$
2.11.5 Multinomial Theorem
If $n$ is a positive integer and $a_1, a_2, \ldots, a_k$ are real numbers, then

$$(a_1 + a_2 + \cdots + a_k)^n = \sum\sum\cdots\sum \frac{n!}{x_1!\,x_2!\cdots x_k!}\, a_1^{x_1} a_2^{x_2} \cdots a_k^{x_k}$$

where the summation extends over all non-negative integers $x_1, x_2, \ldots, x_k$ with $x_1 + x_2 + \cdots + x_k = n$.
2.11.6 Hypergeometric Identity

$$\sum_{x=0}^{\infty} \binom{a}{x}\binom{b}{n-x} = \binom{a+b}{n}$$

2.11.7 Exponential Series

$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!} \quad \text{for } x \in \Re$$

2.11.8 Logarithmic Series

$$\ln(1+x) = \log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \quad \text{for } -1 < x \le 1$$

2.11.9 First Fundamental Theorem of Calculus (FTC1)
If $f$ is continuous on $[a, b]$ then the function $g$ defined by

$$g(x) = \int_{a}^{x} f(t)\,dt \quad \text{for } a \le x \le b$$

is continuous on $[a, b]$, differentiable on $(a, b)$ and $g'(x) = f(x)$.
2.11.10 Fundamental Theorem of Calculus and the Chain Rule
Suppose we want the derivative with respect to $x$ of $G(x)$ where

$$G(x) = \int_{a}^{h(x)} f(t)\,dt \quad \text{for } a \le x \le b$$

and $h(x)$ is a differentiable function on $[a, b]$. If we define

$$g(u) = \int_{a}^{u} f(t)\,dt$$

then $G(x) = g(h(x))$. Then by the Chain Rule

$$G'(x) = g'(h(x))\,h'(x) = f(h(x))\,h'(x) \quad \text{for } a < x < b$$
2.11.11 Improper Integrals
(a) If $\int_a^b f(x)\,dx$ exists for every number $b \ge a$ then

$$\int_{a}^{\infty} f(x)\,dx = \lim_{b \to \infty} \int_{a}^{b} f(x)\,dx$$

provided this limit exists. If the limit exists we say the improper integral converges; otherwise we say the improper integral diverges.
(b) If $\int_a^b f(x)\,dx$ exists for every number $a \le b$ then

$$\int_{-\infty}^{b} f(x)\,dx = \lim_{a \to -\infty} \int_{a}^{b} f(x)\,dx$$

provided this limit exists.
(c) If both $\int_a^{\infty} f(x)\,dx$ and $\int_{-\infty}^{a} f(x)\,dx$ are convergent then we define

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{a} f(x)\,dx + \int_{a}^{\infty} f(x)\,dx$$

where $a$ is any real number.
2.11.12 Comparison Test for Improper Integrals
Suppose that $f$ and $g$ are continuous functions with $f(x) \ge g(x) \ge 0$ for $x \ge a$.
(a) If $\int_a^{\infty} f(x)\,dx$ is convergent then $\int_a^{\infty} g(x)\,dx$ is convergent.
(b) If $\int_a^{\infty} g(x)\,dx$ is divergent then $\int_a^{\infty} f(x)\,dx$ is divergent.

2.11.13 Useful Result for Comparison Test

$$\int_{1}^{\infty} \frac{1}{x^p}\,dx \text{ converges if and only if } p > 1 \qquad (2.8)$$
2.11.14 Useful Inequalities

$$\frac{1}{1+y^p} \le \frac{1}{y^p} \quad \text{for } y \ge 1,\ p > 0$$

$$\frac{1}{1+y^p} \ge \frac{1}{y^p + y^p} = \frac{1}{2y^p} \quad \text{for } y \ge 1,\ p > 0$$
2.11.15 Taylor's Theorem
Suppose $f$ is a real-valued function such that the derivatives $f^{(1)}, f^{(2)}, \ldots, f^{(n+1)}$ all exist on an open interval containing the point $x = a$. Then

$$f(x) = f(a) + f^{(1)}(a)(x-a) + \frac{f^{(2)}(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \frac{f^{(n+1)}(c)}{(n+1)!}(x-a)^{n+1}$$

for some $c$ between $a$ and $x$.
2.12 Chapter 2 Problems
1. Consider the following functions:

(a) $f(x) = kx\theta^x$ for $x = 1, 2, \ldots$; $0 < \theta < 1$
(b) $f(x) = k\left[1 + (x/\theta)^2\right]^{-1}$ for $x \in \Re$, $\theta > 0$
(c) $f(x) = ke^{-|x-\theta|}$ for $x \in \Re$, $\theta \in \Re$
(d) $f(x) = k(1-x)\theta^x$ for $0 < x < 1$, $\theta > 0$
(e) $f(x) = kx^2 e^{-x/\theta}$ for $x > 0$, $\theta > 0$
(f) $f(x) = kx^{-(\theta+1)}$ for $x \ge 1$, $\theta > 0$
(g) $f(x) = ke^{-x/\theta}\left(1 + e^{-x/\theta}\right)^{-2}$ for $x \in \Re$, $\theta > 0$
(h) $f(x) = kx^{-3} e^{-1/(\theta x)}$ for $x > 0$, $\theta > 0$
(i) $f(x) = k(1+x)^{-\theta}$ for $x > 0$, $\theta > 1$
(j) $f(x) = k(1-x)^{\theta-1}$ for $0 < x < 1$, $\theta > 0$

In each case:
(1) Determine $k$ so that $f(x)$ is a probability (density) function and sketch $f(x)$.
(2) Let $X$ be a random variable with probability (density) function $f(x)$. Find the cumulative distribution function of $X$.
(3) Find $E(X)$ and $Var(X)$ using the probability (density) function. Indicate the values of $\theta$ for which $E(X)$ and $Var(X)$ exist.
(4) Find $P(0.5 < X \le 2)$ and $P(X > 0.5\,|\,X \le 2)$.

In (a) use $\theta = 0.3$, in (b) use $\theta = 1$, in (c) use $\theta = 0$, in (d) use $\theta = 5$, in (e) use $\theta = 1$, in (f) use $\theta = 1$, in (g) use $\theta = 2$, in (h) use $\theta = 1$, in (i) use $\theta = 2$, in (j) use $\theta = 3$.
2. Determine if $\theta$ is a location parameter, a scale parameter, or neither for the distributions in (b)-(j) of Problem 1.

3. (a) If $X \sim \text{Weibull}(2, \theta)$ then show $\theta$ is a scale parameter for this distribution.
(b) If $X \sim \text{Uniform}(0, \theta)$ then show $\theta$ is a scale parameter for this distribution.
4. Suppose $X$ is a continuous random variable with probability density function

$$f(x) = \begin{cases} ke^{-(x-\theta)^2/2} & |x-\theta| \le c \\ ke^{-c|x-\theta|+c^2/2} & |x-\theta| > c \end{cases}$$

(a) Show that

$$\frac{1}{k} = \frac{2}{c}\,e^{-c^2/2} + \sqrt{2\pi}\left[2\Phi(c) - 1\right]$$

where $\Phi$ is the N$(0, 1)$ cumulative distribution function.
(b) Find the cumulative distribution function of $X$, $E(X)$ and $Var(X)$.
(c) Show that $\theta$ is a location parameter for this distribution.
(d) On the same graph sketch $f(x)$ for $c = 1$, $\theta = 0$, $f(x)$ for $c = 2$, $\theta = 0$ and the N$(0, 1)$ probability density function. What do you notice?
5. The Geometric and Exponential distributions both have a property referred to as the memoryless property.
(a) Suppose $X \sim \text{Geometric}(p)$. Show that $P(X \ge k + j\,|\,X \ge k) = P(X \ge j)$ where $k$ and $j$ are nonnegative integers. Explain why this is called the memoryless property.
(b) Show that if $Y \sim \text{Exponential}(\theta)$ then $P(Y \ge a + b\,|\,Y \ge a) = P(Y \ge b)$ where $a > 0$ and $b > 0$.
6. Suppose that $f_1(x), f_2(x), \ldots, f_k(x)$ are probability density functions with support sets $A_1, A_2, \ldots, A_k$, means $\mu_1, \mu_2, \ldots, \mu_k$, and finite variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$ respectively. Suppose that $0 < p_1, p_2, \ldots, p_k < 1$ and $\sum_{i=1}^{k} p_i = 1$.
(a) Show that $g(x) = \sum_{i=1}^{k} p_i f_i(x)$ is a probability density function.
(b) Let $X$ be a random variable with probability density function $g(x)$. Find the support set of $X$, $E(X)$ and $Var(X)$.
7. (a) If $X \sim \text{Gamma}(\alpha, \beta)$ then find the probability density function of $Y = e^X$.
(b) If $X \sim \text{Gamma}(\alpha, \beta)$ then show $Y = 1/X \sim \text{Inverse Gamma}(\alpha, \beta)$.
(c) If $X \sim \text{Gamma}(k, \theta)$ then show $Y = 2X/\theta \sim \chi^2(2k)$ for $k = 1, 2, \ldots$.
(d) If $X \sim \text{N}(\mu, \sigma^2)$ then find the probability density function of $Y = e^X$.
(e) If $X \sim \text{N}(\mu, \sigma^2)$ then find the probability density function of $Y = X^{-1}$.
(f) If $X \sim \text{Uniform}(-\pi/2, \pi/2)$ then show that $Y = \tan X \sim \text{Cauchy}(1, 0)$.
(g) If $X \sim \text{Pareto}(\alpha, \beta)$ then show that $Y = \beta\log(X/\alpha) \sim \text{Exponential}(1)$.
(h) If $X \sim \text{Weibull}(2, \theta)$ then show that $Y = X^2 \sim \text{Exponential}(\theta^2)$.
(i) If $X \sim \text{Double Exponential}(0, 1)$ then find the probability density function of $Y = X^2$.
(j) If $X \sim \text{t}(k)$ then show that $Y = X^2 \sim \text{F}(1, k)$.
8. Suppose $T \sim \text{t}(n)$.
(a) Show that $E(T) = 0$ if $n > 1$.
(b) Show that $Var(T) = n/(n-2)$ if $n > 2$.

9. Suppose $X \sim \text{Beta}(a, b)$.
(a) Find $E\left(X^k\right)$ for $k = 1, 2, \ldots$. Use this result to find $E(X)$ and $Var(X)$.
(b) Graph the probability density function for (i) $a = 0.7$, $b = 0.7$, (ii) $a = 1$, $b = 3$, (iii) $a = 2$, $b = 2$, (iv) $a = 2$, $b = 4$, and (v) $a = 3$, $b = 1$ on the same graph.
(c) What special probability density function is obtained for $a = b = 1$?

10. If $E\left(|X|^k\right)$ exists for some integer $k > 1$, then show that $E\left(|X|^j\right)$ exists for $j = 1, 2, \ldots, k-1$.

11. If $X \sim \text{Binomial}(n, \theta)$, find the variance stabilizing transformation $g(X)$ such that $Var[g(X)]$ is approximately constant.

12. Prove that for any random variable $X$,

$$P\left(X^2 \ge \tfrac{1}{2}\right) \le 4\,E\left(X^4\right)$$
13. For each of the following probability (density) functions derive the moment generating function $M(t)$. State the values for which $M(t)$ exists and use the moment generating function to find the mean and variance.

(a) $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, \ldots, n$; $0 < p < 1$
(b) $f(x) = \theta^x e^{-\theta}/x!$ for $x = 0, 1, \ldots$; $\theta > 0$
(c) $f(x) = \frac{1}{\theta}\,e^{-(x-\mu)/\theta}$ for $x > \mu$; $\mu \in \Re$, $\theta > 0$
(d) $f(x) = \frac{1}{2}\,e^{-|x-\mu|}$ for $x \in \Re$; $\mu \in \Re$
(e) $f(x) = 2x$ for $0 < x < 1$
(f) $f(x) = \begin{cases} x & 0 \le x \le 1 \\ 2-x & 1 < x \le 2 \\ 0 & \text{otherwise} \end{cases}$
14. Suppose $X$ is a random variable with moment generating function $M(t) = E\left(e^{tX}\right)$ which exists for $t \in (-h, h)$ for some $h > 0$. Then $K(t) = \log M(t)$ is called the cumulant generating function of $X$.
(a) Show that $E(X) = K'(0)$ and $Var(X) = K''(0)$.
(b) If $X \sim \text{Negative Binomial}(k, p)$ then use (a) to find $E(X)$ and $Var(X)$.

15. For each of the following find the Maclaurin series for $M(t)$ using known series. Thus determine all the moments of $X$ if $X$ is a random variable with moment generating function $M(t)$:
(a) $M(t) = (1-t)^{-3}$ for $|t| < 1$
(b) $M(t) = (1+t)/(1-t)$ for $|t| < 1$
(c) $M(t) = e^t/(1-t^2)$ for $|t| < 1$

16. Suppose $Z \sim \text{N}(0, 1)$ and $Y = |Z|$.
(a) Show that $M_Y(t) = 2\Phi(t)\,e^{t^2/2}$ for $t \in \Re$ where $\Phi(t)$ is the cumulative distribution function of a N$(0, 1)$ random variable.
(b) Use (a) to find $E(|Z|)$ and $Var(|Z|)$.

17. Suppose $X \sim \chi^2(1)$ and $Z \sim \text{N}(0, 1)$. Use the properties of moment generating functions to compute $E\left(X^k\right)$ and $E\left(Z^k\right)$ for $k = 1, 2, \ldots$ How are these two related? Is this what you expected?

18. Suppose $X$ and $Y$ are discrete random variables such that $P(X = j) = p_j$ and $P(Y = j) = q_j$ for $j = 0, 1, \ldots$. Suppose also that $M_X(t) = M_Y(t)$ for $t \in (-h, h)$, $h > 0$. Show that $X$ and $Y$ have the same distribution. (Hint: Compare $M_X(\log s)$ and $M_Y(\log s)$ and recall that if two power series are equal then their coefficients are equal.)

19. Suppose $X$ is a random variable with moment generating function $M(t) = e^t/(1-t^2)$ for $|t| < 1$.
(a) Find the moment generating function of $Y = (X-1)/2$.
(b) Use the moment generating function of $Y$ to find $E(Y)$ and $Var(Y)$.
(c) What is the distribution of $Y$?
3. Multivariate Random Variables

Models for real phenomena usually involve more than a single random variable. When there are multiple random variables associated with an experiment or process we usually denote them as $X, Y, \ldots$ or as $X_1, X_2, \ldots$. For example, your final mark in a course might involve $X_1 =$ your assignment mark, $X_2 =$ your midterm test mark, and $X_3 =$ your exam mark. We need to extend the ideas introduced in Chapter 2 for univariate random variables to deal with multivariate random variables.

In Section 3.1 we begin by defining the joint and marginal cumulative distribution functions since these definitions hold regardless of what type of random variable we have. We define these functions in the case of two random variables $X$ and $Y$. More than two random variables will be considered in specific examples in later sections. In Section 3.2 we briefly review discrete joint probability functions and marginal probability functions that were introduced in a previous probability course. In Section 3.3 we introduce the ideas needed for two continuous random variables and look at detailed examples since this is new material. In Section 3.4 we define independence for two random variables and show how the Factorization Theorem for Independence can be used. When two random variables are not independent then we are interested in conditional distributions. In Section 3.5 we review the definition of a conditional probability function for discrete random variables and define a conditional probability density function for continuous random variables, which is new material. In Section 3.6 we review expectations of functions of discrete random variables. We also define expectations of functions of continuous random variables, which is new material except for the case of Normal random variables. In Section 3.7 we define conditional expectations, which arise from the conditional distributions discussed in Section 3.5. In Section 3.8 we discuss moment generating functions for two or more random variables, and show how the Factorization Theorem for Moment Generating Functions can be used to prove that random variables are independent. In Section 3.9 we review the Multinomial distribution and its properties. In Section 3.10 we introduce the very important Bivariate Normal distribution and its properties. Section 3.11 contains some useful results related to evaluating double integrals.
3.1 Joint and Marginal Cumulative Distribution Functions

We begin with the definitions and properties of the cumulative distribution functions associated with two random variables.

3.1.1 Definition - Joint Cumulative Distribution Function
Suppose $X$ and $Y$ are random variables defined on a sample space $S$. The joint cumulative distribution function of $X$ and $Y$ is given by

$$F(x, y) = P(X \le x, Y \le y) \quad \text{for } (x, y) \in \Re^2$$

3.1.2 Properties - Joint Cumulative Distribution Function
(1) $F$ is non-decreasing in $x$ for fixed $y$
(2) $F$ is non-decreasing in $y$ for fixed $x$
(3) $\lim_{x \to -\infty} F(x, y) = 0$ and $\lim_{y \to -\infty} F(x, y) = 0$
(4) $\lim_{(x,y) \to (-\infty,-\infty)} F(x, y) = 0$ and $\lim_{(x,y) \to (\infty,\infty)} F(x, y) = 1$

3.1.3 Definition - Marginal Distribution Function
The marginal cumulative distribution function of $X$ is given by

$$F_1(x) = \lim_{y \to \infty} F(x, y) = P(X \le x) \quad \text{for } x \in \Re$$

The marginal cumulative distribution function of $Y$ is given by

$$F_2(y) = \lim_{x \to \infty} F(x, y) = P(Y \le y) \quad \text{for } y \in \Re$$

Note: The definitions and properties of the joint cumulative distribution function and the marginal cumulative distribution functions hold for both $(X, Y)$ discrete random variables and for $(X, Y)$ continuous random variables.

Joint and marginal cumulative distribution functions for discrete random variables are not very convenient for determining probabilities. Joint and marginal probability functions, which are defined in Section 3.2, are more frequently used for discrete random variables. In Section 3.3 we look at specific examples of joint and marginal cumulative distribution functions for continuous random variables. In Chapter 5 we will see the important role of cumulative distribution functions in determining asymptotic distributions.
3.2 Bivariate Discrete Distributions

Suppose $X$ and $Y$ are random variables defined on a sample space $S$. If there is a countable subset $A \subseteq \Re^2$ such that $P[(X, Y) \in A] = 1$, then $X$ and $Y$ are discrete random variables.
Probabilities for discrete random variables are most easily handled in terms of joint probability functions.

3.2.1 Definition - Joint Probability Function
Suppose $X$ and $Y$ are discrete random variables. The joint probability function of $X$ and $Y$ is given by

$$f(x, y) = P(X = x, Y = y) \quad \text{for } (x, y) \in \Re^2$$

The set $A = \{(x, y) : f(x, y) > 0\}$ is called the support set of $(X, Y)$.

3.2.2 Properties of Joint Probability Function
(1) $f(x, y) \ge 0$ for $(x, y) \in \Re^2$
(2) $\displaystyle\sum\sum_{(x,y) \in A} f(x, y) = 1$
(3) For any set $R \subseteq \Re^2$, $\displaystyle P[(X, Y) \in R] = \sum\sum_{(x,y) \in R} f(x, y)$

3.2.3 Definition - Marginal Probability Function
Suppose $X$ and $Y$ are discrete random variables with joint probability function $f(x, y)$.
The marginal probability function of $X$ is given by

$$f_1(x) = P(X = x) = \sum_{\text{all } y} f(x, y) \quad \text{for } x \in \Re$$

and the marginal probability function of $Y$ is given by

$$f_2(y) = P(Y = y) = \sum_{\text{all } x} f(x, y) \quad \text{for } y \in \Re$$

3.2.4 Example
In a fourth year statistics course there are 10 actuarial science students, 9 statistics students
and 6 math business students. Five students are selected at random without replacement.
Let X be the number of actuarial science students selected, and let Y be the number of
statistics students selected.
Find
(a) the joint probability function of $X$ and $Y$
(b) the marginal probability function of $X$
(c) the marginal probability function of $Y$
(d) $P(X > Y)$

Solution
(a) The joint probability function of $X$ and $Y$ is

$$f(x, y) = P(X = x, Y = y) = \frac{\binom{10}{x}\binom{9}{y}\binom{6}{5-x-y}}{\binom{25}{5}} \quad \text{for } x = 0, 1, \ldots, 5,\ y = 0, 1, \ldots, 5,\ x + y \le 5$$

(b) The marginal probability function of $X$ is

$$f_1(x) = P(X = x) = \sum_{y} \frac{\binom{10}{x}\binom{9}{y}\binom{6}{5-x-y}}{\binom{25}{5}}
= \frac{\binom{10}{x}}{\binom{25}{5}} \sum_{y} \binom{9}{y}\binom{6}{5-x-y}
= \frac{\binom{10}{x}\binom{15}{5-x}}{\binom{25}{5}} \quad \text{for } x = 0, 1, \ldots, 5$$

by the Hypergeometric identity 2.11.6. Note that the marginal probability function of $X$ is Hypergeometric$(25, 10, 5)$. This makes sense because, when we are only interested in the number of actuarial science students, we only have two types of objects (actuarial science students and non-actuarial science students) and we are sampling without replacement which gives us the familiar Hypergeometric probability function.

(c) The marginal probability function of $Y$ is

$$f_2(y) = P(Y = y) = \sum_{x} \frac{\binom{10}{x}\binom{9}{y}\binom{6}{5-x-y}}{\binom{25}{5}}
= \frac{\binom{9}{y}}{\binom{25}{5}} \sum_{x} \binom{10}{x}\binom{6}{5-x-y}
= \frac{\binom{9}{y}\binom{16}{5-y}}{\binom{25}{5}} \quad \text{for } y = 0, 1, \ldots, 5$$

by the Hypergeometric identity 2.11.6. The marginal probability function of $Y$ is Hypergeometric$(25, 9, 5)$.

(d)

$$P(X > Y) = \sum\sum_{(x,y):\,x>y} f(x, y)
= f(1,0) + f(2,0) + f(3,0) + f(4,0) + f(5,0) + f(2,1) + f(3,1) + f(4,1) + f(3,2)$$
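The sum in (d) is easy to evaluate by direct enumeration. The following sketch is illustrative only and assumes scipy is available for the binomial coefficients.

```python
# Enumerate the joint probability function of Example 3.2.4, check it sums to 1,
# compare the marginal P(X = 2) with Hypergeometric(25, 10, 5), and compute P(X > Y).
from scipy.special import comb

def f(x, y):
    # joint probability function, defined for x + y <= 5
    return comb(10, x) * comb(9, y) * comb(6, 5 - x - y) / comb(25, 5)

total     = sum(f(x, y) for x in range(6) for y in range(6 - x))
p_x_gt_y  = sum(f(x, y) for x in range(6) for y in range(6 - x) if x > y)
f1_2      = sum(f(2, y) for y in range(4))                    # marginal P(X = 2)
hyp_2     = comb(10, 2) * comb(15, 3) / comb(25, 5)           # Hypergeometric(25, 10, 5)

print("total probability:", total)        # 1.0
print("P(X > Y):", p_x_gt_y)
print("P(X = 2):", f1_2, "vs", hyp_2)
```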
3.2.5 Exercise
The Hardy-Weinberg law of genetics states that, under certain conditions, the relative frequencies with which three genotypes $AA$, $Aa$ and $aa$ occur in the population will be $\theta^2$, $2\theta(1-\theta)$ and $(1-\theta)^2$ respectively, where $0 < \theta < 1$. Suppose $n$ members of a very large population are selected at random.
Let $X$ be the number of $AA$ types selected and let $Y$ be the number of $Aa$ types selected.
Find
(a) the joint probability function of $X$ and $Y$
(b) the marginal probability function of $X$
(c) the marginal probability function of $Y$
(d) $P(X + Y = t)$ for $t = 0, 1, \ldots$.
3.3 Bivariate Continuous Distributions

Probabilities for continuous random variables can also be specified in terms of joint probability density functions.

3.3.1 Definition - Joint Probability Density Function
Suppose that $F(x, y)$ is a continuous function and that

$$f(x, y) = \frac{\partial^2}{\partial x \partial y} F(x, y)$$

exists and is a continuous function except possibly along a finite number of curves. Suppose also that

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$$

Then $X$ and $Y$ are said to be continuous random variables with joint probability density function $f$. The set $A = \{(x, y) : f(x, y) > 0\}$ is called the support set of $(X, Y)$.

Note: We will arbitrarily define $f(x, y)$ to be equal to $0$ when $\frac{\partial^2}{\partial x \partial y}F(x, y)$ does not exist, although we could define it to be any real number.

3.3.2 Properties - Joint Probability Density Function
(1) $f(x, y) \ge 0$ for all $(x, y) \in \Re^2$
(2) $\displaystyle P[(X, Y) \in R] = \iint_R f(x, y)\,dx\,dy$ for $R \subseteq \Re^2$, which is the volume under the surface $z = f(x, y)$ and above the region $R$ in the $xy$-plane.

3.3.3 Example
Suppose $X$ and $Y$ are continuous random variables with joint probability density function

$$f(x, y) = x + y \quad \text{for } 0 < x < 1,\ 0 < y < 1$$

and $0$ otherwise. The support set of $(X, Y)$ is $A = \{(x, y) : 0 < x < 1,\ 0 < y < 1\}$.
The joint probability density function for $(x, y) \in A$ is graphed in Figure 3.1. We notice that the surface is the portion of the plane $z = x + y$ lying above the region $A$.

[Figure 3.1: Graph of joint probability density function for Example 3.3.3]

(a) Show that

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$$

(b) Find
(i) $P\left(X \le \frac{1}{3}, Y \le \frac{1}{2}\right)$
(ii) $P(X \le Y)$
(iii) $P\left(X + Y \le \frac{1}{2}\right)$
(iv) $P\left(XY \le \frac{1}{2}\right)$.
Solution
(a) A graph of the support set for $(X, Y)$ is given in Figure 3.2. Such a graph is useful for determining the limits of integration of the double integral.

[Figure 3.2: Graph of the support set of (X, Y) for Example 3.3.3]

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \iint_{(x,y) \in A} (x + y)\,dx\,dy = \int_{0}^{1}\int_{0}^{1} (x + y)\,dx\,dy$$

$$= \int_{0}^{1} \left[\frac{x^2}{2} + xy\right]_{0}^{1} dy = \int_{0}^{1} \left(\frac{1}{2} + y\right) dy = \left[\frac{y}{2} + \frac{y^2}{2}\right]_{0}^{1} = \frac{1}{2} + \frac{1}{2} = 1$$

(b)(i) A graph of the region of integration is given in Figure 3.3.

[Figure 3.3: Graph of the integration region for Example 3.3.3(b)(i)]

$$P\left(X \le \frac{1}{3}, Y \le \frac{1}{2}\right) = \iint_{(x,y) \in B} (x + y)\,dx\,dy = \int_{0}^{1/2}\int_{0}^{1/3} (x + y)\,dx\,dy$$

$$= \int_{0}^{1/2} \left[\frac{x^2}{2} + xy\right]_{0}^{1/3} dy = \int_{0}^{1/2} \left[\frac{1}{2}\left(\frac{1}{3}\right)^2 + \frac{1}{3}y\right] dy = \left[\frac{y}{18} + \frac{y^2}{6}\right]_{0}^{1/2}$$

$$= \frac{1}{18}\cdot\frac{1}{2} + \frac{1}{6}\left(\frac{1}{2}\right)^2 = \frac{5}{72}$$
(ii) A graph of the region of integration is given in Figure 3.4. Note that when the
region is not rectangular then care must be taken with the limits of integration.
y
y=x
1
C
(y,y)
x
0
1
Figure 3.4: Graph of the integration region for Example 3.3.3(b)(ii)
P (X
Z
Y)=
=
Z
(x + y) dxdy
(x;y) 2C
Z1
Zy
(x + y) dxdy
y=0 x=0
=
Z1
1 2
x + xy jy0 dy
2
0
=
Z1
1 2
y + y 2 dy
2
Z1
3 2
1
y dy = y 3 j10
2
2
0
=
0
=
1
2
Alternatively
P (X
Y)=
Z1 Z1
(x + y) dydx
x=0 y=x
Why does the answer of 1=2 make sense when you look at Figure 3.1?
(iii) A graph of the region of integration is given in Figure 3.5.

[Figure 3.5: Graph of the region of integration for Example 3.3.3(b)(iii)]

$$P\left(X + Y \le \frac{1}{2}\right) = \iint_{(x,y) \in D} (x + y)\,dx\,dy = \int_{x=0}^{1/2}\int_{y=0}^{\frac{1}{2}-x} (x + y)\,dy\,dx$$

$$= \int_{0}^{1/2} \left[xy + \frac{y^2}{2}\right]_{0}^{\frac{1}{2}-x} dx = \int_{0}^{1/2} \left[x\left(\frac{1}{2}-x\right) + \frac{1}{2}\left(\frac{1}{2}-x\right)^2\right] dx$$

$$= \int_{0}^{1/2} \left(\frac{1}{8} - \frac{x^2}{2}\right) dx = \left[\frac{x}{8} - \frac{x^3}{6}\right]_{0}^{1/2} = \frac{2}{48} = \frac{1}{24}$$

Alternatively

$$P\left(X + Y \le \frac{1}{2}\right) = \int_{y=0}^{1/2}\int_{x=0}^{\frac{1}{2}-y} (x + y)\,dx\,dy$$

Why does this small probability make sense when you look at Figure 3.1?
(iv) A graph of the region of integration $E$ is given in Figure 3.6. In this example the integration can be done more easily by integrating over the region $F$.

[Figure 3.6: Graph of the region of integration for Example 3.3.3(b)(iv)]

$$P\left(XY \le \frac{1}{2}\right) = \iint_{(x,y) \in E} (x + y)\,dx\,dy = 1 - \iint_{(x,y) \in F} (x + y)\,dx\,dy$$

$$= 1 - \int_{x=1/2}^{1}\int_{y=\frac{1}{2x}}^{1} (x + y)\,dy\,dx = 1 - \int_{1/2}^{1} \left[xy + \frac{y^2}{2}\right]_{\frac{1}{2x}}^{1} dx$$

$$= 1 - \int_{1/2}^{1} \left\{x + \frac{1}{2} - \left[\frac{1}{2} + \frac{1}{2}\left(\frac{1}{2x}\right)^2\right]\right\} dx = 1 - \int_{1/2}^{1} \left(x - \frac{1}{8x^2}\right) dx$$

$$= 1 - \left[\frac{x^2}{2} + \frac{1}{8x}\right]_{1/2}^{1} = 1 - \frac{1}{4} = \frac{3}{4}$$
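All four probabilities can be checked with numerical double integration. The following sketch is illustrative only and assumes scipy is available; note that `dblquad` integrates a function of $(y, x)$ with the $x$ limits given first.

```python
# Numerical verification of the probabilities in Example 3.3.3.
from scipy.integrate import dblquad

f = lambda y, x: x + y                                      # joint density on the unit square

total, _ = dblquad(f, 0, 1, 0, 1)                           # should be 1
p_i,   _ = dblquad(f, 0, 1/3, 0, 1/2)                       # P(X<=1/3, Y<=1/2) = 5/72
p_ii,  _ = dblquad(f, 0, 1, lambda x: x, 1)                 # P(X<=Y) = 1/2
p_iii, _ = dblquad(f, 0, 0.5, 0, lambda x: 0.5 - x)         # P(X+Y<=1/2) = 1/24
p_F,   _ = dblquad(f, 0.5, 1, lambda x: 1/(2*x), 1)         # integral over region F
print(total, p_i, p_ii, p_iii, 1 - p_F)                     # last value is P(XY<=1/2) = 3/4
```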
3.3.4 Definition - Marginal Probability Density Function
Suppose $X$ and $Y$ are continuous random variables with joint probability density function $f(x, y)$.
The marginal probability density function of $X$ is given by

$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{for } x \in \Re$$

and the marginal probability density function of $Y$ is given by

$$f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx \quad \text{for } y \in \Re$$

3.3.5 Example
For the joint probability density function in Example 3.3.3 determine:
(a) the marginal probability density function of $X$ and the marginal probability density function of $Y$
(b) the joint cumulative distribution function of $X$ and $Y$
(c) the marginal cumulative distribution function of $X$ and the marginal cumulative distribution function of $Y$

Solution
(a) The marginal probability density function of $X$ is

$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{0}^{1} (x + y)\,dy = \left[xy + \frac{y^2}{2}\right]_{0}^{1} = x + \frac{1}{2} \quad \text{for } 0 < x < 1$$

and $0$ otherwise.
Since both the joint probability density function $f(x, y)$ and the support set $A$ are symmetric in $x$ and $y$ then by symmetry the marginal probability density function of $Y$ is

$$f_2(y) = y + \frac{1}{2} \quad \text{for } 0 < y < 1$$

and $0$ otherwise.
(b) Since

$$P(X \le x, Y \le y) = \int_{0}^{y}\int_{0}^{x} (s + t)\,ds\,dt = \int_{0}^{y} \left[\frac{s^2}{2} + st\right]_{0}^{x} dt = \int_{0}^{y} \left(\frac{x^2}{2} + xt\right) dt$$

$$= \left[\frac{x^2 t}{2} + \frac{x t^2}{2}\right]_{0}^{y} = \frac{1}{2}\left(x^2 y + x y^2\right) \quad \text{for } 0 < x < 1,\ 0 < y < 1$$

$$P(X \le x, Y \le y) = \int_{0}^{x}\int_{0}^{1} (s + t)\,dt\,ds = \frac{1}{2}\left(x^2 + x\right) \quad \text{for } 0 < x < 1,\ y \ge 1$$

$$P(X \le x, Y \le y) = \int_{0}^{y}\int_{0}^{1} (s + t)\,ds\,dt = \frac{1}{2}\left(y^2 + y\right) \quad \text{for } x \ge 1,\ 0 < y < 1$$

the joint cumulative distribution function of $X$ and $Y$ is

$$F(x, y) = P(X \le x, Y \le y) = \begin{cases}
0 & x \le 0 \text{ or } y \le 0 \\
\frac{1}{2}\left(x^2 y + x y^2\right) & 0 < x < 1,\ 0 < y < 1 \\
\frac{1}{2}\left(x^2 + x\right) & 0 < x < 1,\ y \ge 1 \\
\frac{1}{2}\left(y^2 + y\right) & x \ge 1,\ 0 < y < 1 \\
1 & x \ge 1,\ y \ge 1
\end{cases}$$
(c) Since the support set of $(X, Y)$ is $A = \{(x, y) : 0 < x < 1,\ 0 < y < 1\}$ then

$$F_1(x) = P(X \le x) = \lim_{y \to \infty} F(x, y) = \lim_{y \to \infty} P(X \le x, Y \le y) = F(x, 1) = \frac{1}{2}\left(x^2 + x\right) \quad \text{for } 0 < x < 1$$

Alternatively

$$F_1(x) = P(X \le x) = \int_{-\infty}^{x} f_1(s)\,ds = \int_{0}^{x} \left(s + \frac{1}{2}\right) ds = \frac{1}{2}\left(x^2 + x\right) \quad \text{for } 0 < x < 1$$

In either case the marginal cumulative distribution function of $X$ is

$$F_1(x) = \begin{cases} 0 & x \le 0 \\ \frac{1}{2}\left(x^2 + x\right) & 0 < x < 1 \\ 1 & x \ge 1 \end{cases}$$

By symmetry the marginal cumulative distribution function of $Y$ is

$$F_2(y) = \begin{cases} 0 & y \le 0 \\ \frac{1}{2}\left(y^2 + y\right) & 0 < y < 1 \\ 1 & y \ge 1 \end{cases}$$
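The integrations in (a) and (b) can also be done symbolically. This is an illustrative sketch assuming sympy is available.

```python
# Recover the marginal f1(x) = x + 1/2 and the joint cdf
# F(x, y) = (x^2*y + x*y^2)/2 on the unit square for f(s, t) = s + t.
import sympy as sp

x, y, s, t = sp.symbols('x y s t', positive=True)
f = s + t                                            # joint density in dummy variables (s, t)

f1 = sp.integrate(f, (t, 0, 1)).subs(s, x)           # marginal of X
F  = sp.integrate(f, (s, 0, x), (t, 0, y))           # joint cdf for 0 < x < 1, 0 < y < 1
print(sp.simplify(f1))   # x + 1/2
print(sp.expand(F))      # x**2*y/2 + x*y**2/2
```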
3.3.6 Exercise
Suppose $X$ and $Y$ are continuous random variables with joint probability density function

$$f(x, y) = \frac{k}{(1 + x + y)^3} \quad \text{for } 0 < x < \infty,\ 0 < y < \infty$$

and $0$ otherwise.
(a) Determine $k$ and sketch $f(x, y)$.
(b) Find
(i) $P(X \le 1, Y \le 2)$
(ii) $P(X \le Y)$
(iii) $P(X + Y \le 1)$
(c) Determine the marginal probability density function of $X$ and the marginal probability density function of $Y$.
(d) Determine the joint cumulative distribution function of $X$ and $Y$.
(e) Determine the marginal cumulative distribution function of $X$ and the marginal cumulative distribution function of $Y$.

3.3.7 Exercise
Suppose $X$ and $Y$ are continuous random variables with joint probability density function

$$f(x, y) = ke^{-x-y} \quad \text{for } 0 < x < y < \infty$$

and $0$ otherwise.
(a) Determine $k$ and sketch $f(x, y)$.
(b) Find
(i) $P(X \le 1, Y \le 2)$
(ii) $P(X \le Y)$
(iii) $P(X + Y \le 1)$
(c) Determine the marginal probability density function of $X$ and the marginal probability density function of $Y$.
(d) Determine the joint cumulative distribution function of $X$ and $Y$.
(e) Determine the marginal cumulative distribution function of $X$ and the marginal cumulative distribution function of $Y$.
3.4 Independent Random Variables

Suppose we are modeling a phenomenon involving two random variables. For example suppose $X$ is your final mark in this course and $Y$ is the time you spent doing practice problems. We would be interested in whether the distribution of one random variable affects the distribution of the other. The following definition defines this idea precisely. The definition should remind you of the definition of independent events (see Definition 2.1.8).

3.4.1 Definition - Independent Random Variables
Two random variables $X$ and $Y$ are called independent random variables if and only if

$$P(X \in A \text{ and } Y \in B) = P(X \in A)\,P(Y \in B)$$

for all sets $A$ and $B$ of real numbers.

Definition 3.4.1 is not very convenient for determining the independence of two random variables. The following theorem shows how to use the marginal and joint cumulative distribution functions or the marginal and joint probability (density) functions to determine if two random variables are independent.

3.4.2 Theorem - Independent Random Variables
(1) Suppose $X$ and $Y$ are random variables with joint cumulative distribution function $F(x, y)$. Suppose also that $F_1(x)$ is the marginal cumulative distribution function of $X$ and $F_2(y)$ is the marginal cumulative distribution function of $Y$. Then $X$ and $Y$ are independent random variables if and only if

$$F(x, y) = F_1(x)\,F_2(y) \quad \text{for all } (x, y) \in \Re^2$$

(2) Suppose $X$ and $Y$ are random variables with joint probability (density) function $f(x, y)$. Suppose also that $f_1(x)$ is the marginal probability (density) function of $X$ with support set $A_1 = \{x : f_1(x) > 0\}$ and $f_2(y)$ is the marginal probability (density) function of $Y$ with support set $A_2 = \{y : f_2(y) > 0\}$. Then $X$ and $Y$ are independent random variables if and only if

$$f(x, y) = f_1(x)\,f_2(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$

where $A_1 \times A_2 = \{(x, y) : x \in A_1,\ y \in A_2\}$.
Proof
(1) For given $(x, y)$, let $A_x = \{s : s \le x\}$ and let $B_y = \{t : t \le y\}$. Then by Definition 3.4.1 $X$ and $Y$ are independent random variables if and only if

$$P(X \in A_x \text{ and } Y \in B_y) = P(X \in A_x)\,P(Y \in B_y)$$

for all $(x, y) \in \Re^2$. But

$$P(X \in A_x \text{ and } Y \in B_y) = P(X \le x, Y \le y) = F(x, y), \qquad P(X \in A_x) = P(X \le x) = F_1(x)$$

and

$$P(Y \in B_y) = F_2(y)$$

Therefore $X$ and $Y$ are independent random variables if and only if

$$F(x, y) = F_1(x)\,F_2(y) \quad \text{for all } (x, y) \in \Re^2$$

as required.

(2) (Continuous Case) From (1) we have $X$ and $Y$ are independent random variables if and only if

$$F(x, y) = F_1(x)\,F_2(y) \qquad (3.1)$$

for all $(x, y) \in \Re^2$. Now $\frac{\partial}{\partial x}F_1(x)$ exists for $x \in A_1$ and $\frac{\partial}{\partial y}F_2(y)$ exists for $y \in A_2$. Taking the partial derivative $\frac{\partial^2}{\partial x \partial y}$ of both sides of (3.1) where the partial derivative exists implies that $X$ and $Y$ are independent random variables if and only if

$$\frac{\partial^2}{\partial x \partial y}F(x, y) = \left[\frac{\partial}{\partial x}F_1(x)\right]\left[\frac{\partial}{\partial y}F_2(y)\right] \quad \text{for all } (x, y) \in A_1 \times A_2$$

or

$$f(x, y) = f_1(x)\,f_2(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$

as required.

Note: The discrete case can be proved using an argument similar to the one used for (1).
3.4.3 Example
(a) In Example 3.2.4 determine if $X$ and $Y$ are independent random variables.
(b) In Example 3.3.3 determine if $X$ and $Y$ are independent random variables.

Solution
(a) Since the total number of students is fixed, a larger number of actuarial science students would imply a smaller number of statistics students and we would guess that the random variables are not independent. To show this we only need to find one pair of values $(x, y)$ for which $P(X = x, Y = y) \ne P(X = x)\,P(Y = y)$. Since

$$P(X = 0, Y = 0) = \frac{\binom{10}{0}\binom{9}{0}\binom{6}{5}}{\binom{25}{5}} = \frac{\binom{6}{5}}{\binom{25}{5}}, \qquad
P(X = 0) = \frac{\binom{10}{0}\binom{15}{5}}{\binom{25}{5}} = \frac{\binom{15}{5}}{\binom{25}{5}}, \qquad
P(Y = 0) = \frac{\binom{9}{0}\binom{16}{5}}{\binom{25}{5}} = \frac{\binom{16}{5}}{\binom{25}{5}}$$

and

$$P(X = 0, Y = 0) = \frac{\binom{6}{5}}{\binom{25}{5}} \ne P(X = 0)\,P(Y = 0) = \frac{\binom{15}{5}}{\binom{25}{5}}\cdot\frac{\binom{16}{5}}{\binom{25}{5}}$$

therefore by Theorem 3.4.2, $X$ and $Y$ are not independent random variables.

(b) Since

$$f(x, y) = x + y \quad \text{for } 0 < x < 1,\ 0 < y < 1$$

$$f_1(x) = x + \frac{1}{2} \quad \text{for } 0 < x < 1, \qquad f_2(y) = y + \frac{1}{2} \quad \text{for } 0 < y < 1$$

it would appear that $X$ and $Y$ are not independent random variables. To show this we only need to find one pair of values $(x, y)$ for which $f(x, y) \ne f_1(x)\,f_2(y)$. Since

$$f\left(\frac{2}{3}, \frac{1}{3}\right) = 1 \ne f_1\left(\frac{2}{3}\right) f_2\left(\frac{1}{3}\right) = \frac{7}{6}\cdot\frac{5}{6}$$

therefore by Theorem 3.4.2, $X$ and $Y$ are not independent random variables.
3.4.4 Exercise
In Exercises 3.2.5, 3.3.6 and 3.3.7 determine if $X$ and $Y$ are independent random variables.

In the previous examples we determined whether the random variables were independent using the joint probability (density) function and the marginal probability (density) functions. The following very useful theorem does not require us to determine the marginal probability (density) functions.

3.4.5 Factorization Theorem for Independence
Suppose $X$ and $Y$ are random variables with joint probability (density) function $f(x, y)$. Suppose also that $A$ is the support set of $(X, Y)$, $A_1$ is the support set of $X$, and $A_2$ is the support set of $Y$. Then $X$ and $Y$ are independent random variables if and only if there exist non-negative functions $g(x)$ and $h(y)$ such that

$$f(x, y) = g(x)\,h(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$

Notes:
(1) If the Factorization Theorem for Independence holds then the marginal probability (density) function of $X$ will be proportional to $g$ and the marginal probability (density) function of $Y$ will be proportional to $h$.
(2) Whenever the support set $A$ is not rectangular the random variables will not be independent. The reason for this is that when the support set is not rectangular it will always be possible to find a point $(x, y)$ such that $x \in A_1$ with $f_1(x) > 0$, and $y \in A_2$ with $f_2(y) > 0$ so that $f_1(x)\,f_2(y) > 0$, but $(x, y) \notin A$ so $f(x, y) = 0$. This means there is a point $(x, y)$ such that $f(x, y) \ne f_1(x)\,f_2(y)$ and therefore $X$ and $Y$ are not independent random variables.
(3) The above definitions and theorems can easily be extended to the random vector $(X_1, X_2, \ldots, X_n)$.
Proof (Continuous Case)
If $X$ and $Y$ are independent random variables then by Theorem 3.4.2

$$f(x, y) = f_1(x)\,f_2(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$

Letting $g(x) = f_1(x)$ and $h(y) = f_2(y)$ proves there exist $g(x)$ and $h(y)$ such that

$$f(x, y) = g(x)\,h(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$

Conversely, if there exist non-negative functions $g(x)$ and $h(y)$ such that

$$f(x, y) = g(x)\,h(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$

then

$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{y \in A_2} g(x)\,h(y)\,dy = c\,g(x) \quad \text{for } x \in A_1$$

and

$$f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_{x \in A_1} g(x)\,h(y)\,dx = k\,h(y) \quad \text{for } y \in A_2$$

Now

$$1 = \int_{y \in A_2}\int_{x \in A_1} f_1(x)\,f_2(y)\,dx\,dy = \int_{y \in A_2}\int_{x \in A_1} c\,g(x)\,k\,h(y)\,dx\,dy
= ck \int_{y \in A_2}\int_{x \in A_1} f(x, y)\,dx\,dy = ck$$

Since $ck = 1$,

$$f(x, y) = g(x)\,h(y) = ck\,g(x)\,h(y) = \left[c\,g(x)\right]\left[k\,h(y)\right] = f_1(x)\,f_2(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$

and by Theorem 3.4.2 $X$ and $Y$ are independent random variables.
3.4.6 Example
Suppose $X$ and $Y$ are discrete random variables with joint probability function

$$f(x, y) = \frac{\theta^{x+y}\,e^{-2\theta}}{x!\,y!} \quad \text{for } x = 0, 1, \ldots;\ y = 0, 1, \ldots$$

(a) Determine if $X$ and $Y$ are independent random variables.
(b) Determine the marginal probability function of $X$ and the marginal probability function of $Y$.

Solution
(a) The support set of $(X, Y)$ is $A = \{(x, y) : x = 0, 1, \ldots;\ y = 0, 1, \ldots\}$ which is rectangular. The support set of $X$ is $A_1 = \{x : x = 0, 1, \ldots\}$, and the support set of $Y$ is $A_2 = \{y : y = 0, 1, \ldots\}$.
Let

$$g(x) = \frac{\theta^x e^{-\theta}}{x!} \quad \text{and} \quad h(y) = \frac{\theta^y e^{-\theta}}{y!}$$

Then $f(x, y) = g(x)\,h(y)$ for all $(x, y) \in A_1 \times A_2$. Therefore by the Factorization Theorem for Independence $X$ and $Y$ are independent random variables.

(b) By inspection we can see that $g(x)$ is the probability function for a Poisson$(\theta)$ random variable. Therefore the marginal probability function of $X$ is

$$f_1(x) = \frac{\theta^x e^{-\theta}}{x!} \quad \text{for } x = 0, 1, \ldots$$

and $X \sim \text{Poisson}(\theta)$. Similarly the marginal probability function of $Y$ is

$$f_2(y) = \frac{\theta^y e^{-\theta}}{y!} \quad \text{for } y = 0, 1, \ldots$$

and $Y \sim \text{Poisson}(\theta)$.
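The factorization can be confirmed numerically on a grid of support points. The sketch below is illustrative only; $\theta = 1.7$ and the grid size are arbitrary choices.

```python
# Check that the joint probability function of Example 3.4.6 factors as
# f(x, y) = f1(x) * f2(y), the signature of independence.
import math

theta = 1.7
joint = lambda x, y: theta**(x + y) * math.exp(-2 * theta) / (math.factorial(x) * math.factorial(y))
pois  = lambda x: theta**x * math.exp(-theta) / math.factorial(x)

max_err = max(abs(joint(x, y) - pois(x) * pois(y)) for x in range(20) for y in range(20))
print("largest |f(x,y) - f1(x)*f2(y)| on the grid:", max_err)   # ~ 0 up to rounding
```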
3.4.7 Example
Suppose $X$ and $Y$ are continuous random variables with joint probability density function

$$f(x, y) = \frac{3}{2}\,y\left(1 - x^2\right) \quad \text{for } -1 < x < 1,\ 0 < y < 1$$

and $0$ otherwise.
(a) Determine if $X$ and $Y$ are independent random variables.
(b) Determine the marginal probability density function of $X$ and the marginal probability density function of $Y$.

Solution
(a) The support set of $(X, Y)$ is $A = \{(x, y) : -1 < x < 1,\ 0 < y < 1\}$ which is rectangular. The support set of $X$ is $A_1 = \{x : -1 < x < 1\}$, and the support set of $Y$ is $A_2 = \{y : 0 < y < 1\}$.
Let

$$g(x) = 1 - x^2 \quad \text{and} \quad h(y) = \frac{3}{2}\,y$$

Then $f(x, y) = g(x)\,h(y)$ for all $(x, y) \in A_1 \times A_2$. Therefore by the Factorization Theorem for Independence $X$ and $Y$ are independent random variables.

(b) Since the marginal probability density function of $Y$ is proportional to $h(y)$ we know $f_2(y) = k\,h(y)$ for $0 < y < 1$ where $k$ is determined by

$$1 = k\int_{0}^{1} \frac{3}{2}\,y\,dy = \frac{3k}{4}\,y^2\Big|_{0}^{1} = \frac{3k}{4}$$

Therefore $k = \frac{4}{3}$ and

$$f_2(y) = 2y \quad \text{for } 0 < y < 1$$

and $0$ otherwise.
Since $X$ and $Y$ are independent random variables $f(x, y) = f_1(x)\,f_2(y)$ or $f_1(x) = f(x, y)/f_2(y)$ for $x \in A_1$. Therefore the marginal probability density function of $X$ is

$$f_1(x) = \frac{f(x, y)}{f_2(y)} = \frac{\frac{3}{2}\,y\left(1 - x^2\right)}{2y} = \frac{3}{4}\left(1 - x^2\right) \quad \text{for } -1 < x < 1$$

and $0$ otherwise.
3.4.8 Example
Suppose $X$ and $Y$ are continuous random variables with joint probability density function

$$f(x, y) = \frac{2}{\pi} \quad \text{for } 0 < x < \sqrt{1 - y^2},\ -1 < y < 1$$

and $0$ otherwise.
(a) Determine if $X$ and $Y$ are independent random variables.
(b) Determine the marginal probability density function of $X$ and the marginal probability density function of $Y$.

Solution
(a) The support set of $(X, Y)$, which is

$$A = \left\{(x, y) : 0 < x < \sqrt{1 - y^2},\ -1 < y < 1\right\}$$

is graphed in Figure 3.7.

[Figure 3.7: Graph of the support set of (X, Y) for Example 3.4.8]

The set $A$ can also be described as

$$A = \left\{(x, y) : 0 < x < 1,\ -\sqrt{1 - x^2} < y < \sqrt{1 - x^2}\right\}$$

The support set $A$ is not rectangular. Note that the surface has constant height on the region $A$.
The support set for $X$ is $A_1 = \{x : 0 < x < 1\}$ and the support set for $Y$ is $A_2 = \{y : -1 < y < 1\}$.
If we choose the point $(0.9, 0.9) \in A_1 \times A_2$, then $f(0.9, 0.9) = 0$ but $f_1(0.9) > 0$ and $f_2(0.9) > 0$, so $f(0.9, 0.9) \ne f_1(0.9)\,f_2(0.9)$ and therefore $X$ and $Y$ are not independent random variables.

(b) When the support set is not rectangular care must be taken to determine the marginal probability density functions.
To find the marginal probability density function of $X$ we use the description of the support set in which the range of $X$ does not depend on $y$, which is

$$A = \left\{(x, y) : 0 < x < 1,\ -\sqrt{1 - x^2} < y < \sqrt{1 - x^2}\right\}$$

The marginal probability density function of $X$ is

$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} \frac{2}{\pi}\,dy = \frac{4}{\pi}\sqrt{1 - x^2} \quad \text{for } 0 < x < 1$$

and $0$ otherwise.
To find the marginal probability density function of $Y$ we use the description of the support set in which the range of $Y$ does not depend on $x$, which is

$$A = \left\{(x, y) : 0 < x < \sqrt{1 - y^2},\ -1 < y < 1\right\}$$

The marginal probability density function of $Y$ is

$$f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_{0}^{\sqrt{1-y^2}} \frac{2}{\pi}\,dx = \frac{2}{\pi}\sqrt{1 - y^2} \quad \text{for } -1 < y < 1$$

and $0$ otherwise.
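The non-rectangular support can be explored by simulation. This sketch is illustrative only; the rejection sampler and the threshold $0.9$ (the same point used in the text) are assumptions of the example.

```python
# On the half-disc support, P(X > 0.9, Y > 0.9) = 0 even though
# P(X > 0.9) and P(Y > 0.9) are both positive, so X and Y cannot be independent.
import numpy as np

rng = np.random.default_rng(seed=11)
# rejection sampling: uniform points on {(x, y): 0 < x < sqrt(1 - y^2), -1 < y < 1}
x = rng.uniform(0, 1, size=10**6)
y = rng.uniform(-1, 1, size=10**6)
keep = x**2 + y**2 < 1
x, y = x[keep], y[keep]

print("P(X>0.9)*P(Y>0.9) ~", (x > 0.9).mean() * (y > 0.9).mean())   # positive
print("P(X>0.9, Y>0.9)   ~", ((x > 0.9) & (y > 0.9)).mean())        # exactly 0
```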
3.5 Conditional Distributions

In Section 2.1 we defined the conditional probability of event $A$ given event $B$ as

$$P(A|B) = \frac{P(A \cap B)}{P(B)} \quad \text{provided } P(B) > 0$$

The concept of conditional probability can also be extended to random variables.

3.5.1 Definition - Conditional Probability (Density) Function
Suppose $X$ and $Y$ are random variables with joint probability (density) function $f(x, y)$, and marginal probability (density) functions $f_1(x)$ and $f_2(y)$ respectively. Suppose also that the support set of $(X, Y)$ is $A = \{(x, y) : f(x, y) > 0\}$.
The conditional probability (density) function of $X$ given $Y = y$ is given by

$$f_1(x|y) = \frac{f(x, y)}{f_2(y)} \qquad (3.2)$$

for $(x, y) \in A$ provided $f_2(y) \ne 0$.
The conditional probability (density) function of $Y$ given $X = x$ is given by

$$f_2(y|x) = \frac{f(x, y)}{f_1(x)} \qquad (3.3)$$

for $(x, y) \in A$ provided $f_1(x) \ne 0$.
Notes:
(1) If $X$ and $Y$ are discrete random variables then

$$f_1(x|y) = P(X = x|Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{f(x, y)}{f_2(y)}$$

and

$$\sum_{x} f_1(x|y) = \sum_{x} \frac{f(x, y)}{f_2(y)} = \frac{1}{f_2(y)}\sum_{x} f(x, y) = \frac{f_2(y)}{f_2(y)} = 1$$

Similarly for $f_2(y|x)$.

(2) If $X$ and $Y$ are continuous random variables

$$\int_{-\infty}^{\infty} f_1(x|y)\,dx = \int_{-\infty}^{\infty} \frac{f(x, y)}{f_2(y)}\,dx = \frac{1}{f_2(y)}\int_{-\infty}^{\infty} f(x, y)\,dx = \frac{f_2(y)}{f_2(y)} = 1$$

Similarly for $f_2(y|x)$.

(3) If $X$ is a continuous random variable then $f_1(x) \ne P(X = x)$ and $P(X = x) = 0$ for all $x$. Therefore to justify the definition of the conditional probability density function of $Y$ given $X = x$ when $X$ and $Y$ are continuous random variables we consider $P(Y \le y\,|\,X = x)$ as a limit

$$P(Y \le y\,|\,X = x) = \lim_{h \to 0} P(Y \le y\,|\,x \le X \le x + h)
= \lim_{h \to 0} \frac{\int_{x}^{x+h}\int_{-\infty}^{y} f(u, v)\,dv\,du}{\int_{x}^{x+h} f_1(u)\,du}$$

$$= \lim_{h \to 0} \frac{\frac{d}{dh}\int_{x}^{x+h}\int_{-\infty}^{y} f(u, v)\,dv\,du}{\frac{d}{dh}\int_{x}^{x+h} f_1(u)\,du} \quad \text{by L'H\^{o}pital's Rule}$$

$$= \lim_{h \to 0} \frac{\int_{-\infty}^{y} f(x + h, v)\,dv}{f_1(x + h)} \quad \text{by the Fundamental Theorem of Calculus}$$

$$= \frac{\lim_{h \to 0}\int_{-\infty}^{y} f(x + h, v)\,dv}{\lim_{h \to 0} f_1(x + h)} = \frac{\int_{-\infty}^{y} f(x, v)\,dv}{f_1(x)}$$

assuming that the limits exist and that integration and the limit operation can be interchanged. If we differentiate the last term with respect to $y$ using the Fundamental Theorem of Calculus we have

$$\frac{d}{dy}P(Y \le y\,|\,X = x) = \frac{f(x, y)}{f_1(x)}$$

which gives us a justification for using

$$f_2(y|x) = \frac{f(x, y)}{f_1(x)}$$

as the conditional probability density function of $Y$ given $X = x$.

(4) For a given value of $x$, call it $x^*$, we can think of obtaining the conditional probability density function of $Y$ given $X = x^*$ geometrically in the following way. Think of the curve of intersection which is obtained by cutting through the surface $z = f(x, y)$ with the plane $x = x^*$, which is parallel to the $yz$-plane. The curve of intersection is $z = f(x^*, y)$, which is a curve lying in the plane $x = x^*$. The area under the curve $z = f(x^*, y)$ and lying above the $xy$-plane is not necessarily equal to $1$ and therefore $z = f(x^*, y)$ is not a proper probability density function. However if we consider the curve $z = f(x^*, y)/f_1(x^*)$, which is just a rescaled version of the curve $z = f(x^*, y)$, then the area lying under the curve $z = f(x^*, y)/f_1(x^*)$ and lying above the $xy$-plane is equal to $1$. This is the probability density function we want.
3.5.2 Example
In Example 3.4.8 determine the conditional probability density function of $X$ given $Y = y$ and the conditional probability density function of $Y$ given $X = x$.

Solution
The conditional probability density function of $X$ given $Y = y$ is

$$f_1(x|y) = \frac{f(x, y)}{f_2(y)} = \frac{2/\pi}{\frac{2}{\pi}\sqrt{1 - y^2}} = \frac{1}{\sqrt{1 - y^2}} \quad \text{for } 0 < x < \sqrt{1 - y^2},\ -1 < y < 1$$

Note that for each $y \in (-1, 1)$, the conditional probability density function of $X$ given $Y = y$ is Uniform$\left(0, \sqrt{1 - y^2}\right)$. This makes sense because the joint probability density function is constant on its support set.
The conditional probability density function of $Y$ given $X = x$ is

$$f_2(y|x) = \frac{f(x, y)}{f_1(x)} = \frac{2/\pi}{\frac{4}{\pi}\sqrt{1 - x^2}} = \frac{1}{2\sqrt{1 - x^2}} \quad \text{for } -\sqrt{1 - x^2} < y < \sqrt{1 - x^2},\ 0 < x < 1$$

Note that for each $x \in (0, 1)$, the conditional probability density function of $Y$ given $X = x$ is Uniform$\left(-\sqrt{1 - x^2}, \sqrt{1 - x^2}\right)$. This again makes sense because the joint probability density function is constant on its support set.
3.5.3 Exercise
In Exercise 3.2.5 show that the conditional probability function of $Y$ given $X = x$ is

$$\text{Binomial}\!\left(n - x,\ \frac{2\theta(1-\theta)}{1-\theta^2}\right)$$

Why does this make sense?

3.5.4 Exercise
In Example 3.3.3 and Exercises 3.3.6 and 3.3.7 determine the conditional probability density function of $X$ given $Y = y$ and the conditional probability density function of $Y$ given $X = x$. Be sure to check that

$$\int_{-\infty}^{\infty} f_1(x|y)\,dx = 1 \quad \text{and} \quad \int_{-\infty}^{\infty} f_2(y|x)\,dy = 1$$

When choosing a model for bivariate data it is sometimes easier to specify a conditional probability (density) function and a marginal probability (density) function. The joint probability (density) function can then be determined using the Product Rule which is obtained by rewriting (3.2) and (3.3).

3.5.5 Product Rule
Suppose $X$ and $Y$ are random variables with joint probability (density) function $f(x, y)$, marginal probability (density) functions $f_1(x)$ and $f_2(y)$ respectively, and conditional probability (density) functions $f_1(x|y)$ and $f_2(y|x)$. Then

$$f(x, y) = f_1(x|y)\,f_2(y) = f_2(y|x)\,f_1(x)$$
3.5.6 Example
In modeling survival in a certain insect population it is assumed that the number of eggs
laid by a single female follows a Poisson($\theta$) distribution. It is also assumed that each egg
has probability p of surviving independently of any other egg. Determine the probability
function of the number of eggs that survive.
Solution
Let $Y =$ number of eggs laid and let $X =$ number of eggs that survive. Then $Y \sim \text{Poisson}(\theta)$ and $X|Y = y \sim \text{Binomial}(y, p)$. We want to determine the marginal probability function of $X$.
By the Product Rule the joint probability function of $X$ and $Y$ is

$$f(x, y) = f_1(x|y)\,f_2(y) = \binom{y}{x} p^x (1-p)^{y-x}\,\frac{\theta^y e^{-\theta}}{y!}
= \frac{p^x e^{-\theta}}{x!}\cdot\frac{\theta^y (1-p)^{y-x}}{(y-x)!}$$

with support set

$$A = \{(x, y) : x = 0, 1, \ldots, y;\ y = 0, 1, \ldots\}$$

which can also be written as

$$A = \{(x, y) : y = x, x + 1, \ldots;\ x = 0, 1, \ldots\} \qquad (3.4)$$

The marginal probability function of $X$ can be obtained using

$$f_1(x) = \sum_{\text{all } y} f(x, y)$$

Since we are summing over $y$ we need to use the second description of the support set given in (3.4). So

$$f_1(x) = \sum_{y=x}^{\infty} \frac{p^x e^{-\theta}}{x!}\cdot\frac{\theta^y (1-p)^{y-x}}{(y-x)!}
= \frac{p^x \theta^x e^{-\theta}}{x!} \sum_{y=x}^{\infty} \frac{\left[\theta(1-p)\right]^{y-x}}{(y-x)!} \quad \text{let } u = y - x$$

$$= \frac{(p\theta)^x e^{-\theta}}{x!} \sum_{u=0}^{\infty} \frac{\left[\theta(1-p)\right]^{u}}{u!}
= \frac{(p\theta)^x e^{-\theta}}{x!}\,e^{\theta(1-p)} \quad \text{by the Exponential series 2.11.7}$$

$$= \frac{(p\theta)^x e^{-p\theta}}{x!} \quad \text{for } x = 0, 1, \ldots$$

which we recognize as a Poisson($p\theta$) probability function.
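The thinning result is easy to confirm by simulation. The sketch below is illustrative; $\theta = 6$, $p = 0.3$ and the sample size are arbitrary choices.

```python
# Generate Y ~ Poisson(theta) eggs, thin each with survival probability p,
# and check that the survivor counts behave like Poisson(p*theta):
# for a Poisson distribution the mean and the variance are equal.
import numpy as np

rng = np.random.default_rng(seed=12)
theta, p, n = 6.0, 0.3, 10**6

y = rng.poisson(lam=theta, size=n)      # eggs laid
x = rng.binomial(n=y, p=p)              # eggs surviving, X | Y = y ~ Binomial(y, p)

print("mean of X:", x.mean(), " variance of X:", x.var(), " p*theta =", p * theta)
```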
3.5.7 Example
Determine the marginal probability density function of $X$ if $Y \sim \text{Gamma}\left(\alpha, \frac{1}{\beta}\right)$ and the conditional distribution of $X$ given $Y = y$ is Weibull$\left(p, y^{-1/p}\right)$.

Solution
Since $Y \sim \text{Gamma}\left(\alpha, \frac{1}{\beta}\right)$

$$f_2(y) = \frac{\beta^{\alpha} y^{\alpha-1} e^{-\beta y}}{\Gamma(\alpha)} \quad \text{for } y > 0$$

and $0$ otherwise.
Since the conditional distribution of $X$ given $Y = y$ is Weibull$\left(p, y^{-1/p}\right)$

$$f_1(x|y) = p\,y\,x^{p-1} e^{-y x^p} \quad \text{for } x > 0$$

By the Product Rule the joint probability density function of $X$ and $Y$ is

$$f(x, y) = f_1(x|y)\,f_2(y) = p\,y\,x^{p-1} e^{-y x^p}\cdot\frac{\beta^{\alpha} y^{\alpha-1} e^{-\beta y}}{\Gamma(\alpha)}
= \frac{p\,\beta^{\alpha} x^{p-1}}{\Gamma(\alpha)}\,y^{\alpha} e^{-y\left(\beta + x^p\right)}$$

The support set is

$$A = \{(x, y) : x > 0,\ y > 0\}$$

which is a rectangular region.
The marginal probability density function of $X$ is

$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \frac{p\,\beta^{\alpha} x^{p-1}}{\Gamma(\alpha)} \int_{0}^{\infty} y^{\alpha} e^{-y\left(\beta + x^p\right)}\,dy \quad \text{let } u = y\left(\beta + x^p\right)$$

$$= \frac{p\,\beta^{\alpha} x^{p-1}}{\Gamma(\alpha)} \left(\frac{1}{\beta + x^p}\right)^{\alpha+1} \int_{0}^{\infty} u^{\alpha} e^{-u}\,du
= \frac{p\,\beta^{\alpha} x^{p-1}}{\Gamma(\alpha)} \left(\frac{1}{\beta + x^p}\right)^{\alpha+1} \Gamma(\alpha + 1) \quad \text{by 2.4.8}$$

$$= \frac{\alpha\,p\,\beta^{\alpha} x^{p-1}}{\left(\beta + x^p\right)^{\alpha+1}} \quad \text{since } \Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha)$$

for $x > 0$ and $0$ otherwise. This distribution is a member of the Burr family of distributions which is frequently used by actuaries for modeling household income, crop prices, insurance risk, and many other financial variables.
The following theorem gives us one more method for determining whether two random variables are independent.

3.5.8 Theorem
Suppose $X$ and $Y$ are random variables with marginal probability (density) functions $f_1(x)$ and $f_2(y)$ respectively and conditional probability (density) functions $f_1(x|y)$ and $f_2(y|x)$. Suppose also that $A_1$ is the support set of $X$, and $A_2$ is the support set of $Y$. Then $X$ and $Y$ are independent random variables if and only if either of the following holds:

$$f_1(x|y) = f_1(x) \text{ for all } x \in A_1 \quad \text{or} \quad f_2(y|x) = f_2(y) \text{ for all } y \in A_2$$
3.5.9 Example
Suppose the conditional distribution of $X$ given $Y = y$ is

$$f_1(x|y) = \frac{e^{-x}}{1 - e^{-y}} \quad \text{for } 0 < x < y$$

and $0$ otherwise. Are $X$ and $Y$ independent random variables?

Solution
Since the conditional distribution of $X$ given $Y = y$ depends on $y$ then $f_1(x|y) = f_1(x)$ cannot hold for all $x$ in the support set of $X$ and therefore $X$ and $Y$ are not independent random variables.
3.6 Joint Expectations

As with univariate random variables we define the expectation operator for bivariate random variables. The discrete case is a review of material you would have seen in a previous probability course.

3.6.1 Definition - Joint Expectation
Suppose $h(x, y)$ is a real-valued function.
If $X$ and $Y$ are discrete random variables with joint probability function $f(x, y)$ and support set $A$ then

$$E[h(X, Y)] = \sum\sum_{(x,y) \in A} h(x, y)\,f(x, y)$$

provided the joint sum converges absolutely.
If $X$ and $Y$ are continuous random variables with joint probability density function $f(x, y)$ then

$$E[h(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x, y)\,f(x, y)\,dx\,dy$$

provided the joint integral converges absolutely.
3.6.2 Theorem
Suppose $X$ and $Y$ are random variables with joint probability (density) function $f(x, y)$, $a$ and $b$ are real constants, and $g(x, y)$ and $h(x, y)$ are real-valued functions. Then

$$E[a\,g(X, Y) + b\,h(X, Y)] = a\,E[g(X, Y)] + b\,E[h(X, Y)]$$

Proof (Continuous Case)

$$E[a\,g(X, Y) + b\,h(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[a\,g(x, y) + b\,h(x, y)\right] f(x, y)\,dx\,dy$$

$$= a\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy + b\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x, y)\,f(x, y)\,dx\,dy \quad \text{by properties of double integrals}$$

$$= a\,E[g(X, Y)] + b\,E[h(X, Y)] \quad \text{by Definition 3.6.1}$$
3.6.3 Corollary
(1) $E(aX + bY) = a\,E(X) + b\,E(Y) = a\mu_X + b\mu_Y$ where $\mu_X = E(X)$ and $\mu_Y = E(Y)$.
(2) If $X_1, X_2, \ldots, X_n$ are random variables and $a_1, a_2, \ldots, a_n$ are real constants then

$$E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i\,E(X_i) = \sum_{i=1}^{n} a_i \mu_i$$

where $\mu_i = E(X_i)$.
(3) If $X_1, X_2, \ldots, X_n$ are random variables with $E(X_i) = \mu$, $i = 1, 2, \ldots, n$ then

$$E\left(\bar{X}\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{n\mu}{n} = \mu$$
Proof of (1) (Continuous Case)

$$E(aX + bY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (ax + by)\,f(x, y)\,dx\,dy$$

$$= a\int_{-\infty}^{\infty} x\left[\int_{-\infty}^{\infty} f(x, y)\,dy\right] dx + b\int_{-\infty}^{\infty} y\left[\int_{-\infty}^{\infty} f(x, y)\,dx\right] dy$$

$$= a\int_{-\infty}^{\infty} x\,f_1(x)\,dx + b\int_{-\infty}^{\infty} y\,f_2(y)\,dy = a\,E(X) + b\,E(Y) = a\mu_X + b\mu_Y$$
3.6.4 Theorem - Expectation and Independence
(1) If $X$ and $Y$ are independent random variables and $g(x)$ and $h(y)$ are real-valued functions then

$$E[g(X)\,h(Y)] = E[g(X)]\,E[h(Y)]$$

(2) More generally if $X_1, X_2, \ldots, X_n$ are independent random variables and $h_1, h_2, \ldots, h_n$ are real-valued functions then

$$E\left[\prod_{i=1}^{n} h_i(X_i)\right] = \prod_{i=1}^{n} E[h_i(X_i)]$$

Proof of (1) (Continuous Case)
Since $X$ and $Y$ are independent random variables then by Theorem 3.4.2

$$f(x, y) = f_1(x)\,f_2(y) \quad \text{for all } (x, y) \in A$$

where $A$ is the support set of $(X, Y)$. Therefore

$$E[g(X)\,h(Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)\,h(y)\,f_1(x)\,f_2(y)\,dx\,dy
= \int_{-\infty}^{\infty} h(y)\,f_2(y)\left[\int_{-\infty}^{\infty} g(x)\,f_1(x)\,dx\right] dy$$

$$= E[g(X)]\int_{-\infty}^{\infty} h(y)\,f_2(y)\,dy = E[g(X)]\,E[h(Y)]$$
3.6.5 Definition - Covariance
The covariance of random variables $X$ and $Y$ is defined by

$$Cov(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right]$$

If $Cov(X, Y) = 0$ then $X$ and $Y$ are called uncorrelated random variables.

3.6.6 Theorem - Covariance and Independence
If $X$ and $Y$ are random variables then

$$Cov(X, Y) = E(XY) - \mu_X \mu_Y$$

If $X$ and $Y$ are independent random variables then $Cov(X, Y) = 0$.

Proof

$$Cov(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E\left(XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y\right)$$

$$= E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X \mu_Y = E(XY) - E(X)E(Y) - E(Y)E(X) + E(X)E(Y)$$

$$= E(XY) - E(X)E(Y)$$

Now if $X$ and $Y$ are independent random variables then by Theorem 3.6.4 $E(XY) = E(X)E(Y)$ and therefore $Cov(X, Y) = 0$.
3.6.7 Theorem - Variance of a Linear Combination
(1) Suppose $X$ and $Y$ are random variables and $a$ and $b$ are real constants. Then

$$Var(aX + bY) = a^2\,Var(X) + b^2\,Var(Y) + 2ab\,Cov(X, Y) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,Cov(X, Y)$$

(2) Suppose $X_1, X_2, \ldots, X_n$ are random variables with $Var(X_i) = \sigma_i^2$ and $a_1, a_2, \ldots, a_n$ are real constants. Then

$$Var\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2 \sigma_i^2 + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} a_i a_j\,Cov(X_i, X_j)$$

(3) If $X_1, X_2, \ldots, X_n$ are independent random variables and $a_1, a_2, \ldots, a_n$ are real constants then

$$Var\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2 \sigma_i^2$$
(4) If $X_1, X_2, \ldots, X_n$ are independent random variables with $Var(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$, then

$$Var\left(\bar{X}\right) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Proof of (1)

$$Var(aX + bY) = E\left[(aX + bY - a\mu_X - b\mu_Y)^2\right] = E\left\{\left[a(X - \mu_X) + b(Y - \mu_Y)\right]^2\right\}$$

$$= E\left[a^2(X - \mu_X)^2 + b^2(Y - \mu_Y)^2 + 2ab(X - \mu_X)(Y - \mu_Y)\right]$$

$$= a^2 E\left[(X - \mu_X)^2\right] + b^2 E\left[(Y - \mu_Y)^2\right] + 2ab\,E\left[(X - \mu_X)(Y - \mu_Y)\right]$$

$$= a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,Cov(X, Y)$$
3.6.8 Definition - Correlation Coefficient
The correlation coefficient of random variables $X$ and $Y$ is defined by

$$\rho(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$$

3.6.9 Example
For the joint probability density function in Example 3.3.3 find $\rho(X, Y)$.
Solution

$$E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f(x, y)\,dx\,dy = \int_{0}^{1}\int_{0}^{1} xy(x + y)\,dx\,dy = \int_{0}^{1}\int_{0}^{1} \left(x^2 y + x y^2\right) dx\,dy$$

$$= \int_{0}^{1} \left[\frac{x^3 y}{3} + \frac{x^2 y^2}{2}\right]_{0}^{1} dy = \int_{0}^{1} \left(\frac{y}{3} + \frac{y^2}{2}\right) dy = \left[\frac{y^2}{6} + \frac{y^3}{6}\right]_{0}^{1} = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$$

$$E(X) = \int_{-\infty}^{\infty} x\,f_1(x)\,dx = \int_{0}^{1} x\left(x + \frac{1}{2}\right) dx = \int_{0}^{1} \left(x^2 + \frac{x}{2}\right) dx = \left[\frac{x^3}{3} + \frac{x^2}{4}\right]_{0}^{1} = \frac{1}{3} + \frac{1}{4} = \frac{7}{12}$$

$$E\left(X^2\right) = \int_{-\infty}^{\infty} x^2 f_1(x)\,dx = \int_{0}^{1} x^2\left(x + \frac{1}{2}\right) dx = \int_{0}^{1} \left(x^3 + \frac{x^2}{2}\right) dx = \left[\frac{x^4}{4} + \frac{x^3}{6}\right]_{0}^{1} = \frac{1}{4} + \frac{1}{6} = \frac{3+2}{12} = \frac{5}{12}$$

$$Var(X) = E\left(X^2\right) - [E(X)]^2 = \frac{5}{12} - \left(\frac{7}{12}\right)^2 = \frac{60 - 49}{144} = \frac{11}{144}$$

By symmetry

$$E(Y) = \frac{7}{12} \quad \text{and} \quad Var(Y) = \frac{11}{144}$$

Therefore

$$Cov(X, Y) = E(XY) - E(X)E(Y) = \frac{1}{3} - \frac{7}{12}\cdot\frac{7}{12} = \frac{48 - 49}{144} = -\frac{1}{144}$$

and

$$\rho(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} = \frac{-\frac{1}{144}}{\sqrt{\frac{11}{144}}\sqrt{\frac{11}{144}}} = -\frac{1}{144}\cdot\frac{144}{11} = -\frac{1}{11}$$

3.6.10 Exercise
For the joint probability density function in Exercise 3.3.7 find $\rho(X, Y)$.
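The value $\rho(X, Y) = -1/11$ for Example 3.3.3 can be checked by numerical integration. This sketch is illustrative only and assumes scipy is available.

```python
# Compute the correlation coefficient of the density f(x, y) = x + y on the
# unit square by numerical double integration; the answer should be -1/11.
from scipy.integrate import dblquad

E = lambda g: dblquad(lambda y, x: g(x, y) * (x + y), 0, 1, 0, 1)[0]

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
EXY    = E(lambda x, y: x * y)
VarX   = E(lambda x, y: x**2) - EX**2
VarY   = E(lambda x, y: y**2) - EY**2
rho    = (EXY - EX * EY) / (VarX * VarY) ** 0.5
print(rho, -1/11)          # both approximately -0.0909
```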
3.6.11 Theorem
If $\rho(X, Y)$ is the correlation coefficient of random variables $X$ and $Y$ then

$$-1 \le \rho(X, Y) \le 1$$

$\rho(X, Y) = 1$ if and only if $Y = aX + b$ for some $a > 0$, and $\rho(X, Y) = -1$ if and only if $Y = aX + b$ for some $a < 0$.

Proof
Let $S = X + tY$, where $t \in \Re$. Then $E(S) = \mu_X + t\mu_Y$ and

$$Var(S) = E\left[(S - \mu_S)^2\right] = E\left\{\left[(X + tY) - (\mu_X + t\mu_Y)\right]^2\right\} = E\left\{\left[(X - \mu_X) + t(Y - \mu_Y)\right]^2\right\}$$

$$= E\left[(X - \mu_X)^2 + 2t(X - \mu_X)(Y - \mu_Y) + t^2(Y - \mu_Y)^2\right] = t^2\sigma_Y^2 + 2\,Cov(X, Y)\,t + \sigma_X^2$$

Now $Var(S) \ge 0$ for any $t \in \Re$ implies that the quadratic equation $Var(S) = t^2\sigma_Y^2 + 2\,Cov(X, Y)\,t + \sigma_X^2$ in the variable $t$ must have at most one real root. To have at most one real root the discriminant of this quadratic equation must be less than or equal to zero. Therefore

$$\left[2\,Cov(X, Y)\right]^2 - 4\sigma_X^2\sigma_Y^2 \le 0$$

or

$$\left[Cov(X, Y)\right]^2 \le \sigma_X^2\sigma_Y^2$$

or

$$|\rho(X, Y)| = \frac{|Cov(X, Y)|}{\sigma_X \sigma_Y} \le 1$$

and therefore

$$-1 \le \rho(X, Y) \le 1$$

To see that $\rho(X, Y) = \pm 1$ corresponds to a linear relationship between $X$ and $Y$, note that $\rho(X, Y) = \pm 1$ implies

$$|Cov(X, Y)| = \sigma_X \sigma_Y$$

and therefore

$$\left[2\,Cov(X, Y)\right]^2 - 4\sigma_X^2\sigma_Y^2 = 0$$

which corresponds to a zero discriminant in the quadratic equation. This means that there exists one real number $t^*$ for which

$$Var(S) = Var(X + t^* Y) = 0$$

But $Var(X + t^* Y) = 0$ implies $X + t^* Y$ must equal a constant, that is, $X + t^* Y = c$. Thus $X$ and $Y$ satisfy a linear relationship.
3.7 Conditional Expectation

Since conditional probability (density) functions are also probability (density) functions, expectations can be defined in terms of these conditional probability (density) functions as in the following definition.

3.7.1 Definition - Conditional Expectation

The conditional expectation of $g(X)$ given $Y = y$ is given by
$$E[g(X)|y] = \sum_{x} g(x) f_1(x|y)$$
if $X$ is a discrete random variable, and
$$E[g(X)|y] = \int_{-\infty}^{\infty} g(x) f_1(x|y)\,dx$$
if $X$ is a continuous random variable, provided the sum/integral converges absolutely.
The conditional expectation of $h(Y)$ given $X = x$ is defined in a similar manner.
3.7.2 Special Cases

(1) The conditional mean of $X$ given $Y = y$ is denoted by $E(X|y)$.

(2) The conditional variance of $X$ given $Y = y$ is denoted by $\mathrm{Var}(X|y)$ and is given by
$$\mathrm{Var}(X|y) = E\left\{[X - E(X|y)]^2\,\big|\,y\right\} = E\left(X^2|y\right) - [E(X|y)]^2$$
3.7.3 Example

For the joint probability density function in Example 3.4.8 find $E(Y|x)$, the conditional mean of $Y$ given $X = x$, and $\mathrm{Var}(Y|x)$, the conditional variance of $Y$ given $X = x$.

Solution
From Example 3.5.2 we have
$$f_2(y|x) = \frac{1}{2\sqrt{1-x^2}} \quad\text{for } -\sqrt{1-x^2} < y < \sqrt{1-x^2},\ 0 < x < 1$$
Therefore
$$E(Y|x) = \int_{-\infty}^{\infty} y f_2(y|x)\,dy = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} \frac{y}{2\sqrt{1-x^2}}\,dy = \frac{1}{2\sqrt{1-x^2}}\cdot\frac{1}{2}\,y^2\Big|_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} = 0$$
Since $E(Y|x) = 0$,
$$\mathrm{Var}(Y|x) = E\left(Y^2|x\right) = \int_{-\infty}^{\infty} y^2 f_2(y|x)\,dy = \frac{1}{2\sqrt{1-x^2}}\int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} y^2\,dy = \frac{1}{6\sqrt{1-x^2}}\,y^3\Big|_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} = \frac{1-x^2}{3}$$
Recall that the conditional distribution of $Y$ given $X = x$ is $\mathrm{Uniform}\left(-\sqrt{1-x^2},\ \sqrt{1-x^2}\right)$. The results above can be verified by noting that if $U \sim \mathrm{Uniform}(a,b)$ then $E(U) = \frac{a+b}{2}$ and $\mathrm{Var}(U) = \frac{(b-a)^2}{12}$.

3.7.4 Exercise

In Exercises 3.5.3 and 3.3.7 find $E(Y|x)$, $\mathrm{Var}(Y|x)$, $E(X|y)$ and $\mathrm{Var}(X|y)$.

3.7.5 Theorem

If $X$ and $Y$ are independent random variables then $E[g(X)|y] = E[g(X)]$ and $E[h(Y)|x] = E[h(Y)]$.

Proof (Continuous Case)
$$E[g(X)|y] = \int_{-\infty}^{\infty} g(x) f_1(x|y)\,dx = \int_{-\infty}^{\infty} g(x) f_1(x)\,dx \quad\text{by Theorem 3.5.8}$$
$$= E[g(X)]$$
as required. $E[h(Y)|x] = E[h(Y)]$ follows in a similar manner.
3.7.6 Definition

$E[g(X)|Y]$ is the function of the random variable $Y$ whose value is $E[g(X)|y]$ when $Y = y$. This means of course that $E[g(X)|Y]$ is a random variable.

3.7.7 Example

In Example 3.5.6 the joint model was specified by $Y \sim \mathrm{Poisson}(\mu)$ and $X|Y = y \sim \mathrm{Binomial}(y, p)$, and we showed that $X \sim \mathrm{Poisson}(p\mu)$. Determine $E(X|Y = y)$, $E(X|Y)$, $E[E(X|Y)]$, and $E(X)$. What do you notice about $E[E(X|Y)]$ and $E(X)$?

Solution
Since $X|Y = y \sim \mathrm{Binomial}(y, p)$,
$$E(X|Y = y) = py$$
and
$$E(X|Y) = pY$$
which is a random variable. Since $Y \sim \mathrm{Poisson}(\mu)$ and $E(Y) = \mu$,
$$E[E(X|Y)] = E(pY) = pE(Y) = p\mu$$
Now since $X \sim \mathrm{Poisson}(p\mu)$, $E(X) = p\mu$. We notice that
$$E[E(X|Y)] = E(X)$$
The following theorem indicates that this result holds generally.
3.7.8 Theorem

Suppose $X$ and $Y$ are random variables. Then
$$E\{E[g(X)|Y]\} = E[g(X)]$$

Proof (Continuous Case)
$$\begin{aligned}
E\{E[g(X)|Y]\} &= E\left[\int_{-\infty}^{\infty} g(x) f_1(x|y)\,dx\right] \\
&= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} g(x) f_1(x|y)\,dx\right] f_2(y)\,dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x) f_1(x|y) f_2(y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} g(x)\left[\int_{-\infty}^{\infty} f(x,y)\,dy\right] dx \\
&= \int_{-\infty}^{\infty} g(x) f_1(x)\,dx \\
&= E[g(X)]
\end{aligned}$$
3.7.9 Corollary - Law of Total Expectation

Suppose $X$ and $Y$ are random variables. Then
$$E[E(X|Y)] = E(X)$$

Proof
Let $g(X) = X$ in Theorem 3.7.8 and the result follows.

3.7.10 Theorem - Law of Total Variance

Suppose $X$ and $Y$ are random variables. Then
$$\mathrm{Var}(X) = E[\mathrm{Var}(X|Y)] + \mathrm{Var}[E(X|Y)]$$

Proof
$$\begin{aligned}
\mathrm{Var}(X) &= E\left(X^2\right) - [E(X)]^2 \\
&= E\left[E\left(X^2|Y\right)\right] - \{E[E(X|Y)]\}^2 \quad\text{by Theorem 3.7.8} \\
&= E\left[E\left(X^2|Y\right)\right] - E\left\{[E(X|Y)]^2\right\} + E\left\{[E(X|Y)]^2\right\} - \{E[E(X|Y)]\}^2 \\
&= E[\mathrm{Var}(X|Y)] + \mathrm{Var}[E(X|Y)]
\end{aligned}$$

When the joint model is specified in terms of a conditional distribution for $X|Y = y$ and a marginal distribution for $Y$, Theorems 3.7.8 and 3.7.10 give a method for finding expectations of functions of $X$ without having to determine the marginal distribution of $X$.
3.7.11 Example

Suppose $P \sim \mathrm{Uniform}(0, 0.1)$ and $Y|P = p \sim \mathrm{Binomial}(10, p)$. Find $E(Y)$ and $\mathrm{Var}(Y)$.

Solution
Since $P \sim \mathrm{Uniform}(0, 0.1)$,
$$E(P) = \frac{0 + 0.1}{2} = \frac{1}{20}, \qquad \mathrm{Var}(P) = \frac{(0.1 - 0)^2}{12} = \frac{1}{1200}$$
and
$$E\left(P^2\right) = \mathrm{Var}(P) + [E(P)]^2 = \frac{1}{1200} + \left(\frac{1}{20}\right)^2 = \frac{1}{1200} + \frac{1}{400} = \frac{1+3}{1200} = \frac{4}{1200} = \frac{1}{300}$$
Since $Y|P = p \sim \mathrm{Binomial}(10, p)$,
$$E(Y|p) = 10p, \qquad E(Y|P) = 10P$$
and
$$\mathrm{Var}(Y|p) = 10p(1-p), \qquad \mathrm{Var}(Y|P) = 10P(1-P) = 10\left(P - P^2\right)$$
Therefore
$$E(Y) = E[E(Y|P)] = E(10P) = 10E(P) = 10\left(\frac{1}{20}\right) = \frac{1}{2}$$
and
$$\begin{aligned}
\mathrm{Var}(Y) &= E[\mathrm{Var}(Y|P)] + \mathrm{Var}[E(Y|P)] \\
&= E\left[10\left(P - P^2\right)\right] + \mathrm{Var}(10P) \\
&= 10\left[E(P) - E\left(P^2\right)\right] + 100\,\mathrm{Var}(P) \\
&= 10\left(\frac{1}{20} - \frac{1}{300}\right) + 100\left(\frac{1}{1200}\right) = \frac{11}{20}
\end{aligned}$$
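The double expectation results above lend themselves to a quick numerical check. The following is a minimal simulation sketch (not part of the original notes) for Example 3.7.11; the sample size and seed are arbitrary choices.

```python
import numpy as np

# Monte Carlo check of E(Y) = 1/2 and Var(Y) = 11/20 for Example 3.7.11,
# where P ~ Uniform(0, 0.1) and Y | P = p ~ Binomial(10, p).
rng = np.random.default_rng(seed=330)   # seed chosen arbitrarily
n_sim = 1_000_000

p = rng.uniform(0.0, 0.1, size=n_sim)   # draw P from its marginal distribution
y = rng.binomial(10, p)                 # draw Y from its conditional given P

print("estimated E(Y):  ", y.mean())    # should be close to 0.5
print("estimated Var(Y):", y.var())     # should be close to 11/20 = 0.55
```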
3.7.12 Exercise

In Example 3.5.7 find $E(X)$ and $\mathrm{Var}(X)$ using Corollary 3.7.9 and Theorem 3.7.10.

3.7.13 Exercise

Suppose $P \sim \mathrm{Beta}(a, b)$ and $Y|P = p \sim \mathrm{Geometric}(p)$. Find $E(Y)$ and $\mathrm{Var}(Y)$.
3.8 Joint Moment Generating Functions

Moment generating functions can also be defined for bivariate and multivariate random variables. As mentioned previously, moment generating functions are a powerful tool for determining the distributions of functions of random variables (Chapter 4), particularly sums, as well as determining the limiting distribution of a sequence of random variables (Chapter 5).

3.8.1 Definition - Joint Moment Generating Function

If $X$ and $Y$ are random variables then
$$M(t_1, t_2) = E\left(e^{t_1 X + t_2 Y}\right)$$
is called the joint moment generating function of $X$ and $Y$ if this expectation exists (the joint sum/integral converges absolutely) for all $t_1 \in (-h_1, h_1)$ for some $h_1 > 0$, and all $t_2 \in (-h_2, h_2)$ for some $h_2 > 0$.

More generally, if $X_1, X_2, \ldots, X_n$ are random variables then
$$M(t_1, t_2, \ldots, t_n) = E\left[\exp\left(\sum_{i=1}^{n} t_i X_i\right)\right]$$
is called the joint moment generating function of $X_1, X_2, \ldots, X_n$ if this expectation exists for all $t_i \in (-h_i, h_i)$ for some $h_i > 0$, $i = 1, 2, \ldots, n$.

If the joint moment generating function is known then it is straightforward to obtain the moment generating functions of the marginal distributions.

3.8.2 Important Note

If $M(t_1, t_2)$ exists for all $t_1 \in (-h_1, h_1)$ for some $h_1 > 0$, and all $t_2 \in (-h_2, h_2)$ for some $h_2 > 0$, then the moment generating function of $X$ is given by
$$M_X(t) = E\left(e^{tX}\right) = M(t, 0) \quad\text{for } t \in (-h_1, h_1)$$
and the moment generating function of $Y$ is given by
$$M_Y(t) = E\left(e^{tY}\right) = M(0, t) \quad\text{for } t \in (-h_2, h_2)$$
3.8.3 Example

Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x,y) = e^{-y} \quad\text{for } 0 < x < y < \infty$$
and $0$ otherwise.
(a) Find the joint moment generating function of $X$ and $Y$.
(b) What is the moment generating function of $X$ and what is the marginal distribution of $X$?
(c) What is the moment generating function of $Y$ and what is the marginal distribution of $Y$?

Solution
(a) The joint moment generating function is
$$\begin{aligned}
M(t_1, t_2) = E\left(e^{t_1 X + t_2 Y}\right) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f(x,y)\,dx\,dy \\
&= \int_{y=0}^{\infty}\int_{x=0}^{y} e^{t_1 x + t_2 y} e^{-y}\,dx\,dy \\
&= \int_{0}^{\infty} e^{t_2 y} e^{-y}\left(\int_{0}^{y} e^{t_1 x}\,dx\right)dy \\
&= \int_{0}^{\infty} e^{t_2 y} e^{-y}\,\frac{1}{t_1}\,e^{t_1 x}\Big|_{0}^{y}\,dy \\
&= \frac{1}{t_1}\int_{0}^{\infty} e^{t_2 y} e^{-y}\left(e^{t_1 y} - 1\right)dy \\
&= \frac{1}{t_1}\int_{0}^{\infty}\left[e^{-(1-t_1-t_2)y} - e^{-(1-t_2)y}\right]dy
\end{aligned}$$
which converges for $t_1 + t_2 < 1$ and $t_2 < 1$. Therefore
$$\begin{aligned}
M(t_1, t_2) &= \frac{1}{t_1}\lim_{b\to\infty}\left[-\frac{1}{1-t_1-t_2}\,e^{-(1-t_1-t_2)y}\Big|_{0}^{b} + \frac{1}{1-t_2}\,e^{-(1-t_2)y}\Big|_{0}^{b}\right] \\
&= \frac{1}{t_1}\lim_{b\to\infty}\left\{-\frac{1}{1-t_1-t_2}\left[e^{-(1-t_1-t_2)b} - 1\right] + \frac{1}{1-t_2}\left[e^{-(1-t_2)b} - 1\right]\right\} \\
&= \frac{1}{t_1}\left[\frac{1}{1-t_1-t_2} - \frac{1}{1-t_2}\right] \\
&= \frac{1}{t_1}\cdot\frac{(1-t_2) - (1-t_1-t_2)}{(1-t_1-t_2)(1-t_2)} \\
&= \frac{1}{(1-t_1-t_2)(1-t_2)} \quad\text{for } t_1 + t_2 < 1 \text{ and } t_2 < 1
\end{aligned}$$

(b) The moment generating function of $X$ is
$$M_X(t) = E\left(e^{tX}\right) = M(t, 0) = \frac{1}{(1-t-0)(1-0)} = \frac{1}{1-t} \quad\text{for } t < 1$$
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of an $\mathrm{Exponential}(1)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $X$ has an $\mathrm{Exponential}(1)$ distribution.

(c) The moment generating function of $Y$ is
$$M_Y(t) = E\left(e^{tY}\right) = M(0, t) = \frac{1}{(1-0-t)(1-t)} = \frac{1}{(1-t)^2} \quad\text{for } t < 1$$
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a $\mathrm{Gamma}(2,1)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $Y$ has a $\mathrm{Gamma}(2,1)$ distribution.
3.8.4 Example

Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x,y) = e^{-x-y} \quad\text{for } x > 0,\ y > 0$$
and $0$ otherwise.
(a) Find the joint moment generating function of $X$ and $Y$.
(b) What is the moment generating function of $X$ and what is the marginal distribution of $X$?
(c) What is the moment generating function of $Y$ and what is the marginal distribution of $Y$?

Solution
(a) The joint moment generating function is
$$\begin{aligned}
M(t_1, t_2) = E\left(e^{t_1 X + t_2 Y}\right) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f(x,y)\,dx\,dy \\
&= \int_{0}^{\infty}\int_{0}^{\infty} e^{t_1 x + t_2 y} e^{-x-y}\,dx\,dy \\
&= \left(\int_{0}^{\infty} e^{-y(1-t_2)}\,dy\right)\left(\int_{0}^{\infty} e^{-x(1-t_1)}\,dx\right) \quad\text{which converges for } t_1 < 1,\ t_2 < 1 \\
&= \left(\lim_{b\to\infty}\frac{-e^{-y(1-t_2)}}{1-t_2}\Big|_{0}^{b}\right)\left(\lim_{b\to\infty}\frac{-e^{-x(1-t_1)}}{1-t_1}\Big|_{0}^{b}\right) \\
&= \left(\frac{1}{1-t_1}\right)\left(\frac{1}{1-t_2}\right) \quad\text{for } t_1 < 1,\ t_2 < 1
\end{aligned}$$

(b) The moment generating function of $X$ is
$$M_X(t) = E\left(e^{tX}\right) = M(t, 0) = \frac{1}{(1-t)(1-0)} = \frac{1}{1-t} \quad\text{for } t < 1$$
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of an $\mathrm{Exponential}(1)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $X$ has an $\mathrm{Exponential}(1)$ distribution.

(c) The moment generating function of $Y$ is
$$M_Y(t) = E\left(e^{tY}\right) = M(0, t) = \frac{1}{(1-0)(1-t)} = \frac{1}{1-t} \quad\text{for } t < 1$$
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of an $\mathrm{Exponential}(1)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $Y$ has an $\mathrm{Exponential}(1)$ distribution.
3.8.5 Theorem

If $X$ and $Y$ are random variables with joint moment generating function $M(t_1, t_2)$ which exists for all $t_1 \in (-h_1, h_1)$ for some $h_1 > 0$, and all $t_2 \in (-h_2, h_2)$ for some $h_2 > 0$, then
$$E\left(X^j Y^k\right) = \frac{\partial^{j+k}}{\partial t_1^j\,\partial t_2^k}\,M(t_1, t_2)\Big|_{(t_1, t_2) = (0,0)}$$

Proof
See Problem 11(a).

3.8.6 Independence Theorem for Moment Generating Functions

Suppose $X$ and $Y$ are random variables with joint moment generating function $M(t_1, t_2)$ which exists for all $t_1 \in (-h_1, h_1)$ for some $h_1 > 0$, and all $t_2 \in (-h_2, h_2)$ for some $h_2 > 0$. Then $X$ and $Y$ are independent random variables if and only if
$$M(t_1, t_2) = M_X(t_1) M_Y(t_2)$$
for all $t_1 \in (-h_1, h_1)$ and $t_2 \in (-h_2, h_2)$, where $M_X(t_1) = M(t_1, 0)$ and $M_Y(t_2) = M(0, t_2)$.

Proof
See Problem 11(b).
3.8.7 Example

Use Theorem 3.8.6 to determine if $X$ and $Y$ are independent random variables in Examples 3.8.3 and 3.8.4.

Solution
For Example 3.8.3,
$$M(t_1, t_2) = \frac{1}{(1-t_1-t_2)(1-t_2)} \quad\text{for } t_1 + t_2 < 1 \text{ and } t_2 < 1$$
$$M_X(t_1) = \frac{1}{1-t_1} \quad\text{for } t_1 < 1, \qquad M_Y(t_2) = \frac{1}{(1-t_2)^2} \quad\text{for } t_2 < 1$$
Since
$$M\left(\tfrac{1}{4}, \tfrac{1}{4}\right) = \frac{1}{\left(1 - \tfrac{1}{4} - \tfrac{1}{4}\right)\left(1 - \tfrac{1}{4}\right)} = \frac{8}{3} \ne M_X\left(\tfrac{1}{4}\right) M_Y\left(\tfrac{1}{4}\right) = \frac{1}{1 - \tfrac{1}{4}}\cdot\frac{1}{\left(1 - \tfrac{1}{4}\right)^2} = \left(\frac{4}{3}\right)^3$$
then by Theorem 3.8.6, $X$ and $Y$ are not independent random variables.

For Example 3.8.4,
$$M(t_1, t_2) = \frac{1}{1-t_1}\cdot\frac{1}{1-t_2} \quad\text{for } t_1 < 1,\ t_2 < 1$$
$$M_X(t_1) = \frac{1}{1-t_1} \quad\text{for } t_1 < 1, \qquad M_Y(t_2) = \frac{1}{1-t_2} \quad\text{for } t_2 < 1$$
Since
$$M(t_1, t_2) = \frac{1}{1-t_1}\cdot\frac{1}{1-t_2} = M_X(t_1) M_Y(t_2) \quad\text{for all } t_1 < 1,\ t_2 < 1$$
then by Theorem 3.8.6, $X$ and $Y$ are independent random variables.
3.8.8 Example

Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables each with moment generating function $M(t)$, $t \in (-h, h)$ for some $h > 0$. Find $M(t_1, t_2, \ldots, t_n)$, the joint moment generating function of $X_1, X_2, \ldots, X_n$. Find the moment generating function of $T = \sum_{i=1}^{n} X_i$.

Solution
Since the $X_i$'s are independent random variables each with moment generating function $M(t)$, $t \in (-h, h)$ for some $h > 0$, the joint moment generating function of $X_1, X_2, \ldots, X_n$ is
$$M(t_1, t_2, \ldots, t_n) = E\left[\exp\left(\sum_{i=1}^{n} t_i X_i\right)\right] = E\left(\prod_{i=1}^{n} e^{t_i X_i}\right) = \prod_{i=1}^{n} E\left(e^{t_i X_i}\right) = \prod_{i=1}^{n} M(t_i) \quad\text{for } t_i \in (-h, h),\ i = 1, 2, \ldots, n$$
The moment generating function of $T = \sum_{i=1}^{n} X_i$ is
$$M_T(t) = E\left(e^{tT}\right) = E\left[\exp\left(\sum_{i=1}^{n} t X_i\right)\right] = M(t, t, \ldots, t) = \prod_{i=1}^{n} M(t) = [M(t)]^n \quad\text{for } t \in (-h, h)$$
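The relation $M_T(t) = [M(t)]^n$ can be illustrated numerically. The sketch below (not from the notes) uses $X_i \sim \mathrm{Exponential}(1)$, for which $M(t) = 1/(1-t)$, $t < 1$, so $[M(t)]^n$ is the $\mathrm{Gamma}(n,1)$ moment generating function; the values of $n$, $t$ and the simulation size are arbitrary choices.

```python
import numpy as np

# Numerical sketch of M_T(t) = [M(t)]^n from Example 3.8.8, using
# X_i ~ Exponential(1), for which M(t) = 1/(1 - t) for t < 1.
rng = np.random.default_rng(2)
n, t, n_sim = 5, 0.3, 1_000_000

x = rng.exponential(scale=1.0, size=(n_sim, n))
T = x.sum(axis=1)                         # T = X_1 + ... + X_n

mgf_mc = np.mean(np.exp(t * T))           # Monte Carlo estimate of E(e^{tT})
mgf_theory = (1.0 / (1.0 - t)) ** n       # [M(t)]^n, the Gamma(n, 1) mgf
print(mgf_mc, mgf_theory)
```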
3.9 Multinomial Distribution

The most widely used discrete multivariate distribution is the Multinomial distribution, which was introduced in a previous probability course. We give its joint probability function and its important properties here.

3.9.1 Definition - Multinomial Distribution

Suppose $(X_1, X_2, \ldots, X_k)$ are discrete random variables with joint probability function
$$f(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1! x_2! \cdots x_k!}\,p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$
for $x_i = 0, 1, \ldots, n$, $i = 1, 2, \ldots, k$, $\sum_{i=1}^{k} x_i = n$; $0 \le p_i \le 1$, $i = 1, 2, \ldots, k$, $\sum_{i=1}^{k} p_i = 1$. Then $(X_1, X_2, \ldots, X_k)$ is said to have a Multinomial distribution. We write $(X_1, X_2, \ldots, X_k) \sim \mathrm{Multinomial}(n; p_1, p_2, \ldots, p_k)$.

Notes:
(1) Since $\sum_{i=1}^{k} X_i = n$, the Multinomial distribution is actually a joint distribution for $k-1$ random variables, which can be written as
$$f(x_1, x_2, \ldots, x_{k-1}) = \frac{n!}{x_1! x_2! \cdots x_{k-1}!\left(n - \sum_{i=1}^{k-1} x_i\right)!}\,p_1^{x_1} p_2^{x_2} \cdots p_{k-1}^{x_{k-1}}\left(1 - \sum_{i=1}^{k-1} p_i\right)^{n - \sum_{i=1}^{k-1} x_i}$$
for $x_i = 0, 1, \ldots, n$, $i = 1, 2, \ldots, k-1$, $\sum_{i=1}^{k-1} x_i \le n$; $0 \le p_i \le 1$, $i = 1, 2, \ldots, k-1$, $\sum_{i=1}^{k-1} p_i \le 1$.

(2) If $k = 2$ we obtain the familiar Binomial distribution
$$f(x_1) = \frac{n!}{x_1!(n-x_1)!}\,p_1^{x_1}(1-p_1)^{n-x_1} \quad\text{for } x_1 = 0, 1, \ldots, n;\ 0 \le p_1 \le 1$$

(3) If $k = 3$ we obtain the Trinomial distribution
$$f(x_1, x_2) = \frac{n!}{x_1! x_2!(n-x_1-x_2)!}\,p_1^{x_1} p_2^{x_2}(1-p_1-p_2)^{n-x_1-x_2}$$
for $x_i = 0, 1, \ldots, n$, $i = 1, 2$, $x_1 + x_2 \le n$ and $0 \le p_i \le 1$, $i = 1, 2$, $p_1 + p_2 \le 1$.
3.9.2 Theorem - Properties of the Multinomial Distribution

If $(X_1, X_2, \ldots, X_k) \sim \mathrm{Multinomial}(n; p_1, p_2, \ldots, p_k)$, then

(1) $(X_1, X_2, \ldots, X_{k-1})$ has joint moment generating function
$$M(t_1, t_2, \ldots, t_{k-1}) = E\left(e^{t_1 X_1 + t_2 X_2 + \cdots + t_{k-1} X_{k-1}}\right) = \left(p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_{k-1} e^{t_{k-1}} + p_k\right)^n \tag{3.5}$$
for $(t_1, t_2, \ldots, t_{k-1}) \in \Re^{k-1}$.

(2) Any subset of $X_1, X_2, \ldots, X_k$ also has a Multinomial distribution. In particular
$$X_i \sim \mathrm{Binomial}(n, p_i) \quad\text{for } i = 1, 2, \ldots, k$$

(3) If $T = X_i + X_j$, $i \ne j$, then
$$T \sim \mathrm{Binomial}(n, p_i + p_j)$$

(4)
$$\mathrm{Cov}(X_i, X_j) = -n p_i p_j \quad\text{for } i \ne j$$

(5) The conditional distribution of any subset of $(X_1, X_2, \ldots, X_k)$ given the remaining coordinates is a Multinomial distribution. In particular the conditional probability function of $X_i$ given $X_j = x_j$, $i \ne j$, is
$$X_i | X_j = x_j \sim \mathrm{Binomial}\left(n - x_j,\ \frac{p_i}{1 - p_j}\right)$$

(6) The conditional distribution of $X_i$ given $T = X_i + X_j = t$, $i \ne j$, is
$$X_i | X_i + X_j = t \sim \mathrm{Binomial}\left(t,\ \frac{p_i}{p_i + p_j}\right)$$

3.9.3 Example
Suppose $(X_1, X_2, \ldots, X_k) \sim \mathrm{Multinomial}(n; p_1, p_2, \ldots, p_k)$.
(a) Prove $(X_1, X_2, \ldots, X_{k-1})$ has joint moment generating function
$$M(t_1, t_2, \ldots, t_{k-1}) = \left(p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_{k-1} e^{t_{k-1}} + p_k\right)^n$$
for $(t_1, t_2, \ldots, t_{k-1}) \in \Re^{k-1}$.
(b) Prove $(X_1, X_2) \sim \mathrm{Multinomial}(n; p_1, p_2, 1 - p_1 - p_2)$.
(c) Prove $X_i \sim \mathrm{Binomial}(n, p_i)$ for $i = 1, 2, \ldots, k$.
(d) Prove $T = X_1 + X_2 \sim \mathrm{Binomial}(n, p_1 + p_2)$.

Solution
(a) Let $A = \left\{(x_1, x_2, \ldots, x_k) : x_i = 0, 1, \ldots, n,\ i = 1, 2, \ldots, k,\ \sum_{i=1}^{k} x_i = n\right\}$. Then
$$\begin{aligned}
M(t_1, t_2, \ldots, t_{k-1}) &= E\left(e^{t_1 X_1 + t_2 X_2 + \cdots + t_{k-1} X_{k-1}}\right) \\
&= \sum_{(x_1, x_2, \ldots, x_k)\,\in\,A} e^{t_1 x_1 + t_2 x_2 + \cdots + t_{k-1} x_{k-1}}\,\frac{n!}{x_1! x_2! \cdots x_k!}\,p_1^{x_1} p_2^{x_2} \cdots p_{k-1}^{x_{k-1}} p_k^{x_k} \\
&= \sum_{(x_1, x_2, \ldots, x_k)\,\in\,A} \frac{n!}{x_1! x_2! \cdots x_k!}\left(p_1 e^{t_1}\right)^{x_1}\left(p_2 e^{t_2}\right)^{x_2} \cdots \left(p_{k-1} e^{t_{k-1}}\right)^{x_{k-1}} p_k^{x_k} \\
&= \left(p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_{k-1} e^{t_{k-1}} + p_k\right)^n \quad\text{for } (t_1, t_2, \ldots, t_{k-1}) \in \Re^{k-1}
\end{aligned}$$
by the Multinomial Theorem 2.11.5.

(b) The joint moment generating function of $(X_1, X_2)$ is
$$M(t_1, t_2, 0, \ldots, 0) = \left[p_1 e^{t_1} + p_2 e^{t_2} + (1 - p_1 - p_2)\right]^n \quad\text{for } (t_1, t_2) \in \Re^2$$
which is of the form (3.5), so by the Uniqueness Theorem for Moment Generating Functions, $(X_1, X_2) \sim \mathrm{Multinomial}(n; p_1, p_2, 1 - p_1 - p_2)$.

(c) The moment generating function of $X_i$ is
$$M(0, \ldots, 0, t_i, 0, \ldots, 0) = \left[p_i e^{t_i} + (1 - p_i)\right]^n \quad\text{for } t_i \in \Re$$
for $i = 1, 2, \ldots, k$, which is the moment generating function of a $\mathrm{Binomial}(n, p_i)$ random variable. By the Uniqueness Theorem for Moment Generating Functions, $X_i \sim \mathrm{Binomial}(n, p_i)$ for $i = 1, 2, \ldots, k$.

(d) The moment generating function of $T = X_1 + X_2$ is
$$\begin{aligned}
M_T(t) = E\left(e^{tT}\right) = E\left[e^{t(X_1 + X_2)}\right] &= E\left(e^{tX_1 + tX_2}\right) = M(t, t, 0, 0, \ldots, 0) \\
&= \left[p_1 e^{t} + p_2 e^{t} + (1 - p_1 - p_2)\right]^n \\
&= \left[(p_1 + p_2) e^{t} + (1 - p_1 - p_2)\right]^n \quad\text{for } t \in \Re
\end{aligned}$$
which is the moment generating function of a $\mathrm{Binomial}(n, p_1 + p_2)$ random variable. By the Uniqueness Theorem for Moment Generating Functions, $T \sim \mathrm{Binomial}(n, p_1 + p_2)$.

3.9.4 Exercise

Prove property (3) in Theorem 3.9.2.
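Several of the Multinomial properties in Theorem 3.9.2 and Example 3.9.3 are easy to check by simulation. The following sketch (not part of the notes; the values of $n$, the $p_i$ and the sample size are arbitrary) checks the covariance in property (4), the Binomial marginal in property (2), and the Binomial sum in property (3).

```python
import numpy as np

# Simulation sketch for Theorem 3.9.2 / Example 3.9.3.
rng = np.random.default_rng(3)
n, p = 20, np.array([0.2, 0.3, 0.5])
counts = rng.multinomial(n, p, size=500_000)   # rows are (X1, X2, X3)

x1, x2 = counts[:, 0], counts[:, 1]
print("Cov(X1, X2):", np.cov(x1, x2)[0, 1], " theory:", -n * p[0] * p[1])
print("E(X1):", x1.mean(), " Binomial mean n*p1:", n * p[0])
print("Var(X1):", x1.var(), " Binomial variance n*p1*(1-p1):", n * p[0] * (1 - p[0]))
print("E(X1 + X2):", (x1 + x2).mean(), " theory n*(p1 + p2):", n * (p[0] + p[1]))
```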
3.10 Bivariate Normal Distribution

The best known bivariate continuous distribution is the Bivariate Normal distribution. We give its joint probability density function written in vector notation so we can easily introduce the multivariate version of this distribution, called the Multivariate Normal distribution, in Chapter 7.

3.10.1 Definition - Bivariate Normal Distribution (BVN)

Suppose $X_1$ and $X_2$ are random variables with joint probability density function
$$f(x_1, x_2) = \frac{1}{2\pi|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})^T\right] \quad\text{for } (x_1, x_2) \in \Re^2$$
where
$$\mathbf{x} = \begin{bmatrix} x_1 & x_2 \end{bmatrix}, \qquad \boldsymbol{\mu} = \begin{bmatrix} \mu_1 & \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$
and $\Sigma$ is a nonsingular matrix. Then $\mathbf{X} = (X_1, X_2)$ is said to have a Bivariate Normal distribution. We write $\mathbf{X} \sim \mathrm{BVN}(\boldsymbol{\mu}, \Sigma)$.

The Bivariate Normal distribution has many special properties.
3.10.2 Theorem - Properties of the BVN Distribution

If $\mathbf{X} \sim \mathrm{BVN}(\boldsymbol{\mu}, \Sigma)$, then

(1) $(X_1, X_2)$ has joint moment generating function
$$M(t_1, t_2) = E\left(e^{t_1 X_1 + t_2 X_2}\right) = E\left[\exp\left(\mathbf{X}\mathbf{t}^T\right)\right] = \exp\left(\boldsymbol{\mu}\mathbf{t}^T + \frac{1}{2}\mathbf{t}\Sigma\mathbf{t}^T\right) \quad\text{for all } \mathbf{t} = (t_1, t_2) \in \Re^2$$

(2) $X_1 \sim \mathrm{N}(\mu_1, \sigma_1^2)$ and $X_2 \sim \mathrm{N}(\mu_2, \sigma_2^2)$.

(3) $\mathrm{Cov}(X_1, X_2) = \rho\sigma_1\sigma_2$ and $\mathrm{Cor}(X_1, X_2) = \rho$ where $-1 \le \rho \le 1$.

(4) $X_1$ and $X_2$ are independent random variables if and only if $\rho = 0$.

(5) If $\mathbf{c} = (c_1, c_2)$ is a nonzero vector of constants then
$$c_1 X_1 + c_2 X_2 \sim \mathrm{N}\left(\boldsymbol{\mu}\mathbf{c}^T,\ \mathbf{c}\Sigma\mathbf{c}^T\right)$$

(6) If $A$ is a $2 \times 2$ nonsingular matrix and $\mathbf{b}$ is a $1 \times 2$ vector then
$$\mathbf{X}A + \mathbf{b} \sim \mathrm{BVN}\left(\boldsymbol{\mu}A + \mathbf{b},\ A^T\Sigma A\right)$$

(7)
$$X_2 | X_1 = x_1 \sim \mathrm{N}\left(\mu_2 + \rho\sigma_2(x_1 - \mu_1)/\sigma_1,\ \sigma_2^2(1 - \rho^2)\right)$$
and
$$X_1 | X_2 = x_2 \sim \mathrm{N}\left(\mu_1 + \rho\sigma_1(x_2 - \mu_2)/\sigma_2,\ \sigma_1^2(1 - \rho^2)\right)$$

(8) $(\mathbf{X} - \boldsymbol{\mu})\Sigma^{-1}(\mathbf{X} - \boldsymbol{\mu})^T \sim \chi^2(2)$

Proof
For proofs of properties (1)-(4) and (6)-(7) see Problem 13.

(5) The moment generating function of $c_1 X_1 + c_2 X_2$ is
$$\begin{aligned}
E\left[e^{t(c_1 X_1 + c_2 X_2)}\right] &= E\left[e^{(c_1 t)X_1 + (c_2 t)X_2}\right] \\
&= \exp\left(\boldsymbol{\mu}\begin{bmatrix} c_1 t \\ c_2 t \end{bmatrix} + \frac{1}{2}\begin{bmatrix} c_1 t & c_2 t \end{bmatrix}\Sigma\begin{bmatrix} c_1 t \\ c_2 t \end{bmatrix}\right) \\
&= \exp\left(\boldsymbol{\mu}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} t + \frac{1}{2}\begin{bmatrix} c_1 & c_2 \end{bmatrix}\Sigma\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} t^2\right) \\
&= \exp\left(\boldsymbol{\mu}\mathbf{c}^T t + \frac{1}{2}\mathbf{c}\Sigma\mathbf{c}^T t^2\right) \quad\text{for } t \in \Re \text{ where } \mathbf{c} = (c_1, c_2)
\end{aligned}$$
which is the moment generating function of a $\mathrm{N}\left(\boldsymbol{\mu}\mathbf{c}^T, \mathbf{c}\Sigma\mathbf{c}^T\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $c_1 X_1 + c_2 X_2 \sim \mathrm{N}\left(\boldsymbol{\mu}\mathbf{c}^T, \mathbf{c}\Sigma\mathbf{c}^T\right)$.
The BVN joint probability density function is graphed in Figures 3.8 - 3.10. The graphs all have the same mean vector $\boldsymbol{\mu} = [0\ 0]$ but different variance/covariance matrices $\Sigma$. The axes all have the same scale.

Figure 3.8: Graph of BVN p.d.f. with $\boldsymbol{\mu} = [0\ 0]$ and $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

Figure 3.9: Graph of BVN p.d.f. with $\boldsymbol{\mu} = [0\ 0]$ and $\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$

Figure 3.10: Graph of BVN p.d.f. with $\boldsymbol{\mu} = [0\ 0]$ and $\Sigma = \begin{bmatrix} 0.6 & 0.5 \\ 0.5 & 1 \end{bmatrix}$
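Properties (3) and (7) of Theorem 3.10.2 can be checked numerically. The sketch below (not part of the notes; the parameter values, sample size and the window used for conditioning are arbitrary) simulates BVN observations and compares the sample covariance and a crude estimate of the conditional mean with the theoretical values.

```python
import numpy as np

# Simulation sketch for Theorem 3.10.2: check Cov(X1, X2) = rho*sigma1*sigma2
# and the conditional mean E(X2 | X1 = x1) of property (7).
rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0])
sigma1, sigma2, rho = 2.0, 1.5, 0.7
Sigma = np.array([[sigma1**2, rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])

x = rng.multivariate_normal(mu, Sigma, size=500_000)
x1, x2 = x[:, 0], x[:, 1]

print("Cov(X1, X2):", np.cov(x1, x2)[0, 1], " theory:", rho * sigma1 * sigma2)

# Property (7): average X2 over observations with X1 near x1 = 2.
x1_0 = 2.0
near = np.abs(x1 - x1_0) < 0.05
print("E(X2 | X1 ~ 2):", x2[near].mean(),
      " theory:", mu[1] + rho * sigma2 * (x1_0 - mu[0]) / sigma1)
```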
3.11 Calculus Review

Consider the region $R$ in the $xy$-plane in Figure 3.11.

Figure 3.11: Region 1 (a region $R$ bounded by the curves $y = g(x)$ and $y = h(x)$ and the vertical lines $x = a$ and $x = b$)

Suppose $f(x,y) \ge 0$ for all $(x,y) \in \Re^2$. The graph of $z = f(x,y)$ is a surface in 3-space lying above or touching the $xy$-plane. The volume of the solid bounded by the surface $z = f(x,y)$ and the $xy$-plane above the region $R$ is given by
$$\text{Volume} = \int_{x=a}^{b}\int_{y=g(x)}^{h(x)} f(x,y)\,dy\,dx$$

Figure 3.12: Region 2 (a region $R$ bounded by the curves $x = g(y)$ and $x = h(y)$ and two horizontal lines)

If $R$ is the region in Figure 3.12 then the volume is given by
$$\text{Volume} = \int_{y=c}^{d}\int_{x=g(y)}^{h(y)} f(x,y)\,dx\,dy$$

Give an expression for the volume of the solid bounded by the surface $z = f(x,y)$ and the $xy$-plane above the region $R = R_1 \cup R_2$ in Figure 3.13.

Figure 3.13: Region 3 (the region $R = R_1 \cup R_2$ bounded by $y = g(x)$ and $x = h(y)$)
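For numerical work, iterated integrals with variable limits such as the ones above can be evaluated with `scipy.integrate.dblquad`. The sketch below (not part of the notes) evaluates a Region-1-style integral $\int_{x=0}^{1}\int_{y=x^2}^{x} f(x,y)\,dy\,dx$ for an arbitrarily chosen integrand and limit functions.

```python
from scipy import integrate

# Numerical evaluation of a Region-1-style iterated integral (a sketch):
# integral over x in [0, 1] of integral over y in [x^2, x] of f(x, y) dy dx,
# with f(x, y) = x + y chosen arbitrarily.
f = lambda y, x: x + y            # dblquad expects the inner variable (y) first
g = lambda x: x**2                # lower limit of the inner integral
h = lambda x: x                   # upper limit of the inner integral

volume, abs_err = integrate.dblquad(f, 0.0, 1.0, g, h)
print(volume)                     # exact value is 3/20 = 0.15
```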
3.12 Chapter 3 Problems

1. Suppose $X$ and $Y$ are discrete random variables with joint probability function
$$f(x,y) = kq^2 p^{x+y} \quad\text{for } x = 0, 1, \ldots;\ y = 0, 1, \ldots;\ 0 < p < 1,\ q = 1-p$$
(a) Determine the value of $k$.
(b) Find the marginal probability function of $X$ and the marginal probability function of $Y$. Are $X$ and $Y$ independent random variables?
(c) Find $P(X = x | X + Y = t)$.

2. Suppose $X$ and $Y$ are discrete random variables with joint probability function
$$f(x,y) = \frac{e^{-2}}{x!(y-x)!} \quad\text{for } x = 0, 1, \ldots, y;\ y = 0, 1, \ldots$$
(a) Find the marginal probability function of $X$ and the marginal probability function of $Y$.
(b) Are $X$ and $Y$ independent random variables?

3. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x,y) = k(x^2 + y) \quad\text{for } 0 < y < 1 - x^2,\ -1 < x < 1$$
(a) Determine $k$.
(b) Find the marginal probability density function of $X$ and the marginal probability density function of $Y$.
(c) Are $X$ and $Y$ independent random variables?
(d) Find $P(Y \ge X + 1)$.

4. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x,y) = kx^2 y \quad\text{for } x^2 < y < 1$$
(a) Determine $k$.
(b) Find the marginal probability density function of $X$ and the marginal probability density function of $Y$.
(c) Are $X$ and $Y$ independent random variables?
(d) Find $P(X \le Y)$.
(e) Find the conditional probability density function of $X$ given $Y = y$ and the conditional probability density function of $Y$ given $X = x$.
5. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x,y) = kxe^{-y} \quad\text{for } 0 < x < 1,\ 0 < y < \infty$$
(a) Determine $k$.
(b) Find the marginal probability density function of $X$ and the marginal probability density function of $Y$.
(c) Are $X$ and $Y$ independent random variables?
(d) Find $P(X + Y \le t)$.

6. Suppose each of the following functions is a joint probability density function for continuous random variables $X$ and $Y$.
(a) $f(x,y) = k$ for $0 < x < y < 1$
(b) $f(x,y) = kx$ for $0 < x^2 < y < 1$
(c) $f(x,y) = kxy$ for $0 < y < x < 1$
(d) $f(x,y) = k(x+y)$ for $0 < x < y < 1$
(e) $f(x,y) = \dfrac{k}{x}$ for $0 < y < x < 1$
(f) $f(x,y) = kx^2 y$ for $0 < x < 1$, $0 < y < 1$, $0 < x + y < 1$
(g) $f(x,y) = ke^{-x-2y}$ for $0 < y < x < \infty$
In each case:
(i) Determine $k$.
(ii) Find the marginal probability density function of $X$ and the marginal probability density function of $Y$.
(iii) Find the conditional probability density function of $X$ given $Y = y$ and the conditional probability density function of $Y$ given $X = x$.
(iv) Find $E(X|y)$ and $E(Y|x)$.

7. Suppose $X \sim \mathrm{Uniform}(0,1)$ and the conditional probability density function of $Y$ given $X = x$ is
$$f_2(y|x) = \frac{1}{1-x} \quad\text{for } 0 < x < y < 1$$
Determine:
(a) the joint probability density function of $X$ and $Y$
(b) the marginal probability density function of $Y$
(c) the conditional probability density function of $X$ given $Y = y$.
8. Suppose $X$ and $Y$ are continuous random variables. Suppose also that the marginal probability density function of $X$ is
$$f_1(x) = \frac{1}{3}(1 + 4x) \quad\text{for } 0 < x < 1$$
and the conditional probability density function of $Y$ given $X = x$ is
$$f_2(y|x) = \frac{2y + 4x}{1 + 4x} \quad\text{for } 0 < x < 1,\ 0 < y < 1$$
Determine:
(a) the joint probability density function of $X$ and $Y$
(b) the marginal probability density function of $Y$
(c) the conditional probability density function of $X$ given $Y = y$.

9. Suppose that $\theta \sim \mathrm{Beta}(a,b)$ and $Y|\theta \sim \mathrm{Binomial}(n, \theta)$. Find $E(Y)$ and $\mathrm{Var}(Y)$.

10. Assume that $Y$ denotes the number of bacteria in a cubic centimeter of liquid and that $Y|\theta \sim \mathrm{Poisson}(\theta)$. Further assume that $\theta$ varies from location to location and $\theta \sim \mathrm{Gamma}(\alpha, \beta)$.
(a) Find $E(Y)$ and $\mathrm{Var}(Y)$.
(b) If $\alpha$ is a positive integer then show that the marginal probability function of $Y$ is Negative Binomial.

11. Suppose $X$ and $Y$ are random variables with joint moment generating function $M(t_1, t_2)$ which exists for all $|t_1| < h_1$ and $|t_2| < h_2$ for some $h_1, h_2 > 0$.
(a) Show that
$$E\left(X^j Y^k\right) = \frac{\partial^{j+k}}{\partial t_1^j\,\partial t_2^k}\,M(t_1, t_2)\Big|_{(t_1, t_2) = (0,0)}$$
(b) Prove that $X$ and $Y$ are independent random variables if and only if $M(t_1, t_2) = M_X(t_1) M_Y(t_2)$.
(c) If $(X, Y) \sim \mathrm{Multinomial}(n; p_1, p_2, 1 - p_1 - p_2)$ find $\mathrm{Cov}(X, Y)$.

12. Suppose $X$ and $Y$ are discrete random variables with joint probability function
$$f(x,y) = \frac{e^{-2}}{x!(y-x)!} \quad\text{for } x = 0, 1, \ldots, y;\ y = 0, 1, \ldots$$
(a) Find the joint moment generating function of $X$ and $Y$.
(b) Find $\mathrm{Cov}(X, Y)$.
13. Suppose $\mathbf{X} = (X_1, X_2) \sim \mathrm{BVN}(\boldsymbol{\mu}, \Sigma)$.
(a) Let $\mathbf{t} = (t_1, t_2)$. Use matrix multiplication to verify that
$$(\mathbf{x} - \boldsymbol{\mu})\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})^T - 2\mathbf{x}\mathbf{t}^T = [\mathbf{x} - (\boldsymbol{\mu} + \mathbf{t}\Sigma)]\Sigma^{-1}[\mathbf{x} - (\boldsymbol{\mu} + \mathbf{t}\Sigma)]^T - 2\boldsymbol{\mu}\mathbf{t}^T - \mathbf{t}\Sigma\mathbf{t}^T$$
Use this identity to show that the joint moment generating function of $X_1$ and $X_2$ is
$$M(t_1, t_2) = E\left(e^{t_1 X_1 + t_2 X_2}\right) = E\left[\exp\left(\mathbf{X}\mathbf{t}^T\right)\right] = \exp\left(\boldsymbol{\mu}\mathbf{t}^T + \frac{1}{2}\mathbf{t}\Sigma\mathbf{t}^T\right) \quad\text{for all } \mathbf{t} = (t_1, t_2) \in \Re^2$$
(b) Use moment generating functions to show $X_1 \sim \mathrm{N}(\mu_1, \sigma_1^2)$ and $X_2 \sim \mathrm{N}(\mu_2, \sigma_2^2)$.
(c) Use moment generating functions to show $\mathrm{Cov}(X_1, X_2) = \rho\sigma_1\sigma_2$. Hint: Use the result in Problem 11(a).
(d) Use moment generating functions to show that $X_1$ and $X_2$ are independent random variables if and only if $\rho = 0$.
(e) Let $A$ be a $2 \times 2$ nonsingular matrix and $\mathbf{b}$ be a $1 \times 2$ vector. Use the moment generating function to show that
$$\mathbf{X}A + \mathbf{b} \sim \mathrm{BVN}\left(\boldsymbol{\mu}A + \mathbf{b},\ A^T\Sigma A\right)$$
(f) Verify that
$$(\mathbf{x} - \boldsymbol{\mu})\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})^T = \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 + \frac{1}{\sigma_2^2(1-\rho^2)}\left[x_2 - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1)\right]^2$$
and thus show that the conditional distribution of $X_2$ given $X_1 = x_1$ is $\mathrm{N}\left(\mu_2 + \rho\sigma_2(x_1 - \mu_1)/\sigma_1,\ \sigma_2^2(1-\rho^2)\right)$. Note that by symmetry the conditional distribution of $X_1$ given $X_2 = x_2$ is $\mathrm{N}\left(\mu_1 + \rho\sigma_1(x_2 - \mu_2)/\sigma_2,\ \sigma_1^2(1-\rho^2)\right)$.

14. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x,y) = 2e^{-x-y} \quad\text{for } 0 < x < y < \infty$$
(a) Find the joint moment generating function of $X$ and $Y$.
(b) Determine the marginal distributions of $X$ and $Y$.
(c) Find $\mathrm{Cov}(X, Y)$.
4. Functions of Two or More Random Variables

In this chapter we look at techniques for determining the distributions of functions of two or more random variables. These techniques are extremely important for determining the distributions of estimators such as maximum likelihood estimators, the distributions of pivotal quantities for constructing confidence intervals, and the distributions of test statistics for testing hypotheses.

In Section 4.1 we extend the cumulative distribution function technique introduced in Section 2.6 to a function of two or more random variables. In Section 4.2 we look at a method for determining the distribution of a one-to-one transformation of two or more random variables which is an extension of the result in Theorem 2.6.8. In particular we show how the t distribution, which you would have used in your previous statistics course, arises as the ratio of a standard Normal random variable to the square root of an independent Chi-squared random variable divided by its degrees of freedom. In Section 4.3 we see how moment generating functions can be used for determining the distribution of a sum of random variables which is an extension of Theorem 2.10.4. In particular we prove that a linear combination of independent Normal random variables has a Normal distribution. This is a result which was used extensively in previous probability and statistics courses.
4.1 Cumulative Distribution Function Technique

Suppose $X_1, X_2, \ldots, X_n$ are continuous random variables with joint probability density function $f(x_1, x_2, \ldots, x_n)$. The probability density function of $Y = h(X_1, X_2, \ldots, X_n)$ can be determined using the cumulative distribution function technique that was used in Section 2.6 for the case $n = 1$.

4.1.1 Example

Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x,y) = 3y \quad\text{for } 0 < x < y < 1$$
and $0$ otherwise. Determine the probability density function of $T = XY$.
Solution
The support set of $(X, Y)$ is $A = \{(x,y) : 0 < x < y < 1\}$, which is the union of the regions $E$ and $F$ shown in Figure 4.1.

Figure 4.1: Support set for Example 4.1.1 (the triangle $0 < x < y < 1$ split by the curves $x = y$ and $x = t/y$ into the regions $E$ and $F$)

For $0 < t < 1$,
$$G(t) = P(T \le t) = P(XY \le t) = \iint\limits_{(x,y)\,\in\,E} 3y\,dx\,dy$$
Due to the shape of the region $E$, the double integral over the region $E$ would have to be written as the sum of two double integrals. It is easier to find $G(t)$ using
$$\begin{aligned}
G(t) &= \iint\limits_{(x,y)\,\in\,E} 3y\,dx\,dy = 1 - \iint\limits_{(x,y)\,\in\,F} 3y\,dx\,dy \\
&= 1 - \int_{y=\sqrt{t}}^{1}\int_{x=t/y}^{y} 3y\,dx\,dy = 1 - \int_{\sqrt{t}}^{1} 3y\,x\Big|_{t/y}^{y}\,dy \\
&= 1 - \int_{\sqrt{t}}^{1} 3y\left(y - \frac{t}{y}\right)dy = 1 - \int_{\sqrt{t}}^{1}\left(3y^2 - 3t\right)dy \\
&= 1 - \left(y^3 - 3ty\right)\Big|_{\sqrt{t}}^{1} = 1 - \left[1 - 3t - t^{3/2} + 3t^{3/2}\right] \\
&= 3t - 2t^{3/2} \quad\text{for } 0 < t < 1
\end{aligned}$$
The cumulative distribution function for $T$ is
$$G(t) = \begin{cases} 0 & t \le 0 \\ 3t - 2t^{3/2} & 0 < t < 1 \\ 1 & t \ge 1 \end{cases}$$
Now a cumulative distribution function must be a continuous function for all real values. Therefore as a check we note that
$$\lim_{t\to 0^+}\left(3t - 2t^{3/2}\right) = 0 = G(0) \quad\text{and}\quad \lim_{t\to 1^-}\left(3t - 2t^{3/2}\right) = 1 = G(1)$$
so indeed $G(t)$ is a continuous function for all $t \in \Re$.
Since $\frac{d}{dt}G(t) = 0$ for $t < 0$ and $t > 1$, and
$$\frac{d}{dt}G(t) = \frac{d}{dt}\left(3t - 2t^{3/2}\right) = 3 - 3t^{1/2} \quad\text{for } 0 < t < 1$$
the probability density function of $T$ is
$$g(t) = 3 - 3t^{1/2} \quad\text{for } 0 < t < 1$$
and $0$ otherwise.
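The cumulative distribution function derived above can be checked by simulation. The sketch below (not part of the notes) assumes the sampling scheme $Y$ with marginal density $3y^2$ on $(0,1)$ and $X|Y = y \sim \mathrm{Uniform}(0, y)$, which is one way of generating from $f(x,y) = 3y$ on $0 < x < y < 1$; sample size and seed are arbitrary.

```python
import numpy as np

# Monte Carlo check for Example 4.1.1: compare the empirical cdf of T = XY
# with G(t) = 3t - 2t^(3/2).
# Sampling assumption: Y has cdf y^3 on (0, 1), so Y = U^(1/3) with
# U ~ Uniform(0, 1), and X | Y = y ~ Uniform(0, y).
rng = np.random.default_rng(5)
n = 1_000_000
y = rng.uniform(size=n) ** (1 / 3)
x = rng.uniform(0.0, y)
t_sim = x * y

for t in (0.1, 0.25, 0.5):
    print(t, np.mean(t_sim <= t), 3 * t - 2 * t ** 1.5)
```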
4.1.2 Exercise

Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x,y) = 3y \quad\text{for } 0 \le x \le y \le 1$$
and $0$ otherwise. Find the probability density function of $S = Y/X$.

4.1.3 Example

Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed continuous random variables each with probability density function $f(x)$ and cumulative distribution function $F(x)$. Find the probability density functions of $U = \max(X_1, X_2, \ldots, X_n) = X_{(n)}$ and $V = \min(X_1, X_2, \ldots, X_n) = X_{(1)}$.
4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES
Solution
For u 2 <, the cumulative distribution function of U is
G (u) = P (U
u) = P [max (X1 ; X2 ; : : : ; Xn )
= P (X1
u; X2
u; : : : ; Xn
= P (X1
u) P (X2
u]
u)
u) : : : P (Xn
u)
since X1 ; X2 ; : : : ; Xn are independent random variables
n
Q
=
P (Xi
i=1
n
Q
=
u)
F (u)
since X1 ; X2 ; : : : ; Xn are identically distributed
i=1
= [F (u)]n
Suppose A is the support set of Xi , i = 1; 2; : : : ; n. The probability density function of U is
d
d
G (u) =
[F (u)]n
du
du
= n [F (u)]n 1 f (u) for u 2 A
g (u) =
and 0 otherwise.
For v 2 <, the cumulative distribution function of V is
H (v) = P (V
v) = P [min (X1 ; X2 ; : : : ; Xn )
v]
= 1
P [min (X1 ; X2 ; : : : ; Xn ) > v]
= 1
P (X1 > v; X2 > v; : : : ; Xn > v)
= 1
P (X1 > v) P (X2 > v) : : : P (Xn > v)
since X1 ; X2 ; : : : ; Xn are independent random variables
n
Q
= 1
P (Xi > v)
= 1
i=1
n
Q
[1
F (v)]
since X1 ; X2 ; : : : ; Xn are identically distributed
i=1
= 1
[1
F (v)]n
Suppose A is the support set of Xi , i = 1; 2; : : : ; n. The probability density function of V is
d
d
H (v) =
f1 [1 F (v)]n g
dv
dv
= n [1 F (v)]n 1 f (v) for v 2 A
h (v) =
and 0 otherwise.
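A brief numerical check of these formulas (not part of the notes) is easy for $\mathrm{Uniform}(0,1)$ observations: the maximum has density $nu^{n-1}$ with mean $n/(n+1)$, and the minimum has density $n(1-v)^{n-1}$ with mean $1/(n+1)$. The choices of $n$ and the simulation size below are arbitrary.

```python
import numpy as np

# Quick check of Example 4.1.3 for Uniform(0, 1) observations.
rng = np.random.default_rng(6)
n, n_sim = 5, 1_000_000
x = rng.uniform(size=(n_sim, n))
print(x.max(axis=1).mean(), n / (n + 1))     # sample max versus n/(n+1)
print(x.min(axis=1).mean(), 1 / (n + 1))     # sample min versus 1/(n+1)
```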
4.2 One-to-One Transformations

In this section we look at how to determine the joint distribution of a one-to-one transformation of two or more random variables. We concentrate on the bivariate case for ease of presentation. The method does extend to more than two random variables. See Problems 12 and 13 at the end of this chapter for examples of one-to-one transformations of three random variables.

We begin with some notation and a theorem which gives sufficient conditions for determining whether a transformation is one-to-one in the bivariate case, followed by the theorem which gives the joint probability density function of the two new random variables.

Suppose the transformation $S$ defined by
$$u = h_1(x,y), \qquad v = h_2(x,y)$$
is a one-to-one transformation for all $(x,y) \in R_{XY}$ and that $S$ maps the region $R_{XY}$ into the region $R_{UV}$ in the $uv$ plane. Since $S : (x,y) \to (u,v)$ is a one-to-one transformation there exists an inverse transformation $T$ defined by
$$x = w_1(u,v), \qquad y = w_2(u,v)$$
such that $T = S^{-1} : (u,v) \to (x,y)$ for all $(u,v) \in R_{UV}$. The Jacobian of the transformation $T$ is
$$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\ \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \left[\frac{\partial(u,v)}{\partial(x,y)}\right]^{-1}$$
where $\dfrac{\partial(u,v)}{\partial(x,y)}$ is the Jacobian of the transformation $S$.

4.2.1 Inverse Mapping Theorem

Consider the transformation $S$ defined by
$$u = h_1(x,y), \qquad v = h_2(x,y)$$
Suppose the partial derivatives $\dfrac{\partial u}{\partial x}$, $\dfrac{\partial u}{\partial y}$, $\dfrac{\partial v}{\partial x}$ and $\dfrac{\partial v}{\partial y}$ are continuous functions in a neighbourhood of the point $(a,b)$. Suppose also that $\dfrac{\partial(u,v)}{\partial(x,y)} \ne 0$ at the point $(a,b)$. Then there is a neighbourhood of the point $(a,b)$ in which $S$ has an inverse.

Note: These are sufficient but not necessary conditions for the inverse to exist.
4.2.2 Theorem - One-to-One Bivariate Transformations

Let $X$ and $Y$ be continuous random variables with joint probability density function $f(x,y)$ and let $R_{XY} = \{(x,y) : f(x,y) > 0\}$ be the support set of $(X,Y)$. Suppose the transformation $S$ defined by
$$U = h_1(X,Y), \qquad V = h_2(X,Y)$$
is a one-to-one transformation with inverse transformation
$$X = w_1(U,V), \qquad Y = w_2(U,V)$$
Suppose also that $S$ maps $R_{XY}$ into $R_{UV}$. Then $g(u,v)$, the joint probability density function of $U$ and $V$, is given by
$$g(u,v) = f(w_1(u,v), w_2(u,v))\left|\frac{\partial(x,y)}{\partial(u,v)}\right|$$
for all $(u,v) \in R_{UV}$. (Compare Theorem 2.6.8 for univariate random variables.)

4.2.3 Proof

We want to find $g(u,v)$, the joint probability density function of the random variables $U$ and $V$. Suppose $S^{-1}$ maps the region $B \subseteq R_{UV}$ into the region $A \subseteq R_{XY}$. Then
$$P[(U,V) \in B] = \iint\limits_{B} g(u,v)\,du\,dv \tag{4.1}$$
and
$$P[(U,V) \in B] = P[(X,Y) \in A] = \iint\limits_{A} f(x,y)\,dx\,dy = \iint\limits_{B} f(w_1(u,v), w_2(u,v))\left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv \tag{4.2}$$
where the last equality follows by the Change of Variable Theorem. Since this is true for all $B \subseteq R_{UV}$ we have, by comparing (4.1) and (4.2), that the joint probability density function of $U$ and $V$ is given by
$$g(u,v) = f(w_1(u,v), w_2(u,v))\left|\frac{\partial(x,y)}{\partial(u,v)}\right|$$
for all $(u,v) \in R_{UV}$.

In the following example we see how Theorem 4.2.2 can be used to show that the sum of two independent Exponential(1) random variables is a Gamma random variable.

4.2.4 Example
Suppose $X \sim \mathrm{Exponential}(1)$ and $Y \sim \mathrm{Exponential}(1)$ independently. Find the joint probability density function of $U = X + Y$ and $V = X$. Show that $U \sim \mathrm{Gamma}(2,1)$.

Solution
Since $X \sim \mathrm{Exponential}(1)$ and $Y \sim \mathrm{Exponential}(1)$ independently, the joint probability density function of $X$ and $Y$ is
$$f(x,y) = f_1(x) f_2(y) = e^{-x} e^{-y} = e^{-x-y}$$
with support set $R_{XY} = \{(x,y) : x > 0,\ y > 0\}$ which is shown in Figure 4.2.

Figure 4.2: Support set $R_{XY}$ for Example 4.2.4 (the first quadrant $x > 0$, $y > 0$)

The transformation
$$S : U = X + Y,\ V = X$$
has inverse transformation
$$X = V,\ Y = U - V$$
Under $S$ the boundaries of $R_{XY}$ are mapped as
$$(k, 0) \to (k, k) \quad\text{for } k \ge 0$$
$$(0, k) \to (k, 0) \quad\text{for } k \ge 0$$
and the point $(1,2)$ is mapped to the point $(3,1)$. Thus $S$ maps $R_{XY}$ into
$$R_{UV} = \{(u,v) : 0 < v < u\}$$
as shown in Figure 4.3.

Figure 4.3: Support set $R_{UV}$ for Example 4.2.4 (the region between the $u$-axis and the line $v = u$)

The Jacobian of the inverse transformation is
$$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\ \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \begin{vmatrix} 0 & 1 \\ 1 & -1 \end{vmatrix} = -1$$
Note that the transformation $S$ is a linear transformation and so we would expect the Jacobian of the transformation to be a constant.
The joint probability density function of $U$ and $V$ is given by
$$g(u,v) = f(w_1(u,v), w_2(u,v))\left|\frac{\partial(x,y)}{\partial(u,v)}\right| = f(v, u-v)\,|-1| = e^{-u} \quad\text{for } (u,v) \in R_{UV}$$
and $0$ otherwise.
To find the marginal probability density function for $U$ we note that the support set $R_{UV}$ is not rectangular and the range of integration for $v$ will depend on $u$. The marginal probability density function of $U$ is
$$g_1(u) = \int_{-\infty}^{\infty} g(u,v)\,dv = e^{-u}\int_{v=0}^{u} dv = u e^{-u} \quad\text{for } u > 0$$
and $0$ otherwise, which is the probability density function of a $\mathrm{Gamma}(2,1)$ random variable. Therefore $U \sim \mathrm{Gamma}(2,1)$.
In the following exercise we see how the sum and difference of two independent Exponential(1) random variables give a Gamma random variable and a Double Exponential random variable respectively.

4.2.5 Exercise

Suppose $X \sim \mathrm{Exponential}(1)$ and $Y \sim \mathrm{Exponential}(1)$ independently. Find the joint probability density function of $U = X + Y$ and $V = X - Y$. Show that $U \sim \mathrm{Gamma}(2,1)$ and $V \sim \mathrm{Double\ Exponential}(0,1)$.

In the following example we see how the Gamma and Beta distributions are related.

4.2.6 Example

Suppose $X \sim \mathrm{Gamma}(a,1)$ and $Y \sim \mathrm{Gamma}(b,1)$ independently. Find the joint probability density function of $U = X + Y$ and $V = \frac{X}{X+Y}$. Show that $U \sim \mathrm{Gamma}(a+b,1)$ and $V \sim \mathrm{Beta}(a,b)$ independently. Find $E(V)$ by finding $E\left(\frac{X}{X+Y}\right)$.

Solution
Since $X \sim \mathrm{Gamma}(a,1)$ and $Y \sim \mathrm{Gamma}(b,1)$ independently, the joint probability density function of $X$ and $Y$ is
$$f(x,y) = f_1(x) f_2(y) = \frac{x^{a-1} e^{-x}}{\Gamma(a)}\cdot\frac{y^{b-1} e^{-y}}{\Gamma(b)} = \frac{x^{a-1} y^{b-1} e^{-x-y}}{\Gamma(a)\Gamma(b)}$$
with support set $R_{XY} = \{(x,y) : x > 0,\ y > 0\}$ which is the same support set as shown in Figure 4.2.
The transformation
$$S : U = X + Y,\ V = \frac{X}{X+Y}$$
has inverse transformation
$$X = UV,\ Y = U(1-V)$$
Under $S$ the boundaries of $R_{XY}$ are mapped as
$$(k, 0) \to (k, 1) \quad\text{for } k \ge 0$$
$$(0, k) \to (k, 0) \quad\text{for } k \ge 0$$
and the point $(1,2)$ is mapped to the point $\left(3, \frac{1}{3}\right)$. Thus $S$ maps $R_{XY}$ into
$$R_{UV} = \{(u,v) : u > 0,\ 0 < v < 1\}$$
as shown in Figure 4.4.

Figure 4.4: Support set $R_{UV}$ for Example 4.2.6 (the horizontal strip $u > 0$, $0 < v < 1$)

The Jacobian of the inverse transformation is
$$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} v & u \\ 1-v & -u \end{vmatrix} = -uv - u + uv = -u$$
The joint probability density function of $U$ and $V$ is given by
$$g(u,v) = f(uv,\ u(1-v))\,|-u| = \frac{(uv)^{a-1}\left[u(1-v)\right]^{b-1} e^{-u}}{\Gamma(a)\Gamma(b)}\,|{-u}| = \frac{u^{a+b-1} e^{-u}\,v^{a-1}(1-v)^{b-1}}{\Gamma(a)\Gamma(b)} \quad\text{for } (u,v) \in R_{UV}$$
and $0$ otherwise.
To find the marginal probability density functions for $U$ and $V$ we note that the support set of $U$ is $B_1 = \{u : u > 0\}$ and the support set of $V$ is $B_2 = \{v : 0 < v < 1\}$. Since
$$g(u,v) = \underbrace{u^{a+b-1} e^{-u}}_{h_1(u)}\ \underbrace{\frac{v^{a-1}(1-v)^{b-1}}{\Gamma(a)\Gamma(b)}}_{h_2(v)}$$
for all $(u,v) \in B_1 \times B_2$ then, by the Factorization Theorem for Independence, $U$ and $V$ are independent random variables. Also by the Factorization Theorem for Independence the probability density function of $U$ must be proportional to $h_1(u)$. By writing
$$g(u,v) = \left[\frac{u^{a+b-1} e^{-u}}{\Gamma(a+b)}\right]\left[\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,v^{a-1}(1-v)^{b-1}\right]$$
we note that the function in the first square bracket is the probability density function of a $\mathrm{Gamma}(a+b,1)$ random variable and therefore $U \sim \mathrm{Gamma}(a+b,1)$. It follows that the function in the second square bracket must be the probability density function of $V$, which is a $\mathrm{Beta}(a,b)$ probability density function. Therefore $U \sim \mathrm{Gamma}(a+b,1)$ independently of $V \sim \mathrm{Beta}(a,b)$.

In Chapter 2, Problem 9 the moments of a Beta random variable were found by integration. Here is a rather clever way of finding $E(V)$ using the mean of a Gamma random variable. In Exercise 2.7.9 it was shown that the mean of a $\mathrm{Gamma}(\alpha, \beta)$ random variable is $\alpha\beta$.
Now
$$E(UV) = E\left[(X+Y)\cdot\frac{X}{X+Y}\right] = E(X) = (a)(1) = a$$
since $X \sim \mathrm{Gamma}(a,1)$. But $U$ and $V$ are independent random variables so
$$a = E(UV) = E(U)E(V)$$
But since $U \sim \mathrm{Gamma}(a+b,1)$ we know $E(U) = a+b$, so
$$a = E(U)E(V) = (a+b)E(V)$$
Solving for $E(V)$ gives
$$E(V) = \frac{a}{a+b}$$
Higher moments can be found in a similar manner using the higher moments of a Gamma random variable.
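The Gamma/Beta relationship in Example 4.2.6 is easy to check by simulation. The sketch below (not part of the notes; the values of $a$, $b$ and the sample size are arbitrary) compares sample moments of $U = X + Y$ and $V = X/(X+Y)$ with the theoretical values, and checks that $U$ and $V$ are approximately uncorrelated.

```python
import numpy as np

# Simulation sketch for Example 4.2.6: U = X + Y ~ Gamma(a + b, 1) and
# V = X/(X + Y) ~ Beta(a, b), independently.
rng = np.random.default_rng(7)
a, b, n = 2.0, 3.0, 1_000_000
x = rng.gamma(a, 1.0, size=n)
y = rng.gamma(b, 1.0, size=n)
u, v = x + y, x / (x + y)

print("E(V):", v.mean(), " theory a/(a+b):", a / (a + b))
print("E(U):", u.mean(), " theory a+b:", a + b)
print("corr(U, V):", np.corrcoef(u, v)[0, 1])   # should be close to 0
```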
4.2.7 Exercise

Suppose $X \sim \mathrm{Beta}(a,b)$ and $Y \sim \mathrm{Beta}(a+b,c)$ independently. Find the joint probability density function of $U = XY$ and $V = X$. Show that $U \sim \mathrm{Beta}(a, b+c)$.

In the following example we see how a rather unusual transformation can be used to transform two independent $\mathrm{Uniform}(0,1)$ random variables into two independent $\mathrm{N}(0,1)$ random variables. This transformation is referred to as the Box-Muller Transformation after the two statisticians George E. P. Box and Mervin Edgar Muller who published this result in 1958.

4.2.8 Example - Box-Muller Transformation

Suppose $X \sim \mathrm{Uniform}(0,1)$ and $Y \sim \mathrm{Uniform}(0,1)$ independently. Find the joint probability density function of
$$U = (-2\log X)^{1/2}\cos(2\pi Y), \qquad V = (-2\log X)^{1/2}\sin(2\pi Y)$$
Show that $U \sim \mathrm{N}(0,1)$ and $V \sim \mathrm{N}(0,1)$ independently. Explain how you could use this result to generate independent observations from a $\mathrm{N}(0,1)$ distribution.
Solution
Since $X \sim \mathrm{Uniform}(0,1)$ and $Y \sim \mathrm{Uniform}(0,1)$ independently, the joint probability density function of $X$ and $Y$ is
$$f(x,y) = f_1(x) f_2(y) = (1)(1) = 1$$
with support set $R_{XY} = \{(x,y) : 0 < x < 1,\ 0 < y < 1\}$.
Consider the transformation
$$S : U = (-2\log X)^{1/2}\cos(2\pi Y), \quad V = (-2\log X)^{1/2}\sin(2\pi Y)$$
To determine the support set of $(U,V)$ we note that $0 < y < 1$ implies $-1 < \cos(2\pi y) < 1$. Also $0 < x < 1$ implies $0 < (-2\log x)^{1/2} < \infty$. Therefore $u = (-2\log x)^{1/2}\cos(2\pi y)$ takes on values in the interval $(-\infty, \infty)$. By a similar argument $v = (-2\log x)^{1/2}\sin(2\pi y)$ also takes on values in the interval $(-\infty, \infty)$. Therefore the support set of $(U,V)$ is $R_{UV} = \Re^2$.
The inverse of the transformation $S$ can be determined. In particular we note that since
$$U^2 + V^2 = \left[(-2\log X)^{1/2}\cos(2\pi Y)\right]^2 + \left[(-2\log X)^{1/2}\sin(2\pi Y)\right]^2 = (-2\log X)\left[\cos^2(2\pi Y) + \sin^2(2\pi Y)\right] = -2\log X$$
and
$$\frac{V}{U} = \frac{\sin(2\pi Y)}{\cos(2\pi Y)} = \tan(2\pi Y)$$
the inverse transformation is
$$X = e^{-\frac{1}{2}\left(U^2 + V^2\right)}, \qquad Y = \frac{1}{2\pi}\arctan\left(\frac{V}{U}\right)$$
To determine the Jacobian of the inverse transformation it is simpler to use the result
$$\frac{\partial(x,y)}{\partial(u,v)} = \left[\frac{\partial(u,v)}{\partial(x,y)}\right]^{-1} = \begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\ \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix}^{-1}$$
Since
$$\frac{\partial u}{\partial x} = -\frac{1}{x}(-2\log x)^{-1/2}\cos(2\pi y), \qquad \frac{\partial u}{\partial y} = -2\pi(-2\log x)^{1/2}\sin(2\pi y)$$
$$\frac{\partial v}{\partial x} = -\frac{1}{x}(-2\log x)^{-1/2}\sin(2\pi y), \qquad \frac{\partial v}{\partial y} = 2\pi(-2\log x)^{1/2}\cos(2\pi y)$$
we obtain
$$\frac{\partial(u,v)}{\partial(x,y)} = -\frac{2\pi}{x}\left[\cos^2(2\pi y) + \sin^2(2\pi y)\right] = -\frac{2\pi}{x}$$
Therefore
$$\frac{\partial(x,y)}{\partial(u,v)} = \left[\frac{\partial(u,v)}{\partial(x,y)}\right]^{-1} = -\frac{x}{2\pi} = -\frac{1}{2\pi}\,e^{-\frac{1}{2}\left(u^2 + v^2\right)}$$
The joint probability density function of $U$ and $V$ is
$$g(u,v) = f(w_1(u,v), w_2(u,v))\left|\frac{\partial(x,y)}{\partial(u,v)}\right| = (1)\,\frac{1}{2\pi}\,e^{-\frac{1}{2}\left(u^2 + v^2\right)} = \frac{1}{2\pi}\,e^{-\frac{1}{2}\left(u^2 + v^2\right)} \quad\text{for } (u,v) \in \Re^2$$
The support set of $U$ is $\Re$ and the support set of $V$ is $\Re$. Since $g(u,v)$ can be written as
$$g(u,v) = \left[\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}u^2}\right]\left[\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}v^2}\right]$$
for all $(u,v) \in \Re \times \Re = \Re^2$, therefore by the Factorization Theorem for Independence, $U$ and $V$ are independent random variables. We also note that the joint probability density function is the product of two $\mathrm{N}(0,1)$ probability density functions. Therefore $U \sim \mathrm{N}(0,1)$ and $V \sim \mathrm{N}(0,1)$ independently.
Let $x$ and $y$ be two independent $\mathrm{Uniform}(0,1)$ observations which have been generated using a random number generator. Then from the result above we have that
$$u = (-2\log x)^{1/2}\cos(2\pi y), \qquad v = (-2\log x)^{1/2}\sin(2\pi y)$$
are two independent $\mathrm{N}(0,1)$ observations.
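The Box-Muller recipe above is straightforward to implement directly. The sketch below (not part of the notes; sample size and seed are arbitrary) generates pairs $(u, v)$ from uniform inputs and checks the sample mean, variance and correlation against the $\mathrm{N}(0,1)$ values.

```python
import numpy as np

# Box-Muller sketch for Example 4.2.8: turn pairs of independent Uniform(0, 1)
# observations into pairs of independent N(0, 1) observations.
rng = np.random.default_rng(8)
n = 500_000
x = rng.uniform(size=n)
y = rng.uniform(size=n)

r = np.sqrt(-2.0 * np.log(x))          # (-2 log x)^(1/2)
u = r * np.cos(2.0 * np.pi * y)
v = r * np.sin(2.0 * np.pi * y)

print(u.mean(), u.var())               # approximately 0 and 1
print(v.mean(), v.var())               # approximately 0 and 1
print(np.corrcoef(u, v)[0, 1])         # approximately 0
```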
The result in the following theorem is one that was used (without proof) in a previous statistics course such as STAT 221/231/241 to construct confidence intervals and test hypotheses regarding the mean in a $\mathrm{N}(\mu, \sigma^2)$ model when the variance $\sigma^2$ is unknown.

4.2.9 Theorem - t Distribution

If $X \sim \chi^2(n)$ independently of $Z \sim \mathrm{N}(0,1)$ then
$$T = \frac{Z}{\sqrt{X/n}} \sim \mathrm{t}(n)$$
Proof
The transformation $T = \dfrac{Z}{\sqrt{X/n}}$ is not a one-to-one transformation. However if we add the variable $U = X$ to complete the transformation and consider the transformation
$$S : T = \frac{Z}{\sqrt{X/n}},\ U = X$$
then this transformation has an inverse transformation given by
$$X = U,\ Z = T\sqrt{\frac{U}{n}}$$
Since $X \sim \chi^2(n)$ independently of $Z \sim \mathrm{N}(0,1)$, the joint probability density function of $X$ and $Z$ is
$$f(x,z) = f_1(x) f_2(z) = \frac{1}{2^{n/2}\Gamma(n/2)}\,x^{n/2-1} e^{-x/2}\cdot\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2} = \frac{1}{2^{(n+1)/2}\Gamma(n/2)\sqrt{\pi}}\,x^{n/2-1} e^{-x/2} e^{-z^2/2}$$
with support set $R_{XZ} = \{(x,z) : x > 0,\ z \in \Re\}$. The transformation $S$ maps $R_{XZ}$ into $R_{TU} = \{(t,u) : t \in \Re,\ u > 0\}$.
The Jacobian of the inverse transformation is
$$\frac{\partial(x,z)}{\partial(t,u)} = \begin{vmatrix} \dfrac{\partial x}{\partial t} & \dfrac{\partial x}{\partial u} \\ \dfrac{\partial z}{\partial t} & \dfrac{\partial z}{\partial u} \end{vmatrix} = \begin{vmatrix} 0 & 1 \\ \left(\dfrac{u}{n}\right)^{1/2} & \dfrac{\partial z}{\partial u} \end{vmatrix} = -\left(\frac{u}{n}\right)^{1/2}$$
The joint probability density function of $T$ and $U$ is given by
$$\begin{aligned}
g(t,u) &= f\left(u,\ t\left(\frac{u}{n}\right)^{1/2}\right)\left|-\left(\frac{u}{n}\right)^{1/2}\right| \\
&= \frac{1}{2^{(n+1)/2}\Gamma(n/2)\sqrt{\pi n}}\,u^{n/2-1} e^{-u/2} e^{-t^2 u/(2n)}\,u^{1/2} \\
&= \frac{1}{2^{(n+1)/2}\Gamma(n/2)\sqrt{\pi n}}\,u^{(n+1)/2-1} e^{-u\left(1 + t^2/n\right)/2} \quad\text{for } (t,u) \in R_{TU}
\end{aligned}$$
and $0$ otherwise.
To determine the distribution of $T$ we need to find the marginal probability density function of $T$:
$$g_1(t) = \int_{-\infty}^{\infty} g(t,u)\,du = \frac{1}{2^{(n+1)/2}\Gamma(n/2)\sqrt{\pi n}}\int_{0}^{\infty} u^{(n+1)/2-1} e^{-u\left(1 + t^2/n\right)/2}\,du$$
Let $y = \dfrac{u}{2}\left(1 + \dfrac{t^2}{n}\right)$ so that $u = 2y\left(1 + \dfrac{t^2}{n}\right)^{-1}$ and $du = 2\left(1 + \dfrac{t^2}{n}\right)^{-1} dy$. Note that when $u = 0$ then $y = 0$, and when $u \to \infty$ then $y \to \infty$. Therefore
$$\begin{aligned}
g_1(t) &= \frac{1}{2^{(n+1)/2}\Gamma(n/2)\sqrt{\pi n}}\int_{0}^{\infty}\left[2y\left(1 + \frac{t^2}{n}\right)^{-1}\right]^{(n+1)/2-1} e^{-y}\,2\left(1 + \frac{t^2}{n}\right)^{-1} dy \\
&= \frac{2^{(n+1)/2}}{2^{(n+1)/2}\Gamma(n/2)\sqrt{\pi n}}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}\int_{0}^{\infty} y^{(n+1)/2-1} e^{-y}\,dy \\
&= \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma(n/2)\sqrt{\pi n}}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2} \quad\text{for } t \in \Re
\end{aligned}$$
which is the probability density function of a random variable with a $\mathrm{t}(n)$ distribution. Therefore
$$T = \frac{Z}{\sqrt{X/n}} \sim \mathrm{t}(n)$$
as required.

4.2.10 Example

Use Theorem 4.2.9 to find $E(T)$ and $\mathrm{Var}(T)$ if $T \sim \mathrm{t}(n)$.
Solution
If $X \sim \chi^2(n)$ independently of $Z \sim \mathrm{N}(0,1)$ then we know from the previous theorem that
$$T = \frac{Z}{\sqrt{X/n}} \sim \mathrm{t}(n)$$
Now
$$E(T) = E\left(\frac{Z}{\sqrt{X/n}}\right) = \sqrt{n}\,E(Z)\,E\left(X^{-1/2}\right)$$
since $X$ and $Z$ are independent random variables. Since $E(Z) = 0$ it follows that $E(T) = 0$ as long as $E\left(X^{-1/2}\right)$ exists. Since $X \sim \chi^2(n)$,
$$\begin{aligned}
E\left(X^k\right) &= \int_{0}^{\infty} x^k\,\frac{1}{2^{n/2}\Gamma(n/2)}\,x^{n/2-1} e^{-x/2}\,dx \\
&= \frac{1}{2^{n/2}\Gamma(n/2)}\int_{0}^{\infty} x^{k+n/2-1} e^{-x/2}\,dx \qquad\text{let } y = x/2 \\
&= \frac{1}{2^{n/2}\Gamma(n/2)}\int_{0}^{\infty} (2y)^{k+n/2-1} e^{-y}\,(2)\,dy \\
&= \frac{2^{k+n/2}}{2^{n/2}\Gamma(n/2)}\int_{0}^{\infty} y^{k+n/2-1} e^{-y}\,dy \\
&= \frac{2^k\,\Gamma(n/2+k)}{\Gamma(n/2)} \tag{4.3}
\end{aligned}$$
which exists for $n/2 + k > 0$. If $k = -1/2$ then the integral exists for $n/2 > 1/2$ or $n > 1$. Therefore
$$E(T) = 0 \quad\text{for } n > 1$$
Now
$$\mathrm{Var}(T) = E\left(T^2\right) - [E(T)]^2 = E\left(T^2\right) \quad\text{since } E(T) = 0$$
and
$$E\left(T^2\right) = E\left(\frac{Z^2}{X/n}\right) = n\,E\left(Z^2\right)E\left(X^{-1}\right)$$
Since $Z \sim \mathrm{N}(0,1)$,
$$E\left(Z^2\right) = \mathrm{Var}(Z) + [E(Z)]^2 = 1 + 0^2 = 1$$
Also by (4.3),
$$E\left(X^{-1}\right) = \frac{2^{-1}\,\Gamma(n/2-1)}{\Gamma(n/2)} = \frac{1}{2(n/2-1)} = \frac{1}{n-2}$$
which exists for $n > 2$. Therefore
$$\mathrm{Var}(T) = E\left(T^2\right) = n\,E\left(Z^2\right)E\left(X^{-1}\right) = n(1)\left(\frac{1}{n-2}\right) = \frac{n}{n-2} \quad\text{for } n > 2$$
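The construction $T = Z/\sqrt{X/n}$ also gives a direct way to check these moments numerically. The sketch below (not part of the notes; degrees of freedom and simulation size are arbitrary) compares the sample mean and variance of simulated $T$ values with $E(T) = 0$ and $\mathrm{Var}(T) = n/(n-2)$.

```python
import numpy as np

# Simulation sketch for Example 4.2.10: build T = Z / sqrt(X/n) from
# independent Z ~ N(0, 1) and X ~ chi-squared(n).
rng = np.random.default_rng(9)
n_df, n_sim = 8, 1_000_000
z = rng.standard_normal(n_sim)
x = rng.chisquare(n_df, n_sim)
t = z / np.sqrt(x / n_df)

print(t.mean())                        # approximately 0
print(t.var(), n_df / (n_df - 2))      # approximately n/(n - 2)
```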
The following theorem concerns the F distribution which is used in testing hypotheses about the parameters in a multiple linear regression model.

4.2.11 Theorem - F Distribution

If $X \sim \chi^2(n)$ independently of $Y \sim \chi^2(m)$ then
$$U = \frac{X/n}{Y/m} \sim \mathrm{F}(n,m)$$

4.2.12 Exercise

(a) Prove Theorem 4.2.11. Hint: Complete the transformation with $V = Y$.
(b) Find $E(U)$ and $\mathrm{Var}(U)$ and note for what values of $n$ and $m$ they exist. Hint: Use the technique and results of Example 4.2.10.
4.3 Moment Generating Function Technique

The moment generating function technique is particularly useful in determining the distribution of a sum of two or more independent random variables if the moment generating functions of the random variables exist.

4.3.1 Theorem

Suppose $X_1, X_2, \ldots, X_n$ are independent random variables and $X_i$ has moment generating function $M_i(t)$ which exists for $t \in (-h, h)$ for some $h > 0$. The moment generating function of $Y = \sum_{i=1}^{n} X_i$ is given by
$$M_Y(t) = \prod_{i=1}^{n} M_i(t) \quad\text{for } t \in (-h, h)$$
If the $X_i$'s are independent and identically distributed random variables each with moment generating function $M(t)$ then $Y = \sum_{i=1}^{n} X_i$ has moment generating function
$$M_Y(t) = [M(t)]^n \quad\text{for } t \in (-h, h)$$

Proof
The moment generating function of $Y = \sum_{i=1}^{n} X_i$ is
$$\begin{aligned}
M_Y(t) = E\left(e^{tY}\right) &= E\left[\exp\left(t\sum_{i=1}^{n} X_i\right)\right] \\
&= \prod_{i=1}^{n} E\left(e^{tX_i}\right) \quad\text{since } X_1, X_2, \ldots, X_n \text{ are independent random variables} \\
&= \prod_{i=1}^{n} M_i(t) \quad\text{for } t \in (-h, h)
\end{aligned}$$
If $X_1, X_2, \ldots, X_n$ are identically distributed each with moment generating function $M(t)$ then
$$M_Y(t) = \prod_{i=1}^{n} M(t) = [M(t)]^n \quad\text{for } t \in (-h, h)$$
as required.

Note: This theorem in conjunction with the Uniqueness Theorem for Moment Generating Functions can be used to find the distribution of $Y$.

Here is a summary of results about sums of random variables for the named distributions.
4.3.2 Special Results

(1) If $X_i \sim \mathrm{Binomial}(n_i, p)$, $i = 1, 2, \ldots, n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \mathrm{Binomial}\left(\sum_{i=1}^{n} n_i,\ p\right)$$

(2) If $X_i \sim \mathrm{Poisson}(\theta_i)$, $i = 1, 2, \ldots, n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \mathrm{Poisson}\left(\sum_{i=1}^{n}\theta_i\right)$$

(3) If $X_i \sim \mathrm{Negative\ Binomial}(k_i, p)$, $i = 1, 2, \ldots, n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \mathrm{Negative\ Binomial}\left(\sum_{i=1}^{n} k_i,\ p\right)$$

(4) If $X_i \sim \mathrm{Exponential}(\theta)$, $i = 1, 2, \ldots, n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \mathrm{Gamma}(n, \theta)$$

(5) If $X_i \sim \mathrm{Gamma}(\alpha_i, \theta)$, $i = 1, 2, \ldots, n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \mathrm{Gamma}\left(\sum_{i=1}^{n}\alpha_i,\ \theta\right)$$

(6) If $X_i \sim \chi^2(k_i)$, $i = 1, 2, \ldots, n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \chi^2\left(\sum_{i=1}^{n} k_i\right)$$

(7) If $X_i \sim \mathrm{N}(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$ independently, then
$$\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)$$
Proof
(1) Suppose $X_i \sim \mathrm{Binomial}(n_i, p)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of $X_i$ is
$$M_i(t) = \left(pe^t + q\right)^{n_i} \quad\text{for } t \in \Re$$
for $i = 1, 2, \ldots, n$. By Theorem 4.3.1 the moment generating function of $Y = \sum_{i=1}^{n} X_i$ is
$$M_Y(t) = \prod_{i=1}^{n} M_i(t) = \prod_{i=1}^{n}\left(pe^t + q\right)^{n_i} = \left(pe^t + q\right)^{\sum_{i=1}^{n} n_i} \quad\text{for } t \in \Re$$
which is the moment generating function of a $\mathrm{Binomial}\left(\sum_{i=1}^{n} n_i, p\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $\sum_{i=1}^{n} X_i \sim \mathrm{Binomial}\left(\sum_{i=1}^{n} n_i, p\right)$.

(2) Suppose $X_i \sim \mathrm{Poisson}(\theta_i)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of $X_i$ is
$$M_i(t) = e^{\theta_i\left(e^t - 1\right)} \quad\text{for } t \in \Re$$
for $i = 1, 2, \ldots, n$. By Theorem 4.3.1 the moment generating function of $Y = \sum_{i=1}^{n} X_i$ is
$$M_Y(t) = \prod_{i=1}^{n} M_i(t) = \prod_{i=1}^{n} e^{\theta_i\left(e^t - 1\right)} = e^{\left(\sum_{i=1}^{n}\theta_i\right)\left(e^t - 1\right)} \quad\text{for } t \in \Re$$
which is the moment generating function of a $\mathrm{Poisson}\left(\sum_{i=1}^{n}\theta_i\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $\sum_{i=1}^{n} X_i \sim \mathrm{Poisson}\left(\sum_{i=1}^{n}\theta_i\right)$.

(3) Suppose $X_i \sim \mathrm{Negative\ Binomial}(k_i, p)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of $X_i$ is
$$M_i(t) = \left(\frac{p}{1 - qe^t}\right)^{k_i} \quad\text{for } t < -\log(q)$$
for $i = 1, 2, \ldots, n$. By Theorem 4.3.1 the moment generating function of $Y = \sum_{i=1}^{n} X_i$ is
$$M_Y(t) = \prod_{i=1}^{n} M_i(t) = \prod_{i=1}^{n}\left(\frac{p}{1 - qe^t}\right)^{k_i} = \left(\frac{p}{1 - qe^t}\right)^{\sum_{i=1}^{n} k_i} \quad\text{for } t < -\log(q)$$
which is the moment generating function of a $\mathrm{Negative\ Binomial}\left(\sum_{i=1}^{n} k_i, p\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $\sum_{i=1}^{n} X_i \sim \mathrm{Negative\ Binomial}\left(\sum_{i=1}^{n} k_i, p\right)$.

(4) Suppose $X_i \sim \mathrm{Exponential}(\theta)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of each $X_i$ is
$$M(t) = \frac{1}{1 - \theta t} \quad\text{for } t < \frac{1}{\theta}$$
for $i = 1, 2, \ldots, n$. By Theorem 4.3.1 the moment generating function of $Y = \sum_{i=1}^{n} X_i$ is
$$M_Y(t) = [M(t)]^n = \left(\frac{1}{1 - \theta t}\right)^n \quad\text{for } t < \frac{1}{\theta}$$
which is the moment generating function of a $\mathrm{Gamma}(n, \theta)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $\sum_{i=1}^{n} X_i \sim \mathrm{Gamma}(n, \theta)$.

(5) Suppose $X_i \sim \mathrm{Gamma}(\alpha_i, \theta)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of $X_i$ is
$$M_i(t) = \left(\frac{1}{1 - \theta t}\right)^{\alpha_i} \quad\text{for } t < \frac{1}{\theta}$$
for $i = 1, 2, \ldots, n$. By Theorem 4.3.1 the moment generating function of $Y = \sum_{i=1}^{n} X_i$ is
$$M_Y(t) = \prod_{i=1}^{n} M_i(t) = \prod_{i=1}^{n}\left(\frac{1}{1 - \theta t}\right)^{\alpha_i} = \left(\frac{1}{1 - \theta t}\right)^{\sum_{i=1}^{n}\alpha_i} \quad\text{for } t < \frac{1}{\theta}$$
which is the moment generating function of a $\mathrm{Gamma}\left(\sum_{i=1}^{n}\alpha_i, \theta\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $\sum_{i=1}^{n} X_i \sim \mathrm{Gamma}\left(\sum_{i=1}^{n}\alpha_i, \theta\right)$.

(6) Suppose $X_i \sim \chi^2(k_i)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of $X_i$ is
$$M_i(t) = \left(\frac{1}{1 - 2t}\right)^{k_i/2} \quad\text{for } t < \frac{1}{2}$$
for $i = 1, 2, \ldots, n$. By Theorem 4.3.1 the moment generating function of $Y = \sum_{i=1}^{n} X_i$ is
$$M_Y(t) = \prod_{i=1}^{n} M_i(t) = \prod_{i=1}^{n}\left(\frac{1}{1 - 2t}\right)^{k_i/2} = \left(\frac{1}{1 - 2t}\right)^{\frac{1}{2}\sum_{i=1}^{n} k_i} \quad\text{for } t < \frac{1}{2}$$
which is the moment generating function of a $\chi^2\left(\sum_{i=1}^{n} k_i\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $\sum_{i=1}^{n} X_i \sim \chi^2\left(\sum_{i=1}^{n} k_i\right)$.

(7) Suppose $X_i \sim \mathrm{N}(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$ independently. Then by Example 2.6.9 and Theorem 2.6.3,
$$\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(1) \quad\text{for } i = 1, 2, \ldots, n$$
and by (6),
$$\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)$$
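A couple of the special results above can be illustrated numerically. The sketch below (not part of the notes; all parameter values and the simulation size are arbitrary) checks result (2) by comparing an estimated probability for the sum with the Poisson probability for the summed parameter, and result (6) by checking the mean and variance of a sum of chi-squared variables.

```python
import numpy as np
from scipy import stats

# Numerical illustration of Special Results (2) and (6).
rng = np.random.default_rng(10)
n_sim = 1_000_000

# (2) Sum of independent Poissons is Poisson with the summed parameter.
thetas = [0.5, 1.2, 2.3]
s = sum(rng.poisson(th, n_sim) for th in thetas)
print(np.mean(s == 3), stats.poisson.pmf(3, sum(thetas)))   # P(sum = 3)

# (6) Sum of independent chi-squared variables adds the degrees of freedom.
dfs = [2, 3, 4]
c = sum(rng.chisquare(df, n_sim) for df in dfs)
print(c.mean(), c.var(), sum(dfs), 2 * sum(dfs))            # mean df, variance 2*df
```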
4.3.3 Exercise

Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables with moment generating function $M(t)$, $E(X_i) = \mu$, and $\mathrm{Var}(X_i) = \sigma^2 < \infty$. Give an expression for the moment generating function of $Z = \sqrt{n}\left(\bar{X} - \mu\right)/\sigma$, where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, in terms of $M(t)$.

The following theorem is one that was used in your previous probability and statistics courses without proof. The method of moment generating functions now allows us to easily prove this result.
4.3.4 Theorem - Linear Combination of Independent Normal Random Variables

If $X_i \sim \mathrm{N}(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$ independently, then
$$\sum_{i=1}^{n} a_i X_i \sim \mathrm{N}\left(\sum_{i=1}^{n} a_i\mu_i,\ \sum_{i=1}^{n} a_i^2\sigma_i^2\right)$$

Proof
Suppose $X_i \sim \mathrm{N}(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of $X_i$ is
$$M_i(t) = e^{\mu_i t + \sigma_i^2 t^2/2} \quad\text{for } t \in \Re$$
for $i = 1, 2, \ldots, n$. The moment generating function of $Y = \sum_{i=1}^{n} a_i X_i$ is
$$\begin{aligned}
M_Y(t) = E\left(e^{tY}\right) &= E\left[\exp\left(t\sum_{i=1}^{n} a_i X_i\right)\right] \\
&= \prod_{i=1}^{n} E\left[e^{(a_i t)X_i}\right] \quad\text{since } X_1, X_2, \ldots, X_n \text{ are independent random variables} \\
&= \prod_{i=1}^{n} M_i(a_i t) \\
&= \prod_{i=1}^{n} e^{\mu_i a_i t + \sigma_i^2 a_i^2 t^2/2} \\
&= \exp\left[\left(\sum_{i=1}^{n} a_i\mu_i\right)t + \left(\sum_{i=1}^{n} a_i^2\sigma_i^2\right)t^2/2\right] \quad\text{for } t \in \Re
\end{aligned}$$
which is the moment generating function of a $\mathrm{N}\left(\sum_{i=1}^{n} a_i\mu_i,\ \sum_{i=1}^{n} a_i^2\sigma_i^2\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions,
$$\sum_{i=1}^{n} a_i X_i \sim \mathrm{N}\left(\sum_{i=1}^{n} a_i\mu_i,\ \sum_{i=1}^{n} a_i^2\sigma_i^2\right)$$

4.3.5 Corollary

If $X_i \sim \mathrm{N}(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$ independently, then
$$\sum_{i=1}^{n} X_i \sim \mathrm{N}\left(n\mu,\ n\sigma^2\right)$$
and
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim \mathrm{N}\left(\mu,\ \frac{\sigma^2}{n}\right)$$
Proof
To prove that $\sum_{i=1}^{n} X_i \sim \mathrm{N}(n\mu, n\sigma^2)$, let $a_i = 1$, $\mu_i = \mu$, and $\sigma_i^2 = \sigma^2$ in Theorem 4.3.4 to obtain
$$\sum_{i=1}^{n} X_i \sim \mathrm{N}\left(\sum_{i=1}^{n}\mu,\ \sum_{i=1}^{n}\sigma^2\right)$$
or
$$\sum_{i=1}^{n} X_i \sim \mathrm{N}\left(n\mu,\ n\sigma^2\right)$$
To prove that
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim \mathrm{N}\left(\mu,\ \frac{\sigma^2}{n}\right)$$
we note that
$$\bar{X} = \sum_{i=1}^{n}\frac{1}{n}X_i$$
Let $a_i = \frac{1}{n}$, $\mu_i = \mu$, and $\sigma_i^2 = \sigma^2$ in Theorem 4.3.4 to obtain
$$\bar{X} = \sum_{i=1}^{n}\frac{1}{n}X_i \sim \mathrm{N}\left(\sum_{i=1}^{n}\frac{1}{n}\mu,\ \sum_{i=1}^{n}\frac{1}{n^2}\sigma^2\right)$$
or
$$\bar{X} \sim \mathrm{N}\left(\mu,\ \frac{\sigma^2}{n}\right)$$

The following identity will be used in proving Theorem 4.3.8.

4.3.6 Useful Identity
$$\sum_{i=1}^{n}(X_i - \mu)^2 = \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 + n\left(\bar{X} - \mu\right)^2$$

4.3.7 Exercise

Prove the identity 4.3.6.
As mentioned previously the t distribution is used to construct confidence intervals and test hypotheses regarding the mean in a $\mathrm{N}(\mu, \sigma^2)$ model. We are now able to prove the theorem on which these results are based.

4.3.8 Theorem

If $X_i \sim \mathrm{N}(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$ independently, then
$$\bar{X} \sim \mathrm{N}\left(\mu,\ \frac{\sigma^2}{n}\right)$$
independently of
$$U = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{\sigma^2} \sim \chi^2(n-1)$$
where
$$S^2 = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1}$$

Proof
For a proof that $\bar{X}$ and $S^2$ are independent random variables please see Problem 16.
By identity 4.3.6,
$$\sum_{i=1}^{n}(X_i - \mu)^2 = \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 + n\left(\bar{X} - \mu\right)^2$$
Dividing both sides by $\sigma^2$ gives
$$\underbrace{\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2}_{Y} = \underbrace{\frac{(n-1)S^2}{\sigma^2}}_{U} + \underbrace{\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2}_{V}$$
Since $\bar{X}$ and $S^2$ are independent random variables, it follows that $U$ and $V$ are independent random variables.
By 4.3.2(7),
$$Y = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)$$
with moment generating function
$$M_Y(t) = (1 - 2t)^{-n/2} \quad\text{for } t < \frac{1}{2} \tag{4.4}$$
$\bar{X} \sim \mathrm{N}\left(\mu, \frac{\sigma^2}{n}\right)$ was proved in Corollary 4.3.5. By Example 2.6.9,
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \mathrm{N}(0,1)$$
and by Theorem 2.6.3,
$$V = \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2(1)$$
with moment generating function
$$M_V(t) = (1 - 2t)^{-1/2} \quad\text{for } t < \frac{1}{2} \tag{4.5}$$
Since $U$ and $V$ are independent random variables and $Y = U + V$, then
$$M_Y(t) = E\left(e^{tY}\right) = E\left[e^{t(U+V)}\right] = E\left(e^{tU}\right)E\left(e^{tV}\right) = M_U(t)M_V(t) \tag{4.6}$$
Substituting (4.4) and (4.5) into (4.6) gives
$$(1 - 2t)^{-n/2} = M_U(t)(1 - 2t)^{-1/2} \quad\text{for } t < \frac{1}{2}$$
or
$$M_U(t) = (1 - 2t)^{-(n-1)/2} \quad\text{for } t < \frac{1}{2}$$
which is the moment generating function of a $\chi^2(n-1)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions,
$$U = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$$

4.3.9 Theorem

If $X_i \sim \mathrm{N}(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$ independently, then
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim \mathrm{t}(n-1)$$
t (n
1)
Proof
X
p
X
Z
= n
p =r
=q
S= n
(n 1)S 2
U
2
n 1
n 1
where
Z=
X
p
= n
(n
1) S 2
N (0; 1)
independently of
U=
2
2
(n
1)
Therefore by Theorem 4.2.9
X
p
S= n
t (n
1)
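Theorems 4.3.8 and 4.3.9 can also be checked by simulation. The sketch below (not part of the notes; the values of $\mu$, $\sigma$, $n$ and the number of replications are arbitrary) confirms that $(n-1)S^2/\sigma^2$ behaves like a $\chi^2(n-1)$ random variable and that the studentized mean behaves like a $\mathrm{t}(n-1)$ random variable.

```python
import numpy as np

# Simulation sketch for Theorems 4.3.8 and 4.3.9: for Normal samples,
# (n-1)S^2/sigma^2 should be chi-squared(n-1) and (Xbar - mu)/(S/sqrt(n))
# should be t(n-1).
rng = np.random.default_rng(11)
mu, sigma, n, n_sim = 5.0, 2.0, 10, 200_000

x = rng.normal(mu, sigma, size=(n_sim, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)                    # sample variance S^2

u = (n - 1) * s2 / sigma**2
print(u.mean(), u.var())                      # approx n-1 and 2(n-1)

t = (xbar - mu) / np.sqrt(s2 / n)
print(t.mean(), t.var(), (n - 1) / (n - 3))   # approx 0 and (n-1)/(n-3)
```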
The following theorem is useful for testing the equality of variances in a two sample Normal model.

4.3.10 Theorem

Suppose $X_1, X_2, \ldots, X_n$ are independent $\mathrm{N}(\mu_1, \sigma_1^2)$ random variables, and independently $Y_1, Y_2, \ldots, Y_m$ are independent $\mathrm{N}(\mu_2, \sigma_2^2)$ random variables. Let
$$S_1^2 = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1} \quad\text{and}\quad S_2^2 = \frac{\sum_{i=1}^{m}\left(Y_i - \bar{Y}\right)^2}{m-1}$$
Then
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim \mathrm{F}(n-1,\ m-1)$$

4.3.11 Exercise

Prove Theorem 4.3.10. Hint: Use Theorems 4.3.8 and 4.2.11.
4.4. CHAPTER 4 PROBLEMS
4.4
147
Chapter 4 Problems
1. Show that if X and Y are independent random variables then U = h (X) and
V = g (Y ) are also independent random variables where h and g are real-valued
functions.
2. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = 24xy for 0 < x + y < 1; 0 < x < 1; 0 < y < 1
and 0 otherwise.
(a) Find the joint probability density function of U = X + Y and V = X. Be sure
to specify the support set of (U; V ).
(b) Find the marginal probability density function of U and the marginal probability
density function of V . Be sure to specify their support sets.
3. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = e y for 0 < x < y < 1
and 0 otherwise.
(a) Find the joint probability density function of U = X + Y and V = X. Be sure
to specify the support set of (U; V ).
(b) Find the marginal probability density function of U and the marginal probability
density function of V . Be sure to specify their support sets.
4. Suppose X and Y are nonnegative continuous random variables with joint probability
density function f (x; y). Show that the probability density function of U = X + Y
is given by
Z1
g (u) = f (v; u v) dv
0
Hint: Consider the transformation U = X + Y and V = X.
5. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = 2 (x + y) for 0 < x < y < 1
and 0 otherwise.
(a) Find the joint probability density function of U = X and V = XY . Be sure to
specify the support set of (U; V ).
(b) Are U and V independent random variables?
(c) Find the marginal probability density functions of U and V. Be sure to specify
their support sets.
6. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = 4xy for 0 < x < 1; 0 < y < 1
and 0 otherwise.
(a) Find the probability density function of T = X + Y using the cumulative distribution function technique.
(b) Find the joint probability density function of S = X and T = X + Y . Find the
marginal probability density function of T and compare your answer to the one
you obtained in (a).
(c) Find the joint probability density function of U = X 2 and V = XY . Be sure to
specify the support set of (U; V ).
(d) Find the marginal probability density functions of U and V.
(e) Find E(V^3). (Hint: Are X and Y independent random variables?)
7. Suppose X and Y are continuous random variables with joint probability density
function
f (x; y) = 4xy for 0 < x < 1; 0 < y < 1
and 0 otherwise.
(a) Find the joint probability density function of U = X=Y and V = XY: Be sure
to specify the support set of (U; V ).
(b) Are U and V independent random variables?
(c) Find the marginal probability density functions of U and V. Be sure to specify
their support sets.
8. Suppose $X$ and $Y$ are independent Uniform(0, $\theta$) random variables. Find the probability density function of $U = X - Y$.
(Hint: Complete the transformation with $V = X + Y$.)
9. Suppose $Z_1 \sim$ N(0, 1) and $Z_2 \sim$ N(0, 1) independently. Let
$$X_1 = \mu_1 + \sigma_1 Z_1, \qquad X_2 = \mu_2 + \sigma_2\left[\rho Z_1 + (1 - \rho^2)^{1/2}Z_2\right]$$
where $-\infty < \mu_1, \mu_2 < \infty$, $\sigma_1, \sigma_2 > 0$ and $-1 < \rho < 1$.
(a) Show that $(X_1, X_2)^T \sim$ BVN($\mu$, $\Sigma$).
(b) Show that $(X - \mu)^T\Sigma^{-1}(X - \mu) \sim \chi^2(2)$. Hint: Show $(X - \mu)^T\Sigma^{-1}(X - \mu) = Z^T Z$ where $Z = (Z_1, Z_2)^T$.

10. Suppose $X \sim$ N($\mu$, $\sigma^2$) and $Y \sim$ N($\mu$, $\sigma^2$) independently. Let $U = X + Y$ and $V = X - Y$.
(a) Find the joint moment generating function of $U$ and $V$.
(b) Use (a) to show that $U$ and $V$ are independent random variables.
11. Let $X$ and $Y$ be independent N(0, 1) random variables and let $U = X/Y$.
(a) Show that $U \sim$ Cauchy(1, 0).
(b) Show that the Cauchy(1, 0) probability density function is the same as the t(1) probability density function.
12. Let $X_1, X_2, X_3$ be independent Exponential(1) random variables. Let the random variables $Y_1, Y_2, Y_3$ be defined by
$$Y_1 = \frac{X_1}{X_1 + X_2}, \qquad Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}, \qquad Y_3 = X_1 + X_2 + X_3$$
Show that $Y_1, Y_2, Y_3$ are independent random variables and find their marginal probability density functions.
13. Let $X_1, X_2, X_3$ be independent N(0, 1) random variables. Let the random variables $Y_1, Y_2, Y_3$ be defined by
$$X_1 = Y_1\cos Y_2\sin Y_3, \qquad X_2 = Y_1\sin Y_2\sin Y_3, \qquad X_3 = Y_1\cos Y_3$$
for $0 < y_1 < \infty$, $0 < y_2 < 2\pi$, $0 < y_3 < \pi$.
Show that $Y_1, Y_2, Y_3$ are independent random variables and find their marginal probability density functions.
14. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Poisson($\theta$) distribution. Find the conditional probability function of $X_1, X_2, \ldots, X_n$ given $T = \sum_{i=1}^{n} X_i = t$.
15. Suppose $X \sim \chi^2(n)$, $X + Y \sim \chi^2(m)$, $m > n$ and $X$ and $Y$ are independent random variables. Use the properties of moment generating functions to show that $Y \sim \chi^2(m - n)$.

16. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the N($\mu$, $\sigma^2$) distribution. In this problem we wish to show that $\bar{X}$ and $\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ are independent random variables. Note that this implies that $\bar{X}$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ are also independent random variables.
Let $U = (U_1, U_2, \ldots, U_n)$ where $U_i = X_i - \bar{X}$, $i = 1, 2, \ldots, n$ and let
$$M(s_1, s_2, \ldots, s_n, s) = E\left[\exp\left(\sum_{i=1}^{n} s_i U_i + s\bar{X}\right)\right]$$
be the joint moment generating function of $U$ and $\bar{X}$.
(a) Let $t_i = s_i - \bar{s} + s/n$, $i = 1, 2, \ldots, n$ where $\bar{s} = \frac{1}{n}\sum_{i=1}^{n} s_i$. Show that
$$E\left[\exp\left(\sum_{i=1}^{n} s_i U_i + s\bar{X}\right)\right] = E\left[\exp\left(\sum_{i=1}^{n} t_i X_i\right)\right] = \exp\left(\mu\sum_{i=1}^{n} t_i + \sigma^2\sum_{i=1}^{n} t_i^2/2\right)$$
Hint: Since $X_i \sim$ N($\mu$, $\sigma^2$), $E[\exp(t_i X_i)] = \exp\left(\mu t_i + \sigma^2 t_i^2/2\right)$.
(b) Verify that $\sum_{i=1}^{n} t_i = s$ and $\sum_{i=1}^{n} t_i^2 = \sum_{i=1}^{n}(s_i - \bar{s})^2 + s^2/n$.
(c) Use (a) and (b) to show that
$$M(s_1, s_2, \ldots, s_n, s) = \exp\left[\mu s + (\sigma^2/n)(s^2/2)\right]\exp\left[\sigma^2\sum_{i=1}^{n}(s_i - \bar{s})^2/2\right]$$
(d) Show that the random variable $\bar{X}$ is independent of the random vector $U$ and thus $\bar{X}$ and $\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ are independent. Hint: $M_{\bar{X}}(s) = M(0, 0, \ldots, 0, s)$ and $M_U(s_1, s_2, \ldots, s_n) = M(s_1, s_2, \ldots, s_n, 0)$.
5. Limiting or Asymptotic Distributions

In a previous probability course the Poisson approximation to the Binomial distribution
$$\binom{n}{x}p^x(1-p)^{n-x} \approx \frac{(np)^x e^{-np}}{x!} \quad \text{for } x = 0, 1, \ldots, n$$
if $n$ is large and $p$ is small was used.
As well the Normal approximation to the Binomial distribution
$$P(X_n \le x) = \sum_{k \le x}\binom{n}{k}p^k(1-p)^{n-k} \approx P\left(Z \le \frac{x - np}{\sqrt{np(1-p)}}\right) \quad \text{where } Z \sim N(0, 1)$$
if $n$ is large and $p$ is close to $1/2$ (a special case of the very important Central Limit Theorem) was used. These are examples of what we will call limiting or asymptotic distributions.
In this chapter we consider a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ and look at the definitions and theorems related to determining the limiting distribution of such a sequence. In Section 5.1 we define convergence in distribution and look at several examples to illustrate its meaning. In Section 5.2 we define convergence in probability and examine its relationship to convergence in distribution. In Section 5.3 we look at the Weak Law of Large Numbers which is an important theorem when examining the behaviour of estimators of unknown parameters (Chapter 6). In Section 5.4 we use the moment generating function to find limiting distributions including a proof of the Central Limit Theorem. The Central Limit Theorem was used in STAT 221/231/241 to construct an approximate confidence interval for an unknown parameter. In Section 5.5 additional limit theorems for finding limiting distributions are introduced. These additional theorems allow us to determine new limiting distributions by combining the limiting distributions which have been determined from definitions, the Weak Law of Large Numbers, and/or the Central Limit Theorem.
5.1 Convergence in Distribution

In calculus you studied sequences of real numbers $a_1, a_2, \ldots, a_n, \ldots$ and learned theorems which allowed you to evaluate limits such as $\lim_{n\to\infty} a_n$. In this course we are interested in a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ and what happens to the distribution of $X_n$ as $n \to \infty$. We do this by examining what happens to $F_n(x) = P(X_n \le x)$, the cumulative distribution function of $X_n$, as $n \to \infty$. Note that for a fixed value of $x$, the sequence $F_1(x), F_2(x), \ldots, F_n(x), \ldots$ is a sequence of real numbers. In general we will obtain a different sequence of real numbers for each different value of $x$. Since we have a sequence of real numbers we will be able to use limit theorems you have used in your previous calculus courses to evaluate $\lim_{n\to\infty} F_n(x)$. We will need to take care in determining how $F_n(x)$ behaves as $n \to \infty$ for all real values of $x$. To formalize these ideas we give the following definition for convergence in distribution of a sequence of random variables.
5.1.1 Definition - Convergence in Distribution
Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables and let $F_1(x), F_2(x), \ldots, F_n(x), \ldots$ be the corresponding sequence of cumulative distribution functions, that is, $X_n$ has cumulative distribution function $F_n(x) = P(X_n \le x)$. Let $X$ be a random variable with cumulative distribution function $F(x) = P(X \le x)$. We say $X_n$ converges in distribution to $X$ and write
$$X_n \to_D X$$
if
$$\lim_{n\to\infty} F_n(x) = F(x)$$
at all points $x$ at which $F(x)$ is continuous. We call $F$ the limiting or asymptotic distribution of $X_n$.

Note:
(1) Although we say the random variable $X_n$ converges in distribution to the random variable $X$, the definition of convergence in distribution is defined in terms of the pointwise convergence of the corresponding sequence of cumulative distribution functions.
(2) This definition holds for both discrete and continuous random variables.
(3) One way to think about convergence in distribution is that, if $X_n \to_D X$, then for large $n$
$$F_n(x) = P(X_n \le x) \approx F(x) = P(X \le x)$$
if $x$ is a point of continuity of $F(x)$. How good the approximation is will depend on the values of $n$ and $x$.

The following theorem and corollary will be useful in determining limiting distributions.
5.1.2 Theorem - e Limit
If $b$ and $c$ are real constants and $\lim_{n\to\infty}\psi(n) = 0$ then
$$\lim_{n\to\infty}\left[1 + \frac{b}{n} + \frac{\psi(n)}{n}\right]^{cn} = e^{bc}$$

5.1.3 Corollary
If $b$ and $c$ are real constants then
$$\lim_{n\to\infty}\left(1 + \frac{b}{n}\right)^{cn} = e^{bc}$$

5.1.4 Example
Let $Y_i \sim$ Exponential(1), $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ where $X_n = \max(Y_1, Y_2, \ldots, Y_n) - \log n$. Find the limiting distribution of $X_n$.

Solution
Since $Y_i \sim$ Exponential(1)
$$P(Y_i \le y) = \begin{cases} 0 & y \le 0 \\ 1 - e^{-y} & y > 0 \end{cases}$$
for $i = 1, 2, \ldots$. Since the $Y_i$'s are independent random variables
$$F_n(x) = P(X_n \le x) = P(\max(Y_1, Y_2, \ldots, Y_n) \le x + \log n) = \prod_{i=1}^{n} P(Y_i \le x + \log n) = \left[1 - e^{-(x + \log n)}\right]^n = \left[1 + \frac{(-e^{-x})}{n}\right]^n$$
for $x + \log n > 0$, that is, for $x > -\log n$.
As $n \to \infty$, $-\log n \to -\infty$ so
$$\lim_{n\to\infty} F_n(x) = \lim_{n\to\infty}\left[1 + \frac{(-e^{-x})}{n}\right]^n = e^{-e^{-x}} \quad \text{for } x \in \Re$$
by 5.1.3.
Consider the function
$$F(x) = e^{-e^{-x}} \quad \text{for } x \in \Re$$
Since $F'(x) = e^{-x}e^{-e^{-x}} > 0$ for all $x \in \Re$, therefore $F(x)$ is a continuous, increasing function for $x \in \Re$. Also $\lim_{x\to-\infty} F(x) = \lim_{x\to-\infty} e^{-e^{-x}} = 0$, and $\lim_{x\to\infty} F(x) = \lim_{x\to\infty} e^{-e^{-x}} = 1$. Therefore $F(x)$ is a cumulative distribution function for a continuous random variable. Let $X$ be a random variable with cumulative distribution function $F(x)$. Since
$$\lim_{n\to\infty} F_n(x) = F(x)$$
for all $x \in \Re$, that is at all points $x$ at which $F(x)$ is continuous, therefore
$$X_n \to_D X$$
In Figure 5.1 you can see how quickly the curves $F_n(x)$ approach the limiting curve $F(x)$.
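The convergence can also be seen by simulation. The sketch below (not part of the notes; numpy is assumed available and the choices of $n$ and the number of replications are arbitrary) generates many values of $X_n = \max(Y_1, \ldots, Y_n) - \log n$ and compares the empirical cumulative distribution function with $F(x) = e^{-e^{-x}}$ at a few points.

```python
# Simulation check of Example 5.1.4 (sketch)
import numpy as np

rng = np.random.default_rng(3)
n, reps = 50, 100_000

y = rng.exponential(1.0, size=(reps, n))
xn = y.max(axis=1) - np.log(n)

for x in (-1.0, 0.0, 1.0, 2.0):
    empirical = np.mean(xn <= x)
    limit = np.exp(-np.exp(-x))
    print(f"x={x:+.1f}  P(Xn<=x)~{empirical:.4f}  F(x)={limit:.4f}")
```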
[Figure 5.1: Graphs of $F_n(x) = \left(1 - \frac{e^{-x}}{n}\right)^n$ for $n = 1, 2, 5, 10, \infty$]

5.1.5 Example
Let $Y_i \sim$ Uniform(0, $\theta$), $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ where $X_n = \max(Y_1, Y_2, \ldots, Y_n)$. Find the limiting distribution of $X_n$.

Solution
Since $Y_i \sim$ Uniform(0, $\theta$)
$$P(Y_i \le y) = \begin{cases} 0 & y \le 0 \\ \dfrac{y}{\theta} & 0 < y < \theta \\ 1 & y \ge \theta \end{cases}$$
for $i = 1, 2, \ldots$. Since the $Y_i$'s are independent random variables
$$F_n(x) = P(X_n \le x) = P(\max(Y_1, Y_2, \ldots, Y_n) \le x) = P(Y_1 \le x, Y_2 \le x, \ldots, Y_n \le x) = \prod_{i=1}^{n} P(Y_i \le x) = \begin{cases} 0 & x \le 0 \\ \left(\dfrac{x}{\theta}\right)^n & 0 < x < \theta \\ 1 & x \ge \theta \end{cases}$$
Therefore
$$\lim_{n\to\infty} F_n(x) = \begin{cases} 0 & x < \theta \\ 1 & x \ge \theta \end{cases} = F(x)$$
In Figure 5.2 you can see how quickly the curves Fn (x) approach the limiting curve F (x).
[Figure 5.2: Graphs of $F_n(x) = \left(\frac{x}{\theta}\right)^n$ for $\theta = 2$ and $n = 1, 2, 5, 10, 100, \infty$]
It is straightforward to check that $F(x)$ is a cumulative distribution function for the discrete random variable $X$ with probability function
$$f(x) = \begin{cases} 1 & x = \theta \\ 0 & \text{otherwise} \end{cases}$$
Therefore
$$X_n \to_D X$$
Since $X$ only takes on one value with probability one, $X$ is called a degenerate random variable. When $X_n$ converges in distribution to a degenerate random variable we also call this convergence in probability to a constant as defined in the next section.
5.1.6 Comment
Suppose $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of random variables such that $X_n \to_D X$. Then for large $n$ we can use the approximation
$$P(X_n \le x) \approx P(X \le x)$$
If $X$ is degenerate at $b$ then $P(X = b) = 1$ and this approximation is not very useful. However, if the limiting distribution is degenerate then we could use this result in another way. In Example 5.1.5 we showed that if $Y_i \sim$ Uniform(0, $\theta$), $i = 1, 2, \ldots, n$ independently then $X_n = \max(Y_1, Y_2, \ldots, Y_n)$ converges in distribution to a degenerate random variable $X$ with $P(X = \theta) = 1$. This result is rather useful since, if we have observed data $y_1, y_2, \ldots, y_n$ from a Uniform(0, $\theta$) distribution and $\theta$ is unknown, then this suggests using $y_{(n)} = \max(y_1, y_2, \ldots, y_n)$ as an estimate of $\theta$ if $n$ is reasonably large. We will discuss this idea in more detail in Chapter 6.
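A small simulation (a sketch, not from the notes; numpy assumed and the value $\theta = 3$ is an arbitrary choice) illustrates the point of Comment 5.1.6: for Uniform(0, $\theta$) data the sample maximum settles down near $\theta$ as $n$ grows.

```python
# Sample maximum of Uniform(0, theta) data approaches theta (sketch)
import numpy as np

rng = np.random.default_rng(4)
theta = 3.0
for n in (10, 100, 1000, 10_000):
    y = rng.uniform(0.0, theta, size=n)
    print(n, y.max())
```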
5.2 Convergence in Probability

Definition 5.1.1 is useful for finding the limiting distribution of a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ when the sequence of corresponding cumulative distribution functions $F_1, F_2, \ldots, F_n, \ldots$ can be obtained. In other cases we may use the following definition to determine the limiting distribution.

5.2.1 Definition - Convergence in Probability
A sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ converges in probability to a random variable $X$ if, for all $\varepsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - X| \ge \varepsilon) = 0$$
or equivalently
$$\lim_{n\to\infty} P(|X_n - X| < \varepsilon) = 1$$
We write
$$X_n \to_p X$$
Convergence in probability is a stronger form of convergence than convergence in distribution in the sense that convergence in probability implies convergence in distribution as stated in the following theorem. However, if $X_n$ converges in distribution to $X$, then $X_n$ may or may not converge in probability to $X$.
5.2.2 Theorem - Convergence in Probability Implies Convergence in Distribution
If $X_n \to_p X$ then $X_n \to_D X$.

In Example 5.1.5 the limiting distribution was degenerate. When the limiting distribution is degenerate we say $X_n$ converges in probability to a constant. The following definition, which follows from Definition 5.2.1, can be used for proving convergence in probability.
5.2.3 Definition - Convergence in Probability to a Constant
A sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ converges in probability to a constant $b$ if, for all $\varepsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - b| \ge \varepsilon) = 0$$
or equivalently
$$\lim_{n\to\infty} P(|X_n - b| < \varepsilon) = 1$$
We write
$$X_n \to_p b$$
5.2.4 Example
Suppose $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of random variables with $E(X_n) = \mu_n$ and $Var(X_n) = \sigma_n^2$. If $\lim_{n\to\infty}\mu_n = a$ and $\lim_{n\to\infty}\sigma_n^2 = 0$ then show $X_n \to_p a$.

Solution
To show $X_n \to_p a$ we need to show that for all $\varepsilon > 0$
$$\lim_{n\to\infty} P(|X_n - a| < \varepsilon) = 1$$
or equivalently
$$\lim_{n\to\infty} P(|X_n - a| \ge \varepsilon) = 0$$
Recall Markov's Inequality. For all $k, c > 0$
$$P(|X| \ge c) \le \frac{E|X|^k}{c^k}$$
Therefore by Markov's Inequality with $k = 2$ and $c = \varepsilon$ we have
$$0 \le P(|X_n - a| \ge \varepsilon) \le \frac{E|X_n - a|^2}{\varepsilon^2} \tag{5.1}$$
Now
$$E|X_n - a|^2 = E\left[(X_n - \mu_n + \mu_n - a)^2\right] = E\left[(X_n - \mu_n)^2\right] + 2(\mu_n - a)E(X_n - \mu_n) + (\mu_n - a)^2$$
Since
$$\lim_{n\to\infty} E\left[(X_n - \mu_n)^2\right] = \lim_{n\to\infty} Var(X_n) = \lim_{n\to\infty}\sigma_n^2 = 0$$
and
$$\lim_{n\to\infty}(\mu_n - a) = 0 \quad \text{since } \lim_{n\to\infty}\mu_n = a$$
therefore
$$\lim_{n\to\infty} E|X_n - a|^2 = \lim_{n\to\infty} E\left[(X_n - a)^2\right] = 0$$
which also implies
$$\lim_{n\to\infty}\frac{E|X_n - a|^2}{\varepsilon^2} = 0 \tag{5.2}$$
Thus by (5.1), (5.2) and the Squeeze Theorem
$$\lim_{n\to\infty} P(|X_n - a| \ge \varepsilon) = 0$$
for all $\varepsilon > 0$ as required.
The proof in Example 5.2.4 used Definition 5.2.3 to prove convergence in probability. The reason for this is that the distribution of the $X_i$'s was not specified. Only conditions on $E(X_n)$ and $Var(X_n)$ were specified. This means that the result in Example 5.2.4 holds for any sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ satisfying the given conditions.
If the sequence of corresponding cumulative distribution functions $F_1, F_2, \ldots, F_n, \ldots$ can be obtained then the following theorem can also be used to prove convergence in probability to a constant.
5.2.5 Theorem
Suppose $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of random variables such that $X_n$ has cumulative distribution function $F_n(x)$. If
$$\lim_{n\to\infty} F_n(x) = \lim_{n\to\infty} P(X_n \le x) = \begin{cases} 0 & x < b \\ 1 & x > b \end{cases}$$
then $X_n \to_p b$.

Note: We do not need to worry about whether $\lim_{n\to\infty} F_n(b)$ exists since $x = b$ is a point of discontinuity of the limiting distribution (see Definition 5.1.1).
5.3. WEAK LAW OF LARGE NUMBERS
5.2.6
159
Example
Let Yi
Exponential( ; 1), i = 1; 2; : : : independently. Consider the sequence of random
variables X1 ; X2 ; : : : ; Xn ; : : : where Xn = min (Y1 ; Y2 ; : : : ; Yn ). Show that Xn !p .
Solution
Since Yi
Exponential( ; 1)
P (Yi > y) =
(
e
(y
)
y>
1
y
for i = 1; 2; : : :. Since Y1 ; Y2 ; : : : ; Yn are independent random variables
Fn (x) = 1
P (Xn > x) = 1
P (min (Y1 ; Y2 ; : : : ; Yn ) > x)
n
Q
P (Y1 > x; Y2 > x; : : : ; Yn > x) = 1
P (Yi > x)
= 1
8
n
Q
>
>
1
1
>
<
i=1
=
n
Q
>
>
e (x
>
: 1
i=1
x
)
x>
i=1
=
(
0
1
e
n(x
)
x
x>
Therefore
lim Fn (x) =
n!1
(
0
x
1
x>
which we note is not a cumulative distribution function since the function is not rightcontinuous at x = . However
(
0
x<
lim Fn (x) =
n!1
1
x>
and therefore by Theorem 5.2.5, Xn !p . In Figure 5.3 you can see how quickly the limit
is approached.
[Figure 5.3: Graphs of $F_n(x) = 1 - e^{-n(x - \theta)}$ for $\theta = 2$]

5.3 Weak Law of Large Numbers

In this section we look at a very important result which we will use in Chapter 6 to show that maximum likelihood estimators have good properties. This result is called the Weak Law of Large Numbers. Needless to say there is another law called the Strong Law of Large Numbers but we will not consider this law here.
Also in this section we will look at some simulations to illustrate the theoretical result in the Weak Law of Large Numbers.

5.3.1 Weak Law of Large Numbers
Suppose $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2 < \infty$. Consider the sequence of random variables $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_n, \ldots$ where
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Then
$$\bar{X}_n \to_p \mu$$

Proof
Using Definition 5.2.3 we need to show
$$\lim_{n\to\infty} P\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right) = 0 \quad \text{for all } \varepsilon > 0$$
We apply Chebyshev's Theorem (see 2.8.2) to the random variable $\bar{X}_n$ where $E\left(\bar{X}_n\right) = \mu$ (see Corollary 3.6.3(3)) and $Var\left(\bar{X}_n\right) = \sigma^2/n$ (see Theorem 3.6.7(4)) to obtain
$$P\left(\left|\bar{X}_n - \mu\right| \ge \frac{k\sigma}{\sqrt{n}}\right) \le \frac{1}{k^2} \quad \text{for all } k > 0$$
Let $k = \sqrt{n}\varepsilon/\sigma$. Then
$$0 \le P\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right) \le \frac{\sigma^2}{n\varepsilon^2}$$
Since
$$\lim_{n\to\infty}\frac{\sigma^2}{n\varepsilon^2} = 0 \quad \text{for all } \varepsilon > 0$$
therefore by the Squeeze Theorem
$$\lim_{n\to\infty} P\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right) = 0 \quad \text{for all } \varepsilon > 0$$
as required.
Notes:
(1) The proof of the Weak Law of Large Numbers does not actually require that the random
variables be identically distributed, only that they all have the same mean and variance.
As well the proof does not require knowing the distribution of these random variables.
(2) In words the Weak Law of Large Numbers says that the sample mean $\bar{X}_n$ approaches the population mean $\mu$ as $n \to \infty$.
5.3.2 Example
If $X \sim$ Pareto(1, $\beta$) then $X$ has probability density function
$$f(x) = \frac{\beta}{x^{\beta + 1}} \quad \text{for } x \ge 1$$
and 0 otherwise. $X$ has cumulative distribution function
$$F(x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 - \dfrac{1}{x^{\beta}} & \text{for } x \ge 1 \end{cases}$$
and inverse cumulative distribution function
$$F^{-1}(x) = (1 - x)^{-1/\beta} \quad \text{for } 0 < x < 1$$
Also
$$E(X) = \begin{cases} \infty & \text{if } 0 < \beta \le 1 \\ \dfrac{\beta}{\beta - 1} & \text{if } \beta > 1 \end{cases}$$
and
$$Var(X) = \frac{\beta}{(\beta - 1)^2(\beta - 2)} \quad \text{if } \beta > 2$$
Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed Pareto(1, $\beta$) random variables. By the Weak Law of Large Numbers
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to_p E(X) = \frac{\beta}{\beta - 1} \quad \text{for } \beta > 1$$
If $U_i \sim$ Uniform(0, 1), $i = 1, 2, \ldots, n$ independently then by Theorem 2.6.6
$$X_i = F^{-1}(U_i) = (1 - U_i)^{-1/\beta} \sim \text{Pareto}(1, \beta)$$
$i = 1, 2, \ldots, n$ independently. If we generate Uniform(0, 1) observations $u_1, u_2, \ldots, u_n$ using a random number generator and then let $x_i = (1 - u_i)^{-1/\beta}$ then $x_1, x_2, \ldots, x_n$ are observations from the Pareto(1, $\beta$) distribution.
The points (i; xi ), i = 1; 2; : : : ; 500 for one simulation of 500 observations from a Pareto(1; 5)
distribution are plotted in Figure 5.4.
[Figure 5.4: 500 observations from a Pareto(1, 5) distribution]
Figure 5.5 shows a plot of the points $(n, \bar{x}_n)$, $n = 1, 2, \ldots, 500$ where $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean. We note that the sample mean $\bar{x}_n$ is approaching the population mean $\mu = E(X) = \frac{5}{5 - 1} = 1.25$ as $n$ increases.
[Figure 5.5: Graph of $\bar{x}_n$ versus $n$ for 500 Pareto(1, 5) observations]
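The simulation described above is easy to reproduce; the following sketch (not part of the notes; numpy is assumed available) uses the inverse cumulative distribution function to generate Pareto(1, 5) observations and tracks the running sample mean, which should settle near $E(X) = 1.25$ as in Figures 5.4 and 5.5.

```python
# Inverse cdf simulation of Pareto(1, 5) and the running sample mean (sketch)
import numpy as np

rng = np.random.default_rng(5)
beta, n = 5.0, 500

u = rng.uniform(size=n)
x = (1.0 - u) ** (-1.0 / beta)                 # x_i = F^{-1}(u_i)
running_mean = np.cumsum(x) / np.arange(1, n + 1)

print("final running mean:", running_mean[-1], " E(X) =", beta / (beta - 1.0))
```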
If we generate a further 1000 values of xi , and plot (i; xi ) ; i = 1; 2; : : : ; 1500 we obtain the
graph in Figure 5.6.
[Figure 5.6: 1500 observations from a Pareto(1, 5) distribution]
The corresponding plot of $(n, \bar{x}_n)$, $n = 1, 2, \ldots, 1500$ is shown in Figure 5.7. We note that the sample mean $\bar{x}_n$ stays very close to the population mean $\mu = 1.25$ for $n > 500$.
[Figure 5.7: Graph of $\bar{x}_n$ versus $n$ for 1500 Pareto(1, 5) observations]
Note that these figures correspond to only one set of simulated data. If we generated another set of data using a random number generator the actual data points would change. However what would stay the same is that the sample mean for the new data set would still approach the mean value $E(X) = 1.25$ as $n$ increases.
The points $(i, x_i)$, $i = 1, 2, \ldots, 500$ for one simulation of 500 observations from a Pareto(1, 0.5) distribution are plotted in Figure 5.8. For this distribution
$$E(X) = \int_1^{\infty} x\,\frac{\beta}{x^{\beta + 1}}\,dx = \beta\int_1^{\infty}\frac{1}{x^{\beta}}\,dx \quad \text{which diverges to } \infty \text{ since } \beta = 0.5 \le 1$$
Note in Figure 5.8 that there are some very large observations. In particular there is one observation which is close to 40,000.
[Figure 5.8: 500 observations from a Pareto(1, 0.5) distribution]
The corresponding plot of $(n, \bar{x}_n)$, $n = 1, 2, \ldots, 500$ for these data is given in Figure 5.9. Note that the mean $\bar{x}_n$ does not appear to be approaching a fixed value.
[Figure 5.9: Graph of $\bar{x}_n$ versus $n$ for 500 Pareto(1, 0.5) observations]
In Figure 5.10 the points $(n, \bar{x}_n)$ for a set of 50,000 observations generated from a Pareto(1, 0.5) distribution are plotted. Note that the mean $\bar{x}_n$ does not approach a fixed value and in general is getting larger as $n$ gets large. This is consistent with $E(X)$ diverging to $\infty$.
[Figure 5.10: Graph of $\bar{x}_n$ versus $n$ for 50,000 Pareto(1, 0.5) observations]
5.3.3 Example
If $X \sim$ Cauchy(0, 1) then
$$f(x) = \frac{1}{\pi(1 + x^2)} \quad \text{for } x \in \Re$$
The probability density function, shown in Figure 5.11, is symmetric about the $y$ axis and the median of the distribution is equal to 0.
[Figure 5.11: Cauchy(0, 1) probability density function]
The mean of a Cauchy(0, 1) random variable does not exist since
$$\frac{1}{\pi}\int_{-\infty}^{0}\frac{x}{1 + x^2}\,dx \quad \text{diverges to } -\infty \tag{5.3}$$
and
$$\frac{1}{\pi}\int_{0}^{\infty}\frac{x}{1 + x^2}\,dx \quad \text{diverges to } \infty \tag{5.4}$$
The cumulative distribution function of a Cauchy(0, 1) random variable is
$$F(x) = \frac{1}{\pi}\arctan x + \frac{1}{2} \quad \text{for } x \in \Re$$
and the inverse cumulative distribution function is
$$F^{-1}(x) = \tan\left[\pi\left(x - \frac{1}{2}\right)\right] \quad \text{for } 0 < x < 1$$
If we generate Uniform(0, 1) observations $u_1, u_2, \ldots, u_N$ using a random number generator and then let $x_i = \tan\left[\pi\left(u_i - \frac{1}{2}\right)\right]$, then the $x_i$'s are observations from the Cauchy(0, 1) distribution.
The points $(n, \bar{x}_n)$ for a set of 500,000 observations generated from a Cauchy(0, 1) distribution are plotted in Figure 5.12. Note that $\bar{x}_n$ does not approach a fixed value. However, unlike the Pareto(1, 0.5) example in which $\bar{x}_n$ was getting larger as $n$ got larger, we see in Figure 5.12 that $\bar{x}_n$ drifts back and forth around the line $y = 0$. This behaviour, which is consistent with (5.3) and (5.4), continues even if more observations are generated.
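The same inverse cumulative distribution function recipe gives the Cauchy simulation; the sketch below (not from the notes; numpy assumed) shows the running mean wandering rather than settling, in line with (5.3) and (5.4).

```python
# Running mean of Cauchy(0, 1) observations does not settle down (sketch)
import numpy as np

rng = np.random.default_rng(6)
n = 500_000

u = rng.uniform(size=n)
x = np.tan(np.pi * (u - 0.5))                  # x_i = F^{-1}(u_i)
running_mean = np.cumsum(x) / np.arange(1, n + 1)

# print the running mean at a few points to see the drifting behaviour
for k in (10**3, 10**4, 10**5, n):
    print(k, running_mean[k - 1])
```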
[Figure 5.12: Graph of $\bar{x}_n$ versus $n$ for 500,000 Cauchy(0, 1) observations]
5.4 Moment Generating Function Technique for Limiting Distributions
We now look at the moment generating function technique for determining a limiting distribution. Suppose we have a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ and $M_1(t), M_2(t), \ldots, M_n(t), \ldots$ is the corresponding sequence of moment generating functions. For a fixed value of $t$, the sequence $M_1(t), M_2(t), \ldots, M_n(t), \ldots$ is a sequence of real numbers. In general we will obtain a different sequence of real numbers for each different value of $t$. Since we have a sequence of real numbers we will be able to use limit theorems you have used in your previous calculus courses to evaluate $\lim_{n\to\infty} M_n(t)$. We will need to take care in determining how $M_n(t)$ behaves as $n \to \infty$ for an interval of values of $t$ containing the value 0. Of course this technique only works if the moment generating function exists and is tractable.
5.4.1 Limit Theorem for Moment Generating Functions
Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables such that $X_n$ has moment generating function $M_n(t)$. Let $X$ be a random variable with moment generating function $M(t)$. If there exists an $h > 0$ such that
$$\lim_{n\to\infty} M_n(t) = M(t) \quad \text{for all } t \in (-h, h)$$
then
$$X_n \to_D X$$

Note:
(1) The sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ converges if the corresponding sequence of moment generating functions $M_1(t), M_2(t), \ldots, M_n(t), \ldots$ converges pointwise.
(2) This definition holds for both discrete and continuous random variables.
Recall from Definition 5.1.1 that
$$X_n \to_D X$$
if
$$\lim_{n\to\infty} F_n(x) = F(x)$$
at all points $x$ at which $F(x)$ is continuous. If $X$ is a discrete random variable then the cumulative distribution function is a right continuous function. The values of $x$ of main interest for a discrete random variable are exactly the points at which $F(x)$ is discontinuous. The following theorem indicates that $\lim_{n\to\infty} F_n(x) = F(x)$ holds for the values of $x$ at which $F(x)$ is discontinuous if $X_n$ and $X$ are non-negative integer-valued random variables. The named discrete distributions Bernoulli, Binomial, Geometric, Negative Binomial, and Poisson are all non-negative integer-valued random variables.
5.4.2 Theorem
Suppose $X_n$ and $X$ are non-negative integer-valued random variables. If $X_n \to_D X$ then $\lim_{n\to\infty} P(X_n \le x) = P(X \le x)$ holds for all $x$ and in particular
$$\lim_{n\to\infty} P(X_n = x) = P(X = x) \quad \text{for } x = 0, 1, \ldots$$
5.4.3 Example
Consider the sequence of random variables $X_1, X_2, \ldots, X_k, \ldots$ where $X_k \sim$ Negative Binomial($k$, $p$). Use Theorem 5.4.1 to determine the limiting distribution of $X_k$ as $k \to \infty$, $p \to 1$ such that $kq/p = \mu$ remains constant where $q = 1 - p$. Use this limiting distribution and Theorem 5.4.2 to give an approximation for $P(X_k = x)$.

Solution
If $X_k \sim$ Negative Binomial($k$, $p$) then
$$M_k(t) = E\left(e^{tX_k}\right) = \left(\frac{p}{1 - qe^t}\right)^k \quad \text{for } t < -\log q \tag{5.5}$$
If $\mu = kq/p$ then
$$p = \frac{k}{\mu + k} \quad \text{and} \quad q = \frac{\mu}{\mu + k} \tag{5.6}$$
Substituting (5.6) into (5.5) and simplifying gives
$$M_k(t) = \left(\frac{\dfrac{k}{\mu + k}}{1 - \dfrac{\mu}{\mu + k}e^t}\right)^k = \left[\frac{1}{1 + \dfrac{\mu}{k} - \dfrac{\mu}{k}e^t}\right]^k = \left[1 - \frac{\mu(e^t - 1)}{k}\right]^{-k} \quad \text{for } t < \log\left(\frac{\mu + k}{\mu}\right)$$
Now
$$\lim_{k\to\infty}\left[1 - \frac{\mu(e^t - 1)}{k}\right]^{-k} = e^{\mu(e^t - 1)} \quad \text{for } t \in \Re$$
by Corollary 5.1.3. Since $M(t) = e^{\mu(e^t - 1)}$ for $t \in \Re$ is the moment generating function of a Poisson($\mu$) random variable then by Theorem 5.4.1, $X_k \to_D X \sim$ Poisson($\mu$).
By Theorem 5.4.2
$$P(X_k = x) = \binom{-k}{x}p^k(-q)^x \approx \frac{\left(\dfrac{kq}{p}\right)^x e^{-kq/p}}{x!} \quad \text{for } x = 0, 1, \ldots$$
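The quality of this approximation can be checked numerically. The sketch below (not from the notes; scipy is assumed available and the values of $k$ and $p$ are arbitrary) compares the exact Negative Binomial probabilities with the Poisson($\mu = kq/p$) probabilities.

```python
# Poisson approximation to the Negative Binomial (sketch)
from scipy import stats

k, p = 50, 0.95            # q = 0.05, mu = k*q/p is about 2.63
q = 1 - p
mu = k * q / p

for x in range(6):
    exact = stats.nbinom.pmf(x, k, p)      # number of failures before the k-th success
    approx = stats.poisson.pmf(x, mu)
    print(x, round(exact, 5), round(approx, 5))
```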
5.4.4 Exercise - Poisson Approximation to the Binomial Distribution
Consider the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ where $X_n \sim$ Binomial($n$, $p$). Use Theorem 5.4.1 to determine the limiting distribution of $X_n$ as $n \to \infty$, $p \to 0$ such that $np = \mu$ remains constant. Use this limiting distribution and Theorem 5.4.2 to give an approximation for $P(X_n = x)$.
In your previous probability and statistics courses you would have used the Central Limit Theorem (without proof!) for approximating Binomial and Poisson probabilities as well as constructing approximate confidence intervals. We now give a proof of this theorem.
5.4.5 Central Limit Theorem
Suppose $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2 < \infty$. Consider the sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ where $Z_n = \dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$ and $\bar{X}_n = \dfrac{1}{n}\sum_{i=1}^{n} X_i$. Then
$$Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \mu\right)}{\sigma} \to_D Z \sim N(0, 1)$$

Proof
We can write $Z_n$ as
$$Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)$$
Suppose that for $i = 1, 2, \ldots$, $X_i$ has moment generating function $M_X(t)$, $t \in (-h, h)$ for some $h > 0$. Then for $i = 1, 2, \ldots$, $(X_i - \mu)$ has moment generating function $M(t) = e^{-\mu t}M_X(t)$, $t \in (-h, h)$ for some $h > 0$. Note that
$$M(0) = 1, \qquad M'(0) = E(X_i - \mu) = E(X_i) - \mu = 0$$
and
$$M''(0) = E\left[(X_i - \mu)^2\right] = Var(X_i) = \sigma^2$$
Also by Taylor's Theorem (see 2.11.15) for $n = 2$ we have
$$M(t) = M(0) + M'(0)t + \frac{1}{2}M''(c)t^2 = 1 + \frac{1}{2}M''(c)t^2 \tag{5.7}$$
for some $c$ between $0$ and $t$.
Since $X_1, X_2, \ldots, X_n$ are independent and identically distributed, the moment generating function of $Z_n$ is
$$M_n(t) = E\left(e^{tZ_n}\right) = E\left[\exp\left(\frac{t}{\sqrt{n}}\sum_{i=1}^{n}\frac{X_i - \mu}{\sigma}\right)\right] = \left[M\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n \quad \text{for } \left|\frac{t}{\sigma\sqrt{n}}\right| < h \tag{5.8}$$
Using (5.7) in (5.8) gives
$$M_n(t) = \left[1 + \frac{1}{2}M''(c_n)\left(\frac{t}{\sigma\sqrt{n}}\right)^2\right]^n = \left\{1 + \frac{\frac{1}{2}t^2}{n} + \frac{\frac{t^2}{2\sigma^2}\left[M''(c_n) - M''(0)\right]}{n}\right\}^n$$
for some $c_n$ between $0$ and $\frac{t}{\sigma\sqrt{n}}$, where we have used $M''(0) = \sigma^2$.
Since $c_n$ is between $0$ and $\frac{t}{\sigma\sqrt{n}}$, $c_n \to 0$ as $n \to \infty$. Since $M''(t)$ is continuous on $(-h, h)$
$$\lim_{n\to\infty} M''(c_n) = M''\left(\lim_{n\to\infty} c_n\right) = M''(0) = \sigma^2$$
and
$$\lim_{n\to\infty}\frac{t^2}{2\sigma^2}\left[M''(c_n) - M''(0)\right] = 0$$
Therefore by Theorem 5.1.2, with
$$\psi(n) = \frac{t^2}{2\sigma^2}\left[M''(c_n) - M''(0)\right]$$
we have
$$\lim_{n\to\infty} M_n(t) = \lim_{n\to\infty}\left\{1 + \frac{\frac{1}{2}t^2}{n} + \frac{\psi(n)}{n}\right\}^n = e^{\frac{1}{2}t^2} \quad \text{for all } t \in \Re$$
which is the moment generating function of a N(0, 1) random variable. Therefore by Theorem 5.4.1
$$Z_n \to_D Z \sim N(0, 1)$$
as required.

Note: Although this proof assumes that the moment generating function of $X_i$, $i = 1, 2, \ldots$ exists, it does not make any assumptions about the form of the distribution of the $X_i$'s. There are other more general proofs of the Central Limit Theorem which only assume the existence of the variance $\sigma^2$ (which implies the existence of the mean $\mu$).
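The Central Limit Theorem can also be illustrated by simulation; the sketch below (not part of the notes; numpy and scipy assumed, Exponential(1) data chosen arbitrarily) standardizes sample means and compares a few empirical probabilities with the N(0, 1) cumulative distribution function.

```python
# Simulation illustration of the Central Limit Theorem (sketch)
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, reps = 30, 100_000
mu, sigma = 1.0, 1.0                       # Exponential(1) has mean 1 and variance 1

x = rng.exponential(1.0, size=(reps, n))
zn = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z, np.mean(zn <= z), stats.norm.cdf(z))
```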
5.4.6 Example - Normal Approximation to the $\chi^2$ Distribution
Suppose $Y_n \sim \chi^2(n)$, $n = 1, 2, \ldots$. Consider the sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ where $Z_n = (Y_n - n)/\sqrt{2n}$. Show that
$$Z_n = \frac{Y_n - n}{\sqrt{2n}} \to_D Z \sim N(0, 1)$$

Solution
Let $X_i \sim \chi^2(1)$, $i = 1, 2, \ldots$ independently. Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = 1$ and $Var(X_i) = 2$, then by the Central Limit Theorem
$$\frac{\sqrt{n}\left(\bar{X}_n - 1\right)}{\sqrt{2}} \to_D Z \sim N(0, 1)$$
But $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ so
$$\frac{\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - 1\right)}{\sqrt{2}} = \frac{S_n - n}{\sqrt{2n}}$$
where $S_n = \sum_{i=1}^{n} X_i$. Therefore
$$\frac{S_n - n}{\sqrt{2n}} \to_D Z \sim N(0, 1)$$
Now by 4.3.2(6), $S_n \sim \chi^2(n)$ and therefore $Y_n$ and $S_n$ have the same distribution. It follows that
$$Z_n = \frac{Y_n - n}{\sqrt{2n}} \to_D Z \sim N(0, 1)$$
5.4.7 Exercise - Normal Approximation to the Binomial Distribution
Suppose $Y_n \sim$ Binomial($n$, $p$), $n = 1, 2, \ldots$. Consider the sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ where $Z_n = (Y_n - np)/\sqrt{np(1 - p)}$. Show that
$$Z_n = \frac{Y_n - np}{\sqrt{np(1 - p)}} \to_D Z \sim N(0, 1)$$
Hint: Let $X_i \sim$ Binomial(1, $p$), $i = 1, 2, \ldots, n$ independently.
5.5 Additional Limit Theorems
Suppose we know the limiting distribution of one or more sequences of random variables by
using the definitions and/or theorems in the previous sections of this chapter. The theorems
in this section allow us to more easily determine the limiting distribution of a function of
these sequences.
5.5.1 Limit Theorems
(1) If $X_n \to_p a$ and $g$ is continuous at $x = a$ then $g(X_n) \to_p g(a)$.
(2) If $X_n \to_p a$, $Y_n \to_p b$ and $g(x, y)$ is continuous at $(a, b)$ then $g(X_n, Y_n) \to_p g(a, b)$.
(3) (Slutsky's Theorem) If $X_n \to_p a$, $Y_n \to_D Y$ and $g(a, y)$ is continuous for all $y \in$ support set of $Y$ then $g(X_n, Y_n) \to_D g(a, Y)$.

Proof of (1)
Since $g$ is continuous at $x = a$ then for every $\varepsilon > 0$ there exists a $\delta > 0$ such that $|x - a| < \delta$ implies $|g(x) - g(a)| < \varepsilon$. By Example 2.1.4(d) this implies that
$$P(|g(X_n) - g(a)| < \varepsilon) \ge P(|X_n - a| < \delta)$$
Since $X_n \to_p a$ it follows that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
$$\lim_{n\to\infty} P(|g(X_n) - g(a)| < \varepsilon) \ge \lim_{n\to\infty} P(|X_n - a| < \delta) = 1$$
But
$$\lim_{n\to\infty} P(|g(X_n) - g(a)| < \varepsilon) \le 1$$
so by the Squeeze Theorem
$$\lim_{n\to\infty} P(|g(X_n) - g(a)| < \varepsilon) = 1$$
and therefore $g(X_n) \to_p g(a)$.
5.5.2 Example
If $X_n \to_p a > 0$, $Y_n \to_p b \ne 0$ and $Z_n \to_D Z \sim N(0, 1)$ then find the limiting distributions of each of the following:
(a) $\sqrt{X_n}$
(b) $X_n + Y_n$
(c) $Y_n + Z_n$
(d) $X_n Z_n$
(e) $Z_n^2$
Solution
(a) Let $g(x) = \sqrt{x}$ which is a continuous function for all $x \in \Re^+$. Since $X_n \to_p a$ then by 5.5.1(1), $\sqrt{X_n} = g(X_n) \to_p g(a) = \sqrt{a}$ or $\sqrt{X_n} \to_p \sqrt{a}$.
(b) Let $g(x, y) = x + y$ which is a continuous function for all $(x, y) \in \Re^2$. Since $X_n \to_p a$ and $Y_n \to_p b$ then by 5.5.1(2), $X_n + Y_n = g(X_n, Y_n) \to_p g(a, b) = a + b$ or $X_n + Y_n \to_p a + b$.
(c) Let $g(y, z) = y + z$ which is a continuous function for all $(y, z) \in \Re^2$. Since $Y_n \to_p b$ and $Z_n \to_D Z \sim N(0, 1)$ then by 5.5.1(3), $Y_n + Z_n = g(Y_n, Z_n) \to_D g(b, Z) = b + Z$ or $Y_n + Z_n \to_D b + Z$ where $Z \sim N(0, 1)$. Since $b + Z \sim N(b, 1)$, therefore $Y_n + Z_n \to_D b + Z \sim N(b, 1)$.
(d) Let $g(x, z) = xz$ which is a continuous function for all $(x, z) \in \Re^2$. Since $X_n \to_p a$ and $Z_n \to_D Z \sim N(0, 1)$ then by Slutsky's Theorem, $X_n Z_n = g(X_n, Z_n) \to_D g(a, Z) = aZ$ or $X_n Z_n \to_D aZ$ where $Z \sim N(0, 1)$. Since $aZ \sim N(0, a^2)$, therefore $X_n Z_n \to_D aZ \sim N(0, a^2)$.
(e) Let $g(x, z) = z^2$ which is a continuous function for all $(x, z) \in \Re^2$. Since $Z_n \to_D Z \sim N(0, 1)$ then by Slutsky's Theorem, $Z_n^2 = g(X_n, Z_n) \to_D g(a, Z) = Z^2$ or $Z_n^2 \to_D Z^2$ where $Z \sim N(0, 1)$. Since $Z^2 \sim \chi^2(1)$, therefore $Z_n^2 \to_D Z^2 \sim \chi^2(1)$.
5.5.3 Exercise
If $X_n \to_p a > 0$, $Y_n \to_p b \ne 0$ and $Z_n \to_D Z \sim N(0, 1)$ then find the limiting distributions of each of the following:
(a) $X_n^2$
(b) $X_n Y_n$
(c) $X_n/Y_n$
(d) $X_n - 2Z_n$
(e) $1/Z_n$
In Example 5.5.2 we identified the function $g$ in each case. As with other limit theorems we tend not to explicitly identify the function $g$ once we have a good idea of how the theorems work as illustrated in the next example.
5.5.4 Example
Suppose $X_i \sim$ Poisson($\theta$), $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ where
$$Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\bar{X}_n}}$$
Find the limiting distribution of $Z_n$.

Solution
Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \theta$ and $Var(X_i) = \theta$, then by the Central Limit Theorem
$$W_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\theta}} \to_D Z \sim N(0, 1) \tag{5.9}$$
and by the Weak Law of Large Numbers
$$\bar{X}_n \to_p \theta \tag{5.10}$$
By (5.10) and 5.5.1(1)
$$U_n = \sqrt{\frac{\bar{X}_n}{\theta}} \to_p \sqrt{\frac{\theta}{\theta}} = 1 \tag{5.11}$$
Now
$$Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\bar{X}_n}} = \frac{\dfrac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\theta}}}{\sqrt{\dfrac{\bar{X}_n}{\theta}}} = \frac{W_n}{U_n}$$
By (5.9), (5.11), and Slutsky's Theorem
$$Z_n = \frac{W_n}{U_n} \to_D \frac{Z}{1} = Z \sim N(0, 1)$$
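As an informal check of Example 5.5.4 (a sketch, not from the notes; numpy and scipy assumed, with arbitrary choices of $\theta$ and $n$), the statistic $\sqrt{n}(\bar{X}_n - \theta)/\sqrt{\bar{X}_n}$ can be simulated for Poisson data and compared with N(0, 1) quantiles.

```python
# Slutsky's Theorem in action for Poisson data (sketch)
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
theta, n, reps = 4.0, 200, 50_000

x = rng.poisson(theta, size=(reps, n))
xbar = x.mean(axis=1)
zn = np.sqrt(n) * (xbar - theta) / np.sqrt(xbar)

for q in (0.05, 0.5, 0.95):
    print(q, np.quantile(zn, q), stats.norm.ppf(q))
```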
5.5.5 Example
Suppose $X_i \sim$ Uniform(0, 1), $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $U_1, U_2, \ldots, U_n, \ldots$ where $U_n = \max(X_1, X_2, \ldots, X_n)$. Show that
(a) $U_n \to_p 1$
(b) $e^{U_n} \to_p e$
(c) $\sin(1 - U_n) \to_p 0$
(d) $V_n = n(1 - U_n) \to_D V \sim$ Exponential(1)
(e) $1 - e^{-V_n} \to_D 1 - e^{-V} \sim$ Uniform(0, 1)
(f) $(U_n + 1)^2\left[n(1 - U_n)\right] \to_D 4V \sim$ Exponential(4)
Solution
(a) Since $X_1, X_2, \ldots, X_n$ are Uniform(0, 1) random variables then for $i = 1, 2, \ldots$
$$P(X_i \le x) = \begin{cases} 0 & x \le 0 \\ x & 0 < x < 1 \\ 1 & x \ge 1 \end{cases}$$
Since $X_1, X_2, \ldots, X_n$ are independent random variables
$$F_n(u) = P(U_n \le u) = P(\max(X_1, X_2, \ldots, X_n) \le u) = \prod_{i=1}^{n} P(X_i \le u) = \begin{cases} 0 & u \le 0 \\ u^n & 0 < u < 1 \\ 1 & u \ge 1 \end{cases}$$
Therefore
$$\lim_{n\to\infty} F_n(u) = \begin{cases} 0 & u < 1 \\ 1 & u \ge 1 \end{cases}$$
and by Theorem 5.2.5
$$U_n = \max(X_1, X_2, \ldots, X_n) \to_p 1 \tag{5.12}$$
(b) By (5.12) and 5.5.1(1)
$$e^{U_n} \to_p e^1 = e$$
(c) By (5.12) and 5.5.1(1)
$$\sin(1 - U_n) \to_p \sin(1 - 1) = \sin(0) = 0$$
(d) The cumulative distribution function of $V_n = n(1 - U_n)$ is
$$G_n(v) = P(V_n \le v) = P(n(1 - \max(X_1, X_2, \ldots, X_n)) \le v) = 1 - P\left(\max(X_1, X_2, \ldots, X_n) \le 1 - \frac{v}{n}\right)$$
$$= 1 - \prod_{i=1}^{n} P\left(X_i \le 1 - \frac{v}{n}\right) = \begin{cases} 0 & v \le 0 \\ 1 - \left(1 - \dfrac{v}{n}\right)^n & 0 < v \le n \end{cases}$$
Therefore
$$\lim_{n\to\infty} G_n(v) = \begin{cases} 0 & v \le 0 \\ 1 - e^{-v} & v > 0 \end{cases}$$
which is the cumulative distribution function of an Exponential(1) random variable. Therefore by Definition 5.1.1
$$V_n = n(1 - U_n) \to_D V \sim \text{Exponential}(1) \tag{5.13}$$
(e) By (5.13) and Slutsky's Theorem
$$1 - e^{-V_n} \to_D 1 - e^{-V} \quad \text{where } V \sim \text{Exponential}(1)$$
The probability density function of $W = 1 - e^{-V}$ is
$$g(w) = e^{\log(1 - w)}\frac{d}{dw}\left[-\log(1 - w)\right] = (1 - w)\cdot\frac{1}{1 - w} = 1 \quad \text{for } 0 < w < 1$$
which is the probability density function of a Uniform(0, 1) random variable. Therefore
$$1 - e^{-V_n} \to_D 1 - e^{-V} \sim \text{Uniform}(0, 1)$$
(f) By (5.12) and 5.5.1(1)
$$(U_n + 1)^2 \to_p (1 + 1)^2 = 4 \tag{5.14}$$
By (5.13), (5.14), and Slutsky's Theorem
$$(U_n + 1)^2\left[n(1 - U_n)\right] \to_D 4V$$
where $V \sim$ Exponential(1). If $V \sim$ Exponential(1) then $4V \sim$ Exponential(4) so
$$(U_n + 1)^2\left[n(1 - U_n)\right] \to_D 4V \sim \text{Exponential}(4)$$
5.5.6 Delta Method
Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables such that
$$n^b(X_n - a) \to_D X \tag{5.15}$$
for some $b > 0$. Suppose the function $g(x)$ is differentiable at $a$ and $g'(a) \ne 0$. Then
$$n^b[g(X_n) - g(a)] \to_D g'(a)X$$

Proof
By Taylor's Theorem (2.11.15) we have
$$g(X_n) = g(a) + g'(c_n)(X_n - a)$$
or
$$g(X_n) - g(a) = g'(c_n)(X_n - a) \tag{5.16}$$
where $c_n$ is between $a$ and $X_n$.
From (5.15) it follows that $X_n \to_p a$. Since $c_n$ is between $X_n$ and $a$, therefore $c_n \to_p a$ and by 5.5.1(1)
$$g'(c_n) \to_p g'(a) \tag{5.17}$$
Multiplying (5.16) by $n^b$ gives
$$n^b[g(X_n) - g(a)] = g'(c_n)\,n^b(X_n - a) \tag{5.18}$$
Therefore by (5.15), (5.17), (5.18) and Slutsky's Theorem
$$n^b[g(X_n) - g(a)] \to_D g'(a)X$$
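A numerical sketch (not from the notes; numpy assumed, with arbitrary choices of $\theta$ and $n$) of the Delta Method: for Exponential($\theta$) data, $\sqrt{n}(\bar{X}_n - \theta) \to_D N(0, \theta^2)$, so with $g(x) = 1/x$ the Delta Method gives $\sqrt{n}\left(1/\bar{X}_n - 1/\theta\right) \to_D N(0, 1/\theta^2)$; the simulated variance should be close to $1/\theta^2$.

```python
# Numerical check of the Delta Method with g(x) = 1/x for Exponential data (sketch)
import numpy as np

rng = np.random.default_rng(10)
theta, n, reps = 2.0, 400, 50_000

x = rng.exponential(theta, size=(reps, n))       # E(X_i) = theta, Var(X_i) = theta^2
xbar = x.mean(axis=1)
w = np.sqrt(n) * (1.0 / xbar - 1.0 / theta)

# Delta Method: limiting variance is [g'(theta)]^2 * theta^2 = 1 / theta^2
print("simulated variance:", w.var(), " predicted:", 1.0 / theta**2)
```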
5.5.7 Example
Suppose $X_i \sim$ Exponential($\theta$), $i = 1, 2, \ldots$ independently. Find the limiting distributions of each of the following:
(a) $\bar{X}_n$
(b) $U_n = \sqrt{n}\left(\bar{X}_n - \theta\right)$
(c) $Z_n = \dfrac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\bar{X}_n}$
(d) $V_n = \sqrt{n}\left[\log(\bar{X}_n) - \log\theta\right]$
Solution
(a) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \theta$ and $Var(X_i) = \theta^2$, then by the Weak Law of Large Numbers
$$\bar{X}_n \to_p \theta \tag{5.19}$$
(b) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \theta$ and $Var(X_i) = \theta^2$, then by the Central Limit Theorem
$$W_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\theta} \to_D Z \sim N(0, 1) \tag{5.20}$$
Therefore by Slutsky's Theorem
$$U_n = \sqrt{n}\left(\bar{X}_n - \theta\right) \to_D \theta Z \sim N(0, \theta^2) \tag{5.21}$$
(c) $Z_n$ can be written as
$$Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\bar{X}_n} = \frac{\dfrac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\theta}}{\dfrac{\bar{X}_n}{\theta}}$$
By (5.19) and 5.5.1(1),
$$\frac{\bar{X}_n}{\theta} \to_p \frac{\theta}{\theta} = 1 \tag{5.22}$$
By (5.20), (5.22), and Slutsky's Theorem
$$Z_n \to_D \frac{Z}{1} = Z \sim N(0, 1)$$
(d) Let $g(x) = \log x$, $a = \theta$, and $b = 1/2$. Then $g'(x) = \frac{1}{x}$ and $g'(a) = g'(\theta) = \frac{1}{\theta}$. By (5.21) and the Delta Method
$$n^{1/2}\left[\log(\bar{X}_n) - \log\theta\right] \to_D \frac{1}{\theta}(\theta Z) = Z \sim N(0, 1)$$
5.5.8 Exercise
Suppose $X_i \sim$ Poisson($\theta$), $i = 1, 2, \ldots$ independently. Show that
$$U_n = \sqrt{n}\left(\bar{X}_n - \theta\right) \to_D U \sim N(0, \theta)$$
and
$$V_n = \sqrt{n}\left(\sqrt{\bar{X}_n} - \sqrt{\theta}\right) \to_D V \sim N\left(0, \frac{1}{4}\right)$$

5.5.9 Theorem
Theorem
Let X1 ; X2 ; : : : ; Xn ; : : : be a sequence of random variables such that
p
n(Xn
a) !D X
N 0;
2
(5.23)
Suppose the function g(x) is di¤erentiable at a. Then
p
n[g(Xn )
g(a)] !D W
N 0; g 0 (a)
2
2
provided g 0 (a) 6= 0.
Proof
Suppose g(x) is a di¤erentiable function at a and g 0 (a) 6= 0. Let b = 1=2. Then by (5.23)
and the Delta Method it follows that
p
n[g(Xn )
g(a)] !D W
N 0; g 0 (a)
2
2
5.6 Chapter 5 Problems
1. Suppose $Y_i \sim$ Exponential($\theta$, 1), $i = 1, 2, \ldots$ independently. Find the limiting distributions of
(a) $X_n = \min(Y_1, Y_2, \ldots, Y_n)$
(b) $U_n = X_n/\theta$
(c) $V_n = n(X_n - \theta)$
(d) $W_n = n^2(X_n - \theta)$
2. Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed continuous random variables with cumulative distribution function $F(x)$ and probability density function $f(x)$. Let $Y_n = \max(X_1, X_2, \ldots, X_n)$.
Show that
$$Z_n = n[1 - F(Y_n)] \to_D Z \sim \text{Exponential}(1)$$
3. Suppose $X_i \sim$ Poisson($\theta$), $i = 1, 2, \ldots$ independently. Find $M_n(t)$ the moment generating function of
$$Y_n = \sqrt{n}\left(\bar{X}_n - \theta\right)$$
Show that
$$\lim_{n\to\infty}\log M_n(t) = \frac{1}{2}\theta t^2$$
What is the limiting distribution of $Y_n$?
4. Suppose $X_i \sim$ Exponential($\theta$), $i = 1, 2, \ldots$ independently. Show that the moment generating function of
$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\theta}{\theta\sqrt{n}}$$
is
$$M_n(t) = \left[e^{-t/\sqrt{n}}\left(1 - \frac{t}{\sqrt{n}}\right)^{-1}\right]^n$$
Find $\lim_{n\to\infty} M_n(t)$ and thus determine the limiting distribution of $Z_n$.
5. If $Z \sim$ N(0, 1) and $W_n \sim \chi^2(n)$ independently then we know
$$T_n = \frac{Z}{\sqrt{W_n/n}} \sim t(n)$$
Show that
$$T_n \to_D Y \sim \text{N}(0, 1)$$
6. Suppose $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \mu$, $Var(X_i) = \sigma^2 < \infty$, and $E(X_i^4) < \infty$. Let
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2 \qquad \text{and} \qquad T_n = \frac{\sqrt{n}\left(\bar{X}_n - \mu\right)}{S_n}$$
Show that
$$S_n \to_p \sigma \qquad \text{and} \qquad T_n \to_D Z \sim \text{N}(0, 1)$$
7. Let $X_n \sim$ Binomial($n$, $\theta$). Find the limiting distributions of
(a) $T_n = \dfrac{X_n}{n}$
(b) $U_n = \dfrac{X_n}{n}\left(1 - \dfrac{X_n}{n}\right)$
(c) $W_n = \sqrt{n}\left(\dfrac{X_n}{n} - \theta\right)$
(d) $Z_n = \dfrac{W_n}{\sqrt{U_n}}$
(e) $V_n = \sqrt{n}\left[\arcsin\sqrt{\dfrac{X_n}{n}} - \arcsin\sqrt{\theta}\right]$
(f) Compare the variances of the limiting distributions of $W_n$, $Z_n$ and $V_n$ and comment.
8. Suppose $X_i \sim$ Geometric($\theta$), $i = 1, 2, \ldots$ independently. Let
$$Y_n = \sum_{i=1}^{n} X_i$$
Find the limiting distributions of
(a) $\bar{X}_n = \dfrac{Y_n}{n}$
(b) $W_n = \sqrt{n}\left(\bar{X}_n - \dfrac{1 - \theta}{\theta}\right)$
(c) $V_n = \dfrac{1}{1 + \bar{X}_n}$
(d) $Z_n = \dfrac{\sqrt{n}\left(V_n - \theta\right)}{\sqrt{V_n^2\left(1 - V_n\right)}}$
9. Suppose $X_i \sim$ Gamma(2, $\theta$), $i = 1, 2, \ldots$ independently. Let
$$Y_n = \sum_{i=1}^{n} X_i$$
Find the limiting distributions of
(a) $\bar{X}_n = \dfrac{Y_n}{n}$
(b) $W_n = \dfrac{\sqrt{n}\left(\bar{X}_n - 2\theta\right)}{\sqrt{2}\,\theta}$
(c) $Z_n = \dfrac{\sqrt{n}\left(\bar{X}_n - 2\theta\right)}{\bar{X}_n/\sqrt{2}}$
(d) $U_n = \sqrt{n}\left[\log(\bar{X}_n) - \log(2\theta)\right]$ and $V_n = \sqrt{n}\left(\bar{X}_n - 2\theta\right)$
(e) Compare the variances of the limiting distributions of $Z_n$ and $U_n$.
10. Suppose $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of random variables with
$$E(X_n) = \theta \qquad \text{and} \qquad Var(X_n) = \frac{a}{n^p}$$
Show that $X_n \to_p \theta$ for $p > 0$.
6. Maximum Likelihood Estimation - One Parameter
In this chapter we look at the method of maximum likelihood to obtain both point and interval estimates of one unknown parameter. Some of this material was introduced in a previous statistics course such as STAT 221/231/241.
In Section 6.2 we review the definitions needed for the method of maximum likelihood estimation, the derivations of the maximum likelihood estimates for the unknown parameter in the Binomial, Poisson and Exponential models, and the important invariance property of maximum likelihood estimates. You will notice that we pay more attention to verifying that the maximum likelihood estimate does correspond to a maximum using the first derivative test. Example 6.2.9 is new and illustrates how the maximum likelihood estimate is found when the support set of the random variable depends on the unknown parameter.
In Section 6.3 we define the score function, the information function, and the expected information function. These functions play an important role in the distribution of the maximum likelihood estimator. These functions are also used in Newton's Method which is a method for determining the maximum likelihood estimate in cases where there is no explicit solution. Although the maximum likelihood estimates in nearly all the examples you saw previously could be found explicitly, this is not true in general.
In Section 6.4 we review likelihood intervals. Likelihood intervals provide a way to summarize the uncertainty in an estimate. In Section 6.5 we give a theorem on the limiting distribution of the maximum likelihood estimator. This important theorem tells us why maximum likelihood estimators are good estimators.
In Section 6.6 we review how to find a confidence interval using a pivotal quantity. Confidence intervals also give us a way to summarize the uncertainty in an estimate. We also give a theorem on how to obtain a pivotal quantity using the maximum likelihood estimator if the parameter is either a scale or location parameter. In Section 6.7 we review how to find an approximate confidence interval using an asymptotic pivotal quantity. We then show how to use asymptotic pivotal quantities based on the limiting distribution of the maximum likelihood estimator to construct approximate confidence intervals.
6.1 Introduction
Suppose the random variable $X$ (possibly a vector of random variables) has probability function/probability density function $f(x; \theta)$. Suppose also that $\theta$ is unknown and $\theta \in \Omega$ where $\Omega$ is the parameter space or the set of possible values of $\theta$. Let $X$ be the potential data that is to be collected. In your previous statistics course you learned how numerical and graphical summaries as well as goodness of fit tests could be used to check whether the assumed model for an observed set of data $x$ was reasonable. In this course we will assume that the fit of the model has been checked and that the main focus now is to use the model and the data to determine point and interval estimates of $\theta$.
6.1.1 Definition - Statistic
A statistic, $T = T(X)$, is a function of the data $X$ which does not depend on any unknown parameters.
6.1.2 Example
Suppose $X = (X_1, X_2, \ldots, X_n)$ is a random sample, that is, $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables, from a distribution with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$ where $\mu$ and $\sigma^2$ are unknown.
The sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$, and the sample minimum $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$ are statistics.
The random variable
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is not a statistic since it is not only a function of the data $X$ but also a function of the unknown parameters $\mu$ and $\sigma^2$.
6.1.3 Definition - Estimator and Estimate
A statistic $T = T(X)$ that is used to estimate $\tau(\theta)$, a function of $\theta$, is called an estimator of $\tau(\theta)$ and an observed value of the statistic $t = t(x)$ is called an estimate of $\tau(\theta)$.
6.1.4 Example
Suppose $X = (X_1, X_2, \ldots, X_n)$ are independent and identically distributed random variables from a distribution with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$. Suppose $x = (x_1, x_2, \ldots, x_n)$ is an observed random sample from this distribution.
The random variable $\bar{X}$ is an estimator of $\mu$. The number $\bar{x}$ is an estimate of $\mu$.
The random variable $S$ is an estimator of $\sigma$. The number $s$ is an estimate of $\sigma$.
6.2 Maximum Likelihood Method
Suppose $X$ is a discrete random variable with probability function $P(X = x; \theta) = f(x; \theta)$, $\theta \in \Omega$ where the scalar parameter $\theta$ is unknown. Suppose also that $x$ is an observed value of the random variable $X$. Then the probability of observing this value is $P(X = x; \theta) = f(x; \theta)$. With the observed value of $x$ substituted into $f(x; \theta)$ we have a function of the parameter $\theta$ only, referred to as the likelihood function and denoted $L(\theta; x)$ or $L(\theta)$. In the absence of any other information, it seems logical that we should estimate the parameter $\theta$ using a value most compatible with the data. For example we might choose the value of $\theta$ which maximizes the probability of the observed data or equivalently the value of $\theta$ which maximizes the likelihood function.
6.2.1 Definition - Likelihood Function: Discrete Case
Suppose $X$ is a discrete random variable with probability function $f(x; \theta)$, where $\theta$ is a scalar and $\theta \in \Omega$, and $x$ is an observed value of $X$. The likelihood function for $\theta$ based on the observed data $x$ is
$$L(\theta) = L(\theta; x) = P(\text{observing the data } x; \theta) = P(X = x; \theta) = f(x; \theta) \quad \text{for } \theta \in \Omega$$
If $X = (X_1, X_2, \ldots, X_n)$ is a random sample from a distribution with probability function $f(x; \theta)$ and $x = (x_1, x_2, \ldots, x_n)$ are the observed data then the likelihood function for $\theta$ based on the observed data $x$ is
$$L(\theta) = L(\theta; x) = P(\text{observing the data } x; \theta) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta) \quad \text{for } \theta \in \Omega$$
6.2.2 Definition - Maximum Likelihood Estimate and Estimator
The value of $\theta$ that maximizes the likelihood function $L(\theta)$ is called the maximum likelihood estimate. The maximum likelihood estimate is a function of the observed data $x$ and we write $\hat{\theta} = \hat{\theta}(x)$. The corresponding maximum likelihood estimator, which is a random variable, is denoted by $\tilde{\theta} = \tilde{\theta}(X)$.

The shape of the likelihood function and the value of $\theta$ at which it is maximized are not affected if $L(\theta)$ is multiplied by a constant. Indeed it is not the absolute value of the likelihood function that is important but the relative values at two different values of the parameter, for example, $L(\theta_1)/L(\theta_2)$. This ratio can be interpreted as how much more or less consistent the data are with the parameter $\theta_1$ as compared to $\theta_2$. The ratio $L(\theta_1)/L(\theta_2)$ is also unaffected if $L(\theta)$ is multiplied by a constant. In view of this the likelihood may be defined as $P(X = x; \theta)$ or as any constant multiple of it.
To find the maximum likelihood estimate we usually solve the equation $\frac{d}{d\theta}L(\theta) = 0$. If the value of $\theta$ which maximizes $L(\theta)$ occurs at an endpoint of $\Omega$ then, of course, solving this equation does not provide the value at which the likelihood is maximized. In Example 6.2.9 we see that solving $\frac{d}{d\theta}L(\theta) = 0$ does not give the maximum likelihood estimate. Most often however we will find $\hat{\theta}$ by solving $\frac{d}{d\theta}L(\theta) = 0$.
Since the log function (Note: $\log = \ln$) is an increasing function, the value of $\theta$ which maximizes the likelihood $L(\theta)$ also maximizes $\log L(\theta)$, the logarithm of the likelihood function. Since it is usually simpler to find the derivative of the sum of $n$ terms rather than the product, it is often easier to determine the maximum likelihood estimate of $\theta$ by solving $\frac{d}{d\theta}\log L(\theta) = 0$.
It is important to verify that $\hat{\theta}$ is the value of $\theta$ which maximizes $L(\theta)$ or equivalently $l(\theta)$. This can be done using the first derivative test. Recall that the second derivative test checks for a local extremum.
6.2.3 Definition - Log Likelihood Function
The log likelihood function is defined as
$$l(\theta) = l(\theta; x) = \log L(\theta) \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data and $\log$ is the natural logarithmic function.
6.2.4 Example
Suppose in a sequence of $n$ Bernoulli trials the probability of success is equal to $\theta$ and we have observed $x$ successes. Find the likelihood function, the log likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$.

Solution
The likelihood function for $\theta$ based on $x$ successes in $n$ trials is
$$L(\theta) = P(X = x; \theta) = \binom{n}{x}\theta^x(1 - \theta)^{n - x} \quad \text{for } 0 \le \theta \le 1$$
or more simply
$$L(\theta) = \theta^x(1 - \theta)^{n - x} \quad \text{for } 0 \le \theta \le 1$$
Suppose $x \ne 0$ and $x \ne n$. The log likelihood function is
$$l(\theta) = x\log\theta + (n - x)\log(1 - \theta) \quad \text{for } 0 < \theta < 1$$
with derivative
$$\frac{d}{d\theta}l(\theta) = \frac{x}{\theta} - \frac{n - x}{1 - \theta} = \frac{x(1 - \theta) - \theta(n - x)}{\theta(1 - \theta)} = \frac{x - n\theta}{\theta(1 - \theta)} \quad \text{for } 0 < \theta < 1$$
The solution to $\frac{d}{d\theta}l(\theta) = 0$ is $\theta = x/n$ which is the sample proportion. Since $\frac{d}{d\theta}l(\theta) > 0$ if $0 < \theta < x/n$ and $\frac{d}{d\theta}l(\theta) < 0$ if $x/n < \theta < 1$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = x/n$.
If $x = 0$ then
$$L(\theta) = (1 - \theta)^n \quad \text{for } 0 \le \theta \le 1$$
which is a decreasing function of $\theta$ on the interval $[0, 1]$. $L(\theta)$ is maximized at the endpoint $\theta = 0$ or $\theta = 0/n$.
If $x = n$ then
$$L(\theta) = \theta^n \quad \text{for } 0 \le \theta \le 1$$
which is an increasing function of $\theta$ on the interval $[0, 1]$. $L(\theta)$ is maximized at the endpoint $\theta = 1$ or $\theta = n/n$.
In all cases the value of $\theta$ which maximizes the likelihood function is the sample proportion $\theta = x/n$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = x/n$ and the maximum likelihood estimator is $\tilde{\theta} = X/n$.
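The maximization can also be done numerically as a sanity check. The following sketch (not from the notes; the data values are hypothetical and numpy is assumed available) evaluates the Binomial log likelihood on a grid and confirms that it peaks at the sample proportion $x/n$.

```python
# Numerical check that the Binomial log likelihood peaks at theta = x/n (sketch)
import numpy as np

n, x = 25, 9                                   # hypothetical data: 9 successes in 25 trials
theta = np.linspace(0.001, 0.999, 9999)
loglik = x * np.log(theta) + (n - x) * np.log(1 - theta)

print("grid maximizer:", theta[np.argmax(loglik)], " x/n =", x / n)
```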
6.2.5 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Poisson($\theta$) distribution. Find the likelihood function, the log likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$.

Solution
The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} P(X_i = x_i; \theta) = \prod_{i=1}^{n}\frac{\theta^{x_i}e^{-\theta}}{x_i!} = \left(\prod_{i=1}^{n}\frac{1}{x_i!}\right)\theta^{\sum_{i=1}^{n} x_i}e^{-n\theta} \quad \text{for } \theta \ge 0$$
or more simply
$$L(\theta) = \theta^{n\bar{x}}e^{-n\theta} \quad \text{for } \theta \ge 0$$
Suppose $\bar{x} \ne 0$. The log likelihood function is
$$l(\theta) = n(\bar{x}\log\theta - \theta) \quad \text{for } \theta > 0$$
with derivative
$$\frac{d}{d\theta}l(\theta) = n\left(\frac{\bar{x}}{\theta} - 1\right) = \frac{n}{\theta}(\bar{x} - \theta) \quad \text{for } \theta > 0$$
The solution to $\frac{d}{d\theta}l(\theta) = 0$ is $\theta = \bar{x}$ which is the sample mean. Since $\frac{d}{d\theta}l(\theta) > 0$ if $0 < \theta < \bar{x}$ and $\frac{d}{d\theta}l(\theta) < 0$ if $\theta > \bar{x}$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = \bar{x}$.
If $\bar{x} = 0$ then
$$L(\theta) = e^{-n\theta} \quad \text{for } \theta \ge 0$$
which is a decreasing function of $\theta$ on the interval $[0, \infty)$. $L(\theta)$ is maximized at the endpoint $\theta = 0$ or $\theta = 0 = \bar{x}$.
In all cases the value of $\theta$ which maximizes the likelihood function is the sample mean $\theta = \bar{x}$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = \bar{x}$ and the maximum likelihood estimator is $\tilde{\theta} = \bar{X}$.
6.2.6 Likelihood Functions for Continuous Models
Suppose $X$ is a continuous random variable with probability density function $f(x; \theta)$. For a continuous random variable, $P(X = x; \theta)$ is unsuitable as a definition of the likelihood function since this probability always equals zero.
For continuous data we usually observe only the value of $X$ rounded to some degree of precision, for example, data on waiting times is rounded to the closest second or data on heights is rounded to the closest centimeter. The actual observation is really a discrete random variable. For example, suppose we observe $X$ correct to one decimal place. Then
$$P(\text{we observe } 1.1; \theta) = \int_{1.05}^{1.15} f(x; \theta)\,dx \approx (0.1)f(1.1; \theta)$$
assuming the function $f(x; \theta)$ is reasonably smooth over the interval. More generally, suppose $x_1, x_2, \ldots, x_n$ are the observations from a random sample from the distribution with probability density function $f(x; \theta)$ which have been rounded to the nearest $\Delta$ which is assumed to be small. Then
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n; \theta) \approx \prod_{i=1}^{n}\Delta f(x_i; \theta) = \Delta^n\prod_{i=1}^{n} f(x_i; \theta)$$
If we assume that the precision $\Delta$ does not depend on the unknown parameter $\theta$, then the term $\Delta^n$ can be ignored. This argument leads us to adopt the following definition of the likelihood function for a random sample from a continuous distribution.
6.2.7 Definition - Likelihood Function: Continuous Case
If $x = (x_1, x_2, \ldots, x_n)$ are the observed values of a random sample from a distribution with probability density function $f(x; \theta)$, then the likelihood function is defined as
$$L(\theta) = L(\theta; x) = \prod_{i=1}^{n} f(x_i; \theta) \quad \text{for } \theta \in \Omega$$
6.2.8 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Exponential($\theta$) distribution. Find the likelihood function, the log likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$.

Solution
The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n}\frac{1}{\theta}e^{-x_i/\theta} = \frac{1}{\theta^n}\exp\left(-\sum_{i=1}^{n} x_i/\theta\right) = \theta^{-n}e^{-n\bar{x}/\theta} \quad \text{for } \theta > 0$$
The log likelihood function is
$$l(\theta) = -n\left(\log\theta + \frac{\bar{x}}{\theta}\right) \quad \text{for } \theta > 0$$
with derivative
$$\frac{d}{d\theta}l(\theta) = -\frac{n}{\theta} + \frac{n\bar{x}}{\theta^2} = \frac{n}{\theta^2}(\bar{x} - \theta)$$
Now $\frac{d}{d\theta}l(\theta) = 0$ for $\theta = \bar{x}$. Since $\frac{d}{d\theta}l(\theta) > 0$ if $0 < \theta < \bar{x}$ and $\frac{d}{d\theta}l(\theta) < 0$ if $\theta > \bar{x}$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = \bar{x}$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = \bar{x}$ and the maximum likelihood estimator is $\tilde{\theta} = \bar{X}$.
6.2.9 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Uniform(0, $\theta$) distribution. Find the likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$.

Solution
The probability density function of a Uniform(0, $\theta$) random variable is
$$f(x; \theta) = \frac{1}{\theta} \quad \text{for } 0 \le x \le \theta$$
and zero otherwise. The support set of the random variable $X$ is $[0, \theta]$ which depends on the unknown parameter $\theta$. In such examples care must be taken in determining the maximum likelihood estimate of $\theta$.
The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n}\frac{1}{\theta} = \frac{1}{\theta^n} \quad \text{if } 0 \le x_i \le \theta, \ i = 1, 2, \ldots, n, \text{ and } \theta > 0$$
To determine the value of $\theta$ which maximizes $L(\theta)$ we note that $L(\theta)$ can be written as
$$L(\theta) = \begin{cases} 0 & \text{if } 0 < \theta < x_{(n)} \\ \dfrac{1}{\theta^n} & \text{if } \theta \ge x_{(n)} \end{cases}$$
where $x_{(n)} = \max(x_1, x_2, \ldots, x_n)$ is the maximum of the sample. To see this remember that in order to observe the sample $x_1, x_2, \ldots, x_n$ the value of $\theta$ must be larger than all the observed $x_i$'s.
$L(\theta)$ is a decreasing function of $\theta$ on the interval $[x_{(n)}, \infty)$. Therefore $L(\theta)$ is maximized at $\theta = x_{(n)}$. The maximum likelihood estimate of $\theta$ is $\hat{\theta} = x_{(n)}$ and the maximum likelihood estimator is $\tilde{\theta} = X_{(n)}$.

Note: In this example there is no solution to $\frac{d}{d\theta}l(\theta) = \frac{d}{d\theta}(-n\log\theta) = 0$ and the maximum likelihood estimate of $\theta$ is not found by solving $\frac{d}{d\theta}l(\theta) = 0$.
One of the reasons the method of maximum likelihood is so widely used is the invariance
property of the maximum likelihood estimate under one-to-one transformations.
6.2.10 Theorem - Invariance of the Maximum Likelihood Estimate
If $\hat{\theta}$ is the maximum likelihood estimate of $\theta$ then $g(\hat{\theta})$ is the maximum likelihood estimate of $g(\theta)$.

Note: The invariance property of the maximum likelihood estimate means that if we know the maximum likelihood estimate of $\theta$ then we know the maximum likelihood estimate of any function of $\theta$.
6.2.11 Example
In Example 6.2.8 find the maximum likelihood estimate of the median of the distribution and the maximum likelihood estimate of $Var(\tilde{\theta})$.

Solution
If $X$ has an Exponential($\theta$) distribution then the median $m$ is found by solving
$$0.5 = \int_0^m \frac{1}{\theta}e^{-x/\theta}\,dx$$
to obtain
$$m = -\theta\log(0.5)$$
By the Invariance of the Maximum Likelihood Estimate the maximum likelihood estimate of $m$ is $\hat{m} = -\hat{\theta}\log(0.5) = -\bar{x}\log(0.5)$.
Since $X_i$ has an Exponential($\theta$) distribution with $Var(X_i) = \theta^2$, $i = 1, 2, \ldots, n$ independently, the variance of the maximum likelihood estimator $\tilde{\theta} = \bar{X}$ is
$$Var\left(\tilde{\theta}\right) = Var\left(\bar{X}\right) = \frac{\theta^2}{n}$$
By the Invariance of the Maximum Likelihood Estimate the maximum likelihood estimate of $Var(\tilde{\theta})$ is $\hat{\theta}^2/n = \bar{x}^2/n$.
6.3 Score and Information Functions

The derivative of the log likelihood function plays an important role in the method of maximum likelihood. This function is often called the score function.

6.3.1 Definition - Score Function
The score function is defined as
$$S(\theta) = S(\theta; x) = \frac{d}{d\theta}l(\theta) = \frac{d}{d\theta}\log L(\theta) \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data.

Another function which plays an important role in the method of maximum likelihood is the information function.
6.3.2 Definition - Information Function

The information function is defined as
$$I(\theta)=I(\theta;x)=-\frac{d^{2}}{d\theta^{2}}l(\theta)=-\frac{d^{2}}{d\theta^{2}}\log L(\theta)\quad\text{for }\theta\in\Omega$$
where $x$ are the observed data. $I(\hat{\theta})$ is called the observed information.

In Section 6.7 we will see how the observed information $I(\hat{\theta})$ can be used to construct
an approximate confidence interval for the unknown parameter $\theta$. $I(\hat{\theta})$ also tells us about
the concavity of the log likelihood function $l(\theta)$.
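For illustration (a sketch added here, not part of the course notes), the observed information can also be checked numerically by a finite-difference approximation to $-l''(\hat{\theta})$. The sketch below does this for the Exponential($\theta$) log likelihood of Example 6.2.8, using a hypothetical data vector x; the exact value for that model is $I(\hat{\theta})=n/\hat{\theta}^{2}$.

# hypothetical data, for illustration only
x <- c(1.2, 0.7, 3.5, 2.1, 0.4)
expll <- function(theta) {-length(x)*log(theta) - sum(x)/theta}
thetahat <- mean(x)
# central second difference approximation to -l''(thetahat)
h <- 1e-4
-(expll(thetahat + h) - 2*expll(thetahat) + expll(thetahat - h))/h^2
length(x)/thetahat^2   # exact observed information for the exponential model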
6.3.3 Example

Find the observed information for Example 6.2.5. Suppose the maximum likelihood estimate of $\theta$ was $\hat{\theta}=2$. Compare $I(\hat{\theta})=I(2)$ if $n=10$ and $n=25$. Plot the function
$r(\theta)=l(\theta)-l(\hat{\theta})$ for $n=10$ and $n=25$ on the same graph.

Solution

From Example 6.2.5, the score function is
$$S(\theta)=\frac{d}{d\theta}l(\theta)=\frac{n}{\theta}\left(\bar{x}-\theta\right)\quad\text{for }\theta>0$$

Therefore the information function is
$$I(\theta)=-\frac{d^{2}}{d\theta^{2}}l(\theta)=-n\frac{d}{d\theta}\left[\frac{\bar{x}}{\theta}-1\right]=\frac{n\bar{x}}{\theta^{2}}$$

and the observed information is
$$I(\hat{\theta})=\frac{n\bar{x}}{\hat{\theta}^{2}}=\frac{n\hat{\theta}}{\hat{\theta}^{2}}=\frac{n}{\hat{\theta}}$$

If $n=10$ then $I(\hat{\theta})=I(2)=n/\hat{\theta}=10/2=5$. If $n=25$ then $I(\hat{\theta})=I(2)=25/2=12.5$.
See Figure 6.1. The function $r(\theta)=l(\theta)-l(\hat{\theta})$ is more concave and symmetric for $n=25$
than for $n=10$. As the number of observations increases we have more "information"
about the unknown parameter $\theta$.

Although we view the likelihood, log likelihood, score and information functions as
functions of $\theta$, they are also functions of the observed data $x$. When it is important to
emphasize the dependence on the data $x$ we will write $L(\theta;x)$, $S(\theta;x)$, and $I(\theta;x)$. When
we wish to determine the sampling distribution of the corresponding random variables we
will write $L(\theta;X)$, $S(\theta;X)$, and $I(\theta;X)$.
[Figure 6.1: Poisson log likelihoods $r(\theta)=l(\theta)-l(\hat{\theta})$ for $n=10$ and $n=25$]
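A plot like Figure 6.1 can be reproduced with the short R sketch below (added for illustration; it uses the Poisson log likelihood with $\hat{\theta}=2$, dropping terms that do not depend on $\theta$):

# r(theta) = l(theta) - l(thetahat) for Poisson data with xbar = thetahat
poisr <- function(theta, n, thetahat) {
n*thetahat*log(theta) - n*theta - (n*thetahat*log(thetahat) - n*thetahat)
}
th <- seq(1, 3.5, 0.01)
plot(th, poisr(th, 10, 2), "l", xlab = expression(theta),
ylab = expression(paste("r(", theta, ")")), ylim = c(-10, 0), lwd = 3)
lines(th, poisr(th, 25, 2), lty = 2, lwd = 3)
legend("bottomright", c("n=10", "n=25"), lty = c(1, 2), lwd = 3)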
Here is one more function which plays an important role in the method of maximum
likelihood.
6.3.4 Definition - Expected Information Function

If $\theta$ is a scalar then the expected information function is given by
$$J(\theta)=E[I(\theta;X)]=E\!\left[-\frac{d^{2}}{d\theta^{2}}l(\theta;X)\right]\quad\text{for }\theta\in\Omega$$
where $X$ is the potential data.

Note: If $X=(X_1,X_2,\ldots,X_n)$ is a random sample from $f(x;\theta)$ then
$$J(\theta)=E\!\left[-\frac{d^{2}}{d\theta^{2}}l(\theta;X)\right]=nE\!\left[-\frac{d^{2}}{d\theta^{2}}\log f(X;\theta)\right]\quad\text{for }\theta\in\Omega$$
where $X$ has probability density function $f(x;\theta)$.

6.3.5 Example

For each of the following find the observed information $I(\hat{\theta})$ and the expected information
$J(\theta)$. Compare $I(\hat{\theta})$ and $J(\hat{\theta})$. Determine the mean and variance of the maximum likelihood estimator $\tilde{\theta}$. Compare the expected information with the variance of the maximum
likelihood estimator.

(a) Example 6.2.4 (Binomial)

(b) Example 6.2.5 (Poisson)

(c) Example 6.2.8 (Exponential)
Solution

(a) From Example 6.2.4, the score function based on $x$ successes in $n$ Bernoulli trials is
$$S(\theta)=\frac{d}{d\theta}l(\theta)=\frac{x}{\theta}-\frac{n-x}{1-\theta}\quad\text{for }0<\theta<1$$

Therefore the information function is
$$I(\theta)=-\frac{d^{2}}{d\theta^{2}}l(\theta)=-\frac{d}{d\theta}\left[\frac{x}{\theta}-\frac{n-x}{1-\theta}\right]=\frac{x}{\theta^{2}}+\frac{n-x}{(1-\theta)^{2}}\quad\text{for }0<\theta<1$$

Since the maximum likelihood estimate is $\hat{\theta}=\dfrac{x}{n}$ the observed information is
$$I(\hat{\theta})=\frac{x}{\hat{\theta}^{2}}+\frac{n-x}{(1-\hat{\theta})^{2}}=\frac{x}{\left(\frac{x}{n}\right)^{2}}+\frac{n-x}{\left(1-\frac{x}{n}\right)^{2}}=\frac{n^{2}}{x}+\frac{n^{2}}{n-x}=\frac{n}{\frac{x}{n}\left(1-\frac{x}{n}\right)}=\frac{n}{\hat{\theta}(1-\hat{\theta})}$$

If $X$ has a Binomial($n,\theta$) distribution then $E(X)=n\theta$ and $Var(X)=n\theta(1-\theta)$. Therefore the expected information is
$$J(\theta)=E[I(\theta;X)]=E\!\left[\frac{X}{\theta^{2}}+\frac{n-X}{(1-\theta)^{2}}\right]=\frac{n\theta}{\theta^{2}}+\frac{n(1-\theta)}{(1-\theta)^{2}}=\frac{n[(1-\theta)+\theta]}{\theta(1-\theta)}=\frac{n}{\theta(1-\theta)}\quad\text{for }0<\theta<1$$
We note that $I(\hat{\theta})=J(\hat{\theta})$.

The maximum likelihood estimator is $\tilde{\theta}=X/n$ with
$$E(\tilde{\theta})=E\!\left(\frac{X}{n}\right)=\frac{1}{n}E(X)=\frac{n\theta}{n}=\theta$$
and
$$Var(\tilde{\theta})=Var\!\left(\frac{X}{n}\right)=\frac{1}{n^{2}}Var(X)=\frac{n\theta(1-\theta)}{n^{2}}=\frac{\theta(1-\theta)}{n}=\frac{1}{J(\theta)}$$
(b) From Example 6.3.3 the information function based on Poisson data $x_1,x_2,\ldots,x_n$ is
$$I(\theta)=\frac{n\bar{x}}{\theta^{2}}\quad\text{for }\theta>0$$
Since the maximum likelihood estimate is $\hat{\theta}=\bar{x}$ the observed information is
$$I(\hat{\theta})=\frac{n\bar{x}}{\hat{\theta}^{2}}=\frac{n}{\hat{\theta}}$$
Since $X_i$ has a Poisson($\theta$) distribution with $E(X_i)=Var(X_i)=\theta$ then $E(\bar{X})=\theta$ and
$Var(\bar{X})=\theta/n$. Therefore the expected information is
$$J(\theta)=E[I(\theta;X_1,X_2,\ldots,X_n)]=E\!\left[\frac{n\bar{X}}{\theta^{2}}\right]=\frac{n\theta}{\theta^{2}}=\frac{n}{\theta}\quad\text{for }\theta>0$$
We note that $I(\hat{\theta})=J(\hat{\theta})$.

The maximum likelihood estimator is $\tilde{\theta}=\bar{X}$ with
$$E(\tilde{\theta})=E(\bar{X})=\theta$$
and
$$Var(\tilde{\theta})=Var(\bar{X})=\frac{\theta}{n}=\frac{1}{J(\theta)}$$
(c) From Example 6.2.8 the score function based on Exponential data $x_1,x_2,\ldots,x_n$ is
$$S(\theta)=\frac{d}{d\theta}l(\theta)=n\left(-\frac{1}{\theta}+\frac{\bar{x}}{\theta^{2}}\right)\quad\text{for }\theta>0$$
Therefore the information function is
$$I(\theta)=-\frac{d^{2}}{d\theta^{2}}l(\theta)=-n\frac{d}{d\theta}\left(-\frac{1}{\theta}+\frac{\bar{x}}{\theta^{2}}\right)=n\left(-\frac{1}{\theta^{2}}+\frac{2\bar{x}}{\theta^{3}}\right)\quad\text{for }\theta>0$$
Since the maximum likelihood estimate is $\hat{\theta}=\bar{x}$ the observed information is
$$I(\hat{\theta})=n\left(-\frac{1}{\hat{\theta}^{2}}+\frac{2\hat{\theta}}{\hat{\theta}^{3}}\right)=\frac{n}{\hat{\theta}^{2}}$$
Since $X_i$ has an Exponential($\theta$) distribution with $E(X_i)=\theta$ and $Var(X_i)=\theta^{2}$ then
$E(\bar{X})=\theta$ and $Var(\bar{X})=\theta^{2}/n$. Therefore the expected information is
$$J(\theta)=E[I(\theta;X_1,X_2,\ldots,X_n)]=nE\!\left[-\frac{1}{\theta^{2}}+\frac{2\bar{X}}{\theta^{3}}\right]=n\left(-\frac{1}{\theta^{2}}+\frac{2\theta}{\theta^{3}}\right)=\frac{n}{\theta^{2}}\quad\text{for }\theta>0$$
We note that $I(\hat{\theta})=J(\hat{\theta})$.

The maximum likelihood estimator is $\tilde{\theta}=\bar{X}$ with
$$E(\tilde{\theta})=E(\bar{X})=\theta$$
and
$$Var(\tilde{\theta})=Var(\bar{X})=\frac{\theta^{2}}{n}=\frac{1}{J(\theta)}$$

In all three examples we have $I(\hat{\theta})=J(\hat{\theta})$, $E(\tilde{\theta})=\theta$, and $Var(\tilde{\theta})=[J(\theta)]^{-1}$.
In the three previous examples, we observed that $E(\tilde{\theta})=\theta$ and therefore $\tilde{\theta}$ was an
unbiased estimator of $\theta$. This is not always true for maximum likelihood estimators, as we
see in the next example. However, maximum likelihood estimators usually have other good
properties. Suppose $\tilde{\theta}_n=\tilde{\theta}_n(X_1,X_2,\ldots,X_n)$ is the maximum likelihood estimator based
on a sample of size $n$. If $\lim_{n\to\infty}E(\tilde{\theta}_n)=\theta$ then $\tilde{\theta}_n$ is an asymptotically unbiased estimator of
$\theta$. If $\tilde{\theta}_n\to_p\theta$ then $\tilde{\theta}_n$ is called a consistent estimator of $\theta$.

6.3.6 Example

Suppose $x_1,x_2,\ldots,x_n$ is an observed random sample from the distribution with probability
density function
$$f(x;\theta)=\theta x^{\theta-1}\quad\text{for }0\le x\le 1,\ \theta>0\tag{6.1}$$

(a) Find the score function, the maximum likelihood estimator, the information function,
the observed information, and the expected information.

(b) Show that $T=-\sum_{i=1}^{n}\log X_i\sim\text{Gamma}\!\left(n,\frac{1}{\theta}\right)$.

(c) Use (b) and 2.7.9 to show that $\tilde{\theta}$ is not an unbiased estimator of $\theta$. Show however that
$\tilde{\theta}$ is an asymptotically unbiased estimator of $\theta$.

(d) Show that $\tilde{\theta}$ is a consistent estimator of $\theta$.

(e) Use (b) and 2.7.9 to find $Var(\tilde{\theta})$. Compare $Var(\tilde{\theta})$ with the expected information.
Solution

(a) The likelihood function is
$$L(\theta)=\prod_{i=1}^{n}f(x_i;\theta)=\prod_{i=1}^{n}\theta x_i^{\theta-1}=\theta^{n}\left(\prod_{i=1}^{n}x_i\right)^{\theta-1}\quad\text{for }\theta>0$$
or more simply
$$L(\theta)=\theta^{n}\left(\prod_{i=1}^{n}x_i\right)^{\theta}\quad\text{for }\theta>0$$
The log likelihood function is
$$l(\theta)=n\log\theta+\theta\sum_{i=1}^{n}\log x_i=n\log\theta-\theta t\quad\text{for }\theta>0$$
where $t=-\sum_{i=1}^{n}\log x_i$. The score function is
$$S(\theta)=\frac{d}{d\theta}l(\theta)=\frac{n}{\theta}-t=\frac{1}{\theta}\left(n-\theta t\right)\quad\text{for }\theta>0$$
Now $\frac{d}{d\theta}l(\theta)=0$ for $\theta=n/t$. Since $\frac{d}{d\theta}l(\theta)>0$ if $0<\theta<n/t$ and $\frac{d}{d\theta}l(\theta)<0$ if $\theta>n/t$
then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta=n/t$. Therefore the
maximum likelihood estimate of $\theta$ is $\hat{\theta}=n/t$ and the maximum likelihood estimator is
$\tilde{\theta}=n/T$ where $T=-\sum_{i=1}^{n}\log X_i$.

The information function is
$$I(\theta)=-\frac{d^{2}}{d\theta^{2}}l(\theta)=\frac{n}{\theta^{2}}\quad\text{for }\theta>0$$
and the observed information is
$$I(\hat{\theta})=\frac{n}{\hat{\theta}^{2}}$$
The expected information is
$$J(\theta)=E[I(\theta;X_1,X_2,\ldots,X_n)]=E\!\left[\frac{n}{\theta^{2}}\right]=\frac{n}{\theta^{2}}\quad\text{for }\theta>0$$
(b) From Exercise 2.6.12 we have that if $X_i$ has the probability density function (6.1) then
$$Y_i=-\log X_i\sim\text{Exponential}\!\left(\frac{1}{\theta}\right)\quad\text{for }i=1,2,\ldots,n\tag{6.2}$$
From 4.3.2(4) we have that
$$T=-\sum_{i=1}^{n}\log X_i=\sum_{i=1}^{n}Y_i\sim\text{Gamma}\!\left(n,\frac{1}{\theta}\right)$$

(c) Since $T\sim\text{Gamma}\!\left(n,\frac{1}{\theta}\right)$ then from 2.7.9 we have
$$E(T^{p})=\frac{\Gamma(n+p)}{\theta^{p}\,\Gamma(n)}\quad\text{for }p>-n$$
Assuming $n>1$ we have
$$E(T^{-1})=\frac{\theta\,\Gamma(n-1)}{\Gamma(n)}=\frac{\theta}{n-1}$$
so
$$E(\tilde{\theta})=E\!\left(\frac{n}{T}\right)=nE(T^{-1})=\frac{n\theta}{n-1}=\frac{\theta}{1-\frac{1}{n}}\ne\theta$$
and therefore $\tilde{\theta}$ is not an unbiased estimator of $\theta$.

Now $\tilde{\theta}$ is an estimator based on a sample of size $n$. Since
$$\lim_{n\to\infty}E(\tilde{\theta})=\lim_{n\to\infty}\frac{\theta}{1-\frac{1}{n}}=\theta$$
therefore $\tilde{\theta}$ is an asymptotically unbiased estimator of $\theta$.

(d) By (6.2), $Y_1,Y_2,\ldots,Y_n$ are independent and identically distributed random variables
with $E(Y_i)=\frac{1}{\theta}$ and $Var(Y_i)=\frac{1}{\theta^{2}}<\infty$. Therefore by the Weak Law of Large Numbers
and by the Limit Theorems
$$\frac{T}{n}=\frac{1}{n}\sum_{i=1}^{n}Y_i\to_p\frac{1}{\theta}\qquad\text{and}\qquad\tilde{\theta}=\frac{n}{T}\to_p\theta$$
Thus $\tilde{\theta}$ is a consistent estimator of $\theta$.
(e) To determine $Var(\tilde{\theta})$ we note that
$$Var(\tilde{\theta})=Var\!\left(\frac{n}{T}\right)=n^{2}Var\!\left(\frac{1}{T}\right)=n^{2}\left\{E\!\left[\left(\frac{1}{T}\right)^{2}\right]-\left[E\!\left(\frac{1}{T}\right)\right]^{2}\right\}=n^{2}\left\{E(T^{-2})-\left[E(T^{-1})\right]^{2}\right\}$$
Since
$$E(T^{-2})=\frac{\theta^{2}\,\Gamma(n-2)}{\Gamma(n)}=\frac{\theta^{2}}{(n-1)(n-2)}$$
then
$$Var(\tilde{\theta})=n^{2}\left[\frac{\theta^{2}}{(n-1)(n-2)}-\frac{\theta^{2}}{(n-1)^{2}}\right]=\frac{n^{2}\theta^{2}}{(n-1)^{2}(n-2)}=\frac{1}{\left(1-\frac{1}{n}\right)^{2}}\cdot\frac{\theta^{2}}{n-2}$$
We note that $Var(\tilde{\theta})\ne\dfrac{\theta^{2}}{n}=\dfrac{1}{J(\theta)}$, however for large $n$, $Var(\tilde{\theta})\approx\dfrac{1}{J(\theta)}$.
Example
Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Weibull( ; 1) distribution
with probability density function
f (x; ) = x
1
e
x
for x > 0;
>0
Find the score function and the information function. How would you …nd the maximum
likelihood estimate of ?
Solution
The likelihood function is
L( ) =
n
Q
f (xi ; ) =
i=1
1
i=1
=
n
n
Q
i=1
n
Q
xi
xi
exp
1
e
xi
n
P
i=1
xi
for
>0
200
6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER
or more simply
L( ) =
n
n
Q
exp
xi
i=1
i=1
The log likelihood function is
l( ) = n log +
n
P
log xi
i=1
The score function is
S( ) =
=
n
P
xi
n
P
i=1
xi
d
l( )
d
n
P
n
+t
xi log xi
for
for
>0
>0
for
>0
i=1
where t =
n
P
log xi .
i=1
Notice that S( ) = 0 cannot be solved explicitly. The maximum likelihood estimate can
only be determined numerically for a given sample of data x1 ; x2 ; : : : ; xn . Since
d
S( )=
d
n
2
+
n
P
i=1
xi (log xi )2
for
>0
is negative for all values of > 0 then we know that the function S ( ) is always decreasing
and therefore there is only one solution to S( ) = 0. The solution to S( ) = 0 gives the
maximum likelihood estimate.
The information function is
I( ) =
=
d2
l( )
d 2
n
P
n
xi (log xi )2
2 +
for
>0
i=1
To illustrate how to …nd the maximum likelihood estimate for a given sample of data, we
randomly generate 20 observations from the Weibull( ; 1) distribution. To do this we use the
result of Example 2.6.7 in which we showed that if u is an observation from the Uniform(0; 1)
distribution then x = [ log (1 u)]1= is an observation from the Weibull( ; 1) distribution.
The following R code generates the data, plots the likelihood function, …nds ^ by solving
S( ) = 0 using the R function uniroot, and determines S(^) and the observed information
I(^).
6.3. SCORE AND INFORMATION FUNCTIONS
# randomly generate 20 observations from a Weibull(theta,1)
# using a random theta value between 0.5 and 1.5
set.seed(20086689) # set the seed so results can be reproduced
truetheta<-runif(1,min=0.5,max=1.5)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round((-log(1-runif(20)))^(1/truetheta),2))
x
#
# function for calculating Weibull likelihood for data x and theta=th
WBLF<-function(th,x)
{n<-length(x)
L<-th^n*prod(x)^th*exp(-sum(x^th))
return(L)}
#
# function for calculating Weibull score for data x and theta=th
WBSF<-function(th,x)
{n<-length(x)
t<-sum(log(x))
S<-(n/th)+t-sum(log(x)*x^th)
return(S)}
#
# function for calculating Weibull information for data x and theta=th
WBIF<-function(th,x)
{n<-length(x)
I<-(n/th^2)+sum(log(x)^2*x^th)
return(I)}
#
# plot the Weibull likelihood function
th<-seq(0.25,0.75,0.01)
L<-sapply(th,WBLF,x)
plot(th,L,"l",xlab=expression(theta),
ylab=expression(paste("L(",theta,")")),lwd=3)
#
# find thetahat using uniroot function
thetahat<-uniroot(function(th) WBSF(th,x),lower=0.4,upper=0.6)$root
cat("thetahat = ",thetahat) # display value of thetahat
# display value of Score function at thetahat
cat("S(thetahat) = ",WBSF(thetahat,x))
# calculate observed information
cat("Observed Information = ",WBIF(thetahat,x))
The generated data are

0.01  0.01  0.05  0.07  0.10  0.11  0.23  0.28  0.44  0.46
0.64  1.07  1.16  1.25  2.40  3.03  3.65  5.90  6.60  30.07

The likelihood function is graphed in Figure 6.2. The solution to $S(\theta)=0$ determined
by uniroot was $\hat{\theta}=0.4951607$. $S(\hat{\theta})=2.468574\times10^{-5}$, which is close to zero, and the
observed information was $I(\hat{\theta})=181.8069$, which is positive and indicates a local maximum.
We have already shown in this example that the solution to $S(\theta)=0$ is the unique maximum
likelihood estimate.

[Figure 6.2: Likelihood function $L(\theta)$ for Example 6.3.7]

Note that the interval $[0.25,0.75]$ used to graph the likelihood function was determined
by trial and error. The values lower=0.4 and upper=0.6 used for uniroot were determined
from the graph of the likelihood function. From the graph it is easy to see that the value
of $\hat{\theta}$ lies in the interval $[0.4,0.6]$.

Newton's Method, a numerical method for finding the roots of an equation usually
discussed in first year calculus, can also be used for finding the maximum likelihood
estimate. Newton's Method usually works quite well for finding maximum likelihood
estimates because the log likelihood function is often approximately quadratic in shape.
6.3.8 Newton's Method

Let $\theta^{(0)}$ be an initial estimate of $\theta$. The estimate $\theta^{(i)}$ can be updated using
$$\theta^{(i+1)}=\theta^{(i)}+\frac{S(\theta^{(i)})}{I(\theta^{(i)})}\quad\text{for }i=0,1,\ldots$$

Notes:

(1) The initial estimate, $\theta^{(0)}$, may be determined by graphing $L(\theta)$ or $l(\theta)$.

(2) The algorithm is usually run until the value of $\theta^{(i)}$ no longer changes to a reasonable
number of decimal places. When the algorithm is stopped it is always important to check
that the value of $\theta$ obtained does indeed maximize $L(\theta)$.

(3) This algorithm is also called the Newton-Raphson Method.

(4) $I(\theta)$ can be replaced by $J(\theta)$ for a similar algorithm which is called the Method of
Scoring.

(5) If the support set of $X$ depends on $\theta$ (e.g. Uniform($0,\theta$)) then $\hat{\theta}$ is not found by solving
$S(\theta)=0$.
6.3.9 Example

Use Newton's Method to find the maximum likelihood estimate in Example 6.3.7.
Solution
Here is R code for Newton’s Method for the Weibull Example
# Newton’s Method for Weibull Example
NewtonWB<-function(th,x)
{thold<-th
thnew<-th+0.1
while (abs(thold-thnew)>0.00001)
{thold<-thnew
thnew<-thold+WBSF(thold,x)/WBIF(thold,x)
print(thnew)}
return(thnew)}
#
thetahat<-NewtonWB(0.2,x)
Newton's Method converges after four iterations and the value of thetahat returned is
$\hat{\theta}=0.4951605$, which is the same value to six decimal places as was obtained above using
the uniroot function.
6.4 Likelihood Intervals

In your previous statistics course, likelihood intervals were introduced as one approach to
constructing an interval estimate for the unknown parameter $\theta$.

6.4.1 Definition - Relative Likelihood Function

The relative likelihood function $R(\theta)$ is defined by
$$R(\theta)=R(\theta;x)=\frac{L(\theta)}{L(\hat{\theta})}\quad\text{for }\theta\in\Omega$$
where $x$ are the observed data.

The relative likelihood function takes on values between 0 and 1 and can be used to rank
parameter values according to their plausibility in light of the observed data. If $R(\theta_1)=0.1$,
for example, then $\theta_1$ is a rather implausible parameter value because the data are
ten times more probable when $\theta=\hat{\theta}$ than they are when $\theta=\theta_1$. However, if $R(\theta_1)=0.5$,
then $\theta_1$ is a fairly plausible value because it gives the data 50% of the maximum possible
probability under the model.

6.4.2 Definition - Likelihood Interval

The set of $\theta$ values for which $R(\theta)\ge p$ is called a 100p% likelihood interval for $\theta$.

Values of $\theta$ inside a 10% likelihood interval are referred to as plausible values in light of the
observed data. Values of $\theta$ outside a 10% likelihood interval are referred to as implausible
values given the observed data. Values of $\theta$ inside a 50% likelihood interval are very plausible
and values of $\theta$ outside a 1% likelihood interval are very implausible in light of the data.

6.4.3 Definition - Log Relative Likelihood Function

The log relative likelihood function is the natural logarithm of the relative likelihood function:
$$r(\theta)=r(\theta;x)=\log[R(\theta)]\quad\text{for }\theta\in\Omega$$
where $x$ are the observed data.

Likelihood regions or intervals may be determined from a graph of $R(\theta)$ or $r(\theta)$. Alternatively, they can be found by solving $R(\theta)-p=0$ or $r(\theta)-\log p=0$. In most cases this
must be done numerically.
6.4.4 Example

Plot the relative likelihood function for $\theta$ in Example 6.3.7. Find 10% and 50% likelihood
intervals for $\theta$.

Solution

Here is R code to plot the relative likelihood function for the Weibull Example with lines
for determining 10% and 50% likelihood intervals for $\theta$, as well as code to determine these
intervals using uniroot. The R function WBRLF uses the R function WBLF from Example 6.3.7.
# function for calculating Weibull relative likelihood function
WBRLF<-function(th,thetahat,x)
{R<-WBLF(th,x)/WBLF(thetahat,x)
return(R)}
#
# plot the Weibull relative likelihood function
th<-seq(0.25,0.75,0.01)
R<-sapply(th,WBRLF,thetahat,x)
plot(th,R,"l",xlab=expression(theta),
ylab=expression(paste("R(",theta,")")),lwd=3)
# add lines to determine 10% and 50% likelihood intervals
abline(a=0.10,b=0,col="red",lwd=2)
abline(a=0.50,b=0,col="blue",lwd=2)
#
# use uniroot to determine endpoints of 10%, 15%, and 50% likelihood intervals
uniroot(function(th) WBRLF(th,thetahat,x)-0.1,lower=0.3,upper=0.4)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.1,lower=0.6,upper=0.7)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.5,lower=0.35,upper=0.45)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.5,lower=0.55,upper=0.65)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.15,lower=0.3,upper=0.4)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.15,lower=0.6,upper=0.7)$root
The graph of the relative likelihood function is given in Figure 6.3. The upper and lower
values used for uniroot were determined using this graph.

The 10% likelihood interval for $\theta$ is $[0.34,\,0.65]$.

The 50% likelihood interval for $\theta$ is $[0.41,\,0.58]$.

For later reference, the 15% likelihood interval for $\theta$ is $[0.3550,\,0.6401]$.
[Figure 6.3: Relative likelihood function $R(\theta)$ for Example 6.3.7]
6.4.5 Example

Suppose $x_1,x_2,\ldots,x_n$ is an observed random sample from the Uniform($0,\theta$) distribution.
Plot the relative likelihood function for $\theta$ if $n=10$ and $x_{(10)}=0.5$. Find 10% and 50%
likelihood intervals for $\theta$.

Solution

From Example 6.2.9 the likelihood function for $n=10$ and $\hat{\theta}=x_{(10)}=0.5$ is
$$L(\theta)=\begin{cases}0 & \text{if }0<\theta<0.5\\[1mm] \dfrac{1}{\theta^{10}} & \text{if }\theta\ge 0.5\end{cases}$$
The relative likelihood function is
$$R(\theta)=\begin{cases}0 & \text{if }0<\theta<0.5\\[1mm] \left(\dfrac{0.5}{\theta}\right)^{10} & \text{if }\theta\ge 0.5\end{cases}$$
and is graphed in Figure 6.4 along with lines for determining 10% and 50% likelihood intervals.

To determine the value of $\theta$ at which the horizontal line $R=p$ intersects the graph of
$R(\theta)$ we solve $\left(\frac{0.5}{\theta}\right)^{10}=p$ to obtain $\theta=0.5p^{-1/10}$. Since $R(\theta)=0$ if $0<\theta<0.5$, a
100p% likelihood interval for $\theta$ is of the form $\left[0.5,\,0.5p^{-1/10}\right]$.

A 10% likelihood interval is
$$\left[0.5,\,0.5(0.1)^{-1/10}\right]=[0.5,\,0.629]$$
A 50% likelihood interval is
$$\left[0.5,\,0.5(0.5)^{-1/10}\right]=[0.5,\,0.536]$$

[Figure 6.4: Relative likelihood function for Example 6.4.5]

More generally, for an observed random sample $x_1,x_2,\ldots,x_n$ from the Uniform($0,\theta$) distribution a 100p% likelihood interval for $\theta$ will be of the form $\left[x_{(n)},\,x_{(n)}p^{-1/n}\right]$.
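The following short R sketch (added for illustration) computes these intervals and plots $R(\theta)$ for the values used in this example:

n <- 10; xmax <- 0.5                        # values used in Example 6.4.5
LI <- function(p) c(xmax, xmax*p^(-1/n))    # 100p% likelihood interval [x(n), x(n)*p^(-1/n)]
LI(0.10)                                    # 10% likelihood interval
LI(0.50)                                    # 50% likelihood interval
th <- seq(0.4, 1, 0.001)
R <- ifelse(th < xmax, 0, (xmax/th)^n)      # relative likelihood function
plot(th, R, "l", xlab = expression(theta), ylab = expression(paste("R(", theta, ")")))
abline(h = c(0.1, 0.5), lty = 2)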
6.4.6 Exercise

Suppose $x_1,x_2,\ldots,x_n$ is an observed random sample from the Two Parameter Exponential($1,\theta$)
distribution. Plot the relative likelihood function for $\theta$ if $n=12$ and $x_{(1)}=2$. Find 10%
and 50% likelihood intervals for $\theta$.
6.5 Limiting Distribution of Maximum Likelihood Estimator

If there is no explicit expression for the maximum likelihood estimator $\tilde{\theta}$, as in Example
6.3.7, then its sampling distribution can only be obtained by simulation. This makes it
difficult to determine the properties of the maximum likelihood estimator. In particular,
to determine how good an estimator is, we look at its mean and variance. The mean
indicates whether the estimator is close, on average, to the true value of $\theta$, while the variance
indicates the uncertainty in the estimator. The larger the variance the more uncertainty in
our estimation. To determine how good an estimator is we also look at how the mean and
variance behave as the sample size $n\to\infty$.

We saw in Example 6.3.5 that $E(\tilde{\theta})=\theta$ and $Var(\tilde{\theta})=[J(\theta)]^{-1}\to 0$ as $n\to\infty$
for the Binomial, Poisson, and Exponential models. For each of these models the maximum
likelihood estimator is an unbiased estimator of $\theta$. In Example 6.3.6 we saw that $E(\tilde{\theta})\to\theta$
and $Var(\tilde{\theta})\approx[J(\theta)]^{-1}\to 0$ as $n\to\infty$. The maximum likelihood estimator $\tilde{\theta}$ is an
asymptotically unbiased estimator. In Example 6.3.7 we are not able to determine $E(\tilde{\theta})$
and $Var(\tilde{\theta})$.

The following theorem gives the limiting distribution of the maximum likelihood estimator in general under certain restrictions.

6.5.1 Theorem - Limiting Distribution of Maximum Likelihood Estimator

Suppose $X_n=(X_1,X_2,\ldots,X_n)$ is a random sample from the probability (density) function
$f(x;\theta)$ for $\theta\in\Omega$. Let $\tilde{\theta}_n=\tilde{\theta}_n(X_n)$ be the maximum likelihood estimator of $\theta$ based on
$X_n$. Then under certain (regularity) conditions
$$\tilde{\theta}_n\to_p\theta\tag{6.3}$$
$$(\tilde{\theta}_n-\theta)[J(\theta)]^{1/2}\to_D Z\sim N(0,1)\tag{6.4}$$
$$-2\log R(\theta;X_n)=-2\log\frac{L(\theta;X_n)}{L(\tilde{\theta}_n;X_n)}\to_D W\sim\chi^{2}(1)\tag{6.5}$$
for each $\theta\in\Omega$.

The proof of this result, which depends on applying Taylor's Theorem to the score
function, is beyond the scope of this course. The regularity conditions are a bit complicated
but essentially they are a set of conditions which ensure that the error term in Taylor's
Theorem goes to zero as $n\to\infty$. One of the conditions is that the support set of $f(x;\theta)$ does
not depend on $\theta$. Therefore, for example, this theorem cannot be applied to the maximum
likelihood estimator in the case of a random sample from the Uniform($0,\theta$) distribution.
This is actually not a problem since the distribution of the maximum likelihood estimator
in this case can be determined exactly.

Since (6.3) holds, $\tilde{\theta}_n$ is called a consistent estimator of $\theta$.

Theorem 6.5.1 implies that for sufficiently large $n$, $\tilde{\theta}_n$ has approximately a N($\theta,1/J(\theta)$)
distribution. Therefore for large $n$
$$E(\tilde{\theta}_n)\approx\theta$$
and the maximum likelihood estimator is an asymptotically unbiased estimator of $\theta$.
Since $\tilde{\theta}_n$ has approximately a N($\theta,1/J(\theta)$) distribution this also means that for sufficiently large $n$
$$Var(\tilde{\theta}_n)\approx\frac{1}{J(\theta)}$$
$1/J(\theta)$ is called the asymptotic variance of $\tilde{\theta}_n$. Of course $J(\theta)$ is unknown because $\theta$ is
unknown. By (6.3), (6.4), and Slutsky's Theorem
$$(\tilde{\theta}_n-\theta)\sqrt{J(\tilde{\theta}_n)}\to_D Z\sim N(0,1)\tag{6.6}$$
which implies that the asymptotic variance of $\tilde{\theta}_n$ can be estimated using $1/J(\hat{\theta}_n)$. Therefore
for sufficiently large $n$ we have
$$Var(\tilde{\theta}_n)\approx\frac{1}{J(\hat{\theta}_n)}$$
By the Weak Law of Large Numbers
$$\frac{1}{n}I(\theta;X_n)=\frac{1}{n}\sum_{i=1}^{n}\left[-\frac{d^{2}}{d\theta^{2}}l(\theta;X_i)\right]\to_p E\!\left[-\frac{d^{2}}{d\theta^{2}}l(\theta;X_i)\right]\tag{6.7}$$
Therefore by (6.3), (6.4), (6.7) and the Limit Theorems it follows that
$$(\tilde{\theta}_n-\theta)\sqrt{I(\tilde{\theta}_n;X_n)}\to_D Z\sim N(0,1)\tag{6.8}$$
which implies the asymptotic variance of $\tilde{\theta}_n$ can also be estimated using $1/I(\hat{\theta}_n)$ where
$I(\hat{\theta}_n)$ is the observed information. Therefore for sufficiently large $n$ we have
$$Var(\tilde{\theta}_n)\approx\frac{1}{I(\hat{\theta}_n)}$$
Results (6.6), (6.8) and (6.5) can be used to construct approximate confidence intervals
for $\theta$. In Chapter 8 we will see how result (6.5) can be used in a test of hypothesis.

Although we will not prove Theorem 6.5.1 in general, we can prove the results in a particular case. The following example illustrates how techniques and theorems from previous
chapters can be used together to obtain the results of interest. It is also a good review
of several ideas covered thus far in these Course Notes.
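Result (6.4) can also be illustrated by simulation. The following R sketch (added for illustration; the model and the values of $\theta$, $n$ and the seed are arbitrary choices) does this for the Exponential($\theta$) model, for which $\tilde{\theta}=\bar{X}$ and $J(\theta)=n/\theta^{2}$ from Example 6.3.5:

set.seed(330)                      # arbitrary seed
theta <- 3; n <- 50; nsim <- 20000
thetatilde <- replicate(nsim, mean(rexp(n, rate = 1/theta)))   # MLE for each simulated sample
z <- (thetatilde - theta)*sqrt(n/theta^2)                      # (thetatilde - theta)*J(theta)^(1/2)
hist(z, freq = FALSE, breaks = 50, main = "", xlab = "z")
curve(dnorm(x), add = TRUE, lwd = 2)                           # limiting N(0,1) density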
6.5.2 Example

(a) Suppose $X\sim$ Weibull($2,\theta$). Show that $E(X^{2})=\theta^{2}$ and $Var(X^{2})=\theta^{4}$.

(b) Suppose $X_1,X_2,\ldots,X_n$ is a random sample from the Weibull($2,\theta$) distribution. Find
the maximum likelihood estimator $\tilde{\theta}$ of $\theta$, the information function $I(\theta)$, the observed
information $I(\hat{\theta})$, and the expected information $J(\theta)$.

(c) Show that $\tilde{\theta}_n\to_p\theta$.

(d) Show that $(\tilde{\theta}_n-\theta)[J(\theta)]^{1/2}\to_D Z\sim N(0,1)$.
Solution

(a) From Exercise 2.7.9 we have
$$E(X^{k})=\theta^{k}\,\Gamma\!\left(\frac{k}{2}+1\right)\quad\text{for }k=1,2,\ldots\tag{6.9}$$
if $X\sim$ Weibull($2,\theta$). Therefore
$$E(X^{2})=\theta^{2}\,\Gamma(2)=\theta^{2},\qquad E(X^{4})=\theta^{4}\,\Gamma(3)=2\theta^{4}$$
and
$$Var(X^{2})=E[(X^{2})^{2}]-[E(X^{2})]^{2}=2\theta^{4}-\theta^{4}=\theta^{4}$$

(b) The likelihood function is
$$L(\theta)=\prod_{i=1}^{n}f(x_i;\theta)=\prod_{i=1}^{n}\frac{2x_i}{\theta^{2}}e^{-(x_i/\theta)^{2}}=\frac{2^{n}\prod_{i=1}^{n}x_i}{\theta^{2n}}\exp\!\left(-\frac{1}{\theta^{2}}\sum_{i=1}^{n}x_i^{2}\right)\quad\text{for }\theta>0$$
or more simply
$$L(\theta)=\theta^{-2n}\exp\!\left(-\frac{t}{\theta^{2}}\right)\quad\text{for }\theta>0$$
where $t=\sum_{i=1}^{n}x_i^{2}$. The log likelihood function is
$$l(\theta)=-2n\log\theta-\frac{t}{\theta^{2}}\quad\text{for }\theta>0$$
The score function is
$$S(\theta)=\frac{d}{d\theta}l(\theta)=-\frac{2n}{\theta}+\frac{2t}{\theta^{3}}=\frac{2}{\theta^{3}}\left(t-n\theta^{2}\right)$$
Now $\frac{d}{d\theta}l(\theta)=0$ for $\theta=\left(\frac{t}{n}\right)^{1/2}$. Since $\frac{d}{d\theta}l(\theta)>0$ if $0<\theta<\left(\frac{t}{n}\right)^{1/2}$ and $\frac{d}{d\theta}l(\theta)<0$ if $\theta>\left(\frac{t}{n}\right)^{1/2}$
then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta=\left(\frac{t}{n}\right)^{1/2}$.
Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta}=\left(\frac{t}{n}\right)^{1/2}$. The maximum likelihood
estimator is
$$\tilde{\theta}=\left(\frac{T}{n}\right)^{1/2}\quad\text{where }T=\sum_{i=1}^{n}X_i^{2}$$
The information function is
$$I(\theta)=-\frac{d^{2}}{d\theta^{2}}l(\theta)=\frac{6t}{\theta^{4}}-\frac{2n}{\theta^{2}}\quad\text{for }\theta>0$$
and the observed information is
$$I(\hat{\theta})=\frac{6(n\hat{\theta}^{2})}{\hat{\theta}^{4}}-\frac{2n}{\hat{\theta}^{2}}=\frac{4n}{\hat{\theta}^{2}}$$
since $t=n\hat{\theta}^{2}$. To find the expected information we note that, from (a), $E(X_i^{2})=\theta^{2}$ and thus
$$E(T)=E\!\left(\sum_{i=1}^{n}X_i^{2}\right)=n\theta^{2}$$
Therefore the expected information is
$$J(\theta)=E\!\left[\frac{6T}{\theta^{4}}-\frac{2n}{\theta^{2}}\right]=\frac{6E(T)}{\theta^{4}}-\frac{2n}{\theta^{2}}=\frac{6n\theta^{2}}{\theta^{4}}-\frac{2n}{\theta^{2}}=\frac{4n}{\theta^{2}}\quad\text{for }\theta>0$$

(c) To show that $\tilde{\theta}_n\to_p\theta$ we need to show that
$$\tilde{\theta}=\left(\frac{T}{n}\right)^{1/2}=\left(\frac{1}{n}\sum_{i=1}^{n}X_i^{2}\right)^{1/2}\to_p\theta$$
Since $X_1^{2},X_2^{2},\ldots,X_n^{2}$ are independent and identically distributed random variables with
$E(X_i^{2})=\theta^{2}$ and $Var(X_i^{2})=\theta^{4}$ for $i=1,2,\ldots,n$, then by the Weak Law of Large
Numbers
$$\frac{T}{n}=\frac{1}{n}\sum_{i=1}^{n}X_i^{2}\to_p\theta^{2}\tag{6.10}$$
and by 5.5.1(1)
$$\tilde{\theta}=\left(\frac{T}{n}\right)^{1/2}\to_p\theta$$
as required.

(d) To show that
$$(\tilde{\theta}_n-\theta)[J(\theta)]^{1/2}\to_D Z\sim N(0,1)$$
we need to show that
$$[J(\theta)]^{1/2}(\tilde{\theta}_n-\theta)=\frac{2\sqrt{n}}{\theta}\left[\left(\frac{T}{n}\right)^{1/2}-\theta\right]\to_D Z\sim N(0,1)$$
Since $X_1^{2},X_2^{2},\ldots,X_n^{2}$ are independent and identically distributed random variables with
$E(X_i^{2})=\theta^{2}$ and $Var(X_i^{2})=\theta^{4}$ for $i=1,2,\ldots,n$, then by the Central Limit Theorem
$$\frac{\sqrt{n}\left(\frac{T}{n}-\theta^{2}\right)}{\theta^{2}}\to_D Z\sim N(0,1)\tag{6.11}$$
Let $g(x)=\sqrt{x}$ and $a=\theta^{2}$. Then $\frac{d}{dx}g(x)=\frac{1}{2\sqrt{x}}$ and $g'(a)=\frac{1}{2\theta}$. By (6.11) and the Delta
Method we have
$$\sqrt{n}\left[\left(\frac{T}{n}\right)^{1/2}-\theta\right]\to_D\theta^{2}\cdot\frac{1}{2\theta}\,Z\sim N\!\left(0,\frac{\theta^{2}}{4}\right)\tag{6.12}$$
or
$$\frac{2\sqrt{n}}{\theta}\left[\left(\frac{T}{n}\right)^{1/2}-\theta\right]\to_D Z\sim N(0,1)\tag{6.13}$$
as required.
6.5.3 Example

Suppose $X_1,X_2,\ldots,X_n$ is a random sample from the Uniform($0,\theta$) distribution. Since
the support set of $X_i$ depends on $\theta$, Theorem 6.5.1 does not hold. Show however that the
maximum likelihood estimator $\tilde{\theta}_n=X_{(n)}$ is a consistent estimator of $\theta$.

Solution

In Example 5.1.5 we showed that $\tilde{\theta}_n=\max(X_1,X_2,\ldots,X_n)\to_p\theta$ and therefore $\tilde{\theta}_n$ is a
consistent estimator of $\theta$.
6.6 Confidence Intervals

In your previous statistics course a confidence interval was used to summarize the available
information about an unknown parameter. Confidence intervals allow us to quantify the
uncertainty in the unknown parameter.

6.6.1 Definition - Confidence Interval

Suppose $X$ is a random variable (possibly a vector) whose distribution depends on $\theta$, and
$A(X)$ and $B(X)$ are statistics. If
$$P[A(X)\le\theta\le B(X)]=p\quad\text{for }0<p<1$$
then $[a(x),b(x)]$ is called a 100p% confidence interval for $\theta$ where $x$ are the observed data.

Confidence intervals can be constructed in a straightforward manner if a pivotal quantity
exists.

6.6.2 Definition - Pivotal Quantity

Suppose $X$ is a random variable (possibly a vector) whose distribution depends on $\theta$. The
random variable $Q(X;\theta)$ is called a pivotal quantity if the distribution of $Q$ does not depend
on $\theta$.

Pivotal quantities can be used for constructing confidence intervals in the following way.
Since the distribution of $Q(X;\theta)$ is known we can write down a probability statement of
the form
$$P(q_1\le Q(X;\theta)\le q_2)=p$$
where $q_1$ and $q_2$ do not depend on $\theta$. If $Q$ is a monotone function of $\theta$ then this statement
can be rewritten as
$$P[A(X)\le\theta\le B(X)]=p$$
and the interval $[a(x),b(x)]$ is a 100p% confidence interval.
and the interval [a(x); b(x)] is a 100p% con…dence interval.
6.6.3
Example
Suppose X = (X1 ; X2 ; : : : ; Xn ) is a random sample from the Exponential( ) distribution.
Determine the distribution of
n
P
2
Xi
i=1
Q(X; ) =
and thus show Q(X; ) is a pivotal quantity. Show how Q(X; ) can be used to construct
a 100p% equal tail con…dence interval for .
214
6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER
Solution
Since X = (X1 ; X2 ; : : : ; Xn ) is a random sample from the Exponential( ) distribution then
by 4.3.2(4)
n
P
Xi
Gamma (n; )
Y =
i=1
and by Chapter 2, Problem 7(c)
Q(X; ) =
2Y
2
n
P
Xi
i=1
=
2
(2n)
and therefore Q(X; ) is a pivotal quantity.
Find values a and b such that
P (W
where W
1
a) =
p
2
and P (W
b) =
1
p
2
2 (2n).
Since P (a
W
b) = p then
0
B
PB
@a
or
Therefore
2
n
P
1
Xi
C
bC
A=p
i=1
0 P
n
2
X
B i=1 i
B
P@
b
2
n
P
Xi
i=1
a
3
2 P
n
n
P
xi
2
xi 2
7
6 i=1
i=1
7
6
4 b ; a 5
1
C
C=p
A
is a 100p% equal tail con…dence interval for .
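For illustration (a sketch added here, not part of the original example; the data vector x is hypothetical), this interval can be computed in R using the chi-square quantile function:

# hypothetical data, for illustration only
x <- c(2.3, 0.9, 4.1, 1.7, 3.2, 0.6, 2.8, 1.1)
n <- length(x); p <- 0.95
a <- qchisq((1 - p)/2, df = 2*n)       # lower chi-square(2n) quantile
b <- qchisq((1 + p)/2, df = 2*n)       # upper chi-square(2n) quantile
c(2*sum(x)/b, 2*sum(x)/a)              # 100p% equal tail confidence interval for theta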
The following theorem gives the pivotal quantity in the case in which $\theta$ is either a location
or scale parameter.

6.6.4 Theorem

Let $X=(X_1,X_2,\ldots,X_n)$ be a random sample from $f(x;\theta)$ and let $\tilde{\theta}=\tilde{\theta}(X)$ be the
maximum likelihood estimator of the scalar parameter $\theta$ based on $X$.

(1) If $\theta$ is a location parameter of the distribution then $Q(X;\theta)=\tilde{\theta}-\theta$ is a pivotal quantity.

(2) If $\theta$ is a scale parameter of the distribution then $Q(X;\theta)=\tilde{\theta}/\theta$ is a pivotal quantity.
6.6.5 Example

Suppose $X=(X_1,X_2,\ldots,X_n)$ is a random sample from the Weibull($2,\theta$) distribution. Use
Theorem 6.6.4 to find a pivotal quantity $Q(X;\theta)$. Show how the pivotal quantity can be
used to construct a 100p% equal tail confidence interval for $\theta$.

Solution

From Chapter 2, Problem 3(a) we know that $\theta$ is a scale parameter for the Weibull($2,\theta$)
distribution. From Example 6.5.2 the maximum likelihood estimator of $\theta$ is
$$\tilde{\theta}=\left(\frac{T}{n}\right)^{1/2}=\left(\frac{1}{n}\sum_{i=1}^{n}X_i^{2}\right)^{1/2}$$
Therefore by Theorem 6.6.4
$$Q(X;\theta)=\frac{\tilde{\theta}}{\theta}=\frac{1}{\theta}\left(\frac{1}{n}\sum_{i=1}^{n}X_i^{2}\right)^{1/2}$$
is a pivotal quantity. To construct a confidence interval for $\theta$ we need to determine the
distribution of $Q(X;\theta)$. This looks difficult at first until we notice that
$$\left(\frac{\tilde{\theta}}{\theta}\right)^{2}=\frac{1}{n\theta^{2}}\sum_{i=1}^{n}X_i^{2}$$
This form suggests looking for the distribution of $\sum_{i=1}^{n}X_i^{2}$, which is a sum of independent and
identically distributed random variables $X_1^{2},X_2^{2},\ldots,X_n^{2}$.

From Chapter 2, Problem 7(h) we have that if $X_i\sim$ Weibull($2,\theta$), $i=1,2,\ldots,n$ then
$$X_i^{2}\sim\text{Exponential}(\theta^{2})\quad\text{for }i=1,2,\ldots,n$$
Therefore $\sum_{i=1}^{n}X_i^{2}$ is a sum of independent Exponential($\theta^{2}$) random variables. By 4.3.2(4)
$$\sum_{i=1}^{n}X_i^{2}\sim\text{Gamma}(n,\theta^{2})$$
and by Chapter 2, Problem 7(h)
$$Q_1(X;\theta)=\frac{2\sum_{i=1}^{n}X_i^{2}}{\theta^{2}}\sim\chi^{2}(2n)$$
and therefore $Q_1(X;\theta)$ is a pivotal quantity.

Now
$$Q_1(X;\theta)=2n\left(\frac{\tilde{\theta}}{\theta}\right)^{2}=2n[Q(X;\theta)]^{2}$$
is a one-to-one function of $Q(X;\theta)$. To see that $Q_1(X;\theta)$ and $Q(X;\theta)$ generate the same
confidence intervals for $\theta$ we note that
$$P(a\le Q_1(X;\theta)\le b)=P\!\left[a\le 2n\left(\frac{\tilde{\theta}}{\theta}\right)^{2}\le b\right]
=P\!\left[\left(\frac{a}{2n}\right)^{1/2}\le\frac{\tilde{\theta}}{\theta}\le\left(\frac{b}{2n}\right)^{1/2}\right]
=P\!\left[\left(\frac{a}{2n}\right)^{1/2}\le Q(X;\theta)\le\left(\frac{b}{2n}\right)^{1/2}\right]$$
To construct a 100p% equal tail confidence interval we choose $a$ and $b$ such that
$$P(W\le a)=\frac{1-p}{2}\quad\text{and}\quad P(W\ge b)=\frac{1-p}{2}$$
where $W\sim\chi^{2}(2n)$. Since
$$p=P\!\left[a\le 2n\left(\frac{\tilde{\theta}}{\theta}\right)^{2}\le b\right]=P\!\left[\tilde{\theta}\left(\frac{2n}{b}\right)^{1/2}\le\theta\le\tilde{\theta}\left(\frac{2n}{a}\right)^{1/2}\right]$$
a 100p% equal tail confidence interval is
$$\left[\hat{\theta}\left(\frac{2n}{b}\right)^{1/2},\ \hat{\theta}\left(\frac{2n}{a}\right)^{1/2}\right]$$
where $\hat{\theta}=\left(\frac{1}{n}\sum_{i=1}^{n}x_i^{2}\right)^{1/2}$.
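A short R sketch (added for illustration; the data vector x is hypothetical) of this interval:

# hypothetical data, for illustration only
x <- c(1.8, 0.9, 2.4, 1.1, 3.0, 1.6, 2.2, 0.7, 1.3, 2.7)
n <- length(x); p <- 0.95
thetahat <- sqrt(mean(x^2))                                  # maximum likelihood estimate
a <- qchisq((1 - p)/2, df = 2*n); b <- qchisq((1 + p)/2, df = 2*n)
c(thetahat*sqrt(2*n/b), thetahat*sqrt(2*n/a))                # 100p% equal tail confidence interval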
6.6.6 Example

Suppose $X=(X_1,X_2,\ldots,X_n)$ is a random sample from the Uniform($0,\theta$) distribution.
Use Theorem 6.6.4 to find a pivotal quantity $Q(X;\theta)$. Show how the pivotal quantity can
be used to construct a 100p% confidence interval for $\theta$ of the form $[\hat{\theta},a\hat{\theta}]$.

Solution

From Chapter 2, Problem 3(b) we know that $\theta$ is a scale parameter for the Uniform($0,\theta$)
distribution. From Example 6.2.9 the maximum likelihood estimator of $\theta$ is
$$\tilde{\theta}=X_{(n)}$$
Therefore by Theorem 6.6.4
$$Q(X;\theta)=\frac{\tilde{\theta}}{\theta}=\frac{X_{(n)}}{\theta}$$
is a pivotal quantity. To construct a confidence interval for $\theta$ we need to determine the
distribution of $Q(X;\theta)$. For $0\le q\le 1$,
$$P(Q(X;\theta)\le q)=P\!\left(\frac{\tilde{\theta}}{\theta}\le q\right)=P(X_{(n)}\le q\theta)=\prod_{i=1}^{n}P(X_i\le q\theta)=\prod_{i=1}^{n}q=q^{n}$$
since $P(X_i\le x)=\dfrac{x}{\theta}$ for $0\le x\le\theta$.

To construct a 100p% confidence interval for $\theta$ of the form $[\hat{\theta},a\hat{\theta}]$ we need to choose $a$ such
that
$$p=P(\tilde{\theta}\le\theta\le a\tilde{\theta})=P\!\left(\frac{1}{a}\le\frac{\tilde{\theta}}{\theta}\le 1\right)=P\!\left(\frac{1}{a}\le Q(X;\theta)\le 1\right)
=P(Q(X;\theta)\le 1)-P\!\left(Q(X;\theta)\le\frac{1}{a}\right)=1-\left(\frac{1}{a}\right)^{n}$$
or $a=(1-p)^{-1/n}$. The 100p% confidence interval for $\theta$ is
$$\left[\hat{\theta},\ (1-p)^{-1/n}\hat{\theta}\right]$$
where $\hat{\theta}=x_{(n)}$.
6.7 Approximate Confidence Intervals

A pivotal quantity does not always exist. For example there is no pivotal quantity for the
Binomial or the Poisson distributions. In these cases we use an asymptotic pivotal quantity
to construct approximate confidence intervals.

6.7.1 Definition - Asymptotic Pivotal Quantity

Suppose $X_n=(X_1,X_2,\ldots,X_n)$ is a random variable whose distribution depends on $\theta$. The
random variable $Q(X_n;\theta)$ is called an asymptotic pivotal quantity if the limiting distribution
of $Q(X_n;\theta)$ as $n\to\infty$ does not depend on $\theta$.

In your previous statistics course, approximate confidence intervals for the Binomial and
Poisson distributions were justified using a Central Limit Theorem argument. We are now
able to clearly justify the asymptotic pivotal quantity using the theorems of Chapter 5.

6.7.2 Example

Suppose $X_n=(X_1,X_2,\ldots,X_n)$ is a random sample from the Poisson($\theta$) distribution. Show
that
$$Q(X_n;\theta)=\frac{\sqrt{n}\,(\bar{X}_n-\theta)}{\sqrt{\bar{X}_n}}$$
is an asymptotic pivotal quantity. Show how $Q(X_n;\theta)$ can be used to construct an approximate 100p% equal tail confidence interval for $\theta$.

Solution

In Example 5.5.4, the Weak Law of Large Numbers, the Central Limit Theorem and Slutsky's Theorem were all used to prove
$$Q(X_n;\theta)=\frac{\sqrt{n}\,(\bar{X}_n-\theta)}{\sqrt{\bar{X}_n}}\to_D Z\sim N(0,1)\tag{6.14}$$
and thus $Q(X_n;\theta)$ is an asymptotic pivotal quantity.

Let $a$ be the value such that $P(Z\le a)=(1+p)/2$ where $Z\sim N(0,1)$. Then by (6.14) we
have
$$p\approx P\!\left(-a\le\frac{\sqrt{n}\,(\bar{X}_n-\theta)}{\sqrt{\bar{X}_n}}\le a\right)=P\!\left(\bar{X}_n-a\sqrt{\frac{\bar{X}_n}{n}}\le\theta\le\bar{X}_n+a\sqrt{\frac{\bar{X}_n}{n}}\right)$$
and an approximate 100p% equal tail confidence interval for $\theta$ is
$$\left[\bar{x}_n-a\sqrt{\frac{\bar{x}_n}{n}},\ \bar{x}_n+a\sqrt{\frac{\bar{x}_n}{n}}\right]$$
or
$$\left[\hat{\theta}_n-a\sqrt{\frac{\hat{\theta}_n}{n}},\ \hat{\theta}_n+a\sqrt{\frac{\hat{\theta}_n}{n}}\right]\tag{6.15}$$
since $\hat{\theta}_n=\bar{x}_n$.

6.7.3 Exercise
Suppose $X_n\sim$ Binomial($n,\theta$). Show that
$$Q(X_n;\theta)=\frac{\sqrt{n}\left(\frac{X_n}{n}-\theta\right)}{\sqrt{\frac{X_n}{n}\left(1-\frac{X_n}{n}\right)}}$$
is an asymptotic pivotal quantity. Show that an approximate 100p% equal tail confidence
interval for $\theta$ based on $Q(X_n;\theta)$ is given by
$$\left[\hat{\theta}_n-a\sqrt{\frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}},\ \hat{\theta}_n+a\sqrt{\frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}}\right]\tag{6.16}$$
where $\hat{\theta}_n=\dfrac{x_n}{n}$.

6.7.4 Approximate Confidence Intervals and the Limiting Distribution of the Maximum Likelihood Estimator
The limiting distribution of the maximum likelihood estimator $\tilde{\theta}_n=\tilde{\theta}_n(X_1,X_2,\ldots,X_n)$
can also be used to construct approximate confidence intervals. This is particularly useful
in cases in which the maximum likelihood estimate cannot be found explicitly.

Since
$$[J(\tilde{\theta}_n)]^{1/2}(\tilde{\theta}_n-\theta)\to_D Z\sim N(0,1)$$
then $[J(\tilde{\theta}_n)]^{1/2}(\tilde{\theta}_n-\theta)$ is an asymptotic pivotal quantity. An approximate 100p% confidence interval based on this asymptotic pivotal quantity is given by
$$\hat{\theta}_n\pm a\sqrt{\frac{1}{J(\hat{\theta}_n)}}=\left[\hat{\theta}_n-a\sqrt{\frac{1}{J(\hat{\theta}_n)}},\ \hat{\theta}_n+a\sqrt{\frac{1}{J(\hat{\theta}_n)}}\right]\tag{6.17}$$
where $P(Z\le a)=\dfrac{1+p}{2}$ and $Z\sim N(0,1)$.

Similarly since
$$[I(\tilde{\theta}_n;X)]^{1/2}(\tilde{\theta}_n-\theta)\to_D Z\sim N(0,1)$$
then $[I(\tilde{\theta}_n;X)]^{1/2}(\tilde{\theta}_n-\theta)$ is an asymptotic pivotal quantity. An approximate 100p% confidence interval based on this asymptotic pivotal quantity is given by
$$\hat{\theta}_n\pm a\sqrt{\frac{1}{I(\hat{\theta}_n)}}=\left[\hat{\theta}_n-a\sqrt{\frac{1}{I(\hat{\theta}_n)}},\ \hat{\theta}_n+a\sqrt{\frac{1}{I(\hat{\theta}_n)}}\right]\tag{6.18}$$
where $I(\hat{\theta}_n)$ is the observed information.

Notes:

(1) One drawback of these intervals is that we don't know how large $n$ needs to be to obtain
a good approximation.

(2) These approximate confidence intervals are both symmetric about $\hat{\theta}_n$, which may not
be a reasonable summary of the plausible values of $\theta$ in light of the observed data. See
likelihood intervals below.

(3) It is possible to obtain approximate confidence intervals based on a given data set which
contain values which are not valid for $\theta$. For example, an interval may contain negative
values although $\theta$ must be positive.
6.7.5 Example

Use the results from Example 6.3.5 to determine the approximate 100p% confidence intervals
based on (6.17) and (6.18) in the case of Binomial data and Poisson data. Compare these
intervals with the intervals in (6.16) and (6.15).

Solution

From Example 6.3.5 we have that for Binomial data
$$I(\hat{\theta}_n)=J(\hat{\theta}_n)=\frac{n}{\hat{\theta}_n(1-\hat{\theta}_n)}$$
so (6.17) and (6.18) both give the interval
$$\left[\hat{\theta}_n-a\sqrt{\frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}},\ \hat{\theta}_n+a\sqrt{\frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}}\right]$$
which is the same interval as in (6.16).

From Example 6.3.5 we have that for Poisson data
$$I(\hat{\theta}_n)=J(\hat{\theta}_n)=\frac{n}{\hat{\theta}_n}$$
so (6.17) and (6.18) both give the interval
$$\left[\hat{\theta}_n-a\sqrt{\frac{\hat{\theta}_n}{n}},\ \hat{\theta}_n+a\sqrt{\frac{\hat{\theta}_n}{n}}\right]$$
which is the same interval as in (6.15).
6.7.6 Example

For Example 6.3.7 construct an approximate 95% confidence interval based on (6.18). Compare this with the 15% likelihood interval determined in Example 6.4.4.

Solution

From Example 6.3.7 we have $\hat{\theta}=0.4951605$ and $I(\hat{\theta})=181.8069$. Therefore an approximate
95% confidence interval based on (6.18) is
$$\hat{\theta}\pm 1.96\sqrt{\frac{1}{I(\hat{\theta})}}=0.4951605\pm 1.96\sqrt{\frac{1}{181.8069}}=0.4951605\pm 0.145362=[0.3498,\ 0.6405]$$
From Example 6.4.4 the 15% likelihood interval is $[0.3550,\,0.6401]$, which is very similar. We
expect this to happen since the relative likelihood function in Figure 6.3 is very symmetric.
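This interval can also be computed directly in R. The sketch below (added for illustration) assumes the objects x, thetahat and the function WBIF from Example 6.3.7 are still available in the workspace:

# approximate 95% confidence interval (6.18) for the Weibull example
a <- qnorm(0.975)                         # approximately 1.96
se <- sqrt(1/WBIF(thetahat, x))           # estimated standard error 1/sqrt(I(thetahat))
c(thetahat - a*se, thetahat + a*se)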
6.7.7 Approximate Confidence Intervals and Likelihood Intervals

In your previous statistics course you learned that likelihood intervals are also approximate
confidence intervals.

6.7.8 Theorem

If $a$ is a value such that $p=2P(Z\le a)-1$ where $Z\sim N(0,1)$, then the likelihood interval
$\{\theta:R(\theta)\ge e^{-a^{2}/2}\}$ is an approximate 100p% confidence interval.

Proof

By Theorem 6.5.1
$$-2\log R(\theta;X_n)=-2\log\frac{L(\theta;X_n)}{L(\tilde{\theta}_n;X_n)}\to_D W\sim\chi^{2}(1)\tag{6.19}$$
where $X_n=(X_1,X_2,\ldots,X_n)$. The confidence coefficient corresponding to the interval
$\{\theta:R(\theta)\ge e^{-a^{2}/2}\}$ is
$$P\!\left(\frac{L(\theta;X_n)}{L(\tilde{\theta}_n;X_n)}\ge e^{-a^{2}/2}\right)=P\!\left(-2\log R(\theta;X_n)\le a^{2}\right)
\approx P(W\le a^{2})\quad\text{where }W\sim\chi^{2}(1)\text{, by (6.19)}$$
$$=2P(Z\le a)-1\quad\text{where }Z\sim N(0,1)=p$$
as required.
6.7.9 Example

Since
$$0.95=2P(Z\le 1.96)-1\quad\text{where }Z\sim N(0,1)$$
and
$$e^{-(1.96)^{2}/2}=e^{-1.9208}\approx 0.1465\approx 0.15$$
therefore a 15% likelihood interval for $\theta$ is also an approximate 95% confidence interval for
$\theta$.

6.7.10 Exercise

(a) Show that a 1% likelihood interval is an approximate 99.8% confidence interval.

(b) Show that a 50% likelihood interval is an approximate 76% confidence interval.

Note that while the confidence intervals given by (6.17) or (6.18) are symmetric about the
point estimate $\hat{\theta}_n$, this is not true in general for likelihood intervals.

6.7.11 Example

For Example 6.3.7 compare a 15% likelihood interval with the approximate 95% confidence
interval in Example 6.7.6.

Solution

From Example 6.4.4 the 15% likelihood interval is
$$[0.3550,\ 0.6401]$$
and from Example 6.7.6 the approximate 95% confidence interval is
$$[0.3498,\ 0.6405]$$
These intervals are very close and agree to 2 decimal places. The reason for this is that
the likelihood function (see Figure 6.3) is very symmetric about the maximum likelihood
estimate. The approximate intervals (6.17) or (6.18) will be close to the corresponding
likelihood interval whenever the likelihood function is reasonably symmetric about the
maximum likelihood estimate.
6.7.12 Exercise

Suppose $x_1,x_2,\ldots,x_n$ is an observed random sample from the Logistic($\theta,1$) distribution
with probability density function
$$f(x;\theta)=\frac{e^{-(x-\theta)}}{\left[1+e^{-(x-\theta)}\right]^{2}}\quad\text{for }x\in\Re,\ \theta\in\Re$$

(a) Find the likelihood function, the score function, and the information function. How
would you find the maximum likelihood estimate of $\theta$?

(b) Show that if $u$ is an observation from the Uniform($0,1$) distribution then
$$x=\theta-\log\!\left(\frac{1}{u}-1\right)$$
is an observation from the Logistic($\theta,1$) distribution.

(c) Use the following R code to randomly generate 30 observations from a Logistic($\theta,1$)
distribution.
# randomly generate 30 observations from a Logistic(theta,1)
# using a random theta value between 2 and 3
set.seed(21086689) # set the seed so results can be reproduced
truetheta<-runif(1,min=2,max=3)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round((truetheta-log(1/runif(30)-1)),2))
x
(d) Use R to plot the likelihood function for $\theta$ based on these data.

(e) Use Newton's Method and R to find $\hat{\theta}$.

(f) What are the values of $S(\hat{\theta})$ and $I(\hat{\theta})$?

(g) Use R to plot the relative likelihood function for $\theta$ based on these data.

(h) Compare the 15% likelihood interval with the approximate 95% confidence interval
(6.18).
6.8 Chapter 6 Problems

1. Suppose $X\sim$ Binomial($n,\theta$). Plot the log relative likelihood function for $\theta$ if $x=3$ is
observed for $n=100$. On the same graph plot the log relative likelihood function for $\theta$
if $x=6$ is observed for $n=200$. Compare the graphs as well as the 10% likelihood
interval and 50% likelihood interval for $\theta$.

2. Suppose $x_1,x_2,\ldots,x_n$ is an observed random sample from the Discrete Uniform($1,\theta$)
distribution. Find the likelihood function, the maximum likelihood estimate of $\theta$ and
the maximum likelihood estimator of $\theta$. If $n=20$ and $x_{(20)}=33$, find a 15% likelihood
interval for $\theta$.

3. Suppose $x_1,x_2,\ldots,x_n$ is an observed random sample from the Geometric($\theta$) distribution.

(a) Find the score function and the maximum likelihood estimator of $\theta$.

(b) Find the observed information and the expected information.

(c) Find the maximum likelihood estimator of $E(X_i)$.

(d) If $n=20$ and $\sum_{i=1}^{20}x_i=40$ then find the maximum likelihood estimate of $\theta$ and a
15% likelihood interval for $\theta$. Is $\theta=0.5$ a plausible value of $\theta$? Why?

4. Suppose $(X_1,X_2,X_3)\sim$ Multinomial($n;\theta^{2},2\theta(1-\theta),(1-\theta)^{2}$). Find the maximum
likelihood estimator of $\theta$, the observed information and the expected information.

5. Suppose $x_1,x_2,\ldots,x_n$ is an observed random sample from the Pareto($1,\theta$) distribution.

(a) Find the score function and the maximum likelihood estimator of $\theta$.

(b) Find the observed information and the expected information.

(c) Find the maximum likelihood estimator of $E(X_i)$.

(d) If $n=20$ and $\sum_{i=1}^{20}\log x_i=10$, find the maximum likelihood estimate of $\theta$ and a
15% likelihood interval for $\theta$. Is $\theta=0.1$ a plausible value of $\theta$? Why?

(e) Show that
$$Q(X;\theta)=2\theta\sum_{i=1}^{n}\log X_i$$
is a pivotal quantity. (Hint: What is the distribution of $\log X$ if $X\sim$ Pareto($1,\theta$)?)
Use this pivotal quantity to determine a 95% equal tail confidence interval for $\theta$ for
the data in (d). Compare this interval with the 15% likelihood interval.

6. The following model is proposed for the distribution of family size in a large population:
$$P(k\text{ children in family};\theta)=\theta^{k}\quad\text{for }k=1,2,\ldots$$
$$P(0\text{ children in family};\theta)=\frac{1-2\theta}{1-\theta}$$
The parameter $\theta$ is unknown and $0<\theta<\frac{1}{2}$. Fifty families were chosen at random
from the population. The observed numbers of children are given in the following
table:

No. of children       0   1   2   3   4   Total
Frequency observed   17  22   7   3   1      50

(a) Find the likelihood, log likelihood, score and information functions for $\theta$.

(b) Find the maximum likelihood estimate of $\theta$ and the observed information.

(c) Find a 15% likelihood interval for $\theta$.

(d) A large study done 20 years earlier indicated that $\theta=0.45$. Is this value plausible
for these data?

(e) Calculate estimated expected frequencies. Does the model give a reasonable fit
to the data?

7. Suppose $x_1,x_2,\ldots,x_n$ is an observed random sample from the Two Parameter
Exponential($\theta,1$) distribution. Show that $\theta$ is a location parameter and $\tilde{\theta}=X_{(1)}$ is
the maximum likelihood estimator of $\theta$. Show that
$$P(\tilde{\theta}-\theta\le q)=1-e^{-nq}\quad\text{for }q\ge 0$$
and thus show that
$$\left[\hat{\theta}+\frac{1}{n}\log(1-p),\ \hat{\theta}\right]\quad\text{and}\quad\left[\hat{\theta}+\frac{1}{n}\log\!\left(\frac{1-p}{2}\right),\ \hat{\theta}+\frac{1}{n}\log\!\left(\frac{1+p}{2}\right)\right]$$
are both 100p% confidence intervals for $\theta$. Which confidence interval seems more
reasonable?

8. Suppose $X_1,X_2,\ldots,X_n$ is a random sample from the Gamma($\frac{1}{2},\theta$) distribution.

(a) Find $\tilde{\theta}_n$, the maximum likelihood estimator of $\theta$.

(b) Justify the statement $\tilde{\theta}_n\to_p\theta$.

(c) Find the maximum likelihood estimator of $Var(X_i)$.

(d) Use moment generating functions to show that $Q=\dfrac{2}{\theta}\sum_{i=1}^{n}X_i\sim\chi^{2}(n)$. If $n=20$
and $\sum_{i=1}^{20}x_i=6$, use the pivotal quantity $Q$ to construct an exact 95% equal tail
confidence interval for $\theta$. Is $\theta=0.7$ a plausible value of $\theta$?

(e) Verify that $(\tilde{\theta}_n-\theta)\left[J(\tilde{\theta}_n)\right]^{1/2}\to_D Z\sim N(0,1)$. Use this asymptotic pivotal
quantity to construct an approximate 95% confidence interval for $\theta$. Compare
this interval with the exact confidence interval from (d) and a 15% likelihood
interval for $\theta$. What do the approximate confidence interval and the likelihood
interval indicate about the plausibility of the value $\theta=0.7$?

9. The number of coliform bacteria $X$ in a 10 cubic centimeter sample of water from a
section of lake near a beach has a Poisson($\theta$) distribution.

(a) If a random sample of $n$ specimen samples is taken and $X_1,X_2,\ldots,X_n$ are
the respective numbers of observed bacteria, find the likelihood function, the
maximum likelihood estimator and the expected information for $\theta$.

(b) If $n=20$ and $\sum_{i=1}^{20}x_i=40$, obtain an approximate 95% confidence interval for $\theta$
and a 15% likelihood interval for $\theta$. Compare the intervals.

(c) Suppose there is a fast, simple test which can detect whether there are bacteria
present, but not the exact number. If $Y$ is the number of samples out of $n$ which
have bacteria, show that
$$P(Y=y)=\binom{n}{y}\left(1-e^{-\theta}\right)^{y}\left(e^{-\theta}\right)^{n-y}\quad\text{for }y=0,1,\ldots,n$$

(d) If $n=20$ and we found $y=17$ of the samples contained bacteria, use the
likelihood function from part (c) to get an approximate 95% confidence interval
for $\theta$. Hint: Let $\alpha=1-e^{-\theta}$ and use the likelihood function for $\alpha$ to get an
approximate confidence interval for $\alpha$ and then transform this to an approximate
confidence interval for $\theta=-\log(1-\alpha)$.
7. Maximum Likelihood Estimation - Multiparameter

In this chapter we look at the method of maximum likelihood to obtain both point and
interval estimates for the case in which the unknown parameter is a vector of unknown
parameters $\theta=(\theta_1,\theta_2,\ldots,\theta_k)$. In your previous statistics course you would have seen the
N($\mu,\sigma^{2}$) model with two unknown parameters $\theta=(\mu,\sigma^{2})$ and the simple linear regression
model N($\alpha+\beta x,\sigma^{2}$) with three unknown parameters $\theta=(\alpha,\beta,\sigma^{2})$.

Although the case of $k$ parameters is a natural extension of the one parameter case, the
$k$ parameter case is more challenging. For example, the maximum likelihood estimates are
usually found by solving $k$ nonlinear equations in $k$ unknowns $\theta_1,\theta_2,\ldots,\theta_k$. In most cases
there are no explicit solutions and the maximum likelihood estimates must be found using a
numerical method such as Newton's Method. Another challenging issue is how to summarize
the uncertainty in the $k$ estimates. For one parameter it is straightforward to summarize
the uncertainty using a likelihood interval or a confidence interval. For $k$ parameters these
intervals become regions in $\Re^{k}$ which are difficult to visualize and interpret.

In Section 7.1 we give all the definitions related to finding the maximum likelihood
estimates for $k$ unknown parameters. These definitions are analogous to the definitions
which were given in Chapter 6 for one unknown parameter. We also give the extension
of the invariance property of maximum likelihood estimates and Newton's Method for $k$
variables. In Section 7.2 we define likelihood regions and show how to find them for the
case $k=2$.

In Section 7.3 we introduce the Multivariate Normal distribution which is the natural
extension of the Bivariate Normal distribution discussed in Section 3.10. We also give the
limiting distribution of the maximum likelihood estimator of $\theta$ which is a natural extension
of Theorem 6.5.1.

In Section 7.4 we show how to obtain approximate confidence regions for $\theta$ based on
the limiting distribution of the maximum likelihood estimator of $\theta$ and show how to find
them for the case $k=2$. We also show how to find approximate confidence intervals for
individual parameters and indicate that these intervals must be used with care.
7.1 Likelihood and Related Functions

7.1.1 Definition - Likelihood Function

Suppose $X$ is a (vector) random variable with probability (density) function $f(x;\theta)$, where
$\theta=(\theta_1,\theta_2,\ldots,\theta_k)\in\Omega$ and $\Omega$ is the parameter space or set of possible values of $\theta$. Suppose
also that $x$ is an observed value of $X$. The likelihood function for $\theta$ based on the observed
data $x$ is
$$L(\theta)=L(\theta;x)=f(x;\theta)\quad\text{for }\theta\in\Omega\tag{7.1}$$
If $X=(X_1,X_2,\ldots,X_n)$ is a random sample from a distribution with probability function
$f(x;\theta)$ and $x=(x_1,x_2,\ldots,x_n)$ are the observed data then the likelihood function for $\theta$
based on the observed data $x_1,x_2,\ldots,x_n$ is
$$L(\theta)=L(\theta;x)=\prod_{i=1}^{n}f(x_i;\theta)\quad\text{for }\theta\in\Omega$$

Note: If $X$ is a discrete random variable then $L(\theta)=P(\text{observing the data }x;\theta)$. If $X$ is
a continuous random variable then an argument similar to the one in 6.2.6 can be made to
justify the use of (7.1).

7.1.2 Definition - Maximum Likelihood Estimate and Estimator

The value of $\theta$ that maximizes the likelihood function $L(\theta)$ is called the maximum likelihood
estimate. The maximum likelihood estimate is a function of the observed data $x$ and we
write $\hat{\theta}=\hat{\theta}(x)$. The corresponding maximum likelihood estimator, which is a random vector,
is denoted by $\tilde{\theta}=\tilde{\theta}(X)$.

As in the case of $k=1$ it is frequently easier to work with the log likelihood function, which
is maximized at the same value of $\theta$ as the likelihood function.

7.1.3 Definition - Log Likelihood Function

The log likelihood function is defined as
$$l(\theta)=l(\theta;x)=\log L(\theta)\quad\text{for }\theta\in\Omega$$
where $x$ are the observed data and log is the natural logarithmic function.

The maximum likelihood estimate of $\theta$, $\hat{\theta}=(\hat{\theta}_1,\hat{\theta}_2,\ldots,\hat{\theta}_k)$, is usually found by solving
$\dfrac{\partial l(\theta)}{\partial\theta_j}=0$, $j=1,2,\ldots,k$ simultaneously. See Chapter 7, Problem 1 for an example in which
the maximum likelihood estimate is not found in this way.
7.1.4 Definition - Score Vector

If $\theta=(\theta_1,\theta_2,\ldots,\theta_k)$ then the score vector (function) is a $1\times k$ vector of functions defined
as
$$S(\theta)=S(\theta;x)=\left[\frac{\partial l(\theta)}{\partial\theta_1},\frac{\partial l(\theta)}{\partial\theta_2},\ldots,\frac{\partial l(\theta)}{\partial\theta_k}\right]\quad\text{for }\theta\in\Omega$$
where $x$ are the observed data.

We will see that, as in the case of one parameter, the information matrix provides
information about the variance of the maximum likelihood estimator.

7.1.5 Definition - Information Matrix

If $\theta=(\theta_1,\theta_2,\ldots,\theta_k)$ then the information matrix (function) $I(\theta)=I(\theta;x)$ is a $k\times k$
symmetric matrix of functions whose $(i,j)$ entry is given by
$$-\frac{\partial^{2}l(\theta)}{\partial\theta_i\,\partial\theta_j}\quad\text{for }\theta\in\Omega$$
where $x$ are the observed data. $I(\hat{\theta})$ is called the observed information matrix.

7.1.6 Definition - Expected Information Matrix

If $\theta=(\theta_1,\theta_2,\ldots,\theta_k)$ then the expected information matrix (function) $J(\theta)$ is a $k\times k$
symmetric matrix of functions whose $(i,j)$ entry is given by
$$E\!\left[-\frac{\partial^{2}l(\theta;X)}{\partial\theta_i\,\partial\theta_j}\right]\quad\text{for }\theta\in\Omega$$

The invariance property of the maximum likelihood estimator also holds in the multiparameter case.

7.1.7 Theorem - Invariance of the Maximum Likelihood Estimate

If $\hat{\theta}=(\hat{\theta}_1,\hat{\theta}_2,\ldots,\hat{\theta}_k)$ is the maximum likelihood estimate of $\theta=(\theta_1,\theta_2,\ldots,\theta_k)$ then $g(\hat{\theta})$
is the maximum likelihood estimate of $g(\theta)$.

7.1.8 Example

Suppose $x_1,x_2,\ldots,x_n$ is an observed random sample from the N($\mu,\sigma^{2}$) distribution. Find the
score vector, the information matrix, the expected information matrix and the maximum
likelihood estimator of $\theta=(\mu,\sigma^{2})$. Find the observed information matrix $I(\hat{\mu},\hat{\sigma}^{2})$ and thus
verify that $(\hat{\mu},\hat{\sigma}^{2})$ is the maximum likelihood estimate of $(\mu,\sigma^{2})$. What is the maximum
likelihood estimator of the parameter $\psi=g(\mu,\sigma^{2})=\sigma/\mu$, which is called the coefficient of
variation?
230
7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
Solution
The likelihood function is
L
;
2
n
Q
=
i=1
p
= (2 )
1
2
1
exp
n=2
n=2
2
)2
(xi
2
2
exp
n
1 P
2
2
or more simply
L
;
2
=
2
n=2
exp
n
1 P
2
2
The log likelihood function is
l
;
2
=
=
=
n
log
2
n
log
2
n
log
2
2
2
1
2
1
2
2
2
n
1 P
2
1
2
1
n
2
2
(xi
)2
for
2 <,
1
2 <,
2
+
n
P
x)2 + n (x
(xi
h
(n
n
1 i=1
2
2
h
i
for
x)2
(xi
2
)=n
1
2
)2
1) s2 + n (x
n
P
1
@l
= 0;
@
(n
1
(x
)
)2
1) s2 + n (x
@l
=0
@ 2
are solved simultaneously for
2
Since
@2l
@ 2
@2l
@(
2 )2
=
=
>0
>0
)2
The equations
= x and
2
)2
i=1
@l
n
= 2 (x
@
@l
=
@ 2
for
i=1
(xi
2
s2 =
and
)2
i=1
where
Now
(xi
i=1
=
n
1 P
(xi
n i=1
x)2 =
(n
1)
n
@2l
n (x
)
=
2
4
@ @
n 1
1 h
(n 1) s2 + n (x
+
6
2 4
s2
n
;
2
)2
i
i
2 <,
2
>0
7.1. LIKELIHOOD AND RELATED FUNCTIONS
231
the information matrix is
I
;
2
Since
2
=4
n(x
n
2
n(x
)
n 1
2 4
4
+
h
(n
1
6
)
4
)2
1) s2 + n (x
3
i 5
n
n2
2
>
0
and
det
I
^
;
^
=
>0
^2
2^ 6
then by the second derivative test the maximum likelihood estimates of of
I11 ^ ; ^ 2 =
n
1 P
(xi
n i=1
^ = x and ^ 2 =
x)2 =
(n
1)
n
and
s2
and the maximum likelihood estimators are
n
1 P
Xi
n i=1
~ = X and ~ 2 =
The observed information is
2
I ^; ^2 = 4
Now
E
n
n
=
2
2
;
E
2
X
n
^2
0
0
n
2^ 4
"
(n
=
1)
n
S2
3
5
#
n X
4
=0
Also
n 1
1 h
+
(n 1) S 2 + n X
6
2 4
h
n 1
1 n
2
+
(n
1)
E(S
)
+
nE
X
6
2 4
1
n 1
+ 6 (n 1) 2 + 2
2 4
n
2 4
E
=
=
=
since
E X
= 0; E
h
2
X
i
2
= V ar X =
Therefore the expected information matrix is
"
J
2
;
=
n
n
2 4
0
and the inverse of the expected information matrix is
2 2
J
;
2
1
=4
#
0
2
n
n
0
0
2 4
n
3
5
2
i
2
io
and E S 2 =
2
2
are
232
7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
Note that
2
V ar X
=
V ar ^ 2
= V ar
n
n
1 P
Xi
n i=1
X
2
=
2(n
1)
2 4
n
4
n2
and
n
P
1
Cov(X;
Xi
n
i=1
Cov(X; ^ 2 ) =
since X and
n
P
Xi
X
2
2
X )=0
are independent random variables.
i=1
By the invariance property of maximum likelihood estimators the maximum likelihood
estimator of = = is ^ = ^ =^ .
Recall from your previous statistics course that inferences for
the independent pivotal quantities
X
p
S= n
t (n
1) and
See Figure 7.1 for a graph of R
;
(n
1)S 2
2
2
(n
and
2
are made using
1)
for n = 50; ^ = 5 and ^ 2 = 4.
2
1
0.8
0.6
0.4
0.2
0
8
6
6
5.5
4
5
2
σ
2
4.5
0
4
µ
Figure 7.1: Normal Relative Likelihood Function for n = 50; ^ = 5 and ^ 2 = 4
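For illustration (a sketch added here, not part of the original example; the data vector x is hypothetical), the estimates and the observed information matrix can be computed in R as follows:

# hypothetical data, for illustration only
x <- c(4.1, 5.3, 4.8, 6.0, 5.5, 4.4, 5.1, 5.9, 4.7, 5.2)
n <- length(x)
muhat <- mean(x)
sigma2hat <- mean((x - muhat)^2)              # (n-1)*s^2/n
Iobs <- rbind(c(n/sigma2hat, 0),
c(0, n/(2*sigma2hat^2)))                      # observed information matrix
muhat; sigma2hat; Iobs
solve(Iobs)                                   # estimated asymptotic variances of (mutilde, sigma2tilde)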
7.1.9 Exercise

Suppose $Y_i\sim N(\alpha+\beta x_i,\sigma^{2})$, $i=1,2,\ldots,n$ independently, where the $x_i$ are known constants.
Show that the maximum likelihood estimators of $\alpha$, $\beta$ and $\sigma^{2}$ are given by
$$\tilde{\alpha}=\bar{Y}-\tilde{\beta}\bar{x}$$
$$\tilde{\beta}=\frac{\sum_{i=1}^{n}(x_i-\bar{x})\left(Y_i-\bar{Y}\right)}{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}$$
$$\tilde{\sigma}^{2}=\frac{1}{n}\sum_{i=1}^{n}\left(Y_i-\tilde{\alpha}-\tilde{\beta}x_i\right)^{2}$$

Note: $\tilde{\alpha}$ and $\tilde{\beta}$ are also the least squares estimators of $\alpha$ and $\beta$.
7.1.10 Example

Suppose $x_1,x_2,\ldots,x_n$ is an observed random sample from the Beta($a,b$) distribution with
probability density function
$$f(x;a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1}\quad\text{for }0<x<1,\ a>0,\ b>0$$
Find the likelihood function, the score vector, the information matrix and the expected
information matrix. How would you find the maximum likelihood estimates of $a$ and $b$?

Solution

The likelihood function is
$$L(a,b)=\prod_{i=1}^{n}\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x_i^{a-1}(1-x_i)^{b-1}
=\left[\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\right]^{n}\left(\prod_{i=1}^{n}x_i\right)^{a-1}\left[\prod_{i=1}^{n}(1-x_i)\right]^{b-1}$$
or more simply
$$L(a,b)=\left[\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\right]^{n}\left(\prod_{i=1}^{n}x_i\right)^{a}\left[\prod_{i=1}^{n}(1-x_i)\right]^{b}\quad\text{for }a>0,\ b>0$$
The log likelihood function is
$$l(a,b)=n\left[\log\Gamma(a+b)-\log\Gamma(a)-\log\Gamma(b)+at_1+bt_2\right]\quad\text{for }a>0,\ b>0$$
where
$$t_1=\frac{1}{n}\sum_{i=1}^{n}\log x_i\qquad\text{and}\qquad t_2=\frac{1}{n}\sum_{i=1}^{n}\log(1-x_i)$$
Let
$$\psi(z)=\frac{d}{dz}\log\Gamma(z)=\frac{\Gamma'(z)}{\Gamma(z)}$$
which is called the digamma function. The score vector is
$$S(a,b)=\left[\frac{\partial l}{\partial a},\ \frac{\partial l}{\partial b}\right]
=n\left[\psi(a+b)-\psi(a)+t_1,\ \ \psi(a+b)-\psi(b)+t_2\right]$$
for $a>0$, $b>0$. $S(a,b)=(0,0)$ must be solved numerically to find the maximum likelihood
estimates of $a$ and $b$.

Let
$$\psi'(z)=\frac{d}{dz}\psi(z)$$
which is called the trigamma function. The information matrix is
$$I(a,b)=n\begin{bmatrix}\psi'(a)-\psi'(a+b) & -\psi'(a+b)\\ -\psi'(a+b) & \psi'(b)-\psi'(a+b)\end{bmatrix}$$
for $a>0$, $b>0$, which is also the expected information matrix.
7.1.11 Exercise

Suppose $x_1,x_2,\ldots,x_n$ is an observed random sample from the Gamma($\alpha,\beta$) distribution.
Find the likelihood function, the score vector, the information matrix and the expected
information matrix. How would you find the maximum likelihood estimates of $\alpha$ and $\beta$?
Often $S(\theta_1,\theta_2,\ldots,\theta_k)=(0,0,\ldots,0)$ must be solved numerically using a method such as
Newton's Method.

7.1.12 Newton's Method

Let $\theta^{(0)}$ be an initial estimate of $\theta=(\theta_1,\theta_2,\ldots,\theta_k)$. The estimate $\theta^{(i)}$ can be updated
using
$$\theta^{(i+1)}=\theta^{(i)}+S(\theta^{(i)})\left[I(\theta^{(i)})\right]^{-1}\quad\text{for }i=0,1,\ldots$$

Note: The initial estimate, $\theta^{(0)}$, may be determined by calculating $L(\theta)$ for a grid of
values to determine the region in which $L(\theta)$ attains its maximum.
7.1.13 Example

Use the following R code to randomly generate 35 observations from a Beta($a,b$) distribution.
# randomly generate 35 observations from a Beta(a,b)
set.seed(32086689)
# set the seed so results can be reproduced
# use randomly generated a and b values
truea<-runif(1,min=2,max=3)
trueb<-runif(1,min=1,max=4)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round(rbeta(35,truea,trueb),2))
x
Use Newton's Method and R to find $(\hat{a},\hat{b})$. What are the values of $S(\hat{a},\hat{b})$ and $I(\hat{a},\hat{b})$?

Solution

The generated data are

0.08  0.19  0.21  0.25  0.28  0.29  0.29  0.30  0.30  0.32
0.34  0.36  0.39  0.45  0.45  0.47  0.48  0.49  0.54  0.54
0.55  0.55  0.56  0.56  0.61  0.63  0.64  0.65  0.69  0.69
0.73  0.77  0.79  0.81  0.85
The maximum likelihood estimates of $a$ and $b$ can be found using Newton's Method given
by
$$\left[a^{(i+1)},\ b^{(i+1)}\right]=\left[a^{(i)},\ b^{(i)}\right]+S(a^{(i)},b^{(i)})\left[I(a^{(i)},b^{(i)})\right]^{-1}$$
for $i=0,1,\ldots$ until convergence.

Here is R code for Newton's Method for the Beta Example.
# function for calculating Beta score for a and b and data x
BESF<-function(a,b,x)
{S<-length(x)*c(digamma(a+b)-digamma(a)+mean(log(x)),
digamma(a+b)-digamma(b)+mean(log(1-x)))
return(S)}
#
# function for calculating Beta information for a and b
BEIF<-function(a,b)
{I<-length(x)*cbind(c(trigamma(a)-trigamma(a+b),-trigamma(a+b)),
c(-trigamma(a+b),trigamma(b)-trigamma(a+b)))
return(I)}
# Newton’s Method for Beta Example
NewtonBE<-function(a,b,x)
{thold<-c(a,b)
thnew<-thold+0.1
while (sum(abs(thold-thnew))>0.0000001)
{thold<-thnew
thnew<-thold+BESF(thold[1],thold[2],x)%*%solve(BEIF(thold[1],thold[2]))
print(thnew)}
return(thnew)}
thetahat<-NewtonBE(2,2,x)
The maximum likelihood estimates are a
^ = 2:824775 and ^b = 2:97317. The score vector
^
evaluated at (^
a; b) is
S(^
a; ^b) =
h
3:108624
10
14
7:771561
10
15
i
which indicates we have obtained a local extrema. The observed information matrix is
"
#
8:249382
6:586959
I(^
a; ^b) =
6:586959 7:381967
Note that since
det[I(^
a; ^b)] = (8:249382) (7:381967)
(6:586959)2 > 0
and
[I(^
a; ^b)]11 = 8:249382 > 0
then by the second derivative test we have found the maximum likelihood estimates.
7.1.14
Exercise
Use the following R code to randomly generate 30 observations from a Gamma( ; ) distribution
# randomly generate 35 observations from a Gamma(a,b)
set.seed(32067489)
# set the seed so results can be reproduced
# use randomly generated a and b values
truea<-runif(1,min=1,max=3)
trueb<-runif(1,min=3,max=5)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round(rgamma(30,truea,scale=trueb),2))
x
Use Newton’s Method and R to …nd (^ ; ^ ). What are the values of S(^ ; ^ ) and I(^ ; ^ )?
7.2. LIKELIHOOD REGIONS
7.2
237
Likelihood Regions
For one unknown parameter likelihood intervals provide a way of summarizing the uncertainty in the maximum likelihood estimate by providing an intervals of values which are
plausible given the observed data. For k unknown parameter summarizing the uncertainty
is more challenging. We begin with the de…nition of a likelihood region which is the natural
extension of a likelihood interval.
7.2.1
De…nition - Likelihood Regions
The set of
values for which R( )
p is called a 100p% likelihood region for .
A 100p% likelihood region for two unknown parameters = ( 1 ; 2 ) is given by
f( 1 ; 2 ) ; R ( 1 ; 2 ) pg. These regions will be elliptical in shape. To show this we note
that for ( 1 ; 2 ) su¢ ciently close to (^1 ; ^2 ) we have
"
#
"
#
h
i
^1
^1
1
1
1
^1
^2
L ( 1; 2)
L(^1 ; ^2 ) + (^1 ; ^2 )
+
I(^1 ; ^2 )
1
2
^2
^2
2
2
2
"
#
h
i
^
1 ^
1
1
^2
I(^1 ; ^2 )
= L(^1 ; ^2 ) +
since S(^1 ; ^2 ) = 0
1
1
2
^2
2
2
Therefore
R ( 1;
2)
=
L ( 1; 2)
L(^1 ; ^2 )
1
= 1
h
1
^1
2L(^1 ; ^2 )
h
i 1h
2L(^1 ; ^2 )
(
The set of points ( 1 ;
( 1 ; 2 ) which satisfy
(
1
2)
^1 )2 I^11 + 2(
1
1
^2
^1 )(
2
"
^1 )2 I^11 + 2
which satisfy R ( 1 ;
1
2
i
2)
^2 )I^12 + (
I^11 I^12
I^12 I^22
1
#"
^1 (
2
^1
1
^2
2
#
^2 )I^12 + (
2
^2 )2 I^22
i
= p is approximately the set of points
2
^2 )2 I^22 = 2 (1
p) L(^1 ; ^2 )
which we recognize as the points on an ellipse centred at (^1 ; ^2 ). Therefore a 100p%
likelihood region for two unknown parameters = ( 1 ; 2 ) will be the set of points on and
inside a region which will be approximately elliptical in shape.
A similar argument can be made to show that the likelihood regions for three unknown
parameters = ( 1 ; 2 ; 3 ) will be approximate ellipsoids in <3 .
238
7.2.2
7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
Example
(a) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for the parameters (a; b)
in Example 7.1.13. Comment on the shapes of the regions.
(b) Is the value (2:5; 3:5) a plausible value of (a; b)?
Solution
(a) The following R code generates the required likelihood regions.
# function for calculating Beta relative likelihood function
# for parameters a,b and data x
BERLF<-function(a,b,that,x)
{t1<-prod(x)
t2<-prod(1-x)
n<-length(x)
ah<-that[1]
bh<-that[2]
L<-<-((gamma(a+b)*gamma(ah)*gamma(bh))/
(gamma(a)*gamma(b)*gamma(ah+bh)))^n*t1^(a-ah)*t2^(b-bh)
return(L)}
#
a<-seq(0.5,5.5,0.01)
b<-seq(0.5,6,0.01)
R<-outer(a,b,FUN = BERLF,thetahat,x)
contour(a,b,R,levels=c(0.01,0.05,0.10,0.50,0.9),xlab="a",ylab="b",lwd=2)
The 1%, 5%, 10%, 50%, and 90% likelihood regions for (a; b) are shown in Figure 7.2.
The likelihood contours are approximate ellipses but they are not symmetric about
the maximum likelihood estimates (^
a; ^b) = (2:824775; 2:97317). The likelihood regions are
more stretched for larger values of a and b. The ellipses are also skewed relative to the ab
coordinate axes. The skewness of the likelihood contours relative to the ab coordinate axes
is determined by the value of I^12 . If the value of I^12 is close to zero the skewness will be
small.
(b) Since R (2:5; 3:5) = 0:082 the point (2:5; 3:5) lies outside a 10% likelihood region so it
is not a very plausible value of (a; b).
Note however that a = 2:5 is a plausible value of a for some values of b, for example,
a = 2:5, b = 2:5 lies inside a 50% likelihood region so (2:5; 2:5) is a plausible value of (a; b).
We see that when there is more than one parameter then we need to determine whether a
set of values are jointly plausible given the observed data.
239
1
2
3
b
4
5
6
7.2. LIKELIHOOD REGIONS
1
2
3
4
5
a
Figure 7.2: Likelihood regions for Beta example
7.2.3
Exercise
(a) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for the parameters ( ; )
in Exercise 7.1.14. Comment on the shapes of the regions.
(b) Is the value (3; 2:7) a plausible value of ( ; )?
(c) Use the R code in Exercise 7.1.14 to generate 100 observations from the Gamma( ; )
distribution.
(d) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ) for the data
generated in (c). Comment on the shapes of these regions as compared to the regions in
(a).
240
7.3
7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
Limiting Distribution of Maximum Likelihood Estimator
To discuss the asymptotic properties of the maximum likelihood estimator in the multiparameter case we need to de…ne convergence in probability and convergence in distribution
of a sequence of random vectors.
7.3.1
De…nition - Convergence of a Sequence of Random Vectors
Let X1 ; X2 ; : : : ; Xn ; : : : be a sequence of random vectors where Xn = (X1n ; X2n ; : : : ; Xkn ).
Let X = (X1 ; X2 ; : : : ; Xk ) be a random vector and x = (x1 ; x2 ; : : : ; xk ).
(1) If Xin !p Xi for i = 1; 2; : : : ; k, then
Xn !p X
(2) Let Fn (x) = P (X1n x1 ; X2n x2 ; : : : ; Xkn xk ) be the cumulative distribution
function of Xn . Let F (x) = P (X1 x1 ; X2 x2 ; : : : ; Xk xk ) be the cumulative distribution function of X. If
lim Fn (x) = F (x)
n!1
at all points of continuity of F (x) then
Xn !D X = (X1 ; X2 ; : : : ; Xk )
To discuss the asymptotic properties of the maximum likelihood estimator in the multiparameter case we also need the de…nition and properties of the Multivariate Normal Distribution. The Multivariate Normal distribution is the natural extension of the Bivariate
Normal distribution which was discussed in Section 3.10.
7.3.2
De…nition - Multivariate Normal Distribution (MVN)
Let X = (X1 ; X2 ; : : : ; Xk ) be a 1 k random vector with E(Xi ) = i and Cov(Xi ; Xj ) = ij ;
i; j = 1; 2; : : : ; k. (Note: Cov(Xi ; Xi ) = ii = V ar(Xi ) = 2i .) Let
= ( 1; 2; : : : ; k )
be the mean vector and be the k k symmetric covariance matrix whose (i; j) entry is
1 , exists. If the joint probability density
ij . Suppose also that the inverse matrix of ,
function of (X1 ; X2 ; : : : ; Xk ) is given by
f (x1 ; x2 ; : : : ; xk ) =
1
(2 )k=2 j j1=2
exp
1
(x
2
)
1
(x
)T
for x 2 <k
where x = (x1 ; x2 ; : : : ; xk ) then X is said to have a Multivariate Normal distribution. We
write X MVN( ; ).
The following theorem gives some important properties of the Multivariate Normal distribution. These properties are a natural extension of the properties of the Bivariate Normal
distribution found in Theorem 3.10.2.
7.3. LIMITING DISTRIBUTION OF MAXIMUM LIKELIHOOD ESTIMATOR
7.3.3
241
Theorem - Properties of MVN Distribution
Suppose X = (X1 ; X2 ; : : : ; Xk )
MVN( ; ). Then
(1) X has joint moment generating function
1
tT + t tT
2
M (t) = exp
for t = (t1 ; t2 ; : : : ; tk ) 2 <k
(2) Any subset of X1 ; X2 ; : : : ; Xk also has a MVN distribution and in particular
Xi N i ; 2i ; i = 1; 2; : : : ; k.
(3)
(X
1
)
)T
(X
2
(k)
(4) Let c = (c1 ; c2 ; : : : ; ck ) be a nonzero vector of constants then
XcT =
k
P
ci Xi
cT ; c cT
N
i=1
(5) Let A be a k
p vector of constants of rank p
k then
N( A; AT A)
XA
(6) The conditional distribution of any subset of (X1 ; X2 ; : : : ; Xk ) given the rest of the coordinates is a MVN distribution. In particular the conditional probability density function
of Xi given Xj = xj ; i 6= j; is
Xi jXj = xj
N(
i
+
ij i (xj
j )= j ;
(1
2
2
ij ) i )
The following theorem gives the asymptotic distribution of the maximum likelihood estimator in the multiparameter case. This theorem looks very similar to Theorem 6.5.1 with
the scalar quantities replaced by the appropriate vectors and matrices.
7.3.4
Theorem - Limiting Distribution of the Maximum Likelihood Estimator
Suppose Xn = (X1 ; X2 ; : : : ; Xn ) is a random sample from f (x; ) where = ( 1 ; 2 ; : : : ; k ) 2
and the dimension of is k. Let ~n = ~n (X1 ; X2 ; : : : ; Xn ) be the maximum likelihood
estimator of based on Xn . Let 0k be a 1 k vector of zeros, Ik be the k k identity
matrix, and [J( )]1=2 be a matrix such that [J( )]1=2 [J( )]1=2 = J( ). Then under certain
(regularity) conditions
~n !p
(~n
) [J( )]1=2 !D Z
2 log R( ; Xn ) = 2[l(~n ; Xn )
(7.2)
MVN(0k ; Ik )
l( ; Xn )] !D W
(7.3)
2
(k)
(7.4)
242
7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
for each
2 .
Since ~n !p , ~n is a consistent estimator of .
Theorem 7.3.4 implies that for su¢ ciently large n, ~n has an approximately
MVN( ; [J( 0 )] 1 ) distribution. Therefore for su¢ ciently large n
E(~n )
and therefore ~n is an asymptotically unbiased estimator of . Also
V ar(~n )
1
[J( )]
where [J( )] 1 is the inverse matrix of of the matrix J( ). (Since J( ) is a k k symmetric matrix, [J( )] 1 also a k k symmetric matrix.) [J( )] 1 is called the asymptotic
variance/covariance matrix of ~n . Of course J( ) is unknown because is unknown. But
(7.2), (7.3) and Slutsky’s Theorem imply that
(~n
)[J(~n )]1=2 !D Z
MVN(0k ; Ik )
Therefore for su¢ ciently large n we have
V ar(~n )
h
i
where J(^n )
1
h
i
J(^n )
1
is the inverse matrix of J(^n ).
It is also possible to show that
(~n
)[I(~n ; X)]1=2 !D MVN(0k ; Ik )
so that for su¢ ciently large n we also have
V ar(~n )
h
i
where I(^n )
1
h
I(^n )
i
1
is the inverse matrix of the observed information matrix I(^n ).
These results can be used to construct approximate con…dence regions for
in the next section.
as shown
Note: These results do not hold if the support set of X depends on .
7.4
7.4.1
Approximate Con…dence Regions
De…nition - Con…dence Region
A 100p% con…dence region for
satis…es
= ( 1;
2; : : : ; k )
based on X is a region R(X)
<k which
P [ 2 R(X)] = p
Exact con…dence regions can only be obtained in a very few special cases such as Normal
linear models. More generally we must rely on approximate con…dence regions based on
the results of Theorem 7.3.3.
7.4. APPROXIMATE CONFIDENCE REGIONS
7.4.2
243
Asymptotic Pivotal Quantities and Approximate Con…dence Regions
The limiting distribution of ~n can be used to obtain approximate con…dence regions for
. Since
(~n
)[J( )]1=2 !D Z MVN(0k ; Ik )
it follows from Theorem 7.3.3(3) and Limit Theorems that
(~n
)J(~n )(~n
)T !D W
2
(k)
An approximate 100p% con…dence region for based on the asymptotic pivotal quantity
(~n
)J(~n )(~n
)T is the set of all vectors in the set
f : (^n
where c is the value such that P (W
)J(^n )(^n
)T
cg
2 (k).
c) = p and W
Similarly since
(~n
)[I(~n ; Xn )]1=2 !D Z
MVN(0k ; Ik )
it follows from Theorem 7.3.3(3) and Limit Theorems that
(~n
)I(~n ; Xn )(~n
)T !D W
2
(k)
An approximate 100p% con…dence region for based on the asymptotic pivotal quantity
(~n
)I(~n ; Xn )(~n
)T is the set of all vectors in the set
f : (^n
)I(^n )(^n
)T
cg
where I(^n ) is the observed information.
Finally since
2
2 log R( ; Xn ) !D W
an approximate 100p% con…dence region for
the set of all vectors satisfying
f :
(k)
based on this asymptotic pivotal quantity is
2 log R( ; x)
cg
where x = (x1 ; x2 ; : : : ; xn ) are the observed data. Since
f : 2 log R( ; x) cg
n
o
=
: R( ; x) e c=2
we recognize that this interval is actually a likelihood region.
244
7.4.3
7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
Example
Use R and the results from Examples 7.1.10 and 7.1.13 to graph approximate 90%, 95%,
and 99% con…dence regions for (a; b). Compare these approximate con…dence regions with
the likelihood regions in Example 7.2.2.
Solution
From Example 7.1.10 that for a random sample from the Beta(a; b) distribution the information matrix and the expected information matrix are given by
"
#
0 (a)
0 (a + b)
0 (a + b)
I(a; b) = n
= J(a; b)
0 (a + b)
0 (b)
0 (a + b)
Since
h
a
~
a ~b
b
i
J(~
a; ~b)
"
a
~
~b
a
b
#
2
!D W
an approximate 100p% con…dence region for (a; b) is given by
"
#
h
i
a
^
a
^
f(a; b) : a
a; b) ^
^ a ^b b J(^
b b
where P (W
using
c) = p. Since
2 (2)
(2)
cg
= Gamma(1; 2) = Exponential(2), c can be determined
p = P (W
c) =
Zc
1
e
2
x=2
dx = 1
e
c=2
0
which gives
c=
For p = 0:95, c =
2 log(1
p)
2 log(0:05) = 5:99; an approximate 95% con…dence region is given by
"
#
h
i
a
^
a
f(a; b) : a
5:99g
a; ^b) ^
^ a ^b b J(^
b b
If we let
J(^
a; ^b) =
"
J^11 J^12
J^12 J^22
#
then the approximate con…dence region can be written as
f(a; b) : (^
a
a)2 J^11 + 2 (^
a
a) (^b
b)J^12 + (^b
b)2 J^22
5:99g
We note that the approximate con…dence region is the set of points on and inside the ellipse
(^
a
a)2 J^11 + 2 (^
a
a) (^b
b)J^12 + (^b
b)2 J^22 = 5:99
7.4. APPROXIMATE CONFIDENCE REGIONS
245
which is centred at (^
a; ^b).
For the data in Example 7.1.13, a
^ = 2:824775, ^b = 2:97317 and
"
#
8:249382
6:586959
I(^
a; ^b) = J(^
a; ^b) =
6:586959 7:381967
Approximate 90% ( 2 log(0:1) = 4:61), 95% ( 2 log(0:05) = 5:99), and 99% ( 2 log(0:01) = 9:21)
con…dence regions are shown in Figure 7.3.
The following R code generates the required approximate con…dence regions.
1
2
3
b
4
5
6
# function for calculating values for determining confidence regions
ConfRegion<-function(a,b,th,info)
{c<-(th[1]-a)^2*info[1,1]+2*(th[1]-a)*
(th[2]-b)*info[1,2]+(th[2]-b)^2*info[2,2]
return(c)}
#
# graph approximate confidence regions
a<-seq(1,5.5,0.01)
b<-seq(1,6,0.01)
c<-outer(a,b,FUN = ConfRegion,thetahat,Ithetahat)
contour(a,b,c,levels=c(4.61,5.99,9.21),xlab="a",ylab="b",lwd=2)
#
1
2
3
4
5
a
Figure 7.3: Approximate con…dence regions for Beta(a; b) example
246
7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
A 10% likelihood region for (a; b) is given by f(a; b) : R(a; b; x)
2
2 log R(a; b; Xn ) !D W
0:1g. Since
(2) = Exponential (2)
we have
P [R(a; b; X)
0:1] = P [ 2 log R(a; b; X)
P (W
2 log (0:1)]
2 log (0:1))
[ 2 log(0:1)]=2
= 1
e
= 1
0:1 = 0:9
and therefore a 10% likelihood region corresponds to an approximate 90% con…dence region.
Similarly 1% and 5% likelihood regions correspond to approximate 99% and 95% con…dence
regions respectively.
If we compare the likelihood regions in Figure 7.2 with the approximate con…dence
regions shown in Figure 7.3 we notice that the con…dence regions are exact ellipses centred
at the maximum likelihood estimates whereas the likelihood regions are only approximate
ellipses not centered at the maximum likelihood estimates. We notice that there are values
inside an approximate 99% con…dence region but which are outside a 1% likelihood region.
The point (a; b) = (1; 1:5) is an example. There were only 35 observations in this data
set. The di¤erences between the likelihood regions and the approximate con…dence regions
indicate that the Normal approximation might not be good. In this example the likelihood
regions provide a better summary of the uncertainty in the estimates.
7.4.4
Exercise
Use R and the results from Exercises 7.1.11 and 7.1.14 to graph approximate 90%, 95%,
and 99% con…dence regions for (a; b). Compare these approximate con…dence regions with
the likelihood regions in Exercise 7.2.3.
Since likelihood regions and approximate con…dence regions cannot be graphed or easily
interpreted for more than two parameters, we often construct approximate con…dence intervals for individual parameters. Such con…dence intervals are often referred to as marginal
con…dence intervals. These con…dence intervals must be used with care as we will see in
Example 7.4.6.
Approximate con…dence intervals can also be constructed for a linear combination of parameters. An illustration is given in Example 7.4.6.
7.4. APPROXIMATE CONFIDENCE REGIONS
7.4.5
Let
i
247
Approximate Marginal Con…dence Intervals
be the ith entry in the vector
(~n
= ( 1;
2 ; : : : ; k ).
)[J( )]1=2 !D Z
Since
MVN(0k ; Ik )
it follows that an approximate 100p% marginal con…dence interval for
p
p
[^i a v^ii ; ^i + a v^ii ]
i
is given by
where ^i is the ith entry in the vector ^n , v^ii is the (i; i) entry of the matrix [J(^n )]
a is the value such that P (Z a) = 1+p
N(0; 1).
2 where Z
1,
and
Similarly since
(~n
)[I(~n ; Xn )]1=2 !D Z
MVN(0k ; Ik )
it follows that an approximate 100p% con…dence interval for
p
p
[^i a v^ii ; ^i + a v^ii ]
where v^ii is now the (i; i) entry of the matrix [I(^n )]
7.4.6
i
is given by
1.
Example
Using the results from Examples 7.1.10 and 7.1.13 determine approximate 95% marginal
con…dence intervals for a, b, and an approximate con…dence interval for a + b.
Solution
Let
Since
h
i
J(^
a; ^b)
h
a
~
a ~b
b
i
1
=
"
[J(~
a; ~b)]1=2 !D Z
v^11 v^12
v^12 v^22
#
BVN
h
0 0
then for large n, V ar(~
a) v^11 , V ar(~b) v^22 and Cov(~
a; ~b)
mate 95% con…dence interval for a is given by
p
p
^ + 1:96 v^11 ]
[^
a 1:96 v^11 ; a
and an approximate 95% con…dence interval for b is given by
p
p
[^b 1:96 v^22 ; ^b + 1:96 v^22 ]
i
;
"
1 0
0 1
#!
v^12 . Therefore an approxi-
248
7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
For the data in Example 7.1.13, a
^ = 2:824775, ^b = 2:97317 and
h
I(^
a; ^b)
i
1
i 1
J(^
a; ^b)
"
#
8:249382
6:586959
=
6:586959 7:381967
"
#
0:4216186 0:3762120
=
0:3762120 0:4711608
=
h
1
An approximate 95% marginal con…dence interval for a is
p
p
[2:824775 + 1:96 0:4216186; 2:824775 1:96 0:4216186] = [1:998403; 3:651148]
and an approximate 95% con…dence interval for b is
p
p
[2:97317 1:96 0:4711608; 2:97317 + 1:96 0:4711608] = [2:049695; 3:896645]
Note that a = 2:1 is in the approximate 95% marginal con…dence interval for a and b = 3:8
is in the approximate 95% marginal con…dence interval for b and yet the point (2:1; 3:8)
is not in the approximate 95% joint con…dence region for (a; b). Clearly these marginal
con…dence intervals for a and b must be used with care.
To obtain an approximate 95% marginal con…dence interval for a + b we note that
V ar(~
a + ~b) = V ar(~
a) + V ar(~b) + 2Cov(~
a; ~b)
v^11 + v^22 + 2^
v12 = v^
so that an approximate 95% con…dence interval for a + b is given by
p
p
[^
a + ^b 1:96 v^; a
^ + ^b + 1:96 v^]
For the data in Example 7.1.13
a
^ + ^b = 2:824775 + 2:824775 = 5:797945
v^ = v^11 + v^22 + 2^
v12 = 0:4216186 + 0:4711608 + 2(0:3762120) = 1:645203
and an approximate 95% marginal con…dence interval for a + b is
p
p
[5:797945 + 1:96 1:645203; 5:797945 1:96 1:645203] = [2:573347; 9:022544]
7.4.7
Exercise
Using the results from Exercises 7.1.11 and 7.1.14 determine approximate 95% marginal
con…dence intervals for a, b, and an approximate con…dence interval for a + b.
7.5. CHAPTER 7 PROBLEMS
7.5
249
Chapter 7 Problems
1. Suppose x1 ; x2 ; : : : ; xn is a observed random sample from the distribution with cumulative distribution function
F (x;
1; 2)
=1
1
2
for x
x
Find the maximum likelihood estimators of
1
1;
and
1
> 0;
2
>0
2.
2. Suppose (X1 ; X2 ; X3 )
Multinomial(n; 1 ; 2 ; 3 ). Verify that the maximum likelihood estimators of 1 and 2 are ~1 = X1 =n and ~2 = X2 =n. Find the expected
information for 1 and 2 .
3. Suppose x11 ; x12 ; : : : ; x1n1 is an observed random sample from the N( 1 ; 2 ) distribution and independently x21 ; x22 ; : : : ; x2n2 is an observed random sample from the
N( 2 ; 2 ) distribution. Find the maximum likelihood estimators of 1 ; 2 ; and 2 .
4. In a large population of males ages 40 50, the proportion who are regular smokers is
where 0
1 and the proportion who have hypertension (high blood pressure) is
where 0
1. Suppose that n men are selected at random from this population
and the observed data are
Category
Frequency
SH
x11
SH
x12
SH
x21
SH
x22
where S is the event the male is a smoker and H is the event the male has hypertension.
(a) Assuming the events S and H are independent determine the likelihood function,
the score vector, the maximum likelihood estimates, and the information matrix
for and .
(b) Determine the expected information matrix and its inverse matrix. What do
you notice regarding the diagonal entries of the inverse matrix?
5. Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Logistic( ; ) distribution.
(a) Find the likelihood function, the score vector, and the information matrix for
and . How would you …nd the maximum likelihood estimates of and ?
(b) Show that if u is an observation from the Uniform(0; 1) distribution then
x=
log
1
u
1
is an observation from the Logistic( ; ) distribution.
250
7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
(c) Use the following R code to randomly generate 30 observations from a Logistic( ; )
distribution.
# randomly generate 30 observations from a Logistic(mu,beta)
# using a random mu and beta values
set.seed(21086689) # set the seed so results can be reproduced
truemu<-runif(1,min=2,max=3)
truebeta<-runif(1,min=3,max=4)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round((truemu-truebeta*log(1/runif(30)-1)),2))
x
(d) Use Newton’s Method and R to …nd (^ ; ^ ). Determine S(^ ; ^ ) and I(^ ; ^ ).
(e) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ).
(f) Use R to graph approximate 90%, 95%, and 99% con…dence regions for ( ; ).
Compare these approximate con…dence regions with the likelihood regions in (e).
(g) Determine approximate 95% marginal con…dence intervals for , , and an approximate con…dence interval for + .
6. Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Weibull( ; ) distribution.
(a) Find the likelihood function, the score vector, and the information matrix for
and . How would you …nd the maximum likelihood estimates of and ?
(b) Show that if u is an observation from the Uniform(0; 1) distribution then
x=
[ log (1
u)]1=
is an observation from the Weibull( ; ) distribution.
(c) Use the following R code to randomly generate 40 observations from a Weibull( ; )
distribution.
# randomly generate 40 observations from a Weibull(alpha,beta)
# using random values for alpha and beta
set.seed(21086689) # set the seed so results can be reproduced
truealpha<-runif(1,min=2,max=3)
truebeta<-runif(1,min=3,max=4)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round(truebeta*(-log(1-runif(40)))^(1/truealpha),2))
x
(d) Use Newton’s Method and R to …nd (^ ; ^ ). Determine S(^ ; ^ ) and I(^ ; ^ ).
(e) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ).
7.5. CHAPTER 7 PROBLEMS
251
(f) Use R to graph approximate 90%, 95%, and 99% con…dence regions for ( ; ).
Compare these approximate con…dence regions with the likelihood regions in (e).
(g) Determine approximate 95% marginal con…dence intervals for , , and an approximate con…dence interval for + .
7. Suppose Yi Binomial(1; pi ), i = 1; 2; : : : ; n independently where pi = 1 + e
and the xi are known constants.
xi
1
(a) Determine the likelihood function, the score vector, and the expected information
matrix for and .
(b) Explain how you would use Newton’s method to …nd the maximum likelihood
estimates of and .
8. Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Three Parameter Burr
distribution with probability density function
f (x; ; ; ) =
(x= )
1
[1 + (x= ) ]
+1
for x > 0,
> 0;
> 0;
>0
(a) Find the likelihood function, the score vector, and the information matrix for ,
, and . How would you …nd the maximum likelihood estimates of , , and
?
(b) Show that if u is an observation from the Uniform(0; 1) distribution then
h
i1=
x=
(1 u) 1=
1
is an observation from the Three Parameter Burr distribution.
(c) Use the following R code to randomly generate 60 observations from the Three
Parameter Burr distribution.
# randomly generate 60 observations from the 3 Parameter Burr
# distribution using random values for alpha, beta and gamma
set.seed(21086689) # set the seed so results can be reproduced
truea<-runif(1,min=2,max=3)
trueb<-runif(1,min=3,max=4)
truec<-runif(1,min=3,max=4)
# data are sorted and rounded to 2 decimal places for easier display
x<-sort(round(trueb*((1-runif(60))^(-1/truec)-1)^(1/truea),2))
x
(d) Use Newton’s Method and R to …nd (^ ; ^ ; ^ ). Determine S(^ ; ^ ; ^ ) and I(^ ; ^ ; ^ ).
Use the second derivative test to verify that (^ ; ^ ; ^ ) are the maximum likelihood
estimates.
(e) Determine approximate 95% marginal con…dence intervals for , , and , and
an approximate con…dence interval for + + .
252
7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER
8. Hypothesis Testing
Point estimation is a useful statistical procedure for estimating unknown parameters in
a model based on observed data. Interval estimation is a useful statistical procedure for
quantifying the uncertainty in these estimates. Hypothesis testing is another important
statistical procedure which is used for deciding whether a given statement is supported by
the observed data.
In Section 8:1 we review the de…nitions and steps of a test of hypothesis. Much of
this material was introduced in a previous statistics course such as STAT 221/231/241. In
Section 8:2 we look at how the likelihood function can be use to construct a test of hypothesis
when the model is completely speci…ed by the hypothesis of interest. The material in this
section is mostly a review of material covered in a previous statistics course. In Section 8:3
we look at how the likelihood function function can be use to construct a test of hypothesis
when the model is not completely speci…ed by the hypothesis of interest. The material in
this section is mostly new material.
8.1
Test of Hypothesis
In order to analyse a set of data x we often assume a model f (x; ) where 2 and
is the parameter space or set of possible values of . A test of hypothesis is a statistical
procedure used for evaluating the strength of the evidence provided by the observed data
against an hypothesis. An hypothesis is a statement about the model. In many cases the
hypothesis can be formulated in terms of the parameter as
H0 :
2
0
where 0 is some subset of . H0 is called the null hypothesis. When conducting a test
of hypothesis there is usually another statement of interest which is the statement which
re‡ects what might be true if H0 is not supported by the observed data. This statement is
called the alternative hypothesis and is denoted HA or H1 . In many cases HA may simply
take the form
HA : 2
= 0
In constructing a test of hypothesis it is useful to distinguish between simple and composite hypotheses.
253
254
8.1.1
8. HYPOTHESIS TESTING
De…nition - Simple and Composite Hypotheses
If the hypothesis completely speci…es the model including any parameters in the model
then the hypothesis is simple otherwise the hypothesis is composite.
8.1.2
Example
For each of the following indicate whether the null hypothesis is simple or composite. Specify and 0 and determine the dimension of each.
(a) It is assumed that the observed data x = (x1 ; x2 ; : : : ; xn ) represent a random sample
from a Poisson( ) distribution. The hypothesis of interest is H0 : = 0 where 0 is a
speci…ed value of .
(b) It is assumed that the observed data x = (x1 ; x2 ; : : : ; xn ) represent a random sample
from a Gamma( ; ) distribution. The hypothesis of interest is H0 : = 0 where 0 is a
speci…ed value of .
(c) It is assumed that the observed data x = (x1 ; x2 ; : : : ; xn ) represent a random sample
from an Exponential( 1 ) distribution and independently the observed data y = (y1 ; y2 ; : : : ; ym )
represent a random sample from an Exponential( 2 ) distribution. The hypothesis of interest is H0 : 1 = 2 .
(d) It is assumed that the observed data x = (x1 ; x2 ; : : : ; xn ) represent a random sample
from a N( 1 ; 21 ) distribution and independently the observed data y = (y1 ; y2 ; : : : ; ym )
represent a random sample from a N( 2 ; 22 ) distribution. The hypothesis of interest is
H0 : 1 = 2 ; 21 = 22 .
Solution
(a) This is a simple hypothesis since the model and the unknown parameter are completely
speci…ed. = f : > 0g which has dimension 1 and 0 = f 0 g which has dimension 0.
(b) This is a composite hypothesis since is not speci…ed by H0 . = f( ; ) :
which has dimension 2 and 0 = f( 0 ; ) : > 0g which has dimension 1.
> 0;
> 0g
(c) This is a composite hypothesis since 1 and 2 are not speci…ed by H0 .
= f( 1 ; 2 ) : 1 > 0; 2 > 0g which has dimension 2 and
0 = f( 1 ; 2 ) : 1 = 2 ; 1 > 0; 2 > 0g which has dimension 1.
(d) This is a composite hypothesis since 1 ,
2
2
2 > 0;
=
2 2 <;
1 ; 1 ; 2 ; 2 : 1 2 <;
1
2
2
2 = 2;
and 0 =
;
;
;
:
=
;
1
1
2
1 2
2
1
2
dimension 2.
2,
2,
1
2 >0
2
1
and 22 are not speci…ed by H0 .
which has dimension 4
2 <; 21 > 0; 2 2 <; 22 > 0 which has
To measure the evidence against H0 based on the observed data we use a test statistic or
discrepancy measure.
8.1. TEST OF HYPOTHESIS
8.1.3
255
De…nition - Test Statistic or Discrepancy Measure
A test statistic or discrepancy measure D is a function of the data X that is constructed
to measure the degree of “agreement” between the data X and the null hypothesis H0 .
A test statistic is usually chosen so that a small observed value of the test statistic
indicates close agreement between the observed data and the null hypothesis H0 while a
large observed value of the test statistic indicates poor agreement. The test statistic is
chosen before the data are examined and the choice re‡ects the type of departure from the
null hypothesis H0 that we wish to detect as speci…ed by the alternative hypothesis HA . A
general method for constructing test statistics can be based on the likelihood function as
we will see in the next two sections.
8.1.4
Example
For Example 8.1.2(a) suggest a test statistic which could be used if the alternative hypothesis is HA : 6= 0 . Suggest a test statistic which could be used if the alternative hypothesis
is HA : > 0 and if the alternative hypothesis is HA : < 0 .
Solution
If H0 : = 0 is true then E X = 0 . If the alternative hypothesis is HA :
reasonable test statistic which could be used is D = X
0 .
6=
0
then a
If the alternative hypothesis is HA :
used is D = X
0.
>
0
then a reasonable test statistic which could be
If the alternative hypothesis is HA :
used is D = 0 X.
<
0
then a reasonable test statistic which could be
After the data have been collected the observed value of the test statistic is calculated.
Assuming the null hypothesis H0 is true we compute the probability of observing a value
of the test statistic at least as great as that observed. This probability is called the p-value
of the data in relation to the null hypothesis H0 .
8.1.5
De…nition - p-value
Suppose we use the test statistic D = D (X) to test the null hypothesis H0 . Suppose also
that d = D (x) is the observed value of D. The p-value or observed signi…cance level of the
test of hypothesis H0 using test statistic D is
p-value = P (D
d; H0 )
The p-value is the probability of observing such poor agreement using test statistic D
between the null hypothesis H0 and the data if the null hypothesis H0 is true. If the p-value
256
8. HYPOTHESIS TESTING
is very small, then such poor agreement would occur very rarely if the null hypothesis H0 is
true, and we interpret this to mean that the observed data are providing evidence against
the null hypothesis H0 . The smaller the p-value the stronger the evidence against the null
hypothesis H0 based on the observed data. A large p-value does not mean that the null
hypothesis H0 is true but only indicates a lack of evidence against the null hypothesis H0
based on the observed data and using the test statistic D.
The following table gives a rough guideline for interpreting p-values. These are only
guidelines. The interpretation of p-values must always be made in the context of a given
study.
Table 10.1: Guidelines for interpreting p-values
p-value
Interpretation
p-value > 0:10
No evidence against H0 based on the observed data.
0:05 < p-value 0:10
0:01 < p-value 0:05
0:001 < p-value 0:01
p-value
0:001
Weak evidence against H0 based on the observed data.
8.1.6
Evidence against H0 based on the observed data.
Strong evidence against H0 based on the observed data.
Very strong evidence against H0 based on the observed data.
Example
For Example 8.1.4 suppose x = 5:7, n = 25 and 0 = 5. Determine the p-value for both
HA : 6= 0 and HA : > 0 . Give a conclusion in each case.
Solution
For x = 5:7, n = 25; 0 = 5, and HA :
d = j5:7 5j = 0:7, and
p-value = P
X
= P (jT
5
125j
6= 5 the observed value of the test statistic is
0:7; H0 :
17:5)
=5
where T =
25
P
Xi
Poisson (125)
i=1
= P (T
107:5) + P (T
= P (T
107) + P (T
142:5)
143)
= 0:05605429 + 0:06113746
= 0:1171917
calculated using R. Since p-value > 0:1 there is no evidence against H0 :
the data.
For x = 5:7, n = 25;
0
= 5, and HA :
= 5 based on
> 5 the observed value of the test statistic is
8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES
d = 5:7
257
5 = 0:7, and
p-value = P X
= P (T
5
0:7; H0 :
125
17:5)
=5
where T =
25
P
Xi
Poisson (125)
i=1
= P (T
143)
= 0:06113746
Since 0:05 < p-value
8.1.7
0:1 there is weak evidence against H0 :
= 5 based on the data.
Exercise
Suppose in a Binomial experiment 42 successes have been observed in 100 trials and the
hypothesis of interest is H0 : = 0:5.
(a) If the alternative hypothesis is HA :
the p-value and give a conclusion.
6= 0:5, suggest a suitable test statistic, calculate
(b) If the alternative hypothesis is HA :
the p-value and give a conclusion.
< 0:5, suggest a suitable test statistic, calculate
8.2
Likelihood Ratio Tests for Simple Hypotheses
In Examples 8.1.4 and 8.1.7 it was reasonably straightforward to suggest a test statistic
which made sense. In this section we consider a general method for constructing a test
statistic which has good properties in the case of a simple hypothesis. The test statistic
we use is the likelihood ratio test statistic which was introduced in your previous statistics
course.
Suppose X = (X1 ; X2 ; : : : ; Xn ) is a random sample from f (x; ) where 2 and the
dimension of is k. Suppose also that the hypothesis of interest is H0 : = 0 where the
elements of 0 are completely speci…ed. H0 can also be written as H0 : 2 0 where 0
consists of the single point 0 . The dimension of 0 is zero. H0 is a simple hypothesis
since the model and all the parameters are completely speci…ed. The likelihood ratio test
statistic for this simple hypothesis is
(X;
0)
=
2 log R ( 0 ; X)
"
#
L ( 0 ; X)
=
2 log
L(~; X)
h
i
= 2 l(~; X) l ( 0 ; X)
where ~ = ~ (X) is the maximum likelihood estimator of . Note that this test statistic
implicitly assumes that the alternative hypothesis is HA : 6= 0 or HA : 2
= 0.
258
8. HYPOTHESIS TESTING
Let the observed value of the likelihood ratio test statistic be
h
i
(x; 0 ) = 2 l(^; x) l ( 0 ; x)
where x = (x1 ; x2 ; : : : ; xn ) are the observed data. The p-value is
p-value = P [ (X;
0)
(x;
0 ); H0 ]
Note that the p-value is calculated assuming H0 : = 0 is true. In general this p-value
is di¢ cult to determine exactly since the distribution of the random variable (X; 0 ) is
usually intractable. We use the result from Theorem 7.3.4 which says that under certain
(regularity) conditions
2 log R( ; Xn ) = 2[l(~n ; Xn )
2
l( ; Xn )] !D W
(k)
(8.1)
for each 2 where Xn = (X1 ; X2 ; : : : ; Xn ) and ~n = ~n (X1 ; X2 ; : : : ; Xn ).
Therefore based on the asymptotic result (8.1) and assuming H0 : = 0 is true, the
p-value for testing H0 : = 0 using the likelihood ratio test statistic can be approximated
using
p-value
8.2.1
P [W
0 )]
(x;
2
where W
(k)
Example
Suppose X1 ; X2 ; : : : ; Xn is a random sample from the N ; 2 distribution where
known. Show that, in this special case, the likelihood ratio test statistic for testing
H0 : = 0 has exactly a 2 (1) distribution.
Solution
From Example 7.1.8 we have that the likelihood function of
L( ) =
or more simply
n
exp
2
n
1 P
2
i=1
L( ) = exp
n(x
2
)2
for
2
)2
n(x
2
x)2 exp
(xi
is
2
2<
The corresponding log likelihood function is
l( ) =
n(x
2
)2
for
2
Solving
dl
n(x
=
d
)
2
=0
2<
for
2<
2
is
8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES
259
gives = x. Since l( ) is a quadratic function which is concave down we know that ^ = x
is the maximum likelihood estimate. The corresponding maximum likelihood estimator of
is
n
1 P
~=X=
Xi
n i=1
Since L(^ ) = 1, the relative likelihood function is
R( ) =
L( )
L(^ )
n(x
2
= exp
)2
for
2
2<
The likelihood ratio test statistic for testing the hypothesis H0 :
(
0 ; X)
L ( 0 ; X)
L (~ ; X)
n(X
= 2 log exp
2
2
n(X
0)
=
2
=
=
If H0 :
=
0
is true then X
=
0
is
2 log
X
2
0)
since ~ = X
2
2
p 0
= n
N
0;
X
2
and
2
p 0
= n
2
(1)
as required.
8.2.2
Example
Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Poisson( ) distribution.
(a) Find the likelihood ratio test statistic for testing H0 :
ratio statistic takes on large values if ~ > 0 or ~ < 0 .
=
0.
Verify that the likelihood
(b) Suppose x = 6 and n = 25. Use the likelihood ratio test statistic to test H0 :
Compare this with the test in Example 8.1.6.
= 5.
Solution
(a) From Example 6.2.5 we have the likelihood function
L( ) =
nx
e
n
for
0
and maximum likelihood estimate ^ = x. The relative likelihood function can be written
as
n^
L( )
^
R( ) =
=
en( ) for
0
^
L(^)
260
8. HYPOTHESIS TESTING
The likelihood ratio test statistic for H0 :
( 0 ; X) =
=
0
is
2 log R ( 0 ; X)
"
~
=
2 log
~
~ log
= 2n
0
= 2n~
n
0
0
~
1
~
n(~
e
+
0
)
#
~
0
log
0
(8.2)
~
To verify that the likelihood ratio statistic takes on large values if ~ >
equivalently if ~0 < 1 or ~0 > 1, consider the function
g (t) = a [(t
1)
log (t)]
for t > 0 and a > 0
0
or ~ <
0
or
(8.3)
We note that g (t) ! 1 as t ! 0+ and t ! 1. Now
g 0 (t) = a 1
1
t
=a
t
1
t
for t > 0 and a > 0
Since g 0 (t) < 0 for 0 < t < 1, and g 0 (t) > 0 for t > 1 we can conclude that the function
g (t) is a decreasing function for 0 < t < 1 and an increasing function for t > 1 with an
absolute minimum at t = 1. Since g (1) = 0, g (t) is positive for all t > 0 and t 6= 0.
Therefore if we let t =
of t =
0
~
0
~
in (8.2) then we see that
< 1 or large values of t =
(b) If x = 6, n = 25, and H0 :
statistic is
0
~
( 0 ; X) will be large for small values
> 1.
= 5 then the observed value of the likelihood ratio test
(5; x) =
=
2 log R ( 0 ; X)
"
5 25(5:6) 25(6
2 log
e
6
5)
#
= 4:6965
The parameter space is
approximate p-value is
p-value
= f :
> 0g which has dimension 1 and thus k = 1. The
2
P (W 4:6965) where W
(1)
h
i
p
= 2 1 P Z
4:6965
where Z
N (0; 1)
= 0:0302
calculated using R. Since 0:01 < p-value
0:05 there is evidence against H0 : = 5 based
on the data. Compared with the answer in Example 8.1.6 for HA : 6= 5 we note that the
p-values are slightly di¤erent by the conclusion is the same.
8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES
8.2.3
261
Example
Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Exponential( ) distribution.
(a) Find the likelihood ratio test statistic for testing H0 :
ratio statistic takes on large values if ~ > 0 or ~ < 0 .
=
0.
Verify that the likelihood
(b) Suppose x = 6 and n = 25. Use the likelihood ratio test statistic to test H0 :
= 5.
(c) From Example 6.6.3 we have
2
Q(X; ) =
n
P
Xi
i=1
=
2n~
2
(2n)
is a pivotal quantity. Explain how this pivotal quantity could be used to test H0 :
if (i) HA : < 0 , (ii) HA : > 0 , and (iii) HA : 6= 0 .
(d) Suppose x = 6 and n = 25. Use the test statistic from (c) for HA :
H0 : = 5. Compare the answer with the answer in (b).
6=
0
=
0
to test
Solution
(a) From Example 6.2.8 we have the likelihood function
L( ) =
n
e
nx=
for
>0
and maximum likelihood estimate ^ = x. The relative likelihood function can be written
as
!n
^
L( )
^
R( ) =
=
en(1 )= for
0
^
L( )
The likelihood ratio test statistic for H0 :
( 0 ; X) =
=
=
0
is
2 log R ( 0 ; X)
!n
"
~
2 log
en(1
= 2n
"
~
!
1
0
~)=
#
~
log
0
!#
To verify that the likelihood ratio statistic takes on large values if ~ > 0 or ~ < 0 or
~
~
equivalently if 0 < 1 and 0 > 1 we note that ( 0 ; X) is of the form 8.3 so an argument
similar to Example 8.2.2(a) can be used with t =
(b) If x = 6, n = 25, and H0 :
statistic is
~
0
.
= 5 then the observed value of the likelihood ratio test
(5; x) =
=
2 log R ( 0 ; X)
"
6 25 25(1
2 log
e
5
= 0:8839222
6)=5
#
262
8. HYPOTHESIS TESTING
The parameter space is
approximate p-value is
p-value
= f :
> 0g which has dimension 1 and thus k = 1. The
2
P (W 0:8839222) where W
(1)
h
i
p
= 2 1 P Z
0:8839222
where Z
N (0; 1)
= 0:3471
calculated using R. Since p-value > 0:1 there is no evidence against H0 :
the data.
(c) (i) If HA :
>
0
we could let D =
~
~
0
. If H0 :
=
0
= 5 based on
is true then since E(~) =
0
we
would expect observed values of D = 0 to be close to 1. However if HA : > 0 is true
~
then E(~) = > 0 and we would expect observed values of D = 0 to be larger than 1
and therefore large values of D provide evidence against H0 : = 0 . The corresponding
p-value would be
!
~
^
p-value = P
; H0
0
= P
0
2n^
W
0
(ii) If HA :
<
0
we could still let D =
~
~
0
!
where W
. If H0 :
=
0
2
(2n)
is true then since E(~) =
0
we
would expect observed values of D = 0 to be close to 1. However if HA : < 0 is true
~
then E(~) = < 0 and we would expect observed values of D = 0 to be smaller than
1 and therefore small values of D provide evidence against H0 : = 0 . The corresponding
p-value would be
!
~
^
p-value = P
; H0
0
= P
0
2n^
W
0
(iii) If HA :
6=
0
we could still let D =
~
~
0
!
where W
. If H0 :
=
0
2
(2n)
is true then since E(~) =
0
we
would expect observed values of D = 0 to be close to 1. However if HA : 6= 0 is true
~
then E(~) = 6= 0 and we would expect observed values of D = 0 to be either larger
or smaller than 1 and therefore both large and small values of D provide evidence against
H0 : = 0 . If a large (small) value of D is observed it is not simple to determine exactly
which small (large) values should also be considered. Since we are not that concerned about
the exact p-value, the p-value is usually calculated more simply as
!
!!
2n^
2n^
2
p-value = min 2P W
; 2P W
where W
(2n)
0
0
8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES
(d) If x = 6, n = 25, and H0 :
= 5 then the observed value of the D =
(50) 6
5
= min (1:6855; 0:3145)
p-value = min 2P
263
; 2P
W
(50) 6
5
W
~
is d =
0
2
where W
6
5
with
(50)
= 0:3145
calculated using R. Since p-value > 0:1 there is no evidence against H0 :
the data.
h
~
~
We notice the test statistic D = 0 and ( 0 ; X) = 2n
1
log
0
functions of
8.2.4
~
0
= 5 based on
~
0
i
are both
. For this example the p-values are similar and the conclusions are the same.
Example
The following table gives the observed frequencies of the six faces in 100 rolls of a die:
Face: j
Observed Frequency: xj
1
16
2
15
3
14
4
20
5
22
6
13
Total
100
Are these observations consistent with the hypothesis that the die is fair?
Solution
The model for these data is (X1 ; X2 ; : : : ; X6 )
Multinomial(100; 1 ; 2 ; : : : ; 6 ) and the
hypothesis of interest is H0 : 1 = 2 =
= 6 = 61 . Since the model and parameters are
6
P
completely speci…ed this is a simple hypothesis. Since
j = 1 there are really only k = 5
j=1
parameters. The relative likelihood function for ( 1 ;
L ( 1;
2; : : : ; 5)
=
n!
x1 !x2 !
x5 !x6 !
2; : : : ; 5)
x1 x2
1 2
x5
5
x5
5
1
(1
is
1
5)
2
or more simply
L ( 1;
for 0
j
2; : : : ; 5)
x1 x2
1 2
=
1 for j = 1; 2; : : : ; 5 and
5
P
(1
5)
2
x6
1. The log likelihood function is
j
j=1
l ( 1;
2; : : : ; 5)
=
5
P
xj log
j
+ x6 log (1
1
2
5)
j=1
Now
@l
@ j
=
xj
j
=
xj (1
n
1
1
x1
x2
x5
1
2
5
for j = 1; 2; : : : ; 5
5)
2
j (1
1
j
2
(n
x1
x2
5)
x5 )
x6
264
8. HYPOTHESIS TESTING
x5 . We could solve @@lj = 0 for j = 1; 2; : : : ; 5 simultaneously.
In the Binomial case we know ^ = nx . It seems reasonable that the maximum likelihood
x
x
estimate of j is ^j = nj for j = 1; 2; : : : ; 5. To verify this is true we substitute j = nj for
j = 1; 2; : : : ; 5 into @@lj to obtain
since x6 = n
x1
x2
xj P
xi
n i6=j
xj
xj P
xi = 0
n i6=j
xj +
Therefore the maximum likelihood estimator of
Xj
n
is ~j =
for j = 1; 2; : : : ; 5. Note also
5
P
~j = X6 .
that by the invariance property of maximum likelihood estimators ~6 = 1
n
j
j=1
Therefore we can write
6
P
Xj log
l(~1 ; ~2 ; ~3 ; ~4 ; ~5 ; X) =
Xj
n
i=1
Since the null hypothesis is H0 :
1
l(
=
2
=
0 ; X)
=
=
6
P
6
=
1
6
1
6
Xj log
i=1
so the likelihood ratio test statistic is
h
(X; 0 ) = 2 l(~; X)
= 2
6
P
l(
Xj
n
Xj log
i=1
= 2
6
P
Xj log
i=1
0 ; X)
Xj
Ej
i
6
P
Xj log
i=1
1
6
where Ej = n=6 is the expected frequency for outcome j. This test statistic is the likelihood
ratio Goodness of Fit test statistic introduced in your previous statistics course.
For these data the observed value of the likelihood ratio test statistic is
(x;
0)
= 2
6
P
xj log
i=1
= 2 16 log
xj
100=6
16
+ 15 log
100=6
15
100=6
+
+ 13 log
13
100=6
= 3:699649
The approximate p-value is
p-value
P (W
3:699649)
where W
2
(5)
= 0:5934162
calculated using R. Since p-value> 0:1 there is no evidence based on the data against the
hypothesis of a fair die.
8.3. LIKELIHOOD RATIO TESTS FOR COMPOSITE HYPOTHESES
265
Note:
(1) In this example the data (X1 ; X2 ; : : : ; X6 ) are not a random sample. The conditions
for (8.1) hold by thinking of the experiment as a sequence of n independent trials with 6
outcomes on each trial.
(2) You may recall from your previous statistics course that the 2 approximation is reasonable if the expected frequency Ej in each category is at least 5.
8.2.5
Exercise
In a long-term study of heart disease in a large group of men, it was noted that 63 men who
had no previous record of heart problems died suddenly of heart attacks. The following
table gives the number of such deaths recorded on each day of the week:
Day of Week
No. of Deaths
Mon.
22
Tues.
7
Wed.
6
Thurs.
13
Fri.
5
Sat.
4
Sun.
6
Test the hypothesis of interest that the deaths are equally likely to occur on any day of the
week.
8.3
Likelihood Ratio Tests for Composite Hypotheses
Suppose X = (X1 ; X2 ; : : : ; Xn ) is a random sample from f (x; ) where 2 and has
dimension k. Suppose we wish to test H0 : 2 0 where 0 is a subset of of dimension
q where 0 < q < k. The hypothesis H0 is a composite hypothesis since all the values of
the unknown parameters are not speci…ed. For testing composite hypotheses we use the
likelihood ratio test statistic
2
3
max L( ; X)
2 0
5
(X; 0 ) =
2 log 4
max L( ; X)
2
= 2 l(~; X)
max l( ; X)
2
0
where ~ = ~ (X1 ; X2 ; : : : ; Xn ) is the maximum likelihood estimator of . Note that this
test statistic implicitly assumes that the alternative hypothesis is HA : 2
= 0.
Let the observed value of the likelihood ratio test statistic be
(x;
0)
= 2 l(^; x)
max l ( ; x)
2
0
where x = (x1 ; x2 ; : : : ; xn ) are the observed data. The p-value is
p-value = P [ (X;
0)
(x;
0 ); H0 ]
Note that the p-value is calculated assuming H0 : 2 0 is true. In general this p-value
is di¢ cult to determine exactly since the distribution of the random variable (X; 0 ) is
266
8. HYPOTHESIS TESTING
usually intractable. Under certain (regularity) conditions it can be shown that, assuming
the hypothesis H0 : 2 0 is true,
(X;
0)
= 2 l(~n ; Xn )
2
max l( ; Xn ) !D W
2
(k
q)
0
where Xn = (X1 ; X2 ; : : : ; Xn ) and ~n = ~n (X1 ; X2 ; : : : ; Xn ). The approximate p-value is
given by
p-value P [W
(x; 0 )]
where x = (x1 ; x2 ; : : : ; xn ) are the observed data.
Note:
The number of degrees of freedom is the di¤erence between the dimension of
and the
dimension of 0 . The degrees of freedom can also be determined as the number of parameters estimated under the model with no restrictions minus the number of parameters
estimated under the model with restrictions imposed by the null hypothesis H0 .
8.3.1
Example
(a) Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Gamma( ; ) distribution. Find
the likelihood ratio test statistic for testing H0 : = 0 where is unknown. Indicate how
to …nd the approximate p-value.
(b) For the data in Example 7.1.14 test the hypothesis H0 :
= 2.
Solution
(a) From Example 8.1.2(b) we have = f( ; ) : > 0; > 0g which has dimension k = 2
and 0 = f( 0 ; ) : > 0g which has dimension q = 1 and the hypothesis is composite.
From Exercise 7.1.11 the likelihood function is
L( ; ) = [ ( )
]
n
Q
n
xi
n
1 P
exp
i=1
xi
for
> 0;
>0
i=1
where and the log likelihood function is
l ( ; ) = log L ( ; )
=
n log ( )
n log
+
n
P
n
1 P
log xi
i=1
xi
for
> 0;
i=1
The maximum likelihood estimators cannot be found explicitly so we write
l(~ ; ~ ; X) =
n log (~ )
n~ log ~ + ~
n
P
log Xi
i=1
If
=
0
then the log likelihood function is
l(
0; ) =
n log (
0)
n
0 log
+
0
n
P
i=1
log xi
n
P
i=1
n
1 P
X
~ i=1 i
xi
for
>0
>0
8.3. LIKELIHOOD RATIO TESTS FOR COMPOSITE HYPOTHESES
which is only a function of . To determine
267
max l( ; ; X) we note that
( ; )2
0
n
P
and
d
d
l(
0;
xi
d
n 0 i=1
l ( 0; ) =
+
2
d
n
P
= n1 0
xi = x0 and therefore
) = 0 for
i=1
max l( ; ; X) =
( ; )2
n log (
0)
n
0 log
X
+
0
0
+n log (
n~ log ~ + ~
n
P
log Xi
0)
+n
0 log
X
0
0
(~
( 0)
+
(~ )
n
= 2n log
0)
n
P
0
X
~
^ log ^ +
0
x
^
n
1 P
X
~ i=1 i
log Xi + n
0]
i=1
n
P
log Xi +
0 log
X
0
i=1
with corresponding observed value
0)
= 2n log
q=2
~ log ~ +
0
0
i=1
Since k
n
max l( ; ; X)
( ; )2
n log (~ )
(x;
log Xi
0)
= 2 l(~ ; ~ ; X)
= 2[
n
P
i=1
The likelihood ratio test statistic is
(X;
0
(^
( 0)
+
(^ )
n
0)
log xi +
i=1
1=1
p-value
n
P
0 log
x
0
2
P [W
(x; 0 )] where W
(1)
h
i
p
= 2 1 P Z
(x; 0 )
where Z
N (0; 1)
The degrees of freedom can also be determined by noticing that, under the full model two
parameters ( and ) were estimated, and under the null hypothesis H0 : = 0 only one
parameter ( ) was estimated. Therefore 2 1 = 1 are the degrees of freedom.
(b) For H0 :
= 2 and the data in Example 7.1.14 we have n = 30, x = 6:824333,
n
P
1
log xi = 1:794204, ^ = 4:118407, ^ = 1:657032. The observed value of the likelihood
n
i=1
ratio test statistic is (x;
p-value
0)
= 6:886146 with
2
P (W 6:886146) where W
(1)
i
h
p
6:886146
where Z
= 2 1 P Z
N (0; 1)
= 0:008686636
calculated using R. Since p-value
on the data.
0:01 there is strong evidence against H0 :
= 2 based
268
8. HYPOTHESIS TESTING
8.3.2
Example
(a) Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Exponential( 1 ) distribution and
independently Y1 ; Y2 ; : : : ; Ym is a random sample from the Exponential( 2 ) distribution.
Find the likelihood ratio test statistic for testing H0 : 1 = 2 . Indicate how to …nd the
approximate p-value.
10
P
(b) Find the approximate p-value if the observed data are n = 10,
xi = 22, m = 15,
i=1
15
P
yi = 40. What would you conclude?
i=1
Solution
(a) From Example 8.1.2(c) we have
k = 2 and 0 = f( 1 ; 2 ) : 1 = 2 ;
hypothesis is composite.
= f( 1 ; 2 ) : 1 > 0; 2 > 0g which has dimension
> 0; 2 > 0g which has dimension q = 1 and the
1
From Example 6.2.8 the likelihood function for an observed random sample x1 ; x2 ; : : : ; xn
from an Exponential( 1 ) distribution is
n
L1 ( 1 ) =
1
e
nx=
for
1
1
>0
with maximum likelihood estimate ^1 = x.
Similarly the likelihood function for an observed random sample y1 ; y2 ; : : : ; ym from an
Exponential( 2 ) distribution is
L2 ( 2 ) =
m
2
e
my=
for
2
2
>0
with maximum likelihood estimate ^2 = y. Since the samples are independent the likelihood
function for ( 1 ; 2 ) is
L ( 1;
2)
= L1 ( 1 )L2 ( 2 ) for
1
> 0;
2
>0
and the log likelihood function
l ( 1;
2)
=
n log
nx
m log
1
1
my
2
for
1
> 0;
2
>0
2
The independence of the samples also implies the maximum likelihood estimators are still
~1 = X and ~2 = Y . Therefore
l(~1 ; ~2 ; X; Y) =
If
1
=
2
=
n log X
m log Y
(n + m)
then the log likelihood function is
l( ) =
(n + m) log
(nx + my)
for
>0
8.3. LIKELIHOOD RATIO TESTS FOR COMPOSITE HYPOTHESES
which is only a function of . To determine
d
l( ) =
d
and
d
d
l ( ) = 0 for
(
nx+my
n+m
=
l( 1 ;
max
1 ; 2 )2
(
max
1 ; 2 )2
(n + m)
+
l( 1 ;
2 ; X; Y)
269
we note that
0
(nx + my)
2
and therefore
2 ; X; Y)
=
nX + mY
n+m
(n + m) log
0
(n + m)
The likelihood ratio test statistic is
(X; Y;
0)
= 2 l(~1 ; ~2 ; X; Y)
= 2
n log X
(
max
1 ; 2 )2
m log Y
= 2 (n + m) log
l( 1 ;
2 ; X; Y)
0
(n + m) + (n + m) log
nX + m Y
n+m
n log X
nX + mY
n+m
+ (n + m)
m log Y
with corresponding observed value
(x; y;
Since k
q=2
= 2 (n + m) log
nx + my
n+m
n log x
m log y
1=1
p-value
(b) For n = 10,
0)
10
P
2
P [W
(x; y; 0 )] where W
(1)
h
i
p
= 2 1 P Z
(x; y; 0 )
where Z
xi = 22, m = 15,
i=1
test statistic is (x; y;
p-value
15
P
N (0; 1)
yi = 40 the observed value of the likelihood ratio
i=1
0)
= 0:2189032 and
2
P (W 0:2189032) where W
(1)
h
i
p
= 2 1 P Z
0:2189032
where Z
N (0; 1)
= 0:6398769
calculated using R. Since p-value > 0:5 there is no evidence against H0 :
the observed data.
8.3.3
1
=
2
based on
Exercise
(a) Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Poisson( 1 ) distribution and independently Y1 ; Y2 ; : : : ; Ym is a random sample from the Poisson( 2 ) distribution. Find the
270
8. HYPOTHESIS TESTING
likelihood ratio test statistic for testing H0 :
p-value.
1
=
2.
Indicate how to …nd the approximate
(b) Find the approximate p-value if the observed data are n = 10,
10
P
xi = 22, m = 15,
i=1
15
P
yi = 40. What would you conclude?
i=1
8.3.4
Example
In a large population of males ages 40 50, the proportion who are regular smokers is
where 0
1 and the proportion who have hypertension (high blood pressure) is
where 0
1. Suppose that n men are selected at random from this population and
the observed data are
Category SH S H SH S H
Frequency x11 x12 x21 x22
where S is the event the male is a smoker and H is the event the male has hypertension.
Find the likelihood ratio test statistic for testing H0 : events S and H are independent.
Indicate how to …nd the approximate p-value.
Solution
The model for these data is (X1 ; X2 ; : : : ; X6 )
parameter space
(
=
(
11 ; 12 ; 21 ; 22 )
:0
ij
Multinomial(100;
2 P
2
P
1 for i; j = 1; 2
11 ; 12 ; 21 ; 22 )
ij
j=1 i=1
1
with
)
which has dimension k = 3.
Let P (S) =
12 = (1
and P (H) =
), 21 = (1
0
21
= f(
= (1
then the hypothesis of interest can be written as H0 :
) , 22 = (1
) (1
) and
11 ; 12 ; 21 ; 22 )
) ;
22
:
= (1
=
11
;
) (1
12
=
(1
); 0
11
=
,
);
1; 0
1g
which has dimension q = 2 and the hypothesis is composite.
From Example 8.2.4 we can see that the relative likelihood function for (
L(
11 ; 12 ; 21 ; 22 )
=
x11 x12 x21 x22
11 12 21 22
The log likelihood function is
l(
11 ; 12 ; 21 ; 22 )
=
2 P
2
P
j=1 i=1
xij log
ij
11 ; 12 ; 21 ; 22 )
is
8.3. LIKELIHOOD RATIO TESTS FOR COMPOSITE HYPOTHESES
and the maximum likelihood estimate of
ij
is ^ij =
xij
n
271
for i; j = 1; 2. Therefore
2 P
2
P
Xij log
l(~11 ; ~12 ; ~21 ; ~22 ; X) =
j=1 i=1
Xij
n
If the events S and H are independent events then from Chapter 7, Problem 4 we have
that the likelihood function is
x11 +x12
=
)x21 +x22
(1
) (1
)]x22
)x12 +x22
for 0
) ]x21 [(1
)]x12 [(1
)x11 [ (1
L( ; ) = (
x11 +x21
(1
1, 0
1
The log likelihood is
l( ; ) = (x11 + x12 ) log
+ (x21 + x22 ) log(1
+ (x11 + x21 ) log
for 0 <
< 1, 0 <
+ (x12 + x22 ) log(1
)
)
<1
and the maximum likelihood estimates are
^=
x11 + x12
x11 + x21
and ^ =
n
n
therefore
max
l(
11 ; 12 ; 21 ; 22 )2
(
11 ; 12 ; 21 ; 22 ; X)
0
X11 + X12
X21 + X22
+ (X21 + X22 ) log
n
n
X11 + X21
X12 + X22
+ (X11 + X21 ) log
+ (X12 + X22 ) log
n
n
= (X11 + X12 ) log
The likelihood ratio test statistic can be written as
(X;
0)
= 2 l(~11 ; ~12 ; ~21 ; ~22 ; X)
= 2
2 P
2
P
j=1 i=1
RC
Xij log
(
max
11 ; 12 ; 21 ; 22 )2
l(
11 ; 12 ; 21 ; 22 ; X)
0
Xij
Eij
where Eij = in j , Ri = Xi1 + Xi2 , Cj = X1j + X2j for i; j = 1; 2. Eij is the expected
frequency if the hypothesis of independence is true.
The corresponding observed value is
(x;
0)
=2
2 P
2
P
j=1 i=1
where eij =
ri cj
n ,
xij log
xij
eij
ri = xi1 + xi2 , cj = x1j + x2j for i; j = 1; 2.
272
Since k
8. HYPOTHESIS TESTING
q=3
2=1
p-value
2
P [W
(x; 0 )] where W
(1)
h
i
p
= 2 1 P Z
(x; 0 )
where Z
N (0; 1)
This of course is the usual test of independence in a two-way table which was discussed in
your previous statistics course.
8.4. CHAPTER 8 PROBLEMS
8.4
273
Chapter 8 Problems
1. Suppose X1 ; X2 ; : : : ; Xn is a random sample from the distribution with probability
density function
f (x; ) = x
1
for 0
x
1 for
>0
(a) Find the likelihood ratio test statistic for testing H0 :
(b) If n = 20 and
20
P
log xi =
=
0.
25 …nd the approximate p-value for testing H0 :
=1
i=1
using the asymptotic distribution of the likelihood ratio statistic. What would
you conclude?
2. Suppose X1, X2, ..., Xn is a random sample from the Pareto(1, θ) distribution.

   (a) Find the likelihood ratio test statistic for testing H0: θ = θ0. Indicate how to find the approximate p-value.

   (b) For the data n = 25 and Σ_{i=1}^{25} log xi = 40, find the approximate p-value for testing H0: θ = 1. What would you conclude?
3. Suppose X1, X2, ..., Xn is a random sample from the distribution with probability density function

     f(x; θ) = (x/θ²) e^(−(x/θ)²/2)   for x > 0, θ > 0

   (a) Find the likelihood ratio test statistic for testing H0: θ = θ0. Indicate how to find the approximate p-value.

   (b) If n = 20 and Σ_{i=1}^{20} xi² = 10, find the approximate p-value for testing H0: θ = 0.1. What would you conclude?
4. Suppose X1, X2, ..., Xn is a random sample from the Weibull(2, θ1) distribution and independently Y1, Y2, ..., Ym is a random sample from the Weibull(2, θ2) distribution. Find the likelihood ratio test statistic for testing H0: θ1 = θ2. Indicate how to find the approximate p-value.
5. Suppose (X1, X2, X3) ~ Multinomial(n; θ1, θ2, θ3).

   (a) Find the likelihood ratio test statistic for testing H0: θ1 = θ2 = 2θ3. Indicate how to find the approximate p-value.

   (b) Find the likelihood ratio test statistic for testing H0: θ1 = θ², θ2 = 2θ(1 − θ), θ3 = (1 − θ)². Indicate how to find the approximate p-value.
6. Suppose X1, X2, ..., Xn is a random sample from the N(μ1, σ1²) distribution and independently Y1, Y2, ..., Ym is a random sample from the N(μ2, σ2²) distribution. Find the likelihood ratio test statistic for testing H0: μ1 = μ2, σ1² = σ2². Indicate how to find the approximate p-value.
9. Solutions to Chapter Exercises

9.1 Chapter 2
Exercise 2.1.5
(a) P(A) ≥ 0 follows from Definition 2.1.3(A1). From Example 2.1.4(c) we have P(Ā) = 1 − P(A). But from Definition 2.1.3(A1) P(Ā) ≥ 0 and therefore P(A) ≤ 1.
(b) Since A = (A ∩ B) ∪ (A ∩ B̄) and (A ∩ B) ∩ (A ∩ B̄) = ∅ then by Example 2.1.4(b)

  P(A) = P(A ∩ B) + P(A ∩ B̄)

or

  P(A ∩ B̄) = P(A) − P(A ∩ B)

as required.
(c) Since

  A ∪ B = (A ∩ B̄) ∪ (A ∩ B) ∪ (Ā ∩ B)

is the union of three mutually exclusive events then by Definition 2.1.3(A3) and Example 2.1.4(a) we have

  P(A ∪ B) = P(A ∩ B̄) + P(A ∩ B) + P(Ā ∩ B)   (9.1)

By the result proved in (b)

  P(A ∩ B̄) = P(A) − P(A ∩ B)   (9.2)

and similarly

  P(Ā ∩ B) = P(B) − P(A ∩ B)   (9.3)

Substituting (9.2) and (9.3) into (9.1) gives

  P(A ∪ B) = P(A) − P(A ∩ B) + P(A ∩ B) + P(B) − P(A ∩ B) = P(A) + P(B) − P(A ∩ B)

as required.
Exercise 2.3.7
By 2.11.8

  log(1 + x) = x − x²/2 + x³/3 − ···   for −1 < x ≤ 1

Let x = p − 1 to obtain

  log p = (p − 1) − (p − 1)²/2 + (p − 1)³/3 − ···   (9.4)

which holds for 0 < p ≤ 2 and therefore also holds for 0 < p < 1. Now (9.4) can be written as

  log p = −[(1 − p) + (1 − p)²/2 + (1 − p)³/3 + ···] = −Σ_{x=1}^{∞} (1 − p)^x / x   for 0 < p < 1

Therefore

  Σ_{x=1}^{∞} f(x) = Σ_{x=1}^{∞} [−(1 − p)^x / (x log p)] = (−1/log p) Σ_{x=1}^{∞} (1 − p)^x / x = (−1/log p)(−log p) = 1

which holds for 0 < p < 1.
Exercise 2.4.11
(a)

  ∫_{−∞}^{∞} f(x) dx = ∫_0^∞ (α x^(α−1) / β^α) e^(−(x/β)^α) dx

Let y = (x/β)^α. Then dy = (α x^(α−1) / β^α) dx. When x = 0, y = 0 and as x → ∞, y → ∞. Therefore

  ∫_{−∞}^{∞} f(x) dx = ∫_0^∞ (α x^(α−1) / β^α) e^(−(x/β)^α) dx = ∫_0^∞ e^(−y) dy = Γ(1) = 0! = 1

(b) If α = 1 then

  f(x) = (1/β) e^(−x/β)   for x > 0

which is the probability density function of an Exponential(β) random variable.
(c) See Figure 9.1
[Figure 9.1: Graphs of the Weibull probability density function for (α, β) = (1, 0.5), (2, 0.5), (2, 1) and (3, 1).]
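Curves like those in Figure 9.1 can be reproduced with R's built-in Weibull density; dweibull takes the shape and scale parameters in the same (α, β) order used here. A minimal sketch:

# plot Weibull(shape, scale) density curves for several parameter pairs
x <- seq(0, 3, by = 0.01)
pars <- list(c(1, 0.5), c(2, 0.5), c(2, 1), c(3, 1))
plot(x, dweibull(x, shape = 1, scale = 0.5), type = "l", ylab = "f(x)")
for (p in pars[-1]) lines(x, dweibull(x, shape = p[1], scale = p[2]), lty = 2)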
Exercise 2.4.12
(a)

  ∫_{−∞}^{∞} f(x) dx = ∫_β^∞ α β^α x^(−α−1) dx = lim_{b→∞} [−β^α x^(−α)]_{x=β}^{x=b} = 1 − β^α lim_{b→∞} b^(−α) = 1   since α > 0

(b) See Figure 9.2
[Figure 9.2: Graphs of the Pareto probability density function for (α, β) = (0.5, 1), (0.5, 2), (1, 1) and (1, 2).]
Exercise 2.5.4
(a) For X ~ Cauchy(θ, 1) the probability density function is

  f(x; θ) = 1 / {π[1 + (x − θ)²]}   for x ∈ ℝ, θ ∈ ℝ

and 0 otherwise. See Figure 9.3 for a sketch of the probability density function for θ = −1, 0, 1.

[Figure 9.3: Cauchy(θ, 1) probability density functions for θ = −1, 0, 1.]

Let

  f0(x) = f(x; θ = 0) = 1 / [π(1 + x²)]   for x ∈ ℝ

and 0 otherwise. Then

  f(x; θ) = 1 / {π[1 + (x − θ)²]} = f0(x − θ)   for x ∈ ℝ, θ ∈ ℝ

and therefore θ is a location parameter of this distribution.
(b) For X ~ Cauchy(0, θ) the probability density function is

  f(x; θ) = 1 / {πθ[1 + (x/θ)²]}   for x ∈ ℝ, θ > 0

and 0 otherwise. See Figure 9.4 for a sketch of the probability density function for θ = 0.5, 1, 2.

[Figure 9.4: Cauchy(0, θ) probability density functions for θ = 0.5, 1, 2.]

Let

  f1(x) = f(x; θ = 1) = 1 / [π(1 + x²)]   for x ∈ ℝ

and 0 otherwise. Then

  f(x; θ) = 1 / {πθ[1 + (x/θ)²]} = (1/θ) f1(x/θ)   for x ∈ ℝ, θ > 0

and therefore θ is a scale parameter of this distribution.
Exercise 2.6.11
If X ~ Exponential(1) then the probability density function of X is

  f(x) = e^(−x)   for x ≥ 0

Y = βX^(1/α) = h(X) for α > 0, β > 0 is an increasing function with inverse function X = (Y/β)^α = h^(−1)(Y). Since the support set of X is A = {x : x ≥ 0}, the support set of Y is B = {y : y ≥ 0}.
Since

  d/dy h^(−1)(y) = d/dy (y/β)^α = (α/β)(y/β)^(α−1)

then by Theorem 2.6.8 the probability density function of Y is

  g(y) = f(h^(−1)(y)) |d/dy h^(−1)(y)| = (α/β)(y/β)^(α−1) e^(−(y/β)^α)   for y ≥ 0

which is the probability density function of a Weibull(α, β) random variable as required.
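A quick simulation check of this result (the values α = 2 and β = 1.5 below are arbitrary illustrative choices, not from the notes): transform Exponential(1) draws and compare the empirical quantiles with those of the Weibull(α, β) distribution.

# Y = beta * X^(1/alpha) with X ~ Exponential(1) should be Weibull(alpha, beta)
set.seed(1)
alpha <- 2; beta <- 1.5
x <- rexp(100000, rate = 1)
y <- beta * x^(1/alpha)
probs <- c(0.25, 0.5, 0.75, 0.9)
rbind(empirical = quantile(y, probs),
      theoretical = qweibull(probs, shape = alpha, scale = beta))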
Exercise 2.6.12
X is a random variable with probability density function

  f(x) = θ x^(θ−1)   for 0 < x < 1, θ > 0

and 0 otherwise. Y = −log X is a decreasing function with inverse function X = e^(−Y) = h^(−1)(Y). Since the support set of X is A = {x : 0 < x < 1}, the support set of Y is B = {y : y > 0}.
Since

  d/dy h^(−1)(y) = d/dy e^(−y) = −e^(−y)

then by Theorem 2.6.8 the probability density function of Y is

  g(y) = f(h^(−1)(y)) |d/dy h^(−1)(y)| = θ (e^(−y))^(θ−1) e^(−y) = θ e^(−θy)   for y ≥ 0

which is the probability density function of an Exponential(1/θ) random variable as required.
Exercise 2.7.4
Using integration by parts with u = 1 − F(x) and dv = dx gives du = −f(x) dx, v = x and

  ∫_0^∞ [1 − F(x)] dx = x[1 − F(x)] |_0^∞ + ∫_0^∞ x f(x) dx = lim_{x→∞} x ∫_x^∞ f(t) dt + E(X)

The desired result holds if

  lim_{x→∞} x ∫_x^∞ f(t) dt = 0   (9.5)

Since

  0 ≤ x ∫_x^∞ f(t) dt ≤ ∫_x^∞ t f(t) dt

then (9.5) holds by the Squeeze Theorem if

  lim_{x→∞} ∫_x^∞ t f(t) dt = 0

Since E(X) = ∫_0^∞ x f(x) dx exists then G(x) = ∫_0^x t f(t) dt exists for all x > 0 and lim_{x→∞} G(x) = E(X).
By the First Fundamental Theorem of Calculus 2.11.9 and the definition of an improper integral 2.11.11

  ∫_x^∞ t f(t) dt = lim_{b→∞} G(b) − G(x) = E(X) − G(x)

Therefore

  lim_{x→∞} ∫_x^∞ t f(t) dt = lim_{x→∞} [E(X) − G(x)] = E(X) − lim_{x→∞} G(x) = E(X) − E(X) = 0

and (9.5) holds.
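As a numerical sanity check of the identity E(X) = ∫_0^∞ [1 − F(x)] dx, the sketch below integrates the survival function of an Exponential distribution with an arbitrary illustrative mean of 2.

# integrate 1 - F(x) for an Exponential distribution with mean 2
surv <- function(x) 1 - pexp(x, rate = 1/2)
integrate(surv, lower = 0, upper = Inf)$value   # should be close to E(X) = 2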
Exercise 2.7.9
(a) If X ~ Poisson(λ) then

  E(X^(k)) = Σ_{x=0}^{∞} x^(k) λ^x e^(−λ)/x!
           = λ^k e^(−λ) Σ_{x=k}^{∞} λ^(x−k)/(x − k)!   (let y = x − k)
           = λ^k e^(−λ) Σ_{y=0}^{∞} λ^y/y!
           = λ^k e^(−λ) e^λ   (by 2.11.7)
           = λ^k   for k = 1, 2, ...

Letting k = 1 and k = 2 we have

  E(X^(1)) = E(X) = λ   and   E(X^(2)) = E[X(X − 1)] = λ²

so

  Var(X) = E[X(X − 1)] + E(X) − [E(X)]² = λ² + λ − λ² = λ

(b) If X ~ Negative Binomial(k, p) then

  E(X^(j)) = Σ_{x=0}^{∞} x^(j) (−k choose x) p^k (p − 1)^x
           = p^k (p − 1)^j (−k)^(j) Σ_{x=j}^{∞} (−k−j choose x−j) (p − 1)^(x−j)   (by 2.11.4(1); let y = x − j)
           = p^k (p − 1)^j (−k)^(j) Σ_{y=0}^{∞} (−k−j choose y) (p − 1)^y
           = p^k (p − 1)^j (−k)^(j) (1 + p − 1)^(−k−j)   (by 2.11.3(1))
           = (−k)^(j) [(p − 1)/p]^j   for j = 1, 2, ...

Letting j = 1 and j = 2 we have

  E(X^(1)) = E(X) = (−k)(p − 1)/p = k(1 − p)/p

and

  E(X^(2)) = E[X(X − 1)] = (−k)(−k − 1)[(p − 1)/p]² = k(k + 1)[(1 − p)/p]²

so

  Var(X) = E[X(X − 1)] + E(X) − [E(X)]²
         = k(k + 1)(1 − p)²/p² + k(1 − p)/p − k²(1 − p)²/p²
         = k(1 − p)²/p² + k(1 − p)/p
         = [k(1 − p)/p²][(1 − p) + p]
         = k(1 − p)/p²

(c) If X ~ Gamma(α, β) then

  E(X^p) = ∫_0^∞ x^p x^(α−1) e^(−x/β) / [Γ(α) β^α] dx
         = [1/(Γ(α) β^α)] ∫_0^∞ x^(α+p−1) e^(−x/β) dx   (let y = x/β)
         = [β^p/Γ(α)] ∫_0^∞ y^(α+p−1) e^(−y) dy   which converges for α + p > 0
         = β^p Γ(α + p)/Γ(α)   for p > −α

Letting p = 1 and p = 2 we have

  E(X) = β Γ(α + 1)/Γ(α) = αβ   and   E(X²) = β² Γ(α + 2)/Γ(α) = α(α + 1)β²

so

  Var(X) = E(X²) − [E(X)]² = α(α + 1)β² − (αβ)² = αβ²

(d) If X ~ Weibull(β, θ) then

  E(X^k) = ∫_0^∞ x^k (β x^(β−1)/θ^β) e^(−(x/θ)^β) dx   (let y = (x/θ)^β so that x = θ y^(1/β))
         = ∫_0^∞ θ^k y^(k/β) e^(−y) dy
         = θ^k Γ(k/β + 1)   for k = 1, 2, ...

Letting k = 1 and k = 2 we have

  E(X) = θ Γ(1/β + 1)   and   E(X²) = θ² Γ(2/β + 1)

so

  Var(X) = E(X²) − [E(X)]² = θ² { Γ(2/β + 1) − [Γ(1/β + 1)]² }
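The Poisson factorial-moment result E[X(X − 1)] = λ² is easy to confirm by simulation; λ = 3 below is an arbitrary illustrative value.

# check E[X(X - 1)] = lambda^2 for Poisson data
set.seed(4)
lambda <- 3
x <- rpois(1000000, lambda)
c(simulated = mean(x * (x - 1)), exact = lambda^2)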
Exercise 2.8.3
From Markov's inequality we know

  P(|Y| ≥ c) ≤ E(|Y|^k) / c^k   for all k, c > 0   (9.6)

Since we are given that X is a random variable with finite mean μ and finite variance σ², then

  σ² = E[(X − μ)²] = E(|X − μ|²)

Substituting Y = X − μ, k = 2, and c = kσ into (9.6) we obtain

  P(|X − μ| ≥ kσ) ≤ E(|X − μ|²) / (kσ)²

or

  P(|X − μ| ≥ kσ) ≤ σ² / (kσ)² = 1/k²

for all k > 0 as required.
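A quick empirical illustration of Chebyshev's inequality, using a standard Normal sample purely as an example:

# empirical check that P(|X - mu| >= k*sigma) <= 1/k^2
set.seed(3)
x <- rnorm(100000)        # mu = 0, sigma = 1 in this illustration
k <- 2
c(empirical = mean(abs(x) >= k), bound = 1/k^2)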
Exercise 2.9.2
For an Exponential(θ) random variable, √Var(X) = σ(θ) = θ. For g(X) = log X, g′(X) = 1/X. Therefore by (2.5), the variance of Y = g(X) = log X is approximately

  [g′(θ) σ(θ)]² = (1/θ)² (θ)² = 1

which is a constant.
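The variance-stabilizing property can be checked by simulation: the sample variance of log X is the same for different means. (Its exact value, ψ′(1) = π²/6 ≈ 1.64, differs from the first-order approximation of 1, but it does not depend on θ, which is the point of the transformation.)

# var(log X) does not depend on theta for Exponential(theta) data
set.seed(2)
sapply(c(1, 10, 100), function(theta) var(log(rexp(100000, rate = 1/theta))))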
Exercise 2.10.3
(a) If X ~ Binomial(n, p) then

  M(t) = Σ_{x=0}^{n} e^(tx) (n choose x) p^x (1 − p)^(n−x)
       = Σ_{x=0}^{n} (n choose x) (pe^t)^x (1 − p)^(n−x)
       = (pe^t + 1 − p)^n   for t ∈ ℝ, by 2.11.3(1)

(b) If X ~ Poisson(λ) then

  M(t) = Σ_{x=0}^{∞} e^(tx) λ^x e^(−λ)/x!
       = e^(−λ) Σ_{x=0}^{∞} (λe^t)^x/x!   which converges for t ∈ ℝ
       = e^(−λ) e^(λe^t)   (by 2.11.7)
       = e^(−λ + λe^t)   for t ∈ ℝ
Exercise 2.10.6
If X ~ Negative Binomial(k, p) then

  MX(t) = [p / (1 − qe^t)]^k   for t < −log q

By Theorem 2.10.4 the moment generating function of Y = X + k is

  MY(t) = e^(kt) MX(t) = [pe^t / (1 − qe^t)]^k   for t < −log q
Exercise 2.10.14
(a) By the Exponential series 2.11.7

  M(t) = e^(t²/2) = 1 + (t²/2)/1! + (t²/2)²/2! + ··· = 1 + (1/2)t² + [1/(2!·2²)] t⁴ + ···   for t ∈ ℝ

Since E(X^k) = k! × (coefficient of t^k in the Maclaurin series for M(t)) we have

  E(X) = 1!·(0) = 0   and   E(X²) = 2!·(1/2) = 1

and so

  Var(X) = E(X²) − [E(X)]² = 1

(b) By Theorem 2.10.4 the moment generating function of Y = 2X − 1 is

  MY(t) = e^(−t) MX(2t) = e^(−t) e^((2t)²/2) = e^(−t + (2t)²/2)   for t ∈ ℝ

By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a N(−1, 4) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, Y has a N(−1, 4) distribution.
9.2
Chapter 3
Exercise 3.2.5
(a) The joint probability function of X and Y is
f (x; y) = P (X = x; Y = y)
h
n!
2 x
[2 (1
)]y (1
=
x!y! (n x y)!
for x = 0; 1; : : : ; n; y = 0; 1; : : : ; n; x + y n
2
which is a Multinomial n;
; 2 (1
)2
in
x y
)2 distribution or the trinomial distribution.
) ; (1
(b) The marginal probability function of X is
n
Xx
f1 (x) = P (X = x) =
y=0
=
=
=
and so X
2
x!y! (n
2 x
x
y)!
h
)]y (1
[2 (1
)2
in
x y
h
in x y
(n x)!
[2 (1
)]y (1
)2
y! (n x y)!
y=0
h
in x
x
2 (1
) + (1
)2
by the Binomial Series 2.11.3(1)
1
X
2 x
n!
x! (n x)!
n
x
n
x
n!
2 x
Binomial n;
1
2
2 n x
for x = 0; 1; : : : ; n
.
(c) In a similar manner to (b) the marginal probability function of Y can be shown to be
Binomial(n; 2 (1
)) since P (Aa) = 2 (1
).
(d)
P (X + Y = t) =
P
P
(x;y): x+y=t
=
t
X
x=0
=
=
=
x! (t
t
P
f (x; y) =
f (x; t
x)
x=0
n!
x)! (n
in
n h
(1
)2
t
in
n h
(1
)2
t
n
2
+ 2 (1
t
2 x
t)!
t
tX
t
x
x=0
t
2
)
t
2 x
+ 2 (1
h
(1
)]t
[2 (1
[2 (1
)
)2
in
t
t
x
h
(1
)]t
)2
in
t
x
by the Binomial Series 2.11.3(1)
for t = 0; 1; : : : ; n
Thus X + Y Binomial n; 2 + 2 (1
) which makes sense since X + Y is counting the
number of times an AA or Aa type occurs in n trials and the probability of success is
P (AA) + P (Aa) = 2 + 2 (1
).
9.2. CHAPTER 3
289
Exercise 3.3.6
(a)
1 =
Z1 Z1
1
=
f (x; y)dxdy = k
1
k
2
Z1
k
2
Z1
k
2
Z1
0
lim
a!1
0
=
lim
a!1
0
=
0
=
=
=
Z1 Z1
0
1
ja0
(1 + x + y)2
1
dxdy
(1 + x + y)3
dy
1
1
2 +
(1 + a + y)
(1 + y)2
dy
1
dy
(1 + y)2
k
1 a
lim
j
2 a!1 (1 + y) 0
k
1
lim
+1
2 a!1 (1 + a)
k
2
Therefore k = 2. A graph of the joint probability density function is given in Figure 9.5.
2
1.5
1
0.5
0
0
0
1
1
2
2
3
3
4
4
Figure 9.5: Graph of joint probability density function for Exercise 3.3.6
290
9. SOLUTIONS TO CHAPTER EXERCISES
(b) (i)
P (X
1; Y
2) =
Z2 Z1
0
=
0
1
j10 dy
(1 + x + y)2
0
Z2
1
1
dy =
2 +
(2 + y)
(1 + y)2
1
4
1
3
0
=
Z2
2
dxdy =
(1 + x + y)3
1
1
+1=
(3
2
12
4
(ii) Since f (x; y) and the support set A = f(x; y) : x
and y, P (X Y ) = 0:5
1
(2 + y)
6 + 12) =
0; y
1
(1 + y)
5
12
0g are both symmetric in x
(iii)
P (X + Y
1) =
1
Z1 Z
0
=
y
0
Z1
2
dxdy =
(1 + x + y)3
0
1
1
dy =
2 +
(2)
(1 + y)2
0
1
+1
2
1
+0
4
=
Z1
=
1
j10
(1 + x + y)2
1 1
yj
4 0
f (x; y)dy =
1
Z1
1
4
0
=
2
dy = lim
a!1
(1 + x + y)3
1
(1 + x)2
for x
1
ja0
(1 + x + y)2
0
the marginal probability density function of X is
f1 (x) =
8
<
:
0
1
(1+x)2
x<0
x
0
By symmetry the marginal probability density function of Y is
f2 (y) =
8
<
:
0
1
(1+y)2
y<0
y
0
y
1
j1
(1 + y) 0
(c) Since
Z1
j20
dy
9.2. CHAPTER 3
291
(d) Since
P (X
x; Y
y) =
Zy Zx
0
=
0
Zy
2
dsdt =
(1 + s + t)3
1
1
1+x+t 1+t
1
1
1+x+y 1+y
=
1
x
2 j0 dt
(1 + s + t)
0
1
1
dt
2 +
(1 + x + t)
(1 + t)2
0
=
Zy
jy0
1
+ 1 for x
1+x
the joint cumulative distribution function of X and Y is
8
<
0
F (x; y) = P (X x; Y
y) =
1
1
: 1 + 1+x+y 1+y
lim F (x; y) =
lim
y!1
= 1
1+
1
1+x
1
1+x+y
for x
1
1+y
x
1
1+x
0
the marginal cumulative distribution function of X is
(
0
F1 (x) = P (X x) =
1
1 1+x
x<0
x
0
Check:
d
F1 (x) =
dx
d
dx
1
1
1+x
1
(1 + x)2
= f1 (x) for x
0
x < 0 or y < 0
1
1+x
(e) Since
y!1
0, y
=
0
By symmetry the marginal cumulative distribution function of Y is
8
<
0
y<0
F2 (y) = P (Y
y) =
1
: 1 1+y y 0
0, y
0
292
9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 3.3.7
(a) Since the support set is A = f(x; y) : 0 < x < y < 1g
1 =
Z1 Z1
1 1
Z1
= k
f (x; y)dxdy = k
Z1 Zy
0
e
x y y
j0
e
2y
e
x y
dxdy
0
dy
0
= k
Z1
+e
y
dy
0
= k
= k
=
lim
a!1
lim
a!1
1
e
2
1
e
2
2y
e
y
2a
e
a
ja0
1
+1
2
k
2
and therefore k = 2. A graph of the joint probability density function is given in Figure 9.6
2
1.5
1
0.5
0
0
0
0.5
0.5
1
1
1.5
1.5
Figure 9.6: Graph of joint probability density function for Exercise 3.3.7
9.2. CHAPTER 3
293
(b) (i) The region of integration is shown in Figure 9.7.
P (X
1; Y
2) = 2
Z1 Z2
x y
e
dydx = 2
x=0 y=x
= 2
Z1
x
e
e
y 2
jx
dx
x=0
x
e
Z1
2
e
+e
x
dx = 2
x=0
Z1
e
2
3
1
e
2
2
e
x
+e
2x
2
1
2
dx
x=0
2
= 2 e
x
e
= 1 + 2e
1
e
2
3
3e
2x
j10 = 2 e
e
+
2
y
2
y=x
B
(x,x)
x
1
0
Figure 9.7: Region of integration for Exercise 3.3.7(b)(i)
(ii) Since the support set A = f(x; y) : 0 < x < y < 1g contains only values for which
x < y then P (X Y ) = 1.
(iii) The region of integration is shown in Figure 9.8
P (X + Y
1) = 2
1
Z1=2 Z
x
e
x y
dydx = 2
x=0 y=x
= 2
Z1
e
Z1
e
x
e
y 1 x
jx
dx
x=0
x
e
x 1
+e
x
dx = 2
x=0
Z1
e
1
+e
2x
dx
x=0
= 2
e
1
= 1
2e
1
x
1
e
2
2x
1=2
j0
=2
e
1
1
2
1
e
2
1
+0+
1
2
294
9. SOLUTIONS TO CHAPTER EXERCISES
y
1
y=x
(x,1-x)
C
y=1-x
(x,x)
x
1/2
0
1
Figure 9.8: Region of integration for Exercise 3.3.7(b)(iii)
(c) The marginal probability density function of X is
f1 (x) =
Z1
f (x; y)dy = 2
Z1
x y
e
dy = 2e
x
lim
y a
jx
e
a!1
x
1
= 2e
2x
for x > 0
and 0 otherwise which we recognize is an Exponential(1=2) probability density function.
The marginal probability density function of Y is
Z1
f (x; y)dy = 2
1
Zy
e
x y
dx = 2e
y
Zx Zy
s t
e
x y
j0
= 2e
y
1
e
y
for y > 0
0
and 0 otherwise.
(d) Since
P (X
x; Y
y) = 2
e
s=0 t=s
Zx
= 2
e
dtds = 2
Zx
e
s
e t jys ds
0
s
e
y
+e
s
e
s
+e
2s
ds
0
= 2
Zx
e
y
ds = 2 e
y
e
s
1
e
2
0
= 2e
x y
e
2x
2e
y
+ 1 for x
0, x < y
2s
jx0
9.2. CHAPTER 3
295
P (X
x; Y
y) = 2
Zy Zy
s t
e
s=0 t=s
2y
= e
y
2e
dtds
+ 1 for x > y > 0
and
P (X
x; Y
y) = 0
for x
0 or y
x
therefore the joint cumulative distribution function of X and Y is
8
2e x y e 2x 2e y + 1 x > 0, y > x
>
>
<
e 2y 2e y + 1
x>y>0
F (x; y) = P (X x; Y
y) =
>
>
: 0
x 0 or y x
(e) Since the support set is A = f(x; y) : 0 < x < y < 1g and
lim F (x; y) =
lim 2e
y!1
x y
e
y!1
= 1
2x
e
x) =
y
2e
+1
for x > 0
marginal cumulative distribution function of X is
(
F1 (x) = P (X
2x
0
1
e
x
2x
0
x>0
which we also recognize as the cumulative distribution function of an Exponential(1=2)
random variable.
Since
lim F (x; y) =
x!1
lim 2e
x!y
2y
= e
x y
2e
y
e
2x
2e
y
+ 1 = 2e
2y
2y
e
2e
y
+1
+ 1 for y > 0
the marginal cumulative distribution function of Y is
(
0
F2 (y) = P (Y
y) =
1 + e 2y
y
2e
y
0
y
0
Check:
P (Y
y) =
Zy
t
e
+1
1
2
+1 =
2e
f2 (t) dt = 2
0
= 2
or
d
d
F2 (y) =
e
dy
dy
Zy
e
2t
dt = 2
e
t
1
+ e
2
2t
jy0
0
1
e y+ e
2
2y
2e
y
2y
=1+e
2y
+ 2e
y
2y
2e
= f2 (y)
y
for y > 0
for y > 0
296
9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 3.4.4
From the solution to Exercise 3.2.5 we have
n!
f (x; y) =
2 x
[2 (1
x!y! (n x y)!
for x = 0; 1; : : : ; n; y = 0; 1; : : : ; n; x + y
n
x
f1 (x) =
and
f2 (y) =
2 x
n
[2 (1
y
2 n x
1
)]y [1
)2
n
in
x y
for x = 0; 1; : : : ; n
)]n
2 (1
h
)]y (1
y
for y = 0; 1; : : : ; n
Since
2 n
)2n 6= f1 (0) f2 (0) = 1
f (0; 0) = (1
[1
)]n
2 (1
therefore X and Y are not independent random variables.
From the solution to Exercise 3.3.6 we have
2
(1 + x + y)3
for x
0, y
f1 (x) =
1
(1 + x)2
for x
0
f2 (y) =
1
(1 + y)2
for y
0
f (x; y) =
and
0
Since
f (0; 0) = 2 6= f1 (0) f2 (0) = (1) (1)
therefore X and Y are not independent random variables.
From the solution to Exercise 3.3.7 we have
f (x; y) = 2e
x y
f1 (x) = 2e
for 0 < x < y < 1
2x
for x > 0
and
f2 (y) = 2e
y
1
e
y
for y > 0
Since
f (1; 2) = 2e
3
6= f1 (1) f2 (2) = 2e
1
2e
therefore X and Y are not independent random variables.
2
1
e
2
9.2. CHAPTER 3
297
Exercise 3.5.3
P (Y = yjX = x) =
=
P (X = x; Y = y)
P (X = x)
n!
x!y!(n x y)!
2 x
[2 (1
2 x
n!
x!(n x)!
y
[2 (1
=
=
=
=
=
for y = 0; 1; : : : ; (n
(n x)!
y! (n x y)!
"
n x
2 (1
y
1
"
2 (1
n x
y
1
"
2 (1
n x
y
1
"
2 (1
n x
y
1
1
)]
2 y
1
)
2
)
2
)
2
)
2
#y "
#y "
#y "
h
)]y (1
)2
2 n x
h
(1
)
)2
(1
1
2
#(n
2 +
2
1
1
#y "
1
in
x y
x y
2 n x y
1
1
2
in
2
1
2 (1
1
2
x) y
#(n
2 +2
x) y
2
2
)
2
#(n
#(n
x) which is the probability function of a Binomial n
x) y
x) y
x; 2 1(1 2 )
(
)
as
required.
We are given that there are x genotypes of type AA. Therefore there are only n x
members (trials) whose genotype must be determined. The genotype can only be of type
Aa or type aa. In a population with only these two types the proportion of type Aa would
be
2 (1
)
2 (1
)
2 =
2
1
2 (1
) + (1
)
and the proportion of type aa would be
(1
2 (1
)2
) + (1
)
2
=
(1
1
)2
2
=1
2 (1
1
)
2
Since we have (n x) independent trials with probability of Success (type Aa) equal to
2 (1 )
then it follows that the number of Aa types, given that there are x members of type
(1 2 )
AA, would follow a Binomial n
x; 2 1(1 2 )
(
)
distribution.
298
9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 3.5.4
For Example 3.3.3 the conditional probability density function of X given Y = y is
f1 (xjy) =
f (x; y)
x+y
=
f2 (y)
y + 12
for 0 < x < 1 for each 0 < y < 1
Check:
Z1
f1 (xjy) dx =
1
Z1
0
1
x+y
1 dx =
y+2
y+
1 2
x + xyj10
2
1
2
=
1
y+
1
2
1
+y
2
0
=1
By symmetry the conditional probability density function of Y given X = x is
f2 (yjx) =
x+y
x + 21
and
for 0 < y < 1 for each 0 < x < 1
Z1
f2 (yjx) dy = 1
1
For Exercise 3.3.6 the conditional probability density function of X given Y = y is
f (x; y)
f1 (xjy) =
=
f2 (y)
2
(1+x+y)3
1
(1+y)2
=
2 (1 + y)2
(1 + x + y)3
for x > 0 for each y > 0
Check:
Z1
f1 (xjy) dx =
1
Z1
0
1
2 (1 + y)2
2
1
lim
3 dx = (1 + y) a!1
2 j0
(1 + x + y)
(1 + x + y)
= (1 + y)2 lim
a!1
1
1
1
2
=1
2 +
2 = (1 + y)
(1 + a + y)
(1 + y)
(1 + y)2
By symmetry the conditional probability density function of Y given X = x is
f2 (yjx) =
and
2 (1 + x)2
(1 + x + y)3
Z1
for y > 0 for each x > 0
f2 (yjx) dy = 1
1
For Exercise 3.3.7 the conditional probability density function of X given Y = y is
f1 (xjy) =
f (x; y)
2e x y
=
f2 (y)
2e y (1 e
y)
=
e
1
x
e
y
for 0 < x < y for each y > 0
9.2. CHAPTER 3
299
Check:
Z1
Zy
f1 (xjy) dx =
1
x
e
1
e
dx =
y
1
1 e
x y
j0
e
y
=
1
1
y
e
e
y
=1
0
The conditional probability density function of Y given X = x is
f2 (yjx) =
2e x y
f (x; y)
=
= ex
f1 (x)
2e 2x
y
for y > x for each x > 0
Check:
Z1
Z1
f2 (yjx) dy =
ex
y
dy = ex lim
e
y a
jx
Z1 Zy
x y
dxdy
a!1
x
1
= ex 0 + e
x
=1
Exercise 3.6.10
E (XY ) =
Z1 Z1
1 1
Z1
ye
= 2
xyf (x; y)dxdy = 2
y
0
= 2
Z1
ye
0
@
Zy
x
xe
0
y
0
dxA dy = 2
y
ye
0
1
e
y
xye
Z1
y
ye
=
2
0
Z1
2
Z1
Z1
y2e
2y
+ 1 dy =
dy
(2y) e
dy
(2y) e
y2e
2y
y2e
2y
+ ye
jy0 dy
x
2
Z1
2y
ye
y
dy
2
y e
2
Z1
u
2
= 2
1
2
ye
y
dy
0
2
2y
2y
dy + 2 (2)
dy
Z1
(2y) e
but
(2) = 1! = 1
du
Z1
ue
Z1
ue
u
2y
dy
let u = 2y or y =
1
u
so dy = du
2
2
0
2
e
u1
2
0
1
4
dy + 2
Z1
0
Z1
= 2
2y
0
0
= 2
e
0
Z1
0
= 2
x
0
0
=
xe
u1
2
du
0
u2 e
u
0
du
1
2
Z1
0
1
=1
2
du = 2
1
(3)
4
1
(2)
2
but
(3) = 2! = 2
300
9. SOLUTIONS TO CHAPTER EXERCISES
Since X
Exponential(1=2), E (X) = 1=2 and V ar (X) = 1=4.
Z1
E (Y ) =
yf2 (y)dy = 2
1
y
ye
y
e
+ 1 dy
0
Z1
=
Z1
2ye
2y
dy + 2
0
Z1
y
ye
Z1
dy =
0
= 2
Z1
= 2
Z1
2y
2ye
dy
2ye
2y
dy + 2 (2)
0
let u = 2y
0
ue
u1
2
1
2
du = 2
E Y
=
Z1
1
=
2
2
y f2 (y)dy = 2
Z1
u
1
(2) = 2
2
du = 2
1
3
=
2
2
y2e
y
e
y
+ 1 dy
0
Z1
2
y e
2y
dy + 2
0
= 4
ue
0
0
2
Z1
Z1
2
y e
y
dy =
2
0
2
Z1
y2e
2
Z1
u
2
2y
dy
Z1
y2e
2y
dy + 2 (3)
0
let u = 2y
0
= 4
2
e
u1
2
du = 4
1
4
Z1
u2 e
u
1
(3) = 4
4
du = 4
0
0
V ar (Y ) = E Y 2
[E (Y )]2 =
7
2
2
3
2
=
14
9
4
=
5
4
Therefore
Cov (X; Y ) = E (XY )
and
(X; Y ) =
1
2
E (X) E (Y ) = 1
Cov(X; Y )
X Y
=q
1
4
1
4
5
4
3
2
1
=p
5
=1
3
1
=
4
4
1
7
=
2
2
9.2. CHAPTER 3
301
Exercise 3.7.12
Since Y
Gamma( ; 1 )
1
E (Y ) =
=
2
1
V ar (Y ) =
=
2
and
E Y
k
=
Z1
yk
1
y
y
e
( )
dy
let u = y
0
=
( )
Z1
u
k+
1
e
u1
Z1
k
du =
( )
0
k
Since XjY = y
Weibull(p; y
E (Xjy) = y
for
+k >0
1=p )
1=p
1+
1
p
, E (XjY ) = Y
1=p
1+
and
V ar (Xjy) =
y
V ar (XjY ) = Y
1=p
2
1+
2=p
1+
2
p
2
2
p
2
1+
1+
1
p
1
p
Therefore
E (X) = E [E (XjY )] =
=
=
1
0
( + k)
( )
=
uk+
1=p
1
1+
p
1=p
1+
1+
1
p
1
p
( )
1
p
1
p
( )
E Y
1=p
1
p
e
u
du
302
9. SOLUTIONS TO CHAPTER EXERCISES
and
V ar(X) = E[V ar(XjY )] + V ar[E(XjY )]
2
1
1
2
1+
= E Y 2=p
1+
+ V ar Y 1=p
1+
p
p
p
1
1
2
2
1+
E Y 2=p + 2 1 +
V ar Y 1=p
=
1+
p
p
p
2
1
2
=
1+
1+
E Y 2=p
p
p
h
i2
2
1
+ 2 1+
E Y 1=p
E Y 1=p
p
1
2
2
E Y 2=p
1+
E Y 2=p
=
1+
p
p
i2
1
1 h
2
+ 2 1+
1+
E Y 2=p
E Y 1=p
p
p
i2
2
1 h
2
=
1+
1+
E Y 2=p
E Y 1=p
p
p
2
32
2=p
1p
2
1
p
p
2
1
2
4
5
=
1+
1+
p
( )
p
( )
8
2
32 9
>
>
2
2
1
1
< 1+ p
=
1+ p
p
p
2=p
4
5
=
>
>
( )
( )
:
;
Exercise 3.7.13
Since P
Beta(a; b)
E (P ) =
ab
a
, V ar (P ) =
a+b
(a + b + 1) (a + b)2
and
E P
k
=
Z1
pk
(a + b) a
p
(a) (b)
1
p)b
(1
1
dp
0
=
(a + b)
(a) (b)
Z1
pa+k
1
p)b
(1
1
dp
0
=
(a + b) (a + k)
(a) (a + k + b)
Z1
(a + k + b) k+a
p
(a + k) (b)
1
0
=
(a + b) (a + k)
(1) =
(a) (a + k + b)
(a + b) (a + k)
(a) (a + k + b)
(1
p)b
1
dp
9.2. CHAPTER 3
303
provided a + k > 0 and a + k + b > 0.
Since Y jP = p
Geometric(p)
E (Y jp) =
and
V ar (Y jp) =
1
p
p
1
p
p2
, E (Y jP ) =
1
P
P
, V ar (Y jP ) =
=
1
P
1
1
P
1
= 2
P2
P
1
=E P
1
P
Therefore
E (Y ) = E [E (Y jP )] = E
1
P
1
(a + b) (a 1)
1
(a) (a 1 + b)
(a 1 + b) (a 1 + b) (a 1)
(a 1) (a 1) (a 1 + b)
a 1+b
b
1=
a 1
a 1
=
=
=
1
1
provided a > 1 and a + b > 1.
Now
V ar(Y ) = E[V ar(Y jP )] + V ar[E(Y jP )]
= E
1
P2
= E
1
P2
= E
1
P2
= 2E P
= 2
1
P
2
E
1
P
1
1
P
"
1
+E
P
E
1
P
1
P2
+ V ar
E P
+E
1
(a + b) (a 2)
(a) (a 2 + b)
=
1
#
2E
1
E
1
P
1
P
+1
provided a > 2 and a + b > 2.
2
1
E
1
P
2
+ 2E
2
(a + b) (a 1)
(a) (a 1 + b)
(a + b) (a 1)
(a) (a 1 + b)
(a + b 2) (a + b 1) a + b 1
a+b 1
(a 2) (a 1)
a 1
a 1
a+b 1
a+b 2
a+b 1
2
1
a 1
a 2
a 1
ab (a + b 1)
(a 1)2 (a 2)
= 2
=
E P
2
2
2
1
P
1
304
9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 3.9.4
If T = Xi + Xj ; i 6= j; then
T
Binomial(n; pi + pj )
The moment generating function of T = Xi + Xj is
M (t) = E etT + E et(Xi +Xj ) = E etXi +tXj
= M (0; : : : ; 0; t; 0; : : : ; 0; t; 0; : : : ; 0)
=
=
p1 +
+ pi et +
(pi + pj ) et + (1
+ pj et +
pi
pj )
n
+ pk
1
for t 2 <
+ pk
n
for t 2 <
which is the moment generating function of a Binomial(n; pi + pj ) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, T has a Binomial(n; pi + pj )
distribution.
9.3. CHAPTER 4
9.3
305
Chapter 4
Exercise 4.1.2
The support set of (X; Y ) is A = f(x; y) : 0 < x < y < 1g which is the union of the regions
E and F shown in Figure 9.3For s > 1
y
x=y/s
x=y
1
F
E
1
0
G (s) = P (S
=
Z
Y
s = P (Y
sX) = P (Y sX
X
Z
Z1 Zy
Z1
3ydxdy =
3ydxdy = 3y xjyy=s dy
s) = P
y=0 x=y=s
(x;y) 2 E
=
Z1
3y y
y
dy =
s
1
0)
0
Z1
1
s
3y 2 dy =
1
1
s
y 3 j10 = 1
1
s
0
0
The cumulative distribution function for S is
(
G (s) =
As a check we note that lim 1
s!1+
1
s
1
0
s
1
s
1
s>1
= 0 = G (1) and lim 1
s!1
continuous function for all s 2 <.
For s > 1
g (s) =
d
d
G (s) =
ds
ds
1
1
s
The probability density function of S is
g (s) =
and 0 otherwise.
x
1
s2
for s > 1
=
1
s2
1
s
= 1 so G (s) is a
306
9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 4.2.5
Since X
Exponential(1) and Y
density function of X and Y is
Exponential(1) independently, the joint probability
x
f (x; y) = f1 (x) f2 (y) = e
= e
e
y
x y
with support set RXY = f(x; y) : x > 0; y > 0g which is shown in Figure 9.9.The transfor.
.
.
y
.
.
.
...
x
0
Figure 9.9: Support set RXY for Example 4.2.5
mation
S : U =X +Y, V =X
has inverse transformation
Y
U +V
U V
, Y =
2
2
are mapped as
X=
Under S the boundaries of RXY
(k; 0) ! (k; k)
(0; k) ! (k; k)
for k
for k
0
0
and the point (1; 2) is mapped to and (3; 1). Thus S maps RXY into
RU V = f(u; v) :
as shown in Figure 9.10.
u < v < u; u > 0g
9.3. CHAPTER 4
307
v=u
v
...
u
0
...
v=-u
Figure 9.10: Support set RU V for Example 4.2.5
The Jacobian of the inverse transformation is
@ (x; y)
=
@ (u; v)
@x
@u
@y
@u
@x
@v
@y
@v
=
1
2
1
2
1
2
1
2
=
1
2
The joint probability density function of U and V is given by
g (u; v) = f
=
1
e
2
u+v u v
;
2
2
u
1
2
for (u; v) 2 RU V
and 0 otherwise.
To …nd the marginal probability density functions for U we note that the support set
RU V is not rectangular and the range of integration for v will depend on u.
g1 (u) =
Z1
g (u; v) dv
1
=
1
e
2
u
Zu
dv
v= u
1
ue u (2)
2
= ue u for u > 0
=
and 0 otherwise which is the probability density function of a Gamma(2; 1) random variable.
Therefore U Gamma(2; 1).
308
9. SOLUTIONS TO CHAPTER EXERCISES
To …nd the marginal probability density functions for V we need to consider the two
cases v 0 and v < 0. For v 0
g2 (v) =
Z1
g (u; v) du
1
=
Z1
1
2
e
u
du
u=v
=
=
=
1
2
1
2
1
e
2
lim e
u b
jv
lim e
b
b!1
e
b!1
v
v
For v < 0
g2 (v) =
Z1
g (u; v) du
1
2
Z1
1
=
e
u
du
u= v
=
=
=
1
2
1
2
1 v
e
2
lim e
u b
lim e
b
j
b!1
b!1
v
ev
Therefore the probability density function of V is
( 1 v
v<0
2e
g2 (v) =
1
v
v 0
2e
which is the probability density function of Double Exponential(0; 1) random variable.
Therefore V
Double Exponential(0; 1).
9.3. CHAPTER 4
309
Exercise 4.2.7
Since X
Beta(a; b) and Y
function of X and Y is
Beta(a + b; c) independently, the joint probability density
(a + b) a 1
x
(1
(a) (b)
(a + b + c) a
x
(a) (b) (c)
f (x; y) =
=
x)b
1
1
(a + b + c) a+b
y
(a + b) (c)
x)b
(1
1
y a+b
1
(1
y)c
1
(1
y)c
1
1
with support set RXY = f(x; y) : 0 < x < 1; 0 < y < 1g as shown in Figure 9.11.
y
1
x
1
0
Figure 9.11: Support RXY for Exercise 4.2.5
The transformation
S : U = XY , V = X
has inverse transformation
X =V,
Y = U=V
Under S the boundaries of RXY are mapped as
(k; 0) ! (0; k)
0
k
1
(0; k) ! (0; 0)
0
k
1
0
k
1
(k; 1) ! (k; k)
0
k
1
(1; k) ! (k; 1)
and the point
1 1
2; 3
is mapped to and
1 1
6; 2
. Thus S maps RXY into
RU V = f(u; v) : 0 < u < v < 1g
as shown in Figure 9.12
310
9. SOLUTIONS TO CHAPTER EXERCISES
v
1
u
1
0
Figure 9.12: Support set RU V for Exercise 4.2.5
The Jacobian of the inverse transformation is
@ (x; y)
=
@ (u; v)
0
1
1
v
@y
@v
1
v
=
The joint probability density function of U and V is given by
u
1
v
v
u
(a + b + c) a 1
v
(1 v)b 1
(a) (b) (c)
v
(a + b + c) a+b 1
u
(1 v)b 1 v
(a) (b) (c)
g (u; v) = f v;
=
=
a+b 1
b 1
u
v
1
u
v
1
c 1
c 1
1
v
for (u; v) 2 RU V
and 0 otherwise.
To …nd the marginal probability density functions for U we note that the support set
RU V is not rectangular and the range of integration for v will depend on u.
g (u) =
Z1
g (u; v) dv =
(a + b + c) a+b
u
(a) (b) (c)
Z1
(1
v)b
1
u
v
1
v
b 1
v=u
1
=
1
(a + b + c) a+b
u
(a) (b) (c)
1
Z1
b+1
v
v)b
(1
1
c 1
1
dv
v2
u
=
(a + b + c) a
u
(a) (b) (c)
1
Z1
1
Z1
ub
1
1
v
b 1
1
1
u
v
c 1
u
=
(a + b + c) a
u
(a) (b) (c)
u
u
v
u
b 1
1
u
v
c 1
u
dv
v2
u
dv
v2
1
u
v
c 1
dv
9.3. CHAPTER 4
311
To evaluate this integral we make the substitution
u
v
t=
u
u
1
Then
u
v
1
u = (1 u) t
u
= 1 u (1
v
u
= (1 u) dt
v2
u) t = (1
u) (1
t)
When v = u then t = 1 and when v = 1 then t = 0. Therefore
g1 (u) =
(a + b + c) a
u
(a) (b) (c)
1
Z1
u) t]b
[(1
1
[(1
t)]c
u) (1
1
(1
u) dt
0
=
(a + b + c) a
u
(a) (b) (c)
1
(1
b+c 1
u)
Z1
tb
1
(1
t)c
1
dt
0
=
=
(b) (c)
(a + b + c) a 1
u
(1 u)b+c 1
(a) (b) (c)
(b + c)
(a + b + c) a 1
u
(1 u)b+c 1 for 0 < u < 1
(a) (b + c)
and 0 otherwise which is the probability density function of a Beta(a; b + c) random variable.
Therefore U Beta(a; b + c).
Exercise 4.2.12
(a) Consider the transformation
S: U=
X=n
, V =Y
Y =m
which has inverse transformation
X=
Since X
of X and Z is
2 (n)
independently of Y
f (x; y) = f1 (x) f2 (y) =
=
2(n+m)=2
2n=2
n
UV , Y = V
m
2 (m)
then the joint probability density function
1
xn=2
(n=2)
1
xn=2
(n=2) (m=2)
1
1
e
e
x=2
2m=2
x=2 m=2 1
y
1
y m=2
(m=2)
e
1
e
y=2
y=2
with support set RXY = f(x; y) : x > 0; y > 0g. The transformation S maps RXY into
RU V = f(u; v) : u > 0; v > 0g.
312
9. SOLUTIONS TO CHAPTER EXERCISES
The Jacobian of the inverse transformation is
@ (x; y)
=
@ (u; v)
@x
@u
@y
@u
@x
@v
@y
@v
n
m
=
v
0
@x
@v
n
v
m
=
1
The joint probability density function of U and V is given by
n
n
uv; v
v
m
m
1
n
2(n+m)=2 (n=2) (m=2) m
g (u; v) = f
=
=
n=2 1
(uv)n=2
1
n
e ( m )uv=2 v m=2
n n=2 n=2 1
u
nu
m
v (n+m)=2 1 e v( m +1)=2
(n+m)=2
2
(n=2) (m=2)
1
e
v=2
n
v
m
for (u; v) 2 RU V
and 0 otherwise.
To determine the distribution of U we still need to …nd the marginal probability density
function for U .
g1 (u) =
Z1
g (u; v) dv
1
=
Z1
n n=2 n=2 1
u
nu
m
v (n+m)=2 1 e v( m +1)=2 dv
(n+m)=2
2
(n=2) (m=2)
0
1
n
n
u so that v = 2y 1 + m
u
and dv = 2 1 +
Let y = v2 1 + m
v = 0 then y = 0, and when v ! 1 then y ! 1. Therefore
g1 (u) =
=
Z1
n n=2 n=2 1
u
m
2(n+m)=2 (n=2) (m=2)
0
n n=2 n=2 1 (n+m)=2
u
2
m
2(n+m)=2 (n=2) (m=2)
2y 1 +
n
u
m
1 (n+m)=2 1
Z1
1 (n+m)=2
n
1+ u
m
1
n
mu
e
y
y (n+m)=2
dy. Note that when
2 1+
1
e
y
n
u
m
1
dy
dy
0
=
=
n n=2 n=2 1
u
m
(n=2) (m=2)
n n=2
m
n+m
2
(n=2) (m=2)
1+
un=2
n
u
m
1
(n+m)=2
1+
n
u
m
n+m
2
(n+m)=2
for u > 0
and 0 otherwise which is the probability density function of a random variable with a
F(n; m) distribution. Therefore U = YX=n
F(n; m).
=m
(b) To …nd E (U ) we use
E (U ) = E
X=n
Y =m
=
m
E (X) E Y
n
1
9.3. CHAPTER 4
313
2 (n) independently of Y
2 (m). From Example 4.2.10 we know that if
since X
2
W
(k) then
2p (k=2 + p)
k
E (W p ) =
for + p > 0
(k=2)
2
Therefore
2 (n=2 + 1)
=n
(n=2)
E (X) =
1
E Y
=
2
1
(m=2 1)
1
=
(m=2)
2 (m=2
and
E (U ) =
m
(n) E
n
1
m
=
2
1)
=
1
m
m
m 2
if m > 2
2
if m > 2
To …nd V ar (U ) we need
E U2
Now
E X2 =
E Y
2
=
2
2
"
(X=n)2
=E
(Y =m)2
#
=
m2
E X2 E Y
n2
22 (n=2 + 2)
n
=4
+1
(n=2)
2
(m=2 2)
=
(m=2)
4
m
2
1
1
m
2
2
2
n
= n (n + 2)
2
=
(m
1
2) (m
4)
for m > 4
and
E U2 =
m2
n (n + 2)
n2
(m
1
2) (m
4)
=
n+2
n (m
m2
2) (m
4)
Therefore
V ar (U ) = E U 2
=
=
=
=
=
[E (U )]2
2
n+2
m2
m
n (m 2) (m 4)
m 2
2
m
n+2
1
m 2 n (m 4) m 2
m2
(n + 2) (m 2) n (m 4)
m 2
n (m 4) (m 2)
2
m
2 (n + m 2)
m 2 n (m 4) (m 2)
2m2 (n + m 2)
for m > 4
n (m 2)2 (m 4)
for m > 4
314
9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 4.3.3
X1 ; X2 ; : : : ; Xn are independent and identically distributed random variables with moment
generating function M (t), E (Xi ) = , and V ar (Xi ) = 2 < 1.
p
The moment generating function of Z = n X
= is
p
MZ (t) = E etZ = E et
= e
= e
= e
= e
= e
n(X
)=
p
n
t n1 P
E exp
Xi
n i=1
n
t P
t
p
E exp
Xi
n i=1
n
t
t Q
p Xi
E exp
since X1 ; X2 ; : : : ; Xn are independent
n
i=1
n
t
t Q
p
M
since X1 ; X2 ; : : : ; Xn are identically distributed
n
i=1
n
t
t
p
M
n
p
n = t
p
n =
p
n =
p
n =
p
n =
Exercise 4.3.7
n
P
(Xi
)2 =
i=1
=
=
n
P
i=1
n
P
i=1
n
P
Xi
X +X
Xi
X
2
2
n
P
+2 X
Xi
X +
i=1
Xi
X
X
=
2
i=1
2
+n X
i=1
since
n
P
Xi
i=1
=
=
n
P
i=1
n
P
i=1
n
P
i=1
= 0
Xi
n
P
X=
i=1
Xi
Xi
n
n
P
i=1
n
P
i=1
n
1 P
Xi
n i=1
Xi
n
P
Xi
nX
X
2
9.3. CHAPTER 4
315
Exercise 4.3.11
Since X1 ; X2 ; : : : ; Xn are independent N 1 ; 21 random variables then by Theorem
4.3.8
n
P
2
Xi X
2
(n 1) S1
2
U=
= i=1
(n 1)
2
2
1
1
Since Y1 ; Y2 ; : : : ; Ym are independent N
V =
(m
1) S22
2
2
=
2;
m
P
2
2
random variables then by Theorem 4.3.8
Yi
Y
2
i=1
2
2
2
(m
1)
U and V are independent random variables since X1 ; X2 ; : : : ; Xn are independent of
Y1 ; Y2 ; : : : ; Ym .
Now
S12 =
S22 =
2
1
2
2
2
(n 1)S1
2
1
n 1
=
2
(m 1)S2
2
2
m 1
=
U= (n
V = (m
1)
1)
Therefore by Theorem 4.2.11
S12 =
S22 =
2
1
2
2
F (n
1; m
1)
316
9. SOLUTIONS TO CHAPTER EXERCISES
9.4
Chapter 5
Exercise 5.4.4
If Xn
Binomial(n; p) then
n
Mn (t) = E etXn = pet + q
If
for t 2 <
(9.7)
= np then
p=
and q = 1
n
Substituting 9.8 into 9.7 and simplifying gives
"
Mn (t) =
Now
lim
n!1
et + 1
n
"
n
1+
n
et 1
n
= 1+
#n
(9.8)
n
et 1
n
t
= e (e
1)
#n
for t 2 <
for t < 1
t
by Corollary 5.1.3. Since M (t) = e (e 1) for t 2 < is the moment generating function of
a Poisson( ) random variable then by Theorem 5.4.1, Xn !D X Poisson( ).
By Theorem 5.4.2
P (Xn = x) =
n x n
p (q)
x
x
(np)x e
x!
np
for x = 0; 1; : : :
Exercise 5.4.7
Let Xi Binomial(1; p), i = 1; 2; : : : independently. Since X1 ; X2 ; : : : are independent and
identically distributed random variables with E (Xi ) = p and V ar (Xi ) = p (1 p), then
p
n Xn p
p
!D Z N(0; 1)
p (1 p)
by the Central Limit Theorem. Let Sn =
n
P
Xi . Then
i=1
S
np
p n
=
np (1 p)
Now by 4.3.2(1) Sn
It follows that
n
1
n
n
P
Xi
p
p p
n p (1
p)
i=1
p
n Xn p
= p
!D Z
p (1 p)
N(0; 1)
Binomial(n; p) and therefore Yn and Sn have the same distribution.
Yn np
Zn = p
!D Z
np (1 p)
N(0; 1)
9.4. CHAPTER 5
317
Exercise 5.5.3
(a) Let g (x) = x2 which is a continuous function for all x 2 <. Since Xn !p a then by
5.5.1(1), Xn2 = g (Xn ) !p g (a) = a2 or Xn2 !p a2 .
(b) Let g (x; y) = xy which is a continuous function for all (x; y) 2 <2 . Since Xn !p a and
Yn !p b then by 5.5.1(2), Xn Yn = g (Xn ; Yn ) !p g (a; b) = ab or Xn Yn !p ab.
(c) Let g (x; y) = x=y which is a continuous function for all (x; y) 2 <2 ; y 6= 0. Since
Xn !p a and Yn !p b 6= 0 then by 5.5.1(2), Xn =Yn = g (Xn ; Yn ) !p g (a; b) = a=b , b 6= 0
or Xn =Yn !p a=b, b 6= 0.
(d) Let g (x; z) = x 2z which is a continuous function for all (x; z) 2 <2 . Since Xn !p a
and Zn !D Z N(0; 1) then by Slutsky’s Theorem, Xn 2Zn = g (Xn ; Zn ) !D g (a; Z) =
a 2Z or Xn 2Zn !D a 2Z where Z
N(0; 1). Since a 2Z
N(a; 4), therefore
Xn 2Zn !D a 2Z N(a; 4)
(e) Let g (x; z) = 1=z which is a continuous function for all (x; z) 2 <2 ; z 6= 0. Since
Zn !D Z
N(0; 1) then by Slutsky’s Theorem, 1=Zn = g (Xn ; Zn ) !D g (a; z) = 1=Z
or 1=Zn !D 1=Z where Z N(0; 1). Since h (z) = 1=z is a decreasing function for all z 6= 0
then by Theorem 2.6.8 the probability density function of W = 1=Z is
1
f (w) = p e
2
1=(2w2 )
1
w2
for z 6= 0
Exercise 5.5.8
By (5.9)
p
n Xn
p
!D Z
N(0; 1)
and by Slutsky’s Theorem
Un =
p
n Xn
!D
p
p
Let g (x) = x, a = , and b = 1=2. Then g 0 (x) =
and the Delta Method
n1=2
p
Xn
p
Z
1
p
2 x
N(0; )
(9.9)
and g 0 (a) = g 0 ( ) =
1 p
1
!D p ( Z) = Z
2
2
N 0;
1
4
1
p
2
. By (9.9)
318
9. SOLUTIONS TO CHAPTER EXERCISES
9.5
Chapter 6
Exercise 6.4.6
The probability density function of a Exponential(1; ) random variable is
(x
f (x; ) = e
)
for x
and
2<
and zero otherwise. The support set of the random variable X is [ ; 1) which depends on
the unknown parameter .
The likelihood function is
n
Q
f (xi ; )
L( ) =
=
=
i=1
n
Q
e
i=1
n
Q
(xi
e
xi
)
if xi
en
, i = 1; 2; : : : ; n and
if xi
and
2<
i=1
or more simply
L( ) =
8
<0
:en
2<
if
> x(1)
if
x(1)
where x(1) = min (x1 ; x2 ; : : : ; xn ) is the maximum of the sample. (Note: In order to observe
the sample x1 ; x2 ; : : : ; xn the value of must be smaller than all the observed xi ’s.) L( )
is a increasing function of on the interval ( 1; x(1)] ]. L( ) is maximized at = x(1) .
The maximum likelihood estimate of is ^ = x(1) and the maximum likelihood estimator
is ~ = X(1) .
Note that in this example there is no solution to dd l ( ) = dd (n ) = 0 and the maximum
likelihood estimate of is not found by solving dd l ( ) = 0.
If n = 12 and x(1) = 2
8
<0
if > 2
L( ) =
:e12 if
2
The relative likelihood function is
R( ) =
8
<0
:e12(
2)
if
>2
if
2
is graphed in Figure 9.13 along with lines for determining 10% and 50% likelihood intervals.
To determine the value of at which the horizontal line R = p intersects the graph of
R( ) we solve e12( 2) = p to obtain = 2 + log p=12. Since R( ) = 0 if > 2 then a
100p% likelihood interval for is of the form [2 + log (p) =12; 2]. For p = 0:1 we obtain the
10% likelihood interval [2 + log (0:1) =12; 2] = [1:8081; 2]. For p = 0:5 we obtain the 50%
likelihood interval [2 + log (0:5) =12; 2] = [1:9422; 2].
9.5. CHAPTER 6
319
1
0.9
0.8
0.7
R(θ)
0.6
0.5
0.4
0.3
0.2
0.1
0
1.5
1.6
1.7
1.8
θ
1.9
2
2.1
2.2
Figure 9.13: Relative likelihood function for Exercise 6.4.6
Exercise 6.7.3
By Chapter 5, Problem 7 we have
p
n
Q(Xn ; ) = q
Xn
n
Xn
n
1
Xn
n
!D Z
N(0; 1)
(9.10)
and therefore Q(Xn ; ) is an asymptotic pivotal quantity.
Let a be the value such that P (Z a) = (1 + p) =2 where Z N(0; 1). Then by (9.10) we
have
1
0
p Xn
n n
aA
p
P@ a q
Xn
Xn
1
n
n
s
s
!
Xn
a
Xn
Xn
Xn
a
Xn
Xn
p
= P
1
+p
1
n
n
n
n
n
n
n
n
and an approximate 100p% equal tail con…dence interval for is
r
r
xn
a
xn
xn xn
a
xn
xn
p
1
;
+p
1
n
n
n
n
n n
n n
or
where ^ = xn =n.
2
6
6^
4
a
v
u
u^ 1
t
n
^
3
v
u
u^ 1 ^
7
t
7
;^ + a
5
n
320
9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 6.7.12
(a) The likelihood function is
L( ) =
n
Q
f (xi ; ) =
n
Q
i=1
i=1
n
P
= exp
xi en
or more simply
L( ) = en
n
Q
i=1
The log likelihood function is
l( ) = n
2
1+e
n
Q
n
P
)
) 2
(x
1
1+e
i=1
i=1
(x
e
(xi
1
log 1 + e
for
) 2
(xi
1+e
(xi
)
for
) 2
2<
for
i=1
The score function is
2<
2<
n
P
e (xi )
d
l( ) = n 2
(xi )
d
i=1 1 + e
n
P
1
= n 2
for 2 <
xi
1
+
e
i=1
S( ) =
Notice that S( ) = 0 cannot be solved explicitly. The maximum likelihood estimate can
only be determined numerically for a given sample of data x1 ; x2 ; : : : ; xn . Note that since
"
#
n
P
d
e xi
S( )=
2
for 2 <
2
d
i=1 (1 + exi )
is negative for all values of > 0 then we know that S ( ) is always decreasing so there
is only one solution to S( ) = 0. Therefore the solution to S( ) = 0 gives the maximum
likelihood estimate.
The information function is
I( ) =
= 2
d2
l( )
d 2
n
P
e xi
i=1
(b) If X
2
(1 + exi
)
for
2<
Logistic( ; 1) then the cumulative distribution function of X is
F (x; ) =
1
1 + e (x
)
for x 2 <;
Solving
u=
1
1 + e (x
)
2<
9.5. CHAPTER 6
321
gives
x=
1
u
log
1
Therefore the inverse cumulative distribution function is
F
1
(u) =
log
1
u
1
for 0 < u < 1
If u is an observation from the Uniform(0; 1) distribution then
vation from the Logistic( ; 1) distribution by Theorem 2.6.6.
log
1
u
1 is an obser-
(c) The generated data are
0:18
1:58
2:47
0:05
1:60
2:59
0:32
1:68
2:76
0:78
1:68
2:78
1:04
1:71
2:87
1:11
1:89
2:91
1:26
1:93
4:02
1:41
2:02
4:52
1:50
2:2:5
5:25
1:57
2:40
5:56
(d) Here is R code for calculating and graphing the likelihood function for these data.
# function for calculating the Logistic likelihood function for data x and theta=th
LOLF<-function(th,x)
{n<-length(x)
L<-exp(n*th)*(prod(1+exp(th-x)))^(-2)
return(L)}
th<-seq(1,3,0.01)
L<-sapply(th,LOLF,x)
plot(th,L,"l",xlab=expression(theta),
ylab=expression(paste("L(",theta,")")),lwd=3)
The graph of the likelihood function is given in Figure 9.5. [Figure 9.5: likelihood function L(θ) for the generated data, plotted for 1 ≤ θ ≤ 3.]
(e) Here is R code for Newton's Method. It requires functions for calculating the score and information.
# function for calculating Logistic score for data x and theta=th
LOSF<-function(th,x)
{n<-length(x)
S<-n-2*sum(1/(1+exp(x-th)))
return(S)}
#
# function for calculating Logistic information for data x and theta=th
LOIF<-function(th,x)
{n<-length(x)
I<-2*sum(exp(x-th)/(1+exp(x-th))^2)
return(I)}
#
# Newton’s Method for Logistic Example
NewtonLO<-function(th,x)
{thold<-th
thnew<-th+0.1
while (abs(thold-thnew)>0.00001)
{thold<-thnew
thnew<-thold+LOSF(thold,x)/LOIF(thold,x)
print(thnew)}
return(thnew)}
# use Newton’s Method to find the maximum likelihood estimate
# use the mean of the data to begin Newton’s Method
# since theta is the mean of the distribution
thetahat<-NewtonLO(mean(x),x)
cat("thetahat = ",thetahat)
The maximum likelihood estimate found using Newton's Method is θ̂ = 2.018099.
(f ) Here is R code for determining the values of S(^) and I(^).
# calculate Score(thetahat) and the observed information
Sthetahat<-LOSF(thetahat,x)
cat("S(thetahat) = ",Sthetahat)
Ithetahat<-LOIF(thetahat,x)
cat("Observed Information = ",Ithetahat)
The values of S(θ̂) and I(θ̂) are

  S(θ̂) = 3.552714 × 10⁻¹⁵   and   I(θ̂) = 11.65138
9.5. CHAPTER 6
323
(g) Here is R code for plotting the relative likelihood function for
based on these data.
# function for calculating Logistic relative likelihood function
LORLF<-function(th,thetahat,x)
{R<-LOLF(th,x)/LOLF(thetahat,x)
return(R)}
#
# plot the Logistic relative likelihood function
#plus a line to determine the 15% likelihood interval
th<-seq(1,3,0.01)
R<-sapply(th,LORLF,thetahat,x)
plot(th,R,"l",xlab=expression(theta),
ylab=expression(paste("R(",theta,")")),lwd=3)
abline(a=0.15,b=0,col="red",lwd=2)
The graph of the relative likelihood function is given in Figure 9.5. [Figure: relative likelihood function R(θ) for 1 ≤ θ ≤ 3, with a horizontal line at R(θ) = 0.15.]
(h) Here is R code for determining the 15% likelihood interval and the approximate 95%
con…dence interval (6.18)
# determine a 15% likelihood interval using uniroot
uniroot(function(th) LORLF(th,thetahat,x)-0.15,lower=1,upper=1.8)$root
uniroot(function(th) LORLF(th,thetahat,x)-0.15,lower=2.2,upper=3)$root
# calculate an approximate 95% confidence intervals for theta
L95<-thetahat-1.96/sqrt(Ithetahat)
U95<-thetahat+1.96/sqrt(Ithetahat)
cat("Approximate 95% confidence interval = ",L95,U95) # display values
324
9. SOLUTIONS TO CHAPTER EXERCISES
The 15% likelihood interval is
[1:4479; 2:593964]
The approximate 95% con…dence interval is
[1:443893; 2:592304]
which are very close due to the symmetric nature of the likelihood function.
9.6. CHAPTER 7
9.6
325
Chapter 7
Exercise 7.1.11
If x1 ; x2 ; :::; xn is an observed random sample from the Gamma( ; ) distribution then the
likelihood function for ; is
n
Q
f (xi ; ; )
L( ; ) =
=
i=1
n
Q
i=1
xi 1
e
( )
= [ ( )
xi =
> 0;
>0
1
n
Q
n
]
for
xi
t2
exp
for
> 0;
>0
i=1
where
t2 =
n
P
xi
i=1
or more simply
L( ; ) = [ ( )
]
n
Q
n
xi
t2
exp
for
> 0;
>0
i=1
The log likelihood function is
l ( ; ) = log L ( ; )
=
n log ( )
n log
where
t1 =
n
P
t2
+ t1
for
> 0;
log xi
i=1
The score vector is
S( ; ) =
=
where
h
h
@l
@
n ( ) + t1
(z) =
is the digamma function.
The information matrix is
i
@l
@
2
= 4
@2l
@ 2
@2l
@ @
n
0
@2l
@ @
@2l
@ 2
n
( )
2t2
n
= n4
0
5
2
3
2
3
1
( )
1
3
n
3
2
2
d
log (z)
dz
2
I( ; ) = 4
t2
n log
2x
3
5
5
n
i
>0
326
9. SOLUTIONS TO CHAPTER EXERCISES
where
0
(z) =
d
dz
(z)
is the trigamma function.
The expected information matrix is
J ( ; ) = E [I ( ; ) ; X1 ; :::; Xn ]
2 0
1
( )
= n4
2E (X; ; )
1
3
2
= n4
0
1
( )
1
2
2
2
3
2
3
5
5
since E X; ;
=
.
S ( ; ) = (0 0) must be solved numerically to …nd the maximum likelihood estimates
of and .
9.6. CHAPTER 7
327
Exercise 7.1.14
The data are
1:58
5:47
7:93
2:78
5:52
7:99
2:81
5:87
8:14
3:29
6:07
8:31
3:45
6:11
8:72
The maximum likelihood estimates of
h
(i+1)
(i+1)
i
=
h
(i)
(i)
i
3:64
6:12
9:26
and
+S
3:81
6:26
10:10
4:69
6:42
12:82
4:89
6:74
15:22
5:37
7:49
17:82
can be found using Newton’s Method
(i)
;
(i)
h
I
(i)
;
(i)
i
1
for i = 0; 1; : : :
Here is R code for Newton’s Method for the Gamma Example.
# function for calculating Gamma score for a and b and data x
GASF<-function(a,b,x)
{t1<-sum(log(x))
t2<-sum(x)
n<-length(x)
S<-c(t1-n*(digamma(a)+log(b)),t2/b^2-n*a/b)
return(S)}
#
# function for calculating Gamma information for a and b and data x
GAIF<-function(a,b,x)
{I<-length(x)*cbind(c(trigamma(a),1/b),c(1/b,2*mean(x)/b^3-a/b^2))
return(I)}
#
# Newton’s Method for Gamma Example
NewtonGA<-function(a,b,x)
{thold<-c(a,b)
thnew<-thold+0.1
while (sum(abs(thold-thnew))>0.0000001)
{thold<-thnew
thnew<-thold+GASF(thold[1],thold[2],x)%*%solve(GAIF(thold[1],thold[2],x))
print(thnew)}
return(thnew)}
thetahat<-NewtonGA(2,2,x)
The maximum likelihood estimates are α̂ = 4.118407 and β̂ = 1.657032.
The score vector evaluated at (α̂, β̂) is

  S(α̂, β̂) = [0, 1.421085 × 10⁻¹⁴]

which indicates we have obtained a local extremum.
The observed information matrix is

  I(α̂, β̂) = [ 8.239505   18.10466 ]
             [ 18.10466   44.99752 ]

Note that since det[I(α̂, β̂)] = (8.239505)(44.99752) − (18.10466)² > 0 and [I(α̂, β̂)]₁₁ = 8.239505 > 0, then by the second derivative test we have found the maximum likelihood estimates.
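The observed information can be inverted to obtain approximate variances for the estimators, as is done in Exercise 7.4.7; a minimal R sketch using the matrix above:

# approximate standard errors from the observed information matrix
Ihat <- matrix(c(8.239505, 18.10466, 18.10466, 44.99752), nrow = 2)
sqrt(diag(solve(Ihat)))   # approximate standard errors of (alphahat, betahat)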
Exercise 7.2.3
(a) The following R code generates the required likelihood regions.
# function for calculating Gamma relative likelihood for parameters a and b and
data x
GARLF<-function(a,b,that,x)
{t<-prod(x)
t2<-sum(x)
n<-length(x)
ah<-that[1]
bh<-that[2]
L<-((gamma(ah)*bh^ah)/(gamma(a)*b^a))^n
*t^(a-ah)*exp(t2*(1/bh-1/b))
return(L)}
a<-seq(1,8.5,0.02)
b<-seq(0.2,4.5,0.01)
R<-outer(a,b,FUN = GARLF,thetahat,x)
contour(a,b,R,levels=c(0.01,0.05,0.10,0.50,0.9),xlab="a",ylab="b",lwd=2)
1
2
b
3
4
The 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ) are shown in Figure 9.14.
2
4
6
8
a
Figure 9.14: Likelihood regions for Gamma for 30 observations
330
9. SOLUTIONS TO CHAPTER EXERCISES
The likelihood contours are not very elliptical in shape. The contours suggest that large
values of together with small values of or small values of together with large values
of are plausible given the observed data.
(b) Since R (3; 2:7) = 0:14 the point (3; 2:7) lies inside a 10% likelihood region so it is a
plausible value of ( ; ).
3.0
2.5
2.0
1.5
b
3.5
4.0
4.5
(d) The 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ) for 100 observations are
shown in Figure 9.15. We note that for a larger number of observations the likelihood
regions are more elliptical in shape.
2.0
2.5
3.0
3.5
4.0
4.5
5.0
a
Figure 9.15: Likelihood regions for Gamma for 100 observations
9.6. CHAPTER 7
331
Exercise 7.4.4
The following R code graphs the approximate con…dence regions. The function ConfRegion
was used in Example 7.4.3.
1
2
b
3
4
# graph approximate confidence regions
c<-outer(a,b,FUN = ConfRegion,thetahat,Ithetahat)
contour(a,b,c,levels=c(4.61,5.99,9.21),xlab="a",ylab="b",lwd=2)
2
4
6
8
a
Figure 9.16: Approximate con…dence regions for Gamma for 30 observations
These approximate con…dence regions which are ellipses are very di¤erent than the
likelihood regions in Figure 9.14. In particular we note that ( ; ) = (3; 2:7) lies inside a
10% likelihood region but outside a 99% approximate con…dence region.
There are only 30 observations and these di¤erences suggest the Normal approximation
is not very good. The likelihood regions are a better summary of the uncertainty in the
estimates.
332
9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 7.4.7
Let
Since
h
h
~
~
i
I(^ ; ^ )
i
1
=
"
[J(~ ; ~ )]1=2 !D Z
v^11 v^12
v^12 v^22
#
BVN
h
0 0
then for large n, V ar(~ ) v^11 , V ar( ~ ) v^22 and Cov(~ ; ~ )
mate 95% con…dence interval for a is given by
p
p
[^ 1:96 v^11 ; ^ + 1:96 v^11 ]
i
;
"
1 0
0 1
#!
v^12 . Therefore an approxi-
and an approximate 95% con…dence interval for b is given by
p
p
[ ^ 1:96 v^22 ; ^ + 1:96 v^22 ]
For the data in Exercise 7.1.14, ^ = 4:118407 and ^ = 1:657032 and
h
i 1
I(^ ; ^ )
"
# 1
8:239505 18:10466
=
18:10466 44:99752
"
#
1:0469726
0:4212472
=
0:4212472 0:1917114
An approximate 95% marginal con…dence interval for is
p
p
[4:118407 + 1:96 1:0469726; 4:118407 1:96 1:0469726] = [2:066341; 6:170473]
An approximate 95% con…dence interval for is
p
p
[1:657032 1:96 0:1917114; 1:657032 + 1:96 0:1917114] = [1:281278; 2:032787]
9.6. CHAPTER 7
333
To obtain an approximate 95% marginal con…dence interval for
+
we note that
V ar(~ + ~ ) = V ar(~ ) + V ar( ~ ) + 2Cov(~ ; ~ )
v^11 + v^22 + 2^
v12 = v^
so that an approximate 95% con…dence interval for + is given by
p
p
[^ + ^ 1:96 v^; ^ + ^ + 1:96 v^]
For the data in Example 7.1.13
^ + ^ = 4:118407 + 1:657032 = 5:775439
v^ = v^11 + v^22 + 2^
v12 = 1:0469726 + 0:1917114 + 2( 0:4212472) = 0:3961895
and an approximate 95% marginal con…dence interval for + is
p
p
[5:775439 + 1:96 0:3961895; 5:775439 1:96 0:3961895] = [4:998908; 6:551971]
334
9.7
9. SOLUTIONS TO CHAPTER EXERCISES
Chapter 8
Exercise 8.1.7
(a) Let X = number of successes in n trails. Then X
Binomial(n; ) and E (X) = n .
If the null hypothesis is H0 : = 0 and the alternative hypothesis is HA : 6= 0 then a
suitable test statistic is D = jX n 0 j.
For n = 100, x = 42, and
The p-value is
0
= 0:5 the observed value of D is d = jx
P (jX
n 0j
jx
n 0 j ; H0 :
= P (jX
50j
8)
where X
= P (X
42) + P (X
=
n 0 j = j42
50j = 8.
0)
Binomial (100; 0:5)
58)
= 0:06660531 + 0:06660531
= 0:1332106
calculated using R. Since p-value > 0:1 there is no evidence based on the data against
H0 : = 0:5.
(b) If the null hypothesis is H0 :
a suitable test statistic is D = n
For n = 100, x = 42, and
The p-value is
0
=
0
0
and the alternative hypothesis is HA :
0
X
= P (50
X
= P (X
42)
0
then
X.
= 0:5 the observed value of D is d = n
P (n
<
n
8)
0
x; H0 :
where X
=
0
x = 50
42 = 8.
0)
Binomial (50; 0:5)
= 0:06660531
calculated using R. Since 0:05 < p-value < 0:1 there is weak evidence based on the data
against H0 : = 0:5.
Exercise 8.2.5
The model for these data is (X1 ; X2 ; : : : ; X7 )
hypothesis of interest is H0 : 1 = 2 =
=
Multinomial(63; 1 ; 2 ; : : : ; 7 ) and the
1
=
7
7 . Since the model and parameters
7
P
are completely speci…ed this is a simple hypothesis. Since
j = 1 there are only k = 6
j=1
parameters.
The likelihood ratio test statistic, which can be derived in the same way as Example 8.2.4,
is
7
P
Xj
(X; 0 ) = 2
Xj log
Ej
i=1
where Ej = 63=7 is the expected frequency for outcome j.
9.7. CHAPTER 8
335
For these data the observed value of the likelihood ratio test statistic is
(x;
0)
= 2
7
P
xj log
i=1
= 2 22 log
xj
63=7
22
+ 7 log
63=7
7
63=7
+
+ 6 log
6
63=7
= 23:27396
The approximate p-value is
p-value
P (W
23:27396)
2
where W
(6)
= 0:0007
calculated using R. Since p-value < 0:001 there is strong evidence based on the data against
the hypothesis that the deaths are equally likely to occur on any day of the week.
Exercise 8.3.3
(a)
= f( 1 ; 2 ) : 1 > 0;
=
f( 1 ; 2 ) : 1 = 2 ;
0
H0 : 1 = 2 is composite.
> 0g which has dimension k = 2 and
> 0; 2 > 0g which has dimension q = 1 and the hypothesis
2
1
From Example 6.2.5 the likelihood function for an observed random sample x1 ; x2 ; : : : ; xn
from an Poisson( 1 ) distribution is
L1 ( 1 ) =
nx
n
1 e
1
for
0
1
with maximum likelihood estimate ^1 = x.
Similarly the likelihood function for an observed random sample y1 ; y2 ; : : : ; ym from an
Poisson( 2 ) distribution is
L2 ( 2 ) =
ny
n
2 e
2
for
0
2
with maximum likelihood estimate ^2 = y. Since the samples are independent the likelihood
function for ( 1 ; 2 ) is
L ( 1;
2)
= L1 ( 1 )L2 ( 2 ) for
0;
1
0
2
and the log likelihood function
l ( 1;
2)
= nx log
1
n
1
+ my log
2
m
2
for
1
> 0;
2
>0
The independence of the samples also implies the maximum likelihood estimators are
~1 = X and ~2 = Y . Therefore
l(~1 ; ~2 ; X; Y) = nX log X
nX + mY log Y
= nX log X + mY log Y
mY
nX + mY
336
If
1
9. SOLUTIONS TO CHAPTER EXERCISES
=
2
=
then the log likelihood function is
l ( ) = (nx + my) log
which is only a function of . To determine
(n + m)
(
max
l( 1 ;
1 ; 2 )2
d
d
l ( ) = 0 for
(
max
1 ; 2 )2
nx+my
n+m
=
l( 1 ;
>0
2 ; X; Y)
we note that
0
d
nx + my
l( ) =
d
and
for
(n + m)
and therefore
2 ; X; Y)
= nX + mY log
0
nX + mY
n+m
nX + mY
The likelihood ratio test statistic is
(X; Y;
0)
= 2 l(~1 ; ~2 ; X; Y)
(
max
1 ; 2 )2
l( 1 ;
2 ; X; Y)
0
= 2 nX log X + mY log Y
nX + mY
= 2 nX log X + mY log Y
nX + mY log
nX + mY log
nX + mY
n+m
+ nX + mY
nX + m Y
n+m
with corresponding observed value
(x; y;
Since k
q=2
0)
(nx + my) log
nx + my
n+m
1=1
p-value
(b) For n = 10,
= 2 nx log x + my log y
10
P
2
P [W
(x; y; 0 )] where W
(1)
h
i
p
= 2 1 P Z
(x; y; 0 )
where Z
xi = 22, m = 15,
i=1
test statistic is (x; y;
p-value
15
P
N (0; 1)
yi = 40 the observed value of the likelihood ratio
i=1
0)
= 0:5344026 and
2
P [W 0:5344026] where W
(1)
h
i
p
= 2 1 P Z
0:5344026
where Z
N (0; 1)
= 0:4647618
calculated using R. Since p-value > 0:1 there is no evidence against H0 :
the data.
1
=
2
based on
10. Solutions to Selected End of
Chapter Problems
337
338
10.1
10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
Chapter 2
1(a) Starting with
1
P
x
=
x=0
it can be shown that
1
P
x
=
x2
x 1
=
x3
x 1
=
x
x=1
1
P
x=1
1
P
x=1
1
for j j < 1
1
for j j < 1
(1
)2
1+
for j j < 1
(1
)3
1+4 + 2
for j j < 1
(1
)4
(10.1)
(10.2)
(10.3)
(1)
and therefore
1
P
1
=
x
k x=1
x
=
f (x) = (1
)2
(1
)2 x
using (10.1) gives k =
x 1
The graph of f (x) in Figure 10.1 is for
for x = 1; 2; :::; 0 <
)2
(1
<1
= 0:3.
0.5
0.45
0.4
0.35
0.3
f(x)
0.25
0.2
0.15
0.1
0.05
0
1
2
3
4
5
6
7
8
x
Figure 10.1: Graph of f (x)
(2)
8
>
<0
x
F (x) = P
>
: (1
t=1
if x < 1
)2 t
t 1
=1
(1 + x
x )
x
for x = 1; 2; :::
10.1. CHAPTER 2
339
Note that F (x) is speci…ed by indicating its value at each jump point.
(3)
1
P
E (X) =
)2 x
x (1
x 1
)2
= (1
x=1
x 1
(1 + )
(1
)3
)2
using (10.2)
(1 + )
(1
)
=
1
P
=
)2 xx
x2 (1
1
)2
= (1
x=1
(1 + 4 + 2 )
(1
)4
(1 + 4 + 2 )
(1
)2
=
)2
[E (X)]2 =
V ar(X) = E(X 2 )
1
P
x3
x 1
x=1
= (1
(4) Using
x2
x=1
= (1
E X2
1
P
using (10.3)
(1 + 4 + 2 )
(1
)2
(1 + )2
2
2 =
(1
)
(1
)2
= 0:3,
P (0:5 < X
2) = P (X = 1) + P (X = 2)
= 0:49 + (0:49)(2)(0:3) = 0:784
P (X > 0:5; X
P (X 2)
P (X = 1) + P (X = 2)
=1
P (X 2)
P (X > 0:5jX
=
2) =
2)
1:(b) (1)
1
k
=
Z1
1
= 2
1
dx = 2
1 + (x= )2
Z1
1
dy
1 + y2
Z1
0
1
dx because of symmetry
1 + (x= )2
1
let y = x;
dy = dx
0
= 2 lim arctan (b) = 2
b!1
Thus k =
1
2
=
and
f (x) =
h
1
1 + (x= )2
i
for x 2 <;
>0
340
10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
The graphs for = 0:5, 1 and 2 are plotted in Figure 10.2. The graph for each di¤erent
value of is obtained from the graph for = 1 by simply relabeling the x and y axes. That
is, on the x axis, each point x is relabeled x and on the y axis, each point y is relabeled
y= . The graph of f (x) below is for = 1: Note that the graph of f (x) is symmetric about
the y axis.
0.7
0.6
θ=0.5
0.5
0.4
f (x)
0.3
0.2
θ=0
0.1
θ=2
0
-5
0
x
5
Figure 10.2: Cauchy probability density functions
(2)
F (x) =
Zx
1
=
1
h
1
1 + (t= )
x
arctan
+
2
i dt =
1
2
1
lim arctan
b! 1
t
jxb
for x 2 <
(3) Consider the integral
Z1
0
Since the integral
x
h
i dx =
1 + (x= )2
Z1
0
does not converge, the integral
Z1
1
Z1
t
dt =
1 + t2
lim
b!1
0
h
x
1 + (x= )2
h
i dx
x
1 + (x= )2
i dx
1
ln 1 + b2 = +1
2
10.1. CHAPTER 2
341
does not converge absolutely and E (X) does not exist. Since E (X) does not exist V ar (X)
does not exist.
(4) Using
= 1,
P (0:5 < X 2) = F (2) F (0:5)
1
=
[arctan (2) arctan (0:5)]
0:2048
P (X > 0:5; X 2)
P (X 2)
P (0:5 < X 2)
F (2) F (0:5)
=
P (X 2)
F (2)
arctan (2) arctan (0:5)
0:2403
arctan (2) + 2
P (X > 0:5jX
=
=
2) =
1:(c) (1)
1
k
=
Z1
=
Z1
e
jx
e
jyj
j
dx
1
dy = 2
1
let y = x
Z1
e
y
dy
, then dy = dx
by symmetry
0
= 2 (1) = 2 (0!) = 2
Thus k =
1
2
and
1
f (x) = e jx j for x 2 <; 2 <
2
The graphs for = 1, 0 and 2 are plotted in Figure 10.3. The graph for each di¤erent
value of is obtained from the graph for = 0 by simply shifting the graph for = 0 to
the right units if is positive and shifting the graph for = 0 to the left units if is
negative. Note that the graph of f (x) is symmetric about the line x = .
(2)
8 x
R 1 t
>
>
x
>
< 1 2 e dt
F (x) =
R 1 t
Rx 1 t+
>
>
>
e
dt
+
dt x >
:
2
2e
=
8
< 1 ex
1
x
2
:1 +
2
1
t+ jx
2 e
=1
1
x+
2e
x>
342
10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
1
0.9
θ=0
θ=-1
0.8
θ=2
0.7
0.6
f(x)
0.5
0.4
0.3
0.2
0.1
0
-5
0
x
5
Figure 10.3: Double Exponential probability density functions
(3) Since the improper integral
Z1
converges, the integral
(x
E X
=
1
2
Z1
1
Z1
=
1
2
=
Z1
)e
1
2
R1
x2 e
jx
j dx
jx
j
dx
ye
y
dy =
(2) = 1! = 1
y
converges absolutely and by the symmetry of f (x)
jyj
let y = x
1
dy =
2
1
dy + 0 +
2
Z1
Z1
, then dy = dx
y 2 + 2y +
2
e
jyj
dy
1
e
y
dy using the properties of even/odd functions
0
(3) +
= 2+
dx =
Z1
1
(y + ) e
0
=
xe
2
y2e
)
0
we have E (X) = .
2
(x
2
(1) = 2! +
2
2
Therefore
V ar(X) = E(X 2 )
(4) Using
[E (X)]2 = 2 +
2
2
=2
2
)
= 0,
P (0:5 < X
2) = F (2)
1
F (0:5) = (e
2
0:5
e
0:2356
10.1. CHAPTER 2
343
P (X > 0:5jX
=
P (X > 0:5; X
P (X 2)
2
e )
0:2527
2
2) =
1
0:5
2 (e
1 12 e
2)
=
P (0:5 < X 2)
F (2) F (0:5)
=
P (X 2)
F (2)
1:(e) (1)
1/k = ∫_0^{∞} x² e^{-θx} dx = (1/θ³) ∫_0^{∞} y² e^{-y} dy   (let y = θx, dy = θ dx)
    = Γ(3)/θ³ = 2!/θ³ = 2/θ³
Thus k = θ³/2 and
f(x) = (1/2) θ³ x² e^{-θx}   for x ≥ 0, θ > 0
The graphs for θ = 0.5, 1 and 2 are plotted in Figure 10.4. The graph for each different value of θ is obtained from the graph for θ = 1 by simply relabeling the x and y axes: on the x axis each point x is relabeled x/θ, and on the y axis each point y is relabeled θy.

[Figure 10.4: Gamma probability density functions for θ = 0.5, 1, 2.]

(2)
F(x) = 0 for x ≤ 0, and F(x) = (θ³/2) ∫_0^{x} t² e^{-θt} dt for x > 0.
Using integration by parts twice, for x > 0,
(θ³/2) ∫_0^{x} t² e^{-θt} dt = (1/2)[−(θx)² e^{-θx} − 2(θx) e^{-θx} − 2 e^{-θx} + 2]
                             = 1 − (1/2) e^{-θx}(θ²x² + 2θx + 2)
Therefore
F(x) = 0 if x ≤ 0;   F(x) = 1 − (1/2) e^{-θx}(θ²x² + 2θx + 2) if x > 0

(3)
E(X) = (θ³/2) ∫_0^{∞} x³ e^{-θx} dx = (1/(2θ)) ∫_0^{∞} y³ e^{-y} dy = Γ(4)/(2θ) = 3!/(2θ) = 3/θ
E(X²) = (θ³/2) ∫_0^{∞} x⁴ e^{-θx} dx = (1/(2θ²)) ∫_0^{∞} y⁴ e^{-y} dy = Γ(5)/(2θ²) = 4!/(2θ²) = 12/θ²
and
Var(X) = E(X²) − [E(X)]² = 12/θ² − (3/θ)² = 3/θ²

(4) Using θ = 1,
P(0.5 < X ≤ 2) = F(2) − F(0.5) = [1 − (1/2)e^{-2}(4 + 4 + 2)] − [1 − (1/2)e^{-0.5}(1/4 + 1 + 2)]
              = (13/8)e^{-0.5} − 5e^{-2} ≈ 0.3089
and
P(X > 0.5 | X ≤ 2) = [F(2) − F(0.5)]/F(2) = [(13/8)e^{-0.5} − 5e^{-2}]/(1 − 5e^{-2}) ≈ 0.9555
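A supplementary check of these two probabilities (our own verification, not part of the original solution) using the closed-form c.d.f. from part (2) with θ = 1:

```python
import math

def F(x):  # c.d.f. from part (2) with theta = 1
    return 0.0 if x <= 0 else 1 - 0.5 * math.exp(-x) * (x**2 + 2*x + 2)

p_interval = F(2) - F(0.5)
print(round(p_interval, 4), round(p_interval / F(2), 4))   # 0.3089 0.9555
```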
1:(g) (1) Since
f(−x) = k e^{x/θ}/(1 + e^{x/θ})² = k e^{x/θ}/[e^{2x/θ}(1 + e^{-x/θ})²] = k e^{-x/θ}/(1 + e^{-x/θ})² = f(x)
f is an even function whose graph is symmetric about the y axis.
1/k = ∫_{-∞}^{∞} e^{-x/θ}/(1 + e^{-x/θ})² dx = 2 ∫_0^{∞} e^{-x/θ}/(1 + e^{-x/θ})² dx   (by symmetry)
    = 2 lim_{b→∞} θ/(1 + e^{-x/θ})|_0^b = 2θ(1 − 1/2) = θ
Therefore k = 1/θ.

(2)
F(x) = ∫_{-∞}^{x} e^{-t/θ}/[θ(1 + e^{-t/θ})²] dt = lim_{a→-∞} 1/(1 + e^{-t/θ})|_a^x = 1/(1 + e^{-x/θ})   for x ∈ ℝ

[Figure 10.5: Graph of f(x) for θ = 2.]

(3) Since f is symmetric about x = 0, if E(|X|) exists then E(X) = 0. Now
E(|X|) = 2 ∫_0^{∞} x e^{-x/θ}/[θ(1 + e^{-x/θ})²] dx
Since
∫_0^{∞} (x/θ) e^{-x/θ} dx = θ ∫_0^{∞} y e^{-y} dy = θ Γ(2)
converges and
x e^{-x/θ}/[θ(1 + e^{-x/θ})²] ≤ (x/θ) e^{-x/θ}   for x ≥ 0
the Comparison Test for Improper Integrals shows that E(|X|) converges, and thus E(X) = 0.
By symmetry
E(X²) = ∫_{-∞}^{∞} x² e^{-x/θ}/[θ(1 + e^{-x/θ})²] dx = 2 ∫_{-∞}^{0} x² e^{x/θ}/[θ(1 + e^{x/θ})²] dx = 2θ² ∫_{-∞}^{0} y² e^{y}/(1 + e^{y})² dy   (let y = x/θ)
Using integration by parts with u = y², du = 2y dy, dv = e^{y}/(1 + e^{y})² dy, v = 1/(1 + e^{-y}),
∫_{-∞}^{0} y² e^{y}/(1 + e^{y})² dy = lim_{a→-∞} y²/(1 + e^{-y})|_a^0 − ∫_{-∞}^{0} 2y/(1 + e^{-y}) dy
The limit term is lim_{a→-∞} a²/(1 + e^{-a}) = lim_{a→-∞} 2a/(−e^{-a}) = lim_{a→-∞} 2e^{a} = 0 by L'Hospital's Rule. Multiplying the remaining integrand by e^{y}/e^{y} and letting u = e^{y}, du = e^{y} dy, y = log u,
∫_{-∞}^{0} y² e^{y}/(1 + e^{y})² dy = −2 ∫_{-∞}^{0} y e^{y}/(1 + e^{y}) dy = −2 ∫_0^{1} (log u)/(1 + u) du = −2(−π²/12) = π²/6
The definite integral
∫_0^{1} (log u)/(1 + u) du = −π²/12
can be found at https://en.wikipedia.org/wiki/List_of_definite_integrals. Therefore
E(X²) = 2θ²(π²/6) = π²θ²/3
and
Var(X) = E(X²) − [E(X)]² = π²θ²/3 − 0² = π²θ²/3

(4) Using θ = 2,
P(0.5 < X ≤ 2) = F(2) − F(0.5) = 1/(1 + e^{-2/2}) − 1/(1 + e^{-0.5/2}) = 1/(1 + e^{-1}) − 1/(1 + e^{-0.25}) ≈ 0.1689
and
P(X > 0.5 | X ≤ 2) = [F(2) − F(0.5)]/F(2) = [1/(1 + e^{-1}) − 1/(1 + e^{-0.25})]/[1/(1 + e^{-1})] ≈ 0.2310
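A short Monte Carlo check of the variance π²θ²/3 and of the two probabilities above (our own verification, not part of the original solution); it assumes NumPy is available and samples X by inverting the c.d.f. from part (2).

```python
import numpy as np

theta, rng = 2.0, np.random.default_rng(0)
u = rng.uniform(size=1_000_000)
x = theta * np.log(u / (1 - u))     # inverse c.d.f.: F(x) = 1/(1 + exp(-x/theta))

print(x.var(), np.pi**2 * theta**2 / 3)                 # both about 13.16
print(np.mean((x > 0.5) & (x <= 2)))                    # about 0.1689
print(np.mean((x > 0.5) & (x <= 2)) / np.mean(x <= 2))  # about 0.2310
```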
2:(a) Since
f(x; θ) = (1 − θ)² x θ^{x−1}
we have f₀(x) = f(x; θ = 0) = 0 and f₁(x) = f(x; θ = 1) = 0. Since f(x; θ) ≠ f₀(x − θ) and f(x; θ) ≠ (1/θ) f₁(x/θ), θ is neither a location nor a scale parameter.

2:(b) Since
f(x; θ) = 1/{πθ[1 + (x/θ)²]}   for x ∈ ℝ, θ > 0
we have
f₁(x) = f(x; θ = 1) = 1/[π(1 + x²)]   for x ∈ ℝ
Since
f(x; θ) = (1/θ) f₁(x/θ)   for all x ∈ ℝ, θ > 0
θ is a scale parameter for this distribution.

2:(c) Since
f(x; θ) = (1/2) e^{-|x−θ|}   for x ∈ ℝ, θ ∈ ℝ
we have
f₀(x) = f(x; θ = 0) = (1/2) e^{-|x|}   for x ∈ ℝ
Since f(x; θ) = f₀(x − θ) for x ∈ ℝ, θ ∈ ℝ, θ is a location parameter for this distribution.

2:(e) Since
f(x; θ) = (1/2) θ³ x² e^{-θx}   for x ≥ 0, θ > 0
we have
f₁(x) = f(x; θ = 1) = (1/2) x² e^{-x}   for x ≥ 0
Since f(x; θ) = θ f₁(θx) for x ≥ 0, θ > 0, 1/θ is a scale parameter for this distribution.
4:(a) Note that f(x) can be written as
f(x) = k e^{c²/2} e^{c(x−μ)}     if x < μ − c
f(x) = k e^{-(x−μ)²/2}           if μ − c ≤ x ≤ μ + c
f(x) = k e^{c²/2} e^{-c(x−μ)}    if x > μ + c
Therefore
1/k = e^{c²/2} ∫_{-∞}^{μ−c} e^{c(x−μ)} dx + ∫_{μ−c}^{μ+c} e^{-(x−μ)²/2} dx + e^{c²/2} ∫_{μ+c}^{∞} e^{-c(x−μ)} dx
    = 2 e^{c²/2} ∫_c^{∞} e^{-cu} du + √(2π) ∫_{-c}^{c} (1/√(2π)) e^{-z²/2} dz   (let u = x − μ, z = x − μ)
    = (2/c) e^{c²/2} lim_{b→∞} (−e^{-cu})|_c^b + √(2π) P(|Z| ≤ c)   where Z ~ N(0, 1)
    = (2/c) e^{c²/2} e^{-c²} + √(2π)[2Φ(c) − 1]
    = (2/c) e^{-c²/2} + √(2π)[2Φ(c) − 1]
as required, where Φ is the N(0, 1) c.d.f.

4:(b) If x < μ − c then
F(x) = k e^{c²/2} ∫_{-∞}^{x} e^{c(u−μ)} du = k e^{c²/2} ∫_{-∞}^{x−μ} e^{cy} dy   (let y = u − μ)
     = k e^{c²/2} lim_{a→-∞} (1/c) e^{cy}|_a^{x−μ} = (k/c) e^{c²/2 + c(x−μ)}
and in particular F(μ − c) = (k/c) e^{-c²/2}.
If μ − c ≤ x ≤ μ + c then
F(x) = (k/c) e^{-c²/2} + k √(2π) ∫_{μ−c}^{x} (1/√(2π)) e^{-(u−μ)²/2} du   (let z = u − μ)
     = (k/c) e^{-c²/2} + k √(2π)[Φ(x − μ) − Φ(−c)] = (k/c) e^{-c²/2} + k √(2π)[Φ(x − μ) + Φ(c) − 1]
If x > μ + c then
F(x) = 1 − k e^{c²/2} ∫_{x}^{∞} e^{-c(u−μ)} du = 1 − k e^{c²/2} lim_{b→∞} (−1/c) e^{-cy}|_{x−μ}^{b} = 1 − (k/c) e^{c²/2 − c(x−μ)}
Therefore
F(x) = (k/c) e^{c²/2 + c(x−μ)}                           if x < μ − c
F(x) = (k/c) e^{-c²/2} + k √(2π)[Φ(x − μ) + Φ(c) − 1]    if μ − c ≤ x ≤ μ + c
F(x) = 1 − (k/c) e^{c²/2 − c(x−μ)}                       if x > μ + c

Since μ is a location parameter (see part (c)),
E(X^k) = ∫_{-∞}^{∞} x^k f(x) dx = ∫_{-∞}^{∞} x^k f₀(x − μ) dx = ∫_{-∞}^{∞} (y + μ)^k f₀(y) dy   (let y = x − μ)   (10.4)
In particular
E(X) = ∫_{-∞}^{∞} (y + μ) f₀(y) dy = ∫_{-∞}^{∞} y f₀(y) dy + μ ∫_{-∞}^{∞} f₀(y) dy = ∫_{-∞}^{∞} y f₀(y) dy + μ
since f₀(y) is a p.d.f. Now
(1/k) ∫_{-∞}^{∞} y f₀(y) dy = e^{c²/2} ∫_{-∞}^{-c} y e^{cy} dy + ∫_{-c}^{c} y e^{-y²/2} dy + e^{c²/2} ∫_c^{∞} y e^{-cy} dy
                            = −e^{c²/2} ∫_c^{∞} u e^{-cu} du + ∫_{-c}^{c} y e^{-y²/2} dy + e^{c²/2} ∫_c^{∞} y e^{-cy} dy   (let y = −u in the first integral)
By integration by parts
∫_c^{∞} y e^{-cy} dy = lim_{b→∞} [−(y/c) e^{-cy}|_c^b + (1/c) ∫_c^b e^{-cy} dy] = (1 + 1/c²) e^{-c²}   (10.5)
Also, since g(y) = y e^{-y²/2} is a bounded odd function and [−c, c] is a symmetric interval,
∫_{-c}^{c} y e^{-y²/2} dy = 0
Therefore
(1/k) ∫_{-∞}^{∞} y f₀(y) dy = −(1 + 1/c²) e^{-c²} e^{c²/2} + 0 + (1 + 1/c²) e^{-c²} e^{c²/2} = 0
and
E(X) = ∫_{-∞}^{∞} y f₀(y) dy + μ = 0 + μ = μ

To determine Var(X) we note that, using (10.4),
Var(X) = E(X²) − [E(X)]² = ∫_{-∞}^{∞} (y + μ)² f₀(y) dy − μ² = ∫_{-∞}^{∞} y² f₀(y) dy + 2μ(0) + μ²(1) − μ² = ∫_{-∞}^{∞} y² f₀(y) dy
Now
(1/k) ∫_{-∞}^{∞} y² f₀(y) dy = e^{c²/2} ∫_{-∞}^{-c} y² e^{cy} dy + ∫_{-c}^{c} y² e^{-y²/2} dy + e^{c²/2} ∫_c^{∞} y² e^{-cy} dy
                             = 2 e^{c²/2} ∫_c^{∞} y² e^{-cy} dy + ∫_{-c}^{c} y² e^{-y²/2} dy   (let y = −u in the first integral)
By integration by parts and using (10.5),
∫_c^{∞} y² e^{-cy} dy = lim_{b→∞} [−(y²/c) e^{-cy}|_c^b + (2/c) ∫_c^b y e^{-cy} dy] = (c + 2/c + 2/c³) e^{-c²}
Also, using integration by parts,
∫_{-c}^{c} y² e^{-y²/2} dy = 2 ∫_0^{c} y² e^{-y²/2} dy = 2[−y e^{-y²/2}|_0^c + √(2π) ∫_0^c (1/√(2π)) e^{-y²/2} dy] = √(2π)[2Φ(c) − 1] − 2c e^{-c²/2}
Therefore
Var(X) = ∫_{-∞}^{∞} y² f₀(y) dy = k{2(c + 2/c + 2/c³) e^{-c²/2} + √(2π)[2Φ(c) − 1] − 2c e^{-c²/2}}
       = {2(c + 2/c + 2/c³) e^{-c²/2} + √(2π)[2Φ(c) − 1] − 2c e^{-c²/2}} / {(2/c) e^{-c²/2} + √(2π)[2Φ(c) − 1]}
4:(c) Let
f₀(x) = f(x; μ = 0) = k e^{-x²/2} if |x| ≤ c;   k e^{-c|x| + c²/2} if |x| > c
Since
f₀(x − μ) = k e^{-(x−μ)²/2} if |x − μ| ≤ c;   k e^{-c|x−μ| + c²/2} if |x − μ| > c
which equals f(x), μ is a location parameter for this distribution.

4:(d) In Figure 10.6 we have graphed f(x) for c = 1, μ = 0 (red), f(x) for c = 2, μ = 0 (blue) and the N(0, 1) probability density function (black).

[Figure 10.6: Graphs of f(x) for c = 1, μ = 0 (red), c = 2, μ = 0 (blue) and the N(0, 1) p.d.f. (black).]

We note that there is very little difference between the graph of the N(0, 1) probability density function and the graph of f(x) for c = 2, μ = 0; however, as c becomes smaller (c = 1) the "tails" of the probability density function become much "fatter" relative to the N(0, 1) probability density function.
5:(a) Since X ~ Geometric(p),
P(X ≥ k) = Σ_{x=k}^{∞} p(1 − p)^x = p(1 − p)^k/[1 − (1 − p)] = (1 − p)^k   for k = 0, 1, ...   by the Geometric Series   (10.6)
Therefore
P(X ≥ k + j | X ≥ k) = P(X ≥ k + j, X ≥ k)/P(X ≥ k) = P(X ≥ k + j)/P(X ≥ k)   by (10.6)
                     = (1 − p)^{k+j}/(1 − p)^k = (1 − p)^j = P(X ≥ j)   for j = 0, 1, ...   (10.7)
Suppose we have a large number of items which are to be tested to determine whether they are defective, and suppose a proportion p of these items are defective. Items are tested one after another until the first defective item is found. If we let the random variable X be the number of good items found before observing the first defective item, then X ~ Geometric(p) and (10.6) holds. Now P(X ≥ j) is the probability we find at least j good items before observing the first defective item, and P(X ≥ k + j | X ≥ k) is the probability we find at least j more good items before the first defective item given that we have already observed at least k good items. Since these probabilities are the same for all nonnegative integers by (10.7), no matter how many good items we have already observed, the probability of finding at least j more good items before the first defective item is the same as when we first began testing. It is as if we have "forgotten" that we have already observed at least k good items. In other words, conditioning on the event that we have already observed at least k good items before the first defective item does not affect the probability of observing at least j more good items before the first defective item.

5:(b) If Y ~ Exponential(θ) then
P(Y ≥ a) = ∫_a^{∞} (1/θ) e^{-y/θ} dy = e^{-a/θ}   for a > 0   (10.8)
Therefore
P(Y ≥ a + b | Y ≥ a) = P(Y ≥ a + b, Y ≥ a)/P(Y ≥ a) = P(Y ≥ a + b)/P(Y ≥ a)   by (10.8)
                     = e^{-(a+b)/θ}/e^{-a/θ} = e^{-b/θ} = P(Y ≥ b)   for all a, b > 0
as required.
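A quick simulation illustrating the memoryless property in part (a) (a supplementary sketch of ours, assuming NumPy; note that NumPy's geometric sampler counts trials, so we subtract 1 to count good items before the first defective).

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, j = 0.3, 2, 3
x = rng.geometric(p, size=1_000_000) - 1     # failures before first success: 0, 1, 2, ...

lhs = np.mean(x[x >= k] >= k + j)   # P(X >= k+j | X >= k)
rhs = np.mean(x >= j)               # P(X >= j)
print(lhs, rhs, (1 - p)**j)         # all approximately equal, as (10.7) predicts
```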
6: Since f₁(x), f₂(x), ..., f_k(x) are probability density functions with support sets A₁, A₂, ..., A_k, we know that fᵢ(x) > 0 for all x ∈ Aᵢ, i = 1, 2, ..., k. Also, since 0 < p₁, p₂, ..., p_k ≤ 1 with Σ_{i=1}^{k} pᵢ = 1, we have g(x) = Σ_{i=1}^{k} pᵢ fᵢ(x) > 0 for all x ∈ A = ∪_{i=1}^{k} Aᵢ, and A is the support set of X. Also
∫_{-∞}^{∞} g(x) dx = Σ_{i=1}^{k} pᵢ ∫_{-∞}^{∞} fᵢ(x) dx = Σ_{i=1}^{k} pᵢ (1) = Σ_{i=1}^{k} pᵢ = 1
Therefore g(x) is a probability density function.
Now the mean of X is given by
E(X) = ∫_{-∞}^{∞} x g(x) dx = Σ_{i=1}^{k} pᵢ ∫_{-∞}^{∞} x fᵢ(x) dx = Σ_{i=1}^{k} pᵢ μᵢ
As well
E(X²) = Σ_{i=1}^{k} pᵢ ∫_{-∞}^{∞} x² fᵢ(x) dx = Σ_{i=1}^{k} pᵢ (σᵢ² + μᵢ²)
since
∫_{-∞}^{∞} x² fᵢ(x) dx = ∫_{-∞}^{∞} (x − μᵢ)² fᵢ(x) dx + 2μᵢ ∫_{-∞}^{∞} x fᵢ(x) dx − μᵢ² = σᵢ² + 2μᵢ² − μᵢ² = σᵢ² + μᵢ²
Thus the variance of X is
Var(X) = Σ_{i=1}^{k} pᵢ (σᵢ² + μᵢ²) − (Σ_{i=1}^{k} pᵢ μᵢ)²
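A minimal numerical sketch of these mixture formulas (our own illustration, assuming NumPy); the two-component normal mixture below is only an example, not part of the original problem.

```python
import numpy as np

p  = np.array([0.3, 0.7])      # mixing probabilities
mu = np.array([-1.0, 2.0])     # component means
sd = np.array([1.0, 0.5])      # component standard deviations

mean = np.sum(p * mu)
var  = np.sum(p * (sd**2 + mu**2)) - mean**2

rng = np.random.default_rng(2)
comp = rng.choice(2, size=1_000_000, p=p)
x = rng.normal(mu[comp], sd[comp])
print(mean, var)            # 1.1, 2.365 from the formulas
print(x.mean(), x.var())    # Monte Carlo agreement
```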
7:(a) Since X ~ Gamma(α, β) the probability density function of X is
f(x) = x^{α−1} e^{-x/β}/[Γ(α) β^{α}]   for x > 0
and 0 otherwise. Let A = {x : f(x) > 0} = {x : x > 0}. Now y = e^{x} = h(x) is a one-to-one function on A and h maps the set A to the set B = {y : y > 1}. Also
x = h^{-1}(y) = log y   and   (d/dy) h^{-1}(y) = 1/y
The probability density function of Y is
g(y) = f(h^{-1}(y)) |(d/dy) h^{-1}(y)| = (log y)^{α−1} e^{-(log y)/β}/[Γ(α) β^{α}] · (1/y) = (log y)^{α−1} y^{-1/β − 1}/[Γ(α) β^{α}]   for y ∈ B
and 0 otherwise.

7:(b) Since X ~ Gamma(α, β) and A = {x : x > 0}, y = 1/x = h(x) is a one-to-one function on A and h maps A to B = {y : y > 0}. Also
x = h^{-1}(y) = 1/y   and   (d/dy) h^{-1}(y) = −1/y²
The probability density function of Y is
g(y) = f(h^{-1}(y)) |(d/dy) h^{-1}(y)| = e^{-1/(βy)}/[Γ(α) β^{α} y^{α+1}]   for y ∈ B
and 0 otherwise. This is the probability density function of an Inverse Gamma(α, β) random variable. Therefore Y = X^{-1} ~ Inverse Gamma(α, β).

7:(c) Since X ~ Gamma(k, β) the probability density function of X is
f(x) = x^{k−1} e^{-x/β}/[Γ(k) β^{k}]   for x > 0
and 0 otherwise, with A = {x : x > 0}. Now y = 2x/β = h(x) is a one-to-one function on A and h maps A to B = {y : y > 0}. Also
x = h^{-1}(y) = βy/2   and   (d/dy) h^{-1}(y) = β/2
The probability density function of Y is
g(y) = f(βy/2) (β/2) = y^{k−1} e^{-y/2}/[Γ(k) 2^{k}]   for y ∈ B
and 0 otherwise for k = 1, 2, ..., which is the probability density function of a χ²(2k) random variable. Therefore Y = 2X/β ~ χ²(2k).

7:(d) Since X ~ N(μ, σ²) the probability density function of X is
f(x) = [1/(σ√(2π))] e^{-(x−μ)²/(2σ²)}   for x ∈ ℝ
Let A = {x : f(x) > 0} = ℝ. Now y = e^{x} = h(x) is a one-to-one function on A and h maps A to B = {y : y > 0}. Also x = h^{-1}(y) = log y and (d/dy) h^{-1}(y) = 1/y. The probability density function of Y is
g(y) = f(h^{-1}(y)) |(d/dy) h^{-1}(y)| = [1/(yσ√(2π))] e^{-(log y − μ)²/(2σ²)}   for y ∈ B
Note this distribution is called the Lognormal distribution.
7:(e) Since X ~ N(μ, σ²) the probability density function of X is
f(x) = [1/(σ√(2π))] e^{-(x−μ)²/(2σ²)}   for x ∈ ℝ
Let A = {x : f(x) > 0} = ℝ. Now y = x^{-1} = h(x) is a one-to-one function on A and h maps A to B = {y : y ≠ 0, y ∈ ℝ}. Also
x = h^{-1}(y) = 1/y   and   (d/dy) h^{-1}(y) = −1/y²
The probability density function of Y is
g(y) = f(h^{-1}(y)) |(d/dy) h^{-1}(y)| = [1/(σ√(2π) y²)] e^{-(1/y − μ)²/(2σ²)}   for y ∈ B

7:(f) Since X ~ Uniform(−π/2, π/2) the probability density function of X is
f(x) = 1/π   for −π/2 < x < π/2
and 0 otherwise. Let A = {x : −π/2 < x < π/2}. Now y = tan(x) = h(x) is a one-to-one function on A and h maps A to B = {y : −∞ < y < ∞}. Also x = h^{-1}(y) = arctan(y) and (d/dy) h^{-1}(y) = 1/(1 + y²). The probability density function of Y is
g(y) = f(h^{-1}(y)) |(d/dy) h^{-1}(y)| = 1/[π(1 + y²)]   for y ∈ ℝ
and 0 otherwise. This is the probability density function of a Cauchy(1, 0) random variable. Therefore Y = tan(X) ~ Cauchy(1, 0).

7:(g) Since X ~ Pareto(α, β) the probability density function of X is
f(x) = β α^{β}/x^{β+1}   for x ≥ α;  α, β > 0
and 0 otherwise. Let A = {x : x ≥ α > 0}. Now y = β log(x/α) = h(x) is a one-to-one function on A and h maps A to B = {y : 0 ≤ y < ∞}. Also x = h^{-1}(y) = α e^{y/β} and (d/dy) h^{-1}(y) = (α/β) e^{y/β}. The probability density function of Y is
g(y) = f(α e^{y/β}) (α/β) e^{y/β} = [β α^{β}/(α^{β+1} e^{(β+1)y/β})] (α/β) e^{y/β} = e^{-y}   for y ∈ B
and 0 otherwise. This is the probability density function of an Exponential(1) random variable. Therefore Y = β log(X/α) ~ Exponential(1).
7:(h) If X ~ Weibull(2, θ) the probability density function of X is
f(x) = (2x/θ²) e^{-(x/θ)²}   for x > 0, θ > 0
and 0 otherwise. Let A = {x : f(x) > 0} = {x : x > 0}. Now y = x² = h(x) is a one-to-one function on A and h maps A to B = {y : y > 0}. Also
x = h^{-1}(y) = y^{1/2}   and   (d/dy) h^{-1}(y) = 1/(2y^{1/2})
The probability density function of Y is
g(y) = f(h^{-1}(y)) |(d/dy) h^{-1}(y)| = (1/θ²) e^{-y/θ²}   for y ∈ B
and 0 otherwise, which is the probability density function of an Exponential(θ²) random variable. Therefore Y = X² ~ Exponential(θ²).

7:(i) Since X ~ Double Exponential(0, 1) the probability density function of X is
f(x) = (1/2) e^{-|x|}   for x ∈ ℝ
The cumulative distribution function of Y = X² is
G(y) = P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = ∫_{-√y}^{√y} (1/2) e^{-|x|} dx = ∫_0^{√y} e^{-x} dx   by symmetry, for y > 0
By the First Fundamental Theorem of Calculus and the chain rule the probability density function of Y is
g(y) = (d/dy) G(y) = e^{-√y} (d/dy)√y = e^{-√y}/(2√y)   for y > 0
and 0 otherwise.
7:(j) Since X ~ t(k) the probability density function of X is
f(x) = [Γ((k+1)/2)/(Γ(k/2)√(kπ))] (1 + x²/k)^{-(k+1)/2}   for x ∈ ℝ
which is an even function. The cumulative distribution function of Y = X² is
G(y) = P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = ∫_{-√y}^{√y} f(x) dx = 2 ∫_0^{√y} f(x) dx   by symmetry, for y > 0
By the First Fundamental Theorem of Calculus and the chain rule the probability density function of Y is
g(y) = (d/dy) G(y) = 2 [Γ((k+1)/2)/(Γ(k/2)√(kπ))] (1 + y/k)^{-(k+1)/2} · 1/(2√y)
     = [Γ((k+1)/2)/(Γ(k/2) Γ(1/2) k^{1/2})] y^{1/2 − 1} (1 + y/k)^{-(k+1)/2}   for y > 0,   since Γ(1/2) = √π
and 0 otherwise. This is the probability density function of an F(1, k) random variable. Therefore Y = X² ~ F(1, k).
8:(a) Let
c_n = Γ((n+1)/2)/(Γ(n/2)√(nπ))
If T ~ t(n) then T has probability density function
f(t) = c_n (1 + t²/n)^{-(n+1)/2}   for t ∈ ℝ, n = 1, 2, ...
Since f(−t) = f(t), f is an even function whose graph is symmetric about the y axis. Therefore if E(|T|) exists then, due to symmetry, E(T) = 0. To determine when E(|T|) exists we only need (again by symmetry) to determine for what values of n the integral
∫_0^{∞} t (1 + t²/n)^{-(n+1)/2} dt
converges. There are two cases to consider: n = 1 and n > 1.
For n = 1 we have
∫_0^{∞} t (1 + t²)^{-1} dt = lim_{b→∞} (1/2) ln(1 + t²)|_0^b = lim_{b→∞} (1/2) ln(1 + b²) = ∞
and therefore E(T) does not exist.
For n > 1
∫_0^{∞} t (1 + t²/n)^{-(n+1)/2} dt = lim_{b→∞} [−n/(n−1)] (1 + t²/n)^{-(n−1)/2}|_0^b = [n/(n−1)][1 − lim_{b→∞} (1 + b²/n)^{-(n−1)/2}] = n/(n−1)
and the integral converges. Therefore E(T) = 0 for n > 1.

8:(b) To determine whether Var(T) = E(T²) (since E(T) = 0) exists we need to determine for what values of n the integral
∫_0^{∞} t² (1 + t²/n)^{-(n+1)/2} dt
converges. Now
∫_0^{∞} t² (1 + t²/n)^{-(n+1)/2} dt = ∫_0^{√n} t² (1 + t²/n)^{-(n+1)/2} dt + ∫_{√n}^{∞} t² (1 + t²/n)^{-(n+1)/2} dt
The first integral is finite since it is the integral of a finite function over the finite interval [0, √n]. We will show that the second integral diverges for n = 1, 2. Letting y = t/√n,
∫_{√n}^{∞} t² (1 + t²/n)^{-(n+1)/2} dt = n^{3/2} ∫_1^{∞} y² (1 + y²)^{-(n+1)/2} dy   (10.9)
For n = 1,
y²/(1 + y²) ≥ y²/(y² + y²) = 1/2   for y ≥ 1
and since ∫_1^{∞} (1/2) dy diverges, by the Comparison Test for Improper Integrals (10.9) diverges for n = 1. (Note: for n = 1 we could also argue that Var(T) does not exist since E(T) does not exist for n = 1.)
For n = 2,
y²/(1 + y²)^{3/2} ≥ y²/(y² + y²)^{3/2} = 1/(2^{3/2} y)   for y ≥ 1
and since (1/2^{3/2}) ∫_1^{∞} dy/y diverges, by the Comparison Test for Improper Integrals (10.9) diverges for n = 2.

Now for n > 2,
E(T²) = ∫_{-∞}^{∞} c_n t² (1 + t²/n)^{-(n+1)/2} dt = 2 c_n ∫_0^{∞} t² (1 + t²/n)^{-(n+1)/2} dt
since the integrand is an even function. Integrate by parts using
u = t,  dv = t (1 + t²/n)^{-(n+1)/2} dt,  du = dt,  v = −[n/(n−1)] (1 + t²/n)^{-(n−1)/2}
Then
E(T²) = 2 c_n lim_{b→∞} [−t (n/(n−1)) (1 + t²/n)^{-(n−1)/2}]|_0^b + c_n [n/(n−1)] ∫_{-∞}^{∞} (1 + t²/n)^{-(n−1)/2} dt
where we use symmetry on the second integral. Now
lim_{b→∞} b/(1 + b²/n)^{(n−1)/2} = 0   by L'Hospital's Rule (n > 2)
Also, letting t/√n = y/√(n−2),
c_n [n/(n−1)] ∫_{-∞}^{∞} (1 + t²/n)^{-(n−1)/2} dt = c_n [n/(n−1)] [n/(n−2)]^{1/2} ∫_{-∞}^{∞} (1 + y²/(n−2))^{-((n−2)+1)/2} dy = (c_n/c_{n−2}) [n/(n−1)] [n/(n−2)]^{1/2}
where the integral equals 1/c_{n−2} since c_{n−2} times the integrand is the p.d.f. of a t(n − 2) random variable. Finally
c_n/c_{n−2} = [Γ((n+1)/2) Γ((n−2)/2) √(n−2)]/[Γ(n/2) Γ((n−1)/2) √n] = [(n−1)/2]/[(n−2)/2] · [(n−2)/n]^{1/2} = [(n−1)/(n−2)] [(n−2)/n]^{1/2}
using Γ((n+1)/2) = [(n−1)/2] Γ((n−1)/2) and Γ(n/2) = [(n−2)/2] Γ((n−2)/2). Therefore for n > 2
Var(T) = E(T²) = [(n−1)/(n−2)] [(n−2)/n]^{1/2} [n/(n−1)] [n/(n−2)]^{1/2} = n/(n−2)
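A supplementary check of Var(T) = n/(n − 2) by simulation (our own sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
for n in (3, 5, 10, 30):
    t = rng.standard_t(n, size=2_000_000)
    print(n, round(t.var(), 3), round(n / (n - 2), 3))   # sample variance vs n/(n-2)
```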
9:(a) To find E(X^k) we first note that since
f(x) = [Γ(a+b)/(Γ(a)Γ(b))] x^{a−1} (1 − x)^{b−1}   for 0 < x < 1
and 0 otherwise, then
∫_0^1 [Γ(a+b)/(Γ(a)Γ(b))] x^{a−1} (1 − x)^{b−1} dx = 1
or
∫_0^1 x^{a−1} (1 − x)^{b−1} dx = Γ(a)Γ(b)/Γ(a+b)   (10.10)
for a > 0, b > 0. Therefore
E(X^k) = [Γ(a+b)/(Γ(a)Γ(b))] ∫_0^1 x^{a+k−1} (1 − x)^{b−1} dx = [Γ(a+b)/(Γ(a)Γ(b))] · Γ(a+k)Γ(b)/Γ(a+b+k)   by (10.10)
       = Γ(a+b)Γ(a+k)/[Γ(a)Γ(a+b+k)]   for k = 1, 2, ...
For k = 1 we have
E(X) = Γ(a+b)Γ(a+1)/[Γ(a)Γ(a+b+1)] = a Γ(a) Γ(a+b)/[Γ(a)(a+b)Γ(a+b)] = a/(a+b)
For k = 2 we have
E(X²) = Γ(a+b)Γ(a+2)/[Γ(a)Γ(a+b+2)] = (a+1) a Γ(a) Γ(a+b)/[Γ(a)(a+b+1)(a+b)Γ(a+b)] = a(a+1)/[(a+b)(a+b+1)]
Therefore
Var(X) = E(X²) − [E(X)]² = a(a+1)/[(a+b)(a+b+1)] − a²/(a+b)²
       = [a(a+1)(a+b) − a²(a+b+1)]/[(a+b)²(a+b+1)] = ab/[(a+b)²(a+b+1)]
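These Beta moments are easy to verify by simulation (a supplementary sketch of ours, assuming NumPy; the values a = 2, b = 5 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = 2.0, 5.0
x = rng.beta(a, b, size=1_000_000)

print(x.mean(), a / (a + b))                              # about 0.2857
print(x.var(), a * b / ((a + b)**2 * (a + b + 1)))        # about 0.02551
```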
9:(b)

[Figure 10.7: Graphs of Beta probability density functions for (a, b) = (1, 3), (3, 1), (0.7, 0.7), (2, 4), (2, 2).]

9:(c) If a = b = 1 then
f(x) = 1   for 0 < x < 1
and 0 otherwise. This is the Uniform(0, 1) probability density function.
10: We will prove this result assuming X is a continuous random variable. The proof for X a discrete random variable follows in a similar manner with integrals replaced by sums.
Suppose X has probability density function f(x) and E(|X|^k) exists for some integer k > 1. Then the improper integral ∫_{-∞}^{∞} |x|^k f(x) dx converges. Let A = {x : |x| ≤ 1} and let Ā denote its complement. Then
∫_{-∞}^{∞} |x|^k f(x) dx = ∫_A |x|^k f(x) dx + ∫_{Ā} |x|^k f(x) dx
Since 0 ≤ |x|^k f(x) ≤ f(x) for x ∈ A we have
0 ≤ ∫_A |x|^k f(x) dx ≤ ∫_A f(x) dx = P(X ∈ A) ≤ 1   (10.11)
Convergence of ∫_{-∞}^{∞} |x|^k f(x) dx and (10.11) imply the convergence of ∫_{Ā} |x|^k f(x) dx.
Now
∫_{-∞}^{∞} |x|^j f(x) dx = ∫_A |x|^j f(x) dx + ∫_{Ā} |x|^j f(x) dx   for j = 1, 2, ..., k − 1   (10.12)
and
0 ≤ ∫_A |x|^j f(x) dx ≤ 1
by the same argument as in (10.11). Since ∫_{Ā} |x|^k f(x) dx converges and
|x|^j f(x) ≤ |x|^k f(x)   for x ∈ Ā, j = 1, 2, ..., k − 1
then by the Comparison Theorem for Improper Integrals ∫_{Ā} |x|^j f(x) dx converges. Since both integrals on the right side of (10.12) exist,
E(|X|^j) = ∫_{-∞}^{∞} |x|^j f(x) dx   exists for j = 1, 2, ..., k − 1
11: If X ~ Binomial(n, θ) then
E(X) = nθ   and   Var(X) = nθ(1 − θ)
Let W = X/n. Then
E(W) = θ   and   Var(W) = θ(1 − θ)/n
From the result in Section 2.9 we wish to find a transformation Y = g(W) = g(X/n) such that Var(Y) is approximately constant; we need g such that
dg/dθ = k/√(θ(1 − θ))
where k is chosen for convenience. We need to solve the separable differential equation
∫ dg = k ∫ dθ/√(θ(1 − θ))   (10.13)
Since
(d/dx) arcsin(√x) = [1/√(1 − x)] (d/dx)√x = 1/[2√(x(1 − x))]
the solution to (10.13) is
g(θ) = 2k arcsin(√θ) + C
Letting k = 1/2 and C = 0 we have g(θ) = arcsin(√θ).
Therefore if X ~ Binomial(n, θ) and Y = arcsin(√(X/n)) then
Var(Y) = Var[arcsin(√(X/n))] ≈ constant.
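A small simulation illustrating the variance-stabilizing effect (a supplementary sketch of ours, assuming NumPy): Var(X/n) changes markedly with θ, while Var(arcsin √(X/n)) stays close to 1/(4n) for every θ.

```python
import numpy as np

rng, n = np.random.default_rng(5), 100
for theta in (0.1, 0.3, 0.5, 0.8):
    x = rng.binomial(n, theta, size=500_000)
    y = np.arcsin(np.sqrt(x / n))
    print(theta, round((x / n).var(), 5), round(y.var(), 5), 1 / (4 * n))
```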
13:(b)
M(t) = E(e^{tX}) = Σ_{x=0}^{∞} e^{tx} θ^x e^{-θ}/x! = e^{-θ} Σ_{x=0}^{∞} (θe^t)^x/x! = e^{-θ} e^{θe^t} = e^{θ(e^t − 1)}   for t ∈ ℝ, by the Exponential Series
M'(t) = θe^t e^{θ(e^t − 1)},   so   E(X) = M'(0) = θ
M''(t) = e^{θ(e^t − 1)} (θe^t)² + θe^t e^{θ(e^t − 1)},   so   E(X²) = M''(0) = θ² + θ
Var(X) = E(X²) − [E(X)]² = θ² + θ − θ² = θ

13:(c)
M(t) = E(e^{tX}) = ∫_{μ}^{∞} e^{tx} (1/θ) e^{-(x−μ)/θ} dx
which converges for 1/θ − t > 0, that is for t < 1/θ. Letting y = (1/θ − t)x,
M(t) = (e^{μ/θ}/θ) ∫_{μ}^{∞} e^{-(1/θ − t)x} dx = (e^{μ/θ}/θ) · e^{-(1/θ − t)μ}/(1/θ − t) = e^{μt}/(1 − θt)   for t < 1/θ
M'(t) = e^{μt}[μ(1 − θt) + θ]/(1 − θt)²,   so   E(X) = M'(0) = μ + θ
Differentiating again and evaluating at t = 0 gives
E(X²) = M''(0) = μ² + 2μθ + 2θ² = (μ + θ)² + θ²
so
Var(X) = E(X²) − [E(X)]² = (μ + θ)² + θ² − (μ + θ)² = θ²

13:(d)
M(t) = E(e^{tX}) = (1/2) ∫_{-∞}^{∞} e^{tx} e^{-|x−θ|} dx = (1/2)[∫_{-∞}^{θ} e^{tx} e^{x−θ} dx + ∫_{θ}^{∞} e^{tx} e^{-(x−θ)} dx]
     = (1/2)[e^{-θ} ∫_{-∞}^{θ} e^{x(t+1)} dx + e^{θ} ∫_{θ}^{∞} e^{-x(1−t)} dx]
     = (1/2)[e^{-θ} e^{θ(t+1)}/(t+1) + e^{θ} e^{-θ(1−t)}/(1−t)]   for t + 1 > 0 and 1 − t > 0
     = (e^{θt}/2)[1/(t+1) + 1/(1−t)] = e^{θt}/(1 − t²)   for t ∈ (−1, 1)
M'(t) = θ e^{θt}/(1 − t²) + 2t e^{θt}/(1 − t²)²,   so   E(X) = M'(0) = θ
M''(0) = θ² + 2,   so   E(X²) = θ² + 2 and Var(X) = E(X²) − [E(X)]² = 2
13:(e)
M(t) = E(e^{tX}) = ∫_0^1 e^{tx} (2x) dx
Since ∫ x e^{tx} dx = (x/t) e^{tx} − (1/t²) e^{tx} + C for t ≠ 0,
M(t) = 2[(x/t) e^{tx} − (1/t²) e^{tx}]_0^1 = 2[(1/t) e^{t} − (1/t²) e^{t} + 1/t²] = 2[(t − 1) e^{t} + 1]/t²   for t ≠ 0
For t = 0, M(0) = E(1) = 1. Therefore
M(t) = 1 if t = 0;   M(t) = 2[(t − 1)e^{t} + 1]/t² if t ≠ 0
Note that
lim_{t→0} M(t) = lim_{t→0} 2[(t − 1)e^{t} + 1]/t²   (indeterminate of the form 0/0)
               = lim_{t→0} 2t e^{t}/(2t) = lim_{t→0} e^{t} = 1 = M(0)   by L'Hospital's Rule
Therefore M(t) exists and is continuous for all t ∈ ℝ.
Using the Exponential series we have, for t ≠ 0,
2[(t − 1)e^{t} + 1]/t² = (2/t²)[(t − 1) Σ_{i=0}^{∞} t^i/i! + 1] = (2/t²) Σ_{i=1}^{∞} [1/i! − 1/(i+1)!] t^{i+1} = 2 Σ_{i=0}^{∞} [1/(i+1)! − 1/(i+2)!] t^{i}
and since this series also equals 1 at t = 0, it is the Maclaurin series representation of M(t) for all t ∈ ℝ.
Since E(X^k) = k! times the coefficient of t^k in the Maclaurin series for M(t), we have
E(X) = (1!)(2)[1/2! − 1/3!] = 2(1/2 − 1/6) = 2/3
E(X²) = (2!)(2)[1/3! − 1/4!] = 4(1/6 − 1/24) = 1/2
Therefore
Var(X) = E(X²) − [E(X)]² = 1/2 − (2/3)² = 1/18
Alternatively we could find E(X) = M'(0) using the limit definition of the derivative:
M'(0) = lim_{t→0} [M(t) − M(0)]/t = lim_{t→0} {2[(t − 1)e^{t} + 1] − t²}/t³ = 2/3   using L'Hospital's Rule
Similarly E(X²) = M''(0) could be found using
M''(0) = lim_{t→0} [M'(t) − M'(0)]/t
where M'(t) = (d/dt){2[(t − 1)e^{t} + 1]/t²} for t ≠ 0.
13:(f)
M(t) = E(e^{tX}) = ∫_0^1 e^{tx} x dx + ∫_1^2 e^{tx} (2 − x) dx = ∫_0^1 x e^{tx} dx + 2 ∫_1^2 e^{tx} dx − ∫_1^2 x e^{tx} dx
Since ∫ x e^{tx} dx = (x/t) e^{tx} − (1/t²) e^{tx} + C,
M(t) = [(x/t) e^{tx} − (1/t²) e^{tx}]_0^1 + (2/t) e^{tx}|_1^2 − [(x/t) e^{tx} − (1/t²) e^{tx}]_1^2 = (e^{2t} − 2e^{t} + 1)/t²   for t ≠ 0
For t = 0, M(0) = E(1) = 1. Therefore
M(t) = 1 if t = 0;   M(t) = (e^{2t} − 2e^{t} + 1)/t² if t ≠ 0
Note that
lim_{t→0} M(t) = lim_{t→0} (e^{2t} − 2e^{t} + 1)/t²   (indeterminate of the form 0/0, use L'Hospital's Rule)
               = lim_{t→0} (2e^{2t} − 2e^{t})/(2t)   (again 0/0, use L'Hospital's Rule)
               = lim_{t→0} (2e^{2t} − e^{t}) = 2 − 1 = 1 = M(0)
and therefore M(t) exists and is continuous for t ∈ ℝ.
Using the Exponential series we have, for t ≠ 0,
(e^{2t} − 2e^{t} + 1)/t² = (1/t²){[1 + 2t + (2t)²/2! + (2t)³/3! + (2t)⁴/4! + ···] − 2[1 + t + t²/2! + t³/3! + t⁴/4! + ···] + 1}
                        = 1 + t + (7/12)t² + ···   (10.14)
and since (10.14) also equals 1 at t = 0, it is the Maclaurin series representation of M(t) for t ∈ ℝ.
Since E(X^k) = k! times the coefficient of t^k in the Maclaurin series for M(t), we have
E(X) = 1!(1) = 1   and   E(X²) = 2!(7/12) = 7/6
Therefore
Var(X) = E(X²) − [E(X)]² = 7/6 − 1 = 1/6
Alternatively we could find E(X) = M'(0) using the limit definition of the derivative:
M'(0) = lim_{t→0} [M(t) − M(0)]/t = lim_{t→0} (e^{2t} − 2e^{t} + 1 − t²)/t³ = lim_{t→0} [t + (7/12)t² + ···]/t = 1
Similarly E(X²) = M''(0) could be found using
M''(0) = lim_{t→0} [M'(t) − M'(0)]/t
where
M'(t) = (d/dt)[(e^{2t} − 2e^{t} + 1)/t²] = 2[t(e^{2t} − e^{t}) − e^{2t} + 2e^{t} − 1]/t³   for t ≠ 0
14:(a)
K(t) = log M(t)
K'(t) = M'(t)/M(t),   so   K'(0) = M'(0)/M(0) = E(X)/1 = E(X)   since M(0) = 1
K''(t) = {M(t)M''(t) − [M'(t)]²}/[M(t)]²,   so   K''(0) = {M(0)M''(0) − [M'(0)]²}/[M(0)]² = E(X²) − [E(X)]² = Var(X)

14:(b) If X ~ Negative Binomial(k, p) then
M(t) = [p/(1 − qe^{t})]^k   for t < −log q,   where q = 1 − p
Therefore
K(t) = log M(t) = k log p − k log(1 − qe^{t})   for t < −log q
K'(t) = kqe^{t}/(1 − qe^{t}),   so   E(X) = K'(0) = kq/(1 − q) = kq/p
K''(t) = kq[(1 − qe^{t})e^{t} + qe^{2t}]/(1 − qe^{t})² = kqe^{t}/(1 − qe^{t})²,   so   Var(X) = K''(0) = kq(1 − q + q)/(1 − q)² = kq/p²

15:(b)
M(t) = (1 + t)/(1 − t) = (1 + t) Σ_{k=0}^{∞} t^k   for |t| < 1, by the Geometric series
     = Σ_{k=0}^{∞} t^k + Σ_{k=0}^{∞} t^{k+1} = (1 + t + t² + ···) + (t + t² + t³ + ···) = 1 + Σ_{k=1}^{∞} 2t^k   for |t| < 1   (10.15)
Since
M(t) = Σ_{k=0}^{∞} [M^{(k)}(0)/k!] t^k = Σ_{k=0}^{∞} [E(X^k)/k!] t^k   (10.16)
then by matching coefficients in the two series (10.15) and (10.16) we have
E(X^k)/k! = 2,   or   E(X^k) = 2 · k!   for k = 1, 2, ...

15:(c)
M(t) = e^{t}/(1 − t²) = (Σ_{i=0}^{∞} t^i/i!)(Σ_{k=0}^{∞} t^{2k})   for |t| < 1, by the Geometric series
     = (1 + t/1! + t²/2! + t³/3! + ···)(1 + t² + t⁴ + ···)
     = 1 + (1/1!)t + (1 + 1/2!)t² + (1/1! + 1/3!)t³ + (1 + 1/2! + 1/4!)t⁴ + (1/1! + 1/3! + 1/5!)t⁵ + ···   for |t| < 1
Since M(t) = Σ_{k=0}^{∞} [E(X^k)/k!] t^k, matching coefficients in the two series gives
E(X^{2k}) = (2k)! Σ_{i=0}^{k} 1/(2i)!   for k = 1, 2, ...
E(X^{2k+1}) = (2k+1)! Σ_{i=0}^{k} 1/(2i+1)!   for k = 1, 2, ...
16:(a)
M_Y(t) = E(e^{tY}) = E(e^{t|Z|}) = ∫_{-∞}^{∞} e^{t|z|} (1/√(2π)) e^{-z²/2} dz
       = 2 ∫_0^{∞} (1/√(2π)) e^{tz} e^{-z²/2} dz   since e^{t|z|} e^{-z²/2} is an even function
       = 2 e^{t²/2} ∫_0^{∞} (1/√(2π)) e^{-(z−t)²/2} dz   (completing the square: z² − 2zt = (z − t)² − t²)
       = 2 e^{t²/2} ∫_{-∞}^{t} (1/√(2π)) e^{-y²/2} dy   (let y = −(z − t), dy = −dz)
       = 2 e^{t²/2} Φ(t)   for t ∈ ℝ
where Φ(t) = ∫_{-∞}^{t} (1/√(2π)) e^{-y²/2} dy is the N(0, 1) cumulative distribution function.

16:(b) To find E(Y) = E(|Z|) we first note that
Φ(0) = 1/2,   φ(t) := (d/dt)Φ(t) = (1/√(2π)) e^{-t²/2},   and   φ(0) = 1/√(2π)
Then
(d/dt) M_Y(t) = (d/dt)[2 e^{t²/2} Φ(t)] = 2t e^{t²/2} Φ(t) + 2 e^{t²/2} φ(t)
Therefore
E(Y) = E(|Z|) = (d/dt) M_Y(t)|_{t=0} = 0 + 2/√(2π) = √(2/π)
To find Var(Y) = Var(|Z|) we note that
φ'(t) = (d/dt)φ(t) = −(t/√(2π)) e^{-t²/2}   and   φ'(0) = 0
Therefore
(d²/dt²) M_Y(t) = (d/dt){2 e^{t²/2}[tΦ(t) + φ(t)]} = 2t e^{t²/2}[tΦ(t) + φ(t)] + 2 e^{t²/2}[Φ(t) + tφ(t) + φ'(t)]
and
E(Y²) = (d²/dt²) M_Y(t)|_{t=0} = 2(1){0[0·Φ(0) + φ(0)] + Φ(0) + 0 + φ'(0)} = 2Φ(0) = 2(1/2) = 1
Therefore
Var(Y) = Var(|Z|) = E(Y²) − [E(Y)]² = 1 − (√(2/π))² = 1 − 2/π
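A supplementary simulation check of E(|Z|) = √(2/π) and Var(|Z|) = 1 − 2/π (our own sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(6)
y = np.abs(rng.standard_normal(2_000_000))

print(y.mean(), np.sqrt(2 / np.pi))   # both about 0.7979
print(y.var(), 1 - 2 / np.pi)         # both about 0.3634
```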
18: Since
M_X(t) = E(e^{tX}) = Σ_{j=0}^{∞} e^{jt} p_j   for |t| < h, h > 0
then
M_X(log s) = Σ_{j=0}^{∞} e^{j log s} p_j = Σ_{j=0}^{∞} s^j p_j   for |log s| < h, h > 0
which is a power series in s. Similarly
M_Y(log s) = Σ_{j=0}^{∞} s^j q_j   for |log s| < h, h > 0
which is also a power series in s.
We are given that M_X(t) = M_Y(t) for |t| < h, h > 0. Therefore
M_X(log s) = M_Y(log s)   for |log s| < h, h > 0
and
Σ_{j=0}^{∞} s^j p_j = Σ_{j=0}^{∞} s^j q_j   for |log s| < h, h > 0
Since two power series are equal if and only if their coefficients are all equal, we have p_j = q_j, j = 0, 1, ..., and therefore X and Y have the same distribution.

19:(a) Since X has moment generating function
M(t) = e^{t}/(1 − t²)   for |t| < 1
the moment generating function of Y = (X − 1)/2 is
M_Y(t) = E(e^{tY}) = E(e^{t(X−1)/2}) = e^{-t/2} E(e^{(t/2)X}) = e^{-t/2} M(t/2) = e^{-t/2} e^{t/2}/[1 − (t/2)²] = 1/(1 − t²/4)   for |t/2| < 1, that is |t| < 2

19:(b)
M_Y'(t) = (−1)(1 − t²/4)^{-2}(−t/2) = (t/2)(1 − t²/4)^{-2},   so   E(Y) = M_Y'(0) = 0
M_Y''(t) = (1/2)(1 − t²/4)^{-2} + (t/2)(−2)(1 − t²/4)^{-3}(−t/2),   so   E(Y²) = M_Y''(0) = 1/2
Var(Y) = E(Y²) − [E(Y)]² = 1/2 − 0 = 1/2

19:(c) Since the moment generating function of a Double Exponential(θ, β) random variable is
M(t) = e^{θt}/(1 − β²t²)   for |t| < 1/β
and the moment generating function of Y is M_Y(t) = 1/(1 − t²/4) for |t| < 2, by the Uniqueness Theorem for Moment Generating Functions Y has a Double Exponential(0, 1/2) distribution.
10.2 Chapter 3

1:(a)
1/k = Σ_{y=0}^{∞} Σ_{x=0}^{∞} q² p^{x+y} = q² (Σ_{y=0}^{∞} p^y)(Σ_{x=0}^{∞} p^x)   by the Geometric Series since 0 < p < 1
    = q²/(1 − p)² = 1   since q = 1 − p
Therefore k = 1.

1:(b) The marginal probability function of X is
f₁(x) = P(X = x) = Σ_y f(x, y) = Σ_{y=0}^{∞} q² p^{x+y} = q² p^x Σ_{y=0}^{∞} p^y = q² p^x/(1 − p) = q p^x   for x = 0, 1, ...   by the Geometric Series
By symmetry the marginal probability function of Y is
f₂(y) = q p^y   for y = 0, 1, ...
The support set of (X, Y) is A = {(x, y) : x = 0, 1, ...; y = 0, 1, ...}. Since
f(x, y) = f₁(x) f₂(y)   for (x, y) ∈ A
X and Y are independent random variables.

1:(c)
P(X = x | X + Y = t) = P(X = x, X + Y = t)/P(X + Y = t) = P(X = x, Y = t − x)/P(X + Y = t)
Now
P(X + Y = t) = Σ_{(x,y): x+y=t} q² p^{x+y} = q² Σ_{x=0}^{t} p^{x+(t−x)} = q² p^t (t + 1)   for t = 0, 1, ...
Therefore
P(X = x | X + Y = t) = q² p^{x+(t−x)}/[q² p^t (t + 1)] = 1/(t + 1)   for x = 0, 1, ..., t
2:(a)
f(x, y) = e^{-2}/[x!(y − x)!]   for x = 0, 1, ..., y; y = 0, 1, ...
OR
f(x, y) = e^{-2}/[x!(y − x)!]   for y = x, x + 1, ...; x = 0, 1, ...
f₁(x) = Σ_y f(x, y) = Σ_{y=x}^{∞} e^{-2}/[x!(y − x)!] = (e^{-2}/x!) Σ_{k=0}^{∞} 1/k!   (let k = y − x)
      = (e^{-2}/x!) e^{1} = e^{-1}/x!   for x = 0, 1, ...   by the Exponential Series
Note that X ~ Poisson(1).
f₂(y) = Σ_x f(x, y) = Σ_{x=0}^{y} e^{-2}/[x!(y − x)!] = (e^{-2}/y!) Σ_{x=0}^{y} y!/[x!(y − x)!] = (e^{-2}/y!) Σ_{x=0}^{y} C(y, x) 1^x 1^{y−x}
      = (e^{-2}/y!)(1 + 1)^y   by the Binomial Series
      = 2^y e^{-2}/y!   for y = 0, 1, ...
Note that Y ~ Poisson(2).

2:(b) Since (for example)
f(1, 2) = e^{-2} ≠ f₁(1) f₂(2) = e^{-1} · 2² e^{-2}/2! = 2e^{-3}
X and Y are not independent random variables.
OR
The support set of X is A₁ = {x : x = 0, 1, ...}, the support set of Y is A₂ = {y : y = 0, 1, ...}, and the support set of (X, Y) is A = {(x, y) : x = 0, 1, ..., y; y = 0, 1, ...}. Since A ≠ A₁ × A₂, by the Factorization Theorem for Independence X and Y are not independent random variables.
3:(a) The support set of (X, Y)
A = {(x, y) : 0 < y < 1 − x², −1 < x < 1} = {(x, y) : −√(1 − y) < x < √(1 − y), 0 < y < 1}
is pictured in Figure 10.8.

[Figure 10.8: Support set for Problem 3(a): the region below y = 1 − x².]

1 = ∫∫ f(x, y) dx dy = k ∫_{x=−1}^{1} ∫_{y=0}^{1−x²} (x² + y) dy dx = k ∫_{−1}^{1} [x² y + y²/2]_0^{1−x²} dx
  = k ∫_{−1}^{1} [x²(1 − x²) + (1 − x²)²/2] dx = (k/2) · 2 ∫_0^{1} [2x²(1 − x²) + (1 − x²)²] dx   by symmetry
  = k ∫_0^{1} (1 − x⁴) dx = k [x − x⁵/5]_0^{1} = (4/5) k,   and thus k = 5/4
Therefore
f(x, y) = (5/4)(x² + y)   for (x, y) ∈ A

3:(b) The marginal probability density function of X is
f₁(x) = ∫ f(x, y) dy = (5/4) ∫_0^{1−x²} (x² + y) dy = (5/4)[x²(1 − x²) + (1 − x²)²/2] = (5/8)(1 − x⁴)   for −1 < x < 1
and 0 otherwise. The support set of X is A₁ = {x : −1 < x < 1}.
The marginal probability density function of Y is
f₂(y) = ∫ f(x, y) dx = (5/4) ∫_{−√(1−y)}^{√(1−y)} (x² + y) dx = (5/2) ∫_0^{√(1−y)} (x² + y) dx   because of symmetry
      = (5/2)[x³/3 + yx]_0^{√(1−y)} = (5/2)[(1 − y)^{3/2}/3 + y(1 − y)^{1/2}] = (5/6)(1 − y)^{1/2}[(1 − y) + 3y] = (5/6)(1 − y)^{1/2}(1 + 2y)   for 0 < y < 1
and 0 otherwise. The support set of Y is A₂ = {y : 0 < y < 1}.

3:(c) The support set A of (X, Y) is not rectangular. To show that X and Y are not independent random variables we only need to find x ∈ A₁ and y ∈ A₂ such that (x, y) ∉ A. Let x = 3/4 and y = 1/2. Since
f(3/4, 1/2) = 0 ≠ f₁(3/4) f₂(1/2) > 0
X and Y are not independent random variables.

3:(d) The region of integration is pictured in Figure 10.9.

[Figure 10.9: Region of integration for Problem 3(d), bounded by y = 1 − x² and y = x + 1.]

P(Y ≤ X + 1) = ∫∫_B (5/4)(x² + y) dy dx = 1 − ∫∫_C (5/4)(x² + y) dy dx
             = 1 − (5/4) ∫_{x=−1}^{0} ∫_{y=x+1}^{1−x²} (x² + y) dy dx = 1 − (5/4) ∫_{−1}^{0} [x² y + y²/2]_{x+1}^{1−x²} dx
             = 1 − (5/8) ∫_{−1}^{0} {[2x²(1 − x²) + (1 − x²)²] − [2x²(x + 1) + (x + 1)²]} dx
             = 1 − (5/8) ∫_{−1}^{0} (−x⁴ − 2x³ − 3x² − 2x) dx = 1 + (5/8)[x⁵/5 + x⁴/2 + x³ + x²]_{−1}^{0}
             = 1 − (5/8)(−1/5 + 1/2 − 1 + 1)·(−1)... = 1 − (5/8)(3/10) = 1 − 3/16 = 13/16
4:(a) The support set of (X, Y)
A = {(x, y) : x² < y < 1, −1 < x < 1} = {(x, y) : −√y < x < √y, 0 < y < 1}
is pictured in Figure 10.10.

[Figure 10.10: Support set for Problem 4(a): the region between y = x² and y = 1.]

1 = ∫∫ f(x, y) dx dy = k ∫_{x=−1}^{1} ∫_{y=x²}^{1} x² y dy dx = (k/2) ∫_{−1}^{1} x² (1 − x⁴) dx
  = (k/2)[x³/3 − x⁷/7]_{−1}^{1} = (k/2)(2/3 − 2/7) = 4k/21
Therefore k = 21/4 and
f(x, y) = (21/4) x² y   for (x, y) ∈ A
and 0 otherwise.

4:(b) The marginal probability density function of X is
f₁(x) = (21/4) ∫_{x²}^{1} x² y dy = (21x²/8)[y²]_{x²}^{1} = (21x²/8)(1 − x⁴)   for −1 < x < 1
and 0 otherwise. The support set of X is A₁ = {x : −1 < x < 1}.
The marginal probability density function of Y is
f₂(y) = (21/4) ∫_{−√y}^{√y} x² y dx = (21/2) y ∫_0^{√y} x² dx   because of symmetry
      = (7/2) y [x³]_0^{√y} = (7/2) y · y^{3/2} = (7/2) y^{5/2}   for 0 < y < 1
and 0 otherwise. The support set of Y is A₂ = {y : 0 < y < 1}.
The support set A of (X, Y) is not rectangular. To show that X and Y are not independent random variables we only need to find x ∈ A₁ and y ∈ A₂ such that (x, y) ∉ A. Let x = 1/2 and y = 1/10. Since
f(1/2, 1/10) = 0 ≠ f₁(1/2) f₂(1/10) > 0
X and Y are not independent random variables.

4:(c) The region of integration is pictured in Figure 10.11.

[Figure 10.11: Region of integration for Problem 4(c), bounded by y = x² and y = x.]

P(X ≥ Y) = ∫∫_B (21/4) x² y dy dx = ∫_{x=0}^{1} ∫_{y=x²}^{x} (21/4) x² y dy dx = (21/8) ∫_0^{1} x² [y²]_{x²}^{x} dx
         = (21/8) ∫_0^{1} (x⁴ − x⁶) dx = (21/8)(1/5 − 1/7) = (21/8)(2/35) = 3/20

4:(d) The conditional probability density function of X given Y = y is
f₁(x|y) = f(x, y)/f₂(y) = [(21/4) x² y]/[(7/2) y^{5/2}] = (3/2) x² y^{-3/2}   for −√y < x < √y, 0 < y < 1
and 0 otherwise. Check:
∫ f₁(x|y) dx = (1/2) y^{-3/2} ∫_{−√y}^{√y} 3x² dx = y^{-3/2} [x³]_0^{√y} = y^{-3/2} y^{3/2} = 1
The conditional probability density function of Y given X = x is
f₂(y|x) = f(x, y)/f₁(x) = [(21/4) x² y]/[(21x²/8)(1 − x⁴)] = 2y/(1 − x⁴)   for x² < y < 1, −1 < x < 1
and 0 otherwise. Check:
∫ f₂(y|x) dy = [1/(1 − x⁴)] ∫_{x²}^{1} 2y dy = [1/(1 − x⁴)][y²]_{x²}^{1} = (1 − x⁴)/(1 − x⁴) = 1
6:(d) (i) The support set of (X, Y), A = {(x, y) : 0 < x < y < 1}, is pictured in Figure 10.12.

[Figure 10.12: Support set for Problem 6(d)(i): the region 0 < x < y < 1.]

1 = ∫∫ f(x, y) dx dy = k ∫_{y=0}^{1} ∫_{x=0}^{y} (x + y) dx dy = k ∫_0^{1} [x²/2 + xy]_{x=0}^{y} dy = k ∫_0^{1} (3/2) y² dy = (k/2)[y³]_0^{1} = k/2
Therefore k = 2 and
f(x, y) = 2(x + y)   for (x, y) ∈ A
and 0 otherwise.

6:(d) (ii) The marginal probability density function of X is
f₁(x) = ∫_{y=x}^{1} 2(x + y) dy = [2xy + y²]_{y=x}^{1} = (2x + 1) − 3x² = 1 + 2x − 3x²   for 0 < x < 1
and 0 otherwise. The support set of X is A₁ = {x : 0 < x < 1}.
The marginal probability density function of Y is
f₂(y) = ∫_{x=0}^{y} 2(x + y) dx = [x² + 2yx]_{x=0}^{y} = y² + 2y² = 3y²   for 0 < y < 1
and 0 otherwise. The support set of Y is A₂ = {y : 0 < y < 1}.

6:(d) (iii) The conditional probability density function of X given Y = y is
f₁(x|y) = f(x, y)/f₂(y) = 2(x + y)/(3y²)   for 0 < x < y < 1
Check: ∫ f₁(x|y) dx = [1/(3y²)] ∫_{x=0}^{y} 2(x + y) dx = 3y²/(3y²) = 1
The conditional probability density function of Y given X = x is
f₂(y|x) = f(x, y)/f₁(x) = 2(x + y)/(1 + 2x − 3x²)   for 0 < x < y < 1
and 0 otherwise. Check: ∫ f₂(y|x) dy = [1/(1 + 2x − 3x²)] ∫_{y=x}^{1} 2(x + y) dy = (1 + 2x − 3x²)/(1 + 2x − 3x²) = 1

6:(d) (iv)
E(X|y) = ∫ x f₁(x|y) dx = [1/(3y²)] ∫_{x=0}^{y} 2(x² + yx) dx = [1/(3y²)][2x³/3 + yx²]_{x=0}^{y} = [1/(3y²)](5y³/3) = 5y/9   for 0 < y < 1
E(Y|x) = ∫ y f₂(y|x) dy = [1/(1 + 2x − 3x²)] ∫_{y=x}^{1} 2(xy + y²) dy = [1/(1 + 2x − 3x²)][xy² + 2y³/3]_{y=x}^{1}
       = [1/(1 + 2x − 3x²)][x + 2/3 − x³ − 2x³/3] = (2 + 3x − 5x³)/[3(1 + 2x − 3x²)]   for 0 < x < 1
6:(f) (i) The support set of (X, Y)
A = {(x, y) : 0 < y < 1 − x, 0 < x < 1} = {(x, y) : 0 < x < 1 − y, 0 < y < 1}
is pictured in Figure 10.13.

[Figure 10.13: Support set for Problem 6(f)(i): the triangle below y = 1 − x.]

1 = ∫∫ f(x, y) dy dx = k ∫_{x=0}^{1} ∫_{y=0}^{1−x} x² y dy dx = (k/2) ∫_0^{1} x²(1 − x)² dx = (k/2) ∫_0^{1} (x² − 2x³ + x⁴) dx = (k/2)(1/3 − 1/2 + 1/5) = k/60
Therefore k = 60 and
f(x, y) = 60 x² y   for (x, y) ∈ A
and 0 otherwise.

6:(f) (ii) The marginal probability density function of X is
f₁(x) = 60x² ∫_0^{1−x} y dy = 30x² [y²]_0^{1−x} = 30x²(1 − x)²   for 0 < x < 1
and 0 otherwise. The support of X is A₁ = {x : 0 < x < 1}. Note that X ~ Beta(3, 3).
The marginal probability density function of Y is
f₂(y) = 60y ∫_0^{1−y} x² dx = 20y [x³]_0^{1−y} = 20y(1 − y)³   for 0 < y < 1
and 0 otherwise. The support of Y is A₂ = {y : 0 < y < 1}. Note that Y ~ Beta(2, 4).

6:(f) (iii) The conditional probability density function of X given Y = y is
f₁(x|y) = f(x, y)/f₂(y) = 60x²y/[20y(1 − y)³] = 3x²/(1 − y)³   for 0 < x < 1 − y, 0 < y < 1
Check: ∫ f₁(x|y) dx = [1/(1 − y)³] ∫_0^{1−y} 3x² dx = (1 − y)³/(1 − y)³ = 1
The conditional probability density function of Y given X = x is
f₂(y|x) = f(x, y)/f₁(x) = 60x²y/[30x²(1 − x)²] = 2y/(1 − x)²   for 0 < y < 1 − x, 0 < x < 1
and 0 otherwise. Check: ∫ f₂(y|x) dy = [1/(1 − x)²] ∫_0^{1−x} 2y dy = (1 − x)²/(1 − x)² = 1

6:(f) (iv)
E(X|y) = ∫ x f₁(x|y) dx = [3/(1 − y)³] ∫_0^{1−y} x³ dx = [3/(1 − y)³] (1 − y)⁴/4 = (3/4)(1 − y)   for 0 < y < 1
E(Y|x) = ∫ y f₂(y|x) dy = [2/(1 − x)²] ∫_0^{1−x} y² dy = [2/(1 − x)²] (1 − x)³/3 = (2/3)(1 − x)   for 0 < x < 1
6:(g) (i) The support set of (X, Y), A = {(x, y) : 0 < y < x < ∞}, is pictured in Figure 10.14.

[Figure 10.14: Support set for Problem 6(g)(i): the region 0 < y < x.]

1 = ∫∫ f(x, y) dx dy = k ∫_{y=0}^{∞} ∫_{x=y}^{∞} e^{-x-2y} dx dy = k ∫_0^{∞} e^{-2y} [lim_{b→∞} (−e^{-x})|_y^b] dy = k ∫_0^{∞} e^{-3y} dy = k/3
since 3e^{-3y} is the probability density function of an Exponential(1/3) random variable and therefore ∫_0^{∞} 3e^{-3y} dy = 1. Therefore 1 = k/3, that is k = 3, and
f(x, y) = 3 e^{-x-2y}   for (x, y) ∈ A
and 0 otherwise.

6:(g) (ii) The marginal probability density function of X is
f₁(x) = ∫_0^{x} 3e^{-x-2y} dy = 3e^{-x} [−(1/2) e^{-2y}]_0^{x} = (3/2) e^{-x}(1 − e^{-2x})   for x > 0
and 0 otherwise. The support of X is A₁ = {x : x > 0}.
The marginal probability density function of Y is
f₂(y) = ∫_y^{∞} 3e^{-x-2y} dx = 3e^{-2y} lim_{b→∞} (−e^{-x})|_y^{b} = 3e^{-2y} e^{-y} = 3e^{-3y}   for y > 0
and 0 otherwise. The support of Y is A₂ = {y : y > 0}. Note that Y ~ Exponential(1/3).

6:(g) (iii) The conditional probability density function of X given Y = y is
f₁(x|y) = f(x, y)/f₂(y) = 3e^{-x-2y}/(3e^{-3y}) = e^{-(x−y)}   for x > y > 0
Note that X | Y = y ~ Two Parameter Exponential(y, 1).
The conditional probability density function of Y given X = x is
f₂(y|x) = f(x, y)/f₁(x) = 3e^{-x-2y}/[(3/2) e^{-x}(1 − e^{-2x})] = 2e^{-2y}/(1 − e^{-2x})   for 0 < y < x
and 0 otherwise. Check:
∫ f₂(y|x) dy = [1/(1 − e^{-2x})] ∫_0^{x} 2e^{-2y} dy = (1 − e^{-2x})/(1 − e^{-2x}) = 1

6:(g) (iv)
E(X|y) = ∫ x f₁(x|y) dx = ∫_y^{∞} x e^{-(x−y)} dx = e^{y} lim_{b→∞} [−(x + 1) e^{-x}]_y^{b} = e^{y}(y + 1) e^{-y} = y + 1   for y > 0
E(Y|x) = ∫ y f₂(y|x) dy = [1/(1 − e^{-2x})] ∫_0^{x} 2y e^{-2y} dy = [1/(1 − e^{-2x})][−(y + 1/2) e^{-2y}]_0^{x}
       = [1/(1 − e^{-2x})][1/2 − (x + 1/2) e^{-2x}] = [1 − (2x + 1) e^{-2x}]/[2(1 − e^{-2x})]   for x > 0
7:(a) Since X ~ Uniform(0, 1),
f₁(x) = 1   for 0 < x < 1
The joint probability density function of X and Y is
f(x, y) = f₂(y|x) f₁(x) = [1/(1 − x)](1) = 1/(1 − x)   for 0 < x < y < 1
and 0 otherwise.

7:(b) The marginal probability density function of Y is
f₂(y) = ∫_{x=0}^{y} 1/(1 − x) dx = [−log(1 − x)]_0^{y} = −log(1 − y)   for 0 < y < 1
and 0 otherwise.

7:(c) The conditional probability density function of X given Y = y is
f₁(x|y) = f(x, y)/f₂(y) = [1/(1 − x)]/[−log(1 − y)] = 1/[(x − 1) log(1 − y)]   for 0 < x < y < 1
and 0 otherwise.
9: Since Y | θ ~ Binomial(n, θ),
E(Y|θ) = nθ   and   Var(Y|θ) = nθ(1 − θ)
Since θ ~ Beta(a, b),
E(θ) = a/(a + b),   Var(θ) = ab/[(a + b + 1)(a + b)²]
and
E(θ²) = Var(θ) + [E(θ)]² = ab/[(a + b + 1)(a + b)²] + a²/(a + b)² = [ab + a²(a + b + 1)]/[(a + b + 1)(a + b)²]
      = a(a + b)(a + 1)/[(a + b + 1)(a + b)²] = a(a + 1)/[(a + b + 1)(a + b)]
Therefore
E(Y) = E[E(Y|θ)] = E(nθ) = n E(θ) = na/(a + b)
and
Var(Y) = E[Var(Y|θ)] + Var[E(Y|θ)] = E[nθ(1 − θ)] + Var(nθ) = n[E(θ) − E(θ²)] + n² Var(θ)
       = n[a/(a + b) − a(a + 1)/((a + b + 1)(a + b))] + n² ab/[(a + b + 1)(a + b)²]
       = nab/[(a + b + 1)(a + b)] + n² ab/[(a + b + 1)(a + b)²]
       = nab(a + b + n)/[(a + b + 1)(a + b)²]
10:(a) Since Y | θ ~ Poisson(θ), E(Y|θ) = θ, and since θ ~ Gamma(α, β), E(θ) = αβ. Therefore
E(Y) = E[E(Y|θ)] = E(θ) = αβ
Since Y | θ ~ Poisson(θ), Var(Y|θ) = θ, and since θ ~ Gamma(α, β), Var(θ) = αβ². Therefore
Var(Y) = E[Var(Y|θ)] + Var[E(Y|θ)] = E(θ) + Var(θ) = αβ + αβ²

10:(b) Since Y | θ ~ Poisson(θ) and θ ~ Gamma(α, β) we have
f₂(y|θ) = θ^y e^{-θ}/y!   for y = 0, 1, ...; θ > 0
and
f₁(θ) = θ^{α−1} e^{-θ/β}/[Γ(α) β^{α}]   for θ > 0
and by the Product Rule
f(θ, y) = f₂(y|θ) f₁(θ) = θ^{y+α−1} e^{-θ(1+1/β)}/[y! Γ(α) β^{α}]   for y = 0, 1, ...; θ > 0
The marginal probability function of Y is
f₂(y) = ∫_0^{∞} f(θ, y) dθ = [1/(y! Γ(α) β^{α})] ∫_0^{∞} θ^{y+α−1} e^{-θ(1+1/β)} dθ   (let x = θ(1 + 1/β) = θ(β + 1)/β)
      = [1/(y! Γ(α) β^{α})] [β/(1 + β)]^{y+α} ∫_0^{∞} x^{y+α−1} e^{-x} dx = [Γ(y + α)/(y! Γ(α))] β^{y}/(1 + β)^{y+α}
      = C(y + α − 1, y) [1/(1 + β)]^{α} [1 − 1/(1 + β)]^{y}   for y = 0, 1, ...
If α is a nonnegative integer then we recognize this as the probability function of a Negative Binomial(α, 1/(1 + β)) random variable.
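A supplementary simulation of this Gamma-Poisson mixture (our own sketch, assuming NumPy and an integer α so that the negative binomial comparison applies; NumPy's negative_binomial counts failures before the α-th success, matching the support 0, 1, 2, ...).

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, beta = 3, 2.0
theta = rng.gamma(alpha, beta, size=1_000_000)
y = rng.poisson(theta)

nb = rng.negative_binomial(alpha, 1 / (1 + beta), size=1_000_000)
print(y.mean(), nb.mean(), alpha * beta)               # means agree (= 6)
print(y.var(), nb.var(), alpha * beta * (1 + beta))    # variances agree (= 18)
```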
11:(a) First note that
∂^{j+k}/∂t₁^{j}∂t₂^{k} e^{t₁x + t₂y} = x^{j} y^{k} e^{t₁x + t₂y}
Therefore
∂^{j+k}/∂t₁^{j}∂t₂^{k} M(t₁, t₂) = ∫∫ ∂^{j+k}/∂t₁^{j}∂t₂^{k} e^{t₁x + t₂y} f(x, y) dx dy
(assuming the operations of integration and differentiation can be interchanged)
                                = ∫∫ x^{j} y^{k} e^{t₁x + t₂y} f(x, y) dx dy = E(X^{j} Y^{k} e^{t₁X + t₂Y})
and
∂^{j+k}/∂t₁^{j}∂t₂^{k} M(t₁, t₂)|_{(t₁,t₂)=(0,0)} = E(X^{j} Y^{k})
as required. Note that this proof is for the case of (X, Y) continuous random variables. The proof for (X, Y) discrete random variables follows in a similar manner with integrals replaced by summations.

11:(b) Suppose that X and Y are independent random variables. Then
M(t₁, t₂) = E(e^{t₁X + t₂Y}) = E(e^{t₁X} e^{t₂Y}) = E(e^{t₁X}) E(e^{t₂Y}) = M_X(t₁) M_Y(t₂)
Suppose that M(t₁, t₂) = M_X(t₁) M_Y(t₂) for all |t₁| < h, |t₂| < h for some h > 0. Then
∫∫ e^{t₁x + t₂y} f(x, y) dx dy = [∫ e^{t₁x} f₁(x) dx][∫ e^{t₂y} f₂(y) dy] = ∫∫ e^{t₁x + t₂y} f₁(x) f₂(y) dx dy
But by the Uniqueness Theorem for Moment Generating Functions this can only hold if f(x, y) = f₁(x) f₂(y) for all (x, y), and therefore X and Y are independent random variables.
Thus we have shown that X and Y are independent random variables if and only if M(t₁, t₂) = M_X(t₁) M_Y(t₂).
Note that this proof is for the case of (X, Y) continuous random variables. The proof for (X, Y) discrete random variables follows in a similar manner with integrals replaced by summations.

11:(c) If (X, Y) ~ Multinomial(n; p₁, p₂, p₃) then
M(t₁, t₂) = (p₁ e^{t₁} + p₂ e^{t₂} + p₃)^{n}   for t₁ ∈ ℝ, t₂ ∈ ℝ
By 11(a)
E(XY) = ∂²/∂t₁∂t₂ M(t₁, t₂)|_{(0,0)} = n(n − 1) p₁ e^{t₁} p₂ e^{t₂} (p₁ e^{t₁} + p₂ e^{t₂} + p₃)^{n−2}|_{(0,0)} = n(n − 1) p₁ p₂
Also
M_X(t) = M(t, 0) = (p₁ e^{t} + 1 − p₁)^{n}   for t ∈ ℝ
and
E(X) = (d/dt) M_X(t)|_{t=0} = n p₁ e^{t} (p₁ e^{t} + 1 − p₁)^{n−1}|_{t=0} = n p₁
Similarly E(Y) = n p₂. Therefore
Cov(X, Y) = E(XY) − E(X)E(Y) = n(n − 1) p₁ p₂ − n p₁ n p₂ = −n p₁ p₂
13:(a) Note that since Σ is a symmetric matrix, Σ^T = Σ, Σ^{-1}Σ = I (the 2 × 2 identity matrix) and (Σ^{-1})^T = Σ^{-1}. Also tΣt^T is a scalar, and xΣ^{-1}t^T = tΣ^{-1}x^T since (x − μ)Σ^{-1}t^T is a scalar. Then
[x − (μ + tΣ)] Σ^{-1} [x − (μ + tΣ)]^T = [(x − μ) − tΣ] Σ^{-1} [(x − μ) − tΣ]^T
  = (x − μ)Σ^{-1}(x − μ)^T − (x − μ)ΣΣ^{-1}... = (x − μ)Σ^{-1}(x − μ)^T − 2xt^T + 2μt^T + tΣt^T
as required. Now
M(t₁, t₂) = E(e^{t₁X₁ + t₂X₂}) = E[exp(Xt^T)] = ∫∫ exp(xt^T) (2π|Σ|^{1/2})^{-1} exp[−(1/2)(x − μ)Σ^{-1}(x − μ)^T] dx₁ dx₂
          = ∫∫ (2π|Σ|^{1/2})^{-1} exp{−(1/2)[(x − μ)Σ^{-1}(x − μ)^T − 2xt^T]} dx₁ dx₂
          = exp(μt^T + (1/2)tΣt^T) ∫∫ (2π|Σ|^{1/2})^{-1} exp{−(1/2)[x − (μ + tΣ)] Σ^{-1} [x − (μ + tΣ)]^T} dx₁ dx₂
          = exp(μt^T + (1/2)tΣt^T)   for all t = (t₁, t₂) ∈ ℝ²
since (2π|Σ|^{1/2})^{-1} exp{−(1/2)[x − (μ + tΣ)]Σ^{-1}[x − (μ + tΣ)]^T} is a BVN(μ + tΣ, Σ) probability density function and therefore the integral is equal to one.

13:(b) Since
M_{X₁}(t) = M(t, 0) = exp(μ₁t + (1/2)t²σ₁²)   for t ∈ ℝ
which is the moment generating function of a N(μ₁, σ₁²) random variable, by the Uniqueness Theorem for Moment Generating Functions X₁ ~ N(μ₁, σ₁²). By a similar argument X₂ ~ N(μ₂, σ₂²).

13:(c) Since
∂²/∂t₁∂t₂ M(t₁, t₂) = ∂²/∂t₁∂t₂ exp(μ₁t₁ + μ₂t₂ + (1/2)t₁²σ₁² + t₁t₂ρσ₁σ₂ + (1/2)t₂²σ₂²)
                   = ∂/∂t₂ [(μ₁ + t₁σ₁² + t₂ρσ₁σ₂) M(t₁, t₂)]
                   = ρσ₁σ₂ M(t₁, t₂) + (μ₁ + t₁σ₁² + t₂ρσ₁σ₂)(μ₂ + t₂σ₂² + t₁ρσ₁σ₂) M(t₁, t₂)
therefore
E(X₁X₂) = ∂²/∂t₁∂t₂ M(t₁, t₂)|_{(0,0)} = ρσ₁σ₂ + μ₁μ₂
From (b) we know E(X₁) = μ₁ and E(X₂) = μ₂. Therefore
Cov(X₁, X₂) = E(X₁X₂) − E(X₁)E(X₂) = ρσ₁σ₂ + μ₁μ₂ − μ₁μ₂ = ρσ₁σ₂

13:(d) By Theorem 3.8.6, X₁ and X₂ are independent random variables if and only if M(t₁, t₂) = M_{X₁}(t₁) M_{X₂}(t₂), that is, if and only if
exp(μ₁t₁ + μ₂t₂ + (1/2)t₁²σ₁² + t₁t₂ρσ₁σ₂ + (1/2)t₂²σ₂²) = exp(μ₁t₁ + (1/2)t₁²σ₁²) exp(μ₂t₂ + (1/2)t₂²σ₂²)
for all (t₁, t₂) ∈ ℝ², or
t₁t₂ρσ₁σ₂ = 0   for all (t₁, t₂) ∈ ℝ²
which is true if and only if ρ = 0. Therefore X₁ and X₂ are independent random variables if and only if ρ = 0.

13:(e) Since
E[exp(Xt^T)] = exp(μt^T + (1/2)tΣt^T)   for t ∈ ℝ²
therefore
E{exp[(XA + b)t^T]} = E{exp[XAt^T + bt^T]} = exp(bt^T) E{exp[X(tA^T)^T]}
                   = exp(bt^T) exp[μ(tA^T)^T + (1/2)(tA^T)Σ(tA^T)^T]
                   = exp[(μA + b)t^T + (1/2)t(A^TΣA)t^T]   for t ∈ ℝ²
which is the moment generating function of a BVN(μA + b, A^TΣA) random variable; by the Uniqueness Theorem for Moment Generating Functions, XA + b ~ BVN(μA + b, A^TΣA).
13:(f) First note that
Σ^{-1} = [1/(σ₁²σ₂²(1 − ρ²))] [σ₂², −ρσ₁σ₂; −ρσ₁σ₂, σ₁²],   |Σ|^{1/2} = [σ₁²σ₂²(1 − ρ²)]^{1/2} = σ₁σ₂√(1 − ρ²)
and
(x − μ)Σ^{-1}(x − μ)^T = [1/(1 − ρ²)][(x₁ − μ₁)²/σ₁² − 2ρ(x₁ − μ₁)(x₂ − μ₂)/(σ₁σ₂) + (x₂ − μ₂)²/σ₂²]
                       = [1/(σ₂²(1 − ρ²))][x₂ − μ₂ − ρ(σ₂/σ₁)(x₁ − μ₁)]² + (x₁ − μ₁)²/σ₁²
The conditional probability density function of X₂ given X₁ = x₁ is
f(x₁, x₂)/f₁(x₁) = {[2πσ₁σ₂√(1 − ρ²)]^{-1} exp[−(1/2)(x − μ)Σ^{-1}(x − μ)^T]} / {[σ₁√(2π)]^{-1} exp[−(x₁ − μ₁)²/(2σ₁²)]}
                = [σ₂√(2π)√(1 − ρ²)]^{-1} exp{−[x₂ − μ₂ − ρ(σ₂/σ₁)(x₁ − μ₁)]²/[2σ₂²(1 − ρ²)]}   for x ∈ ℝ²
which is the probability density function of a N(μ₂ + ρ(σ₂/σ₁)(x₁ − μ₁), σ₂²(1 − ρ²)) random variable.
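A supplementary simulation check of this conditional mean and variance (our own sketch, assuming NumPy; the parameter values are arbitrary and the conditioning on X₁ = 2 is approximated by a thin slice).

```python
import numpy as np

rng = np.random.default_rng(8)
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 3.0, 0.6
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
x1, x2 = rng.multivariate_normal([mu1, mu2], cov, size=2_000_000).T

sel = np.abs(x1 - 2.0) < 0.01     # approximately condition on X1 = 2
print(x2[sel].mean(), mu2 + rho * (s2 / s1) * (2.0 - mu1))   # conditional mean, about -1.1
print(x2[sel].var(), s2**2 * (1 - rho**2))                    # conditional variance, about 5.76
```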
14:(a)
M(t₁, t₂) = E(e^{t₁X + t₂Y}) = ∫_{y=0}^{∞} ∫_{x=0}^{y} e^{t₁x + t₂y} 2e^{-x-y} dx dy = 2 ∫_0^{∞} e^{-(1−t₂)y} [∫_0^{y} e^{-(1−t₁)x} dx] dy
          = [2/(1 − t₁)] ∫_0^{∞} e^{-(1−t₂)y} [1 − e^{-(1−t₁)y}] dy   if t₁ < 1
          = [2/(1 − t₁)] ∫_0^{∞} [e^{-(1−t₂)y} − e^{-(2−t₁−t₂)y}] dy   which converges if t₁ < 1, t₂ < 1
          = [2/(1 − t₁)] [1/(1 − t₂) − 1/(2 − t₁ − t₂)]
          = [2/(1 − t₁)] (1 − t₁)/[(1 − t₂)(2 − t₁ − t₂)]
          = 2/[(1 − t₂)(2 − t₁ − t₂)]   for t₁ < 1, t₂ < 1
14:(b)
M_X(t) = M(t, 0) = 2/(2 − t) = 1/(1 − t/2)   for t < 1
which is the moment generating function of an Exponential(1/2) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions X ~ Exponential(1/2).
M_Y(t) = M(0, t) = 2/[(1 − t)(2 − t)] = 2/(2 − 3t + t²)   for t < 1
which is not a moment generating function we recognize. We find the probability density function of Y using
f₂(y) = ∫_0^{y} 2e^{-x-y} dx = 2e^{-y} (−e^{-x})|_0^{y} = 2e^{-y}(1 − e^{-y})   for y > 0
and 0 otherwise.

14:(c) E(X) = 1/2 since X ~ Exponential(1/2). Alternatively
E(X) = (d/dt) M_X(t)|_{t=0} = (d/dt)[2/(2 − t)]|_{t=0} = 2/(2 − t)²|_{t=0} = 2/4 = 1/2
Similarly
E(Y) = (d/dt) M_Y(t)|_{t=0} = (d/dt)[2/(2 − 3t + t²)]|_{t=0} = −2(−3 + 2t)/(2 − 3t + t²)²|_{t=0} = 6/4 = 3/2
Since
∂²/∂t₁∂t₂ M(t₁, t₂) = 2 ∂/∂t₂ [1/((1 − t₂)(2 − t₁ − t₂)²)] = 2[1/((1 − t₂)²(2 − t₁ − t₂)²) + 2/((1 − t₂)(2 − t₁ − t₂)³)]
we obtain
E(XY) = ∂²/∂t₁∂t₂ M(t₁, t₂)|_{(0,0)} = 2[1/4 + 2/8] = 1
Therefore
Cov(X, Y) = E(XY) − E(X)E(Y) = 1 − (1/2)(3/2) = 1/4
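A supplementary simulation of Cov(X, Y) = 1/4 (our own sketch, assuming NumPy). One convenient way to sample from f(x, y) = 2e^{-x-y} on 0 < x < y, noted here for the check only, is that the ordered pair (minimum, maximum) of two independent Exponential(1) variables has exactly this joint density.

```python
import numpy as np

rng = np.random.default_rng(9)
e = rng.exponential(1.0, size=(2_000_000, 2))
x, y = e.min(axis=1), e.max(axis=1)     # joint density 2 e^(-x-y) on 0 < x < y

print(x.mean(), y.mean())                     # about 1/2 and 3/2
print(np.mean(x * y) - x.mean() * y.mean())   # about 1/4
```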
10.3 Chapter 4

1: We are given that X and Y are independent random variables and we want to show that U = h(X) and V = g(Y) are independent random variables. We will assume that X and Y are continuous random variables. The proof for discrete random variables is obtained by replacing integrals by sums.
Suppose X has probability density function f₁(x) and support set A₁, and Y has probability density function f₂(y) and support set A₂. Then the joint probability density function of X and Y is
f(x, y) = f₁(x) f₂(y)   for (x, y) ∈ A₁ × A₂
Now for any (u, v) ∈ ℝ²
P(U ≤ u, V ≤ v) = P(h(X) ≤ u, g(Y) ≤ v) = ∫∫_B f₁(x) f₂(y) dx dy
where
B = {(x, y) : h(x) ≤ u, g(y) ≤ v}
Let B₁ = {x : h(x) ≤ u} and B₂ = {y : g(y) ≤ v}. Then B = B₁ × B₂. Since
P(U ≤ u, V ≤ v) = ∫∫_{B₁×B₂} f₁(x) f₂(y) dx dy = [∫_{B₁} f₁(x) dx][∫_{B₂} f₂(y) dy]
               = P(h(X) ≤ u) P(g(Y) ≤ v) = P(U ≤ u) P(V ≤ v)   for all (u, v) ∈ ℝ²
U and V are independent random variables.
2:(a) The transformation
S : U = X + Y,   V = X
has inverse transformation
X = V,   Y = U − V
The support set of (X, Y), pictured in Figure 10.15, is
R_XY = {(x, y) : 0 < x + y < 1, 0 < x < 1, 0 < y < 1}

[Figure 10.15: R_XY for Problem 2(a): the triangle below y = 1 − x.]

Under S
(k, 0) → (k, k),   (k, 1 − k) → (1, k),   (0, k) → (k, 0),   0 < k < 1
and thus S maps R_XY into
R_UV = {(u, v) : 0 < v < u < 1}
which is pictured in Figure 10.16.

[Figure 10.16: R_UV for Problem 2(a): the triangle 0 < v < u < 1.]

The Jacobian of the inverse transformation is
∂(x, y)/∂(u, v) = det[0, 1; 1, −1] = −1
The joint probability density function of U and V is given by
g(u, v) = f(v, u − v) |−1| = 24v(u − v)   for (u, v) ∈ R_UV
and 0 otherwise.

2:(b) The marginal probability density function of U is given by
g₁(u) = ∫ g(u, v) dv = ∫_0^{u} 24v(u − v) dv = [12uv² − 8v³]_0^{u} = 4u³   for 0 < u < 1
and 0 otherwise.
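A supplementary check of the marginal 4u³ by simulation (our own sketch, assuming NumPy); (X, Y) is drawn from f(x, y) = 24xy on the triangle by rejection sampling, which is only one convenient way to sample it.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 4_000_000
x, y = rng.uniform(size=n), rng.uniform(size=n)
keep = (x + y < 1) & (rng.uniform(size=n) < x * y / 0.25)   # xy <= 1/4 on the triangle
u = (x + y)[keep]

# compare the simulated c.d.f. of U = X + Y with u^4, the c.d.f. of the density 4u^3
for q in (0.25, 0.5, 0.75, 0.9):
    print(q, round(np.mean(u <= q), 4), q**4)
```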
6:(a)

[Figure 10.17: Region of integration A_t for 0 ≤ t ≤ 1, below the line y = t − x.]

For 0 ≤ t ≤ 1,
G(t) = P(T ≤ t) = P(X + Y ≤ t) = ∫∫_{A_t} 4xy dy dx = ∫_{x=0}^{t} ∫_{y=0}^{t−x} 4xy dy dx = ∫_0^{t} 2x(t − x)² dx
     = ∫_0^{t} 2x(t² − 2tx + x²) dx = [x²t² − (4/3)tx³ + x⁴/2]_0^{t} = t⁴ − (4/3)t⁴ + (1/2)t⁴ = (1/6)t⁴

[Figure 10.18: Region of integration B_t for 1 ≤ t ≤ 2, above the line y = t − x.]

For 1 ≤ t ≤ 2 we use
G(t) = P(T ≤ t) = P(X + Y ≤ t) = 1 − P(X + Y > t)
where
P(X + Y > t) = ∫∫_{B_t} 4xy dy dx = ∫_{x=t−1}^{1} ∫_{y=t−x}^{1} 4xy dy dx = ∫_{t−1}^{1} 2x[1 − (t − x)²] dx
             = ∫_{t−1}^{1} 2x(1 − t² + 2tx − x²) dx
so
G(t) = t² − (4/3)t + 1/2 − (1/6)(t − 1)³(t + 3)   for 1 ≤ t ≤ 2
The probability density function of T = X + Y is
g(t) = dG(t)/dt = (2/3)t³   if 0 ≤ t ≤ 1
g(t) = (2/3)(−t³ + 6t − 4)   if 1 < t ≤ 2
and 0 otherwise.
6: (c) The transformation
S : U = X 2;
has inverse transformation
X=
p
V = XY
p
Y = V= U
U;
The support set of (X; Y ), pictured in Figure 10.19, is
RXY = f(x; y) : 0 < x < 1; 0 < y < 1g
Under S
y
1
Rxy
x
0
1
Figure 10.19: Support set of (X; Y ) for Problem 6 (c)
(k; 0) !
k2 ; 0
(0; k) ! (0; 0)
(1; k) ! (1; k)
(k; 1) !
0<k<1
0<k<1
0<k<1
2
k ;k
0<k<1
and thus S maps RXY into the region
RU V
=
=
which is pictured in Figure 10.20.
(u; v) : 0 < v <
2
p
u; 0 < u < 1
(u; v) : v < u < 1; 0 < v < 1
418
10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
v
1
2
u=v or v=sq rt(u)
R
uv
0
1
u
Figure 10.20: Support set of (U; V ) for Problem 6 (c)
The Jacobian of the inverse transformation is
1
p
2 u
@y
@u
@ (x; y)
=
@ (u; v)
0
p1
u
=
1
2u
The joint probability density function of U and V is given by
g (u; v) = f
p
p
u; v= u
p
p
= 4 u v= u
=
2v
u
1
2u
1
2u
for (u; v) 2 RU V
and 0 otherwise.
6: (d) The marginal probability density function of $U$ is
\[
g_1(u) = \int_{-\infty}^{\infty} g(u,v)\,dv
= \int_0^{\sqrt{u}} \frac{2v}{u}\,dv
= \frac{1}{u}\left[ v^2 \right]_0^{\sqrt{u}}
= 1 \qquad \text{for } 0 < u < 1
\]
and 0 otherwise. Note that $U \sim$ Uniform$(0,1)$.
The marginal probability density function of $V$ is
\[
g_2(v) = \int_{-\infty}^{\infty} g(u,v)\,du
= \int_{v^2}^{1} \frac{2v}{u}\,du
= 2v \left[ \log u \right]_{v^2}^{1}
= -2v \log v^2
= -4v \log(v) \qquad \text{for } 0 < v < 1
\]
and 0 otherwise.

6: (e) The support set of $X$ is $A_1 = \{x : 0 < x < 1\}$ and the support set of $Y$ is $A_2 = \{y : 0 < y < 1\}$. Since
\[
f(x,y) = 4xy = 2x\,(2y) \qquad \text{for all } (x,y) \in R_{XY} = A_1 \times A_2
\]
therefore by the Factorization Theorem for Independence $X$ and $Y$ are independent random variables. Also
\[
f_1(x) = 2x \quad \text{for } x \in A_1 \qquad \text{and} \qquad f_2(y) = 2y \quad \text{for } y \in A_2
\]
so $X$ and $Y$ have the same distribution. Therefore
\[
E\!\left[ V^3 \right] = E\!\left[ (XY)^3 \right]
= E\!\left( X^3 \right) E\!\left( Y^3 \right)
= \left[ \int_0^1 x^3 (2x)\,dx \right]^2
= \left[ \frac{2}{5}x^5 \Big|_0^1 \right]^2
= \left( \frac{2}{5} \right)^2
= \frac{4}{25}
\]
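The value $4/25$ is easy to confirm by simulation (a sketch, not part of the original text); since $F(x) = x^2$ on $(0,1)$, $X$ can be generated by inversion as $X = \sqrt{U}$ with $U \sim$ Uniform$(0,1)$.

    set.seed(330)
    n <- 1e6
    x <- sqrt(runif(n)); y <- sqrt(runif(n))
    mean((x * y)^3)   # should be close to 4/25 = 0.16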
7: (a) The transformation
\[
S: \quad U = X/Y, \qquad V = XY
\]
has inverse transformation
\[
X = \sqrt{UV}, \qquad Y = \sqrt{V/U}
\]
The support set of $(X,Y)$, pictured in Figure 10.21, is
\[
R_{XY} = \{(x,y) : 0 < x < 1,\ 0 < y < 1\}
\]
[Figure 10.21: Support set of $(X,Y)$ for Problem 7 (a), the unit square]
Under $S$
\[
(k,0) \to (\infty, 0), \qquad (0,k) \to (0,0), \qquad (1,k) \to \left( \tfrac{1}{k},\, k \right), \qquad (k,1) \to (k,k), \qquad 0 < k < 1
\]
and thus $S$ maps $R_{XY}$ into
\[
R_{UV} = \left\{ (u,v) : v < u < \frac{1}{v},\ 0 < v < 1 \right\}
\]
which is pictured in Figure 10.22.
[Figure 10.22: Support set of $(U,V)$ for Problem 7 (a), the region between the curves $v = u$ and $uv = 1$]
The Jacobian of the inverse transformation is
\[
\frac{\partial(x,y)}{\partial(u,v)} =
\begin{vmatrix}
\dfrac{\sqrt{v}}{2\sqrt{u}} & \dfrac{\sqrt{u}}{2\sqrt{v}} \\[1.5ex]
-\dfrac{\sqrt{v}}{2u^{3/2}} & \dfrac{1}{2\sqrt{uv}}
\end{vmatrix}
= \frac{1}{4u} + \frac{1}{4u} = \frac{1}{2u}
\]
The joint probability density function of $U$ and $V$ is given by
\[
g(u,v) = f\!\left( \sqrt{uv},\, \sqrt{v/u} \right) \frac{1}{2u}
= 4\sqrt{uv}\,\sqrt{v/u}\,\frac{1}{2u}
= \frac{2v}{u} \qquad \text{for } (u,v) \in R_{UV}
\]
and 0 otherwise.
7: (b) The support set of $(U,V)$ is $R_{UV} = \left\{ (u,v) : v < u < \frac{1}{v},\ 0 < v < 1 \right\}$ which is not rectangular. The support set of $U$ is $A_1 = \{u : u > 0\}$ and the support set of $V$ is $A_2 = \{v : 0 < v < 1\}$.
Consider the point $\left( \frac{1}{2}, \frac{3}{4} \right) \notin R_{UV}$ so $g\!\left( \frac{1}{2}, \frac{3}{4} \right) = 0$. Since $\frac{1}{2} \in A_1$ then $g_1\!\left( \frac{1}{2} \right) > 0$. Since $\frac{3}{4} \in A_2$ then $g_2\!\left( \frac{3}{4} \right) > 0$. Therefore $g_1\!\left( \frac{1}{2} \right) g_2\!\left( \frac{3}{4} \right) > 0$ so
\[
g\!\left( \frac{1}{2}, \frac{3}{4} \right) = 0 \ne g_1\!\left( \frac{1}{2} \right) g_2\!\left( \frac{3}{4} \right)
\]
and $U$ and $V$ are not independent random variables.

7: (c) The marginal probability density function of $V$ is given by
\[
g_2(v) = \int_{-\infty}^{\infty} g(u,v)\,du
= 2v \int_{v}^{1/v} \frac{1}{u}\,du
= 2v \left[ \ln u \right]_{v}^{1/v}
= -4v \ln v \qquad \text{for } 0 < v < 1
\]
and 0 otherwise.
The marginal probability density function of $U$ is given by
\[
g_1(u) = \int_{-\infty}^{\infty} g(u,v)\,dv
= \begin{cases}
\displaystyle \frac{1}{u}\int_0^{u} 2v\,dv & \text{if } 0 < u < 1 \\[2ex]
\displaystyle \frac{1}{u}\int_0^{1/u} 2v\,dv & \text{if } u \ge 1
\end{cases}
= \begin{cases}
u & \text{if } 0 < u < 1 \\[1ex]
\dfrac{1}{u^3} & \text{if } u \ge 1
\end{cases}
\]
and 0 otherwise.
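As a numerical sanity check (a sketch added here, not from the original text), both marginal densities integrate to 1; in R:

    integrate(function(u) u, 0, 1)$value + integrate(function(u) 1 / u^3, 1, Inf)$value  # 1
    integrate(function(v) -4 * v * log(v), 0, 1)$value                                   # 1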
8: Since $X \sim$ Uniform$(0, \theta)$ and $Y \sim$ Uniform$(0, \theta)$ independently, the joint probability density function of $X$ and $Y$ is
\[
f(x,y) = \frac{1}{\theta^2} \qquad \text{for } (x,y) \in R_{XY}
\]
where
\[
R_{XY} = \{(x,y) : 0 < x < \theta,\ 0 < y < \theta\}
\]
which is pictured in Figure 10.23.
[Figure 10.23: Support set of $(X,Y)$ for Problem 8, the square $(0,\theta) \times (0,\theta)$]
The transformation
\[
S: \quad U = X - Y, \qquad V = X + Y
\]
has inverse transformation
\[
X = \frac{1}{2}(U + V), \qquad Y = \frac{1}{2}(V - U)
\]
Under $S$
\[
(k,0) \to (k,k), \qquad (0,k) \to (-k,k), \qquad (\theta, k) \to (\theta - k,\ \theta + k), \qquad (k,\theta) \to (k - \theta,\ k + \theta), \qquad 0 < k < \theta
\]
so $S$ maps $R_{XY}$ into
\[
R_{UV} = \{(u,v) : -u < v < 2\theta + u \text{ for } -\theta < u \le 0,\ \text{ or } \ u < v < 2\theta - u \text{ for } 0 < u < \theta\}
\]
which is pictured in Figure 10.24.
[Figure 10.24: Support set of $(U,V)$ for Problem 8, the square bounded by the lines $v = -u$, $v = u$, $v = 2\theta + u$ and $v = 2\theta - u$]
The Jacobian of the inverse transformation is
\[
\frac{\partial(x,y)}{\partial(u,v)} =
\begin{vmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\[0.5ex] -\tfrac{1}{2} & \tfrac{1}{2} \end{vmatrix}
= \frac{1}{4} + \frac{1}{4} = \frac{1}{2}
\]
The joint probability density function of $U$ and $V$ is given by
\[
g(u,v) = f\!\left( \frac{1}{2}(u+v),\, \frac{1}{2}(v-u) \right) \frac{1}{2}
= \frac{1}{2\theta^2} \qquad \text{for } (u,v) \in R_{UV}
\]
and 0 otherwise.
The marginal probability density function of $U$ is
\[
g_1(u) = \int_{-\infty}^{\infty} g(u,v)\,dv
= \begin{cases}
\displaystyle \int_{-u}^{2\theta + u} \frac{1}{2\theta^2}\,dv = \frac{2\theta + 2u}{2\theta^2} & \text{for } -\theta < u \le 0 \\[2ex]
\displaystyle \int_{u}^{2\theta - u} \frac{1}{2\theta^2}\,dv = \frac{2\theta - 2u}{2\theta^2} & \text{for } 0 < u < \theta
\end{cases}
= \frac{\theta - |u|}{\theta^2} \qquad \text{for } -\theta < u < \theta
\]
and 0 otherwise.
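The triangular shape of $g_1(u)$ is easy to confirm by simulation (a sketch, not part of the original solution; $\theta = 2$ is an arbitrary illustrative value).

    set.seed(330)
    theta <- 2; n <- 1e6
    u <- runif(n, 0, theta) - runif(n, 0, theta)
    grid <- seq(-1.5, 1.5, by = 0.5)
    theo <- sapply(grid, function(a)
      integrate(function(t) (theta - abs(t)) / theta^2, -theta, a)$value)
    round(rbind(empirical = ecdf(u)(grid), theoretical = theo), 3)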
9: (a) The joint probability density function of $(Z_1, Z_2)$ is
\[
f(z_1, z_2) = \frac{1}{\sqrt{2\pi}}\, e^{-z_1^2/2} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-z_2^2/2}
= \frac{1}{2\pi}\, e^{-(z_1^2 + z_2^2)/2} \qquad \text{for } (z_1, z_2) \in \Re^2
\]
The support set of $(Z_1, Z_2)$ is $\Re^2$. The transformation
\[
X_1 = \mu_1 + \sigma_1 Z_1, \qquad
X_2 = \mu_2 + \sigma_2 \left[ \rho Z_1 + \left( 1 - \rho^2 \right)^{1/2} Z_2 \right]
\]
has inverse transformation
\[
Z_1 = \frac{X_1 - \mu_1}{\sigma_1}, \qquad
Z_2 = \frac{1}{\left( 1 - \rho^2 \right)^{1/2}} \left[ \frac{X_2 - \mu_2}{\sigma_2} - \rho\,\frac{X_1 - \mu_1}{\sigma_1} \right]
\]
The Jacobian of the inverse transformation is
\[
\frac{\partial(z_1, z_2)}{\partial(x_1, x_2)} =
\begin{vmatrix}
\dfrac{1}{\sigma_1} & 0 \\[1.5ex]
\dfrac{\partial z_2}{\partial x_1} & \dfrac{1}{\sigma_2 \left( 1 - \rho^2 \right)^{1/2}}
\end{vmatrix}
= \frac{1}{\sigma_1 \sigma_2 \left( 1 - \rho^2 \right)^{1/2}}
= \frac{1}{|\Sigma|^{1/2}}
\]
Note that
\[
z_1^2 + z_2^2
= \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2
+ \frac{1}{1 - \rho^2} \left[ \frac{x_2 - \mu_2}{\sigma_2} - \rho\,\frac{x_1 - \mu_1}{\sigma_1} \right]^2
= \frac{1}{1 - \rho^2} \left[ \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2
- 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right)\!\left( \frac{x_2 - \mu_2}{\sigma_2} \right)
+ \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right]
= (\mathbf{x} - \boldsymbol{\mu})\, \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})^T
\]
where $\mathbf{x} = \begin{bmatrix} x_1 & x_2 \end{bmatrix}$ and $\boldsymbol{\mu} = \begin{bmatrix} \mu_1 & \mu_2 \end{bmatrix}$.
Therefore the joint probability density function of $\mathbf{X} = \begin{bmatrix} X_1 & X_2 \end{bmatrix}$ is
\[
g(\mathbf{x}) = \frac{1}{2\pi\,|\Sigma|^{1/2}}
\exp\!\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})\, \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})^T \right]
\qquad \text{for } \mathbf{x} \in \Re^2
\]
and thus $\mathbf{X} \sim$ BVN$(\boldsymbol{\mu}, \Sigma)$.

9: (b) Since $Z_1 \sim$ N$(0,1)$ and $Z_2 \sim$ N$(0,1)$ independently we know $Z_1^2 \sim \chi^2(1)$, $Z_2^2 \sim \chi^2(1)$ and $Z_1^2 + Z_2^2 \sim \chi^2(2)$. From (a), $Z_1^2 + Z_2^2 = (\mathbf{X} - \boldsymbol{\mu})\, \Sigma^{-1} (\mathbf{X} - \boldsymbol{\mu})^T$. Therefore
\[
(\mathbf{X} - \boldsymbol{\mu})\, \Sigma^{-1} (\mathbf{X} - \boldsymbol{\mu})^T \sim \chi^2(2)
\]
10: (a) The joint moment generating function of $U$ and $V$ is
\begin{align*}
M(s,t) &= E\!\left[ e^{sU + tV} \right]
= E\!\left[ e^{s(X+Y) + t(X-Y)} \right]
= E\!\left[ e^{(s+t)X} e^{(s-t)Y} \right] \\
&= E\!\left[ e^{(s+t)X} \right] E\!\left[ e^{(s-t)Y} \right]
\qquad \text{since $X$ and $Y$ are independent random variables} \\
&= M_X(s+t)\,M_Y(s-t)
= e^{\mu(s+t) + \sigma^2 (s+t)^2/2}\; e^{\mu(s-t) + \sigma^2 (s-t)^2/2}
\qquad \text{since } X \sim \text{N}(\mu, \sigma^2) \text{ and } Y \sim \text{N}(\mu, \sigma^2) \\
&= e^{2\mu s + \sigma^2 \left( 2s^2 + 2t^2 \right)/2}
= e^{(2\mu)s + \left( 2\sigma^2 \right)s^2/2}\; e^{\left( 2\sigma^2 \right)t^2/2}
\qquad \text{for } s \in \Re,\ t \in \Re
\end{align*}

10: (b) The moment generating function of $U$ is
\[
M_U(s) = M(s,0) = e^{(2\mu)s + \left( 2\sigma^2 \right)s^2/2} \qquad \text{for } s \in \Re
\]
which is the moment generating function of a N$\left( 2\mu,\ 2\sigma^2 \right)$ random variable. The moment generating function of $V$ is
\[
M_V(t) = M(0,t) = e^{\left( 2\sigma^2 \right)t^2/2} \qquad \text{for } t \in \Re
\]
which is the moment generating function of a N$\left( 0,\ 2\sigma^2 \right)$ random variable.
Since
\[
M(s,t) = M_U(s)\,M_V(t) \qquad \text{for all } s \in \Re,\ t \in \Re
\]
therefore by Theorem 3.8.6, $U$ and $V$ are independent random variables. Also by the Uniqueness Theorem for Moment Generating Functions $U \sim$ N$\left( 2\mu,\ 2\sigma^2 \right)$ and $V \sim$ N$\left( 0,\ 2\sigma^2 \right)$.
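A brief simulation sketch (not from the original text; $\mu = 1$, $\sigma = 2$ are arbitrary illustrative values) confirms the means, variances and the lack of correlation between $U$ and $V$.

    set.seed(330)
    mu <- 1; sigma <- 2; n <- 1e6
    x <- rnorm(n, mu, sigma); y <- rnorm(n, mu, sigma)
    u <- x + y; v <- x - y
    c(mean(u), var(u), mean(v), var(v), cor(u, v))
    # approximately 2*mu, 2*sigma^2, 0, 2*sigma^2, 0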
12: The transformation defined by
\[
Y_1 = \frac{X_1}{X_1 + X_2}, \qquad
Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}, \qquad
Y_3 = X_1 + X_2 + X_3
\]
has inverse transformation
\[
X_1 = Y_1 Y_2 Y_3, \qquad X_2 = Y_2 Y_3 (1 - Y_1), \qquad X_3 = Y_3 (1 - Y_2)
\]
Let
\[
R_X = \{(x_1, x_2, x_3) : 0 < x_1 < \infty,\ 0 < x_2 < \infty,\ 0 < x_3 < \infty\}
\qquad \text{and} \qquad
R_Y = \{(y_1, y_2, y_3) : 0 < y_1 < 1,\ 0 < y_2 < 1,\ 0 < y_3 < \infty\}
\]
The transformation from $(X_1, X_2, X_3) \to (Y_1, Y_2, Y_3)$ maps $R_X$ into $R_Y$. The Jacobian of the inverse transformation is
\[
\frac{\partial(x_1, x_2, x_3)}{\partial(y_1, y_2, y_3)} =
\begin{vmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_1}{\partial y_3} \\[1.5ex]
\dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \dfrac{\partial x_2}{\partial y_3} \\[1.5ex]
\dfrac{\partial x_3}{\partial y_1} & \dfrac{\partial x_3}{\partial y_2} & \dfrac{\partial x_3}{\partial y_3}
\end{vmatrix}
=
\begin{vmatrix}
y_2 y_3 & y_1 y_3 & y_1 y_2 \\
-y_2 y_3 & y_3(1 - y_1) & y_2(1 - y_1) \\
0 & -y_3 & 1 - y_2
\end{vmatrix}
= y_2 y_3^2
\]
Since $X_1, X_2, X_3$ are independent Exponential$(1)$ random variables the joint probability density function of $(X_1, X_2, X_3)$ is
\[
f(x_1, x_2, x_3) = e^{-x_1 - x_2 - x_3} \qquad \text{for } (x_1, x_2, x_3) \in R_X
\]
The joint probability density function of $(Y_1, Y_2, Y_3)$ is
\[
g(y_1, y_2, y_3) = e^{-y_3}\, y_2 y_3^2 \qquad \text{for } (y_1, y_2, y_3) \in R_Y
\]
and 0 otherwise.
Let
\[
A_1 = \{y_1 : 0 < y_1 < 1\}, \qquad A_2 = \{y_2 : 0 < y_2 < 1\}, \qquad A_3 = \{y_3 : y_3 > 0\}
\]
and
\[
g_1(y_1) = 1 \ \text{ for } y_1 \in A_1, \qquad
g_2(y_2) = 2y_2 \ \text{ for } y_2 \in A_2, \qquad
g_3(y_3) = \frac{1}{2}\, y_3^2 \exp(-y_3) \ \text{ for } y_3 \in A_3
\]
Since $g(y_1, y_2, y_3) = g_1(y_1)\,g_2(y_2)\,g_3(y_3)$ for all $(y_1, y_2, y_3) \in A_1 \times A_2 \times A_3$, therefore by the Factorization Theorem for Independence, $(Y_1, Y_2, Y_3)$ are independent random variables. Since
\[
\int_{A_i} g_i(y_i)\,dy_i = 1 \qquad \text{for } i = 1, 2, 3
\]
therefore the marginal probability density function of $Y_i$ is equal to $g_i(y_i)$, $i = 1, 2, 3$.
Note that $Y_1 \sim$ Uniform$(0,1)$ = Beta$(1,1)$, $Y_2 \sim$ Beta$(2,1)$, and $Y_3 \sim$ Gamma$(3,1)$ independently.
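The stated marginals and the independence can be checked by simulation (a sketch, not part of the original solution); the means below should be near $1/2$, $2/3$ and $3$, and the sample correlations near 0.

    set.seed(330)
    n <- 1e6
    x1 <- rexp(n); x2 <- rexp(n); x3 <- rexp(n)
    y1 <- x1 / (x1 + x2)
    y2 <- (x1 + x2) / (x1 + x2 + x3)
    y3 <- x1 + x2 + x3
    c(mean(y1), mean(y2), mean(y3))
    round(cor(cbind(y1, y2, y3)), 3)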
13: Let
\[
R_Y = \{(y_1, y_2, y_3) : 0 < y_1 < \infty,\ 0 < y_2 < 2\pi,\ 0 < y_3 < \pi\}
\]
Consider the transformation from $(X_1, X_2, X_3) \to (Y_1, Y_2, Y_3)$ defined by
\[
X_1 = Y_1 \cos Y_2 \sin Y_3, \qquad X_2 = Y_1 \sin Y_2 \sin Y_3, \qquad X_3 = Y_1 \cos Y_3
\]
The Jacobian of the transformation is
\[
\frac{\partial(x_1, x_2, x_3)}{\partial(y_1, y_2, y_3)} =
\begin{vmatrix}
\cos y_2 \sin y_3 & -y_1 \sin y_2 \sin y_3 & y_1 \cos y_2 \cos y_3 \\
\sin y_2 \sin y_3 & y_1 \cos y_2 \sin y_3 & y_1 \sin y_2 \cos y_3 \\
\cos y_3 & 0 & -y_1 \sin y_3
\end{vmatrix}
\]
Expanding along the third row gives
\[
\frac{\partial(x_1, x_2, x_3)}{\partial(y_1, y_2, y_3)}
= -y_1^2 \sin y_3 \left[ \cos y_3 \left( \sin^2 y_2 \cos y_3 + \cos^2 y_2 \cos y_3 \right)
+ \sin y_3 \left( \cos^2 y_2 \sin y_3 + \sin^2 y_2 \sin y_3 \right) \right]
= -y_1^2 \sin y_3 \left( \cos^2 y_3 + \sin^2 y_3 \right)
= -y_1^2 \sin y_3
\]
Since the entries in $\frac{\partial(x_1, x_2, x_3)}{\partial(y_1, y_2, y_3)}$ are all continuous functions for $(y_1, y_2, y_3) \in R_Y$ and $\frac{\partial(x_1, x_2, x_3)}{\partial(y_1, y_2, y_3)} \ne 0$ for $(y_1, y_2, y_3) \in R_Y$, therefore by the Inverse Mapping Theorem the transformation has an inverse in the neighbourhood of each point in $R_Y$.
Since $X_1, X_2, X_3$ are independent N$(0,1)$ random variables the joint probability density function of $(X_1, X_2, X_3)$ is
\[
f(x_1, x_2, x_3) = (2\pi)^{-3/2} \exp\!\left[ -\frac{1}{2}\left( x_1^2 + x_2^2 + x_3^2 \right) \right]
\qquad \text{for } (x_1, x_2, x_3) \in \Re^3
\]
Now
\[
x_1^2 + x_2^2 + x_3^2
= (y_1 \cos y_2 \sin y_3)^2 + (y_1 \sin y_2 \sin y_3)^2 + (y_1 \cos y_3)^2
= y_1^2 \left[ \sin^2 y_3 \left( \cos^2 y_2 + \sin^2 y_2 \right) + \cos^2 y_3 \right]
= y_1^2
\]
The joint probability density function of $(Y_1, Y_2, Y_3)$ is
\[
g(y_1, y_2, y_3) = (2\pi)^{-3/2} \exp\!\left( -\frac{1}{2}\, y_1^2 \right) y_1^2 \sin y_3
\qquad \text{for } (y_1, y_2, y_3) \in R_Y
\]
and 0 otherwise.
Let
\[
A_1 = \{y_1 : y_1 > 0\}, \qquad A_2 = \{y_2 : 0 < y_2 < 2\pi\}, \qquad A_3 = \{y_3 : 0 < y_3 < \pi\}
\]
and
\[
g_1(y_1) = \sqrt{\frac{2}{\pi}}\, y_1^2 \exp\!\left( -\frac{1}{2}\, y_1^2 \right) \ \text{ for } y_1 \in A_1, \qquad
g_2(y_2) = \frac{1}{2\pi} \ \text{ for } y_2 \in A_2, \qquad
g_3(y_3) = \frac{1}{2} \sin y_3 \ \text{ for } y_3 \in A_3
\]
Since $g(y_1, y_2, y_3) = g_1(y_1)\,g_2(y_2)\,g_3(y_3)$ for all $(y_1, y_2, y_3) \in A_1 \times A_2 \times A_3$, therefore by the Factorization Theorem for Independence, $(Y_1, Y_2, Y_3)$ are independent random variables. Since
\[
\int_{A_i} g_i(y_i)\,dy_i = 1 \qquad \text{for } i = 1, 2, 3
\]
therefore the marginal probability density function of $Y_i$ is equal to $g_i(y_i)$, $i = 1, 2, 3$.
15: Since $X \sim \chi^2(n)$, the moment generating function of $X$ is
\[
M_X(t) = \frac{1}{(1 - 2t)^{n/2}} \qquad \text{for } t < \frac{1}{2}
\]
Since $U = X + Y \sim \chi^2(m)$, the moment generating function of $U$ is
\[
M_U(t) = \frac{1}{(1 - 2t)^{m/2}} \qquad \text{for } t < \frac{1}{2}
\]
The moment generating function of $U$ can also be obtained as
\[
M_U(t) = E\!\left( e^{tU} \right) = E\!\left( e^{t(X+Y)} \right)
= E\!\left( e^{tX} \right) E\!\left( e^{tY} \right)
= M_X(t)\,M_Y(t)
\qquad \text{since $X$ and $Y$ are independent random variables}
\]
By rearranging $M_U(t) = M_X(t)\,M_Y(t)$ we obtain
\[
M_Y(t) = \frac{M_U(t)}{M_X(t)}
= \frac{(1 - 2t)^{-m/2}}{(1 - 2t)^{-n/2}}
= \frac{1}{(1 - 2t)^{(m-n)/2}} \qquad \text{for } t < \frac{1}{2}
\]
which is the moment generating function of a $\chi^2(m - n)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions $Y \sim \chi^2(m - n)$.
16: (a) Note that
\[
\sum_{i=1}^{n} \left( X_i - \bar{X} \right) = \sum_{i=1}^{n} X_i - n\bar{X} = 0
\qquad \text{and} \qquad
\sum_{i=1}^{n} (s_i - \bar{s}) = 0
\]
Therefore, with $t_i = s_i - \bar{s} + \dfrac{s}{n}$,
\begin{align*}
\sum_{i=1}^{n} t_i X_i
&= \sum_{i=1}^{n} \left( s_i - \bar{s} + \frac{s}{n} \right)\left[ \left( X_i - \bar{X} \right) + \bar{X} \right] \\
&= \sum_{i=1}^{n} s_i \left( X_i - \bar{X} \right)
- \left( \bar{s} - \frac{s}{n} \right) \sum_{i=1}^{n} \left( X_i - \bar{X} \right)
+ \bar{X} \sum_{i=1}^{n} \left( s_i - \bar{s} + \frac{s}{n} \right)
= \sum_{i=1}^{n} s_i U_i + s\bar{X}
\tag{10.17}
\end{align*}
Also since $X_1, X_2, \ldots, X_n$ are independent N$(\mu, \sigma^2)$ random variables
\[
E\!\left[ \exp\!\left( \sum_{i=1}^{n} t_i X_i \right) \right]
= \prod_{i=1}^{n} E\!\left[ \exp(t_i X_i) \right]
= \exp\!\left( \mu \sum_{i=1}^{n} t_i + \frac{1}{2}\sigma^2 \sum_{i=1}^{n} t_i^2 \right)
\tag{10.18}
\]
Therefore by (10.17) and (10.18)
\[
E\!\left[ \exp\!\left( \sum_{i=1}^{n} s_i U_i + s\bar{X} \right) \right]
= E\!\left[ \exp\!\left( \sum_{i=1}^{n} t_i X_i \right) \right]
= \exp\!\left( \mu \sum_{i=1}^{n} t_i + \frac{1}{2}\sigma^2 \sum_{i=1}^{n} t_i^2 \right)
\tag{10.19}
\]

16: (b)
\[
\sum_{i=1}^{n} t_i = \sum_{i=1}^{n} \left( s_i - \bar{s} + \frac{s}{n} \right)
= \sum_{i=1}^{n} (s_i - \bar{s}) + s = 0 + s = s
\tag{10.20}
\]
and
\[
\sum_{i=1}^{n} t_i^2 = \sum_{i=1}^{n} \left( s_i - \bar{s} + \frac{s}{n} \right)^2
= \sum_{i=1}^{n} (s_i - \bar{s})^2 + \frac{2s}{n} \sum_{i=1}^{n} (s_i - \bar{s}) + n\,\frac{s^2}{n^2}
= \sum_{i=1}^{n} (s_i - \bar{s})^2 + \frac{s^2}{n}
\tag{10.21}
\]

16: (c)
\begin{align*}
M(s_1, \ldots, s_n, s)
&= E\!\left[ \exp\!\left( \sum_{i=1}^{n} s_i U_i + s\bar{X} \right) \right]
= \exp\!\left( \mu \sum_{i=1}^{n} t_i + \frac{\sigma^2}{2} \sum_{i=1}^{n} t_i^2 \right)
\qquad \text{by (10.19)} \\
&= \exp\!\left\{ \mu s + \frac{\sigma^2}{2} \left[ \sum_{i=1}^{n} (s_i - \bar{s})^2 + \frac{s^2}{n} \right] \right\}
\qquad \text{by (10.20) and (10.21)} \\
&= \exp\!\left( \mu s + \frac{\sigma^2 s^2}{2n} \right)
\exp\!\left[ \frac{\sigma^2}{2} \sum_{i=1}^{n} (s_i - \bar{s})^2 \right]
\end{align*}

16: (d) Since
\[
M_{\bar{X}}(s) = M(0, \ldots, 0, s) = \exp\!\left( \mu s + \frac{\sigma^2 s^2}{2n} \right)
\qquad \text{and} \qquad
M_{\mathbf{U}}(s_1, \ldots, s_n) = M(s_1, \ldots, s_n, 0)
= \exp\!\left[ \frac{\sigma^2}{2} \sum_{i=1}^{n} (s_i - \bar{s})^2 \right]
\]
we have
\[
M(s_1, \ldots, s_n, s) = M_{\bar{X}}(s)\, M_{\mathbf{U}}(s_1, \ldots, s_n)
\]
By the Independence Theorem for Moment Generating Functions, $\bar{X}$ and $\mathbf{U} = (U_1, U_2, \ldots, U_n)$ are independent random variables. Therefore by Chapter 4, Problem 1, $\bar{X}$ and $\sum_{i=1}^{n} U_i^2 = \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2$ are independent random variables.
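The independence of $\bar{X}$ and $\sum \left( X_i - \bar{X} \right)^2$ is easy to see empirically (a simulation sketch added here, not from the original text; the sample size 5, mean 3 and standard deviation 2 are arbitrary illustrative values).

    set.seed(330)
    nsim <- 10000
    m <- numeric(nsim); s2 <- numeric(nsim)
    for (j in 1:nsim) {
      x <- rnorm(5, mean = 3, sd = 2)
      m[j] <- mean(x); s2[j] <- var(x)
    }
    cor(m, s2)   # close to 0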
10.4 Chapter 5
1: (a) Since $Y_i \sim$ Exponential$(\theta, 1)$, $i = 1, 2, \ldots$ independently, then
\[
P(Y_i > x) = \int_x^{\infty} e^{-(y - \theta)}\,dy = e^{-(x - \theta)}
\qquad \text{for } x > \theta,\ i = 1, 2, \ldots
\tag{10.22}
\]
and for $x > \theta$
\begin{align*}
F_n(x) = P(X_n \le x)
&= P(\min(Y_1, Y_2, \ldots, Y_n) \le x)
= 1 - P(Y_1 > x,\ Y_2 > x,\ \ldots,\ Y_n > x) \\
&= 1 - \prod_{i=1}^{n} P(Y_i > x)
\qquad \text{since $Y_1, Y_2, \ldots, Y_n$ are independent random variables} \\
&= 1 - e^{-n(x - \theta)}
\qquad \text{using (10.22)}
\tag{10.23}
\end{align*}
Since
\[
\lim_{n \to \infty} F_n(x) =
\begin{cases} 1 & \text{if } x > \theta \\ 0 & \text{if } x < \theta \end{cases}
\]
therefore
\[
X_n \to_p \theta
\tag{10.24}
\]
(b) By (10.24) and the Limit Theorems
\[
U_n = \frac{X_n}{\theta} \to_p 1
\]
(c) Now
\[
P(V_n \le v) = P[n(X_n - \theta) \le v] = P\!\left( X_n \le \frac{v}{n} + \theta \right)
= 1 - e^{-n(v/n + \theta - \theta)}
= 1 - e^{-v} \qquad \text{for } v \ge 0, \quad \text{using (10.23)}
\]
which is the cumulative distribution function of an Exponential$(1)$ random variable. Therefore $V_n \sim$ Exponential$(1)$ for $n = 1, 2, \ldots$ which implies
\[
V_n \to_D V \sim \text{Exponential}(1)
\]
(d) Since
\[
P(W_n \le w) = P\!\left[ n^2 (X_n - \theta) \le w \right] = P\!\left( X_n \le \frac{w}{n^2} + \theta \right)
= 1 - e^{-n\left( w/n^2 \right)} = 1 - e^{-w/n} \qquad \text{for } w \ge 0
\]
therefore
\[
\lim_{n \to \infty} P(W_n \le w) = 0 \qquad \text{for all } w \in \Re
\]
which is not a cumulative distribution function. Therefore $W_n$ has no limiting distribution.
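Part (c) can be verified directly by simulation (a sketch, not part of the original solution; $\theta = 1$ and $n = 50$ are arbitrary illustrative values): $V_n = n(X_n - \theta)$ should have exactly the Exponential$(1)$ cumulative distribution function.

    set.seed(330)
    theta <- 1; n <- 50; nsim <- 1e5
    v <- replicate(nsim, n * (min(theta + rexp(n)) - theta))
    grid <- c(0.5, 1, 2, 3)
    round(rbind(empirical = ecdf(v)(grid), theoretical = 1 - exp(-grid)), 3)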
2: We first note that
\begin{align*}
P(Y_n \le y) &= P(\max(X_1, X_2, \ldots, X_n) \le y)
= P(X_1 \le y,\ X_2 \le y,\ \ldots,\ X_n \le y) \\
&= \prod_{i=1}^{n} P(X_i \le y)
\qquad \text{since $X_1, X_2, \ldots, X_n$ are independent random variables} \\
&= [F(y)]^n \qquad \text{for } y \in \Re
\tag{10.25}
\end{align*}
Since $F$ is a cumulative distribution function, $F$ takes on values between 0 and 1. Therefore the function $n[1 - F(\cdot)]$ takes on values between $0$ and $n$. $G_n(z) = P(Z_n \le z)$, the cumulative distribution function of $Z_n$, equals 0 for $z \le 0$ and equals 1 for $z \ge n$. For $0 < z < n$
\[
G_n(z) = P(Z_n \le z) = P\!\left( n[1 - F(Y_n)] \le z \right) = P\!\left( F(Y_n) \ge 1 - \frac{z}{n} \right)
\]
Let $A$ be the support set of $X_i$. $F(x)$ is an increasing function for $x \in A$ and therefore has an inverse, $F^{-1}$, which is defined on the interval $(0,1)$. Therefore for $0 < z < n$
\begin{align*}
G_n(z) &= P\!\left( F(Y_n) \ge 1 - \frac{z}{n} \right)
= P\!\left( Y_n \ge F^{-1}\!\left( 1 - \frac{z}{n} \right) \right)
= 1 - P\!\left( Y_n < F^{-1}\!\left( 1 - \frac{z}{n} \right) \right) \\
&= 1 - \left[ F\!\left( F^{-1}\!\left( 1 - \frac{z}{n} \right) \right) \right]^n
\qquad \text{by (10.25)} \\
&= 1 - \left( 1 - \frac{z}{n} \right)^n
\end{align*}
Since
\[
\lim_{n \to \infty} \left[ 1 - \left( 1 - \frac{z}{n} \right)^n \right] = 1 - e^{-z} \qquad \text{for } z > 0
\]
therefore
\[
\lim_{n \to \infty} G_n(z) =
\begin{cases} 0 & \text{if } z < 0 \\ 1 - e^{-z} & \text{if } z > 0 \end{cases}
\]
which is the cumulative distribution function of an Exponential$(1)$ random variable. Therefore by the definition of convergence in distribution
\[
Z_n \to_D Z \sim \text{Exponential}(1)
\]
3: The moment generating function of a Poisson$(\theta)$ random variable is
\[
M(t) = \exp\!\left[ \theta\!\left( e^t - 1 \right) \right] \qquad \text{for } t \in \Re
\]
The moment generating function of
\[
Y_n = \sqrt{n}\left( \bar{X}_n - \theta \right) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i - \sqrt{n}\,\theta
\]
is
\begin{align*}
M_n(t) = E\!\left( e^{tY_n} \right)
&= e^{-\sqrt{n}\,\theta t}\, E\!\left[ \exp\!\left( \frac{t}{\sqrt{n}} \sum_{i=1}^{n} X_i \right) \right]
= e^{-\sqrt{n}\,\theta t} \prod_{i=1}^{n} E\!\left[ \exp\!\left( \frac{t}{\sqrt{n}} X_i \right) \right]
\qquad \text{since $X_1, X_2, \ldots, X_n$ are independent} \\
&= e^{-\sqrt{n}\,\theta t} \left[ M\!\left( \frac{t}{\sqrt{n}} \right) \right]^n
= e^{-\sqrt{n}\,\theta t} \exp\!\left[ n\theta\!\left( e^{t/\sqrt{n}} - 1 \right) \right]
= \exp\!\left[ -\sqrt{n}\,\theta t + n\theta\!\left( e^{t/\sqrt{n}} - 1 \right) \right]
\qquad \text{for } t \in \Re
\end{align*}
and
\[
\log M_n(t) = -\sqrt{n}\,\theta t + n\theta\!\left( e^{t/\sqrt{n}} - 1 \right) \qquad \text{for } t \in \Re
\]
By Taylor's Theorem
\[
e^x = 1 + x + \frac{x^2}{2} + \frac{e^c}{3!}\,x^3
\]
for some $c$ between 0 and $x$. Therefore
\[
e^{t/\sqrt{n}} = 1 + \frac{t}{\sqrt{n}} + \frac{1}{2}\left( \frac{t}{\sqrt{n}} \right)^2 + \frac{1}{3!}\left( \frac{t}{\sqrt{n}} \right)^3 e^{c_n}
= 1 + \frac{t}{\sqrt{n}} + \frac{t^2}{2n} + \frac{t^3}{3!\,n^{3/2}}\,e^{c_n}
\]
for some $c_n$ between 0 and $t/\sqrt{n}$. Therefore
\[
\log M_n(t) = -\sqrt{n}\,\theta t + n\theta\!\left[ \frac{t}{\sqrt{n}} + \frac{t^2}{2n} + \frac{t^3}{3!\,n^{3/2}}\,e^{c_n} \right]
= \frac{\theta t^2}{2} + \frac{\theta t^3}{3!}\,\frac{1}{\sqrt{n}}\,e^{c_n}
\qquad \text{for } t \in \Re
\]
Since
\[
\lim_{n \to \infty} c_n = 0 \qquad \text{it follows that} \qquad \lim_{n \to \infty} e^{c_n} = e^0 = 1
\]
Therefore
\[
\lim_{n \to \infty} \log M_n(t)
= \frac{\theta t^2}{2} + \frac{\theta t^3}{3!}\,(0)(1)
= \frac{\theta t^2}{2} \qquad \text{for } t \in \Re
\]
and
\[
\lim_{n \to \infty} M_n(t) = e^{\theta t^2/2} \qquad \text{for } t \in \Re
\]
which is the moment generating function of a N$(0, \theta)$ random variable. Therefore by the Limit Theorem for Moment Generating Functions
\[
Y_n \to_D Y \sim \text{N}(0, \theta)
\]
4: The moment generating function of an Exponential$(\theta)$ random variable is
\[
M(t) = \frac{1}{1 - \theta t} \qquad \text{for } t < \frac{1}{\theta}
\]
Since $X_1, X_2, \ldots, X_n$ are independent random variables, the moment generating function of
\[
Z_n = \frac{1}{\sqrt{n}} \left( \sum_{i=1}^{n} X_i - n\theta \right)
= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i - \sqrt{n}\,\theta
\]
is
\begin{align*}
M_n(t) = E\!\left( e^{tZ_n} \right)
&= e^{-\sqrt{n}\,\theta t} \prod_{i=1}^{n} E\!\left[ \exp\!\left( \frac{t}{\sqrt{n}} X_i \right) \right]
= e^{-\sqrt{n}\,\theta t} \left[ M\!\left( \frac{t}{\sqrt{n}} \right) \right]^n \\
&= e^{-\sqrt{n}\,\theta t} \left( \frac{1}{1 - \frac{\theta t}{\sqrt{n}}} \right)^{\!n}
= \left[ \left( 1 - \frac{\theta t}{\sqrt{n}} \right) e^{\theta t/\sqrt{n}} \right]^{-n}
\qquad \text{for } t < \frac{\sqrt{n}}{\theta}
\end{align*}
By Taylor's Theorem
\[
e^x = 1 + x + \frac{x^2}{2} + \frac{e^c}{3!}\,x^3
\]
for some $c$ between 0 and $x$. Therefore
\[
e^{\theta t/\sqrt{n}} = 1 + \frac{\theta t}{\sqrt{n}} + \frac{1}{2}\,\frac{\theta^2 t^2}{n} + \frac{1}{3!}\,\frac{\theta^3 t^3}{n^{3/2}}\,e^{c_n}
\]
for some $c_n$ between 0 and $\theta t/\sqrt{n}$. Therefore
\begin{align*}
M_n(t)
&= \left[ \left( 1 - \frac{\theta t}{\sqrt{n}} \right)\left( 1 + \frac{\theta t}{\sqrt{n}} + \frac{\theta^2 t^2}{2n} + \frac{\theta^3 t^3}{3!\,n^{3/2}}\,e^{c_n} \right) \right]^{-n}
= \left[ 1 - \frac{1}{2}\,\frac{\theta^2 t^2}{n} + \frac{\gamma(n)}{n} \right]^{-n}
\end{align*}
where
\[
\gamma(n) = -\frac{\theta^3 t^3}{2\sqrt{n}} + \left( \frac{\theta^3 t^3}{3!\,\sqrt{n}} - \frac{\theta^4 t^4}{3!\,n} \right) e^{c_n}
\]
Since $\lim_{n \to \infty} c_n = 0$ it follows that $\lim_{n \to \infty} e^{c_n} = e^0 = 1$. Also
\[
\lim_{n \to \infty} \frac{\theta^3 t^3}{\sqrt{n}} = 0
\qquad \text{and} \qquad
\lim_{n \to \infty} \frac{\theta^4 t^4}{n} = 0
\]
and therefore $\lim_{n \to \infty} \gamma(n) = 0$. Thus by Theorem 5.1.2
\[
\lim_{n \to \infty} M_n(t)
= \lim_{n \to \infty} \left[ 1 - \frac{\theta^2 t^2/2 - \gamma(n)}{n} \right]^{-n}
= e^{\theta^2 t^2/2} \qquad \text{for } t \in \Re
\]
which is the moment generating function of a N$\left( 0, \theta^2 \right)$ random variable. Therefore by the Limit Theorem for Moment Generating Functions
\[
Z_n \to_D Z \sim \text{N}\!\left( 0, \theta^2 \right)
\]
6: We can rewrite $S_n^2$ as
\begin{align*}
S_n^2 &= \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X}_n \right)^2
= \frac{1}{n-1} \sum_{i=1}^{n} \left[ (X_i - \mu) - \left( \bar{X}_n - \mu \right) \right]^2 \\
&= \frac{1}{n-1} \left[ \sum_{i=1}^{n} (X_i - \mu)^2
- 2\left( \bar{X}_n - \mu \right) \sum_{i=1}^{n} (X_i - \mu)
+ n\left( \bar{X}_n - \mu \right)^2 \right]
\tag{10.26} \\
&= \frac{1}{n-1} \left[ \sum_{i=1}^{n} (X_i - \mu)^2
- 2n\left( \bar{X}_n - \mu \right)^2
+ n\left( \bar{X}_n - \mu \right)^2 \right]
\tag{10.27} \\
&= \frac{n}{n-1} \left[ \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2 - \left( \bar{X}_n - \mu \right)^2 \right]
\tag{10.28}
\end{align*}
Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2 < \infty$, then
\[
\bar{X}_n \to_p \mu
\tag{10.29}
\]
by the Weak Law of Large Numbers. By (10.29) and the Limit Theorems
\[
\left( \bar{X}_n - \mu \right)^2 \to_p 0
\tag{10.30}
\]
Let $W_i = \left( \dfrac{X_i - \mu}{\sigma} \right)^2$, $i = 1, 2, \ldots$ with
\[
E(W_i) = E\!\left[ \left( \frac{X_i - \mu}{\sigma} \right)^2 \right] = 1
\]
and $Var(W_i) < \infty$ since $E\!\left( X_i^4 \right) < \infty$.
Since $W_1, W_2, \ldots$ are independent and identically distributed random variables with $E(W_i) = 1$ and $Var(W_i) < \infty$, then
\[
\bar{W}_n = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \to_p 1
\tag{10.31}
\]
by the Weak Law of Large Numbers.
By (10.28), (10.30), (10.31) and the Limit Theorems
\[
S_n^2 = \frac{n}{n-1}\left[ \sigma^2 \bar{W}_n - \left( \bar{X}_n - \mu \right)^2 \right]
\to_p (1)\left[ \sigma^2(1) - 0 \right] = \sigma^2
\]
and therefore
\[
\frac{S_n}{\sigma} \to_p 1
\tag{10.32}
\]
by the Limit Theorems.
Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2 < \infty$, then
\[
\frac{\sqrt{n}\left( \bar{X}_n - \mu \right)}{\sigma} \to_D Z \sim \text{N}(0,1)
\tag{10.33}
\]
by the Central Limit Theorem. By (10.32), (10.33) and Slutsky's Theorem
\[
T_n = \frac{\sqrt{n}\left( \bar{X}_n - \mu \right)}{S_n}
= \frac{\sqrt{n}\left( \bar{X}_n - \mu \right)/\sigma}{S_n/\sigma}
\to_D \frac{Z}{1} = Z \sim \text{N}(0,1)
\]
7: (a) Let $Y_1, Y_2, \ldots$ be independent Binomial$(1, \theta)$ random variables with
\[
E(Y_i) = \theta \qquad \text{and} \qquad Var(Y_i) = \theta(1 - \theta) \qquad \text{for } i = 1, 2, \ldots
\]
By 4.3.2(1)
\[
\sum_{i=1}^{n} Y_i \sim \text{Binomial}(n, \theta)
\]
Since $Y_1, Y_2, \ldots$ are independent and identically distributed random variables with $E(Y_i) = \theta$ and $Var(Y_i) = \theta(1 - \theta) < \infty$, then by the Weak Law of Large Numbers
\[
\frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}_n \to_p \theta
\]
Since $X_n$ and $\sum_{i=1}^{n} Y_i$ have the same distribution
\[
T_n = \frac{X_n}{n} \to_p \theta
\tag{10.34}
\]
(b) By (10.34) and the Limit Theorems
\[
U_n = \frac{X_n}{n}\left( 1 - \frac{X_n}{n} \right) \to_p \theta(1 - \theta)
\tag{10.35}
\]
(c) Since $Y_1, Y_2, \ldots$ are independent and identically distributed random variables with $E(Y_i) = \theta$ and $Var(Y_i) = \theta(1 - \theta) < \infty$, then by the Central Limit Theorem
\[
\frac{\sqrt{n}\left( \bar{Y}_n - \theta \right)}{\sqrt{\theta(1 - \theta)}} \to_D Z \sim \text{N}(0,1)
\]
Since $X_n$ and $\sum_{i=1}^{n} Y_i$ have the same distribution
\[
S_n = \frac{\sqrt{n}\left( \frac{X_n}{n} - \theta \right)}{\sqrt{\theta(1 - \theta)}} \to_D Z \sim \text{N}(0,1)
\tag{10.36}
\]
By (10.36) and Slutsky's Theorem
\[
W_n = \sqrt{n}\left( \frac{X_n}{n} - \theta \right)
= S_n \sqrt{\theta(1 - \theta)}
\to_D W = \sqrt{\theta(1 - \theta)}\,Z
\]
Since $Z \sim$ N$(0,1)$, $W = \sqrt{\theta(1 - \theta)}\,Z \sim$ N$(0,\ \theta(1 - \theta))$ and therefore
\[
W_n \to_D W \sim \text{N}(0,\ \theta(1 - \theta))
\tag{10.37}
\]
(d) By (10.35), (10.37) and Slutsky's Theorem
\[
Z_n = \frac{W_n}{\sqrt{U_n}}
\to_D \frac{W}{\sqrt{\theta(1 - \theta)}}
= \frac{\sqrt{\theta(1 - \theta)}\,Z}{\sqrt{\theta(1 - \theta)}}
= Z \sim \text{N}(0,1)
\]
(e) To determine the limiting distribution of
\[
V_n = \sqrt{n}\left[ \arcsin\!\left( \sqrt{\frac{X_n}{n}} \right) - \arcsin\!\left( \sqrt{\theta} \right) \right]
\]
let $g(x) = \arcsin\!\left( \sqrt{x} \right)$ and $a = \theta$. Then
\[
g'(x) = \frac{1}{\sqrt{1 - \left( \sqrt{x} \right)^2}} \cdot \frac{1}{2\sqrt{x}}
= \frac{1}{2\sqrt{x(1 - x)}}
\qquad \text{and} \qquad
g'(a) = g'(\theta) = \frac{1}{2\sqrt{\theta(1 - \theta)}}
\]
By (10.37) and the Delta Method
\[
V_n \to_D \frac{1}{2\sqrt{\theta(1 - \theta)}}\,\sqrt{\theta(1 - \theta)}\,Z
= \frac{Z}{2} \sim \text{N}\!\left( 0, \frac{1}{4} \right)
\]
(f) The limiting variance of $W_n$ is equal to $\theta(1 - \theta)$ which depends on $\theta$. The limiting variance of $Z_n$ is 1 which does not depend on $\theta$. The limiting variance of $V_n$ is $1/4$ which does not depend on $\theta$. The transformation $g(x) = \arcsin\!\left( \sqrt{x} \right)$ is a variance-stabilizing transformation for the Binomial distribution.
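The variance-stabilizing property in part (f) is easy to illustrate numerically (a sketch, not part of the original solution; $n = 1000$ and the grid of $\theta$ values are arbitrary choices): the simulated variance of $V_n$ stays near $1/4$ whatever the value of $\theta$.

    set.seed(330)
    n <- 1000; nsim <- 1e5
    for (theta in c(0.1, 0.3, 0.5, 0.8)) {
      xn <- rbinom(nsim, n, theta)
      vn <- sqrt(n) * (asin(sqrt(xn / n)) - asin(sqrt(theta)))
      cat("theta =", theta, " var(Vn) =", round(var(vn), 3), "\n")
    }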
8: $X_1, X_2, \ldots$ are independent Geometric$(\theta)$ random variables with
\[
E(X_i) = \frac{1 - \theta}{\theta} \qquad \text{and} \qquad Var(X_i) = \frac{1 - \theta}{\theta^2}, \qquad i = 1, 2, \ldots
\]
(a) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \frac{1-\theta}{\theta}$ and $Var(X_i) = \frac{1-\theta}{\theta^2} < \infty$, then by the Weak Law of Large Numbers
\[
\bar{X}_n = \frac{Y_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i \to_p \frac{1 - \theta}{\theta}
\tag{10.38}
\]
(b) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \frac{1-\theta}{\theta}$ and $Var(X_i) = \frac{1-\theta}{\theta^2} < \infty$, then by the Central Limit Theorem
\[
\frac{\sqrt{n}\left( \bar{X}_n - \frac{1-\theta}{\theta} \right)}{\sqrt{\frac{1-\theta}{\theta^2}}} \to_D Z \sim \text{N}(0,1)
\tag{10.39}
\]
By (10.39) and Slutsky's Theorem
\[
W_n = \sqrt{n}\left( \bar{X}_n - \frac{1 - \theta}{\theta} \right)
= \sqrt{\frac{1-\theta}{\theta^2}} \cdot
\frac{\sqrt{n}\left( \bar{X}_n - \frac{1-\theta}{\theta} \right)}{\sqrt{\frac{1-\theta}{\theta^2}}}
\to_D W = \sqrt{\frac{1 - \theta}{\theta^2}}\,Z
\]
Since $Z \sim$ N$(0,1)$, $W = \sqrt{\frac{1-\theta}{\theta^2}}\,Z \sim$ N$\left( 0, \frac{1-\theta}{\theta^2} \right)$ and therefore
\[
W_n \to_D W \sim \text{N}\!\left( 0, \frac{1 - \theta}{\theta^2} \right)
\tag{10.40}
\]
(c) By (10.38) and the Limit Theorems
\[
V_n = \frac{1}{1 + \bar{X}_n} \to_p \frac{1}{1 + \frac{1 - \theta}{\theta}} = \theta
\tag{10.41}
\]
(d) To find the limiting distribution of
\[
Z_n = \frac{\sqrt{n}\,(V_n - \theta)}{\sqrt{V_n^2 (1 - V_n)}}
\]
we first note that
\[
W_n = \sqrt{n}\left( \bar{X}_n - \frac{1 - \theta}{\theta} \right)
= \sqrt{n}\left( \bar{X}_n + 1 - \frac{1}{\theta} \right)
= \sqrt{n}\left( \frac{1}{V_n} - \frac{1}{\theta} \right)
\]
Therefore by (10.40)
\[
\sqrt{n}\left( \frac{1}{V_n} - \frac{1}{\theta} \right) \to_D W \sim \text{N}\!\left( 0, \frac{1 - \theta}{\theta^2} \right)
\tag{10.42}
\]
Next we determine the limiting distribution of $\sqrt{n}\,(V_n - \theta)$, which is the numerator of $Z_n$. Let $g(x) = x^{-1}$ and $a = \frac{1}{\theta}$. Then
\[
g'(x) = -\frac{1}{x^2} \qquad \text{and} \qquad g'(a) = g'\!\left( \frac{1}{\theta} \right) = -\theta^2
\]
By (10.42) and the Delta Theorem
\[
\sqrt{n}\,(V_n - \theta) \to_D -\theta^2 \sqrt{\frac{1 - \theta}{\theta^2}}\,Z
= -\theta\sqrt{1 - \theta}\,Z \sim \text{N}\!\left( 0,\ \theta^2 (1 - \theta) \right)
\tag{10.43}
\]
By (10.43) and Slutsky's Theorem
\[
\frac{\sqrt{n}\,(V_n - \theta)}{\sqrt{\theta^2 (1 - \theta)}} \to_D Z \sim \text{N}(0,1)
\tag{10.44}
\]
since if $Z \sim$ N$(0,1)$ then $-Z \sim$ N$(0,1)$ by symmetry of the N$(0,1)$ distribution.
Since
\[
Z_n = \frac{\sqrt{n}\,(V_n - \theta)}{\sqrt{V_n^2 (1 - V_n)}}
= \frac{\dfrac{\sqrt{n}\,(V_n - \theta)}{\sqrt{\theta^2 (1 - \theta)}}}{\sqrt{\dfrac{V_n^2 (1 - V_n)}{\theta^2 (1 - \theta)}}}
\tag{10.45}
\]
then by (10.41) and the Limit Theorems
\[
\sqrt{\frac{V_n^2 (1 - V_n)}{\theta^2 (1 - \theta)}} \to_p \sqrt{\frac{\theta^2 (1 - \theta)}{\theta^2 (1 - \theta)}} = 1
\tag{10.46}
\]
By (10.44), (10.45), (10.46), and Slutsky's Theorem
\[
Z_n \to_D \frac{Z}{1} = Z \sim \text{N}(0,1)
\]
9: $X_1, X_2, \ldots, X_n$ are independent Gamma$(2, \theta)$ random variables with
\[
E(X_i) = 2\theta \qquad \text{and} \qquad Var(X_i) = 2\theta^2 \qquad \text{for } i = 1, 2, \ldots, n
\]
(a) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = 2\theta$ and $Var(X_i) = 2\theta^2 < \infty$ then by the Weak Law of Large Numbers
\[
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \to_p 2\theta
\tag{10.47}
\]
By (10.47) and the Limit Theorems
\[
\frac{\bar{X}_n}{2\theta} \to_p 1
\qquad \text{and} \qquad
\frac{2\theta}{\bar{X}_n} \to_p 1
\tag{10.48}
\]
(b) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = 2\theta$ and $Var(X_i) = 2\theta^2 < \infty$ then by the Central Limit Theorem
\[
W_n = \frac{\sqrt{n}\left( \bar{X}_n - 2\theta \right)}{\sqrt{2\theta^2}}
= \frac{\sqrt{n}\left( \bar{X}_n - 2\theta \right)}{\sqrt{2}\,\theta}
\to_D Z \sim \text{N}(0,1)
\tag{10.49}
\]
By (10.49) and Slutsky's Theorem
\[
V_n = \sqrt{n}\left( \bar{X}_n - 2\theta \right) \to_D \sqrt{2}\,\theta Z \sim \text{N}\!\left( 0,\ 2\theta^2 \right)
\tag{10.50}
\]
(c) By (10.48), (10.49) and Slutsky's Theorem
\[
Z_n = \frac{\sqrt{n}\left( \bar{X}_n - 2\theta \right)}{\bar{X}_n/\sqrt{2}}
= \left[ \frac{\sqrt{n}\left( \bar{X}_n - 2\theta \right)}{\sqrt{2}\,\theta} \right]
\left[ \frac{2\theta}{\bar{X}_n} \right]
\to_D Z\,(1) = Z \sim \text{N}(0,1)
\]
(d) To determine the limiting distribution of
\[
U_n = \sqrt{n}\left[ \log \bar{X}_n - \log(2\theta) \right]
\]
let $g(x) = \log x$ and $a = 2\theta$. Then $g'(x) = 1/x$ and $g'(a) = g'(2\theta) = 1/(2\theta)$. By (10.50) and the Delta Method
\[
U_n \to_D \frac{1}{2\theta}\,\sqrt{2}\,\theta Z = \frac{Z}{\sqrt{2}} \sim \text{N}\!\left( 0, \frac{1}{2} \right)
\]
(e) The limiting variance of $Z_n$ is 1 which does not depend on $\theta$. The limiting variance of $U_n$ is $1/2$ which does not depend on $\theta$. The transformation $g(x) = \log x$ is a variance-stabilizing transformation for the Gamma distribution.
10.5 Chapter 6
3: (a) If $X_i \sim$ Geometric$(\theta)$ then
\[
f(x; \theta) = \theta (1 - \theta)^x \qquad \text{for } x = 0, 1, \ldots;\ 0 < \theta < 1
\]
The likelihood function is
\[
L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)
= \prod_{i=1}^{n} \theta (1 - \theta)^{x_i}
= \theta^n (1 - \theta)^t \qquad \text{for } 0 < \theta < 1,
\qquad \text{where } t = \sum_{i=1}^{n} x_i
\]
The log likelihood function is
\[
l(\theta) = \log L(\theta) = n \log \theta + t \log(1 - \theta) \qquad \text{for } 0 < \theta < 1
\]
The score function is
\[
S(\theta) = l'(\theta) = \frac{n}{\theta} - \frac{t}{1 - \theta}
= \frac{n - (n + t)\theta}{\theta(1 - \theta)} \qquad \text{for } 0 < \theta < 1
\]
$S(\theta) = 0$ for $\theta = \dfrac{n}{n + t}$. Since $S(\theta) > 0$ for $0 < \theta < n/(n+t)$ and $S(\theta) < 0$ for $n/(n+t) < \theta < 1$, therefore by the first derivative test the maximum likelihood estimate of $\theta$ is
\[
\hat{\theta} = \frac{n}{n + t}
\]
and the maximum likelihood estimator is
\[
\tilde{\theta} = \frac{n}{n + T} \qquad \text{where } T = \sum_{i=1}^{n} X_i
\]
The information function is
\[
I(\theta) = -S'(\theta) = -l''(\theta)
= \frac{n}{\theta^2} + \frac{t}{(1 - \theta)^2} \qquad \text{for } 0 < \theta < 1
\]
Since $I(\theta) > 0$ for all $0 < \theta < 1$, the graph of $l(\theta)$ is concave down and this also confirms that $\hat{\theta} = n/(n+t)$ is the maximum likelihood estimate.

3: (b) The observed information is
\[
I(\hat{\theta}) = \frac{n}{\hat{\theta}^2} + \frac{t}{(1 - \hat{\theta})^2}
= \frac{n}{\left( \frac{n}{n+t} \right)^2} + \frac{t}{\left( \frac{t}{n+t} \right)^2}
= \frac{(n + t)^2}{n} + \frac{(n + t)^2}{t}
= \frac{(n + t)^3}{nt}
= \frac{n}{\hat{\theta}^2 \left( 1 - \hat{\theta} \right)}
\]
The expected information is
\[
J(\theta) = E\!\left[ \frac{n}{\theta^2} + \frac{T}{(1 - \theta)^2} \right]
= \frac{n}{\theta^2} + \frac{E(T)}{(1 - \theta)^2}
= \frac{n}{\theta^2} + \frac{n(1 - \theta)/\theta}{(1 - \theta)^2}
= n\,\frac{(1 - \theta) + \theta}{\theta^2 (1 - \theta)}
= \frac{n}{\theta^2 (1 - \theta)} \qquad \text{for } 0 < \theta < 1
\]

3: (c) Since
\[
\mu = E(X_i) = \frac{1 - \theta}{\theta}
\]
then by the invariance property of maximum likelihood estimators the maximum likelihood estimator of $\mu = E(X_i)$ is
\[
\tilde{\mu} = \frac{1 - \tilde{\theta}}{\tilde{\theta}} = \frac{T}{n} = \bar{X}
\]

3: (d) If $n = 20$ and $t = \sum_{i=1}^{20} x_i = 40$ then the maximum likelihood estimate of $\theta$ is
\[
\hat{\theta} = \frac{20}{20 + 40} = \frac{1}{3}
\]
The relative likelihood function of $\theta$ is given by
\[
R(\theta) = \frac{L(\theta)}{L(\hat{\theta})}
= \frac{\theta^{20} (1 - \theta)^{40}}{\left( \frac{1}{3} \right)^{20} \left( \frac{2}{3} \right)^{40}}
\qquad \text{for } 0 \le \theta \le 1
\]
A graph of $R(\theta)$ is given in Figure 10.25. A 15% likelihood interval is found by solving $R(\theta) = 0.15$. The 15% likelihood interval is $[0.2234,\ 0.4570]$.
$R(0.5) = 0.03344$ implies that $\theta = 0.5$ is outside a 10% likelihood interval and we would conclude that $\theta = 0.5$ is not a very plausible value of $\theta$ given the data.
[Figure 10.25: Relative likelihood function $R(\theta)$ for Problem 3]
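A sketch of how the 15% likelihood interval and $R(0.5)$ can be computed in R (this code is an illustration added here, not part of the original text): uniroot is applied to $R(\theta) - 0.15$ on either side of the maximum likelihood estimate.

    n <- 20; tsum <- 40; thetahat <- n / (n + tsum)
    R <- function(theta) (theta / thetahat)^n * ((1 - theta) / (1 - thetahat))^tsum
    lower <- uniroot(function(th) R(th) - 0.15, c(0.01, thetahat))$root
    upper <- uniroot(function(th) R(th) - 0.15, c(thetahat, 0.99))$root
    c(lower, upper)   # approximately 0.2234 and 0.4570
    R(0.5)            # approximately 0.0334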
4: Since $(X_1, X_2, X_3) \sim$ Multinomial$\left( n;\ \theta^2,\ 2\theta(1 - \theta),\ (1 - \theta)^2 \right)$ the likelihood function is
\begin{align*}
L(\theta) &= \frac{n!}{x_1!\, x_2!\, (n - x_1 - x_2)!}
\left( \theta^2 \right)^{x_1} \left[ 2\theta(1 - \theta) \right]^{x_2}
\left[ (1 - \theta)^2 \right]^{n - x_1 - x_2} \\
&= \frac{n!}{x_1!\, x_2!\, (n - x_1 - x_2)!}\, 2^{x_2}\,
\theta^{2x_1 + x_2} (1 - \theta)^{2n - 2x_1 - x_2}
\qquad \text{for } 0 < \theta < 1
\end{align*}
The log likelihood function is
\[
l(\theta) = \log \frac{n!}{x_1!\, x_2!\, (n - x_1 - x_2)!} + x_2 \log 2
+ (2x_1 + x_2) \log \theta + (2n - 2x_1 - x_2) \log(1 - \theta)
\qquad \text{for } 0 < \theta < 1
\]
The score function is
\[
S(\theta) = \frac{2x_1 + x_2}{\theta} - \frac{2n - (2x_1 + x_2)}{1 - \theta}
= \frac{(2x_1 + x_2) - 2n\theta}{\theta(1 - \theta)}
\qquad \text{for } 0 < \theta < 1
\]
$S(\theta) = 0$ if $\theta = \dfrac{2x_1 + x_2}{2n}$. Since
\[
S(\theta) > 0 \ \text{ for } \ 0 < \theta < \frac{2x_1 + x_2}{2n}
\qquad \text{and} \qquad
S(\theta) < 0 \ \text{ for } \ 1 > \theta > \frac{2x_1 + x_2}{2n}
\]
therefore by the first derivative test $l(\theta)$ has an absolute maximum at $\theta = (2x_1 + x_2)/(2n)$. Thus the maximum likelihood estimate of $\theta$ is
\[
\hat{\theta} = \frac{2x_1 + x_2}{2n}
\]
and the maximum likelihood estimator of $\theta$ is
\[
\tilde{\theta} = \frac{2X_1 + X_2}{2n}
\]
The information function is
\[
I(\theta) = \frac{2x_1 + x_2}{\theta^2} + \frac{2n - (2x_1 + x_2)}{(1 - \theta)^2}
\qquad \text{for } 0 < \theta < 1
\]
and the observed information is
\[
I(\hat{\theta}) = \frac{(2n)^2}{2x_1 + x_2} + \frac{(2n)^2}{2n - (2x_1 + x_2)}
= \frac{(2n)^3}{(2x_1 + x_2)\left[ 2n - (2x_1 + x_2) \right]}
= \frac{2n}{\hat{\theta}\left( 1 - \hat{\theta} \right)}
\]
Since $X_1 \sim$ Binomial$\left( n, \theta^2 \right)$ and $X_2 \sim$ Binomial$(n,\ 2\theta(1 - \theta))$,
\[
E(2X_1 + X_2) = 2n\theta^2 + n\left[ 2\theta(1 - \theta) \right] = 2n\theta
\]
The expected information is
\[
J(\theta) = E\!\left[ \frac{2X_1 + X_2}{\theta^2} + \frac{2n - (2X_1 + X_2)}{(1 - \theta)^2} \right]
= \frac{2n\theta}{\theta^2} + \frac{2n - 2n\theta}{(1 - \theta)^2}
= \frac{2n}{\theta} + \frac{2n}{1 - \theta}
= \frac{2n}{\theta(1 - \theta)}
\qquad \text{for } 0 < \theta < 1
\]
6: (a) Given
\[
P(k \text{ children in family}; \theta) = \theta^k \ \text{ for } k = 1, 2, \ldots
\qquad \text{and} \qquad
P(0 \text{ children in family}; \theta) = \frac{1 - 2\theta}{1 - \theta}
\qquad \text{for } 0 < \theta < \frac{1}{2}
\]
and the observed data

No. of children:       0    1    2    3    4    Total
Frequency observed:   17   22    7    3    1      50

the appropriate likelihood function for $\theta$ is based on the multinomial model:
\[
L(\theta) = \frac{50!}{17!\,22!\,7!\,3!\,1!}
\left( \frac{1 - 2\theta}{1 - \theta} \right)^{17}
\theta^{22} \left( \theta^2 \right)^{7} \left( \theta^3 \right)^{3} \left( \theta^4 \right)^{1}
= \frac{50!}{17!\,22!\,7!\,3!\,1!}
\left( \frac{1 - 2\theta}{1 - \theta} \right)^{17} \theta^{49}
\qquad \text{for } 0 < \theta < \frac{1}{2}
\]
or more simply
\[
L(\theta) = \left( \frac{1 - 2\theta}{1 - \theta} \right)^{17} \theta^{49}
\qquad \text{for } 0 < \theta < \frac{1}{2}
\]
The log likelihood function is
\[
l(\theta) = 17 \log(1 - 2\theta) - 17 \log(1 - \theta) + 49 \log \theta
\qquad \text{for } 0 < \theta < \frac{1}{2}
\]
The score function is
\[
S(\theta) = \frac{-34}{1 - 2\theta} + \frac{17}{1 - \theta} + \frac{49}{\theta}
= \frac{-34\theta(1 - \theta) + 17\theta(1 - 2\theta) + 49(1 - \theta)(1 - 2\theta)}{\theta(1 - \theta)(1 - 2\theta)}
= \frac{98\theta^2 - 164\theta + 49}{\theta(1 - \theta)(1 - 2\theta)}
\qquad \text{for } 0 < \theta < \frac{1}{2}
\]
The information function is
\[
I(\theta) = \frac{68}{(1 - 2\theta)^2} - \frac{17}{(1 - \theta)^2} + \frac{49}{\theta^2}
\qquad \text{for } 0 < \theta < \frac{1}{2}
\]

6: (b) $S(\theta) = 0$ if
\[
98\theta^2 - 164\theta + 49 = 0 \qquad \text{or} \qquad \theta^2 - \frac{82}{49}\theta + \frac{1}{2} = 0
\]
Therefore $S(\theta) = 0$ if
\[
\theta = \frac{41}{49} \pm \frac{1}{2}\sqrt{\frac{(82)^2 - 2(49)^2}{(49)^2}}
= \frac{41}{49} \pm \frac{1}{98}\sqrt{6724 - 4802}
= \frac{41}{49} \pm \frac{1}{98}\sqrt{1922}
\]
Since $0 < \theta < \frac{1}{2}$ and $\theta = \frac{41}{49} + \frac{1}{98}\sqrt{1922} > 1$, we choose
\[
\theta = \frac{41}{49} - \frac{1}{98}\sqrt{1922}
\]
Since
\[
S(\theta) > 0 \ \text{ for } \ 0 < \theta < \frac{41}{49} - \frac{1}{98}\sqrt{1922}
\qquad \text{and} \qquad
S(\theta) < 0 \ \text{ for } \ \frac{41}{49} - \frac{1}{98}\sqrt{1922} < \theta < \frac{1}{2}
\]
therefore the maximum likelihood estimate of $\theta$ is
\[
\hat{\theta} = \frac{41}{49} - \frac{1}{98}\sqrt{1922} = 0.389381424147286 \approx 0.3894
\]
The observed information for the given data is
\[
I(\hat{\theta}) = \frac{68}{(1 - 2\hat{\theta})^2} - \frac{17}{(1 - \hat{\theta})^2} + \frac{49}{\hat{\theta}^2}
\approx 1666.88
\]

6: (c) A graph of the relative likelihood function is given in Figure 10.26. A 15% likelihood interval for $\theta$ is $[0.34,\ 0.43]$.
[Figure 10.26: Relative likelihood function $R(\theta)$ for Problem 6]

6: (d) Since $R(0.45) \approx 0.0097$, $\theta = 0.45$ is outside a 1% likelihood interval and therefore $\theta = 0.45$ is not a plausible value of $\theta$ for these data.

6: (e) The expected frequencies are calculated using
\[
e_0 = 50\left( \frac{1 - 2\hat{\theta}}{1 - \hat{\theta}} \right)
\qquad \text{and} \qquad
e_k = 50\,\hat{\theta}^k \quad \text{for } k = 1, 2, \ldots
\]
The observed and expected frequencies are

No. of children:                 0       1       2      3      4     Total
Observed frequency $f_k$:       17      22       7      3      1       50
Expected frequency $e_k$:     18.12   19.47    7.58   2.95   1.15      50

We see that the agreement between observed and expected frequencies is very good and the model gives a reasonable fit to the data.
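The maximum likelihood estimate and the expected frequencies in part (e) can be reproduced directly in R (a sketch added here, not from the original text; optimize is used instead of the quadratic formula).

    loglik <- function(theta)
      17 * log(1 - 2 * theta) - 17 * log(1 - theta) + 49 * log(theta)
    thetahat <- optimize(loglik, c(0.01, 0.49), maximum = TRUE)$maximum
    thetahat                                     # approximately 0.3894
    e <- c(50 * (1 - 2 * thetahat) / (1 - thetahat), 50 * thetahat^(1:4))
    round(e, 2)                                  # 18.12 19.47 7.58 2.95 1.15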
7: Show $\hat{\theta} = x_{(1)}$ is the maximum likelihood estimate. By Example 2.5.3(a), $\theta$ is a location parameter for this distribution. By Theorem 6.6.4, $Q(\mathbf{X}; \theta) = \tilde{\theta} - \theta = X_{(1)} - \theta$ is a pivotal quantity.
\begin{align*}
P(Q(\mathbf{X}; \theta) \le q) &= P\!\left( \tilde{\theta} - \theta \le q \right)
= P\!\left( X_{(1)} \le q + \theta \right)
= 1 - P\!\left( X_{(1)} > q + \theta \right) \\
&= 1 - \prod_{i=1}^{n} e^{-(q + \theta - \theta)}
\qquad \text{since } P(X_i > x) = e^{-(x - \theta)} \text{ for } x > \theta \\
&= 1 - e^{-nq} \qquad \text{for } q \ge 0
\end{align*}
Since
\begin{align*}
P\!\left( \tilde{\theta} + \frac{1}{n}\log(1 - p) \le \theta \le \tilde{\theta} \right)
&= P\!\left( 0 \le \tilde{\theta} - \theta \le -\frac{1}{n}\log(1 - p) \right)
= P\!\left( Q(\mathbf{X}; \theta) \le -\frac{1}{n}\log(1 - p) \right) \\
&= 1 - e^{\log(1 - p)} = 1 - (1 - p) = p
\end{align*}
$\left[ \hat{\theta} + \frac{1}{n}\log(1 - p),\ \hat{\theta} \right]$ is a 100$p$% confidence interval for $\theta$.
Since
\begin{align*}
P\!\left( \tilde{\theta} + \frac{1}{n}\log\!\left( \frac{1 - p}{2} \right) \le \theta \le \tilde{\theta} + \frac{1}{n}\log\!\left( \frac{1 + p}{2} \right) \right)
&= P\!\left( -\frac{1}{n}\log\!\left( \frac{1 + p}{2} \right) \le \tilde{\theta} - \theta \le -\frac{1}{n}\log\!\left( \frac{1 - p}{2} \right) \right) \\
&= P\!\left( -\frac{1}{n}\log\!\left( \frac{1 + p}{2} \right) \le Q(\mathbf{X}; \theta) \le -\frac{1}{n}\log\!\left( \frac{1 - p}{2} \right) \right) \\
&= \left[ 1 - e^{\log\left( \frac{1 - p}{2} \right)} \right] - \left[ 1 - e^{\log\left( \frac{1 + p}{2} \right)} \right]
= \frac{1 + p}{2} - \frac{1 - p}{2} = p
\end{align*}
$\left[ \hat{\theta} + \frac{1}{n}\log\!\left( \frac{1 - p}{2} \right),\ \hat{\theta} + \frac{1}{n}\log\!\left( \frac{1 + p}{2} \right) \right]$ is also a 100$p$% confidence interval for $\theta$.
The interval $\left[ \hat{\theta} + \frac{1}{n}\log(1 - p),\ \hat{\theta} \right]$ is a better choice since it contains $\hat{\theta}$ while the interval $\left[ \hat{\theta} + \frac{1}{n}\log\!\left( \frac{1 - p}{2} \right),\ \hat{\theta} + \frac{1}{n}\log\!\left( \frac{1 + p}{2} \right) \right]$ does not.
8: (a) If $x_1, x_2, \ldots, x_n$ is an observed random sample from the Gamma$\left( \frac{1}{2}, \frac{1}{\theta} \right)$ distribution then the likelihood function is
\[
L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)
= \prod_{i=1}^{n} \frac{x_i^{-1/2}\, \theta^{1/2}}{\Gamma\!\left( \frac{1}{2} \right)}\, e^{-\theta x_i}
= \left[ \prod_{i=1}^{n} \frac{x_i^{-1/2}}{\Gamma\!\left( \frac{1}{2} \right)} \right] \theta^{n/2} e^{-\theta t}
\qquad \text{for } \theta > 0,
\qquad \text{where } t = \sum_{i=1}^{n} x_i
\]
or more simply
\[
L(\theta) = \theta^{n/2} e^{-\theta t} \qquad \text{for } \theta > 0
\]
The log likelihood function is
\[
l(\theta) = \log L(\theta) = \frac{n}{2} \log \theta - \theta t \qquad \text{for } \theta > 0
\]
and the score function is
\[
S(\theta) = \frac{d}{d\theta} l(\theta) = \frac{n}{2\theta} - t = \frac{n - 2\theta t}{2\theta}
\qquad \text{for } \theta > 0
\]
$S(\theta) = 0$ for $\theta = \dfrac{n}{2t}$. Since
\[
S(\theta) > 0 \ \text{ for } \ 0 < \theta < \frac{n}{2t}
\qquad \text{and} \qquad
S(\theta) < 0 \ \text{ for } \ \theta > \frac{n}{2t}
\]
therefore by the first derivative test $l(\theta)$ has an absolute maximum at $\theta = \frac{n}{2t}$. Thus
\[
\hat{\theta} = \frac{n}{2t} = \frac{1}{2\bar{x}}
\]
is the maximum likelihood estimate of $\theta$ and
\[
\tilde{\theta} = \frac{1}{2\bar{X}}
\]
is the maximum likelihood estimator of $\theta$.

8: (b) If $X_i \sim$ Gamma$\left( \frac{1}{2}, \frac{1}{\theta} \right)$, $i = 1, 2, \ldots, n$ independently then
\[
E(X_i) = \frac{1}{2\theta} \qquad \text{and} \qquad Var(X_i) = \frac{1}{2\theta^2}
\]
and by the Weak Law of Large Numbers
\[
\bar{X} \to_p \frac{1}{2\theta}
\]
and by the Limit Theorems
\[
\tilde{\theta} = \frac{1}{2\bar{X}} \to_p \frac{1}{2 \cdot \frac{1}{2\theta}} = \theta
\]
as required.

8: (c) By the Invariance Property of Maximum Likelihood Estimates the maximum likelihood estimate of
\[
\sigma^2 = Var(X_i) = \frac{1}{2\theta^2}
\]
is
\[
\hat{\sigma}^2 = \frac{1}{2\hat{\theta}^2} = \frac{1}{2\left( \frac{n}{2t} \right)^2}
= \frac{2t^2}{n^2} = 2\bar{x}^2
\]

8: (d) The moment generating function of $X_i$ is
\[
M(t) = \frac{1}{\left( 1 - \frac{t}{\theta} \right)^{1/2}} \qquad \text{for } t < \theta
\]
The moment generating function of $Q = 2\theta \sum_{i=1}^{n} X_i$ is
\[
M_Q(t) = E\!\left( e^{tQ} \right)
= E\!\left[ \exp\!\left( 2t\theta \sum_{i=1}^{n} X_i \right) \right]
= \prod_{i=1}^{n} E\!\left[ \exp(2t\theta X_i) \right]
= \prod_{i=1}^{n} M(2t\theta)
= \frac{1}{(1 - 2t)^{n/2}}
\qquad \text{for } t < \frac{1}{2}
\]
which is the moment generating function of a $\chi^2(n)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $Q \sim \chi^2(n)$.
To construct a 95% equal tail confidence interval for $\theta$ we find $a$ and $b$ such that $P(Q \le a) = 0.025 = P(Q > b)$ so that
\[
P(a < Q < b) = P(a < 2\theta T < b) = 0.95
\qquad \text{or} \qquad
P\!\left( \frac{a}{2T} < \theta < \frac{b}{2T} \right) = 0.95
\]
so that $\left[ \dfrac{a}{2t},\ \dfrac{b}{2t} \right]$ is a 95% equal tail confidence interval for $\theta$.
For $n = 20$ we have
\[
P(Q \le 9.59) = 0.025 = P(Q > 34.17)
\]
For $t = \sum_{i=1}^{20} x_i = 6$ a 95% equal tail confidence interval for $\theta$ is
\[
\left[ \frac{9.59}{2(6)},\ \frac{34.17}{2(6)} \right] = [0.80,\ 2.85]
\]
Since $\theta = 0.7$ is not in the 95% confidence interval it is not a plausible value of $\theta$ in light of the data.
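The chi-squared quantiles used above are obtained in R with qchisq (a sketch added here, not from the original text).

    n <- 20; tsum <- 6
    a <- qchisq(0.025, df = n)    # 9.59
    b <- qchisq(0.975, df = n)    # 34.17
    c(a, b) / (2 * tsum)          # approximately 0.80 and 2.85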
8: (e) The information function is
\[
I(\theta) = -\frac{d}{d\theta} S(\theta) = \frac{n}{2\theta^2} \qquad \text{for } \theta > 0
\]
The expected information is
\[
J(\theta) = E\!\left[ I(\theta; X_1, \ldots, X_n) \right] = \frac{n}{2\theta^2}
\qquad \text{and} \qquad
\left( \tilde{\theta} - \theta \right)\left[ J(\tilde{\theta}) \right]^{1/2}
= \frac{\sqrt{n}\left( \tilde{\theta} - \theta \right)}{\sqrt{2}\,\tilde{\theta}}
\]
By the Central Limit Theorem
\[
\frac{\sqrt{n}\left( \bar{X} - \frac{1}{2\theta} \right)}{\frac{1}{\sqrt{2}\,\theta}}
= \sqrt{2n}\,\theta\left( \bar{X} - \frac{1}{2\theta} \right)
\to_D Z \sim \text{N}(0,1)
\tag{10.51}
\]
Let
\[
g(x) = \frac{1}{2x} \quad \text{and} \quad a = \frac{1}{2\theta},
\qquad \text{so that} \qquad
g(a) = g\!\left( \frac{1}{2\theta} \right) = \theta, \qquad
g'(x) = -\frac{1}{2x^2}, \qquad
g'(a) = g'\!\left( \frac{1}{2\theta} \right) = -2\theta^2
\]
By the Delta Theorem and (10.51)
\[
\sqrt{2n}\,\theta\left( \frac{1}{2\bar{X}} - \theta \right) \to_D -2\theta^2 Z \sim \text{N}\!\left( 0,\ 4\theta^4 \right)
\qquad \text{or} \qquad
\sqrt{2n}\left( \tilde{\theta} - \theta \right) \to_D -2\theta Z \sim \text{N}\!\left( 0,\ 4\theta^2 \right)
\]
By Slutsky's Theorem
\[
\frac{\sqrt{2n}\left( \tilde{\theta} - \theta \right)}{2\theta} \to_D -Z \sim \text{N}(0,1)
\]
But if $Z \sim$ N$(0,1)$ then $-Z \sim$ N$(0,1)$ and thus
\[
\frac{\sqrt{n}\left( \tilde{\theta} - \theta \right)}{\sqrt{2}\,\theta} \to_D Z \sim \text{N}(0,1)
\tag{10.52}
\]
Since $\tilde{\theta} \to_p \theta$ then by the Limit Theorems
\[
\frac{\tilde{\theta}}{\theta} \to_p 1
\tag{10.53}
\]
Thus by (10.52), (10.53) and Slutsky's Theorem
\[
\left( \tilde{\theta} - \theta \right)\left[ J(\tilde{\theta}) \right]^{1/2}
= \frac{\sqrt{n}\left( \tilde{\theta} - \theta \right)}{\sqrt{2}\,\tilde{\theta}}
= \frac{\dfrac{\sqrt{n}\left( \tilde{\theta} - \theta \right)}{\sqrt{2}\,\theta}}{\dfrac{\tilde{\theta}}{\theta}}
\to_D \frac{Z}{1} = Z \sim \text{N}(0,1)
\]
An approximate 95% confidence interval is given by
\[
\left[ \hat{\theta} - \frac{1.96}{\sqrt{J(\hat{\theta})}},\ \hat{\theta} + \frac{1.96}{\sqrt{J(\hat{\theta})}} \right]
\]
For $n = 20$ and $t = \sum_{i=1}^{20} x_i = 6$,
\[
\hat{\theta} = \frac{1}{2\left( \frac{6}{20} \right)} = \frac{20}{12} = \frac{5}{3}
\qquad \text{and} \qquad
J(\hat{\theta}) = \frac{n}{2\hat{\theta}^2} = \frac{20}{2\left( \frac{5}{3} \right)^2} = 3.6
\]
and an approximate 95% confidence interval is given by
\[
\left[ \frac{5}{3} - \frac{1.96}{\sqrt{3.6}},\ \frac{5}{3} + \frac{1.96}{\sqrt{3.6}} \right] = [0.63,\ 2.70]
\]
For $n = 20$ and $t = \sum_{i=1}^{20} x_i = 6$ the relative likelihood function of $\theta$ is given by
\[
R(\theta) = \frac{L(\theta)}{L(\hat{\theta})}
= \frac{\theta^{10} e^{-6\theta}}{\left( \frac{5}{3} \right)^{10} e^{-10}}
= \left( \frac{3\theta}{5} \right)^{10} e^{10 - 6\theta}
\qquad \text{for } \theta > 0
\]
A graph of $R(\theta)$ is given in Figure 10.27. A 15% likelihood interval is found by solving $R(\theta) = 0.15$. The 15% likelihood interval is $[0.84,\ 2.91]$.
[Figure 10.27: Relative likelihood function $R(\theta)$ for Problem 8]
The exact 95% equal tail confidence interval $[0.80,\ 2.85]$, the approximate 95% confidence interval $[0.63,\ 2.70]$, and the 15% likelihood interval $[0.84,\ 2.91]$ are all approximately of the same width. The exact confidence interval and likelihood interval are skewed to the right while the approximate confidence interval is symmetric about the maximum likelihood estimate $\hat{\theta} = 5/3$. Approximate confidence intervals are symmetric about the maximum likelihood estimate because they are based on a Normal approximation. Since $n = 20$ the approximation cannot be completely trusted. Therefore for these data the exact confidence interval and the likelihood interval are both better interval estimates for $\theta$.
$R(0.7) = 0.056$ implies that $\theta = 0.7$ is outside a 10% likelihood interval so based on the likelihood function we would conclude that $\theta = 0.7$ is not a very plausible value of $\theta$ given the data. Previously we noted that $\theta = 0.7$ is also not contained in the exact 95% confidence interval. Note however that $\theta = 0.7$ is contained in the approximate 95% confidence interval and so based on the approximate confidence interval we would conclude that $\theta = 0.7$ is a reasonable value of $\theta$ given the data. Again the reason for the disagreement is because $n = 20$ is not large enough for the approximation to be a good one.
10.6 Chapter 7
1: Since $X_i$ has cumulative distribution function
\[
F(x; \theta_1, \theta_2) = 1 - \left( \frac{\theta_1}{x} \right)^{\theta_2}
\qquad \text{for } x \ge \theta_1,\ \theta_1 > 0,\ \theta_2 > 0
\]
the probability density function of $X_i$ is
\[
f(x; \theta_1, \theta_2) = \frac{d}{dx} F(x; \theta_1, \theta_2)
= \frac{\theta_2}{x} \left( \frac{\theta_1}{x} \right)^{\theta_2}
\qquad \text{for } x \ge \theta_1,\ \theta_1 > 0,\ \theta_2 > 0
\]
The likelihood function is
\[
L(\theta_1, \theta_2) = \prod_{i=1}^{n} f(x_i; \theta_1, \theta_2)
= \prod_{i=1}^{n} \frac{\theta_2}{x_i} \left( \frac{\theta_1}{x_i} \right)^{\theta_2}
= \theta_2^n\, \theta_1^{n\theta_2} \prod_{i=1}^{n} x_i^{-(\theta_2 + 1)}
\qquad \text{if } 0 < \theta_1 \le x_i,\ i = 1, 2, \ldots, n \text{ and } \theta_2 > 0,
\]
that is, if $0 < \theta_1 \le x_{(1)}$ and $\theta_2 > 0$.
For each value of $\theta_2$ the likelihood function is maximized over $\theta_1$ by taking $\theta_1$ to be as large as possible subject to $0 < \theta_1 \le x_{(1)}$. Therefore for fixed $\theta_2$ the likelihood is maximized for $\theta_1 = x_{(1)}$. Since this is true for all values of $\theta_2$ the value of $(\theta_1, \theta_2)$ which maximizes $L(\theta_1, \theta_2)$ will necessarily have $\theta_1 = x_{(1)}$.
To find the value of $\theta_2$ which maximizes $L\!\left( x_{(1)}, \theta_2 \right)$ consider the function
\[
L_2(\theta_2) = L\!\left( x_{(1)}, \theta_2 \right)
= \theta_2^n\, x_{(1)}^{n\theta_2} \prod_{i=1}^{n} x_i^{-(\theta_2 + 1)}
\qquad \text{for } \theta_2 > 0
\]
and its logarithm
\[
l_2(\theta_2) = \log L_2(\theta_2)
= n \log \theta_2 + n\theta_2 \log x_{(1)} - (\theta_2 + 1) \sum_{i=1}^{n} \log x_i
\]
Now
\[
\frac{d}{d\theta_2} l_2(\theta_2) = l_2'(\theta_2)
= \frac{n}{\theta_2} + n \log x_{(1)} - \sum_{i=1}^{n} \log x_i
= \frac{n}{\theta_2} - \sum_{i=1}^{n} \log\!\left( \frac{x_i}{x_{(1)}} \right)
= \frac{n - \theta_2 t}{\theta_2}
\qquad \text{where } t = \sum_{i=1}^{n} \log\!\left( \frac{x_i}{x_{(1)}} \right)
\]
Now $l_2'(\theta_2) = 0$ for $\theta_2 = n/t$. Since $l_2'(\theta_2) > 0$ for $0 < \theta_2 < n/t$ and $l_2'(\theta_2) < 0$ for $\theta_2 > n/t$, therefore by the first derivative test $l_2(\theta_2)$ is maximized for $\theta_2 = n/t = \hat{\theta}_2$. Therefore $L_2(\theta_2) = L\!\left( x_{(1)}, \theta_2 \right)$ is also maximized for $\theta_2 = \hat{\theta}_2$. Therefore the maximum likelihood estimators of $\theta_1$ and $\theta_2$ are given by
\[
\hat{\theta}_1 = X_{(1)}
\qquad \text{and} \qquad
\hat{\theta}_2 = \frac{n}{\displaystyle \sum_{i=1}^{n} \log\!\left( \frac{X_i}{X_{(1)}} \right)}
\]
4: (a) If the events $S$ and $H$ are independent events then $P(S \cap H) = P(S)P(H) = \alpha\beta$, $P(S \cap \bar{H}) = P(S)P(\bar{H}) = \alpha(1 - \beta)$, etc.
The likelihood function is
\[
L(\alpha, \beta) = \frac{n!}{x_{11}!\, x_{12}!\, x_{21}!\, x_{22}!}
(\alpha\beta)^{x_{11}} \left[ \alpha(1 - \beta) \right]^{x_{12}}
\left[ (1 - \alpha)\beta \right]^{x_{21}} \left[ (1 - \alpha)(1 - \beta) \right]^{x_{22}}
\]
or more simply (ignoring constants with respect to $\alpha$ and $\beta$)
\[
L(\alpha, \beta) = \alpha^{x_{11} + x_{12}} (1 - \alpha)^{x_{21} + x_{22}}
\beta^{x_{11} + x_{21}} (1 - \beta)^{x_{12} + x_{22}}
\qquad \text{for } 0 \le \alpha \le 1,\ 0 \le \beta \le 1
\]
The log likelihood is
\[
l(\alpha, \beta) = (x_{11} + x_{12}) \log \alpha + (x_{21} + x_{22}) \log(1 - \alpha)
+ (x_{11} + x_{21}) \log \beta + (x_{12} + x_{22}) \log(1 - \beta)
\qquad \text{for } 0 < \alpha < 1,\ 0 < \beta < 1
\]
Since
\[
\frac{\partial l}{\partial \alpha}
= \frac{x_{11} + x_{12}}{\alpha} - \frac{x_{21} + x_{22}}{1 - \alpha}
= \frac{x_{11} + x_{12} - n\alpha}{\alpha(1 - \alpha)}
\qquad \text{and} \qquad
\frac{\partial l}{\partial \beta}
= \frac{x_{11} + x_{21}}{\beta} - \frac{x_{12} + x_{22}}{1 - \beta}
= \frac{x_{11} + x_{21} - n\beta}{\beta(1 - \beta)}
\]
the score vector is
\[
S(\alpha, \beta) = \left[ \frac{x_{11} + x_{12} - n\alpha}{\alpha(1 - \alpha)}
\quad \frac{x_{11} + x_{21} - n\beta}{\beta(1 - \beta)} \right]
\qquad \text{for } 0 < \alpha < 1,\ 0 < \beta < 1
\]
Solving $S(\alpha, \beta) = (0, 0)$ gives the maximum likelihood estimates
\[
\hat{\alpha} = \frac{x_{11} + x_{12}}{n}
\qquad \text{and} \qquad
\hat{\beta} = \frac{x_{11} + x_{21}}{n}
\]
The information matrix is
\[
I(\alpha, \beta) =
\begin{bmatrix}
-\dfrac{\partial^2 l}{\partial \alpha^2} & -\dfrac{\partial^2 l}{\partial \alpha\, \partial \beta} \\[1.5ex]
-\dfrac{\partial^2 l}{\partial \alpha\, \partial \beta} & -\dfrac{\partial^2 l}{\partial \beta^2}
\end{bmatrix}
=
\begin{bmatrix}
\dfrac{x_{11} + x_{12}}{\alpha^2} + \dfrac{n - (x_{11} + x_{12})}{(1 - \alpha)^2} & 0 \\[1.5ex]
0 & \dfrac{x_{11} + x_{21}}{\beta^2} + \dfrac{n - (x_{11} + x_{21})}{(1 - \beta)^2}
\end{bmatrix}
\]

4: (b) Since $X_{11} + X_{12}$ = the number of times the event $S$ is observed and $P(S) = \alpha$, then the distribution of $X_{11} + X_{12}$ is Binomial$(n, \alpha)$. Therefore $E(X_{11} + X_{12}) = n\alpha$ and
\[
E\!\left[ \frac{X_{11} + X_{12}}{\alpha^2} + \frac{n - (X_{11} + X_{12})}{(1 - \alpha)^2} \right]
= \frac{n\alpha}{\alpha^2} + \frac{n - n\alpha}{(1 - \alpha)^2}
= \frac{n}{\alpha(1 - \alpha)}
\]
Since $X_{11} + X_{21}$ = the number of times the event $H$ is observed and $P(H) = \beta$, then the distribution of $X_{11} + X_{21}$ is Binomial$(n, \beta)$. Therefore $E(X_{11} + X_{21}) = n\beta$ and
\[
E\!\left[ \frac{X_{11} + X_{21}}{\beta^2} + \frac{n - (X_{11} + X_{21})}{(1 - \beta)^2} \right]
= \frac{n}{\beta(1 - \beta)}
\]
Therefore the expected information matrix is
\[
J(\alpha, \beta) =
\begin{bmatrix}
\dfrac{n}{\alpha(1 - \alpha)} & 0 \\[1.5ex]
0 & \dfrac{n}{\beta(1 - \beta)}
\end{bmatrix}
\qquad \text{with inverse} \qquad
\left[ J(\alpha, \beta) \right]^{-1} =
\begin{bmatrix}
\dfrac{\alpha(1 - \alpha)}{n} & 0 \\[1.5ex]
0 & \dfrac{\beta(1 - \beta)}{n}
\end{bmatrix}
\]
Also $Var(\tilde{\alpha}) = \dfrac{\alpha(1 - \alpha)}{n}$ and $Var(\tilde{\beta}) = \dfrac{\beta(1 - \beta)}{n}$, so the diagonal entries of $\left[ J(\alpha, \beta) \right]^{-1}$ give us the variances of the maximum likelihood estimators.
7: (a) If $Y_i \sim$ Binomial$(1, p_i)$ where $p_i = \left[ 1 + e^{-\alpha - \beta x_i} \right]^{-1}$ and $x_1, x_2, \ldots, x_n$ are known constants, the likelihood function for $(\alpha, \beta)$ is
\[
L(\alpha, \beta) = \prod_{i=1}^{n} \binom{1}{y_i} p_i^{y_i} (1 - p_i)^{1 - y_i}
\qquad \text{or more simply} \qquad
L(\alpha, \beta) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}
\]
The log likelihood function is
\[
l(\alpha, \beta) = \log L(\alpha, \beta)
= \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]
\]
Note that
\[
\frac{\partial p_i}{\partial \alpha}
= \frac{\partial}{\partial \alpha} \left( \frac{1}{1 + e^{-\alpha - \beta x_i}} \right)
= \frac{e^{-\alpha - \beta x_i}}{\left( 1 + e^{-\alpha - \beta x_i} \right)^2}
= p_i (1 - p_i)
\qquad \text{and} \qquad
\frac{\partial p_i}{\partial \beta}
= \frac{x_i\, e^{-\alpha - \beta x_i}}{\left( 1 + e^{-\alpha - \beta x_i} \right)^2}
= x_i\, p_i (1 - p_i)
\]
Therefore
\[
\frac{\partial l}{\partial \alpha}
= \sum_{i=1}^{n} \left[ \frac{y_i}{p_i} - \frac{1 - y_i}{1 - p_i} \right] \frac{\partial p_i}{\partial \alpha}
= \sum_{i=1}^{n} \frac{y_i(1 - p_i) - (1 - y_i)p_i}{p_i(1 - p_i)}\, p_i(1 - p_i)
= \sum_{i=1}^{n} \left[ y_i(1 - p_i) - (1 - y_i)p_i \right]
= \sum_{i=1}^{n} (y_i - p_i)
\]
and
\[
\frac{\partial l}{\partial \beta}
= \sum_{i=1}^{n} \left[ \frac{y_i}{p_i} - \frac{1 - y_i}{1 - p_i} \right] \frac{\partial p_i}{\partial \beta}
= \sum_{i=1}^{n} x_i \left[ y_i(1 - p_i) - (1 - y_i)p_i \right]
= \sum_{i=1}^{n} x_i (y_i - p_i)
\]
The score vector is
\[
S(\alpha, \beta) =
\begin{bmatrix}
\dfrac{\partial l}{\partial \alpha} \\[1.5ex]
\dfrac{\partial l}{\partial \beta}
\end{bmatrix}
=
\begin{bmatrix}
\displaystyle \sum_{i=1}^{n} (y_i - p_i) \\[2ex]
\displaystyle \sum_{i=1}^{n} x_i (y_i - p_i)
\end{bmatrix}
\]
To obtain the expected information we first note that
\[
-\frac{\partial^2 l}{\partial \alpha^2}
= \sum_{i=1}^{n} \frac{\partial p_i}{\partial \alpha}
= \sum_{i=1}^{n} p_i(1 - p_i),
\qquad
-\frac{\partial^2 l}{\partial \alpha\, \partial \beta}
= \sum_{i=1}^{n} \frac{\partial p_i}{\partial \beta}
= \sum_{i=1}^{n} x_i\, p_i(1 - p_i),
\qquad
-\frac{\partial^2 l}{\partial \beta^2}
= \sum_{i=1}^{n} x_i \frac{\partial p_i}{\partial \beta}
= \sum_{i=1}^{n} x_i^2\, p_i(1 - p_i)
\]
The information matrix is
\[
I(\alpha, \beta) =
\begin{bmatrix}
\displaystyle \sum_{i=1}^{n} p_i(1 - p_i) & \displaystyle \sum_{i=1}^{n} x_i\, p_i(1 - p_i) \\[2ex]
\displaystyle \sum_{i=1}^{n} x_i\, p_i(1 - p_i) & \displaystyle \sum_{i=1}^{n} x_i^2\, p_i(1 - p_i)
\end{bmatrix}
\]
which is a constant function of the random variables $Y_1, Y_2, \ldots, Y_n$ and therefore the expected information is $J(\alpha, \beta) = I(\alpha, \beta)$.

7: (b) The maximum likelihood estimates of $\alpha$ and $\beta$ are found by solving the equations
\[
S(\alpha, \beta) =
\left[ \sum_{i=1}^{n} (y_i - p_i) \quad \sum_{i=1}^{n} x_i (y_i - p_i) \right]
= \left[ 0 \quad 0 \right]
\]
which must be done numerically. Newton's method is given by
\[
\begin{bmatrix} \alpha^{(i+1)} \\ \beta^{(i+1)} \end{bmatrix}
=
\begin{bmatrix} \alpha^{(i)} \\ \beta^{(i)} \end{bmatrix}
+ \left[ I\!\left( \alpha^{(i)}, \beta^{(i)} \right) \right]^{-1}
S\!\left( \alpha^{(i)}, \beta^{(i)} \right),
\qquad i = 0, 1, \ldots \text{ until convergence}
\]
where $\left( \alpha^{(0)}, \beta^{(0)} \right)$ is an initial estimate of $(\alpha, \beta)$.
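A minimal R sketch of this Newton iteration is given below (an illustration added here, not code from the original text); it assumes a 0/1 response vector y and a covariate vector x, and the commented check against glm is only a suggested usage example.

    newton_logistic <- function(y, x, start = c(0, 0), tol = 1e-8, maxit = 50) {
      ab <- start
      for (i in 1:maxit) {
        p <- 1 / (1 + exp(-(ab[1] + ab[2] * x)))
        S <- c(sum(y - p), sum(x * (y - p)))                    # score vector
        I <- rbind(c(sum(p * (1 - p)),     sum(x * p * (1 - p))),   # information matrix
                   c(sum(x * p * (1 - p)), sum(x^2 * p * (1 - p))))
        step <- solve(I, S)
        ab <- ab + step
        if (max(abs(step)) < tol) break
      }
      ab
    }
    # Example check against R's built-in fit:
    # set.seed(330); x <- rnorm(100); y <- rbinom(100, 1, 1 / (1 + exp(-(0.5 + x))))
    # newton_logistic(y, x); coef(glm(y ~ x, family = binomial))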
10.7 Chapter 8
1: (a) The hypothesis $H_0: \theta = \theta_0$ is a simple hypothesis since the model is completely specified.
From Example 6.3.6 the likelihood function is
\[
L(\theta) = \theta^n \prod_{i=1}^{n} x_i^{\theta - 1} \qquad \text{for } \theta > 0
\]
The log likelihood function is
\[
l(\theta) = n \log \theta + (\theta - 1) \sum_{i=1}^{n} \log x_i \qquad \text{for } \theta > 0
\]
and the maximum likelihood estimate is
\[
\hat{\theta} = -\frac{n}{\sum_{i=1}^{n} \log x_i}
\]
The relative likelihood function is
\[
R(\theta) = \frac{L(\theta)}{L(\hat{\theta})}
= \left( \frac{\theta}{\hat{\theta}} \right)^{\!n} \prod_{i=1}^{n} x_i^{\theta - \hat{\theta}}
\qquad \text{for } \theta > 0
\]
The likelihood ratio test statistic for $H_0: \theta = \theta_0$ is
\begin{align*}
\Lambda(\theta_0; \mathbf{X}) &= -2 \log R(\theta_0; \mathbf{X})
= -2\left[ n \log\!\left( \frac{\theta_0}{\tilde{\theta}} \right)
+ \left( \theta_0 - \tilde{\theta} \right) \sum_{i=1}^{n} \log X_i \right] \\
&= 2n\left[ \frac{\theta_0}{\tilde{\theta}} - 1 - \log\!\left( \frac{\theta_0}{\tilde{\theta}} \right) \right]
\qquad \text{since } \sum_{i=1}^{n} \log X_i = -\frac{n}{\tilde{\theta}}
\end{align*}
The observed value of the likelihood ratio test statistic is
\[
\lambda(\theta_0; \mathbf{x}) = -2 \log R(\theta_0; \mathbf{x})
= 2n\left[ \frac{\theta_0}{\hat{\theta}} - 1 - \log\!\left( \frac{\theta_0}{\hat{\theta}} \right) \right]
\]
The parameter space is $\Omega = \{\theta : \theta > 0\}$ which has dimension 1 and thus $k = 1$. The approximate p-value is
\[
\text{p-value} \approx P\!\left( W \ge \lambda(\theta_0; \mathbf{x}) \right) \ \text{ where } W \sim \chi^2(1)
\ = \ 2\left[ 1 - P\!\left( Z \le \sqrt{\lambda(\theta_0; \mathbf{x})} \right) \right] \ \text{ where } Z \sim \text{N}(0,1)
\]
(b) If $n = 20$ and $\sum_{i=1}^{20} \log x_i = -25$ and $H_0: \theta = 1$, then $\hat{\theta} = 20/25 = 0.8$ and the observed value of the likelihood ratio test statistic is
\[
\lambda(\theta_0; \mathbf{x}) = 2(20)\left[ \frac{1}{0.8} - 1 - \log\!\left( \frac{1}{0.8} \right) \right]
= 40\left[ 0.25 - \log(1.25) \right] = 1.074258
\]
and
\[
\text{p-value} \approx P(W \ge 1.074258) \ \text{ where } W \sim \chi^2(1)
\ = \ 2\left[ 1 - P\!\left( Z \le \sqrt{1.074258} \right) \right] \ \text{ where } Z \sim \text{N}(0,1)
\ = \ 0.2999857
\]
calculated using R. Since p-value $> 0.1$ there is no evidence against $H_0: \theta = 1$ based on the data.
4: Since $\Omega = \{(\theta_1, \theta_2) : \theta_1 > 0,\ \theta_2 > 0\}$ which has dimension $k = 2$ and $\Omega_0 = \{(\theta_1, \theta_2) : \theta_1 = \theta_2,\ \theta_1 > 0,\ \theta_2 > 0\}$ which has dimension $q = 1$, the hypothesis is composite.
From Example 6.5.2 the likelihood function for an observed random sample $x_1, x_2, \ldots, x_n$ from a Weibull$(2, \theta_1)$ distribution is
\[
L_1(\theta_1) = \theta_1^{-2n} \exp\!\left( -\frac{1}{\theta_1^2} \sum_{i=1}^{n} x_i^2 \right)
\qquad \text{for } \theta_1 > 0,
\qquad \text{with maximum likelihood estimate } \hat{\theta}_1 = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^2 \right)^{1/2}
\]
Similarly the likelihood function for an observed random sample $y_1, y_2, \ldots, y_m$ from a Weibull$(2, \theta_2)$ distribution is
\[
L_2(\theta_2) = \theta_2^{-2m} \exp\!\left( -\frac{1}{\theta_2^2} \sum_{i=1}^{m} y_i^2 \right)
\qquad \text{for } \theta_2 > 0,
\qquad \text{with maximum likelihood estimate } \hat{\theta}_2 = \left( \frac{1}{m} \sum_{i=1}^{m} y_i^2 \right)^{1/2}
\]
Since the samples are independent the likelihood function for $(\theta_1, \theta_2)$ is
\[
L(\theta_1, \theta_2) = L_1(\theta_1)\, L_2(\theta_2) \qquad \text{for } \theta_1 > 0,\ \theta_2 > 0
\]
and the log likelihood function is
\[
l(\theta_1, \theta_2) = -2n \log \theta_1 - \frac{1}{\theta_1^2} \sum_{i=1}^{n} x_i^2
- 2m \log \theta_2 - \frac{1}{\theta_2^2} \sum_{i=1}^{m} y_i^2
\qquad \text{for } \theta_1 > 0,\ \theta_2 > 0
\]
The independence of the samples implies the maximum likelihood estimators are
\[
\tilde{\theta}_1 = \left( \frac{1}{n} \sum_{i=1}^{n} X_i^2 \right)^{1/2}
\qquad \text{and} \qquad
\tilde{\theta}_2 = \left( \frac{1}{m} \sum_{i=1}^{m} Y_i^2 \right)^{1/2}
\]
Therefore
\[
l(\tilde{\theta}_1, \tilde{\theta}_2; \mathbf{X}, \mathbf{Y})
= -n \log\!\left( \frac{1}{n} \sum_{i=1}^{n} X_i^2 \right)
- m \log\!\left( \frac{1}{m} \sum_{i=1}^{m} Y_i^2 \right)
- (n + m)
\]
If $\theta_1 = \theta_2 = \theta$ then the log likelihood function is
\[
l(\theta) = -2(n + m) \log \theta - \frac{1}{\theta^2} \left( \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{m} y_i^2 \right)
\qquad \text{for } \theta > 0
\]
which is only a function of $\theta$. To determine $\max_{(\theta_1, \theta_2) \in \Omega_0} l(\theta_1, \theta_2; \mathbf{X}, \mathbf{Y})$ we note that
\[
\frac{d}{d\theta} l(\theta) = -\frac{2(n + m)}{\theta} + \frac{2}{\theta^3} \left( \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{m} y_i^2 \right)
\qquad \text{and} \qquad
\frac{d}{d\theta} l(\theta) = 0 \ \text{ for } \
\theta = \left[ \frac{1}{n + m} \left( \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{m} y_i^2 \right) \right]^{1/2}
\]
and therefore
\[
\max_{(\theta_1, \theta_2) \in \Omega_0} l(\theta_1, \theta_2; \mathbf{X}, \mathbf{Y})
= -(n + m) \log\!\left[ \frac{1}{n + m} \left( \sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{m} Y_i^2 \right) \right] - (n + m)
\]
The likelihood ratio test statistic is
\begin{align*}
\Lambda(\mathbf{X}, \mathbf{Y}; \Omega_0)
&= 2\left[ l(\tilde{\theta}_1, \tilde{\theta}_2; \mathbf{X}, \mathbf{Y})
- \max_{(\theta_1, \theta_2) \in \Omega_0} l(\theta_1, \theta_2; \mathbf{X}, \mathbf{Y}) \right] \\
&= 2\left[ (n + m) \log\!\left( \frac{1}{n + m} \left( \sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{m} Y_i^2 \right) \right)
- n \log\!\left( \frac{1}{n} \sum_{i=1}^{n} X_i^2 \right)
- m \log\!\left( \frac{1}{m} \sum_{i=1}^{m} Y_i^2 \right) \right]
\end{align*}
with corresponding observed value
\[
\lambda(\mathbf{x}, \mathbf{y}; \Omega_0)
= 2\left[ (n + m) \log\!\left( \frac{1}{n + m} \left( \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{m} y_i^2 \right) \right)
- n \log\!\left( \frac{1}{n} \sum_{i=1}^{n} x_i^2 \right)
- m \log\!\left( \frac{1}{m} \sum_{i=1}^{m} y_i^2 \right) \right]
\]
Since $k - q = 2 - 1 = 1$,
\[
\text{p-value} \approx P\!\left[ W \ge \lambda(\mathbf{x}, \mathbf{y}; \Omega_0) \right] \ \text{ where } W \sim \chi^2(1)
\ = \ 2\left[ 1 - P\!\left( Z \le \sqrt{\lambda(\mathbf{x}, \mathbf{y}; \Omega_0)} \right) \right] \ \text{ where } Z \sim \text{N}(0,1)
\]
11. Summary of Named Distributions
Summary of Discrete Distributions

Discrete Uniform$(a,b)$, $a, b$ integers, $b \ge a$:
  $f(x) = \frac{1}{b-a+1}$, $x = a, a+1, \ldots, b$; $E(X) = \frac{a+b}{2}$; $Var(X) = \frac{(b-a+1)^2 - 1}{12}$; $M(t) = \frac{1}{b-a+1} \sum_{x=a}^{b} e^{tx}$, $t \in \Re$.

Hypergeometric$(N, r, n)$, $N = 1, 2, \ldots$, $r = 0, 1, \ldots, N$, $n = 0, 1, \ldots, N$:
  $f(x) = \binom{r}{x}\binom{N-r}{n-x} / \binom{N}{n}$, $x = \max(0, n - N + r), \ldots, \min(r, n)$; $E(X) = \frac{nr}{N}$; $Var(X) = n \frac{r}{N}\left( 1 - \frac{r}{N} \right)\frac{N-n}{N-1}$; $M(t)$ not tractable.

Binomial$(n, p)$, $0 \le p \le 1$, $q = 1 - p$, $n = 1, 2, \ldots$:
  $f(x) = \binom{n}{x} p^x q^{n-x}$, $x = 0, 1, \ldots, n$; $E(X) = np$; $Var(X) = npq$; $M(t) = (pe^t + q)^n$, $t \in \Re$.

Bernoulli$(p)$, $0 \le p \le 1$, $q = 1 - p$:
  $f(x) = p^x q^{1-x}$, $x = 0, 1$; $E(X) = p$; $Var(X) = pq$; $M(t) = pe^t + q$, $t \in \Re$.

Negative Binomial$(k, p)$, $0 < p \le 1$, $q = 1 - p$, $k = 1, 2, \ldots$:
  $f(x) = \binom{x+k-1}{x} p^k q^x = \binom{-k}{x} p^k (-q)^x$, $x = 0, 1, \ldots$; $E(X) = \frac{kq}{p}$; $Var(X) = \frac{kq}{p^2}$; $M(t) = \left( \frac{p}{1 - qe^t} \right)^k$, $t < -\ln q$.

Geometric$(p)$, $0 < p \le 1$, $q = 1 - p$:
  $f(x) = p q^x$, $x = 0, 1, \ldots$; $E(X) = \frac{q}{p}$; $Var(X) = \frac{q}{p^2}$; $M(t) = \frac{p}{1 - qe^t}$, $t < -\ln q$.

Poisson$(\mu)$, $\mu \ge 0$:
  $f(x) = \frac{\mu^x e^{-\mu}}{x!}$, $x = 0, 1, \ldots$; $E(X) = \mu$; $Var(X) = \mu$; $M(t) = e^{\mu(e^t - 1)}$, $t \in \Re$.

Multinomial$(n; p_1, p_2, \ldots, p_k)$, $0 \le p_i \le 1$, $\sum_{i=1}^{k} p_i = 1$:
  $f(x_1, \ldots, x_k) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$, $x_i = 0, 1, \ldots, n$ with $\sum_{i=1}^{k} x_i = n$; $E(X_i) = np_i$; $Var(X_i) = np_i(1 - p_i)$, $i = 1, \ldots, k$; $M(t_1, \ldots, t_{k-1}) = \left( p_1 e^{t_1} + \cdots + p_{k-1} e^{t_{k-1}} + p_k \right)^n$, $t_i \in \Re$.
Summary of Continuous Distributions

Uniform$(a, b)$, $a < b$:
  $f(x) = \frac{1}{b-a}$, $a < x < b$; $E(X) = \frac{a+b}{2}$; $Var(X) = \frac{(b-a)^2}{12}$; $M(t) = \frac{e^{bt} - e^{at}}{(b-a)t}$ for $t \ne 0$, $M(0) = 1$.

Beta$(a, b)$, $a > 0$, $b > 0$:
  $f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1}(1-x)^{b-1}$, $0 < x < 1$; $E(X) = \frac{a}{a+b}$; $Var(X) = \frac{ab}{(a+b+1)(a+b)^2}$; $M(t) = 1 + \sum_{k=1}^{\infty} \left( \prod_{i=0}^{k-1} \frac{a+i}{a+b+i} \right) \frac{t^k}{k!}$, $t \in \Re$.

N$(\mu, \sigma^2)$, $\mu \in \Re$, $\sigma^2 > 0$:
  $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$, $x \in \Re$; $E(X) = \mu$; $Var(X) = \sigma^2$; $M(t) = e^{\mu t + \sigma^2 t^2/2}$, $t \in \Re$.

Lognormal$(\mu, \sigma^2)$, $\mu \in \Re$, $\sigma > 0$:
  $f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\log x - \mu)^2/(2\sigma^2)}$, $x > 0$; $E(X) = e^{\mu + \sigma^2/2}$; $Var(X) = e^{2\mu + \sigma^2}\left( e^{\sigma^2} - 1 \right)$; $M(t)$ does not exist.

Exponential$(\theta)$, $\theta > 0$:
  $f(x) = \frac{1}{\theta}\, e^{-x/\theta}$, $x \ge 0$; $E(X) = \theta$; $Var(X) = \theta^2$; $M(t) = \frac{1}{1-\theta t}$, $t < \frac{1}{\theta}$.

Two Parameter Exponential$(\theta, \mu)$, $\mu \in \Re$, $\theta > 0$:
  $f(x) = \frac{1}{\theta}\, e^{-(x-\mu)/\theta}$, $x \ge \mu$; $E(X) = \mu + \theta$; $Var(X) = \theta^2$; $M(t) = \frac{e^{\mu t}}{1-\theta t}$, $t < \frac{1}{\theta}$.

Double Exponential$(\mu, \theta)$, $\mu \in \Re$, $\theta > 0$:
  $f(x) = \frac{1}{2\theta}\, e^{-|x-\mu|/\theta}$, $x \in \Re$; $E(X) = \mu$; $Var(X) = 2\theta^2$; $M(t) = \frac{e^{\mu t}}{1-\theta^2 t^2}$, $|t| < \frac{1}{\theta}$.

Extreme Value$(\mu, \theta)$, $\mu \in \Re$, $\theta > 0$:
  $f(x) = \frac{1}{\theta}\, e^{(x-\mu)/\theta} \exp\!\left[ -e^{(x-\mu)/\theta} \right]$, $x \in \Re$; $E(X) = \mu - \theta\gamma$ where $\gamma \approx 0.5772$ is Euler's constant; $Var(X) = \frac{\pi^2\theta^2}{6}$; $M(t) = e^{\mu t}\,\Gamma(1 + \theta t)$, $t > -\frac{1}{\theta}$.

Gamma$(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$, where $\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx$:
  $f(x) = \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\Gamma(\alpha)}$, $x > 0$; $E(X) = \alpha\beta$; $Var(X) = \alpha\beta^2$; $M(t) = (1 - \beta t)^{-\alpha}$, $t < \frac{1}{\beta}$.

Inverse Gamma$(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$:
  $f(x) = \frac{x^{-\alpha-1} e^{-1/(\beta x)}}{\beta^{\alpha}\Gamma(\alpha)}$, $x > 0$; $E(X) = \frac{1}{\beta(\alpha-1)}$ for $\alpha > 1$; $Var(X) = \frac{1}{\beta^2(\alpha-1)^2(\alpha-2)}$ for $\alpha > 2$; $M(t)$ does not exist.

$\chi^2(k)$, $k = 1, 2, \ldots$:
  $f(x) = \frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\Gamma(k/2)}$, $x > 0$; $E(X) = k$; $Var(X) = 2k$; $M(t) = (1 - 2t)^{-k/2}$, $t < \frac{1}{2}$.

Weibull$(\beta, \theta)$, $\beta > 0$, $\theta > 0$:
  $f(x) = \frac{\beta x^{\beta-1}}{\theta^{\beta}}\, e^{-(x/\theta)^{\beta}}$, $x > 0$; $E(X) = \theta\,\Gamma\!\left( 1 + \frac{1}{\beta} \right)$; $Var(X) = \theta^2\left[ \Gamma\!\left( 1 + \frac{2}{\beta} \right) - \Gamma^2\!\left( 1 + \frac{1}{\beta} \right) \right]$; $M(t)$ not tractable.

Pareto$(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$:
  $f(x) = \frac{\alpha\beta^{\alpha}}{x^{\alpha+1}}$, $x \ge \beta$; $E(X) = \frac{\alpha\beta}{\alpha-1}$ for $\alpha > 1$; $Var(X) = \frac{\alpha\beta^2}{(\alpha-1)^2(\alpha-2)}$ for $\alpha > 2$; $M(t)$ does not exist.

Logistic$(\mu, \theta)$, $\mu \in \Re$, $\theta > 0$:
  $f(x) = \frac{e^{-(x-\mu)/\theta}}{\theta\left[ 1 + e^{-(x-\mu)/\theta} \right]^2}$, $x \in \Re$; $E(X) = \mu$; $Var(X) = \frac{\pi^2\theta^2}{3}$; $M(t) = e^{\mu t}\,\Gamma(1 - \theta t)\,\Gamma(1 + \theta t)$, $|t| < \frac{1}{\theta}$.

Cauchy$(\mu, \theta)$, $\mu \in \Re$, $\theta > 0$:
  $f(x) = \frac{1}{\pi\theta\left[ 1 + \left( \frac{x-\mu}{\theta} \right)^2 \right]}$, $x \in \Re$; $E(X)$, $Var(X)$ and $M(t)$ do not exist.

$t(k)$, $k = 1, 2, \ldots$:
  $f(x) = \frac{\Gamma\!\left( \frac{k+1}{2} \right)}{\sqrt{k\pi}\,\Gamma\!\left( \frac{k}{2} \right)} \left( 1 + \frac{x^2}{k} \right)^{-(k+1)/2}$, $x \in \Re$; $E(X) = 0$ for $k = 2, 3, \ldots$; $Var(X) = \frac{k}{k-2}$ for $k = 3, 4, \ldots$; $M(t)$ does not exist.

F$(k_1, k_2)$, $k_1 = 1, 2, \ldots$, $k_2 = 1, 2, \ldots$:
  $f(x) = \frac{\Gamma\!\left( \frac{k_1+k_2}{2} \right)}{\Gamma\!\left( \frac{k_1}{2} \right)\Gamma\!\left( \frac{k_2}{2} \right)} \left( \frac{k_1}{k_2} \right)^{k_1/2} x^{k_1/2 - 1} \left( 1 + \frac{k_1}{k_2}x \right)^{-(k_1+k_2)/2}$, $x > 0$; $E(X) = \frac{k_2}{k_2 - 2}$ for $k_2 > 2$; $Var(X) = \frac{2k_2^2(k_1 + k_2 - 2)}{k_1(k_2-2)^2(k_2-4)}$ for $k_2 > 4$; $M(t)$ does not exist.

$\mathbf{X} = (X_1, X_2)^T \sim$ BVN$(\boldsymbol{\mu}, \Sigma)$, $\mu_1 \in \Re$, $\mu_2 \in \Re$, $\sigma_1 > 0$, $\sigma_2 > 0$, $-1 < \rho < 1$, with $\boldsymbol{\mu} = (\mu_1, \mu_2)^T$ and $\Sigma = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$:
  $f(x_1, x_2) = \frac{1}{2\pi|\Sigma|^{1/2}} \exp\!\left[ -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]$, $x_1 \in \Re$, $x_2 \in \Re$; $E(\mathbf{X}) = \boldsymbol{\mu}$; covariance matrix $\Sigma$; $M(t_1, t_2) = \exp\!\left( \boldsymbol{\mu}^T \mathbf{t} + \frac{1}{2}\mathbf{t}^T \Sigma\, \mathbf{t} \right)$, $t_1 \in \Re$, $t_2 \in \Re$.
12. Distribution Tables
N(0,1) Cumulative Distribution Function
[Table: values of $F(x) = P(X \le x)$ for $X \sim$ N$(0,1)$ and $x \ge 0$; rows $x = 0.0, 0.1, \ldots, 3.5$ and columns giving the second decimal place $0.00, 0.01, \ldots, 0.09$; for example $F(1.96) = 0.97500$ and $F(2.58) = 0.99506$. The full numerical table is not reproduced here.]

N(0,1) Quantiles
[Table: values of $F^{-1}(p)$ for $p \ge 0.5$; rows $p = 0.5, 0.6, 0.7, 0.8, 0.9$ and columns $0.00, 0.01, \ldots, 0.09$ together with $0.075$ and $0.095$; for example $F^{-1}(0.975) = 1.9600$ and $F^{-1}(0.995) = 2.5758$. The full numerical table is not reproduced here.]
Chi-Squared Quantiles
[Table: values of $x$ for $p = P(X \le x) = F(x)$ where $X \sim \chi^2(\text{df})$; rows df $= 1, 2, \ldots, 30, 40, 50, \ldots, 100$ and columns $p = 0.005, 0.01, 0.025, 0.05, 0.1, 0.9, 0.95, 0.975, 0.99, 0.995$; for example, for df $= 20$, $F^{-1}(0.025) = 9.591$ and $F^{-1}(0.975) = 34.170$. The full numerical table is not reproduced here.]
Student t Quantiles
[Table: values of $x$ for $p = P(X \le x) = F(x)$ where $X \sim t(\text{df})$, for $p \ge 0.6$; rows df $= 1, 2, \ldots, 30, 40, 50, \ldots, 100$ and $> 100$, and columns $p = 0.6, 0.7, 0.8, 0.9, 0.95, 0.975, 0.99, 0.995, 0.999, 0.9995$; for example, for df $= 10$, $F^{-1}(0.975) = 2.2281$. The full numerical table is not reproduced here.]