An Introduction to Mathematical Statistics and Its Applications (4th Edition) (Richard J. Larsen, Morris L. Marx)

An Introduction to
Mathematical Statistics
and Its Applications
Fourth Edition
Richard J. Larsen
Vanderbilt University
Morris L. Marx
University of West Florida
Upper Saddle River, New Jersey 07458
Library of Congress Cataloging-in-Publication Data
Larsen, Richard J.
An introduction to mathematical statistics and its applications /
Richard J. Larsen, Morris L. Marx. 4th ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-13-186793-8
1. Mathematical statistics. I. Marx, Morris L. II. Title.
CIP data available
Editor-in-Chief/Acquisitions Editor: Sally Yagan
Project Manager/Formatter: Interactive Composition Corporation
Assistant Managing Editor: Bayani Mendoza de Leon
Senior Managing Editor: Linda Mihalov Behrens
Executive Managing Editor: Kathleen Schiaparelli
Manufacturing
Alexis Heydt-Long
Manufacturing Buyer: Maura Zaldivar
Marketing Manager: Halee Dinsey
Marketing Assistant: Joon Won Moon
Director of Creative Services: Paul Belfanti
Art Director: Jayne Conte
Cover
Bruce Kenselaar
Editorial Assistant: Jennifer Urban
Cover Image: Getty Images, Inc.
© 2006, 2001, 1986, 1981 Pearson Education, Inc.
Pearson Prentice Hall
Pearson Education, Inc.
Upper Saddle River, NJ 07458
All rights reserved. No part of this book may
be reproduced, in any form or by any means,
without permission in writing from the publisher.
Pearson Prentice Hall™ is a trademark of Pearson Education, Inc.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
ISBN 0-13-186793-8
Pearson Education Ltd., London
Pearson Education Australia PTY. Limited, Sydney
Pearson Education Singapore, Pte. Ltd.
Pearson Education North Asia Ltd., Hong Kong
Pearson Education Canada, Ltd., Toronto
Pearson Education de Mexico, S.A. de C.V.
Pearson Education Tokyo
Pearson Education Malaysia, Pte. Ltd.
Table of Contents

Preface

1 Introduction
1.1 A Brief History
1.2 Some Examples
1.3 A Chapter Summary

2 Probability
2.1 Introduction
2.2 Sample Spaces and the Algebra of Sets
2.3 The Probability
2.4 Conditional Probability
2.5 Independence
2.6 Combinatorics
2.7 Combinatorial
2.8 Taking a Second Look at Statistics (Enumeration and Monte Carlo Techniques)

3 Random Variables
3.1 Introduction
3.2 Binomial and Hypergeometric Probabilities
3.3 Discrete Random Variables
3.4 Continuous Random Variables
3.5 Expected Values
3.6 The Variance
3.7 Joint Densities
3.8 Combining Random
3.9 Further Properties of the Mean and Variance
3.10 Order Statistics
3.11 Conditional Densities
3.12 Moment-Generating Functions
3.13 Taking a Second Look at Statistics (Interpreting Means)
Appendix 3.A.1 MINITAB Applications

4 Special Distributions
4.1 Introduction
4.2 Distribution
4.3 Distribution
4.4
4.5 Negative Binomial Distribution
4.6 Gamma Distribution
4.7 Taking a Second Look at Statistics (Monte Carlo
Appendix 4.A.1 MINITAB Applications
Appendix 4.A.2 A Proof of the Central Limit Theorem

5 Estimation
5.1 Introduction
5.2 Estimating Parameters: The Method of Maximum Likelihood and the Method of Moments
5.3 Interval Estimation
5.4 Properties of Estimators
5.5 Minimum-Variance
    The Cramer-Rao Lower Bound
5.6 Sufficient
5.7 Consistency
5.8 Bayesian Estimation
5.9 Taking a Second Look at Statistics (Revisiting the Margin of Error)
Appendix 5.A.1 MINITAB Applications

6 Hypothesis Testing
6.1 Introduction
6.2 The Decision Rule
6.3 Binomial p = p0
6.4 Type I and Type II Errors
6.5 A Notion of Optimality: Generalized Likelihood
6.6 Taking a Second Look at Statistics (Statistical Significance versus "Practical" Significance)

7 The Normal Distribution
7.1 Introduction
7.2 Comparing
7.3 Deriving the Distribution of
7.4 Drawing Inferences About μ
7.5 Drawing Inferences About σ²
7.6 Taking a Second Look at Statistics ("Bad" Estimators)
Appendix 7.A.1 MINITAB Applications
Appendix 7.A.2 Some Distribution Results for Y and
Appendix 7.A.3 A Proof of Theorem 7.5.2
Appendix 7.A.4 A Proof that the One-Sample t Test Is a

8 Types of Data: A Brief
8.1 Introduction
8.2 Classifying Data
8.3 Taking a Second Look at Statistics (Samples Not "Valid")

9 Two-Sample Problems
9.1 Introduction
9.2 Testing H0: μX = μY: The Two-Sample t Test
9.3 Testing H0: σX² = σY²: The F Test
9.4 Binomial Data: Testing H0: pX = pY
9.5 Confidence Intervals for the Two-Sample Problem
9.6 Taking a Second Look at Statistics (Choosing Samples)
Appendix 9.A.1 A Derivation of the Two-Sample t Test (A Proof of Theorem 9.2.2)
Appendix 9.A.2 MINITAB Applications

10 Goodness-of-Fit Tests
10.1 Introduction
10.2 The Multinomial Distribution
10.3 Goodness-of-Fit Tests: All Parameters Known
10.4 Goodness-of-Fit Tests: Parameters Unknown
10.5 Contingency Tables
10.6 Taking a Second Look at Statistics (Outliers)
Appendix 10.A.1 MINITAB Applications

11 Regression
11.1 Introduction
11.2 The Method of Least Squares
11.3 The Linear Model
11.4 Covariance and Correlation
11.5 The Bivariate Normal Distribution
11.6 Taking a Second Look at Statistics (How Not to Interpret the Sample Correlation Coefficient)
Appendix 11.A.1 MINITAB Applications
Appendix 11.A.2 A Proof of Theorem 11.3.3

12 The Analysis of Variance
12.1 Introduction
12.2 The F Test
12.3 Multiple Comparisons: Tukey's Method
12.4 Testing Subhypotheses with Contrasts
12.5 Data Transformations
12.6 Taking a Second Look at Statistics (Putting the Subject of Statistics Together: The Contributions of Ronald A. Fisher)
Appendix 12.A.1 MINITAB Applications
Appendix 12.A.2 A Proof of Theorem
Appendix 12.A.3 The Distribution of
    When H1 Is

13 Randomized Block Designs
13.1 Introduction
13.2 The F Test for a Randomized Block Design
13.3 The Paired t Test
13.4 Taking a Second Look at Statistics (Choosing Between a Two-Sample t Test and a Paired t Test)
Appendix 13.A.1 MINITAB Applications

14 Nonparametric Statistics
14.1 Introduction
14.2 The Sign Test
14.3 Wilcoxon Tests
14.4 The Kruskal-Wallis Test
14.5 The Friedman Test
14.6 Testing for Randomness
14.7 Taking a Second Look at Statistics (Comparing Parametric and Nonparametric Procedures)
Appendix 14.A.1 MINITAB Applications

Appendix: Statistical Tables
Answers to Selected Odd-Numbered Questions
Bibliography
Index
Preface
our text has been sufficiently well received to justify this fourth
who use the text like the coupling of the rigorous
and structured treatment of probability and statistics with real-world case studies and
users of the book have been helpful in pointing out ways to improve our
changes found in this fourth edition reflect the many helpful suggestions
we
as well as our own experience in teaching from the text.
Our first goal in writing this fourth edition was to continue strengthening the bridge
between theory and practice. To that end, we have added sections at the end of
each chapter,
Taking a Second Look at Statistics. These sections discuss practical problems
in applying the ideas in the chapter and also deal with common misunderstandings or
faulty approaches. We also have included a new section on Bayesian estimation that fits
well into Chapter 5 on estimation and gives another view of estimation.
It introduces students to Bayesian ideas and also serves to reinforce the
main concepts of estimation.
ideas that are useful and important lie beyond the mathematical scope of the text.
To
such topics within the mathematical context of the book, we have
and
the material on simulation and on the use of Monte Carlo studies.
MINITAB is the main tool for simulations and demonstrating computer computations;
the MINITAB sections have been rewritten to conform to Version 14, the latest release.
.... '''rTl •• r to
of the book has been the length of time required to cover
Chapters 2 and 3. One of the major changes in the fourth edition is a substantial revision of the
basic probability material. Chapters 2 and 3 have been reorganized and rewritten with
the goal of
a streamlined presentation. These chapters are now easier to teach and
can be covered in
less time, yet without loss of rigor.
In that same spirit, we have also improved and streamlined the development of the t, chi square,
and F distributions in Chapter 7, the heart of the book. The material
to
the development of the chi square distribution. In addition, we
have made a much better division between the theoretical results and their applications.
Because of the efficiencies in the new edition, covering Chapters 1-7 plus
additional
in one semester is now possible.
All in all, we feel that this new edition furthers our objective of
a book that
emphasizes the interrelation between probability theory, mathematical statistics, and
data analysis. As in previous editions, real-world case studies and historical anecdotes
provide valuable tools to effect the integration of these three areas.
this approach. ..:ILLIUClIll:>
the classroom has strengthened our
importance of each area when seen in the context of the other two.
SUPPLEMENTS
Instructor's Solutions Manual. This resource contains worked-out solutions to all text
exercises.
Student Solutions Manual. Featuring complete solutions to selected exercises,
this is a
tool for students
as they study and work through
the problem material.
ACKNOWLEDGMENTS
We would like to thank the following reviewers for
detailed and valuable
criticisms,
and suggestions:
Ditlev Monrad, University of Illinois at Urbana-Champaign
Vidhu S. Prasad, University of Massachusetts, Lowell
Xu, California State University, Long Beach
Katherine St. Clair, Colby
Yimin
Michigan State University
Nicolas
University of California, Los Angeles
University of Oregon
Ohio University
University of California at San Diego
Finally, we
our
to Prentice Hall's math editor-in-chief
and acquisitions editor,
Sally Yagan,
assistant managing editor, Bayani
Mendoza de Leon,
and editorial assistant,
Jennifer Urban, as well as to project manager, Jennifer Crotteau,
of Interactive Composition
Corporation, for
excellent teamwork in the production
of
book.
Richard J. Larsen
Nashville, Tennessee
Morris L. Marx
Pensacola, Florida
CHAPTER
1
Introduction
1.1 A BRIEF HISTORY
1.2 SOME EXAMPLES
1.3 A CHAPTER SUMMARY
Francis Galton
"Some people hate the very name of statistics, but I find them full of beauty
and interest. Whenever they are not brutalized, but delicately handled by
the higher methods, and are warily interpreted, their power of dealing
with complicated phenomena is extraordinary. They are the only tools by
which an opening can be cut through the formidable thicket of difficulties
that bars the path of those who pursue the Science of man."
-Francis Galton
1.1 A BRIEF HISTORY
Statistics is the science of sampling. How one set of measurements differs from another and
what
implications of those differences
are its
Conceptually,
the subject is rooted in the mathematics of probability, but its applications are everywhere.
Statisticians are as likely to be found in a research lab or a field station as they are in a
government
an advertising firm, or a
classroom.
Properly
statistical techniques can be enormously effective in clarifying and
quantifying natural phenomena. Figure 1.1.1 illustrates a case in point. Pictured at the top is
a facsimile of the kind of data routinely recorded by a seismograph: listed chronologically
are the occurrence times and Richter magnitudes for a series of earthquakes. Viewed in
that format, the numbers are largely
No patterns are
nor is there
any obvious connection between the frequencies of tremors and their severities.
By way of contrast, the
bottom of Figure
1.1.1 shows a statistical summary (using some of
the
techniques we will learn
of a set of seismograph data recorded in
southern California (66). Plotted above the Richter (R) value of 4.0, for example, is the
number (N) of earthquakes occurring per year in that region with magnitudes
in the range
to
Similar points are included for R-values centered at 4.5, 5.0, 5.5,
6.0, 6.5, and 7.0. Now we can see that the two variables are related: Describing
the (N, R)'s exceptionally well is the equation N = 80,338.16e^{-1.981R}.

FIGURE 1.1.1 (top: facsimile of a seismograph record listing dates, times, and Richter magnitudes; bottom: number of earthquakes per year, N, plotted against magnitude on Richter scale, R, with the fitted curve N = 80,338.16e^{-1.981R})
In general, statistical
techniques are employed
to (1) describe what did happen
or (2) predict what might happen. The
graph at the bottom of Figure 1.1.1 does both.
"fit" the
N = \beta_0 e^{\beta_1 R} to the observed set of minor tremors (and finding
\beta_0 = 80,338.16 and \beta_1 = -1.981), we can then use that same equation to predict the
of events not represented in
the data set. If R = 8.0, for example,
we would
expect N to equal 0.01:
N = 80,338.16e^{-1.981(8.0)}
= 0.01
This implies that Californians can expect
catastrophic earthquakes registering on the order
of 8.0 on the Richter
scale to occur, on the average,
once every 100 years.
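
The extrapolation to R = 8.0 can be checked numerically. The short Python sketch below assumes only the two fitted coefficients quoted in the text (80,338.16 and -1.981); the function name `expected_quakes_per_year` is hypothetical, introduced here purely for illustration.

```python
import math

# Fitted coefficients quoted in the text for the southern California data.
BETA_0 = 80338.16
BETA_1 = -1.981

def expected_quakes_per_year(magnitude):
    """Predicted yearly count N = BETA_0 * exp(BETA_1 * R) (hypothetical helper)."""
    return BETA_0 * math.exp(BETA_1 * magnitude)

n_at_8 = expected_quakes_per_year(8.0)
print(round(n_at_8, 2))    # predicted yearly frequency at R = 8.0, to two decimals
print(1.0 / n_at_8)        # implied average number of years between such events
```

The unrounded prediction is slightly above 0.01, so the implied waiting time is a little under the 100 years obtained from the rounded figure.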
It is unarguably true that the interplay between
and
to
what we see in Figure 1.1.1, is the
most
theme in statistics. Additional
highlighting
connection will be discussed in Section 1.2.
To set the stage for the rest of the
though, we will conclude Section 1.1 with brief
histories of probability and statistics.
are interesting stories, replete with large casts
of unusual characters and plots that have more than a
few unexpected
twists and turns.
Probability: The Early Years
No one knows where or when the notion of chance first arose; it fades into our prehistory.
Nevertheless, evidence linking early humans with devices for
generating random events is
plentiful: Archaeological digs, for example, throughout the ancient world consistently turn
up a
overabundance of astragali, the heel bones of sheep and
Why should the frequencies of these bones be so
hypothesize that our
were fanatical foot
but two other explanations
seem more plausible: The
were used for religious ceremonies and for gambling.
FIGURE 1.1.2 Sheep astragalus

Astragali have six sides but are not symmetrical (see Figure 1.1.2). Those found
in
typically have their sides
numbered or engraved. For many ancient
civilizations, astragali were the primary
through which oracles
the
opinions of their gods. In Asia Minor, for example, it was
in divination rites to
roll, or cast, five astragali. Each possible configuration was
with the name of a god and
with it the sought-after advice. An outcome of (1, 3, 3, 4, 4), for instance,
was
to
the throw of the savior
and its
was taken as a sign of
encouragement (36):
One one, two threes, two fours
The deed which thou meditatest, go do it boldly.
Put thy hand to it. The gods have given thee
favorable omens
Shrink not from it in thy
mind, for no evil
shall befall thee.
on the other hand, the throw of
the child-eating Cronos, would send
for cover:
Three fours and two sixes. God
as follows.
Abide in thy house, nor go elsewhere,
Lest a ravening and destroying beast come nigh thee.
For I see not that this business is safe. But bide
thy time.
Gradually, over thousands of years,
were
by dice,
became
most common means for
random events. Pottery
found in
tombs
before 2000 B.C.; by the time the Greek
was
in full
dice were
(Loaded dice have also been found. Mastering the
mathematics of probability would
to be a formidable task for our ancestors, but
they
learned how to cheat!)
The lack of historical records blurs the distinction
drawn between divination
ceremonies and recreational gaming. Among more recent societies, though, gambling
emerged as a distinct entity, and
popularity was irrefutable. The
and Romans
(91),
were consummate
as were
early
for many
Roman games
been lost, but we can
the lineage of certain modern diversions in what was played
the Middle Ages.
most
game of that period was
hazard, the name deriving from
al zhar,
means "a
" Hazard is thought to have
brought to
by soldiers returning from the
its rules are much like those of our TYlr''''''''''''' ..
craps. Cards were first introduced in the fourteenth century and immediately gave rise to
a game known as Primero, an
form
Board
such as backgammon,
were also
during this
Given
rich tapestry of
and the
with gambling that characterized
so much
Western world, it may seem more than a
that a formal
study of probability was not undertaken sooner than it was. As we will see
first instance of anyone conceptualizing probability in terms of a mathematical
occurred in the sixteenth
That means that more than 2000 years of dice games,
card
and board
passed by
someone finally had the insight to write
down even the simplest probabilistic abstractions.
Historians generally agree that, as a subject, probability got off to a rocky start because of its
incompatibility with two of the most dominant
in the evolution of our Western
culture, Greek philosophy and early Christian
The
were comfortable
with the notion of chance (something the Christians were not), but it went against
nature to suppose that random events could be quantified in any
fashion.
believed that any
to
mathematically what did happen with what should
have happened was,
phraseology, an improper juxtaposition
"earthly plane"
with the "heavenly plane."
Making matters worse was the antiempiricism that permeated
thinking. Knowledge, to them, was not something
should be
by
It was better
to reason out a question logically than to search for
explanation in a set of numerical
observations.
these two attitudes had a deadening effect: The Greeks had no
motivation to
about probability in any
sense, nor were they
with
problems of interpreting data
might have pointed them in
the direction of a
probability calculus.
If the prospects for the study of probability were dim under the
they became
even worse when Christianity broadened its sphere of influence.
Greeks and Romans
at least accepted the existence of chance. They believed their gods to be either unable or
unwilling to get involved in matters so mundane as the outcome of the roll of a die.
writes:
Nothing is so uncertain as a cast of dice, and yet there is no one who plays often who does
not make a Venus-throw¹ and occasionally twice and thrice in succession. Then are we, like
fools, to
to say that it happened by the direction of Venus rather than by chance?
For the early Christians,
there was no such thing as chance.
Every event that
happened, no matter how
was perceived to be a direct manifestation of God's
deliberate intervention. In the words of St. Augustine:
Nos eas causas quae dicuntur fortuitae ... non dicimus nullas,
sed latentes; easque tribuimus vel veri Dei ...
(We say that those causes that are said to be by chance
are not non-existent but are hidden,
and we attribute
them to the will of the true God ... )
Taking this
position makes the study of probability moot, and it makes a probabilist
a heretic. Not surprisingly, nothing of significance was accomplished in the subject
for the next fifteen hundred years.
It was in the sixteenth century
that probability, like a mathematical Lazarus, arose
from the dead.
Orchestrating its
resurrection was one of the
most eccentric
in the
history of mathematics, Gerolamo Cardano. By his
own admission, Cardano
personified the best and the worst, the Jekyll and the Hyde, of the Renaissance man.
He was born in 1501 in Pavia. Facts about his personal life are difficult to verify. He
wrote
an autobiography, but his
penchant for lying raises doubts about much of what he
says.
¹ When rolling four astragali, each of which is numbered on four sides, a Venus-throw was having each of the
four numbers appear.
Whether true or not, though, his "one-sentence" self-assessment paints an interesting
Nature has made me capable in all manual work, it has given me the spirit of a philosopher
and ability in the
taste and good manners, voluptuousness,
it has made
me
faithful, fond of wisdom,
inventive, courageous, fond of learning and
teaching, eager to equal the best, to discover new things and make independent progress, of
modest character, a student of medicine, interested in curiosities and discoveries, cunning,
sarcastic, an initiate in the mysterious lore, industrious, diligent,
living only
from day to day,
impertinent, contemptuous of religion,
sad, treacherous,
magician and sorcerer, miserable, hateful, lascivious, obscene, lying, obsequious, fond of
the prattle of old men, changeable,
indecent, fond of women, quarrelsome, and
because of the conflicts between my nature and soul I am not understood even by those with
whom I associate most frequently.
Formally trained in medicine, Cardano's interest in probability derived from his
addiction to gambling. His love of dice and cards was so all-consuming that he is
said to have once sold all his wife's possessions
to
table stakes! Fortunately,
something positive came out of Cardano's
He began looking for
a mathematical
model that would describe, in some abstract way, the outcome of a random event.
What he eventually
is now called the classical definition of probability: If the
total number of possible outcomes, all equally likely,
associated with some action is n,
and if m of
n result in the occurrence of some
event,
the probability
of that event is m/n. If a fair die is rolled, there are n = 6
outcomes. If the
event "outcome is greater than or equal to 5" is the one in which we are interested,
then m = 2 (the outcomes 5 and 6) and the probability of the event is 2/6, or 1/3 (see
Figure 1.1.3).
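
The classical definition translates directly into a counting exercise. The Python sketch below is only an illustration of the m/n rule under the stated assumption that all outcomes are equally likely; the function name `classical_probability` is hypothetical and not part of the text.

```python
from fractions import Fraction

def classical_probability(outcomes, event):
    """Return m/n, where n is the number of equally likely outcomes and
    m is the number of those outcomes for which event(outcome) is true."""
    outcomes = list(outcomes)
    m = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(m, len(outcomes))

die = range(1, 7)  # the n = 6 faces of a fair die
p = classical_probability(die, lambda face: face >= 5)
print(p)  # 1/3
```

Using exact fractions rather than floating-point numbers keeps the result in the same form as the text's 2/6, reduced to lowest terms.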
Cardano had tapped into the most basic principle in probability. The model he discovered
may seem trivial in retrospect, but it represented a giant step forward: His
was
the first recorded instance of anyone computing a theoretical, as opposed to an empirical,
probability.
the actual impact of Cardano's work was minimal. He wrote a book in
1525,
its publication was delayed until 1663. By
the
of the Renaissance, as
well as interest in probability, had shifted from Italy to France.

FIGURE 1.1.3 (possible outcomes 1 through 6; outcomes greater than or equal to 5 have probability 2/6)

The date cited by many historians (those who
are not Cardano supporters) as the
"beginning" of probability is 1654. In Paris a well-to-do gambler,
Chevalier de Mere,
asked several prominent mathematicians,
Blaise Pascal, a series of questions,
the best-known of which was the problem of points:
Two people, A and B, agree to play a series
of fair games until one person has won six games.
They each have wagered the same amount of money, the intention
that the winner
will be awarded the entire pot. But suppose, for whatever reason, the series is prematurely
terminated, at which point A has won five
and B three. How should the stakes be
divided?
[The correct answer is that A should receive seven-eighths
of the total amount wagered.
(Hint: Suppose the contest were
What scenarios would lead to A's being the
first person to win six games?)]
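
One way to follow the hint is to compute the probability that A would have reached six wins first had play continued, and to award A that fraction of the pot. The Python sketch below does this by recursion on the number of games each player still needs; the function name `prob_a_wins` is hypothetical, and dividing the stakes in proportion to each player's chance of winning is an assumption of this sketch rather than a rule stated in the passage.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def prob_a_wins(a_needs, b_needs):
    """Probability that A wins the series when A still needs a_needs games,
    B still needs b_needs games, and each game is a fair 50-50 contest."""
    if a_needs == 0:
        return Fraction(1)   # A has already won the series
    if b_needs == 0:
        return Fraction(0)   # B has already won the series
    half = Fraction(1, 2)
    # The next game goes to A or to B with probability 1/2 each.
    return (half * prob_a_wins(a_needs - 1, b_needs)
            + half * prob_a_wins(a_needs, b_needs - 1))

# A has won five games and B three in a race to six.
share_a = prob_a_wins(6 - 5, 6 - 3)
print(share_a)  # 7/8
```

B takes the pot only by winning three games in a row, which has probability 1/8, so A's share is the remaining 7/8.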
Pascal was intrigued by de Mere's questions and shared his thoughts with Pierre Fermat,
a Toulouse civil servant and probably
most
in Europe. Fermat
graciously replied, and from the now
correspondence came not
only the solution to the problem of points
but the foundation for
more general results.
More significantly, news of what Pascal and Fermat
were working on spread quickly.
Others got involved, of
was the Dutch
and mathematician
Christiaan Huygens. The
that plagued Cardano a century
earlier were not going to
again.
Best remembered for his work in optics and astronomy, Huygens, early in his career,
was intrigued by the problem of points. In 1657 he published De
Ratiociniis in Aleae Ludo
(Calculations in Games of Chance),
a very significant
more comprehensive than
anything Pascal and Fermat had done. For almost fifty years it was the standard
in the theory of probability. Not surprisingly, Huygens
has supporters who
he should be credited as the founder of probability.
Almost all the mathematics of probability was still waiting to be discovered. What Huygens
wrote was only the humblest of beginnings, a set of
little resemblance to the topics we teach today. But
mathematics of probability was finally on firm footing.
Statistics: From Aristotle to Quetelet
Historians generally agree that the basic principles of statistical reasoning began to
coalesce in the middle of the nineteenth century. What
was the
union of three different "sciences," each of which had
along more or
less independent lines (206).
The first of these sciences, what the Germans called Staatenkunde,
collection of
comparative information on the history, resources, and military prowess
Although efforts in this direction peaked in the seventeenth and eighteenth
centuries, the concept was hardly new: Aristotle had done something
in the fourth century B.C. Of the three movements, this one had the least influence on
modern statistics, but it did contribute some terminology: The word statistics,
arose in connection with studies of this type.
The
movement, known as political arithmetic, was defined by
one of
proponents as "the art of
reasoning by figures, upon things relating to government." Of
more recent vintage than Staatenkunde, political
arithmetic's roots were in seventeenth-century England. Making population estimates and constructing mortality tables were
two of the
problems it frequently dealt with. In spirit, political arithmetic was
to
what is now called demography.
The
component was the
a calculus of probability. As we saw
earlier, this was a movement that essentially started in seventeenth-century France in
response to
questions, but it quickly became
the "engine" for analyzing
all kinds of data.
Staatenkunde: The Comparative Description of States
The need for
information on the customs and resources of nations has been
obvious since antiquity.
Aristotle is credited with the first major
toward that
objective: His Politeiai, written in the fourth century B.C., contained
descriptions
of some 158 different city-states. Unfortunately, the thirst for
that led to the
Politeiai fell victim to the intellectual
of the Dark Ages,
and almost 2000 years
elapsed before any similar projects
magnitude were undertaken.
The subject resurfaced during
and the Germans showed the most
They not only gave it a
meaning the comparative
but
were also the first
(1660) to incorporate the
into a
curriculum. A leading figure in the
movement was Gottfried Achenwall, who
taught at the University of Gottingen
the middle of the
Achenwall's claims to fame is that he was the first to use the word statistics in
print. It
in the preface of his 1749 book Abriss der Statswissenschaft der
vornehmsten
Reiche und Republiken.
(The word comes from the Italian root
stato,
implying that a statistician is someone concerned with government
affairs.) As
it seems to have been
For almost one hundred
years the word statistics continued to be associated with the comparative description of
states. In the middle of the nineteenth century, though, the term was redefined, and
statistics became the new name for
what had previously
political arithmetic.
How important was the work of Achenwall and his predecessors to the development of
statistics? That would be difficult to say. To
be sure, their contributions were more indirect
than direct. They left no
and no general theory. But they did
point out the
need for collecting accurate data
more importantly,
the notion
that something complex, even as complex as an entire nation, can be
studied
by gathering information on its
parts. Thus, they were
important
support to the then growing
that induction, rather than deduction, was a more surefooted
to scientific truth.
Political Arithmetic
In the sixteenth century
the English government
to compile records, called bills
of mortality, on a parish-to-parish basis,
numbers of deaths and their underlying
causes. Their motivation largely stemmed from the
epidemics that had periodically
ravaged
in the not-too-distant past and were
to become a problem in
England. Certain
officials, including the very
Thomas Cromwell,
The total for the year: A General Bill for this present year, ending the 19 of December, 1665,
according to the Report made to the King's most excellent Majesty, by the Co. of Parish Clerks of
Lond., &c., gives the following summary of the results; the details of the several parishes we omit,
they being made as in 1625, except that the out-parishes were now 12:

Buried in the 97 Parishes within the walls                          15,207
  Whereof of the plague                                              9,887
Buried in the 16 Parishes without the walls                         41,351
  Whereof of the plague                                             28,838
At the Pest house, total buried                                        159
  Of the plague                                                        156
Buried in the 12 out-Parishes in Middlesex and Surrey               28,554
  Whereof of the plague                                             21,420
Buried in the 5 Parishes in the City and Liberties of Westminster   12,194
  Whereof of the plague                                              8,403
The total of all the christenings                                    9,967
The total of all the burials this year                              97,306
  Whereof of the plague                                             68,596

Causes of death itemized in the bill: Abortive and Stillborne; Apoplex and Suddenly; Bedrid;
Blasted; Bleeding; Bloody Flux, Scowring & Flux; Burnt and Scalded; Calenture; Cancer,
Gangrene & Fistula; Canker and Thrush; Childbed; Chrisomes and Infants; Cold &; Collick &;
Consumption & Tissick; Convulsion & Mother; Distracted; Dropsie & Timpany; Drowned;
Executed; Feaver; Flox & Smallpox; Found Dead in streets, fields, &c.; French Pox; Frighted;
Gout & Sciatica; Grief; Griping in the Guts; Hang'd & made away themselves; Headmould shot
and mould fallen; Jaundice; Impostume; Kill'd by several accidents; King's Evill; Leprosie;
Lethargy; Livergrown; Meagrom and Headach; Measles; Murthered & Shot; Overlaid & Starved;
Poysoned; Quinsie; Rickets; Rising of the Lights.
~
~
....
...
~
~~~~.
Palsie....................
30
Plague. . . . . . . . .. .. . .. .. .. . . . . . .. .. 6&,596
Planllel ...................
6
20
18
8
3
RupI;~re.
.. . .......... .. .. . . . . ....
SCurt)'..... .......................
Shingles & SwiJle Po"..... . . . . . . . .
Sores, U Icen, Brokell aDd
Bnlised Llmb!...... . . . .. .........
15
10$
2
82
56
III
Spleen............. ...... .........
14
Spotted Feave, & Purples.........
t,929
625
SlOpping <>l the SlOOlach . . ........
Slone snd Slmngll&ty .. .. .. .. . .. . .
Surfe .... .......... ..............
Teelh & Worms .................
Vomiting........... ..............
We.m ............................
332
'lS
2,614
51
I'>
In all.... .........................
9.961
45
Female..... ........... .......... 4,853
Females .......................... 48,731
1..251
InaIL ............................ 97,JQ6
Of Ihe Plague" .................................................................................................... '". ........ 68,596
In<rellse iii Ihe Burials in tbe 130 Pariol1es and the Peslh"'-"-t Ihis year. . . . . . . . ........••. . . . . . ...•••••.... . . . . . . . . . .•••. . . . . . .. 79lXh
Increase oi: the Plague in lhe 130 Parishes and the Pe5thouse Ihis year ............. ""............. ..................... ........ 68,590
FIGURE 1.1.4
felt that these bills would prove invaluable in helping to control the spread of an epidemic.
At first. the bills were published only occasionally. but by the early seventeenth century
they had become a weekly institution?
Figure 1.1.4 (155) shows a portion of a bill that appeared in London in 1665. The gravity
of the plague epidemic is strikingly apparent when we look at the numbers at the top: Out
of 97,306 deaths, 68,596 (over 70%) were caused by the plague. The breakdown of certain
other afflictions, though they caused fewer deaths, raises some interesting questions. What
happened, for example, to the 23 people who were "frighted" or to the 397 who suffered
from "rising of the lights"?
² An interesting account of the bills of mortality is given in Daniel Defoe's A Journal of the Plague Year,
which purportedly chronicles the London plague outbreak of 1665.
Among the faithful subscribers to the bills was John Graunt, a London merchant.
Graunt not only read the bills, he studied them intently. He looked for patterns, computed death rates, devised ways of estimating population sizes, and even set up a primitive
life table. His results were published in the 1662 treatise Natural and Political Observations
upon the Bills of Mortality. This work was a landmark: Graunt had launched the twin sciences of vital statistics and demography, and, although the name came later, it also signaled
the beginning of political arithmetic. (Graunt did not have to wait long for accolades: in the
year his book was published, he was elected to the prestigious Royal Society of London.)
High on the list of innovations that made Graunt's work unique were his objectives.
Not content simply to describe a situation, although he was adept at doing so, Graunt often
sought to go beyond his data and make generalizations (or, in current statistical terminology, draw inferences). Having been blessed with this particular turn of mind, he almost
certainly qualifies as the world's first statistician. All Graunt really lacked was the probability theory that would have enabled him to frame his inferences more mathematically.
That theory, though, was just beginning to unfold several hundred miles away in France.
Other seventeenth-century writers were quick to follow through on Graunt's ideas.
William Petty's Political Arithmetick was published in 1690, although it was probably
written some fifteen years earlier. (It was Petty who gave the movement its name.)
Perhaps even more significant were the contributions of Edmund Halley (of "Halley's
comet" fame). Principally an astronomer, he also dabbled in political arithmetic, and in
1693 wrote An Estimate of the Degrees of the Mortality of Mankind, drawn from Curious
Tables of the Births and Funerals at the city of Breslaw; with an attempt to ascertain
the Price of Annuities upon Lives. (Book titles were longer then!) Halley shored up,
mathematically, the efforts of Graunt and others to construct an accurate mortality table.
In doing so, he laid the foundation for the important theory of annuities. Today, all life
insurance companies base their premium schedules on methods similar to Halley's. (The
first company to follow his lead was The Equitable, founded in 1765.)
For all its initial flurry of activity, political arithmetic did not fare particularly well in the
eighteenth century, at least in terms of having its methodology fine-tuned. Still, the second
half of the century did see some notable achievements in improving the quality of the
databases: Several countries, including the United States in 1790, established a periodic
census. To some extent, answers to the questions that interested Graunt and his followers
had to be deferred until the theory of probability could develop just a little bit more.
Quetelet: The Catalyst
With political arithmetic furnishing the data and many of the questions, and the theory
of probability holding out the promise of rigorous answers, the birth of statistics was at
hand. All that was needed was a catalyst, someone to bring the two together. Several
individuals served with distinction in that capacity. Karl Friedrich Gauss, the superb
German mathematician and astronomer, was especially helpful in showing how statistical
concepts could be useful in the physical sciences. Similar efforts in France were made
by Laplace. But the man who perhaps best deserves the title of "matchmaker" was a
Belgian, Adolphe Quetelet.
Quetelet was a mathematician, astronomer, physicist, sociologist, anthropologist, and
poet. One of his passions was collecting data, and he was fascinated by the regularity of
social phenomena. In commenting on the nature of criminal tendencies, he once wrote
(69):
Thus we pass from one year to another with the sad perspective of seeing the same crimes
reproduced in the same order and calling down the same punishments in the same proportions.
Sad condition of humanity! ... We might enumerate in advance how many individuals will
stain their hands in the blood of their fellows, how many will be forgers, how many will be
poisoners, almost we can enumerate in advance the births and deaths that should occur. There
is a budget which we pay with a frightful regularity; it is that of prisons, chains and the scaffold.
Given such an orientation, it was not surprising that Quetelet would see in probability
theory a means of expressing human behavior. For much of the nineteenth century he
championed the cause of statistics, and as a member of more than
one hundred learned societies, his influence was enormous. When he died in 1874, statistics
had been brought to the brink of its modern era.
1.2 SOME EXAMPLES
Do stock markets rise and fall randomly? Is there a common element in the aesthetic
standards of the ancient Greeks and the Shoshoni Indians? Do external forces, such as
phases of the moon, affect admissions to mental hospitals? What kind of relationship
exists between exposure to radiation and cancer mortality?
These questions are quite diverse in content, but they share some important similarities.
All are difficult or impossible to study in a laboratory, and none are likely to yield
to deductive reasoning. Indeed, these are precisely the sorts of questions that are usually
answered by collecting data, making assumptions about the process that generated the
data, and then drawing inferences about those assumptions.
CASE STUDY 1.2.1
Each evening, radio and TV reporters offer a bewildering array of averages and
indices that presumably indicate the state of the stock market. Are those numbers
conveying any really useful information? Some financial analysts would say
"no," arguing that speculative markets tend to rise and fall randomly, much as though
a roulette wheel were spinning out the numbers. How might that "theory" be tested?
We would begin by constructing a model that should describe the behavior of the
market if the (random) hypothesis were true. To that end, the notion of "random"
can be translated into two assumptions:
a. The chances of the market's rising or falling on a given day are unaffected by its
actions on any previous days.
b. The market is equally likely to go up or down.
Measuring the day-to-day randomness, or its absence, in the market's movements
can be accomplished by looking at the lengths of runs. By definition, a run of downturns
of length k is a sequence of days starting with a rise, followed by k consecutive declines,
then followed by a rise. So, for example, a daily sequence of the form
(rise, fall, fall, rise) is a run of length two.
If the actual distribution of the market's run lengths differs markedly from the
predictions of assumptions (a) and (b), the random-movement hypothesis can be
rejected. Fortunately, calculating the "expected" number of (randomly-generated)
runs is straightforward.
Suppose a rise is followed by a fall. For a run of length one, the market
must next rise. By assumptions (a) and (b), this happens half the time, so a probability
of $\frac{1}{2}$ would be assigned to a run of length one. The notation for this will be $P(1) = \frac{1}{2}$.
The other half of the time the market falls, giving the sequence (rise, fall, fall). A run
of length two occurs if there is now a rise. Again, this happens half the time, that is, in
half of the half represented by the (rise, fall, fall) sequences. Thus, the
probability of a run of length two is $P(2) = \frac{1}{2} \cdot \frac{1}{2} = \left(\frac{1}{2}\right)^2$. Continuing in this manner, it
follows that a run of length k has probability $\left(\frac{1}{2}\right)^k$. Furthermore, if there are T total
runs, it seems reasonable to expect $T \cdot \left(\frac{1}{2}\right)^k$ of them to be of length k.
Table 1.2.1 gives the distribution of 120 runs of downturns observed in daily closing
prices of the Standard and Poor's 500 stock index between February 1994 and
February 9, 1996. The third column gives the corresponding expected frequencies, as
calculated from the expression $T \cdot \left(\frac{1}{2}\right)^k$, where $T = 120$.
Notice that the agreement between actual and predicted run frequencies seems good
enough to lend at least some credence to assumptions (a) and (b). However, the
expected numbers of longer runs (4, 5, and 6+) do not fit the distribution particularly
well. The reason for that might be that the "equally likely" provision in assumption (b)
is too restrictive and should be replaced by the probability p as given in assumption (c):
c. The likelihood of a fall in the market is some number p, where $0 \le p \le 1$.
TABLE 1.2.1: Runs in the Closing Prices for the S&P 500 Stock Index
Run Length, k    Observed    Expected
1                   67         60.00
2                   28         30.00
3                   18         15.00
4                    3          7.50
5                    2          3.75
6+                   2          3.75
Total              120
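The expected column of Table 1.2.1 follows directly from the expression T(1/2)^k. The short Python sketch below reproduces that arithmetic. The observed counts are copied from the table; the helper name `expected_fair_counts` is hypothetical, and the pooled "6+" category is assumed to collect the whole tail of the distribution, whose probability is (1/2)^5.

```python
# Observed run counts from Table 1.2.1; "6+" pools all runs of length six or more.
observed = {"1": 67, "2": 28, "3": 18, "4": 3, "5": 2, "6+": 2}
T = sum(observed.values())  # total number of runs


def expected_fair_counts(total_runs, max_k=5):
    """Expected run counts under assumptions (a) and (b): P(k) = (1/2)**k."""
    expected = {str(k): total_runs * 0.5 ** k for k in range(1, max_k + 1)}
    # Tail category (assumption): P(K >= max_k + 1) = (1/2)**max_k.
    expected[f"{max_k + 1}+"] = total_runs * 0.5 ** max_k
    return expected


expected = expected_fair_counts(T)
for category, count in observed.items():
    print(f"k = {category:>2}: observed {count:3d}, expected {expected[category]:6.2f}")
```

Because the tail category absorbs all remaining probability, the expected counts add up to T, just as the observed counts do.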
Invoking assumptions (a) and (c), then, allows for the run length probabilities to be
recalculated. For example, following a (rise, fall) sequence, a rise would be expected
100(1 - p)% of the time, so $P(1) = 1 - p$. Another fall, of course, would occur the
remaining 100p% of the time. After that second fall, the chance of the next change being a
rise is again 1 - p. The probability of the sequence
(rise, fall, fall, rise), that is, a run of length two, is
$P(2) = p(1 - p)$. In general, $P(k) = p^{k-1}(1 - p)$.
Two questions now arise. Whichever of the two is more appropriate for further study
depends on the needs and interests of the model maker.
1. Is the initial assumption $p = \frac{1}{2}$ justified?
2. Given the observed data, what is the best choice (or estimate) for p?
To answer Question 1, we must decide whether the discrepancies between observed
and expected run lengths are small enough to be attributed to chance or large enough
to render the model invalid. One way to answer Question 2 is to find the value of p that
best "explains" the observations, in terms of maximizing their likelihood of occurring.
For the data from which Table 1.2.1 was derived, this value of p turns out to be
p = 0.43. The corresponding expected values, based on $P(k) = p^{k-1}(1 - p) =
(0.43)^{k-1}(0.57)$, are given in column 3 of Table 1.2.2.
TABLE 1.2.2: Runs in the Closing Prices for the S&P 500 Stock Index
Run Length, k    Observed    Expected [p = 0.43]
1                   67         68.4
2                   28         29.4
3                   18         12.6
4                    3          5.4
5                    2          2.3
6+                   2          1.9
Total              120
Has assumption (c) provided a noticeably better fit? Yes. In five of the six run-length
categories, the expected frequencies in Table 1.2.2 are closer to the corresponding
observed frequencies than was true for their counterparts in Table 1.2.1. Moreover,
both models ($p = \frac{1}{2}$ and $0 < p < 1$) are in substantial agreement with the hypothesis
that up-and-down movements in the market look like a random sequence.
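One way to see where a value such as p = 0.43 can come from is to maximize the likelihood under the model P(k) = p^(k-1)(1 - p). What follows is a minimal sketch, not necessarily the calculation behind the text: it assumes each run in the pooled "6+" category has length exactly six (the text does not say how those runs were handled), and the helper names are hypothetical. Under that assumption the maximizing value has the closed form 1 - (number of runs)/(sum of run lengths).

```python
# Observed run counts; key 6 stands in for the pooled "6+" category (assumption).
observed = {1: 67, 2: 28, 3: 18, 4: 3, 5: 2, 6: 2}


def geometric_mle(counts):
    """Maximum likelihood estimate of p when P(k) = p**(k - 1) * (1 - p)."""
    n_runs = sum(counts.values())
    total_falls = sum(k * n for k, n in counts.items())
    # Setting the derivative of the log-likelihood to zero gives p = 1 - n_runs / total_falls.
    return 1 - n_runs / total_falls


def expected_counts(p, total_runs, max_k=5):
    """Expected number of runs of length 1..max_k under the fitted model."""
    return [total_runs * p ** (k - 1) * (1 - p) for k in range(1, max_k + 1)]


p_hat = geometric_mle(observed)
p_rounded = round(p_hat, 2)
expected = expected_counts(p_rounded, sum(observed.values()))
print(f"p_hat = {p_hat:.4f}, rounded to {p_rounded}")
print([round(e, 1) for e in expected])
```

The rounded estimate is then plugged back into P(k) to obtain expected counts for run lengths one through five; the pooled tail category is left out because its treatment is an assumption here.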
CASE STUDY 1.2.2
Not all rectangles are created equal. Since antiquity, societies have expressed aesthetic
preferences for rectangles having certain width (w) to length (l) ratios. Plato, for example,
wrote that rectangles whose sides were in a $1 : \sqrt{3}$ ratio were especially pleasing.
(These are the rectangles formed from the two halves of an equilateral triangle.)
Another "standard" calls for the width-to-length ratio to be equal to the ratio of
the length to the sum of the width and the length. That is,
$$\frac{w}{l} = \frac{l}{w + l} \qquad (1.2.1)$$
Equation 1.2.1 implies that the width is $\frac{1}{2}(\sqrt{5} - 1)$, or approximately 0.618, times as
long as the length. The Greeks called this the golden rectangle and used it often in
their architecture (see Figure 1.2.1). Many other cultures were similarly inclined. The
Egyptians, for example, built their pyramids out of stones whose faces were golden
rectangles. Today, in our society, the golden rectangle remains an architectural and
artistic standard, and even items such as drivers' licenses and picture frames
have w/l ratios close to 0.618.
FIGURE 1.2.1
The fact that many societies have embraced the golden rectangle as an aesthetic
standard has two possible explanations. One, they "learned" to like it because of the
profound influence that Greek writers, philosophers, and artists have had on cultures
all over the world. Or two, there is something about human perception that
predisposes a preference for the golden rectangle.
Researchers in the field of experimental aesthetics have tried to test the plausibility
of those two hypotheses by seeing whether the golden rectangle is accorded any
special status by societies that had no contact whatsoever with the Greeks or their
legacy. One such study (39) examined the w/l ratios of rectangles sewn
by the Shoshoni Indians as decorations on their blankets and clothes. Table 1.2.3 lists
the ratios found for twenty such rectangles.
If, indeed, the Shoshonis also had a preference for golden rectangles, we would
expect their ratios to be "close" to 0.618. The average value of the entries in Table 1.2.3,
though, is 0.661. What does that imply? Is 0.661 close enough to 0.618 to support the
position that liking the golden rectangle is a human characteristic, or is 0.661 so far
from 0.618 that the only prudent conclusion is that the Shoshonis did not agree with
the aesthetics espoused by the Greeks?
TABLE 1.2.3: Width-to-Length Ratios of Shoshoni Rectangles
0.693
0.662
0.690
0.606
0.570
0.749
0.654
0.615
0.628
0.609
0.844
0.668
0.601
0.576
0.670
0.606
0.611
0.553
0.933
Making that judgment is an example of hypothesis testing, one of the predominant
formats used in statistical inference. Mathematically, hypothesis testing is based on
a variety of probability results covered in Chapters 2 through 5. "The Shoshonis and
their rectangles," then, will have to be put on hold until Chapter 6, where we learn how
to interpret the difference between a sample mean (= 0.661) and a hypothesized mean
(= 0.618).
Comment. Like $\pi$ and $e$, the ratio $l/w$ for golden rectangles (approximately 1.618, the
reciprocal of the ratio $w/l$ in Equation 1.2.1, and more commonly referred
to as either phi or the golden ratio) is an irrational number with all sorts of fascinating
properties and connections. Indeed, books have been written on phi; see, for
example, (106).
Algebraically, the solution of Equation 1.2.1 can be written as the continued fraction
$$\frac{l}{w} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}}}$$
Among the properties associated with phi is its relationship with the Fibonacci series.
The latter, of course, is the famous sequence where each term is the sum of its two
predecessors, that is,
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
Quotients of successive terms in the Fibonacci sequence alternate above and below phi
and they converge to phi:
1/1 = 1.000000
2/1 = 2.000000
3/2 = 1.500000
5/3 = 1.666666
8/5 = 1.600000
13/8 = 1.625000
21/13 = 1.615385
34/21 = 1.619048
55/34 = 1.617647
89/55 = 1.618182
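The convergence shown in the list above is easy to check numerically. The sketch below (function names are hypothetical) recomputes the first ten Fibonacci quotients, evaluates a truncated version of the continued fraction 1 + 1/(1 + 1/(1 + ...)), and confirms that the reciprocal of phi, roughly 0.618, satisfies w/l = l/(w + l) when the length is taken to be l = 1.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # golden ratio, approximately 1.618


def fibonacci_quotients(count):
    """Return the first `count` quotients of successive Fibonacci terms: 1/1, 2/1, 3/2, ..."""
    a, b = 1, 1
    quotients = []
    for _ in range(count):
        quotients.append(b / a)
        a, b = b, a + b
    return quotients


def continued_fraction(depth):
    """Evaluate 1 + 1/(1 + 1/(1 + ...)) truncated after `depth` nested fractions."""
    value = 1.0
    for _ in range(depth):
        value = 1.0 + 1.0 / value
    return value


quotients = fibonacci_quotients(10)
for q in quotients:
    side = "below" if q < PHI else "above"
    print(f"{q:.6f} ({side} phi)")

w = 1 / PHI  # width of a golden rectangle whose length is l = 1
print(f"w/l = {w:.6f}, l/(w + l) = {1 / (w + 1):.6f}")
print(f"continued fraction, depth 40: {continued_fraction(40):.6f}")
```

The printed quotients sit alternately below and above phi, matching the pattern described in the text.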
But phi is not just about numbers; it has cosmological significance as well. Figure 1.2.2
shows a golden rectangle (of width w and length l), where a w × w square has been
inscribed in its left-hand side. What remains is a golden rectangle on the right, inscribed
in which is an (l - w) × (l - w) square. Below that is another golden rectangle with a
[w - (l - w)] × [w - (l - w)] square inscribed on its right-hand side. Each such square
leaves another golden rectangle, which can be inscribed with yet another square, and so
on ad infinitum. Connecting the points where the squares touch the golden rectangles
yields a logarithmic spiral, the beginning of which is pictured. These curves are quite
common in nature and describe, for example, the shape of spiral galaxies, one of which
is our own Milky Way (see Figure 1.2.3).
FIGURE 1.2.2 [golden rectangle subdivided into squares of side w, l - w, and w - (l - w)]
What does all this have to do with the Shoshonis? Absolutely nothing, but mathematical
relationships like these are just too good to pass up! The famous astronomer Johannes
Kepler once wrote (106):
"Geometry has two great treasures; one is the Theorem of Pythagoras; the other [is the
golden ratio]. The first we may compare to a measure of gold; the second we may name a
precious jewel."
FIGURE 1.2.3
CASE STUDY 1.2.3
In folklore, the full moon is often portrayed as something sinister, a kind of evil
possessing the power to control our behavior. Over the centuries, many prominent
writers and philosophers have shared this belief (132). Milton, in Paradise Lost,
refers to
Demoniac frenzy, moping melancholy
And moon-struck madness.
And Othello, after the murder of Desdemona, laments:
It is the very error of the moon,
She comes more near the earth than she was wont
And makes men mad.
On a more scholarly level, Sir William Blackstone, the renowned eighteenth-century
barrister, defined a "lunatic" as
one who hath ... lost the use of his reason and who hath lucid intervals, sometimes
enjoying his senses and sometimes not, and that frequently depending upon changes of
the moon.
The possibility of lunar phases influencing human affairs is a theory not without
supporters among the scientific community. Studies by reputable medical researchers
have attempted to link the "Transylvania effect," as it has come to be known, with
suicide rates, pyromania, and even epilepsy.
The relationship between lunar cycles and mental breakdowns has also been
studied. Table 1.2.4 shows the admission rates to the emergency room of a Virginia
mental health clinic before, during, and after the twelve full moons from August 1971
to July 1972 (l
TABLE 1.2.4: Admission Rates (Patients/Day)
During Full Moon
After Full Moon
6.4
7.1
6.5
8.6
5.0
13.0
14.0
5,8
8,1
6.0
9.0
7.9
7.7
11.0
12.9
13.0
16.0
25.0
13.0
l4.0
13.1
15.8
13,3
12.8
Month
Full Moon
Aug,
Sept.
Oct.
Nov,
Dec.
Jan.
Feb.
Mar.
May
June
July
Averages
10.4
11.5
13.8
15.4
15.7
11.7
9.2
11.5
10.9
For these data, the average admission rate "during" the full moon is higher than
the "before" and "after" admission rates: 13.3 versus 10.9 and 11.5. Does that imply
that the Transylvania effect is real? Not necessarily. The question that needs to be
addressed is whether means as different as 13.3, 10.9, and 11.5 could reasonably
have occurred by chance if, in fact, the Transylvania effect does not exist. We will
learn in Chapter 13 that the answer to that question appears to be "no."
CASE STUDY 1.2.4
The oil embargo of 1973 raised some very serious questions about energy policies in
the United States. One of the most controversial is whether nuclear reactors should
assume a more central role in the production of electric power. Those in favor point to
their efficiency and to the availability of nuclear material; those against warn of nuclear
"incidents" and emphasize the health hazards posed by low-level radiation.
Supporting the opponents' position was a serious safety lapse that occurred some years ago
at a government facility located in Hanford, Washington. What happened there is what
opponents fear will be a recurring problem if nuclear reactors are proliferated.
Until recently, Hanford was responsible for producing the plutonium used in
nuclear weapons. One of the major safety problems encountered there was the storage
of radioactive wastes. Over the years, significant quantities of strontium 90 and cesium
137 leaked from their storage areas into the Columbia River, which
flows along the Washington-Oregon border and eventually empties into the Pacific
Ocean. The question raised by public health officials was whether exposure to that
contamination contributed to any serious medical problems. And if so, to what extent?
As a starting point, an index of exposure was calculated for each of the nine Oregon
counties having frontage on either the Columbia River or the Pacific Ocean. It was
based on several factors, including the county's stream distance from Hanford and the
average distance of its population from any water frontage. As a covariate, the cancer
mortality rate was determined for each of the same counties (42).
TABLE 1.2.5: Radioactive Contamination and Cancer Mortality in Oregon
County
Index
Umatilla
Morrow
Gilliam
Sherman
Wasco
Hood River
Portland
Columbia
Clatsop
Cancer Mortality per 100,000
2.49
147.1
130.1
207.5
177.9
11.64
6.41
210.3
A graph of the data (Figure 1.2.4) suggests that radiation exposure (x) and
cancer mortality (y) are linearly related and that the two vary in such a way that the
relationship is $y = \beta_0 + \beta_1 x$. Finding the numerical values for $\beta_0$ and $\beta_1$ that
position the line in such a way that
it "best" fits the data is a frequently-encountered problem in an area of statistics
known as regression analysis. Here the optimal line, based on methods described in
Chapter 11, has the equation $y = 114.72 + 9.23x$.
FIGURE 1.2.4 [scatterplot of cancer deaths per 100,000 against the index of exposure, with the fitted line y = 114.72 + 9.23x]
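To make the idea of a "best" fitting line concrete, here is a small Python sketch of the usual least-squares formulas for a slope and an intercept. It is offered as an illustration only: the function names are hypothetical, the county data are not reproduced here, and the points used to exercise the code are synthetic values that lie exactly on the fitted line y = 114.72 + 9.23x quoted in the text.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared vertical errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predicted_mortality(index, intercept=114.72, slope=9.23):
    """Cancer deaths per 100,000 predicted by the fitted line quoted in the text."""
    return intercept + slope * index


# Synthetic check (not the county data): points generated from the quoted line.
xs = [0.0, 2.0, 4.0, 10.0]
ys = [predicted_mortality(x) for x in xs]
b0, b1 = least_squares_line(xs, ys)
print(f"recovered intercept {b0:.2f}, slope {b1:.2f}")
print(f"predicted mortality at index 2.49: {predicted_mortality(2.49):.1f}")
```

With real data the points would scatter around the line rather than fall on it, and the same formulas would return the intercept and slope that minimize the total squared vertical distance.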
1.3 A CHAPTER SUMMARY
The concepts of probability lie at the very heart of all statistical problems, the case
studies of Section 1.2 being typical examples. Acknowledging that fact, the next two
chapters take a close look at some of those concepts. Chapter 2 states the axioms
of probability and investigates their consequences. It also covers the basic skills for
algebraically manipulating probabilities and gives an introduction to combinatorics, the
mathematics of counting. Chapter 3 reformulates much of the material in Chapter 2 in
terms of random variables, the latter being a concept of great convenience in applying
probability to statistics. Over the years, particular measures of probability have emerged
as being especially useful: The most prominent of these are profiled in Chapter 4.
Our study of statistics proper begins with Chapter 5, which is a first look at the theory of
parameter estimation. Chapter 6 introduces the notion of hypothesis testing, a procedure
that, in one form or another, commands a major share of the remainder of the book. From
a conceptual standpoint, these are very important chapters: Most formal applications of
statistical methodology will involve either parameter estimation or hypothesis testing, or
both.
Among the probability functions featured in Chapter 4, the normal distribution, more
familiarly known as the bell-shaped curve, is sufficiently important to merit even further
scrutiny. Chapter 7 derives in some detail many of the properties and applications of the
normal distribution as well as those of several related probability functions. Much of the
theory that supports the methodology appearing in Chapters 9 through 13 comes from
Chapter 7.
Chapter 8 describes some of the basic principles of experimental "design." Its purpose
is to provide a framework for comparing and contrasting the various statistical procedures
profiled in Chapters 9 through 14.
Chapters 9, 12, and 13 continue the work of Chapter 7, but with the emphasis
on the comparison of several populations, similar to what was done in Case Study 1.2.3.
Chapter 10 looks at the important problem of assessing the level of agreement between
a set of data and the values predicted by the probability model from which those data
presumably came (recall Case Study 1.2.1). Linear relationships, such as the one between
radiation exposure and cancer mortality in Case Study 1.2.4, are examined in Chapter 11.
Chapter 14 is an introduction to nonparametric statistics. The objective there is
to develop procedures for answering some of the same sorts of questions raised in
earlier chapters, but with fewer initial assumptions.
As a general format, each chapter contains numerous examples and case studies, the
latter being actual experimental data taken from a variety of sources, primarily
newspapers, magazines, and technical journals. We hope that these applications will make
it abundantly clear that, while the general orientation of this text is theoretical, the
consequences of that theory are never too far from having direct relevance to the "real
world."
CHAPTER 2
Probability
2.1 INTRODUCTION
2.2 SAMPLE SPACES AND THE ALGEBRA OF SETS
2.3 THE PROBABILITY FUNCTION
2.4 CONDITIONAL PROBABILITY
2.5 INDEPENDENCE
2.6 COMBINATORICS
2.7 COMBINATORIAL PROBABILITY
2.8 TAKING A SECOND LOOK AT STATISTICS (ENUMERATION AND MONTE CARLO TECHNIQUES)
Pierre de Fermat
Blaise Pascal
One of the most influential of seventeenth-century mathematicians, Fermat
earned his living as a lawyer and administrator in Toulouse. He shares credit
with Descartes for the invention of analytic geometry, but his most
important work may have been in number theory. Fermat did not write
for publication, preferring instead to send letters and papers to friends. His
correspondence with Pascal was the starting point for the development of
a mathematical theory of probability.
-Pierre de Fermat (1601-1665)
Pascal was the son of a nobleman. A prodigy of sorts, he had already
published a treatise on conic sections by the age of sixteen. He also
invented one of the early calculating machines to help his father with
accounting work. Pascal's contributions to probability were stimulated by
his correspondence, in 1654, with Fermat. Later that year he retired to a
life of religious meditation.
-Blaise Pascal (1623-1662)
2.1 INTRODUCTION
Experts have estimated that the likelihood of any given UFO sighting being genuine is
on the order of one in one
thousand.
some ten thousand
sightings have been reported to civil authorities. What is the probability that at least one
of those objects was, in fact, an alien spacecraft? In 1978, Pete Rose of the Cincinnati
Reds set a National League record by batting safely in forty-four consecutive games. How
unlikely was that event, given that Rose was a lifetime .303 hitter? By definition, the mean
free path is the average distance a molecule in a gas travels before colliding with another
molecule. How likely is it that the distance a molecule travels between collisions will be
at least twice its mean free path? Suppose a boy's mother and father both have genetic
markers for sickle cell anemia, but neither parent exhibits any of the disease's symptoms.
What are the chances that their son will also be asymptomatic? What are the odds that a
poker player is dealt a full house or that a craps shooter makes his "point"? If a woman
has lived to age seventy, how likely is it that she will die before her ninetieth birthday?
In 1994, Tom Foley was Speaker of the House and running for re-election. The day after
the election, his race had still not been "called" by any of the networks: he trailed his
Republican challenger by 2174 votes, but 14,000 absentee ballots remained to be counted.
Foley, however, conceded. Should he have waited for the absentee ballots to be counted,
or was his defeat at that point a virtual certainty?
As the nature and variety of those questions would suggest, probability is a subject with
an enormous range of real-world, everyday applications. What began as an exercise
in understanding games of chance has proven to be useful everywhere. Maybe even more
remarkable is the fact that the solutions to all of these diverse questions are rooted in
a handful of definitions and theorems. Those results, together with the problem-solving
techniques they empower, are the sum and substance of Chapter 2. We begin, though,
with a bit of history.
The Evolution of the Definition of Probability
Over the years, the definition of probability has undergone several revisions. There is
nothing contradictory in the multiple definitions; the changes primarily reflected the
need for greater generality and more mathematical rigor. The first formulation (often
referred to as the classical definition of probability) is credited to Gerolamo Cardano
(recall Section 1.1). It applies only to situations where (1) the number of possible outcomes
is finite and (2) all outcomes are equally likely. Under those conditions, the probability
of an event comprised of m outcomes is the ratio m/n, where n is the total number of
(equally likely) outcomes. Tossing a fair six-sided die, for example, gives $m/n = \frac{3}{6}$ as the
probability of rolling an even number (that is, 2, 4, or 6).
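The classical definition amounts to counting. As a minimal sketch (the function name `classical_probability` is hypothetical), the ratio m/n for the die example can be computed exactly with Python's `fractions` module, assuming, as the definition requires, a finite set of equally likely outcomes.

```python
from fractions import Fraction


def classical_probability(event, sample_outcomes):
    """Classical probability m/n: favorable outcomes over all equally likely outcomes."""
    outcomes = set(sample_outcomes)
    favorable = outcomes & set(event)  # ignore anything that is not a possible outcome
    return Fraction(len(favorable), len(outcomes))


die = range(1, 7)      # faces of a fair six-sided die
even = {2, 4, 6}       # the event "an even number is rolled"
p_even = classical_probability(even, die)
print(p_even)          # the Fraction type reduces 3/6 automatically
```

The same function fails, by design, to cover the situations that motivated later definitions: it has no way to express outcomes that are not equally likely or a set of outcomes that is not finite.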
While Cardano's model was well-suited to gambling scenarios (for which it was
intended), it was obviously inadequate for more general problems, where outcomes were
not equally likely and/or the number of outcomes was not finite. Richard von Mises, a
twentieth-century German mathematician, is often credited with avoiding the weaknesses
in Cardano's model by defining "empirical" probabilities. In the von Mises approach, we
imagine an experiment being repeated over and over again under presumably identical
conditions. Theoretically, a running tally could be kept of the number of times (m) the
outcome belonged to a given event divided by n, the total number of times the experiment
was performed. According to von Mises, the probability of the given event is the limit
(as n goes to infinity) of the ratio m/n. Figure 2.1.1 illustrates the empirical probability of
getting a head by tossing a fair coin: as the number of tosses continues to increase, the
ratio m/n converges to $\frac{1}{2}$.
FIGURE 2.1.1 [running ratio m/n plotted against n, the number of trials, approaching lim m/n = 1/2]
The von Mises approach definitely shores up some of the inadequacies seen in
the Cardano model, but it is not without shortcomings of its own. There is some
conceptual inconsistency, for example, in extolling the limit of m/n as a way of defining
a probability empirically, when the very act of repeating an experiment under identical
conditions an infinite number of times is physically impossible. And left unanswered is
the question of how large n must be in order for m/n to be a good approximation for
lim m/n.
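A simulation gives a feel for both the appeal and the limitation of the empirical definition. The sketch below, with a hypothetical function name and an arbitrary seed chosen only for reproducibility, tosses a simulated fair coin and records the running ratio m/n. It can only ever examine finitely many tosses, which is exactly the shortcoming noted above.

```python
import random


def running_relative_frequency(n_tosses, seed=0):
    """Simulate fair-coin tosses and return the running ratio m/n after each toss."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so the run is repeatable
    heads = 0
    ratios = []
    for n in range(1, n_tosses + 1):
        if rng.random() < 0.5:  # treat values below 0.5 as "heads"
            heads += 1
        ratios.append(heads / n)
    return ratios


ratios = running_relative_frequency(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: m/n = {ratios[n - 1]:.4f}")
```

The ratio wanders noticeably for small n and settles near 1/2 as n grows, but no finite run says how large n must be for a prescribed accuracy.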
Andrei Kolmogorov, the great Russian probabilist, took a different approach. Aware
that many twentieth-century mathematicians were having success developing subjects
axiomatically, Kolmogorov wondered whether probability might similarly be defined
operationally, rather than as a ratio (like the Cardano model) or as a limit (like the von
Mises model). His efforts culminated in a masterpiece of mathematical elegance when he
published Grundbegriffe der Wahrscheinlichkeitsrechnung (Foundations of the Theory of
Probability) in 1933. In essence, Kolmogorov was able to show that a maximum of four
simple axioms was necessary and sufficient to define the way any and all probabilities
must behave. (These will be our starting point in Section 2.3.)
We begin Chapter 2 with some basic (and, presumably, familiar) definitions from
set theory. These are important because probability will evemuaHy be defined as a set
function-that is, a mapping from a set to a number. Then, with the
of Kolmogorov's
axioms in Section 2.3, we will learn how to calculate and manipulate probabilities. The
chapter concludes with an introduction to combinaJorics-the mathematics of systematic
application to probability.
2.2 SAMPLE SPACES AND THE ALGEBRA OF SETS
The starting point for studying probability is the definition of four key terms: experiment,
sample outcome, sample space, and event. The latter three, all carryovers from classical set
theory, give us a familiar mathematical framework within which to work; the former is what
provides the conceptual mechanism for casting real-world phenomena into probabilistic
terms.
By an experiment we will mean any procedure that (1) can be repeated, theoretically,
an infinite number of times; and (2) has a well-defined set of possible outcomes. Thus,
rolling a pair of dice qualifies as an experiment; so does measuring blood pressure or
doing a spectrographic analysis to determine the carbon content of moon
rocks. Asking a would-be psychic to draw a picture of an image presumably transmitted
by another would-be psychic does not qualify as an experiment, because the set of possible
outcomes cannot be listed or otherwise characterized.
Each of the potential eventualities of an experiment is referred to as a sample outcome,
s, and their totality is called the sample space, S. To signify the membership of s in
S, we write $s \in S$. Any designated collection of sample outcomes, including individual
outcomes, the entire sample space, and the null set, constitutes an event. The latter is
said to occur if the outcome of the experiment is one of the members of the event.
EXAMPLE 2.2.1
Consider the experiment of flipping a coin three times. What is the sample space? Which
sample outcomes make up the event A: Majority of coins show heads?
Think of each sample outcome here as an ordered triple, its components representing
the outcomes of the first, second, and third tosses, respectively. Altogether, there are
eight different triples, so those eight comprise the sample space:

$$S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$$

By inspection, we see that four of the sample outcomes in S constitute the event A:

$$A = \{HHH, HHT, HTH, THH\}$$
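The listing in Example 2.2.1 can be reproduced mechanically. The sketch below is an illustration only; it assumes each outcome is written as a three-letter string of H's and T's, and the variable names are chosen here for convenience.

```python
from itertools import product

# Sample space: every ordered triple of H/T, written as a 3-letter string.
S = ["".join(triple) for triple in product("HT", repeat=3)]

# Event A: a majority of the three coins show heads (two or three H's).
A = [s for s in S if s.count("H") >= 2]

print(len(S), S)
print(len(A), A)
```

Enumerating with `product` treats the tosses as ordered, which matches the "ordered triple" description in the example.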
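EXAMPLE 2.2.2
Imagine rolling two dice, the first one red, the second one green. Each sample outcome
is an ordered pair (face showing on red die, face showing on green die), and the entire
sample space can be represented as a 6 x 6 matrix (see Figure 2.2.1).
Gamblers are often interested in the event A that the sum of the faces showing is a 7.
Notice in Figure 2.2.1 that the sample outcomes contained in A are the six diagonal
entries, (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and (6, 1).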
[Figure 2.2.1: the 6 x 6 matrix of sample outcomes (1,1) through (6,6); rows give the face
showing on the red die, columns the face showing on the green die.]
EXAMPLE 2.2.3
A local TV station advertises two newscasting positions. If three women ($W_1$, $W_2$, $W_3$)
and two men ($M_1$, $M_2$) apply, the "experiment" of hiring two coanchors generates a sample
space of 10 outcomes:

$$S = \{(W_1, W_2), (W_1, W_3), (W_2, W_3), (W_1, M_1), (W_1, M_2), (W_2, M_1),
(W_2, M_2), (W_3, M_1), (W_3, M_2), (M_1, M_2)\}$$

Does it matter here that the two positions being filled are equivalent? Yes. If the station
were seeking to hire, say, a sports announcer and a weather forecaster, the number of
possible outcomes would be 20: $(W_2, M_1)$, for example, would represent a different staffing
assignment than $(M_1, W_2)$.
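The difference between the 10-outcome and 20-outcome sample spaces in Example 2.2.3 is the difference between unordered and ordered selections. A short sketch, with applicant labels written as plain strings (an assumption made here for illustration):

```python
from itertools import combinations, permutations

applicants = ["W1", "W2", "W3", "M1", "M2"]

# Two equivalent coanchor positions: order within a pair is irrelevant.
coanchors = list(combinations(applicants, 2))

# Two distinct positions (first slot, second slot): order matters.
distinct_jobs = list(permutations(applicants, 2))

print(len(coanchors))      # size of the sample space for equivalent positions
print(len(distinct_jobs))  # size of the sample space for distinct positions
```

In the ordered version, `("W2", "M1")` and `("M1", "W2")` appear as separate outcomes, which is exactly the distinction the example draws.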
EXAMPLE 2.2.4
The number of sample outcomes associated with an experiment need not be finite. Suppose
that a coin is tossed until the first tail appears. If the first toss is itself a tail, the outcome
is T; if the first tail occurs on the second toss, the outcome is HT; and so on. Theoretically,
of course, the tail may never occur, and the infinite nature of S is readily apparent:

$$S = \{T, HT, HHT, HHHT, \ldots\}$$
EXAMPLE 2.2.5
There are three ways to indicate an experiment's sample space. If the number of possible
outcomes is small, we can simply list them, as we did in Examples 2.2.1 through 2.2.3. In
some cases it may be possible to characterize a sample space by showing the structure its
outcomes necessarily possess. This is what we did in Example 2.2.4. A third option is to
state a mathematical formula that the sample outcomes must satisfy.
A computer programmer is running a subroutine that solves a general quadratic
equation, $ax^2 + bx + c = 0$. Her "experiment" consists of choosing values for the three
coefficients a, b, and c. Define (1) S and (2) the event A: Equation has two equal
roots.
First, we must determine the sample space. Since presumably no combinations of finite
a, b, and c are inadmissible, we can characterize S by writing a series of inequalities:

$$S = \{(a, b, c): -\infty < a < \infty,\ -\infty < b < \infty,\ -\infty < c < \infty\}$$

Defining A requires the well-known result from algebra that a quadratic equation has
equal roots if and only if its discriminant, $b^2 - 4ac$, vanishes. Membership in A, then, is
contingent on a, b, and c satisfying an equation:

$$A = \{(a, b, c): b^2 - 4ac = 0\}$$
QUESTIONS
2.2.1. A graduating engineer has signed up for three job interviews. She intends to categorize
each one as being either a "success" or a "failure" depending on whether it leads to a
plant trip. Write out the appropriate sample space. What outcomes are in the event A:
Second success occurs on third interview? In B: First success never occurs? (Hint:
Notice the similarity between this situation and the coin-tossing experiment described
in Example 2.2.1.)
2.2.2. Three dice are tossed, one red, one blue, and one green. What outcomes make up the
event A that the sum of the three faces showing is five?
2.2.3. An urn contains six chips numbered 1 through 6. Three are drawn out. What outcomes
are in the event "Second smallest ..."? Assume that the order of the chips is
irrelevant.
2.2.4. Suppose that two cards are dealt from a standard 52-card poker deck. Let A be the
event that the sum of the two cards is eight (assume that aces have a numerical value
of one). How many outcomes are in A?
2.2.5. In the lexicon of craps-shooters (where two dice are tossed and the underlying sample
space is the matrix pictured in Figure 2.2.1) is the phrase "making a hard eight." What
might that mean?
2.2.6. A poker deck consists of fifty-two cards, representing thirteen denominations (2
through Ace) and four suits (diamonds, hearts, clubs, and spades). A five-card hand is
called a flush if all five cards are in the same suit, but not all five denominations are
consecutive. Pictured next is a flush in hearts. Let N be the set of five cards in hearts
that are not flushes. How many outcomes are in N? Note: In poker, the denominations
(A, 2, 3, 4, 5) are considered to be consecutive (in addition to sequences such as (8, 9,
10, J, Q)).
[Table: suits (D, H, C, S) in rows against denominations (2 through A) in columns, with
five X's in the H row marking the cards of a flush in hearts.]
2.2.7. Let P be the set of right triangles with a 5" hypotenuse and whose height and length
are a and b, respectively. Characterize the outcomes in P.
2.2.8. Suppose a baseball player steps to the plate with the intention of trying to "coax" a
base on balls by never swinging at a pitch. The umpire, of course, will necessarily call
each pitch either a ball (B) or a strike (S). What outcomes make up the event A, that
a batter walks on the sixth pitch? Note: A batter "walks" if the fourth ball is called
before the third strike.
2.2.9. A telemarketer is planning to set up a phone bank to bilk widows with a Ponzi scheme.
His past experience (prior to his most recent incarceration) suggests that each phone
will be in use half the time. For a given phone at a given time, let 0 indicate that
the phone is available and let 1 indicate that a caller is on the line. Suppose that the
telemarketer's "bank" is comprised of ... telephones.
(a) Write out the outcomes in the sample space.
(b) What outcomes would make up the event that exactly two phones were being
used?
(c) Suppose the telemarketer had k phones. How many outcomes would allow for the
possibility that at most one more call could be received?
2.2.10. Two darts are thrown at the following target:
(a) Let (u, v) denote the outcome that the first dart lands in region u and the second
dart, in region v. List the sample space of (u, v)'s.
(b) List the outcomes in the sample space of sums, u + v.
2.2.11. A woman has her purse snatched by two teenagers. She is subsequently shown a police
lineup consisting of five suspects, including the two perpetrators. What is the sample
space associated with the experiment "Woman picks two suspects out of lineup"?
Which outcomes are in the event A: She makes at least one incorrect identification?
2.2.12. Consider the experiment of choosing coefficients for the quadratic equation
$ax^2 + bx + c = 0$. Characterize the values of a, b, and c associated with the event A:
Equation has imaginary roots.
2.2.13. In the game of craps, the person rolling the dice (the shooter) wins outright if his first toss
is a 7 or an 11. If his first toss is a 2, 3, or 12, he loses outright. If his first roll is something
else, say, a 9, that number becomes his "point" and he keeps rolling the dice until he
either rolls another 9, in which case he wins, or a 7, in which case he loses. Characterize
the sample outcomes contained in the event "Shooter wins with a point of 9."
2.2.14. A probability-minded despot offers a convicted murderer a final chance to gain his
release. The prisoner is given twenty chips, ten white and ten black. All twenty are to
be placed into two urns, according to any allocation scheme the prisoner wishes, with
the one proviso being that each urn contain at least one chip. The executioner will
then pick one of the two urns at random and from that urn, one chip at random. If the
chip selected is white, the prisoner will be set free; if it is black, he "buys the farm."
Characterize the sample space describing the prisoner's possible allocation options.
(Intuitively, which allocation affords the prisoner the greatest chance of survival?)
2.2.15. Suppose that ten chips, numbered 1 through 10, are put into an urn at one minute to
midnight, and chip number 1 is quickly removed. At one-half minute to midnight,
chips numbered 11 to 20 are added to the urn, and chip number 2 is quickly removed.
Then at one-fourth minute to midnight, chips numbered 21 to 30 are added to the urn,
and chip number 3 is quickly removed. If that procedure for adding chips to the urn
continues, how many chips will be in the urn at midnight (152)?
Unions, Intersections, and Complements
Associated with events defined on a sample space are several operations collectively
referred to as the algebra of sets. These are the rules that govern the ways in which one
event can be combined with another. Consider, for example, the game of craps described
in Question 2.2.13. The shooter wins on his initial roll if he throws either a 7 or an 11.
In the language of the algebra of sets, the event "shooter rolls a 7 or an 11" is the union of
two simpler events, "shooter rolls a 7" and "shooter rolls an 11." If E denotes the union
and if A and B denote the two events making up the union, we write $E = A \cup B$. The
next several definitions and examples illustrate those portions of the algebra of sets that
we will find particularly useful in the chapters ahead.
Definition 2.2.1. Let A and B be any two events defined over the same sample space
S.
a. The intersection of A and B, written $A \cap B$, is the event whose outcomes belong
to both A and B.
b. The union of A and B, written $A \cup B$, is the event whose outcomes belong to
either A or B or both.
EXAMPLE 2.2.6
A single card is drawn from a poker deck. Let A be the event that an ace is selected:

$$A = \{\text{ace of hearts, ace of diamonds, ace of clubs, ace of spades}\}$$

Let B be the event "Heart is drawn":

$$B = \{\text{2 of hearts, 3 of hearts}, \ldots, \text{ace of hearts}\}$$

Then

$$A \cap B = \{\text{ace of hearts}\}$$

and

$$A \cup B = \{\text{2 of hearts, 3 of hearts}, \ldots, \text{ace of hearts, ace of diamonds, ace of clubs,
ace of spades}\}$$

(Let C be the event "club is drawn." Which cards are in $B \cup C$? In $B \cap C$?)
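Set operations of this kind map directly onto a programming language's set type. The sketch below is illustrative only; representing a card as a (rank, suit) tuple is an assumption made here, not notation from the text.

```python
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]

# Sample space: the 52 cards of a poker deck as (rank, suit) pairs.
deck = {(rank, suit) for rank in ranks for suit in suits}

A = {card for card in deck if card[0] == "A"}        # an ace is selected
B = {card for card in deck if card[1] == "hearts"}   # a heart is drawn

print(A & B)        # intersection: outcomes in both A and B
print(len(A | B))   # union: outcomes in A or B or both
```

The union has 4 + 13 - 1 members because the ace of hearts belongs to both events and is listed only once.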
EXAMPLE 2.2.7
Let A be the set of x's for which $x^2 + 2x = 8$; let B be the set for which $x^2 + x = 6$. Find
$A \cap B$ and $A \cup B$.
Since the first equation factors into $(x + 4)(x - 2) = 0$, its solution set is $A = \{-4, 2\}$.
Similarly, the second equation can be written $(x + 3)(x - 2) = 0$, making $B = \{-3, 2\}$.
Therefore,

$$A \cap B = \{2\}$$

and

$$A \cup B = \{-4, -3, 2\}$$
EXAMPLE 2.2.8
Consider the electrical circuit pictured in Figure 2.2.2. Let $A_i$ denote the event that switch
i fails to close, i = 1, 2, 3, 4. Let A be the event "Circuit is not completed." Express A in
terms of the $A_i$'s.

[Figure 2.2.2: circuit with four switches, numbered 1 through 4.]

Call the 1 and 2 switches line a; call the 3 and 4 switches line b. By inspection, the
circuit fails only if both line a and line b fail. But line a fails only if either 1 or 2 (or both)
fail. That is, the event that line a fails is the union $A_1 \cup A_2$. Similarly, the failure of line b
is the union $A_3 \cup A_4$. The event that the circuit fails, then, is an intersection:

$$A = (A_1 \cup A_2) \cap (A_3 \cup A_4)$$
Definition 2.2.2. Events A and B defined over the same sample space are said to be
mutually exclusive if they have no outcomes in common, that is, if $A \cap B = \emptyset$, where
$\emptyset$ is the null set.
EXAMPLE 2.2.9
Consider a single throw of two dice. Define A to be the event that the sum of the faces
showing is odd. Let B be the event that the two faces themselves are odd. Then clearly
the intersection is empty, the sum of two odd numbers necessarily being even. In symbols,
$A \cap B = \emptyset$. (Recall the event $B \cap C$ asked for in Example 2.2.6.)
Definition 2.2.3. Let A be any event defined on a sample space S. The complement
of A, written $A^C$, is the event consisting of all the outcomes in S other than those
contained in A.
EXAMPLE 2.2.10
Let A be the set of (x, y)'s for which $x^2 + y^2 < 1$. Sketch the region in the xy-plane
corresponding to $A^C$.
From analytic geometry, we recognize that $x^2 + y^2 < 1$ describes the interior of a
circle of radius 1 centered at the origin. Figure 2.2.3 shows the complement: the points
on the circumference of the circle and the points outside the circle.

[Figure 2.2.3: the region $A^C$ in the xy-plane.]
The notions of union and intersection can easily be extended to more than two events.
For example, the expression $A_1 \cup A_2 \cup \cdots \cup A_k$ defines the set of outcomes belonging
to any of the $A_i$'s (or to any combination of the $A_i$'s). Similarly, $A_1 \cap A_2 \cap \cdots \cap A_k$ is
the set of outcomes belonging to all of the $A_i$'s.
EXAMPLE 2.2.11
Suppose the events $A_1, A_2, \ldots, A_k$ are intervals of real numbers such that

$$A_i = \{x: 0 \le x < 1/i\}, \quad i = 1, 2, \ldots, k$$

Describe the sets $A_1 \cup A_2 \cup \cdots \cup A_k = \bigcup_{i=1}^{k} A_i$ and
$A_1 \cap A_2 \cap \cdots \cap A_k = \bigcap_{i=1}^{k} A_i$.
Notice that the $A_i$'s are telescoping sets. That is, $A_1$ is the interval $0 \le x < 1$, $A_2$ is the
interval $0 \le x < 1/2$, and so on. It follows, then, that the union of the k $A_i$'s is simply $A_1$
while the intersection of the $A_i$'s (that is, their overlap) is $A_k$.
QUESTIONS
2.2.16. Sketch the regions in the xy-plane corresponding to $A \cup B$ and $A \cap B$ if
$A = \{(x, y): 0 < x < 3, 0 < y < 3\}$
and
$B = \{(x, y): 2 < x < 4, 2 < y < 4\}$
2.2.17. Referring to Example 2.2.7, find $A \cap B$ and $A \cup B$ if the two equations were replaced
by inequalities: $x^2 + 2x \le 8$ and $x^2 + x \le 6$.
2.2.18. Find $A \cap B \cap C$ if $A = \{x: 0 \le x \le 4\}$, $B = \{x: 2 \le x \le 6\}$, and
$C = \{x: x = 0, 1, 2, \ldots\}$.
2.2.19. An electronic system has four components divided into two pairs. The two components
of each pair are wired in parallel; the two pairs are wired in series. Let $A_{ij}$ denote the
event "ith component in jth pair fails," i = 1, 2; j = 1, 2. Let A be the event "System
fails." Write A in terms of the $A_{ij}$'s.

[Figure: the two pairs of components, labeled j = 1 and j = 2.]

2.2.20. Define $A = \{x: 0 \le x \le 1\}$, $B = \{x: 0 \le x \le 3\}$, and $C = \{x: -1 \le x \le \ldots\}$. Draw
diagrams showing each of the following sets of points:
(a) $A^C \cap B \cap C$
(b) $A^C \cup (B \cap C)$
(c) $A \cap B \cap C^C$
(d) $((A \cup B) \cap C^C)^C$
2.2.21. Let A be the set of five-card hands dealt from a 52-card poker deck, where the
denominations of the five cards are all consecutive, for example, (7 of Hearts, 8 of
Spades, 9 of Spades, 10 of ..., Jack of Diamonds). Let B be the set of five-card
hands where the suits of the five cards are all the same. How many outcomes are in the
event $A \cap B$?
2.2.22. Suppose that each of the twelve letters in the word
T E S S E L L A T I O N
is written on a chip. Define the events F, R, and V as follows:
F: letters in first half of alphabet
R: letters that are repeated
V: letters that are vowels
Which chips make up the following events:
(a) $F \cap R \cap V$
(b) $F^C \cap R \cap V^C$
(c) $F \cap R^C \cap V$
2.2.23. Let A, B, and C be any three events defined on a sample space S. Show that
(a) the outcomes in $A \cup (B \cap C)$ are the same as the outcomes in $(A \cup B) \cap (A \cup C)$.
(b) the outcomes in $A \cap (B \cup C)$ are the same as the outcomes in $(A \cap B) \cup (A \cap C)$.
2.2.24. Let $A_1, A_2, \ldots, A_k$ be any set of events defined on a sample space S. What outcomes
belong to the event
$$(A_1 \cup A_2 \cup \cdots \cup A_k) \cup (A_1^C \cap A_2^C \cap \cdots \cap A_k^C)$$
2.2.25. Let A, B, and C be any three events defined on a sample space S. Show that the
operations of union and intersection are associative by proving that
(a) $A \cup (B \cup C) = (A \cup B) \cup C = A \cup B \cup C$
(b) $A \cap (B \cap C) = (A \cap B) \cap C = A \cap B \cap C$
2.2.26. Suppose that three events, A, B, and C, are defined on a sample space S. Use the
union, intersection, and complement operations to represent each of the following
events:
(a) none of the three events occurs
(b) all three of the events occur
(c) only event A occurs
(d) exactly one event occurs
(e) exactly two events occur
2.2.27. What must be true of events A and B if
(a) $A \cup B = B$
(b) $A \cap B = A$
2.2.28. Let events A and B and sample space S be defined as the following intervals:
$S = \{x: 0 \le x \le 10\}$
$A = \{x: 0 < x < 5\}$
$B = \{x: 3 \le x \le 7\}$
Characterize the following events:
(a) $A^C$
(b) $A \cap B$
(c) $A \cup B$
(d) $A \cap B^C$
(e) $A^C \cup B$
(f) $A^C \cap B^C$
2.2.29. A coin is tossed four times and the resulting sequence of Heads and/or Tails is recorded.
Define the events A, B, and C as follows:
A: Exactly two heads appear
B: Heads and tails alternate
C: First two tosses are heads
(a) Which events, if any, are mutually exclusive?
(b) Which events, if any, are subsets of other sets?
2.2.30. Pictured below are two organizational charts describing the way upper management
vets new proposals. For both models, three vice presidents (1, 2, and 3) each voice an
opinion.

[Figure: organizational charts (a) and (b).]

For (a), all three must concur if the proposal is to pass; if any one of the three favors
the proposal in (b) it passes. Let $A_i$ denote the event that vice president i favors the
proposal, i = 1, 2, 3, and let A denote the event that the proposal passes. Express A in
terms of the $A_i$'s for the two office protocols. Under what sorts of situations might one
system be preferable to the other?
Expressing Events Graphically: Venn Diagrams
Relationships based on two or more events can sometimes be difficult to express using
only equations or verbal descriptions. An alternative approach that can be highly effective
is to represent the underlying events graphically in a format known as a Venn diagram.
Figure 2.2.4 shows Venn diagrams for an intersection, a union, a complement, and for
two events that are mutually exclusive. In each case, the shaded interior of a region
corresponds to the desired event.

[Figure 2.2.4: Venn diagrams for $A \cap B$, $A \cup B$, a complement, and two mutually
exclusive events ($A \cap B = \emptyset$), each drawn inside S.]
EXAMPLE 2.2.12
When two events A and B are defined on a sample space, we will frequently need
to consider
a. the event that exactly one (of the two) occurs
b. the event that at most one (of the two) occurs
Getting expressions for each of these is easy if we visualize the corresponding Venn
diagrams.
The shaded area in Figure 2.2.5 represents the event E that either A or B, but not both,
occurs (that is, exactly one occurs).

[Figure 2.2.5: Venn diagram of A and B in S, with the parts of A and B outside their
overlap shaded.]

Just by looking at the diagram we can formulate an expression for E. The portion of
A, for example, included in E is $A \cap B^C$. Similarly, the portion of B included in E is
$B \cap A^C$. It follows that E can be written as a union:

$$E = (A \cap B^C) \cup (B \cap A^C)$$

(Convince yourself that an equivalent expression for E is $(A \cap B)^C \cap (A \cup B)$.)
Figure 2.2.6 shows the event F that at most one (of the two events) occurs. Since the
latter includes every outcome except those belonging to both A and B, we can write

$$F = (A \cap B)^C$$

[Figure 2.2.6: Venn diagram of A and B in S, with everything except the overlap shaded.]
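A quick way to convince yourself that the two expressions for E agree is to test them on concrete sets. The particular S, A, and B below are arbitrary choices made for illustration (they are not from the text), and a single example is a sanity check rather than a proof.

```python
S = set(range(10))        # a small sample space chosen only for illustration
A = {0, 1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(event):
    """Complement relative to the sample space S."""
    return S - event

E_first = (A & complement(B)) | (B & complement(A))   # (A n B^C) U (B n A^C)
E_second = complement(A & B) & (A | B)                # (A n B)^C n (A U B)
F = complement(A & B)                                 # at most one occurs

print(E_first == E_second, sorted(E_first))
print(sorted(F))
```

Both forms of E coincide with what Python calls the symmetric difference, `A ^ B`.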
EXAMPLE 2.2.13
When Swampwater Tech's Class of '64 held its fortieth reunion, one hundred grads
attended. Fifteen of those alumni were lawyers and rumor had it that thirty of the one
hundred were psychopaths. If ten alumni were both lawyers and psychopaths, how many
suffered from neither of those afflictions?
Let L be the set of lawyers and H, the set of psychopaths. If the symbol N(Q) is defined
to be the number of members in set Q, then

$$N(S) = 100, \quad N(L) = 15, \quad N(H) = 30, \quad N(L \cap H) = 10$$

Summarizing all this information is the Venn diagram in Figure 2.2.7. Notice that

N(L ∪ H) = number of alumni suffering from at least one affliction
= 5 + 10 + 20
= 35

which implies that 100 - 35, or 65, were neither lawyers nor psychopaths. In effect,

$$N(L \cup H) = N(L) + N(H) - N(L \cap H) \quad [= 15 + 30 - 10 = 35]$$

[Figure 2.2.7: Venn diagram of L and H in S.]
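The counting rule used in Example 2.2.13 is easy to package as a small function. This is a sketch only; the function name `count_union` is hypothetical.

```python
def count_union(n_a, n_b, n_both):
    """N(A U B) = N(A) + N(B) - N(A n B): members of both sets are counted once."""
    return n_a + n_b - n_both

n_S, n_L, n_H, n_LH = 100, 15, 30, 10        # counts given in the example

n_at_least_one = count_union(n_L, n_H, n_LH)  # lawyers or psychopaths (or both)
n_neither = n_S - n_at_least_one              # complement of the union

print(n_at_least_one, n_neither)
```

Subtracting N(L ∩ H) corrects for the ten alumni who would otherwise be counted twice, once as lawyers and once as psychopaths.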
QUESTIONS
2.2.31. During orientation week, the latest Spiderman movie was shown twice at State
University. Among the entering class of 6000 freshmen, 850 went to see it the first time,
690 the second time, while 4700 failed to see it either time. How many saw it twice?
2.2.32. Let A and B be any two events. Use Venn diagrams to show that
(a) the complement of their intersection is the union of their complements:
$(A \cap B)^C = A^C \cup B^C$
(b) the complement of their union is the intersection of their complements:
$(A \cup B)^C = A^C \cap B^C$
(These two results are known as DeMorgan's laws.)
2.2.33. Let A, B, and C be any three events. Use Venn diagrams to show that
(a) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
(b) $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
2.2.34. Let A, B, and C be any three events. Use Venn diagrams to show that
(a) $A \cup (B \cup C) = (A \cup B) \cup C$
(b) $A \cap (B \cap C) = (A \cap B) \cap C$
2.2.35. Let A and B be any two events defined on a sample space S. Which of the following
sets are necessarily subsets of which other sets?
$A \cap B$
2.2.36. Use Venn diagrams to suggest an equivalent way of representing the following events:
(a) $(A \cap B^C)^C$
(b) $B \cup (A \cup B)^C$
(c) $A \cap (A \cap B)^C$
2.2.37. A total of twelve hundred graduates of State Tech have gotten into medical school in
the past several years. Of that number, one thousand earned scores of twenty-seven or
higher on the MCAT and four hundred had GPAs that were 3.5 or higher. Moreover,
three hundred had MCATs that were twenty-seven or higher and GPAs that were 3.5
or higher. What proportion of those twelve hundred graduates got into medical school
with an MCAT lower than twenty-seven and a GPA below 3.5?
2.2.38. Let A, B, and C be any three events defined on a sample space S. Let N(A), N(B),
N(C), $N(A \cap B)$, $N(A \cap C)$, $N(B \cap C)$, and $N(A \cap B \cap C)$ denote the numbers
of outcomes in all the different intersections in which A, B, and C are involved. Use
a Venn diagram to suggest a formula for $N(A \cup B \cup C)$. Hint: Start with the sum
N(A) + N(B) + N(C) and use the Venn diagram to identify the "adjustments" that
need to be made to that sum before it can equal $N(A \cup B \cup C)$. As a precedent, recall
from Example 2.2.13 that $N(A \cup B) = N(A) + N(B) - N(A \cap B)$. In the case of two
events, subtracting $N(A \cap B)$ is the "adjustment."
2.2.39. A poll conducted by a potential presidential candidate asked two questions: (1) Do
you support the candidate's position on taxes? and (2) Do you support the candidate's
position on homeland security? A total of twelve hundred responses were received; six
hundred said "yes" to the first question and four hundred said "yes" to the second. If
three hundred respondents said "no" to the taxes question and "yes" to the homeland
security question, how many said "yes" to the taxes question but "no" to the homeland
security question?
2.2.40. For two events A and B defined on a sample space S, $N(A \cap B^C) = 15$, $N(A^C \cap B) = 50$,
and $N(A \cap B) = 2$. Given that N(S) = 120, how many outcomes belong to neither A
nor B?
2.3 THE PROBABILITY FUNCTION
Having introduced in Section 2.2 the twin concepts of "experiment" and "sample space,"
we are now ready to pursue in a formal way the all-important problem of assigning a
probability to an experiment's outcome and, more generally, to an event. Specifically, if
A is any event defined on a sample space S, the symbol P(A) will denote the probability
of A, and we will refer to P as the probability function. It is, in effect, a mapping from
a set (i.e., an event) to a number. The backdrop for our discussion will be the unions,
intersections, and complements of set theory; the starting point will be the axioms
referred to in Section 2.1 that were originally set forth by Kolmogorov.
If S has a finite number of members, Kolmogorov showed that as few as three axioms
are necessary and sufficient for characterizing the probability function P:
Axiom 1. Let A be any event defined over S. Then $P(A) \ge 0$.
Axiom 2. $P(S) = 1$.
Axiom 3. Let A and B be any two mutually exclusive events defined over S. Then

$$P(A \cup B) = P(A) + P(B)$$

When S has an infinite number of members, a fourth axiom is needed:
Axiom 4. Let $A_1, A_2, \ldots$, be events defined over S. If $A_i \cap A_j = \emptyset$ for each $i \ne j$, then

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

From these simple statements come the general rules for manipulating the probability
function that apply no matter what specific mathematical form it may take in a particular
context.
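For a finite sample space, Axioms 1 through 3 can be checked exhaustively for any proposed probability function. The sketch below uses a fair six-sided die as the model; that choice, and the names `P` and `all_events`, are assumptions made here for illustration. Axiom 4 concerns infinite sample spaces and is not covered.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))                       # faces of a die
weights = {s: Fraction(1, 6) for s in S}         # assumed model: equally likely faces

def P(event):
    """Probability of an event (a subset of S) as the sum of its outcomes' weights."""
    return sum((weights[s] for s in event), Fraction(0))

def all_events(space):
    """Every subset of the sample space, from the null set up to S itself."""
    items = sorted(space)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

events = all_events(S)
axiom1 = all(P(E) >= 0 for E in events)
axiom2 = P(S) == 1
axiom3 = all(P(E | F) == P(E) + P(F)
             for E in events for F in events if not (E & F))   # mutually exclusive pairs
print(axiom1, axiom2, axiom3)
```

Exact fractions are used instead of floating point so that the equality tests in Axioms 2 and 3 are not disturbed by rounding.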
Some Basic Properties of P
Some of the immediate consequences of Kolmogorov's axioms are given in
Theorems 2.3.1 through 2.3.6. Despite their simplicity, several of these properties, as we
will soon see, prove to be immensely useful in solving all sorts of problems.
Theorem 2.3.1. $P(A^C) = 1 - P(A)$.
Proof. By Axiom 2 and Definition 2.2.3,

$$P(S) = 1 = P(A \cup A^C)$$

But A and $A^C$ are mutually exclusive, so

$$P(A \cup A^C) = P(A) + P(A^C)$$

and the result follows. □
Theorem 2.3.2. $P(\emptyset) = 0$.
Proof. Since $\emptyset = S^C$, $P(\emptyset) = P(S^C) = 1 - P(S) = 0$. □
Theorem 2.3.3. If $A \subset B$, then $P(A) \le P(B)$.
Proof. Note that the event B may be written in the form

$$B = A \cup (B \cap A^C)$$

where A and $(B \cap A^C)$ are mutually exclusive. Therefore,

$$P(B) = P(A) + P(B \cap A^C)$$

which implies that $P(B) \ge P(A)$ since $P(B \cap A^C) \ge 0$. □
Theorem 2.3.4. For any event A, $P(A) \le 1$.
Proof. The proof follows immediately from Theorem 2.3.3 because $A \subset S$ and
$P(S) = 1$. □
Theorem 2.3.5. Let $A_1, A_2, \ldots, A_n$ be events defined over S. If $A_i \cap A_j = \emptyset$ for $i \ne j$, then

$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$$

Proof. The proof is a straightforward induction argument with Axiom 3 being the
starting point. □
Theorem 2.3.6. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Proof. The Venn diagram for $A \cup B$ certainly suggests that the statement of the
theorem is true (recall Figure 2.2.4). More formally, we have from Axiom 3 that

$$P(A) = P(A \cap B^C) + P(A \cap B)$$

and

$$P(B) = P(B \cap A^C) + P(A \cap B)$$

Adding these two equations gives

$$P(A) + P(B) = [P(A \cap B^C) + P(B \cap A^C) + P(A \cap B)] + P(A \cap B)$$

By Theorem 2.3.5, the sum in the brackets is $P(A \cup B)$. If we subtract $P(A \cap B)$ from
both sides of the equation, the result follows. □
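Theorem 2.3.6 can be spot-checked numerically on a small sample space. The loaded-die weights below (face k has probability k/21) are a made-up model used only for illustration; checking every pair of events in one model is a sanity check, not a substitute for the proof.

```python
from fractions import Fraction
from itertools import combinations

faces = range(1, 7)
weights = {k: Fraction(k, 21) for k in faces}    # hypothetical loaded die; weights sum to 1

def P(event):
    """Probability of an event as the sum of the weights of its outcomes."""
    return sum((weights[k] for k in event), Fraction(0))

# Every subset of the six faces is an event.
events = [frozenset(c) for r in range(7) for c in combinations(faces, r)]

theorem_holds = all(P(A | B) == P(A) + P(B) - P(A & B)
                    for A in events for B in events)
print(theorem_holds)
```

Using unequal weights makes the check slightly more demanding than the equally likely case, since no counting shortcut applies.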
EXAMPLE 2.3.1
Let A and B be two events defined on a sample space S such that P(A) = 0.3, P(B) = 0.5,
and $P(A \cup B) = 0.7$. Find (a) $P(A \cap B)$, (b) $P(A^C \cup B^C)$, and (c) $P(A^C \cap B)$.
a. Transposing the terms in Theorem 2.3.6 yields a general formula for the probability
of an intersection:

$$P(A \cap B) = P(A) + P(B) - P(A \cup B)$$

Here

$$P(A \cap B) = 0.3 + 0.5 - 0.7 = 0.1$$

b. The two cross-hatched regions in Figure 2.3.1 correspond to $A^C$ and $B^C$. The union
of $A^C$ and $B^C$ consists of those regions that have cross-hatching in either or both
directions. By inspection, the only portion of S not included in $A^C \cup B^C$ is the
intersection, $A \cap B$. By Theorem 2.3.1, then,

$$P(A^C \cup B^C) = 1 - P(A \cap B) = 1 - 0.1 = 0.9$$

[Figures 2.3.1 and 2.3.2: Venn diagrams of A and B in S with cross-hatched regions.]

c. The event $A^C \cap B$ corresponds to the region in Figure 2.3.2 where the cross-hatching
extends in both directions, that is, everywhere in B except the intersection with A.
Therefore,

$$P(A^C \cap B) = P(B) - P(A \cap B) = 0.5 - 0.1 = 0.4$$
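The three calculations in Example 2.3.1 follow a fixed recipe, which the following sketch collects into one function. The function name is hypothetical, and exact fractions are used here (0.3 written as 3/10, and so on) purely to avoid floating-point rounding.

```python
from fractions import Fraction

def example_2_3_1(p_a, p_b, p_union):
    """Return P(A n B), P(A^C U B^C), and P(A^C n B) from P(A), P(B), and P(A U B)."""
    p_intersection = p_a + p_b - p_union     # Theorem 2.3.6, rearranged
    p_ac_or_bc = 1 - p_intersection          # complement of A n B (Theorem 2.3.1)
    p_ac_and_b = p_b - p_intersection        # the part of B lying outside A
    return p_intersection, p_ac_or_bc, p_ac_and_b

results = example_2_3_1(Fraction(3, 10), Fraction(1, 2), Fraction(7, 10))
print([float(r) for r in results])
```

The same function applies to any inputs that are consistent with the axioms, for instance any case where P(A ∪ B) is at least as large as both P(A) and P(B).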
EXAMPLE 2.3.2
Show that

$$P(A \cap B) \ge 1 - P(A^C) - P(B^C)$$

for any two events A and B defined on a sample space S.
From Example 2.3.1a and Theorem 2.3.1,

$$P(A \cap B) = P(A) + P(B) - P(A \cup B) = 1 - P(A^C) + 1 - P(B^C) - P(A \cup B)$$

But $P(A \cup B) \le 1$ from Theorem 2.3.4, so

$$P(A \cap B) \ge 1 - P(A^C) - P(B^C)$$
EXAMPLE 2.3.3
Two cards are drawn from a poker deck without replacement. What is the probability
that the second is higher in rank than the first?
Let $A_1$, $A_2$, and $A_3$ be the events "First card is lower in rank," "First card is higher
in rank," and "Both cards have same rank," respectively. Clearly, the three $A_i$'s are
mutually exclusive and they account for all possible outcomes, so from Theorem 2.3.5,

$$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) = P(S) = 1$$

Once the first card is drawn, there are three choices for the second that would have the
same rank, that is, $P(A_3) = 3/51$. Moreover, symmetry demands that $P(A_1) = P(A_2)$, so

$$2P(A_2) + \frac{3}{51} = 1$$

implying that $P(A_2) = 8/17$. Since $P(A_1) = P(A_2)$, the probability that the second card is
higher in rank than the first is also 8/17.
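The symmetry argument can be confirmed by enumerating every ordered pair of distinct cards. The sketch assumes all 52 x 51 ordered draws are equally likely and encodes ranks as the integers 2 through 14 (ace high); both the encoding and the variable names are choices made here.

```python
from fractions import Fraction
from itertools import permutations

# A card is (rank, suit); ranks 2..14 with 11-14 standing for J, Q, K, A.
deck = [(rank, suit) for rank in range(2, 15) for suit in "DHCS"]

draws = list(permutations(deck, 2))          # ordered pairs, no replacement
second_higher = sum(1 for first, second in draws if second[0] > first[0])
same_rank = sum(1 for first, second in draws if second[0] == first[0])

p_second_higher = Fraction(second_higher, len(draws))
p_same_rank = Fraction(same_rank, len(draws))
print(p_second_higher, p_same_rank)
```

The enumeration also recovers the probability that both cards share a rank, the quantity the example obtains by noting that three of the remaining fifty-one cards match the first.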
EXAMPLE 2.3.4
In a newly released martial arts film, the actress playing the lead role has a stunt double
who handles all the physically dangerous action scenes. According to the script, the
actress appears in 40% of the film's scenes, her double appears in 30%, and the two
of them are together 5% of the time. What is the probability that in a given scene (a) only
the stunt double appears and (b) neither the lead actress nor the double appears?
a. If L is the event "Lead actress appears in scene" and D is the event "Double appears
in scene," we are given that P(L) = 0.40, P(D) = 0.30, and $P(L \cap D) = 0.05$. It
follows that

$$P(\text{Only double appears}) = P(D) - P(L \cap D) = 0.30 - 0.05 = 0.25$$

(recall Example 2.3.1c).
b. The event "Neither appears" is the complement of the event "At least one appears."
But $P(\text{At least one appears}) = P(L \cup D)$. From Theorems 2.3.1 and 2.3.6, then,

$$P(\text{Neither appears}) = 1 - P(L \cup D) = 1 - [P(L) + P(D) - P(L \cap D)]
= 1 - [0.40 + 0.30 - 0.05] = 0.35$$
EXAMPLE 2.3.5
Having endured (and survived) the mental trauma that comes from taking two years of
chemistry, a year of physics, and a year of biology, Biff decides to test the medical school
waters and send his MCATs to two colleges, X and Y. Based on how his friends have
fared, he estimates that his probability of being accepted at X is 0.7, and at Y is 0.4. He
also suspects there is a 75% chance that at least one of his applications will be rejected.
What is the probability that he gets at least one acceptance?
Let A be the event "School X accepts him" and B, the event "School Y accepts him."
We are given that P(A) = 0.7, P(B) = 0.4, and $P(A^C \cup B^C) = 0.75$. The question is
asking for $P(A \cup B)$.
From Theorem 2.3.6,

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Recall from Question 2.2.32 that $A^C \cup B^C = (A \cap B)^C$, so

$$P(A \cap B) = 1 - P[(A \cap B)^C] = 1 - 0.75 = 0.25$$

It follows that Biff's prospects are not all that bleak: he has an 85% chance of getting in
somewhere:

$$P(A \cup B) = 0.7 + 0.4 - 0.25 = 0.85$$

Comment. Notice that $P(A \cup B)$ varies directly with $P(A^C \cup B^C)$:

$$P(A \cup B) = P(A) + P(B) - (1 - P(A^C \cup B^C)) = P(A) + P(B) - 1 + P(A^C \cup B^C)$$

If P(A) and P(B), then, are fixed, we get the curious result that Biff's chances of getting
at least one acceptance increase if his chances of at least one rejection increase.
QUESTIONS
2.3.L
to a famity-oriented lobbying group, (here is too much crude Language and
"V"'-'''""'"' on television. Forty-cwo percent of the
they screened had language
were considered excessive in
found offensive, 27% were too violent,
'~"b-'.I::>- and violence. What percentage of
did comply with the group's
B be any two events defined on S.
that P(A) 0.4. P(B) = 0.5, and
B) = 0.1. What is the probability that A or B but not both occur?
2.3.3. FYlnrp,RR the following probabilities in terms of P(A), PtB), and P{A n B).
P(A C U BC)
(b)
n (A U B))
A and B be two events defined on S. If !.he probability that at least one of them
2..3.4..
not occur is 0.1, what is PCB)?
occurs is 0.3 and the probability that A occurs but B
2.3..5.
that three fair dice are tossed. Let Ai be the evenl Ihat a 6 shows on the ith die,
1,2,3. Does P(AJ U
U
= ~? '-'AfJ1CUJ".
S such that P«A U B)e) = 0.6 and
A and B are defined on a
A or B but not both will occur?
P(A n B) = 0.2. Whal is the
Ain
£IIifi;#jandAtUA2U",U
At. A2, ...• An be a series of events for
= S.
B be any event defined on S.
B as a
of intersections.
Draw the Venn diagrams that would "n'.,....''''''',...
P(A n B) = P(B)
and (b) peA U B) = P(B).
2.3.9. In the game of "odd man oul" each player tosses a fair coin. If aU the coins turn up the
same except for one, the player tossing the
is
the odd man out
is eliminated from the contest. Suppose that
people are playing. What is the
probability that someone will be eliminated on the
(Hint: Use Theorem 23.1. )
2.3.2.
P(A
n
42
Chapter 2
Probability
2.3.10. An urn contains
numbered 1 through 24. One is drawn at random.
Let A be [he event that the number is divisible by two and let B be the event that the
number is divisible by three. Find P(A U B).
gam!::, a 30% chance
2.3.n. If Slate's footban learn ha~ a 10% chance of
of winning two weeks from now, and a 65°/" chance of
both games, what are
their
of winning exactly once?
23.12. Events AI and
are such that AJ U
n A2 = 0. Find P2 if
P(AI)
PI.
= P2, and 3pI - P2
23.13, Consolidated Tndustries has come under
pressure to eliminate its seemhiring practices. Company officials have agreed that
60% of their new employees will be females and 30% will be
of their
One out lOur new employees. though, will be white males, What
new hires will be minority females?
23.]4. Three
B, and C -are defined on a sample space, S. Given that P( A)
P(B)
O.L and P(C) =
what is the smallest possible value for P[(A U B U
2.3.15. A coin is to be tossed four times. Define events X and Y such that
X: first and last coins have opposite faces
Y: exactly two heads appear
Assume that each of the sixteen Head/Tail sequences has the same probability.
Evaluate
(n) P(XC 11
(b) P(X Ii
2.3.16. Two dice are tossed. Assume that each possible outcome has a 1/36 probability. Let A
be the event that the sum of the faces showing is 6, and let B be the event that the face
showing on one die is twice the face showing on the other. Calculate P(A ∩ B^C).
2.3.17. Let A, B, and C be three events defined on a sample space S. Arrange the probabilities
of the following events from smallest to largest:
(a) A ∪ B
(b) A ∩ B
(c) A
(d) S
(e) (A ∩ B) ∪ (A ∩ C)
2.3.18. ... is currently running two dot.com scams out of a bogus chatroom. She estimates
that the chances of the first one leading to her arrest are one in ten; the "risk"
associated with the second is more on the order of one in thirty. She considers the
likelihood that she gets busted for both to be 0.0025. What are her chances of
avoiding incarceration?

2.4 CONDITIONAL PROBABILITY
In Section 2.3, we calculated probabilities of certain events by manipulating other
probabilities whose values we were given. Knowing P(A), P(B), and P(A ∩ B), for
example, allows us to calculate P(A ∪ B) (recall the theorems of Section 2.3). For many
real-world situations, though, the "given" in a probability problem goes beyond simply
knowing a set of other probabilities. Sometimes, we know for a fact that certain events
have already occurred, and those occurrences may have a bearing on the probability we
are trying to find. In short, the probability of an event A may have to be "adjusted" if we
know for certain that some related event B has already occurred. Any probability that is
revised to take into account the (known) occurrence of other events is said to be a
conditional probability.
Consider a fair die being tossed, with A defined as the event "6 appears." Clearly,
P(A) = 1/6. But suppose that the die has already been tossed by someone who refuses to
tell us whether or not A occurred but does enlighten us to the point of confirming that B
occurred, where B is the event "Even number appears." What are the chances of A now?
Here, common sense can help us: There are three equally likely even numbers making up
the event B, and one of them satisfies the event A, so the "updated" probability is 1/3.
Notice that the effect of additional information, such as the knowledge that B has
occurred, is to revise (indeed, to shrink) the original sample space S to a new set of
outcomes S'. In this example, the original S contained six outcomes; the conditional
sample space, three (see Figure 2.4.1).
The symbol P(A|B), read "the probability of A given B," is used to denote a
conditional probability. Specifically, P(A|B) refers to the probability that A will occur
given that B has already occurred.
It will be convenient to have a formula for P(A|B) that can be evaluated in terms of
the original S, rather than the revised S'. Suppose that S is a finite sample space with
n outcomes, all equally likely. Assume that A and B are two events containing a and b
outcomes, respectively, and let c denote the number of outcomes in the intersection of A
and B (see Figure 2.4.2). Based on the argument suggested in Figure 2.4.1, the conditional
probability of A given B is the ratio of c to b. But c/b can be written as the quotient of two
other ratios,

$$\frac{c}{b} = \frac{c/n}{b/n}$$

so, for this particular case,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{2.4.1}$$

The same underlying reasoning that leads to Equation 2.4.1, though, holds true even when
the outcomes are not equally likely or when S is uncountably infinite.
[FIGURE 2.4.1: P(6, relative to S) = 1/6; P(6, relative to S') = 1/3]
[FIGURE 2.4.2: Venn diagram of events A and B in the sample space S]
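The counting argument behind Equation 2.4.1 can be checked directly on the die example. The following is a minimal sketch, assuming a finite sample space with equally likely outcomes; the function name `conditional_probability` and its arguments are illustrative and do not come from the text.

```python
from fractions import Fraction


def conditional_probability(event_a, event_b, sample_space):
    """Return P(A|B) when all outcomes in sample_space are equally likely."""
    outcomes = set(sample_space)
    a = set(event_a) & outcomes
    b = set(event_b) & outcomes
    if not b:
        raise ValueError("P(B) must be positive")
    n = len(outcomes)
    p_a_and_b = Fraction(len(a & b), n)  # c/n in the notation of the text
    p_b = Fraction(len(b), n)            # b/n in the notation of the text
    return p_a_and_b / p_b


die = range(1, 7)
# A = "6 appears", B = "even number appears"
p_six_given_even = conditional_probability({6}, {2, 4, 6}, die)
print(p_six_given_even)  # 1/3
```

Exact fractions are used so that the result matches the hand calculation rather than a rounded decimal.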
Definition 2.4.1. Let A and B be any two events defined on S such that P(B) > 0. The
conditional probability of A, assuming that B has already occurred, is written P(A|B)
and is given by

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Comment. Definition 2.4.1 can be cross-multiplied to give a frequently useful
expression for the probability of an intersection. If P(A|B) = P(A ∩ B)/P(B), then

$$P(A \cap B) = P(A \mid B) P(B) \tag{2.4.2}$$
EXAMPLE 2.4.1
A card is drawn from a poker deck. What is the probability that the card is a club, given
that the card is a king?
Intuitively, the answer is 1/4: The king is equally likely to be a heart, diamond, club, or
spade. More formally, let C be the event "Card is a club"; let K be the event "Card is a
king." By Definition 2.4.1,

$$P(C \mid K) = \frac{P(C \cap K)}{P(K)}$$

But P(K) = 4/52 and P(C ∩ K) = P(card is a king of clubs) = 1/52. Therefore, confirming
our intuition,

$$P(C \mid K) = \frac{1/52}{4/52} = \frac{1}{4}$$

[Notice in this example that the conditional probability P(C|K) is numerically the same as
the unconditional probability P(C): they both equal 1/4. This means that our knowledge that
K has occurred gives us no additional insight about the chances of C occurring. Two
events having this property are said to be independent. We will examine the notion of
independence and its consequences later.]
EXAMPLE 2.4.2
Our intuitions can often be fooled by probability problems, even ones that appear to be
simple and straightforward. The "two boys" problem described here is an often-cited case
in point.
Consider the set of families having two children. Assume that the four possible birth
sequences, (younger child is a boy, older child is a boy), (younger child is a boy, older
child is a girl), and so on, are equally likely. What is the probability that both children
are boys given that at least one is a boy?
The answer is not 1/2. The correct answer can be deduced from Definition 2.4.1. By
assumption, each of the four possible birth sequences, (b, b), (b, g), (g, b), and (g, g), has
a 1/4 probability of occurring. Let A be the event that both children are boys, and let B be
the event that at least one child is a boy. Then

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{P(B)}$$

since A is a subset of B (so the overlap between A and B is just A). But A has one outcome
{(b, b)} and B has three outcomes {(b, g), (g, b), (b, b)}. Applying Definition 2.4.1, then,
gives

$$P(A \mid B) = \frac{1/4}{3/4} = \frac{1}{3}$$

Another correct approach is to go back to the sample space and deduce the value of
P(A|B) from first principles. Figure 2.4.3 shows events A and B defined on the four family
types that comprise the sample space S. Knowing that B has occurred redefines the sample
space to include three outcomes, each now having a 1/3 probability. Of those three possible
outcomes, one, namely (b, b), satisfies the event A. It follows that P(A|B) = 1/3.
[FIGURE 2.4.3: S = sample space of two-child families, outcomes written as (first born,
second born); A = {(b, b)}, B = {(b, b), (b, g), (g, b)}]
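The "two boys" answer can also be confirmed by listing the sample space. This is a small sketch under the example's own assumption that the four birth sequences are equally likely; the helper `prob` and the event names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Each family is written (first born, second born), with "b" = boy and "g" = girl.
families = list(product("bg", repeat=2))
weight = {family: Fraction(1, 4) for family in families}


def prob(event):
    """Total probability of the outcomes for which event(outcome) is true."""
    return sum(weight[f] for f in families if event(f))


def both_boys(family):
    return family == ("b", "b")


def at_least_one_boy(family):
    return "b" in family


p_both_given_one = (
    prob(lambda f: both_boys(f) and at_least_one_boy(f)) / prob(at_least_one_boy)
)
print(p_both_given_one)  # 1/3
```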
EXAMPLE 2.4.3
Two events A and B are defined such that (1) the probability that A occurs but B does
not occur is 0.2, (2) the probability that B occurs but A does not occur is 0.1, and (3) the
probability that neither occurs is 0.6. What is P(A|B)?
The three events whose probabilities are given are indicated on the Venn diagram
shown in Figure 2.4.4. Since

$$P(\text{neither occurs}) = 0.6 = P((A \cup B)^C)$$

it follows that

$$P(A \cup B) = 1 - 0.6 = 0.4 = P(A \cap B^C) + P(A \cap B) + P(B \cap A^C)$$

so

$$P(A \cap B) = 0.4 - 0.2 - 0.1 = 0.1$$

[FIGURE 2.4.4: Venn diagram showing A only, B only, A ∩ B, and neither A nor B in S]

From Definition 2.4.1, then,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A \cap B)}{P(A \cap B) + P(B \cap A^C)} = \frac{0.1}{0.1 + 0.1} = 0.5$$
EXAMPLE 2.4.4
The possibility of importing liquefied natural gas (LNG) has been suggested
as one way of coping with a future energy crunch. A complication, though, is the
fact that LNG is highly volatile, and a major spill occurring near a U.S. port could result
in serious damage. The question, therefore, of the likelihood of a spill is important
input for future policymakers who may have to decide whether or not to adopt
the proposal.
Two numbers need to be taken into account: (1) the probability that a tanker will
have an accident and (2) the probability that a major spill will develop given that an
accident has happened. Although no significant spills of LNG have yet occurred
anywhere in the world, these probabilities can be approximated from records kept on
tankers transporting less dangerous cargo. On the basis of such data, it
has been estimated (44) that the probability is 8/50,000 that an LNG tanker will have an
accident on any one trip. Given that an accident has occurred, it is suspected that only
3 times in 15,000 will the damage be sufficiently severe that a major spill would develop.
What are the chances that a given LNG shipment would precipitate a disaster?
Let A denote the event "Spill develops" and let B denote the event "Accident occurs."
Past experience is suggesting that P(B) = 8/50,000 and P(A|B) = 3/15,000. Of primary
concern is the probability that an accident will occur and a spill will ensue, that is,
P(A ∩ B). Using Equation 2.4.2, we find that the chances of a catastrophic accident are on
the order of 3 in 100 million:

$$P(\text{accident occurs and spill develops}) = P(A \cap B) = P(A \mid B) P(B) = \frac{3}{15{,}000} \cdot \frac{8}{50{,}000} = 0.000000032$$
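The arithmetic in Equation 2.4.2 is easy to verify with exact fractions. This sketch simply multiplies the two estimates quoted in the example; the variable names are illustrative.

```python
from fractions import Fraction

p_accident = Fraction(8, 50_000)              # P(B): tanker has an accident on a trip
p_spill_given_accident = Fraction(3, 15_000)  # P(A|B): major spill, given an accident

# Equation 2.4.2: P(A and B) = P(A|B) * P(B)
p_spill_and_accident = p_spill_given_accident * p_accident

print(p_spill_and_accident)         # 1/31250000
print(float(p_spill_and_accident))  # 3.2e-08
```

One chance in 31,250,000 is the same as 3.2 in 100 million, which is where the "on the order of 3 in 100 million" figure comes from.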
EXAMPLE 2.4.5
Max and Muffy are two myopic deer hunters who shoot simultaneously at a nearby
sheepdog that they have mistaken for a 10-point buck. Based on years of well-documented
ineptitude, it can be assumed that Max has a 20% chance of hitting a stationary target at
close range, Muffy has a 30% chance, and the probability is 0.06 that they would both be
on target. Suppose that the sheepdog is hit and killed by exactly one bullet. What is the
probability that Muffy fired the fatal shot?
Let A be the event that Max hit the dog, and let B be the event that Muffy hit the dog.
Then P(A) = 0.2, P(B) = 0.3, and P(A ∩ B) = 0.06. We are trying to find

$$P(B \mid (A^C \cap B) \cup (A \cap B^C))$$

where the event (A^C ∩ B) ∪ (A ∩ B^C) is the union of A and B minus the intersection; that
is, it represents the event that either A or B but not both occur (recall Figure 2.4.4).
Notice, also, from Figure 2.4.4 that the intersection of B and (A^C ∩ B) ∪ (A ∩ B^C) is
the event A^C ∩ B. Therefore, from Definition 2.4.1,

$$\begin{aligned}
P(B \mid (A^C \cap B) \cup (A \cap B^C)) &= \frac{P(A^C \cap B)}{P((A^C \cap B) \cup (A \cap B^C))} \\
&= \frac{P(B) - P(A \cap B)}{P(A \cup B) - P(A \cap B)} \\
&= \frac{0.3 - 0.06}{0.2 + 0.3 - 0.06 - 0.06} \\
&= 0.63
\end{aligned}$$
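As a check on the last step, the same quotient can be computed exactly. The sketch below uses only the three probabilities given in the example; the variable names are illustrative.

```python
from fractions import Fraction

p_max = Fraction(20, 100)    # P(A): Max hits
p_muffy = Fraction(30, 100)  # P(B): Muffy hits
p_both = Fraction(6, 100)    # P(A and B)

p_muffy_only = p_muffy - p_both                  # P(A^C and B)
p_union = p_max + p_muffy - p_both               # P(A or B)
p_exactly_one = p_union - p_both                 # either hunter hits, but not both
p_muffy_given_exactly_one = p_muffy_only / p_exactly_one

print(p_muffy_given_exactly_one)                   # 12/19
print(round(float(p_muffy_given_exactly_one), 2))  # 0.63
```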
CASE STUDY 2.4.1
(Optional)
There once was a brainy baboon
Who always breathed down a bassoon
For he said, "It appears
That in billions of years
I shall certainly hit on a tune."
Eddington
(Continued on next page)
A GO THIS BABE AND JUDGEMENT OF TIMEDIOUS RETCH AND NOT LORD
WHAL IF THE EASELVES AND DO AND MAKE AND BASE GATHEM I AY
BEATELLOUS WE PLAY II1EAN!:) HOLY FOOL MOUR WORK FROM INMOST
BED BE CONFOULD HAVE MANY JUDGEMENT WAS IT YOU MASSURE'S TO
LADY WOULD HAT PRIME THAT'S OUR THROWN AND DID WIFE FATHER'ST
LIVENCTH SLEEP TITH I AMBITION TO THIN HIM AND FORCE AND LAW'S
MAY BUT SMELL SO AND SPURSELY SIGNOR GENT MUCH CHIEF MIXTUnN
FIGURE 2.4.6
One can only wonder how "human" computer-generated text might be if conditional
probabilities for, say, seven- or eight-letter sequences were available. Right now
they are not, but given the rate that computer technology is advancing, they soon will
be. When that day comes, our monkey will probably still never come up with text as
creative as Hamlet's soliloquy, but a fairly decent limerick might show up from time
to time!
CASE STUDY 2.4.2
(Optional)
Several years ago, a television program (inadvertently) spawned a conditional
probability problem that led to more than a few heated discussions, even in the national
media. The show was Let's Make a Deal, and the question involved the strategy that
contestants should take to maximize their chances of winning prizes.
On the program, a contestant would be presented with three doors, behind one of
which was the prize. After the contestant had selected a door, the host, Monty Hall,
would open one of the other two doors, showing that the prize was not there. Then
he would give the contestant a choice: either stay with the door initially selected or
switch to the "third" door that had not been opened.
For many viewers, common sense seemed to suggest that switching doors would
make no difference. By assumption, the prize had a one-third chance of being behind
each of the doors when the game began. Once a door was opened, it was argued that
each of the remaining doors now had a one-half probability of hiding the prize, so
contestants gained nothing by switching their bets.
Not so. An application of Definition 2.4.1 shows that it does make a difference:
contestants, in fact, double their chances of winning by switching doors. To see why,
consider a specific (but typical) case: the contestant has bet on Door #2 and Monty Hall
has opened Door #3. Given that sequence of events, we need to calculate and compare
the conditional probability of the prize being behind Door #1 and Door #2, respectively.
If the former is larger (and we will prove that it is), the contestant should switch
doors.
(Continued on next page)
TABLE 2.4.1

Character   Frequency   Probability   Random Number Range
Space         6934        0.1968        00001-06934
E             3277        0.0930        06935-10211
O             2578        0.0732        10212-12789
T             2557        0.0726        12790-15346
A             2043        0.0580        15347-17389
S             1856        0.0527        17390-19245
H             1773        0.0503        19246-21018
N             1741        0.0494        21019-22759
I             1736        0.0493        22760-24495
R             1593        0.0452        24496-26088
L             1238        0.0351        26089-27326
D             1099        0.0312        27327-28425
U             1014        0.0288        28426-29439
M              889        0.0252        29440-30328
Y              783        0.0222        30329-31111
W              716        0.0203        31112-31827
F              629        0.0178        31828-32456
C              584        0.0166        32457-33040
?              478        0.0136        33041-33518
?              433        0.0123        33519-33951
?              410        0.0116        33952-34361
?              309        0.0088        34362-34670
?              255        0.0072        34671-34925
K              203        0.0058        34926-35128
J               34        0.0010        35129-35162
Q               27        0.0008        35163-35189
X               21        0.0006        35190-35210
Z               14        0.0004        35211-35224

(Rows marked ? carry the labels G, P, B, V, and the apostrophe; their order is not
preserved in this copy.)
AOOAAORH QNNNDGELC TEFSISO VTALIDMA POESDHEMHIESWON
PJTO.MJ FiL FIM 1 AOFERLMT 0 NORDEERH HMFIOMR.ETWOVRCA
OSRIE IEOBOTOOIM NUDSEEWU WHHS AWUA HIDNEVE NL SELTS
FIGURE 2.4.5
by a program knowing only single-letter frequencies (Table 2.4.1). Nowhere does
even a single correctly spelled word appear. Contrast that with Figure 2.4.6, showing
computer text generated by a program that had been given estimates for conditional
probabilities corresponding to all 614,656 (= 28^4) four-letter sequences. What we
get is still garble, but the improvement is astounding: more than 80% of the letter
combinations are at least words.
(Continued on next page)
The image of a monkey sitting at a typewriter, pecking away at random until
he gets lucky and types out a perfect copy of the complete works of William
Shakespeare, has long been a favorite model of statisticians and philosophers to
illustrate the distinction between something that is theoretically possible but, for all
practical purposes, impossible. But if that monkey and his typewriter are replaced
by a high-technology computer and if we program in the right sorts of conditional
probabilities, the prospects for generating something intelligible become a little less
far-fetched, maybe even disturbingly less far-fetched (11).
Simulating nonnumerical English text requires that twenty-eight characters be
dealt with: the twenty-six letters, the space, and the apostrophe. The simplest approach
would be to assign each of those characters a number from 1 to 28. Then a
random number in that range would be generated and the character corresponding
to that number would be printed. A second random number would be generated, a
corresponding second character would be printed, and so on.
Would that be a reasonable model? Of course not. Why should, say, X's have
the same chance of being selected as E's when we know that the latter are much
more common? At the very least, weights should be assigned to all the characters
proportional to their relative probabilities. Table 2.4.1 shows the empirical distribution
of the twenty-six letters, the space, and the apostrophe in the 35,224 characters making
up Act III of Hamlet. Ranges of random numbers corresponding to each character's
frequency are listed in the last column. If two random numbers were generated, say,
27351 and 11616, the computer would print the characters D and O. Doing that, of
course, is equivalent to printing a D with probability 0.0312 [= (28425 - 27327 + 1)/
35224 = 1099/35224] and an O with probability 0.0732 [= (12789 - 10212 + 1)/
35224 = 2578/35224].
Extending this idea to sequences of letters requires an application of Definition
2.4.1. What is the probability, for example, that a T follows an E? By definition,

$$P(\text{T follows an E}) = P(T \mid E) = \frac{\text{number of ET's}}{\text{number of E's}}$$

The analog of Table 2.4.1, then, would be an array having twenty-eight rows and
twenty-eight columns. The entry in the ith row and jth column would be P(i|j), the
probability that letter i follows letter j.
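To make the ratio concrete, here is a minimal sketch of how such an array of "follows" probabilities might be estimated from a piece of text. The function `follow_probabilities` and the sample string are hypothetical; nothing here reproduces the program or the Hamlet counts described in the case study.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def follow_probabilities(text):
    """Estimate P(next | previous) from adjacent character pairs in text.

    table[prev][nxt] is (number of prev+nxt pairs) / (number of prev's that
    have a successor). The final character has no successor, so it is left
    out of the denominators; this is an assumption of the sketch.
    """
    pair_counts = Counter(zip(text, text[1:]))
    previous_counts = Counter(text[:-1])
    table = defaultdict(dict)
    for (prev, nxt), count in pair_counts.items():
        table[prev][nxt] = Fraction(count, previous_counts[prev])
    return table


sample = "THE TEETH MEET THE STREET"  # illustrative text only
table = follow_probabilities(sample)
p_t_given_e = table["E"].get("T", Fraction(0))  # estimate of P(T | E)
```

Each row `table[prev]` sums to 1, so a row can be used directly as the weights for choosing the next character.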
In a similar fashion, conditional probabilities for longer sequences could also be
estimated. For example, the probability that an A follows the sequence QU would be
the ratio of QUA's to QU's:

$$P(\text{A follows QU}) = P(A \mid QU) = \frac{\text{number of QUA's}}{\text{number of QU's}}$$

What does our monkey gain by having a typewriter programmed with probabilities
of sequences? Quite a bit. Figure 2.4.5 shows three lines of computer text generated
(Continued on next page)
TABLE 2.4.2

(Prize Location, Door Opened)   Probability
(1, 3)                          1/3
(2, 1)                          1/6
(2, 3)                          1/6
(3, 1)                          1/3
Table 2.4.2 shows the sample space associated with the scenario just described. If
the prize is actually behind Door #1, the host has no choice but to open Door #3;
similarly, if the prize is behind Door #3, the host has no choice but to open Door #1.
In the event that the prize is behind Door #2, though, the host would (theoretically)
open Door #1 half the time and Door #3 half the time.
Notice that the four outcomes in S are not equally likely. There is necessarily a
one-third probability that the prize is behind each of the three doors. However, the
two choices that the host has when the prize is behind Door #2 necessitate that the
two outcomes (2, 1) and (2, 3) share the one-third probability that represents the
chances of the prize being behind Door #2. Each, then, has the one-sixth probability
listed in Table 2.4.2.
Let A be the event that the prize is behind Door #2, and let B be the event that the
host opened Door #3. Then

$$P(A \mid B) = P(\text{contestant wins by not switching}) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/3 + 1/6} = \frac{1}{3}$$

Now, let A* be the event that the prize is behind Door #1, and let B (as before) be the
event that the host opens Door #3. In this case,

$$P(A^* \mid B) = P(\text{contestant wins by switching}) = \frac{P(A^* \cap B)}{P(B)} = \frac{1/3}{1/3 + 1/6} = \frac{2}{3}$$

Common sense would have led us astray again! If given the choice, contestants
should always switch doors. Doing so ups their chances of winning from one-third to
two-thirds.
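The two conditional probabilities can be recomputed from the four-outcome sample space of Table 2.4.2. The sketch below assumes, as in the text, that the contestant has bet on Door #2; the dictionary and helper names are illustrative.

```python
from fractions import Fraction

# Outcomes are (prize location, door opened), with the probabilities of Table 2.4.2.
sample_space = {
    (1, 3): Fraction(1, 3),
    (2, 1): Fraction(1, 6),
    (2, 3): Fraction(1, 6),
    (3, 1): Fraction(1, 3),
}


def prob(event):
    """Sum the probabilities of the outcomes satisfying event."""
    return sum(p for outcome, p in sample_space.items() if event(outcome))


def conditional(event_a, event_b):
    """Definition 2.4.1: P(A|B) = P(A and B) / P(B)."""
    return prob(lambda o: event_a(o) and event_b(o)) / prob(event_b)


def host_opens_3(outcome):
    return outcome[1] == 3


p_stay = conditional(lambda o: o[0] == 2, host_opens_3)    # prize behind Door #2
p_switch = conditional(lambda o: o[0] == 1, host_opens_3)  # prize behind Door #1

print(p_stay, p_switch)  # 1/3 2/3
```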
QUESTIONS
2.4.1. Suppose that two fair dice are tossed. What is the probability that the sum equals ten
given that it exceeds eight?
2.4.2. Find P(A ∩ B) if P(A) = 0.2, P(B) = 0.4, and P(A|B) + P(B|A) = 0.75.
2.4.3. If P(A|B) < P(A), show that P(B|A) < P(B).
2.4.4. Let A and B be two events such that P((A ∪ B)^C) = 0.6 and P(A ∩ B) = 0.1. Let E
be the event that either A or B but not both will occur. Find P(E|A ∪ B).
2.4.5. Suppose that in Example 2.4.2 we ignored the ages of the children and distinguished
only three family types: (boy, boy), (girl, boy), and (girl, girl). Would the conditional
probability of both children being boys given that at least one is a boy be different
from the answer found on p. 45? Explain.
0.6, P(at
2.4.6. Two evenls, A and B, are defined on a sample space S such thai P(AI8)
least one of the events occurs)
and P(exactly one of Ihe events occurs)
0.6.
Find PI
and P(8).
2.4.7. An urn contains one red chip and one white chip. One is drawn at random. If the chip
selected is red, that chip together with two additional red chips are put back into the
urn. If a white is drawn, the chip is returned to the urn. Then a second chip is drawn.
What is the probability that both selections are red?
2.4.8. Given that P(A) = a and P(B) = b, show that

$$P(A \mid B) \ge \frac{a + b - 1}{b}$$

2.4.9. An urn contains one white chip and a second chip that is equally likely to be white
or black. A chip is drawn at random and returned to the urn. Then a second chip is
drawn. What is the probability that a white appears on the second draw given that a
white appeared on the first draw?
2.4.10.
events A and 8 are such Ihat P(A n B) = 0.1 and P«A U B)e)
0.3. If
0.2, what does prCA n B)I(A U B)C] equal? Him: Draw the Venn diagram.
2.4.11. One hundred voters were asked their opinions of two candidates, A and B, running for
mayor. Their responses to three questions are summarized below:

                      Yes
Do you like A?        65
Do you like B?        55
Do you like both?     25

(a) What is the probability that someone likes neither?
(b) What is the probability that someone likes exactly one?
(c) What is the probability that someone likes at least one?
(d) What is the probability that someone likes at most one?
(e) What is the probability that someone likes exactly one given that they like at
least one?
(f) Of those who like at least one, what proportion like both?
(g) Of those who do not like A, what proportion like B?
2.4.12. A fair coin is tossed three times. What is the probability that at least two heads will
occur given that at most two heads have occurred?
2.4.13. Two fair dice are rolled. What is the probability that the number on the first die was at
least as large as 4 given that the sum of the two dice was eight?
2.4.14. Four cards are dealt from a standard 52-card poker deck. What is the probability that
all four are aces given that at least three are aces? Note: There are 270,725 different
sets of four cards that can be dealt. Assume that the probability associated with each
of those hands is 1/270,725.
2.4.15. Given that P(A ∩ B^C) = 0.3, P((A ∪ B)^C) = 0.2, and P(A ∩ B) = 0.1, find P(A|B).
2.4.16. Given that P(A) + P(B) = 0.9, P(A|B) = 0.5, and P(B|A) = 0.4, find P(A).
2.4.17. Let A and B be two events defined on a sample space S such that P(A ∩ B^C) = 0.1,
P(A^C ∩ B) = 0.3, and P((A ∪ B)^C) = 0.2. Find the probability that at least one of
the two events occurs given that at most one occurs.
2.4.18. Suppose two dice are rolled. Assume that each possible outcome has probability 1/36.
Let A be the event that the sum of the two dice is greater than or equal to eight, and
let B be the event that at least one of the dice is a 5. Find P(AIB).
2.4.19. According to your neighborhood bookie, there are five horses scheduled to run in
the third race at the local track, and handicappers have assigned them the following
probabilities of winning:
Horse
Scorpion
Starry Avenger
Australian DoU
Dusty Stake
Outandout
0.10
030
0.20
Suppose that Australian Doll and Dusty Stake are scratched from the race at the
last minute. What are the chances that Outandout will prevail over the reduced
field?
2.4.20. Andy, Bob, and Charley have all been serving time for grand theft auto. According
to prison scuttlebutt, the warden plans to release two of the three next week. They all
have identical records, so the two to be released will be chosen at random, meaning
that each has a two-thirds probability of being included in the two to be set free. Andy,
however, is friends with a guard who will know ahead of time which two will leave.
He offers to tell Andy the name of a prisoner other than himself who will be released.
Andy, however, declines the offer, believing that if he learns the name of a prisoner
scheduled to be released, then his chances of being the other person set free will
drop to one-half (since only two prisoners will be left at that point). Is his concern
justified?
Applying Conditional Probability to Higher-Order Intersections
We have seen that conditional probabilities can be useful in evaluating intersection
probabilities, that is, P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A). A similar result holds
for higher-order intersections. Consider P(A ∩ B ∩ C). By thinking of A ∩ B as a single
event (say, D) we can write

$$\begin{aligned}
P(A \cap B \cap C) &= P(D \cap C) \\
&= P(C \mid D) P(D) \\
&= P(C \mid A \cap B) P(A \cap B) \\
&= P(C \mid A \cap B) P(B \mid A) P(A)
\end{aligned}$$

Repeating this same argument for n events, A1, A2, ..., An, gives a formula for the general
case:

$$\begin{aligned}
P(A_1 \cap A_2 \cap \cdots \cap A_n) = {} & P(A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1}) \\
& \cdot P(A_{n-1} \mid A_1 \cap A_2 \cap \cdots \cap A_{n-2}) \\
& \cdots P(A_2 \mid A_1) \cdot P(A_1)
\end{aligned} \tag{2.4.3}$$
EXAMPLE 2.4.6
An urn contains five white chips, four black chips, and three red chips. Four chips are
drawn sequentially and without replacement. What is the probability of obtaining the
sequence (white, red, white, black)?
Figure 2.4.7 shows the evolution of the urn's composition as the desired sequence is
assembled. Define the following four events:

[FIGURE 2.4.7: composition of the urn before each of the four draws]

A: white chip is drawn on first selection
B: red chip is drawn on second selection
C: white chip is drawn on third selection
D: black chip is drawn on fourth selection

Our objective is to find P(A ∩ B ∩ C ∩ D).
From Equation 2.4.3,

$$P(A \cap B \cap C \cap D) = P(D \mid A \cap B \cap C) \cdot P(C \mid A \cap B) \cdot P(B \mid A) \cdot P(A)$$

Each of the probabilities on the right-hand side of the equation here can be gotten by just
looking at the urns pictured in Figure 2.4.7: P(D|A ∩ B ∩ C) = 4/9, P(C|A ∩ B) = 4/10,
P(B|A) = 3/11, and P(A) = 5/12. Therefore, the probability of drawing a (white, red, white,
black) sequence is 0.02:

$$P(A \cap B \cap C \cap D) = \frac{4}{9} \cdot \frac{4}{10} \cdot \frac{3}{11} \cdot \frac{5}{12} = \frac{240}{11{,}880} = 0.02$$
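Equation 2.4.3 translates naturally into a loop that updates the urn after every draw. The following is a sketch only; `sequence_probability` and the dictionary layout of the urn are illustrative choices, not notation from the text.

```python
from fractions import Fraction


def sequence_probability(urn, sequence):
    """Probability of drawing `sequence` in order, without replacement.

    urn maps a color to the number of chips of that color. Each factor is the
    conditional probability of the next color given the draws made so far.
    """
    counts = dict(urn)
    total = sum(counts.values())
    probability = Fraction(1)
    for color in sequence:
        available = counts.get(color, 0)
        if available == 0:
            return Fraction(0)  # the requested color has run out
        probability *= Fraction(available, total)
        counts[color] = available - 1
        total -= 1
    return probability


urn = {"white": 5, "black": 4, "red": 3}
p_sequence = sequence_probability(urn, ["white", "red", "white", "black"])
print(p_sequence)                   # 2/99
print(round(float(p_sequence), 2))  # 0.02
```

The factors are multiplied in the order the chips are drawn (5/12, 3/11, 4/10, 4/9), which is the reverse of how they are written in Equation 2.4.3 but gives the same product, 240/11,880 = 2/99.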
CASE STUDY 2.4.3
Since the late 1940s, tens of thousands of eyewitness accounts of strange lights in
the skies, unidentified flying objects, even alleged abductions by little green men, have
made headlines. None of these incidents, though, has produced any hard evidence,
any irrefutable proof that Earth has been visited by a race of extraterrestrials. Still,
the haunting question remains: are we alone in the universe? Or are there other
civilizations, more advanced than ours, making the occasional flyby?
Until, or unless, a flying saucer plops down on the White House lawn and a
strange-looking creature emerges with the proverbial "Take me to your leader" demand, we
may never know whether we have any cosmic neighbors. Equation 2.4.3, though, can
help us speculate on the probability of our not being alone.
Recent discoveries suggest that planetary systems much like our own may be
quite common. If so, there are likely to be many planets whose chemical makeups,
temperatures, pressures, and so on, are suitable for life. Let those planets be the points
in our sample space. Relative to them, we can define three events:

A: life arises
B: technical civilization arises (one capable of interstellar communication)
C: technical civilization is flourishing now

In terms of A, B, and C, the probability that a habitable planet is presently supporting a
technical civilization is the probability of an intersection, specifically, P(A ∩ B ∩ C).
Associating a number with P(A ∩ B ∩ C) is highly problematic, but the task is
simplified considerably if we work instead with the equivalent conditional formula,

$$P(C \mid B \cap A) \cdot P(B \mid A) \cdot P(A)$$

Scientists speculate (157) that life of some kind may arise on one-third of all planets
having a suitable environment and that life on maybe 1% of all those planets will
evolve into a technical civilization. In our notation, P(A) = 1/3 and P(B|A) = 1/100.
More difficult to estimate is P(C|A ∩ B). On Earth, we have had the capability
of interstellar communication (that is, radio astronomy) for only a few decades, so
P(C|A ∩ B), empirically, is on the order of 1 × 10^-8. But that may be an overly
pessimistic estimate of a technical civilization's ability to endure. It may be true that
if a civilization can avoid annihilating itself when it first develops nuclear weapons, its
prospects for longevity are fairly good. If that were the case, P(C|A ∩ B) might be as
large as 1 × 10^-2.
Putting these estimates into the computing formula for P(A ∩ B ∩ C) yields a range
for the probability of a habitable planet currently supporting a technical civilization.
The chances may be as small as 3.3 × 10^-11 or as "large" as 3.3 × 10^-5:

$$0.000000000033 < P(A \cap B \cap C) < 0.000033$$

(Continued on next page)
A better way to put these figures in some kind of perspective is to think in
terms of numbers rather than probabilities. Astronomers estimate there are 3 × 10^11
habitable planets in our Milky Way galaxy. Multiplying that total by the two limits for
P(A ∩ B ∩ C) gives an indication of how many cosmic neighbors we are likely to have.
Specifically, 3 × 10^11 · 0.000000000033 ≈ 10, while 3 × 10^11 · 0.000033 ≈ 10,000,000.
So, on the one hand, we may be a galactic rarity. At the same time, the probabilities
do not preclude the very real possibility that the heavens are abuzz with activity and
that our neighbors number in the millions.
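The range quoted in the case study follows from multiplying the three estimates together. This sketch just reproduces that arithmetic with exact fractions; the variable names are illustrative, and the inputs are the speculative figures given in the text.

```python
from fractions import Fraction

p_life = Fraction(1, 3)               # P(A)
p_tech_given_life = Fraction(1, 100)  # P(B|A)
p_now_low = Fraction(1, 10**8)        # pessimistic P(C|A and B)
p_now_high = Fraction(1, 10**2)       # optimistic P(C|A and B)

# P(A and B and C) = P(C|A and B) * P(B|A) * P(A)
p_low = p_now_low * p_tech_given_life * p_life
p_high = p_now_high * p_tech_given_life * p_life

habitable_planets = 3 * 10**11
neighbors_low = habitable_planets * p_low
neighbors_high = habitable_planets * p_high

print(float(p_low), float(p_high))    # roughly 3.3e-11 and 3.3e-05
print(neighbors_low, neighbors_high)  # 10 10000000
```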
QUESTIONS
2.4.21. An urn contains six white chips, four black chips, and five red chips. Five chips are
drawn out, one at a time and without replacement. What is the probability of getting
the sequence (black, black, red, white, white)? Suppose that the chips are numbered 1
through 15. What is the probability of getting a specific
6,4.9, 13)?
2.4.22. A man has n keys on a key ring, one of which opens the door to his apartment. Having
celebrated a bit too much one evening, he returns home only to find himself unable to
distinguish one key from another. Resourceful, he works out a fiendishly clever plan:
He will choose a key at random and try it. If it fails to open the door, he will
discard it and choose at random one of the remaining n - 1 keys, and so on. Clearly, the
probability that he gains entrance with the first key he selects is 1/n. Show that the
probability the door opens with the third key he tries is also 1/n. (Hint: What has to
happen before he even gets to the third key?)
2.4.23. Suppose Ihat four cards arc drawn from a standard 52-card
deck. Whal is the
probability of
in order. a 7 of diamonds, a jack of "V(' ....,,~, a 10 of diamonds,
and a 5 of hearts?
2.4.24. One chip is drawn at random from an urn that contains one white chip and one black
chip. If the white chip is selected, we simply return it to the urn; if the black chip is
drawn, that chip, together with another black, are returned to the urn. Then a second
chip is drawn, with the same rules for returning it to the urn. Calculate the probability
of drawing two whites followed by three blacks.
Calculating "Unconditional" Probabilities
We conclude this section with two very useful theorems that apply to partitioned sample
spaces. By definition, a set of events A1, A2, ..., An "partition" S if every outcome in
the sample space belongs to one and only one of the Ai's, that is, the Ai's are mutually
exclusive and their union is S (see Figure 2.4.8).

[FIGURE 2.4.8: event B overlapping a partition A1, A2, ..., An of S]

Let B, as pictured, denote any event defined on S. The first result, Theorem 2.4.1,
gives a formula for the "unconditional" probability of B (in terms of the Ai's). Then
Theorem 2.4.2 calculates the set of conditional probabilities, P(Aj|B), j = 1, 2, ..., n.

Theorem 2.4.1. Let A1, A2, ..., An be a set of events defined over S such that
S = A1 ∪ A2 ∪ ... ∪ An, Ai ∩ Aj = ∅ for i ≠ j, and P(Ai) > 0 for i = 1, 2, ..., n. For
any event B,

$$P(B) = \sum_{i=1}^{n} P(B \mid A_i) P(A_i)$$

Proof. By the conditions imposed on the Ai's,

$$B = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n)$$

and

$$P(B) = P(B \cap A_1) + P(B \cap A_2) + \cdots + P(B \cap A_n)$$

But each P(B ∩ Ai) can be written as the product P(B|Ai)P(Ai), and the result
follows.
EXAMPLE 2.4.7
Urn I contains two red chips and four white chips; urn II, three red and one white. A chip
is drawn at random from urn I and transferred to urn II. Then a chip is drawn from urn II.
What is the probability that the chip drawn from urn II is red?
Let B be the event "Chip drawn from urn II is red"; let A1 and A2 be the events "Chip
transferred from urn I is red" and "Chip transferred from urn I is white," respectively.
By inspection (see Figure 2.4.9), we can deduce all the probabilities appearing in the
right-hand side of the formula in Theorem 2.4.1:

$$P(B \mid A_1) = \frac{4}{5} \qquad P(B \mid A_2) = \frac{3}{5} \qquad P(A_1) = \frac{2}{6} \qquad P(A_2) = \frac{4}{6}$$

[FIGURE 2.4.9: one chip is transferred from Urn I to Urn II; then one chip is drawn from Urn II]

Putting all this information together, we see that the chances are two out of three that a
red chip will be drawn from urn II:

$$P(B) = P(B \mid A_1) P(A_1) + P(B \mid A_2) P(A_2) = \frac{4}{5} \cdot \frac{2}{6} + \frac{3}{5} \cdot \frac{4}{6} = \frac{2}{3}$$
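Theorem 2.4.1 amounts to a weighted sum over the partition, which can be written as a short function. This is a sketch under the theorem's assumptions; `total_probability` and its list-of-pairs input are hypothetical conveniences, not notation from the text.

```python
from fractions import Fraction


def total_probability(partition):
    """Theorem 2.4.1: P(B) = sum of P(B|A_i) * P(A_i) over a partition.

    partition is a list of (P(A_i), P(B|A_i)) pairs whose first entries
    must add up to 1.
    """
    if sum(p_a for p_a, _ in partition) != 1:
        raise ValueError("the P(A_i) must sum to 1 for a partition")
    return sum(p_a * p_b_given_a for p_a, p_b_given_a in partition)


# Example 2.4.7: A1 = red chip transferred, A2 = white chip transferred.
urn_example = [
    (Fraction(2, 6), Fraction(4, 5)),  # P(A1), P(B|A1)
    (Fraction(4, 6), Fraction(3, 5)),  # P(A2), P(B|A2)
]
p_red_from_urn_2 = total_probability(urn_example)
print(p_red_from_urn_2)  # 2/3
```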
EXAMPLE 2.4.8
A standard poker deck is shuffled and the card on top is removed. What is the probability that the second card is an ace?
Define the following events:
$B$: second card is an ace
$A_1$: top card was an ace
$A_2$: top card was not an ace
Then $P(B \mid A_1) = \frac{3}{51}$, $P(B \mid A_2) = \frac{4}{51}$, $P(A_1) = \frac{4}{52}$, and $P(A_2) = \frac{48}{52}$. Since the $A_i$'s partition the sample space of two-card selections, Theorem 2.4.1 applies. Substituting into the expression for $P(B)$ shows that $\frac{4}{52}$ is the probability that the second card is an ace:
$$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) = \frac{3}{51} \cdot \frac{4}{52} + \frac{4}{51} \cdot \frac{48}{52} = \frac{4}{52}$$
Comment. Notice that $P(B) = P(\text{2nd card is an ace})$ is numerically the same as $P(A_1) = P(\text{first card is an ace})$. The analysis in Example 2.4.8 illustrates a basic principle in probability that says, in effect, "what you don't know, doesn't matter." Here, removal of the top card is irrelevant to any subsequent probability calculations if the identity of that card remains unknown.
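The claim that an unseen top card does not change the chance of an ace in the second position can be checked by brute force. The sketch below is an illustration added here, not the text's method: it lists every ordered pair (top card, second card) from a 52-card deck, treats them as equally likely, and counts those whose second card is an ace. The encoding of ranks as the integers 1 through 13, with 1 standing for the ace, is an assumption made only for this example.

```python
from fractions import Fraction
from itertools import permutations

RANKS = list(range(1, 14))  # 1 stands for the ace (illustrative encoding)
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]

# After a shuffle, every ordered (top card, second card) pair is equally likely.
pairs = list(permutations(DECK, 2))
second_is_ace = sum(1 for top, second in pairs if second[0] == 1)
p_second_ace = Fraction(second_is_ace, len(pairs))

print(p_second_ace)                     # 1/13
print(p_second_ace == Fraction(4, 52))  # True
```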
EXAMPLE 2.4.9
Ashley is hoping to land a summer internship with a public relations firm. If her interview goes well, she has a 70% chance of getting an offer. If the interview is a bust, though, her chances of getting the position drop to 20%. Unfortunately, Ashley tends to ramble incoherently when she is under stress, so the likelihood of the interview going well is only 0.10. What is the probability that Ashley gets the internship?
Let $B$ be the event "Ashley is offered an internship," let $A_1$ be the event "Interview goes well," and let $A_2$ be the event "Interview does not go well." By assumption,
$$P(B \mid A_1) = 0.70 \qquad P(B \mid A_2) = 0.20$$
$$P(A_1) = 0.10 \qquad P(A_2) = 1 - P(A_1) = 1 - 0.10 = 0.90$$
Section 2.4 Conditional Probability
According to Theorem 2.4.1, Ashley has a 25% chance of landing the internship:
$$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) = (0.70)(0.10) + (0.20)(0.90) = 0.25$$
EXAMPLE 2.4.10
In an upstate congressional race, the incumbent Republican ($R$) is running against a field of three Democrats ($D_1$, $D_2$, and $D_3$) seeking the nomination. Political pundits estimate that the probabilities of $D_1$, $D_2$, and $D_3$ winning the primary are 0.35, 0.40, and 0.25, respectively. Furthermore, results from a variety of polls are suggesting that $R$ would have a 40% chance of defeating $D_1$ in the general election, a 35% chance of defeating $D_2$, and a 60% chance of defeating $D_3$. Assuming all these estimates to be accurate, what are the chances that the Republican will retain his seat?
Let $B$ denote the event that "$R$ wins general election," and let $A_i$ denote the event "$D_i$ wins Democratic primary," $i = 1, 2, 3$. Then
$$P(A_1) = 0.35 \qquad P(A_2) = 0.40 \qquad P(A_3) = 0.25$$
and
$$P(B \mid A_1) = 0.40 \qquad P(B \mid A_2) = 0.35 \qquad P(B \mid A_3) = 0.60$$
so
$$P(B) = P(\text{Republican wins general election}) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + P(B \mid A_3)P(A_3)$$
$$= (0.40)(0.35) + (0.35)(0.40) + (0.60)(0.25) = 0.43$$
EXAMPLE 2.4.11
Three chips are placed in an urn. One is red on both sides, a second is blue on both sides, and the third is red on one side and blue on the other. One chip is selected at random and placed on a table. Suppose that the color showing on that chip is red. What is the probability that the color underneath is also red (see Figure 2.4.10)?
At first glance, it may seem that the answer is one-half: We know that the blue/blue chip has not been drawn, and only one of the remaining two (the red/red chip) satisfies the event that the color underneath is red. If this game were played over and over, though, and records were kept of the outcomes, it would be found that the proportion of times that a red top has a red bottom is two-thirds, not the one-half that our intuition might suggest. The correct answer follows from an application of Theorem 2.4.1.
FIGURE 2.4.10 (the three chips: red/red, blue/blue, and red/blue)
Define the following events:
$A$: bottom side of chip drawn is red
$B$: top side of chip drawn is red
$A_1$: red/red chip is drawn
$A_2$: blue/blue chip is drawn
$A_3$: red/blue chip is drawn
From the definition of conditional probability,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
But $P(A \cap B) = P(\text{both sides are red}) = P(\text{red/red chip}) = \frac{1}{3}$. Theorem 2.4.1 can be used to find the denominator, $P(B)$:
$$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + P(B \mid A_3)P(A_3) = 1 \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} + \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{2}$$
Therefore,
$$P(A \mid B) = \frac{1/3}{1/2} = \frac{2}{3}$$
Comment. The question posed in Example 2.4.11 gives rise to a simple but effective con game. The trick is to convince a "mark" that the initial analysis given earlier is correct, meaning that the bottom has a fifty-fifty chance of being the same color as the top. Under that incorrect presumption that the game is "fair," both participants put up the same amount of money, but the gambler (knowing the correct analysis) always bets that the bottom is the same color as the top. In the long run, then, the con artist will be winning an even-money bet two-thirds of the time!
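The two-thirds answer is easy to confirm by listing outcomes. The sketch below is an added illustration under one stated assumption: each of the three chips is equally likely to be drawn, and either of its sides is equally likely to land face up, which gives six equally likely (top, bottom) outcomes. The variable names are hypothetical.

```python
from fractions import Fraction

# Each chip is a pair of side colors.
CHIPS = [("red", "red"), ("blue", "blue"), ("red", "blue")]

# Six equally likely outcomes, recorded as (top color, bottom color).
outcomes = []
for side_a, side_b in CHIPS:
    outcomes.append((side_a, side_b))
    outcomes.append((side_b, side_a))

top_red = [o for o in outcomes if o[0] == "red"]
both_red = [o for o in top_red if o[1] == "red"]
p_bottom_red_given_top_red = Fraction(len(both_red), len(top_red))

print(p_bottom_red_given_top_red)  # 2/3
```

Counting sides rather than chips is what the intuitive "one-half" argument misses: the red/red chip contributes two red tops, the red/blue chip only one.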
QUESTIONS
2.4.25. A toy manufacturer buys ball bearings from three different suppliers: 50% of her total order comes from supplier 1, 30% from supplier 2, and the rest from supplier 3. Past experience has shown that the quality control standards of the three suppliers are not all the same. Two percent of the ball bearings produced by supplier 1 are defective, while suppliers 2 and 3 produce defective bearings 3% and 4% of the time, respectively. What proportion of the ball bearings in the toy manufacturer's inventory are defective?
2.4.26. A fair coin is tossed. If a head turns up, a fair die is tossed; if a tail turns up, two fair dice are tossed. What is the probability that the face (or the sum of the faces) showing on the die (or the dice) is equal to six?
2.4.27. Foreign policy experts estimate that the probability is 0.65 that war will break out next year between two Middle East countries if either side significantly escalates its terrorist activities. Otherwise, the likelihood of war is estimated to be 0.05. Based on what has happened this year, the chances of terrorism reaching a critical level in the next twelve months are thought to be three in ten. What is the probability that the two countries will go to war?
2.4.28. A telephone solicitor is responsible for canvassing three suburbs. In the past, 60% of the completed calls to Belle Meade have resulted in contributions, compared to 55% for [...] Hill and [...] for Antioch. Her list of telephone numbers includes one thousand households from Belle Meade, one thousand from [...] Hill, and two thousand from Antioch. Suppose that she picks a number at random from the list and places the call. What is the probability that she gets a donation?
2.4.29. If men constitute 47% of the population and tell the truth 78% of the time, while women tell the truth 63% of the time, what is the probability that a person selected at random will answer a question truthfully?
2.4.30. Urn I contains three red chips and one white chip. Urn II contains two red chips and two white chips. One chip is drawn from each urn and transferred to the other urn. Then a chip is drawn from the first urn. What is the probability that the chip ultimately drawn from urn I is red?
2.4.31. The crew of the Starship Enterprise is considering launching a surprise attack against the Borg in a neutral quadrant. Possible interference by the Klingons, though, is causing Captain Picard and Data to reassess their strategy. According to Data's calculations, the probability of the Klingons joining forces with the Borg is 0.2384. Captain Picard feels that the probability of the attack being successful is 0.8 if the Enterprise can catch the Borg alone, but only 0.3 if they have to engage both adversaries. Data claims that the mission would be a misadventure if its probability of success were not at least 0.7306. Should the Enterprise attack?
2.4.32. Recall the "survival" lottery described in Question 2.2.14. What is the probability of release associated with the prisoner's optimal strategy?
2.4.33. State College is playing Backwater A&M for the conference football championship. If Backwater's first-string quarterback is healthy, A&M has a 75% chance of winning. If they have to start their backup quarterback, their chances of winning drop to 40%. The team physician says that there is a 70% chance that the first-string quarterback will play. What is the probability that Backwater wins the game?
2.4.34. An urn contains [...] red chips and sixty white chips. Six chips are drawn out and discarded, and a seventh chip is drawn. What is the probability that the seventh chip is red?
2.4.35. A study has shown that seven out of ten people will say "heads" if asked to call a coin toss. Given that the coin is fair, though, a head occurs, on the average, only five times out of ten. Does it follow that you have the advantage if you let the other person call the toss? Explain.
2.4.36. Based on pretrial speculation, the probability that a jury returns a guilty verdict in a certain high-profile murder case is thought to be 15% if the defense can discredit the police department and 80% if they cannot. Veteran court observers believe that the skilled defense attorneys have a 70% chance of convincing the jury that the police either contaminated or planted some of the key evidence. What is the probability that the jury returns a guilty verdict?
2.4.37. As an incoming freshman, Marcus believes that he has a 25% chance of earning a GPA in the 3.5 to 4.0 range, a 35% chance of graduating with a 3.0 to 3.5 GPA, and a 40% chance of finishing with a GPA less than 3.0. From what the pre-med advisor has told him, he has an 8 in 10 chance of getting into medical school if his GPA is above 3.5, a 5 in 10 chance if his GPA is in the 3.0 to 3.5 range, and only a 1 in 10 chance if his GPA falls below 3.0. Based on those estimates, what is the probability that Marcus gets into medical school?
2.4.38. The governor of a certain state has decided to come out strongly for prison reform and is proposing a new early-release program. Its guidelines are simple: prisoners related to members of the governor's staff would have a 90% chance of being released early; the probability of early release for inmates not related to the governor's staff would be 0.01. Suppose that 40% of all inmates are related to someone on the governor's staff. What is the probability that a prisoner selected at random would be eligible for early release?
2.4.39. Following are the percentages of students of State College enrolled in each of the school's main divisions. Also listed are the proportions of students in each division who are women.

Division           %      % Women
Humanities         40     60
Natural Science    10     15
History            30     45
Social Science     20     75
Total              100

Suppose the Registrar selects one person at random. What is the probability that the student selected will be a male?
Bayes Theorem
The second result in this section that is set against the backdrop of a partitioned sample space has a curious history. The first explicit statement of Theorem 2.4.2, coming in 1812, was due to Laplace, but it was named after the Reverend Thomas Bayes, whose 1763 paper (published posthumously) had already outlined the result. On one level, the theorem is a relatively minor extension of the definition of conditional probability. When viewed from a loftier perspective, though, it takes on some rather profound philosophical implications. The latter, in fact, have precipitated a schism among practicing statisticians: "Bayesians" analyze data one way; "non-Bayesians" often take a fundamentally different approach (see Section 5.8).
Our use of the result here will have nothing to do with its statistical interpretation. We will apply it simply as the Reverend Bayes originally intended, as a formula for evaluating a certain kind of "inverse" probability. If we know $P(B \mid A_i)$ for all $i$, the theorem enables us to compute conditional probabilities "in the other direction"; that is, we can deduce $P(A_j \mid B)$ from the $P(B \mid A_i)$'s.
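Theorem 2.4.2. (Bayes) Let $\{A_i\}_{i=1}^{n}$ be a set of $n$ events, each with positive probability, that partition $S$ in such a way that $\bigcup_{i=1}^{n} A_i = S$ and $A_i \cap A_j = \emptyset$ for $i \neq j$. For any event $B$ (also defined on $S$), where $P(B) > 0$,
$$P(A_j \mid B) = \frac{P(B \mid A_j)P(A_j)}{\sum_{i=1}^{n} P(B \mid A_i)P(A_i)}$$
for any $1 \leq j \leq n$.
Proof. From the definition of conditional probability,
$$P(A_j \mid B) = \frac{P(A_j \cap B)}{P(B)} = \frac{P(B \mid A_j)P(A_j)}{P(B)}$$
But Theorem 2.4.1 allows the denominator to be written as $\sum_{i=1}^{n} P(B \mid A_i)P(A_i)$, and the result follows. □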
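Bayes' formula can be read as a two-step recipe: multiply each $P(B \mid A_i)$ by $P(A_i)$, then divide every product by their sum. The following is a minimal sketch of that recipe, not part of the original text; the function name `bayes_posteriors` and the numbers in the illustration are hypothetical.

```python
from fractions import Fraction

def bayes_posteriors(likelihoods, priors):
    """Return [P(A_1 | B), ..., P(A_n | B)] given the P(B | A_i) and the P(A_i)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(B | A_i) P(A_i)
    p_b = sum(joint)                                      # Theorem 2.4.1
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return [j / p_b for j in joint]

# Hypothetical illustration: P(A_1) = 1/4, P(A_2) = 3/4,
# P(B | A_1) = 1/2, P(B | A_2) = 1/3.
posteriors = bayes_posteriors(
    [Fraction(1, 2), Fraction(1, 3)],
    [Fraction(1, 4), Fraction(3, 4)],
)
print(posteriors)       # [Fraction(1, 3), Fraction(2, 3)]
print(sum(posteriors))  # 1
```

Because every posterior shares the same denominator, the returned values always sum to one, which is a convenient sanity check on hand calculations.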
PROBLEM-SOLVING HINTS
(Working with Partitioned Sample Spaces)
Students sometimes have difficulty setting up problems that involve partitioned sample spaces (in particular, ones whose solution requires an application of Theorem 2.4.1 or 2.4.2) because of the nature and amount of information that needs to be incorporated into the answers. The key is learning to identify which part of the "given" corresponds to $B$ and which parts correspond to the $A_i$'s. The following hints may help.
1. As you read the question, pay particular attention to the last one or two sentences. Is the problem asking for an unconditional probability (in which case Theorem 2.4.1 applies) or a conditional probability (in which case Theorem 2.4.2 applies)?
2. If the question is asking for an unconditional probability, let $B$ denote the event whose probability you are trying to find; if the question is asking for a conditional probability, let $B$ denote the event that has already happened.
3. Once event $B$ has been identified, reread the beginning of the question and assign the $A_i$'s.
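EXAMPLE 2.4.12
A biased coin, twice as likely to come up heads as tails, is tossed once. If it shows heads, a chip is drawn from urn I, which contains three white chips and four red chips; if it shows tails, a chip is drawn from urn II, which contains six white chips and three red chips. Given that a white chip was drawn, what is the probability that the coin came up tails (see Figure 2.4.11)?
FIGURE 2.4.11 (urn I: 3 white, 4 red; urn II: 6 white, 3 red; a white chip is drawn)
Since $P(\text{Heads}) = 2P(\text{Tails})$, it must be true that $P(\text{Heads}) = \frac{2}{3}$ and $P(\text{Tails}) = \frac{1}{3}$. Define the events
$B$: white chip is drawn
$A_1$: coin came up heads (i.e., chip came from urn I)
$A_2$: coin came up tails (i.e., chip came from urn II)
Our objective is to find $P(A_2 \mid B)$. From Figure 2.4.11,
$$P(B \mid A_1) = \frac{3}{7} \qquad P(B \mid A_2) = \frac{6}{9} \qquad P(A_1) = \frac{2}{3} \qquad P(A_2) = \frac{1}{3}$$
so
$$P(A_2 \mid B) = \frac{P(B \mid A_2)P(A_2)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2)} = \frac{(6/9)(1/3)}{(3/7)(2/3) + (6/9)(1/3)} = \frac{7}{16}$$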
EXAMPLE 2.4.13
During a power blackout, one hundred persons are arrested on suspicion of looting. Each is given a polygraph test. From past experience it is known that the polygraph is 90% reliable when administered to a guilty suspect and 98% reliable when given to someone who is innocent. Suppose that of the one hundred persons taken into custody, only twelve were actually involved in any wrongdoing. What is the probability that a given suspect is innocent given that the polygraph says he is guilty?
Let $B$ be the event "Polygraph says suspect is guilty," and let $A_1$ and $A_2$ be the events "Suspect is guilty" and "Suspect is not guilty," respectively. To say that the polygraph is "90% reliable when administered to a guilty suspect" means that $P(B \mid A_1) = 0.90$. Similarly, the 98% reliability for innocent suspects implies that $P(B^c \mid A_2) = 0.98$, or, equivalently, $P(B \mid A_2) = 0.02$.
We also know that $P(A_1) = \frac{12}{100}$ and $P(A_2) = \frac{88}{100}$. Substituting into Theorem 2.4.2, then, shows that the probability a suspect is innocent given that the polygraph says he is guilty is 0.14:
$$P(A_2 \mid B) = \frac{P(B \mid A_2)P(A_2)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2)} = \frac{(0.02)(88/100)}{(0.90)(12/100) + (0.02)(88/100)} = 0.14$$
EXAMPLE 2.4.14
As medical technology advances and adults become more health conscious, the demand for diagnostic screening tests inevitably increases. Looking for problems, though, when no symptoms are present can have undesirable consequences that may outweigh the intended benefits.
Suppose, for example, a woman has a medical procedure performed to see whether she has a certain type of cancer. Let $B$ denote the event that the test says she has cancer, and let $A_1$ denote the event that she actually does (and $A_2$, the event that she does not). Furthermore, suppose the prevalence of the disease and the precision of the diagnostic test are such that
$P(A_1) = 0.0001$ [and $P(A_2) = 0.9999$]
$P(B \mid A_1) = 0.90 = P(\text{test says woman has cancer when, in fact, she does})$
$P(B \mid A_2) = P(B \mid A_1^c) = 0.001 = P(\text{false positive}) = P(\text{test says woman has cancer when, in fact, she does not})$
What is the probability that she does have cancer, given that the diagnostic procedure says she does? That is, calculate $P(A_1 \mid B)$.
Although the method of solution here is straightforward, the actual numerical answer is not what we would expect. From Theorem 2.4.2,
$$P(A_1 \mid B) = \frac{P(B \mid A_1)P(A_1)}{P(B \mid A_1)P(A_1) + P(B \mid A_1^c)P(A_1^c)} = \frac{(0.9)(0.0001)}{(0.9)(0.0001) + (0.001)(0.9999)} = 0.08$$
So, only 8% of those women identified as having cancer actually do! Table 2.4.3 shows the strong dependence of $P(A_1 \mid B)$ on $P(A_1)$ and $P(B \mid A_1^c)$.
TABLE 2.4.3

P(A_1)     P(B | A_1^c)    P(A_1 | B)
0.0001     0.001           0.08
0.0001     0.0001          0.47
0.001      0.001           0.47
0.001      0.0001          0.90
0.01       0.001           0.90
0.01       0.0001          0.99
In light of these probabilities, the practicality of screening programs directed at diseases having a low prevalence is open to question, especially when the diagnostic procedure, itself, poses a nontrivial health risk. (For precisely those two reasons, the use of chest X-rays to screen for tuberculosis is no longer advocated by the medical community.)
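The entries of Table 2.4.3 follow from one application of Theorem 2.4.2 per row, holding $P(B \mid A_1) = 0.90$ fixed. The sketch below shows one way to tabulate them; the function and parameter names (`posterior`, `prevalence`, `sensitivity`, `false_positive_rate`) are labels chosen for this illustration and do not appear in the text.

```python
def posterior(prevalence, sensitivity, false_positive_rate):
    """P(A_1 | B) from Bayes' theorem with the two-event partition {A_1, A_1^c}."""
    true_pos = sensitivity * prevalence                   # P(B | A_1) P(A_1)
    false_pos = false_positive_rate * (1 - prevalence)    # P(B | A_1^c) P(A_1^c)
    return true_pos / (true_pos + false_pos)

SENSITIVITY = 0.90  # P(B | A_1), as given in Example 2.4.14

print("P(A1)    P(B|A1^c)  P(A1|B)")
for prevalence in (0.0001, 0.001, 0.01):
    for fpr in (0.001, 0.0001):
        value = posterior(prevalence, SENSITIVITY, fpr)
        print(f"{prevalence:<8} {fpr:<10} {value:.2f}")
```

Lowering the false-positive rate by a factor of ten has roughly the same effect on the posterior as raising the prevalence by a factor of ten, which is why the values 0.47 and 0.90 each appear twice.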
EXAMPLE 2.4.15
According to the manufacturer's specifications, your home burglar alarm has a 95% chance of going off if someone breaks into your house. During the two years you have lived there, the alarm went off on five different nights, each time for no apparent reason. Suppose the alarm goes off tomorrow night. What is the probability that someone is trying to break into your house? Note: Police statistics show that the chances of any particular house in your neighborhood being burglarized on any given night are two in ten thousand.
Let $B$ be the event "Alarm goes off tomorrow night," and let $A_1$ and $A_2$ be the events "House is being burglarized" and "House is not being burglarized," respectively. Then
$P(B \mid A_1) = 0.95$
$P(B \mid A_2) = 5/730$ (i.e., five nights in two years)
$P(A_1) = 2/10{,}000$
$P(A_2) = 1 - P(A_1) = 9{,}998/10{,}000$
The probability in question is $P(A_1 \mid B)$.
Intuitively, it might seem that $P(A_1 \mid B)$ should be close to one because the alarm's "performance" probabilities look good: $P(B \mid A_1)$ is close to one (as it should be) and $P(B \mid A_2)$ is close to zero (as it should be). Nevertheless, $P(A_1 \mid B)$ turns out to be surprisingly small:
$$P(A_1 \mid B) = \frac{(0.95)(2/10{,}000)}{(0.95)(2/10{,}000) + (5/730)(9{,}998/10{,}000)} = 0.027$$
That is, if you hear the alarm going off, the probability is only 0.027 that the house is being burglarized.
Computationally, the reason $P(A_1 \mid B)$ is so small is that $P(A_2)$ is so large. The latter makes the denominator of $P(A_1 \mid B)$ large and, in effect, "washes out" the numerator. Even if $P(B \mid A_1)$ were substantially increased (by installing a more expensive alarm), $P(A_1 \mid B)$ would remain largely unchanged (see Table 2.4.4).

TABLE 2.4.4

P(B | A_1)    0.95     0.97     0.99     0.999
P(A_1 | B)    0.027    0.028    0.028    0.028
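The insensitivity of $P(A_1 \mid B)$ to the alarm's reliability can be reproduced numerically. The sketch below recomputes the posterior for several values of $P(B \mid A_1)$, using the burglary rate of 2 in 10,000 and the false-alarm rate of 5/730 stated in Example 2.4.15; the function and constant names are hypothetical, and exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction

P_BURGLARY = Fraction(2, 10_000)   # P(A_1)
P_FALSE_ALARM = Fraction(5, 730)   # P(B | A_2): five nights in two years

def p_burglary_given_alarm(p_alarm_given_burglary):
    """P(A_1 | B) by Theorem 2.4.2 for a given value of P(B | A_1)."""
    numerator = p_alarm_given_burglary * P_BURGLARY
    return numerator / (numerator + P_FALSE_ALARM * (1 - P_BURGLARY))

for reliability in ("0.95", "0.97", "0.99", "0.999"):
    value = p_burglary_given_alarm(Fraction(reliability))
    print(reliability, round(float(value), 3))
```

Even a perfect alarm, with $P(B \mid A_1) = 1$, would leave the posterior below 0.03 in this setting, because the false alarms on the many burglary-free nights dominate the denominator.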
EXAMPLE 2.4.16
Currently a college senior, Jeremy has had a secret crush on Emma ever since the [...] grade. Two weeks ago, fearing that his feelings would forever go unrequited, he broke his silence and sent her a letter through Campus Mail, acknowledging his secret romance. Now, fourteen agonizing days later, he has yet to receive a response. Hoping against hope, Jeremy and his fragile psyche are clinging to the possibility that his letter was lost in the mail. Assuming that (1) Emma (who is actually dating Jeremy's father) has a 70% chance of mailing a response if, in fact, she had received the letter and (2) the Campus Post Office has a one in fifty chance of losing any particular piece of mail, what is the probability that Emma never received Jeremy's confession of the heart?
Let $B$ denote the event that Jeremy does not receive a response; let $A_1$ and $A_2$ denote the events that Emma did and did not, respectively, receive Jeremy's letter. Our objective is to find $P(A_2 \mid B)$.
From what we know about Emma's behavior and the incompetence of the Campus Post Office, $P(A_2) = \frac{1}{50}$ and, of course, $P(B \mid A_2) = 1$. Also,
$$P(B \mid A_1) = P(\text{Jeremy receives no response} \mid \text{Emma received Jeremy's letter})$$
$$= P[\text{Emma does not respond} \cup (\text{Emma responds} \cap \text{Post Office loses letter})]$$
$$= P(\text{Emma does not respond}) + P(\text{letter is lost} \mid \text{Emma responds}) \times P(\text{Emma responds})$$
$$= 0.30 + (1/50)(0.70) = 0.314$$
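Substituting into Theorem 2.4.2 gives
$$P(A_2 \mid B) = \frac{(1)(1/50)}{(0.314)(49/50) + (1)(1/50)} = 0.061$$
Sadly, the answer is not good news for Jeremy. If $P(A_2 \mid B) = 0.061$, it follows that the probability of Emma's having received the letter but not caring enough to respond was almost 94%. "Faint heart ne'er won fair lady," but Jeremy would probably be well-advised to direct his romantic attention elsewhere.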
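The arithmetic in Example 2.4.16 has two layers: first $P(B \mid A_1)$ is built from a union of two disjoint events, then Bayes' theorem is applied. The following sketch retraces both steps with exact fractions; the variable names are hypothetical and the inputs are the 70% reply chance and the one-in-fifty loss rate stated in the example.

```python
from fractions import Fraction

P_LOST = Fraction(1, 50)    # Post Office loses any particular piece of mail
P_REPLY = Fraction(7, 10)   # Emma mails a response, given that she got the letter

# P(B | A_1): no response reaches Jeremy although Emma received the letter.
# Either she does not reply, or she replies and the reply is lost.
p_b_given_a1 = (1 - P_REPLY) + P_REPLY * P_LOST
p_b_given_a2 = Fraction(1)  # P(B | A_2): no letter received, so no response

p_a2 = P_LOST               # Jeremy's letter never arrived
p_a1 = 1 - p_a2

p_a2_given_b = (p_b_given_a2 * p_a2) / (p_b_given_a1 * p_a1 + p_b_given_a2 * p_a2)

print(float(p_b_given_a1))            # 0.314
print(round(float(p_a2_given_b), 3))  # 0.061
```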
QUESTIONS
2.4.40. Urn I contains two white chips and one red chip; urn II has one white chip and two red chips. One chip is drawn at random from urn I and transferred to urn II. Then one chip is drawn from urn II. Suppose that a red chip is selected from urn II. What is the probability that the chip transferred was white?
2.4.41. Urn I contains three red chips and five white chips; urn II contains four reds and four whites; urn III contains five reds and three whites. One urn is chosen at random and one chip is drawn from that urn. Given that the chip drawn was red, what is the probability that III was the urn sampled?
2.4.42. A dashboard warning light is supposed to flash red if a car's oil pressure is too low. On a certain model, the probability of the light flashing when it should is 0.99; 2% of the time, though, it flashes for no apparent reason. If there is a 10% chance that the oil pressure really is low, what is the probability that a driver needs to be concerned if the warning light goes on?
2.4.43. Building permits were issued to three contractors starting up a new subdivision: Tara Construction built two houses; Westview, three houses; and Hearthstone, six houses. Tara's houses have a 60% probability of developing leaky basements; homes built by Westview and Hearthstone have that same problem 50% of the time and 40% of the time, respectively. Yesterday, the Better Business Bureau received a complaint from one of the new homeowners that his basement is leaking. Who is most likely to have been the contractor?
2.4.44. Two sections of a senior probability course are being taught. From what she has heard about the two instructors listed, Francesca estimates that her chances of passing the course are 0.85 if she gets professor X and 0.60 if she gets professor Y. The section into which she is put is determined by the registrar. Suppose that her chances of being assigned to professor X are four out of ten. Fifteen weeks later we learn that Francesca did, indeed, pass the course. What is the probability she was enrolled in professor X's section?
2.4.45. A store owner is willing to cash personal checks for amounts up to [...], but she is wary of customers who wear sunglasses. [...] of checks written by persons wearing sunglasses bounce. In contrast, 98% of the checks written by persons not wearing sunglasses clear the bank. She estimates that 10% of her customers wear sunglasses. If the bank returns a check and marks it "insufficient funds," what is the probability it was written by someone wearing sunglasses?
2.4.46. Brett and Margo have each thought about murdering their rich Uncle Basil in hopes of claiming their inheritance a bit early. Hoping to take advantage of Basil's predilection for immoderate desserts, Brett has put rat poison in the cherries flambé; Margo, unaware of Brett's activities, has laced the chocolate mousse with cyanide. Given the amounts likely to be eaten, the probability of the rat poison being fatal is 0.60; the cyanide, 0.90. Based on other dinners where Basil was presented with the same dessert options, we can assume that he has a 50% chance of asking for the cherries flambé, a 40% chance of ordering the chocolate mousse, and a 10% chance of skipping dessert altogether. No sooner are the dishes cleared away when Basil drops dead. In the absence of any other evidence, who should be considered the prime suspect?
2.4.47. Josh takes a twenty-question multiple-choice exam where each question has five possible answers. Some of the answers he knows, while others he gets right just by making lucky guesses. Suppose that the conditional probability of his knowing the answer to a randomly selected question given that he got it right is 0.92. How many of the twenty questions was he prepared for?
2.4.48. Recently the Senate Committee on Labor and Public Welfare investigated the feasibility of setting up a national screening program to detect child abuse. A team of consultants estimated the following probabilities: (1) one child in ninety is abused, (2) a physician can detect an abused child 90% of the time, and (3) a screening program would incorrectly label 3% of all nonabused children as abused. What is the probability that a child is actually abused given that the screening program makes that diagnosis? How does the probability change if the incidence of abuse is one in one thousand? Or one in fifty?
2.4.49. At State University, 30% of the students are majoring in Humanities, 50% in History and Culture, and 20% in Science. Moreover, according to figures released by the Registrar, the percentages of women majoring in Humanities, History and Culture, and Science are 75%, 45%, and 30%, respectively. Suppose Justin meets Anna at a fraternity party. What is the probability that Anna is a History and Culture major?
2.4.50. An "eyes-only" diplomatic message is to be transmitted as a binary code of 0s and 1s. Past experience with the equipment being used suggests that if a 0 is sent, it will be (correctly) received as a 0 90% of the time (and mistakenly decoded as a 1 10% of the time). If a 1 is sent, it will be received as a 1 [...] of the time (and as a 0 [...] of the time). The text being sent is thought to be 70% 1s and 30% 0s. Suppose the next signal sent is received as a [...]. What is the probability that it was sent as a 0?
2.4.51. When Zach wants to contact his girlfriend and he knows she is not at home, he is twice as likely to send her an e-mail as he is to leave a message on her answering machine. The probability that she responds to his e-mail within three hours is 80%; her chances of being similarly prompt in answering a phone message increase to 90%. Suppose she responded to the message he left this morning within two hours. What is the probability that Zach was communicating with her via e-mail?
2.4.52. A dot.com company ships products from three different warehouses (A, B, and C). Based on customer complaints, it appears that 3% of the shipments coming from A are somehow faulty, as are 5% of the shipments coming from B, and 2% coming from C. Suppose a customer is mailed an order and calls in a complaint the next day. What is the probability the item came from Warehouse C? Assume that Warehouses A, B, and C ship 30%, 20%, and 50% of the dot.com's sales, respectively.
2.4.53. A desk has three drawers. The first contains two gold coins, the second has two silver coins, and the third has one gold coin and one silver coin. A coin is drawn from a drawer selected at random. Suppose the coin selected was silver. What is the probability that the other coin in that drawer is gold?
2.5 INDEPENDENCE
Section 2.4 dealt with the problem of reevaluating the probability of a given event in light of the additional information that some other event has already occurred. It often is the case, though, that the probability of the given event remains unchanged, regardless of the outcome of the second event; that is, $P(A \mid B) = P(A) = P(A \mid B^c)$. Events sharing this property are said to be independent. Definition 2.5.1 gives a necessary and sufficient condition for two events to be independent.
Definition 2.5.1. Two events $A$ and $B$ are said to be independent if $P(A \cap B) = P(A) \cdot P(B)$.
Comment. The fact that the probability of the intersection of two independent events is equal to the product of their individual probabilities follows immediately from our first definition of independence, that $P(A \mid B) = P(A)$. Recall that the definition of conditional probability holds true for any two events $A$ and $B$ [provided that $P(B) > 0$]:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
But $P(A \mid B)$ can equal $P(A)$ only if $P(A \cap B)$ factors into $P(A)$ times $P(B)$.
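EXAMPLE 2.5.1
Let $A$ be the event of drawing a king from a standard poker deck and $B$, the event of drawing a diamond. Then, by Definition 2.5.1, $A$ and $B$ are independent because the probability of their intersection (drawing a king of diamonds) is equal to $P(A) \cdot P(B)$:
$$P(A \cap B) = \frac{1}{52} = \frac{1}{13} \cdot \frac{1}{4} = P(A) \cdot P(B)$$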
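Definition 2.5.1 can be checked mechanically for any pair of events on a finite sample space with equally likely outcomes. The sketch below does so for one card drawn from a 52-card deck, taking "card is a king" and "card is a diamond" as the two events and adding a third event, "card is a face card," as a contrasting case. The helper names `prob` and `independent` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]

def prob(event):
    """Probability of an event (a predicate on cards) for one card drawn at random."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def independent(event_a, event_b):
    """Check Definition 2.5.1: P(A and B) == P(A) * P(B)."""
    both = prob(lambda card: event_a(card) and event_b(card))
    return both == prob(event_a) * prob(event_b)

def is_king(card):
    return card[0] == "K"

def is_diamond(card):
    return card[1] == "diamonds"

def is_face_card(card):
    return card[0] in ("J", "Q", "K")

print(independent(is_king, is_diamond))    # True
print(independent(is_king, is_face_card))  # False
```

The second check fails because every king is a face card: knowing that a face card was drawn raises the probability of a king from 1/13 to 1/3.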
EXAMPLE 2.5.2
Suppose that $A$ and $B$ are independent events. Does it follow that $A^c$ and $B^c$ are also independent? That is, does $P(A \cap B) = P(A) \cdot P(B)$ guarantee that $P(A^c \cap B^c) = P(A^c) \cdot P(B^c)$?
Yes. The proof is accomplished by equating two different expressions for $P(A^c \cup B^c)$. First, by the formula for the probability of a union,
$$P(A^c \cup B^c) = P(A^c) + P(B^c) - P(A^c \cap B^c) \qquad (2.5.1)$$
But the union of two complements is the complement of their intersection (recall Question 2.2.32). Therefore,
$$P(A^c \cup B^c) = 1 - P(A \cap B) \qquad (2.5.2)$$
Combining Equations 2.5.1 and 2.5.2, we get
$$1 - P(A \cap B) = 1 - P(A) + 1 - P(B) - P(A^c \cap B^c)$$
Since $A$ and $B$ are independent, $P(A \cap B) = P(A) \cdot P(B)$, so
$$P(A^c \cap B^c) = 1 - P(A) + 1 - P(B) - [1 - P(A) \cdot P(B)]$$
$$= [1 - P(A)][1 - P(B)]$$
$$= P(A^c) \cdot P(B^c)$$
the latter factorization implying that $A^c$ and $B^c$ are, themselves, independent. (If $A$ and $B$ are independent, are $A$ and $B^c$ independent?)
EXAMPLE 2.5.3
Administrators-R-Us is responding to litigation by establishing hiring goals by race and sex for its staff. So far they have agreed to employ the 120 people characterized in Table 2.5.1. How many black women do they need in order for the events $A$: Employee is female and $B$: Employee is black to be independent?

TABLE 2.5.1

          White    Black
Male      50       30
Female    40

Let $x$ denote the number of black women necessary for $A$ and $B$ to be independent. Then
$$P(A \cap B) = P(\text{Black female}) = \frac{x}{120 + x}$$
must equal
$$P(A)P(B) = P(\text{Female})P(\text{Black}) = \frac{40 + x}{120 + x} \cdot \frac{30 + x}{120 + x}$$
Setting $x/(120 + x) = [(40 + x)/(120 + x)] \cdot [(30 + x)/(120 + x)]$ implies that $x = 24$ black women need to be on the staff for $A$ and $B$ to be independent.
Comment. Having shown that "Employee is female" and "Employee is black" are independent, does it follow that, say, "Employee is male" and "Employee is white" are independent? Yes. By virtue of the derivation in Example 2.5.2, the independence of events $A$ and $B$ implies the independence of events $A^c$ and $B^c$ (as well as $A$ and $B^c$ and $A^c$ and $B$). It follows, then, that the $x = 24$ black women not only make $A$ and $B$ independent, they also imply, more generally, that "race" and "sex" are independent.
EXAMPLE 2.5.4
Suppose that two events, $A$ and $B$, each having nonzero probability, are mutually exclusive. Are they also independent?
No. If $A$ and $B$ are mutually exclusive, then $P(A \cap B) = 0$. But $P(A) \cdot P(B) > 0$ (by assumption), so the equality spelled out in Definition 2.5.1 that characterizes independence is not met.
Deducing Independence
Sometimes the physical circumstances surrounding two events make it obvious that the occurrence (or nonoccurrence) of one has absolutely no influence or effect on the occurrence (or nonoccurrence) of the other. If that should be the case, then the two events will necessarily be independent in the sense of Definition 2.5.1.
Suppose a coin is tossed twice. Clearly, whatever happens on the first toss has no physical connection or influence on the outcome of the second. If $A$ and $B$, then, are events defined on the second and first tosses, respectively, it would have to be the case that $P(A \mid B) = P(A \mid B^c) = P(A)$. For example, let $A$ be the event that the second toss of a fair coin is a head, and let $B$ be the event that the first toss of that coin is a tail. Then
$$P(A \mid B) = P(\text{Head on second toss} \mid \text{Tail on first toss}) = P(\text{Head on second toss}) = \frac{1}{2}$$
Being able to infer that certain events are independent proves to be of enormous help in solving certain problems. The reason is that many events of interest are, in fact, intersections. If those events are independent, then the probability of that intersection reduces to a simple product (because of Definition 2.5.1); that is, $P(A \cap B) = P(A) \cdot P(B)$. For the coin tosses just described,
$$P(A \cap B) = P(\text{head on second toss} \cap \text{tail on first toss}) = P(A) \cdot P(B)$$
$$= P(\text{head on second toss}) \cdot P(\text{tail on first toss}) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$
EXAMPLE 2.5.5
Myra and Carlos are summer interns working as proofreaders for a local newspaper. Based on aptitude tests, Myra has a 50% chance of spotting a hyphenation error, while Carlos picks up on that same kind of mistake 80% of the time. Suppose the copy they are proofing contains a hyphenation error. What is the probability it goes undetected?
Let $A$ and $B$ be the events that Myra and Carlos, respectively, catch the mistake. By assumption, $P(A) = 0.50$ and $P(B) = 0.80$. What we are looking for is the probability of the complement of a union. That is,
$$P(\text{error goes undetected}) = 1 - P(\text{error is detected})$$
$$= 1 - P(\text{Myra or Carlos or both see the mistake})$$
$$= 1 - P(A \cup B)$$
$$= 1 - \{P(A) + P(B) - P(A \cap B)\}$$
where the last step uses the formula for the probability of a union. Since proofreaders invariably work by themselves, events $A$ and $B$ are necessarily independent, so $P(A \cap B)$ would reduce to the product, $P(A) \cdot P(B)$. It follows that such an error would go unnoticed 10% of the time:
$$P(\text{error goes undetected}) = 1 - \{0.50 + 0.80 - (0.50)(0.80)\} = 1 - 0.90 = 0.10$$
EXAMPLE 2.5.6
Suppose that one of the genes associated with the control of carbohydrate metabolism exhibits two alleles, a dominant $W$ and a recessive $w$. If the probabilities of the $WW$, $Ww$, and $ww$ genotypes in the present generation are $p$, $q$, and $r$, respectively, for both males and females, what are the chances that an individual in the next generation will be a $ww$?
Let $A$ denote the event that an offspring receives a $w$ allele from its father; let $B$ denote the event that it receives the recessive allele from its mother. What we are looking for is $P(A \cap B)$.
According to the information given,
$p = P(\text{parent has genotype } WW) = P(WW)$
$q = P(\text{parent has genotype } Ww) = P(Ww)$
$r = P(\text{parent has genotype } ww) = P(ww)$
If an offspring is equally likely to receive either of its parent's alleles, the probabilities of $A$ and $B$ can be computed using Theorem 2.4.1:
$$P(A) = P(A \mid WW)P(WW) + P(A \mid Ww)P(Ww) + P(A \mid ww)P(ww)$$
$$= 0 \cdot p + \frac{1}{2} \cdot q + 1 \cdot r = r + \frac{q}{2} = P(B)$$
Lacking any evidence to the contrary, there is every reason here to assume that $A$ and $B$ are independent events, in which case
$$P(A \cap B) = P(\text{offspring has genotype } ww) = P(A) \cdot P(B) = \left(r + \frac{q}{2}\right)^2$$
(This particular model for allele segregation, together with the independence assumption, is called random Mendelian mating.)
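The random Mendelian mating calculation combines Theorem 2.4.1 (to get the chance that one parent passes on a $w$ allele) with independence (to multiply the two parents' contributions). A minimal sketch follows; the function name `p_ww_offspring` and the genotype frequencies in the illustration are hypothetical.

```python
from fractions import Fraction

def p_ww_offspring(p, q, r):
    """P(offspring is ww) when parental genotypes WW, Ww, ww have probabilities p, q, r."""
    if p + q + r != 1:
        raise ValueError("genotype probabilities must sum to 1")
    # Theorem 2.4.1: P(parent passes on w) = 0*p + (1/2)*q + 1*r
    p_w_from_parent = 0 * p + Fraction(1, 2) * q + 1 * r
    # Independence of the father's and mother's contributions
    return p_w_from_parent ** 2

# Hypothetical frequencies: p = 1/4, q = 1/2, r = 1/4
print(p_ww_offspring(Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)))  # 1/4
```

With these particular frequencies the proportion of $ww$ individuals is the same in the next generation as in the present one; other inputs generally change it.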
EXAMPLE 2.5.7
and Josh
just gotten engaged. What is the probability that they have ...u,....,,_u.
blood types? Assume that blood types for both men and women are distributed in
population according to the following proportions:
Blood
A
40%
B
10%
5%
45%
AB
0
First, note that the event "Emma and Josh have different blood types" includes more
possibilities than does the event "Emma and Josh have the same blood type." That being
posed.
the case, the
will be easier to work with than the
We can start, then, by writing
P(Emrna and Josh have different blood types)
= 1 and Josh have the same blood type)
Now, if we
and Jx
the events that Emma and
respectively, have
blood type
then the event "Emma and Josh
the same blood type" is a union of
intersections,
we can write
P(Emmaand
have the same blood type) = P(EA
n
lA)
n
U (EAB
u
n
JB)
JAB) U (Eo
n
Jo)}
Since thc rour intcrsections here ure mutually exclusive, the probability
their union
probabilities. Moreover, "blood type" is not a factor in
becomes the sum of
the selection of (I spouse, so Ex and
are independent events and
n Jx) =
P(Ex)P(Jx). It follows,
that
and Josh have a 62.5% chance of having
different blood types:
blood types) = 1 - {P
P(Emma and Josh have
+
P(JB)
P(JAB)
1 - f(0.40)(0.40)
+
(0.05) (0.05)
= 0.625
QUESTIONS
2.5.1. Suppose that $P(A \cap B) = [\ldots]$, $P(A) = 0.6$, and $P(B) = 0.5$.
(a) Are A and B mutually exclusive?
(b) Are A and B independent?
(c) Find $P(A^C \cup B^C)$.
2.5.2. Spike is not a terribly bright student. His chances of passing chemistry are 0.35; mathematics, 0.40; and both, 0.12. Are the events "Spike passes chemistry" and "Spike passes mathematics" independent? What is the probability that he fails both subjects?
2.5.3. Two fair dice are rolled. What is the probability that the number showing on one will be twice the number appearing on the other?
2.5.4. Urn I has three red chips, two black chips, and five white chips; urn II has two red, four black, and three white. One chip is drawn at random from each urn. What is the probability that both chips are the same color?
2.5.5. Dana and Cathy are playing tennis. The probability that Dana wins at least one out of two games is 0.3. What is the probability that Dana wins at least one out of four?
2.5.6. Three points, $X_1$, $X_2$, and $X_3$, are chosen at random in the interval $(0, a)$. A second set of three points, $Y_1$, $Y_2$, and $Y_3$, are chosen at random in the interval $(0, b)$. Let A be the event that $X_2$ is between $X_1$ and $X_3$. Let B be the event that $Y_1 < Y_2 < Y_3$. Find $P(A \cap B)$.
2.5.7. Suppose that $P(A) = [\ldots]$ and $P(B) = [\ldots]$.
(a) What does $P(A \cup B)$ equal if
1. A and B are mutually exclusive?
2. A and B are independent?
(b) What does $P(A \mid B)$ equal if
1. A and B are mutually exclusive?
2. A and B are independent?
2.5.8. Suppose that events A, B, and C are independent.
(a) Use a Venn diagram to find an expression for $P(A \cup B \cup C)$ that does not make use of a complement.
(b) Find an expression for $P(A \cup B \cup C)$ that does make use of a complement.
2.5.9. A fair coin is tossed four times. What is the probability that the number of heads appearing on the first two tosses is equal to the number of heads appearing on the second two tosses?
2.5.10. Suppose that two cards are drawn from a standard 52-card poker deck. Let A be the event that both are either a jack, queen, king, or ace of hearts, and let B be the event that both are aces. Are A and B independent? Note: There are 1,326 equally likely ways to draw two cards from a poker deck.
Defining the Independence of More Than Two Events
It is not immediately obvious how to extend Definition 2.5.1 to, say, three events. To call A, B, and C independent, should we require that the probability of the three-way intersection factors into the product of the three original probabilities,
$$P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C) \qquad (2.5.3)$$
or should we impose the definition we already have on the three pairs of events:
$$
\begin{aligned}
P(A \cap B) &= P(A) \cdot P(B) \\
P(B \cap C) &= P(B) \cdot P(C) \\
P(A \cap C) &= P(A) \cdot P(C)
\end{aligned}
\qquad (2.5.4)
$$
Actually, neither condition by itself is sufficient. If three events satisfy Equations 2.5.3 and 2.5.4, we will call them independent (or mutually independent), but Equation 2.5.3 does not imply Equation 2.5.4, nor does Equation 2.5.4 imply Equation 2.5.3 (see Questions 2.5.11 and 2.5.12).
More generally, the independence of n events requires that the probabilities of all possible intersections equal the products of all the corresponding individual probabilities. Definition 2.5.2 states the result formally. Analogous to what was true in the case of two events, the practical applications of Definition 2.5.2 arise when n events are mutually independent, and we can calculate $P(A_1 \cap A_2 \cap \cdots \cap A_n)$ by computing the product $P(A_1) \cdot P(A_2) \cdots P(A_n)$.
Definition 2.5.2. Events $A_1, A_2, \ldots, A_n$ are said to be independent if for every set of indices $i_1, i_2, \ldots, i_k$ between 1 and n, inclusive,
$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdot P(A_{i_2}) \cdots P(A_{i_k})$$
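The gap between pairwise and mutual independence can be made concrete on a small, equally likely sample space. The sketch below is an illustration only: the two-coin experiment and the function names `prob`, `pairwise_independent`, and `mutually_independent` are assumptions introduced here, not part of the text. It checks the product condition for every subset of two or more events.

```python
from fractions import Fraction
from itertools import combinations, product


def prob(event, space):
    """Probability of an event (a predicate) on an equally likely space."""
    return Fraction(sum(1 for w in space if event(w)), len(space))


def pairwise_independent(events, space):
    """True if every pair of events satisfies P(E and F) = P(E)P(F)."""
    return all(
        prob(lambda w: e(w) and f(w), space) == prob(e, space) * prob(f, space)
        for e, f in combinations(events, 2)
    )


def mutually_independent(events, space):
    """True if the product rule holds for every subset of size >= 2."""
    for k in range(2, len(events) + 1):
        for subset in combinations(events, k):
            joint = prob(lambda w: all(e(w) for e in subset), space)
            expected = Fraction(1)
            for e in subset:
                expected *= prob(e, space)
            if joint != expected:
                return False
    return True


# Hypothetical illustration: two tosses of a fair coin.
space = list(product("HT", repeat=2))
first_heads = lambda w: w[0] == "H"
second_heads = lambda w: w[1] == "H"
tosses_agree = lambda w: w[0] == w[1]
events = [first_heads, second_heads, tosses_agree]

if __name__ == "__main__":
    print(pairwise_independent(events, space))   # every pair factors
    print(mutually_independent(events, space))   # the triple does not
```

Here each pair of events factors, but the three-way intersection has probability 1/4 rather than 1/8, so the events are not mutually independent.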
EXAMPLE 2.5.8
Audrey has registered for four courses in the upcoming fall term, one each in physics, English, economics, and sociology. Based on what has happened in the recent past, it would be reasonable to assume that she has a 20% chance of being bumped from the physics class, a 10% chance of being bumped from the English class, a 30% chance of being bumped from the economics class, and no chance of being bumped from the sociology class. What is the probability that she fails to get into at least one class?
Let $A_1$, $A_2$, $A_3$, and $A_4$ denote the events
$A_1$: Audrey is bumped from physics
$A_2$: Audrey is bumped from English
$A_3$: Audrey is bumped from economics
$A_4$: Audrey is bumped from sociology
By assumption, $P(A_1) = 0.20$, $P(A_2) = 0.10$, $P(A_3) = 0.30$, and $P(A_4) = 0$. The chance that Audrey gets bumped from at least one class can be written as the probability of a union,
$$P(\text{Audrey is bumped from at least one class}) = P(A_1 \cup A_2 \cup A_3 \cup A_4) \qquad (2.5.5)$$
but evaluating Equation 2.5.5 is somewhat involved because the $A_i$'s are not mutually exclusive. A much simpler approach is to express the complement of "bumped from at least one" as an intersection:
$$
\begin{aligned}
P(\text{Audrey is bumped from at least one class}) &= 1 - P(\text{Audrey is not bumped from any classes}) \\
&= 1 - P(A_1^C \cap A_2^C \cap A_3^C \cap A_4^C)
\end{aligned}
$$
Since different departments are involved, the $A_i$'s are likely to be independent events, so the intersection "factors" and we can write
$$
\begin{aligned}
P(\text{Audrey is bumped from at least one class}) &= 1 - P(A_1^C)P(A_2^C)P(A_3^C)P(A_4^C) \\
&= 1 - (0.80)(0.90)(0.70)(1.00) \\
&= 0.496
\end{aligned}
$$
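To see why the complement is the easier route, the sketch below computes the probability of "at least one" two ways under the independence assumption of Example 2.5.8: once through the product of complements and once through inclusion-exclusion on the union. The function names are illustrative assumptions, not notation from the text.

```python
from itertools import combinations
from math import prod


def prob_at_least_one(probs):
    """1 - P(none occur), valid when the events are independent."""
    return 1.0 - prod(1.0 - p for p in probs)


def prob_union_inclusion_exclusion(probs):
    """P(A1 u ... u An) by inclusion-exclusion for independent events.

    Each intersection probability is the product of the individual
    probabilities, so every nonempty subset contributes one signed term.
    """
    total = 0.0
    for k in range(1, len(probs) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(probs, k):
            total += sign * prod(subset)
    return total


# Bump probabilities for physics, English, economics, sociology.
bump_probs = [0.20, 0.10, 0.30, 0.0]

if __name__ == "__main__":
    print(prob_at_least_one(bump_probs))
    print(prob_union_inclusion_exclusion(bump_probs))
```

Both routes agree, but the inclusion-exclusion version sums 15 signed terms for four events, while the complement needs a single product.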
EXAMPLE 2.5.9
The YouDie-WePay Insurance Company plans to assess its future liabilities by sampling the records of its current policyholders. A pilot study has turned up three clients (one living in Alaska, one in Missouri, and one in Vermont) whose chances of surviving to the year 2010 are 0.7, 0.9, and 0.3, respectively. What is the probability that by the end of 2009 the company will have had to pay death benefits to exactly one of the three?
Let $A_1$ be the event "Alaska client survives through 2009." Define $A_2$ and $A_3$ analogously for the Missouri client and Vermont client, respectively. Then the event E: "Exactly one dies" can be written as the union of three intersections:
$$E = (A_1 \cap A_2 \cap A_3^C) \cup (A_1 \cap A_2^C \cap A_3) \cup (A_1^C \cap A_2 \cap A_3)$$
Since each of the intersections is mutually exclusive of the other two,
$$P(E) = P(A_1 \cap A_2 \cap A_3^C) + P(A_1 \cap A_2^C \cap A_3) + P(A_1^C \cap A_2 \cap A_3)$$
Furthermore, there is no reason to believe that for all practical purposes the fates of the three are not independent. That being the case, each of the intersection probabilities reduces to a product, and we can write
$$
\begin{aligned}
P(E) &= P(A_1) \cdot P(A_2) \cdot P(A_3^C) + P(A_1) \cdot P(A_2^C) \cdot P(A_3) + P(A_1^C) \cdot P(A_2) \cdot P(A_3) \\
&= (0.7)(0.9)(0.7) + (0.7)(0.1)(0.3) + (0.3)(0.9)(0.3) \\
&= 0.543
\end{aligned}
$$
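The "exactly one" calculation generalizes to "exactly k of n independent events occur" by enumerating every occur/not-occur pattern. The following is a minimal sketch under that independence assumption; `prob_exactly_k` and `death_probs` are names introduced here for illustration.

```python
from itertools import product


def prob_exactly_k(probs, k):
    """P(exactly k of the independent events occur).

    Enumerates all 2**n occur/not-occur patterns, keeps those with exactly
    k occurrences, and adds up the product of the matching probabilities.
    """
    total = 0.0
    for pattern in product([True, False], repeat=len(probs)):
        if sum(pattern) != k:
            continue
        term = 1.0
        for occurs, p in zip(pattern, probs):
            term *= p if occurs else 1.0 - p
        total += term
    return total


# Death probabilities are the complements of the survival chances
# 0.7 (Alaska), 0.9 (Missouri), and 0.3 (Vermont).
death_probs = [1 - 0.7, 1 - 0.9, 1 - 0.3]

if __name__ == "__main__":
    print(round(prob_exactly_k(death_probs, 1), 3))
```

Summing the function over k = 0, 1, 2, 3 returns 1, which is a quick sanity check that the patterns partition the sample space.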
Comment. "Declaring" events independent for reasons other than those prescribed in the formal definition of independence is a necessarily subjective endeavor. Here we might feel fairly certain that a "random" person dying in Alaska will not affect the survival chances of a "random" person residing in Missouri (or Vermont). But there may be special circumstances that invalidate that sort of argument. For example, what if the three individuals in question were mercenaries fighting in an African border war and were all crew members assigned to the same helicopter? In practice, all we can do is look at each situation on an individual basis and try to make a reasonable judgment as to whether the occurrence of one event is likely to influence the outcome of another.
EXAMPLE 2.5.10
Protocol for making financial decisions in a certain corporation follows the "circuit" pictured in Figure 2.5.1. Any budget is first screened by 1. If he approves it, the plan is forwarded to 2, 3, and 5. If either 2 or 3 concurs, it goes to 4. If either 4 or 5 says "yes," it moves on to 6 for a final reading. Only if 6 is also in agreement does the proposal pass. Suppose that 1, 5, and 6 each has a 50% chance of saying "yes," whereas 2, 3, and 4 will each concur with a probability of 0.70. If everyone comes to a decision independently, what is the probability that a budget will pass?
Probabilities of this sort are calculated by reducing the circuit to component unions and intersections. Moreover, if all decisions are made independently, which is the case here, then every intersection becomes a product.

FIGURE 2.5.1

Let $A_i$ be the event that person i approves the budget, $i = 1, 2, \ldots, 6$. Looking at Figure 2.5.1, we see that
$$
\begin{aligned}
P(\text{budget passes}) &= P(A_1 \cap \{[(A_2 \cup A_3) \cap A_4] \cup A_5\} \cap A_6) \\
&= P(A_1)\,P\{[(A_2 \cup A_3) \cap A_4] \cup A_5\}\,P(A_6)
\end{aligned}
$$
By assumption, $P(A_1) = 0.5$, $P(A_2) = 0.7$, $P(A_3) = 0.7$, $P(A_4) = 0.7$, $P(A_5) = 0.5$, and $P(A_6) = 0.5$, so
$$
\begin{aligned}
P\{(A_2 \cup A_3) \cap A_4\} &= [P(A_2) + P(A_3) - P(A_2)P(A_3)]\,P(A_4) \\
&= [0.7 + 0.7 - (0.7)(0.7)](0.7) \\
&= 0.637
\end{aligned}
$$
and
$$
\begin{aligned}
P(\text{budget passes}) &= (0.5)\{0.637 + 0.5 - (0.637)(0.5)\}(0.5) \\
&= 0.205
\end{aligned}
$$
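A reduction of this kind is easy to get wrong, so it helps to confirm it by brute force. The sketch below, which assumes the approval structure described in Example 2.5.10 and independent decisions, enumerates all 64 yes/no patterns and compares the total with the closed-form reduction. The names `P_YES`, `budget_passes`, and the two `prob_pass_*` functions are illustrative assumptions.

```python
from itertools import product

# Probability that each person says "yes" (from Example 2.5.10).
P_YES = {1: 0.5, 2: 0.7, 3: 0.7, 4: 0.7, 5: 0.5, 6: 0.5}


def budget_passes(yes):
    """Circuit logic: 1 and {[(2 or 3) and 4] or 5} and 6."""
    return yes[1] and (((yes[2] or yes[3]) and yes[4]) or yes[5]) and yes[6]


def prob_pass_by_enumeration(p_yes):
    """Add up the probabilities of all vote patterns that pass the budget."""
    people = sorted(p_yes)
    total = 0.0
    for votes in product([True, False], repeat=len(people)):
        yes = dict(zip(people, votes))
        if budget_passes(yes):
            weight = 1.0
            for person in people:
                weight *= p_yes[person] if yes[person] else 1.0 - p_yes[person]
            total += weight
    return total


def prob_pass_by_formula(p):
    """Closed form obtained by reducing unions and intersections."""
    inner = (p[2] + p[3] - p[2] * p[3]) * p[4]        # P{(A2 u A3) n A4}
    return p[1] * (inner + p[5] - inner * p[5]) * p[6]


if __name__ == "__main__":
    print(prob_pass_by_enumeration(P_YES))
    print(prob_pass_by_formula(P_YES))
```

The enumeration makes no use of the algebraic reduction, so agreement between the two numbers is an independent check on the formula.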
Repeated Independent Events
We have already seen several examples where the event of interest was actually an intersection of independent simpler events (in which case the probability of the intersection reduced to a product). There is a special case of that basic scenario that deserves special mention because it applies to numerous real-world situations. If the events making up the intersection all arise from the same physical circumstances and assumptions (i.e., they represent repetitions of the same experiment), they are referred to as repeated independent trials. The number of such trials may be finite or infinite.
EXAMPLE 2.5.11
Suppose the string of Christmas tree lights you bought has twenty-four bulbs wired in series. If each bulb has a 99.9% chance of "working" the first time current is applied, what is the probability that the string itself will not work?
Let $A_i$ be the event that the ith bulb fails, $i = 1, 2, \ldots, 24$. Then
$$
\begin{aligned}
P(\text{string fails}) &= P(\text{at least one bulb fails}) \\
&= P(A_1 \cup A_2 \cup \cdots \cup A_{24}) \\
&= 1 - P(\text{string works}) \\
&= 1 - P(\text{all twenty-four bulbs work}) \\
&= 1 - P(A_1^C \cap A_2^C \cap \cdots \cap A_{24}^C)
\end{aligned}
$$
If we assume that bulb failures are independent events,
$$P(\text{string fails}) = 1 - P(A_1^C)P(A_2^C) \cdots P(A_{24}^C)$$
Moreover, since all the bulbs are presumably manufactured the same way, $P(A_i^C)$ is the same for all i, so
$$
\begin{aligned}
P(\text{string fails}) &= 1 - \{P(A_i^C)\}^{24} \\
&= 1 - (0.999)^{24} \\
&= 1 - 0.98 \\
&= 0.02
\end{aligned}
$$
The chances are one in fifty, in other words, that the string would not work the first time you take it out of the box.
EXAMPLE 2.5.12
A box contains one two-headed coin and eight fair coins. One is drawn at random and tossed seven times. Suppose that all seven tosses come up heads. What is the probability that the coin is fair?
This is basically a Bayes' problem, but the conditional probabilities on the right-hand side of Theorem 2.4.2 appeal to the notion of independence as well. Define the events
B: seven heads occurred in seven tosses
$A_1$: coin tossed has two heads
$A_2$: coin tossed was fair
The question is asking for $P(A_2 \mid B)$.
By virtue of the composition of the box, $P(A_1) = \frac{1}{9}$ and $P(A_2) = \frac{8}{9}$. Also,
$$
\begin{aligned}
P(B \mid A_1) &= P(\text{head on first toss} \cap \cdots \cap \text{head on seventh toss} \mid \text{coin has two heads}) \\
&= 1^7 = 1
\end{aligned}
$$
Similarly, $P(B \mid A_2) = \left(\frac{1}{2}\right)^7$. Substituting into Bayes's formula shows that the probability is 0.06 that the coin is fair:
$$
\begin{aligned}
P(A_2 \mid B) &= \frac{P(B \mid A_2)P(A_2)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2)} \\
&= \frac{\left(\frac{1}{2}\right)^7\left(\frac{8}{9}\right)}{1\left(\frac{1}{9}\right) + \left(\frac{1}{2}\right)^7\left(\frac{8}{9}\right)} \\
&= 0.06
\end{aligned}
$$
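The same Bayes calculation can be carried out exactly for any number of tosses. The sketch below assumes the box composition of Example 2.5.12 (one two-headed coin, eight fair coins) and independent tosses; the function name and its parameters are illustrative assumptions.

```python
from fractions import Fraction


def prob_fair_given_all_heads(n, fair_coins=8, two_headed_coins=1):
    """P(coin is fair | n heads in n tosses), computed exactly via Bayes."""
    total = fair_coins + two_headed_coins
    prior_fair = Fraction(fair_coins, total)
    prior_two_headed = Fraction(two_headed_coins, total)
    likelihood_fair = Fraction(1, 2) ** n   # independent tosses of a fair coin
    likelihood_two_headed = Fraction(1)     # a two-headed coin always shows heads
    numerator = likelihood_fair * prior_fair
    return numerator / (likelihood_two_headed * prior_two_headed + numerator)


if __name__ == "__main__":
    for n in (0, 1, 7, 20):
        posterior = prob_fair_given_all_heads(n)
        print(n, posterior, float(posterior))
```

For seven tosses the exact posterior is 1/17, which rounds to the 0.06 quoted above; the printed values shrink steadily as n grows.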
Comment. Let $B_n$ denote the event that the coin chosen at random is tossed n times with the result being that n heads appear. As our intuition would suggest, $P(A_2 \mid B_n) \to 0$ as $n \to \infty$:
$$\lim_{n \to \infty} P(A_2 \mid B_n) = 0$$
EXAMPLE 2.5.13
During the 1978 baseball season, Pete Rose of the Cincinnati Reds set a National League record by hitting safely in 44 consecutive games. Assume that Rose is a .300 hitter and that he comes to bat four times each game. If each at-bat is assumed to be an independent event, what probability might reasonably be associated with a hitting streak of that length?
For this problem we need to invoke the repeated independent trials model twice: once for the four at-bats making up a game and a second time for the forty-four games making up the streak. Let $A_i$ denote the event "Rose hits safely in ith game," $i = 1, 2, \ldots, 44$. Then
$$
\begin{aligned}
P(\text{Rose hits safely in forty-four consecutive games}) &= P(A_1 \cap A_2 \cap \cdots \cap A_{44}) \\
&= P(A_1) \cdot P(A_2) \cdots P(A_{44})
\end{aligned}
\qquad (2.5.6)
$$
Since all the $P(A_i)$'s are equal, we can further simplify Equation 2.5.6 by writing
$$P(\text{Rose hits safely in 44 consecutive games}) = [P(A_1)]^{44}$$
To calculate $P(A_1)$ we should focus on the complement of $A_1$. Specifically,
$$
\begin{aligned}
P(A_1) &= 1 - P(A_1^C) \\
&= 1 - P(\text{Rose does not hit safely in Game 1}) \\
&= 1 - P(\text{Rose makes four outs}) \\
&= 1 - (0.700)^4 \qquad \text{(why?)} \\
&= 0.76
\end{aligned}
$$
Therefore, the probability of a .300 hitter putting together a forty-four-game streak (during a given set of forty-four games) is 0.0000057:
$$
\begin{aligned}
P(\text{Rose hits safely in forty-four consecutive games}) &= (0.76)^{44} \\
&= 0.0000057
\end{aligned}
$$
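The two nested applications of the repeated independent trials model translate directly into two small functions. This is a sketch under the example's assumptions (a fixed batting average, a fixed number of at-bats per game, independence throughout); the function names are introduced here for illustration.

```python
def prob_hit_in_game(batting_avg, at_bats):
    """P(at least one hit in a game) = 1 - P(every at-bat is an out)."""
    return 1.0 - (1.0 - batting_avg) ** at_bats


def prob_streak(batting_avg, at_bats, games):
    """P(at least one hit in each of `games` independent games)."""
    return prob_hit_in_game(batting_avg, at_bats) ** games


if __name__ == "__main__":
    print(prob_hit_in_game(0.300, 4))   # per-game probability before rounding
    print(prob_streak(0.300, 4, 44))    # streak probability without rounding
```

Because the code does not round the per-game probability to 0.76 before raising it to the forty-fourth power, its result differs slightly in the later digits from the rounded figure in the text.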
Comment. The analysis described here has the basic "structure" of a repeated independent trials problem, but the assumptions that the latter makes are not entirely satisfied by the data. Each at-bat, for example, is not really a repetition of the same experiment, nor is $P(A_i)$ the same for all i. Rose would obviously have had different probabilities of getting a hit against different pitchers. Moreover, "four" was probably the typical number of official at-bats that he had during a game, but there would have been many instances where he had either fewer or more. Modest deviations from game to game, though, would not have a major effect on the probability associated with Rose's forty-four-game streak.
EXAMPLE 2.5.14
In a certain third world nation, statistics show that only two out of ten children born in the early 1980s reached the age of twenty-one. If the same mortality rate is operative over the next generation, how many children does a woman need to bear if she wants to have at least a 75% probability that at least one of her offspring survives to adulthood?
Restated, the question is asking for the smallest integer n such that
$$P(\text{at least one of } n \text{ children survives to adulthood}) \geq 0.75$$
Assuming that the fates of the n children are independent,
$$
\begin{aligned}
P(\text{at least one (of } n\text{) survives to age twenty-one}) &= 1 - P(\text{all } n \text{ die before adulthood}) \\
&= 1 - (0.80)^n
\end{aligned}
$$
Table 2.5.2 shows the value of $1 - (0.80)^n$ as a function of n.

TABLE 2.5.2
n    1 - (0.80)^n
5    0.67
6    0.74
7    0.79

By inspection, we see that the smallest number of children for which the probability is at least 0.75 that at least one of them survives to adulthood is seven.
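Rather than reading the answer off a table, the smallest n can be found by a direct search or by solving the inequality with logarithms. The sketch below assumes independent trials with a common failure probability; `smallest_n_by_search` and `smallest_n_by_logs` are illustrative names.

```python
from math import ceil, log


def smallest_n_by_search(p_fail, target):
    """Smallest n with 1 - p_fail**n >= target, found by stepping n upward."""
    n = 1
    while 1.0 - p_fail ** n < target:
        n += 1
    return n


def smallest_n_by_logs(p_fail, target):
    """Same n from solving p_fail**n <= 1 - target for n.

    Dividing by log(p_fail), which is negative, reverses the inequality.
    """
    return ceil(log(1.0 - target) / log(p_fail))


if __name__ == "__main__":
    print(smallest_n_by_search(0.80, 0.75))
    print(smallest_n_by_logs(0.80, 0.75))
```

The search version is the safer of the two when the target sits exactly on a boundary, since the logarithmic version can be thrown off by floating-point rounding in that case.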
EXAMPLE 2.5.15 (Optional)
In the game of craps, one of the ways a player can win is by rolling (with two dice) one of the sums four, five, six, eight, nine, or ten, and then rolling that sum again before rolling a sum of seven. For example, the sequence of sums six, five, eight, eight, six would result in the player winning on his fifth roll. In gambling parlance, "six" is the player's "point," and he "made the point." On the other hand, the sequence of sums eight, four, ten, seven would result in the player losing on his fourth roll: his point was an eight, but he rolled a sum of seven before he rolled a second eight. What is the probability that a player wins with a point of ten?

TABLE 2.5.3
Sequence of Rolls                      Probability
(10, 10)                               (3/36)(3/36)
(10, no 10 or 7, 10)                   (3/36)(27/36)(3/36)
(10, no 10 or 7, no 10 or 7, 10)       (3/36)(27/36)(27/36)(3/36)
...                                    ...

Table 2.5.3 shows some of the ways a player can make a point of ten. Each sequence, of course, is an intersection of independent events, so its probability becomes a product. The event "Player wins with a point of ten" is then the union of all the sequences that could have been listed in the first column. Since all those sequences are mutually exclusive, the probability of winning with a point of ten reduces to the sum of an infinite number of products:
$$
\begin{aligned}
P(\text{Player wins with a point of ten}) &= \frac{3}{36} \cdot \frac{3}{36} + \frac{3}{36} \cdot \frac{27}{36} \cdot \frac{3}{36} + \frac{3}{36} \cdot \left(\frac{27}{36}\right)^2 \cdot \frac{3}{36} + \cdots \\
&= \frac{3}{36} \cdot \frac{3}{36} \sum_{k=0}^{\infty} \left(\frac{27}{36}\right)^k
\end{aligned}
\qquad (2.5.7)
$$
Recall from algebra that if $0 < r < 1$,
$$\sum_{k=0}^{\infty} r^k = \frac{1}{1 - r}$$
Applying the formula for the sum of a geometric series to Equation 2.5.7 shows that the probability of winning at craps with a point of ten is $\frac{1}{36}$:
$$P(\text{Player wins with a point of ten}) = \frac{3}{36} \cdot \frac{3}{36} \cdot \frac{1}{1 - \frac{27}{36}} = \frac{1}{36}$$
TABLE 2.5.4
Point    P(makes point)
4        1/36
5        16/360
6        25/396
8        25/396
9        16/360
10       1/36

Comment. Table 2.5.4 shows the probabilities of a person "making" each of the six possible points: 4, 5, 6, 8, 9, and 10. According to the rules of craps, a player wins by either (1) getting a sum of seven or eleven on the first roll or (2) getting a 4, 5, 6, 8, 9, or 10 on the first roll and making the point. But $P(\text{sum} = 7) = 6/36$ and $P(\text{sum} = 11) = 2/36$, so
$$
\begin{aligned}
P(\text{player wins}) &= \frac{6}{36} + \frac{2}{36} + \frac{1}{36} + \frac{16}{360} + \frac{25}{396} + \frac{25}{396} + \frac{16}{360} + \frac{1}{36} \\
&= 0.493
\end{aligned}
$$
As gambling games go, craps is relatively fair: the probability of the shooter winning is only slightly less than 0.500.
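The entries of Table 2.5.4 and the overall winning probability can be reproduced with exact fractions. The sketch below assumes two fair dice and the winning rules described in the Comment; `SUM_PROB`, `prob_win_with_point`, and `prob_player_wins` are names introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Exact distribution of the sum of two fair dice.
SUM_PROB = {}
for a, b in product(range(1, 7), repeat=2):
    SUM_PROB[a + b] = SUM_PROB.get(a + b, Fraction(0)) + Fraction(1, 36)


def prob_win_with_point(point):
    """P(first roll is `point` and the point is rolled again before a seven).

    Sums the geometric series p * r**k * p over k = 0, 1, 2, ..., where
    r = P(neither the point nor a seven), giving p * p / (1 - r).
    """
    p = SUM_PROB[point]
    r = 1 - p - SUM_PROB[7]
    return p * p / (1 - r)


def prob_player_wins():
    """Win on the first roll (7 or 11) or by making one of the six points."""
    first_roll = SUM_PROB[7] + SUM_PROB[11]
    return first_roll + sum(prob_win_with_point(k) for k in (4, 5, 6, 8, 9, 10))


if __name__ == "__main__":
    for k in (4, 5, 6, 8, 9, 10):
        print(k, prob_win_with_point(k))
    print(prob_player_wins(), float(prob_player_wins()))
```

Note that `Fraction` reduces to lowest terms, so 16/360 prints as 2/45; the values are the same as in the table.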
QUESTIONS
2.5.11. Suppose that two fair dice (one red and one green) are rolled. Define the events
A: a 1 or a 2 shows on the red die
B: a 3, 4, or 5 shows on the green die
C: the dice total is four, eleven, or twelve
Show that these events satisfy Equation 2.5.3 but not Equation 2.5.4.
2.5.12. A roulette wheel has thirty-six numbers colored red or black according to the pattern indicated below:

Roulette wheel pattern
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
 R  R  R  R  R  B  B  B  B  R  R  R  R  B  B  B  B  B
36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19

Define the events
A: Red number appears
B: Even number appears
C: Number is less than or equal to eighteen
Show that these events satisfy Equation 2.5.4 but not Equation 2.5.3.
2.5.13. How many probability equations need to be verified to establish the mutual independence of four events?
2.5.14. In a roll of a pair of fair dice (one red and one green), let A be the event the red die shows a 3, 4, or 5; let B be the event the green die shows a 1 or a 2; and let C be the event the dice total is seven. Show that A, B, and C are independent.
2.5.15. In a roll of a pair of fair dice (one red and one green), let A be the event of an odd number on the red die, let B be the event of an odd number on the green die, and let C be the event that the sum is odd. Show that any pair of these events are independent but that A, B, and C are not mutually independent.
2.5.16. On her way to work, a commuter encounters four traffic signals. Assume that the distance between each of the four is sufficiently great that her probability of getting a green light at any intersection is independent of what happened at any previous intersection. The first two lights are green for forty seconds of each minute; the last two, for thirty seconds of each minute. What is the probability that the commuter has to stop at least three times?
2.5.17. School board officials are debating whether to require all high school seniors to take a proficiency exam before graduating. A student passing all three parts (mathematics, language skills, and general knowledge) would be awarded a diploma; otherwise, he would receive only a certificate of attendance. A practice test given to this year's ninety-five hundred seniors resulted in the following numbers of failures:

Subject Area         Number of Students Failing
Mathematics          3325
Language skills      1900
General knowledge    1425

If "Student fails mathematics," "Student fails language skills," and "Student fails general knowledge" are independent events, what proportion of next year's seniors can be expected to fail to qualify for a diploma? Does independence seem a reasonable assumption in this situation?
2.5.18. Consider the following four-switch circuit:
If all switches operate independently and P(switch closes) = p, what is the probability the circuit is completed?
2.5.19. A fast-food chain is running a new promotion. For each purchase, a customer is given a game card that may win $10. The company claims that the probability of a person winning at least once in five tries is 0.32. What is the probability that a customer wins $10 on his or her first purchase?
2.5.20. Players A, B, and C toss a fair coin in order. The first to throw a head wins. What are their respective chances of winning?
2.5.21. Andy, Bob, and Charley have gotten into a disagreement over a female acquaintance and decide to settle their dispute with a three-cornered pistol duel. Of the three, Andy is the worst shot, hitting his target only 30% of the time. Charley, a little better, is on-target 50% of the time, while Bob never misses. The rules they agree to are simple: They are to fire at the targets of their choice in succession, and cyclically, in the order Andy, Bob, Charley, Andy, Bob, Charley, and so on until only one of them is left standing. (On each "turn," they get only one shot. If a combatant is hit, he no longer participates, either as a shooter or as a target.) Show that Andy's optimal strategy, assuming he wants to maximize his chances of staying alive, is to fire his first shot into the ground.
2.5.22. According to an advertising study, 15% of television viewers who have seen a certain automobile commercial can correctly identify the actor who does the voice-over. Suppose that 10 such people are watching TV and the commercial comes on. What is the probability that at least one of them can name the actor? What is the probability that exactly one can name the actor?
2.5.23. A fair die is rolled and then n fair coins are tossed, where n is the number showing on the die. What is the probability that no heads appear?
2.5.24. Each of m urns contains three red chips and four white chips. A total of r samples with replacement are taken from each urn. What is the probability that at least one red chip is drawn from at least one urn?
2.5.25. If two fair dice are tossed, what is the smallest number of throws, n, for which the probability of getting at least one double six exceeds 0.5? (Note: This was one of the first problems that de Mere communicated to Pascal in 1654.)
2.5.26. A pair of fair dice are rolled until the first sum of eight appears. What is the probability that a sum of seven does not precede that first sum of eight?
2.5.27. An urn contains w white chips, b black chips, and r red chips. The chips are drawn out at random, one at a time, with replacement. What is the probability that a white appears before a red?
2.5.28. A Coast Guard dispatcher receives an SOS from a ship that has run aground off the shore of a small island. Before the captain can relay her exact position, though, her radio goes dead. The dispatcher has n helicopter crews he can send out to conduct a search. He believes the ship is somewhere either south in area I (with probability p) or north in area II (with probability 1 - p). Each of the n rescue parties is equally competent and has probability r of locating the ship given it has run aground in the sector being searched. How should the dispatcher deploy the helicopter crews to maximize the probability that one of them will find the missing ship? Hint: Assume that m search crews are sent to area I and n - m are sent to area II. Let B denote the event that the ship is found, let $A_1$ be the event that the ship is in area I, and let $A_2$ be the event that the ship is in area II. Use Theorem 2.4.1 to get an expression for P(B); then differentiate with respect to m.
2.5.29. A computer is instructed to generate a random sequence using the digits 0 through 9; repetitions are permissible. What is the shortest length the sequence can be and still have at least a 70% probability of containing at least one 4?
COMBINATORICS
Combinatorics is a time-honored branch of mathematics concerned with counting, arranging, and ordering. While blessed with a wealth of early contributors (there are references to combinatorial problems in the Old Testament), its emergence as a separate discipline is often credited to the German mathematician and philosopher Gottfried Wilhelm Leibniz (1646-1716), whose 1666 Dissertatio de arte combinatoria was perhaps the first monograph written on the subject (111).
Applications of combinatorics are rich in both diversity and number. Users range from the molecular biologist trying to determine how many ways genes can be positioned along a chromosome, to a computer scientist studying queuing priorities, to a psychologist modeling the way we learn, to a weekend poker player wondering whether he should draw to a straight, or a flush, or a full house. Surprisingly enough, solutions to all of these questions are rooted in the same set of four basic theorems and rules, despite the considerable differences that seem to distinguish one question from another.
Counting Ordered Sequences: The Multiplication Rule
More often than not, the relevant "outcomes" in a combinatorial problem are ordered sequences. If two dice are rolled, for example, the outcome (4, 5), that is, the first die comes up 4 and the second die comes up 5, is an ordered sequence of length two. The number of such sequences is calculated by using the most fundamental result in combinatorics, the multiplication rule.
Multiplication Rule. If operation A can be performed in m different ways and operation B in n different ways, the sequence (operation A, operation B) can be performed in $m \cdot n$ different ways.
Proof. At the risk of belaboring the obvious, we can verify the multiplication rule by considering a tree diagram (see Figure 2.6.1). Since each version of A can be followed by any of n versions of B, and there are m of the former, the total number of "A, B" sequences that can be pieced together is obviously the product $m \cdot n$. □

FIGURE 2.6.1 (tree diagram: each of the m versions of Operation A branches into n versions of Operation B)

Corollary. If operation $A_i$, $i = 1, 2, \ldots, k$, can be performed in $n_i$ ways, $i = 1, 2, \ldots, k$, respectively, then the ordered sequence (operation $A_1$, operation $A_2$, ..., operation $A_k$) can be performed in $n_1 \cdot n_2 \cdots n_k$ ways.
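The multiplication rule and its corollary can be checked by actually listing the sequences for small cases. The following is a minimal sketch; the function names and the sample option counts are assumptions chosen for illustration.

```python
from itertools import product
from math import prod


def count_by_rule(option_counts):
    """Number of ordered sequences according to the corollary: n1 * n2 * ... * nk."""
    return prod(option_counts)


def count_by_enumeration(option_counts):
    """Number of ordered sequences found by listing every one of them."""
    choices = [range(n) for n in option_counts]
    return sum(1 for _ in product(*choices))


if __name__ == "__main__":
    two_dice = [6, 6]              # the (first die, second die) sequences
    print(count_by_rule(two_dice), count_by_enumeration(two_dice))
    three_steps = [2, 3, 4]        # hypothetical three-operation sequence
    print(count_by_rule(three_steps), count_by_enumeration(three_steps))
```

`itertools.product` builds exactly the tree of choices described in the proof, so the enumeration grows as fast as the product itself and is practical only for small cases.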
EXAMPLE 2.6.1
The combination lock on a briefcase has two dials, each marked off with 16 notches (see Figure 2.6.2). To open the case, a person first turns the left dial in a certain direction for two revolutions and then stops on a particular mark. The right dial is set in a similar fashion, after having been turned in a certain direction for two revolutions. How many different settings are possible?

FIGURE 2.6.2

In the terminology of the multiplication rule, opening the briefcase corresponds to the four-step sequence $(A_1, A_2, A_3, A_4)$ detailed in Table 2.6.1. Applying the previous corollary, we see that 1,024 different settings are possible:
$$
\begin{aligned}
\text{Number of different settings} &= n_1 \cdot n_2 \cdot n_3 \cdot n_4 \\
&= 2 \cdot 16 \cdot 2 \cdot 16 \\
&= 1{,}024
\end{aligned}
$$

TABLE 2.6.1
Operation    Purpose                                                Number of Options
A1           Rotating the left dial in a particular direction       2
A2           Choosing an endpoint for the left dial                 16
A3           Rotating the right dial in a particular direction      2
A4           Choosing an endpoint for the right dial                16

Comment. Designers of locks should be aware that the number of dials, as opposed to the number of notches on each dial, is the critical factor in determining how many different settings are possible. A two-dial lock, for example, where each dial has twenty notches, gives rise to only $2 \cdot 20 \cdot 2 \cdot 20 = 1600$ settings. If those forty notches, though, are distributed among four dials (10 to each dial), the number of different settings increases a hundredfold to 160,000 $(= 2 \cdot 10 \cdot 2 \cdot 10 \cdot 2 \cdot 10 \cdot 2 \cdot 10)$.
EXAMPLE 2.6.2
Alphonse Bertillon, a nineteenth-century French criminologist, developed an identification system based on eleven anatomical variables (height, head width, ear length, etc.) that presumably remained essentially unchanged during an individual's adult life. The range of each variable was divided into three subintervals: small, medium, and large. A person's Bertillon configuration was an ordered sequence of eleven letters, say
s, s, m, m, l, s, l, s, s, m, s
where a letter indicated the individual's "size" relative to a particular variable. How populated does a city have to be before it can be guaranteed that at least two citizens will have the same Bertillon configuration?
Viewed as an ordered sequence, a Bertillon configuration is an eleven-step classification system, where three options are available at each step. By the multiplication rule, a total of $3^{11}$, or 177,147, distinct sequences are possible, so any city with at least 177,148 adults would necessarily have at least two residents with the same pattern. (The limited number of possibilities generated by Bertillon's variables proved to be one of its major weaknesses. Still, it was widely used in Europe for criminal identification before the development of fingerprinting.)
EXAMPLE 2.6.3
In 1824 Louis Braille invented what would eventually become the standard alphabet for the blind. Based on an earlier form of "night writing" used by the French army for reading battlefield communiques in the dark, Braille's system replaced each written character with a six-dot matrix:

• •
• •
• •

where certain dots were raised, the choice depending on the character being transcribed. The letter e, for example, has two raised dots and is written (● raised, ○ not raised)

● ○
○ ●
○ ○

Punctuation marks, common words, suffixes, and so on also have specified dot patterns. In all, how many different characters can be enciphered in Braille?
Think of the dots as six distinct operations, numbered 1 to 6 (see Figure 2.6.3). In forming a Braille letter, we have two options for each dot: We can raise it or not raise it. The letter e, for example, corresponds to the six-step sequence (raise, do not raise, do not raise, do not raise, raise, do not raise). The number of such sequences, with $k = 6$ and $n_1 = n_2 = \cdots = n_6 = 2$, is $2^6$, or 64. One of those sixty-four configurations, though, has no raised dots, making it of no use to a blind person. Figure 2.6.4 shows the entire sixty-three-character Braille alphabet.

FIGURE 2.6.3 (dots numbered 1 to 3 down the left column and 4 to 6 down the right column; two options per dot, giving $2^6$ sequences)
FIGURE 2.6.4 (the sixty-three-character Braille alphabet: letters, numerals, punctuation marks, common words and contractions, and the capital, italic, letter, accent, and decimal-point signs)
EXAMPLE 2.6.4
The annual NCAA ("March Madness") basketball tournament starts with a field of sixty-four teams. After six rounds of play, the squad that remains unbeaten is declared the national champion. How many different configurations of winners and losers are possible, starting with the first round? Assume that the initial pairing of the sixty-four invited teams into thirty-two first-round matches has already been done.
Counting the number of ways a tournament of this sort can play out is an exercise in applying the multiplication rule twice. Notice, first, that the thirty-two first-round games can be decided in $2^{32}$ ways. Similarly, the resulting sixteen second-round games can generate $2^{16}$ different winners, and so on. Overall, the tournament can be pictured as a six-step sequence, where the number of possible outcomes at the six steps are $2^{32}$, $2^{16}$, $2^8$, $2^4$, $2^2$, and $2^1$, respectively. It follows that the number of possible tournaments (not all of which, of course, would be equally likely!) is the product $2^{32} \cdot 2^{16} \cdot 2^8 \cdot 2^4 \cdot 2^2 \cdot 2^1$, or $2^{63}$.
EXAMPLE 2.6.5
An octave contains twelve distinct notes (on a piano, five black keys and seven white keys). How many different eight-note melodies within a single octave can be written if the black keys and white keys need to alternate?

(a)  Choices:  5 7 5 7 5 7 5 7
     Key:      B W B W B W B W
     Note:     1 2 3 4 5 6 7 8

(b)  Choices:  7 5 7 5 7 5 7 5
     Key:      W B W B W B W B
     Note:     1 2 3 4 5 6 7 8

FIGURE 2.6.5

There are two structurally different ways in which the black and white keys can alternate: the black keys could produce notes 1, 3, 5, and 7 in the melody, or notes 2, 4, 6, and 8. Figure 2.6.5 diagrams the two cases. Consider the first case, where the black keys produce the odd-numbered notes in the melody. In Multiplication Rule terminology, notes 1, 3, 5, and 7 correspond to Operations $A_1$, $A_3$, $A_5$, and $A_7$, for which the numbers of options are $n_1 = 5$, $n_3 = 5$, $n_5 = 5$, and $n_7 = 5$. The white keys (that is, Operations $A_2$, $A_4$, $A_6$, and $A_8$) all have $n_i = 7$, $i = 2, 4, 6, 8$, so the number of different "alternating" melodies, where a black note comes first, is the product $5^4 \cdot 7^4$, or 1,500,625.
By the same argument, the second case (where the black keys produce the even-numbered notes in a melody) also generates $7^4 \cdot 5^4 = 1{,}500{,}625$ melodies. Altogether, the number of melodies with alternating black and white notes is the sum 1,500,625 + 1,500,625, or 3,001,250.
PROBLEM-SOLVING HINTS
(Doing combinatorial problems)
Combinatorial questions sometimes call for problem-solving techniques that are not routinely used in other areas of mathematics. The three listed below are especially helpful.
1. Draw a diagram that shows the structure of the outcomes that are being counted. Be sure to include (or indicate) all relevant cases. A case in point is Figure 2.6.5. Recognizing at the outset that there are two mutually exclusive ways for the black keys and white keys to alternate (the black keys can be either the odd-numbered notes or the even-numbered notes) is a critical first step in solving the problem. Almost invariably, diagrams such as these will suggest the formula, or combination of formulas, that should be applied.
2. Use enumeration to "test" the appropriateness of a formula. Typically, the answer to a combinatorial problem (that is, the number of ways to do something) will be so large that listing all possible outcomes is not feasible. It often is feasible, though, to construct a simple, but analogous, problem for which the entire set of outcomes can be identified (and counted). If the proposed formula does not agree with the simple-case enumeration, we know that our analysis of the original question is incorrect.
3. If the outcomes to be counted fall into structurally different categories, the total number of outcomes will be the sum (not the product) of the numbers of outcomes in each category. Recall the melody example. Alternating melodies fall into two structurally different categories: black keys can be the odd-numbered notes or they can be the even-numbered notes (there is no third possibility). Associated with each category is a different set of outcomes, implying that the total number of alternating melodies is the sum of the numbers of outcomes associated with the two categories.
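The second and third hints can be tried out on the melody problem itself. The following is a minimal sketch, not part of the original text: it enumerates every alternating melody for a deliberately tiny keyboard (2 black keys, 3 white keys, 4 notes, all of which are illustrative assumptions) and compares the count with the "sum of two products" formula before that formula is applied to the real case of 5 black keys, 7 white keys, and 8 notes. The function names are hypothetical.

```python
from itertools import product

def count_alternating(black, white, length):
    """Brute-force count of melodies whose key colors strictly alternate."""
    keys = [("B", i) for i in range(black)] + [("W", i) for i in range(white)]
    total = 0
    for melody in product(keys, repeat=length):
        # A melody qualifies if every pair of neighboring notes differs in color.
        if all(melody[i][0] != melody[i + 1][0] for i in range(length - 1)):
            total += 1
    return total

def alternating_formula(black, white, length):
    """Sum over the two mutually exclusive cases: black first or white first."""
    odd_slots = (length + 1) // 2   # notes 1, 3, 5, ...
    even_slots = length // 2        # notes 2, 4, 6, ...
    black_first = black ** odd_slots * white ** even_slots
    white_first = white ** odd_slots * black ** even_slots
    return black_first + white_first

# Small analogous problem: enumeration and formula should agree.
print(count_alternating(2, 3, 4), alternating_formula(2, 3, 4))

# The case treated in the text: 5 black keys, 7 white keys, 8 notes.
print(alternating_formula(5, 7, 8))
```

Because the small case can be listed exhaustively, agreement there lends some confidence to the formula in the large case, where enumeration would be impractical.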
QUESTIONS
2.6.1. A chemical engineer wishes to observe the effects of temperature, pressure, and catalyst concentration on the yield resulting from a certain reaction. If she intends to include two different temperatures, three pressures, and two levels of catalyst, how many different runs must she make in order to observe each temperature-pressure-catalyst combination exactly twice?
2.6.2. A coded message from a CIA operative to his KGB counterpart is to be sent in the form Q4ET, where the first and last entries must be consonants; the second, an integer 1 through 9; and the third, one of the vowels. How many different ciphers can be transmitted?
2.6.3. How many terms will be included in the expansion of
(a + b + c)(d + e + f)(x + y + u + v + w)
Which of the following will be included in that number: aeu, cdx, bef, xvw?
2.6.4. Suppose that the format for license plates in a certain state is two letters followed by four numbers.
(a) How many different plates can be made?
(b) How many different plates are there if the letters can be repeated but no two numbers can be the same?
(c) How many different plates can be made if repetition of numbers and letters is allowed except that no plate can have four zeros?
2.6.5. How many integers between 100 and 999 have
and how many of those
are odd numbers?
2.6.6. A fast-food restaurant offers customers a choice of
(hat can be added to
a
Huw
[HallY
llifftl till
h1:llllUUI gel:' (.Call lJe
2.6.7. In baseball there are twenty-four different "base-oul"
on
outs, bases loaded-none out, and so on). Suppose that a new game,
''''-''U,'-''''I''. is played where there are seven bases (excluding home plate) and each
five outs an
How many base-out configurations would be
in
Puerto
the
have recently been
codes are alleast as
third digit?
codes were five-digit numbers,
Juan.
reality, the lowest zip code was 00601 for
for Ketchikan, Alaska,) An additional four
code is now a nine-digit number. How
are even numbers, and have a seven as
"VlJV\.'-\J'VV'J,
fourteen entrees. six de~e!'ts. and five
beverages. How many different
OO~;SIl)le if a diner intends 10 order only three
courses? (Consider the 1V'"eTl.IlE
Proteins are chains of molecules chosen from some 20 different amino acids. In a living cell, proteins are synthesized through the genetic code, a mechanism whereby ordered sequences of nucleotides in the messenger RNA dictate the formation of a particular amino acid. The four key nucleotides are adenine, guanine, cytosine, and uracil (A, G, C, and U). Assuming A, G, C, or U can appear any number of times in a nucleotide chain and that all sequences are physically possible, what is the minimum length the chains must attain to have the capability of encoding the entire set of amino acids? Note: Each sequence in the genetic code must have the same number of nucleotides.
Residents of a condominium have an automatic garage door opener that has a row of eight buttons. Each garage door has been programmed to respond to a particular set of buttons being pushed. If the condominium houses 250 families, can residents be assured that no two garage doors will open on the same signal? If so, how many additional families can be added before the eight-button code becomes inadequate? Note: The order in which the buttons are pushed is irrelevant.
In international Morse code, each letter in the alphabet is symbolized by a series of dots and dashes: the letter "a," for example, is encoded as ".-". What is the maximum number of dots and/or dashes needed to represent any letter in the alphabet?
The decimal number
to a sequence of n binary digits ao, at ... , a,,-I,
rl... lh"".rI 10 be
2.6.9. A restaurant offers a choice of four appetizers,
2.6.10.
2.6.11.
2.6.12.
2.6.13.
For example, the sequence 0 1 1 0 is equal to 6
Suppose a fair coin is tossed nine times.
l- I . 21 I 1
22
+0
.
the resulting sequence of H's
Section 2.6
"''''rnlP'nt''~'''
exceed
2.6.14. Given the
Combinatoric;
93
a binary sequence of 1's and O's (1 for
0 for
For how many
of tosses will the decimal corresponding to the observed set of heads and tails
in the word
ZOMBIES
in how many ways can two of the letters be arranged such
that one is a vowel and one
is a consonant?
2.6.15. Suppose that two cards are drawn (in order) from a standard 52-card poker deck. In how many ways can one of the cards be a club and one of the cards be an ace?
2.6.16. Monica's vacation plans require that she fly from Nashville to Chicago to Seattle to Anchorage. According to her travel agent, there are three available flights from Nashville to Chicago, five from Chicago to Seattle, and two from Seattle to Anchorage. Assume that the numbers of options she has for return flights are the same. How many round-trip itineraries can she schedule?
Counting Permutations (when the objects are all distinct)
Ordered sequences arise in two fundamentally different ways. The first is the scenario addressed by the multiplication rule: a process is comprised of k operations, each allowing $n_i$ options, $i = 1, 2, \ldots, k$; choosing one version of each operation leads to $n_1 n_2 \cdots n_k$ possibilities.
The second occurs when an ordered arrangement of some specified length k is formed from a finite collection of objects. Any such arrangement is referred to as a permutation of length k. For example, given the three objects A, B, and C, there are six different permutations of length two that can be formed if the objects cannot be repeated: AB, AC, BC, BA, CA, and CB.
Theorem 2.6.1. The number of permutations of length k that can be formed from a set of n distinct elements, repetitions not allowed, is denoted by the symbol $_nP_k$, where
$${}_nP_k = n(n-1)(n-2)\cdots(n-k+1) = \frac{n!}{(n-k)!}$$
Proof. Any of the n objects may occupy the first position in the arrangement, any of n - 1 the second, and so on; the number of choices available for filling the kth position will be n - k + 1 (see Figure 2.6.6). The theorem follows, then, from the multiplication rule: There will be $n(n-1)\cdots(n-k+1)$ ordered arrangements. □
Corollary. The number of ways to permute an entire set of n distinct objects is ${}_nP_n = n(n-1)(n-2)\cdots 1 = n!$.
[Figure: numbers of choices n, n - 1, ..., n - (k - 2), n - (k - 1) for positions 1, 2, ..., k - 1, k in the sequence]
FIGURE 2.6.6
EXAMPLE 2.6.6
How many permutations of length k = 3 can be formed from the set of n = 4 distinct elements, A, B, C, and D?
According to Theorem 2.6.1, the number should be 24:
$$\frac{n!}{(n-k)!} = \frac{4!}{(4-3)!} = \frac{4\cdot 3\cdot 2\cdot 1}{1} = 24$$
Confirming that figure, Table 2.6.2 lists the entire set of 24 permutations and illustrates the argument used in the proof of the theorem.
TABLE 2.6.2
 1. (ABC)    2. (ABD)    3. (ACB)    4. (ACD)    5. (ADB)    6. (ADC)
 7. (BAC)    8. (BAD)    9. (BCA)   10. (BCD)   11. (BDA)   12. (BDC)
13. (CAB)   14. (CAD)   15. (CBA)   16. (CBD)   17. (CDA)   18. (CDB)
19. (DAB)   20. (DAC)   21. (DBA)   22. (DBC)   23. (DCA)   24. (DCB)
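The listing in Table 2.6.2 can be regenerated mechanically. The sketch below is an illustration added here, not something from the text: it uses Python's standard `itertools.permutations` to enumerate the ordered arrangements and a small helper, hypothetically named `n_P_k`, to evaluate the formula of Theorem 2.6.1.

```python
from itertools import permutations
from math import factorial

def n_P_k(n, k):
    """Number of permutations of length k from n distinct objects: n!/(n-k)!."""
    return factorial(n) // factorial(n - k)

# Enumerate every ordered arrangement of length 3 drawn from A, B, C, D.
listing = ["".join(p) for p in permutations("ABCD", 3)]

print(len(listing), n_P_k(4, 3))   # both counts should agree
print(listing[:6])                 # the six arrangements that begin with A
```

The same helper reproduces the greeting-card count of Example 2.6.7 via `n_P_k(8, 4)`.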
EXAMPLE 2.6.7
In her sonnet with the famous first line, "How do I love thee? Let me count the ways," Elizabeth Barrett Browning listed eight. Suppose Ms. Browning had decided that writing greeting cards afforded her a better format for expressing her feelings. For how many years could she have corresponded with her favorite beau on a daily basis and never sent the same card twice? Assume that each card contains exactly four of the eight "ways" and that order matters.
Ms. Browning would be creating a permutation of length k = 4 from a set of n = 8 objects. According to Theorem 2.6.1,
$$\text{number of different cards} = {}_8P_4 = \frac{8!}{(8-4)!} = 8\cdot 7\cdot 6\cdot 5 = 1680$$
At the rate of a card a day, she could keep the correspondence going for more than four and one-half years.
EXAMPLE 2.6.8
Years ago, long before Rubik cubes and electronic games had become epidemic, puzzles were much simpler. One of the more popular combinatorial-related diversions was a four by four grid consisting of fifteen movable squares and one empty space. The object was to maneuver as quickly as possible an arbitrary configuration (Figure 2.6.7a) into a specific pattern (Figure 2.6.7b). How many different ways could the puzzle be arranged?
Take the empty space to be square number 16 and imagine the four rows of the grid laid end to end to make a sixteen-digit sequence. Each permutation of that sequence corresponds to a different pattern for the grid. By the corollary to Theorem 2.6.1, the number of ways to position the tiles is 16!, or more than twenty trillion (20,922,789,888,000, to be exact). That total is more than fifty times the number of stars in the entire Milky Way galaxy. (Note: Not all of the 16! permutations can be generated without physically removing some of the tiles. Think of the two by two version of Figure 2.6.7 with tiles numbered 1 through 3. How many of the 4! theoretical patterns can actually be formed?)
[Figure: (a) an arbitrary configuration of the puzzle; (b) the target pattern]
FIGURE 2.6.7
EXAMPLE 2.6.9
A deck of 52 cards is shuffled and dealt face up in a row. For how many arrangements will the four aces be adjacent?
This is a good problem for illustrating the problem-solving benefits that come from drawing diagrams, as mentioned earlier. Figure 2.6.8 shows the structure that needs to be considered: The four aces are positioned as a "clump" somewhere between or around the forty-eight non-aces.
[Figure: the four aces (1, 2, 3, 4) placed as a block among the non-aces]
FIGURE 2.6.8
Clearly, there are forty-nine "spaces" that could be occupied by the four aces (in front of the first non-ace, between the first and second non-aces, and so on). Furthermore, by the corollary to Theorem 2.6.1, once the four aces are assigned to one of those forty-nine positions, they can still be permuted in $_4P_4 = 4!$ ways. Similarly, the forty-eight non-aces can be arranged in $_{48}P_{48} = 48!$ ways. It follows from the multiplication rule, then, that the number of arrangements having consecutive aces is the product, $49 \cdot 4! \cdot 48!$, or, approximately, $1.46 \times 10^{64}$.
Comment. Computing n! can be quite cumbersome, even for n's that are fairly small: We saw in Example 2.6.8, for instance, that 16! is already in the trillions. Fortunately, an easy-to-use approximation is available. According to Stirling's formula,
$$n! \doteq \sqrt{2\pi}\, n^{n+1/2} e^{-n}$$
In practice, we apply Stirling's formula by writing
$$\log_{10}(n!) \doteq \log_{10}\left(\sqrt{2\pi}\right) + \left(n + \tfrac{1}{2}\right)\log_{10}(n) - n\log_{10}(e)$$
and then exponentiating the right-hand side.
Recall Example 2.6.9, where the number of arrangements was calculated to be $49 \cdot 4! \cdot 48!$, or $24 \cdot 49!$. Substituting into Stirling's formula, we can write
$$\log_{10}(49!) \doteq \log_{10}\left(\sqrt{2\pi}\right) + \left(49 + \tfrac{1}{2}\right)\log_{10}(49) - 49\log_{10}(e) \doteq 62.783366$$
Therefore,
$$24 \cdot 49! \doteq 24 \cdot 10^{62.78337} = 1.46 \times 10^{64}$$
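To see how close the approximation is, the logarithmic form of Stirling's formula can be compared with an exact factorial. This is a small sketch for illustration only; the function name `log10_stirling` is an assumption, and the exact value comes from Python's arbitrary-precision `math.factorial`.

```python
import math

def log10_stirling(n):
    """log10 of Stirling's approximation sqrt(2*pi) * n**(n + 1/2) * e**(-n)."""
    return (math.log10(math.sqrt(2 * math.pi))
            + (n + 0.5) * math.log10(n)
            - n * math.log10(math.e))

approx_log = log10_stirling(49)
exact_log = math.log10(math.factorial(49))

print(round(approx_log, 6))          # Stirling's value for log10(49!)
print(round(exact_log, 6))           # exact value, slightly larger
print(24 * 10 ** approx_log)         # approximate number of "adjacent aces" deals
```

The two logarithms differ only in the fourth decimal place, which is why the approximation is adequate for order-of-magnitude statements such as the one in Example 2.6.9.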
EXAMPLE 2.6.10
In chess a rook can move vertically and horizontally (see Figure 2.6.9). It can capture any unobstructed piece located anywhere in its own row or column. In how many ways can eight distinct rooks be placed on a chessboard (having eight rows and eight columns) so that no two can capture one another?
FIGURE 2.6.9
To start with a simpler problem, suppose that the eight rooks are all identical. Since no two rooks can be in the same row or same column (why?), it follows that each row must contain exactly one. The rook in the first row, however, can be in any of eight columns; the rook in the second row is then limited to being in one of seven columns, and so on. By the multiplication rule, then, the number of noncapturing configurations for eight identical rooks is $_8P_8$, or 8! (see Figure 2.6.10).
[Figure: the numbers of column choices for the rooks in rows 1 through 8 are 8, 7, 6, 5, 4, 3, 2, 1; total number = 8 · 7 · 6 · 5 · 4 · 3 · 2 · 1]
FIGURE 2.6.10
Now imagine the eight rooks to be distinct; they might be numbered, for example, 1 through 8. The rook in the first row could be marked with any of eight numbers; the rook in the second row with any of the remaining seven numbers; and so on. Altogether, there would be 8! numbering patterns for each configuration. The total number of ways to position eight distinct noncapturing rooks, then, is 8! · 8!, or 1,625,702,400.
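In the spirit of testing a formula on a smaller analogous problem, the rook argument can be checked by brute force on a 3 by 3 or 4 by 4 board, where every placement can be examined. The sketch below is illustrative; the function names are hypothetical, and the board sizes are chosen only to keep the enumeration short.

```python
from itertools import combinations, permutations
from math import factorial

def is_noncapturing(squares):
    """True if no two of the given (row, column) squares share a row or a column."""
    rows = {r for r, _ in squares}
    cols = {c for _, c in squares}
    return len(rows) == len(squares) and len(cols) == len(squares)

def count_identical(n):
    """Noncapturing placements of n identical rooks on an n-by-n board."""
    board = [(r, c) for r in range(n) for c in range(n)]
    return sum(1 for s in combinations(board, n) if is_noncapturing(s))

def count_distinct(n):
    """Same count for n numbered rooks: rook i sits on the ith chosen square."""
    board = [(r, c) for r in range(n) for c in range(n)]
    return sum(1 for s in permutations(board, n) if is_noncapturing(s))

for n in (3, 4):
    print(n, count_identical(n), factorial(n),
          count_distinct(n), factorial(n) ** 2)
```

For both small boards the enumerated counts match n! and n! · n!, which is the pattern the example extends to n = 8.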
EXAMPLE 2.6.11
A new horror movie, Friday the 13th, Part X, stars Jason's great-grandson as a psychotic trying to dismember, decapitate, or do whatever else it takes to dispatch four men and four women. (a) How many scripts (i.e., victim orders) can the screenwriters devise, assuming they want Jason to do away with all the men before going after any of the women? (b) How many scripts are possible if the only restriction on Jason is that he save Muffy for last?
a. Suppose the male counselors are denoted A, B, C, and D and the female counselors, W, X, Y, and Z. Among the admissible plots would be the sequence pictured in Figure 2.6.11, where B is done in first, then D, and so on. The men, if they are to be restricted to the first four positions, can still be permuted in $_4P_4 = 4!$ ways. The same number of arrangements can be found for the women. Furthermore, the plot in its entirety can be thought of as a two-step sequence: first the men are eliminated, then the women. Since 4! ways are available to do the former and 4! the latter, the total number of different scripts, by the multiplication rule, is 4! 4!, or 576.
[Figure: one admissible order of killing, with the men in positions 1 through 4 and the women in positions 5 through 8]
FIGURE 2.6.11
b. If the only condition to be met is that Muffy be dealt with last, the number of admissible scripts is 7!, that being the number of ways to permute the other seven counselors (see Figure 2.6.12).
[Figure: one admissible order of killing, B W Z C Y A D in positions 1 through 7, with Muffy in position 8]
FIGURE 2.6.12
EXAMPLE 2.6.12
Consider the set of nine-digit numbers that can be formed by rearranging without repetition the integers 1 through 9. For how many of those permutations will the 1 and the 2 precede the 3 and the 4? That is, we want to count sequences like 7 2 5 1 3 6 9 4 8 but not like 6 8 1 5 4 2 7 3 9.
At first glance, this seems to be a problem well beyond the scope of Theorem 2.6.1. With the help of a symmetry argument, though, its solution is surprisingly simple.
Think of just the digits 1 through 4. By the corollary to Theorem 2.6.1, those four numbers give rise to 4! (= 24) permutations. Of those 24, only four, (1, 2, 3, 4), (2, 1, 3, 4), (1, 2, 4, 3), and (2, 1, 4, 3), have the property that the 1 and the 2 come before the 3 and the 4. It follows that 4/24 of the total number of nine-digit permutations should satisfy the condition being imposed on 1, 2, 3, and 4. Therefore,
$$\text{number of permutations where 1 and 2 precede 3 and 4} = \frac{4}{24}\cdot 9! = 60,480$$
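Because 9! is only 362,880, the symmetry argument can be confirmed by direct enumeration. The following sketch is an added illustration (the helper name is hypothetical): it walks through every arrangement of the digits 1 through 9 and counts those in which both the 1 and the 2 appear before both the 3 and the 4.

```python
from itertools import permutations
from math import factorial

def one_two_before_three_four(seq):
    """True if digits 1 and 2 both appear earlier in seq than digits 3 and 4."""
    pos = {digit: index for index, digit in enumerate(seq)}
    return max(pos[1], pos[2]) < min(pos[3], pos[4])

count = sum(1 for p in permutations(range(1, 10))
            if one_two_before_three_four(p))

print(count, 4 * factorial(9) // 24)   # enumeration versus the symmetry argument
```

The check also accepts the sample sequence 7 2 5 1 3 6 9 4 8 and rejects 6 8 1 5 4 2 7 3 9, matching the two illustrations in the example.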
QUESTIONS
2.6.17. The board of a large corporation has six members willing to be nominated for office. How many different "president/vice president/treasurer" slates could be submitted to the stockholders?
2.6.18. How many ways can a set of four tires be put on a car if all the tires are interchangeable? How many ways are possible if two of the four are snow tires?
2.6.19. Use Stirling's formula to approximate 30!.
(Note: The exact answer is 265,252,859,812,191,058,636,308,480,000,000.)
2.6.20. The nine members of the
baseball team. the Mahler Maulers. are all
and each can
position equaJly poorly. In how many different
ways can the Maulers take
2.6.21. A
number is to be formed from the digits 1 through 7. with no digit being
used more than once. How many such numbers would be
than 289?
2.6.22. Four men and four women are to be seated in a row of chairs numbered 1 through 8.
(a) How many total arrangements are possible?
(b) How many arrangements are possible if the men are required to sit in alternate chairs?
2.6.23. An engineer needs to take three technical electives sometime during his final four semesters. The three are to be selected from a list of ten. In how many ways can he schedule those classes, assuming that he never wants to take more than one technical elective in any given term?
2.6.24. How many ways can a twelve-member cheerleading squad (six men and six women) pair up to form six male-female teams? How many ways can six male-female teams be positioned along a sideline? What might the number $6!\,6!\,2^6$ represent?
What might the
number 6!6!2t>:2 2 represent?
2.6.25. Suppose that a seemingly interminable German opera is recorded on all six sides of a three-record album. In how many ways can the six sides be played so that at least one is out of order?
2.6.26. A group of n families, each with m members, are to be lined up for a photograph. In how many ways can the nm people be arranged if members of a family must stay together?
2.6.27. Suppose that ten people, including you and a friend, line up for a group picture. How many ways can the photographer rearrange the line if she wants to have exactly three people between you and your friend?
2.6.28. Theorem
was the first mathematical result known to have been proved by induction, that feat being accomplished in 1321 by Levi ben Gerson. Assume that we do not know the multiplication rule. Prove the theorem the way Levi ben Gerson did.
2.6.29. In how many ways can a pack of fifty-two cards be dealt to thirteen players, four to each, so that every player has one card of each suit?
2.6.30. If the definition of n! is to hold for all nonnegative integers n, show that it follows that 0! must equal one.
2.6.31. The crew of Apollo 17 consisted of a pilot, a copilot, and a geologist. Suppose that NASA had actually trained nine aviators and four geologists as candidates for the flight. How many different crews could they have assembled?
2.6.32. Uncle Harry and Aunt Minnie will both be attending your next family reunion. Unfortunately, they hate each other. Unless they are seated with at least two people between them, they are likely to get into a shouting match. The side of the table at which they will be seated has seven chairs. How many seating arrangements are available for those seven people if a safe distance is to be maintained between your aunt and your uncle?
2.6.33. In how many ways can the digits 1 through 9 be arranged such that
(a) all the even digits precede all the odd digits
(b) all the even digits are adjacent to each other
(c) two even digits begin the sequence and two even digits end the sequence
(d) the even digits appear in either ascending or descending order?
Counting Permutations (when the objects are not all distinct)
The corollary to Theorem 2.6.1 gives a formula for the number of ways an entire set of n objects can be permuted if the objects are all distinct. Fewer than n! permutations are possible, though, if some of the objects are identical. For example, there are 3! = 6 ways to permute the three distinct objects A, B, and C:
ABC
ACB
BAC
BCA
CAB
CBA
If the three objects to permute, though, are A, A, and B (that is, if two of the three are identical), the number of permutations decreases to three:
AAB
ABA
BAA
As we will see, there are many real-world applications where the n objects to be permuted belong to r different categories, each category containing one or more identical objects.
Theorem 2.6.2. The number of ways to arrange n objects, $n_1$ being of one kind, $n_2$ of a second kind, ..., and $n_r$ of an rth kind, is
$$\frac{n!}{n_1!\, n_2! \cdots n_r!}$$
where $\sum_{i=1}^{r} n_i = n$.
Proof. Let N denote the total number of such arrangements. For any one of those N, the similar objects (if they were actually different) could be arranged in $n_1!\, n_2! \cdots n_r!$ ways. (Why?) It follows that $N \cdot n_1!\, n_2! \cdots n_r!$ is the total number of ways to arrange n (distinct) objects. But n! equals that same number. Setting $N \cdot n_1!\, n_2! \cdots n_r!$ equal to n! gives the result. □
Comment. Ratios like $n!/(n_1!\, n_2! \cdots n_r!)$ are called multinomial coefficients because the general term in the expansion of
$$(x_1 + x_2 + \cdots + x_r)^n$$
is
$$\frac{n!}{n_1!\, n_2! \cdots n_r!}\, x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}$$
EXAMPLE 2.6.13
A pastry in a vending machine costs 85¢. In how many ways can a customer put in two quarters, three dimes, and one nickel?
[Figure: a deposit sequence of six coins, with positions 1 through 6 marking the order in which coins are deposited]
FIGURE 2.6.13
If all coins of a given value are considered identical, then a typical deposit sequence, say QDDQND (see Figure 2.6.13), can be thought of as a permutation of n = 6 objects belonging to r = 3 categories, where
$n_1$ = number of nickels = 1
$n_2$ = number of dimes = 3
$n_3$ = number of quarters = 2
By Theorem 2.6.2, there are sixty such sequences:
$$\frac{n!}{n_1!\, n_2!\, n_3!} = \frac{6!}{1!\, 3!\, 2!} = 60$$
Of course, had we assumed the coins were distinct (having been minted at different places and different times), the number of distinct permutations would be 6!, or 720.
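The vending-machine count is small enough to verify by listing. The sketch below is added for illustration: `multinomial` is a hypothetical helper that evaluates the formula of Theorem 2.6.2, and the set of tuples produced from `itertools.permutations` collapses arrangements that differ only by swapping identical coins.

```python
from itertools import permutations
from math import factorial

def multinomial(*counts):
    """n!/(n1! n2! ... nr!) where n is the sum of the category counts."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

# Two quarters, three dimes, one nickel; identical coins are indistinguishable,
# so duplicate arrangements are removed by collecting them in a set.
distinct_sequences = set(permutations("QQDDDN"))

print(len(distinct_sequences), multinomial(2, 3, 1))
print(len(list(permutations("QQDDDN"))))   # 6! orderings if every coin is distinct
```

Dividing the 720 labeled orderings by 2! · 3! · 1! is exactly the "Why?" step in the proof: each distinguishable sequence is counted once for every way of relabeling its identical coins.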
EXAMPLE 2.6.14
Prior to the seventeenth century there were no scientific journals, a state of affairs that made it difficult for researchers to document discoveries. If a scientist sent a copy of his work to a colleague, there was always a risk that the colleague might claim it as his own. The obvious alternative (wait to get enough material to publish a book) invariably resulted in lengthy delays. So, as a sort of interim documentation, scientists would sometimes send each other anagrams: letter puzzles that, when properly unscrambled, summarized in a sentence or two what had been discovered.
When Christiaan Huygens (1629-1695) looked through his telescope and saw the ring around Saturn, he composed the following anagram (203):
aaaaaaa, ccccc, d, eeeee, g, h, iiiiiii, llll, mm, nnnnnnnnn, oooo, pp, q, rr, s, ttttt, uuuuu
How many ways can the sixty-two letters in Huygens's anagram be arranged?
Let $n_1$ (= 7) denote the number of a's, $n_2$ (= 5) the number of c's, and so on. Substituting into the appropriate multinomial coefficient, we find
$$N = \frac{62!}{7!\,5!\,1!\,5!\,1!\,1!\,7!\,4!\,2!\,9!\,4!\,2!\,1!\,2!\,1!\,5!\,5!}$$
as the total number of arrangements. To get a feeling for the magnitude of N, we need to apply Stirling's formula to the numerator. Since
$$62! \doteq \sqrt{2\pi}\, e^{-62}\, 62^{62.5}$$
then
$$\log(62!) \doteq \log\left(\sqrt{2\pi}\right) - 62\cdot\log(e) + 62.5\cdot\log(62) \doteq 85.49731$$
The antilog of 85.49731 is $3.143 \times 10^{85}$, so
$$N \doteq \frac{3.143 \times 10^{85}}{7!\,5!\,1!\,5!\,1!\,1!\,7!\,4!\,2!\,9!\,4!\,2!\,1!\,2!\,1!\,5!\,5!}$$
is a number on the order of $3.6 \times 10^{60}$.
Huygens was clearly taking no chances! (Note: When appropriately rearranged, the anagram becomes "Annulo cingitur, tenui, plano, nusquam cohaerente, ad eclipticam inclinato," which translates to "Surrounded by a thin ring, flat, suspended nowhere, inclined to the ecliptic.")
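With exact integer arithmetic the anagram count in the Huygens example can be computed directly rather than estimated. The sketch below is an added illustration under one stated assumption: the Latin sentence is entered in its commonly quoted spelling ("Annulo cingitur, tenui, plano, nusquam cohaerente, ad eclipticam inclinato"), and the letter frequencies are tallied from it rather than typed in by hand.

```python
from collections import Counter
from math import factorial

# Assumed spelling of Huygens's sentence; letter counts are derived from it.
LATIN = "Annulo cingitur, tenui, plano, nusquam cohaerente, ad eclipticam inclinato"

counts = Counter(ch for ch in LATIN.lower() if ch.isalpha())
n = sum(counts.values())

# Multinomial coefficient n!/(n1! n2! ... nr!) from Theorem 2.6.2.
arrangements = factorial(n)
for c in counts.values():
    arrangements //= factorial(c)

print(n)                        # total number of letters in the anagram
print(sorted(counts.items()))   # frequency of each letter
print(f"{arrangements:.3e}")    # exact count, shown in scientific notation
```

The tally reproduces the frequencies quoted in the example (seven a's, five c's, nine n's, and so on), and the exact count agrees with the Stirling-based order of magnitude.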
EXAMPLE 2.6.15
What is the coefficient of $x^{23}$ in the expansion of $(1 + x^5 + x^9)^{100}$?
To understand how this question relates to permutations, consider the simpler problem of expanding $(a + b)^2$:
$$(a + b)^2 = (a + b)(a + b) = a\cdot a + a\cdot b + b\cdot a + b\cdot b = a^2 + 2ab + b^2$$
Notice that each term in the first (a + b) is multiplied by each term in the second (a + b). Moreover, the coefficient that appears in front of each term in the expansion corresponds to the number of ways that that term can be formed. For example, the 2 in the term 2ab reflects the fact that the product ab can result from two different multiplications: the a from the first factor times the b from the second, or the b from the first factor times the a from the second.
By analogy, the coefficient of $x^{23}$ in the expansion of $(1 + x^5 + x^9)^{100}$ will be the number of ways that one term from each of the one hundred factors $(1 + x^5 + x^9)$ can be multiplied together to form $x^{23}$. The only set of factors that will produce $x^{23}$, though, is the set of two $x^9$'s, one $x^5$, and ninety-seven 1's:
$$x^{23} = x^9 \cdot x^9 \cdot x^5 \cdot 1 \cdot 1 \cdots 1$$
It follows that the coefficient of $x^{23}$ is the number of ways to permute two $x^9$'s, one $x^5$, and ninety-seven 1's. So, from Theorem 2.6.2,
$$\text{coefficient of } x^{23} = \frac{100!}{2!\,1!\,97!} = 485,100$$
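The permutation argument can be cross-checked by actually multiplying the polynomial out, keeping only powers up to $x^{23}$. The sketch below is an added illustration; `coefficient` is a hypothetical helper that raises a polynomial with unit coefficients to a power by repeated multiplication on a truncated coefficient list.

```python
from math import factorial

def coefficient(exponents, power, target):
    """Coefficient of x**target in (sum of x**e for e in exponents) ** power."""
    coeffs = [0] * (target + 1)
    coeffs[0] = 1                       # start from the constant polynomial 1
    for _ in range(power):
        new = [0] * (target + 1)
        for degree, c in enumerate(coeffs):
            if c == 0:
                continue
            for e in exponents:         # multiply by one more factor
                if degree + e <= target:
                    new[degree + e] += c
        coeffs = new
    return coeffs[target]

print(coefficient([0, 1], 2, 1))        # the 2 in 2ab, with a = 1 and b = x
print(coefficient([0, 5, 9], 100, 23))  # coefficient of x**23
print(factorial(100) // (factorial(2) * factorial(1) * factorial(97)))
```

Truncating at degree 23 keeps the work small: higher powers can never contribute to the $x^{23}$ term because every factor has nonnegative exponents.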
EXAMPLE 2.6.16
A palindrome is a phrase whose letters are in the same order whether they are read backward or forward, such as Napoleon's lament
Able was I ere I saw Elba
or the often cited
Madam, I'm Adam.
Words themselves can be the units in a palindrome, as in the sentence
Girl, bathing on Bikini, eyeing boy, finds boy eyeing bikini on bathing girl.
Suppose the members of a set consisting of four objects of one type, six of a second type, and two of a third type are to be lined up in a row. How many of those permutations are palindromes?
Think of the twelve objects to arrange as being four A's, six B's, and two C's. If the arrangement is to be a palindrome, then half of the A's, half of the B's, and half of the C's must occupy the first six positions in the permutation. Moreover, the final six members of the sequence must be in the reverse order of the first six. For example, if the objects comprising the first half of the permutation were
C A B A B B
then the last six would need to be in the order
B B A B A C
It follows that the number of palindromes is the number of ways to permute the first six objects in the sequence, because once the first six are positioned, there is only one arrangement of the last six that will complete the palindrome. By Theorem 2.6.2, then,
number of palindromes = 6!/(2!3!1!) = 60
EXAMPLE 2.6.17
A deliveryman is currently at Point X and needs to stop at Point O before driving through to Point Y (see Figure 2.6.14). How many different routes can he take without ever going out of his way?
[Figure: street grid showing Points X, O, and Y, with one admissible route marked]
FIGURE 2.6.14
Notice that any admissible path from, say, X to O is an ordered sequence of 11 "moves": nine East and two North. Pictured in Figure 2.6.14, for example, is the particular X to O route
E E N E E E E N E E E
Similarly, any acceptable path from O to Y will consist of five moves East and three moves North (the one indicated is E E N N E N E E).
Since each path from X to O corresponds to a unique permutation of nine E's and two N's, the number of such paths (from Theorem 2.6.2) is the quotient
11!/(9!2!) = 55
For the same reasons, the number of different paths from O to Y is
8!/(5!3!) = 56
By the Multiplication Rule, then, the total number of admissible routes from X to Y that pass through O is the product of 55 and 56, or 3080.
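The route counts can also be obtained without Theorem 2.6.2, by counting paths corner by corner: the number of ways to reach an intersection is the number of ways to reach the intersection to its west plus the number to its south. The sketch below is an added illustration of that alternative (the function name `count_paths` is hypothetical) and compares the result with the permutation formula.

```python
from math import factorial

def count_paths(east, north):
    """Shortest lattice paths using `east` East moves and `north` North moves."""
    # ways[r][c] = number of paths to the corner r blocks north and c blocks east.
    ways = [[1] * (east + 1) for _ in range(north + 1)]
    for r in range(1, north + 1):
        for c in range(1, east + 1):
            ways[r][c] = ways[r - 1][c] + ways[r][c - 1]
    return ways[north][east]

x_to_o = count_paths(9, 2)     # nine East moves and two North moves
o_to_y = count_paths(5, 3)     # five East moves and three North moves

print(x_to_o, factorial(11) // (factorial(9) * factorial(2)))
print(o_to_y, factorial(8) // (factorial(5) * factorial(3)))
print(x_to_o * o_to_y)         # routes from X to Y that pass through O
```

That two quite different counting schemes agree is a useful sanity check on the identification of routes with permutations of E's and N's.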
QUESTIONS
2.6.34. Which state name can generate more permutations, TENNESSEE or FLORIDA?
2.6.35. How many numbers greater than 4,000,000 can be formed from the digits 2, 3, 4, 4, 5, 5, 5?
2.6.36. An interior decoraror is
to
a shelf containing
books, three wirh
covers, rhree with blue covers, and two with brown covers.
(9) Assuming the titles and the
of the
are irrelevant, in how many ways
can she arrange the eight books?
(b) In how many ways could the books be arranged if they were all considered
distinCt?
(c) In how many ways could the books be
if the
books were considered
indistinguishab1e, but the other five were considered distinct?
2.6.37. Four Nigerians (A, B, C, D), three Chinese (#, *, &), and three Greeks (α, β, γ) are lined up at the box office, waiting to buy tickets for the World's Fair.
(a) How many ways can they position themselves if the Nigerians are to hold the first four places in line; the Chinese, the next three; and the Greeks, the last three?
(b) How many arrangements are possible if members of the same nationality must stay together?
(c) How many different queues can be formed?
(d) Suppose a vacationing Martian strolls by and wants to photograph the ten for her scrapbook. A bit myopic, the Martian is quite capable of discerning the more obvious differences in human anatomy but unable to distinguish one Nigerian (N) from another, one Chinese (C) from another, or one Greek (G) from another. Instead of perceiving a line to be B * β A D # & C α γ, for example, she would see NCGNNCCNGG. From the Martian's perspective, in how many different ways can the ten funny-looking Earthlings line themselves up?
2.6.38. How many ways can the letters in the word
SLUMGULLION
be arranged so that the three L's precede all the other consonants?
2.6.39. A tennis tournament has a field of 2n entrants, all of whom need to be scheduled to play in the first round. How many different pairings are possible?
+
2.6..40. What is the coefficiellL of
in the
of (1 +
2.6.41. In how many ways can the letters of the word
ELEEMOSYNARY
be arranged so that the S is always immediately followed by a Y?
2.6.42. In how many ways can the word ABRACADABRA be formed in the array below? Assume that the word must begin with the top A and progress diagonally downward to the bottom A.
_.~....~" ..... 1
R
A
C
A
A
C
C
A
A
D
D
A
D
A
A
n
B
R
2.6.43. Suppose a pitcher faces a batter who never swings. For how many different ball/strike sequences will the batter be called out on the fifth pitch?
2.6.44. What is the coefficient of $w^2x^3yz^3$ in the expansion of $(w + x + y + z)^9$?
2.6.45.
in a plane, no thT~.e of which lie on fI sf rnight jille. In how many wfly~
be used as vertices to form two triangles? (Hint: Number the points
Call one of the triangles A and the other B. What does the permutation
A
A
B
123
B
A
B
456
represent?)
2JJ.46. Show that (k!)! is divisible by
. (Hint: Think of a related permutation problem
whose solution would
Theorem
2.6.47. In how many ways can the letters of the word
BROBDINGNAGIAN
be arranged without changing the order of the vowels?
STATISTICS IS FUN. In bow IHMY
ways can the letters in the anagram be
2.6.49. Linda is taking a five-course load her first semester. bDJgilSJtl,
and history. In how many different ways can sbe earn three A's and two
the entire set of possibilities. Use Theorem 2.6.2 to verify your answer.
2.6.48. Make an iUlagnuII uut uf lhe [l;1miiiar
Counting Combinations
Order is not always a meaningful characteristic of a collection of elements. Consider a poker player being dealt a five-card hand. Whether he receives a 2 of hearts, 4 of clubs, 9 of clubs, jack of hearts, and ace of diamonds in that order, or any one of the other 5! - 1 permutations of those particular five cards is irrelevant; the hand is still the same. As the last set of examples in this section bear out, there are many such situations: problems where our only legitimate concern is with the composition of a set of elements, not with any particular arrangement.
We call a collection of k unordered elements a combination of size k. For example, given a set of n = 4 distinct elements (A, B, C, and D) there are six ways to form combinations of size 2:
A and B    B and C
A and C    B and D
A and D    C and D
A formula for counting combinations can be derived quite easily from what we already know about counting permutations.
Theorem 2.6.3. The number of ways to form combinations of size k from a set of n distinct objects, repetitions not allowed, is denoted by the symbols $\binom{n}{k}$ or $_nC_k$, where
$$\binom{n}{k} = {}_nC_k = \frac{n!}{k!\,(n-k)!}$$
Proof. Let the symbol $\binom{n}{k}$ denote the number of combinations satisfying the conditions of the theorem. Since each of those combinations can be ordered in k! ways, the product $k!\binom{n}{k}$ must equal the number of permutations of length k that can be formed from n distinct elements. But n distinct elements can be formed into permutations of length k in $n(n-1)\cdots(n-k+1) = n!/(n-k)!$ ways. Therefore,
$$k!\binom{n}{k} = \frac{n!}{(n-k)!}$$
Solving for $\binom{n}{k}$ gives the result. □
Comment. It often helps to think of combinations in the context of drawing objects out of an urn. If an urn contains n chips labeled 1 through n, the number of ways we can reach in and draw out different samples of size k is $\binom{n}{k}$. In deference to this sampling interpretation for the formation of combinations, $\binom{n}{k}$ is usually read "n things taken k at a time" or "n choose k."
Comment. The symbol $\binom{n}{k}$ appears in the statement of a familiar theorem from algebra,
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$$
Since the expression being raised to a power involves two terms, x and y, the constants $\binom{n}{k}$, $k = 0, 1, \ldots, n$, are commonly referred to as binomial coefficients.
EXAMPLE 2.6.18
Eight politicians meet at a fund-raising dinner. How many greetings can be exchanged if each politician shakes hands with every other politician exactly once?
Imagine the politicians to be eight chips (1 through 8) in an urn. A handshake corresponds to an unordered sample of size 2 chosen from that urn. Since repetitions are not allowed (even the most obsequious and overzealous of campaigners would not shake hands with himself), Theorem 2.6.3 applies, and the total number of handshakes is
$$\binom{8}{2} = \frac{8!}{2!\,6!}$$
or 28.
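The urn picture translates directly into code: an unordered sample is what `itertools.combinations` produces, and `math.comb` (available in Python 3.8 and later) evaluates the formula of Theorem 2.6.3. The sketch below is an added illustration using the four-element set and the handshake problem from the text; the variable names are assumptions.

```python
from itertools import combinations, permutations
from math import comb, factorial

# The six combinations of size 2 from the set {A, B, C, D}.
pairs = ["".join(c) for c in combinations("ABCD", 2)]
print(pairs)

# Handshakes among eight politicians: unordered samples of size 2 from 8 chips.
handshakes = list(combinations(range(1, 9), 2))
print(len(handshakes), comb(8, 2))

# The proof's key step: each combination of size k can be ordered in k! ways.
k, n = 2, 8
print(len(list(permutations(range(n), k))), factorial(k) * comb(n, k))
```

The last line mirrors the proof of Theorem 2.6.3, where the number of permutations of length k equals k! times the number of combinations of size k.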
EXAMPLE 2.6.19
The basketball recruiter for Swampwater Tech has scouted sixteen former NBA starters that he thinks he can pass off as junior college transfers: six are guards, seven are forwards, and three are centers. Unfortunately, his slush fund of illegal alumni donations is at an all-time low and he can afford to buy new Corvettes for only nine of the players. If he wants to keep three guards, four forwards, and two centers, how many ways can he parcel out the cars?
This is a combination problem that also requires an application of the multiplication rule. First, note there are $\binom{6}{3}$ sets of three guards that could be chosen to receive Corvettes (think of drawing a set of three names out of an urn containing six names). Similarly, the forwards and centers can be bribed in $\binom{7}{4}$ and $\binom{3}{2}$ ways, respectively. It follows from the multiplication rule, then, that the total number of ways to divvy up the cars is the product
$$\binom{6}{3}\binom{7}{4}\binom{3}{2}$$
or 2100 (= 20 · 35 · 3).
EXAMPlE 2.6.20
Your statistics teacher announces a twenty-page ..........LUF, assignment on Monday that is to
be finished by Thursday
You intend to
the first Xl
Monday. the next
X2 pages Tuesday. and the
X3 pages
wherexl + X2 + X3 20 and each
Xi ::: 1. In how many ways can you complete
That
many different
sets of values can be chosen
Xl. X2, and
Imagine the nineteen
between the twenty pages (see Figure 2.6.15). Choosing any
two of
spaces automatically partitioos
twenty pages into
nonempty sets.
Spaces 3
7, for example, would correspond to reading three
on Monday. four
pages on Tuesday, and
pages on Wednesday. The number different values for
the set (XI, X2. X3), then, must equal the
to select two "markers "-namely ,
=
(~). or 171.
FIGURE 2.6.15 (twenty pages in a row; the nineteen spaces between them are numbered 1 through 19, with two spaces marked)
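The "markers" argument can be checked by brute force. The following is a minimal sketch, assuming Python 3.8 or later; the function name `count_reading_plans` is hypothetical and simply counts every triple of positive page counts that sums to twenty.

```python
from itertools import product
from math import comb


def count_reading_plans(pages: int = 20, days: int = 3) -> int:
    """Count tuples (x1, ..., x_days) with every x_i >= 1 and sum equal to pages."""
    return sum(
        1
        for xs in product(range(1, pages + 1), repeat=days)
        if sum(xs) == pages
    )


brute_force = count_reading_plans()
by_markers = comb(19, 2)  # choose 2 of the 19 spaces between the pages
print(brute_force, by_markers)
```

Both counts should agree, which is the content of the argument in the example.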
EXAMPLE 2.6.21
Mitch is trying to put a little zing into his act by telling four jokes at the beginning
of each show. His current engagement is booked to run four months. If he gives one
performance a night and never wants to repeat the same set of jokes on any two nights,
what is the minimum number of jokes he needs in his repertoire?
Four months of performances create a demand for roughly 120 different sets of jokes. Let
n denote the number of jokes that Mitch can tell. The question is asking for the
smallest n for which $\binom{n}{4} \geq 120$. Trial-and-error calculations summarized in Table 2.6.3
show that the optimal n is surprisingly small. A set of only nine jokes is sufficient to
keep Mitch from having to repeat his opening monologue.
TABLE 2.6.3
n    $\binom{n}{4}$    $\geq 120$?
7    35     No
8    70     No
9    126    Yes
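The trial-and-error search in Table 2.6.3 is easy to automate. This is a small sketch, assuming sets of four jokes and 120 nights as in the example; the helper name `min_repertoire` is hypothetical.

```python
from math import comb


def min_repertoire(set_size: int, nights: int) -> int:
    """Smallest n for which C(n, set_size) is at least the number of nights."""
    n = set_size
    while comb(n, set_size) < nights:
        n += 1
    return n


for n in (7, 8, 9):
    print(n, comb(n, 4), comb(n, 4) >= 120)
print(min_repertoire(4, 120))
```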
EXAMPLE 2.6.22
Binomial coefficients have many interesting properties. Perhaps the most familiar is
Pascal's triangle,¹ a numerical array where each entry is equal to the sum of the two numbers
appearing diagonally above it (see Figure 2.6.16). Notice that each entry in Pascal's
triangle can be expressed as a binomial coefficient, and the relationship just described
appears to reduce to a simple equation involving those coefficients:
$\binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1}$    (2.6.1)
Prove that Equation 2.6.1 holds for all positive integers n and k.
FIGURE 2.6.16 (Pascal's triangle, rows 0 through 4, shown both as numbers and as the corresponding binomial coefficients)
Consider a set of n + 1 distinct objects $A_1, A_2, \ldots, A_{n+1}$. We can obviously draw
samples of size k from that set in $\binom{n+1}{k}$ different ways. Now, consider any particular
object, for example, $A_1$. Relative to $A_1$, each of those $\binom{n+1}{k}$ samples belongs to one
of two categories: those containing $A_1$ and those not containing $A_1$. To form samples
containing $A_1$, we need to select k - 1 additional objects from the remaining n. This can
be done in $\binom{n}{k-1}$ ways. Similarly, there are $\binom{n}{k}$ ways to form samples not containing
$A_1$. Therefore, $\binom{n+1}{k}$ must equal $\binom{n}{k} + \binom{n}{k-1}$.
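A numerical spot check is no substitute for the counting proof, but it can catch a misremembered index. The sketch below, assuming Python 3.8 or later, tests Equation 2.6.1 over a range of n and k; the function name `pascal_identity_holds` is hypothetical.

```python
from math import comb


def pascal_identity_holds(max_n: int = 30) -> bool:
    """Check C(n+1, k) == C(n, k) + C(n, k-1) for 1 <= n <= max_n, 1 <= k <= n+1."""
    return all(
        comb(n + 1, k) == comb(n, k) + comb(n, k - 1)
        for n in range(1, max_n + 1)
        for k in range(1, n + 2)
    )


print(pascal_identity_holds())
```

Note that `math.comb(n, k)` returns 0 when k exceeds n, which is what the identity needs at the right edge of each row.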
EXAMPLE 2.6.23
The answers to combinatorial questions can sometimes be obtained using quite different
approaches. What invariably distinguishes one solution from another is the way in which
outcomes are characterized.
For example, suppose you have just ordered a roast beef sub at a sandwich shop, and
now you need to decide which, if any, of the available toppings (lettuce, tomato, onions,
etc.) to add. If the store has eight "extras" to choose from, how many different subs can
you order?
One way to answer this question is to think of each sub as an ordered sequence of
length eight, where each position in the sequence corresponds to one of the toppings. At
each of those positions, you have two choices: "add" or "do not add" that particular
topping. Pictured in Figure 2.6.17 is the sequence corresponding to the sub that has
lettuce, tomato, and onion but no other toppings. Since two choices ("add" or "do not
add") are available for each of the eight toppings, the multiplication rule tells us that the
number of different roast beef subs that could be requested is $2^8$, or 256.
FIGURE 2.6.17 (an "Add?" sequence of length eight, one position per topping)
¹Despite its name, Pascal's triangle was not discovered by Pascal. Its basic structure was
known hundreds of years before the French mathematician was born. It was Pascal,
though, who first made extensive use of its properties.
An ordered sequence of length eight, though, is not the only model capable of
characterizing a roast beef sandwich. We can also distinguish one roast beef sub from
another by the particular combination of toppings that each one has. For example, there
are $\binom{8}{4} = 70$ different subs having exactly four toppings. It follows that the total number
of different sandwiches is the total number of different combinations of size k, where k
ranges from 0 to 8. Reassuringly, that sum agrees with the ordered sequence answer:
total number of different roast beef subs $= \binom{8}{0} + \binom{8}{1} + \binom{8}{2} + \cdots + \binom{8}{8}$
$= 1 + 8 + 28 + \cdots + 1$
$= 256$
What we have just illustrated here is another property of binomial coefficients, namely,
that
$\sum_{k=0}^{n} \binom{n}{k} = 2^n$    (2.6.2)
The proof of Equation 2.6.2 is a direct consequence of Newton's binomial expansion (see
the second comment following Theorem 2.6.3).
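Both ways of counting subs can be compared directly. In this sketch the eight topping names are placeholders (only lettuce, tomato, and onions are named in the text); what matters is that there are eight of them.

```python
from itertools import combinations

# Placeholder labels; the example names only the first three toppings.
toppings = ["lettuce", "tomato", "onions", "t4", "t5", "t6", "t7", "t8"]

# Combination model: count the subs having exactly k toppings, k = 0, ..., 8.
subs_by_size = [sum(1 for _ in combinations(toppings, k)) for k in range(9)]
total_by_combinations = sum(subs_by_size)

# Ordered-sequence model: two choices ("add" or "do not add") per topping.
total_by_sequences = 2 ** len(toppings)

print(subs_by_size)
print(total_by_combinations, total_by_sequences)
```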
QUESTIONS
2.6.50. How many straight lines can be drawn between five points (A, B, C, D, and E), no
three of which are collinear?
2.6.51. The Alpha Beta Zeta sorority is trying to fill a pledge class of nine new members
during fall rush. Among the twenty-five available candidates, fifteen have been judged
marginally acceptable and ten highly desirable. How many ways can the pledge class
be chosen to give a two-to-one ratio of highly desirable to marginally acceptable
candidates?
2.6.52. A boat has a crew of eight: Two of those eight can row only on the stroke side, while
three can row only on the bow side. In how many ways can the two sides of the boat
be manned?
2.6.53. Nine students, five men and four women, interview for four summer internships
sponsored by a city newspaper.
(a) In how many ways can the newspaper choose a set of four interns?
(b) In how many ways can the newspaper choose a set of four interns if it must include
two men and two women in each set?
(c) How many sets of four can be picked such that not everyone in a set is of the same
sex?
2.6.54. The final exam in History 101 consists of five essay questions that the professor chooses
from a pool of seven that are given to the students a week in advance. For how many
possible sets of questions does a student need to be prepared? In this situation does
order matter?
2.6.55. Ten basketball players meet in the school gym for a pickup game. How many ways can
they form two teams of five each?
2.6.56. A chemist is trying to synthesize part of a straight-chain aliphatic hydrocarbon polymer
that consists of twenty-one radicals: ten ethyls (E), six methyls (M), and five propyls
(P). Assuming all arrangements of radicals are physically possible, how many different
polymers can be formed if no two of the methyl radicals are to be adjacent?
2.6.57. In how many ways can the letters in
MISSISSIPPI
be arranged so that no two I's are adjacent?
2.6.58. Prove that $\sum_{k=0}^{n} \binom{n}{k} = 2^n$. (Hint: Use the binomial expansion mentioned on page 108.)
2.6.59. Prove that
$\binom{n}{0}^2 + \binom{n}{1}^2 + \cdots + \binom{n}{n}^2 = \binom{2n}{n}$
(Hint: Rewrite the left-hand side as
$\binom{n}{0}\binom{n}{n} + \binom{n}{1}\binom{n}{n-1} + \cdots + \binom{n}{n}\binom{n}{0}$
and consider the problem of selecting a sample of n objects from an original set of 2n
objects.)
2.6.60. Show that
$\binom{n}{0} + \binom{n}{2} + \cdots = \binom{n}{1} + \binom{n}{3} + \cdots$
(Hint: Consider the expansion of $(x - y)^n$.)
2.6.61. Prove that successive terms in the sequence $\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}$ first increase and
then decrease. (Hint: Examine the ratio of two successive terms, $\binom{n}{j+1} / \binom{n}{j}$.)
2.6.62. Imagine n molecules of a gas confined to a rigid container divided into two chambers
by a semipermeable membrane. If i molecules are in the left chamber, the entropy of
the system is defined by the equation
Entropy $= \log \binom{n}{i}$
If n is even, for what configuration of molecules will the entropy be maximized?
(Entropy is a concept physicists find useful in characterizing heat exchanges, particularly
those involving gases. In general terms, the entropy of a system is a measure of its
disorder: As the "randomness" of the position and velocity vectors of a system of
particles increases, so does its entropy.) (Hint: See Question 2.6.61.)
2.6.63. Compare the coefficients of $t^k$ in $(1 + t)^d (1 + t)^e = (1 + t)^{d+e}$ to prove that
$\binom{d+e}{k} = \sum_{j=0}^{k} \binom{d}{j} \binom{e}{k-j}$
2.7 COMBINATORIAL PROBABILITY
In Section 2.6 our concern focused on counting the number of ways a given operation,
or sequence of operations, could be performed. In Section 2.7 we want to couple those
enumerations with the notion of probability. Putting the two together makes a lot
of sense: there are many combinatorial problems where an enumeration, by itself, is not
particularly relevant. A poker player, for example, is not interested in knowing the total
number of ways he can draw to a straight; he is interested, though, in his probability of
drawing to a straight.
In a combinatorial setting, making the transition from an enumeration to a probability
is easy. If there are n ways to perform a certain operation and a total of m of those satisfy
some stated condition (call it A), then P(A) is defined to be the ratio m/n. This assumes,
of course, that all possible outcomes are equally likely.
Historically, the "m over n" idea is what motivated the early work of Pascal, Fermat,
and Huygens (see Section 1.1). Today we recognize that not all probabilities are so
easily characterized. Nevertheless, the m/n model, the so-called classical definition of
probability, is entirely appropriate for describing a wide variety of phenomena.
EXAMPLE 2.7.1
An urn contains eight chips numbered 1 through 8. A sample of three is drawn without
replacement. What is the probability that the largest chip in the sample is a 5?
Let A be the event "Largest chip in sample is a 5." Figure 2.7.1 shows what must
happen in order for A to occur: (1) the 5 chip must be selected, and (2) two chips must be
drawn from the subpopulation of chips numbered 1 through 4. By the multiplication rule,
the number of samples satisfying event A is the product $\binom{1}{1} \cdot \binom{4}{2}$.
FIGURE 2.7.1 (choose 1 from the chip numbered 5; choose 2 from the chips numbered 1 through 4)
The sample space S for the experiment of drawing three chips from the urn contains
$\binom{8}{3}$ outcomes, all equally likely. In this situation, $m = \binom{1}{1} \cdot \binom{4}{2}$, $n = \binom{8}{3}$, and
$P(A) = \frac{\binom{1}{1}\binom{4}{2}}{\binom{8}{3}} = 0.11$
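Because the sample space here has only 56 outcomes, the m/n ratio can be confirmed by listing every sample. A minimal sketch, with variable names chosen for illustration:

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Every unordered sample of three chips from an urn numbered 1 through 8.
samples = list(combinations(range(1, 9), 3))

# Event A: the largest chip in the sample is a 5.
favorable = [s for s in samples if max(s) == 5]

p_enumerated = Fraction(len(favorable), len(samples))
p_formula = Fraction(comb(1, 1) * comb(4, 2), comb(8, 3))

print(len(favorable), len(samples), p_enumerated, float(p_enumerated))
```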
EXAMPLE 2.7.2
An urn contains n red chips numbered 1 through n, n white chips numbered 1 through n,
and n blue chips numbered 1 through n (see Figure 2.7.2). Two chips are drawn at random
and without replacement. What is the probability that the two drawn are either the same
color or the same number?
FIGURE 2.7.2 (an urn holding chips $r_1, \ldots, r_n$, $w_1, \ldots, w_n$, and $b_1, \ldots, b_n$; two are drawn without replacement)
Let A be the event that the two chips drawn are the same color; let B be the event that
they have the same number. We are looking for $P(A \cup B)$.
Since A and B here are mutually exclusive,
$P(A \cup B) = P(A) + P(B)$
With 3n chips in the urn, the total number of ways to draw an unordered sample of size
two is $\binom{3n}{2}$. Moreover,
$P(A) = P(\text{2 reds} \cup \text{2 whites} \cup \text{2 blues})$
$= P(\text{2 reds}) + P(\text{2 whites}) + P(\text{2 blues})$
$= 3\binom{n}{2} / \binom{3n}{2}$
and
$P(B) = P(\text{two 1's} \cup \text{two 2's} \cup \cdots \cup \text{two } n\text{'s})$
$= n\binom{3}{2} / \binom{3n}{2}$
Therefore,
$P(A \cup B) = \frac{3\binom{n}{2} + n\binom{3}{2}}{\binom{3n}{2}} = \frac{n + 1}{3n - 1}$
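The closed form (n + 1)/(3n - 1) can be compared against a direct enumeration for small n. This is a sketch only; the function name `p_same_color_or_number` and the color codes are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations


def p_same_color_or_number(n: int) -> Fraction:
    """Enumerate all unordered pairs of chips and count color or number matches."""
    chips = [(color, number) for color in "rwb" for number in range(1, n + 1)]
    pairs = list(combinations(chips, 2))
    hits = sum(1 for a, b in pairs if a[0] == b[0] or a[1] == b[1])
    return Fraction(hits, len(pairs))


for n in range(1, 7):
    print(n, p_same_color_or_number(n), Fraction(n + 1, 3 * n - 1))
```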
EXAMPLE 2.7.3
Twelve fair dice are rolled. What is the probability that
a. the first six dice all show one face and the last six dice all show a second face?
b. not all the faces are the same?
c. each face appears exactly twice?
a. The sample space that corresponds to the "experiment" of rolling twelve dice is
the set of ordered sequences of length twelve, where the outcome at every position
in the sequence is one of the integers 1 through 6. If the dice are fair, all $6^{12}$ such
sequences are equally likely.
Let A be the set of rolls where the first six dice show one face and the second six
show another face. Figure 2.7.3 shows one of the sequences in the event A. Clearly,
the face that appears for the first half of the sequence could be any of the six integers
from 1 through 6.
FIGURE 2.7.3 (a sequence in A: positions 1 through 6 show one face and positions 7 through 12 show a second face)
Five choices would be available for the last half of the sequence (since the two
faces cannot be the same). The number of sequences in the event A, then, is
$_6P_2 = 6 \cdot 5 = 30$. Applying the "m/n" rule gives
$P(A) = 30/6^{12} = 1.4 \times 10^{-8}$
b. Let B be the event that not all the faces are the same. Then
$P(B) = 1 - P(B^C) = 1 - 6/6^{12}$
since there are six sequences, (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), ..., (6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6), where the twelve faces are all the same.
c. Let C be the event that each face appears exactly twice. From Theorem 2.6.2, the
number of ways each face can appear twice is 12!/(2! · 2! · 2! · 2! · 2! · 2!). Therefore,
$P(C) = \frac{12!/(2! \cdot 2! \cdot 2! \cdot 2! \cdot 2! \cdot 2!)}{6^{12}} = 0.0034$
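The three answers in this example are just ratios of counts, so they can be reproduced with a few lines of arithmetic. The sketch below assumes Python 3.8 or later (for `math.perm`); the variable names are illustrative.

```python
from math import factorial, perm

total = 6 ** 12  # equally likely ordered sequences of twelve faces

# (a) first six dice show one face, last six show a different face
count_a = perm(6, 2)
p_a = count_a / total

# (b) not all twelve faces the same: complement of the six constant sequences
p_b = 1 - 6 / total

# (c) each face appears exactly twice (Theorem 2.6.2 multinomial count)
count_c = factorial(12) // factorial(2) ** 6
p_c = count_c / total

print(f"{p_a:.2e}", p_b, round(p_c, 4))
```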
EXAMPLE 2.7.4
A fair die is tossed n times. What is the probability that the sum of the faces showing is
n + 2?
The sample space associated with rolling a die n times has $6^n$ outcomes, all of which
in this case are equally likely because the die is presumed fair. There are two types of
outcomes that will produce a sum of n + 2: (a) n - 1 1's and one 3 and (b) n - 2 1's
and two 2's (see Figure 2.7.4). By Theorem 2.6.2, the number of sequences having n - 1
1's and one 3 is $\frac{n!}{1!\,(n-1)!} = n$; likewise, there are $\frac{n!}{2!\,(n-2)!} = \binom{n}{2}$ outcomes having
n - 2 1's and two 2's. Therefore,
$P(\text{sum} = n + 2) = \frac{n + \binom{n}{2}}{6^n}$
FIGURE 2.7.4 (two sequences of length n with sum n + 2: one with n - 1 1's and one 3, the other with n - 2 1's and two 2's)
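For small n the formula can be checked against an exhaustive listing of all rolls. A minimal sketch, with hypothetical function names:

```python
from fractions import Fraction
from itertools import product
from math import comb


def p_sum_brute_force(n: int) -> Fraction:
    """List all 6**n rolls and count those whose faces sum to n + 2."""
    hits = sum(1 for roll in product(range(1, 7), repeat=n) if sum(roll) == n + 2)
    return Fraction(hits, 6 ** n)


def p_sum_formula(n: int) -> Fraction:
    """(n + C(n, 2)) / 6**n, the expression derived in the example."""
    return Fraction(n + comb(n, 2), 6 ** n)


for n in range(1, 6):
    print(n, p_sum_brute_force(n), p_sum_formula(n))
```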
EXAMPLE 2.7.5
To keep Cheetah entertained, Tarzan supplies the following letters from a
Scrabble set to play with:
A A A   E E   I   J   K   L   N N   R   T   Z
What is the probability that Cheetah (who can't spell) rearranges the letters at random
and forms the following sequence:
TARZAN LIKE JANE
(Ignore the spaces between the words.)
If similar letters are considered indistinguishable, Theorem 2.6.2 applies, and the
total number of ways to arrange the fourteen letters is 14!/(3!2!1!1!1!1!2!1!1!1!), or
3,632,428,800. Only one of those sequences is the desired arrangement, so
P("TARZANLIKEJANE") = 1/3,632,428,800
Notice that the same answer is obtained if the fourteen letters are considered distinct.
Under that scenario, the total number of permutations is 14!, but the number of ways to
spell TARZAN LIKE JANE increases to 3!2!2! because all the A's, E's, and N's can be
permuted. Therefore,
P("TARZANLIKEJANE") = 3!2!2!/14! = 1/3,632,428,800
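The two counting models give the same probability, which a short calculation can confirm. This sketch assumes Python 3.8 or later (for `math.prod`); the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod

letters = "TARZANLIKEJANE"
counts = Counter(letters)  # A: 3, E: 2, N: 2, every other letter: 1

# Model 1: like letters indistinguishable, exactly one favorable arrangement.
arrangements = factorial(len(letters)) // prod(factorial(c) for c in counts.values())
p_indistinguishable = Fraction(1, arrangements)

# Model 2: all fourteen tiles distinct, 3! 2! 2! favorable permutations.
favorable = prod(factorial(c) for c in counts.values())
p_distinct = Fraction(favorable, factorial(len(letters)))

print(arrangements, p_indistinguishable == p_distinct)
```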
EXAMPLE 2.7.6
Suppose that k people are selected at random from the general population. What are the
chances that at least two of those k were born on the same day? Known as the birthday
problem, this is a particularly interesting example of combinatorial probability because its
statement is so simple, its analysis is straightforward, yet its solution, as we will see, is
strongly contrary to our intuition.
Picture the k individuals lined up in a row to form an ordered sequence. If leap year
is omitted, each person might have any of 365 birthdays. By the multiplication rule, the
group as a whole generates a sample space of $365^k$ birthday sequences (see Figure 2.7.5).
Define A to be the event "at least two people have the same birthday." If each person
is assumed to have the same chance of being born on any given day, the $365^k$ sequences
in Figure 2.7.5 are equally likely, and
$P(A) = \frac{\text{number of sequences in } A}{365^k}$
Counting the number of sequences in the numerator here is prohibitively difficult
because of the complexity of the event A; fortunately, counting the number of sequences
in $A^C$ is quite easy. Notice that each birthday sequence in the sample space belongs to
exactly one of two categories (see Figure 2.7.6):
1. At least two people have the same birthday.
2. All k people have different birthdays.
FIGURE 2.7.5 (persons 1 through k, each with 365 possible birthdays, giving $365^k$ different sequences)
FIGURE 2.7.6 (the sample space of all birthday sequences of length k, containing $365^k$ outcomes, split into sequences where at least two people have the same birthday and sequences where all k people have different birthdays)
It follows that
number of sequences in A = $365^k$ - number of sequences where all k people
have different birthdays
The number of ways to form birthday sequences for k people subject to the restriction
that all k birthdays must be different is simply the number of ways to form permutations
of length k from a set of 365 distinct objects:
$_{365}P_k = 365(364) \cdots (365 - k + 1)$
Therefore,
$P(A) = P(\text{at least two people have the same birthday}) = \frac{365^k - 365(364) \cdots (365 - k + 1)}{365^k}$
Table 2.7.1 shows P(A) for k values of 15, 22, 23, 40, 50, and 70. Notice how the
P(A)'s greatly exceed what our intuition would suggest.
TABLE 2.7.1
k     P(A) = P(at least two have same birthday)
15    0.253
22    0.476
23    0.507
40    0.891
50    0.970
70    0.999
Comment. Presidential biographies offer one opportunity to "confirm" the unexpectedly
large values that Table 2.7.1 gives for P(A). Among our first k = 40 presidents,
two did have the same birthday: Harding and Polk were both born on November 2. More
surprising, though, are the death dates of the presidents: Adams, Jefferson, and Monroe
all died on July 4, and Fillmore and Taft both died on March 8.
Comment. The values for P(A) in Table 2.7.1 are actually slight underestimates for
the true probabilities that at least two of k people will be born on the same day. The
assumption made earlier that all $365^k$ birthday sequences are equally likely is not entirely
true: Births are somewhat more common during the summer than they are during the
winter. It has been proven, though, that any sort of deviation from the equally-likely
model will only serve to increase the chances that two or more people will share the same
birthday (120). So, if k = 40, for example, the probability is slightly greater than 0.891
that at least two were born on the same day.
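The entries of Table 2.7.1 follow from the complement formula for P(A). The sketch below computes them under the same equally-likely, 365-day assumption used in the example; the function name `p_shared_birthday` is hypothetical.

```python
def p_shared_birthday(k: int, days: int = 365) -> float:
    """1 - days(days-1)...(days-k+1) / days**k, built up one factor at a time."""
    p_all_different = 1.0
    for i in range(k):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


for k in (15, 22, 23, 40, 50, 70):
    print(k, round(p_shared_birthday(k), 3))
```

Multiplying the ratios one at a time avoids forming the very large integers $365^k$ and $_{365}P_k$ explicitly, although Python's integers would handle them as well.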
EXAMPLE 2.7.7
One of the more instructive, and to some, one of the more useful, applications of
combinatorics is the calculation of probabilities associated with various poker hands. It
will be assumed in what follows that five cards are dealt from a poker deck and that
no other cards are showing, although some may already have been dealt. The sample
space is the set of $\binom{52}{5}$ = 2,598,960 different hands, each having probability 1/2,598,960.
What are the chances of being dealt (a) a full house, (b) one pair, and (c) a straight?
[Probabilities for the various other kinds of poker hands (two pairs, three-of-a-kind, flush,
and so on) are gotten in much the same way.]
a. Full house. A full house consists of three cards of one denomination and two
of another. Figure 2.7.7 shows a full house consisting of three 7s and two Queens.
Denominations for the three-of-a-kind can be chosen in $\binom{13}{1}$ ways. Then, given that
a denomination has been decided on, the three requisite suits can be selected in
$\binom{4}{3}$ ways. Applying the same reasoning to the pair gives $\binom{12}{1}$ available denominations,
each having $\binom{4}{2}$ possible choices of suits. Thus, by the multiplication rule,
$P(\text{full house}) = \frac{\binom{13}{1}\binom{4}{3}\binom{12}{1}\binom{4}{2}}{\binom{52}{5}} = 0.0014$
FIGURE 2.7.7 (grid of suits D, H, C, S by denominations 2 through A, with X's marking three 7s and two queens)
FIGURE 2.7.8 (grid of suits D, H, C, S by denominations 2 through A, with X's marking a one-pair hand)
b. One pair. To qualify as a one-pair hand, the five cards must include two of the same
denomination and three "single" cards, that is, cards whose denominations match neither
the pair nor each other. Figure 2.7.8 shows a one-pair hand. For the pair, there are
$\binom{13}{1}$ possible denominations and, once selected, $\binom{4}{2}$ possible suits. Denominations for
the three single cards can be chosen $\binom{12}{3}$ ways (see Question 2.7.16), and each card
can have any of $\binom{4}{1}$ suits. Multiplying these factors together and dividing by $\binom{52}{5}$
gives a probability of 0.42:
$P(\text{one pair}) = \frac{\binom{13}{1}\binom{4}{2}\binom{12}{3}\binom{4}{1}\binom{4}{1}\binom{4}{1}}{\binom{52}{5}} = 0.42$
c. Straight. A straight is five cards having consecutive denominations but not all in the
same suit, for example, a 4 of diamonds, 5 of hearts, 6 of hearts, 7 of clubs, and 8 of
diamonds (see Figure 2.7.9). An ace may be counted "high" or "low," which means
that (10, jack, queen, king, ace) is a straight and so is (ace, 2, 3, 4, 5). (If five consecutive
cards are all in the same suit, the hand is called a straight flush. The latter is considered
a fundamentally different type of hand in the sense that a straight flush "beats" a
straight.) To get the numerator for P(straight), we will first ignore the condition that
all five cards not be in the same suit and simply count the number of hands having
consecutive denominations. Note there are ten sets of consecutive denominations
of length five: (ace, 2, 3, 4, 5), (2, 3, 4, 5, 6), ..., (10, jack, queen, king, ace). With no
restrictions on the suits, each card can be either a diamond, heart, club, or spade. It
follows, then, that the number of five-card hands having consecutive denominations
is $10 \cdot \binom{4}{1}^5$. But forty (= 10 · 4) of those hands are straight flushes. Therefore,
$P(\text{straight}) = \frac{10 \cdot \binom{4}{1}^5 - 40}{\binom{52}{5}} = 0.00392$
Table 2.7.2 shows the probabilities associated with all the different poker hands.
Hand i beats hand j if P(hand i) < P(hand j).
FIGURE 2.7.9 (grid of suits D, H, C, S by denominations 2 through A, with X's marking a straight)
TABLE 2.7.2
Hand               Probability
One pair           0.42
Two pairs          0.048
Three-of-a-kind    0.021
Straight           0.0039
Flush              0.0020
Full house         0.0014
Four-of-a-kind     0.00024
Straight flush     0.000014
Royal flush        0.0000015
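The three hand counts worked out in this example can be recomputed from their binomial-coefficient factors. A small sketch, assuming Python 3.8 or later; the variable names are illustrative.

```python
from math import comb

hands = comb(52, 5)  # size of the sample space

# Each product mirrors the factors used in parts (a), (b), and (c).
full_house = comb(13, 1) * comb(4, 3) * comb(12, 1) * comb(4, 2)
one_pair = comb(13, 1) * comb(4, 2) * comb(12, 3) * comb(4, 1) ** 3
straight = 10 * comb(4, 1) ** 5 - 40  # remove the forty straight flushes

for name, count in [("full house", full_house), ("one pair", one_pair), ("straight", straight)]:
    print(f"{name:10s} {count:9d} {count / hands:.5f}")
```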
PROBLEM-SOLVING HINTS
(Doing combinatorial probability problems)
Listed on p. 91 are several hints that can be helpful in counting the number of ways
to do something. Those same hints apply to the solution of combinatorial probability
problems, but a few others should be kept in mind as well.
1. The solution to a combinatorial probability problem should be set up as a
quotient of numerator and denominator enumerations. Avoid the temptation to
multiply probabilities associated with each position in the sequence. The latter
approach will always "sound" reasonable, but it will frequently oversimplify the
problem and give the wrong answer.
2. Keep the numerator and denominator consistent with respect to order: if
permutations are being counted in the numerator, be sure that permutations are
being counted in the denominator; likewise, if the outcomes in the numerator are
combinations, the outcomes in the denominator should also be combinations.
3. The number of outcomes associated with any problem involving the rolling of n
six-sided dice is $6^n$; similarly, the number of outcomes associated with tossing a
coin n times is $2^n$. The number of outcomes associated with dealing a hand of n
cards from a standard 52-card poker deck is $_{52}C_n$.
QUESTIONS
2.7.1. Ten equally-qualified marketing assistants are candidates for promotion to associate
buyer; seven are men and three are women. If the company intends to promote four of
the ten at random, what is the probability that exactly two of the four are women?
2.7.2. An urn contains six chips, numbered 1 through 6. Two are chosen at random and their
numbers are added together. What is the probability that the resulting sum is equal to
five?
2.7.3. An urn contains twenty chips, numbered 1 through 20. Two are drawn simultaneously.
What is the probability that the numbers on the two chips will differ by more than two?
2.7.4. A bridge hand (thirteen cards) is dealt from a standard 52-card deck. Let A be the event
that the hand contains four aces; let B be the event that the hand contains four kings.
Find $P(A \cup B)$.
2.7.5. Consider a set of ten urns, nine of which contain three white chips and three red chips
each. The tenth contains five white chips and one red chip. An urn is picked at random.
Then a sample of size three is drawn without replacement from that urn. If all three
chips drawn are white, what is the probability the urn being sampled is the one with
five white chips?
2.7.6. A committee of fifty politicians is to be chosen from among our one hundred U.S.
Senators. If the selection is done at random, what is the probability that each state will
be represented?
2.7.7. Suppose that n fair dice are rolled. What are the chances that all n faces will be the
same?
2.7.8. Five fair dice are rolled. What is the probability that the faces showing constitute a "full
house," that is, three faces show one number and two faces show a second number?
2.7.9. Imagine that the test tube pictured contains 2n grains of sand, n white and n black.
Suppose the tube is vigorously shaken. What is the probability that the two colors of
sand will completely separate; that is, all of one color fall to the bottom, and all of the
other color lie on top? (Hint: Consider the 2n grains to be arranged in a row. In how
many ways can the n white and the n black grains be arranged?)
2.7.10. Does a monkey have a better chance of rearranging
A C C L L U U S
to spell
C A L C U L U S
or
A A B E G L R
to spell
A L G E B R A?
2.7.11. An apartment building has eight floors. If seven people get on the elevator on the first
floor, what is the probability they all want to get off on different floors? On the same
floor? What assumption are you making? Does it seem reasonable? Explain.
2.7.12. If the letters in the phrase
A ROLLING STONE GATHERS NO MOSS
are arranged at random, what are the chances that not all the S's will be adjacent?
2.7.13. Suppose each of ten sticks is broken into a long part and a short part. The twenty
parts are arranged into ten pairs and glued back together, so that again there are ten
sticks. What is the probability that each long part will be paired with a short part?
(Note: This problem is a model for the effects of radiation on a living cell. Each
chromosome, as a result of being struck by ionizing radiation, breaks into two parts,
one part containing the centromere. The cell will die unless the fragment containing
the centromere recombines with one not containing a centromere.)
2.7.14. Six dice are rolled one time. What is the probability that each of the six faces appears?
2.7.15. Suppose that a randomly selected group of k people are brought together. What is the
probability that exactly one pair has the same birthday?
2.7.16. For one-pair poker hands, why is the number of denominations for the three single
cards $\binom{12}{3}$ rather than $\binom{12}{1}\binom{11}{1}\binom{10}{1}$?
2.7.17. Dana is not the world's best poker player. Dealt a 2 of diamonds, an 8 of diamonds, an
ace of hearts, an ace of clubs, and an ace of spades, she discards the three aces. What
are her chances of drawing to a flush?
2.7.18. A poker player is dealt a 7 of diamonds, a queen of diamonds, a queen of hearts, a
queen of clubs, and an ace of hearts. He discards the 7. What is his probability of
drawing to either a full house or four-of-a-kind?
2.7.19. Tim is dealt a 4 of clubs, a 6 of hearts, an 8 of hearts, a 9 of hearts, and a king of
diamonds. He discards the 4 and the king. What are his chances of drawing to a straight
flush? to a flush?
2.7.20. Five cards are dealt from a standard 52-card deck. What is the probability that the sum
of the faces on the five cards is 48 or more?
2.7.21. Nine cards are dealt from a 52-card deck. Write a formula for the probability that three
of the five even numerical denominations are represented twice, one of the three face
cards appears twice, and a second face card appears once. Note: Face cards are the
jacks, queens, and kings; 2, 4, 6, 8, and 10 are the even numerical denominations.
2.7.22. A coke hand in bridge is one where none of the thirteen cards is an ace or is higher
than a 9. What is the probability of being dealt such a hand?
2.7.23. A pinochle deck has forty-eight cards, two of each of six denominations (9, J, Q, K, 10,
A) and the usual four suits. Among the many hands that count for meld is a roundhouse,
which occurs when a player has a king and queen of each suit. In a hand of twelve cards,
what is the probability of getting a "bare" roundhouse (a king and queen of each suit
and no other kings or queens)?
2.7.24. A somewhat inebriated conventioneer finds himself in the embarrassing predicament
of being unable to predetermine whether his next step will be forward or backward.
What is the probability that after hazarding n such maneuvers he will have stumbled
forward a distance of r steps? (Hint: Let x denote the number of steps he takes forward
and y, the number backward. Then x + y = n and x - y = r.)
2.8 TAKING A SECOND LOOK AT STATISTICS (ENUMERATION AND MONTE CARLO TECHNIQUES)
It is a characteristic of probability and combinatorial problems that proposed solutions can
sound so right and yet be so wrong. Intuition can easily be fooled, and verbal arguments
are often inadequate to deal with questions having even a modicum of complexity. There
are some problem-solving techniques available, though, that can be very helpful. In
particular, approaches that go back to basics are especially useful.
Making a List and Checking It Twice
Ask a realtor to list the three most important features that a house for sale can have and
the answer is likely to be "location, location, location." Ask a probabilist to name the
three most helpful techniques for solving difficult combinatorial problems and the answer
might well be "enumerate, enumerate, enumerate." Making a partial list of the set
of outcomes comprising an event can often show that a proposed solution is incorrect and
what the right answer should be. Sometimes, though, the magnitudes of the numbers in
a problem are so large that making even a partial list of outcomes is not a viable option
for that particular problem. In those cases, the trick is to enumerate a much smaller-scale
problem, one that has all the essential features of the original.
For example, suppose a student government council is to be comprised of three
freshmen, three sophomores, three juniors, three seniors, and one at-large representative,
who could be a member of any of the four classes. Moreover, suppose ten candidates
from each of the four classes have been nominated. How many different thirteen-member
councils can be formed?
One approach that might seem reasonable is to choose the council members in a way that
mimics the statement of the question. That is, three representatives from each class can
be chosen in $\binom{10}{3}$ ways; then the at-large member would be selected from the remaining
$\binom{28}{1}$ nominees. Applying the multiplication rule gives
number of different councils $= \binom{10}{3}\binom{10}{3}\binom{10}{3}\binom{10}{3}\binom{28}{1} = 5,806,080,000$
Another approach, which also may seem reasonable, is to note that one class
will necessarily have four representatives, while the other three will each have three. Any
of the four classes, of course, could be the one with four representatives. Electing four
freshmen, for example, and three from each of the other three classes can be done in
$\binom{10}{4}\binom{10}{3}\binom{10}{3}\binom{10}{3}$ ways. Allowing for the fact that the four could be in any
class, it follows that the total number of councils is 1,451,520,000:
number of different councils $= 4\binom{10}{4}\binom{10}{3}\binom{10}{3}\binom{10}{3} = 1,451,520,000$
Is the first approach overcounting the number of different councils or is the second
approach undercounting? The two proposed solutions differ by a factor of four. Enumerating
(by hand) even a portion of the possible outcomes is not feasible, given the combinatorial
magnitude of the problem. A very simple analogous question can
be posed, though, that is easily enumerated. Suppose there were only two classes, say,
freshmen and sophomores, and only two nominees from each class. Furthermore, suppose
a three-member council is to be formed, consisting of one freshman, one sophomore,
and one representative at large (see Figure 2.8.1).
FIGURE 2.8.1 (freshmen A, B: choose 1; sophomores C, D: choose 1; and choose 1 at-large)
Applied to Figure 2.8.1, the first approach would claim that the number of different
councils is $\binom{2}{1}\binom{2}{1}\binom{2}{1}$, or 8. The second approach would imply that the number of
different councils is 4 $\left(= \binom{2}{2}\binom{2}{1} + \binom{2}{1}\binom{2}{2}\right)$. Table 2.8.1 is a listing of the outcomes
generated by the two strategies. By inspection, it is now clear that the first approach is
incorrect: every possible outcome is double-counted. The outcome ACB, for example,
where B is the at-large representative, reappears as BCA, where A is the at-large
representative. The second approach, on the other hand, prevents any such overlapping
from occurring (but does include all possible councils).
TABLE 2.8.1
First Approach
Fresh.    Soph.    At-large
A         C        B
A         C        D
A         D        B
A         D        C
B         C        A
B         C        D
B         D        A
B         D        C
Second Approach
Fresh.    Soph.
A, B      C
A, B      D
A         C, D
B         C, D
Duplicates in the first approach: ACB and BCA; ACD and ADC; ADB and BDA; BCD and BDC.
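The small-scale enumeration can be carried out mechanically. The sketch below lists the role-ordered picks produced by the first approach, collapses them into sets of people, and then evaluates the two full-size formulas; all variable names are illustrative.

```python
from itertools import product
from math import comb

freshmen = ["A", "B"]      # nominee labels as in the small-scale problem
sophomores = ["C", "D"]
nominees = freshmen + sophomores

# First approach: one freshman, one sophomore, then one at-large member
# chosen from the two nominees not yet picked.
first_approach = [
    (f, s, extra)
    for f, s in product(freshmen, sophomores)
    for extra in nominees
    if extra not in (f, s)
]

# A council is a set of people, so roles are dropped before counting.
distinct_councils = {frozenset(pick) for pick in first_approach}
print(len(first_approach), len(distinct_councils))

# Full-size counts from the two formulas for the thirteen-member council.
first_full = comb(10, 3) ** 4 * comb(28, 1)
second_full = 4 * comb(10, 4) * comb(10, 3) ** 3
print(first_full, second_full, first_full // second_full)
```

In the small problem each council is counted twice by the first approach; in the full-size problem the same overcounting shows up as the factor of four between the two totals.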
Play It Again, Sam
Recall the von Mises definition of probability given on p. 23: If an experiment is repeated
n times under identical conditions, and if the event E occurs on m of those repetitions,
then
$P(E) = \lim_{n \to \infty} \frac{m}{n}$    (2.8.1)
To be sure, Equation 2.8.1 is an asymptotic result, but it suggests an obvious (and very
useful) approximation: if n is finite,
$P(E) \doteq \frac{m}{n}$
In general, efforts to estimate probabilities by simulating repetitions of an experiment
(usually with a computer) are referred to as Monte Carlo studies. Usually the technique
is used in situations where an exact probability is difficult to calculate. It can also be used,
though, as an empirical justification for choosing one proposed solution over another.
For example, consider the game described in Example 2.4.11. An urn contains a red
chip, a blue chip, and a two-color chip (red on one side, blue on the other). One chip is
drawn at random and placed on a table. The question is, if blue is showing, what is the
probability that the color underneath is also blue?
Pictured in
2.8.2 are two ways
conceptualizing the question just posed. The
outcomes in (a) are
that a chip was dr~wn. Starting with that premise, the
answer to the question is
red chip is obviously eliminated and only one of
two
~U."""LUU'6 chips is blue on
sides.
Side drawn
red
blue
}
--.. P(BIB)
=112
red/red
'J
bluefblue --- P(BlB) = 213
tW(H;;olor
red/blue
<a)
(b)
FiGURE 2.8.2
By way of contrast, the outcomes in (b) are assuming that the side of a chip was drawn. If so, the blue color showing could be any of three blue sides, two of which are blue underneath. According to model (b), then, the probability of both sides being blue is 2/3.
The formal derivation on p. 60, of course, resolves the debate-the correct answer is 2/3. But suppose that such a derivation was unavailable. How might we assess the relative plausibilities of 1/2 and 2/3? The answer is simple-just play the game a number of times and see what proportion of outcomes that show blue on top have blue underneath.
To that end, Table 2.8.2 summarizes the results of one hundred random drawings. For a total of fifty-two, blue was showing (S) when the chip was placed on a table; for thirty-six of the trials (those marked with an asterisk), the color underneath (U) was also blue. Using the approximation suggested by Equation 2.8.1,

$$P(\text{blue is underneath} \mid \text{blue is on top}) = P(B \mid B) \approx \frac{36}{52} = 0.69$$

a figure much more consistent with 2/3 than with 1/2.
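The "play the game a number of times" idea can also be carried out by computer. The following is a minimal sketch, not the procedure used to produce Table 2.8.2; the function name `simulate_chip_game`, the seed, and the number of trials are illustrative assumptions.

```python
import random

def simulate_chip_game(trials, seed=0):
    """Estimate P(blue underneath | blue showing) by repeated random drawings."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    chips = [("R", "R"), ("B", "B"), ("R", "B")]  # red chip, blue chip, two-color chip
    blue_showing = 0
    blue_both = 0
    for _ in range(trials):
        chip = rng.choice(chips)              # draw one chip at random
        if rng.random() < 0.5:                # each side is equally likely to land face up
            top, bottom = chip
        else:
            bottom, top = chip
        if top == "B":
            blue_showing += 1
            if bottom == "B":
                blue_both += 1
    # m/n approximation of Equation 2.8.1, restricted to trials with blue showing
    return blue_both / blue_showing

estimate = simulate_chip_game(100_000)
print(f"estimated P(B | B) = {estimate:.3f}")
```

With many more than one hundred trials the estimate should settle close to 2/3 rather than 1/2, in line with the conclusion drawn from the table.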
TABLE 2.8.2 (one hundred drawings; for each trial number, S = color showing and U = color underneath, with B = blue and R = red; trials with blue on both sides are marked with an asterisk)
The point of these examples is not to downgrade the importance of rigorous derivations and exact answers. Far from it. The application of Theorem 2.4.1 to solve the problem posed in Example 2.4.11 is obviously superior to the Monte Carlo approximation illustrated in Table 2.8.2. Still, enumerations of outcomes and replications of experiments can often provide valuable insights and call attention to nuances that might otherwise go unnoticed. As problem-solving techniques in probability and combinatorics, they are extremely, extremely important.
CHAPTER 3
Random Variables
3.1 INTRODUCTION
3.2 BINOMIAL AND HYPERGEOMETRIC PROBABILITIES
3.3 DISCRETE RANDOM VARIABLES
3.4 CONTINUOUS RANDOM VARIABLES
3.5 EXPECTED VALUES
3.6 THE VARIANCE
3.7 JOINT DENSITIES
3.8 COMBINING RANDOM VARIABLES
3.9 FURTHER PROPERTIES OF THE MEAN AND VARIANCE
3.10 ORDER STATISTICS
3.11 CONDITIONAL DENSITIES
3.12 MOMENT-GENERATING FUNCTIONS
3.13 TAKING A SECOND LOOK AT STATISTICS (INTERPRETING MEANS)
APPENDIX 3.A.1 MINITAB APPLICATIONS
Jakob (Jacques) Bernoulli
One of a Swiss family producing eight distinguished scientists, Jakob was forced by his father to pursue theological studies, but his love of mathematics eventually led him to a university career. He and his brother, Johann, were the most prominent champions of Leibniz's calculus on continental Europe, the two using the new theory to solve numerous problems in physics and mathematics. Bernoulli's main work in probability, Ars Conjectandi, was published after his death by his nephew, Nikolaus, in 1713.
-Jakob (Jacques) Bernoulli
3.1 INTRODUCTION
Throughout Chapter 2, probabilities were assigned to events-that is, to sets of sample outcomes. The events we dealt with were composed of either a finite or a countably infinite number of sample outcomes, in which case the event's probability was simply the sum of the probabilities assigned to its outcomes. One particular probability function that came up over and over again in Chapter 2 was the assignment of 1/n as the probability associated with each of the n points in a finite sample space. This is the model that typically describes games of chance (and all of our combinatorial probability problems in Chapter 2).
The first objective of this chapter is to look at several other useful ways for assigning probabilities to sample outcomes. In so doing, we confront the desirability of "redefining" sample spaces using functions known as random variables. How and why these are used-and what their mathematical properties are-is the focus of virtually everything covered in Chapter 3.
As a case in point, suppose a medical researcher is testing eight elderly adults for their allergic reaction (yes or no) to a new drug for controlling blood pressure. One of the $2^8 = 256$ possible sample points would be the sequence (yes, no, no, yes, no, no, yes, no), signifying that the first subject had an allergic reaction, the second did not, the third did not, and so on. Typically, in studies of this sort, the particular subjects experiencing reactions is of little interest: what does matter is the number who show a reaction. If that were true here, the outcome's relevant information (i.e., the number of allergic reactions) could be summarized by the number 3.(1)
Suppose X denotes the number of allergic reactions among a set of eight adults. Then X is said to be a random variable and the number 3 is the value of the random variable for the outcome (yes, no, no, yes, no, no, yes, no).
In general, random variables are functions that associate numbers with some attribute of a sample outcome that is deemed to be especially important. If X denotes the random variable and s denotes a sample outcome, then X(s) = t, where t is a real number. For the allergy example, s = (yes, no, no, yes, no, no, yes, no) and t = 3.
Random variables can often create a dramatically simpler sample space. That certainly is the case here-the original sample space has 256 ($= 2^8$) outcomes, each being an ordered sequence of length eight. The random variable X, on the other hand, has only nine possible values, the integers from 0 to 8, inclusive.
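To make the reduction from 256 outcomes to nine values concrete, the sample space can be enumerated directly. This is only an illustrative sketch; the names `outcomes`, `X`, and `counts` are assumptions introduced here, not notation from the text beyond X itself.

```python
from itertools import product
from collections import Counter

# Every ordered sequence of eight yes/no responses: the original sample space.
outcomes = list(product(("yes", "no"), repeat=8))

def X(s):
    """Random variable: number of allergic reactions in the outcome s."""
    return s.count("yes")

# How many sample outcomes map to each value of X.
counts = Counter(X(s) for s in outcomes)

s = ("yes", "no", "no", "yes", "no", "no", "yes", "no")
print(len(outcomes), sorted(counts), X(s), counts[3])
```

The last count reported is the number of outcomes that share the value X = 3, which is the quantity the footnote obtains from Theorem 2.6.2.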
In terms of their fundamental structure, all random variables fall into one of two broad categories, the distinction resting on the number of possible values the random variable can equal. If the latter is finite or countably infinite (which would be the case with the allergic reaction example), the random variable is said to be discrete; if the outcomes can be any real number in a given interval, the number of possibilities is uncountably infinite, and the random variable is said to be continuous. The difference between the two is critically important, as we will learn in the next several sections.
The purpose of Chapter 3 is to introduce the important concepts and computational techniques associated with random variables, both discrete and continuous. Taken together, these ideas form the bedrock of modern probability and statistics.
(1) By Theorem 2.6.2, of course, there would be a total of fifty-six (= 8!/(3!5!)) outcomes having exactly three reactions. All fifty-six would be equivalent in terms of what they imply about the drug's likelihood of causing allergic reactions.
3.2 BINOMIAL AND HYPERGEOMETRIC PROBABILITIES
This section looks at two specific probability scenarios that are especially important, both for their theoretical implications as well as for their ability to describe real-world problems. What we learn in developing these two models will help us understand random variables in general, the formal discussion of which begins in Section 3.3.
The Binomial Probability Distribution
Binomial probabilities apply to situations involving a series of independent and identical trials, where each trial can have only one of two possible outcomes. Imagine three distinguishable coins being tossed, each having a probability p of coming up heads. The set of possible outcomes are the eight listed in Table 3.2.1. If the probability of any of the coins coming up heads is p, then the probability of the sequence (H, H, H) is $p^3$, since the coin tosses qualify as independent trials. Similarly, the probability of (T, H, H) is $(1 - p)p^2$. The fourth column of Table 3.2.1 shows the probabilities associated with each of the three-coin sequences.
Suppose our main interest in the coin tosses is the number of heads that occur. Whether the actual sequence is, say, (H, H, T) or (H, T, H) is immaterial, since each outcome contains exactly two heads. The last column of Table 3.2.1 shows the number of heads in each of the eight possible outcomes. Notice that there are three outcomes with exactly two heads, each having an individual probability of $p^2(1 - p)$. The probability, then, of the event "two heads" is the sum of those three individual probabilities-that is, $3p^2(1 - p)$. Table 3.2.2 lists the probabilities of tossing k heads, where k = 0, 1, 2, or 3.
TABLE 3.2.1

1st Coin   2nd Coin   3rd Coin   Probability     Number of Heads
H          H          H          $p^3$           3
H          H          T          $p^2(1 - p)$    2
H          T          H          $p^2(1 - p)$    2
T          H          H          $p^2(1 - p)$    2
H          T          T          $p(1 - p)^2$    1
T          H          T          $p(1 - p)^2$    1
T          T          H          $p(1 - p)^2$    1
T          T          T          $(1 - p)^3$     0
TABLE 3.2.2

Number of Heads   Probability
0                 $(1 - p)^3$
1                 $3p(1 - p)^2$
2                 $3p^2(1 - p)$
3                 $p^3$
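Now, more generally, suppose that n coins are tossed, in which case the number of heads can equal any integer from 0 through n. By analogy,

$$P(k \text{ heads}) = \left(\begin{array}{c}\text{number of ways to arrange}\\ k \text{ heads and } n-k \text{ tails}\end{array}\right) \cdot \left(\begin{array}{c}\text{probability of any particular sequence}\\ \text{having } k \text{ heads and } n-k \text{ tails}\end{array}\right)$$

$$= \left(\begin{array}{c}\text{number of ways to arrange}\\ k \text{ heads and } n-k \text{ tails}\end{array}\right) \cdot p^k (1-p)^{n-k}$$

The number of ways to arrange k H's and n - k T's, though, is $\frac{n!}{k!(n-k)!}$, or $\binom{n}{k}$ (recall Theorem 2.6.2).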
Theorem 3.2.1. Consider a series of n independent trials, each resulting in one of two possible outcomes, "success" or "failure." Let p = P(success occurs at any given trial) and assume that p remains constant from trial to trial. Then

$$P(k \text{ successes}) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n$$

Comment. The probability assignment given by the equation in Theorem 3.2.1 is known as the binomial distribution.
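The formula in Theorem 3.2.1 translates almost directly into code. The sketch below is one possible rendering using Python's standard `math.comb`; the function name `binomial_pmf` and the trial value p = 0.3 are assumptions chosen for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials), as in Theorem 3.2.1."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.3  # arbitrary illustrative value for P(heads)

# Probabilities of 0, 1, 2, 3 heads for three coins (compare Table 3.2.2).
three_coin = [binomial_pmf(k, 3, p) for k in range(4)]
print(three_coin, sum(three_coin))
```

For n = 3 the four values agree with the entries of Table 3.2.2 evaluated at the chosen p, and they sum to 1, as any probability assignment must.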
EXAMPLE 3.2.1
As the lawyer for a client accused of murder, you are looking for ways to establish "reasonable doubt" in the minds of the jurors. Central to the prosecutor's case is testimony from a forensics expert who claims that a blood sample taken from the scene of the crime matches the DNA of your client. One-tenth of 1% of the time, though, such tests are in error.
Suppose your client is actually guilty. If six other laboratories in the country are capable of doing this kind of DNA analysis (and you hire them all), what are the chances that at least one will make a mistake and conclude that your client is innocent?
Each of the six analyses constitutes an independent trial, where p = P(lab makes a mistake) = 0.001. Substituting into Theorem 3.2.1 shows that the lawyer's strategy is not likely to work:

$$P(\text{at least one lab says client is innocent}) = 1 - P(0 \text{ labs make a mistake}) = 1 - \binom{6}{0}(0.001)^0(0.999)^6 = 0.006$$

For the defendant, 0.006 is hardly encouraging. Given such small values for n and p, though, expecting contradictory forensic results would be a long shot at best. But then again, as the erstwhile TV detective, Baretta, was fond of saying, "If you can't do the time, don't do the crime."
iU'''""",,,,,' Pharmaceuticals is
with a new affordable AIDS medication,
PM-17, that may have the abilily 10
a victim's immune
monkeys
HIV complex have been
the drug. Researchers
to wait six
count the number of animals whose immunological responses show a
marked
Any inexpensive drug
of being effective 60% of the time
whose chances of success are
would be considered a major breakthrough;
50% or
are not likely to have any
Yet to be finalized are guidelines for interpreting
Kingwest hopes to avoid
a drug that would ultimately prove to be marmaking ejther of two errors: (1)
ketable and (2)
development dollars on a
whose eHeCI1v(~nc:ss.
in the long run, would be
or
As a tentative
rule," the project manager
suggests that unless 16 or more of the monkeys show
research on PM-I7
should be
a. WlJ<tL are the dl<tm;es th<tt the "sixteen or more" rule will cause the company to
PM-I7, even if the drug is 60% effective?
b.
often will the
or
rule allow a50%-effective drug to be perceived
as a major breakthrough?
(a) Each of the monkeys is one of n = 30 independent trials, where the outcome is either a "success" (monkey's immune system is strengthened) or a "failure" (monkey's immune system is not strengthened). By assumption, the probability that PM-17 produces an immunological improvement in any given monkey is p = P(success) = 0.60.
By Theorem 3.2.1, the probability that exactly k monkeys (out of thirty) will show improvement after six weeks is $\binom{30}{k}(0.60)^k(0.40)^{30-k}$. The probability, then, that the "sixteen or more" rule will cause a 60%-effective drug to be discarded is the sum of "binomial" probabilities for k values ranging from 0 to 15:

$$P(\text{60\%-effective drug fails "sixteen or more" rule}) = \sum_{k=0}^{15} \binom{30}{k}(0.60)^k(0.40)^{30-k} = 0.1754$$

Roughly 18% of the time, in other words, a "breakthrough" drug such as PM-17 will produce test results so mediocre (as measured by the "sixteen or more" rule) that the company will be misled into thinking it has no potential.
(b) The other error Kingwest can make is to conclude that PM-17 warrants further study when, in fact, its value for p is below a marketable level. The chance that particular incorrect inference will be drawn here is the probability that the number of successes will be greater than or equal to sixteen when p = 0.5. That is,

$$P(\text{50\%-effective PM-17 appears to be marketable}) = P(\text{sixteen or more successes occur}) = \sum_{k=16}^{30} \binom{30}{k}(0.5)^k(0.5)^{30-k} = 0.43$$

Thus, even if PM-17's success rate is an unacceptably low 50%, it has a 43% chance of performing sufficiently well in thirty trials to satisfy the "sixteen or more" criterion.
Comment. Evaluating binomial summations can be tedious, even with a calculator. Statistical software packages offer a convenient alternative. Appendix 3.A.1 describes how one such program, MINITAB, can be used to answer the sorts of questions posed in Example 3.2.2.
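Any general-purpose language can play the same role as a statistics package here. As a hedged illustration (this is not the MINITAB procedure of the appendix), the two sums in Example 3.2.2 might be evaluated in Python as follows; `binomial_pmf` and `binomial_cdf` are helper names assumed for this sketch.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k, n, p):
    """P(k or fewer successes in n independent trials)."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1))

# (a) a 60%-effective drug yields 15 or fewer improvements among 30 monkeys
p_reject_good = binomial_cdf(15, 30, 0.60)

# (b) a 50%-effective drug yields 16 or more improvements among 30 monkeys
p_accept_poor = 1 - binomial_cdf(15, 30, 0.50)

print(f"(a) {p_reject_good:.4f}   (b) {p_accept_poor:.4f}")
```

Part (b) is computed as a complement, which avoids writing a second summation from 16 to 30.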
EXAMPLE 3.2.3
The Stanley Cup playoff in professional hockey is a seven-game series, where the first team to win four games is declared the champion. The series, then, can last anywhere from four to seven games (just like the World Series in baseball). Calculate the likelihoods that the series will last four, five, six, and seven games. Assume that (1) each game is an independent event and (2) the two teams are evenly matched.
Consider the case where Team A wins the series in six games. For that to happen, they must win exactly three of the first five games and they must win the sixth game. Because of the independence assumption, we can write

$$P(\text{Team A wins in six games}) = P(\text{Team A wins three of first five}) \cdot P(\text{Team A wins sixth}) = \left[\binom{5}{3}(0.5)^3(0.5)^2\right] \cdot (0.5) = 0.15625$$

Since the probability that Team B wins the series in six games is the same (why?),

$$P(\text{series ends in six games}) = P(\text{Team A wins in six games} \cup \text{Team B wins in six games})$$
$$= P(\text{A wins in six}) + P(\text{B wins in six}) \quad \text{(why?)}$$
$$= 0.15625 + 0.15625 = 0.3125$$
A similar argument allows us to calculate the probabilities of four-, five-, and seven-game series:

$$P(\text{four game series}) = 2(0.5)^4 = 0.125$$
$$P(\text{five game series}) = 2\left[\binom{4}{3}(0.5)^3(0.5)\right](0.5) = 0.25$$
$$P(\text{seven game series}) = 2\left[\binom{6}{3}(0.5)^3(0.5)^3\right](0.5) = 0.3125$$
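The four calculations follow one pattern: the eventual winner takes exactly three of the first g - 1 games and then wins game g. A small sketch of that pattern is given below; the function `series_length_prob` and its parameters are assumptions made for illustration, and allowing p to differ from 0.5 goes slightly beyond the evenly-matched case worked in the example.

```python
from math import comb

def series_length_prob(games, p=0.5, wins_needed=4):
    """P(a first-to-`wins_needed` series lasts exactly `games` games).

    p is the probability that Team A wins any single game; games are
    assumed independent, as in the example.
    """
    if not wins_needed <= games <= 2 * wins_needed - 1:
        return 0.0
    q = 1 - p
    losses = games - wins_needed           # games won by the eventual loser
    ways = comb(games - 1, wins_needed - 1)  # placements of the winner's earlier wins
    # Either Team A or Team B is the eventual winner.
    return ways * (p**wins_needed * q**losses + q**wins_needed * p**losses)

probs = {g: series_length_prob(g) for g in range(4, 8)}
print(probs)
```

Changing p away from 0.5 gives a quick way to explore how mismatched teams would shift probability toward shorter series.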
Having calculated the "theoretical" probabilities associated with the possible lengths of a Stanley Cup playoff raises an obvious question: How do those likelihoods compare with the actual distribution of playoff lengths? For a recent fifty-nine year period, Column 2 in Table 3.2.3 shows the proportion of playoffs that lasted 4, 5, 6, and 7 games, respectively. Clearly, the agreement between the entries in Columns two and three is not very good. Particularly noticeable is the excess of short playoffs (four games) and the deficit of long playoffs (seven games). What this "lack of fit" suggests is that one or more of the binomial distribution assumptions is not satisfied. Consider, for example, the parameter p, which we assumed to equal 1/2. In reality, its value might be something quite different-just because the teams playing for the championship won their respective divisions, it does not necessarily follow that the two are equally good. Indeed, if the two contending teams were frequently mismatched, the consequence would be an increase in the number of short playoffs and a decrease in the number of long playoffs. It may also be the case that momentum is a factor in a team's chances of winning a given game. If so, the independence assumption implicit in the binomial model is rendered invalid.

TABLE 3.2.3

Series Length   Observed Proportion   Theoretical Probability
4               19/59 = 0.322         0.125
5               15/59 = 0.254         0.250
6               15/59 = 0.254         0.3125
7               10/59 = 0.169         0.3125
EXAMPLE 3.2.4
Doomsday Airlines ("Come Take the Flight of Your Life") has two aircraft-a dilapidated two-engine prop plane and an equally outdated and under-maintained four-engine prop plane. Each plane will land safely only if at least half its engines are working properly. Given that you wish to remain among the living, under what conditions would you opt to fly on the two-engine plane? Assume that each engine on each plane has the same probability p of failing and that such failures are independent events.
For the two-engine plane,

$$P(\text{flight lands safely}) = P(\text{one or more engines work properly}) = \sum_{k=1}^{2} \binom{2}{k}(1-p)^k p^{2-k} \qquad (3.2.1)$$

For the four-engine plane,

$$P(\text{flight lands safely}) = P(\text{two or more engines work properly}) = \sum_{k=2}^{4} \binom{4}{k}(1-p)^k p^{4-k} \qquad (3.2.2)$$
When to opt for the two-engine plane, then, reduces to an algebra problem: We look for the values of p for which

$$\sum_{k=1}^{2} \binom{2}{k}(1-p)^k p^{2-k} > \sum_{k=2}^{4} \binom{4}{k}(1-p)^k p^{4-k}$$

or, equivalently,

$$\binom{4}{0}(1-p)^0 p^4 + \binom{4}{1}(1-p)^1 p^3 > \binom{2}{0}(1-p)^0 p^2$$

Simplifying the inequality gives

$$(3p - 1)(p - 1) < 0 \qquad (3.2.3)$$

Now, (p - 1) is never positive, so Inequality 3.2.3 will be true only when (3p - 1) > 0, which gives p > 1/3 as the desired solution set. Figure 3.2.1 shows the two "safe return" probabilities as a function of p.

FIGURE 3.2.1 ("safe return" probabilities for the 2-engine plane and the 4-engine plane, plotted against p = P(engine fails))
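The algebraic conclusion can be checked numerically by evaluating the two "safe return" probabilities on either side of p = 1/3. The sketch below is illustrative only; `p_safe` is an assumed helper name, and the test values of p are arbitrary.

```python
from math import comb

def p_safe(engines, p_fail):
    """P(at least half of the engines work), engines failing independently."""
    needed = (engines + 1) // 2  # at least half must work (1 of 2, 2 of 4)
    return sum(
        comb(engines, k) * (1 - p_fail)**k * p_fail**(engines - k)
        for k in range(needed, engines + 1)
    )

for p in (0.2, 1 / 3, 0.5):
    print(f"p = {p:.3f}: 2-engine {p_safe(2, p):.4f}, 4-engine {p_safe(4, p):.4f}")
```

Below 1/3 the four-engine plane comes out ahead, at 1/3 the two probabilities coincide, and above 1/3 the two-engine plane is the better choice, which matches Inequality 3.2.3.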
QUESTIONS
3.2.1. An investment analyst has tracked a certain stock for the past six months and found that on any given day it either goes up a point or down a point. Furthermore, it went up on 25% of the days and down on 75%. What is the probability that at the close of trading four days from now the price of the stock will be the same as it is today? Assume that the daily fluctuations are independent events.
3.2.2. In a nuclear reactor, the fission process is controlled by inserting special rods into the radioactive core to absorb neutrons and slow down the nuclear chain reaction. When functioning properly, these rods serve as a first-line defense against a core meltdown. Suppose a reactor has 10 control rods, each operating independently and each having a 0.80 probability of being properly inserted in the event of an "incident". Furthermore, suppose that a meltdown will be prevented if at least half the rods perform satisfactorily. What is the probability that, upon demand, the system will fail?
3.2.3. Suppose that since the early 1950s some 10,000 independent UFO sightings have been reported to civil authorities. If the probability that any sighting is genuine is on the order of 1 in 100,000, what is the probability that at least 1 of the 10,000 was genuine?
3.2.4. The probability that a circuit board
off an assembly line
Suppose that 12 boards are tested.
(a) What is the probability that 4 will need rework?
(b) What is the probability that at least one needs rework?
3.2.5. A manufacturer has 10 machines that die cut cardboard boxes. The probability that, on a given day, any one of the machines will be out of service for repair or maintenance is 0.05. If the day's production requires the availability of at least seven of the machines, what is the probability the work gets done?
3.2.6. Two lighting systems are being proposed for an employee work area. One requires 50 bulbs, each having a probability of 0.05 of burning out within a month's time. The second has 100 bulbs, each with a 0.02 burnout probability. Whichever system is installed will be inspected once a month for the purpose of replacing burned-out bulbs. Which system is likely to require less maintenance? Answer the question by comparing the probabilities that each will require at least one bulb to be replaced at the end of 30 days.
3.2.7. The great English diarist Samuel Pepys asked his friend Sir Isaac Newton the following question: Is it more likely to get at least one 6 when 6 dice are rolled, at least two 6's when 12 dice are rolled, or at least three 6's when 18 dice are rolled? After considerable correspondence (see (162)), Newton convinced the skeptical Pepys that the first event is the most likely. Compute the three probabilities.
3.2.8. The gunner on a small assault boat fires missiles at an attacking plane. Each has a 20% chance of being on target. If two or more of the shells find their mark, the plane will crash. At the same time, the pilot of the plane fires 10 air-to-surface rockets, each of which has a 0.05 chance of critically disabling the boat. Would you rather be on the plane or the boat?
3.2.9. If a family has four children, is it more likely they will have two boys and two girls or three of one sex and one of the other? Assume that the probability of a child being a boy is 1/2 and that the births are independent events.
3.2.10. Experience has shown that only 1/3 of all patients having a certain disease will recover if given the standard treatment. A new drug is to be tested on a group of 12 volunteers. If the FDA requires that at least seven of these patients recover before it will license the new drug, what is the probability that the treatment will be discredited even if it has the potential to increase an individual's recovery rate to 1/2?
3.2.11. Transportation to school for a rural county's 76 children is provided by a fleet of four buses. Drivers are chosen on a day-to-day basis and come from a pool of local farmers who have agreed to be "on call". What is the smallest number of drivers that need to be in the pool if the county wants to have at least a probability on any given day that all the buses will run? Assume that each driver has an 80% chance of being available if contacted.
3.2.12. The captain of a Navy gunboat orders a volley of 25 missiles to be fired at random along a 500-foot stretch of shoreline that he hopes to establish as a beachhead. Dug into the beach is a 30-foot-long bunker serving as the enemy's first line of defense. The captain has reason to believe that the bunker will be destroyed if at least three of the missiles are on target. What is the probability of that happening?
3.2.13. A computer has generated seven random numbers over the interval 0 to 1. Is it more likely that (1) exactly three will be in the interval 1/2 to 1 or (2) fewer than three will be greater than 3/4?
3.2.14. Listed in the following table is the length distribution of World Series competition for the years from 1950 to 2002.

World Series Lengths

Number of Games, X   Number of Years
4                    9
5                    8
6                    11
7                    24
Total                52

Assuming that each World Series game is an independent event and that the probability of either team's winning any particular contest is 0.5, find the probability of each series length. How well does the model fit the data? (Compute the "expected" frequencies, that is, multiply the probability of a given length times 52.)
3.2.15. Use the expansion of $(x + y)^n$ (recall the comment in Section 2.6 on page 108) to verify that the binomial probabilities sum to 1; that is,

$$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1$$
3.2.16. Suppose a series of n independent trials can end in one of three possible outcomes. Let $k_1$ and $k_2$ denote the number of trials that result in outcomes 1 and 2, respectively. Let $p_1$ and $p_2$ denote the probabilities associated with outcomes 1 and 2. Generalize Theorem 3.2.1 to deduce a formula for the probability of getting $k_1$ and $k_2$ occurrences of outcomes 1 and 2, respectively.
3.2.17. Repair calls for central air conditioners fall into three general categories: coolant leakage, compressor failure, and electrical malfunction. Experience has shown that the probabilities associated with the three are 0.5, 0.3, and 0.2, respectively. Suppose that a dispatcher has logged in 10 service requests for tomorrow morning. Use the answer to Question 3.2.16 to calculate the probability that 3 of those 10 will involve coolant leakage and 5 will be compressor failures.
The Hypergeometric Distribution
The second "special" distribution that we want to look at formalizes the urn problems that frequented Chapter 2. Our solutions to those earlier problems tended to be enumerations: We listed the entire set of possible samples, and then counted the ones that satisfied the event in question. The inefficiency and redundancy of that approach should be painfully obvious. What we are seeking here is a general formula that can be applied to any and all such problems, much like the expression in Theorem 3.2.1 can handle the full range of questions arising from the binomial model.
Suppose an urn contains r red chips and w white chips, where r + w = N. Imagine drawing n chips from the urn one at a time without replacing any of the chips selected. At each drawing we record the color of the chip removed. The question is, what is the probability that exactly k red chips are included among the n that are removed?
Notice that the experiment just described is similar in some respects to the binomial model, but the method of sampling creates a critical distinction. If each chip drawn was replaced prior to making another selection, then each drawing would be an independent trial, the chances of drawing a red at any trial would be a constant r/N, and the probability that exactly k red chips would ultimately be included in the n selections would be a direct application of Theorem 3.2.1:

$$P(k \text{ reds drawn}) = \binom{n}{k}\left(\frac{r}{N}\right)^k \left(1 - \frac{r}{N}\right)^{n-k}, \qquad k = 0, 1, \ldots, n$$
However, if the chips drawn are not replaced, then the probability of drawing a red on any given attempt is not necessarily r/N: Its value would depend on the colors of the chips selected earlier. Since p = P(red is drawn) = P(success) does not remain constant from drawing to drawing, the binomial model of Theorem 3.2.1 does not apply. Instead, probabilities that arise from the "no replacement" scenario just described are said to follow the hypergeometric distribution.
Theorem 3.2.2. Suppose an urn contains r red chips and w white chips, where r + w = N. If n chips are drawn out at random, without replacement, and if k denotes the number of red chips selected, then

$$P(k \text{ red chips are chosen}) = \frac{\binom{r}{k}\binom{w}{n-k}}{\binom{N}{n}} \qquad (3.2.4)$$

where k varies over all the integers for which $\binom{r}{k}$ and $\binom{w}{n-k}$ are defined. The probabilities appearing on the right-hand side of Equation 3.2.4 are known as the hypergeometric distribution.
Proof. Assume the chips are distinguishable. We need to count the number of elements making up the event of getting k red chips and n - k white chips. The number of ordered ways to select the k red chips, ignoring where they fall among the white chips, is $_rP_k$. Similarly, the number of ordered ways to select the n - k white chips is $_wP_{n-k}$. However, the order in which the red and white chips are interleaved does matter. Each outcome is an n-long ordered sequence of red and white. There are $\binom{n}{k}$ ways to choose where in the sequence the red chips go. Thus, the number of elements in the event of interest is $\binom{n}{k}\,{}_rP_k\,{}_wP_{n-k}$. Now, the total number of ways to choose n elements from N, in order, without replacement is $_NP_n$, so

$$P(k \text{ red chips are chosen}) = \frac{\binom{n}{k}\,{}_rP_k\,{}_wP_{n-k}}{{}_NP_n}$$

This quantity, while correct, is not in the form of the statement of the theorem. To make that conversion, we have to change all of the terms in the expression to factorials:

$$P(k \text{ red chips are chosen}) = \frac{\dfrac{n!}{k!(n-k)!} \cdot \dfrac{r!}{(r-k)!} \cdot \dfrac{w!}{(w-n+k)!}}{\dfrac{N!}{(N-n)!}} = \frac{\dfrac{r!}{k!(r-k)!} \cdot \dfrac{w!}{(n-k)!(w-n+k)!}}{\dfrac{N!}{n!(N-n)!}} = \frac{\binom{r}{k}\binom{w}{n-k}}{\binom{N}{n}} \qquad \square$$
Comment. The appearance of binomial coefficients suggests a model of selecting subsets. Indeed, one can consider the model of selecting a subset of size n simultaneously, where order doesn't matter. In that case, the question remains: what is the probability of getting k red chips and n - k white chips? A moment's reflection will show that the hypergeometric probabilities given in the statement of the theorem also answer that question. So, if our interest is simply counting the number of red and white chips in the sample, the probabilities are the same whether the drawing of the sample is simultaneous, or the chips are drawn in order without replacement.
Comment. The name hypergeometric derives from a series introduced by the Swiss mathematician and physicist, Leonhard Euler:

$$1 + \frac{ab}{c}x + \frac{a(a+1)b(b+1)}{2!\,c(c+1)}x^2 + \frac{a(a+1)(a+2)b(b+1)(b+2)}{3!\,c(c+1)(c+2)}x^3 + \cdots$$

This is an expansion of considerable flexibility: Given appropriate values for a, b, and c, it reduces to many of the standard infinite series used in analysis. In particular, if a is set equal to 1, and b and c are set equal to each other, it reduces to the familiar geometric series,

$$1 + x + x^2 + x^3 + \cdots$$

hence the name hypergeometric. The relationship of the probability function in Theorem 3.2.2 to Euler's series becomes apparent if we set a = -n, b = -r, c = w - n + 1 and multiply the series by $\binom{w}{n} \big/ \binom{N}{n}$. Then the coefficient of $x^k$ will be

$$\frac{\binom{r}{k}\binom{w}{n-k}}{\binom{N}{n}}$$

the value the theorem gives for P(k red chips are chosen).
EXAMPLE 3.2.5
Keno is among the most popular games played in Las Vegas even though it ranks as one of the least "fair" in the sense that the odds are overwhelmingly in favor of the house. (Betting on keno is only a little less foolish than playing a slot machine!) A keno card has eighty numbers, 1 through 80, from which the player selects a sample of size k, where k can be anything from 1 to (see Figure 3.2.2). The "caller" then announces twenty winning numbers, chosen at random from the eighty. If-and how much-the player wins depends on how many of his numbers match the twenty identified by the caller. Suppose that a player bets on a ten-spot ticket. What is his probability of "catching" five numbers?

FIGURE 3.2.2 (keno card)

Consider an urn containing eighty numbers, twenty of which are winners and sixty of which are losers (see Figure 3.2.3). By betting on a ten-spot ticket, the player, in effect, is drawing a sample of size ten from that urn. The probability of "catching" five numbers is the probability that five of the numbers the player has bet on are contained in the set of twenty winning numbers.

FIGURE 3.2.3 (urn with 20 winning #'s and 60 losing #'s; choose 10)

By Theorem 3.2.2 (with r = 20, w = 60, n = 10, N = 80, and k = 5), the player has approximately a 5% chance of catching exactly five winning numbers:

$$P(\text{five winning numbers are selected}) = \frac{\binom{20}{5}\binom{60}{5}}{\binom{80}{10}} = 0.05$$
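Equation 3.2.4 is straightforward to evaluate by machine. The following sketch, which assumes a helper named `hypergeom_pmf` (not a name used in the text), reproduces the ten-spot calculation and also confirms that the probabilities over all possible catches sum to 1.

```python
from math import comb

def hypergeom_pmf(k, r, w, n):
    """P(k red chips among n drawn without replacement from r red and w white)."""
    if k < 0 or k > r or n - k < 0 or n - k > w:
        return 0.0  # outside the range where both binomial coefficients are defined
    return comb(r, k) * comb(w, n - k) / comb(r + w, n)

# Ten-spot keno ticket: 20 winning numbers, 60 losing numbers, 10 chosen.
p_five = hypergeom_pmf(5, r=20, w=60, n=10)
total = sum(hypergeom_pmf(k, 20, 60, 10) for k in range(11))

print(f"P(catch five) = {p_five:.4f}, sum over k = {total:.6f}")
```

The same function can be reused for any of the urn models in this section by changing r, w, and n.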
EXAMPLE 3.2.6
A hung jury is one that is unable to reach a unanimous decision. Suppose that a pool of twenty-five potential jurors is assigned to a murder case where the evidence is so overwhelming against the defendant that twenty-three of the twenty-five would return a guilty verdict. The other two potential jurors would vote to acquit regardless of the facts. What is the probability that a twelve-member panel chosen at random from the pool of twenty-five will be unable to reach a unanimous decision?
Think of the jury pool as an urn containing twenty-five chips, twenty-three of which correspond to jurors who would vote "guilty" and two of which correspond to jurors who would vote "not guilty." If either or both of the jurors who would vote "not guilty" are included in the panel of twelve, the result would be a hung jury. Applying Theorem 3.2.2 (twice) gives 0.74 as the probability that the jury impanelled would not reach a unanimous decision:

$$P(\text{hung jury}) = P(\text{decision is not unanimous}) = \frac{\binom{2}{1}\binom{23}{11}}{\binom{25}{12}} + \frac{\binom{2}{2}\binom{23}{10}}{\binom{25}{12}} = 0.74$$
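As a cross-check on the two-term sum, the same probability can be obtained as the complement of "neither dissenting juror is chosen." The sketch below uses exact rational arithmetic; the helper name `hypergeom_exact` is an assumption for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom_exact(k, r, w, n):
    """Exact hypergeometric probability (Equation 3.2.4) as a Fraction."""
    return Fraction(comb(r, k) * comb(w, n - k), comb(r + w, n))

# r = 2 jurors who would acquit, w = 23 who would convict, n = 12 seats.
p_hung = hypergeom_exact(1, 2, 23, 12) + hypergeom_exact(2, 2, 23, 12)
p_hung_by_complement = 1 - hypergeom_exact(0, 2, 23, 12)

print(p_hung, float(p_hung), p_hung == p_hung_by_complement)
```

Working with fractions shows that the value 0.74 is exact here rather than a rounded figure.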
EXAMPLE 3.2.7
When a bullet is fired it becomes scored with minute striations produced by imperfections in the gun's barrel. Appearing as a series of parallel lines, these striations have long been recognized as a basis for matching a bullet with a gun, since repeated firings of the same weapon will produce bullets having substantially the same configuration of markings. Until recently, deciding how close two patterns had to be before it could be concluded the bullets came from the same weapon was largely subjective. A ballistics expert would simply look at the two bullets under a microscope and make an informed judgment based on past experience. Today, criminologists are beginning to address the problem more quantitatively, partly with the help of the hypergeometric distribution.
Suppose a bullet is recovered from the scene of a crime, along with the suspect's gun. Under a microscope, a grid of m cells, numbered 1 to m, is superimposed over the bullet. If m is chosen large enough so the width of the cells is sufficiently small, each of that evidence bullet's $n_e$ striations will fall into a different cell (see Figure 3.2.4(a)). Then the suspect's gun is fired, yielding a test bullet, which will have a total of $n_t$ striations located in a possibly different set of cells (see Figure 3.2.4(b)). How might we assess the similarities in cell locations for the two striation patterns?

FIGURE 3.2.4 (a) Evidence bullet: striations (total of $n_e$) on a grid of cells numbered 1 to m. (b) Test bullet: striations (total of $n_t$) on the same grid.

As a model for the striation pattern on the evidence bullet, imagine an urn containing m chips, with $n_e$ corresponding to the striation locations. Now, think of the striation pattern on the test bullet as representing a sample of size $n_t$ from the evidence urn. By Theorem 3.2.2, the probability that k of the cell locations will be shared by the two striation patterns is

$$\frac{\binom{n_e}{k}\binom{m - n_e}{n_t - k}}{\binom{m}{n_t}}$$

Suppose the bullet found at a murder scene is superimposed with a grid having m = cells, $n_e$ of which contain striations. The suspect's gun is fired and the bullet is found to have $n_t$ = 3 striations, one of which matches the location of one of the striations on the evidence bullet. What do you think a ballistics expert would conclude?
Intuitively, the similarity between the two bullets would be reflected in the probability that one or more striations in the suspect's bullet matched the evidence bullet. The smaller that probability is, the stronger would be our belief that the two bullets were fired by the same gun. Based on the values given for m, $n_e$, and $n_t$,
144
Chapter 3
Random Variables
If P(one or more matches} had been a
small
O.OOl-the inference would have been dear-.cut: The same gun fired both bullets, But, here with the
probability of one or more matches
so large, we cannot
out the possibility
that the bullets were fired
two different guns (and, presumably, by two different
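The matching probability in this example can be evaluated with a few lines of code. The following is a minimal Python sketch, not part of the original text: the function names `p_shared` and `p_at_least_one_match` are hypothetical, and the grid values in the usage line (m = 25, n_e = 4) are illustrative assumptions only; just n_t = 3 comes from the example.

```python
from math import comb

def p_shared(k, m, n_e, n_t):
    """Hypergeometric probability that exactly k cells are shared
    by the evidence pattern (n_e of m cells) and the test pattern (n_t cells)."""
    if k < 0 or k > min(n_e, n_t) or n_t - k > m - n_e:
        return 0.0
    return comb(n_e, k) * comb(m - n_e, n_t - k) / comb(m, n_t)

def p_at_least_one_match(m, n_e, n_t):
    """P(one or more matches) = 1 - P(no matches)."""
    return 1.0 - p_shared(0, m, n_e, n_t)

# Illustrative values only: m and n_e are assumed, not taken from the text.
print(round(p_at_least_one_match(m=25, n_e=4, n_t=3), 3))
```

Using the complement keeps the calculation to a single term, which is the same shortcut the example relies on.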
EXAMPLE 3.2.8
Wipe Your Feet is a company trying to establish name recognition in a
community consisting of sixty thousand households. The company's management team
estimates that five thousand of those households would do business with the firm if they
were contacted and informed of the services available. With that in mind, the company
has hired a staff of telemarketers to place one thousand calls. Write a formula for the
probability that at least one hundred new customers will be identified.
In principle, this is an urn problem not unlike the previous three examples, except for the
fact that the numbers of "chips" are several powers of ten larger than what we have
encountered up to this point. In the terminology of Theorem 3.2.2, N = 60,000, r = 5,000,
w = 55,000, n = 1000, and

\[ P(\text{telemarketers identify } k \text{ new customers}) = \frac{\binom{5000}{k}\binom{55{,}000}{1000 - k}}{\binom{60{,}000}{1000}}, \quad k = 0, 1, \ldots, 1000 \]

It follows that

\[ P(\text{one hundred or more new customers are identified}) = 1 - P(\text{ninety-nine or fewer new customers are identified}) = 1 - \sum_{k=0}^{99} \frac{\binom{5000}{k}\binom{55{,}000}{1000 - k}}{\binom{60{,}000}{1000}} \quad (3.2.5) \]

Needless to say, evaluating Equation 3.2.5 directly by hand is very difficult because of the
number of terms involved and the large factorials implicit in both the numerator and
denominator. In Chapter 4 we will learn a series of approximations that virtually trivialize
the evaluation.
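Although Equation 3.2.5 is impractical by hand, exact integer arithmetic makes it tractable on a computer. The sketch below is an illustration in Python rather than the method the text goes on to develop; the function name `p_at_least` is hypothetical, and the only inputs are the values N = 60,000, r = 5,000, w = 55,000, and n = 1000 from the example.

```python
from fractions import Fraction
from math import comb

def p_at_least(threshold, r, w, n):
    """P(X >= threshold) for a hypergeometric X: a sample of n drawn without
    replacement from r 'successes' and w 'failures'. Uses exact integers."""
    below = sum(comb(r, k) * comb(w, n - k) for k in range(threshold))
    return 1 - Fraction(below, comb(r + w, n))

# Values from Example 3.2.8: at least 100 new customers in 1000 calls.
p = p_at_least(100, r=5000, w=55000, n=1000)
print(float(p))
```

Summing the numerators first and dividing once avoids both overflow and accumulated rounding error.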
CASE STUDY 3.2.1
Biting into a plump, juicy apple is one of the innocent pleasures of autumn. Critical
to that enjoyment is the firmness of the apple, a property that growers and shippers
monitor closely. The apple industry goes so far as to set a lowest acceptable limit for
firmness, which is measured (in lbs) by inserting a probe into the apple. For the Red
Delicious variety, for example, firmness is supposed to be at least 12 lbs; in the state
of Washington, wholesalers are not allowed to sell apples if more than 10% of their
shipment falls below that 12 lb limit.
All of this raises an obvious question: How can shippers demonstrate that their
apples meet the 10% standard? Testing each one is not an option: the probe that
measures firmness renders an apple unfit for sale. That leaves sampling as the only
viable strategy.
Suppose, for example, a shipper has a supply of 144 apples. She decides to select 15
at random and measure each one's firmness, with the intention of selling the remaining
apples if 2 or fewer in the sample are substandard. What are the consequences of her
plan? More specifically, does it have a good chance of "accepting" a shipment that
meets the 10% rule and a good chance of "rejecting" one that does not? (If either or
both of those objectives are not met, the plan is inappropriate.)
For example, suppose there are actually 10 defective apples among the original
144. Since (10/144) x 100 = 6.9%, that shipment would be suitable for sale because fewer
than 10% failed to meet the firmness standard. The question is, how likely is it that a
sample of 15 chosen at random from that shipment will pass inspection?
Notice, here, that the number of substandard apples in the sample has a hypergeometric
distribution with r = 10, w = 134, n = 15, and N = 144. Therefore,

\[ P(\text{sample passes inspection}) = P(\text{2 or fewer substandard apples are found}) = \frac{\binom{10}{0}\binom{134}{15}}{\binom{144}{15}} + \frac{\binom{10}{1}\binom{134}{14}}{\binom{144}{15}} + \frac{\binom{10}{2}\binom{134}{13}}{\binom{144}{15}} = 0.320 + 0.401 + 0.208 = 0.929 \]

So, the probability is reassuringly high that a supply of apples this good would,
in fact, be judged acceptable to ship. Of course, it follows from this calculation that
roughly 7% of the time, the number of substandard apples found will be greater than
2, in which case the apples would be (incorrectly) assumed to be unsuitable for sale
(earning them an undeserved one-way ticket to the applesauce factory ... )
How good is the proposed sampling plan at recognizing apples that would, in fact,
be inappropriate to ship? Suppose, for example, that 30, or 21%, of the 144 apples
would fall below the 12 lb limit. Ideally, the probability here that a sample passes
inspection should be small. The number of substandard apples found in this case
would be hypergeometric with r = 30, w = 114, n = 15, and N = 144, so

\[ P(\text{sample passes inspection}) = \frac{\binom{30}{0}\binom{114}{15}}{\binom{144}{15}} + \frac{\binom{30}{1}\binom{114}{14}}{\binom{144}{15}} + \frac{\binom{30}{2}\binom{114}{13}}{\binom{144}{15}} = 0.024 + 0.110 + 0.221 = 0.355 \]

Here the bad news is that the sampling plan will allow a 21% defective supply to
be shipped 36% of the time. The good news is that 64% of the time, the number of
substandard apples in the sample will exceed 2, meaning that the correct decision "not
to ship" will be made.
Figure 3.2.5 shows P(sample passes) plotted against the percentage of defectives in
the supply. Graphs of this sort are called operating characteristic (or OC) curves:
They summarize how a sampling plan will respond to all possible levels of quality.

[Figure 3.2.5: P(sample passes), from 0 to 1, plotted against presumed percent defective, from 0 to 100.]
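The operating characteristic curve for this sampling plan can be traced numerically. The sketch below, in Python, is an illustration under the plan stated in the case study (144 apples, sample of 15, accept if 2 or fewer are substandard); the function name `p_accept` and its parameter names are hypothetical.

```python
from math import comb

def p_accept(defectives, lot_size=144, sample_size=15, max_bad=2):
    """P(sample passes): hypergeometric probability that at most max_bad
    substandard apples appear in the sample."""
    good = lot_size - defectives
    total = comb(lot_size, sample_size)
    favorable = sum(
        comb(defectives, k) * comb(good, sample_size - k)
        for k in range(max_bad + 1)
    )
    return favorable / total

# A few points on the OC curve: (percent defective, P(sample passes)).
for r in (0, 10, 30, 60, 90):
    print(f"{100 * r / 144:5.1f}%  {p_accept(r):.3f}")
```

Evaluating `p_accept` for every r from 0 to 144 gives the full curve; the values at r = 10 and r = 30 correspond to the two cases worked out in the case study.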
Comment.
Every sampling plan invariably allows for two kinds of errors: rejecting shipments
that should be accepted and accepting shipments that should be rejected. In practice,
the probabilities of committing these errors can be manipulated by redefining the decision
rule and/or changing the sample size. Some of these options
will be explored later in Chapter 6.
QUESTIONS
3.2.18. A corporate board contains 12 members. The board decides to create a five-person
Committee to Hide Corporation Debt. Suppose four members of the board are
accountants. What is the probability that the Committee will contain two accountants
and three non-accountants?
3.2.19. One of the popular tourist attractions in Alaska is watching black bears catch salmon,
swimming upstream to spawn. Not all "black" bears are black, though-some are
tan-colored. Suppose that six black bears and three tan-colored bears are working the
rapids of a salmon stream. Over the course of an hour,
different bears are sighted.
What is the probability that those
will include at least twice as many black bears as
tan-colored bears?
3.2.20. A city has 4050 children under the age of 10, including 514 who have not been
vaccinated for measles. Sixty-five of the city's children are enrolled in the ABC
Day Care Center. Suppose the municipal health department sends a doctor and a
nurse to ABC to immunize any child who has not already been vaccinated. Find
a formula for the probability that exactly k of the children at ABC have not been
vaccinated.
3.2.21. Country A inadvertently launches 10 guided missiles, 6 armed with nuclear
warheads, at Country B. In response, Country B fires 7 antiballistic missiles, each of
which will destroy exactly one of the incoming rockets. The antiballistic missiles have
no way of detecting, though, which of the 10 rockets are carrying nuclear warheads.
What are the chances that Country B will be hit by at least one nuclear missile?
3.2.22. Anne is studying for a history exam covering the French Revolution that will consist
of five essay questions selected at random from a list of 10 the professor has handed
out to the class in advance. Not exactly a Napoleon buff, Anne would like to avoid
researching all 10 questions but still be reasonably assured of getting a fairly good
grade. Specifically, she wants to have at least an 85% chance of getting at least four of
the five questions right. Will it be sufficient if she studies eight of the 10 questions?
3.2.23. Each year a college awards merit-based scholarships to members of the
freshmen class who have exceptional high school records. The initial pool of applicants
for the upcoming academic year has been reduced to a "short list" of eight men and
ten women, all of whom seem equally deserving. If the awards are made at random
from among the 18 finalists, what are the chances that both men and women will be
represented?
3.2.24. A local lottery is conducted weekly by choosing five chips at random and without
replacement from a population of 40 chips, numbered 1 through 40; order does not
matter. The winning numbers are announced on five successive commercials during the
Monday night broadcast of a televised movie. Suppose the first three winning numbers
match three of yours. What are your chances at that point of winning the lottery?
3.2.25. A display case contains 35 gems, of which 10 are real diamonds and 25 are fake
diamonds. A burglar removes four gems at random, one at a time and without
replacement. What is the probability that the last gem she steals is the second real
diamond in the set of four?
3.2.26. A bleary-eyed student awakens one morning, late for an 8:00 class, and pulls two socks
out of a drawer that contains two black, six brown, and two blue socks, all randomly
arranged. What is the probability that the two he draws are a matched pair?
3.2.27. Show directly that the set of probabilities associated with the hypergeometric
distribution sum to 1. Hint: Expand the identity

\[ (1 + \mu)^N = (1 + \mu)^r (1 + \mu)^{N - r} \]

and equate coefficients.
3.2.28. Urn I contains five red chips and four white chips; Urn II contains four red and five
white chips. Two chips are drawn simultaneously from Urn I and placed in Urn II.
Then a chip is drawn from Urn II. What is the probability that the chip drawn
from Urn II is white? Hint: Use Theorem 2.4.1.
3.2.29. As the owner of a chain of sporting goods stores, you have just been offered a "deal"
on a shipment of 100 robot table tennis machines. The price is right, but the prospect
of picking up the merchandise at midnight from an unmarked van parked on the side
of the New Jersey Turnpike is a bit disconcerting. Being of low repute yourself, you
do not consider the legality of the transaction to be an issue, but you do have concerns
about being cheated. If too many of the machines are in poor working order, the
offer ceases to be a bargain. Suppose you decide to close the deal only if a sample
of 10 machines contains no more than one defective. Construct the corresponding
operating characteristic curve. For approximately what incoming quality will you
accept a shipment 50% of the time?
3.2.30. Suppose that r of N chips are red. Divide the chips into three groups of sizes
$n_1$, $n_2$, and $n_3$, where $n_1 + n_2 + n_3 = N$. Generalize the hypergeometric distribution to find
the probability that the first group contains $r_1$ red chips, the second group $r_2$ red chips,
and the third group $r_3$ red chips, where $r_1 + r_2 + r_3 = r$.
3.2.31. Some nomadic tribes, when faced with a life-threatening contagious disease, will try to
improve their chances of survival by dispersing into smaller groups. Suppose a tribe of
21 people, of whom four are carriers of the disease, split into three groups of 7 each.
What is the probability that at least one group is free of the disease? Hint: Find the
probability of the complement.
3.2.32. Suppose a population contains $n_1$ objects of one kind, $n_2$ objects of a second kind, ...,
and $n_t$ objects of a tth kind, where $n_1 + n_2 + \cdots + n_t = N$. A sample of size n is
drawn at random and without replacement. Deduce an expression for the probability
of drawing $k_1$ objects of the first kind, $k_2$ objects of the second kind, ..., and $k_t$ objects
of the tth kind by generalizing Theorem 3.2.2.
four sophomores, four juniors, and
applied for membership in their school's Communications
a group that oversees the
newspaper, literary magazine, and radio show. Eight positions are
open. If the selection is done at random, what is the probability that each class
gets two representatives? (Hint: Use the generalized hypergeometric model asked for in
Question 3.2.32.)
3.3 DISCRETE RANDOM VARIABLES
The binomial and hypergeometric distributions described in Section 3.2 are special cases
of some important general concepts that we want to explore more fully in this section.
Previously in Chapter 2, we studied in depth the situation where every point in a
sample space is equally likely to occur (recall Section 2.6). The sample space of independent
trials that ultimately led to the binomial distribution presented a quite different scenario:
individual points in S had different probabilities. For example, if n = 4 and p = 1/3,
the probabilities assigned to the sample points (s, f, s, f) and (f, f, f, f) are
$(1/3)^2(2/3)^2 = 4/81$ and $(2/3)^4 = 16/81$, respectively. Allowing for the possibility that different
outcomes may have different probabilities will obviously broaden enormously the range
of real-world problems that probability models can address.
How to assign probabilities to outcomes that are not binomial or hypergeometric is one
of the questions investigated in this chapter. A second issue is the nature of
the sample space itself and whether it makes sense to redefine the outcomes and create,
in effect, an alternative sample space. Why we would want to do that has already come
up in our discussion of independent trials. The "original" sample space in such cases is
a set of ordered sequences, where the ith member of a sequence is either an "s" or an
"f," depending on whether the ith trial ended in either success or failure, respectively.
However, knowing which particular trials ended in success is typically less important than
knowing the number that did (recall the discussion on page 129). That being the
case, it often makes sense to replace each ordered sequence with the number of successes
that sequence contains. Doing so collapses the original set of ordered sequences (i.e.,
outcomes) in S to the set of n + 1 integers ranging from 0 to n. The probabilities assigned
to those integers, of course, are given by the binomial formula in Theorem 3.2.1.
In general, a function that assigns numbers to outcomes is called a random variable.
The purpose of such functions in practice is to define a new sample space whose outcomes
speak more directly to the objectives of the experiment. That was the rationale that
ultimately motivated both the binomial and hypergeometric distributions.
The purpose of this section is to (1) outline the general conditions under which
probabilities can be assigned to sample spaces and (2) explore the ways and means of
redefining sample spaces through the use of random variables. The notation introduced
in this section is especially important and will be used throughout the remainder of the
book.
Assigning Probabilities: The Discrete Case
We begin with the general problem of assigning probabilities to sample outcomes, the
simplest version of which occurs when the number of points in S is either finite or
countably infinite. The probability functions, p(s), that we are looking for in those cases
satisfy the conditions in Definition 3.3.1.
Definition 3.3.1. Suppose that S is a finite or countably infinite sample space. Let p be
a real-valued function defined for each element of S such that
a. $0 \le p(s)$ for each $s \in S$
b. $\sum_{\text{all } s \in S} p(s) = 1$
Then p is said to be a discrete probability function.
Comment. Once p(s) is defined for all s, it follows that the probability of any event A
(that is, P(A)) is the sum of the probabilities associated with the outcomes comprising A:

\[ P(A) = \sum_{\text{all } s \in A} p(s) \quad (3.3.1) \]

Defined in this way, the function P(A) satisfies the probability axioms given in Section 2.3.
The next several examples illustrate some of the specific forms that p(s) can have and how
P(A) is calculated.
EXAMPLE 3.3.1
Ace-six flats are a type of crooked dice where the cube is foreshortened in the one-six
direction, the effect being that 1s and 6s are more likely to occur than any of
the other four faces. Let p(s) denote the probability that the face showing is s. For
many ace-six flats, the "cube" is asymmetric to the extent that p(1) = p(6) = 1/4, while
p(2) = p(3) = p(4) = p(5) = 1/8. Notice that p(s) here qualifies as a discrete probability
function because each p(s) is greater than or equal to 0 and the sum of p(s), over all s, is
1 (= 2(1/4) + 4(1/8)).
Suppose A is the event that an even number occurs. It follows from Equation 3.3.1 that
P(A) = p(2) + p(4) + p(6) = 1/8 + 1/8 + 1/4 = 1/2.
Comment. If two ace-six flats are rolled, the probability of getting a sum equal to
seven is $2p(1)p(6) + 2p(2)p(5) + 2p(3)p(4) = 2(1/4)^2 + 4(1/8)^2 = 3/16$. If two
fair dice are rolled, the probability of getting a sum equal to seven is $2p(1)p(6) +
2p(2)p(5) + 2p(3)p(4) = 6(1/6)^2 = 1/6$, which is less than 3/16. Gamblers cheat with ace-six
flats by switching back and forth between fair dice and ace-six flats, depending on whether
or not they want a sum of seven to be rolled.
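The two conditions of Definition 3.3.1 and the event probabilities in this example can be checked mechanically. The following Python sketch is illustrative only; it assumes the ace-six flat probabilities p(1) = p(6) = 1/4 and p(2) = ... = p(5) = 1/8, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed probability function for an ace-six flat, and for a fair die.
flat = {1: Fraction(1, 4), 6: Fraction(1, 4),
        2: Fraction(1, 8), 3: Fraction(1, 8),
        4: Fraction(1, 8), 5: Fraction(1, 8)}
fair = {face: Fraction(1, 6) for face in range(1, 7)}

def is_discrete_probability_function(p):
    """Conditions (a) and (b) of Definition 3.3.1 for a finite sample space."""
    return all(v >= 0 for v in p.values()) and sum(p.values()) == 1

def prob_event(p, event):
    """P(A) as the sum of p(s) over the outcomes s in A (Equation 3.3.1)."""
    return sum(p[s] for s in event)

def prob_sum_seven(p):
    """P(sum = 7) when two independent dice with probability function p are rolled."""
    return sum(p[i] * p[j] for i, j in product(p, repeat=2) if i + j == 7)

print(prob_event(flat, {2, 4, 6}), prob_sum_seven(flat), prob_sum_seven(fair))
```

Exact fractions are used so the comparison between the crooked and fair dice involves no rounding.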
EXAMPLE 3.3.2
Suppose a fair coin is tossed until a head comes up for the first time. What are the chances
of that happening on an odd-numbered toss?
Note that the sample space here is countably infinite and so is the set of outcomes
making up the event whose probability we are trying to find. The P(A) that we are looking
for, then, will be the sum of an infinite number of terms.
Let p(s) be the probability that the first head appears on the sth toss. Since the coin is
presumed to be fair, p(1) = 1/2. Furthermore, we would expect that half the time, when a tail
appears, the next toss would be a head, so p(2) = (1/2)(1/2) = 1/4. In general, $p(s) = (1/2)^s$,
s = 1, 2, ....
Does p(s) satisfy the conditions stated in Definition 3.3.1? Yes. Clearly, $p(s) \ge 0$ for
all s. To see that the sum of the probabilities is 1, recall the formula for the sum of a
geometric series: If 0 < r < 1,

\[ \sum_{s=0}^{\infty} r^s = \frac{1}{1 - r} \quad (3.3.2) \]

Applying Equation 3.3.2 to the sample space here confirms that P(S) = 1:

\[ P(S) = \sum_{s=1}^{\infty} p(s) = \sum_{s=1}^{\infty} \left(\frac{1}{2}\right)^s = \sum_{s=0}^{\infty} \left(\frac{1}{2}\right)^s - \left(\frac{1}{2}\right)^0 = \frac{1}{1 - \frac{1}{2}} - 1 = 1 \]

Now, let A be the event that the first head appears on an odd-numbered toss. Then
$P(A) = p(1) + p(3) + p(5) + \cdots$. But

\[ p(1) + p(3) + p(5) + \cdots = \sum_{s=0}^{\infty} p(2s + 1) = \sum_{s=0}^{\infty} \left(\frac{1}{2}\right)^{2s+1} = \frac{1}{2} \sum_{s=0}^{\infty} \left(\frac{1}{4}\right)^s = \frac{1}{2} \cdot \frac{1}{1 - \frac{1}{4}} = \frac{2}{3} \]
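The two infinite sums in this example can be approximated by partial sums. The sketch below is a small Python illustration, assuming $p(s) = (1/2)^s$ as in the example; the function name `partial_sum` and the cutoff of 60 terms are arbitrary choices.

```python
from fractions import Fraction

def p(s):
    """Probability that the first head appears on toss s (fair coin)."""
    return Fraction(1, 2) ** s

def partial_sum(terms, odd_only=False):
    """Sum p(s) for s = 1..terms, optionally over odd s only."""
    step = 2 if odd_only else 1
    return sum(p(s) for s in range(1, terms + 1, step))

print(float(partial_sum(60)))                 # approaches P(S)
print(float(partial_sum(60, odd_only=True)))  # approaches P(A)
```

Because the terms shrink geometrically, a few dozen terms already agree with the closed-form limits to machine precision.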
CASE STUDY 3.3.1
For good pedagogical reasons, the principles of probability are always introduced by
considering events defined on familiar sample spaces generated by simple experiments.
To that end, we toss coins, deal cards, roll dice, and draw chips from urns. It would
be a serious error, though, to infer that the importance of probability extends no
further than the nearest casino. In its infancy, gambling and probability were, indeed,
intimately related: Questions arising from games of chance were often the catalyst that
motivated mathematicians to study random phenomena in earnest. But more than
340 years have passed since Huygens published De Ratiociniis. Today, the application
of probability to gambling is relatively insignificant (the NCAA March basketball
tournament notwithstanding) compared to the depth and breadth of uses the subject
finds in fields such as medicine and science.
Probability functions, properly chosen, can "model" complex real-world phenomena
every bit as well as P(heads) = 1/2 describes the behavior of a fair coin.
The following set of actuarial data is a case in point. Over a period of three years
(= 1096 days) in London, records showed that a total of 903 deaths occurred among
males eighty-five years of age and older (188). Columns one and two of Table 3.3.1
give the breakdown of those 903 deaths according to the number occurring on a given
day. Column three gives the proportion of days for which exactly s elderly men died.
TABLE 3.3.1
(1)                    (2)               (3)                  (4)
Number of Deaths, s    Number of Days    Proportion           p(s)
                                         [= Col. (2)/1096]
0                      484               0.442                0.440
1                      391               0.357                0.361
2                      164               0.150                0.148
3                       45               0.041                0.040
4                       11               0.010                0.008
5                        1               0.001                0.003
6+                       0               0.000                0.000
Total                 1096               1                    1
For reasons that will be gone into at length in Chapter 4, the probability function that
describes the behavior of this particular phenomenon is

\[ p(s) = P(s \text{ elderly men die on a given day}) = \frac{e^{-0.82}(0.82)^s}{s!}, \quad s = 0, 1, 2, \ldots \quad (3.3.3) \]

How do we know that the p(s) in Equation 3.3.3 is an appropriate way to assign
probabilities to the "experiment" of elderly men dying? Because it accurately predicts
what happened. Column four of Table 3.3.1 shows p(s) evaluated for s = 0, 1, 2, .... To
two decimal places, the agreement between the entries in Column three and Column four
is perfect.
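The comparison between Columns three and four of Table 3.3.1 can be reproduced in a few lines. The Python sketch below is an illustration, not the text's own computation: it assumes that Equation 3.3.3 has the Poisson form with rate 0.82 (903 deaths over 1096 days, rounded to two decimals), an assumption that matches the leading entries of Column four.

```python
from math import exp, factorial

RATE = 0.82  # assumed rate: 903 deaths / 1096 days, rounded to two decimals

def p(s, rate=RATE):
    """Assumed Poisson form for Equation 3.3.3."""
    return exp(-rate) * rate**s / factorial(s)

days = [484, 391, 164, 45, 11, 1]  # Column (2) of Table 3.3.1, s = 0..5
for s, d in enumerate(days):
    # observed proportion (Column 3) next to the model value (Column 4)
    print(s, round(d / 1096, 3), round(p(s), 3))
```

Printing the observed proportions next to the model values makes the closeness of the fit easy to inspect.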
EXAMPLE 3.3.3
Consider the following experiment: Every day for the next month you copy down each
number that appears in the stories on the front pages of your hometown newspaper. Those
numbers would necessarily be extremely diverse: One might be the age of a celebrity who
just died, another might report the interest rate currently paid on government Treasury
bills, and still another might give the number of square feet of retail space recently added
to a local shopping mall.
Suppose you then calculated the proportion of those numbers whose leading digit was
a 1, the proportion whose leading digit was a 2, and so on. What relationship would you
expect those proportions to have? Would numbers starting with a 2, for example, occur
as often as numbers starting with a 6?
Let p(s) denote the probability that the first significant digit of a "newspaper number"
is s, s = 1, 2, ..., 9. Our intuition is likely to tell us that the nine first digits should
be equally probable, that is, p(1) = p(2) = ... = p(9) = 1/9. Given the diversity and
the randomness of the numbers, there is no obvious reason why one digit should be more
common than another. Our intuition, though, would be wrong: first digits are not equally
likely. Indeed, they are not even close to being equally likely!
Credit for making this remarkable discovery goes to Simon Newcomb, a mathematician
who observed more than a hundred years ago that some portions of logarithm tables are
used more than others (77). Specifically, pages at the beginning of such tables are more
dog-eared than pages at the end, suggesting that users had more occasion to look up logs
of numbers starting with small digits than they did numbers starting with large digits.
Almost fifty years later, a physicist, Frank Benford, reexamined Newcomb's claim in
more detail and looked for a mathematical explanation. What is now known as Benford's
law asserts that the first digits of many different types of measurements, or combinations
of measurements, often follow the discrete probability model

\[ p(s) = P(\text{1st significant digit is } s) = \log_{10}\left(1 + \frac{1}{s}\right), \quad s = 1, 2, \ldots, 9 \]

Table 3.3.2 compares Benford's law to the uniform assumption that p(s) = 1/9 for all s.
The differences are striking. According to Benford's law, for example, 1s are the most
frequently occurring first digit, appearing 6.5 times (= 0.301/0.046) as often as 9s.
TABLE 3.3.2
s    "Uniform" Law    Benford's Law
1    0.111            0.301
2    0.111            0.176
3    0.111            0.125
4    0.111            0.097
5    0.111            0.079
6    0.111            0.067
7    0.111            0.058
8    0.111            0.051
9    0.111            0.046
Comment. A key to why Benford's law is true is the difference in proportional
changes associated with each leading digit. To go from one thousand to two thousand,
for example, represents a 100% increase; to go from eight thousand to nine thousand, on the
other hand, is only a 12.5% increase. That would suggest that evolutionary phenomena
such as stock prices would be more likely to start with 1s and 2s than with 8s and 9s, and
they are. Still, the precise conditions under which $p(s) = \log_{10}(1 + 1/s)$, s = 1, 2, ..., 9 are
not fully understood and remain a topic of research.
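The Benford column of Table 3.3.2 follows directly from the formula $p(s) = \log_{10}(1 + 1/s)$. The short Python sketch below regenerates it and confirms that the nine probabilities sum to 1; the names `benford` and `table` are hypothetical.

```python
from math import log10

def benford(s):
    """Benford's law: probability that the first significant digit is s."""
    return log10(1 + 1 / s)

table = {s: benford(s) for s in range(1, 10)}

for s, prob in table.items():
    # digit, uniform probability, Benford probability
    print(s, round(1 / 9, 3), round(prob, 3))

print(round(sum(table.values()), 12))  # the probabilities telescope to log10(10)
```

The sum is exactly $\log_{10}(10)$ because the product of the terms (s + 1)/s for s = 1, ..., 9 telescopes to 10.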
EXAMPLE 3.3.4
Is

\[ p(s) = \frac{1}{1 + \lambda}\left(\frac{\lambda}{1 + \lambda}\right)^s, \quad s = 0, 1, 2, \ldots; \quad \lambda > 0 \]

a discrete probability function? Why or why not?
To qualify as a discrete probability function, a given p(s) needs to satisfy Parts (a) and
(b) of Definition 3.3.1. A simple inspection shows that Part (a) is satisfied. Since $\lambda > 0$,
p(s) is, in fact, greater than or equal to 0 for all s = 0, 1, 2, .... Part (b) is satisfied if the
sum of all the probabilities defined on the outcomes in S is 1. But

\[ \sum_{\text{all } s \in S} p(s) = \sum_{s=0}^{\infty} \frac{1}{1 + \lambda}\left(\frac{\lambda}{1 + \lambda}\right)^s = \frac{1}{1 + \lambda} \cdot \frac{1}{1 - \frac{\lambda}{1 + \lambda}} \quad \text{(why?)} \quad = \frac{1}{1 + \lambda} \cdot \frac{1 + \lambda}{1} = 1 \]

The answer, then, is "yes": $p(s) = \frac{1}{1 + \lambda}\left(\frac{\lambda}{1 + \lambda}\right)^s$, s = 0, 1, 2, ...; $\lambda > 0$ does
qualify as a discrete probability function. Of course, whether it has any practical value
depends on whether the set of values for p(s) actually do describe the behavior of
real-world phenomena.
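Part (b) of Definition 3.3.1 can also be checked numerically for particular values of the parameter. The Python sketch below is illustrative; the function names and the cutoff of 2000 terms are arbitrary choices, and a numerical check of a few parameter values is of course not a proof.

```python
def p(s, lam):
    """p(s) = (1 / (1 + lam)) * (lam / (1 + lam))**s for s = 0, 1, 2, ..."""
    return (1 / (1 + lam)) * (lam / (1 + lam)) ** s

def partial_total(lam, terms=2000):
    """Sum of p(s) over s = 0..terms-1; approaches 1 for any lam > 0."""
    return sum(p(s, lam) for s in range(terms))

for lam in (0.5, 1.0, 3.0):
    print(lam, partial_total(lam))
```

The ratio lam / (1 + lam) is always strictly between 0 and 1 when lam > 0, which is why the geometric series formula applies.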
Defining "New" Sample Spaces
We have seen how the function p(s) associates a probability with each outcome, s, in a
sample space. Related is the key idea that outcomes can often be grouped or reconfigured
in ways that may facilitate problem-solving. Recall the sample space associated with a
series of n independent trials, where each s is an ordered sequence of successes and failures.
The most relevant information in such outcomes is often the number of successes
that occur, not a detailed listing of which trials ended in success and which ended in
failure. That being the case, it makes sense to define a "new" sample space by grouping
the original outcomes according to the number of successes they contained. The outcome
(f, f, ..., f), for example, had 0 successes. On the other hand, there were n outcomes
that yielded 1 success: (s, f, f, ..., f), (f, s, f, ..., f), ..., and (f, f, ..., s). As we
saw earlier in this chapter, that particular regrouping of outcomes ultimately led to the
binomial distribution.
The function that replaces the outcome (s, f, f, ..., f) with the numerical value 1
is called a random variable. We conclude this section with a discussion of some of the
concepts, terminology, and applications associated with random variables.
Definition 3.3.2. A function whose domain is a sample space S and whose values form
a finite or countably infinite set of real numbers is called a discrete random variable.
We denote random variables by upper case letters, often X or Y.
EXAMPLE 3.3.5
Consider tossing two dice, an experiment for which the sample space is a set of ordered
pairs, S = {(i, j) | i = 1, 2, ..., 6; j = 1, 2, ..., 6}. For a variety of games ranging from
Monopoly to craps, the sum of the numbers showing is what matters on a given turn. That
being the case, the original sample space S of thirty-six ordered pairs would not provide
a particularly convenient backdrop for discussing the rules of those games. It would be
better to work directly with the sums. Of course, the eleven possible sums (from two to
twelve) are simply the different values of the random variable X, where X(i, j) = i + j.
Comment. In the above example, suppose we define a random variable $X_1$ that gives
the result on the first die and $X_2$ that gives the result on the second die. Then $X = X_1 + X_2$.
Note how easily we could extend this idea to the toss of three dice, or ten dice. The ability
to conveniently express complex events in terms of simple ones is an advantage of the
random variable concept that we will see playing out over and over again.
The Probability Density Function
We began this section discussing the function p(s), which assigns a probability to each
outcome s in S. Now, having introduced the notion of a random variable X as a real-valued
function defined on S (that is, X(s) = k), we need to find a mapping analogous to p(s)
that assigns probabilities to the different values of k.
Definition 3.3.3. Associated with every discrete random variable X is a probability
density function (or pdf), denoted $p_X(k)$, where

\[ p_X(k) = P(\{s \in S \mid X(s) = k\}) \]

Note that $p_X(k) = 0$ for any k not in the range of X. For notational simplicity, we will
usually delete all references to s and S and write $p_X(k) = P(X = k)$.
Comment. We have already discussed at length two examples of the function $p_X(k)$.
Recall the binomial distribution derived in Section 3.2. If we let the random variable X
denote the number of successes in n independent trials, then Theorem 3.2.1 states that

\[ P(X = k) = p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n \]

A similar result was given in that same section in connection with the hypergeometric
distribution. If a sample of size n is drawn without replacement from an urn containing r
red chips and w white chips, and if we let the random variable X denote the number of
red chips included in the sample, then (according to Theorem 3.2.2)

\[ P(X = k) = p_X(k) = \frac{\binom{r}{k}\binom{w}{n - k}}{\binom{r + w}{n}} \]
EXAMPLE 3.3.6
Consider again the rolling of two dice as described in Example 3.3.5. Let i and j denote
the faces showing on the first and second die, respectively, and define the random variable
X to be the sum of the two faces: X(i, j) = i + j. Find $p_X(k)$.
According to Definition 3.3.3, each value of $p_X(k)$ is the sum of the probabilities of
the outcomes that get mapped by X onto the value k. For example,

\[ P(X = 5) = p_X(5) = P(\{s \in S \mid X(s) = 5\}) = P((1,4) \cup (4,1) \cup (2,3) \cup (3,2)) = P(1,4) + P(4,1) + P(2,3) + P(3,2) = \frac{1}{36} + \frac{1}{36} + \frac{1}{36} + \frac{1}{36} = \frac{4}{36} \]

assuming the dice are fair. Values of $p_X(k)$ for other k are calculated similarly. Table 3.3.3
shows the random variable's entire pdf.

TABLE 3.3.3
k    p_X(k)      k     p_X(k)
2    1/36        8     5/36
3    2/36        9     4/36
4    3/36        10    3/36
5    4/36        11    2/36
6    5/36        12    1/36
7    6/36
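Definition 3.3.3 translates almost literally into code: group the outcomes by the value the random variable assigns to them and add up their probabilities. The Python sketch below does this for the sum of two dice, under the assumption that all thirty-six outcomes are equally likely; the helper name `pdf_of` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def pdf_of(random_variable, sample_space):
    """pdf of a random variable on a finite sample space whose outcomes
    are assumed to be equally likely."""
    counts = Counter(random_variable(s) for s in sample_space)
    n = len(sample_space)
    return {k: Fraction(c, n) for k, c in sorted(counts.items())}

dice = list(product(range(1, 7), repeat=2))   # the 36 ordered pairs (i, j)
p_X = pdf_of(lambda s: s[0] + s[1], dice)     # X(i, j) = i + j

for k, prob in p_X.items():
    print(k, prob)
```

Passing a different function, for example `lambda s: max(s)`, gives the pdf of another random variable on the same sample space without changing anything else.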
EXAMPLE 3.3.7
Acme Industries typically produces three electric power generators a day; some pass the
company's quality control inspection on their first try and are ready to be shipped; others
need to be retooled. The probability of a generator needing further work is 0.05. If a
generator is ready to ship, the firm earns a profit of $10,000. If it needs to be retooled, it
ultimately costs the firm $2,000. Let X be the random variable quantifying the company's
daily profit. Find $p_X(k)$.
The underlying sample space here is a set of n = 3 independent trials, where p =
P(generator passes inspection) = 0.95. If the random variable X is to measure the
company's daily profit, then

X = $10,000 x (no. of generators passing inspection) - $2,000 x (no. of generators needing retooling)

For instance, X(s, f, s) = 2($10,000) - 1($2,000) = $18,000. Moreover, the random
variable X equals $18,000 whenever the day's output consists of two successes and one
failure. That is, X(s, f, s) = X(s, s, f) = X(f, s, s). It follows that

\[ P(X = \$18{,}000) = p_X(18{,}000) = \binom{3}{2}(0.95)^2(0.05)^1 = 0.135375 \]

Table 3.3.4 shows $p_X(k)$ for the four possible values of k ($30,000, $18,000, $6,000, and
-$6,000).

TABLE 3.3.4
No. Defectives    k = Profit    p_X(k)
0                 $30,000       0.857375
1                 $18,000       0.135375
2                 $6,000        0.007125
3                 -$6,000       0.000125
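The entries of Table 3.3.4 can be generated by pairing each possible number of passing generators with its profit and its binomial probability. The following Python sketch is an illustration using the figures from the example (n = 3, p = 0.95, $10,000 profit, $2,000 cost); the function name `profit_pdf` and its parameter names are hypothetical.

```python
from math import comb

def profit_pdf(n=3, p_pass=0.95, gain=10_000, loss=2_000):
    """Map each possible daily profit to its probability."""
    pdf = {}
    for passed in range(n + 1):
        profit = gain * passed - loss * (n - passed)
        pdf[profit] = comb(n, passed) * p_pass**passed * (1 - p_pass) ** (n - passed)
    return pdf

for profit, prob in sorted(profit_pdf().items(), reverse=True):
    print(f"{profit:>7}  {prob:.6f}")
```

The profit values are all distinct here, so each one inherits the probability of exactly one success count.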
EXAMPLE 3.3.8
As part of her warm-up drill, each player on State's basketball team is required to shoot
free throws until two baskets are made. If Rhonda has a 65% success rate at the foul line,
what is the pdf of the random variable X that describes the number of throws it takes her
to complete the drill? Assume that individual throws constitute independent events.
Figure 3.3.1 illustrates what must occur if the drill is to end on the kth toss, k = 2, 3, 4, ...:
First, Rhonda needs to make exactly one basket sometime during the first k - 1 attempts,
and, second, she needs to make a basket on the kth toss. Written formally,

\[ p_X(k) = P(X = k) = P(\text{drill ends on } k\text{th throw}) = P((\text{1 basket and } k-2 \text{ misses in first } k-1 \text{ throws}) \cap (\text{basket on } k\text{th throw})) = P(\text{1 basket and } k-2 \text{ misses}) \cdot P(\text{basket}) \]

[Figure 3.3.1: attempts 1 through k; exactly one basket among attempts 1 through k - 1 (the rest are misses), followed by a basket on attempt k.]

Notice that k - 1 different sequences have the property that exactly one of the first k - 1
throws results in a basket (B = basket, M = miss):

Attempt:   1    2    3    ...   k-1
           B    M    M    ...   M
           M    B    M    ...   M
           ...
           M    M    M    ...   B        (k - 1 sequences)

Since each sequence has probability $(0.35)^{k-2}(0.65)$,

\[ P(\text{1 basket and } k-2 \text{ misses}) = (k - 1)(0.35)^{k-2}(0.65) \]

Therefore,

\[ p_X(k) = (k - 1)(0.35)^{k-2}(0.65) \cdot (0.65) = (k - 1)(0.35)^{k-2}(0.65)^2, \quad k = 2, 3, 4, \ldots \quad (3.3.4) \]

Table 3.3.5 shows the pdf evaluated for specific values of k. Although the range of k is
infinite, the bulk of the probability associated with X is concentrated in the values two
through seven: It is highly unlikely, for example, that Rhonda would need more than
seven shots to complete the drill.

TABLE 3.3.5
k     p_X(k)
2     0.4225
3     0.2958
4     0.1553
5     0.0725
6     0.0317
7     0.0133
8+    0.0089
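Equation 3.3.4 is easy to tabulate. The Python sketch below is a minimal illustration that evaluates the pdf for k = 2 through 7 and obtains the remaining tail probability by subtraction; the function name `p_X` mirrors the text's notation, and the default success rate 0.65 is taken from the example.

```python
def p_X(k, p=0.65):
    """P(the second basket occurs on throw k), Equation 3.3.4."""
    if k < 2:
        return 0.0
    return (k - 1) * (1 - p) ** (k - 2) * p**2

head = {k: p_X(k) for k in range(2, 8)}
for k, prob in head.items():
    print(k, round(prob, 4))

# Probability that more than seven throws are needed.
print("8+", 1 - sum(head.values()))
```

Computing the tail as one minus the head avoids truncating an infinite sum.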
Transformations
Transforming a variable from one scale to another is a problem that is comfortably
familiar. If a thermometer says the temperature outside is 83°F, we know that the
temperature in degrees Centigrade is

\[ {}^{\circ}\mathrm{C} = \left(\frac{5}{9}\right)({}^{\circ}\mathrm{F} - 32) = \left(\frac{5}{9}\right)(83 - 32) = 28 \]

An analogous question arises in connection with random variables. Suppose that X is a
discrete random variable with pdf $p_X(k)$. If a second random variable, Y, is defined to be
aX + b, where a and b are constants, what can be said about the pdf for Y?
Theorem 3.3.1. Suppose X is a discrete random variable. Let Y = aX + b, where a and b
are constants. Then $p_Y(y) = p_X\left(\frac{y - b}{a}\right)$.
Proof.

\[ p_Y(y) = P(Y = y) = P(aX + b = y) = P\left(X = \frac{y - b}{a}\right) = p_X\left(\frac{y - b}{a}\right) \]
EXAMPLE 3.3.9
Let X be a random variable for which $p_X(k) = \frac{1}{10}$, for k = 1, 2, ..., 10. What is the
probability distribution associated with the random variable Y, where Y = 4X - 1? That
is, find $p_Y(y)$.
From Theorem 3.3.1, P(Y = y) = P(4X - 1 = y) = P(X = (y + 1)/4) = $p_X\left(\frac{y + 1}{4}\right)$,
which implies that $p_Y(y) = \frac{1}{10}$ for the ten values of (y + 1)/4 that equal
1, 2, ..., 10. But (y + 1)/4 = 1 when y = 3, (y + 1)/4 = 2 when y = 7, ...,
(y + 1)/4 = 10 when y = 39. Therefore, $p_Y(y) = \frac{1}{10}$, for y = 3, 7, ..., 39.
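Theorem 3.3.1 amounts to relabeling the values of X while leaving their probabilities untouched. The Python sketch below illustrates this with the pdf stored as a dictionary; the function name `transform_pdf` is hypothetical, and the check that a is nonzero is an added assumption needed for the relabeling to be one-to-one.

```python
from fractions import Fraction

def transform_pdf(p_X, a, b):
    """pdf of Y = a*X + b, given the pdf of X as a {value: probability} dict.
    Assumes a != 0 so that distinct values of X map to distinct values of Y."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return {a * k + b: prob for k, prob in p_X.items()}

# Example 3.3.9: p_X(k) = 1/10 for k = 1, ..., 10 and Y = 4X - 1.
p_X = {k: Fraction(1, 10) for k in range(1, 11)}
p_Y = transform_pdf(p_X, a=4, b=-1)
print(sorted(p_Y))
```

Looking up `p_Y.get(y, 0)` returns 0 for any y outside the range of Y, matching the convention that a pdf vanishes off the range of its random variable.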
Cumulative Distribution Function
In working with random variables, we frequently need to calculate the probability that
the value of a random variable is somewhere between two numbers. For example, suppose
we have an integer-valued random variable. We might want to calculate an expression
like $P(s \le X \le t)$. If we know the pdf for X, then

\[ P(s \le X \le t) = \sum_{k=s}^{t} p_X(k) \]

but depending on the nature of $p_X(k)$ and the number of terms that need to be added,
calculating the sum of $p_X(k)$ from k = s to k = t may be quite difficult. An alternate
strategy is to use the fact that

\[ P(s \le X \le t) = P(X \le t) - P(X \le s - 1) \]

where the two probabilities on the right represent cumulative probabilities of the random
variable X. If the latter were available (and they often are), then evaluating $P(s \le X \le t)$
by one simple subtraction would clearly be easier than doing all the calculations implicit
in $\sum_{k=s}^{t} p_X(k)$.
Definition 3.3.4. Let X be a discrete random variable. For any real number t, the
probability that X takes on a value $\le t$ is the cumulative distribution function (cdf)
of X (written $F_X(t)$). In formal notation, $F_X(t) = P(\{s \in S \mid X(s) \le t\})$. As was
the case with pdfs, references to s and S are typically deleted, and the cdf is written
$F_X(t) = P(X \le t)$.
EXAMPLE 3.3.10
Suppose we wish to compute $P(21 \le X \le 40)$ for a binomial random variable X with $n = 50$ and $p = 0.6$. From Theorem 3.2.1, we know the formula for $p_X(k)$, so $P(21 \le X \le 40)$ can be written as a simple, although computationally cumbersome, sum:
$$P(21 \le X \le 40) = \sum_{k=21}^{40} \binom{50}{k} (0.6)^k (0.4)^{50-k}$$
Equivalently, the probability we are looking for can be expressed as the difference between two cdfs:
$$P(21 \le X \le 40) = P(X \le 40) - P(X \le 20) = F_X(40) - F_X(20)$$
As it turns out, values of the cdf for a binomial random variable are widely available, both in books and in computer software. Here, for example, $F_X(40) = 0.9992$ and $F_X(20) = 0.0034$, so
$$P(21 \le X \le 40) = 0.9992 - 0.0034 = 0.9958$$
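The subtraction of two cumulative probabilities can be checked numerically. The following is only a sketch: the helper names `binomial_pdf` and `binomial_cdf` are illustrative and not part of the text, and the cdf is obtained by brute-force summation rather than from a table.

```python
from math import comb

def binomial_pdf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(t, n, p):
    """F_X(t) = P(X <= t) for an integer t; returns 0 below the support."""
    return sum(binomial_pdf(k, n, p) for k in range(0, min(t, n) + 1))

n, p = 50, 0.6

# Direct sum of the pdf over k = 21, ..., 40.
direct = sum(binomial_pdf(k, n, p) for k in range(21, 41))

# The same probability as a difference of two cdf values.
via_cdf = binomial_cdf(40, n, p) - binomial_cdf(20, n, p)

print(round(direct, 4), round(via_cdf, 4))
```

Both routes should agree up to floating-point rounding; the point of the cdf form is that tabulated values reduce the work to a single subtraction.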
EXAMPLE 3.3.11
Suppose that two fair dice are rolled. Let the random variable X denote the larger of the two faces showing: (a) Find $F_X(t)$ for $t = 1, 2, \ldots, 6$ and (b) Find $F_X(2.5)$.
a. The sample space associated with the experiment of rolling two fair dice is the set of ordered pairs, $s = (i, j)$, where the face showing on the first die is i and the face showing on the second die is j. By assumption, all 36 possible outcomes are equally likely. Now, suppose t is some integer from 1 to 6, inclusive. Then
$$F_X(t) = P(X \le t) = P(\max(i, j) \le t) = P(i \le t \text{ and } j \le t) = P(i \le t) \cdot P(j \le t) \quad \text{(why?)}$$
$$= \frac{t}{6} \cdot \frac{t}{6} = \frac{t^2}{36}, \quad t = 1, 2, 3, 4, 5, 6$$
b. Even though the random variable X has nonzero probability only for the integers 1 through 6, the cdf is defined for any real number from $-\infty$ to $+\infty$. By definition, $F_X(2.5) = P(X \le 2.5)$. But
$$P(X \le 2.5) = P(X \le 2) + P(2 < X \le 2.5) = F_X(2) + 0$$
so
$$F_X(2.5) = F_X(2) = \frac{4}{36} = \frac{1}{9}$$
What would the graph of $F_X(t)$ as a function of t look like?
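A brute-force enumeration offers one way to confirm both parts of the example. This is a minimal sketch; the function name `cdf_max` is an illustrative choice, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered pairs (i, j) for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def cdf_max(t):
    """F_X(t) = P(max(i, j) <= t), defined for any real number t."""
    favorable = sum(1 for i, j in outcomes if max(i, j) <= t)
    return Fraction(favorable, len(outcomes))

# Part (a): the enumeration reproduces t^2 / 36 at the integers 1..6.
for t in range(1, 7):
    assert cdf_max(t) == Fraction(t * t, 36)

# Part (b): the cdf is flat between integers, so F_X(2.5) = F_X(2).
print(cdf_max(2.5))  # prints 1/9
```

Evaluating `cdf_max` on a fine grid of t values would trace out the step function asked about at the end of the example.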
QUESTIONS
3.3.1. An urn contains five balls numbered 1 to 5. Two balls are drawn simultaneously.
(a) Let X be the larger of the two numbers drawn. Find $p_X(k)$.
(b) Let V be the sum of the two numbers drawn. Find $p_V(k)$.
3.3.2. Repeat Question 3.3.1 for the case where the two balls are drawn with replacement.
3.3.3. Suppose a fair die is tossed three times. Let X be the largest of the three faces that appear. Find $p_X(k)$.
3.3.4. Suppose a fair die is tossed three times. Let X be the number of different faces that appear (so X = 1, 2, or 3). Find $p_X(k)$.
3.3.5. A fair coin is tossed three times. Let X be the number of heads in the tosses minus the number of tails. Find $p_X(k)$.
3.3.6. Suppose die one has spots 1, 2, 2, 3, 3, 4 and die two has spots 1, 3, 4, 5, 6, 8. If both dice are rolled, what is the sample space? Let X = total spots showing. Show that the pdf for X is the same as for normal dice.
3.3.7. Suppose a particle moves along the x-axis beginning at 0. It moves one step to the left or right with equal probability. What is the pdf of its position after 4 steps?
3.3.8. How would the pdf asked for in Question 3.3.7 be affected if the particle was twice as likely to move to the right as to the left?
3.3.9. Suppose that five people, including you and a friend, line up at random. Let the random variable X denote the number of people standing between you and your friend. What is $p_X(k)$?
3.3.10. Urn I and urn II each have two red chips and two white chips. Two chips are drawn simultaneously from each urn. Let $X_1$ be the number of red chips in the first sample and $X_2$ the number of red chips in the second sample. Find the pdf of $X_1 + X_2$.
3.3.11. Suppose X is a binomial random variable with n = 4 and p = [...]. What is the pdf of 2X + 1?
3.3.12. Find the cdf for the random variable X in [...].
3.3.13. A fair die is rolled four times. Let the random variable X denote the number of [...] that appear. Find and graph the cdf for X.
3.3.14. At the points $x = 0, 1, \ldots, 6$, the cdf for the discrete random variable X has the value $F_X(x) = x(x + 1)/42$. Find the pdf for X.
3.3.15. Find the pdf for the discrete random variable X whose cdf at the points $x = 0, 1, \ldots, 6$ is given by $F_X(x) = x^3/216$.
CONTINUOUS RANDOM VARIABLES
The statement was made in Chapter 2 that all sample spaces belong to one of two generic types: discrete sample spaces are ones that contain a finite or a countably infinite number of outcomes, and continuous sample spaces are those that contain an uncountably infinite number of outcomes. Rolling a pair of dice and recording the faces that appear is an experiment with a discrete sample space; choosing a number at random from the interval [0, 1] would have a continuous sample space.
How we assign probabilities to these two types of sample spaces is different. Section 3.3 focused on discrete sample spaces. Each outcome s is assigned a probability by the discrete probability function $p(s)$. If a random variable X is defined on the sample space, the probabilities associated with its outcomes are assigned by the probability density function $p_X(k)$. Applying those same definitions, though, to the outcomes in a continuous sample space will not work. The fact that a continuous sample space has an uncountably infinite number of outcomes eliminates the option of assigning a probability to each point as we did in the discrete case with the function $p(s)$. We begin this section with a particular pdf defined on a discrete sample space that suggests how we might define probabilities, in general, on a continuous sample space.
Suppose an electronic surveillance monitor is turned on briefly at the beginning of every hour and has a 0.905 probability of working properly, regardless of how long it has remained in service. If we let the random variable X denote the hour at which the monitor first fails, then $p_X(k)$ is the product of k individual probabilities:
$$p_X(k) = P(X = k) = P(\text{monitor fails for the first time at the } k\text{th hour})$$
$$= P(\text{monitor functions properly for first } k - 1 \text{ hours} \cap \text{monitor fails at the } k\text{th hour})$$
$$= (0.905)^{k-1}(0.095), \quad k = 1, 2, 3, \ldots$$
FIGURE 3.4.1 (probability histogram; vertical axis: $p_X(k)$, from 0 to 0.1; horizontal axis: hour when monitor first fails, k, from 1 to 21)
Figure 3.4.1 shows a probability histogram of $p_X(k)$ for k values ranging from 1 to 21. Here the height of the kth bar is $p_X(k)$, and since the width of each bar is 1, the area of the kth bar is also $p_X(k)$.
Now, look at Figure 3.4.2, where the exponential curve $y = 0.1e^{-0.1x}$ is superimposed on the graph of $p_X(k)$. Notice how closely the area under the curve approximates the area of the bars. It follows that the probability that X lies in some given interval will be numerically similar to the integral of the exponential curve above that same interval.
For example, the probability that the monitor fails sometime during the first four hours would be the sum
$$P(0 \le X \le 4) = \sum_{k=0}^{4} p_X(k) = \sum_{k=1}^{4} (0.905)^{k-1}(0.095) \approx 0.3292$$
(the $k = 0$ term contributes nothing, since $p_X(0) = 0$). The corresponding area under the exponential curve is nearly the same:
$$\int_0^4 0.1e^{-0.1x}\,dx = 0.3297$$
Implicit in the similarity here between $p_X(k)$ and the exponential curve $y = 0.1e^{-0.1x}$ is our sought-after alternative to $p(s)$ for continuous sample spaces. Instead of defining probabilities for individual points, we will define probabilities for intervals of points, and those probabilities will be areas under the graph of some function (such as $y = 0.1e^{-0.1x}$), where the shape of the function will reflect the desired probability "measure" to be associated with the sample space.
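The comparison between the sum of bar areas and the area under the curve can be reproduced in a few lines. This is a sketch under the stated model only; the name `monitor_pdf` is illustrative. The two values come out close but not identical, because 0.905 is itself a rounded value of $e^{-0.1}$.

```python
from math import exp

def monitor_pdf(k, p_work=0.905):
    """P(first failure at hour k) = p_work**(k - 1) * (1 - p_work), k = 1, 2, ..."""
    return p_work ** (k - 1) * (1 - p_work)

# Discrete probability that the first failure occurs in hours 1 through 4.
discrete = sum(monitor_pdf(k) for k in range(1, 5))

# Area under y = 0.1 * exp(-0.1 x) from 0 to 4, using the antiderivative.
area = 1 - exp(-0.1 * 4)

print(round(discrete, 4), round(area, 4))
```

The difference between the two numbers is less than one part in a thousand, which is the sense in which the continuous curve approximates the discrete model.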
FIGURE 3.4.2 (the histogram of $p_X(k)$ with the curve $y = 0.1e^{-0.1x}$ superimposed; horizontal axis: hour when monitor first fails, k)
Definition 3.4.1. A probability function P on a set of real numbers S is called continuous if there exists a function $f(t)$ such that for any closed interval $[a, b] \subset S$,
$$P([a, b]) = \int_a^b f(t)\,dt.$$
Comment. If a probability function P satisfies Definition 3.4.1, then $P(A) = \int_A f(t)\,dt$ for any set A where the integral is defined.
Conversely, suppose a function $f(t)$ has the two properties
1. $f(t) \ge 0$ for all t
2. $\int_{-\infty}^{\infty} f(t)\,dt = 1$.
If $P(A) = \int_A f(t)\,dt$ for all A, then P will satisfy the probability axioms given in Section 2.3.
Choosing the Function f(t)
We have seen that the probability structure of any sample space with a finite or countably infinite number of outcomes is defined by the function $p(s) = P(\text{outcome is } s)$. For sample spaces having an uncountably infinite number of possible outcomes, the function $f(t)$ serves an analogous purpose. Specifically, $f(t)$ defines the probability structure of S in the sense that the probability of any interval in the sample space is the integral of $f(t)$. The next set of examples illustrates several different choices for $f(t)$.
EXAMPLE 3.4.1
The continuous equivalent of the equiprobable probability model on a discrete sample space is the function $f(t)$ defined by $f(t) = 1/(b - a)$ for all t in the interval $[a, b]$ (and $f(t) = 0$, otherwise). This particular $f(t)$ places equal probability weighting on every closed interval of the same length contained in the interval $[a, b]$. For example, suppose $a = 0$ and $b = 10$, and let $A = [1, 3]$ and $B = [6, 8]$. Then
$$P(A) = \int_1^3 \frac{1}{10}\,dt = \frac{2}{10} \quad \text{and} \quad P(B) = \int_6^8 \frac{1}{10}\,dt = \frac{2}{10}$$
FIGURE 3.4.3 (uniform probability density on [0, 10], with the intervals A = [1, 3] and B = [6, 8] marked)
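EXAMPLE 3.4.2
Could $f(t) = 3t^2$, $0 \le t \le 1$, be used to define the probability function for a sample space whose outcomes consist of all the real numbers in the interval [0, 1]? Yes, because (1) $f(t) \ge 0$ for all t, and (2) $\int_0^1 f(t)\,dt = \int_0^1 3t^2\,dt = t^3 \big|_0^1 = 1$.
Notice that the shape of $f(t)$ (see Figure 3.4.4) implies that outcomes close to 1 are more likely to occur than are outcomes close to 0. For example, $P([0, \frac{1}{3}]) = \int_0^{1/3} 3t^2\,dt = t^3 \big|_0^{1/3} = \frac{1}{27}$, while $P([\frac{2}{3}, 1]) = \int_{2/3}^{1} 3t^2\,dt = t^3 \big|_{2/3}^{1} = 1 - \frac{8}{27} = \frac{19}{27}$.
FIGURE 3.4.4 (graph of the probability density $f(t) = 3t^2$ on [0, 1], with the two areas shaded)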
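The two conditions for a legitimate probability function, and the two interval probabilities, can be checked numerically. The sketch below uses a simple midpoint rule; the helper `integrate` is an illustrative stand-in for any numerical integrator and is not part of the text.

```python
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(t):
    """Candidate density f(t) = 3 t^2 on [0, 1], zero elsewhere."""
    return 3 * t * t if 0 <= t <= 1 else 0.0

total = integrate(f, 0, 1)       # should be close to 1
low = integrate(f, 0, 1 / 3)     # should be close to 1/27
high = integrate(f, 2 / 3, 1)    # should be close to 19/27

print(total, low, high)
```

The interval near 1 carries nineteen times the probability of the equally long interval near 0, which is the asymmetry the shape of $f(t)$ suggests.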
EXAMPLE 3.4.3
By far the most important of all continuous probability functions is the "bell-shaped" curve, known more formally as the normal (or Gaussian) distribution. The sample space for the normal distribution is the entire real line; its probability function is given by
$$f(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{t - \mu}{\sigma}\right)^2\right], \quad -\infty < t < \infty, \quad -\infty < \mu < \infty, \quad \sigma > 0$$
Depending on the values assigned to the parameters $\mu$ and $\sigma$, $f(t)$ can take on a variety of shapes and locations; three are illustrated in Figure 3.4.5.
FIGURE 3.4.5 (three normal curves $f(t)$ with different values of $\mu$ and $\sigma$)
Fitting f(t) to Data: The Density-Scaled Histogram
The notion of using a continuous probability function to approximate an integer-valued discrete probability model has already been discussed (recall Figure 3.4.2). The "trick" there was to replace the spikes that define $p_X(k)$ with rectangles whose heights are $p_X(k)$ and whose widths are one. Doing that makes the sum of the areas of the rectangles corresponding to $p_X(k)$ equal to one, which is the same as the total area under the approximating continuous probability function. Because of the equality of those two areas, it makes sense to superimpose (and compare) the "histogram" of $p_X(k)$ and the continuous probability function on the same set of axes.
Now, consider the related, but slightly more general, problem of using a continuous probability function to model the distribution of a set of n measurements, $y_1, y_2, \ldots, y_n$. Following the approach taken in Figure 3.4.2, we would start by making a histogram of the n observations. The problem is, the sum of the areas of the bars comprising that histogram would not necessarily equal one.
As a case in point, Table 3.4.1 shows a set of forty observations. Grouping those $y_i$'s into five classes, each of width ten, produces the distribution and histogram pictured in Figure 3.4.6. Furthermore, suppose we have reason to believe that these forty $y_i$'s may be a random sample from a uniform probability function defined over the interval [20, 70], that is,
$$f(t) = \frac{1}{70 - 20} = \frac{1}{50}, \quad 20 \le t \le 70$$
TABLE 3.4.1
33.8  41.6  24.9  62.6  54.5  22.3  68.7  42.3
40.5  69.7  27.6  62.9  30.3  41.2  57.6  32.9
22.4  64.5  54.8  58.9  25.0  33.4  48.9  60.8
59.2  39.0  68.4  49.1  67.5  53.1  38.4  42.6
64.1  21.6  69.0  46.0  46.6
Class            Frequency
20 <= y < 30         7
30 <= y < 40         6
40 <= y < 50         9
50 <= y < 60         8
60 <= y < 70        10
Total               40
FIGURE 3.4.6 (frequency histogram of the five classes; horizontal axis: y, from 20 to 70; vertical axis: frequency)
(recall Example 3.4.1). How can we appropriately draw the distribution of the $y_i$'s and the uniform probability model on the same graph?
Note, first, that $f(t)$ and the histogram are not compatible in the sense that the area under $f(t)$ is (necessarily) one ($= 50 \times \frac{1}{50}$), but the sum of the areas of the bars making up the histogram is four hundred:
$$\text{histogram area} = 10(7) + 10(6) + 10(9) + 10(8) + 10(10) = 400$$
Nevertheless, we can "force" the total area of the five bars to match the area under $f(t)$ by redefining the scale of the vertical axis on the histogram. Specifically, frequency needs to be replaced with the analog of probability density, which would be the scale used on the vertical axis of any graph of $f(t)$. Intuitively, the density associated with, say, the interval [20, 30) would be defined as the quotient
$$\frac{7}{40 \times 10}$$
because integrating that constant over the interval [20, 30) would give $\frac{7}{40}$, and the latter does represent the estimated probability that an observation belongs to the interval [20, 30).
Figure 3.4.7 shows a histogram of the data in Table 3.4.1, where the height of each bar has been converted to a density, according to the formula
$$\text{density (of a class)} = \frac{\text{class frequency}}{\text{total no. of observations} \times \text{class width}}$$
Superimposed is the uniform probability model, $f(t) = \frac{1}{50}$, $20 \le t \le 70$. Scaled in this fashion, areas under both $f(t)$ and the histogram are one.
In practice, density-scaled histograms offer a simple, but effective, format for examining the "fit" between a set of data and a presumed continuous model. We will use it often in the chapters ahead. Applied statisticians have especially embraced this particular graphical technique. Indeed, computer software packages that include Histograms on their menus routinely give users the choice of putting either frequency or density on the vertical axis.
Class            Density
20 <= y < 30     7/[40(10)] = 0.0175
30 <= y < 40     6/[40(10)] = 0.0150
40 <= y < 50     9/[40(10)] = 0.0225
50 <= y < 60     8/[40(10)] = 0.0200
60 <= y < 70     10/[40(10)] = 0.0250
FIGURE 3.4.7 (density-scaled histogram with the uniform model superimposed; horizontal axis: y, from 20 to 70; vertical axis: density)
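The rescaling from frequency to density is a one-line computation. The sketch below applies the density formula to the five class frequencies; the function name `density_scale` is an illustrative choice, not notation from the text.

```python
def density_scale(frequencies, class_width):
    """Convert class frequencies to densities: frequency / (n * class width)."""
    n = sum(frequencies)
    return [freq / (n * class_width) for freq in frequencies]

# Class frequencies for [20,30), [30,40), [40,50), [50,60), [60,70).
frequencies = [7, 6, 9, 8, 10]
width = 10

densities = density_scale(frequencies, width)

# After rescaling, the bar areas (height * width) should sum to one.
total_area = sum(d * width for d in densities)

print(densities, total_area)
```

Dividing by both the sample size and the class width is what makes the bars comparable to a probability density; dividing by the sample size alone would give relative frequencies, whose bar areas sum to the class width instead of one.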
CASE STUDY 3.4.1
Years ago, the V805 transmitter tube was standard equipment on many aircraft radar systems. Table 3.4.2 summarizes part of a reliability study done on the V805; listed are the lifetimes (in hrs) recorded for 903 tubes (37). Grouped into intervals of width eighty, the densities for the nine classes are shown in the last column.
TABLE 3.4.2
Lifetime (hrs)   Number of tubes   Density
0-80                  317          0.0044
80-160                230          0.0032
160-240               118          0.0016
240-320                93          0.0013
320-400                49          0.0007
400-480                33          0.0005
480-560                17          0.0002
560-700                26          0.0002
700+                   20          0.0002
Total                 903
Experience has shown that lifetimes of electrical equipment can often be nicely modeled by the exponential probability function,
$$f(t) = \lambda e^{-\lambda t}, \quad t > 0$$
where the value of $\lambda$ (for reasons explained in Chapter 5) is set equal to the reciprocal of the average lifetime of the tubes in the sample. Can the distribution of these data also be described by the exponential model? One way to answer such a question is to superimpose the proposed model on a graph of the density-scaled histogram. The extent to which the two graphs are similar then becomes an obvious measure of the appropriateness of the model.
FIGURE 3.4.8 (density-scaled histogram of the V805 lifetimes with the exponential model superimposed; horizontal axis: V805 lifetimes (hrs); vertical axis: probability density)
For these data, $\lambda$ would be 0.0056. Figure 3.4.8 shows the function
$$f(t) = 0.0056e^{-0.0056t}$$
plotted on the same axes as the density-scaled histogram. Clearly, the agreement is excellent, and we would have no reservations about using areas under $f(t)$ to estimate lifetime probabilities. How likely is it, for example, that a tube will last longer than five hundred hrs? Based on the exponential model, that probability would be 0.0608:
$$P(\text{V805 lifetime exceeds 500 hrs}) = \int_{500}^{\infty} 0.0056e^{-0.0056y}\,dy = -e^{-0.0056y}\Big|_{500}^{\infty} = e^{-0.0056(500)} = e^{-2.8} = 0.0608$$
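The tail probability follows from the closed form $P(Y > t) = e^{-\lambda t}$ for an exponential model. A minimal sketch, with the function name `exponential_tail` chosen here purely for illustration:

```python
from math import exp

def exponential_tail(t, lam):
    """P(Y > t) for the exponential pdf lam * exp(-lam * y), y > 0."""
    return exp(-lam * t) if t > 0 else 1.0

lam = 0.0056                      # rate quoted in the case study
p_long = exponential_tail(500, lam)

print(round(p_long, 4))  # 0.0608
```

The same function gives the model's probability for any other lifetime threshold, which could then be compared with the observed proportion of tubes in Table 3.4.2.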
Continuous Probability Density Functions
We saw in Section 3.3 how the introduction of discrete random variables facilitated the solution of certain problems. The same sort of function can also be defined on sample spaces with an uncountably infinite number of outcomes. In practice, continuous random variables are often simply an identity mapping, so they do not radically redefine the sample space in the way that a binomial random variable does. Nevertheless, it helps to have the same notation for both kinds of sample spaces.
Definition 3.4.2. A function Y that maps a subset of the real numbers into the real numbers is called a continuous random variable. The pdf of Y is the function $f_Y(y)$ having the property that for any numbers a and b,
$$P(a \le Y \le b) = \int_a^b f_Y(y)\,dy$$
EXAMPLE 3.4.4
We saw in Case Study 3.4.1 that lifetimes of V805 radar tubes can be nicely modeled by the exponential probability function,
$$f(t) = 0.0056e^{-0.0056t}, \quad t > 0$$
To couch that statement in random variable notation would simply require that we define Y to be the life of a V805 radar tube. Then Y would be the identity mapping and the pdf for the random variable Y would be the same as the probability function, $f(t)$. That is, we would write
$$f_Y(y) = 0.0056e^{-0.0056y}, \quad y \ge 0$$
Similarly, when we work with the bell-shaped normal distribution in later chapters we will write the model in random variable notation as
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2}, \quad -\infty < y < \infty$$
EXAMPLE 3.4.5
Suppose we would like a continuous random variable Y to "select" a number between 0 and 1 in such a way that intervals near the middle of the range would be more likely to be represented than intervals near either 0 or 1. One pdf having that property is the function $f_Y(y) = 6y(1 - y)$, $0 \le y \le 1$ (see Figure 3.4.9). Do we know for certain that the function pictured in Figure 3.4.9 is a "legitimate" pdf? Yes, because $f_Y(y) \ge 0$ for all y, and
$$\int_0^1 6y(1 - y)\,dy = 6\left[\frac{y^2}{2} - \frac{y^3}{3}\right]_0^1 = 1.$$
FIGURE 3.4.9 (graph of the probability density $f_Y(y) = 6y(1 - y)$ on [0, 1])
Comment. To simplify the way pdfs are written, it will be assumed that $f_Y(y) = 0$ for all y outside the range actually specified in the function's definition. In Example 3.4.5, for instance, the statement
$$f_Y(y) = 6y(1 - y), \quad 0 \le y \le 1$$
is to be interpreted as an abbreviation for
$$f_Y(y) = \begin{cases} 0, & y < 0 \\ 6y(1 - y), & 0 \le y \le 1 \\ 0, & y > 1 \end{cases}$$
Cumulative Distribution Functions
Associated with every random variable, discrete or continuous, is a cumulative distribution function. For discrete random variables (recall Definition 3.3.4), the cdf is a nondecreasing step function, where the "jumps" occur at the values of t for which the pdf has positive probability. For continuous random variables, the cdf is a monotonically nondecreasing, continuous function. In both cases, the cdf can be helpful in calculating the probability that a random variable takes on a value in a given interval. As we will see in later chapters, there are also several important relationships that hold for continuous cdfs and pdfs. One such relationship is cited in Theorem 3.4.1.
Definition 3.4.3. The cdf for a continuous random variable Y is an indefinite integral of its pdf:
$$F_Y(y) = \int_{-\infty}^{y} f_Y(r)\,dr = P(\{s \in S \mid Y(s) \le y\}) = P(Y \le y)$$
Theorem 3.4.1. Let $F_Y(y)$ be the cdf of a continuous random variable Y. Then
$$\frac{d}{dy} F_Y(y) = f_Y(y)$$
Proof. The statement of Theorem 3.4.1 follows immediately from the Fundamental Theorem of Calculus.
Theorem 3.4.2. Let Y be a continuous random variable with cdf $F_Y(y)$. Then
a. $P(Y > s) = 1 - F_Y(s)$
b. $P(r < Y \le s) = F_Y(s) - F_Y(r)$
c. $\lim_{y \to \infty} F_Y(y) = 1$
d. $\lim_{y \to -\infty} F_Y(y) = 0$
Proof.
a. $P(Y > s) = 1 - P(Y \le s)$ since $(Y > s)$ and $(Y \le s)$ are complementary events. But $P(Y \le s) = F_Y(s)$, and the conclusion follows.
b. Since the set $(r < Y \le s) = (Y \le s) - (Y \le r)$, $P(r < Y \le s) = P(Y \le s) - P(Y \le r) = F_Y(s) - F_Y(r)$.
c. Let $\{y_n\}$ be a set of values of Y, $n = 1, 2, 3, \ldots$, where $y_n < y_{n+1}$ for all n, and $\lim_{n \to \infty} y_n = \infty$. If $\lim_{n \to \infty} F_Y(y_n) = 1$ for every such sequence $\{y_n\}$, then $\lim_{y \to \infty} F_Y(y) = 1$. To that end, set $A_1 = (Y \le y_1)$, and $A_n = (y_{n-1} < Y \le y_n)$ for $n = 2, 3, \ldots$. Then $F_Y(y_n) = P(\bigcup_{k=1}^{n} A_k) = \sum_{k=1}^{n} P(A_k)$, since the $A_k$ are disjoint. Also, the sample space $S = \bigcup_{k=1}^{\infty} A_k$, and by Axiom 4, $1 = P(S) = P(\bigcup_{k=1}^{\infty} A_k) = \sum_{k=1}^{\infty} P(A_k)$. Putting these equalities together gives
$$1 = \sum_{k=1}^{\infty} P(A_k) = \lim_{n \to \infty} \sum_{k=1}^{n} P(A_k) = \lim_{n \to \infty} F_Y(y_n)$$
d.
$$\lim_{y \to -\infty} F_Y(y) = \lim_{y \to -\infty} P(Y \le y) = \lim_{y \to -\infty} P(-Y \ge -y) = \lim_{y \to -\infty} [1 - P(-Y \le -y)]$$
$$= 1 - \lim_{y \to -\infty} P(-Y \le -y) = 1 - \lim_{y \to \infty} P(-Y \le y) = 1 - \lim_{y \to \infty} F_{-Y}(y) = 0$$
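Theorems 3.4.1 and 3.4.2 can be illustrated on a concrete case. The sketch below assumes the pdf $f_Y(y) = 6y(1 - y)$ on [0, 1], whose cdf on that interval is $3y^2 - 2y^3$; the function names and the particular values of r, s, and y are chosen only for illustration.

```python
def pdf(y):
    """f_Y(y) = 6y(1 - y) on [0, 1], zero elsewhere."""
    return 6 * y * (1 - y) if 0 <= y <= 1 else 0.0

def cdf(y):
    """F_Y(y) = 3y^2 - 2y^3 on [0, 1], obtained by integrating the pdf."""
    if y < 0:
        return 0.0
    if y > 1:
        return 1.0
    return 3 * y**2 - 2 * y**3

# Theorem 3.4.1: a central difference of the cdf should approximate the pdf.
y, h = 0.3, 1e-6
slope = (cdf(y + h) - cdf(y - h)) / (2 * h)

# Theorem 3.4.2 (a) and (b): probabilities obtained from the cdf alone.
r, s = 0.25, 0.75
p_above = 1 - cdf(s)
p_between = cdf(s) - cdf(r)

print(slope, pdf(y), p_above, p_between)
```

Parts (c) and (d) of Theorem 3.4.2 show up here as the constant values 1 and 0 that `cdf` returns to the right and left of the interval [0, 1].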
Transformations
If X and Y are two discrete random variables and a and b are constants such that $Y = aX + b$, the pdf for Y can be expressed in terms of the pdf for X. Theorem 3.3.1 provided the details. Here we derive the analogous result for a linear transformation involving two continuous random variables.
Theorem 3.4.3. Suppose X is a continuous random variable. Let $Y = aX + b$, where $a \ne 0$ and b are constants. Then
$$f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right)$$
Proof. We begin by writing an expression for the cdf of Y:
$$F_Y(y) = P(Y \le y) = P(aX + b \le y) = P(aX \le y - b)$$
At this point we need to consider two cases, the distinction being the sign of a. Suppose, first, that $a > 0$. Then
$$F_Y(y) = P(aX \le y - b) = P\left(X \le \frac{y - b}{a}\right)$$
and differentiating $F_Y(y)$ yields $f_Y(y)$:
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} F_X\left(\frac{y - b}{a}\right) = \frac{1}{a} f_X\left(\frac{y - b}{a}\right) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right)$$
If $a < 0$,
$$F_Y(y) = P(aX \le y - b) = P\left(X > \frac{y - b}{a}\right) = 1 - P\left(X \le \frac{y - b}{a}\right)$$
Differentiation in this case gives
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy}\left[1 - F_X\left(\frac{y - b}{a}\right)\right] = -\frac{1}{a} f_X\left(\frac{y - b}{a}\right) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right)$$
and the theorem is proved.
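The formula of Theorem 3.4.3 can be checked against a numerical derivative of the cdf of Y. In the sketch below, the exponential pdf for X and the constants a = -2, b = 3 are assumptions made only to have something concrete to evaluate; any continuous pdf with a known cdf would serve.

```python
from math import exp

def f_X(x, lam=1.0):
    """Exponential pdf, used here only as an illustrative choice for f_X."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def F_X(x, lam=1.0):
    """Matching cdf of the illustrative exponential X."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

def f_Y(y, a, b):
    """Theorem 3.4.3: pdf of Y = aX + b for a != 0."""
    return f_X((y - b) / a) / abs(a)

def F_Y(y, a, b):
    """cdf of Y = aX + b, handling the sign of a as in the proof."""
    x = (y - b) / a
    return F_X(x) if a > 0 else 1 - F_X(x)

a, b, y, h = -2.0, 3.0, 1.0, 1e-6

# Central difference of F_Y at y, to compare with the formula for f_Y.
numeric = (F_Y(y + h, a, b) - F_Y(y - h, a, b)) / (2 * h)

print(numeric, f_Y(y, a, b))
```

Using a negative value of a exercises the second case of the proof, where the inequality reverses and the absolute value in $1/|a|$ is needed to keep the density nonnegative.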
QUESTIONS
3.4.1. Suppose $f_Y(y)$ = [...], $0 \le y \le 1$. Find $P(0 \le Y \le$ [...]$)$.
3.4.2. For the random variable Y with pdf $f_Y(y)$ = [...], $0 \le y \le 1$, find $P($[...] $\le Y \le 1)$.
3.4.3. Let $f_Y(y)$ = [...], $-1 \le y \le 1$. Find $P(|Y -$ [...]$| <$ [...]$)$. Draw a graph of $f_Y(y)$ and show the area representing the desired probability.
3.4.4. For persons infected with a certain form of malaria, the length of time spent in remission is described by the continuous pdf $f_Y(y) = \frac{1}{9}y^2$, $0 \le y \le 3$, where Y is measured in years. What is the probability that a malaria patient's remission lasts longer than one year?
3.4.5. The length of time, Y, that a customer spends in line at a bank teller's window before being served is described by the exponential pdf $f_Y(y) = 0.2e^{-0.2y}$, $y \ge 0$.
(a) What is the probability that a customer will wait more than 10 minutes?
(b) Suppose the customer will leave if the wait is more than 10 minutes. Assume that the customer goes to the bank twice next month. Let the random variable X be the number of times the customer leaves without being served. Calculate $p_X(1)$.
3.4.6. Let n be a positive integer. Show that $f_Y(y) = (n + 2)(n + 1)y^n(1 - y)$, $0 \le y \le 1$, is a pdf.
3.4.7. Find the cdf for the random variable Y given in Question 3.4.1. Calculate $P(0 \le Y \le$ [...]$)$ using $F_Y(y)$.
3.4.8. If Y is an exponential random variable, $f_Y(y) = \lambda e^{-\lambda y}$, $y \ge 0$, find $F_Y(y)$.
3.4.9. If the pdf for Y is
$$f_Y(y) = \begin{cases} 0, & |y| > 1 \\ 1 - |y|, & |y| \le 1 \end{cases}$$
find and graph $F_Y(y)$.
3.4.10. A continuous random variable Y has a cdf given by $F_Y(y) = 0$ for $y < 0$; $F_Y(y)$ = [...] for $0 \le y < 1$; and $F_Y(y) = 1$ for $y \ge 1$. Find $P($[...] $< Y \le$ [...]$)$ two ways: first, using the cdf and second, using the pdf.
3.4.11. A random variable Y has cdf
$$F_Y(y) = \begin{cases} 0, & y < 1 \\ \ln y, & 1 \le y \le e \\ 1, & e < y \end{cases}$$
Find
(a) $P(Y < 2)$
(b) $P(2 < Y \le 2\tfrac{1}{2})$
(c) $P(2 < Y < 2\tfrac{1}{2})$
(d) $f_Y(y)$
3.4.12. The cdf for a random variable Y is defined by $F_Y(y) = 0$ for $y < 0$; $F_Y(y)$ = [...] for $0 \le y \le 1$; and $F_Y(y) = 1$ for $y > 1$. Find $P($[...] $\le Y \le$ [...]$)$ by integrating $f_Y(y)$.
3.4.13. Suppose $F_Y(y)$ = [...] $+ y^3)$, $0 \le y \le 2$. Find $f_Y(y)$.
3.4.14. In a certain country, the distribution of a family's disposable income, Y, is described by the pdf $f_Y(y) = ye^{-y}$, $y \ge 0$. Find the median of the income distribution; that is, find the value m such that $F_Y(m) = 0.5$.
3.4.15. Let Y be the random variable described in Question 3.4.3. Define W = 3Y [...] 2. Find $f_W(w)$. For which values of w is $f_W(w) \ne 0$?
3.4.16. Suppose that $f_Y(y)$ is a continuous and symmetric pdf, where symmetry is the property that $f_Y(y) = f_Y(-y)$ for all y. Show that $P(-a \le Y \le a) = 2F_Y(a) - 1$.
3.4.17. Let Y be a random variable denoting the age at which a piece of equipment fails. In reliability theory, the probability that an item fails at time y given that it has survived until time y is called the hazard rate, $h(y)$. In terms of the pdf and cdf,
$$h(y) = \frac{f_Y(y)}{1 - F_Y(y)}$$
Find $h(y)$ if Y has an exponential pdf (see Question 3.4.8).
EXPECTED VALUES
Probability density functions, as we have already seen, provide a global overview of a random variable's behavior. If X is discrete, $p_X(k)$ gives $P(X = k)$ for all k; if Y is continuous, and A is any interval, or countable union of intervals, $P(Y \in A) = \int_A f_Y(y)\,dy$. Detail that explicit, though, is not always necessary, or even helpful. There are times when a more prudent strategy is to focus the information contained in a pdf by summarizing certain of its features with single numbers.
The first such feature that we will examine is central tendency, a term referring to the "average" value of a random variable. Consider the pdf's $p_X(k)$ and $f_Y(y)$ pictured in Figure 3.5.1. Although we obviously cannot predict with certainty what values any future X's and Y's will take on, it seems clear that X values will tend to lie somewhere near $\mu_X$, and Y values, somewhere near $\mu_Y$. In some sense, then, we can characterize $p_X(k)$ by $\mu_X$, and $f_Y(y)$ by $\mu_Y$.
FIGURE 3.5.1 (a discrete pdf $p_X(k)$ centered near $\mu_X$ and a continuous pdf $f_Y(y)$ centered near $\mu_Y$)
The most frequently used measure for describing central tendency, that is, for quantifying $\mu_X$ and $\mu_Y$, is the expected value. Discussed at some length in this section and in Section [...], the expected value of a random variable is a slightly more abstract formulation of what we are already familiar with in simple discrete settings as the arithmetic average. Here, though, the values included in the average are "weighted" by the pdf.
Gambling affords a familiar illustration of the notion of an expected value. Consider the game of roulette. After bets are placed, the croupier spins the wheel and declares one of thirty-eight numbers, 00, 0, 1, 2, ..., 36, to be the winner. Disregarding what seems to be a perverse tendency of many roulette wheels to land on numbers for which no money has been wagered, we will assume that each of these thirty-eight numbers is equally likely (although only the eighteen numbers 1, 3, 5, ..., 35 are considered to be odd and only the eighteen numbers 2, 4, 6, ..., 36 are considered to be even). Suppose that our particular bet (at "even money") is $1 on odds. If the random variable X denotes our winnings, then X takes on the value 1 if an odd number occurs, and -1 otherwise. Therefore,
$$p_X(1) = P(X = 1) = \frac{18}{38} = \frac{9}{19}$$
and
$$p_X(-1) = P(X = -1) = \frac{20}{38} = \frac{10}{19}$$
Then $\frac{9}{19}$ of the time we will win one dollar and $\frac{10}{19}$ of the time we will lose one dollar. Intuitively, then, if we persist in this foolishness, we stand to lose, on the average, a little more than five cents each time we play the game:
$$\text{"expected" winnings} = \$1 \cdot \frac{9}{19} + (-\$1) \cdot \frac{10}{19} = -\$0.053 \approx -5\text{¢}$$
The number -0.053 is called the expected value of X.
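The weighted average just computed can be written as a small function that works for any discrete pdf given as a table of values and probabilities. This is a sketch; the name `expected_value` and the dictionary representation are illustrative choices.

```python
from fractions import Fraction

def expected_value(pdf):
    """Sum of k * p_X(k) over a dict mapping each value k to its probability."""
    assert sum(pdf.values()) == 1, "probabilities must sum to one"
    return sum(k * p for k, p in pdf.items())

# Even-money $1 bet on odds: 18 winning numbers out of 38.
roulette = {1: Fraction(18, 38), -1: Fraction(20, 38)}
ev = expected_value(roulette)

print(ev, float(ev))
```

Working in exact fractions gives the expected winnings as -1/19 of a dollar, which is the source of the figure of roughly five cents lost per play.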
Physically, an expected value can be thought of as a center of gravity. Here, for example, imagine two bars of height $\frac{10}{19}$ and $\frac{9}{19}$ positioned along a weightless X-axis at the points -1 and +1, respectively (see Figure 3.5.2). If a fulcrum were placed at the point -0.053, the system would be in balance, implying that we can think of that point as marking off the center of the random variable's distribution.
FIGURE 3.5.2 (bars of height 10/19 at -1 and 9/19 at +1, balanced on a fulcrum at -0.053)
If X is a discrete random variable taking on each of its n values with the same probability, the expected value of X is simply the everyday notion of an arithmetic average or mean:
$$\text{expected value of } X = \sum_{\text{all } k} k \cdot \frac{1}{n} = \frac{1}{n} \sum_{\text{all } k} k$$
Extending this idea to a discrete X described by an arbitrary pdf, $p_X(k)$, gives
$$\text{expected value of } X = \sum_{\text{all } k} k \cdot p_X(k) \qquad (3.5.1)$$
For a continuous random variable, Y, the summation in Equation 3.5.1 is replaced by an integration and $k \cdot p_X(k)$ becomes $y \cdot f_Y(y)$.
Definition 3.5.1. Let X be a discrete random variable with probability function $p_X(k)$. The expected value of X is denoted $E(X)$ (or sometimes $\mu$ or $\mu_X$) and is given by
$$E(X) = \mu = \mu_X = \sum_{\text{all } k} k \cdot p_X(k)$$
Similarly, if Y is a continuous random variable with pdf $f_Y(y)$,
$$E(Y) = \mu = \mu_Y = \int_{-\infty}^{\infty} y \cdot f_Y(y)\,dy$$
Comment. We assume that both the sum and the integral in Definition 3.5.1 converge absolutely:
$$\sum_{\text{all } k} |k|\,p_X(k) < \infty \qquad \int_{-\infty}^{\infty} |y|\,f_Y(y)\,dy < \infty$$
If not, we say that the random variable has no finite expected value. One immediate reason for requiring absolute convergence is that a convergent sum that is not absolutely convergent depends on the order in which the terms are added, and order should obviously not be a consideration when defining an average.
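EXAMPLE 3.5.1
Suppose X is a binomial random variable with $p = \frac{5}{9}$ and $n = 3$. Then $p_X(k) = P(X = k) = \binom{3}{k}\left(\frac{5}{9}\right)^k\left(\frac{4}{9}\right)^{3-k}$, $k = 0, 1, 2, 3$. What is the expected value of X?
Applying Definition 3.5.1 gives
$$E(X) = \sum_{k=0}^{3} k \cdot \binom{3}{k}\left(\frac{5}{9}\right)^k\left(\frac{4}{9}\right)^{3-k} = (0)\left(\frac{64}{729}\right) + (1)\left(\frac{240}{729}\right) + (2)\left(\frac{300}{729}\right) + (3)\left(\frac{125}{729}\right) = \frac{1215}{729} = \frac{5}{3} = 3\left(\frac{5}{9}\right)$$
Comment. Notice that the expected value here reduces to five-thirds, which can be written as three times five-ninths, the latter two factors being n and p, respectively. As the next theorem proves, that relationship is not a coincidence.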
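The term-by-term sum in Example 3.5.1 can be reproduced exactly with rational arithmetic. A minimal sketch, where `binomial_mean` is an illustrative name and the sum is taken directly from the definition of expected value:

```python
from fractions import Fraction
from math import comb

def binomial_mean(n, p):
    """E(X) computed term by term from the binomial pdf."""
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

n, p = 3, Fraction(5, 9)
mean = binomial_mean(n, p)

print(mean)  # 5/3
```

Trying other values of n and p in the same function gives n times p each time, which is the pattern the comment above points to.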
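Theorem 3.5.1. Suppose X is a binomial random variable with parameters n and p. Then $E(X) = np$.
Proof. According to Definition 3.5.1, $E(X)$ for a binomial random variable is the sum
$$E(X) = \sum_{k=0}^{n} k \cdot \binom{n}{k} p^k (1 - p)^{n-k} = \sum_{k=0}^{n} \frac{k \cdot n!}{k!(n - k)!}\, p^k (1 - p)^{n-k} \qquad (3.5.2)$$
At this point, a trick is called for. If $E(X) = \sum_{\text{all } k} g(k)$ can be factored in such a way that $E(X) = h \sum_{\text{all } k} p_{X^*}(k)$, where $p_{X^*}(k)$ is the pdf for some random variable $X^*$, then $E(X) = h$, since the sum of a pdf over its entire range is one. Here, suppose that np is factored out of Equation 3.5.2. Then
$$E(X) = np \sum_{k=1}^{n} \frac{(n - 1)!}{(k - 1)!(n - k)!}\, p^{k-1} (1 - p)^{n-k} = np \sum_{k=1}^{n} \binom{n - 1}{k - 1} p^{k-1} (1 - p)^{n-k}$$
Now, let $j = k - 1$. It follows that
$$E(X) = np \sum_{j=0}^{n-1} \binom{n - 1}{j} p^j (1 - p)^{n-j-1}$$
Finally, letting $m = n - 1$ gives
$$E(X) = np \sum_{j=0}^{m} \binom{m}{j} p^j (1 - p)^{m-j}$$
and, since the value of the sum is 1 (why?),
$$E(X) = np \qquad (3.5.3)$$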
Comment. The statement of Theorem 3.5.1 should come as no surprise. If a multiple-choice test, for example, has one hundred questions, each with five possible answers, we would "expect" to get twenty correct, just by guessing. But if the random variable X denotes the number of correct answers (out of one hundred), $20 = E(X) = 100\left(\frac{1}{5}\right) = np$.
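EXAMPLE 3.5.2
An urn contains nine chips, five red and four white. Three are drawn out at random without replacement. Let X denote the number of red chips in the sample. Find $E(X)$.
From Section 3.2, we recognize X to be a hypergeometric random variable, where
$$P(X = k) = p_X(k) = \frac{\binom{5}{k}\binom{4}{3 - k}}{\binom{9}{3}}, \quad k = 0, 1, 2, 3$$
Therefore,
$$E(X) = \sum_{k=0}^{3} k \cdot \frac{\binom{5}{k}\binom{4}{3 - k}}{\binom{9}{3}} = (0)\left(\frac{4}{84}\right) + (1)\left(\frac{30}{84}\right) + (2)\left(\frac{40}{84}\right) + (3)\left(\frac{10}{84}\right) = \frac{5}{3}$$
Comment. As was true in Example 3.5.1, the value found here for $E(X)$ suggests a general formula, in this case for the expected value of a hypergeometric random variable.
Theorem 3.5.2. Suppose X is a hypergeometric random variable with parameters r, w, and n. That is, suppose an urn contains r red balls and w white balls. A sample of size n is drawn simultaneously from the urn. Let X be the number of red balls in the sample. Then
$$E(X) = \frac{rn}{r + w}$$
Proof. See Question [...].
Comment. Let p represent the proportion of red balls in an urn, that is, $p = \frac{r}{r + w}$. The formula, then, for the expected value of a hypergeometric random variable has the same structure as the formula for the expected value of a binomial random variable:
$$E(X) = \frac{rn}{r + w} = n \cdot \frac{r}{r + w} = np$$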
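The formula $E(X) = rn/(r + w)$ can be compared with a direct sum over the hypergeometric pdf. The following is a sketch only; `hypergeometric_mean` is an illustrative name, and the second set of parameters in the comparison is arbitrary.

```python
from fractions import Fraction
from math import comb

def hypergeometric_mean(r, w, n):
    """E(X) for the number of red chips in a sample of n drawn without
    replacement from r red and w white chips, summed from the pdf."""
    total = comb(r + w, n)
    return sum(
        k * Fraction(comb(r, k) * comb(w, n - k), total)
        for k in range(max(0, n - w), min(r, n) + 1)
    )

mean = hypergeometric_mean(r=5, w=4, n=3)

# Compare the direct sum with rn / (r + w).
print(mean, Fraction(5 * 3, 5 + 4))
```

The summation limits skip values of k that are impossible, such as drawing more red chips than the urn contains, so the function also handles samples larger than either color count.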
EXAMPLE 3.5.3
Among the more common versions of the "numbers" racket is a game called D.J., its name deriving from the fact that the winning ticket is determined from Dow Jones averages. Three sets of stocks are used: Industrials, Transportations, and Utilities. Traditionally, the three are quoted at two different times, 11 A.M. and noon. The last digits of the earlier quotation are arranged to form a three-digit number; the noon quotation generates a second three-digit number, formed the same way. These two numbers are then added together and the last three digits of that sum become the winning pick. Figure 3.5.3 shows a set of quotations for which 906 would be declared the winner.
FIGURE 3.5.3 (11 A.M. and noon quotations for Industrials, Transportation, and Utilities; the two three-digit numbers formed from them are added, and the last three digits of the sum, 906, give the winning number)
The payoff in D.J. is 700 to 1. Suppose that we bet $5. How much do we stand to win, or lose, on the average?
Let p denote the probability of our number being the winner and let X denote our earnings. Then
$$X = \begin{cases} \$3500 & \text{with probability } p \\ -\$5 & \text{with probability } 1 - p \end{cases}$$
and
$$E(X) = \$3500 \cdot p - \$5 \cdot (1 - p)$$
Our intuition would suggest (and this time it would be correct!) that each of the possible winning numbers, 000 through 999, is equally likely. That being the case, $p = 1/1000$ and
$$E(X) = \$3500 \cdot \left(\frac{1}{1000}\right) - \$5 \cdot \left(\frac{999}{1000}\right) = -\$1.495$$
On the average, then, we lose about $1.50 on a $5.00 bet.
EXAMPLE 3.5.4
Suppose that fifty people are to be given a blood test to see who has a certain disease. The obvious laboratory procedure is to examine each person's blood individually, meaning that fifty tests would eventually be run. An alternative strategy is to divide each person's blood sample into two parts, say, A and B. All of the A's would then be mixed together and treated as one sample. If that "pooled" sample proved to be negative for the disease, all fifty individuals must necessarily be free of the infection, and no further testing would need to be done. If the pooled sample gave a positive reading, of course, all fifty B samples would have to be analyzed separately. Under what conditions would it make sense for a laboratory to consider pooling the fifty samples?
In principle, the pooling strategy is preferable (i.e., more economical) if it can substantially reduce the number of tests that need to be performed. Whether or not it can do so depends ultimately on the probability p that a person is infected with the disease.
Let the random variable X denote the number of tests that will have to be performed if the samples are pooled. Clearly,
$$X = \begin{cases} 1 & \text{if none of the fifty is infected} \\ 51 & \text{if at least one of the fifty is infected} \end{cases}$$
But
$$P(X = 1) = p_X(1) = P(\text{none of the fifty is infected}) = (1 - p)^{50}$$
(assuming independence), and
$$P(X = 51) = p_X(51) = 1 - P(X = 1) = 1 - (1 - p)^{50}$$
Therefore,
$$E(X) = 1 \cdot (1 - p)^{50} + 51\left[1 - (1 - p)^{50}\right]$$
Table 3.5.1 shows $E(X)$ as a function of p. As our intuition would suggest, the pooling strategy becomes increasingly feasible as the prevalence of the disease diminishes. If the chance of a person being infected is 1 in 1000, for example, the pooling strategy requires an average of only 3.4 tests, a dramatic improvement over the 50 tests that would be needed if the samples were tested one by one. On the other hand, if 1 in 10 individuals is infected, pooling would be clearly inappropriate, requiring more than 50 tests [$E(X) = 50.7$].
TABLE 3.5.1
p          E(X)
0.5        51.0
0.1        50.7
0.01       20.8
0.001       3.4
0.0001      1.2
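The expected number of tests under pooling is easy to tabulate for any prevalence p. The sketch below evaluates the formula for $E(X)$ directly; the function name `expected_tests` and the `group_size` parameter are illustrative, with the default of fifty taken from the example.

```python
def expected_tests(p, group_size=50):
    """E(X) = 1 * (1 - p)^m + (m + 1) * [1 - (1 - p)^m] for a pool of m people."""
    q = (1 - p) ** group_size          # P(no one in the pool is infected)
    return 1 * q + (group_size + 1) * (1 - q)

for p in (0.5, 0.1, 0.01, 0.001, 0.0001):
    print(p, round(expected_tests(p), 2))
```

Changing `group_size` shows how the break-even prevalence moves with the size of the pool, a question the example leaves open.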
EXAMPLE 3.5.5
Consider the following game. A fair coin is flipped until the first tail appears; we win $2 if it appears on the first toss, $4 if it appears on the second toss, and, in general, \(2^k\) dollars if it first occurs on the kth toss. Let the random variable X denote our winnings. How much should we have to pay in order to have a fair game? [Note: A fair game is one where the difference between the ante and E(X) is 0.]
Known as the St. Petersburg paradox, this problem has a rather unusual answer. First, note that
\[
p_X(2^k) = P(X = 2^k) = \frac{1}{2^k}, \quad k = 1, 2, \ldots
\]
Therefore,
\[
E(X) = \sum_{k=1}^{\infty} 2^k \cdot \frac{1}{2^k} = 1 + 1 + 1 + \cdots
\]
which is a divergent sum. That is, X does not have a finite expected value, so in order for this game to be fair, our ante would have to be an infinite amount of money.
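One way to see the divergence concretely is to cap the game at a maximum number of tosses and watch the partial sums grow. The sketch below does this with exact fractions; the cap `max_tosses` is an assumption introduced only for illustration and is not part of the original game.

```python
from fractions import Fraction


def truncated_expected_winnings(max_tosses):
    """Partial sum of 2^k * P(first tail on toss k) for k = 1..max_tosses."""
    return sum(
        Fraction(2) ** k * Fraction(1, 2) ** k  # each term equals exactly 1
        for k in range(1, max_tosses + 1)
    )


# The partial sums grow without bound, one dollar per additional toss allowed.
for n in (10, 100, 1000):
    print(n, truncated_expected_winnings(n))
```

Each additional allowed toss adds exactly one dollar to the partial sum, so no finite ante can balance the untruncated game.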
Comment. Mathematicians have been trying to "explain" the St. Petersburg paradox for almost two hundred years. The answer seems clearly absurd (no gambler would consider paying even $25 to play such a game, much less an infinite amount), yet the computations involved in showing that X has no finite expected value are unassailably correct. Where the difficulty lies, according to one common theory, is with our inability to put in perspective the very small probabilities of winning very large payoffs. Furthermore, the problem assumes that our opponent has infinite capital, which is an impossible state of affairs. We get a much more reasonable answer for E(X) if the stipulation is added that our winnings can be at most, say, $1000 (see Question 3.5.19) or if the payoffs are assigned according to some formula other than \(2^k\) (see Question 3.5.20).
Comment. There are two important lessons to be learned from the St. Petersburg paradox. First is the realization that E(X) is not necessarily a meaningful characterization of the "location" of a distribution. A question later in this section describes another situation where the formal computation of E(X) gives a similarly inappropriate answer. Second, we need to be aware that the notion of expected value is not necessarily synonymous with the concept of worth. Just because a game, for example, has a positive expected value, even a very large positive expected value, does not imply that someone would want to play it. Suppose, for example, that you had the opportunity to spend your last $10,000 on a sweepstakes ticket where the prize was a billion dollars but the probability of winning was only 1 in 10,000. The expected value of such a bet would be over $90,000,
\[
E(X) = \$1{,}000{,}000{,}000 \left(\frac{1}{10{,}000}\right) + (-\$10{,}000)\left(\frac{9{,}999}{10{,}000}\right) = \$90{,}001
\]
but it is doubtful that many people would rush out to buy a ticket. (Economists have long recognized the distinction between a payoff's numerical value and its perceived desirability. They refer to the latter as utility.)
EXAMPLE 3.5.6
The distance, Y, that a molecule in a gas travels before colliding with another molecule can be modeled by the exponential pdf
\[
f_Y(y) = \frac{1}{\mu} e^{-y/\mu}, \quad y \ge 0
\]
where \(\mu\) is a positive constant known as the mean free path. Find E(Y).
Since the random variable here is continuous, its expected value is an integral:
\[
E(Y) = \int_0^{\infty} y \cdot \frac{1}{\mu} e^{-y/\mu} \, dy
\]
Let \(w = y/\mu\), so that \(dw = (1/\mu)\, dy\). Then \(E(Y) = \mu \int_0^{\infty} w e^{-w} \, dw\). Setting \(u = w\) and \(dv = e^{-w}\, dw\) and integrating by parts gives
\[
E(Y) = \mu \left[ -we^{-w} - e^{-w} \right]_0^{\infty} = \mu \qquad (3.5.4)
\]
Equation 3.5.4 shows that \(\mu\) is aptly named: it does, in fact, represent the average distance a molecule travels, free of any collisions. Nitrogen (N2), for example, at room temperature and standard atmospheric pressure has \(\mu\) = 0.00005 cm. An N2 molecule, then, travels that far before colliding with another N2 molecule, on the average.
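EXAMPLE 3.5.7
One continuous pdf that has a number of interesting applications in physics is the Rayleigh distribution, where the pdf is given by
\[
f_Y(y) = \frac{y}{a^2} e^{-y^2/2a^2}, \quad a > 0; \quad 0 \le y < \infty \qquad (3.5.5)
\]
Calculate the expected value for a random variable having a Rayleigh distribution.
From Definition 3.5.1,
\[
E(Y) = \int_0^{\infty} y \cdot \frac{y}{a^2} e^{-y^2/2a^2} \, dy
\]
Let \(v = y/(\sqrt{2}\,a)\). Then
\[
E(Y) = 2\sqrt{2}\,a \int_0^{\infty} v^2 e^{-v^2} \, dv
\]
The integrand here is a special case of the general form \(v^{2k} e^{-v^2}\). For k = 1,
\[
\int_0^{\infty} v^2 e^{-v^2} \, dv = \frac{1}{4}\sqrt{\pi}
\]
Therefore,
\[
E(Y) = 2\sqrt{2}\,a \cdot \frac{1}{4}\sqrt{\pi} = a\sqrt{\pi/2}
\]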
Comment. The pdf is named for John William Strutt, Baron Rayleigh, the nineteenth- and twentieth-century British physicist who showed that Equation 3.5.5 is the solution to a problem arising in the study of wave motion. If two waves are superimposed, it is well known that the height of the resultant at any time t is simply the algebraic sum of the corresponding heights of the waves being added (see Figure 3.5.4). Seeking to extend that notion, Rayleigh posed the following question: If n waves, each having the same amplitude h and the same wavelength, are superimposed randomly with respect to phase, what can we say about the amplitude R of the resultant? Clearly, R is a random variable, its value depending on the particular collection of phase angles represented by the sample. What Rayleigh was able to show in his 1880 paper (173) is that when n is large, the probabilistic behavior of R is described by the pdf
\[
f_R(r) = \frac{2r}{nh^2} e^{-r^2/nh^2}, \quad r > 0
\]
which is just a special case of Equation 3.5.5 with \(a = \sqrt{nh^2/2}\).

FIGURE 3.5.4 (two superimposed waves and their resultant)
A Second Measure of Central Tendency: The Median
While the expected value is the most frequently used measure of a random variable's central tendency, it does have a weakness that sometimes makes it misleading and inappropriate. Specifically, if one or several possible values of a random variable are either much smaller or much larger than all the others, the value of \(\mu\) can be distorted in the sense that it no longer reflects the center of the distribution in any meaningful way. For example, suppose a small community consists of a homogeneous group of middle-range salary earners, and then Bill Gates moves to town. Obviously, the town's average salary before and after the multibillionaire arrives will be quite different, even though he represents only one new value of the "salary" random variable.
It would be helpful to have a measure of central tendency that was not so sensitive to "outliers" or to probability distributions that are markedly skewed. One such measure is the median, which, in effect, divides the area under a pdf into two equal areas.
Definition 3.5.2. If X is a discrete random variable, the median, m, is that point for which \(P(X < m) = P(X > m)\). In the event that \(P(X \le m) = 0.5\) and \(P(X \ge m') = 0.5\), the median is defined to be the arithmetic average, \((m + m')/2\).
If Y is a continuous random variable, its median is the solution to the integral equation \(\int_{-\infty}^{m} f_Y(y)\, dy = 0.5\).
EXAMPLE 3.5.8
If a random variable's pdf is symmetric, both \(\mu\) and m will be equal. Should \(p_X(k)\) or \(f_Y(y)\) not be symmetric, though, the difference between the expected value and the median can be considerable, especially if the asymmetry takes the form of extreme skewness. The situation described here is a case in point.
Soft-glow makes a 60-watt light bulb that is advertised to have an average life of one thousand hours. Assuming that that performance claim is valid, is it reasonable for consumers to conclude that the Soft-glow bulbs they buy will last for approximately one thousand hours?
No! If the average life of a bulb is one thousand hours, the (continuous) pdf, \(f_Y(y)\), modeling the length of time, Y, that it remains lit before burning out is likely to have the form
\[
f_Y(y) = 0.001e^{-0.001y}, \quad y > 0 \qquad (3.5.6)
\]
(for reasons explained in Chapter 4). But Equation 3.5.6 is a very skewed pdf, having a shape much like the curve drawn in Figure 3.4.8. The median for such a distribution will lie considerably to the left of the mean.
More specifically, the median lifetime for these bulbs, according to Definition 3.5.2, is the value m for which
\[
\int_0^m 0.001e^{-0.001y} \, dy = 0.5
\]
But \(\int_0^m 0.001e^{-0.001y}\, dy = 1 - e^{-0.001m}\). Setting the latter equal to 0.5 implies that
\[
m = (1/{-0.001}) \ln(0.5) = 693
\]
So, even though the average life of one of these bulbs is 1000 hours, there is a 50% chance that the one you buy will last less than 693 hours.
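The gap between the mean and the median of an exponential pdf can be checked numerically. The sketch below assumes the rate parameterization \(f_Y(y) = \lambda e^{-\lambda y}\) with \(\lambda = 0.001\); the function names are illustrative.

```python
import math


def exponential_mean(rate):
    """Mean of the pdf rate * exp(-rate * y), y > 0."""
    return 1.0 / rate


def exponential_median(rate):
    """Solve 1 - exp(-rate * m) = 0.5 for m."""
    return math.log(2) / rate


rate = 0.001  # bulbs with an average life of 1000 hours
mean = exponential_mean(rate)
median = exponential_median(rate)

# Probability that a bulb burns out before reaching the advertised mean life.
prob_below_mean = 1 - math.exp(-rate * mean)

print(f"mean = {mean:.0f} hours, median = {median:.0f} hours")
print(f"P(Y < mean) = {prob_below_mean:.3f}")
```

The last quantity shows that well over half of the bulbs fail before the advertised average, which is the practical consequence of the skewness.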
QUESTIONS
3.5.1. Recall the game of Keno described in Example 3.2.5. The following are all the payoffs
on a $1 wager where the player has bet on 10 numbers. Calculate E(X), where the
random variable X denotes the amount of money won.
Number of Correct Guesses
Payoff
1
2
18
<5
5
6
7
Probability
.935
.0514
8
.0115
.0016
1.35 X 10-4
9
X 10-6
10,000
10
1.12 X 10-7
3.5.2. Cracker Jack first appeared in 1893 at the Chicago World's Fair. Enormously popular ever since (250 million boxes are sold each year), the snack owes more than a little of its success, especially with children, to the toy included in each box. When a new Nutty Deluxe flavor was introduced in the mid-1990s, that familiar marketing gimmick was raised to a new level. Placed in one box was a certificate redeemable for a $10,000 ring; in 50 other boxes were certificates for a Breakfast at Tiffany's video (a movie in which the heroine, Holly Golightly, finds her engagement ring in a Cracker Jack box); the usual toys and prizes were put in all the other boxes. Calculate the expected value of the prize in a box of Nutty Deluxe Cracker Jack. Assume that 5 million boxes were sold that first year. Also, assume that each video was
worth $30 and each other
3.5.3. The pdf describing the daily
earned by Acme Industries was derived in
Example 3.3.7. Find the company's average
3.5.4. In the game of red ball, lwo drawings are made without epllaC(~m;ent from a bowl that has
amount won is determined
four white ping-pong balls and two red
balls.
a
a player can opt to be paid
how many of the red balls are selected.
the game, which would
under either Rule A or Rule B, as shown. If you were
you choose? Why?
B
A
No. of Red
Balls Drawn
0
1
2
Payoff
0
$2
$10
No. of Red
Balls Drawn
Payoff
0
1
2
$1
$20
°
3.5.5. Recall the
"'''''''''''0''''''' launched by the Wipe Your Feet
company described in .... v!l.n1nil'
On the average, how many new customers would
that effurt identify?
calls would they have to make in order to find an avet'age
of 100 new customers?
3.5.6. A manufacturer has 100 memory chips in stock, 4% of which are likely to be defective (based on past experience). A random sample of 20 chips is selected and shipped to a factory that assembles laptops. Let X denote the number of computers that receive faulty memory chips. Find E(X).
3.5.7. Records show that 642 new students have just entered a certain Florida school district. Of those 642, a total of 125 are not adequately vaccinated. The district's physician has scheduled a day for students to receive whatever shots they might need. On any given day, though, 12% of the district's students are likely to be absent. How many new students, then, can be expected to remain inadequately vaccinated?
3.5.8. Calculate E(Y) for the following pdfs:
(a) \(f_Y(y) = 3(1 - y)^2, \quad 0 \le y \le 1\)
(b) \(f_Y(y) = 4ye^{-2y}, \quad y \ge 0\)
(c) fy(y) =
i'
1.
( 0,
0::; y ::; 1
2::; y
.s 3
elsewhere
(d) \(f_Y(y) = \sin y, \quad 0 \le y \le \pi/2\)
3.5.9. Recall Question 3.4.4, where the length of time Y (in years) that a malaria patient spends in remission is described by the pdf \(f_Y(y) = \frac{1}{9}y^2,\ 0 \le y \le 3\). What is the average length of time that a patient spends in remission?
3.5.10. Let the random variable Y have the uniform distribution over [a, b]; that is, \(f_Y(y) = \frac{1}{b - a}\) for \(a \le y \le b\). Find E(Y). Also, deduce the value of E(Y), knowing that the expected value is the center of gravity of \(f_Y(y)\).
3.5.11. Show that the expected value associated with the exponential distribution, \(f_Y(y) = \lambda e^{-\lambda y},\ y > 0\), is \(1/\lambda\), where \(\lambda\) is a positive constant.
3.5.12. Show that
\[
f_Y(y) = \frac{1}{y^2}, \quad y \ge 1
\]
is a valid pdf but that Y does not have a finite expected value.
3.5.13. Based on recent experience, 10-year-old passenger cars going through a motor vehicle inspection station have an 80% chance of passing the emissions test. Suppose that 200 such cars will be checked out next week. Write two formulas that show the number of cars that are expected to pass.
3.5.14. Suppose that 15 observations are chosen at random from the pdf \(f_Y(y) = 3y^2,\ 0 \le y \le 1\). Let X denote the number that lie in the interval \((\frac{1}{2}, 1)\). Find E(X).
3.5.15. A city has 74,806 registered automobiles. Each is required to display a bumper decal showing that the owner paid an annual wheel tax of $50. By law, new decals need to be purchased during the month of the owner's birthday. How much wheel tax revenue can the city expect to receive in November?
3.5.16. Regulators have found that 23 of the investment companies that filed for bankruptcy
in the
five
failed because of fraud. not
reasons related to the eOOflomy.
Suppose that
additional
wiU be added to the bankruptcy rolls during
next quarter. How many of those failures are likely to be attributed to fraud?
3.5.17. An urn contains four chips numbered 1 through 4. Two are drawn without replacement.
Let the random variable X denote the
of the two.
E(X).
3.5.18. A fair coin is tossed three times. Let the random variable X denote the total number of
that appear
the number heads that
on the
and third tosses.
E(X).
3.5.19. How much would you have to ante to make the St. Petersburg game "fair" (recall Example 3.5.5) if the most you could win was $1000? That is, the payoffs are \(2^k\) dollars for \(1 \le k \le 9\), and $1000 for \(k \ge 10\).
3.5.20. For the St. Petersburg problem (Example 3.5.5), find the expected payoff if
(a) the amounts won are \(c^k\) instead of \(2^k\), where 0 < c < 2.
(b) the amounts won are \(\log 2^k\). [This was a modification suggested by D. Bernoulli (a nephew of James Bernoulli) to take into account the decreasing marginal utility of money: the more you have, the less useful a bit more is.]
3.5.21. A fair die is rolled 3 times. Let X denote the number of different faces showing, X = 1, 2, 3. Find E(X).
3.5.22. Two distinct integers are chosen at random from the first five positive integers. Compute the expected value of the absolute value of the difference of the two numbers.
3.5.23. Suppose that two evenly matched teams are playing in the World Series. On the average, how many games will be played? (The winner is the first team to get four victories.) Assume that each game is an independent event.
3.5.24. An urn contains one white chip and one black chip. A chip is drawn at random. If it is white, the game is over; if it is black, that chip and another black one are put into the urn. Then another chip is drawn at random from the "new" urn and the same rules for ending or continuing the game are followed (if the chip is white, the game is over; if the chip is black, it is placed back in the urn, together with another chip of the same color). The drawings continue until a white chip is selected. Show that the expected number of drawings necessary to get a white chip is not finite.
3.5.25. A random sample of size n is drawn without replacement from an urn containing r red chips and w white chips. Define the random variable X to be the number of red chips in the sample. Use the summation technique described in Theorem 3.5.1 to prove that \(E(X) = rn/(r + w)\).
3.5.26. Given that X is a nonnegative, integer-valued random variable, show that
\[
E(X) = \sum_{k=1}^{\infty} P(X \ge k)
\]
Expected Value of a Function of a Random Variable
There are many situations that call for finding the expected value of a function of a random variable, say, Y = g(X). One common example would be change of scale problems, where g(X) = aX + b for constants a and b. Sometimes the pdf of the new random variable Y can be easily determined, in which case E(Y) can be calculated by simply applying Definition 3.5.1. Often, though, \(f_Y(y)\) can be difficult to determine, depending on the complexity of g(X). Fortunately, Theorem 3.5.3 allows us to calculate the expected value of Y without knowing the pdf for Y.
Theorem 3.5.3. Suppose X is a discrete random variable with pdf \(p_X(k)\). Let g(X) be a function of X. Then the expected value of the random variable g(X) is given by
\[
E[g(X)] = \sum_{\text{all } k} g(k) \cdot p_X(k)
\]
provided that \(\sum_{\text{all } k} |g(k)|\, p_X(k) < \infty\).
If Y is a continuous random variable with pdf \(f_Y(y)\), and if g(Y) is a continuous function, then the expected value of the random variable g(Y) is
\[
E[g(Y)] = \int_{-\infty}^{\infty} g(y) \cdot f_Y(y) \, dy
\]
provided that \(\int_{-\infty}^{\infty} |g(y)|\, f_Y(y) \, dy < \infty\).
Proof. We will prove the result for the discrete case. See the references for details showing how the argument is modified when the pdf is continuous. Let W = g(X). The set of all possible k-values, \(k_1, k_2, \ldots\), will give rise to a set of w-values, \(w_1, w_2, \ldots\), where, in general, more than one k may be associated with a given w. Let \(S_j\) be the set of k's for which \(g(k) = w_j\) [so \(\cup_j S_j\) is the entire set of k-values for which \(p_X(k)\) is defined]. We obviously have that \(P(W = w_j) = P(X \in S_j)\), and we can write
\[
\begin{aligned}
E(W) &= \sum_j w_j \cdot P(W = w_j) = \sum_j w_j \cdot P(X \in S_j) \\
&= \sum_j \sum_{k \in S_j} w_j \cdot p_X(k) \\
&= \sum_j \sum_{k \in S_j} g(k)\, p_X(k) \quad \text{(why?)} \\
&= \sum_{\text{all } k} g(k)\, p_X(k)
\end{aligned}
\]
Since it is being assumed that \(\sum_{\text{all } k} |g(k)|\, p_X(k) < \infty\), the statement of the theorem holds.
Corollary. For any random variable W, E(aW + b) = aE(W) + b, where a and b are constants.
Proof. Suppose W is continuous; the proof for the discrete case is similar. By Theorem 3.5.3, \(E(aW + b) = \int_{-\infty}^{\infty} (aw + b) f_W(w) \, dw\), but the latter can be written \(a \int_{-\infty}^{\infty} w \cdot f_W(w) \, dw + b \int_{-\infty}^{\infty} f_W(w) \, dw = aE(W) + b \cdot 1 = aE(W) + b\).
EXAMPLE 3.5.9
Suppose that X is a random variable whose pdf is nonzero only for the three values -2, 1, and +2:

k      p_X(k)
-2     5/8
1      1/8
2      2/8

Let \(W = g(X) = X^2\). Verify the statement of Theorem 3.5.3 by computing E(W) two ways: first, by finding \(p_W(w)\) and summing \(w \cdot p_W(w)\) over w and, second, by summing \(g(k) \cdot p_X(k)\) over k.
By inspection, the pdf for W is defined for only two values, 1 and 4:

w (= k^2)    p_W(w)
1            1/8
4            7/8

Taking the first approach to find E(W) gives
\[
E(W) = \sum_w w \cdot p_W(w) = 1 \cdot \left(\frac{1}{8}\right) + 4 \cdot \left(\frac{7}{8}\right) = \frac{29}{8}
\]
To find the expected value via Theorem 3.5.3, we take
\[
E[g(X)] = \sum_k k^2 \cdot p_X(k) = (-2)^2 \cdot \frac{5}{8} + (1)^2 \cdot \frac{1}{8} + (2)^2 \cdot \frac{2}{8}
\]
with the sum here reducing to the answer we already found, 29/8.
For this particular situation, neither approach was easier than the other. In general, that will not be the case. Finding \(p_W(w)\) is often quite difficult, and on those occasions Theorem 3.5.3 can be of great benefit.
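The two routes to E(W) in Example 3.5.9 can be mirrored in a few lines of code. This is a minimal sketch using exact fractions; the dictionary representation of the pdf and the variable names are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction

# pdf of X from Example 3.5.9
p_X = {-2: Fraction(5, 8), 1: Fraction(1, 8), 2: Fraction(2, 8)}


def g(k):
    """The transformation W = g(X) = X^2."""
    return k * k


# Route 1: build the pdf of W by pooling the k's that map to the same w,
# then average w against p_W(w).
p_W = defaultdict(Fraction)
for k, prob in p_X.items():
    p_W[g(k)] += prob
expected_via_pW = sum(w * prob for w, prob in p_W.items())

# Route 2 (Theorem 3.5.3): average g(k) against p_X(k) directly.
expected_via_theorem = sum(g(k) * prob for k, prob in p_X.items())

print(expected_via_pW, expected_via_theorem)
```

The pooling loop in Route 1 is exactly the grouping of k-values into the sets \(S_j\) used in the proof of the theorem.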
EXAMPLE 3.5.10
Suppose the amount of propellant, Y, put into a can of spray paint is a random variable with pdf
\[
f_Y(y) = 3y^2, \quad 0 < y < 1
\]
Experience has shown that the largest surface area that can be painted by a can having Y amount of propellant is twenty times the area of a circle generated by a radius of Y ft. If the Purple Dominoes, a newly formed urban gang, have just stolen their first can of spray paint, can they expect to have enough to cover a 5' x 8' subway panel with graffiti?
No. By assumption, the maximum area (in ft^2) that can be covered by a can of paint is described by the function
\[
g(Y) = 20\pi Y^2
\]
According to the second statement in Theorem 3.5.3, though, the average value for g(Y) is slightly less than the desired 40 ft^2:
\[
E[g(Y)] = \int_0^1 20\pi y^2 \cdot 3y^2 \, dy = \left. \frac{60\pi y^5}{5} \right|_0^1 = 12\pi \approx 37.7 \text{ ft}^2
\]
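EXAMPLE 3.5.11
A fair coin is tossed until a head appears. You will be given \((\frac{1}{2})^k\) dollars if that first head occurs on the kth toss. How much money can you expect to be paid?
Let the random variable X denote the toss at which the first head appears. Then
\[
p_X(k) = P(X = k) = \left(\frac{1}{2}\right)^{k-1} \cdot \frac{1}{2} = \left(\frac{1}{2}\right)^k, \quad k = 1, 2, \ldots
\]
Moreover,
\[
\begin{aligned}
E(\text{amount won}) &= E\left[\left(\tfrac{1}{2}\right)^X\right] = E[g(X)] = \sum_{\text{all } k} g(k) \cdot p_X(k) \\
&= \sum_{k=1}^{\infty} \left(\tfrac{1}{2}\right)^k \cdot \left(\tfrac{1}{2}\right)^k = \sum_{k=1}^{\infty} \left(\tfrac{1}{2}\right)^{2k} = \sum_{k=1}^{\infty} \left(\tfrac{1}{4}\right)^k \\
&= \sum_{k=0}^{\infty} \left(\tfrac{1}{4}\right)^k - 1 = \frac{1}{1 - \frac{1}{4}} - 1 \\
&= \$0.33
\end{aligned}
\]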
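EXAMPLE 3.5.12
In one of the early applications of probability to physics, James Clerk Maxwell (1831-1879) showed that the speed S of a molecule in a perfect gas has a density function given by
\[
f_S(s) = 4\sqrt{\frac{a^3}{\pi}}\, s^2 e^{-as^2}, \quad s > 0
\]
where a is a constant depending on the temperature of the gas and the mass of the particle. What is the average energy of a molecule in a perfect gas?
Let m denote the molecule's mass. Recall from physics that energy (W), mass (m), and speed (S) are related through the equation
\[
W = \frac{1}{2} mS^2 = g(S)
\]
To find E(W) we appeal to the second part of Theorem 3.5.3:
\[
\begin{aligned}
E(W) &= \int_0^{\infty} g(s) f_S(s) \, ds \\
&= \int_0^{\infty} \frac{1}{2} m s^2 \cdot 4\sqrt{\frac{a^3}{\pi}}\, s^2 e^{-as^2} \, ds \\
&= 2m\sqrt{\frac{a^3}{\pi}} \int_0^{\infty} s^4 e^{-as^2} \, ds
\end{aligned}
\]
Make the substitution \(t = as^2\). Then
\[
E(W) = \frac{m}{a\sqrt{\pi}} \int_0^{\infty} t^{3/2} e^{-t} \, dt
\]
But
\[
\int_0^{\infty} t^{3/2} e^{-t} \, dt = \left(\frac{3}{4}\right)\sqrt{\pi} \quad \text{(see Section 4.6)}
\]
so
\[
E(\text{energy}) = E(W) = \frac{m}{a\sqrt{\pi}} \left(\frac{3}{4}\right)\sqrt{\pi} = \frac{3m}{4a}
\]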
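The substitution step in Example 3.5.12 is stated without detail. A sketch of the intermediate algebra, assuming only \(t = as^2\) (so that \(s = \sqrt{t/a}\) and \(ds = dt/(2\sqrt{at})\)), runs as follows:

```latex
\begin{aligned}
\int_0^{\infty} s^4 e^{-as^2}\, ds
  &= \int_0^{\infty} \frac{t^2}{a^2}\, e^{-t}\, \frac{dt}{2\sqrt{a t}}
   = \frac{1}{2a^{5/2}} \int_0^{\infty} t^{3/2} e^{-t}\, dt, \\[4pt]
2m\sqrt{\frac{a^3}{\pi}} \cdot \frac{1}{2a^{5/2}}
  &= \frac{m\, a^{3/2}}{\sqrt{\pi}\, a^{5/2}}
   = \frac{m}{a\sqrt{\pi}} .
\end{aligned}
```

The remaining integral is the gamma function value \(\Gamma(5/2) = \frac{3}{2} \cdot \frac{1}{2} \cdot \sqrt{\pi} = \frac{3}{4}\sqrt{\pi}\), which is the value quoted in the example.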
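EXAMPLE 3.5.13
Consolidated Industries is planning to market a new product and they are trying to decide how many to manufacture. They estimate that each item sold will return a profit of m dollars; each one not sold represents an n dollar loss. Furthermore, they suspect the demand for the product, V, will have an exponential distribution,
\[
f_V(v) = \left(\frac{1}{\lambda}\right) e^{-v/\lambda}, \quad v > 0
\]
How many items should the company produce if they want to maximize their expected profit? (Assume that n, m, and \(\lambda\) are known.)
If a total of x items are made, the company's profit can be expressed as a function Q(v), where
\[
Q(v) = \begin{cases} mv - n(x - v) & \text{if } v < x \\ mx & \text{if } v \ge x \end{cases}
\]
and v is the number of items sold. It follows that their expected profit is
\[
\begin{aligned}
E[Q(V)] &= \int_0^{\infty} Q(v) \cdot f_V(v) \, dv \\
&= \int_0^x [(m + n)v - nx] \left(\frac{1}{\lambda}\right) e^{-v/\lambda} \, dv + \int_x^{\infty} mx \cdot \left(\frac{1}{\lambda}\right) e^{-v/\lambda} \, dv \qquad (3.5.7)
\end{aligned}
\]
The integration here is straightforward, though a bit tedious. Equation 3.5.7 eventually simplifies to
\[
E[Q(V)] = \lambda \cdot (m + n) - \lambda \cdot (m + n) e^{-x/\lambda} - nx
\]
To find the optimal production level, we need to solve \(dE[Q(V)]/dx = 0\) for x. But
\[
\frac{dE[Q(V)]}{dx} = (m + n) e^{-x/\lambda} - n
\]
and the latter equals zero when
\[
x = -\lambda \cdot \ln\left(\frac{n}{m + n}\right)
\]

EXAMPLE 3.5.14
A point, y, is selected at random from the interval [0, 1], dividing the line into two segments (see Figure 3.5.5). What is the expected value of the ratio of the shorter segment to the longer segment?

FIGURE 3.5.5 (the interval [0, 1] with the selected point y and the midpoint 1/2 marked)

Notice, first, that the function
\[
g(Y) = \frac{\text{shorter segment}}{\text{longer segment}}
\]
has two expressions, depending on the location of the chosen point:
\[
g(Y) = \begin{cases} y/(1 - y), & 0 \le y \le \frac{1}{2} \\ (1 - y)/y, & \frac{1}{2} < y \le 1 \end{cases}
\]
By assumption, \(f_Y(y) = 1,\ 0 \le y \le 1\), so
\[
E[g(Y)] = \int_0^{1/2} \frac{y}{1 - y} \cdot 1 \, dy + \int_{1/2}^1 \frac{1 - y}{y} \cdot 1 \, dy
\]
Writing the second integrand as (1/y - 1) gives
\[
\int_{1/2}^1 \frac{1 - y}{y} \cdot 1 \, dy = \int_{1/2}^1 \left(\frac{1}{y} - 1\right) dy = \left. (\ln y - y) \right|_{1/2}^1 = \ln 2 - \frac{1}{2}
\]
By symmetry, though, the two integrals are the same, so
\[
E[g(Y)] = 2\ln 2 - 1 = 0.39
\]
In terms of this average ratio, then, the longer segment is a little more than 2 1/2 times the length of the shorter segment.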
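The closed-form answer in Example 3.5.14 can be checked by approximating the integral in Theorem 3.5.3 numerically. The sketch below uses a simple midpoint rule; the function names and the number of subintervals are illustrative choices, not part of the example.

```python
import math


def ratio_short_to_long(y):
    """g(y): shorter segment over longer segment when [0, 1] is cut at y."""
    return min(y, 1 - y) / max(y, 1 - y)


def midpoint_expectation(g, n=200_000):
    """Midpoint-rule approximation of the integral of g(y) * 1 over [0, 1].

    The uniform pdf on [0, 1] equals 1, so this approximates E[g(Y)].
    """
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))


approx = midpoint_expectation(ratio_short_to_long)
exact = 2 * math.log(2) - 1

print(f"numerical: {approx:.6f}   closed form: {exact:.6f}")
```

Because g is handled as a single function of y, the code never needs the piecewise split used in the hand calculation.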
QUESTIONS
3.5.27. Suppose X is a binomial random variable with I! 10 and p
~. What is the expected
value of 3X - 41
3.5.28. Recall Question 3.2.4. Suppose Ihal each defective cornP()flenl discovered at the work
station costs the company $100. What is the average
cost to the company for
detective components'!
3.5.29. Let X have the probability den:si(v
x).
0 < x < 1
elsewhere
that Y
Find E(y) two different ways.
3.5.30. A tool and die company makes castings for steel stress-monitoring gauges. Their annual profit, Q, in hundreds of thousands of dollars, can be expressed as a function of product demand, y:
Suppose that the demand (in thousands) for their castings follows an exponential pdf, \(f_Y(y) = 6e^{-6y},\ y > 0\). Find the company's expected profit.
3.5.31. A box is to be constructed so that its height is 5 inches and its base is Y inches by Y inches, where Y is a random variable described by the pdf \(f_Y(y) = 6y(1 - y),\ 0 < y < 1\). Find the expected volume of the box.
3.5.32. Grades on the last Economics 301 exam were not very good. Graphed, their distribution had a shape similar to the pdf
\[
f_Y(y) = \frac{1}{5000}(100 - y), \quad 0 \le y \le 100
\]
As a way of "curving" the results, the professor announces that he will replace each person's grade, Y, with a new grade, g(Y), where \(g(Y) = 10\sqrt{Y}\). Has the professor's strategy been successful in raising the class average above 60?
3.5.33. Find E(y2) if the random
Y has the pdf pic lured below:
I
1.
o
3..5.34. The hypotenuse, Y.
a uniform pdf over
area. Do not leave
~----------------~----y
the isosceles right triangle shown is a a.."'''''''')J variable having
value of the triangle's
interval [6,1OJ. Calculate the
answer as a function of a.
o
3.5.35. An urn contains n
chip i is
to
0
numbered 1 through n. Assume that the probability of choosing
i = 1,2, .... n. H one chip is
calculate E(i), where the
random variable X denotes the number showing on
the sum ofthe
n integers is n(n + 1)/2.
Hint: Recall that
3.6 THE VARIANCE
We saw in Section 3.5 that the location of a distribution is an important characteristic and that it can be effectively measured by calculating either the mean or the median. A second feature of a distribution that warrants further scrutiny is its dispersion, that is, the extent to which its values are spread out. The two properties are totally different: Knowing a pdf's location tells us absolutely nothing about its dispersion. Table 3.6.1, for example, shows two simple discrete pdfs with the same expected value (equal to zero) but with vastly different dispersions.

TABLE 3.6.1

k      p_X1(k)        k             p_X2(k)
-1     1/2            -1,000,000    1/2
1      1/2            1,000,000     1/2
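It is not immediately obvious how the dispersion in a pdf should be quantified. Suppose that X is any discrete random variable. One seemingly reasonable approach would be to average the deviations of X from their mean, that is, calculate the expected value of \(X - \mu\). As it happens, that strategy will not work because the negative deviations will exactly cancel the positive deviations, making the numerical value of such an average always zero, regardless of the amount of spread present in \(p_X(k)\):
\[
E(X - \mu) = E(X) - \mu = \mu - \mu = 0 \qquad (3.6.1)
\]
Another possibility would be to modify Equation 3.6.1 by making all the deviations positive, that is, to replace \(E(X - \mu)\) with \(E(|X - \mu|)\). This does work, and it is sometimes used to measure dispersion, but the absolute value is somewhat troublesome mathematically: It does not have a simple arithmetic formula, nor is it a differentiable function. Squaring the deviations proves to be a much better approach.
Definition 3.6.1. The variance of a random variable is the expected value of its squared deviations from \(\mu\). If X is discrete with pdf \(p_X(k)\),
\[
\mathrm{Var}(X) = \sigma^2 = E[(X - \mu)^2] = \sum_{\text{all } k} (k - \mu)^2 \cdot p_X(k)
\]
If Y is continuous with pdf \(f_Y(y)\),
\[
\mathrm{Var}(Y) = \sigma^2 = E[(Y - \mu)^2] = \int_{-\infty}^{\infty} (y - \mu)^2 \cdot f_Y(y) \, dy
\]
[If either \(E(X^2)\) or \(E(Y^2)\) is not finite, the variance is not defined.]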
Comment. One unfortunate consequence of Definition 3.6.1 is that the units for the variance are the square of the units for the random variable: If Y is measured in inches, for example, the units for Var(Y) are inches squared. This causes obvious problems in relating the variance back to the sample values. For that reason, in applied statistics, where unit compatibility is especially important, dispersion is measured not by the variance but by the standard deviation, which is defined to be the square root of the variance. That is,
\[
\sigma = \text{standard deviation} = \begin{cases} \sqrt{\sum_{\text{all } k} (k - \mu)^2 \cdot p_X(k)} & \text{if X is discrete} \\[6pt] \sqrt{\int_{-\infty}^{\infty} (y - \mu)^2 \cdot f_Y(y) \, dy} & \text{if Y is continuous} \end{cases}
\]
Comment. The analogy between the expected value of a random variable and the center of gravity of a physical system was pointed out in Section 3.5. A similar equivalency holds between the variance and what engineers call a moment of inertia. If a set of weights having masses \(m_1, m_2, \ldots\) are positioned along a (weightless) rigid bar at distances \(r_1, r_2, \ldots\) from an axis of rotation (see Figure 3.6.1), the moment of inertia of the system is defined to be the value \(\sum_i m_i r_i^2\). Notice, though, that if the masses were the probabilities associated with a discrete random variable and if the axis of rotation were actually \(\mu\), then \(r_1, r_2, \ldots\) could be written \((k_1 - \mu), (k_2 - \mu), \ldots\) and \(\sum_i m_i r_i^2\) would be the same as the variance, \(\sum_{\text{all } k} (k - \mu)^2 \cdot p_X(k)\).

FIGURE 3.6.1 (weights positioned along a rigid bar at distances from an axis of rotation)
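Definition 3.6.1 gives a formula for calculating \(\sigma^2\) in both the discrete and continuous cases. An equivalent, but easier to use, formula is given in Theorem 3.6.1.
Theorem 3.6.1. Let W be any random variable, discrete or continuous, having mean \(\mu\) and for which \(E(W^2)\) is finite. Then
\[
\mathrm{Var}(W) = \sigma^2 = E(W^2) - \mu^2
\]
Proof. We will prove the theorem for the continuous case. The argument for discrete W is similar. In Theorem 3.5.3, let \(g(W) = (W - \mu)^2\). Then
\[
\mathrm{Var}(W) = E[(W - \mu)^2] = \int_{-\infty}^{\infty} g(w) f_W(w) \, dw = \int_{-\infty}^{\infty} (w - \mu)^2 f_W(w) \, dw
\]
Squaring out the term \((w - \mu)^2\) that appears in the integrand and using the additive property of integrals gives
\[
\begin{aligned}
\int_{-\infty}^{\infty} (w - \mu)^2 f_W(w) \, dw &= \int_{-\infty}^{\infty} (w^2 - 2\mu w + \mu^2) f_W(w) \, dw \\
&= \int_{-\infty}^{\infty} w^2 f_W(w) \, dw - 2\mu \int_{-\infty}^{\infty} w f_W(w) \, dw + \int_{-\infty}^{\infty} \mu^2 f_W(w) \, dw \\
&= E(W^2) - 2\mu^2 + \mu^2 = E(W^2) - \mu^2
\end{aligned}
\]
Note that the equality \(\int_{-\infty}^{\infty} w^2 f_W(w) \, dw = E(W^2)\) also follows from Theorem 3.5.3.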
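EXAMPLE 3.6.1
An urn contains five chips, two red and three white. Suppose that two are drawn out at random, without replacement. Let X denote the number of red chips in the sample. Find Var(X).
Note, first, that since the chips are not being replaced from drawing to drawing, X is a hypergeometric random variable. Moreover, we need to find \(\mu\), regardless of which formula is used to calculate \(\sigma^2\). In the notation of Theorem 3.5.2, r = 2, w = 3, and n = 2, so
\[
\mu = rn/(r + w) = 2 \cdot 2/(2 + 3) = 0.8
\]
To find Var(X) using Definition 3.6.1, we write
\[
\begin{aligned}
\mathrm{Var}(X) = E[(X - \mu)^2] &= \sum_{\text{all } x} (x - \mu)^2 \cdot f_X(x) \\
&= (0 - 0.8)^2 \cdot \frac{\binom{2}{0}\binom{3}{2}}{\binom{5}{2}} + (1 - 0.8)^2 \cdot \frac{\binom{2}{1}\binom{3}{1}}{\binom{5}{2}} + (2 - 0.8)^2 \cdot \frac{\binom{2}{2}\binom{3}{0}}{\binom{5}{2}} \\
&= 0.36
\end{aligned}
\]
To use Theorem 3.6.1, we would first find \(E(X^2)\). From Theorem 3.5.3,
\[
E(X^2) = \sum_{\text{all } x} x^2 \cdot f_X(x) = 0^2 \cdot \frac{\binom{2}{0}\binom{3}{2}}{\binom{5}{2}} + 1^2 \cdot \frac{\binom{2}{1}\binom{3}{1}}{\binom{5}{2}} + 2^2 \cdot \frac{\binom{2}{2}\binom{3}{0}}{\binom{5}{2}} = 1.00
\]
Then
\[
\mathrm{Var}(X) = E(X^2) - \mu^2 = 1.00 - (0.8)^2 = 0.36
\]
confirming what we calculated earlier.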
of scale formula that applied to expected
variable W. E(a W + b) = aE(W) + b.
values. For any constants a and b and any
A similar issue arises in connection with
variance of a linear transfonnation: If
Var(W) = 0'2, what is the variance of aW + b?
Theorem 3.6.2. Let W be Qny ro:ndQm varinble having menn J,L and where E(W2) is finite.
Then Var(aW + b) = a 2Var(W).
Proof. Using the same approach taken in the
of Theorem 3.6.1, it can be shown
that E[(aW + bf] = a 2E(W2) +
+ We
know from the Corollary to
Theorem 3.5.3 that E(llW + b) = aJ,L + b. Using
3.6.1, then, we can write
Va.r(aW
+
+ b)2) + b)f
+ 2abJ,L + l?] - [aJ,L + bf
2
=[a E(W2) + 2abJ,L + l?] - [a 2J,L2 + 2abJ,L + l?]
= a 2[E(W2) - J,L2] a 2Var(W)
b) =
W
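EXAMPLE 3.6.2
A random variable Y is described by the pdf
\[
f_Y(y) = 2y, \quad 0 < y < 1
\]
What is the standard deviation of 3Y + 2?
First, we need to find the variance of Y. But
\[
E(Y) = \int_0^1 y \cdot 2y \, dy = \frac{2}{3}
\]
and
\[
E(Y^2) = \int_0^1 y^2 \cdot 2y \, dy = \frac{1}{2}
\]
so
\[
\mathrm{Var}(Y) = E(Y^2) - \mu^2 = \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{18}
\]
Then, by Theorem 3.6.2,
\[
\mathrm{Var}(3Y + 2) = (3)^2 \cdot \mathrm{Var}(Y) = 9 \cdot \frac{1}{18} = \frac{1}{2}
\]
which makes the standard deviation of 3Y + 2 equal to \(\sqrt{1/2}\), or 0.71.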
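The steps of Example 3.6.2 translate directly into code. In this sketch the closed form \(E(Y^r) = 2/(r + 2)\), obtained by integrating \(y^r \cdot 2y\) over (0, 1), is an added convenience rather than something stated in the example, and the function name `raw_moment` is hypothetical.

```python
import math
from fractions import Fraction


def raw_moment(r):
    """E(Y^r) for f_Y(y) = 2y on (0, 1): the integral of 2*y^(r+1) is 2/(r+2)."""
    return Fraction(2, r + 2)


mean_Y = raw_moment(1)               # E(Y)
var_Y = raw_moment(2) - mean_Y ** 2  # Theorem 3.6.1

a, b = 3, 2
var_linear = a ** 2 * var_Y          # Theorem 3.6.2: the shift b drops out
sd_linear = math.sqrt(var_linear)

print(mean_Y, var_Y, var_linear, round(sd_linear, 2))
```

Note that `b` never enters the variance calculation, which is the content of Theorem 3.6.2: shifting a random variable moves its location but not its spread.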
QUESTIONS
3.6.1. Find Var(X) for the urn problem of Example 3.6.1 if the sampling is done with replacement.
3.6..2. Find the variance of Y if
~,
Jy(y)
=
i,
0,
O~Y:$l
2:$ y ::; 3
elsewhere
3.6.3. Ten equally qualified applicants, six men and four women, apply for three lab technician positions. Unable to justify choosing any of the applicants over all the others, the personnel director decides to select the three at random. Let X denote the number of men hired. Compute the standard deviation of X.
3.6.4. Compute the variance for a uniform random variable defined on the unit interval.
3.6.5. Use Theorem 3.6.1 to find the variance of the random variable Y, where
\[
f_Y(y) = 3(1 - y)^2, \quad 0 < y < 1
\]
3.6.6. If
\[
f_Y(y) = \frac{2y}{k^2}, \quad 0 \le y \le k
\]
for what value of k does Var(Y) = 2?
3.6.7. Calculate the standard deviation, \(\sigma\), for the random variable Y whose pdf has the graph shown below:

[Figure: graph of f_Y(y) versus y, horizontal axis marked 0, 1, 2, 3]
3.6.8. Consider the pdf defined by
\[
f_Y(y) = \frac{2}{y^3}, \quad y \ge 1
\]
Show that (a) \(\int_1^{\infty} f_Y(y) \, dy = 1\), (b) E(Y) = 2, and (c) Var(Y) is not finite.
3.6.9. Frankie and Johnny play the following game. Frankie selects a number at random from the interval [a, b]. Johnny, not knowing Frankie's number, is to pick a second number from that same interval and pay Frankie an amount, W, equal to the squared difference between the two [so \(0 \le W \le (b - a)^2\)]. What should be Johnny's strategy if he wants to minimize his expected loss?
3.6.10. Let Y be a random variable whose pdf is given by \(f_Y(y) = 5y^4,\ 0 \le y \le 1\). Use Theorem 3.6.1 to find Var(Y).
3.6.11. Suppose that Y is an exponential random variable, so \(f_Y(y) = \lambda e^{-\lambda y},\ y \ge 0\). Show that the variance of Y is \(1/\lambda^2\).
3.6.12. Suppose that Y is an exponential random variable with \(\lambda = 2\) (recall Question 3.6.11). Find \(P(Y > E(Y) + 2\sqrt{\mathrm{Var}(Y)})\).
3.6.13. Let X be a random variable with finite mean \(\mu\). Define for every real number a, \(g(a) = E[(X - a)^2]\). Show that
\[
g(a) = E[(X - \mu)^2] + (\mu - a)^2.
\]
What is another name for \(\min_a g(a)\)?
3.6.14. Let Y have the pdf given in Question 3.6.5. Find the variance of W, where W = -5Y + 12.
3.6.15. If Y denotes a temperature recorded in degrees Fahrenheit, then \(\frac{5}{9}(Y - 32)\) is the corresponding temperature in degrees Celsius. If the standard deviation for a set of temperatures is 15.7°F, what is the standard deviation of the equivalent Celsius temperatures?
3.6.16. If \(E(W) = \mu\) and \(\mathrm{Var}(W) = \sigma^2\), show that
\[
E\left(\frac{W - \mu}{\sigma}\right) = 0 \quad \text{and} \quad \mathrm{Var}\left(\frac{W - \mu}{\sigma}\right) = 1
\]
3.6.17. Suppose U is a uniform random variable over [0, 1].
(a) Show that Y = (b - a)U + a is uniform over [a, b].
(b) Use Part (a) and Question 3.6.4 to find the variance of Y.
Higher Moments
The quantities we have identified as the mean and the variance are actually special cases of what are referred to more generally as the moments of a random variable. More precisely, E(W) is the first moment about the origin and \(\sigma^2\) is the second moment about the mean. As the terminology suggests, we will have occasion to define higher moments of W. Just as E(W) and \(\sigma^2\) reflect a random variable's location and dispersion, so it is possible to characterize other aspects of a distribution in terms of other moments. We will see, for example, that the skewness of a distribution, that is, the extent to which it is not symmetric around \(\mu\), can be effectively measured in terms of a third moment. Likewise, there are issues that arise in certain applied statistics problems that require a knowledge of the flatness of a pdf, a property that can be quantified by the fourth moment.
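Definition 3.6.2. Let W be any random variable with pdf \(f_W(w)\). For any positive integer r,
1. The rth moment of W about the origin, \(\mu_r\), is given by
\[
\mu_r = E(W^r)
\]
provided \(\int_{-\infty}^{\infty} |w|^r \cdot f_W(w) \, dw < \infty\) (or provided the analogous condition on the summation of \(|w|^r\) holds, if W is discrete). When r = 1, we usually delete the subscript and write E(W) as \(\mu\) rather than \(\mu_1\).
2. The rth moment of W about the mean, \(\mu_r'\), is given by
\[
\mu_r' = E[(W - \mu)^r]
\]
provided the finiteness conditions of part 1 hold.
Comment. We can express \(\mu_r'\) in terms of \(\mu_j\), j = 1, 2, ..., r, by simply writing out the binomial expansion of \((W - \mu)^r\):
\[
\mu_r' = E[(W - \mu)^r] = \sum_{j=0}^{r} \binom{r}{j} E(W^j)(-\mu)^{r-j}
\]
Thus,
\[
\begin{aligned}
\mu_2' &= E[(W - \mu)^2] = \mu_2 - \mu_1^2 \\
\mu_3' &= E[(W - \mu)^3] = \mu_3 - 3\mu_1\mu_2 + 2\mu_1^3 \\
\mu_4' &= E[(W - \mu)^4] = \mu_4 - 4\mu_1\mu_3 + 6\mu_1^2\mu_2 - 3\mu_1^4
\end{aligned}
\]
and so on.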
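The binomial expansion relating moments about the mean to moments about the origin is mechanical enough to automate. In the sketch below, the function name and the list layout (`raw[j]` holding \(E(W^j)\), with `raw[0] = 1`) are illustrative assumptions. The test case uses the exponential pdf \(e^{-w}\), w > 0, whose jth moment about the origin is j!.

```python
from math import comb


def central_moment_from_raw(raw, r):
    """Return E[(W - mu)^r] given raw[j] = E(W^j) for j = 0..r (raw[0] = 1).

    Implements the binomial expansion
        sum over j of C(r, j) * E(W^j) * (-mu)^(r - j).
    """
    mu = raw[1]
    return sum(comb(r, j) * raw[j] * (-mu) ** (r - j) for j in range(r + 1))


# Moments about the origin for the exponential pdf e^{-w}: E(W^j) = j!
raw_exponential = [1, 1, 2, 6, 24]

variance = central_moment_from_raw(raw_exponential, 2)
third = central_moment_from_raw(raw_exponential, 3)
fourth = central_moment_from_raw(raw_exponential, 4)

print(variance, third, fourth)
```

The three results agree with the expanded formulas for \(\mu_2'\), \(\mu_3'\), and \(\mu_4'\) when \(\mu_1 = 1\), \(\mu_2 = 2\), \(\mu_3 = 6\), and \(\mu_4 = 24\) are substituted by hand.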
EXAMPLE 3.6.3
The skewness of a pdf can be measured in terms of its third moment about the mean.
If a pdf is symmetric, $E[(W - \mu)^3]$ will obviously be zero; for pdfs that are not symmetric,
$E[(W - \mu)^3]$ will generally not be zero. In practice, the symmetry (or lack of symmetry) of a pdf is
often measured by the coefficient of skewness, $\gamma_1$, where
\[ \gamma_1 = \frac{E[(W - \mu)^3]}{\sigma^3} \]
Dividing $\mu'_3$ by $\sigma^3$ makes $\gamma_1$ dimensionless.
A second "shape" parameter in common use is the coefficient of kurtosis, $\gamma_2$, which
involves the fourth moment about the mean. Specifically,
\[ \gamma_2 = \frac{E[(W - \mu)^4]}{\sigma^4} - 3 \]
For certain pdfs, $\gamma_2$ is a useful measure of peakedness: relatively flat pdfs are said to be
platykurtic; more peaked pdfs are called leptokurtic [see (97)].
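The binomial-expansion recipe and the two shape coefficients can be checked with exact arithmetic. The sketch below is only an illustration: the helper name `central_moment` is hypothetical, and the input is assumed to be a uniform pdf on (0, 1), whose moments about the origin are 1/(j + 1).

```python
from fractions import Fraction
from math import comb


def central_moment(raw, r):
    """Return the r-th moment about the mean from moments about the origin.

    raw[j] must hold E(W^j) for j = 0..r, so raw[0] is 1.
    Uses E[(W - mu)^r] = sum over j of C(r, j) * E(W^j) * (-mu)^(r - j).
    """
    mu = raw[1]
    return sum(comb(r, j) * raw[j] * (-mu) ** (r - j) for j in range(r + 1))


# Assumed input: uniform pdf on (0, 1), where E(W^j) = 1 / (j + 1).
raw = [Fraction(1, j + 1) for j in range(5)]

variance = central_moment(raw, 2)
third = central_moment(raw, 3)
fourth = central_moment(raw, 4)

# gamma_1 = third / sigma^3; its square is reported to stay in exact arithmetic.
skewness_squared = third ** 2 / variance ** 3
# gamma_2 = fourth / sigma^4 - 3, and sigma^4 is just variance squared.
kurtosis = fourth / variance ** 2 - 3

print(variance, third, fourth, skewness_squared, kurtosis)
```

Working with `Fraction` avoids rounding, so a symmetric pdf gives a third central moment of exactly zero rather than a small floating-point residue.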
Earlier in this chapter we encountered random variables whose means did not exist;
recall, for example, the St. Petersburg paradox. More generally, there are random variables
having certain of their higher moments finite and certain others, not finite. Addressing
the question of whether or not a given $E(W^j)$ is finite is the following existence theorem.
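Theorem. If the kth moment of a random variable exists, all moments of order less
than k exist.
Proof. Let $f_Y(y)$ be the pdf of a continuous random variable Y. By Definition 3.6.2,
$E(Y^k)$ exists if and only if
\[ \int_{-\infty}^{\infty} |y|^k \cdot f_Y(y)\,dy < \infty \qquad (3.6.2) \]
Let $1 \le j < k$. To prove the theorem we must show that
\[ \int_{-\infty}^{\infty} |y|^j \cdot f_Y(y)\,dy < \infty \]
is implied by Inequality 3.6.2. But
\[ \int_{-\infty}^{\infty} |y|^j \cdot f_Y(y)\,dy = \int_{|y| \le 1} |y|^j \cdot f_Y(y)\,dy + \int_{|y| > 1} |y|^j \cdot f_Y(y)\,dy \]
\[ \le \int_{|y| \le 1} f_Y(y)\,dy + \int_{|y| > 1} |y|^j \cdot f_Y(y)\,dy \]
\[ \le 1 + \int_{|y| > 1} |y|^j \cdot f_Y(y)\,dy \]
\[ \le 1 + \int_{|y| > 1} |y|^k \cdot f_Y(y)\,dy < \infty \]
Therefore, $E(Y^j)$ exists, j = 1, 2, ..., k - 1. The proof for discrete random variables
is similar.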
EXAMPLE 3.6.4
Many of the random variables that play a major role in statistics have moments existing
for all k, as does, for instance, the normal distribution introduced in Example 3.4.3. Still,
it is not difficult to find well-known models for which this is not true. A case in point is
the Student t distribution, a probability function widely used in inference procedures. (See
Chapter 7.)
The pdf for a Student t random variable is given by
\[ f_Y(y) = \frac{c(n)}{\left(1 + \frac{y^2}{n}\right)^{(n+1)/2}}, \quad -\infty < y < \infty, \quad n \ge 1 \]
where n is referred to as the distribution's "degrees of freedom" and c(n) is a constant.
By definition, the (2k)th moment is the integral
\[ E(Y^{2k}) = c(n) \cdot \int_{-\infty}^{\infty} \frac{y^{2k}}{\left(1 + \frac{y^2}{n}\right)^{(n+1)/2}}\,dy \]
Is $E(Y^{2k})$ finite? Not necessarily. Recall from calculus that a tail integral of the form
\[ \int_{1}^{\infty} \frac{1}{y^{\alpha}}\,dy \]
will converge only if $\alpha > 1$. Also, the convergence properties for integrals of
\[ \frac{y^{2k}}{\left(1 + \frac{y^2}{n}\right)^{(n+1)/2}} \]
are the same as those for
\[ \frac{y^{2k}}{(y^2)^{(n+1)/2}} = \frac{1}{y^{\,n+1-2k}} \]
Therefore, if $E(Y^{2k})$ is to be finite, we must have
\[ n + 1 - 2k > 1 \]
or, equivalently, 2k < n. Thus a Student t random variable with, say, n = 9 degrees of
freedom has $E(Y^8) < \infty$, but no moment of order higher than eight exists.
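The convergence condition 2k < n can be probed numerically by truncating the moment integral at larger and larger cutoffs. The following is a rough sketch, not a proof: the function names are hypothetical, the constant c(n) is dropped because it does not affect convergence, and Simpson's rule with a fixed number of steps is an assumed choice of quadrature.

```python
def truncated_kernel_integral(n, k, upper, steps=200_000):
    """Simpson's-rule estimate of the integral over [0, upper] of
    y^(2k) / (1 + y^2/n)^((n+1)/2), i.e. the t kernel without c(n)."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = upper / steps

    def g(y):
        return y ** (2 * k) / (1.0 + y * y / n) ** ((n + 1) / 2)

    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3


def growth_ratio(n, k):
    """Factor by which the truncated integral grows when the cutoff
    is raised from 100 to 1000."""
    return truncated_kernel_integral(n, k, 1000.0) / truncated_kernel_integral(n, k, 100.0)


ratio_order_8 = growth_ratio(9, 4)    # 2k = 8 < n = 9: integral settles down
ratio_order_10 = growth_ratio(9, 5)   # 2k = 10 > n = 9: integral keeps growing

print(ratio_order_8, ratio_order_10)
```

With n = 9, the eighth-moment integral barely changes as the cutoff increases, while the tenth-moment integral grows roughly in proportion to the cutoff, which is the numerical signature of divergence.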
QUESTIONS
3.6.18. Let Y be a uniform random variable defined over the interval (0, 2). Find an expression
for the rth moment of Y about the origin. Also, use the binomial expansion as described
in the comment to find $E[(Y - \mu)^6]$.
3.6.19. Find the coefficient of skewness for an exponential random variable having the pdf
3.6.20. Calculate the coefficient of kurtosis for a uniform random variable defined over the
unit interval, $f_Y(y) = 1$, $0 \le y \le 1$.
3.6.21. Suppose that W is a random variable for which $E[(W - \mu)^3] = 10$ and $E(W^3) = 4$. Is
it possible that Ji =
=
3.6.22. If Y = aX + b, where a > 0, show that Y has the same coefficients of skewness and
kurtosis as X.
3.6.23. Let Y be the random variable of Questlon
where for a positive integer n, $f_Y(y) = (n + 2)(n + 1)y^n(1 - y)$, $0 \le y \le 1$.
(a) Find Var(Y)
(b) For any positive integer k, find the kth moment around the origin.
3.6.24. Suppose that the random variable Y is described by the pdf
y > I
fy(y)=c·
(a) Find c.
(b) What is the highest moment of Y that exists?
3.7 JOINT DENSITIES
Sections 3.3 and 3.4 introduced the basic terminology for describing the probabilistic
behavior of a single random variable. Such information, while adequate for many
problems, is insufficient when more than one variable is of interest to the experimenter.
Medical researchers, for example, continue to explore the relationship between blood
cholesterol and heart disease, and, more recently, between "good" cholesterol and "bad"
cholesterol. And more than a little attention, both political and pedagogical, is given to
the role played by K-12 funding in the performance of would-be high school graduates
on exit exams. On a smaller scale, electronic equipment and systems are often designed to
have built-in redundancy: Whether or not that equipment functions properly ultimately
depends on the reliability of two different components.
The point is, there are many situations where two relevant random variables, say, X and
Y,² are defined on the same sample space. Knowing only $f_X(x)$ and $f_Y(y)$, though, does
not necessarily provide enough information to characterize the all-important simultaneous
behavior of X and Y. The purpose of this section is to introduce the concepts, definitions,
and mathematical techniques associated with distributions based on two (or more) random
variables.
Discrete Joint Pdfs
As we saw in the single-variable case, the pdf is defined differently, depending on whether
the random variable is discrete or continuous. The same distinction applies to joint pdfs.
We begin with a discussion of joint pdfs as they apply to two discrete random variables.
Definition 3.7.1. Suppose S is a discrete sample space on which two random variables,
X and Y, are defined. The joint probability density function of X and Y (or joint pdf) is
denoted $p_{X,Y}(x, y)$, where
\[ p_{X,Y}(x, y) = P(\{s \mid X(s) = x \text{ and } Y(s) = y\}) \]
² For the next several sections we will suspend our earlier practice of using X to denote a discrete random
variable and Y to denote a continuous random variable. The category of the random variables will need to be
determined from the context of the problem. Typically, though, X and Y will either both be discrete or both be
continuous.
Comment. A convenient shorthand notation for the meaning of $p_{X,Y}(x, y)$, consistent
with what was used earlier for pdfs of single discrete random variables, is to write
$p_{X,Y}(x, y) = P(X = x, Y = y)$.
EXAMPLE 3.7.1
A supermarket has two express lines. Let X and Y denote the number of customers in the
first and in the second, respectively, at any given time. During nonrush hours, the joint
pdf of X and Y is summarized by the following table:

                      X
    Y        0       1       2       3
    0       0.1     0.2     0       0
    1       0.2     0.25    0.05    0
    2       0       0.05    0.05    0.025
    3       0       0       0.025   0.05

Find P(|X - Y| = 1), the probability that X and Y differ by exactly one.
By definition,
\[ P(|X - Y| = 1) = \sum_{|x - y| = 1} p_{X,Y}(x, y) \]
\[ = p_{X,Y}(0, 1) + p_{X,Y}(1, 0) + p_{X,Y}(1, 2) + p_{X,Y}(2, 1) + p_{X,Y}(2, 3) + p_{X,Y}(3, 2) \]
\[ = 0.2 + 0.2 + 0.05 + 0.05 + 0.025 + 0.025 \]
\[ = 0.55 \]
[Would you expect $p_{X,Y}(x, y)$ to be symmetric? Would you expect the event $|X - Y| \ge 2$
to have zero probability?]
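The calculation in Example 3.7.1 amounts to summing the joint pdf over the pairs (x, y) that make up an event. A minimal sketch of that idea follows; the dictionary holds the table entries as read from the example (including the two 0.025 terms used in its worked sum), and the helper name `probability` is hypothetical.

```python
# Joint pdf of Example 3.7.1, keyed by (x, y); pairs not listed have probability 0.
joint = {
    (0, 0): 0.1,   (1, 0): 0.2,
    (0, 1): 0.2,   (1, 1): 0.25,  (2, 1): 0.05,
    (1, 2): 0.05,  (2, 2): 0.05,  (3, 2): 0.025,
    (2, 3): 0.025, (3, 3): 0.05,
}


def probability(event, pdf):
    """Sum the joint pdf over every (x, y) pair for which event(x, y) is true."""
    return sum(p for (x, y), p in pdf.items() if event(x, y))


total = probability(lambda x, y: True, joint)
differ_by_one = probability(lambda x, y: abs(x - y) == 1, joint)
differ_by_two_or_more = probability(lambda x, y: abs(x - y) >= 2, joint)

print(total, differ_by_one, differ_by_two_or_more)
```

Any other event defined on (X, Y) can be evaluated the same way by passing a different predicate.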
EXAMPLE 3.7.2
Suppose two fair dice are rolled. Let X be the sum of the numbers showing, and let Y be
the larger of the two. Then, for example,
\[ p_{X,Y}(2, 3) = P(X = 2, Y = 3) = P(\emptyset) = 0 \]
\[ p_{X,Y}(4, 3) = P(X = 4, Y = 3) = P(\{(1, 3), (3, 1)\}) = \frac{2}{36} \]
and
\[ p_{X,Y}(6, 3) = P(X = 6, Y = 3) = P(\{(3, 3)\}) = \frac{1}{36} \]
The entire joint pdf is given in Table 3.7.1.

TABLE 3.7.1
                                 y
    x          1      2      3      4      5      6     Row totals
    2        1/36    0      0      0      0      0        1/36
    3         0     2/36    0      0      0      0        2/36
    4         0     1/36   2/36    0      0      0        3/36
    5         0      0     2/36   2/36    0      0        4/36
    6         0      0     1/36   2/36   2/36    0        5/36
    7         0      0      0     2/36   2/36   2/36      6/36
    8         0      0      0     1/36   2/36   2/36      5/36
    9         0      0      0      0     2/36   2/36      4/36
    10        0      0      0      0     1/36   2/36      3/36
    11        0      0      0      0      0     2/36      2/36
    12        0      0      0      0      0     1/36      1/36
    Col. totals
             1/36   3/36   5/36   7/36   9/36  11/36

Notice that the row totals in the right-hand margin of the table give the pdf for
X. Similarly, the column totals along the bottom detail the pdf for Y. Those are not
coincidences. Theorem 3.7.1 gives a formal statement of the relationship between the
joint pdf and the individual pdfs.
Theorem 3.7.1. Suppose that $p_{X,Y}(x, y)$ is the joint pdf of the discrete random variables X
and Y. Then
\[ p_X(x) = \sum_{\text{all } y} p_{X,Y}(x, y) \quad \text{and} \quad p_Y(y) = \sum_{\text{all } x} p_{X,Y}(x, y) \]
Proof. We will prove the first statement. Note that the collection of sets (Y = y) for
all y forms a partition of S; that is, they are disjoint and $\bigcup_{\text{all } y}(Y = y) = S$. The set
$(X = x) = (X = x) \cap S = (X = x) \cap \bigcup_{\text{all } y}(Y = y) = \bigcup_{\text{all } y}[(X = x) \cap (Y = y)]$, so
\[ p_X(x) = P(X = x) = P\left(\bigcup_{\text{all } y}[(X = x) \cap (Y = y)]\right) \]
\[ = \sum_{\text{all } y} P(X = x, Y = y) = \sum_{\text{all } y} p_{X,Y}(x, y) \qquad \square \]
Definition 3.7.2. An individual pdf obtained by summing a joint pdf over all values of the
other random variable is called a marginal pdf.
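Theorem 3.7.1 can be illustrated on the two-dice setting of Example 3.7.2 (X the sum, Y the larger face) by enumerating the 36 equally likely outcomes and then summing the joint pdf across rows and columns. This is a small sketch using exact fractions; the variable names are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Build the joint pdf of (X, Y) = (sum, larger face) by direct enumeration.
joint = defaultdict(Fraction)
for a, b in product(range(1, 7), repeat=2):
    joint[(a + b, max(a, b))] += Fraction(1, 36)

# Marginal pdfs: sum the joint pdf over the other variable.
p_x = defaultdict(Fraction)
p_y = defaultdict(Fraction)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

for x in sorted(p_x):
    print("p_X(%d) = %s" % (x, p_x[x]))
for y in sorted(p_y):
    print("p_Y(%d) = %s" % (y, p_y[y]))
```

The printed marginals are the row and column totals of the joint table, which is exactly what the theorem asserts.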
Continuous Joint Pdfs
If X and Y are both continuous random variables, Definition 3.7.1 does not apply because
P(X = x, Y = y) will be identically 0 for all (x, y). As was the case in single-variable
situations, the joint pdf for two continuous random variables will be defined as a function
that, when integrated, yields the probability that (X, Y) lies in a specified region of the
xy-plane.
Definition 3.7.3. Two random variables defined on the same set of real numbers are
jointly continuous if there exists a function $f_{X,Y}(x, y)$ such that for any region R in the
xy-plane,
\[ P((X, Y) \in R) = \iint_R f_{X,Y}(x, y)\,dx\,dy \]
The function $f_{X,Y}(x, y)$ is the joint pdf of X and Y.
Note: Any function $f_{X,Y}(x, y)$ for which
1. $f_{X,Y}(x, y) \ge 0$ for all x and y
2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$
qualifies as a joint pdf. We shall employ the convention of naming the domain only where
the joint pdf is nonzero; everywhere else it will be assumed to be zero. This is analogous, of
course, to the notation used earlier in describing the domain of single random variables.
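EXAMPLE 3.7.3
Suppose that the variation in two continuous random variables, X and Y, can be modeled
by the joint pdf $f_{X,Y}(x, y) = cxy$, for 0 < y < x < 1. Find c.
By inspection, $f_{X,Y}(x, y)$ will be non-negative as long as $c \ge 0$. The particular c that
qualifies $f_{X,Y}(x, y)$ to be a joint pdf, though, is the one that makes the volume under
$f_{X,Y}(x, y)$ equal to 1. But
\[ 1 = \int_0^1 \int_0^x cxy\,dy\,dx = c \int_0^1 x \left( \frac{y^2}{2} \Big|_0^x \right) dx = c \int_0^1 \frac{x^3}{2}\,dx = c \, \frac{x^4}{8} \Big|_0^1 = \frac{c}{8} \]
Therefore, c = 8.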
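The value of c in Example 3.7.3 can be cross-checked numerically by estimating the volume under xy over the triangle 0 < y < x < 1 and taking its reciprocal. The sketch below uses a midpoint rule on an iterated integral; the function name and grid sizes are assumptions chosen for illustration.

```python
def volume_under(kernel, n_outer=1000, n_inner=20):
    """Midpoint-rule estimate of the double integral of kernel(x, y)
    over the triangle 0 < y < x < 1, computed as an iterated integral."""
    hx = 1.0 / n_outer
    total = 0.0
    for i in range(n_outer):
        x = (i + 0.5) * hx
        hy = x / n_inner  # inner variable y runs from 0 to x
        inner = sum(kernel(x, (j + 0.5) * hy) for j in range(n_inner)) * hy
        total += inner * hx
    return total


# The normalizing constant is the reciprocal of the volume under x*y.
c = 1.0 / volume_under(lambda x, y: x * y)
print(c)
```

Integrating along y from 0 to x for each x mirrors the order of integration used in the example, so the triangular region is handled without any special treatment of its diagonal edge.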
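EXAMPLE 3.7.4
A study claims that the daily number of hours, X, a teenager watches television and the
daily number of hours, Y, he works on his homework are approximated by the joint pdf
\[ f_{X,Y}(x, y) = xye^{-(x+y)}, \quad x > 0, \quad y > 0 \]
What is the probability a teenager chosen at random spends at least twice as much time
watching television as he does working on his homework?
The region, R, in the xy-plane corresponding to the event "$X \ge 2Y$" is shown in
Figure 3.7.1. It follows that $P(X \ge 2Y)$ is the volume under $f_{X,Y}(x, y)$ above the region R:
\[ P(X \ge 2Y) = \int_0^{\infty} \int_0^{x/2} xye^{-(x+y)}\,dy\,dx \]
FIGURE 3.7.1
Separating variables, we can write
\[ P(X \ge 2Y) = \int_0^{\infty} xe^{-x} \left[ \int_0^{x/2} ye^{-y}\,dy \right] dx \]
and the double integral reduces to
\[ \int_0^{\infty} xe^{-x} \left[ 1 - \left( \frac{x}{2} + 1 \right) e^{-x/2} \right] dx = \int_0^{\infty} xe^{-x}\,dx - \int_0^{\infty} \frac{x^2}{2} e^{-3x/2}\,dx - \int_0^{\infty} xe^{-3x/2}\,dx \]
\[ = 1 - \frac{16}{54} - \frac{4}{9} = \frac{7}{27} \]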
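A simulation gives an independent check on the value 7/27. The sketch below relies on one observation that goes beyond the example's text: the joint pdf xy e^{-(x+y)} is the product of x e^{-x} and y e^{-y}, each a gamma density with shape 2 and scale 1, so X and Y are drawn separately here. The function name, seed, and number of trials are arbitrary choices.

```python
import random


def estimate_p_x_at_least_2y(trials=200_000, seed=12345):
    """Monte Carlo estimate of P(X >= 2Y) when the joint pdf is
    x * y * exp(-(x + y)) on x > 0, y > 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.gammavariate(2.0, 1.0)  # density x * exp(-x) on x > 0
        y = rng.gammavariate(2.0, 1.0)  # density y * exp(-y) on y > 0
        if x >= 2.0 * y:
            hits += 1
    return hits / trials


estimate = estimate_p_x_at_least_2y()
exact = 7 / 27
print(estimate, exact)
```

The estimate should agree with the exact answer to within ordinary sampling error, which shrinks like one over the square root of the number of trials.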
Geometric Probability
One particularly important special case of Definition 3.7.3 is the joint uniform pdf, which is
represented by a surface having a constant height everywhere above a specified rectangle
in the xy-plane. That is,
\[ f_{X,Y}(x, y) = \frac{1}{(b - a)(d - c)}, \quad a \le x \le b, \quad c \le y \le d \]
If R is some region in the rectangle where X and Y are defined, $P((X, Y) \in R)$ reduces to
a simple ratio of areas:
\[ P((X, Y) \in R) = \frac{\text{area of } R}{(b - a)(d - c)} \qquad (3.7.1) \]
Calculations based on Equation 3.7.1 are referred to as geometric probabilities.
EXAMPLE 3.7.5
Two friends agree to meet on the University Commons "sometime around 12:30." But
neither of them is particularly punctual, or patient. What will actually happen is that
each will arrive at random sometime in the interval from 12:00 to 1:00. If one arrives and
the other is not there, the first person will wait fifteen minutes or until 1:00, whichever comes
first, and then leave. What is the probability that the two will get together?
To simplify notation, we can represent the time period from 12:00 to 1:00 as the interval
from zero to sixty minutes. Then if x and y denote the two arrival times, the sample
space is the 60 x 60 square shown in Figure 3.7.2. Furthermore, the event M, "the two
friends meet," will occur if and only if $|x - y| \le 15$ or, equivalently, if and only if
$-15 \le x - y \le 15$. These inequalities appear as the shaded region in Figure 3.7.2.
FIGURE 3.7.2
Notice that the areas of the two triangles above and below M are each equal to
$\frac{1}{2}(45)(45)$. It follows that the two friends have a 44% chance of meeting:
\[ P(M) = \frac{\text{area of } M}{\text{area of } S} = \frac{(60)^2 - 2\left[\frac{1}{2}(45)(45)\right]}{(60)^2} = 0.44 \]
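The area argument in Example 3.7.5 generalizes to any time window and waiting period. A minimal sketch with exact fractions is given below; the function name and its two parameters are illustrative and not part of the original example.

```python
from fractions import Fraction


def meeting_probability(window, wait):
    """P(|X - Y| <= wait) when the two arrival times are uniform over a
    window-by-window square (a ratio of areas, as in Equation 3.7.1)."""
    window, wait = Fraction(window), Fraction(wait)
    if wait >= window:
        return Fraction(1)
    # The two corner triangles where the friends miss each other have legs
    # (window - wait), so together they cover (window - wait)^2.
    miss_area = (window - wait) ** 2
    return 1 - miss_area / window ** 2


p_meet = meeting_probability(60, 15)
print(p_meet, float(p_meet))
```

For a sixty-minute window and a fifteen-minute wait the exact ratio is a fraction whose decimal value rounds to the 0.44 quoted in the example.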
EXAMPLE 3.7.6
A carnival operator wants to set up a ringtoss game. Players will throw a ring of diameter
d onto a grid of squares, the side of each square being of length s (see Figure 3.7.3). If the
ring lands entirely inside a square, the player wins a prize. To ensure a profit, the operator
must keep the player's chances of winning down to something less than one in five. How
small can the operator make the ratio d/s?
FIGURE 3.7.3
FIGURE 3.7.4
First, it will be assumed that the player is required to stand far enough away so that no
skill is involved and the ring is falling at random on the grid. From Figure 3.7.4, we see
that in order for the ring not to touch any side of the square, the ring's center must be
somewhere in the interior of a smaller square, each side of which is a distance d/2 from
one of the grid lines. Since the area of a grid square is $s^2$ and the area of an interior
square is $(s - d)^2$, the probability of a winning toss can be written as the ratio:
\[ P(\text{ring touches no lines}) = \frac{(s - d)^2}{s^2} \]
But the operator requires that
\[ \frac{(s - d)^2}{s^2} \le 0.20 \]
Solving for d/s gives
\[ \frac{d}{s} \ge 1 - \sqrt{0.20} = 0.55 \]
That is, if the diameter of the ring is at least 55% as long as the side of one of the squares,
the player will have no more than a 20% chance of winning.
QUESTIONS
3.7.1. If $p_{X,Y}(x, y) = cxy$ at the points (1, 1), (2, 1), (2, 2), and (3, 1), and equals 0 elsewhere,
find c.
3.7.2. Let X and Y be two continuous random variables defined over the unit square. What
does c equal if $f_{X,Y}(x, y) = c(x^2 + y^2)$?
3.7.3. Suppose that random variables X and Y vary in accordance with the joint pdf,
$f_{X,Y}(x, y) = c(x + y)$, 0 < x < y < 1. Find c.
3.7.4. Find c if $f_{X,Y}(x, y) = cxy$ for X and Y defined over the triangle whose vertices are the
points (0, 0), (0, 1), and (1, 1).
3.7.5. An urn contains four red chips, three white chips, and two blue chips. A random sample
of size 3 is drawn without replacement. Let X denote the number of white chips in the
sample and Y the number of blue. Write a formula for the joint pdf of X and Y.
3.7.6. Four cards are drawn from a standard poker deck. Let X be the number of kings drawn
and Y the number of queens. Find $p_{X,Y}(x, y)$.
3.7.7. An advisor looks over the schedules of his 50 students to see how many math and
science courses each has registered for in the coming semester. He summarizes his
results in a table. What is the probability that a student selected at random will have
signed up for more math courses than science courses?

                             Number of math courses, X
                                0       1       2
    Number              0      11       6       4
    of science          1       9      10       3
    courses, Y          2       5       0       2

3.7.8. Consider the experiment of tossing a fair coin three times. Let X denote the number
of heads on the last flip, and let Y denote the total number of heads on the three flips.
Find $p_{X,Y}(x, y)$.
3.7.9. Suppose that two fair dice are tossed one time. Let X denote the number of 2's that
appear, and Y the number of 3's. Write the matrix giving the joint probability density
function for X and Y. Suppose a third random variable, Z, is defined, where Z = X + Y.
Use $p_{X,Y}(x, y)$ to find $p_Z(z)$.
3.7.10.
that X and Y have a bivariate uniform
over the unit square:
=
fx,y(x. y)
=
I
e,
0 < x < L
0,
elsewhere
(8) Find c.
(b) Find P(O < X < ~,O < Y <
3.7.11. Let X and Y have the joint
fx.Y(x, y)
l).
= 2e-(x+ y ),
0 < x <
y,
0 <
y
Plnd P(Y < 3X).
3.7.12. A point is chosen at random from the interior of a circle whose equation is $x^2 + y^2 \le 4$.
Let the random variables X and Y denote the x- and y-coordinates of the sampled
point. Find $f_{X,Y}(x, y)$.
3.7.13. Find P(X < 2Y) if $f_{X,Y}(x, y) = x + y$ for X and Y each defined over the unit interval.
3.7.14. Suppose that five independent observations are drawn from the continuous
iT (I) = 2/, 0 :S t :':'0 L Let X denote the number of t'S that fall in the interval 0 I <
and Lel Y denote the number of I'S that fall in the interval ~ :':'0 I <
Find px.y(1. 2).
3.7.15. A point is chosen at random from the interior of a right triangle with base b and
height h. What is the probability that the y value is between 0 and h/2?
1·
Marginal Pdfs for Continuous Random Variables
The notion of marginal pdfs in connection with discrete random variables was introduced
in Theorem 3.7.1 and Definition 3.7.2. An analogous relationship holds in the continuous
case; integration, though, replaces the summation that appears in Theorem 3.7.1.
Theorem 3.7.2. Suppose X and Y are jointly continuous with joint pdf $f_{X,Y}(x, y)$. Then the
marginal pdfs, $f_X(x)$ and $f_Y(y)$, are given by
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy \quad \text{and} \quad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx \]
Proof. It suffices to verify the first of the theorem's two equalities. As is often the case
with proofs for continuous random variables, we begin with the cdf:
\[ F_X(x) = P(X \le x) = \int_{-\infty}^{\infty} \int_{-\infty}^{x} f_{X,Y}(t, y)\,dt\,dy = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f_{X,Y}(t, y)\,dy\,dt \]
Differentiating both ends of the equation above gives
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy \]
(recall that differentiating a cdf gives the corresponding pdf).
EXAMPLE 3.7.7
Suppose that two continuous random variables, X and Y, have the joint uniform pdf,
\[ f_{X,Y}(x, y) = \frac{1}{6}, \quad 0 \le x \le 3, \quad 0 \le y \le 2 \]
Find $f_X(x)$.
Applying Theorem 3.7.2 gives
\[ f_X(x) = \int_0^2 f_{X,Y}(x, y)\,dy = \int_0^2 \frac{1}{6}\,dy = \frac{1}{3}, \quad 0 \le x \le 3 \]
Notice that X, by itself, is a uniform random variable defined over the interval [0, 3];
similarly, we would find that $f_Y(y)$ is a uniform pdf over the interval [0, 2].
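EXAMPLE 3.7.8
Consider the case where X and Y are two continuous random variables, jointly distributed
over the first quadrant of the xy-plane according to the joint pdf,
\[ f_{X,Y}(x, y) = \begin{cases} y^2 e^{-y(x+1)}, & x \ge 0,\ y \ge 0 \\ 0, & \text{elsewhere} \end{cases} \]
Find the two marginal pdfs.
First, consider $f_X(x)$. By Theorem 3.7.2,
\[ f_X(x) = \int_0^{\infty} y^2 e^{-y(x+1)}\,dy \]
In the integrand, substitute
\[ u = y(x + 1) \]
making $du = (x + 1)\,dy$. This gives
\[ f_X(x) = \frac{1}{x + 1} \int_0^{\infty} \frac{u^2}{(x + 1)^2} e^{-u}\,du = \frac{1}{(x + 1)^3} \int_0^{\infty} u^2 e^{-u}\,du \]
After applying integration by parts (twice) to $\int_0^{\infty} u^2 e^{-u}\,du$, we get
\[ f_X(x) = \frac{1}{(x + 1)^3} \left[ -u^2 e^{-u} - 2ue^{-u} - 2e^{-u} \right]_0^{\infty} \]
\[ = \frac{1}{(x + 1)^3} \left[ 2 - \lim_{u \to \infty} \left( \frac{u^2}{e^u} + \frac{2u}{e^u} + \frac{2}{e^u} \right) \right] \]
\[ = \frac{2}{(x + 1)^3}, \quad x \ge 0 \]
Finding $f_Y(y)$ is a bit easier:
\[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx = \int_0^{\infty} y^2 e^{-y(x+1)}\,dx \]
\[ = y^2 e^{-y} \int_0^{\infty} e^{-yx}\,dx = y^2 e^{-y} \left( \frac{1}{y} \right) \left( -e^{-yx} \Big|_0^{\infty} \right) \]
\[ = ye^{-y}, \quad y \ge 0 \]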
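The two marginal pdfs of Example 3.7.8 can be spot-checked by integrating the joint pdf numerically, as Theorem 3.7.2 prescribes. This is a sketch under stated assumptions: the helper names are hypothetical, Simpson's rule is an arbitrary choice of quadrature, and the infinite upper limit is replaced by a finite cutoff that is adequate only for arguments like the ones tried below.

```python
import math


def simpson(g, a, b, steps=20_000):
    """Composite Simpson's rule for the integral of g over [a, b]."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def joint_pdf(x, y):
    """Joint pdf of Example 3.7.8: y^2 * exp(-y(x+1)) on the first quadrant."""
    return y * y * math.exp(-y * (x + 1.0)) if x >= 0 and y >= 0 else 0.0


CUTOFF = 80.0  # assumption: the tail beyond this point is negligible here


def marginal_x(x):
    return simpson(lambda y: joint_pdf(x, y), 0.0, CUTOFF)


def marginal_y(y):
    return simpson(lambda x: joint_pdf(x, y), 0.0, CUTOFF)


print(marginal_x(1.0), 2 / (1.0 + 1.0) ** 3)
print(marginal_y(2.0), 2.0 * math.exp(-2.0))
```

Each printed pair compares the numerical integral with the closed form derived in the example, namely 2/(x + 1)^3 for X and y e^{-y} for Y.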
QUESTIONS
3.7.16. Find the marginal pdf of X for the joint pdf derived in Question 3.7.5.
3.7.17. Find the marginal pdfs of X and Y for the joint pdf derived in Question 3.7.8.
3.7.18. The campus recruiter for an international conglomerate classifies the large number of
students she interviews into three categories: the lower quarter, the middle half, and
the upper quarter. If she meets six students on a given morning, what is the probability
that they will be evenly divided among the three categories? What is the marginal
probability that exactly two will belong to the middle half?
3.7.19. For each of the following joint pdf's. find /x(x) and /'I(y).
(8) !x,Y(X,y)= O:::sx:::s2,O:SY:::Sl
(b) /x,r(x,y)=
,O:sx :::s2,O:::SY:::Sl
Section 3.7
(c)
(d)
(e)
(I)
Joint Densities
213
bu(x, y) :::: ~(x + 2y),O ::s x ::s 1,O::s y :s 1
ix.v(x. y) = c(x + y),O::s: x ::s 1,0 ::s y ::s 1
ix,Y(x. y) = 4xy. 0 :s x :s; 1,0 .:s y ::s 1
ho(:x, y) = xye-(''-+Y), O:s; x, O.:s y
(g) Ix.y(x. y) = ye-.ry-y, O::s x, O::s y
3.7.20. For each of thefollowingjoint
find Ix(x) and fy(y).
(8) Ix,Y (x, y) =
0 .:s x .:s y ::s 2
(b) Ix,'r'(X. Y)
.r O:s; Y :s x .:s 1
(c)
(x,y)=6x,O.:sx.:s l,O:s;y::s:1
x
3.7.21. Suppose that Ix,v(x. y) = 6(1 - .x - y) for.x and y defined over the unit square,
subject to the restriction that 0 ::s x + y ::s: 1. Find the marginal pdf for X.
3.7.22. Find Jy(y) if ix.Y(x. y) =
for x and y defined over the shaded region pictured.
y
o
-------x
3.7.23. Suppose that X and Y are discrete random variables with
\[ p_{X,Y}(x, y) = \frac{4!}{x!\,y!\,(4 - x - y)!} \left( \frac{1}{2} \right)^x \left( \frac{1}{3} \right)^y \left( \frac{1}{6} \right)^{4-x-y}, \quad 0 \le x + y \le 4 \]
Find $p_X(x)$ and $p_Y(y)$.
3.7.24. A generalization of the binomial model occurs when there is a sequence of n independent
trials with three outcomes, where $p_1$ = P(Outcome 1) and $p_2$ = P(Outcome 2).
Let X and Y denote the number of trials (out of n) resulting in Outcome 1 and
Outcome 2, respectively.
(a) Show that
\[ p_{X,Y}(x, y) = \frac{n!}{x!\,y!\,(n - x - y)!}\, p_1^x\, p_2^y\, (1 - p_1 - p_2)^{n-x-y}, \quad 0 \le x + y \le n \]
(b) Find $p_X(x)$ and $p_Y(y)$.
Hint: See Question 3.7.23.
Joint Cdfs
For a single random variable X, the cdf of X evaluated at some point x, that is, $F_X(x)$, is
the probability that the random variable X takes on a value less than or equal to x.
Extended to two variables, a joint cdf (evaluated at the point (u, v)) is the probability that
$X \le u$ and, simultaneously, $Y \le v$.
Definition 3.7.4. Let X and Y be any two random variables. The joint cumulative
distribution function of X and Y (or joint cdf) is denoted $F_{X,Y}(u, v)$, where
\[ F_{X,Y}(u, v) = P(X \le u \text{ and } Y \le v) \]
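EXAMPLE 3.7.9
Find the joint cdf, $F_{X,Y}(u, v)$, for the two random variables, X and Y, whose joint pdf is
given by $f_{X,Y}(x, y) = \frac{4}{3}(x + xy)$, $0 \le x \le 1$, $0 \le y \le 1$.
If Definition 3.7.4 is applied, the probability that $X \le u$ and $Y \le v$ becomes a double
integral of $f_{X,Y}(x, y)$:
\[ F_{X,Y}(u, v) = \frac{4}{3} \int_0^v \int_0^u (x + xy)\,dx\,dy = \frac{4}{3} \int_0^v \left( \int_0^u (x + xy)\,dx \right) dy \]
\[ = \frac{4}{3} \int_0^v \left( \frac{x^2}{2}(1 + y) \Big|_0^u \right) dy = \frac{4}{3} \int_0^v \frac{u^2}{2}(1 + y)\,dy \]
\[ = \frac{4}{3} \cdot \frac{u^2}{2} \left( y + \frac{y^2}{2} \right) \Big|_0^v = \frac{4}{3} \cdot \frac{u^2}{2} \left( v + \frac{v^2}{2} \right) \]
which simplifies to
\[ F_{X,Y}(u, v) = \frac{1}{3} u^2 (2v + v^2) \]
(For what values of u and v is $F_{X,Y}(u, v)$ defined?)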
Theorem 3.7.3. Let $F_{X,Y}(u, v)$ be the joint cdf associated with the continuous random
variables X and Y. Then the joint pdf of X and Y, $f_{X,Y}(x, y)$, is a second partial derivative
of the joint cdf, that is,
\[ f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y) \]
provided $F_{X,Y}(x, y)$ has continuous second partial derivatives.
EXAMPLE 3.7.10
What is the joint pdf of the random variables X and Y whose joint cdf is
$F_{X,Y}(x, y) = \frac{1}{3}x^2(2y + y^2)$?
By Theorem 3.7.3,
\[ f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y) = \frac{\partial^2}{\partial x\,\partial y}\, \frac{1}{3} x^2 (2y + y^2) \]
\[ = \frac{2}{3} x (2 + 2y) = \frac{4}{3}(x + xy) \]
Notice the similarity between this example and Example 3.7.9: $f_{X,Y}(x, y)$ is the same in both
examples; so is $F_{X,Y}(x, y)$.
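Theorem 3.7.3 can be checked numerically for the cdf/pdf pair of Examples 3.7.9 and 3.7.10 by approximating the mixed partial derivative with a central difference. The sketch below is an illustration only; the function names, step size, and sample points are assumptions.

```python
def joint_cdf(x, y):
    """F(x, y) = (1/3) x^2 (2y + y^2) on the unit square."""
    return x * x * (2.0 * y + y * y) / 3.0


def joint_pdf(x, y):
    """f(x, y) = (4/3)(x + xy) on the unit square."""
    return 4.0 * (x + x * y) / 3.0


def mixed_partial(F, x, y, h=1e-3):
    """Central-difference approximation to the second partial derivative
    of F with respect to x and y."""
    return (F(x + h, y + h) - F(x + h, y - h)
            - F(x - h, y + h) + F(x - h, y - h)) / (4.0 * h * h)


# Interior points of the unit square chosen arbitrarily for the comparison.
points = [(0.25, 0.5), (0.5, 0.5), (0.9, 0.1)]
max_gap = max(abs(mixed_partial(joint_cdf, x, y) - joint_pdf(x, y))
              for x, y in points)
print(max_gap)
```

The gap between the differenced cdf and the stated pdf should be at the level of floating-point rounding, consistent with the two functions describing the same distribution.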
Multivariate Densities
The definitions and theorems in this section extend in a very straightforward way
to situations involving more than two variables. The joint pdf for n discrete random
variables, for example, is denoted $p_{X_1,\ldots,X_n}(x_1, \ldots, x_n)$ where
\[ p_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n) \]
For n continuous random variables, the joint pdf is that function $f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)$ having
the property that for any region R in n-space,
\[ P((X_1, \ldots, X_n) \in R) = \int \!\! \int \cdots \int_R f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\,dx_1 \cdots dx_n \]
And if $F_{X_1,\ldots,X_n}(x_1, \ldots, x_n)$ is the joint cdf of continuous random variables $X_1, \ldots, X_n$, that
is, $F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = P(X_1 \le x_1, \ldots, X_n \le x_n)$, then
\[ f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) \]
The notion of a marginal pdf also extends readily, although in the n-variate case, a
marginal pdf can, itself, be a joint pdf. Given $X_1, \ldots, X_n$, the marginal pdf of any subset
of r of those variables $(X_{i_1}, X_{i_2}, \ldots, X_{i_r})$ is derived by integrating (or summing) the joint
pdf with respect to the remaining n - r variables $(X_{j_1}, X_{j_2}, \ldots, X_{j_{n-r}})$. If the $X_i$'s are all
continuous, for example,
\[ f_{X_{i_1},\ldots,X_{i_r}}(x_{i_1}, \ldots, x_{i_r}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\,dx_{j_1} \cdots dx_{j_{n-r}} \]
QUESTIONS
3.7..2.5. Consider the experiment of simultaneously t05Sing a
coin and
X denote the number of heads showing on the coin and Y the number
00 the die.
(8) List the outcomes in
(b) Fmd FX.l::'(1,
3.7.26. An urn contains 12 chips: 4 red, 3 black, and 5 white. A sample of size 4 is to be drawn
without replacement. Let X denote the number of white chips in the sample; Y the
number of red. Find $F_{X,Y}(1, 2)$.
3.7:1.7. For each of the folJowingjoint pdf's, find Fx.Y(u, v).
(a) fx.y(x, y) = ~y2, 0::; x :s 2,0:$ Y ::; 1
(b) Ix.Y(x. Y) == l(x + 2y), 0 ~ x ~ 1, 0 ~ Y :s 1
(c) Ix,Y (x, Y) = 4xy, 0::; x :5: 1,0 :5: Y ::: 1
3.7..28, For each of the following joint pdfs, find Fx.y(u. tI).
(a) fx.y(x. y) = 0:::: x ~ y :s 2
(b) Ix.Y(x, y) ::::::: ~,O :s y :5: x .s 1
!.
(c:) fx.v(x,y)=6x.0~)(..s1,0:5y:sl
x
3.7:19. Find and graph Ix,v(x, y) if the joint cdf for random variables X and Y is
Fx.Y(x,y) =xY.
O<x<l.
0<y<1
1.7.30. Find the jOint pdf associated with two random variables X and Y whose:
x > O.
WI is
y > 0
3.7.3L Given that Fx,}'(x. Y) =
+ 5xy 4), 0 < x < 1,0 < Y < 1, find the correspOJld
pdf and use it to calculate P(O < X <
~ < Y < 1).
3.7.32. Prove that
\[ P(a < X \le b,\ c < Y \le d) = F_{X,Y}(b, d) - F_{X,Y}(a, d) - F_{X,Y}(b, c) + F_{X,Y}(a, c) \]
3.7.33. A certain brand of fluorescent bulbs will last, on the average, 1000 hours. Suppose that
four of these bulbs are installed in an office. What is the probability that all four are still
functioning after 1050 hours? If $X_i$ denotes the ith bulb's life, assume that
\[ f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4) = \prod_{i=1}^{4} \left( \frac{1}{1000} \right) e^{-x_i/1000} \]
for $x_i > 0$, i = 1, 2, 3, 4.
3.7.34. A hand of six cards is dealt from a standard poker deck. Let X denote the number of
aces, Y the number of kings, and Z the number of queens.
(a) Write a formula for $p_{X,Y,Z}(x, y, z)$.
(b) Find $p_{X,Y}(x, y)$ and $p_{X,Z}(x, z)$.
3.7.35. Calculate px.y(O. 1) if PX.f.Z(X. y,:::)
=
3!
-
\: -
\'
(~)3-X-Y-l forx,y.::=O,I,2,3andO:;;x + y ~ z~3.
3.1~.
Suppose [hal the random variables X.
(x
Jill.x. y.z(w, x, y. :)
<: W
2
(12 )Y (1)Z
6
1
+
for 0 <: x <' 1.0 <' Y < 1. and:: > O. Find (a)
3.737. The four random variables W, X, Y, and Z have
for 0
(1)~
and Z have the mul(ivariale pdf
y,Z)
!IV.X(W.X),
~)!
y(x. )'), (b) h.z(y,
z), and
fz(z).
multivariate
I6wxyz
<: I, 0 < x <: 1. 0 < y < 1, and 0 < ;:; < 1. Find the
and use it to compute P(O <: W < ~,~ < X <
pdf.
Independence of Two Random Variables
The concept of independent events that was introduced in Section 2.5 leads quite naturally
to a similar definition for independent random variables.
Definition 3.7.5. Two random variables X and Y are said to be independent if for every
interval A and every interval B, $P(X \in A \text{ and } Y \in B) = P(X \in A)P(Y \in B)$.
Theorem 3.7.4. The random variables X and Y are independent if and only if there are
functions g(x) and h(y) such that
\[ f_{X,Y}(x, y) = g(x)h(y) \qquad (3.7.2) \]
If Equation 3.7.2 holds, there is a constant k such that $f_X(x) = kg(x)$ and $f_Y(y) = (1/k)h(y)$.
Proof. We prove the theorem for the continuous case. The discrete case is similar.
First, suppose that X and Y are independent. Then
\[ F_{X,Y}(x, y) = P(X \le x \text{ and } Y \le y) = P(X \le x)P(Y \le y) = F_X(x)F_Y(y) \]
and we can write
\[ f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_X(x)F_Y(y) = \frac{d}{dx}F_X(x)\,\frac{d}{dy}F_Y(y) = f_X(x)f_Y(y) \]
Next we need to show that Equation 3.7.2 implies that X and Y are independent. To
begin, note that
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy = \int_{-\infty}^{\infty} g(x)h(y)\,dy = g(x) \int_{-\infty}^{\infty} h(y)\,dy \]
Set $k = \int_{-\infty}^{\infty} h(y)\,dy$, so $f_X(x) = kg(x)$. Similarly, it can be shown that $f_Y(y) = (1/k)h(y)$.
Therefore,
\[ P(X \in A \text{ and } Y \in B) = \int_A \int_B f_{X,Y}(x, y)\,dx\,dy = \int_A \int_B g(x)h(y)\,dx\,dy \]
\[ = \int_A \int_B kg(x)(1/k)h(y)\,dx\,dy = \int_A f_X(x)\,dx \int_B f_Y(y)\,dy \]
\[ = P(X \in A)P(Y \in B) \]
and the theorem is proved. $\square$
EXAMPLE 3.7.11
Suppose that the probabilistic behavior of two random variables X and Y is described by
the joint pdf $f_{X,Y}(x, y) = 12xy(1 - y)$, $0 \le x \le 1$, $0 \le y \le 1$. Are X and Y independent?
If they are, find $f_X(x)$ and $f_Y(y)$.
According to Theorem 3.7.4, the answer to the independence question is "yes" if
$f_{X,Y}(x, y)$ can be factored into a function of x times a function of y. But there are such
functions. Let g(x) = 12x and h(y) = y(1 - y).
To find $f_X(x)$ and $f_Y(y)$ requires that the "12" appearing in $f_{X,Y}(x, y)$ be factored in
such a way that $g(x) \cdot h(y) = f_X(x) \cdot f_Y(y)$. Let
\[ k = \int_{-\infty}^{\infty} h(y)\,dy = \int_0^1 y(1 - y)\,dy = \left[ \frac{y^2}{2} - \frac{y^3}{3} \right]_0^1 = \frac{1}{6} \]
Therefore, $f_X(x) = kg(x) = \frac{1}{6}(12x) = 2x$, $0 \le x \le 1$, and $f_Y(y) = (1/k)h(y) = 6y(1 - y)$,
$0 \le y \le 1$.
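The factorization in Example 3.7.11 can be verified with exact arithmetic: compute k from h, form the two marginals, and confirm that their product reproduces the joint pdf. The sketch below checks the identity on a grid of rational points only, which is an illustration rather than a proof; the function names are hypothetical.

```python
from fractions import Fraction


def joint_pdf(x, y):
    return 12 * x * y * (1 - y)


def g(x):
    return 12 * x


def h(y):
    return y * (1 - y)


# k is the integral of h(y) = y - y^2 over [0, 1], i.e. 1/2 - 1/3.
k = Fraction(1, 2) - Fraction(1, 3)


def marginal_x(x):
    return k * g(x)      # f_X(x) = k * g(x)


def marginal_y(y):
    return h(y) / k      # f_Y(y) = (1/k) * h(y)


grid = [Fraction(i, 10) for i in range(11)]
factorizes = all(joint_pdf(x, y) == marginal_x(x) * marginal_y(y)
                 for x in grid for y in grid)
print(k, marginal_x(Fraction(1, 2)), marginal_y(Fraction(1, 2)), factorizes)
```

Note that the split of the constant 12 between g and h is arbitrary; it is the rescaling by k and 1/k that turns the two factors into proper marginal pdfs.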
Independence of n (> 2) Random Variables
In Chapter 2, extending the notion of independence from two events to n events proved
to be something of a problem. The independence of each subset of the n events had to be
checked separately (recall Definition 2.5.2). This is not necessary in the case of n random
variables. We simply use the extension of Theorem 3.7.4 to n random variables as the
definition of independence in the multidimensional case. The theorem that independence
is equivalent to the factorization of the joint pdf holds in the multidimensional case.
Definition 3.7.6. The n random variables, $X_1, X_2, \ldots, X_n$, are said to be independent if
there are functions $g_1(x_1), g_2(x_2), \ldots, g_n(x_n)$ such that for every $x_1, x_2, \ldots, x_n$
\[ f_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = g_1(x_1)g_2(x_2) \cdots g_n(x_n) \]
A similar statement holds for discrete random variables, in which case f is replaced
with p.
Comment. Analogous to the result for n = 2 random variables, the expression on the
right-hand side of the equation in Definition 3.7.6 can also be written as the product of
the marginal pdfs of $X_1, X_2, \ldots$, and $X_n$.
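EXAMPLE 3.7.12
Consider k urns, each holding n chips, numbered 1 through n. A chip is to be drawn at
random from each urn. What is the probability that all k chips will bear the same number?
If $X_1, X_2, \ldots, X_k$ denote the numbers on the 1st, 2nd, ..., and kth chips, respectively,
we are looking for the probability that $X_1 = X_2 = \cdots = X_k$. In terms of the joint pdf,
\[ P(X_1 = X_2 = \cdots = X_k) = \sum_{x_1 = x_2 = \cdots = x_k} p_{X_1,X_2,\ldots,X_k}(x_1, x_2, \ldots, x_k) \]
Each of the selections here is obviously independent of all the others, so the joint pdf
factors according to Definition 3.7.6, and we can write
\[ P(X_1 = X_2 = \cdots = X_k) = \sum_{i=1}^{n} p_{X_1}(i) \cdot p_{X_2}(i) \cdots p_{X_k}(i) = n \cdot \left( \frac{1}{n} \right)^k = \frac{1}{n^{k-1}} \]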
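For small n and k, the probability that all k chips match can be confirmed by listing every equally likely outcome and counting the matches. The following sketch compares that brute-force count with the closed form n(1/n)^k = 1/n^(k-1); both function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def all_same_by_enumeration(n, k):
    """Enumerate all n^k equally likely draws (one chip from each of k urns)
    and return the fraction in which every chip bears the same number."""
    outcomes = product(range(1, n + 1), repeat=k)
    matches = sum(1 for draw in outcomes if len(set(draw)) == 1)
    return Fraction(matches, n ** k)


def all_same_by_formula(n, k):
    """Closed form obtained from the factored joint pdf: n * (1/n)^k."""
    return Fraction(1, n ** (k - 1))


for n, k in [(2, 3), (4, 2), (5, 4)]:
    print(n, k, all_same_by_enumeration(n, k), all_same_by_formula(n, k))
```

Enumeration grows as n^k, so it is practical only for small cases, but it does not rely on the independence argument and therefore serves as a separate check.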
Random Samples
Definition 3.7.6 addresses the question of independence as it applies to n random variables
having marginal pdfs, say, $f_{X_1}(x_1), f_{X_2}(x_2), \ldots, f_{X_n}(x_n)$, that might be quite different.
A special case of that definition occurs for virtually every set of data collected for
statistical analysis. Suppose an experimenter takes a set of n measurements, $x_1, x_2, \ldots, x_n$,
under the same conditions. Those $X_i$'s, then, qualify as a set of independent random
variables; moreover, each represents the same pdf. The special, but familiar, notation
for that scenario is given in Definition 3.7.7. We will encounter it often in the chapters
ahead.
Definition 3.7.7. Let $X_1, X_2, \ldots, X_n$ be a set of n independent random variables, all
having the same pdf. Then $X_1, X_2, \ldots, X_n$ are said to be a random sample of size n.
QUESTIONS
3.7.38. Two fair dice are tossed. Let X denote the number appearing on the first die and Y the
number on the second. Show that X and Y are independent.
3.7.39. Let $f_{X,Y}(x, y) = \lambda^2 e^{-\lambda(x+y)}$, $0 \le x$, $0 \le y$. Show that X and Y are independent. What
are the marginal pdfs in this case?
3.7.40. Suppose that each of two urns has four chips, numbered 1 through 4. A chip is drawn
from the first urn and bears the number X. That chip is added to the second urn. A
chip is then drawn from the second urn. Call its number Y.
(a) Find $p_{X,Y}(x, y)$.
(b) Show that $p_X(k) = p_Y(k) = \frac{1}{4}$, k = 1, 2, 3, 4.
(c) Show that X and Y are not independent.
3.7.41. Let X and Y be random variables with joint pdf
\[ f_{X,Y}(x, y) = k, \quad 0 \le x \le 1, \quad 0 \le y \le 1, \quad 0 \le x + y \le 1 \]
Give a geometric argument to show that X and Y are not independent.
3.7.42. Are the random variables X and Y independent if $f_{X,Y}(x, y) = \frac{2}{3}(x + 2y)$, $0 \le x \le 1$,
$0 \le y \le 1$?
3.7,.43. Suppose that random variables X and Yare independent with marginal pd~
o:s x :s 1 and Jy(y) = 3y2, 0.::; y ::: L Find P(Y < X),
3.7..44.
Ix~x)
=
the joint cdf of the independent random variables X and Y, where Ix(;x) = x
O:s x ~ 2 and fy(y) =
O:s y :S L
3.7.45. If two random variables X and Yare independent with marginal
o ~ x ~ 1 and fy(y) = 1, 0 ~ y :s 1, calculate P (~
3.7..46. Suppose
d that
y)
=
Ix(x) =
> 2}
x > 0, y > O. Prove for any real numbers a, b, c. and
Pea < X < b, C < Y < d) = P(a < X < b) . P(c < Y < d}
thereby establishing the independence of X and Y.
3.7.47. Given the joint pdf $f_{X,Y}(x, y) = 2x + y - 2xy$, 0 < x < 1, 0 < y < 1, find numbers
a, b, c, and d such that
\[ P(a < X < b,\ c < Y < d) \ne P(a < X < b) \cdot P(c < Y < d) \]
thus demonstrating that X and Y are not independent.
3.7.48. Prove that if X and Y are two independent random variables, then U = g(X) and
V = h(Y) are also independent.
3.7.49. If two random variables X and Y are defined over a region in the XY-plane that is not a
rectangle (possibly infinite) with sides parallel to the coordinate axes, can X and Y be
independent?
3.7.50. Write down the joint probability density function for a random sample of size n drawn
from the exponential pdf, $f_X(x) = (1/\lambda)e^{-x/\lambda}$, x > 0.
3.7.5L Suppose that XI>
X3, and X4 are
(Xi) = 4xf, O:s Xi :s 1. Find
(a) P(XI < ~)
(b) P(exactlyone
<~)
random variables, each with pdf
(c) fX}.X2,Xj,X.(XI. X2.XJ.X'4)
(d) F X2. X3 (X2. X3)
3.7.52. A random sample of size n = 2k is taken from a uniform pdf defined over the unit
interval. Calculate $P(X_1 < \frac{1}{2}, X_2 > \frac{1}{2}, X_3 < \frac{1}{2}, X_4 > \frac{1}{2}, \ldots, X_{2k} > \frac{1}{2})$.
3.8 COMBINING RANDOM VARIABLES
In Section 3.4, we derived a linear transformation frequently applied to single random
variables, Y = a + bX. Now, armed with the multivariable concepts and techniques
covered in Section 3.7, we can extend the investigation of transformations to functions
defined on sets of random variables. In statistics, the most important combination of a
set of random variables is often their sum, so we begin this section with the problem of
finding the pdf of X + Y.
Finding the pdf of a Sum
Theorem 3.8.1. Suppose that X (JJ1d Yare independent random variables. Let W = X
Then
+ Y.
1. If X (JJ1d Yare discrete random varinbles with pdfs px(x) (JJ1d py(y), respectively.
pw(w) =
PX(x)py(w - x)
2. If X (JJ1d Yare continuous random variables with pdfs fx(x) and fy(y). respectively,
fw(w) =
L:
Jx(x)fy(w - x)dx
Proof.
1. pw(w) = P(W
=
= w) = P(X
+
U P(X::::: x. Y = w ~.
Y = W)
x) =
P(X
X,
Y
w - x)
~x
L P(X =x)P(Y = w
= L px(x)Py(w - x)
=
x)
aUx
all
of
where the next-to-Iast equality derives from the independence of X and Y.
2. Since X and Y are continuous random variables, we can find $f_W(w)$ by differentiating the corresponding cdf, $F_W(w)$. Here, $F_W(w) = P(X + Y \le w)$ is found by integrating $f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y)$ over the shaded region R pictured in Figure 3.8.1.
FIGURE 3.8.1
$$F_W(w) = \int_{-\infty}^{\infty} \int_{-\infty}^{w-x} f_X(x)\, f_Y(y)\, dy\, dx = \int_{-\infty}^{\infty} f_X(x) \left( \int_{-\infty}^{w-x} f_Y(y)\, dy \right) dx = \int_{-\infty}^{\infty} f_X(x)\, F_Y(w - x)\, dx$$
Assume that the integrand in the above equation is sufficiently smooth so that differentiation and integration can be interchanged. Then we can write
$$f_W(w) = \frac{d}{dw} F_W(w) = \frac{d}{dw} \int_{-\infty}^{\infty} f_X(x)\, F_Y(w - x)\, dx = \int_{-\infty}^{\infty} f_X(x) \left( \frac{d}{dw} F_Y(w - x) \right) dx = \int_{-\infty}^{\infty} f_X(x)\, f_Y(w - x)\, dx$$
and the theorem is proved.
Comment. The integral in part (2) above is referred to as the convolution of the functions $f_X$ and $f_Y$. Besides their frequent appearances in random-variable problems, convolutions turn up in many areas of mathematics and engineering.
EXAMPLE 3.8.1
Suppose that X and Y are two independent binomial random variables, each with the same success probability but defined on m and n trials, respectively. Specifically,
$$p_X(k) = \binom{m}{k} p^k (1 - p)^{m-k}, \quad k = 0, 1, \ldots, m$$
and
$$p_Y(k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n$$
Find $p_W(w)$, where W = X + Y.
By Theorem 3.8.1, $p_W(w) = \sum_{\text{all } x} p_X(x)\, p_Y(w - x)$, but the summation over "all x" needs to be interpreted as the set of values for x and w - x such that $p_X(x)$ and $p_Y(w - x)$, respectively, are both nonzero. But that will be true for all integers x from 0 to w. Therefore,
$$p_W(w) = \sum_{x=0}^{w} p_X(x)\, p_Y(w - x) = \sum_{x=0}^{w} \binom{m}{x} p^x (1 - p)^{m-x} \binom{n}{w-x} p^{w-x} (1 - p)^{n-(w-x)} = \sum_{x=0}^{w} \binom{m}{x} \binom{n}{w-x} p^w (1 - p)^{n+m-w}$$
Now, consider an urn having m red chips and n white chips. If w chips are drawn out without replacement, the probability that exactly x red chips are in the sample is given by the hypergeometric distribution,
$$\frac{\binom{m}{x} \binom{n}{w-x}}{\binom{m+n}{w}} \qquad (3.8.1)$$
Summing Equation 3.8.1 from x = 0 to x = w must equal one (why?), in which case
$$\sum_{x=0}^{w} \binom{m}{x} \binom{n}{w-x} = \binom{m+n}{w}$$
so
$$p_W(w) = \binom{m+n}{w} p^w (1 - p)^{n+m-w}, \quad w = 0, 1, \ldots, n + m$$
Should we recognize $p_W(w)$? Definitely. Compare the structure of $p_W(w)$ to the general form of the binomial pdf: The random variable W has a binomial distribution where the probability of success at any given trial is p and the total number of trials is n + m.
Comment. Example 3.8.1 shows that the binomial distribution "reproduces" itself; that is, if X and Y are independent binomial random variables with the same value for p, their sum is also a binomial random variable. Not all random variables share that property. The sum of two independent uniform random variables, for example, is not a uniform random variable (see the questions at the end of this section).
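The reproducing property in Example 3.8.1 can be checked numerically. The following is a minimal sketch, assuming illustrative values m = 4, n = 6, and p = 0.3 that do not come from the text; the helper names `binomial_pmf` and `convolve_pmfs` are hypothetical. It applies part 1 of Theorem 3.8.1 directly and compares the result with the binomial pdf on m + n trials.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(X = k) for k = 0..n] for a binomial random variable."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve_pmfs(px, py):
    """pdf of W = X + Y for independent X and Y supported on 0, 1, 2, ...
    (the discrete convolution sum of Theorem 3.8.1, part 1)."""
    pw = [0.0] * (len(px) + len(py) - 1)
    for x, px_x in enumerate(px):
        for y, py_y in enumerate(py):
            pw[x + y] += px_x * py_y
    return pw

m, n, p = 4, 6, 0.3  # illustrative values (assumed, not from the text)
pw = convolve_pmfs(binomial_pmf(m, p), binomial_pmf(n, p))
direct = binomial_pmf(m + n, p)

# Largest pointwise difference between the convolution and binomial(m + n, p)
max_gap = max(abs(a - b) for a, b in zip(pw, direct))
print(max_gap)
```

The difference should be at the level of floating-point rounding, which is what the hypergeometric identity in the example predicts.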
EXAMPLE 3.8.2
Suppose a monitor relies on an electronic sensor, whose lifetime X is modeled by the exponential pdf $f_X(x) = \lambda e^{-\lambda x}$, $x > 0$. To improve the reliability of the monitor, the manufacturer has included an identical second sensor that is activated only in the event the first sensor malfunctions. (This is called cold redundancy.) Let the random variable Y denote the lifetime of the second sensor, in which case the lifetime of the monitor can be written as the sum W = X + Y. Find $f_W(w)$.
Since X and Y are both continuous random variables,
$$f_W(w) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(w - x)\, dx \qquad (3.8.2)$$
Notice that $f_X(x) > 0$ only if $x > 0$ and $f_Y(w - x) > 0$ only if $x < w$. Therefore, the integral in Equation 3.8.2 that goes from $-\infty$ to $\infty$ reduces to an integral from 0 to w, and we can write
$$f_W(w) = \int_0^w f_X(x)\, f_Y(w - x)\, dx = \int_0^w \lambda e^{-\lambda x}\, \lambda e^{-\lambda (w - x)}\, dx = \lambda^2 e^{-\lambda w} \int_0^w dx = \lambda^2 w e^{-\lambda w}, \quad w \ge 0$$
Comment. By integrating $f_X(x)$ and $f_W(w)$, we can assess the improvement in the monitor's reliability afforded by the cold redundancy. Since X is an exponential random variable, $E(X) = 1/\lambda$ (recall the earlier example). How different, for example, are $P(X \ge 1/\lambda)$ and $P(W \ge 1/\lambda)$? A simple calculation shows that the latter is actually twice the magnitude of the former:
$$P(X \ge 1/\lambda) = \int_{1/\lambda}^{\infty} \lambda e^{-\lambda x}\, dx = e^{-1} = 0.37$$
$$P(W \ge 1/\lambda) = \int_{1/\lambda}^{\infty} \lambda^2 w e^{-\lambda w}\, dw = 2e^{-1} = 0.74$$
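The comparison of $P(X \ge 1/\lambda)$ with $P(W \ge 1/\lambda)$ in Example 3.8.2 can be reproduced numerically. This is a rough sketch under stated assumptions: the rate `lam = 2.0` is an arbitrary illustrative choice, the pdf of W is taken to be $\lambda^2 w e^{-\lambda w}$ (the convolution of two exponential pdfs with the same rate), and `midpoint_integral` is a hypothetical helper implementing the midpoint rule.

```python
import math

def f_w(w, lam):
    """pdf of W = X + Y for two independent exponential(lam) lifetimes."""
    return lam**2 * w * math.exp(-lam * w) if w >= 0 else 0.0

def midpoint_integral(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

lam = 2.0        # illustrative rate (an assumption)
t = 1 / lam      # the mean lifetime of a single sensor

p_x = math.exp(-lam * t)                                    # P(X >= 1/lam)
p_w = 1 - midpoint_integral(lambda w: f_w(w, lam), 0.0, t)  # P(W >= 1/lam)
print(round(p_x, 4), round(p_w, 4), round(p_w / p_x, 4))
```

Because both probabilities depend on $\lambda$ only through $\lambda t = 1$, the ratio should be 2 for any positive rate.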
Finding the pdfs of Quotients and Products
We conclude this section by considering the pdfs for the quotient and product of two independent random variables. That is, given X and Y, we are looking for $f_W(w)$, where (1) W = Y/X and (2) W = XY. Neither of the resulting formulas is as important as the pdf for the sum of two random variables, but both formulas will play key roles in several derivations in Chapter 7.
Theorem 3.8.2. Let X and Y be independent continuous random variables, with pdfs $f_X(x)$ and $f_Y(y)$, respectively. Assume that X is zero for at most a set of isolated points. Let W = Y/X. Then
$$f_W(w) = \int_{-\infty}^{\infty} |x|\, f_X(x)\, f_Y(wx)\, dx$$
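Proof.
$$F_W(w) = P(Y/X \le w)$$
$$= P(Y/X \le w \text{ and } X \ge 0) + P(Y/X \le w \text{ and } X < 0)$$
$$= P(Y \le wX \text{ and } X \ge 0) + P(Y \ge wX \text{ and } X < 0)$$
$$= P(Y \le wX \text{ and } X \ge 0) + P(X < 0) - P(Y \le wX \text{ and } X < 0)$$
$$= \int_0^{\infty} \int_{-\infty}^{wx} f_X(x)\, f_Y(y)\, dy\, dx + P(X < 0) - \int_{-\infty}^{0} \int_{-\infty}^{wx} f_X(x)\, f_Y(y)\, dy\, dx$$
Then differentiate $F_W(w)$ to obtain
$$f_W(w) = \frac{d}{dw} F_W(w) = \frac{d}{dw} \int_0^{\infty} \int_{-\infty}^{wx} f_X(x)\, f_Y(y)\, dy\, dx - \frac{d}{dw} \int_{-\infty}^{0} \int_{-\infty}^{wx} f_X(x)\, f_Y(y)\, dy\, dx$$
$$= \int_0^{\infty} f_X(x) \left( \frac{d}{dw} \int_{-\infty}^{wx} f_Y(y)\, dy \right) dx - \int_{-\infty}^{0} f_X(x) \left( \frac{d}{dw} \int_{-\infty}^{wx} f_Y(y)\, dy \right) dx \qquad (3.8.3)$$
(Note that we are assuming sufficient regularity of the functions to permit interchange of integration and differentiation.)
To proceed, we need to differentiate the function $G(w) = \int_{-\infty}^{wx} f_Y(y)\, dy$ with respect to w. By the Fundamental Theorem of Calculus and the chain rule, we find
$$\frac{d}{dw} G(w) = \frac{d}{dw} \int_{-\infty}^{wx} f_Y(y)\, dy = f_Y(wx) \cdot \frac{d}{dw}(wx) = x f_Y(wx)$$
Putting this result into Equation 3.8.3 gives
$$f_W(w) = \int_0^{\infty} x f_X(x)\, f_Y(wx)\, dx - \int_{-\infty}^{0} x f_X(x)\, f_Y(wx)\, dx$$
$$= \int_0^{\infty} x f_X(x)\, f_Y(wx)\, dx + \int_{-\infty}^{0} (-x) f_X(x)\, f_Y(wx)\, dx$$
$$= \int_0^{\infty} |x|\, f_X(x)\, f_Y(wx)\, dx + \int_{-\infty}^{0} |x|\, f_X(x)\, f_Y(wx)\, dx$$
$$= \int_{-\infty}^{\infty} |x|\, f_X(x)\, f_Y(wx)\, dx$$
which completes the proof. □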
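EXAMPLE 3.8.3
Let X and Y be independent random variables with pdfs $f_X(x) = \lambda e^{-\lambda x}$, $x > 0$ and $f_Y(y) = \lambda e^{-\lambda y}$, $y > 0$. Define W = Y/X. Find $f_W(w)$.
Substituting into the formula given in Theorem 3.8.2, we can write
$$f_W(w) = \int_0^{\infty} x (\lambda e^{-\lambda x})(\lambda e^{-\lambda x w})\, dx = \lambda^2 \int_0^{\infty} x e^{-\lambda (1 + w) x}\, dx = \frac{\lambda^2}{\lambda (1 + w)} \int_0^{\infty} x\, \lambda (1 + w) e^{-\lambda (1 + w) x}\, dx$$
Notice that the integral is the expected value of an exponential random variable with parameter $\lambda (1 + w)$, so it equals $1/[\lambda (1 + w)]$ (recall Example 3.5.6). Therefore,
$$f_W(w) = \frac{\lambda^2}{\lambda (1 + w)} \cdot \frac{1}{\lambda (1 + w)} = \frac{1}{(1 + w)^2}, \quad w \ge 0$$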
Theorem 3.8.3. Let X and Y be independent continuous random variables with pdfs $f_X(x)$ and $f_Y(y)$, respectively. Let W = XY. Then
$$f_W(w) = \int_{-\infty}^{\infty} \frac{1}{|x|}\, f_X(w/x)\, f_Y(x)\, dx$$
Proof. A line-by-line, straightforward modification of the proof of Theorem 3.8.2 will provide a proof of Theorem 3.8.3. The details are left to the reader. □
EXAMPLE 3.8.4
Suppose that X and Y are independent random variables with pdfs $f_X(x) = 1$, $0 \le x \le 1$, and $f_Y(y) = 2y$, $0 \le y \le 1$, respectively. Find $f_W(w)$, where W = XY.
According to Theorem 3.8.3,
$$f_W(w) = \int_{-\infty}^{\infty} \frac{1}{|x|}\, f_X(w/x)\, f_Y(x)\, dx$$
The region of integration, though, needs to be restricted to values of x for which the integrand is positive. But $f_X(w/x)$ is positive only if $0 \le w/x \le 1$, which implies that $x \ge w$. Moreover, for $f_Y(x)$ to be positive requires that $0 \le x \le 1$. Any x, then, from w to 1 will yield a positive integrand. Therefore,
$$f_W(w) = \int_w^1 \frac{1}{x}(1)(2x)\, dx = \int_w^1 2\, dx = 2 - 2w, \quad 0 \le w \le 1$$
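The result of Example 3.8.4 can be checked by evaluating the Theorem 3.8.3 integral numerically. The sketch below assumes a simple midpoint rule over 0 < x ≤ 1 (outside that range $f_Y(x)$ is zero); the function names are hypothetical and the three values of w are arbitrary test points.

```python
def f_x(x):
    """Uniform pdf on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def f_y(y):
    """pdf 2y on [0, 1]."""
    return 2.0 * y if 0.0 <= y <= 1.0 else 0.0

def product_pdf(w, steps=100_000):
    """Midpoint-rule approximation of the integral of
    (1/|x|) * f_x(w/x) * f_y(x) over 0 < x <= 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += (1.0 / abs(x)) * f_x(w / x) * f_y(x)
    return total * h

for w in (0.1, 0.5, 0.9):
    # numerical value next to the closed form 2 - 2w
    print(w, round(product_pdf(w), 3), 2 - 2 * w)
```

The integrand equals 2 for x between w and 1 and 0 elsewhere, so the numerical values should track 2 - 2w up to the step size.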
Comment. Theorems 3.8.1, 3.8.2, and 3.8.3 can be adapted to situations where X and Y are not independent by replacing the product of the marginal pdfs with the joint pdf.
QUESTIONS
3.8.1. Let X and Y be two independent random variables. Given the marginal pdfs shown below, find the pdf of X + Y. In each case, check to see if X + Y belongs to the same family of pdfs as do X and Y.
(a) $p_X(k) = e^{-\lambda} \frac{\lambda^k}{k!}$ and $p_Y(k) = e^{-\mu} \frac{\mu^k}{k!}$, $k = 0, 1, 2, \ldots$
(b) $p_X(k) = p_Y(k) = (1 - p)^{k-1} p$, $k = 1, 2, \ldots$
3.8.2. Suppose $f_X(x) = xe^{-x}$, $x \ge 0$, and $f_Y(y) = e^{-y}$, $y \ge 0$, where X and Y are independent. Find the pdf of X + Y.
3.8.3. Let X and Y be two independent random variables whose marginal pdfs are given below. Find the pdf of X + Y. Hint: Consider two cases, $0 \le w < 1$ and $1 \le w \le 2$.
$f_X(x) = 1$, $0 \le x \le 1$, and $f_Y(y) = 1$, $0 \le y \le 1$
3.8.4. If a random variable V is independent of two independent random variables X and Y, prove that V is independent of X + Y.
3.8.5. Let Y be a uniform random variable over the interval [0, 1]. Find the pdf of $W = Y^2$. Hint: First find $F_W(w)$.
3.8.6. Let Y be a random variable with $f_Y(y) = 6y(1 - y)$, $0 \le y \le 1$. Find the pdf of $W = Y^2$.
3.8.7. Given that X and Y are independent random variables, find the pdf of XY for the following two sets of marginal pdfs:
(a) $f_X(x) = 1$, $0 \le x \le 1$, and $f_Y(y) = [\ldots]$
(b) $f_X(x) = [\ldots]$, $0 \le x \le 1$, and $f_Y(y) = 2y$, $0 \le y \le 1$
3.8.8. Let X and Y be two independent random variables. Given the marginal pdfs indicated below, find the cdf of Y/X. Hint: Consider two cases, $0 \le w \le 1$ and $1 < w$.
(a) $f_X(x) = 1$, $0 \le x \le 1$, and $f_Y(y) = 1$, $0 \le y \le 1$
(b) $f_X(x) = [\ldots]$, $0 \le x \le 1$, and $f_Y(y) = [\ldots]$, $0 \le y \le 1$
3.8.9. Suppose that X and Y are two independent random variables, where $f_X(x) = xe^{-x}$, $x \ge 0$ and $f_Y(y) = e^{-y}$, $y \ge 0$. Find the pdf of Y/X.
3.9
FURTHER PROPERTIES OF THE MEAN AND VARIANCE
Sections 3.5 and 3.6 introduced the basic definitions related to the expected value and variance of single random variables. We learned how to calculate E(W), E[g(W)], E(aW + b), Var(W), and Var(aW + b), where a and b are any constants and W could be either a discrete or a continuous random variable. The purpose of this section is to examine certain multivariable extensions of those results, based on the joint pdf material covered in Section 3.7.
We begin with a theorem that generalizes E[g(W)]. While it is stated here for the case of two random variables, it extends in a very straightforward way to include functions of n random variables.
Theorem 3.9.1.
1. Suppose X and Y are discrete random variables with joint pdf $p_{X,Y}(x, y)$, and let g(X, Y) be a function of X and Y. Then the expected value of the random variable g(X, Y) is given by
$$E[g(X, Y)] = \sum_{\text{all } x} \sum_{\text{all } y} g(x, y) \cdot p_{X,Y}(x, y)$$
provided $\sum_{\text{all } x} \sum_{\text{all } y} |g(x, y)| \cdot p_{X,Y}(x, y) < \infty$.
2. Suppose X and Y are continuous random variables with joint pdf $f_{X,Y}(x, y)$, and let g(X, Y) be a continuous function. Then the expected value of the random variable g(X, Y) is given by
$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) \cdot f_{X,Y}(x, y)\, dx\, dy$$
provided $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |g(x, y)| \cdot f_{X,Y}(x, y)\, dx\, dy < \infty$.
Proof. The basic approach taken in deriving this result is similar to the method followed in the proof of Theorem 3.5.3. See (134) for details. □
EXAMPLE 3.9.1
Consider the two random variables X and Y whose joint pdf is detailed in the 2 x 4 matrix shown in Table 3.9.1. Let
$$g(X, Y) = 3X - 2XY + Y$$
Find E[g(X, Y)] two ways: first, by using the basic definition of an expected value, and secondly, by using Theorem 3.9.1.
TABLE 3.9.1
              y
x        0      1      2      3
0       1/8    1/4    1/8     0
1        0     1/8    1/4    1/8
Let Z = 3X - 2XY + Y. By inspection, Z takes on the values 0, 1, 2, and 3 according to the pdf $f_Z(z)$ shown in Table 3.9.2.
TABLE 3.9.2
z          0      1      2      3
f_Z(z)    1/4    1/2    1/4     0
From the basic definition, then, that an expected value is a weighted average, we see that E[g(X, Y)] is equal to one:
$$E[g(X, Y)] = E(Z) = \sum_{\text{all } z} z \cdot f_Z(z) = 0 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{4} + 3 \cdot 0 = 1$$
The same answer is obtained by applying Theorem 3.9.1 to the joint pdf given in Table 3.9.1:
$$E[g(X, Y)] = 0 \cdot \tfrac{1}{8} + 1 \cdot \tfrac{1}{4} + 2 \cdot \tfrac{1}{8} + 3 \cdot 0 + 3 \cdot 0 + 2 \cdot \tfrac{1}{8} + 1 \cdot \tfrac{1}{4} + 0 \cdot \tfrac{1}{8} = 1$$
The advantage, of course, enjoyed by the latter solution is that we avoid the intermediate step of having to determine $f_Z(z)$.
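The two routes to E[g(X, Y)] in Example 3.9.1 can be compared side by side. In this sketch the joint pdf values are an assumption: they are read off from the eight terms of the Theorem 3.9.1 sum shown in the example (each value of g(x, y) paired with its probability), not copied from the printed table. The variable names are hypothetical.

```python
from fractions import Fraction as F
from collections import defaultdict

# Joint pdf inferred from the terms of the Theorem 3.9.1 sum; keys are (x, y).
joint = {
    (0, 0): F(1, 8), (0, 1): F(1, 4), (0, 2): F(1, 8), (0, 3): F(0),
    (1, 0): F(0),    (1, 1): F(1, 8), (1, 2): F(1, 4), (1, 3): F(1, 8),
}

def g(x, y):
    return 3 * x - 2 * x * y + y

# Route 1: first build the pdf of Z = g(X, Y), then take its weighted average.
pz = defaultdict(F)
for (x, y), p in joint.items():
    pz[g(x, y)] += p
e_via_z = sum(z * p for z, p in pz.items())

# Route 2: Theorem 3.9.1, summing g(x, y) * p(x, y) over the joint pdf directly.
e_direct = sum(g(x, y) * p for (x, y), p in joint.items())

print(dict(pz), e_via_z, e_direct)
```

Route 2 skips the intermediate dictionary `pz`, which mirrors the advantage the example attributes to Theorem 3.9.1.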
EXAMPLE 3.9.2
An electrical circuit has three resistors, $R_X$, $R_Y$, and $R_Z$, wired in parallel (see Figure 3.9.1). The nominal resistance of each is fifteen ohms, but their actual resistances, X, Y, and Z, vary between ten and twenty according to the joint pdf,
$$f_{X,Y,Z}(x, y, z) = \frac{1}{675{,}000}(xy + xz + yz), \quad 10 \le x \le 20,\ 10 \le y \le 20,\ 10 \le z \le 20$$
What is the expected resistance for the circuit?
FIGURE 3.9.1
Let R denote the circuit's resistance. A well-known result in physics holds that
$$\frac{1}{R} = \frac{1}{X} + \frac{1}{Y} + \frac{1}{Z}$$
or, equivalently,
$$R = \frac{XYZ}{XY + XZ + YZ} = R(X, Y, Z)$$
Integrating $R(x, y, z) \cdot f_{X,Y,Z}(x, y, z)$ shows that the expected resistance is five:
$$E(R) = \int_{10}^{20} \int_{10}^{20} \int_{10}^{20} \frac{xyz}{xy + xz + yz} \cdot \frac{1}{675{,}000}(xy + xz + yz)\, dx\, dy\, dz = \frac{1}{675{,}000} \int_{10}^{20} \int_{10}^{20} \int_{10}^{20} xyz\, dx\, dy\, dz = 5.0$$
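Theorem 3.9.2. Let X and Y be any two random variables (discrete or continuous, dependent or independent), and let a and b be any two constants. Then
$$E(aX + bY) = aE(X) + bE(Y)$$
provided E(X) and E(Y) are both finite.
Proof. Consider the continuous case (the discrete case is proved in much the same way). Let $f_{X,Y}(x, y)$ be the joint pdf of X and Y, and define g(X, Y) = aX + bY. By Theorem 3.9.1,
$$E(aX + bY) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (ax + by)\, f_{X,Y}(x, y)\, dx\, dy$$
$$= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (ax)\, f_{X,Y}(x, y)\, dx\, dy + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (by)\, f_{X,Y}(x, y)\, dx\, dy$$
$$= a \int_{-\infty}^{\infty} x \left( \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy \right) dx + b \int_{-\infty}^{\infty} y \left( \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx \right) dy$$
$$= a \int_{-\infty}^{\infty} x f_X(x)\, dx + b \int_{-\infty}^{\infty} y f_Y(y)\, dy$$
$$= aE(X) + bE(Y)$$
□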
Corollary. Let $W_1, W_2, \ldots, W_n$ be any random variables for which $E(W_i) < \infty$, $i = 1, 2, \ldots, n$, and let $a_1, a_2, \ldots, a_n$ be any set of constants. Then
$$E(a_1 W_1 + a_2 W_2 + \cdots + a_n W_n) = a_1 E(W_1) + a_2 E(W_2) + \cdots + a_n E(W_n)$$
EXAMPLE 3.9.3
Let X be a binomial random variable defined on n independent trials, each trial resulting in success with probability p. Find E(X).
Note, first, that X can be thought of as a sum, $X = X_1 + X_2 + \cdots + X_n$, where $X_i$ represents the number of successes occurring at the ith trial:
$$X_i = \begin{cases} 1 & \text{if the } i\text{th trial produces a success} \\ 0 & \text{if the } i\text{th trial produces a failure} \end{cases}$$
(Any $X_i$ defined in this way on an individual trial is called a Bernoulli random variable. Every binomial, then, can be thought of as the sum of n independent Bernoullis.) By assumption, $p_{X_i}(1) = p$ and $p_{X_i}(0) = 1 - p$, $i = 1, 2, \ldots, n$. Using the corollary,
$$E(X) = E(X_1) + E(X_2) + \cdots + E(X_n) = n \cdot E(X_1)$$
the last step being a consequence of the $X_i$'s having identical distributions. But
$$E(X_1) = 1 \cdot p + 0 \cdot (1 - p) = p$$
so E(X) = np, which is what we found earlier (recall Theorem 3.5.1).
Comment. The problem-solving implications of Theorem 3.9.2 and its corollary should not be underestimated. There are many real-world events that can be modeled as a linear combination $a_1 W_1 + a_2 W_2 + \cdots + a_n W_n$, where the $W_i$'s are relatively simple random variables. Finding $E(a_1 W_1 + a_2 W_2 + \cdots + a_n W_n)$ directly may be prohibitively difficult because of the inherent complexity of the linear combination. It may very well be the case, though, that calculating the individual $E(W_i)$'s is easy. Compare, for instance, Example 3.9.3 with Theorem 3.5.1. Both derive the formula that E(X) = np when X is a binomial random variable. The approach taken in Example 3.9.3 (i.e., using Theorem 3.9.2) is much easier. The next several examples further explore the technique of using linear combinations to facilitate the calculation of expected values.
EXAMPLE 3.9.4
A disgruntled secretary is upset about having to stuff envelopes. Handed a box of n letters and n envelopes, she vents her frustration by putting the letters into the envelopes at random. How many people, on the average, will receive their correct mail?
If X denotes the number of envelopes properly stuffed, what we want is E(X). However, applying Definition 3.5.1 here would prove formidable because of the difficulty in getting a workable expression for $p_X(k)$ [see (97)]. By using the corollary to Theorem 3.9.2, though, we can solve the problem quite easily.
Let $X_i$ denote a random variable equal to the number of correct letters put into the ith envelope, $i = 1, 2, \ldots, n$. Then $X_i$ equals 0 or 1, and
$$p_{X_i}(k) = P(X_i = k) = \begin{cases} \frac{1}{n} & \text{for } k = 1 \\ \frac{n-1}{n} & \text{for } k = 0 \end{cases}$$
But $X = X_1 + X_2 + \cdots + X_n$ and $E(X) = E(X_1) + E(X_2) + \cdots + E(X_n)$. Furthermore, each of the $X_i$'s has the same expected value, 1/n:
$$E(X_i) = \sum_{k=0}^{1} k \cdot P(X_i = k) = 0 \cdot \frac{n-1}{n} + 1 \cdot \frac{1}{n} = \frac{1}{n}$$
It follows that
$$E(X) = \sum_{i=1}^{n} E(X_i) = n \cdot \left(\frac{1}{n}\right) = 1$$
showing that, regardless of n, the expected number of properly stuffed envelopes is one. (Are the $X_i$'s independent? Does it matter?)
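For small n, the conclusion of Example 3.9.4 can be confirmed by brute force. The sketch below assumes each of the n! assignments of letters to envelopes is equally likely and averages the number of matches exactly; `expected_matches` is a hypothetical helper, and the range of n is kept small because the enumeration grows factorially.

```python
from fractions import Fraction
from itertools import permutations

def expected_matches(n):
    """Exact E(X): average number of correctly stuffed envelopes over all
    n! equally likely assignments of letters to envelopes."""
    total = 0
    count = 0
    for perm in permutations(range(n)):
        # envelope i holds the correct letter when perm[i] == i
        total += sum(1 for i, letter in enumerate(perm) if i == letter)
        count += 1
    return Fraction(total, count)

results = {n: expected_matches(n) for n in range(1, 8)}
print(results)
```

The indicator variables here are not independent, yet the average still comes out to one, which is the point of the parenthetical question closing the example.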
EXAMPLE 3.9.5
Ten fair dice are rolled. Calculate the expected value of the sum of the faces showing.
If the random variable X denotes the sum of the faces showing on the ten dice, then
$$X = X_1 + X_2 + \cdots + X_{10}$$
where $X_i$ is the number showing on the ith die, $i = 1, 2, \ldots, 10$. By assumption, $p_{X_i}(k) = \frac{1}{6}$ for $k = 1, 2, \ldots, 6$, so $E(X_i) = \sum_{k=1}^{6} k \cdot \frac{1}{6} = \frac{1}{6} \sum_{k=1}^{6} k = \frac{1}{6} \cdot \frac{6(7)}{2} = 3.5$. By the corollary to Theorem 3.9.2,
$$E(X) = E(X_1) + E(X_2) + \cdots + E(X_{10}) = 10(3.5) = 35$$
Notice that E(X) can also be deduced here by appealing to the notion that expected values are centers of gravity. It should be clear from our work with combinatorics that P(X = 10) = P(X = 60), P(X = 11) = P(X = 59), P(X = 12) = P(X = 58), and so on. The probability function $p_X(k)$ is symmetric, in other words, which implies that its center of gravity is the midpoint of the range of its X-values. It must be the case, then, that E(X) equals $\frac{10 + 60}{2}$, or 35.
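Both arguments in Example 3.9.5 (linearity of expectation and symmetry of the pdf) can be verified exactly. The following sketch builds the pdf of the sum of ten fair dice by repeated convolution with exact fractions; `pmf_of_sum` is a hypothetical helper name.

```python
from fractions import Fraction

def pmf_of_sum(num_dice):
    """Exact pdf of the sum of num_dice fair dice, built by repeatedly
    convolving with the pdf of a single die."""
    pmf = {0: Fraction(1)}
    for _ in range(num_dice):
        nxt = {}
        for s, p in pmf.items():
            for face in range(1, 7):
                nxt[s + face] = nxt.get(s + face, Fraction(0)) + p * Fraction(1, 6)
        pmf = nxt
    return pmf

pmf = pmf_of_sum(10)
mean = sum(s * p for s, p in pmf.items())

# Symmetry about the midpoint: P(X = k) = P(X = 70 - k) for every k in 10..60
symmetric = all(pmf[k] == pmf[70 - k] for k in pmf)
print(mean, symmetric)
```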
EXAMPLE 3.9.6
The honor count in a bridge hand can vary from zero to thirty-seven according to the formula:
honor count = 4 · (number of aces) + 3 · (number of kings) + 2 · (number of queens) + 1 · (number of jacks)
What is the expected honor count of North's hand?
The solution here is a bit unusual in that we use the corollary to Theorem 3.9.2 backwards. If $X_i$, $i = 1, 2, 3, 4$, denotes the honor count for players North, South, East, and West, respectively, and if X denotes the analogous sum for the entire deck, we can write
$$X = X_1 + X_2 + X_3 + X_4$$
But
$$X = E(X) = 4 \cdot 4 + 3 \cdot 4 + 2 \cdot 4 + 1 \cdot 4 = 40$$
By symmetry, $E(X_i) = E(X_j)$, $i \ne j$, so it follows that $40 = 4 \cdot E(X_1)$, which implies that ten is the expected honor count of North's hand. (Try doing this problem directly, without making use of the fact that the deck's honor count is forty.)
EXAMPLE 3.9.7
Suppose that a random sequence of 1s and 0s is generated by a computer, where the length of the sequence is n, and
p = P(1 appears in ith position)
and
1 - p = P(0 appears in ith position), $i = 1, 2, \ldots, n$
What is the expected number of runs in the sequence? Note: A run is a series of consecutive similar outcomes. For example, the sequence
1 1 0 1 0 0 0
has a total of four runs (1 1, 0, 1, and 0 0 0).
Let $X_i$ denote the outcome appearing in position i, $i = 1, 2, \ldots, n$. The number of runs in the sequence, then, can be expressed in terms of the n - 1 transitions between positions i and i + 1, $i = 1, 2, \ldots, n - 1$. Specifically, let
$$Q(X_i, X_{i+1}) = \begin{cases} 1 & \text{if } X_i \ne X_{i+1} \\ 0 & \text{if } X_i = X_{i+1} \end{cases}$$
It follows that
$$R = \text{total number of runs} = 1 + Q(X_1, X_2) + Q(X_2, X_3) + \cdots + Q(X_{n-1}, X_n)$$
and
$$E(R) = 1 + \sum_{i=1}^{n-1} E[Q(X_i, X_{i+1})]$$
But
$$E[Q(X_i, X_{i+1})] = 0 \cdot P(X_i = X_{i+1}) + 1 \cdot P(X_i \ne X_{i+1})$$
$$= P(X_i \ne X_{i+1})$$
$$= P(X_i = 1 \cap X_{i+1} = 0) + P(X_i = 0 \cap X_{i+1} = 1)$$
$$= p(1 - p) + (1 - p)p \quad \text{(because of independence)}$$
$$= 2p(1 - p)$$
Therefore,
$$E(R) = 1 + 2(n - 1)p(1 - p)$$
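The formula E(R) = 1 + 2(n - 1)p(1 - p) from the runs example can be checked exactly for a short sequence by enumerating every outcome. This sketch assumes independent positions with P(1) = p; the values n = 8 and p = 1/3 are arbitrary illustrations, and `count_runs` and `expected_runs` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

def count_runs(seq):
    """Number of runs: one plus the number of positions where the symbol changes."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def expected_runs(n, p):
    """Exact E(R), enumerating all 2^n sequences of independent bits with P(1) = p."""
    total = Fraction(0)
    for seq in product((0, 1), repeat=n):
        ones = sum(seq)
        prob = p**ones * (1 - p)**(n - ones)
        total += count_runs(seq) * prob
    return total

n, p = 8, Fraction(1, 3)  # illustrative values (assumed)
exact = expected_runs(n, p)
formula = 1 + 2 * (n - 1) * p * (1 - p)
print(exact, formula)
```

The helper also reproduces the count given for the sequence 1 1 0 1 0 0 0, which has three symbol changes and therefore four runs.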
Expected Values of Products: A Special Case
We know from Theorem 3.9.1 that for any two random variables X and Y,
$$E(XY) = \begin{cases} \sum_{\text{all } x} \sum_{\text{all } y} xy\, p_{X,Y}(x, y) & \text{if } X \text{ and } Y \text{ are discrete} \\ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f_{X,Y}(x, y)\, dx\, dy & \text{if } X \text{ and } Y \text{ are continuous} \end{cases}$$
If, however, X and Y are independent, there is an easier way to calculate E(XY).
Theorem 3.9.3. If X and Y are independent random variables,
$$E(XY) = E(X) \cdot E(Y)$$
provided E(X) and E(Y) both exist.
Proof. Suppose X and Y are both discrete random variables. Then their joint pdf, $p_{X,Y}(x, y)$, can be replaced by the product of their marginal pdfs, $p_X(x) \cdot p_Y(y)$, and the double summation required by Theorem 3.9.1 can be written as the product of two single summations:
$$E(XY) = \sum_{\text{all } x} \sum_{\text{all } y} xy \cdot p_{X,Y}(x, y) = \sum_{\text{all } x} \sum_{\text{all } y} xy \cdot p_X(x) \cdot p_Y(y) = \sum_{\text{all } x} x \cdot p_X(x) \cdot \left[ \sum_{\text{all } y} y \cdot p_Y(y) \right] = E(X) \cdot E(Y)$$
The proof when X and Y are both continuous random variables is left as an exercise. □
QUESTIONS
3.9.1. Suppose that r chips are drawn with replacement from an urn containing n chips, numbered 1 through n. Let V denote the sum of the numbers drawn. Find E(V).
3.9.2. Suppose that $f_{X,Y}(x, y) = [\ldots]$, $0 \le x$, $0 \le y$. Find E(X + Y).
3.9.3. Suppose that $f_{X,Y}(x, y) = \frac{2}{3}(x + 2y)$, $0 \le x \le 1$, $0 \le y \le 1$ (recall Question 3.7.19(c)). Find E(X + Y).
3.9.4. Marksmanship competition at a certain level requires each contestant to take 10 shots with each of two different hand guns. Final scores are computed by taking a weighted average of four times the number of bull's-eyes made with the first gun plus six times the number gotten with the second. If Cathie has a 30% chance of hitting the bull's-eye with each shot from the first gun and a 40% chance with each shot from the second gun, what is her expected score?
3.9.5. Suppose that $X_i$ is a random variable for which $E(X_i) = \mu$, $i = 1, 2, \ldots, n$. Under what conditions will the following be true?
$$E\left( \sum_{i=1}^{n} a_i X_i \right) = \mu$$
3.9.6. Suppose that the daily closing price of stock goes up an eighth of a point with probability p and down an eighth of a point with probability q, where p > q. After n days how much gain can we expect the stock to have achieved? Assume that the daily price fluctuations are independent events.
3.9.7. An urn contains r red balls and w white balls. A sample of n balls is drawn in order and without replacement. Let $X_i$ be 1 if the ith draw is red and 0 otherwise, $i = 1, 2, \ldots, n$.
(a) Show that $E(X_i) = E(X_1)$, $i = 2, 3, \ldots, n$.
(b) Use the corollary to Theorem 3.9.2 to show that the expected number of red balls is nr/(r + w).
3.9.8. Suppose two fair dice are tossed. Find the expected value of the product of the faces showing.
3.9.9. Find E(R) for a two-resistor circuit similar to the one described in Example 3.9.2, where $f_{X,Y}(x, y) = k(x + y)$, $10 \le x \le 20$, $10 \le y \le 20$.
3.9.10. Suppose that X and Y are both uniformly distributed over the interval [0, 1]. Calculate the expected value of the square of the distance of the random point (X, Y) from the origin; that is, find $E(X^2 + Y^2)$. Hint: See Question 3.8.5.
3.9.11. Suppose X represents a point picked at random from the interval [0, 1] on the x-axis, and Y is a point picked at random from the interval [0, 1] on the y-axis. Assume that X and Y are independent. What is the expected value of the area of the triangle formed by the points (X, 0), (0, Y), and (0, 0)?
3.9.12. Suppose $Y_1, Y_2, \ldots, Y_n$ is a random sample from the uniform pdf over [0, 1]. The geometric mean of the numbers is the random variable $\sqrt[n]{Y_1 Y_2 \cdots Y_n}$. Compare the expected value of the geometric mean to that of the arithmetic mean $\bar{Y}$.
Calculating the Variance of a Sum of Random Variables
We know from the corollary to Theorem 3.9.2 that
$$E(W_1 + W_2 + \cdots + W_n) = E(W_1) + E(W_2) + \cdots + E(W_n)$$
for any set of random variables $W_1, W_2, \ldots, W_n$, provided $E(W_i)$ exists for all i. A similar result holds for the variance of a sum of random variables, but only if the random variables are independent.
Theorem 3.9.4. Let $W_1, W_2, \ldots, W_n$ be a set of independent random variables for which $E(W_i^2)$ is finite for all i. Then
$$\text{Var}(W_1 + W_2 + \cdots + W_n) = \text{Var}(W_1) + \text{Var}(W_2) + \cdots + \text{Var}(W_n)$$
Proof. The derivation is given for a sum of two random variables, $W_1 + W_2$. A simple induction argument would verify the statement for arbitrary n. From Theorems 3.6.1 and 3.9.2,
$$\text{Var}(W_1 + W_2) = E[(W_1 + W_2)^2] - [E(W_1) + E(W_2)]^2$$
Writing out the squares gives
$$\text{Var}(W_1 + W_2) = E(W_1^2 + 2W_1 W_2 + W_2^2) - [E(W_1)]^2 - 2E(W_1)E(W_2) - [E(W_2)]^2$$
$$= E(W_1^2) - [E(W_1)]^2 + E(W_2^2) - [E(W_2)]^2 + 2[E(W_1 W_2) - E(W_1)E(W_2)] \qquad (3.9.1)$$
By the independence of $W_1$ and $W_2$, $E(W_1 W_2) = E(W_1)E(W_2)$, making the last term in Equation 3.9.1 vanish. The remaining terms combine to give the desired result: $\text{Var}(W_1 + W_2) = \text{Var}(W_1) + \text{Var}(W_2)$. □
Corollary. Let $W_1, W_2, \ldots, W_n$ be any set of independent random variables for which $E(W_i^2) < \infty$ for all i. Let $a_1, a_2, \ldots, a_n$ be any set of constants. Then
$$\text{Var}(a_1 W_1 + a_2 W_2 + \cdots + a_n W_n) = a_1^2 \text{Var}(W_1) + a_2^2 \text{Var}(W_2) + \cdots + a_n^2 \text{Var}(W_n)$$
Proof. The derivation is based on Theorems 3.9.4 and 3.6.2. The details will be left as an exercise. □
Comment. A more general version of Theorem 3.9.4 can be proved, one that leads to a slightly different formula but does not require the $W_i$'s to be independent. The argument, however, depends on a definition we have not yet introduced. We will return to the problem of finding the variance of a sum of random variables in Section 11.4.
EXAMPLE 3.9.8
The binomial random variable, being a sum of n independent Bernoullis, is an obvious candidate for Theorem 3.9.4. Let $X_i$ denote the number of successes occurring on the ith trial. Then
$$X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$
and
$$X = X_1 + X_2 + \cdots + X_n = \text{total number of successes in } n \text{ trials}$$
Find Var(X).
Note that
$$E(X_i) = 1 \cdot p + 0 \cdot (1 - p) = p$$
and
$$E(X_i^2) = (1)^2 \cdot p + (0)^2 \cdot (1 - p) = p$$
so
$$\text{Var}(X_i) = E(X_i^2) - [E(X_i)]^2 = p - p^2 = p(1 - p)$$
It follows, then, that the variance of a binomial random variable is np(1 - p):
$$\text{Var}(X) = \sum_{i=1}^{n} \text{Var}(X_i) = np(1 - p)$$
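EXAMPLE 3.9.9
In statistics, it is often necessary to draw inferences based on $\bar{W}$, the average computed from a random sample of n observations. Two properties of $\bar{W}$ are especially important. First, if the $W_i$'s come from a population where the mean is $\mu$, the corollary to Theorem 3.9.2 implies that $E(\bar{W}) = \mu$. Second, if the $W_i$'s come from a population whose variance is $\sigma^2$, then $\text{Var}(\bar{W}) = \sigma^2 / n$. To verify the latter, we can appeal to Theorem 3.9.4. Write
$$\bar{W} = \frac{1}{n} \sum_{i=1}^{n} W_i = \frac{1}{n} \cdot W_1 + \frac{1}{n} \cdot W_2 + \cdots + \frac{1}{n} \cdot W_n$$
Then
$$\text{Var}(\bar{W}) = \left(\frac{1}{n}\right)^2 \text{Var}(W_1) + \left(\frac{1}{n}\right)^2 \text{Var}(W_2) + \cdots + \left(\frac{1}{n}\right)^2 \text{Var}(W_n)$$
$$= \left(\frac{1}{n}\right)^2 \sigma^2 + \left(\frac{1}{n}\right)^2 \sigma^2 + \cdots + \left(\frac{1}{n}\right)^2 \sigma^2 = \frac{\sigma^2}{n}$$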
QUESTIONS
3.9.13. Suppose that $f_{X,Y}(x, y) = \lambda^2 e^{-\lambda (x + y)}$, $0 \le x$, $0 \le y$. Find Var(X + Y). Hint: See Questions 3.6.11 and [...].
3.9.14. Suppose that $f_{X,Y}(x, y) = [\ldots]$, $0 \le x \le 1$, $0 \le y \le 1$. Find Var(X + Y). Hint: See Question [...].
3.9.15. For the uniform pdf defined over [0, 1], find the variance of the geometric mean when n = 2 (see Question [...]).
3.9.16. Let X be a binomial random variable based on n trials and a success probability of $p_X$; let Y be an independent binomial random variable based on m trials and a success probability of $p_Y$. Find E(W) and Var(W), where W = 4X + 6Y.
3.9.17. Let the Poisson random variable U be the number of calls for technical assistance received by a computer company during the firm's 9 normal workday hours. Suppose the average number of calls per hour is 7.0 and that each call costs the company $50. Let V be a Poisson random variable representing the number of calls for technical assistance received during a day's remaining 15 hours. Suppose the average number of calls per hour is 4.0 for that time period and that each such call costs the company $60. Find the expected cost and the variance of the cost associated with the calls received during a 24-hour day.
3.9.18. A mason is contracted to build a patio retaining wall. Plans call for the base of the wall to be a row of 50 10-inch bricks, each separated by [...]-inch-thick mortar. Suppose that the bricks used are randomly chosen from a population of bricks whose mean length is 10 inches and whose standard deviation is [...] inch. Also, suppose that the mason, on the average, will make the mortar [...] inch thick, but the actual dimension varies from brick to brick, the standard deviation of the thicknesses being [...] inch. What is the standard deviation of L, the length of the first row of the wall? What assumption are you making?
3.9.19. An electric circuit has six resistors wired in series, each nominally being 5 ohms. What is the maximum standard deviation that can be allowed in the manufacture of these resistors if the combined circuit resistance is to have a standard deviation no greater than 0.4 ohm?
3.9.20. A gambler plays n hands of poker. If he wins the kth hand, he collects k dollars; if he loses the kth hand, he collects nothing. Let T denote his total winnings in n hands. Assuming that his chances of winning each hand are constant and are independent of his success or failure at any other hand, find E(T) and Var(T).
Approximating the Variance of a Function of Random Variables (Optional)
It is not an uncommon problem for a laboratory scientist to have to measure several quantities, each subject to a certain amount of "error," in order to calculate a final desired result. For example, a physics student trying to determine the acceleration due to gravity, G, knows that the distance, D, traveled by a freely falling body in time, T, is related to G by the equation
$$D = \frac{1}{2} G T^2$$
(assuming the body is initially at rest) or, equivalently,
$$G = \frac{2D}{T^2}$$
Suppose distance and time are to be measured directly with a yardstick and a stopwatch. The values obtained, D and T, will not be exactly correct; rather, we can think of them as being realizations of random variables, with those variables having expected values $\mu_D$ and $\mu_T$ and variances Var(D) and Var(T), the latter two numbers reflecting the lack of precision in the measuring process. Suppose we know from past experience the precisions characteristic of the distance and time measurements. What can we then conclude about the precision in the calculated value of G? That is, knowing Var(D) and Var(T), can we find Var(G)?
By way of background, we have already seen one result that bears directly on this sort of "error-propagation" problem. If the quantity to be calculated, W, is the sum of n independently measured quantities, $W_1, W_2, \ldots, W_n$, and if the variance associated with each of the $W_i$'s is known, we can appeal to Theorem 3.9.4 and say that
$$\text{Var}(W) = \text{Var}(W_1) + \text{Var}(W_2) + \cdots + \text{Var}(W_n) \qquad (3.9.2)$$
In general, extending Equation 3.9.2 in any exact way to situations where W is some arbitrary function of a set of $W_i$'s, $W = g(W_1, W_2, \ldots, W_n)$, is difficult. It is a relatively simple matter, though, to get an approximation for the variance of W.
More specifically, suppose that W is a function of n independent random variables; that is, $W = g(W_1, W_2, \ldots, W_n)$. Assume that $\mu_i$ and $\text{Var}(W_i)$ are the mean and variance, respectively, of $W_i$, $i = 1, 2, \ldots, n$. Using the first-order terms in a Taylor expansion of the function $g(W_1, W_2, \ldots, W_n)$ around the point $(\mu_1, \mu_2, \ldots, \mu_n)$, we can write
$$W \doteq g(\mu_1, \ldots, \mu_n) + \sum_{i=1}^{n} (W_i - \mu_i) \left[ \frac{\partial g}{\partial W_i} \bigg|_{(\mu_1, \ldots, \mu_n)} \right] \qquad (3.9.3)$$
Applying the corollary to Theorem 3.9.4 to Equation 3.9.3 yields the sought-after approximation:
$$\text{Var}(W) \doteq \sum_{i=1}^{n} \left[ \frac{\partial g}{\partial W_i} \bigg|_{(\mu_1, \ldots, \mu_n)} \right]^2 \text{Var}(W_i) \qquad (3.9.4)$$
CASE STUDY 3.9.1
In a dental X-ray unit, electrons from the cathode of the X-ray tube are decelerated by nuclei in the anode, thereby producing Bremsstrahlung radiation (X-rays). These emissions, collimated by a lead-lined tube, effect the desired image on a sheet of film.
Tennessee state regulations (170) require that the distance, W, from the focal spot on the anode of an X-ray tube to the patient's skin be at least 18 cm. On some equipment, particularly older units, that distance cannot be measured directly because the exact location of the focal spot cannot be determined just by looking at the tube's outer housing. When that is the case, state inspectors resort to an indirect measuring procedure. Two films are exposed, one at the unknown distance W and a second at a distance W + Z. The two diameters, X and Y, of the resulting circular images are then measured (see Figure 3.9.2).
By similar triangles,
$$\frac{X}{W} = \frac{Y}{W + Z}$$
or
$$W = \frac{XZ}{Y - X} \qquad (3.9.5)$$
or, in the context of our previous notation,
$$W = g(X, Y, Z) = XZ(Y - X)^{-1}$$
FIGURE 3.9.2 (focal spot on the anode, X-ray tube, lead-lined collimator, first film at distance W, second film a further distance Z away)
During the course of one such inspection (96), values measured for the two diameters X and Y and the backoff distance Z were 6.4 cm, 9.7 cm, and 10.2 cm, respectively. From Equation 3.9.5, then, the anode-to-patient distance is estimated to be
$$w = \frac{(6.4)(10.2)}{9.7 - 6.4} = 19.8 \text{ cm}$$
indicating that the unit is in compliance. If the error in W, though, were sufficiently large, there might be an appreciable probability that the true W was less than 18 cm, meaning the unit was, in fact, out of compliance. It is not unreasonable, therefore, to inquire about the magnitude of Var(W).
To apply Equation 3.9.4, we first need to compute the partial derivatives of g(X, Y, Z). In this case,
$$\frac{\partial g}{\partial X} = \frac{Z}{Y - X} + \frac{XZ}{(Y - X)^2}$$
$$\frac{\partial g}{\partial Y} = \frac{-XZ}{(Y - X)^2}$$
and
$$\frac{\partial g}{\partial Z} = \frac{X}{Y - X}$$
Inspectors feel that the standard deviation in any of the measurements is on the order of 0.08 cm, so $\text{Var}(X) = \text{Var}(Y) = \text{Var}(Z) = (0.08)^2$. Substituting the variance estimates and the partial derivatives, evaluated at the point $(\mu_X, \mu_Y, \mu_Z) = (6.4, 9.7, 10.2)$, into Equation 3.9.4 gives
$$\text{Var}(W) \doteq \left[ \frac{10.2}{9.7 - 6.4} + \frac{(6.4)(10.2)}{(9.7 - 6.4)^2} \right]^2 (0.08)^2 + \left[ \frac{-(6.4)(10.2)}{(9.7 - 6.4)^2} \right]^2 (0.08)^2 + \left[ \frac{6.4}{9.7 - 6.4} \right]^2 (0.08)^2 = 0.782$$
Therefore, the estimated standard deviation associated with the calculated value of W is $\sqrt{0.782}$, or 0.88 cm.
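The arithmetic in the case study follows a pattern that is easy to mechanize. The sketch below applies Equation 3.9.4 to g(X, Y, Z) = XZ/(Y - X) with the measured values 6.4, 9.7, and 10.2 cm and a common standard deviation of 0.08 cm; the function names are hypothetical, and the partial derivatives are coded by hand rather than computed symbolically.

```python
import math

def g(x, y, z):
    """Anode-to-patient distance from Equation 3.9.5."""
    return x * z / (y - x)

def partials(x, y, z):
    """Analytic partial derivatives of g(x, y, z) = x*z/(y - x)."""
    d = y - x
    return (z / d + x * z / d**2,   # dg/dx
            -x * z / d**2,          # dg/dy
            x / d)                  # dg/dz

def approx_variance(point, variances):
    """First-order approximation of Equation 3.9.4:
    sum of (partial derivative)^2 * variance over the independent inputs."""
    return sum(dg**2 * v for dg, v in zip(partials(*point), variances))

point = (6.4, 9.7, 10.2)        # measured X, Y, Z in cm
variances = (0.08**2,) * 3      # Var(X) = Var(Y) = Var(Z)

w = g(*point)
var_w = approx_variance(point, variances)
print(round(w, 1), round(var_w, 3), round(math.sqrt(var_w), 2))
```

The largest contribution comes from the X term, which suggests that the first diameter is the measurement most worth making carefully.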
QUESTIONS
3.9.21. A physics student is trying to determine the gravitational constant, G, using the expression $G = 2D/T^2$, where both distance (D) and time (T) are to be measured. Suppose that the standard deviation of the measurement errors in D is 0.0025 feet and in T, 0.045 seconds. If the experimental apparatus is set up so that D will be 4 feet, then T will be approximately 1/2 second. If D is set at 16 feet, T will be close to 1 second. Which of these two sets of values for D and T will give a smaller variance for the calculated G?
3.9.22. Suppose that $W_1, W_2, \ldots, W_n$ are independent random variables with variances $\sigma_1^2, \sigma_2^2, \ldots,$ and $\sigma_n^2$, respectively, and let $W = W_1 + W_2 + \cdots + W_n$. Find Var(W) using Theorem 3.9.4 and Equation 3.9.4.
3.9.23. If h is its height and a and b are the lengths of its two parallel sides, the area of a trapezoid is given by
$$A = \frac{1}{2}(a + b)h$$
Find an expression that approximates $\sigma_A$ if a, b, and h are measured independently with standard deviations $\sigma_a$, $\sigma_b$, and $\sigma_h$, respectively.
3.9.24. In Case Study 3.9.1, notice that the difference between 19.8 cm (the calculated distance) and 18 cm (the state regulation minimum) is slightly more than two standard deviations. What does that imply about the probability that this X-ray machine is operating [...]?
3.10
ORDER STATISTICS
The single-variable transformation taken up in Section 3.4 involved a standard linear operation, Y = aX + b. The bivariate transformations in Section 3.8 were similarly arithmetic, being concerned with either sums or products. In this section we will consider a different sort of transformation, one involving the ordering of an entire set of random variables. This particular transformation has wide applicability in many areas of statistics, and we will see some of its consequences in later chapters. Here, though, we will limit our discussion to two basic results: derivations of (1) the marginal pdf of the ith largest observation in a random sample and (2) the joint pdf of the ith and jth largest observations in a random sample.
Definition 3.10.1. Let Y be a continuous random variable for which $y_1, y_2, \ldots, y_n$ are the values of a random sample of size n. Reorder the $y_i$'s from smallest to largest:
$$y_1' < y_2' < \cdots < y_n'$$
(No two of the $y_i$'s are equal, except with probability zero, since Y is continuous.) Define the random variable $Y_i'$ to have the value $y_i'$, $1 \le i \le n$. Then $Y_i'$ is called the ith order statistic. Sometimes $Y_n'$ and $Y_1'$ are denoted $Y_{\max}$ and $Y_{\min}$, respectively.
EXAMPLE 3.10.1
Suppose that four measurements are made on the random variable Y: $y_1 = 3.4$, $y_2 = 4.6$, $y_3 = 2.6$, and $y_4 = 3.2$. The corresponding ordered sample would be
2.6 < 3.2 < 3.4 < 4.6
The random variable representing the smallest observation would be denoted $Y_1'$, with its value for this particular sample being 2.6. Similarly, the value for the second order statistic, $Y_2'$, is 3.2, and so on.
The Distribution of Extreme Order Statistics
By definition, every observation in a random sample has the same pdf. For example, if a set of four measurements is taken from a normal distribution with $\mu = 80$ and $\sigma = [\ldots]$, then $f_{Y_1}(y)$, $f_{Y_2}(y)$, $f_{Y_3}(y)$, and $f_{Y_4}(y)$ are all the same: each is a normal pdf with $\mu = 80$ and $\sigma = [\ldots]$. The pdf describing an ordered observation, though, is not the same as the pdf describing a random observation. Intuitively, that makes sense. If a single observation is drawn from a normal distribution with $\mu = 80$ and $\sigma = [\ldots]$, it would not be surprising if that observation were to take on a value near eighty. On the other hand, if a random sample of n = 100 observations is drawn from that same distribution, we would not expect the smallest observation (that is, $Y_{\min}$) to be anywhere near eighty. Common sense tells us that that smallest observation is likely to be much smaller than eighty, just as the largest observation, $Y_{\max}$, is likely to be much larger than eighty.
It follows, then, that before we can do any probability calculations (or any applications whatsoever) involving order statistics, we need to know the pdf of $Y_i'$ for $i = 1, 2, \ldots, n$. We begin by investigating the pdfs of the "extreme" order statistics, $f_{Y_{\max}}(y)$ and $f_{Y_{\min}}(y)$. These are the simplest to work with. At the end of the section we return to the more general problems of finding (a) the pdf of $Y_i'$ for any i and (b) the joint pdf of $Y_i'$ and $Y_j'$, where i < j.
EXAMPLE 3.10.2
Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample of continuous random variables, each having pdf $f_Y(y)$ and cdf $F_Y(y)$. Find
a. $f_{Y_{\max}}(y) = f_{Y'_n}(y)$, the pdf of the largest order statistic
b. $f_{Y_{\min}}(y) = f_{Y'_1}(y)$, the pdf of the smallest order statistic
Finding the pdfs of $Y_{\max}$ and $Y_{\min}$ is accomplished by using the now-familiar technique of differentiating a random variable's cdf. Consider, for example, the case of the largest order statistic, $Y'_n$:
$$F_{Y'_n}(y) = F_{Y_{\max}}(y) = P(Y_{\max} \le y)$$
$$= P(Y_1 \le y,\; Y_2 \le y,\; \ldots,\; Y_n \le y)$$
$$= P(Y_1 \le y) \cdot P(Y_2 \le y) \cdots P(Y_n \le y) \quad \text{(why?)}$$
$$= [F_Y(y)]^n$$
Therefore,
$$f_{Y'_n}(y) = \frac{d}{dy}[F_Y(y)]^n = n[F_Y(y)]^{n-1} f_Y(y)$$
Similarly, for the smallest order statistic ($i = 1$),
$$F_{Y'_1}(y) = F_{Y_{\min}}(y) = P(Y_{\min} \le y)$$
$$= 1 - P(Y_{\min} > y)$$
$$= 1 - P(Y_1 > y) \cdot P(Y_2 > y) \cdots P(Y_n > y)$$
$$= 1 - [1 - F_Y(y)]^n$$
Therefore,
$$f_{Y'_1}(y) = \frac{d}{dy}\left\{1 - [1 - F_Y(y)]^n\right\} = n[1 - F_Y(y)]^{n-1} f_Y(y)$$
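The two formulas for the extreme order statistics, $n[F_Y(y)]^{n-1} f_Y(y)$ for the largest observation and $n[1 - F_Y(y)]^{n-1} f_Y(y)$ for the smallest, translate directly into code. The following is a minimal sketch, assuming the caller supplies the pdf and cdf as ordinary Python functions; the helper names and the uniform test case are illustrative and do not come from the text.

```python
def pdf_of_max(pdf, cdf, n, y):
    """pdf of the largest of n independent observations: n [F(y)]^(n-1) f(y)."""
    return n * cdf(y) ** (n - 1) * pdf(y)


def pdf_of_min(pdf, cdf, n, y):
    """pdf of the smallest of n independent observations: n [1 - F(y)]^(n-1) f(y)."""
    return n * (1.0 - cdf(y)) ** (n - 1) * pdf(y)


# Assumed test case: the uniform pdf on [0, 1].
def uniform_pdf(y):
    return 1.0 if 0.0 <= y <= 1.0 else 0.0


def uniform_cdf(y):
    return min(max(y, 0.0), 1.0)


# With n = 3 the largest observation has density 3y^2 and the smallest 3(1 - y)^2.
density_max_at_09 = pdf_of_max(uniform_pdf, uniform_cdf, 3, 0.9)
density_min_at_09 = pdf_of_min(uniform_pdf, uniform_cdf, 3, 0.9)
```

As expected, the density of the largest observation is concentrated near the upper end of the interval and the density of the smallest near the lower end.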
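EXAMPLE 3.10.3
Suppose a random sample of $n = 3$ observations, $Y_1$, $Y_2$, and $Y_3$, is taken from the exponential pdf, $f_Y(y) = e^{-y}$, $y \ge 0$. Compare $f_{Y_1}(y)$ with $f_{Y'_1}(y)$. Intuitively, which will be larger, $P(Y_1 < 1)$ or $P(Y'_1 < 1)$?
The pdf for $Y_1$, of course, is just the pdf of the distribution being sampled, that is, $f_{Y_1}(y) = f_Y(y) = e^{-y}$, $y \ge 0$. Finding the pdf for $Y'_1$ requires that we apply the formula given in Example 3.10.2 for $f_{Y_{\min}}(y)$. Note first that
$$F_Y(y) = \int_0^y e^{-t}\,dt = 1 - e^{-y}$$
Then, since $n = 3$ (and $i = 1$), we can write
$$f_{Y'_1}(y) = 3[1 - (1 - e^{-y})]^2 e^{-y} = 3e^{-3y}, \quad y \ge 0$$
FIGURE 3.10.1 (the probability densities $f_{Y'_1}(y)$ and $f_{Y_1}(y)$ plotted against $y$)
Figure 3.10.1 shows the two pdfs plotted on the same set of axes. Compared to $f_{Y_1}(y)$, the pdf for $Y'_1$ has more of its area located above the smaller values of $y$ (where $Y'_1$ is more likely to lie). For example, the probability that the smallest observation (out of three) is less than one is 95%, while the probability that a random observation is less than one is only 63%:
$$P(Y'_1 < 1) = \int_0^1 3e^{-3y}\,dy = \int_0^3 e^{-u}\,du = -e^{-u}\Big|_0^3 = 1 - e^{-3} = 0.95$$
$$P(Y_1 < 1) = \int_0^1 e^{-y}\,dy = -e^{-y}\Big|_0^1 = 1 - e^{-1} = 0.63$$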
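The two probabilities in this example can be checked numerically. The sketch below assumes the sampled pdf is $e^{-y}$ (so that the smallest of three observations has pdf $3e^{-3y}$) and uses a simple midpoint rule; the function name `integrate` is an illustrative helper, not something defined in the text.

```python
import math


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h


# P(smallest of three observations < 1), using the pdf 3 e^{-3y}.
p_smallest = integrate(lambda y: 3.0 * math.exp(-3.0 * y), 0.0, 1.0)

# P(a single random observation < 1), using the pdf e^{-y}.
p_single = integrate(lambda y: math.exp(-y), 0.0, 1.0)

print(round(p_smallest, 2), round(p_single, 2))
```

The numerical values agree with the closed forms $1 - e^{-3}$ and $1 - e^{-1}$.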
EXAMPLE 3.10.4
Suppose a random sample of size ten is drawn from a continuous pdf $f_Y(y)$. What is the probability that the largest observation, $Y'_{10}$, is less than the pdf's median, $m$?
Using the formula for $f_{Y'_{10}}(y) = f_{Y_{\max}}(y)$ given in Example 3.10.2, it is certainly true that
$$P(Y'_{10} < m) = \int_{-\infty}^{m} 10\, f_Y(y)[F_Y(y)]^9\,dy \qquad (3.10.1)$$
but the problem does not specify $f_Y(y)$, so Equation 3.10.1 is of no help.
Fortunately, a much simpler solution is available, even if $f_Y(y)$ were specified: The event "$Y'_{10} < m$" is equivalent to the event "$Y_1 < m \cap Y_2 < m \cap \cdots \cap Y_{10} < m$." Therefore,
$$P(Y'_{10} < m) = P(Y_1 < m,\; Y_2 < m,\; \ldots,\; Y_{10} < m) \qquad (3.10.2)$$
But the ten observations here are independent, so the intersection probability implicit on the right-hand side of Equation 3.10.2 factors into a product of ten terms. Moreover, each of those terms equals $\frac{1}{2}$ (by definition of the median), so
$$P(Y'_{10} < m) = P(Y_1 < m) \cdot P(Y_2 < m) \cdots P(Y_{10} < m) = \left(\tfrac{1}{2}\right)^{10} = 0.00098$$
A General Formula for $f_{Y'_i}(y)$
Having discussed two special cases of order statistics, $Y_{\min}$ and $Y_{\max}$, we now turn to the more general problem of finding the pdf for the $i$th order statistic, where $i$ can be any integer from 1 through $n$.
Theorem 3.10.1. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of continuous random variables drawn from a distribution having pdf $f_Y(y)$ and cdf $F_Y(y)$. The pdf of the $i$th order statistic is given by
$$f_{Y'_i}(y) = \frac{n!}{(i-1)!\,(n-i)!}[F_Y(y)]^{i-1}[1 - F_Y(y)]^{n-i} f_Y(y)$$
for $1 \le i \le n$.
Proof. We will give a heuristic argument that draws on the similarity between the statement of Theorem 3.10.1 and the binomial distribution. A formal induction proof verifying the expression given for $f_{Y'_i}(y)$ is available in the literature.
Recall the derivation of the binomial probability function, $p_X(k) = P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$, where $X$ is the number of successes in $n$ independent trials and $p$ is the probability that any given trial ends in success. Central to that derivation was the recognition that the event "$X = k$" is actually a union of all the different (mutually exclusive) sequences having exactly $k$ successes and $n - k$ failures. Because the trials are independent, the probability of any such sequence is $p^k(1 - p)^{n-k}$, and the number of such sequences is $n!/[k!(n - k)!]$ (or $\binom{n}{k}$), so the probability that $X = k$ is the product, $\binom{n}{k} p^k (1 - p)^{n-k}$.
Here we are looking for the pdf of the $i$th order statistic at some point $y$, that is, $f_{Y'_i}(y)$. As was the case with the binomial, that pdf will reduce to a combinatorial term times the probability associated with an intersection of independent events. The only fundamental difference is that $Y'_i$ is a continuous random variable, whereas the binomial $X$ is discrete, which means that what we find here will be a probability density function.
By Theorem 2.6.2, there are $n!/[(i - 1)!\,1!\,(n - i)!]$ ways that $n$ observations can be parceled into three groups such that the $i$th largest is at the point $y$ (see Figure 3.10.2).
FIGURE 3.10.2 (the $Y$-axis divided at the point $y$: $i - 1$ observations to the left of $y$, 1 observation at $y$, and $n - i$ observations to the right of $y$)
Moreover, the likelihood associated with any particular set of points having the configuration pictured in Figure 3.10.2 will be the probability that $i - 1$ (independent) observations are all less than $y$, $n - i$ observations are greater than $y$, and one observation is at $y$. The probability density associated with those constraints for a given set of points would be $[F_Y(y)]^{i-1}[1 - F_Y(y)]^{n-i} f_Y(y)$. The probability density, then, that the $i$th order statistic is located at the point $y$ is the product,
$$f_{Y'_i}(y) = \frac{n!}{(i-1)!\,(n-i)!}[F_Y(y)]^{i-1}[1 - F_Y(y)]^{n-i} f_Y(y)$$
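The general formula of Theorem 3.10.1 can be evaluated for any $i$ once a pdf and cdf are supplied. The following is a sketch under stated assumptions: the function `order_stat_pdf` and the unit-uniform test case are illustrative choices, and the area check uses a plain midpoint rule rather than any method from the text.

```python
from math import factorial


def order_stat_pdf(i, n, pdf, cdf, y):
    """pdf of the i-th order statistic (Theorem 3.10.1) at the point y."""
    coeff = factorial(n) // (factorial(i - 1) * factorial(n - i))
    F = cdf(y)
    return coeff * F ** (i - 1) * (1.0 - F) ** (n - i) * pdf(y)


# Assumed test case: uniform pdf on [0, 1].
def unit_uniform_pdf(y):
    return 1.0 if 0.0 <= y <= 1.0 else 0.0


def unit_uniform_cdf(y):
    return min(max(y, 0.0), 1.0)


def total_area(i, n, steps=20_000):
    """Midpoint-rule check that the order-statistic pdf integrates to one."""
    h = 1.0 / steps
    return sum(
        order_stat_pdf(i, n, unit_uniform_pdf, unit_uniform_cdf, (k + 0.5) * h)
        for k in range(steps)
    ) * h


# Setting i = n recovers the pdf of the largest observation, n F^(n-1) f.
largest_of_three = order_stat_pdf(3, 3, unit_uniform_pdf, unit_uniform_cdf, 0.5)
area = total_area(2, 5)
```

A total area of one for each $i$ is a quick sanity check that the combinatorial coefficient is correct.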
EXAMPLE 3.10.5
Suppose that many years of observation have confirmed that the annual maximum flood tide $Y$ (in feet) for a certain river can be modeled by the pdf
$$f_Y(y) = \frac{1}{20}, \quad 20 < y < 40$$
(Note: It is unlikely that flood tides would be described by anything as simple as a uniform pdf. We are making that choice here solely to facilitate the mathematics.) The Army Corps of Engineers are planning to build a levee along a certain portion of the river, and they want to make it high enough so that there is only a 30% chance that the second worst flood in the next thirty-three years will overflow the embankment. How high should the levee be? (We assume that there will be only one potential flood per year.)
Let $h$ be the desired height. If $Y_1, Y_2, \ldots, Y_{33}$ denote the flood tides for the next $n = 33$ years, what we require of $h$ is that
$$P(Y'_{32} > h) = 0.30$$
As a starting point, notice that for $20 < y < 40$,
$$F_Y(y) = \int_{20}^{y} \frac{1}{20}\,dy = \frac{y}{20} - 1$$
Therefore,
$$f_{Y'_{32}}(y) = \frac{33!}{31!\,1!}\left(\frac{y}{20} - 1\right)^{31}\left(2 - \frac{y}{20}\right)^{1} \cdot \frac{1}{20}$$
and $h$ is the solution of the integral equation
$$\int_h^{40} (33)(32)\left(\frac{y}{20} - 1\right)^{31}\left(2 - \frac{y}{20}\right) \cdot \frac{1}{20}\,dy = 0.30 \qquad (3.10.3)$$
If we make the substitution
$$u = \frac{y}{20} - 1$$
Equation 3.10.3 simplifies to
$$P(Y'_{32} > h) = 33(32)\int_{(h/20)-1}^{1} u^{31}(1 - u)\,du = 1 - 33\left(\frac{h}{20} - 1\right)^{32} + 32\left(\frac{h}{20} - 1\right)^{33} \qquad (3.10.4)$$
Setting the right-hand side of Equation 3.10.4 equal to 0.30 and solving for $h$ by trial and error gives
$$h = 39.3 \text{ feet}$$
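The "trial and error" step can be automated. The sketch below assumes the closed form $P(Y'_{32} > h) = 1 - 33a^{32} + 32a^{33}$ with $a = h/20 - 1$ (the right-hand side of Equation 3.10.4) and solves for $h$ by bisection; bisection is a substitute for the hand search described in the text, and the function names are illustrative.

```python
def overflow_probability(h):
    """P(second-largest of 33 annual flood tides exceeds h), for 20 <= h <= 40."""
    a = h / 20.0 - 1.0
    return 1.0 - 33.0 * a ** 32 + 32.0 * a ** 33


def solve_levee_height(target=0.30, lo=20.0, hi=40.0, tol=1e-10):
    """Bisection: overflow_probability decreases from 1 to 0 as h goes from 20 to 40."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if overflow_probability(mid) > target:
            lo = mid  # levee still too low: overflow chance exceeds the target
        else:
            hi = mid
    return (lo + hi) / 2.0


levee_height = solve_levee_height()
print(round(levee_height, 1))
```

Bisection is safe here because the overflow probability is monotone in $h$ on the interval $(20, 40)$.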
Joint pdfs of Order Statistics
Finding the joint pdf of two or more order statistics is easily accomplished by generalizing the argument that derived from Figure 3.10.2. Suppose, for example, that each of $n$ observations in a random sample has pdf $f_Y(y)$ and cdf $F_Y(y)$. The joint pdf for order statistics $Y'_i$ and $Y'_j$ at points $u$ and $v$, where $i < j$ and $u < v$, can be deduced from Figure 3.10.3, which shows how the $n$ points must be distributed if the $i$th and $j$th order statistics are to be located at points $u$ and $v$, respectively.
By Theorem 2.6.2, the number of ways to divide a set of $n$ observations into groups of sizes $i - 1$, 1, $j - i - 1$, 1, and $n - j$ is the quotient
$$\frac{n!}{(i-1)!\,1!\,(j-i-1)!\,1!\,(n-j)!}$$
FIGURE 3.10.3 (the $Y$-axis divided at the points $u$ and $v$: $i - 1$ observations below $u$, 1 observation at $u$, $j - i - 1$ observations between $u$ and $v$, 1 observation at $v$, and $n - j$ observations above $v$)
Also, given the independence of the $n$ observations, the probability that $i - 1$ are less than $u$ is $[F_Y(u)]^{i-1}$, the probability that $j - i - 1$ are between $u$ and $v$ is $[F_Y(v) - F_Y(u)]^{j-i-1}$, and the probability that $n - j$ are greater than $v$ is $[1 - F_Y(v)]^{n-j}$. Multiplying, then, by the pdfs describing the likelihoods that $Y'_i$ and $Y'_j$ would be at points $u$ and $v$, respectively, gives the joint pdf of the two order statistics:
$$f_{Y'_i, Y'_j}(u, v) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}[F_Y(u)]^{i-1}[F_Y(v) - F_Y(u)]^{j-i-1}[1 - F_Y(v)]^{n-j} f_Y(u) f_Y(v) \qquad (3.10.5)$$
for $i < j$ and $u < v$.
EXAMPLE 3.10.6
Let $Y_1$, $Y_2$, and $Y_3$ be a random sample of size $n = 3$ from the uniform pdf defined over the unit interval, $f_Y(y) = 1$, $0 \le y \le 1$. By definition, the range, $R$, of a sample is the difference between the largest and smallest order statistics, in this case,
$$R = \text{range} = Y_{\max} - Y_{\min} = Y'_3 - Y'_1$$
Find $f_R(r)$, the pdf for the range.
We will begin by finding the joint pdf of $Y'_1$ and $Y'_3$. Then $f_{Y'_1, Y'_3}(u, v)$ is integrated over the region $Y'_3 - Y'_1 \le r$ to find the cdf, $F_R(r) = P(R \le r)$. The final step is to differentiate the cdf and make use of the fact that $f_R(r) = F'_R(r)$.
If $f_Y(y) = 1$, $0 \le y \le 1$, it follows that
$$F_Y(y) = P(Y \le y) = \begin{cases} 0, & y < 0 \\ y, & 0 \le y \le 1 \\ 1, & y > 1 \end{cases}$$
Applying Equation 3.10.5, then, with $n = 3$, $i = 1$, and $j = 3$, gives the joint pdf of $Y'_1$ and $Y'_3$. Specifically,
$$f_{Y'_1, Y'_3}(u, v) = \frac{3!}{0!\,1!\,0!}\, u^0 (v - u)^1 (1 - v)^0 \cdot 1 \cdot 1 = 6(v - u), \quad 0 \le u < v \le 1$$
Moreover, we can write the cdf for $R$ in terms of $Y'_1$ and $Y'_3$:
$$F_R(r) = P(R \le r) = P(Y'_3 - Y'_1 \le r) = P(Y'_3 \le Y'_1 + r)$$
Figure 3.10.4 shows the region in the $Y'_1 Y'_3$-plane corresponding to the event that $R \le r$. Integrating the joint pdf of $Y'_1$ and $Y'_3$ over the shaded region gives
$$F_R(r) = P(R \le r) = \int_0^{1-r}\int_u^{u+r} 6(v - u)\,dv\,du + \int_{1-r}^{1}\int_u^{1} 6(v - u)\,dv\,du$$
FIGURE 3.10.4 (the region of the $Y'_1 Y'_3$-plane in which $R \le r$)
The first double integral equals $3r^2 - 3r^3$; the second equals $r^3$. Therefore,
$$F_R(r) = 3r^2 - 3r^3 + r^3 = 3r^2 - 2r^3$$
which implies that
$$f_R(r) = F'_R(r) = 6r - 6r^2, \quad 0 \le r \le 1$$
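The cdf of the range can be verified numerically. Working the two double integrals of the joint pdf $6(v - u)$ by hand gives $F_R(r) = 3r^2 - 2r^3$ for $0 \le r \le 1$; the sketch below checks that value by doing the inner integral in closed form and the outer integral with a midpoint rule. The function names are illustrative, and the splitting of the region at $u = 1 - r$ is handled by capping the upper limit of $v$ at 1.

```python
def range_cdf_numeric(r, steps=10_000):
    """P(R <= r) for the range of three uniform(0, 1) observations.

    For each u, v runs from u to min(u + r, 1), and the inner integral of
    6(v - u) dv over that interval equals 3 * (upper - u)**2.
    """
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        upper = min(u + r, 1.0)
        total += 3.0 * (upper - u) ** 2 * h
    return total


def range_cdf_closed(r):
    """Closed form obtained from the two double integrals: 3r^2 - 2r^3."""
    return 3.0 * r ** 2 - 2.0 * r ** 3


for r in (0.25, 0.5, 0.75):
    print(r, round(range_cdf_numeric(r), 4), round(range_cdf_closed(r), 4))
```

Differentiating the closed form gives the pdf of the range, $6r - 6r^2$ on the unit interval.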
QUESTIONS
3.10.1. Suppose the length of time, in minutes, that you have to wait at a bank teller's window is uniformly distributed over the interval (0, 10). If you go to the bank four times during the next month, what is the probability that your second longest wait will be less than 5 minutes?
3.10.2. A random sample of size $n = 6$ is taken from the pdf $f_Y(y) = 3y^2$, $0 \le y \le 1$. Find $P(Y'_5 > 0.75)$.
3.10.3. What is the probability that the larger of two random observations drawn from any continuous pdf will exceed the sixtieth percentile?
3.10.4. A random sample of size 5 is drawn from the pdf $f_Y(y) = 2y$, $0 \le y \le 1$. Calculate $P(Y'_1 < 0.6 < Y'_5)$. Hint: Consider the complement.
3.10.5. Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample of size $n$ drawn from a continuous pdf, $f_Y(y)$, whose median is $m$. Is $P(Y'_1 > m)$ less than, equal to, or greater than $P(Y'_n > m)$?
3.10.6. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from the exponential pdf $f_Y(y) = e^{-y}$, $y \ge 0$. What is the smallest $n$ for which $P(Y_{\min} < 0.2) > 0.9$?
< 0.7) if a random
of size 6 is drawn from the uniform
3.10.7. Calculate P(0.6 <
pdf defined over the
(0,1).
1.10.a A random sample of size 11 = 5 is drawn from the
fy(y) = 2y, 0 < y < 1. On the
same set of axes,
the pdfs for Yl, YI' and Y
3.10.9. Suppose that 11 observations are taken at random from the pdf
:s
s'
fy(y)
-oo<y<oo
What is the probability that the smallest observation is
than20?
3.10.10. Suppose that $n$ observations are chosen at random from a continuous pdf $f_Y(y)$. What is the probability that the last observation recorded will be the smallest number in the entire sample?
3.10.11. In a certain Jarge metropolitan area the proportion, Y,
from school to school. The
of proponions is
following pdf:
2
It.-_ _..L.
o
Y
1
Suppose the enrollment figures for five schools selected at random are examined.
What is the probability that
with the fourth highest proportion'of bused
children will have a Y -value in excess
What is the probability that none of the
schools will have fewer than 10% their student bused?
3.10.12.
containing n components, where the lifetimes of the components
,Y > O. Show thai
are
random variables and each has pdf fy (y) =
average time elapsing before
component failure occurs is tInA.
3.16.13. Let Yb Y2, ... , Y" be a random
[rom a unifonn pdf over (0.1]. Use Then1
rem 3.10.1 to show that 10
-
i)!
= -'----'--'----.
n!
Question 3.10.13 to find the
value of
where YI , Yl , .. . , is a random
3.10.14.
<:!=Irnnll~ from a uniform pdf defined over
interval (0,1].
3.10.15. Suppose three points are picked randomly from the unit interval. What is the probability that the three are within a half unit of one another?
3.10.16. Suppose a device has three
components, all of
lifetimes (in
months) are modeled by Ihe
pdf, fr(Y) =
Y > O. What is the
probability that all three components will fail within two months of one another?
(1
y)tI-idy
yt,
3.11
CONDITIONAL DENSITIES
We have already seen that many of the concepts defined in Chapter 2 relating to the probabilities of events have their random-variable counterparts. Another of these carryovers is the notion of a conditional probability, or, in what will be our present terminology, a conditional probability density function. Applications of conditional pdfs are not uncommon. The height and girth of a tree, for example, can be considered a pair of random variables. It is easy to measure girth, but it can be difficult to determine height; thus it might be of interest to a lumberman to know the probabilities of a tree attaining certain heights given a known value for its girth. Or consider the plight of a school board member agonizing over which way to vote on a proposed budget increase. Her job would be that much easier if she knew the conditional probability that $x$ additional tax dollars would stimulate an average increase of $y$ points among twelfth-graders taking a standardized proficiency exam.
Finding Conditional pdfs for Discrete Random Variables
In the case of discrete random variables, a conditional pdf can be treated in the same way as a conditional probability. Note the similarity between Definitions 3.11.1 and 2.4.1.
Definition 3.11.1. Let $X$ and $Y$ be discrete random variables. The conditional probability density function of $Y$ given $x$ (that is, the probability that $Y$ takes on the value $y$ given that $X$ is equal to $x$) is denoted $p_{Y|x}(y)$ and given by
$$p_{Y|x}(y) = P(Y = y \mid X = x) = \frac{p_{X,Y}(x, y)}{p_X(x)}$$
for $p_X(x) \ne 0$.
EXAMPLE 3.11.1
A fair coin is tossed five times. Let the random variable $Y$ denote the total number of heads that occur, and let $X$ denote the number of heads occurring on the last two tosses. Find the conditional pdf $p_{Y|x}(y)$ for all $x$ and $y$.
Clearly, there will be three different conditional pdfs, one for each possible value of $X$ ($x = 0$, $x = 1$, and $x = 2$). Moreover, for each value of $x$ there will be four possible values of $Y$, based on whether the first three tosses yield 0, 1, 2, or 3 heads.
For example, suppose no heads occur on the last two tosses. Then $X = 0$, and
$$p_{Y|0}(y) = P(Y = y \mid X = 0) = P(y \text{ heads occur on first three tosses})$$
$$= \binom{3}{y}\left(\frac{1}{2}\right)^{y}\left(1 - \frac{1}{2}\right)^{3-y} = \binom{3}{y}\left(\frac{1}{2}\right)^{3}, \quad y = 0, 1, 2, 3$$
Similarly, suppose that $X = 1$. The corresponding conditional pdf in that case becomes
$$p_{Y|1}(y) = P(Y = y \mid X = 1)$$
Notice that $Y = 1$ if zero heads occur in the first three tosses, $Y = 2$ if one head occurs in the first three tosses, and so on. Therefore,
$$p_{Y|1}(y) = \binom{3}{y-1}\left(\frac{1}{2}\right)^{3}, \quad y = 1, 2, 3, 4$$
FIGURE 3.11.1 (the conditional pdfs $p_{Y|0}(y)$, $p_{Y|1}(y)$, and $p_{Y|2}(y)$ plotted against $y = 0, 1, \ldots, 5$)
Figure 3.11.1 shows the three conditional pdfs. Each has the same shape, but the possible values of $Y$ are different for each value of $X$.
EXAMPLE 3.11.2
Assume that the probabilistic behavior of a pair of discrete random variables $X$ and $Y$ is described by the joint pdf
$$p_{X,Y}(x, y) = xy^2/39$$
defined over the four points (1, 2), (1, 3), (2, 2), and (2, 3). Find the conditional probability that $X = 1$ given that $Y = 2$.
By definition,
$$p_{X|2}(1) = P(X = 1 \text{ given that } Y = 2) = \frac{p_{X,Y}(1, 2)}{p_Y(2)} = \frac{1 \cdot 2^2/39}{1 \cdot 2^2/39 + 2 \cdot 2^2/39} = 1/3$$
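The computation in this example is an instance of a general recipe: restrict the joint pdf to the points where $Y$ has the given value, then divide by the marginal probability of that value. A minimal sketch follows, assuming the joint pdf $xy^2/39$ on the four points (1, 2), (1, 3), (2, 2), (2, 3); the dictionary representation and the function name are illustrative choices, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Joint pdf p(x, y) = x * y^2 / 39 on the four points of the example.
points = [(1, 2), (1, 3), (2, 2), (2, 3)]
joint = {(x, y): Fraction(x * y * y, 39) for x, y in points}


def conditional_x_given_y(joint_pdf, y):
    """Return {x: P(X = x | Y = y)} from a joint pdf stored as {(x, y): prob}."""
    p_y = sum(p for (_, yy), p in joint_pdf.items() if yy == y)  # marginal of Y
    return {xx: p / p_y for (xx, yy), p in joint_pdf.items() if yy == y}


cond_given_2 = conditional_x_given_y(joint, 2)
print(cond_given_2)
```

The same function handles any finite joint pdf, and the conditional probabilities for a fixed $y$ always sum to one.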
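EXAMPLE 3.11.3
Suppose that $X$ and $Y$ are two independent binomial random variables, each defined on $n$ trials and each having the same success probability $p$. Let $Z = X + Y$. Show that the conditional pdf $p_{X|z}(x)$ is a hypergeometric distribution.
We know from Example 3.8.1 that $Z$ has a binomial distribution with parameters $2n$ and $p$. That is,
$$p_Z(z) = P(Z = z) = \binom{2n}{z} p^z (1 - p)^{2n-z}, \quad z = 0, 1, \ldots, 2n$$
By Definition 3.11.1,
$$p_{X|z}(x) = P(X = x \mid Z = z) = \frac{p_{X,Z}(x, z)}{p_Z(z)} = \frac{P(X = x \text{ and } Z = z)}{P(Z = z)}$$
$$= \frac{P(X = x \text{ and } Y = z - x)}{P(Z = z)}$$
$$= \frac{P(X = x) \cdot P(Y = z - x)}{P(Z = z)} \quad \text{(because } X \text{ and } Y \text{ are independent)}$$
$$= \frac{\binom{n}{x} p^x (1 - p)^{n-x} \cdot \binom{n}{z-x} p^{z-x} (1 - p)^{n-(z-x)}}{\binom{2n}{z} p^z (1 - p)^{2n-z}} = \frac{\binom{n}{x}\binom{n}{z-x}}{\binom{2n}{z}}$$
which we recognize as being the hypergeometric distribution.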
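The cancellation of $p$ in this example can be confirmed numerically. The sketch below assumes two independent binomial variables on $n$ trials each and compares $P(X = x)P(Y = z - x)/P(Z = z)$ with the hypergeometric expression $\binom{n}{x}\binom{n}{z-x}/\binom{2n}{z}$; the function names and the particular values of $n$, $x$, $z$, and $p$ are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(K = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def conditional_x_given_z(x, z, n, p):
    """P(X = x | X + Y = z) for independent binomial(n, p) variables X and Y."""
    return binomial_pmf(x, n, p) * binomial_pmf(z - x, n, p) / binomial_pmf(z, 2 * n, p)


def hypergeometric_pmf(x, z, n):
    """C(n, x) C(n, z - x) / C(2n, z)."""
    return comb(n, x) * comb(n, z - x) / comb(2 * n, z)


# The conditional probability should not depend on p at all.
for p in (0.3, 0.8):
    print(p, conditional_x_given_z(1, 3, 4, p), hypergeometric_pmf(1, 3, 4))
```

Both values of $p$ give the same conditional probability, which is the point of the example.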
Comment. The notion of a conditional pdf generalizes easily to situations involving more than two discrete random variables. For example, if $X$, $Y$, and $Z$ have the joint pdf $p_{X,Y,Z}(x, y, z)$, the joint conditional pdf of, say, $X$ and $Y$ given that $Z = z$ is the ratio
$$p_{X,Y|z}(x, y) = \frac{p_{X,Y,Z}(x, y, z)}{p_Z(z)}$$
EXAMPLE 3.11.4
Suppose that random variables $X$, $Y$, and $Z$ have the joint pdf
$$p_{X,Y,Z}(x, y, z) = xy/9z$$
for points (1, 1, 1), (2, 1, 2), (1, 2, 2), (2, 2, 2), and (2, 2, 1). Find $p_{X,Y|z}(x, y)$ for all values of $z$.
To begin, we see from the points for which $p_{X,Y,Z}(x, y, z)$ is defined that $Z$ has two possible values, 1 and 2. Suppose $z = 1$. Then
$$p_{X,Y|1}(x, y) = \frac{p_{X,Y,Z}(x, y, 1)}{p_Z(1)}$$
But
$$p_Z(1) = P(Z = 1) = P[(1, 1, 1) \cup (2, 2, 1)] = 1 \cdot \frac{1}{9} \cdot 1 + 2 \cdot \frac{2}{9} \cdot 1 = \frac{5}{9}$$
Therefore,
$$p_{X,Y|1}(x, y) = \frac{xy/9}{5/9} = xy/5 \quad \text{for } (x, y) = (1, 1) \text{ and } (2, 2)$$
Suppose $z = 2$. Then
$$p_Z(2) = P(Z = 2) = P[(2, 1, 2) \cup (1, 2, 2) \cup (2, 2, 2)] = 2 \cdot \frac{1}{18} + 1 \cdot \frac{2}{18} + 2 \cdot \frac{2}{18} = \frac{8}{18}$$
so
$$p_{X,Y|2}(x, y) = \frac{p_{X,Y,Z}(x, y, 2)}{p_Z(2)} = \frac{xy/18}{8/18} = \frac{xy}{8} \quad \text{for } (x, y) = (2, 1), (1, 2), \text{ and } (2, 2)$$
QUESTIONS
3.11.1. Suppose $X$ and $Y$ have the joint pdf $p_{X,Y}(x, y) = \dfrac{x + y + xy}{21}$ for the points (1, 1), (1, 2), (2, 1), (2, 2), where $X$ denotes a "message" sent (either $x = 1$ or $x = 2$) and $Y$ denotes a "message" received. Find the probability that the message sent was the message received, that is, find $p_{Y|x}(x)$.
3.11.2. Suppose a die is rolled six times. Let $X$ be the total number of 4's that occur and let $Y$ be the number of 4's in the first two tosses. Find $p_{Y|x}(y)$.
3.11.3. An urn contains eight red chips, six white chips, and four blue chips. A sample of size 3 is drawn without replacement. Let $X$ denote the number of red chips in the sample and $Y$, the number of white chips. Find an expression for $p_{Y|x}(y)$.
3.11.4. Five cards are dealt from a standard poker deck. Let $X$ be the number of aces received, and $Y$, the number of kings. Compute $P(X = 2 \mid Y = 2)$.
3.11.5. Given that two discrete random variables $X$ and $Y$ follow the joint pdf $p_{X,Y}(x, y) = k(x + y)$, for $x = 1, 2, 3$ and $y = 1, 2, 3$,
(a) Find $k$.
(b) Evaluate $p_{Y|x}(1)$ for all values of $x$ for which $p_X(x) > 0$.
3.11.6. Let $X$ denote the number on a chip drawn at random from an urn containing three chips, numbered 1, 2, and 3. Let $Y$ be the number of heads that occur when a fair coin is tossed $X$ times.
(a) Find $p_{X,Y}(x, y)$.
(b) Find the marginal pdf of $Y$ by summing out the $x$-values.
3.11.7. Suppose X, Y, and Z have a trivariate distribution described by the joint
PX.Y.Z(x,
y, z)
where x, y, and z can be 1 or 2. Tabulate the
conditional pdf of X and Y given
each of the two values of z.
3.11.8. In Question 3.11.7 define the random variable $W$ to be the "majority" of $x$, $y$, and $z$. For example, $W(2, 2, 1) = 2$ and $W(1, 1, 1) = 1$. Find the pdf of $W \mid x$.
3.11.9. Let $X$ and $Y$ be independent random variables where $p_X(k) = e^{-\lambda}\dfrac{\lambda^k}{k!}$ and $p_Y(k) = e^{-\mu}\dfrac{\mu^k}{k!}$ for $k = 0, 1, \ldots$. Show that the conditional pdf of $X$ given that $X + Y = n$ is binomial with parameters $n$ and $\dfrac{\lambda}{\lambda + \mu}$. Hint: See Question 3.8.1.
3.1LIO. Suppose Compositor A is preparing a
to be publiShed. Assume that she
makes X errors on a
page, where X has the Poisson pdf, px(k) = e- 22RI k!,
k = 0, 1, 2, ... A
compositor,
is also
on the book. He makes Y
errors ana page, where py(k) = e-3 y Jk!,k = 0,1,2, ... Assume that Compositor A
nrf'naN':4<:. the first 100 pages of the ~ext and Compositor
the last 100
After
is completed, reviewers (with 100 much time on their hands!)
that the
text contains a total
Write a formula for the exact probability that fewer
than half of the errors are due to Compositor A
Finding Conditional pdfs for Continuous Random Variables
If the variables $X$ and $Y$ are continuous, we can still appeal to the quotient $f_{X,Y}(x, y)/f_X(x)$ as the definition of $f_{Y|x}(y)$ and argue its propriety by analogy. A more satisfying approach, though, is to arrive at the same conclusion by taking the limit of $Y$'s "conditional" cdf.
If $X$ is continuous, a direct evaluation of $F_{Y|x}(y) = P(Y \le y \mid X = x)$, via Definition 2.4.1, is impossible, since the denominator would be 0. Alternatively, we can think of $P(Y \le y \mid X = x)$ as a limit:
$$P(Y \le y \mid X = x) = \lim_{h \to 0} P(Y \le y \mid x \le X \le x + h) = \lim_{h \to 0} \frac{\int_x^{x+h}\int_{-\infty}^{y} f_{X,Y}(t, u)\,du\,dt}{\int_x^{x+h} f_X(t)\,dt}$$
Evaluating the quotient of the limits gives $\frac{0}{0}$, so l'Hôpital's rule is indicated:
$$P(Y \le y \mid X = x) = \lim_{h \to 0} \frac{\frac{d}{dh}\int_x^{x+h}\int_{-\infty}^{y} f_{X,Y}(t, u)\,du\,dt}{\frac{d}{dh}\int_x^{x+h} f_X(t)\,dt} \qquad (3.11.1)$$
By the fundamental theorem of calculus,
$$\frac{d}{dh}\int_x^{x+h} g(t)\,dt = g(x + h)$$
which simplifies Equation 3.11.1 to
$$P(Y \le y \mid X = x) = \lim_{h \to 0} \frac{\int_{-\infty}^{y} f_{X,Y}(x + h, u)\,du}{f_X(x + h)} = \frac{\int_{-\infty}^{y} \lim_{h \to 0} f_{X,Y}(x + h, u)\,du}{\lim_{h \to 0} f_X(x + h)} = \int_{-\infty}^{y} \frac{f_{X,Y}(x, u)}{f_X(x)}\,du$$
provided that the limit operation and the integration can be interchanged [see (9) for a discussion of when such an interchange is valid]. It follows from this last expression that $f_{X,Y}(x, y)/f_X(x)$ behaves as a conditional probability density function should, and we are justified in extending Definition 3.11.1 to the continuous case.
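EXAMPLE 3.11.5
Let $X$ and $Y$ be continuous random variables with joint pdf
$$f_{X,Y}(x, y) = \begin{cases} \left(\frac{1}{8}\right)(6 - x - y), & 0 < x < 2,\; 2 < y < 4 \\ 0, & \text{elsewhere} \end{cases}$$
Find (a) $f_X(x)$, (b) $f_{Y|x}(y)$, and (c) $P(2 < Y < 3 \mid x = 1)$.
a. From Theorem 3.7.2,
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy = \int_2^4 \left(\frac{1}{8}\right)(6 - x - y)\,dy = \left(\frac{1}{8}\right)(6 - 2x), \quad 0 < x < 2$$
b. Substituting into the definition of the conditional pdf, we can write
$$f_{Y|x}(y) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{\left(\frac{1}{8}\right)(6 - x - y)}{\left(\frac{1}{8}\right)(6 - 2x)} = \frac{6 - x - y}{6 - 2x}, \quad 0 < x < 2,\; 2 < y < 4$$
c. To find $P(2 < Y < 3 \mid x = 1)$ we simply integrate $f_{Y|1}(y)$ over the interval $2 < y < 3$:
$$P(2 < Y < 3 \mid x = 1) = \int_2^3 f_{Y|1}(y)\,dy = \int_2^3 \frac{5 - y}{4}\,dy = \frac{5}{8}$$
[A partial check that the derivation of a conditional pdf is correct can be performed by integrating $f_{Y|x}(y)$ over the entire range of $Y$. That integral should be one. Here, for example, when $x = 1$, $\int_2^4 f_{Y|1}(y)\,dy = \int_2^4 [(5 - y)/4]\,dy$ does equal one.]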
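Both the probability in part (c) and the "partial check" described in the bracketed note can be reproduced numerically. The following sketch assumes the conditional pdf $f_{Y|x}(y) = (6 - x - y)/(6 - 2x)$ on $2 < y < 4$, obtained by dividing the joint pdf $(1/8)(6 - x - y)$ by the marginal $(1/8)(6 - 2x)$; the helper names are illustrative.

```python
def conditional_pdf(y, x):
    """f_{Y|x}(y) = (6 - x - y) / (6 - 2x) for 0 < x < 2 and 2 < y < 4."""
    return (6.0 - x - y) / (6.0 - 2.0 * x)


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h


# Part (c): P(2 < Y < 3 | x = 1).
prob_2_to_3 = integrate(lambda y: conditional_pdf(y, 1.0), 2.0, 3.0)

# Partial check: the conditional pdf should integrate to one over 2 < y < 4.
total_probability = integrate(lambda y: conditional_pdf(y, 1.0), 2.0, 4.0)

print(prob_2_to_3, total_probability)
```

The check can be repeated for any other $x$ between 0 and 2 by changing the second argument of `conditional_pdf`.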
QUESTIONS
3.11.11. Let $X$ be a nonnegative random variable. We say that $X$ is memoryless if
$$P(X > s + t \mid X > t) = P(X > s) \quad \text{for all } s, t \ge 0$$
Show that a random variable with pdf $f_X(x) = (1/\lambda)e^{-x/\lambda}$, $x > 0$, is memoryless.
3.11.12. Given the joint pdf
/x,y(x, y) =
0 < x < Y,
y > 0
find
(8) P(Y < llX < 1)
(b) P(Y < llX 1)
(c) iYlx (y)
(d) £(Ylx)
3.11.13. Find the conditional pdf of $Y$ given $x$ if
$$f_{X,Y}(x, y) = x + y$$
for $0 \le x \le 1$ and $0 \le y \le 1$.
3.11.14. If
$$f_{X,Y}(x, y) = 2, \quad x \ge 0,\; y \ge 0,\; x + y \le 1$$
show that the conditional pdf of $Y$ given $x$ is uniform.
3.11.15. Suppose that
+ 4x
+
and
/x{x)
==
3'1 . (1 + 4x)
for 0 < x < 1 and 0 < y < 1. Find the marginal pdf for Y.
3.11.16. Suppose that $X$ and $Y$ are distributed according to the joint pdf
$$f_{X,Y}(x, y) = \frac{2}{5} \cdot (2x + 3y), \quad 0 \le x \le 1,\; 0 \le y \le 1$$
Find
(a) $f_X(x)$,
(b) $f_{Y|x}(y)$, and
(c) p(~ ~ Y ~ ~IX =
(d) $E(Y \mid x)$
3.11.17. If $X$ and $Y$ have the joint pdf
$$f_{X,Y}(x, y) = 2, \quad 0 < x < y < 1$$
find p(O < X < ~IY = ~).
3.11.18. Find P (X < 11 Y = I!) if $X$ and $Y$ have the joint pdf
$$f_{X,Y}(x, y) = xy/2, \quad 0 < x < y < 2$$
3.11.19. Suppose that $X_1$, $X_2$, $X_3$, $X_4$, and $X_5$ have the joint pdf
$$f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5) = 32x_1x_2x_3x_4x_5$$
for $0 < x_i < 1$, $i = 1, 2, \ldots, 5$. Find the joint conditional pdf of $X_1$, $X_2$, and $X_3$ given that $X_4 = x_4$ and $X_5 = x_5$.
3.11.20. Suppose the random variables $X$ and $Y$ are jointly distributed according to the pdf
$$f_{X,Y}(x, y) = \frac{6}{7}\left(x^2 + \frac{xy}{2}\right), \quad 0 < x < 1,\; 0 < y < 2$$
Find
(a) $f_X(x)$
(b) $P(X > 2Y)$
(c) p(Y > 11X > ~)
3.12
MOMENT-GENERATING FUNCTIONS
Finding moments of random variables directly, particularly the higher moments defined in Section 3.6, is conceptually straightforward but can be quite problematic: Depending on the nature of the pdf, integrals and sums of the form
$$\int_{-\infty}^{\infty} y^r f_Y(y)\,dy \quad \text{and} \quad \sum_{\text{all } k} k^r p_X(k)$$
can be very difficult to evaluate. Fortunately, an alternative method is available. For many pdfs, we can find a moment-generating function (or mgf), $M_W(t)$, one of whose properties is that the $r$th derivative of $M_W(t)$ evaluated at zero is equal to $E(W^r)$.
Calculating a Random Variable's Moment-Generating Function
In principle, what we call a moment-generating function is a direct application of Theorem 3.5.3.
Definition 3.12.1. Let $W$ be a random variable. The moment-generating function (mgf) for $W$ is denoted $M_W(t)$ and given by
$$M_W(t) = E(e^{tW}) = \begin{cases} \displaystyle\sum_{\text{all } k} e^{tk} p_W(k) & \text{if } W \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} e^{tw} f_W(w)\,dw & \text{if } W \text{ is continuous} \end{cases}$$
at all values of $t$ for which the expected value exists.
EXAMPLE 3.12.1
Suppose the random variable $X$ has a geometric pdf,
$$p_X(k) = (1 - p)^{k-1} p, \quad k = 1, 2, \ldots$$
(In practice, this is the pdf that models the occurrence of the first success in a series of independent trials, where each trial has a probability $p$ of ending in success.) Find $M_X(t)$, the moment-generating function for $X$.
Since $X$ is discrete, the first part of Definition 3.12.1 applies, so
$$M_X(t) = E(e^{tX}) = \sum_{k=1}^{\infty} e^{tk}(1 - p)^{k-1} p = \frac{p}{1 - p}\sum_{k=1}^{\infty} [(1 - p)e^t]^k \qquad (3.12.1)$$
The $t$ in $M_X(t)$ can be any number in a neighborhood of zero, as long as $M_X(t) < \infty$. Here, $M_X(t)$ is an infinite sum of the terms $[(1 - p)e^t]^k$, and that sum will be finite only if $(1 - p)e^t < 1$, or, equivalently, if $t < \ln(1/(1 - p))$. It will be assumed, then, in what follows that $0 < t < \ln(1/(1 - p))$.
Recall that
$$\sum_{k=0}^{\infty} r^k = \frac{1}{1 - r}$$
provided $0 < r < 1$. This formula can be used on Equation 3.12.1, where $r = (1 - p)e^t$ and $0 < t < \ln\left(\frac{1}{1-p}\right)$. Specifically,
$$M_X(t) = \frac{p}{1 - p}\left(\sum_{k=0}^{\infty} [(1 - p)e^t]^k - [(1 - p)e^t]^0\right) = \frac{p}{1 - p}\left(\frac{1}{1 - (1 - p)e^t} - 1\right) = \frac{pe^t}{1 - (1 - p)e^t}$$
EXAMPLE 3.12.2
Suppose that $X$ is a binomial random variable with pdf
$$p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n$$
Find $M_X(t)$.
By Definition 3.12.1,
$$M_X(t) = E(e^{tX}) = \sum_{k=0}^{n} e^{tk}\binom{n}{k} p^k (1 - p)^{n-k} = \sum_{k=0}^{n} \binom{n}{k} (pe^t)^k (1 - p)^{n-k} \qquad (3.12.2)$$
To get a closed-form expression for $M_X(t)$, that is, to evaluate the sum indicated in Equation 3.12.2, requires a (hopefully) familiar formula from algebra: According to Newton's binomial expansion,
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \qquad (3.12.3)$$
for any $x$ and $y$. Suppose we let $x = pe^t$ and $y = 1 - p$. It follows from Equations 3.12.2 and 3.12.3, then, that
$$M_X(t) = (1 - p + pe^t)^n$$
(Notice in this case that $M_X(t)$ is defined for all values of $t$.)
EXAMPLE 3.12.3
Suppose that $Y$ has an exponential pdf, where $f_Y(y) = \lambda e^{-\lambda y}$, $y > 0$. Find $M_Y(t)$.
Since the exponential pdf describes a continuous random variable, $M_Y(t)$ is an integral:
$$M_Y(t) = E(e^{tY}) = \int_0^{\infty} e^{ty} \cdot \lambda e^{-\lambda y}\,dy = \int_0^{\infty} \lambda e^{-(\lambda - t)y}\,dy$$
After making the substitution $u = (\lambda - t)y$, we can write
$$M_Y(t) = \int_{u=0}^{\infty} \lambda e^{-u}\,\frac{du}{\lambda - t} = \frac{\lambda}{\lambda - t}\left[-e^{-u}\right]_{u=0}^{\infty} = \frac{\lambda}{\lambda - t}\left[1 - \lim_{u \to \infty} e^{-u}\right] = \frac{\lambda}{\lambda - t}$$
Here, $M_Y(t)$ is finite and nonzero only when $u = (\lambda - t)y > 0$, which implies that $t$ must be less than $\lambda$. For $t \ge \lambda$, $M_Y(t)$ fails to exist.
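Definition 3.12.1 can be applied numerically as a check on closed-form moment-generating functions. The sketch below evaluates $E(e^{tW})$ directly for a binomial and an exponential random variable and compares the results with the closed forms $(1 - p + pe^t)^n$ and $\lambda/(\lambda - t)$. The helper names, the parameter values, and the truncation of the exponential integral at 40 are all assumptions made for illustration.

```python
import math


def mgf_discrete(pmf, support, t):
    """E(e^{tW}) for a discrete W: the sum of e^{tk} p_W(k) over the support."""
    return sum(math.exp(t * k) * pmf(k) for k in support)


def mgf_continuous(pdf, lower, upper, t, steps=200_000):
    """E(e^{tW}) for a continuous W, by the midpoint rule on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        w = lower + (k + 0.5) * h
        total += math.exp(t * w) * pdf(w)
    return total * h


# Binomial with n = 5, p = 0.4 (illustrative values), evaluated at t = 0.3.
n, p, t = 5, 0.4, 0.3
binomial_numeric = mgf_discrete(
    lambda k: math.comb(n, k) * p ** k * (1 - p) ** (n - k), range(n + 1), t
)
binomial_closed = (1 - p + p * math.exp(t)) ** n

# Exponential with lam = 2 at t = 0.5 < lam; beyond y = 40 the integrand is negligible.
lam, t_exp = 2.0, 0.5
exponential_numeric = mgf_continuous(
    lambda y: lam * math.exp(-lam * y), 0.0, 40.0, t_exp
)
exponential_closed = lam / (lam - t_exp)
```

For the exponential case the numerical integral only converges when $t < \lambda$, mirroring the restriction on $t$ in the closed form.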
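EXAMPLE 3.12.4
The normal (or bell-shaped) curve was introduced in an earlier example. Its pdf is the rather cumbersome function
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right], \quad -\infty < y < \infty$$
where $\mu = E(Y)$ and $\sigma^2 = \text{Var}(Y)$. Derive the moment-generating function for this most important of all probability models.
Since $Y$ is a continuous random variable,
$$M_Y(t) = E(e^{tY}) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} \exp(ty)\exp\left[-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right]dy$$
$$= \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} \exp\left[-\frac{y^2 - 2\mu y - 2\sigma^2 t y + \mu^2}{2\sigma^2}\right]dy \qquad (3.12.4)$$
Evaluating the integral in Equation 3.12.4 is best accomplished by completing the square of the numerator of the exponent (which means that the square of half the coefficient of $y$ is added and subtracted). That is, we can write
$$y^2 - (2\mu + 2\sigma^2 t)y + \mu^2 = [y - (\mu + \sigma^2 t)]^2 - (\mu + \sigma^2 t)^2 + \mu^2$$
$$= [y - (\mu + \sigma^2 t)]^2 - \sigma^4 t^2 - 2\mu t \sigma^2 \qquad (3.12.5)$$
The last two terms on the right-hand side of Equation 3.12.5, though, do not involve $y$, so they can be factored out of the integral, and Equation 3.12.4 reduces to
$$M_Y(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)\left[\frac{1}{\sqrt{2\pi}\,\sigma}\right]\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2}\left(\frac{y - (\mu + t\sigma^2)}{\sigma}\right)^2\right]dy$$
But, together, the latter two factors equal one (why?), implying that the moment-generating function for a normally distributed random variable is given by
$$M_Y(t) = e^{\mu t + \sigma^2 t^2/2}$$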
QUESTIONS
3.12.1. Let $X$ be a random variable with pdf $p_X(k) = 1/n$ for $k = 0, 1, 2, \ldots, n - 1$ and 0 otherwise. Show that $M_X(t) = \dfrac{1 - e^{nt}}{n(1 - e^t)}$.
3.12.2. Two chips are drawn at random and without replacement from an urn that contains five chips, numbered 1 through 5. If the sum of the chips drawn is even, the random variable $X$ equals 5; if the sum of the chips drawn is odd, $X = -3$. Find the moment-generating function for $X$.
Section 3.12
Moment-Generating Functions
ll2.3. Find the expected value of ~x if X is a binominal random variable with
11
261
= 10 and
p=}.
3.12.4. Find the moment-generating function
the
random variable X whose
probability function is given by
3.12.5. Which pdfs would have the following moment-generating
6l2
(B) My (t) = e
(b) My(t) = 2/(2 - t)
G
(c) Mx(t) =
+ !e')4
(d) Mx(t) = O.3e' /0 - O.7e')
3.12.6. Let $Y$ have pdf
$$f_Y(y) = \begin{cases} y, & 0 \le y \le 1 \\ 2 - y, & 1 \le y \le 2 \\ 0, & \text{elsewhere} \end{cases}$$
Find $M_Y(t)$.
3.12.7. A random variable $X$ is said to have a Poisson distribution if $p_X(k) = P(X = k) = e^{-\lambda}\lambda^k/k!$, $k = 0, 1, 2, \ldots$. Find the moment-generating function for a Poisson random variable.
Use the fact that
3.12.8. Let $Y$ be a continuous random variable with $f_Y(y) = ye^{-y}$, $0 \le y$. Show that $M_Y(t) = \dfrac{1}{(1 - t)^2}$.
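Using Moment-Generating Functions to Find Moments
Having practiced finding the functions $M_X(t)$ and $M_Y(t)$, we now turn to the theorem that spells out their relationship to $X^r$ and $Y^r$.
Theorem 3.12.1. Let $W$ be a random variable with probability density function $f_W(w)$. [If $W$ is continuous, $f_W(w)$ must be sufficiently smooth to allow the order of differentiation and integration to be interchanged.] Let $M_W(t)$ be the moment-generating function for $W$. Then, provided the $r$th moment exists,
$$M_W^{(r)}(0) = E(W^r)$$
Proof. We will verify the theorem for the continuous case where $r$ is either 1 or 2. The extensions to discrete random variables and to an arbitrary positive integer $r$ are straightforward.
For $r = 1$,
$$M_Y^{(1)}(0) = \frac{d}{dt}\int_{-\infty}^{\infty} e^{ty} f_Y(y)\,dy\,\bigg|_{t=0} = \int_{-\infty}^{\infty} \frac{d}{dt}e^{ty} f_Y(y)\,dy\,\bigg|_{t=0} = \int_{-\infty}^{\infty} ye^{ty} f_Y(y)\,dy\,\bigg|_{t=0}$$
$$= \int_{-\infty}^{\infty} ye^{0 \cdot y} f_Y(y)\,dy = \int_{-\infty}^{\infty} y f_Y(y)\,dy = E(Y)$$
For $r = 2$,
$$M_Y^{(2)}(0) = \frac{d^2}{dt^2}\int_{-\infty}^{\infty} e^{ty} f_Y(y)\,dy\,\bigg|_{t=0} = \int_{-\infty}^{\infty} \frac{d^2}{dt^2}e^{ty} f_Y(y)\,dy\,\bigg|_{t=0} = \int_{-\infty}^{\infty} y^2 e^{ty} f_Y(y)\,dy\,\bigg|_{t=0}$$
$$= \int_{-\infty}^{\infty} y^2 e^{0 \cdot y} f_Y(y)\,dy = \int_{-\infty}^{\infty} y^2 f_Y(y)\,dy = E(Y^2)$$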
EXAMPLE 3.12.5
For a geometric random variable $X$ with pdf
$$p_X(k) = (1 - p)^{k-1} p, \quad k = 1, 2, \ldots$$
we saw in Example 3.12.1 that
$$M_X(t) = pe^t[1 - (1 - p)e^t]^{-1}$$
Find the expected value of $X$ by differentiating its moment-generating function.
Using the product rule, we can write the first derivative of $M_X(t)$ as
$$M_X^{(1)}(t) = pe^t(-1)[1 - (1 - p)e^t]^{-2}(-1)(1 - p)e^t + [1 - (1 - p)e^t]^{-1} pe^t$$
$$= \frac{p(1 - p)e^{2t}}{[1 - (1 - p)e^t]^2} + \frac{pe^t}{1 - (1 - p)e^t}$$
Setting $t = 0$ shows that $E(X) = \frac{1}{p}$:
$$M_X^{(1)}(0) = E(X) = \frac{p(1 - p)}{[1 - (1 - p)]^2} + \frac{p}{1 - (1 - p)} = \frac{1 - p}{p} + \frac{p}{p} = \frac{1}{p}$$
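The result $E(X) = 1/p$ can be checked without doing the differentiation by hand. The sketch below approximates the first derivative of the geometric moment-generating function $pe^t/[1 - (1 - p)e^t]$ at $t = 0$ with a central difference; numerical differentiation is a stand-in for the product-rule calculation in the example, and the function names, the step size, and the choice $p = 0.25$ are assumptions.

```python
import math


def geometric_mgf(t, p):
    """M_X(t) = p e^t / (1 - (1 - p) e^t), valid for t < ln(1 / (1 - p))."""
    return p * math.exp(t) / (1.0 - (1.0 - p) * math.exp(t))


def first_derivative_at_zero(mgf, h=1e-5):
    """Central-difference approximation to M'(0)."""
    return (mgf(h) - mgf(-h)) / (2.0 * h)


p = 0.25
approx_mean = first_derivative_at_zero(lambda t: geometric_mgf(t, p))
print(approx_mean, 1.0 / p)
```

The step size must keep $t$ inside the interval where the moment-generating function exists, which for $p = 0.25$ means $t < \ln(4/3)$.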
EXAMPLE 3.12.6
Find the expected value of an exponential random variable with pdf
$$f_Y(y) = \lambda e^{-\lambda y}, \quad y > 0$$
Use the fact that
$$M_Y(t) = \lambda(\lambda - t)^{-1}$$
(as shown in Example 3.12.3).
Differentiating $M_Y(t)$ gives
$$M_Y^{(1)}(t) = \lambda(-1)(\lambda - t)^{-2}(-1) = \frac{\lambda}{(\lambda - t)^2}$$
Set $t = 0$. Then
$$M_Y^{(1)}(0) = \frac{\lambda}{(\lambda - 0)^2}$$
implying that
$$E(Y) = \frac{1}{\lambda}$$
EXAMPLE 3.12.7
Find an expression for $E(X^k)$ if the moment-generating function for $X$ is given by
$$M_X(t) = (1 - p_1 - p_2) + p_1 e^t + p_2 e^{2t}$$
The only way to deduce a formula for an arbitrary moment such as $E(X^k)$ is to calculate the first couple of moments and look for a pattern that can be generalized. Here,
$$M_X^{(1)}(t) = p_1 e^t + 2p_2 e^{2t}$$
so
$$E(X) = M_X^{(1)}(0) = p_1 e^0 + 2p_2 e^{2 \cdot 0} = p_1 + 2p_2$$
Taking the second derivative, we see that
$$M_X^{(2)}(t) = p_1 e^t + 2^2 p_2 e^{2t}$$
implying that
$$E(X^2) = M_X^{(2)}(0) = p_1 e^0 + 2^2 p_2 e^{2 \cdot 0} = p_1 + 2^2 p_2$$
Clearly, each successive differentiation will leave the $p_1$ term unaffected but will multiply the $p_2$ term by two. Therefore,
$$E(X^k) = M_X^{(k)}(0) = p_1 + 2^k p_2$$
Using Moment-Generating Functions to Find Variances
In addition to providing a useful technique for calculating $E(W^r)$, moment-generating functions can also find variances, because
$$\text{Var}(W) = E(W^2) - [E(W)]^2 \qquad (3.12.6)$$
for any random variable $W$ (recall Theorem 3.6.1). Other useful "descriptors" of pdfs can also be reduced to combinations of moments. The skewness of a distribution, for example, is a function of $E[(W - \mu)^3]$, where $\mu = E(W)$. But
$$E[(W - \mu)^3] = E(W^3) - 3E(W^2)E(W) + 2[E(W)]^3$$
In many cases, finding $E[(W - \mu)^2]$ or $E[(W - \mu)^3]$ could be quite difficult if moment-generating functions were not available.
EXAMPLE 3.12.8
We know from Example 3.12.2 that if $X$ is a binomial random variable with parameters $n$ and $p$, then
$$M_X(t) = (1 - p + pe^t)^n$$
Use $M_X(t)$ to find the variance of $X$.
The first two derivatives of $M_X(t)$ are
$$M_X^{(1)}(t) = n(1 - p + pe^t)^{n-1} \cdot pe^t$$
and
$$M_X^{(2)}(t) = pe^t \cdot n(n - 1)(1 - p + pe^t)^{n-2} \cdot pe^t + n(1 - p + pe^t)^{n-1} \cdot pe^t$$
Setting $t = 0$ gives
$$M_X^{(1)}(0) = np = E(X)$$
and
$$M_X^{(2)}(0) = n(n - 1)p^2 + np = E(X^2)$$
From Equation 3.12.6, then,
$$\text{Var}(X) = n(n - 1)p^2 + np - (np)^2 = np(1 - p)$$
(the same answer we found in Example 3.9.8).
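EXAMPLE 3.12.9
A discrete random variable $X$ is said to have a Poisson distribution if
$$p_X(k) = P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$
(An example of such a distribution is the mortality data described in Case Study 3.3.1.) It can be shown (see Question 3.12.7) that the moment-generating function for a Poisson random variable is given by
$$M_X(t) = e^{-\lambda + \lambda e^t}$$
Use $M_X(t)$ to find $E(X)$ and $\text{Var}(X)$.
Taking the first derivative of $M_X(t)$ gives
$$M_X^{(1)}(t) = e^{-\lambda + \lambda e^t} \cdot \lambda e^t$$
so
$$E(X) = M_X^{(1)}(0) = e^{-\lambda + \lambda e^0} \cdot \lambda e^0 = \lambda$$
Applying the product rule to $M_X^{(1)}(t)$ yields the second derivative,
$$M_X^{(2)}(t) = e^{-\lambda + \lambda e^t} \cdot \lambda e^t + \lambda e^t e^{-\lambda + \lambda e^t} \cdot \lambda e^t$$
For $t = 0$,
$$M_X^{(2)}(0) = E(X^2) = e^{-\lambda + \lambda e^0} \cdot \lambda e^0 + \lambda e^0 \cdot e^{-\lambda + \lambda e^0} \cdot \lambda e^0 = \lambda + \lambda^2$$
The variance of a Poisson random variable, then, proves to be the same as its mean:
$$\text{Var}(X) = E(X^2) - [E(X)]^2 = M_X^{(2)}(0) - [M_X^{(1)}(0)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$$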
QUESTIONS
3.129. Calculate
y.~) for a random wniable whose
function is
Mdt)
3.12.10. Find
y4) if Y is an exponential random variable with Jy(y) =
,Y > O.
3.12.11. The form of the moment-generating function for a normal random variable is
My(1)
Example 3.12.4). Differentiate My(t)
= E( y)
andi;
3.12.12. What is
m{J~m(~Ill·-genelra[llng function My(f) =
(1 -
3.12.13. Find E(Y~) if the moment-generating function Cor Y is given
Recall
Use Example 3.12.4 to find E(y2) withoul taking any
Theorem 3.6.1
3.12.14. Find an expression for
jl!) if Mdt) = (1 - t /}..)-r, where).. is any
real
number and /' is a positive
3.1215. Use Mdt) to find the
value of the I.mifonn random variable described in
Question 3.12.1.
3.12.16. Find the variance of Y if My(t) =
(2).
Using Moment-Generating Functions to Identify pdf's
Finding moments is not the only application of moment-generating functions. They are
also used to identify the pdf of sums of random variables, that is, finding $f_W(w)$, where
$W = W_1 + W_2 + \cdots + W_n$. Their assistance in the latter is particularly important for two
reasons: (1) Many statistical procedures are defined in terms of sums, and (2) alternative
methods for deriving $f_{W_1 + W_2 + \cdots + W_n}(w)$ are extremely cumbersome.
The next two theorems give the background results necessary for deriving $f_W(w)$.
Theorem 3.12.2 states a uniqueness property of moment-generating functions: If $W_1$
and $W_2$ are random variables with the same mgfs, they have the same
pdfs. In practice, applications of Theorem 3.12.2 rely on the
algebraic properties given in Theorem 3.12.3.
Theorem 3.12.2. Suppose that $W_1$ and $W_2$ are random variables for which $M_{W_1}(t) = M_{W_2}(t)$
for some interval of $t$'s containing 0. Then $f_{W_1}(w) = f_{W_2}(w)$.
Proof. See (97).
Theorem 3.12.3.
a. Let $W$ be a random variable with moment-generating function $M_W(t)$. Let $V = aW + b$. Then
\[ M_V(t) = e^{bt} M_W(at) \]
b. Let $W_1, W_2, \ldots, W_n$ be independent random variables with moment-generating functions
$M_{W_1}(t), M_{W_2}(t), \ldots,$ and $M_{W_n}(t)$, respectively. Let $W = W_1 + W_2 + \cdots + W_n$. Then
\[ M_W(t) = M_{W_1}(t) \cdot M_{W_2}(t) \cdots M_{W_n}(t) \]
Proof. The proof is left as an exercise.
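EXAMPLE 3.12.10
Suppose that $X_1$ and $X_2$ are two independent Poisson random variables with parameters
$\lambda_1$ and $\lambda_2$, respectively. That is,
\[ p_{X_1}(k) = P(X_1 = k) = \frac{e^{-\lambda_1}\lambda_1^{k}}{k!}, \quad k = 0, 1, 2, \ldots \]
and
\[ p_{X_2}(k) = P(X_2 = k) = \frac{e^{-\lambda_2}\lambda_2^{k}}{k!}, \quad k = 0, 1, 2, \ldots \]
Let $X = X_1 + X_2$. What is the pdf for $X$?
According to Example 3.12.9, the moment-generating functions for $X_1$ and $X_2$ are
\[ M_{X_1}(t) = e^{-\lambda_1 + \lambda_1 e^{t}} \]
and
\[ M_{X_2}(t) = e^{-\lambda_2 + \lambda_2 e^{t}} \]
Moreover, if $X = X_1 + X_2$, then by Part b of Theorem 3.12.3,
\[ M_X(t) = M_{X_1}(t) \cdot M_{X_2}(t) = e^{-\lambda_1 + \lambda_1 e^{t}} \cdot e^{-\lambda_2 + \lambda_2 e^{t}} = e^{-(\lambda_1 + \lambda_2) + (\lambda_1 + \lambda_2)e^{t}} \tag{3.12.7} \]
But, by inspection, Equation 3.12.7 is the moment-generating function that a Poisson
random variable with $\lambda = \lambda_1 + \lambda_2$ would have. It follows, then, by Theorem 3.12.2 that
\[ p_X(k) = \frac{e^{-(\lambda_1 + \lambda_2)}(\lambda_1 + \lambda_2)^{k}}{k!}, \quad k = 0, 1, 2, \ldots \]
Comment. The Poisson random variable reproduces itself in the sense that the sum of
independent Poissons is also a Poisson. A similar property holds for independent normal
random variables (see Question 3.12.19) and, under certain conditions, for independent
binomial random variables (recall 3.8.1).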
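The reproductive property of the Poisson can also be checked without moment-generating functions, by convolving the two pmfs directly. The sketch below is an illustration only: the parameter values 1.2 and 2.3 are arbitrary assumptions, and the comparison is made for $k = 0, 1, \ldots, 19$.

```python
import math

def poisson_pmf(k, lam):
    """p_X(k) = e^(-lam) lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def pmf_of_sum(k, lam1, lam2):
    """P(X1 + X2 = k) by direct convolution of two independent Poisson pmfs."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2)
               for j in range(k + 1))

lam1, lam2 = 1.2, 2.3  # illustrative parameter values

# Largest discrepancy between the convolution and a Poisson(lam1 + lam2) pmf.
max_gap = max(abs(pmf_of_sum(k, lam1, lam2) - poisson_pmf(k, lam1 + lam2))
              for k in range(20))
print(max_gap)
```

The printed gap should be at the level of floating-point rounding, consistent with the conclusion reached through Theorems 3.12.2 and 3.12.3.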
EXAMPLE 3.12.11
We saw in Example 3.12.4 that a normal random variable, $Y$, with mean $\mu$ and variance
$\sigma^2$ has pdf
\[ f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^{2} \right], \quad -\infty < y < \infty \]
and mgf
\[ M_Y(t) = e^{\mu t + \sigma^2 t^2/2} \]
By definition, a standard normal random variable is a normal random variable for which
$\mu = 0$ and $\sigma = 1$. Denoted $Z$, the pdf and mgf for a standard normal random variable
are $f_Z(z) = (1/\sqrt{2\pi})\,e^{-z^2/2}$, $-\infty < z < \infty$, and $M_Z(t) = e^{t^2/2}$, respectively.
Show that the ratio
\[ \frac{Y - \mu}{\sigma} \]
is a standard normal random variable, $Z$.
Write $\frac{Y - \mu}{\sigma}$ as $\frac{1}{\sigma}Y - \frac{\mu}{\sigma}$. By Part a of Theorem 3.12.3,
\[ M_{(Y - \mu)/\sigma}(t) = e^{-\mu t/\sigma} M_Y\!\left(\frac{t}{\sigma}\right) = e^{-\mu t/\sigma}\, e^{\mu t/\sigma + \sigma^2 (t/\sigma)^2/2} = e^{t^2/2} \]
But $M_Z(t) = e^{t^2/2}$, so it follows from Theorem 3.12.2 that the pdf for $\frac{Y - \mu}{\sigma}$ is the same
as $f_Z(z)$. (We call $\frac{Y - \mu}{\sigma}$ a Z transformation. Its importance will become evident in
Chapter 4.)
QUESTIONS
3.12.17. Use Theorem 3.12.3(a) and Question 3.12.8 to find the moment-generating function
of the random variable $Y$, where $f_Y(y) = \lambda^2 y e^{-\lambda y}$, $y \ge 0$.
3.12.18. Let $Y_1$, $Y_2$, and $Y_3$ be independent random variables, each having the pdf of
Question 3.12.17. Use Theorem 3.12.3(b) to find the moment-generating function of
$Y_1 + Y_2 + Y_3$. Compare your answer to the moment-generating function in
Question 3.12.14.
3.12.19. Use Theorems 3.12.2 and 3.12.3 to determine which of the following statements is
true:
(a) The sum of two independent Poisson random variables has a Poisson distribution.
(b) The sum of two independent exponential random variables has an exponential
distribution.
(c) The sum of two independent normal random variables has a normal distribution.
3.12.20. Calculate
::; 2) if Mx(t)
= (~ +
3
3.12.21. Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample of size $n$ from a normal distribution
with mean $\mu$ and standard deviation $\sigma$. Use moment-generating functions to deduce
the pdf of $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$.
3.12.2.2. Suppose the moment-generating function for a random variable W is given by
Calculate P(W :::: 1). Hint· Write W as a sum .
3.12.23. Suppose that $X$ is a Poisson random variable, where $p_X(k) = e^{-\lambda}\lambda^{k}/k!$, $k = 0, 1, \ldots$.
(a) Does the random variable $W = 3X$ have a Poisson distribution?
(b) Does the random variable $W = 3X + 1$ have a Poisson distribution?
3.12.24. Suppose that $Y$ is a normal variable, where
$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^{2} \right]$, $-\infty < y < \infty$.
(a) Does the random variable $W = 3Y$ have a normal distribution?
(b) Does the random variable $W = 3Y + 1$ have a normal distribution?
TAKING A SECOND LOOK AT STATISTICS (INTERPRETING MEANS)
One of the most important
out of
3 is the notion of the expected
value (or mean) of a random variable. Defined in Section 3.5 as a number that reflects
of a pdf,
expected value (Ji) was originally introduced for the benefit of
the
gamblers. It spoke
to one of
most fundamental questions-How much will
I win or Lose, on the average, if I playa certain game? (Actually, the real question they
probably had in
was "How
are you going to lose, on the average?") Despite
having had sucha selfiSh, materialistic. gambling-oriented raison d'etre, the expected value
of aU persuasions as a
was
embraced by (respectable)
and
preeminently useful
of a distribution. Today, it would not be an exaggeration
to claim that the majority of all statistical analyses focus on either (1) the expected value
the expected values of two or more random
a single random variable or (2)
variables.
In the lingo of applied statistics, there are actually two fundamentally different types
of "means": population means and sample means. The term "population mean" is
a synonym for what mathematical statisticians would call an expected value. That is,
a population mean ($\mu$) is a weighted average of the possible values associated with a
theoretical probability model, either $p_X(k)$ or $f_Y(y)$, depending on whether the underlying
random variable is discrete or continuous. A sample mean is the arithmetic average of
a set of measurements. If, for example, $n$ observations $y_1, y_2, \ldots, y_n$ are taken on a
continuous random variable $Y$, the sample mean is denoted $\bar{y}$, where
\[ \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \]
Conceptually, sample means are estimates of population means, where the "quality"
of the estimation is a function of (1) the sample size and (2) the standard deviation
($\sigma$) associated with the individual measurements. As the sample size gets larger
and/or the standard deviation gets smaller, the approximation will tend to get
better.
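A small simulation can make the last claim concrete. The sketch below is illustrative only: it assumes normally distributed measurements, a hypothetical population mean of 50, and arbitrary sample sizes and standard deviations. It estimates the typical distance between a sample mean and the population mean under three settings.

```python
import random
import statistics

def typical_error(n, mu, sigma, rng, reps=200):
    """Average |sample mean - mu| over `reps` simulated samples of size n."""
    errors = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)

rng = random.Random(2024)  # fixed seed so the run is repeatable
mu = 50.0                  # hypothetical population mean

err_n25_sd10 = typical_error(25, mu, 10.0, rng)    # baseline
err_n400_sd10 = typical_error(400, mu, 10.0, rng)  # larger sample size
err_n25_sd2 = typical_error(25, mu, 2.0, rng)      # smaller standard deviation

print(err_n25_sd10, err_n400_sd10, err_n25_sd2)
```

Under these assumptions, both the larger sample and the smaller standard deviation should give a noticeably smaller typical error than the baseline.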
means (either y or J-i.) is nOl always easy. To be sure, what they
in principle is dear enough-both v and J-i. are measuring the centers
their rPl:nP.,'I".rP
IV""".)"'':>. Still, many a wrong conclusion can be traced directly to researchers misthe value of a mean. Why? Because the distributions that y andlor J-i. are
may be dramatically different than the distributions we think
point arises in connection with SAT scores. Each Fall the
each the fifty states and the District of Columbia
are released by the
(ETS). With "accountability" being
one of the new
words associated with K-12 education, SAT scores
have become highly
At the
level, Democrats and Republicans each
in no small measure by
campaign on their own versions
scores on standardized exams,
at the state
legislatures often modify
education budgets in response to how well or how poorly their students performed
the year before. Does it make sense,
to use SAT
to characterize the
quality of a state's educa£ion
Absolutely not!
of this sort refer to very
them at
will
different distributions from srate to state. Any attempt to
necessarily be misleading.
One such state-by-state SAT comparison that
in
in Table
(128). Notice that Tennessee's entry is 1023,
is
Does it foHow that Tennessee's educational
is among the best in the
Probably not. Most independent assessments of K-12
rank Tennessee's
are
schools
the weakest in the nation, not among the best. If those
do Tennessee's students do so wen on the SAT?
The answer to thar question lies in the academic profiles of the students who take
in
college-bound students in that state apply exlusively to
the
schools in the South and the Midwest, where admissions are based on the ACT, not
the SAT. The SAT is primarily
by private
where admissions tend to be
more competitive. As a result, the students in Tennessee who take the SAT are not
representative of the entire population of
in that state. A disproportionate
number are exceptionally
those
the students who feel that
schools. The number 1023.
they have the ability to be competitive at
then, is the average of something (in this case, an
subset of all Tennessee students),
but it does not correspond to the center of the SAT distribution for all Tennessee
students.
we look beyond the
moral here is that analyzing data
obvious. What we learn in Chapter 3 about random variables and probability distributions
and
values is helpful only if we take the time to Jearn about the context and
Appendix l.A.1
MINITAB Applications
271
TABlE 3.13.1
State
AK
AL
AZ
AR
CA
CO
DE
DC
FL
GA
Average
Score
State
911
MT
1011
939
935
895
NH
NJ
924
893
1003
898
892
NM
NY
NC
849
ND
1056
879
OH
OK
OR
PA
RI
966
1019
927
879
SC
SD
TN
838
1031
1023
UT
1067
969
844
881
969
IL
IN
IA
NE
NV
1024
876
1080
1044
888
860
886
LA
ME
MD
MA
MI
MN
MS
MO
1011
883
908
1009
1057
1013
1017
VT
VA
WA
WV
WI
WY
899
893
922
921
1044
980
the idiosyncrasies of the phenomenon being studied. To do otherwise is likely to lead to
conclusions that are, at best, superficial and, at worst, incorrect.
APPENDIX 3.A.1 MINITAB APPLICATIONS
Numerous software packages are available for doing a variety of probability and statistical
calculations. Among the first to be developed and one that continues to be very popular
is MINITAB. Beginning here, we will include at the ends of certain chapters a short
discussion of MINITAB solutions to some of the problems that were discussed in that
chapter. What other software packages can do and the ways their outputs are formatted
are likely to be quite similar.
Contained in MINITAB are subroutines that can do some of the more important pdf
and cdf computations described in Sections 3.3 and 3.4. In the case of binomial random
variables, for example, the statements
MTB > pdf k;
SUBC > binomial n p.
and
MTB > cdf k;
SUBC > binomial n p.
will calculate $\binom{n}{k}p^{k}(1 - p)^{n-k}$ and $\sum_{r=0}^{k}\binom{n}{r}p^{r}(1 - p)^{n-r}$, respectively. Figure 3.A.1.1
shows the MINITAB program for doing the cdf calculation [$P(X \le 15)$] asked for in
Part a of Example 3.2.2.
The commands pdf k and cdf k can be run on many of the probability models most
likely to be encountered in real-world problems. Those on the list that we have already
seen are the binomial, Poisson, normal, uniform, and exponential distributions.
MTB > cdf 15;
SUBC > binomial 30 0.60.
Cumulative Distribution Function
Binomial with n = 30 and p = 0.600000
    x    P(X <= x)
15.00       0.1754
FIGURE 3.A.1.1
With discrete random variables, the cdf can be printed out in its entirety (that is, for
every integer) by deleting the argument k and using the command MTB > cdf;. Typical
is the output in Figure 3.A.1.2, corresponding to the cdf for a binomial random variable
with n = 4 and p = 1/6.
MTB > cdf;
SUBC > binomial 4 0.167.
Cumulative Distribution Function
Binomial with n = 4 and p = 0.167000
 x    P(X <= x)
 0       0.4815
 1       0.8676
 2       0.9837
 3       0.9992
 4       1.0000
FIGURE 3.A.1.2
Also available is an inverse cdf command, which in the case of a continuous random
variable $Y$ and a specified probability $p$ identifies the value $y$ having the property that
$P(Y \le y) = F_Y(y) = p$. For example, if $p = 0.60$ and $Y$ is an exponential random
variable with pdf $f_Y(y) = e^{-y}$, $y > 0$, the value $y = 0.9163$ has the property that
$P(Y \le 0.9163) = F_Y(0.9163) = 0.60$. That is,
\[ F_Y(0.9163) = \int_{0}^{0.9163} e^{-y}\, dy = 0.60 \]
With MINITAB the number 0.9163 is found by using the command MTB > invcdf 0.60
(see Figure 3.A.1.3).
MTB > invcdf 0.60;
SUBC> exponential 1.
Inverse Cumulative Distribution Function
Exponential with mean = 1.00000
P(X <= x)        x
   0.6000   0.9163
FIGURE 3.A.1.3
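For readers without MINITAB, the same three calculations can be sketched in Python using only the standard library. The function names below are hypothetical and do not correspond to any MINITAB command. The inverse cdf uses the closed form for the exponential, obtained by solving $1 - e^{-y} = p$ for $y$.

```python
import math

def binomial_pdf(k, n, p):
    """P(X = k) for a binomial random variable."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial random variable."""
    return sum(binomial_pdf(r, n, p) for r in range(k + 1))

def exponential_inverse_cdf(prob, rate=1.0):
    """Value y with P(Y <= y) = prob when f_Y(y) = rate * exp(-rate * y)."""
    return -math.log(1 - prob) / rate

# Counterparts of Figures 3.A.1.1, 3.A.1.2, and 3.A.1.3, respectively.
print(round(binomial_cdf(15, 30, 0.60), 4))
print([round(binomial_cdf(k, 4, 0.167), 4) for k in range(5)])
print(round(exponential_inverse_cdf(0.60), 4))
```

The printed values should agree with the figures to the four decimal places shown there.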
CHAPTER 4
Special Distributions
4.1 INTRODUCTION
4.2 THE POISSON DISTRIBUTION
4.3 THE NORMAL DISTRIBUTION
4.4 THE GEOMETRIC DISTRIBUTION
4.5 THE NEGATIVE BINOMIAL DISTRIBUTION
4.6 THE GAMMA DISTRIBUTION
4.7 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS)
APPENDIX 4.A.1 MINITAB APPLICATIONS
APPENDIX 4.A.2 A PROOF OF THE CENTRAL LIMIT THEOREM
L. A. J. Quetelet
Although he maintained lifelong literary and artistic interests, Quetelet's
mathematical talents led him to a doctorate from the University of Ghent
and from there to a college teaching position in Brussels. In 1833 he
was appointed astronomer at the Brussels Royal Observatory, after having
been largely responsible for its founding. His work with the Belgian census
marked the beginning of his pioneering efforts in what today would
be called mathematical sociology. Quetelet was well known throughout
Europe in scientific and literary circles: At the time of his death he was a
member of more than one hundred learned societies.
- Lambert Adolphe Jacques Quetelet (1796-1874)
4.1 INTRODUCTION
To "qualify" as a probability model, a function defined over a sample space $S$ needs to
satisfy only two criteria: (1) It must be nonnegative for all outcomes in $S$, and (2) it must
sum or integrate to one. That means, for example, that $f_Y(y) = \frac{y}{4} + \frac{7y^3}{2}$, $0 \le y \le 1$ can
be considered a pdf because $f_Y(y) \ge 0$ for all $0 \le y \le 1$ and
\[ \int_{0}^{1} \left( \frac{y}{4} + \frac{7y^3}{2} \right) dy = 1 \]
It certainly does not follow, though, that every $f_Y(y)$ and $p_X(k)$ that satisfy these two
criteria would actually be used as probability models. A pdf has practical significance only
if it does, indeed, model the probabilistic behavior of real-world phenomena. In point
of fact, only a handful of functions do [and $f_Y(y) = \frac{y}{4} + \frac{7y^3}{2}$, $0 \le y \le 1$ is not one of
them!].
Whether a probability function, say $f_Y(y)$, adequately models a given phenomenon
ultimately depends on whether the physical factors that influence the value of $Y$ parallel
the mathematical assumptions implicit in $f_Y(y)$. Surprisingly, many measurements (i.e.,
random variables) that seem to be very different are actually the consequence of the same
set of assumptions (and will, therefore, be modeled by the same pdf). That being so, it makes
sense to single out these "real-world" pdf's and investigate their properties in more detail.
This, of course, is not an idea we are seeing for the first time: recall the attention
given to the binomial and hypergeometric distributions in Section 3.2.
Chapter 4 continues in the spirit of Section 3.2 by examining five other widely used
models. Three of the five are discrete; the other two are continuous. One of the continuous
pdf's is the normal (or Gaussian) distribution, which, by far, is the most important of
all probability models. As we will see, the normal "curve" figures prominently in every
chapter from this point on.
Examples play a major role in Chapter 4. The only way to appreciate fully the
generality of a probability model is to look at some of its specific applications. Included in
this chapter are case studies ranging from the discovery of alpha-particle radiation to an
early ESP experiment to an analysis of pregnancy durations to counting bug parts in peanut
butter.
4.2 THE POISSON DISTRIBUTION
The binomial distribution problems that appeared in Section 3.2 all had relatively small
values for $n$, so evaluating $p_X(k) = P(X = k) = \binom{n}{k}p^{k}(1 - p)^{n-k}$ was not particularly
difficult. But suppose $n$ were 1000 and $k$, 500. Evaluating $p_X(500)$ would be a formidable
task for many handheld calculators, even today. Two hundred years ago, the prospect of
doing binomial calculations by hand was a catalyst for mathematicians to
develop some easy-to-use approximations. One of the first such approximations was the
Poisson limit, which eventually gave rise to the Poisson distribution. Both are described
in Section 4.2.
Simeon Denis Poisson (1781-1840) was an eminent French mathematician and physicist,
an academic administrator of some note, and, according to an 1826 letter from the
mathematician Abel to a friend, Poisson was a man who knew "how to behave with a great
deal of dignity." One of Poisson's many interests was the application of probability
to the law, and in 1837 he wrote Recherches sur la Probabilité de Jugements. Included in
the latter is a limit for $p_X(k) = \binom{n}{k}p^{k}(1 - p)^{n-k}$ that holds when $n$ approaches $\infty$, $p$
approaches 0, and $np$ remains constant. In practice, Poisson's limit is used to approximate
hard-to-calculate binomial probabilities where the values of $n$ and $p$ reflect the conditions
of the limit, that is, when $n$ is large and $p$ is small.
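The Poisson Limit
Deriving an asymptotic expression for the binomial probability model is a straightforward
exercise in calculus, given that $np$ is to remain fixed as $n$ increases.
Theorem 4.2.1. Suppose $X$ is a binomial random variable, where
\[ P(X = k) = p_X(k) = \binom{n}{k}p^{k}(1 - p)^{n-k}, \quad k = 0, 1, \ldots, n \]
If $n \to \infty$ and $p \to 0$ in such a way that $\lambda = np$ remains constant, then
\[ \lim_{\substack{n \to \infty \\ p \to 0 \\ np = \text{const.}}} P(X = k) = \lim_{\substack{n \to \infty \\ p \to 0 \\ np = \text{const.}}} \binom{n}{k}p^{k}(1 - p)^{n-k} = \frac{e^{-np}(np)^{k}}{k!} \]
Proof. We begin by rewriting the binomial probability in terms of $\lambda$:
\begin{align*}
\lim_{n \to \infty} \binom{n}{k}p^{k}(1 - p)^{n-k}
&= \lim_{n \to \infty} \binom{n}{k}\left(\frac{\lambda}{n}\right)^{k}\left(1 - \frac{\lambda}{n}\right)^{n-k} \\
&= \lim_{n \to \infty} \frac{n!}{k!(n - k)!}\,\lambda^{k}\left(\frac{1}{n^{k}}\right)\left(1 - \frac{\lambda}{n}\right)^{-k}\left(1 - \frac{\lambda}{n}\right)^{n} \\
&= \frac{\lambda^{k}}{k!}\lim_{n \to \infty} \frac{n!}{(n - k)!}\cdot\frac{1}{(n - \lambda)^{k}}\left(1 - \frac{\lambda}{n}\right)^{n}
\end{align*}
But since $[1 - (\lambda/n)]^{n} \to e^{-\lambda}$ as $n \to \infty$, we need only show that
\[ \frac{n!}{(n - k)!(n - \lambda)^{k}} \to 1 \]
to prove the theorem. However, note that
\[ \frac{n!}{(n - k)!(n - \lambda)^{k}} = \frac{n(n - 1)\cdots(n - k + 1)}{(n - \lambda)(n - \lambda)\cdots(n - \lambda)} \]
a quantity that, indeed, tends to 1 as $n \to \infty$ (since $\lambda$ remains constant).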
EXAMPLE 4.2.1
Theorem 4.2.1 is an asymptotic result. Left unanswered is the question of the relevance
of the Poisson limit for finite $n$ and $p$. That is, how large does $n$ have to be and how small
does $p$ have to be before $e^{-np}(np)^{k}/k!$ becomes a good approximation to the binomial
probability, $p_X(k)$?
Since "good approximation" is undefined, there is no way to answer that question in
any completely specific way. Tables 4.2.1 and 4.2.2, though, offer a partial solution by
comparing the closeness of the approximation for two particular sets of values for $n$ and $p$.
In both cases $\lambda = np$ is equal to one, but in the former, $n$ is set equal to five; in the
latter, to one hundred. We see in Table 4.2.1 ($n = 5$) that for some $k$ the agreement
between the binomial probability and Poisson's limit is not very good. If $n$ is as large as
one hundred, though (Table 4.2.2), the agreement is remarkably good for all $k$.
TABLE 4.2.1: Binomial Probabilities and Poisson Limits; n = 5 and p = 1/5 (λ = 1)
  k     C(5,k)(0.2)^k(0.8)^(5-k)     e^(-1)(1)^k/k!
  0            0.328                     0.368
  1            0.410                     0.368
  2            0.205                     0.184
  3            0.051                     0.061
  4            0.006                     0.015
  5            0.000                     0.003
  6+           0                         0.001
 Total         1.000                     1.000
TABLE 4.2.2: Binomial Probabilities and Poisson Limits; n = 100 and p = 1/100 (λ = 1)
  k     C(100,k)(0.01)^k(0.99)^(100-k)     e^(-1)(1)^k/k!
  0            0.366032                      0.367879
  1            0.369730                      0.367879
  2            0.184865                      0.183940
  3            0.060999                      0.061313
  4            0.014942                      0.015328
  5            0.002898                      0.003066
  6            0.000463                      0.000511
  7            0.000063                      0.000073
  8            0.000007                      0.000009
  9            0.000001                      0.000001
 10            0.000000                      0.000000
 Total         1.000000                      0.999999
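The entries in Tables 4.2.1 and 4.2.2 can be regenerated with a few lines of code. The following is a minimal sketch; the function names are assumptions introduced for illustration. It prints the binomial probability and the Poisson limit side by side for both choices of $n$ and $p$, and also reports the largest gap between the two columns in each case.

```python
import math

def binomial_pmf(k, n, p):
    """Exact binomial probability P(X = k); zero when k > n."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k) if k <= n else 0.0

def poisson_pmf(k, lam):
    """Poisson limit e^(-lam) lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def comparison_rows(n, p, k_max):
    """Rows (k, binomial probability, Poisson limit) for k = 0, ..., k_max."""
    lam = n * p
    return [(k, binomial_pmf(k, n, p), poisson_pmf(k, lam))
            for k in range(k_max + 1)]

for n, p, k_max, digits in ((5, 0.2, 5, 3), (100, 0.01, 10, 6)):
    rows = comparison_rows(n, p, k_max)
    print(f"n = {n}, p = {p}")
    for k, b, q in rows:
        print(f"{k:3d}  {b:.{digits}f}  {q:.{digits}f}")
    print("largest gap:", max(abs(b - q) for _, b, q in rows))
```

The largest gap shrinks considerably when $n$ goes from 5 to 100 with $\lambda$ held at one, which is the pattern the two tables are meant to show.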
EXAMPLE 4.2.2
Shadyrest Hospital draws its patients from a rural area that has twelve thousand elderly
residents. The probability that any one of the twelve thousand will have a heart attack
on any given day and will need to be connected to a special cardiac monitoring machine
has been estimated to be one in eight thousand. Currently, the hospital has three such
machines. What is the probability that equipment will be inadequate to meet tomorrow's
emergencies?
Let $X$ denote the number of residents who will need the cardiac machine tomorrow.
Note that $X$ is a binomial random variable based on a large $n$ (= 12,000) and a small
$p$ (= 1/8000). As such, Poisson's limit can be used to approximate $p_X(k)$ for any $k$. In
particular,
\begin{align*}
P(\text{Shadyrest's cardiac facilities are inadequate}) &= P(X > 3) \\
&= 1 - P(X \le 3) \\
&= 1 - \sum_{k=0}^{3} \binom{12{,}000}{k}\left(\frac{1}{8000}\right)^{k}\left(\frac{7999}{8000}\right)^{12{,}000-k} \\
&\doteq 1 - \sum_{k=0}^{3} \frac{e^{-1.5}(1.5)^{k}}{k!} \\
&= 0.0656
\end{align*}
where $\lambda = np = 12{,}000\left(\frac{1}{8000}\right) = 1.5$. On the average, then, Shadyrest will not be able
to meet all the cardiac needs of its clientele once every fifteen or sixteen days. (Based
on the binomial and Poisson limit comparisons shown in Tables 4.2.1 and 4.2.2, we would expect the
approximation here to be excellent: $n$ (= 12,000) is much larger and $p$ (= 1/8000) is much
smaller than their counterparts in Table 4.2.2, so the conditions of Theorem 4.2.1 are
more nearly satisfied.)
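The arithmetic in this example is easy to reproduce. The sketch below is an illustration, not the authors' computation: it evaluates the Poisson approximation to $P(X > 3)$, the corresponding exact binomial tail, and the implied average number of days between shortfalls (the reciprocal of the daily probability).

```python
import math

n, p = 12_000, 1 / 8_000
lam = n * p  # 1.5

# Poisson approximation to P(X > 3).
poisson_tail = 1 - sum(math.exp(-lam) * lam ** k / math.factorial(k)
                       for k in range(4))

# Exact binomial P(X > 3), for comparison.
binomial_tail = 1 - sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
                        for k in range(4))

# Average number of days between days on which three machines are not enough.
days_between_shortfalls = 1 / poisson_tail

print(poisson_tail, binomial_tail, days_between_shortfalls)
```

The two tail probabilities should be nearly indistinguishable, and the reciprocal should fall between fifteen and sixteen days.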
CASE STUDY 4.2.1
Leukemia is a rare form of cancer whose cause and mode of transmission remain
largely unknown. While evidence abounds that excessive exposure to radiation can
increase a person's risk of contracting the disease, it is at the same time true that
most cases occur among persons whose history contains no such overexposure. A
related issue, one maybe even more basic than the causality question, concerns the
spread of the disease. It is safe to say that the prevailing medical opinion is that most
forms of leukemia are not contagious. Still, the hypothesis persists that some forms
of the disease, particularly the childhood variety, may be. What continues to fuel this
speculation are the discoveries of so-called "leukemia clusters," aggregations in time
and space of unusually large numbers of cases.
To date, one of the most frequently cited leukemia clusters in the medical literature
occurred during the late 1950s and early 1960s in Niles, Illinois, a suburb of
Chicago (74). In the 5⅓-year period from 1956 to the first four months of 1961,
physicians in Niles reported a total of eight cases of leukemia among children less
than fifteen years of age. The number at risk (that is, the number of residents in that
age range) was 7076. To assess the likelihood of that many cases occurring in such a
small population, it is necessary to look first at the leukemia incidence in neighboring
towns. For all of Cook county, excluding Niles, there were 1,152,695 children less than
15 years of age, and among those, 286 diagnosed cases of leukemia. That gives an
average 5⅓-year leukemia rate of 24.8 cases per 100,000:
\[ \frac{286 \text{ cases for } 5\tfrac{1}{3} \text{ years}}{1{,}152{,}695 \text{ children}} \times 100{,}000 = 24.8 \text{ cases}/100{,}000 \text{ children in } 5\tfrac{1}{3} \text{ years} \]
Now, imagine the 7076 children in Niles to be a series of $n = 7076$ (independent)
Bernoulli trials, each having a probability $p = 24.8/100{,}000 = 0.000248$ of contracting
leukemia. The question then becomes, given an $n$ of 7076 and a $p$ of 0.000248,
how likely is it that eight "successes" would occur? (The expected number, of course,
would be $7076 \times 0.000248 = 1.75$.) Actually, for reasons that will be elaborated on
in Chapter 6, it will prove more meaningful to consider the related event, eight or
more cases occurring in a 5⅓-year span. If the probability associated with the latter is
very small, it could be argued that leukemia did not occur randomly in Niles and that,
perhaps, contagion was a factor.
Using the binomial distribution, we can express the probability of eight or more
cases as
\[ P(8 \text{ or more cases}) = \sum_{k=8}^{7076} \binom{7076}{k}(0.000248)^{k}(0.999752)^{7076-k} \tag{4.2.1} \]
Much of the computational unpleasantness implicit in Equation 4.2.1 can be avoided
by appealing to Theorem 4.2.1. Given that $np = 7076 \times 0.000248 = 1.75$,
\begin{align*}
P(X \ge 8) &= 1 - P(X \le 7) \\
&\doteq 1 - \sum_{k=0}^{7} \frac{e^{-1.75}(1.75)^{k}}{k!} \\
&= 1 - 0.99951 \\
&= 0.00049
\end{align*}
How close can we expect 0.00049 to be to the "true" binomial sum? Very close.
Considering the accuracy of the Poisson limit when n is as small as one hundred (recall
Table 4.2.2), we should feel very confident here, where n is 7076.
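The claim that the Poisson approximation is very close to the "true" binomial sum can be tested directly, since the exact sum is no longer hard to evaluate. The sketch below is a rough check under stated assumptions: it uses $\lambda = 1.75$, the rounded value of $np$ quoted above, and the helper names are hypothetical. The computed tails are on the order of 0.0005; the last digit can differ slightly from the figure quoted in the text depending on how $np$ is rounded.

```python
import math

n, p = 7076, 0.000248
lam = 1.75  # rounded value of n * p used in the case study

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson random variable with parameter lam."""
    return sum(math.exp(-lam) * lam ** j / math.factorial(j)
               for j in range(k + 1))

def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial random variable with parameters n and p."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k + 1))

poisson_tail = 1 - poisson_cdf(7, lam)     # Poisson approximation to P(X >= 8)
binomial_tail = 1 - binomial_cdf(7, n, p)  # exact binomial P(X >= 8)

print(poisson_tail, binomial_tail, abs(poisson_tail - binomial_tail))
```

Either way the probability of eight or more cases is roughly one in two thousand, so the interpretation that follows does not hinge on which of the two calculations is used.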
Interpreting the 0.00049 probability is not nearly as easy as assessing its accuracy.
The fact that the probability is so very small tends to denigrate the hypothesis that
leukemia in Niles occurred at random. On the other hand, rare events, such as clusters,
do happen by chance. The basic difficulty in putting the probability associated with
a given cluster in any meaningful perspective is not knowing in how many similar
communities leukemia did not exhibit a tendency to cluster. That there is no obvious
way to do this is one reason the leukemia controversy is still with us.
QUESTIONS
4.2.1. If a typist averages one misspelling in every 3250 words, what are the chances that a
6000-word report is free of all such errors? Answer the question two ways: first, by
using an exact binomial analysis, and second, by using a Poisson approximation. Does
the similarity (or dissimilarity) of the two answers surprise you? Explain.
4.2.2. A medical study recently documented that 905 mistakes were made among the 289,411
prescriptions written during one year at a large metropolitan teaching hospital. Suppose a
patient is admitted with a condition serious enough to warrant 10 different prescriptions.
Approximate the probability that at least one will contain an error.
4.2.3. Five hundred people are attending the first annual "I was Hit by Lightning" Club.
Approximate the probability that at most one of the 500 was born on Poisson's birthday.
4.2.4. A chromosome mutation linked with colorblindness is known to occur, on the average,
once in every 10,000 births.
(a) Approximate the probability that exactly 3 of the next 20,000 babies born will have
the mutation.
(b) How many babies out of the next 20,000 would have to be born with the mutation
to convince you that the "1 in 10,000" estimate is too low? Hint: Calculate
$P(X \ge k) = 1 - P(X \le k - 1)$ for various $k$. (Recall Case Study 4.2.1.)
4.2.5. Suppose that 1% of all items in a supermarket are not priced properly. A customer buys
10 items. What is the probability that she will be delayed by the cashier because one
or more of her items requires a price check? Calculate both a binomial answer and a
Poisson answer. Is the binomial model "exact" in this case? Explain.
4.2.6. A newly formed life insurance company has underwritten term policies on 120 women
between the ages of 40 and 44. Suppose that each woman has a 1/150 probability of
dying during the next calendar year, and each death requires the company to pay out
$50,000 in benefits. Approximate the probability that the company will have to pay at
least $150,000 in benefits next year.
4.2.7. According to an airline industry report (187), roughly 1 piece of luggage out of every 200
that are checked is lost. Suppose that a frequent-flying businesswoman will be checking
120 bags over the course of the next year. Approximate the probability that she will lose
2 or more pieces of luggage.
4.2.8. Electromagnetic fields generated by power transmission lines are suspected by some
researchers to be a cause of cancer. Especially at risk would be telephone linemen
because of their frequent proximity to high-voltage wires. According to one study, two
cases of a rare form of cancer were detected among a group of 9500 linemen (181).
In the general population, the incidence of that particular condition is on the order of
one in a million. What would you conclude? Hint: Recall the approach taken in Case
Study 4.2.1.
4.2.9. Astronomers estimate that as many as 100 billion stars in the Milky Way galaxy are
encircled by planets. If so, we may have a plethora of cosmic neighbors. Let $p$ denote
the probability that any such solar system contains intelligent life. How small can $p$ be
and still give a 50-50 chance that we are not alone?
The Poisson Distribution
The real significance of Poisson's limit theorem went unrecognized for more than fifty
years. For most of the latter part of the nineteenth century, Theorem 4.2.1 was taken
strictly at face value: It provided a convenient approximation for $p_X(k)$ when $X$ is
binomial, $n$ is large, and $p$ is small. But then in 1898 a German professor, Ladislaus
von Bortkiewicz, published a monograph entitled Das Gesetz der Kleinen Zahlen (The
Law of Small Numbers) that would quickly transform Poisson's "limit" into Poisson's
"distribution."
What is best remembered about Bortkiewicz's monograph is the curious set of data
described in Question 4.2.10. The measurements recorded were the numbers of Prussian
cavalry soldiers who were kicked to death by their horses. In analyzing those figures,
Bortkiewicz was able to show that the function $e^{-\lambda}\lambda^{k}/k!$ is a useful
probability model in its own right, even when (1) no explicit binomial random
variable is present and (2) values for $n$ and $p$ are unavailable. Other researchers were
quick to follow Bortkiewicz's lead, and a steady stream of Poisson applications began
showing up in technical journals. Today the function $p_X(k) = e^{-\lambda}\lambda^{k}/k!$ is universally
recognized as being among the three or four most important data models in all of statistics.
Theorem 4.2.2. The random variable $X$ is said to have a Poisson distribution if
\[ p_X(k) = P(X = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \quad k = 0, 1, 2, \ldots \]
where $\lambda$ is a positive constant. Also, for any Poisson random variable, $E(X) = \lambda$ and
$\operatorname{Var}(X) = \lambda$.
Proof. To show that $p_X(k)$ qualifies as a probability function, note, first of all, that
$p_X(k) \ge 0$ for all nonnegative integers $k$. Also, $p_X(k)$ sums to one:
\[ \sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^{k}}{k!} = e^{-\lambda}\sum_{k=0}^{\infty} \frac{\lambda^{k}}{k!} = e^{-\lambda} \cdot e^{\lambda} = 1 \]
since $\sum_{k=0}^{\infty} \lambda^{k}/k!$ is the Taylor series expansion of $e^{\lambda}$. Verifying that $E(X) = \lambda$ and
$\operatorname{Var}(X) = \lambda$ has already been done in Example 3.12.9, using moment-generating
functions.
Fitting the Poisson Distribution to Data
Poisson data invariably refer to the numbers of times a certain event occurs during each of
a series of "units" (often time or space). For example, $X$ might be the number of
accidents reported at a given intersection during each such unit. If records are kept over
an extended period, the resulting data would be the sample $k_1, k_2, \ldots, k_n$,
where each $k_i$ is a nonnegative integer.
Whether or not a set of $k_i$'s can be viewed as Poisson data depends on whether the
proportions of 0s, 1s, 2s, and so on in the sample are numerically similar to the probabilities
that $X = 0, 1, 2$, and so on, as predicted by $p_X(k) = e^{-\lambda}\lambda^{k}/k!$. The next two case studies
show data sets where the variability in the observed $k_i$'s is consistent with the probabilities
predicted by the Poisson distribution. Notice in each case that the $\lambda$ in $p_X(k)$ is replaced
by the sample mean of the $k_i$'s, that is, by $\bar{k} = \frac{1}{n}\sum_{i=1}^{n} k_i$. The reason for that
substitution will be taken up in Chapter 5.
CASE STUDY 4.2.2
Among the early research projects investigating the nature of radiation was a 1910
study of α-particle emission by Ernest Rutherford and Hans Geiger (156). For each
of 2608 eighth-minute intervals, the two physicists recorded the number of α-particles
emitted from a polonium source (as detected by what would eventually be called a
Geiger counter). The numbers and proportions of times that $k$ such particles were
detected in a given eighth-minute ($k = 0, 1, 2, \ldots$) are detailed in the first three
columns of Table 4.2.3. Two α particles, for example, were detected in each of
383 eighth-minute intervals, meaning that $X = 2$ was the observation recorded 15%
(= 383/2608 × 100) of the time.
TABLE 4.2.3
 No. Detected, k   Frequency   Proportion   p_X(k) = e^(-3.87)(3.87)^k/k!
        0              57         0.02          0.02
        1             203         0.08          0.08
        2             383         0.15          0.16
        3             525         0.20          0.20
        4             532         0.20          0.19
        5             408         0.16          0.15
        6             273         0.10          0.10
        7             139         0.05          0.05
        8              45         0.02          0.03
        9              27         0.01          0.01
       10              10         0.00          0.00
       11+              6         0.00          0.00
      Total          2608         1.0           1.0
To see whether a probability function of the form $p_X(k) = e^{-\lambda}\lambda^{k}/k!$ can adequately
model the observed proportions in the third column, we first need to replace $\lambda$ with the
sample's average value for $X$. Suppose the six observations comprising the "11+"
category are each assigned the value eleven. Then
\[ \bar{k} = \frac{57(0) + 203(1) + 383(2) + \cdots + 6(11)}{2608} = \frac{10{,}092}{2608} = 3.87 \]
and the presumed model is $p_X(k) = e^{-3.87}(3.87)^{k}/k!$, $k = 0, 1, 2, \ldots$. Notice how
closely the entries in the fourth column [i.e., $p_X(0), p_X(1), p_X(2), \ldots$] agree with
the sample proportions appearing in the third column. The conclusion is
inescapable: The phenomenon of radiation can be modeled very effectively by the
Poisson distribution.
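The fitting procedure in this case study (estimate $\lambda$ by the sample mean, then compare sample proportions with Poisson probabilities) can be sketched as follows. Two assumptions are worth flagging: the frequencies for $k = 0$ and $k = 6$ (57 and 273) are inferred here from the stated totals of 2608 intervals and 10,092 particles, and the last fitted entry is taken to be the probability of eleven or more particles.

```python
import math

# Frequencies for k = 0, 1, ..., 10 and the pooled "11+" category.
# The entries 57 (k = 0) and 273 (k = 6) are inferred from the stated totals.
frequencies = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 6]
total = sum(frequencies)

# Each "11+" observation is treated as an 11, as in the case study.
lam = sum(k * f for k, f in enumerate(frequencies)) / total

def poisson_pmf(k, lam):
    """p_X(k) = e^(-lam) lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

observed = [f / total for f in frequencies]
fitted = [poisson_pmf(k, lam) for k in range(11)]
fitted.append(1 - sum(fitted))  # probability of 11 or more

for k, (obs, fit) in enumerate(zip(observed, fitted)):
    label = "11+" if k == 11 else str(k)
    print(f"{label:>3}  {obs:.2f}  {fit:.2f}")
```

Row by row, the observed and fitted columns should differ by no more than about one unit in the second decimal place.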
CASE STUDY 4.2.3
Table 4.2.4 gives the numbers of fumbles made by 110 Division IA football teams
during a recent weekend's slate of fifty-five games (107). Do these data support the
contention that the number of fumbles, $X$, that a team makes during a game is a
Poisson random variable?
TABLE 4.2.4
2  5  1  0  2  1  4  3  1  1
1  2  2  0  1  4  1  2  3  2
2  1  3  3  4  2  0  1  4  6
2  3  2  0  5  1  5  3  4  2
3  2  4  1  4  2  4  1  4  1
1  5  1  2  3  1  3  4  2  1
3  2  2  2  4  4  3  1  4  2
0  2  0  3  5  6  0  3  6  3
7  4  2  5  1  4  1  3  4  3
5  2  2  1  1  2  5  5  2  3
3  1  2  4  1  2  5  3  3  0
The first step in summarizing these data is to tally the frequencies and calculate the sample proportions associated with each value of X (see Columns 1 through 3 of Table 4.2.5). Notice, also, that the average number of fumbles per team is 2.55:
$$\bar{k} = \frac{8(0) + 24(1) + 27(2) + \cdots + 1(7)}{110} = 2.55$$
Substituting 2.55 for $\lambda$, then, gives $p_X(k) = e^{-2.55}(2.55)^k/k!$ as the particular Poisson model most likely to fit the data.
TABLE 4.2.5
No. of Fumbles, k   Frequency   Proportion   $p_X(k) = e^{-2.55}(2.55)^k/k!$
0                      8          0.07         0.08
1                     24          0.22         0.20
2                     27          0.25         0.25
3                     20          0.18         0.22
4                     17          0.16         0.14
5                     10          0.09         0.07
6                      3          0.03         0.03
7                      1          0.01         0.01
Total                110          1.0          1.0
The last column of Table 4.2.5 shows $p_X(k)$ evaluated for each of the eight values listed for k: $p_X(0) = e^{-2.55}(2.55)^0/0! = 0.08$, and so on. Once again, the row-by-row agreement is quite strong. There appears to be nothing in these data that would refute the presumption that the number of fumbles a team makes is a Poisson random variable.
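The fitting procedure used in this case study can be sketched in a few lines of Python. This is only an illustration: the function name `poisson_pmf` and the dictionary `freq` are choices made here, and the frequencies are the ones tallied in Table 4.2.5.

```python
import math

# Frequencies from Table 4.2.5: number of teams (out of 110) making k fumbles.
freq = {0: 8, 1: 24, 2: 27, 3: 20, 4: 17, 5: 10, 6: 3, 7: 1}

def poisson_pmf(k, lam):
    """p_X(k) = e^(-lam) * lam^k / k!"""
    return math.exp(-lam) * lam**k / math.factorial(k)

n_teams = sum(freq.values())
# The sample mean serves as the estimate of lambda.
sample_mean = sum(k * f for k, f in freq.items()) / n_teams

# Compare each sample proportion with the fitted Poisson probability.
rows = [(k, f / n_teams, poisson_pmf(k, sample_mean)) for k, f in freq.items()]
for k, proportion, fitted in rows:
    print(f"{k}  {proportion:.2f}  {fitted:.2f}")
```

Printing the two columns side by side makes the row-by-row comparison described above easy to repeat for other data sets.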
The Poisson Model: The Law of Small Numbers
That the expression $e^{-\lambda}\lambda^k/k!$ models phenomena as diverse as $\alpha$-radiation and football fumbles raises an obvious question: Why is that same $p_X(k)$ describing such different random variables? The answer, of course, is that the underlying physical conditions that produce those two sets of measurements are actually much the same, despite how superficially different the resulting data may seem to be. Both phenomena are examples of a set of mathematical assumptions known as the Poisson model. Any measurements that are derived from conditions that mirror those assumptions will necessarily vary in accordance with the Poisson distribution.
Consider, for example, the number of fumbles that a football team makes during the course of a game. Imagine dividing a time interval of length T into n nonoverlapping subintervals, each of length T/n, where n is large (see Figure 4.2.1). Suppose that
FIGURE 4.2.1 [the interval from 0 to T divided into n subintervals, each of length T/n]
1. The probability that two or more fumbles occur in any given subinterval is essentially 0.
2. Fumbles are independent events.
3. The probability that a fumble occurs during a given subinterval is constant over the entire interval from 0 to T.
The n subintervals, then, are analogous to the n independent trials that form the backdrop for the "binomial model": In each subinterval there will be either zero fumbles or one fumble, where
$$p_n = P(\text{fumble occurs in a given subinterval})$$
remains constant from subinterval to subinterval.
Let the random variable X denote the total number of fumbles a team makes during time T, and let $\lambda$ denote the rate at which a team fumbles (e.g., $\lambda$ might be expressed as 0.10 fumbles per minute). Then
$$E(X) = \lambda T = np_n \quad \text{(why?)}$$
which implies that $p_n = \lambda T/n$. From Theorem 4.2.1, then,
$$p_X(k) = P(X = k) = \binom{n}{k}\left(\frac{\lambda T}{n}\right)^k\left(1 - \frac{\lambda T}{n}\right)^{n-k} \doteq \frac{e^{-\lambda T}(\lambda T)^k}{k!} \qquad (4.2.2)$$
So, if a team fumbles at the rate of, say, 0.10 times per minute and they have the ball for 30 minutes during a game, $\lambda T = (0.1)(30) = 3.0$, and the probability that they fumble exactly k times is approximated by the pdf $p_X(k) = e^{-3.0}(3.0)^k/k!$, k = 0, 1, 2, ....
Now we can see more clearly why Poisson's "limit," as given in Theorem 4.2.1, is so important. The three Poisson model assumptions listed above for football fumbles are so unexceptional that they apply to countless real-world phenomena. Each time they do, the pdf $p_X(k) = e^{-\lambda T}(\lambda T)^k/k!$ finds another application.
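A short numerical sketch may help make Equation 4.2.2 concrete. It evaluates the binomial probability with $p_n = \lambda T/n$ for increasingly fine subdivisions and compares it with the Poisson value. The rate and time are the ones used in the text; the choice k = 2 and the particular values of n are assumptions made only for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """e^(-mu) * mu^k / k!"""
    return math.exp(-mu) * mu**k / math.factorial(k)

lam, T = 0.10, 30        # fumbles per minute and minutes of possession (from the text)
k = 2                    # illustrative number of fumbles (an assumption)
target = poisson_pmf(k, lam * T)

# The gap between the binomial and Poisson values should shrink as n grows.
errors = []
for n in (30, 300, 3000, 30000):
    approx = binomial_pmf(k, n, lam * T / n)
    errors.append(abs(approx - target))
    print(n, round(approx, 5), round(target, 5))
```

Each tenfold increase in n cuts the discrepancy by roughly a factor of ten, which is the sense in which the binomial model with many short subintervals "becomes" the Poisson model.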
Calculating Poisson Probabilities
In practice, calculating Poisson probabilities is an exercise in choosing T so that $\lambda T$ represents the expected number of occurrences in whatever "unit" is associated with the random variable X. They look different, but the pdf's $p_X(k) = e^{-\lambda}\lambda^k/k!$ and $p_X(k) = e^{-\lambda T}(\lambda T)^k/k!$ are exactly the same and will give identical values once $\lambda$ and T are properly defined.
EXAMPLE 4.2.3
Suppose that typographical errors are made at the rate of 0.4 per page in State Tech's campus newspaper. If next Tuesday's edition is sixteen pages long, what is the probability that fewer than three typos will appear?
We start by defining X to be the number of errors that will appear in sixteen pages. The assumptions of independence and constant probability are not unreasonable in this setting, so X is likely to be a Poisson random variable. To answer the question using the formula in Theorem 4.2.2, we need to set $\lambda$ equal to E(X). But if the error rate is 0.4 errors/page, the expected number of typos in sixteen pages will be 6.4:
$$0.4\ \frac{\text{errors}}{\text{page}} \times 16\ \text{pages} = 6.4\ \text{errors}$$
It follows, then, that
$$P(X < 3) = P(X \le 2) = \sum_{k=0}^{2} \frac{e^{-6.4}(6.4)^k}{k!} = \frac{e^{-6.4}(6.4)^0}{0!} + \frac{e^{-6.4}(6.4)^1}{1!} + \frac{e^{-6.4}(6.4)^2}{2!} = 0.046$$
If Equation 4.2.2 is used, we would define
$$\lambda = 0.4\ \text{errors/page} \quad \text{and} \quad T = 16\ \text{pages}$$
Then $\lambda T = E(X) = 6.4$ and P(X < 3) would be $\sum_{k=0}^{2} e^{-6.4}(6.4)^k/k!$, the same numerical value found from Theorem 4.2.2.
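The arithmetic in Example 4.2.3 can be checked with a minimal Python sketch. The helper name `poisson_cdf` and the variable names are choices made here, not notation from the text.

```python
import math

def poisson_cdf(k_max, mu):
    """P(X <= k_max) for a Poisson random variable with mean mu."""
    return sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(k_max + 1))

rate_per_page = 0.4              # lambda, in errors per page
pages = 16                       # T
mu = rate_per_page * pages       # lambda * T = E(X) = 6.4

# "Fewer than three" typos means X <= 2.
p_fewer_than_three = poisson_cdf(2, mu)
print(round(p_fewer_than_three, 3))
```

Only the product `rate_per_page * pages` enters the calculation, which is why the two forms of the pdf give the same answer.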
EXAMPLE 4.2.4
Entomologists estimate that an average person consumes almost a pound of bug parts each year (180). There are that many insect eggs, larvae, and miscellaneous body pieces in the foods we eat and the liquids we drink. The Food and Drug Administration (FDA) sets a Food Defect Action Level (FDAL) for each product: Bug-part concentrations below the FDAL are considered acceptable. The legal limit for peanut butter, for example, is thirty insect fragments per hundred grams. Suppose the crackers you just bought are spread with twenty grams of peanut butter. What are the chances that snack will include at least five crunchy critters?
Let X denote the number of bug parts in twenty grams of peanut butter. Assuming the worst, we will set the contamination level equal to the FDA limit, that is, thirty fragments per hundred grams (or 0.30 fragments/g). Notice that T = 20 g, which makes E(X) = 6.0:
$$0.30\ \frac{\text{fragments}}{\text{g}} \times 20\ \text{g} = 6.0\ \text{fragments}$$
It follows, then, that the probability that your snack contains five or more bug parts is a disgusting 0.71:
$$P(X \ge 5) = 1 - P(X \le 4) = 1 - \sum_{k=0}^{4} \frac{e^{-6.0}(6.0)^k}{k!} = 1 - 0.2851 = 0.71$$
QUESTIONS
4.2.10. During the latter part of the nineteenth century, Prussian officials gathered information relating to the hazards that horses posed to cavalry soldiers. A total of 10 cavalry corps were monitored over a period of 20 years. Recorded for each year and each corps was X, the annual number of fatalities due to kicks. Summarized in the following table are the 200 values recorded for X (14). Show that these data can be modeled by a Poisson pdf. Follow the procedure illustrated in Case Studies 4.2.2 and 4.2.3.
No. of Deaths, k   Observed Number of Corps-Years in Which k Fatalities Occurred
0                  109
1                   65
2                   22
3                    3
4                    1
Total              200
4.2.11. A random sample of 356 seniors enrolled at the University of West Florida was categorized according to X, the number of times they had changed majors (114). Based on the summary of that information shown in the following table, would you conclude that X can be treated as a Poisson random variable?
Number of Major Changes   Frequency
0                         237
1                          90
2                          22
3                           7
4.2.12. Midwestern Skies books 10 commuter flights each week. Passenger totals are much the same from week to week, as are the numbers of pieces of luggage that are checked. Listed in the following table are the numbers of bags that were lost during each of the first 40 weeks in 2004. Do these figures support the presumption that the number of bags lost by Midwestern Skies during a typical week is a Poisson random variable?
Week   Bags Lost     Week   Bags Lost     Week   Bags Lost
 1        1           14       2           27       1
 2        0           15       1           28       2
 3        0           16       3           29       0
 4        3           17       0           30       0
 5        4           18       2           31       1
 6        1           19       5           32       3
 7        0           20       2           33       1
 8        2           21       1           34       0
 9        0           22       1           35       2
10        2           23       1           36       1
11        3           24       2           37       4
12        1           25       1           38       2
13        2           26       3           39       1
                                           40       0
4.2.13. In 1893, New Zealand became the first country to permit women to vote. Scattered over the ensuing 113 years, various countries joined this movement to extend this right to women. The table below (127) shows how many countries took this step in a given year. Do these data seem to follow a Poisson distribution?
Yearly Number of Countries Granting Women the Vote   Frequency
0                                                     82
1                                                     25
2                                                      4
3                                                      0
4                                                      2
4.2.14. The following are the daily numbers of death notices for women over the age of 80 that appeared in the London Times over a three-year period (73).
Number of Deaths   Observed Frequency
0                   162
1                   267
2                   271
3                   185
4                   111
5                    61
6                    27
7                     8
8                     3
9                     1
Total              1096
(a) Does the Poisson pdf provide a good description of the variability pattern evident in these data?
(b) If your answer to Part (a) is "no," which of the Poisson model assumptions do you think might not be holding?
4.2.15. A certain species of European mite is capable of damaging the bark on orange trees. The following are the results of inspections done on 100 saplings chosen at random from a large orchard. The measurement recorded, X, is the number of mite infestations found on the trunk of each tree. Is it reasonable to assume that X is a Poisson random variable? If not, which of the Poisson model assumptions is likely not to be true?
No. of Infestations, k   No. of Trees
0                         55
1                         20
2                         21
3                          1
4                          1
5                          1
6                          0
7                          1
4.2.16. A tool and die press that stamps out cams used in small gasoline engines tends to break down once every five hours. The machine can be repaired and put back on line quickly, but each such incident costs $50. What is the probability that maintenance expenses for the press will be no more than $100 on a typical eight-hour workday?
4.2.17. In a new fiber optic communication system, transmission errors occur at the rate of 1.5 per 10 seconds. What is the probability that more than two errors will occur during the next half-minute?
4.2.18. Assume that the number of hits, X, that a baseball team makes in a nine-inning game has a Poisson distribution. If the probability that a team makes zero hits is what are their chances of getting two or more hits?
4.2.19. Flaws in metal sheeting produced by a high-temperature roller occur at the rate of one per 10 square feet. What is the probability that three or more flaws will appear in a 5-by-8-foot panel?
4.2.20. Suppose a radioactive source is metered for two hours, during which time the total number of alpha particles counted is 482. What is the probability that exactly three particles will be counted in the next two minutes? Answer the question two ways: first, by defining X to be the number of particles counted in two minutes, and second, by defining X to be the number of particles counted in one minute.
4.2.21. Suppose that on-the-job injuries in a textile mill occur at the rate of 0.1 per day.
(a) What is the probability that two accidents will occur during the next (five-day) work week?
(b) Is the probability that four accidents will occur over the next two work weeks the square of your answer to Part (a)? Explain.
4.2.22. Find P(X = 4) if the random variable X has a Poisson distribution such that P(X = 1) = P(X = 2).
4.2.23. Let X be a Poisson random variable with parameter $\lambda$. Show that the probability that X is even is $\frac{1}{2}(1 + e^{-2\lambda})$.
4.2.24. Let X and Y be independent Poisson random variables with parameters $\lambda$ and $\mu$, respectively. Example 3.12.10 established that X + Y is also Poisson with parameter $\lambda + \mu$. Prove that same result using Theorem 3.8.1.
4.2.25. If $X_1$ is a Poisson random variable for which $E(X_1) = \lambda$ and if the conditional pdf of $X_2$ given that $X_1 = x_1$ is binomial with parameters $x_1$ and p, show that the marginal pdf of $X_2$ is Poisson with $E(X_2) = \lambda p$.
Intervals Between Events: The Poisson/Exponential Relationship
Situations sometimes arise where the time interval between consecutively occurring events is an important random variable. Imagine being responsible for the maintenance on a network of computers. Clearly, the number of technicians you would need to employ in order to be capable of responding to service calls in a timely fashion would be a function of the "waiting time" from one breakdown to another.
Figure 4.2.2 shows the relationship between the random variables X and Y, where X denotes the number of occurrences in a unit of time and Y denotes the interval between consecutive occurrences. Pictured are six intervals: X = 0 on one occasion, X = 1 on three occasions, X = 2 once, and X = 3 once. Resulting from those eight occurrences are seven measurements on the random variable Y. Obviously, the pdf for Y will depend on the pdf for X. One particularly important special case of that dependence is the Poisson/exponential relationship outlined in Theorem 4.2.3.
FIGURE 4.2.2 [a time axis marked off in unit-time intervals, with the Y-values shown as the gaps between consecutive occurrences]
Theorem 4.2.3. Suppose a series of events satisfying the Poisson model are occurring at the rate of $\lambda$ per unit time. Let the random variable Y denote the interval between consecutive events. Then Y has the exponential distribution
$$f_Y(y) = \lambda e^{-\lambda y}, \quad y > 0$$
Proof. Suppose an event has occurred at time a. Consider the interval that extends from a to a + y. Since the (Poisson) events are occurring at the rate of $\lambda$ per unit time, the probability that no outcomes will occur in the interval (a, a + y) is
$$\frac{e^{-\lambda y}(\lambda y)^0}{0!} = e^{-\lambda y}$$
Define the random variable Y to denote the interval between consecutive occurrences. Notice that there will be no occurrences in the interval (a, a + y) only if Y > y. Therefore,
$$P(Y > y) = e^{-\lambda y}$$
or, equivalently,
$$F_Y(y) = P(Y \le y) = 1 - P(Y > y) = 1 - e^{-\lambda y}$$
Let $f_Y(y)$ be the (unknown) pdf for Y. It must be true that
$$P(Y \le y) = \int_0^y f_Y(t)\,dt$$
Taking derivatives of the two expressions for $P(Y \le y)$, we can write
$$\frac{d}{dy}\int_0^y f_Y(t)\,dt = \frac{d}{dy}\left(1 - e^{-\lambda y}\right)$$
which implies that
$$f_Y(y) = \lambda e^{-\lambda y}, \quad y > 0$$
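The conclusion of Theorem 4.2.3 can be explored empirically with a small simulation. The sketch below is illustrative only: the rate, the time horizon, the seed, and the cutoff y are all assumed values, and the number of events is fixed at its expected value rather than drawn from a Poisson distribution, which is a simplification.

```python
import math
import random

random.seed(1)           # fixed seed so the run is repeatable
lam = 2.0                # assumed rate: events per unit time
horizon = 50_000.0       # assumed total observation time

# Given how many Poisson events fall in [0, horizon], their times behave like
# independent uniform draws. Here the count is set to its expected value.
n_events = int(lam * horizon)
times = sorted(random.uniform(0.0, horizon) for _ in range(n_events))
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

y = 0.5
empirical_tail = sum(g > y for g in gaps) / len(gaps)   # observed P(Y > y)
theoretical_tail = math.exp(-lam * y)                   # e^(-lam * y)
mean_gap = sum(gaps) / len(gaps)                        # should be near 1/lam
print(round(empirical_tail, 3), round(theoretical_tail, 3), round(mean_gap, 3))
```

The observed fraction of gaps exceeding y should land close to $e^{-\lambda y}$, and the average gap close to $1/\lambda$.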
CASE STUDY 4.2.4
Over "short" geological periods, a volcano's eruptions are believed to be Poisson events; that is, they are thought to occur independently and at a constant rate. If so, the pdf describing the intervals between eruptions should have the form $f_Y(y) = \lambda e^{-\lambda y}$. Collected for the purpose of testing that presumption are the data in Table 4.2.6, showing the intervals (in months) that elapsed between thirty-seven consecutive eruptions of Mauna Loa, a volcano in Hawaii (110). Over the period covered, which ended in 1950, eruptions were occurring at the rate of $\lambda$ per month. Is the variability in these thirty-six $y_i$'s consistent with the statement of Theorem 4.2.3?
TABLE 4.2.6
73  26   6  41  18  11  26   3
 3   6  37  23   2  65  94  51
 6   6  68  41  38  16  20  18
12  40  77  91  38  50  61
To answer that question requires that the data be reduced to a density-scaled histogram and superimposed on a graph of the predicted exponential pdf (recall Case Study 3.4.1). Table 4.2.7 details the construction of the histogram. Notice in Figure 4.2.3 that the shape of that histogram is entirely consistent with the theoretical model, $f_Y(y) = \lambda e^{-\lambda y}$, stated in Theorem 4.2.3.
TABLE 4.2.7
Interval (mos), y    Frequency   Density
  0 ≤ y <  20          13        0.0181
 20 ≤ y <  40           9        0.0125
 40 ≤ y <  60           5        0.0069
 60 ≤ y <  80           6        0.0083
 80 ≤ y < 100           2        0.0028
100 ≤ y < 120           0        0.0000
120 ≤ y < 140           1        0.0014
Total                  36
FIGURE 4.2.3 [density-scaled histogram of the interval between eruptions (in months), y, with the exponential pdf superimposed]
EXAMPLE 4.2.5
Among the most famous of all meteor showers are the Perseids, which occur each year in early August. In some areas the frequency of visible Perseids can be as high as forty per hour. Assuming that such sightings are Poisson events, calculate the probability that an observer who has just seen a meteor will have to wait at least five minutes before seeing another.
Let the random variable Y denote the interval (in minutes) between consecutive sightings. Expressed in the units of Y, the forty per hour rate of visible Perseids becomes 0.67 per minute. A straightforward integration, then, shows that the probability is 0.036 that an observer will have to wait five minutes or more to see another meteor:
$$P(Y > 5) = \int_5^{\infty} 0.67e^{-0.67y}\,dy = \int_{3.33}^{\infty} e^{-u}\,du \quad (\text{where } u = 0.67y)$$
$$= -e^{-u}\Big|_{3.33}^{\infty} = e^{-3.33} = 0.036$$
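As a quick check on the integration, the tail probability of an exponential random variable has the closed form $P(Y > y) = e^{-\lambda y}$, so the answer can be reproduced in a few lines of Python. The variable names are choices made here for illustration.

```python
import math

rate_per_hour = 40
lam = rate_per_hour / 60     # sightings per minute, about 0.67
wait = 5                     # minutes

# P(Y > y) = integral from y to infinity of lam * e^(-lam * t) dt = e^(-lam * y)
p_wait = math.exp(-lam * wait)
print(round(p_wait, 3))
```

Using the unrounded rate 40/60 instead of 0.67 changes the exponent slightly but gives the same answer to three decimal places.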
QUESTIONS
4.2.26. Suppose that commercial airplane crashes in a certain country occur at the rate of 2.5 per year.
(a) Is it reasonable to assume that such crashes are Poisson events? Explain.
(b) What is the probability that four or more crashes will occur next year?
(c) What is the probability that the next two crashes will occur within three months of one another?
4.2.27. Records show that deaths occur at the rate of 0.1 per day among patients residing in a large nursing home. If someone dies today, what are the chances that a week or more will elapse before another death occurs?
4.2.28. Suppose that $Y_1$ and $Y_2$ are independent exponential random variables, each having pdf $f_Y(y) = \lambda e^{-\lambda y}$, y > 0. If $Y = Y_1 + Y_2$, it can be shown that
$$f_{Y_1 + Y_2}(y) = \lambda^2 y e^{-\lambda y}, \quad y > 0$$
Recall Case Study 4.2.4. What is the probability that the next three eruptions of Mauna Loa will be less than 40 months apart?
4.2.29. Fifty lights have just been installed in an outdoor security system. According to the manufacturer's specifications, these particular lights are expected to burn out at the rate of 1.1 per 100 hours. What is the expected number of bulbs that will fail to last for at least 75 hours?
4.3 THE NORMAL DISTRIBUTION
The Poisson limit described in Section 4.2 was not the only, or the first, approximation developed for the purpose of facilitating the calculation of binomial probabilities. Early in the eighteenth century, Abraham DeMoivre proved that areas under the curve
$$f_Z(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2}, \quad -\infty < z < \infty$$
can be used to estimate
$$P\left(a \le \frac{X - n\left(\frac{1}{2}\right)}{\sqrt{n\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)}} \le b\right),$$
where X is a binomial random variable with a large n and p = 1/2.
FIGURE 4.3.1 [probability histogram of a binomial distribution with p = 1/2, with the approximating curve superimposed]
Figure 4.3.1 illustrates the central idea in DeMoivre's discovery. Pictured is a probability histogram of a binomial distribution with p = 1/2. Superimposed over the histogram is the approximating normal curve. Notice how closely the area under the curve approximates the area of the bar, even for this relatively small value of n.
The French mathematician Pierre-Simon Laplace generalized DeMoivre's original idea to binomial approximations for arbitrary p and brought this theorem to the full attention of the mathematical community by including it in his 1812 book, Théorie Analytique des Probabilités.
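Theorem 4.3.1. Let X be a binomial random variable defined on n independent trials for which p = P(success). For any numbers a and b,
$$\lim_{n \to \infty} P\left(a \le \frac{X - np}{\sqrt{np(1 - p)}} \le b\right) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-z^2/2}\,dz$$
Proof. One of the ways to verify Theorem 4.3.1 is to show that the limit of the moment-generating function for $\frac{X - np}{\sqrt{np(1 - p)}}$ as $n \to \infty$ is $e^{t^2/2}$ and that $e^{t^2/2}$ is also the value of $\int_{-\infty}^{\infty} e^{tz} \cdot \frac{1}{\sqrt{2\pi}}e^{-z^2/2}\,dz$. By Theorem 3.12.2, then, the limiting pdf of $Z = \frac{X - np}{\sqrt{np(1 - p)}}$ is the function $f_Z(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2}$, $-\infty < z < \infty$. See Appendix 4.A.2 for the proof of a more general result.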
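Comment. We saw in Section 4.2 that Poisson's limit is actually a special case of Poisson's distribution, $p_X(k) = \frac{e^{-\lambda}\lambda^k}{k!}$, k = 0, 1, 2, .... Similarly, the DeMoivre-Laplace limit is a pdf in its own right. Justifying that assertion, of course, requires proving that $f_Z(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2}$ integrates to 1 for $-\infty < z < \infty$.
Curiously, there is no algebraic or trigonometric substitution that can be used to demonstrate that the area under $f_Z(z)$ is 1. However, by using polar coordinates, we can verify a necessary and sufficient alternative, namely, that the square of
$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}e^{-z^2/2}\,dz$$
equals one.
To begin, note that
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx \cdot \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\,dx\,dy$$
Let $x = r\cos\theta$ and $y = r\sin\theta$, so $dx\,dy = r\,dr\,d\theta$. Then
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\,dx\,dy = \frac{1}{2\pi}\int_0^{2\pi}\int_0^{\infty} e^{-r^2/2}\,r\,dr\,d\theta$$
$$= \frac{1}{2\pi}\int_0^{\infty} re^{-r^2/2}\,dr \cdot \int_0^{2\pi} d\theta = 1$$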
Comment. The function $f_Z(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2}$ is referred to as the standard normal (or Gaussian) curve. By convention, any random variable whose probabilistic behavior is described by a standard normal curve is denoted Z (rather than X, Y, or W). Since $M_Z(t) = e^{t^2/2}$, it follows readily that E(Z) = 0 and Var(Z) = 1.
Finding Areas Under the Standard Normal Curve
In order to use Theorem 4.3.1, we need to be able to find the area under the graph of $f_Z(z)$ above an arbitrary interval [a, b]. In practice, such areas are obtained in one of two ways: either by using a normal table, a copy of which appears at the back of every statistics book, or by running a computer software package. Typically, both approaches give the cdf, $F_Z(z) = P(Z \le z)$, associated with Z (and from the cdf we can deduce the desired area).
Table 4.3.1 shows a portion of the normal table that appears in Appendix A.1. Each row under the Z heading represents a number along the horizontal axis of $f_Z(z)$ rounded off to the nearest tenth; Columns 0 through 9 allow that number to be written to the hundredths place. Entries in the body of the table are areas under the graph of $f_Z(z)$ to the left of the number indicated by the entry's row and column. For example, the number listed at the intersection of the "1.1" row and the "4" column is 0.8729, which means that the area under $f_Z(z)$ from $-\infty$ to 1.14 is 0.8729. That is,
$$\int_{-\infty}^{1.14} \frac{1}{\sqrt{2\pi}}e^{-z^2/2}\,dz = 0.8729 = P(-\infty < Z \le 1.14) = F_Z(1.14)$$
(see the accompanying figure).
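If X is a binomial random variable with parameters n and p, the continuity-corrected statement for $P(a \le X \le b)$ is
$$P(a \le X \le b) \doteq F_Z\left(\frac{b + \frac{1}{2} - np}{\sqrt{np(1 - p)}}\right) - F_Z\left(\frac{a - \frac{1}{2} - np}{\sqrt{np(1 - p)}}\right)$$
Comment. Even with the continuity correction refinement, normal curve approximations can be inadequate if n is too small, especially when p is close to 0 or to 1. As a rule of thumb, the DeMoivre-Laplace limit should be used only if the magnitudes of n and p are such that
$$n > 9\,\frac{p}{1 - p} \quad \text{and} \quad n > 9\,\frac{1 - p}{p}$$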
EXAMPLE 4.3.1
Boeing 757s flying certain routes are configured to have 168 economy class seats. Experience has shown that only 90% of all ticket-holders on those flights will actually show up in time to board the plane. Knowing that, suppose an airline sells 178 tickets for the 168 seats. What is the probability that not everyone who arrives at the gate on time can be accommodated?
Let the random variable X denote the number of would-be passengers who show up for a flight. Since travelers are sometimes with their families, not every ticket-holder constitutes an independent event. Still, we can get a useful approximation to the probability that the flight is overbooked by assuming that X is binomial with n = 178 and p = 0.9. What we are looking for is $P(169 \le X \le 178)$, the probability that more ticket-holders show up than there are seats on the plane. According to Theorem 4.3.1 (and using the continuity correction),
$$P(\text{flight is overbooked}) = P(169 \le X \le 178)$$
$$= P\left(\frac{168.5 - 178(0.9)}{\sqrt{178(0.9)(0.1)}} \le \frac{X - 178(0.9)}{\sqrt{178(0.9)(0.1)}} \le \frac{178.5 - 178(0.9)}{\sqrt{178(0.9)(0.1)}}\right)$$
$$\doteq P(2.07 \le Z \le 4.57) = F_Z(4.57) - F_Z(2.07)$$
From Appendix A.1, $F_Z(4.57) = P(Z \le 4.57)$ is equal to one, for all practical purposes, and the area under $f_Z(z)$ to the left of 2.07 is 0.9808. Therefore,
$$P(\text{flight is overbooked}) = 1.0000 - 0.9808 = 0.0192$$
implying that the chances are about one in fifty that not every ticket-holder will have a seat.
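The overbooking calculation can be reproduced with software in place of a normal table. The Python sketch below is one way to do it, not the method used in the text: it builds the standard normal cdf from `math.erf` and, for comparison, also sums the exact binomial probabilities. The function and variable names are choices made here.

```python
import math

def normal_cdf(z):
    """Standard normal cdf F_Z(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p, seats = 178, 0.9, 168
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Continuity-corrected normal approximation to P(169 <= X <= 178)
z_lo = (seats + 0.5 - mu) / sigma
z_hi = (n + 0.5 - mu) / sigma
approx = normal_cdf(z_hi) - normal_cdf(z_lo)

# Exact binomial probability of the same event, for comparison
exact = sum(binomial_pmf(k, n, p) for k in range(seats + 1, n + 1))
print(round(z_lo, 2), round(z_hi, 2), round(approx, 4), round(exact, 4))
```

Because the software does not round the z-values to two decimal places, its approximation can differ from the table-based 0.0192 in the fourth decimal place.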
CASE STUDY 4.3.1
Research in extrasensory perception has ranged from the slightly unconventional to the downright bizarre. Toward the latter part of the nineteenth century and even well into the twentieth century, much of what was done involved spiritualists and mediums. But beginning around 1910, experimenters moved out of the seance parlors and into the laboratory, where they began setting up controlled studies that could be analyzed statistically. In 1938, Pratt and Woodruff, working out of Duke University, did an experiment that became a prototype for an entire generation of ESP research (70).
The investigator and a subject sat at opposite ends of a table. Between them was a screen with a gap at the bottom. Five blank cards, visible to both participants, were placed side by side on the table beneath the screen. On the subject's side of the screen one of the standard ESP symbols (see Figure 4.3.4) was hung over each of the blank cards.
FIGURE 4.3.4
The experimenter shuffled a deck of ESP cards, picked up the top one, and concentrated on it. The subject tried to guess its identity: If he thought it was a circle, he would point to the blank card on the table that was beneath the circle card hanging on his side of the screen. The procedure was then repeated. Altogether, a total of thirty-two subjects, all students, took part in the experiment. They made a total of sixty thousand guesses and were correct 12,489 times.
With five symbols involved, the probability of a subject's making a correct identification just by chance was 1/5. Assuming a binomial model, the expected number of correct guesses would be 60,000 × 1/5, or 12,000. The question is, how "near" to 12,000 is 12,489? Should we write off the observed excess of 489 as nothing more than luck, or can we conclude that ESP has been demonstrated?
To effect a resolution between the conflicting "luck" and "ESP" hypotheses, we need to compute the probability of the subjects' getting 12,489 or more correct answers under the presumption that p = 1/5. Only if that probability is very small can 12,489 be construed as evidence in support of ESP.
Let the random variable X denote the number of correct responses in sixty thousand tries. Then
$$P(X \ge 12{,}489) = \sum_{k=12{,}489}^{60{,}000} \binom{60{,}000}{k}\left(\frac{1}{5}\right)^k\left(\frac{4}{5}\right)^{60{,}000-k} \qquad (4.3.1)$$
At this point the DeMoivre-Laplace limit theorem becomes a welcome alternative to computing the 47,512 binomial probabilities implicit in Equation 4.3.1. First we apply the continuity correction and rewrite $P(X \ge 12{,}489)$ as $P(X \ge 12{,}488.5)$. Then
$$P(X \ge 12{,}489) = P\left(\frac{X - np}{\sqrt{np(1 - p)}} \ge \frac{12{,}488.5 - 60{,}000(1/5)}{\sqrt{60{,}000(1/5)(4/5)}}\right)$$
$$\doteq P(Z \ge 4.99) = \frac{1}{\sqrt{2\pi}}\int_{4.99}^{\infty} e^{-z^2/2}\,dz = 0.0000003$$
this last value being obtained from a more extensive version of Table A.1 in the Appendix.
Here, the fact that $P(X \ge 12{,}489)$ is so extremely small makes the "luck" hypothesis (p = 1/5) untenable. It would appear that something other than chance had to be responsible for the occurrence of so many correct guesses. Still, it does not follow that ESP has necessarily been demonstrated. Flaws in the experimental setup as well as errors in reporting the scores could have inadvertently produced what appears to be a statistically significant result. Suffice it to say that a great many scientists remain highly skeptical of ESP research in general and of the Pratt-Woodruff experiment in particular. [For a more thorough critique of the data we have just described, see (45).]
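Tail areas this far out are beyond most printed normal tables, so a software check is a natural complement. The following Python sketch, offered only as an illustration, computes the continuity-corrected z-ratio and the corresponding upper-tail area using the complementary error function; the helper name `normal_upper_tail` is a choice made here.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

n, p, correct = 60_000, 1 / 5, 12_489
mu = n * p                                  # expected number correct: 12,000
sigma = math.sqrt(n * p * (1 - p))

z = (correct - 0.5 - mu) / sigma            # continuity-corrected z-ratio
p_value = normal_upper_tail(z)
print(round(z, 2), p_value)
```

Using `erfc` rather than `1 - cdf` avoids the loss of precision that comes from subtracting two numbers that are both nearly 1.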
Comment. This is a good set of data for illustrating why we need formal mathematical methods for interpreting data. As we have seen on other occasions, our intuitions, when left unsupported by probability calculations, can often be deceived. A typical first reaction to the Pratt-Woodruff results is to dismiss as inconsequential the 489 additional correct answers. To many, it seems entirely believable that 60,000 guesses could produce, by chance, an extra 489 correct responses. Only after making the $P(X \ge 12{,}489)$ computation do we see the utter implausibility of that conclusion. What statistics is doing here is what we would like it to do in general: rule out hypotheses that are not supported by the data and point us in the direction of inferences that are more likely to be true.
QUESTIONS
4.3.1. Use Appendix Table A.1 to evaluate the following integrals. In each case, draw a diagram of $f_Z(z)$ and shade the area that corresponds to the integral.
(a)
i
Ll]
1
dz
-0.44
0.94
1
_2
(b) [
-e~' f2 d• -.e>.:;
J2Ji
~
.300
Cha pteI' 4
Distributions
(c)
L748
(d)
1_:
32
~
dz
4.3.2. Let Z be a standard normal random variable. Use Appendix Table A.1 to find the numerical value for each of the following probabilities. Show each of your answers as an area under $f_Z(z)$.
(a) $P(0 \le Z \le 2.07)$
(b) $P(-0.64 \le Z < -0.11)$
(c) $P(Z > -1.06)$
(d) $P(Z < -2.33)$
(e) $P(Z \ge 4.61)$
4.3.3. (a) Let 0 < a < b. Which number is larger?
dz
or
(b) Leta> O. Whichnumberis larger?
dz or
l
O
+ i /2
1
d1,
0-1/2
(1.24
4.3.4. (8) Evaluate
10
dz.
(b) EvaluateL: 6e-z2 /2 dz.
4.3.5. Assume that the random variable Z is described by a standard normal curve $f_Z(z)$. For what values of z are the following statements true?
(a) $P(Z \le z) = 0.33$
(b) $P(Z \ge z) = 0.2236$
(c) $P(-1.00 \le Z \le z) = 0.5004$
(d) $P(-z < Z < z) = 0.80$
(e) $P(z \le Z \le 2.03) = 0.15$
4.3.6. Let $z_\alpha$ denote the value of Z for which $P(Z \ge z_\alpha) = \alpha$. By definition, the interquartile range, Q, for the standard normal curve is the difference
$$Q = z_{.25} - z_{.75}$$
Find Q.
4.3.7. Oak Hill has 74,806 registered automobiles. A city ordinance requires each to display a bumper decal showing that the owner paid an annual wheel tax of $50. By law, new decals need to be purchased during the month of the owner's birthday. This year's budget assumes that at least $306,000 in decal revenue will be collected in November. What is the probability that taxes collected in that month will be less than anticipated and produce a budget shortfall?
4.3.8. Hertz Brothers, a small, family-owned radio manufacturer, produces electronic components domestically but subcontracts the cabinets to a foreign supplier. Although inexpensive, the foreign supplier has a quality control program that leaves much to be desired. On the average, only 80% of the standard 1600-unit shipment that Hertz receives is usable. Currently, Hertz has back orders for 1260 radios but storage space for no more than 1310 cabinets. What are the chances that the number of usable units in Hertz's latest shipment will be large enough to allow Hertz to fill all the orders already on hand, yet small enough to avoid causing any inventory problems?
4.3.9. Fifty-five percent of the registered voters in Sheridanville favor their incumbent mayor in her bid for reelection. If 400 voters go to the polls, approximate the probability that
(a) the race ends in a tie
(b) the challenger scores an upset
4.3.10. State Tech's basketball team, the Fighting Logarithms, have a 70% foul-shooting percentage.
(a) Write a formula for the exact probability that out of their next 100 free throws they will make between 75 and 80, inclusive.
(b) Approximate the probability asked for in Part (a).
4.3.11. A random sample of 747 obituaries published recently in Salt Lake City newspapers revealed that 344 (or 46%) of the decedents died in the three-month period following their birthdays (129). Assess the statistical significance of that finding by approximating the probability that 46% or more would die in that particular interval if deaths occurred randomly throughout the year. What would you conclude on the basis of your answer?
4.3.12. There is a theory embraced by certain parapsychologists that hypnosis can enhance a person's ESP ability. To test that hypothesis, an experiment was set up with 15 hypnotized subjects (22). Each was asked to make 100 guesses using the same sort of ESP cards and protocol that were described in Case Study 4.3.1. A total of 326 correct identifications were made. Can it be argued on the basis of those results that hypnosis does have an effect on a person's ESP ability? Explain.
4.3.13. If $p_X(k) = \binom{10}{k}(0.7)^k(0.3)^{10-k}$, k = 0, 1, ..., 10, is it appropriate to approximate $P(4 \le X \le 8)$ by computing
$$P\left(\frac{3.5 - 10(0.7)}{\sqrt{10(0.7)(0.3)}} \le Z \le \frac{8.5 - 10(0.7)}{\sqrt{10(0.7)(0.3)}}\right)?$$
Explain.
4.3.14. A sell-out crowd of 42,200 is expected at Cleveland's Jacobs Field for next Tuesday's game with the Baltimore Orioles, the last before a long road trip. The concession manager is trying to decide how much food to have on hand. Looking at records from games played earlier in the season, she knows that, on the average, 38% of all those in attendance will buy a hot dog. How large an order should she place if she wants to have no more than a 20% chance of demand exceeding supply?
Central Limit Theorem
It was pointed out in Example 3.9.3 that every binomial random variable X can be written as the sum of n independent Bernoulli random variables $X_1, X_2, \ldots, X_n$, where
$$X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$
But if $X = X_1 + X_2 + \cdots + X_n$, Theorem 4.3.1 can be reexpressed as
$$\lim_{n \to \infty} P\left(a \le \frac{X_1 + X_2 + \cdots + X_n - np}{\sqrt{np(1 - p)}} \le b\right) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-z^2/2}\,dz \qquad (4.3.2)$$
Implicit in Equation 4.3.2 is an obvious question: Does the DeMoivre-Laplace limit apply to sums of other types of random variables as well? Remarkably, the answer is "yes." Efforts to extend Equation 4.3.2 have continued for well over a century. Russian probabilists, A. M. Lyapunov in particular, made many of the key advances. In 1920, George Polya gave these new generalizations a name that has been associated with the result ever since: He called it the central limit theorem (141).
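Theorem 4.3.2 (Central Limit Theorem). Let $W_1, W_2, \ldots$ be an infinite sequence of independent random variables, each with the same distribution. Suppose that the mean $\mu$ and the variance $\sigma^2$ of $f_W(w)$ are both finite. For any numbers a and b,
$$\lim_{n \to \infty} P\left(a \le \frac{W_1 + \cdots + W_n - n\mu}{\sqrt{n}\,\sigma} \le b\right) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-z^2/2}\,dz$$
Proof. See Appendix 4.A.2.
Comment. The central limit theorem is often stated in terms of the average of $W_1, W_2, \ldots$, and $W_n$, rather than their sum. Since
$$E\left[\frac{1}{n}(W_1 + \cdots + W_n)\right] = E(\bar{W}) = \mu \quad \text{and} \quad \text{Var}\left[\frac{1}{n}(W_1 + \cdots + W_n)\right] = \sigma^2/n,$$
Theorem 4.3.2 can be stated in the equivalent form
$$\lim_{n \to \infty} P\left(a \le \frac{\bar{W} - \mu}{\sigma/\sqrt{n}} \le b\right) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-z^2/2}\,dz$$
We will use both formulations, the choice depending on which is more convenient for the problem at hand.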
EXAMPLE 4.3.2
The top of Table 4.3.2 shows a MINITAB simulation where forty random samples of size five were drawn from a uniform pdf defined over the interval [0, 1]. Each row corresponds to a different sample. The sum of the five numbers appearing in a given sample is denoted "y" and is listed in column C6. For this particular uniform pdf, $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$ (recall Question 3.6.4), so
$$\frac{W_1 + \cdots + W_n - n\mu}{\sqrt{n}\,\sigma} = \frac{y - \frac{5}{2}}{\sqrt{5/12}}$$
TABLE 4.3.2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
0.556099
0.497846
0.284027
0.5992S6
0.280689
0.462741
0.556940
0.102855
0.642859
O.01n70
0331291
0355047
0.626197
0.211714
0.535199
0.810374
0.687550
0.424193
0.397373
0.413788
0.6026'.J7
0.963678
0.967499
0.439913
0.215TI4
0.108881
O.3m98
0.635017
39
0.563097
0.687242
0.784501
0.505460
0.336992
0.784279
0.548000
0.096383
0.161502
0.6TI552
0.470454
40
0.104377
30
31
32
33
34
35
36
37
38
0.6%873
0.588979
0.209458
0.667891
0.692159
0.349264
0.246789
0.679119
0.004636
0.568188
0.41070..';
0.961126
0.304754
0.404505
0.130715
0.153955
0.185393
0.529199
0.143507
0.653468
0.094162
0.375850
0.868809
0.446679
0.407494
0.271860
0.173911
0.187311
0.065293
0.544286
0.745614
0355340
0.734869
0.194038
0.788351
0.844281
0.972933
0.232]81
0.267230
0.819950
0.272095
0.414743
0.194460
0.036593
0.471254
0.719907
0.559210
0.728131
0.416351
0.118571
0.920597
0.530345
0.045544
0.603642
0.082226
0.620878
0.201554
0.973991
0.017335
0.247676
O.9093n
0.940770
0.075227
0.002307
0.972351
0.309916
0.365419
0.841320
0.980337
0.4.59559
0.163285
0.824409
0323756
0.831117
0.6&J927
0.038113
0.307234
0.652802
0.047036
0.956614
0.614309
0.839481
0.728826
0.613070
0.711414
0.014393
0.299165
0.908079
0.979254
0.575467
0.933018
0.213012
0.333023
0.827269
0.013395
0.157073
0.234845
0.556255
0.638875
0.307358
0.405564
0.983295
0.971140
0.604762
0.300208
0.831417
0.518055
0.649507
0.565875
0.352540
0.321047
0.430020
0.200790
0.656946
0.515530
0.588927
0.633286
0.189226
0.233126
0.819901
0.439456
0.694474
0.314434
0.489125
0.918221
0.518450
0.801093
0.075108
0.242582
0.585492
0.675899
0.520614
0.405782
0.897901
0.819712
0.090455
0.681147
0.900568
0.653910
0.828882
0.814348
0.554581
0.437144
0.210347
0.666831
0.4{i3567
0.685137
0.077364
0.529171
0.896521
0.682283
0.459238
0.823102
0.050867
0.553788
0.36.S4{13
0.410964
0399502
Row        y    Z ratio
  1  2.46429   -0.05532
  2  3.13544    0.98441
  3  1.96199   -0.83348
  4  2.99559    0.76777
  5  2.05270   -0.69295
  6  2.38545   -0.17745
  7  3.15327    1.01204
  8  1.87403   -0.96975
  9  2.47588   -0.03736
 10  1.98550   -0.79707
 11  2.08240   -0.64694
 12  3.39773    1.39076
 13  3.07021    0.88337
 14  1.39539   -1.71125
 15  2.00836   -0.76164
 16  2.77172    0.42095
 17  2.32693   -0.26812
 18  1.40248   -1.70028
 19  2.43086   -0.10711
 20  2.54141    0.06416
 21  2.23723   -0.40708
 22  3.38515    1.37126
 23  3.99699    2.31913
 24  2.49970   -0.00047
 25  2.03386   -0.72214
 26  2.16820   -0.51402
 27  1.78866   -1.10200
 28  2.48273   -0.02675
 29  2.67290    0.26786
 30  2.93874    0.67969
 31  3.08472    0.90584
 32  2.27315   -0.35144
 33  2.89960    0.61906
 34  2.19133   -0.47819
 35  3.19131    1.07106
 36  2.32940   -0.26429
 37  2.24187   -0.39990
 38  2.17130   -0.50922
 39  2.43474   -0.10111
 40  1.56009   -1.45610
[Histogram: density-scaled histogram of the Z ratios; vertical axis from 0 to 0.4, horizontal axis ("Z ratio") from -3.5 to 3.5.]
At the bottom of Table 4.3.2 is a density-scaled histogram of the forty "Z ratios," $\frac{y - 5/2}{\sqrt{5/12}}$ (as listed in the table's last column). Notice the close agreement between the distribution of those ratios and $f_Z(z)$: What we see there is entirely consistent with the statement of Theorem 4.3.2.
Comment. Theorem 4.3.2 is an asymptotic result, yet it can provide surprisingly good approximations even when n is very small. Example 4.3.2 is a typical case in point: The uniform pdf over [0, 1] looks nothing like a bell-shaped curve, yet random samples as small as n = 5 yield sums that behave probabilistically much like the theoretical limit. In general, samples from symmetric pdfs will produce sums that "converge" quickly to the theoretical limit. On the other hand, if the underlying pdf is sharply skewed (for example, $f_Y(y) = 10e^{-10y}$, $y > 0$), it would take a larger n to achieve the level of agreement present in Example 4.3.2.
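The contrast drawn in this comment, between a symmetric pdf and a sharply skewed one, can be checked by simulation. The sketch below is only an illustration, not part of the text's MINITAB output: the helper name `z_ratio_cdf_estimate`, the seed, and the number of replications are all assumptions chosen for convenience. It estimates the probability that the Z ratio is at most 0 for sums of n = 5 observations and compares it with the limiting value 0.5.

```python
import math
import random
from statistics import NormalDist


def z_ratio_cdf_estimate(draw, mu, sigma, n, z, reps=20000, seed=1):
    """Estimate P((W1 + ... + Wn - n*mu) / (sqrt(n)*sigma) <= z) by simulation."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(reps):
        total = sum(draw(rng) for _ in range(n))
        if (total - n * mu) / (math.sqrt(n) * sigma) <= z:
            hits += 1
    return hits / reps


n = 5
limit_value = NormalDist().cdf(0.0)  # the theoretical limit at z = 0, which is 0.5

# Symmetric case: uniform pdf over [0, 1], with mu = 1/2 and sigma^2 = 1/12.
uniform_est = z_ratio_cdf_estimate(lambda r: r.random(), 0.5, math.sqrt(1 / 12), n, 0.0)

# Skewed case: f_Y(y) = 10 e^{-10 y}, y > 0, for which mu = sigma = 1/10.
skewed_est = z_ratio_cdf_estimate(lambda r: r.expovariate(10.0), 0.1, 0.1, n, 0.0)

print(f"limit {limit_value:.3f}  uniform {uniform_est:.3f}  skewed {skewed_est:.3f}")
```

For the skewed case the sum of five observations has a gamma distribution, and the exact value of this probability is about 0.56, noticeably further from 0.5 than the symmetric case, where the exact value is 0.5 by symmetry.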
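EXAMPLE 4.3.3
A random sample of size n = 15 is drawn from the pdf $f_Y(y) = 3(1 - y)^2$, $0 \le y \le 1$. Let $\bar{Y} = \frac{1}{15} \sum_{i=1}^{15} Y_i$. Use the central limit theorem to approximate $P\left(\frac{1}{8} \le \bar{Y} \le \frac{3}{8}\right)$.
Note, first of all, that
\[
E(Y) = \int_0^1 y \cdot 3(1 - y)^2\,dy = \frac{1}{4}
\]
and
\[
\sigma^2 = E(Y^2) - \mu^2 = \int_0^1 y^2 \cdot 3(1 - y)^2\,dy - \left(\frac{1}{4}\right)^2 = \frac{1}{10} - \frac{1}{16} = \frac{3}{80}
\]
According, then, to the central limit theorem formulation that appears in the comment on page 302, the probability that $\bar{Y}$ will lie between $\frac{1}{8}$ and $\frac{3}{8}$ is approximately 0.99:
\[
P\left(\frac{1}{8} \le \bar{Y} \le \frac{3}{8}\right)
= P\left(\frac{\frac{1}{8} - \frac{1}{4}}{\sqrt{3/80}/\sqrt{15}} \le \frac{\bar{Y} - \frac{1}{4}}{\sqrt{3/80}/\sqrt{15}} \le \frac{\frac{3}{8} - \frac{1}{4}}{\sqrt{3/80}/\sqrt{15}}\right)
= P(-2.50 \le Z \le 2.50) = 0.9876
\]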
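The arithmetic in Example 4.3.3 can be reproduced in a few lines. This is a minimal sketch, assuming the pdf $f_Y(y) = 3(1 - y)^2$ on [0, 1]; the helper `moment` is a hypothetical name, and it relies on the standard beta integral $\int_0^1 y^k (1 - y)^2\,dy = \frac{2}{(k+1)(k+2)(k+3)}$.

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist


def moment(k):
    """E(Y^k) for f_Y(y) = 3(1 - y)^2 on [0, 1], via the beta integral."""
    return 3 * Fraction(2, (k + 1) * (k + 2) * (k + 3))


mean = moment(1)                  # E(Y) = 1/4
variance = moment(2) - mean ** 2  # Var(Y) = 1/10 - 1/16 = 3/80
n = 15
sd_of_mean = sqrt(variance / n)   # standard deviation of the sample mean

# Z transformation of the two endpoints 1/8 and 3/8.
z_low = float(Fraction(1, 8) - mean) / sd_of_mean
z_high = float(Fraction(3, 8) - mean) / sd_of_mean

prob = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)
print(mean, variance, round(z_high, 2), round(prob, 4))
```

Keeping the moments as exact fractions makes it easy to confirm the mean and variance before any rounding enters through the normal cdf.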
EXAMPLE 4.3.4
In preparing next quarter's budget, the accountant for a small business has one hundred different expenditures to account for. Her predecessor listed each entry to the penny, but doing so grossly overstates the precision of the process. As a more truthful alternative, she intends to record each budget allocation to the nearest $100. What is the probability that her total estimated budget will end up differing from the actual cost by more than $500? Assume that $Y_1, Y_2, \ldots, Y_{100}$, the rounding errors she makes on the one hundred items, are independent and uniformly distributed over the interval [-$50, +$50].
Let
\[
S_{100} = Y_1 + Y_2 + \cdots + Y_{100} = \text{total rounding error}
\]
What the accountant wants to estimate is $P(|S_{100}| > \$500)$. By the distribution assumption made for each $Y_i$,
\[
E(Y_i) = 0, \quad i = 1, 2, \ldots, 100
\]
and
\[
\operatorname{Var}(Y_i) = E(Y_i^2) = \int_{-50}^{50} \frac{1}{100}\,y^2\,dy = \frac{2500}{3}
\]
Therefore,
\[
E(S_{100}) = E(Y_1 + Y_2 + \cdots + Y_{100}) = 0
\]
and
\[
\operatorname{Var}(S_{100}) = \operatorname{Var}(Y_1 + Y_2 + \cdots + Y_{100}) = 100\left(\frac{2500}{3}\right) = \frac{250{,}000}{3}
\]
Applying Theorem 4.3.2, then, shows that her strategy has roughly an 8% chance of being in error by more than $500:
\[
P(|S_{100}| > \$500) = 1 - P(-500 \le S_{100} \le 500)
= 1 - P\left(\frac{-500 - 0}{500/\sqrt{3}} \le \frac{S_{100} - 0}{500/\sqrt{3}} \le \frac{500 - 0}{500/\sqrt{3}}\right)
= 1 - P(-1.73 < Z < 1.73) = 0.0836.
\]
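As a rough numerical check of Example 4.3.4, the sketch below recomputes the standard deviation of the total rounding error and the two-sided tail probability. The variable names are illustrative assumptions; the only fact used beyond the example is that a uniform pdf on an interval of width w has variance w^2/12.

```python
from math import sqrt
from statistics import NormalDist

n_items = 100
half_width = 50.0

var_single = (2 * half_width) ** 2 / 12   # variance of one rounding error: 2500/3
sd_total = sqrt(n_items * var_single)     # standard deviation of S_100: 500/sqrt(3)

z = 500 / sd_total                        # sqrt(3), about 1.732
prob = 2 * (1 - NormalDist().cdf(z))      # P(|S_100| > 500) using the unrounded z

# The same calculation with z rounded to two decimals, as a normal table would require.
prob_table = 2 * (1 - NormalDist().cdf(round(z, 2)))

print(round(sd_total, 1), round(prob, 4), round(prob_table, 4))
```

The two answers differ only in the fourth decimal place; the small gap comes entirely from rounding z to 1.73 before looking it up.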
EXAMPLE 4.3.5
The annual number of earthquakes registering 2.5 or higher on the Richter scale and having an epicenter within forty miles of downtown Memphis follows a Poisson distribution with $\lambda = 6.5$. Calculate the exact probability that nine or more such earthquakes will strike next year, and compare that value to an approximation based on the central limit theorem.
If X denotes the number of earthquakes of that magnitude that will hit Memphis next year, the exact probability that $X \ge 9$ is a Poisson sum:
\[
P(X \ge 9) = 1 - P(X \le 8) = 1 - \sum_{x=0}^{8} \frac{e^{-6.5}(6.5)^x}{x!} = 1 - 0.7916 = 0.2084
\]
For Poisson random variables, the ratio $\frac{W_1 + \cdots + W_n - n\mu}{\sqrt{n}\,\sigma}$ that appears in the central limit theorem reduces to $\frac{X - \lambda}{\sqrt{\lambda}}$ (see Question 4.3.18). Therefore,
\[
P(X \ge 9) = 1 - P(X \le 8) \doteq 1 - P(X \le 8.5)
= 1 - P\left(\frac{X - 6.5}{\sqrt{6.5}} \le \frac{8.5 - 6.5}{\sqrt{6.5}}\right)
\doteq 1 - P(Z \le 0.78) = 1 - 0.7823 = 0.2177
\]
(Notice that the event "$X \le 8$" is replaced with "$X \le 8.5$" before applying the central limit theorem transformation. As always, the continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve.)
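The comparison between the exact Poisson sum and the continuity-corrected normal approximation can be sketched as follows. This is an illustration only, with variable names chosen here; it uses the unrounded value of z, so the approximation comes out near 0.216 rather than the table-based figure obtained with z = 0.78.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 6.5

# Exact: P(X >= 9) = 1 - sum of Poisson probabilities for x = 0, 1, ..., 8.
exact = 1 - sum(exp(-lam) * lam ** x / factorial(x) for x in range(9))

# Central limit theorem with continuity correction: replace "X <= 8" by "X <= 8.5".
z = (8.5 - lam) / sqrt(lam)
approx = 1 - NormalDist().cdf(z)

print(round(exact, 4), round(z, 4), round(approx, 4))
```

Dropping the continuity correction (using 8 in place of 8.5) would push the approximation further from the exact answer, which is one way to see why the correction is worth applying.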
QUESTIONS
4.3.15. A fair coin is tossed 200 times. Let $X_i = 1$ if the ith toss comes up heads and $X_i = 0$ otherwise, i = 1, 2, ..., 200, and let $X = X_1 + X_2 + \cdots + X_{200}$. Calculate the central limit theorem approximation for $P(|X - E(X)| \le 5)$. How does this differ from the DeMoivre-Laplace approximation?
4.3.16. Suppose that 100 fair dice are tossed. Estimate the probability that the sum of the faces showing exceeds 370. Include a continuity correction in your analysis.
4.3.17. Let X be the amount won or lost in betting $5 on red in roulette. Then $p_X(5) = \frac{18}{38}$ and $p_X(-5) = \frac{20}{38}$. If a gambler bets on red 100 times, use the central limit theorem to estimate the probability that those wagers result in less than $50 in losses.
4.3.18. If $X_1, X_2, \ldots, X_n$ are independent Poisson random variables with parameters $\lambda_1, \lambda_2, \ldots, \lambda_n$, respectively, and if $X = X_1 + X_2 + \cdots + X_n$, then X is a Poisson random variable with parameter $\lambda = \sum_{i=1}^{n} \lambda_i$ (recall 3.12.10). What specific form does the ratio in Theorem 4.3.2 take if the $X_i$'s are Poisson random variables?
4.3.19. An electronics firm receives, on the average, 50 orders per week for a particular silicon chip. If the company has 60 chips on hand, use the central limit theorem to approximate the probability that they will be unable to fill all their orders for the upcoming week. Assume that weekly demands follow a Poisson distribution. Hint: See Question 4.3.18.
4.3.20. Considerable controversy has arisen over the possible aftereffects of a nuclear weapons test conducted in Nevada in 1957. Included as part of the test were some 3000 military and civilian "observers." Now, more than 40 years later, [...] cases of leukemia have been diagnosed among those 3000. The expected number of cases, based on the demographic characteristics of the observers, was three. Assess the statistical significance of the findings. Calculate both an exact answer using the Poisson distribution as well as an approximation based on the central limit theorem.
The Normal Curve as a Model for Individual Measurements
Because of the central limit theorem, we know that sums (or averages) of virtually any set of random variables, when suitably scaled, have distributions that can be approximated by a standard normal curve. Perhaps even more surprising is the fact that many individual measurements, when suitably scaled, also have a standard normal distribution. Why should the latter be true? What do single observations have in common with samples of size n?
Astronomers in the nineteenth century were among the first to understand the connection. Imagine looking through a telescope for the purpose of determining the location of a star. Conceptually, the data point, Y, eventually recorded is the sum of two components: (1) the star's true location $\mu^*$ (which remains unknown) and (2) measurement error. By definition, measurement error is the net effect of all those factors that cause the random variable Y to have a different value than $\mu^*$. Typically, these effects will be additive, in which case the random variable can be written as a sum:
\[
Y = \mu^* + W_1 + W_2 + \cdots + W_t \tag{4.3.3}
\]
where $W_1$, for example, might represent the effect of atmospheric irregularities, $W_2$ the effect of seismic vibrations, $W_3$ the effect of parallax distortions, and so on.
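If Equation 4.3.3 is a valid representation of the random variable Y, then it would follow that the central limit theorem applies to the individual $Y_i$'s. Moreover, if
\[
E(Y) = E(\mu^* + W_1 + W_2 + \cdots + W_t) = \mu
\]
and
\[
\operatorname{Var}(Y) = \operatorname{Var}(\mu^* + W_1 + W_2 + \cdots + W_t) = \sigma^2
\]
the ratio in Theorem 4.3.2 takes the form $\frac{Y - \mu}{\sigma}$. Furthermore, t is likely to be very large, so the approximation implied by the central limit theorem is essentially an equality; that is, we take the pdf of $\frac{Y - \mu}{\sigma}$ to be $f_Z(z)$.
Finding an actual formula for $f_Y(y)$, then, becomes an exercise in applying the theorem on linear transformations of a random variable. Given $\frac{Y - \mu}{\sigma} = Z$, we have $Y = \mu + \sigma Z$ and
\[
f_Y(y) = \frac{1}{\sigma} f_Z\left(\frac{y - \mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2}
\]
Definition 4.3.1. A random variable Y is said to be normally distributed with mean $\mu$ and variance $\sigma^2$ if
\[
f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2}, \quad -\infty < y < \infty
\]
The symbol $Y \sim N(\mu, \sigma^2)$ will sometimes be used to denote the fact that Y has a normal distribution with mean $\mu$ and variance $\sigma^2$.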
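Comment. Areas under an "arbitrary" normal distribution, $f_Y(y)$, are calculated by finding the equivalent area under the standard normal distribution, $f_Z(z)$:
\[
P(a \le Y \le b) = P\left(\frac{a - \mu}{\sigma} \le \frac{Y - \mu}{\sigma} \le \frac{b - \mu}{\sigma}\right) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right)
\]
The ratio $\frac{Y - \mu}{\sigma}$ is often referred to as either a Z transformation or a Z score.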
EXAMPLE 4.3.6
In many states a motorist is legally drunk, or driving under the influence (DUI), if his or her blood alcohol concentration, Y, is 0.10% or higher. When a suspected DUI offender is pulled over, police often request a sobriety test. Although the breath analyzers used for that purpose are remarkably precise, the machines do exhibit a certain amount of measurement error. Because of that variability, the possibility exists that a driver's true blood alcohol concentration may be under 0.10% even though the analyzer gives a reading over 0.10%.
Experience has shown that repeated breath analyzer measurements taken on the same person produce a distribution of responses that can be described by a normal pdf with $\mu$ equal to the person's true blood alcohol concentration and $\sigma$ equal to 0.004%. Suppose a driver is stopped at a roadblock on his way home. Having had a bit more than he should have, he has a true blood alcohol concentration of 0.095%, just under the legal limit. If he takes the breath analyzer test, what are the chances he will be incorrectly booked on a DUI charge?
Since a DUI arrest occurs when $Y \ge 0.10\%$, we need to find $P(Y \ge 0.10)$ when $\mu = 0.095$ and $\sigma = 0.004$ (the percentage is irrelevant to any probability calculation and can be ignored). An application of the Z transformation shows that the driver has an 11% chance of being falsely accused:
\[
P(Y \ge 0.10) = P\left(\frac{Y - 0.095}{0.004} \ge \frac{0.10 - 0.095}{0.004}\right) = P(Z \ge 1.25) = 1 - P(Z < 1.25) = 1 - 0.8944 = 0.1056
\]
Figure 4.3.5 shows $f_Y(y)$, $f_Z(z)$, and the two areas that are equal.
[Figure: the pdf $f_Y(y)$, with the area 0.1056 to the right of y = 0.10 ("Legally drunk") marked and 0.095 indicated on the y-axis; below it, the standard normal pdf $f_Z(z)$ on the interval from -3.0 to 3.0, with the equal area 0.1056 to the right of z = 1.25.]
FIGURE 4.3.5
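The Z transformation in Example 4.3.6 amounts to evaluating a normal cdf, so the 11% figure can be confirmed directly. The sketch below assumes the values quoted in the example (a true concentration of 0.095 and a measurement standard deviation of 0.004); the variable names are illustrative.

```python
from statistics import NormalDist

# Breath analyzer readings for a driver whose true concentration is 0.095.
reading = NormalDist(mu=0.095, sigma=0.004)

# Probability that a single reading is 0.10 or higher.
p_false_dui = 1 - reading.cdf(0.10)

# The same area expressed on the standard normal scale.
z = (0.10 - 0.095) / 0.004
p_from_z = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_false_dui, 4), round(p_from_z, 4))
```

Both routes give the same area, which is the point of the Z transformation: the area to the right of 0.10 under $f_Y(y)$ equals the area to the right of 1.25 under $f_Z(z)$.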
EXAMPLE 4.3.7
Mensa (from the Latin word for "mind") is an international society devoted to intellectual pursuits. Any person who has an IQ in the upper 2% of the general population is eligible to join. What is the lowest IQ that will qualify a person for membership? Assume that IQs are normally distributed with $\mu = 100$ and $\sigma = 16$.
Let the random variable Y denote a person's IQ, and let the constant $y_L$ be the lowest IQ that qualifies someone to be a Mensan. The two are related by a probability equation:
\[
P(Y \ge y_L) = 0.02
\]
or, equivalently,
\[
P(Y < y_L) = 1 - 0.02 = 0.98 \tag{4.3.4}
\]
(see Figure 4.3.6).
Applying the Z transformation to Equation 4.3.4 gives
\[
P(Y < y_L) = P\left(\frac{Y - 100}{16} < \frac{y_L - 100}{16}\right) = P\left(Z < \frac{y_L - 100}{16}\right) = 0.98
\]
[Figure: the pdf $f_Y(y)$ (peak height about 0.03) centered at 100 on the IQ axis, with the area 0.02 to the right of $y_L$ labeled "Qualifies for membership."]
FIGURE 4.3.6
From the standard normal table in Appendix Table A.1, though,
\[
P(Z < 2.05) = 0.9798 \doteq 0.98
\]
Since $\frac{y_L - 100}{16}$ and 2.05 both cut off the same area of 0.02 under $f_Z(z)$, they must be equal, which implies that 133 is the lowest acceptable IQ for Mensa:
\[
y_L = 100 + 16(2.05) = 133
\]
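Example 4.3.7 runs the Z transformation in reverse: an area is given and the cutoff is the unknown. A minimal sketch of that inverse lookup, using the inverse cdf in place of the table in Appendix A.1 (the variable names are assumptions):

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)

# The cutoff y_L leaves 98% of the IQ distribution below it.
cutoff = iq.inv_cdf(0.98)

# Equivalent route through the standard normal: y_L = mu + sigma * z.
z = NormalDist().inv_cdf(0.98)
cutoff_from_z = 100 + 16 * z

print(round(z, 2), round(cutoff, 1), round(cutoff_from_z, 1))
```

The inverse cdf returns a slightly more precise z than the two-decimal table entry, but the cutoff still rounds to the same whole-number IQ.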
EXAMPLE 4.3.8
The Army is soliciting proposals for the development of a truck-launched antitank missile. Pentagon officials are requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95% of the missiles will fall no more than fifty feet short of their target or no more than fifty feet beyond. What is the largest $\sigma$ compatible with that degree of precision? Assume that the horizontal distance a missile travels, Y, is normally distributed with its mean ($\mu$) equal to the length of the separation between the truck and the target.
The requirement that a missile has a 95% probability of landing within fifty feet of its target can be expressed by the equation
\[
P(\mu - 50 \le Y \le \mu + 50) = 0.95
\]
(see Figure 4.3.7). Equivalently,
\[
P\left(\frac{\mu - 50 - \mu}{\sigma} \le \frac{Y - \mu}{\sigma} \le \frac{\mu + 50 - \mu}{\sigma}\right) = P\left(\frac{-50}{\sigma} \le Z \le \frac{50}{\sigma}\right) = 0.95 \tag{4.3.5}
\]
[Figure: the pdf $f_Y(y)$ centered at $\mu$, the distance between truck and target.]
FIGURE 4.3.7
Following the approach taken in Example 4.3.7, we can "match" Equation 4.3.5 using the information provided in Appendix Table A.1. Specifically,
\[
P(-1.96 \le Z \le 1.96) = 0.95
\]
It must be true, then, that
\[
1.96 = \frac{50}{\sigma}
\]
which implies that
\[
\sigma = \frac{50}{1.96} = 25.5
\]
Any value of $\sigma$ larger than 25.5 will result in $f_Y(y)$ being flatter, and that would have the consequence that fewer than 95% of the missiles would land within fifty feet of their targets. Conversely, if the sighting mechanism produces a $\sigma$ smaller than 25.5, it will be performing at a level that exceeds the contract specifications (and perhaps costing an amount that makes the proposal noncompetitive).
EXAMPLE 4.3.9
Suppose a random variable Y has the moment-generating function $M_Y(t) = e^{3t + 8t^2}$. Calculate $P(-1 \le Y \le 9)$.
To begin, notice that $M_Y(t)$ has the same form as the moment-generating function for a normal random variable. That is,
\[
e^{3t + 8t^2} = e^{\mu t + (\sigma^2 t^2)/2}
\]
where $\mu = 3$ and $\sigma^2 = 16$ (recall Example 3.12.4). To evaluate $P(-1 \le Y \le 9)$, then, requires an application of the Z transformation:
\[
P(-1 \le Y \le 9) = P\left(\frac{-1 - 3}{4} \le \frac{Y - 3}{4} \le \frac{9 - 3}{4}\right) = P(-1.00 \le Z \le 1.50) = 0.9332 - 0.1587 = 0.7745
\]
Theorem 4.3.3. Let $Y_1$ be a normally distributed random variable with mean $\mu_1$ and variance $\sigma_1^2$, and let $Y_2$ be a normally distributed random variable with mean $\mu_2$ and variance $\sigma_2^2$. Define $Y = Y_1 + Y_2$. If $Y_1$ and $Y_2$ are independent, Y is normally distributed with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$.
Proof. Let $M_{Y_i}(t)$ denote the moment-generating function for $Y_i$, i = 1, 2, and let $M_Y(t)$ be the moment-generating function for Y. Since $Y = Y_1 + Y_2$, and the $Y_i$'s are independent,
\[
M_Y(t) = M_{Y_1}(t) \cdot M_{Y_2}(t) = e^{\mu_1 t + (\sigma_1^2 t^2)/2} \cdot e^{\mu_2 t + (\sigma_2^2 t^2)/2} \quad \text{(see Example 3.12.4)}
\]
\[
= e^{(\mu_1 + \mu_2)t + (\sigma_1^2 + \sigma_2^2)t^2/2}
\]
We recognize the latter, though, to be the moment-generating function for a normal random variable with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. The result follows by virtue of the uniqueness property stated in Theorem 3.12.2. □
Corollary. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size n from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then the sample mean, $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$, is also normally distributed with mean $\mu$ but with variance equal to $\sigma^2/n$ (which implies that $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ is a standard normal random variable, Z).
Corollary. Let $Y_1, Y_2, \ldots, Y_n$ be any set of independent normal random variables with means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, respectively. Let $a_1, a_2, \ldots, a_n$ be any set of constants. Then $Y = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$ is normally distributed with mean $\mu = \sum_{i=1}^{n} a_i \mu_i$ and variance $\sigma^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2$.
EXAMPLE 4.3.10
The elevator in the athletic dorm at Swampwater Tech has a maximum capacity of twenty-four hundred pounds. Suppose that ten football players get on at the twentieth floor. If the weights of Tech's players are normally distributed with a mean of two hundred twenty pounds and a standard deviation of twenty pounds, what is the probability that there will be ten fewer Muskrats at tomorrow's practice?
Let the random variables $Y_1, Y_2, \ldots, Y_{10}$ denote the weights of the ten players. At issue is the probability that $Y = \sum_{i=1}^{10} Y_i$ exceeds twenty-four hundred pounds. But
\[
P\left(\sum_{i=1}^{10} Y_i > 2400\right) = P\left(\frac{1}{10}\sum_{i=1}^{10} Y_i > \frac{1}{10} \cdot 2400\right) = P(\bar{Y} > 240.0)
\]
A Z transformation can be applied to the latter expression using the first corollary to Theorem 4.3.3:
\[
P(\bar{Y} > 240.0) = P\left(\frac{\bar{Y} - 220}{20/\sqrt{10}} > \frac{240.0 - 220}{20/\sqrt{10}}\right) = P(Z > 3.16) = 0.0008
\]
Clearly, the chances of a Muskrat splat are minimal. (How much would the probability change if more players squeezed onto the elevator?)
EXAMPLE 4.3.11
The personnel department of a large corporation gives two aptitude tests to job applicants. One measures verbal ability; the other, quantitative ability. From many years' experience, the company has found that a person's verbal score, $Y_1$, is normally distributed with $\mu_1 = 50$ and $\sigma_1 = 10$. The quantitative scores, $Y_2$, are normally distributed with $\mu_2 = 100$ and $\sigma_2 = 20$, and $Y_1$ and $Y_2$ appear to be independent. A composite score, Y, is assigned to each applicant, where
\[
Y = 3Y_1 + 2Y_2
\]
To avoid unnecessary paperwork, the company automatically rejects any applicant whose composite score is below 375. If six individuals submit resumés, what are the chances that fewer than half will fail the test?
First we need to calculate the probability that any given candidate will score below the composite cutoff. Since Y is a linear combination of independent normal random variables, Y itself is normally distributed with
\[
E(Y) = 3E(Y_1) + 2E(Y_2) = 3(50) + 2(100) = 350
\]
and
\[
\operatorname{Var}(Y) = 3^2 \operatorname{Var}(Y_1) + 2^2 \operatorname{Var}(Y_2) = 9(10)^2 + 4(20)^2 = 2500
\]
A Z transformation, then, shows that the probability of a random applicant being summarily rejected is 0.6915:
\[
P(Y < 375) = P\left(\frac{Y - 350}{\sqrt{2500}} < \frac{375 - 350}{\sqrt{2500}}\right) = P(Z < 0.50) = 0.6915
\]
Now, let the random variable X denote the number of applicants (out of six) whose Y-values would be less than 375. By its structure, X is binomial with n = 6 and p = P(Y < 375) = 0.6915. Therefore,
\[
P(\text{fewer than half of the applicants will fail the test}) = P(X < 3) = P(X \le 2) = \sum_{k=0}^{2} \binom{6}{k} (0.6915)^k (0.3085)^{6-k} = 0.0774
\]
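Example 4.3.11 chains two steps: a normal probability for the composite score, then a binomial sum over six applicants. The sketch below follows that same two-step structure. The composite weights 3 and 2 are taken from the expected-value line of the example; the variable names are assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

# Composite score Y = 3*Y1 + 2*Y2 with Y1 ~ N(50, 10^2), Y2 ~ N(100, 20^2), independent.
mean_y = 3 * 50 + 2 * 100
var_y = 3 ** 2 * 10 ** 2 + 2 ** 2 * 20 ** 2

# Step 1: probability that one applicant scores below the cutoff of 375.
p_fail = NormalDist(mean_y, sqrt(var_y)).cdf(375)

# Step 2: X = number failing out of 6 is binomial(6, p_fail); fewer than half means X <= 2.
n = 6
p_fewer_than_half = sum(
    comb(n, k) * p_fail ** k * (1 - p_fail) ** (n - k) for k in range(3)
)

print(mean_y, var_y, round(p_fail, 4), round(p_fewer_than_half, 4))
```

Because each applicant is more likely than not to fall below the cutoff, seeing two or fewer failures out of six is the unusual outcome, which is why the final probability is small.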
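EXAMPLE 4.3.12
Let $Y_1, Y_2, \ldots, Y_9$ be a random sample of size nine from a normal distribution where $\mu = 2$ and $\sigma = 2$. Let $Y_1^*, Y_2^*, Y_3^*, Y_4^*$ be an independent random sample from a normal distribution for which $\mu = 1$ and $\sigma = 1$. Find $P(\bar{Y} \ge \bar{Y}^*)$.
The second corollary to Theorem 4.3.3 can be applied here because the event $\bar{Y} \ge \bar{Y}^*$ can be written in terms of a linear combination, specifically, $\bar{Y} - \bar{Y}^* \ge 0$. Moreover,
\[
E(\bar{Y} - \bar{Y}^*) = E(\bar{Y}) - E(\bar{Y}^*) = 2 - 1 = 1
\]
and
\[
\operatorname{Var}(\bar{Y} - \bar{Y}^*) = \operatorname{Var}(\bar{Y}) + \operatorname{Var}(\bar{Y}^*) \quad \text{(why?)}
\]
\[
= \frac{2^2}{9} + \frac{1^2}{4} = \frac{25}{36}
\]
so
\[
P(\bar{Y} \ge \bar{Y}^*) = P(\bar{Y} - \bar{Y}^* \ge 0) = P\left(\frac{\bar{Y} - \bar{Y}^* - 1}{\sqrt{25/36}} \ge \frac{0 - 1}{\sqrt{25/36}}\right) = P(Z \ge -1.20) = 0.8849
\]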
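The calculation in Example 4.3.12 depends on reading the sample sizes and standard deviations as 9 and 2 for the first sample and 4 and 1 for the second, which is what the variance 25/36 in the example implies. Under that assumption, a short check looks like this (names are illustrative):

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist

# Difference of sample means: mean 2 - 1, variance sigma1^2/n1 + sigma2^2/n2.
mean_diff = 2 - 1
var_diff = Fraction(2 ** 2, 9) + Fraction(1 ** 2, 4)

# P(Ybar - Ybar* >= 0) for a normal variable with that mean and variance.
difference = NormalDist(mean_diff, sqrt(var_diff))
p_first_at_least_second = 1 - difference.cdf(0)

print(var_diff, round(p_first_at_least_second, 4))
```

The variances add even though the means are subtracted, because the coefficient of the second sample mean enters the variance formula squared.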
QUESTIONS
4.3.21. Econo-Tire is planning an advertising campaign for its newest product, an inexpensive radial. Preliminary road tests conducted by the firm's quality control department have suggested that the lifetimes of these tires will be normally distributed with an average of 30,000 miles and a standard deviation of 5000 miles. The marketing division would like to run a commercial that makes the claim that at least nine out of ten drivers will get at least 25,000 miles on a set of Econo-Tires. Based on the road test data, is the company justified in making that assertion?
4.3.22. A large computer chip manufacturing plant under construction in Westbank is expected to add 1400 children to the county's public school system once the permanent work force arrives. Any child with an IQ under 80 or over 135 will require individualized instruction that will cost the city an additional $1750 per year. How much money should Westbank anticipate spending next year to meet the needs of its new special ed students? Assume that IQ scores are normally distributed with a mean ($\mu$) of 100 and a standard deviation ($\sigma$) of 16.
4.3.23. Records for the past several years show that the amount of money collected daily by a prominent televangelist is normally distributed with a mean ($\mu$) of $20,000 and a standard deviation ($\sigma$) of $5000. What are the chances that tomorrow's donations will exceed $30,000?
4.3.24. The following letter was written to a well-known dispenser of advice to the lovelorn (178):
Dear Abby: You wrote in your column that a woman is pregnant for 266 days. Who said so? I carried my baby for ten months and five days, and there is no doubt about it because I know the exact date my baby was conceived. My husband is in the Navy and it couldn't have possibly been conceived any other time because I saw him only once for an hour, and I didn't see him again until the day before the baby was born. I don't drink or run around, and there is no way this baby isn't his, so please print a retraction about the 266-day carrying time because otherwise I am in a lot of trouble.
San Diego Reader
Whether or not San Diego Reader is telling the truth is a judgment that lies beyond the scope of any statistical analysis, but quantifying the plausibility of her story does not. According to the collective experience of generations of pediatricians, pregnancy durations, Y, tend to be normally distributed with $\mu = 266$ days and $\sigma = 16$ days. Do a probability calculation that addresses San Diego Reader's credibility. What would you conclude?
4.3.25. A criminologist has developed a questionnaire for predicting whether a teenager will become a delinquent. Scores on the questionnaire can range from 0 to 100, with higher values reflecting a presumably greater criminal tendency. As a rule of thumb, the criminologist decides to classify a teenager as a potential delinquent if his or her score exceeds [...]. The questionnaire has already been tested on a large sample of teenagers, both delinquent and nondelinquent. Among those considered nondelinquent, scores were normally distributed with a mean ($\mu$) of 60 and a standard deviation ($\sigma$) of 10. Among those considered delinquent, scores were normally distributed with a mean of 80 and a standard deviation of 5.
(a) What proportion of the time will the criminologist misclassify a nondelinquent as a delinquent? A delinquent as a nondelinquent?
(b) On the same set of axes, draw the normal curves that represent the distributions of scores made by delinquents and nondelinquents. Shade the two areas that correspond to the probabilities asked for in Part (a).
4.3.26. The cross-sectional area of tubing for use in pulmonary resuscitators is normally distributed with $\mu = 12.5$ mm² and $\sigma = 0.2$ mm². When the area is less than 12.0 mm² or greater than 13.0 mm², the tube does not fit properly. If the tubes are shipped in boxes of 1000, how many wrong-sized tubes per box can doctors expect to find?
4.3.27. At State University, the average score of the class on the verbal portion of the SAT is 565, with a standard deviation of 75. Marian scored a [...]. How many of State's other [...] freshmen did better? Assume that the scores are normally distributed.
4.3.28. A professor teaches Chemistry 101 each fall to a large class of freshmen. For tests, she uses standardized exams that she knows from past experience produce bell-shaped grade distributions with a mean of 70 and a standard deviation of 12. Her philosophy of grading is to impose standards that will yield, in the long run, 20% A's, 26% B's, 38% C's, 12% D's, and 4% F's. Where should the cutoff be between the A's and the B's? Between the B's and the C's?
4.3.29. Suppose the random variable Y can be described by a normal curve with $\mu = 40$. For what value of $\sigma$ is
\[
P(20 \le Y \le 60) = 0.50
\]
4.3.30. It is estimated that 80% of all 18-year-old women have weights ranging from 103.5 to 144.5 lb. Assuming the weight distribution can be adequately modeled by a normal curve and assuming that 103.5 and 144.5 are equidistant from the average weight $\mu$, calculate $\sigma$.
4.3.31. Recall the breath analyzer problem described in Example 4.3.6. Suppose the driver's blood alcohol concentration is actually 0.11% rather than 0.095%. What is the probability that the breath analyzer will make an error in his favor and indicate that he is not legally drunk? Suppose the police offer the driver a choice: either take the sobriety test once or take it twice and average the readings. Which option should a "0.095%" driver take? Which option should a "0.11%" driver take? Explain.
4.3.32. If a random variable Y is normally distributed with mean $\mu$ and standard deviation $\sigma$, the Z ratio $\frac{Y - \mu}{\sigma}$ is often referred to as a normed score: It indicates the magnitude of y relative to the distribution from which it came. "Norming" is sometimes used as an affirmative action mechanism in hiring decisions. Suppose a company is filling a new sales position. The aptitude test they have traditionally used for that purpose shows a distinct gender bias: Scores for men are normally distributed with $\mu = 62.0$ and $\sigma = 7.6$, while scores for women are normally distributed with $\mu = 76.3$ and $\sigma = 10.8$. Laura and Michael are the two candidates vying for the position: Laura has scored 92 on the test and Michael 75. If the company agrees to norm the scores for gender bias, whom should they hire?
4.3.33. The IQs of nine randomly selected people are recorded. Let $\bar{Y}$ denote their average. Assuming the distribution from which the $Y_i$'s were drawn is normal with a mean of 100 and a standard deviation of 16, what is the probability that $\bar{Y}$ will exceed 103? What is the probability that any arbitrary $Y_i$ will exceed 103? What is the probability that exactly three of the $Y_i$'s will exceed 103?
4.3.34. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a normal distribution where the mean is 2 and the variance is 4. How large must n be in order that
\[
P(1.9 \le \bar{Y} \le 2.1) \ge 0.99?
\]
4.3.35. A circuit contains three resistors wired in series. Each is rated at 6 ohms. Suppose, however, that the true resistance of each one is a normally distributed random variable with a mean of 6 ohms and a standard deviation of 0.3 ohm. What is the probability that the combined resistance will exceed 19 ohms? How "precise" would the manufacturing process have to be to make the probability less than 0.005 that the combined resistance of the circuit would exceed 19 ohms?
4.3.36. The cylinders and pistons for a certain internal combustion engine are manufactured by a process that gives a normal distribution of cylinder diameters with a mean of 41.5 cm and a standard deviation of 0.4 cm. Similarly, the distribution of piston diameters is normal with a mean of 40.5 cm and a standard deviation of 0.3 cm. If the piston diameter is greater than the cylinder diameter, the former can be reworked until the two "fit". What proportion of cylinder-piston pairs will need to be reworked?
4.3.37. Use moment-generating functions to prove the two corollaries to Theorem 4.3.3.
4.4 THE GEOMETRIC DISTRIBUTION
Consider a series of independent trials, each having one of two possible outcomes, success or failure. Let p = P(trial ends in success). Define the random variable X to be the trial at which the first success occurs. Figure 4.4.1 suggests a formula for the pdf of X:
\[
\begin{aligned}
p_X(k) = P(X = k) &= P(\text{first success occurs on } k\text{th trial}) \\
&= P(\text{first } k - 1 \text{ trials end in failure and } k\text{th trial ends in success}) \\
&= P(\text{first } k - 1 \text{ trials end in failure}) \cdot P(k\text{th trial ends in success}) \\
&= (1 - p)^{k-1} p, \quad k = 1, 2, \ldots
\end{aligned}
\tag{4.4.1}
\]
We call the probability model in Equation 4.4.1 a geometric distribution (with parameter p).
[Figure: a sequence of independent trials showing k - 1 failures (F, F, ..., F) followed by the first success (S) on trial k.]
FIGURE 4.4.1
Comment. Even without its association with independent trials and Figure 4.4.1, the function $p_X(k) = (1 - p)^{k-1} p$, k = 1, 2, ... qualifies as a discrete pdf because (1) $p_X(k) \ge 0$ for all k and (2) $\sum_{\text{all } k} p_X(k) = 1$:
\[
\sum_{k=1}^{\infty} (1 - p)^{k-1} p = p \sum_{j=0}^{\infty} (1 - p)^j = p \cdot \frac{1}{1 - (1 - p)} = 1
\]
EXAMPLE 4.4.1
A pair of fair dice are tossed until a sum of seven appears for the first time. What is the probability that more than four rolls will be required for that to happen?
Each throw of the dice here is an independent trial, for which
\[
p = P(\text{sum} = 7) = \frac{6}{36} = \frac{1}{6}
\]
Let X denote the roll at which the first sum of seven appears. Clearly, X has the structure of a geometric random variable, and
\[
P(X > 4) = 1 - P(X \le 4) = 1 - \sum_{k=1}^{4} \left(\frac{5}{6}\right)^{k-1} \left(\frac{1}{6}\right) = 1 - \frac{671}{1296} = 0.48
\]
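The sum in Example 4.4.1 can be evaluated exactly with rational arithmetic. The following is a small sketch; the function name `geometric_pmf` is an assumption introduced here, and it simply encodes Equation 4.4.1.

```python
from fractions import Fraction


def geometric_pmf(k, p):
    """p_X(k) = (1 - p)^(k - 1) * p, the geometric pdf of Equation 4.4.1."""
    return (1 - p) ** (k - 1) * p


p = Fraction(1, 6)  # probability that a pair of fair dice shows a sum of seven

p_at_most_4 = sum(geometric_pmf(k, p) for k in range(1, 5))
p_more_than_4 = 1 - p_at_most_4

# Shortcut: more than four rolls are needed exactly when the first four all fail.
shortcut = (1 - p) ** 4

print(p_at_most_4, p_more_than_4, float(p_more_than_4))
```

The shortcut in the last step avoids the sum altogether: "more than four rolls" is the same event as "four failures in a row."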
Theorem 4.4.1. Let X have a geometric distribution with $p_X(k) = (1 - p)^{k-1} p$, k = 1, 2, .... Then
1. $M_X(t) = \dfrac{pe^t}{1 - (1 - p)e^t}$
2. $E(X) = \dfrac{1}{p}$
3. $\operatorname{Var}(X) = \dfrac{1 - p}{p^2}$
Proof. See Examples 3.12.1 and 3.12.5 for derivations of $M_X(t)$ and E(X). The formula for Var(X) is left as an exercise. □
EXAMPLE 4.4.2
A grocery store is sponsoring a sales promotion where the cashiers give away one of the letters A, E, L, S, U, and V for each purchase. If a customer collects all six (spelling VALUES), he or she gets ten dollars worth of groceries free. What is the expected number of trips to the store a customer needs to make in order to get a complete set? Assume the different letters are given away randomly.
Let $X_i$ denote the number of purchases necessary to get the ith different letter, i = 1, 2, ..., 6, and let X denote the number of purchases necessary to qualify for the ten dollars. Then $X = X_1 + X_2 + \cdots + X_6$ (see Figure 4.4.2). Clearly, $X_1$ equals one with probability one, so $E(X_1) = 1$. Having received the first letter, the chances of getting a different one are $\frac{5}{6}$ for each subsequent trip to the store. Therefore,
\[
p_{X_2}(k) = P(X_2 = k) = \left(\frac{1}{6}\right)^{k-1} \left(\frac{5}{6}\right), \quad k = 1, 2, \ldots
\]
[Figure: a timeline of trips to the store, divided into the segments $X_1, X_2, X_3, \ldots, X_6$ that end at the first letter, the second different letter, the third different letter, ..., and the sixth different letter; the whole span is X.]
FIGURE 4.4.2
That is, $X_2$ is a geometric random variable with parameter $p = \frac{5}{6}$. By Theorem 4.4.1, $E(X_2) = \frac{6}{5}$. Similarly, the chances of getting a third different letter are $\frac{4}{6}$ (for each purchase), so
\[
p_{X_3}(k) = P(X_3 = k) = \left(\frac{2}{6}\right)^{k-1} \left(\frac{4}{6}\right), \quad k = 1, 2, \ldots
\]
and $E(X_3) = \frac{6}{4}$. Continuing in this fashion, we can find the remaining $E(X_i)$'s. It follows that a customer will have to make 14.7 trips to the store, on the average, to collect a complete set of six letters:
\[
E(X) = \sum_{i=1}^{6} E(X_i) = 1 + \frac{6}{5} + \frac{6}{4} + \frac{6}{3} + \frac{6}{2} + \frac{6}{1} = 14.7
\]
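The sum of geometric means in Example 4.4.2 generalizes to any number of equally likely letters. The sketch below is an illustration under that assumption; `expected_purchases` is a hypothetical helper name, and it uses E(X) = 1/p from Theorem 4.4.1 for each stage.

```python
from fractions import Fraction


def expected_purchases(n_letters):
    """Expected purchases to collect all n_letters equally likely letters.

    After i distinct letters are in hand, the next new letter arrives with
    probability (n_letters - i) / n_letters on each purchase, so the wait is
    geometric with mean n_letters / (n_letters - i).
    """
    return sum(Fraction(n_letters, n_letters - i) for i in range(n_letters))


total = expected_purchases(6)
print(total, float(total))
```

The last letter dominates the total: once five letters are in hand, each purchase has only a 1/6 chance of completing the set, so that final stage alone averages six purchases.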
EXAMPLE 4.4.3
Geometric random variables have a curious memoryless property: The probability that it takes an additional X = k trials to obtain the first success is unaffected by however many failures have already been observed. That is,
\[
P(X = n - 1 + k \mid X > n - 1) = P(X = k) \tag{4.4.2}
\]
Prove Equation 4.4.2.
We know from the definition of conditional probability that
\[
P(X = n - 1 + k \mid X > n - 1) = \frac{P(X = n - 1 + k \text{ and } X > n - 1)}{P(X > n - 1)} = \frac{P(X = n - 1 + k)}{1 - P(X \le n - 1)}
\]
But
\[
P(X \le n - 1) = \sum_{k=1}^{n-1} (1 - p)^{k-1} p = p \sum_{j=0}^{n-2} (1 - p)^j
\]
showing that the cdf, $P(X \le n - 1)$, for a geometric random variable is p times the partial sum of a geometric series ($= 1 + (1 - p) + \cdots + (1 - p)^{n-2}$). Formulas for partial sums are well known. Here,
\[
p \sum_{j=0}^{n-2} (1 - p)^j = p \cdot \frac{1 - (1 - p)^{n-1}}{1 - (1 - p)} = 1 - (1 - p)^{n-1}
\]
Therefore,
\[
P(X = n - 1 + k \mid X > n - 1) = \frac{(1 - p)^{n + k - 2}\, p}{1 - \left[1 - (1 - p)^{n-1}\right]} = p(1 - p)^{k-1}
\]
and the latter equals P(X = k).
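Equation 4.4.2 can also be spot-checked numerically for particular values of p, n, and k. The sketch below does so with exact fractions; the value p = 3/10 and the ranges of n and k are arbitrary choices made for illustration, and the function names are assumptions.

```python
from fractions import Fraction


def pmf(k, p):
    """Geometric pdf: P(X = k) = (1 - p)^(k - 1) * p."""
    return (1 - p) ** (k - 1) * p


def tail(m, p):
    """P(X > m) = (1 - p)^m, the probability that the first m trials all fail."""
    return (1 - p) ** m


p = Fraction(3, 10)

# Compare P(X = n - 1 + k | X > n - 1) with P(X = k) over a small grid.
all_match = all(
    pmf(n - 1 + k, p) / tail(n - 1, p) == pmf(k, p)
    for n in range(1, 8)
    for k in range(1, 8)
)
print(all_match)
```

A finite grid is not a proof, of course; it only confirms that the algebra in the example behaves as claimed for the cases tried.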
Comment. Radioactive decay is a physical process for which the memoryless property described in Example 4.4.3 applies. If "time" is divided into consecutive intervals of equal duration, the period in which a nucleus decays is modeled by a geometric distribution, where p is a function of the [...].
EXAMPLE 4.4.4
One of the "can't miss" schemes that every would-be gambler sooner or later reinvents is
the double-yow-bet strategy.
playing a
of evenly matched games.
bet
01')
first
if you
you
$2 on tbe second game; if you lose tbat one, too,
you bet $4 on the third game; and so on.
Suppose you win for tbe first time on the kth
At tbat point you receive $2'I( and
yow net winnings will be $1.
net amount won after money won
lost on
winning ktb game = on kth game - previous k - 1 games
i' -
= = $1
(1
+ 2 + 4 + ... + 2k - 1)
(2 k -
1)
It would appear that doubling our bets guarantees a profit. Where is the catcb (or should
we all book the next flight to Las Vegas)?
The devil in this case is in the expected value. In order to bankroll the strategy
described, a player
to have $2k - 1 in order to be eligible to play the kth
(to win for the first time on
third try, for example, a player would
lost $1 on
the first game, $2 on the second game, and would have wagered $4 on the third game, so
- 1). Now, if it takes $3 to be
the money spent at that point is 1 + 2 + 4 = $7, or
eligible to win on the second game, and $7 to
eligible to win on the third
how
much capital does a player
to have on the average?
Let X denote the game where the player wins for the first time. Clearly, X is a geometric
random variable for which $p_X(k) = \left(\tfrac{1}{2}\right)^{k-1}\left(\tfrac{1}{2}\right) = \left(\tfrac{1}{2}\right)^{k}$, $k = 1, 2, \ldots$ To be eligible to win on the kth
game requires an investment of $g(k) = 2^k - 1$ dollars, so the expected amount of money needed
is $E[g(X)]$, where
$$E[g(X)] = E[2^X - 1] = \sum_{k=1}^{\infty} (2^k - 1)\left(\tfrac{1}{2}\right)^{k} = \sum_{k=1}^{\infty}\left(1 - \left(\tfrac{1}{2}\right)^{k}\right) = \tfrac{1}{2} + \tfrac{3}{4} + \tfrac{7}{8} + \cdots$$
Since each term in the infinite series that defines $E[g(X)]$ is larger than the one that
precedes it, the series diverges, and the gambler would need to have an infinite amount of money in order to
implement a double-your-bet strategy! (On a more practical level, casinos always have house
limits, so players would not be allowed to double their bets indefinitely anyway. Recall that
a similar analysis played a role in the St. Petersburg analysis introduced in Example 3.5.5.)
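The divergence of the expected bankroll can be seen numerically. The following is a minimal sketch, not part of the original text: the function names `capital_needed` and `partial_expected_capital` are made up for illustration, and exact fractions are used so that no rounding enters the partial sums.

```python
from fractions import Fraction


def capital_needed(k: int) -> int:
    """Dollars wagered through game k when the first bet is $1: 2^k - 1."""
    return 2**k - 1


def partial_expected_capital(n_terms: int) -> Fraction:
    """Partial sum of E[g(X)]: sum over k = 1..n_terms of (2^k - 1) * (1/2)^k."""
    return sum(Fraction(capital_needed(k), 2**k) for k in range(1, n_terms + 1))


# Each added term is close to 1, so the partial sums grow without bound.
for n in (1, 2, 3, 10, 100, 1000):
    print(n, float(partial_expected_capital(n)))
```

Each term equals 1 - (1/2)^k, so the partial sum over n terms is n - 1 + (1/2)^n, which grows roughly like n.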
QUESTIONS
4.4.1. Because of her past convictions for mail fraud and forgery, Jody has a 30% chance
each year of having her tax returns audited. What is the probability that she will escape
detection for at least three years? Assume that she exaggerates, distorts, misrepresents,
lies, and cheats every year.
4.4.2. A teenager is trying to get a driver's license. Write out the formula for the pdf $p_X(k)$,
where the random variable X is the number of tries that he needs to pass the road test.
Assume that his probability of passing the exam on any given attempt is 0.10. On the
average, how many attempts is he likely to require before he gets his license?
4.4.3. Is the following set of data likely to have come from the geometric pdf
$p_X(k) = \left(\tfrac{3}{4}\right)^{k-1}\cdot\left(\tfrac{1}{4}\right)$, $k = 1, 2, \ldots$? Explain.
2 8 1 5 4 2 5 2 1 2
5 1 3 2 3 6 3 4 6 2
6 2 4 2 2 2 4 3 7 5
3 1 7 3 3 2 8 3 4 4
8 4 2 9 6 3 7 5 3 2
4.4.4. Recently married, a young couple plans to continue having children until they have their
first girl. Suppose the probability that a child is a girl is 1/2, the outcome of each birth
is an independent event, and the birth at which the first girl appears has a geometric
distribution. What is the couple's expected family size? Is the geometric pdf a reasonable
model here? Discuss.
4.4.5. Show that the cdf for a geometric random variable is given by $F_X(t) = P(X \le t) = 1 - (1 - p)^{[t]}$,
where $[t]$ denotes the greatest integer in t.
4.4.6. Suppose three fair dice are tossed repeatedly. Let the random variable X denote the
roll on which a sum of 4 appears for the first time. Use the expression for $F_X(t)$ given in
Question 4.4.5 to evaluate $P(65 \le X \le 75)$.
4.4.7. Let Y be an exponential random variable, where $f_Y(y) = \lambda e^{-\lambda y}$, $0 \le y$. For any positive
integer n, show that $P(n \le Y \le n + 1) = e^{-\lambda n}(1 - e^{-\lambda})$. Note that if $p = 1 - e^{-\lambda}$, the
"discrete" version of the exponential pdf is the geometric pdf.
4.4.8. Sometimes the geometric random variable is defined to be the number of trials, X,
preceding the first success. Write down the corresponding pdf and derive the moment-generating
function for X two ways: (1) by evaluating $E(e^{tX})$ directly and (2) by using
Theorem 3.12.3.
4.4.9. Differentiate the moment-generating function for a geometric random variable and
verify the expressions given for E(X) and Var(X) in Theorem 4.4.1.
4.4.10. Suppose that the random variables $X_1$ and $X_2$ have mgfs
$$M_{X_1}(t) = \frac{\tfrac{1}{2}e^t}{1 - \left(1 - \tfrac{1}{2}\right)e^t} \quad\text{and}\quad M_{X_2}(t) = \frac{\tfrac{1}{4}e^t}{1 - \left(1 - \tfrac{1}{4}\right)e^t}$$
respectively. Let $X = X_1 + X_2$. Does X have a geometric
distribution? Assume that $X_1$ and $X_2$ are independent.
4.4.11. The factorial moment-generating function for any random variable W is the expected
value of $t^W$. Moreover,
$$\frac{d^r}{dt^r}E(t^W)\Big|_{t=1} = E[W(W - 1)\cdots(W - r + 1)]$$
Find the factorial moment-generating function for a geometric random variable and use it to
verify the expected value and variance formulas given in Theorem 4.4.1.
4.5 THE NEGATIVE BINOMIAL DISTRIBUTION
The geometric distribution introduced in Section 4.4 can be generalized in a very
straightforward fashion. Imagine waiting for the rth (instead of the first) success in a
series of independent trials, where each trial has a probability of p of ending in success
(see Figure 4.5.1).
[Figure 4.5.1: a sequence of independent trials (S = success, F = failure) with r - 1 successes and k - 1 - (r - 1) failures in the first k - 1 trials, followed by the rth success on trial k.]
Let the random variable X denote the trial at which the rth success occurs. Then
$$p_X(k) = P(X = k) = P(r\text{th success occurs on } k\text{th trial})$$
$$= P(r - 1 \text{ successes occur in first } k - 1 \text{ trials and success occurs on } k\text{th trial})$$
$$= P(r - 1 \text{ successes occur in first } k - 1 \text{ trials}) \cdot P(\text{success occurs on } k\text{th trial})$$
$$= \binom{k-1}{r-1}p^{r-1}(1 - p)^{k-r} \cdot p$$
$$= \binom{k-1}{r-1}p^{r}(1 - p)^{k-r}, \quad k = r, r + 1, \ldots \qquad (4.5.1)$$
A random variable whose pdf has the form given in Equation 4.5.1 is said to have a
negative binomial distribution (with parameter p).
Comment. Two equivalent formulations of the negative binomial structure are widely
used. Sometimes X is defined to be the number of trials preceding the rth success; other
times, X is taken to be the number of trials in excess of r that are necessary to achieve
the rth success. The underlying probability structure is the same, however X is defined.
We will primarily use Equation 4.5.1; properties of the other two definitions for X will be
covered in the exercises.
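The three ways of counting can be compared side by side in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the two alternative versions are written simply as shifts of Equation 4.5.1.

```python
from math import comb


def nb_trial_of_rth_success(k: int, r: int, p: float) -> float:
    """Equation 4.5.1: P(the rth success occurs on trial k), k = r, r + 1, ..."""
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)


def nb_trials_preceding(m: int, r: int, p: float) -> float:
    """P(m trials precede the rth success); the success itself is trial m + 1."""
    return nb_trial_of_rth_success(m + 1, r, p)


def nb_trials_in_excess(j: int, r: int, p: float) -> float:
    """P(j trials beyond the minimum r are needed), j = 0, 1, 2, ..."""
    return nb_trial_of_rth_success(j + r, r, p)


r, p = 3, 0.5
for k in range(r, r + 4):
    print(k, nb_trial_of_rth_success(k, r, p),
          nb_trials_preceding(k - 1, r, p),
          nb_trials_in_excess(k - r, r, p))
```

All three columns agree row by row, which is the sense in which the underlying probability structure is the same; only the labeling of the outcomes changes.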
Theorem 4.5.1. Let X have a negative binomial distribution with
$p_X(k) = \binom{k-1}{r-1}p^{r}(1 - p)^{k-r}$, $k = r, r + 1, \ldots$ Then
1. $M_X(t) = \left[\dfrac{pe^t}{1 - (1 - p)e^t}\right]^r$
2. $E(X) = \dfrac{r}{p}$
3. $\mathrm{Var}(X) = \dfrac{r(1 - p)}{p^2}$
Proof. All of these results follow immediately from the fact that X can be written
as the sum of r independent geometric random variables, $X_1, X_2, \ldots, X_r$, each with
parameter p. That is,
X = total number of trials to achieve rth success
= number of trials to achieve 1st success
+ number of additional trials to achieve 2nd success + ...
+ number of additional trials to achieve rth success
$= X_1 + X_2 + \cdots + X_r$
where
$$p_{X_i}(k) = (1 - p)^{k-1}p, \quad k = 1, 2, \ldots; \quad i = 1, 2, \ldots, r.$$
Therefore,
$$M_X(t) = M_{X_1}(t)M_{X_2}(t)\cdots M_{X_r}(t) = \left[\frac{pe^t}{1 - (1 - p)e^t}\right]^r$$
Also, from Theorem 4.4.1,
$$E(X) = E(X_1) + E(X_2) + \cdots + E(X_r) = \frac{1}{p} + \frac{1}{p} + \cdots + \frac{1}{p} = \frac{r}{p}$$
and
$$\mathrm{Var}(X) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_r) = \frac{1 - p}{p^2} + \frac{1 - p}{p^2} + \cdots + \frac{1 - p}{p^2} = \frac{r(1 - p)}{p^2}$$
□
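The decomposition used in the proof also suggests a quick simulation check of Parts 2 and 3. The sketch below is an illustration under assumed values (r = 3, p = 0.25, a fixed seed, and hypothetical function names); it builds each negative binomial observation as a sum of r geometric draws and compares the sample moments with r/p and r(1 - p)/p^2.

```python
import random
from statistics import mean, variance


def geometric_draw(p: float, rng: random.Random) -> int:
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:  # a failure occurs with probability 1 - p
        trials += 1
    return trials


def negative_binomial_draw(r: int, p: float, rng: random.Random) -> int:
    """Trial on which the rth success occurs, as a sum of r geometric draws."""
    return sum(geometric_draw(p, rng) for _ in range(r))


rng = random.Random(2006)  # arbitrary seed, chosen only for reproducibility
r, p = 3, 0.25
sample = [negative_binomial_draw(r, p, rng) for _ in range(50_000)]

print("sample mean    ", mean(sample), " theory:", r / p)
print("sample variance", variance(sample), " theory:", r * (1 - p) / p**2)
```

With these assumed values the theoretical mean and variance are 12 and 36; the simulated figures should land close to them but will not match exactly.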
EXAMPLE 4.5.1
The California Mellows are a semipro baseball team. Eschewing all forms of violence,
the laid-back Mellow batters never swing at a pitch, and should they be fortunate enough
to reach base on a walk, they never try to steal. On the average, how many runs will
the Mellows score in a nine-inning road game, assuming the opposing pitcher has a 50%
probability of throwing a strike on any given pitch (81)?
The solution to this problem illustrates very nicely the interplay between the physical
constraints imposed by a real-life situation (in this case, by the rules of baseball) and the
mathematical characteristics of the underlying probability model. The negative binomial
distribution appears twice in this analysis, along with several of the properties associated
with expected values and linear combinations.
To begin, we calculate the probability of a Mellow batter striking out. Let the random
variable X denote the number of pitches necessary for that to happen. Clearly, X = 3, 4, 5,
or 6 (why can X not be larger than 6?), and
$$p_X(k) = P(X = k) = P(2 \text{ strikes are called in the first } k - 1 \text{ pitches and the } k\text{th pitch is the 3rd strike})$$
$$= \binom{k-1}{2}\left(\tfrac{1}{2}\right)^{3}\left(\tfrac{1}{2}\right)^{k-3}, \quad k = 3, 4, 5, 6$$
Therefore,
$$P(\text{batter strikes out}) = \sum_{k=3}^{6} p_X(k) = \left(\tfrac{1}{2}\right)^{3} + \binom{3}{2}\left(\tfrac{1}{2}\right)^{4} + \binom{4}{2}\left(\tfrac{1}{2}\right)^{5} + \binom{5}{2}\left(\tfrac{1}{2}\right)^{6} = \frac{21}{32}$$
Now, let the random variable W denote the number of walks the Mellows get in a given
inning. In order for W to take on the value w, exactly two of the first w + 2 batters must
strike out, as must the (w + 3)rd (see Figure 4.5.2). The pdf for W, then, is a negative
binomial with p = P(batter strikes out) = 21/32:
$$p_W(w) = P(W = w) = \binom{w+2}{2}\left(\tfrac{21}{32}\right)^{3}\left(\tfrac{11}{32}\right)^{w}, \quad w = 0, 1, 2, \ldots$$
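[Figure 4.5.2: batters 1 through w + 3 in an inning; the first w + 2 batters produce 2 outs and w walks, and batter w + 3 makes the third out.]
In order for a run to score, the pitcher must walk a Mellows batter with the bases
loaded. Let the random variable R denote the number of runs scored in a given
inning. Then
$$R = \begin{cases} 0 & \text{if } w \le 3 \\ w - 3 & \text{if } w > 3 \end{cases}$$
and
$$E(R) = \sum_{w=4}^{\infty}(w - 3)\cdot P(W = w) = \sum_{w=0}^{\infty}(w - 3)\cdot P(W = w) - \sum_{w=0}^{3}(w - 3)\cdot P(W = w)$$
$$= E(W) - 3 + \sum_{w=0}^{3}(3 - w)\cdot P(W = w) \qquad (4.5.2)$$
To evaluate E(W) using the statement of Theorem 4.5.1 requires a linear transformation
to rescale W to the format of Equation 4.5.1. Let
T = W + 3 = total number of Mellow batters appearing in a given inning
Then
$$p_T(t) = p_W(t - 3) = \binom{t-1}{2}\left(\tfrac{21}{32}\right)^{3}\left(\tfrac{11}{32}\right)^{t-3}, \quad t = 3, 4, \ldots$$
which we recognize as a negative binomial pdf with r = 3 and p = 21/32. Therefore,
$$E(T) = \frac{3}{21/32} = \frac{32}{7}$$
which makes $E(W) = E(T) - 3 = \frac{32}{7} - 3 = \frac{11}{7}$.
From Equation 4.5.2, then, the expected number of runs scored by the Mellows in a
given inning is 0.202:
$$E(R) = \frac{11}{7} - 3 + 3\cdot\binom{2}{2}\left(\tfrac{21}{32}\right)^{3}\left(\tfrac{11}{32}\right)^{0} + 2\cdot\binom{3}{2}\left(\tfrac{21}{32}\right)^{3}\left(\tfrac{11}{32}\right)^{1} + 1\cdot\binom{4}{2}\left(\tfrac{21}{32}\right)^{3}\left(\tfrac{11}{32}\right)^{2} = 0.202$$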
Each of the nine innings, of course, would have the same value for E(R), so the expected
number of runs in a game is the sum 0.202 + 0.202 + ... + 0.202 = 9(0.202), or 1.82.
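The arithmetic in this example is easy to reproduce. The following sketch (variable and function names are illustrative, not from the text) computes the strikeout probability, evaluates E(R) through Equation 4.5.2, and cross-checks it against a direct, truncated evaluation of the defining series.

```python
from math import comb

P_STRIKE = 0.5  # probability that any given pitch is a strike

# P(batter strikes out): the third strike arrives on pitch k = 3, 4, 5, or 6.
p_out = sum(
    comb(k - 1, 2) * P_STRIKE**3 * (1 - P_STRIKE) ** (k - 3) for k in range(3, 7)
)


def p_walks(w: int) -> float:
    """P(W = w): w walks occur before the third strikeout of the inning."""
    return comb(w + 2, 2) * p_out**3 * (1 - p_out) ** w


# E(W) = E(T) - 3, where T = W + 3 is negative binomial with r = 3 and p = p_out.
expected_walks = 3 / p_out - 3

# Equation 4.5.2: E(R) = E(W) - 3 + sum over w = 0..3 of (3 - w) P(W = w).
expected_runs_inning = expected_walks - 3 + sum((3 - w) * p_walks(w) for w in range(4))

# Direct check: sum of (w - 3) P(W = w) for w >= 4, truncated far out in the tail.
direct_check = sum((w - 3) * p_walks(w) for w in range(4, 400))

print("P(strikeout)      ", p_out)
print("E(R) per inning   ", expected_runs_inning)
print("direct series     ", direct_check)
print("E(runs) per game  ", 9 * expected_runs_inning)
```

The two routes to E(R) should agree to many decimal places, which is a useful guard against sign errors when splitting the sum in Equation 4.5.2.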
QUESTIONS
4.5.1. A door-to-door encyclopedia salesperson is required to document five in-home visits
each day. Suppose that she has a 30% chance of being invited into any given home,
with each address representing an independent trial. What is the probability that she
requires fewer than eight houses to achieve her fifth success?
4.5.2. An underground military installation is fortified to the extent that it can withstand up
to three direct hits from air-to-surface missiles and still function. Suppose an enemy
aircraft is armed with missiles, each having a 30% chance of scoring a direct hit. What
is the probability that the installation will be destroyed with the seventh missile fired?
4.5.3. Darryl's statistics homework last night was to flip a fair coin and record the toss, X,
where heads appears for the second time. The experiment was to be repeated a total of
100 times. The following are the 100 values for X that Darryl turned in this morning. Do
you think that he actually did the assignment?
3 7 4 2 8 3 3 4 3 5
7 3 3 3 2 8 4 2 5 2
2 6 2 4 3 5 2 3 5 2
4 7 2 7 4 4 5 6 5 3
9 3 4 2 4 3 3 5 3 2
2 2 6 2 5 4 10 5 5 4
6 3 2 3 4 7 2 4 5 4
2 6 4 3 3 4 2 8 6 6
3 3 2 3 3 5 2 4 5 4
2 3 4 2 3 6 2 3 2 3
4.5.4. When a machine is improperly adjusted, it has probability 0.15 of producing a defective
item. Each day the machine is run until three defective items are produced. If this occurs,
it is stopped and checked for adjustment. What is the probability that an improperly
adjusted machine will produce five or more items before being stopped? What is the
average number of items an improperly adjusted machine will produce before being
stopped?
4.5.5. For a negative binomial random variable whose pdf is given by Equation 4.5.1, find
E(X) directly by evaluating $\sum_{k=r}^{\infty} k\binom{k-1}{r-1}p^{r}(1 - p)^{k-r}$. Hint: Reduce the sum to one
involving negative binomial probabilities with parameters r + 1 and p.
4.5.6. Let the random variable X denote the number of trials in excess of r that are required
to achieve the rth success in a series of independent trials, where p is the probability of
success at any given trial. Show that
$$p_X(k) = \binom{k+r-1}{k}p^{r}(1 - p)^{k}, \quad k = 0, 1, 2, \ldots$$
[Note: This particular formula for $p_X(k)$ is often used in place of Equation 4.5.1 as the
definition of the pdf for a negative binomial random variable.]
4.5.7. Calculate the mean, variance, and moment-generating function for a negative binomial
random variable X whose pdf is given by the expression
$$p_X(k) = \binom{k+r-1}{k}p^{r}(1 - p)^{k}, \quad k = 0, 1, 2, \ldots$$
(see Question 4.5.6).
4.5.8. Let $X_1$, $X_2$, and $X_3$ be three independent negative binomial random variables with pdfs
$$p_{X_i}(k) = \binom{k-1}{2}\left(\tfrac{4}{5}\right)^{3}\left(\tfrac{1}{5}\right)^{k-3}, \quad k = 3, 4, 5, \ldots$$
for i = 1, 2, 3. Define $X = X_1 + X_2 + X_3$. Find $P(10 \le X \le 12)$. Hint: Use the
moment-generating functions of $X_1$, $X_2$, and $X_3$ to deduce the pdf of X.
4.5.9. Differentiate the moment-generating function $M_X(t) = \left[\dfrac{pe^t}{1 - (1 - p)e^t}\right]^r$ to verify
the formula given in Theorem 4.5.1 for E(X).
4.5.10. Suppose that $X_1, X_2, \ldots, X_k$ are independent negative binomial random variables with
parameters $r_1$ and p, $r_2$ and p, ..., and $r_k$ and p, respectively. Let $X = X_1 + X_2 + \cdots + X_k$.
Find $M_X(t)$, $p_X(t)$, E(X), and Var(X).
4.6 THE GAMMA DISTRIBUTION
Suppose a series of independent events are occurring at the constant rate of λ per unit time.
If the random variable Y denotes the interval between consecutive occurrences, we know
from Theorem 4.2.3 that $f_Y(y) = \lambda e^{-\lambda y}$, y > 0. Equivalently, Y can be interpreted as the
"waiting time" for the first occurrence. This section generalizes the Poisson/exponential
relationship and focuses on the interval, or waiting time, required for the rth event to
occur (see Figure 4.6.1).
Theorem 4.6.1. Suppose that Poisson events are occurring at the constant rate of λ per unit
time. Let the random variable Y denote the waiting time for the rth event. Then Y has pdf
$f_Y(y)$, where
$$f_Y(y) = \frac{\lambda^r}{(r - 1)!}\,y^{r-1}e^{-\lambda y}, \quad y > 0$$
[Figure 4.6.1: a time axis starting at 0 marking the first success, the second success, ..., and the rth success; Y is the time from 0 to the rth success.]
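Proof. We will establish the formula for $f_Y(y)$ by deriving and differentiating its cdf,
$F_Y(y)$. Let Y denote the waiting time to the rth occurrence. Then
$$F_Y(y) = P(Y \le y) = 1 - P(Y > y)$$
$$= 1 - P(\text{fewer than } r \text{ events occur in } [0, y])$$
$$= 1 - \sum_{k=0}^{r-1} e^{-\lambda y}\frac{(\lambda y)^k}{k!}$$
since the number of events that occur in the interval [0, y] is a Poisson random variable
with parameter λy.
From Theorem 3.4.1,
$$f_Y(y) = F_Y'(y) = \frac{d}{dy}\left[1 - \sum_{k=0}^{r-1} e^{-\lambda y}\frac{(\lambda y)^k}{k!}\right]$$
$$= \sum_{k=0}^{r-1}\lambda e^{-\lambda y}\frac{(\lambda y)^k}{k!} - \sum_{k=1}^{r-1}\lambda e^{-\lambda y}\frac{(\lambda y)^{k-1}}{(k - 1)!}$$
$$= \sum_{k=0}^{r-1}\lambda e^{-\lambda y}\frac{(\lambda y)^k}{k!} - \sum_{k=0}^{r-2}\lambda e^{-\lambda y}\frac{(\lambda y)^k}{k!}$$
$$= \frac{\lambda^r}{(r - 1)!}\,y^{r-1}e^{-\lambda y}, \quad y > 0$$
□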
EXAMPLE 4.6.1
Engineers designing the next generation of space shuttles plan to include two fuel
pumps, one active and the other in reserve. If the primary pump malfunctions, the second is
automatically brought on line.
Suppose a typical mission is expected to require that fuel be pumped for at most fifty
hours. According to the manufacturer's specifications, pumps are expected to fail once
every one hundred hours (so λ = 0.01). What are the chances that such a fuel pump
system would not remain functioning for the full fifty hours?
Let the random variable Y denote the time that will elapse before the second pump
breaks down. According to Theorem 4.6.1, the pdf for Y has parameters r = 2 and
λ = 0.01, and we can write
$$f_Y(y) = \frac{(0.01)^2}{1!}\,y\,e^{-0.01y}, \quad y > 0$$
Therefore,
$$P(\text{system fails to last for fifty hours}) = \int_0^{50} 0.0001\,y\,e^{-0.01y}\,dy = \int_0^{0.50} u e^{-u}\,du$$
where u = 0.01y. The probability, then, that the primary pump and its backup would not
remain operable for the targeted fifty hours is 0.09:
$$\int_0^{0.50} u e^{-u}\,du = (-u - 1)e^{-u}\Big|_{u=0}^{0.50} = 0.09$$
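As a cross-check on the integration, the probability can be computed two ways: by numerically integrating the pdf over [0, 50] and from the antiderivative used above. The sketch below is illustrative only; the Simpson's-rule helper and all names in it are assumptions of this example, not part of the text.

```python
from math import exp

LAM = 0.01  # pump failure rate per hour, as given in the example


def pump_pdf(y: float) -> float:
    """Gamma pdf of Theorem 4.6.1 with r = 2: lam^2 * y * exp(-lam * y)."""
    return LAM**2 * y * exp(-LAM * y)


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule on [a, b]; n must be an even number of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


numeric = simpson(pump_pdf, 0.0, 50.0)
# Antiderivative of u*exp(-u) is (-u - 1)*exp(-u); evaluate it from 0 to 0.50.
closed_form = 1 - exp(-0.5) * (1 + 0.5)

print("numerical integral:", numeric)
print("closed form       :", closed_form)
```

Both values round to the 0.09 reported in the example.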
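Generalizing the Waiting Time Distribution
By virtue of Theorem 4.6.1, $\int_0^\infty y^{r-1}e^{-\lambda y}\,dy$ converges for any integer r > 0. But
the convergence also holds for any real number r > 0. Near zero the integrand is at most
$y^{r-1}$, which is integrable on (0, 1) whenever r > 0, and for any such r there
will be an integer t > r for which $\int_1^\infty y^{r-1}e^{-\lambda y}\,dy \le \int_1^\infty y^{t-1}e^{-\lambda y}\,dy < \infty$. The finiteness
of $\int_0^\infty y^{r-1}e^{-\lambda y}\,dy$ justifies the consideration of a related definite integral, one that was first
studied by Euler, but named by Legendre.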
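Definition 4.6.1. For any real number r > 0, the gamma function of r is denoted Γ(r),
where
$$\Gamma(r) = \int_0^\infty y^{r-1}e^{-y}\,dy$$
Theorem 4.6.2. Let $\Gamma(r) = \int_0^\infty y^{r-1}e^{-y}\,dy$ for any real number r > 0. Then
1. $\Gamma(1) = 1$
2. $\Gamma(r) = (r - 1)\Gamma(r - 1)$
3. If r is an integer, then $\Gamma(r) = (r - 1)!$
Proof.
1. $\Gamma(1) = \int_0^\infty y^{1-1}e^{-y}\,dy = \int_0^\infty e^{-y}\,dy = 1$
2. Integrate the gamma function by parts. Let $u = y^{r-1}$ and $dv = e^{-y}\,dy$. Then
$$\int_0^\infty y^{r-1}e^{-y}\,dy = -y^{r-1}e^{-y}\Big|_0^\infty + \int_0^\infty (r - 1)y^{r-2}e^{-y}\,dy = (r - 1)\int_0^\infty y^{r-2}e^{-y}\,dy = (r - 1)\Gamma(r - 1)$$
3. Use Part (2) as the basis for an induction argument. The details will be left as an
exercise.
□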
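The three properties in Theorem 4.6.2 can be spot-checked numerically with the gamma function in Python's standard library. This is a sanity check rather than a proof, and the helper name `recursion_gap` is made up for the illustration.

```python
from math import factorial, gamma


def recursion_gap(r: float) -> float:
    """Relative difference between gamma(r) and (r - 1) * gamma(r - 1), for r > 1."""
    lhs = gamma(r)
    rhs = (r - 1) * gamma(r - 1)
    return abs(lhs - rhs) / lhs


# Part 1: gamma(1) = 1.
print("gamma(1) =", gamma(1.0))

# Part 2: the recursion holds for non-integer arguments as well.
for r in (1.5, 2.7, 6.25):
    print(r, "relative gap:", recursion_gap(r))

# Part 3: for integer r, gamma(r) = (r - 1)!.
for r in range(1, 8):
    print(r, gamma(r), factorial(r - 1))
```

The relative gaps should be at the level of floating-point rounding, and the last loop should print matching pairs.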
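Definition 4.6.2. Given real numbers r > 0 and λ > 0, the random variable Y is said
to have the gamma pdf with parameters r and λ if
$$f_Y(y) = \frac{\lambda^r}{\Gamma(r)}\,y^{r-1}e^{-\lambda y}, \quad y > 0$$
Comment. To justify Definition 4.6.2 requires a proof that $f_Y(y)$ integrates to one.
Let u = λy. Then
$$\int_0^\infty \frac{\lambda^r}{\Gamma(r)}\,y^{r-1}e^{-\lambda y}\,dy = \frac{\lambda^r}{\Gamma(r)}\int_0^\infty \left(\frac{u}{\lambda}\right)^{r-1}e^{-u}\,\frac{1}{\lambda}\,du = \frac{1}{\Gamma(r)}\int_0^\infty u^{r-1}e^{-u}\,du = \frac{\Gamma(r)}{\Gamma(r)} = 1$$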
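Theorem 4.6.3. Suppose that Y has a gamma pdf with parameters r and λ. Then
1. $E(Y) = r/\lambda$
2. $\mathrm{Var}(Y) = r/\lambda^2$
Proof.
1.
$$E(Y) = \int_0^\infty y\,\frac{\lambda^r}{\Gamma(r)}\,y^{r-1}e^{-\lambda y}\,dy = \frac{\lambda^r}{\Gamma(r)}\int_0^\infty y^{r}e^{-\lambda y}\,dy$$
$$= \frac{\lambda^r\,\Gamma(r + 1)}{\Gamma(r)\,\lambda^{r+1}}\int_0^\infty \frac{\lambda^{r+1}}{\Gamma(r + 1)}\,y^{r}e^{-\lambda y}\,dy = \frac{\lambda^r\,r\,\Gamma(r)}{\Gamma(r)\,\lambda^{r+1}}\,(1) = r/\lambda$$
2. A calculation similar to the integration carried out in Part (1) shows that
$E(Y^2) = r(r + 1)/\lambda^2$. Then
$$\mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2 = r(r + 1)/\lambda^2 - (r/\lambda)^2 = r/\lambda^2$$
□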
Sums of Gamma Random Variables
We have already seen that certain random variables satisfy an additive property that
"reproduces" the pdf: the sum of two independent binomial random variables with
the same p, for example, is binomial (recall Example 3.8.1). Similarly, the sum of two
independent Poissons is Poisson and the sum of two independent normals is normal. That
said, most random variables are not additive. The sum of two independent uniforms is not
uniform; the sum of two independent exponentials is not exponential; and so on. Gamma
random variables belong to the short list making up the first category.
Theorem 4.6.4. Suppose U has the gamma pdf with parameters r and λ, V has the gamma
pdf with parameters s and λ, and U and V are independent. Then U + V has a gamma pdf
with parameters r + s and λ.
Proof. The pdf of the sum is the convolution integral
$$f_{U+V}(t) = \int_0^t f_U(u)f_V(t - u)\,du = \frac{\lambda^{r+s}}{\Gamma(r)\Gamma(s)}\,e^{-\lambda t}\int_0^t u^{r-1}(t - u)^{s-1}\,du$$
Make the substitution v = u/t. Then the integral becomes
$$\int_0^1 t^{r-1}v^{r-1}\,t^{s-1}(1 - v)^{s-1}\,t\,dv = t^{r+s-1}\int_0^1 v^{r-1}(1 - v)^{s-1}\,dv$$
and
$$f_{U+V}(t) = \lambda^{r+s}\,t^{r+s-1}e^{-\lambda t}\left(\frac{1}{\Gamma(r)\Gamma(s)}\int_0^1 v^{r-1}(1 - v)^{s-1}\,dv\right) \qquad (4.6.1)$$
The numerical value of the constant in parentheses in Equation 4.6.1 is not immediately obvious, but the factors in front of the parentheses correspond to the functional
part of a gamma pdf with parameters r + s and λ. It follows, then, that $f_{U+V}(t)$ must
be that particular gamma pdf. It also follows that the constant in parentheses must
equal 1/Γ(r + s) (to comply with Definition 4.6.2), so, as a "bonus" identity, Equation 4.6.1
implies that
$$\int_0^1 v^{r-1}(1 - v)^{s-1}\,dv = \frac{\Gamma(r)\Gamma(s)}{\Gamma(r + s)}$$
□
EXAMPLE 4.6.2
In a large industrial plant, on-the-job accidents that require a worker
to be confined to a bed occur at the rate of 0.7 per hour. The company's infirmary has
ten beds. Use the central limit theorem and the properties of the gamma distribution
to approximate the probability that the infirmary will be inadequate to meet the health
emergencies that arise during tomorrow's eight-hour workday.
Let $Y_i$ denote the waiting time for the ith patient. Then $Y = Y_1 + Y_2 + \cdots + Y_{11}$
denotes the length of time from the start of the workday to when the eleventh person
needs a bed. Clearly, P(Y < 8) = P(infirmary is unable to provide enough beds).
Here Y is a gamma random variable with parameters r = 11 and λ = 0.7, so
E(Y) = 11/0.7 = 15.7 and Var(Y) = 11/(0.7)^2 = 22.45. Using the central limit theorem,
then, we find that the probability of the infirmary having too few beds to accommodate
tomorrow's demand is approximately 0.05:
$$P(Y < 8) = P\left(\frac{Y - 15.7}{\sqrt{22.45}} < \frac{8 - 15.7}{\sqrt{22.45}}\right) \doteq P(Z < -1.63) = 0.05$$
Comment. With the help of computer software, the exact answer to the question
posed in Example 4.6.2 can be readily obtained. According to MINITAB,
$$P(Y < 8) = \int_0^8 \frac{(0.7)^{11}}{10!}\,y^{10}e^{-0.7y}\,dy = 0.03$$
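The same comparison can be made without MINITAB. The sketch below is an illustration with assumed names: it gets the exact probability from the Poisson/gamma relationship used in the proof of Theorem 4.6.1 (Y < 8 exactly when at least eleven accidents occur in eight hours) and the approximation from a normal cdf built on the error function.

```python
from math import erf, exp, factorial, sqrt

R, LAM, HOURS = 11, 0.7, 8.0  # eleventh patient, accidents per hour, workday length


def normal_cdf(z: float) -> float:
    """Standard normal cdf expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Exact: the number of accidents in 8 hours is Poisson with mean 0.7 * 8 = 5.6,
# and Y < 8 means that count is at least 11.
mu = LAM * HOURS
exact = 1.0 - sum(exp(-mu) * mu**k / factorial(k) for k in range(R))

# Central limit theorem approximation using E(Y) = r/lam and Var(Y) = r/lam^2.
approx = normal_cdf((HOURS - R / LAM) / sqrt(R / LAM**2))

print("exact P(Y < 8)       :", exact)
print("normal approximation :", approx)
```

The gap between the two numbers reflects how few terms (eleven) are being summed and how skewed each exponential waiting time is.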
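Theorem 4.6.5. If Y has a gamma pdf with parameters r and λ, then $M_Y(t) = (1 - t/\lambda)^{-r}$.
Proof.
$$M_Y(t) = E(e^{tY}) = \int_0^\infty e^{ty}\,\frac{\lambda^r}{\Gamma(r)}\,y^{r-1}e^{-\lambda y}\,dy = \frac{\lambda^r}{\Gamma(r)}\int_0^\infty y^{r-1}e^{-(\lambda - t)y}\,dy$$
$$= \frac{\lambda^r}{\Gamma(r)}\cdot\frac{\Gamma(r)}{(\lambda - t)^r} = \frac{\lambda^r}{(\lambda - t)^r} = (1 - t/\lambda)^{-r}$$
□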
QUESTIONS
4.6.1. An Arctic weather station has three electronic wind gauges. Only one is used at any
given time. The lifetime of each gauge is exponentially distributed with a mean of 1000
hours. What is the pdf of Y, the random variable measuring the time until the last
gauge wears out?
4.6.2. In Example 4.6.2, what might account for the sizeable discrepancy between the exact
value for P(Y < 8) and its central limit theorem approximation?
4.6.3. A service contract on a new university computer system provides 24 free repair calls
from a technician. Suppose the technician is required on the average three times a
month. What is the average time it takes for the service contract to be fulfilled?
4.6.4. Suppose a set of measurements $Y_1, Y_2, \ldots, Y_{100}$ is taken from a gamma pdf for which
E(Y) = 1.5 and Var(Y) = 0.75. How many $Y_i$'s would you expect to find in the interval
(1.0, 2.5)?
4.6.5. Demonstrate that λ plays the role of a scale parameter by showing that if Y is gamma
with parameters r and λ, then λY is gamma with parameters r and 1.
4.6.6. Prove that $\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}$. Hint: Consider $E(Z^2)$, where Z is a standard normal random
variable.
4.6.7. Show that $\Gamma\left(\tfrac{7}{2}\right) = \tfrac{15}{8}\sqrt{\pi}$.
4.6.8. If the random variable Y has the gamma pdf with integer parameter r and arbitrary
λ > 0, show that
$$E(Y^m) = \frac{(m + r - 1)!}{(r - 1)!\,\lambda^m}$$
Hint: Use the fact that $\int_0^\infty y^{r-1}e^{-y}\,dy = (r - 1)!$ when r is a positive integer.
4.6.9. Differentiate the gamma moment-generating function to verify the formulas for E(Y)
and Var(Y) given in Theorem 4.6.3.
4.6.10. Differentiate the gamma moment-generating function to show that the formula for
$E(Y^m)$ given in Question 4.6.8 holds for arbitrary r > 0.
4.7 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS)
Calculating probabilities associated with (1) single random variables and (2) functions of
sets of random variables has been the overarching theme of Chapters 3 and 4. Facilitating
those computations has been a variety of transformations, summation properties, and
mathematical relationships linking one pdf with another. Collectively, those results are
enormously effective. Sometimes, though, the intrinsic complexity of a random variable
overwhelms our ability to model its probabilistic behavior in any formal or precise way.
An alternative in those situations is to use a computer to draw random
samples from one or more distributions that model portions of the random variable's
behavior. If a large enough number of such samples is generated, a histogram (or density-scaled
histogram) can be constructed that will accurately reflect the random variable's
true (but unknown) distribution. Sampling "experiments" of this sort are known as Monte
Carlo studies.
Real-life situations where a Monte Carlo analysis could be helpful are not difficult to
imagine. Suppose, for instance, you just bought a state-of-the-art, high-definition, plasma
screen television. In addition to the pricey initial cost, an optional warranty is available
that covers all repairs made during the first two years. According to an independent
laboratory's reliability study, this particular set is likely to require 0.75 service calls per
year, on the average. Moreover, the costs of service calls are expected to be normally
distributed with a mean (μ) of $100 and a standard deviation (σ) of $20. If the warranty
sells for $200, should you buy it?
Like any insurance policy, a warranty may or may not be a good investment, depending
on what events unfold, and when. Here the relevant random variable is W, the total
amount spent on repair calls during the first two years. For any particular customer, the
value of W will depend on (1) the number of repairs needed in the first two years and
(2) the cost of each repair. Although we have reliability and cost assumptions that address
(1) and (2), the 2-yr limit on the warranty introduces a complexity that goes beyond what
we have learned in Chapters 3 and 4. What remains is the option of using random samples
to simulate the repair costs that might accrue during those first two years.
Note, first, that it would not be unreasonable to assume that the service calls are Poisson
events (occurring at the rate of 0.75 per year). If that were the case, Theorem 4.2.3 implies
that the interval, Y, between successive repair calls will have an exponential distribution
with pdf
$$f_Y(y) = 0.75e^{-0.75y}, \quad y > 0$$
(see Figure 4.7.1). Moreover, if the random variable C denotes the cost associated with a
particular maintenance call, then, by assumption,
$$f_C(c) = \frac{1}{\sqrt{2\pi}\,(20)}\,e^{-\frac{1}{2}\left(\frac{c - 100}{20}\right)^2}, \quad -\infty < c < \infty$$
(see Figure 4.7.2).
[Figure 4.7.1: graph of the exponential pdf $f_Y(y) = 0.75e^{-0.75y}$ for y between 0 and 4.]
[Figure 4.7.2: graph of the normal pdf $f_C(c)$ with σ = 20, centered at c = 100 and shown from c = 40 to c = 160.]
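Now, with the pdfs for Y and C fully specified, we can use the computer to generate
repair cost scenarios. We begin by generating a random sample (of size
one) from the pdf, $f_Y(y) = 0.75e^{-0.75y}$. The appropriate MINITAB syntax is
MTB > random 1 c1;
SUBC > exponential 1.33.
MTB > print c1
Here 1.33 is 1/0.75, the mean of the exponential pdf. As shown in Figure 4.7.3, the number
generated was 1.15988 yrs, which corresponds to a repair call occurring 423 days
(= 1.15988 × 365) after the purchase of the TV.
[Figure 4.7.3: the exponential pdf $f_Y(y)$ with the sampled value y = 1.15988 marked on the horizontal axis.]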
Applying the same syntax a second time yielded the random observation 0.284931 yrs
(= 104 days); applying it still a third time produced the observation 1.46394 yrs
(= 534 days). These last two observations taken on $f_Y(y)$ correspond to the second
repair call occurring 104 days after the first, and the third occurring 534 days after the
second (see Figure 4.7.4). Since the warranty does not extend past the first 730 days, the
third repair would not be covered.
[Figure 4.7.4: timeline of time after purchase (days). 1st breakdown (y = 1.15988) after 423 days, repair cost = $127.20; 2nd breakdown (y = 0.284931) 104 days later, repair cost = $98.67; 3rd breakdown (y = 1.46394) 534 days after the second, repair cost not covered.]
The next step in the simulation would be to generate two observations from $f_C(c)$ that
would model the costs of the two repairs that occurred during the warranty period. For
each repair, the MINITAB syntax for generating a cost would be
MTB > random 1 c1;
SUBC > normal 100 20.
MTB > print c1
where the two arguments of the normal subcommand are μ and σ.
Running those commands twice produced c-values of 127.199 and 98.6673 (see Figure 4.7.5),
corresponding to repair bills of $127.20 and $98.67, meaning that a total of
$225.87 (= $127.20 + $98.67) would have been spent on maintenance during the first two
years. In that case, the $200 warranty would have been a good investment.
The final "step" in the Monte Carlo analysis is to repeat many, many times the sampling
process that led to Figure 4.7.5, that is, to generate a series of $y_i$'s whose sum (in days) is
less than or equal to 730, and for each $y_i$ in that sample, generate a corresponding cost,
$c_i$. The sum of those $c_i$'s, then, will be a simulated value of the maintenance-cost random
variable, W.
variable, W.
The histogram in Figure 4.7.6 shows the distribution of
costs incurred in
one hundred simulated two-year periods, one being the sequence of events chronicled
in Figure 4.7.5. There is rnuch that it tells us.
of all (and not surprisingly), the
warranty costs more than either the median repair bill (=1117.00) or the rnean repair bill
$159.10).
The customer, in other words, will tend to lose money on the optional protection, and
the company will tend to make money. On the other band, a full 33% of the simulated
two-year breakdown scenarios lead to repair bills in excess of $200, including 6% that
were more than twice the cost of the warranty. At the other extreme, 24% of the
samples produce no maintenance problems whatsoever;
customers, the $200
spent "up-front" is totally wasted!
_
So, should you buy the warranty? Yes, if you feel the need to have a financial cushion
to offset the (small) probability of experiencing exceptionally bad luck; no, if you can
afford to :absorb an occasional big
[Figure 4.7.5: the five MINITAB sampling steps behind one simulated two-year period, each shown beside its pdf. Three draws using "random 1 c1; exponential 1.33." gave y = 1.15988, 0.284931, and 1.46394; two draws using "random 1 c1; normal 100 20." gave c = 127.199 and 98.6673.]
[Figure 4.7.6: histogram of the simulated repair costs (horizontal axis from $0 to $500), with the $200 warranty cost marked.]
APPENDIX 4.A.1 MINITAB APPLICATIONS
Calculations involving Poisson, exponential, normal, and gamma random variables can
be readily handled with MINITAB's PDF and CDF commands (recall Appendix 3.A.1).
Figure 4.A.1.1(a) shows the syntax for doing the Poisson calculation in Example 4.2.2.
Values of $p_X(k) = e^{-1.5}(1.5)^k/k!$ for all k can be printed out by using the PDF command
without specifying a particular k [see Figure 4.A.1.1(b)].
(a)
MTB > cdf 3;
SUBC > poisson 1.5.
Cumulative Distribution Function
Poisson with mu = 1.50000
      x    P( X <= x)
   3.00        0.9344
MTB > let k1 = 1 - 0.9344
MTB > print k1
Data Display
k1    0.0656000
[so P(X > 3) = 0.0656]

(b)
MTB > pdf;
SUBC > poisson 1.5.
Probability Density Function
Poisson with mu = 1.50000
   x    P( X = x)
   0       0.2231
   1       0.3347
   2       0.2510
   3       0.1255
   4       0.0471
   5       0.0141
   6       0.0035
   7       0.0008
   8       0.0001
   9       0.0000
FIGURE 4.A.1.1
Areas under normal curves between points a and b are calculated by subtracting $F_Y(a)$
from $F_Y(b)$, just as we did in Section 4.3 (recall the comment after Definition 4.3.1).
There is no need, however, to reexpress the probability as an area under the standard normal
curve. Figure 4.A.1.2 shows the MINITAB calculation for the probability that the random
variable Y lies between forty-eight and fifty-one, where Y is normally distributed with
μ = 50 and σ = 4. According to the computer,
$$P(48 < Y < 51) = F_Y(51) - F_Y(48) = 0.5987 - 0.3085 = 0.2902$$
MTB > cdf 51;
SUBC> normal 50 4.
Cumulative Distribution Function
Normal with mean = 50.0000 and standard deviation = 4.00000
        x    P( X <= x)
  51.0000        0.5987
MTB > cdf 48;
SUBC> normal 50 4.
Cumulative Distribution Function
Normal with mean = 50.0000 and standard deviation = 4.00000
        x    P( X <= x)
  48.0000        0.3085
MTB > let k1 = 0.5987 - 0.3085
MTB > print k1
Data Display
k1    0.290200
FIGURE 4.A.1.2
Exponential and gamma integrations can also be done on MINITAB, but the computer
expresses those two pdfs as $f_Y(y) = (1/\lambda)e^{-y/\lambda}$ [instead of $f_Y(y) = \lambda e^{-\lambda y}$] and
$f_Y(y) = \dfrac{1}{\lambda^r (r - 1)!}\,y^{r-1}e^{-y/\lambda}$ [instead of $f_Y(y) = \dfrac{\lambda^r}{(r - 1)!}\,y^{r-1}e^{-\lambda y}$]. Therefore, to evaluate
$\int_0^1 0.50e^{-0.50y}\,dy$, for example, we would write
MTB > cdf 1;
SUBC > exponential 2.
(rather than SUBC > exponential 0.50).
Recall Example 4.6.1. In the notation of Theorem 4.6.1, P(Y < 50) is the cdf evaluated
at 50 for a gamma random variable having r = 2 and λ = 0.01. In MINITAB's notation,
the second parameter is entered as 100 (= 1/0.01) (see Figure 4.A.1.3).
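For readers without MINITAB, the four calculations in this appendix can be reproduced with a few standard-library functions. The sketch below is an assumed Python analogue, not MINITAB syntax; the function names are invented, the exponential and gamma functions take the scale 1/λ to mirror MINITAB's convention, and the gamma cdf shown is valid only for integer r.

```python
from math import erf, exp, factorial, sqrt


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for a Poisson random variable with mean mu."""
    return sum(exp(-mu) * mu**j / factorial(j) for j in range(k + 1))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(Y <= x) for a normal random variable, via the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def exponential_cdf(y: float, scale: float) -> float:
    """P(Y <= y) with pdf (1/scale) * exp(-y/scale), i.e. scale = 1/lambda."""
    return 1.0 - exp(-y / scale)


def gamma_cdf_integer_r(y: float, r: int, scale: float) -> float:
    """P(Y <= y) for a gamma pdf with integer r and scale = 1/lambda."""
    u = y / scale
    return 1.0 - sum(exp(-u) * u**k / factorial(k) for k in range(r))


print("P(X > 3), Poisson(1.5)         :", 1 - poisson_cdf(3, 1.5))
print("P(48 < Y < 51), normal(50, 4)  :", normal_cdf(51, 50, 4) - normal_cdf(48, 50, 4))
print("cdf at 1, exponential scale 2  :", exponential_cdf(1, 2))
print("cdf at 50, gamma r=2, scale 100:", gamma_cdf_integer_r(50, 2, 100))
```

The first, second, and fourth results should agree with the MINITAB values 0.0656, 0.2902, and 0.0902 shown in the figures of this appendix.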
MTB > cdf 50;
SUBC> gamma 2 100.
Cumulative Distribution Function
Gamma with a = 2.00000 and b = 100.000
        x    P( X <= x)
  50.0000        0.0902
FIGURE 4.A.1.3
On several occasions in Chapter 4 we made use of MINITAB's RANDOM command,
a subroutine that generates samples from a specific pdf. Simulations of that sort can be
very helpful in illustrating a variety of statistical concepts. Shown in Figure 4.A.1.4, for
example, is the syntax for generating a random sample of size 50 from a binomial pdf
having n = 60 and p = 0.40. And calculated for each of those 50 observations is its
Z-ratio, given by
$$Z\text{-ratio} = \frac{X - E(X)}{\sqrt{\mathrm{Var}(X)}} = \frac{X - 60(0.40)}{\sqrt{60(0.40)(0.60)}} = \frac{X - 24}{\sqrt{14.4}}$$
(By the DeMoivre-Laplace Theorem, of course, the distribution of those ratios should
have a shape much like the standard normal pdf, $f_Z(z)$.) In addition to the binomial
distribution, the RANDOM command can also be used to generate samples from the
uniform, Poisson, normal, exponential, and gamma pdfs.
MTB > random 50 c1;
SUBC> binomial 60 0.40.
Data Display
C1
27 29 23 22 21 21 22 26 26 20 26 25 21
32 22 27 22 20 19 19 21 23 28 23 27 29
13 24 22 25 25 20 25 26 15 24 17 28 21
16 24 22 25 25 21 23 23 20 25 30
MTB > let c2 = (c1 - 24)/sqrt(14.4)
MTB > name c2 'Z-ratio'
MTB > print c2
Data Display
Z-ratio
 0.79057  1.31762 -0.26352 -0.52705 -0.79057 -0.79057 -0.52705
 0.52705  0.52705 -1.05409  0.52705  0.26352 -0.79057  2.10819
-0.52705  0.79057 -0.52705 -1.05409 -1.31762 -1.31762 -0.79057
-0.26352  1.05409 -0.26352  0.79057  1.31762 -2.89875  0.00000
-0.52705  0.26352  0.26352 -1.05409  0.26352  0.52705 -2.37171
 0.00000 -1.84466  1.05409 -0.79057 -2.10819  0.00000 -0.52705
 0.26352  0.26352 -0.79057 -0.26352 -0.26352 -1.05409  0.26352
 1.58114
FIGURE 4.A.1.4
Often the first step in summarizing a large set of measurements is the construction
of their histogram, a graphical format especially effective at highlighting the shape of a
distribution. A density-scaled histogram is one whose vertical axis is calibrated in such
a way that the total area under the histogram's bars is equal to 1. The latter version
allows for a direct comparison between the sample distribution and the theoretical pdf
from which the data presumably came (recall Case Study 4.2.4). Figure 4.A.1.5 shows the
MINITAB syntax that produces the density-scaled histogram pictured in Figure 4.2.3.
MTB > set c1
DATA> 126 73 26 6 41 26 73 23 21 18 11 3 3 2 6 6 12 38
DATA> 6 65 68 41 38 50 37 94 16 40 77 91 23 51 20 18 61 12
DATA> end
MTB > set c2
DATA> 0 20 40 60 80 100 120 140
DATA> end
MTB > Histogram c1;
SUBC> Density;
SUBC> CutPoint c2;
SUBC> Bar;
SUBC> Type 1;
SUBC> Color 1.
[Histogram output: density (0 to 0.02) on the vertical axis; y from 0 to 140 on the horizontal axis.]
FIGURE 4.A.1.5
MINITAB Windows
There is a Windows version of MINITAB that is very convenient for doing many of the
data applications that will be discussed in the end-of-chapter appendices. The
point-and-click steps will be set off in boxes, like the one below showing the procedure
for constructing a histogram.
Histograms Using MINITAB Windows
1. Enter the data in C1.
2. Click on GRAPH, then on HISTOGRAM.
3. Type C1 in the GRAPH VARIABLES box.
4. Click on OK.
Appendix 4A2
ENDIX 4.A.2
A Proof of the Central limit Theorem
341
A PROOF OF THE CENTRAl UMrT THEOREM
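Proving Theorem 4.3.2 in its full generality is beyond the level of this text. However, we can establish a slightly weaker version of the result by assuming that the moment-generating function of each $W_i$ exists.

Motivating the derivation is the following lemma.

Lemma. Let $W_1, W_2, \ldots$ be a set of random variables such that $\lim_{n \to \infty} M_{W_n}(t) = M_W(t)$ for all $t$ in some interval about 0. Then $\lim_{n \to \infty} F_{W_n}(w) = F_W(w)$ for all $-\infty < w < \infty$.

To prove the central limit theorem using moment-generating functions requires showing that
$$\lim_{n \to \infty} M_{(W_1 + \cdots + W_n - n\mu)/(\sqrt{n}\,\sigma)}(t) = M_Z(t) = e^{t^2/2}$$
For notational simplicity, let
$$\frac{W_1 + \cdots + W_n - n\mu}{\sqrt{n}\,\sigma} = \frac{S_1 + \cdots + S_n}{\sqrt{n}}$$
where $S_i = (W_i - \mu)/\sigma$. Notice that $E(S_i) = 0$ and $\mathrm{Var}(S_i) = 1$. Moreover, from Theorem 3.12.3,
$$M_{(S_1 + \cdots + S_n)/\sqrt{n}}(t) = \left[M\!\left(\frac{t}{\sqrt{n}}\right)\right]^n$$
where $M(t)$ denotes the moment-generating function common to each of the $S_i$'s.

By virtue of the way the $S_i$'s are defined, $M(0) = 1$, $M^{(1)}(0) = E(S_i) = 0$, and $M^{(2)}(0) = \mathrm{Var}(S_i) = 1$. Applying Taylor's theorem, then, to $M(t)$, we can write
$$M(t) = 1 + M^{(1)}(0)\,t + \frac{1}{2} M^{(2)}(r)\,t^2 = 1 + \frac{1}{2} t^2 M^{(2)}(r)$$
for some number $r$, $|r| < |t|$. Thus
$$\lim_{n \to \infty} \left[M\!\left(\frac{t}{\sqrt{n}}\right)\right]^n = \lim_{n \to \infty} \left[1 + \frac{t^2}{2n} M^{(2)}(s)\right]^n, \quad |s| < \frac{|t|}{\sqrt{n}}$$
$$= \exp \lim_{n \to \infty} n \ln\left[1 + \frac{t^2}{2n} M^{(2)}(s)\right]$$
$$= \exp \lim_{n \to \infty} \frac{t^2}{2} M^{(2)}(s) \cdot \frac{\ln\left[1 + \frac{t^2}{2n} M^{(2)}(s)\right] - \ln(1)}{\frac{t^2}{2n} M^{(2)}(s)}$$
The existence of $M(t)$ implies the existence of all its derivatives. In particular, $M^{(3)}(t)$ exists, so $M^{(2)}(t)$ is continuous. Therefore, $\lim_{t \to 0} M^{(2)}(t) = M^{(2)}(0) = 1$. Since $|s| < |t|/\sqrt{n}$, $s \to 0$ as $n \to \infty$, so
$$\lim_{n \to \infty} M^{(2)}(s) = M^{(2)}(0) = 1$$
Also, as $n \to \infty$, the quantity $(t^2/2n) M^{(2)}(s) \to 0 \cdot 1 = 0$, so it plays the role of "$\Delta x$" in the definition of the derivative. Hence we obtain
$$\lim_{n \to \infty} \left[M\!\left(\frac{t}{\sqrt{n}}\right)\right]^n = \exp\left[\frac{t^2}{2} \cdot 1 \cdot \ln^{(1)}(1)\right] = e^{t^2/2}$$
Since this last expression is the moment-generating function for a standard normal random variable, the theorem is proved.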
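The limit at the heart of the proof can be checked numerically for one concrete choice of the $S_i$'s. The sketch below is only an illustration, not part of the proof: it assumes each $W_i$ is exponential with rate 1 (so that $S_i = W_i - 1$ has mean 0 and variance 1), and the function names are made up for this example.

```python
import math

def mgf_standardized_exponential(t):
    """MGF of S = W - 1, where W is exponential with rate 1 (assumed example).

    E(S) = 0 and Var(S) = 1; the formula is valid for t < 1.
    """
    return math.exp(-t) / (1.0 - t)

def mgf_of_scaled_sum(t, n):
    """MGF of (S_1 + ... + S_n)/sqrt(n), which equals [M(t/sqrt(n))]^n."""
    return mgf_standardized_exponential(t / math.sqrt(n)) ** n

t = 0.5
limit = math.exp(t * t / 2.0)  # standard normal MGF evaluated at t

# As n grows, [M(t/sqrt(n))]^n should approach exp(t^2/2).
for n in (1, 10, 100, 10_000):
    print(n, mgf_of_scaled_sum(t, n), limit)
```

Any other distribution with a moment-generating function near 0 could be substituted for the exponential; only `mgf_standardized_exponential` would need to change.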
CHAPTER 5
Estimation

5.1 INTRODUCTION
5.2 ESTIMATING PARAMETERS: THE METHOD OF MAXIMUM LIKELIHOOD AND THE METHOD OF MOMENTS
5.3 INTERVAL ESTIMATION
5.4 PROPERTIES OF ESTIMATORS
5.5 MINIMUM-VARIANCE ESTIMATORS: THE CRAMER-RAO LOWER BOUND
5.6 SUFFICIENT ESTIMATORS
5.7 CONSISTENCY
5.8 BAYESIAN ESTIMATION
5.9 TAKING A SECOND LOOK AT STATISTICS (REVISITING THE MARGIN OF ERROR)
APPENDIX 5.A.1 MINITAB APPLICATIONS
A towering figure in the development of both applied and mathematical statistics, Fisher took formal training in mathematics and theoretical physics, graduating from Cambridge in 1912. After a brief career as a teacher, he accepted a post in 1919 as statistician at the Rothamsted Experimental Station. There the day-to-day problems he encountered in collecting and interpreting agricultural data led directly to much of his most important work in the theory of estimation and experimental design. Fisher was also a prominent geneticist and devoted considerable time to the development of a quantitative argument that would support Darwin's theory of natural selection. He returned to academia in 1933, succeeding Karl Pearson as the Galton Professor of Eugenics at the University of London. Fisher was knighted in 1952.
-Ronald Aylmer Fisher (1890-1962)
5.1 INTRODUCTION
The ability of probability functions to describe, or model, experimental data was demonstrated in numerous examples in Chapter 4. In Section 4.2, for example, the Poisson distribution was shown to predict very well the number of alpha emissions from a radioactive source as well as the number of fumbles by a college football team. In Section 4.3 another probability model, the normal curve, was applied to phenomena as diverse as breath analyzer readings and IQ scores. Other models illustrated in Chapter 4 included the exponential, negative binomial, and gamma distributions.

All of these probability functions, of course, are actually families of models in the sense that each includes one or more parameters. The Poisson model, for instance, is indexed by the occurrence rate, $\lambda$. Changing $\lambda$ changes the probabilities associated with $p_X(k)$ [see Figure 5.1.1, which compares $p_X(k) = e^{-\lambda}\lambda^k/k!$, $k = 0, 1, 2, \ldots$ for $\lambda = 1$ and $\lambda = 4$]. Similarly, the binomial model is defined in terms of the success probability $p$; the normal distribution, by the two parameters $\mu$ and $\sigma$.

Before any of these models can be applied, values need to be assigned to their parameters. Typically, this is done by taking a random sample (of $n$ observations) and using those measurements to estimate the unknown parameter(s).
[Two panels plotting $p_X(k)$ against $k$: $\lambda = 1$ (left) and $\lambda = 4$ (right)]
FIGURE 5.1.1
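The effect of changing $\lambda$ can also be seen by tabulating the two probability functions directly. The following is a minimal sketch; the helper name `poisson_pmf` is chosen for this illustration only.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability p_X(k) = e^(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Tabulate p_X(k) for the two occurrence rates compared in Figure 5.1.1.
for lam in (1, 4):
    probabilities = [round(poisson_pmf(k, lam), 3) for k in range(9)]
    print(f"lambda = {lam}: {probabilities}")
```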
EXAMPLE 5.1.1
Imagine being handed a coin whose probability, $p$, of coming up heads is unknown. Your assignment is to toss the coin three times and use the resulting sequence of H's and T's to suggest a value for $p$. Suppose the sequence of three tosses turns out to be HHT. Based on those outcomes, what can be reasonably inferred about $p$?

Start by defining the random variable $X$ to be the number of heads on a given toss. Then
$$X = \begin{cases} 1 & \text{if a toss comes up heads} \\ 0 & \text{if a toss comes up tails} \end{cases}$$
and the theoretical probability model for $X$ is the function
$$p_X(k) = p^k (1 - p)^{1-k} = \begin{cases} p & \text{for } k = 1 \\ 1 - p & \text{for } k = 0 \end{cases}$$
Expressed in terms of $X$, the sequence HHT corresponds to a sample of size $n = 3$, where $X_1 = 1$, $X_2 = 1$, and $X_3 = 0$.
Since the $X_i$'s are independent random variables, the probability associated with the sample is $p^2(1 - p)$:
$$P(X_1 = 1 \cap X_2 = 1 \cap X_3 = 0) = P(X_1 = 1) \cdot P(X_2 = 1) \cdot P(X_3 = 0) = p^2(1 - p)$$
Knowing that our objective is to identify a plausible value (i.e., an "estimate") for $p$, it could be argued that a reasonable choice for that parameter would be the value that maximizes the probability of the sample. Figure 5.1.2 shows $P(X_1 = 1, X_2 = 1, X_3 = 0)$ as a function of $p$. By inspection, we see that the value that maximizes the probability of HHT is $p = \frac{2}{3}$.
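The same conclusion can be reached without a graph by evaluating the probability of HHT over a fine grid of candidate values for $p$. This is a rough sketch; the grid spacing of 0.001 and the function name are arbitrary choices for illustration.

```python
def likelihood_hht(p):
    """Probability of observing H, H, T when P(heads) = p."""
    return p**2 * (1 - p)

# Evaluate the probability on a grid of candidate p values in [0, 1]
# and keep the candidate that makes the observed sample most probable.
grid = [i / 1000 for i in range(1001)]
p_best = max(grid, key=likelihood_hht)

print(p_best, likelihood_hht(p_best))
```

The grid maximizer agrees with $\frac{2}{3}$ to within the grid spacing.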
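More generally, suppose we toss the coin $n$ times and record a set of outcomes $X_1 = k_1$, $X_2 = k_2$, ..., and $X_n = k_n$. Then
$$P(X_1 = k_1, \ldots, X_n = k_n) = p^{k_1}(1 - p)^{1-k_1} \cdots p^{k_n}(1 - p)^{1-k_n} = p^{\sum_{i=1}^{n} k_i}(1 - p)^{n - \sum_{i=1}^{n} k_i}$$

[Plot of $p^2(1 - p)$ against $p$, with the maximum marked at $p = \frac{2}{3}$]
FIGURE 5.1.2

The value of $p$ that maximizes $P(X_1 = k_1, \ldots, X_n = k_n)$ is, of course, the value for which the derivative of $p^{\sum k_i}(1 - p)^{n - \sum k_i}$ with respect to $p$ is 0. But
$$\frac{d}{dp}\left[p^{\sum_{i=1}^{n} k_i}(1 - p)^{n - \sum_{i=1}^{n} k_i}\right] = \left(\sum_{i=1}^{n} k_i\right)\left[p^{\sum_{i=1}^{n} k_i - 1}(1 - p)^{n - \sum_{i=1}^{n} k_i}\right] + \left(\sum_{i=1}^{n} k_i - n\right)\left[p^{\sum_{i=1}^{n} k_i}(1 - p)^{n - \sum_{i=1}^{n} k_i - 1}\right] \quad (5.1.1)$$
If the derivative is set equal to zero, Equation 5.1.1 reduces to
$$\left(\sum_{i=1}^{n} k_i\right)(1 - p) + \left(\sum_{i=1}^{n} k_i - n\right)p = 0$$
Solving for $p$ identifies
$$\left(\frac{1}{n}\right)\sum_{i=1}^{n} k_i$$
as the value of the parameter that is most consistent with the $n$ observations $k_1, k_2, \ldots, k_n$.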
Comment. Any function of a random sample whose objective is to approximate a parameter is called a statistic, or an estimator. If $\theta$ is the parameter being approximated, its estimator will be denoted $\hat{\theta}$. When an estimator is evaluated (by substituting the actual measurements recorded), the resulting number is called an estimate. In Example 5.1.1, the function $\left(\frac{1}{n}\right)\sum_{i=1}^{n} X_i$ is an estimator for $p$; the value $\frac{2}{3}$ that is calculated when the $n = 3$ observations are $X_1 = 1$, $X_2 = 1$, and $X_3 = 0$ is an estimate of $p$. More specifically, $\left(\frac{1}{n}\right)\sum_{i=1}^{n} X_i$ is a maximum likelihood estimator (for $p$) and $\frac{2}{3}$ $\left[= \left(\frac{1}{n}\right)\sum_{i=1}^{n} k_i = \left(\frac{1}{3}\right)(2)\right]$ is a maximum likelihood estimate (for $p$).

In this chapter, we look at some of the practical, as well as the mathematical, issues involved in the problem of estimating parameters. How is the functional form of an estimator determined? What statistical properties does a given estimator have? What properties would we like an estimator to have? As we answer these questions, our focus will begin to shift away from the study of probability and toward the study of statistics.
5.2 ESTIMATING PARAMETERS: THE METHOD OF MAXIMUM LIKELIHOOD AND THE METHOD OF MOMENTS
Suppose $Y_1, Y_2, \ldots, Y_n$ is a random sample from a continuous pdf $f_Y(y)$, whose unknown parameter is $\theta$. [Note: To emphasize that our focus is on the parameter, we will identify continuous pdf's in this chapter as $f_Y(y; \theta)$; similarly, discrete probability models with an unknown parameter $\theta$ will be denoted $p_X(k; \theta)$.] The question is, how should we use the data to approximate $\theta$?

In Example 5.1.1, we saw that the parameter $p$ in the discrete probability model $p_X(k; p) = p^k(1 - p)^{1-k}$, $k = 0, 1$ could reasonably be estimated by the function $\left(\frac{1}{n}\right)\sum_{i=1}^{n} k_i$, based on the random sample $X_1 = k_1$, $X_2 = k_2$, ..., $X_n = k_n$. How would the form of the estimate change if the data came from, say, an exponential distribution? Or a geometric distribution?

In this section we introduce two techniques for finding estimates: the method of maximum likelihood and the method of moments. Others are available, but these are the two that are the most widely used. Often, but not always, they give the same answer.
The Method of Maximum Likelihood
The basic idea behind maximum likelihood estimation is the rationale that was appealed to in Example 5.1.1. That is, it seems reasonable to choose as the estimate for $\theta$ that value of the parameter that maximizes the "likelihood" of the sample. The latter is measured by a likelihood function, which is simply the product of the underlying pdf evaluated for each of the data points. In Example 5.1.1, the likelihood function for the sample HHT (i.e., for $X_1 = 1$, $X_2 = 1$, and $X_3 = 0$) is the product $p^2(1 - p)$.
Definition 5.2.1. Let $k_1, k_2, \ldots, k_n$ be a random sample of size $n$ from the discrete pdf $p_X(k; \theta)$, where $\theta$ is an unknown parameter. The likelihood function, $L(\theta)$, is the product of the pdf evaluated at the $n$ $k_i$'s. That is,
$$L(\theta) = \prod_{i=1}^{n} p_X(k_i; \theta)$$
If $y_1, y_2, \ldots, y_n$ is a random sample of size $n$ from a continuous pdf, $f_Y(y; \theta)$, where $\theta$ is an unknown parameter, the likelihood function is written
$$L(\theta) = \prod_{i=1}^{n} f_Y(y_i; \theta)$$
for a set of n random variables is a
interpreted
A joint pdf
function of
of those n
variables, either k}, k2, ... , kn or Yt. Y2, ... , Yn.
By contrast, L is a function of 0; it sbould not be considered a function of either the kiS
or Yi&'
Definition 5.2.2. Let L(O)
=
n PX(ki; 0) and L(8) = n !Y(Yi; 8) be the
n
II
1=1
;=1
.uA~,.uLUJV'U
functions corresponding to random samples ih k2. ...•
and Yl, Y2 •... , YII
from the
pdf px(k; 9) and continuous pdf fy(y; 8), respectively,
9 is
an unknown parameter. In each case, let Be be a value of the parameter such that
L(O,,) 2: L(8) for all possible values of B. Then ge is called a maximum likelihood
estimalf
9.
348
Chapter 5
Estimation
Applying the Method of Maximum Likelihood
We will see in Example 5.2.1 and many subsequent examples that finding the $\theta_e$ that maximizes a likelihood function is often an application of the calculus. Specifically, we solve the equation $\frac{d}{d\theta} L(\theta) = 0$ for $\theta$. In some cases, a more tractable equation results by setting the derivative of $\ln L(\theta)$ equal to 0. Since $\ln L(\theta)$ increases with $L(\theta)$, the same $\theta_e$ that maximizes $\ln L(\theta)$ also maximizes $L(\theta)$.
EXAMPLE 5.2.1
Suppose that $X_1 = 3$, $X_2 = 2$, $X_3 = 1$, and $X_4 = 3$ is a set of four independent observations representing the geometric probability model, $p_X(k) = (1 - p)^{k-1}p$, $k = 1, 2, 3, \ldots$ Find the maximum likelihood estimate for $p$.

According to Definition 5.2.1,
$$L(p) = [(1 - p)^{3-1}p][(1 - p)^{2-1}p][(1 - p)^{1-1}p][(1 - p)^{3-1}p] = (1 - p)^5 p^4$$
Then $\ln L(p) = 5 \ln(1 - p) + 4 \ln p$. Differentiating $\ln L(p)$ with respect to $p$ gives
$$\frac{d \ln L(p)}{dp} = -\frac{5}{1 - p} + \frac{4}{p}$$
To find the $p$ that maximizes $L(p)$, we set the derivative equal to zero. Here, $-\frac{5}{1 - p} + \frac{4}{p} = 0$ implies that $-5p + 4(1 - p) = 0$, and the solution to the latter is $p = \frac{4}{9}$.

Notice, also, that the second derivative of $\ln L(p)$ $\left(= -\frac{5}{(1 - p)^2} - \frac{4}{p^2}\right)$ is negative for all $0 < p < 1$, so $p = \frac{4}{9}$ is, indeed, a true maximum of the likelihood function. (Following the notation introduced in Definition 5.2.2, the number $\frac{4}{9}$ is called the maximum likelihood estimate for $p$, and we would write $p_e = \frac{4}{9}$.)
Comment. There is a better way to answer the question posed in Example 5.2.1. Rather than evaluate, and differentiate, the likelihood function for the particular sample observed (in this case, the four observations 3, 2, 1, and 3), we can get a more informative answer by considering the more general problem of taking a random sample of size $n$ from $p_X(k) = (1 - p)^{k-1}p$ and using the outcomes $X_1 = k_1$, $X_2 = k_2$, ..., $X_n = k_n$ to find a formula for the maximum likelihood estimate.

For the geometric pdf, the likelihood function based on such a sample would be written
$$L(p) = \prod_{i=1}^{n} (1 - p)^{k_i - 1} p = (1 - p)^{\sum_{i=1}^{n} k_i - n}\, p^n$$
As was the case in Example 5.2.1, it will be easier to work with $\ln L(p)$ than $L(p)$. Here,
$$\ln L(p) = \left(\sum_{i=1}^{n} k_i - n\right) \ln(1 - p) + n \ln p$$
and
$$\frac{d \ln L(p)}{dp} = \frac{n - \sum_{i=1}^{n} k_i}{1 - p} + \frac{n}{p}$$
Setting the derivative equal to 0 gives
$$p\left(n - \sum_{i=1}^{n} k_i\right) + (1 - p)n = 0$$
which implies that
$$p_e = \frac{n}{\sum_{i=1}^{n} k_i}$$
(Reassuringly, for the particular sample assumed in Example 5.2.1, where $n = 4$ and $\sum_{i=1}^{4} k_i = 3 + 2 + 1 + 3 = 9$, the formula just derived reduces to the maximum likelihood estimate of $\frac{4}{9}$ that we found at the outset.)
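The closed-form result $n / \sum k_i$ can be compared with a direct numerical maximization of the geometric log-likelihood. The sketch below uses a simple grid search rather than calculus; the function name and the grid resolution are assumptions made for illustration.

```python
import math

def geometric_log_likelihood(p, ks):
    """ln L(p) = (sum(k_i) - n) ln(1 - p) + n ln p for the geometric model."""
    n = len(ks)
    return (sum(ks) - n) * math.log(1 - p) + n * math.log(p)

ks = [3, 2, 1, 3]  # the four observations from Example 5.2.1

# Closed-form maximum likelihood estimate: n / sum(k_i).
p_closed = len(ks) / sum(ks)

# Brute-force check: maximize ln L(p) over a grid strictly inside (0, 1).
grid = [i / 10_000 for i in range(1, 10_000)]
p_grid = max(grid, key=lambda p: geometric_log_likelihood(p, ks))

print(p_closed, p_grid)
```

Because the formula depends on the data only through $n$ and $\sum k_i$, the same two lines give the estimate for any other geometric sample.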
Comment. Implicit in Example 5.2.1 and the Comment that followed is the important distinction between a maximum likelihood estimate and a maximum likelihood estimator. The first is a number (or refers to a number); the second is a random variable (recall the Comment on p. 346).

Both $\frac{4}{9}$ and the formula $\frac{n}{\sum_{i=1}^{n} k_i}$ are maximum likelihood estimates (for $p$) and would be denoted $p_e$, because both are numerical constants. In the first case, the actual values of the $k_i$'s are provided and $p_e$ $\left(= \frac{4}{9}\right)$ can be calculated. In the second case, the $k_i$'s are not identified but they are constants nonetheless.

If, on the other hand, we imagine the measurements before they are recorded, that is, as the random variables $X_1, X_2, \ldots, X_n$, then the formula $\frac{n}{\sum_{i=1}^{n} k_i}$ is more properly written as the quotient
$$\frac{n}{\sum_{i=1}^{n} X_i}$$
The latter, a random variable, is the maximum likelihood estimator (for $p$) and would be denoted $\hat{p}$. Maximum likelihood estimators, such as $\hat{p}$, have pdfs, expected values, and variances; maximum likelihood estimates, such as $p_e$, have none of those statistical properties.
EXAMPLE 5.2.2
An experimenter has reason to believe that the pdf describing the variability in a certain type of measurement is the continuous model
$$f_Y(y; \theta) = \frac{1}{\theta^2}\, y e^{-y/\theta}, \quad 0 < y < \infty; \quad 0 < \theta < \infty$$
Five data points have been collected: 9.2, 5.6, 18.4, 12.1, and 10.7. Find the maximum likelihood estimate for $\theta$.

Following the advice given in the Comment on p. 348, we begin by deriving a general formula for $\theta_e$, that is, by assuming that the data are the $n$ observations $y_1, y_2, \ldots, y_n$. The likelihood function, then, becomes
$$L(\theta) = \prod_{i=1}^{n} \frac{1}{\theta^2}\, y_i e^{-y_i/\theta} = \theta^{-2n}\left(\prod_{i=1}^{n} y_i\right) e^{-(1/\theta)\sum_{i=1}^{n} y_i}$$
and
$$\ln L(\theta) = -2n \ln \theta + \ln \prod_{i=1}^{n} y_i - \frac{1}{\theta} \sum_{i=1}^{n} y_i$$
Setting the derivative of $\ln L(\theta)$ equal to 0 gives
$$\frac{d \ln L(\theta)}{d\theta} = -\frac{2n}{\theta} + \frac{1}{\theta^2} \sum_{i=1}^{n} y_i = 0$$
which implies that
$$\theta_e = \frac{1}{2n} \sum_{i=1}^{n} y_i$$
The final step is to evaluate numerically the formula for $\theta_e$. Substituting the actual $n = 5$ sample values recorded gives $\sum_{i=1}^{5} y_i = 9.2 + 5.6 + 18.4 + 12.1 + 10.7 = 56.0$, so
$$\theta_e = \frac{1}{2(5)}(56.0) = 5.6$$
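The arithmetic in this example, together with a quick check that the log-likelihood really is lower on either side of the estimate, can be sketched as follows. The function name is illustrative only.

```python
import math

def log_likelihood(theta, ys):
    """ln L(theta) = -2n ln(theta) + sum(ln y_i) - (1/theta) sum(y_i)."""
    n = len(ys)
    return -2 * n * math.log(theta) + sum(math.log(y) for y in ys) - sum(ys) / theta

ys = [9.2, 5.6, 18.4, 12.1, 10.7]  # the five data points in Example 5.2.2

# Closed-form maximum likelihood estimate: (1 / 2n) * sum(y_i).
theta_e = sum(ys) / (2 * len(ys))
print(theta_e)

# ln L should be smaller at nearby values of theta on both sides.
for theta in (theta_e - 0.5, theta_e, theta_e + 0.5):
    print(theta, log_likelihood(theta, ys))
```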
Using Order Statistics as Maximum Likelihood Estimates
Situations exist for which the equations $\frac{dL(\theta)}{d\theta} = 0$ or $\frac{d \ln L(\theta)}{d\theta} = 0$ are not meaningful and neither will yield a solution for $\theta_e$. These occur when the range of the pdf from which the data are drawn is a function of the parameter being estimated. (This happens, for instance, when the sample of $y_i$'s come from the uniform pdf, $f_Y(y; \theta) = 1/\theta$, $0 \leq y \leq \theta$.) The maximum likelihood estimates in these cases will be an order statistic, typically either $y_{\min}$ or $y_{\max}$.
EXAMPLE 5.2.3
Suppose $y_1, y_2, \ldots, y_n$ is a set of measurements representing an exponential pdf with $\lambda = 1$ but with an unknown "threshold" parameter, $\theta$. That is,
$$f_Y(y; \theta) = e^{-(y - \theta)}, \quad y \geq \theta; \quad \theta > 0$$
(see Figure 5.2.1). Find the maximum likelihood estimate for $\theta$.

FIGURE 5.2.1

Proceeding in the usual fashion, we start by deriving an expression for the likelihood function:
$$L(\theta) = \prod_{i=1}^{n} e^{-(y_i - \theta)} = e^{-\sum_{i=1}^{n} y_i + n\theta}$$
Here, finding $\theta_e$ by solving the equation $\frac{d \ln L(\theta)}{d\theta} = 0$ will not work because $\frac{d \ln L(\theta)}{d\theta} = \frac{d}{d\theta}\left(-\sum_{i=1}^{n} y_i + n\theta\right) = n$. Instead, we need to look at the likelihood function directly.

Notice that $L(\theta) = e^{-\sum_{i=1}^{n} y_i + n\theta}$ is maximized when the exponent of $e$ is maximized. But for given $y_1, y_2, \ldots, y_n$ (and $n$), making $-\sum_{i=1}^{n} y_i + n\theta$ as large as possible requires that $\theta$ be as large as possible. Figure 5.2.1 shows how large $\theta$ can be: It can be moved to the right only as far as the smallest order statistic. Any value of $\theta$ larger than $y_{\min}$ would violate the condition on $f_Y(y; \theta)$ that $y \geq \theta$. Therefore, $\theta_e = y_{\min}$.
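The argument can be illustrated by evaluating the likelihood directly, treating it as zero whenever $\theta$ exceeds the smallest observation. The four data values below are hypothetical, invented only to make the sketch runnable; they do not come from the text.

```python
import math

def likelihood(theta, ys):
    """L(theta) for the threshold exponential pdf e^-(y - theta), y >= theta.

    If theta exceeds any observation, that observation has density 0,
    so the whole product is 0.
    """
    if theta > min(ys):
        return 0.0
    return math.exp(-sum(ys) + len(ys) * theta)

ys = [2.7, 1.4, 3.9, 2.1]  # hypothetical sample, for illustration only

# L(theta) increases with theta until theta passes the smallest observation.
grid = [i / 100 for i in range(1, 501)]
theta_best = max(grid, key=lambda theta: likelihood(theta, ys))

print(theta_best, min(ys))
```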
CASE STUDY 5.2.1
"What are you majoring in?" may be the most common question asked of a college student. For some, the answer is simple: Having decided on a field of study, they doggedly stay with it all the way to graduation. For many, though, the path is not so straight. Premeds losing the battle with organic chemistry and engineers unable to appreciate the joy of secants may find their roads to commencement taking a few detours.

Listed in the first two columns of Table 5.2.1 are the results of a "major" poll conducted at the University of West Florida (114). Recorded for each of 356 upperclassmen was the number of times, $X$, that he or she had switched majors.

Based on the nature of these data, it would not be unreasonable to hypothesize that $X$ has a Poisson distribution (recall the law of small numbers discussed in Section 4.2). Do the actual frequencies support that contention?

TABLE 5.2.1
Number of Major Changes    Observed Frequency    Expected Frequency
0                          237                   230.4
1                           90                   100.2
2                           22                    21.8
3                            7                     3.6
Total                      356                   356.0
To see if $p_X(k) = e^{-\lambda}\lambda^k/k!$ can provide an adequate fit to these data requires that we first find an estimate for $\lambda$. Given that the observations are $X_1 = k_1$, $X_2 = k_2$, ..., and $X_n = k_n$,
$$L(\lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{k_i}}{k_i!} = \frac{e^{-n\lambda}\lambda^{\sum_{i=1}^{n} k_i}}{\prod_{i=1}^{n} k_i!}$$
$$\ln L(\lambda) = -n\lambda + \left(\sum_{i=1}^{n} k_i\right) \ln \lambda - \ln \prod_{i=1}^{n} k_i!$$
and
$$\frac{d \ln L(\lambda)}{d\lambda} = -n + \frac{\sum_{i=1}^{n} k_i}{\lambda}$$
Setting the derivative equal to zero shows that the maximum likelihood estimate for $\lambda$ is the sample mean:
$$\lambda_e = \frac{1}{n} \sum_{i=1}^{n} k_i \quad (5.2.1)$$
According to the information in Table 5.2.1, 237 of the $k_i$'s were equal to zero, ninety were equal to one, and so on. Substituting into Equation 5.2.1, then,
$$\lambda_e = \frac{1}{356}[237 \cdot 0 + 90 \cdot 1 + 22 \cdot 2 + 7 \cdot 3] = 0.435$$
so the specific model being proposed is
$$p_X(k) = \frac{e^{-0.435}(0.435)^k}{k!}, \quad k = 0, 1, 2, \ldots$$
The corresponding expected frequencies $[= 356 \cdot p_X(k)]$ for each value of $X$ are listed in column 3 of Table 5.2.1. Agreement with the observed frequencies appears to be quite good. Our conclusion would be that nothing in these data rules out using the Poisson as a model. [Formal procedures, known as goodness-of-fit tests, have been developed for assessing the agreement (or lack of agreement) between a set of observed and expected frequencies. These will be taken up in Chapter 10.]
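The estimate and the expected frequencies in this case study can be recomputed in a few lines. The sketch below makes two assumptions worth flagging: it rounds $\lambda_e$ to three decimals, as the text does, and it treats the last category as "3 or more" so that the expected frequencies sum to 356. Small differences from the printed table in the last decimal place are possible.

```python
import math

# Observed frequencies from Table 5.2.1: number of major changes -> count.
observed = {0: 237, 1: 90, 2: 22, 3: 7}

n = sum(observed.values())
lam_exact = sum(k * count for k, count in observed.items()) / n
lam_e = round(lam_exact, 3)  # the text works with 0.435

def poisson_pmf(k, lam):
    """Poisson probability e^(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Expected frequencies n * p_X(k) for k = 0, 1, 2; the final cell is taken
# to be "3 or more" (an assumption) so that the column totals n.
expected = [n * poisson_pmf(k, lam_e) for k in range(3)]
expected.append(n - sum(expected))

for k, (obs, exp) in enumerate(zip(observed.values(), expected)):
    print(k, obs, round(exp, 1))
```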
Finding Maximum Likelihood Estimates When More Than One Parameter Is Unknown
If a family of probability models is indexed by two or more unknown parameters, say, $\theta_1, \theta_2, \ldots, \theta_k$, finding maximum likelihood estimates for the $\theta_i$'s requires the solution of a set of $k$ simultaneous equations. If $k = 2$, for example, we would typically need to solve the system
$$\frac{\partial \ln L(\theta_1, \theta_2)}{\partial \theta_1} = 0$$
$$\frac{\partial \ln L(\theta_1, \theta_2)}{\partial \theta_2} = 0$$
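EXAMPLE 5.2.4
Suppose a random sample of size $n$ is drawn from the two-parameter normal pdf,
$$f_Y(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\frac{(y - \mu)^2}{\sigma^2}}, \quad -\infty < y < \infty; \quad -\infty < \mu < \infty; \quad \sigma^2 > 0$$
Use the method of maximum likelihood to find $\mu_e$ and $\sigma^2_e$.

We start by finding $L(\mu, \sigma^2)$ and $\ln L(\mu, \sigma^2)$:
$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\frac{(y_i - \mu)^2}{\sigma^2}} = (2\pi\sigma^2)^{-n/2}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2}$$
and
$$\ln L(\mu, \sigma^2) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2} \cdot \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2$$
Moreover,
$$\frac{\partial \ln L(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \mu)$$
and
$$\frac{\partial \ln L(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2} \cdot \frac{1}{2\pi\sigma^2} \cdot 2\pi - \frac{1}{2} \sum_{i=1}^{n} (y_i - \mu)^2 \left(-\frac{1}{\sigma^4}\right)$$
Setting the two derivatives equal to zero gives the equations
$$\sum_{i=1}^{n} (y_i - \mu) = 0 \quad (5.2.2)$$
and
$$-n\sigma^2 + \sum_{i=1}^{n} (y_i - \mu)^2 = 0 \quad (5.2.3)$$
Equation 5.2.2 simplifies to
$$\sum_{i=1}^{n} y_i = n\mu$$
which implies that $\mu_e = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}$. Substituting $\mu_e$, then, into Equation 5.2.3 gives
$$-n\sigma^2 + \sum_{i=1}^{n} (y_i - \bar{y})^2 = 0$$
or
$$\sigma^2_e = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2$$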
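The two formulas translate directly into code. The sample below is hypothetical and serves only to show the computation; the comparison with `statistics.pvariance` is included because that standard-library function also divides by $n$ rather than $n - 1$.

```python
import statistics

ys = [4.1, 5.3, 6.0, 4.6]  # hypothetical sample, for illustration only
n = len(ys)

# Maximum likelihood estimate of the mean: the sample mean.
mu_e = sum(ys) / n

# Maximum likelihood estimate of the variance: divide by n, not n - 1.
sigma2_e = sum((y - mu_e) ** 2 for y in ys) / n

print(mu_e, sigma2_e)
print(statistics.pvariance(ys))  # should agree with sigma2_e
```

Note that the more familiar sample variance, which divides by $n - 1$, would give a slightly larger number for the same data.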
Comment. The method of maximum likelihood has a long history: Daniel Bernoulli was using it as early as 1777 (136). It was Ronald Fisher, though, in the early years of the twentieth century, who first studied the mathematical properties of likelihood estimation in any detail, and the procedure is often credited to him.
QUESTIONS
5.2.1. A random sample of size 8 ($X_1 = 1$, $X_2 = 0$, $X_3 = 1$, $X_4 = 1$, $X_5 = 0$, $X_6 = 1$, $X_7 = 1$, and $X_8 = 0$) is taken from the probability function
$$p_X(k; \theta) = \theta^k(1 - \theta)^{1-k}, \quad k = 0, 1; \quad 0 < \theta < 1$$
Find the maximum likelihood estimate for $\theta$.
5.2.2. The number of red chips and white chips in an urn is unknown, but the proportion, $p$, of reds is either $\frac{1}{3}$ or $\frac{1}{2}$. A sample of size 5, drawn with replacement, yields the sequence red, white, white, red, and white. What is the maximum likelihood estimate for $p$?
5.2.3. Use the sample $Y_1 = 8.2$, $Y_2 = 9.1$, $Y_3 = 10.6$, and $Y_4 = 4.9$ to calculate the maximum likelihood estimate for $\lambda$ in the exponential pdf
$$f_Y(y; \lambda) = \lambda e^{-\lambda y}, \quad y \geq 0$$
5.2.4. Suppose a random sample of size $n$ is drawn from the probability model
$$p_X(k; \theta) = \frac{\theta^{2k} e^{-\theta^2}}{k!}, \quad k = 0, 1, 2, \ldots$$
Find a formula for the maximum likelihood estimator, $\hat{\theta}$.
5.2.5. Given that $Y_1 = 2.3$, $Y_2 = 1.9$, and $Y_3 = 4.6$ is a random sample from
$$f_Y(y; \theta) = \frac{y^3 e^{-y/\theta}}{6\theta^4}, \quad y \geq 0$$
calculate the maximum likelihood estimate for $\theta$.
5.2.6. Use the method of maximum likelihood to estimate $\theta$ in the pdf
$$f_Y(y; \theta) = \frac{\theta}{2\sqrt{y}}\, e^{-\theta\sqrt{y}}, \quad y > 0$$
Evaluate $\theta_e$ for the following random sample of size 4: $Y_1 = 6.2$, $Y_2 = 7.0$, $Y_3 = 2.5$, and $Y_4 = 4.2$.
5.2.7. An engineer is creating a project scheduling program and recognizes that the tasks making up the project are not always completed on time. However, the completion proportion tends to be fairly high. To reflect this condition, he uses the pdf
[...]
where $y$ is the proportion of the task completed. Suppose in his previous project, the proportions of tasks completed were 0.77, 0.82, 0.92, 0.94, and 0.98. Estimate $\theta$.
5.2.8. The following data show the number of occupants in passenger cars observed during one hour at a busy intersection in Los Angeles (68). Suppose it can be assumed that these data follow a geometric distribution, $p_X(k; p) = (1 - p)^{k-1}p$, $k = 1, 2, \ldots$ Estimate $p$ and compare the observed and expected frequencies for each value of $X$.

Number of Occupants    Frequency
1                      678
2                      227
3                       56
4                       28
5                        8
6+                      14
Total                 1011
5.2.9. (a) Based on the random sample $Y_1 = $ [...], $Y_2 = 1.8$, $Y_3 = 14.2$, and $Y_4 = 7.6$, use the method of maximum likelihood to estimate the parameter $\theta$ in the uniform pdf
$$f_Y(y; \theta) = \frac{1}{\theta}, \quad 0 \leq y \leq \theta$$
(b) Suppose the random sample in Part (a) represents the two-parameter uniform pdf
$$f_Y(y; \theta_1, \theta_2) = \frac{1}{\theta_2 - \theta_1}, \quad \theta_1 \leq y \leq \theta_2$$
Find the maximum likelihood estimates for $\theta_1$ and $\theta_2$.
5.2.10. Find the maximum likelihood estimate for $\theta$ in the pdf
fr(Y; 0)
= -'----=
1 -
if a random sample of size 6 yielded the measurements 0.70, 0.63, 0.92, 0.86, 0.43, and
0.21.
5.2.11. A random sample of size $n$ is taken from the pdf
$$f_Y(y; \theta) = 2y\theta^2, \quad 0 \leq y \leq \frac{1}{\theta}$$
Find an expression for $\hat{\theta}$, the maximum likelihood estimator for $\theta$.
5.2.12. If the random variable $Y$ denotes an individual's income, Pareto's law claims that $P(Y \geq y) = \left(\frac{k}{y}\right)^{\theta}$, where $k$ is the entire population's minimum income. It follows that $F_Y(y) = 1 - \left(\frac{k}{y}\right)^{\theta}$, and, by differentiation,
$$f_Y(y; \theta) = \theta k^{\theta} \left(\frac{1}{y}\right)^{\theta + 1}, \quad y \geq k; \quad \theta \geq 1$$
Assume $k$ is known. Find the maximum likelihood estimator for $\theta$ if income information has been collected on a random sample of 25 individuals.
5.2.13. The exponential pdf is a measure of lifetimes of devices that do not age (see Question [...]). However, the exponential pdf is a special case of the Weibull distribution, which measures time to failure of devices where the probability of failure increases as time does. A Weibull random variable $Y$ has pdf
$$f_Y(y; \alpha, \beta) = \alpha\beta y^{\beta - 1} e^{-\alpha y^{\beta}}, \quad 0 \leq y, \quad 0 < \alpha, \quad 0 < \beta$$
(a) Find the maximum likelihood estimator for $\alpha$ assuming that $\beta$ is known.
(b) Suppose $\alpha$ and $\beta$ are both unknown. Write down the equations that would be solved simultaneously to find the maximum likelihood estimators of $\alpha$ and $\beta$.
5.2.14. Suppose a random sample of size $n$ is drawn from a normal pdf where the mean $\mu$ is known but the variance $\sigma^2$ is unknown. Use the method of maximum likelihood to find a formula for $\hat{\sigma}^2$. Compare your answer to the maximum likelihood estimator found in Example 5.2.4.
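The Method of Moments
A second procedure for estimating parameters is the method of moments. Proposed near the turn of the twentieth century by the great British statistician, Karl Pearson, the method of moments is often more tractable than the method of maximum likelihood in situations where the underlying probability model has multiple parameters.

Suppose that $Y$ is a continuous random variable and that its pdf is a function of $s$ unknown parameters, $\theta_1, \theta_2, \ldots, \theta_s$. The first $s$ moments of $Y$, if they exist, are given by the integrals
$$E(Y^j) = \int_{-\infty}^{\infty} y^j f_Y(y; \theta_1, \theta_2, \ldots, \theta_s)\, dy, \quad j = 1, 2, \ldots, s$$
In general, each $E(Y^j)$ will be a different function of the $s$ parameters. That is,
$$E(Y^1) = g_1(\theta_1, \theta_2, \ldots, \theta_s)$$
$$E(Y^2) = g_2(\theta_1, \theta_2, \ldots, \theta_s)$$
$$\vdots$$
$$E(Y^s) = g_s(\theta_1, \theta_2, \ldots, \theta_s)$$
Corresponding to each theoretical moment, $E(Y^j)$, is a sample moment, $\frac{1}{n} \sum_{i=1}^{n} y_i^j$. Intuitively, the $j$th sample moment is an approximation to the $j$th theoretical moment. Setting the two equal for each $j$ produces a system of $s$ simultaneous equations, the solutions to which are the desired set of estimates, $\theta_{1e}$, $\theta_{2e}$, ..., and $\theta_{se}$.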
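Definition 5.2.3. Let $y_1, y_2, \ldots, y_n$ be a random sample from the continuous pdf $f_Y(y; \theta_1, \theta_2, \ldots, \theta_s)$. The method of moments estimates, $\theta_{1e}$, $\theta_{2e}$, ..., and $\theta_{se}$, for the model's unknown parameters are the solutions of the $s$ simultaneous equations
$$\int_{-\infty}^{\infty} y\, f_Y(y; \theta_1, \theta_2, \ldots, \theta_s)\, dy = \frac{1}{n} \sum_{i=1}^{n} y_i$$
$$\int_{-\infty}^{\infty} y^2 f_Y(y; \theta_1, \theta_2, \ldots, \theta_s)\, dy = \frac{1}{n} \sum_{i=1}^{n} y_i^2$$
$$\vdots$$
$$\int_{-\infty}^{\infty} y^s f_Y(y; \theta_1, \theta_2, \ldots, \theta_s)\, dy = \frac{1}{n} \sum_{i=1}^{n} y_i^s$$
Note: If the underlying random variable is discrete with pdf $p_X(k; \theta_1, \theta_2, \ldots, \theta_s)$, the method of moments estimates are the solutions of the system of equations,
$$\sum_{\text{all } k} k^j p_X(k; \theta_1, \theta_2, \ldots, \theta_s) = \frac{1}{n} \sum_{i=1}^{n} k_i^j, \quad j = 1, 2, \ldots, s$$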
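EXAMPLE 5.2.5
Suppose that $Y_1 = 0.42$, $Y_2 = 0.10$, $Y_3 = 0.65$, and $Y_4 = 0.23$ is a random sample of size four from the pdf
$$f_Y(y; \theta) = \theta y^{\theta - 1}, \quad 0 \leq y \leq 1$$
Find the method of moments estimate for $\theta$.

Taking the same approach that was followed in finding maximum likelihood estimates, we will derive a general expression for the method of moments estimate before making any use of the four data points. Notice that only one equation needs to be solved because the pdf is indexed by just a single parameter.

The first theoretical moment of $Y$ is
$$E(Y) = \int_0^1 y \cdot \theta y^{\theta - 1}\, dy = \theta \cdot \frac{y^{\theta + 1}}{\theta + 1}\Bigg|_0^1 = \frac{\theta}{\theta + 1}$$
Setting $E(Y)$ equal to $\frac{1}{n} \sum_{i=1}^{n} y_i$ $(= \bar{y})$, the first sample moment, gives
$$\frac{\theta}{\theta + 1} = \bar{y}$$
which implies that the method of moments estimate for $\theta$ is
$$\theta_e = \frac{\bar{y}}{1 - \bar{y}}$$
Here, $\bar{y} = \frac{1}{4}(0.42 + 0.10 + 0.65 + 0.23) = 0.35$, so
$$\theta_e = \frac{0.35}{1 - 0.35} = 0.54$$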
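The calculation is short enough to verify by hand, but a sketch in code makes the two steps explicit: compute the first sample moment, then solve the single moment equation. The function name is chosen for this illustration.

```python
def method_of_moments_theta(ys):
    """Solve theta / (theta + 1) = ybar for theta, giving ybar / (1 - ybar)."""
    ybar = sum(ys) / len(ys)
    return ybar / (1 - ybar)

ys = [0.42, 0.10, 0.65, 0.23]  # the sample in Example 5.2.5
theta_e = method_of_moments_theta(ys)

print(round(theta_e, 2))

# Sanity check: the fitted model's mean, theta / (theta + 1), should
# reproduce the sample mean.
print(theta_e / (theta_e + 1), sum(ys) / len(ys))
```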
CASE STUDY 5.2.2
Although hurricanes generally strike only the eastern and southern coastal regions of the United States, they do occasionally sweep inland before completely dissipating. The U.S. Weather Bureau confirms that in the period from 1900 to 1969 a total of thirty-six hurricanes moved as far as the Appalachians. In Table 5.2.2 are listed the maximum twenty-four-hour precipitation levels recorded for those thirty-six storms during the time they were over the mountains (67).

Figure 5.2.2 shows the data's density-scaled histogram. Its skewed shape suggests that $Y$, the maximum twenty-four-hour precipitation associated with inland hurricanes, can be modeled by the two-parameter gamma pdf,
$$f_Y(y; r, \lambda) = \frac{\lambda^r}{\Gamma(r)}\, y^{r-1} e^{-\lambda y}, \quad y > 0$$
Use the method of moments to estimate $r$ and $\lambda$; then superimpose $f_Y(y; r_e, \lambda_e)$ on a graph of the density-scaled histogram of the 36 $y_i$'s.

From Theorem 4.6.3,
$$E(Y) = \frac{r}{\lambda}$$
and
$$\mathrm{Var}(Y) = \frac{r}{\lambda^2}$$
so
$$E(Y^2) = \mathrm{Var}(Y) + [E(Y)]^2 = \frac{r}{\lambda^2} + \frac{r^2}{\lambda^2} = \frac{r(r + 1)}{\lambda^2}$$

TABLE 5.2.2: Maximum Twenty-Four-Hour Precipitation Recorded for Thirty-Six Inland Hurricanes (1900-1969)
Year
Name
1969
1968
1965
1960
1959
1957
Camille
Candy
Betsy
Brenda
Hazel
1952
1945
1942
1940
1939
1938
1934
1933
1932
1932
1929
1928
1928
1923
1923
1920
1916
1916
1912
1906
1902
1901
1900
1900
Able
Precipitation
(inches)
31.00
2.82
3.98
4.02
Meadows,
Russels Point. Ohio
Slide Mt., N.Y.
Big Meadows, Va.
Eagles Mere, Pa.
BloserviUe 1-N, Pa.
North Ford 1# 1, N,C.
Crossnore, N.C.
Big Meadows, Va.
Rhodhiss Darn, N.C.
Caesars Head,
Hubbardston, Mass.
Va.
N.Y.
9.50
4.50
11.40
10.71
6.31
4.95
5.64
5.51
9.72
4.21
11.60
4,75
6.85
6.25
3.42
11.80
0.80
Altapass, N.C
Highlands, N.C.
Lookout Mt., Tenn.
Highlands, N.C.
Norcross, Ga.
Horse Cove, N.C.
Sewanee, Tenn.
Linville, N.C.
Marrobone, Ky.
St. Johnsbury, Vt
3.69
3.10
22.22
7.43
5.00
4.58
4.46
8.00
3.73
3.50
6.20
0.67
[Density-scaled histogram: density (0 to 0.12) on the vertical axis, maximum 24-hr rainfall (in.) on the horizontal axis]
FIGURE 5.2.2
According to the figures in Table 5.2.2,
$$\frac{1}{36} \sum_{i=1}^{36} y_i = 7.29$$
and
$$\frac{1}{36} \sum_{i=1}^{36} y_i^2 = 85.59$$
To find $r_e$ and $\lambda_e$, then, we need to solve the two equations
$$\frac{r}{\lambda} = 7.29$$
and
$$\frac{r(r + 1)}{\lambda^2} = 85.59$$
Substituting $r = 7.29\lambda$ into the second equation gives
$$\frac{7.29\lambda(7.29\lambda + 1)}{\lambda^2} = 85.59$$
or $\lambda_e = 0.22$. Then, from the first equation, $r_e = 1.60$ $[= 7.29(0.22)]$.

The estimated model,
$$f_Y(y; 1.60, 0.22) = \frac{(0.22)^{1.60}}{\Gamma(1.60)}\, y^{0.60} e^{-0.22y}, \quad y > 0$$
is superimposed on the data's density-scaled histogram in Figure 5.2.3. Considering the relatively small number of observations in the sample, the agreement is quite good. (The adequacy of the approximation here would come as no surprise to a meteorologist: The gamma distribution is frequently used to describe the variation in precipitation levels.)

[Density-scaled histogram with the estimated gamma pdf superimposed: density (0 to 0.12) on the vertical axis, maximum 24-hr rainfall (in.) on the horizontal axis]
FIGURE 5.2.3
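The pair of moment equations for the gamma model has a closed-form solution: subtracting the square of the first equation from the second leaves $r/\lambda^2$, the variance. The sketch below starts from the two sample moments quoted in the case study; the function name is an assumption for illustration. Carrying full precision gives values slightly different from the text's, which rounds $\lambda_e$ to 0.22 before computing $r_e$.

```python
def gamma_method_of_moments(m1, m2):
    """Solve r / lam = m1 and r (r + 1) / lam^2 = m2 for (r, lam).

    Since m2 - m1^2 = r / lam^2, it follows that lam = m1 / (m2 - m1^2)
    and r = m1 * lam.
    """
    variance = m2 - m1**2
    lam = m1 / variance
    r = m1 * lam
    return r, lam

# First and second sample moments reported for the 36 hurricanes.
r_e, lam_e = gamma_method_of_moments(7.29, 85.59)

print(round(lam_e, 2), round(r_e, 2))
```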
QUESTIONS
5.2.15. Let $y_1, y_2, \ldots, y_n$ be a random sample of size $n$ from the uniform pdf, $f_Y(y; \theta) = \frac{1}{\theta}$, $0 \leq y \leq \theta$. Find a formula for the method of moments estimate for $\theta$. Compare the values of the method of moments estimate and the maximum likelihood estimate if a random sample of size 5 consists of the numbers 17, [...], 46, 39, and 56 (recall Question 5.2.9).
5.2.16. Use the method of moments to estimate $\theta$ in the pdf
$$f_Y(y; \theta) = (\theta^2 + \theta)\, y^{\theta - 1}(1 - y), \quad 0 < y < 1$$
Assume that a random sample of size $n$ has been collected.
5.2.17. A criminologist is searching through FBI files to document the prevalence of a rare double-whorl fingerprint. Among six consecutive sets of 100,000 prints scanned by a computer, the numbers of persons having the abnormality are 3, 0, 3, 4, 2, and 1, respectively. Assume that double whorls are Poisson events. Use the method of moments to estimate their occurrence rate, $\lambda$. How would your answer change if $\lambda$ were estimated using the method of maximum likelihood?
5.2.18. Find the method of moments estimate for $\lambda$ if a random sample of size $n$ is taken from the exponential pdf, $f_Y(y; \lambda) = \lambda e^{-\lambda y}$, $y \geq 0$.
5.2.19. Suppose that $Y_1 = 8.3$, $Y_2 = 4.9$, $Y_3 = $ [...], and $Y_4 = 6.5$ is a random sample of size 4 from the two-parameter uniform pdf,
[...]
Use the method of moments to calculate $\theta_{1e}$ and $\theta_{2e}$.
5.2.20. Find a formula for the method of moments estimate for the parameter $\theta$ in the Pareto pdf,
$$f_Y(y; \theta) = \theta k^{\theta} \left(\frac{1}{y}\right)^{\theta + 1}, \quad y \geq k; \quad \theta \geq 1$$
Assume that $k$ is known and the data consist of a random sample of size $n$. Compare your answer to the maximum likelihood estimator found in Question 5.2.12.
5.2.21. Calculate the method of moments estimate for the parameter $\theta$ in the probability function
[...]
if a sample of size 5 is the set of numbers 0, 0, 1, 0, 1.
5.2.22. Find the method of moments estimates for $\mu$ and $\sigma^2$, based on a random sample of size $n$ drawn from a normal pdf, where $\mu = E(Y)$ and $\sigma^2 = \mathrm{Var}(Y)$. Compare your answers with the maximum likelihood estimates derived in Example 5.2.4.
5.2.23. Use the method of moments to derive formulas for estimating the parameters $r$ and $p$ in the negative binomial pdf,
$$p_X(k; r, p) = \binom{k - 1}{r - 1} p^r (1 - p)^{k - r}, \quad k = r, r + 1, \ldots$$
5.2.24. Bird songs can be characterized by the number of clusters of "syllables" that are strung together in rapid succession. If the last cluster is defined as a "success," it may be reasonable to treat the number of clusters in a song as a geometric random variable. Does the model $p_X(k) = (1 - p)^{k-1}p$, $k = 1, 2, \ldots$ adequately describe the following distribution of 250 song lengths (102)? Begin by finding the method of moments estimate for $p$. Then calculate the set of "expected" frequencies.

No. of Clusters/Song    Frequency
1                       132
2                        52
3                        34
4                         9
5                         7
6                         5
7                         5
8                         6
Total                   250
5.3 INTERVAL ESTIMATION
Point estimates, no matter how they are determined, share the same fundamental weakness: They provide no indication of their inherent precision. We know, for instance, that $\hat{\lambda} = \bar{X}$ is both the maximum likelihood and the method of moments estimator for the Poisson parameter, $\lambda$. But suppose a sample of size six is taken from the probability model $p_X(k) = e^{-\lambda}\lambda^k/k!$ and we find that $\lambda_e = 6.8$. Does it follow that the true $\lambda$ is likely to be close to $\lambda_e$ (say, in the interval from 6.7 to 6.9) or is the estimation process so imprecise that $\lambda$ might actually be as small as 1.0, or as large as 12.0? Unfortunately, point estimates, by themselves, do not allow us to make those kinds of extrapolations. Any such statements require that the variation of the estimator be taken into account.

The usual way to quantify the amount of uncertainty in an estimator is to construct a confidence interval. In principle, confidence intervals are ranges of numbers that have a high probability of "containing" the unknown parameter as an interior point. By looking at the width of a confidence interval, we can get a good sense of the estimator's precision.
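EXAMPLE 5.3.1
Suppose that 6.5, 9.2, 9.9, and 12.4 constitute a random sample of size four from the pdf
$$f_Y(y; \mu) = \frac{1}{\sqrt{2\pi}\,(0.8)}\, e^{-\frac{1}{2}\left(\frac{y - \mu}{0.8}\right)^2}, \quad -\infty < y < \infty$$
That is, the four $y_i$'s come from a normal distribution where $\sigma$ is equal to 0.8, but the mean, $\mu$, is unknown. What values of $\mu$ are believable in light of the four data points?

To answer that question requires that we keep the distinction between estimates and estimators clearly in mind. First of all, we know from Example 5.2.4 that the maximum likelihood estimate for $\mu$ is $\mu_e = \bar{y} = \left(\frac{1}{n}\right)\sum_{i=1}^{n} y_i = \left(\frac{1}{4}\right)(38.0) = 9.5$. We also know something very specific about the probabilistic behavior of the maximum likelihood estimator, $\bar{Y}$: According to the Corollary to Theorem 4.3.4, $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{Y} - \mu}{0.8/\sqrt{4}}$ has a standard normal pdf, $f_Z(z)$. The probability, then, that $\frac{\bar{Y} - \mu}{0.8/\sqrt{4}}$ will fall between two specified values can be deduced from Table A.1 in the Appendix. For example,
$$P(-1.96 \leq Z \leq 1.96) = 0.95 = P\left(-1.96 \leq \frac{\bar{Y} - \mu}{0.8/\sqrt{4}} \leq 1.96\right) \quad (5.3.1)$$
(see Figure 5.3.1).

[Standard normal curve for $\frac{\bar{y} - \mu}{0.8/\sqrt{4}}$, with area 0.95 between $-1.96$ and $1.96$]
FIGURE 5.3.1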
"Inverting" probability statements of the sort illustrated in Equation 5.3.1 is the mechanism by which we can identify aset of parameter values compatible with the sample
If
p( -1.96::: Y -Ik :::1.96) =0.95
then
p
(Y
0,8
-
- 1.96 ../4 ::: Ik ::: Y
+ 1.96 0.8)
J4 = 0.95
which implies that the random interval
0.8 -
(Y
1.96 ../4' Y
0.8)
+ 1.96 J4
a 95% chance of containing J..! as an interior point.
case
After substituting for Y,
random interval in
0.8
( 9.50 - L 96 ../4,9.50
+
0.8)
1.96../4
to
= (8.72, 10.28)
We call (8.72, 10.28) a95% confidence interval/or J..!. In the long run, 95% oftbe intervals
constructed in this fashion will contain the unknown J..!; the remaining 5% will lie either
entirely to the left of J..! or entirely to tbe right For a
set data, of course, we have
no way of knowing whether the calculated
(Y - 1.96 ' ~. Y+ 1.96 . ~) is one of
the 95% that contains J..! or one of the 5% that does not.
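The arithmetic behind the interval (8.72, 10.28) can be checked with a few lines of Python. This is only a sketch: the function name `z_interval` and its arguments are illustrative choices, not part of the text, and the sample mean is taken directly as 38.0/4.

```python
import math

def z_interval(ybar, sigma, n, z=1.96):
    """Confidence interval for a normal mean when sigma is known."""
    half_width = z * sigma / math.sqrt(n)
    return ybar - half_width, ybar + half_width

# Example 5.3.1: sum of the four observations is 38.0, sigma = 0.8
ybar = 38.0 / 4
lower, upper = z_interval(ybar, sigma=0.8, n=4)
print(round(lower, 2), round(upper, 2))  # 8.72 10.28
```

The default `z=1.96` corresponds to 95% confidence; any other level would only change that one argument.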
Figure 5.3.2 illustrates graphically the statistical implications associated with the random interval $\left(\bar{Y} - 1.96\,\frac{0.8}{\sqrt{4}},\ \bar{Y} + 1.96\,\frac{0.8}{\sqrt{4}}\right)$. For every different sample, the interval will have a different location. While there is no way to know whether or not a given interval (in particular, the one the experimenter has just calculated) will include the unknown $\mu$, we do have the reassurance that in the long run 95% of all such intervals will.
Comment. The behavior of confidence intervals can be modeled nicely by using a computer's random number generator. The output in Table 5.3.1 is a case in point. Fifty
FIGURE 5.3.2 [Possible 95% confidence intervals for $\mu$, one for each of data sets 1 through 8.]
366
Chapter 5
Estimation
TABLE 5.3.1
MTB > random 50 c1-c4;
SUBC> normal 10 0.8.
MTB > rmean c1-c4 c5
MTB > let c6 = c5 - 1.96*(0.8)/sqrt(4)
MTB > let c7 = c5 + 1.96*(0.8)/sqrt(4)
MTB > name c6 'Low.Lim.' c7 'Upp.Lim.'
MTB > print c6 c7

Data Display
Row
1
2
3
4
5
6
7
Low. Lim.
46
8.7596
8.8763
8.8337
9.5800
8.5106
9.6946
8.7079
10.0014
9.3408
9.5428
8.4650
9.6346
9.2016
9.2517
8.1U8
9.8439
9.3291
9.5685
8.9728
8.5175
9.3979
9.2116
9.6277
9.4252
9.6868
8.8779
9.1670
9.3271
9.1606
8.8919
9.3838
8.7575
10.4602
8.9437
9.0049.
9.0148
8.8110
9.1981
9.0042
9.1019
9.2161
8.3901
8.6337
9.4606
9.3278
8.6643
47
9.0&U
48
9.2042
9.2710
9.5697
8
9
10
11
12
1S
14
15
16
17
18
19
20
21
22
23
24
25
26
27
2S
29
30
31
32
33
34
35
35
37
38
39
40
41
42
43
44
45
49
50
Upp. Lim.
10.3276
10.4443
10.4017
11.1480
10.0786
11.2626
10.2709
Contains $\mu = 10$?
Yes
Yes
Yes
Yes
Yes
Yes
Yes
11.6694
NO
10.9088
11.1108
10.0330
11.2026
10.'775a
10.8197
10.3248
11.4119
10.8911
11.1365
10.5408
10.1456
10.9659
10.7795
11.1967
10.9932
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
11.2548
10.4459
10.7250
10.8951
10.7286
10.4599
10.9518
10.3256
12.0282
NO
10.5117
10.5729
10.6828
10.3190
10.1561
10.5122
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
11.2699
10.7847
9.9581
10.2017
11.0286
10.8968
10.1523
10.6221
10.7722
10.8390
11.1377
NO
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
47 of the 50 confidence intervals contain the true $\mu$ (= 10)
simulations of the confidence interval described in Example 5.3.1 are displayed. That is, fifty samples, each of size $n = 4$, were drawn from the normal pdf

$$f_Y(y; \mu) = \frac{1}{\sqrt{2\pi}\,(0.8)}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{0.8}\right)^2}, \quad -\infty < y < \infty$$

using MINITAB's RANDOM command. (To fully specify the model, and to know the value that each confidence interval was seeking to contain, the true $\mu$ was assumed to equal ten.) For each sample of size $n = 4$, the lower and upper limits of the corresponding 95% confidence interval were calculated, using the formulas

$$\text{Low. Lim.} = \bar{y} - 1.96\,\frac{0.8}{\sqrt{4}} \qquad \text{Upp. Lim.} = \bar{y} + 1.96\,\frac{0.8}{\sqrt{4}}$$

As the last column in the DATA DISPLAY indicates, only three of the fifty confidence intervals fail to contain $\mu = 10$: Samples eight and thirty-three yield intervals that lie entirely to the right of the parameter, while sample forty-two produces a range of values that lies entirely to the left. The remaining forty-seven intervals, though, or 94% $\left(= \frac{47}{50} \times 100\right)$, do contain the true value of $\mu$ as an interior point.
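The same experiment can be imitated outside MINITAB. The sketch below is not the authors' program; it assumes Python's standard `random` module as the generator, and the function name `coverage` and the seed are arbitrary. It counts how many of the simulated intervals contain $\mu = 10$.

```python
import math
import random

def coverage(num_samples=50, n=4, mu=10.0, sigma=0.8, z=1.96, seed=0):
    """Count simulated intervals (ybar -/+ z*sigma/sqrt(n)) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(num_samples):
        ybar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if ybar - half_width < mu < ybar + half_width:
            hits += 1
    return hits

print(coverage())                                  # hits out of 50 samples
print(coverage(num_samples=100_000) / 100_000)     # long-run proportion
```

With only fifty samples the count will fluctuate from run to run, just as the 47 in Table 5.3.1 does; with many more samples the proportion settles near 0.95.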
CASE STUDY 5.3.1
In the eighth century B.C., the Etruscan civilization was the most advanced in all of Italy. Its art forms and political innovations were destined to leave indelible marks on the entire Western world. Originally located along the western coast between the Arno and Tiber rivers (the region now known as Tuscany), it spread quickly across the Apennines and eventually overran much of Italy. But as quickly as it came, it faded. Militarily it was to prove no match for the burgeoning Roman legions, and by the dawn of Christianity it was all but gone.

No chronicles of the Etruscan empire have ever been found, and to this day its origins remain shrouded in mystery. Were the Etruscans native Italians, or were they immigrants? And if they were immigrants, where did they come from? Much of what is known has come from anthropometric studies, that is, investigations that use body measurements to determine racial characteristics and ethnic origins.

A case in point is the set of data given in Table 5.3.2, showing the sizes of eighty-four Etruscan skulls unearthed in various archeological digs throughout Italy (7). The sample mean, $\bar{y}$, of those measurements is 143.8 mm. Researchers believe that skull widths of present-day Italian males are normally distributed with a mean ($\mu$) of 132.4 mm and a standard deviation ($\sigma$) of 6.0 mm. What does the difference between $\bar{y} = 143.8$ and $\mu = 132.4$ imply about the likelihood that Etruscans and Italians share the same ethnic origin?
TABLE 5.3.2 Maximum Head Breadths (mm) of 84 Etruscan Males
141
146
144
141
141
136
137
149
141
142
142
147
148
155
150
144
140
140
139
148
143
143
149
140
132
158
149
144
145
146
143
135
147
142
142
138
150
145
126
135
142
140
148
146
149
137
140
154
140
149
140
147
137
131
152
150
146
134
142
147
158
144
146
148
143
143
132
149
144
152
150
148
143
142
141
154
141
144
142
138
146
145
One way to answer that question is to construct a 95% confidence interval for the true mean of the population represented by the eighty-four $y_i$'s in Table 5.3.2. If that confidence interval fails to contain $\mu = 132.4$, it could be argued that the Etruscans were not the ancestors of modern Italians. (Of course, it would also be necessary to factor in whatever evolutionary trends in skull sizes have occurred for Homo sapiens, in general, over the past three thousand years.)

It follows from the discussion in Example 5.3.1 that the endpoints for a 95% confidence interval for $\mu$ are given by the general formula

$$\left(\bar{y} - 1.96\cdot\frac{\sigma}{\sqrt{n}},\ \bar{y} + 1.96\cdot\frac{\sigma}{\sqrt{n}}\right)$$

that here reduces to

$$\left(143.8 - 1.96\cdot\frac{6.0}{\sqrt{84}},\ 143.8 + 1.96\cdot\frac{6.0}{\sqrt{84}}\right) = (142.5 \text{ mm},\ 145.1 \text{ mm})$$

Since the value $\mu = 132.4$ is not contained in the 95% confidence interval (or even close to being contained), we would conclude that a sample mean of 143.8 (based on a sample of size eighty-four) is not likely to have come from a normal population where $\mu = 132.4$ (and $\sigma = 6.0$). It would appear, in other words, that Italians are not the direct descendants of Etruscans.
Comment. Random intervals can be constructed to have whatever "confidence" we choose. Suppose $z_{\alpha/2}$ is defined to be the value for which $P(Z \ge z_{\alpha/2}) = \alpha/2$. If $\alpha = 0.05$, for example, $z_{\alpha/2} = z_{.025} = 1.96$. A $100(1-\alpha)\%$ confidence interval for $\mu$, then, is the range of numbers

$$\left(\bar{y} - z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}},\ \bar{y} + z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}\right)$$

In practice, $\alpha$ is typically set at either 0.10, 0.05, or 0.01, although in some fields 50% confidence intervals are frequently used.
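The cutoffs $z_{\alpha/2}$ are normally read from Table A.1, but they can also be computed. As an illustration only (the helper name `z_upper` is an assumption, not notation from the text), Python's standard `statistics.NormalDist` provides the inverse of the standard normal cdf:

```python
from statistics import NormalDist

def z_upper(alpha):
    """Value z with P(Z >= z) = alpha/2 for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(z_upper(alpha), 3))
```

For $\alpha = 0.05$ this reproduces the familiar 1.96 (to two decimal places).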
Confidence Intervals for the Binomial Parameter, p
Perhaps the most frequently encountered applications of confidence intervals are those involving the binomial parameter, $p$. Opinion surveys are often the context: When polls are released, it has become standard practice to issue a disclaimer by saying that the findings have a certain margin of error. As we will see later in this section, margins of error are related to 95% confidence intervals.

The inversion technique followed in Example 5.3.1 can be applied to large-sample binomial random variables as well. We know from Theorem 4.3.1 that

$$\frac{X - np}{\sqrt{np(1-p)}} = \frac{X/n - p}{\sqrt{p(1-p)/n}}$$

has approximately a standard normal distribution when $X$ is binomial and $n$ is large. It is also true that the pdf of

$$\frac{X/n - p}{\sqrt{\dfrac{(X/n)(1 - X/n)}{n}}}$$

can be approximated by $f_Z(z)$, a result that seems plausible given that $\frac{X}{n}$ is the maximum likelihood estimator for $p$. Therefore,

$$P\left(-z_{\alpha/2} \le \frac{X/n - p}{\sqrt{\dfrac{(X/n)(1 - X/n)}{n}}} \le z_{\alpha/2}\right) \doteq 1 - \alpha \qquad (5.3.2)$$

Rewriting Equation 5.3.2 by isolating $p$ in the center of the inequalities leads to the formula given in Theorem 5.3.1.
Theorem 5.3.1. Let $k$ be the number of successes in $n$ independent trials, where $n$ is large and $p = P(\text{success})$ is unknown. An approximate $100(1-\alpha)\%$ confidence interval for $p$ is the set of numbers

$$\left(\frac{k}{n} - z_{\alpha/2}\sqrt{\frac{(k/n)(1 - k/n)}{n}},\ \frac{k}{n} + z_{\alpha/2}\sqrt{\frac{(k/n)(1 - k/n)}{n}}\right)$$
CASE STUDY 5.3.2
Intelligent life on other planets is a theme that continues to be box office magic. Theatergoers seem equally enthralled by intergalactic brethren portrayed as hostile, like the machines in H. G. Wells's War of the Worlds, or as friendly, like the nebbish in Steven Spielberg's E.T. What is not so clear is the extent to which people actually believe that such creatures exist. In a close encounter of the statistical kind, a Media General-Associated Press poll found that 713 of 1517 respondents accepted the idea of intelligent life existing on other planets. Based on those results, what might we reasonably conclude about the proportion of all Americans who believe we are not alone?

Given that $n = 1517$ and $k = 713$, the "believable" values for $p$ according to Theorem 5.3.1 are the numbers from 0.44 to 0.50:

$$\left(\frac{713}{1517} - 1.96\sqrt{\frac{(713/1517)(1 - 713/1517)}{1517}},\ \frac{713}{1517} + 1.96\sqrt{\frac{(713/1517)(1 - 713/1517)}{1517}}\right) = (0.44,\ 0.50)$$

If the true proportion of Americans, in other words, who believe in extraterrestrial life is less than 0.44 or greater than 0.50, it would be highly unlikely that a sample proportion (based on 1517 responses) would be 0.47.
Comment. We call (0.44, 0.50) a 95% confidence interval for $p$, but it does not follow that $p$ has a 95% chance of lying between 0.44 and 0.50. The parameter $p$ is a constant, so it falls between 0.44 and 0.50 either 0% of the time or 100% of the time. The "95%" refers to the procedure by which the interval is constructed, not to any particular interval. This, of course, is entirely analogous to the interpretation given earlier to 95% confidence intervals for $\mu$.
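The interval in Theorem 5.3.1 is easy to compute directly. The following sketch (the function name `binomial_interval` is an illustrative choice) applies it to the poll figures of Case Study 5.3.2, 713 successes in 1517 trials:

```python
import math

def binomial_interval(k, n, z=1.96):
    """Approximate large-sample confidence interval for p (Theorem 5.3.1)."""
    p_hat = k / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lower, upper = binomial_interval(713, 1517)
print(round(lower, 2), round(upper, 2))  # 0.44 0.5
```

Because the formula relies on a normal approximation, it should only be trusted when $n$ is large, as the theorem states.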
Comment. Robert Frost was certainly more familiar with iambic pentameters than he was with estimated parameters, but in 1942 he wrote a couplet that sounded very much like a poet's perception of a confidence interval (99):

We dance round in a ring and suppose,
But the Secret sits in the middle and knows.
EXAMPLE 5.3.2
Central to every statistical software package is a random number generator. Two or three simple commands are typically all that are required to output a sample of size $n$ representing any of the standard probability models. But how can we be certain that numbers purporting to be random observations from, say, a normal distribution with a given $\mu$ and $\sigma = 10$ actually do represent that particular pdf?
TABLE 5.3.3
0.00940*
0.93661
0.46474*
0.58175*
5.05936
1.83196
0.81223
1.31728
0.54938*
0.75095
1.39603
0.48272*
0.86681
0.04804*
1.91987
1.84549
0.81077
0.73217
2.32466
0.50795*
0.48223*
0.55491*
0.07498*
1.92874
1.20752
0.59111*
0.52019*
0.66715*
0.11041*
3.59149
0.07451*
1.52084
1.93181
0.11387*
0.36793*
0.73169
3.38765
2.89577
1.38016
1.88641
1.06972
0.78811
0.38966*
0.16938*
3.01784
1.20041
0.41382*
2.40564
0.62928*
2.16919
0.42250*
2.41135
0.05509*
1.44422
0.31684*
1.07111
0.09433*
1.16045
0.77279
0.21528*
*number $\le 0.69315$ [= median of $f_Y(y) = e^{-y}$, $y > 0$]
The answer is, we cannot; however, a number of "tests" are available to check whether simulated measurements appear to be random with respect to a given criterion. One such procedure is the median test.

Suppose $y_1, y_2, \ldots, y_n$ denote measurements presumed to have come from a continuous pdf $f_Y(y)$. Let $k$ denote the number of $y_i$'s that are less than the median of $f_Y(y)$. If the sample is random, we would expect the difference between $\frac{k}{n}$ and $\frac{1}{2}$ to be small. More specifically, a 95% confidence interval based on $\frac{k}{n}$ should contain the value 0.5.

Listed in Table 5.3.3 is a set of sixty $y_i$'s generated by MINITAB to represent the exponential pdf, $f_Y(y) = e^{-y}$, $y \ge 0$. Does this sample pass the median test?

The median here is $m = 0.69315$:

$$\int_0^m e^{-y}\,dy = -e^{-y}\Big|_0^m = 1 - e^{-m} = 0.5$$

which implies that $m = -\ln(0.5) = 0.69315$. Notice that of the sixty entries in Table 5.3.3, a total of $k = 26$ (those marked with an asterisk, *) fall to the left of the median. For these particular $y_i$'s, then, $\frac{k}{n} = \frac{26}{60} = 0.433$.

Let $p$ denote the (unknown) probability that a random observation produced by MINITAB's generator will lie to the left of the pdf's median. Based on these sixty observations, the 95% confidence interval for $p$ is the range of numbers extending from 0.308 to 0.558:

$$\left(\frac{26}{60} - 1.96\sqrt{\frac{(26/60)(1 - 26/60)}{60}},\ \frac{26}{60} + 1.96\sqrt{\frac{(26/60)(1 - 26/60)}{60}}\right) = (0.308,\ 0.558)$$

The fact that the value $p = 0.50$ is contained in the confidence interval implies that these data do pass the median test. It is entirely believable, in other words, that a bona fide exponential random sample of size sixty would have twenty-six observations below the median, and thirty-four above.
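The median test can be sketched in a few lines. The code below is only an illustration under the example's stated counts ($k = 26$ of $n = 60$ observations below the median); the variable names are assumptions, and the individual observations from Table 5.3.3 are not re-entered.

```python
import math

# Median of the exponential pdf e^{-y}: solve 1 - exp(-m) = 0.5
median = math.log(2)

k, n = 26, 60                      # counts reported in Example 5.3.2
p_hat = k / n
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - half_width, p_hat + half_width

passes = lower < 0.5 < upper       # the test passes if 0.5 is inside the interval
print(round(median, 5), passes)
```

If the raw data were available as a list, `k` could instead be computed as the number of entries smaller than `median`.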
Margin of Error
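In the popular press, estimates for $p$ (i.e., values of $\frac{k}{n}$) are typically accompanied by a margin of error, as opposed to a confidence interval. The two are related: A margin of error is half the maximum width of a 95% confidence interval. (The number actually quoted is usually expressed as a percentage.)

Let $w$ denote the width of a 95% confidence interval for $p$. From Theorem 5.3.1,

$$w = \frac{k}{n} + 1.96\sqrt{\frac{(k/n)(1 - k/n)}{n}} - \left(\frac{k}{n} - 1.96\sqrt{\frac{(k/n)(1 - k/n)}{n}}\right) = 3.92\sqrt{\frac{(k/n)(1 - k/n)}{n}}$$

Notice that for fixed $n$, $w$ is a function of the product $\left(\frac{k}{n}\right)\left(1 - \frac{k}{n}\right)$. But given that $0 \le \frac{k}{n} \le 1$, the largest value that $\left(\frac{k}{n}\right)\left(1 - \frac{k}{n}\right)$ can achieve is $\frac{1}{2}\cdot\frac{1}{2}$, or $\frac{1}{4}$ (see Question 5.3.16). Therefore,

$$\max w = 3.92\sqrt{\frac{1}{4n}}$$

Definition 5.3.1. The margin of error associated with an estimate $\frac{k}{n}$, where $k$ is the number of successes in $n$ independent trials, is $100d\%$, where

$$d = \frac{1.96}{2\sqrt{n}}$$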
EXAMPLE 5.3.3
Hurricane Charley, a category four storm that devastated portions of southwestern Florida in August, 2004, was both a political issue and a meteorological catastrophe. Under scrutiny was the Federal government's response to the storm's victims and whether that response might affect the upcoming Presidential election.

Several weeks after the clean-up had begun, a USA Today poll reported that 71% of the 1,002 adults interviewed "approved" of the way President Bush and his administration were handling disaster relief. What margin of error would be associated with the 71%?

Applying Definition 5.3.1 (with $n = 1002$) shows that the margin of error associated with the poll's result is 3.1%:

$$d = \frac{1.96}{2\sqrt{1002}} = 0.031$$

Notice that the margin of error has nothing to do with the actual survey results. Had the percentage of respondents approving President Bush's handling of the situation been 17% or 71%, or any other number, the margin of error, by definition, would have been the same.
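Definition 5.3.1 depends only on the sample size, which a short calculation makes plain. In this sketch the function name `margin_of_error` is an illustrative choice:

```python
import math

def margin_of_error(n, z=1.96):
    """Half the maximum width of a 95% interval for p (Definition 5.3.1)."""
    return z / (2 * math.sqrt(n))

print(round(margin_of_error(1002), 3))  # 0.031
```

Note that the observed proportion never enters the function, which is exactly the point made at the end of Example 5.3.3.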
Choosing Sample Sizes
Related to confidence intervals and margins of error is an important experimental design question. Suppose a researcher wishes to estimate the binomial parameter $p$ based on results from a series of $n$ independent trials, but $n$ has yet to be determined. Larger values of $n$ will, of course, yield estimates having greater precision, but more observations also demand greater expenditures of time and money. How can those two concerns best be reconciled?

If the experimenter can articulate the minimal degree of precision that would be considered acceptable, a $Z$ transformation can be used to calculate the smallest (i.e., the cheapest) sample size capable of achieving that objective. For example, suppose we want $\frac{X}{n}$ to have at least a $100(1-\alpha)\%$ probability of lying within a distance $d$ of $p$. The problem is solved, then, if we can find the smallest $n$ for which

$$P\left(-d \le \frac{X}{n} - p \le d\right) = 1 - \alpha \qquad (5.3.3)$$
Theorem 5.3.2. Let $\frac{X}{n}$ be the estimator for the parameter $p$ in a binomial distribution. In order for $\frac{X}{n}$ to have at least a $100(1-\alpha)\%$ probability of being within a distance $d$ of $p$, the sample size should be no smaller than

$$n = \frac{z_{\alpha/2}^2}{4d^2}$$

where $z_{\alpha/2}$ is the value for which $P(Z \ge z_{\alpha/2}) = \alpha/2$.
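Proof. Start by dividing the terms in the probability portion of Equation 5.3.3 by the standard deviation of $\frac{X}{n}$ to form an approximate $Z$ ratio:

$$P\left(\frac{-d}{\sqrt{p(1-p)/n}} \le \frac{X/n - p}{\sqrt{p(1-p)/n}} \le \frac{d}{\sqrt{p(1-p)/n}}\right) \doteq P\left(\frac{-d}{\sqrt{p(1-p)/n}} \le Z \le \frac{d}{\sqrt{p(1-p)/n}}\right) = 1 - \alpha$$

But $P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$, so

$$\frac{d}{\sqrt{p(1-p)/n}} = z_{\alpha/2}$$

which implies that

$$n = \frac{z_{\alpha/2}^2\, p(1-p)}{d^2} \qquad (5.3.4)$$

Equation 5.3.4 is not an acceptable final answer, though, because the right-hand side is a function of $p$, the unknown parameter. But $p(1-p) \le \frac{1}{4}$ for $0 \le p \le 1$, so the sample size

$$n = \frac{z_{\alpha/2}^2}{4d^2}$$

would necessarily cause $\frac{X}{n}$ to satisfy Equation 5.3.3, regardless of the actual value of $p$. (Notice the connection between the statements of Theorem 5.3.2 and Definition 5.3.1.) □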
EXAMPLE 5.3.4
A public health survey is being planned in a metropolitan area for the purpose of estimating the proportion of children, ages zero to fourteen, who are lacking adequate immunization. Organizers of the project would like the sample proportion of inadequately immunized children, $\frac{X}{n}$, to have at least a 98% probability of being within 0.05 of the true proportion, $p$. How large should the sample be?

Here $100(1-\alpha) = 98$, so $\alpha = 0.02$ and $z_{\alpha/2} = 2.33$. By Theorem 5.3.2, then, the smallest acceptable sample size is 543:

$$n = \frac{(2.33)^2}{4(0.05)^2} = 543$$
Comment. Occasionally, there may be reason to believe that $p$ is necessarily less than some number $r_1$, where $r_1 < \frac{1}{2}$, or greater than some number $r_2$, where $r_2 > \frac{1}{2}$. If so, the factors $p(1-p)$ in Equation 5.3.4 can be replaced by either $r_1(1 - r_1)$ or $r_2(1 - r_2)$, and the sample size required to estimate $p$ with a specified precision will be reduced, perhaps by a considerable amount.

Suppose, for example, that previous immunization studies suggest that no more than 20% of children between the ages of zero and fourteen are inadequately immunized. The smallest sample size, then, for which

$$P\left(-0.05 \le \frac{X}{n} - p \le 0.05\right) = 0.98$$

is 348, an $n$ that represents almost a 36% reduction $\left(= \frac{543 - 348}{543} \times 100\right)$ from the original 543:

$$n = \frac{(2.33)^2}{(0.05)^2}(0.20)(0.80) = 348$$
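Both sample-size calculations follow from Equation 5.3.4 with $p(1-p)$ replaced by its largest possible value. A minimal sketch, assuming a helper named `sample_size` whose `p_bound` argument defaults to the worst case of 0.5 (these names are illustrative, not the text's notation):

```python
import math

def sample_size(d, z, p_bound=0.5):
    """Smallest n with z^2 * p(1-p) / d^2 <= n, using p = p_bound as the worst case."""
    return math.ceil(z**2 * p_bound * (1 - p_bound) / d**2)

print(sample_size(0.05, 2.33))               # 543  (Theorem 5.3.2)
print(sample_size(0.05, 2.33, p_bound=0.20)) # 348  (p known to be at most 0.20)
```

The bound passed as `p_bound` should be whichever admissible value of $p$ lies closest to 0.5, since that is where $p(1-p)$ is largest.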
Comment. Theorems 5.3.1 and 5.3.2 are both based on the assumption that the $X$ in $\frac{X}{n}$ varies according to a binomial model. What we learned in Section 3.3, though, seems to contradict that assumption: Samples used in opinion surveys are invariably drawn without replacement, in which case $X$ is hypergeometric, not binomial. The consequences of that particular "error," however, are easily corrected and frequently negligible.

It can be shown mathematically that the expected value of $\frac{X}{n}$ is the same regardless of whether $X$ is binomial or hypergeometric; its variance, though, is different. If $X$ is binomial,

$$\text{Var}\left(\frac{X}{n}\right) = \frac{p(1-p)}{n}$$

If $X$ is hypergeometric,

$$\text{Var}\left(\frac{X}{n}\right) = \frac{p(1-p)}{n}\left(\frac{N-n}{N-1}\right)$$

where $N$ is the total number of subjects in the population.

Since $\frac{N-n}{N-1} < 1$, the actual variance of $\frac{X}{n}$ is somewhat smaller than the (binomial) variance we have been assuming, $\frac{p(1-p)}{n}$. The ratio $\frac{N-n}{N-1}$ is called the finite correction factor. If $N$ is much larger than $n$, which is typically the case, then the magnitude of $\frac{N-n}{N-1}$ will be so close to 1 that the variance of $\frac{X}{n}$ is equal to $\frac{p(1-p)}{n}$ for all practical purposes. The "binomial" assumption in those situations is more than adequate. Only when the sample is a sizable fraction of the population do we need to include the finite correction factor in any calculations that involve the variance of $\frac{X}{n}$.
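How close the finite correction factor is to 1 can be seen numerically. The sketch below uses two hypothetical function names, `finite_correction` and `var_proportion`; the first population and sample sizes are made up for illustration, while the second pair (865 and 304) is borrowed from Question 5.3.20.

```python
def finite_correction(N, n):
    """Finite correction factor (N - n) / (N - 1)."""
    return (N - n) / (N - 1)

def var_proportion(p, n, N=None):
    """Variance of X/n: binomial if N is None, hypergeometric otherwise."""
    var = p * (1 - p) / n
    return var if N is None else var * finite_correction(N, n)

print(finite_correction(1_000_000, 1000))  # very close to 1: correction negligible
print(finite_correction(865, 304))         # sample is a sizable fraction of N
```

In the second case the correction shrinks the variance by roughly a third, so ignoring it would noticeably overstate the uncertainty.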
QUESTIONS
5.3.1. The production of a nationally marketed detergent results in certain workers receiving prolonged exposures to a Bacillus subtilis enzyme. Nineteen workers were tested to determine the effects of those exposures, if any, on various respiratory functions. One such function, air-flow rate, is measured by computing the ratio of a person's forced expiratory volume (FEV1) to his or her vital capacity (VC). (Vital capacity is the maximum volume of air a person can exhale after taking as deep a breath as possible; FEV1 is the maximum volume of air a person can exhale in one second.) In persons with no lung dysfunction, the "norm" for FEV1/VC ratios is 0.80. Based on the following data (169), is it believable that exposure to the Bacillus subtilis enzyme has no effect on the FEV1/VC ratio? Answer the question by constructing a 95% confidence interval. Assume that FEV1/VC ratios are normally distributed with $\sigma = 0.09$.
Subject   FEV1/VC      Subject   FEV1/VC
RH        0.61         WS        0.78
RB        0.70         RV        0.84
MB        0.63         EN        0.83
DM        0.76         WD        0.82
WB        0.67         FR        0.74
RB        0.72         PD        0.85
BF        0.64         EB        0.73
IT        0.82         PC        0.85
PS        0.88         RW        0.87
RB        0.82
5.3.2. Mercury pollution is widely recognized as a serious ecological problem. Much of the mercury released into the environment originates as a byproduct of coal burning and other industrial processes. It does not become dangerous until it falls into bodies of water where microorganisms convert it to methylmercury, an organic form that is particularly toxic. Fish are the intermediaries: They ingest and absorb the methylmercury and are then eaten by humans. Men and women, however, may not metabolize methylmercury at the same rate. In one study investigating that question, six women were given a known amount of protein-bound methylmercury. Shown in the following table are the half-lives of the methylmercury in their systems (117). For men, the average methylmercury half-life is believed to be 80 days. Assume that for both genders, methylmercury half-lives are normally distributed with a standard deviation ($\sigma$) of eight days. Construct a 95% confidence interval for the true female methylmercury half-life. Based on these data, is it believable that males and females metabolize methylmercury at the same rate? Explain.
Subject            AE   EH   U    AN   KR   LU
Half-life (days)   52   69   73   88   87   56
5.3.3. Suppose a sample of size $n$ is to be drawn from a normal distribution where $\sigma$ is known to be 14.3. How large does $n$ have to be to guarantee that the length of the 95% confidence interval for $\mu$ will be less than 3.06?
5.3.4. What "confidence" would be associated with each of the following intervals? Assume that the random variable $Y$ is normally distributed and that $\sigma$ is known.
(a) $\left(\bar{y} - 1.64\cdot\frac{\sigma}{\sqrt{n}},\ \bar{y} + 2.33\cdot\frac{\sigma}{\sqrt{n}}\right)$
(b) $\left(-\infty,\ \bar{y} + 2.58\cdot\frac{\sigma}{\sqrt{n}}\right)$
(c) $\left(\bar{y} - 1.64\cdot\frac{\sigma}{\sqrt{n}},\ \bar{y}\right)$
5.3.5. Five independent samples, each of size $n$, are to be drawn from a normal distribution where $\sigma$ is known. For each sample, the interval $\left(\bar{y} - 0.96\cdot\frac{\sigma}{\sqrt{n}},\ \bar{y} + 1.06\cdot\frac{\sigma}{\sqrt{n}}\right)$ will be constructed. What is the probability that at least four of the intervals will contain the unknown $\mu$?
5.3.6. Suppose that $y_1, y_2, \ldots, y_n$ is a random sample of size $n$ from a normal distribution where $\sigma$ is known. Depending on how the tail area probabilities are split up, an infinite number of random intervals having a 95% probability of containing $\mu$ can be constructed. What is unique about the particular interval $\left(\bar{y} - 1.96\cdot\frac{\sigma}{\sqrt{n}},\ \bar{y} + 1.96\cdot\frac{\sigma}{\sqrt{n}}\right)$?
5.3.7. If the standard deviation ($\sigma$) associated with the pdf that produced the following sample is 3.6, would it be correct to claim that

$$\left(2.61 - 1.96\cdot\frac{3.6}{\sqrt{20}},\ 2.61 + 1.96\cdot\frac{3.6}{\sqrt{20}}\right) = (1.03,\ 4.19)$$

is a 95% confidence interval for $\mu$? Explain.
2.5
3.2
0.5
0.4
0.3
0.1
0.1
0.2
0.2
1.3
0.1
0.4
1.4
11.2
2.1
0.3
10.1
7.4
5.3.8. Food-poisoning outbreaks are often the result of contaminated salads. In one study carried out to assess the magnitude of that problem, the New York City Department of Health examined 220 tuna salads marketed by various retail and wholesale outlets. A total of 179 were found to be unsatisfactory for health reasons (166). Find a 90% confidence interval for $p$, the true proportion of contaminated tuna salads marketed in New York City.
5.3.9. In 1927, the year he hit 60 home runs, Babe Ruth batted .356, having collected 192 hits in 540 official at-bats (145). Based on his performance that season, construct a 95% confidence interval for Ruth's probability of getting a hit in a future at-bat.
5.3.10. To buy a 30-second commercial break during the telecast of Super Bowl XXIX cost approximately $1,000,000. Not surprisingly, potential sponsors wanted to know how many people might be watching. In a survey of 1015 potential viewers, 281 said they expected to see less than a quarter of the advertisements aired during the game. Define the relevant parameter and estimate it using a 90% confidence interval.
5.3.11. During one of the first "beer wars" in the early 1980s, a taste test between Schlitz and Budweiser was the focus of a nationally broadcast TV commercial. One hundred people agreed to drink from two unmarked mugs and indicate which of the two beers they liked better; fifty-four said "Bud." Construct and interpret the corresponding 95% confidence interval for $p$, the true proportion of beer-drinkers who prefer Budweiser to Schlitz. How would Budweiser and Schlitz executives each put these results in the best possible light for their respective companies?
5.3.12. If (0.57, 0.63) is a 50% confidence interval for $p$, what does $\frac{k}{n}$ equal and how many observations were taken?
5.3.13. Suppose a coin is to be tossed $n$ times for the purpose of estimating $p$, where $p = P(\text{heads})$. How large must $n$ be to guarantee that the length of the 99% confidence interval for $p$ will be less than 0.02?
5.3.14. On the morning of November 9, 1994 (the day after the electoral landslide that returned Republicans to power in both branches of Congress), several key races were still in doubt. The most prominent was the Washington contest involving Democrat Tom Foley, the reigning speaker of the house. An Associated Press story showed how narrow the margin had become (124):

With 99 percent of precincts reporting, Foley trailed Republican challenger Nethercutt by just [...] votes, or 50.6 percent to 49.4 percent. About 14,000 absentee ballots remained uncounted, making the race too close to call.

Let $p = P(\text{absentee voter prefers Foley})$. How small could $p$ have been and still have given Foley a 20% chance of overcoming Nethercutt's lead and winning the election?
5.3.15. Which of the following two intervals has the greater probability of containing the binomial parameter $p$?
X
-+
II
or
5.3.16. Examine the first two derivatives of the function $g(p) = p(1-p)$ to verify the claim that $p(1-p) \le \frac{1}{4}$ for $0 < p < 1$.
5.3.17. Money magazine reported that 30% of 1013 adults selected at random could not correctly define any of the four main types of life insurance. Built into that figure, the article stated, is a "3.1% margin of error." Verify that computation and explain in a short paragraph what the 3.1% implies.
5.3.18. Viral infections contracted early during a woman's pregnancy can be very harmful to the fetus. One study found a total of 86 deaths and birth defects among 202 pregnancies complicated by a first-trimester German measles infection (47). Is it believable that the true proportion of abnormal births under those circumstances could be as high as 50%? Answer the question by calculating the margin of error for the sample proportion, 86/202.
5.3.19. Rewrite Definition 5.3.1 to cover the case where a finite correction factor needs to be included (i.e., situations where the sample size $n$ is not negligible relative to the population size $N$).
5.3.20. A Forbes-Gallup poll in the summer of 1994 questioned 304 chief executives chosen from a list of 865 of the nation's largest companies. To the question "Over the next 6 months do you expect the overall U.S. business climate to get better, worse, or remain about the same?" 70 of the 304 said "better" (52). What margin of error is associated with their claim that 23% of CEOs $\left(= \frac{70}{304} \times 100\right)$ are bullish on the economy? Include a finite correction factor in your calculation (see Question 5.3.19).
5.3.21. Given that $n$ observations will produce a binomial parameter estimator, $\frac{X}{n}$, having a margin of error equal to 0.06, how many observations are required for the proportion to have a margin of error half that size?
5.3.22. Given that a political poll shows that 52% of the sample would vote for candidate A, whereas 48% would vote for candidate B, and given that the margin of error associated with the survey is 0.05, does it make sense to claim that the two candidates are tied? Explain.
5.3.23. Assume that the binomial parameter $p$ is to be estimated with the function $\frac{X}{n}$, where $X$ is the number of successes in $n$ independent trials. Which demands the larger sample size: requiring that $\frac{X}{n}$ have a 96% probability of being within 0.05 of $p$, or requiring
that X have a
;robability of
within 0.04 of
n
5.3.24. Suppose that $p$ is to be estimated by $\frac{X}{n}$ and we are willing to assume that the true $p$ will not be greater than 0.4. What is the smallest $n$ for which $\frac{X}{n}$ will have a 99% probability of being within 0.05 of $p$?
5.3.25. Let $p$ denote the true proportion of students who support the movement to colorize classic films. Let the random variable $X$ denote the number of students (out of $n$) who prefer colorized versions to black and white. What is the smallest sample size for which the probability is 80% that the difference between $\frac{X}{n}$ and $p$ is less than 0.02?
5.3.26. University officials are planning to audit 1586 new appointments to estimate the proportion $p$ who have been incorrectly processed by the Payroll Department.
(a) How large does the sample size need to be in order for $\frac{X}{n}$, the sample proportion, to have an 85% chance of lying within 0.03 of $p$?
(b) Past audits suggest that $p$ will not be larger than 0.10. Using that information, recalculate the sample size asked for in Part (a).
5.4 PROPERTIES OF ESTIMATORS
The method of maximum likelihood and the method of moments described in Section 5.2 both use very reasonable criteria to identify estimators for unknown parameters, yet the two do not always yield the same answer. For example, given that $Y_1, Y_2, \ldots, Y_n$ is a random sample from the pdf $f_Y(y; \theta) = 1/\theta$, $0 \le y \le \theta$, the maximum likelihood estimator for $\theta$ is $\hat{\theta} = Y_{\max}$ while the method of moments estimator is $\hat{\theta} = \frac{2}{n}\sum_{i=1}^{n} Y_i$. Implicit in those two formulas is an obvious question: which should we use?

More generally, the fact that parameters have multiple estimators (actually, an infinite number of $\hat{\theta}$'s can be found for any given $\theta$) requires that we investigate the statistical properties associated with the estimation process. What qualities should a "good" estimator have? Is it possible to find a "best" $\hat{\theta}$? These and other questions relating to the theory of estimation will be addressed in the next several sections.

To understand the mathematics of estimation, we must first keep in mind that every estimator is a function of a set of random variables, that is, $\hat{\theta} = h(Y_1, Y_2, \ldots, Y_n)$. As such, any $\hat{\theta}$, itself, is a random variable: It has a pdf, an expected value, and a variance, all three of which play key roles in evaluating its capabilities.

We will denote the pdf of an estimator (at some point $u$) with the symbol $f_{\hat{\theta}}(u)$ or $p_{\hat{\theta}}(u)$, depending on whether $\hat{\theta}$ is a continuous or a discrete random variable. Probability calculations involving $\hat{\theta}$ will reduce to integrals of $f_{\hat{\theta}}(u)$ (if $\hat{\theta}$ is continuous) or sums of $p_{\hat{\theta}}(u)$ (if $\hat{\theta}$ is discrete).
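EXAMPLE 5.4.1
a. Suppose a coin for which $p = P(\text{heads})$ is unknown is to be tossed ten times for the purpose of estimating $p$ with the function $\hat{p} = \frac{X}{10}$, where $X$ is the observed number of heads. If $p = 0.60$, what is the probability that $\left|\frac{X}{10} - 0.60\right| \le 0.10$? That is, what are the chances that the estimator will fall within 0.10 of the true value of the parameter? Here $\hat{p}$ is discrete: the only values $\frac{X}{10}$ can take on are $\frac{0}{10}, \frac{1}{10}, \ldots, \frac{10}{10}$. Moreover, when $p = 0.60$,

$$p_{\hat{p}}\left(\frac{k}{10}\right) = P\left(\hat{p} = \frac{k}{10}\right) = P(X = k) = \binom{10}{k}(0.60)^k(0.40)^{10-k}, \quad k = 0, 1, \ldots, 10$$

Therefore,

$$P\left(\left|\frac{X}{10} - 0.60\right| \le 0.10\right) = P\left(0.60 - 0.10 \le \frac{X}{10} \le 0.60 + 0.10\right) = P(5 \le X \le 7) = \sum_{k=5}^{7}\binom{10}{k}(0.60)^k(0.40)^{10-k} = 0.6665$$

b. How likely is the estimator $\frac{X}{n}$ to lie within 0.10 of $p$ if the coin in part (a) is tossed one hundred times? Given that $n$ is so large, a $Z$ transformation can be used to approximate the variation in $\frac{X}{100}$. Since $E\left(\frac{X}{n}\right) = p$ and $\text{Var}\left(\frac{X}{n}\right) = p(1-p)/n$, we can write

$$P\left(\left|\frac{X}{100} - 0.60\right| \le 0.10\right) = P\left(\frac{0.50 - 0.60}{\sqrt{\frac{(0.60)(0.40)}{100}}} \le \frac{X/100 - 0.60}{\sqrt{\frac{(0.60)(0.40)}{100}}} \le \frac{0.70 - 0.60}{\sqrt{\frac{(0.60)(0.40)}{100}}}\right) \doteq P(-2.04 \le Z \le 2.04) = 0.9586$$

Figure 5.4.1 shows the two probabilities just computed as areas under the probability functions describing $\frac{X}{10}$ and $\frac{X}{100}$. As we would expect, the larger sample size produces a more precise estimator: with $n = 10$, $\frac{X}{10}$ has only a 67% chance of lying in the range from 0.50 to 0.70; for $n = 100$, though, the probability of $\frac{X}{100}$ falling within 0.10 of the true $p$ (= 0.60) increases to 96%.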
FIGURE 5.4.1 [(Approximate) distribution of $\frac{X}{100}$ when $p = 0.60$ (area = 0.9586) and distribution of $\frac{X}{10}$ when $p = 0.60$ (area = 0.6665), plotted against values of $X/n$ from 0 to 1.]
Are the additional ninety observations worth the gain in precision that we see in Figure 5.4.1? Maybe yes and maybe no. In general, the answer to that sort of question depends on two factors: (1) the cost of taking additional measurements, and (2) the cost of making bad decisions or inappropriate inferences because of inaccurate estimates. In practice, both costs can be very difficult to quantify.
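The two probabilities in Example 5.4.1 can be reproduced numerically. The sketch below uses illustrative function names (`exact_prob`, `normal_approx`); the second function carries the $Z$ ratio at full precision (about 2.0412) rather than the table value 2.04, so its result differs from 0.9586 in the fourth decimal place.

```python
import math
from statistics import NormalDist

def exact_prob(n, p, lo, hi):
    """P(lo <= X <= hi) for X binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

def normal_approx(n, p, d):
    """Approximate P(|X/n - p| <= d) using a Z transformation."""
    z = d / math.sqrt(p * (1 - p) / n)
    return 2 * NormalDist().cdf(z) - 1

print(round(exact_prob(10, 0.60, 5, 7), 4))      # 0.6665
print(round(normal_approx(100, 0.60, 0.10), 4))  # close to the text's 0.9586
```

Changing `n` in either call shows how quickly the estimator concentrates around the true $p$ as the sample grows.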
Unbiasedness
Because they are random variables, estimators will take on different values from sample to sample. Typically, some samples will yield $\theta_e$'s that underestimate $\theta$ while others will lead to $\theta_e$'s that are numerically too large. Intuitively, we would like the underestimates to somehow "balance out" the overestimates; that is, $\hat{\theta}$ should not systematically err in any one particular direction.

Figure 5.4.2 shows the pdfs for two estimators, $\hat{\theta}_1$ and $\hat{\theta}_2$. Common sense tells us that $\hat{\theta}_1$ is the better of the two because $f_{\hat{\theta}_1}(u)$ is centered with respect to the true $\theta$; $\hat{\theta}_2$, on the other hand, will tend to give estimates that are too large because the bulk of $f_{\hat{\theta}_2}(u)$ lies to the right of the true $\theta$.
FIGURE 5.4.2 [Pdfs $f_{\hat{\theta}_1}(u)$ and $f_{\hat{\theta}_2}(u)$, each drawn relative to the true $\theta$.]

Definition 5.4.1. Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample from the continuous pdf $f_Y(y; \theta)$, where $\theta$ is an unknown parameter. An estimator $\hat{\theta}$ $[= h(Y_1, Y_2, \ldots, Y_n)]$ is said to be unbiased (for $\theta$) if $E(\hat{\theta}) = \theta$ for all $\theta$. [The same concept and terminology apply if the data consist of a random sample $X_1, X_2, \ldots, X_n$ drawn from a discrete pdf $p_X(k; \theta)$.]
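EXAMPLE 5.4.2
It was mentioned at the outset of this section that $\hat{\theta}_1 = \frac{2}{n}\sum_{i=1}^{n} Y_i$ and $\hat{\theta}_2 = Y_{\max}$ are two estimators for $\theta$ in the uniform pdf, $f_Y(y; \theta) = 1/\theta$, $0 \le y \le \theta$. Are either or both unbiased?

An application of the corollary to Theorem 3.9.2, together with the fact that $E(Y_i) = \theta/2$, proves that $\hat{\theta}_1$ is unbiased for $\theta$:

$$E(\hat{\theta}_1) = E\left(\frac{2}{n}\sum_{i=1}^{n} Y_i\right) = \frac{2}{n}\sum_{i=1}^{n} E(Y_i) = \frac{2}{n}\sum_{i=1}^{n}\frac{\theta}{2} = \frac{2}{n}\cdot\frac{n\theta}{2} = \theta$$

The maximum likelihood estimator, on the other hand, is biased: since $Y_{\max}$ is necessarily less than or equal to $\theta$, $f_{\hat{\theta}_2}(u)$ will not be centered with respect to $\theta$, and $E(\hat{\theta}_2)$ will be less than $\theta$. The exact extent to which $\hat{\theta}_2$ tends to underestimate $\theta$ is easily calculated. Recall from Example 3.10.2 that

$$f_{Y_{\max}}(u) = n\left(\frac{u}{\theta}\right)^{n-1}\cdot\frac{1}{\theta}, \quad 0 \le u \le \theta$$

Therefore,

$$E(\hat{\theta}_2) = \int_0^{\theta} u \cdot \frac{n}{\theta}\left(\frac{u}{\theta}\right)^{n-1} du = \frac{n}{\theta^n}\int_0^{\theta} u^n\, du = \frac{n}{\theta^n}\cdot\frac{u^{n+1}}{n+1}\Bigg|_0^{\theta} = \frac{n}{n+1}\,\theta$$

If $n = 3$, $Y_{\max}$ will be only $\frac{3}{4}$ as large as $\theta$, on the average. As $n$ increases, though, the bias in $\hat{\theta}_2$ decreases [which makes sense because $f_{\hat{\theta}_2}(u)$ will become increasingly concentrated around $\theta$ as the sample size increases].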
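The shrinking bias of $Y_{\max}$ can be seen in a small simulation. This is an illustrative Python sketch under our own assumptions (the function name `mean_ymax`, the seed, and the number of trials are arbitrary choices, not from the text); it compares the simulated average of $Y_{\max}$ with the theoretical value $n\theta/(n+1)$.

```python
import random

def mean_ymax(n, theta, trials=20_000, seed=1):
    """Average of Y_max over many simulated uniform(0, theta) samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.uniform(0, theta) for _ in range(n))
    return total / trials

theta = 1.0
results = {n: mean_ymax(n, theta) for n in (3, 10, 50)}
for n, simulated in results.items():
    # Second column is the simulation, third is n/(n+1) * theta.
    print(n, round(simulated, 3), round(n / (n + 1) * theta, 3))
```

The simulated averages should sit close to 0.75, 0.909, and 0.980, moving toward $\theta = 1$ as $n$ grows.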
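Comment. For any finite $n$, we can construct an estimator based on $Y_{\max}$ that is unbiased. Let $\hat{\theta}_3 = \frac{n+1}{n}\cdot Y_{\max}$. Then

$$E(\hat{\theta}_3) = E\left(\frac{n+1}{n}\cdot Y_{\max}\right) = \frac{n+1}{n}\,E(Y_{\max}) = \frac{n+1}{n}\cdot\frac{n}{n+1}\,\theta = \theta$$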
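EXAMPLE 5.4.3
Let $X_1, X_2, \ldots, X_n$ be a random sample from a discrete pdf $p_X(k; \theta)$, where $\theta = E(X)$ is an unknown parameter. Consider the estimator

$$\hat{\theta} = \sum_{i=1}^{n} a_i X_i$$

where the $a_i$'s are constants. For what values of $a_1, a_2, \ldots, a_n$ will $\hat{\theta}$ be unbiased?

By assumption, $\theta = E(X)$, so

$$E(\hat{\theta}) = E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i) = \sum_{i=1}^{n} a_i \theta = \theta \sum_{i=1}^{n} a_i$$

Clearly, $\hat{\theta}$ will be unbiased for any set of $a_i$'s for which $\sum_{i=1}^{n} a_i = 1$.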
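EXAMPLE 5.4.4
Given a random sample $Y_1, Y_2, \ldots, Y_n$ from a normal distribution whose parameters $\mu$ and $\sigma^2$ are both unknown, the maximum likelihood estimator for $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$$

(see 5.2.4). Is $\hat{\sigma}^2$ unbiased for $\sigma^2$? If not, what function of $\hat{\sigma}^2$ does have an expected value equal to $\sigma^2$?

Recall, first, that for any random variable $Y$, $\text{Var}(Y) = E(Y^2) - [E(Y)]^2$. Also, from Section 3.9, for any average, $\bar{Y}$, of a sample of $n$ random variables, $Y_1, Y_2, \ldots, Y_n$, $E(\bar{Y}) = E(Y_i)$ and $\text{Var}(\bar{Y}) = (1/n)\text{Var}(Y_i)$. Using those results, we can write

$$E(\hat{\sigma}^2) = E\left[\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2\right] = E\left[\frac{1}{n}\sum_{i=1}^{n}\left(Y_i^2 - 2Y_i\bar{Y} + \bar{Y}^2\right)\right] = E\left[\frac{1}{n}\left(\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2\right)\right]$$

$$= \frac{1}{n}\left[\sum_{i=1}^{n} E(Y_i^2) - nE(\bar{Y}^2)\right] = \frac{1}{n}\left[n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right] = \frac{n-1}{n}\,\sigma^2$$

Since the latter is not equal to $\sigma^2$, $\hat{\sigma}^2$ is biased.

To "unbias" the maximum likelihood estimator in this case, we need simply multiply $\hat{\sigma}^2$ by $\frac{n}{n-1}$. By convention, the unbiased version of the maximum likelihood estimator for $\sigma^2$ in a normal distribution is denoted $S^2$ and is referred to as the sample variance:

$$S^2 = \text{sample variance} = \frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$$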
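The factor $(n-1)/n$ shows up clearly in simulation. Below is a hedged Python sketch (our own illustration; the function name `average_estimates` and the parameter values are assumptions). It uses the standard library's `statistics.pvariance`, which divides by $n$, and `statistics.variance`, which divides by $n - 1$.

```python
import random
import statistics

def average_estimates(n, mu, sigma, trials=20_000, seed=2):
    """Average the divide-by-n and divide-by-(n-1) variance estimates over many normal samples."""
    rng = random.Random(seed)
    mle_total = 0.0
    s2_total = 0.0
    for _ in range(trials):
        ys = [rng.gauss(mu, sigma) for _ in range(n)]
        mle_total += statistics.pvariance(ys)  # (1/n) * sum of squared deviations
        s2_total += statistics.variance(ys)    # (1/(n-1)) * sum of squared deviations
    return mle_total / trials, s2_total / trials

# With n = 5 and sigma^2 = 4, theory gives E(MLE) = (4/5)(4) = 3.2 and E(S^2) = 4.
mle_avg, s2_avg = average_estimates(n=5, mu=10.0, sigma=2.0)
print(round(mle_avg, 2), round(s2_avg, 2))
```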
Comment. The square root of the sample variance is called the sample standard deviation:

$$S = \text{sample standard deviation} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$$

In practice, $S$ is the most commonly used estimator for $\sigma$ even though $E(S) \ne \sigma$ [despite the fact that $E(S^2) = \sigma^2$].
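EXAMPLE 5.4.5
By definition, the geometric mean of a set of $n$ numbers is the $n$th root of their product. Let $Y_1$ and $Y_2$ be a random sample of size two from the pdf $f_Y(y; \theta) = \frac{1}{\theta}e^{-y/\theta}$, $y > 0$, where $\theta$ is an unknown parameter. Find an unbiased estimator for $\theta$ based on the sample's geometric mean, $\sqrt{Y_1 Y_2}$.

By Theorem 3.9.1,

$$E\left(\sqrt{Y_1 Y_2}\right) = E\left(\sqrt{Y_1}\right)E\left(\sqrt{Y_2}\right) = \left[\int_0^{\infty}\sqrt{y}\cdot\frac{1}{\theta}e^{-y/\theta}\,dy\right]^2$$

But letting $t = y/\theta$ gives $dt = dy/\theta$ and

$$\int_0^{\infty}\sqrt{y}\cdot\frac{1}{\theta}e^{-y/\theta}\,dy = \sqrt{\theta}\int_0^{\infty} t^{1/2}e^{-t}\,dt = \sqrt{\theta}\;\Gamma\!\left(\tfrac{3}{2}\right) = \sqrt{\theta}\cdot\tfrac{1}{2}\sqrt{\pi}$$

(recall Theorem 4.6.2). Therefore,

$$E\left(\sqrt{Y_1 Y_2}\right) = \frac{\pi\theta}{4}$$

which implies that

$$\hat{\theta} = \frac{4\sqrt{Y_1 Y_2}}{\pi}$$

is unbiased for $\theta$.

Table 5.4.1 is a computer simulation showing the performance of the estimator $\hat{\theta} = 4\sqrt{Y_1 Y_2}/\pi$ when $\theta = 1$. In columns C1 and C2 are forty random samples drawn from the pdf $f_Y(y; 1) = e^{-y}$, $y > 0$. The corresponding geometric means $\sqrt{y_1 y_2}$ are listed in column C3 and the forty simulated $\theta_e$'s appear in column C4.

Sample twenty-eight yielded the smallest estimate ($\theta_e = 0.13792$), while Sample twenty-nine erred the most in the other direction ($\theta_e = 3.97022$). Notice that the average of the forty $\theta_e$'s (= 1.02) is close to the parameter's true value ($\theta = 1.00$). That the two agree so well, of course, is not surprising, given that $\hat{\theta}$ is unbiased for $\theta$.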
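A simulation of this kind is easy to reproduce. The sketch below is our own Python illustration, not the program that produced Table 5.4.1; the names `estimate_from_gm` and `simulate` and the seed are assumptions, so the individual draws will differ from the tabled values.

```python
import math
import random

def estimate_from_gm(gm):
    """The unbiased estimate 4*sqrt(y1*y2)/pi, computed from the geometric mean."""
    return 4.0 * gm / math.pi

def simulate(theta, samples, seed=3):
    """Draw `samples` pairs from the exponential pdf with mean theta and return the estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(samples):
        y1 = rng.expovariate(1.0 / theta)  # expovariate takes the rate, 1/theta
        y2 = rng.expovariate(1.0 / theta)
        estimates.append(estimate_from_gm(math.sqrt(y1 * y2)))
    return estimates

forty = simulate(theta=1.0, samples=40)
many = simulate(theta=1.0, samples=50_000)
print(sum(forty) / len(forty), sum(many) / len(many))
```

With only forty samples the average can wander noticeably from 1; with fifty thousand it should settle very near the true $\theta$. As a spot check on the relationship between columns C3 and C4, `estimate_from_gm(0.10832)` reproduces the smallest tabled estimate, 0.13792, to five decimal places.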
TABLE 5.4.1
C3
Est.
y2
1.01324
0.84515
0.44146
1.55721
1.68906
0.36449
1.12210
1.54124
0.12599
0.20148
0.53266
0.20425
4.49631
0.07196
0.50555
2.00492
4.40562
0.07702
0.13929
0.09732
0.24751
0.20255
0.04071
0.23687
0..85065
033847
0.67740
2.92107
0.31922
1.86945
0.41461
0.33562
0.23355
0.45424
1.73641
0.07541
0.29699
1.49059
0.48274
2.43756
1.45129
0.61484
0.37557
0.46802
0.17789
0.47298
0.15451
1.43477
0.48771
0.72270
1.06104
0.97953
0.01732
1.28070
3.40310
2.53520
1.53845
3.60054
030786
2..50065
37 0.52834
38 0.80602
39· 0.17185
40 0.98211
0.09598
1.22856
0.64045
0.38732
1.10229
0.86581
0.09313
1.12503
2.84524
1.04371
0..58988
0.87399
0.37540
1.70620
0.83684
0.34976
0..51193
o Jn671
0,46773
0.12326
0.39774
0.55177
1.47327
0.41882
0.85656
1.11027
1.28632
0.18986
0.15741
0.21455
0.19556
0.53909
0.14091
0.41375
0.95004
0..57580
0.10832
3.11820
0.35060
2.04473
1.27423
0.77193
1.99220
0..51628
0.48259
0.77098
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
23
24
25
26
27
28
29
30
31
32
33
34
35
0.70495
3.96959
0.42351
0.76114
1.07608
1.94639
1.1128(1
0.47797
2.17241
1.06550
0..50641
0.70254
1.87583
0.53326
1.09061
1.41364
1.63780
0.24174
0.20043
average til'
0.27317
0.24899
0.68639
0.17941
0.52680
1.20963
0.73313
0.13792
3.97022
0.44640
2.60343
1.62240
0.98285
2.53655
0.65735
0.61445
0.98164
1.92816
0.53923
0.96911
= 1.02
QUESTIONS
5.4.1. Two chips are drawn without replacement from an urn containing five chips, numbered 1 through 5. The average of the two drawn is to be used as an estimator, $\hat{\theta}$, for the true average of all the chips ($\theta = 3$). Calculate $P(|\hat{\theta} - 3| > 1.0)$.
5.4.2. Suppose a random sample of size $n = 6$ is drawn from the uniform pdf $f_Y(y; \theta) = 1/\theta$, $0 \le y \le \theta$, for the purpose of using $\hat{\theta} = Y_{\max}$ to estimate $\theta$.
(a) Calculate the probability that $\hat{\theta}$ falls within 0.2 of $\theta$ given that the parameter's true value is 3.0.
(b) Calculate the probability of the event asked for in Part (a) assuming the sample size is 3 instead of 6.
5.4.3. Five hundred adults are asked whether they favor a bipartisan campaign finance reform bill. If the true proportion of the electorate in favor of the legislation is 52%, what are the chances that fewer than half of those in the sample support the proposal? Use a $Z$ transformation to approximate the answer.
5.4.4. A sample of size $n = 16$ is drawn from a normal distribution where $\sigma = 10$ but $\mu$ is unknown. If $\mu = 20$, what is the probability that the estimator $\hat{\mu} = \bar{Y}$ will lie between 19.0 and 21.0?
5.4.5. Suppose $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ drawn from a Poisson pdf, where $\lambda$ is an unknown parameter. Show that $\hat{\lambda} = \bar{X}$ is unbiased for $\lambda$. For what type of parameter, in general, will the sample mean necessarily be an unbiased estimator? Hint: The answer is implicit in the derivation showing that $\bar{X}$ is unbiased for the Poisson $\lambda$.
5.4.6. Let $Y_{\min}$ be the smallest order statistic in a random sample of size $n$ drawn from the uniform pdf, $f_Y(y; \theta) = 1/\theta$, $0 \le y \le \theta$. Find an unbiased estimator for $\theta$ based on $Y_{\min}$.
5.4.7. Let $Y$ be the random variable described in 5.2.3, where $f_Y(y; \theta) = e^{-(y - \theta)}$, $\theta \le y$, $\theta > 0$. Show that $Y - 1$ is an unbiased estimator for $\theta$.
5.4.8. Suppose that 14, 10, 18, and ... constitute a random sample of size 4 drawn from a uniform pdf defined over the interval $[0, \theta]$, where $\theta$ is unknown. Find an unbiased estimator for $\theta$ based on $Y_3'$, the third order statistic. What numerical value does the estimator have for these particular observations? Is it possible that we would know that an estimate for $\theta$ based on $Y_3'$ was incorrect, even if we had no idea what the true value of $\theta$ might be? Explain.
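5.4.9. A random sample of size 2, $Y_1$ and $Y_2$, is drawn from the pdf $f_Y(y; \theta) = \ldots$, $0 < y < \frac{1}{\theta}$. What must $c$ equal if the statistic $c(Y_1 + 2Y_2)$ is to be an unbiased estimator for $\frac{1}{\theta}$?
5.4.10. A sample of size 1 is drawn from the uniform pdf defined over the interval $[0, \theta]$. Find an unbiased estimator for ... Hint: Is $\hat{\theta} = \ldots$ unbiased?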
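5.4.11. Suppose that $W$ is an unbiased estimator for $\theta$. Can $W^2$ be an unbiased estimator for $\theta^2$?
5.4.12. We showed in Example 5.4.4 that $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$ is biased for $\sigma^2$. Suppose $\mu$ is known and does not have to be estimated by $\bar{Y}$. Show that $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \mu)^2$ is unbiased for $\sigma^2$.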
5.4.13. As an alternative to imposing unbiasedness, an estimator's distribution can be "centered" by requiring that its median be equal to the unknown parameter $\theta$. If it is, $\hat{\theta}$ is said to be median unbiased. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ from the uniform pdf, $f_Y(y; \theta) = 1/\theta$, $0 \le y \le \theta$. For arbitrary $n$, is $\hat{\theta} = \frac{n+1}{n}\cdot Y_{\max}$ median unbiased? Is it median unbiased for any value of $n$?
5.4.14. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ from the pdf $f_Y(y; \theta) = \frac{1}{\theta}e^{-y/\theta}$, $y > 0$. Let $\hat{\theta} = n \cdot Y_{\min}$. Is $\hat{\theta}$ unbiased for $\theta$? Is $\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ unbiased for $\theta$?
5.4.15. An estimator $\hat{\theta}_n = h(W_1, \ldots, W_n)$ is said to be asymptotically unbiased for $\theta$ if $\lim_{n \to \infty} E(\hat{\theta}_n) = \theta$. Suppose $W$ is a random variable with $E(W) = \mu$ and with variance $\sigma^2$. Show that $\bar{W}^2$ is an asymptotically unbiased estimator for $\mu^2$.
5.4.16. Is the maximum likelihood estimator for $\sigma^2$ in a normal pdf, where both $\mu$ and $\sigma^2$ are unknown, asymptotically unbiased?
Efficiency
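As we have seen, unknown parameters can have a multiplicity of unbiased estimators. For samples drawn from the uniform pdf, $f_Y(y; \theta) = 1/\theta$, $0 \le y \le \theta$, for example, both $\hat{\theta} = \frac{n+1}{n}\cdot Y_{\max}$ and $\hat{\theta} = \frac{2}{n}\sum_{i=1}^{n} Y_i$ have expected values equal to $\theta$. Does it matter which we choose?

Yes. Unbiasedness is not the only property we would like an estimator to have; also important is its precision. Figure 5.4.3 shows the pdfs associated with two hypothetical estimators, $\hat{\theta}_1$ and $\hat{\theta}_2$. Both are unbiased for $\theta$, but $\hat{\theta}_2$ is the better of the two because of its smaller variance. For any value $r$,

$$P(\theta - r \le \hat{\theta}_2 \le \theta + r) > P(\theta - r \le \hat{\theta}_1 \le \theta + r)$$

That is, $\hat{\theta}_2$ has a greater chance of being within a distance $r$ of the unknown $\theta$ than does $\hat{\theta}_1$.

[Figure: pdfs of $\hat{\theta}_1$ and $\hat{\theta}_2$, both centered at $\theta$, with the areas $P(|\hat{\theta} - \theta| \le r)$ marked between $\theta - r$ and $\theta + r$.]
FIGURE 5.4.3

Definition 5.4.2. Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two unbiased estimators for a parameter $\theta$. If

$$\text{Var}(\hat{\theta}_1) < \text{Var}(\hat{\theta}_2)$$

we say that $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$. Also, the relative efficiency of $\hat{\theta}_1$ with respect to $\hat{\theta}_2$ is the ratio $\text{Var}(\hat{\theta}_2)/\text{Var}(\hat{\theta}_1)$.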
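EXAMPLE 5.4.6
Let $Y_1$, $Y_2$, and $Y_3$ be a random sample from a normal distribution where both $\mu$ and $\sigma$ are unknown. Which is a more efficient estimator for $\mu$,

$$\hat{\mu}_1 = \tfrac{1}{4}Y_1 + \tfrac{1}{2}Y_2 + \tfrac{1}{4}Y_3$$

or

$$\hat{\mu}_2 = \tfrac{1}{3}Y_1 + \tfrac{1}{3}Y_2 + \tfrac{1}{3}Y_3$$

Notice, first, that both $\hat{\mu}_1$ and $\hat{\mu}_2$ are unbiased for $\mu$:

$$E(\hat{\mu}_1) = E\left(\tfrac{1}{4}Y_1 + \tfrac{1}{2}Y_2 + \tfrac{1}{4}Y_3\right) = \tfrac{1}{4}E(Y_1) + \tfrac{1}{2}E(Y_2) + \tfrac{1}{4}E(Y_3) = \tfrac{1}{4}\mu + \tfrac{1}{2}\mu + \tfrac{1}{4}\mu = \mu$$

and

$$E(\hat{\mu}_2) = E\left(\tfrac{1}{3}Y_1 + \tfrac{1}{3}Y_2 + \tfrac{1}{3}Y_3\right) = \tfrac{1}{3}E(Y_1) + \tfrac{1}{3}E(Y_2) + \tfrac{1}{3}E(Y_3) = \tfrac{1}{3}\mu + \tfrac{1}{3}\mu + \tfrac{1}{3}\mu = \mu$$

But $\text{Var}(\hat{\mu}_2) < \text{Var}(\hat{\mu}_1)$, so $\hat{\mu}_2$ is the more efficient of the two:

$$\text{Var}(\hat{\mu}_1) = \text{Var}\left(\tfrac{1}{4}Y_1 + \tfrac{1}{2}Y_2 + \tfrac{1}{4}Y_3\right) = \tfrac{1}{16}\text{Var}(Y_1) + \tfrac{1}{4}\text{Var}(Y_2) + \tfrac{1}{16}\text{Var}(Y_3) = \frac{3\sigma^2}{8}$$

$$\text{Var}(\hat{\mu}_2) = \text{Var}\left(\tfrac{1}{3}Y_1 + \tfrac{1}{3}Y_2 + \tfrac{1}{3}Y_3\right) = \tfrac{1}{9}\text{Var}(Y_1) + \tfrac{1}{9}\text{Var}(Y_2) + \tfrac{1}{9}\text{Var}(Y_3) = \frac{3\sigma^2}{9}$$

(The relative efficiency of $\hat{\mu}_2$ to $\hat{\mu}_1$ is $\frac{3\sigma^2}{8}\Big/\frac{3\sigma^2}{9} = \frac{9}{8}$, or 1.125.)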
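For independent observations with common variance $\sigma^2$, the variance of a weighted sum is $\sigma^2\sum a_i^2$, so comparing linear estimators comes down to comparing sums of squared weights. A short Python sketch of that bookkeeping, using exact fractions (the function name `variance_multiplier` is our own):

```python
from fractions import Fraction as F

def variance_multiplier(weights):
    """Return sum(a_i^2): Var(sum a_i Y_i) / sigma^2 for independent Y_i with variance sigma^2."""
    return sum(w * w for w in weights)

weights_1 = [F(1, 4), F(1, 2), F(1, 4)]  # weights of mu-hat-1
weights_2 = [F(1, 3), F(1, 3), F(1, 3)]  # weights of mu-hat-2

v1 = variance_multiplier(weights_1)  # variance of mu-hat-1, in units of sigma^2
v2 = variance_multiplier(weights_2)  # variance of mu-hat-2, in units of sigma^2
relative_efficiency = v1 / v2        # efficiency of mu-hat-2 relative to mu-hat-1

print(v1, v2, relative_efficiency)
```

Both weight vectors sum to 1, which is the unbiasedness condition of Example 5.4.3; among all such weights, equal weights minimize the sum of squares.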
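EXAMPLE 5.4.7
Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from the uniform pdf defined over the interval $[0, \theta]$. We know from Example 5.4.2 that $\hat{\theta}_1 = \frac{2}{n}\sum_{i=1}^{n} Y_i$ and $\hat{\theta}_2 = \frac{n+1}{n}\cdot Y_{\max}$ are both unbiased for $\theta$. Which estimator is more efficient?

Appealing once again to the fact that $\text{Var}(Y) = E(Y^2) - [E(Y)]^2$ for any random variable $Y$, we can write

$$\text{Var}(\hat{\theta}_1) = \text{Var}\left(\frac{2}{n}\sum_{i=1}^{n} Y_i\right) = \frac{4}{n^2}\sum_{i=1}^{n}\text{Var}(Y_i) = \frac{4}{n^2}\sum_{i=1}^{n}\left[E(Y_i^2) - [E(Y_i)]^2\right]$$

But $E(Y_i) = \frac{\theta}{2}$ (by symmetry) and $E(Y_i^2) = \int_0^{\theta} y^2\cdot\frac{1}{\theta}\,dy = \frac{\theta^2}{3}$, so

$$\text{Var}(\hat{\theta}_1) = \frac{4}{n^2}\cdot n\left(\frac{\theta^2}{3} - \frac{\theta^2}{4}\right) = \frac{\theta^2}{3n}$$

Similarly,

$$\text{Var}(\hat{\theta}_2) = \text{Var}\left(\frac{n+1}{n}\cdot Y_{\max}\right) = \left(\frac{n+1}{n}\right)^2\text{Var}(Y_{\max}) = \left(\frac{n+1}{n}\right)^2\left[E(Y_{\max}^2) - [E(Y_{\max})]^2\right]$$

From Example 3.10.2, $f_{Y_{\max}}(y) = \frac{n}{\theta}\left(\frac{y}{\theta}\right)^{n-1}$, $0 \le y \le \theta$, so

$$E(Y_{\max}^2) = \int_0^{\theta} y^2\cdot\frac{n}{\theta}\left(\frac{y}{\theta}\right)^{n-1} dy = \frac{n}{\theta^n}\int_0^{\theta} y^{n+1}\,dy = \frac{n}{\theta^n}\cdot\frac{y^{n+2}}{n+2}\Bigg|_0^{\theta} = \frac{n}{n+2}\,\theta^2$$

We already know that $E(Y_{\max}) = \frac{n}{n+1}\,\theta$, so

$$\text{Var}(\hat{\theta}_2) = \left(\frac{n+1}{n}\right)^2\left[\frac{n}{n+2}\,\theta^2 - \left(\frac{n}{n+1}\,\theta\right)^2\right] = \frac{\theta^2}{n(n+2)}$$

Notice that $n(n+2)/3n > 1$ for $n > 1$, which implies that $\hat{\theta}_2 = \frac{n+1}{n}\cdot Y_{\max}$ has a smaller variance (and is more efficient) than $\hat{\theta}_1 = \frac{2}{n}\sum_{i=1}^{n} Y_i$.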
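The two variance formulas, $\theta^2/(3n)$ and $\theta^2/[n(n+2)]$, can be checked by simulation. The following Python sketch is illustrative only; the function name `simulate_variances`, the seed, and the choice $n = 5$, $\theta = 1$ are our own assumptions.

```python
import random
import statistics

def simulate_variances(n, theta, trials=20_000, seed=4):
    """Simulated variances of (2/n)*sum(Y_i) and ((n+1)/n)*Y_max for uniform(0, theta) samples."""
    rng = random.Random(seed)
    est1, est2 = [], []
    for _ in range(trials):
        ys = [rng.uniform(0, theta) for _ in range(n)]
        est1.append(2 * sum(ys) / n)
        est2.append((n + 1) / n * max(ys))
    return statistics.pvariance(est1), statistics.pvariance(est2)

n, theta = 5, 1.0
var1, var2 = simulate_variances(n, theta)
print(var1, theta**2 / (3 * n))        # simulated vs. theta^2 / (3n)
print(var2, theta**2 / (n * (n + 2)))  # simulated vs. theta^2 / (n(n+2))
```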
CASE STUDY 5.4.1
During World War II, a very simple statistical procedure was developed for estimating German war production. It was based on serial numbers and proved to be extremely reliable. When the war ended and the Third Reich's "true" production figures were revealed, it was found that serial number estimates were far more accurate in every instance than all the information gleaned from the more traditional espionage operations, spies, and informants.

Every piece of German equipment, whether it was a V-2 rocket, a tank, or a tire, was stamped with a serial number that indicated the order in which it was manufactured. If the total number of, say, Mark I tanks produced by a certain date was $N$, each would bear one of the integers 1 to $N$. As the war progressed, some of these numbers became known to the Allies, either by the direct capture of a piece of equipment or when a command post was overrun. For the War Department, the problem was to estimate $N$ using the sample of "captured" serial numbers, $1 \le Y_1' < Y_2' < \cdots < Y_n' \le N$.

More than one approach was taken. One model assumed that the $n$ serial numbers represented one of the $\binom{N}{n}$ possible sets of $n$ ordered integers from 1 to $N$ and that each set was equally likely. That is,

$$P(Y_1' = y_1' < Y_2' = y_2' < \cdots < Y_n' = y_n') = \binom{N}{n}^{-1}$$

The parameter $N$ was then estimated by adding the average "gap" in the serial numbers to the largest serial number:

$$\hat{N}_1 = Y_{\max}' + \frac{1}{n-1}\sum_{i=2}^{n}\left(Y_i' - Y_{i-1}' - 1\right)$$

So, if five tanks were captured and they bore the numbers 14, ..., and 298, the estimate for the total number of tanks produced would be

$$\hat{N}_1 = 298 + \tfrac{1}{4}\left[(298 - 146 - 1) + \cdots + (\,\cdots - 14 - 1)\right] = 368$$

A second estimator used was a version of the modified maximum likelihood estimator referred to as $\hat{\theta}_2$ in Example 5.4.7:

$$\hat{N}_2 = \left(\frac{n+1}{n}\right)Y_{\max}' - 1$$

Both $\hat{N}_1$ and $\hat{N}_2$ are good estimators, but $\hat{N}_2$ has a slightly smaller variance: It can be shown that

$$\frac{\text{Var}(\hat{N}_2)}{\text{Var}(\hat{N}_1)} = 1 - \frac{1}{n^2}$$

[see (63) for details].

The difference in the accuracy of $\hat{N}_1$ and $\hat{N}_2$ compared with estimates obtained from intelligence reports and covert activities was astounding. The serial number estimate for German tank production in 1942, for example, was thirty-four hundred, very close to the actual figure. The "official" estimate, based on information gathered through more conventional wartime channels, was an inflated eighteen thousand.

Discrepancies of that magnitude were not uncommon. The sophisticated Nazi propaganda machine may have been the root cause of the "normal" estimates being consistently on the high side. Germany sought to intimidate her enemies by exaggerating the country's industrial prowess. On people, the carefully orchestrated dissembling worked exactly as planned; on $\hat{N}_1$ and $\hat{N}_2$, though, it had no effect whatsoever!
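Both serial-number estimators take only a few lines to compute. In the Python sketch below (our own illustration), the function names are hypothetical, and so are two of the five serial numbers: the smallest (14), second largest (146), and largest (298) are the ones that appear in the case study's arithmetic, while 28 and 92 are placeholders chosen only so the list has five entries. Because the gaps telescope, $\hat{N}_1$ depends only on the smallest value, the largest value, and $n$, so the placeholders do not affect the result.

```python
def gap_estimate(serials):
    """N-hat-1: the largest serial number plus the average gap between consecutive ones."""
    ys = sorted(serials)
    n = len(ys)
    gaps = [ys[i] - ys[i - 1] - 1 for i in range(1, n)]
    return ys[-1] + sum(gaps) / (n - 1)

def modified_mle_estimate(serials):
    """N-hat-2: ((n + 1) / n) times the largest serial number, minus 1."""
    n = len(serials)
    return (n + 1) / n * max(serials) - 1

# 28 and 92 are hypothetical placeholder values (see the note above).
captured = [14, 28, 92, 146, 298]

n1 = gap_estimate(captured)
n2 = modified_mle_estimate(captured)
print(n1, n2)
```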
QUESTIONS
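5.4.17. Let $X_1, X_2, \ldots, X_n$ denote the outcomes of a series of $n$ independent trials, where

$$X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$

for $i = 1, 2, \ldots, n$. Let $X = X_1 + X_2 + \cdots + X_n$.
(a) Show that $\hat{p}_1 = X_1$ and $\hat{p}_2 = \frac{X}{n}$ are unbiased estimators for $p$.
(b) Intuitively, $\hat{p}_2$ is a better estimator than $\hat{p}_1$ because $\hat{p}_1$ fails to include any of the information about the parameter contained in trials 2 through $n$. Verify that speculation by comparing the variances of $\hat{p}_1$ and $\hat{p}_2$.
5.4.18. Suppose that $n = 5$ observations are taken from the uniform pdf, $f_Y(y; \theta) = 1/\theta$, $0 \le y \le \theta$, where $\theta$ is unknown. Two unbiased estimators for $\theta$ are

$$\hat{\theta}_1 = \frac{6}{5}\cdot Y_{\max} \quad \text{and} \quad \hat{\theta}_2 = 6\cdot Y_{\min}$$

Which estimator would be better to use? Hint: What must be true of $\text{Var}(Y_{\max})$ and $\text{Var}(Y_{\min})$ given that $f_Y(y; \theta)$ is symmetric? Does your answer as to which estimator is better make sense on intuitive grounds? Explain.
5.4.19. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ from the pdf $f_Y(y; \theta) = \frac{1}{\theta}e^{-y/\theta}$, $y > 0$.
(a) Show that $\hat{\theta}_1 = Y_1$, $\hat{\theta}_2 = \bar{Y}$, and $\hat{\theta}_3 = n\cdot Y_{\min}$ are all unbiased estimators for $\theta$.
(b) Find the variances of $\hat{\theta}_1$, $\hat{\theta}_2$, and $\hat{\theta}_3$.
(c) Calculate the relative efficiencies of $\hat{\theta}_1$ to $\hat{\theta}_3$ and $\hat{\theta}_2$ to $\hat{\theta}_3$.
5.4.20. Given a random sample of size $n$ from a Poisson distribution, $\hat{\lambda}_1 = X_1$ and $\hat{\lambda}_2 = \bar{X}$ are two unbiased estimators for $\lambda$. Calculate the relative efficiency of $\hat{\lambda}_1$ to $\hat{\lambda}_2$.
5.4.21. If $Y_1, Y_2, \ldots, Y_n$ are random observations from a uniform pdf over $[0, \theta]$, both $\hat{\theta}_1 = \left(\frac{n+1}{n}\right)\cdot Y_{\max}$ and $\hat{\theta}_2 = (n+1)\cdot Y_{\min}$ are unbiased estimators for $\theta$. Show that $\text{Var}(\hat{\theta}_2)/\text{Var}(\hat{\theta}_1) = n^2$.
5.4.22. Suppose that $W_1$ is a random variable with mean $\mu$ and variance $\sigma_1^2$ and $W_2$ is a random variable with mean $\mu$ and variance $\sigma_2^2$. From Example 5.4.3 we know that $cW_1 + (1 - c)W_2$ is an unbiased estimator of $\mu$ for any constant $c > 0$. If $W_1$ and $W_2$ are independent, for what value of $c$ is the estimator $cW_1 + (1 - c)W_2$ most efficient?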
5.5 MINIMUM-VARIANCE ESTIMATORS: THE CRAMÉR-RAO LOWER BOUND
Given two estimators, $\hat{\theta}_1$ and $\hat{\theta}_2$, each unbiased for the parameter $\theta$, we know from Section 5.4 which is "better": the one with the smaller variance. But nothing in that section speaks to the more fundamental question of how good $\hat{\theta}_1$ and $\hat{\theta}_2$ are relative to the infinitely many other unbiased estimators for $\theta$. Is there a $\hat{\theta}_3$, for example, that has a smaller variance than does either $\hat{\theta}_1$ or $\hat{\theta}_2$? Can we identify the unbiased estimator having the smallest variance? Addressing those concerns is one of the most elegant, yet practical, theorems in all of mathematical statistics, a result known as the Cramér-Rao lower bound.

Suppose a random sample of size $n$ is taken from, say, a continuous probability distribution $f_Y(y; \theta)$, where $\theta$ is an unknown parameter. Associated with $f_Y(y; \theta)$ is a theoretical limit below which the variance of any unbiased estimator for $\theta$ cannot fall. That limit is the Cramér-Rao lower bound. If the variance of a given $\hat{\theta}$ is equal to the Cramér-Rao lower bound, we know that estimator is optimal in the sense that no unbiased $\hat{\theta}$ can estimate $\theta$ with greater precision.

Theorem 5.5.1. (Cramér-Rao Inequality) Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from the continuous pdf $f_Y(y; \theta)$, where $f_Y(y; \theta)$ has continuous first-order and second-order partial derivatives at all but a finite set of points. Suppose that the set of $y$'s for which $f_Y(y; \theta) \ne 0$ does not depend on $\theta$. Let $\hat{\theta} = h(Y_1, Y_2, \ldots, Y_n)$ be any unbiased estimator for $\theta$. Then

$$\text{Var}(\hat{\theta}) \ge \left\{ nE\left[\left(\frac{\partial \ln f_Y(Y; \theta)}{\partial\theta}\right)^2\right]\right\}^{-1} = \left\{ -nE\left[\frac{\partial^2 \ln f_Y(Y; \theta)}{\partial\theta^2}\right]\right\}^{-1}$$

[A similar statement holds if the $n$ observations come from a discrete pdf, $p_X(k; \theta)$.]

Proof. See (93).
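EXAMPLE 5.5.1
Suppose the random variables $X_1, X_2, \ldots, X_n$ denote the number of successes (0 or 1) in each of $n$ independent trials, where $p = P(\text{success occurs at any given trial})$ is an unknown parameter. Then

$$p_{X_i}(k; p) = p^k(1 - p)^{1-k}, \quad k = 0, 1; \quad 0 < p < 1$$

Let $X = X_1 + X_2 + \cdots + X_n$ = total number of successes and define $\hat{p} = \frac{X}{n}$. Clearly, $\hat{p}$ is unbiased for $p$ $\left(E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{E(X)}{n} = \frac{np}{n} = p\right)$. How does $\text{Var}(\hat{p})$ compare with the Cramér-Rao lower bound for $p_{X_i}(k; p)$?

Note, first, that

$$\text{Var}(\hat{p}) = \text{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\text{Var}(X) = \frac{1}{n^2}\,np(1 - p) = \frac{p(1 - p)}{n}$$

(since $X$ is a binomial random variable). To evaluate, say, the second form of the Cramér-Rao lower bound, we begin by writing

$$\ln p_{X_i}(X_i; p) = X_i \ln p + (1 - X_i)\ln(1 - p)$$

Moreover,

$$\frac{\partial \ln p_{X_i}(X_i; p)}{\partial p} = \frac{X_i}{p} - \frac{1 - X_i}{1 - p}$$

and

$$\frac{\partial^2 \ln p_{X_i}(X_i; p)}{\partial p^2} = -\frac{X_i}{p^2} - \frac{1 - X_i}{(1 - p)^2}$$

Taking the expected value of the second derivative gives

$$E\left[\frac{\partial^2 \ln p_{X_i}(X_i; p)}{\partial p^2}\right] = -\frac{p}{p^2} - \frac{(1 - p)}{(1 - p)^2} = -\frac{1}{p(1 - p)}$$

The Cramér-Rao lower bound, then, reduces to

$$\frac{1}{-n\left[-\dfrac{1}{p(1 - p)}\right]} = \frac{p(1 - p)}{n}$$

which equals the variance of $\hat{p} = \frac{X}{n}$. It follows that $\frac{X}{n}$ is the preferred statistic for estimating the binomial parameter $p$: No unbiased estimator can possibly be more precise.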
Definition 5.5.1. Let $\Theta$ denote the set of all estimators $\hat{\theta} = h(Y_1, Y_2, \ldots, Y_n)$ that are unbiased for the parameter $\theta$ in the continuous pdf $f_Y(y; \theta)$. We say that $\hat{\theta}^*$ is a best (or minimum-variance) estimator if $\hat{\theta}^* \in \Theta$ and

$$\text{Var}(\hat{\theta}^*) \le \text{Var}(\hat{\theta}) \quad \text{for all } \hat{\theta} \in \Theta$$

[Similar terminology applies if $\Theta$ is the set of all unbiased estimators for the parameter $\theta$ in a discrete pdf, $p_X(k; \theta)$.]

Related to the notion of a best estimator is the concept of efficiency. The connection is spelled out in Definition 5.5.2 for the case where $\hat{\theta}$ is based on data coming from a continuous pdf $f_Y(y; \theta)$. The same terminology applies if the data are a set of $X_i$'s from a discrete pdf $p_X(k; \theta)$.

Definition 5.5.2. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ drawn from the continuous pdf $f_Y(y; \theta)$. Let $\hat{\theta} = h(Y_1, Y_2, \ldots, Y_n)$ be an unbiased estimator for $\theta$.
a. The unbiased estimator $\hat{\theta}$ is said to be efficient if the variance of $\hat{\theta}$ equals the Cramér-Rao lower bound associated with $f_Y(y; \theta)$.
b. The efficiency of an unbiased estimator $\hat{\theta}$ is the ratio of the Cramér-Rao lower bound for $f_Y(y; \theta)$ to the variance of $\hat{\theta}$.

Comment. The designations "efficient" and "best" are not synonymous. If the variance of an unbiased estimator is equal to the Cramér-Rao lower bound, then that estimator by definition is a best estimator. The converse, though, is not always true. There are situations for which the variances of no unbiased estimators achieve the Cramér-Rao lower bound. None of those, then, is efficient, but one (or more) could still be termed best. For the independent trials described in Example 5.5.1, $\hat{p} = \frac{X}{n}$ is both efficient and best.
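EXAMPLE 5.5.2
If $Y_1, Y_2, \ldots, Y_n$ is a random sample from $f_Y(y; \theta) = 2y/\theta^2$, $0 < y < \theta$, $\hat{\theta} = \frac{3}{2}\cdot\bar{Y}$ is an unbiased estimator for $\theta$ (see Question 5.5.1). Show that the variance of $\hat{\theta}$ is less than the Cramér-Rao lower bound for $f_Y(y; \theta)$.

Applying Theorem 3.6.1 and the rule for the variance of a sum of independent random variables to the proposed estimator, we can write

$$\text{Var}(\hat{\theta}) = \text{Var}\left(\frac{3}{2}\cdot\bar{Y}\right) = \text{Var}\left(\frac{3}{2}\cdot\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{9}{4n^2}\sum_{i=1}^{n}\text{Var}(Y_i)$$

where

$$\text{Var}(Y_i) = E(Y_i^2) - [E(Y_i)]^2 = \int_0^{\theta} y^2\cdot\frac{2y}{\theta^2}\,dy - \left[\int_0^{\theta} y\cdot\frac{2y}{\theta^2}\,dy\right]^2 = \frac{\theta^2}{2} - \frac{4\theta^2}{9} = \frac{\theta^2}{18}$$

Therefore,

$$\text{Var}(\hat{\theta}) = \frac{9}{4n^2}\cdot n\cdot\frac{\theta^2}{18} = \frac{\theta^2}{8n}$$

To calculate the Cramér-Rao lower bound for $f_Y(y; \theta)$, we first note that

$$\ln f_Y(Y; \theta) = \ln\left[2Y\theta^{-2}\right] = \ln 2Y - 2\ln\theta$$

and

$$\frac{\partial \ln f_Y(Y; \theta)}{\partial\theta} = \frac{-2}{\theta}$$

Therefore,

$$E\left[\left(\frac{\partial \ln f_Y(Y; \theta)}{\partial\theta}\right)^2\right] = E\left(\frac{4}{\theta^2}\right) = \frac{4}{\theta^2}$$

and

$$\frac{1}{nE\left[\left(\dfrac{\partial \ln f_Y(Y; \theta)}{\partial\theta}\right)^2\right]} = \frac{\theta^2}{4n}$$

Is the variance of $\hat{\theta}$ less than the Cramér-Rao lower bound? Yes, $\frac{\theta^2}{8n} < \frac{\theta^2}{4n}$. Is the statement of Theorem 5.5.1 contradicted? No, because the theorem does not apply in this situation: the range of $f_Y(y; \theta)$ is a function of $\theta$, a condition that violates one of the Cramér-Rao assumptions.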
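A simulation makes the comparison concrete. The Python sketch below is our own illustration; it assumes the inverse-cdf method for sampling from $f_Y(y; \theta) = 2y/\theta^2$ (since $F_Y(y) = y^2/\theta^2$, a draw is $\theta\sqrt{U}$ for uniform $U$), and the function name `simulate_estimator`, the seed, and the values $n = 4$, $\theta = 2$ are arbitrary choices.

```python
import math
import random
import statistics

def simulate_estimator(n, theta, trials=20_000, seed=5):
    """Simulated mean and variance of (3/2)*Ybar for samples from f(y) = 2y/theta^2 on (0, theta)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Inverse-cdf sampling: F(y) = y^2 / theta^2, so y = theta * sqrt(u).
        ys = [theta * math.sqrt(rng.random()) for _ in range(n)]
        estimates.append(1.5 * sum(ys) / n)
    return statistics.fmean(estimates), statistics.pvariance(estimates)

n, theta = 4, 2.0
mean_est, var_est = simulate_estimator(n, theta)
formal_bound = theta**2 / (4 * n)  # the expression theta^2 / (4n) from the example
print(mean_est, var_est, theta**2 / (8 * n), formal_bound)
```

The simulated variance should land near $\theta^2/(8n) = 0.125$, well under the formal value $\theta^2/(4n) = 0.25$, consistent with the point that Theorem 5.5.1 does not apply when the range of the pdf depends on $\theta$.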
QUESTIONS
5.5.1. Verify the claim made in Example 5.5.2 that $\hat{\theta} = \frac{3}{2}\cdot\bar{Y}$ is an unbiased estimator for the parameter $\theta$ in $f_Y(y; \theta) = 2y/\theta^2$, $0 < y < \theta$.
5.5.2. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from $f_Y(y; \theta) = \frac{1}{\theta}e^{-y/\theta}$, $y > 0$. Compare the Cramér-Rao lower bound for $f_Y(y; \theta)$ to the variance of the maximum likelihood estimator for $\theta$, $\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Is $\bar{Y}$ a best estimator for $\theta$?
5.5.3. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the Poisson distribution, $p_X(k; \lambda) = \frac{e^{-\lambda}\lambda^k}{k!}$, $k = 0, 1, \ldots$. Show that $\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an efficient estimator for $\lambda$.
5.5.4. Suppose a random sample of size $n$ is taken from a normal distribution with mean $\mu$ and variance $\sigma^2$, where $\sigma^2$ is known. Compare the Cramér-Rao lower bound for $f_Y(y; \mu)$ with the variance of $\hat{\mu} = \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Is $\bar{Y}$ an efficient estimator for $\mu$?
5.5.5. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from the uniform pdf $f_Y(y; \theta) = 1/\theta$, $0 \le y \le \theta$. Compare the Cramér-Rao lower bound for $f_Y(y; \theta)$ with the variance of the unbiased estimator $\hat{\theta} = \frac{n+1}{n}\cdot Y_{\max}$. Discuss.
Chapter 5
Estimation
5.5.6. Let fl. Y2 •... , Y" be a random sample of size n from the pdf
fy (y; 8)
(a) Show that
e=
(b) Show that {}
1
= -----,--:c(r -
y> 0
is an unbiased estimator for 8.
= -r
is a minimum variance estimator for 8.
5.5.7. Prove the equivalence of the two forms given for the Cramér-Rao lower bound in Theorem 5.5.1. Hint: Differentiate the equation $\int_{-\infty}^{\infty} f_Y(y)\,dy = 1$ with respect to $\theta$ and deduce that $\int_{-\infty}^{\infty}\frac{\partial \ln f_Y(y)}{\partial\theta}\,f_Y(y)\,dy = 0$. Then differentiate again with respect to $\theta$.
5.6 SUFFICIENT ESTIMATORS
Statisticians have proven to be quite diligent (and creative) in articulating properties that good estimators should exhibit. Sections 5.4 and 5.5, for example, introduced the notions of an estimator being unbiased and having minimum variance; a later section will explain what it means for an estimator to be "consistent." All those properties are easy to motivate, and they impose conditions on the probabilistic behavior of $\hat{\theta}$ that make eminently good sense. In this section, we look at a deeper property of estimators, one that is not so intuitive but has some particularly important theoretical implications.

Whether or not an estimator is sufficient refers to the amount of "information" it contains about the unknown parameter. Estimates, of course, are calculated using values obtained from random samples [drawn from either $p_X(k; \theta)$ or $f_Y(y; \theta)$]. If everything that we can possibly know from the data about $\theta$ is encapsulated in the estimate $\theta_e$, then the corresponding estimator $\hat{\theta}$ is said to be sufficient. A comparison of two estimators, one sufficient and the other not, should help clarify the concept.
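An Estimator That Is Sufficient
Suppose that a random sample of size $n$, $X_1 = k_1, \ldots, X_n = k_n$, is taken from the Bernoulli pdf

$$p_X(k; p) = p^k(1 - p)^{1-k}, \quad k = 0, 1$$

where $p$ is an unknown parameter. We know from Example 5.1.1 that the maximum likelihood estimator for $p$ is

$$\hat{p} = \left(\frac{1}{n}\right)\sum_{i=1}^{n} X_i$$

[and the maximum likelihood estimate is $p_e = \left(\frac{1}{n}\right)\sum_{i=1}^{n} k_i$]. To show that $\hat{p}$ is a sufficient estimator for $p$ requires that we calculate the conditional probability that $X_1 = k_1, \ldots, X_n = k_n$ given that $\hat{p} = p_e$.

Generalizing the Comment following Example 3.11.7, we can write

$$P(X_1 = k_1, \ldots, X_n = k_n \mid \hat{p} = p_e) = \frac{P(X_1 = k_1, \ldots, X_n = k_n \cap \hat{p} = p_e)}{P(\hat{p} = p_e)} = \frac{P(X_1 = k_1, \ldots, X_n = k_n)}{P(\hat{p} = p_e)}$$

But

$$P(X_1 = k_1, \ldots, X_n = k_n) = p^{k_1}(1 - p)^{1-k_1}\cdots p^{k_n}(1 - p)^{1-k_n} = p^{\sum_{i=1}^{n} k_i}(1 - p)^{n - \sum_{i=1}^{n} k_i} = p^{np_e}(1 - p)^{n - np_e}$$

and

$$P(\hat{p} = p_e) = P\left(\sum_{i=1}^{n} X_i = np_e\right) = \binom{n}{np_e}p^{np_e}(1 - p)^{n - np_e}$$

since $\sum_{i=1}^{n} X_i$ has a binomial distribution with parameters $n$ and $p$ (recall Example 3.9.3). Therefore,

$$P(X_1 = k_1, \ldots, X_n = k_n \mid \hat{p} = p_e) = \frac{p^{np_e}(1 - p)^{n - np_e}}{\dbinom{n}{np_e}p^{np_e}(1 - p)^{n - np_e}} = \frac{1}{\dbinom{n}{np_e}} \qquad (5.6.1)$$

Notice that $P(X_1 = k_1, \ldots, X_n = k_n \mid \hat{p} = p_e)$ is not a function of $p$. That is precisely the condition that makes $\hat{p} = \left(\frac{1}{n}\right)\sum_{i=1}^{n} X_i$ a sufficient estimator. Equation 5.6.1 says, in effect, that everything the data can tell us about the parameter $p$ is contained in the estimate $p_e$. Remember that, initially, the joint pdf of the sample, $P(X_1 = k_1, \ldots, X_n = k_n)$, is a function of the $k_i$'s and $p$. What we have just shown, though, is that if that probability is conditioned on the value of this particular estimate, that is, on $\hat{p} = p_e$, then $p$ is eliminated and the probability of the sample is completely determined [in this case, it equals $\binom{n}{np_e}^{-1}$, where $\binom{n}{np_e}$ is the number of ways to arrange the 0s and 1s in a sample of size $n$ for which $\hat{p} = p_e$].

If we had used some other estimator, say, $\hat{p}^*$, and if $P(X_1 = k_1, \ldots, X_n = k_n \mid \hat{p}^* = p_e^*)$ had remained a function of $p$, the conclusion would be that the information in $p_e^*$ was not "sufficient" to eliminate the parameter $p$ from the conditional probability. A simple example of such a $\hat{p}^*$ would be $\hat{p}^* = X_1$. Then $p_e^*$ would be $k_1$ and the conditional probability of $X_1 = k_1, \ldots, X_n = k_n$ given that $\hat{p}^* = p_e^*$ would remain a function of $p$:

$$P(X_1 = k_1, \ldots, X_n = k_n \mid \hat{p}^* = k_1) = \frac{p^{\sum_{i=1}^{n} k_i}(1 - p)^{n - \sum_{i=1}^{n} k_i}}{p^{k_1}(1 - p)^{1-k_1}} = p^{\sum_{i=2}^{n} k_i}(1 - p)^{n - 1 - \sum_{i=2}^{n} k_i}$$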
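For a small sample, both conditional probabilities can be computed by brute-force enumeration. The Python sketch below is our own illustration (the function names and the particular sample are assumptions); it shows that conditioning on the sample proportion gives the same answer for every $p$, while conditioning on $X_1$ alone does not.

```python
from itertools import product
from math import comb

def joint_prob(ks, p):
    """P(X_1 = k_1, ..., X_n = k_n) for independent Bernoulli(p) trials."""
    s = sum(ks)
    return p**s * (1 - p) ** (len(ks) - s)

def cond_prob_given_mean(ks, p):
    """P(sample | p-hat = p_e), with P(p-hat = p_e) found by enumerating all 0/1 samples."""
    n, s = len(ks), sum(ks)
    marginal = sum(
        joint_prob(js, p) for js in product((0, 1), repeat=n) if sum(js) == s
    )
    return joint_prob(ks, p) / marginal

def cond_prob_given_first(ks, p):
    """P(sample | X_1 = k_1): conditioning on the insufficient estimator p* = X_1."""
    return joint_prob(ks, p) / joint_prob(ks[:1], p)

sample = (1, 0, 1, 1)
for p in (0.2, 0.5, 0.8):
    print(p, cond_prob_given_mean(sample, p), cond_prob_given_first(sample, p))
print(1 / comb(len(sample), sum(sample)))  # the value in Equation 5.6.1
```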
Comment. Some of the dice problems we did in Section 2.4 have aspects that parallel to some extent the notion of an estimator being sufficient. Suppose, for example, we roll a pair of fair dice without being allowed to view the outcome. Our objective is to calculate the probability that the sum showing is an even number. If we had no other information, the answer would be $\frac{1}{2}$. Suppose, though, that two people do see the outcome (which was, in fact, a sum of 7) and each is allowed to characterize the outcome without providing us with the exact sum that occurred. Person A tells us that "the sum was less than or equal to seven"; Person B says that "the sum was an odd number."

Whose information is more helpful? Person B's. The conditional probability of the sum being even given that the sum is less than or equal to 7 is $\frac{9}{21}$, which still leaves our initial question largely unanswered:

$$P(\text{sum is even} \mid \text{sum} \le 7) = \frac{P(2) + P(4) + P(6)}{P(2) + P(3) + P(4) + P(5) + P(6) + P(7)} = \frac{\frac{1}{36} + \frac{3}{36} + \frac{5}{36}}{\frac{1}{36} + \frac{2}{36} + \frac{3}{36} + \frac{4}{36} + \frac{5}{36} + \frac{6}{36}} = \frac{9}{21}$$

In contrast, Person B utilized the data in a way that definitely answered the original question:

$$P(\text{sum is even} \mid \text{sum is odd}) = 0$$

In a sense, B's information was "sufficient"; A's information was not.
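The three probabilities in this Comment follow from enumerating the 36 equally likely outcomes. A brief Python sketch with exact fractions (the helper name `prob_even_given` is our own):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely sums for a pair of fair dice.
sums = [a + b for a, b in product(range(1, 7), repeat=2)]

def prob_even_given(condition):
    """P(sum is even | condition), by counting equally likely outcomes."""
    kept = [s for s in sums if condition(s)]
    return Fraction(sum(1 for s in kept if s % 2 == 0), len(kept))

p_no_information = prob_even_given(lambda s: True)
p_given_person_a = prob_even_given(lambda s: s <= 7)      # "sum was at most seven"
p_given_person_b = prob_even_given(lambda s: s % 2 == 1)  # "sum was odd"

print(p_no_information, p_given_person_a, p_given_person_b)
```

`Fraction` reduces $\frac{9}{21}$ to $\frac{3}{7}$ when printing.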
An Estimator That Is Not Sufficient
Suppose a random sample of size $n$ ($Y_1 = y_1, \ldots, Y_n = y_n$) is drawn from the uniform pdf $f_Y(y; \theta) = 1/\theta$, $0 \le y \le \theta$, where $\theta$ is an unknown parameter. Recall from Question 5.2.15 that the method of moments estimator for $\theta$ is

$$\hat{\theta} = 2\bar{Y} = \left(\frac{2}{n}\right)\sum_{i=1}^{n} Y_i$$

The latter is not a sufficient statistic because all the information in the data that pertains to the parameter $\theta$ is not necessarily contained in the numerical value $\theta_e$.

If $\hat{\theta}$ were a sufficient statistic, then any two random samples (of size $n$) having the same value for $\theta_e$ should yield exactly the same information about $\theta$. A simple numerical example shows that not to be the case. Suppose $n = 3$. Consider the two random samples $y_1 = 1$, $y_2 = 2$, $y_3 = 3$ and $y_1 = 0$, $y_2 = 1$, $y_3 = 5$. In both cases,

$$\theta_e = 2\bar{y} = \frac{2}{3}\sum_{i=1}^{3} y_i = 4$$

Do both samples, though, convey the same information about the possible value of $\theta$? No. Based on the first sample, the parameter $\theta$ could, in fact, be equal to 4 ($= \theta_e$). On the other hand, the second sample rules out the possibility that $\theta$ is 4 because one of the observations ($y_3 = 5$) is larger than 4, but according to the definition of $f_Y(y; \theta)$, all $Y_i$'s must be between 0 and $\theta$.
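The same point can be made by evaluating the likelihood at $\theta = 4$ for each sample. This is a small Python sketch of our own (function names are assumptions): the two samples share the estimate $\theta_e = 4$, yet only one of them is possible when $\theta = 4$.

```python
def theta_estimate(sample):
    """Method of moments estimate: twice the sample mean."""
    return 2 * sum(sample) / len(sample)

def likelihood(sample, theta):
    """Uniform(0, theta) likelihood: theta^(-n) if every observation lies in [0, theta], else 0."""
    if min(sample) < 0 or max(sample) > theta:
        return 0.0
    return float(theta) ** (-len(sample))

sample_a = (1, 2, 3)
sample_b = (0, 1, 5)

print(theta_estimate(sample_a), theta_estimate(sample_b))  # same estimate for both
print(likelihood(sample_a, 4), likelihood(sample_b, 4))    # different information about theta
```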
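A Formal Definition
Suppose that $X_1 = k_1, \ldots, X_n = k_n$ is a random sample of size $n$ from the discrete pdf $p_X(k; \theta)$, where $\theta$ is an unknown parameter. Conceptually, $\hat{\theta}$ is a sufficient statistic for $\theta$ if

$$P(X_1 = k_1, \ldots, X_n = k_n \mid \hat{\theta} = \theta_e) = \frac{P(X_1 = k_1, \ldots, X_n = k_n \cap \hat{\theta} = \theta_e)}{P(\hat{\theta} = \theta_e)} = \frac{\prod_{i=1}^{n} p_X(k_i; \theta)}{p_{\hat{\theta}}(\theta_e; \theta)} = b(k_1, \ldots, k_n) \qquad (5.6.2)$$

where $p_{\hat{\theta}}(\theta_e; \theta)$ is the pdf of the estimator evaluated at the point $\hat{\theta} = \theta_e$ and $b(k_1, \ldots, k_n)$ is a constant independent of $\theta$. Equivalently, the condition that qualifies an estimator as being sufficient can be expressed by cross-multiplying Equation 5.6.2.

Definition 5.6.1. Let $X_1 = k_1, \ldots, X_n = k_n$ be a random sample of size $n$ from $p_X(k; \theta)$. The estimator $\hat{\theta} = h(X_1, \ldots, X_n)$ is sufficient for $\theta$ if the likelihood function, $L(\theta)$, factors into the product of the pdf for $\hat{\theta}$ and a constant that does not involve $\theta$, that is, if

$$L(\theta) = \prod_{i=1}^{n} p_X(k_i; \theta) = p_{\hat{\theta}}(\theta_e; \theta)\,b(k_1, \ldots, k_n)$$

A similar statement holds if the data consist of a random sample $Y_1 = y_1, \ldots, Y_n = y_n$ drawn from a continuous pdf $f_Y(y; \theta)$.

Comment. If $\hat{\theta}$ is sufficient for $\theta$, then any one-to-one function of $\hat{\theta}$ is also a sufficient estimator for $\theta$. As a case in point, we showed earlier in this section that

$$\hat{p} = \left(\frac{1}{n}\right)\sum_{i=1}^{n} X_i$$

is a sufficient estimator for the parameter $p$ in a Bernoulli pdf. It is also true, then, that

$$\hat{p}^* = n\hat{p} = \sum_{i=1}^{n} X_i$$

is sufficient for $p$.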
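EXAMPLE 5.6.1
Let $X_1 = k_1, \ldots, X_n = k_n$ be a random sample of size $n$ from the Poisson pdf, $p_X(k; \lambda) = e^{-\lambda}\lambda^k/k!$, $k = 0, 1, 2, \ldots$. Show that

$$\hat{\lambda} = \sum_{i=1}^{n} X_i$$

is a sufficient estimator for $\lambda$.

From Example 3.12.10, we know that $\hat{\lambda}$, being a sum of $n$ independent Poisson random variables, each with parameter $\lambda$, is itself a Poisson random variable with parameter $n\lambda$. By Definition 5.6.1, then, $\hat{\lambda}$ is a sufficient estimator for $\lambda$ if the sample's likelihood function factors into a product of the pdf for $\hat{\lambda}$ times a constant that is independent of $\lambda$. But

$$L(\lambda) = \prod_{i=1}^{n}\frac{e^{-\lambda}\lambda^{k_i}}{k_i!} = \frac{e^{-n\lambda}\lambda^{\sum k_i}}{\prod_{i=1}^{n} k_i!} = \frac{e^{-n\lambda}(n\lambda)^{\sum k_i}}{\left(\sum k_i\right)!}\cdot\frac{\left(\sum k_i\right)!}{n^{\sum k_i}\prod_{i=1}^{n} k_i!} = p_{\hat{\lambda}}(\lambda_e; \lambda)\cdot b(k_1, \ldots, k_n) \qquad (5.6.3)$$

proving that $\hat{\lambda} = \sum_{i=1}^{n} X_i$ is a sufficient estimator for $\lambda$.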
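The factorization of the Poisson likelihood can be verified numerically for any particular sample. In the Python sketch below (our own illustration; the function names and the sample values are assumptions), `b_factor` takes no $\lambda$ argument at all, which is exactly the requirement that the second factor be free of the parameter.

```python
import math

def likelihood(ks, lam):
    """L(lambda): product of Poisson(lambda) pmf values over the sample."""
    return math.prod(math.exp(-lam) * lam**k / math.factorial(k) for k in ks)

def pmf_of_sum(ks, lam):
    """Poisson(n * lambda) pmf evaluated at the observed sum of the sample."""
    n, s = len(ks), sum(ks)
    return math.exp(-n * lam) * (n * lam) ** s / math.factorial(s)

def b_factor(ks):
    """(sum k_i)! / (n^(sum k_i) * prod k_i!), which does not involve lambda."""
    n, s = len(ks), sum(ks)
    return math.factorial(s) / (n**s * math.prod(math.factorial(k) for k in ks))

ks = (2, 0, 3, 1)
for lam in (0.5, 2.0, 7.3):
    print(lam, likelihood(ks, lam), pmf_of_sum(ks, lam) * b_factor(ks))
```

For each $\lambda$ the two printed quantities should agree up to floating-point rounding.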
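Comment. The factorization in Equation 5.6.3 implies that $\hat{\lambda} = \sum_{i=1}^{n} X_i$ is a sufficient estimator for $\lambda$. It is not, however, an unbiased estimator for $\lambda$:

$$E(\hat{\lambda}) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n}\lambda = n\lambda$$

Constructing an unbiased estimator based on the sufficient estimator, though, is a simple matter. Let

$$\hat{\lambda}^* = \frac{1}{n}\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Then $E(\hat{\lambda}^*) = \frac{1}{n}E(\hat{\lambda}) = \frac{1}{n}\,n\lambda = \lambda$, so $\hat{\lambda}^*$ is unbiased for $\lambda$. Moreover, $\hat{\lambda}^*$ is a one-to-one function of $\hat{\lambda}$, so, by the Comment following Definition 5.6.1, $\hat{\lambda}^*$ is, itself, a sufficient estimator for $\lambda$.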
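EXAMPLE 5.6.2
Let $Y_1 = y_1, \ldots, Y_n = y_n$ be a random sample of size $n$ from the uniform pdf $f_Y(y; \theta) = 1/\theta$, $0 \le y \le \theta$. We know from Question 5.2.9 that

$$\hat{\theta} = Y_{\max}$$

is the maximum likelihood estimator for $\theta$. Is $Y_{\max}$ also sufficient?

Recall from Example 3.10.2 that

$$f_{Y_{\max}}(y) = n[F_Y(y)]^{n-1}f_Y(y)$$

Here, $F_Y(y) = P(Y \le y) = \int_0^y\left(\frac{1}{\theta}\right)dt = y/\theta$, so

$$f_{Y_{\max}}(y_{\max}; \theta) = n\left(\frac{y_{\max}}{\theta}\right)^{n-1}\cdot\frac{1}{\theta} = \frac{n\,y_{\max}^{\,n-1}}{\theta^n}, \quad 0 \le y_{\max} \le \theta$$

Whether $\hat{\theta} = Y_{\max}$ is sufficient for $\theta$ hinges on whether the factorization of $L(\theta)$ described in Definition 5.6.1 can be accomplished. But $L(\theta) = \prod_{i=1}^{n} f_Y(y_i; \theta) = (1/\theta)^n$, and we can write

$$L(\theta) = (1/\theta)^n = \frac{n\,y_{\max}^{\,n-1}}{\theta^n}\cdot\frac{1}{n\,y_{\max}^{\,n-1}} = f_{\hat{\theta}}(\theta_e; \theta)\cdot b(y_1, \ldots, y_n)$$

which proves that $\hat{\theta} = Y_{\max}$ is a sufficient estimator for $\theta$.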
A Second Factorization Criterion
Using Definition 5.6.1 to verify that an estimator is sufficient requires that the pdf $p_{\hat{\theta}}(h(k_1, \ldots, k_n); \theta)$ or $f_{\hat{\theta}}(h(y_1, \ldots, y_n); \theta)$ be explicitly identified as one of the two factors whose product equals the likelihood function. If $\hat{\theta}$ is complicated, though, finding its pdf may be prohibitively difficult. The next theorem gives an alternative factorization criterion for establishing that an estimator is sufficient. It does not require that the pdf for $\hat{\theta}$ be known.

Theorem 5.6.1. Let $X_1 = k_1, \ldots, X_n = k_n$ be a random sample of size $n$ from the discrete pdf $p_X(k; \theta)$. The estimator $\hat{\theta} = h(X_1, \ldots, X_n)$ is sufficient for $\theta$ if and only if there are functions $g(h(k_1, \ldots, k_n); \theta)$ and $b(k_1, \ldots, k_n)$ such that

$$L(\theta) = g(h(k_1, \ldots, k_n); \theta)\cdot b(k_1, \ldots, k_n) \qquad (5.6.4)$$

where the function $b(k_1, \ldots, k_n)$ does not involve the parameter $\theta$. A similar statement holds in the continuous case.

Proof. First, suppose that $\hat{\theta}$ is sufficient for $\theta$. Then the factorization criterion of Definition 5.6.1 includes Equation 5.6.4 as a special case.

Now, assume that Equation 5.6.4 holds. The theorem will be proved if it can be shown that $g(h(k_1, \ldots, k_n); \theta)$ can always be "converted" to include the pdf of $\hat{\theta}$ (at which point Definition 5.6.1 would apply). Let $c$ be some value of the function $h(k_1, \ldots, k_n)$ and let $A$ be the set of samples of size $n$ that constitute the inverse image of $c$, that is, $A = h^{-1}(c)$. Then

$$p_{\hat{\theta}}(c; \theta) = \sum_{(k_1, \ldots, k_n) \in A} p_{X_1, \ldots, X_n}(k_1, \ldots, k_n) = \sum_{(k_1, \ldots, k_n) \in A}\;\prod_{i=1}^{n} p_{X_i}(k_i) = \sum_{(k_1, \ldots, k_n) \in A} g(c; \theta)\cdot b(k_1, \ldots, k_n) = g(c; \theta)\cdot\sum_{(k_1, \ldots, k_n) \in A} b(k_1, \ldots, k_n)$$

Since we are only interested in points where $p_{\hat{\theta}}(c; \theta) \ne 0$, we can assume that $\sum_{(k_1, \ldots, k_n) \in A} b(k_1, \ldots, k_n) \ne 0$. Therefore,

$$g(c; \theta) = p_{\hat{\theta}}(c; \theta)\cdot\frac{1}{\sum_{(k_1, \ldots, k_n) \in A} b(k_1, \ldots, k_n)} \qquad (5.6.5)$$

Substituting the right-hand side of Equation 5.6.5 into Equation 5.6.4 shows that $\hat{\theta}$ qualifies as a sufficient estimator for $\theta$. A similar argument can be made if the data consist of a random sample $Y_1 = y_1, \ldots, Y_n = y_n$ drawn from a continuous pdf $f_Y(y; \theta)$. See (211) for more details.
Sufficiency As It Relates to Other Properties of Estimators
Chapter 5 has constructed a rather elaborate facade of mathematical properties and procedures associated with estimators. We have asked whether $\hat{\theta}$ is unbiased, efficient, and/or sufficient. How we find $\hat{\theta}$ has also come under scrutiny: some estimators have been derived using the method of maximum likelihood; others have come from the method of moments. Not all of these aspects of estimators and estimation, though, are entirely disjoint; some are related and interconnected in a variety of ways.
Suppose, for example, that a sufficient estimator $\hat{\theta}_s$ exists for a parameter $\theta$, and suppose that $\hat{\theta}_M$ is the maximum likelihood estimator for that same $\theta$. If, for a given sample, $\hat{\theta}_s = \theta_e$, we know from Theorem 5.6.1 that
$$L(\theta) = g(\theta_e; \theta) \cdot b(y_1, \ldots, y_n)$$
Since the maximum likelihood estimate, by definition, maximizes $L(\theta)$, it must also maximize $g(\theta_e; \theta)$. But any $\theta$ that maximizes $g(\theta_e; \theta)$ will necessarily be a function of $\theta_e$. It follows, then, that maximum likelihood estimators are necessarily functions of sufficient estimators, that is, $\hat{\theta}_M = f(\hat{\theta}_s)$ (which is the primary theoretical justification for why maximum likelihood estimators are preferred to method of moments estimators).
Sufficient estimators also play a critical role in the search for efficient estimators, that is, unbiased estimators whose variance equals the Cramér-Rao lower bound. There can be an infinite number of unbiased estimators for any unknown parameter in any pdf. That said, there may be a subset of those unbiased estimators that are functions of sufficient estimators. If so, it can be proved (see (93)) that the variance of every unbiased estimator based on a sufficient estimator will necessarily be less than the variance of every unbiased estimator that is not a function of a sufficient estimator. It follows, then, that to find an efficient estimator for $\theta$, we can restrict our attention to functions of sufficient estimators for $\theta$.
QUESTIONS
5.6.1. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the geometric distribution, $p_X(k; p) = (1 - p)^{k-1} p$, $k = 1, 2, \ldots$. Show that $\hat{p} = \sum_{i=1}^{n} X_i$ is sufficient for $p$.
5.6.2. Suppose a random sample of size $n$ is drawn from the pdf
$$f_Y(y; \theta) = e^{-(y - \theta)}, \quad \theta \le y$$
(a) Show that $\hat{\theta} = Y_{\min}$ is sufficient for the threshold parameter $\theta$.
(b) Show that $Y_{\max}$ is not sufficient for $\theta$.
5.6.3. Let $X_1$, $X_2$, and $X_3$ be a set of three independent Bernoulli random variables with unknown parameter $p = P(X_i = 1)$. It was shown on p. 399 that $\hat{p} = X_1 + X_2 + X_3$ is sufficient for $p$. Show that the linear combination $p^* = X_1 + 2X_2 + 3X_3$ is not sufficient for $p$.
5.6.4. If $\hat{\theta}$ is sufficient for $\theta$, show that any one-to-one function of $\hat{\theta}$ is also sufficient for $\theta$.
5.6.5. Show that $\hat{\sigma}^2 = \sum_{i=1}^{n} Y_i^2$ is sufficient for $\sigma^2$ if $Y_1, Y_2, \ldots, Y_n$ is a random sample from a normal pdf with $\mu = 0$.
5.6.6. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ from the pdf of Question 5.5.6,
$$f_Y(y; \theta) = \frac{1}{(r - 1)!\,\theta^r}\, y^{r-1} e^{-y/\theta}, \quad 0 \le y$$
for positive parameter $\theta$ and $r$ a known positive integer. Find a sufficient statistic for $\theta$.
5.6.7. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ from the pdf $f_Y(y; \theta) = \theta y^{\theta - 1}$, $0 \le y \le 1$. Use Theorem 5.6.1 to show that $W = \prod_{i=1}^{n} Y_i$ is a sufficient estimator for $\theta$. Is the maximum likelihood estimator of $\theta$ a function of $W$?
5.6.8. A probability model $g_W(w; \theta)$ is said to be expressed in exponential form if it can be written as
$$g_W(w; \theta) = e^{K(w)p(\theta) + S(w) + q(\theta)}$$
where the range of $W$ is independent of $\theta$. Show that $\hat{\theta} = \sum_{i=1}^{n} K(W_i)$ is sufficient for $\theta$.
5.6.9. Write the pdf $f_Y(y; \lambda) = \lambda e^{-\lambda y}$, $y > 0$, in exponential form and deduce a sufficient statistic for $\lambda$ (see Question 5.6.8). Assume that the data consist of a random sample of size $n$.
5.6.10. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a Pareto pdf,
$$f_Y(y; \theta) = \theta/(1 + y)^{\theta + 1}, \quad 0 < y < \infty; \quad 0 < \theta < \infty$$
Write $f_Y(y; \theta)$ in exponential form and deduce a sufficient statistic for $\theta$ (see Question 5.6.8).
5.7 CONSISTENCY
The properties of estimators that we have examined thus far (for example, unbiasedness and sufficiency) have assumed that the data consist of a fixed sample size $n$. It sometimes makes sense, though, to consider the asymptotic behavior of estimators: We may find, for example, that an estimator possesses a desirable property in the limit that it fails to exhibit for any finite $n$.
Recall Example 5.4.4, which focused on the maximum likelihood estimator for $\sigma^2$ in a sample of size $n$ drawn from a normal pdf [that is, on $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$]. For any finite $n$, $\hat{\sigma}^2$ is biased:
$$E\left(\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2\right) = \frac{n - 1}{n}\,\sigma^2$$
As $n$ goes to infinity, though, the limit of $E(\hat{\sigma}^2)$ does equal $\sigma^2$, and we say that $\hat{\sigma}^2$ is asymptotically unbiased.
Introduced in this section is a second asymptotic characteristic of an estimator, a property known as consistency. Unlike asymptotic unbiasedness, consistency refers to the shape of the pdf for $\hat{\theta}_n$ and how that shape changes as a function of $n$. (To emphasize the fact that the estimator for a parameter is now being viewed as a sequence of estimators, we will write $\hat{\theta}_n$ instead of $\hat{\theta}$.)
Definition 5.7.1. An estimator $\hat{\theta}_n = h(W_1, W_2, \ldots, W_n)$ is said to be consistent for $\theta$ if it converges in probability to $\theta$, that is, if for all $\varepsilon > 0$,
$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| < \varepsilon) = 1$$
Comment. To solve certain kinds of sample size problems, it can be helpful to think of Definition 5.7.1 in an epsilon/delta context; that is, $\hat{\theta}_n$ is consistent for $\theta$ if for all $\varepsilon > 0$ and $\delta > 0$, there exists an $n(\varepsilon, \delta)$ such that
$$P(|\hat{\theta}_n - \theta| < \varepsilon) > 1 - \delta \quad \text{for } n > n(\varepsilon, \delta)$$
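EXAMPLE 5.7.1
Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from the uniform pdf
$$f_Y(y; \theta) = \frac{1}{\theta}, \quad 0 \le y \le \theta$$
and let $\hat{\theta}_n = Y_{\max}$. We already know that $Y_{\max}$ is biased for $\theta$, but is it consistent? Recall from Example 5.4.7 that
$$f_{Y_{\max}}(y) = \frac{n\,y^{n-1}}{\theta^n}, \quad 0 \le y \le \theta$$
Therefore,
$$P(|\hat{\theta}_n - \theta| < \varepsilon) = P(\theta - \varepsilon < \hat{\theta}_n < \theta) = \int_{\theta - \varepsilon}^{\theta} \frac{n\,y^{n-1}}{\theta^n}\,dy = 1 - \left(\frac{\theta - \varepsilon}{\theta}\right)^n$$
Since $[(\theta - \varepsilon)/\theta] < 1$, it follows that $[(\theta - \varepsilon)/\theta]^n \to 0$ as $n \to \infty$. Therefore, $\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| < \varepsilon) = 1$, proving that $\hat{\theta}_n = Y_{\max}$ is consistent for $\theta$.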
Figure 5.7.1 illustrates the convergence of $\hat{\theta}_n$. As $n$ increases, the shape of $f_{Y_{\max}}(y)$ changes in such a way that the pdf becomes increasingly concentrated in an $\varepsilon$-neighborhood of $\theta$. For any $n > n(\varepsilon, \delta)$, $P(|\hat{\theta}_n - \theta| < \varepsilon) > 1 - \delta$.
FIGURE 5.7.1 (pdf of $\hat{\theta}_n = Y_{\max}$ for increasing $n$)
If $\theta$, $\varepsilon$, and $\delta$ are specified, we can calculate $n(\varepsilon, \delta)$, the smallest sample size that will enable $\hat{\theta}_n$ to achieve a given precision. For example, suppose $\theta = 4$. How large a sample is required to give $\hat{\theta}_n$ an 80% chance of lying within 0.10 of $\theta$?
In the terminology of the Comment on page 406, $\varepsilon = 0.10$, $\delta = 0.20$, and
$$P(|\hat{\theta}_n - 4| < 0.10) = 1 - \left(\frac{4 - 0.10}{4}\right)^n \ge 1 - 0.20$$
Therefore,
$$(0.975)^{n(\varepsilon, \delta)} = 0.20$$
whose solution, rounded up to the next integer, implies that $n(\varepsilon, \delta) = 64$.
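The sample size calculation above can be reproduced in a few lines. This is a minimal sketch, assuming the uniform model of Example 5.7.1; the function name `smallest_n` is illustrative and not from the text. It solves $((\theta - \varepsilon)/\theta)^n \le \delta$ for the smallest integer $n$.

```python
import math

def smallest_n(theta, eps, delta):
    """Smallest n with 1 - ((theta - eps)/theta)**n >= 1 - delta."""
    ratio = (theta - eps) / theta
    n = math.ceil(math.log(delta) / math.log(ratio))
    # Guard against floating-point rounding right at the boundary.
    while ratio ** n > delta:
        n += 1
    return n

n_required = smallest_n(theta=4.0, eps=0.10, delta=0.20)
print(n_required)
```

Tightening either the precision $\varepsilon$ or the tolerance $\delta$ increases the required $n$, which is the epsilon/delta reading of consistency in action.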
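Often useful in establishing consistency is Chebyshev's inequality, which appears here as Theorem 5.7.1. More generally, the latter serves as an upper bound for the probability that any random variable lies outside an $\varepsilon$-neighborhood of its mean.
Theorem 5.7.1. (Chebyshev's inequality.) Let $W$ be any random variable with mean $\mu$ and variance $\sigma^2$. For any $\varepsilon > 0$,
$$P(|W - \mu| < \varepsilon) \ge 1 - \frac{\sigma^2}{\varepsilon^2}$$
or, equivalently,
$$P(|W - \mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2}$$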
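Proof. In the continuous case,
$$\text{Var}(Y) = \int_{-\infty}^{\infty} (y - \mu)^2 f_Y(y)\,dy = \int_{-\infty}^{\mu - \varepsilon} (y - \mu)^2 f_Y(y)\,dy + \int_{\mu - \varepsilon}^{\mu + \varepsilon} (y - \mu)^2 f_Y(y)\,dy + \int_{\mu + \varepsilon}^{\infty} (y - \mu)^2 f_Y(y)\,dy$$
Omitting the nonnegative middle integral gives an inequality:
$$\text{Var}(Y) \ge \int_{-\infty}^{\mu - \varepsilon} (y - \mu)^2 f_Y(y)\,dy + \int_{\mu + \varepsilon}^{\infty} (y - \mu)^2 f_Y(y)\,dy$$
$$= \int_{|y - \mu| \ge \varepsilon} (y - \mu)^2 f_Y(y)\,dy \ge \int_{|y - \mu| \ge \varepsilon} \varepsilon^2 f_Y(y)\,dy = \varepsilon^2 P(|Y - \mu| \ge \varepsilon)$$
Division by $\varepsilon^2$ completes the proof. (If the random variable is discrete, replace the integrals with summations.) $\square$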
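EXAMPLE 5.7.2
Suppose that $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a discrete pdf $p_X(k; \mu)$, where $E(X) = \mu$ and $\text{Var}(X) = \sigma^2 < \infty$. Let $\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Is $\hat{\mu}_n$ a consistent estimator for $\mu$?
According to Chebyshev's inequality,
$$P(|\hat{\mu}_n - \mu| < \varepsilon) \ge 1 - \frac{\text{Var}(\hat{\mu}_n)}{\varepsilon^2}$$
But $\text{Var}(\hat{\mu}_n) = \text{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\text{Var}(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}$, so
$$P(|\hat{\mu}_n - \mu| < \varepsilon) \ge 1 - \frac{\sigma^2}{n\varepsilon^2}$$
For any $\varepsilon$, $\delta$, and $\sigma^2$, an $n$ can be found that makes $\frac{\sigma^2}{n\varepsilon^2} < \delta$. Therefore, $\lim_{n \to \infty} P(|\hat{\mu}_n - \mu| < \varepsilon) = 1$ (i.e., $\hat{\mu}_n$ is consistent for $\mu$).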
Comment. The fact that the sample mean, $\hat{\mu}_n$, is necessarily a consistent estimator for the true mean $\mu$, no matter what pdf the data come from, is often referred to as the weak law of large numbers. It was first proved by Chebyshev in 1866.
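The Chebyshev argument also yields a usable, if conservative, sample size. The sketch below assumes the bound $\sigma^2/(n\varepsilon^2) < \delta$ from Example 5.7.2; the function name and the numerical inputs are hypothetical, and exact fractions are used only to avoid rounding at the boundary.

```python
from fractions import Fraction
import math

def chebyshev_sample_size(sigma2, eps, delta):
    """Smallest n with sigma2 / (n * eps**2) < delta (strict inequality)."""
    bound = Fraction(sigma2) / (Fraction(eps) ** 2 * Fraction(delta))
    return math.floor(bound) + 1

# Hypothetical inputs: variance 4, precision 1/2, tolerance 1/20.
n = chebyshev_sample_size(Fraction(4), Fraction(1, 2), Fraction(1, 20))
print(n)
```

Because Chebyshev's inequality holds for every pdf with finite variance, the resulting $n$ is typically much larger than what a calculation based on a specific pdf would require.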
Comment. We saw in Section 5.6 that one of the theoretical reasons that justifies using the method of maximum likelihood to identify good estimators is the fact that maximum likelihood estimators are necessarily functions of sufficient statistics. As an additional rationale for seeking maximum likelihood estimators, it can be proved under very general conditions that maximum likelihood estimators are also consistent.
QUESTIONS
5.7.1. How large a sample must be taken from a normal pdf where $E(Y) = 18$ in order to guarantee that $\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$ has a 90% probability of lying somewhere in the interval [16, 20]? Assume that $\sigma = 5.0$.
5.7.2. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ from a normal pdf having $\mu = 0$. Show that $S_n^2 = \frac{1}{n}\sum_{i=1}^{n} Y_i^2$ is a consistent estimator for $\sigma^2 = \text{Var}(Y)$.
5.7.3. Suppose $Y_1, Y_2, \ldots, Y_n$ is a random sample from the exponential pdf, $f_Y(y; \lambda) = \lambda e^{-\lambda y}$, $y > 0$.
(a) Show that $\hat{\lambda}_n = Y_1$ is not consistent for $\lambda$.
(b) Show that $\hat{\lambda}_n = \sum_{i=1}^{n} Y_i$ is not consistent for $\lambda$.
5.7.4. An estimator $\hat{\theta}_n$ is said to be squared-error consistent for $\theta$ if $\lim_{n \to \infty} E[(\hat{\theta}_n - \theta)^2] = 0$.
(a) Show that any squared-error consistent $\hat{\theta}_n$ is asymptotically unbiased (see Question 5.4.15).
(b) Show that any squared-error consistent $\hat{\theta}_n$ is consistent in the sense of Definition 5.7.1.
5.7.5. Suppose $\hat{\theta}_n = Y_{\max}$ is to be used as an estimator for the parameter $\theta$ in the uniform pdf, $f_Y(y; \theta) = 1/\theta$, $0 \le y \le \theta$. Show that $\hat{\theta}_n$ is squared-error consistent (see Question 5.7.4).
5.7.6. If $2n + 1$ random observations are drawn from a continuous and symmetric pdf with mean $\mu$, and if $f_Y(\mu; \mu) \ne 0$, then the sample median, $Y'_{n+1}$, is unbiased for $\mu$, and $\text{Var}(Y'_{n+1}) \doteq 1/(8[f_Y(\mu; \mu)]^2 n)$ [see (54)]. Show that $\hat{\mu}_n = Y'_{n+1}$ is consistent for $\mu$.
5.8 BAYESIAN ESTIMATION
Bayesian analysis is a set of statistical techniques based on inverse probabilities calculated from Bayes' Theorem (recall Section 2.4). In particular, Bayesian statistics provides formal methods for incorporating prior knowledge into the estimation of unknown parameters.
An interesting example of a Bayesian solution to an unusual estimation problem occurred some years ago in the search for a missing nuclear submarine. In the spring of 1968, the USS Scorpion was on maneuvers with the Sixth Fleet in Mediterranean waters. In May, she was ordered to proceed to her homeport of Norfolk, Virginia. The last message from the Scorpion was received on May 21, and indicated her position to be about fifty miles south of the Azores, a group of islands eight hundred miles off the coast of Portugal. Navy officials decided that the sub had sunk somewhere along the eastern coast of the United States. A massive search was mounted, but to no avail, and the Scorpion's fate remained a mystery.
Enter John Craven, a Navy expert in deep water exploration, who believed the Scorpion had not been found because it had never reached the eastern seaboard and was still somewhere near the Azores. In setting up a search strategy, Craven divided the area near the Azores into a grid of $n$ squares, and solicited the advice of veteran submarine commanders on the chances of the Scorpion having been lost in each of those regions. Combining their opinions resulted in a set of probabilities, $P(A_1), P(A_2), \ldots, P(A_n)$, that the sub had sunk in areas $1, 2, \ldots, n$, respectively.
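Now, suppose $P(A_k)$ was the largest of the $P(A_i)$s. Then area $k$ would be the first region searched. Let $B_k$ be the event that the Scorpion would be found if it had sunk in area $k$ and area $k$ was searched. Assume that the sub was not found. From Theorem 2.4.2,
$$P(A_k \mid B_k^C) = \frac{P(B_k^C \mid A_k)P(A_k)}{P(B_k^C \mid A_k)P(A_k) + P(B_k^C \mid A_k^C)P(A_k^C)}$$
becomes an updated $P(A_k)$; call it $P^*(A_k)$. The remaining $P(A_i)$s, $i \ne k$, can then be normalized to form the revised probabilities $P^*(A_i)$, $i \ne k$, where $\sum_{i=1}^{n} P^*(A_i) = 1$. If $P^*(A_j)$ was the largest of the $P^*(A_i)$s, then area $j$ would be searched next. If the sub was not found there, a third set of probabilities, $P^{**}(A_1), P^{**}(A_2), \ldots, P^{**}(A_n)$, would be calculated in the same fashion, and the search would continue.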
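The updating scheme in the search story can be sketched as follows. Everything numerical here is hypothetical: the three-cell grid, the prior probabilities, and the detection probability `detect_prob` (the chance of finding the sub when the correct cell is searched) are illustrative assumptions, as is the simplification that searching the wrong cell never produces a find.

```python
def update_after_failed_search(priors, k, detect_prob):
    """Posterior cell probabilities after searching cell k and finding nothing.

    priors      : list of P(A_i), summing to 1
    k           : index of the searched cell
    detect_prob : assumed P(found | sub is in cell k and cell k is searched)
    """
    miss = 1.0 - detect_prob
    # P(not found) = P(miss | A_k) P(A_k) + 1 * P(not A_k)
    p_not_found = miss * priors[k] + (1.0 - priors[k])
    return [(miss * p if i == k else p) / p_not_found
            for i, p in enumerate(priors)]

priors = [0.5, 0.3, 0.2]   # hypothetical P(A_1), P(A_2), P(A_3)
k = max(range(len(priors)), key=priors.__getitem__)  # search the likeliest cell first
posterior = update_after_failed_search(priors, k, detect_prob=0.8)
print(posterior)
```

Repeating the call with `posterior` in place of `priors` gives the next round of revised probabilities, mirroring the sequence $P^*(A_i)$, $P^{**}(A_i)$, and so on.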
In October of 1968, the USS Scorpion was, indeed, found near the Azores; all the men aboard had perished. Why it sunk has never been disclosed. One theory suggests that one of its torpedoes accidentally exploded; Cold War conspiracy advocates think it may have been sunk while spying on a group of Soviet subs. What is known is that the strategy of using Bayes' Theorem to update the location of where the Scorpion might have sunk proved to be successful.
Prior Distributions and Posterior Distributions
Conceptually, a major difference between Bayesian analysis and non-Bayesian analysis is the set of assumptions associated with unknown parameters. In a non-Bayesian analysis (which would include all the statistical methodology in this book except the present section), unknown parameters are viewed as constants; in a Bayesian analysis, parameters are treated as random variables, meaning they have a pdf.
At the outset in a Bayesian analysis, the pdf assigned to the parameter may be based on little or no information and is referred to as the prior distribution. As soon as some data are collected, it becomes possible, via Bayes' Theorem, to revise and refine the pdf assigned to the parameter. Any such updated pdf is referred to as a posterior distribution.
In the search for the USS Scorpion, the unknown parameters were the probabilities of finding the sub in each of the grid areas surrounding the Azores. The prior distribution on those parameters was the set of probabilities $P(A_1), P(A_2), \ldots, P(A_n)$. Each time an area was searched and the sub not found, a posterior distribution was calculated: the first was the set of probabilities $P^*(A_1), P^*(A_2), \ldots, P^*(A_n)$; the second was the set of probabilities $P^{**}(A_1), P^{**}(A_2), \ldots, P^{**}(A_n)$; and so on.
EXAMPLE 5.8.1
Suppose a retailer is interested in modeling the number of calls arriving at a phone bank in a five-minute interval. Section 4.2 established that the Poisson distribution would be the pdf to choose. But what value should be assigned to the Poisson's parameter, $\lambda$?
If the rate of calls was constant over a twenty-four-hour period, an estimate $\lambda_e$ for $\lambda$ could be calculated by dividing the total number of calls received during a full day by 288, the latter being the number of five-minute intervals in a twenty-four-hour period. If the random variable $X$, then, denotes the number of calls received during a random five-minute interval, the estimated probability that $X = k$ would be $p_X(k) = e^{-\lambda_e}\lambda_e^k/k!$, $k = 0, 1, 2, \ldots$
In reality, though, the incoming call rate is not likely to remain constant over an entire twenty-four-hour period. Suppose, in fact, that an examination of telephone logs for the past several months suggests that $\lambda$ equals ten about three-quarters of the time, and it equals eight about one-quarter of the time. Described in Bayesian terminology, the rate parameter is a random variable $\Lambda$, and the (discrete) prior distribution for $\Lambda$ is defined by two probabilities:
$$p_\Lambda(8) = P(\Lambda = 8) = 0.25$$
and
$$p_\Lambda(10) = P(\Lambda = 10) = 0.75$$
Now, suppose certain facets of the retailer's operation have recently changed (different products to sell, different amounts of advertising, etc.). Those changes may very well affect the distribution associated with the call rate. Updating the prior distribution for $\Lambda$ requires (a) some data and (b) an application of Bayes' Theorem. Being both frugal and statistically challenged, the retailer decides to construct a posterior distribution for $\Lambda$ on the basis of a single observation. To that end, a five-minute interval is selected at random and the corresponding value for $X$ is found to be seven. How should $p_\Lambda(8)$ and $p_\Lambda(10)$ be revised?
Using Bayes' Theorem,
$$P(\Lambda = 10 \mid X = 7) = \frac{P(X = 7 \mid \Lambda = 10)P(\Lambda = 10)}{P(X = 7 \mid \Lambda = 8)P(\Lambda = 8) + P(X = 7 \mid \Lambda = 10)P(\Lambda = 10)}$$
$$= \frac{\left(e^{-10}\,10^7/7!\right)(0.75)}{\left(e^{-8}\,8^7/7!\right)(0.25) + \left(e^{-10}\,10^7/7!\right)(0.75)} = \frac{(0.090)(0.75)}{(0.140)(0.25) + (0.090)(0.75)} = 0.659$$
which implies that
$$P(\Lambda = 8 \mid X = 7) = 1 - 0.659 = 0.341$$
Notice that the posterior distribution for $\Lambda$ has changed in a way that makes sense intuitively. Initially, $P(\Lambda = 8)$ was 0.25. Since the data point, $x = 7$, is more consistent with $\Lambda = 8$ than with $\Lambda = 10$, the posterior pdf has increased the probability that $\Lambda = 8$ (from 0.25 to 0.341) and decreased the probability that $\Lambda = 10$ (from 0.75 to 0.659).
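The arithmetic in Example 5.8.1 can be reproduced with a short script. This is a sketch rather than anything prescribed by the text: the function names are illustrative, and the prior is passed as a dictionary mapping each candidate rate to its prior probability.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def discrete_posterior(prior, x):
    """Bayes' Theorem for a discrete prior on the Poisson rate.

    prior : dict mapping each candidate rate to its prior probability
    x     : the single observed count
    """
    joint = {lam: poisson_pmf(x, lam) * p for lam, p in prior.items()}
    total = sum(joint.values())
    return {lam: j / total for lam, j in joint.items()}

posterior = discrete_posterior({8: 0.25, 10: 0.75}, x=7)
print({lam: round(p, 3) for lam, p in posterior.items()})
```

Feeding the posterior back in as the prior for a second observation would carry the update one step further, in the same spirit as the repeated revisions in the Scorpion search.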
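Definition 5.8.1. Let $W$ be a statistic dependent on a parameter $\theta$. Call its pdf $f_W(w \mid \theta)$. Assume that $\theta$ is the value of a random variable $\Theta$, whose prior distribution is denoted $p_\Theta(\theta)$, if $\Theta$ is discrete, and $f_\Theta(\theta)$, if $\Theta$ is continuous. The posterior distribution of $\Theta$, given that $W = w$, is the quotient
$$g_\Theta(\theta \mid W = w) = \begin{cases} \dfrac{p_W(w \mid \theta) f_\Theta(\theta)}{\int_{-\infty}^{\infty} p_W(w \mid \theta) f_\Theta(\theta)\,d\theta} & \text{if } W \text{ is discrete} \\[2ex] \dfrac{f_W(w \mid \theta) f_\Theta(\theta)}{\int_{-\infty}^{\infty} f_W(w \mid \theta) f_\Theta(\theta)\,d\theta} & \text{if } W \text{ is continuous} \end{cases}$$
Note: If $\Theta$ is discrete, call its pdf $p_\Theta(\theta)$ and replace the integrations with summations.
Comment. Definition 5.8.1 can be used to construct a posterior distribution even if no information is available on which to base a prior distribution. In such cases, the uniform pdf is substituted for either $p_\Theta(\theta)$ or $f_\Theta(\theta)$ and referred to as a noninformative prior.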
EXAMPLE 5.8.2
Max, a video game pirate (and Bayesian), is trying to decide how many illegal copies of Zombie Beach Party to have on hand for the upcoming holiday season. To get a rough idea of what the demand might be, he talks with $n$ potential customers and finds that $X = k$ would buy a copy for a present (or for themselves). The obvious choice for a probability model for $X$, of course, would be the binomial pdf. Given $n$ potential customers, the probability that $k$ would actually buy one of Max's illegal copies is the familiar
$$p_X(k \mid \theta) = \binom{n}{k}\theta^k(1 - \theta)^{n-k}, \quad k = 0, 1, \ldots, n$$
where the maximum likelihood estimate for $\theta$ is given by $\theta_e = k/n$.
It may very well be the case, though, that Max has some additional insight about the value of $\theta$ on the basis of similar video games that he illegally marketed in previous years. Suppose he suspects, for example, that the percentage of potential customers who will buy Zombie Beach Party is likely to be between 3% and 4% and probably will not exceed 7%. A reasonable prior distribution for $\Theta$, then, would be a pdf mostly concentrated over the interval 0 to 0.07 with a mean or median in the 0.035 range.
One such probability model whose shape would comply with the restraints that Max is imposing is the beta pdf. Written with $\Theta$ as the random variable, the (two-parameter) beta pdf is given by
$$f_\Theta(\theta) = \frac{\Gamma(r + s)}{\Gamma(r)\Gamma(s)}\,\theta^{r-1}(1 - \theta)^{s-1}, \quad 0 \le \theta \le 1$$
The beta distribution with $r = 2$ and $s = 4$ is pictured in Figure 5.8.1. By choosing different values for $r$ and $s$, $f_\Theta(\theta)$ can be skewed more sharply to the right or to the left, and the bulk of the distribution can be concentrated close to zero or close to one. The question is, if an appropriate beta pdf is used as a prior distribution for $\Theta$, and if a random sample of $k$ potential customers (out of $n$) said they would buy the video game, what would be a reasonable posterior distribution for $\Theta$?
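FIGURE 5.8.1 (the beta pdf with $r = 2$ and $s = 4$, plotted for $\theta$ from 0 to 1.0)
From Definition 5.8.1 for the case where $W$ ($= X$) is discrete and $\Theta$ is continuous,
$$g_\Theta(\theta \mid X = k) = \frac{p_X(k \mid \theta) f_\Theta(\theta)}{\int_0^1 p_X(k \mid \theta) f_\Theta(\theta)\,d\theta}$$
Substituting into the numerator gives
$$p_X(k \mid \theta) f_\Theta(\theta) = \binom{n}{k}\theta^k(1 - \theta)^{n-k} \cdot \frac{\Gamma(r + s)}{\Gamma(r)\Gamma(s)}\,\theta^{r-1}(1 - \theta)^{s-1} = \binom{n}{k}\frac{\Gamma(r + s)}{\Gamma(r)\Gamma(s)}\,\theta^{k+r-1}(1 - \theta)^{n-k+s-1}$$
so
$$g_\Theta(\theta \mid X = k) = \left[\frac{\binom{n}{k}\frac{\Gamma(r + s)}{\Gamma(r)\Gamma(s)}}{\int_0^1 \binom{n}{k}\frac{\Gamma(r + s)}{\Gamma(r)\Gamma(s)}\,\theta^{k+r-1}(1 - \theta)^{n-k+s-1}\,d\theta}\right]\theta^{k+r-1}(1 - \theta)^{n-k+s-1}$$
Notice that if the parameters $r$ and $s$ in the beta pdf were relabeled $k + r$ and $n - k + s$, respectively, the equation for $f_\Theta(\theta)$ would be
$$f_\Theta(\theta) = \frac{\Gamma(n + r + s)}{\Gamma(k + r)\Gamma(n - k + s)}\,\theta^{k+r-1}(1 - \theta)^{n-k+s-1}$$
But those same exponents for $\theta$ and $(1 - \theta)$ appear outside the brackets in the expression for $g_\Theta(\theta \mid X = k)$. Since there can be only one $f_\Theta(\theta)$ whose variable factors are $\theta^{k+r-1}(1 - \theta)^{n-k+s-1}$ (see Theorem 4.6.4), it follows that $g_\Theta(\theta \mid X = k)$ is a beta pdf with parameters $k + r$ and $n - k + s$.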
The final step in the construction of a posterior distribution for $\Theta$ is to choose values for $r$ and $s$ that would produce a (prior) beta distribution having the configuration described on p. 413, that is, with a mean or median at 0.035 and the bulk of the distribution between 0 and 0.07. It can be shown (see (92)) that the expected value of a beta pdf is $r/(r + s)$. Setting 0.035, then, equal to that quotient implies that
$$s \doteq 28r$$
By trial and error with a calculator that can integrate a beta pdf, the values $r = 4$ and $s = 102$ (which give a prior mean of $4/106 \doteq 0.038$) are found to yield an $f_\Theta(\theta)$ having almost all of its area to the left of 0.07. Substituting those values for $r$ and $s$ into $g_\Theta(\theta \mid X = k)$ gives the completed posterior distribution:
$$g_\Theta(\theta \mid X = k) = \frac{\Gamma(n + 106)}{\Gamma(k + 4)\Gamma(n - k + 102)}\,\theta^{k+4-1}(1 - \theta)^{n-k+102-1}$$
$$= \frac{(n + 105)!}{(k + 3)!\,(n - k + 101)!}\,\theta^{k+3}(1 - \theta)^{n-k+101}$$
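The "trial and error with a calculator" step can be imitated numerically. The sketch below is an assumption-laden illustration: it integrates the beta pdf with a hand-rolled Simpson's rule (valid here because $r, s > 1$, so the density vanishes at the endpoints), and the survey outcome $n = 50$, $k = 3$ used to show the posterior parameters is hypothetical.

```python
import math

def beta_pdf(theta, r, s):
    """Beta density on the log scale; assumes r, s > 1 so the density is 0 at 0 and 1."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    log_c = math.lgamma(r + s) - math.lgamma(r) - math.lgamma(s)
    return math.exp(log_c + (r - 1) * math.log(theta) + (s - 1) * math.log(1 - theta))

def beta_cdf(x, r, s, steps=2000):
    """Composite Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = beta_pdf(0.0, r, s) + beta_pdf(x, r, s)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, r, s)
    return total * h / 3

def posterior_params(k, n, r, s):
    """Beta(r, s) prior with k successes in n trials gives Beta(k + r, n - k + s)."""
    return k + r, n - k + s

prior_mass_below_007 = beta_cdf(0.07, 4, 102)
post_r, post_s = posterior_params(k=3, n=50, r=4, s=102)   # hypothetical survey result
print(round(prior_mass_below_007, 3), post_r, post_s, round(post_r / (post_r + post_s), 4))
```

The last printed value is the posterior mean $(k + r)/(n + r + s)$, which lies between the sample proportion $k/n$ and the prior mean $r/(r + s)$.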
EXAMPLE 5.8.3
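Certain prior distributions "fit" especially well with certain parameters in the sense that the resulting posterior distributions are easy to work with. Example 5.8.2 was a case in point: assigning a beta prior distribution to the unknown parameter in a binomial pdf leads to a beta posterior distribution. A similar relationship holds if a gamma pdf is used as the prior distribution for the parameter in a Poisson model.
Suppose $X_1, X_2, \ldots, X_n$ denotes a random sample from the Poisson pdf, $p_X(k \mid \theta) = e^{-\theta}\theta^k/k!$, $k = 0, 1, \ldots$ Let $W = \sum_{i=1}^{n} X_i$. By Example 3.12.10, $W$ has a Poisson distribution with parameter $n\theta$, that is, $p_W(w \mid \theta) = e^{-n\theta}(n\theta)^w/w!$, $w = 0, 1, 2, \ldots$
Let the gamma pdf,
$$f_\Theta(\theta) = \frac{\mu^s}{\Gamma(s)}\,\theta^{s-1}e^{-\mu\theta}, \quad 0 < \theta < \infty$$
be the prior distribution assigned to $\Theta$. Then
$$g_\Theta(\theta \mid W = w) = \frac{p_W(w \mid \theta) f_\Theta(\theta)}{\int_\theta p_W(w \mid \theta) f_\Theta(\theta)\,d\theta}$$
where
$$p_W(w \mid \theta) f_\Theta(\theta) = e^{-n\theta}\frac{(n\theta)^w}{w!} \cdot \frac{\mu^s}{\Gamma(s)}\,\theta^{s-1}e^{-\mu\theta} = \frac{n^w}{w!}\frac{\mu^s}{\Gamma(s)}\,\theta^{w+s-1}e^{-(\mu+n)\theta}$$
Now, using the same argument that simplified the calculation of the posterior distribution in Example 5.8.2, we can write
$$g_\Theta(\theta \mid W = w) = \left[\frac{\frac{n^w}{w!}\frac{\mu^s}{\Gamma(s)}}{\int_\theta p_W(w \mid \theta) f_\Theta(\theta)\,d\theta}\right]\theta^{w+s-1}e^{-(\mu+n)\theta}$$
But the only pdf having the factors $\theta^{w+s-1}e^{-(\mu+n)\theta}$ is the gamma distribution with parameters $w + s$ and $\mu + n$. It follows, then, that
$$g_\Theta(\theta \mid W = w) = \frac{(\mu + n)^{w+s}}{\Gamma(w + s)}\,\theta^{w+s-1}e^{-(\mu+n)\theta}$$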
Chapter 5
Estimation
CASE STUDY 5.8.1
Predicting the annual number of hurricanes that will hit the U.S. mainland is a problem
receiving a great deal of public attention, given the disastrous summer of 2004 when
four major hurricanes struck Florida, causing billions of dollars of damage and several
mass evacuations. For all the reasons discussed in Section 4.2, the obvious pdf for
modeling the number of hurricanes reaching the mainland is the Poisson, where the
unknown parameter $\theta$ would be the expected number in a given year.
Table 5.8.1 shows the numbers of hurricanes that actually did come ashore for three fifty-year periods. Use that information to construct a posterior distribution for $\theta$. Assume that the prior distribution is a gamma pdf.
TABLE 5.8.1
Years        Number of Hurricanes
1851-1900    88
1901-1950    92
1951-2000    72
Not surprisingly, meteorologists consider the data from the earliest period, 1851 to 1900, to be the least reliable. Those eighty-eight hurricanes, then, will be used to formulate the prior distribution. Let
$$f_\Theta(\theta) = \frac{\mu^s}{\Gamma(s)}\,\theta^{s-1}e^{-\mu\theta}, \quad 0 < \theta < \infty$$
Recall from Theorem 4.6.3 that for a gamma pdf, $E(\Theta) = s/\mu$. For the years from 1851 to 1900, though, the sample average number of hurricanes per year was $\frac{88}{50}$. Setting the latter equal to $E(\Theta)$ allows $s = 88$ and $\mu = 50$ to be assigned to the gamma's parameters. That is, we can take the prior distribution to be
$$f_\Theta(\theta) = \frac{50^{88}}{\Gamma(88)}\,\theta^{87}e^{-50\theta}$$
Also, the posterior distribution given at the end of Example 5.8.3 becomes
$$g_\Theta(\theta \mid W = w) = \frac{(50 + n)^{w+88}}{\Gamma(w + 88)}\,\theta^{w+87}e^{-(50+n)\theta}$$
The data, then, to incorporate into the posterior distribution would be the fact that $w = 92 + 72 = 164$ hurricanes occurred over the most recent $n = 100$ years included in the database. Therefore,
$$g_\Theta(\theta \mid W = 164) = \frac{150^{252}}{\Gamma(252)}\,\theta^{251}e^{-150\theta}$$
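The hurricane update amounts to adding the observed count to the gamma's first parameter and the number of years to its second. The sketch below applies that rule to the figures in Table 5.8.1; the function name is illustrative, and the posterior standard deviation uses the standard gamma variance $s/\mu^2$, which is not derived in this case study.

```python
import math

def gamma_poisson_update(s, mu, w, n):
    """Gamma(s, mu) prior plus a Poisson total w over n periods gives Gamma(w + s, mu + n)."""
    return w + s, mu + n

shape, rate = gamma_poisson_update(s=88, mu=50, w=92 + 72, n=100)
posterior_mean = shape / rate            # expected hurricanes per year
posterior_sd = math.sqrt(shape) / rate   # standard deviation of a Gamma(shape, rate)
print(shape, rate, round(posterior_mean, 3), round(posterior_sd, 3))
```

The posterior mean, 1.68 hurricanes per year, sits between the prior mean of 88/50 = 1.76 and the more recent sample average of 164/100 = 1.64.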
EXAMPLE 5.8.4
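In the examples seen thus far, the joint pdf $g_{W,\Theta}(w, \theta) = p_W(w \mid \theta) f_\Theta(\theta)$ of a statistic $W$ and a parameter $\Theta$ (with a prior distribution $f_\Theta(\theta)$) was the starting point in finding the posterior distribution of $\Theta$. For some applications, though, the objective is not to derive $g_\Theta(\theta \mid W = w)$, but, rather, to find the marginal pdf of $W$.
For instance, suppose a sample of size $n = 1$ is drawn from a Poisson pdf, $p_W(w \mid \theta) = e^{-\theta}\theta^w/w!$, $w = 0, 1, \ldots$, where the prior distribution is the gamma pdf, $f_\Theta(\theta) = \frac{\mu^s}{\Gamma(s)}\,\theta^{s-1}e^{-\mu\theta}$. According to Example 5.8.3,
$$g_{W,\Theta}(w, \theta) = p_W(w \mid \theta) f_\Theta(\theta) = \frac{1}{w!}\frac{\mu^s}{\Gamma(s)}\,\theta^{w+s-1}e^{-(\mu+1)\theta}$$
What is the corresponding marginal pdf of $W$, that is, $p_W(w)$?
Recall Theorem 3.7.2. Integrating the joint pdf of $W$ and $\Theta$ over $\theta$ gives
$$p_W(w) = \int_0^\infty g_{W,\Theta}(w, \theta)\,d\theta = \int_0^\infty \frac{1}{w!}\frac{\mu^s}{\Gamma(s)}\,\theta^{w+s-1}e^{-(\mu+1)\theta}\,d\theta$$
$$= \frac{1}{w!}\frac{\mu^s}{\Gamma(s)}\int_0^\infty \theta^{w+s-1}e^{-(\mu+1)\theta}\,d\theta = \frac{1}{w!}\frac{\mu^s}{\Gamma(s)} \cdot \frac{\Gamma(w + s)}{(\mu + 1)^{w+s}}$$
$$= \frac{\Gamma(w + s)}{w!\,\Gamma(s)}\left(\frac{\mu}{\mu + 1}\right)^s\left(\frac{1}{\mu + 1}\right)^w$$
But $\frac{\Gamma(w + s)}{w!\,\Gamma(s)} = \binom{w + s - 1}{w}$. Finally, let $p = \mu/(\mu + 1)$, so $1 - p = 1/(\mu + 1)$, and the marginal pdf reduces to a negative binomial distribution with parameters $s$ and $p$:
$$p_W(w) = \binom{w + s - 1}{w}p^s(1 - p)^w$$
(see Question 4.5.6).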
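The claim that a gamma mixture of Poissons is negative binomial can be checked numerically. The sketch below is only a sanity check under assumed values ($s = 3$, $\mu = 2$ are hypothetical): it compares the closed-form negative binomial probabilities with a midpoint-rule approximation of the integral of $p_W(w \mid \theta) f_\Theta(\theta)$ over $\theta$, truncated at an upper limit where the integrand is negligible.

```python
import math

def neg_binomial_pmf(w, s, p):
    """C(w+s-1, w) * p**s * (1-p)**w, via gamma functions so s need not be an integer."""
    log_coef = math.lgamma(w + s) - math.lgamma(w + 1) - math.lgamma(s)
    return math.exp(log_coef + s * math.log(p) + w * math.log(1.0 - p))

def marginal_by_integration(w, s, mu, upper=60.0, steps=20000):
    """Midpoint-rule approximation of the integral over theta of
    Poisson(w | theta) * Gamma(theta; s, mu)."""
    h = upper / steps
    log_const = s * math.log(mu) - math.lgamma(s) - math.lgamma(w + 1)
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += math.exp(log_const + (w + s - 1) * math.log(theta) - (mu + 1.0) * theta)
    return total * h

s, mu = 3.0, 2.0          # hypothetical gamma prior parameters
p = mu / (mu + 1.0)
for w in range(5):
    print(w, round(neg_binomial_pmf(w, s, p), 6), round(marginal_by_integration(w, s, mu), 6))
```

The two columns agree to the printed precision, which is what the derivation in Example 5.8.4 predicts.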
CASE STUDY 5.8.2
Psychologists use a special coordination test for studying a person's likelihood of
making manual errors. For any given person, the number of such errors made on the
test is known to follow a Poisson distribution with some particular value for the rate
parameter, $\theta$. But as we all know (from watching the clumsy people around us who spill things and get in our way), $\theta$ varies considerably from person to person. Suppose, in fact, that variability in $\Theta$ can be described by a gamma pdf. If so, the marginal
TABLE 5.8.2
Frequency
Negative Binomial
Predicted Frequency
82
57
79.2
57.1
2
46
46.3
3
4
5
39
33
28
33.3
Number
of Errors, w
o
1
6
7
8
9
10
11
22
19
17
12
12
13
10
14
9
8
7
6
16
17
18
28.8
17.0
15.0
11.8
10.4
9.3
8.3
6
5.8
19
5
5.2
20
21
22
5
4
4
4.6
24
25
26
3
3
3
2
2
28
29
30
2
2
2
13
4.1
3.7
3.3
2.9
2.4
2.1
1.9
1.7
1.5
13.1
504.0
pdf of the number of errors made by an individual should have a negative binomial distribution (according to Example 5.8.4).
Columns 1 and 2 of Table 5.8.2 show the number of errors made on the coordination test by a sample of 504 subjects: 82 made zero errors, 57 made one error, and so on. To know whether those responses can be adequately modeled by a negative binomial distribution requires that the parameters $p$ and $s$ be estimated. To that end, it should be noted that the maximum likelihood estimate for $p$ in a negative binomial is $ns\big/\sum_{i=1}^{n} w_i$. Expected frequencies, then, can be calculated by choosing a value for $s$ and solving for $p$. By trial and error, the entries shown in Column 3 were based on a negative binomial pdf for which $s = 0.8$ and $p = (504)(0.8)/3821 = 0.106$. Clearly, the model fits exceptionally well, which supports the analysis carried out in Example 5.8.4.
Bayesian Estimation
Fundamental to the philosophy of Bayesian analysis is the notion that all relevant information about an unknown parameter, $\theta$, is encoded in the parameter's posterior distribution, $g_\Theta(\theta \mid W = w)$. Given that premise, an obvious question arises: How can $g_\Theta(\theta \mid W = w)$ be used to calculate an appropriate point estimator, $\hat{\theta}$? One approach, similar to using the likelihood function to find a maximum likelihood estimator, is to differentiate the posterior distribution, in which case the value for which $dg_\Theta(\theta \mid W = w)/d\theta = 0$ (that is, the mode) becomes $\hat{\theta}$.
For theoretical reasons, though, a method much preferred by Bayesians is to use some key ideas from decision theory as a framework for identifying a reasonable $\hat{\theta}$. In particular, Bayes estimates are chosen to minimize the risk associated with $\hat{\theta}$, where the risk is the expected value of the loss incurred by the error in the estimate. Presumably, as $\hat{\theta} - \theta$ gets further away from 0 (that is, as the estimation error gets larger), the loss associated with $\hat{\theta}$ will increase.
Definition 5.8.2. Let $\hat{\theta}$ be an estimator for $\theta$ based on a statistic $W$. The loss function associated with $\hat{\theta}$ is denoted $L(\hat{\theta}, \theta)$, where $L(\hat{\theta}, \theta) \ge 0$ and $L(\theta, \theta) = 0$.
EXAMPLE 5.8.5
It is typically the case that quantifying in any precise way the consequences, economic or otherwise, of $\hat{\theta}$ not being equal to $\theta$ is all but impossible. The "generic" loss functions defined in those situations are chosen primarily for their mathematical convenience. Two of the most frequently used are $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$ and $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$. Sometimes, though, the context in which a parameter is being estimated does allow for a loss function to be defined in a very specific and relevant way.
Consider the inventory dilemma faced by Max, the video game pirate whose illegal activities were described in Example 5.8.2. The unknown parameter in question was $\theta$, the proportion of his $n$ potential customers who would purchase a copy of Zombie Beach Party. Suppose that Max decides, for whatever reasons, to estimate $\theta$ with $\hat{\theta}$. As a consequence, it would follow that he should have $n\hat{\theta}$ copies of the video game available. That said, what would be the corresponding loss function?
Here, the implications of $\hat{\theta}$ not being equal to $\theta$ are readily quantifiable. If $\hat{\theta} < \theta$, then $n(\theta - \hat{\theta})$ sales will be lost (at a cost of, say, \$$c$ per video). On the other hand, if $\hat{\theta} > \theta$, there will be $n(\hat{\theta} - \theta)$ unsold videos, each of which would incur a cost of, say, \$$d$ per unit. The loss function that applies to Max's situation, then, is clearly defined:
$$L(\hat{\theta}, \theta) = \begin{cases} \$cn(\theta - \hat{\theta}) & \text{if } \hat{\theta} < \theta \\ \$dn(\hat{\theta} - \theta) & \text{if } \hat{\theta} > \theta \end{cases}$$
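Definition 5.8.3. Let $L(\hat{\theta}, \theta)$ be the loss function associated with an estimate of the parameter $\theta$. Let $g_\Theta(\theta \mid W = w)$ be the posterior distribution of the random variable $\Theta$. Then the risk associated with $\hat{\theta}$ is the expected value of the loss function with respect to the posterior distribution of $\Theta$.
$$\text{risk} = \begin{cases} \int_\theta L(\hat{\theta}, \theta)\,g_\Theta(\theta \mid W = w)\,d\theta & \text{if } \Theta \text{ is continuous} \\[1ex] \sum_{\text{all } \theta} L(\hat{\theta}, \theta)\,g_\Theta(\theta \mid W = w) & \text{if } \Theta \text{ is discrete} \end{cases}$$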
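Using the Risk Function to Find $\hat{\theta}$
Given that the risk function represents the expected loss associated with the estimator $\hat{\theta}$, it makes sense to look for the $\hat{\theta}$ that minimizes the risk. Any $\hat{\theta}$ that achieves that objective is said to be a Bayes estimate. In general, finding the Bayes estimate requires solving the equation $d(\text{risk})/d\hat{\theta} = 0$. For two of the most frequently used loss functions, though, $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$ and $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$, there is a much easier way to calculate $\hat{\theta}$.
Theorem 5.8.1. Let $g_\Theta(\theta \mid W = w)$ be the posterior distribution for the unknown parameter $\theta$.
a. If the loss function associated with $\hat{\theta}$ is $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$, then the Bayes estimate for $\theta$ is the median of $g_\Theta(\theta \mid W = w)$.
b. If the loss function associated with $\hat{\theta}$ is $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$, then the Bayes estimate for $\theta$ is the mean of $g_\Theta(\theta \mid W = w)$.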
Proof.
a. The proof follows from a general result for the expected value of a random variable. The fact that the pdf in the expectation here is a posterior distribution is irrelevant. The derivation will be given for a continuous random variable (having a finite expected value); the proof for the discrete case is similar. Let $f_W(w)$ be the pdf for the random variable $W$, where the median of $W$ is $m$. Then
$$E(|W - m|) = \int_{-\infty}^{\infty} |w - m| f_W(w)\,dw = \int_{-\infty}^{m} (m - w) f_W(w)\,dw + \int_{m}^{\infty} (w - m) f_W(w)\,dw$$
$$= m\int_{-\infty}^{m} f_W(w)\,dw - \int_{-\infty}^{m} w f_W(w)\,dw + \int_{m}^{\infty} w f_W(w)\,dw - m\int_{m}^{\infty} f_W(w)\,dw$$
The first and last integrals are equal by definition of the median so,
$$E(|W - m|) = -\int_{-\infty}^{m} w f_W(w)\,dw + \int_{m}^{\infty} w f_W(w)\,dw$$
Now, suppose $m \ge 0$ (the proof for negative $m$ is similar). Splitting the first integral into two parts gives
$$E(|W - m|) = -\int_{-\infty}^{0} w f_W(w)\,dw - \int_{0}^{m} w f_W(w)\,dw + \int_{m}^{\infty} w f_W(w)\,dw$$
Notice that the middle integral is nonnegative, so changing its negative sign to a plus implies that
$$E(|W - m|) \le -\int_{-\infty}^{0} w f_W(w)\,dw + \int_{0}^{m} w f_W(w)\,dw + \int_{m}^{\infty} w f_W(w)\,dw = -\int_{-\infty}^{0} w f_W(w)\,dw + \int_{0}^{\infty} w f_W(w)\,dw$$
Therefore,
$$E(|W - m|) \le E(|W|) \qquad (5.8.1)$$
Finally, suppose $b$ is any constant. Then
$$\frac{1}{2} = P(W \le m) = P(W - b \le m - b),$$
showing that $m - b$ is the median of the random variable $W - b$. Applying Equation 5.8.1 to the variable $W - b$, we can write
$$E(|W - m|) = E(|(W - b) - (m - b)|) \le E(|W - b|)$$
which implies that the median of $g_\Theta(\theta \mid W = w)$ is the Bayes estimate for $\theta$ when $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$.
b. Let $W$ be any random variable whose mean is $\mu$ and whose variance is finite, and let $b$ be any constant. Then
$$E[(W - b)^2] = E[((W - \mu) + (\mu - b))^2] = E[(W - \mu)^2] + 2(\mu - b)E(W - \mu) + (\mu - b)^2$$
$$= \text{Var}(W) + 0 + (\mu - b)^2$$
implying that $E[(W - b)^2]$ is minimized when $b = \mu$. It follows that the Bayes estimate for $\theta$, given a quadratic loss function, is the mean of the posterior distribution. $\square$
EXAMPLE 5.8.6
Recall Example 5.8.3, where the parameter in a Poisson distribution is assumed to have a gamma prior distribution. For a random sample of size $n$, where $W = \sum_{i=1}^{n} X_i$,
$$p_W(w \mid \theta) = e^{-n\theta}(n\theta)^w/w!, \quad w = 0, 1, 2, \ldots$$
$$f_\Theta(\theta) = \frac{\mu^s}{\Gamma(s)}\,\theta^{s-1}e^{-\mu\theta}$$
which resulted in the posterior distribution being a gamma pdf with parameters $w + s$ and $\mu + n$.
Suppose the loss function associated with $\hat{\theta}$ is quadratic, $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$. By part (b) of Theorem 5.8.1, the Bayes estimate for $\theta$ is the mean of the posterior distribution. From Theorem 4.6.3, though, the mean of $g_\Theta(\theta \mid W = w)$ is $(w + s)/(\mu + n)$.
Notice that
$$\frac{w + s}{\mu + n} = \frac{n}{\mu + n}\left(\frac{w}{n}\right) + \frac{\mu}{\mu + n}\left(\frac{s}{\mu}\right)$$
which shows that the Bayes estimate is a weighted average of $\frac{w}{n}$, the maximum likelihood estimate for $\theta$, and $\frac{s}{\mu}$, the mean of the prior distribution. Moreover, as $n$ gets large, the Bayes estimate converges to the maximum likelihood estimate.
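The weighted-average identity is easy to confirm numerically. The sketch below uses the hurricane figures from Case Study 5.8.1 ($w = 164$, $n = 100$, $s = 88$, $\mu = 50$) purely as an illustration; the function names are not from the text.

```python
def bayes_estimate_quadratic_loss(w, n, s, mu):
    """Posterior mean of Gamma(w + s, mu + n), the Bayes estimate under squared-error loss."""
    return (w + s) / (mu + n)

def as_weighted_average(w, n, s, mu):
    """Same estimate written as weight_mle * (w/n) + weight_prior * (s/mu)."""
    weight_mle = n / (mu + n)
    weight_prior = mu / (mu + n)
    return weight_mle * (w / n) + weight_prior * (s / mu), weight_mle, weight_prior

direct = bayes_estimate_quadratic_loss(164, 100, 88, 50)
combined, weight_mle, weight_prior = as_weighted_average(164, 100, 88, 50)
print(round(direct, 4), round(combined, 4), round(weight_mle, 4), round(weight_prior, 4))
```

With these numbers two-thirds of the weight falls on the maximum likelihood estimate; as $n$ grows with $\mu$ fixed, that weight tends to one, which is the convergence noted at the end of the example.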
QUESTIONS
5.8.1. Suppose that $X$ is a geometric random variable, where $p_X(k \mid \theta) = (1 - \theta)^{k-1}\theta$, $k = 1, 2, \ldots$ Assume that the prior distribution for $\theta$ is the beta pdf with parameters $r$ and $s$. Find the posterior distribution for $\theta$.
5.8.2. Find the squared-error loss ($L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$) Bayes estimate for $\theta$ in Example 5.8.2 and express it as a weighted average of the maximum likelihood estimate for $\theta$ and the mean of the prior pdf.
5.8.3. Suppose the binomial pdf described in Example 5.8.2 refers to the number of votes a candidate might receive in a poll conducted before the general election. Moreover, suppose a beta prior distribution has been assigned to $\theta$, and every indicator suggests the election will be close. The pollster, then, has good reason for concentrating the bulk of the prior distribution around the value $\theta = \frac{1}{2}$. Setting the two beta parameters $r$ and $s$ both equal to 135 will accomplish that objective (in the event $r = s = 135$, the probability of $\theta$ being between 0.45 and 0.55 is approximately 0.90).
(a) Find the corresponding posterior distribution.
(b) Find the squared-error loss Bayes estimate for $\theta$ and express it as a weighted average of the maximum likelihood estimate for $\theta$ and the mean of the prior pdf.
5.8.4. What is the squared-error loss Bayes estimate for the parameter $\theta$ in a binomial pdf, where $\theta$ has a uniform distribution, that is, a noninformative prior? (Recall that a uniform prior is a beta pdf for which $r = s = 1$.)
5.8.5. In Questions 5.8.2-5.8.4, is the Bayes estimate unbiased? Is it asymptotically unbiased?
5.8.6. Suppose that $Y$ is a gamma random variable with parameters $r$ and $\theta$ and the prior is also gamma with parameters $s$ and $\mu$. Show that the posterior pdf is gamma with parameters $r + s$ and $y + \mu$.
5.8.7. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a gamma pdf with parameters $r$ and $\theta$, where the prior distribution assigned to $\theta$ is the gamma pdf with parameters $s$ and $\mu$. Let $W = Y_1 + Y_2 + \cdots + Y_n$. Find the posterior pdf for $\theta$.
5.8.8. Find the squared-error loss Bayes estimate for $\theta$ in Question 5.8.7.
5.8.9. Consider, again, the scenario in which a binomial random variable $X$ has parameters $n$ and $\theta$, where the latter has a beta prior with integer parameters $r$ and $s$. Integrate the joint pdf $p_X(k \mid \theta)f_\Theta(\theta)$ with respect to $\theta$ to show that the marginal pdf of $X$ is given by
$k = 0, 1, \ldots, n$
TAKING A SECOND LOOK AT STATISTICS (REVISITING THE MARGIN OF ERROR)
The margin of error, $d$, was introduced in Definition 5.3.1 as being half the width of the largest possible 95% confidence interval for the binomial parameter $p$. As such, it serves as a useful measure of the sampling variation associated with the estimator $\hat{p} = X/n$, where the random variable $X$ is the number of successes observed in $n$ independent trials. That is, we would expect in the long run that at least 95% of the intervals $(p_e - d, p_e + d)$ would contain the true $p$, where $p_e$ is the observed proportion of successes.
Unfortunately, the meaning of the margin of error and how it should be interpreted have been distorted by the popular press. The mistake that is made is particularly prevalent and most egregious in the context of political polls. Here is what happens. A poll (based on a sample of $n$ voters) is conducted, showing, for example, that 52% of the respondents intend to support Candidate A and 48%, Candidate B. Moreover, the corresponding margin of error, based on the sample of size $n$, is (correctly) reported to be, say, 5%. What often comes next is a statement that the race is a "statistical tie" or a "statistical dead heat" because the difference between the two percentages, 52% - 48% = 4%, is within the 5% margin of error. Is that statement true? No. Is it even close to being true? No.
If the observed difference in the percentages supporting Candidate A and Candidate B is 4% and the margin of error is 5%, then the widest possible 95% confidence interval for $p$, the true difference between the two percentages ($p$ = Candidate A's true % - Candidate B's true %) would be
(4% - 5%, 4% + 5%) = (-1%, 9%)
The latter implies that we should not rule out the possibility that the true value for $p$ could be as small as -1% (in which case Candidate B would win a tight race) or as large as +9% (in which case Candidate A would win in a landslide). The error in the "statistical tie" terminology is the implication that all the possible values from -1% to +9% are equally likely. That is simply not true. For every confidence interval, parameter values near the center are much more plausible than those near either the left-hand or right-hand endpoints. Here, a 4% lead for Candidate A in a poll that has a 5% margin of error is not a "tie." Quite the contrary, it would more properly be interpreted as almost a guarantee that Candidate A will win.
Misinterpretations aside, there is yet a more fundamental problem in using the margin of error as a measure of the day-to-day or week-to-week variation in political polls. By definition, the margin of error refers to sampling variation; that is, it reflects the extent to which the estimator $\hat{p} = X/n$ varies if repeated samples of size $n$ are drawn from the same population. Consecutive political polls, though, do not represent the same population. Between one poll and the next, a variety of events can occur that can fundamentally change the opinions of the voting population: one candidate may give an especially good speech or make an embarrassing gaffe, a scandal can emerge that damages someone's reputation, or a world event comes to pass that for one reason or another reflects more negatively on one candidate than the other. Although all these possibilities have the potential to influence the value of $X/n$ much more than sampling variability can, none of them is included in the margin of error.
APPENDIX 5.A.1
MINITAB APPLICATIONS
Because of their ability to generate random observations from many of the standard probability distributions, computers can be very effective in illustrating estimation properties. Recall Table 5.4.1, showing a simulation consisting of forty samples of size two drawn from the pdf $f_Y(y; \theta) = \frac{1}{\theta}e^{-y/\theta}$, $y > 0$. Listed for each sample (in column C4) is the estimate $\theta_e$. As a demonstration of the property of unbiasedness, it was pointed out that the average of those forty $\theta_e$'s is 1.02, a number very close to 1.00, the theoretical value of $\theta$.
The meaning of confidence intervals can also be nicely illustrated using MINITAB's RANDOM command. Deriving formulas for confidence intervals is straightforward, but calling attention to their variability from sample to sample is best accomplished using Monte Carlo analysis. Example 5.3.1 is a case in point. The fifty simulated intervals displayed in Figure 5.3.1 reinforce the interpretation that should be given to any particular evaluation of the interval
\[
\left(\bar{y} - 1.96\,\frac{\sigma}{\sqrt{4}},\ \bar{y} + 1.96\,\frac{\sigma}{\sqrt{4}}\right)
\]
Sampling distributions of estimators and some of their important properties can also be examined using the computer. Recall the serial number analysis described in Case Study 5.4.1. If the production numbers to be estimated are large, then the assumption that the serial numbers represent a random sample from a discrete uniform pdf can be replaced by the assumption that the captured serial numbers represent a random sample from the (easier-to-work-with) continuous uniform pdf, defined over the interval $[0, \theta]$. Two unbiased estimators for $\theta$ are
\[
\hat{\theta}_1 = (2/n)\sum_{i=1}^{n} Y_i \quad \text{and} \quad \hat{\theta}_2 = ((n + 1)/n)\,Y_{\max}
\]
(recall Example 5.4.2). A follow-up analysis in Example 5.4.7 showed that $\hat{\theta}_2$ is the better of the two estimators because it has the smaller variance.
But suppose the complexity of two unbiased estimators precluded the calculation of their variances. How would we decide which to use? Probably the simplest solution would be to simulate each one's distribution and compare their sample standard deviations. Figures 5.A.1.1 and 5.A.1.2 illustrate that technique on the two estimators
\[
\hat{\theta}_1 = (2/n)\sum_{i=1}^{n} Y_i \quad \text{and} \quad \hat{\theta}_2 = ((n + 1)/n)\,Y_{\max}
\]
MTB > random 200 c1-c5;
SUBC > uniform 0 3400.
MTB > rmean c1-c5 c6
MTB > let c7 = 2*c6
MTB > histogram c7;
SUBC > start 2800;
SUBC > increment 200.

Histogram of C7   N = 200
48 Obs. below the first class

Midpoint  Count
    2800     12  ************
    3000     12  ************
    3200     19  *******************
    3400     13  *************
    3600     22  **********************
    3800     17  *****************
    4000     11  ***********
    4200     14  **************
    4400      8  ********
    4600     10  **********
    4800      3  ***
    5000      6  ******
    5200      3  ***
    5400      2  **

MTB > describe c7

          N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
C7      200   3383.8   3418.3   3388.6    913.2     64.6

        MIN      MAX       Q1       Q3
C7    997.0   5462.9   2718.0   4002.1

FIGURE 5.A.1.1
MTB > random 200 c1-c5;
SUBC > uniform 0 3400.
MTB > rmaximum c1-c5 c6
MTB > let c7 = (6/5)*c6
MTB > histogram c7;
SUBC > start 2800;
SUBC > increment 200.

Histogram of C7   N = 200
32 Obs. below the first class

Midpoint  Count
    2800      8  ********
    3000     10  **********
    3200     17  *****************
    3400     22  **********************
    3600     36  ************************************
    3800     37  *************************************
    4000     38  **************************************

MTB > describe c7

          N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
C7      200   3398.4   3604.6   3437.1    563.9     39.9

        MIN      MAX       Q1       Q3
C7   1513.9    4On.4   3093.2   3841.9

FIGURE 5.A.1.2
for the uniform parameter $\theta$. Suppose that $n = 5$ serial numbers have been "captured" and the true value for $\theta$ is 3400. Figure 5.A.1.1 shows the MINITAB syntax for generating two hundred samples of size five from $f_Y(y; \theta) = 1/3400$, $0 \leq y \leq 3400$ and calculating $\hat{\theta}_1$. The DESCRIBE command shows that the average of the $\theta_e$'s is 3383.8 and the sample standard deviation of the two hundred estimates is 913.2.
In contrast, Figure 5.A.1.2 details a similar simulation (two hundred samples, each of size five) for the estimator $\hat{\theta}_2$. The accompanying DESCRIBE output lends support to the claim that $\hat{\theta}_2$ is the better estimator: it shows the average $\theta_e$ to be closer to the true value of 3400 than the average $\theta_e$ calculated from $\hat{\theta}_1$ (3398.4 versus 3383.8), and its sample standard deviation is smaller than the sample standard deviation of the $\theta_e$'s from $\hat{\theta}_1$ (563.9 versus 913.2).
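For readers without MINITAB, the same comparison can be sketched in Python using only the standard library. This is an illustration of the technique, not a reproduction of the MINITAB runs: the function name, the seed, and the random number generator are assumptions, so the printed figures will differ from those in Figures 5.A.1.1 and 5.A.1.2.

```python
import random
import statistics


def simulate_estimators(theta=3400.0, n=5, reps=200, seed=1):
    """Draw `reps` samples of size n from uniform(0, theta) and return
    the resulting values of theta_hat_1 and theta_hat_2."""
    rng = random.Random(seed)
    est1, est2 = [], []
    for _ in range(reps):
        sample = [rng.uniform(0.0, theta) for _ in range(n)]
        est1.append(2.0 * statistics.fmean(sample))   # (2/n) * sum of y_i
        est2.append((n + 1) / n * max(sample))        # ((n+1)/n) * y_max
    return est1, est2


est1, est2 = simulate_estimators()
for name, est in (("theta_hat_1", est1), ("theta_hat_2", est2)):
    print(name, round(statistics.fmean(est), 1), round(statistics.stdev(est), 1))
```

Both averages should land near 3400, and the sample standard deviation for the second estimator should be noticeably smaller, which is the pattern the DESCRIBE output shows.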
CHAPTER 6
Hypothesis Testing
6.1 INTRODUCTION
6.2 THE DECISION RULE
6.3 TESTING BINOMIAL DATA-$H_0$: $p = p_0$
6.4 TYPE I AND TYPE II ERRORS
6.5 A NOTION OF OPTIMALITY: THE GENERALIZED LIKELIHOOD RATIO
6.6 TAKING A SECOND LOOK AT STATISTICS (STATISTICAL SIGNIFICANCE VERSUS "PRACTICAL" SIGNIFICANCE)
Pierre-Simon, Marquis de Laplace
As a young man, Laplace went to Paris to seek his fortune as a mathematician, disregarding his father's wishes that he enter the clergy. He soon became a protégé of d'Alembert and at the age of twenty-four was elected to the Academy of Sciences. Laplace was recognized as one of the leading figures of that group for his work in physics, celestial mechanics, and pure mathematics. He also enjoyed some political prestige, and his friend, Napoleon Bonaparte, made him Minister of the Interior for a brief period. With the restoration of the Bourbon monarchy, Laplace renounced Napoleon for Louis XVIII, who made him a marquis.
-Pierre-Simon, Marquis de Laplace (1749-1827)
6.1 INTRODUCTION
Inferences, as we saw in Chapter 5, often reduce to numerical estimates of parameters, either in the form of single points or as confidence intervals. But not always. In many experimental situations, the conclusion to be drawn is not numerical and is more aptly phrased as a choice between two conflicting theories, or hypotheses. A court psychiatrist, for example, may be called upon to pronounce an accused murderer either "sane" or "insane"; the FDA must decide whether a new vaccine is "effective" or "ineffective"; a geneticist concludes that the inheritance of eye color in a certain strain of Drosophila melanogaster either "does" or "does not" follow classical Mendelian principles. In this chapter we examine the statistical methodology and the attendant consequences involved in making decisions of this sort.
The process of dichotomizing the possible conclusions of an experiment and then using the theory of probability to choose one option over the other is known as hypothesis testing. The two competing propositions are called the null hypothesis (written $H_0$) and the alternative hypothesis (written $H_1$). How we go about choosing between $H_0$ and $H_1$ is conceptually similar to the way a jury deliberates in a court trial. The null hypothesis is analogous to the defendant: Just as the latter is presumed innocent until "proven" guilty, so is the null hypothesis "accepted" unless the data argue overwhelmingly to the contrary. Mathematically, choosing between $H_0$ and $H_1$ is an exercise in applying courtroom protocol to situations where the "evidence" consists of measurements made on random variables.
Chapter 6 focuses on basic principles, in particular, on the probabilistic structure that underlies the decision-making process. Most of the important specific applications of hypothesis testing will be taken up later, beginning in Chapter 7.
6.2 THE DECISION RULE
We will introduce the basic concepts of hypothesis testing with an example. Imagine an automobile company looking for additives that might increase gas mileage. As a pilot study, they send thirty cars fueled with a new additive on a road trip from Boston to Los Angeles. Without the additive, those same cars are known to average 25.0 mpg with a standard deviation ($\sigma$) of 2.4 mpg.
Suppose it turns out that the thirty cars averaged $\bar{y} = 26.3$ mpg with the additive. What should the company conclude? If the additive is effective but the position is taken that the increase from 25.0 to 26.3 is due solely to chance, the company will have mistakenly passed up a potentially lucrative product. On the other hand, if the additive is not effective but the firm interprets the mileage increase as "proof" that the additive works, time and money will ultimately be wasted developing a product that has no intrinsic value.
In practice, researchers would assess the increase from 25.0 mpg to 26.3 mpg by framing the company's choices in the context of the courtroom analogy mentioned in Section 6.1. Here, the null hypothesis, which is typically a statement reflecting the status quo, would be the assertion that the additive has no effect; the alternative hypothesis would claim that the additive does work. By agreement, we give $H_0$ (like the defendant) the benefit of the doubt. If the road trip average, then, is "close" to 25.0 in some probabilistic sense still to be determined, we must conclude that the new additive has not demonstrated its superiority. The problem is, whether 26.3 mpg qualifies as being "close" to 25.0 mpg is not immediately obvious.
At this point, rephrasing the question in random variable terminology will prove helpful. Let $y_1, y_2, \ldots, y_{30}$ denote the mileages recorded by each of the cars during the cross-country test run. We will assume that the $y_i$'s are normally distributed with an unknown mean $\mu$. Furthermore, suppose that prior experience with road tests of this type suggests that $\sigma$ will equal 2.4.¹ That is,
\[
f_Y(y; \mu) = \frac{1}{\sqrt{2\pi}\,(2.4)}\,e^{-\frac{1}{2}\left(\frac{y - \mu}{2.4}\right)^2}, \quad -\infty < y < \infty
\]
The two competing hypotheses, then, can be expressed as statements about $\mu$. In effect, we are testing
$H_0$: $\mu = 25.0$ (additive is not effective)
versus
$H_1$: $\mu > 25.0$ (additive is effective)
Values of the sample mean, $\bar{y}$, less than or equal to 25.0 are certainly not grounds for rejecting the null hypothesis; averages a bit larger than 25.0 would also lead to that conclusion (because of the commitment to give $H_0$ the benefit of the doubt). On the other hand, we would probably view a cross-country average of, say, 35.0 mpg as exceptionally strong evidence against the null hypothesis, and our decision would be "reject $H_0$." It follows that somewhere between 25.0 and 35.0 there is a point, call it $\bar{y}^*$, where for all practical purposes the credibility of $H_0$ ends (see Figure 6.2.1).
FIGURE 6.2.1 (number line of possible sample means: values of $\bar{y}$ near 25.0 are not markedly inconsistent with the $H_0$ assertion that $\mu = 25$; values of $\bar{y}$ beyond $\bar{y}^*$ would appear to refute $H_0$)
¹In practice, the value of $\sigma$ usually needs to be estimated; we will return to that more frequently encountered scenario in Chapter 7.
FIGURE 6.2.2 (distribution of $\bar{Y}$ when $H_0$: $\mu = 25.0$ is true; the area to the right of $\bar{y}^* = 25.25$, the "reject $H_0$" region, is $P(\bar{Y} \geq \bar{y}^* \mid H_0 \text{ is true}) = 0.2843$)
Finding an appropriate numerical value for $\bar{y}^*$ is accomplished by combining the courtroom analogy with what we know about the probabilistic behavior of $\bar{Y}$. Suppose, for the sake of argument, we set $\bar{y}^*$ equal to 25.25; that is, we would reject $H_0$ if $\bar{y} \geq 25.25$. Is that a good decision rule? No. If 25.25 defined "close," then $H_0$ would be rejected 28% of the time even if $H_0$ were true:
\[
\begin{aligned}
P(\text{we reject } H_0 \mid H_0 \text{ is true}) &= P(\bar{Y} \geq 25.25 \mid \mu = 25.0) \\
&= P\left(\frac{\bar{Y} - 25.0}{2.4/\sqrt{30}} \geq \frac{25.25 - 25.0}{2.4/\sqrt{30}}\right) \\
&= P(Z \geq 0.57) \\
&= 0.2843
\end{aligned}
\]
(see Figure 6.2.2). Common sense, though, tells us that 28% is an inappropriately large probability for making this kind of incorrect inference. No jury, for example, would convict a defendant knowing it had a 28% chance of sending an innocent person to jail.
Clearly, we need to make $\bar{y}^*$ larger. Would it be reasonable to set $\bar{y}^*$ equal to, say, 26.50? Probably not, because setting $\bar{y}^*$ that large would err in the other direction by giving the null hypothesis too much benefit of the doubt. If $\bar{y}^* = 26.50$, the probability of rejecting $H_0$ if $H_0$ were true is only 0.0003:
\[
\begin{aligned}
P(\text{we reject } H_0 \mid H_0 \text{ is true}) &= P(\bar{Y} \geq 26.50 \mid \mu = 25.0) \\
&= P\left(\frac{\bar{Y} - 25.0}{2.4/\sqrt{30}} \geq \frac{26.50 - 25.0}{2.4/\sqrt{30}}\right) \\
&= P(Z \geq 3.42) \\
&= 0.0003
\end{aligned}
\]
(see Figure 6.2.3). Requiring that much evidence before rejecting $H_0$ would be analogous to a jury not returning a guilty verdict unless the prosecutor could produce a roomful of eyewitnesses, an obvious motive, a signed confession, and a dead body in the trunk of the defendant's car!
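The two tail probabilities can be recomputed directly rather than read from a normal table. The sketch below uses Python's standard-library `NormalDist`; the function name is illustrative, and because the z ratio is not rounded to two decimals first, the results agree with the text's 0.2843 and 0.0003 only up to rounding.

```python
from math import sqrt
from statistics import NormalDist

MU_0, SIGMA, N = 25.0, 2.4, 30   # values from the gas mileage example


def prob_reject_when_h0_true(cutoff):
    """P(Ybar >= cutoff | mu = MU_0) for a normal sample mean."""
    std_error = SIGMA / sqrt(N)
    z = (cutoff - MU_0) / std_error
    return 1.0 - NormalDist().cdf(z)


for cutoff in (25.25, 26.50):
    print(cutoff, round(prob_reject_when_h0_true(cutoff), 4))
```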
If a probability of 0.28 represents too little benefit of the doubt being given to $H_0$ and 0.0003 represents too much, what value should we choose for $P(\text{we reject } H_0 \mid H_0 \text{ is true})$? While there is no way to answer that question definitively or mathematically, researchers who use hypothesis testing have come to a consensus that the probability of rejecting $H_0$ when $H_0$ is true should be somewhere in the neighborhood of 0.05. Experience seems to suggest that when a 0.05 probability is used, null hypotheses are neither dismissed too capriciously nor embraced too wholeheartedly. (More will be said about this particular probability, and its consequences, in Section 6.3.)
FIGURE 6.2.3 (distribution of $\bar{Y}$ when $H_0$ is true, with the "reject $H_0$" region to the right of $\bar{y}^* = 26.50$)
Comment. In 1768, British troops were sent to Boston to quell an outbreak of civil disturbances. Citizens were killed in the ensuing violence, and soldiers were subsequently put on trial for manslaughter. Explaining how a verdict was to be reached, the judge told the jury, "If upon the whole, ye are in any reasonable doubt of their guilt, ye must then, agreeable to the rule of law, declare them innocent." Ever since, the expression "beyond all reasonable doubt" has been a frequently used indicator of how much evidence is needed in a jury trial to overturn a defendant's presumption of innocence. For many experimenters, choosing $\bar{y}^*$ such that
\[
P(\text{we reject } H_0 \mid H_0 \text{ is true}) = 0.05
\]
is comparable to a jury convicting a defendant only if the latter's guilt is established "beyond all reasonable doubt."
Suppose the 0.05 "criterion" is applied here. Finding the corresponding $\bar{y}^*$ is a calculation similar to one done in an earlier example. Given that
\[
P(\bar{Y} \geq \bar{y}^* \mid H_0 \text{ is true}) = 0.05
\]
it follows that
\[
P\left(\frac{\bar{Y} - 25.0}{2.4/\sqrt{30}} \geq \frac{\bar{y}^* - 25.0}{2.4/\sqrt{30}}\right) = P\left(Z \geq \frac{\bar{y}^* - 25.0}{2.4/\sqrt{30}}\right) = 0.05
\]
But we know from Appendix A.1 that $P(Z \geq 1.64) = 0.05$. Therefore,
\[
\frac{\bar{y}^* - 25.0}{2.4/\sqrt{30}} = 1.64 \qquad (6.2.1)
\]
which implies that $\bar{y}^* = 25.718$.
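Solving Equation 6.2.1 for $\bar{y}^*$ is a one-line computation. The sketch below does it twice: once with the table value 1.64 used in the text, and once with the unrounded 95th percentile of the standard normal from `NormalDist.inv_cdf`. The function name and its optional `z_alpha` argument are illustrative choices, not notation from the text.

```python
from math import sqrt
from statistics import NormalDist


def critical_sample_mean(mu_0, sigma, n, alpha, z_alpha=None):
    """Cutoff y* such that P(Ybar >= y* | mu = mu_0) = alpha.

    If z_alpha is omitted, the exact upper-alpha percentile of the
    standard normal is used in place of a rounded table value.
    """
    if z_alpha is None:
        z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return mu_0 + z_alpha * sigma / sqrt(n)


table_based = critical_sample_mean(25.0, 2.4, 30, 0.05, z_alpha=1.64)
exact = critical_sample_mean(25.0, 2.4, 30, 0.05)
print(table_based, exact)
```

The table-based cutoff matches the text's 25.718 to within rounding; the unrounded percentile gives a cutoff a few thousandths larger, a difference that has no effect on the decision for $\bar{y} = 26.3$.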
The company's statistical strategy is now completely determined: They should reject the null hypothesis that the additive has no effect if $\bar{y} \geq 25.718$. Since the sample mean was 26.3, the appropriate decision is, indeed, to reject $H_0$. It appears that the additive does increase mileage.
Comment. It must be remembered that rejecting $H_0$ does not prove that $H_0$ is false, any more than a jury's decision to convict guarantees that the defendant is guilty. The 0.05 decision rule is simply saying that if the true mean ($\mu$) is 25.0, sample means ($\bar{y}$) as large or larger than 25.718 are expected to occur only 5% of the time. Because of that small probability, a reasonable conclusion when $\bar{y} \geq 25.718$ is that $\mu$ is not 25.0.
Table 6.2.1 is a computer simulation of this particular 0.05 decision rule. A total of seventy-five random samples, each of size thirty, have been drawn from a normal distribution having $\mu = 25.0$ and $\sigma = 2.4$. The corresponding $\bar{y}$ for each sample is then compared with $\bar{y}^* = 25.718$. As the entries in the table indicate, five of the samples lead to the erroneous conclusion that $H_0$: $\mu = 25.0$ should be rejected.
TABLE 6.2.1

  ȳ      ≥ 25.718?     ȳ      ≥ 25.718?     ȳ      ≥ 25.718?
24.602      no       24.587      no       24.945      no
24.761      no       24.177      no       25.306      no
25.601      no       24.547      no       24.235      no
25.809      yes      25.307      no       25.011      no
24.783      no       25.196      no       24.577      no
24.762      no       25.805      yes      24.380      no
25.224      no       24.371      no       25.033      no
25.866      yes      25.623      no       24.550      no
24.919      no       24.770      no       25.080      no
25.307      no       24.004      no       24.772      no
24.843      no       25.771      yes      24.233      no
24.853      no       25.018      no       25.176      no
24.750      no       25.578      no       24.807      no
24.298      no       24.807      no       24.346      no
25.261      no       25.391      no       25.200      no
25.653      no       25.198      no       24.758      no
24.842      no       24.793      no       24.874      no
25.513      no       24.862      no       25.034      no
25.150      no       24.639      no       25.045      no
24.803      no       24.780      no       25.691      no
24.207      no       24.743      no       24.618      no
25.401      no       24.958      no       25.678      no
24.795      no
Since each sample mean has a 0.05 probability of exceeding 25.718 (when $\mu = 25.0$), we would expect that 75(0.05), or 3.75, of the data sets to result in a "reject $H_0$" conclusion. Reassuringly, the observed number of incorrect inferences (= 5) is quite close to that expected value.
Comment. If $H_0$: $\mu = \mu_0$ is rejected using a 0.05 decision rule, we say that the difference between $\bar{y}$ and $\mu_0$ is statistically significant.
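A simulation in the spirit of Table 6.2.1 can be sketched in Python. This is not the program that produced the table: the function name, the seed, and the generator are assumptions, so the count of rejections will generally differ from the five reported above while still tending to fall near the expected 3.75.

```python
import random
from statistics import fmean


def count_rejections(num_samples=75, n=30, mu=25.0, sigma=2.4,
                     cutoff=25.718, seed=0):
    """Count how many simulated sample means meet or exceed the cutoff
    when H0 (mu = 25.0) is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if fmean(sample) >= cutoff:
            rejections += 1
    return rejections


print(count_rejections(), "rejections out of 75; expected about", 75 * 0.05)
```

Raising `num_samples` to several thousand shows the long-run rejection rate settling close to 0.05, which is what the level of the decision rule promises.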
Expressing Decision Rules in Terms of Z Ratios
As we have seen, decision rules are statements that spell out the conditions under which a null hypothesis is to be rejected. The format of those statements, though, can vary. Depending on the context, one version may be easier to work with than another.
Recall Equation 6.2.1. Rejecting $H_0$: $\mu = 25.0$ when
\[
\bar{y} \geq \bar{y}^* = 25.0 + 1.64 \cdot \frac{2.4}{\sqrt{30}} = 25.718
\]
is clearly equivalent to rejecting $H_0$ when
\[
\frac{\bar{y} - 25.0}{2.4/\sqrt{30}} \geq 1.64 \qquad (6.2.2)
\]
(if one rejects the null hypothesis, the other will necessarily do the same).
We know from Chapter 4 that the random variable $\frac{\bar{Y} - 25.0}{2.4/\sqrt{30}}$ has a standard normal distribution (if $\mu = 25.0$). When a particular $\bar{y}$ is substituted for $\bar{Y}$ (as in Inequality 6.2.2), we call $\frac{\bar{y} - 25.0}{2.4/\sqrt{30}}$ the observed $z$. Choosing between $H_0$ and $H_1$ is typically (and most conveniently) done in terms of the observed $z$. In Section 6.4, though, we will encounter certain questions related to hypothesis testing that are best answered by phrasing the decision rule in terms of $\bar{y}^*$.
Definition 6.2.1. Any function of the observed data whose numerical value dictates whether $H_0$ is accepted or rejected is called a test statistic. The set of values for the test statistic that result in the null hypothesis being rejected is called the critical region and is denoted $C$. The particular point in $C$ that separates the rejection region from the acceptance region is called the critical value.
Comment. For the gas mileage example, both $\bar{y}$ and $\frac{\bar{y} - 25.0}{2.4/\sqrt{30}}$ qualify as test statistics. If the sample mean is used, the associated critical region would be written
\[
C = \{\bar{y}: \bar{y} \geq 25.718\}
\]
(and 25.718 is the critical value). If the decision rule is framed in terms of a $z$ ratio,
\[
C = \left\{z: z = \frac{\bar{y} - 25.0}{2.4/\sqrt{30}} \geq 1.64\right\}
\]
In this latter case, the critical value is 1.64.
Definition 6.2.2. The probability that the test statistic lies in the critical region when $H_0$ is true is called the level of significance and is denoted $\alpha$.
Comment. In principle, the value chosen for $\alpha$ should reflect the consequences of making the mistake of rejecting $H_0$ when $H_0$ is true. As those consequences get more severe, the critical region $C$ should be defined so that $\alpha$ gets smaller. In practice, though, efforts to quantify the costs of making incorrect inferences are arbitrary at best. In most situations, experimenters abandon any such attempts and routinely set the level of significance equal to 0.05. If another $\alpha$ is used, it is likely to be either 0.001, 0.01, or 0.10.
Here again, the similarity between hypothesis testing and courtroom protocol is worth keeping in mind. Just as experimenters can make $\alpha$ larger or smaller to reflect the consequences of mistakenly rejecting $H_0$ when $H_0$ is true, so can juries demand more or less evidence to return a conviction. For juries, any such adjustments are usually dictated by the severity of the possible punishment. A grand jury deciding whether or not to indict someone for fraud, for example, will inevitably require less evidence to return a conviction than will a jury impaneled for a murder trial.
One-Sided Versus Two-Sided Alternatives
In most hypothesis tests, $H_0$ consists of a single number, typically the value of the parameter that represents the status quo. The "25.0" in $H_0$: $\mu = 25.0$, for example, is the mileage that would be expected when the additive has no effect. If the mean of a normal distribution is the parameter being tested, our general notation for the null hypothesis will be $H_0$: $\mu = \mu_0$, where $\mu_0$ is the status quo value of $\mu$.
Alternative hypotheses, by way of contrast, invariably embrace entire ranges of parameter values. If there is reason to believe before any data are collected that the parameter being tested is necessarily restricted to one particular "side" of $H_0$, then $H_1$ is defined to reflect that limitation and we say that the alternative hypothesis is one-sided. Two variations are possible: $H_1$ can be one-sided to the left ($H_1$: $\mu < \mu_0$) or it can be one-sided to the right ($H_1$: $\mu > \mu_0$). If no such a priori information is available, the alternative hypothesis needs to accommodate the possibility that the true parameter value might lie on either side of $\mu_0$. Any such alternative is said to be two-sided. For testing $H_0$: $\mu = \mu_0$, the two-sided alternative is written $H_1$: $\mu \neq \mu_0$.
In the gasoline example, it was tacitly assumed that the additive would either have no effect (in which case $\mu = 25.0$ and $H_0$ would be true) or it would increase mileage (implying that the true mean would lie somewhere "to the right" of $H_0$). Accordingly, we wrote the alternative hypothesis as $H_1$: $\mu > 25.0$. If we had reason to suspect, though, that the additive might interfere with the gasoline's combustibility and possibly decrease mileage, it would have been necessary to use a two-sided alternative ($H_1$: $\mu \neq 25.0$).
Whether the alternative hypothesis is defined to be one-sided or two-sided is important because the nature of $H_1$ plays a key role in determining the form of the critical region. We saw earlier that the 0.05 decision rule for testing
$H_0$: $\mu = 25.0$
versus
$H_1$: $\mu > 25.0$
calls for $H_0$ to be rejected if $\frac{\bar{y} - 25.0}{2.4/\sqrt{30}} \geq 1.64$. That is, only if the sample mean is substantially larger than 25.0 will we reject $H_0$.
If the alternative hypothesis had been two-sided, sample means either much smaller than 25.0 or much larger than 25.0 would be evidence against $H_0$ (and in support of $H_1$). Moreover, the 0.05 probability associated with the critical region $C$ would be split into two halves, with 0.025 being assigned to the left-most portion of $C$, and 0.025 to the right-most portion. From Appendix Table A.1, though, $P(Z \leq -1.96) = P(Z \geq 1.96) = 0.025$, so the two-sided 0.05 decision rule would call for $H_0$: $\mu = 25.0$ to be rejected if $\frac{\bar{y} - 25.0}{2.4/\sqrt{30}}$ is either (1) $\leq -1.96$ or (2) $\geq 1.96$.
Testing $H_0$: $\mu = \mu_0$ ($\sigma$ known)
Let $z_\alpha$ be the number having the property that $P(Z \geq z_\alpha) = \alpha$. Values for $z_\alpha$ can be found from the standard normal cdf tabulated in Appendix A.1. If $\alpha = 0.05$, for example, $z_{.05} = 1.64$ (see Figure 6.2.4). Of course, by the symmetry of the normal curve, $-z_\alpha$ has the property that $P(Z \leq -z_\alpha) = \alpha$.
FIGURE 6.2.4 (standard normal pdf $f_Z(z)$; area = 0.05 to the right of $z_{.05} = 1.64$)
Theorem 6.2.1. Let $y_1, y_2, \ldots, y_n$ be a random sample of size $n$ from a normal distribution where $\sigma$ is known. Let $z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$.
a. To test $H_0$: $\mu = \mu_0$ versus $H_1$: $\mu > \mu_0$ at the $\alpha$ level of significance, reject $H_0$ if $z \geq z_\alpha$.
b. To test $H_0$: $\mu = \mu_0$ versus $H_1$: $\mu < \mu_0$ at the $\alpha$ level of significance, reject $H_0$ if $z \leq -z_\alpha$.
c. To test $H_0$: $\mu = \mu_0$ versus $H_1$: $\mu \neq \mu_0$ at the $\alpha$ level of significance, reject $H_0$ if $z$ is either (1) $\leq -z_{\alpha/2}$ or (2) $\geq z_{\alpha/2}$.
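The three decision rules of Theorem 6.2.1 can be collected into one small function. What follows is a sketch under stated assumptions: the name `z_test` and its `alternative` keyword are illustrative, and the critical values come from `NormalDist.inv_cdf` rather than a rounded table, so $z_{.05}$ is about 1.645 instead of 1.64.

```python
from math import sqrt
from statistics import NormalDist


def z_test(y_bar, mu_0, sigma, n, alpha=0.05, alternative="greater"):
    """Return (z, reject) for testing H0: mu = mu_0 with sigma known.

    alternative: "greater" (H1: mu > mu_0), "less" (H1: mu < mu_0),
                 or "two-sided" (H1: mu != mu_0).
    """
    z = (y_bar - mu_0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":
        reject = z >= std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":
        reject = z <= -std_normal.inv_cdf(1 - alpha)
    elif alternative == "two-sided":
        reject = abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, reject


# Gas mileage data from earlier in this section: y_bar = 26.3, n = 30.
z, reject = z_test(26.3, 25.0, 2.4, 30, alpha=0.05, alternative="greater")
print(round(z, 2), reject)
```

Applied to the gas mileage data, the function reproduces the earlier conclusion that $H_0$: $\mu = 25.0$ is rejected in favor of $H_1$: $\mu > 25.0$.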
EXAMPLE 6.2.1
As part of a "Math for the Twenty-First Century" initiative, Bayview High was chosen to participate in the evaluation of a new algebra and geometry curriculum. In the recent past, Bayview's students would be considered "typical," having earned scores on standardized exams that were very consistent with national averages.
Two years ago, a cohort of eighty-six Bayview sophomores, all randomly selected, were assigned to a special set of classes that integrated algebra and geometry. According to test results that have just been released, those students averaged 502 on the SAT-I math exam; nationwide, seniors averaged 494, with a known standard deviation $\sigma$. Can it be claimed at the $\alpha = 0.05$ level of significance that the new curriculum had an effect?
To begin, we define the parameter $\mu$ to be the true average SAT-I math score that we could expect the new curriculum to produce. The obvious "status quo" value for $\mu$ is the current national average, that is, $\mu_0 = 494$. The alternative hypothesis here should be two-sided because the possibility certainly exists that a revised curriculum, however well intentioned, would actually lower a student's achievement.
According to Part (c) of Theorem 6.2.1, then, we should reject $H_0$: $\mu = 494$ in favor of $H_1$: $\mu \neq 494$ at the $\alpha = 0.05$ level of significance if the test statistic $z$ is either (1) $\leq -z_{.025}$ (= -1.96) or (2) $\geq z_{.025}$ (= 1.96). But $\bar{y} = 502$, so
\[
z = \frac{502 - 494}{\sigma/\sqrt{86}} = 0.60
\]
implying that our decision should be "Fail to reject $H_0$." Even though Bayview's 502 is eight points above the national average, it does not follow that the improvement was due to the new curriculum: An increase of that magnitude could easily have occurred by chance, even if the new curriculum had no effect whatsoever (see Figure 6.2.5).
FIGURE 6.2.5 (standard normal pdf $f_Z(z)$; area = 0.025 in each tail of the two-sided critical region)
Comment. If the null hypothesis is not rejected, we should phrase the conclusion as "Fail to reject $H_0$" rather than "Accept $H_0$." Those two statements may seem to be the same, but, in fact, they have very different connotations. The phrase "Accept $H_0$" suggests that the experimenter is concluding that $H_0$ is true. But that may not be the case. In a court trial, when a jury returns a verdict of "Not guilty," they are not saying that they necessarily believe that the defendant is innocent. They are simply asserting that the evidence, in their opinion, is not sufficient to overturn the presumption that the defendant is innocent. That same distinction applies to hypothesis testing. If a test statistic does not fall into the critical region (which was the case in Example 6.2.1), the proper interpretation is to conclude that we "Fail to reject $H_0$."
The P-Value
There are two general ways to quantify the amount of evidence against $H_0$ that is contained in a given set of data. The first involves the level of significance concept introduced in Definition 6.2.2. Using that format, the experimenter selects a value for $\alpha$ (usually 0.05 or 0.01) before any data are collected. Once $\alpha$ is specified, a corresponding critical region can be identified. If the test statistic falls in the critical region, we reject $H_0$ at the $\alpha$ level of significance. Another strategy is to calculate a P-value.
Definition 6.2.3. The P-value associated with an observed test statistic is the probability of getting a value for that test statistic as extreme or more extreme than what was actually observed (relative to $H_1$) given that $H_0$ is true.
Comment. Test statistics that yield small P-values should be interpreted as evidence against $H_0$. More specifically, if the P-value calculated for a test statistic is less than or equal to $\alpha$, the null hypothesis can be rejected at the $\alpha$ level of significance. Or, put another way, the P-value is the smallest $\alpha$ at which we can reject $H_0$.
EXAMPLE 6.2.2
Recall Example 6.2.1. Given that $H_0$: $\mu = 494$ is being tested against $H_1$: $\mu \neq 494$, what P-value is associated with the calculated test statistic, $z = 0.60$, and how should it be interpreted?
If $H_0$: $\mu = 494$ is true, the random variable $Z = \frac{\bar{Y} - 494}{\sigma/\sqrt{86}}$ has a standard normal pdf. Relative to the two-sided $H_1$, any value of $Z$ greater than or equal to 0.60 or less than or equal to -0.60 qualifies as being "as extreme or more extreme" than the observed $z$. Therefore, by Definition 6.2.3,
\[
\begin{aligned}
\text{P-value} &= P(Z \geq 0.60) + P(Z \leq -0.60) \\
&= 0.2743 + 0.2743 \\
&= 0.5486
\end{aligned}
\]
(see Figure 6.2.6).
FIGURE 6.2.6 (standard normal pdf; P-value = area in both tails = 0.2743 + 0.2743 = 0.5486)
As noted in the preceding comment, P-values can be used as decision rules. In Example 6.2.1, 0.05 was the stated level of significance. Having determined here that the P-value associated with $z = 0.60$ is 0.5486, we know that $H_0$: $\mu = 494$ would not be rejected at the given $\alpha$. Indeed, the null hypothesis would not be rejected for any value of $\alpha$ smaller than 0.5486.
Notice that the P-value would have been halved had $H_1$ been one-sided. Suppose we were confident that the new algebra and geometry classes would not lower a student's math SAT. The appropriate hypothesis test in that case would be $H_0$: $\mu = 494$ versus $H_1$: $\mu > 494$. Moreover, only values in the right-hand tail of $f_Z(z)$ would be considered more extreme than the observed $z = 0.60$, so
\[
\text{P-value} = P(Z \geq 0.60) = 0.2743
\]
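Both P-values in Example 6.2.2 can be recomputed from the standard normal cdf. This is a minimal sketch; the function name `p_value` and its `alternative` keyword are illustrative, and the unrounded results may differ from the table-based 0.5486 in the last decimal place.

```python
from statistics import NormalDist


def p_value(z, alternative="two-sided"):
    """P-value for an observed z ratio when H0 is true.

    alternative: "greater", "less", or "two-sided", matching H1.
    """
    std_normal = NormalDist()
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


two_sided = p_value(0.60)
one_sided = p_value(0.60, alternative="greater")
print(two_sided, one_sided)
```

The one-sided value is exactly half the two-sided value, which is the halving noted at the end of the example.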
QUESTIONS
6.2.1. State the decision rule that would be used to test the following hypotheses. Evaluate the appropriate test statistic and state your conclusion.
(a) $H_0\colon \mu = 120$ versus $H_1\colon \mu < 120$; $\bar{y} =$ [illegible], $n = 25$, $\sigma =$ [illegible], $\alpha = 0.08$
(b) $H_0\colon \mu = 42.9$ versus $H_1\colon \mu \ne 42.9$; $\bar{y} = 45.1$, $n = 16$, $\sigma = 3.2$, $\alpha = 0.01$
(c) $H_0\colon \mu = 14.2$ versus $H_1\colon \mu > 14.2$; $\bar{y} = 15.8$, $n = 9$, $\sigma = 4.1$, $\alpha = 0.13$
6.2.2. An herbalist is experimenting with juices extracted from berries and roots that may have the ability to affect the Stanford-Binet IQ scores of students afflicted with mild cases of attention deficit disorder (ADD). A random sample of 22 children diagnosed with the condition have been drinking Brain-Blaster daily for two months. Past experience suggests that children with ADD score an average of 95 on the IQ test with a standard deviation of 15. If the data are to be analyzed using the $\alpha = 0.06$ level of significance, what values of $\bar{y}$ would cause $H_0$ to be rejected? Assume that $H_1$ is two-sided.
6.2.3. (a) Suppose $H_0\colon \mu = \mu_0$ is rejected in favor of $H_1\colon \mu \ne \mu_0$ at the $\alpha = 0.05$ level of significance. Would $H_0$ necessarily be rejected at the $\alpha = 0.01$ level of significance?
(b) Suppose $H_0\colon \mu = \mu_0$ is rejected in favor of $H_1\colon \mu \ne \mu_0$ at the $\alpha = 0.01$ level of significance. Would $H_0$ necessarily be rejected at the $\alpha = 0.05$ level of significance?
6.2.4. Company records show that drivers get an average of 32,500 miles on a set of Road All-Weather radial tires. Hoping to improve that figure, the company has added a new polymer to the rubber that should help protect the tires from deterioration caused by extreme temperatures. Fifteen drivers who tested the new tires have reported getting an average of 33,800 miles. Can the company claim that the polymer has produced a statistically significant increase in tire mileage? Test $H_0\colon \mu = 32{,}500$ against a one-sided alternative at the $\alpha = 0.05$ level. Assume that the standard deviation ($\sigma$) of the tire mileages has not been affected by the addition of the polymer and is still 4000 miles.
6.2.5. If $H_0\colon \mu = \mu_0$ is rejected in favor of $H_1\colon \mu > \mu_0$, will it necessarily be rejected in favor of $H_1\colon \mu \ne \mu_0$? Assume that $\alpha$ remains the same.
6.2.6. A random sample of size 16 is drawn from a normal distribution having $\sigma = 6.0$ for the purpose of testing $H_0\colon \mu = 30$ versus $H_1\colon \mu \ne 30$. The experimenter chooses to define the critical region $C$ to be the set of sample means lying in the interval (29.9, 30.1). What level of significance does the test have? Why is (29.9, 30.1) a poor choice for the critical region? What range of $\bar{y}$ values should comprise $C$, assuming the same $\alpha$ is to be used?
6.2.7. Recall the breath analyzers described in Example 4.3.6. The following are 30 blood alcohol determinations made by Analyzer GTE-10, a three-year-old unit that may be in need of recalibration. All 30 measurements were made using a test sample on which a properly adjusted machine would give a reading of 12.6%.

    12.3  12.6  13.2  13.1  13.1  12.7  13.1
    12.8  12.9  12.4  13.6  12.6  12.4  12.4
    13.1  12.6  12.6  13.1  12.9  12.7  12.4
    12.6  12.5  12.4  12.7  12.9  12.6  12.4

(a) If $\mu$ denotes the true average reading that Analyzer GTE-10 would make on a person whose blood alcohol concentration is 12.6%, test $H_0\colon \mu = 12.6$ versus $H_1\colon \mu \ne 12.6$ at the $\alpha = 0.05$ level of significance. Assume that $\sigma = 0.4$. Would you recommend that the machine be readjusted?
(b) What statistical assumptions are implicit in the hypothesis test done in Part (a)? Is there any reason to suspect that those assumptions may not be satisfied?
6.2.8. Calculate the P-values for the hypothesis tests indicated in Question 6.2.1. Do they agree with your decisions on whether or not to reject $H_0$?
6.2.9. Suppose $H_0\colon \mu = 120$ is tested against $H_1\colon \mu \ne 120$. If $\sigma = 10$ and $n = 16$, what P-value is associated with the sample mean $\bar{y} = 122.3$? Under what circumstances will $H_0$ be rejected?
6.2.10. As a class research project, Rosaura wants to see whether the stress of final exams elevates the blood pressures of freshmen women. When they are not under any untoward duress, healthy 18-year-old women have systolic blood pressures that average [illegible] mm Hg with a standard deviation of 12 mm Hg. If Rosaura finds that the average blood pressure for the 50 women in Statistics 101 on the day of the final exam is 125.2, what should she conclude? Set up and test an appropriate hypothesis.
6.2.11. As input for a new inflation model, economists predicted that the average cost of a hypothetical "food basket" in east Tennessee in July would be $145.75. The standard deviation ($\sigma$) of basket prices was assumed to be $9.50, a figure that has held fairly constant over the years. To check their prediction, a sample of 25 baskets representing different parts of the region were checked in late July, and the average cost was $149.75. Let $\alpha = 0.05$. Is the difference between the economists' prediction and the sample mean statistically significant?
6.3 TESTING BINOMIAL DATA - $H_0\colon p = p_0$
Suppose a set of data, $k_1, k_2, \ldots, k_n$, represents the outcomes of $n$ Bernoulli trials, where $k_i = 1$ or 0, depending on whether the $i$th trial ended in success or failure, respectively. If $p = P(i\text{th trial ends in success})$ is unknown, it may be appropriate to test the null hypothesis $H_0\colon p = p_0$, where $p_0$ is some particularly relevant (or status quo) value of $p$. Any such procedure is called a binomial hypothesis test, because the appropriate test statistic, as we will see, is a function of $k = k_1 + k_2 + \cdots + k_n$ = total number of successes, and we know from Theorem 3.2.1 that the total number of successes, $X$, in a series of $n$ independent trials has a binomial distribution,

$$p_X(k; p) = P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n$$

Two different procedures for testing $H_0\colon p = p_0$ need to be considered, the distinction resting on the magnitude of $n$. In general, if

$$0 < np_0 - 3\sqrt{np_0(1 - p_0)} < np_0 + 3\sqrt{np_0(1 - p_0)} < n \qquad (6.3.1)$$

we do a "large-sample" test of $H_0\colon p = p_0$ based on an approximate $Z$ ratio. Otherwise, a "small-sample" decision rule is used, one where the critical region is defined in terms of the exact binomial distribution associated with the random variable $X$.
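Inequality 6.3.1 is easy to check mechanically before choosing between the two procedures. The sketch below is one way to do so; the helper name `large_sample_ok` is hypothetical, and the three (n, p0) pairs are the ones used in Case Study 6.3.1, Case Study 6.3.2, and Example 6.3.1 of this section.

```python
from math import sqrt


def large_sample_ok(n, p0):
    """Return True when 0 < n*p0 - 3*sd < n*p0 + 3*sd < n (Inequality 6.3.1)."""
    mean = n * p0
    three_sd = 3.0 * sqrt(n * p0 * (1.0 - p0))
    return 0.0 < mean - three_sd and mean + three_sd < n


football = large_sample_ok(124, 0.50)   # 124 games, p0 = 0.50
obituary = large_sample_ok(747, 0.25)   # 747 decedents, p0 = 0.25
drug = large_sample_ok(19, 0.85)        # 19 patients, p0 = 0.85
print(football, obituary, drug)
```

The middle inequality in 6.3.1 holds automatically whenever the standard deviation is positive, so only the two outer comparisons need to be tested.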
A Large-Sample Test for the Binomial Parameter p
Suppose the number of observations, $n$, making up a set of Bernoulli random variables is sufficiently large that Inequality 6.3.1 is satisfied. We know in that case from Section 4.3 that the random variable $\dfrac{X - np_0}{\sqrt{np_0(1 - p_0)}}$ has approximately a standard normal pdf, $f_Z(z)$, if $p = p_0$. Values of $\dfrac{X - np_0}{\sqrt{np_0(1 - p_0)}}$ close to zero, of course, would be evidence in favor of $H_0\colon p = p_0$ [since $E\left(\dfrac{X - np_0}{\sqrt{np_0(1 - p_0)}}\right) = 0$ when $p = p_0$]. Conversely, the credibility of $H_0\colon p = p_0$ clearly diminishes as $\dfrac{X - np_0}{\sqrt{np_0(1 - p_0)}}$ moves farther and farther away from zero. The large-sample test of $H_0\colon p = p_0$, then, takes the same basic form as the test of $H_0\colon \mu = \mu_0$ in Section 6.2.

Theorem 6.3.1. Let $k_1, k_2, \ldots, k_n$ be a random sample of $n$ Bernoulli random variables for which $0 < np_0 - 3\sqrt{np_0(1 - p_0)} < np_0 + 3\sqrt{np_0(1 - p_0)} < n$. Let $k = k_1 + k_2 + \cdots + k_n$ denote the total number of "successes" in the $n$ trials. Define $z = \dfrac{k - np_0}{\sqrt{np_0(1 - p_0)}}$.
a. To test $H_0\colon p = p_0$ versus $H_1\colon p > p_0$ at the $\alpha$ level of significance, reject $H_0$ if $z \ge z_\alpha$.
b. To test $H_0\colon p = p_0$ versus $H_1\colon p < p_0$ at the $\alpha$ level of significance, reject $H_0$ if $z \le -z_\alpha$.
c. To test $H_0\colon p = p_0$ versus $H_1\colon p \ne p_0$ at the $\alpha$ level of significance, reject $H_0$ if $z$ is either (1) $\le -z_{\alpha/2}$ or (2) $\ge z_{\alpha/2}$.
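The three decision rules in Theorem 6.3.1 can be collected into one small routine. This is a sketch under stated assumptions: the function name `binomial_z_test` and the string labels for the alternative are invented for illustration, and the critical values come from Python's `statistics.NormalDist.inv_cdf` rather than from a printed table, so they carry more decimals than the 1.64 and 1.96 quoted in the text. The sample figures are those of the two case studies in this section.

```python
from math import sqrt
from statistics import NormalDist


def binomial_z_test(k, n, p0, alpha, alternative):
    """Return (z, reject) for the large-sample test of H0: p = p0."""
    z = (k - n * p0) / sqrt(n * p0 * (1.0 - p0))
    std_normal = NormalDist()
    if alternative == "greater":        # part (a): reject if z >= z_alpha
        reject = z >= std_normal.inv_cdf(1.0 - alpha)
    elif alternative == "less":         # part (b): reject if z <= -z_alpha
        reject = z <= -std_normal.inv_cdf(1.0 - alpha)
    elif alternative == "two-sided":    # part (c): reject if |z| >= z_{alpha/2}
        reject = abs(z) >= std_normal.inv_cdf(1.0 - alpha / 2.0)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, reject


# 67 of 124 favored teams beat the spread; H1: p != 0.50
z_spread, reject_spread = binomial_z_test(67, 124, 0.50, 0.05, "two-sided")
# 60 of 747 deaths in the three months before the birth month; H1: p < 0.25
z_deaths, reject_deaths = binomial_z_test(60, 747, 0.25, 0.05, "less")
print(z_spread, reject_spread, z_deaths, reject_deaths)
```

The routine does not verify Inequality 6.3.1 itself; a caller would be expected to confirm that the large-sample condition holds first.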
CASE STUDY 6.3.1
In gambling parlance, a point spread is a hypothetical increment added to the score of the presumably weaker of two teams playing. By intention, its magnitude should have the effect of making the game a toss-up; that is, each team should have a 50% chance of beating the spread.

In practice, setting the line on a game is a highly subjective endeavor, which raises the question of whether or not the Las Vegas crowd actually gets it right (116). Addressing that issue, a recent study examined the records of 124 National Football League games; it was found that in sixty-seven of the matchups (or 54%) the favored team beat the spread. Is the difference between 54% and 50% small enough to be written off to chance, or did the study uncover convincing evidence that odds makers are not capable of accurately quantifying the competitive edge that one team holds over another?

Let $p = P(\text{favored team beats spread})$. If $p$ is any value other than 0.50, the bookies are assigning point spreads incorrectly. To be tested, then, are the hypotheses

$$H_0\colon p = 0.50 \quad \text{versus} \quad H_1\colon p \ne 0.50$$

We will take 0.05 to be the level of significance.

In the terminology of Theorem 6.3.1, $n = 124$, $p_0 = 0.50$, and

$$k_i = \begin{cases} 1 & \text{if favored team beats spread in } i\text{th game} \\ 0 & \text{if favored team does not beat spread in } i\text{th game} \end{cases}$$

for $i = 1, 2, \ldots, 124$. Therefore, the sum $k = k_1 + k_2 + \cdots + k_{124}$ denotes the total number of times the favored team beat the spread.

According to the two-sided decision rule given in Part (c) of Theorem 6.3.1, the null hypothesis should be rejected if $z$ is either less than or equal to $-1.96$ ($= -z_{.05/2}$) or greater than or equal to 1.96 ($= z_{.05/2}$). But

$$z = \frac{67 - 124(0.50)}{\sqrt{124(0.50)(0.50)}} = 0.90$$

does not fall in the critical region, so $H_0\colon p = 0.50$ should not be rejected at the $\alpha = 0.05$ level of significance. The outcomes of these 124 games, in other words, are entirely consistent with the presumption that bookies know which of two teams is better, and by how much.
Comment. P-values can be used to summarize binomial hypothesis tests, just as they were in Section 6.2 when the null hypothesis was $H_0\colon \mu = \mu_0$. In Case Study 6.3.1, for example, the observed test statistic is 0.90 and $H_1$ is two-sided, so the P-value is 0.37:

$$\text{P-value} = P(Z \le -0.90) + P(Z \ge 0.90) = 0.1841 + 0.1841 = 0.37$$

For any $\alpha < 0.37$, then, our conclusion that the bookies are competent would remain unchanged.
CASE STUDY 6.3.2
There is a theory that people may tend to "postpone" their deaths until after some event that has particular meaning to them has passed (139). Birthdays, a family reunion, or the return of a loved one have all been suggested as the sorts of personal milestones that might have such an effect. National elections may be another. Studies have shown that the mortality rate in the United States drops noticeably during the Septembers and Octobers of presidential election years. If the postponement theory is to be believed, the reason for the decrease is that many of the elderly who would have died in those two months "hang on" until they see who wins.

Some years ago, a national periodical reported the findings of a study that looked at obituaries published in a Salt Lake City newspaper. Among the 747 decedents, the study identified that sixty, or 8.0%, had died in the three-month period preceding their birth months (129). If individuals are dying randomly with respect to their birthdays, we would expect 25% to die during any given three-month interval. What should we make, then, of the decrease from 25% to 8%? Has the study provided convincing evidence that the death months reported for the sample do not constitute a random sample of months?

Imagine the 747 deaths being divided into two categories: those that occurred in the three-month period prior to a person's birthday and those that occurred at other times during the year. Let $k_i = 1$ if the $i$th person belongs to the first category and $k_i = 0$, otherwise. Then $k = k_1 + k_2 + \cdots + k_{747}$ denotes the total number of deaths in the first category. The latter, of course, is the value of a binomial random variable with parameter $p$, where

$$p = P(\text{person dies in three months prior to birth month})$$

If people do not postpone their deaths (to wait for a birthday), $p$ should be 3/12, or 0.25; if they do, $p$ will be something less than 0.25. Assessing the decrease from 25% to 8%, then, is done with a one-sided binomial hypothesis test:

$$H_0\colon p = 0.25 \quad \text{versus} \quad H_1\colon p < 0.25$$

Let $\alpha = 0.05$. According to Part (b) of Theorem 6.3.1, $H_0$ should be rejected if

$$z = \frac{k - np_0}{\sqrt{np_0(1 - p_0)}} \le -z_{.05} = -1.64$$

Substituting for $k$, $n$, and $p_0$, we find that the test statistic falls far to the left of the critical value:

$$z = \frac{60 - 747(0.25)}{\sqrt{747(0.25)(0.75)}} = -10.7$$

The evidence is overwhelming, therefore, that the decrease from 25% to 8% is due to something other than chance. Explanations other than the postponement theory, of course, may be wholly or partially responsible for the nonrandom distribution of deaths. Still, the data show a pattern entirely consistent with the notion that we do have some control over when we die.
Comment. A similar conclusion was reached in a study conducted among the Chinese community living in California. The "significant event" in that case was not a birthday; it was the annual Harvest Moon festival, a celebration that holds particular meaning for elderly women. Based on census data tracked over a [illegible] period, it was determined that fifty-one deaths among elderly Chinese women should have occurred during the week before the festivals and fifty-two deaths the week after the festivals. In point of fact, thirty-three died the week before and seventy died the week after (23).
A Small-Sample Test for the Binomial Parameter p
Suppose that $k_1, k_2, \ldots, k_n$ is a random sample of Bernoulli random variables where $n$ is too small for Inequality 6.3.1 to hold. The decision rule, then, for testing $H_0\colon p = p_0$ that was given in Theorem 6.3.1 would not be appropriate. Instead, the critical region is defined by using the exact binomial distribution (rather than a normal approximation).
EXAMPLE 6.3.1
Suppose that $n = 19$ patients are to be given an experimental drug designed to relieve pain. The standard treatment is known to be effective in 85% of similar cases. If $p$ denotes the probability that the new drug will reduce a patient's pain, the researcher wishes to test

$$H_0\colon p = 0.85 \quad \text{versus} \quad H_1\colon p \ne 0.85$$

The decision will be based on the magnitude of $k$, the total number in the sample for whom the drug is effective, that is, on

$$k = k_1 + k_2 + \cdots + k_{19}$$

where

$$k_i = \begin{cases} 0 & \text{if the new drug fails to relieve the } i\text{th patient's pain} \\ 1 & \text{if the new drug does relieve the } i\text{th patient's pain} \end{cases}$$

What should the decision rule be if the intended $\alpha$ is to be somewhere near 10%? [Note that Theorem 6.3.1 does not apply here because Inequality 6.3.1 is not satisfied; specifically, $np_0 + 3\sqrt{np_0(1 - p_0)} = 19(0.85) + 3\sqrt{19(0.85)(0.15)} = 20.8$ is not less than $n$ ($= 19$).]

If the null hypothesis is true, the expected number of successes would be $np_0 = 19(0.85)$, or 16.2. It follows that values of $k$ to the extreme right or extreme left of 16.2 should constitute the critical region.
    MTB > pdf;
    SUBC > binomial 19 0.85.

    Probability Density Function

    Binomial with n = 19 and p = 0.850000

       x    P(X = x)
       8    0.0000
       9    0.0001
      10    0.0007
      11    0.0032
      12    0.0122
      13    0.0374
      14    0.0907
      15    0.1714
      16    0.2428
      17    0.2428
      18    0.1529
      19    0.0456

    P(X <= 13) = 0.0536
    P(X = 19)  = 0.0456

FIGURE 6.3.1
Figure 6.3.1 is a MINITAB printout of $p_X(k) = \binom{19}{k}(0.85)^k(0.15)^{19-k}$. By inspection, we can see that the critical region

$$C = \{k\colon k \le 13 \ \text{or} \ k = 19\}$$

would produce an $\alpha$ close to the desired 0.10 (and would keep the probabilities associated with the two sides of the rejection region roughly the same). In random variable notation,

$$\begin{aligned} P(X \in C \mid H_0 \text{ is true}) &= P(X \le 13 \mid p = 0.85) + P(X = 19 \mid p = 0.85) \\ &= 0.0001 + 0.0007 + 0.0032 + 0.0122 + 0.0374 + 0.0456 \\ &= 0.0992 \\ &\approx 0.10 \end{aligned}$$
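The size of the critical region chosen in Example 6.3.1 can be confirmed without the printout by summing the exact binomial pdf over C. The sketch below uses only `math.comb`; the helper names `binomial_pmf` and `region_probability` are illustrative, not from the text.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def region_probability(region, n, p):
    """Total probability that X falls in the given set of values."""
    return sum(binomial_pmf(k, n, p) for k in region)


# Critical region from the example: k <= 13 or k = 19
critical_region = list(range(0, 14)) + [19]
alpha = region_probability(critical_region, 19, 0.85)
lower_tail = region_probability(range(0, 14), 19, 0.85)
upper_tail = binomial_pmf(19, 19, 0.85)
print(alpha, lower_tail, upper_tail)
```

The same two helpers give the probability of rejecting for any other value of p, simply by changing the last argument of `region_probability`.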
QUESTIONS
6.3.1. Commercial fishermen working certain parts of the Atlantic Ocean sometimes find their efforts being hindered by the presence of whales. Ideally, they would like to scare away the whales without frightening the fish. One of the strategies being experimented with is to transmit underwater the sounds of a killer whale. On the 52 occasions that technique has been tried, it worked 24 times (that is, the whales immediately left the area). Experience has shown, though, that 40% of all whales sighted near fishing boats leave of their own accord, anyway, probably just to get away from the noise of the boat.
(a) Let $p = P(\text{whale leaves area after hearing sounds of killer whale})$. Test $H_0\colon p = 0.40$ versus $H_1\colon p > 0.40$ at the $\alpha = 0.05$ level of significance. Can it be argued on the basis of these data that transmitting underwater predator sounds is an effective technique for clearing fishing waters of unwanted whales?
(b) Calculate the P-value for these data. For what values of $\alpha$ would $H_0$ be rejected?
6.3.2. Efforts to find a genetic explanation for why certain people are right-handed and others left-handed have been largely unsuccessful. Reliable data are difficult to find because of environmental factors that also influence a child's "handedness." To avoid that complication, researchers often study the analogous problem of "pawedness" in animals, where both genotypes and the environment can be partially controlled. In one such experiment (28), mice were put into a cage having a feeding tube that was equally accessible from the right or the left. Each mouse was then carefully watched over a number of feedings. If it used its right paw more than half the time to activate the tube, it was defined to be "right-pawed." Observations of this sort showed that 67% of mice belonging to strain A/J are right-pawed. A similar protocol was followed on a sample of 35 mice belonging to strain A/HeJ. Of those 35, a total of 18 were eventually classified as right-pawed. Test whether the proportion of right-pawed mice found in the A/HeJ sample was significantly different from what was known about the A/J strain. Use a two-sided alternative and let [illegible] be the probability associated with the critical region.
6.3.3. Defeated in his most recent attempt to win a [illegible] seat because of a sizeable gender gap, a politician has spent the last two years speaking out in favor of [illegible] rights issues. A newly released poll claims to have contacted a random sample of [illegible] of the politician's current supporters and found that [illegible] were men. In the election that he lost, exit polls indicated that 65% of those who voted for him were men. Using an $\alpha = 0.05$ level of significance, test the null hypothesis that his proportion of male supporters has remained the same. Make the alternative hypothesis one-sided.
6.3.4. Suppose $H_0\colon p = 0.45$ is to be tested against $H_1\colon p > 0.45$ at the $\alpha = 0.14$ level of significance, where $p = P(i\text{th trial ends in success})$. If the sample size is 200, what is the smallest number of successes that will cause $H_0$ to be rejected?
6.3.5. Recall the median test described in Example 5.3.2. Reformulate that analysis as a hypothesis test rather than a confidence interval. What P-value is associated with the outcomes listed in Table 5.3.3?
6.3.6. Among the early attempts to validate the postponement theory introduced in Case Study 6.3.2 was an examination of the birth dates and death dates of 348 U.S. celebrities (139). It was found that 16 of those individuals had died in the month preceding their birth month. Set up and test the appropriate $H_0$ against a one-sided $H_1$. Use the 0.05 level of significance.
6.3.7. What $\alpha$ levels are possible with a decision rule of the form "Reject $H_0$ if $k \ge k^*$" when $H_0\colon p = 0.5$ is to be tested against $H_1\colon p > 0.5$ using a random sample of size $n = 7$?
6.3.8. The following is a MINITAB printout of the binomial pdf $p_X(k) = \binom{9}{k}(0.6)^k(0.4)^{9-k}$, $k = 0, 1, \ldots, 9$. Suppose $H_0\colon p = 0.6$ is to be tested against $H_1\colon p > 0.6$ and we wish the level of significance to be exactly 0.05. Use Theorem 2.4.1 to combine two different critical regions into a single randomized decision rule for which $\alpha = 0.05$.

    MTB > pdf;
    SUBC > binomial 9 0.6.

    Probability Density Function

    Binomial with n = 9 and p = 0.600000

       x    P(X = x)
       0    0.0003
       1    0.0035
       2    0.0212
       3    0.0743
       4    0.1672
       5    0.2508
       6    0.2508
       7    0.1612
       8    0.0605
       9    0.0101

6.3.9. Suppose $H_0\colon p = 0.75$ is to be tested against $H_1\colon p < 0.75$ using a random sample of size $n = 7$ and the decision rule "Reject $H_0$ if $k \le 3$."
(a) What is the test's level of significance?
(b) Graph the probability that $H_0$ will be rejected as a function of $p$.
6.4 TYPE I AND TYPE II ERRORS
The possibility of drawing incorrect conclusions is an inevitable by-product of hypothesis testing. No matter what sort of mathematical facade is laid atop the decision-making process, there is no way to guarantee that what the test tells us will be the truth. One kind of error, rejecting $H_0$ when $H_0$ is true, figured prominently in Section 6.3: It was argued that critical regions should be defined so as to keep the probability of making such errors small, typically on the order of 0.05.

In point of fact, there are two different kinds of errors that can be committed with any hypothesis test: (1) We can reject $H_0$ when $H_0$ is true and (2) we can fail to reject $H_0$ when $H_0$ is false. These are called Type I and Type II errors, respectively. At the same time, there are two kinds of correct decisions: (1) We can fail to reject a true $H_0$ and (2) we can reject a false $H_0$. Figure 6.4.1 shows these four possible "Decision/State of nature" combinations.

                                      True State of Nature
                                  H0 is true          H1 is true
    Our        Fail to reject H0  Correct decision    Type II error
    Decision   Reject H0          Type I error        Correct decision

FIGURE 6.4.1
Computing the Probability of Committing a Type I Error
Once an inference is made, there is no way to know whether the conclusion reached was correct. It is possible, though, to calculate the probability of having made an error, and the magnitude of that probability can help us better understand the "power" of the hypothesis test and its ability to distinguish between $H_0$ and $H_1$.

Recall the fuel additive example developed in Section 6.2: $H_0\colon \mu = 25.0$ was to be tested against $H_1\colon \mu > 25.0$ using a sample of size $n = 30$. The decision rule stated that $H_0$ should be rejected if $\bar{y}$, the average mpg with the new additive, equalled or exceeded 25.718. In that case, the probability of committing a Type I error is 0.05:

$$\begin{aligned} P(\text{Type I error}) &= P(\text{reject } H_0 \mid H_0 \text{ is true}) \\ &= P(\bar{Y} \ge 25.718 \mid \mu = 25.0) \\ &= P\left(\frac{\bar{Y} - 25.0}{2.4/\sqrt{30}} \ge \frac{25.718 - 25.0}{2.4/\sqrt{30}}\right) \\ &= P(Z \ge 1.64) = 0.05 \end{aligned}$$

Of course, the fact that the probability of committing a Type I error equals 0.05 should come as no surprise. In our earlier discussion of how "beyond reasonable doubt" should be interpreted numerically, we specifically chose the critical region so that the probability of the decision rule rejecting $H_0$ when $H_0$ is true would be 0.05.

In general, the probability of committing a Type I error is referred to as a test's level of significance and is denoted $\alpha$ (recall Definition 6.2.2). The concept is a crucial one: The level of significance is a single-number summary of the "rules" by which the decision process is being conducted. In essence, $\alpha$ reflects the amount of evidence the experimenter is demanding to see before abandoning the null hypothesis.
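The Type I error probability for the gasoline additive rule can be checked directly from the sampling distribution of the sample mean under the null hypothesis. A minimal sketch follows, assuming Python 3.8 or later; `type_i_error` is a hypothetical helper name, and the inputs (critical value 25.718, $\mu_0 = 25.0$, $\sigma = 2.4$, $n = 30$) are the ones quoted in the text.

```python
from math import sqrt
from statistics import NormalDist


def type_i_error(critical_value, mu0, sigma, n):
    """P(sample mean >= critical_value) when the true mean is mu0."""
    # Under H0 the sample mean is normal with mean mu0 and sd sigma/sqrt(n).
    null_sampling_dist = NormalDist(mu0, sigma / sqrt(n))
    return 1.0 - null_sampling_dist.cdf(critical_value)


alpha = type_i_error(25.718, 25.0, 2.4, 30)
print(alpha)
```

The result lands very slightly above 0.05 because 25.718 was obtained from the rounded table value 1.64 rather than from the exact 95th percentile of the standard normal.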
Computing the Probability of Committing a Type II Error
We just saw that calculating the probability of a Type I error is a nonproblem: There are no computations necessary, since the probability equals whatever value the experimenter sets a priori for $\alpha$. A similar situation does not hold for Type II errors. First, Type II error probabilities are not specified explicitly by the experimenter; second, each hypothesis test has an infinite number of Type II error probabilities, one for each value of the parameter admissible under $H_1$.

As an example, suppose we want to find the probability of committing a Type II error in the gasoline experiment if the true $\mu$ (with the additive) were 25.750. By definition,

$$\begin{aligned} P(\text{Type II error} \mid \mu = 25.750) &= P(\text{we fail to reject } H_0 \mid \mu = 25.750) \\ &= P(\bar{Y} < 25.718 \mid \mu = 25.750) \\ &= P\left(\frac{\bar{Y} - 25.75}{2.4/\sqrt{30}} < \frac{25.718 - 25.75}{2.4/\sqrt{30}}\right) \\ &= P(Z < -0.07) = 0.4721 \end{aligned}$$

So, even if the new additive increased the fuel economy to 25.750 mpg (from 25 mpg), our decision rule would be "tricked" 47% of the time: that is, it would tell us on those occasions not to reject $H_0$.

The symbol for the probability of committing a Type II error is $\beta$. Figure 6.4.2 shows the sampling distribution of $\bar{Y}$ when $\mu = 25.0$ (i.e., when $H_0$ is true) and when $\mu = 25.750$ ($H_1$ is true); the areas corresponding to $\alpha$ and $\beta$ are shaded.

[Figure: sampling distributions of $\bar{Y}$ when $H_0$ is true and when $\mu = 25.75$; the critical value 25.718 separates "Accept $H_0$" from "Reject $H_0$"; areas $\alpha$ and $\beta$ shaded]
FIGURE 6.4.2

Clearly, the magnitude of $\beta$ is a function of the presumed value for $\mu$. If, for example, the gasoline additive is so effective as to raise fuel efficiency to 26.8 mpg, the probability that our decision rule would lead us to make a Type II error is a much smaller 0.0068:

$$\begin{aligned} P(\text{Type II error} \mid \mu = 26.8) &= P(\text{we fail to reject } H_0 \mid \mu = 26.8) \\ &= P(\bar{Y} < 25.718 \mid \mu = 26.8) \\ &= P\left(\frac{\bar{Y} - 26.8}{2.4/\sqrt{30}} < \frac{25.718 - 26.8}{2.4/\sqrt{30}}\right) \\ &= P(Z < -2.47) = 0.0068 \end{aligned}$$

(see Figure 6.4.3).

[Figure: sampling distributions of $\bar{Y}$ when $H_0$ is true and when $\mu = 26.8$, horizontal axis from 24 to 28; critical value 25.718; $\alpha = 0.05$, $\beta = 0.0068$]
FIGURE 6.4.3
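Both Type II error probabilities above follow the same pattern: evaluate the cdf of the sampling distribution under the presumed alternative at the critical value. The sketch below illustrates that pattern; `type_ii_error` is an invented name, and the defaults are the gasoline example's $\sigma = 2.4$, $n = 30$, and critical value 25.718.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu_true, critical_value=25.718, sigma=2.4, n=30):
    """P(sample mean < critical_value) when the true mean is mu_true."""
    # Failing to reject H0 means the sample mean fell below the critical value.
    alt_sampling_dist = NormalDist(mu_true, sigma / sqrt(n))
    return alt_sampling_dist.cdf(critical_value)


beta_25_75 = type_ii_error(25.75)   # compare with 0.4721 in the text
beta_26_8 = type_ii_error(26.8)     # compare with 0.0068 in the text
print(beta_25_75, beta_26_8)
```

The first value differs from 0.4721 in the third decimal place because the text rounds the standardized critical value to $-0.07$ before consulting the normal table.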
Power Curves
If $\beta$ is the probability that we fail to reject $H_0$ when $H_1$ is true, then $1 - \beta$ is the probability of the complement, that we reject $H_0$ when $H_1$ is true. We call $1 - \beta$ the power of the test; it represents the ability of the decision rule to "recognize" (correctly) that $H_0$ is false.

The alternative hypothesis $H_1$ usually depends on a parameter, which makes $1 - \beta$ a function of that parameter. The relationship they share can be pictured by drawing a power curve, which is simply a graph of $1 - \beta$ versus the set of all possible parameter values.

Figure 6.4.4 shows the power curve for testing

$$H_0\colon \mu = 25.0 \quad \text{versus} \quad H_1\colon \mu > 25.0$$

where $\mu$ is the mean of a normal distribution with $\sigma = 2.4$, and the decision rule is "Reject $H_0$ if $\bar{y} \ge 25.718$." The two marked points on the curve represent the $(\mu, 1 - \beta)$ pairs just determined, (25.75, 0.5279) and (26.8, 0.9932). One other point can be gotten for every power curve, without doing any calculations: When $\mu = \mu_0$ (the value specified by $H_0$), $1 - \beta = \alpha$. Of course, as the true mean gets farther and farther away from the $H_0$ mean, the power will converge to one.

[Figure: power curve, $1 - \beta$ (vertical axis, up to 1.0) versus presumed value for $\mu$ (horizontal axis, 25.00 to 27.00); arrows mark Power = 0.72 and Power = 0.29]
FIGURE 6.4.4
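A power curve such as the one described for Figure 6.4.4 can be tabulated by evaluating $1 - \beta$ over a grid of presumed means. The following is a sketch under the text's stated setup ($\sigma = 2.4$, $n = 30$, reject if the sample mean is at least 25.718); the function name `power` and the particular grid are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def power(mu, critical_value=25.718, sigma=2.4, n=30):
    """1 - beta: probability the sample mean reaches the critical value."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return 1.0 - sampling_dist.cdf(critical_value)


# Tabulate the curve on a grid of presumed values for mu.
grid = [25.0, 25.25, 25.5, 25.75, 26.0, 26.5, 26.8, 27.0]
curve = [(mu, power(mu)) for mu in grid]
for mu, one_minus_beta in curve:
    print(f"mu = {mu:5.2f}   power = {one_minus_beta:.4f}")
```

Values read off a printed graph are necessarily approximate, so the computed powers at 25.5 and 26.0 may differ in the second decimal place from the figures quoted for the arrows.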
Power curves serve two different purposes. On the one hand, they completely characterize the "performance" that can be expected from a hypothesis test. In Figure 6.4.4, for example, the two arrows show that the probability of rejecting $H_0\colon \mu = 25$ in favor of $H_1\colon \mu > 25$ when $\mu = 26.0$ is approximately 0.72. (Or, equivalently, Type II errors will be committed roughly 28% of the time when $\mu = 26.0$.) As the true mean moves closer to $\mu_0$ (and becomes more difficult to distinguish) the power of the test understandably diminishes. If $\mu = 25.5$, for example, the graph shows that $1 - \beta$ falls to 0.29.

Power curves are also useful for comparing one inference procedure with another. For every conceivable hypothesis testing situation, a variety of procedures for choosing between $H_0$ and $H_1$ will be available. How do we know which to use?

The answer to that question is not always simple. Some procedures will be computationally more convenient or easier to explain than others; some will make slightly different assumptions about the pdf being sampled. Associated with each of them, though, is a power curve. If the selection of a hypothesis test is to hinge solely on its ability to distinguish $H_0$ from $H_1$, then the procedure to choose is the one having the steepest power curve.

Figure 6.4.5 shows the power curves for two hypothetical methods A and B, each of which is testing $H_0\colon \theta = \theta_0$ versus $H_1\colon \theta \ne \theta_0$ at the $\alpha$ level of significance. From the standpoint of power, Method B is clearly the better of the two: it always has a higher probability of correctly rejecting $H_0$ when the parameter $\theta$ is not equal to $\theta_0$.

[Figure: power curves, $1 - \beta$ versus $\theta$, for Method A and Method B; the Method B curve is the steeper of the two]
FIGURE 6.4.5
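Factors That Influence the Power of a Test
The ability of a test procedure to reject $H_0$ when $H_0$ is false is clearly of prime importance, a fact that raises an obvious question: What can an experimenter do to influence the value of $1 - \beta$? In the case of the $Z$ test described in Theorem 6.2.1, $1 - \beta$ is a function of $\alpha$, $\sigma$, and $n$. By appropriately raising or lowering the values of those parameters, the power of the test against any given $\mu$ can be made to equal any desired level.

The Effect of $\alpha$ on $1 - \beta$
Consider again the test of

$$H_0\colon \mu = 25.0 \quad \text{versus} \quad H_1\colon \mu > 25.0$$

discussed earlier in this section. In its original form, $\alpha = 0.05$, $\sigma = 2.4$, $n = 30$, and the decision rule called for $H_0$ to be rejected if $\bar{y} \ge 25.718$.

Figure 6.4.6 shows what happens to $1 - \beta$ (when $\mu = 25.75$) if $\sigma$, $n$, and $\mu$ are held constant but $\alpha$ is increased to 0.10. The top pair of distributions shows the configuration that appears in Figure 6.4.2; the power in this case is $1 - 0.4721$, or 0.53. The bottom portion of the graph illustrates what happens when $\alpha$ is set at 0.10 instead of 0.05: the decision rule changes from "Reject $H_0$ if $\bar{y} \ge 25.718$" to "Reject $H_0$ if $\bar{y} \ge 25.561$" (see the Questions at the end of this section) and the power increases from 0.53 to 0.67:

$$\begin{aligned} 1 - \beta &= P(\text{reject } H_0 \mid H_1 \text{ is true}) \\ &= P(\bar{Y} \ge 25.561 \mid \mu = 25.75) \\ &= P\left(\frac{\bar{Y} - 25.75}{2.4/\sqrt{30}} \ge \frac{25.561 - 25.75}{2.4/\sqrt{30}}\right) \\ &= P(Z \ge -0.43) \\ &= 0.6664 \end{aligned}$$

The specifics illustrated in Figure 6.4.6 accurately reflect what is true in general: Increasing $\alpha$ decreases $\beta$ and increases the power. That said, it does not follow in practice that experimenters should manipulate $\alpha$ to achieve a desired $1 - \beta$. For all the reasons cited in Section 6.2, $\alpha$ should typically be set equal to a number somewhere in the neighborhood of 0.05. If the corresponding $1 - \beta$ against a particular $\mu$ is deemed to be inappropriate, adjustments should be made in the values of $\sigma$ and/or $n$.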
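The dependence of the critical value and the power on $\alpha$ can be made explicit by computing both from $\alpha$ instead of fixing the cutoff in advance. This is a sketch, not the text's own procedure: `upper_tail_test` is a hypothetical name, and the quantile $z_\alpha$ is taken from `NormalDist().inv_cdf`, so results agree with the text's 25.718, 25.561, 0.53, and 0.6664 only up to table rounding.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_test(mu_true, mu0, sigma, n, alpha):
    """Return (critical value, power) for H0: mu = mu0 versus H1: mu > mu0."""
    standard_error = sigma / sqrt(n)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    critical_value = mu0 + z_alpha * standard_error
    # Power is the chance the sample mean reaches the cutoff when mu = mu_true.
    test_power = 1.0 - NormalDist(mu_true, standard_error).cdf(critical_value)
    return critical_value, test_power


cutoff_05, power_05 = upper_tail_test(25.75, 25.0, 2.4, 30, alpha=0.05)
cutoff_10, power_10 = upper_tail_test(25.75, 25.0, 2.4, 30, alpha=0.10)
print(cutoff_05, power_05)
print(cutoff_10, power_10)
```

Raising $\alpha$ pulls the cutoff toward $\mu_0$, which is the only reason the power goes up; nothing about the data or the alternative has changed.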
[Figure: two panels, each showing the sampling distributions of $\bar{Y}$ when $H_0$ is true and when $\mu = 25.75$. Top panel: critical value 25.718, Power = 0.53. Bottom panel: critical value 25.561, $\beta = 0.3336$, Power = 0.67]
FIGURE 6.4.6

The Effects of $\sigma$ and $n$ on $1 - \beta$
Although it may not always be feasible (or even possible), decreasing $\sigma$ will necessarily increase $1 - \beta$. In the gasoline additive example, $\sigma$ is assumed to be 2.4 mpg, the latter being a measure of the variation in gas mileages from driver to driver achieved in a cross-country road trip from Boston to Los Angeles (recall page 428). Intuitively, the environmental differences inherent in a trip of that magnitude would be considerable. Different drivers would encounter different weather conditions, varying amounts of traffic, and perhaps take alternate routes.

Suppose, instead, the drivers simply did laps around a test track rather than drive on actual highways. Conditions from driver to driver would then be much more uniform and the value of $\sigma$ would surely be smaller. What would be the effect on $1 - \beta$ when $\mu = 25.75$ (and $\alpha = 0.05$) if $\sigma$ could be reduced from 2.4 mpg to 1.2 mpg?

As Figure 6.4.7 shows, reducing $\sigma$ has the effect of making the $H_0$ distribution of $\bar{Y}$ more concentrated around $\mu_0$ ($= 25$) and the $H_1$ distribution of $\bar{Y}$ more concentrated around $\mu$ ($= 25.75$). Substituting into the formula for the critical value (with 1.2 for $\sigma$ in place of 2.4), we find that the critical value $\bar{y}^*$ moves closer to $\mu_0$ [from 25.718 to $25.359 = 25 + 1.64\left(\dfrac{1.2}{\sqrt{30}}\right)$] and the proportion of the $H_1$ distribution lying in the rejection region (i.e., the power) increases from 0.53 to 0.96:

$$\begin{aligned} 1 - \beta &= P(\bar{Y} \ge 25.359 \mid \mu = 25.75) \\ &= P\left(Z \ge \frac{25.359 - 25.75}{1.2/\sqrt{30}}\right) \\ &= P(Z \ge -1.78) = 0.9625 \end{aligned}$$

[Figure: two panels, each showing the sampling distributions of $\bar{Y}$ when $H_0$ is true and when $\mu = 25.75$, with $\alpha = 0.05$. Top panel (when $\sigma = 2.4$): critical value 25.718, $\beta = 0.4721$, Power = 0.53. Bottom panel (when $\sigma = 1.2$): $\beta = 0.0375$, Power = 0.96]
FIGURE 6.4.7
In theory, reducing $\sigma$ can be a very effective way of increasing the power of a test, as Figure 6.4.7 makes abundantly clear. In practice, though, refinements in the way data are collected that would have a substantial impact on the magnitude of $\sigma$ are often either difficult to identify or prohibitively expensive. More typically, experimenters achieve the same effect by simply increasing the sample size.

Look again at the two sets of distributions in Figure 6.4.7. The increase in $1 - \beta$ from 0.53 to 0.96 was accomplished by cutting the denominator of the test statistic $\left[z = \dfrac{\bar{y} - 25}{\sigma/\sqrt{30}}\right]$ in half by reducing the standard deviation from 2.4 to 1.2. The same numerical effect would be achieved if $\sigma$ were left unchanged but $n$ was increased from 30 to 120; that is, $\dfrac{1.2}{\sqrt{30}} = \dfrac{2.4}{\sqrt{120}}$. Because it can easily be increased or decreased, the sample size is the parameter that researchers almost invariably turn to as the mechanism for ensuring that a hypothesis test will have a sufficiently high power against a given alternative.
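The equivalence between halving $\sigma$ and quadrupling $n$ can be verified numerically, since the power depends on the two only through the standard error $\sigma/\sqrt{n}$. The sketch below assumes the gasoline example's values ($\mu_0 = 25$, $\mu = 25.75$, $\alpha = 0.05$); `power_upper_tail` is an illustrative name, and exact normal quantiles are used in place of the table value 1.64.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail(mu_true, mu0, sigma, n, alpha=0.05):
    """Power of the upper-tail Z test of H0: mu = mu0 when the mean is mu_true."""
    standard_error = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1.0 - alpha) * standard_error
    return 1.0 - NormalDist(mu_true, standard_error).cdf(cutoff)


original = power_upper_tail(25.75, 25.0, sigma=2.4, n=30)
smaller_sigma = power_upper_tail(25.75, 25.0, sigma=1.2, n=30)
larger_n = power_upper_tail(25.75, 25.0, sigma=2.4, n=120)
print(original, smaller_sigma, larger_n)
```

In this formulation the only quantity that changes between the second and third calls is how the same standard error was arrived at, which is why the two powers coincide.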
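EXAMPLE 6.4.1
Suppose an experimenter wishes to test

$$H_0\colon \mu = 100 \quad \text{versus} \quad H_1\colon \mu > 100$$

at the $\alpha = 0.05$ level of significance and wants $1 - \beta$ to equal 0.60 when $\mu = 103$. What is the smallest (i.e., cheapest) sample size that will achieve that objective? Assume that the variable being measured is normally distributed with $\sigma = 14$.

Finding $n$, given values for $\alpha$, $1 - \beta$, $\sigma$, and $\mu$, requires that two simultaneous equations be written for the critical value $\bar{y}^*$, one in terms of the $H_0$ distribution and the other in terms of the $H_1$ distribution. Setting the two equal will yield the minimum sample size that achieves the desired $\alpha$ and $1 - \beta$.

Consider, first, the consequences of the level of significance being equal to 0.05. By definition,

$$\begin{aligned} \alpha &= P(\text{we reject } H_0 \mid H_0 \text{ is true}) \\ &= P(\bar{Y} \ge \bar{y}^* \mid \mu = 100) \\ &= P\left(\frac{\bar{Y} - 100}{14/\sqrt{n}} \ge \frac{\bar{y}^* - 100}{14/\sqrt{n}}\right) \\ &= P\left(Z \ge \frac{\bar{y}^* - 100}{14/\sqrt{n}}\right) \\ &= 0.05 \end{aligned}$$

But $P(Z \ge 1.64) = 0.05$, so

$$\frac{\bar{y}^* - 100}{14/\sqrt{n}} = 1.64$$

or, equivalently,

$$\bar{y}^* = 100 + 1.64 \cdot \frac{14}{\sqrt{n}} \qquad (6.4.1)$$

Similarly,

$$\begin{aligned} 1 - \beta &= P(\text{we reject } H_0 \mid H_1 \text{ is true}) \\ &= P(\bar{Y} \ge \bar{y}^* \mid \mu = 103) \\ &= P\left(\frac{\bar{Y} - 103}{14/\sqrt{n}} \ge \frac{\bar{y}^* - 103}{14/\sqrt{n}}\right) = 0.60 \end{aligned}$$

From Appendix Table A.1, though, $P(Z \ge -0.25) = 0.5987 \approx 0.60$, so

$$\frac{\bar{y}^* - 103}{14/\sqrt{n}} = -0.25$$

which implies that

$$\bar{y}^* = 103 - 0.25 \cdot \frac{14}{\sqrt{n}} \qquad (6.4.2)$$

It follows, then, from Equations 6.4.1 and 6.4.2 that

$$100 + 1.64 \cdot \frac{14}{\sqrt{n}} = 103 - 0.25 \cdot \frac{14}{\sqrt{n}}$$

Solving for $n$ shows that a minimum of seventy-eight observations must be taken to guarantee that the hypothesis test will have the desired precision.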
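Equating the two expressions for the critical value and solving gives $\sqrt{n} = (z_\alpha + z_{1-\beta})\,\sigma / (\mu_1 - \mu_0)$, where $z_\alpha = 1.64$ and $z_{1-\beta} = 0.25$ are the two table values used in the example. The sketch below applies that formula; the function name `min_sample_size` and its parameter names are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(mu0, mu1, sigma, z_alpha, z_power):
    """Smallest n with mu0 + z_alpha*sigma/sqrt(n) <= mu1 - z_power*sigma/sqrt(n)."""
    root_n = (z_alpha + z_power) * sigma / (mu1 - mu0)
    return ceil(root_n**2)


# With the rounded table values used in the example
n_table = min_sample_size(100, 103, 14, z_alpha=1.64, z_power=0.25)

# With unrounded quantiles: P(Z >= z_alpha) = 0.05 and P(Z >= -z_power) = 0.60
std_normal = NormalDist()
n_exact = min_sample_size(
    100, 103, 14,
    z_alpha=std_normal.inv_cdf(0.95),
    z_power=std_normal.inv_cdf(0.60),
)
print(n_table, n_exact)
```

Because the table values 1.64 and 0.25 are both rounded down slightly, the unrounded quantiles give a requirement one observation larger than the table-based answer; the difference is a rounding effect, not a different method.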
Decision Rules for Nonnormal Data
Our discussion of hypothesis testing thus far has been confined to inferences involving either binomial data or normal data. Decision rules for other types of probability functions are rooted in the same basic principles.

In general, to test $H_0\colon \theta = \theta_0$, where $\theta$ is the unknown parameter in a pdf $f_Y(y; \theta)$, we initially define the decision rule in terms of $\hat{\theta}$, where the latter is a sufficient statistic for $\theta$. The corresponding critical region is the set of values of $\hat{\theta}$ least compatible with $\theta_0$ (but admissible under $H_1$) whose total probability when $H_0$ is true is $\alpha$. In the case of testing $H_0\colon \mu = \mu_0$ versus $H_1\colon \mu > \mu_0$, for example, where the data are normally distributed, $\bar{Y}$ is a sufficient statistic for $\mu$, and the least likely values for the sample mean that are admissible under $H_1$ are those for which $\bar{y} \ge \bar{y}^*$, where $P(\bar{Y} \ge \bar{y}^* \mid H_0 \text{ is true}) = \alpha$.
EXAMPLE 6.4.2
A random sample of size $n = 8$ is drawn from the uniform pdf
\[ f_Y(y; \theta) = 1/\theta, \quad 0 \le y \le \theta \]
for the purpose of testing
\[ H_0: \theta = 2.0 \quad \text{versus} \quad H_1: \theta < 2.0 \]
at the $\alpha = 0.10$ level of significance. Suppose the decision rule is to be based on $Y_8'$, the largest order statistic. What would be the probability of committing a Type II error when $\theta = 1.7$?
If $H_0$ is true, $Y_8'$ should be close to 2.0, and values of the largest order statistic that are much smaller than 2.0 would be evidence in favor of $H_1: \theta < 2.0$. It follows, then, that the form of the decision rule should be
"Reject $H_0: \theta = 2.0$ if $y_8' \le c$"
where $P(Y_8' \le c \mid H_0 \text{ is true}) = 0.10$.
From Example 3.10.2,
\[ f_{Y_8'}(y; \theta = 2) = 8\left(\frac{y}{2}\right)^7 \cdot \frac{1}{2}, \quad 0 \le y \le 2 \]
Therefore, the constant $c$ that appears in the $\alpha = 0.10$ decision rule must satisfy the equation
\[ \int_0^c 8\left(\frac{y}{2}\right)^7 \cdot \frac{1}{2}\, dy = 0.10 \]
or, equivalently,
\[ \left(\frac{c}{2}\right)^8 = 0.10 \]
implying that $c = 1.50$.
Now, $\beta$ when $\theta = 1.7$ is, by definition, the probability that $Y_8'$ falls in the acceptance region when $H_1: \theta = 1.7$ is true. That is,
\begin{align*}
\beta &= P(Y_8' > 1.50 \mid \theta = 1.7) \\
&= \int_{1.50}^{1.7} 8\left(\frac{y}{1.7}\right)^7 \cdot \frac{1}{1.7}\, dy \\
&= 1 - \left(\frac{1.5}{1.7}\right)^8 = 0.63
\end{align*}
(see Figure 6.4.8).
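The two steps of Example 6.4.2 (solving for the cutoff, then integrating the alternative pdf over the acceptance region) both have closed forms, so they can be checked with a short Python sketch. The function name `largest_order_stat_test` is hypothetical and introduced only for illustration.

```python
def largest_order_stat_test(theta0, theta1, n, alpha):
    """Cutoff c and Type II error probability for the rule "reject H0 if y'_n <= c".

    Assumes a random sample of size n from a uniform pdf on [0, theta],
    with H0: theta = theta0 tested against the alternative theta1 < theta0.
    """
    # Under H0, P(Y'_n <= c) = (c / theta0)**n, which is set equal to alpha.
    c = theta0 * alpha ** (1.0 / n)
    # Under theta1, P(Y'_n > c) = 1 - (c / theta1)**n (valid when c <= theta1).
    beta = 1.0 - (c / theta1) ** n
    return c, beta


c, beta = largest_order_stat_test(theta0=2.0, theta1=1.7, n=8, alpha=0.10)
print(round(c, 2), round(beta, 2))
```

Rounded to two decimals, the values agree with the cutoff and the Type II error probability obtained in the example.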
[Figure 6.4.8: pdf of $Y_8'$ when $H_0: \theta = 2.0$ is true, plotted against $y_8'$; the figure marks $\beta = 0.63$.]
EXAMPLE 6.4.3
Four measurements, $k_1, k_2, k_3, k_4$, are taken on a Poisson random variable, $X$, where $p_X(k; \lambda) = e^{-\lambda}\lambda^k/k!$, $k = 0, 1, 2, \ldots$, for the purpose of testing
\[ H_0: \lambda = 0.8 \quad \text{versus} \quad H_1: \lambda > 0.8 \]
What decision rule should be used if the level of significance is to be 0.10, and what will be the power of the test when $\lambda = 1.2$?
From Example 5.6.1, we know that $\bar{X}$ is a sufficient statistic for $\lambda$; the same would be true, of course, for $\sum_{i=1}^{4} X_i$. It will be more convenient to state the decision rule in terms of the latter because we already know the probability model that describes its behavior: If $X_1, X_2, X_3, X_4$ are four independent Poisson random variables, each with parameter $\lambda$, then $\sum_{i=1}^{4} X_i$ has a Poisson distribution with parameter $4\lambda$ (recall Example 3.12.10).
Figure 6.4.9 is a MINITAB printout of the Poisson probability function having $\lambda = 3.2$, which would be the sampling distribution of $\sum_{i=1}^{4} X_i$ when $H_0: \lambda = 0.8$ is true.

    MTB > pdf;
    SUBC > poisson 3.2.
    Probability Density Function
    Poisson with mu = 3.20000

     x    P(X = x)
     0    0.0408
     1    0.1304
     2    0.2087
     3    0.2226
     4    0.1781
     5    0.1140
     6    0.0608   (critical region: x >= 6)
     7    0.0278
     8    0.0111
     9    0.0040
    10    0.0013
    11    0.0004
    12    0.0001
    13    0.0000

    alpha = P(reject H0 | H0 is true) = 0.1055

FIGURE 6.4.9

    MTB > pdf;
    SUBC > poisson 4.8.
    Probability Density Function
    Poisson with mu = 4.80000

     x    P(X = x)
     0    0.0082
     1    0.0395
     2    0.0948
     3    0.1517
     4    0.1820
     5    0.1747
     6    0.1398
     7    0.0959
     8    0.0575
     9    0.0307
    10    0.0147
    11    0.0064
    12    0.0026
    13    0.0009
    14    0.0003
    15    0.0001
    16    0.0000

    1 - beta = P(reject H0 | H1 is true) = 0.3489

FIGURE 6.4.10
By inspection, the decision rule "Reject $H_0: \lambda = 0.8$ if $\sum_{i=1}^{4} k_i \ge 6$" gives an $\alpha$ close to the desired 0.10.
If $H_1$ is true and $\lambda = 1.2$, $\sum_{i=1}^{4} X_i$ will have a Poisson distribution with a parameter equal to 4.8. According to Figure 6.4.10, the probability that the sum of a random sample of size four from such a distribution would equal or exceed 6 (i.e., $1 - \beta$ when $\lambda = 1.2$) is 0.3489.
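The two MINITAB tail sums in Example 6.4.3 can be reproduced without a table. The sketch below uses only Python's standard library; the helper names `poisson_pmf` and `poisson_upper_tail` are hypothetical and chosen for illustration.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_upper_tail(cutoff, mean):
    """P(X >= cutoff), computed as one minus the lower tail."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(cutoff))


n = 4                                      # number of Poisson measurements
alpha = poisson_upper_tail(6, n * 0.8)     # sum is Poisson(3.2) under H0
power = poisson_upper_tail(6, n * 1.2)     # sum is Poisson(4.8) when lambda = 1.2

print(round(alpha, 3), round(power, 3))
```

Because the printouts round each probability to four decimals before summing, the values computed here can differ from 0.1055 and 0.3489 in the fourth decimal place.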
EXAMPLE 6.4.4
Suppose a random sample of seven observations is taken from the pdf $f_Y(y; \theta) = (\theta + 1)y^\theta$, $0 \le y \le 1$, to test
\[ H_0: \theta = 2 \quad \text{versus} \quad H_1: \theta > 2 \]
As a decision rule, the experimenter plans to record $X$, the number of $y_i$'s that exceed 0.9, and reject $H_0$ if $X \ge 4$. What proportion of the time would such a decision rule lead to a Type I error?
To evaluate $\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$, we first need to recognize that $X$ is a binomial random variable where $n = 7$ and the parameter $p$ is an area under $f_Y(y; \theta = 2)$:
\begin{align*}
p &= P(Y \ge 0.9 \mid H_0 \text{ is true}) \\
&= P(Y \ge 0.9 \mid f_Y(y; 2) = 3y^2) \\
&= \int_{0.9}^{1} 3y^2\, dy \\
&= 0.271
\end{align*}
It follows, then, that $H_0$ will be incorrectly rejected 9.2% of the time:
\begin{align*}
\alpha &= P(X \ge 4 \mid \theta = 2) \\
&= \sum_{k=4}^{7} \binom{7}{k} (0.271)^k (0.729)^{7-k} \\
&= 0.092
\end{align*}
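As a quick check on Example 6.4.4, the binomial tail can be summed directly. This is a minimal sketch using the standard library's `math.comb`; the function name `binomial_upper_tail` is hypothetical.

```python
import math


def binomial_upper_tail(n, p, cutoff):
    """P(X >= cutoff) for a binomial random variable with parameters n and p."""
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(cutoff, n + 1)
    )


# p is the area under f_Y(y; 2) = 3y^2 to the right of 0.9, i.e. 1 - 0.9^3.
p = 1.0 - 0.9 ** 3
alpha = binomial_upper_tail(n=7, p=p, cutoff=4)

print(round(p, 3), round(alpha, 3))
```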
Comment. The basic notions of Type I and Type II errors first arose in a quality control context. The pioneering work was done at the Bell Telephone Laboratories: There the terms producer's risk and consumer's risk were introduced for what we now call $\alpha$ and $\beta$. Eventually, these ideas were generalized by Neyman and Pearson in the 1930s and evolved into the theory of hypothesis testing as we know it today.
QUESTIONS
6.4.1. Recall the "Math for the Twenty-First Century" hypothesis test done in Example 6.2.1. Calculate the power of that test when the true mean is 500.
6.4.2. Carry out the details to verify the decision rule change cited on page 451 in connection with Figure 6.4.6.
6.4.3. For the decision rule found in Question 6.2.2 to test $H_0: \mu = 95$ versus $H_1: \mu \ne 95$ at the $\alpha = 0.06$ level of significance, calculate $1 - \beta$ when $\mu = 90$.
6.4.4. Construct a power curve for the $\alpha = 0.05$ test of $H_0: \mu = 60$ versus $H_1: \mu \ne 60$ if the data consist of a random sample of size 16 from a normal distribution having $\sigma = 4$.
6.4.5. If $H_0: \mu = 240$ is tested against $H_1: \mu < 240$ at the $\alpha = 0.01$ level of significance with a random sample of 25 normally distributed observations, what proportion of the time will the procedure fail to recognize that $\mu$ has dropped to 220? Assume that $\sigma = 50$.
6.4.6. Suppose $n$ = […] observations are taken from a normal distribution where $\sigma = 8.0$ for the purpose of testing $H_0: \mu = 60$ versus $H_1: \mu \ne 60$ at the $\alpha = 0.07$ level of significance. The lead investigator skipped statistics class the day decision rules were being discussed and intends to reject $H_0$ if $\bar{y}$ falls in the region $(60 - \bar{y}^*, 60 + \bar{y}^*)$.
(a) Find $\bar{y}^*$.
(b) What is the power of the test when $\mu = 62$?
(c) What would the power of the test be when $\mu = 62$ if the critical region had been defined the correct way?
6.4.7. If $H_0: \mu = 200$ is to be tested against $H_1: \mu < 200$ at the $\alpha = 0.10$ level of significance based on a random sample of size $n$ from a normal distribution where $\sigma = 15.0$, what is the smallest value for $n$ that will make the power equal to at least 0.75 when $\mu = 197$?
6.4.8. Will $n = 45$ be a sufficiently large sample to test $H_0: \mu = 10$ versus $H_1: \mu \ne 10$ at the $\alpha = 0.05$ level of significance if the experimenter wants the Type II error probability to be no greater than 0.20 when $\mu = 12$? Assume that $\sigma = 4$.
6.4.9. If $H_0: \mu = 30$ is tested against $H_1: \mu > 30$ using $n = 16$ observations (normally distributed) and if $1 - \beta = 0.85$ when $\mu = 34$, what does $\alpha$ equal? Assume that $\sigma = 9$.
6.4.10. Suppose a sample of size 1 is taken from the pdf $f_Y(y) = (1/\lambda)e^{-y/\lambda}$, $y > 0$, for the purpose of testing
[…]
The null hypothesis will be rejected if $y \ge 3.20$.
(a) Calculate the probability of committing a Type I error.
(b) Calculate the probability of committing a Type II error when $\lambda$ = […].
(c) Draw a diagram that shows the $\alpha$ and $\beta$ calculated in Parts (a) and (b) as areas.
6.4.11. Polygraphs used in criminal investigations typically measure five bodily functions: (1) thoracic respiration, (2) abdominal respiration, (3) blood pressure and pulse rate, (4) muscular movement and […], and (5) galvanic skin response. In principle, the magnitude of these responses when the subject is asked a relevant question ("Did you murder your wife?") indicates whether he is lying or telling the truth. The procedure, of course, is not infallible, as a recent study bore out (80). Seven experienced polygraph examiners were given a set of 40 records: 20 were from innocent suspects and 20 from guilty suspects. The subjects had been asked 11 questions, on the basis of which each examiner was to make an overall judgment: "Innocent" or "Guilty." The results are as follows:
Examiner's
"Innocent"
Decision
«Guilty"
What would be the numerical values of $\alpha$ and $\beta$ in this context? In a judicial setting, should Type I and Type II errors carry equal weight? Explain.
6.4.12. An urn contains 10 chips. An unknown number of the chips are white; the others are red. We wish to test
$H_0$: exactly half the chips are white
versus
$H_1$: more than half the chips are white
We will draw, without replacement, three chips and reject $H_0$ if two or more are white. Find $\alpha$. Also, find $\beta$ when the urn is (a) 60% white and (b) 70% white.
6.4.13. Suppose that a random sample of size 5 is drawn from a uniform pdf
\[ f_Y(y; \theta) = \begin{cases} 1/\theta, & 0 < y < \theta \\ 0, & \text{elsewhere} \end{cases} \]
We wish to test
\[ H_0: \theta = 2 \quad \text{versus} \quad H_1: \theta > 2 \]
by rejecting the null hypothesis if $y_{\max} \ge k$. Find the value of $k$ that makes the probability of committing a Type I error equal to 0.05.
6.4.14. A sample of size 1 is taken from the pdf
\[ f_Y(y) = (\theta + 1)y^\theta, \quad 0 \le y \le 1 \]
The hypothesis $H_0: \theta = 1$ is to be rejected in favor of $H_1: \theta > 1$ if $y \ge 0.90$. What is the test's level of significance?
6.4.15. A series of $n$ Bernoulli trials is to be observed as data for testing
\[ H_0: p = \tfrac{1}{2} \quad \text{versus} \quad H_1: p > \tfrac{1}{2} \]
The null hypothesis will be rejected if $k$, the observed number of successes, equals $n$. For what value of $p$ will the probability of committing a Type II error equal 0.05?
6.4.16. Let $X_1$ be a binomial random variable with $n = 2$ and $p_{X_1} = P(\text{success})$. Let $X_2$ be an independent binomial random variable with $n = 4$ and $p_{X_2} = P(\text{success})$. Let $X = X_1 + X_2$. Calculate $\alpha$ if
\[ H_0: p_{X_1} = p_{X_2} = \tfrac{1}{2} \quad \text{versus} \quad H_1: p_{X_1} = p_{X_2} > \tfrac{1}{2} \]
is to be tested by rejecting the null hypothesis when $k \ge 5$.
6.4.17. A sample of size 1 from the pdf $f_Y(y) = (1 + \theta)y^\theta$, $0 \le y \le 1$, is to be the basis for testing
\[ H_0: \theta = 1 \quad \text{versus} \quad H_1: \theta < 1 \]
The critical region will be the interval $y \le$ […]. Find an expression for $1 - \beta$ as a function of $\theta$.
6.4.18. An experimenter takes a sample of size 1 from the Poisson probability model, $p_X(k) = e^{-\lambda}\lambda^k/k!$, $k = 0, 1, 2, \ldots$, and wishes to test
\[ H_0: \lambda = 6 \quad \text{versus} \quad H_1: \lambda < 6 \]
by rejecting $H_0$ if $k \le 2$.
(a) Calculate the probability of committing a Type I error.
(b) Calculate the probability of committing a Type II error when $\lambda = 4$.
6.4.19. A sample of size 1 is taken from the geometric probability model, $p_X(k) = (1 - p)^{k-1}p$, $k = 1, 2, 3, \ldots$, to test $H_0: p$ = […] versus $H_1: p >$ […]. The null hypothesis is to be rejected if $k \le 4$. What is the probability that a Type II error will be committed when $p$ = […]?
6.4.20. Suppose that one observation from the exponential pdf, $f_Y(y) = \lambda e^{-\lambda y}$, $y > 0$, is to be used to test $H_0: \lambda = 1$ versus $H_1: \lambda < 1$. The decision rule calls for the null hypothesis to be rejected if $y \ge \ln 10$. Find $\beta$ as a function of $\lambda$.
6.4.21. A random sample of size 2 is drawn from a uniform pdf defined over the interval $[0, \theta]$. We wish to test
\[ H_0: \theta = 2 \quad \text{versus} \quad H_1: \theta < 2 \]
by rejecting $H_0$ when $y_1 + y_2 \le k$. Find the value for $k$ that gives a level of significance of 0.05.
6.4.22. Suppose that the hypotheses of Question 6.4.21 are to be tested with a decision rule of the form, "Reject $H_0: \theta = 2$ if $y_1 y_2 \le k^*$." Find the value of $k^*$ that gives a level of significance of 0.05 (see Theorem 3.8.3).
6.5 A NOTION OF OPTIMALITY: THE GENERALIZED LIKELIHOOD RATIO
In the next several chapters we will be studying some of the particular hypothesis tests that statisticians most often use in dealing with real-world problems. All of these have the same conceptual heritage: a fundamental notion known as the generalized likelihood ratio, or GLR. More than just a principle, the generalized likelihood ratio is a working criterion for actually suggesting test procedures.
As a first look at this important idea, we will conclude Chapter 6 with an application of the generalized likelihood ratio to the problem of testing the parameter $\theta$ in a uniform pdf. Notice the relationship here between the likelihood ratio and the definition of an "optimal" hypothesis test.
Suppose $y_1, y_2, \ldots, y_n$ is a random sample from a uniform pdf over the interval $[0, \theta]$, where $\theta$ is unknown, and our objective is to test
\[ H_0: \theta = \theta_0 \quad \text{versus} \quad H_1: \theta < \theta_0 \]
at a specified level of significance $\alpha$. What is the "best" decision rule for choosing between $H_0$ and $H_1$, and by what criterion is it considered optimal?
As a starting point in answering those questions, it will be necessary to define two parameter spaces, $\omega$ and $\Omega$. In general, $\omega$ is the set of unknown parameter values admissible under $H_0$. In the case of the uniform, the only parameter is $\theta$, and the null hypothesis restricts it to a single point:
\[ \omega = \{\theta: \theta = \theta_0\} \]
The second parameter space, $\Omega$, is the set of all possible values of all unknown parameters. Here,
\[ \Omega = \{\theta: 0 < \theta \le \theta_0\} \]
Now, recall the definition of the likelihood function, $L$, from Definition 5.2.1. Given a sample of size $n$ from a uniform pdf,
\[ L = L(\theta) = \prod_{i=1}^{n} f_Y(y_i; \theta) = \begin{cases} \left(\dfrac{1}{\theta}\right)^n, & 0 \le y_i \le \theta \\ 0, & \text{otherwise} \end{cases} \]
For reasons that will soon be clear, we need to maximize $L(\theta)$ twice, once under $\omega$ and again under $\Omega$. Since $\theta$ can take on only one value, $\theta_0$, under $\omega$,
\[ \max_{\omega} L(\theta) = L(\theta_0) = \begin{cases} \left(\dfrac{1}{\theta_0}\right)^n, & 0 \le y_i \le \theta_0 \\ 0, & \text{otherwise} \end{cases} \]
Maximizing $L(\theta)$ under $\Omega$ (that is, with no restrictions) is accomplished by simply substituting the maximum likelihood estimate for $\theta$ into $L(\theta)$. For the uniform parameter, $y_{\max}$ is the maximum likelihood estimate (recall Question 5.2.9). Therefore,
\[ \max_{\Omega} L(\theta) = \left(\frac{1}{y_{\max}}\right)^n \]
For notational simplicity, we denote $\max_{\omega} L(\theta)$ and $\max_{\Omega} L(\theta)$ by $L(\omega_e)$ and $L(\Omega_e)$, respectively.
Definition 6.5.1. Let $y_1, y_2, \ldots, y_n$ be a random sample from $f_Y(y; \theta_1, \ldots, \theta_k)$. The generalized likelihood ratio, $\lambda$, is defined to be
\[ \lambda = \frac{\max_{\omega} L(\theta_1, \ldots, \theta_k)}{\max_{\Omega} L(\theta_1, \ldots, \theta_k)} = \frac{L(\omega_e)}{L(\Omega_e)} \]
For the uniform distribution,
\[ \lambda = \frac{(1/\theta_0)^n}{(1/y_{\max})^n} = \left(\frac{y_{\max}}{\theta_0}\right)^n \]
Note that, in general, $\lambda$ will always be positive but never greater than one (why?). Furthermore, values of the likelihood ratio close to one suggest that the data are very compatible with $H_0$. That is, the observations are "explained" almost as well by the $H_0$ parameters as by any parameters [as measured by $L(\omega_e)$ and $L(\Omega_e)$]. For these values of $\lambda$ we should accept $H_0$. Conversely, if $L(\omega_e)/L(\Omega_e)$ were close to 0, the data would not be very compatible with the parameter values in $\omega$ and it would make sense to reject $H_0$.
Definition 6.5.2. A generalized likelihood ratio test (GLRT) is one that rejects $H_0$ whenever
\[ 0 < \lambda \le \lambda^* \]
where $\lambda^*$ is chosen so that
\[ P(0 < \Lambda \le \lambda^* \mid H_0 \text{ is true}) = \alpha \]
[Note: In keeping with the capital letter notation introduced in Chapter 3, $\Lambda$ denotes the generalized likelihood ratio expressed as a random variable.]
Let $f_\Lambda(\lambda \mid H_0)$ denote the pdf of the generalized likelihood ratio when $H_0$ is true. If $f_\Lambda(\lambda \mid H_0)$ were known, $\lambda^*$ (and, therefore, the decision rule) could be determined by solving the equation
\[ \alpha = \int_0^{\lambda^*} f_\Lambda(\lambda \mid H_0)\, d\lambda \]
(see Figure 6.5.1). In many situations, though, $f_\Lambda(\lambda \mid H_0)$ is not known, and it becomes necessary to show that $\Lambda$ is a monotonic function of some quantity $W$, where the distribution of $W$ is known. Once we have found such a statistic, any test based on $w$ will be equivalent to one based on $\lambda$.
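Here, a suitable $W$ is easy to find. Note that
\begin{align*}
P(\Lambda \le \lambda^* \mid H_0 \text{ is true}) = \alpha &= P\left[\left(\frac{Y_{\max}}{\theta_0}\right)^n \le \lambda^* \,\Big|\, H_0 \text{ is true}\right] \\
&= P\left(\frac{Y_{\max}}{\theta_0} \le \sqrt[n]{\lambda^*} \,\Big|\, H_0 \text{ is true}\right)
\end{align*}
[Figure 6.5.1: pdf $f_\Lambda(\lambda \mid H_0)$, with the area $\alpha$ to the left of $\lambda^*$.]
Let $W = Y_{\max}/\theta_0$ and $w^* = \sqrt[n]{\lambda^*}$. Then
\[ P(\Lambda \le \lambda^* \mid H_0 \text{ is true}) = P(W \le w^* \mid H_0 \text{ is true}) \tag{6.5.1} \]
Here the right-hand side of Equation 6.5.1 can be evaluated from what we already know about the density function for the largest order statistic from a uniform distribution. Let $f_{Y_{\max}}(y; \theta_0)$ be the density function for $Y_{\max}$. Then
\[ f_W(w; \theta_0) = \theta_0 f_{Y_{\max}}(\theta_0 w; \theta_0) \quad \text{(recall Theorem 3.4.3)} \]
which, from Example 3.10.2, reduces to
\[ \frac{\theta_0\, n (\theta_0 w)^{n-1}}{\theta_0^n} = n w^{n-1}, \quad 0 \le w \le 1 \]
Therefore,
\[ P(W \le w^* \mid H_0 \text{ is true}) = \int_0^{w^*} n w^{n-1}\, dw = (w^*)^n = \alpha \]
implying that the critical value for $W$ is
\[ w^* = \sqrt[n]{\alpha} \]
That is, the GLRT calls for $H_0$ to be rejected if
\[ w = \frac{y_{\max}}{\theta_0} \le \sqrt[n]{\alpha} \]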
Comment. The GLR is applied to other hypothesis-testing situations in a manner similar to what was described here: First we find $L(\omega_e)$ and $L(\Omega_e)$, then $\lambda$, and finally $W$. The algebra involved, though, usually becomes considerably more formidable. For example, in the "normal" model taken up in Chapters 7 and 8, both parameter spaces are two-dimensional and the likelihood function is a product of densities of the form
\[ \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(1/2)[(y - \mu)/\sigma]^2}, \quad -\infty < y < \infty \]
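For the uniform case worked out in this section, the GLRT reduces to comparing $y_{\max}/\theta_0$ with $\sqrt[n]{\alpha}$. The sketch below is an illustration only: it encodes that rule and uses a seeded Monte Carlo simulation to check that the rejection rate under $H_0$ is close to $\alpha$. The function names, the number of trials, and the seed are all assumptions made for the example.

```python
import random


def glrt_rejects(sample, theta0, alpha):
    """GLRT for H0: theta = theta0 vs. H1: theta < theta0, uniform pdf on [0, theta]."""
    n = len(sample)
    w = max(sample) / theta0            # W = Y_max / theta0
    return w <= alpha ** (1.0 / n)      # critical value is the nth root of alpha


def estimated_type_i_error(theta0, n, alpha, trials=20000, seed=1):
    """Fraction of samples drawn under H0 for which the GLRT rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.uniform(0.0, theta0) for _ in range(n)]
        if glrt_rejects(sample, theta0, alpha):
            rejections += 1
    return rejections / trials


rate = estimated_type_i_error(theta0=2.0, n=8, alpha=0.10)
print(rate)
```

With $\theta_0 = 2.0$, $n = 8$, and $\alpha = 0.10$, the rule is the same one used in Example 6.4.2 (reject when the largest observation is at most about 1.50), so the simulated rate should land near 0.10.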
QUESTIONS
6.5.1. Let $k_1, k_2, \ldots, k_n$ be a random sample from the geometric probability function
\[ p_X(k; p) = (1 - p)^{k-1}p, \quad k = 1, 2, \ldots \]
Find $\lambda$, the generalized likelihood ratio for testing $H_0: p = p_0$ versus $H_1: p \ne p_0$.
6.5.2. Let $y_1, y_2, \ldots, y_{10}$ be a random sample from an exponential pdf with unknown parameter $\lambda$. Find the form of the GLRT for $H_0: \lambda = \lambda_0$ versus $H_1: \lambda \ne \lambda_0$. What integral would have to be evaluated to determine the critical value if $\alpha$ were equal to 0.05?
6.5.3. Let $y_1, y_2, \ldots, y_n$ be a random sample from a normal pdf with unknown mean $\mu$ and variance 1. Find the form of the GLRT for $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$.
6.5.4. In the scenario of Question 6.5.3, suppose the alternative hypothesis is $H_1: \mu = \mu_1$, for some particular value of $\mu_1$. How does the likelihood ratio test change in this case? In what way does the critical region depend on the particular value of $\mu_1$?
6.5.5. Let $k$ denote the number of successes observed in a sequence of $n$ independent Bernoulli trials, where $p = P(\text{success})$.
(a) Show that the critical region of the likelihood ratio test of $H_0: p = \tfrac{1}{2}$ versus $H_1: p \ne \tfrac{1}{2}$ can be written in the form
\[ k \cdot \ln(k) + (n - k) \cdot \ln(n - k) \ge \lambda^{**} \]
(b) Use the symmetry of the graph of
\[ f(k) = k \cdot \ln(k) + (n - k) \cdot \ln(n - k) \]
to show that the critical region can be written in the form
[…]
where $c$ is a constant determined by $\alpha$.
6.5.6. Suppose a sufficient statistic exists for the parameter $\theta$. Use Theorem 5.6.1 to show that the critical region of a likelihood ratio test will depend on the sufficient statistic.
6.6 TAKING A SECOND LOOK AT STATISTICS (STATISTICAL SIGNIFICANCE VERSUS "PRACTICAL" SIGNIFICANCE)
The most important concept in this chapter, the notion of statistical significance, is also the most problematic. Why? Because statistical significance does not always mean what it seems to mean. By definition, the difference between, say, $\bar{y}$ and $\mu_0$ is statistically significant if $H_0: \mu = \mu_0$ can be rejected at the $\alpha = 0.05$ level. What that implies is that a sample mean equal to the observed $\bar{y}$ is not likely to have come from a (normal) distribution whose true mean was $\mu_0$. What it does not imply is that the true mean is necessarily much different than $\mu_0$.
Recall the discussion of power curves in Section 6.4 and, in particular, the effect of $n$ on $1 - \beta$. The example illustrating those topics involved an additive that might be able to increase a car's gas mileage. The hypotheses tested were
\[ H_0: \mu = 25.0 \quad \text{vs.} \quad H_1: \mu > 25.0 \]
where $\sigma$ was assumed to be 2.4 (mpg) and $\alpha$ was set at 0.05. If $n = 30$, the decision rule called for $H_0$ to be rejected when $\bar{y} \ge 25.718$ (see p. 447). Figure 6.6.1 is the test's power curve (the point $(\mu, 1 - \beta)$ = ([…], 0.47) was calculated on p. 448).
The important point was made in Section 6.4 that researchers have a variety of ways to increase the power of a test, that is, to decrease the probability of committing a Type II error. Experimentally, the usual way is to increase the sample size, which has the effect of reducing the overlap between the $H_0$ and $H_1$ distributions (Figure 6.4.7 pictured such a reduction when the sample size was kept fixed but $\sigma$ was decreased from 2.4 to 1.2). Figure 6.6.2 superimposes the power curves for testing $H_0: \mu = 25.0$ versus $H_1: \mu > 25.0$ to show the effect of $n$ on $1 - \beta$ in the cases where $n = 30$, $n = 60$, and $n = 900$ (keeping $\alpha = 0.05$ and $\sigma = 2.4$).
[Figure 6.6.1: power curve, $1 - \beta$ versus $\mu$, for the test with $n = 30$.]
[Figure 6.6.2: superimposed power curves, $1 - \beta$ versus $\mu$, for $n = 30$, $n = 60$, and $n = 900$.]
There is good news in Figure 6.6.2 and there is bad news in Figure 6.6.2. The good news, not surprisingly, is that the probability of rejecting a false hypothesis increases dramatically as $n$ increases. If the true mean $\mu$ is 25.25, for example, the $Z$ test will (correctly) reject $H_0: \mu = 25.0$ 14% of the time when $n = 30$, 20% of the time when $n = 60$, and a robust 93% of the time when $n = 900$.
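The three percentages quoted for $\mu = 25.25$ follow from the usual power formula for a one-sided $Z$ test, $1 - \beta = P\left(Z \ge z_\alpha - (\mu_1 - \mu_0)\sqrt{n}/\sigma\right)$. A minimal Python sketch, using the standard library's `NormalDist` in place of the normal table (the function name `z_test_power` is hypothetical):

```python
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha):
    """Power of the one-sided Z test of H0: mu = mu0 vs. H1: mu > mu0 at mu = mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)      # P(Z >= z_alpha) = alpha
    shift = (mu1 - mu0) * n ** 0.5 / sigma         # standardized distance from H0
    return 1.0 - std_normal.cdf(z_alpha - shift)


for n in (30, 60, 900):
    print(n, round(z_test_power(25.0, 25.25, 2.4, n, 0.05), 2))
```

Because `shift` grows like $\sqrt{n}$, the power can be pushed toward one for any fixed $\mu_1 > \mu_0$, however close to $\mu_0$, simply by making $n$ large enough.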
The bad news implicit in Figure 6.6.2 (and, for some, this is the "[…] side" of hypothesis testing) is that any false hypothesis, even one where the true $\mu$ is just "epsilon" away from $\mu_0$, can be rejected virtually 100% of the time if a large enough sample size is used. Why is that bad? Because saying that a difference (between $\bar{y}$ and $\mu_0$) is statistically significant makes it sound meaningful when, in fact, it may be totally inconsequential.
Suppose, for example, an additive could be found that would increase a car's gas mileage from 25.000 mpg to 25.001 mpg. Such a minuscule improvement would mean basically nothing to the consumer, yet if a large enough sample size were used, the probability of rejecting $H_0: \mu = 25.000$ in favor of $H_1: \mu > 25.000$ could be made arbitrarily close to one. That is, the difference between $\bar{y}$ and 25.000 would qualify as being statistically significant even though it had no "practical significance" whatsoever.
Two lessons should be learned here, one old and one new. The new lesson is to be wary of inferences drawn from experiments or surveys based on huge sample sizes. Many statistically significant conclusions are likely to result in those situations, but some of those "reject $H_0$s" may be driven primarily by the sample size. Paying attention to the magnitude of $\bar{y} - \mu_0$ (or $\frac{k}{n} - p_0$) is often a good way to keep the conclusion of a hypothesis test in perspective.
The second lesson has been encountered before and will come up again: Analyzing data is not a simple exercise in plugging into formulas or reading computer printouts. Real-world data are seldom simple, and they cannot be adequately summarized, quantified, or interpreted with any single statistical technique. Hypothesis tests, like every other inference procedure, have strengths and weaknesses, assumptions and limitations. Being aware of what they can tell us, and how they can trick us, is the first step toward using them properly.
CHAPTER 7
The Normal Distribution
7.1 INTRODUCTION
7.2 COMPARING $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ AND $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$
7.3 DERIVING THE DISTRIBUTION OF $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$
7.4 DRAWING INFERENCES ABOUT $\mu$
7.5 DRAWING INFERENCES ABOUT $\sigma^2$
7.6 TAKING A SECOND LOOK AT STATISTICS ("BAD" ESTIMATORS)
APPENDIX 7.A.1 MINITAB APPLICATIONS
APPENDIX 7.A.2 SOME DISTRIBUTION RESULTS FOR $\bar{Y}$ AND $S^2$
APPENDIX 7.A.3 A PROOF OF THEOREM 7.5.2
APPENDIX 7.A.4 A PROOF THAT THE ONE-SAMPLE t TEST IS A GLRT
I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the "law of frequency of error" (the normal distribution). The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self effacement amidst the wildest confusion. The huger the mob, and the greater the anarchy, the more perfect is its sway. It is the supreme law of Unreason.
-Francis Galton
7.1 INTRODUCTION
Finding probability distributions to describe (and, ultimately, to predict) empirical data is one of the most important contributions a statistician can make to the research scientist. Already we have seen a number of functions playing that role. The binomial is an obvious model for the number of correct responses in the Pratt-Woodruff ESP experiment (Case Study 4.3.1); the probability of holding a winning ticket in a game of Keno is given by the hypergeometric (Example 3.2.5); and applications of the Poisson have run the gamut from radioactive decay (Case Study 4.2.2) to Saturday afternoon football fumbles (Case Study 4.2.3). Those examples notwithstanding, by far the most widely used probability model in statistics is the normal (or Gaussian) distribution,
\[ f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(1/2)[(y - \mu)/\sigma]^2}, \quad -\infty < y < \infty \tag{7.1.1} \]
Some of the history surrounding the normal curve has already been discussed in Chapter 4: how it first appeared as a limiting form of the binomial, but then soon found itself used most often in non-binomial situations. We also learned how to find areas under normal curves and did some problems involving sums and averages. Chapter 5 provided estimates of the parameters of the normal density and showed their role in fitting normal curves to data. In this chapter, we will take a second look at the properties and applications of this singularly important pdf, this time paying attention to the part it plays in estimation and hypothesis testing.
7.2 COMPARING $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ AND $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$
Suppose that a random sample of $n$ measurements, $Y_1, Y_2, \ldots, Y_n$, is to be taken on a trait that is thought to be normally distributed, the objective being to draw an inference about the underlying pdf's true mean, $\mu$. If the variance $\sigma^2$ is known, we already know how to proceed: A decision rule for testing $H_0: \mu = \mu_0$ is given in Theorem 6.2.1, and the construction of a confidence interval for $\mu$ is described in Section 5.3. As we learned, both of those procedures are based on the fact that the ratio $Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ has a standard normal distribution, $f_Z(z)$.
In practice, though, the parameter $\sigma^2$ is seldom known, so the ratio $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ cannot be calculated, even if a value for the mean (say, $\mu_0$) is substituted for $\mu$. Typically, the only information experimenters have about $\sigma^2$ is what can be gleaned from the $Y_i$'s themselves. The usual estimator for the population variance, of course, is
\[ S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \]
the unbiased version of the maximum likelihood estimator for $\sigma^2$. The question is, what effect does replacing $\sigma$ with $S$ have on the $Z$ ratio? Are there probabilistic differences between
\[ \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \quad \text{and} \quad \frac{\bar{Y} - \mu}{S/\sqrt{n}}\,? \]
Historically, many early practitioners of statistics felt that replacing $\sigma$ with $S$ had, in fact, no effect on the distribution of the $Z$ ratio. Sometimes they were right. If the sample size is very large (which was not an unusual state of affairs in many of the early applications of statistics), the estimator $S$ is essentially a constant and for all intents and purposes equal to the true $\sigma$. Under those conditions, the ratio $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ will behave much like a standard normal random variable, $Z$. When the sample size $n$ is small, though, replacing $\sigma$ with $S$ does matter, and it changes the way we draw inferences about $\mu$.
Credit for recognizing that $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ and $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ do not have the same distribution goes to William Sealy Gossett. After graduating in 1899 from Oxford with a First Class degree in Chemistry, Gossett took a position at Arthur Guinness, Son & Co., Ltd., a firm that brewed a thick dark ale known as stout. Given the task of making the art of brewing more scientific, Gossett quickly realized that any experimental studies would necessarily face two obstacles. First, for a variety of economic and logistical reasons, sample sizes would invariably be small; and second, there would never be any way to know the exact value of the true variance, $\sigma^2$, associated with any set of measurements.
So, when the objective of a study was to draw an inference about $\mu$, Gossett found himself working with the ratio $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$, where $n$ was often on the order of four or five. The more he encountered that situation, the more he became convinced that ratios of that sort are not adequately described by the standard normal pdf. In particular, the distribution of $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ seemed to have the same general bell-shaped configuration as $f_Z(z)$, but the tails were "thicker"; that is, ratios much smaller than zero or much greater than zero were not as rare as the standard normal pdf would predict.
Figure 7.2.1 illustrates the distinction between the distributions of $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ and $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ that caught Gossett's attention. In Figure 7.2.1a, 500 samples of size $n = 4$ have been drawn from a normal distribution where the value of $\sigma$ is known. For each sample, the ratio $\frac{\bar{Y} - \mu}{\sigma/\sqrt{4}}$ has been computed. Superimposed over the shaded histogram of those five hundred ratios is the standard normal curve, $f_Z(z)$. Clearly, the probabilistic behavior of the random variable $\frac{\bar{Y} - \mu}{\sigma/\sqrt{4}}$ is entirely consistent with $f_Z(z)$.
The histogram pictured in Figure 7.2.1b is also based on five hundred samples of size $n = 4$ drawn from a normal distribution. Here, though, $S$ has been calculated for each sample, so the ratios comprising the histogram are $\frac{\bar{Y} - \mu}{S/\sqrt{4}}$ rather than $\frac{\bar{Y} - \mu}{\sigma/\sqrt{4}}$. In this case, the superimposed standard normal pdf does not adequately describe the histogram; specifically, it underestimates the number of ratios much less than zero as well as the number much larger than zero (which is exactly what Gossett had noted).
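The experiment behind Figure 7.2.1 is easy to imitate. The sketch below is not the authors' simulation; it is a minimal Python version under assumed settings ($\mu = 0$, $\sigma = 1$, a fixed seed) that draws 500 samples of size 4, forms both ratios, and compares how often each falls more than two units from zero.

```python
import random
from statistics import fmean, stdev


def simulate_ratios(num_samples, n, mu=0.0, sigma=1.0, seed=0):
    """Return the lists of (Ybar - mu)/(sigma/sqrt(n)) and (Ybar - mu)/(S/sqrt(n))."""
    rng = random.Random(seed)
    z_ratios, t_ratios = [], []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = fmean(sample)
        s = stdev(sample)                  # sample standard deviation, divisor n - 1
        z_ratios.append((ybar - mu) / (sigma / n ** 0.5))
        t_ratios.append((ybar - mu) / (s / n ** 0.5))
    return z_ratios, t_ratios


def tail_fraction(ratios, cutoff=2.0):
    """Proportion of ratios whose absolute value exceeds the cutoff."""
    return sum(abs(r) > cutoff for r in ratios) / len(ratios)


z_ratios, t_ratios = simulate_ratios(num_samples=500, n=4)
print(tail_fraction(z_ratios), tail_fraction(t_ratios))
```

The second proportion should come out noticeably larger than the first, which is the "thicker tails" behavior described above; increasing `num_samples` makes the contrast more stable.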
[Figure 7.2.1: (a) observed distribution of $\frac{\bar{Y} - \mu}{\sigma/\sqrt{4}}$ (500 samples) with $f_Z(z)$ superimposed; (b) observed distribution of $\frac{\bar{Y} - \mu}{S/\sqrt{4}}$ (500 samples) with $f_Z(z)$ superimposed. Horizontal axes run from about -3 to 4.]
Gossett called the quotient $T = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$ a $t$ ratio, and published a paper in 1908 entitled "The Probable Error of a Mean" in which he derived a formula for the ratio's pdf, $f_T(t)$. Today, Gossett's work in finding $f_T(t)$ is considered to be one of the major statistical breakthroughs of the early twentieth century. We will derive $f_T(t)$ in the next section and get a first look at some of the many applications of the ratio $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$.
Comment. Initially, Gossett's derivation of $f_T(t)$ attracted very little attention. Virtually none of his contemporaries had the slightest inkling of the impact that the "$t$ distribution" would have in modern statistics. Indeed, fourteen years after his paper was published, Gossett sent a tabulation of his distribution to a fellow statistician (Ronald A. Fisher) with a note saying, "I am sending you a copy of Student's Tables as you are the only man that's ever likely to use them."
7.3 DERIVING THE DISTRIBUTION OF $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$
Broadly speaking, the set of probability functions that statisticians have occasion to use fall into two categories. There are a dozen or so that can effectively model the individual measurements taken on a variety of real-world phenomena. These are the distributions we studied in Chapters 3 and 4, most notably the normal, binomial, Poisson, exponential, hypergeometric, and uniform. There is a second set of probability distributions that model the behavior of functions based on sets of $n$ random variables. These are called sampling distributions, and the functions they model are typically used for inference purposes.
The normal distribution belongs to both categories. We have seen a number of scenarios (IQ scores, for example) where the Gaussian distribution is very effective at describing the distribution of repeated measurements. At the same time, the normal distribution is used to model the probabilistic behavior of $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$. In the latter capacity, it serves as a sampling distribution.
Next to the normal distribution, the three most important sampling distributions are the Student $t$ distribution, the chi square distribution, and the $F$ distribution. All three will be introduced in this section, because we need the latter two to derive $f_T(t)$, the pdf for the $t$ ratio, $T = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$. So, although our primary objective in this section is to study the Student $t$ distribution, we will in the process introduce the two other sampling distributions that we will be encountering over and over again in the chapters ahead.
Deriving the pdf for a $t$ ratio is not a simple matter. That may come as a surprise, given that deducing the pdf for $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ is quite easy (using moment-generating functions). But going from $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ to $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ creates some major mathematical complications because $T$ (unlike $Z$) is the ratio of two random variables, $\bar{Y}$ and $S$, both of which are functions of $n$ random variables, $Y_1, Y_2, \ldots, Y_n$. In general (and this ratio is no exception) finding pdfs of quotients of random variables is difficult, especially when the numerator and denominator random variables have cumbersome pdfs to begin with.
As we will see in the next few pages, the derivation of $f_T(t)$ plays out in several steps. First, we show that $\sum_{j=1}^{m} Z_j^2$, where the $Z_j$'s are independent standard normal random variables, has a gamma distribution (more specifically, a special case of the gamma distribution, called a chi square distribution). Then we show that $\bar{Y}$ and $S^2$, based on a random sample of size $n$ from a normal distribution, are independent random variables and that $\frac{(n - 1)S^2}{\sigma^2}$ has a chi square distribution. Next we derive the pdf of the ratio of two independent chi square random variables (which is called the $F$ distribution). The final step in the proof is to show that
\[ T^2 = \left(\frac{\bar{Y} - \mu}{S/\sqrt{n}}\right)^2 \]
can be written as the quotient of two independent chi square random variables, making it a special case of the $F$ distribution. Knowing the latter allows us to deduce $f_T(t)$.
474
Chapter 7
The Normal Distribution
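Theorem 7.3.1. Let $U = \sum_{j=1}^{m} Z_j^2$, where $Z_1, Z_2, \ldots, Z_m$ are independent standard normal random variables. Then $U$ has a gamma distribution with $r = \frac{m}{2}$ and $\lambda = \frac{1}{2}$. That is,
$$f_U(u) = \frac{1}{2^{m/2}\,\Gamma\left(\frac{m}{2}\right)}\, u^{(m/2)-1} e^{-u/2}, \quad u \ge 0$$
Proof. First take $m = 1$. For any $u \ge 0$,
$$F_{Z^2}(u) = P(Z^2 \le u) = P(-\sqrt{u} \le Z \le \sqrt{u}) = 2P(0 \le Z \le \sqrt{u}) = \frac{2}{\sqrt{2\pi}} \int_0^{\sqrt{u}} e^{-z^2/2}\, dz$$
Differentiating both sides of the equation for $F_{Z^2}(u)$ gives $f_{Z^2}(u)$:
$$f_{Z^2}(u) = \frac{d}{du} F_{Z^2}(u) = \frac{2}{\sqrt{2\pi}} \cdot \frac{1}{2\sqrt{u}}\, e^{-u/2} = \frac{1}{2^{1/2}\,\Gamma\left(\frac{1}{2}\right)}\, u^{(1/2)-1} e^{-u/2}$$
Notice that $f_{Z^2}(u)$ has the form of a gamma pdf with $r = \frac{1}{2}$ and $\lambda = \frac{1}{2}$. By Theorem 4.6.4, then, the sum of $m$ such squares has the stated gamma distribution with $r = \frac{m}{2}$ and $\lambda = \frac{1}{2}$. □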
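As a rough numerical check on Theorem 7.3.1, the sketch below codes the stated pdf and integrates it with a simple midpoint rule. The function names, the choice m = 4, the upper limit 80, and the step count are illustrative assumptions, not part of the text; the point is only that the density should integrate to about 1 and have mean about m.

```python
import math

def chi_square_pdf(u, m):
    """pdf stated in Theorem 7.3.1: a gamma pdf with r = m/2 and lambda = 1/2.
    Returns 0 at and below zero (the single point u = 0 does not affect integrals)."""
    if u <= 0:
        return 0.0
    r = m / 2
    return u ** (r - 1) * math.exp(-u / 2) / (2 ** r * math.gamma(r))

def midpoint_integral(f, a, b, steps=20000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

m = 4  # degrees of freedom (illustrative choice)
total_area = midpoint_integral(lambda u: chi_square_pdf(u, m), 0.0, 80.0)
mean = midpoint_integral(lambda u: u * chi_square_pdf(u, m), 0.0, 80.0)
print(total_area, mean)
```

The upper limit of 80 is arbitrary; for m = 4 the area beyond it is negligible.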
The distribution of the sum of squares of independent standard normal random variables is sufficiently important that it gets its own name, despite the fact that it represents nothing more than a special case of the gamma distribution.
Definition 7.3.1. The pdf of $U = \sum_{j=1}^{m} Z_j^2$, where $Z_1, Z_2, \ldots, Z_m$ are independent standard normal random variables, is called the chi square distribution with $m$ degrees of freedom.
The next theorem is especially critical in the derivation of $f_T(t)$. Using simple algebra, it can be shown that the square of a t ratio can be written as the quotient of two chi square random variables, one a function of $\bar{Y}$ and the other a function of $S^2$. By showing that $\bar{Y}$ and $S^2$ are independent (as Theorem 7.3.2 does), Theorem 3.8.2 can be used to find an expression for the pdf of the quotient.
Theorem 7.3.2. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then
a. $S^2$ and $\bar{Y}$ are independent.
b. $\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ has a chi square distribution with $n - 1$ degrees of freedom.
Proof. See Appendix 7.A.2. □
As we will see shortly, the square of a t ratio is a special case of an F random variable. The next definition and theorem summarize the properties of the F distribution that we will need to find the pdf associated with the Student t distribution.
Definition 7.3.2. Suppose that $U$ and $V$ are independent chi square random variables with $n$ and $m$ degrees of freedom, respectively. A random variable of the form $\frac{V/m}{U/n}$ is said to have an F distribution with $m$ and $n$ degrees of freedom.
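Comment. The F in the name of this distribution commemorates the renowned statistician Sir Ronald Fisher.
Theorem 7.3.3. Suppose $F_{m,n} = \frac{V/m}{U/n}$ denotes an F random variable with $m$ and $n$ degrees of freedom. The pdf of $F_{m,n}$ has the form
$$f_{F_{m,n}}(w) = \frac{\Gamma\left(\frac{m+n}{2}\right) m^{m/2}\, n^{n/2}\, w^{(m/2)-1}}{\Gamma\left(\frac{m}{2}\right) \Gamma\left(\frac{n}{2}\right) (n + mw)^{(m+n)/2}}, \quad w \ge 0$$
Proof. We begin by finding the pdf for $V/U$. From Theorem 7.3.1 we know that
$$f_V(v) = \frac{1}{2^{m/2}\,\Gamma(m/2)}\, v^{(m/2)-1} e^{-v/2} \quad \text{and} \quad f_U(u) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, u^{(n/2)-1} e^{-u/2}$$
From Theorem 3.8.2, we have that the pdf of $W = V/U$ is
$$f_{V/U}(w) = \int_0^\infty |u|\, f_U(u)\, f_V(uw)\, du = \frac{w^{(m/2)-1}}{2^{(m+n)/2}\,\Gamma(n/2)\,\Gamma(m/2)} \int_0^\infty u^{\frac{m+n}{2}-1} e^{-[(1+w)/2]u}\, du$$
The integrand is the variable part of a gamma density with $r = (m + n)/2$ and $\lambda = (1 + w)/2$. Thus, the integral equals the inverse of the density's constants. This gives
$$f_{V/U}(w) = \frac{w^{(m/2)-1}}{2^{(m+n)/2}\,\Gamma(n/2)\,\Gamma(m/2)} \cdot \frac{\Gamma\left(\frac{m+n}{2}\right)}{[(1+w)/2]^{(m+n)/2}} = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma(n/2)\,\Gamma(m/2)} \cdot \frac{w^{(m/2)-1}}{(1+w)^{(m+n)/2}}$$
The statement of the theorem, then, follows from the rule for the pdf of a rescaled random variable derived in Chapter 3:
$$f_{\frac{V/m}{U/n}}(w) = f_{\frac{n}{m} V/U}(w) = \frac{1}{n/m}\, f_{V/U}\left(\frac{w}{n/m}\right) = \frac{m}{n}\, f_{V/U}\left(\frac{m}{n} w\right)$$
□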
F Tables
When graphed, an F distribution looks very much like a typical chi square distribution: values of $\frac{V/m}{U/n}$ can never be negative and the F pdf is skewed sharply to the right. Clearly, the complexity of $f_{F_{m,n}}(r)$ makes the function difficult to work with directly. Tables, though, are widely available that give various percentiles of F distributions for different values of $m$ and $n$.
Figure 7.3.1 shows $f_{F_{3,5}}(r)$. In general, the symbol $F_{p,m,n}$ will be used to denote the 100pth percentile of the F distribution with $m$ and $n$ degrees of freedom. Here, the 95th percentile of $f_{F_{3,5}}(r)$, that is, $F_{.95,3,5}$, is 5.41 (see Appendix Table A.4).
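The quoted percentile can be sanity-checked by integrating the pdf of Theorem 7.3.3 from 0 to 5.41 and confirming that the area is close to 0.95. The sketch below is only an illustration: the function names, the midpoint rule, and the step count are assumptions made here, and log-gamma is used simply to keep the constant numerically stable.

```python
import math

def f_pdf(w, m, n):
    """pdf of an F random variable with m and n df, as stated in Theorem 7.3.3."""
    if w <= 0:
        return 0.0
    log_const = (math.lgamma((m + n) / 2) - math.lgamma(m / 2) - math.lgamma(n / 2)
                 + (m / 2) * math.log(m) + (n / 2) * math.log(n))
    log_kernel = (m / 2 - 1) * math.log(w) - ((m + n) / 2) * math.log(n + m * w)
    return math.exp(log_const + log_kernel)

def f_cdf(x, m, n, steps=50000):
    """Midpoint-rule approximation to P(F_{m,n} <= x) for x > 0."""
    h = x / steps
    return h * sum(f_pdf((i + 0.5) * h, m, n) for i in range(steps))

area_below_5_41 = f_cdf(5.41, 3, 5)  # should be close to 0.95
print(area_below_5_41)
```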
FIGURE 7.3.1: The pdf $f_{F_{3,5}}(r)$; the area to the right of $F_{.95,3,5}$ (= 5.41) is 0.05.
Using the F Distribution to Derive the pdf for t Ratios
Now we have all the background results necessary to find the pdf of $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$. Actually, though, we can do better than that because what we have been calling the "t ratio" is just one special case of an entire family of quotients known as t ratios. Finding the pdf for that entire family will give us the probability distribution for $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ as well.
Definition 7.3.3. Let $Z$ be a standard normal random variable and let $U$ be a chi square random variable independent of $Z$ with $n$ degrees of freedom. The Student t ratio with $n$ degrees of freedom is denoted $T_n$, where
$$T_n = \frac{Z}{\sqrt{U/n}}$$
Comment. The term "degrees of freedom" is often abbreviated by df.
Lemma. The pdf for $T_n$ is symmetric: $f_{T_n}(t) = f_{T_n}(-t)$, for all $t$.
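Proof. Since $-T_n = \frac{-Z}{\sqrt{U/n}}$ is the ratio of a standard normal random variable to the square root of an independent chi square random variable divided by its degrees of freedom, it must have a Student t distribution with $n$ df, $f_{T_n}(t)$. But the pdf of $-T_n$ at the point $t$ is the pdf of $T_n$ at the point $-t$. Therefore, $f_{T_n}(-t) = f_{T_n}(t)$, for all $t$.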
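Theorem 7.3.4. The pdf for a Student t random variable with $n$ degrees of freedom is given by
$$f_{T_n}(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)\left(1 + \frac{t^2}{n}\right)^{(n+1)/2}}, \quad -\infty < t < \infty$$
Proof. Note that $T_n^2 = \frac{Z^2}{U/n}$ has an F distribution with 1 and $n$ df. Therefore,
$$f_{T_n^2}(t) = \frac{n^{n/2}\,\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{n}{2}\right)}\, t^{-1/2}\, \frac{1}{(n+t)^{(n+1)/2}}, \quad t > 0$$
Suppose that $t > 0$. By the symmetry of $f_{T_n}(t)$,
$$F_{T_n}(t) = P(T_n \le t) = \frac{1}{2} + P(0 \le T_n \le t) = \frac{1}{2} + \frac{1}{2} P(-t \le T_n \le t) = \frac{1}{2} + \frac{1}{2} P(0 \le T_n^2 \le t^2) = \frac{1}{2} + \frac{1}{2} F_{T_n^2}(t^2)$$
Differentiating $F_{T_n}(t)$ gives the stated result:
$$f_{T_n}(t) = F'_{T_n}(t) = t \cdot f_{T_n^2}(t^2) = t \cdot \frac{n^{n/2}\,\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{n}{2}\right)}\, (t^2)^{-1/2}\, \frac{1}{(n+t^2)^{(n+1)/2}} = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \cdot \frac{1}{\left(1 + \frac{t^2}{n}\right)^{(n+1)/2}}$$
□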
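The pdf in Theorem 7.3.4 is straightforward to code, which makes the symmetry property and the tail areas easy to check numerically. The sketch below is one possible illustration: the function names, the Simpson's-rule integrator, and the choice n = 5 are assumptions made here. The cutoff 2.5706 is the $t_{.025,5}$ value found in standard t tables, so the area between -2.5706 and 2.5706 should be close to 0.95.

```python
import math

def student_t_pdf(t, n):
    """pdf of a Student t random variable with n df (Theorem 7.3.4)."""
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c - ((n + 1) / 2) * math.log(1 + t * t / n))

def simpson(f, a, b, steps=4000):
    """Composite Simpson's rule over [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

n = 5  # degrees of freedom (illustrative choice)
is_symmetric = all(
    abs(student_t_pdf(t, n) - student_t_pdf(-t, n)) < 1e-15 for t in (0.3, 1.0, 2.5)
)
central_area = simpson(lambda t: student_t_pdf(t, n), -2.5706, 2.5706)
print(is_symmetric, central_area)
```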
Comment. Over the years, the lowercase t has come to be the accepted symbol for the random variable of Definition 7.3.3. We will follow that convention when the context allows some flexibility. In mathematical statements about distributions, though, we will be consistent with random variable notation and denote the Student t ratio as $T_n$.
All that remains to be verified, then, to accomplish our original goal of finding the pdf for $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ is to show that the latter is a special case of the Student t random variable described in Definition 7.3.3. Theorem 7.3.5 provides the details. Notice that a sample of size $n$ yields a t ratio in this case having $n - 1$ degrees of freedom.
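Theorem 7.3.5. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then
$$T_{n-1} = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$$
has a Student t distribution with $n - 1$ degrees of freedom.
Proof. We can rewrite $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ in the form
$$\frac{\bar{Y} - \mu}{S/\sqrt{n}} = \frac{\dfrac{\bar{Y} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2 (n-1)}}}$$
But $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ is a standard normal random variable and $\frac{(n-1)S^2}{\sigma^2}$ has a chi square distribution with $n - 1$ df. Moreover, Theorem 7.3.2 shows that $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ and $\frac{(n-1)S^2}{\sigma^2}$ are independent. The statement of the theorem follows immediately, then, from Definition 7.3.3. □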
$f_{T_n}(t)$ and $f_Z(z)$: How the Two pdf's Are Related
Despite the considerable disparity in the appearance of the formulas for $f_{T_n}(t)$ and $f_Z(z)$, Student t distributions and the standard normal distribution have much in common. Both are bell shaped, symmetric, and centered around zero. Student t curves, though, are flatter.
FIGURE 7.3.2: Student t pdfs with 2 df and 10 df, plotted together with the standard normal pdf $f_Z(z)$ (horizontal axis from -4 to 4; vertical axis from 0 to 0.4).
Figure 7.3.2 is a graph of two Student t distributions, one with 2 df and the other with 10 df. Also pictured is the standard normal pdf, $f_Z(z)$. Notice that as $n$ increases, $f_{T_n}(t)$ becomes more and more like $f_Z(z)$.
The convergence of $f_{T_n}(t)$ to $f_Z(z)$ is a consequence of two estimation properties: (1) The sample standard deviation $S$ is asymptotically unbiased for $\sigma$, and (2) the standard deviation of $S$ goes to 0 as $n$ goes to infinity (see Question 7.3.4). Therefore, as $n$ gets large, the probabilistic behavior of $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ will become increasingly similar to the distribution of $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$, that is, to $f_Z(z)$.
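The convergence just described can be made concrete by measuring the largest vertical gap between $f_{T_n}(t)$ and $f_Z(z)$ over a grid of points. The following sketch assumes the pdf of Theorem 7.3.4; the grid (from -4 to 4 in steps of 0.01) and the df values 2, 10, and 100 are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math

def student_t_pdf(t, n):
    """pdf of a Student t random variable with n df (Theorem 7.3.4)."""
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c - ((n + 1) / 2) * math.log(1 + t * t / n))

def standard_normal_pdf(z):
    """pdf of a standard normal random variable."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def max_gap(n):
    """Largest |f_Tn(t) - f_Z(t)| over a grid from -4 to 4 in steps of 0.01."""
    grid = [i / 100 for i in range(-400, 401)]
    return max(abs(student_t_pdf(t, n) - standard_normal_pdf(t)) for t in grid)

gaps = {n: max_gap(n) for n in (2, 10, 100)}
for n, gap in gaps.items():
    print(n, gap)
```

The gaps should shrink as the degrees of freedom grow, which is the pattern visible in Figure 7.3.2.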
QUESTIONS
7.3.1. Show directly, without appealing to the fact that $\chi^2_n$ is a gamma random variable, that $f_U(u)$ as stated in Definition 7.3.1 is a true probability density function.
7.3.2. Find the moment-generating function for a chi square random variable and use it to show that $E(\chi^2_n) = n$ and $\text{Var}(\chi^2_n) = 2n$.
7.3.3. Is it believable that the numbers 65, 30, and 55 are a random sample from a normal distribution with $\mu = 50$ and $\sigma = 10$? Answer the question by using a chi square distribution. Hint: Let $Z_i = (Y_i - 50)/10$ and use Theorem 7.3.1.
7.3.4. Use the fact that $(n - 1)S^2/\sigma^2$ is a chi square random variable with $n - 1$ df to prove that $\text{Var}(S^2) = \frac{2\sigma^4}{n - 1}$. Hint: Use the fact that the variance of a chi square random variable with $k$ df is $2k$.
7.3.5. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a normal distribution. Use the statement of Question 7.3.4 to prove that $S^2$ is consistent for $\sigma^2$.
7.3.6. If $Y$ is a chi square random variable with $n$ degrees of freedom, the pdf of $(Y - n)/\sqrt{2n}$ converges to $f_Z(z)$ as $n$ goes to infinity (recall Question 7.3.2). Use the asymptotic normality of $(Y - n)/\sqrt{2n}$ to approximate the fortieth percentile of a chi square random variable with 200 degrees of freedom.
7.3.7. Use Appendix Table A.4 to find
(a) $F_{.50,6,7}$
(b) $F_{.001,15,5}$
(c) $F_{.90,2,2}$
7.3.8. Let $V$ and $U$ be independent chi square random variables with 7 and 9 degrees of freedom, respectively. Is it more likely that $\frac{V/7}{U/9}$ will be between (1) 2.51 and 3.29 or (2) 3.29 and 4.20?
7.3.9. Use Appendix Table A.4 to find the values of $x$ that satisfy the following equations:
(a) P(O.l09 <
< x) = 0.95
(b)
< 1.69) = x
(e)
>
= 0.01
(d) P(0.115 < 6.x < 3.29) = 0.90
(e) $P\left(x < \frac{V/2}{U/3}\right) = 0.25$, where $V$ is a chi square random variable with 2 df and $U$ is an independent chi square random variable with 3 df.
7.3.10. Suppose that two independent samples of size $n$ are drawn from a normal distribution with variance $\sigma^2$. Let $S_1^2$ and $S_2^2$ denote the two sample variances. Use the fact that $\frac{(n-1)S^2}{\sigma^2}$ has a chi square distribution with $n - 1$ df to explain why
$$\lim_{\substack{m \to \infty \\ n \to \infty}} F_{m,n} = 1$$
7.3.11. If the random variable $F$ has an F distribution with $m$ and $n$ degrees of freedom, show that $1/F$ has an F distribution with $n$ and $m$ degrees of freedom.
7.3.12. Use the result claimed in Question 7.3.11 to express percentiles of $f_{F_{n,m}}(r)$ in terms of percentiles from $f_{F_{m,n}}(r)$. That is, if we know the values $a$ and $b$ for which $P(a \le F_{m,n} \le b) = q$, what values of $c$ and $d$ will satisfy the equation $P(c \le F_{n,m} \le d) = q$? "Check" your answer with Appendix Table A.4 by comparing the values of $F_{.05,2,8}$ and $F_{.95,8,2}$.
7.3.13. Show that as $n \to \infty$, the pdf of a Student t random variable with $n$ df converges to $f_Z(z)$. Hint: To show that the constant term in the pdf for $T_n$ converges to $1/\sqrt{2\pi}$, use Stirling's formula,
$$n! \doteq \sqrt{2\pi}\, n^{n + 1/2} e^{-n}$$
7.3.14. Evaluate the integral
$$\int_0^\infty \frac{1}{1 + x^2}\, dx$$
using the Student t distribution.
7.4 DRAWING INFERENCES ABOUT $\mu$
One of the most common of all statistical objectives is to draw inferences about the mean of the population being represented by a set of data. Indeed, we have already taken a first look at that problem in Section 6.2. If the $Y_i$'s come from a normal distribution where $\sigma$ is known, the null hypothesis $H_0$: $\mu = \mu_0$ can be tested by calculating a Z ratio, $\frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$ (recall Theorem 6.2.1).
Implicit in that solution, though, is an assumption not likely to be satisfied: rarely does the experimenter actually know the value of $\sigma$. Section 7.3 dealt with precisely that scenario and derived the pdf of the ratio $T_{n-1} = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$, where $\sigma$ has been replaced by an estimator, $S$. Given $T_{n-1}$ (which we learned has a Student t distribution with $n - 1$ degrees of freedom), we now have the tools necessary to draw inferences about $\mu$ in the all-important case where $\sigma$ is not known. Section 7.4 illustrates these various techniques and also examines the key assumption underlying the "t test" and looks at what happens when that assumption is not satisfied.
t Tables
We have already seen that doing hypothesis tests and constructing confidence intervals using $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ or some other Z ratio requires that we know certain upper and/or lower percentiles from the standard normal distribution. There will be a similar need to identify appropriate "cutoffs" from Student t distributions when the inference procedure is based on $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$, or some other t ratio.
Figure 7.4.1 shows a portion of the t table that appears in the back of every statistics book. Each row corresponds to a different Student t pdf. The column headings give the area to the right of the number appearing in the body of the table. For example, the entry 4.541 listed in the $\alpha = .01$ column and the df = 3 row has the property that $P(T_3 \ge 4.541) = 0.01$.
More generally, we will use the symbol $t_{\alpha,n}$ to denote the $100(1 - \alpha)$th percentile of $f_{T_n}(t)$. That is, $P(T_n \ge t_{\alpha,n}) = \alpha$ (see Figure 7.4.2). No lower percentiles of Student t curves need to be tabulated because the symmetry of $f_{T_n}(t)$ implies that $P(T_n \le -t_{\alpha,n}) = \alpha$.
The number of different Student t pdfs summarized in a t table varies considerably. Many tables will provide cutoffs for degrees of freedom ranging only from one to thirty; others will include df values from one to fifty, or even from one to one hundred. The last row in any t table, though, is always labeled "$\infty$": Those entries, of course, correspond to $z_\alpha$.
                                  α
df     .20      .15      .10      .05       .025      .01       .005
1      1.376    1.963    3.078    6.3138    12.706    31.821    63.657
2      1.061    1.386    1.886    2.9200    4.3027    6.965     9.9248
3      0.978    1.250    1.638    2.3534    3.1825    4.541     5.8409
4      0.941    1.190    1.533    2.1318    2.7764    3.747     4.6041
5      0.920    1.156    1.476    2.0150    2.5706    3.365     4.0321
6      0.906    1.134    1.440    1.9432    2.4469    3.143     3.7074
30     0.854    1.055    1.310    1.6973    2.0423    2.457     2.7500
∞      0.84     1.04     1.28     1.64      1.96      2.33      2.58
FIGURE 7.4.1
FIGURE 7.4.2: The pdf $f_{T_n}(t)$, with area $\alpha$ to the right of $t_{\alpha,n}$.
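Entries in a t table such as Figure 7.4.1 can be reproduced, at least approximately, from the pdf of Theorem 7.3.4. The sketch below is one way to do it under stated assumptions: the right-tail area is computed as 1/2 minus a Simpson's-rule integral over [0, t] (using the symmetry of the pdf), and the cutoff is located by bisection. The function names, the search ceiling of 200, and the step counts are hypothetical choices made here, and the routine is only intended for 0 < α < 0.5 with cutoffs below that ceiling.

```python
import math

def student_t_pdf(t, n):
    """pdf of a Student t random variable with n df (Theorem 7.3.4)."""
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c - ((n + 1) / 2) * math.log(1 + t * t / n))

def right_tail_area(t, n, steps=4000):
    """P(T_n >= t) for t >= 0: one half minus the Simpson integral over [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps
    total = student_t_pdf(0.0, n) + student_t_pdf(t, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, n)
    return 0.5 - total * h / 3

def t_critical(alpha, n, upper=200.0):
    """Bisection search for the cutoff with right-tail area alpha.
    Assumes 0 < alpha < 0.5 and that the cutoff is smaller than `upper`."""
    lo, hi = 0.0, upper
    while hi - lo > 1e-6:
        mid = (lo + hi) / 2
        if right_tail_area(mid, n) > alpha:
            lo = mid  # too much area remains to the right, so move right
        else:
            hi = mid
    return (lo + hi) / 2

t_01_3 = t_critical(0.01, 3)     # compare with the df = 3, alpha = .01 entry
t_025_10 = t_critical(0.025, 10)
print(t_01_3, t_025_10)
```

Bisection is used here only because the tail area is monotone in t; any root finder would serve.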
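Constructing a Confidence Interval for $\mu$
The fact that $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ has a Student t distribution with $n - 1$ degrees of freedom justifies the statement that
$$P\left(-t_{\alpha/2,n-1} \le \frac{\bar{Y} - \mu}{S/\sqrt{n}} \le t_{\alpha/2,n-1}\right) = 1 - \alpha$$
or, equivalently, that
$$P\left(\bar{Y} - t_{\alpha/2,n-1} \cdot \frac{S}{\sqrt{n}} \le \mu \le \bar{Y} + t_{\alpha/2,n-1} \cdot \frac{S}{\sqrt{n}}\right) = 1 - \alpha \qquad (7.4.1)$$
(provided the $Y_i$'s are a random sample from a normal distribution).
When the actual data values are then used to evaluate $\bar{Y}$ and $S$, the lower and upper endpoints identified in Equation 7.4.1 define a $100(1 - \alpha)\%$ confidence interval for $\mu$.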
Theorem 7.4.1. Let $y_1, y_2, \ldots, y_n$ be a random sample of size $n$ from a normal distribution with (unknown) mean $\mu$. A $100(1 - \alpha)\%$ confidence interval for $\mu$ is the set of values
$$\left(\bar{y} - t_{\alpha/2,n-1} \cdot \frac{s}{\sqrt{n}},\ \bar{y} + t_{\alpha/2,n-1} \cdot \frac{s}{\sqrt{n}}\right)$$
CASE STUDY 7.4.1
To hunt flying insects, bats emit high-frequency sounds and then listen for their echoes. Until an insect is located, these pulses are emitted at intervals of from fifty to one hundred milliseconds. When an insect is detected, the pulse-to-pulse interval suddenly decreases, sometimes to as low as ten milliseconds, thus enabling the bat to pinpoint its prey's position. This raises an interesting question: How far apart are the bat and the insect when the bat first senses that the insect is there? Or, put another way, what is the effective range of a bat's echolocation system?
The technical problems that had to be overcome in measuring the bat-to-insect detection distance were far more complex than the statistical problems involved in analyzing the actual data. The procedure that finally evolved was to put a bat into an eleven-by-sixteen-foot room, along with an ample supply of fruit flies, and record the action with two synchronized millimeter sound-on-film cameras. By examining the two sets of pictures frame by frame, scientists could follow the bat's flight pattern and, at the same time, monitor its pulse frequency. For each insect that was caught (64), it was therefore possible to estimate the distance between the bat and the insect at the precise moment the bat's pulse-to-pulse interval decreased (see Table 7.4.1).
TABLE 7.4.1
Catch Number    Detection Distance (cm)
1               62
2               52
3               68
4               23
5               34
6               45
7               27
8               42
9               83
10              56
11              40
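Define $\mu$ to be a bat's true average detection distance. Use the eleven observations in Table 7.4.1 to construct a 95% confidence interval for $\mu$.
Letting $y_1 = 62$, $y_2 = 52$, ..., $y_{11} = 40$, we have that
$$\sum_{i=1}^{11} y_i = 532 \quad \text{and} \quad \sum_{i=1}^{11} y_i^2 = 29{,}000$$
Therefore,
$$\bar{y} = \frac{532}{11} = 48.4 \text{ cm}$$
and
$$s = \sqrt{\frac{11(29{,}000) - (532)^2}{11(10)}} = 18.1 \text{ cm}$$
If the population from which the $y_i$'s are being drawn is normal, the behavior of $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ will be described by a Student t curve with ten degrees of freedom. From Table A.2 in the Appendix,
$$P(-2.2281 < T_{10} < 2.2281) = 0.95$$
Accordingly, the 95% confidence interval for $\mu$ is
$$\left(\bar{y} - 2.2281\left(\frac{s}{\sqrt{11}}\right),\ \bar{y} + 2.2281\left(\frac{s}{\sqrt{11}}\right)\right) = \left(48.4 - 2.2281\left(\frac{18.1}{\sqrt{11}}\right),\ 48.4 + 2.2281\left(\frac{18.1}{\sqrt{11}}\right)\right) = (36.2 \text{ cm},\ 60.6 \text{ cm})$$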
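The arithmetic in this case study can be retraced in a few lines. The sketch below takes the eleven detection distances from Table 7.4.1 and the cutoff 2.2281 quoted from Table A.2; the variable names are illustrative. Because it carries $\bar{y}$ and $s$ at full precision instead of rounding them to 48.4 and 18.1 first, its endpoints can differ from the text's in the last reported digit.

```python
import math

# Detection distances (cm) from Table 7.4.1
distances_cm = [62, 52, 68, 23, 34, 45, 27, 42, 83, 56, 40]
t_crit = 2.2281  # t_{.025,10}, as quoted in the text from Table A.2

n = len(distances_cm)
y_bar = sum(distances_cm) / n
# Sample standard deviation with the n - 1 divisor
s = math.sqrt(sum((y - y_bar) ** 2 for y in distances_cm) / (n - 1))

half_width = t_crit * s / math.sqrt(n)
lower, upper = y_bar - half_width, y_bar + half_width
print(f"y_bar = {y_bar:.2f}, s = {s:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```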
EXAMPLE 7.4.1
The sample mean and sample standard deviation for the random sample of size $n = 20$ in the following list are 2.6 and 3.6, respectively. Let $\mu$ denote the true mean of the distribution being represented by these $y_i$'s.
2.5     3.2     0.5     0.1     0.1
0.2     0.2     0.1     0.4     0.4
0.3     7.4     8.6     1.8     0.3
1.3     1.4     11.2    2.1     10.1
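Is it correct to say that a 95% confidence interval for $\mu$ is the set of values
$$\left(\bar{y} - t_{.025,n-1} \cdot \frac{s}{\sqrt{n}},\ \bar{y} + t_{.025,n-1} \cdot \frac{s}{\sqrt{n}}\right) = \left(2.6 - 2.0930 \cdot \frac{3.6}{\sqrt{20}},\ 2.6 + 2.0930 \cdot \frac{3.6}{\sqrt{20}}\right) = (0.9, 4.3)?$$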
No. It is true that all the correct factors have been used in calculating (0.9, 4.3), but Theorem 7.4.1 does not apply in this case because the normality assumption it makes is clearly being violated. Figure 7.4.3 is a histogram of the twenty $y_i$'s. The extreme skewness that is so evident there is not consistent with the presumption that the data's underlying pdf is a normal distribution. As a result, the pdf describing the probabilistic behavior of $\frac{\bar{Y} - \mu}{S/\sqrt{20}}$ would not be $f_{T_{19}}(t)$.
Comment. To say that $\frac{\bar{Y} - \mu}{S/\sqrt{20}}$ in this situation is not exactly a $T_{19}$ random variable leaves unanswered a critical question: Is the ratio approximately a $T_{19}$ random variable? We will revisit the normality assumption, and what happens when that assumption is not satisfied, later in this section when we discuss a critically important property known as robustness.
FIGURE 7.4.3: Histogram of the twenty $y_i$'s (horizontal axis $y$, marked at 0, 5, and 10).
QUESTIONS
7.4.1. Use Appendix Table A.2 to find the following probabilities:
(D)
?!:: 1.134)
(b)
(c)
(d) P(-L055 <
<
7.4.2. What values of x satisfy the following equations?
(a) P(-x.::;
<
0.98
(b) $P(T_{13} \ge x) = 0.85$
(c) $P(T_{26} < x) = 0.95$
(d) $P(T_2 \ge x) = 0.025$
7.4.3. Which of the following differences is
or
(05.n -
= 9 is drawn from
7.4.4. A random sample of size n
a normal distribution with,.., = V.6.
Within what interval (-0, +0) can we expect to find
7.4..5.
of the time?
a random sample of size n
For what value of k is
p
7.4.6..
= 11
Y
27.6 80% of the time? 90%
js drawn from a normal distribution with
(l's7;'°1 ~ » ~
0.05
and $S$ denote the sample mean and sample standard deviation, respectively, based on a set of $n = 20$ measurements taken from a normal distribution with $\mu = 90.6$. Find the function $k(S)$ for which
$$P(90.6 - k(S) \le \bar{Y} \le 90.6 + k(S)) = 0.99$$
7.4.7. In the home, the amount of radiation emitted by a color television set does not pose a health problem of any consequence. The same may not be true in department stores, where as many as 15 or 20 sets may be turned on at the same time and in a relatively confined area. The following readings (in milliroentgens per hour) were taken at 10 different department stores, each having at least five television sets in their sales areas (89).
recommended safety limit set by the National Council on Radiation Protection is
mr/h.)
Store    Radiation Level
1        0.40
2        0.48
3        0.60
4        0.15
5        0.50
6        0.80
7        0.50
8        0.36
9        0.16
10       0.89
Construct a 99% confidence interval for the true average department-store radiation exposure level.
7.4.8. The following table lists the costs of repairing minivan bumpers damaged by a 5-mph collision (195). Use these seven observations to construct a 95% confidence interval for $\mu$, the true average repair cost for the population of all minivan models so damaged. Note: The sample standard deviation for these data is
Cost of
Nissan Quest
Oldsmobile Silhouette
Grand Caravan
,"',..... " •• Lumina
Toyota Previa LE
Pontiac Trans Sport
MazdaMPV
$1154
1106
1560
1769
1741
3179
7.4.9. Creativity, as any number of studies have shown, is very much a province of the young. Whether the focus is music, literature, science, or mathematics, an individual's most profound work seldom occurs late in life. Einstein, for example, made his most profound discoveries at the age of 26; Newton, at the age of 23. The following are 12 scientific breakthroughs dating from the middle of the sixteenth century to the early years of the twentieth century (216). All represented high-water marks in the careers of the scientists involved.
around sun
basic laws of astronomy
of molion, gravitation, calculus
Nature of electricity
Burning is uniting with oxygen
Earth evolved by gradual processes
for natural selection controlling evolution
Field equations for light
Radioactivity
Quantum theory
Special theory of relativity, E = mc2
Mathematical foundations for quantum theory
(a) What can be inferred
Copernicus
Galileo
Newton
Franklin
Lavoisier
1543
1600
1746
1774
1858
Maxwell
Curie
Planck
Einstein
1864
1896
1901
1905
1926
40
34
23
40
31
33
49
33
34
43
26
39
from these data about the true average age at which scientists do their best work? Answer the question by constructing a 95% confidence interval.
(b) Before constructing a confidence interval for a set of observations extending over a long period of time, we should be convinced that the $y_i$'s exhibit no biases or trends. If, for example, the age at which scientists made major discoveries decreased from century to century, then the parameter $\mu$ would no longer be a constant, and the confidence interval would be meaningless. Plot "date" versus "age" for these 12 discoveries. Put "date" on the abscissa. Does the variability in the $y_i$'s appear to be random with respect to time?
7.4.10. Fueled by the popularity of low-fat
the 19905 saw a profusion of new food
products claiming to be "no fat" or "low fat." To assess the impact of those
ages
to
ucts, measurements were taken on tbe daily fat intakes of 10
34. What does J.L
in this context? Use the data-I28.1, 57.1, 117.0, 146.1,
1423, 107.8,
103.7, and 128.7-to construct a 90% confidence interval
[or J.L. Note:
Yi
= 1101.3
= 128.428.67
and
7.4.11. In a nongeriatric population, platelet counts ranging from 140 to 440 (1000s per mm³ of blood) are considered "normal." The following are the platelet counts recorded for 24 female nursing home residents (176). Note: $\sum_{i=1}^{24} y_i = 4645$ and $\sum_{i=1}^{24} y_i^2 = 959{,}265$
Subject    Count    Subject    Count
1          125      13         180
2          170      14         180
3          250      15         280
4          270      16         240
5          144      17         270
6          184      18         220
7          176      19         110
8          100      20         176
9          220      21         280
10         200      22         176
11         170      23         188
12         160      24         176
How does the definition of "normal" above compare with the 90% confidence interval?
7.4.12. If a normally distributed sample of size $n = 16$ produces a 95% confidence interval for $\mu$ that ranges from 44.7 to 49.9, what are the values of $\bar{y}$ and $s$?
7.4.13. Two samples, each of size $n$, are taken from a normal distribution with unknown mean $\mu$ and unknown standard deviation $\sigma$. A 90% confidence interval for $\mu$ is constructed with the first sample, and a 95% confidence interval for $\mu$ is constructed with the second. Will the 95% confidence interval necessarily be longer than the 90% confidence interval? Explain.
7.4.14. Revenues reported last week from nine boutiques franchised by an international clothier averaged $59,540 with a standard deviation of $6,860. Based on those figures, in what range might the company expect to find the average revenue of all of its boutiques?
7.4.15. What "confidence" is associated with each of the following random intervals? Assume that the $Y_i$'s are normally distributed.
(~), V + 20930 (~) )
(v -
20030
(b)
(v -
1345
(e)
(v - 17056(~) ,V + 27787(~))
(a)
(d)
(-=
Y
+
(~), V +1345 (~) )
1
(
~) )
7.4.16. The following are the median home resale prices reported in 17 U.S. cities for the fourth quarter of 1994 (199). Would it be reasonable to estimate the true median home resale price for that period by substituting these data into Theorem 7.4.1 to find a 95% confidence interval for $\mu$? Explain.
Median
Albuquerque
Atlanta
Rouge
Charlotte
Cleveland
Dallas
Denver
Fort Lauderdale
Indianapolis
Memphis
New Orleans
Philadelphia
Richmond
Sacramento
Salt Lake City
Seattle
93.4
77.1
104.6
98.1
923
119.0
101.8
89.2
77.8
66.6
115.4
99.2
121.5
102.2
Testing $H_0$: $\mu = \mu_0$ (The One-Sample t Test)
Suppose a (normally distributed) random sample of size $n$ is observed for the purpose of testing the null hypothesis that $\mu = \mu_0$. If $\sigma$ is unknown, which is usually the case, the procedure we use is called a one-sample t test. Conceptually, the latter is much like the Z test of Theorem 6.2.1, except that the decision rule is defined in terms of $t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$ rather than $z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$ [which requires that the critical values come from $f_{T_{n-1}}(t)$ rather than $f_Z(z)$].
Theorem 7.4.2. Let $y_1, y_2, \ldots, y_n$ be a random sample of size $n$ from a normal distribution where $\sigma$ is unknown. Let $t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$.
a. To test $H_0$: $\mu = \mu_0$ versus $H_1$: $\mu > \mu_0$ at the $\alpha$ level of significance, reject $H_0$ if $t \ge t_{\alpha,n-1}$.
b. To test $H_0$: $\mu = \mu_0$ versus $H_1$: $\mu < \mu_0$ at the $\alpha$ level of significance, reject $H_0$ if $t \le -t_{\alpha,n-1}$.
c. To test $H_0$: $\mu = \mu_0$ versus $H_1$: $\mu \ne \mu_0$ at the $\alpha$ level of significance, reject $H_0$ if $t$ is either (1) $\le -t_{\alpha/2,n-1}$ or (2) $\ge t_{\alpha/2,n-1}$.
Proof. Appendix 7.A.4 gives the complete derivation that justifies using the procedure described in Theorem 7.4.2. In short, the test statistic $t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$ is a monotonic function of the $\lambda$ that appears in the definition of the generalized likelihood ratio, which makes the one-sample t test a GLRT. □
EXAMPLE 7.4.2
Pica is a children's disorder characterized by a craving for nonfood substances such as clay, plaster, and paint. Anyone affected runs the risk of ingesting high levels of lead, which can result in neurological dysfunction. Checking a child's blood lead level is a standard procedure for diagnosing the condition.
Among children between the ages of six months and five years, blood lead levels of 16.0 mg/l are considered "normal." Recently, a random sample of twelve children enrolled in Molly's Mighty Bear Nursery had their blood lead levels checked. The sample mean and sample standard deviation were 18.65 and 5.049, respectively. Can it be concluded that children at this particular facility tend to have higher lead levels? At the $\alpha = 0.05$ level, is the increase from 16.0 to 18.65 statistically significant?
Let $\mu$ denote the true average blood lead level for children enrolled at Mighty Bear. The hypotheses to be tested are
$$H_0: \mu = 16.0 \quad \text{versus} \quad H_1: \mu > 16.0$$
Given that $\alpha = 0.05$, $n = 12$, and $H_1$ is one-sided to the right, the critical value from Part (a) of Theorem 7.4.2 (and Appendix Table A.2) is $t_{.05,11} = 1.7959$ (see Figure 7.4.4).
Substituting $\bar{y}$ and $s$ into the t ratio gives a test statistic that lies just a little to the right of $t_{.05,11}$:
$$t = \frac{18.65 - 16.0}{5.049/\sqrt{12}} = 1.82$$
It would be correct to claim, then, that the $\bar{y}$ of 18.65 does represent a statistically significant increase.
FIGURE 7.4.4: The pdf $f_{T_{11}}(t)$, with the rejection region for $H_0$ lying to the right of 1.7959.
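The one-sided decision rule of Theorem 7.4.2(a) reduces to a single comparison once the t ratio is computed. The sketch below retraces this example from its summary statistics; the helper name `t_ratio` is hypothetical, and the critical value 1.7959 is simply the table entry quoted in the text.

```python
import math

def t_ratio(y_bar, s, n, mu_0):
    """One-sample t statistic: (y_bar - mu_0) / (s / sqrt(n))."""
    return (y_bar - mu_0) / (s / math.sqrt(n))

# Summary statistics from the blood lead example
t = t_ratio(y_bar=18.65, s=5.049, n=12, mu_0=16.0)
critical_value = 1.7959  # t_{.05,11}, quoted from Appendix Table A.2

# Part (a) of Theorem 7.4.2: reject H0 when t >= t_{alpha, n-1}
reject_h0 = t >= critical_value
print(round(t, 3), reject_h0)
```

The observed ratio clears the cutoff by only a small margin, which is why the text describes it as lying just a little to the right of $t_{.05,11}$.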
EXAMPLE 7.4.3
Three banks serve a metropolitan area's inner-city neighborhoods: Federal Trust, American United, and Third Union. The state banking commission is concerned that loan applications from inner-city residents are not being accorded the same consideration that comparable requests have received from individuals in rural areas. Both constituencies claim to have anecdotal evidence suggesting that the other group is being given preferential treatment.
Records show that last year these three banks approved 62% of all the home mortgage applications filed by rural residents. Listed in Table 7.4.2 are the approval rates posted over that same period by the twelve branch offices of Federal Trust (FT), American United (AU), and Third Union (TU) that work primarily with the inner-city community. Do these figures lend any credence to the contention that the banks are treating inner-city residents and rural residents differently? Analyze the data using an $\alpha = 0.05$ level of significance.
TABLE 1A.2
Affiliation
1
2
3
4
5
6
7
8
9
11
12
& Morgan
Jefferson Pike
East 150th & Oark
MidwayMaU
N. Charter H:ighway
Lewis & Abbot
West 10th & Lorain
Highway 70
Parkway Northwest
Lanier &
King & Tara Court
Bluedot Corners
AU
TU
Percent
59
65
TV
69
Ff
Ff
AU
53
Ff
AU
64
TU
67
60
46
AU
59
TABLE 7.4.3
Banks    n     $\bar{y}$    s        t Ratio    Critical Value    Reject $H_0$?
All      12    58.667       6.946    -1.66      ±2.2010           No

TABLE 7.4.4
Banks              n    $\bar{y}$    s       t Ratio    Critical Value    Reject $H_0$?
American United    4    52.25        5.38    -3.63      ±3.1825           Yes
Federal Trust      5    58.80        3.96    -1.81      ±2.7764           No
Third Union        3    67.00        2.00    +4.33      ±4.3027           Yes
As a starting point, we might want to test
$$H_0: \mu = 62 \quad \text{versus} \quad H_1: \mu \ne 62$$
where $\mu$ is the true average approval rate for all three banks. Table 7.4.3 summarizes the analysis. The two critical values are $\pm t_{.025,11} = \pm 2.2010$, and the observed t ratio is $-1.66 \left(= \frac{58.667 - 62}{6.946/\sqrt{12}}\right)$, so our decision is "Fail to reject $H_0$."
The "overall" analysis of Table 7.4.3, though, may be too simplistic. Common sense would tell us to look also at the three banks separately. What emerges, then, is an entirely different picture (see Table 7.4.4). Now we can see why both groups felt discriminated against: American United ($t = -3.63$) and Third Union ($t = +4.33$) each had rates that differed significantly from 62%, but in opposite directions! Only Federal Trust seems to be dealing with inner-city residents and rural residents in an even-handed way.
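The four two-sided tests in this example follow the same recipe, so they can be run in one loop. The sketch below uses only the summary statistics reported in Tables 7.4.3 and 7.4.4 and the two-sided cutoffs $t_{.025,n-1}$ from a t table; the data layout and variable names are illustrative assumptions.

```python
import math

mu_0 = 62  # approval rate (%) posted for rural residents

# (group, n, y_bar, s, t_{.025, n-1}) from Tables 7.4.3 and 7.4.4
summaries = [
    ("All banks",       12, 58.667, 6.946, 2.2010),
    ("American United",  4, 52.25,  5.38,  3.1825),
    ("Federal Trust",    5, 58.80,  3.96,  2.7764),
    ("Third Union",      3, 67.00,  2.00,  4.3027),
]

results = {}
for group, n, y_bar, s, cutoff in summaries:
    t = (y_bar - mu_0) / (s / math.sqrt(n))
    # Part (c) of Theorem 7.4.2: reject H0 if t <= -cutoff or t >= cutoff
    results[group] = (t, abs(t) >= cutoff)

for group, (t, reject) in results.items():
    print(f"{group:16s} t = {t:+.2f}  reject H0: {reject}")
```

Pooling the twelve branches hides two significant departures that point in opposite directions, which is the contrast the example is drawing.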
QUESTIONS
7.4.17. Recall the Bacillus subtilis data from the earlier question. Test the null hypothesis that exposure to the enzyme does not affect a worker's respiratory capacity (as measured by the FEV1/VC ratio). Use a one-sided $H_1$ and let $\alpha = 0.05$. Assume that $\sigma$ is not known.
7.4.18. Recall Case Study 5.3.1. Assess the credibility of the theory that Etruscans were native Italians by testing an appropriate $H_0$ against a two-sided $H_1$. Set $\alpha$ equal to 0.05. Use 143.8 mm and 6.0 mm for $\bar{y}$ and $s$, respectively, and let $\mu_0 = 132.4$. Do these data appear to satisfy the distribution assumption made by the t test? Explain.
7.4.19. MBAs R Us advertises that its program increases a person's score on the GMAT by an average of 40 points. As a way of checking the validity of that claim, a consumer watchdog group hires 15 students to take both the review course and the GMAT. Prior to starting the course, the 15 students were given a diagnostic test that predicted how well they would do on the GMAT in the absence of any special training. The following table gives each student's actual GMAT score minus his or her predicted score. Set up and carry out an appropriate hypothesis test. Use the 0.05 level of significance.
Subject
LG
SH
KN
DF
SH
ML
JG
Yi = act. GMAT - pre. GMAT
YT
35
37
33
34
1225
38
40
1369
1089
1156
1444
CW
47
42
1600
1225
1296
1444
1089
784
1156
2209
1764-
DP
46
2116
KH
HS
LL
CE
KK
35
36
38
33
28
34
7.4.20. Recall the Shoshoni rectangle data described in Case Study 1.2.2. Let $\mu$ denote the true average w/l ratio preferred by the Shoshonis. At the $\alpha = 0.05$ level, test $H_0$: $\mu = 0.618$ versus $H_1$: $\mu \ne 0.618$. What does your conclusion suggest about the "universality" of the Golden Rectangle? Note: $\bar{y}$ and $s$ for these data are 0.661 and 0.093, respectively.
7.4.21. A manufacturer of pipe for laying underground electrical cables is concerned about the pipe's rate of corrosion and whether a special coating may retard that rate. As a way of measuring corrosion, the manufacturer examines a short length of pipe and records the depth of the maximum pit. The manufacturer's tests have shown that in a year's time in the particular kind of soil the manufacturer must deal with, the average depth of the maximum pit in a foot of pipe is 0.0042 inches. To see whether that average can be reduced, 10 pipes are coated with a new plastic and buried in the same soil. After one year, the following maximum pit depths are recorded (in inches): 0.0039, 0.0041, 0.0038, 0.0044, 0.0040, 0.0036, 0.0034, 0.0036, 0.0046, and 0.0036. Given that the sample standard deviation for these 10 measurements is 0.000383 inches, can it be concluded at the $\alpha = 0.05$ level of significance that the plastic coating is beneficial?
7.4.22. The analysis done in Example 7.4.3 (using all $n = 12$ banks with $\bar{y} = 58.667$) failed to reject $H_0$: $\mu = 62$ at the $\alpha = 0.05$ level. Had $\mu_0$ been, say, 58.6, the same conclusion would have been reached. What do we call the entire set of $\mu_0$'s for which $H_0$: $\mu = \mu_0$ would not be rejected at the $\alpha = 0.05$ level?
Testing $H_0$: $\mu = \mu_0$ When the Normality Assumption Is Not Met
Every t test makes the same explicit assumption, namely, that the set of $n$ $y_i$'s is normally distributed. But suppose the normality assumption is not true. What are the consequences? Is the validity of the t test compromised?
FIGURE 7.4.5: $f_{T^*}(t)$, the pdf of $t$ when the data are not normally distributed, shown with the two rejection-region tails (each marked "Area = $\alpha/2$").
true, the
the first
describing
is
We know that if the nonnality
variation of the
I
y -
ratio,
~o
is h,,-l (t). The latter. of
course, provides
decision rule's critical
If Ho: ~ = Jlo is to be tested against
H1: JL Jlo, for example, the null hypothesis is rejected if t is either (1) ::::
or
(2) > 10:/2.,,-1 (whiCh makes the
[error probability equal to 0').
*
If the nonnality
is not true, the pdf of
p
:::: -ta /2.,,-1 )
+
P(
y-
Y -
will not be
JLo
)
SIJii:::::
10:/2.,,-1
~
(t)
and
0'
In effect, violating the nonnality assumption creates two a's: The "nominal" 0' is the Type
at the outset-typically, 0.05 or 0.01.
"true" a is the
I error probability we
actual probability that --:,,;:;.. falls in the rejection
two-sided decision rule
in
(when Ho is true). For the
7.4.5,
true a
Whether or not the validity of the t test is "compromised" by the nonnality assumption
being
depends on the
difference between the two a's. If h,.(t)
in fact, quite similar in shape and location to h"-l (t), then the true et will
approximately
to the nominal et. In that case, the fact that the YiS are not nonnally
(t) are
would
essentially irrelevant. On the other hand, if /p(t) and
7.4.5), it would follow
the
different (as they appear to be
nonnality assumption is critical, and establishing the "significance" of a J ratio becomes
problematic.
Section 7.4 Drawing Inferences About $\mu$
Unfortunately, getting an exact expression for $f_{T^*}(t)$ is essentially impossible, because
the distribution depends on the pdf being sampled, and there is seldom any way of
knowing precisely what that pdf might be. However, we can still meaningfully explore the
sensitivity of the t ratio to violations of the normality assumption by simulating samples
of size n from selected distributions and comparing the resulting histogram of t ratios
to $f_{T_{n-1}}(t)$.
Figure 7.4.6 shows four such simulations, using MINITAB; the first three consist of
one hundred random samples of size n = 6. In Figure 7.4.6(a), the samples come from a
uniform pdf defined over the interval [0, 1]; in Figure 7.4.6(b), the underlying pdf is the
exponential with $\lambda = 1$; and in Figure 7.4.6(c), the data are coming from a Poisson pdf
with $\lambda = 5$.
(a)
[Plot of the sampled pdf: $f_Y(y) = 1$, $0 \leq y \leq 1$]
MTB > random 100 c1-c6;
SUBC> uniform 0 1.
MTB > rmean c1-c6 c7
MTB > rstdev c1-c6 c8
MTB > let c9 = sqrt(6)*(((c7) - 0.5)/(c8))
MTB > histogram c9
(The LET command calculates the t ratio, $\frac{\bar{y} - \mu}{s/\sqrt{n}}$.)
[Histogram of the one hundred t ratios (n = 6)]
FIGURE 7.4.6
(b)
MTB > random 100 c1-c6;
SUBC> exponential 1.
MTB > rmean c1-c6 c7
MTB > rstdev c1-c6 c8
MTB > let c9 = sqrt(6)*(((c7) - 1.0)/(c8))
MTB > histogram c9
[Histogram of the one hundred t ratios (n = 6); horizontal axis extends to about -14]
FIGURE 7.4.6: (Continued)
If the normality assumption were true, t ratios based on samples of size six would vary
in accordance with the Student t distribution with 5 df. In Figures 7.4.6(a) through 7.4.6(c),
$f_{T_5}(t)$ has been superimposed over the histograms of the t ratios coming from the three
different pdfs. What we see there is really quite remarkable. The t ratios based on $y_i$'s
coming from a uniform pdf, for example, are behaving much the same way as t ratios would
vary if the $y_i$'s were normally distributed; that is, $f_{T^*}(t)$ in this case appears to be very
similar to $f_{T_5}(t)$. The same is true for samples coming from a Poisson distribution
(see Figure 7.4.6(c)). For both of those underlying pdfs, in other words, the true $\alpha$ would
not be much different than the nominal $\alpha$.
Figure 7.4.6(b) tells a slightly different story. When samples of size six are drawn
from an exponential pdf, the t ratios are not in particularly close agreement with $f_{T_5}(t)$.
Specifically, very negative t ratios are occurring much more often than the Student t curve
would predict, while large positive t ratios are occurring less often (see Question 7.4.23).
But look at Figure 7.4.6(d). When the sample size is increased to n = 15, the skewness so
prominent in Figure 7.4.6(b) is mostly gone.
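The comparison between nominal and true $\alpha$ can also be explored numerically. The following is a minimal Python sketch, not the MINITAB session used for Figure 7.4.6: it simulates t ratios for samples of size six and estimates how often they fall in each tail of the nominal 0.05 rejection region. The function names, the seed, the 20,000 replications, and the cutoff 2.5706 (the standard table value of $t_{.025,5}$) are assumptions made for illustration.

```python
import math
import random
import statistics

T_CRIT_5DF = 2.5706  # assumed table value of t_{.025,5} (two-sided 0.05 rule, n = 6)


def t_ratio(sample, mu):
    """Return (ybar - mu) / (s / sqrt(n)) for one sample."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))


def tail_rates(draw, mu, n, t_crit, reps=20000, seed=1):
    """Estimate P(t <= -t_crit) and P(t >= t_crit) when H0 is true."""
    rng = random.Random(seed)
    lower = upper = 0
    for _ in range(reps):
        t = t_ratio([draw(rng) for _ in range(n)], mu)
        if t <= -t_crit:
            lower += 1
        elif t >= t_crit:
            upper += 1
    return lower / reps, upper / reps


# Uniform on [0, 1] has mean 0.5; exponential with lambda = 1 has mean 1.0.
uniform_rates = tail_rates(lambda rng: rng.random(), 0.5, 6, T_CRIT_5DF)
expo_rates = tail_rates(lambda rng: rng.expovariate(1.0), 1.0, 6, T_CRIT_5DF)

print("uniform     (lower, upper):", uniform_rates)
print("exponential (lower, upper):", expo_rates)
```

Adding the two tail rates gives an estimate of the true $\alpha$ for each sampled pdf. For the exponential case the lower-tail rate should come out noticeably larger than the upper-tail rate, which is the left skewness described above.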
(c)
[Plot of the sampled pdf: $p_X(k) = \frac{e^{-5}5^k}{k!}$, $k = 0, 1, 2, \ldots$]
MTB > random 100 c1-c6;
SUBC> poisson 5.
MTB > rmean c1-c6 c7
MTB > rstdev c1-c6 c8
MTB > let c9 = sqrt(6)*(((c7) - 5.0)/(c8))
MTB > histogram c9
[Histogram of the one hundred t ratios (n = 6), labeled "Sample distribution"]
FIGURE 7.4.6: (Continued)
Reflected in these specific simulations are some general properties of the t ratio:
1. The distribution of $T = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$ is relatively unaffected by the pdf of the $y_i$'s
[provided $f_Y(y)$ is not too skewed and n is not too small].
2. As n increases, the pdf of $T = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$ becomes increasingly similar to
$f_{T_{n-1}}(t)$.
In mathematical statistics, the term robust is used to describe a procedure that is not
heavily dependent on whatever assumptions it makes. Figure 7.4.6 shows that the t test is
robust with respect to departures from normality.
From a practical standpoint, it would be difficult to overstate the importance of the
t test being robust. If the pdf of $\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ varied dramatically depending on the
origin of the $y_i$'s, we would never know if the true $\alpha$ associated with, say, a 0.05 decision
rule was anywhere near 0.05. That degree of uncertainty would make the t test virtually
worthless.
(d)
MTB > random 100 c1-c15;
SUBC> exponential 1.
MTB > rmean c1-c15 c16
MTB > rstdev c1-c15 c17
MTB > let c18 = sqrt(15)*(((c16) - 1.0)/(c17))
MTB > histogram c18
[Histogram of the one hundred t ratios (n = 15), labeled "Sample distribution"]
FIGURE 7.4.6: (Continued)
QUESTIONS
7.4.23. Explain why the distribution of t ratios calculated from small samples drawn from the
exponential pdf, $f_Y(y) = e^{-y}$, $y \geq 0$, will be skewed to the left (recall Figure 7.4.6(b)).
Hint: What does the shape of $f_Y(y)$ imply about the possibility of each $y_i$ being close to
0? If the entire sample did consist of $y_i$'s close to 0, what value would the t ratio have?
7.4.24. Suppose 100 samples of size n = 3 are taken from each of the pdfs
(1) $f_Y(y) = 2y$, $0 \leq y \leq 1$
and
(2) $f_Y(y) = 4y^3$, $0 \leq y \leq 1$
and for each set of three observations the ratio
$$\frac{\bar{y} - \mu}{s/\sqrt{3}}$$
is calculated, where $\mu$ is the expected value of the particular pdf being sampled. How
would you expect the distributions of the two sets of ratios to be different? How would
they be similar? Be as specific as possible.
7.4.25. Suppose that random samples of size n are drawn from the uniform pdf, $f_Y(y) = 1$,
$0 \leq y \leq 1$. For each sample, the ratio $t = \frac{\bar{y} - 0.5}{s/\sqrt{n}}$ is calculated. Parts (b) and (d) of
Figure 7.4.6 suggest that the pdf of t will become increasingly similar to $f_{T_{n-1}}(t)$ as n
increases. To which pdf is $f_{T_{n-1}}(t)$, itself, converging as n increases?
7.4.26. On which of the following sets of data would you be reluctant to do a t test? Explain.
[Plots (a), (b), and (c): three sets of data, each displayed along a y axis]
DRAWING INFERENCES ABOUT $\sigma^2$
When random samples are drawn from a normal distribution, it is usually the case that
the parameter $\mu$ is the target of the investigation. More often than not, the mean mirrors
the "effect" of a treatment or condition, in which case it makes sense to apply what we
learned in Section 7.4, that is, either construct a confidence interval for $\mu$ or test the
hypothesis that $\mu = \mu_0$.
But exceptions are not that uncommon. Situations occur where the "precision"
associated with a measurement is, itself, important, perhaps even more important than
the measurement's "location." If so, we need to shift our focus to the scale parameter, $\sigma^2$.
Two key facts that we learned earlier about the population variance will now come
into play. First, an unbiased estimator for $\sigma^2$ based on its maximum likelihood estimator is the
sample variance, $S^2$, where
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$$
And, second, the ratio
$$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$$
has a chi square distribution with n - 1 degrees of freedom. Putting these two pieces
of information together allows us to draw inferences about $\sigma^2$; in particular, we can
construct confidence intervals for $\sigma^2$ and test the hypothesis that $\sigma^2 = \sigma_0^2$.
Chi Square Tables
Just as we need a t table to carry out inferences about $\mu$ (when $\sigma^2$ is unknown), we need a
chi square table to provide the cutoffs for making inferences involving $\sigma^2$. The layout of chi
square tables is dictated by the fact that all chi square pdfs (unlike Z and t distributions)
are skewed (see, for example, Figure 7.5.1, showing a chi square curve having five degrees
of freedom). Because of that asymmetry, chi square tables need to provide cutoffs for
both the left-hand tail and the right-hand tail of each chi square distribution.
[Figure: chi square pdf with 5 degrees of freedom; an area of 0.05 lies to the left of 1.145
and an area of 0.01 lies to the right of 15.086]
FIGURE 7.5.1
Figure 7.5.2 shows the top portion of the chi square table that appears in Appendix A.3.
Successive rows refer to different chi square distributions (each having a different number
of degrees of freedom). The column headings denote the areas to the left of the numbers
listed in the body of the table.
We will use the symbol $\chi^2_{p,n}$ to denote the number along the horizontal axis that cuts
off to its left an area p under the chi square distribution with n degrees of freedom.
                                        p
df      .01       .025      .05       .10       .90       .95       .975      .99
 1   0.000157  0.000982  0.00393   0.0158     2.706     3.841     5.024     6.635
 2   0.0201    0.0506    0.103     0.211      4.605     5.991     7.378     9.210
 3   0.115     0.216     0.352     0.584      6.251     7.815     9.348    11.345
 4   0.297     0.484     0.711     1.064      7.779     9.488    11.143    13.277
 5   0.554     0.831     1.145     1.610      9.236    11.070    12.832    15.086
 6   0.872     1.237     1.635     2.204     10.645    12.592    14.449    16.812
 7   1.239     1.690     2.167     2.833     12.017    14.067    16.013    18.475
 8   1.646     2.180     2.733     3.490     13.362    15.507    17.535    20.090
 9   2.088     2.700     3.325     4.168     14.684    16.919    19.023    21.666
10   2.558     3.247     3.940     4.865     15.987    18.307    20.483    23.209
11   3.053     3.816     4.575     5.578     17.275    19.675    21.920    24.725
12   3.571     4.404     5.226     6.304     18.549    21.026    23.336    26.217
FIGURE 7.5.2
For example, from the fifth row of the chi square table, we see the numbers 1.145 and
15.086 under the column headings .05 and .99, respectively. It follows that
$$P(\chi^2_5 \leq 1.145) = 0.05$$
and
$$P(\chi^2_5 \leq 15.086) = 0.99$$
(see Figure 7.5.1). In terms of the $\chi^2_{p,n}$ notation, $1.145 = \chi^2_{.05,5}$ and $15.086 = \chi^2_{.99,5}$. (The
area to the right of 15.086, of course, must be 0.01.)
Constructing Confidence Intervals for $\sigma^2$
Since $\frac{(n-1)S^2}{\sigma^2}$ has a chi square distribution with n - 1 degrees of freedom, we can
write
$$P\left(\chi^2_{\alpha/2,n-1} \leq \frac{(n-1)S^2}{\sigma^2} \leq \chi^2_{1-\alpha/2,n-1}\right) = 1 - \alpha \qquad (7.5.1)$$
If Equation 7.5.1 is then inverted to isolate $\sigma^2$ in the center of the inequalities, the two
endpoints will necessarily define a $100(1 - \alpha)\%$ confidence interval for the population
variance. The algebraic details will be left as an exercise.
Theorem 7.5.1. Let $s^2$ denote the sample variance calculated from a random sample of n
observations drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then
a. a $100(1 - \alpha)\%$ confidence interval for $\sigma^2$ is the set of values
$$\left(\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}}\right)$$
b. a $100(1 - \alpha)\%$ confidence interval for $\sigma$ is the set of values
$$\left(\sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}},\ \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}}}\right)$$
CASE STUDY 7.5.1
The chain of events that define the geological evolution of the Earth began hundreds
of millions of years ago. Fossils play a key role in documenting the relative times those
events occurred, but to establish an absolute chronology, scientists rely primarily on
radioactive decay.
One of the newest dating techniques uses a rock's potassium-argon ratio. Almost
all minerals contain potassium (K) as well as certain of its isotopes, including $^{40}$K. The
latter, though, is unstable and decays into isotopes of argon and calcium, $^{40}$Ar and
$^{40}$Ca. By knowing the rates at which the various daughter products are formed and by
measuring the amounts of $^{40}$Ar and $^{40}$K present in a specimen, geologists can estimate
the object's age.
Critical to the interpretation of any such dates, of course, is the precision of the
underlying procedure. One obvious way to estimate that precision is to use the technique
on a sample of rocks known to have the same age. Whatever variation occurs, then, from
rock to rock is reflecting the inherent precision (or lack of precision) of the procedure.
Table 7.5.1 lists the potassium-argon estimated ages of nineteen mineral samples,
all taken from the Black Forest in southeastern Germany (115). Assume that the
TABLE 7.5.1
Specimen    Estimated Age (millions of years)
    1           249
    2           254
    3           243
    4           268
    5           253
    6           269
    7           287
    8           241
    9           273
   10           306
   11           303
   12           280
   13           260
   14           256
   15           278
   16           344
   17           304
   18           283
   19           310
procedure's estimated ages are normally distributed with (unknown) mean $\mu$ and
(unknown) variance $\sigma^2$. Construct a 95% confidence interval for $\sigma$.
Here
$$\sum_{i=1}^{19} y_i = 5261 \quad\text{and}\quad \sum_{i=1}^{19} y_i^2 = 1{,}469{,}945$$
so the sample variance is 733.4:
$$s^2 = \frac{19(1{,}469{,}945) - (5261)^2}{19(18)} = 733.4$$
Since n = 19, the critical values appearing in the left-hand and right-hand limits of the
$\sigma$ confidence interval come from the chi square pdf with 18 df. According to Appendix
Table A.3,
$$P(8.23 < \chi^2_{18} < 31.53) = 0.95$$
so the 95% confidence interval for the potassium-argon method's precision is the set
of values
$$\left(\sqrt{\frac{(19-1)(733.4)}{31.53}},\ \sqrt{\frac{(19-1)(733.4)}{8.23}}\right) = (20.5 \text{ million years}, 40.0 \text{ million years})$$
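The arithmetic in Case Study 7.5.1 can be checked with a short Python sketch. It is only an illustration: the nineteen ages are those read from Table 7.5.1, the cutoffs 8.23 and 31.53 are the Appendix Table A.3 values quoted above, and the function names `sample_variance` and `sigma_interval` are hypothetical helpers introduced here.

```python
import math

# Potassium-argon estimated ages (millions of years), as read from Table 7.5.1.
ages = [249, 254, 243, 268, 253, 269, 287, 241, 273, 306,
        303, 280, 260, 256, 278, 344, 304, 283, 310]


def sample_variance(ys):
    """Computing formula: s^2 = [n*sum(y^2) - (sum y)^2] / [n(n - 1)]."""
    n = len(ys)
    total = sum(ys)
    total_sq = sum(y * y for y in ys)
    return (n * total_sq - total ** 2) / (n * (n - 1))


def sigma_interval(s2, n, chi2_lower, chi2_upper):
    """Confidence interval for sigma from Theorem 7.5.1(b), given table cutoffs."""
    return (math.sqrt((n - 1) * s2 / chi2_upper),
            math.sqrt((n - 1) * s2 / chi2_lower))


s2 = sample_variance(ages)
low, high = sigma_interval(s2, len(ages), 8.23, 31.53)
print(f"s^2 = {s2:.1f}; interval for sigma = ({low:.2f}, {high:.2f})")
```

Note that the larger chi square cutoff produces the lower endpoint and the smaller cutoff produces the upper endpoint, because the cutoffs appear in the denominators.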
EXAMPLE 7.5.1
The width of a confidence interval for $\sigma^2$ is a function of both n and $S^2$:
$$\text{Width} = \text{upper limit} - \text{lower limit} = \frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} - \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}$$
$$= (n-1)S^2\left(\frac{1}{\chi^2_{\alpha/2,n-1}} - \frac{1}{\chi^2_{1-\alpha/2,n-1}}\right) \qquad (7.5.2)$$
As n gets larger, the interval will tend to get narrower because the unknown $\sigma^2$ is being
estimated more precisely. What is the smallest number of observations that will guarantee
that the average width of a 95% confidence interval for $\sigma^2$ is no greater than $\sigma^2$?
Since $S^2$ is an unbiased estimator for $\sigma^2$, Equation 7.5.2 implies that the expected width
of a 95% confidence interval for the variance is
$$E(\text{Width}) = (n-1)\sigma^2\left(\frac{1}{\chi^2_{.025,n-1}} - \frac{1}{\chi^2_{.975,n-1}}\right)$$
Clearly, then, for the expected width to be less than or equal to $\sigma^2$, n must be chosen so
that
$$(n-1)\left(\frac{1}{\chi^2_{.025,n-1}} - \frac{1}{\chi^2_{.975,n-1}}\right) \leq 1$$
Trial and error can be used to identify the desired n. The first three columns in
Table 7.5.2 come from the chi square distribution in Appendix Table A.3. As the
computation in the last column indicates, n = 39 is the smallest sample size that will yield
95% confidence intervals for $\sigma^2$ whose average width is less than $\sigma^2$.
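The trial-and-error search can be sketched in a few lines of Python. This is an illustration under stated assumptions: the cutoffs are the chi square table values for the five candidate sample sizes in Table 7.5.2 (the entry 32.852 for n = 20 is the standard table value of $\chi^2_{.975,19}$), and only those candidates are examined, so the code confirms the table's last column rather than searching every n.

```python
# n -> (chi-square .025 cutoff, chi-square .975 cutoff), each with n - 1 df.
cutoffs = {
    15: (5.629, 26.119),
    20: (8.907, 32.852),
    30: (16.047, 45.722),
    38: (22.106, 55.668),
    39: (22.878, 56.895),
}


def expected_width_factor(n, chi2_low, chi2_high):
    """Return E(Width) / sigma^2 = (n - 1) * (1/chi2_low - 1/chi2_high)."""
    return (n - 1) * (1.0 / chi2_low - 1.0 / chi2_high)


factors = {n: expected_width_factor(n, lo, hi) for n, (lo, hi) in cutoffs.items()}

# Smallest candidate n whose expected width does not exceed sigma^2.
smallest_n = min(n for n, f in factors.items() if f <= 1.0)

for n in sorted(factors):
    print(n, round(factors[n], 2))
print("smallest candidate n:", smallest_n)
```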
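TABLE 7.5.2
 n    $\chi^2_{.025,n-1}$    $\chi^2_{.975,n-1}$    $(n-1)\left(\frac{1}{\chi^2_{.025,n-1}} - \frac{1}{\chi^2_{.975,n-1}}\right)$
15        5.629         26.119        1.95
20        8.907         32.852        1.55
30       16.047         45.722        1.17
38       22.106         55.668        1.01
39       22.878         56.895        0.99
Testing $H_0$: $\sigma^2 = \sigma_0^2$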
The generalized likelihood ratio criterion introduced in Section 6.5 can be used to
set up hypothesis tests for $\sigma^2$. The complete derivation appears in Appendix 7.A.3.
Theorem 7.5.2 states the resulting decision rule. Playing a key role, just as it did in the
construction of confidence intervals for $\sigma^2$, is the chi square ratio $\frac{(n-1)S^2}{\sigma^2}$.
Theorem 7.5.2. Let $s^2$ denote the sample variance calculated from a random sample of
n observations drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$. Let
$\chi^2 = (n-1)s^2/\sigma_0^2$.
a. To test $H_0$: $\sigma^2 = \sigma_0^2$ versus $H_1$: $\sigma^2 > \sigma_0^2$ at the $\alpha$ level of significance, reject
$H_0$ if $\chi^2 \geq \chi^2_{1-\alpha,n-1}$.
b. To test $H_0$: $\sigma^2 = \sigma_0^2$ versus $H_1$: $\sigma^2 < \sigma_0^2$ at the $\alpha$ level of significance, reject
$H_0$ if $\chi^2 \leq \chi^2_{\alpha,n-1}$.
c. To test $H_0$: $\sigma^2 = \sigma_0^2$ versus $H_1$: $\sigma^2 \neq \sigma_0^2$ at the $\alpha$ level of significance, reject $H_0$ if $\chi^2$ is
either (1) $\leq \chi^2_{\alpha/2,n-1}$ or (2) $\geq \chi^2_{1-\alpha/2,n-1}$.
CASE STUDY 7.5.2
Home buyers can choose a variety of ways to finance mortgages, ranging from fixed-rate
thirty-year notes to one-year adjustables, where interest rates can move up or
down from year to year. During the first quarter of 1994, lenders were
charging an average rate of 8.84% on a $100,000 loan amortized over thirty years; the
standard deviation from bank to bank was 0.10%.
Since one-year adjustables give lenders considerable flexibility in responding quickly
to changing economic conditions, we might reasonably expect those rates to have a
greater standard deviation than the 0.10% that characterizes thirty-year fixed notes.
Lenders should be more willing to incur higher risks to compete for potential clients
if they know they can make adjustments as time goes by.
Table 7.5.3 lists rates quoted by n = 9 lenders for one-year adjustables (186). The
sample standard deviation for those $y_i$'s is s = 0.22. Do these data lend credence
to the speculation that rates for one-year adjustables are more variable than rates for
conventional mortgages?
Let $\sigma^2$ denote the variance of the population represented by the $y_i$'s in Table 7.5.3.
To judge whether a standard deviation increase from 0.10% to 0.22% is statistically
significant requires that we test
$H_0$: $\sigma^2 = (0.10)^2$
versus
$H_1$: $\sigma^2 > (0.10)^2$
Let $\alpha = 0.05$. With n = 9, the rejection region for the chi square ratio [from Part (a)
of Theorem 7.5.2] starts at $\chi^2_{1-\alpha,n-1} = \chi^2_{.95,8} = 15.507$ (see Figure 7.5.3). But
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(9-1)(0.22)^2}{(0.10)^2} = 38.72$$
so our decision is clear: Reject $H_0$.
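The decision in Case Study 7.5.2 amounts to one substitution into the chi square ratio of Theorem 7.5.2(a). A minimal Python sketch follows; the function name `chi_square_ratio` is a hypothetical helper, and the cutoff 15.507 is the table value $\chi^2_{.95,8}$ quoted in the case study.

```python
def chi_square_ratio(n, s, sigma0):
    """Test statistic (n - 1) s^2 / sigma0^2 from Theorem 7.5.2."""
    return (n - 1) * s ** 2 / sigma0 ** 2


CUTOFF_95_8DF = 15.507  # chi-square .95 cutoff with 8 df, from the table

stat = chi_square_ratio(n=9, s=0.22, sigma0=0.10)
reject_h0 = stat >= CUTOFF_95_8DF  # one-sided test to the right

print(f"chi-square ratio = {stat:.2f}; reject H0: {reject_h0}")
```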
TABLE 7.5.3
Lender                          Initial Rate on Adjustables
AmSouth Mortgage                        6.38%
Boatmen's National Mortgage             6.63
Cavalry Bank                            6.88
First American National Bank            6.75
First Investment                        6.13
First Republic                          6.50
NationsBanc Mortgage                    6.63
Union Planters                          6.38
MortgageSouth                           6.50
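[Figure: chi square pdf with 8 degrees of freedom; the rejection region, of area 0.05, lies
to the right of 15.507]
FIGURE 7.5.3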
QUESTIONS
7.5.1. Use Appendix Table A.3 to find the following cutoffs and indicate their location on the
chi square distribution.
graph of the
X,~.14
(b) X.~.2
(8)
(c) X2.015,9
7.5.2. Evaluate the foUowing probabilities:
(D) P(Xf7?: 8.672)
(b) P{X~ < 10.645)
(c) P(9.591:::
34.170)
(d) P(X~ < 9.210)
xio :::
7.5.3. Find the value y that satisfies each of the
(D) p{X~ 2: y) = 0.99
rrUlnUlIT!
= 0.05
(b)
P(x?s::: y)
(e)
P(9.542:::
xb ::: y) = 0.09
(d)
p{y:::
::: 48.232) =
0.95
7.5.4. For what value of n is each of the f'nllnWUl statements true'!
(a)
2: 5.(09) 0.975
(b)
(c)
(d)
7.5.5. For df values
:::
X; : :; 30.144) = 0.05
=0.05
::: 24.769)
= 0.80
the range of Appendix Table A3, chi square cutoffs can be
using a formula based on cutoffs from the standard normal pdf, fz (z).
~ z~)
= p, respectively. Then
Approximate the 95th percentile of the chi square distribution with 200 df. That is, find
the value of y for whkh
p(xk::: y) .;" 0.95
7.5.6. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size n from a normal distribution having mean
$\mu$ and variance $\sigma^2$. What is the smallest value of n for which
Hint: Use a trial-and-error method.
7.5.7. Start with the fact that $(n-1)S^2/\sigma^2$ has a chi square distribution with n - 1 df (if
the $Y_i$'s are normally distributed) and derive the confidence interval formulas given in
Theorem 7.5.1.
7.5.8. A random sample of size n = 19 is drawn from a normal distribution for which $\sigma^2 = 12.0$.
In what range are we likely to find the sample variance, $s^2$? Answer the question by
finding two numbers a and b such that
$$P(a \leq S^2 \leq b) = 0.95$$
7.5.9. One of the occupational hazards of being an airplane pilot is the hearing loss that results
from being exposed to high noise levels. To document the magnitude of the problem,
a team of researchers measured the cockpit noise levels in 18 commercial aircraft. The
results (in decibels) are as follows (94).
Plane
1
2
3
4
5
6
7
8
9
Noise Level
Plane
Noise Level
74
10
72
71
11
90
80
82
85
80
13
14
15
16
75
17
75
18
87
73
83
86
83
83
80
(a) Assume that cockpit noise levels are normally distributed. Use Theorem 7.5.1 to
construct a 95% confidence interval for the standard deviation of noise levels from
plane to plane.
(b) Use these same data to construct two one-sided 95% confidence intervals for $\sigma$.
7.5.10. In Case Study 7.5.1, the 95% confidence interval was constructed for $\sigma$ rather than
for $\sigma^2$. In practice, is an experimenter more likely to focus on the standard deviation or on
the variance, or do you think that both formulas in Theorem 7.5.1 are likely to be used
equally often? Explain.
7.5.11. (a) Use the asymptotic normality of chi square random variables
to derive large-sample confidence interval formulas for $\sigma$ and $\sigma^2$.
(b) Use your answer to Part (a) to construct an approximate 95% confidence interval
for the standard deviation of estimated potassium-argon ages based on the 19
$y_i$'s in Table 7.5.1. How does this confidence interval compare with the one in
Case Study 7.5.1?
7.5.12. If a 90% confidence interval for $\sigma^2$ is reported to be (51.47, 261.90), what is the value
of the sample standard deviation?
7.5.13. Let Y1 , Y2 •... , Y" be a random sample of size n from the pdf
fy(y) =
(~)
(a) Use mClmlet1l-g(!;neratmg functions to show that the ratio 2nf/8 has a chi square
distribution
(b) Use the result in Part (a) to derive a 100(1 - ct)% confidence interval forB.
7.5.14. Another method for dating rocks was used before the advent of the potassium-argon
method described in Case Study 7.5.1. Based on a mineral's lead content, it was capable
of yielding estimates for this same time period with a standard deviation of 30.4 million
years. The potassium-argon method in Case Study 7.5.1 had a smaller sample standard
deviation of $\sqrt{733.4} = 27.1$ million years. Is this "proof" that the potassium-argon
method is more precise? Using the data in Table 7.5.1, test at the 0.05 level whether the
potassium-argon method has a smaller standard deviation than the older procedure
based on lead.
machine puts into 25
7.5.lS. When working properly, tbe amounts of cement that a
recorded for 30
bags have a standard deviation «(1) of 1.0 kg. Below are the
selected at random from a day's
Test Ho: (12 = 1 versus HI: (12 > 1
the ct =
JeveJ of significance.
that the weights are
distributed.
26.18
2530
25.18
24.54
25.14
25.44
24.49
25.01
25.12
25.67
Note:
Yi = 758.62 and
7.5.16. A stock analyst claims to
quality mutual funds and
24.22
23.97
25.05
26.24
25.01
24.71
25.27
24.22
24.49
25.68
26.01
25.50
25.84
26.09
25.21
26.04
2523
= 19,195.7938
deviserl a matbematical technique for
that a client's portfolio will have
nTi'nni'IPl:
.;><.V,","'.,U,,"
high
10-year annualized returns
lower volatility; that is, a smaller standard deviation.
After 10
'one
the
portfolios showed an average 100year
annualized return
and a
deviation of 10.17%. The benchmarks for
the
of funds
area mean of 10.10% and a standard deviation of 15.67%.
(a)
Ii be the mean for a
portfolio selected by the analyst's method. Test
at the 0.05 level that the portfolio beat the benchmark; that is, test 80: JL = 10.1
versus
Ii > 10.1.
(b) Let Cf be the stanOJuo ael/lallOn for a 24-stock portfolio selected by the analyst's
that the portfolio beat the benchmark~ that is, test
TAKING A SECOND LOOK AT STATISTICS ("BAD" ESTIMATORS)
Estimating parameters has been a major theme of Chapters 5, 6, and 7, and it will
continue to play a prominent role in the chapters ahead as our attention increasingly turns
toward statistical inference. Not surprisingly, our discussion of estimation has been driven
by the need to find "good" estimators, that is, ones that can claim to be unbiased,
efficient, consistent, and/or sufficient. In the spirit of "thinking outside the box," though,
we might want to ask whether it would ever be desirable to use a "bad" estimator. If our
objective is the pursuit of truth, the answer, of course, would be "no." If, on the other
hand, our objective is something other than truth, the answer is sometimes "yes."
For example, suppose you miss a Psychology 101 exam because of illness, and your
professor gives you two options to make up the work: you can take either (1) a sixty-question
True/False make-up test or (2) a one hundred-question True/False make-up test. The professor
promises that the two tests will be equivalent: the questions will be of the same
level of difficulty and in both cases 75% will be the lowest passing grade
(meaning a score of forty-five or higher on the sixty-question test and seventy-five or
higher on the one hundred-question test). Which option should you choose, assuming you
want to maximize your chances of passing?
The answer is that either option might be better, depending on how much you know
(or don't know) about Psychology 101. Suppose p denotes your probability of answering
any question correctly. Both of these tests, of course, represent a series of
independent Bernoulli trials, and we have already seen that the unbiased, efficient,
and sufficient estimator for p in such a model is $\frac{X}{n}$, where n is the length of the test
(either sixty or one hundred) and the random variable X is
the number of questions answered correctly. Moreover, the variance of $\frac{X}{n}$ is $\frac{p(1-p)}{n}$,
which implies that $\frac{X}{100}$ is a more precise estimator for p than $\frac{X}{60}$ because
it has a smaller variance.
Which estimator is better for you, though, is not necessarily $\frac{X}{100}$. If your knowledge
of the Psychology 101 material is deficient to the extent that your value of p is less than
0.75 (i.e., less than the passing mark), then it is in your best interest to estimate p
as poorly as possible, meaning with as small a sample as possible (the sixty-question test).
On the other hand, if your probability of answering a test question correctly is greater than 0.75,
it would be to your advantage to estimate p as precisely as possible (by taking the one
hundred-question test).
Calculating your chances of passing either test reduces to a simple summation of binomial
probabilities:
$$P(\text{pass 60-question test}) = P(X/60 \geq 0.75) = P(X \geq 45) = \sum_{k=45}^{60}\binom{60}{k}p^k(1-p)^{60-k}$$
$$P(\text{pass 100-question test}) = P(X/100 \geq 0.75) = P(X \geq 75) = \sum_{k=75}^{100}\binom{100}{k}p^k(1-p)^{100-k}$$
Table 7.6.1 shows a comparison of these two probabilities for p values of 0.65, 0.70,
0.80, and 0.85. As predicted, poorly prepared students should take their chances with
the shorter test (and the less precise estimator, $\frac{X}{60}$); well-prepared students (for whom
p > 0.75) are better served by estimating p as precisely as possible (by taking the one
hundred-question test). For example, someone with a 65% probability of answering a
random True/False question correctly who takes the one hundred-question test has a 2%
chance of "getting lucky" in the sense that his or her estimator $\left(= \frac{X}{100}\right)$ would
be 0.75 or higher (enough to pass the test). That same student, though, would have a
3 1/2 times greater chance of passing the sixty-question test: According to the table, the
estimator $\frac{X}{60}$ when p = 0.65 has a 7% probability of erring in the student's favor (by
equalling or exceeding 0.75).
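The two binomial sums are easy to evaluate directly. The following Python sketch is one way to do so, using only the standard library; the function name `pass_probability` and the dictionary layout are illustrative choices, not part of the original presentation.

```python
from math import comb


def pass_probability(n_questions, passing_score, p):
    """P(X >= passing_score) for X ~ binomial(n_questions, p)."""
    return sum(
        comb(n_questions, k) * p ** k * (1 - p) ** (n_questions - k)
        for k in range(passing_score, n_questions + 1)
    )


# p -> (probability of passing the 60-question test, the 100-question test)
passing = {
    p: (pass_probability(60, 45, p), pass_probability(100, 75, p))
    for p in (0.65, 0.70, 0.80, 0.85)
}

for p, (short_test, long_test) in passing.items():
    print(f"p = {p:.2f}: 60-question {short_test:.2f}, 100-question {long_test:.2f}")
```

For p below 0.75 the shorter test should give the larger passing probability, and for p above 0.75 the ordering should reverse, which is the pattern summarized in Table 7.6.1.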
TABLE 7.6.1 Probabilities of Passing
p:                    0.65    0.70    0.80    0.85
60-question test      0.07    0.24    0.87    0.99
100-question test     0.02    0.16    0.91    1.00
APPENDIX 7.A.1
MINITAB APPLICATIONS
Many statistical procedures, including several featured in this chapter, require that the
sample mean and sample standard deviation be calculated. MINITAB's DESCRIBE
command gives $\bar{y}$ and s, along with several other useful numerical characteristics of a
sample. Figure 7.A.1.1 shows the DESCRIBE input and output for the twenty observations
in
7.4.1.
MTB > set c1
DATA> 2.5 3.2 0.5 0.4 0.3 0.1 0.1 0.2 1.4 8.6 0.2 0.1
DATA> 0.4 1.8 0.3 1.3 7.4 11.2 2.1 10.1
DATA> end
MTB > describe c1
Descriptive Statistics: C1
Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
C1        20   0  2.610    0.809  3.617    0.100  0.225   0.900  3.025   11.200
FIGURE 7.A.1.1
Here,
N = sample size
N* = number of observations missing from c1 (that is, the number of "interior" blanks)
Mean = sample mean = $\bar{y}$
SE Mean = standard error of the mean = $s/\sqrt{n}$
StDev = sample standard deviation = s
Minimum = smallest observation
Q1 = first quartile = 25th percentile
Median = middle observation (in terms of magnitude), or
average of the middle two if n is even
Q3 = third quartile = 75th percentile
Maximum = largest observation
Using MINITAB Windows
1. Enter data under C1 in the WORKSHEET. Click on STAT, then on BASIC
STATISTICS, then on DISPLAY DESCRIPTIVE STATISTICS.
2. Type C1 in VARIABLES box; click on OK.
Percentiles of chi square, t, and F distributions can be obtained using the INVCDF
command introduced in Appendix 3.A.1. Figure 7.A.1.2 shows the syntax for printing out
$\chi^2_{.95,6}$ (= 12.5916) and $F_{.01,4,7}$ (= 0.0667746).
MTB > invcdf 0.95;
SUBC> chisquare 6.
Chi-Square with 6 DF
P(X <= x)        x
     0.95  12.5916
MTB > invcdf 0.01;
SUBC> f 4 7.
F distribution with 4 DF in numerator and 7 DF in denominator
P(X <= x)          x
     0.01  0.0667746
FIGURE 7.A.1.2
To find Student t cutoffs, our notation needs to be translated into left-tail form. We
have defined $t_{.10,13}$, for example, to be the value for which
$$P(T_{13} \geq t_{.10,13}) = 0.10$$
In the terminology of the INVCDF command, though, $t_{.10,13}$ (= 1.35017) is the ninetieth
percentile of the $f_{T_{13}}(t)$ pdf (see Figure 7.A.1.3).
MTB > invcdf 0.90;
SUBC> t 13.
Student's t distribution with 13 DF
P(X <= x)        x
      0.9  1.35017
FIGURE 7.A.1.3
The MINITAB command for constructing a confidence interval for $\mu$ (Theorem 7.4.1)
is "TINTERVAL X Y," where X denotes the desired value for the confidence coefficient
$1 - \alpha$ and Y is the column where the data are stored. Figure 7.A.1.4 shows the
TINTERVAL command applied to the bat data from Case Study 7.4.1; $1 - \alpha$ is taken
to be 0.95.
MTB > set c1
DATA> 62 52 68 23 34 45 27 42 83 56 40
DATA> end
MTB > tinterval 0.95 c1
One-Sample T: C1
Variable   N     Mean    StDev  SE Mean              95% CI
C1        11  48.3636  18.0846   5.4527  (36.2142, 60.5131)
FIGURE 7.A.1.4
Constructing Confidence Intervals Using MINITAB Windows
1. Enter data under C1 in the WORKSHEET.
2. Click on STAT, then on BASIC STATISTICS, then on 1-SAMPLE T.
3. Enter C1 in the SAMPLES IN COLUMNS box, click on OPTIONS, and
enter the value of $100(1 - \alpha)$ in the CONFIDENCE LEVEL box.
4. Click on OK. Click on OK.
Figure 7.A.1.5 shows the input and output for doing a t test on the approval data given
in Table 7.4.2. The basic command is "TTEST X Y," where X is the value of $\mu_0$ and Y
is the column where the data are stored. If no other punctuation is used, the program
automatically takes $H_1$ to be two-sided. If a one-sided test to the right is desired, we write
MTB > ttest X Y;
SUBC> alternative +1.
For a one-sided test to the left, the subcommand becomes "alternative -1".
MTB > set c1
DATA> 69 65 69 63 60 68 64 46 67 61 69
DATA> end
MTB > ttest 62 c1
One-Sample T: C1
Test of mu = 62 vs not = 62
Variable   N     Mean   StDev  SE Mean              95% CI      T      P
C1        12  58.6667  6.9467   2.0053  (54.2536, 63.0797)  -1.66  0.125
FIGURE 7.A.1.5
Notice that no value for $\alpha$ is entered, and the conclusion is not phrased as either
"Accept $H_0$" or "Reject $H_0$." Rather, the analysis ends with the calculation of the data's
P-value. Here,
$$P\text{-value} = P(T_{11} \leq -1.66) + P(T_{11} \geq 1.66) = 0.0626 + 0.0626 = 0.125$$
(recall Definition 6.2.3). Since the P-value exceeds the intended $\alpha$ (= 0.05), the conclusion
is "Fail to reject $H_0$."
1. Enter data under C1 in the WORKSHEET.
2. Click on STAT, then on BASIC STATISTICS, then on 1-SAMPLE T.
3. Type C1 in SAMPLES IN COLUMNS box; click on TEST MEAN and enter the
value of $\mu_0$. Click on OPTIONS, then click on NOT EQUAL.
4. Click on whichever $H_1$ is desired. Click on OK; then click on OK.
APPENDIX 7.A.2
SOME DISTRIBUTION RESULTS FOR $\bar{Y}$ AND $S^2$
Theorem 7.A.2.1. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size n from a normal
distribution with mean $\mu$ and variance $\sigma^2$. Define
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i \quad\text{and}\quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$$
Then
a. $\bar{Y}$ and $S^2$ are independent.
b. $\frac{(n-1)S^2}{\sigma^2}$ has a chi square distribution with n - 1 degrees of freedom.
Proof. The proof of this theorem relies on certain linear algebra techniques as well as
a change-of-variables formula for multiple integrals. Definition 7.A.2.1 and the Lemma
that follows review the necessary background results. For further details, see (46) or
(224).
Definition 7.A.2.1.
a. A matrix A is said to be orthogonal if $AA^T = I$.
b. Let $\beta$ be any n-dimensional vector over the real numbers. That is, $\beta = (c_1, c_2, \ldots, c_n)$,
where each $c_j$ is a real number. The length of $\beta$ will be defined as
$$\|\beta\| = (c_1^2 + \cdots + c_n^2)^{1/2}$$
Lemma.
a. A matrix A is orthogonal if and only if $\|A\beta\| = \|\beta\|$ for each $\beta$.
b. If a matrix A is orthogonal, then $|\det A| = 1$.
c. Let g be a one-to-one continuous mapping on a subset, D, of n-space. Then
$$\int_{g(D)} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n = \int_{D} f(g(y_1, \ldots, y_n))\,|J(g)|\,dy_1 \cdots dy_n$$
where J(g) is the Jacobian of the transformation.
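Set $X_i = (Y_i - \mu)/\sigma$ for $i = 1, 2, \ldots, n$. Then all the $X_i$'s are N(0, 1). Let A be an
$n \times n$ orthogonal matrix whose last row is $\left(\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}\right)$. Let
$\vec{X} = (X_1, \ldots, X_n)^T$ and define $\vec{Z} = (Z_1, Z_2, \ldots, Z_n)^T$ by the transformation $\vec{Z} = A\vec{X}$.
[Note that $Z_n = \frac{1}{\sqrt{n}}X_1 + \cdots + \frac{1}{\sqrt{n}}X_n = \sqrt{n}\,\bar{X}$.] For any set D,
$$P(\vec{Z} \in D) = P(A\vec{X} \in D) = P(\vec{X} \in A^{-1}D)$$
$$= \int_{A^{-1}D} f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\,dx_1 \cdots dx_n$$
$$= \int_{D} f_{X_1,\ldots,X_n}(g(\vec{z}))\,|\det J(g)|\,dz_1 \cdots dz_n$$
$$= \int_{D} f_{X_1,\ldots,X_n}(A^{-1}\vec{z}) \cdot 1 \cdot dz_1 \cdots dz_n$$
where $g(\vec{z}) = A^{-1}\vec{z}$. But $A^{-1}$ is orthogonal, so setting $(x_1, \ldots, x_n)^T = A^{-1}\vec{z}$, we have
that
$$x_1^2 + \cdots + x_n^2 = z_1^2 + \cdots + z_n^2$$
Thus
$$f_{X_1,\ldots,X_n}(\vec{x}) = (2\pi)^{-n/2}e^{-(1/2)(x_1^2 + \cdots + x_n^2)} = (2\pi)^{-n/2}e^{-(1/2)(z_1^2 + \cdots + z_n^2)}$$
From this we conclude that
$$P(\vec{Z} \in D) = \int_{D} (2\pi)^{-n/2}e^{-(1/2)(z_1^2 + \cdots + z_n^2)}\,dz_1 \cdots dz_n$$
implying that the $Z_j$'s are independent standard normals. Finally,
$$\sum_{j=1}^{n} Z_j^2 = \sum_{j=1}^{n-1} Z_j^2 + n\bar{X}^2 = \sum_{j=1}^{n} X_j^2 = \sum_{j=1}^{n}(X_j - \bar{X})^2 + n\bar{X}^2$$
Therefore,
$$\sum_{j=1}^{n-1} Z_j^2 = \sum_{j=1}^{n}(X_j - \bar{X})^2$$
and $\bar{X}^2$ (and thus $\bar{X}$) is independent of $\sum_{i=1}^{n}(X_i - \bar{X})^2$, so the conclusion
follows for standard normal variables. Also, since $\bar{Y} = \sigma\bar{X} + \mu$ and
$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sigma^2\sum_{i=1}^{n}(X_i - \bar{X})^2$, the conclusion follows for $N(\mu, \sigma^2)$ variables.
Comment. As part of the proof just presented, we established a version of Fisher's
lemma:
Let $X_1, X_2, \ldots, X_n$ be independent standard normal random variables and let A be an
orthogonal matrix. Define $(Z_1, \ldots, Z_n)^T = A(X_1, \ldots, X_n)^T$. Then the $Z_i$'s are independent
standard normal random variables.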
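APPENDIX 7.A.3
A PROOF OF THEOREM 7.5.2
We begin by considering the test of $H_0$: $\sigma^2 = \sigma_0^2$ against a two-sided $H_1$. The relevant
parameter spaces are
$$\omega = \{(\mu, \sigma^2): -\infty < \mu < \infty,\ \sigma^2 = \sigma_0^2\}$$
and
$$\Omega = \{(\mu, \sigma^2): -\infty < \mu < \infty,\ 0 \leq \sigma^2\}$$
In both, the maximum likelihood estimate for $\mu$ is $\bar{y}$. In $\omega$, the maximum likelihood estimate
for $\sigma^2$ is simply $\sigma_0^2$; in $\Omega$, $\hat{\sigma}^2 = (1/n)\sum_{i=1}^{n}(y_i - \bar{y})^2$ (see Example 5.4.4).
Therefore, the two likelihood functions, maximized over $\omega$ and over $\Omega$, are
$$L(\omega_e) = \left(\frac{1}{2\pi\sigma_0^2}\right)^{n/2}\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{y_i - \bar{y}}{\sigma_0}\right)^2\right]$$
and
$$L(\Omega_e) = \left(\frac{1}{2\pi\hat{\sigma}^2}\right)^{n/2}\exp\left[-\frac{n}{2}\right]$$
It follows that the generalized likelihood ratio is given by
$$\lambda = \frac{L(\omega_e)}{L(\Omega_e)} = \left(\frac{\hat{\sigma}^2}{\sigma_0^2}\right)^{n/2}\exp\left[-\frac{n}{2}\left(\frac{\hat{\sigma}^2}{\sigma_0^2}\right) + \frac{n}{2}\right]$$
We need to know the behavior of $\lambda$, considered as a function of $(\hat{\sigma}^2/\sigma_0^2)$. For simplicity,
let $x = (\hat{\sigma}^2/\sigma_0^2)$. Then $\lambda = x^{n/2}e^{-(n/2)x + n/2}$ and the inequality $\lambda \leq \lambda^*$ is equivalent to
$xe^{-x} \leq e^{-1}(\lambda^*)^{2/n}$. The right-hand side is again an arbitrary constant, say $k^*$. Figure 7.A.3.1
is a graph of $y = xe^{-x}$. Notice that the values of $x = (\hat{\sigma}^2/\sigma_0^2)$ for which $xe^{-x} \leq k^*$, and
equivalently $\lambda \leq \lambda^*$, fall into two regions, one for values of $\hat{\sigma}^2/\sigma_0^2$ close to zero and the
other for values of $\hat{\sigma}^2/\sigma_0^2$ much larger than one. According to the likelihood ratio principle,
we should reject $H_0$ for any $\lambda \leq \lambda^*$, where $P(\Lambda \leq \lambda^* \mid H_0) = \alpha$. But $\lambda^*$ determines (via $k^*$)
numbers a and b so that the critical region is $C = \{(\hat{\sigma}^2/\sigma_0^2): (\hat{\sigma}^2/\sigma_0^2) \leq a \text{ or } (\hat{\sigma}^2/\sigma_0^2) \geq b\}$.
[Figure: graph of $y = xe^{-x}$ with the horizontal line $y = k^*$; the "Reject" regions are
$x \leq a$ and $x \geq b$]
FIGURE 7.A.3.1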
Comment. At this point it is necessary to make a slight approximation. Just because
$P(\Lambda \leq \lambda^* \mid H_0) = \alpha$, it does not follow that
$$P\left(\frac{\hat{\sigma}^2}{\sigma_0^2} \leq a\right) = \frac{\alpha}{2} = P\left(\frac{\hat{\sigma}^2}{\sigma_0^2} \geq b\right)$$
and, in fact, the two tails of the critical regions will not have exactly the same probability.
Nevertheless, the two are numerically close enough so that we will not substantially
compromise the likelihood ratio criterion by setting each one equal to $\alpha/2$.
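Note that
$$P\left[\frac{\frac{1}{n}\sum_{i=1}^n (Y_i - \bar{Y})^2}{\sigma_0^2} \le a\right] = P\left[\frac{(n-1)S^2}{\sigma_0^2} \le na\right] = P(\chi^2_{n-1} \le na)$$
and, similarly,
$$P\left[\frac{\frac{1}{n}\sum_{i=1}^n (Y_i - \bar{Y})^2}{\sigma_0^2} \ge b\right] = P(\chi^2_{n-1} \ge nb)$$
Thus we will choose as critical values $\chi^2_{\alpha/2, n-1}$ and $\chi^2_{1-\alpha/2, n-1}$ and reject $H_0$ if either
$$\frac{(n-1)s^2}{\sigma_0^2} \le \chi^2_{\alpha/2, n-1}$$
or
$$\frac{(n-1)s^2}{\sigma_0^2} \ge \chi^2_{1-\alpha/2, n-1}$$
(see Figure 7.A.3.2).
FIGURE 7.A.3.2 (chi square pdf with $n - 1$ degrees of freedom; $H_0$ is rejected in the two tails, below $\chi^2_{\alpha/2, n-1}$ and above $\chi^2_{1-\alpha/2, n-1}$)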
Comment. One-sided tests for dispersion are set up in a similar fashion. In the case of
$$H_0: \sigma^2 = \sigma_0^2 \quad \text{versus} \quad H_1: \sigma^2 < \sigma_0^2$$
$H_0$ is rejected if
$$\frac{(n-1)s^2}{\sigma_0^2} \le \chi^2_{\alpha, n-1}$$
For
$$H_0: \sigma^2 = \sigma_0^2 \quad \text{versus} \quad H_1: \sigma^2 > \sigma_0^2$$
$H_0$ is rejected if
$$\frac{(n-1)s^2}{\sigma_0^2} \ge \chi^2_{1-\alpha, n-1}$$
APPENDIX 7.A.4
A PROOF THAT THE ONE-SAMPLE t TEST IS A GLRT
Theorem 7.A.4.1. The one-sample t test, as outlined in Theorem 7.4.2, is a GLRT.
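Proof. Consider the test of $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$. The two parameter spaces
restricted to $H_0$ and $H_0 \cup H_1$ (that is, $\omega$ and $\Omega$, respectively) are given by
$$\omega = \{(\mu, \sigma^2): \mu = \mu_0, \ 0 \le \sigma^2 < \infty\}$$
and
$$\Omega = \{(\mu, \sigma^2): -\infty < \mu < \infty, \ 0 \le \sigma^2 < \infty\}$$
Without elaborating the details (see Example 5.2.4 for a similar problem), it can
be readily shown that, under $\omega$,
$$\mu_e = \mu_0 \quad \text{and} \quad \sigma_e^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \mu_0)^2$$
Under $\Omega$,
$$\mu_e = \bar{y} \quad \text{and} \quad \sigma_e^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2$$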
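Substituting these estimates into the normal likelihood function gives
$$L(\omega_e) = \left[\frac{n e^{-1}}{2\pi\sum_{i=1}^n (y_i - \mu_0)^2}\right]^{n/2}$$
and
$$L(\Omega_e) = \left[\frac{n e^{-1}}{2\pi\sum_{i=1}^n (y_i - \bar{y})^2}\right]^{n/2}$$
From $L(\omega_e)$ and $L(\Omega_e)$ we get the likelihood ratio:
$$\lambda = \frac{L(\omega_e)}{L(\Omega_e)} = \left[\frac{\sum_{i=1}^n (y_i - \bar{y})^2}{\sum_{i=1}^n (y_i - \mu_0)^2}\right]^{n/2}, \quad 0 < \lambda \le 1$$
As is often the case, it will prove to be more convenient to base a test on a monotonic
function of $\lambda$, rather than on $\lambda$ itself. We begin by rewriting the ratio's denominator:
$$\sum_{i=1}^n (y_i - \mu_0)^2 = \sum_{i=1}^n [(y_i - \bar{y}) + (\bar{y} - \mu_0)]^2 = \sum_{i=1}^n (y_i - \bar{y})^2 + n(\bar{y} - \mu_0)^2$$
(the cross-product term vanishes because $\sum_{i=1}^n (y_i - \bar{y}) = 0$).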
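Therefore,
$$\lambda = \left[1 + \frac{n(\bar{y} - \mu_0)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}\right]^{-n/2} = \left(1 + \frac{t^2}{n-1}\right)^{-n/2}$$
where
$$t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$$
Observe that as $t^2$ increases, $\lambda$ decreases. This implies that the original GLRT (which, by
definition, would have rejected $H_0$ for any $\lambda$ that was too small, say, less than $\lambda^*$) is
equivalent to a test that rejects $H_0$ whenever $t^2$ is too large. But $t$ is an observation of the
random variable
$$T = \frac{\bar{Y} - \mu_0}{S/\sqrt{n}}$$
which has a Student t distribution with $n - 1$ degrees of freedom (Theorem 7.3.5).
Thus "too large" translates numerically into $t_{\alpha/2, n-1}$:
$$0 < \lambda \le \lambda^* \iff t^2 \ge (t_{\alpha/2, n-1})^2$$
But
$$t^2 \ge (t_{\alpha/2, n-1})^2 \iff t \le -t_{\alpha/2, n-1} \ \text{or} \ t \ge t_{\alpha/2, n-1}$$
and the theorem is proved. □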
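The key identity in the proof, that the likelihood ratio is a decreasing function of $t^2$, can be verified numerically. The sketch below assumes a small made-up sample and a made-up $\mu_0$ (neither comes from the text), and the function name `glr_and_t` is hypothetical. It computes $\lambda$ directly from the two sums of squares and again from the t ratio.

```python
import math

def glr_and_t(y, mu0):
    """Return (lambda, t, lambda recomputed from t) for H0: mu = mu0."""
    n = len(y)
    y_bar = sum(y) / n
    ss_alt = sum((v - y_bar) ** 2 for v in y)    # sum of (y_i - y_bar)^2
    ss_null = sum((v - mu0) ** 2 for v in y)     # sum of (y_i - mu0)^2
    lam = (ss_alt / ss_null) ** (n / 2)          # ratio of maximized likelihoods
    s = math.sqrt(ss_alt / (n - 1))              # sample standard deviation
    t = (y_bar - mu0) / (s / math.sqrt(n))       # one-sample t ratio
    lam_from_t = (1 + t * t / (n - 1)) ** (-n / 2)
    return lam, t, lam_from_t

# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 1.9, 2.8, 3.0, 2.5]
lam, t, lam_from_t = glr_and_t(sample, mu0=2.0)
print(lam, lam_from_t, t)
```

The two values of $\lambda$ agree up to floating-point error, so small values of $\lambda$ correspond exactly to large values of $t^2$.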
CHAPTER 8
Types of Data: A Brief Overview
8.1 INTRODUCTION
8.2 CLASSIFYING DATA
8.3 TAKING A SECOND LOOK AT STATISTICS (SAMPLES ARE NOT "VALID"!)
The practice of statistics is typically conducted on two distinct levels.
Analyzing data requires first and foremost an understanding of random
variables. Which pdfs are modeling the observations? What parameters
are involved, and how should they be estimated? Broader issues, though,
need to be addressed as well. How is the entire set of measurements
configured? Which factors are being investigated; in what ways are they
related? Altogether, seven different types of data are profiled in Chapter 8. Collectively, they represent a sizeable fraction of the "experimental
designs" any researcher is likely to encounter.
8.1 INTRODUCTION
Chapters 6 and 7 have introduced the basic principles of statistical inference. The typical
objective in that material was either to construct a confidence interval or to test the
credibility of a null hypothesis. A variety of formulas and decision rules were derived to
accommodate distinctions in the nature of the data and the parameter being investigated.
It should not go unnoticed, though, that every set of data in those two chapters, despite
superficial differences, shares a critically important common denominator: each
represents the exact same experimental design.
A working knowledge of statistics requires that the subject be pursued at two different
levels. On the one hand, attention needs to be paid to the mathematical properties
inherent in the individual measurements. These are what might be thought of as the
"micro" structure of statistics. What is the pdf of the $Y_i$s? Do we know $E(Y_i)$ or $\mathrm{Var}(Y_i)$?
Are the $Y_i$s independent?
Viewed collectively, though, every set of measurements also has a certain overall
structure, or design. It will be those "macro" features that we focus on in this chapter. A
number of issues need to be addressed. How is one design different from another? Under
what circumstances is a given design desirable? Or undesirable? How does the design of
an experiment influence the analysis of that experiment?
The answers to some of these questions will need to be deferred until each design is
taken up individually and in detail later in the text. For now our objective is much more
limited: Chapter 8 is meant to be a brief introduction to some of the important ideas
involved in the classification of data. What we learn here will serve as a backdrop and a frame
of reference for the multiplicity of statistical procedures derived in Chapters 9 through 14.
Definitions
To describe an experimental design, and to distinguish one from another, requires that
we understand several key definitions.
Treatments and Treatment Levels. The word treatment is used to denote any condition
or trait that is "applied to" or "characteristic of" the subjects being measured. Different
versions, extents, or aspects of a treatment are referred to as levels. Illustrating that
distinction is the breakdown in Table 8.1.1, which shows consumer reactions (on a scale
TABLE 8_1.1
Sports Coupe
Age of SubjeCt
Male
21-44
8
45-64
7
7
7
65+
4
6
7
6
6
5
3
5
Four-Door Sedt:Jn
Male
Female
6
8
7
5
8
8
9
8
6
7
7
9
of one to ten) to two new automobile models. Listed are the opinions given by a total
of twenty-four subjects. Age, gender, and model of car are all considered treatments.
The three levels of age are the ranges 21-44, 45-64, and 65+. Similarly, male and female
are the two levels of gender, and sports coupe and four-door sedan are the model levels.
Blocks. Sometimes groups of subjects share certain characteristics that affect the way
they respond to treatments, yet those characteristics are of no intrinsic interest to the
experimenter. We call any such group of related subjects a block.
Table 8.1.2 shows the yields of corn (in bushels) that were harvested from three fields: A,
B, and C. Equal areas in each field were treated with one of three fertilizers: Gro-Fast,
King's Formula 6, or Greenway. The objective was to compare the effectiveness of the
three fertilizers.
TABLE 8..1.2
A
B
C
Gro.Fast
King's Formula 6
Greenway
126
84
113
137
119
89.
121
87
124
Even city slickers can readily appreciate that no three fields will be entirely identical
in their ability to grow corn. Variations in drainage, soil composition, and sunlight will
inevitably have an effect on fertility. The precise nature of those field-to-field differences,
though, is not being quantified, nor is it the experiment's focus. In the lingo of experimental
design, fields A, B, and C are blocks. (Gro-Fast, King's Formula 6, and Greenway, on the
other hand, are treatment levels because they represent specific formulations and their
comparison is the study's stated objective.)
Independent and Dependent Samples. Whatever the context, data collected for the
purpose of comparing two or more treatment levels are necessarily either dependent or
independent. Table 8.1.3 is an example of the former. Listed are interest rates on home
mortgage loans offered by three competing banks. The 9.6, 10.1, and 9.8 reported on
January 15 are considered dependent because of what they have in common:
All three reflect, probably to no small extent, the particular economic conditions that
prevailed on January 15. By the same argument, entries 9.4, 9.9, and 9.8 are also related, in
TABLE 8.1.3
Date
Jan.
Marcb 10
JulyS
Sept 1
Second Union
Bankers Trust
Commerce Mutual
9.6%
10.1%
9.9
9.6
11.0
9.S%
9.8
9.4
9.3
10.6
9.5
lOA
TABLE 8.1.4
Brand A
Brand B
852
801
864
835
843
&J7
832
819
their case, by virtue of whatever circumstances were present on March 10. Without
exception, measurements that belong to the same block are considered to be dependent.
In practice, there are many different ways to make measurements dependent; "place"
and "time" (as in Tables 8.1.2 and 8.1.3) are two of the most common.
Contrast the structure of Table 8.1.3 with the two sets of measurements in Table 8.1.4,
showing the lengths of time (in hours) that it took ten light bulbs to burn out. Five of the
bulbs were brand A; the other five were brand B. Here there is no row-by-row common
denominator analogous to the dates in Table 8.1.3. The measurement recorded for the first
brand A bulb has no special link to the 810 recorded for the first brand B bulb. Similarly, the
brand A measurement and the 801 in the second row are unrelated. Because of the absence of
any direct connections between these two sets of observations, row-by-row, we say that
brand A and brand B measurements are independent samples.
Similar and Dissimilar Units. Units must also be taken into account when we classify
a data set's macrostructure. Two measurements are said to be similar if their units are the
same and dissimilar otherwise. Tables 8.1.1, 8.1.2, 8.1.3, and 8.1.4 have all been examples
of data that are unit compatible. The information displayed in Table 8.1.5 does not follow
that pattern. It shows (1) the amount of living area and (2) the asking price for five
properties listed by a local realtor. Since the first measurement is recorded in square
feet and the second is in dollars, the two are considered dissimilar.
TABLE 8.1.5
UvingArea
(in square feet)
Asking Price
1049 Ridgeview
Tyne
2860
3210
$410,500
419,900
6086 Harding
2350
5340
346,000
659.500
Property
4111
Quantitative Measurements and Qualitative Measurements. Finally, a distinction
needs to be drawn between measurements that are quantitative and those that are
qualitative. By definition, quantitative data are observations where the possible values
are numerical. "Values" for qualitative data are either categories or traits. Table 8.1.6
TABLE 8.1.6
                      Type of Loan    Classification
Olden Properties      Real estate     Loss
Builders              Construction    Doubtful
Maverick CDs          Commercial      Loss
Adam East             Commercial      Substandard
Bayou Construction    Construction    Marginal
illustrates qualitative data on the status of a bank's five largest loans in trouble. Here, one
measurement has three possible (nonnumerical) values; the other has four:
$$\text{Type of Loan} = \begin{cases} \text{Commercial} \\ \text{Construction} \\ \text{Real estate} \end{cases} \qquad \text{Classification} = \begin{cases} \text{Marginal} \\ \text{Substandard} \\ \text{Doubtful} \\ \text{Loss} \end{cases}$$
(By way of comparison, the data in Tables 8.1.1-8.1.5 are quantitative.)
CASE STUDY 8.1.1
Table 8.1.7 tracks the recent history of postage rates (184). On May 16, 1971, the
cost of sending a letter first class was 8¢; by January 1, 1995 (nine price hikes later),
TABLE 8.1.7
Date              Years after Jan. 1, 1971    Cost (¢)
May 16, 1971       0.37                        8
March 2, 1974      3.17                       10
Dec. 31, 1975      5.00                       13
May 29, 1978       7.41                       15
March 22, 1981    10.22                       18
Nov. 1, 1981      10.83                       20
Feb. 17, 1985     14.13                       22
April 3, 1988     17.25                       25
Feb. 3, 1991      20.09                       29
Jan. 1, 1995      24.00                       32
'fABLE 8..1.8
Passenger Boardings
1991)
Passenger Boardings
41,388
44,880
44.148
39,568
34,185
37,604
Feb.
34,805
33,025
34,873
31,330
30,954
~arch
32,402
April
38,020
42,828
41,204
~onth
July
Aug.
Sept.
Oet.
Nov.
~ay
June
(F~aL1992)
42,038
28,231
29,109
38,080
34,184
39,842
46,727
a stamp cost 32¢. The figures in Table 8.1.8 give the numbers of passenger boardings by
month for fiscal years 1991 and 1992, as reported by the Pensacola Regional Airport.
Relative to the definitions just introduced, how are these two sets of data comparable?
How are they different?
In both cases, the information recorded is quantitative and dependent, with
the source of the dependency being "time." For the postage data, there are two
treatments, "Years after Jan. 1, 1971" and "Cost (¢)." For the airport data, there is
one treatment, "Passenger boardings," but it appears at two levels, "Fiscal 1991" and
"Fiscal 1992." Moreover, the measurements in Table 8.1.7 are dissimilar, whereas those in
Table 8.1.8 are similar.
Possible Designs
The definitions cited on pages 523-525 can give rise to an enormous number
of different experimental designs, far more than can be covered in this text. Still, the
number of designs that are widely used is quite small. The vast majority of data likely to
be encountered fall into one of the following seven designs:
One-sample data
Two-sample data
k-sample data
Paired data
Randomized block data
Regression data
Categorical data
The postage figures in Table 8.1.7, for example, qualify as regression data; the passenger
boardings in Table 8.1.8 are paired data. (The ratings in Table 8.1.1, on the other hand,
have a more complicated experimental structure and cannot be described by any of these
seven basic designs.)
In Section 8.2, each design will be profiled and reduced to a mathematical
model. Special attention will be given to each design's objective: for what type
of inference is it likely to be used?
8.2 CLASSIFYING DATA
The answers to no more than four questions are needed to classify a set of data into one
of the seven basic models listed in the preceding section:
1. Are the observations quantitative or qualitative?
2. Are the units similar or dissimilar?
3. How many treatment levels are involved?
4. Are the observations dependent or independent?
In Section 8.2, we use these four questions as the starting point in distinguishing one
model from another.
One-Sample Data
The simplest of all experimental designs, one-sample data consist of a single random sample
of size n. Necessarily, the n observations are measurements reflecting one particular set
of conditions or one specific treatment. The data could be either qualitative or quantitative.
Typical is Table 8.2.1, showing for a sample of ten airlines the percentage of flights that
landed within fifteen minutes of their scheduled arrival times (197).
By far, the two most frequently encountered examples of one-sample data are (1) a
random sample of n normally distributed observations and (2) a random sample
of "successes" and "failures" occurring in a series of n independent Bernoulli trials. For
TABLE 8..2..1
Carrier
United
America West
Delta
USAir
TWA
Continental
Southwest
Alaska
American
Northwest
on Time
82.0%
88.0
76.1
83.5
78.1
77.3
92.1
87.4
79.3
samples from a normal distribution, the objective is often to construct confidence intervals
or test hypotheses about $\mu$ (using the Student t distribution) or to draw inferences about
$\sigma^2$ (using the $\chi^2$ distribution). Theorems 7.4.1 and 7.4.2 detail the procedures for
drawing conclusions about $\mu$; Theorems 7.5.1 and 7.5.2 deal with confidence intervals and
hypothesis tests for $\sigma^2$.
Data recorded as "success" or "failure" are typically modeled by the binomial
distribution, and inference procedures focus on the unknown success probability, p.
Theorem 6.3.1 gives the large-sample decision rule for testing $H_0: p = p_0$; confidence
intervals for p are taken up in Theorem 5.3.1.
Mathematical Model
Figure 8.2.1 illustrates the structure of one-sample data. For the purpose of comparing
experimental designs, it often helps to represent data points as sums of fixed and variable
components. These expressions are known as model equations. For one-sample data, the
model equation for an arbitrary $Y_i$ is written
$$Y_i = \mu + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
where $\mu$ denotes the (fixed) mean of the probability distribution being represented by the
data and $\varepsilon_i$ is a random variable reflecting the "error" in the measurement, that is, the
deviation of the measurement from its mean, $\mu$.
FIGURE 8.2.1 (a single treatment with observations $Y_1, Y_2, \ldots, Y_n$; model equation $Y_i = \mu + \varepsilon_i$, $i = 1, 2, \ldots, n$)
If the $Y_i$s are quantitative measurements, the assumption often made is that $\varepsilon_i$ is a
normal random variable with mean zero and standard deviation $\sigma$. The latter is equivalent
to assuming that $Y_i$ is normally distributed with mean $\mu$ and standard deviation $\sigma$.
Two-Sample Data
The one-sample design typically requires that a set of measurements be compared to a
fixed standard, for example, testing the null hypothesis $H_0: \mu = \mu_0$. More likely to be
encountered, though, are situations where an appropriate standard fails to exist or cannot
be identified. In those cases, measurements need to be taken on each of the treatment
levels being compared. The simplest such design occurs when only two treatment levels
are involved and the two samples are independent.
Consider the data in Table 8.2.2 showing the times (in seconds) that 15 male fruit flies
and 15 female fruit flies (Drosophila melanogaster) spent preening themselves (31). Here,
TABlE 8.2.2
Male
2.3
2.9
Female
Times (sec), Xi
1.9
2.2
2.4
3.3
12
2.0
2.3
1.9
2,7
1.2
1.3
2.1
Times (sec), Yi
3.7
11,7
5.4
2.8
2.2
2.4
4.0
2.8
2.0
2.8
2.4
2.9
10.7
2.4
3.2
"Male" and "Female" are the two treatment levels, the units are similar (seconds), and
the samples are independent. These are the conditions that define two-sample data.
Two-sample inferences tend to be hypothesis tests rather than confidence intervals,
although both techniques will be developed in Chapter 9. In Table 8.2.2, for example, the
two sample means are $\bar{x}$ (for the males) and $\bar{y} = 4.09$ sec (for the females).
Suppose $\mu_X$ and $\mu_Y$ denote the true average preening times for male fruit flies and
female fruit flies, respectively. Is the null hypothesis $H_0: \mu_X = \mu_Y$ credible in light of the
difference between $\bar{x}$ and $\bar{y}$? As we will see in Section 9.2, the answer to that question
takes the form of a two-sample t test. (And, yes, the answer is what male chauvinists would
be hoping for: the additional preening time spent by the females is statistically
significant!)
Mathematical Model. Let $X_i$ and $Y_j$ denote the ith and jth observations in the X and
Y samples, respectively. The assumptions implicit in the two-sample format are that
the Xs and Ys are independent and that
$$X_i = \mu_X + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
and
$$Y_j = \mu_Y + \varepsilon'_j, \quad j = 1, 2, \ldots, m$$
In many situations, the error terms, $\varepsilon_i$ and $\varepsilon'_j$, are assumed to be normally distributed with
mean zero and the same standard deviation $\sigma$ (see Figure 8.2.2).
FIGURE 8.2.2 (two treatment levels, with observations $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$, shown beside the two model equations)
k-Sample Data
When more than two treatment levels are being compared, and when the samples
representing those levels are independent, the observations are said to be k-sample data.
Although their assumptions are comparable, two-sample data and k-sample data are
treated as distinct experimental designs because the methods for analyzing them are
totally different.
Table 8.2.3 summarizes a set of k-sample data where k = 3. The same strain of bacteria
was grown in each of nine Petri dishes, and the latter were divided into three groups.
Each group was treated with a different antibacterial agent. Two days later the diameters
of the areas showing no bacterial growth were measured (in centimeters).
TABlE 8..2.3
M21z
ATC3
B169
2.9
5.0
3.1
4.8
4.6
2.93
4.80
4.3
Sample means:
3.87
Typically, the objective with k-sample data is to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, where
$\mu_j$ represents the true mean associated with the jth treatment level. For the data in
Table 8.2.3, for example, the question to be resolved is whether the differences among the
sample means (3.87, 2.93, and 4.80) are sufficiently large to reject the hypothesis that
$\mu_1 = \mu_2 = \mu_3$.
The t test format that figures so prominently in the interpretation of one-sample and
two-sample data cannot be extended to accommodate k-sample data. A more powerful
technique, known as the analysis of variance, is needed. The latter will be developed in
Chapters and 13.
Mathematical Model. The only structural difference between the mathematical models
for two-sample and k-sample data is the number of treatment levels compared (see
Figure 8.2.3). However, with k > 2, using different letters to represent different treatment
levels is unwieldy. Double-subscript notation is much more convenient: $Y_{ij}$ will denote
the ith observation in the jth sample. Likewise, the error terms will be written $\varepsilon_{ij}$. As
before, the latter are usually assumed to be normally distributed with mean zero and the
same standard deviation $\sigma$ for all i and j. Moreover, all the samples must be independent.
FIGURE 8.2.3 (treatment levels $1, 2, \ldots, k$, with observations $Y_{1j}, Y_{2j}, \ldots, Y_{n_j j}$ under level j; model equation $Y_{ij} = \mu_j + \varepsilon_{ij}$, $i = 1, 2, \ldots, n_j$; $j = 1, 2, \ldots, k$)
Paired Data
In two-sample and k-sample designs, treatment levels are compared using independent
samples. An alternative is to use dependent samples by grouping subjects into blocks. If
only two treatment levels are involved, dependent measurements are classified as paired
data. A typical scenario is the application of two treatments or conditions to the same
subject, for example, blood pressure measurements taken "before" and "after" a patient
has received medication.
Table 8.2.4 shows a paired-data comparison of a baseball team's batting averages. The
two treatment levels are when a game was played ("Nighttime" or "Daytime"). The two
entries in a given row (for example, the .310 and .320 in the first row) are clearly dependent:
A player with a high average during night games is likely to have a high average during
day games as well. Likewise, poor-hitting players will probably have low batting averages
regardless of when games are scheduled.
TABLE 8.2.4
Ave,
RA,d
3b
WC,ll
JA,lb
DC,c
RS,2b
.310
.286
.302
.280
.214
.302
JL,ss
.276
BB,d
.285
.320
.290
.298
.W
.226
.300
.290
.295
The statistical objectives of two-sample data and paired data are often the same.
Both seek to examine the plausibility of the null hypothesis that the true averages
($\mu_X$ and $\mu_Y$) associated with the two treatment levels are equal.
Mathematical Model. The responses to treatment levels X and Y for the ith pair are
denoted $X_i$ and $Y_i$, respectively. Both measurements also reflect the particular
conditions that characterize the ith pair. We will denote the "pair effect" by the symbol
$B_i$. That is, $X_i = \mu_X + B_i + \varepsilon_i$ and $Y_i = \mu_Y + B_i + \varepsilon'_i$. The fact that $B_i$ is the same for
both $X_i$ and $Y_i$ is precisely what makes the samples dependent (see Figure 8.2.4).
FIGURE 8.2.4
Randomized Block Data
When dependent samples are used to compare more than two treatment levels, the
measurements are referred to as randomized block data. Despite being an obvious
generalization of paired data, the randomized block design is treated separately because
the methods required for its analysis are entirely different (recall the similar justification
for keeping two-sample data and k-sample data as two separate designs).
Table 8.2.5 summarizes the results of a randomized block experiment set up to
investigate the possible effects of "blood doping," a controversial procedure whereby
athletes are injected with additional red blood cells for the purpose of enhancing
performance (17). Six runners were the subjects (and, thus, the blocks). Each was timed in
three ten thousand-meter races: once after receiving extra red blood cells, once after being
injected with a placebo, and once after receiving no treatment whatsoever. Listed
are their times (in minutes) to complete the race.
TABLE B.2.S
No Injection
1
2
3
4
5
6
34.03
32.85
33.50
32.52
34.15
33.77
32.70
33.62
31.23
33.05
31.55
32.33
31.20
32.80
33.07
Clearly, the times in a given row are dependent: all three depend to some extent on
the inherent speed of the subject, regardless of which treatment level might also
be operative. Documenting differences from subject to subject, though, would not be the
objective for doing this sort of study. If $\mu_1$, $\mu_2$, and $\mu_3$ denote the true average times
characteristic of the no injection, placebo, and blood doping treatment levels, respectively,
the experimenter's first priority would be to test $H_0: \mu_1 = \mu_2 = \mu_3$. As we will see in a
later chapter, the question as to whether or not a null hypothesis of this sort should be
rejected turns out to be another application of the analysis of variance.
Mathematical Model. Randomized block data have the same basic structure as do
paired data. As we saw with k-sample data, though, the multiplicity of treatment levels
dictates that double subscript notation be used (see Figure 8.2.5). As before, the $B_i$
component is the term that makes the observations in a given row ($Y_{i1}, Y_{i2}, \ldots$, and
$Y_{ik}$) dependent.
FIGURE 8.2.5 (blocks $1, 2, \ldots, n$ by treatment levels $1, 2, \ldots, k$, with observation $Y_{ij}$ in block i under level j; model equation $Y_{ij} = \mu_j + B_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, k$)
Regression Data
All the experimental designs introduced up to this point share the property that their
measurements have the same units. Moreover, each has had the same basic objective: to
quantify or to compare the effects of one or more treatment levels. In contrast, regression
data typically consist of measurements with dissimilar units, and their objective is to study
the functional relationship between the variables rather than test the null hypothesis that
a set of means are all equal.
Table 8.2.6, showing the increase in the cost of a first class postage stamp from 1971 to
1995, is an example of regression data (recall Case Study 8.1.1). Any direct comparison
of the information in the second and third columns is impossible because the units are
incompatible. It makes sense, instead, to focus on the relationship between years after
Jan. 1, 1971 and cost.
Graphing is especially helpful with regression data. Figure 8.2.6 shows a plot of
Cost (= y) versus Years after Jan. 1, 1971 (= x). Superimposed is a straight line,
$y = 7.50 + 1.04x$, that "best" fits the ten $(x_i, y_i)$s (using a technique we will learn in
Chapter 10).
TABLE 8.2.6
Date        Years after Jan. 1, 1971    Cost (in cents)
5/16/71      0.37                        8
3/2/74       3.17                       10
12/31/75     5.00                       13
5/29/78      7.41                       15
3/22/81     10.22                       18
11/1/81     10.83                       20
2/17/85     14.13                       22
4/3/88      17.25                       25
2/3/91      20.09                       29
1/1/95      24.00                       32
FIGURE 8.2.6 (scatterplot of cost in cents versus years after Jan. 1, 1971, with the fitted straight line superimposed)
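The straight line quoted for the postage data can be reproduced with the usual least squares formulas. The sketch below is an assumption about method, since the fitting technique is not described in this chapter: it takes the ten (years, cost) pairs as read from Tables 8.1.7 and 8.2.6 and uses the textbook expressions slope $= S_{xy}/S_{xx}$ and intercept $= \bar{y} - \text{slope} \cdot \bar{x}$. The function name `least_squares_line` is hypothetical.

```python
# Years after Jan. 1, 1971 and cost in cents, as read from Tables 8.1.7 / 8.2.6.
years = [0.37, 3.17, 5.00, 7.41, 10.22, 10.83, 14.13, 17.25, 20.09, 24.00]
cost = [8, 10, 13, 15, 18, 20, 22, 25, 29, 32]

def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = least_squares_line(years, cost)
print(f"y = {intercept:.2f} + {slope:.2f}x")
```

Rounded to two decimals, the fitted coefficients agree with the line $y = 7.50 + 1.04x$ quoted in the text, which suggests that the "best" fit there is the least squares line.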
Mathematical Model. Regression data often have the form $(x_i, Y_i)$, where $x_i$ is a
number and $Y_i$ is a random variable (having different units from $x_i$). A particularly
important special case is the so-called linear model, where the mean of $Y_i$ is linearly
related to $x_i$. That is, $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $\varepsilon_i$ is normally distributed with mean
zero and standard deviation $\sigma$. More generally, $E(Y_i)$ can be any function, $g(x_i; \beta_0, \beta_1, \ldots)$,
of $x_i$, for example, $E(Y_i) = \beta_0 x_i^{\beta_1}$ or $E(Y_i) = \beta_0 e^{\beta_1 x_i}$ (see Figure 8.2.7).
FIGURE 8.2.7 (subjects $1, 2, \ldots, n$, each with an independent variable $x_i$ and a dependent variable $Y_i$; model equation $Y_i = g(x_i; \beta_0, \beta_1, \ldots) + \varepsilon_i$, $i = 1, 2, \ldots, n$)
Categorical Data
If the information recorded for each of two dissimilar variables is qualitative rather
than quantitative, we call the measurements categorical data. Typical is a recent study
undertaken to investigate the relationship (if one exists) between a physician's Specialty
(X) and his or her Malpractice history (Y). The range of each variable was reduced to
three (nonnumerical) classes:
$$\text{Specialty} = \begin{cases} \text{orthopedic surgery (OS)} \\ \text{obstetrics-gynecology (OB)} \\ \text{internal medicine (IM)} \end{cases}$$
$$\text{Malpractice history} = \begin{cases} \text{A: no claims} \\ \text{B: one or more claims ending in nonzero indemnity} \\ \text{C: one or more claims but none requiring compensation} \end{cases}$$
In its original form, the information collected on the 1942 physicians interviewed
looked like the listing in Figure 8.2.8 (32). Data of this sort are usually summarized
by tallying the number of times each (X, Y) "combination" occurs and displaying those
frequencies in a contingency table (see Figure 8.2.9).
The inference procedure that typically accompanies the construction of a contingency
table is a hypothesis test, where $H_0$ states that the random variables X and Y are
independent. This is a
frequently encountered experimental design, especially in the
Malpractice History
Case
ML
EM
1M
OB
OS
1M
B
B
C
A
MS
OB
C
1
SB
2
3
4
1942
FtGlJftE 8.l.8
Obstetrics-Gynecology
Internal
Medicine
Totals
Orthopedic
No claims
At least one
claim lost
At
one
but
no damages awarded
147
349
700
1205
106
14
62
317
156
149
115
420
Totals
400
647
886
1942
social sciences. The statistical technique that will be used for analyzing categorical data
is the chi square test, taken up in Chapter 11.
Mathematical Model. The assumptions associated with categorical data are far less
specific than those we have seen for the six previous experimental designs. There is no
requirement of normality, for example, and no particular model equation. In effect, X
and Y can be any discrete random variables whatsoever (see Figure 8.2.10).
FIGURE 8.2.10 (subjects $1, 2, \ldots, n$, each with a first variable $X_i$ and a second variable $Y_i$; model: X and Y are discrete random variables)
FIGURE 8.2.11 (flowchart for classifying data)
Start: Are the data qualitative or quantitative?
  Qualitative: categorical data
  Quantitative: Are the units similar or dissimilar?
    Dissimilar: regression data
    Similar: How many treatment levels are involved?
      One: one-sample data
      Two: Are the samples dependent or independent?
        Dependent: paired data
        Independent: two-sample data
      More than two: Are the samples dependent or independent?
        Dependent: randomized block data
        Independent: k-sample data
A Flowchart For Classifying Data
It was mentioned at the outset of this section that classifying data into the seven models
just described requires that a maximum of four questions be answered (recall page 528).
Figure 8.2.11 is a flowchart that summarizes the model-identification process.
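The four-question procedure can also be written as a small function. The sketch below is one possible encoding of the decision sequence as this section describes it; the function name `classify_design` and its parameter names are hypothetical, and later questions are left as `None` when an earlier answer already settles the design.

```python
def classify_design(quantitative, similar_units=None,
                    treatment_levels=None, dependent=None):
    """Return the name of one of the seven basic designs.

    quantitative     -- True if the data are quantitative, False if qualitative
    similar_units    -- True if all measurements share the same units
    treatment_levels -- number of treatment levels (1, 2, or more)
    dependent        -- True if the samples are dependent
    """
    # Question 1: qualitative data are categorical, regardless of the rest.
    if not quantitative:
        return "categorical data"
    # Question 2: dissimilar units point to regression data.
    if similar_units is None:
        raise ValueError("similar_units is required for quantitative data")
    if not similar_units:
        return "regression data"
    # Question 3: a single treatment level means one-sample data.
    if treatment_levels is None or treatment_levels < 1:
        raise ValueError("treatment_levels must be a positive integer")
    if treatment_levels == 1:
        return "one-sample data"
    # Question 4: dependence separates the remaining four designs.
    if dependent is None:
        raise ValueError("dependent is required when comparing levels")
    if treatment_levels == 2:
        return "paired data" if dependent else "two-sample data"
    return "randomized block data" if dependent else "k-sample data"

# Answers taken from the three worked examples in this section.
print(classify_design(True, True, 2, False))   # Example 8.2.1
print(classify_design(False))                  # Example 8.2.2
print(classify_design(True, True, 3, True))    # Example 8.2.3
```

The three calls return two-sample data, categorical data, and randomized block data, matching the conclusions reached in Examples 8.2.1 through 8.2.3.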
EXAMPLE 8.2.1
The federal Community Reinvestment Act of 1977 was enacted out of concern that banks
were reluctant to make loans in low- and moderate-income areas, even when applicants
seemed otherwise acceptable. The figures in Table 8.2.7 show one particular bank's credit
penetration in ten low-income census tracts (A through J) and ten high-income census
tracts (K through T). To which of the seven models do these data belong?
Note, first, that the measurements (1) are quantitative and (2) have similar units.
Low-income and High-income correspond to two treatment levels, and the two samples
are clearly independent (the 4.6 recorded in tract A, for example, has nothing specific
in common with the 11.6 recorded in tract K). From the flowchart, then, the answers
quantitative/similar/two/independent imply that these are two-sample data.
TABLES.l.7
Low Income
Tract
A
B
C
D
E
F
H
I
J
Percent of
with Credit
4.6
6.6
High Income
Census Tract
Percent of Households
with Credit
K
L
11.6
8.5
8.2
15.1
12.6
11.3
9.1
4.2
M
9.8
6.9
11.0
6.0
4.6
4.2
5.1
N
0
P
Q
R
S
T
6.4
5.9
EXAMPLE 8.2.2
In 1991, a rule change in college football narrowed the distance between the
goalposts from 23'4" to 18'6". The effects of that legislation on the probability of
players successfully kicking points after touchdowns (PATs) are summarized in Table 8.2.8.
The numbers in the first column are based on all college games played through September
of the 1990 season; those in the second column come from the 1991 season (194). What
experimental design is represented?
Despite the numerical appearance of the information in Table 8.2.8, the actual data
here are qualitative, not quantitative. The 959, 829, 46, and 82 are not measurements;
TABLE 8.2.8
                      "Wide" Goalposts    "Narrow" Goalposts
                      (1990 season)       (1991 season)
Successful             959                 829
Unsuccessful            46                  82
Total                 1005                 911
Percent successful    95.4                91.0
they are summaries of measurements. What was recorded for each attempted conversion
were two pieces of qualitative information:
$$\text{Goalposts} = \begin{cases} \text{wide} \\ \text{narrow} \end{cases} \qquad \text{Outcome of kick} = \begin{cases} \text{successful} \\ \text{unsuccessful} \end{cases}$$
Only later were the 1916 data points summed up and reduced to the four
frequencies appearing in Table 8.2.8. By the answer to the first question posed in Figure
8.2.11, these are categorical data.
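The step from raw qualitative records to the four frequencies can be sketched in a few lines. The code below rebuilds one (goalposts, outcome) record per attempted conversion from the counts in Table 8.2.8 and then tallies them. The record layout and the helper name `percent_successful` are assumptions made for illustration; the original kick-by-kick records are not given in the text.

```python
from collections import Counter

# One qualitative record per attempted conversion, rebuilt from Table 8.2.8.
records = (
    [("wide", "successful")] * 959
    + [("wide", "unsuccessful")] * 46
    + [("narrow", "successful")] * 829
    + [("narrow", "unsuccessful")] * 82
)

# Tally how often each (goalposts, outcome) combination occurs.
counts = Counter(records)

def percent_successful(goalposts):
    """Percentage of successful kicks for one goalpost width."""
    made = counts[(goalposts, "successful")]
    missed = counts[(goalposts, "unsuccessful")]
    return 100.0 * made / (made + missed)

print(len(records))
for width in ("wide", "narrow"):
    print(width, round(percent_successful(width), 1))
```

The tally recovers the 1916 data points and the success percentages of 95.4 and 91.0 shown in the table, which illustrates that the numbers in a contingency table are counts of categories rather than measurements.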
EXAMPLE 8.2.3
People looking at the vertical lines in Figure 8.2.12 will tend to perceive the right one
as shorter, even though the two are equal. Moreover, the perceived difference in
lengths (what psychologists call the "strength" of the illusion) has been shown to be a
function of age. Recently, a study was done to see whether individuals who are hypnotized
and regressed to different ages perceive the illusion differently. Table 8.2.9 shows
illusion strengths measured for eight subjects while they were (1) awake, (2) regressed to
age nine, and (3) regressed to age five (142). Which of the seven experimental designs do
these data represent?
Look at the sequence of questions posed by the flowchart in Figure 8.2.11:
1. Are the data qualitative or quantitative? Quantitative
2. Are the units similar or dissimilar? Similar
3. How many treatment levels are involved? More than two
4. Are the observations dependent or independent? Dependent
According to the flowchart, then, these measurements qualify as randomized block data.
TABU 8.2.9
1
2
3
4
5
6
7
8
(1)
(2)
Regressed
to Age 9
(3)
Regressed
to AgeS
0.81
0.44
0,44
0.56
0.19
0.94
0.44
0.06
0.69
0.31
0.44
0.44
0.19
0.44
0.44
0.19
0.56
0.44
0.44
0.44
0.31
0.44
0.19
FIGURE 8.2.12
QUESllONS
for Questions 8.2.1-8.2.12 use the flowchart in Figure
to identify the experimental
designs represented. In each case, /l1'tSwer whichever of the questions on p. (528) are
necessary to l1Ulke the determination..
8.2.1. Kepler's Third Law states that "the squares of the periods of the planets are proportional to the cubes of their mean distance from the Sun." Listed below are the periods of revolution (x), the mean distances from the sun (y), and the values $x^2/y^3$ for the nine planets in the solar system (4).

Planet     x_i (years)   y_i (astronomical units)   x_i^2/y_i^3
Mercury        0.241            0.387                  1.002
Venus          0.615            0.723                  1.001
Earth          1.000            1.000                  1.000
Mars           1.881            1.524                  1.000
Jupiter       11.86             5.203                  0.999
Saturn        29.46             9.54                   1.000
Uranus        84.01            19.18                   1.000
Neptune      164.8             30.06                   1.000
Pluto        248.4             39.52                   1.000
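The third column of the table is simple arithmetic on the first two, and a short check makes that explicit. The sketch below recomputes $x^2/y^3$ from the listed periods and distances. The dictionary layout and the name `kepler_ratio` are illustrative choices, and the Earth row (1.000, 1.000) is included on the assumption that it is the ninth planet in the list.

```python
# Period in years (x) and mean distance in astronomical units (y), as listed above.
planets = {
    "Mercury": (0.241, 0.387),
    "Venus": (0.615, 0.723),
    "Earth": (1.000, 1.000),   # assumed ninth row: one year, one astronomical unit
    "Mars": (1.881, 1.524),
    "Jupiter": (11.86, 5.203),
    "Saturn": (29.46, 9.54),
    "Uranus": (84.01, 19.18),
    "Neptune": (164.8, 30.06),
    "Pluto": (248.4, 39.52),
}


def kepler_ratio(period_years, distance_au):
    """Return x^2 / y^3, which Kepler's Third Law says is constant."""
    return period_years ** 2 / distance_au ** 3


# Round to three decimals, the precision used in the table.
ratios = {name: round(kepler_ratio(x, y), 3) for name, (x, y) in planets.items()}
```

Each entry of `ratios` stays within a few thousandths of 1, which is the pattern the table displays.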
8.2.2. Mandatory helmet laws for motorcycle riders remain a controversial issue. Some states have had a "limited" ordinance that applied to only younger riders; others have a "comprehensive" statute requiring all riders to wear helmets. Listed below are the deaths per 10,000 registered motorcycles reported by states having each type of legislation (192).
Limited Helmet Law
Comprehensive Helmet Law
6.8
10.6
9.6
9.1
5.2
13.2
7.0
4.1
5.7
7.6
3.0
6.7
15.0
7.1
11.2
17.9
113
85
93
6.9
7.3
4.2
4.8
10.5
8.1
9.1
05
6.7
6.4
4.8
5.0
7.0
6.8
8.1
12.9
5A
8.2.3. Aedes aegypti is the scientific name of the mosquito that transmits yellow fever. Although no longer a major health problem in the Western world, yellow fever was perhaps the most [...] disease in the United States for nearly 200 years. To see how long it takes the Aedes mosquito to complete a feeding, five young females were allowed to bite an exposed human forearm without the threat of being swatted. The resulting blood-sucking times (in seconds) are summarized below (90).
Mosquito
Bite Duration (5)
1
2
202.9
3
315.0
4
5
8.2.4. Male cockroaches can be very antagonistic toward other male cockroaches. Encounters may be fleeting or quite [...], the latter often resulting in missing antennae and broken wings. A study was done to see whether cockroach density has any effect on the frequency of serious altercations. Ten groups of four male cockroaches (Byrsotria fumigata) were each subjected to three levels of density: high, intermediate, and low. The following are the numbers of serious encounters per minute that were observed (16).
1
0.30
0.20
0.17
0.25
0.27
0.19
Intermediate
Low
0.12
0.28
0.20
0.15
0.31
0.16
0.20
0.17
0.18
0.20
0.20
9
10
0.23
0.31
0.29
0.11
0.24
0.13
0.36
0.20
0.12
0.19
0.08
0.18
0.20
Averages:
0.25
0.18
2
3
4
5
6
7
8
8.2.5. Luxury suites, many costing more than $100,000 to rent, have become big-budget status symbols in new sports arenas. Below are the numbers of suites (x) and their projected revenues (y) for nine of the country's newest facilities (207).
Arena
Palace (Detroit)
Orlando Arena
Bradley Center (Milwaukee)
America West (Phoenix)
Charlotte Coliseum
Center (Minneapolis)
City Arena
Miami Arena
ARCO Arena (Sacramento)
Number of
Suites, x
Projected Revenues
(in millions), y
180
26
68
88
12
67
$11.0
1.4
3.0
6.0
0.9
4.0
56
18
1.4
30
2.1
8.2.6. Depth perception is a life-or-death ability for lambs inhabiting rugged mountain terrain. How quickly a lamb develops that faculty may depend on the amount of time it spends with its ewe. Thirteen sets of lamb littermates were the subjects of an experiment that addressed that question (101). One member of each litter was left with its mother; the other was removed immediately after birth. Once every hour, the lambs were placed on a simulated cliff, part of which included a platform of glass. If a lamb placed its feet on the glass, it "failed" the test, since that would have been equivalent to walking off the cliff. Below are the trial numbers when the lambs first learned not to walk on the glass, that is, when they first developed depth perception.
Number of Trials to Learn
Depth Perception
Group
Mothered, Xi
1
2
3
4
5
6
2
)Ii
3
3
7
8
9
5
10
3
2
5
5
1
1
4
2
7
5
3
1
10
5
4
8
7
3
7
5
8.2.7. To see whether teachers' expectations for students can become self-fulfilling prophecies, fifteen first-graders were given a standard IQ test. The children's teachers, though, were told it was a special test for predicting whether a child would show sudden spurts of intellectual growth in the near future. The researchers divided the children into three groups of sizes six, five, and four at random, but they informed the teachers that, according to the test, the children in Group I would not demonstrate any pronounced intellectual growth for the next year, those in Group II would develop at a moderate rate, and those in Group III could be expected to make exceptional progress. A year later, the same children were again given a standard IQ test. Below are the differences in the two scores for each child (second test - first test).
Changes in IQ (second test - first test)
Group I
3
2
6
10
10
5
Group
10
4
11
14
3
Group III
20
9
18
19
8.2.8. Among young drivers, roughly a third of all fatal automobile accidents are speed-related;
by age 60 that proportion drops to about one-tenth. Listed below are a recent year's
percentages of speed-related fatalities for ages ranging from 16 to 72 (198).
Fatalities
Percent
16
37
17
18
33
19
34
20
24
33
31
28
27
26
32
23
42
16
13
57
10
72
9
7
8.2.9. Gorillas are not the solitary creatures that they are often made out to be: they
live in groups whose average size is about 16, which usually includes 3 adult males,
6 adult females, and 7 "youngsters." Listed below are the sizes of 10 groups of
mountain gorillas observed in the volcanic highlands of the Albert National Park in the
Congo (161).
Group
No. of Gorillas
1
8
2
19
3
4
5
5
24
11
6
20
7
18
8
21
9
27
10
16
8.2.10. Roughly 360,000 bankruptcies were filed in Federal Court during 1981; by 1990 the annual number was more than twice that figure. The following are the numbers of business failures reported year by year through the 1980s (182).
Year
1981
1982
1983
1984
1985
1986
1987
1988
1989
1990
360,329
344,275
477,856
561,274
594,567
642,993
726,484
8.2.11. The diversity of bird species in a given area is related to plant diversity, as measured by variation in foliage height as well as the variety of flora. Below are indices measured on those two traits for thirteen habitats (113).
Plant Cover
Diversity, Xi
Bird Species
Diversity, YI
1
2
3
4
0.90
0.76
1.67
5
0.20
1.80
1.36
2.92
2.61
0.42
0.49
1.90
2.38
124
2.80
241
2.80
2.16
6
7
8
9
10
11
12
13
1.44
1.12
1.04
0.48
133
1.10
1.56
1.15
8.2.12. Male toads often have trouble distinguishing other male toads from female toads, a state of affairs that can lead to awkward moments during mating season. When male toad A inadvertently makes inappropriate romantic overtures to male toad B, the latter emits a short call known as a release chirp. Below are the lengths of the release chirps measured for 15 male toads innocently caught up in [...] (19).
Toad
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Length of
Release
(s)
0.11
0.06
0.06
0.06
0.11
0.08
0.08
0.10
0.06
0.06
0.15
0.16
0.11
0.10
0.07
For Questions 8.2.13-8.2.33 identify the experimental design (one-sample, two-sample, etc.) that each set of data represents.
8.2.13. A pharmaceutical company is testing two new drugs designed to improve the blood-clotting ability of hemophiliacs. Six subjects volunteering for the study are randomly divided into two groups of size three. The first group is given drug A; the second group, drug B. The response variable in each case is the subject's prothrombin time, a number that reflects the time it takes for a clot to form. The results (in seconds) for group A are 32.6, 46.7, and 81.2; for group B, 25.9, 33.6, and 35.1.
8.2.14. Investment firms financing the construction of new shopping centers pay close attention to the amount of retail floor space already available. Listed below are population and floor space figures for five southern cities.

City   Population, x   Retail Floor Space (in million square meters), y
1          400,000          3,450
2          150,000          1,825
3        1,250,000          7,480
4        2,975,000         14,260
5          760,000          5,290
8.2.15. Nine political writers were asked to assess the United States' culpability in murders committed by revolutionary groups financed by the CIA. Scores were to be assigned using a scale of 0 to 100. Three of the writers were native Americans living in the U.S., three were native Americans living abroad, and three were foreign nationals.
Americans in U.S.
Americans Abroad
Foreign Nationals
45
45
65
75
50
40
55
90
85
8.2.16. To see whether low-priced homes are easier to sell than [...], a national realty company collected the following [...] homes were on the market before being sold.
Number of Days on Market
City
Low-Priced
Buffalo
Charlotte
Newark
55
70
40
30
110
70
8.2.17. The following is a breakdown of what 120 college freshmen intend to do next summer.
Work
Male
Female
School
Play
22
14
19
14
31
20
8.2.18. An experiment was done on the delivery of first-class mail originating from the cities listed in the following table. Recorded for each city was the average length of time (in days) that it took a letter to reach a destination in that same city. Samples were taken on two occasions, Sept. 1, 2001 and Sept. 1, 2004.
1,2001
Wooster
Midland
Beaumont
Manchester
2004
1.8
1.7
2.0
22
1.9
20
2.5
1.7
8.2.19. Two methods (A and B) are available for removing dangerous heavy metals from public water supplies. Eight water samples collected from various parts of the United States were used to compare the two methods. Four were treated with Method A and four with Method B. After the processes were completed, each sample was rated for purity on a scale of 1 to 100.
Method A
Method B
88.6
81.4
92.1
84.6
90.7
91.4
78.6
93.6
8.2.20. Out of 120 senior citizens polled, 65 favored a complete overhaul of the health care
system while 55 preferred more modest changes. When the same choice was put to 85
first-time voters, 40 said they were in favor of major reform while 45 opted for minor
revisions.
8.2.21. To illustrate the complexity and [...] of IRS [...], a tax-reform lobbying group has sent the same five clients to each of two professional tax preparers. The following are the estimated tax liabilities quoted by each of the preparers.

Client   Preparer A   Preparer B
GS        $31,281      $26,850
MB         14,256       13,958
AA         26,197       25,520
DP          8,283        9,107
SB         47,825       43,192
8.2.22. The production of a certain organic chemical requires ammonium chloride. The manufacturer can obtain the ammonium chloride in one of three forms: powdered, moderately ground, and coarse. To see if the consistency of the NH4Cl is itself a factor that needs to be considered, the manufacturer decides to run the reaction seven times with each form of ammonium chloride. The following are the resulting yields (in pounds).
Powdered NH4Cl
Moderately Ground NH4Cl
Coarse NH4Cl
146
152
149
161
158
154
149
150
141
138
142
146
139
137
145
144
148
154
148
150
8.2.23. An investigation was conducted of 107 fatal poisonings of children. Each death was
caused by one of three drugs. In each instance it was determined how the child received
the fatal overdose. Responsibility for the 107 accidents was assessed according to the
following breakdown.
DrugB
DrugC
10
10
4
18
18
10
13
A
Child Responsible
Parent Responsible
Another Person
8.2.24. As part of an affirmative-action litigation, records were produced showing the average salaries earned by White, Black, and Hispanic workers in a manufacturing plant. Three different departments were selected at random for the comparison. The entries shown are average annual salaries, in thousands of dollars.
White
Department 1
Department 2
Department 3
20.2
19.8
19.9
20.6
19.7
19.0
19.2
18.4
20.0
8.2.25. In [...], a study was done on fifty people bitten by rabid animals. Twenty victims were given the standard Pasteur treatment, while the other thirty were given the Pasteur treatment in addition to one or more doses of gamma globulin. Nine of those given the standard treatment survived; twenty survived in the gamma globulin group.
8.2.26. To see if any geographical pricing differences exist, the cost of a basic cable TV package was determined for a random sample of six cities, three in the southeast and three in the northwest. Monthly charges for the southeastern cities were $13.20, $11.55, and $16.75; residents in the three northwestern cities paid $14.80, $17.65, and $19.20.
8.2.27. A public relations firm hired by a would-be presidential candidate has conducted a poll to see whether their client faces a gender gap. Out of 800 men interviewed, 325 strongly supported the candidate, 151 were strongly opposed, and 324 were undecided. Among the 750 women included in the sample, 258 were strong supporters, 241 were strong opponents, and 251 were undecided.
8.2.28. As part of a review of its rate structure, an automobile insurance company has
compiled the following data on claims filed by five male policyholders and five female
policyholders.
(male)
Claims
in 2004
Client
$2750
SB
JM
AK
0
0
ML
JT
$1500
0
MS
8M
LL
Claims Filed
in 2004
0
0
0
$2150
0
8.2.29. A company claims to have produced a blended gasoline that can improve a car's fuel consumption. They decide to compare their product with the gas currently on the market. Three different cars were used for the test: a Porsche, a Buick, and a VW. The Porsche got 13.6 mpg with the new gas and 12.2 mpg with the "standard" gas; the Buick got 18.7 mpg with the new gas and 18.5 with the standard; the figures for the VW were 34.5 and 32.6, respectively.
8.2.30. In a survey conducted by State University's [...] Center, a sample of three freshmen said they studied 6, 4, and 10 hours, respectively, over the weekend. The same question was posed to three sophomores, who reported study times of 4, 5, and 7 hours. For three juniors, the responses were 2, 8, and 6 hours.
8.2.31. A consumer advocacy group, investigating the prices of steel-belted radial tires produced by three major manufacturers, collects the following data.
Year
1995
2000
2005
Company A
Company B
Company C
$62.00
$68.00
$72.00
$65.00
$69.00
$75.00
$70.00
$78.00
$75.00
8.2.32. A small fourth-grade class is randomly split into two groups. Each group is taught fractions using a different method. After three weeks, both groups are given the same 100-point test. The scores of students in the first group are [...] 91, 72, and 68; the scores reported for the second group are 76, 80, 72, and 67.
8.2.33. The track length of a storm (the distance it covers while maintaining a certain minimum wind velocity and precipitation intensity) is an important parameter in a storm's "profile." Listed below are the track lengths recorded for eight severe hailstorms that occurred in New England over a five-year period (60).

Date of Storm      Track Length (km)
6 June 1961              16
30 June 1961            160
1 July 1964              95
1 July 1964              65
5 August 1964            30
10 August 1965           26
13 August 1965           26
7 June 1966              24
8.2.34. The two-sample data shown on the left give the responses of six subjects to two
treatments, X and Y. Would it make sense to graph these data using the format that
appears on the right? Why or why not?
.
4
Treatment X
Treatment Y
3
4
3
2
1
2
3
y
.
2
1
0
3
2
1
4
x
8.2.35. Under what circumstances would the structure below be classified as regression data? Under what circumstances would it be classified as one-sample data?
Day
Observation, y
1
Yl
2
3
Y3
n
YII
Y2
8.2.36. Would it be better to graph the data shown below using format (a) or format (b)?
Explain.
Treatment
Response
10
..
8
.
6
..
4
.
Pair
X
1
2
3
6
10
8
4
4
4
5
7
6
10
..
..
..
8
Response
6
4
2
2
0
Y
Y
X
(Il>
0
Y
X
(b)
8.3 TAKING A SECOND LOOK AT STATISTICS (SAMPLES ARE NOT "VALID"!)
Designing an experiment invariably requires that two fundamental issues be resolved. First and foremost is the choice of the design itself. Based on the type of data available and the objectives to be addressed, what overall "structure" should the experiment have? Among the most frequently encountered answers to that question are the seven models profiled in Chapter 8, ranging from the simplicity of the one-sample design to the complexity of the randomized block design.
As soon as a design has been chosen, a second question immediately follows: How large should the sample size (or sample sizes) be? It is precisely that question, though, that leads to a very common sampling misconception. There is a widely held belief (even by many experienced experimenters who should know better) that some samples are "valid" (presumably because of their size), while others are not. Every consulting statistician could probably retire to Hawaii at an early age if he or she got a dollar for every time an experimenter posed the following sort of question: "I intend to compare Treatment X and Treatment Y using the two-sample format. My plan is to take 20 measurements on each of the two treatments. Will those be valid samples?"
The sentiment behind such a question is entirely understandable: the researcher is asking whether two samples of size 20 will be "adequate" (in some sense) for addressing the objectives of the experiment. Unfortunately, the word "valid" is meaningless in this context. There is no such thing as a valid sample because the word "valid" has no statistical definition.
To be sure, we have already learned how to calculate the smallest values of n that will achieve certain objectives, typically expressed in terms of the precision of an estimator or the power of a hypothesis test. Recall Theorem 5.3.2. To guarantee that the estimator X/n for the binomial parameter p has at least a 100(1 - α)% chance of lying within a distance d of p requires that n be at least as large as $z_{\alpha/2}^2/4d^2$.
Suppose, for example, that we want a sample size capable of guaranteeing that X/n will have an 80% [= 100(1 - α)%] chance of lying within 0.05 (= d) of p. By Theorem 5.3.2,

$$n \ge \frac{(1.28)^2}{4(0.05)^2} = 164$$

On the other hand, that sample of n = 164 would not be large enough to guarantee that X/n has, say, a 95% chance of being within 0.03 of p. To meet these latter requirements, n would have to be at least as large as 1068 [$= (1.96)^2/4(0.03)^2$].
Therein lies the problem. Sample sizes that can satisfy one set of specifications will not necessarily be capable of satisfying another. There is no "one size fits all" value for n that qualifies a sample as being "adequate" or "sufficient" or "valid."
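The dependence of n on the stated objectives is easy to see numerically. The sketch below evaluates the bound $z_{\alpha/2}^2/4d^2$ for the two sets of specifications discussed above. The function name `min_sample_size` is illustrative, and the critical values 1.28 and 1.96 are passed in as the rounded table values used in the text rather than computed.

```python
import math


def min_sample_size(z, d):
    """Smallest integer n satisfying n >= z^2 / (4 d^2).

    z is the standard normal critical value z_{alpha/2} (taken from a table),
    d is the required distance between X/n and p.
    """
    return math.ceil(z ** 2 / (4 * d ** 2))


n_80_within_05 = min_sample_size(1.28, 0.05)   # 80% chance of being within 0.05
n_95_within_03 = min_sample_size(1.96, 0.03)   # 95% chance of being within 0.03
```

The two calls reproduce the values 164 and 1068. A sample that meets the first specification falls far short of the second, which is the point of the paragraph above.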
In a broader sense, the phrase "valid sample" is much like the expression "statistical tie." Both are widely used, and each is a well-intentioned attempt to simplify an important statistical concept. Unfortunately, both also share the dubious distinction of being mathematical nonsense.
CHAPTER 9
Two-Sample Problems
9.1 INTRODUCTION
9.2 TESTING $H_0$: $\mu_X = \mu_Y$ - THE TWO-SAMPLE t TEST
9.3 TESTING $H_0$: $\sigma_X^2 = \sigma_Y^2$ - THE F TEST
9.4 BINOMIAL DATA: TESTING $H_0$: $p_X = p_Y$
9.5 CONFIDENCE INTERVALS FOR THE TWO-SAMPLE PROBLEM
9.6 TAKING A SECOND LOOK AT STATISTICS (CHOOSING SAMPLES)
APPENDIX 9.A.1 A DERIVATION OF THE TWO-SAMPLE t TEST (A PROOF OF THEOREM 9.2.2)
APPENDIX 9.A.2 MINITAB APPLICATIONS
William Sealy Gosset ("Student")
After earning an Oxford degree in mathematics and chemistry, Gosset began working in 1899 for Messrs. Guinness, a Dublin brewery. Fluctuations in materials and temperature and the necessarily small-scale experiments inherent in brewing convinced him of the necessity for a new small-sample theory of statistics. Writing under the pseudonym "Student," he published work with the t ratio that was destined to become a cornerstone of modern statistical methodology.
William Sealy Gosset ("Student") (1876-1937)
9.1 INTRODUCTION
The simplicity of the one-sample model makes it the logical starting point for any discussion of statistical inference, but it also limits its applicability to the real world. Very few experiments involve just a single treatment or a single set of conditions. On the contrary, researchers almost invariably design experiments to compare responses to several treatment levels or, at the very least, to compare a single treatment with a control.
In this chapter we examine the simplest of these multilevel designs, the two-sample problem. Structurally, the two-sample problem always falls into one of two different formats: Either two (presumably) different treatment levels are applied to two independent sets of similar subjects or the same treatment is applied to two (presumably) different kinds of subjects. Comparing the effectiveness of germicide A relative to that of germicide B by measuring the zones of inhibition each one produces in two sets of similarly cultured Petri dishes would be an example of the first type. Another would be testing whether monkeys raised by themselves (treatment X) react differently in a stress situation from monkeys raised with siblings (treatment Y). On the other hand, examining the bones of sixty-year-old men and sixty-year-old women, all life-long residents of the same city, to see whether both sexes absorb environmental strontium-90 at the same rate would be an example of the second type.
Inference in two-sample problems usually reduces to a comparison of location parameters. We might assume, for example, that the population of responses associated with, say, treatment X is normally distributed with mean $\mu_X$ and standard deviation $\sigma_X$ while the Y distribution is normal with mean $\mu_Y$ and standard deviation $\sigma_Y$. Comparing location parameters, then, reduces to testing $H_0$: $\mu_X = \mu_Y$. As always, the alternative may be either one-sided, $H_1$: $\mu_X < \mu_Y$ or $H_1$: $\mu_X > \mu_Y$, or two-sided, $H_1$: $\mu_X \ne \mu_Y$. (If the data are binomial, the location parameters are $p_X$ and $p_Y$, the true "success" probabilities for treatments X and Y, and the null hypothesis takes the form $H_0$: $p_X = p_Y$.)
Sometimes, although much less frequently, it becomes more relevant to compare the variabilities of two treatments, rather than their locations. A food company, for example, trying to decide which of two types of machines to buy for filling cereal boxes would naturally be concerned about the average weight of the boxes filled by each type, but they would also want to know something about the variabilities of the weights. Obviously, a machine that produced high proportions of "underfills" and "overfills" would be a distinct liability. In a situation of this sort, the appropriate null hypothesis is $H_0$: $\sigma_X^2 = \sigma_Y^2$.
For comparing the means of two normal populations, the standard procedure is the two-sample t test. As described in Section 9.2, this is a relatively straightforward extension of Chapter 7's one-sample t test. For comparing variances, though, it will be necessary to introduce a completely new test, this one based on the F distribution of Section 7.3. The binomial version of the two-sample problem, testing $H_0$: $p_X = p_Y$, is taken up in Section 9.4.
It was mentioned in connection with one-sample problems that certain inferences, for various reasons, are more aptly phrased in terms of confidence intervals rather than hypothesis tests. The same is true of two-sample problems. In Section 9.5, confidence intervals are constructed for the location difference of two populations, $\mu_X - \mu_Y$ (or $p_X - p_Y$), and the variability quotient, $\sigma_X^2/\sigma_Y^2$.

9.2 TESTING $H_0$: $\mu_X = \mu_Y$ - THE TWO-SAMPLE t TEST
We will suppose the data for a given experiment consist of two independent random samples, $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$, representing either of the models referred to in Section 9.1. Furthermore, the two populations from which the Xs and Ys are drawn will be presumed normal. Let $\mu_X$ and $\mu_Y$ denote their means. Our objective is to derive a procedure for testing $H_0$: $\mu_X = \mu_Y$.
As it turns out, the precise form of the test we are looking for depends on the variances of the X and Y populations. If it can be assumed that $\sigma_X^2$ and $\sigma_Y^2$ are equal, it is a relatively straightforward task to produce the GLRT for $H_0$: $\mu_X = \mu_Y$. (This is, in fact, what we will do in Theorem 9.2.2.) But if the variances of the two populations are not equal, the problem becomes much more complex. This second case, known as the Behrens-Fisher problem, is more than seventy-five years old and remains one of the more famous "unsolved" problems in statistics. What headway investigators have made has been confined to approximate solutions [see, for example, Sukhatme (174) or Cochran (25)]. These, however, will not be discussed here; we will restrict our attention to testing $H_0$: $\mu_X = \mu_Y$ when it can be assumed that $\sigma_X^2 = \sigma_Y^2 = \sigma^2$.
For the one-sample test that $\mu = \mu_0$, the GLRT was shown to be a function of a special case of the t ratio introduced in Definition 7.3.3 (recall Theorem 7.3.5). We begin this section with a theorem that gives still another special case of Definition 7.3.3.
Theorem 9.2.1. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a normal distribution with mean $\mu_X$ and standard deviation $\sigma$ and let $Y_1, Y_2, \ldots, Y_m$ be an independent random sample of size m from a normal distribution with mean $\mu_Y$ and standard deviation $\sigma$. Let $S_X^2$ and $S_Y^2$ be the two corresponding sample variances, and $S_p^2$ the pooled variance, where

$$S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2 + \sum_{i=1}^{m}(Y_i - \bar{Y})^2}{n+m-2}$$

Then

$$T_{n+m-2} = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{S_p\sqrt{\frac{1}{n} + \frac{1}{m}}}$$

has a Student t distribution with n + m - 2 degrees of freedom.
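Proof. The method of proof here is very similar to what was used for Theorem 7.3.5. Note that an equivalent formulation of $T_{n+m-2}$ is

$$T_{n+m-2} = \frac{\dfrac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}}}{\sqrt{\dfrac{1}{n+m-2}\left[\sum_{i=1}^{n}\left(\dfrac{X_i - \bar{X}}{\sigma}\right)^2 + \sum_{i=1}^{m}\left(\dfrac{Y_i - \bar{Y}}{\sigma}\right)^2\right]}}$$

But $E(\bar{X} - \bar{Y}) = \mu_X - \mu_Y$ and $\mathrm{Var}(\bar{X} - \bar{Y}) = \sigma^2/n + \sigma^2/m$, so the numerator of the ratio has a standard normal distribution, $f_Z(z)$.
In the denominator,

$$\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2 = \frac{(n-1)S_X^2}{\sigma^2} \quad \text{and} \quad \sum_{i=1}^{m}\left(\frac{Y_i - \bar{Y}}{\sigma}\right)^2 = \frac{(m-1)S_Y^2}{\sigma^2}$$

are independent $\chi^2$ random variables with n - 1 and m - 1 df, respectively, so their sum, $(n+m-2)S_p^2/\sigma^2$, has a $\chi^2$ distribution with n + m - 2 df (recall Theorem 4.6.4). Also, by Appendix 7.A.2, the numerator and denominator are independent. It follows from Definition 7.3.3, then, that $T_{n+m-2}$ has a Student t distribution with n + m - 2 df. □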
Theorem 9.2.2. Let $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$ be independent random samples from normal distributions with means $\mu_X$ and $\mu_Y$, respectively, and with the same standard deviation $\sigma$. Let

$$t = \frac{\bar{x} - \bar{y}}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}}$$

a. To test $H_0$: $\mu_X = \mu_Y$ versus $H_1$: $\mu_X > \mu_Y$ at the α level of significance, reject $H_0$ if $t \ge t_{\alpha, n+m-2}$.
b. To test $H_0$: $\mu_X = \mu_Y$ versus $H_1$: $\mu_X < \mu_Y$ at the α level of significance, reject $H_0$ if $t \le -t_{\alpha, n+m-2}$.
c. To test $H_0$: $\mu_X = \mu_Y$ versus $H_1$: $\mu_X \ne \mu_Y$ at the α level of significance, reject $H_0$ if t is either (1) $\le -t_{\alpha/2, n+m-2}$ or (2) $\ge t_{\alpha/2, n+m-2}$.
Proof. See Appendix 9.A.1. □
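The three decision rules of Theorem 9.2.2 translate directly into a few lines of Python. The sketch below is illustrative only: the names `pooled_t` and `reject_h0` are not from the text, the critical value has to be supplied from a t table, and the data are the prothrombin times from Question 8.2.13. The choice of a two-sided test at α = 0.05 is an assumption made for the example, with $t_{.025,4} = 2.7764$ taken as the tabled value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Return the two-sample t statistic and its degrees of freedom."""
    n, m = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2


def reject_h0(t, critical, alternative):
    """Apply parts (a)-(c) of the theorem.

    critical is t_{alpha, df} for a one-sided test and t_{alpha/2, df}
    for a two-sided test, read from a table.
    """
    if alternative == "greater":      # H1: mu_X > mu_Y
        return t >= critical
    if alternative == "less":         # H1: mu_X < mu_Y
        return t <= -critical
    if alternative == "two-sided":    # H1: mu_X != mu_Y
        return abs(t) >= critical
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


group_a = [32.6, 46.7, 81.2]   # drug A, seconds
group_b = [25.9, 33.6, 35.1]   # drug B, seconds
t_stat, df = pooled_t(group_a, group_b)
decision = reject_h0(t_stat, 2.7764, "two-sided")
```

With only three observations per group, the statistic comes out near 1.49 on 4 degrees of freedom, well inside the acceptance region, so `decision` is `False` under these assumptions.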
CASE STUDY 9.2.1
Cases of disputed authorship are not very common, but when they do occur they can be very difficult to resolve. Speculation has persisted for several hundred years that some of Shakespeare's works were written by Sir Francis Bacon. And whether it was Alexander Hamilton or James Madison who wrote certain of the Federalist Papers is still an open question. A similar, though more recent, dispute centers around Mark Twain (18).
In 1861, a series of ten essays appeared in the New Orleans Daily Crescent. Signed "Quintus Curtius Snodgrass," the essays purported to chronicle the author's adventures as a member of the Louisiana militia. While historians generally agree that the accounts referred to actually did happen, there seems to be no record of anyone named Quintus Curtius Snodgrass. Adding to the mystery is the fact that the style of the essays bears unmistakable traces (at least to some critics) of the humor and irony that made Mark Twain so famous.
Most typically, efforts to unravel these sorts of "yes, he did; no, he didn't" controversies rely heavily on literary and historical clues. But not always. There is also a statistical approach to the problem. Studies have shown that authors are remarkably consistent in the extent to which they use words of a given length. That is, a given author will use roughly the same proportion of, say, three-letter words in something he writes this year as he did in whatever he wrote last year. The same holds true for words of any length. But the proportion of three-letter words that author A consistently uses will very likely be different from the proportion of three-letter words that author B uses. It follows that by comparing the proportions of words of a certain length in essays known to be the work of Mark Twain to the proportions found in the ten Snodgrass essays, we should be able to assess the likelihood of the two authors' being one and the same.
TABLE 9.2.1: Proportion of Three-Letter Words

Twain                                          Proportion
Sergeant Fathom letter                           [...]
Madame Caprell letter                            0.262
Mark Twain letters in Territorial Enterprise
  First letter                                   0.217
  Second letter                                  0.240
  Third letter                                   0.230
  Fourth letter                                  0.229
First Innocents Abroad letter
  First half                                     [...]
  Second half                                    0.217

QCS                                            Proportion
Letter I                                         0.209
Letter II                                        0.205
Letter III                                       0.196
Letter IV                                        0.210
Letter V                                         0.202
Letter VI                                        0.207
Letter VII                                       0.224
Letter VIII                                      0.223
Letter IX                                        0.220
Letter X                                         0.201
Table 9.2.1 shows the proportions of three-letter words found in eight Twain essays and in the ten Snodgrass essays. (Each of the Twain works was written at approximately the same time the Snodgrass essays appeared.)
If $x_1$ = [...], $x_2 = 0.262$, ..., $x_8 = 0.217$, and $y_1 = 0.209$, $y_2 = 0.205$, ..., $y_{10} = 0.201$, then

$$\bar{x} = \frac{1.855}{8} = 0.2319$$

and

$$\bar{y} = \frac{2.097}{10} = 0.2097$$
To analyze these data, we need to decide what the magnitude of the difference between the sample means, $\bar{x} - \bar{y} = 0.2319 - 0.2097 = 0.0222$, actually tells us. Let $\mu_X$ and $\mu_Y$ denote the true fractions of the time that Twain and Snodgrass, respectively, use three-letter words. Of course, not having examined the complete works of the two authors, we have no way of evaluating either $\mu_X$ or $\mu_Y$, so they become the unknown parameters of the problem. What needs to be decided, then, is whether an observed sample difference as large as 0.0222 implies that $\mu_X$ and $\mu_Y$ are, themselves, not the same. Or is 0.0222 small enough to still be compatible with the hypothesis that the true means are equal? Put formally, we must choose between

$H_0$: $\mu_X = \mu_Y$
and
$H_1$: $\mu_X \ne \mu_Y$
Since

$$\sum_{i=1}^{8} x_i^2 = 0.4316 \quad \text{and} \quad \sum_{i=1}^{10} y_i^2 = 0.4406$$

the two sample variances are

$$s_X^2 = \frac{8(0.4316) - (1.855)^2}{8(7)} = 0.0002103$$

and

$$s_Y^2 = \frac{10(0.4406) - (2.097)^2}{10(9)} = 0.0000955$$

Combined, they give a pooled standard deviation of 0.0121:

$$s_p = \sqrt{\frac{7(0.0002103) + 9(0.0000955)}{8 + 10 - 2}} = \sqrt{0.0001457} = 0.0121$$

According to Theorem 9.2.1, if $H_0$: $\mu_X = \mu_Y$ is true, the sampling distribution of

$$T = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{\frac{1}{8} + \frac{1}{10}}}$$

is described by a Student t curve with 16 (= 8 + 10 - 2) degrees of freedom.
FIGURE 9.2.1 (Student t distribution with 16 df; rejection regions of area 0.005 in each tail)
Suppose we let α = 0.01. By Part (c) of Theorem 9.2.2, $H_0$ should be rejected in favor of a two-sided $H_1$ if either (1) $t \le -t_{\alpha/2, n+m-2} = -t_{.005,16} = -2.9208$ or (2) $t \ge t_{\alpha/2, n+m-2} = t_{.005,16} = 2.9208$ (see Figure 9.2.1). But

$$t = \frac{0.2319 - 0.2097}{0.0121\sqrt{\frac{1}{8} + \frac{1}{10}}} = 3.88$$

a value falling considerably to the right of $t_{.005,16}$. Therefore, we reject $H_0$; it would appear that Twain and Snodgrass were not the same person.
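The arithmetic of this case study can be retraced from the sums alone. The sketch below uses only the totals quoted in the text (Σx = 1.855, Σx² = 0.4316, Σy = 2.097, Σy² = 0.4406) and the tabled critical value 2.9208. The helper name `variance_from_sums` is illustrative. Because the code carries more decimal places than the hand calculation, its t statistic differs slightly from the 3.88 obtained with rounded intermediate values.

```python
from math import sqrt


def variance_from_sums(n, total, total_sq):
    """Sample variance via the computing formula [n*sum(x^2) - (sum x)^2] / [n(n-1)]."""
    return (n * total_sq - total ** 2) / (n * (n - 1))


n, sum_x, sum_x2 = 8, 1.855, 0.4316      # Twain essays
m, sum_y, sum_y2 = 10, 2.097, 0.4406     # Snodgrass essays

s2_x = variance_from_sums(n, sum_x, sum_x2)
s2_y = variance_from_sums(m, sum_y, sum_y2)

# Pooled standard deviation with n + m - 2 = 16 degrees of freedom.
s_p = sqrt(((n - 1) * s2_x + (m - 1) * s2_y) / (n + m - 2))

t = (sum_x / n - sum_y / m) / (s_p * sqrt(1 / n + 1 / m))
reject = abs(t) >= 2.9208                # two-sided test at alpha = 0.01
```

The unrounded statistic is about 3.87, still well beyond 2.9208, so `reject` is `True` and the conclusion of the case study is unchanged.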
Comment. The $x_i$'s and $y_j$'s in Table 9.2.1, being proportions, are necessarily not normally distributed random variables, so the basic assumption of Theorem 9.2.2 is not met. Fortunately, the consequences of nonnormality on the probabilistic behavior of $T_{n+m-2}$ are frequently minimal. The robustness property of the one-sample t ratio that we investigated in Chapter 7 (recall Figure 7.4.7) also holds true for the two-sample t ratio.
CASE STUDY 9.2.2
Dislike your statistics instructor? Retaliation time will come at the end of the semester, when you pepper the student course evaluation form with 1s. Were you pleased? Then send a signal with a load of [...]. Either way, students' evaluations of their instructors do matter. These instruments are commonly used for promotion, tenure, and merit raise decisions.
Studies of student course evaluations show that they do have value. They tend to show reliability and consistency. Yet questions remain as to the ability of these questionnaires to identify good teachers and courses.
A veteran instructor of developmental psychology decided to do a study (212) on how a single changed factor might affect his student course evaluations. He had attended a workshop extolling the virtue of an enthusiastic style in the classroom: more hand gestures, increased voice pitch variability, and the like. The vehicle for the study was the undergraduate developmental psychology course he had taught in the fall semester. He set about to teach the spring semester offering in the same way, with the exception of a more enthusiastic style.
The professor fully understood the difficulty of controlling for the many variables. He selected the spring class to have the same demographics as the one in the fall. He used the same textbook, syllabus, and tests. He listened to audio tapes of the fall lectures and reproduced them as closely as possible, covering the same topics in the same order.
The first step in examining the effect of enthusiasm on course evaluations is to establish that students have, in fact, perceived an increase in enthusiasm. Table 9.2.2 summarizes the ratings the instructor received on the "enthusiasm" question for the two semesters. Unless the increase in sample means (2.14 to 4.21) is statistically significant, there is no point in trying to compare fall and spring responses to other questions.
TABLE 9.2.2
Fall, \(x_i\)            Spring, \(y_i\)
\(n = 229\)              \(m = 243\)
\(\bar{x} = 2.14\)       \(\bar{y} = 4.21\)
\(s_X = 0.94\)           \(s_Y = 0.83\)
Let \(\mu_X\) and \(\mu_Y\) denote the true means associated with the two different teaching
styles. There is no reason to think that increased enthusiasm on the part of the
instructor would decrease the students' perception of enthusiasm, so it can be argued
here that \(H_1\) should be one-sided. That is, we want to test
\[ H_0: \mu_X = \mu_Y \]
versus
\[ H_1: \mu_X < \mu_Y \]
Let \(\alpha = 0.05\). Since \(n = 229\) and \(m = 243\), the t statistic has \(229 + 243 - 2 = 470\) degrees of
freedom, and the decision rule calls for the rejection of \(H_0\) if
\[ t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{229} + \frac{1}{243}}} \leq -t_{.05,470} \]
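A glance at Table A.2 shows that for any value \(n > 100\), \(z_\alpha\) is a good approximation
to \(t_{\alpha,n}\). That is,
\[ -t_{.05,470} \approx -z_{.05} = -1.64 \]
The pooled standard deviation for these data is 0.885:
\[ s_p = \sqrt{\frac{228(0.94)^2 + 242(0.83)^2}{229 + 243 - 2}} = 0.885 \]
Therefore,
\[ t = \frac{2.14 - 4.21}{0.885 \sqrt{\frac{1}{229} + \frac{1}{243}}} \approx -25.4 \]
and our conclusion is a resounding rejection of \(H_0\): the increased enthusiasm was,
indeed, noticed.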
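The arithmetic of this case study can be checked with a short script. The following is only a sketch: the function name `pooled_two_sample_t` is a hypothetical helper, the inputs are the summary statistics of Table 9.2.2, and the critical value -1.64 is the normal approximation quoted in the text rather than an exact t percentile.

```python
import math

def pooled_two_sample_t(n, xbar, sx, m, ybar, sy):
    """Return (t, s_p, df) for the pooled two-sample t statistic.

    Hypothetical helper: n, xbar, sx describe the x sample; m, ybar, sy the y sample.
    """
    df = n + m - 2
    # Pooled standard deviation: weighted average of the two sample variances.
    sp = math.sqrt(((n - 1) * sx**2 + (m - 1) * sy**2) / df)
    t = (xbar - ybar) / (sp * math.sqrt(1 / n + 1 / m))
    return t, sp, df

# Summary statistics from Table 9.2.2 (fall = x, spring = y).
t, sp, df = pooled_two_sample_t(229, 2.14, 0.94, 243, 4.21, 0.83)

# One-sided test of H1: mu_X < mu_Y. The value -1.64 is the text's
# approximation -z_.05 to -t_{.05,470}; it is not computed here.
critical_value = -1.64
reject_h0 = t <= critical_value

print(f"s_p = {sp:.3f}, df = {df}, t = {t:.2f}, reject H0: {reject_h0}")
```

Working from summary statistics keeps the sketch independent of the raw ratings, which the case study does not list.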
The real question of interest is whether the change in enthusiasm produced a
perceived change in some other aspect of teaching that we know did not change. For
example, the instructor did not become more knowledgeable about the material over
the course of the two semesters. The student ratings, though, disagree.
Table 9.2.3 shows the instructor's fall and spring ratings on the "knowledgeable"
question. Is the increase from \(\bar{x} = 3.61\) to \(\bar{y} = 4.05\) statistically significant? Yes. For
these data, \(s_p = 0.898\) and
\[ t = \frac{3.61 - 4.05}{0.898 \sqrt{\frac{1}{229} + \frac{1}{243}}} \approx -5.32 \]
which falls far to the left of the 0.05 critical value (\(-1.64\)).
What we can glean from these data is both reassuring and a bit disturbing. Table 9.2.2
appears to confirm the widely held belief that enthusiasm is an important factor in
teaching. Table 9.2.3, on the other hand, sounds a more cautionary note.
It speaks to another widely held belief: that student evaluations can sometimes be
difficult to interpret. Questions that purport to be measuring one trait may, in fact, be
reflecting something entirely different.
TABLE 9.2.3
Fall, \(x_i\)            Spring, \(y_i\)
\(n = 229\)              \(m = 243\)
\(\bar{x} = 3.61\)       \(\bar{y} = 4.05\)
\(s_X = 0.84\)           \(s_Y = 0.95\)
Comment. It occasionally happens that an experimenter wants to test \(H_0: \mu_X = \mu_Y\)
and knows the values of \(\sigma_X^2\) and \(\sigma_Y^2\). For those situations, the t test of Theorem 9.2.2
is inappropriate. If the \(n\) \(X_i\)'s and \(m\) \(Y_i\)'s are normally distributed, it follows from the
Corollary to Theorem 4.3.4 that
\[ Z = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}} \qquad (9.2.1) \]
has a standard normal distribution. Any such test of \(H_0: \mu_X = \mu_Y\), then, should be based
on an observed Z ratio rather than an observed t ratio.
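As an illustration of how an observed Z ratio might be computed when both variances are known, here is a minimal sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, and the sample means, variances, and sample sizes are invented numbers, not data from the text.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sample_z(xbar, ybar, var_x, var_y, n, m):
    """Observed Z ratio of Equation 9.2.1 under H0: mu_X = mu_Y."""
    return (xbar - ybar) / math.sqrt(var_x / n + var_y / m)

# Hypothetical inputs, chosen only to exercise the formula.
z = two_sample_z(xbar=50.0, ybar=48.0, var_x=16.0, var_y=9.0, n=16, m=9)

# Two-sided P-value: probability of a Z ratio at least this extreme under H0.
p_two_sided = 2.0 * (1.0 - standard_normal_cdf(abs(z)))

print(f"z = {z:.3f}, two-sided P-value = {p_two_sided:.4f}")
```

Under \(H_0\) the term \(\mu_X - \mu_Y\) in Equation 9.2.1 is zero, which is why it does not appear in the function.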
QUESTIONS
9.2.1. Ring Lardner was one of this country's most popular writers during the 1920s and
1930s. He was also a chronic alcoholic who died prematurely at the age of 48. The
following table lists the life spans of some of Lardner's contemporaries (38). Those in
the sample on the left were all problem drinkers; they died, on the average, at age 65.
The 12 (sober) writers on the right tended to live a full 10 years longer. Can it be argued
that an increase of that magnitude is statistically significant? Test an appropriate null
hypothesis against a one-sided \(H_1\). Use the 0.05 level of significance. Note: The pooled
sample standard deviation for these two samples is 13.9.

Authors Noted for Alcohol Abuse          Authors Not Noted for Alcohol Abuse
Name                 Age at Death        Name                     Age at Death
Ring Lardner         48                  Carl Van Doren           65
Sinclair Lewis       66                  Ezra Pound               87
Raymond Chandler     71                  Randolph Bourne          32
Eugene O'Neill       65                  Van Wyck Brooks          77
Robert Benchley      56                  Samuel Eliot Morrison    89
J.P. Marquand        67                  John Crowe Ransom        86
Dashiell Hammett     67                  [name missing]           77
e.e. cummings        70                  Conrad Aiken             84
Edmund Wilson        77                  Ben Ames Williams        64
Average:             65.2                Henry Miller             88
                                         Archibald MacLeish       90
                                         James Thurber            67
                                         Average:                 75.5
9.2.2. Poverty Point is the name given to a number of widely scattered archaeological sites
throughout Louisiana, Mississippi, and Arkansas. These are the remains of a society
thought to have flourished during the period from 1700 to 500 B.C. Among their
characteristic artifacts are ornaments that were fashioned out of clay and then baked.
The following table shows the dates (in years B.C.) associated with four of these baked clay
ornaments found in two different Poverty Point sites, Terral Lewis and Jaketown (85).
The averages for the two samples are 1133.0 and 1013.5, respectively. Is it believable that
these two settlements developed the technology to manufacture baked clay ornaments
at the same time? Set up and test an appropriate \(H_0\) against a two-sided \(H_1\) at the
\(\alpha = 0.05\) level of significance. Note: \(s_X = 266.9\) and \(s_Y = 224.3\).

Terral Lewis, \(x_i\)    Jaketown, \(y_i\)
1492                     1346
1169                     942
883                      908
988                      858
9.2.3. Nod-swimming in male ducks is a highly ritualized behavioral trait. The term refers
to a rapid back-and-forth movement of a duck's head. It frequently occurs during
courtship displays and occasionally occurs when the duck is approached by another
male perceived to have higher status. It may depend on the duck's "race." In an
experiment investigating the latter (100), two sets of green-winged teals, American
and European, were photographed for several days. The following table gives the
frequencies (per 10,000 frames of film) with which each bird initiated the nod-swimming
motion. At the 0.01 level of significance, test the null hypothesis that the true average
nod-swimming frequencies of American and European ducks are the same.
Amer. Male
Freq., Xi
Eur.Male
A
B
C
14.6
28.8
19.1
D
E
F
G
H
I
J
50.3
35.7
L
M
N
0
P
Q
R
Yi
3.6
8.2
7.8
27.5
7.0
19.7
17.0
3.5
13.3
12.4
19.0
14.1
Note: For these two samples,
\[ \sum_{i=1}^{6} x_i = 171.6 \qquad \sum_{i=1}^{6} x_i^2 = 5745.60 \]
\[ \sum_{i=1}^{12} y_i = 153.1 \qquad \sum_{i=1}^{12} y_i^2 = 2526.09 \]
9.2.4. A major source of "mercury poisoning" comes from the ingestion of methylmercury,
which is found in contaminated fish (recall Question 5.3.2). Among the questions pursued
by medical investigators trying to understand the nature of this particular
health problem is whether methylmercury is equally hazardous to men and women. The
following (117) are the half-lives of methylmercury in the systems of six women and nine
men who volunteered for a study where each subject was given an oral administration of
methylmercury. Is there evidence here that women metabolize methylmercury at a different rate
than men do? Do an appropriate two-sample t test at the \(\alpha = 0.01\) level of significance.
Note: The two sample standard deviations for these data are \(s_X = 15.1\) and \(s_Y = 8.1\).

Methylmercury Half-Lives (in Days)
Females, \(x_i\)    Males, \(y_i\)
52                  72
69                  88
73                  87
88                  74
87                  78
56                  70
                    78
                    93
                    74
9.2.5. The use of carpeting in hospitals, while having definite esthetic merits, raises an obvious
question: Are carpeted floors sanitary? One way to get at an answer is to compare
bacterial levels in carpeted and uncarpeted rooms. Airborne bacteria can be counted by
passing room air at a known rate over a growth medium, incubating that medium, and
then counting the number of bacterial colonies that form. In one such study done in a
Montana hospital (209), room air was pumped over a Petri dish at the rate of 1 cubic
foot per minute. This procedure was repeated in 16 patient rooms, 8 carpeted and 8
uncarpeted. The results, expressed in terms of "bacteria per cubic foot of air," are listed
in the following table.
Rooms
BacteriaJft3
Uncarpefed Rooms
BacterialW
210
11.8
8.2
212
216
220
7.1
223
13.0
10.8
226
227
10.1
14.6
14.0
228
221
12.1
8.3
3.8
7.2
12.0
222
11.1
224
229
10.1
214
215
217
For the carpeted rooms,
\[ \sum_{i=1}^{8} x_i = 89.6 \quad \text{and} \quad \sum_{i=1}^{8} x_i^2 = 1053.70 \]
For the uncarpeted rooms,
\[ \sum_{i=1}^{8} y_i = 78.3 \quad \text{and} \quad \sum_{i=1}^{8} y_i^2 = 838.49 \]
Test whether carpeting has any effect on the level of airborne bacteria in patient rooms.
Let \(\alpha = 0.05\).
9.2.6. In addition to marketing tea, Lipton also sells packaged dinner entrees. The company
was interested in knowing whether the buying habits for such products differed between
singles and married couples. In particular, in a poll of consumers, they were asked to
respond to the question "Do you use coupons regularly?" by a numerical scale, where
1 stands for agree strongly, 2 for agree, 3 for neutral, 4 for disagree, and 5 for disagree
strongly. The results of the poll are given in the following table (21).

Use Coupons Regularly
Single (X)                   Married (Y)
\(n\) = [value missing]      \(m\) = [value missing]
\(\bar{x} = 3.10\)           \(\bar{y} = 2.43\)
\(s_X = 1.469\)              \(s_Y = 1.350\)

Is the observed difference significant at the \(\alpha = 0.05\) level?
9.2.7. Accidents R Us and Roadkill specialize in writing insurance policies for high-risk
drivers. Last year, Accidents R Us processed 100 claims. Settlements averaged $2000
and had a sample standard deviation of $600. A smaller firm, Roadkill resolved only
50 claims, but the payouts averaged $2500 with a sample standard deviation of $700.
Can we conclude from last year's experience that the average awards paid by the two
companies tend not to be the same? Set up and carry out an appropriate analysis.
9.2.8. A company markets two brands of latex paint: regular and a more expensive brand
that claims to dry an hour faster. A consumer magazine decides to test this claim by
painting 10 panels with each product. The average drying time of the regular brand is
2.1 hours with a sample standard deviation of 12 minutes. The fast-drying version has
an average of 1.6 hours with a sample standard deviation of 16 minutes. Test the null
hypothesis that the more expensive brand dries an hour quicker. Use a one-sided \(H_1\).
Let \(\alpha = 0.05\).
9.2.9. (a) Suppose \(H_0: \mu_X = \mu_Y\) is to be tested against \(H_1: \mu_X \neq \mu_Y\). The two sample sizes
are 6 and 11. If \(s_p = 15.3\), what is the smallest value for \(|\bar{x} - \bar{y}|\) that will result in
\(H_0\) being rejected at the \(\alpha = 0.01\) level of significance?
(b) What is the smallest value for \(\bar{x} - \bar{y}\) that will lead to the rejection of \(H_0: \mu_X = \mu_Y\)
in favor of \(H_1: \mu_X > \mu_Y\) if \(\alpha = 0.05\), \(s_p = 214.9\), \(n = 13\), and \(m = 8\)?
9.2.10. Suppose that \(H_0: \mu_X = \mu_Y\) is being tested against \(H_1: \mu_X \neq \mu_Y\), where \(\sigma_X^2\) and \(\sigma_Y^2\) are
known to be 17.6 and 22.9, respectively. If \(n = 10\), \(m = 20\), \(\bar{x} = 81.6\), and \(\bar{y} = 79.9\),
what P-value would be associated with the observed Z ratio?
9.2.11. An executive has two routes that she can take to and from work each day. The first is
by interstate; the second requires driving through town. On the average it takes her
33 minutes to get to work by the interstate and 35 minutes by going through town. The
standard deviations for the two routes are 6 and 5 minutes, respectively. Assume the
distributions of the times for the two routes are approximately normally distributed.
(a) What is the probability that on a given day driving through town would be the
quicker of her choices?
(b) What is the probability that driving through town for an entire week (10 trips)
would yield a lower average time than taking the interstate for the entire week?
9.2.12. Prove that the Z ratio given in Equation 9.2.1 has a standard normal distribution.
9.2.13. If \(X_1, X_2, \ldots, X_n\) and \(Y_1, Y_2, \ldots, Y_m\) are independent random samples from normal
distributions having the same \(\sigma^2\), prove that their pooled sample variance, \(s_p^2\), is an
unbiased estimator for \(\sigma^2\).
9.2.14. Let \(X_1, X_2, \ldots, X_n\) and \(Y_1, Y_2, \ldots, Y_m\) be independent random samples drawn from
normal distributions with means \(\mu_X\) and \(\mu_Y\), respectively, and with the same known
variance \(\sigma^2\). Use the generalized likelihood ratio criterion to derive a test procedure
for choosing between \(H_0: \mu_X = \mu_Y\) and \(H_1: \mu_X \neq \mu_Y\).
9.2.15. When \(\sigma_X^2 \neq \sigma_Y^2\), \(H_0: \mu_X = \mu_Y\) can be tested using the statistic
\[ t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}} \]
which has an approximate t distribution with \(\nu\) degrees of freedom, where \(\nu\) is the
greatest integer in
\[ \frac{\left( \frac{s_X^2}{n} + \frac{s_Y^2}{m} \right)^2}{\frac{s_X^4}{n^2(n-1)} + \frac{s_Y^4}{m^2(m-1)}} \]
A person exposed to an infectious agent, either by contact or by vaccination,
normally develops antibodies to that agent. Presumably, the severity of an infection
is related to the number of antibodies produced. The degree of antibody response is
indicated by saying that the person's blood serum has a certain titer, with higher titers
indicating greater concentrations of antibodies. The following table gives the titers of
22 persons involved in a tularemia epidemic in Vermont (20). Eleven were quite ill; the
other 11 were asymptomatic. Use an approximate t ratio to test \(H_0: \mu_X = \mu_Y\) against a
one-sided \(H_1\) at the 0.05 level of significance.
Note: The sample standard deviations for the "Severely Ill" and "Asymptomatic"
groups are 428 and 183, respectively.
Severely III
Subject
1
2
3
4
5
6
7
8
9
10
11
9~16.
Titer
640
80
1280
160
640
640
1280
640
160
320
160
Asymptomatic
Subject
Titer
13
320
320
320
10
14
16
17
18
19
20
21
22
80
160
10
640
160
320
9.2.16. For the approximate two-sample t test described in Question 9.2.15, it will be true that
\[ \nu < n + m - 2 \]
Why is that a disadvantage for the approximate test? That is, why is it better to use the
Theorem 9.2.1 version of the t test if, in fact, \(\sigma_X^2 = \sigma_Y^2\)?
9.2.17. The two-sample data described in Question 8.2.2 would be analyzed by testing
\(H_0: \mu_X = \mu_Y\), where \(\mu_X\) and \(\mu_Y\) denote the true average motorcycle-related fatality
rates for states having "limited" and "comprehensive" helmet laws, respectively.
(a) Should the t test for \(H_0: \mu_X = \mu_Y\) follow the format of Theorem 9.2.2 or the
approximation given in Question 9.2.15? Explain.
(b) Is there anything unusual about these data?
9.3 TESTING \(H_0: \sigma_X^2 = \sigma_Y^2\) - THE F TEST
Although by far the majority of two-sample problems are set up to detect possible shifts
in location parameters, situations sometimes arise where it is equally important (perhaps
even more important) to compare variability parameters. Two machines on an assembly
line, for example, may be producing items whose average dimensions (\(\mu_X\) and \(\mu_Y\)) of some
sort (thickness, say) are not significantly different but whose variabilities (as measured
by \(\sigma_X^2\) and \(\sigma_Y^2\)) are. This becomes a critical piece of information if the increased variability
results in an unacceptable proportion of items from one of the machines falling outside
the engineering specifications (see Figure 9.3.1).
[Figure 9.3.1: Variability of machine outputs. Output from machine X and output from
machine Y are shown with \(\sigma_X < \sigma_Y\); the proportions too thin and too thick are acceptable
for machine X and unacceptable for machine Y.]
In this section we will examine the generalized likelihood ratio test of \(H_0: \sigma_X^2 = \sigma_Y^2\)
versus \(H_1: \sigma_X^2 \neq \sigma_Y^2\). The data will consist of two independent random samples of sizes
\(n\) and \(m\): The first, \(x_1, x_2, \ldots, x_n\), is assumed to have come from a normal distribution
having mean \(\mu_X\) and variance \(\sigma_X^2\); the second, \(y_1, y_2, \ldots, y_m\), from a normal distribution
having mean \(\mu_Y\) and variance \(\sigma_Y^2\). (All four parameters are assumed to be unknown.)
Theorem 9.3.1 gives the test procedure that will be used. The proof will not be given,
but it follows the same basic pattern we have seen in other GLRTs; the important step
is showing that the likelihood ratio is a monotonic function of the F random variable
described in Definition 7.3.2.
Comment. Tests of \(H_0: \sigma_X^2 = \sigma_Y^2\) arise in another, more routine, context. Recall that
the procedure for testing the equality of \(\mu_X\) and \(\mu_Y\) depended on whether or not the two
population variances were equal. This implies that a test of \(H_0: \sigma_X^2 = \sigma_Y^2\) should precede
every test of \(H_0: \mu_X = \mu_Y\). If the former is accepted, the t test on \(\mu_X\) and \(\mu_Y\) is done
according to Theorem 9.2.2; if \(H_0: \sigma_X^2 = \sigma_Y^2\) is rejected, Theorem 9.2.2 is not entirely
appropriate. A frequently used alternative in that case is the approximate t test described
in Question 9.2.15.
Theorem 9.3.1. Let \(x_1, x_2, \ldots, x_n\) and \(y_1, y_2, \ldots, y_m\) be independent random samples from
normal distributions with means \(\mu_X\) and \(\mu_Y\) and standard deviations \(\sigma_X\) and \(\sigma_Y\), respectively.
a. To test \(H_0: \sigma_X^2 = \sigma_Y^2\) versus \(H_1: \sigma_X^2 > \sigma_Y^2\) at the \(\alpha\) level of significance, reject \(H_0\) if
\(s_Y^2/s_X^2 \leq F_{\alpha,m-1,n-1}\).
b. To test \(H_0: \sigma_X^2 = \sigma_Y^2\) versus \(H_1: \sigma_X^2 < \sigma_Y^2\) at the \(\alpha\) level of significance, reject \(H_0\) if
\(s_Y^2/s_X^2 \geq F_{1-\alpha,m-1,n-1}\).
c. To test \(H_0: \sigma_X^2 = \sigma_Y^2\) versus \(H_1: \sigma_X^2 \neq \sigma_Y^2\) at the \(\alpha\) level of significance, reject \(H_0\) if
\(s_Y^2/s_X^2\) is either (1) \(\leq F_{\alpha/2,m-1,n-1}\) or (2) \(\geq F_{1-\alpha/2,m-1,n-1}\).
Comment. The GLRT described in Theorem 9.3.1 is approximate for the same sort of
reason the GLRT for \(H_0: \sigma^2 = \sigma_0^2\) was approximate (see Theorem 7.5.2). The distribution
of the test statistic, \(s_Y^2/s_X^2\), is not symmetric, and the two ranges of variance ratios yielding
\(\lambda\)'s less than or equal to \(\lambda^*\) (i.e., the left tail and right tail of the critical region) have slightly
different areas. For the sake of convenience, though, it is customary to choose the two
critical values so that each cuts off the same area, \(\alpha/2\).
CASE STUDY 9.3.1
Electroencephalograms are records showing fluctuations of electrical activity in the
brain. Among the several different kinds of brain waves produced, the dominant ones
are usually alpha waves. These have a characteristic frequency of anywhere from eight
to thirteen cycles per second.
The purpose of the experiment described in this example was to see whether
sensory deprivation over an extended period of time has any effect on the alpha-wave
pattern. The subjects were twenty inmates in a Canadian prison. They were randomly
split into two equal-sized groups. Members of one group were placed in solitary
confinement; those in the other group were allowed to remain in their own cells.
Seven days later, alpha-wave frequencies were measured for all twenty subjects (59),
as shown in Table 9.3.1.

TABLE 9.3.1: Alpha-Wave Frequencies (CPS)
Nonconfined, \(x_i\)    Solitary Confinement, \(y_i\)
10.7                    9.6
10.7                    10.4
10.4                    9.7
10.9                    10.3
10.5                    9.2
10.3                    9.3
9.6                     9.9
11.1                    9.5
11.2                    9.0
10.4                    10.9

[Figure 9.3.2: Alpha-wave frequencies (cps), plotted separately for the nonconfined and
solitary groups.]
Judging from Figure 9.3.2, there was an apparent decrease in alpha-wave frequency
for persons in solitary confinement. There also appears to have been an increase in the
variability for that group. We will use the F test to determine whether the observed
difference in variability (\(s_X^2 = 0.21\) versus \(s_Y^2 = 0.36\)) is statistically significant.
Let \(\sigma_X^2\) and \(\sigma_Y^2\) denote the true variances of alpha-wave frequencies for nonconfined
and solitary-confined persons, respectively. The hypotheses to be tested are
\[ H_0: \sigma_X^2 = \sigma_Y^2 \]
versus
\[ H_1: \sigma_X^2 \neq \sigma_Y^2 \]
Let \(\alpha = 0.05\) be the level of significance. Given that
\[ \sum_{i=1}^{10} x_i = 105.8 \qquad \sum_{i=1}^{10} x_i^2 = 1121.26 \]
\[ \sum_{i=1}^{10} y_i = 97.8 \qquad \sum_{i=1}^{10} y_i^2 = 959.70 \]
the sample variances become
\[ s_X^2 = \frac{10(1121.26) - (105.8)^2}{10(9)} = 0.21 \]
and
\[ s_Y^2 = \frac{10(959.70) - (97.8)^2}{10(9)} = 0.36 \]
Dividing the sample variances gives an observed F ratio of 1.71:
\[ F = \frac{s_Y^2}{s_X^2} = \frac{0.36}{0.21} = 1.71 \]
Both \(n\) and \(m\) are ten, so we would expect \(s_Y^2/s_X^2\) to behave like an F random
variable with nine and nine degrees of freedom (assuming \(H_0: \sigma_X^2 = \sigma_Y^2\) is true). From
Table A.4 in the Appendix, we see that the values cutting off areas of 0.025 in either
tail of that distribution are 0.248 and 4.03 (see Figure 9.3.3).
Since the observed F ratio falls between the two critical values, our decision is
to fail to reject \(H_0\): a ratio of sample variances equal to 1.71 does not rule out the
possibility that the two true variances are equal. (In light of the comment preceding
Theorem 9.3.1, it would now be appropriate to test \(H_0: \mu_X = \mu_Y\) using the two-sample
t test described in Section 9.2.)
[Figure 9.3.3: Distribution of \(s_Y^2/s_X^2\) when \(H_0\) is true: the density of an F distribution
with 9 and 9 degrees of freedom, with an area of 0.025 cut off in each tail.]
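The variance-ratio calculation above can be reproduced from the four sums alone. The sketch below assumes nothing beyond those sums and the two critical values quoted from Table A.4; the helper name `sample_variance_from_sums` is hypothetical. Because it keeps full precision, the ratio it prints is slightly smaller than the 1.71 obtained from the rounded variances 0.36 and 0.21.

```python
def sample_variance_from_sums(n, total, total_sq):
    """Sample variance via the computing formula [n*sum(x^2) - (sum x)^2] / [n(n-1)]."""
    return (n * total_sq - total**2) / (n * (n - 1))

n = m = 10
s2_x = sample_variance_from_sums(n, 105.8, 1121.26)  # nonconfined group
s2_y = sample_variance_from_sums(m, 97.8, 959.70)    # solitary confinement group

# Observed F ratio, following Theorem 9.3.1 (s_Y^2 over s_X^2).
f_ratio = s2_y / s2_x

# Two-sided critical values for 9 and 9 degrees of freedom, as quoted in the text.
lower, upper = 0.248, 4.03
reject_h0 = f_ratio <= lower or f_ratio >= upper

print(f"s_X^2 = {s2_x:.3f}, s_Y^2 = {s2_y:.3f}, F = {f_ratio:.2f}, reject H0: {reject_h0}")
```

The decision is the same either way: the ratio falls well inside the interval from 0.248 to 4.03.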
QUESTIONS
9.3.1. Short people tend to live longer than tall people, according to a theory held by certain
medical researchers. Reasons for the disparity remain unclear, but studies have shown
that short baseball players enjoy a longer life expectancy than tall baseball players.
A similar finding has been documented for professional boxers. The following table
(159) is a breakdown of the life spans of 31 former presidents, grouped into two
categories: "Short" (\(\leq\) 5'7") and "Tall" (\(\geq\) 5'8"). The sample variance for the short
presidents is 73.6 years\(^2\); for the tall presidents, 86.9 years\(^2\).
Short Presidents (::;5'71/)
President
Height
Madison
5'4"
5'6"
Van
B. Harrison
J. Adams
J.O. Adams
5'7/1
Tall Presidents Ci!S8")
President
67
W. Harrison
Polk
Taylor
90
Grant
85
79
80
Hayes
Truman
Fillmore
Pierce
A. Johnson
T. Roosevelt
Eisenhower
Cleveland
Wilson
Hoover
Monroe
Tyler
Buchanan
Taft
Harding
Jackson
Washington
Arthur
F. Roosevelt
L.Johnson
Jefferson
Height
Age
5'8"
68
53
65
5'8"
5/8"
5'8 1JJ
'Z
5'8 1 II
~
63
70
5'9"
88
74
5'10"
64
5'10"
5/10"
60
5' 10"
5'10"
5'11/1
5'1111
5' 11!1
6'
60
78
71
67
90
6'
6'
73
71
77
(I
72
6'
61 1H
6'21'
67
(l2f'
56
63
61'21'
6''21'
(l2r
64
83
(a) Test \(H_0: \sigma_X^2 = \sigma_Y^2\) against a two-sided \(H_1\) at the \(\alpha = 0.05\) level of significance.
(b) Based on your conclusion in Part (a), would it be appropriate to test \(H_0: \mu_X = \mu_Y\)
using the two-sample t test of Theorem 9.2.2?
9.3.2. A safe investment for the nonexpert is the certificate of deposit (CD) issued by many
banks and other financial institutions. Typically, the longer the term of the investment,
the higher the interest rate paid. The following table shows samples of 6-month CD rates
and 12-month CD rates for a $10,000 investment. Is there a difference in the variability of
the two types of rates at the \(\alpha = 0.05\) level?
$10/)00 CD Rates
Note;
6 Month
12 Month
5.10
5.10
5.31
5.00
5.26
5.10
5.26
5.02
5.15
5.35
5.20
5.40
5.20
5.83
5.21
5.40
For the 6-month rates, \(s_X = 0.122\); for the 12-month rates, \(s_Y = 0.209\).
9.3.3. Among the standard personality inventories used by psychologists is the thematic
apperception test (TAT). A subject is shown a series of pictures and is asked to make
up a story about each one. Interpreted properly, the content of the stories can provide
valuable insights into the subject's mental well-being. The following data show the
TAT results for 40 women, 20 of whom were the mothers of normal children and 20
the mothers of schizophrenic children. In each case the subject was shown the same
set of 10 pictures. The figures recorded were the numbers of stories (out of 10) that
revealed a positive parent-child relationship, one where the mother was clearly capable
of interacting with her child in a flexible, open-minded way (210).
Mothers of Schizophrenic Children
Mothers of Normal Children
8
4
2
3
3
1
2
1
4
4
6
3
4
6
6
4
4
1
2
3
1
2
7
'0
3
2
2
0
a; q;
qi
=
=
3
3
1
1
4
2
3
1
2
2
2
1
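(a) Test \(H_0: \sigma_X^2 = \sigma_Y^2\) versus \(H_1: \sigma_X^2 \neq \sigma_Y^2\), where \(\sigma_X^2\) and \(\sigma_Y^2\) are the variances of
the scores of mothers of normal children and scores of mothers of schizophrenic
children, respectively. Let \(\alpha = 0.05\).
(b) If \(H_0: \sigma_X^2 = \sigma_Y^2\) is accepted in Part (a), test \(H_0: \mu_X = \mu_Y\) versus \(H_1: \mu_X \neq \mu_Y\). Set \(\alpha\)
equal to 0.05.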
9.3.4. In a study designed to investigate the effects of a strong magnetic field on the early
development of mice (8), 10 cages, each containing three 30-day-old albino female mice,
were subjected for a period of 12 days to a magnetic field having an average strength of
80 Oe/cm. Thirty other mice, housed in 10 similar cages, were not put in the magnetic
field and served as controls. Listed in the table are the weight gains, in grams, for each
of the 20 sets of mice.

In Magnetic Field              Not in Magnetic Field
Cage    Weight Gain (g)        Cage    Weight Gain (g)
1       22.8                   11      23.5
2       10.2                   12      31.0
3       20.8                   13      19.5
4       27.0                   14      26.2
5       19.2                   15      26.5
6       9.0                    16      25.2
7       14.2                   17      24.5
8       19.8                   18      23.8
9       14.5                   19      27.8
10      14.8                   20      22.0

Test whether the variances of the two sets of weight gains are significantly different.
Let \(\alpha = 0.05\). Note: For the mice in the magnetic field, \(s_X = 5.67\); for the other mice,
\(s_Y = 3.18\).
9.3.5. Raynaud's syndrome is characterized by the sudden impairment of blood circulation
in the fingers, a condition that results in discoloration and heat loss. The magnitude
of the problem is evidenced in the following data, where 20 subjects (10 "normals"
and 10 with Raynaud's syndrome) immersed their right forefingers in water kept at
19°C. The heat output (in cal/cm\(^2\)/minute) of the forefinger was then measured with a
calorimeter (109).
Subjects with Raynoud's Syndrome
NormIJ/ Subjects
Patient
Heat Output
(callcm 2 huin)
Patient
W.K.
M.N.
S.A.
2.43
1.83
2.43
2.70
1.88
1.96
1.53
1.R.
J.G.
G.K.
AS.
2.08
1.85
2.44
L.P.
x=2.11
Sx =037
R.A.
RM.
P.M.
KA.
HM.
0.81
0.70
0.74
036
0.75
S.M.
RM.
0.56
0.65
0.87
OAO
B.W.
N.B.
0.31
y=0.62
=0.20
Sy
Test that the heat-output variances for normal subjects and those with Raynaud's
syndrome are the same. Use a two-sided alternative and the 0.05 level of
significance.
9.3.6. The bitter, 8-month baseball strike that ended the 1994 season so abruptly was expected
to have substantial repercussions at the box office when the 1995 season finally got
under way. It did. By the end of the first week of play, American League teams were
playing to 12.8% fewer fans than the year before; National League teams fared even
worse: their attendance was down 15.1% (200). Based on the team-by-team attendance
figures given below, would it be appropriate to use the pooled two-sample t test of
Theorem 9.2.2 to assess the statistical significance of the difference between those
two means?
American League
Team
Baltimore
Boston
California
Chicago
Oeveland
Detroit
Kansas City
Milwaukee
Minnesota
New York
Oakland
Seattle
Texas
Toronto
Average:
-2%
National League
Team
ClIange
Atlanta
-49%
-4
-27
Colorado
No home
Houston
-30
No home
Montreal
New York
Philadelphia
Pittsburgh
San Diego
San Francisco
St. Louis
Average:
-18
-27
-15
-16
-10
-1
-9
-28
-10
-45
-14
-15.1%
9.3.7. For the data in Question 9.2.4, the sample variances for the methylmercury half-lives are
227.77 for the females and 65.25 for the males. Does the magnitude of that difference
invalidate using Theorem 9.2.2 to test \(H_0: \mu_X = \mu_Y\)? Explain.
9.3.8. Crosstown busing to compensate for de facto segregation was begun on a fairly
large scale in Nashville during the 1960s. Progress was made, but critics argued that too
many racial imbalances were left unaddressed. Among the data cited in the early
1970s are the following figures, showing the percentages of African-American students
enrolled in a random sample of 18 public schools (172). Nine of the schools were
located in predominantly African-American neighborhoods; the other nine, in
predominantly white neighborhoods. Which version of the two-sample t test, Theorem 9.2.2
or the approximation given in Question 9.2.15, would be more appropriate for deciding
whether the difference between 35.9% and 19.7% is statistically significant? Justify
your answer.
Schools io White
Neighborhoods
American
21 %
14
28
41
32
11
30
29
46
39
6
18
24
25
23
Average: 19.7%
45
Average: 35.9%
9.3.9. Show that the generalized likelihood ratio for testing \(H_0: \sigma_X^2 = \sigma_Y^2\) versus \(H_1: \sigma_X^2 \neq \sigma_Y^2\)
as described in Theorem 9.3.1 is given by
\[ \lambda = \frac{L(\omega_e)}{L(\Omega_e)} = \frac{(m+n)^{(n+m)/2} \left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{n/2} \left[ \sum_{j=1}^{m} (y_j - \bar{y})^2 \right]^{m/2}}{n^{n/2} m^{m/2} \left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{j=1}^{m} (y_j - \bar{y})^2 \right]^{(m+n)/2}} \]
9.3.10. Let \(X_1, X_2, \ldots, X_n\) and \(Y_1, Y_2, \ldots, Y_m\) be independent random samples from normal
distributions with means \(\mu_X\) and \(\mu_Y\) and standard deviations \(\sigma_X\) and \(\sigma_Y\), respectively,
where \(\mu_X\) and \(\mu_Y\) are known. Derive the GLRT for \(H_0: \sigma_X^2 = \sigma_Y^2\) versus \(H_1: \sigma_X^2 > \sigma_Y^2\).
9.4 BINOMIAL DATA: TESTING \(H_0: p_X = p_Y\)
Up to this point, the data considered in Chapter 9 have been independent random
samples of sizes \(n\) and \(m\) drawn from two continuous distributions; in fact, from two
normal distributions. Other scenarios, of course, are quite possible. The X's and Y's
might represent continuous random variables but have density functions other than the normal.
Or they might be discrete. In this section we consider the most common example of this
latter type: situations where the two sets of data are binomial.
Applying the Generalized Likelihood Ratio Criterion
Suppose that \(n\) Bernoulli trials related to treatment X have resulted in \(x\) successes, and
\(m\) (independent) Bernoulli trials related to treatment Y have yielded \(y\) successes. We
wish to test whether \(p_X\) and \(p_Y\), the true probabilities of success for Treatment X and
Treatment Y, are equal:
\[ H_0: p_X = p_Y \; (= p) \]
versus
\[ H_1: p_X \neq p_Y \]
Let \(\alpha\) be the level of significance.
Following the notation used for GLRTs, the two parameter spaces here are
\[ \omega = \{(p_X, p_Y): 0 \leq p_X = p_Y \leq 1\} \]
and
\[ \Omega = \{(p_X, p_Y): 0 \leq p_X \leq 1, \; 0 \leq p_Y \leq 1\} \]
Furthermore, the likelihood function can be written
\[ L = p_X^x (1 - p_X)^{n-x} \, p_Y^y (1 - p_Y)^{m-y} \]
Setting the derivative of \(\ln L\) with respect to \(p\) (\(= p_X = p_Y\)) equal to zero and solving for
\(p\) gives a not too surprising result, namely,
\[ p_e = \frac{x + y}{n + m} \]
That is, the maximum likelihood estimate for \(p\) under \(H_0\) is the pooled success proportion.
Similarly, solving \(\partial \ln L / \partial p_X = 0\) and \(\partial \ln L / \partial p_Y = 0\) gives the two sample
proportions as the unrestricted maximum likelihood estimates for \(p_X\) and \(p_Y\):
\[ p_{X_e} = \frac{x}{n} \qquad p_{Y_e} = \frac{y}{m} \]
Putting \(p_e\), \(p_{X_e}\), and \(p_{Y_e}\) back into \(L\) gives the generalized likelihood ratio:
\[ \lambda = \frac{L(\omega_e)}{L(\Omega_e)} = \frac{[(x+y)/(n+m)]^{x+y} \, [1 - (x+y)/(n+m)]^{n+m-x-y}}{(x/n)^x [1 - (x/n)]^{n-x} \, (y/m)^y [1 - (y/m)]^{m-y}} \qquad (9.4.1) \]
Equation 9.4.1 is such a difficult function to work with that it is necessary to find an
approximation to the usual generalized likelihood ratio test. There are several available. It
can be shown, for example, that \(-2 \ln \lambda\) for this problem has an asymptotic \(\chi^2\) distribution
with 1 degree of freedom (211). Thus, an approximate two-sided, \(\alpha = 0.05\) test is to reject
\(H_0\) if \(-2 \ln \lambda \geq 3.84\).
one most often used, is to appeaJ to
central limit
theorem and make the " ...."' ....H.
that
has an approximate ::>IA.Ill...UiU normal distribution. Under Ho, of course,
518
Chapter 9
Two-Sample Problems
Y) =
X
Var ( -;; -;;;
p(l - p)
+
"---n--"--
m
-
p)
nm
If p is now repJaced by x
+
n+m
its maximum likelihood estimate under w. we get
statement of Theorem 9.4.1.
Thoorem 9.4.1. Let x and y denote the numbers of successes observed in two independent
sets of n and m Bernoulli trials, respectively, where Px and Py are the true success
probabilities associated with each set of trinls. Let Pi!
=x +y
n+m
and define
a. To test Ho: p X = py versus HI: p X > py al the ()( level of significance, reject Ho if
Z 2:: Zo·
b. To teS(
Px < py at the ()( level of significance,
px
Py versus
Ho if
Z :5 -Zac. To lest Ho: p X = py versUS H,: PX ¢:. Py oJ the ()( level of significance, reject Ho if z is
either (1) S
or (2) 2:: Za/2.
Comment. The utility of Theorem 9.4.1 actually exlends beyond the scope we have
just described. Any continuous variable can always be ruchotomized and "transformed"
into a Bernoulli
For example, blood pressure can be recorded in terms of "mm
Hg," a continuous
or simply as "nonnal" or "abnonnal," a BernouW """M!>lhl
The next two case studies illustrate these two sources of binomial data. In the first, the
measuremen 18 begin and end as Bernoulli variables; in the second, the initial measuremen t
of "number of nightmares per month" is dichotomized into "often" and "seldom."
CASE STUDY 9.4.1
Local
have some discretion in the disposition of criminal cases that appear
their courts. For some cases, the judge and the defendant's lawyer will enter
into a plea bargain, where
pleads gwlty to a lesser
How often this
happens is measured by the mitigation rale, the proportion of criminal cases where the
defendant qualifies for prison time but receives a greatly shortened term or no prison
time at aU.
(Continued on next page)
Section 9.4
Binomial Data: Testing Ho: Px =py
519
A recent Florida . Corrections Department study showed that the mitigation rate
in &cambia County from January 1994 through March 1996 was 61.7% (1033 out of
1675 cases), making it the state's fourth highest. Not happy with that distinction, the
area's State Attorney instituted some new policies designed to limit the number of
plea bargains. A follow-up study (138) revealed that the July 1996 through June 1997
mitigation rate decreased to 52.1 % (344 out of 660 cases). Is it fair to attribute that
decline to the State Attorney's efforts, or can the drop from 61.7% to 52.1 % be written
off to chance?
Let $p_X$ be the true probability that mitigation would have occurred during the
period January 1994 through March 1996, and let $p_Y$ denote the analogous probability
for July 1996 through June 1997. The hypotheses to be tested are
\[ H_0\colon p_X = p_Y \; (= p) \quad \text{versus} \quad H_1\colon p_X > p_Y \]
Let $\alpha = 0.01$.
If $H_0$ is true, the pooled estimate of $p$ would be the overall mitigation rate. That is,
\[ p_e = \frac{1033 + 344}{1675 + 660} = \frac{1377}{2335} = 0.590 \]
The sample proportions of the mitigation rate for the first period and second period
are $\frac{1033}{1675} = 0.617$ and $\frac{344}{660} = 0.521$, respectively. According to
Theorem 9.4.1, then, the test statistic is equal to 4.25:
\[ z = \frac{0.617 - 0.521}{\sqrt{\dfrac{(0.590)(0.410)}{1675} + \dfrac{(0.590)(0.410)}{660}}} = 4.25 \]
Since $z$ exceeds the $\alpha = 0.01$ critical value ($z_{.01} = 2.33$), we should reject the null
hypothesis and conclude that the more stringent policies laid down by the State
Attorney did have the desired effect of lowering the county's mitigation rate.
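The arithmetic in this case study can be retraced with a short Python sketch. The function name `two_proportion_z` is an illustrative choice, not something from the text, and the code assumes only the pooled-estimate form of the statistic used above. Because it works with unrounded proportions, its z value comes out slightly below the 4.25 obtained from the rounded figures 0.617, 0.521, and 0.590.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x, n, y, m):
    """Return (pooled estimate, z statistic) for testing H0: pX = pY.

    x successes in n trials form the first sample; y successes in m
    trials form the second.
    """
    p_x, p_y = x / n, y / m
    p_e = (x + y) / (n + m)  # pooled estimate of p under H0
    std_err = sqrt(p_e * (1 - p_e) / n + p_e * (1 - p_e) / m)
    return p_e, (p_x - p_y) / std_err


# Mitigation data: 1033 of 1675 cases, then 344 of 660 cases
p_e, z = two_proportion_z(1033, 1675, 344, 660)

# One-sided critical value for alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - 0.01)

print(f"pooled estimate = {p_e:.3f}, z = {z:.2f}, critical value = {z_crit:.2f}")
print("reject H0" if z >= z_crit else "fail to reject H0")
```

The same function applies to any pair of binomial samples; only the critical value changes when the alternative is two-sided.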
CASE STUDY 9.4.2
Over the years, numerous studies have sought to characterize the nightmare sufferer.
Out of these has emerged the stereotype of someone with high anxiety, low ego
strength, feelings of inadequacy, and poorer-than-average physical health. What is not
so well-known, though, is whether men fall into this pattern with the same frequency
as women. To this end, a clinical survey (76) looked at nightmare frequencies
for a sample of 160 men and 192 women. Each subject was asked whether he (or she)
experienced nightmares "often" (at least once a month) or "seldom" (less than
once a month). The percentages of men and women saying "often" were 34.4% and 31.3%,
respectively (see Table 9.4.1). Is the difference between those two percentages
statistically significant?

TABLE 9.4.1: Frequency of Nightmares
                      Men    Women    Total
Nightmares often       55       60      115
Nightmares seldom     105      132      237
Totals                160      192      352
% often:             34.4     31.3
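Let $p_M$ and $p_W$ denote the true proportions of men having nightmares often and
women having nightmares often, respectively. The hypotheses to be tested are
\[ H_0\colon p_M = p_W \quad \text{versus} \quad H_1\colon p_M \ne p_W \]
Let $\alpha = 0.05$. Then $\pm z_{.025} = \pm 1.96$ become the two critical values.
Moreover, $p_e = \frac{55 + 60}{160 + 192} = 0.327$, so
\[ z = \frac{0.344 - 0.313}{\sqrt{\dfrac{(0.327)(0.673)}{160} + \dfrac{(0.327)(0.673)}{192}}} = 0.62 \]
The conclusion, then, is clear: We fail to reject the null hypothesis. These data provide
no convincing evidence that the frequency of nightmares is different for men than for
women.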
QUESTIONS
9.4.1. The phenomenon of handedness has been extensively studied in human populations.
The percentages of adults who are right-handed, left-handed, and ambidextrous are
well documented. What is not so well-known is that a similar phenomenon is present
in lower animals. Dogs, for example, can be either right-pawed or left-pawed. Suppose
that in a random sample of 200 beagles it is found that 55 are left-pawed and that in a
random sample of 200 collies 40 are left-pawed. Can we conclude that the difference
in the two sample proportions of left-pawed dogs is statistically significant?
9.4.2. In a study designed to see whether a controlled diet could retard the process of
arteriosclerosis, a total of 846 randomly chosen persons were followed over an
eight-year period. Half were instructed to eat only certain foods; the other half could eat
whatever they wanted. At the end of eight years, 66 persons in the diet group were
found to have died of either myocardial infarction or cerebral infarction, as compared
to 93 deaths of a similar nature in the control group (214). Do the appropriate analysis.
Let $\alpha = 0.05$.
9.4.3. Water witching, the practice of using the movements of a forked twig to locate
underground water (or minerals), dates back over 400 years. Its first detailed description
appears in Agricola's De re metallica, published in 1556. That water witching works
remains a belief widely held among rural people throughout the Americas.
[In 1960 the number of "active" water witches in the United States was estimated to be
more than 20,000 (205).] Reliable evidence supporting or refuting water witching is hard
to find. Personal accounts of successes or failures tend to be strongly biased by
the attitude of the observer. The following data show the outcomes of all the wells dug
in Fence Lake, New Mexico, where
"witched" wells and 32 "nonwitched" wells were
sunk. Recorded for each well was whether it proved to be successful (S) or unsuccessful
(U). What would you conclude?
Nonwitched Wells
Witched Wells
S
S
S
S
U
S
S
S
S
S
S
S
S
S
S
S
S
U
S
S
S
S
U
S
S
S
S
S
S
S
S
U
S
S
S
U
U
S
S
S
S
S
S
S
S
S
S
S
S
S
U
S
S
S
S
S
S
S
9.4.4. If flying saucers are a genuine phenomenon, it would follow that the nature of sightings
should be similar in different parts of the world. A prominent investigator compiled a
listing of 91 sightings reported in Spain and 1117 reported elsewhere. Among the
information recorded was whether the saucer was on the ground or hovering. His data
are summarized in the following table (86). Let $p_S$ and $p_{NS}$ denote the true
probabilities of "Saucer on ground" in Spain and not in Spain, respectively.
Test $H_0\colon p_S = p_{NS}$ against a two-sided $H_1$. Let $\alpha = 0.01$.

                      In Spain    Not in Spain
Saucer on ground          53           705
Saucer hovering           38           412

9.4.5. $H_0\colon p_X = p_Y$ is being tested against $H_1\colon p_X \ne p_Y$ on the basis of two
independent sets of 100 Bernoulli trials. If $x$, the number of successes in the first set,
is 60 and $y$, the number of successes in the second set, is 48, what P-value would be
associated with the data?
9.4.6. A total of 8605 students are enrolled full-time at State University this semester,
4134 of whom are women. Of the 6001 students who live on campus, 2915 are women. Can it
be argued that the difference in the proportion of men and women living on campus is
statistically significant? Carry out an appropriate analysis.
Let a =
9.4.7. The kittiwake is a seagull whose mating behavior is basically monogamous. Normally,
the birds separate for several months after the completion of one breeding season and
reunite at the beginning of the next. Whether or not the birds actually reunite,
though, may be affected by the success of their "relationship" the season before. A
total of 769 kittiwake pair-bonds were studied over the course of two breeding
seasons; of those 769, some 609 successfully bred the first season; the remaining
160 were unsuccessful. The following season, 175 of the previously successful pair-bonds
"divorced," as did 100 of the 160 whose prior relationship left something to be
desired. Can we conclude that the difference in the two divorce rates (29% and 63%) is
statistically significant?

                          Breeding in Previous Year
                          Successful    Unsuccessful
Number divorced               175            100
Number not divorced           434             60
Total                         609            160
Percent divorced               29             63
9.4.8.
A utility infielder for a National League club
last season in 300
plate. This year he hit .250 in 200 at-bats. The owners are trying to cut his
year on the
that his output has
The player argues, however, that
his performances the last two seasons have not been significantly different, so his pay
should not be reduced. Who is right?
9.4.9. Compute $-2 \ln \lambda$
9.4.1) for the
data of Case Study
and
use it to test the
that px = py. Let a
9.5 CONFIDENCE INTERVALS FOR THE TWO-SAMPLE PROBLEM
Two-sample data lend themselves nicely to the hypothesis testing format because a
meaningful $H_0$ can always be defined (which was not the case for every set of one-sample
data). The same inferences, though, can just as easily be phrased in terms of confidence
intervals. Simple inversions similar to the derivation of Equation 7.4.1 will yield confidence
intervals for $\mu_X - \mu_Y$, $\sigma_X^2/\sigma_Y^2$, and $p_X - p_Y$.
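Theorem 9.5.1. Let $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$ be independent random samples drawn
from normal distributions with means $\mu_X$ and $\mu_Y$, respectively, and with the same standard
deviation, $\sigma$. Let $s_p$ denote the data's pooled standard deviation. A $100(1 - \alpha)\%$ confidence
interval for $\mu_X - \mu_Y$ is given by
\[ \left( \bar{x} - \bar{y} - t_{\alpha/2,\,n+m-2} \cdot s_p \sqrt{\frac{1}{n} + \frac{1}{m}}, \;\; \bar{x} - \bar{y} + t_{\alpha/2,\,n+m-2} \cdot s_p \sqrt{\frac{1}{n} + \frac{1}{m}} \right) \]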
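Proof. We know from Theorem 9.2.1 that
\[ \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{S_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} \]
has a Student t distribution with $n + m - 2$ df. Therefore,
\[ P\left( -t_{\alpha/2,\,n+m-2} \le \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{S_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} \le t_{\alpha/2,\,n+m-2} \right) = 1 - \alpha \qquad (9.5.1) \]
Rewriting Equation 9.5.1 by isolating $\mu_X - \mu_Y$ in the center of the inequalities gives
the endpoints stated in the theorem.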
CASE STUDY 9.5.1
Occasionally in forensic medicine, or in the aftermath of a bad accident, identifying
the sex of a victim can be a very difficult task. In some of these cases, dental structure
provides a useful criterion, since individual teeth will remain in good condition long
after other tissues have deteriorated. Furthermore, studies have shown that female
teeth and male teeth have different physical and chemical characteristics.
The extent to which X-rays can penetrate tooth enamel, for instance, is different
for men than it is for women. Listed in Table 9.5.1 are "spectropenetration gradients"
for eight female teeth and eight male teeth (57). These numbers are measures of the
rate of change in the amount of X-ray penetration through a section of
tooth enamel at a wavelength of 600 nm as opposed to 400 nm.
TABLE 9.5.1: Enamel Spectropenetration Gradients
Male, Xi
Female, Yi
4.9
5.4
4.8
5.0
5.5
5.4
6.6
6.3
4.3
4.1
5.6
4.0
3.6
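Let $\mu_X$ and $\mu_Y$ be the population means of the spectropenetration gradients
associated with male teeth and with female teeth, respectively. Note that
\[ \sum_{i=1}^{8} x_i = 43.4 \quad \text{and} \quad \sum_{i=1}^{8} x_i^2 = 239.32 \]
from which
\[ \bar{x} = \frac{43.4}{8} = 5.4 \quad \text{and} \quad s_X^2 = \frac{8(239.32) - (43.4)^2}{8(7)} = 0.55 \]
Similarly,
\[ \sum_{i=1}^{8} y_i = 36.1 \quad \text{and} \quad \sum_{i=1}^{8} y_i^2 = 166.95 \]
so that
\[ \bar{y} = \frac{36.1}{8} = 4.5 \quad \text{and} \quad s_Y^2 = \frac{8(166.95) - (36.1)^2}{8(7)} = 0.58 \]
Therefore, the pooled standard deviation is equal to 0.75:
\[ s_p = \sqrt{\frac{7(0.55) + 7(0.58)}{8 + 8 - 2}} = \sqrt{0.565} = 0.75 \]
We know that the ratio
\[ \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{S_p \sqrt{\dfrac{1}{8} + \dfrac{1}{8}}} \]
will be approximated by a Student t curve with 14 degrees of freedom. Since
$t_{.025,14} = 2.1448$, the 95% confidence interval for $\mu_X - \mu_Y$ is given by
\[ \left( \bar{x} - \bar{y} - 2.1448\, s_p \sqrt{\tfrac{1}{8} + \tfrac{1}{8}}, \;\; \bar{x} - \bar{y} + 2.1448\, s_p \sqrt{\tfrac{1}{8} + \tfrac{1}{8}} \right) \]
\[ = \left( 5.4 - 4.5 - 2.1448(0.75)\sqrt{0.25}, \;\; 5.4 - 4.5 + 2.1448(0.75)\sqrt{0.25} \right) = (0.1, 1.7) \]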
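The interval of Theorem 9.5.1 can be computed from summary statistics alone. The sketch below is illustrative: the function name `pooled_t_interval` and its argument names are assumptions, and the critical value is passed in from a t table rather than computed, so that only the Python standard library is needed.

```python
from math import sqrt


def pooled_t_interval(xbar, ybar, var_x, var_y, n, m, t_crit):
    """Confidence interval for mu_X - mu_Y, assuming equal population variances.

    t_crit is the upper alpha/2 percentile of the Student t distribution
    with n + m - 2 degrees of freedom, taken from a table.
    """
    # Pooled standard deviation: weighted average of the two sample variances
    s_p = sqrt(((n - 1) * var_x + (m - 1) * var_y) / (n + m - 2))
    half_width = t_crit * s_p * sqrt(1 / n + 1 / m)
    diff = xbar - ybar
    return diff - half_width, diff + half_width


# Summary statistics quoted in Case Study 9.5.1 (t_.025,14 = 2.1448)
lower, upper = pooled_t_interval(5.4, 4.5, 0.55, 0.58, 8, 8, t_crit=2.1448)
print(f"95% CI for mu_X - mu_Y: ({lower:.1f}, {upper:.1f})")
```

Taking the critical value as an argument keeps the function usable for any confidence level for which a table entry is available.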
Comment. Here the 95% confidence interval does not include the value zero. This
means that had we tested
\[ H_0\colon \mu_X = \mu_Y \quad \text{versus} \quad H_1\colon \mu_X \ne \mu_Y \]
at the $\alpha = 0.05$ level of significance, $H_0$ would have been rejected.
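Theorem 9.5.2. Let $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$ be independent random samples drawn
from normal distributions with standard deviations $\sigma_X$ and $\sigma_Y$, respectively. A $100(1 - \alpha)\%$
confidence interval for the variance ratio, $\sigma_X^2/\sigma_Y^2$, is given by
\[ \left( \frac{s_X^2}{s_Y^2}\, F_{\alpha/2,\,m-1,\,n-1}, \;\; \frac{s_X^2}{s_Y^2}\, F_{1-\alpha/2,\,m-1,\,n-1} \right) \]
Proof. Start with the fact that
\[ \frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \]
has an F distribution with $m - 1$ and $n - 1$ df,
and follow the strategy used in the proof of Theorem 9.5.1; that is,
isolate $\sigma_X^2/\sigma_Y^2$ in the center of the analogous inequalities.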
CASE STUDY 9.5.2
The easiest way to measure the movement, or flow, of a glacier is with a camera.
First a set of reference points is marked off at various sites near the glacier.
Then these points, along with the glacier, are photographed from an airplane. The
problem is this: How long should the time interval be between photographs? If too
short a period has elapsed, the glacier will not have moved very far and the errors
associated with the photographic technique will be relatively large. If too long a period
has elapsed, parts of the glacier might be deformed by the surrounding terrain, an
eventuality that could introduce substantial variability into the point-to-point velocity
estimates.
Two sets of flow rates for the Antarctic's Hoseason Glacier have been calculated
(118), one based on photographs taken three years apart, the other, five years apart
(see Table 9.5.2). On the basis of other considerations, it can be assumed that the
"true" flow rate was constant for the eight years in question. The objective
is to assess the relative variabilities associated with the three- and five-year time
periods. One way to do this, assuming the data to be normal, is
to construct, say, a 95% confidence interval for the variance ratio. If that interval
does not contain the value "1," we infer that the two time periods lead to flow rate
estimates of significantly different precision.

TABLE 9.5.2: Flow Rates Estimated for the Hoseason Glacier (Meters Per Day)
Three-Year Span, x_i:  0.73  0.76  0.75  0.77  0.73  0.75  0.74
Five-Year Span, y_i:   0.72  0.74  0.72  0.72  0.74
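From Table 9.5.2,
\[ \sum_{i=1}^{7} x_i = 5.23 \quad \text{and} \quad \sum_{i=1}^{7} x_i^2 = 3.9089 \]
so that
\[ s_X^2 = \frac{7(3.9089) - (5.23)^2}{7(6)} = 0.000224 \]
Similarly,
\[ \sum_{i=1}^{5} y_i = 3.64 \quad \text{and} \quad \sum_{i=1}^{5} y_i^2 = 2.6504 \]
making
\[ s_Y^2 = \frac{5(2.6504) - (3.64)^2}{5(4)} = 0.000120 \]
The two critical values come from Table A.4 in the Appendix:
\[ F_{.025,4,6} = 0.109 \quad \text{and} \quad F_{.975,4,6} = 6.23 \]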
Substituting, then, into the statement of Theorem 9.5.2 gives (0.203, 11.629) as a 95%
confidence interval for $\sigma_X^2/\sigma_Y^2$:
\[ \left( \frac{0.000224}{0.000120}\,(0.109), \;\; \frac{0.000224}{0.000120}\,(6.23) \right) = (0.203, 11.629) \]
Thus, although the three-year data had a larger sample variance than the five-year
data, no conclusions can be drawn about the true variances being different, because
the ratio $\sigma_X^2/\sigma_Y^2 = 1$ is contained in the confidence interval.
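As a check on the arithmetic, the variance-ratio interval can be recomputed from the raw flow rates. In the sketch below, the split of the twelve observations into a three-year sample and a five-year sample is the one consistent with the sums quoted in the case study (5.23 and 3.9089 for the x's, 3.64 and 2.6504 for the y's); the function name is an illustrative assumption, and the two F percentiles are taken from the text rather than computed.

```python
from statistics import variance

# Flow rates (meters per day) from Table 9.5.2
three_year = [0.73, 0.76, 0.75, 0.77, 0.73, 0.75, 0.74]  # x_i, n = 7
five_year = [0.72, 0.74, 0.72, 0.72, 0.74]               # y_i, m = 5


def variance_ratio_interval(x, y, f_lower, f_upper):
    """Confidence interval for sigma_X^2 / sigma_Y^2 (Theorem 9.5.2).

    f_lower and f_upper are the alpha/2 and 1 - alpha/2 percentiles of the
    F distribution with len(y) - 1 and len(x) - 1 degrees of freedom.
    """
    ratio = variance(x) / variance(y)  # sample variances, n - 1 divisor
    return ratio * f_lower, ratio * f_upper


# F_.025,4,6 = 0.109 and F_.975,4,6 = 6.23, as quoted from Table A.4
lower, upper = variance_ratio_interval(three_year, five_year, 0.109, 6.23)
print(f"95% CI for the variance ratio: ({lower:.3f}, {upper:.3f})")
print("contains 1" if lower < 1 < upper else "does not contain 1")
```

The upper endpoint differs from 11.629 in the second decimal place because the text rounds the two sample variances before forming their ratio; the conclusion is unaffected.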
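Theorem 9.5.3. Let $x$ and $y$ denote the numbers of successes observed in two independent
sets of $n$ and $m$ Bernoulli trials, respectively. If $p_X$ and $p_Y$ denote the true success
probabilities, an approximate $100(1 - \alpha)\%$ confidence interval for $p_X - p_Y$ is given by
\[ \left( \frac{x}{n} - \frac{y}{m} - z_{\alpha/2} \sqrt{\frac{\left(\frac{x}{n}\right)\left(1 - \frac{x}{n}\right)}{n} + \frac{\left(\frac{y}{m}\right)\left(1 - \frac{y}{m}\right)}{m}}, \;\; \frac{x}{n} - \frac{y}{m} + z_{\alpha/2} \sqrt{\frac{\left(\frac{x}{n}\right)\left(1 - \frac{x}{n}\right)}{n} + \frac{\left(\frac{y}{m}\right)\left(1 - \frac{y}{m}\right)}{m}} \right) \]
Proof. See Question 9.5.11.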
CASE STUDY 9.5.3
Until almost the end of the nineteenth century the mortality associated with surgical
operations, even minor ones, was extremely high. The major problem was infection.
The germ theory as a model for disease transmission was still unknown, so there
was no concept of sterilization. As a result, many patients died from postoperative
complications.
The major breakthrough that was so desperately needed finally came when Joseph
Lister, a British physician, began reading about some of the work done by Louis
Pasteur. In a series of classic experiments, Pasteur had succeeded in demonstrating
the role that yeasts and bacteria play in fermentation. Lister conjectured that human
infections might have a similar origin. To test his theory he began using
carbolic acid as an operating-room disinfectant. The data in Table 9.5.3 show the
outcomes of seventy-five amputations that he performed, thirty-five without the aid of
carbolic acid and forty with the aid of carbolic acid (213).

TABLE 9.5.3: Mortality Rates-Lister's Amputations
                          Carbolic acid used?
                          No      Yes     Total
Patient lived?   Yes      19       34        53
                 No       16        6        22
                 Total    35       40        75
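Let $p_W$ (estimated by $\frac{34}{40}$) and $p_{W/O}$ (estimated by $\frac{19}{35}$) denote the true survival
probabilities for patients amputated "with" and "without" the use of carbolic acid,
respectively. To construct a 95% confidence interval for $p_W - p_{W/O}$ we note that
$z_{\alpha/2} = 1.96$; then Theorem 9.5.3 reduces to
\[ \left( \frac{34}{40} - \frac{19}{35} - 1.96 \sqrt{\frac{\left(\frac{34}{40}\right)\left(\frac{6}{40}\right)}{40} + \frac{\left(\frac{19}{35}\right)\left(\frac{16}{35}\right)}{35}}, \;\; \frac{34}{40} - \frac{19}{35} + 1.96 \sqrt{\frac{\left(\frac{34}{40}\right)\left(\frac{6}{40}\right)}{40} + \frac{\left(\frac{19}{35}\right)\left(\frac{16}{35}\right)}{35}} \right) \]
\[ = \left( 0.31 - 1.96\sqrt{0.0103}, \;\; 0.31 + 1.96\sqrt{0.0103} \right) = (0.11, 0.51) \]
Since $p_W - p_{W/O} = 0$ is not included in the interval (which lies entirely to the right
of 0), it should be concluded that carbolic acid does have an effect (a beneficial
one) on a surgery patient's survival rate.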
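The interval in Theorem 9.5.3 is simple enough to script. The sketch below is one possible rendering; the function name `two_proportion_interval` and its `confidence` parameter are illustrative assumptions, and the normal percentile comes from the standard library's `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x, n, y, m, confidence=0.95):
    """Approximate confidence interval for pX - pY (Theorem 9.5.3).

    x successes in n trials form the first sample; y successes in m
    trials form the second.
    """
    p_x, p_y = x / n, y / m
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    # Unlike the test of H0: pX = pY, the interval uses unpooled variances
    half_width = z * sqrt(p_x * (1 - p_x) / n + p_y * (1 - p_y) / m)
    diff = p_x - p_y
    return diff - half_width, diff + half_width


# Lister's data: 34 of 40 survived with carbolic acid, 19 of 35 without
lower, upper = two_proportion_interval(34, 40, 19, 35)
print(f"95% CI for p_W - p_W/O: ({lower:.2f}, {upper:.2f})")
```

Note the design difference from the hypothesis test of Section 9.4: no pooled estimate appears here, because the interval does not assume that the two success probabilities are equal.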
QUESTIONS
9.5.1. During the 1990s, computer and communications industries were the glamour
businesses. Were their high profiles, though, reflected in the compensation paid to their
CEOs (53)? The following table lists samples of 1995 salary plus bonuses (in $1000s)
for chief executive officers from (1) the computer and communications industry and
(2) the more traditional financial services industry. Construct a 95% confidence interval
for the difference in the average compensation received by the two groups. Note: The
pooled standard deviation for these data is 411.
1995 CEO Salary + Bonuses ($1000s)
Computer & Communications
Company
Adobe Systems
Alltel
America Online
Applied Materials
BMC Software
Frontier
Nynex
Read-Rite
Solectron
Camp.
668
200
1688
752
1235
1485
1020
788
Financial Services
Company
Comp.
Boatmen's Bancshs
CCB Frnancial
Commercial Federal
ChicagoNBD
First of America Bk
Great Western Finl
Huntington
Magna Group
MBIA
National
Old National Bncp
OnBancorp
PNCBank
RCSB Financial
Summit Bancorp
1150
491
566
1296
498
953
504
750
799
292
500
647
1267
9.5.2. In 1965 a silver shortage in the United States prompted Congress to authorize the
minting of silverless dimes and quarters. They also recommended that the silver content
of half-dollars be reduced from 90% to 40%. Historically, fluctuations in the amount
of rare metals found in coins are not uncommon (75). The following data may be a
case in point. Listed are the silver percentages found in samples of a Byzantine coin
minted on two separate occasions during the reign of Manuel I (1143-1180). Construct
a 90% confidence interval for $\mu_X - \mu_Y$, the true average difference in the coin's silver
content (= "early" - "late"). What does the interval imply about the outcome of testing
$H_0\colon \mu_X = \mu_Y$? Note: $s_X = 0.54$ and $s_Y = 0.36$.
Coinage, Xi
(% Ag)
5.9
6.8
5.6
(iA
5.5
7.0
5.1
6.6
7.7
5.8
5.8
6.9
6.2
Average: 6.7
Average: 5.6
9.5.3. Male fiddler crabs solicit attention from the opposite sex by standing in front of their
burrows and waving their claws at the females who walk by. If a female likes what
she sees, she pays the male a brief visit in his burrow. If everything goes well and the
crustacean chemistry clicks, she will stay a little longer and mate. In what may be a
ploy to lessen the risk of spending the night alone, some of the males build elaborate
mud domes over their burrows. Do the following data (226) suggest that a male's time
spent waving to females is influenced by whether his burrow has a dome? Answer the
question by constructing and interpreting a 95% confidence interval for $\mu_X - \mu_Y$. Note:
$s_p = 11.2$.
% of Time Spent Waving to Females
Males with Domes, Xi
Males without Domes, YI
100.0
58.6
76.4
842
96.5
88.8
85.3
93.5
83.6
84.1
79.1
83.6
9.5.4. Recall the preening time data in Table 8.2.2. Let $\mu_X$ be the true average preening time
for male fruit flies, and $\mu_Y$, the true average for female fruit flies. Construct a 99%
confidence interval for $\mu_X - \mu_Y$. What do the endpoints of your interval imply about
the outcome of testing $H_0\colon \mu_X = \mu_Y$ versus $H_1\colon \mu_X \ne \mu_Y$ at the $\alpha = 0.01$ level of
significance?
9.5.5. Carry out the details to complete the proof of Theorem 9.5.1.
9.5.6. Suppose that $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$ are independent random samples from
normal distributions with means $\mu_X$ and $\mu_Y$ and known standard deviations $\sigma_X$ and $\sigma_Y$,
respectively. Derive a $100(1 - \alpha)\%$ confidence interval for $\mu_X - \mu_Y$.
9.5.7. Construct a 95% confidence interval for $\sigma_X^2/\sigma_Y^2$ based on the presidential life span
data in Question 9.3.1. The hypothesis test referred to in Part (a) of that question led to a
"fail to reject $H_0$" conclusion. Does that agree with your confidence interval? Explain.
9.5.8. One of the parameters used in evaluating myocardial function is the end diastolic
volume (EDV). The following table shows EDVs recorded for
Normal, Xj
Constrictive
Yi
62
24
60
56
42
62
74
49
44
67
28
80
48
persons considered to have normal cardiac function and for six with constrictive
pericarditis (204). Would it be correct to use Theorem 9.2.2 to test $H_0\colon \mu_X = \mu_Y$? Answer
the question by constructing a 95% confidence interval for $\sigma_X^2/\sigma_Y^2$.
9.5.9. Complete the proof of Theorem 9.5.2.
9.5.10. Construct an 80% confidence interval for the difference $p_M - p_W$ in the nightmare
frequency data summarized in Case Study 9.4.2.
9.5.11. If $p_X$ and $p_Y$ denote the true success probabilities associated with two sets of $n$ and $m$
independent Bernoulli trials, respectively, the ratio
\[ \frac{\dfrac{x}{n} - \dfrac{y}{m} - (p_X - p_Y)}{\sqrt{\dfrac{\left(\frac{x}{n}\right)\left(1 - \frac{x}{n}\right)}{n} + \dfrac{\left(\frac{y}{m}\right)\left(1 - \frac{y}{m}\right)}{m}}} \]
has approximately a standard normal distribution. Use that fact to prove Theorem 9.5.3.
9.5.12. Suicide rates in the United States tend to be much higher for men than for women,
at all ages. That pattern may not extend to all professions, though. Death certificates
obtained for the 3637 members of the American Chemical Society who died over a
2O-year period revealed that
the
male deaths were suicides, as compared
to 13 of the 115 female deaths (103). Construct a 95% confidence interval for the
difference in suicide rates. What would you conclude?
9.6 TAKING A SECOND LOOK AT STATISTICS (CHOOSING SAMPLES)
Choosing sample sizes is a topic that invariably receives extensive coverage whenever
statistics and experimental design are discussed. For good reason. Whatever the
context, the number of observations making up a data set figures prominently in the
ability of those data to address any and all of the questions raised by the experimenter. As
sample sizes get larger, we know that estimators become more precise and hypothesis tests
get better at distinguishing between $H_0$ and $H_1$. Larger sample sizes, of course, are also
more expensive. The trade-off between how many observations researchers can afford to
take and how many they would like to take is a choice that has to be made early on in
the design of any experiment. If the sample sizes ultimately decided upon are too small,
there is a risk that the objectives of the study will not be fully achieved: parameters may be
estimated with insufficient precision and hypothesis tests may reach incorrect conclusions.
That said, choosing sample sizes is often not as critical to the success of an experiment
as choosing sample subjects. In a two-sample design, for example, how should we decide
which particular subjects to assign to Treatment X and which to Treatment Y? If the
subjects comprising a sample are somehow "biased" with respect to the measurement
being recorded, the integrity of the conclusions is irretrievably compromised. There are
no statistical techniques for "correcting" inferences based on measurements that were
biased in some unknown way. It is also true that biases can be very subtle, yet still have
a pronounced effect on the final measurements. That being the case, it is incumbent
on researchers to take every possible precaution at the outset to prevent inappropriate
assignments of subjects to treatments.
For example, suppose that for your project you plan to study whether a new
synthetic testosterone can affect the behavior of female rats. Your intention is to set up a
two-sample design where ten rats will be given weekly injections of the new testosterone
compound and another ten rats will serve as a control group, receiving weekly injections
of a placebo. At the end of eight weeks, all twenty rats will be put in a large community
cage, and the behavior of each one will be closely monitored for signs of aggression.
Last week you placed an order for twenty female Rattus norvegicus from the local
Rats 'R Us franchise. They arrived today, all housed in one large cage. Your plan is
to remove ten of the twenty "at random," and then put those ten in a similarly large cage.
The ten removed will be receiving the testosterone injections; the ten remaining in
the original cage will constitute the control group. The question is, which ten should be
removed?
The obvious answer (reach in and pull out ten; what's the big deal?) is very much the
wrong answer! Why? Because the samples formed in such a way might very well be biased
if, for example, you (understandably) tended to avoid trying to grab rats that looked
like they might bite. If that were the case, the ones you drew out would be biased, by virtue
of being more passive than the ones left behind. Since the measurements ultimately to be
taken deal with aggression, selecting the samples in that particular way would be a fatal
flaw. Whether the total sample size was twenty or two thousand, the results would be
worthless.
In general, relying on our intuitive sense of the word "random" to allocate subjects to
different treatments is risky, to say the least. The correct approach would be to number
the rats from one to twenty and then use a random number table or a computer's random
number generator to identify the ten to be removed. Figure 9.6.1 shows the MINITAB
syntax for choosing a random sample of ten numbers from the integers one through twenty.
According to this particular run of the SAMPLE routine, the ten rats to be removed for
the testosterone injections are (in order) numbers 1, 5, 8, 9, 10, 14, 15, 18, 19, and 20.
There is a moral here. Designing, carrying out, and analyzing an experiment is
an exercise that draws on a variety of scientific, computational, and statistical skills,
some of which may be quite sophisticated. No matter how well those complex issues
are attended to, though, the experiment will fail if the simplest and most basic aspects of
the experiment, such as assigning subjects to treatments, are not carefully scrutinized
and properly done. The Devil, as the saying goes, is in the details.
MTB > set c1
DATA> 1:20
DATA> end
MTB > sample 10 c1 c2
MTB > print c2

Data Display
c2
   18    1   20   19    9   10    8   15   14    5

FIGURE 9.6.1
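The same random allocation can be carried out in any language with a random number generator. The following Python sketch mirrors the MINITAB session; the function name `assign_treatment` and the seed value are illustrative assumptions, and a different seed (or none) will naturally select a different set of ten rats.

```python
import random


def assign_treatment(n_subjects=20, n_treated=10, seed=None):
    """Randomly split subjects labeled 1..n_subjects into two groups.

    Returns (treated, control), each a sorted list of subject labels.
    """
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    labels = list(range(1, n_subjects + 1))
    treated = sorted(rng.sample(labels, n_treated))  # sampling without replacement
    control = [k for k in labels if k not in treated]
    return treated, control


treated, control = assign_treatment(seed=2006)
print("testosterone group:", treated)
print("control group:     ", control)
```

Recording the seed alongside the experimental protocol lets anyone reconstruct the allocation later, which is the scripted analogue of keeping the MINITAB output.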
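APPENDIX 9.A.1 A DERIVATION OF THE TWO-SAMPLE t TEST (A PROOF OF THEOREM 9.2.2)
To begin, we note that both the restricted and unrestricted parameter spaces, $\omega$ and
$\Omega$, are three dimensional:
\[ \omega = \{(\mu_X, \mu_Y, \sigma)\colon -\infty < \mu_X = \mu_Y < \infty,\; 0 < \sigma < \infty\} \]
\[ \Omega = \{(\mu_X, \mu_Y, \sigma)\colon -\infty < \mu_X < \infty,\; -\infty < \mu_Y < \infty,\; 0 < \sigma < \infty\} \]
Since the $X$s and $Y$s are independent (and normal),
\[ L(\omega) = \prod_{i=1}^{n} f_X(x_i) \prod_{j=1}^{m} f_Y(y_j) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{n+m} \exp\left\{ -\frac{1}{2\sigma^2} \left[ \sum_{i=1}^{n} (x_i - \mu)^2 + \sum_{j=1}^{m} (y_j - \mu)^2 \right] \right\} \qquad (9.A.1.1) \]
where $\mu = \mu_X = \mu_Y$. If we take $\ln L(\omega)$ and solve $\partial \ln L(\omega)/\partial \mu = 0$ and
$\partial \ln L(\omega)/\partial \sigma^2 = 0$ simultaneously, the solutions will be the restricted
maximum-likelihood estimates:
\[ \mu_e = \frac{\sum_{i=1}^{n} x_i + \sum_{j=1}^{m} y_j}{n + m} \qquad (9.A.1.2) \]
and
\[ \sigma_e^2 = \frac{\sum_{i=1}^{n} (x_i - \mu_e)^2 + \sum_{j=1}^{m} (y_j - \mu_e)^2}{n + m} \qquad (9.A.1.3) \]
Substituting Equations 9.A.1.2 and 9.A.1.3 into Equation 9.A.1.1 gives the numerator of
the generalized likelihood ratio:
\[ L(\omega_e) = \left( \frac{e^{-1}}{2\pi \sigma_e^2} \right)^{(n+m)/2} \]
Similarly, the likelihood function unrestricted by the null hypothesis is
\[ L(\Omega) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{n+m} \exp\left\{ -\frac{1}{2\sigma^2} \left[ \sum_{i=1}^{n} (x_i - \mu_X)^2 + \sum_{j=1}^{m} (y_j - \mu_Y)^2 \right] \right\} \qquad (9.A.1.4) \]
Here, solving
\[ \frac{\partial \ln L(\Omega)}{\partial \mu_X} = 0 \qquad \frac{\partial \ln L(\Omega)}{\partial \mu_Y} = 0 \qquad \frac{\partial \ln L(\Omega)}{\partial \sigma^2} = 0 \]
gives
\[ \hat{\mu}_X = \bar{x} \qquad \hat{\mu}_Y = \bar{y} \qquad \hat{\sigma}_{\Omega}^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{j=1}^{m} (y_j - \bar{y})^2}{n + m} \]
If these estimates are substituted into Equation 9.A.1.4, the maximum value for $L(\Omega)$
simplifies to
\[ L(\Omega_e) = \left( \frac{e^{-1}}{2\pi \hat{\sigma}_{\Omega}^2} \right)^{(n+m)/2} \]
It follows, then, that the generalized likelihood ratio, $\lambda$, is equal to
\[ \lambda = \frac{L(\omega_e)}{L(\Omega_e)} = \left( \frac{\hat{\sigma}_{\Omega}^2}{\sigma_e^2} \right)^{(n+m)/2} \]
or, equivalently,
\[ \lambda^{2/(n+m)} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{j=1}^{m} (y_j - \bar{y})^2}{\sum_{i=1}^{n} \left( x_i - \dfrac{n\bar{x} + m\bar{y}}{n + m} \right)^2 + \sum_{j=1}^{m} \left( y_j - \dfrac{n\bar{x} + m\bar{y}}{n + m} \right)^2} \]
Using the identity
\[ \sum_{i=1}^{n} \left( x_i - \frac{n\bar{x} + m\bar{y}}{n + m} \right)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + \frac{m^2 n}{(n + m)^2} (\bar{x} - \bar{y})^2 \]
(and the analogous identity for the $y_j$), we can write $\lambda^{2/(n+m)}$ as
\[ \lambda^{2/(n+m)} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{j=1}^{m} (y_j - \bar{y})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{j=1}^{m} (y_j - \bar{y})^2 + \dfrac{nm}{n + m} (\bar{x} - \bar{y})^2} \]
\[ = \frac{1}{1 + \dfrac{(\bar{x} - \bar{y})^2}{\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{j=1}^{m} (y_j - \bar{y})^2 \right] \left( \dfrac{1}{n} + \dfrac{1}{m} \right)}} \]
\[ = \frac{n + m - 2}{n + m - 2 + \dfrac{(\bar{x} - \bar{y})^2}{s_p^2 [(1/n) + (1/m)]}} \]
where $s_p^2$ is the pooled variance:
\[ s_p^2 = \frac{1}{n + m - 2} \left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{j=1}^{m} (y_j - \bar{y})^2 \right] \]
Therefore, in terms of the observed t ratio, $\lambda^{2/(n+m)}$ simplifies to
\[ \lambda^{2/(n+m)} = \frac{n + m - 2}{n + m - 2 + t^2} \qquad (9.A.1.5) \]
At this point the proof is almost complete. The generalized likelihood ratio criterion,
rejecting $H_0\colon \mu_X = \mu_Y$ when $0 < \lambda \le \lambda^*$, is clearly equivalent to rejecting the null
hypothesis when $0 < \lambda^{2/(n+m)} \le \lambda^{**}$. But both of these, from Equation 9.A.1.5, are the
same as rejecting $H_0$ when $t^2$ is too large. Thus the decision rule in terms of $t^2$ is
\[ \text{Reject } H_0\colon \mu_X = \mu_Y \text{ in favor of } H_1\colon \mu_X \ne \mu_Y \text{ if } t^2 \ge t^{*2} \]
Or, phrasing this in still another way, we should reject $H_0$ if either $t \ge t^*$ or
$t \le -t^*$, where
\[ P(-t^* < T < t^* \mid H_0\colon \mu_X = \mu_Y \text{ is true}) = 1 - \alpha \]
By Theorem 9.2.1, though, $T$ has a Student t distribution with $n + m - 2$ df, which makes
$\pm t^* = \pm t_{\alpha/2,\,n+m-2}$, and the theorem is proved.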
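Equation 9.A.1.5 can be spot-checked numerically. The sketch below, which is only a sanity check and not part of the proof, computes both sides of the equation for the two samples that appear in the MINITAB listing of Appendix 9.A.2; the variable names are illustrative.

```python
from math import sqrt

# The two samples used in the MINITAB listing of Appendix 9.A.2
x = [0.225, 0.262, 0.217, 0.240, 0.230, 0.229, 0.235, 0.217]
y = [0.209, 0.205, 0.196, 0.210, 0.202, 0.207, 0.224, 0.223, 0.220, 0.201]
n, m = len(x), len(y)

xbar, ybar = sum(x) / n, sum(y) / m
mu_e = (sum(x) + sum(y)) / (n + m)  # restricted MLE of the common mean

# Sums of squares about the separate means (unrestricted) and about mu_e (restricted)
ss_unrestricted = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)
ss_restricted = sum((v - mu_e) ** 2 for v in x + y)

# Left side of Equation 9.A.1.5: lambda^(2/(n+m)) as a ratio of variance estimates
lhs = ss_unrestricted / ss_restricted

# Right side: the same quantity written in terms of the observed t ratio
s_p2 = ss_unrestricted / (n + m - 2)  # pooled variance
t = (xbar - ybar) / sqrt(s_p2 * (1 / n + 1 / m))
rhs = (n + m - 2) / (n + m - 2 + t ** 2)

print(f"t = {t:.2f}")
print(f"lambda^(2/(n+m)): {lhs:.6f} (variance ratio) vs {rhs:.6f} (t form)")
```

The two printed values agree to rounding error, illustrating why a small likelihood ratio and a large value of $t^2$ are the same rejection criterion.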
APPENDIX 9.A.2 MINITAB APPLICATIONS
MINITAB has a simple command, TWOSAMPLE, for doing a two-sample t
test on a set of $x_i$s and $y_i$s stored in columns C1 and C2, respectively. The same command
automatically constructs a 95% confidence interval for $\mu_X - \mu_Y$.
MTB > set c1
DATA> 0.225 0.262 0.217 0.240 0.230 0.229 0.235 0.217
DATA> end
MTB > set c2
DATA> 0.209 0.205 0.196 0.210 0.202 0.207 0.224 0.223
DATA> 0.220 0.201
DATA> end
MTB > name c1 'X' c2 'Y'
MTB > twosample c1 c2;
SUBC > pooled.

Two-Sample T-Test and CI: X, Y
Two-sample T for X vs Y
      N     Mean    StDev   SE Mean
X     8   0.2319   0.0146    0.0051
Y    10  0.20970  0.00966    0.0031

Difference = mu (X) - mu (Y)
Estimate for difference: 0.022175
95% CI for difference: (0.010053, 0.034297)
T-Test of difference = 0 (vs not =): T-Value = 3.88 P-Value = 0.001 DF = 16
Both use Pooled StDev = 0.0121

FIGURE 9.A.2.1
Figure 9.A.2.1 shows the syntax for analyzing the Quintus Curtius Snodgrass data in
Table 9.2.1. Notice that a subcommand is included. If we write

MTB > twosample c1 c2

MINITAB will assume the two population variances are not equal, and it will perform the
approximate t test described in Question 9.2.15. If the intention is to assume that
$\sigma_X^2 = \sigma_Y^2$ (and do the t test as described in Theorem 9.2.1), the proper syntax is

MTB > twosample c1 c2;
SUBC > pooled.

As is typical, MINITAB associates the test statistic with a P-value rather than an
"Accept $H_0$" or "Reject $H_0$" conclusion. Here, P = 0.001, which is consistent with the
decision reached in Case Study 9.2.1 to "reject $H_0$ at the $\alpha = 0.01$ level of significance."
Figure 9.A.2.2 shows the "unpooled" analysis of these same data. The conclusion is the
same, although the P-value has almost tripled, because both the test statistic and
its degrees of freedom have decreased (recall Question 9.2.16).
MTB > set c1
DATA> 0.225 0.262 0.217 0.240 0.230 0.229 0.235 0.217
DATA> end
MTB > set c2
DATA> 0.209 0.205 0.196 0.210 0.202 0.207 0.224 0.223 0.220 0.201
DATA> end
MTB > name c1 'X' c2 'Y'
MTB > twosample c1 c2

Two-Sample T-Test and CI: X, Y
Two-sample T for X vs Y
      N     Mean    StDev   SE Mean
X     8   0.2319   0.0146    0.0051
Y    10  0.20970  0.00966    0.0031

Difference = mu (X) - mu (Y)
Estimate for difference: 0.022175
95% CI for difference: (0.008997, 0.035353)
T-Test of difference = 0 (vs not =): T-Value = 3.70 P-Value = 0.003 DF = 11

FIGURE 9.A.2.2

Testing $H_0\colon \mu_X = \mu_Y$ Using MINITAB Windows
1. Enter the two samples in C1 and C2, respectively.
2. Click on STAT, then on BASIC STATISTICS, then on 2-SAMPLE t.
3. Click on SAMPLES IN DIFFERENT COLUMNS, and type C1 in FIRST box
and C2 in SECOND box.
4. Click on ASSUME EQUAL VARIANCES (if a pooled t test is desired).
5. Click on OPTIONS.
6. Enter value for 100(1 - $\alpha$) in CONFIDENCE LEVEL box.
7. Click on NOT EQUAL; then click on whichever $H_1$ is desired.
8. Click on OK; click on remaining OK.
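For readers without access to MINITAB, the two test statistics in Figures 9.A.2.1 and 9.A.2.2 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are illustrative, the unpooled degrees of freedom use the standard Welch-Satterthwaite approximation, and P-values are omitted because the standard library has no Student t distribution.

```python
from math import sqrt
from statistics import mean, variance

x = [0.225, 0.262, 0.217, 0.240, 0.230, 0.229, 0.235, 0.217]
y = [0.209, 0.205, 0.196, 0.210, 0.202, 0.207, 0.224, 0.223, 0.220, 0.201]


def pooled_t(x, y):
    """Two-sample t statistic and df, assuming equal population variances."""
    n, m = len(x), len(y)
    s_p2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    t = (mean(x) - mean(y)) / sqrt(s_p2 * (1 / n + 1 / m))
    return t, n + m - 2


def unpooled_t(x, y):
    """Approximate t statistic with Welch-Satterthwaite df (variances not assumed equal)."""
    n, m = len(x), len(y)
    vx, vy = variance(x) / n, variance(y) / m
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return t, df


t_pooled, df_pooled = pooled_t(x, y)
t_unpooled, df_unpooled = unpooled_t(x, y)
print(f"pooled:   t = {t_pooled:.2f}, df = {df_pooled}")
# Truncating the approximate df to an integer reproduces the DF shown in Figure 9.A.2.2
print(f"unpooled: t = {t_unpooled:.2f}, df = {df_unpooled:.2f} (truncated: {int(df_unpooled)})")
```

Both the statistic and the degrees of freedom drop in the unpooled analysis, which is why its P-value is larger.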
CHAPTER 10
Goodness-of-Fit Tests
10.1 INTRODUCTION
10.2 THE MULTINOMIAL DISTRIBUTION
10.3 GOODNESS-OF-FIT TESTS: ALL PARAMETERS KNOWN
10.4 GOODNESS-OF-FIT TESTS: PARAMETERS UNKNOWN
10.5 CONTINGENCY TABLES
10.6 TAKING A SECOND LOOK AT STATISTICS (OUTLIERS)
APPENDIX 10.A.1 MINITAB APPLICATIONS
Karl Pearson
Called by some the founder of twentieth-century statistics, Pearson received his
university education at Cambridge, concentrating on physics,
philosophy, and law. He was called to the bar in 1881 but never practiced.
In 1911 Pearson resigned his chair of applied mathematics and mechanics
at University College, London, and became the first Galton Professor of
Eugenics, as was Galton's wish. Together with Weldon, Pearson founded
the prestigious journal Biometrika and served as its principal editor from
1901 until his death.
-Karl Pearson (1857-1936)
10.1 INTRODUCTION
The give and take between the mathematics of probability and the empiricism of statistics
should be, by now, a theme comfortably familiar. Time and again we have seen
repeated measurements, no matter what their source, exhibiting a regularity of pattern
that can be well approximated by one or more of the handful of probability functions
introduced in Chapter 4. Until now, all the inferences resulting from this interfacing have
been parameter specific, a fact to which the many hypothesis tests about means, variances,
and binomial proportions paraded forth in Chapters 6, 7, and 9 bear ample testimony.
Still, there are other situations where the basic form of $p_X(k)$ or $f_Y(y)$, rather than the value
of its parameters, is the most important question at issue. These situations are the focus
of Chapter 10.
A geneticist, for example, might want to know whether the inheritance of a certain
set of traits follows the same set of ratios as those prescribed by Mendelian theory.
The objective of a psychologist, on the other hand, might be to confirm or refute a
newly proposed model for cognitive serial learning. Probably the most habitual users of
inference procedures directed at the entire pdf, though, are statisticians themselves: As
a prelude to doing any sort of hypothesis test or confidence interval, an attempt should
be made, sample size permitting, to verify that the data are, indeed, representative of
whatever distribution that procedure presumes. Usually, this will mean testing to see
whether a set of $y_i$s might conceivably be representing a normal distribution.
In general, any procedure that seeks to determine whether a set of data could reasonably
have originated from some given probability distribution, or class of probability
distributions, is called a goodness-of-fit test. The principle behind the particular goodness-of-fit
test we will look at is very straightforward: First the observed data are grouped, more
or less arbitrarily, into k classes; then each class's expected occupancy is calculated
on the basis of the presumed model. If it should happen that the set of observed and
expected frequencies show considerably more disagreement than sampling variability
would predict, our conclusion will be that the supposed $p_X(k)$ or $f_Y(y)$ was incorrect.
In practice, goodness-of-fit tests have several variants, depending on the specificity of
the null hypothesis. Section 10.3 describes the approach to take when both the form of
the presumed data model and the values of its parameters are known. More typically, we
know the form of $p_X(k)$ or $f_Y(y)$, but their parameters need to be estimated; these are
taken up in Section 10.4.
A somewhat different application of goodness-of-fit testing is the focus of Section 10.5.
There the null hypothesis is that two random variables are independent. In more than a
few fields of endeavor, tests for independence are among the most frequently used of all
inference procedures.
THE MULTINOMIAL DISTRIBUTION
Their diversity notwithstanding, most goodness-of-fit tests are based on essentially the
same statistic, one that has an asymptotic chi square distribution. The underlying structure
of that statistic, though, derives from the multinomial distribution, a direct extension of
the familiar binomial. In this section we define the multinomial and state those of its
properties that relate to the problem of goodness-of-fit testing.
Given a series of n independent Bernoulli trials, each with success probability p, we
know that the pdf for X, the total number of successes, is
$$P(X = k) = p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n \tag{10.2.1}$$
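As a quick numerical companion to Equation 10.2.1, the following sketch evaluates the binomial pdf directly from the formula. The function name `binomial_pmf` and the values n = 4 and p = 0.5 are illustrative choices, not taken from the text.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Evaluate Equation 10.2.1: P(X = k) for n Bernoulli trials with success probability p."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative values (assumed, not from the text): n = 4 trials, p = 0.5.
probs = [binomial_pmf(k, 4, 0.5) for k in range(5)]
print(probs)
print(sum(probs))  # the probabilities over k = 0, ..., n should total 1
```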
One of the obvious ways to generalize Equation 10.2.1 is to consider situations in which
at each trial one of t outcomes can occur, rather than just one of two. That is, we will
assume that each trial will result in one of the outcomes $r_1, r_2, \ldots, r_t$, where $P(r_i) = p_i$,
$i = 1, 2, \ldots, t$ (see Figure 10.2.1). It follows, of course, that
$$\sum_{i=1}^{t} p_i = 1$$
[Figure: n independent trials, numbered 1, 2, ..., n, each with possible outcomes $r_1, r_2, \ldots, r_t$, where $p_i = P(r_i)$, $i = 1, 2, \ldots, t$.]
FIGURE 10.2.1
In the binomial model, the two possible outcomes are denoted s and f, where $P(s) = p$
and $P(f) = 1 - p$. Moreover, the outcomes of the n trials can be nicely summarized
with a single random variable X, where X denotes the number of successes. In the more
general multinomial model, we will need a random variable to count the number of times
that each of the $r_i$'s occurs. To that end, we define
$$X_i = \text{number of times } r_i \text{ occurs}, \quad i = 1, 2, \ldots, t$$
For a given set of n trials, then, $X_1 = k_1, X_2 = k_2, \ldots, X_t = k_t$.
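Theorem 10.2.1. Let $X_i$ denote the number of times that the outcome $r_i$ occurs, $i = 1, 2, \ldots, t$, in a series of n independent trials, where $p_i = P(r_i)$. Then the vector
$(X_1, X_2, \ldots, X_t)$ has a multinomial distribution and
$$P(X_1 = k_1, X_2 = k_2, \ldots, X_t = k_t) = \frac{n!}{k_1!\, k_2! \cdots k_t!}\, p_1^{k_1} p_2^{k_2} \cdots p_t^{k_t},$$
$$k_i = 0, 1, \ldots, n; \quad i = 1, 2, \ldots, t; \quad \sum_{i=1}^{t} k_i = n$$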
Proof. Any particular sequence of $k_1$ $r_1$'s, $k_2$ $r_2$'s, ..., and $k_t$ $r_t$'s has probability
$p_1^{k_1} p_2^{k_2} \cdots p_t^{k_t}$. Moreover, the total number of outcome sequences that will generate the values
$(k_1, k_2, \ldots, k_t)$ is the number of ways to permute n objects, $k_1$ of one type, $k_2$ of a second
type, ..., and $k_t$ of a t-th type. By Theorem 2.6.2 that number is $n!/(k_1!\, k_2! \cdots k_t!)$, and
the statement of the theorem follows.
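The counting argument in the proof can be checked by brute force for a small case. The sketch below, a minimal illustration rather than anything from the text, enumerates every outcome sequence, counts those producing a given vector of counts, and compares the total with $n!/(k_1!\, k_2! \cdots k_t!)$. The function names and the count vector (2, 1, 1) with probabilities (0.5, 0.25, 0.25) are hypothetical choices.

```python
from itertools import product
from math import factorial, prod

def multinomial_coefficient(counts):
    """n! / (k_1! k_2! ... k_t!), where n is the sum of the counts."""
    return factorial(sum(counts)) // prod(factorial(k) for k in counts)

def multinomial_pmf(counts, probs):
    """Probability of the count vector, in the form given by Theorem 10.2.1."""
    return multinomial_coefficient(counts) * prod(p**k for p, k in zip(probs, counts))

def count_sequences(counts):
    """Brute force: number of length-n sequences with counts[i] occurrences of outcome i."""
    t, n = len(counts), sum(counts)
    return sum(
        1
        for seq in product(range(t), repeat=n)
        if all(seq.count(i) == counts[i] for i in range(t))
    )

counts = (2, 1, 1)  # hypothetical example: t = 3 outcomes, n = 4 trials
brute = count_sequences(counts)
formula = multinomial_coefficient(counts)
print(brute, formula)
print(multinomial_pmf(counts, (0.5, 0.25, 0.25)))
```

Enumeration grows as $t^n$, so this check is only practical for very small n; the closed form is what one would use in practice.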
Comment. Depending on the context, the $r_i$'s associated with the n trials in Figure 10.2.1 can
be either numerical values (or categories) or ranges of numerical
values (or categories). Example 10.2.1 illustrates the first type; Example 10.2.2, the second.
The only requirements imposed on the $r_i$'s are (1) they must span all of the outcomes
possible at a given trial and (2) they must be mutually exclusive.
EXAMPLE 10.2.1
Suppose a loaded die is tossed twelve times, where
$$p_i = P(\text{face } i \text{ appears}) = ci, \quad i = 1, 2, \ldots, 6$$
What is the probability that each face will appear exactly twice?
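Note, first, that
$$\sum_{i=1}^{6} p_i = 1 = \sum_{i=1}^{6} ci = c \cdot \frac{6(6 + 1)}{2}$$
which implies that $c = 1/21$ (and $p_i = i/21$). In the terminology of Theorem 10.2.1, then,
the possible outcomes at each trial are the t = 6 faces, 1 ($= r_1$) through 6 ($= r_6$), and $X_i$ is
the number of times face i occurs, $i = 1, 2, \ldots, 6$.
The question is asking for the probability of the vector
$$(X_1, X_2, X_3, X_4, X_5, X_6) = (2, 2, 2, 2, 2, 2)$$
According to Theorem 10.2.1,
$$P(X_1 = 2, X_2 = 2, \ldots, X_6 = 2) = \frac{12!}{2!\, 2! \cdots 2!} \left(\frac{1}{21}\right)^2 \left(\frac{2}{21}\right)^2 \cdots \left(\frac{6}{21}\right)^2 = 0.0005$$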
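The arithmetic in Example 10.2.1 is easy to reproduce. The sketch below uses exact rational arithmetic so that the normalizing constant c and the final probability can be inspected without rounding error; the variable names are illustrative.

```python
from fractions import Fraction
from math import factorial, prod

faces = range(1, 7)
c = Fraction(1, sum(faces))   # c satisfies c * (1 + 2 + ... + 6) = 1
p = [c * i for i in faces]    # p_i = i / 21
counts = [2] * 6              # each face appears exactly twice in 12 tosses

# Multinomial coefficient 12! / (2! 2! 2! 2! 2! 2!)
coef = factorial(sum(counts)) // prod(factorial(k) for k in counts)

# Theorem 10.2.1: coefficient times the product of p_i ** k_i
prob = coef * prod(pi**k for pi, k in zip(p, counts))

print(c)
print(float(prob))  # rounds to 0.0005, the value quoted in the example
```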
EXAMPLE 10.2.2
Five observations are drawn at random from the pdf
$$f_Y(y) = 6y(1 - y), \quad 0 \le y \le 1$$
What is the probability that one of the observations lies in the interval [0, 0.25), none in the
interval [0.25, 0.50), three in the interval [0.50, 0.75), and one in the interval [0.75, 1.00]?
[Figure: graph of $f_Y(y) = 6y(1 - y)$ on $0 \le y \le 1$, with the areas $p_1, p_2, p_3, p_4$ lying over the ranges $r_1, r_2, r_3, r_4$ marked off at y = 0.25, 0.50, 0.75, and 1.00.]
FIGURE 10.2.2
Figure 10.2.2 shows the pdf being sampled, together with the ranges $r_1$, $r_2$, $r_3$, and $r_4$,
and the intended disposition of the five data points. The $p_i$'s of Theorem 10.2.1 are now
areas. Integrating $f_Y(y)$ from 0 to 0.25, for example, gives $p_1$:
$$p_1 = \int_0^{0.25} 6y(1 - y)\, dy = 3y^2 \Big|_0^{0.25} - 2y^3 \Big|_0^{0.25} = \frac{5}{32}$$
By symmetry, $p_4 = 5/32$. Moreover, since the area under $f_Y(y)$ equals 1,
$$p_2 = p_3 = \frac{1}{2}\left(1 - \frac{10}{32}\right) = \frac{11}{32}$$
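The four areas can be confirmed with the antiderivative $3y^2 - 2y^3$ of $6y(1 - y)$. The following sketch evaluates it at the interval endpoints using exact fractions; the helper name `F` is an illustrative choice.

```python
from fractions import Fraction

def F(y):
    """Antiderivative of 6y(1 - y): 3y^2 - 2y^3, with F(0) = 0 and F(1) = 1."""
    return 3 * y**2 - 2 * y**3

# Endpoints of the four ranges r_1, ..., r_4 from Example 10.2.2.
cuts = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]

# p_i is the area under the pdf over the i-th range.
p = [F(b) - F(a) for a, b in zip(cuts, cuts[1:])]

print(p)
print(sum(p))  # the four areas must total 1
```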
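Let $X_i$ denote the number of observations that fall into the ith range, $i = 1, 2, 3, 4$. The
probability associated with the multinomial vector (1, 0, 3, 1) is 0.0198:
$$P(X_1 = 1, X_2 = 0, X_3 = 3, X_4 = 1) = \frac{5!}{1!\, 0!\, 3!\, 1!} \left(\frac{5}{32}\right)^1 \left(\frac{11}{32}\right)^0 \left(\frac{11}{32}\right)^3 \left(\frac{5}{32}\right)^1 = 0.0198$$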
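The value 0.0198 can be reproduced by evaluating the multinomial formula of Theorem 10.2.1 with the class probabilities 5/32, 11/32, 11/32, and 5/32. The sketch below is one way to do that; the variable names are illustrative.

```python
from fractions import Fraction
from math import factorial, prod

# Class probabilities for the four ranges in Example 10.2.2.
p = [Fraction(5, 32), Fraction(11, 32), Fraction(11, 32), Fraction(5, 32)]
counts = [1, 0, 3, 1]  # one, none, three, and one observation per range

# Multinomial coefficient 5! / (1! 0! 3! 1!)
coef = factorial(sum(counts)) // prod(factorial(k) for k in counts)

prob = coef * prod(pi**k for pi, k in zip(p, counts))

print(coef)
print(float(prob))  # rounds to 0.0198
```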
A Multinomial/Binomial Relationship
Since the multinomial pdf is conceptually a straightforward generalization of the binomial
pdf, it should come as no surprise that each $X_i$ in a multinomial vector is, itself, a binomial
random variable.
Theorem 10.2.2. Suppose the vector $(X_1, X_2, \ldots, X_t)$ is a multinomial random variable
with parameters $n, p_1, p_2, \ldots,$ and $p_t$. Then the marginal distribution of $X_i$, $i = 1, 2, \ldots, t$,
is the binomial pdf with parameters n and $p_i$.
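Theorem 10.2.2 can be checked numerically for a small case by summing the joint multinomial pdf over all values of the other components. The sketch below does this for t = 3 and compares the result with the binomial pdf of Equation 10.2.1. The parameters n = 5 and $(p_1, p_2, p_3) = (1/2, 1/3, 1/6)$ are assumptions chosen for illustration, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial, prod

def multinomial_pmf(counts, probs):
    """Joint pdf of a multinomial vector, as given by Theorem 10.2.1."""
    coef = factorial(sum(counts)) // prod(factorial(k) for k in counts)
    return coef * prod(p**k for p, k in zip(probs, counts))

def marginal_pmf_x1(k, n, probs):
    """P(X_1 = k) for t = 3: sum the joint pdf over all (k2, k3) with k + k2 + k3 = n."""
    total = Fraction(0)
    for k2 in range(n - k + 1):
        k3 = n - k - k2
        total += multinomial_pmf((k, k2, k3), probs)
    return total

# Illustrative parameters (assumed, not from the text).
n = 5
probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))

marginal = [marginal_pmf_x1(k, n, probs) for k in range(n + 1)]
binomial = [comb(n, k) * probs[0]**k * (1 - probs[0])**(n - k) for k in range(n + 1)]

print(marginal == binomial)  # exact comparison, since Fractions are used throughout
```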
Proof. To deduce the pdf for $X_i$ we need to dichotomize the possible outcomes
at each trial into "$r_i$" and "not $r_i$."
$X_i$ becomes,