An Introduction to Mathematical Statistics and Its Applications
Fourth Edition

Richard J. Larsen, Vanderbilt University
Morris L. Marx, University of West Florida

Pearson Prentice Hall, Upper Saddle River, New Jersey 07458

Library of Congress Cataloging-in-Publication Data
Larsen, Richard J.
An introduction to mathematical statistics and its applications / Richard J. Larsen, Morris L. Marx. 4th ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-13-186793-8
1. Mathematical statistics. I. Marx, Morris L. II. Title.
CIP data available

Editor-in-Chief/Acquisitions Editor: Sally Yagan
Formatter: Interactive Composition Corporation
Assistant Managing Editor: Bayani Mendoza de Leon
Senior Managing Editor: Linda Mihalov Behrens
Executive Managing Editor: Kathleen Schiaparelli
Manufacturing Manager: Alexis Heydt-Long
Manufacturing Buyer: Maura Zaldivar
Marketing Manager: Halee Dinsey
Marketing Assistant: Joon Won Moon
Director of Creative Services: Paul Belfanti
Art Director: Jayne Conte
Cover Designer: Bruce Kenselaar
Editorial Assistant: Jennifer Urban
Cover Image: Getty Images, Inc.

© 2006, 2001, 1986, 1981 Pearson Education, Inc.
Pearson Prentice Hall, Pearson Education, Inc., Upper Saddle River, NJ 07458

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Pearson Prentice Hall™ is a trademark of Pearson Education, Inc.

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
ISBN 0-13-186793-8

Pearson Education Ltd., London
Pearson Education Australia PTY, Limited, Sydney
Pearson Education Singapore, Pte. Ltd.
Pearson Education North Asia Ltd., Hong Kong
Pearson Education Canada, Ltd., Toronto
Pearson Educación de México, S.A. de C.V.
Pearson Education Japan, Tokyo
Pearson Education Malaysia, Pte. Ltd.
Table of Contents

Preface

1 Introduction
1.1 A Brief History
1.2 Some Examples
1.3 A Chapter Summary

2 Probability
2.1 Introduction
2.2 Sample Spaces and the Algebra of Sets
2.3 The Probability Function
2.4 Conditional Probability
2.5 Independence
2.6 Combinatorics
2.7 Combinatorial Probability
2.8 Taking a Second Look at Statistics (Enumeration and Monte Carlo Techniques)

3 Random Variables
3.1 Introduction
3.2 Binomial and Hypergeometric Probabilities
3.3 Discrete Random Variables
3.4 Continuous Random Variables
3.5 Expected Values
3.6 The Variance
3.7 Joint Densities
3.8 Combining Random Variables
3.9 Further Properties of the Mean and Variance
3.10 Order Statistics
3.11 Conditional Densities
3.12 Moment-Generating Functions
3.13 Taking a Second Look at Statistics (Interpreting Means)
Appendix 3.A.1 MINITAB Applications

4 Special Distributions
4.1 Introduction
4.2 The Poisson Distribution
4.3 The Normal Distribution
4.4 The Geometric Distribution
4.5 The Negative Binomial Distribution
4.6 The Gamma Distribution
4.7 Taking a Second Look at Statistics (Monte Carlo Simulations)
Appendix 4.A.1 MINITAB Applications
Appendix 4.A.2 A Proof of the Central Limit Theorem

5 Estimation
5.1 Introduction
5.2 Estimating Parameters: The Method of Maximum Likelihood and the Method of Moments
5.3 Interval Estimation
5.4 Properties of Estimators
5.5 Minimum-Variance Estimators: The Cramér-Rao Lower Bound
5.6 Sufficient Estimators
5.7 Consistency
5.8 Bayesian Estimation
5.9 Taking a Second Look at Statistics (Revisiting the Margin of Error)
Appendix 5.A.1 MINITAB Applications

6 Hypothesis Testing
6.1 Introduction
6.2 The Decision Rule
6.3 Testing Binomial Data: H0: p = p0
6.4 Type I and Type II Errors
6.5 A Notion of Optimality: The Generalized Likelihood Ratio
6.6 Taking a Second Look at Statistics (Statistical Significance versus "Practical" Significance)

7 The Normal Distribution
7.1 Introduction
7.2 Comparing (Ȳ − μ)/(σ/√n) and (Ȳ − μ)/(S/√n)
7.3 Deriving the Distribution of (Ȳ − μ)/(S/√n)
7.4 Drawing Inferences About μ
7.5 Drawing Inferences About σ²
7.6 Taking a Second Look at Statistics ("Bad" Estimators)
Appendix 7.A.1 MINITAB Applications
Appendix 7.A.2 Some Distribution Results for Ȳ and S²
Appendix 7.A.3 A Proof of Theorem 7.5.2
Appendix 7.A.4 A Proof that the One-Sample t Test Is a GLRT

8 Types of Data: A Brief Overview
8.1 Introduction
8.2 Classifying Data
8.3 Taking a Second Look at Statistics (Samples Are Not "Valid")

9 Two-Sample Problems
9.1 Introduction
9.2 Testing H0: μX = μY (The Two-Sample t Test)
9.3 Testing H0: σX² = σY² (The F Test)
9.4 Binomial Data: Testing H0: pX = pY
9.5 Confidence Intervals for the Two-Sample Problem
9.6 Taking a Second Look at Statistics (Choosing Samples)
Appendix 9.A.1 A Derivation of the Two-Sample t Test (A Proof of Theorem 9.2.2)
Appendix 9.A.2 MINITAB Applications

10 Goodness-of-Fit Tests
10.1 Introduction
10.2 The Multinomial Distribution
10.3 Goodness-of-Fit Tests: All Parameters Known
10.4 Goodness-of-Fit Tests: Parameters Unknown
10.5 Contingency Tables
10.6 Taking a Second Look at Statistics (Outliers)
Appendix 10.A.1 MINITAB Applications

11 Regression
11.1 Introduction
11.2 The Method of Least Squares
11.3 The Linear Model
11.4 Covariance and Correlation
11.5 The Bivariate Normal Distribution
11.6 Taking a Second Look at Statistics (How Not to Interpret the Sample Correlation Coefficient)
Appendix 11.A.1 MINITAB Applications
Appendix 11.A.2 A Proof of Theorem 11.3.3

12 The Analysis of Variance
12.1 Introduction
12.2 The F Test
12.3 Multiple Comparisons: Tukey's Method
12.4 Testing Subhypotheses with Contrasts
12.5 Data Transformations
12.6 Taking a Second Look at Statistics (Putting the Subject of Statistics Together: The Contributions of Ronald A. Fisher)
Appendix 12.A.1 MINITAB Applications
Appendix 12.A.2 A Proof of Theorem 12.2.2
Appendix 12.A.3 The Distribution of [SSTR/(k − 1)]/[SSE/(n − k)] When H1 Is True

13 Randomized Block Designs
13.1 Introduction
13.2 The F Test for a Randomized Block Design
13.3 The Paired t Test
"" 13.4 Taking a Second Look at Statistics (Choosing Between a Two--Sample t Test and a Paired, . . . . . . . . . . . . . .. . . . . ,. . ... Appendix HAl MINITAB Applications . . . . . . . ,. . . , . , . . . . . . 772 773 774 788 14 Nonpanmetric Statistic::s 14.1 InLIoouctioo . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 14.2 The Sign Test . . . . . . . . . . . . . . . . . . . . . . . . . , . . . . . . . Wilcoxon Tests. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . 14.4 The Kruskal-Wallis Test . . . . . . . . . . . . . . . . . . . . . . . . . . .. 14.5 The Frierlman Test . . . . . .. . . . . . . . . . . . . . . . . , . . . . . . 14.6 Testing for Randomness . . . . . . . . . . . . . .. . . . . , . . . . , . . . a Second Look at Statistics (Comparing Parametric 14.7 Nonparametric Procedures) . . . . . . . . . . . . . . . . . . . . . . . . . Appendix 14.Al MINITAB Applications . . . . . . . . . . . . . . . . . . . . . . 802 R03 804 810 826 832 835 '196 800 841 846 Appendix: Statistical Tables Answers to Sdected Odd-Numbered Questions 876 Bibliography 907 Index Preface our text has been sufficiently well received to justify this fourth who use the text like the coupling of the rigorous and structured treatment of probability and statistics with real-world case studies and users of the book have been helpful in pointing out ways to improve our cmlIl,l:~es found in this fourth edition reflect the many helpful suggestions we as weB as our owo experience in teaching from the text Our first goal in writing this fourth edition was to continue strengthening the bridge ",ptrwl"p" theory and practice. To that end, we have added sections at the end each Taking a Second Look at Statistics. These sections discuss practical problems in applying the ideas in the chapter and also deal with common misunderstandings or faulty approaches. 
We also have included a new section on Bayesian estimation that fits well into Chapter 5 on estimation and gives another view of how estimation can be applied. It introduces students to Bayesian ideas and also serves to reinforce the main concepts of estimation.

Some ideas that are useful and important lie beyond the mathematical scope of the text. To treat such topics within the mathematical context of the book, we have expanded the material on simulation and on the use of Monte Carlo studies. Because MINITAB is the main tool for simulations and for demonstrating computer computations, the MINITAB sections have been rewritten to conform to Version 14, the latest release.

A barrier to some users of the book has been the length of time required to cover Chapters 2 and 3. One of the major changes in the fourth edition is a substantial revision of the basic probability material. Chapters 2 and 3 have been reorganized and rewritten with the goal of a streamlined presentation. These chapters are now easier to teach and can be covered in less time, yet without loss of rigor.

In that same spirit, we have also improved and streamlined the development of the t and F distributions in Chapter 7, the heart of the book. The material builds on the development of the chi square distribution. In addition, we have made a much better division between the theoretical results and their applications. Because of the efficiencies in the new edition, covering Chapters 1-7 plus additional topics in one semester is now possible.

All in all, we feel that this new edition furthers our objective of a book that emphasizes the interrelation between probability theory, mathematical statistics, and data analysis. As in previous editions, real-world case studies and historical anecdotes provide valuable tools to effect the integration of these three areas. Experience in the classroom has strengthened our belief in this approach: Students appreciate the importance of each area when seen in the context of the other two.

SUPPLEMENTS

Instructor's Solutions Manual.
This resource contains worked-out solutions to all text exercises.

Student Solutions Manual. Featuring complete solutions to selected exercises, this is a helpful tool for students as they study and work the problem material.

ACKNOWLEDGMENTS

We would like to thank the following reviewers for their detailed and valuable criticisms and suggestions:

Ditlev Monrad, University of Illinois at Urbana-Champaign
Vidhu S. Prasad, University of Massachusetts, Lowell
Xu, California State University, Long Beach
Katherine St. Clair, Colby College
Yimin, Michigan State University
Nicolas, University of California, Los Angeles
University of Oregon
Ohio University
University of California at San Diego

Finally, we express our thanks to Prentice Hall's math editor-in-chief and acquisitions editor Sally Yagan, managing editor Bayani Mendoza de Leon, and editorial assistant Jennifer Urban, as well as to project manager Jennifer Crotteau of Interactive Composition Corporation, for excellent teamwork in the production of the book.

Richard J. Larsen
Nashville, Tennessee

Morris L. Marx
Pensacola, Florida

CHAPTER 1

Introduction

1.1 A BRIEF HISTORY
1.2 SOME EXAMPLES
1.3 A CHAPTER SUMMARY

"Some people hate the very name of statistics, but I find them full of beauty and interest. Whenever they are not brutalized, but delicately handled by the higher methods, and are warily interpreted, their power of dealing with complicated phenomena is extraordinary. They are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of man."
- Francis Galton

1.1 A BRIEF HISTORY

Statistics is the science of sampling: How one set of measurements differs from another, and what the implications of those differences might be, are among its principal concerns. Conceptually, the subject is rooted in the mathematics of probability, but its applications are everywhere. Statisticians are as likely to be found in a research lab or a field station as they are in a government agency, an advertising firm, or a classroom.
Properly applied, statistical techniques can be enormously effective in clarifying and quantifying natural phenomena. Figure 1.1.1 illustrates a case in point. Pictured at the top is a facsimile of the kind of data routinely recorded by a seismograph: listed chronologically are the occurrence times and Richter magnitudes for a series of earthquakes. Viewed in that format, the numbers are largely meaningless. No patterns are evident, nor is there any obvious connection between the frequencies of tremors and their severities.

By way of contrast, the bottom of Figure 1.1.1 shows a statistical summary (using some of the techniques we will learn later) of a set of seismograph data recorded in southern California (66).

[Figure 1.1.1: Top, a chronological seismograph log of earthquake occurrence times and Richter magnitudes; bottom, a plot of the annual frequency N against magnitude on the Richter scale, R, together with the fitted curve N = 80,338.16e^(-1.981R).]

Plotted above the Richter (R) value of 4.0, for example, is the average number (N) of earthquakes occurring per year in that region having magnitudes in a narrow range centered at 4.0. Similar points are included for R-values centered at 4.5, 5.0, 6.0, 6.5, and 7.0. Now we can see that the two variables are related: Describing the (N, R)'s exceptionally well is the equation

N = 80,338.16e^(-1.981R)

In general, statistical techniques are employed to (1) describe what did happen or (2) predict what might happen. The graph at the bottom of Figure 1.1.1 does both. By "fitting" the equation N = β0 e^(β1 R) to the observed set of minor tremors (and finding β0 = 80,338.16 and β1 = -1.981), we can then use that same equation to predict the frequencies of events not represented in the data set. If R = 8.0, for example, we would expect N to equal 0.01:

N = 80,338.16e^(-1.981(8.0)) = 0.01

which implies that Californians can expect catastrophic earthquakes registering on the order of 8.0 on the Richter scale to occur, on the average, once every 100 years.

It is unarguably true that the interplay between data and mathematical models, similar to what we see in Figure 1.1.1, is the most important theme in statistics. Additional examples highlighting that connection will be discussed in Section 1.2.
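The prediction for R = 8.0 can be checked numerically. Here is a minimal sketch (the function name is my own; the coefficients are the fitted values quoted above):

```python
import math

def annual_frequency(R, b0=80338.16, b1=-1.981):
    """Fitted exponential model N = b0 * e^(b1 * R) from Figure 1.1.1."""
    return b0 * math.exp(b1 * R)

N = annual_frequency(8.0)
print(round(N, 2))   # 0.01 earthquakes per year
print(round(1 / N))  # i.e., roughly one magnitude-8.0 quake per century
```

An exponential decay of frequency with magnitude is the same empirical pattern captured by the well-known Gutenberg-Richter law, which states it in base-10 logarithmic form.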
To set the stage for the rest of the chapter, though, we will conclude Section 1.1 with brief histories of probability and statistics. Both are interesting stories, replete with large casts of unusual characters and plots that have more than a few unexpected twists and turns.

Probability: The Early Years

No one knows where or when the notion of chance first arose; it fades into our prehistory. Nevertheless, evidence linking early humans with devices for generating random events is plentiful: Archaeological digs throughout the ancient world, for example, consistently turn up a curious overabundance of astragali, the heel bones of sheep and other vertebrates. Why should the frequencies of these bones be so disproportionately high? One could hypothesize that our forebears were fanatical foot fetishists, but two other explanations seem more plausible: The bones were used for religious ceremonies and for gambling.

Astragali have six sides but are not symmetrical (see Figure 1.1.2). Those found in excavations typically have their sides numbered or engraved.

[Figure 1.1.2: A sheep astragalus.]

For many ancient civilizations, astragali were the primary mechanism through which oracles solicited the opinions of their gods. In Asia Minor, for example, it was customary in divination rites to roll, or cast, five astragali. Each possible configuration was associated with the name of a god and carried with it the sought-after advice. An outcome of (1, 3, 3, 4, 4), for instance, was said to be the throw of the savior Zeus, and its appearance was taken as a sign of encouragement (36):

One one, two threes, two fours
The deed which thou meditatest, go do it boldly.
Put thy hand to it. The gods have given thee favorable omens.
Shrink not from it in thy mind, for no evil shall befall thee.

A (4, 4, 4, 6, 6), on the other hand, the throw of the child-eating Cronos, would send everyone scurrying for cover:

Three fours and two sixes. God speaks as follows.
Abide in thy house, nor go elsewhere,
Lest a ravening and destroying beast come nigh thee.
For I see not that this business is safe. But bide thy time.
Pottery found in tomo", hefore ?OOO R ('; by the time the Greek was in toll dice were (Loaded dice have also been found. Mastering the mathematics of probability would to be a formidable task for our ancestors, but they learned how to cheat!) drawn between divination lack of historical records blurs the distinction ceremonies and recreational gaming. Among mOre recent societies, though, gambling pm,pre'Pll as a distinct entity, and popularity was irrefutable. The and Romans (91), were consummate as were early for many Roman games been lost, but we can the lineage of certain modern diversions iJl what Wab playt:d the; Miudk most game of that period was hazard, the name deriving from al zhar, means "a " Hazard is thought to have brought to by soldiers returning from the its rules are much like those of our TYlr''''''''''''' .. craps. Cards were first introduced in the fourteenth century and immedi<ltely gave rise to a game known as Primero, an form Board such as backgammon, were also during this Given rich tapestry of and the with gambling that characterized so much Western world, it may seem more than a that a formal study of probability was not undertaken sooner than it was. As we will see first instance of anyone conceptualizing probability, in terms of a mathematical That means that more than 2000 years of dice games, occurred in the sixteenth and board passed by someone finally had the insight to write card down even the simplest probabilistic abstractions. rl<l", Section 1.1 A Brief History 5 Historians generally agree that, as a subject, probability got off to a rocky sTart U"'~.C1W"'" incompatibility with two of the most dominant in the evolution of our Western culture, Greek philosophy and eady Christian The were comfortable with the notion of chance (something the Christians were not), but it went against nature to suppose that random events could be quantified in any fashion. 
They believed that any attempt to reconcile mathematically what did happen with what should have happened was, in their phraseology, an improper juxtaposition of the "earthly plane" with the "heavenly plane." Making matters worse was the antiempiricism that permeated Greek thinking. Knowledge, to them, was not something that should be derived by experimentation. It was better to reason out a question logically than to search for its explanation in a set of numerical observations. Together, these two attitudes had a deadening effect: The Greeks had no motivation to think about probability in any abstract sense, nor were they faced with the problems of interpreting data that might have pointed them in the direction of a probability calculus.

If the prospects for the study of probability were dim under the Greeks, they became even worse when Christianity broadened its sphere of influence. The Greeks and Romans at least accepted the existence of chance. They believed their gods to be either unable or unwilling to get involved in matters so mundane as the outcome of the roll of a die. Cicero writes:

Nothing is so uncertain as a cast of dice, and yet there is no one who plays often who does not make a Venus-throw¹ and occasionally twice and thrice in succession. Then are we, like fools, to prefer to say that it happened by the direction of Venus rather than by chance?

For the early Christians, there was no such thing as chance: Every event that happened, no matter how trivial, was perceived to be a direct manifestation of God's deliberate intervention. In the words of St. Augustine:

Nos eas causas quae dicuntur fortuitae ... non dicimus nullas, sed latentes; easque tribuimus vel veri Dei ... (We say that those causes that are said to be by chance are not non-existent but are hidden, and we attribute them to the will of the true God ...)

Taking such a position makes the study of probability moot, and it makes a probabilist a heretic.
Not surprisingly, nothing of significance was accomplished in the subject for the next fifteen hundred years. It was in the sixteenth century that probability, like a mathematical Lazarus, arose from the dead. Orchestrating its resurrection was one of the most eccentric figures in the history of mathematics, Gerolamo Cardano. By his own admission, Cardano personified the best and the worst, the Jekyll and the Hyde, of the Renaissance man. He was born in 1501 in Pavia. Facts about his personal life are difficult to verify. He wrote an autobiography, but his penchant for lying raises doubts about much of what it says. Whether true or not, though, his "one-sentence" self-assessment paints an interesting picture:

Nature has made me capable in all manual work, it has given me the spirit of a philosopher and ability in the sciences, taste and good manners, voluptuousness, it has made me faithful, fond of wisdom, inventive, courageous, fond of learning and teaching, eager to equal the best, to discover new things and make independent progress, of modest character, a student of medicine, interested in curiosities and discoveries, cunning, sarcastic, an initiate in the mysterious lore, industrious, diligent, living only from day to day, impertinent, contemptuous of religion, sad, treacherous, magician and sorcerer, miserable, hateful, lascivious, obscene, lying, obsequious, fond of the prattle of old men, changeable, indecent, fond of women, quarrelsome, and because of the conflicts between my nature and soul I am not understood even by those with whom I associate most frequently.

Formally trained in medicine, Cardano developed his interest in probability from his addiction to gambling. His love of dice and cards was so all-consuming that he is said to have once sold all his wife's possessions to get table stakes!

¹When rolling four astragali, each of which is numbered on four sides, a Venus-throw was having each of the four numbers appear.
Fortunately, something positive came out of Cardano's obsession. He began looking for a mathematical model that would describe, in some abstract way, the outcome of a random event. What he eventually formulated is now called the classical definition of probability: If the total number of possible outcomes, all equally likely, associated with some action is n, and if m of those n result in the occurrence of some given event, then the probability of that event is m/n. If a fair die is rolled, there are n = 6 possible outcomes. If the event "outcome is greater than or equal to 5" is the one in which we are interested, then m = 2 (the outcomes 5 and 6) and the probability of the event is 2/6, or 1/3 (see Figure 1.1.3).

[Figure 1.1.3: The six equally likely outcomes of rolling a fair die, 1 through 6; the outcomes 5 and 6 constitute the event "greater than or equal to 5," which therefore has probability 2/6.]

Cardano had tapped into the most basic principle in probability. The model he discovered may seem trivial in retrospect, but it represented a giant step forward: His was the first recorded instance of anyone computing a theoretical, as opposed to an empirical, probability. Still, the actual impact of Cardano's work was minimal. He wrote a book in 1525, but its publication was delayed until 1663. By then, the center of the Renaissance, as well as interest in probability, had shifted from Italy to France.

The date cited by many historians (those who are not Cardano supporters) as the "beginning" of probability is 1654. In Paris a well-to-do gambler, the Chevalier de Méré, asked several prominent mathematicians, including Blaise Pascal, a series of questions, the best-known of which was the problem of points:

Two people, A and B, agree to play a series of fair games until one person has won six games. They each have wagered the same amount of money, with the intention that the winner will be awarded the entire pot. But suppose, for whatever reason, the series is prematurely terminated, at which point A has won five games and B three. How should the stakes be divided?
[The correct answer is that A should receive 7/8 of the total amount wagered. (Hint: Suppose the contest were resumed. What scenarios would lead to A's being the first person to win six games?)]

Pascal was intrigued by de Méré's questions and shared his thoughts with Pierre Fermat, a Toulouse civil servant and probably the most brilliant mathematician in Europe. Fermat graciously replied, and from the ensuing correspondence came not only the solution to the problem of points but the foundation for more general results. More significantly, news of what Pascal and Fermat were working on spread quickly. Others got involved, chief among them the Dutch scientist and mathematician Christiaan Huygens. The indifference that had plagued Cardano a century earlier was not going to be repeated.

Best remembered for his work in optics and astronomy, Huygens, early in his career, was intrigued by the problem of points. In 1657 he published De Ratiociniis in Aleae Ludo (Calculations in Games of Chance), a very significant work, far more comprehensive than anything Pascal and Fermat had done. For almost fifty years it was the standard "textbook" in the theory of probability. Huygens has supporters who feel that he should be credited as the founder of probability.

Almost all the mathematics of probability, of course, was still waiting to be discovered. What Huygens wrote was only the humblest of beginnings, a set of fourteen propositions bearing little resemblance to the topics we teach today. But the mathematics of probability was finally on firm footing.

Statistics: From Aristotle to Quetelet

Historians generally agree that the basic principles of statistical reasoning began to coalesce in the middle of the nineteenth century. What facilitated that process was the union of three different "sciences," each of which had been developed along more or less independent lines (206).

The first of these sciences, what the Germans called Staatenkunde, involved the collection of comparative information on the history, resources, and military prowess of nations. Although efforts in this direction peaked in the seventeenth and eighteenth centuries, the concept was hardly new: Aristotle had done something similar in the fourth century B.C. Of the three movements, this one had the least influence on modern statistics, but it did contribute some terminology: The word statistics, itself, arose in connection with studies of this type.

The second movement, known as political arithmetic, was defined by one of its early proponents as "the art of reasoning by figures, upon things relating to government." Of
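Both the die example and the problem of points come down to Cardano's m/n rule applied to a list of equally likely outcomes. A minimal sketch in Python (the helper name is my own, not from the text):

```python
from fractions import Fraction
from itertools import product

def classical_probability(outcomes, event):
    """Cardano's rule: m favorable cases out of n equally likely ones."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

# The die example: P(outcome >= 5) for one roll of a fair die
print(classical_probability(range(1, 7), lambda face: face >= 5))  # 1/3

# The problem of points: A needs one more win, B needs three, so at most
# three further fair games decide the series. B takes the pot only by
# winning all three; A wins in every other continuation.
continuations = list(product("AB", repeat=3))
share_A = classical_probability(continuations, lambda g: g != ("B", "B", "B"))
print(share_A)  # 7/8 of the stakes should go to A
```

Enumerating all eight equally likely three-game continuations is exactly the reasoning suggested by the hint: only the sequence BBB keeps A from reaching six wins first.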
Of the three movements, this one had the least influence on modern statistics, but it did contribute some tenninology: The word statistics, arose connection with studies of this type. The movement, known as political arithmetic, was defined one of prclPcments as "the art reasoning by figures, upon things relating to government." Of 8 Chapter 1 Introduction more recent vintage than Staatenkunde, arithmetic's roots were in seventeenth'-<"5u." .... Making population estimates and constructing mortality tables were two of problems it frequently dealt with. In spirit, political arithmetk was to what is now called demography. The component was the a calculus of probabiJily. As we saw earlier, this was a movement that essentially started in seventeenth-century Fmnce in response to questions, but it quickly the "engine" for analyzing all kinds of data. Staatenkunde:The ::I.1":l11','ij'A Description of States The need for infonnation on the customs and resources of nations has been obvjous since antiquity. is credited with the first major toward that descriptions objective: His Polileiai, written the fourth century B.C, contained of some 158 different city-states. Unfortunately, the thirst for thllt led to the ~r.f,lp"m fell victim to the intellectual of the Dark and almost 2000 years elapsed before any similar projects magnitude were undertaken. The subject resurfaced during and the Germans showed the most meaning the comparative They not only gave it a but were also the first 1660} to incorporate the into a curriculum. A leading figure in the movement was Gottfried the middle of the taught at the University of Gottingen ",.."V"", Achenwall's claims to fame is that he was the first to use the word statistics in in the preface of his 174Y book Abriss der Statswissenschllft der ""1'"'''<''' print. It vornehmsten Reiche ulld Repllbliken. 
(The word comes from the Italian root stato, meaning "state," implying that a statistician is someone concerned with government affairs.) As a name for the discipline, it seems to have been well received: For almost one hundred years the word statistics continued to be associated with the comparative description of states. In the middle of the nineteenth century, though, the term was redefined, and statistics became the new name for what had previously been called political arithmetic.

How important was the work of Achenwall and his predecessors to the development of statistics? That would be difficult to say. To be sure, their contributions were more indirect than direct. They left no methodology and no general theory. But they did point out the need for collecting accurate data and, more importantly, they nurtured the notion that something complex, even as complex as an entire nation, can be studied by gathering information on its parts. Thus, they lent important support to the then growing belief that induction, rather than deduction, was a more surefooted approach to scientific truth.

Political Arithmetic

In the sixteenth century the English government began to compile records, called bills of mortality, on a parish-to-parish basis, showing the numbers of deaths and their underlying causes. Their motivation largely stemmed from the plague epidemics that had periodically ravaged Europe in the not-too-distant past and were threatening to become a problem in England as well. Certain officials, including the very influential Thomas Cromwell,
....•••• 9.887 Buried in Ihe 16 Parisbes without l11e walls... . . . . . . . ......... . . . . . . . . .. ... . . .. . . . . . . . . .. 41.351 Wher«>f <>l 1i1e plague.......... . . . . . .... . . ..... . . .. . . ... ..... . . . . . . . . . .. ........ . . . . . . . 28,838 AUhe Pes, house. 10181 btl,ied ...... ' ... . . . . . . .. ........ .. ............................. 159 or lhepl&gue.... .................... ................................ .............. 1St. Buried in tbe 12 oOI·Parishes in Middlese~ and surrey. .. ........... . . . . . .. ......... . . . . . 18.554 Whereof 0{ the plague............. ................................ ................... 21,420 Buried in the 5 PuTi~ in the CHy aoo Ubmies or WeslminSl.et .................. ...... 12.194 Whe"",r Ihe pI.I;,s.I>e . • ........... ' ................. •• ... .. 8,403 The lola) of alit he dUislenings . . . .. .. . . . .. . . . . . . . .... ................................. 9!J61 The lotal of aU the burials this year . . . . . . . . ......... . . . . . .. ............................ 91,:lO6 Whereof of the pl"Slle . •. . . . . .. ... . .. . . . .. . . . .•••• .. . .. .. .. . . . . . .••••. . . . . . .. .. 68,596 Abortive ..00 Sdllboroe ......•.... 611 l~~45 f'eave.- .................. . 5..251 Oriping in Ihe Ollis .........•••••• Hallg'd & mllde llway Ihemll<:!ved . HeJidmould .hor and mould Callen. 1,28& 7 14 110 i'llI"""....... .... ........ ....... 2Zi Poy~......................... I 46 86 OUinsie........................... 35 Ridel............................ 535 2 14 Rising 01 the Ugh"" ... '" .. .. .. .. 397 3A 20 Jaundice ................... Impos.some ....................... Kill by ul<efal accidents ....••••••. King'. E.ill ....................... ,. Levroolc ............. .. Lethargy ...••••••......... , ...... Li\ie.l'glown ....................... Bloody Flux, Scowr;"g & FlUJ( .... Bum' "lid Scalded ................ Calenture ..... ... , C.. tI(:(:r. 
Call8relle & FislOla ••••••• Canker and Thrush ............... Childbed ......................... Otrisl.1mes "lid rnfanls ..••••...... 1..258 F....-.dIPax ...................... . 86 Mc&grom ..-.d Heoom:h •.......... 12 Frighloo ......................... . OOUI & Sciali<:a .................. . OrieL ............ . 23 Measles .......................... Mutlhered & Shot ................ Overlaid & Starved .. 7 9 A~)I. and Suddenly ..•..•••.•.. Bedrid .......................... .. 116 10 BLast«l ......................... . 5 8leedill8 ....................... .. CokI& 16 68 Ccllick & ComwmptIDn & Tissid: ......... . Convul5ion & Mod"" ............ . IJA Distracted ....................... . Dropsie & l1m"""y ............ .. Drowned ........................ . &ewled ....................... . Flox & Smallpox ................. . Foond Dead in streels, fields. &c.. Olll61ened-MIdes.... ............ S ..ried·Males ........ > ........... 4,801! 2.036 5 1,478 SO 21 655 21 46 5.114 58,569 ~ .~~~~ ~ ~ .... ... ~ ~~~~. Palsie.................... 30 Plague. . . . . . . . .. .. . .. .. .. . . . . . .. .. 6&,596 Planllel ................... 6 20 18 8 3 RupI;~re. .. . .......... .. .. . . . . .... SCurt)'..... ....................... Shingles & SwiJle Po"..... . . . . . . . . Sores, U Icen, Brokell aDd Bnlised Llmb!...... . . . .. ......... 15 10$ 2 82 56 III Spleen............. ...... ......... 14 Spotted Feave, & Purples......... t,929 625 SlOpping <>l the SlOOlach . . ........ Slone snd Slmngll&ty .. .. .. .. . .. . . Surfe .... .......... .............. Teelh & Worms ................. Vomiting........... .............. We.m ............................ 332 'lS 2,614 51 I'> In all.... ......................... 9.961 45 Female..... ........... .......... 4,853 Females .......................... 48,731 1..251 InaIL ............................ 97,JQ6 Of Ihe Plague" .................................................................................................... '". ........ 
68,596 In<rellse iii Ihe Burials in tbe 130 Pariol1es and the Peslh"'-"-t Ihis year. . . . . . . . ........••. . . . . . ...•••••.... . . . . . . . . . .•••. . . . . . .. 79lXh Increase oi: the Plague in lhe 130 Parishes and the Pe5thouse Ihis year ............. ""............. ..................... ........ 68,590 FIGURE 1.1.4 felt that these bills would prove invaluable in helping to control the spread of an epidemic. At first. the bills were published only occasionally. but by the early seventeenth century they had become a weekly institution? Figure 1.1.4 (155) shows a portion of a bill that appeared London in 1665. gravity of the plague epidemic is strikjngly apparent when we look at the numbers at the top: Out of 97 ,306 deaths, 68,596 (over 70%) were caused by the plague. The breakdown of certain other afflictions, though they caused fewer deaths, raises some interesting questions. What Z An interesting accounl of the bills of mortality is in Daniel Defoe's A JOl/rnol of the Plague Yeor, which purportedly chronicles the London plague outbreak of 1665. 10 Chapter 1 Introduction happened, for example, to the 23 people who were "frighted" or to the 397 who suffered from "rising of the lights"? Among the faithful subscribers to the bills was John Graunt. a London merchant. Graunt not only read the bills. he studied them intently. He looked for patterns, computed death rates, devised ways of estimating population sizes, and even set up a primitive life table. His results were published in the 1662lreatise Nalural and Political Observ(l{ions upon (he Bills o/Mortality. This work was a landmark: Graunt had launched the lwin sciences of vital statistics and demography, and, although the name came later, it also signaled the beginning of political arithmetic. (Graunt did not have to wait long for accolades: in the year his book was published, he was elected to the prestigious Royal Society of London.) 
High on the list of innovations that made Graunt's work unique were his objectives. Not content simply to describe a situation, although he was adept at doing so, Graunt often sought to go beyond his data and make generalizations (or, in current statistical terminology, draw inferences). Having been blessed with this particular turn of mind, he almost certainly qualifies as the world's first statistician. All Graunt really lacked was the probability theory that would have enabled him to frame his inferences more mathematically. That theory, though, was just beginning to unfold several hundred miles away in France.

Other seventeenth-century writers were quick to follow through on Graunt's ideas. William Petty's Political Arithmetick was published in 1690, although it was probably written some fifteen years earlier. (It was Petty who gave the movement its name.) Perhaps even more significant were the contributions of Edmund Halley (of "Halley's comet" fame). Principally an astronomer, he also dabbled in political arithmetic, and in 1693 wrote An Estimate of the Degrees of the Mortality of Mankind, drawn from Curious Tables of the Births and Funerals at the city of Breslaw; with an attempt to ascertain the Price of Annuities upon Lives. (Book titles were longer then!) Halley shored up, mathematically, the efforts of Graunt and others to construct an accurate mortality table. In doing so, he laid the foundation for the important theory of annuities. Today, all life insurance companies base their premium schedules on methods similar to Halley's. (The first company to follow his lead was The Equitable, founded in 1765.)

For all its initial flurry of activity, political arithmetic did not fare particularly well in the eighteenth century, at least in terms of having its methodology fine-tuned.
Still, the second half of the century did see some notable improvements in the quality of the databases: Several countries, including the United States in 1790, established a periodic census. To some extent, though, answers to the questions that interested Graunt and his followers had to be deferred until the theory of probability could develop just a little bit more.

Quetelet: The Catalyst

With political arithmetic furnishing the data and many of the questions, and the theory of probability holding out the promise of rigorous answers, the birth of statistics was at hand. All that was needed was a catalyst, someone to bring the two together. Several individuals served with distinction in that capacity. Karl Friedrich Gauss, the superb German mathematician and astronomer, was especially helpful in showing how statistical concepts could be useful in the physical sciences. Similar efforts in France were made by Laplace. But the man who perhaps best deserves the title of "matchmaker" was a Belgian, Adolphe Quetelet.

Quetelet was a mathematician, astronomer, physicist, sociologist, anthropologist, and poet. One of his passions was collecting data, and he was fascinated by the regularity of social phenomena. In commenting on the nature of criminal tendencies, he once wrote (69):

Thus we pass from one year to another with the sad perspective of seeing the same crimes reproduced in the same order and calling down the same punishments in the same proportions. Sad condition of humanity! ... We might enumerate in advance how many individuals will stain their hands in the blood of their fellows, how many will be forgers, how many will be poisoners, almost as we can enumerate in advance the births and deaths that should occur. There is a budget which we pay with a frightful regularity; it is that of chains and the scaffold.

Given that orientation, it was not surprising that Quetelet would see in probability theory a natural vehicle for expressing human behavior.
For much of the nineteenth century Quetelet promoted the cause of statistics, and as a member of more than one hundred learned societies his influence was enormous. When he died in 1874, statistics had been brought to the brink of its modern era.

1.2 SOME EXAMPLES

Do stock markets rise and fall randomly? Is there a common element in the aesthetic standards of the Greeks and the Shoshoni Indians? Can external forces, such as the phases of the moon, affect admissions to mental hospitals? What kind of relationship exists between exposure to radiation and cancer mortality? These questions are quite diverse in content, but they share some important similarities. All are difficult or impossible to study in a laboratory, and none are likely to yield to purely deductive reasoning. Indeed, these are precisely the sorts of questions that are usually answered by collecting data, making assumptions about the mechanisms that generated the data, and then drawing inferences about those assumptions.

CASE STUDY 1.2.1

Each day, radio and TV reporters offer a bewildering collection of averages and indices that presumably indicate the state of the stock market. Are those numbers conveying any really useful information? Some financial analysts would say "no," arguing that speculative markets tend to rise and fall randomly, much as though a roulette wheel were spinning out the daily quotations. How might that "theory" be tested? We would begin by constructing a probability model that should describe the behavior of the market if the (random) hypothesis were true. To that end, the notion of "random movement" can be translated into two assumptions:

a. The chances of the market's rising or falling on a given day are unaffected by its actions on any previous days.
b. On any given day, the market is equally likely to go up or down.

Measuring the day-to-day randomness, or its absence, in the market's movements can be accomplished by looking at the lengths of its runs. By definition, a run of downturns of length k is a sequence of days starting with a rise, followed by k consecutive declines, and then followed by a rise. So, for example, a daily sequence of the form (rise, fall, fall, rise) is a run of length two.
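The bookkeeping behind that definition is easy to automate. Below is a minimal sketch (an illustration, not part of the text) that extracts run-of-downturn lengths from a rise/fall record; the '+'/'-' encoding and the decision to count a trailing, unterminated run are choices made here, not the book's.

```python
# Sketch: extract runs of downturns from a daily rise/fall record.
# Encoding assumption: '+' marks a rise, '-' marks a fall.
def downturn_runs(moves):
    """Return the lengths of all runs of consecutive falls."""
    runs, k = [], 0
    for m in moves:
        if m == '-':
            k += 1                 # extend the current run of falls
        elif k:
            runs.append(k)         # a rise closes the run
            k = 0
    if k:
        runs.append(k)             # count a trailing, unterminated run
    return runs

print(downturn_runs('+-+--+---+'))   # [1, 2, 3]
```

For example, the sequence (rise, fall, fall, rise) is encoded as '+--+' and yields a single run of length two.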
If the actual distribution of the market's run lengths differs markedly from the predictions of assumptions (a) and (b), the random-movement hypothesis can be rejected. Fortunately, calculating the "expected" number of (randomly generated) runs of each length is straightforward. Suppose a rise has just been followed by a fall. For a run of length one, the market must next rise. By assumptions (a) and (b), this happens half the time, so a probability of 1/2 is assigned to a run of length one. The notation for this will be P(1) = 1/2. The other half of the time the market falls again, giving the sequence (rise, fall, fall). A run of length two occurs if there is now a rise. Again, this happens half the time, so the probability of a run of length two is half of 1/2; that is, P(2) = 1/4. Continuing in this manner, it follows that a run of length k has probability (1/2)^k. Furthermore, if there are T total runs, it seems reasonable to expect T(1/2)^k of them to be of length k.

Table 1.2.1 gives the distribution of the 120 runs of downturns observed in the daily closing prices of the Standard and Poor's 500 stock index between February 1994 and February 9, 1996. The third column gives the corresponding expected frequencies, as calculated from the expression T(1/2)^k, with T = 120.

TABLE 1.2.1: Runs in the Closing Prices for the S&P 500 Stock Index

Run Length, k    Observed    Expected
1                  67          60.00
2                  28          30.00
3                  18          15.00
4                   3           7.50
5                   2           3.75
6+                  2           3.75
                  120         120.00

Notice that the agreement between the actual and predicted run frequencies seems close enough to lend at least some credibility to assumptions (a) and (b). The fit is not uniform, however; in particular, the expected numbers of longer runs (4, 5, and 6+) do not match the observed counts well. The reason for that might be that the "equally likely" provision in assumption (b) is too restrictive and should be replaced by the more general assumption (c):

c. The likelihood of a fall in the market on any given day is some fixed number p, where 0 ≤ p ≤ 1.

Invoking assumptions (a) and (c), then, allows the run-length probabilities to be recalculated.
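The expected column of Table 1.2.1 is quick to reproduce. The sketch below (an illustration, not part of the text) computes T(1/2)^k for k = 1 through 5 and lumps everything longer into a "6+" category:

```python
# Expected run counts under assumptions (a) and (b): P(k) = (1/2)**k.
T = 120                                        # total runs observed
expected = {k: T * 0.5 ** k for k in range(1, 6)}
# Runs of length 6 or more: T * sum of (1/2)**k for k >= 6 = T * (1/2)**5.
expected['6+'] = T * 0.5 ** 5
print(expected)   # {1: 60.0, 2: 30.0, 3: 15.0, 4: 7.5, 5: 3.75, '6+': 3.75}
```

Note that the six categories necessarily sum to T, since the probabilities (1/2)^1 + (1/2)^2 + ... add to one.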
For example, following a (rise, fall) sequence, a rise would be expected 100(1 - p)% of the time, so P(1) = 1 - p. Another fall, of course, would occur the remaining 100p% of the time, giving (rise, fall, fall). A run of length two occurs if the next change is a rise, which happens with probability 1 - p. Thus the probability of the sequence (rise, fall, fall, rise), that is, of a run of length two, is P(2) = p(1 - p). In general, P(k) = p^(k-1)(1 - p).

Two questions now arise; which of the two is more relevant for further study depends on the needs and objectives of the model maker.

1. Is the initial assumption p = 1/2 justified?
2. Given the observed runs, what is the best choice (or estimate) for p?

To answer Question 1, we must decide whether the discrepancies between the observed and expected run lengths are small enough to be attributed to chance, or large enough to render the model invalid. One way to answer Question 2 is to find the value of p that best "explains" the observations, in the sense of maximizing their likelihood of occurring. For the data from which Table 1.2.1 was derived, this value of p turns out to be p = 0.43. The expected values, based on P(k) = p^(k-1)(1 - p) = (0.43)^(k-1)(0.57), are given in column 3 of Table 1.2.2.

TABLE 1.2.2: Runs in the Closing Prices for the S&P 500 Stock Index

Run Length, k    Observed    Expected [p = 0.43]
1                  67          68.4
2                  28          29.4
3                  18          12.6
4                   3           5.4
5                   2           2.3
6+                  2           1.9
                  120

Has assumption (c) provided a noticeably better fit? Yes. In five of the six run-length categories, the expected frequencies in Table 1.2.2 are closer to the corresponding observed frequencies than was true for their counterparts in Table 1.2.1. Moreover, both models, p = 1/2 and the more general 0 < p < 1, are in substantial agreement with the hypothesis that up-and-down movements in the market look like a random sequence.

CASE STUDY 1.2.2

Not all rectangles are created equal. Since antiquity, societies have expressed aesthetic preferences for rectangles having certain width (w) to length (l) ratios. Plato, for example, wrote that rectangles whose sides were in a 1:√3 ratio were especially pleasing. (These are the rectangles formed from the two halves of an equilateral triangle.)

Another standard calls for the width-to-length ratio to be equal to the ratio of the length to the sum of the width and the length. That is,

    w/l = l/(w + l)    (1.2.1)

Equation 1.2.1 implies that the width is (√5 - 1)/2, or approximately 0.618, times as long as the length. The Greeks called this the golden rectangle and used it often in their architecture (see Figure 1.2.1). Many other cultures were similarly inclined. The Egyptians, for example, built their pyramids out of stones whose faces were golden rectangles. Today, in our society, the golden rectangle remains an architectural and artistic standard, and even everyday items such as drivers' licenses and picture frames often have w/l ratios close to 0.618.

FIGURE 1.2.1: A golden rectangle (w/l = 0.618).

The fact that so many societies have adopted the golden rectangle as an aesthetic standard has two possible explanations. One, they learned to like it because of the profound influence that Greek writers, philosophers, and artists have had on cultures all over the world. Or two, there is something about human perception that predisposes a preference for the golden rectangle. Researchers in the field of experimental aesthetics have sought to test the plausibility of those two hypotheses by seeing whether the golden rectangle is accorded any special status by societies that had no contact whatsoever with the Greeks. One such study (39) examined the rectangles sewn by the Shoshoni Indians as decorations on their blankets and clothes. Table 1.2.3 lists the width-to-length ratios for twenty such rectangles. If, indeed, the Shoshonis also had a preference for golden rectangles, we would expect their ratios to be "close" to 0.618. The average value of the entries in Table 1.2.3, though, is 0.661. What does that imply? Is 0.661 close enough to 0.618 to support the position that liking the golden rectangle is a universal human characteristic, or is 0.661 so far from 0.618 that the only prudent conclusion is that the Shoshonis did not agree with the aesthetics espoused by the Greeks?

TABLE 1.2.3: Width-to-Length Ratios of Shoshoni Rectangles

0.693   0.749   0.654   0.670
0.662   0.672   0.615   0.606
0.690   0.628   0.668   0.611
0.606   0.609   0.601   0.553
0.570   0.844   0.576   0.933

Making that judgment is an exercise in hypothesis testing, one of the predominant formats used in statistical inference.
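As a quick numeric check of the two figures being compared, the sketch below recomputes the golden ratio and the Shoshoni sample mean. One entry of Table 1.2.3 is hard to make out in this copy; it is taken here as 0.672, an assumed value chosen to be consistent with the stated average of 0.661.

```python
# Golden-rectangle ratio w/l: the positive root of r**2 + r - 1 = 0.
golden = (5 ** 0.5 - 1) / 2

# Shoshoni width-to-length ratios (Table 1.2.3; the 0.672 is assumed).
ratios = [0.693, 0.662, 0.690, 0.606, 0.570, 0.749, 0.672, 0.628,
          0.609, 0.844, 0.654, 0.615, 0.668, 0.601, 0.576, 0.670,
          0.606, 0.611, 0.553, 0.933]
mean = sum(ratios) / len(ratios)
print(round(golden, 3))   # 0.618
print(round(mean, 4))     # 0.6605, which the text reports as 0.661
```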
Mathematically, hypothesis testing draws on a variety of probability results covered in Chapters 2 through 5. The Shoshonis and their rectangles, then, will have to be put on hold until Chapter 6, where we learn how to interpret the difference between a sample mean (here, 0.661) and a hypothesized mean (here, 0.618).

Comment. The ratio l/w for golden rectangles, more commonly referred to as either phi or the golden ratio, is an irrational number with all sorts of fascinating properties and connections. Indeed, entire books have been written on phi; see, for example, (106). Algebraically, the solution of Equation 1.2.1 can be written as the continued fraction

    l/w = 1 + 1/(1 + 1/(1 + 1/(1 + ...)))

Among the curiosities associated with phi is its relationship with the Fibonacci series. The latter, of course, is the famous sequence in which each term is the sum of its two predecessors, that is,

    1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

Quotients of successive terms in the Fibonacci sequence alternate above and below phi, and they converge to phi:

    1/1 = 1.000000
    2/1 = 2.000000
    3/2 = 1.500000
    5/3 = 1.666666
    8/5 = 1.600000
    13/8 = 1.625000
    21/13 = 1.615385
    34/21 = 1.619048
    55/34 = 1.617647
    89/55 = 1.618182

But phi is not just about numbers; it has cosmological significance as well. Figure 1.2.2 shows a golden rectangle (of width w and length l) in which a w × w square has been inscribed on its left-hand side. What remains is a golden rectangle on the right, inscribed in which is an (l - w) × (l - w) square. Below that is another golden rectangle with a (w - (l - w)) × (w - (l - w)) square inscribed on its right-hand side. Each such square leaves behind another golden rectangle, which can be inscribed with yet another square, and so on ad infinitum. Connecting the points where the squares touch the golden rectangles yields a logarithmic spiral, the beginning of which is pictured. These curves are quite common in nature and describe, for example, the shape of spiral galaxies, one of which is our own Milky Way (see Figure 1.2.3).
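The convergence of the Fibonacci quotients is easy to watch numerically. A short sketch (an illustration, not part of the text):

```python
# Successive Fibonacci quotients b/a alternate around phi and converge.
phi = (1 + 5 ** 0.5) / 2          # 1.6180339887..., the golden ratio l/w
a, b = 1, 1
quotients = []
for _ in range(10):
    quotients.append(b / a)       # 1/1, 2/1, 3/2, 5/3, ...
    a, b = b, a + b               # advance the Fibonacci sequence
print(quotients[0], quotients[1], quotients[2])   # 1.0 2.0 1.5
print(abs(quotients[-1] - phi))   # 89/55 is already within 0.0002 of phi
```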
FIGURE 1.2.2: A golden rectangle of width w and length l, repeatedly partitioned into a square and a smaller golden rectangle (with sides w, l - w, w - (l - w), ...); the logarithmic spiral connects the points where the squares touch the rectangles.

What does all this have to do with the Shoshonis? Absolutely nothing, but mathematical relationships like these are just too good to pass up! The famous astronomer Johannes Kepler once wrote (106): "Geometry has two great treasures; one is the Theorem of Pythagoras; the other [is the golden ratio]. The first we may compare to a measure of gold; the second we may name a precious jewel."

FIGURE 1.2.3: A spiral galaxy.

CASE STUDY 1.2.3

In folklore, the full moon is often portrayed as something sinister, a kind of evil force possessing the power to control our behavior. Over the centuries, many prominent writers and philosophers have shared this belief (132). Milton, in Paradise Lost, refers to

    Demoniac frenzy, moping melancholy,
    And moon-struck madness.

And Othello, after the murder of Desdemona, laments:

    It is the very error of the moon.
    She comes more near the earth than she was wont
    And makes men mad.

On a more scholarly level, Sir William Blackstone, the renowned eighteenth-century barrister, defined a "lunatic" as

    one who hath ... lost the use of his reason and who hath lucid intervals, sometimes enjoying his senses and sometimes not, and that frequently depending upon changes of the moon.

The possibility of lunar phases influencing human affairs is a theory not without supporters among the scientific community. Studies by reputable medical researchers have attempted to link the "Transylvania effect," as it has come to be known, with suicide rates, pyromania, and even epilepsy. The relationship between lunar cycles and mental breakdowns has also been studied. Table 1.2.4 shows the admission rates to the emergency room of a Virginia mental health clinic before, during, and after the twelve full moons from August 1971 to July 1972.

TABLE 1.2.4: Admission Rates (patients/day) before, during, and after each of the twelve full moons, August 1971 through July 1972. (The monthly entries are illegible in this copy; the column averages are 10.9 before, 13.3 during, and 11.5 after.)

For these data, the average admission rate "during" the full moon is higher than both the "before" and "after" admission rates: 13.3 versus 10.9 and 11.5. Does that imply that the Transylvania effect is real? Not necessarily. The question that needs to be addressed is whether means as different as 13.3, 10.9, and 11.5 could reasonably have occurred by chance if, in fact, the Transylvania effect does not exist. We will learn in Chapter 13 that the answer to that question appears to be "no."

CASE STUDY 1.2.4

The oil embargo of 1973 raised some very fundamental questions about energy policy in the United States. One of the most controversial is whether nuclear reactors should assume a more central role in the production of electric power. Those in favor point to their efficiency and to the availability of nuclear material; those against warn of nuclear "incidents" and the health hazards posed by low-level radiation. Strengthening the opponents' position was a serious safety lapse that occurred some years ago at a government facility located in Hanford, Washington. What happened there is what critics fear will become a recurring problem if reactors proliferate.

Until recently, Hanford was responsible for producing the plutonium used in nuclear weapons. One of the major safety problems encountered there was the storage of radioactive wastes. Over the years, significant quantities of strontium 90 and cesium 137 leaked from their storage areas into the Columbia River, which flows along the Washington-Oregon border and eventually empties into the Pacific Ocean. The question raised by health officials was whether exposure to that contamination contributed to any medical problems and, if so, to what extent. As a starting point, an index of exposure was calculated for each of the nine Oregon counties having frontage on either the Columbia River or the Pacific Ocean.
It was based on several factors, including each county's stream distance from Hanford and the average distance of its population from any water frontage. As a covariate, the cancer mortality rate was determined for each of the same counties (42).

TABLE 1.2.5: Radioactive Contamination and Cancer Mortality in Oregon Counties

County         Index of Exposure    Cancer Mortality per 100,000
Umatilla             2.49                 147.1
Morrow               2.57                 130.1
Gilliam              3.41                 129.9
Sherman              1.25                 113.5
Wasco                1.62                 137.5
Hood River           3.83                 162.3
Portland            11.64                 207.5
Columbia             6.41                 177.9
Clatsop              8.34                 210.3

A graph of the data (Figure 1.2.4) suggests that radiation exposure (x) and cancer mortality (y) are linearly related, and that the two vary together according to the equation y = β0 + β1x. Finding the numerical values of β0 and β1 that fit the data "best" is a frequently encountered problem in an area of statistics known as regression analysis. Based on methods described in Chapter 11, the optimal line here has the equation y = 114.72 + 9.23x.

FIGURE 1.2.4: Cancer deaths per 100,000 plotted against the index of exposure for the nine counties, together with the fitted line y = 114.72 + 9.23x.

1.3 A CHAPTER SUMMARY

The concepts of probability lie at the very heart of all statistical problems, the case studies of Section 1.2 being typical examples. Acknowledging that fact, the next two chapters take a close look at some of those concepts. Chapter 2 states the axioms of probability and investigates their consequences. It also covers the basic skills for algebraically manipulating probabilities and gives an introduction to combinatorics, the mathematics of counting. Chapter 3 reformulates much of the material in Chapter 2 in terms of random variables, the latter being a concept of great convenience in applying probability to real situations. Over the years, particular measures of probability have emerged as being especially useful: The most prominent of these are profiled in Chapter 4.

Our study of statistics proper begins with Chapter 5, which is a first look at the theory of parameter estimation. Chapter 6 introduces the notion of hypothesis testing, a procedure that, in one form or another, commands a major share of the remainder of the book.
From a conceptual standpoint, these are very important chapters: Most formal applications of statistical methodology will involve either parameter estimation or hypothesis testing, or both.

Among the probability functions featured in Chapter 4, the normal distribution, more familiarly known as the bell-shaped curve, is sufficiently important to merit even further scrutiny. Chapter 7 derives in some detail many of the properties and applications of the normal distribution as well as those of several related probability functions. Much of the statistical methodology in Chapters 9 through 13 comes from the theory that supports the methodology of Chapter 7.

Chapter 8 describes some of the basic principles of experimental "design." Its purpose is to provide a framework for comparing and contrasting the various statistical procedures profiled in Chapters 9 through 14. Chapters 9, 12, and 13 continue the work of Chapter 7, but with the emphasis on the comparison of several populations, similar to what was done in Case Study 1.2.3. Chapter 10 looks at the important problem of assessing the level of agreement between a set of data and the values predicted by the probability model from which those data presumably came (recall Case Study 1.2.1). Linear relationships, such as the one between radiation exposure and cancer mortality in Case Study 1.2.4, are examined in Chapter 11. Chapter 14 is an introduction to nonparametric statistics. The objective there is to develop procedures for answering some of the same sorts of questions raised in the earlier chapters, but with fewer initial assumptions.

As a general format, each chapter contains numerous examples and case studies, the latter being actual experimental data taken from a variety of sources, primarily newspapers, magazines, and technical journals. We hope that these applications will make it abundantly clear that, while the general orientation of this text is theoretical, the consequences of that theory are never too far from having direct relevance to the "real world."
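To make the regression fit of Case Study 1.2.4 concrete, here is a minimal least-squares sketch. The slope and intercept formulas are the standard ones previewed for Chapter 11; the data are the Table 1.2.5 values as recovered from this copy, and they reproduce the quoted line y = 114.72 + 9.23x.

```python
# Least-squares straight-line fit, y = b0 + b1*x.
exposure  = [2.49, 2.57, 3.41, 1.25, 1.62, 3.83, 11.64, 6.41, 8.34]
mortality = [147.1, 130.1, 129.9, 113.5, 137.5, 162.3, 207.5, 177.9, 210.3]

def least_squares(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx                 # slope
    b0 = ybar - b1 * xbar          # intercept
    return b0, b1

b0, b1 = least_squares(exposure, mortality)
print(round(b0, 2), round(b1, 2))   # 114.72 9.23
```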
CHAPTER 2

Probability

2.1 INTRODUCTION
2.2 SAMPLE SPACES AND THE ALGEBRA OF SETS
2.3 THE PROBABILITY FUNCTION
2.4 CONDITIONAL PROBABILITY
2.5 INDEPENDENCE
2.6 COMBINATORICS
2.7 COMBINATORIAL PROBABILITY
2.8 TAKING A SECOND LOOK AT STATISTICS (ENUMERATION AND MONTE CARLO TECHNIQUES)

One of the most influential of seventeenth-century mathematicians, Fermat earned his living as a lawyer and administrator in Toulouse. He shares credit with Descartes for the invention of analytic geometry, but his most important work may have been in number theory. Fermat did not write for publication, preferring instead to send letters and papers to friends. His correspondence with Pascal was the starting point for the development of a mathematical theory of probability.
- Pierre de Fermat (1601-1665)

Pascal was the son of a nobleman. A prodigy of sorts, he had already published a treatise on conic sections by the age of sixteen. He also invented one of the early calculating machines to help his father with accounting work. Pascal's contributions to probability were stimulated by his correspondence, in 1654, with Fermat. Later that year he retired to a life of religious meditation.
- Blaise Pascal (1623-1662)

2.1 INTRODUCTION

Experts have estimated that the likelihood of any given UFO sighting being genuine is on the order of one in one thousand. Some ten thousand sightings have been reported to civil authorities. What is the probability that at least one of those objects was, in fact, an alien spacecraft? In 1978, Pete Rose of the Cincinnati Reds set a National League record by batting safely in forty-four consecutive games. How unlikely was that event, given that Rose was a lifetime .303 hitter? By definition, the mean free path is the average distance a molecule in a gas travels before colliding with another molecule. How likely is it that the distance a molecule travels between collisions will be at least twice its mean free path?
Suppose a boy's mother and father both carry genetic markers for sickle cell anemia, but neither parent exhibits any of the disease's symptoms. What are the chances that their son will also be asymptomatic? What are the odds that a poker player is dealt a full house or that a craps shooter makes his "point"? If a woman has lived to age seventy, how likely is it that she will die before her ninetieth birthday? In 1994, Tom Foley was Speaker of the House and running for re-election. The day after the election, his race had still not been "called" by any of the networks: he trailed his Republican challenger by 2,174 votes, but 14,000 absentee ballots remained to be counted. Foley conceded, however. Should he have waited for the absentee ballots to be counted, or was his defeat at that point a virtual certainty?

As the nature and diversity of those questions would suggest, probability is a subject with an extraordinary range of real-world, everyday applications. What began as an exercise in understanding games of chance has proven to be useful everywhere. Maybe even more remarkable is the fact that the solutions to all of these diverse questions are rooted in just a handful of definitions and theorems. Those results, together with the problem-solving techniques they empower, are the sum and substance of Chapter 2. We begin, though, with a bit of history.

The Evolution of the Definition of Probability

Over the years, the definition of probability has undergone several revisions. There is nothing contradictory in the changes; the multiple revisions primarily reflected the need for greater generality and more mathematical rigor. The first formulation (often referred to as the classical definition of probability) is credited to Gerolamo Cardano (recall Section 1.1). It applies only to situations where (1) the number of possible outcomes is finite and (2) all outcomes are equally likely. Under those conditions, the probability of an event comprised of m outcomes is the ratio m/n, where n is the total number of (equally likely) outcomes.
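In code, the classical definition is just counting. A minimal sketch (an illustration, not part of the text), using a single roll of a fair die:

```python
# Classical probability: m favorable outcomes out of n equally likely ones.
from fractions import Fraction

outcomes = list(range(1, 7))                  # the six faces of a die
event = [s for s in outcomes if s % 2 == 0]   # "an even number shows": 2, 4, 6
prob = Fraction(len(event), len(outcomes))    # m/n, reduced automatically
print(prob)   # 1/2
```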
Tossing a fair, six-sided die, for example, gives m/n = 3/6 as the probability of rolling an even number (that is, a 2, 4, or 6).

While Cardano's model was well-suited to gambling scenarios (for which it was intended), it was obviously inadequate for more general problems, where outcomes were not equally likely and/or the number of possible outcomes was not finite. Richard von Mises, a twentieth-century German mathematician, is often credited with avoiding the weaknesses in Cardano's model by defining "empirical" probabilities. In the von Mises approach, we imagine an experiment being repeated over and over again under presumably identical conditions. Theoretically, a running tally could be kept of the number of times (m) the outcome belonged to a given event, relative to n, the total number of times the experiment was performed. According to von Mises, the probability of the given event is the limit (as n goes to infinity) of the ratio m/n. Figure 2.1.1 illustrates the empirical probability of getting a head by tossing a fair coin: as the number of tosses continues to increase, the ratio m/n converges to 1/2.

FIGURE 2.1.1: The ratio m/n plotted against n, the number of trials; as n increases, m/n approaches the limiting value 1/2.

The von Mises approach definitely shores up some of the inadequacies seen in the Cardano model, but it is not without shortcomings of its own. There is some conceptual inconsistency, for example, in extolling the limit of m/n as a way of defining a probability empirically, when the very act of repeating an experiment under identical conditions an infinite number of times is physically impossible. And left unanswered is the question of how large n must be in order for m/n to be a good approximation of lim m/n.

Andrei Kolmogorov, the great Russian probabilist, took a different approach. Aware that many twentieth-century mathematicians were having success developing subjects axiomatically, Kolmogorov wondered whether probability might similarly be defined operationally, rather than as a ratio (like the Cardano model) or as a limit (like the von Mises model).
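The empirical picture in Figure 2.1.1 can be simulated in a few lines. The sketch below (an illustration, not part of the text) tosses a simulated fair coin ten thousand times and records the running ratio m/n at several checkpoints; with a different seed the individual numbers change, but the drift toward 1/2 does not.

```python
import random

random.seed(1)                   # fixed seed for reproducibility
m = 0                            # running count of heads
ratios = {}
for n in range(1, 10001):
    m += random.random() < 0.5   # one simulated toss (True counts as 1)
    if n in (10, 100, 1000, 10000):
        ratios[n] = m / n
print(ratios)                    # m/n creeps toward 0.5 as n grows
```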
His efforts culminated in a masterpiece of mathematical elegance when he published Grundbegriffe der Wahrscheinlichkeitsrechnung (Foundations of the Theory of Probability) in 1933. In essence, Kolmogorov was able to show that a maximum of four simple axioms is necessary and sufficient to define the way any and all probabilities must behave. (These will be our starting point in Section 2.3.)

We begin Chapter 2 with some basic (and, presumably, familiar) definitions from set theory. These are important because probability will eventually be defined as a set function, that is, a mapping from a set of outcomes to a number. Then, with the help of Kolmogorov's axioms in Section 2.3, we will learn how to calculate and manipulate probabilities. The chapter concludes with an introduction to combinatorics, the mathematics of systematic counting, and its application to probability.

2.2 SAMPLE SPACES AND THE ALGEBRA OF SETS

The starting point for probability is the definition of four key terms: experiment, sample outcome, sample space, and event. The latter three, all carryovers from classical set theory, give us a familiar mathematical framework within which to work; the former is what provides the conceptual mechanism for casting real-world phenomena into probabilistic terms.

By an experiment we will mean any procedure that (1) can be repeated, theoretically, an infinite number of times; and (2) has a well-defined set of possible outcomes. Thus, rolling a pair of dice qualifies as an experiment; so does measuring a blood pressure or doing a spectrographic analysis to determine the carbon content of moon rocks. Asking a would-be psychic to draw a picture of an image presumably transmitted by another psychic does not qualify as an experiment, because the set of possible outcomes cannot be listed, characterized, or otherwise defined.

Each of the potential eventualities of an experiment is referred to as a sample outcome, s, and their totality is called the sample space, S. To signify the membership of s in S, we write s ∈ S. Any designated collection of sample outcomes, including individual outcomes, the entire sample space, and the null set, constitutes an event.
The latter is the experiment is one of the members of the event. SOIlIV,re outcome, sample EXAMPLE 1.1.1 'l'Ir'1'oll'l,>Y" the experiment a coin three times. What is tbe space? Which sarnOlle outcomes make up event A: Majority of coins show Think of each sample outcome here as an triple, its representing the outcomes of the first, and third respectively. Altogether, there are eight triples. so those space: s= {HHH, HTH, THH, HTT, THT, TTH, we se,e that fOUT of the sample ollkomes in S c;onstituf,e the event A: A= HTH. EXAMPLE 2.1.1 lht: fu:;.l UlIt: one green. each sample outcome pair (face showing on red showing on green die), and the can be represented as a 6 x 6 matrix (see Figure are often in tbe event A that the sum of the Showing is a 7. 2.2.1 that sample outcomes contained in A are the six (2, (3,4). 3), (5,2), and (6, ruLling twu Section 2.2 Spaces and the Algebra of Sets 2S RIce showing on green die 1 2 3 (1,3) .., 1 (1,1) (t,2) 11 2 (2,1) (2.2) g 3 (3,1) =a 4 5 4 t:! ~ 5 6 (6.6) EXAMPLE A local station advertises two positions. women (Wi. Wz. W3) and a sample two men (M}, M2) apply. the "experiment" of hiring two coanchors of 10 outcomes: S = {(WI, W2), (Wt, W3),-(W:l, W3). (WI. Ml), (WI. M2), (W2, MI), (W2, M2), (W3. MI), (W3. M2). (Ml. M2)} Does it matter here'that the two being ruled are equivalent? If the station were seeking to hire, say, a sports anilouncer ana a weather forecaster, the number of possible outcomes would be 20: (W2. example, would represent a different staffing assignment than eM1, W2). EXAMPlE 2.2.4 nUlrno<~r of sample outcomes associated with an experiment need not be finite. Suppose that a coin is tossed until the first tail appears. If the first toss is itself a tail, the outcome is T; if the occurs on the second toss, the outcome is HT; and of course, tail may never occur, and infinite nature of S S HHHT, ... } EXAMPlE 2.2,5 There are three ways to indicate an experiment's sample space. 
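When the number of outcomes is small, a sample space like the one in Example 2.2.1 can be listed by brute-force enumeration. The short sketch below is our own illustration (the variable names are arbitrary): it builds the eight triples of Example 2.2.1 and filters out the event A.

```python
from itertools import product

# Sample space for three tosses of a coin (Example 2.2.1): all ordered
# triples of H's and T's.
S = [''.join(faces) for faces in product('HT', repeat=3)]

# Event A: a majority of the coins (at least two of the three) show heads.
A = [s for s in S if s.count('H') >= 2]

print(len(S), S)   # 8 outcomes
print(A)           # ['HHH', 'HHT', 'HTH', 'THH']
```

The same pattern of "enumerate, then filter" works for any small finite sample space, such as the 6 × 6 dice matrix of Example 2.2.2.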
If the number of possible outcomes is small, we can simply list them, as we did in Examples 2.2.1 through 2.2.3. In some cases it may be possible to characterize a sample space by showing the structure its outcomes necessarily possess. This is what we did in Example 2.2.4. A third option is to state a mathematical formula that the sample outcomes must satisfy.

A computer programmer is running a subroutine that solves a general quadratic equation, ax² + bx + c = 0. Her "experiment" consists of choosing values for the three coefficients a, b, and c. Define (1) S and (2) the event A: Equation has two equal roots.

First, we must determine the sample space. Since presumably no combinations of finite a, b, and c are inadmissible, we can characterize S by writing a series of inequalities:

S = {(a, b, c): −∞ < a < ∞, −∞ < b < ∞, −∞ < c < ∞}

Defining A requires the well-known result from algebra that a quadratic equation has equal roots if and only if its discriminant, b² − 4ac, vanishes. Membership in A, then, is contingent on a, b, and c satisfying an equation:

A = {(a, b, c): b² − 4ac = 0}

QUESTIONS

2.2.1. A graduating engineer has signed up for three job interviews. She intends to categorize each one as being either a "success" or a "failure" depending on whether it leads to a plant trip. Write out the appropriate sample space. What outcomes are in the event A: Second success occurs on third interview? In B: First success never occurs? (Hint: Notice the similarity between this situation and the coin-tossing experiment described in Example 2.2.1.)

2.2.2. Three dice are tossed, one red, one blue, and one green. What outcomes make up the event A that the sum of the three faces showing equals five?

2.2.3. An urn contains six chips numbered 1 through 6. Three are drawn out. What outcomes are in the event "Second smallest chip is a 3"? Assume that the order of the chips is irrelevant.

2.2.4. Suppose that two cards are dealt from a standard 52-card poker deck. Let A be the event that the sum of the two cards is eight (assume that aces have a numerical value of one). How many outcomes are in A?

2.2.5. In the lingo of craps-shooters (where two dice are tossed and the underlying sample space is the matrix pictured in Figure 2.2.1) is the phrase "making a hard eight." What might that mean?

2.2.6. A poker deck consists of fifty-two cards, representing thirteen denominations (2 through Ace) and four suits (diamonds, hearts, clubs, and spades). A five-card hand is called a flush if all five cards are in the same suit but not all five denominations are consecutive. Pictured next is a flush in hearts. Let N be the set of five cards in hearts that are not flushes. How many outcomes are in N? Note: In poker, the denominations (A, 2, 3, 4, 5) are considered to be consecutive (in addition to sequences such as (8, 9, 10, J, Q)).

[Figure: a flush in hearts marked with X's on a denominations (2 through A) by suits (D, H, C, S) grid.]

2.2.7. Let P be the set of right triangles with a 5" hypotenuse and whose height and length are a and b, respectively. Characterize the outcomes in P.

2.2.8. Suppose a baseball player steps to the plate with the intention of trying to "coax" a base on balls by never swinging at a pitch. The umpire, of course, will necessarily call each pitch either a ball (B) or a strike (S). What outcomes make up the event A, that a batter walks on the sixth pitch? Note: A batter "walks" if the fourth ball is called before the third strike.

2.2.9. A telemarketer is planning to set up a phone bank to bilk widows with a Ponzi scheme. His past experience (prior to his most recent incarceration) suggests that each phone will be in use half the time. For a given phone at a given time, let 0 indicate that the phone is available and let 1 indicate that a caller is on the line. Suppose that the telemarketer's "bank" is comprised of four telephones.
(a) Write out the outcomes in the sample space.
(b) What outcomes would make up the event that exactly two phones are being used?
(c) Suppose the telemarketer had k phones. How many outcomes would allow for the possibility that at most one more call could be received?

2.2.10. Two darts are thrown at the following target:

[Figure: a target divided into numbered regions.]

(a) Let (u, v) denote the outcome that the first dart lands in region u and the second in region v. List the sample space of (u, v)'s.
(b) List the outcomes in the sample space of sums, u + v.

2.2.11. A woman has her purse snatched by two teenagers. She is subsequently shown a police lineup consisting of five suspects, including the two perpetrators. What is the sample space associated with the experiment "Woman picks two suspects out of lineup"? Which outcomes are in the event A: She makes at least one incorrect identification?

2.2.12. Consider the experiment of choosing coefficients for the quadratic equation ax² + bx + c = 0. Characterize the values of a, b, and c associated with the event A: Equation has imaginary roots.

2.2.13. In the game of craps, the person rolling the dice (the shooter) wins outright if his first toss is a 7 or an 11. If his first toss is a 2, 3, or 12, he loses outright. If his first roll is something else, say, a 9, that number becomes his "point" and he keeps rolling the dice until he rolls another 9, in which case he wins, or a 7, in which case he loses. Characterize the sample outcomes contained in the event "Shooter wins with a point of 9."

2.2.14. A probability-minded despot offers a convicted murderer a final chance to gain his release. The prisoner is given twenty chips, ten white and ten black. All twenty are to be placed into two urns, according to any allocation scheme the prisoner wishes, with the one proviso being that each urn contain at least one chip. The executioner will then pick one of the two urns at random and from that urn, one chip at random. If the chip selected is white, the prisoner will be set free; if it is black, he "buys the farm." Characterize the sample space describing the prisoner's possible allocation options. (Intuitively, which allocation affords the prisoner the greatest chance of survival?)

2.2.15. Suppose that ten chips, numbered 1 through 10, are put into an urn at one minute to midnight, and chip number 1 is quickly removed.
At one-half minute to midnight, chips numbered 11 to 20 are added to the urn, and chip number 2 is quickly removed. Then at one-fourth minute to midnight, chips numbered 21 to 30 are added to the urn, and chip number 3 is quickly removed. If that procedure for adding chips to the urn continues, how many chips will be in the urn at midnight (152)?

Unions, Intersections, and Complements

Associated with events defined on a sample space are several operations collectively referred to as the algebra of sets. These are the rules that govern the ways in which one event can be combined with another. Consider, for example, the game of craps described in Question 2.2.13. The shooter wins on his initial roll if he throws either a 7 or an 11. In the language of the algebra of sets, the event "shooter rolls a 7 or an 11" is the union of two simpler events, "shooter rolls a 7" and "shooter rolls an 11." If E denotes the union and if A and B denote the two events making up the union, we write E = A ∪ B. The next several definitions and examples illustrate those portions of the algebra of sets that we will find particularly useful in the chapters ahead.

Definition 2.2.1. Let A and B be any two events defined over the same sample space S. Then
a. The intersection of A and B, written A ∩ B, is the event whose outcomes belong to both A and B.
b. The union of A and B, written A ∪ B, is the event whose outcomes belong to either A or B or both.

EXAMPLE 2.2.6

A card is drawn from a poker deck. Let A be the event that an ace is selected:

A = {ace of hearts, ace of diamonds, ace of clubs, ace of spades}

Let B be the event "Heart is drawn":

B = {2 of hearts, 3 of hearts, ..., ace of hearts}

Then

A ∩ B = {ace of hearts}

and

A ∪ B = {2 of hearts, 3 of hearts, ..., ace of hearts, ace of diamonds, ace of clubs, ace of spades}

(Let C be the event "Club is drawn." Which cards are in B ∪ C? In B ∩ C?)

EXAMPLE 2.2.7

Let A be the set of x's for which x² + 2x = 8; let B be the set for which x² + x = 6.
Find A ∩ B and A ∪ B.

Since the first equation factors into (x + 4)(x − 2) = 0, its solution set is A = {−4, 2}. Similarly, the second equation can be written (x + 3)(x − 2) = 0, making B = {−3, 2}. Therefore,

A ∩ B = {2}

and

A ∪ B = {−4, −3, 2}

EXAMPLE 2.2.8

Consider the electrical circuit pictured in Figure 2.2.2. Let Ai denote the event that switch i fails to close, i = 1, 2, 3, 4. Let A be the event "Circuit is not completed." Express A in terms of the Ai's.

[Figure 2.2.2: two parallel lines, line a containing switches 1 and 2 and line b containing switches 3 and 4.]

Call the line containing the 1 and 2 switches line a; call the line containing the 3 and 4 switches line b. By inspection, the circuit fails only if both line a and line b fail. But line a fails only if either switch 1 or switch 2 (or both) fail. That is, the event that line a fails is the union A1 ∪ A2. Similarly, the failure of line b is the union A3 ∪ A4. The event that the circuit fails, then, is an intersection:

A = (A1 ∪ A2) ∩ (A3 ∪ A4)

Definition 2.2.2. Events A and B defined over the same sample space S are said to be mutually exclusive if they have no outcomes in common, that is, if A ∩ B = ∅, where ∅ is the null set.

EXAMPLE 2.2.9

Consider a single throw of two dice. Define A to be the event that the sum of the faces showing is odd. Let B be the event that the two faces themselves are odd. Then clearly the intersection is empty, the sum of two odd numbers necessarily being even. In symbols, A ∩ B = ∅. (Recall the event B ∩ C asked for in Example 2.2.6.)

Definition 2.2.3. Let A be any event defined on a sample space S. The complement of A, written Aᶜ, is the event consisting of all the outcomes in S other than those contained in A.

EXAMPLE 2.2.10

Let A be the set of (x, y)'s for which x² + y² < 1. Sketch the region in the xy-plane corresponding to Aᶜ.

From analytic geometry, we recognize that x² + y² < 1 describes the interior of a circle of radius 1 centered at the origin. Figure 2.2.3 shows the complement: the points on the circumference of the circle and the points outside the circle.

[Figure 2.2.3: the unit circle centered at the origin, with Aᶜ shaded as the circumference and everything outside it.]

The notions of union and intersection can easily be extended to more than two events. For example, the expression A1 ∪ A2 ∪ ⋯ ∪ Ak defines the set of outcomes belonging to any of the Ai's (or to any combination of the Ai's). Similarly, A1 ∩ A2 ∩ ⋯ ∩ Ak is the set of outcomes belonging to all of the Ai's.

EXAMPLE 2.2.11

Suppose the events A1, A2, ..., Ak are intervals of real numbers such that

Ai = {x: 0 ≤ x < 1/i},  i = 1, 2, ..., k

Describe the sets A1 ∪ A2 ∪ ⋯ ∪ Ak and A1 ∩ A2 ∩ ⋯ ∩ Ak.

Notice that the Ai's are telescoping sets. That is, A1 is the interval 0 ≤ x < 1, A2 is the interval 0 ≤ x < 1/2, and so on. It follows, then, that the union of the k Ai's is simply A1, while the intersection of the Ai's (that is, their overlap) is Ak.

QUESTIONS

2.2.16. Sketch the regions in the xy-plane corresponding to A ∪ B and A ∩ B if

A = {(x, y): 0 < x < 3, 0 < y < 3}  and  B = {(x, y): 2 < x < 4, 2 < y < 4}

2.2.17. Referring to Example 2.2.7, find A ∩ B and A ∪ B if the two equations were replaced by inequalities: x² + 2x ≤ 8 and x² + x ≤ 6.

2.2.18. Find A ∩ B ∩ C if A = {x: 0 ≤ x ≤ 4}, B = {x: 2 ≤ x ≤ 6}, and C = {x: x = 0, 1, 2, ...}.

2.2.19. An electronic system has four components divided into two pairs. The two components of each pair are wired in parallel; the two pairs are wired in series. Let Aij denote the event "ith component in jth pair fails," i = 1, 2; j = 1, 2. Let A be the event "System fails." Write A in terms of the Aij's.

[Figure: the two parallel pairs, j = 1 and j = 2, wired in series.]

2.2.20. Let A = {x: 0 ≤ x ≤ 1}, B = {x: 0 ≤ x ≤ 3}, and C = {x: −1 ≤ x ≤ 2}. Draw diagrams showing each of the following sets of points:
(a) Aᶜ ∩ B ∩ C
(b) Aᶜ ∪ (B ∩ C)
(c) A ∩ B ∩ Cᶜ
(d) [(A ∪ B) ∩ Cᶜ]ᶜ

2.2.21. Let A be the set of five-card hands dealt from a 52-card poker deck, where the denominations of the five cards are all consecutive, for example, (7 of hearts, 8 of spades, 9 of spades, 10 of hearts, jack of diamonds). Let B be the set of five-card hands where the suits of the five cards are all the same. How many outcomes are in the event A ∩ B?

2.2.22. Suppose that each of the twelve letters in the word TESSELLATION is written on a chip. Define the events F, R, and V as follows:

F: letters in first half of alphabet
R: letters that are repeated
V: letters that are vowels

Which chips make up the following events:
(a) F ∩ R ∩ V
(b) Fᶜ ∩ R ∩ Vᶜ
(c) F ∩ Rᶜ ∩ V

2.2.23. Let A, B, and C be any three events defined on a sample space S. Show that
(a) the outcomes in A ∪ (B ∩ C) are the same as the outcomes in (A ∪ B) ∩ (A ∪ C)
(b) the outcomes in A ∩ (B ∪ C) are the same as the outcomes in (A ∩ B) ∪ (A ∩ C)

2.2.24. Let A1, A2, ..., Ak be any set of events defined on a sample space S. What outcomes belong to the event

(A1 ∪ A2 ∪ ⋯ ∪ Ak) ∪ (A1ᶜ ∩ A2ᶜ ∩ ⋯ ∩ Akᶜ)

2.2.25. Let A, B, and C be any three events defined on a sample space S. Show that the operations of union and intersection are associative by proving that
(a) A ∪ (B ∪ C) = (A ∪ B) ∪ C = A ∪ B ∪ C
(b) A ∩ (B ∩ C) = (A ∩ B) ∩ C = A ∩ B ∩ C

2.2.26. Suppose that three events, A, B, and C, are defined on a sample space S. Use the union, intersection, and complement operations to represent each of the following events:
(a) none of the three events occurs
(b) all three of the events occur
(c) only event A occurs
(d) exactly one event occurs
(e) exactly two events occur

2.2.27. What must be true of events A and B if
(a) A ∪ B = B
(b) A ∩ B = A

2.2.28. Let events A and B and sample space S be defined as the following intervals:

S = {x: 0 ≤ x ≤ 10}
A = {x: 0 < x < 5}
B = {x: 3 ≤ x ≤ 7}

Characterize the following events:
(a) Aᶜ
(b) A ∩ B
(c) A ∪ B
(d) A ∩ Bᶜ
(e) Aᶜ ∪ B
(f) Aᶜ ∩ Bᶜ

2.2.29. A coin is tossed four times and the resulting sequence of Heads and/or Tails is recorded. Define the events A, B, and C as follows:

A: Exactly two heads appear
B: Heads and tails alternate
C: First two tosses are heads

(a) Which events, if any, are mutually exclusive?
(b) Which events, if any, are subsets of other sets?

2.2.30. Pictured below are two organizational charts describing the way upper management vets new proposals. For both models, three vice presidents, 1, 2, and 3, each voice an opinion.

[Figure: chart (a) routes the proposal through the three vice presidents in series; chart (b) routes it through the three in parallel.]

For (a), all three must concur if the proposal is to pass; for (b), the proposal passes if any one of the three favors it. Let Ai denote the event that vice president i favors the proposal, i = 1, 2, 3, and let A denote the event that the proposal passes. Express A in terms of the Ai's for the two office protocols. Under what sorts of situations might one system be preferable to the other?

Expressing Events Graphically: Venn Diagrams

Relationships based on two or more events can sometimes be difficult to express using only equations or verbal descriptions. An alternative approach that can be highly effective is to represent the underlying events graphically in a format known as a Venn diagram. Figure 2.2.4 shows Venn diagrams for an intersection, a union, a complement, and two events that are mutually exclusive. In each case, the shaded interior of a region corresponds to the desired event.

[Figure 2.2.4: Venn diagrams of A ∩ B, A ∪ B, Aᶜ, and the mutually exclusive case A ∩ B = ∅.]

EXAMPLE 2.2.12

When two events A and B are defined on a sample space, we will frequently need to consider
a. the event that exactly one (of the two) occurs
b. the event that at most one (of the two) occurs

Getting expressions for each of these is easy if we visualize the corresponding Venn diagrams. The shaded area in Figure 2.2.5 represents the event E that either A or B, but not both, occurs (that is, exactly one occurs).

[Figure 2.2.5: the portions of A and B outside their intersection, shaded.]

Just by looking at the diagram we can formulate an expression for E. The portion of A, for example, included in E is A ∩ Bᶜ. Similarly, the portion of B included in E is B ∩ Aᶜ. It follows that E can be written as a union:

E = (A ∩ Bᶜ) ∪ (B ∩ Aᶜ)

(Convince yourself that an equivalent expression for E is (A ∩ B)ᶜ ∩ (A ∪ B).)

Figure 2.2.6 shows the event F that at most one (of the two events) occurs.
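The Venn-diagram expressions of Example 2.2.12 can be checked mechanically on any finite sample space. In the sketch below (our own illustration), S is the two-dice sample space, A and B are two arbitrarily chosen events, and Python's set operators play the roles of ∩, ∪, and complementation.

```python
from itertools import product

S = set(product(range(1, 7), repeat=2))   # two-dice sample space, 36 outcomes
A = {s for s in S if sum(s) == 7}         # event: sum of the faces is 7
B = {s for s in S if s[0] % 2 == 1}       # event: first die shows an odd face

def complement(E):
    """The complement of E relative to the sample space S."""
    return S - E

# E: exactly one of A, B occurs  ->  (A ∩ Bᶜ) ∪ (B ∩ Aᶜ)
exactly_one = (A & complement(B)) | (B & complement(A))

# F: at most one of A, B occurs  ->  (A ∩ B)ᶜ
at_most_one = complement(A & B)

# The equivalent expression noted in the text: (A ∩ B)ᶜ ∩ (A ∪ B)
assert exactly_one == complement(A & B) & (A | B)
print(len(exactly_one), len(at_most_one))
```

Because every identity in the algebra of sets is a statement about membership, this kind of brute-force check works for any of the set equalities in the questions above, such as DeMorgan's laws.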
Since the latter includes every outcome except those belonging to both A and B, we can write

F = (A ∩ B)ᶜ

[Figure 2.2.6: everything in S except the intersection A ∩ B, shaded.]

EXAMPLE 2.2.13

When Swampwater Tech's Class of '64 held its fortieth reunion, one hundred grads attended. Fifteen of those alumni were lawyers and rumor had it that thirty of the one hundred were psychopaths. If ten alumni were both lawyers and psychopaths, how many suffered from neither of those afflictions?

Let L be the set of lawyers and H, the set of psychopaths. If the symbol N(Q) is defined to be the number of members in set Q, then

N(S) = 100
N(L) = 15
N(H) = 30
N(L ∩ H) = 10

Summarizing all this information is the Venn diagram in Figure 2.2.7. Notice that

N(L ∪ H) = number of alumni suffering from at least one affliction
= 5 + 10 + 20
= 35

which implies that 100 − 35 = 65 attendees were neither lawyers nor psychopaths. In effect,

N(L ∪ H) = N(L) + N(H) − N(L ∩ H)  [= 15 + 30 − 10 = 35]

[Figure 2.2.7: a Venn diagram showing 5 outcomes in L only, 10 in L ∩ H, and 20 in H only.]

QUESTIONS

2.2.31. During orientation week, the latest Spiderman movie was shown twice at State University. Among the entering class of 6000 freshmen, 850 went to see it the first time, 690 the second time, while 4700 failed to see it either time. How many saw it twice?

2.2.32. Let A and B be any two events. Use Venn diagrams to show that
(a) the complement of their intersection is the union of their complements:

(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

(b) the complement of their union is the intersection of their complements:

(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ

(These two results are known as DeMorgan's laws.)

2.2.33. Let A, B, and C be any three events. Use Venn diagrams to show that
(a) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
(b) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

2.2.34. Let A, B, and C be any three events. Use Venn diagrams to show that
(a) A ∪ (B ∪ C) = (A ∪ B) ∪ C
(b) A ∩ (B ∩ C) = (A ∩ B) ∩ C

2.2.35. Let A and B be any two events defined on a sample space S. Which of the following sets are necessarily subsets of which other sets?

A ∩ B

2.2.36. Use Venn diagrams to suggest an equivalent way of representing the following events:
(a) (A ∩ Bᶜ)ᶜ
(b) B ∪ (A ∪ B)ᶜ
(c) A ∩ (A ∩ B)ᶜ

2.2.37. A total of twelve hundred graduates of State Tech have gotten into medical school in the past several years. Of that number, one thousand earned scores of twenty-seven or higher on the MCAT and four hundred had GPAs that were 3.5 or higher. Moreover, three hundred had MCATs that were twenty-seven or higher and GPAs that were 3.5 or higher. What proportion of those twelve hundred graduates got into medical school with an MCAT lower than twenty-seven and a GPA below 3.5?

2.2.38. Let A, B, and C be any three events defined on a sample space S. Let N(A), N(B), N(C), N(A ∩ B), N(A ∩ C), N(B ∩ C), and N(A ∩ B ∩ C) denote the numbers of outcomes in all the different intersections in which A, B, and C are involved. Use a Venn diagram to suggest a formula for N(A ∪ B ∪ C). Hint: Start with the sum N(A) + N(B) + N(C) and use the Venn diagram to identify the adjustments that need to be made to that sum before it can equal N(A ∪ B ∪ C). As a precedent, recall from p. 35 that N(A ∪ B) = N(A) + N(B) − N(A ∩ B). There, in the case of two events, subtracting N(A ∩ B) is the "adjustment."

2.2.39. A poll conducted by a potential presidential candidate asked two questions: (1) Do you support the candidate's position on taxes? and (2) Do you support the candidate's position on homeland security? A total of twelve hundred responses were received; six hundred said "yes" to the first question and four hundred said "yes" to the second. If three hundred said "no" to the taxes question and "no" to the homeland security question, how many said "yes" to the taxes question but "no" to the homeland security question?

2.2.40. For two events A and B defined on a sample space S, N(A ∩ Bᶜ) = 15, N(Aᶜ ∩ B) = 50, and N(A ∩ B) = 2. Given that N(S) = 120, how many outcomes belong to neither A nor B?

2.3 THE PROBABILITY FUNCTION

Having introduced in Section 2.2 the twin concepts of "experiment" and "sample space," we are now ready to pursue in a formal way the all-important problem of assigning a probability to an experiment's outcome and, more generally, to an event. Specifically, if A is any event defined on a sample space S, the symbol P(A) will denote the probability of A, and we will refer to P as the probability function. It is, in effect, a mapping from a set (i.e., an event) to a number. The backdrop for our development will be the unions, intersections, and complements of set theory; the starting point will be the axioms referred to in Section 2.1 that were set forth by Kolmogorov.

If S has a finite number of members, Kolmogorov showed that as few as three axioms are necessary and sufficient for characterizing the probability function P:

Axiom 1. Let A be any event defined over S. Then P(A) ≥ 0.
Axiom 2. P(S) = 1.
Axiom 3. Let A and B be any two mutually exclusive events defined over S. Then

P(A ∪ B) = P(A) + P(B)

When S has an infinite number of members, a fourth axiom is needed:

Axiom 4. Let A1, A2, ..., be events defined over S. If Ai ∩ Aj = ∅ for each i ≠ j, then

P(A1 ∪ A2 ∪ ⋯) = P(A1) + P(A2) + ⋯

From these simple statements come the general rules for manipulating the probability function, rules that apply no matter what specific mathematical form the function may take in a particular context.

Some Basic Properties of P

Some of the immediate consequences of Kolmogorov's axioms are given in Theorems 2.3.1 through 2.3.6. Despite their simplicity, several of these properties, as we will soon see, prove to be extremely useful in solving all sorts of problems.

Theorem 2.3.1. P(Aᶜ) = 1 − P(A).

Proof. By Axiom 2 and Definition 2.2.3,

P(S) = 1 = P(A ∪ Aᶜ)

But A and Aᶜ are mutually exclusive, so

P(A ∪ Aᶜ) = P(A) + P(Aᶜ)

and the result follows. □

Theorem 2.3.2. P(∅) = 0.

Proof. Since ∅ = Sᶜ, P(∅) = P(Sᶜ) = 1 − P(S) = 0. □

Theorem 2.3.3. If A ⊂ B, then P(A) ≤ P(B).

Proof. Note that the event B may be written in the form

B = A ∪ (B ∩ Aᶜ)

where A and (B ∩ Aᶜ) are mutually exclusive. Therefore,

P(B) = P(A) + P(B ∩ Aᶜ)

which implies that P(B) ≥ P(A), since P(B ∩ Aᶜ) ≥ 0. □

Theorem 2.3.4. For any event A, P(A) ≤ 1.

Proof. The proof follows immediately from Theorem 2.3.3 because A ⊂ S and P(S) = 1. □
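For a finite sample space with equally likely outcomes, taking P(A) = N(A)/N(S) satisfies Kolmogorov's axioms, and the consequences in Theorems 2.3.1 through 2.3.4 can be checked numerically. The sketch below is our own illustration, using a single die toss as the test case; the events A and B are arbitrary choices.

```python
from fractions import Fraction

S = frozenset(range(1, 7))                 # one toss of a fair die

def P(E):
    """Equally likely probability function: N(E)/N(S)."""
    return Fraction(len(E), len(S))

A = {2, 4, 6}                              # even face
B = {2, 3, 4, 5, 6}                        # face greater than 1, so A ⊂ B

assert P(S) == 1                           # Axiom 2
assert P(S - A) == 1 - P(A)                # Theorem 2.3.1: P(Aᶜ) = 1 − P(A)
assert P(set()) == 0                       # Theorem 2.3.2: P(∅) = 0
assert P(A) <= P(B)                        # Theorem 2.3.3: A ⊂ B implies P(A) ≤ P(B)
assert P(A) <= 1                           # Theorem 2.3.4
print(P(A))  # 1/2
```

Using exact fractions rather than floating point keeps the equalities in the axioms literal rather than approximate.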
Theorem 2.3.5. Let A1, A2, ..., An be events defined over S. If Ai ∩ Aj = ∅ for i ≠ j, then

P(A1 ∪ A2 ∪ ⋯ ∪ An) = P(A1) + P(A2) + ⋯ + P(An)

Proof. The proof is a straightforward induction argument with Axiom 3 being the starting point. □

Theorem 2.3.6. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Proof. The Venn diagram for A ∪ B certainly suggests that the statement of the theorem is true (recall Figure 2.2.4). More formally, we have from Axiom 3 that

P(A) = P(A ∩ Bᶜ) + P(A ∩ B)

and

P(B) = P(B ∩ Aᶜ) + P(A ∩ B)

Adding these two equations gives

P(A) + P(B) = [P(A ∩ Bᶜ) + P(B ∩ Aᶜ) + P(A ∩ B)] + P(A ∩ B)

By Theorem 2.3.5, the sum in the brackets is P(A ∪ B). If we subtract P(A ∩ B) from both sides of the equation, the result follows. □

EXAMPLE 2.3.1

Let A and B be two events defined on a sample space S such that P(A) = 0.3, P(B) = 0.5, and P(A ∪ B) = 0.7. Find (a) P(A ∩ B), (b) P(Aᶜ ∪ Bᶜ), and (c) P(Aᶜ ∩ B).

a. Transposing the terms in Theorem 2.3.6 yields a general formula for the probability of an intersection:

P(A ∩ B) = P(A) + P(B) − P(A ∪ B)

Here

P(A ∩ B) = 0.3 + 0.5 − 0.7 = 0.1

b. The two cross-hatched regions in Figure 2.3.1 correspond to Aᶜ and Bᶜ. The union of Aᶜ and Bᶜ consists of those regions that have cross-hatching in either or both directions. By inspection, the only portion of S not included in Aᶜ ∪ Bᶜ is the intersection, A ∩ B. By Theorem 2.3.1, then,

P(Aᶜ ∪ Bᶜ) = 1 − P(A ∩ B) = 1 − 0.1 = 0.9

[Figure 2.3.1: A and B on S, with Aᶜ and Bᶜ cross-hatched in opposite directions.]

c. The event Aᶜ ∩ B corresponds to the region in Figure 2.3.2 where the cross-hatching extends in both directions, that is, everywhere in B except the intersection with A. Therefore,

P(Aᶜ ∩ B) = P(B) − P(A ∩ B) = 0.5 − 0.1 = 0.4

[Figure 2.3.2: the portion of B outside A, cross-hatched in both directions.]

EXAMPLE 2.3.2

Show that

P(A ∩ B) ≥ 1 − P(Aᶜ) − P(Bᶜ)

for any two events A and B defined on a sample space S.

From Example 2.3.1(a) and Theorem 2.3.1,

P(A ∩ B) = P(A) + P(B) − P(A ∪ B)
= 1 − P(Aᶜ) + 1 − P(Bᶜ) − P(A ∪ B)

But P(A ∪ B) ≤ 1 from Theorem 2.3.4, so

P(A ∩ B) ≥ 1 − P(Aᶜ) − P(Bᶜ)

EXAMPLE 2.3.3

Two cards are drawn from a poker deck without replacement. What is the probability that the second is higher in rank than the first?

Let A1, A2, and A3 be the events "First card is lower in rank," "First card is higher in rank," and "Both cards have the same rank," respectively. Clearly, the three Ai's are mutually exclusive and they account for all possible outcomes, so from Theorem 2.3.5,

P(A1 ∪ A2 ∪ A3) = P(A1) + P(A2) + P(A3) = 1

Once the first card is drawn, there are three choices for the second that would have the same rank, that is, P(A3) = 3/51. Moreover, symmetry demands that P(A1) = P(A2), so

2P(A2) + 3/51 = 1

implying that P(A2) = 8/17.

EXAMPLE 2.3.4

In a newly released martial arts film, the actress playing the lead role has a stunt double who handles all of the physically dangerous action scenes. According to the script, the actress appears in 40% of the film's scenes, her double appears in 30%, and the two of them are together in a scene 5% of the time. What is the probability that in a given scene (a) only the stunt double appears and (b) neither the lead actress nor the double appears?

a. If L is the event "Lead actress appears in scene" and D is the event "Double appears in scene," we are given that P(L) = 0.40, P(D) = 0.30, and P(L ∩ D) = 0.05. It follows that

P(Only double appears) = P(D) − P(L ∩ D) = 0.30 − 0.05 = 0.25

b. The event "Neither appears" is the complement of the event "At least one appears." But P(At least one appears) = P(L ∪ D). From Theorem 2.3.1 and Theorem 2.3.6, then,

P(Neither appears) = 1 − P(L ∪ D)
= 1 − [P(L) + P(D) − P(L ∩ D)]
= 1 − [0.40 + 0.30 − 0.05]
= 0.35

EXAMPLE 2.3.5

Having endured (and survived) the mental trauma that comes from taking two years of chemistry, a year of physics, and a year of biology, Biff decides to test the medical school waters and sends his MCATs to two colleges, X and Y. Based on how his friends have fared, he estimates that his probability of being accepted at X is 0.7, and at Y is 0.4. He also suspects there is a 75% chance that at least one of his applications will be rejected. What is the probability that he gets at least one acceptance?
Let A be the event "School X accepts him" and B, the event "School Y accepts him." We are given that P(A) = 0.7, P(B) = 0.4, and P(Aᶜ ∪ Bᶜ) = 0.75. The question is asking for P(A ∪ B).

From Theorem 2.3.6,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Recall from Question 2.2.32 that Aᶜ ∪ Bᶜ = (A ∩ B)ᶜ. It follows that

P(A ∩ B) = 1 − P[(A ∩ B)ᶜ] = 1 − 0.75 = 0.25

Biff's prospects, then, are not all that bleak: he has an 85% chance of getting in somewhere:

P(A ∪ B) = 0.7 + 0.4 − 0.25 = 0.85

Comment. Notice that P(A ∪ B) varies directly with P(Aᶜ ∪ Bᶜ):

P(A ∪ B) = P(A) + P(B) − (1 − P(Aᶜ ∪ Bᶜ))
= P(A) + P(B) − 1 + P(Aᶜ ∪ Bᶜ)

If P(A) and P(B) are fixed, we get the curious result that Biff's chances of at least one acceptance increase if his chances of at least one rejection increase.

QUESTIONS

2.3.1. According to a family-oriented lobbying group, there is too much crude language and violence on television. Forty-two percent of the programs they screened had language they found offensive, 27% were too violent, and 10% were considered excessive in both language and violence. What percentage of programs did comply with the group's standards?

2.3.2. Let A and B be any two events defined on S. Suppose that P(A) = 0.4, P(B) = 0.5, and P(A ∩ B) = 0.1. What is the probability that A or B but not both occur?

2.3.3. Express the following probabilities in terms of P(A), P(B), and P(A ∩ B).
(a) P(Aᶜ ∪ Bᶜ)
(b) P(Aᶜ ∩ (A ∪ B))

2.3.4. Let A and B be two events defined on S. If the probability that at least one of them occurs is 0.3 and the probability that A occurs but B does not occur is 0.1, what is P(B)?

2.3.5. Suppose that three fair dice are tossed. Let Ai be the event that a 6 shows on the ith die, i = 1, 2, 3. Does P(A1 ∪ A2 ∪ A3) = 1/2? Explain.

2.3.6. Events A and B are defined on a sample space S such that P((A ∪ B)ᶜ) = 0.6 and P(A ∩ B) = 0.2. What is the probability that either A or B but not both will occur?

2.3.7. Let A1, A2, ..., An be a series of events for which Ai ∩ Aj = ∅ if i ≠ j and A1 ∪ A2 ∪ ⋯ ∪ An = S. Let B be any event defined on S. Express B as a union of intersections.

2.3.8. Draw the Venn diagrams that would correspond to the equations (a) P(A ∩ B) = P(B) and (b) P(A ∪ B) = P(B).

2.3.9. In the game of "odd man out" each player tosses a fair coin.
If all the coins turn up the same except for one, the player tossing the different coin is declared the odd man out and is eliminated from the contest. Suppose that three people are playing. What is the probability that someone will be eliminated on the first toss? (Hint: Use Theorem 2.3.1.)

2.3.10. An urn contains twenty-four chips, numbered 1 through 24. One is drawn at random. Let A be the event that the number is divisible by two and let B be the event that the number is divisible by three. Find P(A ∪ B).

2.3.11. If State's football team has a 10% chance of winning Saturday's game, a 30% chance of winning two weeks from now, and a 65% chance of losing both games, what are their chances of winning exactly once?

2.3.12. Events A1 and A2 are such that A1 ∪ A2 = S and A1 ∩ A2 = ∅. Find p2 if P(A1) = p1, P(A2) = p2, and 3p1 − p2 = 1/2.

2.3.13. Consolidated Industries has come under pressure to eliminate its seemingly discriminatory hiring practices. Company officials have agreed that 60% of their new employees will be females and 30% will be minorities. One out of four new employees, though, will be a white male. What percentage of their new hires will be minority females?

2.3.14. Three events, A, B, and C, are defined on a sample space, S. Given that P(A) = 0.2, P(B) = 0.1, and P(C) = 0.3, what is the smallest possible value for P[(A ∪ B ∪ C)ᶜ]?

2.3.15. A coin is to be tossed four times. Define events X and Y such that

X: first and last coins have opposite faces
Y: exactly two heads appear

Assume that each of the sixteen Head/Tail sequences has the same probability. Evaluate
(a) P(Xᶜ ∩ Y)
(b) P(X ∩ Yᶜ)

2.3.16. Two dice are tossed. Assume that each possible outcome has a 1/36 probability. Let A be the event that the sum of the faces showing is 6, and let B be the event that the face showing on one die is twice the face showing on the other. Calculate P(A ∩ Bᶜ).

2.3.17. Let A, B, and C be three events defined on a sample space S. Arrange the probabilities of the following events from smallest to largest:
(a) A ∪ B
(b) A ∩ B
(c) A
(d) S
(e) (A ∩ B) ∪ (A ∩ C)

2.3.18. A swindler is currently running two dot-com scams out of a bogus chatroom.
She estimates that the chances of the first one leading to her arrest are one in ten; the "risk" associated with the second is more on the order of one in thirty. She considers the likelihood that she gets busted for both to be 0.0025. What are her chances of avoiding incarceration?

2.4 CONDITIONAL PROBABILITY

In Section 2.3, we calculated the probabilities of certain events by manipulating other probabilities whose values we were given. Knowing P(A), P(B), and P(A ∩ B), for example, allows us to calculate P(A ∪ B) (recall Theorem 2.3.6). For many real-world situations, though, the "givens" in a probability problem go beyond simply knowing a set of other probabilities. Sometimes, we know for a fact that certain events have already occurred, and those occurrences may have a bearing on the probability we are trying to find. In short, the probability of an event A may have to be "adjusted" if we know for certain that some related event B has already occurred. Any probability that is revised to take into account the (known) occurrence of other events is said to be a conditional probability.

Consider a fair die being tossed, with A defined as the event "6 appears." Clearly, P(A) = 1/6. But suppose the die has already been tossed, by someone who refuses to tell us whether or not A occurred but does enlighten us to the point of confirming that B occurred, where B is the event "Even number appears." What are the chances of A now? Here, common sense can help us: There are three equally likely even numbers making up the event B, and one of them satisfies the event A, so the "updated" probability is 1/3.

Notice that the effect of the additional information, such as the knowledge that B has occurred, is to revise, indeed to shrink, the original sample space S to a new set of outcomes S′. In this example, the original S contained six outcomes, the conditional sample space three (see Figure 2.4.1).

[Figure 2.4.1: the six faces of the die, with B = {2, 4, 6} serving as the revised sample space S′; P(6, relative to S) = 1/6, while P(6, relative to S′) = 1/3.]

The symbol P(A|B), read "the probability of A given B," is used to denote a conditional probability. Specifically, P(A|B) refers to the probability that A will occur given that B has already occurred.
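The shrinking-sample-space picture of Figure 2.4.1 can be reproduced by direct counting. In this sketch (our own illustration), conditioning on B simply means discarding every outcome outside B and recounting on the revised sample space.

```python
from fractions import Fraction

S = set(range(1, 7))       # one toss of a fair die
A = {6}                    # event: "6 appears"
B = {2, 4, 6}              # event: "even number appears"

def cond_prob(A, B):
    """P(A|B) computed by counting on the revised sample space S' = B."""
    return Fraction(len(A & B), len(B))

print(Fraction(len(A), len(S)))   # unconditional: P(A) = 1/6
print(cond_prob(A, B))            # conditional:   P(A|B) = 1/3
```

Counting only within B is exactly what the ratio c/b in the derivation that follows formalizes.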
It will be convenient to have a formula for P(A|B) that can be evaluated in terms of the original S, rather than the revised S′. Suppose S is a sample space with n outcomes, all equally likely. Assume that A and B are two events containing a and b outcomes, respectively, and let c denote the number of outcomes in the intersection of A and B (see Figure 2.4.2). Based on the argument suggested in Figure 2.4.1, the conditional probability of A given B is the ratio of c to b. But c/b can be written as the quotient of two other ratios:

c/b = (c/n)/(b/n)

so, for this particular case,

P(A|B) = P(A ∩ B)/P(B)    (2.4.1)

The same underlying reasoning that leads to Equation 2.4.1, though, holds true even when the outcomes are not equally likely or when S is uncountably infinite.

[Figure 2.4.1: P(6, relative to S) = 1/6. Figure 2.4.2: P(6, relative to S′) = 1/3.]

Definition 2.4.1. Let A and B be any two events defined on S such that P(B) > 0. The conditional probability of A, assuming that B has already occurred, is written P(A|B) and is given by

P(A|B) = P(A ∩ B)/P(B)

Comment. Definition 2.4.1 can be cross-multiplied to give a frequently useful expression for the probability of an intersection. If P(A|B) = P(A ∩ B)/P(B), then

P(A ∩ B) = P(A|B)P(B).    (2.4.2)

EXAMPLE 2.4.1
A card is drawn from a poker deck. What is the probability that the card is a club, given that the card is a king?

Intuitively, the answer is 1/4: The king is equally likely to be a heart, diamond, club, or spade. More formally, let C be the event "Card is a club," and let K be the event "Card is a king." By Definition 2.4.1,

P(C|K) = P(C ∩ K)/P(K)

But P(K) = 4/52 and

P(C ∩ K) = P(card is the king of clubs) = 1/52

Therefore, confirming our intuition,

P(C|K) = (1/52)/(4/52) = 1/4

[Notice in this example that the conditional probability P(C|K) is the same as the unconditional probability P(C)—they both equal 1/4. This means that our knowledge that K has occurred gives us no additional insight about the chances of C occurring. Two events having this property are said to be independent. We will examine the notion of independence and its consequences in Section 2.5.]
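Definition 2.4.1 can likewise be verified by direct counting on the card example. A minimal sketch (not from the text):

```python
from fractions import Fraction
from itertools import product

# Build the 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

# P(C|K): restrict the sample space to the 4 kings; exactly 1 is a club.
kings = [c for c in deck if c[0] == "K"]
p_club_given_king = Fraction(sum(1 for c in kings if c[1] == "clubs"), len(kings))

# The unconditional P(C) comes out the same, hinting at independence.
p_club = Fraction(sum(1 for c in deck if c[1] == "clubs"), len(deck))
print(p_club_given_king, p_club)  # 1/4 1/4
```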
EXAMPLE 2.4.2
Our intuitions can often be fooled by conditional probability problems, even ones that appear to be simple and straightforward. The "two-boy problem" here is an often-cited case in point. Consider the set of families having two children, and assume that the four possible birth sequences—(younger child is a boy, older child is a boy), (younger child is a boy, older child is a girl), and so on—are equally likely. What is the probability that both children are boys given that at least one is a boy?

The answer is not 1/2. The correct answer can be deduced from Definition 2.4.1. By assumption, each of the four birth sequences—(b, b), (b, g), (g, b), and (g, g)—has a 1/4 probability of occurring. Let A be the event that both children are boys, and let B be the event that at least one child is a boy. Then

P(A|B) = P(A ∩ B)/P(B) = P(A)/P(B)

since A is a subset of B (so the overlap between A and B is just A). But A has one outcome, {(b, b)}, and B has three outcomes, {(b, g), (g, b), (b, b)}. Applying Definition 2.4.1, then, gives

P(A|B) = (1/4)/(3/4) = 1/3

Another correct approach is to go back to the sample space and deduce the value of P(A|B) from first principles. Figure 2.4.3 shows the events A and B defined on the four family types that comprise the sample space S. Knowing that B has occurred redefines the sample space to include three outcomes, each now having a 1/3 probability. Of those possible outcomes, one—namely, (b, b)—satisfies the event A. It follows that P(A|B) = 1/3.

[Figure 2.4.3: S = sample space of two-child families, outcomes written as (first born, second born); A = {(b, b)}; B = {(b, b), (b, g), (g, b)}; (g, g) lies outside both events.]

EXAMPLE 2.4.3
Two events A and B are defined such that (1) the probability that A occurs but B does not occur is 0.2, (2) the probability that B occurs but A does not occur is 0.1, and (3) the probability that neither occurs is 0.6. What is P(A|B)?

The three events whose probabilities are given are shown on the Venn diagram in Figure 2.4.4.
Since

P(neither occurs) = 0.6 = P((A ∪ B)^c)

it follows that

P(A ∪ B) = 1 − 0.6 = 0.4 = P(A ∩ B^c) + P(B ∩ A^c) + P(A ∩ B)

so

P(A ∩ B) = 0.4 − 0.2 − 0.1 = 0.1

[Figure 2.4.4: Venn diagram on S showing P(A ∩ B^c) = 0.2, P(A ∩ B) = 0.1, P(A^c ∩ B) = 0.1, and "Neither A nor B" = 0.6.]

From Definition 2.4.1, then,

P(A|B) = P(A ∩ B)/[P(A ∩ B) + P(B ∩ A^c)] = 0.1/(0.1 + 0.1) = 0.5

EXAMPLE 2.4.4
The possibility of importing liquefied natural gas (LNG) has been suggested as one way of coping with a future energy crunch. Complicating matters, though, is the fact that LNG is highly volatile, and a major spill occurring near a U.S. port could result in a disaster. The question, therefore, of the likelihood of a spill becomes critical input for future policymakers who may have to decide whether or not to implement the proposal. Two numbers need to be taken into account: (1) the probability that a tanker will have an accident, and (2) the probability that a major spill will develop given that an accident has occurred. Although no significant spills of LNG have yet occurred anywhere in the world, these probabilities can be approximated from records kept on tankers transporting less dangerous cargo. On the basis of such data, it has been estimated (44) that the probability is 8/50,000 that an LNG tanker will have an accident on any one trip. Given that an accident has occurred, it is suspected that only 3 times in 15,000 will the damage be sufficiently severe that a major spill would develop. What are the chances that a given LNG shipment would precipitate a disaster?

Let A denote the event "Spill develops" and let B denote the event "Accident occurs." Past experience suggests that P(B) = 8/50,000 and P(A|B) = 3/15,000. Of primary concern is the probability that an accident will occur and a spill will ensue—that is, P(A ∩ B). Using Equation 2.4.2, we find that the chances of a disaster are on the order of 3 in 100 million:

P(accident occurs and spill develops) = P(A ∩ B) = P(A|B)P(B) = (3/15,000)(8/50,000) = 0.000000032

EXAMPLE 2.4.5
Max and Muffy are two myopic deer hunters who shoot simultaneously at a nearby sheepdog they have mistaken for a 10-point buck.
Based on years of well-documented ineptitude, it can be assumed that Max has a 20% chance of hitting a stationary target at close range, Muffy has a 30% chance, and the probability is 0.06 that they would both be on target. Suppose that the sheepdog is hit and killed by exactly one bullet. What is the probability that Muffy fired the fatal shot?

Let A be the event that Max hit the dog, and let B be the event that Muffy hit the dog. Then P(A) = 0.2, P(B) = 0.3, and P(A ∩ B) = 0.06. We are trying to find

P(B|(A^c ∩ B) ∪ (A ∩ B^c))

where the event (A^c ∩ B) ∪ (A ∩ B^c) is the union of A and B minus the intersection—that is, it represents the event that either A or B, but not both, occur (recall Figure 2.4.4). Notice from Figure 2.4.4 that the intersection of B and (A^c ∩ B) ∪ (A ∩ B^c) is the event A^c ∩ B. Therefore, from Definition 2.4.1,

P(B|(A^c ∩ B) ∪ (A ∩ B^c)) = [P(A^c ∩ B)]/[P{(A^c ∩ B) ∪ (A ∩ B^c)}]
  = [P(B) − P(A ∩ B)]/[P(A ∪ B) − P(A ∩ B)]
  = [0.3 − 0.06]/[0.2 + 0.3 − 0.06 − 0.06]
  = 0.63

CASE STUDY 2.4.1 (Optional)

There once was a brainy baboon
Who always breathed down a bassoon
For he said, "It appears
That in billions of years
I shall certainly hit on a tune."
    —Eddington

(Continued on next page)

A GO THIS BABE AND JUDGEMENT OF TIMEDIOUS RETCH AND NOT LORD WHAL IF THE EASELVES AND DO AND MAKE AND BASE GATHEM I AY BEATELLOUS WE PLAY MEANS HOLY FOOL MOUR WORK FROM INMOST BED BE CONFOULD HAVE MANY JUDGEMENT WAS IT YOU MASSURE'S TO LADY WOULD HAT PRIME THAT'S OUR THROWN AND DID WIFE FATHER'ST LIVENCTH SLEEP TITH I AMBITION TO THIN HIM AND FORCE AND LAW'S MAY BUT SMELL SO AND SPURSELY SIGNOR GENT MUCH CHIEF MIXTURN

FIGURE 2.4.6

One can only wonder how "human" computer-generated text might be if conditional probabilities for, say, seven- or eight-letter sequences were available. Right now they are not, but the likelihood is that computer technology soon will be up to the task.
When that day comes, our monkey will probably still never come up with text as creative as Hamlet's soliloquy, but something fairly decent might show up from time to time!

CASE STUDY 2.4.2 (Optional)

Several years ago, a television program (inadvertently) spawned a conditional probability problem that led to more than a few heated discussions, even in the national media. The show was Let's Make a Deal, and the question involved the strategy that contestants should take to maximize their chances of winning prizes.

On the program, a contestant would be presented with three doors, behind one of which was the prize. After the contestant had selected a door, the host, Monty Hall, would open one of the other two doors, showing that the prize was not there. Then he would give the contestant a choice—either stay with the door initially selected or switch to the "third" door that had not been opened. For many viewers, common sense seemed to suggest that switching doors would make no difference. By assumption, the prize had a one-third chance of being behind each of the doors when the game began. Once a door was opened, it was argued that each of the remaining doors now had a one-half probability of hiding the prize, so contestants gained nothing by switching their bets.

Not so. An application of Definition 2.4.1 shows that it does make a difference—contestants, in fact, double their chances of winning by switching doors. To see why, consider a specific (but typical) case: the contestant has bet on Door #2 and Monty Hall has opened Door #3. Given that sequence of events, we need to calculate and compare the conditional probabilities of the prize being behind Door #1 and Door #2, respectively. If the former is larger (and we will prove that it is), the contestant should switch doors.
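The claimed two-to-one advantage of switching is also easy to confirm by Monte Carlo simulation. This sketch is not from the text; it assumes the standard rules, namely that the host always opens an unchosen door hiding no prize:

```python
import random

def play(switch, rng):
    """One round of Let's Make a Deal; returns True if the contestant wins."""
    doors = [1, 2, 3]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d not in (pick, prize)])
    if switch:
        # Exactly one door is neither the original pick nor the opened one.
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == prize

rng = random.Random(0)
n = 100_000
stay = sum(play(False, rng) for _ in range(n)) / n
swap = sum(play(True, rng) for _ in range(n)) / n
print(f"stay: {stay:.2f}, switch: {swap:.2f}")  # near 1/3 and 2/3
```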
TABLE 2.4.1

Character   Frequency   Probability   Random Number Range
Space          6934       0.1968        00001-06934
E              3277       0.0930        06935-10211
O              2578       0.0732        10212-12789
T              2557       0.0726        12790-15346
A              2043       0.0580        15347-17389
S              1856       0.0527        17390-19245
H              1773       0.0503        19246-21018
N              1741       0.0494        21019-22759
I              1736       0.0493        22760-24495
R              1593       0.0452        24496-26088
L              1238       0.0351        26089-27326
D              1099       0.0312        27327-28425
U              1014       0.0288        28426-29439
M               889       0.0252        29440-30328
Y               783       0.0222        30329-31111
W               716       0.0203        31112-31827
F               629       0.0178        31828-32456
C               584       0.0166        32457-33040
G               478       0.0136        33041-33518
P               433       0.0123        33519-33951
B               410       0.0116        33952-34361
V               309       0.0088        34362-34670
'               255       0.0072        34671-34925
K               203       0.0058        34926-35128
J                34       0.0010        35129-35162
Q                27       0.0008        35163-35189
X                21       0.0006        35190-35210
Z                14       0.0004        35211-35224

AOOAAORH QNNNDGELC TEFSISO VTALIDMA POESDHEMHIESWON PJTO MJ FIL FIM
AOFERLMT O NORDEERH HMFIOMR ETWOVRCA OSRIE IEOBOTOOIM NUDSEEWU WHHS
AWUA HIDNEVE NL SELTS

FIGURE 2.4.5

…by a program knowing only single-letter frequencies (Table 2.4.1). Nowhere does even a single correctly spelled word appear. Contrast that with Figure 2.4.6, showing computer text generated by a program that had been given estimates for conditional probabilities corresponding to all 614,656 (= 28^4) four-letter sequences. What we get is still garble, but the improvement is astounding—more than 80% of the letter combinations are at least words.

The image of a monkey sitting at a typewriter, pecking away at random until he gets lucky and types out a perfect copy of the complete works of William Shakespeare, has long been a favorite model of statisticians and philosophers to illustrate the distinction between something that is theoretically possible but, for all practical purposes, impossible.
But if that monkey and his typewriter are replaced by a high-technology computer, and if we program in the right sorts of conditional probabilities, the prospects for generating something intelligible become a little less far-fetched—maybe even disturbingly less far-fetched (11).

Simulating nonnumerical English text requires that twenty-eight characters be dealt with: the twenty-six letters, the space, and the apostrophe. The simplest approach would be to assign each of those characters a number from 1 to 28. Then a random number in that range would be generated and the character corresponding to that number would be printed. A second random number would be generated, a corresponding second character would be printed, and so on. Would that be a reasonable model? Of course not. Why should, say, X's have the same chance of being selected as E's when we know that the latter are much more common? At the very least, weights should be assigned to all the characters proportional to their relative probabilities.

Table 2.4.1 shows the empirical distribution of the twenty-six letters, the space, and the apostrophe in the 35,224 characters making up Act III of Hamlet. Ranges of random numbers corresponding to each character's frequency are listed in the last column. If two random numbers were generated, say, 27351 and 11616, the computer would print the characters D and O. Doing that, of course, is equivalent to printing a D with probability 0.0312 [= (28425 − 27327 + 1)/35224 = 1099/35224] and an O with probability 0.0732 [= (12789 − 10212 + 1)/35224 = 2578/35224].

Extending this idea to sequences of letters requires an application of Definition 2.4.1. What is the probability, for example, that a T follows an E? By definition,

P(T follows an E) = P(T|E) = (number of ET's)/(number of E's)

The analog of Table 2.4.1, then, would be an array having twenty-eight rows and twenty-eight columns.
The entry in the ith row and jth column would be P(i|j), the probability that letter i follows letter j. In a similar fashion, conditional probabilities for longer sequences could also be estimated. For example, the probability that an A follows the sequence QU would be the ratio of QUA's to QU's:

P(A follows QU) = P(A|QU) = (number of QUA's)/(number of QU's)

What does our monkey gain by having a typewriter programmed with probabilities of sequences? Quite a bit. Figure 2.4.5 shows three lines of computer text generated…

TABLE 2.4.2

(Prize Location, Door Opened):   (1, 3)   (2, 1)   (2, 3)   (3, 1)
Probability:                      1/3      1/6      1/6      1/3

Table 2.4.2 shows the sample space associated with the scenario just described. If the prize is actually behind Door #1, the host has no choice but to open Door #3; similarly, if the prize is behind Door #3, the host has no choice but to open Door #1. In the event that the prize is behind Door #2, though, the host would (theoretically) open Door #1 half the time and Door #3 half the time. Notice that the four outcomes in S are not equally likely. There is necessarily a one-third probability that the prize is behind each of the doors. However, the two choices that the host has when the prize is behind Door #2 necessitate that each of the two outcomes (2, 1) and (2, 3) of the prize being behind Door #2 has the one-sixth probability shown in Table 2.4.2.

Let A be the event that the prize is behind Door #2, and let B be the event that the host opened Door #3. Then

P(A|B) = P(contestant wins by not switching) = [P(A ∩ B)]/P(B) = [1/6]/[1/3 + 1/6] = 1/3

Now, let A* be the event that the prize is behind Door #1, and let B (as before) be the event that the host opens Door #3. In this case,

P(A*|B) = P(contestant wins by switching) = [P(A* ∩ B)]/P(B) = [1/3]/[1/3 + 1/6] = 2/3

Common sense would have led us astray again!
If given the choice, contestants should always switch doors. Doing so ups their chances of winning from one-third to two-thirds.

QUESTIONS

2.4.1. Suppose that two fair dice are tossed. What is the probability that the sum equals ten given that it exceeds eight?

2.4.2. Find P(A ∩ B) if P(A) = 0.2, P(B) = 0.4, and P(A|B) + P(B|A) = 0.75.

2.4.3. If P(A|B) < P(A), show that P(B|A) < P(B).

2.4.4. Let A and B be two events such that P((A ∪ B)^c) = 0.6 and P(A ∩ B) = 0.1. Let E be the event that either A or B but not both will occur. Find P(E|A ∪ B).

2.4.5. Suppose that in Example 2.4.2 we ignored the ages of the children and distinguished only three family types: (boy, boy), (girl, boy), and (girl, girl). Would the conditional probability of both children being boys given that at least one is a boy be different from the answer found on p. 45? Explain.

2.4.6. Two events, A and B, are defined on a sample space S such that P(A|B) = 0.6, P(at least one of the events occurs) = 0.8, and P(exactly one of the events occurs) = 0.6. Find P(A) and P(B).

2.4.7. An urn contains one red chip and one white chip. One is drawn at random. If the chip selected is red, that chip together with two additional red chips are put back into the urn. If a white is drawn, the chip is returned to the urn. Then a second chip is drawn. What is the probability that both selections are red?

2.4.8. Given that P(A) = a and P(B) = b, show that

P(A|B) ≥ (a + b − 1)/b

2.4.9. An urn contains one white chip and a second chip that is equally likely to be white or black. A chip is drawn at random and returned to the urn. Then a second chip is drawn. What is the probability that a white appears on the second draw given that a white appeared on the first draw?

2.4.10. Suppose events A and B are such that P(A ∩ B) = 0.1 and P((A ∪ B)^c) = 0.3. If P(A) = 0.2, what does P[(A ∩ B)|(A ∪ B)^c] equal? Hint: Draw the Venn diagram.

2.4.11. One hundred voters were asked their opinions of two candidates, A and B, running for mayor. Their responses to three questions are summarized below:

                      Yes
Do you like A?         65
Do you like B?         55
Do you like both?      25

(a) What is the probability that someone likes neither?
(b) What is the probability that someone likes exactly one?
(c) What is the probability that someone likes at least one?
(d) What is the probability that someone likes at most one?
(e) What is the probability that someone likes exactly one given that they like at least one?
(f) Of those who like at least one, what proportion like both?
(g) Of those who do not like A, what proportion like B?

2.4.12. A fair coin is tossed three times. What is the probability that at least two heads will occur given that at most two heads have occurred?

2.4.13. Two fair dice are rolled. What is the probability that the number on the first die was at least as large as 4 given that the sum of the two dice was eight?

2.4.14. Four cards are dealt from a standard 52-card poker deck. What is the probability that all four are aces given that at least three are aces? Note: There are 270,725 different sets of four cards that can be dealt. Assume that the probability associated with each of those hands is 1/270,725.

2.4.15. Given that P(A ∩ B^c) = 0.3, P((A ∪ B)^c) = 0.2, and P(A ∩ B) = 0.1, find P(A|B).

2.4.16. Given that P(A) + P(B) = 0.9, P(A|B) = 0.5, and P(B|A) = 0.4, find P(A).

2.4.17. Let A and B be two events defined on a sample space S such that P(A ∩ B^c) = 0.3, P(A^c ∩ B) = 0.3, and P((A ∪ B)^c) = 0.2. Find the probability that at least one of the two events occurs given that at most one occurs.

2.4.18. Suppose two dice are rolled. Assume that each possible outcome has probability 1/36. Let A be the event that the sum of the two dice is greater than or equal to eight, and let B be the event that at least one of the dice shows a 5. Find P(A|B).

2.4.19.
According to your neighborhood bookie, there are five horses scheduled to run in the third race at the local track, and handicappers have assigned them the following probabilities of winning:

Horse              Probability of Winning
Scorpion                  0.10
Starry Avenger            0.30
Australian Doll           0.20
Dusty Stake
Outandout

Suppose that Australian Doll and Dusty Stake are scratched from the race at the last minute. What are the chances that Outandout will prevail over the reduced field?

2.4.20. Andy, Bob, and Charley have all been serving time for grand theft auto. According to prison scuttlebutt, the warden plans to release two of the three next week. They all have identical records, so the two to be released will be chosen at random, meaning that each has a two-thirds probability of being included in the two to be set free. Andy, however, is friends with a guard who will know ahead of time which two will be released. He offers to tell Andy the name of a prisoner other than himself who will be released. Andy, however, declines the offer, believing that if he learns the name of a prisoner scheduled to be released, then his chances of being the other person set free will drop to one-half (since only two prisoners will be left at that point). Is his concern justified?

Applying Conditional Probability to Higher-Order Intersections

We have seen that conditional probabilities can be useful in evaluating intersection probabilities—that is, P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A). A similar result holds for higher-order intersections. Consider P(A ∩ B ∩ C). By thinking of A ∩ B as a single event—say, D—we can write

P(A ∩ B ∩ C) = P(D ∩ C)
             = P(C|D)P(D)
             = P(C|A ∩ B)P(A ∩ B)
             = P(C|A ∩ B)P(B|A)P(A)

Repeating this same argument for n events, A1, A2, ..., An, gives a formula for the general case:

P(A1 ∩ A2 ∩ ... ∩ An) = P(An|A1 ∩ A2 ∩ ... ∩ An−1) · P(An−1|A1 ∩ A2 ∩ ... ∩ An−2) · ... · P(A2|A1) · P(A1)    (2.4.3)

EXAMPLE 2.4.6
An urn contains five white chips, four black chips, and three red chips. Four chips are drawn sequentially and without replacement.
What is the probability of obtaining the sequence (white, red, white, black)?

Figure 2.4.7 shows the evolution of the urn's composition as the sequence is drawn out. Define the following events:

A: white chip is drawn on first selection
B: red chip is drawn on second selection
C: white chip is drawn on third selection
D: black chip is drawn on fourth selection

[Figure 2.4.7: urn compositions after each draw—5W/4B/3R, then 4W/4B/3R, then 4W/4B/2R, then 3W/4B/2R.]

Our objective is to find P(A ∩ B ∩ C ∩ D). From Equation 2.4.3,

P(A ∩ B ∩ C ∩ D) = P(D|A ∩ B ∩ C) · P(C|A ∩ B) · P(B|A) · P(A)

Each of the probabilities on the right-hand side of the equation here can be gotten by just looking at the urns pictured in Figure 2.4.7: P(D|A ∩ B ∩ C) = 4/9, P(C|A ∩ B) = 4/10, P(B|A) = 3/11, and P(A) = 5/12. Therefore, the probability of drawing a (white, red, white, black) sequence is 0.02:

P(A ∩ B ∩ C ∩ D) = (4/9) · (4/10) · (3/11) · (5/12) = 240/11,880 = 0.02

CASE STUDY 2.4.3
Since the late 1940s, tens of thousands of eyewitness accounts of strange lights in the sky, unidentified flying objects, and even alleged abductions by little green men have made headlines. None of these incidents, though, has produced any hard evidence, any irrefutable proof that Earth has been visited by a race of extraterrestrials. Still, the haunting question remains—are we alone in the universe? Or are there other civilizations, more advanced than ours, making the occasional flyby?

Until, or unless, a flying saucer plops down on the White House lawn and a strange-looking creature emerges with the proverbial "Take me to your leader" demand, we may never know whether we have any cosmic neighbors. Equation 2.4.3, though, can help us speculate on the probability of our not being alone.

Recent discoveries suggest that planetary systems like our own may be quite common. If so, there are likely to be many planets whose chemical makeups, temperatures, pressures, and so on, are suitable for life. Let those planets be the points in our sample space.
Relative to them, we can define three events:

A: life arises
B: technical civilization arises (one capable of interstellar communication)
C: technical civilization is flourishing now

In terms of A, B, and C, the probability that a habitable planet is presently supporting a technical civilization is the probability of an intersection—specifically, P(A ∩ B ∩ C). Associating a number with P(A ∩ B ∩ C) is highly problematic, but the task is simplified considerably if we work instead with the equivalent conditional formula, P(C|B ∩ A) · P(B|A) · P(A).

Scientists speculate (157) that life of some kind may arise on one-third of all planets having a suitable environment and that life on maybe 1% of all those planets will evolve into a technical civilization. In our notation, P(A) = 1/3 and P(B|A) = 1/100. More difficult to estimate is P(C|A ∩ B). On Earth, we have had the capability of interstellar communication (that is, radio astronomy) for only a few decades, so P(C|A ∩ B), estimated empirically, is on the order of 1 × 10^−8. But that may be an overly pessimistic estimate of a technical civilization's ability to endure. It may be true that if a civilization can avoid annihilating itself when it develops nuclear weapons, its prospects for longevity are fairly good. If that were the case, P(C|A ∩ B) might be as large as 1 × 10^−2.

Putting these numbers into the computing formula for P(A ∩ B ∩ C) yields a range for the probability of a habitable planet currently supporting a technical civilization. The chances may be as small as 3.3 × 10^−11 or as "large" as 3.3 × 10^−5:

0.000000000033 < P(A ∩ B ∩ C) < 0.000033

A better way to put these figures in some kind of perspective is to think in terms of numbers rather than probabilities. Astronomers estimate there are 3 × 10^11 habitable planets in our Milky Way galaxy. Multiplying that total by the two limits for P(A ∩ B ∩ C) gives an indication of how many cosmic neighbors we are likely to have. Specifically, 3 × 10^11 ·
0.000000000033 ≈ 10, while 3 × 10^11 · 0.000033 ≈ 10,000,000. So, on the one hand, we may be a galactic rarity. At the same time, the probabilities do not preclude the very real possibility that the heavens are abuzz with activity and that our neighbors number in the millions.

QUESTIONS

2.4.21. An urn contains six white chips, four black chips, and five red chips. Five chips are drawn out, one at a time and without replacement. What is the probability of the sequence (black, black, red, white, white)? Suppose that the chips are numbered 1 through 15. What is the probability of getting a specific sequence, say, (6, 4, 9, 13, …)?

2.4.22. A man has n keys on a key ring, one of which opens the door to his apartment. Having celebrated a bit too much one evening, he returns home only to find himself unable to distinguish one key from another. Resourceful, he works out a fiendishly clever plan: He will choose a key at random and try it. If it fails to open the door, he will discard it and choose at random one of the remaining n − 1 keys, and so on. Clearly, the probability that he gains entrance with the first key he selects is 1/n. Show that the probability the door opens with the third key he tries is also 1/n. (Hint: What has to happen before he even gets to the third key?)

2.4.23. Suppose that four cards are drawn from a standard 52-card deck. What is the probability of drawing, in order, a 7 of diamonds, a jack of spades, a 10 of diamonds, and a 5 of hearts?

2.4.24. One chip is drawn at random from an urn that contains one white chip and one black chip. If the white chip is selected, we simply return it to the urn; if the black chip is drawn, that chip—together with another black—are returned to the urn. Then a second chip is drawn, with the same rules for returning it to the urn. Calculate the probability of drawing two whites followed by three blacks.

Calculating "Unconditional" Probabilities

We begin this section with two very useful theorems that apply to partitioned sample spaces. By definition, a set of events A1, A2, ..., An "partition" S if every outcome in

[Figure 2.4.8: a partition A1, A2, ..., An of S, with an event B overlapping the pieces.]
Al"", All "partition" S if every outcome in B FIGURE 2.4.8 Section 2.4 Conditional Probability 51 the sample belongs to one and only one of the A;'s-that is, the Ai'S are mutually Figure exclusive and their union is S Let B, as pictured, denote event on S. The result, Theorem 2.4.1, gives a formula for the "unconditional" probabillty of B (in terms of the Ai's), Then Theorem calculates the set of conditional probabilities, peA jiB), j 1.2, ' , .• n. = Theorem Let (Ai Ai n Aj=0fori *jJ be a set of events defined over S such that S > Ofori=1,2, ... ,n. anyevenlB, = Ui=l Ai, II PCB) = LP(BIA;)P(A i ) i=1 Proof, the conditions imposed on the and But follows. PCB n Ad can be ITfr·.ttp'TI as product P(BIA;}P(A 1), and the result o EXAMPlE lA.1 Urn I contains two red and four chips; urn II, three red and one white, A chip is drawn at random from urn I and transferred to urn II. Then a chip is drawn from urn n. What is the probabiHty the chip from urn II is Let B be the event "Chip drawn from urn n is red"; let Al and A2 be events "Chip tral.1Sferred from urn I is red" and "Chip transferred from urn I is white," respectively. By (see 2.4.9), we can deduce all the probabilities in the right-hand side of the In • o one Red • While • • UmII Urn I FKiURE 2.4..9 4 5 4 6 - - Drawone 58 Chapter 2 Probability Putting all this infonnation together, we see that the chances are two out of three that a red chip will be drawn from urn II: P(B) = P(BIAl)P(At) + P(BIA2)P(A2) 3 4 -4 . -2 + _. 5£)5 2 6 3 EXAMPLE 2.4.8 is removed. What is the probability A standard poker deck is shuffled and the card on that the second card is an ace? Define the following events: 8: second card is an ace At: top card was an ace Az: top card was not an ace = 12 P(8IA?) :#r, P(A1) ~. Since the Ai'S partition P(BIAt) the sample space of two-card selections, 2.4.1 applies. Substituting into the expression for P(B) shows that is the probability that the second card is an ace: PCB) = P(8IAIlP(At) + 3 = 51 4 . 
52 4 P(8IAz)P(Az) 48 + 51 . 4 CommenL Notice that PCB) = P(2nd card is an is numerically the same as The analysis jn Example 2.4.8 illustrates a basic nnnl"1nlf' = P(first card is an in probability that says, in "what you don't know, doesn't matter." Here, removal stlbsequent probability calculations if the identity of of the top card is irrelevant to that card remains unknown. peAl) EXAMPLE 2.4.9 Ashley is hoping to land a summer internship with a public relations firm. If her interview an offer. If the is a bust, though, goes wen, she has a 70% chance of her chances getting the position drop to 20%. Unfortunately, Ashley tends to incoherently when she is under stress, so the likelihood of the interview going well is only 0.10. What is the probability that Ashley gets the internship? Let B be the event "AshJey is internship," let Al be the event "Interview goes well," and Az be the event "Interview does not go welL" By assumption, P(BIAI) peAl) = 0.70 = 0.10 P(BIA2) P(A2) = 0.20 =1 peAl) =1 - 0.10 = 0.90 Section 2.4 According to Theorem 2.4,1, Ashley P(8) a 25% chance = P(8IAl)P(Al) + Cooditlooal Probability 59 landing the internship: P(8IAz)P(AZ) == (0.70)(0.10) + (0.20)(0.90) 0.25 EXAM PtE 2.4.10 In an upstate congressional race, the incumbent Republican (R) is running a field (Dh and DJ) seeking the nomination. Political pundits estimate of three primary are 0.35, 0.40, and 0.25, that probabilities of Dl} D:t, and D3 winning respectively. Furthermore, results from a variety of polls are suggesting that R would have a 40% chance of defeating DI in the general election, a 35% chance of defeating D'2, and Assuming all to be accurate, what are the a 60% chance of defeating chances that the Republican will retain his Let 8 denote the event that" R wins general election," and let denote the event" Di wins Democratic primary"'; i I, 2, 3. 
Then = and P(RIAI) = 0.40 P(RIAz) =0.35 so P(8) = P(Republican wins general election) = P(8IAl)P(Al) + P(BI A2)P(Az) + P(BI A 3)P(A3) = (0.40)(0.35) + (0.35)(0.40) + (0.60)(0.25) =0.43 EXAMPLE 2.4.11 Three chips are placed in an urn. One is red on both sides, a second is blue on both sides, and the third is red on one side and blue on the other. One is selected at random is What is the and placed on a table. Suppose that the color showing on that probability the color underneath is red (see Figure 2.4.10)1 At first glance, it may seem that the answer is one·half: We know that the blue/blue chip has not been and only one of the remaining two-the red/red CIDII)--S2ltlS'l1es the event that color underneath is Ifthls game were pLayed over and over, though., and records were kept outcomes, it would be found that the proportion of times that a red top has a red bottom is two·rhlrds, not the one-half that our intuition might suggest. correct answer follows from an application of Theorem 2.4.1. 60 Chapter:2 Probability BQ)B RQ)R e d e d I I u u e e RQ)B e I d ~ RGURE 2.4.10 Define the following events: bottom side of chip drawn is red top side of chip drawn is red At: red/red chip is drawn A2: bluelblue chip is drawn red/blue is drawn A: B: From the definition of conditional probability, P(AIB) = But P(A n B) P(both sides are red) used to find the denominator, P(B): P(B} = P(BIA1)P(Al) + 1 =1':3+ 0 peA n PCB) P(red/red chip) P(BIA2)P(A2) + = Theorem 2.4.1 can be P(BIA3)P(A3) III 3 '3+2: 1 =2: Therefore, P(AIB) = 1/2 =-23 Comment. 
Comment. The question posed in Example 2.4.11 gives rise to a simple but effective con game. The trick is to convince a "mark" that the initial analysis given on page 59 is correct, meaning that the bottom has a fifty-fifty chance of being the same color as the top. Under that incorrect presumption that the game is "fair," both participants put up the same amount of money, but the gambler (knowing the correct analysis) always bets that the bottom is the same color as the top. In the long run, then, the con artist will be winning an even-money bet two-thirds of the time!

QUESTIONS

2.4.25. A toy manufacturer buys ball bearings from three different suppliers: 50% of her total order comes from supplier 1, 30% from supplier 2, and the rest from supplier 3. Past experience has shown that the quality control standards of the three suppliers are not all the same. Two percent of the ball bearings produced by supplier 1 are defective, while suppliers 2 and 3 produce defective bearings 3% and 4% of the time, respectively. What proportion of the ball bearings in the toy manufacturer's inventory are defective?

2.4.26. A coin is tossed. If a head turns up, a fair die is tossed; if a tail turns up, two fair dice are tossed. What is the probability that the face (or the sum of the faces) showing on the die (or the dice) is equal to six?

2.4.27. Foreign policy experts estimate that the probability is 0.65 that war will break out next year between two Middle East countries if either side significantly escalates its terrorist activities. Otherwise, the likelihood of war is estimated to be 0.05. Based on what has happened this year, the chances of terrorism reaching a critical level in the next twelve months are thought to be three in ten. What is the probability that the two countries will go to war?

2.4.28. A telephone solicitor is responsible for canvassing three suburbs. In the past, 60% of the completed calls to Belle Meade have resulted in contributions, compared to 55% for Oak Hill and 35% for Antioch.
Her list of telephone numbers includes one thousand households from Belle Meade, one thousand from Oak Hill, and two thousand from Antioch. Suppose that she picks a number at random from the list and places the call. What is the probability that she gets a donation?

2.4.29. If men constitute 47% of the population and tell the truth 78% of the time, while women tell the truth 63% of the time, what is the probability that a person selected at random will answer a question truthfully?

2.4.30. Urn I contains three red chips and one white chip; urn II contains two red chips and two white chips. One chip is drawn from each urn and transferred to the other urn. Then a chip is drawn from the first urn. What is the probability that the chip ultimately drawn from urn I is red?

2.4.31. The crew of the Starship Enterprise is considering launching a surprise attack against the Borg in a neutral quadrant. Possible interference by the Klingons, though, is causing Captain Picard and Data to reassess their strategy. According to Data's calculations, the probability of the Klingons joining forces with the Borg is 0.2384. Captain Picard feels that the probability of the attack being successful is 0.8 if the Enterprise can catch the Borg alone, but only 0.3 if they have to engage both adversaries. Data claims that the mission would be a misadventure if its probability of success were not at least 0.7306. Should the Enterprise attack?

2.4.32. Recall the "survival" lottery described in Question 2.2.14. What is the probability of release associated with the prisoner's optimal strategy?

2.4.33. State College is playing Backwater A&M for the conference football championship. If Backwater's first-string quarterback is healthy, A&M has a 75% chance of winning. If they have to start their backup quarterback, their chances of winning drop to 40%. The team physician estimates that there is a 70% chance that the first-string quarterback will play. What is the probability that Backwater wins the game?

2.4.34. An urn contains forty red chips and sixty white chips. Six chips are drawn out and discarded, and then a seventh chip is drawn.
What is the probability that the seventh chip is red?

62 Chapter 2 Probability

2.4.35. A study has shown that seven out of ten people will say "heads" if asked to call a coin toss. Suppose, though, that a head occurs, on the average, only five times out of every ten that the coin is tossed. Does it follow that you have the advantage if you let the other person call the toss? Explain.

2.4.36. Based on pretrial speculation, the probability that a jury returns a guilty verdict in a certain high-profile murder case is thought to be 15% if the defense can discredit the police department and 80% if it cannot. Veteran court observers believe that the skilled defense attorneys have a 70% chance of convincing the jury that the police either contaminated or planted some of the key evidence. What is the probability that the jury returns a guilty verdict?

2.4.37. As an incoming freshman, Marcus believes that he has a 25% chance of earning a GPA in the 3.5 to 4.0 range, a 35% chance of graduating with a 3.0 to 3.5 GPA, and a 40% chance of finishing with a GPA less than 3.0. From what the pre-med advisor has told him, he has an 8 in 10 chance of getting into medical school if his GPA is above 3.5, a 5 in 10 chance if his GPA is in the 3.0 to 3.5 range, and only a 1 in 10 chance if his GPA falls below 3.0. Based on those estimates, what is the probability that Marcus gets into medical school?

2.4.38. The governor of a certain state has decided to come out strongly for prison reform and is proposing a new early-release program. Its guidelines specify that inmates related to members of the governor's staff would have a 90% chance of being released early; the probability of early release for inmates not related to the governor's staff would be 0.01. Suppose that 40% of all inmates are related to someone on the governor's staff. What is the probability that a prisoner selected at random would be eligible for early release?

2.4.39. Following are the percentages of students of State College enrolled in each of the school's main divisions. Also listed are the proportions of students in each division who are women.
Division          %      % Women
Humanities        40        60
Natural Science   10        15
History           30        45
Social Science    20        75
                 100

Suppose the Registrar selects one person at random. What is the probability that the student selected will be a male?

Bayes Theorem

The second result in this section that is set against the backdrop of a partitioned sample space has a curious history. The first explicit statement of Theorem 2.4.2, coming in 1812, was due to Laplace, but it was named after the Reverend Thomas Bayes, whose 1763 paper (published posthumously) had already outlined the result. On one level, the theorem is a relatively minor extension of the definition of conditional probability. When viewed from a loftier perspective, though, it takes on some rather profound philosophical implications. The latter, in fact, have precipitated a schism among statisticians: "Bayesians" analyze data one way; "non-Bayesians" often take a fundamentally different approach (see Section 5.8).

Our use of the result here will have nothing to do with its statistical interpretation. We will apply it simply as the Reverend Bayes originally intended, as a formula for a certain kind of "inverse" probability. If we know P(B|Ai) for all i, the theorem enables us to compute conditional probabilities "in the other direction"; that is, we can deduce P(Aj|B) from the P(B|Ai)'s.

Theorem 2.4.2. (Bayes) Let {Ai}, i = 1, 2, ..., n, be a set of n events, each with positive probability, that partition S in such a way that A1 ∪ A2 ∪ ... ∪ An = S and Ai ∩ Aj = ∅ for i ≠ j. For any event B (also defined on S), where P(B) > 0,

P(Aj|B) = P(B|Aj)P(Aj) / [P(B|A1)P(A1) + P(B|A2)P(A2) + ... + P(B|An)P(An)]

for any 1 ≤ j ≤ n.

Proof. From the definition of conditional probability,

P(Aj|B) = P(Aj ∩ B)/P(B) = P(B|Aj)P(Aj)/P(B)

But Theorem 2.4.1 allows the denominator to be written as P(B|A1)P(A1) + P(B|A2)P(A2) + ... + P(B|An)P(An), and the result follows.

PROBLEM-SOLVING HINTS
(Working with Partitioned Sample Spaces)

Students sometimes have difficulty setting up problems that involve partitioned sample spaces, in particular, ones whose solution requires an application of Theorem 2.4.1 or 2.4.2, because of the nature and amount of information that needs to be incorporated into the answers.
The key is learning to identify which part of the "given" corresponds to B and which parts correspond to the Ai's. The following hints may help.

1. As you read the question, pay particular attention to the last one or two sentences. Is the problem asking for an unconditional probability (in which case Theorem 2.4.1 applies) or a conditional probability (in which case Theorem 2.4.2 applies)?
2. If the question is asking for an unconditional probability, let B denote the event whose probability you are trying to find; if the question is asking for a conditional probability, let B denote the event that has already happened.
3. Once event B has been identified, reread the beginning of the question and assign the Ai's.

EXAMPLE 2.4.12
A biased coin, twice as likely to come up heads as tails, is tossed once. If it shows heads, a chip is drawn from urn I, which contains three white chips and four red chips; if it shows tails, a chip is drawn from urn II, which contains six white chips and three red chips. Given that a white chip was drawn, what is the probability that the coin came up tails (see Figure 2.4.11)?

64 Chapter 2 Probability

[FIGURE 2.4.11: Urn I (three white, four red) and Urn II (six white, three red); a white chip is drawn.]

Since P(Heads) = 2 · P(Tails), it must be true that P(Heads) = 2/3 and P(Tails) = 1/3. Define the events

B: white chip is drawn
A1: coin comes up heads (i.e., chip came from urn I)
A2: coin comes up tails (i.e., chip came from urn II)

In the notation of Figure 2.4.11, the objective is to find P(A2|B). From the statement of the problem,

P(B|A1) = 3/7    P(A1) = 2/3
P(B|A2) = 6/9    P(A2) = 1/3

so, by Theorem 2.4.2,

P(A2|B) = P(B|A2)P(A2) / [P(B|A1)P(A1) + P(B|A2)P(A2)]
        = (6/9)(1/3) / [(3/7)(2/3) + (6/9)(1/3)]
        = 7/16

EXAMPLE 2.4.13
During a power blackout, one hundred persons are arrested on suspicion of looting. Each is given a polygraph test. From past experience it is known that the polygraph is 90% reliable when administered to a guilty suspect and 98% reliable when given to someone who is innocent. Suppose that of the one hundred persons taken into custody, only twelve were actually involved in any wrongdoing. What is the probability that a given suspect is innocent given that the polygraph says he is guilty?

Let B be the event "Polygraph says suspect is guilty," and let A1 and A2 be the events "Suspect is guilty" and "Suspect is not guilty," respectively.
To say that the polygraph is "90% reliable when administered to a guilty suspect" means that P(B|A1) = 0.90. Similarly, the 98% reliability for innocent suspects implies that P(Bc|A2) = 0.98, or, equivalently, P(B|A2) = 0.02. We also know that P(A1) = 12/100 and P(A2) = 88/100. Substituting into Theorem 2.4.2, then, shows that the probability a suspect is innocent given that the polygraph says he is guilty is 0.14:

P(A2|B) = P(B|A2)P(A2) / [P(B|A1)P(A1) + P(B|A2)P(A2)]
        = (0.02)(88/100) / [(0.90)(12/100) + (0.02)(88/100)]
        = 0.14

EXAMPLE 2.4.14
As medical technology advances and adults become more health conscious, the demand for diagnostic screening tests inevitably increases. Looking for problems, though, when no symptoms are present can have undesirable consequences that may outweigh the intended benefits.

Suppose, for example, a woman has a medical procedure performed to see whether she has a certain type of cancer. Let B denote the event that the test says she has cancer, and let A1 denote the event that she actually does (and A2, the event that she does not). Furthermore, suppose the prevalence of the disease and the precision of the diagnostic test are such that

P(A1) = 0.0001 [and P(A2) = 0.9999]
P(B|A1) = 0.90 = P(test says woman has cancer when, in fact, she does)
P(B|A2) = 0.001 = P(false positive) = P(test says woman has cancer when, in fact, she does not)

What is the probability that she does have cancer, given that the diagnostic procedure says she does? That is, calculate P(A1|B). Although the method of solution here is straightforward, the actual numerical answer is not what we would expect. From Theorem 2.4.2,

P(A1|B) = P(B|A1)P(A1) / [P(B|A1)P(A1) + P(B|A2)P(A2)]
        = (0.90)(0.0001) / [(0.90)(0.0001) + (0.001)(0.9999)]
        = 0.08

So, only 8% of those women identified as having cancer actually do!
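The arithmetic behind this example, and behind the sensitivity analysis in Table 2.4.3 that follows, is easy to reproduce. The sketch below is our own illustration (the function name is arbitrary): it applies Theorem 2.4.2 to the two-event partition A1 = "has the disease," A2 = "does not."

```python
def posterior(prior, sensitivity, false_positive):
    # P(A1|B) = P(B|A1)P(A1) / [P(B|A1)P(A1) + P(B|A2)P(A2)],
    # where P(A2) = 1 - prior and P(B|A2) is the false-positive rate.
    num = sensitivity * prior
    return num / (num + false_positive * (1 - prior))

# Example 2.4.14: prevalence 0.0001, P(B|A1) = 0.90, P(B|A2) = 0.001
print(round(posterior(0.0001, 0.90, 0.001), 2))   # 0.08

# Vary the prevalence and the false-positive rate, as in Table 2.4.3
for prior in (0.0001, 0.001, 0.01):
    for fp in (0.001, 0.0001):
        print(prior, fp, round(posterior(prior, 0.90, fp), 2))
```

The loop makes the text's point numerically: P(A1|B) climbs toward 1 only when the prevalence rises or the false-positive rate falls.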
Table 2.4.3 shows the strong dependence of P(A1|B) on P(A1) and P(B|A2):

TABLE 2.4.3
P(A1)     P(B|A2)    P(A1|B)
0.0001    0.001      0.08
0.0001    0.0001     0.47
0.001     0.001      0.47
0.001     0.0001     0.90
0.01      0.001      0.90
0.01      0.0001     0.99

In light of these probabilities, the practicality of screening programs aimed at diseases having a low prevalence is open to question, especially when the diagnostic procedure, itself, poses a nontrivial health risk. (For these two reasons, the use of chest X-rays to screen for tuberculosis is no longer advocated by the medical community.)

EXAMPLE 2.4.15
According to the manufacturer's specifications, your home burglar alarm has a 95% chance of going off if someone breaks into your house. During the two years you have lived there, the alarm went off on five different nights, each time for no apparent reason. Suppose the alarm goes off tomorrow night. What is the probability that someone is trying to break in? Note: Police statistics show that the chances of any particular house in your neighborhood being burglarized on any given night are two in ten thousand.

Let B be the event "Alarm goes off tomorrow night," and let A1 and A2 be the events "House is being burglarized" and "House is not being burglarized," respectively. Then

P(B|A1) = 0.95
P(B|A2) = 5/730    (i.e., five nights in two years)
P(A1) = 2/10,000
P(A2) = 9,998/10,000

The probability in question is P(A1|B). Intuitively, it might seem that P(A1|B) should be close to one because the component probabilities look "good": P(B|A1) is close to one (as it should be) and P(B|A2) is close to zero (as it should be). Substituting into Theorem 2.4.2, though, P(A1|B) turns out to be surprisingly small:

P(A1|B) = (0.95)(2/10,000) / [(0.95)(2/10,000) + (5/730)(9,998/10,000)]
        = 0.027

That is, if you hear the alarm going off, the probability is only 0.027 that the house is being burglarized. Computationally, the reason P(A1|B) is so small is that P(A2) is so large: it makes the denominator of P(A1|B) large and, in effect, "washes out" the numerator. Even if P(B|A1) were substantially increased (by installing a more expensive alarm system), P(A1|B) would remain largely unchanged (see Table 2.4.4).

TABLE 2.4.4
P(B|A1)    P(A1|B)
0.95       0.027
0.97       0.028
0.99       0.028
0.999      0.028

EXAMPLE 2.4.16
Currently a college senior, Jeremy has had a secret crush on Emma ever since grade school.
Two years ago, fearing that his feelings would forever go unrequited, he finally broke his silence and sent a letter through the campus mail, acknowledging his secret romance. Now, fourteen agonizing days later, he has yet to receive a response. Hoping against hope, Jeremy and his fragile psyche are clinging to the possibility that the letter was lost in the mail. Assuming that (1) Emma (who is actually dating Jeremy's father) has a 70% chance of mailing a response if, in fact, she received the letter and (2) the Campus Post Office has a one in fifty chance of losing any particular piece of mail, what is the probability that Emma never received Jeremy's confession of the heart?

Let B be the event that Jeremy does not receive a response; let A1 and A2 denote the events that Emma did and did not, respectively, receive Jeremy's letter. The objective is to find P(A2|B). From what we know about the incompetence of the Campus Post Office, P(A2) = 1/50 = 0.02 [so P(A1) = 0.98] and, of course, P(B|A2) = 1. Also,

P(B|A1) = P(Jeremy receives no response | Emma received Jeremy's letter)
        = P[Emma does not respond ∪ (Emma responds ∩ Post Office loses the letter)]
        = P(Emma does not respond) + P(Emma responds) × P(letter is lost | Emma responds)
        = 0.30 + (0.70)(1/50)
        = 0.314

Substituting into Theorem 2.4.2, then,

P(A2|B) = P(B|A2)P(A2) / [P(B|A1)P(A1) + P(B|A2)P(A2)]
        = (1)(0.02) / [(0.314)(0.98) + (1)(0.02)]
        = 0.061

68 Chapter 2 Probability

Sadly, the news is not good for Jeremy. If P(A2|B) = 0.061, it follows that the probability Emma received the letter but did not care enough to respond is almost 94%. "Faint heart ne'er won fair lady," but Jeremy would probably be well-advised to direct his romantic attentions elsewhere.

QUESTIONS

2.4.40. Urn I contains two white chips and one red chip; urn II has one white chip and two red chips. One chip is drawn at random from urn I and transferred to urn II. Then one chip is drawn from urn II. Suppose that a red chip is selected from urn II. What is the probability that the chip transferred was white?

2.4.41. Urn I contains three red chips and five white chips; urn II contains four reds and four whites; urn III contains five reds and three whites. One urn is chosen at random and one chip is drawn from that urn. Given that the chip drawn was red, what is the probability that urn III was the urn sampled?

2.4.42.
A dashboard light is supposed to flash red if a car's oil pressure is too low. On a certain model, the probability of the light flashing when it should is 0.99; 2% of the time, though, it flashes for no apparent reason. If there is a 10% chance that the oil pressure really is low, what is the probability that a driver needs to be concerned if the warning light goes on?

2.4.43. Building permits were issued to three contractors starting up a new subdivision: Tara Construction built two houses; Westview, three houses; and Hearthstone, six houses. Tara's houses have a 60% chance of developing leaky basements; homes built by Westview and Hearthstone have that same problem 50% of the time and 40% of the time, respectively. Yesterday, the Better Business Bureau received a complaint from one of the new homeowners that his basement is leaking. Who is most likely to have been the contractor?

2.4.44. Two sections of a senior probability course are being taught. From what she has heard about the two instructors listed, Francesca estimates that her chances of passing the course are 0.85 if she gets professor X and 0.60 if she gets professor Y. The section into which she is put is determined by the registrar, and her chances of being assigned to professor X are four out of ten. Fifteen weeks later we learn that Francesca did pass the course. What is the probability she was enrolled in X's section?

2.4.45. A store owner is willing to cash personal checks for amounts up to $50, but she has become wary of customers who wear sunglasses: 50% of the checks written by persons wearing sunglasses bounce. In contrast, 98% of the checks written by persons not wearing sunglasses clear the bank. She estimates that 10% of her customers wear sunglasses. If the bank returns a check and marks it "insufficient funds," what is the probability it was written by someone wearing sunglasses?

2.4.46. Brett and Margo have each thought about murdering their rich Uncle Basil in hopes of claiming their inheritance a bit early. Hoping
to take advantage of Basil's penchant for immoderate desserts, Brett has put rat poison in the cherries flambe; Margo, unaware of Brett's scheme, has laced the chocolate mousse with cyanide. Given the amounts likely to be eaten, the probability of the rat poison being fatal is 0.60; the cyanide, 0.90. Based on other occasions where Basil was presented with the same dessert options, we can assume that he has a 50% chance of asking for the cherries flambe, a 40% chance of ordering the chocolate mousse, and a 10% chance of skipping dessert altogether. No sooner are the dishes cleared away than Basil drops dead. In the absence of any other evidence, who should be considered the prime suspect?

2.4.47. Josh takes a twenty-question multiple-choice exam where each question has five possible answers. Some of the answers he knows, while others he gets right just by making lucky guesses. Suppose that the conditional probability of his knowing the answer to a randomly selected question, given that he got it right, is 0.92. How many of the twenty questions was he prepared for?

2.4.48. Recently the Senate Committee on Labor and Public Welfare investigated the feasibility of setting up a national screening program to detect child abuse. A team of consultants estimated the following probabilities: (1) one child in ninety is abused, (2) a physician can detect an abused child 90% of the time, and (3) a screening program would incorrectly label 3% of all nonabused children as abused. What is the probability that a child is actually abused given that the screening program makes that diagnosis? How does the probability change if the incidence of abuse is one in one thousand? Or one in fifty?

2.4.49. At State University, 30% of the students are majoring in Humanities, 50% in History and Culture, and 20% in Science. Moreover, according to figures released by the Registrar, the percentages of women majoring in Humanities, History and Culture, and Science are 75%, 45%, and 30%, respectively.
Suppose Justin meets Anna at a fraternity party. What is the probability that Anna is a History and Culture major?

2.4.50. An "eyes-only" diplomatic message is to be transmitted as a binary code of 0s and 1s. Past experience with the equipment being used suggests that if a 0 is sent, it will be (correctly) received as a 0 90% of the time (and mistakenly received as a 1 10% of the time). If a 1 is sent, it will be received as a 1 95% of the time (and as a 0 5% of the time). The text being sent is thought to be 70% 1s and 30% 0s. Suppose the next signal sent is received as a 1. What is the probability that it was sent as a 0?

2.4.51. When Zach wants to contact his girlfriend and he knows she is not at home, he is twice as likely to send her an e-mail as he is to leave a message on her answering machine. The probability that she responds to his e-mail within three hours is 80%; her chances of being similarly prompt in answering a phone message increase to 90%. Suppose she responded to the message he left this morning within two hours. What is the probability that Zach was communicating with her by e-mail?

2.4.52. A dot-com company ships products from three warehouses, A, B, and C. Based on customer complaints, it appears that 3% of the shipments coming from A are somehow faulty, as are 5% of the shipments coming from B, and 2% of those coming from C. Suppose a customer is mailed an order and calls in a complaint the next day. What is the probability the item came from Warehouse C? Assume that Warehouses A, B, and C ship 30%, 20%, and 50% of the dot-com's sales, respectively.

2.4.53. A desk has three drawers. The first contains two gold coins, the second has two silver coins, and the third has one gold coin and one silver coin. A coin is drawn from a drawer selected at random. Suppose the coin selected was silver. What is the probability that the other coin in that drawer is gold?

2.5 INDEPENDENCE

Section 2.4 dealt with the problem of reevaluating the probability of a given event in light of the additional information that some other event has already occurred.
It often is the case, though, that the probability of the given event remains unchanged, regardless of the outcome of the second event; that is, P(A|B) = P(A) = P(A|B^c). Events sharing this property are said to be independent. Definition 2.5.1 gives a necessary and sufficient condition for two events to be independent.

Definition 2.5.1. Two events A and B are said to be independent if P(A ∩ B) = P(A) · P(B).

Comment. The fact that the probability of the intersection of two independent events is equal to the product of their individual probabilities follows from our first definition of independence, that P(A|B) = P(A). Recall that the definition of conditional probability holds true for any two events A and B [provided that P(B) > 0]:

P(A|B) = P(A ∩ B)/P(B)

But P(A|B) can equal P(A) only if P(A ∩ B) factors into P(A) times P(B).

EXAMPLE 2.5.1
Let A be the event of drawing an ace from a standard poker deck, and let B be the event of drawing a diamond. Then A and B are independent, because the probability of their intersection, drawing the ace of diamonds, is equal to P(A) · P(B):

P(A ∩ B) = 1/52 = (1/13) · (1/4) = P(A) · P(B)

EXAMPLE 2.5.2
Suppose that A and B are independent events. Does it follow that A^c and B^c are also independent? That is, does P(A ∩ B) = P(A) · P(B) guarantee that P(A^c ∩ B^c) = P(A^c) · P(B^c)? The answer is yes, the proof being accomplished by equating two different expressions for P(A^c ∪ B^c). First, the union of two complements is the complement of their intersection (recall Question 2.2.32):

P(A^c ∪ B^c) = 1 − P(A ∩ B)    (2.5.1)

Second, by the addition rule,

P(A^c ∪ B^c) = P(A^c) + P(B^c) − P(A^c ∩ B^c)    (2.5.2)

Combining Equations 2.5.1 and 2.5.2, we get

1 − P(A ∩ B) = P(A^c) + P(B^c) − P(A^c ∩ B^c)

Since A and B are independent, P(A ∩ B) = P(A) · P(B), so

P(A^c ∩ B^c) = [1 − P(A)] + [1 − P(B)] − 1 + P(A) · P(B)
             = 1 − P(A) − P(B) + P(A) · P(B)
             = [1 − P(A)][1 − P(B)]
             = P(A^c) · P(B^c)

the latter factorization implying that A^c and B^c are, themselves, independent.

EXAMPLE 2.5.3
Administrators-R-Us, hoping to settle a discrimination suit, is establishing hiring goals by race and sex for its office staff. So far they have agreed to employ the 120 people characterized in Table 2.5.1. How many black women do they need to hire in order for the events A: Employee is female and B: Employee is black to be independent?
TABLE 2.5.1
          White    Black
Male       50       30
Female     40        x

Let x denote the number of black women necessary for A and B to be independent. Then

P(A ∩ B) = P(Black female) = x/(120 + x)

must equal

P(A)P(B) = P(Female) · P(Black) = [(40 + x)/(120 + x)] · [(30 + x)/(120 + x)]

Setting x/(120 + x) = [(40 + x)/(120 + x)] · [(30 + x)/(120 + x)] implies that x(120 + x) = (40 + x)(30 + x), which reduces to 50x = 1200, so x = 24 black women need to be on the staff for A and B to be independent.

Comment. Having shown that "Employee is female" and "Employee is black" are independent, does it follow that, say, "Employee is male" and "Employee is white" are independent? Yes. By virtue of the derivation in Example 2.5.2, the independence of events A and B implies the independence of events A^c and B^c (as well as A and B^c, and A^c and B). It follows, then, that the x = 24 black women not only make A and B independent, they also imply, more generally, that "race" and "sex" are independent.

EXAMPLE 2.5.4
Suppose that two events, A and B, each having nonzero probability, are mutually exclusive. Are they also independent? No. If A and B are mutually exclusive, then P(A ∩ B) = 0. But P(A) · P(B) > 0 (by assumption), so the equality spelled out in Definition 2.5.1 that characterizes independence is not met.

72 Chapter 2 Probability

Deducing Independence

Sometimes the physical circumstances surrounding two events make it obvious that the occurrence (or nonoccurrence) of one has absolutely no influence on the occurrence (or nonoccurrence) of the other. If that should be the case, then the two events will necessarily be independent in the sense of Definition 2.5.1.

Suppose a coin is tossed twice. Clearly, whatever happens on the first toss has no physical connection with, or influence on, the outcome of the second. If A and B, then, are events defined on the second and first tosses, respectively, it would have to be the case that P(A|B) = P(A|B^c) = P(A). For example, let A be the event that the second toss of a fair coin is a head, and let B be the event that the first toss of that coin is a tail.
Then

P(A|B) = P(Head on second toss | Tail on first toss) = P(Head on second toss) = 1/2

Being able to infer that certain events are independent proves to be of enormous help in solving certain problems. The reason is that many events of interest are, in fact, intersections; if the component events are independent, the probability of that intersection reduces to a simple product (because of Definition 2.5.1), that is, P(A ∩ B) = P(A) · P(B). For the coin tosses just described,

P(A ∩ B) = P(head on second toss ∩ tail on first toss)
         = P(A) · P(B)
         = P(head on second toss) · P(tail on first toss)
         = (1/2)(1/2)
         = 1/4

EXAMPLE 2.5.5
Myra and Carlos are summer interns working as proofreaders for a local newspaper. Based on aptitude tests, Myra has a 50% chance of spotting a hyphenation error, while Carlos picks up on that same kind of mistake 80% of the time. Suppose the copy they are proofreading contains a hyphenation error. What is the probability it goes undetected?

Let A and B be the events that Myra and Carlos, respectively, catch the mistake. By assumption, P(A) = 0.50 and P(B) = 0.80. What we are looking for is the probability of the complement of a union. That is,

P(error goes undetected) = 1 − P(error is detected)
= 1 − P(Myra or Carlos or both see the mistake)
= 1 − P(A ∪ B)
= 1 − [P(A) + P(B) − P(A ∩ B)]    (from the addition rule)

Since proofreaders invariably work by themselves, events A and B are necessarily independent, so P(A ∩ B) reduces to the product P(A) · P(B). It follows that such an error would go unnoticed 10% of the time:

P(error goes undetected) = 1 − [0.50 + 0.80 − (0.50)(0.80)] = 1 − 0.90 = 0.10

EXAMPLE 2.5.6
Suppose that one of the genes associated with the control of carbohydrate metabolism exhibits two alleles, a dominant W and a recessive w. If the probabilities of the WW, Ww, and ww genotypes in the present generation are p, q, and r, respectively, for both males and females, what are the chances that an individual in the next generation will be a ww?

Let A denote the event that an offspring receives a w allele from its father; let B denote the event that it receives the recessive allele from its mother. What we are looking for is P(A ∩ B).
According to the information given,

p = P(parent has genotype WW) = P(WW)
q = P(parent has genotype Ww) = P(Ww)
r = P(parent has genotype ww) = P(ww)

If an offspring is equally likely to receive either of its parent's alleles, the probabilities of A and B can be computed using Theorem 2.4.1:

P(A) = P(A|WW)P(WW) + P(A|Ww)P(Ww) + P(A|ww)P(ww)
     = 0 · p + (1/2) · q + 1 · r
     = r + q/2
     = P(B)

Lacking any evidence to the contrary, there is every reason here to assume that A and B are independent events, in which case

P(A ∩ B) = P(offspring has genotype ww) = P(A) · P(B) = (r + q/2)^2

(This particular model for allele segregation, together with the independence assumption, is called random Mendelian mating.)

74 Chapter 2 Probability

EXAMPLE 2.5.7
Emma and Josh have just gotten engaged. What is the probability that they have different blood types? Assume that blood types for both men and women are distributed in the general population according to the following proportions:

Blood Type    Proportion
A             40%
B             10%
AB             5%
O             45%

First, note that the event "Emma and Josh have different blood types" includes more possibilities than does the event "Emma and Josh have the same blood type." That being the case, the complement will be easier to work with than the event posed. We can start, then, by writing

P(Emma and Josh have different blood types) = 1 − P(Emma and Josh have the same blood type)

Now, if we let Ex and Jx denote the events that Emma and Josh, respectively, have blood type x, then the event "Emma and Josh have the same blood type" is a union of intersections, and we can write

P(Emma and Josh have the same blood type) = P[(EA ∩ JA) ∪ (EB ∩ JB) ∪ (EAB ∩ JAB) ∪ (EO ∩ JO)]

Since the four intersections here are mutually exclusive, the probability of their union becomes the sum of their probabilities. Moreover, "blood type" is not a factor in the selection of a spouse, so Ex and Jx are independent events and P(Ex ∩ Jx) = P(Ex)P(Jx).
It follows, then, that Emma and Josh have a 62.5% chance of having different blood types:

P(Emma and Josh have different blood types)
= 1 − [P(EA)P(JA) + P(EB)P(JB) + P(EAB)P(JAB) + P(EO)P(JO)]
= 1 − [(0.40)(0.40) + (0.10)(0.10) + (0.05)(0.05) + (0.45)(0.45)]
= 0.625

QUESTIONS

2.5.1. Suppose that P(A ∩ B) = 0.2, P(A) = 0.6, and P(B) = 0.5.
(a) Are A and B mutually exclusive?
(b) Are A and B independent?
(c) Find P(A^c ∪ B^c).

2.5.2. Spike is not a terribly bright student. His chances of passing chemistry are 0.35; mathematics, 0.40; and both, 0.12. Are the events "Spike passes chemistry" and "Spike passes mathematics" independent? What is the probability that he fails both subjects?

2.5.3. Two fair dice are rolled. What is the probability that the number showing on one will be twice the number appearing on the other?

2.5.4. Urn I has three red chips, two black chips, and five white chips; urn II has two red, four black, and three white. One chip is drawn at random from each urn. What is the probability that both chips are the same color?

2.5.5. Dana and Cathy are playing tennis. The probability that Dana wins at least one out of two games is 0.3. What is the probability that Dana wins at least one out of four?

2.5.6. Three points, X1, X2, and X3, are chosen at random in the interval (0, a). A second set of three points, Y1, Y2, and Y3, are chosen at random in the interval (0, b). Let A be the event that X2 is between X1 and X3. Let B be the event that Y1 < Y2 < Y3. Find P(A ∩ B).

2.5.7. Suppose that P(A) = 1/4 and P(B) = 1/8.
(a) What does P(A ∪ B) equal if
  1. A and B are mutually exclusive?
  2. A and B are independent?
(b) What does P(A | B) equal if
  1. A and B are mutually exclusive?
  2. A and B are independent?

2.5.8. Suppose that events A, B, and C are independent.
(a) Use a Venn diagram to find an expression for P(A ∪ B ∪ C) that does not make use of a complement.
(b) Find an expression for P(A ∪ B ∪ C) that does make use of a complement.

2.5.9. A fair coin is tossed four times.
What is the probability that the number of heads appearing on the first two tosses is equal to the number of heads appearing on the second two tosses?

2.5.10. Suppose that two cards are drawn from a standard 52-card poker deck. Let A be the event that both are either a jack, queen, king, or ace of hearts, and let B be the event that both are aces. Are A and B independent? Note: There are 1,326 equally likely ways to draw two cards from a poker deck.

Defining the Independence of More Than Two Events

It is not immediately obvious how to extend Definition 2.5.1 to, say, three events. To call A, B, and C independent, should we require that the probability of the three-way intersection factors into the product of the three original probabilities,

P(A ∩ B ∩ C) = P(A) · P(B) · P(C)    (2.5.3)

or should we impose the definition we already have on the three pairs of events:

P(A ∩ B) = P(A) · P(B)
P(B ∩ C) = P(B) · P(C)
P(A ∩ C) = P(A) · P(C)    (2.5.4)

Actually, neither condition by itself is sufficient. If the three events satisfy Equations 2.5.3 and 2.5.4, we will call them independent (or mutually independent), but Equation 2.5.3 does not imply Equation 2.5.4, nor does Equation 2.5.4 imply Equation 2.5.3 (see Questions 2.5.11 and 2.5.12). More generally, the independence of n events requires that the probabilities of all possible intersections equal the products of all the corresponding individual probabilities. Definition 2.5.2 states the result formally. Analogous to what was true in the case of two events, the practical applications of Definition 2.5.2 arise when n events are mutually independent, and we can calculate P(A1 ∩ A2 ∩ ... ∩ An) by computing the product P(A1) · P(A2) · ... · P(An).

Definition 2.5.2. Events A1, A2, ..., An are said to be independent if for every set of indices i1, i2, ..., ik between 1 and n, inclusive,

P(Ai1 ∩ Ai2 ∩ ... ∩ Aik) = P(Ai1) · P(Ai2) · ... · P(Aik)

EXAMPLE 2.5.8
Audrey has registered for four courses in the upcoming fall term, one each in physics, English, economics, and sociology. Based on what has happened in the recent past,
it would seem reasonable to assume that she has a 20% chance of being bumped from the physics class, a 10% chance of being bumped from the English class, a 30% chance of being bumped from the economics class, and no chance of being bumped from the sociology class. What is the probability that she fails to get into at least one class?

Define the events

A1: Audrey is bumped from physics
A2: Audrey is bumped from English
A3: Audrey is bumped from economics
A4: Audrey is bumped from sociology

Then P(A1) = 0.20, P(A2) = 0.10, P(A3) = 0.30, and P(A4) = 0. The chance that Audrey gets bumped from at least one class can be written as the probability of a union,

P(Audrey is bumped from at least one class) = P(A1 ∪ A2 ∪ A3 ∪ A4)    (2.5.5)

but evaluating Equation 2.5.5 directly is somewhat involved because the Ai's are not mutually exclusive. A much simpler approach is to express the complement of "bumped from at least one" as an intersection:

P(Audrey is bumped from at least one class)
    = 1 − P(Audrey is not bumped from any of the classes)
    = 1 − P(A1^C ∩ A2^C ∩ A3^C ∩ A4^C)

Since four different departments are involved, the "factors" are likely to be independent events, so the probability of the intersection reduces to a product:

P(Audrey is bumped from at least one class) = 1 − P(A1^C)P(A2^C)P(A3^C)P(A4^C)
    = 1 − (0.80)(0.90)(0.70)(1.00)
    = 0.496

EXAMPLE 2.5.9

The YouDie-WePay Insurance Company plans to assess its future liabilities by sampling the records of its current policyholders. A pilot study has turned up three clients, one living in Alaska, one in Missouri, and one in Vermont, whose chances of surviving to the year 2010 are 0.7, 0.9, and 0.3, respectively. What is the probability that by the end of 2009 the company will have had to pay death benefits to exactly one of the three?

Let A1 be the event "Alaska client survives through 2009." Define A2 and A3 analogously for the Missouri client and the Vermont client, respectively. Then the event E: "Exactly one dies" can be written as the union of three intersections:

E = (A1^C ∩ A2 ∩ A3) ∪ (A1 ∩ A2^C ∩ A3) ∪ (A1 ∩ A2 ∩ A3^C)

Since each of the intersections is mutually exclusive of the other two,

P(E) = P(A1^C ∩ A2 ∩ A3) + P(A1 ∩ A2^C ∩ A3) + P(A1 ∩ A2 ∩ A3^C)

Furthermore, there is no reason to believe that, for all practical purposes, the fates of the three are not independent.
That being the case, each of the intersection probabilities reduces to a product, and we can write

P(E) = P(A1^C) · P(A2) · P(A3) + P(A1) · P(A2^C) · P(A3) + P(A1) · P(A2) · P(A3^C)
    = (0.3)(0.9)(0.3) + (0.7)(0.1)(0.3) + (0.7)(0.9)(0.7)
    = 0.543

Comment. "Declaring" events independent for reasons other than those prescribed in Definition 2.5.2 is a necessarily subjective endeavor. Here we might feel fairly certain that a "random" person dying in Alaska will not affect the survival chances of a "random" person residing in Missouri (or Vermont). But there may be special circumstances that invalidate that sort of argument. For example, what if the three individuals in question were mercenaries in an African border war and were all assigned as crew to the same helicopter? In practice, all we can do is look at each situation on an individual basis and try to make a reasonable judgment as to whether the occurrence of one event is likely to influence the outcome of another.

EXAMPLE 2.5.10

Protocol for making financial decisions in a certain corporation follows the "circuit" pictured in Figure 2.5.1. Any budget is first screened by 1. If he approves it, the plan is forwarded to 2, 3, and 5. If either 2 or 3 concurs, it goes to 4. If either 4 or 5 says "yes," it moves on to 6 for a final reading. Only if 6 is also in agreement does the proposal pass. Suppose that 1, 5, and 6 each have a 50% chance of saying "yes," whereas 2, 3, and 4 will each concur with a probability of 0.70. If everyone comes to a decision independently, what is the probability that a budget will pass?

Probabilities of this sort are calculated by reducing the circuit to its component unions and intersections. Moreover, if all decisions are made independently, which is the case here, then every intersection probability becomes a product.

FIGURE 2.5.1 [The decision circuit: 1 feeds 2, 3, and 5; 2 and 3 feed 4; 4 and 5 feed 6.]

Let Ai be the event that person i in Figure 2.5.1 approves the budget, i = 1, 2, ..., 6.
Looking at Figure 2.5.1, we see that

P(budget passes) = P(A1 ∩ {[(A2 ∪ A3) ∩ A4] ∪ A5} ∩ A6)
    = P(A1) · P{[(A2 ∪ A3) ∩ A4] ∪ A5} · P(A6)

By assumption, P(A1) = P(A5) = P(A6) = 0.5 and P(A2) = P(A3) = P(A4) = 0.7, so

P[(A2 ∪ A3) ∩ A4] = [P(A2) + P(A3) − P(A2)P(A3)] · P(A4)
    = [0.7 + 0.7 − (0.7)(0.7)](0.7)
    = 0.637

and

P(budget passes) = (0.5){0.637 + 0.5 − (0.637)(0.5)}(0.5)
    = 0.205

Repeated Independent Events

We have already seen several examples where the event of interest was actually an intersection of independent events (in which case the probability of the intersection reduced to a product). There is a special case of that basic scenario that deserves special mention because it applies to numerous situations. If the events making up the intersection all arise from the same physical circumstances and assumptions (i.e., they represent repetitions of the same experiment), they are referred to as repeated independent trials. The number of such trials may be finite or infinite.

EXAMPLE 2.5.11

Suppose the string of Christmas tree lights you just bought has twenty-four bulbs wired in series. If each bulb has a 99.9% chance of "working" the first time current is applied, what is the probability that the string itself will not work?

Let Ai be the event that the ith bulb fails, i = 1, 2, ..., 24. Then

P(string fails) = P(at least one bulb fails)
    = P(A1 ∪ A2 ∪ ... ∪ A24)
    = 1 − P(string works)
    = 1 − P(all twenty-four bulbs work)
    = 1 − P(A1^C ∩ A2^C ∩ ... ∩ A24^C)

If we assume that bulb failures are independent events, the probability of the intersection reduces to a product. Moreover, since all the bulbs are presumably manufactured the same way, P(Ai^C) is the same for all i, so

P(string fails) = 1 − [P(A1^C)]^24 = 1 − (0.999)^24 = 1 − 0.98 = 0.02

The chances are one in fifty, in other words, that the string would not work the first time you take it out of the box.

EXAMPLE 2.5.12

A box contains one two-headed coin and eight fair coins. One is drawn at random and tossed seven times. Suppose that all seven tosses come up heads.
What is the probability that the coin is fair?

This is basically a Bayes' theorem problem, but the conditional probabilities on the right-hand side of Theorem 2.4.2 appeal to the notion of independence as well. Define the events

B: seven heads occurred in seven tosses
A1: coin tossed has two heads
A2: coin tossed was fair

The question is asking for P(A2 | B). By virtue of the composition of the box, P(A1) = 1/9 and P(A2) = 8/9. Also,

P(B | A1) = P(head on first toss ∩ ... ∩ head on seventh toss | coin has two heads) = 1^7 = 1

Similarly, P(B | A2) = (1/2)^7. Substituting into Bayes' formula shows that the probability is 0.06 that the coin is fair:

P(A2 | B) = P(B | A2)P(A2) / [P(B | A1)P(A1) + P(B | A2)P(A2)]
    = (1/2)^7 (8/9) / [(1)(1/9) + (1/2)^7 (8/9)]
    = 0.06

Comment. Let Bn denote the event that the coin chosen at random is tossed n times with the result being that n heads appear. As our intuition would suggest, P(A2 | Bn) → 0 as n → ∞:

lim(n→∞) P(A2 | Bn) = lim(n→∞) (1/2)^n (8/9) / [(1)(1/9) + (1/2)^n (8/9)] = 0

EXAMPLE 2.5.13

During the 1978 baseball season, Pete Rose of the Cincinnati Reds set a National League record by hitting safely in 44 consecutive games. Assume that Rose is a .300 hitter and that he comes to bat four times each game. If each at-bat is assumed to be an independent event, what probability might reasonably be associated with a hitting streak of that length?

For this problem we need to invoke the repeated independent trials model twice, once for the four at-bats making up a game and a second time for the forty-four games making up the streak. Let Ai denote the event "Rose hits safely in the ith game," i = 1, 2, ..., 44. Then

P(Rose hits safely in forty-four consecutive games) = P(A1 ∩ A2 ∩ ... ∩ A44)
    = P(A1) · P(A2) · ... · P(A44)    (2.5.6)

Since all the P(Ai)'s are equal, we can further simplify Equation 2.5.6 by writing

P(Rose hits safely in forty-four consecutive games) = [P(A1)]^44

To calculate P(A1) we should focus on the complement of A1. Specifically,

P(A1) = 1 − P(A1^C)
    = 1 − P(Rose does not hit safely in Game 1)
    = 1 − P(Rose makes four outs)
    = 1 − (0.700)^4 = 0.76    (why?)
Therefore, the probability of a .300 hitter putting together a forty-four-game streak (during a given set of forty-four games) is 0.0000057:

P(Rose hits safely in forty-four consecutive games) = (0.76)^44 = 0.0000057

Comment. The analysis described here has the "structure" of a repeated independent trials problem, but the assumptions that characterize the latter are not entirely satisfied by the data. Each at-bat, for example, is not really a repetition of the same experiment, nor is P(Ai) the same for all i. Rose would obviously have different probabilities of getting a hit against different pitchers. Moreover, "four" was probably the typical number of official at-bats that he had during a game, but there would have been many instances where he had either fewer or more. Modest deviations of this sort from game to game, though, would not have a major effect on the probability associated with Rose's forty-four-game streak.

EXAMPLE 2.5.14

In a certain third world nation, statistics show that only two out of ten children born in the early 1980s reached the age of twenty-one. If the same mortality rate is operative over the next generation, how many children does a woman need to bear if she wants to have at least a 75% probability that at least one of her offspring survives to adulthood?

Restated, the question is asking for the smallest n such that

P(at least one of n children survives to adulthood) ≥ 0.75

Assuming that the fates of the n children are independent events,

P(at least one (of n) survives to age twenty-one) = 1 − P(all n die before adulthood)
    = 1 − (0.80)^n

Table 2.5.2 shows the value of 1 − (0.80)^n as a function of n.

TABLE 2.5.2

n    1 − (0.80)^n
5    0.67
6    0.74
7    0.79

By inspection, we see that the smallest number of children for which the probability is at least 0.75 that at least one of them survives to adulthood is seven.
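The search carried out by inspection in Table 2.5.2 is easy to automate. Below is a minimal Python sketch (the helper names are ours, not from the text) that reproduces the table and confirms that n = 7 is the smallest family size meeting the 0.75 requirement.

```python
# Sketch of the Example 2.5.14 calculation.
# The function names are hypothetical, not from the text.

def survival_prob(n, p_die=0.80):
    """P(at least one of n children survives) = 1 - p_die**n."""
    return 1 - p_die ** n

def smallest_family_size(p_die=0.80, target=0.75):
    """Smallest n for which survival_prob(n) reaches the target."""
    n = 1
    while survival_prob(n, p_die) < target:
        n += 1
    return n

for n in (5, 6, 7):                       # reproduces Table 2.5.2
    print(n, round(survival_prob(n), 2))  # 0.67, 0.74, 0.79
print(smallest_family_size())             # 7
```

The same loop works for any mortality rate and any target probability, which is the advantage of coding the inequality 1 − (0.80)^n ≥ 0.75 rather than tabulating it by hand.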
EXAMPLE 2.5.15 (Optional)

In the game of craps, one of the ways a player can win is by rolling (with two dice) one of the sums four, five, six, eight, nine, or ten, and then rolling that sum again before rolling a sum of seven. For example, the sequence of sums six, five, eight, eight, six would result in the player winning on his fifth roll. In gambling parlance, "six" is the player's "point," and he "made his point." On the other hand, the sequence of sums eight, four, ten, seven would result in the player losing on his fourth roll: his point was an eight, but he rolled a sum of seven before he rolled a second eight. What is the probability that a player wins with a point of ten?

Table 2.5.3 shows some of the ways a player can make a point of ten. Each sequence, of course, is an intersection of independent events, so its probability becomes a product.

TABLE 2.5.3

Sequence of Rolls                            Probability
(10, 10)                                     (3/36)(3/36)
(10, no 10 or 7, 10)                         (3/36)(27/36)(3/36)
(10, no 10 or 7, no 10 or 7, 10)             (3/36)(27/36)(27/36)(3/36)
...                                          ...

The event "Player wins with a point of ten" is then the union of all the sequences that could have been listed in the first column. Since all those sequences are mutually exclusive, the probability of winning with a point of ten reduces to the sum of an infinite number of products:

P(Player wins with a point of ten)
    = (3/36)(3/36) + (3/36)(27/36)(3/36) + (3/36)(27/36)(27/36)(3/36) + ...
    = (3/36)(3/36) Σ(k=0 to ∞) (27/36)^k    (2.5.7)

Recall from algebra that if 0 < r < 1, Σ(k=0 to ∞) r^k = 1/(1 − r). Applying the formula for the sum of a geometric series to Equation 2.5.7 shows that the probability of winning at craps with a point of ten is 1/36:

P(Player wins with a point of ten) = (3/36)(3/36) · 1/(1 − 27/36) = 1/36

Comment. Table 2.5.4 shows the probabilities of a person "making" each of the possible points 4, 5, 6, 8, 9, and 10.

TABLE 2.5.4

Point    P(makes point)
4        1/36
5        16/360
6        25/396
8        25/396
9        16/360
10       1/36

According to the rules of craps, a player wins by either (1) getting a sum of seven or eleven on the first roll or (2) getting a 4, 5, 6, 8, 9, or 10 on the first roll and then making the point. But P(sum = 7) = 6/36 and P(sum = 11) = 2/36, so
P(player wins) = 6/36 + 2/36 + 1/36 + 16/360 + 25/396 + 25/396 + 16/360 + 1/36
    = 0.493

As gambling games go, craps is relatively fair: the probability of the shooter winning is only slightly less than 0.500.

QUESTIONS

2.5.11. Suppose that two fair dice (one red and one green) are rolled. Define the events

A: a 1 or a 2 shows on the red die
B: a 3, 4, or 5 shows on the green die
C: the dice total is four, eleven, or twelve

Show that these events satisfy Equation 2.5.3 but not Equation 2.5.4.

2.5.12. A roulette wheel has thirty-six numbers colored red (R) or black (B) according to the pattern indicated below:

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
 R  R  R  R  R  B  B  B  B  R  R  R  R  B  B  B  B  B
36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19

Define the events

A: red number appears
B: even number appears
C: number is less than or equal to eighteen

Show that these events satisfy Equation 2.5.4 but not Equation 2.5.3.

2.5.13. How many probability equations need to be verified to establish the mutual independence of four events?

2.5.14. In a roll of a pair of fair dice (one red and one green), let A be the event the red die shows a 1 or a 2; let B be the event the green die shows a 3, 4, or 5; and let C be the event the dice total is seven. Show that A, B, and C are independent.

2.5.15. In a roll of a pair of fair dice (one red and one green), let A be the event of an odd number on the red die, let B be the event of an odd number on the green die, and let C be the event that the sum is odd. Show that any pair of these events are independent but that A, B, and C are not mutually independent.

2.5.16. On her way to work, a commuter encounters four traffic signals. Assume that the distance between each of the four is sufficiently great that her probability of getting a green light at any intersection is independent of what happened at any previous intersection. The first two lights are green for forty seconds of each minute; the last two, for thirty seconds of each minute.
What is the probability that the commuter has to stop at least three times?

2.5.17. School board officials are debating whether to require all high school seniors to take a proficiency exam before graduating. A student passing all three parts (mathematics, language skills, and general knowledge) would be awarded a diploma; otherwise, he would receive only a certificate of attendance. A practice test given to ninety-five hundred seniors resulted in the following numbers of failures:

Area                 Number of Students Failing
Mathematics          3325
Language skills      1900
General knowledge    1425

If "Student fails mathematics," "Student fails language skills," and "Student fails general knowledge" are independent events, what proportion of next year's seniors can be expected to fail to qualify for a diploma? Does independence seem a reasonable assumption in this situation?

2.5.18. Consider the following four-switch circuit:

[FIGURE: a four-switch circuit; not reproduced]

If all switches operate independently and P(switch closes) = p, what is the probability the circuit is completed?

2.5.19. A fast-food chain is running a new promotion. For each purchase, a customer is given a game card that may win $10. The company claims that the probability of a person winning at least once in five tries is 0.32. What is the probability that a customer wins $10 on his or her first purchase?

2.5.20. Players A, B, and C toss a fair coin in order. The first to throw a head wins. What are their respective chances of winning?

2.5.21. Andy, Bob, and Charley have gotten into a disagreement over a female acquaintance and decide to settle their dispute with a three-cornered pistol duel. Of the three, Andy is the worst shot, hitting his target only 30% of the time. Charley, a little better, is on-target 50% of the time, while Bob never misses. The rules they agree to are simple: They are to fire at the target of their choice in succession, and cyclically, in the order Andy, Bob, Charley,
Andy, Bob, Charley, and so on, until only one of them is left standing. (On each "turn," they get only one shot. If a combatant is hit, he no longer participates, either as a shooter or as a target.) Show that Andy's optimal strategy, assuming he wants to maximize his chances of staying alive, is to fire his first shot into the ground.

2.5.22. According to an advertising study, 15% of television viewers who have seen a certain automobile commercial can correctly identify the actor who does the voice-over. Suppose that ten such people are watching TV and the commercial comes on. What is the probability that at least one of them can name the actor? What is the probability that exactly one can name the actor?

2.5.23. A fair die is rolled and then n fair coins are tossed, where n is the number showing on the die. What is the probability that no heads appear?

2.5.24. Each of m urns contains three red chips and four white chips. A total of r samples with replacement are taken from each urn. What is the probability that at least one red chip is drawn from at least one urn?

2.5.25. If two fair dice are tossed, what is the smallest number of throws, n, for which the probability of getting at least one double six exceeds 0.5? (Note: This was one of the first problems that de Méré communicated to Pascal in 1654.)

2.5.26. A pair of fair dice are rolled until the first sum of eight appears. What is the probability that a sum of seven does not precede that first sum of eight?

2.5.27. An urn contains w white chips, b black chips, and r red chips. The chips are drawn out at random, one at a time, with replacement. What is the probability that a white appears before a red?

2.5.28. A Coast Guard dispatcher receives an SOS from a ship that has run aground off the shore of a small island. Before the captain can relay her exact position, though, her radio goes dead. The dispatcher has n helicopter crews he can send out to conduct a search.
He suspects the ship is somewhere either south in area I (with probability p) or north in area II (with probability 1 − p). Each of the n rescue parties is equally competent and has probability r of locating the ship given that it has run aground in the sector being searched. How should the dispatcher deploy the helicopter crews to maximize the probability that one of them will find the missing ship? Hint: Assume that m search crews are sent to area I and n − m are sent to area II. Let B denote the event that the ship is found, let A1 be the event that the ship is in area I, and let A2 be the event that the ship is in area II. Use Theorem 2.4.1 to get an expression for P(B); then differentiate with respect to m.

2.5.29. A computer is instructed to generate a random sequence using the digits 0 through 9; repetitions are permissible. What is the shortest length the sequence can be and still have at least a 70% probability of containing at least one 4?

2.6 COMBINATORICS

Combinatorics is a time-honored branch of mathematics concerned with counting, arranging, and ordering. While blessed with a wealth of early contributors (there are references to combinatorial problems in the Old Testament), its emergence as a separate discipline is often credited to the German mathematician and philosopher Gottfried Wilhelm Leibniz (1646-1716), whose 1666 monograph Dissertatio de arte combinatoria was perhaps the first treatise ever written on the subject (111).

Applications of combinatorics are rich in both diversity and number. Users range from the molecular biologist trying to determine how many ways genes can be positioned along a chromosome, to a computer scientist studying queuing priorities, to a psychologist modeling the way we learn, to a weekend poker player wondering whether he should draw to a straight, or a flush, or a full house. Surprisingly enough, the solutions to all of these questions are rooted in the same set of four basic theorems and rules, despite the differences that seem to distinguish one question from another.
Counting Ordered Sequences: The Multiplication Rule

More often than not, the relevant "outcomes" in a combinatorial problem are ordered sequences. If two dice are rolled, for example, the outcome (4, 5), that is, the first die comes up 4 and the second die comes up 5, is an ordered sequence of length two. The number of such sequences is calculated by using the most fundamental result in combinatorics, the multiplication rule.

Multiplication Rule. If operation A can be performed in m different ways and operation B in n different ways, the sequence (operation A, operation B) can be performed in m · n different ways.

Proof. At the risk of belaboring the obvious, we can verify the multiplication rule by considering a tree diagram (see Figure 2.6.1). Since each version of A can be followed by any of n versions of B, and there are m versions of A, the total number of "A, B" sequences that can be pieced together is obviously the product m · n. □

FIGURE 2.6.1 [Tree diagram: each of the m versions of operation A branches into the n versions of operation B.]

Corollary. If operation Ai, i = 1, 2, ..., k, can be performed in ni ways, i = 1, 2, ..., k, respectively, then the ordered sequence (operation A1, operation A2, ..., operation Ak) can be performed in n1 · n2 · ... · nk ways.

EXAMPLE 2.6.1

The combination lock on a briefcase has two dials, each marked off with sixteen notches (see Figure 2.6.2). To open the case, a person first turns the left dial in a certain direction for two revolutions and then stops on a particular mark. The right dial is set in a similar fashion, after having been turned in a certain direction for two revolutions. How many different settings are possible?

FIGURE 2.6.2 [The two sixteen-notch dials of the briefcase lock.]

In the terminology of the multiplication rule, opening the briefcase corresponds to the four-step sequence (A1, A2, A3, A4) detailed in Table 2.6.1. Applying the previous corollary, we see that 1,024 different settings are possible:

Number of different settings = n1 · n2 · n3 · n4 = 2 · 16 · 2 · 16 = 1,024

TABLE 2.6.1

Step    Purpose                                                Number of Options
A1      Rotating the left dial in a particular direction       2
A2      Choosing an endpoint for the left dial                 16
A3      Rotating the right dial in a particular direction      2
A4      Choosing an endpoint for the right dial                16

Comment. Designers of locks should be aware that the number of dials, as opposed to the number of notches on each dial, is the critical factor in determining how many different settings are possible. A two-dial lock, for example, where each dial has twenty notches, gives rise to only 2 · 20 · 2 · 20 = 1,600 settings. If those forty notches, though, are distributed among four dials (ten to each dial), the number of different settings increases a hundredfold to 160,000 (= 2 · 10 · 2 · 10 · 2 · 10 · 2 · 10).

EXAMPLE 2.6.2

Alphonse Bertillon, a nineteenth-century French criminologist, developed an identification system based on eleven anatomical variables (height, head width, ear length, etc.) that presumably remained essentially unchanged during an individual's adult life. The range of each variable was divided into three subintervals: small, medium, and large. A person's Bertillon configuration was an ordered sequence of eleven letters, say

s, s, m, m, l, s, l, s, s, m, s

where a letter indicated the individual's "size" relative to a particular variable. How populated does a city have to be before it can be guaranteed that at least two citizens will have the same configuration?

Viewed as an ordered sequence, a Bertillon configuration is an eleven-step classification system, where three options are available at each step. By the multiplication rule, a total of 3^11, or 177,147, distinct sequences are possible. Accordingly, any city with at least 177,148 adults would necessarily have at least two residents with the same configuration. (The limited number of possibilities generated by Bertillon's variables proved to be one of the system's major weaknesses. Still, it was widely used in Europe for criminal identification before the development of fingerprinting.)
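The products computed in Examples 2.6.1 and 2.6.2, and in the comment on lock design, can be checked with a few lines of Python. This is a sketch of our own (the variable names are not from the text), included only to verify the arithmetic.

```python
# Multiplication-rule arithmetic from Examples 2.6.1 and 2.6.2.
from math import prod

# Example 2.6.1: (direction, endpoint) for each of two sixteen-notch dials.
lock_settings = prod([2, 16, 2, 16])
print(lock_settings)        # 1024

# Comment after Example 2.6.1: the same forty notches spread over
# four ten-notch dials give a hundredfold increase.
print(prod([2, 10] * 4))    # 160000

# Example 2.6.2: eleven Bertillon variables, three sizes each.
print(3 ** 11)              # 177147
```

The pigeonhole consequence follows directly: with only 177,147 possible configurations, any population of 177,148 or more must contain a repeat.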
EXAMPLE 2.6.3

In 1824 Louis Braille invented what would eventually become the standard alphabet for the blind. Based on an earlier form of "night writing" used by the French army for reading battlefield communiques in the dark, Braille's system replaced each written character with a six-dot matrix, two dots wide and three dots tall, where certain dots were raised, the choice depending on the character being transcribed. The letter e, for example, has two raised dots. Punctuation marks, common words, suffixes, and so on also have dot patterns. In all, how many different characters can be enciphered in Braille?

Think of the six dots as six distinct operations, numbered 1 to 6 (see Figure 2.6.3). In forming a Braille letter, we have two options for each dot: We can raise it or not raise it. The letter e, for example, corresponds to the sequence (raise, do not raise, do not raise, do not raise, raise, do not raise).

FIGURE 2.6.3 [The six dots numbered 1, 2, 3 down the left column and 4, 5, 6 down the right column; each dot offers two options, raise or do not raise.]

The number of such sequences, with k = 6 and n1 = n2 = ... = n6 = 2, is 2^6, or 64. One of those sixty-four configurations, though, has no raised dots, making it of no use to a blind person. Figure 2.6.4 shows the entire sixty-three-character Braille alphabet.

FIGURE 2.6.4 [The sixty-three-character Braille alphabet: the letters a through z; contractions for common words and letter pairs such as and, for, of, the, with, ch, sh, th, wh, ed, er, ou, ow, in, en, st, ar; punctuation marks; and special signs (capital sign, letter sign, italic sign, accent sign, decimal point). Not reproduced here.]

EXAMPLE 2.6.4

The annual NCAA ("March Madness") basketball tournament starts with a field of sixty-four teams. After six rounds of play, the only squad left unbeaten is declared the national champion. How many different configurations of winners and losers are possible, starting with the first round? Assume that the initial pairing of the sixty-four invited teams has already been done.

Counting the number of ways a tournament of this sort can play out is an exercise in applying the multiplication rule twice. Notice, first, that the thirty-two first-round games can be decided in 2^32 ways. Similarly, the resulting sixteen second-round games can produce 2^16 different sets of winners, and so on. Overall, the tournament can be pictured as a six-step sequence, where the numbers of possible outcomes at the six steps are 2^32, 2^16, 2^8, 2^4, 2^2, and 2^1, respectively. It follows that the number of possible tournaments (not all of which, of course, would be equally likely!) is the product

2^32 · 2^16 · 2^8 · 2^4 · 2^2 · 2^1 = 2^63

EXAMPLE 2.6.5

An octave contains twelve distinct notes (on a piano, five black keys and seven white keys). How many different eight-note melodies within a single octave can be written if the black keys and white keys need to alternate?

FIGURE 2.6.5 [The two alternating patterns for an eight-note melody. Case (a): B W B W B W B W, with 5, 7, 5, 7, 5, 7, 5, 7 choices at notes 1 through 8. Case (b): W B W B W B W B, with 7, 5, 7, 5, 7, 5, 7, 5 choices.]

There are two different ways in which the black keys and white keys can alternate: the black keys could produce notes 1, 3, 5, and 7 in the melody, or notes 2, 4, 6, and 8. Figure 2.6.5 diagrams the two cases. Consider first the case where the black keys produce the odd-numbered notes in the melody.
In the language of the multiplication rule's corollary, notes 1, 3, 5, and 7 correspond to operations A1, A3, A5, and A7, for which the numbers of options are n1 = 5, n3 = 5, n5 = 5, and n7 = 5. The white keys (that is, operations A2, A4, A6, and A8) all have ni = 7, i = 2, 4, 6, 8, so the number of different "alternating" melodies in which a black note comes first is the product 5^4 · 7^4, or 1,500,625. By the same argument, the second case (where the black keys produce the even-numbered notes in a melody) also generates 7^4 · 5^4 = 1,500,625 melodies. Altogether, then, the number of melodies with alternating black and white notes is the sum 1,500,625 + 1,500,625, or 3,001,250.

PROBLEM-SOLVING HINTS (Doing combinatorial problems)

Combinatorial questions sometimes call for problem-solving techniques that are not routinely used in other areas of mathematics. The three listed below are especially helpful.

1. Draw a diagram that shows the structure of the outcomes that are being counted. Be sure to include (or indicate) all relevant variations. A case in point is Figure 2.6.5. Recognizing at the outset that there are two mutually exclusive ways for black keys and white keys to alternate (black keys can be either the odd-numbered notes or the even-numbered notes) is a critical first step in solving the problem. Almost invariably, diagrams such as these will suggest the formula, or combination of formulas, that should be applied.

2. Use enumerations to "test" the appropriateness of a formula. Typically, the answer to a combinatorial question, that is, the number of ways to do something, will be so large that listing all possible outcomes is not feasible. It often is feasible, though, to construct a simple, but analogous, problem for which the entire set of outcomes can be identified (and counted). If the proposed formula does not agree with the simple-case enumeration, we know that our analysis of the original question is incorrect.

3.
If the outcomes to be counted fall into structurally different categories, the total number of outcomes will be the sum (not the product) of the numbers of outcomes in each category. Recall Example 2.6.5. Alternating melodies fall into two structurally different categories: the black keys can be the odd-numbered notes or they can be the even-numbered notes (there is no third possibility). Associated with each category is a different set of outcomes, implying that the total number of alternating melodies is the sum of the numbers of outcomes associated with the two categories.

QUESTIONS

2.6.1. An engineer wishes to observe the effects of temperature, pressure, and catalyst concentration on the yield resulting from a certain reaction. If she intends to include two different temperatures, three pressures, and two levels of catalyst, how many different runs must she make in order to observe each temperature-pressure-catalyst combination exactly twice?

2.6.2. A coded message from a CIA operative to his KGB counterpart is to be sent in the form Q4Er, where the first and last entries must be consonants; the second, an integer 1 through 9; and the third, one of the vowels. How many different ciphers can be transmitted?

2.6.3. How many terms will be included in the expansion of

(a + b + c)(d + e + f)(x + y + u + v + w)

Which of the following will be included in that number: aeu, cdx, bef, xvw?

2.6.4. Suppose that the format for license plates in a certain state is two letters followed by four numbers.
(a) How many different plates can be made?
(b) How many different plates are there if the letters can be repeated but no two numbers can be the same?
(c) How many different plates can be made if repetitions of numbers and letters are allowed except that no plate can have four zeros?

2.6.5. How many integers between 100 and 999 have distinct digits, and how many of those are odd numbers?

2.6.6. A fast-food restaurant offers customers a choice of eight toppings that can be added to a hamburger. How many different hamburgers can be ordered?

2.6.7.
In baseball there are twenty-four different "base-out" configurations (runner on first with two outs, bases loaded with none out, and so on). Suppose that a new game, sleazeball, is played where there are seven bases (excluding home plate) and each team gets five outs an inning. How many base-out configurations would be possible in sleazeball?

2.6.8. Zip codes were originally five-digit numbers. (In reality, the lowest zip code was 00601, in Puerto Rico; the highest belonged to Ketchikan, Alaska.) An additional four digits have recently been added, so each zip code is now a nine-digit number. How many nine-digit zip codes are even numbers and have a seven as the third digit?

2.6.9. A restaurant offers a choice of four appetizers, fourteen entrees, six desserts, and five beverages. How many different meals are possible if a diner intends to order only three courses? (Consider the beverage to be a "course.")

2.6.10. Proteins are chains of molecules chosen from some twenty different amino acids. In a living cell, proteins are synthesized through a mechanism whereby ordered sequences of nucleotides in the messenger RNA dictate the formation of a particular amino acid. The four key nucleotides are adenine, cytosine, guanine, and uracil (A, C, G, and U). Assuming A, C, G, or U can appear any number of times in a nucleotide chain and that all sequences are physically possible, what is the minimum length the chains must attain to have the capability of encoding the entire set of amino acids? Note: Each sequence in the genetic code must have the same number of nucleotides.

2.6.11. Residents of a condominium have an automatic garage door opener that has a row of eight buttons. Each door has been programmed to respond to a particular set of buttons being pushed. If the condominium houses 250 families, can residents be assured that no two garage doors will open on the same signal? If so, how many additional families can move in before the eight-button code becomes inadequate? Note: The order in which the buttons are pushed is irrelevant.

2.6.12. In international Morse code, each letter in the alphabet is symbolized by a series of dots and dashes: the letter a, for example, is encoded as ".-". What is the maximum number of dots and/or dashes needed to represent any letter in the English alphabet?

2.6.13. The decimal number corresponding to a sequence of n binary digits a0, a1, ..., a(n−1) is defined to be

a0 · 2^0 + a1 · 2^1 + ... + a(n−1) · 2^(n−1)

For example, the sequence 0 1 1 0 is equal to 6 (= 0 · 2^0 + 1 · 2^1 + 1 · 2^2 + 0 · 2^3). Suppose a fair coin is tossed nine times. Replacing the resulting sequence of H's
, a,,-I, rl... lh"".rI 10 be 2.6.9. A restaurant ufftrs a choice uf fuur "' ...., .. ." .... , 2.6.10. 2.6.11. 2.6.12. 2.6.13. For example, the sequence 0 1 i 0 is equal to 6 Suppose a fair coin is lossed nine times. l- I . 21 I 1 22 +0 . the resulting sequence of H's Section 2.6 "''''rnlP'nt''~''' exceed 2.6.14. Given the Combinatoric; 93 a binary sequence of 1's and O's (1 for 0 for For how many of tosses will the decimal corresponding to the observed set of heads and tails in the word ZOMBIES in how many ways call two of the letters be arranged that one is a vowel and one is a consonant? 2.6.15. Suppose that two cards are drawn-in order-from a standard 52-card poker In how many ways can one of the cards be a club and one of the cards be an ace? 2.6.16. Monica's vacation plans require (hat she fly from Nashvitle to Chicago to Seattle to Anchorage. According to her travel agent, there are three available flights [rom Nashville to Chicago. five from Chicago to Seattle, and two from Seattle to Anchorage. Assume that the numbers of options she has for return flights are the same. How many round-trip itineraries can she schedule? Counting Pennutations (when the objects are all distinct) Ordered sequences in two fundamentally different ways. The fi~!. is the scenario addressed by the multiplication rule-a process is comprised of k operations, each to al10wing ni options, i = 1,2, ... , k; choosing one version of each operation 111112 ... Ilk possibilities. The second occurs when an ordered arrangement of some specified length k is formed from a finl.fecollection of objects. Any such arrangement is referred to as a permutation of length k. given the three objects A, 8, and there are different of length two that can be formed if the objects cannol be A B, A C, and Theorem 2.6.1. The number ofpermutations of length k that can be formed from a set of II distinc! elements, repetitions not allowed, is denoted by the symbol" Pk, where 1)(11 - 2)··· (11 k II! + 1)= --(II - k)! 
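Theorem 2.6.1 is easy to spot-check by brute force for small n and k. The sketch below (the helper name n_p_k is ours, not the text's) compares the formula with a direct enumeration of ordered k-tuples drawn without repetition:

```python
from itertools import permutations
from math import factorial

def n_p_k(n, k):
    """Number of permutations of length k from n distinct objects (Theorem 2.6.1)."""
    return factorial(n) // factorial(n - k)

# Brute-force check: itertools.permutations generates exactly the ordered
# arrangements of length k with no repeated elements.
for n in range(1, 7):
    for k in range(n + 1):
        assert n_p_k(n, k) == len(list(permutations(range(n), k)))

print(n_p_k(4, 3))  # 24
print(n_p_k(8, 4))  # 1680
```

The assertion loop confirms the formula agrees with exhaustive enumeration for every n up to 6.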
Proof. Any of the n objects may occupy the first position in the arrangement, any of n - 1 the second, and so on; the number of choices available for filling the kth position will be n - k + 1 (see Figure 2.6.6). The theorem follows, then, from the multiplication rule: There will be n(n - 1) ... (n - k + 1) ordered arrangements.

Corollary. The number of ways to permute an entire set of n distinct objects is nPn = n(n - 1)(n - 2) ... 1 = n!.

FIGURE 2.6.6
[Positions 1, 2, ..., k - 1, k in the sequence, with n, n - 1, ..., n - (k - 2), n - (k - 1) choices available, respectively.]

EXAMPLE 2.6.6
How many permutations of length k = 3 can be formed from the set of n = 4 distinct elements, A, B, C, and D?
According to Theorem 2.6.1, the number should be 24:

n!/(n - k)! = 4!/(4 - 3)! = (4 · 3 · 2 · 1)/1 = 24

Confirming that figure, Table 2.6.2 lists the entire set of 24 permutations and illustrates the argument used in the proof of the theorem.

TABLE 2.6.2
 1. (ABC)   2. (ABD)   3. (ACB)   4. (ACD)   5. (ADB)   6. (ADC)
 7. (BAC)   8. (BAD)   9. (BCA)  10. (BCD)  11. (BDA)  12. (BDC)
13. (CAB)  14. (CAD)  15. (CBA)  16. (CBD)  17. (CDA)  18. (CDB)
19. (DAB)  20. (DAC)  21. (DBA)  22. (DBC)  23. (DCA)  24. (DCB)

EXAMPLE 2.6.7
In her sonnet with the famous first line, "How do I love thee? Let me count the ways," Elizabeth Barrett Browning listed eight. Suppose Ms. Browning had decided that writing greeting cards afforded her a better format for expressing her feelings. For how many years could she have corresponded with her favorite beau on a daily basis and never sent the same card twice? Assume that each card contains exactly four of the eight "ways" and that order matters.
Ms. Browning would be creating a permutation of length four from a set of eight distinct objects. According to Theorem 2.6.1,

Number of different cards = 8P4 = 8!/(8 - 4)! = 8 · 7 · 6 · 5 = 1680

At the rate of a card a day, she could have kept the correspondence going for more than four and one-half years.

EXAMPLE 2.6.8
Years ago, long before Rubik's cubes and electronic games had become epidemic, puzzles were much simpler. One of the more popular combinatorial-related diversions was a four-by-four grid consisting of fifteen movable squares and one empty space. The object was to maneuver, as quickly as possible, an arbitrary configuration (Figure 2.6.7a) into a specific pattern (Figure 2.6.7b). In how many different ways could the puzzle be arranged?
Take the empty space to be square number sixteen and imagine the four rows of the grid laid end to end to make a sixteen-digit sequence. Each permutation of that sequence corresponds to a different pattern for the grid. By the corollary to Theorem 2.6.1, the number of ways to position the tiles is 16!, or more than twenty trillion (20,922,789,888,000, to be exact). That is more than fifty times the number of stars in the entire Milky Way galaxy. (Note: Not all 16! permutations can be generated without physically removing some of the tiles. Think of the two-by-two version of Figure 2.6.7 with tiles numbered 1 through 3. How many of the 4! theoretical configurations can actually be formed?)

FIGURE 2.6.7
[(a) An arbitrary configuration of the fifteen puzzle; (b) the target pattern.]

EXAMPLE 2.6.9
A deck of 52 cards is shuffled and dealt face up in a row. For how many arrangements will the four aces be adjacent?
This is a good example for illustrating the problem-solving benefits that come from drawing diagrams, as mentioned earlier. Figure 2.6.8 shows the structure that needs to be considered: The four aces are positioned as a "clump" somewhere between or around the forty-eight non-aces.

FIGURE 2.6.8
[The four aces as a clump occupying one of the spaces 1, 2, 3, 4, ... around and between the forty-eight non-aces.]

Clearly, there are forty-nine "spaces" that could be occupied by the four aces (in front of the first non-ace, between the first and second non-aces, and so on). Furthermore, by the corollary to Theorem 2.6.1, once the four aces are assigned to one of those forty-nine positions, they can still be permuted in 4P4 = 4! ways. Similarly, the forty-eight non-aces can be arranged in 48P48 = 48! ways.
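The "clump" reasoning of Example 2.6.9 can be verified by exhaustive enumeration at a smaller scale. The sketch below uses a hypothetical six-card deck with two aces, for which the same argument predicts (6 - 2 + 1) · 2! · 4! = 240 arrangements:

```python
from itertools import permutations
from math import factorial

# Scaled-down version of the ace problem: a six-card deck with two aces.
deck = ["ace1", "ace2", "x1", "x2", "x3", "x4"]

def aces_adjacent(arrangement):
    """True if the cards named 'ace...' occupy one contiguous block of positions."""
    positions = [i for i, card in enumerate(arrangement) if card.startswith("ace")]
    return max(positions) - min(positions) == len(positions) - 1

brute_force = sum(aces_adjacent(p) for p in permutations(deck))

# Clump argument: 5 possible clump positions, 2! orders within the clump,
# 4! orders for the remaining cards.
formula = 5 * factorial(2) * factorial(4)
print(brute_force, formula)  # both 240
```

All 6! = 720 orderings are checked, so the agreement is not a coincidence of sampling.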
It follows from the multiplication rule, then, that the number of arrangements having four consecutive aces is the product, 49 · 4! · 48!, or, approximately, 1.46 × 10^64.

Comment. Computing n! can be quite cumbersome, even for n's that are fairly small: We saw in Example 2.6.8, for instance, that 16! is already in the trillions. Fortunately, an easy-to-use approximation is available. According to Stirling's formula,

n! ≈ √(2π) n^(n + 1/2) e^(-n)

In practice, we apply Stirling's formula by writing

log10(n!) ≈ log10(√(2π)) + (n + 1/2) log10(n) - n log10(e)

and then exponentiating the right-hand side. Recall Example 2.6.9, where the number of arrangements was calculated to be 49 · 4! · 48!, or 24 · 49!. Substituting into Stirling's formula, we can write

log10(49!) ≈ log10(√(2π)) + (49 + 1/2) log10(49) - 49 log10(e) = 62.783366

Therefore,

24 · 49! ≈ 24 · 10^62.78337 = 1.46 × 10^64

EXAMPLE 2.6.10
In chess a rook can move vertically and horizontally (see Figure 2.6.9). It can capture any unobstructed piece located anywhere in its own row or column. In how many ways can eight distinct rooks be placed on a chessboard (having eight rows and eight columns) so that no two can capture one another?

FIGURE 2.6.9
[The vertical and horizontal moves available to a rook.]

To start with a simpler problem, suppose that the eight rooks are all identical. Since no two rooks can be in the same row or same column (why?), it follows that each row must contain exactly one. The rook in the first row, however, can be in any of eight columns; the rook in the second row is then limited to being in one of seven columns, and so on. By the multiplication rule, then, the number of noncapturing configurations for eight identical rooks is 8P8, or 8! (see Figure 2.6.10).

FIGURE 2.6.10
[Choices by row: 8, 7, 6, 5, 4, 3, 2, 1; total number = 8 · 7 · 6 · 5 · 4 · 3 · 2 · 1.]

Now imagine the eight rooks to be distinct; they might be numbered, for example, 1 through 8. The rook in the first row could be marked with any of eight numbers; the rook in the second row with any of the remaining seven numbers; and so on.
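The Stirling calculation in the Comment above is easy to reproduce numerically. A quick Python check, using math.lgamma to obtain the exact value of log10(49!) for comparison:

```python
import math

def stirling_log10_factorial(n):
    """log10(n!) approximated by Stirling's formula."""
    return (math.log10(math.sqrt(2 * math.pi))
            + (n + 0.5) * math.log10(n)
            - n * math.log10(math.e))

approx = stirling_log10_factorial(49)   # about 62.7834, as in the Comment
exact = math.lgamma(50) / math.log(10)  # log10(49!) computed exactly, since lgamma(50) = ln(49!)
print(round(approx, 6), round(exact, 6))
```

Even at n = 49 the approximation is accurate to about three decimal places in the logarithm, which is more than enough for order-of-magnitude work like 24 · 49! ≈ 1.46 × 10^64.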
Altogether, there would be 8! different numbering patterns for each configuration. The total number of ways to position eight distinct noncapturing rooks, then, is 8! · 8!, or 1,625,702,400.

EXAMPLE 2.6.11
A new horror movie, Friday the 13th, Part X, stars Jason's great-grandson as a psychotic intent on dismembering, decapitating, or doing whatever it takes to dispatch (i.e., in whatever order) a summer camp's four male and four female counselors.
(a) How many scripts can the screenwriters devise, if they want Jason to do away with all the men before going after any of the women?
(b) How many scripts are possible if the only restriction on Jason is that he save Muffy for last?

a. Suppose the male counselors are denoted A, B, C, and D, and the female counselors, W, X, Y, and Z. Among the admissible plots would be the one pictured in Figure 2.6.11, where B is done in first, then D, and so on. The men, if they are to be restricted to the first four positions, can still be permuted in 4P4 = 4! ways. The same number of arrangements can be found for the women. Furthermore, the plot in its entirety can be thought of as a two-step sequence: first the men are eliminated, then the women. Since 4! ways are available to do the former and 4! the latter, the total number of different scripts, by the multiplication rule, is 4! · 4!, or 576.

FIGURE 2.6.11
[One admissible script: the men occupy positions 1 through 4 in the order of killing, the women positions 5 through 8.]

b. If the only condition to be met is that Muffy be last, the number of admissible scripts is the number of ways to permute the other seven counselors, which is 7P7 = 7!, or 5040 (see Figure 2.6.12).

FIGURE 2.6.12
[One such script: B W Z C Y A D in positions 1 through 7 of the order of killing, with Muffy eighth.]

EXAMPLE 2.6.12
Consider the set of nine-digit numbers that can be formed by rearranging without repetition the digits 1 through 9. For how many of those permutations will the 1 and the 2 precede the 3 and the 4? That is, we want to count sequences like 7 2 5 1 3 6 9 4 8 but not sequences like 6 8 1 5 4 2 7 3 9.
At first glance, this seems to be a problem well beyond the scope of Theorem 2.6.1. With the help of a symmetry argument, though, its solution is surprisingly simple. Consider just the digits 1 through 4. By the corollary to Theorem 2.6.1, those four numbers give rise to 4! (= 24) permutations.
Of those 24, only four, namely (1, 2, 3, 4), (2, 1, 3, 4), (1, 2, 4, 3), and (2, 1, 4, 3), have the property that the 1 and the 2 come before the 3 and the 4. It follows that 4/24 of the total number of nine-digit permutations should satisfy the condition being imposed on 1, 2, 3, and 4. Therefore,

Number of permutations where 1 and 2 precede 3 and 4 = (4/24) · 9! = 60,480

QUESTIONS

2.6.17. The board of a large corporation has six members willing to be nominated for office. How many different "president/vice president/treasurer" slates could be submitted to the stockholders?

2.6.18. How many ways can a set of four tires be put on a car if all the tires are interchangeable? How many ways are possible if two of the four are snow tires?

2.6.19. Use Stirling's formula to approximate 30!. (Note: The exact answer is 265,252,859,812,191,058,636,308,480,000,000.)

2.6.20. The nine members of the baseball team, the Mahler Maulers, are all versatile, and each can play any position equally poorly. In how many different ways can the Maulers take the field?

2.6.21. A three-digit number is to be formed from the digits 1 through 7, with no digit being used more than once. How many such numbers would be greater than 289?

2.6.22. Four men and four women are to be seated in a row of chairs numbered 1 through 8.
(a) How many total arrangements are possible?
(b) How many arrangements are possible if the men are required to sit in alternate chairs?

2.6.23. An engineering student needs to take three technical electives sometime during his final four semesters. The three are to be chosen from a list of ten. In how many ways can he schedule those classes, assuming that he never wants to take more than one technical elective in a given term?

2.6.24. How many ways can a twelve-member cheerleading squad (six men and six women) pair up to form six male-female teams? How many ways can six male-female teams be positioned along a sideline? What does the number 6! · 6! · 2^6 represent? What might the number 6! · 6! · 2^6 · 2 represent?

2.6.25. Suppose that a seemingly interminable German opera is recorded on all six sides of a three-record album. In how many ways can the six sides be played so that at least one is out of order?

2.6.26.
A of n families. each with III members, are to be lined up for a photograph. In how many ways can the 11m be arranged if members of a must Slay together? 2.6.27. Suppose that len people. induding you and a line up for a picture. How many ways can the photographer the line if she wants to exactly three people you and your 2.6.28. Theorem was the first mathematical result known to have been proved by that feal being accomplished in 1321 by Levi bell Gerson. Assume that we do not know the multiplication rule. Prove the theorem the way Levi hen Gerson did. 2.6.29. In how mallY ways can a pack of fifty-two cards be dealt to thirteen players, four to each, so that every player has one card of suit? 100 Chapter 2 Probabil ity 2.6.30. If the definition of n! is to hold for all nonnegative n, show that it follows that O! must equal one. 2.6.JL The crew of Apollo 17 consisted of a pilot, a copilot, and a geologist. Suppose that NASA had actually trained nine aviators and four geologists as candidates for the flight. How many different crews could they have assembled? 2.6.32. Uncle Harry and Aunt Minnie will both be attending your next family reunion. Unfortunately, hate each other. Unless they are seated with at least two people into a shouting match. The side of the table between them, they are likely to at which they will be seated has seven chairs. How many seating arrangements are available for those seven people if a safe distance is to be maintained between your aunt and your uncle? 2.6.33. In how many ways can the digits 1 through 9 be arranged such that (a) all the even digits precede all the odd digits (b) aU the even digits are adjacent to each other (c) two even digits begin the sequence and two even digits end the sequence (d) the even digits appear in either i:!.!>lXm.li.ll1:\ UI uescendiog order? 
Counting Permutations (when the objects are not all distinct)

The corollary to Theorem 2.6.1 gives a formula for the number of ways an entire set of n objects can be permuted if the objects are all distinct. Fewer than n! permutations are possible, though, if some of the objects are identical. For example, there are 3! = 6 ways to permute the three distinct objects A, B, and C:

ABC  ACB  BAC  BCA  CAB  CBA

If the three objects to permute, though, are A, A, and B, that is, if two of the three are identical, the number of permutations decreases to three:

AAB  ABA  BAA

As we will see, there are many real-world applications where the n objects to be permuted belong to r different categories, each category containing one or more identical objects.

Theorem 2.6.2. The number of ways to arrange n objects, n1 being of one kind, n2 of a second kind, ..., and nr of an rth kind, is

n!/(n1! n2! ... nr!)

where n1 + n2 + ... + nr = n.

Proof. Let N denote the total number of such arrangements. For any one of those N, the similar objects (if they were actually different) could be arranged in n1! n2! ... nr! ways. (Why?) It follows that N · n1! n2! ... nr! is the total number of ways to arrange n (distinct) objects. But n! equals that same number. Setting N · n1! n2! ... nr! equal to n! gives the result.

Comment. Ratios like n!/(n1! n2! ... nr!) are called multinomial coefficients because the general term in the expansion of (x1 + x2 + ... + xr)^n is

(n!/(n1! n2! ... nr!)) x1^n1 x2^n2 ... xr^nr

EXAMPLE 2.6.13
A pastry in a vending machine costs eighty-five cents. In how many ways can a customer put in two quarters, three dimes, and one nickel?

FIGURE 2.6.13
[The order in which the six coins are deposited, positions 1 through 6.]

If coins of a given value are considered identical, then a typical deposit sequence, say Q D D Q N D (see Figure 2.6.13), can be thought of as a permutation of n = 6 objects belonging to r = 3 categories, where

n1 = number of nickels = 1
n2 = number of dimes = 3
n3 = number of quarters = 2

By Theorem 2.6.2, there are sixty such sequences:

6!/(1! 3! 2!) = 60

Of course, had we taken into account that the coins were minted at different places and at different times, the number of (distinguishable) permutations would be 6!, or 720.

EXAMPLE 2.6.14
Prior to the seventeenth century there were no scientific journals, a state of affairs that made it difficult for researchers to document discoveries. If a scientist sent a copy of his work to a colleague, there was always a risk that the colleague might claim it as his own. The obvious alternative, waiting to publish a book, invariably resulted in lengthy delays. So, as a sort of interim documentation, scientists would sometimes send each other anagrams, letter puzzles that, when properly unscrambled, summarized in a sentence or two what had been discovered.
When Christiaan Huygens (1629-1695) looked through his telescope and saw the ring around Saturn, he composed the following anagram (203):

aaaaaaa, ccccc, d, eeeee, g, h, iiiiiii, llll, mm, nnnnnnnnn, oooo, pp, q, rr, s, ttttt, uuuuu

In how many ways can the sixty-two letters in Huygens's anagram be arranged?
Let n1 (= 7) denote the number of a's, n2 (= 5) the number of c's, and so on. Substituting into the multinomial formula, we find

N = 62!/(7! 5! 1! 5! 1! 1! 7! 4! 2! 9! 4! 2! 1! 2! 1! 5! 5!)

as the total number of arrangements. To evaluate N, we can apply Stirling's formula to the numerator. For the 62! appearing in N, we have

log10(62!) ≈ log10(√(2π)) + 62.5 · log10(62) - 62 · log10(e) = 85.49731

The antilog of 85.49731 is 3.143 × 10^85, so N is a number on the order of 3.6 × 10^60. Huygens's secret was clearly safe. (When rearranged, the anagram becomes "Annulo cingitur, tenui, plano, nusquam cohaerente, ad eclipticam inclinato," which translates to "Surrounded by a thin ring, flat, suspended nowhere, inclined to the ecliptic.")

EXAMPLE 2.6.15
What is the coefficient of x^23 in the expansion of (1 + x^5 + x^9)^100?
To understand how this question relates to permutations, consider the simpler problem of expanding (a + b)^2:

(a + b)^2 = (a + b)(a + b)
          = a·a + a·b + b·a + b·b
          = a^2 + 2ab + b^2

Notice that each term in the first factor (a + b) is multiplied by each term in the second (a + b).
Moreover, the coefficient that appears in front of each term in the expansion corresponds to the number of ways that that term can be formed. The 2 in the term 2ab, for example, reflects the fact that the product ab can result from two different multiplications: taking the a from the first factor and the b from the second, or taking the b from the first factor and the a from the second.
By analogy, the coefficient of x^23 in the expansion of (1 + x^5 + x^9)^100 will be the number of ways that one term from each of the one hundred factors (1 + x^5 + x^9) can be multiplied together to form x^23. The only set of factors that will produce x^23, though, is the set of two x^9's, one x^5, and ninety-seven 1's. It follows that the coefficient of x^23 is the number of ways to permute two x^9's, one x^5, and ninety-seven 1's. So, from Theorem 2.6.2,

coefficient of x^23 = 100!/(2! 1! 97!) = 485,100

EXAMPLE 2.6.16
A palindrome is a phrase whose letters are in the same order whether they are read backward or forward, such as Napoleon's lament

Able was I ere I saw Elba

or the often cited

Madam, I'm Adam.

Words themselves can be the units in a palindrome, as in the sentence

Girl, bathing on Bikini, eyeing boy, finds boy eyeing bikini on bathing girl.

Suppose the members of a set consisting of four objects of one type, six of a second type, and two of a third type are to be lined up in a row. How many of those permutations are palindromes?
Think of the twelve objects to arrange as being four A's, six B's, and two C's. If the arrangement is to be a palindrome, then half of the A's, half of the B's, and half of the C's must occupy the first six positions in the permutation. Moreover, the final six members of the sequence must be in the reverse order of the first six.
For example, if the objects comprising the first half of the permutation were

C A B A B B

then the last six would need to be in the order

B B A B A C

It follows that the number of palindromes is the number of ways to permute the first six objects in the sequence, because once the first six are positioned, there is only one arrangement of the last six that will complete the palindrome. By Theorem 2.6.2, then,

number of palindromes = 6!/(2! 3! 1!) = 60

EXAMPLE 2.6.17
A deliveryman is currently at Point X and needs to stop at Point O before driving through to Point Y (see Figure 2.6.14). How many different routes can he take without ever going out of his way?

FIGURE 2.6.14
[A street grid with Point X at the lower left, Point O in the middle, and Point Y at the upper right.]

Notice that any admissible path from, say, X to O is an ordered sequence of eleven "moves," nine East and two North. Pictured in Figure 2.6.14, for example, is the particular X to O route

E E N E E E E N E E E

Similarly, any acceptable path from O to Y will consist of five moves East and three moves North (the one indicated is E E N N E N E E). Since each path from X to O corresponds to a unique permutation of nine E's and two N's, the number of such paths (from Theorem 2.6.2) is the quotient

11!/(9! 2!) = 55

For the same reasons, the number of different paths from O to Y is

8!/(5! 3!) = 56

By the multiplication rule, then, the number of admissible routes from X to Y that pass through O is the product of 55 and 56, or 3080.

QUESTIONS

2.6.34. Which state name can generate more permutations, TENNESSEE or FLORIDA?

2.6.35. How many numbers greater than 4,000,000 can be formed from the digits 2, 3, 4, 4, 5, 5, 5?

2.6.36. An interior decorator is trying to arrange a shelf containing eight books, three with red covers, three with blue covers, and two with brown covers.
(a) Assuming the titles and the sizes of the books are irrelevant, in how many ways can she arrange the eight books?
(b) In how many ways could the books be arranged if they were all considered distinct?
(c) In how many ways could the books be arranged if the red books were considered indistinguishable, but the other five were considered distinct?

2.6.37. Four Nigerians (A, B, C, D), three Chinese (#, *, &), and three Greeks (α, β, γ) are lined up at the box office, waiting to buy tickets for the World's Fair.
(a) How many ways can they position themselves if the Nigerians are to hold the first four places in line; the Chinese, the next three; and the Greeks, the last three?
(b) How many arrangements are possible if members of the same nationality must stay together?
(c) How many different queues can be formed?
(d) Suppose a vacationing Martian strolls by and wants to photograph the ten for her scrapbook. A bit myopic, the Martian is capable of discerning the more obvious differences in human anatomy but is unable to distinguish one Nigerian (N) from another, one Chinese (C) from another, or one Greek (G) from another. Instead of perceiving a line to be B # β A D * & C α γ, for example, she would see NCGNNCCNGG. From the Martian's perspective, in how many different ways can the ten funny-looking Earthlings line themselves up?

2.6.38. How many ways can the letters in the word SLUMGULLION be arranged so that the three L's precede all the other consonants?

2.6.39. A tennis tournament has a field of 2n players, all of whom need to be scheduled to play in the first round. How many different sets of pairings are possible?

2.6.40. What is the coefficient of ... in the expansion of (1 + ...)?

2.6.41. In how many ways can the letters of the word ELEEMOSYNARY be arranged so that the S is always immediately followed by a Y?

2.6.42. In how many ways can the word ABRACADABRA be formed in the array pictured below? Assume that the word must begin with the top A and progress diagonally downward to the bottom A.

          A
         B B
        R R R
       A A A A
      C C C C C
       A A A A
        D D D
         A A
          B
          R
          A

2.6.43. Suppose a pitcher faces a batter who never swings. For how many different ball/strike sequences will the batter be called out on the fifth pitch?

2.6.44.
What is the coefficient of w^2 x^3 y z^3 in the expansion of (w + x + y + z)^9?

2.6.45. Suppose six points are marked in a plane, no three of which lie on a straight line. In how many ways can the six points be used as vertices to form two triangles? (Hint: Number the points 1 through 6. Call one of the triangles A and the other B. What does the assignment

A A B B A B
1 2 3 4 5 6

represent?)

2.6.46. Show that (k!)! is divisible by (k!)^(k-1)!. (Hint: Think of a related permutation problem whose solution would require Theorem 2.6.2.)

2.6.47. In how many ways can the letters of the word BROBDINGNAGIAN be arranged without changing the order of the vowels?

2.6.48. Make an anagram out of the familiar expression STATISTICS IS FUN. In how many ways can the letters in the anagram be arranged?

2.6.49. Linda is taking a five-course load her first semester: English, math, French, biology, and history. In how many different ways can she earn three A's and two B's? Enumerate the entire set of possibilities. Use Theorem 2.6.2 to verify your answer.

Counting Combinations

Order is not always a meaningful characteristic of a collection of elements. Consider a poker player being dealt a five-card hand. Whether he receives the 2 of hearts, 4 of clubs, 9 of clubs, jack of hearts, and ace of diamonds in that order, or in any one of the other 5! - 1 permutations of those particular five cards, is immaterial; the hand is still the same. As the last set of examples in this section bears out, there are many such situations, problems where our only legitimate concern is with the composition of a set of elements, not with any particular arrangement.
We call a collection of k unordered elements a combination of size k. For example, given a set of n = 4 distinct elements, A, B, C, and D, there are six ways to form combinations of size 2:

A and B    B and C
A and C    B and D
A and D    C and D

A formula for counting combinations can be derived quite easily from what we already know about counting permutations.

Theorem 2.6.3.
The number of ways to form combinations of size k from a set of n distinct objects, repetitions not allowed, is denoted by the symbols C(n, k) or nCk, where

C(n, k) = nCk = n!/(k! (n - k)!)

Proof. Let the symbol C(n, k) denote the number of combinations satisfying the conditions of the theorem. Since each of those combinations can be ordered in k! ways, the product k! C(n, k) must equal the number of permutations of length k that can be formed from n distinct elements. But n distinct elements can be formed into permutations of length k in n(n - 1) ... (n - k + 1) = n!/(n - k)! ways. Therefore,

k! C(n, k) = n!/(n - k)!

Solving for C(n, k) gives the result.

Comment. It often helps to think of combinations in the context of drawing objects out of an urn. If an urn contains n chips labeled 1 through n, the number of ways we can reach in and draw out different samples of size k is C(n, k). In deference to this sampling interpretation for the formation of combinations, C(n, k) is usually read "n things taken k at a time" or "n choose k."

Comment. The symbol C(n, k) appears in the statement of a familiar theorem from algebra,

(x + y)^n = sum from k = 0 to n of C(n, k) x^k y^(n - k)

Since the expression being raised to a power involves two terms, x and y, the constants C(n, k), k = 0, 1, ..., n, are commonly referred to as binomial coefficients.

EXAMPLE 2.6.18
Eight politicians meet at a fund-raising dinner. How many greetings can be exchanged if each politician shakes hands with every other politician exactly once?
Imagine the politicians to be eight chips, numbered 1 through 8, in an urn. A handshake corresponds to an unordered sample of size 2 chosen from that urn. Since repetitions are not allowed (even the most obsequious and overzealous of campaigners would not shake hands with himself), Theorem 2.6.3 applies, and the total number of handshakes is

C(8, 2) = 8!/(2! 6!)

or 28.

EXAMPLE 2.6.19
The basketball recruiter for Swampwater Tech has scouted sixteen former NBA starters that he thinks he can pass off as junior college transfers: six are guards, seven are forwards, and three are centers. Unfortunately, his slush fund of illegal alumni donations is at an all-time low, and he can afford to buy new Corvettes for only nine of the sixteen. If he wants to keep three guards, four forwards, and two centers, how many ways can he parcel out the cars?
This is a combination problem that is also an application of the multiplication rule. First, note that there are C(6, 3) sets of three guards that could be chosen to receive Corvettes (think of drawing a set of three names out of an urn containing six names). Similarly, the forwards and centers can be bribed in C(7, 4) and C(3, 2) ways, respectively. It follows from the multiplication rule, then, that the total number of ways to divvy up the cars is the product

C(6, 3) · C(7, 4) · C(3, 2) = 20 · 35 · 3

or 2100.

EXAMPLE 2.6.20
Your statistics teacher announces a twenty-page reading assignment on Monday that is to be finished by Thursday morning. You intend to read the first x1 pages Monday, the next x2 pages Tuesday, and the last x3 pages Wednesday, where x1 + x2 + x3 = 20 and each xi ≥ 1. In how many ways can you complete the assignment? That is, how many different sets of values can be chosen for x1, x2, and x3?
Imagine the nineteen "spaces" between the twenty pages (see Figure 2.6.15). Choosing any two of those spaces automatically partitions the twenty pages into three nonempty sets. Spaces 3 and 7, for example, would correspond to reading three pages on Monday, four pages on Tuesday, and thirteen pages on Wednesday. The number of different values for the set (x1, x2, x3), then, must equal the number of ways to select two "markers" from the nineteen spaces, namely

C(19, 2) = 171

FIGURE 2.6.15
[The twenty pages laid side by side, with the nineteen spaces between them numbered 1 through 19.]

EXAMPLE 2.6.21
Mitch is trying to put a little zing into his act by telling a set of jokes at the beginning of each show. His current engagement is booked to run four months. If he gives one performance a night and never wants to repeat the same set of jokes on any two nights, what is the minimum number of jokes he needs in his repertoire?
Four months of performances create a demand for roughly 120 different sets of jokes. Let n denote the number of jokes that Mitch can tell. The question is asking for the smallest n for which C(n, 4) ≥ 120. The trial-and-error calculations summarized in Table 2.6.3 show that the optimal n is a surprisingly small nine. A set of only nine jokes is sufficient to keep Mitch from having to repeat his opening monologue.

TABLE 2.6.3

n    C(n, 4)    ≥ 120?
7       35       No
8       70       No
9      126       Yes

EXAMPLE 2.6.22
Binomial coefficients have many interesting properties. Perhaps the most familiar is Pascal's triangle,^1 a numerical array where each entry is equal to the sum of the two numbers appearing diagonally above it (see Figure 2.6.16). Notice that each entry in Pascal's triangle can be expressed as a binomial coefficient, and the relationship just described appears to reduce to a simple equation involving those coefficients:

C(n + 1, k) = C(n, k) + C(n, k - 1)    (2.6.1)

Prove that Equation 2.6.1 holds for all integers n and k.

FIGURE 2.6.16
[Rows 0 through 4 of Pascal's triangle,

1
1  1
1  2  1
1  3  3  1
1  4  6  4  1

with each entry also written as the corresponding binomial coefficient C(row, position).]

Consider a set of n + 1 distinct objects A1, A2, ..., A(n+1). We can obviously draw samples of size k from that set in C(n + 1, k) different ways. Now, consider any particular object, for example, A1. Relative to A1, each of those C(n + 1, k) samples belongs to one of two categories: those containing A1 and those not containing A1. To form samples containing A1, we need to select k - 1 additional objects from the remaining n. This can be done in C(n, k - 1) ways. Similarly, there are C(n, k) ways to form samples not containing A1. Therefore, C(n + 1, k) must equal C(n, k) + C(n, k - 1).

^1 Despite its name, Pascal's triangle was not discovered by Pascal. Its basic structure was known hundreds of years before the French mathematician was born. It was Pascal, though, who first made extensive use of its properties.

EXAMPLE 2.6.23
The answers to combinatorial questions can sometimes be obtained quite easily using two entirely different approaches. What invariably distinguishes one solution from another is the way in which the outcomes are characterized. For example, suppose you have just ordered a roast beef sub at a sandwich shop, and now you need to decide which, if any, of the available toppings (lettuce, tomato, onions,
lIS basic structure was known hundreds though, who liin;t made exten~ive ll..e of its Section Combinatorics 111 Add? AGUREl.6.17 etc.) to add. If the store has eight "extras" to choose from, how many different subs can you order? One to answer this question is to think each sub as an ordered sequence of length eight. each position in the corres(Xinds to one the toppings. At each of those positions, you have two choices-Hood" or "do not add" that particular topping. Pictured in Figure 2.6.17 is the sequence corresponding to the sub that has tomato, and onion but no other toppings. Since two choices ("add" or "do not add") are available for of the eight toppings, the multiplication rule tells us that the number of different roast beef subs that would be requested is , or An ordered sequence of length eight, though, is not the only model capable of characterizing a roast beef sandwich. We can also distinguish one roast beef sub from another by the particular combination of toppings that each one has. For there are (!) = 70 different subs having exactly four toppings. It foHows that the total number different sandwiches is the total number of different combinations of size k, where k ranges fmm 0 to 8. Reassuringly, that sum agrees with the ordered sequence answer: total number of different mast beef subs = (~) + (~) + (~) + ... + (~) 1+8+28+···+1 =256 What we have just illustrated here is another property of binomial ooefficients-oamely. that t(n)k =2" (2.6.2) k=O of Equation 2.62 is a direct consequence of Newton 's binomial expansion the second comment following Theorem 2.6.3). QUESTIONS 2.6.50. many straight Lines can be drawn between five points (A, B, C, D, and E), no three of which are collinear? 2.6.51. The Alpha Beta Zeta sorority is trying to fiU a pledge class nine new members during fall Among the twenty-five available candidates, fifteen have been judged marginally acceptable and ten highly desirable. 
How many ways can the pledge class be chosen so as to give a two-to-one ratio of highly desirable to marginally acceptable candidates?

2.6.52. A boat has a crew of eight. Two of those can row only on the stroke side, while three can row only on the bow side. In how many ways can the two sides of the boat be manned?

2.6.53. Nine students, five men and four women, interview for four summer internships sponsored by a city newspaper.
(a) In how many ways can the newspaper choose a set of four interns?
(b) In how many ways can the newspaper choose a set of four interns if it must include two men and two women in each set?
(c) How many sets of four can be picked such that not everyone in a set is of the same sex?

2.6.54. The final exam in History 101 consists of five essay questions that the professor chooses from a pool of seven that are given to the students a week in advance. For how many possible sets of questions does a student need to be prepared? In this situation does order matter?

2.6.55. Ten basketball players meet in the school gym for a pickup game. How many ways can they form two teams of five each?

2.6.56. A chemist is trying to synthesize part of a straight-chain aliphatic hydrocarbon polymer that consists of twenty-one radicals — ten ethyls (E), six methyls (M), and five propyls (P). Assuming all arrangements of radicals are physically possible, how many different polymers can be formed if no two of the methyl radicals are to be adjacent?

2.6.57. In how many ways can the letters in MISSISSIPPI be arranged so that no two I's are adjacent?

2.6.58. Prove that $\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n} = 2^n$. (Hint: Use the binomial expansion mentioned on page 108.)

2.6.59. Prove that $\sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n}$. (Hint: Rewrite the left-hand side and consider the problem of selecting a sample of n objects from an original set of 2n.)

2.6.60. Show that $\binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \cdots = \binom{n}{1} + \binom{n}{3} + \binom{n}{5} + \cdots$. (Hint: Consider the expansion of $(x - y)^n$.)

2.6.61. Prove that successive terms in the sequence $\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}$ first increase and then decrease. (Hint: Examine the ratio of two successive terms, $\binom{n}{k+1} / \binom{n}{k}$.)
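Identities like Equations 2.6.1 and 2.6.2 (and the ones in Questions 2.6.58–2.6.61) are easy to spot-check numerically. The sketch below is ours, not the text's; it is plain Python and assumes version 3.8+ for `math.comb`:

```python
from math import comb

# Pascal's rule (Equation 2.6.1): C(n+1, k) = C(n, k) + C(n, k-1).
# math.comb(n, k) returns 0 when k > n, so k may run up to n + 1.
for n in range(1, 25):
    for k in range(1, n + 2):
        assert comb(n + 1, k) == comb(n, k) + comb(n, k - 1)

# Equation 2.6.2: the binomial coefficients in row n sum to 2^n.
for n in range(25):
    assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n

# Table 2.6.3 revisited: smallest n with C(n, 4) >= 120.
n = 4
while comb(n, 4) < 120:
    n += 1
print(n, comb(n, 4))  # -> 9 126
```

The same brute-force pattern works for checking any finite identity before attempting a combinatorial proof of it.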
2.6.62. Imagine n molecules of a gas confined to a rigid container divided into two chambers by a semipermeable membrane. If i molecules are in the left chamber, the entropy of the system is defined by the equation

Entropy = log $\binom{n}{i}$

If n is even, for what configuration of molecules will the entropy be maximized? (Entropy is a concept physicists find useful in characterizing heat exchanges, particularly those involving gases. In general terms, the entropy of a system is a measure of its disorder: As the "randomness" of the position and velocity vectors of a system of particles increases, so does its entropy.) (Hint: See Question 2.6.61.)

2.6.63. Compare the coefficients of $t^k$ in $(1 + t)^d (1 + t)^e = (1 + t)^{d+e}$ to prove that

$$\sum_{j=0}^{k} \binom{d}{j} \binom{e}{k-j} = \binom{d+e}{k}$$

COMBINATORIAL PROBABILITY

In Section 2.6 our concern focused on counting the number of ways a given operation, or sequence of operations, could be performed. In Section 2.7 we want to couple those enumeration results with the notion of probability. Putting the two together makes a lot of sense — there are many combinatorial problems where an enumeration, by itself, is not particularly relevant. A poker player, for example, is not interested in knowing the total number of ways he can draw to a straight; he is interested, though, in his probability of drawing to a straight.

In a combinatorial setting, making the transition from an enumeration to a probability is easy. If there are n ways to perform a certain operation and a total of m of those satisfy some stated condition — call it A — then P(A) is defined to be the ratio m/n. This assumes, of course, that all possible outcomes are equally likely.

Historically, the "m over n" idea is what motivated the early work of Pascal, Fermat, and Huygens (recall Section 1.1). Today we recognize that not all probabilities are so easily characterized. Nevertheless, the m/n model — the so-called classical definition of probability — is entirely appropriate for describing a wide variety of phenomena.

EXAMPLE 2.7.1

An urn contains eight chips, numbered 1 through 8.
A sample of three is drawn without replacement. What is the probability that the largest chip in the sample is a 5?

Let A be the event "largest chip in sample is a 5." Figure 2.7.1 shows what must happen in order for A to occur: (1) the 5 chip must be selected, and (2) two chips must be drawn from the subpopulation of chips numbered 1 through 4. By the multiplication rule, the number of samples satisfying event A is the product $\binom{1}{1} \cdot \binom{4}{2}$.

[Figure 2.7.1: Choose the 5 chip, then choose two chips from those numbered 1 through 4.]

The sample space S for the experiment of drawing three chips from the urn contains $\binom{8}{3}$ outcomes, all equally likely. In this situation, then, m = $\binom{1}{1} \cdot \binom{4}{2}$, n = $\binom{8}{3}$, and

$$P(A) = \frac{\binom{1}{1} \binom{4}{2}}{\binom{8}{3}} = \frac{6}{56} = 0.11$$

EXAMPLE 2.7.2

An urn contains n red chips numbered 1 through n, n white chips numbered 1 through n, and n blue chips numbered 1 through n (see Figure 2.7.2). Two chips are drawn at random and without replacement. What is the probability that the two chips drawn are either the same color or the same number?

[Figure 2.7.2: The urn's contents — r₁, ..., rₙ, w₁, ..., wₙ, b₁, ..., bₙ — with two chips drawn without replacement.]

Let A be the event that the two chips drawn are the same color; let B be the event that they have the same number. We are looking for P(A ∪ B). Since A and B here are mutually exclusive,

P(A ∪ B) = P(A) + P(B)

With 3n chips in the urn, the total number of ways to draw an unordered sample of size two is $\binom{3n}{2}$. Moreover,

$$P(A) = P(2 \text{ reds} \cup 2 \text{ whites} \cup 2 \text{ blues}) = P(2 \text{ reds}) + P(2 \text{ whites}) + P(2 \text{ blues}) = \frac{3\binom{n}{2}}{\binom{3n}{2}}$$

and

$$P(B) = P(\text{two 1's} \cup \text{two 2's} \cup \cdots \cup \text{two } n\text{'s}) = \frac{n\binom{3}{2}}{\binom{3n}{2}}$$

Therefore,

$$P(A \cup B) = \frac{3\binom{n}{2} + n\binom{3}{2}}{\binom{3n}{2}} = \frac{n+1}{3n-1}$$

EXAMPLE 2.7.3

Twelve fair dice are rolled. What is the probability that

a. the first six dice all show one face and the last six dice all show a second face?
b. not all the faces are the same?
c. each face appears exactly twice?

a. The sample space that corresponds to the "experiment" of rolling twelve dice is the set of ordered sequences of length twelve, where the outcome at every position in the sequence is one of the integers 1 through 6.
If the dice are fair, all 6¹² such sequences are equally likely. Let A be the set of rolls where the first six dice show one face and the second six show another face. Figure 2.7.3 shows one of the sequences in the event A. Clearly, the face that appears for the first half of the sequence could be any of the integers from 1 through 6; five choices would then be available for the last half of the sequence (since the two faces cannot be the same).

[Figure 2.7.3: One sequence in the event A — positions 1 through 6 showing face 2, positions 7 through 12 showing face 4.]

The number of sequences in the event A, then, is 6P2 = 6 · 5 = 30. Applying the "m/n" rule gives

P(A) = 30/6¹² ≈ 1.4 × 10⁻⁸

b. Let B be the event that not all the faces are the same. Then

P(B) = 1 − P(Bᶜ) = 1 − 6/6¹²

since there are six sequences — (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), ..., (6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6) — where the twelve faces are all the same.

c. Let C be the event that each face appears exactly twice. From Theorem 2.6.2, the number of ways each face can appear exactly twice is 12!/(2! · 2! · 2! · 2! · 2! · 2!), so

$$P(C) = \frac{12!/(2! \cdot 2! \cdot 2! \cdot 2! \cdot 2! \cdot 2!)}{6^{12}} = 0.0034$$

EXAMPLE 2.7.4

A fair die is tossed n times. What is the probability that the sum of the faces showing is n + 2?

The sample space associated with tossing a die n times has 6ⁿ outcomes, all of which in this case are equally likely because the die is presumed fair. There are two types of outcomes that will produce a sum of n + 2 — (a) n − 1 1s and one 3 and (b) n − 2 1s and two 2s (see Figure 2.7.4). By Theorem 2.6.2, the number of sequences having n − 1 1s and one 3 is n!/((n − 1)! 1!) = n; likewise, there are n!/((n − 2)! 2!) = $\binom{n}{2}$ outcomes having n − 2 1s and two 2s. Therefore,

$$P(\text{sum} = n + 2) = \frac{n + \binom{n}{2}}{6^n}$$

[Figure 2.7.4: The two types of sequences summing to n + 2 — (n − 1) 1s together with a single 3, and (n − 2) 1s together with two 2s.]

EXAMPLE 2.7.5

To keep Cheetah entertained, Tarzan has the following fourteen letters from a Scrabble set to play with:

A A A E E I J K L N N R T Z

What is the probability that Cheetah (who can't spell), rearranging the letters at random, forms the following sequence:

TARZAN LIKE JANE

(Ignore the spaces between the words.)
If similar letters are considered indistinguishable, Theorem 2.6.2 applies, and the total number of ways to arrange the fourteen letters is 14!/(3! 2! 1! 1! 1! 1! 2! 1! 1! 1!), or 3,632,428,800. Only one of those sequences is the desired arrangement, so

P("TARZANLIKEJANE") = 1/3,632,428,800

Notice that the same answer is obtained if the fourteen letters are considered distinct. Under that scenario, the total number of permutations is 14!, but the number of ways to spell TARZAN LIKE JANE increases to 3! 2! 2!, because all the A's, E's, and N's can be permuted. Therefore,

P("TARZANLIKEJANE") = 3! 2! 2!/14! = 1/3,632,428,800

EXAMPLE 2.7.6

Suppose that k people are selected at random from the general population. What are the chances that at least two of those k were born on the same day?

Known as the birthday problem, this is a particularly intriguing example of combinatorial probability because its statement is so simple, its analysis is straightforward, yet its solution, as we will see, runs strongly contrary to our intuition.

Picture the k individuals lined up in a row to form an ordered sequence. If leap year is omitted, each person might have any of 365 birthdays. By the multiplication rule, the group as a whole generates a sample space of 365ᵏ birthday sequences (see Figure 2.7.5).

Define A to be the event "at least two people have the same birthday." If each person is assumed to have the same chance of being born on any given day, the 365ᵏ sequences in Figure 2.7.5 are equally likely, so

P(A) = (number of sequences in A)/365ᵏ

Counting the number of sequences in the numerator here is prohibitively difficult because of the complexity of the event A; fortunately, counting the number of sequences in the complement is quite easy. Notice that each birthday sequence in the sample space belongs to exactly one of two categories (see Figure 2.7.6):

1. At least two people have the same birthday.
2. All k people have different birthdays.
[Figure 2.7.5: Each of the k positions in the sequence can be filled with any of 365 possible birthdays, giving 365ᵏ different sequences.]

[Figure 2.7.6: The sample space of all 365ᵏ birthday sequences of length k, partitioned into sequences where at least two people have the same birthday and sequences where all k people have different birthdays.]

It follows that

number of sequences in A = 365ᵏ − number of sequences where all k people have different birthdays

The number of ways to form birthday sequences for k people subject to the restriction that all k must be different is simply the number of ways to form permutations of length k from a set of 365 distinct objects:

365Pk = 365 · 364 ··· (365 − k + 1)

Therefore,

$$P(A) = P(\text{at least two have the same birthday}) = 1 - \frac{365 \cdot 364 \cdots (365 - k + 1)}{365^k}$$

Table 2.7.1 shows P(A) for k values of 15, 22, 23, 40, 50, and 70. Notice how greatly the values of P(A) exceed what our intuition would suggest.

TABLE 2.7.1

k     P(A) = P(at least two have same birthday)
15        0.253
22        0.476
23        0.507
40        0.891
50        0.970
70        0.999

Comment. Presidential biographies offer one opportunity to "confirm" the unexpectedly large values that Table 2.7.1 gives for P(A). Among our first k = 40 presidents, two did have the same birthday: Harding and Polk were both born on November 2. More surprising, though, are the death dates of the presidents: Adams, Jefferson, and Monroe all died on July 4, and Fillmore and Taft both died on March 8.

Comment. The values for P(A) in Table 2.7.1 are actually slight underestimates for the true probabilities that at least two of k people will be born on the same day. The assumption made earlier that all birthday sequences are equally likely is not entirely true: Births are somewhat more common during the summer than they are during the winter. It has been proven, though, that any sort of deviation from the equally-likely model will only serve to increase the chances that two or more people will share the same birthday (120).
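The entries in Table 2.7.1 are easy to recompute. A minimal sketch (ours; plain Python) evaluates $P(A) = 1 - {}_{365}P_k / 365^k$ for the k values in the table:

```python
def p_shared_birthday(k: int) -> float:
    """P(at least two of k people share a birthday), equally-likely model."""
    p_all_different = 1.0
    for i in range(k):
        p_all_different *= (365 - i) / 365  # i-th person avoids the first i birthdays
    return 1 - p_all_different

for k in (15, 22, 23, 40, 50, 70):
    print(k, round(p_shared_birthday(k), 3))
# k = 23 is the smallest group size for which the probability exceeds 1/2
```

Running it reproduces the table; in particular the probability crosses 0.5 between k = 22 and k = 23.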
So, if k = 40, for example, the probability is slightly greater than 0.891 that at least two of them were born on the same day.

EXAMPLE 2.7.7

One of the more instructive — and to some, one of the more useful — applications of combinatorics is the calculation of probabilities associated with various poker hands. It will be assumed in what follows that five cards are dealt from a poker deck and that no other cards are showing, although some may already have been dealt. The sample space is the set of $\binom{52}{5}$ = 2,598,960 different hands, each having probability 1/2,598,960.

What are the chances of being dealt (a) a full house, (b) one pair, and (c) a straight? [Probabilities for the various other kinds of poker hands (two pairs, three-of-a-kind, flush, and so on) are gotten in much the same way.]

a. Full house. A full house consists of three cards of one denomination and two of another. Figure 2.7.7 shows a full house consisting of three 7s and two queens.

[Figure 2.7.7: A 13 × 4 card array with three 7s and two queens marked.]

Denominations for the three-of-a-kind can be chosen in $\binom{13}{1}$ ways, and given that a denomination has been decided on, the three requisite suits can be selected in $\binom{4}{3}$ ways. Applying the same reasoning to the pair gives $\binom{12}{1}$ available denominations, each having $\binom{4}{2}$ possible choices of suits. Thus, by the multiplication rule,

$$P(\text{full house}) = \frac{\binom{13}{1}\binom{4}{3}\binom{12}{1}\binom{4}{2}}{\binom{52}{5}} = 0.0014$$

[Figure 2.7.8: A 13 × 4 card array illustrating a one-pair hand.]

b. One pair. To qualify as a one-pair hand, the five cards must include two of the same denomination and three "single" cards — cards whose denominations match neither the pair nor each other. The denomination of the pair can be chosen in $\binom{13}{1}$ ways and its two suits in $\binom{4}{2}$ ways. Denominations for the three single cards can be chosen in $\binom{12}{3}$ ways (see Question 2.7.16), and each of those cards can have any of $\binom{4}{1}$ suits. Multiplying these factors together and dividing by $\binom{52}{5}$ gives a probability of 0.42:

$$P(\text{one pair}) = \frac{\binom{13}{1}\binom{4}{2}\binom{12}{3}\binom{4}{1}\binom{4}{1}\binom{4}{1}}{\binom{52}{5}} = 0.42$$

c. Straight.
A straight is five cards having consecutive denominations but not all in the same suit — for example, a 4 of diamonds, 5 of hearts, 6 of hearts, 7 of clubs, and 8 of diamonds (see Figure 2.7.9). An ace may be counted "high" or "low," which means that (10, jack, queen, king, ace) is a straight and so is (ace, 2, 3, 4, 5). (If five consecutive cards are all in the same suit, the hand is called a straight flush. The latter is considered a fundamentally different type of hand in the sense that a straight flush "beats" a straight.)

[Figure 2.7.9: A 13 × 4 card array showing the straight 4 of diamonds, 5 of hearts, 6 of hearts, 7 of clubs, 8 of diamonds.]

To get the numerator for P(straight), we will first ignore the condition that all five cards not be in the same suit and count the number of hands having consecutive denominations. Note that there are ten sets of consecutive denominations of length five: (ace, 2, 3, 4, 5), (2, 3, 4, 5, 6), ..., (10, jack, queen, king, ace). With no restrictions on the suits, each card can be a diamond, heart, club, or spade. It follows, then, that the number of five-card hands having consecutive denominations is 10 · $\binom{4}{1}^5$. But forty (= 10 · 4) of those hands are straight flushes. Therefore,

$$P(\text{straight}) = \frac{10 \cdot \binom{4}{1}^5 - 40}{\binom{52}{5}} = 0.00392$$

Table 2.7.2 shows the probabilities associated with all the different poker hands. Hand i beats hand j if P(hand i) < P(hand j).

TABLE 2.7.2

Hand               Probability
One pair             0.42
Two pairs            0.048
Three-of-a-kind      0.021
Straight             0.0039
Flush                0.0020
Full house           0.0014
Four-of-a-kind       0.00024
Straight flush       0.000014
Royal flush          0.0000015

PROBLEM-SOLVING HINTS

(Doing combinatorial probability problems)

Listed on p. 91 are several hints that can be helpful in counting the number of ways to do something. Those same hints apply to the solution of combinatorial probability problems, but a few others should be kept in mind as well.

1. The solution to a combinatorial probability problem should be set up as a quotient of numerator and denominator enumerations. Avoid the temptation to multiply probabilities associated with each position in the sequence.
The latter approach will always "sound" reasonable, but it will frequently oversimplify the problem and give the wrong answer.

2. Keep the numerator and denominator consistent with respect to order — if permutations are being counted in the numerator, be sure that permutations are being counted in the denominator; likewise, if the outcomes in the numerator are combinations, the outcomes in the denominator should also be combinations.

3. The number of outcomes associated with any problem involving the rolling of n six-sided dice is 6ⁿ; similarly, the number of outcomes associated with tossing a coin n times is 2ⁿ. The number of outcomes associated with dealing a hand of n cards from a standard poker deck is 52Cn.

QUESTIONS

2.7.1. Ten equally-qualified marketing assistants are candidates for promotion to associate buyer; seven are men and three are women. If the company intends to promote four of the ten at random, what is the probability that two of the four are women?

2.7.2. An urn contains six chips, numbered 1 through 6. Two are chosen at random and their numbers are added together. What is the probability that the resulting sum is equal to five?

2.7.3. An urn contains twenty chips, numbered 1 through 20. Two are drawn simultaneously. What is the probability that the numbers on the two chips will differ by more than two?

2.7.4. A bridge hand (thirteen cards) is dealt from a standard 52-card deck. Let A be the event that the hand contains four aces; let B be the event that the hand contains four kings. Find P(A ∪ B).

2.7.5. Consider a set of ten urns, nine of which contain three white chips and three red chips each. The tenth contains five white chips and one red chip. An urn is picked at random. Then a sample of size three is drawn without replacement from that urn. If all three chips drawn are white, what is the probability that the urn being sampled is the one with five white chips?

2.7.6. A committee of fifty politicians is to be chosen from among our one hundred U.S. Senators. If the selection is done at random, what is the probability that each state will be represented?
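Answers like those in Example 2.7.7 can be double-checked by brute-force enumeration in the spirit of hint 1 — count favorable hands, count all hands, and take the quotient. The sketch below is ours (plain Python); it classifies all $\binom{52}{5}$ = 2,598,960 five-card hands and confirms the full-house count:

```python
from itertools import combinations
from math import comb

# Cards are 0..51; card // 4 gives the denomination (0..12), card % 4 the suit.
def is_full_house(hand):
    ranks = sorted(card // 4 for card in hand)
    # sorted ranks must look like [a, a, a, b, b] or [a, a, b, b, b], a != b
    return (ranks[0] == ranks[1] and ranks[3] == ranks[4] and
            (ranks[2] == ranks[1] or ranks[2] == ranks[3]) and
            ranks[0] != ranks[4])

count = sum(is_full_house(h) for h in combinations(range(52), 5))
print(count, count / comb(52, 5))  # 3744 hands, probability about 0.0014
```

The exhaustive count, 3744, matches $\binom{13}{1}\binom{4}{3}\binom{12}{1}\binom{4}{2}$, and the quotient matches the 0.0014 entry in Table 2.7.2.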
2.7.7. Suppose that n fair dice are rolled. What are the chances that all n faces will be the same?

2.7.8. Five fair dice are rolled. What is the probability that the faces showing constitute a "full house" — that is, three faces show one number and two faces show a second number?

2.7.9. Imagine that the test tube pictured contains 2n grains of sand, n white and n black. Suppose the tube is vigorously shaken. What is the probability that the two colors of sand will completely separate; that is, all of one color fall to the bottom, and all of the other color lie on top? (Hint: Consider the 2n grains to be aligned in a row. In how many ways can the n white and the n black grains be permuted?)

2.7.10. Does a monkey have a better chance of rearranging A C C L L U U S to spell CALCULUS or A A B E G L R to spell ALGEBRA?

2.7.11. An apartment building has eight floors. If seven people get on the elevator on the first floor, what is the probability they all want to get off on different floors? On the same floor? What assumption are you making? Does it seem reasonable? Explain.

2.7.12. If the letters in the phrase

A ROLLING STONE GATHERS NO MOSS

are arranged at random, what are the chances that not all the S's will be adjacent?

2.7.13. Suppose each of ten sticks is broken into a long part and a short part. The twenty parts are arranged into ten pairs and glued back together, so that again there are ten sticks. What is the probability that each long part will be paired with a short part? (Note: This problem is a model for the effects of radiation on a living cell. Each chromosome, as a result of being struck by ionizing radiation, breaks into two parts, one part containing the centromere. The cell will die unless the fragment containing the centromere recombines with one not containing a centromere.)

2.7.14. Six dice are rolled one time.
What is the probability that each of the six faces appears?

2.7.15. Suppose that a randomly selected group of k people are brought together. What is the probability that exactly one pair has the same birthday?

2.7.16. For one-pair poker hands, why is the number of denominations for the three single cards $\binom{12}{3}$ rather than $\binom{12}{1}\binom{11}{1}\binom{10}{1}$?

2.7.17. Dana is not the world's best poker player. Dealt a 2 of diamonds, an 8 of diamonds, an ace of hearts, an ace of clubs, and an ace of spades, she discards the three aces. What are her chances of drawing to a flush?

2.7.18. A poker player is dealt a 7 of diamonds, a queen of diamonds, a queen of hearts, a queen of clubs, and an ace of hearts. He discards the 7. What is his probability of drawing to either a full house or four-of-a-kind?

2.7.19. Tim is dealt a 4 of clubs, a 6 of hearts, an 8 of hearts, a 9 of hearts, and a king of diamonds. He discards the 4 and the king. What are his chances of drawing to a straight flush? To a flush?

2.7.20. Five cards are dealt from a standard 52-card deck. What is the probability that the sum of the faces on the five cards is 48 or more?

2.7.21. Nine cards are dealt from a 52-card deck. Write a formula for the probability that three of the five even numerical denominations are represented twice, one of the face cards appears twice, and a second face card appears once. (Note: Face cards are the jacks, queens, and kings; 2, 4, 6, 8, and 10 are the even numerical denominations.)

2.7.22. A coke hand in bridge is one where none of the thirteen cards is an ace or is higher than a 9. What is the probability of being dealt such a hand?

2.7.23. A pinochle deck has forty-eight cards, two of each of six denominations (9, J, Q, K, 10, A) and the usual four suits. Among the many hands that count for meld is a roundhouse, which occurs when a player has a king and queen of each suit. In a hand of twelve cards, what is the probability of getting a "bare" roundhouse (a king and queen of each suit and no other kings or queens)?

2.7.24. A somewhat inebriated conventioneer finds himself in the embarrassing predicament of being unable to predetermine whether his next step will be forward or backward. What is the probability that after hazarding n such maneuvers he will have stumbled forward a distance of r steps? (Hint: Let x denote the number of steps he takes forward and y, the number backward. Then x + y = n and x − y = r.)
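Many of these exercises yield to the same brute-force enumeration used throughout the section. As an illustration of the general technique, the sketch below (ours; plain Python) re-derives the answers to the worked Examples 2.7.1 and 2.7.2 rather than to any of the exercises:

```python
from itertools import combinations
from fractions import Fraction

# Example 2.7.1: draw three chips without replacement from chips 1..8;
# P(largest chip drawn is a 5) should be C(4,2)/C(8,3) = 6/56.
samples = list(combinations(range(1, 9), 3))
favorable = sum(max(s) == 5 for s in samples)
print(Fraction(favorable, len(samples)))  # -> 3/28

# Example 2.7.2 with n = 4: twelve chips, numbers 1..4 in three colors;
# P(same color or same number) should be (n+1)/(3n-1) = 5/11.
chips = [(color, number) for color in "rwb" for number in range(1, 5)]
pairs = list(combinations(chips, 2))
hits = sum(c1 == c2 or n1 == n2 for (c1, n1), (c2, n2) in pairs)
print(Fraction(hits, len(pairs)))  # -> 5/11
```

Substituting any other small n in the second computation gives the same agreement with the formula (n + 1)/(3n − 1), which is exactly the kind of empirical check the next section advocates.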
TAKING A SECOND LOOK AT STATISTICS (ENUMERATION AND MONTE CARLO TECHNIQUES)

It is a characteristic of probability and combinatorial problems that proposed solutions can sound so right and yet be so wrong. Intuition can easily be fooled, and verbal arguments are often inadequate to deal with questions having even a modicum of complexity. There are some problem-solving techniques available, though, that can be very helpful — in particular, two approaches that go back to basics.

Making a List and Checking It Twice

Ask a realtor to list the three most important features that a house for sale can have and the answer is likely to be "location, location, location." Ask a probabilist to name the three most helpful strategies for solving difficult combinatorial problems and the answer might well be "enumerate, enumerate, enumerate." Making a partial list of the set of outcomes comprising an event can often show that a proposed solution is incorrect and suggest what the right answer should be. Sometimes, though, the magnitudes of the numbers in a problem are so large that making even a partial list of outcomes is not a feasible option. In those cases, the strategy is to enumerate a much smaller-scale problem, one that has all the essential features of the original.

For example, suppose a student government council is to be comprised of three freshmen, three sophomores, three juniors, three seniors, and one at-large representative, who could be a member of any of the four classes. Moreover, suppose ten candidates from each of the classes have been nominated.
How many different thirteen-member councils can be formed?

One approach that seems reasonable is to select the council members in a way that mimics the statement of the question. That is, three representatives from each of the four classes can be chosen in $\binom{10}{3}^4$ ways; then the at-large member would be selected from the $\binom{28}{1}$ remaining nominees. Applying the multiplication rule,

number of councils = $\binom{10}{3}\binom{10}{3}\binom{10}{3}\binom{10}{3}\binom{28}{1}$ = 5,806,080,000

Another approach, which also may seem reasonable, is to recognize that one class will necessarily have four representatives, while the other three will each have three. Any of the four classes, of course, could be the one with four representatives. Electing four freshmen, for example, and three from each of the other three classes can be done in $\binom{10}{4}\binom{10}{3}\binom{10}{3}\binom{10}{3}$ ways. Allowing for the fact that the four-member delegation could come from any of the four classes, it follows that the total number of possible councils is

number of different councils = 4 · $\binom{10}{4}\binom{10}{3}\binom{10}{3}\binom{10}{3}$ = 1,451,520,000

Is the first approach overcounting the number of different councils, or is the second approach undercounting? The two solutions differ by a factor of four. Enumerating (by hand) even a portion of the possible outcomes is not feasible given the magnitude of the numbers involved. A very simple analogous question can be posed, though, that is easily enumerated. Suppose there were only two classes — say,
Table 2.S.1 is a listing of the outcomes generated by the two strategies. By inspection, it is now clear that the first approach is incorrect-every possible outcome is doubJe-counted. The outcome ACB, for example, where B is the at-large representative, reappears as SCA, where A is the at-large representative. The second approach, on the other hand, prevents any such overlapping from occurring (but does include all possible councils). Play It Again. Sam Recall the von Mises definition of probability given on p. 23: If an experiment is repeated if the event E occurs on m of those repetitions, n times under identical conditions, 125 126 Chapter 2 Probabilny then peE) = lim m 11->00 (2.8.1) n To be sure, Equation is an asymptotic result, but it suggests an obvious (and very approximation-if n is finite, P(E) m ==n In general, efforts to estimate probabilities by simulating repetitions of an experiment (usually with a computer) are referred to as Monte Carlo studies. Usually the technique is used in situations an exact probability is difficult to calculate. It can also be though, as an empirical justification for choosing one proposed solution over another. For example, consider the game described in Example 2.4.11. An urn contains a red is chip, a blue and a two--color chip (red on one side, bh.lt! uu tht;; uthtf). One drawn at random placed on a table. question is, if blue is what is the probability that the color underneath is also blue? Pictured in 2.8.2 are two ways conceptualizing the question just posed. The outcomes in (a) are that a chip was dr~wn. Starting with that premise, the answer to the question is red chip is obviously eliminated and only one of two ~U."""LUU'6 chips is blue on sides. Side drawn red blue } --.. 
P(BIB) =112 red/red 'J bluefblue --- P(BlB) = 213 tW(H;;olor red/blue <a) (b) FiGURE 2.8.2 By way of contrast, the outcomes in (b) are assuming that the side of a chip was drawn If so, the blue color showing could be any of three blue sides, two which are biU( underneath. According to model (b), then, the probability of both sides blue is The formal on pp. 60, of course, resolves the debate-the correct answer i: But suppose that such a derivation was unavailable. How might we assess relativl plausibilities of ~ and ~? The answer is simple-just play the game a number of times an' see proportion of outcomes that show on top have blue underneath. To that Table summarizes the results of one hundred drawings. Fo a total of fifty-two, blue was showing (5) when the chip was placed on a table; of the trials (those marked with an asterisk), the color underneath (U) was also blue Using the approximation suggested by Equation 2.8.1, i. i. P(blue is underneath I blue is on top) = PCB I B) a figure much more consistent with j than with!. 
[Table 2.8.2: Trial-by-trial results of the one hundred drawings, recording for each trial the color showing (S) and the color underneath (U). Blue was showing on fifty-two trials; on thirty-six of those (marked with asterisks) blue was also underneath.]

The point of these examples is not to downgrade the importance of rigorous derivations and exact answers. Far from it. The application of Theorem 2.4.1 to solve the problem posed in Example 2.4.11 is obviously superior to the Monte Carlo approximation illustrated in Table 2.8.2.
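A hundred-trial experiment like the one in Table 2.8.2 takes seconds to reproduce in software. The sketch below (ours; plain Python, fixed seed so the run is reproducible) plays the game many times and estimates P(B | B); with 100,000 plays the estimate settles near 2/3 rather than 1/2:

```python
import random

def play_once(rng):
    # Chips as (top, bottom) pairs; "R" = red, "B" = blue.
    chip = rng.choice([("R", "R"), ("B", "B"), ("R", "B")])
    # Each side is equally likely to land face up.
    return chip if rng.random() < 0.5 else chip[::-1]

rng = random.Random(1)
shown_blue = blue_underneath = 0
for _ in range(100_000):
    top, bottom = play_once(rng)
    if top == "B":
        shown_blue += 1
        blue_underneath += bottom == "B"

print(round(blue_underneath / shown_blue, 2))  # close to 2/3, not 1/2
```

Conditioning on "blue showing" and tallying the bottom colors is the software analogue of the asterisked rows of Table 2.8.2.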
Still, enumerations of outcomes and "replications" of experiments can often provide valuable insights and call attention to nuances that might otherwise go unnoticed. As problem-solving techniques for probability and combinatorics, they are extremely, extremely important.

CHAPTER 3

Random Variables

3.1 INTRODUCTION
3.2 BINOMIAL AND HYPERGEOMETRIC PROBABILITIES
3.3 DISCRETE RANDOM VARIABLES
3.4 CONTINUOUS RANDOM VARIABLES
3.5 EXPECTED VALUES
3.6 THE VARIANCE
3.7 JOINT DENSITIES
3.8 COMBINING RANDOM VARIABLES
3.9 FURTHER PROPERTIES OF THE MEAN AND VARIANCE
3.10 ORDER STATISTICS
3.11 CONDITIONAL DENSITIES
3.12 MOMENT-GENERATING FUNCTIONS
3.13 TAKING A SECOND LOOK AT STATISTICS (INTERPRETING MEANS)
APPENDIX 3.A.1 MINITAB APPLICATIONS

Jakob (Jacques) Bernoulli

One of a Swiss family producing eight distinguished scientists, Jakob was forced by his father to pursue theological studies, but his love of mathematics eventually led him to a university career. He and his brother, Johann, were the most prominent champions of Leibniz's calculus on the European continent, the two using the new theory to solve numerous problems in physics and mathematics. Bernoulli's main work in probability, Ars Conjectandi, was published after his death by his nephew, Nikolaus, in 1713. — Jakob (Jacques) Bernoulli

INTRODUCTION

Throughout Chapter 2, probabilities were assigned to events — that is, to sets of sample outcomes. The events we dealt with were composed of either a finite or a countably infinite number of sample outcomes, in which case the event's probability was simply the sum of the probabilities assigned to its outcomes. One particular probability function that came up over and over in Chapter 2 was the assignment of 1/n as the probability associated with each of the n points in a finite sample space. This is the model that typically describes games of chance (and all of our combinatorial probability problems in Chapter 2).
The first objective of this chapter is to look at several other useful ways for assigning probabilities to sample outcomes. In so doing, we confront the desirability of "redefining" sample spaces using functions known as random variables. How and why these functions are used — and what their mathematical properties are — become the focus of virtually everything covered in Chapter 3.

As a case in point, suppose a medical researcher is testing eight elderly adults for their allergic reaction (yes or no) to a new drug for controlling blood pressure. One of the 2⁸ = 256 possible sample points would be the sequence (yes, no, no, yes, no, no, yes, no), signifying that the first subject had an allergic reaction, the second did not, the third did not, and so on. Typically, in studies of this sort, which particular subjects experience reactions is of little interest: what does matter is the number who show a reaction. If that were true here, the outcome's relevant information (i.e., the number of allergic reactions) could be summarized by the number 3.(1)

Suppose X denotes the number of allergic reactions among a set of eight adults. Then X is said to be a random variable, and the number 3 is the value of the random variable for the outcome (yes, no, no, yes, no, no, yes, no).

In general, random variables are functions that associate numbers with some attribute of a sample outcome that is deemed to be especially important. If X denotes the random variable and s denotes a sample outcome, then X(s) = t, where t is a real number. For the allergy example, s = (yes, no, no, yes, no, no, yes, no) and t = 3.

Random variables can often create a dramatically simpler sample space. That certainly is the case here — the original sample space has 256 (= 2⁸) outcomes, each being an ordered sequence of length eight. The random variable X, on the other hand, has only nine possible values, the integers from 0 to 8, inclusive.
In terms of their fundamental structure, all random variables fall into one of two broad categories, the distinction resting on the number of possible values the random variable can equal. If the latter is finite or countably infinite (which would be the case with the allergic reaction example), the random variable is said to be discrete; if, on the other hand, the outcomes can be any number in a given interval, the number of possibilities is uncountably infinite, and the random variable is said to be continuous. The difference between the two is critically important, as we will learn in the next several sections.

(1) By Theorem 2.6.2, of course, there would be a total of fifty-six (= 8!/(3!5!)) outcomes having exactly three yeses. All fifty-six would be equivalent in terms of what they imply about the drug's likelihood of causing reactions.

The purpose of Chapter 3 is to introduce the important definitions, concepts, and computational techniques associated with random variables, both discrete and continuous. Taken together, these ideas form the bedrock of modern probability and statistics.

3.2 BINOMIAL AND HYPERGEOMETRIC PROBABILITIES

This section looks at two specific probability scenarios that are especially important, both for their theoretical interest and for their ability to describe real-world problems. Developing these two models will also help us understand the random variables taken up in Section 3.3.

The Binomial Probability Distribution

Binomial probabilities apply to situations involving a series of independent and identical trials, where each trial can have only one of two possible outcomes. Imagine three distinguishable coins being tossed, each having a probability p of coming up heads. The set of possible outcomes are the eight listed in Table 3.2.1. If the probability of any of the coins landing heads up is p, then the probability of the sequence (H, H, H) is p^3, since the coin tosses qualify as independent trials. Similarly, the probability of (T, H, H) is p^2(1 - p). The fourth column of Table 3.2.1 shows the probabilities associated with each of the eight three-coin sequences.

Suppose our interest in the coin tosses is the number of heads that occur.
Whether the actual sequence was (H, H, T) or (H, T, H) is immaterial, since each outcome contains exactly two heads. The last column of Table 3.2.1 shows the number of heads in each of the eight outcomes. Notice that there are three outcomes with exactly two heads, each having an individual probability of p^2(1 - p). The probability, then, of the event "two heads" is the sum of those three individual probabilities, that is, 3p^2(1 - p). Table 3.2.2 lists the probabilities of tossing k heads, where k = 0, 1, 2, or 3.

TABLE 3.2.1

1st Coin  2nd Coin  3rd Coin  Probability   Number of Heads
H         H         H         p^3           3
H         H         T         p^2(1 - p)    2
H         T         H         p^2(1 - p)    2
T         H         H         p^2(1 - p)    2
H         T         T         p(1 - p)^2    1
T         H         T         p(1 - p)^2    1
T         T         H         p(1 - p)^2    1
T         T         T         (1 - p)^3     0

TABLE 3.2.2

Number of Heads  Probability
0                (1 - p)^3
1                3p(1 - p)^2
2                3p^2(1 - p)
3                p^3

Now, more generally, suppose that n coins are tossed, in which case the number of heads can equal any integer from 0 through n. By analogy,

P(k heads) = (number of ways to arrange k heads and n - k tails) x (probability of any particular sequence having k heads and n - k tails)
           = (number of ways to arrange k heads and n - k tails) x p^k (1 - p)^(n-k)

The number of ways to arrange k H's and n - k T's, though, is n!/(k!(n - k)!), or C(n, k) (recall Theorem 2.6.2).

Theorem 3.2.1. Consider a series of n independent trials, each resulting in one of two possible outcomes, "success" or "failure." Let p = P(success occurs at any given trial) and assume that p remains constant from trial to trial. Then

P(k successes) = C(n, k) p^k (1 - p)^(n-k),   k = 0, 1, ..., n

Comment. The probability assignment given by the equation in Theorem 3.2.1 is known as the binomial distribution.

EXAMPLE 3.2.1

As the lawyer for a client accused of murder, you are looking for ways to establish "reasonable doubt" in the minds of the jurors. Central to the prosecutor's case is testimony from a forensics expert who claims that a blood sample taken from the scene of the crime matches the DNA of your client.
One-tenth of 1% of the time, though, such tests are in error. Suppose your client is actually guilty. If six other laboratories in the country are capable of doing this kind of DNA analysis (and you hire them all), what are the chances that at least one will make a mistake and conclude that your client is innocent?

Each of the six analyses constitutes an independent trial, where p = P(lab makes a mistake) = 0.001. Substituting into Theorem 3.2.1 shows that the lawyer's strategy is not likely to work:

P(at least one lab says client is innocent) = 1 - P(0 labs make a mistake)
  = 1 - C(6, 0)(0.001)^0(0.999)^6
  = 0.006

Given such small values for n and p, a probability of 0.006 offers the defendant little hope that conflicting forensic results will surface. But then again, as the erstwhile TV detective Baretta was fond of saying, "If you can't do the time, don't do the crime."

EXAMPLE 3.2.2

Kingwest Pharmaceuticals is experimenting with a new affordable AIDS medication, PM-17, that may have the ability to strengthen a victim's immune system. Thirty monkeys infected with the HIV complex have been given the drug. Researchers plan to wait six weeks and then count the number of animals whose immunological responses show a marked improvement. Any inexpensive drug capable of being effective 60% of the time would be considered a major breakthrough; drugs whose chances of success are 50% or less are not likely to have any commercial potential.

Yet to be finalized are guidelines for interpreting the results. Kingwest hopes to avoid making either of two errors: (1) rejecting a drug that would ultimately prove to be marketable and (2) spending further development dollars on a drug whose effectiveness, in the long run, would be 50% or less. As a tentative "decision rule," the project manager suggests that unless sixteen or more of the monkeys show improvement, research on PM-17 should be discontinued.

a. What are the chances that the "sixteen or more" rule will cause the company to reject PM-17, even if the drug is 60% effective?
b. How often will the "sixteen or more" rule allow a 50%-effective drug to be perceived as a major breakthrough?

(a) Each of the monkeys is one of n = 30 independent trials, where the outcome is either a "success" (the monkey's immune system is strengthened) or a "failure" (the monkey's immune system is not strengthened).
By assumption, the probability that the drug produces an immunological improvement in any given monkey is p = P(success) = 0.60. By Theorem 3.2.1, the probability that exactly k monkeys (out of thirty) will show improvement after six weeks is C(30, k)(0.60)^k(0.40)^(30-k). The probability, then, that the "sixteen or more" rule will cause a 60%-effective drug to be discarded is the sum of "binomial" probabilities for k values ranging from 0 to 15:

P(60%-effective drug fails "sixteen or more" rule) = Σ_{k=0}^{15} C(30, k)(0.60)^k(0.40)^(30-k) = 0.1754

Roughly 18% of the time, in other words, a "breakthrough" drug such as PM-17 will produce test results so mediocre (as measured by the "sixteen or more" rule) that the company will be misled into thinking it has no potential.

(b) The other error Kingwest can make is to conclude that PM-17 warrants further study when, in fact, its true value for p is below a marketable level. The chance that this particular incorrect inference will be drawn here is the probability that the number of successes will be greater than or equal to sixteen when p = 0.5. That is,

P(50%-effective PM-17 appears to be marketable) = P(sixteen or more successes occur)
  = Σ_{k=16}^{30} C(30, k)(0.5)^k(0.5)^(30-k)
  = 0.43

Thus, even if PM-17's success rate is as low as 50%, it has a 43% chance of performing sufficiently well in thirty trials to satisfy the "sixteen or more" criterion.

Comment. Evaluating binomial summations can be tedious, even with a calculator. Statistical software packages offer a convenient alternative. Appendix 3.A.1 shows how one such program, MINITAB, can be used to answer the sorts of questions posed in Example 3.2.2.

EXAMPLE 3.2.3

The Stanley Cup playoff in professional hockey is a seven-game series, where the first team to win four games is declared the champion. The series, then, can last anywhere from four to seven games (just like the World Series in baseball). Calculate the likelihoods that the series will last four, five, six, and seven games.
Assume that (1) each game is an independent event and (2) the two teams are evenly matched.

Consider the case where Team A wins the series in six games. For that to happen, they must win exactly three of the first five games, and they must win the sixth game. Because of the independence assumption, we can write

P(Team A wins in six games) = P(Team A wins three of first five) x P(Team A wins sixth)
  = [C(5, 3)(0.5)^3(0.5)^2] x (0.5)
  = 0.15625

Since the probability that Team B wins the series in six games is the same (why?),

P(series ends in six games) = P(Team A wins in six games ∪ Team B wins in six games)
  = P(A wins in six) + P(B wins in six)   (why?)
  = 0.15625 + 0.15625
  = 0.3125

A similar argument allows us to calculate the probabilities of four-, five-, and seven-game series:

P(four-game series) = 2(0.5)^4 = 0.125
P(five-game series) = 2[C(4, 3)(0.5)^3(0.5)](0.5) = 0.25
P(seven-game series) = 2[C(6, 3)(0.5)^3(0.5)^3](0.5) = 0.3125

Having calculated the "theoretical" probabilities associated with the possible lengths of a Stanley Cup playoff raises an obvious question: How do those likelihoods compare with the actual distribution of playoff lengths? For a recent fifty-nine-year period, Column 2 in Table 3.2.3 shows the proportion of playoffs that lasted 4, 5, 6, and 7 games, respectively. Clearly, the agreement between the entries in Columns 2 and 3 is not very good. Particularly noticeable is the excess of short playoffs (four games) and the deficit of long playoffs (seven games). What this "lack of fit" suggests is that one or more of the binomial distribution assumptions is not satisfied. Consider, for example, the parameter p, which we assumed to equal 1/2. In reality, its value might be something quite different.

TABLE 3.2.3
" 4 5 6 7 Observed 1-',..,..,.,......'; 19/59 = 15/59 = 15/59 = 10/59 = 0.322 0.254 0.254 0.169 Theoretical Probability 0.12..'; 0.250 0.3125 0.312..'; Binomial and Hypergeometric Probabilities Section 3.2 115 because the teams playing for the championship won their respective divisions, it does not necessarily follow that two are good. Indeed, if the two contending teams were frequently mismatched, the would be an increase number short playoffs and a decrease in number of long playoffs. It may also be the case that momentum is a factor in a team's Chances of winning a game. the independence assumption implicit the binomial modeJ is rendered invalid. EXAMPlE 3.2.4 Doomsday Airlines ("Come Take the Flight of Your life") has two aircraft-a dilapidated two-engine plane and an equally outdated and under-maintained four-engine prop plane. Each will land safely only if at least its are working properly. Given that you wish to remrun among the living, under what conditions would you opt to fiy on the two-engine plane? Assume that each engine on each pJane has the same and that such failures are independent events. probability p of the tW(>-ell~lIle P (fligbtlands safely) = P(one or more engines work properly) = t (2)(1 - k=l pi p2-k (3.2.1) k For the four-engine plane. P(flight lands safely) = P(two or more engines work properly) (3.2.2) for the two-engine plane, then, When to of p for which the "13 £11 11 r.' " to an algebra problem: We look for > or, equivalently, Simplifying the inequality (~)(1 - p)Op4 + G)(1 - p)l > (~)(1 p)O,} 136 Chapter :3 Random Variables LO 0.9 0.8 0.6 ~ 0.5 C/.I 0.4 '" ~ fl.. 0.3 0.2 "" ""... " ,, "" "" "... /t / 0.1 0 , I I I I I 0.7 - - 2-cnginc plllnc .. .. .. 0.1 02 0.3 0.4 0.5 0.6 p;: P (e.ngine. 0.7 "" "" " "- ..... 0.8 0.9 ... - 1.0 f~il;;) FIGURE 3.2.1 gives (3p l)(p - 1) < 0 (3.2.3) L) is never so Inequality 3.2.3 win be true only when 1) > 0, p ;:-. ~ as the desired solution set. Figure 3.2.1 the two "safe return" as a function of p. 
QUESTIONS

3.2.1. An investment analyst has tracked a certain stock for the past six months and found that on any given day it either goes up a point or down a point. Moreover, it went up on 25% of the days and down on 75%. What is the probability that at the close of trading four days from now the price of the stock will be the same as it is today? Assume that the daily fluctuations are independent events.

3.2.2. In a nuclear reactor, the fission process is controlled by inserting special rods into the radioactive core to absorb neutrons and slow down the nuclear chain reaction. When functioning properly, these rods serve as a first-line defense against a core meltdown. Suppose a reactor has 10 control rods, each operating independently and each having a 0.80 probability of being properly inserted in the event of an "incident." Furthermore, suppose that a meltdown will be prevented if at least half the rods perform satisfactorily. What is the probability that, upon demand, the system will fail?

3.2.3. Suppose that since the early 1950s some 10,000 independent UFO sightings have been reported to civil authorities. If the probability that any sighting is genuine is on the order of 1 in 100,000, what is the probability that at least 1 of the 10,000 was genuine?

3.2.4. Suppose that 12 circuit boards coming off an assembly line are tested, and that each board, independently, needs rework with the same fixed probability. (a) What is the probability that exactly 4 will need rework? (b) What is the probability that at least one needs rework?
3.2.5. A manufacturer has 10 machines that die-cut cardboard boxes. The probability that, on a given day, any one of the machines will be out of service for repair or maintenance is 0.05. If the day's production schedule requires the availability of at least seven of the machines, what is the probability that the work will get done?

3.2.6. Two lighting systems are being proposed for an employee work area. One requires 50 bulbs, each having a probability of 0.05 of burning out within a month's time. The second has 100 bulbs, each with a 0.02 burnout probability. Whichever system is installed will be inspected once a month for the purpose of replacing burned-out bulbs. Which system is likely to require less maintenance? Answer the question by comparing the probabilities that each will require at least one bulb to be replaced at the end of 30 days.

3.2.7. The great English diarist Samuel Pepys asked his friend Sir Isaac Newton the following question: Is it more likely to get at least one 6 when 6 dice are rolled, at least two 6's when 12 dice are rolled, or at least three 6's when 18 dice are rolled? After considerable correspondence (162), Newton convinced the skeptical Pepys that the first event is the most likely. Compute the three probabilities.

3.2.8. The gunner on a small assault boat fires … missiles at an attacking plane. Each has a 20% chance of being on target. If two or more of the shells find their mark, the plane will crash. At the same time, the pilot of the plane fires 10 air-to-surface rockets, each of which has a 0.05 chance of critically disabling the boat. Would you rather be on the plane or the boat?

3.2.9. If a family has four children, is it more likely they will have two boys and two girls, or three of one sex and one of the other? Assume that the probability of a child being a boy is 1/2 and that the births are independent events.

3.2.10. Experience has shown that … of all patients having a certain disease will recover if given the standard treatment. A new drug is to be tested on a group of 12 volunteers. If the FDA requires that at least seven of these patients recover before it will license the new drug, what is the probability that the treatment will be discredited even if it has the potential to increase an individual's recovery rate to …?

3.2.11. Transportation to school for a rural county's 76 children is provided by a fleet of four buses. Drivers are chosen on a day-to-day basis and come from a pool of local farmers who have agreed to be "on call."
What is the smallest number of drivers that need to be in the pool if the county wants to have at least a … probability on any given day that all the buses will run? Assume that each driver has an 80% chance of being available if contacted.

3.2.12. The captain of a Navy gunboat orders a volley of 25 missiles to be fired at random along a 500-foot stretch of shoreline that he hopes to establish as a beachhead. Dug into the beach is a 30-foot-long bunker serving as the enemy's first line of defense. The captain has reason to believe that the bunker will be destroyed if at least three of the missiles are on target. What is the probability of that happening?

3.2.13. A computer has generated seven random numbers over the interval 0 to 1. Is it more likely that (1) exactly three will be in the interval 1/2 to 1, or (2) fewer than three will be greater than 3/4?

3.2.14. Listed in the table is the length distribution of World Series competition for the years from 1950 to 2002.

World Series Lengths

Number of Games, X  Number of Years
4                   9
5                   8
6                   11
7                   24
                    Total: 52

Assuming that each World Series game is an independent event and that the probability of either team's winning any particular contest is 0.5, find the probability of each series length. How well does the model fit the data? (Compute the "expected" frequencies, that is, multiply the probability of a given series length by 52.)

3.2.15. Use the expansion of (x + y)^n (recall the comment in Section 2.6 on page 108) to prove that the binomial probabilities sum to 1; that is, Σ_{k=0}^{n} C(n, k) p^k (1 - p)^(n-k) = 1.

3.2.16. Suppose a series of n independent trials can end in one of three possible outcomes. Let k1 and k2 denote the number of trials that result in outcomes 1 and 2, respectively, and let p1 and p2 denote the probabilities associated with outcomes 1 and 2. Generalize Theorem 3.2.1 to deduce a formula for the probability of getting k1 and k2 occurrences of outcomes 1 and 2, respectively.

3.2.17. Repair calls for central air conditioners fall into three categories: coolant leakage, compressor failure, and electrical malfunction.
Experience has shown that the probabilities associated with the three categories are 0.5, 0.3, and 0.2, respectively. Suppose that a dispatcher has logged in 10 service requests for tomorrow. Use the answer to Question 3.2.16 to calculate the probability that 3 of those 10 will involve coolant leakage and 5 will be compressor failures.

The Hypergeometric Distribution

The second "special" distribution that we want to look at formalizes the urn problems that appeared throughout Chapter 2. Our solutions to those problems tended to be enumerations: We listed the entire set of possible samples and then counted the ones that satisfied the event in question. The inefficiency and redundancy of that approach should by now be painfully obvious. What we are seeking here is a general formula that can be applied to any and all such problems, much like the expression in Theorem 3.2.1 can handle the full range of binomial questions.

Suppose an urn contains r red chips and w white chips, where r + w = N. Imagine drawing any n of the chips from the urn without replacement, recording at each drawing the color of the chip selected. The question is, what is the probability that exactly k red chips are included among the n that are removed?

Notice that the situation just described is similar in some respects to the binomial model, but the method of sampling creates a critical distinction. If each chip drawn were replaced prior to the next selection, then each drawing would be an independent trial, the chances of drawing a red at any trial would be a constant r/N, and the probability that exactly k red chips would be included in the n selections would be
a direct application of Theorem 3.2.1:

P(k red chips are drawn) = C(n, k)(r/N)^k (1 - r/N)^(n-k)

However, if the chips drawn are not replaced, then the probability of drawing a red on any given attempt is not r/N: Its value would depend on the colors of the chips selected earlier. Because p = P(red is drawn) = P(success) does not remain constant from drawing to drawing, the binomial model of Theorem 3.2.1 does not apply. Instead, probabilities that arise from the "no replacement" scenario just described are said to follow the hypergeometric distribution.

Theorem 3.2.2. Suppose an urn contains r red chips and w white chips, where r + w = N. If n chips are drawn out at random, without replacement, and if k denotes the number of red chips selected, then

P(k red chips are chosen) = C(r, k) C(w, n - k) / C(N, n)   (3.2.4)

where k varies over all the integers for which C(r, k) and C(w, n - k) are defined. The probabilities appearing on the right-hand side of Equation 3.2.4 are known as the hypergeometric distribution.

Proof. Assume the chips are distinguishable. We need to count the number of outcomes making up the event of getting k red chips and n - k white chips. The number of ordered ways to select the k red chips is rPk; similarly, the number of ordered ways to select the n - k white chips is wP(n-k). Each outcome, though, is an n-long ordered sequence of reds and whites, and there are C(n, k) ways to choose the positions in the sequence where the red chips go. Thus, the number of outcomes in the event of interest is C(n, k) rPk wP(n-k). The total number of ways to draw n chips from N, in order, without replacement, is NPn, so

P(k red chips are chosen) = C(n, k) rPk wP(n-k) / NPn

This quantity, while correct, is not in the form given in the statement of the theorem. To make the conversion, we change all the terms in the expression to factorials:

P(k red chips are chosen) = [n!/(k!(n - k)!)] [r!/(r - k)!] [w!/(w - n + k)!] [(N - n)!/N!]
  = [r!/(k!(r - k)!)] [w!/((n - k)!(w - n + k)!)] / [N!/(n!(N - n)!)]
  = C(r, k) C(w, n - k) / C(N, n)   □

Comment. The appearance of three binomial coefficients suggests a model of selecting subsets. Indeed, one can consider the experiment of selecting a subset of size n simultaneously, where order doesn't matter. In that case, the question remains: what is the probability of getting k red and n - k white chips? A moment's reflection will show that the formula given in the statement of the theorem answers that question as well. So, if our interest is simply in the number of red and white chips in the sample, the probabilities are the same whether the chips are drawn simultaneously or drawn in order, without replacement.
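Equation 3.2.4 is straightforward to evaluate with software. The Python sketch below (my illustration, not part of the text) computes the distribution for a small urn and confirms that the probabilities sum to 1, as they must:

```python
from math import comb

def hypergeom_pmf(k, r, w, n):
    """P(k red chips chosen) when n chips are drawn without replacement
    from an urn holding r red and w white chips (Equation 3.2.4)."""
    return comb(r, k) * comb(w, n - k) / comb(r + w, n)

# Urn with r = 5 red and w = 4 white chips; draw n = 3 without replacement.
r, w, n = 5, 4, 3
probs = [hypergeom_pmf(k, r, w, n) for k in range(n + 1)]
print([round(p, 4) for p in probs])
print(round(sum(probs), 10))   # -> 1.0; the probabilities sum to 1
```

That the probabilities sum to 1 is a consequence of the identity Σ_k C(r, k) C(w, n - k) = C(N, n), which follows from equating coefficients in (1 + t)^r (1 + t)^w = (1 + t)^N.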
Comment. The name hypergeometric derives from a series introduced by the Swiss mathematician and physicist Leonhard Euler:

1 + (ab/c)x + [a(a + 1)b(b + 1)/(2! c(c + 1))]x^2 + ...

This is a series of considerable flexibility: Given appropriate values for a, b, and c, it reduces to many of the standard infinite series used in analysis. In particular, if a is set equal to 1 and b and c are set equal to each other, it reduces to the familiar geometric series,

1 + x + x^2 + ...

hence the name hypergeometric. The relationship of the probability function in Theorem 3.2.2 to Euler's series becomes apparent if we set a = -n, b = -r, and c = w - n + 1 and multiply the series by C(w, n)/C(N, n). Then the coefficient of x^k will be the value of P(k red chips are chosen).

EXAMPLE 3.2.5

Keno is among the most popular games played in Las Vegas, even though it ranks as one of the least "fair" in the sense that the odds are overwhelmingly in favor of the house. (Betting on keno is only a little less foolish than playing a slot machine!) A keno card has eighty numbers, 1 through 80, from which the player selects a sample of size k, where k can be anything from 1 to fifteen (see Figure 3.2.2). The "caller" then announces twenty winning numbers, chosen at random from the eighty. If, and how much, the player wins depends on how many of his numbers match the twenty identified by the caller.

FIGURE 3.2.2 (a keno card)

Suppose that a player bets on a ten-spot ticket. What is his probability of "catching" five numbers?

Consider an urn containing eighty numbers, twenty of which are winners and sixty of which are losers (see Figure 3.2.3). By betting on a ten-spot ticket, the player, in effect, is drawing a sample of size ten from that urn.
The probability of "catching" five numbers is the probability that exactly five of the numbers the player has bet on are contained in the set of twenty winning numbers announced by the caller.

FIGURE 3.2.3 (urn with 20 winning #'s and 60 losing #'s, from which 10 are chosen)

By Theorem 3.2.2 (with r = 20, w = 60, n = 10, N = 80, and k = 5), the player has approximately a 5% chance of catching exactly five winning numbers:

P(five winning numbers are selected) = C(20, 5) C(60, 5) / C(80, 10) = 0.05

EXAMPLE 3.2.6

To return a verdict, a jury must come to a unanimous decision. Suppose that a pool of twenty-five potential jurors is assigned to a murder case where the evidence against the defendant is so overwhelming that twenty-three of the twenty-five would return a guilty verdict. The other two potential jurors would vote to acquit regardless of the facts. What is the probability that a twelve-member jury chosen at random from the pool of twenty-five will be unable to reach a unanimous decision?

Think of the jury pool as an urn containing twenty-five chips, twenty-three of which correspond to jurors who would vote "guilty" and two of which correspond to jurors who would vote "not guilty." If either or both of the jurors who would vote "not guilty" are included in the panel of twelve, the result would be a hung jury. Theorem 3.2.2 gives 0.74 as the probability that the jury impanelled would not reach a unanimous decision:

P(hung jury) = C(2, 1) C(23, 11) / C(25, 12) + C(2, 2) C(23, 10) / C(25, 12) = 0.74

EXAMPLE 3.2.7

When a bullet is fired, it becomes scored with minute striations produced by imperfections in the gun barrel. Appearing as a series of parallel lines, these striations have long been recognized as a basis for matching a bullet with a gun, since repeated firings of the same weapon will produce bullets having substantially the same configuration of striations. Until recently, deciding how close two patterns had to be before it could be concluded that the bullets came from the same weapon was largely subjective. A ballistics expert would look at the two bullets under a microscope and make an informed judgment based on past experience. Today, criminologists are beginning to address the problem more quantitatively, partly with the help of the hypergeometric distribution.
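Both of the last two examples are single applications of Equation 3.2.4. As a numerical check, here is a short Python computation of mine reproducing the keno and jury-pool figures:

```python
from math import comb

def hyper(k, r, w, n):
    # P(k "red" chips drawn), Theorem 3.2.2, with N = r + w
    return comb(r, k) * comb(w, n - k) / comb(r + w, n)

# Keno: 20 winning and 60 losing numbers; ten-spot ticket; catch exactly five.
print(round(hyper(5, 20, 60, 10), 4))     # -> 0.0514, i.e., about 0.05

# Jury pool: hung jury if 1 or 2 of the two "not guilty" jurors are impaneled.
p_hung = hyper(1, 2, 23, 12) + hyper(2, 2, 23, 12)
print(round(p_hung, 2))                   # -> 0.74
```
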
Suppose a bullet is recovered from the scene of a crime, along with the suspect's gun. Under a microscope, a grid of m cells, numbered 1 to m, is superimposed over the bullet. If m is chosen large enough so that the partition is sufficiently fine, each of the evidence bullet's n_e striations will fall into a different cell (see Figure 3.2.4(a)). Then the suspect's gun is fired, yielding a test bullet, which will have a total of n_t striations located in a possibly different set of cells (see Figure 3.2.4(b)). How might we assess the similarities in cell locations for the two striation patterns?

FIGURE 3.2.4 ((a) the evidence bullet's n_e striations in a grid of m cells, 1 through m; (b) the test bullet's n_t striations)

As a model for the striation pattern on the evidence bullet, imagine an urn containing m chips, with n_e of them corresponding to the striation locations. Now, think of the striation pattern on the test bullet as representing a sample of size n_t from the evidence urn. By Theorem 3.2.2, the probability that k of the cell locations will be shared by the two striation patterns is

P(k matches) = C(n_e, k) C(m - n_e, n_t - k) / C(m, n_t)

Suppose the bullet found at a murder scene is superimposed with a grid having m cells, n_e of which contain striations. The suspect's gun is fired, and the test bullet is found to have n_t = 3 striations, one of which matches the location of one of the striations on the evidence bullet. What do you think a ballistics expert would conclude?

Intuitively, the similarity between the two bullets would be reflected in the probability that one or more striations in the suspect's bullet matched the evidence bullet. The smaller that probability is, the stronger would be our belief that the two bullets were fired by the same gun. Here, based on the values given for m, n_e, and n_t, that probability turns out not to be small.
Based on the values given for m, lie, and n" 144 Chapter 3 Random Variables If P(one or more matches} had been a small O.OOl-the inference would have been dear-.cut: The same gun fired both bullets, But, here with the probability of one or more matches so large, we cannot out the possibility that the bullets were fired two different guns (and, presumably, by two different EXAMPLE 3.2.8 Wipe Your Feel, a to establish name recognition in a community consisting thousand households. company's management team estimates that thousand of those would do business with firm if they were contacted and informed of the services available. With in mind, company has hired a staff of telemarketers to place one thousand calls. Write a formula for the probability that at least one hundred new customers will be identified. this is an urn problem nol the three ex.lmples, for the fact that the numbers "chips" are powers of ten than what we have encountered up to thJS paint. in the terminology of Theorem 3.2.2, N = 60,000, r = 5,000, w = 55,000, 1/ = 1000. and P(telemarketers identify k new customers) = 5000) ( 55,000 ) ( k 1000 - k (~=) k = 0, L ... , 1000 It follows that hundred or more new customers are or fewer new customers are identified) =1 =1- (3.2.5) Needless to say, evaluating Equation directly is very difficult because of the number of terms involved and large [aduriab implicit iu both the numerator and denominator. In 4 we will learn a series of approximations that virtually trivialize the evaluation. section 3.2 Binomial and Hypergeometric Probabilities 145 CASE STUDY 3.2.1 Biting into a plump, juicy apple is one of innocent pJeasures autumn. Critical to that enjoyment is the firmness the apple, a property that growers and shippers monitor closely. 
The apple industry goes so far as to set a lowest acceptable limit for firmness, which is measured (in lbs) by inserting a probe into the apple. For the Red Delicious variety, for example, firmness is supposed to be at least 12 lbs; in the state of Washington, wholesalers are not allowed to sell apples if more than 10% of their shipment falls below that 12-lb limit.

All of this raises an obvious question: How can shippers demonstrate that their apples meet the 10% standard? Testing each one is not an option, because the probe that measures firmness renders an apple unfit for sale. That leaves sampling as the only viable strategy.

Suppose, for example, a shipper has a supply of 144 apples. She decides to select 15 at random and measure each one's firmness, with the intention of selling the remaining apples if 2 or fewer in the sample are substandard. What are the consequences of her plan? More specifically, does it have a good chance of "accepting" a shipment that meets the 10% rule and a good chance of "rejecting" one that does not? (If either or both of those objectives are not met, the plan is inappropriate.)

Suppose there are actually 10 defective apples among the original 144. Since (10/144) x 100 = 6.9%, that shipment would be suitable for sale, because fewer than 10% failed to meet the firmness standard. The question is, how likely is it that a sample of 15 chosen at random from that shipment will pass inspection? Notice that the number of substandard apples in the sample has a hypergeometric distribution with r = 10, w = 134, n = 15, and N = 144. Therefore,

P(sample passes inspection) = P(2 or fewer substandard apples are found)
  = 0.320 + 0.401 + 0.208
  = 0.929

So, the probability is reassuringly high that a supply of apples this good would, in fact, be judged acceptable to ship. Of course, it also follows from the calculation that roughly 7% of the time, the number of substandard apples found will be greater than 2, in which case the apples would be (incorrectly) deemed unsuitable for sale (earning them an undeserved one-way ticket to the applesauce factory...).

How good is the proposed sampling plan at recognizing shipments that would, in fact, be inappropriate to ship? Suppose, for example, that 30, or 21%, of the 144 apples
) How good is the proposed sampling plan at recognizing apples would, in fact, be inappropriate to ship? Suppose, for example, that 30, or 21%, of the 144 apples (Continued on nexi page) 146 Chapter 3 Random Variables (Lase ~l!Id}' 3.2.1 continued) would faU below the 12 limit. Ideally. lhe probability here that a passes inspection should be small. The number of substandard found in this case with r = w =1 11 = and N = so would be P(sample inspection) = 0,024 + 0.110 + = 0.355 Here the bad news is thar the sampling plan will allow a 21 % defective supply to be shipped 36% of the time. The news is that 64% of the the number of substandard apples in the sample will exceed 2, meaning that the correct decision "not to ship" be made. the of defectives in Figure 3.2.5 shows P(sample passes) ploued of this sort are called operating characteristic (or OC) curves: the supply. They summarize how a sampling plan will respond to an possible levels of quality. ~ 0.8 ! 0.6 0.4 ~ 0,2 OL-~---L--~--~--~~"~-,~~~~ o 10 20 30 40 50 60 70 PI'esumed percent del"eclive 80 90 100 FIGURE 1.l.5 Comment. Every sampling plan invariably alloWS for two kinds of errorsthat should be accepted and accepting shipments that should be the probabilities of committing these errors can be manipulated rule and/or the size. Some these options will be explored later in Chapter 6. "nl.... rn'Pflll" QUESTIONS 3.2.18. A corporate hoard contains 12 members. The board decides to create a five person Committee to Hide Corporation Debt. Suppose four members of the board are accountants. What is the probabili[y [hat the Committee \vill contain two accoumants and Ihree non-accountants? Section 3.2 Binomial and Hypergeometric Probabilities 147 3.2.19. One of the popular tourist attractions in Alaska is watching black bears catch salmon, swimming upstream to spawn. Not all "black" bears are black, though-some are tan-colored. 
Suppose that six black bears and three tan-colored bears are working the rapids of a salmon stream. Over the course of an hour, six different bears are sighted. What is the probability that those six will include at least twice as many black bears as tan-colored bears?

3.2.20. A city has 4050 children under the age of 10, including 514 who have not been vaccinated for measles. Sixty-five of the city's children are enrolled in the ABC Day Care Center. Suppose the municipal health department sends a doctor and a nurse to ABC to immunize any child who has not already been vaccinated. Find a formula for the probability that exactly k of the children at ABC have not been vaccinated.

3.2.21. Country A inadvertently launches 10 guided missiles (6 armed with nuclear warheads) at Country B. In response, Country B fires 7 antiballistic missiles, each of which will destroy exactly one of the incoming rockets. The antiballistic missiles have no way of detecting, though, which of the 10 rockets are carrying nuclear warheads. What are the chances that Country B will be hit by at least one nuclear missile?

3.2.22. Anne is studying for a history exam covering the French Revolution that will consist of five essay questions selected at random from a pool of 10 that the professor has handed out to the class in advance. Not exactly a Napoleon buff, Anne would like to avoid researching all 10 questions but still be reasonably assured of a fairly good grade. Specifically, she wants to have at least an 85% chance of getting at least four of the questions right. Will it be sufficient if she studies eight of the 10 questions?

3.2.23. Each year a college awards merit-based scholarships to members of the freshman class who have exceptional high school records. The initial pool of applicants for the upcoming academic year has been reduced to a "short list" of eight men and ten women, all of whom seem equally deserving. If the awards are made at random from among the 18 finalists, what are the chances that both men and women will be represented?

3.2.24.
A local lottery is conducted weekly by choosing five chips at random and without replacement from a population of 40 chips, numbered 1 through 40; order does not matter. The winning numbers are announced one at a time on five commercials during the Monday night broadcast of a televised movie. Suppose the first three winning numbers announced match three of yours. What are your chances at that point of winning the lottery?

3.2.25. A display case contains 35 gems, of which 10 are real diamonds and 25 are fake diamonds. A burglar removes four gems at random, one at a time and without replacement. What is the probability that the last gem she steals is the second real diamond in the set of four?

3.2.26. A bleary-eyed student awakens one morning, late for an 8:00 class, and pulls two socks out of a drawer that contains two black, six brown, and two blue socks, all randomly arranged. What is the probability that the two he draws are a matched pair?

3.2.27. Show directly that the set of probabilities associated with the hypergeometric distribution sum to 1. Hint: Expand both sides of the identity (1 + u)^(r+w) = (1 + u)^r (1 + u)^w and equate coefficients.

3.2.28. Urn I contains five red chips and four white chips; Urn II contains four red and five white chips. Two chips are drawn simultaneously from Urn I and placed in Urn II. Then a chip is drawn from Urn II. What is the probability that the chip drawn from Urn II is white? Hint: Use Theorem 2.4.1.

3.2.29. As the owner of a chain of sporting goods stores, you have just been offered a "deal" on a shipment of 100 robot table tennis machines. The price is right, but the prospect of picking up the merchandise at midnight from an unmarked van parked on the side of the New Jersey Turnpike is a bit disconcerting. Being of low repute yourself, you do not consider the legality of the transaction to be an issue, but you do have concerns about being cheated.
If too many of the machines are in poor working order, the offer ceases to be a bargain. Suppose you decide to close the deal only if a sample of 10 machines contains no more than one defective. Construct the corresponding operating characteristic curve. For approximately what incoming quality will you accept a shipment 50% of the time?

3.2.30. Suppose that r of N chips are red. Divide the chips into three groups of sizes n1, n2, and n3, where n1 + n2 + n3 = N. Generalize the hypergeometric distribution to find the probability that the first group contains r1 red chips, the second group r2 red chips, and the third group r3 red chips, where r1 + r2 + r3 = r.

3.2.31. Some nomadic tribes, when faced with a life-threatening contagious disease, will try to improve their chances of survival by dispersing into smaller groups. Suppose a tribe of 21 people, of whom four are carriers of the disease, split into three groups of 7 each. What is the probability that at least one group is free of the disease? Hint: Find the probability of the complement.

3.2.32. Suppose a population contains n1 objects of one kind, n2 objects of a second kind, ..., and nt objects of a t-th kind, where n1 + n2 + ... + nt = N. A sample of size n is drawn at random and without replacement. Deduce an expression for the probability of drawing k1 objects of the first kind, k2 objects of the second kind, ..., and kt objects of the t-th kind by generalizing Theorem 3.2.2.

3.2.33. Five freshmen, four sophomores, and four juniors, and three seniors have applied for membership in their school's Communications Board, a group that oversees the newspaper, literary magazine, and radio show. Eight positions are open. If the selection is done at random, what is the probability that each class gets two representatives? Hint: Use the generalized hypergeometric model asked for in Question 3.2.32.

3.3 DISCRETE RANDOM VARIABLES

The binomial and hypergeometric distributions described in Section 3.2 are special cases of some important general concepts that we want to explore more fully in this section. Previously, in Chapter 2, we studied in depth the experiment where every point in a sample space is equally likely to occur (recall Section 2.6).
The sample space of independent trials that ultimately led to the binomial distribution presented a quite different scenario: Individual points in S had different probabilities. For example, if n = 4 and p = 1/3, the probabilities assigned to the sample outcomes (s, f, s, f) and (f, f, f, f) are (1/3)²(2/3)² and (2/3)⁴, respectively. Allowing for the possibility that different outcomes may have different probabilities will obviously broaden enormously the range of real-world problems that probability models can address.

How to assign probabilities to outcomes that are not binomial or hypergeometric is one of the questions investigated in this chapter. A second issue is the nature of the sample space itself and whether it makes sense to redefine the outcomes and create, in effect, an alternative sample space. Why we would want to do that has already come up in our discussion of independent trials. The "original" sample space in such cases is a set of ordered sequences, where the ith member of a sequence is either an "s" or an "f," depending on whether the ith trial ended in success or failure, respectively. However, knowing which particular trials ended in success is typically less important than knowing the number that did (recall the discussion on p. 129). That being the case, it often makes sense to replace each ordered sequence with the number of successes that sequence contains. Doing so collapses the original set of ordered sequences (i.e., outcomes) in S to the set of n + 1 integers ranging from 0 to n. The probabilities assigned to those integers, of course, are given by the binomial formula in Theorem 3.2.1.

In general, a function that assigns numbers to outcomes is called a random variable. The purpose of such functions in practice is to define a new sample space whose outcomes speak more directly to the objectives of the experiment. That was the rationale that ultimately motivated both the binomial and hypergeometric distributions.
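The collapsing just described can be carried out by brute force for small n. A sketch in pure Python (variable names are ours): enumerate all 2⁴ ordered sequences for n = 4 and p = 1/3, group them by the number of successes, and recover the binomial formula of Theorem 3.2.1.

```python
from itertools import product
from math import comb

n, p = 4, 1 / 3   # as in the example above

# Each ordered sequence with k successes has probability p^k (1-p)^(n-k).
collapsed = {k: 0.0 for k in range(n + 1)}
for seq in product("sf", repeat=n):
    k = seq.count("s")
    collapsed[k] += p ** k * (1 - p) ** (n - k)

# The collapsed probabilities agree with C(n, k) p^k (1-p)^(n-k).
for k in range(n + 1):
    assert abs(collapsed[k] - comb(n, k) * p ** k * (1 - p) ** (n - k)) < 1e-12
print({k: round(v, 4) for k, v in collapsed.items()})
```

The C(n, k) in Theorem 3.2.1 is precisely the count of ordered sequences that the collapsing maps onto the integer k.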
The purpose of this section is to (1) outline the general conditions under which probabilities can be assigned to sample spaces and (2) explore the ways and means of redefining sample spaces through the use of random variables. The notation introduced in this section is especially important and will be used throughout the remainder of the book.

Assigning Probabilities: The Discrete Case

We begin with the general problem of assigning probabilities to sample outcomes, the simplest version of which occurs when the number of points in S is either finite or countably infinite. The probability functions, p(s), that we are looking for in those cases satisfy the conditions in Definition 3.3.1.

Definition 3.3.1. Suppose that S is a finite or countably infinite sample space. Let p be a real-valued function defined for each element of S such that

a. 0 ≤ p(s) for each s ∈ S
b. Σ_{s∈S} p(s) = 1

Then p is said to be a discrete probability function.

Comment. Once p(s) is defined for all s, it follows that the probability of any event A, that is, P(A), is the sum of the probabilities of the outcomes comprising A:

P(A) = Σ_{s∈A} p(s)    (3.3.1)

Defined in this way, the function P(A) satisfies the probability axioms given in Section 2.3. The next several examples illustrate some of the specific forms that p(s) can have and how P(A) is calculated.

EXAMPLE 3.3.1

Ace-six flats are a type of crooked dice where the cube is foreshortened in the one-six direction, with the effect that 1s and 6s are more likely to occur than any of the other four faces. Let p(s) denote the probability that the face showing is s. For many ace-six flats, the "cube" is asymmetric to the extent that p(1) = p(6) = 1/4, while p(2) = p(3) = p(4) = p(5) = 1/8. Notice that p(s) here qualifies as a discrete probability function because each p(s) is greater than or equal to 0 and the sum of p(s), over all s, is 1 (= 2(1/4) + 4(1/8)).

Suppose A is the event that an even number occurs. It follows from Equation 3.3.1 that P(A) = p(2) + p(4) + p(6) = 1/8 + 1/8 + 1/4 = 1/2.

Comment.
If two ace-six flats are rolled, the probability of getting a sum equal to seven is

2p(1)p(6) + 2p(2)p(5) + 2p(3)p(4) = 2(1/4)² + 4(1/8)² = 3/16

If two fair dice are rolled, the probability of getting a sum equal to seven is

2p(1)p(6) + 2p(2)p(5) + 2p(3)p(4) = 6(1/6)² = 1/6

which is less than 3/16. Gamblers cheat with ace-six flats by switching back and forth between fair dice and flats, depending on whether or not they want a sum of seven to be rolled.

EXAMPLE 3.3.2

Suppose a fair coin is tossed until a head comes up for the first time. What are the chances of that happening on an odd-numbered toss?

Note that the sample space here is countably infinite and so is the set of outcomes making up the event whose probability we are trying to find. The P(A) that we are looking for, then, will be the sum of an infinite number of terms.

Let p(s) be the probability that the first head appears on the sth toss. Since the coin is presumed to be fair, p(1) = 1/2. Furthermore, we would expect that half the time, when a tail appears first, the next toss would be a head, so p(2) = (1/2)(1/2) = 1/4. In general, p(s) = (1/2)^s, s = 1, 2, ....

Does p(s) satisfy the conditions stated in Definition 3.3.1? Clearly, p(s) ≥ 0 for all s. To see that the sum of the probabilities is 1, recall the formula for the sum of a geometric series: If 0 < r < 1,

Σ_{s=0}^∞ r^s = 1/(1 − r)    (3.3.2)

Applying Equation 3.3.2 to the sample space here confirms that P(S) = 1:

P(S) = Σ_{s=1}^∞ (1/2)^s = Σ_{s=0}^∞ (1/2)^s − 1 = 1/(1 − 1/2) − 1 = 1

Now, let A be the event that the first head appears on an odd-numbered toss. Then P(A) = p(1) + p(3) + p(5) + .... But

p(1) + p(3) + p(5) + ... = Σ_{s=0}^∞ p(2s + 1) = Σ_{s=0}^∞ (1/2)^{2s+1} = (1/2) Σ_{s=0}^∞ (1/4)^s = (1/2) · 1/(1 − 1/4) = 2/3

CASE STUDY 3.3.1

For good pedagogical reasons, the principles of probability are always introduced by considering events defined on familiar sample spaces generated by simple experiments. To that end, we toss coins, deal cards, roll dice, and draw chips from urns. It would be a mistake, though, to infer that the importance of probability extends no further than the nearest casino.
In its infancy, gambling and probability were, indeed, intimately related: Questions arising from games of chance were often the catalyst that motivated mathematicians to study random phenomena in earnest. But more than 340 years have passed since Huygens published De Ratiociniis. Today, the application of probability to gambling is relatively insignificant (the NCAA March Madness basketball tournament notwithstanding) compared to the depth and breadth of uses the subject finds in business, medicine, and science.

Probability functions, properly chosen, can "model" complex real-world phenomena every bit as well as P(heads) = 1/2 describes the behavior of a fair coin. The following set of actuarial data is a case in point. Over a period of three years (= 1096 days) in London, records showed that a total of 903 deaths occurred among males eighty-five years of age and older (188). Columns one and two of Table 3.3.1 give the breakdown of those 903 deaths according to the number occurring on a given day. Column three gives the proportion of days for which exactly s elderly men died.

TABLE 3.3.1

(1) Number of Deaths, s   (2) Number of Days   (3) Proportion [= Col. (2)/1096]   (4) p(s)
0                         484                  0.442                              0.440
1                         391                  0.357                              0.361
2                         164                  0.150                              0.148
3                         45                   0.041                              0.040
4                         11                   0.010                              0.008
5                         1                    0.001                              0.001
6+                        0                    0.000                              0.000
                          1096                 1                                  1

For reasons that will be gone into at length in Chapter 4, the behavior of this particular phenomenon is described by the probability function

p(s) = P(s elderly men die on a given day) = e^{−0.82}(0.82)^s / s!,  s = 0, 1, 2, ...    (3.3.3)

How do we know that the p(s) in Equation 3.3.3 is an appropriate way to assign probabilities to the "experiment" of elderly men dying? Because it accurately predicts what happened. Column four of Table 3.3.1 shows p(s) evaluated for s = 0, 1, 2, .... To two decimal places, the agreement between the entries in Column three and Column four is perfect.
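The fit claimed in the case study can be reproduced in a few lines. A sketch, assuming (as Equation 3.3.3 does) a rate of 903 deaths spread over 1096 days:

```python
from math import exp, factorial

days = [484, 391, 164, 45, 11, 1, 0]      # days with s = 0, 1, ..., 5, 6+ deaths
lam = 903 / 1096                          # average deaths per day, about 0.82

def p(s):
    """The model of Equation 3.3.3 evaluated at s."""
    return exp(-lam) * lam ** s / factorial(s)

for s, d in enumerate(days):
    print(s, round(d / 1096, 3), round(p(s), 3))   # observed vs. model
```

Every observed proportion lands within about half a percentage point of the model, which is the "perfect to two decimal places" agreement described above.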
Those numbers would be extremely One be the age of <:I t:elebrity who just died. another might report tbe interest rale currently paid on government Treasury bills, and still another might give the of square feet of retail space added to a local shopping mall. Suppose you then calculated the proportion of those nnmbers whose leading digit was a I) the proportion whose leading was a 2, and so on. Whal relationship would you proportions to have? Would numbers starting with a 2, for example, occur expect as often as numbers starling with a 67 p(s) denote the probability thalthe first significant digit of a "newspaper number" us that the nine first digits should is s, S = 1,2.... ,9. Our intuition is likely to be equally probable-I hal is. p(H p(2) ... = p(9) = Given the diversity and the randomness of the numbers. then: is no obviuu:. It:i:I~un une digit !>hould be more common than Om· intuition, would be wrong-first are 1101 equally likely. Indeed, they are nol even close to being equally likely! Credit making this remarkable discovery goes to Simon Newcomb, a mathematician ago some portions of logarithm tables are who ohs~rved mort': Ihan A hllndred used more than others (77). Specifically, pages at the beginning of such tables are more dog-eared than at the suggesting that users had more occasion to look up logs of starting wilh small than they did numbers slarling with digits. Almost fifty years a physicist, Frank reexamined Newcomb's claim in more detail and looked for a mathematical explanation. What is now known as Benford's loll' asserts that the first digits many types of measurements, or combinations of measurements, follow the discrete probability p(s) POst signifk:altl l:>~) = (I + ;), .\ ], 2... ,,9 !, Table 3.3.2 Benford's law to the uniform <:I~~umptjon that p(s) = for all.s. The differences are striking. According to law, for example, Is are the mosl frequently occurring first digil. 
6.5 times (≈ 0.301/0.046) as often as 9s.

TABLE 3.3.2

s    "Uniform" Law    Benford's Law
1    0.111            0.301
2    0.111            0.176
3    0.111            0.125
4    0.111            0.097
5    0.111            0.079
6    0.111            0.067
7    0.111            0.058
8    0.111            0.051
9    0.111            0.046

Comment. A key to why Benford's law is true lies in the differences in proportional changes associated with each leading digit. To go from one thousand to two thousand, for example, represents a 100% increase; to go from eight thousand to nine thousand, on the other hand, is only a 12.5% increase. That would suggest that evolutionary phenomena such as stock prices would be more likely to start with 1s and 2s than with 8s and 9s, and they are. Still, the precise conditions under which p(s) = log₁₀(1 + 1/s), s = 1, 2, ..., 9 applies are not fully understood and remain a topic of research.

EXAMPLE 3.3.4

Is

p(s) = (1/(1 + λ)) (λ/(1 + λ))^s,  s = 0, 1, 2, ...;  λ > 0

a discrete probability function? Why or why not?

To qualify as a discrete probability function, a given p(s) must conform to Parts (a) and (b) of Definition 3.3.1. A simple inspection shows that Part (a) is satisfied: Since λ > 0, p(s) is, in fact, greater than or equal to 0 for all s = 0, 1, 2, .... Part (b) is satisfied if the sum of all the probabilities defined on the outcomes in S is 1. But

Σ_{s∈S} p(s) = Σ_{s=0}^∞ (1/(1 + λ)) (λ/(1 + λ))^s = (1/(1 + λ)) · 1/(1 − λ/(1 + λ))  (why?)
            = (1/(1 + λ)) · (1 + λ) = 1

The answer, then, is yes:

p(s) = (1/(1 + λ)) (λ/(1 + λ))^s,  s = 0, 1, 2, ...;  λ > 0

does qualify as a discrete probability function. Of course, whether it has any practical value depends on whether the set of values for p(s) actually does describe the behavior of real-world phenomena.

Defining "New" Sample Spaces

We have seen how the function p(s) associates a probability with each outcome, s, in a sample space. Related is the key idea that outcomes can often be grouped or reconfigured in ways that may facilitate problem-solving. Recall the sample space associated with a series of n independent trials, where each outcome s is an ordered sequence of successes and failures.
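Both probability functions above, Benford's law and the p(s) of Example 3.3.4, can be checked numerically against condition (b) of Definition 3.3.1. A minimal sketch (the helper names are ours; the geometric-type series must be truncated, so its sum is only nearly exact):

```python
from math import log10

# Benford's law: p(s) = log10(1 + 1/s), s = 1, ..., 9.
benford = [log10(1 + 1 / s) for s in range(1, 10)]
assert abs(sum(benford) - 1) < 1e-12          # the sum telescopes to log10(10) = 1

# Example 3.3.4: p(s) = (1/(1+lam)) * (lam/(1+lam))**s, s = 0, 1, 2, ...
def p(s, lam):
    return (1 / (1 + lam)) * (lam / (1 + lam)) ** s

for lam in (0.5, 1.0, 4.0):
    total = sum(p(s, lam) for s in range(2000))    # truncated geometric series
    assert abs(total - 1) < 1e-9

print(round(benford[0], 3), round(benford[8], 3))  # p(1) = 0.301, p(9) = 0.046
```

The Benford sum is exact for a reason worth noticing: Σ log₁₀((s+1)/s) telescopes, leaving log₁₀(10/1) = 1.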
The most relevant information in such outcomes is often the number of successes that occur, not a detailed listing of which trials ended in success and which ended in failure. That being the case, it makes sense to define a "new" sample space by collapsing the original outcomes to the number of successes they contained. The outcome (f, f, ..., f), for example, had 0 successes. On the other hand, there were n outcomes that yielded 1 success: (s, f, ..., f), (f, s, f, ..., f), ..., and (f, f, ..., s). As we saw earlier in this chapter, that particular regrouping of outcomes ultimately led to the binomial distribution.

The function that replaces the outcome (s, f, f, ..., f) with the numerical value 1 is called a random variable. We conclude this section with a discussion of some of the concepts, terminology, and applications associated with random variables.

Definition 3.3.2. A function whose domain is a sample space S and whose values form a finite or countably infinite set of real numbers is called a discrete random variable. We denote random variables by uppercase letters, often X or Y.

EXAMPLE 3.3.5

Consider tossing two dice, an experiment for which the sample space is a set of ordered pairs, S = {(i, j) | i = 1, 2, ..., 6; j = 1, 2, ..., 6}. For a variety of games ranging from Monopoly to craps, the sum of the numbers showing is what matters on a given turn. That being the case, the original sample space S of thirty-six ordered pairs would not provide a particularly convenient backdrop for discussing the rules of those games. It would be better to work directly with the sums. Of course, the eleven possible sums (from two to twelve) are simply the values of the random variable X, where X(i, j) = i + j.

Comment. In the above example, suppose we define a random variable X₁ that gives the result on the first die and a random variable X₂ that gives the result on the second die. Then X = X₁ + X₂. Note how we could extend this idea to the toss of three or ten dice.
The ability to conveniently express complex events in terms of component ones is an advantage of the random variable concept that we will see playing out over and over again.

The Probability Density Function

We began this section by discussing the function p(s), which assigns a probability to each outcome s in S. Now, having introduced the notion of a random variable X as a real-valued function defined on S, that is, X(s) = k, we need to find a mapping analogous to p(s) that assigns probabilities to the different values of k.

Definition 3.3.3. Associated with every discrete random variable X is a probability density function (or pdf), p_X(k), where

p_X(k) = P({s ∈ S | X(s) = k})

Note that p_X(k) = 0 for any k not in the range of X. For notational simplicity, we will usually delete all references to s and S and write p_X(k) = P(X = k).

Comment. We have already discussed at length two examples of the function p_X(k), beginning with the binomial distribution in Section 3.2. If we let the random variable X denote the number of successes in n independent trials, then Theorem 3.2.1 states that

P(X = k) = p_X(k) = C(n, k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n

A similar result was given in that same section in connection with the hypergeometric distribution: If a sample of size n is drawn without replacement from an urn containing r red and w white chips, and if we let the random variable X denote the number of red chips included in the sample, then (according to Theorem 3.2.2)

p_X(k) = C(r, k) C(w, n − k) / C(r + w, n)

EXAMPLE 3.3.6

Consider again the rolling of two dice, as described in Example 3.3.5. Let i and j denote the faces showing on the first and second die, respectively, and define the random variable X to be the sum of the two faces: X(i, j) = i + j. Find p_X(k).

According to Definition 3.3.3, each value of p_X(k) is the sum of the probabilities of the outcomes that are mapped by X onto the value k. For example,

P(X = 5) = p_X(5) = P({s ∈ S | X(s) = 5})
= P((1,4) ∪ (4,1) ∪ (2,3) ∪ (3,2))
= P(1,4) + P(4,1) + P(2,3) + P(3,2) = 4/36
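The full pdf of the dice-sum random variable can also be generated by brute-force enumeration of the 36 equally likely ordered pairs; a quick sketch (exact fractions, our names):

```python
from collections import Counter
from fractions import Fraction

# Tabulate X(i, j) = i + j over the 36 equally likely ordered pairs.
counts = Counter(i + j for i in range(1, 7) for j in range(1, 7))
p_X = {k: Fraction(c, 36) for k, c in sorted(counts.items())}

print(p_X[5])               # 1/9, i.e., the 4/36 computed above
print(sum(p_X.values()))    # the pdf sums to 1
```

This enumeration is exactly the "collapsing" of Definition 3.3.3: each value k inherits the total probability of the outcomes mapped onto it.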
example, P(X = 5) = px(5) = P({s E S I X(s) = 5)) = P«l, 4) U (4,1) U (2,3) U (3,2» = P(1,4) + 1 36 4 =- + P(4,1) + P(2, 3) + P(3,2) 156 Chapter 3 Random Variables TABLE 3.3.3 k px(k) k Px(k) 2 3 4 5 6 7 1/36 8 5/36 2/36 9 4/36 3/36 10 3/36 2/36 1/36 4/36 11 12 6/36 the dice arefair. Values of px(k) for other k are calculated similarly. Table shows the random variable's entire pdf. EXAMPLE 3.3.7 Acme Industries typically produces three electric power generators a day; some pass the company's quality control inspection on their first try and are ready to be shipped; others to retooled. The probability of a needing further work is 0.05. If a is ready to ship, the finn eams a profit of $10,000. If it needs to retooled. it ultimately costs firm $2000. Let X be the random variable quantifying the company's daily rind p x (k). The underlying space is a set of 11 = 3 independent trials, where p = P(generator passes inspection) = 0.95. If the random variable X is to measure the company's daily profit, then X .$10,000 x Of'!l~r::Hnrs (no. pas::.-iug in:-;pc;t:Liuu) - $2,000 X I. s) = 2($10,000) 1($2,000) = $18,000. Moreover, the random variable X equals $18.000 whenever the output consists two successes and one failure. is, X(s. I. s) X(s, s, J) = XU, s, s). It foHows tbat = p (X = $18,000) = p x (18,000) = G) (0.95)2(0.05)1 TABLE 33.4 No. Defectives o 1 2 3 k= $30,000 $18,000 0.857375 $6,000 -$6,000 0.007125 0.000125 = 0.135375 Section 3.3 Table 3.3.4 shows px(k) for -$6,000). Discrete Ri!ndom Variables 151 k ($30,000, $18,000, $6,000, and four possible values EXAMPLE 3.3.8 As part of her warm-up drill, player on State's basketball team is to shoot a 65% success rate at the foul line, free throws until two baskets are made. If Rhonda what is the pdf of the random variable X that describes the number of throws it takes her to complete the drill? Assume that individual throws constitute independent events. 
Figure 3.3.1 illustrates what must occur if the drill is to end on the kth throw, k = 2, 3, 4, ...: First, Rhonda needs to make exactly one basket sometime during the first k − 1 attempts, and, second, she needs to make a basket on the kth throw. Written formally,

p_X(k) = P(X = k) = P(drill ends on kth throw)
= P(1 basket in first k − 1 throws ∩ basket on kth throw)
= P(1 basket and k − 2 misses) · P(basket)

[Figure 3.3.1: the k − 1 possible sequences of the first k − 1 throws, each containing exactly one basket (B) among the misses (M), followed by a basket on throw k.]

Notice that k − 1 sequences of throws have the property that exactly one of the first k − 1 throws results in a basket. Since each such sequence has probability (0.35)^{k−2}(0.65),

P(1 basket and k − 2 misses) = (k − 1)(0.35)^{k−2}(0.65)

Therefore,

p_X(k) = (k − 1)(0.35)^{k−2}(0.65) · (0.65) = (k − 1)(0.35)^{k−2}(0.65)²,  k = 2, 3, 4, ...    (3.3.4)

Table 3.3.5 shows the pdf evaluated for the first several values of k. Although the range of k is infinite, the bulk of the probability associated with X is concentrated in the values two through seven; it is highly unlikely, for example, that Rhonda would need more than seven shots to complete the drill.

TABLE 3.3.5

k      p_X(k)
2      0.4225
3      0.2958
4      0.1553
5      0.0725
6      0.0317
7      0.0133
8+     0.0089

Transformations

Changing a variable from one scale to another is a problem that is comfortably familiar. If a thermometer says the temperature outside is 83°F, we know that the temperature in Celsius is

C = (5/9)(83 − 32) ≈ 28

An analogous question arises in connection with random variables. Suppose that X is a discrete random variable with pdf p_X(k). If a second random variable, Y, is defined to be aX + b, where a and b are constants, what can be said about the pdf for Y?

Theorem 3.3.1. Suppose X is a discrete random variable. Let Y = aX + b, where a and b are constants. Then

p_Y(y) = p_X((y − b)/a)

Proof.

p_Y(y) = P(Y = y) = P(aX + b = y) = P(X = (y − b)/a) = p_X((y − b)/a)

EXAMPLE 3.3.9

Let X be a random variable for which p_X(k) = 1/10, k = 1, 2, ..., 10. What is the probability distribution associated with the random variable Y, where Y = 4X − 1? That is, find p_Y(y).
From Theorem 3.3.1,

P(Y = y) = P(4X − 1 = y) = P(X = (y + 1)/4) = p_X((y + 1)/4)

which implies that p_Y(y) = 1/10 for the ten values of y satisfying (y + 1)/4 = 1, 2, ..., 10. But (y + 1)/4 = 1 when y = 3, (y + 1)/4 = 2 when y = 7, ..., and (y + 1)/4 = 10 when y = 39. Therefore,

p_Y(y) = 1/10,  y = 3, 7, ..., 39

The Cumulative Distribution Function

In working with random variables, we frequently need to calculate the probability that the value of a random variable is somewhere between two numbers. For example, suppose we have an integer-valued random variable. We might want to calculate an expression like P(s ≤ X ≤ t). If we know the pdf for X, then

P(s ≤ X ≤ t) = Σ_{k=s}^{t} p_X(k)

but depending on the nature of p_X(k) and the number of terms that have to be added, calculating that sum may be quite difficult. An alternative strategy is to use the fact that

P(s ≤ X ≤ t) = P(X ≤ t) − P(X ≤ s − 1)

where the two probabilities on the right represent cumulative probabilities of the random variable X. If the latter were available (and they often are), then evaluating P(s ≤ X ≤ t) by one simple subtraction would clearly be easier than doing all the calculations implicit in Σ_{k=s}^{t} p_X(k).

Definition 3.3.4. Let X be a discrete random variable. For any real number t, the probability that X takes on a value ≤ t is the cumulative distribution function (cdf) of X (written F_X(t)). In formal notation, F_X(t) = P({s ∈ S | X(s) ≤ t}). As was the case with pdfs, references to s and S are typically deleted, and the cdf is written F_X(t) = P(X ≤ t).

EXAMPLE 3.3.10

Suppose we wish to compute P(21 ≤ X ≤ 40) for a binomial random variable X with n = 50 and p = 0.6. From Theorem 3.2.1, we know the formula for p_X(k), so P(21 ≤ X ≤ 40) can be written as a simple, although computationally cumbersome, sum:

P(21 ≤ X ≤ 40) = Σ_{k=21}^{40} C(50, k)(0.6)^k(0.4)^{50−k}

Equivalently, the probability we are looking for can be expressed as the difference between two cdfs:

P(21 ≤ X ≤ 40) = P(X ≤ 40) − P(X ≤ 20) = F_X(40) − F_X(20)

As it turns out, values of the cdf for a binomial random variable are widely available, both in books of tables and in computer software. Here, for example, F_X(40) = 0.9992 and F_X(20) = 0.0034, so

P(21 ≤ X ≤ 40) = 0.9992 − 0.0034 = 0.9958
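When tables are not handy, the two cumulative probabilities in Example 3.3.10 take only a few lines to compute directly from Theorem 3.2.1; a sketch (our function names):

```python
from math import comb

def binom_cdf(t, n=50, p=0.6):
    """F_X(t) = P(X <= t) for a binomial random variable with parameters n and p."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(t + 1))

answer = binom_cdf(40) - binom_cdf(20)
# Should reproduce the tabled values F_X(40) = 0.9992 and F_X(20) = 0.0034.
print(round(binom_cdf(40), 4), round(binom_cdf(20), 4), round(answer, 4))
```

Note the one-subtraction structure: the loop is hidden inside the cdf, which is exactly why published cdf tables (and library routines) are so convenient.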
the probability we are looking for can be expressed as the difference between two :5 X :5 40) = P(X :5 40) - P(X !: 20) = Fx(40) - Fx(2fJ) As it turns out, of for a binomial random variable are widely available, both in books and in computer software. Here, for example, Fx(40) = 0.9992 and Fx(20) = 0.0034, so P{21 ::: X ::: 40) = 0.9992 - 0.0034 = 0.9958 160 3 Random Variables EXAMPLE 3.3.11 X denote the Find Fx(2.5). Suppose that two are roUed. Let the random two faces showing: (a) Find Fx (I) for t = 1,2, ...• 6 a. The sample with the = ,o.V'~"1""11I1~''''' of rolling two fair dice is the sel where the face showing on the second die is j. of ordered pairs, s on the first die is i and the face (t, outcomes are likely. Now, suppose! is some (I) of the Then P(X .:s t) P(Max (i, j) = P(i :s I} ~ I = PO :s and j.:5 t} t} . P(j .:s t} (why?) I 6 6 t2 - 36' 1,2,3,4,5,6 random variable X has non-zero probability only b. Even though is defined for any real number from -00 to +00. 1 through 6, the But Fx(2.5) = :s :!. 2.5) - P(X ~ 2) = Fx(2) + + P(2 <: X the definition, :s 2.5) 0 so = Fx (2) = 36 = What would the graph 1 9 as a function of f look like? QUESTIONS 3.3.1. An urn contains five balls numbered 1 to 5. Two balls are drawn simultaneously. (8) let X be the larger of the two numbers drawn. Find px(k). (b) let V be the sum of the two numbers drawn. Find pv(k). 3.3.2. Repeat Question 3.3.1. for the case where the two balls are drawn with replacement. 3.3.3. Suppose a fair die is tossed three times. Let X be the largest of the three faces that appear. Find Px{k). 3.3.4. Suppose a fair die is tossed three times. Let X be the number of different faces that appear (so X 1.2. or 3). Find pAk). 3.3.5. A fair com is tossed three limes. Let X be the number of heads in the tosses minus the number of tails. Find 1.3,4,5.6,8. If both dice 3.3.6. Suppose die one has 1.2.2. 3. 3, 4 and die two has Show that the pdf for are rolled. what is the sample space? 
Let X = total X is the same as for normal dice. Section 3.4 Continuous Random Variables '61 3.3.7. Suppose a particle moves along the x-axis hP'''T'rn1 at O. It moves one to of its p06ition after 4 the left or right with equal probability. What 3.3.8. How would the for in Question 3.3.7. be if the particle was twice as likely to move to the right as to tile left? 3.3.9. Suppose that five people, including you and a Hne up at random. Let the random variable X denote the number of people standing between you and your friend. What is PA (k)? 3.3.10. Urn I and II each have two red chips and two while chips. Two chips are simultaneously from urn. Let Xl be (he number of red chips in the first "'.... ",..,..'" and the number of red chips in the second Find the pdf of X I + 3.3.11. Suppose X is a random variable with n = 4 and p = What is the of 1. 2X + I? 3.3.12. Find the cdf for the random variable X in 3.3.13. A fair die is rolled four times. Let the X denote the number of that appear. and graph the cdf for X. 3.3.14. At the x 0, 1. .... 6, the cdf for the random variable X has the value Fx(x) = x(x + 1)/42. Find 1he pdf for X. 3.3.15. Find the pdf for the discrete random variable X whose cdf at the points x = O. I .... , 6 is by Fx(x) = .x 3/216. CONTINUOUS RANDOM VARIABLES The statement was made in Chapter 2 that all spaces belong to one of two types--discrete spaces are ones that a finite or a countably number of outcomes and continuous sample are those that contain an infinite number of outcomes. Rolling a pair and recording the that appear is an experimen1 with a discrete sample space; a number at random from the interval fo, 1] would a continuous sample space. How we probabilities to these l wo types of sample spaces is 33 focussed on discrete sample spaces. Each outcome s is assigned a probability by the discrete probability function p(s). 
If a random variable X is defined on the sample space, the probabilities associated with its outcomes are assigned by the probability density function p_X(k). Applying those same definitions, though, to the outcomes in a continuous sample space will not work. The fact that a continuous sample space has an uncountably infinite number of outcomes eliminates the option of assigning a probability to each point, as we did in the discrete case with the function p(s). We begin this section with a particular pdf defined on a discrete sample space that suggests how we might define probabilities on a continuous sample space.

Suppose an electronic surveillance monitor is turned on briefly at the beginning of every hour and has a 0.905 probability of working properly, regardless of how long it has remained in service. If we let the random variable X denote the hour at which the monitor first fails, then p_X(k) is the product of k probabilities:

p_X(k) = P(X = k) = P(monitor fails for the first time at the kth hour)
= P(monitor functions properly for the first k − 1 hours ∩ monitor fails at the kth hour)
= (0.905)^{k−1}(0.095),  k = 1, 2, 3, ...
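A quick numerical look ahead at the approximation developed next: the geometric pdf just derived, summed over the first four hours, comes out very close to the area under the exponential curve y = 0.1e^{−0.1x} on [0, 4]. A sketch (our names):

```python
from math import exp

def p_X(k):
    """P(monitor first fails at hour k): geometric with failure probability 0.095."""
    return 0.905 ** (k - 1) * 0.095

p_sum = sum(p_X(k) for k in range(1, 5))   # P(1 <= X <= 4) = 1 - 0.905**4
area = 1 - exp(-0.4)                       # integral of 0.1e^(-0.1x) over [0, 4]
print(round(p_sum, 4), round(area, 4))     # 0.3292 vs. 0.3297
```

The two numbers differ by less than a twentieth of a percent, which is the whole point of the histogram-versus-curve comparison that follows.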
For example, the probability that the monitor fails during the first four hours would be the sum

P(0 ≤ X ≤ 4) = Σ (k=1 to 4) pX(k) = Σ (k=1 to 4) (0.905)^(k-1)(0.095) = 0.3292

The corresponding area under the exponential curve is nearly identical:

∫ from 0 to 4 of 0.1e^(-0.1x) dx = 0.3297

The similarity here between pX(k) and the exponential curve y = 0.1e^(-0.1x) is our alternative to p(s) for continuous sample spaces. Instead of defining probabilities for individual points, we will define probabilities for intervals of points, and those probabilities will be areas under the graph of some function (such as y = 0.1e^(-0.1x)), where the shape of the function will reflect the desired probability "measure" to be associated with the sample space.

FIGURE 3.4.2: The curve y = 0.1e^(-0.1x) superimposed on the probability histogram of pX(k).

Definition 3.4.1. A probability function P on a set of real numbers S is called continuous if there exists a function f(t) such that for any closed interval [a, b] ⊂ S, P([a, b]) = ∫ from a to b of f(t) dt.

Comment. If a probability function P satisfies Definition 3.4.1, then P(A) = ∫ over A of f(t) dt for any set A where the integral is defined. Conversely, suppose a function f(t) has the two properties

1. f(t) ≥ 0 for all t
2. ∫ from -∞ to ∞ of f(t) dt = 1.

If P(A) = ∫ over A of f(t) dt for all A, then P will satisfy the probability axioms given in Section 2.3.

Choosing the Function f(t)

We have seen that the probability structure of any sample space with a finite or countably infinite number of outcomes is defined by the function p(s) = P(outcome is s). For sample spaces having an uncountably infinite number of possible outcomes, the function f(t) serves an analogous purpose. Specifically, f(t) defines the probability structure of S in the sense that the probability of any interval in the sample space is the integral of f(t). The next set of examples illustrates several different choices for f(t).

EXAMPLE 3.4.1
The continuous equivalent of the equiprobable probability model on a discrete sample space is the function f(t) defined by f(t) = 1/(b - a) for all t in the interval [a, b] (and f(t) = 0, otherwise). This particular f(t) places equal probability weighting on every closed interval of the same length contained in the interval [a, b]. For example, suppose a = 0 and b = 10, and let A = [1, 3] and B = [6, 8]. Then

P(A) = ∫ from 1 to 3 of (1/10) dt = 2/10

and

P(B) = ∫ from 6 to 8 of (1/10) dt = 2/10

FIGURE 3.4.3: The uniform density f(t) = 1/10 on [0, 10], with the equal-length intervals A and B shaded.

EXAMPLE 3.4.2

Could f(t) = 3t^2, 0 ≤ t ≤ 1, be used to define the probability function for a sample space whose outcomes consist of all the real numbers in the interval [0, 1]? Yes, because (1) f(t) ≥ 0 for all t, and (2) ∫ from 0 to 1 of 3t^2 dt = t^3 evaluated from 0 to 1 = 1. Notice that the shape of f(t) implies that outcomes close to 1 are more likely to occur than are outcomes close to 0. For example, P([0, 2/3]) = ∫ from 0 to 2/3 of 3t^2 dt = (2/3)^3 = 8/27, while P([2/3, 1]) = ∫ from 2/3 to 1 of 3t^2 dt = 1 - 8/27 = 19/27.

FIGURE 3.4.4: The density f(t) = 3t^2 on [0, 1].

EXAMPLE 3.4.3

By far the most important of all continuous probability functions is the familiar bell-shaped curve, known more formally as the normal (or Gaussian) distribution. The sample space for the normal distribution is the entire real line; its probability function is given by

f(t) = (1/(√(2π) σ)) exp[-(1/2)((t - μ)/σ)^2],  -∞ < t < ∞, -∞ < μ < ∞, σ > 0

FIGURE 3.4.5: Normal densities for several choices of μ and σ.

Depending on the values assigned to the parameters μ and σ, f(t) can take on a variety of shapes and locations; three are illustrated in Figure 3.4.5.

Fitting f(t) to Data: The Density-Scaled Histogram

The notion of using a continuous probability function to approximate an integer-valued discrete probability model has already been discussed (recall Figure 3.4.2). The "trick" there was to replace the spikes that define pX(k) with rectangles whose heights are pX(k) and whose widths are one. Doing that makes the sum of the areas of the rectangles corresponding to pX(k) equal to one, which is the same as the total area under the approximating continuous probability function. Because of the
equality of those two areas, it makes sense to superimpose (and compare) the graph of pX(k) and the continuous probability function on the same set of axes.

Now, consider the related, but slightly more general, problem of using a continuous function to model the distribution of a set of n measurements, y1, y2, ..., yn. Following the approach taken in Figure 3.4.2, we would start by making a histogram of the n observations. The problem is, the sum of the areas of the bars in that histogram would not necessarily equal one.

As a case in point, Table 3.4.1 shows a set of forty observations. Grouping the yi's into five classes, each of width ten, produces the distribution and histogram pictured in Figure 3.4.6. Furthermore, suppose we have reason to believe that these yi's may be a random sample from a uniform probability function defined over the interval [20, 70], that is,

f(t) = 1/(70 - 20) = 1/50,  20 ≤ t ≤ 70

TABLE 3.4.1
33.8  41.6  24.9  62.6  54.5  22.3  68.7  42.3
40.5  69.7  27.6  62.9  30.3  41.2  57.6  32.9
22.4  64.5  54.8  58.9  25.0  33.4  48.9  60.8
59.2  39.0  68.4  49.1  67.5  53.1  38.4  42.6
64.1  21.6  69.0  46.0  46.6  ...

Class            Frequency
20 ≤ y < 30          7
30 ≤ y < 40          6
40 ≤ y < 50          9
50 ≤ y < 60          8
60 ≤ y < 70         10

FIGURE 3.4.6: Frequency histogram of the forty observations in Table 3.4.1.

How, then, can we appropriately draw the histogram of the yi's and the uniform probability model on the same graph? Note that f(t) and the histogram are not compatible in the sense that the area under f(t) is (necessarily) one (= 50 × (1/50)), but the sum of the areas of the bars making up the histogram is four hundred:

histogram area = 10(7) + 10(6) + 10(9) + 10(8) + 10(10) = 400

Nevertheless, we can "force" the total area of the five bars to match the area under f(t) by redefining the scale of the vertical axis on the histogram. Specifically, frequency needs to be replaced with the analog of probability density, which would be the scale used on the vertical axis of any graph of f(t).
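The rescaling just described can be sketched in a few lines; the class frequencies (7, 6, 9, 8, 10) are the ones shown in Figure 3.4.6:

```python
# Density rescaling of a frequency histogram: dividing each class frequency
# by (total no. of observations) x (class width) makes the bars' total area one.
freqs = {(20, 30): 7, (30, 40): 6, (40, 50): 9, (50, 60): 8, (60, 70): 10}
n = sum(freqs.values())  # 40 observations

densities = {cls: f / (n * (cls[1] - cls[0])) for cls, f in freqs.items()}
total_area = sum(d * (hi - lo) for (lo, hi), d in densities.items())

print(densities[(20, 30)])        # 0.0175
print(round(total_area, 10))      # 1.0
```

The printed density 0.0175 for the first class matches the entry 7/[40(10)] tabulated below.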
Intuitively, the density associated with, say, the class [20, 30) would be defined as the quotient

7 / (40 × 10)

because a bar of that constant height over the interval [20, 30) would have area 7/40, and the latter does represent the estimated probability that an observation lies in [20, 30). Figure 3.4.7 shows a histogram of the data in Table 3.4.1 where the height of each bar has been converted to a density, according to the formula

density (of a class) = class frequency / (total no. of observations × class width)

Superimposed is the uniform probability model f(t) = 1/50, 20 ≤ t ≤ 70. Scaled in this fashion, the areas under both f(t) and the histogram are one.

Class            Density
20 ≤ y < 30     7/[40(10)]  = 0.0175
30 ≤ y < 40     6/[40(10)]  = 0.0150
40 ≤ y < 50     9/[40(10)]  = 0.0225
50 ≤ y < 60     8/[40(10)]  = 0.0200
60 ≤ y < 70    10/[40(10)]  = 0.0250

FIGURE 3.4.7: Density-scaled histogram of the data in Table 3.4.1, with the uniform density f(t) = 1/50 superimposed.

In practice, density-scaled histograms offer a simple, but effective, format for examining the "fit" between a set of data and a presumed continuous model. We will use it often in the chapters ahead. Applied statisticians have embraced this particular graphical technique. Indeed, computer packages that include histograms on their menus routinely give users the choice of putting either frequency or density on the vertical axis.

CASE STUDY 3.4.1

Years ago, the V805 transmitter tube was standard equipment on many aircraft radar systems. Table 3.4.2 summarizes part of a reliability study done on the V805; listed are the lifetimes (in hrs) recorded for 903 tubes (37). Grouped into classes of width eighty, the densities for the nine classes are given in the last column.

TABLE 3.4.2
Lifetime (hrs)    Number    Density
0-80                317      0.0044
80-160              230      0.0032
160-240             118      0.0016
240-320              93      0.0013
320-400              49      0.0007
400-480              33      0.0005
480-560              17      0.0002
560-700              26      0.0002
700+                 20      0.0002
                    903

Experience has shown that lifetimes of electrical equipment can often be nicely modeled by the exponential probability function,

f(t) = λe^(-λt),  t > 0

where the value of λ (for reasons explained in Chapter 5) is set equal to the reciprocal of the average lifetime of the tubes in the sample. Can the distribution of these data also be described by the exponential model?
One way to answer such a question is to superimpose the proposed model on a graph of the density-scaled histogram. The extent to which the two graphs are similar then becomes an obvious measure of the appropriateness of the model.

(Case Study 3.4.1 continued)

For these data, λ would be set equal to 0.0056. Figure 3.4.8 shows the function f(t) = 0.0056e^(-0.0056t) plotted on the same axes as the density-scaled histogram. Clearly, the agreement is excellent, and we would have no reservations about using the exponential model to estimate lifetime probabilities. How likely is it, for example, that a V805 tube will last longer than five hundred hours? Based on the area under f(t), that probability would be 0.0608:

P(V805 lifetime exceeds 500 hrs) = ∫ from 500 to ∞ of 0.0056e^(-0.0056y) dy
                                 = -e^(-0.0056y) evaluated from 500 to ∞
                                 = e^(-0.0056(500)) = e^(-2.8) = 0.0608

FIGURE 3.4.8: The exponential model f(t) = 0.0056e^(-0.0056t) superimposed on the density-scaled histogram of the V805 lifetimes (hrs).

Continuous Probability Density Functions

We saw in Section 3.3 how the introduction of discrete random variables facilitated the solution of certain problems. The same sort of function can also be defined on sample spaces with an uncountably infinite number of outcomes. In practice, continuous random variables are often simply an identity mapping, so they do not radically redefine the sample space in the way that, say, a binomial random variable does. Nevertheless, it helps to have the same notation for both kinds of sample spaces.

Definition 3.4.2. A function Y that maps a subset of the real numbers into the real numbers is called a continuous random variable. The pdf of Y is the function fY(y) having the property that for any numbers a and b,

P(a ≤ Y ≤ b) = ∫ from a to b of fY(y) dy

EXAMPLE 3.4.4

We saw in Case Study 3.4.1 that lifetimes of V805 radar tubes can be nicely modeled by the exponential probability function

f(t) = 0.0056e^(-0.0056t),  t > 0

To couch that statement in random variable notation would simply require that we define Y to be the life of a V805 radar tube. Then Y would be the identity mapping, and the pdf of the random variable Y would be the same as the probability function, f(t).
That is, we would write

fY(y) = 0.0056e^(-0.0056y),  y ≥ 0

Similarly, when we work with the bell-shaped normal distribution in later chapters, we will write the model in random variable notation as

fY(y) = (1/(√(2π) σ)) e^(-(1/2)((y - μ)/σ)^2),  -∞ < y < ∞

EXAMPLE 3.4.5

Suppose we would like a continuous random variable Y to "select" a number between 0 and 1 in such a way that intervals near the middle of the range would be more likely to be represented than intervals near either 0 or 1. One pdf having that property is the function fY(y) = 6y(1 - y), 0 ≤ y ≤ 1 (see Figure 3.4.9). Do we know for certain that the function pictured in Figure 3.4.9 is a "legitimate" pdf? Yes, because fY(y) ≥ 0 for all y, and

∫ from 0 to 1 of 6y(1 - y) dy = 6[y^2/2 - y^3/3] evaluated from 0 to 1 = 1.

FIGURE 3.4.9: The pdf fY(y) = 6y(1 - y) on [0, 1].

Comment. To simplify the way pdfs are written, it will be assumed that fY(y) = 0 for all y outside the range actually specified in the function's definition. In Example 3.4.5, for instance, the statement

fY(y) = 6y(1 - y),  0 ≤ y ≤ 1

is to be interpreted as an abbreviation for

fY(y) = 0,          y < 0
fY(y) = 6y(1 - y),  0 ≤ y ≤ 1
fY(y) = 0,          y > 1

Cumulative Distribution Functions

Associated with every random variable, discrete or continuous, is a cumulative distribution function. For discrete random variables (recall Definition 3.3.4), the cdf is a nondecreasing step function, where the jumps occur at the values of t for which the pdf has positive probability. For continuous random variables, the cdf is a monotonically nondecreasing, continuous function. In both cases, the cdf can be helpful in calculating the probability that a random variable takes on a value in a given interval. As we will see in later chapters, there are also several important relationships that hold for continuous cdfs and pdfs. One such relationship is cited in Theorem 3.4.1.

Definition 3.4.3. The cdf for a continuous random variable Y is an indefinite integral of its pdf:

FY(y) = ∫ from -∞ to y of fY(r) dr = P({s ∈ S | Y(s) ≤ y}) = P(Y ≤ y)

Theorem 3.4.1. Let FY(y) be the cdf of a continuous random variable Y. Then

(d/dy) FY(y) = fY(y)

Proof. The statement of Theorem 3.4.1 follows immediately from the Fundamental Theorem of Calculus.

Theorem 3.4.2. Let Y be a continuous random variable with cdf FY(y). Then
a. P(Y > s) = 1 - FY(s)
b. P(r < Y ≤ s) = FY(s) - FY(r)
c. lim (y → ∞) FY(y) = 1
d. lim (y → -∞) FY(y) = 0

Proof.
a. P(Y > s) = 1 - P(Y ≤ s), since (Y > s) and (Y ≤ s) are complementary events. But P(Y ≤ s) = FY(s), and the conclusion follows.
b. Since the set (r < Y ≤ s) = (Y ≤ s) - (Y ≤ r), P(r < Y ≤ s) = P(Y ≤ s) - P(Y ≤ r) = FY(s) - FY(r).
c. Let {yn} be a set of values of Y, n = 1, 2, 3, ..., where yn < yn+1 for all n, and lim (n → ∞) yn = ∞. If lim (n → ∞) FY(yn) = 1 for every such sequence {yn}, then lim (y → ∞) FY(y) = 1. To that end, set A1 = (Y ≤ y1) and An = (yn-1 < Y ≤ yn), n = 2, 3, .... Then FY(yn) = P(∪ from k=1 to n of Ak) = Σ (k=1 to n) P(Ak), since the Ak are disjoint. Also, the sample space S = ∪ from k=1 to ∞ of Ak, and by Axiom 4, 1 = P(S) = Σ (k=1 to ∞) P(Ak). Putting these equalities together gives

lim (n → ∞) FY(yn) = lim (n → ∞) Σ (k=1 to n) P(Ak) = Σ (k=1 to ∞) P(Ak) = 1

d. lim (y → -∞) FY(y) = lim (y → ∞) P(Y ≤ -y) = lim (y → ∞) P(-Y ≥ y) = lim (y → ∞) [1 - P(-Y < y)] = 1 - 1 = 0, where the last step follows by applying part (c) to the random variable -Y.  □

Transformations

If X and Y are two discrete random variables and a and b are constants such that Y = aX + b, the pdf for Y can be expressed in terms of the pdf for X. Theorem 3.3.1 provided the details. Here we establish the analogous result for a linear transformation involving two continuous random variables.

Theorem 3.4.3. Suppose X is a continuous random variable. Let Y = aX + b, where a and b are constants and a ≠ 0. Then

fY(y) = (1/|a|) fX((y - b)/a)

Proof. We begin by writing an expression for the cdf of Y:

FY(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(aX ≤ y - b)

At this point we need to consider two cases, the distinction being the sign of a. Suppose, first, that a > 0.
Then

FY(y) = P(aX ≤ y - b) = P(X ≤ (y - b)/a) = FX((y - b)/a)

and differentiating FY(y) yields fY(y):

fY(y) = (d/dy) FY(y) = (1/a) fX((y - b)/a) = (1/|a|) fX((y - b)/a)

If a < 0,

FY(y) = P(aX ≤ y - b) = P(X ≥ (y - b)/a) = 1 - FX((y - b)/a)

Differentiation in this case gives

fY(y) = (d/dy) FY(y) = -(1/a) fX((y - b)/a) = (1/|a|) fX((y - b)/a)

and the theorem is proved.  □

QUESTIONS

3.4.1. Suppose fY(y) = …, 0 ≤ y ≤ 1. Find P(0 ≤ Y ≤ 1/2).

3.4.2. For the random variable Y with pdf fY(y) = …, 0 ≤ y ≤ 1, find P(… ≤ Y ≤ 1).

3.4.3. Let fY(y) = …, -1 ≤ y ≤ 1. Find P(|Y - …| < …). Draw a graph of fY(y) and show the area representing the desired probability.

3.4.4. For a person infected with a certain form of malaria, the length of time spent in remission is described by the continuous pdf fY(y) = (1/9)y^2, 0 ≤ y ≤ 3, where Y is measured in years. What is the probability that a malaria patient's remission lasts longer than one year?

3.4.5. The length of time, Y, that a customer spends in line at a bank teller's window before being served is described by the exponential pdf fY(y) = 0.2e^(-0.2y), y ≥ 0.
(a) What is the probability that a customer will wait more than 10 minutes?
(b) Suppose the customer will leave if the wait is more than 10 minutes. Assume that the customer goes to the bank twice next month. Let the random variable X be the number of times the customer leaves without being served. Calculate pX(1).

3.4.6. Let n be a positive integer. Show that fY(y) = (n + 2)(n + 1)y^n(1 - y), 0 ≤ y ≤ 1, is a pdf.

3.4.7. Find the cdf for the random variable Y in Question 3.4.1. Calculate P(0 ≤ Y ≤ …) using FY(y).

3.4.8. If Y is an exponential random variable, fY(y) = λe^(-λy), y ≥ 0, find FY(y).

3.4.9. If the pdf for Y is

fY(y) = 0,        |y| > 1
fY(y) = 1 - |y|,  |y| ≤ 1

find and graph FY(y).

3.4.10. A continuous random variable Y has a cdf given by

FY(y) = 0,    y < 0
FY(y) = y^2,  0 ≤ y < 1
FY(y) = 1,    y ≥ 1

Find P(… < Y ≤ …) two ways: first, using the cdf and second, using the pdf.

3.4.11. A random variable Y has cdf

FY(y) = 0,     y < 1
FY(y) = ln y,  1 ≤ y ≤ e
FY(y) = 1,     e < y
Find
(a) P(Y < 2)
(b) P(2 < Y ≤ 2 1/2)
(c) P(2 < Y < 2 1/2)
(d) fY(y)

3.4.12. The cdf for a random variable Y is defined by FY(y) = 0 for y < 0; FY(y) = …, 0 ≤ y ≤ 1; and FY(y) = 1 for y > 1. Find P(… ≤ Y ≤ …) by integrating fY(y).

3.4.13. Suppose FY(y) = …(y + y^3), 0 ≤ y ≤ 2. Find fY(y).

3.4.14. In a certain country, the distribution of a family's disposable income, Y, is described by the pdf fY(y) = ye^(-y), y ≥ 0. Find the median of the income distribution, that is, find the value m such that FY(m) = 0.5.

3.4.15. Let Y be the random variable described in Question 3.4.3. Define W = 3Y + 2. Find fW(w). For which values of w is fW(w) ≠ 0?

3.4.16. Suppose that fY(y) is a continuous and symmetric pdf, where symmetry is the property that fY(y) = fY(-y) for all y. Show that P(-a ≤ Y ≤ a) = 2FY(a) - 1.

3.4.17. Let Y be a random variable denoting the age at which a piece of equipment fails. In reliability theory, the probability that an item fails at time y given that it has survived until time y is called the hazard rate, h(y). In terms of the pdf and cdf,

h(y) = fY(y) / [1 - FY(y)]

Find h(y) if Y has an exponential pdf (see Question 3.4.8).

EXPECTED VALUES

Probability density functions, as we have already seen, provide a global overview of a random variable's behavior. If X is discrete, pX(k) gives P(X = k) for all k; if Y is continuous, and A is any interval, or countable union of intervals, P(Y ∈ A) = ∫ over A of fY(y) dy. Detail that explicit, though, is not always necessary, or even helpful. There are times when a more prudent strategy is to focus the information contained in a pdf by summarizing certain of its features with single numbers.

The first such feature that we will examine is central tendency, a term referring to the "average" value of a random variable. Consider the pdfs pX(k) and fY(y) pictured in Figure 3.5.1. Although we obviously cannot predict with certainty what values any future X's and Y's will take on, it seems clear that X values will tend to lie somewhere near μX, and Y values, somewhere near μY.
In some sense, then, we can characterize pX(k) by μX, and fY(y) by μY.

FIGURE 3.5.1: Two pdfs, pX(k) and fY(y), each concentrated near its mean.

The most frequently used measure for describing central tendency, that is, for quantifying μX and μY, is the expected value. Discussed at some length in this section and the next, the expected value of a random variable is a slightly more abstract formulation of what we are already familiar with in discrete settings as the arithmetic average. Here, the possible values are "weighted" by the probabilities with which they occur.

Gambling affords a familiar illustration of the notion of an expected value. Consider the game of roulette. After bets are placed, the croupier spins the wheel and declares one of thirty-eight numbers, 00, 0, 1, 2, ..., 36, to be the winner. Disregarding what seems to be a perverse tendency of roulette wheels to land on numbers for which no money has been wagered, we will assume that each of these thirty-eight numbers is equally likely (although only the eighteen numbers 1, 3, 5, ..., 35 are considered to be odd and only the eighteen numbers 2, 4, 6, ..., 36 are considered to be even).

Suppose that our particular wager (at "even money") is $1 on odd. If the random variable X denotes our winnings, then X takes on the value 1 if an odd number occurs, and -1 otherwise. Therefore,

pX(1) = P(X = 1) = 18/38 = 9/19

and

pX(-1) = P(X = -1) = 20/38 = 10/19

Intuitively, if we play this game repeatedly, 9/19 of the time we will win one dollar and 10/19 of the time we will lose one dollar. In this game, then, we stand to lose, on the average, a little more than five cents each time we play:

"expected" winnings = $1 · (9/19) - $1 · (10/19) ≈ -5¢

The number -0.053 is called the expected value of X. Physically, an expected value can be thought of as a center of gravity. For example, imagine two point masses of 9/19 and 10/19 positioned along a weightless X-axis at the points +1 and -1, respectively (see Figure 3.5.2). If a fulcrum were placed at the point -0.053, the axis would be in balance, implying that we can think of that point as marking the center of the random variable's distribution.
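The roulette computation above is easy to check with exact rational arithmetic:

```python
from fractions import Fraction

# Even-money $1 bet on odd: win $1 with probability 18/38,
# lose $1 with probability 20/38.
expected = 1 * Fraction(18, 38) + (-1) * Fraction(20, 38)

print(expected)         # -1/19
print(float(expected))  # about -0.0526, a loss of a little more than five cents
```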
FIGURE 3.5.2: Point masses of 9/19 at +1 and 10/19 at -1 balance at the fulcrum -0.053.

If X is a discrete random variable taking on each of its n values with the same probability, 1/n, the expected value of X is simply the everyday notion of an arithmetic average, or mean:

expected value of X = Σ (all k) k · (1/n) = (1/n) Σ (all k) k

Generalizing this to a discrete X described by an arbitrary pdf, pX(k),

expected value of X = Σ (all k) k · pX(k)    (3.5.1)

For a continuous random variable, Y, the summation in Equation 3.5.1 is replaced by an integration, and k · pX(k) becomes y · fY(y).

Definition 3.5.1. Let X be a discrete random variable with probability function pX(k). The expected value of X is denoted E(X) (or sometimes μ or μX) and is given by

E(X) = μ = μX = Σ (all k) k · pX(k)

Similarly, if Y is a continuous random variable with pdf fY(y),

E(Y) = μ = μY = ∫ from -∞ to ∞ of y · fY(y) dy

Comment. We assume that both the sum and the integral in Definition 3.5.1 converge absolutely:

Σ (all k) |k| pX(k) < ∞        ∫ from -∞ to ∞ of |y| fY(y) dy < ∞

If not, we say that the random variable has no finite expected value. One immediate reason for requiring absolute convergence is that a convergent sum that is not absolutely convergent depends on the order in which the terms are added, and order should obviously not be a consideration when defining an average.

EXAMPLE 3.5.1

Suppose X is a binomial random variable with p = 5/9 and n = 3. Then pX(k) = P(X = k) = C(3, k)(5/9)^k(4/9)^(3-k), k = 0, 1, 2, 3. What is the expected value of X?

Applying Definition 3.5.1 gives

E(X) = 0 · (64/729) + 1 · (240/729) + 2 · (300/729) + 3 · (125/729) = 1215/729 = 5/3 = 3 · (5/9)
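The arithmetic of Example 3.5.1 can be verified exactly:

```python
from fractions import Fraction
from math import comb

# Example 3.5.1: X binomial with n = 3, p = 5/9; E(X) = sum of k * pX(k).
n, p = 3, Fraction(5, 9)
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

expected = sum(k * pk for k, pk in enumerate(pmf))
assert sum(pmf) == 1          # the pdf sums to one

print(expected)           # 5/3
print(expected == n * p)  # True
```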
If E(X) (3.5.2) = E g(k) can be fadored in such a way ala that e(X) = h E px,,(k). where px-(k) is the pdf for sOme random variable X", then alU = h, the sum of a pdf over its entire range is one. Here, suppose that np is factored out of Equation Then £(X) (n E(X) = np k=1 (k =np Now, let.i =k - (1 _ p),l-k l)!(n G_~)/-1(1 - L It follows that E(X) lettingm =n = np n ( j - 1 E(X) npt(":)pj(l p)m- J J=ft J and, since the value of the sum is 1 (why?), E(X) np (3.5.3) o Section 3.5 FrrY>r'tPrl 1n Values The statement should come as no surprise. If a for example, has one hundred questions, each with five possible answers, we 'expelct" to get twenty by guessing. But if the random variable X oellou~s the number of correct answers (out of one hundred), 20 = E(X) = lOOa) = np. EXAMPLE 3.5.2 An urn contains nine chips, without replacemenl Let X denote Section 3.2, we recognize X to four white. Three are .drawn out at random of red chips in the Find E(X). a hypergeometric random V~"'l""l">l k=O.l, E(X) 3 = Comment. As was true in 3.5.1, the value found here for E(X) suggests a general formula-in case, for the expected value of a hypergeometric random variable. X is a hypergeometric random variable with parameters r. w, ami r red balls and w white balls. A sample of size n is urn. Let X be the number of red in the sample. Then Tbemem 3.5.2. n. That is, suppose an urn drnwn simultaneously from E(X) = I"n r+w . o Proof. See Question Comment. Let p the proportion of rd balls in an urn-that is, p = r r+w • The formula, then the expected value of a hypergeometric random variable has the same structure as the formula for the expected value of a binomial random variable: E(X) rn l' = r+w =n r+w np 178 Chapter 3 Random Variables EXAMPLE 3.5.3 Among the more common versions of "numbers" racket is a game called DJ" its name deriving from the fact that winning ticket is detennined from Dow averages. sets of stocks are used: Industrials, Transportations, and Utilities. 
Traditionally, are Quoted at two different times, 11 A.M. and noon. The of the earlier quotation are to fonn a number; the noon generates a second nmnber. the same way. two numbers are then added together and the last three of that smn become the winning pick. Figure 3.5.3 shows a set of quotations for which 906 would be declared the winner. 11 A.M. Noon quotation quolation Jnduslrials Transportation U!ilities Industrials \ 848.1;;: 376.7;3: 110.6:3; -" - + 173 906 '" Winning numbel'" FIGURE 3.5.3 The payoff in DJ. is 700 to 1. Suppose that we bet $5. How much do we stand to win. or on the average? Let p denote the probability of our number earnings. x = 1~5~O ..J)J the winner and 1et X denote our with probability p with probability 1 - p and E(X) = $3500 . . (1 - p - p) Our intuition would suggest (and this time it would be that each winning numbers, 000 through 999, is equally likely. That being the case, p E(X) $3500· On the average, then, we lose C~) - $5 . c:) the 1/1000 and = on a $5.00 bet. EXAMPLE 3.5.4 Suppose that fifty people are to be given a blood test to see who has a certain The obvious laboratory procedure is to examine each person's hlood individually, meaning Section 3.5 EXpected Values 179 that flity tests would eventually be run. alternative is to divide blood sample into two parts-say, A and B. All of the A's would then be and as one sample. If that "pooled" sample proved to negative all individuals must necessarily be of the infection, no further would need to done. If the pooled sample gave a positive reading, of course, aU fifty B samples would have to be separately. Under what conditions would it make sense for a laboratory to consider pooling the fifty SaIJI)P11~S In the pooling strategy is (I.e., more economical) if it can substantially reduce the number of tests that need to be perfonned. 
Whether or not it can ae'peIIOS ultim.ately on probability p that a person is infected with the random X denote number of tests that will have to be perfonned ~~ _ _ _ 1~~ are aearly, ! 1 if none of the fifty is 51 if at least one of the fifty is infected x = px(l) P(X = P(none of the fifty is infected) (1 _ p)50 independence),and P(X 51) = px(51) 1 - P(X = 1) 1 - (1 _ p)50 Therefore, E(X) =1 . (1 - p)50 + 51 11 - (1 _ p)50] Table 3.5.1 shows E(X) as a of p. As our intuition would suggest, the pooling strategy becomes feasible as the prevalence of the disease diminishes. If the ........ u .... " of a person being infected is 1 the pooling strategy requires an average of only 3.4 tests, a dramatic improvement over 50 tests that would be needed samples were tested one by. one. On the other hand, if 1 in 10 pooling would inappropriate, requiring more than 50 tests [E(X) == SO.7]. TABLE 3.5.1 p 0.5 0.1 0.01 0.001 OJJOOI E(X) 51.0 50.7 20.8 3.4 180 Chapter 3 Random Variables EXAMPLE L:QnS14::aer the following game. A fair coin is until the first tail we win $2 if it on first toss, if it on second and, in general, if it first occurs on the kth toss. the random variable X denote our winnings. How much we have to pay order to a fair game? [Note: A game is one where "pr'p"""", between the ante and £(X) is 0.] Known as the St. Petersburg paradox, this problem has a rather unusual answer. First, note that 1 k-1.2, ... Therefore, E(X) = i'·;k::::::: 1 + 1 + 1 + ... which is a divergent sum. is, X not have a finite value, so in this game to be fair, OUr ante would have to be an infinite amount of for Comment. Mathematicians have trying to "explain" the St. paradox for almost two hundred years The answer seems clearly absurd-no gambler would consider pa'Ying even $25 to play such a game, much less an infinite amount-yet the computations involved in showing that X has no finite expected value are unassailably correct Where the difficulty lies. 
to one common theory, is with our inability to put in perspective the very small probabili ties of winning very payoffs. Furthermore, the problem assumes that our opponent has infinite capital, which is an impossible state ot affairs. We a much more reasonable answer for E(X) if the stipulation is i:u.klt:u that our winnings can be at mos~ $1000 Question 35.19) or if the payoffs are assigned according to some formula other than (see Question 3.5.20). Comment. There are two important lesoons fo he. learned from the Petersburg First is the that E(X) is not necessarily a meaningful characterization of the "location" of a distribution. Question another situation where the formal computation of E(X) a similarly inappropriate answer. Second, we need to be aware that the notion of expected value is not necessarily synonymous with the of worth. Just because a for example, has a posjtive expected value--even a very large positive not imply that someone would want to play Suppose, for example, that you had the opportunity to spenu yOur last $10,000 on a SWt~pStalt(es ticket where the was a billion dollars but the probability of winning was only 1 in 10,000. The expected value of such :1 bet would be over $90,000, THtr'l4£ltW E(X) = $1.000,OOO,OOOCo,~) + $90,001 (-$10,000) Co,ooo) Expected Value$ Section 3.5 181 it is doubtful that many people would rush out to buy a ticket. (Economists have long recognized the distinction between a payoff's numerical value and its. perceived desirability. They refer to the latter as utility.) EXAMPlE 3.5.6 The distance, Y, that a molecule in a gas travels before colliding with another molecule can be modeled by the exponential pdf h(Y) =#1 y2:0 where # is a positive constant known as the mean free path. Find E(Y). Since the random here is continuous, its expected value is an integral: E(Y) t Jo = JO y~e-YI/L dy # = Let w = y/#, so that dw 1/# dy. Then E(Y) = dv = e-wdw and integrating by gives E(Y) = #[ -we-w #10 we-wdw. 
Setting u e-W]I~ = # - wand (3.5.4) Equation that # is aptLy in fact, represent the.-a:verage ..distance a mOlecule travels, free of any collisions. Nitrogen (N2), for example, at room temperature and standard atmospheric pressure has # = 0,00005 em. An Nz molecule, then, travels that far before colliding with another N2 molecule, on average. EXAMPlE 3.5.1 One continuous pdf that has a number of interesting appLications in physics is the Rayleigh distribution, where the pdt is given by fy(y) = y 2 e- (la , > 0; 0. 0:::; y < (35.5) 00 Calculate the expected value for a random variable having a Rayleigh distribution. From Definition 3.5.1, ECY) Let v = 1 00 o Y . y .2 2 Y /211 dy -e2 0. = y/c..tia). Then integrand here is a special case k = 1, 182 Chapter 3 Random Variables Therefore, 1 = 2J2a . 4ifii E(y) = Comment. The pdf is for William Strutt, Baron the anti Lwenlidh-ceotury Bfitish physicist who showed that Equation 3.5.5 is the solution to a problem arising in the study of wave motion. If two waves are superimposed, it is well known that the height the resultant at any time t is the sum added 3.5.4). Seeking to of the corresponding heights of the waves extend that notion. Rayleigh posed the following Question: If n waves, each having the same amplitude h and the same wavelength, are superimposed randomly with respect to about the amplitude R of resultant? Clearly, R is a random phase, what can we variable, its value depending on the particular colledion of phase angles by the sample. What Rayleigh was able to show in his 1880 paper (173) is that when n is the pdf large, the probabilistic behavior of R is described rrin~l~~nlh- 2r = -2 iR(r) r > 0 nh which is just a "IJ'~"'''''~ case of Equation with a Resultant \ .... _....,, I ... Wave 2- RGURE 3..5.4 A Second Measure of Central Tendency: The Median While the expected value is the most used measure of a random central tendency, it does have a weakness that sometimes makes it misleading and inappropriate. 
Specifically, if one or several possible values of a random variable are either much smaUer or much larger than all the others, the value of J.1.- can be distorted the sense that it no longer reflects the center of the distribution in any H'''''~'U'E',-''''' way. For a small community consists of a homogeneous group of middle-ronge soJary earners, and then Bill Gates moves to town. Obviously, the town's HU.H.-"JL'-' Section Expected Values 183 l1VC:IiU~C: salary and after the multibililonaire arrives will be quite different, even though he represents only one new value of the "s.l!lary" random variable. It would helpfuJ to have a measure of tendency that was not so sensitive to "outliers" or to probability distributions that are markedly skewed. One such measure is the median, which, in effect, divides the area under a pdf two areas. Definition 3.5.2. If X is a discrete random variable, the median, m, is that point for which < m) P(X > 111). In eventthatP(X:5 m) 0.5 and P(X 2::m/) =0.5, the is defined to be the arithmetic average, (m + m/)/2. If Y is a continuous random variable, median is the solution to the integral equation, r:cY:Jy(Y) dy = 0.5. = = EXAMPlE 3.5.8 If a random variable's pdf is symmetric, both J.t andm wiD equal. Should px(k) or frey) not be symmetric, though, the difference between the expected and the median can considerable, especially if the asymmetry takes the form of extreme skewness. The situation described here is a case in point Soft-glow makes a 6()..watt light bulb that is advertised to have an average life of one thousand hours. Assuming that that performance is valid, is it reasonable for consumers to conclude that the Soft-glow bulbs they buy will last for approximately one-thousand hours? of a bulb is one thotk~nd hours, the (continuous) pdf, frey), No! If the average modeling the length time, Y, it remains before burning out is likely to have the form fy(y) = O.OOle-O.OO1y , y > 0 (3.5.6) (for reasons explained in Chapter 4). 
But Equation 3.5.6 is a very skewed pdf, having a shape much like the curve drawn in Figure 3.4.8. The median for such a distribution will lie considerably to the left of the mean. More specifically, the median lifetime for these bulbs, according to Definition 3.5.2, is the value m for which

    ∫_0^m 0.001e^(−0.001y) dy = 0.5

But ∫_0^m 0.001e^(−0.001y) dy = 1 − e^(−0.001m). Setting the latter equal to 0.5 implies that

    m = (1/−0.001) ln(0.5) = 693

So, even though these bulbs last one thousand hours on the average, half of them will burn out in fewer than 693 hours.

QUESTIONS

3.5.1. Recall the game of keno described in Example 3.2.5. The following are all the payoffs on a $1 wager where the player has bet on ten numbers. Calculate E(X), where the random variable X denotes the amount of money won.

    Number of Correct Guesses    Payoff     Probability
    < 5                          −$1        .935
    5                            $2         .0514
    6                            $18        .0115
    7                            ...        .0016
    8                            ...        1.35 × 10^−4
    9                            ...        6.12 × 10^−6
    10                           $10,000    1.12 × 10^−7

3.5.2. Cracker Jack first appeared in 1893 at the World's Fair. Enormously popular ever since (250 million boxes are sold each year), the snack owes more than a little of its success, especially with children, to the toy surprise included in each box. When a new Nutty Deluxe flavor was introduced in the mid-1990s, that familiar marketing gimmick was raised to a new level. Placed in one box was a certificate redeemable for a $10,000 ring; in fifty other boxes were certificates for a Breakfast at Tiffany's video (a movie in which the heroine, Holly Golightly, finds her engagement ring in a Cracker Jack box); the usual toys and trinkets were put in all the other boxes. Calculate the expected value of the prize in a box of Nutty Deluxe. Assume that five million boxes were sold that first year. Also, assume that each video was worth $30 and each of the other toys was worth ... .

3.5.3. The pdf describing the daily profit earned by Acme Industries was derived in Example 3.3.7. Find the company's expected daily profit.

3.5.4. In the game of redball, two drawings are made without replacement from a bowl that has four white ping-pong balls and two red ping-pong balls. The amount won is determined by how many of the red balls are selected. Before playing, a player can opt to be paid under either Rule A or Rule B, as shown. If you were playing the game, which rule would you choose? Why?
    Rule A                               Rule B
    No. of Red Balls Drawn   Payoff      No. of Red Balls Drawn   Payoff
    0                        $0          0                        $0
    1                        $2          1                        $1
    2                        $10         2                        $20

3.5.5. Recall the promotional campaign launched by the Wipe Your Feet company described earlier in this chapter. On the average, how many new customers would that effort identify? How many contacts would they have to make in order to find an average of one hundred new customers?

3.5.6. A manufacturer has one hundred memory chips in stock, 4% of which are likely to be defective (based on past experience). A random sample of twenty chips is selected and shipped to a factory that assembles laptops. Let X denote the number of laptops that receive faulty memory chips. Find E(X).

3.5.7. Records show that 642 new students have just entered a certain Florida school district. Of those, a total of 125 are not adequately vaccinated. The district's physician has scheduled a day for students to receive whatever shots they might need. On any given day, though, 12% of the district's students are likely to be absent. How many new students, then, can be expected to remain inadequately vaccinated?

3.5.8. Calculate E(Y) for the following pdfs:
(a) f_Y(y) = 3(1 − y)², 0 ≤ y ≤ 1
(b) f_Y(y) = 4ye^(−2y), y ≥ 0
(c) f_Y(y) = 3/4 for 0 ≤ y ≤ 1, 1/4 for 2 ≤ y ≤ 3, and 0 elsewhere
(d) f_Y(y) = sin y, 0 ≤ y ≤ π/2

3.5.9. Recall Question 3.4.4, where the length of time Y (in years) that a malaria patient spends in remission has pdf f_Y(y) = (1/9)y², 0 ≤ y ≤ 3. What is the average length of time that such a patient spends in remission?

3.5.10. Let the random variable Y have the uniform distribution over [a, b]; that is, f_Y(y) = 1/(b − a) for a ≤ y ≤ b. Find E(Y) by integration. Also, deduce the value of E(Y) knowing that the expected value is the center of gravity of f_Y(y).

3.5.11. Show that the expected value associated with the exponential distribution, f_Y(y) = λe^(−λy), y > 0, is 1/λ, where λ is a positive constant.

3.5.12. Show that f_Y(y) = 1/y², y ≥ 1, is a valid pdf but that Y does not have a finite expected value.

3.5.13. Based on recent experience, ten-year-old passenger cars going through a motor vehicle inspection station have an 80% chance of passing the emissions test.
Suppose that two hundred such cars will be checked out next week. Write two formulas that show the number of cars that are expected to pass.

3.5.14. Suppose that fifteen observations are chosen at random from the pdf f_Y(y) = 3y², 0 ≤ y ≤ 1. Let X denote the number that lie in the interval (1/2, 1). Find E(X).

3.5.15. A city has 74,806 registered automobiles. Each is required to display a bumper decal showing that the owner paid an annual wheel tax of $50. By law, new decals need to be purchased during the month of the owner's birthday. How much wheel tax revenue can the city expect to receive in November?

3.5.16. Regulators have found that 23 of the investment companies that filed for bankruptcy in the past five years failed because of fraud, not for reasons related to the economy. Suppose that additional companies will be added to the bankruptcy rolls during the next quarter. How many of those failures are likely to be attributed to fraud?

3.5.17. An urn contains four chips numbered 1 through 4. Two are drawn without replacement. Let the random variable X denote the larger of the two. Find E(X).

3.5.18. A fair coin is tossed three times. Let the random variable X denote the total number of heads that appear times the number of heads that appear on the second and third tosses. Find E(X).

3.5.19. How much would you have to ante to make the St. Petersburg game "fair" (recall the example earlier in this section) if the most you could win was $1000? That is, the payoffs are $2^k for 1 ≤ k ≤ 9, and $1000 for k ≥ 10.

3.5.20. For the St. Petersburg problem, find the expected payoff if
(a) the amounts won are $c^k instead of $2^k, where 0 < c < 2.
(b) the amounts won are $log 2^k. [This was a modification suggested by D. Bernoulli (a nephew of James Bernoulli) to take into account the marginal utility of money: the more you have, the less useful a bit more is.]

3.5.21. A fair die is rolled three times. Let X denote the number of different faces showing, X = 1, 2, 3. Find E(X).

3.5.22. Two distinct integers are chosen at random from the first five positive integers.
Compute the expected value of the absolute value of the difference between the two numbers.

3.5.23. Suppose that two evenly matched teams are playing in the World Series. On the average, how many games will be played? (The winner is the first team to win four games.) Assume that each game is an independent event.

3.5.24. An urn contains one white chip and one black chip. A chip is drawn at random. If it is white, the game is over; if it is black, that chip and another black one are put into the urn. Then another chip is drawn at random from the "new" urn and the same rules for ending the game are followed (if the chip is white, the game is over; if the chip is black, it is placed back in the urn, together with another chip of the same color). The drawings continue until a white chip is selected. Show that the expected number of drawings necessary to select a white chip is not finite.

3.5.25. A random sample of size n is drawn without replacement from an urn containing r red chips and w white chips. Define the random variable X to be the number of red chips in the sample. Use the summation technique described in Theorem 3.5.1 to prove that E(X) = rn/(r + w).

3.5.26. Given that X is a nonnegative, integer-valued random variable, show that

    E(X) = Σ (k = 1 to ∞) P(X ≥ k)

Expected Value of a Function of a Random Variable

There are situations that call for finding the expected value of a function of a random variable, Y = g(X). One common example would be change of scale problems, where g(X) = aX + b for constants a and b. Sometimes the pdf of the new random variable Y can be easily determined, in which case E(Y) can be calculated by applying Definition 3.5.1. Often, though, f_Y(y) can be difficult to derive, depending on the complexity of g(X). Fortunately, Theorem 3.5.3 allows us to calculate the expected value of Y without knowing the pdf for Y.

Theorem 3.5.3. Suppose X is a discrete random variable with pdf p_X(k), and let g(X) be a function of X. Then the expected value of the random variable g(X) is given by

    E[g(X)] = Σ (all k) g(k) · p_X(k)

provided that Σ (all k) |g(k)| p_X(k) < ∞.
If Y is a continuous random variable with pdf f_Y(y), and if g(Y) is a continuous function, then the expected value of the random variable g(Y) is

    E[g(Y)] = ∫_−∞^∞ g(y) · f_Y(y) dy

provided that ∫_−∞^∞ |g(y)| f_Y(y) dy < ∞.

Proof. We will prove the theorem for the discrete case; the argument is modified in a similar spirit when the pdf is continuous. Let W = g(X). The set of all possible k-values, k1, k2, ..., will give rise to a set of w-values, w1, w2, ..., where in general more than one k may be associated with a given w. Let S_j be the set of k's for which g(k) = w_j [so the union of the S_j is the entire set of k-values for which p_X(k) is defined]. We have that P(W = w_j) = P(X ∈ S_j), and we can write

    E(W) = Σ_j w_j · P(W = w_j) = Σ_j w_j · P(X ∈ S_j)
         = Σ_j Σ (k in S_j) w_j · p_X(k) = Σ_j Σ (k in S_j) g(k) p_X(k)    (why?)
         = Σ (all k) g(k) p_X(k)

Since it is being assumed that Σ (all k) |g(k)| p_X(k) < ∞, the statement of the theorem holds. ∎

Corollary. For any random variable W, E(aW + b) = aE(W) + b, where a and b are constants.

Proof. Suppose W is continuous; the proof for the discrete case is similar. By Theorem 3.5.3,

    E(aW + b) = ∫_−∞^∞ (aw + b) f_W(w) dw
              = a ∫_−∞^∞ w · f_W(w) dw + b ∫_−∞^∞ f_W(w) dw
              = aE(W) + b · 1 = aE(W) + b  ∎

EXAMPLE 3.5.9

Suppose that X is a random variable whose pdf is nonzero only for the three values −2, 1, and +2:

    k      p_X(k)
    −2     5/8
     1     1/8
    +2     2/8

Let W = g(X) = X². Verify the statement of Theorem 3.5.3 by computing E(W) two ways: first, by finding p_W(w) and summing w · p_W(w) over w and, second, by summing g(k) · p_X(k) over k.

By inspection, the pdf for W is defined for only two values, 1 and 4:

    w (= k²)    p_W(w)
    1           1/8
    4           7/8

Taking the first approach to find E(W) gives

    E(W) = Σ_w w · p_W(w) = 1 · (1/8) + 4 · (7/8) = 29/8

To find the expected value via Theorem 3.5.3, we take

    E[g(X)] = Σ_k k² · p_X(k) = (−2)² · (5/8) + (1)² · (1/8) + (2)² · (2/8)

with the sum here reducing to 29/8, the answer we already found. For this particular situation, neither approach was easier than the other.
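The two computations in Example 3.5.9 can be mirrored in a few lines of code. A sketch (not part of the original text), using Python's fractions module to keep the arithmetic exact:

```python
from fractions import Fraction as F
from collections import defaultdict

# pdf of X from Example 3.5.9
p_X = {-2: F(5, 8), 1: F(1, 8), 2: F(2, 8)}

def g(k):
    """W = g(X) = X^2."""
    return k ** 2

# First way: derive p_W(w) by collapsing the k-values that share a g(k)
p_W = defaultdict(F)
for k, p in p_X.items():
    p_W[g(k)] += p
E_W_first = sum(w * p for w, p in p_W.items())

# Second way: Theorem 3.5.3 -- sum g(k) * p_X(k) directly, no p_W needed
E_W_second = sum(g(k) * p for k, p in p_X.items())

print(E_W_first, E_W_second)   # 29/8 both ways
```

The second route is the one that pays off when p_W(w) is hard to derive by hand.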
In general, that will not be the case. Finding p_W(w) is often quite difficult, and on those occasions Theorem 3.5.3 can be of great benefit.

EXAMPLE 3.5.10

Suppose the amount of propellant, Y, put into a can of spray paint is a random variable with pdf

    f_Y(y) = 3y²,    0 < y < 1

Experience has shown that the largest surface area that can be painted by a can having Y amount of propellant is twenty times the area of a circle generated by a radius of Y ft. If the Purple Dominoes, a newly formed urban gang, have just stolen their first can of spray paint, can they expect to have enough to cover a 5' x 8' subway panel with graffiti?

No. By assumption, the maximum area (in ft²) that can be covered by a can of paint is described by the function g(Y) = 20πY². According to the second statement in Theorem 3.5.3, though, the average value for g(Y) is slightly less than the desired 40 ft²:

    E[g(Y)] = ∫_0^1 20πy² · 3y² dy = 60π (y⁵/5) from 0 to 1 = 12π = 37.7 ft²

EXAMPLE 3.5.11

A fair coin is tossed until a head appears. You will be given (1/2)^k dollars if that first head occurs on the kth toss. How much money can you expect to be paid?

Let the random variable X denote the toss at which the first head appears. Then p_X(k) = (1/2)^k, k = 1, 2, ... Moreover,

    E(amount won) = E[(1/2)^X] = E[g(X)] = Σ (all k) g(k) · p_X(k)
                  = Σ (k = 1 to ∞) (1/2)^k · (1/2)^k = Σ (k = 1 to ∞) (1/4)^k
                  = 1/(1 − 1/4) − 1 = 1/3

or about $0.33.

EXAMPLE 3.5.12

In one of the early applications of probability to physics, James Clerk Maxwell (1831-1879) showed that the speed S of a molecule in a perfect gas has a density function given by

    f_S(s) = (4/√π) a^(3/2) s² e^(−as²),    s > 0

where a is a constant depending on the temperature of the gas and the mass of the molecule. What is the average energy of a molecule in a perfect gas?

Let m denote the molecule's mass. Recall from physics that energy (W), mass (m), and speed (S) are related through the equation

    W = (1/2)mS² = g(S)

To find E(W) we appeal to the second part of Theorem 3.5.3:

    E(W) = ∫_0^∞ g(s) f_S(s) ds = ∫_0^∞ (1/2)m s² · (4/√π) a^(3/2) s² e^(−as²) ds
         = (2m/√π) a^(3/2) ∫_0^∞ s⁴ e^(−as²) ds

Make the substitution t = as².
Then

    E(W) = (m/(a√π)) ∫_0^∞ t^(3/2) e^(−t) dt

The integral is Γ(5/2) = (3/4)√π (see Section 4.6), so

    E(W) = (m/(a√π)) · (3/4)√π = 3m/(4a)

EXAMPLE 3.5.13

Consolidated Industries is planning to market a new product, and they are trying to decide how many items to manufacture. They estimate that each item sold will return a profit of m dollars; each one not sold represents an n-dollar loss. In addition, they suspect the demand for the product, V, will have an exponential distribution,

    f_V(v) = (1/λ)e^(−v/λ),    v > 0

How many items should the company produce if they want to maximize their expected profit? (Assume that n, m, and λ are known.)

If a total of x items are made, the company's profit can be expressed as a function Q(v), where

    Q(v) = mv − n(x − v)    if v < x
    Q(v) = mx               if v ≥ x

and v is the number of items sold. It follows that their expected profit is

    E[Q(V)] = ∫_0^x [(m + n)v − nx](1/λ)e^(−v/λ) dv + ∫_x^∞ mx(1/λ)e^(−v/λ) dv        (3.5.7)

The integration here is straightforward, and Equation 3.5.7 eventually simplifies to

    E[Q(V)] = λ(m + n)(1 − e^(−x/λ)) − nx

To find the optimal production level we need to solve dE[Q(V)]/dx = 0 for x. But

    dE[Q(V)]/dx = (m + n)e^(−x/λ) − n

and the latter equals zero when

    x = −λ · ln(n/(m + n))

EXAMPLE 3.5.14

A point, y, is selected at random from the interval [0, 1], dividing the line into two segments (see Figure 3.5.5). What is the expected value of the ratio of the shorter segment to the longer segment?

FIGURE 3.5.5 [The unit interval cut at y, with 0, 1/2, and 1 marked]

Notice, first, that the function g(Y) has two expressions, depending on the location of the chosen point:

    g(Y) = Y/(1 − Y)    if 0 ≤ y ≤ 1/2
    g(Y) = (1 − Y)/Y    if 1/2 < y ≤ 1

Since f_Y(y) = 1, 0 ≤ y ≤ 1,

    E[g(Y)] = ∫_0^(1/2) [y/(1 − y)] · 1 dy + ∫_(1/2)^1 [(1 − y)/y] · 1 dy

Writing the second integrand as (1/y) − 1 shows that the second integral equals

    (ln y − y) evaluated from 1/2 to 1 = ln 2 − 1/2

By symmetry, though, the two integrals are the same, so

    E[g(Y)] = 2 ln 2 − 1 = 0.39

On the average, then, the shorter segment is a bit less than two-fifths the length of the longer one.

QUESTIONS

3.5.27. Suppose X is a binomial random variable with n = 10 and p = ... . What is the expected value of 3X − 4?

3.5.28. Recall Question 3.2.4.
Suppose that each defective component discovered at the work station costs the company $100. What is the average cost to the company for defective components?

3.5.29. Let X have the probability density f_X(x), 0 < x < 1, and zero elsewhere, and suppose that Y = g(X). Find E(Y) two different ways.

3.5.30. A tool and die company makes castings for steel stress-monitoring gauges. Their annual profit, Q, in hundreds of thousands of dollars, can be expressed as a function of product demand, y. Suppose that the demand (in thousands) for their castings follows an exponential pdf, f_Y(y) = 6e^(−6y), y > 0. Find the expected profit.

3.5.31. A box is to be constructed so that its height is five inches and its base is Y inches by Y inches, where Y is a random variable described by the pdf f_Y(y) = 6y(1 − y), 0 < y < 1. Find the expected volume of the box.

3.5.32. Grades on the last Economics 301 exam were not very good. Graphed, their distribution had a shape similar to the pdf

    f_Y(y) = (1/5000)(100 − y),    0 ≤ y ≤ 100

As a way of "curving" the results, the professor announces that he will replace each person's grade, Y, with a new grade, g(Y), where g(Y) = 10√Y. Has the professor's strategy been successful in raising the class average above 60?

3.5.33. Find E(Y²) if the random variable Y has the pdf pictured below:

FIGURE [Graph of f_Y(y)]

3.5.34. The hypotenuse, Y, of the isosceles right triangle shown below is a random variable having a uniform pdf over the interval [6, 10]. Calculate the expected value of the triangle's area. Do not leave the answer as a function of a.

FIGURE [Isosceles right triangle with hypotenuse Y and legs a]

3.5.35. An urn contains n chips numbered 1 through n. Assume that the probability of choosing chip i is proportional to i, i = 1, 2, ..., n. If one chip is drawn, calculate E(X), where the random variable X denotes the number showing on the chip selected. Hint: Recall that the sum of the first n integers is n(n + 1)/2.

3.6 THE VARIANCE

We saw in Section 3.5 that the location of a distribution is an important characteristic and that it can be effectively measured by calculating the mean or the median.
A second feature of a distribution that warrants further scrutiny is its dispersion, that is, the extent to which its values are spread out.

TABLE 3.6.1

    k      p_X1(k)        k              p_X2(k)
    −1     1/2            −1,000,000     1/2
     1     1/2             1,000,000     1/2

Knowing a pdf's location tells us absolutely nothing about its dispersion. Table 3.6.1, for example, shows two simple pdfs with the same location (equal to zero) but with vastly different dispersions; their sets of possible values are totally different.

It is not immediately obvious how the dispersion in a pdf should be quantified. Suppose that X is any discrete random variable. One seemingly reasonable approach would be to average the deviations of X from its mean, that is, calculate the expected value of X − μ. As it happens, that will not work because the negative deviations exactly cancel the positive ones, making the numerical value of such an average always zero, regardless of the amount of spread present in p_X(k):

    E(X − μ) = E(X) − μ = μ − μ = 0        (3.6.1)

Another possibility would be to modify Equation 3.6.1 by making all the deviations positive, that is, replace E(X − μ) with E(|X − μ|). This does work, and it is sometimes used to measure dispersion, but the absolute value is somewhat troublesome mathematically: It does not have a simple arithmetic formula, nor is it a differentiable function. Squaring the deviations proves to be a much better approach.

Definition 3.6.1. The variance of a random variable is the expected value of its squared deviations from μ. If X is discrete with pdf p_X(k),

    Var(X) = σ² = E[(X − μ)²] = Σ (all k) (k − μ)² · p_X(k)

If Y is continuous with pdf f_Y(y),

    Var(Y) = σ² = E[(Y − μ)²] = ∫_−∞^∞ (y − μ)² f_Y(y) dy

[If E(X²) or E(Y²) is not finite, the variance is not defined.]

Comment. One unfortunate consequence of Definition 3.6.1 is that the units for the variance are the square of the units for the random variable: If Y is measured in inches, for example, the units for Var(Y) are inches squared. This causes obvious problems in relating the variance back to the sample values. For that reason, in applied statistics, where interpretability
For that reason, in applied ,,,.,,"<1'"''''', """"'''''~'''P Section 3.6 is especially important, ch.o::,ne'r~l( the stondnrd devioJion, which is defined to The 195 is measured not by the variance square root of the That if X is discrete (1 = standard deviation = if Y is continuous expected value of a random Comment. The analogy between center of gravity of a physical system was pointed out in Section 3.5. A -'_.,-between the variance and what call a moment ofinettia. set of weights having masses mh m2, ... are positioned along a (weightless) rigid bar at distances rt. r2 • ... from an axis of rotation (see Figure 3.6.1), the moment of inertia of the system is to value E m/f. Notice, though, that if the masses were the probabilities i with a discrete '1. r2.··· could be written - variable and if the axis of rotation were actually Jl, then Jl),. .. and Emir; would same as the Jl). (k2 i (k - /I)2 . Px(k). all k ( ... . .. ) AGURE 3.6.1 Definition 3.6.1 gives a fn....mnl calculating (12 in both the """'............ cases. An equivalent-but to use-formula is given in Theorem Theorem 3.6.1. Let W be any random variable, discrete or continuous, having mean Jl and for which E(W2) isjinile. Then Proof. We will prove W is similar. In Var(W) = E«W theorem for the continuous case. let g(W) = (W - /I)2. Then = J': g(w)fw(w) dw = J': argument for discrete (w /I)2 fw(w) dw 196 Chapter 3 Random Variables that appears in the integrand and using the additive dw = i: i: = 2 (W - 2fJ.w W 2 + IJhfwCw) dw /W(w) dw - 2fJ. = E(W2) - 2fJ.2 Note that the equality J~ w2 fw(w) dw + p.,2 = i: w.fw(W) dw ECW2) + fJ.2 fw(w) dw fJ.'2. = E(W2) also follows 3.5.3. 0 EXAMPLE 3.6.1 An urn contains five chips, two red and three white. Suppose that two are drawn out at random, wilhout replo.cement. Let X denote the nwnber of red chips in the sample. Find Var(X). 
Note, first, that since the chips are not being replaced from drawing to drawing, X is a hypergeometric random variable. Moreover, regardless of which formula is used to calculate Var(X), we need to find μ. In the notation of Theorem 3.5.2, r = 2, w = 3, and n = 2, so

    μ = rn/(r + w) = 2 · 2/(2 + 3) = 0.8

To find Var(X) from Definition 3.6.1, we write

    Var(X) = E[(X − μ)²] = Σ (all x) (x − μ)² · p_X(x)
           = (0 − 0.8)²(3/10) + (1 − 0.8)²(6/10) + (2 − 0.8)²(1/10) = 0.36

To use Theorem 3.6.1 instead, we first need E(X²). From Theorem 3.5.3,

    E(X²) = Σ (all x) x² · p_X(x) = 0²(3/10) + 1²(6/10) + 2²(1/10) = 1.00

Then

    Var(X) = E(X²) − μ² = 1.00 − (0.8)² = 0.36

confirming what we calculated before.

In Section 3.5 we encountered a change of scale formula that applied to expected values: For any constants a and b and any random variable W, E(aW + b) = aE(W) + b. A similar issue arises in connection with the variance of a linear transformation: If Var(W) = σ², what is the variance of aW + b?

Theorem 3.6.2. Let W be any random variable having mean μ and for which E(W²) is finite. Then

    Var(aW + b) = a² Var(W)

Proof. Using the same approach taken in the proof of Theorem 3.6.1, it can be shown that

    E[(aW + b)²] = a²E(W²) + 2abμ + b²

We know from the Corollary to Theorem 3.5.3 that E(aW + b) = aμ + b. Using Theorem 3.6.1, then, we can write

    Var(aW + b) = E[(aW + b)²] − [E(aW + b)]²
                = [a²E(W²) + 2abμ + b²] − [aμ + b]²
                = [a²E(W²) + 2abμ + b²] − [a²μ² + 2abμ + b²]
                = a²[E(W²) − μ²] = a² Var(W)  ∎

EXAMPLE 3.6.2

A random variable Y is described by the pdf

    f_Y(y) = 2y,    0 < y < 1

What is the standard deviation of 3Y + 2?

First, we need to find the variance of Y. But

    E(Y) = ∫_0^1 y · 2y dy = 2/3

and

    E(Y²) = ∫_0^1 y² · 2y dy = 1/2

so

    Var(Y) = E(Y²) − [E(Y)]² = 1/2 − (2/3)² = 1/18
Unable to justify choosing """""l'\f1,npl director decides to select the at random. Let X denote the number of men hired. Compute the standard deviation of X. 3.6.4. C...omput.e the variance for a uniform random variable defined On the unit interval. 3.6..5. Use Theorem 3.6.1 to find the variance of the random variable Y, where jy{y) = 3(1 - y)2, 0 < y < 1 3.6.6.. If jy(y) 2y k2 ' O:::::y<k for what value of k does Var(Y) 2? 3.6.7. Ca1culate the standard deviation, (f, for the random variable Y whose pdf has the graph shown below: o 1 2 3 y _ ________L __ _ _ _ _ _ _ _ ~ b -_ _ _ _ _ _ - I 2 L Chapter 3 ~ '98 Section 3.6 The Variaru:e 199 3.6.8. Consider the pdf defined by fy(y) = 2 = y3' y?:. 1 = Show that (a)1. frey) dy 1, (b)E(Y) 2, and (c) Var(Y) is not finite. 3.6.9. Frankie and Johnny play the following game. Frankie selects a number at random from the interval [a, h]. Johnny, not knowing Frankie's number, is to pick a second number from that same inverval and pay Frankie an amount. W. equal to the squared difference between the two [so 0 ~ W ~ (h - a)2]. What should be Johnny's strategy if he wants to minimize his expected loss? 3.6.10. Let Y be a random variable whose pdf is given by fy (y) = Sy4. 0 .::: y .::::: 1. Use Theorem 3.6.1 to lind Var(Y). 3.6.11. Suppose that Y is an exrmential random variable. so fy (y) = u- AY , Y ?:. O. Show that the variance of Y is 1/)" . 3.6.12. Suppose that Y is an exponential random variable with)" = 2 (recall Question 3.6.11). Find P(Y :> E(Y) + 2JVar(Y». 3.6.13. Let X be a random variable with finite mean /1. Define for every real number a. g(a) = E[(X - a)2]. Show that g(a) = E[(X - /.Ii] + (/1 - a)2. What is another name for min g(a)? o 3.6.14. Let Y have the pdf given in Question 3.6.5. Find the variance of W, where W = -5Y + 12. 3.6.15.. If Y denotes a temperature recorded in degrees Fahrenheit, then ~(Y - 32) is the corresponding temperature in degrees Celsius. 
If the standard deviation for a set of temperatures is 15.7°P, what is the standard deviation of the equivalent Celsius temperatures? 3.6.16. If E(W) = /1 and Var(W) = 17 2, show that E (W-/1) = 0 17 anti 3.6.17. Suppose U is a uniionn random variabJe over [0, 1]. (9) Show that Y = (b - a)U + a is unifonn over [a, b] (b) Use Part (a) and Question 3.6.4 to find the variance of Y. Higher Moments The quantities we have identified as the mean and the variance are actually special cases of what are referred to more generally as the moments of a random variable. More precisely, E(W) is the first moment about the origin and (12 is the second moment about the mean. As the terminology suggests, we will have occasion to define higher moments of W. Just as E(W) and (12 reflect a random variable's location and dispersion, so it is possible to characterize other aspects of a distribution in terms of other moments. We will see, for example, that the skewness of a distribution-that is, the extent to which it is not symmetric around /1--can be effectively measured in terms of a third moment. Ukewise, there are issues that arise in certain applied statistics problems that require a knowledge of the flatness of a pdf, a property that can be quantified by the fourth moment. 200 Chapter 3 Random Variables Definition 3.6.2. Let W be any random variable with pdf fw(w), t<'or any positive m[eg(~r r, rth moment of W 1. provided tty, is given by the ' fw(w) dw < 00 (or provided the analogous condition on the SUl1urlou'on of Iwl holds, if W is When r = 1, we usually ,hlzr-M,nt and write E (W) as /-l rather than /-lI. Z. The rlh moment of W about the mean, IL~, is by r the provided the finiteness conditions of part 1 hold. Comment. We can the binomial expansion of (W - in terms of p.j. j = 1,2, ...• Y, by simply writing out = E[(W - J.li] = =E[(W - p.)3J = J.l3 = E[(W - tt)dJ = /-l4 - and so on, EXAMPlE 3.6.3 The skewness of a pdf can be measured in terms of its moment about the mean. 
It a pdf is symmetric, J.l)3] will obviously be zero; for pdfs not symmetric. E[(W - p.)3] will not be zero. In the symmetry (or ofapdfis often measured by the of skewness, Yt, where Yt = ----::----"- by 0"3 makes Yl Q]l1ner'lS1(mlC~SS. "shape" parameter in common use is the coefficient of kunosis, Yl, which ....",,",1<,,,.., the fourth moment about the mean. Specifically, 8e(:Dt1ld 3 Section 3.6 The Variance For certain pdf's, Y2 is a lliIeful measure of peakedne1iS: relatively.flat platykwtic; more peaked pdf's are calIed leptokurtic [see (97)]. 201 are said to Earlier in this chapter we encountered random variables whose means did not existrecall, example, the St Petersburg paradox. More generally. there are random variables having certain of higher moments finite and others, not finite. Addressing the question whether or not a given E (Wi) is finite is the following existence theorem. If the kth moment of a random variable Theorem thank all moments of order less Proof. Let frey) the pdf of a continuous ranaolm variable Y. E(Y") exists if and only if J': Definition Iylk . frey) dy < co (3.6.2) the theorem we must show that Let 1::::: j < k. is implied by Inequality 3.6.2. But 1 00 Iyli . frey) dy::= ( IYli. fy{y) dy "-yl:::l -00 ::; ( fr(y) dy 1y I::: 1 1 ::::: 1 + ( + + ( 1 11yl >1 lylJ· hey) dy Iyli. frey) dy IYI> 1 Iyli. hey) dy J'YI>l ::::: 1 + ( Iylk. hey) dy < co AYI>l Therefore, E(yJ) is similar. j = 1,2, ...• k - 1. The for discrete random variables EXAMPlE 3.6.4 Many of the random variables that playa major role in statistics have moments existing for alL k, as does, for instance. the normal distribution introduced in Example 3.4.3. Still, it is not difficult to find wen-known models for which this is not true. A case in point is the Student t distribution, a probability function widely used in procedures. (See Chapter 7.) The pdf for a Student I random variable is given by frey) c(n) = _2. 
( 1 + ~) (11+1)/2 ,-co < y < co, n 2:: 1 202 Chapter 3 Random Variables where n is By definition, to as the distribution's "degrees of freedom" and c(/'I) is a constant (2k)th moment is the integral =c(n) . i: ( y~ 1 Is E(y2k) finite? Not neces.,\arily. Recall from calculus ("+1)/2 dy +L n aD integral of the form will Wl1vt:rge unly if a > 1. Abu, tht:: cuovefgence for integrals of are the same as Therefore, if E(y2k) is to we must have n or, equivalently, 2k < lreea~:)m has £(X8) < /'I. 00, +1 2k > 1 Thus a Student t random ""'.... "'n. with, say, n = 9 degrees of but no moment of order than eight exists. QUESTIONS 3.6.18. Let Y be a uniform random variable defined over the interval (0,2). Find an expression for the. rth mome.nt of Y about the origin. A Iso, lI~F, thF. hinomil'll expansion as descrihed in the comment to find - Ji)6J. 3.6.19. Find the skewness for an exponential random variable the pdf 3.6.20. Calculate the coefficient 'of kurtosis for a uniform random variable over the unit Jy(y) 1, 0 ::: y ::: 1. 3.6.21. Suppose that W is a random variable for which E[(W - J1.)3] = 10 and E(W3) = 4. Is it possible that Ji = = Section 3.7 Joint rl9'...."'1riP< 203 3.6.22. If Y = aX + b, show that Y has the same coefficients of skewness kurtosis as X. 3.6.23. Let Y be the random variable of Questlon where for a integer II, fy (J) (II + 2){n + l)y"(1 0 ~)' ~ 1. (a) Find Var(Y) (b) For any positive integer k, find the kth moment around the origin. 3.6.14.. L>UI",I-".'~ that the random variable Y is described by the pdf y > I fy(y)=c· (0) Find c. (b) What is the highest moment Y that exists? JOINT DENSITIES Sectlons 3.3. 3.4 introduced the basic terminology for descrihing the probabilistic behavior of a single random variable. Such information, while adequate [or many problems, is insufficient when more than one variable is of interest to the experimenter. 
Medical researcbers, example, continue to the relationship between blood cholesterol and heart disease, and, more recently, between "good" cholesterol and "bad" cholesterol. And more than a attention--both politkal and given to the role played by K -12 funding in the performance of would-be high graduates on exit exams. On a smaller scale, electronic eq uipment and systems are often designed to have built-in redundancy: Whether or not that equipment functions properly ultimately depends on the reliability of two different components. point are many situations where two relevant random variables, X and y,2 are defined on the same space. Knowing only /x(':x} and /y(y), though, does not necessarily provide enough information to characterize the all-important simultaneous behavior of X and Y. purpose of section is to introduce the concepts, definitions, and mathematical techniques associated with distributions based on two (or more) random variables. Discrete Joint Pdfs As we saw in the single-variable case, the pdf is defined differently, depending on whether tile random variable is discrete or continuous. The same distinction applies to joint We begin with a discussion pdfs as apply to two random Definition 3.7_1. Suppose S is a discrete sample space on which two random variables, X y, are defined. The joint probability densily junction of X and Y (or joint pdf) is denoted PX,y(x, y), wbere px.y(x,y) P({sIX(s) =X and yes) .v)) 2For the next several sections we will suspend our earlier practice of using X to denote a discrete random variable and Y to denole a continuous random variable. The category of the random variables will need to be del ermined from (he context ollhe problem. Typially. though. X and Y will either both be discrete OT both. be continuous.. 204 Chapter 3 Random Variables Comment. 
Comment. A convenient shorthand notation for the meaning of p_X,Y(x, y), consistent with what was used earlier in the case of single discrete random variables, is to write

    p_X,Y(x, y) = P(X = x, Y = y)

EXAMPLE 3.7.1

A supermarket has two express lines. Let X and Y denote the number of customers in the first and in the second, respectively, at any given time. During nonrush hours, the joint pdf of X and Y is given by the following table:

         y
    x    0       1       2       3
    0    0.1     0.2     0       0
    1    0.2     0.25    0.05    0
    2    0       0.05    0.05    0.025
    3    0       0       0.025   0.05

Find P(|X − Y| = 1), the probability that X and Y differ by exactly one.

By definition,

    P(|X − Y| = 1) = p_X,Y(0, 1) + p_X,Y(1, 0) + p_X,Y(1, 2) + p_X,Y(2, 1) + p_X,Y(2, 3) + p_X,Y(3, 2)
                   = 0.2 + 0.2 + 0.05 + 0.05 + 0.025 + 0.025
                   = 0.55

[Would you expect p_X,Y(x, y) to be symmetric? Would you expect the event |X − Y| ≥ 2 to have zero probability?]

EXAMPLE 3.7.2

Suppose two fair dice are rolled. Let X be the sum of the numbers showing, and let Y be the larger of the two. So, for example,

    p_X,Y(2, 3) = P(X = 2, Y = 3) = P(∅) = 0
    p_X,Y(4, 3) = P(X = 4, Y = 3) = P({(1, 3), (3, 1)}) = 2/36

and

    p_X,Y(6, 3) = P(X = 6, Y = 3) = P({(3, 3)}) = 1/36

The entire joint pdf is given in Table 3.7.1.
Note that the collection of sets (Y = y), for all y, forms a partition of S; that is, the sets are disjoint and ∪_{all y}(Y = y) = S. Then

    (X = x) = (X = x) ∩ S = (X = x) ∩ [∪_{all y}(Y = y)] = ∪_{all y}[(X = x) ∩ (Y = y)]

so

    p_X(x) = P(X = x) = P(∪_{all y}[(X = x) ∩ (Y = y)]) = Σ_{all y} P(X = x, Y = y) = Σ_{all y} p_{X,Y}(x, y)  □

Definition 3.7.2. An individual pdf obtained by summing a joint pdf over all values of the other random variable is called a marginal pdf.

Continuous Joint Pdfs

If X and Y are both continuous random variables, Definition 3.7.1 does not apply, because P(X = x, Y = y) will be identically 0 for all (x, y). As was the case in the single-variable situation, the joint pdf for two continuous random variables will be defined as a function which, when integrated, yields the probability that (X, Y) lies in a given region of the xy-plane.

Definition 3.7.3. Two random variables defined on the same set of real numbers are jointly continuous if there exists a function f_{X,Y}(x, y) such that for any region R in the xy-plane,

    P((X, Y) ∈ R) = ∬_R f_{X,Y}(x, y) dx dy

The function f_{X,Y}(x, y) is the joint pdf of X and Y.

Note: Any function f_{X,Y}(x, y) for which

1. f_{X,Y}(x, y) ≥ 0 for all x and y
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1

qualifies as a joint pdf. We shall employ the convention of specifying the region where the pdf is nonzero; everywhere else it will be assumed to equal zero. This is analogous, of course, to the notation used earlier in describing the domain of single random variables.

EXAMPLE 3.7.3

Suppose that the variation in two continuous random variables, X and Y, can be modeled by the joint pdf f_{X,Y}(x, y) = cxy, for 0 < y < x < 1. Find c.

By inspection, f_{X,Y}(x, y) will be non-negative as long as c ≥ 0. The c that qualifies f_{X,Y}(x, y) to be a pdf, though, is the one that makes the volume under the surface equal to 1:

    1 = ∫₀¹ ∫₀ˣ cxy dy dx = c ∫₀¹ x (y²/2 |₀ˣ) dx = c ∫₀¹ (x³/2) dx = c (x⁴/8 |₀¹) = c/8

so c = 8.

EXAMPLE 3.7.4

A researcher claims that the daily number of hours, X, a teenager watches television and the daily number of hours, Y, the teenager works on homework are approximated by the joint pdf

    f_{X,Y}(x, y) = xye^{−(x+y)},  x > 0, y > 0

What is the probability that a teenager chosen at random spends at least twice as much time watching television as working on homework? The region R, corresponding to the event "X ≥ 2Y," is shown in Figure 3.7.1.
The probability in question is the volume under f_{X,Y}(x, y) above the region R:

[FIGURE 3.7.1: the region R = {(x, y): x > 0, y > 0, x ≥ 2y}.]

    P(X ≥ 2Y) = ∫₀^∞ ∫₀^{x/2} xye^{−(x+y)} dy dx

Separating variables, we can write

    P(X ≥ 2Y) = ∫₀^∞ xe^{−x} (∫₀^{x/2} ye^{−y} dy) dx

and the double integral reduces to

    P(X ≥ 2Y) = 1 − 4/9 − 8/27 = 7/27

Geometric Probability

One particularly important special case of Definition 3.7.3 is the joint uniform pdf, which is represented by a surface having a constant height everywhere above a specified rectangle in the xy-plane. That is,

    f_{X,Y}(x, y) = 1/[(b − a)(d − c)],  a ≤ x ≤ b, c ≤ y ≤ d

If R is some region in the rectangle where X and Y are defined, then P((X, Y) ∈ R) reduces to a simple ratio of areas:

    P((X, Y) ∈ R) = (area of R)/[(b − a)(d − c)]    (3.7.1)

Calculations based on Equation 3.7.1 are referred to as geometric probabilities.

EXAMPLE 3.7.5

Two friends agree to meet on the University Commons "sometime around 12:30." But neither of them is particularly punctual, or patient. What will actually happen is that each will arrive at random sometime in the interval from 12:00 to 1:00. If one arrives and the other is not there, the first person will wait fifteen minutes or until 1:00, whichever comes first, and then leave. What is the probability that the two will get together?

To simplify notation, we can represent the time period from 12:00 to 1:00 as the interval from zero to sixty minutes. Then if x and y denote the two arrival times, the sample space is the 60 × 60 square shown in Figure 3.7.2. Furthermore, the event M, "the two friends meet," will occur if and only if |x − y| ≤ 15 or, equivalently, if and only if −15 ≤ x − y ≤ 15. Those inequalities appear as the shaded region in Figure 3.7.2.

[FIGURE 3.7.2: the 60 × 60 square with the band |x − y| ≤ 15 shaded.]

Notice that the areas of the two triangles above and below M are each equal to (1/2)(45)(45). It follows that the two friends have a 44% chance of meeting:

    P(M) = (area of M)/(area of S) = [(60)² − 2 · (1/2)(45)(45)]/(60)² = 0.44

EXAMPLE 3.7.6

A carnival operator wants to set up a ringtoss game. Players will throw a ring of diameter d onto a grid of squares, the side of each square being of length s (see Figure 3.7.3).
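Geometric probabilities like the one in Example 3.7.5 are easy to check by simulation. The sketch below (Python is our choice here, not the text's; the seed and sample size are arbitrary) draws the two arrival times uniformly on (0, 60) and counts how often they fall within fifteen minutes of one another:

```python
import random

random.seed(0)
n = 100_000

# Draw both arrival times uniformly on (0, 60) minutes past noon;
# the friends meet whenever the two arrivals differ by at most 15 minutes.
meets = sum(abs(random.uniform(0, 60) - random.uniform(0, 60)) <= 15
            for _ in range(n))

print(round(meets / n, 2))   # close to 1575/3600 = 0.4375
```

The analytic answer, 1575/3600 = 0.4375, comes straight from the ratio of areas in Equation 3.7.1.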
If the ring lands entirely inside a square, the player wins a prize. To ensure a profit, the operator must keep the player's chances of winning down to something less than one in five. How small can the operator make the ratio d/s?

First, it will be assumed that the player is required to stand far enough away so that no skill is involved and the ring is falling at random on the grid. From Figure 3.7.4, we see that in order for the ring not to touch any side of the square, the ring's center must be somewhere in the interior of a smaller square, each side of which is a distance d/2 from one of the grid lines.

[FIGURE 3.7.4: a grid square of side s containing the interior square of side s − d.]

Since the area of a grid square is s² and the area of an interior square is (s − d)², the probability of a winning toss can be written as the ratio

    P(ring touches no lines) = (s − d)²/s²

But the operator requires that

    (s − d)²/s² ≤ 0.20

Solving for d/s gives

    d/s ≥ 1 − √0.20 ≈ 0.55

That is, if the diameter of the ring is at least 55% as long as the side of one of the squares, the player will have no more than a 20% chance of winning.

QUESTIONS

3.7.1. If p_{X,Y}(x, y) = cxy at the points (1, 1), (2, 1), (2, 2), and (3, 1), and equals 0 elsewhere, find c.

3.7.2. Let X and Y be two continuous random variables defined over the unit square. What does c equal if f_{X,Y}(x, y) = c(x² + y²)?

3.7.3. Suppose that random variables X and Y vary in accordance with the joint pdf f_{X,Y}(x, y) = c(x + y), 0 < x < y < 1. Find c.

3.7.4. Find c if f_{X,Y}(x, y) = cxy for X and Y defined over the triangle whose vertices are the points (0, 0), (0, 1), and (1, 1).

3.7.5. An urn contains four red chips, three white chips, and two blue chips. A random sample of size 3 is drawn without replacement. Let X denote the number of white chips in the sample and Y the number of blue. Write a formula for the joint pdf of X and Y.

3.7.6. Four cards are drawn from a standard poker deck. Let X be the number of kings drawn and Y the number of queens. Find p_{X,Y}(x, y).

3.7.7. An advisor looks over
the schedules of his 50 students to see how many math and science courses each has registered for in the coming semester. He summarizes his results in a table. What is the probability that a student selected at random will have signed up for more math courses than science courses?

    Number of science       Number of math courses, X
    courses, Y                0      1      2
        2                    11      6      4
        1                     1      9     10
        0                     3      2      5

3.7.8. Consider the experiment of tossing a fair coin three times. Let X denote the number of heads on the last flip, and let Y denote the total number of heads on the three flips. Find p_{X,Y}(x, y).

3.7.9. Suppose that two fair dice are tossed one time. Let X denote the number of 2's that appear, and Y the number of 3's. Write the matrix giving the joint probability density function for X and Y. Suppose a third random variable, Z, is defined, where Z = X + Y. Use p_{X,Y}(x, y) to find p_Z(z).

3.7.10. Suppose that X and Y have a bivariate uniform density over the unit square:

    f_{X,Y}(x, y) = c, 0 < x < 1, 0 < y < 1;  0, elsewhere

(a) Find c.
(b) Find P(0 < X < 1/2, 0 < Y < 1/4).

3.7.11. Let X and Y have the joint pdf f_{X,Y}(x, y) = 2e^{−(x+y)}, 0 < x < y. Find P(Y < 3X).

3.7.12. A point is chosen at random from the interior of the circle x² + y² = 4. Let the random variables X and Y denote the x- and y-coordinates of the sampled point. Find f_{X,Y}(x, y).

3.7.13. Find P(X < 2Y) if f_{X,Y}(x, y) = x + y for X and Y each defined over the unit interval.

3.7.14. Suppose that five independent observations are drawn from the continuous pdf f_T(t) = 2t, 0 ≤ t ≤ 1. Let X denote the number of t's that fall in the interval 0 ≤ t < 1/2, and let Y denote the number of t's that fall in the interval 1/2 ≤ t < 3/4. Find p_{X,Y}(1, 2).

3.7.15. A point is chosen at random from the interior of a right triangle with base b and height h. What is the probability that the y value is between 0 and h/2?

Marginal Pdfs for Continuous Random Variables

The notion of marginal pdfs in connection with discrete random variables was introduced in Theorem 3.7.1 and Definition 3.7.2.
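Before moving on, the discrete marginal relationship of Theorem 3.7.1 can be illustrated by enumerating the two-dice joint pdf of Example 3.7.2 and summing across rows and columns. The sketch below is ours (Python, with exact fractions chosen for clarity):

```python
from fractions import Fraction
from collections import defaultdict

# Enumerate the 36 equally likely rolls; X is the sum, Y the larger number.
joint = defaultdict(lambda: Fraction(0))
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, max(d1, d2))] += Fraction(1, 36)

# Marginal pdfs, per Theorem 3.7.1: sum the joint pdf over the other variable.
p_X = defaultdict(lambda: Fraction(0))
p_Y = defaultdict(lambda: Fraction(0))
for (x, y), prob in joint.items():
    p_X[x] += prob
    p_Y[y] += prob

print([str(p_Y[y]) for y in range(1, 7)])  # column totals of Table 3.7.1 (in lowest terms)
print(str(p_X[7]))                         # 1/6, the familiar P(sum = 7)
```

The sums reproduce the row and column totals of Table 3.7.1, which is exactly what Theorem 3.7.1 promises.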
An analogous relationship holds in the continuous case; integration, though, replaces the summation that appears in Theorem 3.7.1.

Theorem 3.7.2. Suppose X and Y are jointly continuous with joint pdf f_{X,Y}(x, y). Then the marginal pdfs, f_X(x) and f_Y(y), are given by

    f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy   and   f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx

Proof. It suffices to verify the first of the theorem's two equalities. As is often the case with proofs for continuous random variables, we begin with the cdf:

    F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} ∫_{−∞}^{∞} f_{X,Y}(t, y) dy dt

Differentiating both ends of the equation above gives

    f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy

(recall Theorem 3.4.1). □

EXAMPLE 3.7.7

Suppose that two continuous random variables, X and Y, have the joint uniform pdf

    f_{X,Y}(x, y) = 1/6,  0 ≤ x ≤ 3, 0 ≤ y ≤ 2

Find f_X(x).

Applying Theorem 3.7.2 gives

    f_X(x) = ∫₀² f_{X,Y}(x, y) dy = ∫₀² (1/6) dy = 1/3,  0 ≤ x ≤ 3

Notice that X, by itself, is a uniform random variable defined over the interval [0, 3]; similarly, we would find that f_Y(y) is a uniform pdf over the interval [0, 2].

EXAMPLE 3.7.8

Consider the case where X and Y are two continuous random variables, jointly distributed over the first quadrant of the xy-plane according to the joint pdf

    f_{X,Y}(x, y) = y²e^{−y(x+1)},  x ≥ 0, y ≥ 0

Find the two marginal pdfs.

Consider first f_X(x). By Theorem 3.7.2,

    f_X(x) = ∫₀^∞ y²e^{−y(x+1)} dy

In the integrand, substitute u = y(x + 1), making du = (x + 1) dy. After applying the substitution, we get

    f_X(x) = [1/(x + 1)³] ∫₀^∞ u²e^{−u} du

Integrating by parts (twice) shows that ∫₀^∞ u²e^{−u} du = 2, so

    f_X(x) = 2/(x + 1)³,  x ≥ 0

Finding f_Y(y) is a bit easier:

    f_Y(y) = ∫₀^∞ y²e^{−y(x+1)} dx = y²e^{−y} ∫₀^∞ e^{−yx} dx = y²e^{−y}(1/y) = ye^{−y},  y ≥ 0

QUESTIONS

3.7.16. Find the marginal pdf of X for the joint pdf derived in Question 3.7.5.

3.7.17. Find the marginal pdfs of X and Y for the joint pdf derived in Question 3.7.8.

3.7.18. The campus recruiter
for an international conglomerate classifies the large number of students she interviews into three categories: the lower quarter, the middle half, and the upper quarter. If she meets six students on a given day, what is the probability that they will be divided evenly among the three categories? What is the marginal probability that exactly two will belong to the middle half?

3.7.19. For each of the following joint pdfs, find f_X(x) and f_Y(y).
(a) f_{X,Y}(x, y) = …, 0 ≤ x ≤ 2, 0 ≤ y ≤ 1
(b) f_{X,Y}(x, y) = …, 0 ≤ x ≤ 2, 0 ≤ y ≤ 1
(c) f_{X,Y}(x, y) = (2/3)(x + 2y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
(d) f_{X,Y}(x, y) = c(x + y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
(e) f_{X,Y}(x, y) = 4xy, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
(f) f_{X,Y}(x, y) = xye^{−(x+y)}, 0 ≤ x, 0 ≤ y
(g) f_{X,Y}(x, y) = ye^{−xy−y}, 0 ≤ x, 0 ≤ y

3.7.20. For each of the following joint pdfs, find f_X(x) and f_Y(y).
(a) f_{X,Y}(x, y) = …, 0 ≤ x ≤ y ≤ 2
(b) f_{X,Y}(x, y) = …, 0 ≤ y ≤ x ≤ 1
(c) f_{X,Y}(x, y) = 6x, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 − x

3.7.21. Suppose that f_{X,Y}(x, y) = 6(1 − x − y) for x and y defined over the unit square, subject to the restriction that 0 ≤ x + y ≤ 1. Find the marginal pdf for X.

3.7.22. Find f_Y(y) if f_{X,Y}(x, y) = … for x and y defined over the shaded region pictured.

[FIGURE: a shaded region in the xy-plane.]

3.7.23. Suppose that X and Y are discrete random variables with

    p_{X,Y}(x, y) = [4!/(x! y! (4 − x − y)!)] (1/2)ˣ (1/3)ʸ (1/6)^{4−x−y},  0 ≤ x + y ≤ 4

Find p_X(x) and p_Y(y).

3.7.24. A generalization of the binomial model occurs when there is a sequence of n independent trials, each with three possible outcomes, where p₁ = P(Outcome 1) and p₂ = P(Outcome 2). Let X and Y denote the number of trials (out of n) resulting in Outcome 1 and Outcome 2, respectively.
(a) Show that

    p_{X,Y}(x, y) = [n!/(x! y! (n − x − y)!)] p₁ˣ p₂ʸ (1 − p₁ − p₂)^{n−x−y},  0 ≤ x + y ≤ n

(b) Find p_X(x) and p_Y(y). Hint: See Question 3.7.23.

Joint Cdfs

For a single random variable X, the cdf of X evaluated at some point x, F_X(x) = P(X ≤ x), is the probability that the random variable X takes on a value less than or equal to x. Extended to two variables, a joint cdf (evaluated at the point (u, v)) is the probability that X ≤ u and, simultaneously, Y ≤ v.

Definition 3.7.4.
Let X and Y be two random variables. The joint cumulative distribution function of X and Y (or joint cdf) is denoted F_{X,Y}(u, v), where

    F_{X,Y}(u, v) = P(X ≤ u and Y ≤ v)

EXAMPLE 3.7.9

Find the joint cdf, F_{X,Y}(u, v), for the two random variables X and Y whose joint pdf is given by f_{X,Y}(x, y) = (4/3)(x + xy), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.

If Definition 3.7.4 is applied, the probability that X ≤ u and Y ≤ v becomes a double integral of f_{X,Y}(x, y):

    F_{X,Y}(u, v) = (4/3) ∫₀ᵛ ∫₀ᵘ (x + xy) dx dy = (4/3) ∫₀ᵛ (∫₀ᵘ x(1 + y) dx) dy
                  = (4/3) ∫₀ᵛ (u²/2)(1 + y) dy = (2u²/3)(v + v²/2)

which simplifies to

    F_{X,Y}(u, v) = (1/3)u²(2v + v²)

(For what values of u and v is F_{X,Y}(u, v) defined?)

Theorem 3.7.3. Let F_{X,Y}(u, v) be the joint cdf associated with the continuous random variables X and Y. Then the joint pdf of X and Y, f_{X,Y}(x, y), is a second partial derivative of the joint cdf; that is,

    f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y)/∂x∂y

provided F_{X,Y}(x, y) has continuous second partial derivatives.

EXAMPLE 3.7.10

What is the joint pdf of the random variables X and Y whose joint cdf is F_{X,Y}(x, y) = (1/3)x²(2y + y²)? By Theorem 3.7.3,

    f_{X,Y}(x, y) = ∂²/∂x∂y [(1/3)x²(2y + y²)] = ∂/∂x [(1/3)x²(2 + 2y)] = (2/3)x(2 + 2y) = (4/3)(x + xy)

Note the similarity between this example and Example 3.7.9: f_{X,Y}(x, y) = (4/3)(x + xy) is the same in both, and so is F_{X,Y}(x, y).

Multivariate Densities

The definitions and theorems in this section extend in a very straightforward way to situations involving more than two variables. The joint pdf for n discrete random variables, for example, is denoted p_{X₁,...,Xₙ}(x₁, ..., xₙ), where

    p_{X₁,...,Xₙ}(x₁, ..., xₙ) = P(X₁ = x₁, ..., Xₙ = xₙ)

For n continuous random variables, the joint pdf is the function f_{X₁,...,Xₙ}(x₁, ..., xₙ) having the property that for any region R in n-space,

    P((X₁, ..., Xₙ) ∈ R) = ∫ ⋯ ∫_R f_{X₁,...,Xₙ}(x₁, ..., xₙ) dx₁ ⋯ dxₙ

And if F_{X₁,...,Xₙ}(x₁, ..., xₙ) = P(X₁ ≤ x₁, ..., Xₙ ≤ xₙ) is the joint cdf of the continuous random variables X₁, ..., Xₙ, then
    f_{X₁,...,Xₙ}(x₁, ..., xₙ) = ∂ⁿF_{X₁,...,Xₙ}(x₁, ..., xₙ)/∂x₁ ⋯ ∂xₙ

The notion of a marginal pdf also extends readily, although in the n-variate case a marginal pdf can, itself, be a joint pdf. Given X₁, ..., Xₙ, the marginal pdf of any subset of r of those variables (X_{i₁}, X_{i₂}, ..., X_{i_r}) is derived by integrating (or summing) the joint pdf with respect to the remaining n − r variables (X_{j₁}, X_{j₂}, ..., X_{j_{n−r}}). If the Xᵢ's are all continuous, for example,

    f_{X_{i₁},...,X_{i_r}}(x_{i₁}, ..., x_{i_r}) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f_{X₁,...,Xₙ}(x₁, ..., xₙ) dx_{j₁} ⋯ dx_{j_{n−r}}

QUESTIONS

3.7.25. Consider the experiment of simultaneously tossing a coin and rolling a die. Let X denote the number of heads showing on the coin and Y the number showing on the die.
(a) List the outcomes in S.
(b) Find F_{X,Y}(1, …).

3.7.26. An urn contains 12 chips: 4 red, 3 black, and 5 white. A sample of size 4 is to be drawn without replacement. Let X denote the number of white chips in the sample and Y the number of red. Find F_{X,Y}(1, 2).

3.7.27. For each of the following joint pdfs, find F_{X,Y}(u, v).
(a) f_{X,Y}(x, y) = (3/2)y², 0 ≤ x ≤ 2, 0 ≤ y ≤ 1
(b) f_{X,Y}(x, y) = (2/3)(x + 2y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
(c) f_{X,Y}(x, y) = 4xy, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1

3.7.28. For each of the following joint pdfs, find F_{X,Y}(u, v).
(a) f_{X,Y}(x, y) = …, 0 ≤ x ≤ y ≤ 2
(b) f_{X,Y}(x, y) = …, 0 ≤ y ≤ x ≤ 1
(c) f_{X,Y}(x, y) = 6x, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 − x

3.7.29. Find and graph f_{X,Y}(x, y) if the joint cdf for random variables X and Y is

    F_{X,Y}(x, y) = xy,  0 < x < 1, 0 < y < 1

3.7.30. Find the joint pdf associated with two random variables X and Y whose joint cdf is F_{X,Y}(x, y) = …, x > 0, y > 0.

3.7.31. Given that F_{X,Y}(x, y) = …(… + 5xy⁴), 0 < x < 1, 0 < y < 1, find the corresponding pdf and use it to calculate P(0 < X < 1/2, 1/2 < Y < 1).

3.7.32. Prove that

    P(a < X ≤ b, c < Y ≤ d) = F_{X,Y}(b, d) − F_{X,Y}(a, d) − F_{X,Y}(b, c) + F_{X,Y}(a, c)

3.7.33. A certain brand of fluorescent bulbs will last, on the average, 1000 hours. Suppose that four of these bulbs are installed in an office. What is the probability that all four are still functioning after 1050 hours? If Xᵢ denotes the ith bulb's lifetime, assume that

    f_{X₁,X₂,X₃,X₄}(x₁, x₂, x₃, x₄) = ∏_{i=1}^{4} (1/1000) e^{−xᵢ/1000},  xᵢ > 0, i = 1, 2, 3, 4
3.7.34. A hand of six cards is dealt from a standard poker deck. Let X denote the number of aces, Y the number of kings, and Z the number of queens.
(a) Write a formula for p_{X,Y,Z}(x, y, z).
(b) Find p_{X,Y}(x, y) and p_{X,Z}(x, z).

3.7.35. Calculate p_{X,Y}(0, 1) if p_{X,Y,Z}(x, y, z) = …, for x, y, z = 0, 1, 2, 3 and 0 ≤ x + y + z ≤ 3.

3.7.36. Suppose that the random variables X, Y, and Z have the multivariate pdf

    f_{X,Y,Z}(x, y, z) = …,  0 < x < 1, 0 < y < 1, z > 0

Find (a) f_{X,Y}(x, y), (b) f_{Y,Z}(y, z), and (c) f_Z(z).

3.7.37. The four random variables W, X, Y, and Z have the multivariate pdf

    f_{W,X,Y,Z}(w, x, y, z) = 16wxyz,  0 < w < 1, 0 < x < 1, 0 < y < 1, 0 < z < 1

Find the marginal pdf f_{W,X}(w, x) and use it to compute P(0 < W < 1/2, 1/2 < X < 1).

Independence of Two Random Variables

The concept of independent events that was introduced in Section 2.5 leads quite naturally to a similar definition for independent random variables.

Definition 3.7.5. Two random variables X and Y are said to be independent if for every interval A and every interval B,

    P(X ∈ A and Y ∈ B) = P(X ∈ A)P(Y ∈ B)

Theorem 3.7.4. The random variables X and Y are independent if and only if there are functions g(x) and h(y) such that

    f_{X,Y}(x, y) = g(x)h(y)    (3.7.2)

If Equation 3.7.2 holds, there is a constant k such that f_X(x) = kg(x) and f_Y(y) = (1/k)h(y).

Proof. We prove the theorem for the continuous case. The discrete case is similar.

First, suppose that X and Y are independent. Then

    F_{X,Y}(x, y) = P(X ≤ x and Y ≤ y) = P(X ≤ x)P(Y ≤ y) = F_X(x)F_Y(y)

and we can write

    f_{X,Y}(x, y) = ∂²/∂x∂y F_{X,Y}(x, y) = [d/dx F_X(x)][d/dy F_Y(y)] = f_X(x)f_Y(y)

Next we need to show that f_{X,Y}(x, y) = g(x)h(y) implies that X and Y are independent. To begin, note that

    f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = ∫_{−∞}^{∞} g(x)h(y) dy = g(x) ∫_{−∞}^{∞} h(y) dy

Set k = ∫_{−∞}^{∞} h(y) dy, so f_X(x) = kg(x). Similarly, it can be shown that f_Y(y) = (1/k)h(y).
Then

    P(X ∈ A and Y ∈ B) = ∫_B ∫_A f_{X,Y}(x, y) dx dy = ∫_B ∫_A g(x)h(y) dx dy
                       = ∫_B ∫_A kg(x) · (1/k)h(y) dx dy = ∫_A f_X(x) dx ∫_B f_Y(y) dy
                       = P(X ∈ A)P(Y ∈ B)

and the theorem is proved. □

EXAMPLE 3.7.11

Suppose that the probabilistic behavior of two random variables X and Y is described by the joint pdf f_{X,Y}(x, y) = 12xy(1 − y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Are X and Y independent? If they are, find f_X(x) and f_Y(y).

According to Theorem 3.7.4, the answer to the independence question is "yes" if f_{X,Y}(x, y) can be factored into a function of x times a function of y. But there are such functions: let g(x) = 12x and h(y) = y(1 − y). Finding f_X(x) and f_Y(y) requires that the "12" appearing in f_{X,Y}(x, y) be factored in such a way that g(x) · h(y) = f_X(x) · f_Y(y). Following the proof of Theorem 3.7.4,

    k = ∫_{−∞}^{∞} h(y) dy = ∫₀¹ y(1 − y) dy = [y²/2 − y³/3]₀¹ = 1/6

Therefore,

    f_X(x) = kg(x) = (1/6)(12x) = 2x,  0 ≤ x ≤ 1

and

    f_Y(y) = (1/k)h(y) = 6y(1 − y),  0 ≤ y ≤ 1

Independence of n (> 2) Random Variables

In Chapter 2, extending the notion of independence from two events to n events proved to be something of a challenge: the independence of every subset of the n events had to be established (recall Definition 2.5.2). This is not necessary in the case of n random variables. We simply use the extension of Equation 3.7.2 to n random variables as the definition of independence in the multidimensional case; a theorem equivalent to the factorization of the joint pdf holds there as well.

Definition 3.7.6. The n random variables X₁, X₂, ..., Xₙ are said to be independent if there are functions g₁(x₁), g₂(x₂), ..., gₙ(xₙ) such that for every x₁, x₂, ..., xₙ,

    f_{X₁,X₂,...,Xₙ}(x₁, x₂, ..., xₙ) = g₁(x₁)g₂(x₂) ⋯ gₙ(xₙ)

A similar statement holds for n discrete random variables, in which case f is replaced by p.

Analogous to the result for n = 2 random variables, the expression on the right-hand side of the equation in Definition 3.7.6 can also be written as the product of the marginal pdfs of X₁, X₂, ..., Xₙ.

EXAMPLE 3.7.12

Consider k urns, each holding n chips, numbered 1 through n. A chip is to be drawn at random from each urn. What is the probability that all k chips will bear the same number? If X₁, X₂, ..., X_k denote the numbers on the 1st, 2nd, ...,
and kth chips, respectively, we are looking for the probability that X₁ = X₂ = ⋯ = X_k. In terms of the joint pdf,

    P(X₁ = X₂ = ⋯ = X_k) = Σ_{i=1}^{n} p_{X₁,...,X_k}(i, i, ..., i)

Each of the selections here is obviously independent of all the others, so the joint pdf factors according to Definition 3.7.6, and we can write

    P(X₁ = X₂ = ⋯ = X_k) = Σ_{i=1}^{n} p_{X₁}(i) ⋯ p_{X_k}(i) = Σ_{i=1}^{n} (1/n)ᵏ = n(1/n)ᵏ = (1/n)^{k−1}

Random Samples

Definition 3.7.6 addresses the independence question as it applies to n random variables having marginal pdfs, say f_{X₁}(x₁), f_{X₂}(x₂), ..., f_{Xₙ}(xₙ), that might all be different. A special case of that definition occurs for virtually every set of data collected for statistical analysis: an experimenter records a set of n measurements, x₁, x₂, ..., xₙ, taken under the same conditions. Those xᵢ's, then, qualify as a set of independent random variables; moreover, each represents the same pdf. The familiar notation for that scenario is given in Definition 3.7.7. We will encounter it often in the chapters ahead.

Definition 3.7.7. Let X₁, X₂, ..., Xₙ be a set of n independent random variables, all having the same pdf. Then X₁, X₂, ..., Xₙ are said to be a random sample of size n.

QUESTIONS

3.7.38. Two dice are tossed. Let X denote the number appearing on the first die and Y the number on the second. Show that X and Y are independent.

3.7.39. Let f_{X,Y}(x, y) = λ²e^{−λ(x+y)}, 0 ≤ x, 0 ≤ y. Show that X and Y are independent. What are the marginal pdfs in this case?

3.7.40. Suppose that each of two urns has four chips, numbered 1 through 4. A chip is drawn from the first urn and bears the number X. That chip is added to the second urn. A chip is then drawn from the second urn. Call its number Y.
(a) Find p_{X,Y}(x, y).
(b) Show that p_X(k) = p_Y(k) = 1/4, k = 1, 2, 3, 4.
(c) Show that X and Y are not independent.

3.7.41. Let X and Y be random variables with joint pdf

    f_{X,Y}(x, y) = k,  0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ x + y ≤ 1

Give a geometric argument to show that X and Y are not independent.

3.7.42. Are the random variables X and Y independent if f_{X,Y}(x, y) = (2/3)(x + 2y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1?

3.7.43. Suppose that random variables X and Y are independent with marginal pdfs f_X(x) = …, 0 ≤ x ≤ 1, and f_Y(y) = 3y², 0 ≤ y ≤ 1. Find P(Y < X).

3.7.44.
Find the joint cdf of the independent random variables X and Y, where f_X(x) = x/2, 0 ≤ x ≤ 2, and f_Y(y) = …, 0 ≤ y ≤ 1.

3.7.45. If two random variables X and Y are independent with marginal pdfs f_X(x) = …, 0 ≤ x ≤ 1, and f_Y(y) = 1, 0 ≤ y ≤ 1, calculate P(Y/X > 2).

3.7.46. Suppose that f_{X,Y}(x, y) = …, x > 0, y > 0. Prove for any real numbers a, b, c, and d that

    P(a < X < b, c < Y < d) = P(a < X < b) · P(c < Y < d)

thereby establishing the independence of X and Y.

3.7.47. Given the joint pdf f_{X,Y}(x, y) = 2x + y − 2xy, 0 < x < 1, 0 < y < 1, find numbers a, b, c, and d such that

    P(a < X < b, c < Y < d) ≠ P(a < X < b) · P(c < Y < d)

thus demonstrating that X and Y are not independent.

3.7.48. Prove that if X and Y are two independent random variables, then U = g(X) and V = h(Y) are also independent.

3.7.49. If two random variables X and Y are defined over a region of the xy-plane that is not a rectangle (possibly infinite) with sides parallel to the coordinate axes, can X and Y be independent?

3.7.50. Write down the joint probability density function for a random sample of size n drawn from the exponential pdf, f_X(x) = (1/λ)e^{−x/λ}, x > 0.

3.7.51. Suppose that X₁, X₂, X₃, and X₄ are independent random variables, each with pdf f_{Xᵢ}(xᵢ) = 4xᵢ³, 0 ≤ xᵢ ≤ 1. Find
(a) P(X₁ < 1/2)
(b) P(exactly one Xᵢ < 1/2)
(c) f_{X₁,X₂,X₃,X₄}(x₁, x₂, x₃, x₄)
(d) F_{X₂,X₃}(x₂, x₃)

3.7.52. A random sample of size n = 2k is taken from a uniform pdf defined over the unit interval. Calculate P(X₁ < 1/2, X₂ > 1/2, X₃ < 1/2, X₄ > 1/2, ..., X_{2k} > 1/2).

3.8 COMBINING RANDOM VARIABLES

In Section 3.4, we worked with a linear transformation frequently applied to single random variables, Y = a + bX. Now, armed with the multivariable concepts and techniques covered in Section 3.7, we can extend the investigation of transformations to functions defined on sets of random variables. In practice, the most important combination of a set of random variables is often their sum, so we begin this section with the problem of finding the pdf of X + Y.

Finding the pdf of a Sum

Theorem 3.8.1.
Suppose that X and Y are independent random variables. Let W = X + Y. Then

1. If X and Y are discrete random variables with pdfs p_X(x) and p_Y(y), respectively,

    p_W(w) = Σ_{all x} p_X(x) p_Y(w − x)

2. If X and Y are continuous random variables with pdfs f_X(x) and f_Y(y), respectively,

    f_W(w) = ∫_{−∞}^{∞} f_X(x) f_Y(w − x) dx

Proof.

1.  p_W(w) = P(W = w) = P(X + Y = w) = P(∪_{all x}(X = x, Y = w − x))
           = Σ_{all x} P(X = x, Y = w − x) = Σ_{all x} P(X = x)P(Y = w − x)
           = Σ_{all x} p_X(x) p_Y(w − x)

where the next-to-last equality derives from the independence of X and Y.

2. Since X and Y are continuous random variables, we can find f_W(w) by differentiating the corresponding cdf, F_W(w). Here, F_W(w) = P(X + Y ≤ w) is found by integrating f_{X,Y}(x, y) = f_X(x) · f_Y(y) over the shaded region R pictured in Figure 3.8.1.

[FIGURE 3.8.1: the half-plane x + y ≤ w, shaded.]

    F_W(w) = ∫_{−∞}^{∞} ∫_{−∞}^{w−x} f_X(x) f_Y(y) dy dx = ∫_{−∞}^{∞} f_X(x) (∫_{−∞}^{w−x} f_Y(y) dy) dx
           = ∫_{−∞}^{∞} f_X(x) F_Y(w − x) dx

Assume that the integrand in the above equation is sufficiently smooth so that differentiation and integration can be interchanged. Then we can write

    f_W(w) = (d/dw) F_W(w) = (d/dw) ∫_{−∞}^{∞} f_X(x) F_Y(w − x) dx
           = ∫_{−∞}^{∞} f_X(x) [(d/dw) F_Y(w − x)] dx = ∫_{−∞}^{∞} f_X(x) f_Y(w − x) dx

and the theorem is proved. □

Comment. The integral in part (2) above is referred to as the convolution of the functions f_X and f_Y. Besides their frequent appearances in random-variable problems, convolutions turn up in many areas of mathematics and engineering.

EXAMPLE 3.8.1

Suppose that X and Y are two independent binomial random variables, each with the same success probability p but defined on m and n trials, respectively. Specifically,

    p_X(k) = C(m, k) pᵏ(1 − p)^{m−k},  k = 0, 1, ..., m

and

    p_Y(k) = C(n, k) pᵏ(1 − p)^{n−k},  k = 0, 1, ..., n

Find p_W(w), where W = X + Y.

By Theorem 3.8.1, p_W(w) = Σ_{all x} p_X(x) p_Y(w − x), but the summation over "all x" needs to be interpreted as the set of values for x and w − x such that p_X(x) and p_Y(w − x), respectively, are both nonzero. That will be true for all x from 0 to w.
Therefore,

    p_W(w) = Σ_{x=0}^{w} p_X(x) p_Y(w − x)
           = Σ_{x=0}^{w} C(m, x) pˣ(1 − p)^{m−x} C(n, w − x) p^{w−x}(1 − p)^{n−(w−x)}
           = pʷ(1 − p)^{n+m−w} Σ_{x=0}^{w} C(m, x) C(n, w − x)

Now, consider an urn containing m red chips and n white chips. If w chips are drawn without replacement, the probability that exactly x red chips are in the sample is given by the hypergeometric distribution,

    C(m, x) C(n, w − x) / C(m + n, w)    (3.8.1)

Summing Equation 3.8.1 from x = 0 to x = w must equal one (why?), so Σ_{x=0}^{w} C(m, x) C(n, w − x) = C(m + n, w), in which case

    p_W(w) = C(m + n, w) pʷ(1 − p)^{n+m−w},  w = 0, 1, ..., n + m

Should we recognize p_W(w)? Definitely. Compare the structure of p_W(w) to that of the binomial pdf in Section 3.2: the random variable W has a binomial distribution, where the probability of success at any given trial is p and the total number of trials is n + m.

Comment. Example 3.8.1 shows that the binomial distribution "reproduces" itself: if X and Y are independent binomial random variables with the same value for p, their sum is also a binomial random variable. Not all random variables share that property. The sum of two independent uniform random variables, for example, is not a uniform random variable (see Question 3.8.3).

EXAMPLE 3.8.2

Suppose a monitor relies on an electronic sensor, whose lifetime X is modeled by the exponential pdf f_X(x) = λe^{−λx}, x > 0. To improve the monitor's reliability, the manufacturer has included an identical second sensor that is activated only in the event the first sensor malfunctions. (This is called cold redundancy.) Let the random variable Y denote the lifetime of the second sensor, in which case the lifetime of the monitor can be written as the sum W = X + Y. Find f_W(w).
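Before the analytic derivation, a quick simulation sketch previews the effect of the second sensor (Python is our choice, and λ = 1 is an arbitrary value picked for the sketch):

```python
import random

random.seed(1)
lam = 1.0        # failure rate; an arbitrary value for this sketch
n = 200_000

# Monitor lifetime = first sensor's lifetime + cold-redundant second sensor's.
w = [random.expovariate(lam) + random.expovariate(lam) for _ in range(n)]

mean_w = sum(w) / n
print(round(mean_w, 1))      # close to 2/lam = 2.0, twice a single sensor's mean

# Chance the monitor outlives the mean life 1/lam of a single sensor:
frac = sum(wi > 1 / lam for wi in w) / n
print(round(frac, 3))        # close to 2/e ≈ 0.736
```

Both numbers are confirmed by the derivation that follows.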
A simple calculation of the former: Commet'lt. P(X ::::: 1/,\) pew : : : l/le) = ('0 At).. I~= roo Jl/,).. Finding the pdfs of Quotients and Products We conclude this section by the pdf!; for the quotient and product of two is, given X and Y, we are looking for fw(w), where independent random I) W = YI X and W= of the resulting formulas is as important as the pdf for the sum of two random variables, but both formulas will play key roles in several 7. derivations in Theorem 3.8.2. and fr (y), rpt'nprTIl W= YIX. Then independent continuous random variables, with pdfs f x (x) /hat X is zero for at most a set of isolated points. LeI .n..'~'LUT"" fw(w) = J: IxIJx(x)fy(wx)dx 224 Olapter 3 Random Variables Fw(w) = P(Y IX :s w) and X;:::O)+P(YIX::::,w and X<O) = P(Y :s wX and X?': 0) + P(Y::::, wX and X < 0) = P(Y :s wX and X?': 0) + 1 - P(Y :s wX and X < 0) =P(YIX:sw = fooo i: /x(x)fy(y)dydx + 1- i:i: fx(x)fy(y)dydx Then differentiate Fw{w) to obtain li OO fW(W)=-d d Fw(w)= d w = fooo /x(x) (d~ i: 0 11 0 wX /x{x)fr(y) -00 fy(y) d Y) dx - dx - i: d -00 fx(x) d wX fx(x)fy(y) dy dx -00 f~ frey) dY) dx (3.8.3) (Note that we are to of integration and rht1t",.._,..."t,«"ttl"\n proceed, we need to differentiate the function G(w) = f::;'Jy(y) dy with respect to w. By the Fundamental Theorem of Calculus and the chain rule, we find d G(w) -d w = -d dw 1 wX fy(y)dy -00 d wx =xfy(wx) = fr(wx)-d w Putting this result into _-.. ___ '.~_. 3.83 gives 1 00 fw(w) = xfx(x)Jy(wx)dx = foco xfx (x) fy (wx) dx + 1 f': xfx(x)fy(wx)dx f : (-x)/x(x)fy(wx)dx 00 = Ix I/x(x)fy(wx)dx + f:1x'/X(X)fy(WX)dX = f: Ix I/x (x) fy(wx) dx which completes the proof. o EXAMPlE 3.8.3 Let X and Y be independent random variables with pdfs fx(x) fr(y) = Y > O. Define W = Y IX. fw{w). i.e-).x. 
x > 0 and Section 3.8 Combining Random Variables 225 Substituting into the formula given in Theorem 3.8.2, we can write Notice that the integral is the eXiJected value of an exponential ranldolm parameter ),,(1 + w), so it + w) (recall Example 3.5.6). )..2 fw(w) = A(l + 1 w) Theorem 3.8..3. ut X and Y be independent continuous random and fy(y), respectively. Let W XV. Then 00 fw(w) = Proof. A nnlt;-oy-rme provide a proof of 1 -00 <'~T,.iffl"h ...... Ul"".n w:::::O. VEII1FlTJlP" with pdjs fx(x) 1 j;i !x(wjx)fy(x) dx modification of the proof details are left to the Theorem 3.8.2 will o EXAMPLE 3.8.4 Suppose that X and Y are independent random variables with pdf's fx(x) and fy(y) = 2y, O:s y :s 1, respectively. Find fw(w), where W = XV. According to 1 00 fw(w) = -00 = 1,0 :s x :s 1 1 Ixlfx(wjX)!r(x) dx The region of 'nt"" ..·..,hn.... though, needs to be to of x for which the But fx(w/x) is positive only if 0 :s wjx :s 1, which implies that integrand is x::::: w. Moreover, fy(x) to be positive requires that 0 :s x :s 1. Any x, then, from w to 1 will yield a Therefore, jw(w) (1 !(1){2x)dx 1w x = (I 2dx =2 1w Comment. 3.8.1, 3.8.2, and 3.8.3 can adapted to situations where X and Yare not independent by repJacing the product of the marginal pdfs with the joint pdf. 226 Chapter 3 Random Variables QUESTIONS 3.8.1. Let X and Y be two ~4eJx:~ random variables. the marginal pdfs shown below, find the pdf of X + Y. In each case, check to see if X + Y belongs to the same family of pdfs as do X and Y. (a) px(k) 3.8.2. ).J< = and py(k) = e-/ 1 I)" k! k = 0, 1. 2•... (b) px(k) = py(k) (1 - p)k-l p, k 1. 2, ... ix(x) =xe-x,x ~ O,and h(y) e-Y,y::: 0, where X and Yare the pdf of X + Y. 3.8.3. Let X and Y be two independent random whose marginal pdfs are given below. Find the pdf X + Y. Hilll: Consider two cases. 0 .s 'W < 1 and 1 .s w 2. Jx(x) = I, 0 .s x .s 1, and /r(y) = 1,O.s y .s 1 3.8..4. If a random variable V is of two independent random variables X and Y. 
prove that V is independent of X + Y.

3.8.5. Let Y be a uniform random variable over the interval [0, 1]. Find the pdf of W = Y². Hint: First find F_W(w).

3.8.6. Let Y be a random variable with f_Y(y) = 6y(1 − y), 0 ≤ y ≤ 1. Find the pdf of W = Y².

3.8.7. Given that X and Y are independent random variables, find the pdf of XY for the following two sets of marginal pdfs:
(a) f_X(x) = 1, 0 ≤ x ≤ 1, and f_Y(y) = 1, 0 ≤ y ≤ 1
(b) f_X(x) = …, 0 ≤ x ≤ 1, and f_Y(y) = 2y, 0 ≤ y ≤ 1

3.8.8. Let X and Y be two independent random variables. Given the marginal pdfs indicated below, find the cdf of Y/X. Hint: Consider two cases, 0 ≤ w ≤ 1 and 1 < w.
(a) f_X(x) = 1, 0 ≤ x ≤ 1, and f_Y(y) = 1, 0 ≤ y ≤ 1
(b) f_X(x) = …, 0 ≤ x ≤ 1, and f_Y(y) = …, 0 ≤ y ≤ 1

3.8.9. Suppose that X and Y are two independent random variables, where f_X(x) = xe^{−x}, x ≥ 0, and f_Y(y) = e^{−y}, y ≥ 0. Find the pdf of Y/X.

3.9 FURTHER PROPERTIES OF THE MEAN AND VARIANCE

Sections 3.5 and 3.6 introduced the basic definitions related to the expected value and variance of single random variables. We learned how to calculate E(W), E[g(W)], E(aW + b), Var(W), and Var(aW + b), where a and b are any constants and W could be either a discrete or a continuous random variable. The purpose of this section is to examine certain multivariable extensions of those results, based on the joint-pdf material covered in Section 3.7.

We begin with a theorem that generalizes E[g(W)]. While it is stated here for the case of two random variables, it extends in a very straightforward way to include functions of n random variables.

Theorem 3.9.1.

1. Suppose X and Y are discrete random variables with joint pdf p_{X,Y}(x, y), and let g(X, Y) be a function of X and Y. Then the expected value of the random variable g(X, Y) is given by

    E[g(X, Y)] = Σ_{all x} Σ_{all y} g(x, y) · p_{X,Y}(x, y)

provided Σ_{all x} Σ_{all y} |g(x, y)| · p_{X,Y}(x, y) < ∞.
$f_{X,Y}(x, y)\,dx\,dy$

provided $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |g(x, y)| \cdot f_{X,Y}(x, y)\,dx\,dy < \infty$.

Proof. The basic approach taken in deriving this result is similar to the method followed in the proof of Theorem 3.5.3. See (134) for details.

EXAMPLE 3.9.1
Consider the two random variables $X$ and $Y$ whose joint pdf is detailed in the $2 \times 4$ matrix of Table 3.9.1. Let $g(X, Y) = 3X - 2XY + Y$. Find $E[g(X, Y)]$ two ways: first, by using the basic definition of an expected value, and secondly, by using Theorem 3.9.1.

TABLE 3.9.1
          y = 0    y = 1    y = 2    y = 3
x = 0      1/8      1/4      1/8       0
x = 1       0       1/8      1/4      1/8

Let $Z = 3X - 2XY + Y$. By inspection, $Z$ takes on the values 0, 1, 2, and 3 according to the probabilities shown in Table 3.9.2.

TABLE 3.9.2
  z        0      1      2      3
f_Z(z)    1/4    1/2    1/4     0

Recalling the basic definition, then, that an expected value is a weighted average, we see that $E[g(X, Y)]$ is equal to one:

$E[g(X, Y)] = E(Z) = \sum_z z \cdot f_Z(z) = 0 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{4} + 3 \cdot 0 = 1$

The same answer is obtained by applying Theorem 3.9.1 to the joint pdf given in Table 3.9.1:

$E[g(X, Y)] = 0 \cdot \tfrac{1}{8} + 1 \cdot \tfrac{1}{4} + 2 \cdot \tfrac{1}{8} + 3 \cdot 0 + 3 \cdot 0 + 2 \cdot \tfrac{1}{8} + 1 \cdot \tfrac{1}{4} + 0 \cdot \tfrac{1}{8} = 1$

The advantage, of course, enjoyed by the latter solution is that we avoid the intermediate step of having to determine $f_Z(z)$.

EXAMPLE 3.9.2
An electrical circuit has three resistors, $R_X$, $R_Y$, and $R_Z$, wired in parallel (see Figure 3.9.1). The nominal resistance of each is fifteen ohms, but their actual resistances, $X$, $Y$, and $Z$, vary between ten and twenty according to the joint pdf

$f_{X,Y,Z}(x, y, z) = \frac{1}{675{,}000}(xy + xz + yz), \quad 10 \le x \le 20,\ 10 \le y \le 20,\ 10 \le z \le 20$

What is the expected resistance for the circuit?

[Figure 3.9.1: three resistors wired in parallel.]

Let $R$ denote the circuit's resistance. A well-known result in physics holds that

$\frac{1}{R} = \frac{1}{X} + \frac{1}{Y} + \frac{1}{Z}$

or, equivalently,

$R = \frac{XYZ}{XY + XZ + YZ} = g(X, Y, Z)$

Integrating $g(x, y, z) \cdot f_{X,Y,Z}(x, y, z)$ shows that the expected resistance is five ohms:

$E(R) = \int_{10}^{20}\!\int_{10}^{20}\!\int_{10}^{20} \frac{xyz}{xy + xz + yz} \cdot \frac{1}{675{,}000}(xy + xz + yz)\,dx\,dy\,dz = \frac{1}{675{,}000}\int_{10}^{20}\!\int_{10}^{20}\!\int_{10}^{20} xyz\,dx\,dy\,dz = \frac{(150)^3}{675{,}000} = 5$

Theorem 3.9.2.
Let $X$ and $Y$ be any two random variables (discrete or continuous, dependent or independent), and let $a$ and $b$ be any two constants. Then

$E(aX + bY) = aE(X) + bE(Y)$

provided $E(X)$ and $E(Y)$ are both finite.

Proof. We prove the continuous case (the discrete case is handled in much the same way). Let $f_{X,Y}(x, y)$ be the joint pdf of $X$ and $Y$, and set $g(X, Y) = aX + bY$. By Theorem 3.9.1,

$E(aX + bY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (ax + by)\, f_{X,Y}(x, y)\,dx\,dy$

$= a\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\, f_{X,Y}(x, y)\,dx\,dy + b\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, f_{X,Y}(x, y)\,dx\,dy$

$= a\int_{-\infty}^{\infty} x \left(\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy\right) dx + b\int_{-\infty}^{\infty} y \left(\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\right) dy$

$= a\int_{-\infty}^{\infty} x\, f_X(x)\,dx + b\int_{-\infty}^{\infty} y\, f_Y(y)\,dy = aE(X) + bE(Y)$

Corollary. Let $W_1, W_2, \ldots, W_n$ be any random variables for which $E(W_i) < \infty$, $i = 1, 2, \ldots, n$, and let $a_1, a_2, \ldots, a_n$ be any set of constants. Then

$E(a_1 W_1 + a_2 W_2 + \cdots + a_n W_n) = a_1 E(W_1) + a_2 E(W_2) + \cdots + a_n E(W_n)$

EXAMPLE 3.9.3
Let $X$ be a binomial random variable defined on $n$ independent trials, each trial resulting in success with probability $p$. Find $E(X)$.

Note, first, that $X$ can be thought of as a sum, $X = X_1 + X_2 + \cdots + X_n$, where $X_i$ represents the number of successes occurring at the $i$th trial: $X_i = 1$ if the $i$th trial produces a success, and $X_i = 0$ if the $i$th trial produces a failure. (Any $X_i$ defined in this way on an individual trial is called a Bernoulli random variable; a binomial, then, can be thought of as a sum of $n$ independent Bernoullis.) By assumption, $p_{X_i}(1) = p$ and $p_{X_i}(0) = 1 - p$, $i = 1, 2, \ldots, n$. Using the corollary,

$E(X) = E(X_1) + E(X_2) + \cdots + E(X_n) = n \cdot E(X_1)$

the last step being a consequence of the $X_i$'s having identical distributions. But

$E(X_1) = 1 \cdot p + 0 \cdot (1 - p) = p$

so $E(X) = np$, which is what we found earlier (recall Example 3.5.1).

Comment. The problem-solving implications of Theorem 3.9.2 and its corollary should not be underestimated. There are many real-world events that can be modeled as a linear combination $a_1 W_1 + a_2 W_2 + \cdots + a_n W_n$, where the $W_i$'s are relatively simple random variables. Finding $E(a_1 W_1 + a_2 W_2 + \cdots + a_n W_n)$ directly may be prohibitively difficult because of the complexity of the linear combination. It may very well be the case, though, that calculating the individual $E(W_i)$'s is easy.
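As a quick numerical companion to Example 3.9.3, the brute-force weighted average $\sum_k k \cdot p_X(k)$ can be checked against $np$ directly. The Python sketch below is ours, not part of the text, and the parameter values $n = 12$, $p = 0.3$ are chosen purely for illustration.

```python
from math import comb

def binomial_mean(n, p):
    # Brute-force E(X): the weighted average sum of k * P(X = k),
    # computed term by term from the binomial pmf.
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

n, p = 12, 0.3                                  # illustrative values only
assert abs(binomial_mean(n, p) - n * p) < 1e-9  # agrees with E(X) = np
```

The corollary's claim is that the two computations agree for every $n$ and $p$; changing the values above leaves the assertion true.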
Compare, for example, Examples 3.5.1 and 3.9.3. Both derive the formula $E(X) = np$ for the binomial, but the approach taken in Example 3.9.3 (i.e., Theorem 3.9.2) is much simpler. The next several examples further explore the technique of using linear combinations to facilitate the calculation of expected values.

EXAMPLE 3.9.4
A disgruntled secretary is upset about having to stuff envelopes. Handed a box of $n$ letters and $n$ envelopes, she vents her frustration by putting the letters into the envelopes at random. How many people, on the average, will receive their correct mail?

If $X$ denotes the number of envelopes properly stuffed, what we want is $E(X)$. Applying Definition 3.5.1 would prove formidable because of the difficulty in getting a workable expression for $p_X(k)$ [see (97)]. By using the corollary to Theorem 3.9.2, though, we can solve the problem quite easily.

Let $X_i$ denote a random variable equal to the number of correct letters put into the $i$th envelope, $i = 1, 2, \ldots, n$. Then $X_i$ equals 0 or 1, and

$p_{X_i}(k) = P(X_i = k) = \frac{n-1}{n}$ for $k = 0$; $\quad p_{X_i}(k) = \frac{1}{n}$ for $k = 1$

But $X = X_1 + X_2 + \cdots + X_n$ and $E(X) = E(X_1) + E(X_2) + \cdots + E(X_n)$. Furthermore, each of the $X_i$'s has the same expected value, $1/n$:

$E(X_i) = \sum_k k \cdot P(X_i = k) = 0 \cdot \frac{n-1}{n} + 1 \cdot \frac{1}{n} = \frac{1}{n}$

It follows that

$E(X) = \sum_{i=1}^{n} E(X_i) = n \cdot \frac{1}{n} = 1$

showing that, regardless of $n$, the expected number of properly stuffed envelopes is one. (Are the $X_i$'s independent? Does it matter?)

EXAMPLE 3.9.5
Ten fair dice are rolled. Calculate the expected value of the sum of the faces showing.

If the random variable $X$ denotes the sum of the faces showing on the ten dice, then $X = X_1 + X_2 + \cdots + X_{10}$, where $X_i$ is the number showing on the $i$th die, $i = 1, 2, \ldots, 10$. By assumption, $p_{X_i}(k) = \frac{1}{6}$ for $k = 1, 2, \ldots, 6$, so

$E(X_i) = \sum_{k=1}^{6} k \cdot \frac{1}{6} = 3.5$

By the corollary to Theorem 3.9.2,

$E(X) = E(X_1) + E(X_2) + \cdots + E(X_{10}) = 10(3.5) = 35$

Notice that $E(X)$ can also be deduced by appealing to the notion that expected values are centers of gravity. It should be clear from our work with combinatorics that $P(X = 10) = P(X = 60)$, $P(X = 11) = P(X = 59)$, $P(X = 12) = P(X = 58)$, and so on.
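The conclusion of Example 3.9.4, one properly stuffed envelope on average no matter what $n$ is, is easy to corroborate by simulation. The Python sketch below is ours; the choices $n = 8$, 100,000 trials, and the seed are purely illustrative.

```python
import random

def average_matches(n, trials, seed=0):
    # Estimate E(X), where X counts letters landing in their own envelope.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        envelopes = list(range(n))
        rng.shuffle(envelopes)  # the secretary stuffs the envelopes at random
        total += sum(i == envelopes[i] for i in range(n))
    return total / trials

estimate = average_matches(n=8, trials=100_000)
assert abs(estimate - 1.0) < 0.05  # E(X) = 1, regardless of n
```

Rerunning with any other $n$ produces an estimate near 1, which is exactly the point of the linear-combination argument.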
In other words, the probability function $p_X(k)$ is symmetric, which implies that its center of gravity is the midpoint of the range of its $X$-values. It must be the case, then, that $E(X) = (10 + 60)/2 = 35$.

EXAMPLE 3.9.6
The honor count in a bridge hand can vary from zero to thirty-seven according to the formula

honor count $= 4 \cdot$ (number of aces) $+\ 3 \cdot$ (number of kings) $+\ 2 \cdot$ (number of queens) $+\ 1 \cdot$ (number of jacks)

What is the expected honor count of North's hand?

The solution here is a bit unusual in that we use the corollary to Theorem 3.9.2 backwards. If $X_i$, $i = 1, 2, 3, 4$, denotes the honor count for players North, South, East, and West, respectively, and if $X$ denotes the honor count for the entire deck, we can write

$X = X_1 + X_2 + X_3 + X_4$

But the deck's honor count is a constant:

$X = 4 \cdot 4 + 3 \cdot 4 + 2 \cdot 4 + 1 \cdot 4 = 40$

so $E(X) = 40$. By symmetry, $E(X_i) = E(X_j)$, $i \ne j$, so it follows that $40 = 4 \cdot E(X_1)$, which implies that ten is the expected honor count of North's hand. (Try doing this problem directly, without making use of the fact that the deck's honor count is forty.)

EXAMPLE 3.9.7
Suppose that a random sequence of 1s and 0s is generated by a computer, where the length of the sequence is $n$, and

$p = P(1 \text{ appears in the } i\text{th position})$ and $1 - p = P(0 \text{ appears in the } i\text{th position})$, $i = 1, 2, \ldots, n$

What is the expected number of runs in the sequence? Note: A run is a series of consecutive identical outcomes. For example, the sequence 1 1 0 1 0 0 0 has a total of four runs (1 1, 0, 1, 0 0 0).

Let $X_i$ denote the outcome appearing in position $i$, $i = 1, 2, \ldots, n$. The number of runs in the sequence, then, can be expressed in terms of the $n - 1$ transitions between positions $i$ and $i + 1$, $i = 1, \ldots, n - 1$. Specifically, let $Q(X_i, X_{i+1}) = 1$ if $X_i \ne X_{i+1}$ and $Q(X_i, X_{i+1}) = 0$ otherwise. It follows that

$R = \text{total number of runs} = 1 + Q(X_1, X_2) + Q(X_2, X_3) + \cdots + Q(X_{n-1}, X_n)$

and

$E(R) = 1 + \sum_{i=1}^{n-1} E[Q(X_i, X_{i+1})]$

But

$E[Q(X_i, X_{i+1})] = 0 \cdot P(X_i = X_{i+1}) + 1 \cdot P(X_i \ne X_{i+1})$
$= P(X_i = 1 \cap X_{i+1} = 0) + P(X_i = 0 \cap X_{i+1} = 1)$
$= p(1 - p) + (1 - p)p$  (because of independence)
$= 2p(1 - p)$

Therefore,

$E(R) = 1 + 2(n - 1)p(1 - p)$

Expected Values of Products: A Special Case

We know from Theorem 3.9.1 that for any two random variables $X$ and $Y$,

$E(XY) = \sum_{\text{all } x}\sum_{\text{all } y} xy \cdot p_{X,Y}(x,$
$y)$ if $X$ and $Y$ are discrete, or

$E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{X,Y}(x, y)\,dx\,dy$

if $X$ and $Y$ are continuous. If $X$ and $Y$ are independent, there is an easier way to calculate $E(XY)$.

Theorem 3.9.3. If $X$ and $Y$ are independent random variables, then

$E(XY) = E(X) \cdot E(Y)$

provided $E(X)$ and $E(Y)$ both exist.

Proof. Suppose $X$ and $Y$ are both discrete random variables. Then the joint pdf, $p_{X,Y}(x, y)$, can be replaced by the product of the marginal pdfs, $p_X(x) \cdot p_Y(y)$, and the double summation given by Theorem 3.9.1 can be written as the product of two single summations:

$E(XY) = \sum_{\text{all } x}\sum_{\text{all } y} xy \cdot p_{X,Y}(x, y) = \sum_{\text{all } x}\sum_{\text{all } y} xy \cdot p_X(x) \cdot p_Y(y) = \left[\sum_{\text{all } x} x \cdot p_X(x)\right] \cdot \left[\sum_{\text{all } y} y \cdot p_Y(y)\right] = E(X) \cdot E(Y)$

The proof when $X$ and $Y$ are both continuous random variables is left as an exercise.

QUESTIONS

3.9.1. Suppose that $r$ chips are drawn with replacement from an urn containing $n$ chips, numbered 1 through $n$. Let $V$ denote the sum of the numbers drawn. Find $E(V)$.

3.9.2. Suppose that $f_{X,Y}(x, y) = \lambda^2 e^{-\lambda(x+y)}$, $0 \le x$, $0 \le y$. Find $E(X + Y)$.

3.9.3. Suppose that $f_{X,Y}(x, y) = \frac{2}{3}(x + 2y)$, $0 \le x \le 1$, $0 \le y \le 1$ (recall Question 3.7.19(c)). Find $E(X + Y)$.

3.9.4. Marksmanship competition at a certain level requires each contestant to take 10 shots with each of two different handguns. Final scores are computed by taking a weighted average of four times the number of bull's-eyes made with the first gun plus six times the number gotten with the second. If Cathie has a 30% chance of hitting the bull's-eye with each shot from the first gun and a 40% chance with each shot from the second gun, what is her expected score?

3.9.5. Suppose that $X_i$ is a random variable for which $E(X_i) = \mu$, $i = 1, 2, \ldots, n$. Under what conditions will the following be true?

$E\left(\sum_{i=1}^{n} a_i X_i\right) = \mu$

3.9.6. Suppose that the daily closing price of a stock goes up an eighth of a point with probability $p$ and down an eighth of a point with probability $q$, where $p > q$. After $n$ days how much gain can we expect the stock to have achieved? Assume that the daily fluctuations are independent events.

3.9.7.
An urn contains $r$ red balls and $w$ white balls. A sample of $n$ balls is drawn in order and without replacement. Let $X_i$ be 1 if the $i$th draw is red and 0 otherwise, $i = 1, 2, \ldots, n$.
(a) Show that $E(X_i) = E(X_1)$, $i = 2, 3, \ldots, n$.
(b) Use the corollary to Theorem 3.9.2 to show that the expected number of red balls is $nr/(r + w)$.

3.9.8. Suppose two fair dice are tossed. Find the expected value of the product of the faces showing.

3.9.9. Find $E(R)$ for a two-resistor circuit similar to the one described in Example 3.9.2, where $f_{X,Y}(x, y) = k(x + y)$, $10 \le x \le 20$, $10 \le y \le 20$.

3.9.10. Suppose that $X$ and $Y$ are both uniformly distributed over the interval $[0, 1]$. Calculate the expected value of the square of the distance of the random point $(X, Y)$ from the origin; that is, find $E(X^2 + Y^2)$. Hint: See Question 3.8.5.

3.9.11. Suppose $X$ represents a point picked at random from the interval $[0, 1]$ on the $x$-axis, and $Y$ is a point picked at random from the interval $[0, 1]$ on the $y$-axis. Assume that $X$ and $Y$ are independent. What is the expected value of the area of the triangle formed by the points $(X, 0)$, $(0, Y)$, and $(0, 0)$?

3.9.12. Suppose $Y_1, Y_2, \ldots, Y_n$ is a random sample from the uniform pdf over $[0, 1]$. The geometric mean of the numbers is the random variable $\sqrt[n]{Y_1 Y_2 \cdots Y_n}$. Compare the expected value of the geometric mean to that of the arithmetic mean $\bar{Y}$.

Calculating the Variance of a Sum of Random Variables

We know from the corollary to Theorem 3.9.2 that for any set of random variables $W_1, W_2, \ldots, W_n$, provided $E(W_i)$ exists for all $i$,

$E(W_1 + W_2 + \cdots + W_n) = E(W_1) + E(W_2) + \cdots + E(W_n)$

A similar result holds for the variance of a sum of random variables, but only if the random variables are independent.

Theorem 3.9.4. Let $W_1, W_2, \ldots, W_n$ be a set of independent random variables for which $E(W_i^2) < \infty$ for all $i$. Then

$\mathrm{Var}(W_1 + W_2 + \cdots + W_n) = \mathrm{Var}(W_1) + \mathrm{Var}(W_2) + \cdots + \mathrm{Var}(W_n)$

Proof. The derivation is given for the sum of two random variables, $W_1 + W_2$; a simple induction argument will take care of arbitrary $n$.
From Theorems 3.6.1 and 3.9.2,

$\mathrm{Var}(W_1 + W_2) = E[(W_1 + W_2)^2] - [E(W_1 + W_2)]^2$

Writing out the squares gives

$\mathrm{Var}(W_1 + W_2) = E(W_1^2) + 2E(W_1 W_2) + E(W_2^2) - [E(W_1)]^2 - 2E(W_1)E(W_2) - [E(W_2)]^2$

$= \{E(W_1^2) - [E(W_1)]^2\} + \{E(W_2^2) - [E(W_2)]^2\} + 2[E(W_1 W_2) - E(W_1)E(W_2)]$

By the independence of $W_1$ and $W_2$, $E(W_1 W_2) = E(W_1)E(W_2)$ (Theorem 3.9.3), making the last term vanish. The remaining terms combine to give the desired result:

$\mathrm{Var}(W_1 + W_2) = \mathrm{Var}(W_1) + \mathrm{Var}(W_2)$

Corollary. Let $W_1, W_2, \ldots, W_n$ be any set of independent random variables for which $E(W_i^2) < \infty$ for all $i$. Let $a_1, a_2, \ldots, a_n$ be any set of constants. Then

$\mathrm{Var}(a_1 W_1 + a_2 W_2 + \cdots + a_n W_n) = a_1^2\,\mathrm{Var}(W_1) + a_2^2\,\mathrm{Var}(W_2) + \cdots + a_n^2\,\mathrm{Var}(W_n)$

Proof. The derivation is based on Theorems 3.9.4 and 3.6.2. The details will be left as an exercise.

Comment. A more general version of Theorem 3.9.4 can be proved, one that leads to a slightly different formula but does not require the $W_i$'s to be independent. The argument, however, depends on a definition we have not yet introduced. We will return to the problem of finding the variance of a sum of random variables in Section 11.4.

EXAMPLE 3.9.8
The binomial random variable, being a sum of $n$ independent Bernoullis, is an obvious candidate for Theorem 3.9.4. Let $X_i$ denote the number of successes occurring on the $i$th trial. Then $X_i = 1$ with probability $p$ and $X_i = 0$ with probability $1 - p$, and

$X = X_1 + X_2 + \cdots + X_n = \text{total number of successes in } n \text{ trials}$

Find $\mathrm{Var}(X)$.

Note that

$E(X_i) = 1 \cdot p + 0 \cdot (1 - p) = p$ and $E(X_i^2) = (1)^2 \cdot p + (0)^2 \cdot (1 - p) = p$

so

$\mathrm{Var}(X_i) = E(X_i^2) - [E(X_i)]^2 = p - p^2 = p(1 - p)$

It follows, then, that the variance of a binomial random variable is $np(1 - p)$:

$\mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = np(1 - p)$

EXAMPLE 3.9.9
In statistics it is often necessary to draw inferences based on $\bar{W}$, the average computed from a random sample of $n$ observations,

$\bar{W} = \frac{1}{n}\sum_{i=1}^{n} W_i$

Two of its properties are especially important. First, if the $W_i$'s come from a population where the mean is $\mu$, it follows from the corollary to Theorem 3.9.2 that $E(\bar{W}) = \mu$.
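Example 3.9.8's variance formula, $\mathrm{Var}(X) = np(1 - p)$, can be confirmed by brute force from the pmf, just as the mean was. The Python sketch below is ours; the values $n = 9$, $p = 0.25$ are chosen only for illustration.

```python
from math import comb

def binomial_var(n, p):
    # Brute-force Var(X) = E(X^2) - [E(X)]^2 from the binomial pmf.
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    mean_of_sq = sum(k * k * pk for k, pk in enumerate(pmf))
    return mean_of_sq - mean**2

n, p = 9, 0.25                                           # illustrative values
assert abs(binomial_var(n, p) - n * p * (1 - p)) < 1e-9  # Var(X) = np(1 - p)
```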
Second, if the $W_i$'s come from a population whose variance is $\sigma^2$, then

$\mathrm{Var}(\bar{W}) = \frac{\sigma^2}{n}$

To verify the latter, we can appeal to Theorem 3.9.4 and its corollary. Write

$\bar{W} = \frac{1}{n}W_1 + \frac{1}{n}W_2 + \cdots + \frac{1}{n}W_n$

Then

$\mathrm{Var}(\bar{W}) = \left(\frac{1}{n}\right)^2 \mathrm{Var}(W_1) + \left(\frac{1}{n}\right)^2 \mathrm{Var}(W_2) + \cdots + \left(\frac{1}{n}\right)^2 \mathrm{Var}(W_n) = n \cdot \left(\frac{1}{n}\right)^2 \sigma^2 = \frac{\sigma^2}{n}$

QUESTIONS

3.9.13. Suppose that $f_{X,Y}(x, y) = \lambda^2 e^{-\lambda(x+y)}$, $0 \le x$, $0 \le y$. Find $\mathrm{Var}(X + Y)$. Hint: See Questions 3.6.11 and 3.9.2.

3.9.14. Suppose that $f_{X,Y}(x, y) = \frac{2}{3}(x + 2y)$, $0 \le x \le 1$, $0 \le y \le 1$. Find $\mathrm{Var}(X + Y)$. Hint: See Question 3.9.3.

3.9.15. For the uniform pdf defined over $[0, 1]$, find the variance of the geometric mean when $n = 2$ (recall Question 3.9.12).

3.9.16. Let $X$ be a binomial random variable based on $n$ trials and a success probability of $p_X$; let $Y$ be an independent binomial random variable based on $m$ trials and a success probability of $p_Y$. Find $E(W)$ and $\mathrm{Var}(W)$, where $W = 4X + 6Y$.

3.9.17. Let the Poisson random variable $U$ be the number of calls for technical assistance received by a computer company during the firm's nine normal workday hours. Suppose the average number of calls per hour is 7.0 and that each call costs the company \$50. Let $V$ be a Poisson random variable representing the number of calls for technical assistance received during the remaining 15 hours. Suppose the average number of calls per hour is 4.0 for that time and that each such call costs the company \$60. Find the expected cost and the variance of the cost associated with the calls received during a 24-hour day.

3.9.18. A mason is contracted to build a patio retaining wall. Plans call for the base of the wall to be a row of 50 10-inch bricks, each separated by ½-inch-thick mortar. Suppose that the bricks used are randomly chosen from a population of bricks whose mean length is 10 inches and whose standard deviation is 1/32 inch. Also, suppose that the mason, on the average, will make the mortar ½ inch thick, but the actual dimension varies from brick to brick, the standard deviation of the thicknesses being 1/16 inch. What is the standard deviation of $L$, the length of the row of the wall? What assumption are you making?
3.9.19. An electric circuit has six resistors wired in series, each nominally being 5 ohms. What is the maximum standard deviation that can be allowed in the manufacture of these resistors if the combined circuit is to have a standard deviation no greater than 0.4 ohm?

3.9.20. A gambler plays $n$ hands of poker. If he wins the $k$th hand, he collects $k$ dollars; if he loses the $k$th hand, he collects nothing. Let $T$ denote his total winnings in $n$ hands. Assuming that his chances of winning each hand are constant and are independent of his success or failure at any other hand, find $E(T)$ and $\mathrm{Var}(T)$.

Approximating the Variance of a Function of Random Variables (Optional)

It is not an uncommon problem for a laboratory scientist to have to measure several quantities, each subject to a certain amount of "error," in order to calculate a final desired result. For example, a physics student trying to determine the acceleration due to gravity, $G$, knows that the distance, $D$, traveled by a freely falling body in time $T$ is related to $G$ by the equation (assuming the body is initially at rest)

$D = \frac{1}{2}GT^2$

or, equivalently,

$G = \frac{2D}{T^2}$

Suppose distance and time are to be measured directly with a yardstick and a stopwatch. The values obtained, $D$ and $T$, will not be exactly correct; rather, we can think of them as being realizations of random variables, with those variables having means $\mu_D$ and $\mu_T$ and variances $\mathrm{Var}(D)$ and $\mathrm{Var}(T)$, the two numbers reflecting the lack of precision in the measuring process. Suppose we know from past experience the precisions characteristic of distance and time measurements. What can we then conclude about the precision in the calculated value of $G$? That is, knowing $\mathrm{Var}(D)$ and $\mathrm{Var}(T)$, can we find $\mathrm{Var}(G)$?

By way of background, we have already seen one result that bears directly on this sort of "error-propagation" problem. If the quantity to be calculated, $W$, is the sum of $n$ independently measured quantities, $W_1, W_2, \ldots$
, $W_n$, and if the variance associated with each of the $W_i$'s is known, we can appeal to Theorem 3.9.4 and say that

$\mathrm{Var}(W) = \mathrm{Var}(W_1) + \mathrm{Var}(W_2) + \cdots + \mathrm{Var}(W_n) \quad (3.9.2)$

In general, extending Equation 3.9.2 in any exact way to situations where $W$ is some arbitrary function of a set of measurements, $W = g(W_1, W_2, \ldots, W_n)$, is difficult. It is a relatively simple matter, though, to get an approximation to the variance of $W$.

More specifically, suppose that $W$ is a function of $n$ independent random variables, $W = g(W_1, W_2, \ldots, W_n)$. Assume that $\mu_i$ and $\mathrm{Var}(W_i)$ are the mean and variance, respectively, of $W_i$, $i = 1, 2, \ldots, n$. Using the first-order terms in a Taylor expansion of the function $g(W_1, W_2, \ldots, W_n)$ around the point $(\mu_1, \mu_2, \ldots, \mu_n)$, we can write

$W \approx g(\mu_1, \mu_2, \ldots, \mu_n) + \sum_{i=1}^{n} \left[\frac{\partial g}{\partial W_i}\bigg|_{(\mu_1, \ldots, \mu_n)}\right](W_i - \mu_i) \quad (3.9.3)$

Applying the corollary to Theorem 3.9.4 to Equation 3.9.3 yields the sought-after approximation:

$\mathrm{Var}(W) \approx \sum_{i=1}^{n} \left[\frac{\partial g}{\partial W_i}\bigg|_{(\mu_1, \ldots, \mu_n)}\right]^2 \mathrm{Var}(W_i) \quad (3.9.4)$

CASE STUDY 3.9.1

In a dental X-ray unit, electrons from the cathode of the tube are decelerated by nuclei in the anode, thereby producing Bremsstrahlung radiation (X-rays). These emissions, collimated by a lead-lined tube, effect the desired image on a sheet of film. Tennessee state regulations (170) require that the distance, $W$, from the focal spot on the anode of an X-ray tube to the patient's skin be at least eighteen centimeters. For some equipment, particularly older units, that distance cannot be measured directly because the exact location of the focal spot cannot be determined by just looking at the tube's outer housing. When that is the case, state inspectors resort to an indirect procedure. Two films are exposed, one at the unknown distance $W$ and a second at a distance $W + Z$. The two diameters, $X$ and $Y$, of the resulting circular images are then measured (see Figure 3.9.2). By similar triangles,

$\frac{X}{W} = \frac{Y}{W + Z}$

or

$W = \frac{XZ}{Y - X} \quad (3.9.5)$

In the context of our previous notation, $W = g(X, Y,$
$Z) = XZ(Y - X)^{-1}$.

[Figure 3.9.2: the focal spot on the anode, the lead-lined collimator, and the two films exposed at distances $W$ and $W + Z$.]

During the course of one such inspection (96), the values measured for the two diameters $X$ and $Y$ and the backoff distance $Z$ were 6.4 cm, 9.7 cm, and 10.2 cm, respectively. From Equation 3.9.5, then, the anode-to-patient distance is estimated to be

$w = \frac{(6.4)(10.2)}{9.7 - 6.4} = 19.8 \text{ cm}$

indicating that the unit is in compliance. If the error in $W$, though, were sufficiently large, there might be a non-negligible probability that the true $W$ was less than 18 cm, meaning the unit was, in fact, out of compliance. It is not unreasonable, therefore, to inquire about the magnitude of $\mathrm{Var}(W)$.

To apply Equation 3.9.4, we first need to compute the partial derivatives of $g(X, Y, Z) = XZ(Y - X)^{-1}$. In this case,

$\frac{\partial g}{\partial X} = \frac{Z}{Y - X} + \frac{XZ}{(Y - X)^2}, \quad \frac{\partial g}{\partial Y} = -\frac{XZ}{(Y - X)^2}, \quad \frac{\partial g}{\partial Z} = \frac{X}{Y - X}$

Inspectors feel that the standard deviation in any of these measurements is on the order of 0.08 cm, so $\mathrm{Var}(X) = \mathrm{Var}(Y) = \mathrm{Var}(Z) = (0.08)^2$. Substituting the variance estimates and the derivatives, evaluated at the point $(\mu_X, \mu_Y, \mu_Z) = (6.4, 9.7, 10.2)$, into Equation 3.9.4 gives

$\mathrm{Var}(W) \approx \left[\frac{10.2}{9.7 - 6.4} + \frac{(6.4)(10.2)}{(9.7 - 6.4)^2}\right]^2 (0.08)^2 + \left[\frac{-(6.4)(10.2)}{(9.7 - 6.4)^2}\right]^2 (0.08)^2 + \left[\frac{6.4}{9.7 - 6.4}\right]^2 (0.08)^2 = 0.782$

Therefore, the estimated standard deviation associated with the calculated value of $W$ is $\sqrt{0.782}$, or 0.88 cm.

QUESTIONS

3.9.21. A physics student is trying to determine the gravitational constant, $G$, using the expression $G = 2D/T^2$, where both distance ($D$) and time ($T$) are to be measured. Suppose that the standard deviation of the measurement errors in $D$ is 0.0025 feet and in $T$, 0.045 seconds. If the experimental apparatus is set up so that $D$ will be 4 feet, then $T$ will be approximately ½ second. If $D$ is set at 16 feet, $T$ will be close to 1 second. Which of these two sets of values for $D$ and $T$ will give a smaller variance for the calculated $G$?

3.9.22. Suppose that $W_1, W_2, \ldots$
, and $W_n$ are independent random variables with variances $\sigma_1^2, \sigma_2^2, \ldots,$ and $\sigma_n^2$, respectively, and let $W = W_1 + W_2 + \cdots + W_n$. Find $\mathrm{Var}(W)$ using Theorem 3.9.4, and compare it with the approximation given by Equation 3.9.4.

3.9.23. If $h$ is its height and $a$ and $b$ are the lengths of its two parallel sides, the area of a trapezoid is given by

$A = \frac{1}{2}(a + b)h$

Find an expression that approximates $\sigma_A$ if $a$, $b$, and $h$ are measured independently with standard deviations $\sigma_a$, $\sigma_b$, and $\sigma_h$, respectively.

3.9.24. In Case Study 3.9.1, notice that the difference between 19.8 cm (the calculated distance) and 18 cm (the state regulation minimum) is slightly more than two standard deviations. What does that imply about the probability that this X-ray machine is operating out of compliance?

3.10 ORDER STATISTICS

The single-variable transformation taken up in Section 3.4 involved a standard linear relationship, $Y = aX + b$. The bivariate transformations in Section 3.8 were similarly arithmetic, being concerned with either sums, quotients, or products. In this section we will consider a different sort of transformation, one involving the ordering of an entire set of random variables. The notion has wide applicability in many areas of statistics, and we will see some of its consequences in later chapters. Here, though, we will limit our discussion to two basic derivations: (1) the marginal pdf of the $i$th largest observation in a random sample and (2) the joint pdf of the $i$th and $j$th largest observations in a random sample.

Definition 3.10.1. Let $Y$ be a continuous random variable for which $y_1, y_2, \ldots, y_n$ are the values of a random sample of size $n$. Reorder the $y_i$'s from smallest to largest:

$y_1' < y_2' < \cdots < y_n'$

(No two $y_i$'s are equal, except with probability zero, since $Y$ is continuous.) Define the random variable $Y_i'$ to have the value $y_i'$, $1 \le i \le n$. Then $Y_i'$ is called the $i$th order statistic. Sometimes $Y_n'$ and $Y_1'$ are denoted $Y_{\max}$ and $Y_{\min}$, respectively.

EXAMPLE 3.10.1
Suppose that four measurements are made on the random variable $Y$: $y_1 = 3.4$, $y_2 = 4.6$, $y_3 = 2.6$, and $y_4 = 3.2$.
The corresponding ordered sample would be

$2.6 < 3.2 < 3.4 < 4.6$

The smallest observation would be denoted $Y_1'$ (that is, $Y_{\min}$), its value for this particular sample being 2.6. Similarly, the value for the second order statistic, $Y_2'$, is 3.2, and so on.

The Distribution of Extreme Order Statistics

By definition, every observation in a random sample has the same pdf. For example, if a set of four measurements is taken from a normal distribution with mean $\mu = 80$ and standard deviation $\sigma$, then $f_{Y_1}(y)$, $f_{Y_2}(y)$, $f_{Y_3}(y)$, and $f_{Y_4}(y)$ are the same; each is a normal pdf with mean 80 and standard deviation $\sigma$.

The pdf describing an observation, though, is not the same as the pdf describing a random sample's extreme observation. Intuitively, that makes sense. If a single observation is drawn from a normal distribution with $\mu = 80$, it would not be surprising if that observation were to take on a value near eighty. On the other hand, if a random sample of $n = 100$ observations is drawn from that same distribution, we would not expect the smallest observation, $Y_{\min}$, to be anywhere near eighty. Common sense tells us that that observation is likely to be much smaller than eighty, just as the largest observation, $Y_{\max}$, is likely to be much larger.

It follows, then, that before we can do any probability calculations (or any applications whatsoever) involving order statistics, we need to know the pdf of $Y_i'$, $i = 1, 2, \ldots, n$. We begin by investigating the pdfs of the "extreme" order statistics, $f_{Y_{\max}}(y)$ and $f_{Y_{\min}}(y)$, which are the simplest to work with. At the end of the section we return to the more general problems of finding (a) the pdf of $Y_i'$ for any $i$ and (b) the joint pdf of $Y_i'$ and $Y_j'$, where $i < j$.

EXAMPLE 3.10.2
Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample of continuous random variables, each having pdf $f_Y(y)$ and cdf $F_Y(y)$. Find

a. $f_{Y_{\max}}(y) = f_{Y_n'}(y)$, the pdf of the largest order statistic
b. $f_{Y_{\min}}(y) = f_{Y_1'}(y)$, the pdf of the smallest order statistic

Finding the pdfs of $Y_{\max}$ and $Y_{\min}$ is accomplished by using the now-familiar technique of differentiating a random variable's cdf. Consider, for
example, the case of the largest order statistic, $Y_n'$:

$F_{Y_n'}(y) = F_{Y_{\max}}(y) = P(Y_{\max} \le y) = P(Y_1 \le y \cap Y_2 \le y \cap \cdots \cap Y_n \le y)$
$= P(Y_1 \le y) \cdot P(Y_2 \le y) \cdots P(Y_n \le y)$  (why?)
$= [F_Y(y)]^n$

Therefore,

$f_{Y_{\max}}(y) = \frac{d}{dy}[F_Y(y)]^n = n[F_Y(y)]^{n-1} f_Y(y)$

Similarly, for the smallest order statistic ($i = 1$),

$F_{Y_1'}(y) = F_{Y_{\min}}(y) = P(Y_{\min} \le y) = 1 - P(Y_{\min} > y)$
$= 1 - P(Y_1 > y) \cdot P(Y_2 > y) \cdots P(Y_n > y)$
$= 1 - [1 - F_Y(y)]^n$

Therefore,

$f_{Y_{\min}}(y) = \frac{d}{dy}\{1 - [1 - F_Y(y)]^n\} = n[1 - F_Y(y)]^{n-1} f_Y(y)$

EXAMPLE 3.10.3
Suppose a random sample of $n = 3$ observations ($Y_1$, $Y_2$, and $Y_3$) is taken from the exponential pdf, $f_Y(y) = e^{-y}$, $y \ge 0$. Compare $f_{Y_1}(y)$ with $f_{Y_1'}(y)$. Intuitively, which will be larger, $P(Y_1 < 1)$ or $P(Y_1' < 1)$?

The pdf for $Y_1$, of course, is just the pdf of the distribution being sampled, that is, $f_{Y_1}(y) = e^{-y}$. For $f_{Y_1'}(y)$, we apply the formula given in Example 3.10.2 for $f_{Y_{\min}}(y)$. Since $F_Y(y) = 1 - e^{-y}$ and $n = 3$ (and $i = 1$), we can write

$f_{Y_1'}(y) = 3[1 - (1 - e^{-y})]^2 e^{-y} = 3e^{-3y}, \quad y \ge 0$

[Figure 3.10.1 shows the two pdfs plotted on the same set of axes.] Compared to $f_{Y_1}(y)$, the pdf for $Y_1'$ has more of its area located above the smaller values of $y$ (where $Y_1'$ is more likely to lie). For example, the probability that the smallest observation (out of three) is less than one is 95%, while the probability that a random observation is less than one is only 63%:

$P(Y_1' < 1) = \int_0^1 3e^{-3y}\,dy = \int_0^3 e^{-u}\,du = -e^{-u}\Big|_0^3 = 0.95$

$P(Y_1 < 1) = \int_0^1 e^{-y}\,dy = 1 - e^{-1} = 0.63$

EXAMPLE 3.10.4
Suppose a random sample of size ten is drawn from a continuous pdf $f_Y(y)$. What is the probability that the largest observation, $Y_{10}'$, is less than the pdf's median, $m$?

Using the formula for $f_{Y_{10}'}(y) = f_{Y_{\max}}(y)$ given in Example 3.10.2, it is certainly true that

$P(Y_{10}' < m) = \int_{-\infty}^{m} 10[F_Y(y)]^9 f_Y(y)\,dy \quad (3.10.1)$

but the problem does not specify $f_Y(y)$, so Equation 3.10.1 is of no help. Fortunately, a much simpler solution is available, even if $f_Y(y)$ were specified: the event "$Y_{10}' < m$" is equivalent to the event "$Y_1 < m \cap Y_2 < m \cap \cdots \cap Y_{10} < m$." Therefore,

$P(Y_{10}' < m) = P(Y_1 < m, Y_2 < m, \ldots$
$, Y_{10} < m) \quad (3.10.2)$

But the ten observations here are independent, so the intersection probability implicit on the right-hand side of Equation 3.10.2 factors into ten terms. Moreover, each of those terms equals ½ (by definition of the median), so

$P(Y_{10}' < m) = P(Y_1 < m) \cdot P(Y_2 < m) \cdots P(Y_{10} < m) = \left(\frac{1}{2}\right)^{10} = 0.00098$

A General Formula for $f_{Y_i'}(y)$

Having discussed the two extreme order statistics, $Y_{\min}$ and $Y_{\max}$, we now turn to the more general problem of finding the pdf for the $i$th order statistic, where $i$ can be any integer from 1 through $n$.

Theorem 3.10.1. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of continuous random variables drawn from a distribution having pdf $f_Y(y)$ and cdf $F_Y(y)$. The pdf of the $i$th order statistic is given by

$f_{Y_i'}(y) = \frac{n!}{(i - 1)!(n - i)!}[F_Y(y)]^{i-1}[1 - F_Y(y)]^{n-i} f_Y(y), \quad 1 \le i \le n$

Proof. We will give a heuristic argument that draws on the similarity between the statement of Theorem 3.10.1 and the binomial distribution; a formal induction proof will not be given here.

Recall the derivation of the binomial probability function,

$p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k}$

where $X$ is the number of successes in $n$ independent trials and $p = P(\text{trial ends in success})$. Central to that derivation was the recognition that the event "$X = k$" is actually a union of all the different (mutually exclusive) sequences having $k$ successes and $n - k$ failures. Because the trials are independent, the probability of any such sequence is $p^k(1 - p)^{n-k}$, and the number of such sequences is $n!/[k!(n - k)!]$ (or $\binom{n}{k}$), so the probability that $X = k$ is the product $\binom{n}{k} p^k (1 - p)^{n-k}$.

Here we are looking for the pdf of the $i$th order statistic, $f_{Y_i'}(y)$. As was the case with the binomial, the probability of finding the $i$th order statistic at some point $y$ (that is, the probability density) will reduce to a combinatorial term multiplied by the probability of an intersection of independent events. [Figure 3.10.2: along the $y$-axis, $i - 1$ observations below $y$, one observation at $y$, and $n - i$ observations above $y$.]
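The binomial connection can also be used to check the theorem numerically: for any $t$, the event $Y_i' \le t$ occurs exactly when at least $i$ of the $n$ independent observations fall at or below $t$, so $P(Y_i' \le t) = \sum_{k=i}^{n} \binom{n}{k}[F_Y(t)]^k[1 - F_Y(t)]^{n-k}$. The Python sketch below is ours and compares that binomial sum with a simulation for a Uniform(0, 1) sample; the choices $n = 5$, $i = 2$, $t = 0.4$ are purely illustrative.

```python
import random
from math import comb

def order_stat_cdf(n, i, t):
    # P(Y'_i <= t) for a Uniform(0, 1) sample: at least i of the
    # n observations land at or below t.
    return sum(comb(n, k) * t**k * (1 - t)**(n - k) for k in range(i, n + 1))

rng = random.Random(0)
n, i, t, trials = 5, 2, 0.4, 100_000    # illustrative choices
hits = sum(sorted(rng.random() for _ in range(n))[i - 1] <= t
           for _ in range(trials))
assert abs(hits / trials - order_stat_cdf(n, i, t)) < 0.01
```

Differentiating the binomial sum with respect to $t$ collapses, term by term, to the pdf stated in Theorem 3.10.1, so the two formulations are equivalent.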
The only difference is that $Y_i'$ is a continuous random variable, whereas the binomial $X$ is discrete, which means that what we find here will be a probability density function.

By Theorem 2.6.2, there are $n!/[(i - 1)!\,1!\,(n - i)!]$ ways that $n$ observations can be parceled into three groups such that the $i$th smallest is at the point $y$ (recall Figure 3.10.2). Moreover, the likelihood associated with any particular set of points having the configuration pictured in Figure 3.10.2 will be the probability that $i - 1$ (independent) observations are all less than $y$, that $n - i$ observations are all greater than $y$, and that one observation is at $y$. The probability associated with any such set of points is $[F_Y(y)]^{i-1}[1 - F_Y(y)]^{n-i} f_Y(y)$. The probability density, then, that the $i$th order statistic is located at the point $y$ is the product

$f_{Y_i'}(y) = \frac{n!}{(i - 1)!(n - i)!}[F_Y(y)]^{i-1}[1 - F_Y(y)]^{n-i} f_Y(y)$

EXAMPLE 3.10.5
Suppose that many years of observation have confirmed that the annual maximum flood tide $Y$ (in feet) for a certain river can be modeled by the pdf

$f_Y(y) = \frac{1}{20}, \quad 20 < y < 40$

(Note: It is unlikely that flood tides would be described by anything as simple as a uniform pdf. We are making that choice solely to facilitate the mathematics.)

The Army Corps of Engineers is planning to build a levee along a certain portion of the river, and they want to make it high enough so that there is only a 30% chance that the second worst flood in the next thirty-three years will overflow the embankment. How high should the levee be? (We assume that there will be only one potential flood per year.)

Let $h$ be the desired height. If $Y_1, Y_2, \ldots$
$, Y_{33}$ denote the flood tides for the next thirty-three years, what we require of $h$ is that

$P(Y_{32}' > h) = 0.30$

As a starting point, notice that for $20 < y < 40$,

$F_Y(y) = \int_{20}^{y} \frac{1}{20}\,dy = \frac{y}{20} - 1$

Therefore,

$f_{Y_{32}'}(y) = \frac{33!}{31!\,1!}\left(\frac{y}{20} - 1\right)^{31}\left(2 - \frac{y}{20}\right)\frac{1}{20}$

and $h$ is the solution of the integral equation

$\int_h^{40} (33)(32)\left(\frac{y}{20} - 1\right)^{31}\left(2 - \frac{y}{20}\right)\frac{1}{20}\,dy = 0.30 \quad (3.10.3)$

If we make the substitution $u = \frac{y}{20} - 1$, Equation 3.10.3 simplifies to

$P(Y_{32}' > h) = 33(32)\int_{(h/20)-1}^{1} u^{31}(1 - u)\,du = 1 - 33\left(\frac{h}{20} - 1\right)^{32} + 32\left(\frac{h}{20} - 1\right)^{33} \quad (3.10.4)$

Setting the right-hand side of Equation 3.10.4 equal to 0.30 and solving for $h$ by trial and error gives

$h = 39.3 \text{ feet}$

Joint pdfs of Order Statistics

Finding the joint pdf of two or more order statistics is easily accomplished by generalizing the argument that derived from Figure 3.10.2. Suppose, for example, that each of $n$ observations in a random sample has pdf $f_Y(y)$ and cdf $F_Y(y)$. The joint pdf for order statistics $Y_i'$ and $Y_j'$ at points $u$ and $v$, where $i < j$ and $u < v$, can be deduced from Figure 3.10.3, which shows how the $n$ points must be distributed if the $i$th and $j$th order statistics are to be located at points $u$ and $v$, respectively. By Theorem 2.6.2, the number of ways to divide a set of $n$ observations into groups of sizes $i - 1$, $1$, $j - i - 1$, $1$, and $n - j$ is the quotient

$\frac{n!}{(i - 1)!\,1!\,(j - i - 1)!\,1!\,(n - j)!}$

Also, given the independence of the $n$ observations, the probability that $i - 1$ are less than $u$ is $[F_Y(u)]^{i-1}$, the probability that $j - i - 1$ are between $u$ and $v$ is $[F_Y(v) - F_Y(u)]^{j-i-1}$, and the probability that $n - j$ are greater than $v$ is $[1 - F_Y(v)]^{n-j}$. Multiplying, then, by the pdfs describing the likelihoods that $Y_i'$ and $Y_j'$ would be at points $u$ and $v$, respectively, gives the joint pdf of the two order statistics:

$f_{Y_i',Y_j'}(u, v) = \frac{n!}{(i - 1)!(j - i - 1)!(n - j)!}[F_Y(u)]^{i-1}[F_Y(v) - F_Y(u)]^{j-i-1}[1 - F_Y(v)]^{n-j} f_Y(u) f_Y(v) \quad (3.10.5)$

for $i < j$ and $u < v$.

EXAMPLE 3.10.6
Let $Y_1$, $Y_2$,
and $Y_3$ be a random sample of $n = 3$ from the uniform pdf defined over the unit interval, $f_Y(y) = 1$, $0 \le y \le 1$. By definition, the range, $R$, of a sample is the difference between the largest and smallest order statistics; in this case,

$R = \text{range} = Y_{\max} - Y_{\min} = Y_3' - Y_1'$

Find $f_R(r)$, the pdf of the range.

We will begin by finding the joint pdf of $Y_1'$ and $Y_3'$. Then $f_{Y_1',Y_3'}(u, v)$ is integrated over the region $Y_3' - Y_1' \le r$ to find the cdf, $F_R(r) = P(R \le r)$. The final step is to differentiate the cdf and make use of the fact that $f_R(r) = F_R'(r)$.

If $f_Y(y) = 1$, $0 \le y \le 1$, it follows that

$F_Y(y) = P(Y \le y) = 0$ for $y < 0$; $= y$ for $0 \le y \le 1$; $= 1$ for $y > 1$

Applying Equation 3.10.5 with $n = 3$, $i = 1$, and $j = 3$ gives the joint pdf of $Y_1'$ and $Y_3'$:

$f_{Y_1',Y_3'}(u, v) = \frac{3!}{0!\,1!\,0!}\,u^0(v - u)(1 - v)^0 = 6(v - u), \quad 0 \le u < v \le 1$

Moreover, we can write the cdf for $R$ in terms of $Y_1'$ and $Y_3'$:

$F_R(r) = P(R \le r) = P(Y_3' - Y_1' \le r) = P(Y_3' \le Y_1' + r)$

Figure 3.10.4 shows the region in the $uv$-plane corresponding to the event that $R \le r$. Integrating the joint pdf of $Y_1'$ and $Y_3'$ over the shaded region gives

$F_R(r) = P(R \le r) = \int_0^{1-r}\!\int_u^{u+r} 6(v - u)\,dv\,du + \int_{1-r}^{1}\!\int_u^{1} 6(v - u)\,dv\,du$

The double integral equals $3r^2(1 - r) + r^3 = 3r^2 - 2r^3$, which implies that

$f_R(r) = F_R'(r) = 6r(1 - r), \quad 0 \le r \le 1$

QUESTIONS

3.10.1. Suppose the length of time, in minutes, that you have to wait at a bank teller's window is uniformly distributed over the interval (0, 10). If you go to the bank four times during the next month, what is the probability that your second longest wait will be less than 5 minutes?

3.10.2. A random sample of size $n = 6$ is taken from the pdf $f_Y(y) = 3y^2$, $0 \le y \le 1$. Find $P(Y_5' > 0.75)$.

3.10.3. What is the probability that the larger of two random observations drawn from any continuous pdf will exceed the sixtieth percentile?

3.10.4. A random sample of size 5 is drawn from the pdf $f_Y(y) = 2y$, $0 \le y \le 1$. Calculate $P(Y_1' < 0.6 < Y_5')$. Hint: Consider the complement.

3.10.5.
Suppose that Y1, Y2, ..., Yn is a random sample of size n drawn from a continuous pdf, f_Y(y), whose median is m. Is P(Y_1' > m) less than, equal to, or greater than P(Y_n' > m)?

3.10.6. Let Y1, ..., Yn be a random sample from the exponential pdf f_Y(y) = e^{−y}, y > 0. What is the smallest n for which P(Y_min < 0.2) > 0.9?

3.10.7. Calculate P(0.6 < Y_min' < 0.7) if a random sample of size 6 is drawn from the uniform pdf defined over the interval (0, 1).

3.10.8. A random sample of size n = 5 is drawn from the pdf f_Y(y) = 2y, 0 < y < 1. On the same set of axes, graph the pdfs for Y, Y_1', and Y_5'.

3.10.9. Suppose that n observations are taken at random from the pdf

f_Y(y) = (1/√(2π))·e^{−(1/2)(y−20)²}, −∞ < y < ∞

What is the probability that the smallest observation is greater than 20?

3.10.10. Suppose that n observations are chosen at random from a continuous pdf f_Y(y). What is the probability that the last observation recorded will be the smallest number in the entire sample?

3.10.11. In a certain large metropolitan area the proportion, Y, of students bused varies from school to school. The distribution of proportions is described by the following pdf:

[Figure: a pdf over 0 ≤ y ≤ 1 with maximum height 2.]

Suppose the enrollment figures for five schools selected at random are examined. What is the probability that the school with the fourth highest proportion of bused children will have a Y-value in excess of 0.50? What is the probability that none of the five schools will have fewer than 10% of their students bused?

3.10.12. Consider a system containing n components, where the lifetimes of the components are independent random variables and each has pdf f_Y(y) = λe^{−λy}, y > 0. Show that the average time elapsing before the first component failure occurs is 1/(nλ).

3.10.13. Let Y1, Y2, ..., Yn be a random sample from a uniform pdf over [0, 1]. Use Theorem 3.10.1 to show that

∫_0^1 y^{i−1}(1 − y)^{n−i} dy = (i − 1)!(n − i)!/n!

3.10.14. Use Question 3.10.13 to find the value of E(Y_i'), where Y1, Y2, ..., Yn is a random sample from a uniform pdf defined over the interval (0, 1].

3.10.15. Suppose three points are picked randomly from the unit interval. What is the probability that the three are within a half unit of one another?

3.10.16.
Suppose a device has three components, all of whose lifetimes (in months) are modeled by the exponential pdf, f_Y(y) = e^{−y}, y > 0. What is the probability that all three components will fail within two months of one another?

3.11 CONDITIONAL DENSITIES

We have seen that many of the probabilities defined in Chapter 2 carry over to their random-variable counterparts. Another of these carryovers is the notion of a conditional probability, or, in what will be our terminology, a conditional probability density function. Applications of conditional pdfs are not uncommon. The age and height of a tree, for example, can be considered a pair of random variables. It is easy to measure a tree's height, but its age can be difficult to determine; thus it would be of interest to a lumberman to know the distribution of a tree's age given a known value for its height. Likewise, the plight of a school board member agonizing over which way to vote on a proposed tax increase would be that much easier if she knew the conditional probability that x additional tax dollars would stimulate an increase of y points in the scores of twelfth-graders taking a standardized proficiency exam.

Finding Conditional pdfs for Discrete Random Variables

In the case of discrete random variables, a conditional pdf can be treated in the same way as a conditional probability. Note the similarity between Definitions 3.11.1 and 2.4.1.

Definition 3.11.1. Let X and Y be discrete random variables. The conditional probability density function of Y given x—that is, the probability that Y takes on the value y given that X is equal to x—is denoted p_{Y|x}(y), where

p_{Y|x}(y) = P(Y = y | X = x) = p_{X,Y}(x, y)/p_X(x)

for p_X(x) ≠ 0.

EXAMPLE 3.11.1

A fair coin is tossed five times. Let the random variable Y denote the total number of heads that occur, and let X denote the number of heads occurring on the last two tosses. Find the conditional pdf p_{Y|x}(y) for all x and y.

Clearly, there will be three conditional pdfs, one for each possible value of X (x = 0, x = 1, and x = 2), and for each value of x there will be four possible values of Y, based on whether the first three tosses yield 0, 1, 2, or 3 heads. For example, suppose no heads occur on the last two tosses.
Then X = 0, and

p_{Y|0}(y) = P(Y = y | X = 0) = P(y heads occur on the first three tosses)
= C(3, y)·(1/2)^y·(1 − 1/2)^{3−y} = C(3, y)·(1/2)³, y = 0, 1, 2, 3

Similarly, suppose one head occurs on the last two tosses, so that X = 1. Notice that in that case Y = 1 if zero heads occur in the first three tosses, Y = 2 if one head occurs, and so on. The corresponding conditional pdf becomes

p_{Y|1}(y) = P(Y = y | X = 1) = C(3, y − 1)·(1/2)³, y = 1, 2, 3, 4

[Figure 3.11.1: the conditional pdfs p_{Y|0}(y), p_{Y|1}(y), and p_{Y|2}(y), each with maximum height 3/8.]

Figure 3.11.1 shows the three conditional pdfs. Each has the same shape, but the possible values of Y are different for each x.

EXAMPLE 3.11.2

Assume that the probabilistic behavior of a pair of discrete random variables X and Y is described by the joint pdf p_{X,Y}(x, y) = xy²/39, defined over the four points (1, 2), (1, 3), (2, 2), and (2, 3). Find the conditional probability that X = 1 given that Y = 2. By definition,

p_{X|2}(1) = P(X = 1 given that Y = 2) = p_{X,Y}(1, 2)/p_Y(2)
= (1·2²/39)/(1·2²/39 + 2·2²/39)
= 1/3

EXAMPLE 3.11.3

Suppose that X and Y are two independent binomial random variables, each defined on n trials and each having the same success probability p. Let Z = X + Y. Show that the conditional pdf p_{X|z}(x) is a hypergeometric distribution.

We know from Example 3.8.1 that Z has a binomial distribution with parameters 2n and p. That is,

p_Z(z) = C(2n, z)·p^z·(1 − p)^{2n−z}, z = 0, 1, ..., 2n

By Definition 3.11.1,

p_{X|z}(x) = P(X = x | Z = z) = p_{X,Z}(x, z)/p_Z(z)
= P(X = x and Z = z)/P(Z = z)
= P(X = x and Y = z − x)/P(Z = z)
= P(X = x)·P(Y = z − x)/P(Z = z)   (because X and Y are independent)
= [C(n, x)p^x(1 − p)^{n−x}·C(n, z − x)p^{z−x}(1 − p)^{n−z+x}]/[C(2n, z)p^z(1 − p)^{2n−z}]
= C(n, x)·C(n, z − x)/C(2n, z)

which we recognize as being the hypergeometric pdf.

Comment. The notion of a conditional pdf generalizes easily to situations involving more than two discrete random variables. For example, if X, Y, and Z have the joint pdf p_{X,Y,Z}(x, y, z), the
joint conditional pdf of, say, X and Y given that Z = z is the ratio

p_{X,Y|z}(x, y) = p_{X,Y,Z}(x, y, z)/p_Z(z)

EXAMPLE 3.11.4

Suppose that the random variables X, Y, and Z have the joint pdf p_{X,Y,Z}(x, y, z) = xy/(9z) for the points (1, 1, 1), (2, 1, 2), (1, 2, 2), (2, 2, 2), and (2, 2, 1). Find p_{X,Y|z}(x, y) for all values of z.

To begin, we see from the points for which p_{X,Y,Z}(x, y, z) is defined that Z takes on two values, 1 and 2. Suppose z = 1. Then

p_{X,Y|1}(x, y) = p_{X,Y,Z}(x, y, 1)/p_Z(1)

But

p_Z(1) = P(Z = 1) = P[(1, 1, 1) ∪ (2, 2, 1)] = 1·1/9 + 2·2/9 = 5/9

Therefore,

p_{X,Y|1}(x, y) = (xy/9)/(5/9) = xy/5 for (x, y) = (1, 1) and (2, 2)

Suppose z = 2. Then

p_Z(2) = P(Z = 2) = P[(2, 1, 2) ∪ (1, 2, 2) ∪ (2, 2, 2)] = 2/18 + 2/18 + 4/18 = 8/18

so

p_{X,Y|2}(x, y) = p_{X,Y,Z}(x, y, 2)/p_Z(2) = (xy/18)/(8/18) = xy/8 for (x, y) = (2, 1), (1, 2), and (2, 2)

QUESTIONS

3.11.1. Suppose X and Y have the joint pdf p_{X,Y}(x, y) = (x + xy)/15 for the points (1, 1), (1, 2), (2, 1), (2, 2), where X denotes a "message" sent (either x = 1 or x = 2) and Y denotes a "message" received. Find the probability that the message sent was the message received—that is, find p_{Y|x}(x).

3.11.2. Suppose a die is rolled six times. Let X be the total number of 4's that occur and let Y be the number of 4's in the first two tosses. Find p_{Y|x}(y).

3.11.3. An urn contains eight red chips, six white chips, and four blue chips. A sample of size 3 is drawn without replacement. Let X denote the number of red chips in the sample and Y the number of white chips. Find an expression for p_{Y|x}(y).

3.11.4. Five cards are dealt from a standard poker deck. Let X be the number of aces received, and Y the number of kings. Compute P(X = 2 | Y = 2).

3.11.5. Given that two discrete random variables X and Y follow the joint pdf p_{X,Y}(x, y) = k(x + y), for x = 1, 2, 3 and y = 1, 2, 3,

(a) Find k.
(b) Evaluate p_{Y|x}(1) for all values of x for which p_X(x) > 0.

3.11.6.
Let X denote the number on a chip drawn at random from an urn containing three chips, numbered 1, 2, and 3. Let Y be the number of heads that occur when a fair coin is tossed X times.

(a) Find p_{X,Y}(x, y).
(b) Find the marginal pdf of Y by summing out the x-values.

3.11.7. Suppose X, Y, and Z have a trivariate distribution described by the joint pdf p_{X,Y,Z}(x, y, z), where x, y, and z can each equal 1 or 2. Tabulate the conditional pdf of X and Y given each of the two values of z.

3.11.8. In Question 3.11.7, define the random variable W to be the "majority" of x, y, and z. For example, W(2, 2, 1) = 2 and W(1, 1, 1) = 1. Find the pdf of W given z.

3.11.9. Let X and Y be independent random variables where p_X(k) = e^{−λ}λ^k/k! and p_Y(k) = e^{−μ}μ^k/k!, for k = 0, 1, .... Show that the conditional pdf of X given X + Y = n is binomial with parameters n and λ/(λ + μ). Hint: See Question 3.8.1.

3.11.10. Suppose Compositor A is preparing a manuscript to be published. Assume that she makes X errors on a page, where X has the Poisson pdf, p_X(k) = e^{−2.2}(2.2)^k/k!, k = 0, 1, 2, .... A second compositor, B, is also working on the book. He makes Y errors on a page, where p_Y(k) = e^{−3}(3)^k/k!, k = 0, 1, 2, .... Assume that Compositor A prepares the first 100 pages of the text and Compositor B the last 100. After typesetting is completed, reviewers (with too much time on their hands!) determine the total number of errors the text contains. Write a formula for the exact probability that fewer than half of the errors are due to Compositor A.

Finding Conditional pdfs for Continuous Random Variables

If the random variables X and Y are continuous, we can still appeal to the quotient f_{X,Y}(x, y)/f_X(x) as the definition of f_{Y|x}(y) and argue its propriety by analogy. A more satisfying approach, though, is to arrive at the same conclusion by taking the limit of Y's "conditional" cdf. If X is continuous, a direct evaluation of F_{Y|x}(y) = P(Y ≤ y | X = x), via Definition 2.4.1, is impossible, since the denominator would be 0.
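The zero-denominator difficulty can be previewed numerically before the formal limit argument: condition on a narrow band x ≤ X ≤ x + h rather than on the zero-probability event X = x. The sketch below is an illustration only (it is not part of the text); it uses the joint pdf f_{X,Y}(x, y) = x + y on the unit square, for which f_{Y|0.5}(y) = 0.5 + y and hence P(Y ≤ 0.5 | X = 0.5) = 0.375.

```python
import random

# Approximate P(Y <= 0.5 | X = 0.5) by conditioning on the narrow band
# 0.5 <= X <= 0.55, for the joint pdf f(x, y) = x + y on (0,1) x (0,1).
# Exactly, f_{Y|0.5}(y) = 0.5 + y, so the target probability is 0.375.
random.seed(2)

def draw():
    # Rejection sampling from f(x, y) = x + y; the density is bounded by 2.
    while True:
        x, y = random.random(), random.random()
        if random.random() < (x + y) / 2:
            return x, y

h = 0.05
in_band = hits = 0
for _ in range(200_000):
    x, y = draw()
    if 0.5 <= x <= 0.5 + h:
        in_band += 1
        hits += (y <= 0.5)

print(hits / in_band)   # close to 0.375; exact in the limit h -> 0
```

Shrinking h drives the band estimate toward the quotient f_{X,Y}(x, y)/f_X(x) just described.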
Alternatively, we can think of P(Y ≤ y | X = x) as a limit:

P(Y ≤ y | X = x) = lim_{h→0} P(Y ≤ y | x ≤ X ≤ x + h)
= lim_{h→0} [∫_x^{x+h} ∫_{−∞}^y f_{X,Y}(t, u) du dt] / [∫_x^{x+h} f_X(t) dt]    (3.11.1)

Evaluating the limits of the numerator and denominator separately gives 0/0, so L'Hôpital's rule is indicated. By the fundamental theorem of calculus,

(d/dh) ∫_x^{x+h} g(t) dt = g(x + h)

which simplifies Equation 3.11.1 to

P(Y ≤ y | X = x) = lim_{h→0} [∫_{−∞}^y f_{X,Y}(x + h, u) du] / f_X(x + h) = [∫_{−∞}^y f_{X,Y}(x, u) du] / f_X(x)

provided that the limit and the integral can be interchanged [see (9) for a discussion of when such an interchange is valid]. It follows from this last expression that f_{X,Y}(x, y)/f_X(x) behaves as a probability density function should, and we are justified in extending Definition 3.11.1 to the continuous case.

EXAMPLE 3.11.5

Let X and Y be continuous random variables with joint pdf

f_{X,Y}(x, y) = (1/8)(6 − x − y), 0 < x < 2, 2 < y < 4; 0, elsewhere

Find (a) f_X(x), (b) f_{Y|x}(y), and (c) P(2 < Y < 3 | x = 1).

a. By Theorem 3.7.2,

f_X(x) = ∫_{−∞}^∞ f_{X,Y}(x, y) dy = ∫_2^4 (1/8)(6 − x − y) dy = (1/8)(6 − 2x), 0 < x < 2

b. Substituting into the statement of Definition 3.11.1, we can write

f_{Y|x}(y) = f_{X,Y}(x, y)/f_X(x) = (6 − x − y)/(6 − 2x), 2 < y < 4

c. To find P(2 < Y < 3 | x = 1) we simply integrate f_{Y|1}(y) over the interval 2 < y < 3:

P(2 < Y < 3 | x = 1) = ∫_2^3 f_{Y|1}(y) dy = ∫_2^3 [(5 − y)/4] dy = 5/8

[A partial check that the derivation of a conditional pdf is correct comes from integrating f_{Y|x}(y) over the entire range of Y. That integral should be one. Here, for example, when x = 1, ∫_2^4 [(5 − y)/4] dy does equal one.]

QUESTIONS

3.11.11. Let X be a nonnegative random variable. We say that X is memoryless if

P(X > s + t | X > t) = P(X > s) for all s, t ≥ 0

Show that a random variable with pdf f_X(x) = (1/λ)e^{−x/λ}, x > 0, is memoryless.

3.11.12. Given the joint pdf f_{X,Y}(x, y) = 2e^{−(x+y)}, 0 < x < y, y > 0, find

(a) P(Y < 1 | X < 1)
(b) P(Y < 1 | X = 1)
(c) f_{Y|x}(y)
(d) E(Y | x)

3.11.13.
Find the conditional pdf of Y given x if f_{X,Y}(x, y) = x + y for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1.

3.11.14. If f_{X,Y}(x, y) = 2 for x ≥ 0, y ≥ 0, x + y ≤ 1, show that the conditional pdf of Y given x is uniform.

3.11.15. Suppose that f_{Y|x}(y) = (2y + 4x)/(1 + 4x) and f_X(x) = (1/3)(1 + 4x) for 0 < x < 1 and 0 < y < 1. Find the marginal pdf for Y.

3.11.16. Suppose that X and Y are distributed according to the joint pdf

f_{X,Y}(x, y) = (2/5)(2x + 3y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1

Find (a) f_X(x), (b) f_{Y|x}(y), (c) P(1/4 ≤ Y ≤ 3/4 | X = 1/2), and (d) E(Y | x).

3.11.17. If X and Y have the joint pdf

f_{X,Y}(x, y) = 2, 0 < x < y < 1

find P(0 < X < 1/2 | Y = 3/4).

3.11.18. Find P(X < 1 | Y = 3/2) if X and Y have the joint pdf

f_{X,Y}(x, y) = xy/2, 0 < x < y < 2

3.11.19. Suppose that X1, X2, X3, X4, and X5 have the joint pdf

f_{X1,X2,X3,X4,X5}(x1, x2, x3, x4, x5) = 32·x1·x2·x3·x4·x5

for 0 < xi < 1, i = 1, 2, ..., 5. Find the joint conditional pdf of X1, X2, and X3 given that X4 = x4 and X5 = x5.

3.11.20. Suppose the random variables X and Y are jointly distributed according to the pdf

f_{X,Y}(x, y) = (6/7)(x² + xy/2), 0 < x < 1, 0 < y < 2

Find
(a) f_X(x)
(b) P(X > 2Y)
(c) P(Y > 1 | X > 1/2)

3.12 MOMENT-GENERATING FUNCTIONS

Finding moments of random variables directly, particularly the higher moments defined in Section 3.6, is conceptually straightforward but can be quite problematic: Depending on the nature of the pdf, integrals and sums of the form

∫_{−∞}^∞ y^r f_Y(y) dy and Σ_{all k} k^r p_X(k)

can be very difficult to evaluate. Fortunately, an alternative method is often available. For many pdfs, we can find a moment-generating function (or mgf), M_W(t), one of whose properties is that the rth derivative of M_W(t) evaluated at zero is equal to E(W^r).

Calculating a Random Variable's Moment-Generating Function

In principle, what we call a moment-generating function is a direct application of Theorem 3.5.3.

Definition 3.12.1. Let W be a random variable. The moment-generating function (mgf) for W is denoted M_W(t) and given by

M_W(t) = E(e^{tW}) = Σ_{all k} e^{tk}·p_W(k), if W is discrete
M_W(t) = E(e^{tW}) = ∫_{−∞}^∞ e^{tw}·f_W(w) dw, if W is continuous

at all values of t for which the expected value exists.

EXAMPLE 3.12.1

Suppose the random variable X has a geometric pdf,

p_X(k) = (1 − p)^{k−1}·p, k = 1, 2, ...

(In practice, this is the pdf that models the occurrence of the first success in a series of independent trials, where each trial has a probability p of ending in success.) Find M_X(t), the moment-generating function for X.

Since X is discrete, the first part of Definition 3.12.1 applies, so

M_X(t) = E(e^{tX}) = Σ_{k=1}^∞ e^{tk}·(1 − p)^{k−1}·p = [p/(1 − p)]·Σ_{k=1}^∞ [(1 − p)e^t]^k    (3.12.1)

The t in M_X(t) can be any number in a neighborhood of zero, as long as M_X(t) < ∞. Here, M_X(t) is an infinite sum of the terms [(1 − p)e^t]^k, and that sum will be finite only if (1 − p)e^t < 1, or, equivalently, if t < ln(1/(1 − p)). It will be assumed, then, in what follows that 0 < t < ln(1/(1 − p)). Recall that

Σ_{k=0}^∞ r^k = 1/(1 − r), provided 0 < r < 1

This formula can be used on Equation 3.12.1, where r = (1 − p)e^t and 0 < t < ln(1/(1 − p)). Specifically,

M_X(t) = [p/(1 − p)]·{1/[1 − (1 − p)e^t] − 1} = pe^t/[1 − (1 − p)e^t]

EXAMPLE 3.12.2

Suppose that X is a binomial random variable with pdf

p_X(k) = C(n, k)·p^k·(1 − p)^{n−k}, k = 0, 1, ..., n

Find M_X(t).

By Definition 3.12.1,

M_X(t) = Σ_{k=0}^n e^{tk}·C(n, k)·p^k·(1 − p)^{n−k} = Σ_{k=0}^n C(n, k)·(pe^t)^k·(1 − p)^{n−k}    (3.12.2)

To get a closed-form expression for M_X(t)—that is, to evaluate the sum indicated in Equation 3.12.2—requires a (hopefully) familiar formula from algebra: According to Newton's binomial expansion,

(x + y)^n = Σ_{k=0}^n C(n, k)·x^k·y^{n−k}    (3.12.3)

for any x and y. Suppose we let x = pe^t and y = 1 − p. It follows from Equations 3.12.2 and 3.12.3, then, that

M_X(t) = (1 − p + pe^t)^n

(Notice that in this case M_X(t) is defined for all values of t.)

EXAMPLE 3.12.3

Suppose that Y has an exponential pdf, where f_Y(y) = λe^{−λy}, y > 0. Find M_Y(t).

Since the exponential pdf describes a continuous random variable, M_Y(t) is an integral:

M_Y(t) = E(e^{tY}) = ∫_0^∞ e^{ty}·λe^{−λy} dy = ∫_0^∞ λe^{−(λ−t)y} dy

After making the substitution u = (λ − t)y, we can write

M_Y(t) = [λ/(λ − t)]·∫_0^∞ e^{−u} du = [λ/(λ − t)]·[−e^{−u}]_0^∞ = λ/(λ − t)

Here, M_Y(t) is finite and nonzero only when (λ − t)y > 0, which requires that t be less than λ. For t ≥ λ, M_Y(t) fails to exist.

EXAMPLE 3.12.4

The normal (or bell-shaped) curve was introduced earlier. Its pdf is the cumbersome function

f_Y(y) = [1/(√(2π)σ)]·exp[−(1/2)((y − μ)/σ)²], −∞ < y < ∞

where μ = E(Y) and σ² = Var(Y). Derive the moment-generating function for this most important of all probability models.

By Definition 3.12.1,

M_Y(t) = ∫_{−∞}^∞ e^{ty}·[1/(√(2π)σ)]·exp[−(1/2)((y − μ)/σ)²] dy    (3.12.4)

Evaluating the integral in Equation 3.12.4 is best accomplished by completing the square of the numerator of the exponent (which means that the square of half the coefficient of y is added and subtracted). That is, we can write

y² − (2μ + 2σ²t)y + μ² = [y − (μ + σ²t)]² − (μ + σ²t)² + μ² = [y − (μ + σ²t)]² − 2μσ²t − σ⁴t²    (3.12.5)

Given the last two terms on the right-hand side of Equation 3.12.5, the factor exp(μt + σ²t²/2) can be moved out of the integral, and Equation 3.12.4 reduces to

M_Y(t) = e^{μt + σ²t²/2}·∫_{−∞}^∞ [1/(√(2π)σ)]·exp{−[1/(2σ²)]·[y − (μ + σ²t)]²} dy

But, together, the latter two factors equal one (why?), since the remaining integrand is the pdf of a normal random variable with mean μ + σ²t and variance σ². It follows that the moment-generating function for a normally distributed random variable is given by

M_Y(t) = e^{μt + σ²t²/2}

QUESTIONS

3.12.1. Let X be a random variable with pdf p_X(k) = 1/n for k = 0, 1, 2, ..., n − 1 and 0 otherwise. Show that

M_X(t) = (1 − e^{nt})/[n(1 − e^t)]

3.12.2. Two chips are drawn at random and without replacement from an urn that contains five chips, numbered 1 through 5. If the sum of the chips drawn is even, the random variable X equals 5; if the sum of the chips drawn is odd, X = −3. Find the moment-generating function for X.

3.12.3. Find the expected value of e^{3X} if X is a binomial random variable with n = 10 and success probability p.

3.12.4. Find the moment-generating function of the random variable X whose probability density function is given by p_X(k).

3.12.5. Which pdfs would have the following moment-generating functions?

(a) M_Y(t) = e^{6t²}
(b) M_Y(t) = 2/(2 − t)
(c) M_X(t) = (1/2 + (1/2)e^t)^4
(d) M_X(t) = 0.3e^t/(1 − 0.7e^t)

3.12.6. Let Y have pdf

f_Y(y) = y, 0 ≤ y ≤ 1; 2 − y, 1 ≤ y ≤ 2; 0, elsewhere

Find M_Y(t).

3.12.7.
A random variable X is said to have a Poisson distribution if p_X(k) = P(X = k) = e^{−λ}λ^k/k!, k = 0, 1, 2, .... Find the moment-generating function for a Poisson random variable. Use the fact that e^u = Σ_{k=0}^∞ u^k/k!.

3.12.8. Let Y be a continuous random variable with f_Y(y) = ye^{−y}, 0 ≤ y. Show that

M_Y(t) = 1/(1 − t)²

Using Moment-Generating Functions to Find Moments

Having practiced finding the functions M_X(t) and M_Y(t), we now turn to the theorem that relates them to E(X^r) and E(Y^r).

Theorem 3.12.1. Let W be a random variable with probability density function f_W(w). [If W is continuous, f_W(w) must be sufficiently smooth to allow the order of differentiation and integration to be interchanged.] Let M_W(t) be the moment-generating function for W. Then, provided the rth moment exists,

M_W^{(r)}(0) = E(W^r)

Proof. We will verify the theorem for the continuous case where r is 1 or 2. The extension to discrete random variables and to an arbitrary positive integer r is straightforward.

For r = 1,

M_Y^{(1)}(0) = [d/dt ∫_{−∞}^∞ e^{ty} f_Y(y) dy]_{t=0}
= [∫_{−∞}^∞ (d/dt) e^{ty} f_Y(y) dy]_{t=0}
= [∫_{−∞}^∞ y e^{ty} f_Y(y) dy]_{t=0}
= ∫_{−∞}^∞ y e^{0·y} f_Y(y) dy = ∫_{−∞}^∞ y f_Y(y) dy = E(Y)

For r = 2,

M_Y^{(2)}(0) = [∫_{−∞}^∞ y² e^{ty} f_Y(y) dy]_{t=0} = ∫_{−∞}^∞ y² e^{0·y} f_Y(y) dy = ∫_{−∞}^∞ y² f_Y(y) dy = E(Y²)

EXAMPLE 3.12.5

For a geometric random variable X with pdf

p_X(k) = (1 − p)^{k−1}·p, k = 1, 2, ...

we saw in Example 3.12.1 that

M_X(t) = pe^t·[1 − (1 − p)e^t]^{−1}

Find the expected value of X by differentiating its moment-generating function.

Using the product rule, we can write the first derivative of M_X(t) as

M_X^{(1)}(t) = pe^t·(−1)·[1 − (1 − p)e^t]^{−2}·(−1)(1 − p)e^t + [1 − (1 − p)e^t]^{−1}·pe^t

Setting t = 0 shows that

E(X) = M_X^{(1)}(0) = p(1 − p)/[1 − (1 − p)]² + p/[1 − (1 − p)] = (1 − p)/p + 1 = 1/p

EXAMPLE 3.12.6

Find the expected value of an exponential random variable with pdf

f_Y(y) = λe^{−λy}, y > 0

Use the fact that M_Y(t) = λ(λ − t)^{−1} (as shown in Example 3.12.3).

Differentiating M_Y(t) gives

M_Y^{(1)}(t) = λ(−1)(λ − t)^{−2}(−1) = λ/(λ − t)²

Set t = 0. Then

E(Y) = M_Y^{(1)}(0) = λ/(λ − 0)² = 1/λ

implying that the expected value of an exponential random variable is the reciprocal of its parameter.

EXAMPLE 3.12.7

Find an expression for E(X^k) if the moment-generating function for X is given by

M_X(t) = (1 − p1 − p2) + p1·e^t + p2·e^{2t}

The only way to deduce a formula for an arbitrary moment such as E(X^k) is to calculate the first couple of moments and look for a pattern that can be generalized. Here,

E(X) = M_X^{(1)}(0) = p1·e^0 + 2p2·e^{2·0} = p1 + 2p2

Taking the second derivative, we see that

E(X²) = M_X^{(2)}(0) = p1·e^0 + 2²·p2·e^{2·0} = p1 + 2²p2

Clearly, each successive differentiation will leave the p1 term unaffected but will multiply the p2 term by two. Therefore,

E(X^k) = M_X^{(k)}(0) = p1 + 2^k·p2

Using Moment-Generating Functions to Find Variances

In addition to providing a useful technique for calculating E(W^r), moment-generating functions can also be used to find variances, because

Var(W) = E(W²) − [E(W)]² = M_W^{(2)}(0) − [M_W^{(1)}(0)]²    (3.12.6)

for any random variable W (recall Theorem 3.6.1). Other useful "descriptors" of pdfs can also be reduced to combinations of moments. The skewness of a distribution, for example, is a function of E[(W − μ)³], where μ = E(W). But

E[(W − μ)³] = E(W³) − 3E(W²)E(W) + 2[E(W)]³

In many cases, finding E[(W − μ)²] or E[(W − μ)³] could be quite difficult if moment-generating functions were not available.

EXAMPLE 3.12.8

We know from Example 3.12.2 that if X is a binomial random variable with parameters n and p, then

M_X(t) = (1 − p + pe^t)^n

Use M_X(t) to find the variance of X.

The first two derivatives of M_X(t) are

M_X^{(1)}(t) = n(1 − p + pe^t)^{n−1}·pe^t

and

M_X^{(2)}(t) = n(n − 1)(1 − p + pe^t)^{n−2}·(pe^t)² + n(1 − p + pe^t)^{n−1}·pe^t

Setting t = 0 gives

E(X) = M_X^{(1)}(0) = np

and

E(X²) = M_X^{(2)}(0) = n(n − 1)p² + np

Therefore,

Var(X) = E(X²) − [E(X)]² = n(n − 1)p² + np − (np)² = np(1 − p)

(the same answer we found in Example 3.9.8).

EXAMPLE 3.12.9

A discrete random variable X is said to have a Poisson pdf if

p_X(k) = P(X = k) = e^{−λ}λ^k/k!, k = 0, 1, 2, ...
(An example of such a distribution is the mortality data described in Case Study 3.3.1.) It can be shown (see Question 3.12.7) that the moment-generating function for a Poisson random variable is given by

M_X(t) = e^{−λ+λe^t}

Use M_X(t) to find E(X) and Var(X).

Taking the first derivative of M_X(t) gives

M_X^{(1)}(t) = e^{−λ+λe^t}·λe^t

so

E(X) = M_X^{(1)}(0) = e^{−λ+λ}·λe^0 = λ

Moreover, applying the product rule to M_X^{(1)}(t) yields the second derivative,

M_X^{(2)}(t) = e^{−λ+λe^t}·λe^t·λe^t + λe^t·e^{−λ+λe^t}

Setting t = 0,

E(X²) = M_X^{(2)}(0) = e^{−λ+λe^0}·λe^0·λe^0 + λe^0·e^{−λ+λe^0} = λ² + λ

The variance of a Poisson random variable, then, proves to be the same as its mean:

Var(X) = E(X²) − [E(X)]² = M_X^{(2)}(0) − [M_X^{(1)}(0)]² = λ² + λ − λ² = λ

QUESTIONS

3.12.9. Calculate E(Y³) for a random variable whose moment-generating function is M_Y(t) = e^{t²/2}.

3.12.10. Find E(Y⁴) if Y is an exponential random variable with f_Y(y) = λe^{−λy}, y > 0.

3.12.11. The form of the moment-generating function for a normal random variable is M_Y(t) = e^{μt+σ²t²/2} (recall Example 3.12.4). Differentiate M_Y(t) to verify that E(Y) = μ and Var(Y) = σ².

3.12.12. What is E(Y) for a random variable whose moment-generating function is M_Y(t) = (1 − t/λ)^{−1}?

3.12.13. Find E(Y²) if the moment-generating function for Y is given by M_Y(t) = e^{μt+σ²t²/2}. Hint: Use Example 3.12.4 to find E(Y²) without taking any derivatives. Recall Theorem 3.6.1.

3.12.14. Find an expression for E(Y) if M_Y(t) = (1 − t/λ)^{−r}, where λ is any positive real number and r is a positive integer.

3.12.15. Use M_X(t) to find the expected value of the uniform random variable described in Question 3.12.1.

3.12.16. Find the variance of Y if M_Y(t) = 2/(2 − t).

Using Moment-Generating Functions to Identify pdfs

Finding moments is not the only use of moment-generating functions. They are also used to identify the pdf of sums of random variables—that is, to find f_W(w), where W = W1 + W2 + ... + Wn. Their assistance in the latter is particularly important for two reasons: (1) many statistical procedures are defined in terms of sums, and (2) alternative methods for finding f_{W1+W2+...+Wn}(w) are extremely cumbersome. The next two theorems give the background results necessary for deriving f_W(w).
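The mgf-based moment calculations in Examples 3.12.8 and 3.12.9 can be cross-checked by computing E(X) and E(X²) directly from the pmfs. A short sketch (the particular n, p, and λ values below are arbitrary illustrative choices, not taken from the text):

```python
from math import comb, exp, factorial

# Cross-check of Example 3.12.8: for X binomial(n, p), the moments the
# mgf delivers satisfy E(X) = np and Var(X) = np(1 - p).
n, p = 12, 0.35
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
b1 = sum(k * q for k, q in enumerate(pmf))          # E(X)
b2 = sum(k * k * q for k, q in enumerate(pmf))      # E(X^2)
assert abs(b1 - n * p) < 1e-10
assert abs((b2 - b1**2) - n * p * (1 - p)) < 1e-10

# Cross-check of Example 3.12.9: for a Poisson pmf (summed far enough out
# that the truncation error is negligible), mean and variance both equal lambda.
lam = 3.0
pois = [exp(-lam) * lam**k / factorial(k) for k in range(80)]
m1 = sum(k * q for k, q in enumerate(pois))         # E(X)
m2 = sum(k * k * q for k, q in enumerate(pois))     # E(X^2)
assert abs(m1 - lam) < 1e-9
assert abs((m2 - m1**2) - lam) < 1e-9
```

The direct sums agree with the derivative-at-zero values np, np(1 − p), λ, and λ obtained from the mgfs.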
Theorem 3.12.2 states a uniqueness property of moment-generating functions: If W1 and W2 are random variables with the same mgfs, they necessarily have the same pdfs. In practice, applications of Theorem 3.12.2 rely on the algebraic properties of mgfs given in Theorem 3.12.3.

Theorem 3.12.2. Suppose that W1 and W2 are random variables for which M_{W1}(t) = M_{W2}(t) for some interval of t's containing 0. Then f_{W1}(w) = f_{W2}(w).

Proof. See (97). □

Theorem 3.12.3.

a. Let W be a random variable with moment-generating function M_W(t). Let V = aW + b. Then

M_V(t) = e^{bt}·M_W(at)

b. Let W1, W2, ..., Wn be independent random variables with moment-generating functions M_{W1}(t), M_{W2}(t), ..., and M_{Wn}(t). Let W = W1 + W2 + ... + Wn. Then

M_W(t) = M_{W1}(t)·M_{W2}(t)···M_{Wn}(t)

Proof. The proof of Part a is left as an exercise. □

EXAMPLE 3.12.10

Suppose that X1 and X2 are two independent Poisson random variables with means λ1 and λ2, respectively. That is,

p_{X1}(k) = P(X1 = k) = e^{−λ1}λ1^k/k!, k = 0, 1, 2, ...

and

p_{X2}(k) = P(X2 = k) = e^{−λ2}λ2^k/k!, k = 0, 1, 2, ...

Let X = X1 + X2. What is the pdf for X?

According to Example 3.12.9, the moment-generating functions for X1 and X2 are

M_{X1}(t) = e^{−λ1+λ1e^t} and M_{X2}(t) = e^{−λ2+λ2e^t}

Moreover, if X = X1 + X2, then by Part b of Theorem 3.12.3,

M_X(t) = M_{X1}(t)·M_{X2}(t) = e^{−(λ1+λ2)+(λ1+λ2)e^t}    (3.12.7)

But, by inspection, Equation 3.12.7 is the moment-generating function that a Poisson random variable with λ = λ1 + λ2 would have. It follows, then, by Theorem 3.12.2 that

p_X(k) = e^{−(λ1+λ2)}(λ1 + λ2)^k/k!, k = 0, 1, 2, ...

Comment. The Poisson random variable reproduces itself in the sense that the sum of independent Poissons is also a Poisson. A similar property holds for independent normal random variables (see Question 3.12.19) and, under certain conditions, for independent binomial random variables (recall Example 3.8.1).
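The conclusion of Example 3.12.10 can also be confirmed by brute-force convolution of the two pmfs; the means 1.5 and 2.5 below are arbitrary illustrative choices.

```python
from math import exp, factorial

# Convolution check of Example 3.12.10: P(X1 + X2 = k), computed directly
# as sum_j P(X1 = j) P(X2 = k - j), matches the Poisson pmf with
# lambda = lambda1 + lambda2.
def pois(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam1, lam2 = 1.5, 2.5
for k in range(12):
    conv = sum(pois(j, lam1) * pois(k - j, lam2) for j in range(k + 1))
    assert abs(conv - pois(k, lam1 + lam2)) < 1e-12
```

The same check would fail for two independent exponentials (their sum is not exponential), which is the point of Question 3.12.19(b).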
EXAMPLE 3.12.11

We saw in Example 3.12.4 that a normal random variable, Y, with mean μ and variance σ² has pdf

f_Y(y) = [1/(√(2π)σ)]·exp[−(1/2)((y − μ)/σ)²]

and mgf

M_Y(t) = e^{μt+σ²t²/2}

By definition, a standard normal random variable is a normal random variable for which μ = 0 and σ = 1. Denoted Z, the pdf and mgf for a standard normal random variable are

f_Z(z) = (1/√(2π))·e^{−z²/2}, −∞ < z < ∞ and M_Z(t) = e^{t²/2}

Show that the ratio (Y − μ)/σ is a standard normal random variable, Z.

Write (Y − μ)/σ as (1/σ)Y − μ/σ. By Part a of Theorem 3.12.3,

M_{(Y−μ)/σ}(t) = e^{−μt/σ}·M_Y(t/σ) = e^{−μt/σ}·e^{μ(t/σ)+σ²(t/σ)²/2} = e^{t²/2}

But M_Z(t) = e^{t²/2}, so it follows from Theorem 3.12.2 that the pdf for (Y − μ)/σ is the same as f_Z(z). [We call (Y − μ)/σ a Z transformation. Its importance will become evident in Chapter 4.]

QUESTIONS

3.12.17. Use Theorem 3.12.3(a) and Question 3.12.8 to find the moment-generating function of the random variable Y, where f_Y(y) = λ²ye^{−λy}, y ≥ 0.

3.12.18. Let Y1, Y2, and Y3 be independent random variables, each having the pdf of Question 3.12.17. Use Theorem 3.12.3(b) to find the moment-generating function of Y1 + Y2 + Y3. Compare your answer to the moment-generating function in Question 3.12.14.

3.12.19. Use Theorems 3.12.2 and 3.12.3 to determine which of the following statements is true:

(a) The sum of two independent Poisson random variables has a Poisson distribution.
(b) The sum of two independent exponential random variables has an exponential distribution.
(c) The sum of two independent normal random variables has a normal distribution.

3.12.20. Calculate P(X ≤ 2) if M_X(t) = (1/4 + (3/4)e^t)^5.

3.12.21. Suppose that Y1, Y2, ..., Yn is a random sample of size n from a normal distribution with mean μ and standard deviation σ. Use moment-generating functions to deduce the pdf of Ȳ = (1/n)·Σ_{i=1}^n Yi.

3.12.22. Suppose the moment-generating function for a random variable W is given. Calculate P(W ≤ 1). Hint: Write W as a sum.

3.12.23.
Suppose that X is a Poisson random variable, where p_X(k) = e^{−λ}λ^k/k!, k = 0, 1, ....

(a) Does the random variable W = 3X have a Poisson distribution?
(b) Does the random variable W = 3X + 1 have a Poisson distribution?

3.12.24. Suppose that Y is a normal random variable, where f_Y(y) = [1/(√(2π)σ)]·exp[−(1/2)((y − μ)/σ)²], −∞ < y < ∞.

(a) Does the random variable W = 3Y have a normal distribution?
(b) Does the random variable W = 3Y + 1 have a normal distribution?

3.13 TAKING A SECOND LOOK AT STATISTICS (INTERPRETING MEANS)

One of the most important ideas to come out of Chapter 3 is the notion of the expected value (or mean) of a random variable. Defined in Section 3.5 as a number that reflects the center of a pdf, the expected value (μ) was originally introduced for the benefit of gamblers. It spoke to one of their most fundamental questions—How much will I win or lose, on the average, if I play a certain game? (Actually, the real question they probably had in mind was "How much are you going to lose, on the average?") Despite having had such a selfish, materialistic, gambling-oriented raison d'être, the expected value was embraced by (respectable) researchers of all persuasions as a preeminently useful descriptor of a distribution. Today, it would not be an exaggeration to claim that the majority of all statistical analyses focus on either (1) the expected value of a single random variable or (2) comparing the expected values of two or more random variables.

In the lingo of applied statistics, there are actually two fundamentally different types of "means"—population means and sample means. The term "population mean" is a synonym for what mathematical statisticians would call an expected value—that is, a population mean (μ) is a weighted average of the possible values associated with a theoretical probability model, either p_X(k) or f_Y(y), depending on whether the underlying random variable is discrete or continuous. A sample mean is the arithmetic average of a set of measurements. If, for example, n observations—y1, y2,
..., yn—are taken on a continuous random variable Y, the sample mean is denoted

ȳ = (1/n)·Σ_{i=1}^n yi

Conceptually, sample means are estimates of population means, where the "quality" of the estimation is a function of (1) the sample size and (2) the standard deviation (σ) associated with the individual measurements. As the sample size gets larger and/or the standard deviation gets smaller, the approximation will tend to get better.

Interpreting means (either ȳ or μ) is not always easy. To be sure, what they represent in principle is clear enough—both ȳ and μ are measuring the centers of their respective distributions. Still, many a wrong conclusion can be traced directly to researchers misinterpreting the value of a mean. Why? Because the distributions that ȳ and/or μ represent may be dramatically different than the distributions we think they represent.

A case in point arises in connection with SAT scores. Each Fall the average SAT scores for each of the fifty states and the District of Columbia are released by the Educational Testing Service (ETS). With "accountability" being one of the new watchwords associated with K-12 education, SAT averages have become highly politicized. At the national level, Democrats and Republicans each campaign, in no small measure, on their own versions of how best to raise scores on standardized exams; at the state level, legislatures often modify education budgets in response to how well or how poorly their students performed the year before.

Does it make sense, though, to use SAT averages to characterize the quality of a state's education system? Absolutely not! Averages of this sort refer to very different distributions from state to state. Any attempt to compare them at face value will necessarily be misleading.

One such state-by-state SAT comparison is reproduced in Table 3.13.1 (128). Notice that Tennessee's entry is 1023, which is quite high. Does it follow that Tennessee's educational system is among the best in the country? Probably not. Most independent assessments of K-12 education rank Tennessee's schools among the weakest in the nation, not among the best. If those assessments are correct, why do Tennessee's students do so well on the SAT?
The answer to that question lies in the academic profiles of the students who take the SAT. Most college-bound students in Tennessee apply exclusively to schools in the South and the Midwest, where admissions are based on the ACT, not the SAT. The SAT is used primarily by private schools, where admissions tend to be more competitive. As a result, the students in Tennessee who take the SAT are not representative of the entire population of students in that state. A disproportionate number are exceptionally good students, those who feel that they have the ability to be competitive at highly selective schools. The number 1023, then, is the average of something (in this case, an atypical subset of all Tennessee students), but it does not correspond to the center of the SAT distribution for all Tennessee students.

The moral here is that analyzing data requires that we look beyond the obvious. What we learn in Chapter 3 about random variables, probability distributions, and expected values is helpful only if we take the time to learn about the context and the idiosyncracies of the phenomenon being studied. To do otherwise is likely to produce conclusions that are, at best, superficial and, at worst, incorrect.

[Table 3.13.1: Average SAT scores, by state, for the fifty states and the District of Columbia. Tennessee's entry is 1023.]

APPENDIX 3.A.1 MINITAB APPLICATIONS

Numerous software packages are available for doing a variety of probability and statistical calculations. Among the first to be developed, and one that continues to be very popular, is MINITAB. Beginning here, we will include at the ends of certain chapters a short discussion of MINITAB solutions to some of the problems that were discussed in that chapter.
What other software packages can do, and the ways their outputs are formatted, are likely to be quite similar.

Contained in MINITAB are subroutines that can do some of the more important pdf and cdf computations described in Sections 3.3 and 3.4. In the case of binomial random variables, for example, the statements

    MTB > pdf k;
    SUBC > binomial n p.

and

    MTB > cdf k;
    SUBC > binomial n p.

will calculate C(n, k)p^k(1 - p)^(n-k) and the sum from r = 0 to k of C(n, r)p^r(1 - p)^(n-r), respectively. Figure 3.A.1.1 shows the MINITAB program for doing the cdf calculation [P(X <= 15)] asked for in Part (a) of Example 3.2.2. The commands pdf k and cdf k can be run on many of the probability models most likely to be encountered in real-world problems. Those on the list that we have already seen are the binomial, Poisson, normal, uniform, and exponential distributions.

    MTB > cdf 15;
    SUBC > binomial 30 0.60.

    Cumulative Distribution Function
    Binomial with n = 30 and p = 0.600000
    x       P(X <= x)
    15.00   0.1754

FIGURE 3.A.1.1

For discrete random variables, the cdf can be printed out in its entirety (that is, for every integer) by deleting the argument k and using the command MTB > cdf;. Typical is the output in Figure 3.A.1.2, corresponding to the cdf for a binomial random variable with n = 4 and p = 0.167.

    MTB > cdf;
    SUBC > binomial 4 0.167.

    Cumulative Distribution Function
    Binomial with n = 4 and p = 0.167000
    x    P(X <= x)
    0    0.4815
    1    0.8676
    2    0.9837
    3    0.9992
    4    1.0000

FIGURE 3.A.1.2

Also available is an inverse command, which in the case of a continuous random variable Y and a specified probability p identifies the value y having the property that P(Y <= y) = F_Y(y) = p. For example, if p = 0.60 and Y is an exponential random variable with pdf f_Y(y) = e^(-y), y > 0, the value y = 0.9163 has the property that P(Y <= 0.9163) = F_Y(0.9163) = 0.60. That is,

    F_Y(0.9163) = integral from 0 to 0.9163 of e^(-y) dy = 0.60

With MINITAB, the number 0.9163 is found by using the command MTB > invcdf 0.60 (see Figure 3.A.1.3).

    MTB > invcdf 0.60;
    SUBC > exponential 1.
    Inverse Cumulative Distribution Function
    Exponential with mean = 1.00000
    P(X <= x)    x
    0.6000       0.9163

FIGURE 3.A.1.3

CHAPTER 4: Special Distributions

4.1 INTRODUCTION
4.2 THE POISSON DISTRIBUTION
4.3 THE NORMAL DISTRIBUTION
4.4 THE GEOMETRIC DISTRIBUTION
4.5 THE NEGATIVE BINOMIAL DISTRIBUTION
4.6 THE GAMMA DISTRIBUTION
4.7 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS)
APPENDIX 4.A.1 MINITAB APPLICATIONS
APPENDIX 4.A.2 A PROOF OF THE CENTRAL LIMIT THEOREM

L. A. J. Quetelet. Although he maintained lifelong literary and artistic interests, Quetelet's mathematical talents led him to a doctorate from the University of Ghent and from there to a college teaching position in Brussels. In 1833 he was appointed astronomer at the Brussels Royal Observatory, after having been largely responsible for its founding. His work with the Belgian census marked the beginning of his pioneering efforts in what today would be called mathematical sociology. Quetelet was well known throughout Europe in scientific and literary circles: At the time of his death he was a member of more than one hundred learned societies. - Lambert Adolphe Jacques Quetelet (1796-1874)

4.1 INTRODUCTION

To "qualify" as a probability model, a function defined over a sample space S needs to satisfy only two criteria: (1) It must be nonnegative for all outcomes in S, and (2) it must sum or integrate to one. It means, for example, that a function f_Y(y) defined on 0 <= y <= 1 can be considered a pdf provided f_Y(y) >= 0 for all 0 <= y <= 1 and the integral of f_Y(y) from 0 to 1 equals 1. It certainly does not follow, though, that every f_Y(y) or p_X(k) satisfying these two criteria would actually be used as a probability model. A pdf has practical significance only if it does, indeed, model the probabilistic behavior of real-world phenomena. In point of fact, only a handful of functions do.
Whether a probability function, say f_Y(y), adequately models a given phenomenon ultimately depends on whether the physical factors that influence the value of Y parallel the mathematical assumptions implicit in f_Y(y). Surprisingly, many measurements (i.e., random variables) that seem to be very different are actually the consequence of the same set of assumptions (and will, therefore, be modeled by the same pdf). That being the case, it makes sense to single out these "real-world" pdf's and examine their properties in more detail. This, of course, is not an idea we are seeing for the first time; recall the attention given to the binomial and hypergeometric distributions in Section 3.2.

Chapter 4 continues in the spirit of Section 3.2 by examining five other widely used models. Three of the five are discrete; the other two are continuous. One of the continuous pdf's is the normal (or Gaussian) distribution, which, by far, is the most important of all probability models. As we will see, the normal "curve" figures prominently in every chapter from this point on.

Examples play a major role in Chapter 4. The only way to fully appreciate the generality of a probability model is to look at some of its specific applications. Included in this chapter are case studies ranging from the discovery of alpha-particle radiation to an early ESP experiment to an analysis of pregnancy durations to counting bug parts in peanut butter.

4.2 THE POISSON DISTRIBUTION

The binomial distribution problems that appeared in Section 3.2 all had relatively small values for n, so evaluating

    p_X(k) = P(X = k) = C(n, k)p^k(1 - p)^(n-k)

was not particularly difficult. But suppose n were 1000 and k, 500. Evaluating p_X(500) would be a formidable task for many handheld calculators, even today. Two hundred years ago, the prospect of doing binomial calculations by hand was a catalyst for mathematicians to develop some easy-to-use approximations. One of the first such approximations was the Poisson limit, which eventually gave rise to the Poisson distribution. Both are described in Section 4.2.
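On a computer the "formidable" evaluation is routine if the binomial coefficient is handled on the log scale. The sketch below uses only the Python standard library; since the text leaves p unspecified for the n = 1000, k = 500 case, p = 0.5 is assumed here purely for illustration.

```python
from math import lgamma, exp, log

def binom_pmf(k, n, p):
    # log-gamma form of C(n, k) avoids overflow in n! for large n
    log_c = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_c + k * log(p) + (n - k) * log(1 - p))

# The case mentioned in the text (p = 0.5 assumed)
print(binom_pmf(500, 1000, 0.5))   # about 0.0252

# Sanity check against the MINITAB output in Appendix 3.A.1:
# P(X <= 15) for n = 30, p = 0.60
print(round(sum(binom_pmf(k, 30, 0.60) for k in range(16)), 4))   # 0.1754
```

The log-gamma trick matters: computing 1000! directly as a float overflows, while its logarithm is perfectly manageable.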
Simeon Denis Poisson (1781-1840) was an eminent French mathematician and physicist, an academic administrator of some note, and, according to an 1826 letter from the mathematician Abel to a friend, a man who knew "how to behave with a great deal of dignity." One of Poisson's many interests was the application of probability to the law, and in 1837 he wrote Recherches sur la Probabilite de Jugements. Included in the latter is a limit for

    p_X(k) = C(n, k)p^k(1 - p)^(n-k)

that holds when n approaches infinity, p approaches 0, and np remains constant. In practice, Poisson's limit is used to approximate hard-to-calculate binomial probabilities where the values of n and p reflect the conditions of the limit, that is, when n is large and p is small.

The Poisson Limit

Deriving an asymptotic expression for the binomial probability model is a straightforward exercise in calculus, given that np is to remain fixed as n increases.

Theorem 4.2.1. Suppose X is a binomial random variable, where n -> infinity and p -> 0 in such a way that lambda = np remains constant. Then

    lim P(X = k) = e^(-lambda) lambda^k / k!    (as n -> infinity, p -> 0, np = lambda)

Proof. We begin by rewriting the binomial probability in terms of lambda:

    P(X = k) = [n! / (k!(n - k)!)] (lambda/n)^k (1 - lambda/n)^(n-k)
             = (lambda^k / k!) [n! / ((n - k)! n^k)] (1 - lambda/n)^n (1 - lambda/n)^(-k)

But since

    (1 - lambda/n)^n -> e^(-lambda)  and  (1 - lambda/n)^(-k) -> 1

as n -> infinity, we need only show that

    n! / ((n - k)! n^k) -> 1

to prove the theorem. However, note that

    n! / ((n - k)! n^k) = n(n - 1)...(n - k + 1) / n^k = 1(1 - 1/n)(1 - 2/n)...(1 - (k - 1)/n)

a quantity that, indeed, tends to 1 as n -> infinity (since k is held fixed).

EXAMPLE 4.2.1

Theorem 4.2.1 is an asymptotic result. Left unanswered is the question of the relevance of the Poisson limit for finite n and p. That is, how large does n have to be and how small does p have to be before e^(-np)(np)^k / k! becomes a good approximation to the binomial probability p_X(k)? Since "good approximation" is undefined, there is no way to answer that question in any completely specific way.
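One informal way to get a feel for the convergence is to compute both quantities for increasing n while holding lambda = np fixed; a quick Python sketch (lambda = 1 and k = 2 are arbitrary choices):

```python
from math import comb, exp, factorial

lam, k = 1.0, 2
poisson = exp(-lam) * lam**k / factorial(k)   # limiting value, about 0.1839

for n in [5, 100, 10_000]:
    p = lam / n                                # keep np = lam fixed
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    print(n, round(binom, 6), round(poisson, 6))
```

Running this shows the binomial pmf drifting toward the Poisson value as n grows, which is exactly the pattern tabulated next.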
Tables 4.2.1 and 4.2.2, though, offer a partial solution by comparing the closeness of the approximation for two particular sets of values for n and p. In both cases lambda = np is equal to one, but in the former, n is set equal to five; in the latter, to one hundred. We see in Table 4.2.1 (n = 5) that for some k the agreement between the binomial probability and Poisson's limit is not very good. If n is as large as one hundred, though (Table 4.2.2), the agreement is remarkably good for all k.

TABLE 4.2.1: Binomial Probabilities and Poisson Limits; n = 5 and p = 1/5 (lambda = 1)

k     C(5, k)(0.2)^k (0.8)^(5-k)    e^(-1)(1)^k / k!
0     0.328                         0.368
1     0.410                         0.368
2     0.205                         0.184
3     0.051                         0.061
4     0.006                         0.015
5     0.000                         0.003
6+    0                             0.001
      1.000                         1.000

TABLE 4.2.2: Binomial Probabilities and Poisson Limits; n = 100 and p = 1/100 (lambda = 1)

k     C(100, k)(0.01)^k (0.99)^(100-k)    e^(-1)(1)^k / k!
0     0.366032                            0.367879
1     0.369730                            0.367879
2     0.184865                            0.183940
3     0.060999                            0.061313
4     0.014942                            0.015328
5     0.002898                            0.003066
6     0.000463                            0.000511
7     0.000063                            0.000073
8     0.000007                            0.000009
9     0.000001                            0.000001
10    0.000000                            0.000000
      1.000000                            0.999999

EXAMPLE 4.2.2

Shadyrest Hospital draws its patients from a rural area that has twelve thousand elderly residents. The probability that any one of the twelve thousand will have a heart attack on any given day and will need to be connected to a special cardiac monitoring machine has been estimated to be one in eight thousand. Currently, the hospital has three such machines. What is the probability that the equipment will be inadequate to meet tomorrow's emergencies?

Let X denote the number of residents who will need the cardiac machine tomorrow. Note that X is a binomial random variable based on a large n (= 12,000) and a small p (= 1/8000). As such, Poisson's limit can be used to approximate p_X(k) for any k. In particular,

    P(Shadyrest's cardiac facilities are inadequate) = P(X > 3)
        = 1 - P(X <= 3)
        = 1 - sum from k = 0 to 3 of C(12,000, k)(1/8000)^k (7999/8000)^(12,000-k)
        ~ 1 - sum from k = 0 to 3 of e^(-1.5)(1.5)^k / k!
        ~ 0.0656

where lambda = np = 12,000(1/8000) = 1.5.
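The arithmetic in Example 4.2.2 is easy to verify numerically; a minimal Python sketch comparing the exact binomial tail with its Poisson approximation:

```python
from math import comb, exp, factorial

n, p = 12_000, 1 / 8_000
lam = n * p   # expected number needing a machine: 1.5

# Exact binomial: P(X > 3) = 1 - P(X <= 3)
exact = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(4))

# Poisson limit with lambda = np
approx = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(4))

print(round(exact, 4), round(approx, 4))   # both about 0.0656
```

With n this large and p this small, the two answers agree to several decimal places, just as Theorem 4.2.1 suggests they should.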
On the average, then, Shadyrest will not be able to meet all the cardiac needs of its clientele once every fifteen or sixteen days. (Based on the binomial and Poisson limit comparisons shown earlier, we would expect the approximation here to be excellent: n (= 12,000) is much larger and p (= 1/8000) much smaller than their counterparts in Table 4.2.2, so the conditions of Theorem 4.2.1 are more nearly satisfied.)

CASE STUDY 4.2.1

Leukemia is a rare form of cancer whose cause and mode of transmission remain largely unknown. While evidence abounds that excessive exposure to radiation can increase a person's risk of contracting the disease, it is at the same time true that most cases occur among persons whose history contains no such overexposure. A related issue, one maybe even more basic than the causality question, concerns the spread of the disease. It is safe to say that the prevailing medical opinion is that most forms of leukemia are not contagious; still, the hypothesis persists that some forms of the disease, particularly the childhood variety, may be. What continues to fuel this speculation are discoveries of so-called "leukemia clusters," aggregations in time and space of unusually large numbers of cases.

One of the most frequently cited leukemia clusters in the medical literature occurred during the late 1950s and early 1960s in Niles, Illinois, a suburb of Chicago (74). In the 5 1/3-year period from 1956 to the first four months of 1961, residents of Niles reported a total of eight cases of leukemia among children less than fifteen years of age. The number at risk (that is, the number of residents in that age range) was 7076. To assess the likelihood of that many cases occurring in such a small population, it is helpful to look first at the incidence of leukemia in neighboring towns. For all of Cook County, excluding Niles, there were 1,152,695 children less than 15 years of age, and among those were 286 diagnosed cases of leukemia.
That gives an average leukemia rate of 24.8 cases per 100,000 children:

    (286 cases / 1,152,695 children) x 100,000 = 24.8 cases/100,000 children in 5 1/3 years

Now, imagine the 7076 children in Niles to be a series of n = 7076 (independent) Bernoulli trials, each having a probability p = 24.8/100,000 = 0.000248 of contracting leukemia. The question then becomes, given an n of 7076 and a p of 0.000248, how likely is it that eight "successes" would occur? (The expected number, of course, would be 7076 x 0.000248 = 1.75.) Actually, for reasons that will be elaborated on in Chapter 6, it will prove more meaningful to consider the related event, eight or more cases occurring in a 5 1/3-year span. If the probability associated with the latter is very small, it could be argued that leukemia did not occur randomly in Niles and that, perhaps, contagion was a factor.

Using the binomial distribution, we can express the probability of eight or more cases as

    P(8 or more cases) = sum from k = 8 to 7076 of C(7076, k)(0.000248)^k (0.999752)^(7076-k)    (4.2.1)

Much of the computational unpleasantness implicit in Equation 4.2.1 can be avoided by appealing to Theorem 4.2.1. Given that np = 7076 x 0.000248 = 1.75,

    P(X >= 8) = 1 - P(X <= 7)
              ~ 1 - sum from k = 0 to 7 of e^(-1.75)(1.75)^k / k!
              = 1 - 0.99951
              = 0.00049

How close can we expect 0.00049 to be to the "true" binomial sum? Very close. Considering the accuracy of the Poisson limit when n is as small as one hundred (recall Table 4.2.2), we should feel very confident here, where n is 7076.

Interpreting the 0.00049 probability is not nearly as easy as assessing its accuracy. The fact that the probability is so very small tends to denigrate the hypothesis that leukemia in Niles occurred at random. On the other hand, rare events, such as clusters, do happen by chance. The basic difficulty in putting the probability associated with a given cluster in any meaningful perspective is not knowing in how many similar communities leukemia did not exhibit a tendency to cluster.
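The tail probability at the heart of Case Study 4.2.1 takes only a few lines to reproduce; a Python sketch of the Poisson computation (the exact value depends slightly on whether lambda is taken as 1.75 or as the unrounded 7076 x 0.000248):

```python
from math import exp, factorial

lam = 7076 * 0.000248   # expected number of cases, about 1.75

# Poisson approximation to P(X >= 8) = 1 - P(X <= 7)
p_tail = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(8))
print(round(p_tail, 5))   # about 0.0005, in line with the 0.00049 reported in the text
```

Either way the answer is on the order of five in ten thousand, which is what drives the case study's discussion.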
That there is no obvious way to do this is one reason the leukemia controversy is still with us.

QUESTIONS

4.2.1. If a typist averages one misspelling in every 3250 words, what are the chances that a 6000-word report is free of all such errors? Answer the question two ways: first, by using an exact binomial analysis, and second, by using a Poisson approximation. Does the similarity (or dissimilarity) of the two answers surprise you? Explain.

4.2.2. A medical study recently documented that 905 mistakes were made among the 289,411 prescriptions written during one year at a large metropolitan teaching hospital. Suppose a patient is admitted with a condition serious enough to warrant 10 different prescriptions. Approximate the probability that at least one will contain an error.

4.2.3. Five hundred people are attending the first annual meeting of the "I Was Hit by Lightning" Club. Approximate the probability that at most one of the 500 was born on Poisson's birthday.

4.2.4. A chromosome mutation linked with colorblindness is known to occur, on the average, once in every 10,000 births.
(a) Approximate the probability that exactly 3 of the next 20,000 babies born will have the mutation.
(b) How many babies out of the next 20,000 would have to be born with the mutation to convince you that the "1 in 10,000" estimate is too low? Hint: Calculate P(X >= k) = 1 - P(X <= k - 1) for various k. (Recall Case Study 4.2.1.)

4.2.5. Suppose that 1% of all items in a supermarket are not priced properly. A customer buys 10 items. What is the probability that she will be delayed by the cashier because one or more of her items requires a price check? Calculate both a binomial answer and a Poisson answer.
Is the binomial model "exact" in this 4.2.6, A newly formed life insurance company has underwritten term policies on 120 women between lhe of 40 and 44, Suppose that each woman has a 11150 probability of dying next calendar year, and each death requires the company to payout $50,000 in Approximate the probability that the oompany will have to pay at least $150,000 in benefits nexi year. 4.2.7. According to an airline industry report (187). roughly 1 piece ofluggage out of every 200 that are checked is lost. Suppose that a frequent-flying businesswoman will be checking 120 bags over lhe course of the next year. Approximate the probabilily that she will lose 2 of more pieces of luggage. by some 4.2.8. Electromagnetic fields by power transmission lines are researchers 10 be a cause of cancel'. Especially at risk would be telephone linemen because of their frequent proximity to high-voltage wires. According to one study. two cases of a rare form of cancer were detected among a group of 9500 linemen (181). In the population, the incidence of that particular oondition is on the order of one in a million, What would you oonclude? Hint: Recall the approach taken in Case Study 4.2.1. 4.2.9. Astronomers estimate that as many as 100 billioo stars in the Milky Way galaxy are encircled by planets. If so, we may have a plethora of cosmic neighbors, Let p denote the probability that any such solar system contains intelligent life. How small can p be and still a 50-SO chance thai we are not alone? The Poisson Distribution The real significance of Poisson's limit theorem went unrecognized for more than fifty years. Fo)' most of the latter part of the nineteenth century. 
Theorem 4.2.1 was taken strictly at face value: It provided a convenient approximation for p_X(k) when X is binomial, n is large, and p is small. But then in 1898 a German professor, Ladislaus von Bortkiewicz, published a monograph entitled Das Gesetz der Kleinen Zahlen (The Law of Small Numbers) that would quickly transform Poisson's "limit" into Poisson's "distribution." What is best remembered about the monograph is the curious set of data described in Question 4.2.10. The measurements recorded were the numbers of Prussian cavalry soldiers who were kicked to death by their horses. In analyzing those figures, Bortkiewicz was able to show that the function e^(-lambda) lambda^k / k! is a probability model in its own right, even when (1) no binomial random variable is present and (2) values for n and p are unavailable. Other researchers were quick to follow Bortkiewicz's lead, and a steady stream of Poisson applications began showing up in technical journals. Today the function p_X(k) = e^(-lambda) lambda^k / k! is universally recognized as being among the three or four most important data models in all of statistics.

Theorem 4.2.2. The random variable X is said to have a Poisson distribution if

    p_X(k) = P(X = k) = e^(-lambda) lambda^k / k!,    k = 0, 1, 2, ...

where lambda is a positive constant. Also, for any Poisson random variable, E(X) = lambda and Var(X) = lambda.

Proof. To show that p_X(k) qualifies as a probability function, note, first of all, that p_X(k) >= 0 for all nonnegative integers k. Also, p_X(k) sums to one:

    sum from k = 0 to infinity of e^(-lambda) lambda^k / k! = e^(-lambda) sum from k = 0 to infinity of lambda^k / k! = e^(-lambda) e^(lambda) = 1

since the Taylor expansion of e^(lambda) is the sum from k = 0 to infinity of lambda^k / k!. Verifying that E(X) = lambda and Var(X) = lambda has already been done in Chapter 3 using moment-generating functions.

Fitting the Poisson Distribution to Data

Poisson data invariably refer to the numbers of times a certain event occurs during each of a series of "units" (often time or space). For example, X might be the weekly number of traffic accidents reported at a given intersection. If records are kept for n weeks, the resulting data would be the sample k1, k2, ..., kn, where each ki is a nonnegative integer.
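Before fitting anything, it is easy to check numerically that the pmf in Theorem 4.2.2 behaves as claimed, truncating the infinite sums once the remaining terms are negligible (the cutoff of 100 terms is an arbitrary but safe choice for a lambda of this size):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam = 3.87
terms = [poisson_pmf(k, lam) for k in range(100)]

total = sum(terms)                                # should be essentially 1
mean = sum(k * t for k, t in enumerate(terms))    # should be essentially lam
print(total, mean)
```

The truncation error here is far below floating-point precision, so both identities hold to every printed digit.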
Whether or not a set of ki's can be viewed as Poisson data depends on whether the proportions of 0s, 1s, 2s, and so on in the sample are numerically similar to the probabilities that X = 0, 1, 2, and so on, as predicted by p_X(k) = e^(-lambda) lambda^k / k!. The next two case studies show data sets where the variability in the observed ki's is consistent with the probabilities predicted by the Poisson distribution. Notice in each case that the value of lambda in p_X(k) is taken to be the sample mean of the ki's. The reason for that substitution will be discussed in Chapter 5.

CASE STUDY 4.2.2

Among the early research projects investigating the nature of radiation was a 1910 study of alpha-particle emission by Ernest Rutherford and Hans Geiger (156). For each of 2608 eighth-minute intervals, the two physicists recorded the number of alpha particles emitted from a polonium source (and detected by what would eventually be called a Geiger counter). The numbers and proportions of times that k such particles were detected in a given eighth-minute (k = 0, 1, 2, ...) are detailed in the first three columns of Table 4.2.3. Two alpha particles, for example, were detected in each of 383 eighth-minutes, meaning that X = 2 was the observation recorded 15% (= 383/2608 x 100) of the time.

To see whether a probability function of the form p_X(k) = e^(-lambda) lambda^k / k! can adequately model the proportions in the third column, we first need to replace lambda with the average value for X. Suppose the six observations comprising the "11+" category are each assigned the value eleven. Then

    lambda ~ k-bar = [0(57) + 1(203) + 2(383) + ... + 11(6)] / 2608 = 10,092/2608 = 3.87

TABLE 4.2.3

No. Detected, k    Frequency    Proportion    p_X(k) = e^(-3.87)(3.87)^k / k!
0                  57           0.02          0.02
1                  203          0.08          0.08
2                  383          0.15          0.16
3                  525          0.20          0.20
4                  532          0.20          0.20
5                  408          0.16          0.15
6                  273          0.10          0.10
7                  139          0.05          0.05
8                  45           0.02          0.03
9                  27           0.01          0.01
10                 10           0.00          0.00
11+                6            0.00          0.00
                   2608         1.0           1.0

and the presumed model is p_X(k) = e^(-3.87)(3.87)^k / k!, k = 0, 1, 2, .... Notice how closely the entries in the fourth column [i.e., p_X(0), p_X(1), p_X(2), ...] agree with the sample proportions appearing in the third column. The conclusion is inescapable: The phenomenon of radiation can be modeled very effectively by the Poisson distribution.
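The fitting procedure of Case Study 4.2.2 (estimate lambda by the sample mean, then compare observed and predicted proportions) can be sketched in a few lines of Python:

```python
from math import exp, factorial

# Rutherford-Geiger alpha-particle counts (Table 4.2.3);
# the six observations in the "11+" category are treated as 11
freq = {0: 57, 1: 203, 2: 383, 3: 525, 4: 532, 5: 408,
        6: 273, 7: 139, 8: 45, 9: 27, 10: 10, 11: 6}

n = sum(freq.values())                         # 2608 intervals
lam = sum(k * f for k, f in freq.items()) / n  # sample mean, about 3.87

for k, f in freq.items():
    observed = f / n
    predicted = exp(-lam) * lam**k / factorial(k)
    print(f"{k:3d}  {observed:.2f}  {predicted:.2f}")
```

The printed columns reproduce the third and fourth columns of Table 4.2.3, row for row.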
CASE STUDY 4.2.3

Table 4.2.4 gives the numbers of fumbles made by 110 Division I football teams during a recent weekend's slate of fifty-five games (107). Do these data support the contention that the number of fumbles, X, that a team makes during a game is a Poisson random variable?

TABLE 4.2.4

2 5 1 0 2 1 4 3 1 1 1 2 2 0 1 4 1 2 3 2 2 1
3 3 4 2 0 1 4 6 2 3 2 0 5 1 5 3 4 2 3 2 4 1
4 2 4 1 4 1 1 5 1 2 3 1 3 4 2 1 3 2 2 2 4 4
3 1 4 2 0 2 0 3 5 6 0 3 6 3 7 4 2 5 1 4 1 3
4 3 5 2 2 1 1 2 5 5 2 3 3 1 2 4 1 2 5 3 3 0

The first step in summarizing these data is to tally the frequencies and calculate the sample proportions associated with each value of X (Columns 1 through 3 of Table 4.2.5). Notice, also, that the average number of fumbles per team is 2.55:

    k-bar = [8(0) + 24(1) + 27(2) + ... + 1(7)] / 110 = 2.55

Substituting that value for lambda, then, gives p_X(k) = e^(-2.55)(2.55)^k / k! as the particular Poisson model most appropriate to fit to the data.

TABLE 4.2.5

No. Fumbles, k    Frequency    Proportion    p_X(k) = e^(-2.55)(2.55)^k / k!
0                 8            0.07          0.08
1                 24           0.22          0.20
2                 27           0.25          0.25
3                 20           0.18          0.22
4                 17           0.15          0.14
5                 10           0.09          0.07
6                 3            0.03          0.03
7                 1            0.01          0.01
                  110          1.0           1.0

The fourth column of Table 4.2.5 shows p_X(k) evaluated for each of the eight values listed for k: p_X(0) = e^(-2.55)(2.55)^0 / 0! = 0.08, and so on. Once again, the row-by-row agreement is quite strong. There appears to be nothing in these data that would refute the presumption that the number of fumbles a team makes is a Poisson random variable.

The Poisson Model: The Law of Small Numbers

Given that the expression e^(-lambda) lambda^k / k! fits phenomena as diverse as alpha radiation and football fumbles raises an obvious question: Why is that same p_X(k) describing such different random variables? The answer, of course, is that the underlying conditions that produce those two sets of measurements are actually much the same, despite how superficially different the resulting data may seem to be. Both phenomena are examples of a set of mathematical assumptions known as the Poisson model.
Any measurements that are derived from conditions that mirror those assumptions will necessarily vary in accordance with the Poisson distribution.

Consider, for example, the number of fumbles that a football team makes during the course of a game. Suppose we imagine dividing a time interval of length T into n nonoverlapping subintervals, each of length T/n, where n is large (see Figure 4.2.1).

FIGURE 4.2.1 [the interval (0, T) divided into n subintervals, each of length T/n]

Suppose further that

1. The probability that two or more fumbles occur in any given subinterval is essentially 0.
2. Fumbles occur independently from subinterval to subinterval.
3. The probability that a fumble occurs during a given subinterval is constant over the entire interval from 0 to T.

The n subintervals, then, are analogous to the n independent trials that form the backdrop for the "binomial model": In each subinterval there will be either zero fumbles or one fumble, where p_n = P(fumble occurs in a given subinterval) remains constant from subinterval to subinterval.

Let the random variable X denote the total number of fumbles a team makes during time T, and let lambda denote the rate at which a team fumbles (e.g., lambda might be 0.10 fumbles per minute). Then

    E(X) = lambda T = n p_n

which implies that p_n = lambda T / n (why?). From Theorem 4.2.1, then,

    P(X = k) ~ C(n, k)(lambda T / n)^k (1 - lambda T / n)^(n-k) ~ e^(-lambda T)(lambda T)^k / k!    (4.2.2)

So, if a team fumbles at the rate of, say, 0.10 times per minute and they have the ball for 30 minutes in a game, lambda T = (0.1)(30) = 3.0, and the probability that they fumble exactly k times is approximated by the pdf p_X(k) = e^(-3.0)(3.0)^k / k!, k = 0, 1, 2, ....

Now we can see more clearly why Poisson's "limit," as given in Theorem 4.2.1, is so important. The three Poisson model assumptions listed above for football fumbles are so unexceptional that they apply to countless real-world phenomena. Every time they do, the pdf p_X(k) = e^(-lambda T)(lambda T)^k / k! finds another application.

Calculating Poisson Probabilities

In practice, calculating Poisson probabilities is an exercise in choosing T so that lambda T represents the expected number of occurrences in whatever "unit" is associated with the random variable X.
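Equation 4.2.2 can be exercised directly with the fumble rate quoted above; in the Python sketch below, the rate (0.10 per minute) and the 30 minutes of possession come from the text, while the table of probabilities is computed rather than quoted:

```python
from math import exp, factorial

lam, T = 0.10, 30     # 0.10 fumbles per minute, 30 minutes of possession
mu = lam * T          # expected number of fumbles: 3.0

# P(X = k) = e^(-lam*T) (lam*T)^k / k!  for the first few k
for k in range(5):
    print(k, round(exp(-mu) * mu**k / factorial(k), 3))
```

Choosing T differently (say, per quarter instead of per game) simply rescales mu = lambda T; the pmf formula itself never changes.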
They look different, but the pdf's p_X(k) = e^(-lambda) lambda^k / k! and p_X(k) = e^(-lambda T)(lambda T)^k / k! are exactly the same and will give identical values once lambda and T are properly defined.

EXAMPLE 4.2.3

Suppose typographical errors are committed at the rate of 0.4 per page in State Tech's campus newspaper. If next Tuesday's edition is sixteen pages long, what is the probability that fewer than three typos will appear?

We start by defining X to be the number of errors that appear in the sixteen pages. The assumptions of independence and constant probability are not unreasonable in this setting, so X is likely to be a Poisson random variable. To answer the question using the notation in Theorem 4.2.2, we need to set lambda equal to E(X). But if the error rate is 0.4 per page, the expected number of typos in sixteen pages will be 6.4:

    0.4 errors/page x 16 pages = 6.4 errors

It follows, then, that

    P(X < 3) = P(X <= 2) = sum from k = 0 to 2 of e^(-6.4)(6.4)^k / k!
             = e^(-6.4)(6.4)^0/0! + e^(-6.4)(6.4)^1/1! + e^(-6.4)(6.4)^2/2!
             = 0.046

If Equation 4.2.2 is used instead, we would define lambda = 0.4 errors/page and T = 16 pages. Then lambda T = E(X) = 6.4, and P(X < 3) would be the same numerical value found from Theorem 4.2.2.

EXAMPLE 4.2.4

Entomologists estimate that an average person consumes almost a pound of bug parts each year (180). There are that many insect eggs, larvae, and miscellaneous body parts in the foods we eat and the liquids we drink. The Food and Drug Administration (FDA) sets a Food Defect Action Level (FDAL) for each product: Bug-part concentrations below the FDAL are considered acceptable. The legal limit for peanut butter is thirty insect fragments per hundred grams. Suppose the crackers you just bought are smeared with twenty grams of peanut butter. What are the chances that your snack will include at least five crunchy critters?

Let X denote the number of bug parts in the twenty grams of peanut butter. Fearing the worst, we will assume the contamination level equals the FDA limit, that is, thirty fragments per hundred grams (or 0.30 fragments/g).
Notice that E(X) = 6.0:

    0.30 fragments/g x 20 g = 6.0 fragments

It follows, then, that the probability that your snack includes five or more bug parts is a disgusting 0.71:

    P(X >= 5) = 1 - P(X <= 4) = 1 - sum from k = 0 to 4 of e^(-6.0)(6.0)^k / k!
              = 1 - 0.2851
              = 0.71

Bon appetit!

QUESTIONS

4.2.10. During the latter part of the nineteenth century, Prussian officials gathered information relating to the hazards that horses posed to cavalry soldiers. A total of 10 cavalry corps were monitored over a period of 20 years. Recorded for each year and each corps was X, the annual number of fatalities due to kicks. Summarized in the following table are the 200 values recorded for X (14). Show that these data can be modeled by a Poisson pdf. Follow the procedure illustrated in Case Studies 4.2.2 and 4.2.3.

No. of Deaths, k    Observed Number of Corps-Years in Which k Fatalities Occurred
0                   109
1                   65
2                   22
3                   3
4                   1
                    200

4.2.11. A random sample of seniors enrolled at the University of West Florida was categorized according to X, the number of times they had changed majors (114). Based on the summary of that information shown in the following table, would you conclude that X can be treated as a Poisson random variable?

Number of Major Changes    Frequency
0                          237
1                          90
2                          22
3                          7

4.2.12. Midwestern Skies books ten commuter flights each week. Passenger totals are much the same from week to week, as are the numbers of bags that are checked. Listed in the following table are the numbers of bags that were lost during each of the first 40 weeks in 2004. Do these figures support the presumption that the number of bags lost by Midwestern during a typical week is a Poisson random variable?

Week  Lost    Week  Lost    Week  Lost    Week  Lost
1     1       11    3       21    1       31    1
2     0       12    1       22    1       32    3
3     0       13    2       23    1       33    1
4     3       14    2       24    2       34    4
5     4       15    1       25    1       35    0
6     1       16    3       26    3       36    2
7     0       17    0       27    1       37    2
8     2       18    2       28    2       38    1
9     0       19    5       29    0       39    0
10    2       20    2       30    0       40    1

4.2.13. In 1893, New Zealand became the first country to permit women to vote. Scattered over the ensuing 113 years, various countries joined this movement to extend this right to women. The table below (127) shows how many countries took this step in a given year.
Do these data seem to follow a Poisson distribution?

Yearly Number of Countries Granting Women the Vote    Frequency
0                                                     82
1                                                     25
2                                                     4
3                                                     0
4                                                     2

4.2.14. The following are the daily numbers of death notices for women over the age of 80 that appeared in the London Times over a three-year period (73).

Number of Deaths    Observed Frequency
0                   162
1                   267
2                   271
3                   185
4                   111
5                   61
6                   27
7                   8
8                   3
9                   1
                    1096

(a) Does the Poisson pdf provide a good description of the variability pattern evident in these data?
(b) If your answer to Part (a) is "no," which assumption(s) of the Poisson model do you think might not be holding?

4.2.15. A certain species of European mite is capable of damaging the bark on orange trees. The following are the results of inspections done on 100 saplings chosen at random from a large orchard. The measurement recorded, X, is the number of mite infestations found on the trunk of each tree. Is it reasonable to assume that X is a Poisson random variable? If not, which of the Poisson model assumptions is likely not to be true?

No. of Infestations, k    No. of Trees
0                         55
1                         20
2                         21
3                         1
4                         1
5                         1
6                         0
7                         1

4.2.16. A tool and die press that stamps out cams used in small gasoline engines tends to break down once every five hours. The machine can be repaired and put back on line quickly, but each such incident costs $50. What is the probability that maintenance expenses for the press will be no more than $100 on a typical eight-hour workday?

4.2.17. In a new fiber optic communication system, transmission errors occur at the rate of 1.5 per 10 seconds. What is the probability that more than two errors will occur during the next half-minute?

4.2.18. Assume that the number of hits, X, that a baseball team makes in a nine-inning game has a Poisson distribution. If the probability that a team makes zero hits is 1/3, what are their chances of getting two or more hits?

4.2.19. Flaws in metal sheeting produced by a high-temperature roller occur at the rate of one per 10 square feet.
What is the probability that three or more flaws will appear in a 5-by-8-foot panel?

4.2.20. Suppose a radioactive source is metered for two hours, during which time the total number of alpha particles counted is 482. What is the probability that exactly three particles will be counted in the next two minutes? Answer the question two ways: first, by defining X to be the number of particles counted in two minutes, and second, by defining X to be the number of particles counted in one minute.

4.2.21. Suppose that on-the-job injuries in a textile mill occur at the rate of 0.1 per day.
(a) What is the probability that two accidents will occur during the next (five-day) work week?
(b) Is the probability that four accidents will occur over the next two work weeks the square of your answer to Part (a)? Explain.

4.2.22. Find P(X = 4) if the random variable X has a Poisson distribution such that P(X = 1) = P(X = 2).

4.2.23. Let X be a Poisson random variable with parameter lambda. Show that the probability that X is even is (1/2)(1 + e^(-2 lambda)).

4.2.24. Let X and Y be independent Poisson random variables with parameters lambda and mu, respectively. Example 3.12.10 established that X + Y is also Poisson with parameter lambda + mu. Prove that same result using Theorem 3.8.1.

4.2.25. If X1 is a Poisson random variable for which E(X1) = lambda and if the conditional pdf of X2 given that X1 = x1 is binomial with parameters x1 and p, show that the marginal pdf of X2 is Poisson with E(X2) = lambda p.

Intervals Between Events: The Poisson/Exponential Relationship

Situations sometimes arise where the time interval between consecutively occurring events is an important random variable. Imagine being responsible for the maintenance on a network of computers. Clearly, the number of technicians you would need to employ in order to be capable of responding to service calls in a timely fashion would be a function of the "waiting time" from one breakdown to another.
Figure 4.2.2 shows the relationship between the random variables X and Y, where X denotes the number of occurrences in a unit of time and Y denotes the interval between consecutive occurrences. Pictured are six unit intervals: X = 0 on one occasion, X = 1 on three occasions, X = 2 once, and X = 3 once. Resulting from those eight occurrences are seven measurements on the random variable Y. Obviously, the pdf for Y will depend on the pdf for X. One particularly important special case of that dependence is the Poisson/exponential relationship outlined in Theorem 4.2.3.

Chapter 4  Special Distributions

[Figure 4.2.2: eight occurrences marked along six unit time intervals; the Y-values are the gaps between consecutive occurrences.]

Theorem 4.2.3. Suppose a series of events satisfying the Poisson model are occurring at the rate of λ per unit time. Let the random variable Y denote the interval between consecutive events. Then Y has the exponential distribution

    f_Y(y) = λe^{−λy},    y > 0

Proof. Suppose an event has occurred at time a. Consider the interval that extends from a to a + y. Since the (Poisson) events are occurring at the rate of λ per unit time, the probability that no outcomes will occur in the interval (a, a + y) is

    e^{−λy}(λy)⁰/0! = e^{−λy}

Define the random variable Y to denote the interval between consecutive occurrences. Notice that there will be no occurrences in the interval (a, a + y) only if Y > y. Therefore,

    P(Y > y) = e^{−λy}

or, equivalently,

    P(Y ≤ y) = 1 − P(Y > y) = 1 − e^{−λy}

Let f_Y(y) be the (unknown) pdf for Y. It must be true that

    P(Y ≤ y) = ∫₀^y f_Y(t) dt

Taking derivatives of the two expressions for P(Y ≤ y), we can write

    (d/dy) ∫₀^y f_Y(t) dt = (d/dy)(1 − e^{−λy})

which implies that

    f_Y(y) = λe^{−λy},    y > 0

CASE STUDY 4.2.4

Over "short" periods, a volcano's eruptions are believed to be Poisson events; that is, they are thought to occur independently and at a constant rate.
If so, the pdf describing the intervals between eruptions should have the form f_Y(y) = λe^{−λy}. Collected for the purpose of testing that presumption are the data in Table 4.2.6, showing the intervals (in months) that elapsed between thirty-seven consecutive eruptions of Mauna Loa, a volcano in Hawaii (110). Over the period covered, ending in 1950, eruptions were occurring at the rate of λ = 0.027 per month (or about once every three years). Is the variability in these thirty-six yᵢ's consistent with the statement of Theorem 4.2.3?

    Table 4.2.6: Intervals between eruptions (in months)
    73  26   6  41  18  11  26   3   3   6
    37  23   2  65  94  51   6   6  68  41
    38  16  20  18  12  40  77  91  38  50
    61

To answer that question requires that the data be reduced to a density-scaled histogram and superimposed on a graph of the predicted exponential pdf (recall Case Study 3.4.1). Table 4.2.7 gives the distribution underlying the histogram. Notice in Figure 4.2.3 that the shape of that histogram is entirely consistent with the theoretical model given in Theorem 4.2.3.

    Table 4.2.7
    Interval (months)    Frequency    Density
      0 ≤ y <  20           13         0.0181
     20 ≤ y <  40            9         0.0125
     40 ≤ y <  60            5         0.0069
     60 ≤ y <  80            6         0.0083
     80 ≤ y < 100            2         0.0028
    100 ≤ y < 120            0         0.0000
    120 ≤ y < 140            1         0.0014
                            36

[Figure 4.2.3: density-scaled histogram of the eruption intervals with the exponential pdf superimposed; the horizontal axis is the interval between eruptions, in months, from 0 to 140.]

EXAMPLE 4.2.5

Among the most famous of all meteor showers are the Perseids, which occur each year in early August. In some areas the frequency of visible Perseids can be as high as forty per hour. Given that such sightings are Poisson events, calculate the probability that an observer who has just seen a meteor will have to wait at least five minutes before seeing another.

Let the random variable Y denote the interval (in minutes) between consecutive sightings. Expressed in the units of Y, the forty-per-hour rate of Perseids becomes 0.67 per minute.
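The integration carried out next can be confirmed with a one-line computation. This sketch is not part of the text; it uses the exact rate λ = 40/60 per minute, of which 0.67 is the rounded value.

```python
import math

# Sketch: P(Y >= 5) for an exponential waiting time whose rate is
# 40 sightings per hour, i.e., lam = 40/60 per minute.
lam = 40 / 60
p_wait_5 = math.exp(-lam * 5)    # P(Y >= 5) = e^(-5*lam)
print(round(p_wait_5, 3))        # → 0.036
```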
A straightforward integration, then, shows that the probability is 0.036 that an observer will have to wait five minutes or more to see another meteor:

    P(Y ≥ 5) = ∫₅^∞ 0.67e^{−0.67y} dy
             = ∫_{3.33}^∞ e^{−u} du        (where u = 0.67y)
             = e^{−3.33}
             = 0.036

QUESTIONS

4.2.26. Suppose that commercial airplane crashes in a certain country occur at the rate of 2.5 per year.
(a) Is it reasonable to assume that such crashes are Poisson events? Explain.
(b) What is the probability that four or more crashes will occur next year?
(c) What is the probability that the next two crashes will occur within three months of one another?

4.2.27. Records show that deaths occur at the rate of 0.1 per day among patients residing in a large nursing home. If someone dies today, what are the chances that a week or more will elapse before another death occurs?

4.2.28. Suppose that Y₁ and Y₂ are independent exponential random variables, each having pdf f_Y(y) = λe^{−λy}, y > 0. If Y = Y₁ + Y₂, it can be shown that

    f_{Y₁+Y₂}(y) = λ²ye^{−λy},    y > 0

Recall Case Study 4.2.4. What is the probability that the next three eruptions of Mauna Loa will be less than 40 months apart?

4.2.29. Fifty lightbulbs have just been installed in an outdoor security system. According to the manufacturer's specifications, these particular lights are expected to burn out at the rate of 1.1 per 100 hours. What is the expected number of bulbs that will fail to last for at least 75 hours?

4.3 THE NORMAL DISTRIBUTION

The Poisson limit described in Section 4.2 was not the only, or even the first, approximation developed for the purpose of facilitating the calculation of binomial probabilities. Early in the eighteenth century, Abraham DeMoivre proved that areas under the curve

    f_Z(z) = (1/√(2π)) e^{−z²/2},    −∞ < z < ∞

can be used to estimate

    P(a ≤ (X − n(1/2))/√(n(1/2)(1/2)) ≤ b)

where X is a binomial random variable with n large
and p = 1/2.

Figure 4.3.1 illustrates the central idea in DeMoivre's discovery. Pictured is a probability histogram of a binomial distribution; superimposed over the histogram is a suitably scaled version of the function f_Z(z) = (1/√(2π))e^{−z²/2}. Notice how closely the area under the curve approximates the area of each bar, even for this relatively small value of n.

[Figure 4.3.1: probability histogram of a binomial distribution with a normal curve superimposed.]

The French mathematician Laplace generalized DeMoivre's original idea to binomial approximations for arbitrary p and brought the result to the full attention of the mathematical community by including it in his 1812 book, Théorie Analytique des Probabilités.

Theorem 4.3.1. Let X be a binomial random variable defined on n independent trials for which p = P(success). For any numbers a and b,

    lim_{n→∞} P(a ≤ (X − np)/√(np(1 − p)) ≤ b) = (1/√(2π)) ∫_a^b e^{−z²/2} dz

Proof. One of the ways to verify Theorem 4.3.1 is to show that the limit of the moment-generating function for (X − np)/√(np(1 − p)) as n → ∞ is e^{t²/2}, and that e^{t²/2} is also the moment-generating function of a standard normal random variable Z. By Theorem 3.12.2, then, the limiting pdf of (X − np)/√(np(1 − p)) is the function f_Z(z) = (1/√(2π))e^{−z²/2}, −∞ < z < ∞. See Appendix 4.A.2 for the details of a more general proof.

Comment. We saw in Section 4.2 that Poisson's limit is actually a special case of Poisson's distribution, p_X(k) = e^{−λ}λᵏ/k!, k = 0, 1, 2, .... Similarly, the DeMoivre-Laplace limit is a pdf in its own right. Justifying that assertion, of course, requires proving that the area under f_Z(z) from −∞ to ∞ equals 1. There is no algebraic or trigonometric substitution that can be used to carry out the integration directly, but by changing to polar coordinates we can demonstrate a necessary and sufficient alternative, namely, that the square of ∫_{−∞}^{∞} (1/√(2π)) e^{−z²/2} dz equals one. To that end, write

    [∫_{−∞}^{∞} (1/√(2π)) e^{−x²/2} dx][∫_{−∞}^{∞} (1/√(2π)) e^{−y²/2} dy]
        = (1/(2π)) ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(x² + y²)/2} dx dy

Let x = r cos θ and y = r sin θ, so dx dy = r dr dθ. Then

    (1/(2π)) ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(x² + y²)/2} dx dy = (1/(2π)) ∫₀^{2π} ∫₀^∞ e^{−r²/2} r dr dθ
        = (1/(2π)) (∫₀^∞ e^{−r²/2} r dr)(∫₀^{2π} dθ)
        = (1/(2π))(1)(2π)
        = 1

Comment. The function f_Z(z) = (1/√(2π))e^{−z²/2} is referred to as the standard normal (or Gaussian) curve. By convention,
any random variable whose probabilistic behavior is described by a standard normal curve is denoted Z (rather than X, Y, or W). Since M_Z(t) = e^{t²/2}, it follows readily that E(Z) = 0 and Var(Z) = 1.

Finding Areas Under the Standard Normal Curve

In order to use Theorem 4.3.1, we need to be able to find the area under the graph of f_Z(z) above an arbitrary interval [a, b]. In practice, such areas are obtained in one of two ways: either by using a normal table, a copy of which appears at the back of virtually every statistics book, or by running a computer software package. Typically, both approaches give the cdf, F_Z(z) = P(Z ≤ z), associated with Z (and from the cdf we can deduce the desired area).

Table 4.3.1 shows a portion of the normal table that appears in Appendix A.1. Each row under the Z heading represents a number along the horizontal axis of f_Z(z) rounded off to the nearest tenth; the column headings 0 through 9 allow that number to be written to the hundredths place. Entries in the body of the table are areas under the graph of f_Z(z) to the left of the number indicated by the entry's row and column. For example, the number listed at the intersection of the "1.1" row and the "4" column is 0.8729, which means that the area under f_Z(z) from −∞ to 1.14 is 0.8729. That is,

    (1/√(2π)) ∫_{−∞}^{1.14} e^{−z²/2} dz = P(−∞ < Z ≤ 1.14) = F_Z(1.14) = 0.8729

If X is a binomial random variable with parameters n and p, the continuity-corrected statement for P(a ≤ X ≤ b) becomes

    P(a ≤ X ≤ b) ≈ F_Z((b + 0.5 − np)/√(np(1 − p))) − F_Z((a − 0.5 − np)/√(np(1 − p)))

Comment. Even with the continuity correction refinement, normal curve approximations can be inadequate if n is too small, especially when p is close to 0 or to 1. As a rule of thumb, the DeMoivre-Laplace limit should be used only if the magnitudes of n and p are such that

    n > 9 (p/(1 − p))    and    n > 9 ((1 − p)/p)

EXAMPLE 4.3.1

Boeing 757s flying certain routes are configured to have 168 economy-class seats. Experience has shown that only 90% of all ticket-holders on those flights will actually show up in time to board the plane. Knowing that, suppose an airline sells 178 tickets for the 168 seats. What is the probability that not everyone who arrives at the gate on time can be accommodated?
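The arithmetic used in the solution that follows, a DeMoivre-Laplace approximation with a continuity correction, can be sketched in a few lines of code. This is an illustration, not part of the text.

```python
import math

def phi(z):
    """Standard normal cdf F_Z(z), computed from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# P(169 <= X <= 178) for X binomial with n = 178, p = 0.9, using the
# continuity-corrected cutoffs 168.5 and 178.5.
n, p = 178, 0.9
mu, sd = n * p, math.sqrt(n * p * (1 - p))
p_overbooked = phi((178.5 - mu) / sd) - phi((168.5 - mu) / sd)
print(round(p_overbooked, 3))    # → 0.019
```

Keeping z to four decimals gives 0.019; the hand calculation below, which rounds z to 2.07 before consulting the table, reports 0.0192.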
Let the random variable X denote the number of would-be passengers showing up for a flight. To the extent that travelers sometimes come in groups, not every ticket-holder constitutes an independent event. Still, we can get a useful approximation to the probability that the flight is overbooked by assuming that X is binomial with n = 178 and p = 0.9. What we are looking for is P(169 ≤ X ≤ 178), the probability that more ticket-holders show up than there are seats on the plane. According to Theorem 4.3.1 (and using the continuity correction),

    P(flight is overbooked) = P(169 ≤ X ≤ 178)
        = P((168.5 − 178(0.9))/√(178(0.9)(0.1)) ≤ (X − 178(0.9))/√(178(0.9)(0.1)) ≤ (178.5 − 178(0.9))/√(178(0.9)(0.1)))
        ≈ P(2.07 ≤ Z ≤ 4.57)
        = F_Z(4.57) − F_Z(2.07)

From Appendix A.1, F_Z(4.57) = P(Z ≤ 4.57) is equal to one, for all practical purposes, and the area under f_Z(z) to the left of 2.07 is 0.9808. Therefore,

    P(flight is overbooked) = 1.0000 − 0.9808 = 0.0192

implying that the chances are about one in fifty that not every ticket-holder will have a seat.

CASE STUDY 4.3.1

Research in extrasensory perception has ranged from the slightly unconventional to the downright bizarre. Toward the end of the nineteenth century and even well into the twentieth, much of what was done involved spiritualists and mediums. But beginning around 1910, experimenters moved out of the seance parlors and into the laboratory, where they began setting up controlled studies that could be analyzed statistically. In 1938, Pratt and Woodruff, working out of Duke University, did an experiment that became a prototype for an entire generation of ESP research (70).

The experimenter and a subject sat at opposite ends of a table. Between them was a screen with a gap at the bottom. Five blank cards, visible to both participants, were placed side by side on the table beneath the screen. On the subject's side of the screen one of the standard ESP symbols (see Figure 4.3.4) was hung over each of the blank cards.

[Figure 4.3.4: the five standard ESP symbols.]

The experimenter shuffled a deck of ESP cards, picked up the top one, and concentrated on it. The subject tried to guess its identity: If he thought it was a circle, he would point to the blank card on the table that was beneath the circle card hanging on his side of the screen.
The procedure was then repeated. Altogether, a total of thirty-two subjects, all students, took part in the experiment. They made a total of sixty thousand guesses and were correct 12,489 times.

With five cards, the probability of a subject's making a correct identification just by chance was 1/5. Assuming a binomial model, the expected number of correct guesses would be 60,000 × (1/5), or 12,000. The question is, how "near" to 12,000 is 12,489? Should we write off the excess of 489 as nothing more than luck, or can we conclude that ESP has been demonstrated?

To effect a resolution between the conflicting "luck" and "ESP" hypotheses, we need to compute the probability of the subjects' getting 12,489 or more correct answers under the presumption that p = 1/5. Only if that probability is very small can 12,489 be construed as evidence in support of ESP.

Let the random variable X denote the number of correct responses in sixty thousand tries. Then

    P(X ≥ 12,489) = Σ_{k=12,489}^{60,000} (60,000 choose k) (1/5)ᵏ (4/5)^{60,000−k}        (4.3.1)

At this point the DeMoivre-Laplace limit theorem becomes a welcome alternative to computing the 47,512 binomial probabilities implicit in Equation 4.3.1. First we apply the continuity correction and rewrite P(X ≥ 12,489) as P(X ≥ 12,488.5). Then

    P(X ≥ 12,488.5) = P((X − np)/√(np(1 − p)) ≥ (12,488.5 − 60,000(1/5))/√(60,000(1/5)(4/5)))
                    ≈ P(Z ≥ 4.99)
                    = 0.0000003

Here, this last value was obtained from a more extensive version of Table A.1 in the Appendix. The fact that P(X ≥ 12,489) is so extremely small makes the "luck" hypothesis untenable. It would appear that something other than chance had to be responsible for the occurrence of so many correct guesses. Still, it does not follow that ESP has been demonstrated. Flaws in the experimental setup, as well as errors in reporting the scores, could have inadvertently produced what appears to be a statistically significant result. Suffice it to say that a great many scientists remain highly skeptical of ESP research in general and of the Pratt-Woodruff experiment in particular. [For a more thorough critique of the data we have just described, see (45).]

Comment.
This is a good set of data for illustrating why we need formal mathematical methods for interpreting data. Our intuitions, when left unsupported by probability calculations, can often be deceived. A typical first reaction to the Pratt-Woodruff results is to dismiss as inconsequential the 489 additional correct answers. To many, it seems entirely believable that 60,000 guesses could produce, by chance, an extra 489 correct answers. Only after making the P(X ≥ 12,489) computation do we see the utter implausibility of that conclusion. What statistics is doing here is what we would like it to do in general: rule out hypotheses that are not supported by the data and point us in the direction of inferences that are more likely to be true.

QUESTIONS

4.3.1. Use Appendix Table A.1 to evaluate the following integrals. In each case, draw a diagram of f_Z(z) and shade the area that corresponds to the integral.
(a) (1/√(2π)) ∫_{−0.44}^{0.94} e^{−z²/2} dz
(b) (1/√(2π)) ∫_{0.94}^{∞} e^{−z²/2} dz
(c) (1/√(2π)) ∫₀^{1.48} e^{−z²/2} dz
(d) (1/√(2π)) ∫_{−∞}^{−2.32} e^{−z²/2} dz

4.3.2. Let Z be a standard normal random variable. Use Appendix Table A.1 to find the numerical value of each of the following probabilities. Show each of your answers as an area under f_Z(z).
(a) P(0 ≤ Z ≤ 2.07)
(b) P(−0.64 ≤ Z < −0.11)
(c) P(Z > −1.06)
(d) P(Z < −2.33)
(e) P(Z ≥ 4.61)

4.3.3.
(a) Let 0 < a < b. Which number is larger, ∫_a^b f_Z(z) dz or ∫_{a+1}^{b+1} f_Z(z) dz?
(b) Let a > 0. Which number is larger, ∫_{−1/2}^{1/2} f_Z(z) dz or ∫_{a−1/2}^{a+1/2} f_Z(z) dz?

4.3.4.
(a) Evaluate (1/√(2π)) ∫₀^{1.24} e^{−z²/2} dz.
(b) Evaluate ∫_{−∞}^{∞} 6e^{−z²/2} dz.

4.3.5. Assume that the random variable Z is described by a standard normal curve. For what values of z are the following statements true?
(a) P(Z ≤ z) = 0.33
(b) P(Z ≥ z) = 0.2236
(c) P(−1.00 ≤ Z ≤ z) = 0.5004
(d) P(−z < Z < z) = 0.80
(e) P(z ≤ Z ≤ 2.03) = 0.15

4.3.6. Let z_α denote the value of Z for which P(Z ≥ z_α) = α. By definition, the interquartile range, Q, for the standard normal curve is the difference Q = z_{.25} − z_{.75}. Find Q.

4.3.7. Oak Hill has 74,806 registered automobiles.
A city ordinance requires each of them to display a new bumper decal showing that the owner paid an annual wheel tax of $50. By law, the new decals need to be purchased during the month of the owner's birthday. This year's budget assumes that at least $306,000 in decal revenue will be collected in November. What is the probability that taxes collected in that month will be less than anticipated and produce a budget shortfall?

4.3.8. Hertz Brothers, a small, family-owned radio manufacturer, produces electronic components domestically but subcontracts the cabinets to a foreign supplier. Although inexpensive, the foreign supplier has a quality control program that leaves much to be desired. On the average, only 80% of the standard 1600-unit shipment that Hertz receives is usable. Currently, Hertz has back orders for 1260 radios but storage space for no more than 1310 cabinets. What are the chances that the number of usable units in Hertz's latest shipment will be large enough to fill all the orders already on hand, yet small enough to avoid causing any inventory problems?

4.3.9. Fifty-five percent of the registered voters in Sheridanville favor their incumbent mayor in her bid for reelection. If 400 voters go to the polls, approximate the probability that
(a) the race ends in a tie
(b) the challenger scores an upset victory

4.3.10. State Tech's basketball team, the Fighting Logarithms, have a 70% foul-shooting percentage.
(a) Write a formula for the exact probability that out of their next 100 free throws, they will make between 75 and 80, inclusive.
(b) Approximate the probability asked for in Part (a).

4.3.11. A random sample of 747 obituaries published recently in Salt Lake City newspapers revealed that 344 (or 46%) of the decedents died in the three-month period following their birthdays (129). Assess the statistical significance of that finding by approximating the probability that 46% or more would die in that particular interval if deaths occurred randomly throughout the year.
What would you conclude on the basis of your answer?

4.3.12. There is a theory embraced by certain parapsychologists that hypnosis can enhance a person's ESP ability. To test that hypothesis, an experiment was set up with fifteen hypnotized subjects (22). Each was asked to make 100 guesses using the same sort of ESP cards and protocol that were described in Case Study 4.3.1. A total of 326 correct identifications were made. Can it be argued on the basis of those results that hypnosis does have an effect on a person's ESP ability? Explain.

4.3.13. If p_X(k) = (10 choose k)(0.3)ᵏ(0.7)^{10−k}, k = 0, 1, ..., 10, is it appropriate to approximate P(4 ≤ X ≤ 8) by computing

    P((3.5 − 3)/√(10(0.3)(0.7)) ≤ Z ≤ (8.5 − 3)/√(10(0.3)(0.7)))?

Explain.

4.3.14. A sell-out crowd of 42,200 is expected at Cleveland's Jacobs Field for next Tuesday's game with the Baltimore Orioles, the last before a long road trip. The concessions manager is trying to decide how much food to have on hand. Looking at concession records from games played earlier in the season, she knows that, on the average, 38% of all those in attendance will buy a hot dog. How large an order should she place if she wants to have no more than a 20% chance of demand exceeding supply?

The Central Limit Theorem

It was pointed out in Example 3.9.3 that a binomial random variable X can be written as the sum of n independent Bernoulli random variables X₁, X₂, ..., Xₙ, where

    Xᵢ = 1 with probability p    and    Xᵢ = 0 with probability 1 − p

But if X = X₁ + X₂ + ... + Xₙ, Theorem 4.3.1 can be reexpressed as

    lim_{n→∞} P(a ≤ (X₁ + X₂ + ... + Xₙ − np)/√(np(1 − p)) ≤ b) = (1/√(2π)) ∫_a^b e^{−z²/2} dz        (4.3.2)

Implicit in Equation 4.3.2 is an obvious question: Does the DeMoivre-Laplace limit apply to sums of other types of random variables as well? Remarkably, the answer is "yes." Efforts to extend Equation 4.3.2 have continued for more than one hundred years, with many of the advances coming from Russian probabilists, A. M. Lyapunov prominent among them. In 1920, George Polya gave these new generalizations a name that has been associated with the result ever since: He called it the central limit theorem (141).
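A small simulation, a sketch that is not part of the text, shows the phenomenon the theorem describes: standardized sums of a sharply skewed, decidedly non-normal summand behave like the standard normal variable Z once n is moderately large.

```python
import random, math

# Sketch (not from the text): each W_i is exponential with mu = sigma = 1,
# a sharply skewed pdf.  We standardize the sum of n of them and compare
# the empirical probability of falling at or below 1 with P(Z <= 1).
random.seed(1)
n, trials = 30, 20_000
count = 0
for _ in range(trials):
    s = sum(random.expovariate(1.0) for _ in range(n))
    # The central limit theorem ratio (W_1 + ... + W_n - n*mu)/(sqrt(n)*sigma)
    if (s - n) / math.sqrt(n) <= 1:
        count += 1

frac = count / trials
f_z_area = 0.5 * (1 + math.erf(1 / math.sqrt(2)))   # P(Z <= 1)
print(round(frac, 2), round(f_z_area, 4))
```

With n = 30, the simulated fraction already sits close to the standard normal area F_Z(1) = 0.8413.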
Theorem 4.3.2 (Central Limit Theorem). Let W₁, W₂, ... be an infinite sequence of independent random variables, each with the same distribution. Suppose that the mean μ and the variance σ² of f_W(w) are both finite. For any numbers a and b,

    lim_{n→∞} P(a ≤ (W₁ + ... + Wₙ − nμ)/(√n σ) ≤ b) = (1/√(2π)) ∫_a^b e^{−z²/2} dz

Proof. See Appendix 4.A.2.

Comment. The central limit theorem is often stated in terms of the average of W₁, W₂, ..., and Wₙ, rather than their sum. Since

    E[(1/n)(W₁ + ... + Wₙ)] = E(W̄) = μ    and    Var[(1/n)(W₁ + ... + Wₙ)] = σ²/n

Theorem 4.3.2 can be stated in the equivalent form

    lim_{n→∞} P(a ≤ (W̄ − μ)/(σ/√n) ≤ b) = (1/√(2π)) ∫_a^b e^{−z²/2} dz

We will use both formulations, the choice depending on which is more convenient for the problem at hand.

EXAMPLE 4.3.2

The top of Table 4.3.2 shows a MINITAB simulation where forty random samples of size five were drawn from a uniform pdf defined over the interval [0, 1]. Each row corresponds to a different sample. The sum of the five numbers appearing in a given sample is denoted y and is listed in column C6. For this particular uniform pdf, μ = 1/2 and σ² = 1/12 (recall Question 3.6.4), so the standardized sums listed in column C7 are the ratios

    (y − 5(1/2))/√(5(1/12)) = (y − 2.5)/√(5/12)
[Table 4.3.2: forty random samples of size five from the uniform pdf over [0, 1], together with each sample's sum y (column C6) and its standardized ratio (y − 2.5)/√(5/12) (column C7). At the bottom is a density-scaled histogram of the forty ratios, plotted from −3.5 to 3.5, with f_Z(z) superimposed.]

At the bottom of Table 4.3.2 is a density-scaled histogram of the forty ratios (as listed in column C7). Notice the close agreement between the distribution of those ratios and f_Z(z): What we see there is entirely consistent with the statement of Theorem 4.3.2.

Comment. Theorem 4.3.2 is an asymptotic result, yet it can provide surprisingly good approximations even when n is very small. Example 4.3.2 is a typical case in point. The uniform pdf over [0, 1] looks nothing like a bell-shaped curve, yet random samples as small as n = 5 yield sums that behave probabilistically much like the theoretical limit. In general, samples from symmetric pdfs will produce sums that "converge" quickly to the theoretical limit. On the other hand, if the underlying pdf is sharply skewed (for example, f_Y(y) = 10e^{−10y}, y > 0), it would take a larger n to achieve the level of agreement present in Table 4.3.2.

EXAMPLE 4.3.3

A random sample of size n = 15 is drawn from the pdf f_Y(y) = 3(1 − y)², 0 ≤ y ≤ 1. Use the central limit theorem to approximate P(1/8 ≤ Ȳ ≤ 3/8).

Note, first of all, that

    E(Y) = ∫₀¹ y · 3(1 − y)² dy = 1/4

and

    Var(Y) = E(Y²) − [E(Y)]² = ∫₀¹ y² · 3(1 − y)² dy − (1/4)² = 1/10 − 1/16 = 3/80

According to the central limit theorem formulation that appears in the comment above, the probability that Ȳ will lie between 1/8 and 3/8 is approximately 0.99:

    P(1/8 ≤ Ȳ ≤ 3/8) = P((1/8 − 1/4)/(√(3/80)/√15) ≤ (Ȳ − 1/4)/(√(3/80)/√15) ≤ (3/8 − 1/4)/(√(3/80)/√15))
                     ≈ P(−2.50 ≤ Z ≤ 2.50)
                     = 0.9876

EXAMPLE 4.3.4

In preparing next quarter's budget, the accountant for a small business has one hundred different expenditures to account for. Her predecessor listed each entry to the penny, but doing so grossly overstates the precision of the process. As a more truthful alternative, she intends to record each budget allocation to the nearest $100. What is the probability that her total estimated budget will end up differing from the actual cost by more than $500? Assume that Y₁, Y₂, ..., Y₁₀₀, the rounding errors she makes on the one hundred items, are independent and uniformly distributed over the interval [−$50, +$50].

Let

    S₁₀₀ = Y₁ + Y₂ + ... + Y₁₀₀ = total rounding error

What the accountant wants to estimate is P(|S₁₀₀| > $500). By the distribution assumption made for each Yᵢ,

    E(Yᵢ) = 0,    i = 1, 2, ..., 100

and

    Var(Yᵢ) = E(Yᵢ²) = ∫_{−50}^{50} y²(1/100) dy = 2500/3

so

    E(S₁₀₀) = E(Y₁ + Y₂ + ... + Y₁₀₀) = 0

and

    Var(S₁₀₀) = Var(Y₁ + Y₂ + ... + Y₁₀₀) = 100(2500/3) = 250,000/3

Applying Theorem 4.3.2, then, shows that her strategy has roughly an 8% chance of being in error by more than $500:

    P(|S₁₀₀| > $500) = 1 − P(−500 ≤ S₁₀₀ ≤ 500)
                     = 1 − P(−500/(500/√3) ≤ (S₁₀₀ − 0)/(500/√3) ≤ 500/(500/√3))
                     ≈ 1 − P(−1.73 ≤ Z ≤ 1.73)
                     = 0.0836

EXAMPLE 4.3.5

The annual number of earthquakes registering 2.5 or higher on the Richter scale and having an epicenter within forty miles of downtown Memphis follows a Poisson distribution with λ = 6.5. Calculate the exact probability that nine or more such earthquakes will strike next year, and compare that value to an approximation based on the central limit theorem.
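Both numbers derived in the solution that follows can be reproduced in a few lines; this is a sketch, not part of the text.

```python
import math

# Exact Poisson tail P(X >= 9) for lam = 6.5.
lam = 6.5
exact = 1 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(9))

# Normal approximation with continuity correction: a Poisson(lam) variable
# has mean and variance lam, so (X - lam)/sqrt(lam) is treated as Z and
# the event "X <= 8" becomes "X <= 8.5".
z = (8.5 - lam) / math.sqrt(lam)
approx = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(exact, 4), round(approx, 4))
```

Carrying z = 0.7845 unrounded gives 0.2164; the hand calculation, which rounds z to 0.78 before consulting the table, lands slightly higher.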
If X denotes the number of earthquakes of that magnitude that will hit Memphis next year, the exact probability that X ≥ 9 is a Poisson sum:

    P(X ≥ 9) = 1 − P(X ≤ 8) = 1 − Σ_{x=0}^{8} e^{−6.5}(6.5)ˣ/x!
             = 1 − 0.7916
             = 0.2084

For Poisson random variables, the central limit theorem ratio (W₁ + ... + Wₙ − nμ)/(√n σ) reduces to (X − λ)/√λ (see Question 4.3.18). Therefore,

    P(X ≥ 9) = 1 − P(X ≤ 8) ≈ 1 − P(X ≤ 8.5)
             = 1 − P((X − 6.5)/√6.5 ≤ (8.5 − 6.5)/√6.5)
             ≈ 1 − P(Z ≤ 0.78)
             = 0.2177

(Notice that the event "X ≤ 8" is replaced with "X ≤ 8.5" before applying the central limit theorem transformation. As always, the continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve.)

QUESTIONS

4.3.15. A fair coin is tossed 200 times. Let Xᵢ = 1 if the ith toss comes up heads and Xᵢ = 0 otherwise, i = 1, 2, ..., 200, and let X = X₁ + ... + X₂₀₀. Calculate the central limit theorem approximation for P(|X − E(X)| ≤ 5). How does this differ from the DeMoivre-Laplace approximation?

4.3.16. Suppose that 100 fair dice are tossed. Estimate the probability that the sum of the faces showing exceeds 370. Include a continuity correction in your analysis.

4.3.17. Let X be the amount won or lost in betting $5 on red in roulette. Then p_X(5) = 18/38 and p_X(−5) = 20/38. If a gambler bets on red 100 times, use the central limit theorem to estimate the probability that those wagers result in less than $50 in losses.

4.3.18. If X₁, X₂, ..., Xₙ are independent Poisson random variables with parameters λ₁, ..., λₙ, and if X = X₁ + X₂ + ... + Xₙ, then X is a Poisson random variable with parameter λ = Σᵢ₌₁ⁿ λᵢ (recall Example 3.12.10). What specific form does the ratio in Theorem 4.3.2 take if the Xᵢ's are Poisson random variables?

4.3.19.
An electronics firm receives, on the average, 50 orders per week for a particular silicon chip. If the company has 60 chips on hand, use the central limit theorem to approximate the probability that they will be unable to fill all their orders for the upcoming week. Assume that weekly demands follow a Poisson distribution. Hint: See Question 4.3.18.

4.3.20. Considerable controversy has arisen over the possible aftereffects of a nuclear weapons test conducted in Nevada in 1957. Included as part of the test were some 3000 military and civilian "observers." Now, more than 40 years later, eight cases of leukemia have been diagnosed among those 3000. The expected number of cases, based on the demographic characteristics of the observers, was three. Assess the statistical significance of those findings. Calculate an exact answer using the Poisson distribution as well as an approximation based on the central limit theorem.

The Normal Curve as a Model for Individual Measurements

Because of the central limit theorem, we know that sums (or averages) of virtually any set of random variables, when suitably scaled, have distributions that can be approximated by a standard normal curve. Perhaps even more surprising is the fact that many individual measurements, when suitably scaled, also have a standard normal distribution. Why is that true? What do single observations have in common with samples of size n?

Astronomers in the nineteenth century were among the first to understand the connection. Imagine looking through a telescope for the purpose of determining the location of a star. Conceptually, the data point, Y, eventually recorded is the sum of two components: (1) the star's true location μ* (which remains unknown) and (2) measurement error. By definition, measurement error is the net effect of all the factors that cause the random variable Y to have a value different from μ*.
Typically, these effects will be additive, in which case the random variable Y can be written as a sum:

    Y = μ* + W₁ + W₂ + ... + Wₜ        (4.3.3)

where W₁, for example, might represent the effect of atmospheric irregularities, W₂ the effect of seismic vibrations, W₃ the effect of parallax distortions, and so on. If Equation 4.3.3 is a valid representation of the random variable Y, then it would follow that the central limit theorem applies to the individual Y's. Moreover, if

    E(Y) = μ* + E(W₁ + W₂ + ... + Wₜ) = μ

and

    Var(Y) = Var(μ* + W₁ + W₂ + ... + Wₜ) = σ²

the ratio in Theorem 4.3.2 takes the form (Y − μ)/σ. Furthermore, t is likely to be very large, so the approximation implied by the central limit theorem is essentially an equality; that is, we take the pdf of (Y − μ)/σ to be f_Z(z). Finding an actual formula for f_Y(y), then, becomes an exercise in working with the transformation Y = μ + σZ.

Definition 4.3.1. A random variable Y is said to be normally distributed with mean μ and variance σ² if

    f_Y(y) = (1/(√(2π)σ)) e^{−(1/2)((y − μ)/σ)²},    −∞ < y < ∞

The symbol Y ~ N(μ, σ²) will sometimes be used to denote the fact that Y has a normal distribution with mean μ and variance σ².

Comment. Areas under an "arbitrary" normal distribution, f_Y(y), are calculated by finding the equivalent area under the standard normal distribution, f_Z(z):

    P(a ≤ Y ≤ b) = P((a − μ)/σ ≤ (Y − μ)/σ ≤ (b − μ)/σ) = P((a − μ)/σ ≤ Z ≤ (b − μ)/σ)

The ratio (Y − μ)/σ is often referred to as either a Z transformation or a Z score.

EXAMPLE 4.3.6

In many states a motorist is legally drunk, or driving under the influence (DUI), if his or her blood alcohol concentration, Y, is 0.10% or higher. When a DUI offender is pulled over, police will often request a breath test. Although the breath analyzers used for that purpose are remarkably precise, the machines do exhibit a certain amount of measurement error. Because of that variability, the possibility exists that a driver's true blood alcohol concentration may be under 0.10% even though the analyzer registers a reading over 0.10%.

Experience has shown that repeated breath measurements taken on the same person produce a distribution of responses that can be described by a normal pdf with μ equal to the person's true blood alcohol concentration and σ equal to 0.004%. Suppose a driver is stopped at a roadblock on his way home. Having had a bit more to drink than he should have, he has a true blood alcohol concentration of 0.095%, just under the legal limit. If he takes the breath test, what are the chances he will be incorrectly booked on a DUI charge?

Since a DUI arrest occurs when Y ≥ 0.10%, we need to find P(Y ≥ 0.10), given that μ = 0.095 and σ = 0.004 (the % sign is irrelevant to the probability calculation and can be ignored). An application of the Z transformation shows that the driver has an 11% chance of being falsely booked:

    P(Y ≥ 0.10) = P((Y − 0.095)/0.004 ≥ (0.10 − 0.095)/0.004)
                = P(Z ≥ 1.25)
                = 1 − P(Z < 1.25)
                = 1 − 0.8944
                = 0.1056

[Figure 4.3.5: f_Y(y), centered at 0.095, with the area to the right of the legally drunk cutoff 0.10 shaded, and f_Z(z) with the equal area to the right of 1.25 shaded; each area equals 0.1056.]

EXAMPLE 4.3.7

Mensa (from the Latin word for "mind") is an international society devoted to intellectual pursuits. Any person who has an IQ in the upper 2% of the population is eligible to join. What is the lowest IQ that will qualify a person for membership? Assume that IQs are normally distributed with μ = 100 and σ = 16.

Let the random variable Y denote a person's IQ, and let the constant y_L be the lowest IQ that qualifies someone to be a Mensan. The two are related by a probability equation:

    P(Y ≥ y_L) = 0.02

or, equivalently,

    P(Y < y_L) = 1 − 0.02 = 0.98        (4.3.4)

(see Figure 4.3.6). Applying the Z transformation to Equation 4.3.4 gives

    P(Y < y_L) = P((Y − 100)/16 < (y_L − 100)/16) = P(Z < (y_L − 100)/16) = 0.98
FIGURE 4.3.6

From the standard normal table in Appendix Table A.1, though,

$$P(Z < 2.05) = 0.9798 \approx 0.98$$

Since $\frac{y_L - 100}{16}$ and 2.05 cut off the same area of 0.02 under $f_Z(z)$, they must be equal, which implies that 133 is the lowest acceptable IQ for Mensa:

$$y_L = 100 + 16(2.05) \approx 133$$

EXAMPLE 4.3.8
The Army is soliciting proposals for the development of a truck-launched antitank missile. Pentagon officials are requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95% of the missiles will fall no more than fifty feet short of their target or no more than fifty feet beyond. What is the largest $\sigma$ compatible with that degree of precision? Assume that the horizontal distance a missile travels, $Y$, is normally distributed with its mean ($\mu$) equal to the distance between the truck and the target.

The requirement that a missile has a 95% probability of landing within fifty feet of its target can be expressed by the equation

$$P(\mu - 50 \le Y \le \mu + 50) = 0.95$$

(see Figure 4.3.7). Equivalently,

$$P\left(\frac{-50}{\sigma} \le \frac{Y - \mu}{\sigma} \le \frac{50}{\sigma}\right) = P\left(\frac{-50}{\sigma} \le Z \le \frac{50}{\sigma}\right) = 0.95 \tag{4.3.5}$$

FIGURE 4.3.7 (Distance between truck and target.)

Following the approach taken in Example 4.3.7, we can "match" Equation 4.3.5 using the information provided in Appendix Table A.1. Specifically,

$$P(-1.96 \le Z \le 1.96) = 0.95$$

It must be true, then, that

$$\frac{50}{\sigma} = 1.96$$

which implies $\sigma = 25.5$ ft. Any value of $\sigma$ larger than 25.5 will result in $f_Y(y)$ being flatter, and that would have the consequence that fewer than 95% of the missiles would land within fifty feet of their targets. Conversely, if the sighting mechanism produces a $\sigma$ smaller than 25.5, it will be performing at a level that exceeds the contract specifications (and perhaps costing an amount that makes the proposal noncompetitive).

EXAMPLE 4.3.9
Suppose a random variable $Y$ has the moment-generating function $M_Y(t) = e^{3t + 8t^2}$. Calculate $P(-1 \le Y \le 9)$.
To begin, notice that $M_Y(t)$ has the same form as the moment-generating function for a normal random variable. That is,

$$e^{3t + 8t^2} = e^{\mu t + (\sigma^2 t^2)/2}$$

where $\mu = 3$ and $\sigma^2 = 16$ (recall Example 3.12.4). To evaluate $P(-1 \le Y \le 9)$, then, requires an application of the Z transformation:

$$P(-1 \le Y \le 9) = P\left(\frac{-1 - 3}{4} \le \frac{Y - 3}{4} \le \frac{9 - 3}{4}\right) = P(-1.00 \le Z \le 1.50) = 0.9332 - 0.1587 = 0.7745$$

Theorem 4.3.3. Let $Y_1$ be a normally distributed random variable with mean $\mu_1$ and variance $\sigma_1^2$, and let $Y_2$ be a normally distributed random variable with mean $\mu_2$ and variance $\sigma_2^2$. Define $Y = Y_1 + Y_2$. If $Y_1$ and $Y_2$ are independent, $Y$ is normally distributed with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$.

Proof. Let $M_{Y_i}(t)$ denote the moment-generating function for $Y_i$, $i = 1, 2$, and let $M_Y(t)$ be the moment-generating function for $Y$. Since $Y = Y_1 + Y_2$, and the $Y_i$'s are independent,

$$M_Y(t) = M_{Y_1}(t) \cdot M_{Y_2}(t) = e^{\mu_1 t + (\sigma_1^2 t^2)/2} \cdot e^{\mu_2 t + (\sigma_2^2 t^2)/2} = e^{(\mu_1 + \mu_2)t + (\sigma_1^2 + \sigma_2^2)t^2/2}$$

(recall Example 3.12.4). We recognize the latter, though, to be the moment-generating function for a normal random variable with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. The result follows by virtue of the uniqueness property stated in Theorem 3.12.2. □

Corollary. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then the sample mean, $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, is also normally distributed with mean $\mu$ but with variance equal to $\sigma^2/n$ (which implies that $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ is a standard normal random variable, $Z$).

Corollary. Let $Y_1, Y_2, \ldots, Y_n$ be a set of independent normal random variables with means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, respectively. Let $a_1, a_2, \ldots, a_n$ be any set of constants. Then $Y = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$ is normally distributed with mean $\mu = \sum_{i=1}^{n} a_i \mu_i$ and variance $\sigma^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2$.

EXAMPLE 4.3.10
The elevator in the athletic dorm at Swampwater Tech has a maximum capacity of twenty-four hundred pounds. Suppose that ten football players get on at the twentieth floor.
If the weights of Tech's players are normally distributed with a mean of two hundred twenty pounds and a standard deviation of twenty pounds, what is the probability that there will be ten fewer Muskrats at tomorrow's practice?

Let the random variables $Y_1, Y_2, \ldots, Y_{10}$ denote the weights of the ten players. At issue is the probability that $\sum_{i=1}^{10} Y_i$ exceeds twenty-four hundred pounds:

$$P\left(\sum_{i=1}^{10} Y_i > 2400\right) = P\left(\frac{1}{10}\sum_{i=1}^{10} Y_i > \frac{1}{10} \cdot 2400\right) = P(\bar{Y} > 240.0)$$

The first corollary on page 311 can be applied to the latter expression, since $\bar{Y}$ is normally distributed with mean 220 and standard deviation $20/\sqrt{10}$:

$$P(\bar{Y} > 240.0) = P\left(\frac{\bar{Y} - 220}{20/\sqrt{10}} > \frac{240.0 - 220}{20/\sqrt{10}}\right) = P(Z > 3.16) = 0.0008$$

Clearly, the chances of a Muskrat splat are minimal. (How much would the probability change if more players squeezed onto the elevator?)

EXAMPLE 4.3.11
The personnel department of a large corporation gives two aptitude tests to job applicants. One measures verbal ability; the other, quantitative ability. From many years' experience, the company has found that a person's verbal score, $Y_1$, is normally distributed with $\mu_1 = 50$ and $\sigma_1 = 10$. The quantitative scores, $Y_2$, are normally distributed with $\mu_2 = 100$ and $\sigma_2 = 20$; moreover, $Y_1$ and $Y_2$ appear to be independent. A composite score, $Y = 3Y_1 + 2Y_2$, is assigned to each applicant. To avoid unnecessary paperwork, the company automatically rejects any applicant whose composite score is below 375. If six individuals submit résumés, what are the chances that fewer than half will fail the test?

First we need to calculate the probability that any given candidate will score below the composite cutoff. Since $Y$ is a linear combination of independent normal random variables, $Y$ itself is normally distributed with

$$E(Y) = 3E(Y_1) + 2E(Y_2) = 3(50) + 2(100) = 350$$

and

$$\mathrm{Var}(Y) = 9\,\mathrm{Var}(Y_1) + 4\,\mathrm{Var}(Y_2) = 9(10)^2 + 4(20)^2 = 2500$$

A Z transformation, then, shows that the probability of a random applicant being summarily rejected is 0.6915:

$$P(Y < 375) = P\left(\frac{Y - 350}{\sqrt{2500}} < \frac{375 - 350}{\sqrt{2500}}\right) = P(Z < 0.50) = 0.6915$$

Now, let the random variable $X$ denote the number of applicants (out of six) whose composite scores would be less than 375. By its structure, $X$ is binomial with $n = 6$ and $p = P(Y < 375) = 0.6915$.
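Both stages of this calculation, the normal probability and the binomial count it feeds, can be reproduced in a few lines. A sketch (ours, not the text's; `erf` substitutes for the standard normal table):

```python
from math import comb, erf, sqrt

def phi(z):
    # Standard normal cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# P(composite score < 375) for Y ~ N(350, 2500)
p_reject = phi((375 - 350) / sqrt(2500))
print(round(p_reject, 4))              # 0.6915

# X = number of the six applicants scoring below 375: binomial(6, p)
p = 0.6915                             # the table value the text uses
tail = sum(comb(6, k) * p**k * (1 - p) ** (6 - k) for k in range(3))
print(round(tail, 4))                  # 0.0774  (= P(X <= 2))
```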
Therefore,

$$P(\text{fewer than half the applicants fail the test}) = P(X < 3) = P(X \le 2) = \sum_{k=0}^{2} \binom{6}{k} (0.6915)^k (0.3085)^{6-k} = 0.0774$$

EXAMPLE 4.3.12
Let $Y_1, Y_2, \ldots, Y_9$ be a random sample of size nine from a normal distribution where $\mu = 2$ and $\sigma = 2$. Let $Y_1^*, Y_2^*, Y_3^*, Y_4^*$ be an independent random sample of size four from a normal distribution for which $\mu = 1$ and $\sigma = 1$. Find $P(\bar{Y} \ge \bar{Y}^*)$.

The second corollary on page 312 can be applied here because the event $\bar{Y} \ge \bar{Y}^*$ can be written in terms of a linear combination, specifically, $\bar{Y} - \bar{Y}^* \ge 0$. Moreover,

$$E(\bar{Y} - \bar{Y}^*) = E(\bar{Y}) - E(\bar{Y}^*) = 2 - 1 = 1$$

and

$$\mathrm{Var}(\bar{Y} - \bar{Y}^*) = \mathrm{Var}(\bar{Y}) + \mathrm{Var}(\bar{Y}^*) \;\; \text{(why?)} = \frac{2^2}{9} + \frac{1^2}{4} = \frac{25}{36}$$

so

$$P(\bar{Y} - \bar{Y}^* \ge 0) = P\left(\frac{\bar{Y} - \bar{Y}^* - 1}{\sqrt{25/36}} \ge \frac{0 - 1}{\sqrt{25/36}}\right) = P(Z \ge -1.20) = 0.8849$$

QUESTIONS

4.3.21. Econo-Tire is planning an advertising campaign for its newest product, an inexpensive radial. Preliminary road tests conducted by the firm's quality control department have suggested that the lifetimes of these tires will be normally distributed with an average of 30,000 miles and a standard deviation of 5000 miles. The marketing division would like to run a commercial that makes the claim that at least nine out of ten drivers will get at least 25,000 miles on a set of Econo-Tires. Based on the road test data, is the company justified in making that assertion?

4.3.22. A large computer chip manufacturing plant under construction in Westbank is expected to add 1400 children to the county's public school system once the permanent work force arrives. Any child with an IQ under 80 or over 135 will require individualized instruction that will cost the city an additional $1750 per year. How much money should Westbank anticipate spending next year to meet the needs of its new special ed students? Assume that IQ scores are normally distributed with a mean ($\mu$) of 100 and a standard deviation ($\sigma$) of 16.

4.3.23.
Records for the past several years show that the amount of money collected daily by a prominent charity is normally distributed with a mean ($\mu$) of $20,000 and a standard deviation ($\sigma$) of $5000. What are the chances that tomorrow's donations will exceed $30,000?

4.3.24. The following letter was written to a well-known dispenser of advice to the lovelorn (178):

Dear Abby: You wrote in your column that a woman is pregnant for 266 days. Who said so? I carried my baby for ten months and five days, and there is no doubt about it because I know the exact date my baby was conceived. My husband is in the Navy and it couldn't have possibly been conceived any other time because I saw him only once for an hour, and I didn't see him again until the day before the baby was born. I don't drink or run around, and there is no way this baby isn't his, so please print a retraction about the 266-day carrying time because otherwise I am in a lot of trouble.

San Diego Reader

Whether or not San Diego Reader is telling the truth is a judgment that lies beyond the scope of any statistical analysis, but quantifying the plausibility of her story does not. According to the collective experience of generations of pediatricians, pregnancy durations, $Y$, tend to be normally distributed with $\mu = 266$ days and $\sigma = 16$ days. Do a probability calculation that addresses San Diego Reader's credibility. What would you conclude?

4.3.25. A criminologist has developed a questionnaire for predicting whether a teenager will become a delinquent. Scores on the questionnaire can range from 0 to 100, with higher values reflecting a presumably greater criminal tendency. As a rule of thumb, the criminologist decides to classify a teenager as a potential delinquent if his or her score exceeds 75. The questionnaire has already been tested on a large sample of teenagers, both delinquent and nondelinquent. Among those considered nondelinquent, scores were normally distributed with a mean ($\mu$) of 60 and a standard deviation ($\sigma$) of 10.
Among those considered delinquent, scores were normally distributed with a mean of 80 and a standard deviation of 5.

(a) What proportion of the time will the criminologist misclassify a nondelinquent as a delinquent? A delinquent as a nondelinquent?

(b) On the same set of axes, draw the normal curves that represent the distributions of scores earned by delinquents and nondelinquents. Shade the two areas that correspond to the probabilities asked for in part (a).

4.3.26. The cross-sectional area of plastic tubing for use in pulmonary resuscitators is normally distributed with $\mu = 12.5$ mm² and $\sigma = 0.2$ mm². When the area is less than 12.0 mm² or greater than 13.0 mm², the tube does not fit properly. If the tubes are shipped in boxes of 1000, how many wrong-sized tubes per box can doctors expect to find?

4.3.27. At State University, the average score of the freshman class on the verbal portion of the SAT is 565, with a standard deviation of 75. Marian scored a ___. How many of State's other freshmen did better? Assume that the scores are normally distributed.

4.3.28. A professor teaches Chemistry 101 each fall to a large class of freshmen. For exams, she uses standardized tests that she knows from past experience produce bell-shaped grade distributions with a mean of 70 and a standard deviation of 12. Her philosophy of grading is to impose standards that will yield, in the long run, 20% A's, 26% B's, 38% C's, 12% D's, and 4% F's. Where should the cutoff be between the A's and the B's? Between the B's and the C's?

4.3.29. Suppose the random variable $Y$ can be described by a normal curve with $\mu = 40$. For what value of $\sigma$ is

$$P(20 \le Y \le 60) = 0.50$$

4.3.30. It is estimated that 80% of all 18-year-old women have weights ranging from 103.5 to 144.5 lb. Assuming the weight distribution can be adequately modeled by a normal curve and assuming that 103.5 and 144.5 are equidistant from the average weight $\mu$, calculate $\sigma$.

4.3.31. Recall the breath analyzer problem described in Example 4.3.6. Suppose the driver's true blood alcohol concentration is actually 0.11% rather than 0.095%.
What is the probability that the breath analyzer will make an error in his favor and conclude that he is not legally drunk? Suppose the police offer the driver a choice: either take the sobriety test once, or take it twice and average the readings. Which option should a "0.095%" driver take? Which option should a "0.11%" driver take? Explain.

4.3.32. If a random variable $Y$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, the ratio $\frac{Y - \mu}{\sigma}$ is often referred to as a *normed score*: It indicates the magnitude of $y$ relative to the distribution from which it came. "Norming" is sometimes used as an affirmative action mechanism in hiring decisions. Suppose a company is using a sales aptitude test that has traditionally shown a distinct gender bias: Scores for men are normally distributed with $\mu = 62.0$ and $\sigma = 7.6$, while scores for women are normally distributed with $\mu = 76.3$ and $\sigma = 10.8$. Laura and Michael are the two finalists vying for the position: Laura has scored 92 on the test and Michael 75. If the company agrees to norm the scores for gender bias, whom should they hire?

4.3.33. The IQs of nine randomly selected people are recorded. Let $\bar{Y}$ denote their average. Assuming the distribution from which the $Y_i$'s were drawn is normal with a mean of 100 and a standard deviation of 16, what is the probability that $\bar{Y}$ will exceed 103? What is the probability that any arbitrary $Y_i$ will exceed 103? What is the probability that exactly three of the $Y_i$'s will exceed 103?

4.3.34. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a normal distribution where the mean is 2 and the variance is 4. How large must $n$ be in order that

$$P(1.9 \le \bar{Y} \le 2.1) \ge 0.99$$

4.3.35. A circuit contains three resistors wired in series. Each is rated at 6 ohms. Suppose, however, that the true resistance of each one is a normally distributed random variable with a mean of 6 ohms and a standard deviation of 0.3 ohm. What is the probability that the combined resistance will exceed 19 ohms?
How "precise" would the manufacturing process have to be to make the probability less than 0.005 that the combined resistance of the circuit would exceed 19 ohms?

4.3.36. The cylinders and pistons for a certain internal combustion engine are manufactured by a process that gives a normal distribution of cylinder diameters with a mean of 41.5 cm and a standard deviation of 0.4 cm. Similarly, the distribution of piston diameters is normal with a mean of 40.5 cm and a standard deviation of 0.3 cm. If the piston diameter is greater than the cylinder diameter, the former can be reworked until the two "fit." What proportion of cylinder-piston pairs will need to be reworked?

4.3.37. Use moment-generating functions to prove the two corollaries to Theorem 4.3.3.

4.4 THE GEOMETRIC DISTRIBUTION

Consider a series of independent trials, each having one of two possible outcomes, success or failure. Let $p = P(\text{trial ends in success})$, and define the random variable $X$ to be the trial at which the first success occurs. Figure 4.4.1 suggests a formula for the pdf of $X$:

$$p_X(k) = P(X = k) = P(\text{first } k - 1 \text{ trials end in failure}) \cdot P(k\text{th trial ends in success}) = (1 - p)^{k-1} p, \qquad k = 1, 2, \ldots \tag{4.4.1}$$

We call the probability model in Equation 4.4.1 a *geometric distribution* (with parameter $p$).

FIGURE 4.4.1 ($k - 1$ failures on independent trials, followed by a success on trial $k$.)

Comment. Even without its association with independent trials and Figure 4.4.1, the function $p_X(k) = (1 - p)^{k-1} p$, $k = 1, 2, \ldots$, qualifies as a discrete pdf because (1) $p_X(k) \ge 0$ for all $k$ and (2)

$$\sum_{\text{all } k} p_X(k) = \sum_{k=1}^{\infty} (1 - p)^{k-1} p = p \cdot \frac{1}{1 - (1 - p)} = 1$$

EXAMPLE 4.4.1
A pair of fair dice are rolled until a sum of seven appears for the first time. What is the probability that more than four rolls will be required for that to happen?

Each throw of the dice here is an independent trial
for which

$$p = P(\text{sum} = 7) = \frac{6}{36} = \frac{1}{6}$$

Let $X$ denote the roll at which the first sum of seven appears. Clearly, $X$ has the structure of a geometric random variable, and

$$P(X > 4) = 1 - P(X \le 4) = 1 - \sum_{k=1}^{4} \left(\frac{5}{6}\right)^{k-1}\left(\frac{1}{6}\right) = 1 - \frac{671}{1296} = 0.48$$

Theorem 4.4.1. Let $X$ have a geometric distribution with $p_X(k) = (1 - p)^{k-1} p$, $k = 1, 2, \ldots$. Then

1. $M_X(t) = \dfrac{p e^t}{1 - (1 - p)e^t}$
2. $E(X) = \dfrac{1}{p}$
3. $\mathrm{Var}(X) = \dfrac{1 - p}{p^2}$

Proof. See Examples 3.12.1 and 3.12.5 for derivations of $M_X(t)$ and $E(X)$. The formula for $\mathrm{Var}(X)$ is left as an exercise. □

EXAMPLE 4.4.2
A grocery store is sponsoring a sales promotion where the cashiers give away one of the letters A, E, L, S, U, and V for each purchase. If a customer collects all six (spelling VALUES), he or she gets ten dollars worth of groceries free. What is the expected number of trips to the store a customer needs to make in order to get a complete set? Assume the different letters are given away randomly.

Let $X_i$ denote the number of purchases necessary to get the $i$th different letter, $i = 1, 2, \ldots, 6$, and let $X$ denote the number of purchases necessary to qualify for the ten dollars. Then $X = X_1 + X_2 + \cdots + X_6$ (see Figure 4.4.2). Clearly, $X_1$ equals one with probability one, so $E(X_1) = 1$. Having received the first letter, the chances of getting a different one are $\frac{5}{6}$ for each subsequent trip to the store. Therefore,

$$p_{X_2}(k) = \left(\frac{1}{6}\right)^{k-1}\left(\frac{5}{6}\right), \qquad k = 1, 2, \ldots$$

FIGURE 4.4.2

That is, $X_2$ is a geometric random variable with $p = \frac{5}{6}$, so, by Theorem 4.4.1, $E(X_2) = \frac{6}{5}$. Similarly, the chances of getting a third different letter are $\frac{4}{6}$ (for each subsequent purchase), so

$$p_{X_3}(k) = \left(\frac{2}{6}\right)^{k-1}\left(\frac{4}{6}\right), \qquad k = 1, 2, \ldots$$

and $E(X_3) = \frac{6}{4}$. Continuing in this fashion, we can find the remaining $E(X_i)$'s.
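The tail probability in Example 4.4.1 and the mean formula in Theorem 4.4.1 are easy to confirm by direct summation. A quick sketch (ours, not the text's):

```python
def geom_pdf(k, p):
    # P(first success occurs on trial k) = (1 - p)^(k - 1) * p
    return (1 - p) ** (k - 1) * p

p = 1 / 6                                                # P(sum of two dice = 7)
tail = 1 - sum(geom_pdf(k, p) for k in range(1, 5))      # P(X > 4)
print(round(tail, 2))                                    # 0.48

mean = sum(k * geom_pdf(k, p) for k in range(1, 2000))   # truncated sum for E(X)
print(round(mean, 4))                                    # 6.0, i.e., 1/p
```

The truncation at 2000 terms is harmless here because the geometric tail decays exponentially.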
It follows that a customer will have to make 14.7 trips to the store, on average, to collect a complete set of six letters:

$$E(X) = \sum_{i=1}^{6} E(X_i) = 1 + \frac{6}{5} + \frac{6}{4} + \frac{6}{3} + \frac{6}{2} + \frac{6}{1} = 14.7$$

EXAMPLE 4.4.3
Geometric random variables have a curious *memoryless* property: The probability that an additional $k$ trials will be needed to obtain the first success is unaffected by however many failures have already been observed. That is,

$$P(X = n - 1 + k \mid X > n - 1) = P(X = k) \tag{4.4.2}$$

Prove Equation 4.4.2.

We know from the definition of conditional probability that

$$P(X = n - 1 + k \mid X > n - 1) = \frac{P(X = n - 1 + k \text{ and } X > n - 1)}{P(X > n - 1)} = \frac{P(X = n - 1 + k)}{P(X > n - 1)}$$

To evaluate the denominator, note that the cdf for a geometric random variable is $p$ times the partial sum of a geometric series:

$$P(X \le n - 1) = \sum_{k=1}^{n-1} (1 - p)^{k-1} p = p \sum_{j=0}^{n-2} (1 - p)^j = p\left[1 + (1 - p) + \cdots + (1 - p)^{n-2}\right]$$

Formulas for geometric partial sums are well known. Here,

$$p \sum_{j=0}^{n-2} (1 - p)^j = p \cdot \frac{1 - (1 - p)^{n-1}}{1 - (1 - p)} = 1 - (1 - p)^{n-1}$$

so that $P(X > n - 1) = (1 - p)^{n-1}$.
The devil in this case is in the expected value. In order to bankroll the strategy described, a player to have $2k - 1 in order to be eligible to play the kth (to win for the first time on third try, for example, a player would lost $1 on the first game, $2 on the second game, and would have wagered $4 on the third game, so - 1). Now, if it takes $3 to be the money spent at that point is 1 + 2 + 4 = $7, or eligible to win on the second game, and $7 to eligible to win on the third how much capital does a player to have on the average? Let X denote the game where the player wins for the first time. Clearly, X is a geometric ~,k = 1,2,.... to win On the kth random variable for which px(k) = game requires an investment of g(k) == $2k - I, so the expected amount of money needed !k-t . Section 4.4 The Geometric Distribution 321 is E(g(X)), where E[g(X)] = £[2x 1)k-l1 - 11 = k=l (2" - 1) ( 2 2 (1 - (~r) 1 3 7 =2+4+8+'" term in the infinite series that defines E[g(X)J is larger than the one that the gambler would need to an infinite amount of money in order to im~ plement a double-your*betstrategy! (On a more practical casinos always have house limits, so players would not allowed to double their bets indefinitely anyway. Recall that a similar analysis played a role in the St Petersburg analysis introduced 3.5.5.) QUESTIONS convlcllons fraud forgery, Jody has a 30% d1ance each year of her tax returns audited. What is the probability that she will escape distorts, misrepresents, detection for at least three years? Assume that she and cheats year. 4.4.2. A teenager is to get 8 license. Write out the formula for the pdf Px(k), where the random variable X is the number of tries that he needs to pass the road test. Assume that his probability of the exam on given attempt is 0.10. On the to require he gets his license? average, how many attempts is 4.4.3. Is the following set of data likely to have COme the geometric pdf px(k) = (i)t-l . (1), k 1, 2, ... ? Explain. 
281 5 4 2 5 2 1 2 5 1 3 2 3 6 3 4 6 262 4 2 2 2 .4 375 3 1 7 3 3 2 8 3 4 4 8 4 2 9 6 3 7 5 3 2 a young couple plans to continue having children until they have their is the outcome of each birth first girL Suppose the probability that a child is a is an independent event, and the birth at which the first girl appears has a geometric distribution What is the couple's Is the geometric pdf a reasonable model here? Discuss. 4.4.5. Show that the cdf for a geometric random variable is given by Fx(t) ::::: P(X :5 t) = 1 - (1 - p)fll, where [I] denotes the greatest in t. 4.4.6. Suppose three dice are repeatedly. Let the random X denote the roll on which a sum of,* appears for the first time. Use the expression for Fx(t) given in Question 4.4.5 to evaluate P(65 ~ X ~ 75). 4.4.7. Let y be an exponential random variable, where Jr (y) = Ae- Ay • 0 :5 For any DmrrtI\re integer n, show that P(n !S Y :5 n + 1) = e- A ). Note that p = 1 , the "discrete" version of the exponential pdf is the geometric pdf. 4A.4. Recently !, 4 Special Distributions 4AJt Sometimes the random variable is defined to be the number of trials, X; preceding the first success. Write down the and derive the momentgenerating function for X two ways-(1) by directly and (2) by Theorem 3.12.3. 4.4.9. Differentiate the ",r.rn~>nt.a"'r\",..:"t ~~)m,etrilC random variable and !leorem 4.4.11 e' 1" 4AJ.O. Suppose that the random variables Xl and X2 have mgfs Mx;(t) 1 - (1 - !)e' and Does X have a geometric Let X = Xl + (1 -;rt 1)' e' distribution? Assume that Xl and are independent. 4A.l1. The moment-generating junction for any random variable W is the expected d' value of tWo Moreover -EU w ) E(W(W - 1)··· (W - r + Find the dt' factorial moment-generating function for a g~)m,etrllc random variable and use it to in Theorem 4.4.1. 
verify the expected value and variance given in Theorem 4.4.1.

4.5 THE NEGATIVE BINOMIAL DISTRIBUTION

The geometric distribution introduced in Section 4.4 can be generalized in a very straightforward way. Imagine waiting for the $r$th (instead of the first) success in a series of independent trials, where each trial has a probability $p$ of ending in success (see Figure 4.5.1).

FIGURE 4.5.1 ($r - 1$ successes and $k - 1 - (r - 1)$ failures among the first $k - 1$ trials.)

Let the random variable $X$ denote the trial at which the $r$th success occurs. Then

$$p_X(k) = P(X = k) = P(r - 1 \text{ successes occur in the first } k - 1 \text{ trials}) \cdot P(\text{success occurs on the } k\text{th trial}) = \binom{k - 1}{r - 1} p^{r-1}(1 - p)^{k-r} \cdot p = \binom{k - 1}{r - 1} p^r (1 - p)^{k-r}, \qquad k = r, r + 1, \ldots \tag{4.5.1}$$

A random variable whose pdf has the form given in Equation 4.5.1 is said to have a *negative binomial distribution* (with parameters $r$ and $p$).

Comment. Two equivalent formulations of the negative binomial structure are widely used. Sometimes $X$ is defined to be the number of trials preceding the $r$th success; other times, $X$ is taken to be the number of trials in excess of $r$ that are necessary to achieve the $r$th success. The underlying probability structure is the same, however $X$ is defined. We will primarily use Equation 4.5.1; properties of the other two definitions for $X$ will be covered in the exercises.

Theorem 4.5.1. Let $X$ have a negative binomial distribution with $p_X(k) = \binom{k-1}{r-1} p^r (1 - p)^{k-r}$, $k = r, r + 1, \ldots$. Then

1. $M_X(t) = \left[\dfrac{p e^t}{1 - (1 - p)e^t}\right]^r$
2. $E(X) = \dfrac{r}{p}$
3. $\mathrm{Var}(X) = \dfrac{r(1 - p)}{p^2}$

Proof. All of these results follow immediately from the fact that $X$ can be written as the sum of $r$ independent geometric random variables, $X_1, X_2, \ldots, X_r$, each with parameter $p$. That is,

$X$ = total number of trials to achieve the $r$th success
  = number of trials to achieve the 1st success
    + number of additional trials to achieve the 2nd success + ...
    + number of additional trials to achieve the $r$th success
  = $X_1 + X_2 + \cdots + X_r$

where $p_{X_j}(k) = (1 - p)^{k-1} p$, $k = 1, 2, \ldots$, for $j = 1, 2, \ldots, r$. Therefore,

$$M_X(t) = M_{X_1}(t) \cdot M_{X_2}(t) \cdots M_{X_r}(t) = \left[\frac{p e^t}{1 - (1 - p)e^t}\right]^r$$

Also, from Theorem 4.4.1,

$$E(X) = E(X_1) + E(X_2) + \cdots + E(X_r) = \frac{1}{p} + \frac{1}{p} + \cdots + \frac{1}{p} = \frac{r}{p}$$

and

$$\mathrm{Var}(X) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_r) = \frac{1 - p}{p^2} + \frac{1 - p}{p^2} + \cdots + \frac{1 - p}{p^2} = \frac{r(1 - p)}{p^2}$$

□

EXAMPLE 4.5.1
The California Mellows are a semipro baseball team. Eschewing all forms of violence, laid-back Mellow batters never swing at a pitch, and should they be fortunate enough to reach base on a walk, they never try to steal. On the average, how many runs will the Mellows score in a nine-inning road game, assuming the opposing pitcher has a 50% probability of throwing a strike on any given pitch (81)?

The solution to this problem illustrates very nicely the interplay between the constraints imposed by a sampling situation (in this case, by the rules of baseball) and the mathematical characteristics of the negative binomial probability model. The latter appears twice in this analysis, along with several of the properties associated with expected values and linear combinations.

To begin, we calculate the probability of a Mellow batter striking out. Let the random variable $X$ denote the number of pitches needed for that to happen. Clearly, $X = 3, 4, 5,$ or $6$ ($X$ cannot be larger than 6!), and

$$p_X(k) = P(X = k) = P(2 \text{ strikes are called in the first } k - 1 \text{ pitches and the } k\text{th pitch is the 3rd strike}) = \binom{k - 1}{2}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^{k-3}, \qquad k = 3, 4, 5, 6$$

Therefore,

$$P(\text{batter strikes out}) = \sum_{k=3}^{6} p_X(k) = \left(\frac{1}{2}\right)^3 + \binom{3}{2}\left(\frac{1}{2}\right)^4 + \binom{4}{2}\left(\frac{1}{2}\right)^5 + \binom{5}{2}\left(\frac{1}{2}\right)^6 = \frac{21}{32}$$

In order for the Mellows to draw exactly $w$ walks in a given inning, the $(w + 3)$rd batter must make the third out, and exactly two of the first $w + 2$ batters must strike out (see Figure 4.5.2). The pdf for $W$, the number of walks in a given inning, is therefore a shifted negative binomial:

$$p_W(w) = P(W = w) = \binom{w + 2}{2}\left(\frac{21}{32}\right)^3\left(\frac{11}{32}\right)^w, \qquad w = 0, 1, 2, \ldots$$
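Both negative binomial calculations in Example 4.5.1 can be checked by brute-force summation. The sketch below (ours, not the text's) recomputes the strikeout probability and then anticipates the expected-runs bookkeeping that the example carries out next:

```python
from math import comb

# P(third called strike on pitch k) = C(k-1, 2)(1/2)^k, k = 3, ..., 6
p_out = sum(comb(k - 1, 2) * 0.5**k for k in range(3, 7))
print(p_out, 21 / 32)                  # both are 0.65625

# p_W(w) = C(w+2, 2)(21/32)^3 (11/32)^w: probability of w walks in an inning
def p_w(w):
    return comb(w + 2, 2) * (21 / 32) ** 3 * (11 / 32) ** w

# A run scores for every walk beyond the third; expected runs per game
er = sum((w - 3) * p_w(w) for w in range(4, 200))
print(round(9 * er, 2))                # 1.82
```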
peW = w) =2:(w - 3 - W"'O + = E(W) - 3 L (w - = W + 3 = total (4.5.2) == pw(t - 3) C; 1) (~~r (~~y-3 binomial pdf with r which we recognize as a 3 E(T) 32 _T 3- which makes E(W) = From Equation 4..5.2, given inning is 0.202: E(R) = ~1 transformation H ....j ... ' . ." " of Mellow batters appearing in a given inning Then pr(l) pew = w) (3 - To evaluate E(W) using S'Lallem.ent of Theorem 4.5.1 4.5.1. Let to rescale W to the format of T 3) . w...o - t == 3, 4, ... = 3 and p Therefore, 32 = 21/32 = 1" 3-11 -"T' expected number of runs ~VL..:;u by Mellows in a (~) (~~y (~~r + 2 . (~) (~~y G~Y + 1 . (~) (~~y (~~)2 - 3 + 3 . 326 4 Special Distributions Each of the nine innings, of course, would have the same value (or E(R), SO the expected number of runs in a game is the sum 0.202 + 0.202 + ... + 0.202 = 9(0.202), or 1.82. QUESTIONS 4.s.L A door-to-door encyclopedia salesperson is required to document five ill-home visits each day. Suppose that she has a 30% chance of being invited into any given home, with each address representing an independent trial. What is the probability that she requires fewer t.han eight houses to achieve her fifth success? 4.5.2. An underground military installation is fortified to the extent that it can withstand up to three direct hits from air-to-surface missiles and still function. Suppose an enemy aircraft is armed with missiles, each a 30% chance of a direct hit. What is the probability that the installation will destroyed with the seventh missile fired? a fair coin and record the toss, X, 4..5.3. Darryl's statistics homework last The experiment was to be repeated a total of where heads appears for the second 100 times. The following are the 100 values for X that Darryl turned in this Do you think t.hat he actually did the assignment? 3 7 4 2 8 3 3 4 3 5 7 3 3 3 2 8 4 2 5 2 2 6 2 4 3 5 2 3 5 2 4 7 2 7 4 4 5 6 5 3 9 3 4 2 4 ·3 3 5 3 2 2 2 6 2 5 4 10 5 5 4 6 3 2 3 4 7 2 4 5 4 2 6 4 3 3 4 2 8 6 6 3 3 2 3 3 5 2 4 5 4 2 3 4 2 3 6 2 3 2 3 4.5.4. 
When a machine is improperly adjusted, it has probability 0.15 of producing a defective item. Each day the machine is run until three defective items are produced. When this occurs, it is stopped and checked for adjustment. What is the probability that an improperly adjusted machine will produce five or more items before being stopped? What is the average number of items an improperly adjusted machine will produce before being stopped?

4.5.5. For a negative binomial random variable whose pdf is given by Equation 4.5.1, find $E(X)$ directly by evaluating $\sum_{k=r}^{\infty} k \binom{k-1}{r-1} p^r (1 - p)^{k-r}$. Hint: Reduce the sum to one involving negative binomial probabilities with parameters $r + 1$ and $p$.

4.5.6. Let the random variable $X$ denote the number of trials in excess of $r$ that are required to achieve the $r$th success in a series of independent trials, where $p$ is the probability of success at any given trial. Show that

$$p_X(k) = \binom{k + r - 1}{k} p^r (1 - p)^k, \qquad k = 0, 1, 2, \ldots$$

[Note: This particular formula for $p_X(k)$ is often used in place of Equation 4.5.1 as the definition of the pdf for a negative binomial random variable.]

4.5.7. Calculate the mean, variance, and moment-generating function for a negative binomial random variable $X$ whose pdf is given by the expression

$$p_X(k) = \binom{k + r - 1}{k} p^r (1 - p)^k, \qquad k = 0, 1, 2, \ldots$$

(see Question 4.5.6).

4.5.8. Let $X_1$, $X_2$, and $X_3$ be three independent negative binomial random variables with pdfs

$$p_{X_i}(k) = \binom{k - 1}{2} p^3 (1 - p)^{k-3}, \qquad k = 3, 4, 5, \ldots$$

for $i = 1, 2, 3$, where $p$ is a common success probability. Define $X = X_1 + X_2 + X_3$. Find $P(10 \le X \le 12)$. Hint: Use the moment-generating functions of $X_1$, $X_2$, and $X_3$ to deduce the pdf of $X$.

4.5.9. Differentiate the moment-generating function

$$M_X(t) = \left[\frac{p e^t}{1 - (1 - p)e^t}\right]^r$$

to verify the formula given in Theorem 4.5.1 for $E(X)$.

4.5.10. Suppose that $X_1, X_2, \ldots, X_k$ are independent negative binomial random variables with parameters $r_1$ and $p$, $r_2$ and $p$, $\ldots$, and $r_k$ and $p$, respectively. Let $X = X_1 + X_2 + \cdots + X_k$. Find $M_X(t)$, $p_X(t)$, $E(X)$, and $\mathrm{Var}(X)$.
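Several of these exercises can be spot-checked against Theorem 4.5.1 by brute-force summation of the pdf. A sketch (the parameter values $r = 3$, $p = 0.4$ are arbitrary illustrations, not taken from any exercise):

```python
from math import comb

def nb_pdf(k, r, p):
    # P(rth success occurs on trial k) = C(k-1, r-1) p^r (1-p)^(k-r)
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

r, p = 3, 0.4                          # arbitrary illustration values
ks = range(r, 400)
mean = sum(k * nb_pdf(k, r, p) for k in ks)
var = sum(k * k * nb_pdf(k, r, p) for k in ks) - mean**2
print(round(mean, 4), round(var, 4))   # r/p = 7.5 and r(1-p)/p^2 = 11.25
```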
4.6 THE GAMMA DISTRIBUTION

Suppose a series of independent events are occurring at the constant rate of $\lambda$ per unit time. If the random variable $Y$ denotes the interval between consecutive occurrences, we know from Theorem 4.2.3 that $f_Y(y) = \lambda e^{-\lambda y}$, $y > 0$. Equivalently, $Y$ can be interpreted as the "waiting time" for the first occurrence. This section generalizes the Poisson/exponential relationship and focuses on the interval, or waiting time, required for the $r$th event to occur (see Figure 4.6.1).

Theorem 4.6.1. Suppose that Poisson events are occurring at the constant rate of $\lambda$ per unit time. Let the random variable $Y$ denote the waiting time for the $r$th event. Then $Y$ has pdf $f_Y(y)$, where

$$f_Y(y) = \frac{\lambda^r}{(r - 1)!}\, y^{r-1} e^{-\lambda y}, \qquad y > 0$$

FIGURE 4.6.1 (First, second, ..., $r$th successes along a time axis.)

Proof. We will establish the formula for $f_Y(y)$ by deriving and then differentiating its cdf, $F_Y(y)$. Let $Y$ denote the time to the $r$th occurrence. Then

$$F_Y(y) = P(Y \le y) = 1 - P(Y > y) = 1 - P(\text{fewer than } r \text{ events occur in } [0, y]) = 1 - \sum_{k=0}^{r-1} e^{-\lambda y}\frac{(\lambda y)^k}{k!}$$

since the number of events that occur in the interval $[0, y]$ is a Poisson random variable with parameter $\lambda y$. From Theorem 3.4.1,

$$f_Y(y) = \frac{d}{dy} F_Y(y) = \sum_{k=0}^{r-1} \lambda e^{-\lambda y}\frac{(\lambda y)^k}{k!} - \sum_{k=1}^{r-1} \lambda e^{-\lambda y}\frac{(\lambda y)^{k-1}}{(k - 1)!} = \lambda e^{-\lambda y}\frac{(\lambda y)^{r-1}}{(r - 1)!}, \qquad y > 0$$

since all of the other terms in the two sums cancel. □

EXAMPLE 4.6.1
Engineers designing the next generation of space shuttles plan to include two fuel pumps, one active, the other in reserve. If the primary pump malfunctions, the second is automatically brought on line. Suppose a typical mission is expected to require that fuel be pumped for at most fifty hours. According to the manufacturer's specifications, pumps are expected to fail once every one hundred hours (so $\lambda = 0.01$). What are the chances that such a fuel pump system would not remain functioning for the full fifty hours?

Let the random variable $Y$ denote the time that will elapse before the second pump breaks down. According to Theorem 4.6.1, the pdf for $Y$ has parameters $r = 2$ and $\lambda = 0.01$, and we can write

$$f_Y(y) = \frac{(0.01)^2}{1!}\, y e^{-0.01y}, \qquad y > 0$$
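Before carrying out the integration, the answer can be previewed with the Poisson form of the cdf from the proof of Theorem 4.6.1. A sketch (ours, not the text's):

```python
from math import exp, factorial

def waiting_time_cdf(t, r, lam):
    # P(Y <= t) = 1 - P(fewer than r Poisson events in [0, t])
    return 1.0 - sum(exp(-lam * t) * (lam * t) ** k / factorial(k) for k in range(r))

# Time to second pump failure: r = 2, lambda = 0.01 per hour
print(round(waiting_time_cdf(50, 2, 0.01), 2))   # 0.09
```

The same function with $r = 1$ reproduces the exponential cdf $1 - e^{-\lambda t}$, as it should.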
The probability that the system fails to last for fifty hours, then, is

$$P(Y \le 50) = \int_0^{50} 0.0001\, y e^{-0.01y}\, dy = \int_0^{0.50} u e^{-u}\, du$$

where $u = 0.01y$. Integrating by parts shows that the probability the primary pump and its backup would not remain operable for the targeted fifty hours is 0.09:

$$\int_0^{0.50} u e^{-u}\, du = \Big[-(u + 1)e^{-u}\Big]_{u=0}^{0.50} = 1 - 1.5e^{-0.5} = 0.09$$

Generalizing the Waiting Time Distribution

By virtue of Theorem 4.6.1, $\int_0^{\infty} y^{r-1}e^{-\lambda y}\, dy$ converges for any integer $r > 0$. But the convergence also holds for any real number $r > 0$, because for any such $r$ there will be an integer $t > r$ with $y^{r-1} \le y^{t-1}$ for $y \ge 1$, so

$$\int_1^{\infty} y^{r-1}e^{-\lambda y}\, dy \le \int_1^{\infty} y^{t-1}e^{-\lambda y}\, dy < \infty$$

(and the integral over $(0, 1)$ is clearly finite). The finiteness of $\int_0^{\infty} y^{r-1}e^{-\lambda y}\, dy$ justifies the consideration of a related definite integral, one that was first studied by Euler but named by Legendre.

Definition 4.6.1. For any real number $r > 0$, the gamma function of $r$ is denoted $\Gamma(r)$, where

$$\Gamma(r) = \int_0^{\infty} y^{r-1}e^{-y}\, dy$$

Theorem 4.6.2. Let $\Gamma(r) = \int_0^{\infty} y^{r-1}e^{-y}\, dy$ for any real number $r > 0$. Then

1. $\Gamma(1) = 1$
2. $\Gamma(r) = (r-1)\Gamma(r-1)$
3. If $r$ is an integer, then $\Gamma(r) = (r-1)!$

Proof.

1. $\Gamma(1) = \int_0^{\infty} e^{-y}\, dy = 1$.
2. Integrate the gamma function by parts. Take $u = y^{r-1}$ and $dv = e^{-y}dy$. Then

$$\Gamma(r) = \Big[-y^{r-1}e^{-y}\Big]_0^{\infty} + \int_0^{\infty} (r-1)y^{r-2}e^{-y}\, dy = (r-1)\int_0^{\infty} y^{r-2}e^{-y}\, dy = (r-1)\Gamma(r-1)$$

3. Use Part (2) as the basis for an induction argument. The details will be left as an exercise. $\Box$

Definition 4.6.2. Given real numbers $r > 0$ and $\lambda > 0$, the random variable $Y$ is said to have a gamma pdf with parameters $r$ and $\lambda$ if

$$f_Y(y) = \frac{\lambda^r}{\Gamma(r)}\, y^{r-1}e^{-\lambda y}, \qquad y > 0$$

Comment. To justify Definition 4.6.2 requires a proof that $f_Y(y)$ integrates to one. Let $u = \lambda y$. Then

$$\int_0^{\infty} \frac{\lambda^r}{\Gamma(r)}\, y^{r-1}e^{-\lambda y}\, dy = \frac{\lambda^r}{\Gamma(r)}\int_0^{\infty} \left(\frac{u}{\lambda}\right)^{r-1} e^{-u}\,\frac{du}{\lambda} = \frac{1}{\Gamma(r)}\int_0^{\infty} u^{r-1}e^{-u}\, du = \frac{\Gamma(r)}{\Gamma(r)} = 1$$

Theorem 4.6.3. Suppose that $Y$ has a gamma pdf with parameters $r$ and $\lambda$. Then

1. $E(Y) = r/\lambda$
2. $Var(Y) = r/\lambda^2$

Proof.

1. $$E(Y) = \int_0^{\infty} y\,\frac{\lambda^r}{\Gamma(r)}\, y^{r-1}e^{-\lambda y}\, dy = \frac{\lambda^r\,\Gamma(r+1)}{\Gamma(r)\,\lambda^{r+1}}\int_0^{\infty} \frac{\lambda^{r+1}}{\Gamma(r+1)}\, y^{r}e^{-\lambda y}\, dy = \frac{r\,\Gamma(r)}{\lambda\,\Gamma(r)}\,(1) = r/\lambda$$

2. A calculation similar to the integration carried out in Part (1) shows that $E(Y^2) = r(r+1)/\lambda^2$.
Then

$$Var(Y) = E(Y^2) - [E(Y)]^2 = \frac{r(r+1)}{\lambda^2} - \left(\frac{r}{\lambda}\right)^2 = \frac{r}{\lambda^2} \qquad \Box$$

Sums of Gamma Random Variables

We have already seen that certain random variables satisfy an additive property that "reproduces" their distribution: The sum of two independent binomial random variables with the same $p$, for example, is binomial (recall Example 3.8.1). Similarly, the sum of two independent Poissons is Poisson, and the sum of two independent normals is normal. That said, most random variables are not additive. The sum of two independent uniforms is not uniform; the sum of two independent exponentials is not exponential; and so on. Gamma random variables belong to the short list making up the first category.

Theorem 4.6.4. Suppose $U$ has the gamma pdf with parameters $r$ and $\lambda$, $V$ has the gamma pdf with parameters $s$ and $\lambda$, and $U$ and $V$ are independent. Then $U + V$ has a gamma pdf with parameters $r + s$ and $\lambda$.

Proof. The pdf of the sum is the convolution integral

$$f_{U+V}(t) = \int_0^t f_U(u)\,f_V(t-u)\, du = \int_0^t \frac{\lambda^r}{\Gamma(r)}\,u^{r-1}e^{-\lambda u}\cdot \frac{\lambda^s}{\Gamma(s)}\,(t-u)^{s-1}e^{-\lambda(t-u)}\, du$$

Make the substitution $v = u/t$. Then the integral becomes

$$f_{U+V}(t) = \lambda^{r+s}\,e^{-\lambda t}\,t^{r+s-1}\left(\frac{1}{\Gamma(r)\Gamma(s)}\int_0^1 v^{r-1}(1-v)^{s-1}\, dv\right) \qquad (4.6.1)$$

The numerical value of the constant in parentheses in Equation 4.6.1 is not immediately obvious, but the factors in front of the parentheses correspond to the functional part of a gamma pdf with parameters $r + s$ and $\lambda$. It follows, then, that $f_{U+V}(t)$ must be that particular gamma pdf. It also follows that the constant in parentheses must equal $1/\Gamma(r+s)$ (to comply with Definition 4.6.2), so, as a "bonus" identity, Equation 4.6.1 implies that

$$\int_0^1 v^{r-1}(1-v)^{s-1}\, dv = \frac{\Gamma(r)\Gamma(s)}{\Gamma(r+s)} \qquad \Box$$

EXAMPLE 4.6.2

In a large industrial plant, on-the-job accidents serious enough to require a worker to be confined to a bed occur at the rate of 0.7 per hour. The company's infirmary has ten beds. Use the central limit theorem and the properties of the gamma distribution to approximate the probability that the infirmary will be inadequate to meet the health emergencies that arise during tomorrow's eight-hour workday.

Let $Y_i$ denote the waiting time for the $i$th patient. Then $Y = Y_1 + Y_2 + \cdots$
$+\,Y_{11}$ denotes the length of time from the start of the workday to the moment the eleventh person needs a bed. Clearly,

$$P(Y < 8) = P(\text{infirmary is unable to provide enough beds})$$

Here $Y$ is a gamma random variable with parameters $r = 11$ and $\lambda = 0.7$, so $E(Y) = 11/0.7 = 15.7$ and $Var(Y) = 11/(0.7)^2 = 22.45$. Using the central limit theorem, then, we find that the probability of the infirmary having too few beds to accommodate tomorrow's demand is approximately 0.05:

$$P(Y < 8) = P\left(\frac{Y - 15.7}{\sqrt{22.45}} < \frac{8 - 15.7}{\sqrt{22.45}}\right) \doteq P(Z < -1.63) = 0.05$$

Comment. With the help of computer software, the exact answer to the question posed in Example 4.6.2 can be readily obtained. According to MINITAB,

$$P(Y < 8) = \int_0^8 \frac{(0.7)^{11}}{10!}\, y^{10}e^{-0.7y}\, dy = 0.03$$

Theorem 4.6.5. If $Y$ has a gamma pdf with parameters $r$ and $\lambda$, then $M_Y(t) = (1 - t/\lambda)^{-r}$.

Proof.

$$M_Y(t) = \int_0^{\infty} e^{ty}\,\frac{\lambda^r}{\Gamma(r)}\,y^{r-1}e^{-\lambda y}\, dy = \frac{\lambda^r}{(\lambda - t)^r}\int_0^{\infty} \frac{(\lambda - t)^r}{\Gamma(r)}\,y^{r-1}e^{-(\lambda - t)y}\, dy = \left(1 - \frac{t}{\lambda}\right)^{-r} \qquad \Box$$

QUESTIONS

4.6.1. An Arctic weather station has three electronic wind gauges. Only one is used at any given time. The lifetime of each gauge is exponentially distributed with a mean of 1000 hours. What is the pdf of $Y$, the random variable measuring the time until the last gauge wears out?

4.6.2. In Example 4.6.2, what might account for the sizeable discrepancy between the exact value for $P(Y < 8)$ and its central limit theorem approximation?

4.6.3. A service contract on a new university computer system provides 24 free repair calls from a technician. Suppose the technician is required on the average three times a month. What is the average time it takes for the service contract to be fulfilled?

4.6.4. Suppose a set of measurements $y_1, y_2, \ldots, y_{100}$ is taken from a gamma pdf for which $E(Y) = 1.5$ and $Var(Y) = 0.75$. How many $y_i$'s would you expect to find in the interval (1.0, 2.5)?

4.6.5. Demonstrate that $\lambda$ plays the role of a scale parameter by showing that if $Y$ is gamma with parameters $r$ and $\lambda$, then $\lambda Y$ is gamma with parameters $r$ and 1.

4.6.6. Prove that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$. Hint: Consider $E(Z^2)$, where $Z$ is a standard normal random variable.

4.6.7.
Show that

$$\Gamma\left(\frac{7}{2}\right) = \frac{15}{8}\sqrt{\pi}$$

4.6.8. If the random variable $Y$ has the gamma pdf with parameter $r$ and arbitrary $\lambda > 0$, show that

$$E(Y^m) = \frac{\Gamma(r + m)}{\lambda^m\,\Gamma(r)}$$

Hint: Use the fact that $\int_0^{\infty} y^{r-1}e^{-y}\, dy = (r-1)!$ when $r$ is a positive integer.

4.6.9. Differentiate the gamma moment-generating function to verify the formulas for $E(Y)$ and $Var(Y)$ given in Theorem 4.6.3.

4.6.10. Differentiate the gamma moment-generating function to show that the formula for $E(Y^m)$ given in Question 4.6.8 holds for arbitrary $r > 0$.

4.7 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS)

Calculating probabilities associated with (1) single random variables and (2) functions of sets of random variables has been the overarching theme of Chapters 3 and 4. Facilitating those computations has been a variety of transformations, summation properties, and mathematical relationships linking one pdf with another. Collectively, those results are enormously effective. Sometimes, though, the intrinsic complexity of a random variable overwhelms our ability to model its probabilistic behavior in any formal or precise way. An alternative in those situations is to use a computer to draw random samples from one or more distributions that model portions of the random variable's behavior. If a large enough number of such samples is generated, a histogram (or density-scaled histogram) can be constructed that will accurately reflect the random variable's true (but unknown) distribution. Sampling "experiments" of this sort are known as Monte Carlo studies.

Real-life situations where a Monte Carlo analysis could be helpful are not difficult to imagine. Suppose, for instance, you just bought a state-of-the-art, high-definition, plasma screen television. In addition to the pricey initial cost, an optional warranty is available that covers all repairs made during the first two years. According to an independent laboratory's reliability study,
this particular set is likely to require 0.75 service calls per year, on the average. Moreover, the costs of service calls are expected to be normally distributed with a mean ($\mu$) of $100 and a standard deviation ($\sigma$) of $20. If the warranty sells for $200, should you buy it?

Like any insurance policy, a warranty may or may not be a good investment, depending on what events unfold, and when. Here the relevant random variable is $W$, the total amount spent on repair calls during the first two years. For any particular customer, the value of $W$ will depend on (1) the number of repairs needed in the first two years and (2) the cost of each repair. Although we have reliability and cost assumptions that address (1) and (2), the two-year limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4. What remains is the option of using random samples to simulate the repair costs that might accrue during those first two years.

Note, first, that it would not be unreasonable to assume that the service calls are Poisson events (occurring at the rate of 0.75 per year). If that were the case, Theorem 4.2.3 implies that $Y$, the interval between successive repair calls, will have an exponential distribution with pdf

$$f_Y(y) = 0.75e^{-0.75y}, \qquad y > 0$$

(see Figure 4.7.1). Moreover, if the random variable $C$ denotes the cost associated with a particular maintenance call, then, by assumption,

$$f_C(c) = \frac{1}{\sqrt{2\pi}(20)}\,e^{-\frac{1}{2}\left(\frac{c - 100}{20}\right)^2}, \qquad -\infty < c < \infty$$

(see Figure 4.7.2).

[FIGURE 4.7.1: the exponential pdf $f_Y(y) = 0.75e^{-0.75y}$.]
[FIGURE 4.7.2: the normal pdf for $C$, centered at $100 with $\sigma = 20$.]

Now, with the pdfs for $Y$ and $C$ fully specified, we can use MINITAB to generate simulated repair costs. We begin by generating a random sample (of size one) from the pdf $f_Y(y) = 0.75e^{-0.75y}$. Since MINITAB parameterizes the exponential by its mean ($= 1/0.75 \doteq 1.33$), the appropriate syntax is

MTB > random 1 c1;
SUBC > exponential 1.33.
MTB > print c1

As shown in Figure 4.7.3, the number generated was 1.15988 yrs, corresponding to a first repair call occurring 423 days ($= 1.15988 \times 365$) after the purchase of the set.

[FIGURE 4.7.3: the exponential pdf with the sampled value $y = 1.15988$ marked.]
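(As an aside, not part of the text's MINITAB session: a language with a built-in exponential sampler can make the same one-observation draw directly. Python's `random.expovariate`, for instance, takes the rate $\lambda = 0.75$ rather than the mean 1.33.)

```python
import random

rate = 0.75                    # breakdowns per year, so the mean is 1/0.75 = 1.33 yrs
y = random.expovariate(rate)   # one simulated waiting time, in years
days = y * 365                 # the same waiting time expressed in days
print(round(y, 5), "years, i.e., about", round(days), "days")
```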
Applying the same syntax a second time yielded the random observation 0.284931 yrs ($\doteq$ 104 days); applying it still a third time produced the observation 1.46394 yrs ($\doteq$ 534 days). These last two observations taken on $f_Y(y)$ correspond to the second repair call occurring 104 days after the first, and the third occurring 534 days after the second (see Figure 4.7.4). Since the warranty does not extend past the first 730 days, the third repair would not be covered.

[FIGURE 4.7.4: a time line (days after purchase) of the three simulated breakdowns, at day 423, then 104 days later, then 534 days later; the first two repair costs are covered, the third falls outside the 730-day warranty.]

The next step in the simulation would be to generate two observations from $f_C(c)$ to model the costs of the two repairs that occurred during the warranty period. For each repair, the MINITAB syntax for generating a cost would be

MTB > random 1 c1;
SUBC > normal 100 20.
MTB > print c1

Running those commands twice produced $c$-values of 127.199 and 98.6673 (see Figure 4.7.5), corresponding to repair bills of $127.20 and $98.67, meaning that a total of $225.87 ($= \$127.20 + \$98.67$) would have been spent on maintenance during the first two years. In that case, the $200 warranty would have been a good investment.

The final "step" in the Monte Carlo analysis is to repeat many, many times the sampling scheme that led to Figure 4.7.5; that is, generate a set of $y_i$s whose sum (in days) is less than or equal to 730, and for each $y_i$ in that set, generate a corresponding cost, $c_i$. The sum of those $c_i$s, then, will be a simulated value of the maintenance-cost random variable, $W$. The histogram in Figure 4.7.6 shows the distribution of costs incurred in one hundred simulated two-year periods, one being the sequence of events chronicled in Figure 4.7.5. There is much that it tells us.
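The repetition just described is easy to script. The following Python sketch (again an aside, not the text's MINITAB session; the number of replications and the seed are arbitrary choices) simulates the warranty period many times:

```python
import random

def warranty_cost(rate=0.75, mu=100.0, sigma=20.0, years=2.0):
    """Simulate one two-year period; return the total covered repair cost W."""
    t, cost = 0.0, 0.0
    while True:
        t += random.expovariate(rate)      # waiting time to the next breakdown
        if t > years:                      # past the 730-day warranty window
            return cost
        cost += random.gauss(mu, sigma)    # cost of this (covered) repair

random.seed(0)                             # arbitrary seed, for repeatability
costs = [warranty_cost() for _ in range(100_000)]
mean_cost = sum(costs) / len(costs)
frac_over_200 = sum(c > 200 for c in costs) / len(costs)
frac_zero = sum(c == 0 for c in costs) / len(costs)
print(f"mean ${mean_cost:.2f}; P(W > $200) = {frac_over_200:.2f}; "
      f"P(no repairs) = {frac_zero:.2f}")
```

With this many replications the no-repair fraction settles near $e^{-1.5} \doteq 0.22$ and the mean cost near $150 ($= 1.5 \times \$100$), close to the corresponding figures from the hundred simulated periods discussed in the text.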
First of all (and not surprisingly), the warranty costs more than either the median repair bill ($= \$117.00$) or the mean repair bill ($= \$159.10$). The customer, in other words, will tend to lose money on the optional protection, and the company will tend to make money. On the other hand, a full 33% of the simulated two-year breakdown scenarios led to repair bills in excess of $200, including 6% that were more than twice the cost of the warranty. At the other extreme, 24% of the samples produced no maintenance problems whatsoever; for those customers, the $200 spent "up-front" is totally wasted!

So, should you buy the warranty? Yes, if you feel the need to have a financial cushion to offset the (small) probability of experiencing exceptionally bad luck; no, if you can afford to absorb an occasional big repair bill.

[FIGURE 4.7.5: the three MINITAB exponential draws (1.15988, 0.284931, and 1.46394) and the two normal cost draws (127.199 and 98.6673), each shown against its pdf.]
[FIGURE 4.7.6: a histogram of the simulated repair costs for the one hundred two-year periods, ranging from $0 to beyond $500.]

APPENDIX 4.A.1 MINITAB APPLICATIONS

Calculations involving Poisson, exponential, and gamma random variables can be readily handled with MINITAB's PDF and CDF commands (recall Appendix 3.A.1). Figure 4.A.1.1(a) shows the syntax for doing the Poisson calculation in Example 4.2.2. Values of $p_X(k) = e^{-1.5}(1.5)^k/k!$ for all $k$ can be printed out by using the PDF command without specifying a particular $k$ [see Figure 4.A.1.1(b)].

(a)
MTB > cdf 3;
SUBC > poisson 1.5.
Cumulative Distribution Function
Poisson with mu = 1.50000

(b)
MTB > pdf;
SUBC > poisson 1.5.
Probability Density Function
Poisson with mu = 1.50000

    x      P(X <= x)
    3.00   0.9344

MTB > let k1 = 1 - 0.9344
MTB > print k1
Data Display
k1   0.0656000

P(X > 3) = 0.0656

    x    P(X = x)
    0    0.2231
    1    0.3347
    2    0.2510
    3    0.1255
    4    0.0471
    5    0.0141
    6    0.0035
    7    0.0008
    8    0.0001
    9    0.0000

FIGURE 4.A.1.1

Areas under normal curves between two points $a$ and $b$ are calculated by subtracting $F_Y(a)$ from $F_Y(b)$, just as we did in Section 4.3 (recall the comment after Definition 4.3.1). There is no need, however, to reexpress the probability as an area under the standard normal curve. Figure 4.A.1.2 shows the MINITAB calculation for the probability that the random variable $Y$ lies between forty-eight and fifty-one, where $Y$ is normally distributed with $\mu = 50$ and $\sigma = 4$. According to the computer,

$$P(48 < Y < 51) = F_Y(51) - F_Y(48) = 0.5987 - 0.3085 = 0.2902$$

MTB > cdf 51;
SUBC > normal 50 4.
Cumulative Distribution Function
Normal with mean = 50.0000 and standard deviation = 4.00000
    x         P(X <= x)
    51.0000   0.5987
MTB > cdf 48;
SUBC > normal 50 4.
Cumulative Distribution Function
Normal with mean = 50.0000 and standard deviation = 4.00000
    x         P(X <= x)
    48.0000   0.3085
MTB > let k1 = 0.5987 - 0.3085
MTB > print k1
Data Display
k1   0.290200

FIGURE 4.A.1.2

Exponential and gamma integrations can also be done on MINITAB, but the computer expresses those two pdfs as

$$f_Y(y) = (1/\lambda)e^{-y/\lambda} \quad \left[\text{instead of } f_Y(y) = \lambda e^{-\lambda y}\right]$$

and

$$f_Y(y) = \frac{1}{\lambda^r (r-1)!}\, y^{r-1}e^{-y/\lambda} \quad \left[\text{instead of } f_Y(y) = \frac{\lambda^r}{(r-1)!}\, y^{r-1}e^{-\lambda y}\right]$$

Therefore, to evaluate $\int_0^1 0.50e^{-0.50y}\, dy$, for example, we would write

MTB > cdf 1;
SUBC > exponential 2.

(rather than SUBC > exponential 0.50). Recall Example 4.6.1. In the notation of Theorem 4.6.1, $P(Y < 50)$ is the cdf evaluated at fifty for a gamma random variable having $r = 2$ and $\lambda = 0.01$. In MINITAB's notation, the second parameter is entered as 100 ($= 1/0.01$) (see Figure 4.A.1.3).
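Other statistical software follows the same mean-based (scale) convention. As an aside, the probabilities worked out above can be reproduced in Python with scipy.stats, which, like MINITAB, takes $1/\lambda$ as the scale parameter of its exponential and gamma distributions:

```python
from math import exp, isclose

from scipy.stats import expon, gamma, norm

# Exponential with lambda = 0.50: pass scale = 1/0.50 = 2
p_exp = expon.cdf(1, scale=2)
assert isclose(p_exp, 1 - exp(-0.5))       # the integral in closed form

# Gamma of Example 4.6.1 (r = 2, lambda = 0.01): pass scale = 100
p_gamma = gamma.cdf(50, a=2, scale=100)    # about 0.0902, as in Figure 4.A.1.3

# Normal calculation of Figure 4.A.1.2: P(48 < Y < 51), mu = 50, sigma = 4
p_norm = norm.cdf(51, loc=50, scale=4) - norm.cdf(48, loc=50, scale=4)   # about 0.2902
```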
MTB > cdf 50;
SUBC > gamma 2 100.
Cumulative Distribution Function
Gamma with a = 2.00000 and b = 100.000
    x         P(X <= x)
    50.0000   0.0902

FIGURE 4.A.1.3

On several occasions in Chapter 4 we made use of MINITAB's RANDOM command, a subroutine that samples from a specified pdf. Simulations of that sort can be very helpful in illustrating a variety of statistical concepts. Shown in Figure 4.A.1.4, for example, is the syntax for generating a random sample of size 50 from a binomial pdf having $n = 60$ and $p = 0.40$. Also calculated for each of those 50 observations is its $Z$-ratio, given by

$$Z\text{-ratio} = \frac{X - E(X)}{\sqrt{Var(X)}} = \frac{X - 60(0.40)}{\sqrt{60(0.40)(0.60)}} = \frac{X - 24}{\sqrt{14.4}}$$

(By the DeMoivre-Laplace theorem, of course, the distribution of those ratios should have a shape much like the standard normal pdf, $f_Z(z)$.)

MTB > random 50 c1;
SUBC > binomial 60 0.40.
Data Display
c1
27 29 23 22 21 21 22 26 26 20 26 25 21 32 22 27 22 20 19 19
21 23 28 23 27 29 13 24 22 26 25 20 25 26 15 24 11 28 21 16
24 22 26 25 21 23 23 20 25 30
MTB > let c2 = (c1 - 24)/sqrt(14.4)
MTB > name c2 'Z-ratio'
MTB > print c2

[The fifty resulting $Z$-ratios, for example $(27 - 24)/\sqrt{14.4} = 0.79$, are then displayed.]

FIGURE 4.A.1.4

In addition to the binomial distribution, the RANDOM command can also be used to generate samples from the uniform, Poisson, normal, exponential, and gamma pdfs.

Often the first step in summarizing a large set of measurements is the construction of their histogram, a graphical format especially effective at highlighting the shape of a distribution.
A density-scaled histogram is one whose vertical axis is calibrated in such a way that the total area under the histogram's bars is equal to 1. The latter version allows for a direct comparison between the sample distribution and the theoretical pdf from which the data presumably came (recall Case Study 4.2.4). Figure 4.A.1.5 shows the MINITAB syntax that produces the density-scaled histogram pictured in Figure 4.2.3.

MTB > set c1
DATA > 126 73 26 6 41 26 73 23 21 18 11 3 3 2 6 6 12 38 6 65
DATA > 68 41 38 50 37 94 16 40 77 91 23 51 20 18 61 12
DATA > end
MTB > set c2
DATA > 0 20 40 60 80 100 120 140
DATA > end
MTB > Histogram c1;
SUBC > Density;
SUBC > CutPoint c2;
SUBC > Bar;
SUBC > Type 1;
SUBC > Color 1.

FIGURE 4.A.1.5

MINITAB Windows

There is a Windows version of MINITAB that is very convenient for doing many of the data applications that will be discussed in end-of-chapter appendices. The point-and-click steps will be set in boxes, like the one below showing the procedure for constructing a histogram.

Histograms Using MINITAB Windows

1. Enter the data in C1.
2. Click on GRAPH, then on HISTOGRAM.
3. Type C1 in the GRAPH VARIABLES box.
4. Click on OK.

APPENDIX 4.A.2 A PROOF OF THE CENTRAL LIMIT THEOREM

Proving Theorem 4.3.2 in its full generality is beyond the level of this text. However, we can establish a slightly weaker version of the result by assuming that the moment-generating function of each $W_i$ exists. Motivating the derivation is the following lemma.

Lemma. Let $W_1, W_2, \ldots$ be a set of random variables such that $\lim_{n \to \infty} M_{W_n}(t) = M_W(t)$ for all $t$ in some interval about 0. Then $\lim_{n \to \infty} F_{W_n}(w) = F_W(w)$ for all $-\infty < w < \infty$.

To prove the central limit theorem using moment-generating functions requires showing that

$$\lim_{n \to \infty} M_{(W_1 + \cdots + W_n - n\mu)/(\sqrt{n}\sigma)}(t) = M_Z(t) = e^{t^2/2}$$

For notational simplicity, let

$$\frac{W_1 + \cdots + W_n - n\mu}{\sqrt{n}\,\sigma} = \frac{S_1 + \cdots + S_n}{\sqrt{n}}$$

where $S_i = (W_i - \mu)/\sigma$. Notice that $E(S_i) = 0$ and $Var(S_i) = E(S_i^2) = 1$.
Moreover,

$$M_{(S_1 + \cdots + S_n)/\sqrt{n}}(t) = \left[M\left(\frac{t}{\sqrt{n}}\right)\right]^n$$

where $M(t)$ denotes the moment-generating function common to each of the $S_i$s. By virtue of the way the $S_i$s are defined, $M(0) = 1$, $M^{(1)}(0) = E(S_i) = 0$, and $M^{(2)}(0) = Var(S_i) = 1$. Applying Taylor's theorem, then, to $M(t)$, we can write

$$M(t) = 1 + M^{(1)}(0)\,t + \frac{1}{2}M^{(2)}(s)\,t^2 = 1 + \frac{t^2}{2}M^{(2)}(s)$$

for some number $s$, $|s| < |t|$. Thus

$$\lim_{n \to \infty}\left[M\left(\frac{t}{\sqrt{n}}\right)\right]^n = \lim_{n \to \infty}\left[1 + \frac{t^2}{2n}M^{(2)}(s)\right]^n = \exp \lim_{n \to \infty} n \ln\left[1 + \frac{t^2}{2n}M^{(2)}(s)\right]$$

where now $|s| < |t|/\sqrt{n}$. The existence of $M(t)$ implies the existence of all its derivatives. In particular, $M^{(3)}(t)$ exists, so $M^{(2)}(t)$ is continuous. Since $|s| < |t|/\sqrt{n}$, $s \to 0$ as $n \to \infty$, so

$$\lim_{n \to \infty} M^{(2)}(s) = M^{(2)}(0) = 1$$

Also, as $n \to \infty$, the quantity $(t^2/2n)M^{(2)}(s) \to 0$, so it plays the role of "$\Delta x$" in the definition of the derivative. Hence we obtain

$$\lim_{n \to \infty} n \ln\left[1 + \frac{t^2}{2n}M^{(2)}(s)\right] = \frac{t^2}{2}\lim_{n \to \infty} M^{(2)}(s)\cdot \frac{\ln\left[1 + \frac{t^2}{2n}M^{(2)}(s)\right] - \ln 1}{\frac{t^2}{2n}M^{(2)}(s)} = \frac{t^2}{2}\cdot \left.\frac{d}{dx}\ln x\right|_{x=1} = \frac{t^2}{2}$$

so $\lim_{n \to \infty}\left[M(t/\sqrt{n})\right]^n = e^{t^2/2}$. Since this last expression is the moment-generating function for a standard normal random variable, the theorem is proved. $\Box$

CHAPTER 5 Estimation

5.1 INTRODUCTION
5.2 ESTIMATING PARAMETERS: THE METHOD OF MAXIMUM LIKELIHOOD AND THE METHOD OF MOMENTS
5.3 INTERVAL ESTIMATION
5.4 PROPERTIES OF ESTIMATORS
5.5 MINIMUM-VARIANCE ESTIMATORS: THE CRAMER-RAO LOWER BOUND
5.6 SUFFICIENT ESTIMATORS
5.7 CONSISTENCY
5.8 BAYESIAN ESTIMATION
5.9 TAKING A SECOND LOOK AT STATISTICS (REVISITING THE MARGIN OF ERROR)
APPENDIX 5.A.1 MINITAB APPLICATIONS

Ronald Aylmer Fisher

A towering figure in the development of both applied and mathematical statistics, Fisher took formal training in mathematics and theoretical physics, graduating from Cambridge in 1912. After a brief career as a teacher, he accepted a post in 1919 as statistician at the Rothamsted Experimental Station. There the day-to-day problems he encountered in collecting and interpreting agricultural data led directly to much of his most important work in the theory of estimation and experimental design. Fisher was also a prominent geneticist and devoted considerable time to the development of a quantitative argument that would support Darwin's theory of natural selection.
He returned to academia in 1933, succeeding Karl Pearson as the Galton Professor of Eugenics at the University of London. Fisher was knighted in 1952. -Ronald Aylmer Fisher (1890-1962)

5.1 INTRODUCTION

The ability of probability functions to describe, or "model," experimental data was demonstrated in numerous examples in Chapter 4. In Section 4.2, the Poisson distribution was shown to predict very well the number of alpha particles emitted from a radioactive source as well as the number of fumbles by a college football team. In Section 4.3 another probability model, the normal curve, was applied to data as diverse as breath analyzer readings and IQ scores. Other models illustrated in Chapter 4 included the exponential, binomial, and negative binomial distributions.

All of these probability functions, of course, are actually families of models in the sense that each includes one or more parameters. The Poisson model, for instance, is indexed by the occurrence rate, $\lambda$. Changing $\lambda$ changes the probabilities associated with $p_X(k)$, where $p_X(k) = e^{-\lambda}\lambda^k/k!$, $k = 0, 1, 2, \ldots$ (see Figure 5.1.1, which pictures $p_X(k)$ for $\lambda = 1$ and $\lambda = 4$). The binomial model is defined in terms of the success probability $p$; the normal model, in terms of the two parameters $\mu$ and $\sigma$. Before any of these models can be applied, values need to be assigned to their parameters. Typically, this is done by taking a random sample (of $n$ observations) and using those measurements to estimate the unknown parameter(s).

[FIGURE 5.1.1: the Poisson pdf $p_X(k)$ plotted for $\lambda = 1$ and for $\lambda = 4$.]

Imagine being handed a coin whose probability, $p$, of coming up heads is unknown. Your assignment is to toss the coin three times and use the resulting sequence of $H$s and $T$s to suggest a value for $p$. Suppose the sequence of three tosses turns out to be $HHT$. Based on those outcomes, what can be reasonably inferred about $p$?

Start by defining the random variable $X$ to be the number of heads on a given toss. Then

$$X = \begin{cases} 1 & \text{if a toss comes up heads} \\ 0 & \text{if a toss comes up tails} \end{cases}$$

and the theoretical probability model for $X$ is the function

$$p_X(k) = p^k(1-p)^{1-k} = \begin{cases} p & \text{for } k = 1 \\ 1 - p & \text{for } k = 0 \end{cases}$$
Expressed in terms of $X$, the sequence $HHT$ corresponds to a sample of size $n = 3$, where $X_1 = 1$, $X_2 = 1$, and $X_3 = 0$. Since the $X_i$s are independent random variables, the probability associated with the sample is $p^2(1-p)$:

$$P(X_1 = 1 \cap X_2 = 1 \cap X_3 = 0) = P(X_1 = 1)\cdot P(X_2 = 1)\cdot P(X_3 = 0) = p^2(1-p)$$

Knowing that our objective is to identify a plausible value (an "estimate") for $p$, it could be argued that a reasonable choice for that parameter would be the value that maximizes the probability of the sample. Figure 5.1.2 shows $P(X_1 = 1, X_2 = 1, X_3 = 0)$ as a function of $p$. By inspection, we see that the value that maximizes the probability of $HHT$ is $p = \frac{2}{3}$.

[FIGURE 5.1.2: the function $p^2(1-p)$ plotted against $p$, $0 \le p \le 1$; its maximum occurs at $p = \frac{2}{3}$.]

More generally, suppose we toss the coin $n$ times and record a set of outcomes $X_1 = k_1$, $X_2 = k_2, \ldots,$ and $X_n = k_n$. Then

$$P(X_1 = k_1, \ldots, X_n = k_n) = p^{k_1}(1-p)^{1-k_1}\cdots p^{k_n}(1-p)^{1-k_n} = p^{\sum_{i=1}^n k_i}(1-p)^{n - \sum_{i=1}^n k_i}$$

The value of $p$ that maximizes $P(X_1 = k_1, \ldots, X_n = k_n)$ is, of course, the value for which the derivative of $p^{\sum k_i}(1-p)^{n - \sum k_i}$ with respect to $p$ is 0. But

$$\frac{d}{dp}\left[p^{\sum_{i=1}^n k_i}(1-p)^{n - \sum_{i=1}^n k_i}\right] = \left(\sum_{i=1}^n k_i\right)p^{\sum k_i - 1}(1-p)^{n - \sum k_i} - \left(n - \sum_{i=1}^n k_i\right)p^{\sum k_i}(1-p)^{n - \sum k_i - 1} \qquad (5.1.1)$$
In this chapter, we look at some of the practical, as well .as the mathematical, issues involved the problem of How is the functional form of an estimator determined? What properties does a estimator have? What properties would we like an estimator to have? As we answer these questions, our focus will begin to shift away from the study of probability and toward the study of statistics. 5.2 ESllMATING PARAMETERS: THE METHOD OF MAXIMUM UKELIHOOD AND THE METHOD Of MOMENTS fl, Y2, ., Y" is a random sample from a continuous pdf fy(y), whose unknown parameter is o. (Note: To emphasize that our focus is on the parameter, we will ---'--J continuous pdf's in chapter as fy (y; 8); similarly, probability models with an Section 5.2 Estimating Parameters 341 unknown parameter (} will be denoted px(k; e)]. The question how should we use the to approximate ef Example 5.1.1. we saw that the p in the discrete probability ix(k; p) = P"(1 • k = 0, 1 could reasonably be estimated by the function (~) " ii. based on random sample il. X2 = k2 •. '" XII = kn. How would the of the estimate if the data came say, an exponential distribution? Or a distribution? this section we introduce two for finding method of maximum likelihood and the method of moments. Others are available, but these are the same answer. two that are the most widely used. Often, but not always, they give ... £\,,"~r,,", The Method of Maximum ukeUhood The basic idea behind m.aximum is the rationale that was appealed to choose as the for () that value to in Example 5.1.1. That it seems the parameter that maximizes the "likelihood'" of tbe latter is measurea by a likelihood /unction, which is simply product of the pdf eVfllUilltea each of the data In Example 5.1.1, the likelihood function for the sample for Xl = L X2 = 1, and X3 = 0) is the product p2(1 - p). Definition S.2.L pdf px(k; 8), product of the kl. k2 • ... 
, k" be a random sample of n from the discrete The likelihood /unction, L(8), is 9 is an unknown evaluated at the n n L(O) =Il px(kj; 9) 1=1 If Yl, Y2 •.•.• Yn is a random sample of n from a continuous pdf, fy(y; 9), where () is an unknown parameter, the likelihood fUnction is written II L(O) fr(Yi; B) Comment. Joint pdf's and likelihood functions look the same, but the two are for a set of n random variables is a interpreted A joint pdf function of of those n variables, either k}, k2, ... , kn or Yt. Y2, ... , Yn. By contrast, L is a function of 0; it sbould not be considered a function of either the kiS or Yi&' Definition 5.2.2. Let L(O) = n PX(ki; 0) and L(8) = n !Y(Yi; 8) be the n II 1=1 ;=1 .uA~,.uLUJV'U functions corresponding to random samples ih k2. ...• and Yl, Y2 •... , YII from the pdf px(k; 9) and continuous pdf fy(y; 8), respectively, 9 is an unknown parameter. In each case, let Be be a value of the parameter such that L(O,,) 2: L(8) for all possible values of B. Then ge is called a maximum likelihood estimalf 9. 348 Chapter 5 Estimation Applying the Method of Maximum Likelihood We will see in Example 5.2.1 and many subsequent examples that finding the Oe that maximizes a likelihood function is often an application of the calculus. Specifically, we solve the equation :8 L(e) = 0 for O. In some cases, a more tractable equation results by setting the derivative of In L(O) equal to O. Sinoe In L(O) that maximizes In L(B) also maximizes L(B). with L(B), the same 8e EXAMPLE 5.2.1 Suppose that Xl = 3, X2 = 2, Xs = 1, and X4 ::::: 3is a set of four independent observations representing the probability model, px(k) = (1 - p)k-l p, k = 1,2,3, ... FInd the likelihood for p. According to Definition 5.2.1, L(p)={(1- p)3-1 p ]((1 - p)2-t p ][(1_ p)1-lp][(1_ p)3-l p J ::::: (1 _ p)5 p 4 Then In L(p) = 5 In (1 - p) + 41n p. 
Differentiating in L(p) with respect to p gives din 5 --dp To find the ~p = 0 - 4 p +-p p that maximizes L(p), we set the derivative equal to zero. Here, _ _5_ 1 - p implies that + 4(1 - p) = 0, and the solution to the latter is Notice, also, that the second derivative of In L(p) ( = -5 - :2) p ::::: + a. is negative for all 0 < p < I, so p = ~ indeed, a true maximum of the likelihood function. (Following tbe notation introduced in Definition ~ is called the mnximum likelihood estimate fOT p, and we would write Pe = Comment. There is a better way to answer the question posed in Example 5.2.l. Rather than evaluate---and difierentiate-the likelihood function for tbe particular sample observed (in this case, the four observations 3,2,1, and 3), we can get a more infonnative answer by considering the more general problem of taking a random sample of size n from px(k) (1 - p)k-lp and using the outcomes-Xl =- kb = k2, ,., Xn = k,,-to find a formula for the maximwn likelihood estimate. For the geometric pdf, the likelihood function based on sucb a sample would be written = n II L(p) = (1 - pi~l-l p ;=1 = (1 - " Ek;-" p)i-l p" Section 5.2 InL(p) = 349 to work with In L(p) than L(p). Here, it will be was the case in Example Estimati ng Parameters (=1 n) . k/ - In(! - p) + nlnp and ~ ... u,'''' the derivative equal to 0 p(n - tki) + (1 - p)n =0 /",,1 which implies that 4 (Reassuringly, for the particular sample assumed in Example ..,."",.... - •• = 4 and E ki = j ...1 3+ 2+1+3= formulajuslderived re<llUC<~to the maximum likelihood estimate of ~ that we found at the outset) Comment. Implicit in Example and the that followed is the important distinction between a maximum likelihood estimate and a maximum likelilwod estimtUor. The first is a number (or refers to a number); second is a random variable (recall the Comment on 346). Both ~ and the formula --:-- are maximum li.kelihood estinwtes (for p) and would be Eki 1... 1 denoted PI!, because both are numerical constants. 
In the first case, the actual values of the $k_i$s are provided and $p_e \left(= \frac{4}{9}\right)$ can be calculated. In the second case, the $k_i$s are not identified, but they are constants nonetheless. If, on the other hand, we imagine the measurements before they are recorded, that is, as the random variables $X_1, X_2, \ldots, X_n$, the formula $n\big/\sum_{i=1}^n k_i$ is more properly written as the quotient $n\big/\sum_{i=1}^n X_i$. The latter, being a random variable, is the maximum likelihood estimator (for $p$) and would be denoted $\hat{p}$. Maximum likelihood estimators, such as $\hat{p}$, have pdfs, expected values, and variances; maximum likelihood estimates, such as $p_e$, have none of those properties.

EXAMPLE 5.2.2

An engineer has reason to believe that the pdf describing the variability in a certain type of measurement is the continuous model

$$f_Y(y; \theta) = \frac{1}{\theta^2}\, y e^{-y/\theta}, \qquad 0 < y < \infty;\ 0 < \theta < \infty$$

Five data points have been collected: 9.2, 5.6, 18.4, 12.1, and 10.7. Find the maximum likelihood estimate for $\theta$.

Following the advice in the Comment above, we begin by deriving a general formula for $\theta_e$, that is, by assuming that the data are the $n$ observations $y_1, y_2, \ldots, y_n$. The likelihood function, then, becomes

$$L(\theta) = \prod_{i=1}^n \frac{1}{\theta^2}\, y_i e^{-y_i/\theta} = \theta^{-2n}\left(\prod_{i=1}^n y_i\right)e^{-(1/\theta)\sum_{i=1}^n y_i}$$

and

$$\ln L(\theta) = -2n\ln\theta + \ln\prod_{i=1}^n y_i - \frac{1}{\theta}\sum_{i=1}^n y_i$$

Setting the derivative of $\ln L(\theta)$ equal to 0 gives

$$\frac{d\ln L(\theta)}{d\theta} = -\frac{2n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^n y_i = 0$$

which implies that

$$\theta_e = \frac{1}{2n}\sum_{i=1}^n y_i$$

The final step is to evaluate the formula for $\theta_e$ numerically. Substituting the actual sample values recorded, $n = 5$ and $\sum_{i=1}^5 y_i = 9.2 + 5.6 + 18.4 + 12.1 + 10.7 = 56.0$, so

$$\theta_e = \frac{1}{2(5)}(56.0) = 5.6$$

Using Order Statistics as Maximum Likelihood Estimates

Situations exist for which the equations $\frac{dL(\theta)}{d\theta} = 0$ or $\frac{d\ln L(\theta)}{d\theta} = 0$ are not meaningful and neither yields a solution for $\theta_e$. These occur when the range of the pdf from which the data are drawn is a function of the parameter being estimated. (This happens, for instance, when the sample of $y_i$s comes from the uniform pdf, $f_Y(y; \theta) = 1/\theta$, $0 \le y \le \theta$.) The maximum likelihood estimates in these cases will be an order statistic, typically either $y_{\min}$ or $y_{\max}$.
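For the uniform case just mentioned, the likelihood is $L(\theta) = \theta^{-n}$ for $\theta \ge y_{\max}$ and 0 otherwise, a decreasing function of $\theta$ on its admissible range, so $\theta_e = y_{\max}$. A small numerical sketch in Python (with an arbitrary made-up sample, not data from the text) illustrates the point:

```python
# Uniform on [0, theta]: each observation requires theta >= y, so
# L(theta) = theta**(-n) when theta >= max(y) and 0 otherwise.
ys = [1.7, 0.4, 2.9, 1.1, 2.2]      # hypothetical sample
n = len(ys)

def likelihood(theta):
    return theta ** (-n) if theta >= max(ys) else 0.0

# Scan a grid of candidate values; the maximizer is the largest observation,
# since L is zero below y_max and strictly decreasing above it.
grid = [i / 100 for i in range(50, 550)]     # 0.50, 0.51, ..., 5.49
best = max(grid, key=likelihood)
assert best == max(ys)                       # theta_e = y_max = 2.9
```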
The maximum likelihood estimate in such cases will be an order statistic, typically either y_min or y_max.

EXAMPLE 5.2.3

Suppose y₁, y₂, ..., yₙ is a set of measurements representing an exponential pdf with λ = 1 but with an unknown "threshold" parameter, θ. That is,

    f_Y(y; θ) = e^{−(y−θ)},  y ≥ θ;  θ > 0

(see Figure 5.2.1). Find the maximum likelihood estimate for θ.

[Figure 5.2.1: The shifted exponential pdf f_Y(y; θ) = e^{−(y−θ)}, which begins at y = θ.]

Proceeding in the usual fashion, we start by deriving an expression for the likelihood function:

    L(θ) = ∏ e^{−(yᵢ−θ)} = e^{−∑yᵢ + nθ}

Here, finding θₑ by solving the equation d ln L(θ)/dθ = 0 will not work because

    d ln L(θ)/dθ = d/dθ (−∑yᵢ + nθ) = n ≠ 0

Instead, we need to look at the likelihood function directly. Notice that L(θ) = e^{−∑yᵢ + nθ} is maximized when the exponent of e is maximized. For given y₁, y₂, ..., yₙ (and n), making −∑yᵢ + nθ as large as possible requires that θ be as large as possible. Figure 5.2.1 shows how large θ can get: It can be pushed only as far as the smallest order statistic. Any value of θ greater than y_min would violate the condition on f_Y(y; θ) that y ≥ θ. Therefore, θₑ = y_min.

CASE STUDY 5.2.1

"What are you majoring in?" may be the most common question asked of a college student. For many, the answer is simple: Having decided on a field of study, they pursue it doggedly all the way to graduation. For others, though, the path is not so straight. Premeds losing the battle with organic chemistry and engineers unable to appreciate the joy of secants may find their roads to commencement taking a few detours. Listed in the first two columns of Table 5.2.1 are the results of a "major" poll conducted at the University of West Florida (114). Recorded for each of 356 upperclassmen was the number of times, X, that he or she had switched majors. Based on the nature of these data, it would not be unreasonable to hypothesize, in the spirit of the law of small numbers, that X has a Poisson distribution (recall Section 4.2). Do the actual frequencies support that contention?
TABLE 5.2.1

    Number of Major Changes    Observed Frequency    Expected Frequency
             0                        237                  230.4
             1                         90                  100.2
             2                         22                   21.8
             3+                         7                    3.6
                                      356                  356.0

To see if p_X(k) = e^{−λ} λ^k / k! can provide an adequate fit to these data requires that we first find the maximum likelihood estimate for λ. Given that the n observations k₁, k₂, ..., kₙ have been made,

    L(λ) = ∏ e^{−λ} λ^{kᵢ} / kᵢ! = e^{−nλ} λ^{∑kᵢ} / ∏kᵢ!

so

    ln L(λ) = −nλ + (∑kᵢ) ln λ − ln ∏kᵢ!

and

    d ln L(λ)/dλ = −n + (1/λ) ∑kᵢ

Setting the derivative equal to zero shows that the maximum likelihood estimate for λ is the sample mean:

    λₑ = (1/n) ∑kᵢ = k̄    (5.2.1)

According to the information in Table 5.2.1, 237 of the kᵢ's were equal to zero, ninety were equal to one, and so on. Substituting into Equation 5.2.1, then,

    λₑ = (1/356)[237·0 + 90·1 + 22·2 + 7·3] = 0.435

so the specific model being proposed is

    p_X(k) = e^{−0.435} (0.435)^k / k!,  k = 0, 1, 2, ...

The corresponding expected frequencies [= 356 · p_X(k)] for each value of X are listed in column 3 of Table 5.2.1. The agreement with the observed frequencies appears to be quite good. Our conclusion: Nothing in these data rules out using the Poisson as a model. [Formal procedures, known as goodness-of-fit tests, have been developed for quantifying the agreement (or lack of agreement) between a set of observed and expected frequencies. These will be taken up in Chapter 10.]

Finding Maximum Likelihood Estimates When More Than One Parameter Is Unknown

If a family of probability models is indexed by two or more unknown parameters, say, θ₁, θ₂, ..., θ_k, finding maximum likelihood estimates for the θᵢ's requires the solution of a set of k simultaneous equations. If k = 2, for example, we would typically need to solve the system

    ∂ ln L(θ₁, θ₂)/∂θ₁ = 0
    ∂ ln L(θ₁, θ₂)/∂θ₂ = 0

EXAMPLE 5.2.4

Suppose a random sample of size n is drawn from the two-parameter normal pdf

    f_Y(y; μ, σ²) = (1/√(2πσ²)) e^{−(1/2)((y−μ)/σ)²},  −∞ < y < ∞;  −∞ < μ < ∞;  σ² > 0

Use the method of maximum likelihood to find μₑ and σₑ².

We start by finding L(μ, σ²) and ln L(μ, σ²):

    L(μ, σ²) = ∏ (2πσ²)^{−1/2} e^{−(1/2σ²)(yᵢ−μ)²} = (2πσ²)^{−n/2} e^{−(1/2σ²)∑(yᵢ−μ)²}

and

    ln L(μ, σ²) = −(n/2) ln(2πσ²) − (1/2σ²) ∑(yᵢ − μ)²

Moreover,

    ∂ ln L(μ, σ²)/∂μ = (1/σ²) ∑(yᵢ − μ)

and
    ∂ ln L(μ, σ²)/∂σ² = −n/(2σ²) + (1/(2σ⁴)) ∑(yᵢ − μ)²

Setting the two derivatives equal to zero gives the equations

    (1/σ²) ∑(yᵢ − μ) = 0    (5.2.2)

and

    −n/(2σ²) + (1/(2σ⁴)) ∑(yᵢ − μ)² = 0    (5.2.3)

Equation 5.2.2 simplifies to ∑yᵢ − nμ = 0, which implies that μₑ = (1/n)∑yᵢ = ȳ. Substituting μₑ for μ in Equation 5.2.3 and solving for σ² then gives

    σₑ² = (1/n) ∑(yᵢ − ȳ)²

Comment. The method of maximum likelihood has a long history: Daniel Bernoulli was using it as early as 1777 (136). It was Ronald Fisher, though, in the early years of the twentieth century, who first studied the mathematical properties of likelihood estimation in any detail, and the procedure is often credited to him.

QUESTIONS

5.2.1. A random sample of size 8 (x₁ = 1, x₂ = 0, x₃ = 1, x₄ = 1, x₅ = 0, x₆ = 1, x₇ = 1, and x₈ = 0) is taken from the probability function

    p_X(k; θ) = θ^k (1 − θ)^{1−k},  k = 0, 1;  0 < θ < 1

Find the maximum likelihood estimate for θ.

5.2.2. The number of red chips and white chips in an urn is unknown, but the proportion, p, of reds is known to be one of two specified values. A sample of size 5, drawn with replacement, yields the sequence red, white, white, red, and white. What is the maximum likelihood estimate for p?

5.2.3. Use the sample y₁ = 8.2, y₂ = 9.1, y₃ = 10.6, and y₄ = 4.9 to calculate the maximum likelihood estimate for λ in the exponential pdf

    f_Y(y; λ) = λe^{−λy},  y ≥ 0

5.2.4. Suppose a random sample of size n is drawn from the probability model

    p_X(k; θ) = θ^{2k} e^{−θ²} / k!,  k = 0, 1, 2, ...

Find a formula for the maximum likelihood estimator, θ̂.

5.2.5. Given that y₁ = 2.3, y₂ = 1.9, and y₃ = 4.6 is a random sample from

    f_Y(y; θ) = (1/(6θ⁴)) y³ e^{−y/θ},  y ≥ 0

calculate the maximum likelihood estimate for θ.

5.2.6. Use the method of maximum likelihood to estimate θ in the pdf

    f_Y(y; θ) = (θ/(2√y)) e^{−θ√y},  y > 0

Evaluate θₑ for the following random sample of size 4: y₁ = 6.2, y₂ = 7.0, y₃ = 2.5, and y₄ = 4.2.

5.2.7. An engineer is creating a project scheduling program and recognizes that the tasks making up a project are not always completed on time.
However, the completion proportion tends to be fairly high. To reflect this condition, he uses the pdf

    f_Y(y; θ) = θy^{θ−1},  0 ≤ y ≤ 1

where y is the proportion of a task completed. Suppose that on his previous project the proportions of tasks completed were 0.77, 0.82, 0.92, 0.94, and 0.98. Estimate θ.

5.2.8. The following data show the number of occupants in passenger cars observed during one hour at a busy intersection in Los Angeles (68). Suppose it can be assumed that these data follow a geometric distribution, p_X(k; p) = (1 − p)^{k−1} p, k = 1, 2, .... Estimate p and compare the observed and expected frequencies for each value of X.

    Number of Occupants    Frequency
           1                  678
           2                  227
           3                   56
           4                   28
           5                    8
           6+                  14
                             1011

5.2.9. (a) Based on the random sample y₁ = 6.3, y₂ = 1.8, y₃ = 14.2, and y₄ = 7.6, use the method of maximum likelihood to estimate the parameter θ in the uniform pdf

    f_Y(y; θ) = 1/θ,  0 ≤ y ≤ θ

(b) Suppose the random sample in Part (a) represents the two-parameter uniform pdf

    f_Y(y; θ₁, θ₂) = 1/(θ₂ − θ₁),  θ₁ ≤ y ≤ θ₂

Find the maximum likelihood estimates for θ₁ and θ₂.

5.2.10. Find the maximum likelihood estimate for θ in the pdf

    f_Y(y; θ) = 2y/(1 − θ²),  θ ≤ y ≤ 1

if a random sample of size 6 yielded the measurements 0.70, 0.63, 0.92, 0.86, 0.43, and 0.21.

5.2.11. A random sample of size n is taken from the pdf

    f_Y(y; θ) = 2yθ²,  0 < y < 1/θ

Find an expression for θ̂, the maximum likelihood estimator for θ.

5.2.12. If the random variable Y denotes an individual's income, Pareto's law claims that P(Y ≥ y) = (k/y)^θ, where k is the entire population's minimum income. It follows that F_Y(y) = 1 − (k/y)^θ, and, by differentiation,

    f_Y(y; θ) = θk^θ (1/y)^{θ+1},  y ≥ k;  θ ≥ 1

Assume k is known. Find the maximum likelihood estimator for θ if income information has been collected on a random sample of 25 individuals.
5.2.13. The exponential pdf is a measure of lifetimes of devices that do not age. However, the exponential pdf is a special case (β = 1) of the Weibull distribution, which measures time to failure of devices whose probability of failure increases as time does. A Weibull random variable Y has pdf

    f_Y(y; α, β) = αβy^{β−1} e^{−αy^β},  y ≥ 0;  α > 0, β > 0

(a) Find the maximum likelihood estimator for α assuming that β is known.
(b) Suppose α and β are both unknown. Write down the equations that would be solved simultaneously to find the maximum likelihood estimators of α and β.

5.2.14. Suppose a random sample of size n is drawn from a normal pdf where the mean μ is known but the variance σ² is unknown. Use the method of maximum likelihood to find a formula for σₑ². Compare your answer to the maximum likelihood estimate found in Example 5.2.4.

The Method of Moments

A second procedure for estimating parameters is the method of moments. Introduced near the turn of the twentieth century by the great British statistician Karl Pearson, the method of moments is often more tractable than the method of maximum likelihood in situations where the probability model has several unknown parameters.

Suppose Y is a continuous random variable whose pdf is a function of s unknown parameters, θ₁, θ₂, ..., θ_s. The first s moments of Y, if they exist, are given by the integrals

    E(Y^j) = ∫ y^j f_Y(y; θ₁, θ₂, ..., θ_s) dy,  j = 1, 2, ..., s

In each case, E(Y^j) will be a different function of the s parameters. That is,

    E(Y¹) = g₁(θ₁, θ₂, ..., θ_s)
    E(Y²) = g₂(θ₁, θ₂, ..., θ_s)
      ⋮
    E(Y^s) = g_s(θ₁, θ₂, ..., θ_s)

Corresponding to each theoretical moment, E(Y^j), is a sample moment, (1/n)∑yᵢ^j. Intuitively, the jth sample moment is an approximation to the jth theoretical moment. Setting the two equal for each j produces a system of s simultaneous equations, the solutions to which are the desired set of estimates, θ₁ₑ, θ₂ₑ, ..., and θ_sₑ.

Definition 5.2.3. Let y₁, y₂, ..., yₙ be a random sample from the continuous pdf f_Y(y; θ₁, θ₂, ..., θ_s). The method of moments estimates, θ₁ₑ, θ₂ₑ, ..., and θ_sₑ, for the
model's unknown parameters are the solutions of the s simultaneous equations

    ∫ y  f_Y(y; θ₁, ..., θ_s) dy = (1/n)∑yᵢ
    ∫ y² f_Y(y; θ₁, ..., θ_s) dy = (1/n)∑yᵢ²
      ⋮
    ∫ y^s f_Y(y; θ₁, ..., θ_s) dy = (1/n)∑yᵢ^s

Note: If the underlying random variable is discrete with probability function p_X(k; θ₁, ..., θ_s), the method of moments estimates are the solutions of the analogous system of equations, with each integral replaced by the corresponding sum over k.

EXAMPLE 5.2.5

Suppose that y₁ = 0.42, y₂ = 0.10, y₃ = 0.65, and y₄ = 0.23 is a random sample of size 4 from the pdf

    f_Y(y; θ) = θy^{θ−1},  0 ≤ y ≤ 1

Find the method of moments estimate for θ.

Following the same approach that was taken with maximum likelihood, we will first derive a general formula for the method of moments estimate, so that only one equation needs to be solved for any set of data from a model indexed by θ. Since the pdf has a single parameter, only the first moment is needed:

    E(Y) = ∫₀¹ y · θy^{θ−1} dy = θ/(θ + 1)

Setting E(Y) equal to the first sample moment, (1/n)∑yᵢ (= ȳ), the equation to be solved becomes

    θ/(θ + 1) = ȳ

CASE STUDY 5.2.2

Although hurricanes generally strike only the eastern and southern coastal regions of the United States, they do occasionally sweep inland before completely dissipating. The U.S. Weather Bureau confirms that in the period from 1900 to 1969 a total of thirty-six hurricanes moved as far as the Appalachians. Listed in Table 5.2.2 are the maximum twenty-four-hour precipitation levels recorded for those thirty-six storms during the time they were over the mountains (67). Figure 5.2.2 shows the data's density-scaled histogram. Its skewed shape suggests that Y, the maximum twenty-four-hour precipitation associated with inland hurricanes, can be modeled by the two-parameter gamma pdf,

    f_Y(y; r, λ) = (λ^r / Γ(r)) y^{r−1} e^{−λy},  y > 0

Use the method of moments to estimate r and λ; then superimpose f_Y(y; rₑ, λₑ) on a graph of the density-scaled histogram of the 36 yᵢ's.
Returning to Example 5.2.5, the equation θ/(θ + 1) = ȳ implies that the method of moments estimate for θ is

    θₑ = ȳ/(1 − ȳ)

Here, ȳ = (1/4)(0.42 + 0.10 + 0.65 + 0.23) = 0.35, so

    θₑ = 0.35/(1 − 0.35) = 0.54

For Case Study 5.2.2, we know from Theorem 4.6.3 that

    E(Y) = r/λ  and  E(Y²) = r(r + 1)/λ²

TABLE 5.2.2: Maximum Twenty-Four-Hour Precipitation Recorded for Thirty-Six Inland Hurricanes (1900–1969). Each entry of the original table lists the year, the storm's name (Camille, Candy, Betsy, Brenda, Hazel, Able, and others), the recording location (Slide Mt., N.Y.; Big Meadows, Va.; Eagles Mere, Pa.; Crossnore, N.C.; and so on), and the precipitation in inches. The recoverable precipitation values are:

    31.00   2.82   3.98   4.02   9.50   4.50  11.40  10.71   6.31
     4.95   5.64   5.51   9.72   4.21  11.60   4.75   6.85   6.25
     3.42  11.80   0.80   3.69   3.10  22.22   7.43   5.00   4.58
     4.46   8.00   3.73   3.50   6.20   0.67

[Figure 5.2.2: Density-scaled histogram of the maximum 24-hour rainfall data; density runs from 0 to 0.12, rainfall in inches along the horizontal axis.]

According to the figures in Table 5.2.2,

    (1/36)∑yᵢ = ȳ = 7.29  and  (1/36)∑yᵢ² = 85.59

To find rₑ and λₑ, then, we need to solve the two equations

    r/λ = 7.29  and  r(r + 1)/λ² = 85.59

Substituting r = 7.29λ into the second equation gives

    7.29λ(7.29λ + 1)/λ² = 85.59

which reduces to 7.29/λ = 85.59 − (7.29)², so λₑ = 0.22. Then, from the first equation, rₑ = 1.60 [= 7.29(0.22)].

[Figure 5.2.3: The estimated gamma pdf, f_Y(y; 1.60, 0.22), superimposed on the density-scaled histogram of the rainfall data.]

The estimated model, f_Y(y; 1.60, 0.22), is superimposed on the data's density-scaled histogram in Figure 5.2.3. Considering the relatively small number of observations in the sample, the agreement is quite good. (The adequacy of the approximation here would come as no surprise to a meteorologist: The gamma distribution is frequently used to describe the variation in precipitation levels.)

QUESTIONS

5.2.15. Let y₁, y₂, ...,
yₙ be a random sample of size n from the uniform pdf, f_Y(y; θ) = 1/θ, 0 ≤ y ≤ θ. Find a formula for the method of moments estimate for θ. Compare the values of the method of moments estimate and the maximum likelihood estimate if a random sample of size 5 consists of the numbers 17, 92, 46, 39, and 56 (recall Question 5.2.9).

5.2.16. Use the method of moments to estimate θ in the pdf

    f_Y(y; θ) = (θ² + θ) y^{θ−1} (1 − y),  0 < y < 1

Assume that a random sample of size n has been collected.

5.2.17. A criminologist is searching through FBI files to document the prevalence of a rare double-whorl fingerprint. Among six consecutive sets of 100,000 prints scanned by a computer, the numbers of persons having the abnormality are 3, 0, 3, 4, 2, and 1, respectively. Assume that double whorls are Poisson events. Use the method of moments to estimate their occurrence rate, λ. How would your answer change if λ were estimated using the method of maximum likelihood?

5.2.18. Find the method of moments estimate for λ if a random sample of size n is taken from the exponential pdf, f_Y(y; λ) = λe^{−λy}, y ≥ 0.

5.2.19. Suppose that y₁ = 8.3, y₂ = 4.9, y₃ = …, and y₄ = 6.5 is a random sample of size 4 from the two-parameter uniform pdf,

    f_Y(y; θ₁, θ₂) = 1/(θ₂ − θ₁),  θ₁ ≤ y ≤ θ₂

Use the method of moments to calculate θ₁ₑ and θ₂ₑ.

5.2.20. Find a formula for the method of moments estimate for the parameter θ in the Pareto pdf,

    f_Y(y; θ) = θk^θ (1/y)^{θ+1},  y ≥ k;  θ ≥ 1

Assume that k is known and the data consist of a random sample of size n. Compare your answer to the maximum likelihood estimator found in Question 5.2.12.

5.2.21. Calculate the method of moments estimate for the parameter θ in the probability function

    p_X(k; θ) = θ^k (1 − θ)^{1−k},  k = 0, 1

if a sample of size 5 is the set of numbers 0, 0, 1, 0, 1.

5.2.22. Find the method of moments estimates for μ and σ², based on a random sample of size n drawn from a normal pdf, where μ = E(Y) and σ² = Var(Y). Compare your answers with the maximum likelihood estimates derived in Example 5.2.4.

5.2.23.
Use the method of moments to derive formulas for estimating the parameters r and p in the negative binomial pdf,

    p_X(k; r, p) = C(k − 1, r − 1) p^r (1 − p)^{k−r},  k = r, r + 1, ...

5.2.24. Bird songs can be characterized by the number of clusters of "syllables" that are strung together in rapid succession. If the last cluster is defined as a success, it may be reasonable to treat the number of clusters in a song as a geometric random variable. Does the model p_X(k) = (1 − p)^{k−1} p, k = 1, 2, ... adequately describe the following distribution of 250 song lengths (102)? Begin by finding the method of moments estimate for p. Then calculate the set of "expected" frequencies.

    No. of Clusters    Frequency
          1               132
          2                52
          3                34
          4                 9
          5                 7
          6                 5
          7                 5
          8                 6
                          250

5.3 INTERVAL ESTIMATION

Point estimates, no matter how they are determined, share the same fundamental weakness: They provide no indication of their inherent precision. We know, for instance, that λ̂ = X̄ is both the maximum likelihood and the method of moments estimator for the Poisson parameter, λ. But suppose a sample of size six is taken from the probability model p_X(k) = e^{−λ}λ^k/k! and we find that λₑ = 6.8. Does it follow that the true λ is likely to be close to λₑ, say, in the interval from 6.7 to 6.9, or is the estimation process so imprecise that λ might actually be as small as 1.0, or as large as 12.0? Unfortunately, point estimates, by themselves, do not allow us to make those kinds of extrapolations. Any such statements require that the variation of the estimator be taken into account.

The usual way to quantify the amount of uncertainty in an estimator is to construct a confidence interval. In principle, confidence intervals are ranges of numbers that have a high probability of "containing" the unknown parameter as an interior point. By looking at the width of a confidence interval, we can get a good sense of the estimator's precision.

EXAMPLE 5.3.1

Suppose the four observations 6.5, 9.2, 9.9, and 12.4
constitute a random sample of size four from the pdf

    f_Y(y; μ) = (1/(0.8√(2π))) e^{−(1/2)((y−μ)/0.8)²},  −∞ < y < ∞

That is, the four observations come from a normal distribution where σ is equal to 0.8, but the mean, μ, is unknown. What values of μ are believable in light of the four data points?

To answer that question, it is critical that we keep the distinction between estimates and estimators clearly in mind. First of all, we know from Example 5.2.4 that the maximum likelihood estimate for μ is μₑ = ȳ = (1/4)∑yᵢ = (1/4)(38.0) = 9.5. We also know something very specific about the probabilistic behavior of the maximum likelihood estimator, Ȳ: According to the Corollary to Theorem 4.3.4,

    (Ȳ − μ)/(0.8/√4)

has a standard normal pdf, f_Z(z). The probability, then, that (Ȳ − μ)/(0.8/√4) will fall between two specified values can be deduced from Table A.1 in the Appendix. For example,

    P(−1.96 ≤ Z ≤ 1.96) = 0.95 = P(−1.96 ≤ (Ȳ − μ)/(0.8/√4) ≤ 1.96)    (5.3.1)

(see Figure 5.3.1).

[Figure 5.3.1: The standard normal pdf, with area 0.95 between −1.96 and 1.96 on the (Ȳ − μ)/(0.8/√4) axis.]

"Inverting" probability statements of the sort illustrated in Equation 5.3.1 is the mechanism by which we can identify a set of parameter values compatible with the sample data. If

    P(−1.96 ≤ (Ȳ − μ)/(0.8/√4) ≤ 1.96) = 0.95

then

    P(Ȳ − 1.96·(0.8/√4) ≤ μ ≤ Ȳ + 1.96·(0.8/√4)) = 0.95

which implies that the random interval

    (Ȳ − 1.96·(0.8/√4), Ȳ + 1.96·(0.8/√4))

has a 95% chance of containing μ as an interior point. After substituting ȳ = 9.50 for Ȳ, the random interval in this case reduces to

    (9.50 − 1.96·(0.8/√4), 9.50 + 1.96·(0.8/√4)) = (8.72, 10.28)

We call (8.72, 10.28) a 95% confidence interval for μ. In the long run, 95% of the intervals constructed in this fashion will contain the unknown μ; the remaining 5% will lie either entirely to the left of μ or entirely to the right. For a given set of data, of course, we have no way of knowing whether the calculated interval (ȳ − 1.96·(0.8/√4), ȳ + 1.96·(0.8/√4)) is one of the 95% that contains μ or one of the 5% that does not.

Figure 5.3.2 illustrates graphically the statistical implications associated with the random interval (Ȳ − 1.96·(0.8/√4), Ȳ + 1.96·(0.8/√4)).
For every different sample, the interval will have a different location. While there is no way to know whether or not a given interval, in particular, the one the experimenter has just calculated, will include the unknown μ, we do have the reassurance that in the long run 95% of all such intervals will.

[Figure 5.3.2: Possible 95% confidence intervals for μ, one for each of eight hypothetical data sets.]

Comment. The behavior of confidence intervals can be modeled nicely by using a computer's random number generator. The output in Table 5.3.1 is a case in point.

TABLE 5.3.1

    MTB > random 50 c1-c4;
    SUBC> normal 10 0.8.
    MTB > rmean c1-c4 c5
    MTB > let c6 = c5 - 1.96*(0.8)/sqrt(4)
    MTB > let c7 = c5 + 1.96*(0.8)/sqrt(4)
    MTB > name c6 'Low.Lim.' c7 'Upp.Lim.'
    MTB > print c6 c7

    Data Display

    Row    Low.Lim.    Upp.Lim.    Contains μ (= 10)?
     1      8.7596     10.3276        Yes
     2      8.8763     10.4443        Yes
     3      8.8337     10.4017        Yes
     4      9.5800     11.1480        Yes
     5      8.5106     10.0786        Yes
     6      9.6946     11.2626        Yes
     7      8.7079     10.2709        Yes
     8     10.1014     11.6694        No
     ⋮         ⋮           ⋮           ⋮
    33     10.4602     12.0282        No
     ⋮         ⋮           ⋮           ⋮
    42      8.3901      9.9581        No
     ⋮         ⋮           ⋮           ⋮
    50      9.5697     11.1377        Yes
Forty-seven of the fifty confidence intervals contain the true μ (= 10).

Fifty simulations of the confidence interval described in Example 5.3.1 are displayed in Table 5.3.1. That is, fifty samples, each of size n = 4, were drawn from the normal pdf

    f_Y(y; 10) = (1/(0.8√(2π))) e^{−(1/2)((y−10)/0.8)²},  −∞ < y < ∞

using MINITAB's RANDOM command. (To fully appreciate the simulation, we need to know the value that each confidence interval was seeking to contain; here the true μ was assumed to equal ten.) For each sample of size n = 4, the lower and upper limits of the corresponding 95% confidence interval were calculated using the formulas

    Low.Lim. = ȳ − 1.96·(0.8/√4)
    Upp.Lim. = ȳ + 1.96·(0.8/√4)

As the last column in the Data Display indicates, only three of the fifty confidence intervals fail to contain μ = 10: Samples eight and thirty-three yield intervals that lie entirely to the right of the parameter, while sample forty-two produces a range of values that lies entirely to the left. The remaining forty-seven intervals, though, or 94% [= (47/50) × 100], do contain the true value of μ as an interior point.

CASE STUDY 5.3.1

In the eighth century B.C., the Etruscan civilization was the most advanced in all of Italy. Its art forms and political innovations were destined to leave indelible marks on the entire Western world. Originally located along the western coast between the Arno and Tiber rivers (the region now known as Tuscany), it spread quickly across the Apennines and eventually overran much of Italy.
But as quickly as it came, it faded. Militarily it was to prove no match for the burgeoning Roman legions, and by the dawn of Christianity it was all but gone. No chronicles of the empire have ever been found, and to this day its origins remain shrouded in mystery. Were the Etruscans native Italians, or were they immigrants? And if they were immigrants, where did they come from?

Much of what is known has come from anthropometric studies, that is, investigations that use body measurements to determine racial characteristics and ethnic origins. A case in point is the set of data given in Table 5.3.2, showing the sizes of eighty-four Etruscan skulls unearthed in various archeological digs throughout Italy (7). The sample mean, ȳ, of those measurements is 143.8 mm. Researchers believe that skull widths of present-day Italian males are normally distributed with a mean (μ) of 132.4 mm and a standard deviation (σ) of 6.0 mm. What does the difference between ȳ = 143.8 and μ = 132.4 imply about the likelihood that Etruscans and Italians share the same ethnic origin?

TABLE 5.3.2: Maximum Head Breadths (mm) of 84 Etruscan Males

    141  146  144  141  141  136  137  149  141  142
    142  147  148  155  150  144  140  140  139  148
    143  143  149  140  132  158  149  144  145  146
    143  135  147  142  142  138  150  145  126  135
    142  140  148  146  149  137  140  154  140  149
    140  147  137  131  152  150  146  134  142  147
    158  144  146  148  143  143  132  149  144  152
    150  148  143  142  141  154  141  144  142  138
    146  145

One way to answer that question is to construct a 95% confidence interval for the true mean of the population represented by the eighty-four yᵢ's in Table 5.3.2. If that confidence interval fails to contain μ = 132.4, it could be argued that the Etruscans were not the forebears of modern Italians. (Of course, it would also be necessary to factor in whatever evolutionary trends in skull size have occurred for Homo sapiens, in general, over the past three thousand years.)
It follows from the discussion in Example 5.3.1 that the endpoints for a 95% confidence interval for μ are given by the general formula

    (ȳ − 1.96·σ/√n, ȳ + 1.96·σ/√n)

which here reduces to

    (143.8 − 1.96·6.0/√84, 143.8 + 1.96·6.0/√84) = (142.5 mm, 145.1 mm)

Since the value μ = 132.4 is not contained in the 95% confidence interval (or is not even close to being contained), we would conclude that a sample mean of 143.8 (based on a sample of size eighty-four) is not likely to have come from a normal population where μ = 132.4 (and σ = 6.0). It would appear, in other words, that Italians are not the direct descendants of Etruscans.

Comment. Random intervals can be constructed to have whatever "confidence" we choose. Suppose z_{α/2} is defined to be the value for which P(Z ≥ z_{α/2}) = α/2. If α = 0.05, for example, z_{α/2} = z_{.025} = 1.96. A 100(1 − α)% confidence interval for μ, then, is the range of numbers

    (ȳ − z_{α/2}·σ/√n, ȳ + z_{α/2}·σ/√n)

In practice, α is typically set at either 0.10, 0.05, or 0.01, although in some fields 50% confidence intervals are frequently used.

Confidence Intervals for the Binomial Parameter, p

Perhaps the most frequently encountered applications of confidence intervals are those involving the binomial parameter, p. Opinion surveys are often the context: When polls are released, it has become standard practice to issue a disclaimer saying that the findings have a certain margin of error. As we will see later in this section, margins of error are related to 95% confidence intervals.

The inversion technique followed in Example 5.3.1 can be applied to large-sample binomial random variables as well. We know from Theorem 4.3.1 that (X − np)/√(np(1 − p)) has approximately a standard normal distribution when X is binomial and n is large. It is also true that the pdf of

    (X/n − p)/√(p(1 − p)/n)

can be approximated by f_Z(z), a result that seems plausible given that X/n is the maximum likelihood estimator for p. Therefore,

    P(−z_{α/2} ≤ (X/n − p)/√(p(1 − p)/n) ≤ z_{α/2}) ≈ 1 − α    (5.3.2)

Rewriting Equation 5.3.2 by isolating p in the center of the inequalities leads to the formula given in Theorem 5.3.1.

Theorem 5.3.1. Let k be the number of successes in n independent trials, where n is large and p = P(success) is unknown. An approximate 100(1 − α)% confidence interval for p is the set of numbers

    (k/n − z_{α/2}·√((k/n)(1 − k/n)/n), k/n + z_{α/2}·√((k/n)(1 − k/n)/n))
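Theorem 5.3.1 translates directly into code. The sketch below is illustrative only (it is not part of the original text, and the poll counts are invented for the example):

```python
import math

def binomial_ci(k, n, z=1.96):
    # Approximate 100(1 - alpha)% confidence interval for p (Theorem 5.3.1),
    # where z = z_{alpha/2}; z = 1.96 gives the usual 95% interval.
    phat = k / n
    half = z * math.sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half

# Hypothetical survey: 470 "successes" observed in 1000 independent trials.
low, high = binomial_ci(470, 1000)
print(round(low, 3), round(high, 3))   # 0.439 0.501
```

Any p outside the returned interval would make the observed sample proportion an unlikely outcome, which is exactly the reading given to such intervals in the case study that follows.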
Therefore, * is the maximum (53.2) Rewriting Equation by isolating p in the center of the inequalities leads to the fonnula given in Theorem 5.3.1. Theorem 5..3.1. Let k be the nwnber of successes in n independent trials, where n is large p P(success) is unknown. An approximate 100(1 - a)% confidence interval/or p is lhe set of numbers = (~ Za/2 J (k/n)(1 - kIn) k ,n n + Za/2 (k/n)(l - kIn) n 370 Chapter 5 Estimation CASE STUDY 5.3.2 Intelligent on planets is a theme that oontinues to be box magic. Theatergoers seem equaUy enthralled by intergalactic brethren portrayed as hostile like the machines in H. G. Wells's War of the Worlds, or '-"-'''><1'\'' free like the nebbish in Stephen SpieJberg's E. T. What is not so clear is the extent to which people actually believe such creatures exist. In a dose encounter of the statistical kind, a Media-General-Associated Press poll found that of 1517 respondents accepted the idea of intelligent life existing on Based on what might we reasonably conclude about the proportion of all Americans who believe we are not alone? the "believable" values (or p according to Given that n = 1517 and k = Theorem 5.3.1 are the numbers from to 0.50: 713 1517+ (0.44,0.50) If true proportion of in other words, who in extraterrestrial life is less than 0.44 or greater than 0.50, it would be highly unlikely that a sample proportion (based on 1517 responses) would be 0.47. Comment. We call (0.44,0.50) a 95% confidence interval for p, but it does not follow that p has a 95% chance of lying between 0.44 and 0.50. The p is a constant, The so it faUs between 0.44 and 0.50 either 0% of the time or 100% of the refers to the procedure by which the interval is constructed, not to any particular intervaL This, of course, is entirely analogous to the interpretation given earlier to 95% confidence intervals for J-L. Comment. 
Robert Frost was certainly more familiar with iambic pentameters than he was with estimated parameters, but in 1942 he wrote a ooupLet sounded very much like a perception a confidence (99): We dance round in a and suppose, But the Secret sits in the middle and knows. EXAMPLE 5.3.2 Central to statistical software package is a random number generator. Two or three simple commands are typically aU that are required to output a sample of size n representing any of the standard probability models. But how can we be that numbers purporting to be random observations from, say, a norma] distribution with J1. = and 0' = 10 actually do represent particular Interval Estimation Section 5.3 111 TABlE 5.3.3 0.00940* 0.93661 0.46474* 0.58175* 5.05936 1.83196 0.81223 1.31728 0.54938* 0.75095 139603 0.48272* 0.86681 0.04804* 1.91987 1.84549 0.81077 0.73217 232466 0.50795* 0.48223* 0.55491* 0.07498* 1.92874 1.20752 0.59111* 0.52019* 0.66715* 0.11041* 3.59149 0.07451* 1.52084 1.93181 0.11387* 0.36793* 0.73169 338765 2.89577 1.38016 1.88641 1.06972 0.78811 0.38966* 0.16938* 3.01784 1.20041 0.41382* 2.40564 0.62928* 216919 0.42250* 2.41135 0.05509* 1.44422 031684* 1.07111 0.09433* 1.16045 0.77279 0.21528* median of fr(y) = e-Y , y > 0] *number =.s 0.69315 The answer we cannot; however, a number of "tests" are available to check whether simulated measurements appear to random with respect to a given criterion. One such procedure is the median test. Suppose Yl. Y2 •... ,Yn denote measurements presumed to have come from a continuOllS pdf fy(y). Let k denote the number of YiS that are less than the median of fy(y). If the sample is random. we would expect the difference between ~ and .!. to be small. More 2 specifically, a 95% confidence interval on k should contain the value n Listed in 5.3.3 is a set of sixty Yis generated by MINITAB to represent the exponential pdf, frey) = e-Y • Y 2::: O. Does this sample pass the median test? 
The median here ism = 0.69315: 1'1 1 m e-Ydy = -e- Y [ = 1 - which implies that m = m(05) = 0.69315. Notice of sixty entries in Table 53.3, a total of k 26 (those marked wi th an asterisk, "') fall to left of the median. For these . k 26 partIcular Yi S , then, ;; = 60 = 0.433. Let p the (unknown) probability that a random observation produced by MINlTAB's generator will lie to the left of the pdf's median. Based on sixty of numbeTS extending from observations, the 95% confidence interval for p is the 0.308 to 0.558: = (~ L~ (26/60)(160- 26/60), ~ + 1.96 (26/60)(1 - = (0.308,0.558) The fact that the value p = 0.50 is contained in the confidence interval implies that these data do pass the median test. It is entirely believable, in other words, that a bona would have twenty-six observations fide exponential random sample of size below the median, and thirty-four above. 372 Chapter 5 Estimation Margin of Error In the popular press, estimates for p (i.e., values of ~) are typically accompanied by a n IrUlrgin of errorJ as opposed to a confidence interval. The two are related: A of error is half the maximum width of a 95% confidence intervaL number actually quoted is usually as a percentage.) Let w denote the width of a 95% confideoce interval for p. From Theorem 5.3.1, ~ + l.96J(k l n)(1 n- w kin) _ (~ _ l.~(kln)(ln - kin») = Notice that for fixed n, w is a o :::: ~ :::: I. of the product the largest value that (~) (1 ~) (~) can (1 - ~} is But !. that or! (see Question 5.3.16). Therefore, maxw=3.~ 'c.I"'V{1 4;;Definition 5.3.1. The m.llrgin of error with an estlmate!:, where k is the n of successes in n independent trials, is l00d%, where d = 1.96 EXAMPLE 5.3.3 Hurricane Charley, a four storm that devastated of southwestern Florida in August, was both a political issue as well as a meteorological catastrophe. Under scrutiny was the Federal government's response to the storm's victims and whether that respoll1se might the Presidential election. 
Several weeks after the clean-up had a USA Today reported that 71% of the 1,002 adults in terviewed "approved" of the way President Bush and his administratioll were handling disaster relief. What margin of error would be associated with the 71 % 'I Applying Definitioll 5.3.1 (with n == 1(02) shows the margin error associated with the poll's result is d = 1.96/(2Jl002) = 0.031 Notice that the margin of error has nothing to do with the actual survey results. Had the percentage of respondents approviog President Bush's handling of the situatioll been 17% or 71 %, or any other number, the of error. by definition, would have been the same. Section 5.3 Interval Estimation 373 Choosing Sample 5i;!:es Related to confidence intervals and margins of error is an important experimental design question. Suppose a researcher wishes to estimate the binomial parameter p based on rea sults from a series of n independent trials, but n has yet /0 be determined. Larger values of It will, of course , yield estimates having greater precision, but more observations also demand greater expenditures of time and money. How can those two concerns best be reconciled? If the experimenter can articulate the minimal degree of precision that would considered acceptable, a Z transformation can be used to calculate the smallest (ie., the cheapest) sample capable of achieving that objective. For example, suppose we want X to have at n problem is solved, a 100(1 - a)% probability of if we can find the smallest 11 P(-d ~ ~ - p within a distance d of p. The which ~ d) = 1 - (5.3.3) a Theorem 5.3.2. Let X be the estimator for the pa.rameter p in a binomial distribution. In n order for X to have at lenst a 100(1 - a) % probability of being within a distance d of p, n should be no smaller than the sample 2 "a/2 It = 4J2 where Za/2 is the value for which P(Z 2:. Zan) = a/2 Proof. 
Start by dividing the terms in the probability portion of Equation 5.3.3 by the standard deviation of X/n to form an approximate Z ratio:

    P(−d ≤ X/n − p ≤ d) = P( −d/√[p(1 − p)/n] ≤ (X/n − p)/√[p(1 − p)/n] ≤ d/√[p(1 − p)/n] )
                        ≈ P( −d/√[p(1 − p)/n] ≤ Z ≤ d/√[p(1 − p)/n] )

But P(−z_{α/2} ≤ Z ≤ z_{α/2}) = 1 − α, so

    d/√[p(1 − p)/n] = z_{α/2}

which implies that

    n = (z_{α/2})²·p(1 − p)/d²                                        (5.3.4)

Equation 5.3.4 is not an acceptable final answer, though, because the right-hand side is a function of p, the unknown parameter. But p(1 − p) ≤ 1/4 for 0 ≤ p ≤ 1, so the sample size

    n = (z_{α/2})² / (4d²)

would necessarily cause X/n to satisfy Equation 5.3.3, regardless of the actual value of p. (Notice the connection between the statements of Theorem 5.3.2 and Definition 5.3.1.)

EXAMPLE 5.3.4

A public health survey is being planned in a metropolitan area for the purpose of estimating the proportion of children, ages zero to fourteen, who are lacking adequate immunization. Organizers of the study would like the estimated proportion of inadequately immunized children, X/n, to have at least a 98% probability of being within 0.05 of the true proportion, p. How large should the sample be?

Here 100(1 − α) = 98, so α = 0.02 and z_{α/2} = z_{0.01} = 2.33. By Theorem 5.3.2, then, the smallest acceptable sample size is 543:

    n = (2.33)² / [4(0.05)²] = 543

Comment. Occasionally, there may be reason to believe that p is necessarily less than some number r₁, where r₁ < 1/2, or greater than some number r₂, where r₂ > 1/2. If so, the factor p(1 − p) in Equation 5.3.4 can be replaced by either r₁(1 − r₁) or r₂(1 − r₂), and the sample size required to estimate p with a specified precision will be reduced, sometimes by a considerable amount.

Suppose, for example, that previous immunization studies suggest that no more than 20% of children between the ages of zero and fourteen are inadequately immunized. The smallest sample size, then, for which

    P(−0.05 ≤ X/n − p ≤ 0.05) = 0.98

is 348:

    n = (2.33)²(0.20)(0.80)/(0.05)² = 348

an n that represents almost a 36% reduction [ = (543 − 348)/543 × 100 ] from the original n = 543.

Comment. Theorems 5.3.1 and 5.3.2 are both based on the assumption that the X in X/n varies according to a binomial model. What we learned in Section 3.3, though, seems to
contradict that assumption: Samples used in opinion surveys are invariably drawn without replacement, in which case X is hypergeometric, not binomial. The consequences of that particular "error," however, are minor and frequently negligible. It can be shown mathematically that the expected value of X/n is the same regardless of whether X is binomial or hypergeometric; its variance, though, is different. If X is binomial,

    Var(X/n) = p(1 − p)/n

If X is hypergeometric,

    Var(X/n) = [p(1 − p)/n] · (N − n)/(N − 1)

where N is the total number of subjects in the population. Since (N − n)/(N − 1) < 1, the actual variance of X/n is somewhat smaller than the (binomial) variance we have been assuming. The ratio (N − n)/(N − 1) is called the finite correction factor. If N is much larger than n, which is typically the case, then the magnitude of (N − n)/(N − 1) will be so close to 1 that the variance of X/n is equal to p(1 − p)/n for all practical purposes, and the "binomial" assumption is more than adequate. Only when the sample is a sizable fraction of the population do we need to include the finite correction factor in any calculations that involve the variance of X/n.

QUESTIONS

5.3.1. The production of a nationally marketed detergent results in certain workers receiving prolonged exposures to a Bacillus subtilis enzyme. Nineteen workers were tested to determine the effects of those exposures, if any, on various respiratory functions. One such function, air-flow rate, is measured by computing the ratio of a person's forced expiratory volume (FEV₁) to his or her vital capacity (VC). (Vital capacity is the maximum volume of air a person can exhale after taking as deep a breath as possible; FEV₁ is the maximum volume of air a person can exhale in one second.) In persons with no lung dysfunction, the "norm" for FEV₁/VC ratios is 0.80. Based on the following data (169), is it believable that exposure to the Bacillus subtilis enzyme has no effect on the FEV₁/VC ratio?
Answer the question by constructing a 95% confidence interval. Assume that FEV₁/VC ratios are normally distributed with σ = 0.09.

    Subject   FEV₁/VC      Subject   FEV₁/VC
    RH        0.61         WS        0.78
    RB        0.70         RV        0.84
    MB        0.63         EN        0.83
    DM        0.76         WD        0.82
    WB        0.67         FR        0.74
    RB        0.72         PD        0.85
    BF        0.64         EB        0.73
    IT        0.82         PC        0.85
    PS        0.88         RW        0.87
    RB        0.82

5.3.2. Mercury pollution is widely recognized as a serious ecological problem. Much of the mercury released into the environment originates as a byproduct of coal burning and other industrial processes. It does not become dangerous until it falls into bodies of water, where microorganisms convert it to methylmercury (CH₃Hg⁺), an organic form that is particularly toxic. Fish are the intermediaries: they ingest and absorb the methylmercury and are then eaten by humans. Men and women, however, may not metabolize CH₃Hg⁺ at the same rate. In one study investigating that question, six women were given a known amount of protein-bound methylmercury. Shown in the following table are the half-lives of the methylmercury in their systems (117). For men, the average CH₃Hg⁺ half-life is believed to be 80 days. Assume that for both genders, CH₃Hg⁺ half-lives are normally distributed with a standard deviation (σ) of eight days. Construct a 95% confidence interval for the true female CH₃Hg⁺ half-life. Based on these data, is it believable that males and females metabolize methylmercury at the same rate? Explain.

    Subject       AE   EH   U    AN   KR   LU
    Half-life     52   69   73   88   87   56
    (days)

5.3.3. Suppose a sample of size n is to be drawn from a normal distribution where σ is known to be 14.3. How large does n have to be to guarantee that the length of the 95% confidence interval for μ will be less than 3.06?

5.3.4. What "confidence" would be associated with each of the following intervals? Assume that the random variable Y is normally distributed and that σ is known.

    (a) (Ȳ − 1.64·σ/√n, Ȳ + 2.33·σ/√n)
    (b) (−∞, Ȳ + 2.58·σ/√n)
    (c) (Ȳ − 1.64·σ/√n, Ȳ)
5.3.5. Five independent samples, each of size n, are to be drawn from a normal distribution where σ is known. For each sample, the interval

    (Ȳ − 0.96·σ/√n, Ȳ + 1.06·σ/√n)

will be constructed. What is the probability that at least four of the intervals will contain the unknown μ?

5.3.6. Suppose that y₁, y₂, …, yₙ is a random sample of size n from a normal distribution where σ is known. Depending on how the tail-area probabilities are split up, an infinite number of random intervals having a 95% probability of containing μ can be constructed. What is unique about the particular interval (Ȳ − 1.96·σ/√n, Ȳ + 1.96·σ/√n)?

5.3.7. If the standard deviation (σ) associated with the pdf that produced the following sample is 3.6, would it be correct to claim that

    (2.61 − 1.96·3.6/√20, 2.61 + 1.96·3.6/√20) = (1.03, 4.19)

is a 95% confidence interval for μ? Explain.

    2.5   3.2   0.5   0.4   0.3   0.1
    0.1   0.2   0.2   1.3   0.1   0.4
    1.4   11.2  2.1   0.3   10.1  7.4

5.3.8. Food-poisoning outbreaks are often the result of contaminated salads. In one study carried out to assess the magnitude of that problem, the New York City Department of Health examined 220 tuna salads marketed by various retail and wholesale outlets. A total of 179 were found to be unsatisfactory for health reasons (166). Find a 90% confidence interval for p, the true proportion of contaminated tuna salads marketed in New York City.

5.3.9. In 1927, the year he hit 60 home runs, Babe Ruth batted .356, having collected 192 hits in 540 official at-bats (145). Based on his performance that season, construct a 95% confidence interval for Ruth's probability of getting a hit in a future at-bat.

5.3.10. To buy a 30-second commercial break during the telecast of Super Bowl XXIX cost approximately $1,000,000. Not surprisingly, potential sponsors wanted to know how many people might be watching. In a survey of 1015 potential viewers, 281 said they expected to see less than a quarter of the advertisements shown during the game.
Define the relevant parameter and estimate it using a 90% confidence interval.

5.3.11. During one of the first "beer wars" in the early 1980s, a taste test between Schlitz and Budweiser was the focus of a nationally broadcast TV commercial. One hundred people agreed to drink from two unmarked mugs and indicate which of the two beers they liked better; fifty-four said "Bud." Construct and interpret the corresponding 95% confidence interval for p, the true proportion of beer-drinkers who prefer Budweiser to Schlitz. How would Budweiser and Schlitz executives each put these results in the best possible light for their respective companies?

5.3.12. If (0.57, 0.63) is a 50% confidence interval for p, what does k/n equal, and how many observations were taken?

5.3.13. Suppose a coin is to be tossed n times for the purpose of estimating p, where p = P(heads). How large must n be to guarantee that the length of the 99% confidence interval for p will be less than 0.02?

5.3.14. On the morning of November 9, 1994, the day after the electoral landslide that returned Republicans to power in both branches of Congress, several key races were still in doubt. The most prominent was the Washington contest involving Democrat Tom Foley, the reigning Speaker of the House. An Associated Press story showed how narrow the margin had become (124): With 99 percent of precincts reporting, Foley trailed Republican challenger Nethercutt by just … votes, or 50.6 percent to 49.4 percent. About 14,000 absentee ballots remained uncounted, making the race too close to call. Let p = P(absentee voter prefers Foley). How small could p have been and still have given Foley a 20% chance of overcoming Nethercutt's lead and winning the election?

5.3.15. Which of the following two intervals has the greater probability of containing the binomial parameter p:

    (X/n − …, X/n + …)  or  (…)?

5.3.16. Examine the first two derivatives of the function g(p) = p(1 − p) to verify the claim made earlier that p(1 − p) ≤ 1/4 for 0 < p < 1.
5.3.17. A Money magazine article reported that 30% of 1013 adults surveyed at random could not correctly define any of the four main types of life insurance. Built into that figure, the article noted, is a "3.1% margin of error." Verify that computation and explain in a short paragraph what the 3.1% implies.

5.3.18. Viral infections contracted early during a woman's pregnancy can be very harmful to the fetus. One study found a total of 86 deaths and birth defects among 202 pregnancies complicated by a first-trimester German measles infection (47). Is it believable that the true proportion of abnormal births under such circumstances could be as high as 50%? Answer the question by calculating the margin of error for the sample proportion, 86/202.

5.3.19. Rewrite Definition 5.3.1 to cover the case where a finite correction factor needs to be included (i.e., situations where the sample size n is not negligible relative to the population size N).

5.3.20. A Forbes-Gallup poll conducted in the summer of 1994 questioned 304 chief executives chosen from a list of the nation's 865 largest companies. To the question "Over the next six months do you expect the overall U.S. business climate to get better, worse, or remain about the same?" 70 of the 304 said "better" (52). What margin of error is associated with their claim that 23% ( = 70/304 × 100) of CEOs are bullish on the economy? Include a finite correction factor in your calculation (see Question 5.3.19).

5.3.21. Given that n observations will produce a binomial parameter estimator, X/n, having a margin of error equal to 0.06, how many observations are required to have a margin of error half that size?

5.3.22. Given that a political poll shows that 52% of the sample favors candidate A, whereas 48% would vote for candidate B, and given that the margin of error associated with the survey is 0.05, does it make sense to claim that the two candidates are tied? Explain.
5.3.23. Assume that the binomial parameter p is to be estimated with the function X/n, where X is the number of successes in n independent trials. Which requirement demands the larger sample size: that X/n have a 96% probability of being within 0.05 of p, or that X/n have a …% probability of being within 0.04 of p?

5.3.24. Suppose that p is to be estimated by X/n and we are willing to assume that the true p will not be greater than 0.4. What is the smallest n for which X/n will have a 99% probability of being within 0.05 of p?

5.3.25. Let p denote the true proportion of students who support the movement to colorize classic films. Let the random variable X denote the number of students (out of n) who prefer colorized versions to black and white. What is the smallest sample size for which the probability is 80% that the difference between X/n and p is less than 0.02?

5.3.26. University officials are planning to audit 1586 new appointments to estimate the proportion p who have been incorrectly processed by the Payroll Department.

    (a) How large does the sample size need to be in order for X/n, the sample proportion, to have an 85% chance of lying within 0.03 of p?
    (b) Past audits suggest that p will not be larger than 0.10. Using that information, recalculate the sample size asked for in Part (a).

5.4 PROPERTIES OF ESTIMATORS

The method of maximum likelihood and the method of moments described in Section 5.2 both use very reasonable criteria to identify estimators for unknown parameters, yet the two do not always yield the same answer. Suppose, for example, that Y₁, Y₂, …, Yₙ is a random sample from the uniform pdf, f_Y(y; θ) = 1/θ, 0 ≤ y ≤ θ. The maximum likelihood estimator for θ is θ̂₁ = Yₘₐₓ, while the method of moments estimator is θ̂₂ = (2/n)·Σᵢ Yᵢ. Implicit in those two formulas is an obvious question: which should we use? More generally, the fact that parameters have multiple estimators (indeed, an unlimited number of θ̂'s can be found for any given θ) requires that we investigate the statistical properties associated with the estimation process. What qualities should a "good" estimator have?
Is it possible to find a "best" θ̂? These and other questions relating to the theory of estimation will be addressed in the next several sections.

To understand the mathematics of estimation, we must first keep in mind that every estimator θ̂ is a function of a set of random variables; that is, θ̂ = h(Y₁, Y₂, …, Yₙ). As such, θ̂ is itself a random variable: it has a pdf, an expected value, and a variance, all of which play key roles in evaluating its capabilities. We will denote the pdf of an estimator (at some point u) with the symbol f_θ̂(u) or p_θ̂(u), depending on whether θ̂ is a continuous or a discrete random variable. Probability calculations involving θ̂ will reduce to integrals of f_θ̂(u) (if θ̂ is continuous) or sums of p_θ̂(u) (if θ̂ is discrete).

EXAMPLE 5.4.1

a. Suppose a coin for which p = P(heads) is unknown is to be tossed ten times for the purpose of estimating p with the function p̂ = X/10, where X is the number of heads. If p = 0.60, what is the probability that |X/10 − 0.60| ≤ 0.10? That is, what are the chances that the estimator will fall within 0.10 of the true value of the parameter?

Here p̂ is discrete: the only values it can take on are 0/10, 1/10, …, 10/10. Moreover, when p = 0.60,

    p_p̂(k/10) = P(p̂ = k/10) = P(X = k) = (10 choose k)(0.60)ᵏ(0.40)¹⁰⁻ᵏ,  k = 0, 1, …, 10

Therefore,

    P(|p̂ − 0.60| ≤ 0.10) = P(0.60 − 0.10 ≤ X/10 ≤ 0.60 + 0.10) = P(5 ≤ X ≤ 7)
                         = Σₖ₌₅⁷ (10 choose k)(0.60)ᵏ(0.40)¹⁰⁻ᵏ = 0.6665

b. How likely is the estimator X/n to lie within 0.10 of p if the coin is tossed one hundred times? Given that n is so large, a Z approximation can be used to describe the variation in X/100. Since E(X/n) = p and Var(X/n) = p(1 − p)/n, we can write

    P(|X/100 − 0.60| ≤ 0.10) ≈ P( −0.10/√[(0.60)(0.40)/100] ≤ Z ≤ 0.10/√[(0.60)(0.40)/100] )
                             = P(−2.04 ≤ Z ≤ 2.04) = 0.9586

Figure 5.4.1 shows the two probabilities as areas under the probability functions describing X/10 and X/100. As we would expect, the larger sample size produces a more precise estimator: with n = 10, X/n has only a 67% chance of lying in the range from 0.50 to 0.70; for n = 100, though, the probability of X/n falling within 0.10 of the true p ( = 0.60) increases to 96%.

[FIGURE 5.4.1: (approximate) distributions of X/10 and X/100 when p = 0.60, with the areas 0.6665 and 0.9586 shaded]
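Both probabilities in Example 5.4.1 can be checked numerically; a minimal sketch (note that carrying the full z value, rather than the rounded 2.04 used with a normal table, shifts the fourth decimal slightly):

```python
import math

p = 0.60

# Part (a): exact P(5 <= X <= 7) for X ~ Binomial(10, 0.6)
exact = sum(math.comb(10, k) * p**k * (1 - p)**(10 - k) for k in range(5, 8))

# Part (b): normal approximation to P(|X/100 - 0.60| <= 0.10)
def phi(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = 0.10 / math.sqrt(p * (1 - p) / 100)   # about 2.04
approx = phi(z) - phi(-z)
print(round(exact, 4), round(approx, 4))
```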
Are the additional ninety observations worth the gain in precision that we see in Figure 5.4.1? Maybe yes and maybe no. In practice, the answer to that sort of question depends on two factors: (1) the cost of taking additional measurements and (2) the cost of making bad decisions or inappropriate inferences because of imprecise estimates. In many situations, both costs can be difficult to quantify.

Unbiasedness

Because they are random variables, estimators will take on different values from sample to sample. Typically, some samples will yield θ̂'s that underestimate θ, while others will lead to θ̂'s that are numerically too large. Intuitively, we would like the underestimates to somehow "balance out" the overestimates; that is, θ̂ should not systematically err in any one particular direction.

Figure 5.4.2 shows the pdfs of two estimators, θ̂₁ and θ̂₂. Common sense tells us that θ̂₁ is the better of the two because f_θ̂₁(u) is centered with respect to the true θ; θ̂₂, on the other hand, will tend to give estimates that are too large because the bulk of f_θ̂₂(u) lies to the right of the true θ.

[FIGURE 5.4.2: pdfs of two estimators, one centered at the true θ, the other shifted to its right]

Definition 5.4.1. Suppose that Y₁, Y₂, …, Yₙ is a random sample from the continuous pdf f_Y(y; θ), where θ is an unknown parameter. An estimator θ̂ [ = h(Y₁, Y₂, …, Yₙ)] is said to be unbiased (for θ) if E(θ̂) = θ for all θ. (The same concept and terminology apply if the data consist of a random sample X₁, X₂, …, Xₙ drawn from a discrete pdf p_X(k; θ).)

EXAMPLE 5.4.2

We saw at the outset of this section that θ̂₁ = Yₘₐₓ and θ̂₂ = (2/n)·Σᵢ Yᵢ are both estimators for θ in the uniform pdf, f_Y(y; θ) = 1/θ, 0 ≤ y ≤ θ. Are either (or both) unbiased?

An application of the corollary to Theorem 3.9.2, together with the fact that E(Yᵢ) = θ/2, proves that θ̂₂ is unbiased for θ:

    E(θ̂₂) = E[(2/n)·Σᵢ Yᵢ] = (2/n)·Σᵢ E(Yᵢ) = (2/n)·n·(θ/2) = θ

The maximum likelihood estimator θ̂₁ = Yₘₐₓ, on the other hand, is biased: since Yₘₐₓ is necessarily less than or equal to θ, f_θ̂₁(u) cannot be centered at θ, and E(θ̂₁) will be less than θ. The exact extent to which θ̂₁ tends to underestimate θ is easily calculated. Recall from Example 3.10.2 that f_Ymax(u) = (n/θ)(u/θ)ⁿ⁻¹, 0 ≤ u ≤ θ, so

    E(θ̂₁) = ∫₀^θ u · (n/θⁿ) uⁿ⁻¹ du
    = (n/θⁿ) · θⁿ⁺¹/(n + 1) = [n/(n + 1)]·θ

If n = 3, for example, E(θ̂₁) = (3/4)θ. As n increases, though, the bias becomes increasingly negligible, and Yₘₐₓ will be off, on the average, by less and less [which makes sense, because f_Ymax(u) concentrates around θ as the sample size increases].

Comment. Knowing that E(Yₘₐₓ) = [n/(n + 1)]·θ, we can construct an estimator based on Yₘₐₓ that is unbiased. Let θ̂ = [(n + 1)/n]·Yₘₐₓ. Then

    E(θ̂) = [(n + 1)/n]·E(Yₘₐₓ) = [(n + 1)/n]·[n/(n + 1)]·θ = θ

EXAMPLE 5.4.3

Let X₁, X₂, …, Xₙ be a random sample from a discrete pdf p_X(k; θ), where θ = E(X) is an unknown parameter. Consider the estimator θ̂ = Σᵢ aᵢXᵢ, where the aᵢ's are constants. For what values of a₁, a₂, …, aₙ will θ̂ be unbiased?

By assumption, θ = E(Xᵢ), so

    E(θ̂) = E(Σᵢ aᵢXᵢ) = Σᵢ aᵢE(Xᵢ) = Σᵢ aᵢθ = θ·Σᵢ aᵢ

Clearly, θ̂ will be unbiased for any set of aᵢ's for which Σᵢ aᵢ = 1.

EXAMPLE 5.4.4

Given a random sample Y₁, Y₂, …, Yₙ from a normal distribution whose parameters μ and σ² are both unknown, the maximum likelihood estimator for σ² is

    σ̂² = (1/n)·Σᵢ (Yᵢ − Ȳ)²

(recall Example 5.2.4). Is σ̂² unbiased for σ²? If not, what function of σ̂² does have an expected value equal to σ²?

Note, first, that for any random variable Y, Var(Y) = E(Y²) − [E(Y)]². Also, from Section 3.9, for the average Ȳ of a sample of n independent random variables Y₁, Y₂, …, Yₙ, E(Ȳ) = E(Yᵢ) = μ and Var(Ȳ) = (1/n)·Var(Yᵢ) = σ²/n. Using those results, we can write

    E(σ̂²) = E[(1/n)·Σᵢ (Yᵢ² − 2YᵢȲ + Ȳ²)]
          = E[(1/n)·(Σᵢ Yᵢ² − nȲ²)]
          = (1/n)·[Σᵢ E(Yᵢ²) − n·E(Ȳ²)]
          = (1/n)·[n(σ² + μ²) − n(σ²/n + μ²)]
          = [(n − 1)/n]·σ²

Since the latter is not equal to σ², σ̂² is biased. To "unbias" the maximum likelihood estimator in this case, we need simply multiply it by n/(n − 1). By convention, the unbiased version of the maximum likelihood estimator for σ² in a normal distribution is denoted S² and is referred to as the sample variance:

    S² = sample variance = [1/(n − 1)]·Σᵢ (Yᵢ − Ȳ)²

Comment. The square root of the sample variance is called the sample standard deviation:

    S = sample standard deviation = √{ [1/(n − 1)]·Σᵢ₌₁ⁿ (Yᵢ − Ȳ)² }

In practice, S is the most commonly used estimator
for σ, even though E(S) ≠ σ [despite the fact that E(S²) = σ²].

EXAMPLE 5.4.5

By definition, the geometric mean of a set of n numbers is the nth root of their product. Let Y₁ and Y₂ be a random sample of size two from the pdf f_Y(y; θ) = (1/θ)e^(−y/θ), y > 0, where θ is an unknown parameter. Find an unbiased estimator for θ based on the sample's geometric mean, √(Y₁Y₂).

By Theorem 3.9.1,

    E[√(Y₁Y₂)] = E(√Y₁)·E(√Y₂) = [E(√Y)]²

and

    E(√Y) = ∫₀^∞ √y · (1/θ)e^(−y/θ) dy

Let t = y/θ. Then dt = dy/θ and

    E(√Y) = √θ · ∫₀^∞ √t e⁻ᵗ dt = √θ · Γ(3/2) = √(πθ)/2

(recall Theorem 4.6.2 and the fact that Γ(1/2) = √π). Therefore,

    E[√(Y₁Y₂)] = πθ/4

which implies that

    θ̂ = (4/π)·√(Y₁Y₂)

is unbiased for θ.

Table 5.4.1 is a computer simulation showing the performance of the estimator θ̂ = (4/π)·√(Y₁Y₂) when θ = 1. In columns C1 and C2 are forty random samples drawn from the pdf f_Y(y; 1) = e⁻ʸ, y > 0. The corresponding geometric means √(y₁y₂) are listed in column C3, and the forty simulated θ̂'s appear in column C4. Sample twenty-eight yielded the smallest estimate (θ̂ = 0.13792), while Sample twenty-nine erred the most in the other direction (θ̂ = 3.97022). Notice that the average of the forty θ̂'s ( = 1.02) is quite close to the parameter's true value (θ = 1.00). That the two agree so well is not surprising, of course, given that θ̂ is unbiased for θ.
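The Table 5.4.1 experiment is easy to re-run. The sketch below is an independent simulation, not a reproduction of the table (the seed and the number of replications are arbitrary choices), but by unbiasedness its average should likewise land near θ = 1.

```python
import math
import random

random.seed(541)                      # arbitrary seed, for reproducibility
theta = 1.0
estimates = [
    (4 / math.pi) * math.sqrt(random.expovariate(1 / theta) *
                              random.expovariate(1 / theta))
    for _ in range(100_000)
]
# Average of the simulated theta-hats; should be close to theta = 1
print(round(sum(estimates) / len(estimates), 3))
```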
[TABLE 5.4.1: forty simulated samples (y₁, y₂) in columns C1 and C2, their geometric means √(y₁y₂) in column C3, and the estimates θ̂ = (4/π)·√(y₁y₂) in column C4; average θ̂ = 1.02]

QUESTIONS

5.4.1. Two chips are drawn without replacement from an urn containing five chips, numbered 1 through 5. The average of the two numbers drawn is to be used as an estimator, θ̂, for the true average of all the chips (θ = 3). Calculate P(|θ̂ − 3| > 1.0).

5.4.2. Suppose a random sample of size n = 6 is drawn from the uniform pdf f_Y(y; θ) = 1/θ, 0 ≤ y ≤ θ, for the purpose of using θ̂ = Yₘₐₓ to estimate θ.

    (a) Calculate the probability that θ̂ falls within 0.2 of θ, given that the parameter's true value is 3.0.
    (b) Calculate the probability of the event asked for in Part (a) if the sample size is 3 instead of 6.
5.4.3. Five hundred adults are asked whether they favor a bipartisan campaign finance reform bill. If the true proportion of the electorate in favor of the legislation is 52%, what are the chances that fewer than half of those in the sample support the proposal? Use a Z transformation to approximate the answer.

5.4.4. A sample of size n = 16 is drawn from a normal distribution where σ = 10 but μ is unknown. If μ = 20, what is the probability that the estimator μ̂ = Ȳ will lie between 19.0 and 21.0?

5.4.5. Suppose X₁, X₂, …, Xₙ is a random sample of size n drawn from a Poisson pdf, where λ is an unknown parameter. Show that λ̂ = X̄ is unbiased for λ. For what type of parameter, in general, will the sample mean necessarily be an unbiased estimator? Hint: The answer is implicit in the derivation showing that X̄ is unbiased for the Poisson λ.

5.4.6. Let Yₘᵢₙ be the smallest order statistic in a random sample of size n drawn from the uniform pdf, f_Y(y; θ) = 1/θ, 0 ≤ y ≤ θ. Find an unbiased estimator for θ based on Yₘᵢₙ.

5.4.7. Let Y be the random variable described in Example 5.2.3, where f_Y(y; θ) = e^(−(y−θ)), y ≥ θ, 0 < θ. Show that Y − 1 is an unbiased estimator for θ.

5.4.8. Suppose that 14, 10, 18, and … constitute a random sample of size 4 drawn from a uniform pdf defined over the interval [0, θ], where θ is unknown. Find an unbiased estimator for θ based on the third order statistic. What value does the estimator have for these particular observations? Is it possible that we would know that an estimate for θ based on the third order statistic was incorrect, even if we had no idea what the true value of θ might be? Explain.

5.4.9. A random sample of size 2, Y₁ and Y₂, is drawn from the pdf f_Y(y; θ) = 2y/θ², 0 < y < θ. What must c equal if the statistic c(Y₁ + 2Y₂) is to be an unbiased estimator for θ?

5.4.10. A sample of size 1 is drawn from the uniform pdf defined over the interval [0, θ]. Find an unbiased estimator for θ. Hint: Is θ̂ = Y unbiased?

5.4.11. Suppose that W is an unbiased estimator for θ. Can W² be an unbiased estimator for θ²?

5.4.12. We showed in Example 5.4.4 that σ̂² = (1/n)·Σᵢ (Yᵢ − Ȳ)² is biased for σ². Suppose, though, that μ is known and does not have to be estimated by Ȳ. Show that σ̂² = (1/n)·Σᵢ (Yᵢ − μ)² is unbiased for σ².
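The n versus n − 1 distinction behind Example 5.4.4 and Question 5.4.12 can be verified exactly, with no simulation, by enumerating every equally likely sample from a small discrete distribution; the three-point distribution below is an invented toy case (μ = 2, σ² = 2/3).

```python
from itertools import product

values, n = [1, 2, 3], 2          # Y uniform on {1,2,3}: mu = 2, sigma^2 = 2/3
weight = 1 / len(values) ** n     # each ordered sample is equally likely
mle_mean = 0.0                    # accumulates E[(1/n) * sum (Yi - Ybar)^2]
s2_mean = 0.0                     # accumulates E[(1/(n-1)) * sum (Yi - Ybar)^2]
for sample in product(values, repeat=n):
    ybar = sum(sample) / n
    ss = sum((y - ybar) ** 2 for y in sample)
    mle_mean += weight * ss / n
    s2_mean += weight * ss / (n - 1)

# mle_mean = ((n-1)/n) * sigma^2 = 1/3, while s2_mean recovers sigma^2 = 2/3
print(mle_mean, s2_mean)
```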
5.4.13. As an alternative to imposing unbiasedness, an estimator's distribution can be "centered" by requiring that its median be equal to the unknown parameter θ. If it is, θ̂ is said to be median unbiased. Let Y₁, Y₂, …, Yₙ be a random sample of size n from the uniform pdf, f_Y(y; θ) = 1/θ, 0 ≤ y ≤ θ. For arbitrary n, is θ̂ = [(n + 1)/n]·Yₘₐₓ median unbiased? Is it median unbiased for any value of n?

5.4.14. Let Y₁, Y₂, …, Yₙ be a random sample of size n from the pdf f_Y(y; θ) = (1/θ)e^(−y/θ), y > 0. Let θ̂ = n·Yₘᵢₙ. Is θ̂ unbiased for θ? Is θ̄ = (1/n)·Σᵢ Yᵢ unbiased for θ?

5.4.15. An estimator θ̂ₙ = h(W₁, …, Wₙ) is said to be asymptotically unbiased for θ if limₙ→∞ E(θ̂ₙ) = θ. Suppose W is a random variable with E(W) = μ and variance σ². Show that W̄² is an asymptotically unbiased estimator for μ².

5.4.16. Is the maximum likelihood estimator for σ² in a normal pdf, where both μ and σ² are unknown, asymptotically unbiased?

Efficiency

As we have seen, unknown parameters can have a multiplicity of unbiased estimators. For samples drawn from the uniform pdf f_Y(y; θ) = 1/θ, 0 ≤ y ≤ θ, for example, both θ̂ = [(n + 1)/n]·Yₘₐₓ and θ̂ = (2/n)·Σᵢ Yᵢ have expected values equal to θ. Does it matter which we choose?

Unbiasedness, though, is not the only important property we would like an estimator to have; also critical is its precision. Figure 5.4.3 shows the pdfs associated with two hypothetical unbiased estimators, θ̂₁ and θ̂₂. Both are unbiased for θ, but θ̂₂ is the better of the two because of its smaller variance: for any value r,

    P(θ − r ≤ θ̂₂ ≤ θ + r) > P(θ − r ≤ θ̂₁ ≤ θ + r)

That is, θ̂₂ has a greater chance than θ̂₁ of being within a distance r of the unknown θ.

[FIGURE 5.4.3: pdfs of two unbiased estimators, the more concentrated one having the higher probability of falling within (θ − r, θ + r)]

Definition 5.4.2. Let θ̂₁ and θ̂₂ be two unbiased estimators for a parameter θ. If

    Var(θ̂₁) < Var(θ̂₂)
Which is a more efficient estimator for fl., 1 1 1 1 1 1 where both fl. and = 4" Yi + 2Yz + 4' Y3 or ~ =3 -YI + -Yz + -Y3 33 J.l,z Notice, first, that both ill and Itz are unbiased for Jl: ~ E(J.I,l) = E (14 Yl + 1 1 + 1) 4Y3 2Y2 1 1 = 4'E(Yl) + 2E (Y2) + 4'E(Y3) 111 + -J.I, + 2 -Jl 4 =Jl and 1 3 1 31) (3 = E - YI + - Yz + - Y3 = 111 3E(Yl) + 3E(Yz) + '3 E(Y3) 1 + 1 3 -Jl + 1 3 -Jl =Jl But Var(ltz) < Var(ill) SO is more efficient of the two: Var(ill) = Var 1 + + (1 Var(J1.2) =Var 3YI h = + 1 + ~Y3) 1 4'Var(Yz) + 1 16 Var(Y3) 1 + 1) 3" Y2 3Y3 111 "9 Var (Yt) + 9'Var(Yz) + 9'Var(Y3) 3G z ="9 (f 390 Chapter 5 Estimation emctelncy of it2 to 18 2 30- /30- 8 2 9 or 1.125.) EXAMPlE 5.4.1 Let . Yz • ... ,Yn be a. sBlmp.le irom the uniform pdf defined over the interval 2 • n +1 8]. We know from Example Y,; and fh = . YmllX are n ... ,,"-"."""...... for (J. Which estimator is more em.C1e:ou Appealing once again to the fact Var(Y) = E(y2) - [E(y)j2 for any variable Y, we can write ItW''''VJLU Var(9t) A = Var (2;; 4 >1 = '2 LVar(Y;) n j;:l 4 = L [E(Yr) >1 - [E(Yi)f] i=1 But E(Yi ) = () "2 (by SVIIlIDE:l:ry Var(B2) A + -1 ' = Var (n- n = = (11 : 1 Ymax ) Y. (n : 1) Var(Ymllx ) 2 [E(Y~ax) _ [E(Ymax)]2] Section 5.4 Properties of Estimators 391 From EX~lmple 3.102, so = 06y2 . ~ (~ )11-1 dy 1 en 1 n (J 0 y We know that E(Ymax ) A Var(el) n+1 ( dy n = en ) J,,n+2 n I(J n 2 + 2 0 = n + 2e n = -n-+--#1· so n+ n n )2] = (-1)2[e2 - - (- e2 n n+2 n+l e2 n(n + 2) Notice that n(n + ' Imp , I'les t h 2)/3n > 1 for n > 1, wh Ich at A Bz =n-+-1 . Ymax has a A smaller variance (and is more efficient) than Bt 2 n n i=l n -= - L Yi. CASE STUDY 5.4.1 World War II, a very simple statistical procedure was oe1{eJlJpe;Q war production, It was based on serial numbers and proved to reliable. 
When the war ended and the Third Reich's "true" production figures were revealed, it was found that the serial number estimates were in many instances far more accurate than all the information gleaned from more traditional intelligence operations, spies, and informants.

Every piece of German equipment, whether it was a V-2 rocket, a tank, or a tire, was stamped with a serial number that indicated when it was manufactured. If the total number of, say, Mark I tanks produced by a certain date was N, each would bear one of the integers 1 to N. As the war progressed, some of these numbers became known to the Allies, either by the direct capture of a piece of equipment or when a command post was overrun. For the War Department's statisticians, the problem was to estimate N from the sample of "captured" serial numbers, 1 ≤ y′₁ < y′₂ < ⋯ < y′ₙ ≤ N.

Two approaches were taken. One model assumed that the n observed serial numbers were one of the (N choose n) possible sets of n ordered integers from 1 to N and that each such set was equally likely. That is,

    P(Y′₁ = y′₁ < Y′₂ = y′₂ < ⋯ < Y′ₙ = y′ₙ) = (N choose n)⁻¹

The parameter N was then estimated by adding the average "gap" in the serial numbers to the largest observed serial number:

    N̂₁ = y′ₙ + [1/(n − 1)]·Σⱼ (y′ⱼ − y′ⱼ₋₁ − 1),  j ≥ 2

So, if five tanks were captured and they bore the numbers 14, …, 146, …, and 298, the estimate for the total number of tanks produced would be

    N̂₁ = 298 + (1/4)·[(298 − 146 − 1) + ⋯ + (… − 14 − 1)] = 368

A second estimator used was a modified maximum likelihood estimator, referred to here as N̂₂:

    N̂₂ = [(n + 1)/n]·y′ₙ − 1

Both N̂₁ and N̂₂ are good estimators, but N̂₂ has a slightly smaller variance: it can be shown that Var(N̂₂) ≤ Var(N̂₁) [see (63) for details].

The difference in the precision of N̂₁ and N̂₂ compared with estimates obtained from intelligence reports and covert activities was astounding.
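The two serial-number estimators can be sketched as functions. The middle serial numbers in the captured-tank list below are hypothetical fill-ins (the text's example shows only 14, 146, and 298), but note that N̂₁ depends only on the smallest and largest serials, because the gap sum telescopes, so the text's estimate of 368 is reproduced regardless.

```python
def n_hat_gap(serials):
    """N1: largest serial plus the average gap between ordered serials."""
    s = sorted(serials)
    n = len(s)
    gap_sum = sum(s[j] - s[j - 1] - 1 for j in range(1, n))  # telescopes
    return s[-1] + gap_sum / (n - 1)

def n_hat_mle(serials):
    """N2: the modified maximum likelihood estimator ((n+1)/n)*ymax - 1."""
    n = len(serials)
    return (n + 1) / n * max(serials) - 1

captured = [14, 87, 146, 203, 298]   # middle three values are hypothetical
print(n_hat_gap(captured), round(n_hat_mle(captured), 1))   # 368.0 356.6
```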
The serial number estimate for German tank production in 1942, for example, was thirty-four hundred, a figure that proved very close to the truth; the "official" estimate, based on information gathered through more conventional wartime channels, was a grossly inflated eighteen thousand. Discrepancies of that magnitude were not uncommon. The sophisticated Nazi propaganda machine may have been the root cause of the "normal" estimates being consistently on the high side. Germany sought to intimidate her enemies by exaggerating the country's industrial prowess. On people, the carefully orchestrated dissembling worked exactly as planned; on N̂₁ and N̂₂, though, it had no effect whatsoever!

QUESTIONS

5.4.17. Let X₁, X₂, …, Xₙ denote the outcomes of a series of n independent trials, where

    Xᵢ = 1 with probability p, and Xᵢ = 0 with probability 1 − p

for i = 1, 2, …, n. Let X = X₁ + X₂ + ⋯ + Xₙ.

    (a) Show that p̂₁ = X₁ and p̂₂ = X/n are unbiased estimators for p.
    (b) Intuitively, p̂₂ is a better estimator than p̂₁ because p̂₁ fails to include any of the information about the parameter contained in trials 2 through n. Verify that speculation by comparing the variances of p̂₁ and p̂₂.

5.4.18. Suppose that n = 5 observations are taken from the uniform pdf, f_Y(y; θ) = 1/θ, 0 ≤ y ≤ θ, where θ is unknown. Two unbiased estimators for θ are

    θ̂₁ = (6/5)·Yₘₐₓ   and   θ̂₂ = 6·Yₘᵢₙ

Which estimator would be better to use? Hint: What must be true of Var(Yₘₐₓ) and Var(Yₘᵢₙ) given that f_Y(y; θ) is symmetric? Does your answer as to which estimator is better make sense on intuitive grounds? Explain.

5.4.19. Let Y₁, Y₂, …, Yₙ be a random sample of size n from the pdf f_Y(y; θ) = (1/θ)e^(−y/θ), y > 0.

    (a) Show that θ̂₁ = Y₁, θ̂₂ = Ȳ, and θ̂₃ = n·Yₘᵢₙ are all unbiased estimators for θ.
    (b) Find the variances of θ̂₁, θ̂₂, and θ̂₃.
    (c) Calculate the relative efficiencies of θ̂₁ to θ̂₂ and of θ̂₂ to θ̂₃.

5.4.20. Given a random sample of size n from a Poisson distribution, λ̂₁ = X₁ and λ̂₂ = X̄ are two unbiased estimators for λ. Calculate the relative efficiency of λ̂₁ to λ̂₂.

5.4.21. If Y₁, Y₂, …, Yₙ
are random observations from a uniform pdf over [0,8), both ~ = II ( 5.4.22. +,. L 1) . and fh ::::: (n A + 1) . Ymin an;: unbiased estimators for e. Show that Var(~)/Var(ed = that WI is a random variable with mean 1-t variance and W2 is a we know that random variable with mean 1-t and variance a}. From Example cWt + (1 - C)W2 is an unbiased estimator of J.L for any constant c > O. If WI and W2 are independent, for what value of c is the estimator cW1 + (1 - C)W2 most efficient? ai 394 Chapter 5 5.5 MINIMUM-VARIANCE ESTIMATORS: THE CRAMER~RAO lOWER BOUND Estimation Given two estimators, Ol and /h, each unbiased for the 0, we know from Section 5.4 which is "better"-the one with the smaller variance. But nothing in that to the more fundamental question how good and are relative to section the infinitely many other unbiased estimators for B. Is there a for that has a smaller variance than does either OJ or /h? Can we identify the unbiased estimator having the smallest variance? Addressing those concerns is one of the most elegant, yet practical, theorems in aU of mathematical statistics, a result known as the Cramer-Roo lower bound. :)U()Oose a random sample of size n is taken from, a continuous probability with fy(y; B) is a distribution frey; B), where 0 is an unknown parameter. theoretical Limit below which the variance of any unbiased estimator for B cannot falL That limit is the Cramer-Rao lower bound. If the variance of a given is equo./ to the Cramer-Rao lower bound. we know that estimator is optimal in the sense that no unbiased can estimate B with greater n,,~·"""',...n 9t e e Theorem 5.5.1. Cramer-Rao lnequolity LeI Y1 • . •. YI! be a random sample from the continuous pdf fy (y; B), where fy (y; 0) has continuous first-order and second-order partial derivatives at all but a finite set of points. Suppose that the set of ys for which fy(y: B) :;:. 0 not depend on e. Led; = hey], Y2..'" • Y,,) be any unbiased estimator forB. 
Then

    Var(θ̂) ≥ {n · E[(∂ ln fY(Y; θ)/∂θ)²]}⁻¹ = {−n · E[∂² ln fY(Y; θ)/∂θ²]}⁻¹

(A similar statement holds if the n observations come from a discrete pdf, pX(k; θ).)

Proof. See (93). □

EXAMPLE 5.5.1

Suppose the random variables X₁, X₂, ..., Xₙ denote the number of successes (0 or 1) in each of n independent trials, where p = P(success occurs at any given trial) is an unknown parameter. Let X = X₁ + X₂ + ... + Xₙ be the total number of successes, and define p̂ = X/n. Clearly, p̂ is unbiased for p

    (E(p̂) = E(X/n) = (1/n)E(X) = np/n = p).

How does Var(p̂) compare with the Cramér-Rao lower bound for pXᵢ(k; p)? Note, first, that

    Var(p̂) = Var(X/n) = (1/n²)Var(X) = (1/n²) · np(1 − p) = p(1 − p)/n

(since X is a binomial random variable). To evaluate the Cramér-Rao lower bound, using the second form of the inequality, we begin by writing

    ln pXᵢ(k; p) = k ln p + (1 − k) ln(1 − p)

Moreover,

    ∂ ln pXᵢ(k; p)/∂p = k/p − (1 − k)/(1 − p)

and

    ∂² ln pXᵢ(k; p)/∂p² = −k/p² − (1 − k)/(1 − p)²

Taking the expected value of the second derivative gives

    E[∂² ln pXᵢ(X; p)/∂p²] = −p/p² − (1 − p)/(1 − p)² = −1/p − 1/(1 − p) = −1/[p(1 − p)]

The Cramér-Rao lower bound, then, reduces to

    1/{n · [1/(p(1 − p))]} = p(1 − p)/n

which equals the variance of p̂ = X/n. It follows that X/n is the preferred statistic for estimating the binomial parameter p: No unbiased estimator can possibly be more precise.

Definition 5.5.1. Let Θ denote the set of estimators θ̂ = h(Y₁, ..., Yₙ) that are unbiased for the parameter θ in the continuous pdf fY(y; θ). We say that θ* is a best (or minimum-variance) estimator if θ* ∈ Θ and Var(θ*) ≤ Var(θ̂) for all θ̂ ∈ Θ. (Similar terminology applies if Θ is the set of all unbiased estimators for the parameter θ in a discrete pdf, pX(k; θ).)

Related to the notion of a best estimator is the concept of efficiency. The connection is spelled out in Definition 5.5.2 for the case where θ̂ is based on data coming from a continuous pdf fY(y; θ). The same terminology applies if the data are a set of Xᵢ's from a discrete pdf pX(k; θ).

Definition 5.5.2. Let Y₁, Y₂, ..., Yₙ be a random sample of size n from the continuous pdf fY(y; θ), and let θ̂ = h(Y₁, Y₂, ..., Yₙ) be an unbiased estimator for θ.

a. The unbiased estimator θ̂ is said to be efficient if the variance of θ̂ equals the Cramér-Rao lower bound associated with fY(y; θ).

b.
The efficiency of an unbiased estimator θ̂ is the ratio of the Cramér-Rao lower bound for fY(y; θ) to the variance of θ̂.

Comment. The designations "efficient" and "best" are not synonymous. If the variance of an unbiased estimator is equal to the Cramér-Rao lower bound, then that estimator by definition is a best estimator. The converse, though, is not always true. There are situations for which the variances of none of the unbiased estimators achieve the Cramér-Rao lower bound. None of those, then, is efficient, but one (or more) could still be termed best. For the n independent trials described in Example 5.5.1, p̂ = X/n is both efficient and best.

EXAMPLE 5.5.2

If Y₁, Y₂, ..., Yₙ is a random sample from fY(y; θ) = 2y/θ², 0 < y < θ, then θ̂ = (3/2)Ȳ is an unbiased estimator for θ (see Question 5.5.1). Show that the variance of θ̂ is less than the Cramér-Rao lower bound for fY(y; θ).

Applying Theorem 3.6.1 and the independence of the Yᵢ's to the proposed estimator, we can write

    Var(θ̂) = Var((3/2) · Ȳ) = (9/4) · (1/n²) · Σᵢ₌₁ⁿ Var(Yᵢ)

where

    Var(Yᵢ) = E(Yᵢ²) − [E(Yᵢ)]² = ∫₀^θ y² · (2y/θ²) dy − [∫₀^θ y · (2y/θ²) dy]² = θ²/2 − (2θ/3)² = θ²/18

Therefore,

    Var(θ̂) = (9/4) · (1/n²) · n · (θ²/18) = θ²/(8n)

To calculate the Cramér-Rao lower bound for fY(y; θ), we first note that

    ln fY(y; θ) = ln(2y/θ²) = ln 2y − 2 ln θ

and

    ∂ ln fY(y; θ)/∂θ = −2/θ

Therefore,

    E[(∂ ln fY(Y; θ)/∂θ)²] = 4/θ²

and the Cramér-Rao lower bound equals

    1/(n · 4/θ²) = θ²/(4n)

Is the variance of θ̂ less than the Cramér-Rao lower bound? Yes:

    θ²/(8n) < θ²/(4n)

Is the statement of Theorem 5.5.1 contradicted? No, because the theorem does not apply in this situation: the range of fY(y; θ) is a function of θ, a condition that violates one of the Cramér-Rao assumptions.

QUESTIONS

5.5.1. Verify the claim made in Example 5.5.2 that θ̂ = (3/2) · Ȳ is an unbiased estimator for the parameter θ in fY(y; θ) = 2y/θ², 0 < y < θ.

5.5.2. Let Y₁, Y₂, ..., Yₙ be a random sample from fY(y; θ) = (1/θ)e^(−y/θ), y > 0. Compare the Cramér-Rao lower bound for fY(y; θ) to the variance of the maximum likelihood estimator

    θ̂ = Ȳ = (1/n) Σᵢ₌₁ⁿ Yᵢ

Is Ȳ a best estimator for θ?

5.5.3.
Let X₁, X₂, ..., Xₙ be a random sample of size n from the Poisson distribution, pX(k; λ) = e^(−λ)λᵏ/k!, k = 0, 1, .... Show that λ̂ = (1/n) Σᵢ₌₁ⁿ Xᵢ is an efficient estimator for λ.

5.5.4. Suppose a random sample of size n is taken from a normal distribution with mean μ and variance σ², where σ² is known. Compare the Cramér-Rao lower bound for fY(y; μ) with the variance of μ̂ = Ȳ = (1/n) Σᵢ₌₁ⁿ Yᵢ. Is Ȳ an efficient estimator for μ?

5.5.5. Let Y₁, Y₂, ..., Yₙ be a random sample from the uniform pdf fY(y; θ) = 1/θ, 0 ≤ y ≤ θ. Compare the Cramér-Rao lower bound for fY(y; θ) with the variance of the unbiased estimator θ̂ = [(n + 1)/n] · Ymax. Discuss.

5.5.6. Let Y₁, Y₂, ..., Yₙ be a random sample of size n from the pdf

    fY(y; θ) = [1/(θʳ(r − 1)!)] yʳ⁻¹ e^(−y/θ),  y > 0

(a) Show that θ̂ = Ȳ/r is an unbiased estimator for θ.
(b) Show that θ̂ = Ȳ/r is a minimum-variance estimator for θ.

5.5.7. Prove the equivalence of the two forms for the Cramér-Rao lower bound given in Theorem 5.5.1. Hint: Differentiate the equation ∫ fY(y) dy = 1 with respect to θ and deduce that ∫ [∂ ln fY(y)/∂θ] fY(y) dy = 0. Then differentiate again with respect to θ.

5.6 SUFFICIENT ESTIMATORS

Statisticians have proven to be quite diligent (and creative) in articulating properties that good estimators should exhibit. Sections 5.4 and 5.5, for example, introduced the notions of an estimator being unbiased and having minimum variance; Section 5.7 will explain what it means for an estimator to be "consistent." All those properties are easy to motivate, and they impose conditions on the probabilistic behavior of θ̂ that make eminently good sense.

In this section, we look at a deeper property of estimators, one that is not so intuitive but has some particularly important theoretical implications. Whether or not an estimator is sufficient refers to the amount of "information" it contains about the unknown parameter. Estimates, of course, are calculated using values obtained from random samples (drawn from either pX(k; θ) or fY(y; θ)).
If everything that we can possibly know from the data about θ is encapsulated in the estimate θe, then the corresponding estimator θ̂ is said to be sufficient. A comparison of two estimators, one sufficient and the other not, should help clarify the concept.

An Estimator That Is Sufficient

Suppose that a random sample of size n, X₁ = k₁, ..., Xₙ = kₙ, is taken from the Bernoulli pdf

    pX(k; p) = pᵏ(1 − p)¹⁻ᵏ,  k = 0, 1

where p is an unknown parameter. We know from Example 5.1.1 that the maximum likelihood estimator for p is p̂ = (1/n) Σᵢ₌₁ⁿ Xᵢ (and the maximum likelihood estimate is pe = (1/n) Σᵢ₌₁ⁿ kᵢ). To show that p̂ is a sufficient estimator for p, we calculate the conditional probability that X₁ = k₁, ..., Xₙ = kₙ given that p̂ = pe.

Generalizing the Comment following Example 3.11.7, we can write

    P(X₁ = k₁, ..., Xₙ = kₙ | p̂ = pe) = P(X₁ = k₁, ..., Xₙ = kₙ and p̂ = pe)/P(p̂ = pe)
                                      = P(X₁ = k₁, ..., Xₙ = kₙ)/P(p̂ = pe)

But

    P(X₁ = k₁, ..., Xₙ = kₙ) = p^(Σkᵢ)(1 − p)^(n − Σkᵢ) = p^(npe)(1 − p)^(n − npe)

and

    P(p̂ = pe) = P(Σᵢ₌₁ⁿ Xᵢ = npe) = (n choose npe) p^(npe)(1 − p)^(n − npe)

since Σᵢ₌₁ⁿ Xᵢ has a binomial distribution with parameters n and p (see Example 3.9.3). Therefore,

    P(X₁ = k₁, ..., Xₙ = kₙ | p̂ = pe) = 1/(n choose npe)    (5.6.1)

Notice that P(X₁ = k₁, ..., Xₙ = kₙ | p̂ = pe) is not a function of p. That is precisely the property that makes p̂ = (1/n) Σ Xᵢ a sufficient estimator. Equation 5.6.1 implies, in effect, that everything the data can tell us about the parameter p is contained in the estimate pe. Remember that, initially, the joint pdf of the sample, P(X₁ = k₁, ..., Xₙ = kₙ), is a function of the kᵢ's and p. What we have just shown, though, is that if that probability is conditioned on the value of this particular estimate, that is, on p̂ = pe, then p is eliminated and the probability of the sample is completely determined (in this case, it equals 1/(n choose npe), since (n choose npe) is the number of ways to arrange npe 1's and n − npe 0's in a sample of size n for which p̂ = pe).

If we had used some other estimator, say, p̂*, and if P(X₁ = k₁, ..., Xₙ = kₙ | p̂* = pe*) had remained a function of p, the conclusion would be that the information in pe* was not "sufficient" to eliminate the parameter p from the conditional probability.
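Equation 5.6.1 is easy to confirm numerically. The sketch below computes the conditional probability of a small hypothetical Bernoulli sample given its sum, for several values of p, and shows that the answer never changes:

```python
# Numerical check of Equation 5.6.1: for Bernoulli data, the conditional
# probability of the sample given the value of p-hat does not depend on p.
from math import comb

def conditional_prob(ks, p):
    """P(X1 = k1, ..., Xn = kn | sum Xi = s): the ratio of the joint
    probability of the sample to the binomial probability of the sum."""
    n, s = len(ks), sum(ks)
    joint = p**s * (1 - p)**(n - s)
    prob_sum = comb(n, s) * p**s * (1 - p)**(n - s)
    return joint / prob_sum

sample = [1, 0, 1, 1, 0]                 # n = 5, with 3 successes
for p in (0.2, 0.5, 0.9):
    print(conditional_prob(sample, p))   # same value each time: 1/C(5,3) = 0.1
```

The factor p^s(1 − p)^(n−s) cancels in the ratio, which is exactly why the conditional probability is free of p.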
A simple example of such a p̂* would be p̂* = X₁. Then pe* would equal k₁, and the conditional probability of X₁ = k₁, ..., Xₙ = kₙ given that p̂* = pe* would remain a function of p:

    P(X₁ = k₁, ..., Xₙ = kₙ | p̂* = k₁) = p^(Σᵢ₌₂ⁿ kᵢ)(1 − p)^(n − 1 − Σᵢ₌₂ⁿ kᵢ)

Comment. Some of the dice problems we did in Section 2.4 have solutions that parallel to some extent the notion of an estimator being sufficient. Suppose, for example, we roll a pair of fair dice without being allowed to view the outcome. Our objective is to calculate the probability that the sum showing is an even number. Suppose, though, that two people do see the outcome (which was, in fact, a sum of 7), and each is allowed to characterize the outcome without providing us with the exact sum that occurred. Person A tells us that "the sum was less than or equal to seven"; Person B says that "the sum was an odd number." Whose information is more helpful? Person B's. The conditional probability of the sum being even given that the sum is less than or equal to 7 is

    P(sum is even | sum ≤ 7) = [P(2) + P(4) + P(6)]/P(sum ≤ 7) = 9/21

which still leaves our initial question largely unanswered. In contrast, Person B utilized the data in a way that definitely answered the original question:

    P(sum is even | sum is odd) = 0

In a sense, B's information was "sufficient"; A's information was not.

An Estimator That Is Not Sufficient

Suppose a random sample of size n, Y₁ = y₁, ..., Yₙ = yₙ, is drawn from the uniform pdf fY(y; θ) = 1/θ, 0 ≤ y ≤ θ, where θ is an unknown parameter.
Recall from Question 5.2.15 that the method of moments estimator for θ is

    θ̂ = 2Ȳ = (2/n) Σᵢ₌₁ⁿ Yᵢ

The latter is not a sufficient statistic, because all the information in the data that pertains to the parameter θ is not necessarily contained in the numerical value θe. If θ̂ were a sufficient statistic, then any two random samples (of size n) having the same value for θe should yield exactly the same information about θ. A simple numerical example shows that not to be the case.

Suppose n = 3. Consider the two random samples y₁ = 1, y₂ = 2, y₃ = 3 and y₁ = 0, y₂ = 1, y₃ = 5. In both cases,

    θe = 2ȳ = 4

Do both samples, though, convey the same information about the possible value of θ? No. Based on the first sample, the parameter θ could, in fact, be equal to 4 (= θe). On the other hand, the second sample rules out the possibility that θ = 4, because one of the observations (y₃ = 5) is larger than 4, but according to the definition of fY(y; θ), all the yᵢ's must be between 0 and θ.

A Formal Definition

Suppose that X₁ = k₁, ..., Xₙ = kₙ is a random sample of size n from the discrete pdf pX(k; θ), where θ is an unknown parameter. Conceptually, θ̂ is a sufficient statistic for θ if

    P(X₁ = k₁, ..., Xₙ = kₙ | θ̂ = θe) = [Πᵢ₌₁ⁿ pX(kᵢ; θ)]/pθ̂(θe; θ) = b(k₁, ..., kₙ)    (5.6.2)

where pθ̂(θe; θ) is the pdf of the estimator θ̂ evaluated at the point θ̂ = θe and b(k₁, ..., kₙ) is a constant independent of θ. Equivalently, the condition that qualifies an estimator as being sufficient can be expressed by cross-multiplying Equation 5.6.2.

Definition 5.6.1. Let X₁ = k₁, ..., Xₙ = kₙ be a random sample of size n from pX(k; θ). The estimator θ̂ = h(X₁, ..., Xₙ) is sufficient for θ if the likelihood function, L(θ), factors into the product of the pdf for θ̂ and a constant that does not involve θ, that is, if

    L(θ) = Πᵢ₌₁ⁿ pX(kᵢ; θ) = pθ̂(θe; θ) · b(k₁, ..., kₙ)

A similar statement holds if the data consist of a random sample Y₁ = y₁, ..., Yₙ = yₙ drawn from a continuous pdf fY(y; θ).

Comment. If θ̂ is sufficient for θ, then any one-to-one function of θ̂ is also a sufficient estimator for θ.
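The numerical example above can be reproduced in a few lines: both samples return the same method of moments estimate, yet the second contains an observation that rules that value out.

```python
# The two samples above give the same method-of-moments estimate theta_e = 4,
# but they do not carry the same information about theta.
def theta_mom(ys):
    """Method of moments estimator 2 * ybar for the uniform pdf on [0, theta]."""
    return 2 * sum(ys) / len(ys)

sample1 = [1, 2, 3]
sample2 = [0, 1, 5]
print(theta_mom(sample1), theta_mom(sample2))   # both 4.0
print(max(sample2) > theta_mom(sample2))        # True: y3 = 5 rules out theta = 4
```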
As a case in point, we showed on p. 399 that

    p̂ = (1/n) Σᵢ₌₁ⁿ Xᵢ

is a sufficient estimator for the parameter p in a Bernoulli pdf. It is also true, then, that

    p̂* = np̂ = Σᵢ₌₁ⁿ Xᵢ

is sufficient for p.

EXAMPLE 5.6.1

Let X₁ = k₁, ..., Xₙ = kₙ be a random sample of size n from the Poisson pdf, pX(k; λ) = e^(−λ)λᵏ/k!, k = 0, 1, 2, .... Show that

    λ̂ = Σᵢ₌₁ⁿ Xᵢ

is a sufficient estimator for λ.

From Example 3.12.10, we know that λ̂, being a sum of n independent Poisson random variables, each with parameter λ, is itself a Poisson random variable with parameter nλ. By Definition 5.6.1, then, λ̂ is a sufficient estimator for λ if the sample's likelihood function factors into a product of the pdf for λ̂ times a constant that is independent of λ. But

    L(λ) = Πᵢ₌₁ⁿ e^(−λ)λ^(kᵢ)/kᵢ! = [e^(−nλ)(nλ)^(Σkᵢ)/(Σkᵢ)!] · [(Σkᵢ)!/(n^(Σkᵢ) Πᵢ₌₁ⁿ kᵢ!)]    (5.6.3)

proving that λ̂ = Σᵢ₌₁ⁿ Xᵢ is a sufficient estimator for λ.

Comment. The factorization in Equation 5.6.3 shows that λ̂ is a sufficient estimator for λ. It is not, however, an unbiased estimator for λ: E(λ̂) = Σᵢ₌₁ⁿ E(Xᵢ) = nλ. Constructing an unbiased estimator based on the sufficient statistic, though, is an easy matter. Let

    λ̂* = (1/n) Σᵢ₌₁ⁿ Xᵢ = (1/n)λ̂

Then E(λ̂*) = (1/n)E(λ̂) = (1/n) · nλ = λ, so λ̂* is unbiased for λ. Moreover, λ̂* is a one-to-one function of λ̂, so, by the Comment on p. 401, λ̂* is, itself, a sufficient estimator for λ.

EXAMPLE 5.6.2

Let Y₁ = y₁, ..., Yₙ = yₙ be a random sample of size n from the uniform pdf fY(y; θ) = 1/θ, 0 ≤ y ≤ θ. We know from Question 5.2.9 that

    θ̂ = Ymax

is the maximum likelihood estimator for θ. Is Ymax also sufficient?

Recall from Example 3.10.2 that

    f_Ymax(y) = n[FY(y)]ⁿ⁻¹ fY(y)

Here, FY(y) = P(Y ≤ y) = ∫₀^y (1/θ) dt = y/θ, so

    f_Ymax(ymax; θ) = n(ymax)ⁿ⁻¹/θⁿ

Whether θ̂ = Ymax is sufficient for θ hinges on whether the factorization of L(θ) described in Definition 5.6.1 can be accomplished. But L(θ) = Πᵢ₌₁ⁿ fY(yᵢ; θ) = (1/θ)ⁿ, and we can write

    L(θ) = (1/θ)ⁿ = [n(ymax)ⁿ⁻¹/θⁿ] · [1/(n · ymaxⁿ⁻¹)] = f_θ̂(θe; θ) · b(y₁, ..., yₙ)

which proves that θ̂ = Ymax is a sufficient estimator for θ.
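The factorization in Example 5.6.2 can be checked numerically: for a hypothetical uniform sample, the likelihood (1/θ)ⁿ equals the pdf of Ymax times a factor that carries no θ at all.

```python
# Numerical check of the factorization in Example 5.6.2: the uniform
# likelihood (1/theta)^n equals f_Ymax(ymax; theta) times a theta-free constant.
ys = [0.8, 2.1, 3.3, 1.7]                    # hypothetical uniform sample
n, ymax = len(ys), max(ys)

def likelihood(theta):
    return (1 / theta) ** n                  # valid whenever theta >= ymax

def f_ymax(y, theta):
    """pdf of the sample maximum: n * y^(n-1) / theta^n."""
    return n * y ** (n - 1) / theta ** n

b = 1 / (n * ymax ** (n - 1))                # the factor free of theta
for theta in (4.0, 5.5, 10.0):
    lhs, rhs = likelihood(theta), f_ymax(ymax, theta) * b
    assert abs(lhs - rhs) < 1e-12 * lhs      # equality up to rounding
print("factorization holds for every theta checked")
```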
A Second Factorization Criterion

Using Definition 5.6.1 to verify that an estimator θ̂ is sufficient requires that the pdf pθ̂(h(k₁, ..., kₙ); θ) or fθ̂(h(y₁, ..., yₙ); θ) be explicitly identified as one of the two factors whose product equals the likelihood function. If θ̂ is complicated, though, finding its pdf may be prohibitively difficult. The next theorem gives an alternative factorization criterion for establishing that an estimator is sufficient. It does not require that the pdf for θ̂ be known.

Theorem 5.6.1. Let X₁ = k₁, ..., Xₙ = kₙ be a random sample of size n from the discrete pdf pX(k; θ). The estimator θ̂ = h(X₁, ..., Xₙ) is sufficient for θ if and only if there are functions g(h(k₁, ..., kₙ); θ) and b(k₁, ..., kₙ) such that

    L(θ) = g(h(k₁, ..., kₙ); θ) · b(k₁, ..., kₙ)    (5.6.4)

where the function b(k₁, ..., kₙ) does not involve the parameter θ. A similar statement holds in the continuous case.

Proof. First, suppose that θ̂ is sufficient for θ. Then the criterion of Definition 5.6.1 includes Equation 5.6.4 as a special case. Now, assume that Equation 5.6.4 holds. The theorem will be proved if it can be shown that g(h(k₁, ..., kₙ); θ) can always be "converted" to include the pdf of θ̂ (at which point Definition 5.6.1 would apply). Let c be some value of the function h(k₁, ..., kₙ) and let A be the set of samples of size n that constitute the inverse image of c, that is, A = h⁻¹(c). Then

    pθ̂(c; θ) = Σ_{(k₁,k₂,...,kₙ)∈A} pX₁,X₂,...,Xₙ(k₁, k₂, ..., kₙ) = Σ_{(k₁,...,kₙ)∈A} Πᵢ₌₁ⁿ pXᵢ(kᵢ)
             = Σ_{(k₁,...,kₙ)∈A} g(c; θ) · b(k₁, k₂, ..., kₙ) = g(c; θ) · Σ_{(k₁,...,kₙ)∈A} b(k₁, ..., kₙ)

Since we are only interested in points where pθ̂(c; θ) ≠ 0, we can assume that Σ_{(k₁,...,kₙ)∈A} b(k₁, k₂, ..., kₙ) ≠ 0. Therefore,

    g(c; θ) = pθ̂(c; θ) · [1/Σ_{(k₁,...,kₙ)∈A} b(k₁, ..., kₙ)]    (5.6.5)

Substituting the right-hand side of Equation 5.6.5 into Equation 5.6.4 shows that θ̂ qualifies as a sufficient estimator for θ. A similar argument can be made if the data consist of a random sample Y₁ = y₁, ..., Yₙ = yₙ drawn from a continuous pdf fY(y; θ). See (211) for more details. □

Sufficiency As It Relates to Other Properties of Estimators

Chapter 5 has constructed a rather elaborate facade of mathematical properties and procedures associated with estimators. We have asked whether θ̂ is unbiased, efficient, and/or sufficient. How we find θ̂ has also come under scrutiny: some estimators have been derived using the method of maximum likelihood; others have come from the method of moments. Not all of these aspects of estimators and estimation, though, are entirely distinct; some are related and interconnected in a variety of ways.

Suppose, for example, that a sufficient estimator θ̂ₛ exists for a parameter θ, and suppose that θ̂ₘ is the maximum likelihood estimator for that same θ. If, for a given sample, θ̂ₛ = θe, we know from Theorem 5.6.1 that

    L(θ) = g(θe; θ) · b(k₁, ..., kₙ)

Since the maximum likelihood estimate, by definition, maximizes L(θ), it must also maximize g(θe; θ). But any θ that maximizes g(θe; θ) will necessarily be a function of θe. It follows, then, that maximum likelihood estimators are functions of sufficient estimators, that is, θ̂ₘ = f(θ̂ₛ) (which is the primary theoretical justification for why maximum likelihood estimators are preferred to method of moments estimators).

Sufficient estimators also play a critical role in the search for efficient estimators, that is, unbiased estimators whose variance equals the Cramér-Rao lower bound. There may be an infinite number of unbiased estimators for any unknown parameter in any pdf. That said, there may be a subset of those unbiased estimators that are functions of sufficient estimators. If so, it can be proved (see (93)) that the variance of an unbiased estimator based on a sufficient estimator will necessarily be less than the variance of every unbiased estimator that is not a function of a sufficient estimator. It follows, then, that to find an efficient estimator for θ, we can restrict our attention to functions of sufficient estimators for θ.

QUESTIONS

5.6.1.
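Theorem 5.6.1's criterion can be checked numerically for the Poisson sample of Example 5.6.1: dividing the likelihood by a θ-carrying factor g(Σkᵢ; λ) leaves a ratio that is the same for every λ. The counts below are hypothetical.

```python
# Theorem 5.6.1 in action for a Poisson sample (Example 5.6.1):
# L(lambda) = g(sum k; lambda) * b(k1,...,kn), where
# g(s; lam) = exp(-n*lam) * lam**s carries all the lambda-dependence.
from math import exp, factorial, prod

ks = [2, 0, 3, 1]                            # hypothetical Poisson counts

def L(lam):
    """Likelihood of the sample."""
    return prod(exp(-lam) * lam**k / factorial(k) for k in ks)

def g(s, lam):
    return exp(-len(ks) * lam) * lam**s

s = sum(ks)
ratios = [L(lam) / g(s, lam) for lam in (0.5, 1.0, 2.5)]
print(ratios)   # identical values: b = 1/(2! 0! 3! 1!) = 1/12
```

Because the ratio L(λ)/g(Σkᵢ; λ) is constant in λ, the sum Σ Xᵢ satisfies the factorization criterion, exactly as the theorem requires.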
Let X₁, X₂, ..., Xₙ be a random sample of size n from the geometric distribution, pX(k; p) = (1 − p)ᵏ⁻¹p, k = 1, 2, .... Show that p̂ = Σᵢ₌₁ⁿ Xᵢ is sufficient for p.

5.6.2. Suppose a random sample of size n is drawn from the pdf fY(y; θ) = e^(−(y−θ)), θ ≤ y.
(a) Show that θ̂ = Ymin is sufficient for the threshold parameter θ.
(b) Show that Ymax is not sufficient for θ.

5.6.3. Let X₁, X₂, and X₃ be a set of three independent Bernoulli random variables with unknown parameter p = P(Xᵢ = 1). It was shown on p. 399 that p̂ = X₁ + X₂ + X₃ is sufficient for p. Show that the linear combination p̂* = X₁ + 2X₂ + 3X₃ is not sufficient for p.

5.6.4. If θ̂ is sufficient for θ, show that any one-to-one function of θ̂ is also sufficient for θ.

5.6.5. Show that σ̂² = Σᵢ₌₁ⁿ Yᵢ² is sufficient for σ² if Y₁, Y₂, ..., Yₙ is a random sample from a normal pdf with μ = 0.

5.6.6. Let Y₁, Y₂, ..., Yₙ be a random sample of size n from the pdf of Question 5.5.6,

    fY(y; θ) = [1/(θʳ(r − 1)!)] yʳ⁻¹ e^(−y/θ),  y > 0

for positive parameter θ and r a known positive integer. Find a sufficient statistic for θ.

5.6.7. Let Y₁, Y₂, ..., Yₙ be a random sample of size n from the pdf fY(y; θ) = θy^(θ−1), 0 ≤ y ≤ 1. Use Theorem 5.6.1 to show that W = Πᵢ₌₁ⁿ Yᵢ is a sufficient estimator for θ. Is the maximum likelihood estimator of θ a function of W?

5.6.8. A probability model gW(w; θ) is said to be expressed in exponential form if it can be written as

    gW(w; θ) = e^(K(w)p(θ) + S(w) + q(θ))

where the range of W is independent of θ. Show that θ̂ = Σ K(Wᵢ) is sufficient for θ.

5.6.9. Write the pdf fY(y; λ) = λe^(−λy), y > 0, in exponential form and deduce a sufficient statistic for λ (see Question 5.6.8). Assume that the data consist of a random sample of size n.

5.6.10. Let Y₁, Y₂, ..., Yₙ be a random sample from a Pareto pdf,

    fY(y; θ) = θ/(1 + y)^(θ+1),  0 < y < ∞;  0 < θ < ∞

Write fY(y; θ) in exponential form and deduce a sufficient statistic for θ (see Question 5.6.8).

5.7 CONSISTENCY

The properties of estimators that we have examined thus far, for example unbiasedness and sufficiency, have assumed that the data consist of a sample of fixed size n. It sometimes makes sense, though, to consider the asymptotic behavior of estimators: We may find, for example, that an estimator possesses a desirable property in the limit that it fails to exhibit for any finite n.

Recall Example 5.4.4, which focused on the maximum likelihood estimator for σ² based on a sample of n Yᵢ's drawn from a normal pdf [that is, σ̂² = (1/n) Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²]. For any finite n, σ̂² is biased:

    E[(1/n) Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²] = [(n − 1)/n] σ²

As n goes to infinity, though, the limit of E(σ̂²) does equal σ², and we say that σ̂² is asymptotically unbiased.

Introduced in this section is a second asymptotic property of an estimator, a property known as consistency. Unlike asymptotic unbiasedness, consistency refers to the shape of the pdf for θ̂ₙ and how that shape changes as a function of n. (To emphasize the fact that the estimator for a parameter is now being viewed as a sequence of estimators, we will write θ̂ₙ in place of θ̂.)

Definition 5.7.1. An estimator θ̂ₙ = h(W₁, W₂, ..., Wₙ) is said to be consistent for θ if it converges in probability to θ, that is, if for all ε > 0,

    lim_{n→∞} P(|θ̂ₙ − θ| < ε) = 1

Comment. To solve certain kinds of sample size problems, it can be helpful to think of Definition 5.7.1 in an epsilon/delta context; that is, θ̂ₙ is consistent for θ if for all ε > 0 and δ > 0, there exists an n(ε, δ) such that

    P(|θ̂ₙ − θ| < ε) > 1 − δ  for n > n(ε, δ)

EXAMPLE 5.7.1

Let Y₁, Y₂, ..., Yₙ be a random sample from the uniform pdf

    fY(y; θ) = 1/θ,  0 ≤ y ≤ θ

and let θ̂ₙ = Ymax. We already know that Ymax is biased for θ, but is it consistent? Recall from Example 5.4.7 that

    P(|θ̂ₙ − θ| < ε) = 1 − [(θ − ε)/θ]ⁿ

Since (θ − ε)/θ < 1, it follows that [(θ − ε)/θ]ⁿ → 0 as n → ∞. Therefore,

    lim_{n→∞} P(|θ̂ₙ − θ| < ε) = 1

proving that θ̂ₙ = Ymax is consistent for θ. Figure 5.7.1 illustrates the convergence of θ̂ₙ.
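The convergence in Example 5.7.1 is easy to tabulate. The sketch below evaluates P(|Ymax − θ| < ε) = 1 − [(θ − ε)/θ]ⁿ for θ = 4 and ε = 0.1 at a few sample sizes:

```python
# Tabulating the convergence in Example 5.7.1: with theta = 4 and
# epsilon = 0.1, P(|Ymax - theta| < epsilon) = 1 - (3.9/4)**n,
# which climbs toward 1 as n grows.
theta, eps = 4.0, 0.1

def prob_within(n):
    return 1 - ((theta - eps) / theta) ** n

for n in (10, 50, 100, 200):
    print(n, round(prob_within(n), 4))
```

The probability is strictly increasing in n, which is the e-neighborhood concentration that the definition of consistency demands.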
As n increases, the shape of f_Ymax(y) changes in such a way that the pdf becomes increasingly concentrated in an ε-neighborhood of θ. For n > n(ε, δ), P(|θ̂ₙ − θ| < ε) > 1 − δ.

[Figure 5.7.1: pdfs of Ymax for increasing n, concentrating near θ.]

If θ, ε, and δ are specified, we can calculate n(ε, δ), the smallest sample size that will enable θ̂ₙ to achieve a given precision. For example, suppose θ = 4. How large a sample is required to give Ymax an 80% chance of lying within 0.10 of θ? In the terminology of the Comment on page 406, ε = 0.10, δ = 0.20, and

    P(|θ̂ₙ − 4| < 0.10) = 1 − [(4 − 0.10)/4]ⁿ ≥ 1 − 0.20

Therefore,

    (0.975)^(n(ε,δ)) = 0.20

which implies that n(ε, δ) = 64.

More generally, a key tool for establishing consistency is Chebyshev's inequality, which appears here as Theorem 5.7.1; the latter provides a bound for the probability that any random variable lies outside an ε-neighborhood of its mean.

Theorem 5.7.1. (Chebyshev's inequality.) Let W be any random variable with mean μ and variance σ². For any ε > 0,

    P(|W − μ| < ε) ≥ 1 − σ²/ε²

or, equivalently,

    P(|W − μ| ≥ ε) ≤ σ²/ε²

Proof. In the continuous case,

    Var(W) = ∫_{−∞}^{∞} (w − μ)² fW(w) dw
           = ∫_{−∞}^{μ−ε} (w − μ)² fW(w) dw + ∫_{μ−ε}^{μ+ε} (w − μ)² fW(w) dw + ∫_{μ+ε}^{∞} (w − μ)² fW(w) dw

Omitting the nonnegative middle integral gives an inequality:

    Var(W) ≥ ∫_{−∞}^{μ−ε} (w − μ)² fW(w) dw + ∫_{μ+ε}^{∞} (w − μ)² fW(w) dw
           = ∫_{|w−μ|≥ε} (w − μ)² fW(w) dw
           ≥ ∫_{|w−μ|≥ε} ε² fW(w) dw
           = ε² P(|W − μ| ≥ ε)

Division by ε² completes the proof. (If the random variable is discrete, replace the integrals with summations.) □

EXAMPLE 5.7.2

Suppose that X₁, X₂, ..., Xₙ is a random sample of size n from a discrete pdf pX(k; μ), where E(X) = μ and Var(X) = σ² < ∞. Let μ̂ₙ = (1/n) Σᵢ₌₁ⁿ Xᵢ. Is μ̂ₙ a consistent estimator for μ?

According to Chebyshev's inequality,

    P(|μ̂ₙ − μ| < ε) ≥ 1 − Var(μ̂ₙ)/ε²

But Var(μ̂ₙ) = Var((1/n) Σ Xᵢ) = σ²/n. For any ε > 0, δ > 0, and σ², an n can be found that makes σ²/(nε²) < δ. Therefore,

    lim_{n→∞} P(|μ̂ₙ − μ| < ε) = 1

and μ̂ₙ is consistent for μ.

Comment.
The fact that the sample mean, μ̂ₙ, is necessarily a consistent estimator for the true mean μ, no matter what pdf the data come from, is often referred to as the weak law of large numbers. It was proved by Chebyshev in 1866.

Comment. We saw in Section 5.6 that one of the reasons justifying the use of the method of maximum likelihood to identify good estimators is the fact that maximum likelihood estimators are necessarily functions of sufficient statistics. As an additional rationale for maximum likelihood estimation, it can be shown under very general conditions that maximum likelihood estimators are consistent.

QUESTIONS

5.7.1. How large a sample must be taken from a normal pdf where E(Y) = 18 in order to guarantee that μ̂ₙ = Ȳₙ = (1/n) Σᵢ₌₁ⁿ Yᵢ has a 90% probability of lying somewhere in the interval [16, 20]? Assume that σ = 5.0.

5.7.2. Let Y₁, Y₂, ..., Yₙ be a random sample of size n from a normal pdf with mean μ. Show that

    Sₙ² = (1/n) Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²

is a consistent estimator for σ² = Var(Y).

5.7.3. Suppose Y₁, Y₂, ..., Yₙ is a random sample from the exponential pdf, fY(y; λ) = λe^(−λy), y > 0.
(a) Show that λ̂ₙ = Y₁ is not consistent for λ.
(b) Show that λ̂ₙ = n · Ymin is not consistent for λ.

5.7.4. An estimator θ̂ₙ is said to be squared-error consistent for θ if lim_{n→∞} E[(θ̂ₙ − θ)²] = 0.
(a) Show that any squared-error consistent θ̂ₙ is asymptotically unbiased (see Question 5.4.15).
(b) Show that any squared-error consistent θ̂ₙ is consistent in the sense of Definition 5.7.1.

5.7.5. Suppose θ̂ₙ = Ymax is to be used as an estimator for the parameter θ in the uniform pdf fY(y; θ) = 1/θ, 0 ≤ y ≤ θ. Show that θ̂ₙ is squared-error consistent (see Question 5.7.4).

5.7.6. If 2n + 1 random observations are drawn from a continuous and symmetric pdf with mean μ, and if fY(μ; μ) ≠ 0, then the sample median is unbiased for μ and has variance approximately equal to 1/(8[fY(μ; μ)]²n) [see (54)]. Show that the sample median is consistent for μ.

5.8 BAYESIAN ESTIMATION

Bayesian analysis is a set of statistical techniques based on ideas deriving from Bayes' Theorem (recall Section 2.4).
In particular, it provides methods for incorporating prior information into the estimation of unknown parameters.

An interesting example of a Bayesian solution to an unusual problem occurred some years ago in the search for a missing nuclear submarine. In the Spring of 1968, the USS Scorpion was on maneuvers with the Sixth Fleet in Mediterranean waters. In May, she was ordered to return to her homeport of Norfolk, Virginia. The last message from the Scorpion was received on May 21, and indicated her position to be about fifty miles south of the Azores, a group of islands eight hundred miles off the coast of Portugal. The Navy presumed that the sub had sunk somewhere along the eastern coast of the United States. A massive search was mounted, but to no avail, and the sub's fate remained a mystery.

Enter John Craven, a Navy expert in deep water exploration, who believed the Scorpion had not been found because it had never reached the eastern seaboard and was still somewhere near the Azores. In setting up a search strategy, Craven divided the area near the Azores into a grid of n squares and solicited the opinions of veteran submarine commanders on the chances of the Scorpion being in each of those squares. Combining their opinions resulted in a set of prior probabilities, P(A₁), P(A₂), ..., P(Aₙ), that the sub had sunk in areas 1, 2, ..., n, respectively.

Suppose P(Aₖ) was the largest of the P(Aᵢ)'s. Then area k would be the first region searched. Let Bₖ be the event that the Scorpion is found when area k is searched, and assume that the sub was certain to be found if the area in which it sank was searched. If the search of area k proved unsuccessful, then, from Theorem 2.4.2, P(Aₖ) becomes an updated probability P*(Aₖ) = 0, and the remaining P(Aᵢ)'s, i ≠ k, can be normalized to form the probabilities

    P*(Aᵢ) = P(Aᵢ)/Σ_{j≠k} P(Aⱼ),  i ≠ k

If P*(Aⱼ) is now the largest of the updated probabilities, then area j would be searched next. If the sub was not found there, a third set of probabilities, P**(A₁), P**(A₂), ..., P**(Aₙ), would be calculated in the same fashion, and the search would continue.

In the Autumn of 1968, the USS Scorpion was, indeed, found near the Azores; the men aboard had perished. Why it sank has never been disclosed.
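The renormalization step described above can be sketched in a few lines. The priors below are hypothetical, and detection is assumed certain in the area actually searched:

```python
# Sketch of the search-update rule described above (hypothetical priors):
# if an area is searched and the sub is not found there, that area's
# probability drops to zero (detection is assumed certain) and the
# remaining probabilities are renormalized.
def update_after_failed_search(priors, k):
    """Return the posterior probabilities P*(A_i) after an
    unsuccessful search of area k."""
    rest = 1 - priors[k]
    return [0.0 if i == k else p / rest for i, p in enumerate(priors)]

priors = [0.5, 0.3, 0.2]            # hypothetical P(A1), P(A2), P(A3)
k = priors.index(max(priors))       # search the most probable area first
posterior = update_after_failed_search(priors, k)
print(posterior)                    # [0.0, 0.6, 0.4]
```

Each failed search repeats the same update on the current posterior, which is exactly the P*, P**, ... sequence of the narrative.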
One theory holds that one of its own torpedoes accidentally exploded; some Cold War conspiracy theorists think it may have been sunk while spying on a group of Soviet ships. Whatever the cause, the strategy of using Bayes' Theorem to update the probabilities of where the Scorpion might have sunk proved to be successful.

Prior Distributions and Posterior Distributions

Conceptually, a major difference between Bayesian analysis and non-Bayesian analysis lies in the assumptions associated with unknown parameters. In a non-Bayesian analysis (which would include all the statistical methodology in this book except the present section), unknown parameters are viewed as constants; in a Bayesian analysis, parameters are treated as random variables, meaning they have a pdf.

At the outset of a Bayesian analysis, the pdf assigned to the parameter may be based on little or no information and is referred to as the prior distribution. As soon as some data are collected, it becomes possible, via Bayes' Theorem, to revise and refine the pdf ascribed to the parameter. Any such updated pdf is referred to as a posterior distribution.

In the search for the USS Scorpion, the unknown parameters were the probabilities of finding the sub in each of the grid areas surrounding the Azores. The prior distribution on those parameters was the set of probabilities P(A₁), P(A₂), ..., P(Aₙ). Each time an area was searched and the sub not found, a posterior distribution was calculated: the first was the set of probabilities P*(A₁), P*(A₂), ..., P*(Aₙ); the second was the set of probabilities P**(A₁), P**(A₂), ..., P**(Aₙ); and so on.

EXAMPLE 5.8.1

Suppose a retailer is interested in modeling the number of calls arriving at a phone bank in a five-minute interval. Section 4.2 established that the Poisson distribution would be the pdf to choose, but what value should be assigned to the Poisson's parameter, λ?

If the rate of calls was constant over a twenty-four-hour period, an estimate for λ
could be calculated by dividing the total number of calls received during a full day by 288, the latter being the number of five-minute intervals in a twenty-four-hour period. If the random variable X denotes the number of calls received during a random five-minute interval, the estimated probability that X = k would be

    pX(k) = e^(−λ̂) λ̂ᵏ/k!,  k = 0, 1, 2, ...

In reality, though, the incoming call rate is not likely to remain constant over an entire twenty-four-hour period. Suppose, in fact, that an examination of telephone logs for the past several months suggests that λ equals ten about three-quarters of the time and eight about one-quarter of the time. In Bayesian terminology, then, the rate parameter is a random variable Λ, and the (discrete) prior distribution for Λ is defined by two probabilities:

    pΛ(8) = P(Λ = 8) = 0.25  and  pΛ(10) = P(Λ = 10) = 0.75

Now, suppose certain facets of the retailer's operation have recently changed (different products to sell, different amounts of advertising, etc.). Those changes may very well affect the distribution associated with the call rate. Updating the prior distribution for Λ requires (a) some data and (b) an application of Bayes' Theorem. Being both frugal and statistically challenged, the retailer decides to construct a posterior distribution for Λ on the basis of a single observation. Suppose that a five-minute interval is chosen at random and the corresponding value of X is found to be seven. How should pΛ(8) and pΛ(10) be revised?

By Bayes' Theorem,

    P(Λ = 10 | X = 7) = P(X = 7 | Λ = 10)P(Λ = 10)/[P(X = 7 | Λ = 8)P(Λ = 8) + P(X = 7 | Λ = 10)P(Λ = 10)]
                      = (0.090)(0.75)/[(0.140)(0.25) + (0.090)(0.75)]
                      = 0.659

which implies that

    P(Λ = 8 | X = 7) = 1 − 0.659 = 0.341

Notice that the posterior distribution for Λ has changed in a way that makes sense intuitively. Initially, P(Λ = 8) was 0.25. Since the data point, x = 7, is more consistent with λ = 8 than with λ = 10, the posterior increased the probability that Λ = 8 (from 0.25 to 0.341) and decreased the probability that Λ = 10 (from 0.75 to 0.659).

Definition 5.8.1.
Definition 5.8.1. Let W be a statistic dependent on a parameter θ. Call its pdf $f_W(w \mid \theta)$. Assume that θ is the value of a random variable Θ, whose prior distribution is denoted $p_\Theta(\theta)$, if Θ is discrete, and $f_\Theta(\theta)$, if Θ is continuous. The posterior distribution of Θ, given that W = w, is the quotient

$$g_\Theta(\theta \mid W = w) = \frac{p_W(w \mid \theta)\, f_\Theta(\theta)}{\int_{-\infty}^{\infty} p_W(w \mid \theta)\, f_\Theta(\theta)\, d\theta} \quad \text{if } W \text{ is discrete}$$

$$g_\Theta(\theta \mid W = w) = \frac{f_W(w \mid \theta)\, f_\Theta(\theta)}{\int_{-\infty}^{\infty} f_W(w \mid \theta)\, f_\Theta(\theta)\, d\theta} \quad \text{if } W \text{ is continuous}$$

Note: If Θ is discrete, call its pdf $p_\Theta(\theta)$ and replace the integrations with summations.

Comment. Definition 5.8.1 can be used to construct a posterior distribution even if no information is available on which to base a prior distribution. In such cases, the uniform pdf is substituted for either $p_\Theta(\theta)$ or $f_\Theta(\theta)$ and is referred to as a noninformative prior.

EXAMPLE 5.8.2
Max, a video game pirate (and Bayesian), is trying to decide how many illegal copies of Zombie Beach Party to have on hand for the upcoming holiday season. To get a rough idea of what the demand might be, he talks with n potential customers and finds that X = k of them would buy a copy for a gift (or for themselves). The obvious choice for a probability model, of course, would be the binomial pdf. Given n potential customers, the model for the familiar probability that k would actually buy one of Max's illegal copies is

$$p_X(k \mid \theta) = \binom{n}{k}\theta^k(1-\theta)^{n-k}, \qquad k = 0, 1, \ldots, n$$

where the maximum likelihood estimate for θ is given by $\theta_e = k/n$.

It may well be, though, that Max has some additional insight about the value of θ on the basis of similar ventures that he illegally marketed in previous years. Suppose he suspects, for example, that the percentage of potential customers who will buy Zombie Beach Party is likely to be between 3% and 4% and will probably not exceed 7%. A reasonable prior distribution for Θ, then, would be a pdf mostly concentrated over the interval 0 to 0.07 with a mean or median in the 0.035 range.
One probability model whose shape would comply with the constraints that Max is imposing is the beta pdf. Written with θ as the random variable, the (two-parameter) beta pdf is given by

$$f_\Theta(\theta) = \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)}\,\theta^{r-1}(1-\theta)^{s-1}, \qquad 0 \le \theta \le 1$$

The beta distribution with r = 2 and s = 4 is pictured in Figure 5.8.1. By choosing different values for r and s, $f_\Theta(\theta)$ can be skewed more sharply to the right or to the left, and the bulk of the distribution can be concentrated close to zero or close to one. The question is, if an appropriate beta pdf is used as a prior distribution for Θ, and if a random sample of k potential customers (out of n) said they would buy the video game, what would be a reasonable posterior distribution for Θ?

[FIGURE 5.8.1: The beta pdf with r = 2 and s = 4.]

Applying Definition 5.8.1 for the case where W (= X) is discrete and Θ is continuous gives

$$g_\Theta(\theta \mid X = k) = \frac{p_X(k \mid \theta)\, f_\Theta(\theta)}{\int_0^1 p_X(k \mid \theta)\, f_\Theta(\theta)\, d\theta}$$

Substituting into the numerator,

$$p_X(k \mid \theta)\, f_\Theta(\theta) = \binom{n}{k}\frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)}\,\theta^{k+r-1}(1-\theta)^{n-k+s-1}$$

Notice that if the r and s in the beta pdf were relabeled k + r and n − k + s, respectively, the equation for $f_\Theta(\theta)$ would contain the variable factor $\theta^{k+r-1}(1-\theta)^{n-k+s-1}$. But those same exponents for θ and (1 − θ) appear in the expression for $g_\Theta(\theta \mid X = k)$. Since there can be only one pdf whose variable factors are $\theta^{k+r-1}(1-\theta)^{n-k+s-1}$ (see Theorem 4.6.4), it follows that $g_\Theta(\theta \mid X = k)$ is a beta pdf with parameters k + r and n − k + s.

The final step in the construction of a posterior distribution for Θ is to choose values for r and s that would produce a (prior) distribution with the configuration described on p. 413, that is, with a mean or median at 0.035 and the bulk of the distribution between 0 and 0.07. It can be shown (see (92)) that the expected value of a beta pdf is r/(r + s). Setting that quotient equal to 0.035 implies that s ≈ 28r. By trial and error with a calculator that can integrate a beta pdf, the values r = 4 and s = 102 are found to yield an $f_\Theta(\theta)$ having almost all of its area to the left of 0.07.
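The trial-and-error step can be automated with a numerical integration of the beta pdf; a minimal sketch, using the prior parameters r = 4 and s = 102 from the text:

```python
from math import gamma

def beta_pdf(theta, r, s):
    """Two-parameter beta density on [0, 1]."""
    return gamma(r + s) / (gamma(r) * gamma(s)) * theta**(r - 1) * (1 - theta)**(s - 1)

def beta_cdf(x, r, s, steps=100_000):
    """P(Theta <= x) by the trapezoid rule."""
    h = x / steps
    total = 0.5 * (beta_pdf(0.0, r, s) + beta_pdf(x, r, s))
    for i in range(1, steps):
        total += beta_pdf(i * h, r, s)
    return total * h

r, s = 4, 102
mass_below_007 = beta_cdf(0.07, r, s)
print(round(mass_below_007, 2))   # 0.94 -- most of the prior's area lies left of 0.07
```

About 94% of the Beta(4, 102) prior sits below 0.07, which is what "almost all of its area to the left of 0.07" asks for; trying other (r, s) pairs in the same loop is how the trial-and-error search would proceed.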
Substituting those values for r and s into $g_\Theta(\theta \mid X = k)$ gives the completed posterior distribution:

$$g_\Theta(\theta \mid X = k) = \frac{\Gamma(n+106)}{\Gamma(k+4)\,\Gamma(n-k+102)}\,\theta^{k+4-1}(1-\theta)^{n-k+102-1} = \frac{(n+105)!}{(k+3)!\,(n-k+101)!}\,\theta^{k+3}(1-\theta)^{n-k+101}$$

EXAMPLE 5.8.3
Certain prior distributions "fit" especially well with certain parameters, in the sense that the resulting posterior distributions are easy to work with. Example 5.8.2 was a case in point: assigning a beta prior distribution to the parameter in a binomial pdf leads to a beta posterior distribution. A similar relationship holds if a gamma pdf is used as the prior distribution for the parameter θ in a Poisson model.

Suppose $X_1, \ldots, X_n$ denotes a random sample from the Poisson pdf, $p_X(k \mid \theta) = \theta^k e^{-\theta}/k!$, k = 0, 1, .... Let $W = \sum_{i=1}^{n} X_i$. By Theorem 3.12.10, W has a Poisson distribution,

$$p_W(w \mid \theta) = \frac{e^{-n\theta}(n\theta)^w}{w!}, \qquad w = 0, 1, 2, \ldots$$

Let the gamma pdf

$$f_\Theta(\theta) = \frac{\mu^s}{\Gamma(s)}\,\theta^{s-1}e^{-\mu\theta}, \qquad 0 < \theta < \infty$$

be the prior distribution assigned to Θ. Then

$$g_\Theta(\theta \mid W = w) = \frac{p_W(w \mid \theta)\, f_\Theta(\theta)}{\int_0^\infty p_W(w \mid \theta)\, f_\Theta(\theta)\, d\theta}$$

where

$$p_W(w \mid \theta)\, f_\Theta(\theta) = \frac{e^{-n\theta}(n\theta)^w}{w!}\cdot\frac{\mu^s}{\Gamma(s)}\,\theta^{s-1}e^{-\mu\theta} = \frac{n^w\mu^s}{w!\,\Gamma(s)}\,\theta^{w+s-1}e^{-(\mu+n)\theta}$$

Using the same argument that simplified the calculation of the beta posterior distribution, we can write $g_\Theta(\theta \mid W = w)$ by inspection: there is only one pdf having the variable factor $\theta^{w+s-1}e^{-(\mu+n)\theta}$, namely, the gamma pdf with parameters w + s and μ + n. It follows, then, that

$$g_\Theta(\theta \mid W = w) = \frac{(\mu+n)^{w+s}}{\Gamma(w+s)}\,\theta^{w+s-1}e^{-(\mu+n)\theta}$$

CASE STUDY 5.8.1
Predicting the annual number of hurricanes that will hit the U.S. mainland is a problem receiving a great deal of public attention, given the disastrous summer of 2004 when four major hurricanes struck Florida, causing billions of dollars of damage and several mass evacuations. For all the reasons discussed in Section 4.2, the obvious pdf for modeling the number of hurricanes reaching the mainland is the Poisson, where the unknown parameter θ would be the expected number in a given year. Table 5.8.1 shows the numbers of hurricanes that actually did come ashore for three fifty-year periods. Use that information to construct a posterior distribution for θ. Assume that the prior distribution is a gamma pdf.
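Before turning to the hurricane data, the conjugacy result of Example 5.8.3 can be sanity-checked numerically: normalizing the Poisson likelihood times the gamma prior by brute-force integration should reproduce the Gamma(w + s, μ + n) density. A minimal sketch; the values of w, n, s, μ below are illustrative assumptions, not from the text:

```python
from math import exp, gamma

def gamma_pdf(theta, shape, rate):
    """Gamma density: rate^shape / Gamma(shape) * theta^(shape-1) * exp(-rate*theta)."""
    return rate**shape / gamma(shape) * theta**(shape - 1) * exp(-rate * theta)

def posterior_numeric(theta, w, n, s, mu, upper=60.0, steps=200_000):
    """g(theta | W = w) computed directly from Definition 5.8.1 (midpoint rule)."""
    def joint(t):
        # Poisson(w | n*t) likelihood (up to the constant w!) times the gamma(s, mu) prior
        return exp(-n * t) * (n * t)**w * gamma_pdf(t, s, mu)
    h = upper / steps
    norm = sum(joint((i + 0.5) * h) for i in range(steps)) * h
    return joint(theta) / norm

# Illustrative values (assumptions): W = 6 from n = 4 observations, prior Gamma(2, 1)
w, n, s, mu = 6, 4, 2.0, 1.0
theta = 1.3
closed_form = gamma_pdf(theta, w + s, mu + n)   # the claimed Gamma(w + s, mu + n) posterior
assert abs(posterior_numeric(theta, w, n, s, mu) - closed_form) < 1e-6
```

The constant w! cancels in the quotient, which is why the `joint` helper can omit it; only factors involving θ matter, exactly the "by inspection" argument in the text.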
TABLE 5.8.1

    Years        Number of Hurricanes
    1851-1900    88
    1901-1950    92
    1951-2000    72

Not surprisingly, meteorologists consider the data from the earliest period, 1851 to 1900, to be the least reliable. Those eighty-eight hurricanes, then, will be used to formulate the prior distribution. Let

$$f_\Theta(\theta) = \frac{\mu^s}{\Gamma(s)}\,\theta^{s-1}e^{-\mu\theta}, \qquad 0 < \theta < \infty$$

Recall from Theorem 4.6.3 that for a gamma pdf, $E(\Theta) = s/\mu$. For the years from 1851 to 1900, though, the sample average number of hurricanes per year was 88/50. Setting the latter equal to $E(\Theta)$ allows s = 88 and μ = 50 to be assigned to the gamma's parameters. That is, we can take the prior distribution to be

$$f_\Theta(\theta) = \frac{50^{88}}{\Gamma(88)}\,\theta^{87}e^{-50\theta}, \qquad 0 < \theta < \infty$$

Also, the posterior distribution given at the end of Example 5.8.3 becomes

$$g_\Theta(\theta \mid W = w) = \frac{(50+n)^{w+88}}{\Gamma(w+88)}\,\theta^{w+87}e^{-(50+n)\theta}$$

The data, then, to incorporate into the posterior distribution would be the fact that w = 92 + 72 = 164 hurricanes occurred over the most recent n = 100 years included in the database. Therefore, the posterior distribution for Θ is the gamma pdf with parameters w + 88 = 252 and 50 + n = 150:

$$g_\Theta(\theta \mid W = 164) = \frac{150^{252}}{\Gamma(252)}\,\theta^{251}e^{-150\theta}, \qquad 0 < \theta < \infty$$

EXAMPLE 5.8.4
In the examples seen thus far, the joint pdf $g_{W,\Theta}(w, \theta) = p_W(w \mid \theta)f_\Theta(\theta)$ of a statistic W and a parameter Θ (with a prior distribution $f_\Theta(\theta)$) was the starting point in finding the posterior distribution of Θ. For some applications, though, the objective is not to derive $g_\Theta(\theta \mid W = w)$ but, rather, to find the marginal pdf of W.

For instance, suppose a sample of size n = 1 is drawn from a Poisson pdf, $p_W(w \mid \theta) = e^{-\theta}\theta^w/w!$, w = 0, 1, ..., where the prior distribution is the gamma pdf, $f_\Theta(\theta) = \frac{\mu^s}{\Gamma(s)}\theta^{s-1}e^{-\mu\theta}$. According to Example 5.8.3,

$$g_{W,\Theta}(w, \theta) = p_W(w \mid \theta)f_\Theta(\theta) = \frac{1}{w!}\cdot\frac{\mu^s}{\Gamma(s)}\,\theta^{w+s-1}e^{-(\mu+1)\theta}$$

What is the corresponding marginal pdf of W, that is, $p_W(w)$? Recall Theorem 3.7.2. Integrating the joint pdf of W and Θ over θ gives

$$p_W(w) = \int_0^\infty g_{W,\Theta}(w, \theta)\,d\theta = \frac{1}{w!}\cdot\frac{\mu^s}{\Gamma(s)}\int_0^\infty \theta^{w+s-1}e^{-(\mu+1)\theta}\,d\theta = \frac{\Gamma(w+s)}{w!\,\Gamma(s)}\left(\frac{\mu}{\mu+1}\right)^s\left(\frac{1}{\mu+1}\right)^w$$

But $\frac{\Gamma(w+s)}{w!\,\Gamma(s)} = \binom{w+s-1}{w}$. Finally, let $p = \mu/(\mu+1)$, so $1 - p = 1/(\mu+1)$, and the marginal pdf reduces to

$$p_W(w) = \binom{w+s-1}{w}p^s(1-p)^w, \qquad w = 0, 1, 2, \ldots$$

a negative binomial distribution with parameters s and p (see Question 4.5.6).

CASE STUDY 5.8.2
Psychologists use a special coordination test for studying a person's likelihood of making manual errors. For any given person, the number of such errors made on the test is known to follow a Poisson distribution with some particular value for the rate parameter, θ. But as we all know (from watching the clumsy people around us who spill things and get in our way), θ varies considerably from person to person. Suppose, in fact, that the variability in θ can be described by a gamma pdf. If so, the marginal pdf of the number of errors made by a randomly chosen individual should have a negative binomial distribution (according to Example 5.8.4).

Columns 1 and 2 of Table 5.8.2 show the number of errors made on the coordination test by a sample of 504 subjects: 82 made zero errors, 57 made one error, and so on. To know whether those responses can be adequately modeled by a negative binomial distribution requires that the parameters s and p be estimated. That can be done by choosing a value for s and solving for p. By trial and error, the entries shown in Column 3 were based on a negative binomial pdf for which s = 0.8 and p = (504)(0.8)/3821 = 0.106. Clearly, the model fits exceptionally well, which supports the analysis carried out in Example 5.8.4.

TABLE 5.8.2

    Number of Errors, w    Observed Frequency    Negative Binomial Predicted Frequency
    0                      82                    79.2
    1                      57                    57.1
    2                      46                    46.3
    ...                    ...                   ...
    Total                  504                   504.0
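The identity derived in Example 5.8.4, that integrating the Poisson-gamma joint pdf over θ yields a negative binomial pdf, can be spot-checked numerically. A minimal sketch; the parameter values s = 2, μ = 3 are illustrative assumptions, not from the text:

```python
from math import exp, gamma, factorial

def joint(w, theta, s, mu):
    """g_{W,Theta}(w, theta) = Poisson(w | theta) times the gamma(s, mu) prior."""
    poisson = exp(-theta) * theta**w / factorial(w)
    prior = mu**s / gamma(s) * theta**(s - 1) * exp(-mu * theta)
    return poisson * prior

def marginal_numeric(w, s, mu, upper=50.0, steps=100_000):
    """p_W(w) obtained by integrating the joint pdf over theta (midpoint rule)."""
    h = upper / steps
    return sum(joint(w, (i + 0.5) * h, s, mu) for i in range(steps)) * h

def neg_binomial(w, s, mu):
    """Closed form: C(w+s-1, w) p^s (1-p)^w with p = mu/(mu+1)."""
    p = mu / (mu + 1)
    coeff = gamma(w + s) / (factorial(w) * gamma(s))
    return coeff * p**s * (1 - p)**w

s, mu = 2, 3   # illustrative prior parameters (assumptions)
for w in range(5):
    assert abs(marginal_numeric(w, s, mu) - neg_binomial(w, s, mu)) < 1e-6
```

The `coeff` line uses the gamma-function form of the binomial coefficient, which is what lets the negative binomial accommodate non-integer shape values such as the s = 0.8 fitted in Case Study 5.8.2.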
Fundamental to the philosophy of Bayesian analysis is the notion that all relevant information about an unknown parameter, θ, is encoded in the parameter's posterior distribution, $g_\Theta(\theta \mid W = w)$. Given that premise, an obvious question arises: How can $g_\Theta(\theta \mid W = w)$ be used to calculate an appropriate point estimator, $\hat{\theta}$? One approach, similar to using the likelihood function to find a maximum likelihood estimator, is to differentiate the posterior distribution, in which case the value for which $dg_\Theta(\theta \mid W = w)/d\theta = 0$, that is, the mode, becomes $\hat{\theta}$.

For theoretical reasons, though, a method much preferred by Bayesians is to use some key ideas from decision theory as a basis for identifying a reasonable $\hat{\theta}$. In particular, estimates are chosen to minimize the risk associated with $\hat{\theta}$, where the risk is the expected value of the loss incurred by the error in the estimate. Presumably, as $\hat{\theta}$ gets further away from θ, that is, as the estimation error gets larger, the loss associated with $\hat{\theta}$ will increase.

Definition 5.8.2. Let $\hat{\theta}$ be an estimator for θ based on a statistic W. The loss function associated with $\hat{\theta}$ is denoted $L(\hat{\theta}, \theta)$, where $L(\hat{\theta}, \theta) \ge 0$ and $L(\theta, \theta) = 0$.

EXAMPLE 5.8.5
It is typically the case that quantifying in any meaningful way the consequences, economic or otherwise, of $\hat{\theta}$ not being equal to θ is all but impossible. The "generic" loss functions used in those situations are chosen primarily for their mathematical convenience. Two of the most frequently used are $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$ and $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$.

Sometimes, though, the context in which a parameter is being estimated does allow for a loss function to be defined in a very specific and relevant way. Consider the inventory dilemma facing Max, the video game pirate whose illegal activities were described in Example 5.8.2. The unknown parameter in question there was θ, the proportion of the n potential customers who would purchase a copy of Zombie Beach Party. Suppose Max decides, for whatever reasons, to estimate θ with $\hat{\theta}$. As a consequence, it would follow that he should have $n\hat{\theta}$ copies of the video game available. That being the case, what would the corresponding loss function be?
Here, the implications of $\hat{\theta}$ not being equal to θ are readily quantifiable. If $\hat{\theta} < \theta$, sales will be lost (at a cost of, say, \$c per video). On the other hand, if $\hat{\theta} > \theta$, there will be $n(\hat{\theta} - \theta)$ unsold videos, each of which would incur a cost of, say, \$d per unit. The loss function that applies to Max's situation, then, is clearly defined:

$$L(\hat{\theta}, \theta) = \begin{cases} \$cn(\theta - \hat{\theta}) & \text{if } \hat{\theta} < \theta \\ \$dn(\hat{\theta} - \theta) & \text{if } \hat{\theta} > \theta \end{cases}$$

Definition 5.8.3. Let $L(\hat{\theta}, \theta)$ be the loss function associated with an estimate $\hat{\theta}$ of the parameter θ, and let $g_\Theta(\theta \mid W = w)$ be the posterior distribution of the random variable Θ. Then the risk associated with $\hat{\theta}$ is the expected value of the loss function with respect to the posterior distribution of Θ:

$$\text{risk} = \begin{cases} \int L(\hat{\theta}, \theta)\,g_\Theta(\theta \mid W = w)\,d\theta & \text{if } \Theta \text{ is continuous} \\ \sum_{\text{all }\theta} L(\hat{\theta}, \theta)\,g_\Theta(\theta \mid W = w) & \text{if } \Theta \text{ is discrete} \end{cases}$$

Using the Risk Function to Find $\hat{\theta}$

Given that the risk function represents the expected loss associated with the estimator $\hat{\theta}$, it makes sense to look for the $\hat{\theta}$ that minimizes the risk. Any $\hat{\theta}$ that achieves that objective is said to be a Bayes estimate. In general, finding the Bayes estimate requires solving $d(\text{risk})/d\hat{\theta} = 0$. For two of the most frequently used loss functions, though, $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$ and $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$, there is a much easier way to calculate $\hat{\theta}$.

Theorem 5.8.1. Let $g_\Theta(\theta \mid W = w)$ be the posterior distribution for the parameter θ.

a. If the loss function associated with $\hat{\theta}$ is $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$, then the Bayes estimate for θ is the median of $g_\Theta(\theta \mid W = w)$.

b. If the loss function associated with $\hat{\theta}$ is $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$, then the Bayes estimate for θ is the mean of $g_\Theta(\theta \mid W = w)$.

Proof.
a. The proof follows from a result about the expected value of a random variable. The fact that the pdf in the expectation here is a posterior distribution is irrelevant. The derivation will be given for a continuous random variable (having a finite expected value); the argument for the discrete case is similar. Let $f_W(w)$ be the pdf for the random variable W, where the median of W is m.
Then

$$E(|W - m|) = \int_{-\infty}^{\infty}|w - m|f_W(w)\,dw = \int_{-\infty}^{m}(m - w)f_W(w)\,dw + \int_{m}^{\infty}(w - m)f_W(w)\,dw$$

$$= m\int_{-\infty}^{m}f_W(w)\,dw - \int_{-\infty}^{m}w f_W(w)\,dw + \int_{m}^{\infty}w f_W(w)\,dw - m\int_{m}^{\infty}f_W(w)\,dw$$

By definition of the median, $\int_{-\infty}^{m}f_W(w)\,dw = \int_{m}^{\infty}f_W(w)\,dw = \frac{1}{2}$, so the first and last terms are equal and cancel, leaving

$$E(|W - m|) = -\int_{-\infty}^{m}w f_W(w)\,dw + \int_{m}^{\infty}w f_W(w)\,dw$$

Now, suppose $m \ge 0$ (the proof for negative m is similar). Splitting the first integral into two parts gives

$$E(|W - m|) = -\int_{-\infty}^{0}w f_W(w)\,dw - \int_{0}^{m}w f_W(w)\,dw + \int_{m}^{\infty}w f_W(w)\,dw$$

Notice that the middle integral is positive, so changing its negative sign to a plus implies that

$$E(|W - m|) \le -\int_{-\infty}^{0}w f_W(w)\,dw + \int_{0}^{m}w f_W(w)\,dw + \int_{m}^{\infty}w f_W(w)\,dw = -\int_{-\infty}^{0}w f_W(w)\,dw + \int_{0}^{\infty}w f_W(w)\,dw$$

Therefore,

$$E(|W - m|) \le E(|W|) \qquad (5.8.1)$$

Finally, suppose b is any constant. Then

$$\tfrac{1}{2} = P(W \le m) = P(W - b \le m - b)$$

showing that m − b is the median of the random variable W − b. Applying Equation 5.8.1 to W − b gives

$$E(|W - m|) = E(|(W - b) - (m - b)|) \le E(|W - b|)$$

which implies that the median of $g_\Theta(\theta \mid W = w)$ is the Bayes estimate for θ when $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$.

b. Let W be any random variable whose mean is μ and whose variance is finite, and let b be any constant. Then

$$E[(W - b)^2] = E[((W - \mu) + (\mu - b))^2] = \mathrm{Var}(W) + 2(\mu - b)E(W - \mu) + (\mu - b)^2 = \mathrm{Var}(W) + 0 + (\mu - b)^2 \ge \mathrm{Var}(W)$$

implying that $E[(W - b)^2]$ is minimized when $b = \mu$. It follows that the Bayes estimate for θ, given a quadratic loss function, is the mean of the posterior distribution. ∎

EXAMPLE 5.8.6
Recall Example 5.8.3, where the parameter θ in a Poisson distribution is assumed to have a gamma prior distribution. For a random sample of size n, where $W = \sum_{i=1}^{n}X_i$,

$$p_W(w \mid \theta) = \frac{e^{-n\theta}(n\theta)^w}{w!}, \quad w = 0, 1, 2, \ldots \qquad \text{and} \qquad f_\Theta(\theta) = \frac{\mu^s}{\Gamma(s)}\,\theta^{s-1}e^{-\mu\theta}$$

which resulted in the posterior distribution being a gamma pdf with parameters w + s and μ + n. Suppose the loss function associated with $\hat{\theta}$ is quadratic, $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$. By part (b) of Theorem 5.8.1, the Bayes estimate for θ is the mean of the posterior distribution. From Theorem 4.6.3, though, the mean of $g_\Theta(\theta \mid W = w)$ is $(w + s)/(\mu + n)$. Notice that

$$\frac{w+s}{\mu+n} = \frac{n}{\mu+n}\cdot\frac{w}{n} + \frac{\mu}{\mu+n}\cdot\frac{s}{\mu}$$

which shows that the Bayes estimate is a weighted average of $\frac{w}{n}$, the maximum likelihood estimate for θ, and $\frac{s}{\mu}$, the mean of the prior distribution. Moreover, as n gets large, the Bayes estimate converges to the maximum likelihood estimate.

QUESTIONS

5.8.1. Suppose that X is a geometric random variable, where $p_X(k \mid \theta) = (1 - \theta)^{k-1}\theta$, k = 1, 2, .... Assume that the prior distribution for θ is the beta pdf with parameters r and s. Find the posterior distribution for θ.

5.8.2. Find the squared-error loss ($L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$) Bayes estimate for θ in Example 5.8.2 and express it as a weighted average of the maximum likelihood estimate for θ and the mean of the prior pdf.

5.8.3. Suppose the binomial pdf described in Example 5.8.2 refers to the number of votes a candidate might receive in a poll conducted before the general election. Moreover, suppose a beta prior distribution has been assigned to θ, and every indicator suggests that the election will be very close. The pollster, then, has good reason for concentrating the bulk of the prior distribution around the value θ = 1/2. Setting the two beta parameters r and s both equal to 135 will accomplish that objective (in the event r = s = 135, the probability of θ being between 0.45 and 0.55 is approximately 0.90).
(a) Find the corresponding posterior distribution.
(b) Find the squared-error loss Bayes estimate for θ and express it as a weighted average of the maximum likelihood estimate for θ and the mean of the prior pdf.

5.8.4. What is the squared-error loss Bayes estimate for the parameter θ in a binomial pdf, where θ has a uniform prior distribution, that is, a noninformative prior? (Recall that a uniform prior is a beta pdf for which r = s = 1.)

5.8.5. In Questions 5.8.2-5.8.4, is the Bayes estimate unbiased? Is it asymptotically unbiased?

5.8.6. Suppose that Y is a gamma random variable with parameters r and θ and the prior distribution for θ is also gamma with parameters s and μ. Show that the posterior pdf is gamma with parameters r + s and y + μ.

5.8.7. Let Y_1, Y_2, ..
., Y_n be a random sample from a gamma pdf with parameters r and θ, where the prior distribution assigned to θ is the gamma pdf with parameters s and μ. Let $W = Y_1 + Y_2 + \cdots + Y_n$. Find the posterior pdf for θ.

5.8.8. Find the squared-error loss Bayes estimate for θ in Question 5.8.7.

5.8.9. Consider, again, the scenario described in Example 5.8.2. A binomial random variable X has parameters n and θ, where the latter has a beta prior with integer parameters r and s. Integrate the joint pdf $p_X(k \mid \theta)f_\Theta(\theta)$ with respect to θ to show that the marginal pdf of X is given by

$$p_X(k) = \binom{n}{k}\frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)}\cdot\frac{\Gamma(k+r)\,\Gamma(n-k+s)}{\Gamma(n+r+s)}, \qquad k = 0, 1, \ldots, n$$

5.9 TAKING A SECOND LOOK AT STATISTICS (REVISITING THE MARGIN OF ERROR)

The margin of error, d, was introduced in Definition 5.3.1 as being half the width of the largest 95% confidence interval for the binomial parameter p. As such, it serves as a useful measure of the sampling variation associated with the estimator

$$\hat{p} = \frac{X}{n}$$

where the random variable X is the number of successes observed in n trials. That is, we would expect in the long run that at least 95% of the intervals $(p_e - d, p_e + d)$ would contain the true p, where $p_e$ is the observed proportion of successes.

Unfortunately, the meaning of the margin of error and how it should be interpreted have been distorted by the popular press. The mistake that is made is particularly prevalent, and most egregious, in the context of political polls. Here is what happens: A poll (based on a sample of n voters) is conducted, showing, for example, that 52% of the respondents intend to support Candidate A and 48%, Candidate B. Moreover, the margin of error, based on the sample of size n, is (correctly) reported to be, say, 5%. What often comes next is a statement that the race is a "statistical tie" or a "statistical dead heat" because the difference between the two percentages, 52% − 48% = 4%, is within the 5% margin of error. Is that statement correct? No. Is it even close to being true? No.
If the observed difference in the percentages supporting Candidate A and Candidate B is 4% and the margin of error is 5%, then the widest possible 95% confidence interval for p, the true difference between the two percentages, would be

$$(4\% - 5\%,\ 4\% + 5\%) = (-1\%,\ 9\%)$$

The interval implies that we should not rule out the possibility that the true value for p could be as small as −1% (in which case Candidate B would win a tight race) or as large as +9% (in which case Candidate A would win in a landslide). The error in the "statistical tie" terminology is the implication that all the possible values from −1% to +9% are equally likely. That is simply not true: for every confidence interval, parameter values near the center are much more believable than those near either the left-hand or right-hand endpoints. Here, a 4% lead for Candidate A in a poll that has a 5% margin of error is not a "tie"; quite the contrary, it would more properly be interpreted as almost a guarantee that Candidate A will win.

There is yet a more fundamental problem in using the margin of error as a measure of the day-to-day or week-to-week variation in political polls. By definition, the margin of error refers to sampling variability; that is, it reflects the extent to which $p_e = x/n$ varies if repeated samples of size n are drawn from the same population. Consecutive political polls, though, are not samples from the same population. Between one poll and the next, a variety of events can occur that can fundamentally change the public's perception of the candidates: a candidate may give an especially good speech or an embarrassing one; a scandal can erupt that damages someone's credibility; or a world event comes to pass that for one reason or another reflects more on one candidate than the other. Although all these possibilities have the potential to influence the value of X/n much more than sampling variability can, none of them is included in the margin of error.

APPENDIX 5.A.1 MINITAB APPLICATIONS

The ability to generate random observations from many of the standard distributions makes computers very effective in illustrating estimation concepts. Recall Table 5.4.1, showing a simulation consisting of forty samples of size two drawn from the pdf $f_Y(y; \theta) = \frac{1}{\theta}e^{-y/\theta}$, y > 0, and calculating for each sample (in column C4) the corresponding estimate $\theta_e$. As a demonstration of unbiasedness, it was pointed out that the average of those forty $\theta_e$'s is 1.02, a number quite close to 1.00, the theoretical value of θ.

The meaning of confidence intervals can also be nicely illustrated using MINITAB's RANDOM command. Evaluating the formulas for confidence intervals is straightforward, but calling attention to their variability from sample to sample is best accomplished with a Monte Carlo analysis. Example 5.3.1 is a case in point. The fifty simulated intervals displayed in Figure 5.3.1 reinforce the interpretation that should be given to any particular evaluation of

$$\left(\bar{y} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{y} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

The sampling distributions of estimators, and some of their important properties, can also be examined using the computer. Recall the serial number analysis described in Case Study 5.4.1. If the production numbers to be estimated are large, then the assumption that the captured serial numbers represent a sample from a discrete uniform pdf can be replaced by the assumption that they represent a sample from the (easier-to-work-with) continuous uniform pdf defined over the interval [0, θ]. Two unbiased estimators for the uniform parameter θ are

$$\hat{\theta}_1 = \frac{2}{n}\sum_{i=1}^{n}Y_i \qquad \text{and} \qquad \hat{\theta}_2 = \frac{n+1}{n}\,Y_{\max}$$

(recall Example 5.4.2). A follow-up analysis in Example 5.4.7 showed that $\hat{\theta}_2$ is the better of the two estimators. But suppose the complexity of the two unbiased estimators precluded that calculation. How would we decide which one to use? Probably the simplest solution would be to simulate each one's sampling distribution and compare their sample standard deviations. Figures 5.A.1.1 and 5.A.1.2 illustrate that technique on the two estimators. Suppose that n = 5 serial numbers have been "captured" and the true value for θ is 3400.

Figure 5.A.1.1 shows the MINITAB syntax for generating two hundred samples of size five from $f_Y(y; \theta) = 1/3400$, $0 \le y \le 3400$, and calculating $\hat{\theta}_1$:

    MTB > random 200 c1-c5;
    SUBC> uniform 0 3400.
    MTB > rmean c1-c5 c6
    MTB > let c7 = 2*c6
    MTB > histogram c7;
    SUBC> start 2800;
    SUBC> increment 200.
    MTB > describe c7

[FIGURE 5.A.1.1: histogram of the two hundred $\hat{\theta}_1$ estimates; the DESCRIBE output shows N = 200, MEAN = 3383.8, MEDIAN = 3418.3, STDEV = 913.2.]

The DESCRIBE command shows that the average of the $\hat{\theta}_1$ estimates is 3383.8 and the sample standard deviation of the two hundred estimates is 913.2. In contrast, Figure 5.A.1.2 details a similar simulation (two hundred samples, each of size five) for the estimator $\hat{\theta}_2$:

    MTB > random 200 c1-c5;
    SUBC> uniform 0 3400.
    MTB > rmaximum c1-c5 c6
    MTB > let c7 = (6/5)*c6
    MTB > histogram c7;
    SUBC> start 2800;
    SUBC> increment 200.
    MTB > describe c7

[FIGURE 5.A.1.2: histogram of the two hundred $\hat{\theta}_2$ estimates; the DESCRIBE output shows N = 200, MEAN = 3398.4, MEDIAN = 3604.6, STDEV = 563.9.]

The accompanying DESCRIBE output lends support to the claim that $\hat{\theta}_2$ is the better estimator: it shows the average $\theta_e$ to be closer to the true value of 3400 than the average $\theta_e$ calculated from $\hat{\theta}_1$ (3398.4 versus 3383.8), and its sample standard deviation is smaller than the sample standard deviation of the $\theta_e$'s from $\hat{\theta}_1$ (563.9 versus 913.2).
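The same Monte Carlo comparison can be run outside MINITAB; a minimal Python sketch of the two-hundred-sample experiment (the random seed is an arbitrary choice):

```python
import random
import statistics

random.seed(1)
theta, n, trials = 3400, 5, 200

est1, est2 = [], []
for _ in range(trials):
    sample = [random.uniform(0, theta) for _ in range(n)]
    est1.append(2 * statistics.mean(sample))       # theta1-hat = (2/n) * sum(Y_i)
    est2.append((n + 1) / n * max(sample))         # theta2-hat = ((n+1)/n) * Y_max

# Both estimators are unbiased, so both averages should be near 3400,
# but theta2-hat should show the visibly smaller spread.
print(round(statistics.mean(est1)), round(statistics.stdev(est1)))
print(round(statistics.mean(est2)), round(statistics.stdev(est2)))
```

The theoretical standard deviations are $\theta/\sqrt{15} \approx 878$ for $\hat{\theta}_1$ and $\theta/\sqrt{35} \approx 575$ for $\hat{\theta}_2$ when n = 5, so the gap seen in any seeded run of this sketch mirrors the MINITAB output above.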
CHAPTER 6
Hypothesis Testing

6.1 INTRODUCTION
6.2 THE DECISION RULE
6.3 TESTING BINOMIAL DATA-H0: p = p0
6.4 TYPE I AND TYPE II ERRORS
6.5 A NOTION OF OPTIMALITY: THE GENERALIZED LIKELIHOOD RATIO
6.6 TAKING A SECOND LOOK AT STATISTICS (STATISTICAL SIGNIFICANCE VERSUS "PRACTICAL" SIGNIFICANCE)

Pierre-Simon, Marquis de Laplace

As a young man, Laplace went to Paris to seek his fortune as a mathematician, disregarding his father's wish that he enter the clergy. He soon became a protege of d'Alembert and at the age of twenty-four was elected to the Academy of Sciences. Laplace was recognized as one of the leading figures of that group for his work in physics, celestial mechanics, and pure mathematics. He also enjoyed some political prestige, and his friend, Napoleon Bonaparte, made him Minister of the Interior for a brief period. With the restoration of the Bourbon monarchy, Laplace renounced Napoleon for Louis XVIII, who made him a marquis.
-Pierre-Simon, Marquis de Laplace (1749-1827)

6.1 INTRODUCTION

Statistical inferences, as we saw in Chapter 5, often reduce to numerical estimates of parameters, either in the form of single points or as confidence intervals. But not always. In many experimental situations, the conclusion to be drawn is not numerical and is more aptly phrased as a choice between two conflicting theories, or hypotheses. A court psychiatrist, for example, may be called upon to pronounce an accused murderer either "sane" or "insane"; the FDA must decide whether a new vaccine is "effective" or "ineffective"; a geneticist concludes that the inheritance of color in a certain strain of Drosophila melanogaster either "does" or "does not" follow classical Mendelian principles. In this chapter we examine the statistical methodology, and the attendant consequences, involved in making decisions of this sort.

The process of dichotomizing the possible conclusions of an experiment and then using the theory of probability to choose one option over the other is known as hypothesis testing.
The two propositions are called the null hypothesis (written $H_0$) and the alternative hypothesis (written $H_1$). How we go about choosing between $H_0$ and $H_1$ is conceptually similar to the way a jury deliberates in a court trial. The null hypothesis is analogous to the defendant: just as the latter is presumed innocent until "proven" guilty, so is the null hypothesis "accepted" unless the data argue overwhelmingly to the contrary. Mathematically, choosing between $H_0$ and $H_1$ is an exercise in applying courtroom protocol to situations where the evidence takes the form of measurements made on random variables. Chapter 6 focuses on basic principles, in particular, on the probabilistic structure that underlies the decision process. Most of the important specific applications of hypothesis testing will be taken up beginning in Chapter 7.

6.2 THE DECISION RULE

We will introduce the basic concepts of hypothesis testing with an example. Imagine an automobile company looking for additives that might increase gas mileage. As a pilot study, they send thirty cars fueled with a new additive on a road trip from Boston to Los Angeles. Without the additive, those same cars are known to average 25.0 mpg with a standard deviation (σ) of 2.4 mpg. Suppose it turns out that the thirty cars averaged $\bar{y} = 26.3$ mpg with the additive. What should the company conclude?

If the additive is effective but the position is taken that the increase from 25.0 to 26.3 is due solely to chance, the company will have passed up a potentially lucrative product. On the other hand, if the additive is worthless but the firm interprets the mileage increase as "proof" that the additive works, time and money will ultimately be wasted developing a product that has no intrinsic value.

In practice, we would assess the increase from 25.0 mpg to 26.3 mpg by framing the company's choices in the context of the courtroom analogy mentioned in Section 6.1. The null hypothesis, which is typically a statement reflecting the status quo, would be the assertion that the additive has no effect; the alternative hypothesis would claim that the additive does work.
By agreement, we give $H_0$ (like the defendant) the benefit of the doubt. If the road trip average, then, is "close" to 25.0, in some probabilistic sense still to be determined, we must conclude that the new additive has not demonstrated its superiority. The problem is, whether 26.3 mpg qualifies as being "close" to 25.0 mpg is not immediately obvious.

At this point, rephrasing the question in random variable terminology will prove helpful. Let $y_1, y_2, \ldots, y_{30}$ denote the mileages recorded by each of the cars on the cross-country test run. We assume that the $y_i$'s are normally distributed with an unknown mean μ. Furthermore, suppose that experience with road tests of this type suggests that σ will equal 2.4.¹ That is,

$$f_Y(y; \mu) = \frac{1}{\sqrt{2\pi}\,(2.4)}\,e^{-\frac{1}{2}\left(\frac{y-\mu}{2.4}\right)^2}, \qquad -\infty < y < \infty$$

The two competing hypotheses we are testing, then, can be expressed as statements about μ:

$$H_0\colon \mu = 25.0 \text{ (additive is not effective)}$$
versus
$$H_1\colon \mu > 25.0 \text{ (additive is effective)}$$

Values of the sample mean, $\bar{y}$, less than or equal to 25.0 are certainly not grounds for rejecting the null hypothesis; values a bit larger than 25.0 would also lead to that conclusion (because of the commitment to give $H_0$ the benefit of the doubt). On the other hand, we would probably view a cross-country average of, say, 35.0 mpg as exceptionally strong evidence against the null hypothesis, and our decision would be "reject $H_0$." It follows that somewhere between 25.0 and 35.0 there is a point, call it $\bar{y}^*$, where for all practical purposes the credibility of $H_0$ ends (see Figure 6.2.1).

[FIGURE 6.2.1: the axis of possible sample means, with values of $\bar{y}$ not markedly inconsistent with the assertion that μ = 25.0 lying to the left of $\bar{y}^*$, and values of $\bar{y}$ that would appear to refute $H_0$ lying to the right.]

Finding an appropriate numerical value for $\bar{y}^*$ is accomplished by combining the courtroom analogy with what we know about the probabilistic behavior of $\bar{Y}$. Suppose, for the sake of argument, we set $\bar{y}^*$ equal to 25.25; that is, we would reject $H_0$ if $\bar{y} \ge 25.25$. Is that a good decision rule? No. If 25.25 defined "close," then $H_0$ would be rejected 28%
the value of a w;ually needs to be estimated; we will return to that more frequel]tly enoolilitered s«nario in Cbapter7. 430 Chapter 6 Hypothesis Testing 1.0 Distribution of V when ,',,Ht) : .u '" 25.0 is trlle /' J" I Ho is true) Area" P =0.2843 A' /' ; .. " '" ......... -- --'" 0.5 A' A' 23.5 24.0 24.5 25.0 26.5 J"'=25.25 LRejectHo RGURE 6.2.2 of the time even if Ho were true: P(we Ho I Ho is true) = P(Y ::: = I J.1, = 25.0) - 25.0 > 25.25 P - 25.0) 2.4/.J30 =P(Z ::: 0.57) =0.2843 Figure 6.2.2). Common sense, though, tells us that 28% is an inappropriately large No jury, for example, would probability for making this kind of incorrect convict a defendant knowing it had a 28% chance of sending an innocent person to jail. Larger. Would it be to set y* equal to, say, Clearly, we need to make 26.507 Probably not, because setting y* that large would err in the other direction by giving the null hypothesis too much benefit of the doubt. If y* = 26.50, the probability of rejecting Ho if Ho were !me is only 0.0003: P(we reject Ho I Ho is true) :::: p(Y::: 26.50 I J.1, = 25.0) = p y - 25.0 ( 2.4/.J30 >- -----==-- = P(Z ::: 3.42) =0.0003 6.2.3}. Requiring that much before Ho would be analogous to a jury not returning a guilty verdict unless the prosecutor could produce a roomful of eyewitnesses, an obvious motive, a signed confession, and a dead body in the trunk of the defendant's car! If a probability of 0.28 represents too little benefit of the doubt 0.0003 too what value should we choose for I Ho is true)? While there is no way to answer that question definitively or mathematically, researchers who use hypothesis testing have COme to a consensus that the probability of rejecting Ho Section 235 24.0 24.5 25.0 The Decision Rule 25.S 431 26.5 I r=26.50 L fiGURE 6.2.3 when Ho is true should be somewhere in the neighborhood of 0.05. seems to suggest that when a 0.05 probability is used, null hypotheses are dismissed too capriciously nor too wholeheartedly. 
(More will be said about this particular probability, and its consequences, in Section 6.3.)

Comment. In 1768, British troops were sent to Boston to quell an outbreak of civil disturbances. Several citizens were killed in the ensuing turmoil, and the soldiers responsible were subsequently put on trial for manslaughter. Explaining how a verdict was to be reached, the judge told the jury, "If upon the whole, ye are in any reasonable doubt of their guilt, ye must then, agreeable to the rule of law, declare them innocent." Ever since, the expression "beyond all reasonable doubt" has been a frequently quoted indicator of how much evidence is required in a jury trial to overturn a defendant's presumption of innocence. For many experimenters, choosing ȳ* such that

P(we reject H0 | H0 is true) = 0.05

is comparable to a jury convicting a defendant only if the latter's guilt is established "beyond all reasonable doubt."

Suppose the 0.05 "criterion" is applied here. Finding the corresponding ȳ* requires a calculation similar to the two just described. Given that

P(Ȳ ≥ ȳ* | H0 is true) = 0.05

it follows that

P((Ȳ − 25.0)/(2.4/√30) ≥ (ȳ* − 25.0)/(2.4/√30)) = P(Z ≥ (ȳ* − 25.0)/(2.4/√30)) = 0.05

But we know from Appendix Table A.1 that P(Z ≥ 1.64) = 0.05. Therefore,

(ȳ* − 25.0)/(2.4/√30) = 1.64    (6.2.1)

which implies that ȳ* = 25.718.

432 Chapter 6 Hypothesis Testing

The company's statistical analysis is now completely determined: They should reject the null hypothesis that the additive has no effect if ȳ ≥ 25.718. Since the sample mean was 26.3, the appropriate decision is, indeed, to reject H0. It appears that the additive does increase mileage.

Comment. It must be remembered that rejecting H0 does not prove that H0 is false, any more than a jury's decision to convict guarantees that the defendant is guilty. The 0.05 decision rule is simply saying that if the true mean (μ) is 25.0, sample means (ȳ) as large as or larger than 25.718 are expected to occur only 5% of the time. Because of that small probability, a reasonable conclusion when ȳ ≥ 25.718 is that μ is not 25.0. Table 6.2.1 is a computer simulation of this particular 0.05 decision rule.
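The solution of Equation 6.2.1 can be checked numerically; a sketch, assuming σ known (Python's `NormalDist.inv_cdf` stands in for Appendix Table A.1 and returns 1.645 rather than the table's rounded 1.64, so the cutoff differs slightly in the third decimal):

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 25.0, 2.4, 30, 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)      # ~1.645; the text rounds to 1.64
y_star = mu0 + z_alpha * sigma / sqrt(n)
print(round(y_star, 3))                        # ~25.721, matching ybar* = 25.718

ybar = 26.3                                    # the company's observed mean
print(ybar >= y_star)                          # True: reject H0
```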
A total of seventy-five random samples, each of size thirty, have been drawn from a normal distribution having μ = 25.0 and σ = 2.4. The corresponding ȳ for each sample is then compared with ȳ* = 25.718. As the entries in the table indicate, five of the samples lead to the erroneous conclusion that H0: μ = 25.0 should be rejected.

TABLE 6.2.1
[The table lists the seventy-five simulated sample means (values ranging from roughly 24.0 to 25.9), each paired with the answer to the question "ȳ ≥ 25.718?"; five of the entries, among them 25.809, 25.805, 25.866, and 25.771, are marked "yes."]

Section 6.2 The Decision Rule 433

Since each sample mean has a 0.05 probability of exceeding 25.718 (when μ = 25.0), we would expect 75(0.05), or 3.75, of the seventy-five samples to result in a "reject H0" conclusion. Reassuringly, the observed number of incorrect inferences (here, 5) is quite close to that expected value.

Comment. If H0: μ = μ0 is rejected using a 0.05 decision rule, we say that the difference between ȳ and μ0 is statistically significant.

Expressing Decision Rules in Terms of Z Ratios

As we have seen, decision rules are statements that spell out the conditions under which a null hypothesis is to be rejected. The format of those statements, though, can vary. Depending on the context, one format may be easier to work with than another.

Recall Equation 6.2.1. Rejecting H0: μ = 25.0 when

ȳ ≥ 25.0 + 1.64(2.4/√30) = 25.718

is clearly equivalent to rejecting H0 when

(ȳ − 25.0)/(2.4/√30) ≥ 1.64    (6.2.2)

(if one format rejects the null hypothesis, so will the other, and vice versa).
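The simulation summarized in Table 6.2.1 above can be rerun in a few lines; a sketch (the seed is arbitrary, so the count of rejections will vary from run to run, clustering around 3.75):

```python
import random

random.seed(6)                         # arbitrary seed for reproducibility
mu, sigma, n, y_star = 25.0, 2.4, 30, 25.718

# Draw 75 samples of size 30 from N(25.0, 2.4) and count how often
# the sample mean crosses the 0.05 decision rule's cutoff.
rejections = sum(
    sum(random.gauss(mu, sigma) for _ in range(n)) / n >= y_star
    for _ in range(75)
)
print(rejections)    # expected count is 75(0.05) = 3.75; the text's run gave 5
```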
We know from Chapter 4 that the random variable (Ȳ − 25.0)/(2.4/√30) has a standard normal distribution (if μ = 25.0). When a particular ȳ is substituted for Ȳ (as in Inequality 6.2.2), we call (ȳ − 25.0)/(2.4/√30) the observed z. Choosing between H0 and H1 is typically (and most conveniently) done in terms of the observed z. In Section 6.4, though, we will encounter certain questions related to hypothesis testing that are best answered by phrasing the decision rule in terms of ȳ*.

Definition 6.2.1. Any function of the observed data whose numerical value dictates whether H0 is accepted or rejected is called a test statistic. The set of values of the test statistic that result in the null hypothesis being rejected is called the critical region and is denoted C. The particular point in C that separates the rejection region from the acceptance region is called the critical value.

Comment. For the gas mileage example, both ȳ and (ȳ − 25.0)/(2.4/√30) qualify as test statistics. If the sample mean is used, the associated critical region would be written

C = {ȳ: ȳ ≥ 25.718}

434 Chapter 6 Hypothesis Testing

(and 25.718 is the critical value). If the decision rule is framed in terms of a z ratio,

C = {z: z = (ȳ − 25.0)/(2.4/√30) ≥ 1.64}

In this latter case, the critical value is 1.64.

Definition 6.2.2. The probability that the test statistic lies in the critical region when H0 is true is called the level of significance and is denoted α.

Comment. In principle, the value chosen for α should reflect the consequences of making the mistake of rejecting H0 when H0 is true. As those consequences get more severe, the critical region C should be defined so that α gets smaller. In practice, though, attempts to quantify the costs of making incorrect inferences are arbitrary at best. In most situations, experimenters abandon any such attempts and routinely set the level of significance equal to 0.05. If another α is used, it is likely to be either 0.001, 0.01, or 0.10. Once again, the similarity between hypothesis testing and courtroom protocol is worth keeping in mind.
Just as experimenters can make α larger or smaller to reflect the consequences of mistakenly rejecting H0 when H0 is true, so can juries demand more or less evidence to return a conviction. For juries, any such adjustments are usually dictated by the severity of the punishment. A grand jury deciding whether or not to indict someone for fraud, for example, will inevitably require less evidence to return a conviction than will a jury impaneled for a murder trial.

One-Sided Versus Two-Sided Alternatives

In most hypothesis tests, H0 consists of a single number, typically the value of the parameter that represents the status quo. The "25.0" in H0: μ = 25.0, for example, is the mileage that would be expected when the additive has no effect. If the mean of a normal distribution is the parameter being tested, our general notation for the null hypothesis will be H0: μ = μ0, where μ0 is the status quo value of μ.

Alternative hypotheses, by way of contrast, invariably embrace entire ranges of parameter values. If there is reason to believe before any data are collected that the parameter being tested is necessarily restricted to one particular "side" of H0, then H1 is defined to reflect that limitation and we say that the alternative hypothesis is one-sided. Two variations are possible: H1 can be one-sided to the left (H1: μ < μ0) or it can be one-sided to the right (H1: μ > μ0). If no such a priori information is available, the alternative hypothesis needs to accommodate the possibility that the true parameter value might lie on either side of μ0. Any such alternative is said to be two-sided. For H0: μ = μ0, the two-sided alternative is written H1: μ ≠ μ0.

In the gasoline example, it was tacitly assumed that the additive would either have no effect (in which case μ = 25.0 and H0 would be true) or it would increase mileage (implying that the true mean would lie somewhere "to the right" of H0). Accordingly, we wrote the alternative hypothesis as H1: μ > 25.0.
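Left-sided, right-sided, and two-sided decision rules differ only in where the critical region sits; a sketch collecting all three cases (the function name, its defaults, and the `alternative` labels are ours, assuming σ known):

```python
from math import sqrt
from statistics import NormalDist

def z_test(ybar, mu0, sigma, n, alpha=0.05, alternative="greater"):
    """One-sample z decision rule for H0: mu = mu0, sigma known.
    Returns (observed z, reject H0?)."""
    z = (ybar - mu0) / (sigma / sqrt(n))
    if alternative == "greater":                      # H1: mu > mu0
        reject = z >= NormalDist().inv_cdf(1 - alpha)
    elif alternative == "less":                       # H1: mu < mu0
        reject = z <= NormalDist().inv_cdf(alpha)
    else:                                             # H1: mu != mu0
        reject = abs(z) >= NormalDist().inv_cdf(1 - alpha / 2)
    return round(z, 2), reject

print(z_test(26.3, 25.0, 2.4, 30))                       # (2.97, True)
print(z_test(26.3, 25.0, 2.4, 30, alternative="two-sided"))
```

For the gasoline data the observed z is about 2.97, well past either the one-sided cutoff (1.645) or the two-sided cutoff (1.96), so the rule rejects H0 in both cases.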
If we had reason to suspect, though, that the additive might interfere with the fuel's combustibility and possibly decrease mileage, it would have been necessary to use a two-sided alternative (H1: μ ≠ 25.0).

Section 6.2 The Decision Rule 435

Whether the alternative hypothesis is defined to be one-sided or two-sided is important because the nature of H1 plays a key role in determining the location of the critical region. We saw earlier that the 0.05 decision rule for testing H0: μ = 25.0 versus H1: μ > 25.0 calls for H0 to be rejected if (ȳ − 25.0)/(2.4/√30) ≥ 1.64. That is, only if the sample mean is substantially larger than 25.0 will we reject H0. If the alternative hypothesis had been two-sided, sample means either much smaller than 25.0 or much larger than 25.0 would be evidence against H0 (and in support of H1). Moreover, the 0.05 probability associated with the critical region C would be split into two halves, with 0.025 being assigned to the left-most portion of C, and 0.025 to the right-most portion. From Appendix Table A.1, though, P(Z ≤ −1.96) = P(Z ≥ 1.96) = 0.025, so the two-sided 0.05 decision rule would call for H0: μ = 25.0 to be rejected if (ȳ − 25.0)/(2.4/√30) is either (1) ≤ −1.96 or (2) ≥ 1.96.

Testing H0: μ = μ0 (σ known)

Let zα be the number having the property that P(Z ≥ zα) = α. Values for zα can be found from the standard normal cdf tabulated in Appendix Table A.1. If α = 0.05, for example, z.05 = 1.64 (see Figure 6.2.4). Of course, by the symmetry of the normal curve, −zα has the property that P(Z ≤ −zα) = α.

FIGURE 6.2.4
[The standard normal pdf; the area to the right of z.05 = 1.64 equals 0.05.]

Theorem 6.2.1. Let y1, y2, ..., yn be a random sample of size n from a normal distribution where σ is known, and let z = (ȳ − μ0)/(σ/√n).

a. To test H0: μ = μ0 versus H1: μ > μ0 at the α level of significance, reject H0 if z ≥ zα.
b. To test H0: μ = μ0 versus H1: μ < μ0 at the α level of significance, reject H0 if z ≤ −zα.
c. To test H0: μ = μ0 versus H1: μ ≠
μ0 at the α level of significance, reject H0 if z is either (1) ≤ −zα/2 or (2) ≥ zα/2.

436 Chapter 6 Hypothesis Testing

EXAMPLE 6.2.1

As part of a "Math for the Twenty-First Century" initiative, Bayview High was chosen to participate in the evaluation of a new algebra and geometry curriculum. In the recent past, Bayview's students would be considered "typical," having earned scores on standardized exams that were very consistent with national averages. Two years ago, a cohort of eighty-six Bayview sophomores, all randomly selected, were assigned to a special set of classes that integrated algebra and geometry. According to test results that have just been released, those students averaged 502 on the SAT-I math exam; nationwide, seniors averaged 494 with a standard deviation of 124. Can it be claimed at the α = 0.05 level of significance that the new curriculum had an effect?

To begin, we define the parameter μ to be the true average math score that we would expect the new curriculum to produce. The obvious "status quo" value for μ is the current national average; that is, μ0 = 494. The alternative hypothesis here should be two-sided because the possibility certainly exists that a revised curriculum, however well intentioned, would actually lower a student's achievement. According to Part (c) of Theorem 6.2.1, then, we should reject H0: μ = 494 in favor of H1: μ ≠ 494 at the α = 0.05 level of significance if the test statistic z is either (1) ≤ −z.025 (= −1.96) or (2) ≥ z.025 (= 1.96). But ȳ = 502, so

z = (502 − 494)/(124/√86) = 0.60

implying that our decision should be "Fail to reject H0": Although Bayview's average is eight points above the national average, it does not follow that the improvement was due to the new curriculum. An increase of that magnitude could easily have occurred by chance, even if the new curriculum had no effect whatsoever (see Figure 6.2.5).

FIGURE 6.2.5
[The standard normal pdf fZ(z) with two-sided rejection regions of area 0.025 in each tail; the observed z = 0.60 lies well inside the acceptance region.]

Comment. If the null hypothesis is not rejected, we should phrase the conclusion as "Fail to reject H0" rather than "Accept H0."
Those two statements may seem to be the same, but, in fact, they have very different connotations. The phrase "Accept H0" suggests that the experimenter is concluding that H0 is true. But that may not be the case. In a court trial, when a jury returns a verdict of "Not guilty," they are not saying that

Section 6.2 The Decision Rule 437

they necessarily believe that the defendant is innocent. They are simply asserting that the evidence, in their opinion, is not sufficient to overturn the presumption that the defendant is innocent. That same distinction applies to hypothesis testing. If a test statistic does not fall into the critical region (which was the case in Example 6.2.1), the proper interpretation is to conclude that we "fail to reject H0."

The P-Value

There are two general ways to quantify the amount of evidence against H0 that is contained in a given set of data. The first involves the level of significance concept introduced in Definition 6.2.2. Using that approach, the experimenter selects a value for α (usually 0.05 or 0.01) before any data are collected. Once α is specified, a corresponding critical region can be identified. If the test statistic falls in the critical region, we reject H0 at the α level of significance. Another strategy is to calculate a P-value.

Definition 6.2.3. The P-value associated with an observed test statistic is the probability of getting a value for that test statistic as extreme as or more extreme than what was actually observed (relative to H1), given that H0 is true.

Comment. Test statistics that yield small P-values should be interpreted as evidence against H0. More specifically, if the P-value calculated for a test statistic is less than or equal to α, the null hypothesis can be rejected at the α level of significance. Put another way, the P-value is the smallest α at which we can reject H0.

EXAMPLE 6.2.2

Recall Example 6.2.1. Given that H0: μ = 494 is being tested against H1: μ ≠ 494, what P-value is associated with the observed test statistic, z = 0.60, and how should it be interpreted?
If H0: μ = 494 is true, the random variable Z = (Ȳ − 494)/(124/√86) has a standard normal pdf. Relative to the two-sided H1, any value of Z greater than or equal to 0.60 or less than or equal to −0.60 qualifies as being "as extreme as or more extreme than" the observed z. Therefore, by Definition 6.2.3,

P-value = P(Z ≥ 0.60) + P(Z ≤ −0.60)
= 0.2743 + 0.2743
= 0.5486

(see Figure 6.2.6). As noted in the preceding comment, P-values can be used as decision rules. In Example 6.2.1, 0.05 was the stated level of significance. Having determined here that the P-value associated with z = 0.60 is 0.5486, we know that H0: μ = 494 would not be rejected at the given α. Indeed, the null hypothesis would not be rejected for any value of α up to and including 0.5486.

Notice that the P-value would have been halved had H1 been one-sided. Suppose we were confident that the new algebra and geometry classes would not lower a student's

438 Chapter 6 Hypothesis Testing

FIGURE 6.2.6
[The standard normal pdf with areas of 0.2743 in each tail beyond ±0.60; P-value = 0.2743 + 0.2743 = 0.5486.]

math SAT score. The appropriate hypothesis test in that case would be H0: μ = 494 versus H1: μ > 494. Moreover, only values in the right-hand tail of fZ(z) would be considered more extreme than the observed z = 0.60, so

P-value = P(Z ≥ 0.60) = 0.2743

QUESTIONS

6.2.1. State the decision rule that would be used to test the following hypotheses. Evaluate the appropriate test statistic and state your conclusion.
(a) H0: μ = 120 versus H1: μ < 120; n = 25, α = 0.08
(b) H0: μ = 42.9 versus H1: μ ≠ 42.9; ȳ = 45.1, n = 16, σ = 3.2, α = 0.01
(c) H0: μ = 14.2 versus H1: μ > 14.2; ȳ = 15.8, n = 9, σ = 4.1, α = 0.13

6.2.2. An herbalist is experimenting with juices extracted from berries and roots that may have the ability to affect the Stanford-Binet IQ scores of students afflicted with mild cases of attention deficit disorder (ADD).
A random sample of 22 children diagnosed with the condition have been drinking Brain-Blaster daily for two months. Past experience suggests that children with ADD score an average of 95 on the IQ test with a standard deviation of 15. If the data are to be analyzed at the α = 0.06 level of significance, what values of ȳ would cause H0 to be rejected? Assume that H1 is two-sided.

6.2.3. (a) Suppose H0: μ = μ0 is rejected in favor of H1: μ ≠ μ0 at the α = 0.05 level of significance. Would H0 necessarily be rejected at the α = 0.01 level of significance?
(b) Suppose H0: μ = μ0 is rejected in favor of H1: μ ≠ μ0 at the α = 0.01 level of significance. Would H0 necessarily be rejected at the α = 0.05 level of significance?

6.2.4. Company records show that drivers get an average of 32,500 miles on a set of Road All-Weather radial tires. Hoping to improve that figure, the company has added a new polymer to the rubber that should help protect the tires from deterioration caused by extreme temperatures. Fifteen drivers who tested the new tires have reported getting an average of 33,800 miles. Can the company claim that the polymer has produced a statistically significant increase in tire mileage? Test H0: μ = 32,500 against a one-sided alternative at the α = 0.05 level. Assume that the standard deviation (σ) of the tire mileages has not been affected by the addition of the polymer and is still 4000 miles.

Section 6.2 The Decision Rule 439

6.2.5. If H0: μ = μ0 is rejected in favor of H1: μ > μ0, will it necessarily be rejected in favor of H1: μ ≠ μ0? Assume that α remains the same.

6.2.6. A random sample of size 16 is drawn from a normal distribution having σ = 6.0 for the purpose of testing H0: μ = 30 versus H1: μ ≠ 30. The experimenter chooses to define the critical region C to be the set of sample means lying in the interval (29.9, 30.1). What level of significance does the test have? Why is (29.9, 30.1) a poor choice for the critical region?
What range of ȳ values should comprise C, assuming the same α is to be used?

6.2.7. Recall the breath analyzers described in Example 4.3.6. The following are 30 blood alcohol determinations made by Analyzer GTE-10, a three-year-old unit that may be in need of recalibration. All 30 measurements were made using a test sample on which a properly adjusted machine would give a reading of 12.6%.

12.3 12.6 13.2 13.1 13.1 12.7 13.1 12.8 12.9 12.4 13.6 12.6 12.4 12.4 13.1 12.6 12.6 13.1 12.9 12.7 12.4 12.6 12.5 12.4 12.7 12.9 12.6 12.4

(a) If μ denotes the true average reading that Analyzer GTE-10 would make on a person whose blood alcohol concentration is 12.6%, test H0: μ = 12.6 at the α = 0.05 level of significance. Assume that σ = 0.4. Would you recommend that the machine be readjusted?
(b) What statistical assumptions are implicit in the hypothesis test done in Part (a)? Is there any reason to suspect that those assumptions may not be satisfied?

6.2.8. Calculate the P-values for the hypothesis tests indicated in Question 6.2.1. Do they agree with your decisions on whether or not to reject H0?

6.2.9. Suppose H0: μ = 120 is tested against H1: μ ≠ 120. If σ = 10 and n = 16, what P-value is associated with the sample mean ȳ = 122.3? Under what circumstances will H0 be rejected?

6.2.10. As a class research project, Rosaura wants to see whether the stress of final exams elevates the blood pressures of freshmen women. When they are not under any untoward duress, healthy 18-year-old women have systolic blood pressures that average 120 mm Hg with a standard deviation of 12 mm Hg. If Rosaura finds that the average blood pressure for the 50 women in Statistics 101 on the day of the final exam is 125.2, what should she conclude? Set up and test an appropriate hypothesis.

6.2.11. As input for a new inflation model, economists predicted that the average cost of a hypothetical "food basket" in east Tennessee in July would be $145.75.
The standard deviation (σ) of basket prices was assumed to be $9.50, a figure that has held fairly constant over the years. To check their prediction, a sample of 25 baskets representing different parts of the region were checked in late July, and the average cost was $149.75. Let α = 0.05. Is the difference between the economists' prediction and the sample mean statistically significant?

440 Chapter 6 Hypothesis Testing

6.3 TESTING BINOMIAL DATA - H0: p = p0

Suppose a set of data, k1, k2, ..., kn, represents the outcomes of n Bernoulli trials, where ki = 1 or 0, depending on whether the ith trial ended in success or failure, respectively. If p = P(ith trial ends in success) is unknown, it may be appropriate to test the null hypothesis H0: p = p0, where p0 is some particularly relevant (or status quo) value of p. Any such procedure is called a binomial hypothesis test, because the test statistic, as we will see, is a function of

k = k1 + k2 + ... + kn = total number of successes

and we know from Theorem 3.2.1 that the total number of successes, X, in a series of n independent trials has a binomial distribution,

pX(k) = C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, 2, ..., n

In general, procedures for testing H0: p = p0 need to take account of the magnitude of n, the distinction resting on whether or not the inequality

0 < np0 − 3√(np0(1 − p0))  and  np0 + 3√(np0(1 − p0)) < n    (6.3.1)

is satisfied. If it is, we do a "large-sample" test of H0: p = p0 based on an approximate Z ratio. Otherwise, a "small-sample" decision rule is used, one where the critical region is defined in terms of the exact binomial distribution associated with the random variable X.

A Large-Sample Test for the Binomial Parameter p

Suppose the number of observations, n, making up a set of Bernoulli random variables is sufficiently large that Inequality 6.3.1 is satisfied. We know in that case from Section 4.3 that the random variable (X − np0)/√(np0(1 − p0)) has approximately a standard normal pdf, fZ(z), if p = p0. Values of (X − np0)/√(np0(1 − p0)) close to zero, of course, would be evidence in favor of H0: p = p0 [since E((X − np0)/√(np0(1 − p0))) = 0 when p = p0].
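Whether Inequality 6.3.1 holds is worth checking by machine before choosing between the two procedures; a small helper (the function name is ours):

```python
from math import sqrt

def large_sample_ok(n, p0):
    """Inequality 6.3.1: 0 < n*p0 - 3*sd and n*p0 + 3*sd < n,
    where sd = sqrt(n*p0*(1 - p0))."""
    sd = sqrt(n * p0 * (1 - p0))
    return 0 < n * p0 - 3 * sd and n * p0 + 3 * sd < n

print(large_sample_ok(124, 0.50))   # True: large n, moderate p0
print(large_sample_ok(19, 0.85))    # False: n*p0 + 3*sd = 20.8 exceeds n
```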
Conversely, the credibility of H0: p = p0 clearly diminishes as (X − np0)/√(np0(1 − p0)) moves farther and farther away from zero. The large-sample test of H0: p = p0, then, has the same basic form as the test of H0: μ = μ0 in Section 6.2.

Theorem 6.3.1. Let k1, k2, ..., kn be a random sample of n Bernoulli random variables for which

0 < np0 − 3√(np0(1 − p0))  and  np0 + 3√(np0(1 − p0)) < n

and let k = k1 + k2 + ... + kn denote the total number of "successes" in the n trials. Define z = (k − np0)/√(np0(1 − p0)).

a. To test H0: p = p0 versus H1: p > p0 at the α level of significance, reject H0 if z ≥ zα.
b. To test H0: p = p0 versus H1: p < p0 at the α level of significance, reject H0 if z ≤ −zα.
c. To test H0: p = p0 versus H1: p ≠ p0 at the α level of significance, reject H0 if z is either (1) ≤ −zα/2 or (2) ≥ zα/2.

Section 6.3 Testing Binomial Data - H0: p = p0 441

CASE STUDY 6.3.1

In gambling parlance, a point spread is a hypothetical increment added to the score of the presumably weaker of two teams playing. By intention, its magnitude should have the effect of making the game a toss-up; that is, each team should have a 50% chance of beating the spread. In practice, setting point spreads is a highly subjective endeavor, which raises the question of whether or not the Las Vegas crowd actually gets it right (116). Addressing that issue, a recent study examined the records of 124 National Football League games; it was found that in sixty-seven of the matchups (or 54%) the favored team beat the spread.

Is the difference between 54% and 50% small enough to be written off to chance, or did the study uncover convincing evidence that the odds makers are not capable of accurately quantifying the competitive edge that one team holds over another? Let p = P(favored team beats spread). If p is any value other than 0.50, the bookies are assigning point spreads incorrectly. To be tested, then, are the hypotheses

H0: p = 0.50
versus
H1: p ≠ 0.50

We will take 0.05 to be the level of significance. In the terminology of Theorem 6.3.1, n = 124, p0 = 0.50,
and

ki = 1 if favored team beats spread in ith game
ki = 0 if favored team does not beat spread in ith game

for i = 1, 2, ..., 124. Therefore, the sum k = k1 + k2 + ... + k124 denotes the total number of times the favored team beat the spread. According to the two-sided decision rule given in Part (c) of Theorem 6.3.1, the null hypothesis should be rejected if z is either less than or equal to −1.96 (= −z.05/2) or greater than or equal to 1.96 (= z.05/2). But

z = (67 − 124(0.50))/√(124(0.50)(0.50)) = 0.90

does not fall in the critical region, so H0: p = 0.50 should not be rejected at the α = 0.05 level of significance. The outcomes of these 124 games, in other words, are entirely consistent with the presumption that the bookies know which of two teams is better, and by how much.

Comment. P-values can be used to interpret binomial hypothesis tests just as they were used in Section 6.2 when the null hypothesis was H0: μ = μ0. In Case Study 6.3.1,

442 Chapter 6 Hypothesis Testing

for example, the observed test statistic is 0.90 and H1 is two-sided, so the P-value is 0.37:

P-value = P(Z ≤ −0.90) + P(Z ≥ 0.90) = 0.1841 + 0.1841 = 0.37

For any α < 0.37, then, our conclusion that the bookies are competent would remain unchanged.

CASE STUDY 6.3.2

There is a theory that people may tend to "postpone" their deaths until some event that has particular meaning to them has passed (139). Birthdays, a family reunion, or the return of a loved one have all been suggested as the sorts of personal milestones that might have such an effect. National elections may be another. Studies have shown that the mortality rate in the United States drops noticeably during the Septembers and Octobers of presidential election years. If the postponement theory is to be believed, the reason for the decrease is that many of the elderly who would have died in those two months "hang on" until they see who wins.

Some years ago, a national periodical reported the findings of a study that looked at obituaries published in a Salt Lake City newspaper.
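Before turning to that study's numbers, note that the Case Study 6.3.1 calculation above, together with its P-value, reduces to a few lines; a sketch (`NormalDist` stands in for Appendix Table A.1):

```python
from math import sqrt
from statistics import NormalDist

k, n, p0 = 67, 124, 0.50                       # games where the favorite beat the spread
z = (k - n * p0) / sqrt(n * p0 * (1 - p0))     # Theorem 6.3.1's test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided H1: p != 0.50
print(round(z, 2), round(p_value, 2))          # 0.9 0.37
```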
Among the 747 decedents the study identified, sixty, or 8.0%, had died in the three-month period preceding their birth months (129). If individuals are dying randomly with respect to their birthdays, we would expect 25% to die during any given three-month interval. What should we make, then, of the decrease from 25% to 8%? Has the study provided convincing evidence that the deaths reported for the sample do not constitute a random sample of months?

Imagine the 747 deaths being divided into two categories: those that occurred in the three-month period prior to a person's birthday and those that occurred at other times of the year. Let ki = 1 if the ith death belongs to the first category and ki = 0, otherwise. Then k = k1 + k2 + ... + k747 denotes the total number of deaths in the first category. The sum k, of course, is the value of a binomial random variable with parameter p, where

p = P(person dies in three months prior to birth month)

If people do not postpone their deaths (to wait for a birthday), p should be 3/12, or 0.25; if they do, p will be something less than 0.25. Assessing the decrease from 25%

Section 6.3 Testing Binomial Data - H0: p = p0 443

to 8%, then, is done with a one-sided binomial hypothesis test:

H0: p = 0.25
versus
H1: p < 0.25

Let α = 0.05. According to Part (b) of Theorem 6.3.1, H0 should be rejected if

z = (k − np0)/√(np0(1 − p0)) ≤ −z.05 = −1.64

Substituting for k, n, and p0, we find that the test statistic falls far to the left of the critical value:

z = (60 − 747(0.25))/√(747(0.25)(0.75)) = −10.7

The evidence is overwhelming, therefore, that the decrease from 25% to 8% is due to something other than chance. Explanations other than the postponement theory, of course, may be wholly or partially responsible for the nonrandom distribution of the deaths, but the data do show a pattern consistent with the notion that we do have some control over when we die.

Comment. A similar conclusion was reached in a study conducted on a Chinese community living in California. The "significant event" in that case was not a
birthday; it was the annual Harvest Moon festival, a celebration that holds particular meaning for elderly women. Based on census data tracked over a period of many years, it was determined that fifty-one deaths among elderly Chinese women should have occurred during the week before the festivals and fifty-two during the week after. In point of fact, thirty-three died the week before and seventy died the week after (23).

A Small-Sample Test for the Binomial Parameter p

Suppose that k1, k2, ..., kn is a random sample of Bernoulli random variables where n is too small for Inequality 6.3.1 to hold. The decision rule for testing H0: p = p0 that was given in Theorem 6.3.1, then, would not be appropriate. Instead, the critical region is defined by using the exact binomial distribution (rather than a normal approximation).

EXAMPLE 6.3.1

Suppose that n = 19 patients are to be given an experimental drug designed to relieve pain. The standard treatment is known to be effective in 85% of similar cases. Letting p denote the probability that the new drug will reduce a patient's pain, the

444 Chapter 6 Hypothesis Testing

researcher wishes to test

H0: p = 0.85
versus
H1: p ≠ 0.85

The decision will be based on the magnitude of k, the total number of patients in the sample for whom the drug is effective; that is, on k = k1 + k2 + ... + k19, where

ki = 0 if the new drug fails to relieve the ith patient's pain
ki = 1 if the new drug does relieve the ith patient's pain

What should the decision rule be if the level of significance is to fall somewhere near 10%? [Note that Theorem 6.3.1 does not apply here because Inequality 6.3.1 is not satisfied; specifically,

np0 + 3√(np0(1 − p0)) = 19(0.85) + 3√(19(0.85)(0.15)) = 20.8

is not less than n (= 19).]

If the null hypothesis is true, the expected number of successes would be np0 = 19(0.85), or 16.2. It follows that values of k to the extreme right or extreme left of 16.2 should constitute the critical region.

MTB > pdf;
SUBC > binomial 19 0.85.

Probability Density Function
Binomial with n = 19 and p = 0.850000
 x    P(X = x)
 8    0.0000
 9    0.0001
10    0.0007
11    0.0032
12    0.0122
13    0.0374
14    0.0907
15    0.1714
16    0.2428
17    0.2428
18    0.1529
19    0.0456

FIGURE 6.3.1
[Brackets on the printout mark P(X ≤ 13) = 0.0536 and P(X = 19) = 0.0456.]

Section 6.3 Testing Binomial Data - H0: p = p0 445

Figure 6.3.1 is a MINITAB printout of pX(k) = C(19, k)(0.85)^k (0.15)^(19−k). By inspection, we can see that the critical region

C = {k: k ≤ 13 or k = 19}

would produce an α close to the desired 0.10 (and would keep the probabilities associated with the two sides of the rejection region roughly the same). In random variable notation,

α = P(X ∈ C | H0 is true)
= P(X ≤ 13 | p = 0.85) + P(X = 19 | p = 0.85)
= 0.0001 + 0.0007 + 0.0032 + 0.0122 + 0.0374 + 0.0456
= 0.0992
≈ 0.10

QUESTIONS

6.3.1. Commercial fishermen working certain parts of the Atlantic Ocean sometimes find their efforts being hindered by the presence of whales. Ideally, they would like to scare away the whales without frightening the fish. One of the strategies being experimented with is to transmit underwater the sounds of a killer whale. On the 52 occasions that the technique has been tried, it worked 24 times (that is, the whales immediately left the area). Experience has shown, though, that 40% of all whales sighted near fishing boats leave of their own accord, anyway, probably just to get away from the noise of the boat (8).
(a) Let p = P(whale leaves area after hearing sounds of killer whale). Test H0: p = 0.40 versus H1: p > 0.40 at the α = 0.05 level of significance. Can it be argued on the basis of these data that transmitting underwater sounds is an effective technique for clearing fishing waters of unwanted whales?
(b) Calculate the P-value for these data. For what values of α would H0 be rejected?

6.3.2. Efforts to find a genetic explanation for why certain people are right-handed and others left-handed have been largely unsuccessful. Reliable data are difficult to find because of environmental factors that also influence a child's "handedness." To avoid that complication, researchers often study the analogous problem of "pawedness" in animals, where both genotypes and the environment can be partially controlled.
In one such experiment (28), mice were put into a cage having a feeding tube that was equally accessible from the right or the left. Each mouse was carefully watched over a number of feedings. If it used its right paw more often than the left to activate the tube, it was defined to be "right-pawed." Observations of this sort showed that 67% of mice belonging to strain A/J are right-pawed. A similar protocol was followed on a sample of 35 mice belonging to strain A/HeJ. Of those 35, a total of 18 were eventually classified as right-pawed. Test whether the proportion of right-pawed mice found in the A/HeJ sample was significantly different from what was known about the A/J strain. Use a two-sided alternative and let 0.05 be the probability associated with the critical region.

6.3.3. Defeated in his last bid to win a seat because of a sizeable gender gap, a politician has spent the last two years speaking out in favor of gender rights issues. A newly released poll claims to have contacted a random sample of the politician's current supporters and recorded how many were men. In the election that he lost, exit polls indicated that 65% of those who voted for him were men. Using an α = 0.05 level of significance, test the null hypothesis that his proportion of male supporters has remained the same. Make the alternative hypothesis one-sided.

446 Chapter 6 Hypothesis Testing

6.3.4. Suppose H0: p = 0.45 is to be tested against H1: p > 0.45 at the α = 0.14 level of significance, where p = P(ith trial ends in success). If the sample size is 200, what is the smallest number of successes that will cause H0 to be rejected?

6.3.5. Recall the median test described in Example 5.3.2. Reformulate that analysis as a hypothesis test rather than a confidence interval. What P-value is associated with the outcomes listed in Table 5.3.3?

6.3.6. Among the early attempts to validate the postponement theory introduced in Case Study 6.3.2 was an examination of the birth dates and death dates of 348 U.S. celebrities (139). It was found that 16 of those individuals had died in the month preceding their birth month.
Set up and test the appropriate H0 against a one-sided H1. Use the 0.05 level of significance.

6.3.7. What α levels are possible with a decision rule of the form "Reject H0 if k ≥ k*" when H0: p = 0.5 is to be tested against H1: p > 0.5 using a random sample of size n = 7?

6.3.8. The following is a MINITAB printout of the binomial pdf p_X(k) = (9 choose k)(0.6)^k(0.4)^(9-k), k = 0, 1, ..., 9. Suppose H0: p = 0.6 is to be tested against H1: p > 0.6, and we wish the level of significance to be exactly 0.05. Use Theorem 2.4.1 to combine two different critical regions into a single randomized decision rule for which α = 0.05.

MTB > pdf;
SUBC > binomial 9 0.6.

Probability Density Function
Binomial with n = 9 and p = 0.600000

      x    P(X = x)
      0      0.0003
      1      0.0035
      2      0.0212
      3      0.0743
      4      0.1672
      5      0.2508
      6      0.2508
      7      0.1612
      8      0.0605
      9      0.0101

6.3.9. Suppose H0: p = 0.75 is to be tested against H1: p < 0.75 using a random sample of size n = 7 and the decision rule "Reject H0 if k ≤ 3."
(a) What is the test's level of significance?
(b) Graph the probability that H0 will be rejected as a function of p.

6.4 TYPE I AND TYPE II ERRORS

The possibility of drawing incorrect conclusions is an inevitable by-product of hypothesis testing. No matter what sort of mathematical facade is built atop the decision-making process, there is no way to guarantee that what the test tells us will be the truth. One kind of error, rejecting H0 when H0 is true, figured prominently in Section 6.3: it was argued there that rejection regions should be defined so as to keep the probability of making such errors small, on the order of 0.05. In point of fact, there are two different kinds of errors that can be committed with any hypothesis test: (1) we can reject H0 when H0 is true, and (2) we can fail to reject H0 when H1 is true. These are called Type I and Type II errors, respectively. At the same time, there are two kinds of correct decisions: (1) we can fail to reject a true H0, and (2) we can reject a false H0.
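Before taking up the error probabilities formally, note that tail calculations like the binomial α of Section 6.3 are easy to check numerically. The short Python sketch below (mine, not part of the text; it uses only the standard library) reproduces the two-sided rejection region with n = 19, p0 = 0.85, and C = {k: k ≤ 13 or k = 19}:

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Binomial(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p0 = 19, 0.85
# alpha = P(X <= 13 | p = 0.85) + P(X = 19 | p = 0.85)
alpha = sum(binom_pmf(k, n, p0) for k in range(14)) + binom_pmf(19, n, p0)
print(round(alpha, 4))   # ~0.0993, close to the desired 0.10
```

Changing n, p0, and the rejection region makes the same few lines serve for Questions 6.3.7 through 6.3.9.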
Figure 6.4.1 shows these four possible "Decision/State of nature" combinations:

                                     True State of Nature
                                  H0 is true        H1 is true
  Our        Fail to reject H0    Correct decision  Type II error
  Decision   Reject H0            Type I error      Correct decision

FIGURE 6.4.1

Computing the Probability of Committing a Type I Error

Once an inference is made, there is no way to know whether it was correct. It is possible, though, to calculate the probability of its being in error, and the magnitude of that probability can help us understand the hypothesis test and its ability to distinguish between H0 and H1.

Recall the additive example developed in Section 6.2: H0: μ = 25.0 was to be tested against H1: μ > 25.0 using a sample of size n = 30. The decision rule stated that H0 should be rejected if ȳ, the average mpg with the new additive, equaled or exceeded 25.718. In that case, the probability of committing a Type I error is 0.05:

    P(Type I error) = P(reject H0 | H0 is true)
                    = P(Ȳ ≥ 25.718 | μ = 25.0)
                    = P( (Ȳ - 25.0)/(2.4/√30) ≥ (25.718 - 25.0)/(2.4/√30) )
                    = P(Z ≥ 1.64)
                    = 0.05

Of course, that the probability of committing a Type I error equals 0.05 should come as no surprise. In our earlier discussion of how "beyond reasonable doubt" should be interpreted numerically, we specifically chose the critical region so that the probability of the decision rule rejecting H0 when H0 is true would be 0.05.

In general, the probability of a Type I error is referred to as a test's level of significance and is denoted α (recall Definition 6.2.2). The concept is a crucial one: the level of significance is a single-number summary of the "rules" by which the decision process is being conducted. In essence, α is the amount of evidence the experimenter is demanding to see before abandoning the null hypothesis.

Computing the Probability of Committing a Type II Error

We just saw that computing the probability of a Type I error is a nonproblem: no computations are necessary, because the probability equals whatever value the experimenter sets a priori for α. A similar situation does not hold for Type II errors.
First, Type II error probabilities are not specified explicitly by the experimenter; second, each hypothesis test has an infinite number of Type II error probabilities, one for each value of the parameter admissible under H1.

As an illustration, suppose we want to find the probability of committing a Type II error in the gasoline experiment if the true μ (with the additive) were 25.750. By definition,

    P(Type II error | μ = 25.750) = P(we fail to reject H0 | μ = 25.750)
                                  = P(Ȳ < 25.718 | μ = 25.750)
                                  = P( (Ȳ - 25.75)/(2.4/√30) < (25.718 - 25.75)/(2.4/√30) )
                                  = P(Z < -0.07)
                                  = 0.4721

That is, even if the new additive increased the fuel economy to 25.750 mpg (from 25 mpg), our decision rule would be "tricked" 47% of the time: on those occasions it would tell us not to reject H0. The symbol for the probability of committing a Type II error is β. Figure 6.4.2 shows the sampling distribution of Ȳ when μ = 25.0 (i.e., when H0 is true) and when μ = 25.750 (H1 is true); the areas corresponding to α and β are shaded.

FIGURE 6.4.2

Clearly, the magnitude of β is a function of the presumed value for μ. If, for example, the gasoline additive is so effective as to raise fuel efficiency to 26.8 mpg, the probability that our decision rule would lead us to make a Type II error is a much smaller 0.0068:

    P(Type II error | μ = 26.8) = P(we fail to reject H0 | μ = 26.8)
                                = P(Ȳ < 25.718 | μ = 26.8)
                                = P( (Ȳ - 26.8)/(2.4/√30) < (25.718 - 26.8)/(2.4/√30) )
                                = P(Z < -2.47)
                                = 0.0068

(see Figure 6.4.3).

FIGURE 6.4.3

Power Curves

If β is the probability that we fail to reject H0 when H1 is true, then 1 - β is the probability of the complement, that we reject H0 when H1 is true.
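Both Type II error probabilities just computed can be verified with nothing more than the standard normal CDF, which every math library exposes through the error function. A Python sketch (mine, not the text's):

```python
from math import erf, sqrt

def phi(z):
    # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

sigma, n, ystar = 2.4, 30, 25.718      # gasoline-additive decision rule
se = sigma / sqrt(n)

beta_2575 = phi((ystar - 25.75) / se)  # P(fail to reject H0 | mu = 25.75)
beta_268 = phi((ystar - 26.8) / se)    # P(fail to reject H0 | mu = 26.8)
print(round(beta_2575, 2), round(beta_268, 4))   # ~0.47 and ~0.0068
```

The tiny differences from the text's 0.4721 come only from rounding the z-score to two decimals there.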
We call 1 - β the power of the test; it represents the ability of the decision rule to "recognize" (correctly) that H0 is false. The alternative hypothesis H1 usually depends on a parameter, which makes 1 - β a function of that parameter. The relationship they share can be pictured by drawing a power curve, which is simply a graph of 1 - β versus the set of all possible parameter values. Figure 6.4.4 shows the power curve for testing H0: μ = 25.0 versus H1: μ > 25.0, where μ is the mean of a normal distribution with σ = 2.4 and the decision rule is "Reject H0 if ȳ ≥ 25.718." The two marked points on the curve represent the (μ, 1 - β) pairs just determined, (25.75, 0.53) and (26.8, 0.9932). One other point can be gotten for every power curve, without doing any calculations: when μ = μ0 (the value specified by H0), 1 - β = α. Of course, as the true mean gets farther and farther away from the H0 mean, the power will converge to one.

FIGURE 6.4.4: Power (1 - β) versus the presumed value for μ; the curve passes near 0.29 at μ = 25.50 and near 0.72 at μ = 26.00.

Power curves serve two different purposes. On the one hand, they completely characterize the "performance" that can be expected from a hypothesis test. In Figure 6.4.4, for example, the two arrows show that the probability of rejecting H0: μ = 25 in favor of H1: μ > 25 when μ = 26.0 is approximately 0.72. (Or, equivalently, Type II errors will be committed roughly 28% of the time when μ = 26.0.) As the true mean moves closer to μ0 (and becomes more difficult to distinguish), the power of the test understandably diminishes. If μ = 25.5, for example, the graph shows that 1 - β falls to 0.29.

Power curves are also useful for comparing one inference procedure with another. For every conceivable hypothesis-testing situation, a variety of procedures for choosing between H0 and H1 will be available. How do we know which to use? The answer to that question is not always simple.
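Since every point on a power curve comes from the same two-line calculation, a curve like Figure 6.4.4 can be regenerated programmatically. In the Python sketch below (mine, using the chapter's rounded cutoff z = 1.64), the exact values at 25.5 and 26.0 come out near 0.31 and 0.74, consistent, to graphical accuracy, with the roughly 0.29 and 0.72 read off the printed curve:

```python
from math import erf, sqrt

def power(mu, mu0=25.0, sigma=2.4, n=30, z_alpha=1.64):
    # P(reject H0 | true mean = mu) for the rule "reject if ybar >= y*"
    se = sigma / sqrt(n)
    ystar = mu0 + z_alpha * se                     # ~25.718 for these settings
    return 1 - 0.5 * (1 + erf((ystar - mu) / se / sqrt(2)))

for mu in (25.0, 25.5, 25.75, 26.0, 26.8):
    print(mu, round(power(mu), 2))
```

At mu = mu0 the function returns α (about 0.05), and it climbs toward one as mu moves away from 25.0, exactly as the text describes.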
Some procedures will be computationally more convenient or easier to explain than others; some will make slightly different assumptions about the pdf being sampled. Associated with each of them, though, is a power curve. If the selection of a hypothesis test is to hinge solely on its ability to distinguish H0 from H1, then the procedure to choose is the one having the steepest power curve. Figure 6.4.5 shows the power curves for two hypothetical methods, A and B, each of which is testing H0: θ = θ0 versus H1: θ ≠ θ0 at the α level of significance.

FIGURE 6.4.5

From the standpoint of power, Method B is the better of the two: it has the higher probability of correctly rejecting H0 when the parameter θ is not equal to θ0.

Factors That Influence the Power of a Test

The ability of a test procedure to reject H0 when H0 is false is clearly of prime importance, and that raises an obvious question: What can an experimenter do to influence the value of 1 - β? In the case of the Z test described in Theorem 6.2.1, 1 - β is a function of α, σ, and n. By appropriately raising or lowering the values of those parameters, the power of the test against any given μ can be made to equal any desired level.

The Effect of α on 1 - β

Consider again the test of H0: μ = 25.0 described earlier in this section. In its original form, α = 0.05, σ = 2.4, n = 30, and the decision rule called for H0 to be rejected if ȳ ≥ 25.718. Figure 6.4.6 shows what happens to 1 - β (when μ = 25.75) if σ, n, and μ are held constant and α is increased to 0.10. The top pair of distributions shows the configuration that appeared in Figure 6.4.2; the power in this case is 1 - 0.4721, or 0.53. The bottom portion of the graph illustrates what happens when α is set at 0.10 instead of 0.05: the decision rule changes from "Reject H0 if ȳ ≥ 25.718" to "Reject H0 if ȳ ≥ 25.561" (see Question 6.4.2), and the power increases to 0.67:

    1 - β = P(reject H0 | H1 is true)
          = P(Ȳ ≥ 25.561 | μ = 25.75)
          = P( (Ȳ - 25.75)/(2.4/√30) ≥ (25.561 - 25.75)/(2.4/√30) )
          = P(Z ≥ -0.43)
          = 0.6664

The relationship pictured in Figure 6.4.6 holds true in general: increasing α increases the power. That said, it does not follow that experimenters should manipulate α to achieve a desired 1 - β.
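The change in the cutoff and the resulting gain in power when α moves from 0.05 to 0.10 can be confirmed directly. The sketch below (mine, not the text's) uses the rounded table values z = 1.64 and z = 1.28 for the two significance levels:

```python
from math import erf, sqrt

def phi(z):
    # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

sigma, n, mu1 = 2.4, 30, 25.75
se = sigma / sqrt(n)

results = {}
for alpha, z_alpha in ((0.05, 1.64), (0.10, 1.28)):
    ystar = 25.0 + z_alpha * se        # critical value: ~25.718, then ~25.561
    pwr = 1 - phi((ystar - mu1) / se)  # power against mu = 25.75
    results[alpha] = (ystar, pwr)
    print(alpha, round(ystar, 3), round(pwr, 2))
```

Loosening α buys power only by accepting more Type I errors, which is why the text immediately cautions against treating α as a tuning knob.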
For all the reasons cited in Section 6.2, α should typically be set equal to a number somewhere in the neighborhood of 0.05. If the corresponding 1 - β against a given μ is deemed inadequate, adjustments should be made in the values of σ and/or n.

FIGURE 6.4.6: Top: α = 0.05, reject H0 if ȳ ≥ 25.718, power = 0.53 (β = 0.4721). Bottom: α = 0.10, reject H0 if ȳ ≥ 25.561, power = 0.67 (β = 0.3336).

The Effects of σ and n on 1 - β

Although it will not always be feasible (or even possible), decreasing σ will necessarily increase 1 - β. In the gasoline additive example, σ is assumed to be 2.4 mpg, the latter being a measure of the variation in gas mileages from driver to driver achieved in a cross-country road trip from Boston to Los Angeles (recall page 428). Intuitively, the environmental differences inherent in a trip of that magnitude would be considerable. Different drivers would encounter different weather conditions and amounts of traffic, and perhaps take alternate routes. Suppose, instead, the drivers simply did laps around a test track rather than drive on actual highways. Conditions from driver to driver would be much more uniform, and the value of σ would surely be smaller.

What would be the effect on 1 - β when μ = 25.75 (and α = 0.05) if σ could be reduced from 2.4 mpg to 1.2 mpg? As Figure 6.4.7 shows, reducing σ has the effect of making the H0 distribution more concentrated around μ0 (= 25) and the H1 distribution more concentrated around 25.75. Substituting into the decision rule (with 1.2 for σ in place of 2.4), we find that the critical value ȳ* moves closer to μ0, from 25.718 to

    25.359 ( = 25 + 1.64 · (1.2/√30) ),
and the proportion of the H1 distribution falling in the rejection region increases from 0.53 to 0.96:

    1 - β = P(Ȳ ≥ 25.359 | μ = 25.75)
          = P( (Ȳ - 25.75)/(1.2/√30) ≥ (25.359 - 25.75)/(1.2/√30) )
          = P(Z ≥ -1.78)
          = 0.96

FIGURE 6.4.7: Top: σ = 2.4, critical value 25.718, power = 0.53 (β = 0.4721). Bottom: σ = 1.2, critical value 25.359, power = 0.96 (β = 0.0375).

In theory, reducing σ can be a very effective way of increasing the power of a test, as Figure 6.4.7 makes abundantly clear. In practice, though, refinements in the way data are collected that would have a substantial impact on the magnitude of σ are often either difficult to identify or prohibitively expensive. More typically, experimenters achieve the same objective by simply increasing the sample size. Look again at the two sets of distributions in Figure 6.4.7. The increase in 1 - β from 0.53 to 0.96 was accomplished by cutting the denominator of the test statistic Z = (Ȳ - 25)/(σ/√30) in half, by reducing the standard deviation from 2.4 to 1.2. The same numerical effect would be achieved if σ were left unchanged but n was increased from 30 to 120. Because it can easily be increased or decreased, the sample size is the parameter that experimenters almost invariably turn to as the mechanism for ensuring that a hypothesis test will have a sufficiently high power against a given alternative.

EXAMPLE 6.4.1

Suppose an experimenter wishes to test

    H0: μ = 100 versus H1: μ > 100

at the α = 0.05 level of significance and wants 1 - β to equal 0.60 when μ = 103. What is the smallest (i.e., cheapest) sample size that will achieve that objective? Assume that the variable being measured is normally distributed with σ = 14.

Finding n, given values for α, 1 - β, σ, and μ, requires that two simultaneous equations be written for the critical value ȳ*, one in terms of the H0 distribution and the other in terms of the H1 distribution.
Setting the two expressions equal will then yield the minimum sample size that achieves the desired α and 1 - β.

Consider, first, the consequences of the level of significance being equal to 0.05. By definition,

    α = P(we reject H0 | H0 is true)
      = P(Ȳ ≥ ȳ* | μ = 100)
      = P( Z ≥ (ȳ* - 100)/(14/√n) )
      = 0.05

But P(Z ≥ 1.64) = 0.05, so

    (ȳ* - 100)/(14/√n) = 1.64

or, equivalently,

    ȳ* = 100 + 1.64 · (14/√n)                                   (6.4.1)

Similarly,

    1 - β = P(we reject H0 | H1 is true)
          = P(Ȳ ≥ ȳ* | μ = 103)
          = P( Z ≥ (ȳ* - 103)/(14/√n) )
          = 0.60

From Appendix Table A.1, P(Z ≥ -0.25) = 0.5987 ≈ 0.60, so

    (ȳ* - 103)/(14/√n) = -0.25

which implies that

    ȳ* = 103 - 0.25 · (14/√n)                                   (6.4.2)

It follows from Equations 6.4.1 and 6.4.2 that

    100 + 1.64 · (14/√n) = 103 - 0.25 · (14/√n)

Solving for n shows that a minimum of seventy-eight observations must be taken to guarantee that the hypothesis test will have the desired precision.

Decision Rules for Nonnormal Data

Our discussion of hypothesis testing has thus far been confined to situations involving binomial or normal data. Decision rules for other types of probability functions are rooted in the same basic principles. In general, to test H0: θ = θ0, where θ is the unknown parameter in a pdf f_Y(y; θ), we define the decision rule in terms of θ̂, where the latter is a sufficient statistic for θ. The corresponding critical region is the set of values of θ̂ least compatible with θ0 (but admissible under H1) whose total probability when H0 is true is α. In the case of testing H0: μ = μ0 versus H1: μ > μ0, for example, where the data are normally distributed, ȳ is a sufficient statistic for μ, and the values of the sample mean least likely to occur under H0 (yet admissible under H1) are those for which ȳ ≥ ȳ*, where P(Ȳ ≥ ȳ* | H0 is true) = α.

EXAMPLE 6.4.2

A random sample of size n = 8 is drawn from the uniform pdf f_Y(y; θ) = 1/θ, 0 ≤ y ≤ θ, for the purpose of testing

    H0: θ = 2.0 versus H1: θ < 2.0

at the α = 0.10 level of significance. Suppose the decision rule is to be based on Y_8', the largest order statistic. What would be the probability of committing a Type II error when θ = 1.7?
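Before continuing, note that setting Equations 6.4.1 and 6.4.2 equal and solving gives the familiar sample-size recipe n = [(z_alpha + z_beta) * sigma / delta]^2, where delta is the gap between the H1 and H0 means. A quick Python check of Example 6.4.1 (the sketch is mine, not the text's; it uses the same rounded table values, z = 1.64 and z = 0.25, as the example):

```python
from math import ceil

z_alpha = 1.64        # P(Z >= 1.64) = 0.05
z_beta = 0.25         # P(Z >= -0.25) ~ 0.60, so power 0.60 puts y* 0.25 SEs below 103
sigma = 14
delta = 103 - 100     # distance between the H1 and H0 means

n = ceil(((z_alpha + z_beta) * sigma / delta) ** 2)
print(n)   # 78
```

The unrounded solution is about 77.8 observations, and rounding up gives the seventy-eight quoted in the example.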
If H0 is true, Y_8' should be close to 2.0, and values of the largest order statistic much smaller than 2.0 would be evidence in favor of H1: θ < 2.0. It follows, then, that the form of the decision rule should be "Reject H0: θ = 2.0 if y_8' ≤ c," where P(Y_8' ≤ c | H0 is true) = 0.10. From Example 3.10.2,

    f_{Y_8'}(y; θ = 2) = 8 · (y/2)^7 · (1/2)

Therefore, the constant c appearing in the decision rule must satisfy the equation

    ∫ from 0 to c of 8 · (y/2)^7 · (1/2) dy = 0.10

or, equivalently, (c/2)^8 = 0.10, implying that c = 1.50.

Now, β when θ = 1.7 is, by definition, the probability that Y_8' falls in the acceptance region when H1: θ = 1.7 is true. That is,

    β = P(Y_8' > 1.50 | θ = 1.7)
      = ∫ from 1.50 to 1.7 of 8 · (y/1.7)^7 · (1/1.7) dy
      = 1 - (1.5/1.7)^8
      = 0.63

(see Figure 6.4.8).

FIGURE 6.4.8: pdf of Y_8' when H0: θ = 2.0 is true; β = 0.63.

EXAMPLE 6.4.3

Four measurements, k1, k2, k3, k4, are taken on a Poisson random variable X, where p_X(k; λ) = e^(-λ)λ^k/k!, k = 0, 1, 2, ..., for the purpose of testing

    H0: λ = 0.8 versus H1: λ > 0.8

What decision rule should be used if the level of significance is to be approximately 0.10, and what will the power of the test be when λ = 1.2?

From Example 5.6.1, we know that k̄ is a sufficient statistic for λ; the same would be true, of course, for Σ(i=1 to 4) k_i. It will be more convenient to state the decision rule in terms of the latter because we already know the probability model that describes its behavior: if X1, X2, X3, X4 are four independent Poisson random variables, each with parameter λ, then Σ(i=1 to 4) X_i has a Poisson distribution with parameter 4λ (recall Example 3.12.10). Figure 6.4.9 is a MINITAB printout of the Poisson probability function having λ = 3.2, which would be the sampling distribution of Σ X_i when H0: λ = 0.8 is true.

MTB > pdf;
SUBC > poisson 3.2.

Probability Density Function
Poisson with mu = 3.20000

      x    P(X = x)
      0      0.0408
      1      0.1304
      2      0.2087
      3      0.2226
      4      0.1781
      5      0.1140
      6      0.0608
      7      0.0278
      8      0.0111
      9      0.0040
     10      0.0013
     11      0.0004
     12      0.0001
     13      0.0000

(Critical region: x ≥ 6, for which α = P(reject H0 | H0 is true) = 0.1055.)

FIGURE 6.4.9

MTB > pdf;
SUBC > poisson 4.8.

Probability Density Function
Poisson with mu = 4.80000

      x    P(X = x)
      0      0.0082
      1      0.0395
      2      0.0948
      3      0.1517
      4      0.1820
      5      0.1747
      6      0.1398
      7      0.0959
      8      0.0575
      9      0.0307
     10      0.0147
     11      0.0064
     12      0.0026
     13      0.0009
     14      0.0003
     15      0.0001
     16      0.0000

(For x ≥ 6, 1 - β = P(reject H0 | H1 is true) = 0.3489.)

FIGURE 6.4.10

By inspection, the decision rule "Reject H0: λ = 0.8 if Σ(i=1 to 4) k_i ≥ 6" gives an α close to the desired 0.10. If H1 is true and λ = 1.2, Σ X_i will have a Poisson distribution with parameter equal to 4.8. According to Figure 6.4.10, the probability that the sum of a random sample of size four from such a distribution would equal or exceed 6 (i.e., 1 - β when λ = 1.2) is 0.3489.

EXAMPLE 6.4.4

Suppose a random sample of seven observations is taken from the pdf f_Y(y; θ) = (θ + 1)y^θ, 0 ≤ y ≤ 1, to test

    H0: θ = 2 versus H1: θ > 2

As a decision rule, the experimenter plans to record X, the number of y_i's that exceed 0.9, and reject H0 if X ≥ 4. What proportion of the time would such a decision rule lead to a Type I error?

To evaluate α = P(reject H0 | H0 is true), we first need to recognize that X is a binomial random variable where n = 7 and the parameter p is an area under f_Y(y; θ = 2):

    p = P(Y ≥ 0.9 | H0 is true)
      = P(Y ≥ 0.9 | f_Y(y; 2) = 3y²)
      = ∫ from 0.9 to 1 of 3y² dy
      = 0.271

It follows, then, that H0 will be incorrectly rejected

    α = P(X ≥ 4 | θ = 2)
      = Σ(k=4 to 7) (7 choose k)(0.271)^k(0.729)^(7-k)
      = 0.092

or 9.2% of the time.

Comment. The basic notions of Type I and Type II errors first arose in a quality-control context. The pioneering work was done at the Bell Telephone Laboratories: there the terms producer's risk and consumer's risk were introduced for what we now call α and β. Eventually,
these ideas were generalized by Neyman and Pearson in the 1930s and evolved into the theory of hypothesis testing as we know it today.

QUESTIONS

6.4.1. Recall the "Math for the Twenty-First Century" hypothesis test done in Example 6.2.1. Calculate the power of that test when the true mean is 500.

6.4.2. Carry out the details to verify the decision rule change cited on page 451 in connection with Figure 6.4.6.

6.4.3. For the decision rule found in Question 6.2.2 to test H0: μ = 95 versus H1: μ ≠ 95 at the α = 0.06 level of significance, calculate 1 - β when μ = 90.

6.4.4. Construct a power curve for the α = 0.05 test of H0: μ = 60 versus H1: μ ≠ 60 if the data consist of a random sample of size 16 from a normal distribution having σ = 4.

6.4.5. If H0: μ = 240 is tested against H1: μ < 240 at the α = 0.01 level of significance with a random sample of 25 normally distributed observations, what proportion of the time will the procedure fail to recognize that μ has dropped to 220? Assume that σ = 50.

6.4.6. Suppose n observations are taken from a normal distribution where σ = 8.0, for the purpose of testing H0: μ = 60 versus H1: μ ≠ 60 at the α = 0.07 level of significance. The lead investigator, who skipped statistics class the day decision rules were discussed, intends to reject H0 if ȳ falls in the region (60 - ȳ*, 60 + ȳ*).
(a) Find ȳ*.
(b) What is the power of the test when μ = 62?
(c) What would the power of the test be when μ = 62 if the critical region had been defined the correct way?

6.4.7. If H0: μ = 200 is to be tested against H1: μ < 200 at the α = 0.10 level of significance based on a random sample of size n from a normal distribution where σ = 15.0, what is the smallest value for n that will make the power equal to at least 0.75 when μ = 197?

6.4.8. Will n = 45 be a sufficiently large sample to test H0: μ = 10 versus H1: μ
≠ 10 at the α = 0.05 level of significance if the experimenter wants the Type II error probability to be no greater than 0.20 when μ = 12? Assume that σ = 4.

6.4.9. If H0: μ = 30 is tested against H1: μ > 30 using n = 16 observations (normally distributed) and if 1 - β = 0.85 when μ = 34, what does α equal? Assume that σ = 9.

6.4.10. Suppose a sample of size 1 is taken from the pdf f_Y(y) = (1/λ)e^(-y/λ), y > 0, for the purpose of testing H0 against a one-sided H1. The null hypothesis will be rejected if y ≥ 3.20.
(a) Calculate the probability of committing a Type I error.
(b) Calculate the probability of committing a Type II error when λ = ___.
(c) Draw a diagram that shows the α and β calculated in parts (a) and (b) as areas.

6.4.11. Polygraphs used in criminal investigations typically measure five bodily functions: (1) thoracic respiration, (2) abdominal respiration, (3) blood pressure and pulse rate, (4) muscular movement and tremor, and (5) galvanic skin response. In principle, the magnitude of these responses when the subject is asked a relevant question ("Did you murder your wife?") indicates whether he is lying or telling the truth. The procedure, of course, is not infallible, as a recent study bore out (80). Seven experienced polygraph examiners were given a set of 40 records: 20 were from innocent suspects and 20 from guilty suspects. The subjects had been asked 11 questions, on the basis of which each examiner was to make an overall judgment: "Innocent" or "Guilty." The results are as follows:

                               Suspect's True Status
                               Innocent     Guilty
  Examiner's   "Innocent"        ___          ___
  Decision     "Guilty"          ___          ___

What would be the numerical values of α and β in this context? In a judicial setting, should Type I and Type II errors carry equal weight? Explain.

6.4.12. An urn contains ten chips. An unknown number of the chips are white; the others are red. We wish to test

    H0: exactly half the chips are white
versus
    H1: more than half the chips are white

We will draw, without replacement, three chips and reject H0 if two or more are white. Find α.
Also, find β when the urn is (a) 60% white and (b) 70% white.

6.4.13. Suppose that a random sample of size 5 is drawn from a uniform pdf

    f_Y(y; θ) = 1/θ for 0 < y < θ, and 0 elsewhere

We wish to test H0: θ = 2 versus H1: θ > 2 by rejecting the null hypothesis if y_max ≥ k. Find the value of k that makes the probability of committing a Type I error equal to 0.05.

6.4.14. A sample of size 1 is taken from the pdf f_Y(y) = (θ + 1)y^θ, 0 ≤ y ≤ 1. The hypothesis H0: θ = 1 is to be rejected in favor of H1: θ > 1 if y ≥ 0.90. What is the test's level of significance?

6.4.15. A series of n Bernoulli trials is to be observed as data for testing

    H0: p = 1/2 versus H1: p > 1/2

The null hypothesis will be rejected if k, the observed number of successes, equals n. For what value of p will the probability of committing a Type II error equal 0.05?

6.4.16. Let X1 be a binomial random variable with n = 2 and p_X1 = P(success). Let X2 be an independent binomial random variable with n = 4 and p_X2 = P(success). Let X = X1 + X2. Calculate α if

    H0: p_X1 = p_X2 = 1/2 versus H1: p_X1 = p_X2 > 1/2

is to be tested by rejecting the null hypothesis when k ≥ 5.

6.4.17. A sample of size 1 from the pdf f_Y(y) = (1 + θ)y^θ, 0 ≤ y ≤ 1, is to be the basis for testing

    H0: θ = 1 versus H1: θ < 1

The critical region will be the interval y ≤ 1/2. Find an expression for 1 - β as a function of θ.

6.4.18. An experimenter takes a sample of size 1 from the Poisson probability model, p_X(k) = e^(-λ)λ^k/k!, k = 0, 1, 2, ..., and wishes to test H0: λ = 6 versus H1: λ < 6 by rejecting H0 if k ≤ 2.
(a) Calculate the probability of committing a Type I error.
(b) Calculate the probability of committing a Type II error when λ = 4.

6.4.19. A sample of size 1 is taken from the geometric probability model, p_X(k) = (1 - p)^(k-1)p, k = 1, 2, 3, ..., to test H0: p = ___ versus H1: p > ___. The null hypothesis is to be rejected if k ≥ 4. What is the probability that a Type II error will be committed when p = ___?

6.4.20.
Suppose that one observation from the exponential pdf f_Y(y) = λe^(-λy), y > 0, is to be used to test H0: λ = 1 versus H1: λ < 1. The decision rule calls for the null hypothesis to be rejected if y ≥ ln 10. Find β as a function of λ.

6.4.21. A random sample of size 2 is drawn from a uniform pdf defined over the interval [0, θ]. We wish to test

    H0: θ = 2 versus H1: θ < 2

by rejecting H0 when y1 + y2 ≤ k. Find the value of k that gives a level of significance of 0.05.

6.4.22. Suppose that the hypotheses of Question 6.4.21 are to be tested with a decision rule of the form "Reject H0: θ = 2 if y1·y2 ≤ k*." Find the value of k* that gives a level of significance of 0.05 (Hint: Use Theorem 3.8.3).

6.5 A NOTION OF OPTIMALITY: THE GENERALIZED LIKELIHOOD RATIO

In the next several chapters we will be studying some of the particular hypothesis tests that statisticians most often use in dealing with real-world problems. All of these have the same conceptual heritage, a fundamental notion known as the generalized likelihood ratio, or GLR. More than just a principle, the generalized likelihood ratio is a working criterion for actually suggesting test procedures. As a first look at this important idea, we will conclude Chapter 6 with an application of the generalized likelihood ratio to the problem of testing the parameter θ in a uniform pdf. Notice the relationship here between the likelihood ratio and the definition of an "optimal" hypothesis test.

Suppose Y1, Y2, ..., Yn is a random sample from a uniform pdf over the interval [0, θ], where θ is unknown, and our objective is to test

    H0: θ = θ0 versus H1: θ < θ0
In general, w is the set of unknown parameter values the null admissible under Ho- In the case of the unifonn, the only parameter is 8, hypothesis restricts it to a single point: w={8:8=8o ) The second parameter space, Q, is the set of aU possible values of all unknown parameters. Q={8:0<(J:::;8o } Now, recall the definition of likelihood function, L, fTOm Defurition 5.2.1. Given a sample of size n from a uniform pdf, L = L(8) = ,n ,=1 ={ h(y!: 8) (~r ' 0, otherwise For reasons that will soon be clear, we need to maximize L(8) twice, once under wand again under O. Since 9 can take on only one value--8o-under w, max L(8) tJJ { (~r = L(8o ) 0, Maximizing L(8) under O-that with no restrictions-is accomplished by simply substituting the maximum likelihood estimate for 8 into L(8). the uniform parameter, Ymax is the maximum likelihood estimate (recall Question 5.2.9). Therefore, max L(8) n = (_1_,)11 Ymax For notational simplicity, we denote max L(8) and max L(8) by L(we) and L(Qe), n w respectively. Definition 6.5..1. Let Yl.)'2 •...• Yn be a random sample from fy(y; fh •...• 8lc). generalized likelihood ratio, A, is defined to be the uniform distribution, A = (1/00)" (l/}'max)" = (Ymax)11 464 Chapter 6 Hypothesi5 Testing Note in ).. will always be but never greater one (why?). Furtbennore, values of the likelihood ratio close to one suggest that tbe data are very compatible with Ho. That the observations are "explained" almost as well by the Ho parameters as by any parameters measured by L(we ) and L(Q,,)]. For values of J,. we should accept Conversely, if L(w,,)/ L(Q,,) were dose to 0, data would not be very compatible with the parameter values in wand it would make sense to reject Ho. Def:in.i1ioo 6.5.2. A generalized likelihood ratio test (GLRT) is one that rejects Ho whenever where J,. '" is chosen so that P(O < A:::: A. 
[Note: In keeping with the capital-letter notation introduced in Chapter 3, Λ denotes the generalized likelihood ratio expressed as a random variable.]

Let f_Λ(λ | H0) denote the pdf of the generalized likelihood ratio when H0 is true. If f_Λ(λ | H0) were known, λ* (and, therefore, the decision rule) could be determined by solving the equation

    α = ∫ from 0 to λ* of f_Λ(λ | H0) dλ

(see Figure 6.5.1). In many situations, though, f_Λ(λ | H0) is not known, and it becomes necessary to show that λ is a monotonic function of some quantity W, where the distribution of W is known. Once we have found such a W, any test based on w will be equivalent to one based on λ.

FIGURE 6.5.1

For the uniform case, a suitable W is easy to find. Note that

    P(Λ ≤ λ* | H0 is true) = α = P[ (Y_max/θ0)^n ≤ λ* | H0 is true ]
                               = P[ Y_max/θ0 ≤ (λ*)^(1/n) | H0 is true ]

Let W = Y_max/θ0 and w* = (λ*)^(1/n). Then

    P(Λ ≤ λ* | H0 is true) = P(W ≤ w* | H0 is true)             (6.5.1)

But the right-hand side of Equation 6.5.1 can be evaluated from what we already know about the density function for the largest order statistic from a uniform distribution. Let f_{Y_max}(y; θ0) be the density function for Y_max. Then

    f_W(w; θ0) = θ0 · f_{Y_max}(θ0·w; θ0)

(recall Theorem 3.4.3), which, from Example 3.10.2, reduces to f_W(w) = n·w^(n-1), 0 ≤ w ≤ 1. Therefore,

    P(W ≤ w* | H0 is true) = ∫ from 0 to w* of n·w^(n-1) dw = (w*)^n = α

implying that the critical value for W is w* = α^(1/n). That is, the GLRT calls for H0 to be rejected if

    w = y_max/θ0 ≤ α^(1/n)

Comment. The GLR criterion is applied to other hypothesis-testing situations in a manner similar to what was described here: first we find L(ω̂) and L(Ω̂), then λ, and finally W. The algebra involved, though, usually becomes considerably more formidable. For example, in the "normal" model taken up in Chapters 7 and 8, both parameter spaces are two-dimensional and the likelihood function is a product of densities of the form

    (1/(√(2π)·σ)) e^{-(1/2)[(y-μ)/σ]²}, -∞ < y < ∞

QUESTIONS

6.5.1. Let k1, k2, ..., kn be a random sample from the geometric probability function p_X(k; p) = (1 - p)^(k-1)p, k = 1, 2, ....
Find λ, the generalized likelihood ratio for testing H0: p = p0 versus H1: p ≠ p0.

6.5.2. Let Y1, Y2, ..., Yn be a random sample from an exponential pdf with unknown parameter λ. Find the form of the GLRT for H0: λ = λ0 versus H1: λ ≠ λ0. What integral would have to be evaluated to determine the critical value if α were equal to 0.05?

6.5.3. Let Y1, Y2, ..., Yn be a random sample from a normal pdf with unknown mean μ and variance 1. Find the form of the GLRT for H0: μ = μ0 versus H1: μ ≠ μ0.

6.5.4. In the scenario of Question 6.5.3, suppose the alternative hypothesis is H1: μ = μ1, for some particular value of μ1. How does the likelihood ratio test change in this case? In what way does the critical region depend on the value of μ1?

6.5.5. Let k denote the number of successes observed in a sequence of n independent Bernoulli trials, where p = P(success).
(a) Show that the critical region of the likelihood ratio test of H0: p = 1/2 versus H1: p ≠ 1/2 can be written in the form

    k·ln(k) + (n - k)·ln(n - k) ≥ λ**

(b) Use the symmetry of the graph of f(k) = k·ln(k) + (n - k)·ln(n - k) about k = n/2 to show that the critical region can be written in the form |k - n/2| ≥ c, where c is a constant determined by α.

6.5.6. Suppose a sufficient statistic exists for the parameter θ. Use Theorem 5.6.1 to show that the critical region of a likelihood ratio test will depend on the sufficient statistic.

6.6 TAKING A SECOND LOOK AT STATISTICS (STATISTICAL SIGNIFICANCE VERSUS "PRACTICAL" SIGNIFICANCE)

The most important concept in this chapter, the notion of statistical significance, is also the most problematic. Why? Because statistical significance does not always mean what it seems to mean. By definition, the difference between, say, ȳ and μ0 is statistically significant if H0: μ = μ0 can be rejected at the α = 0.05 level. What that implies is that a sample mean equal to the observed ȳ is not likely to have come from a (normal) distribution whose true mean was μ0.
What it does not imply is that the true mean is necessarily much different from μ0. Recall the discussion of power curves in Section 6.4 and, in particular, the effect of n on 1 − β. The example illustrating those topics involved an additive that might be able to improve a car's gas mileage. The hypotheses tested were

    H0: μ = 25.0  versus  H1: μ > 25.0

where σ was assumed to be 2.4 (mpg) and α was set at 0.05. If n = 30, the decision rule called for H0 to be rejected when ȳ ≥ 25.718 (see p. 447). Figure 6.6.1 is the test's power curve (the point on the curve where 1 − β = 0.47 was calculated on p. 448). The important point was made in Section 6.4 that researchers have a variety of ways to increase the power of a test, that is, to decrease the probability of committing a Type II error.

[Figure 6.6.1: the power curve, 1 − β versus μ, for the n = 30 test]

[Figure 6.6.2: the power curves for n = 30, n = 60, and n = 900 superimposed]

Experimentally, the usual way is to increase the sample size, which has the effect of reducing the overlap between the H0 and H1 distributions (Figure 6.4.7 pictured such a reduction when the sample size was kept fixed but σ was decreased from 2.4 to 1.2). Figure 6.6.2 superimposes the power curves for testing H0: μ = 25.0 versus H1: μ > 25.0, to show the effect of n on 1 − β, for the cases where n = 30, n = 60, and n = 900 (keeping α = 0.05 and σ = 2.4).

There is good news in Figure 6.6.2 and there is bad news in Figure 6.6.2. The good news, not surprisingly, is that the probability of rejecting a false hypothesis increases dramatically as n increases. If the true mean μ is 25.25, for example, the Z test will (correctly) reject H0: μ = 25.0 14% of the time when n = 30, 20% of the time when n = 60, and a robust 93% of the time when n = 900.
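The power figures just quoted follow directly from the decision rule ȳ ≥ μ0 + 1.645·σ/√n. The sketch below (not part of the text; it uses only the standard normal cdf, written via the error function) reproduces them:

```python
from math import erf, sqrt

def phi(z):
    # standard normal CDF, via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

mu0, mu1, sigma = 25.0, 25.25, 2.4
z_alpha = 1.645                      # upper 5% point of the standard normal
power = {}
for n in (30, 60, 900):
    se = sigma / sqrt(n)
    crit = mu0 + z_alpha * se        # reject H0: mu = 25.0 when ybar >= crit
    power[n] = 1 - phi((crit - mu1) / se)
    print(n, round(power[n], 2))     # 0.14, 0.2, 0.93 as in the text
```

Running it recovers the 14%, 20%, and 93% rejection probabilities quoted for n = 30, 60, and 900.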
The bad news implicit in Figure 6.6.2, and, for some, this is the "down side" of hypothesis testing, is that any false hypothesis, even one where the true μ is just "epsilon" away from μ0, can be rejected virtually 100% of the time if a large enough sample size is used. Why is that bad? Because saying that a difference (between ȳ and μ0) is statistically significant makes it sound meaningful when, in fact, it may be totally inconsequential.

Suppose, for example, an additive could be found that would increase a car's gas mileage from 25.000 mpg to 25.001 mpg. Such a minuscule improvement would mean basically nothing to the consumer, yet if a large enough sample size were used, the probability of rejecting H0: μ = 25.000 in favor of H1: μ > 25.000 could be made arbitrarily close to one. That is, the difference between ȳ and 25.000 would qualify as being statistically significant even though it had no "practical significance" whatsoever.

Two lessons should be learned here, one old and one new. The new lesson is to be wary of inferences drawn from experiments or surveys based on huge sample sizes. Many statistically significant results are likely to be found in those situations, but some of those "reject H0s" may be driven primarily by the sample size. Paying attention to the magnitude of ȳ − μ0 (or p̂ − p0) is often a good way to keep the conclusion of a hypothesis test in perspective. The second lesson has been encountered before and will come up again: analyzing data is not a simple exercise in plugging numbers into formulas or scanning computer printouts. Real-world data are seldom simple, and they cannot be adequately summarized, quantified, or interpreted with any single statistical technique. Hypothesis tests, like every other inference procedure, have strengths and weaknesses, assumptions and limitations. Being aware of what they can tell us, and how they can trick us, is the first step toward using them properly.
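Just how large a sample makes even the 0.001-mpg improvement "significant" with near certainty? A back-of-the-envelope calculation (a sketch, not part of the text) using the standard sample-size formula n = [(z_α + z_β)σ/δ]² for a one-sided z test answers the question:

```python
from math import sqrt

# Sample size needed for the one-sided z test (alpha = 0.05) to reject
# H0: mu = 25.000 with probability 0.99 when the true mean is 25.001
sigma, delta = 2.4, 0.001
z_alpha, z_beta = 1.645, 2.326       # upper 5% and 1% points of N(0, 1)
n = ((z_alpha + z_beta) * sigma / delta) ** 2
print(round(n / 1e6, 1))             # about 90.8 (million observations)
```

Roughly 91 million measurements would do it, which is exactly the sense in which "statistically significant" can be detached from any practical meaning.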
CHAPTER 7  The Normal Distribution

7.1 INTRODUCTION
7.2 COMPARING (Ȳ − μ)/(σ/√n) AND (Ȳ − μ)/(S/√n)
7.3 DERIVING THE DISTRIBUTION OF (Ȳ − μ)/(S/√n)
7.4 DRAWING INFERENCES ABOUT μ
7.5 DRAWING INFERENCES ABOUT σ²
7.6 TAKING A SECOND LOOK AT STATISTICS ("BAD" ESTIMATORS)
APPENDIX 7.A.1 MINITAB APPLICATIONS
APPENDIX 7.A.2 SOME DISTRIBUTION RESULTS FOR Ȳ AND S²
APPENDIX 7.A.3 A PROOF OF THEOREM 7.5.2
APPENDIX 7.A.4 A PROOF THAT THE ONE-SAMPLE t TEST IS A GLRT

I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the "law of frequency of error" (the normal distribution). The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. - Francis Galton

7.1 INTRODUCTION

Finding probability distributions to describe, and, ultimately, to predict, empirical data is one of the most important contributions a statistician can make to the research scientist. Already we have seen a number of functions playing that role. The binomial is an obvious model for the number of correct responses in the Pratt-Woodruff ESP study (Case Study 4.3.1); the probability of holding a winning ticket in a game of keno is given by the hypergeometric (Example 3.2.5); and applications of the Poisson have run the gamut from radioactive decay (Case Study 4.2.2) to Saturday afternoon football fumbles (Case Study 4.2.3). Those examples notwithstanding, by far the most widely used probability model in statistics is the normal (or Gaussian) distribution,

    f_Y(y) = (1/(√(2π) σ)) e^{−(1/2)[(y−μ)/σ]²},  −∞ < y < ∞    (7.1.1)

Some of the history surrounding the normal curve has already been discussed in Chapter 4: it first appeared as a limiting form of the binomial, but then soon found itself used most often in non-binomial situations.
In Chapter 4 we learned how to find areas under normal curves and did some problems involving sums and averages. Chapter 5 provided estimates for the parameters of the normal density and showed their role in fitting normal curves to data. In this chapter, we will take a second look at the properties and applications of this singularly important pdf, this time paying special attention to the part it plays in estimation and hypothesis testing.

7.2 COMPARING (Ȳ − μ)/(σ/√n) AND (Ȳ − μ)/(S/√n)

Suppose that a random sample of n measurements, Y1, Y2, ..., Yn, is to be taken on a trait that is thought to be normally distributed, the objective being to draw an inference about the pdf's true mean, μ. If the variance σ² is known, we know how to proceed: a decision rule for testing H0: μ = μ0 is given in Theorem 6.2.1, and the construction of a confidence interval for μ is described in Section 5.3. As we learned, both of those procedures are based on the fact that the ratio Z = (Ȳ − μ)/(σ/√n) has a standard normal distribution, f_Z(z). In practice, though, the parameter σ is seldom known, so the ratio (Ȳ − μ)/(σ/√n) cannot be calculated, even if a value for the mean, say μ0, is substituted for μ. Typically, the only information experimenters have about σ² is what can be gleaned from the Yi s themselves.

The usual estimator for the population variance, of course, is

    S² = (1/(n − 1)) Σ_{i=1}^{n} (Yi − Ȳ)²

the unbiased version of the maximum likelihood estimator for σ². The question is, what effect does replacing σ with S have on the Z ratio? Are there probabilistic differences between (Ȳ − μ)/(σ/√n) and (Ȳ − μ)/(S/√n)?

Historically, many early practitioners of statistics felt that replacing σ with S had, in fact, no effect on the distribution of the Z ratio. Sometimes they were right. If the sample size is very large (which was not an unusual state of affairs in many of the early applications of statistics), the estimator S is essentially a constant and for all intents and purposes equal to the true σ. Under those conditions, the ratio (Ȳ − μ)/(S/√n) will behave much like a standard normal random variable, Z.
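That large-sample claim is easy to check by simulation. The sketch below (not part of the text; the choices μ = 50 and σ = 10 are arbitrary) computes the studentized ratio for many samples: with n = 200 its tail frequencies agree with f_Z(z), while with n = 4 the ratio falls outside ±1.96 almost three times as often as the 5% that f_Z(z) predicts.

```python
import numpy as np

rng = np.random.default_rng(7)
reps = 30_000
tail = {}
for n in (4, 200):
    y = rng.normal(50.0, 10.0, size=(reps, n))   # mu = 50, sigma = 10: arbitrary
    # replace sigma by the sample standard deviation S in the Z ratio
    ratio = (y.mean(axis=1) - 50.0) / (y.std(axis=1, ddof=1) / np.sqrt(n))
    tail[n] = np.mean(np.abs(ratio) > 1.96)      # f_Z(z) predicts about 0.05
    print(n, round(tail[n], 3))
```

The small-n discrepancy is precisely the phenomenon taken up next.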
When the sample size is small, though, replacing σ with S does matter, and it changes the way we draw inferences about μ.

Credit for recognizing that (Ȳ − μ)/(σ/√n) and (Ȳ − μ)/(S/√n) do not have the same distribution goes to William Sealy Gosset. After graduating in 1899 from Oxford with a First Class degree in chemistry, Gosset took a position at Arthur Guinness, Son & Co., a firm that brewed a thick, dark ale known as stout. Given the task of making the art of brewing more scientific, Gosset quickly realized that any experimental studies would necessarily face two obstacles: first, for a variety of economic and logistical reasons, sample sizes would invariably be small; and second, there would never be any way to know the exact value of the true variance, σ², associated with any set of measurements. So, when the objective of a study was to draw an inference about μ, Gosset found himself working with the ratio (Ȳ − μ)/(S/√n), where n was often on the order of four or five. The more he encountered that situation, the more he became convinced that ratios of that sort are not adequately described by the standard normal pdf. In particular, the distribution of (Ȳ − μ)/(S/√n) seemed to have the same general bell-shaped configuration as f_Z(z), but the tails were "thicker"; that is, ratios much smaller than zero or much greater than zero were not as rare as the standard normal pdf would predict.

Figure 7.2.1 illustrates the distinction between the distributions of (Ȳ − μ)/(σ/√n) and (Ȳ − μ)/(S/√n) that caught Gosset's attention. In Figure 7.2.1a, 500 samples of size n = 4 have been drawn from a normal distribution where the value of σ is known. For each sample, the ratio (ȳ − μ)/(σ/√4) has been computed. Superimposed over the shaded histogram of those five hundred ratios is the standard normal curve, f_Z(z). Clearly, the probabilistic behavior of the random variable (Ȳ − μ)/(σ/√4) is entirely consistent with f_Z(z). Figure 7.2.1b is likewise based on five hundred samples of size n = 4 drawn from a normal distribution.
Here, though, S has been calculated for each sample, and the ratio plotted is (ȳ − μ)/(s/√4) rather than (ȳ − μ)/(σ/√4). In this case, the superimposed standard normal pdf does not adequately describe the histogram; specifically, it underestimates the number of ratios much less than zero as well as the number much larger than zero (which is exactly what Gosset had noted).

[Figure 7.2.1: (a) the observed distribution of (Ȳ − μ)/(σ/√4) for 500 samples, with f_Z(z) superimposed; (b) the corresponding observed distribution of (Ȳ − μ)/(S/√4)]

Gosset called the quotient T = (Ȳ − μ)/(S/√n) a t ratio, and published a paper in 1908 entitled "The Probable Error of a Mean" in which he derived a formula for the ratio's pdf, f_T(t). Today, Gosset's work in finding f_T(t) is considered to be one of the major statistical breakthroughs of the early twentieth century. We will derive f_T(t) in the next section and get a first look at some of the many applications of the ratio (Ȳ − μ)/(S/√n).

Comment. Initially, Gosset's derivation of f_T(t) attracted very little attention. Virtually none of his contemporaries had any inkling of the impact that the "t distribution" would have in modern statistics. Indeed, fourteen years after the paper was published, Gosset sent a tabulation of his distribution to a fellow statistician (Ronald A. Fisher) with a note saying, "I am sending you a copy of Student's Tables as you are the only man that's ever likely to use them!"

7.3 DERIVING THE DISTRIBUTION OF (Ȳ − μ)/(S/√n)

Generally speaking, the probability functions that statisticians have occasion to use fall into two categories. There are a dozen or so that can effectively model the individual measurements taken on a variety of real-world phenomena. These are the distributions we studied in Chapters 3 and 4, most notably the normal, binomial, exponential, hypergeometric, and Poisson. There is a second set of probability distributions that model the behavior of functions based on sets of n random variables. These
are called sampling distributions, and they are typically used for inference purposes. The normal distribution belongs to both categories. We have seen a number of scenarios (IQ scores, for example) where the Gaussian distribution is effective at describing the distribution of repeated measurements. At the same time, the normal distribution is used to model the probabilistic behavior of the ratio (Ȳ − μ)/(σ/√n); in the latter capacity, it serves as a sampling distribution.

Next to the normal distribution, the most important sampling distributions are the Student t distribution, the chi square distribution, and the F distribution. All three will be introduced in this section, because we need the latter two to derive f_T(t), the pdf for the t ratio, (Ȳ − μ)/(S/√n). So, although our objective in this section is to study the Student t distribution, we get in the process the two other distributions that we will be encountering over and over in the chapters ahead.

Deriving the pdf for a t ratio is not a simple matter. That may come as a surprise, given that deducing the pdf for (Ȳ − μ)/(σ/√n), going from Y to Ȳ, is easy (using moment-generating functions). But going from (Ȳ − μ)/(σ/√n) to (Ȳ − μ)/(S/√n) creates some major mathematical complications, because the t ratio is a quotient of two random variables, Ȳ and S, each of which is a function of the n random variables Y1, Y2, ..., Yn. In general, and this ratio is no exception, finding the pdfs of quotients of random variables is difficult, especially when the numerator and denominator have complicated pdfs to begin with.

As we will see in the next few pages, the derivation of f_T(t) plays out in several steps. First, we show that Σ_{j=1}^{m} Z_j², where the Z_j s are independent standard normal random variables, has a gamma distribution (more specifically, a special case of the gamma distribution, called a chi square distribution). Then we show that Ȳ and S², based on a random sample of size n from a normal distribution, are independent random variables and that (n − 1)S²/σ² has a chi square distribution. Next we find the pdf of the quotient of two chi square random variables (which is called the F distribution). The final step of the
proof is to show that T² = [Z/√(U/n)]² can be written as the quotient of two chi square random variables, each divided by its degrees of freedom, making it a special case of the F distribution. Knowing the F pdf then enables us to deduce f_T(t).

Theorem 7.3.1. Let U = Σ_{j=1}^{m} Z_j², where Z1, Z2, ..., Zm are independent standard normal random variables. Then U has a gamma distribution with r = m/2 and λ = 1/2; that is,

    f_U(u) = [1/(2^{m/2} Γ(m/2))] u^{(m/2)−1} e^{−u/2},  u ≥ 0

Proof. First take m = 1. For any u ≥ 0,

    F_{Z²}(u) = P(Z² ≤ u) = P(−√u ≤ Z ≤ √u) = 2 P(0 ≤ Z ≤ √u) = 2 ∫₀^{√u} (1/√(2π)) e^{−z²/2} dz

Differentiating both sides of the equation gives

    f_{Z²}(u) = (d/du) F_{Z²}(u) = (1/√(2π)) u^{−1/2} e^{−u/2}

which we recognize as the form of a gamma pdf with r = 1/2 and λ = 1/2. By Theorem 4.6.4, then, the sum of m such independent variables has a gamma distribution with r = m/2 and λ = 1/2, as stated. □

The distribution of the sum of squares of independent standard normal random variables is important enough to merit its own name, despite the fact that it represents nothing more than a special case of the gamma distribution.

Definition 7.3.1. The pdf of U = Σ_{j=1}^{m} Z_j², where Z1, Z2, ..., Zm are independent standard normal random variables, is called the chi square distribution with m degrees of freedom.

The next theorem is especially critical in the derivation of f_T(t). Using simple algebra, it can be shown that the square of a t ratio can be written as the quotient of two chi squares, one a function of Ȳ and the other a function of S². By showing that Ȳ and S² are independent random variables (which Theorem 7.3.2 does), Theorem 3.8.2 can be used to find an expression for the pdf of that quotient.

Theorem 7.3.2. Let Y1, Y2, ..., Yn be a random sample from a normal distribution with mean μ and variance σ². Then
a. S² and Ȳ are independent
b. (n − 1)S²/σ² = (1/σ²) Σ_{i=1}^{n} (Yi − Ȳ)² has a chi square distribution with n − 1 degrees of freedom

Proof. See Appendix 7.A.2. □

As we will see shortly, the square of a t ratio is a special case of an F random variable.
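Theorem 7.3.2(b) lends itself to a quick numerical check. A chi square random variable with n − 1 df has mean n − 1 and variance 2(n − 1) (see Question 7.3.2), so simulated values of (n − 1)S²/σ² should exhibit those moments. The following sketch (not part of the text; μ = 50, σ = 10, and n = 6 are arbitrary choices) does exactly that:

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu, sigma, reps = 6, 50.0, 10.0, 200_000
Y = rng.normal(mu, sigma, size=(reps, n))
U = (n - 1) * Y.var(axis=1, ddof=1) / sigma ** 2  # Theorem 7.3.2(b)
print(round(U.mean(), 1))   # close to n - 1 = 5, the chi square mean
print(round(U.var(), 1))    # close to 2(n - 1) = 10, the chi square variance
```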
The next definition and theorem summarize the properties of the F distribution that we will need to find the pdf associated with the Student t distribution.

Definition 7.3.2. Suppose that V and U are independent chi square random variables with m and n degrees of freedom, respectively. A random variable of the form (V/m)/(U/n) is said to have an F distribution with m and n degrees of freedom.

Comment. The F in the name of the distribution commemorates the renowned statistician Sir Ronald Fisher.

Theorem 7.3.3. Suppose F_{m,n} = (V/m)/(U/n) denotes an F random variable with m and n degrees of freedom. The pdf of F_{m,n} has the form

    f_{F_{m,n}}(w) = [Γ((m+n)/2) m^{m/2} n^{n/2} w^{(m/2)−1}] / [Γ(m/2) Γ(n/2) (n + mw)^{(m+n)/2}],  w ≥ 0

Proof. We begin by finding the pdf for V/U. From Theorem 7.3.1 we know that

    f_V(v) = [1/(2^{m/2} Γ(m/2))] v^{(m/2)−1} e^{−v/2},  v ≥ 0

and

    f_U(u) = [1/(2^{n/2} Γ(n/2))] u^{(n/2)−1} e^{−u/2},  u ≥ 0

From Theorem 3.8.2, the pdf of W = V/U is

    f_{V/U}(w) = ∫₀^∞ u f_U(u) f_V(uw) du

The integrand is the variable part of a gamma density with r = (m + n)/2 and λ = (1 + w)/2, so the integral equals the reciprocal of that density's constants. The statement of the theorem then follows from the quotient result derived in Chapter 3, since F_{m,n} = (n/m)(V/U) implies

    f_{F_{m,n}}(w) = (m/n) f_{V/U}((m/n) w)   □

F Tables

When graphed, an F distribution looks very much like a typical chi square distribution: values of (V/m)/(U/n) can never be negative, and the F pdf is skewed sharply to the right. Clearly, the complexity of f_{F_{m,n}}(r) makes the function difficult to work with directly. Tables, though, are widely available that give various percentiles of F distributions for different values of m and n. Figure 7.3.1 shows f_{F_{3,5}}(r). In general, the symbol F_{p,m,n} will be used to denote the 100pth percentile of the F distribution with m and n degrees of freedom. Here, the 95th percentile of f_{F_{3,5}}(r), that is, F_{.95,3,5}, is 5.41 (see Appendix Table A.4).
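The tabled percentile F_{.95,3,5} = 5.41 can be approximated directly from Definition 7.3.2 by simulation, building each chi square as a sum of squared standard normals (a sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(11)
m, n, reps = 3, 5, 500_000
V = (rng.standard_normal((reps, m)) ** 2).sum(axis=1)  # chi square, m = 3 df
U = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)  # chi square, n = 5 df
W = (V / m) / (U / n)                                  # F ratio, Definition 7.3.2
q95 = np.quantile(W, 0.95)
print(round(q95, 2))        # close to the tabled F.95,3,5 = 5.41
```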
[Figure 7.3.1: the pdf f_{F_{3,5}}(r), with the shaded area 0.05 lying to the right of F_{.95,3,5} = 5.41]

Using the F Distribution to Derive the pdf for t Ratios

Now we have all the background results necessary to find the pdf of (Ȳ − μ)/(S/√n). Actually, though, we can do better than that, because what we have been calling the "t ratio" is just one special case of an entire family of quotients known as t ratios. Finding the pdf for the entire family will give us the probability distribution for (Ȳ − μ)/(S/√n) as well.

Definition 7.3.3. Let Z be a standard normal random variable and let U be a chi square random variable independent of Z with n degrees of freedom. The Student t ratio with n degrees of freedom is denoted T_n, where

    T_n = Z / √(U/n)

Comment. The term "degrees of freedom" is often abbreviated df.

Lemma. The pdf for T_n is symmetric: f_{T_n}(t) = f_{T_n}(−t), for all t.

Proof. Since −T_n = (−Z)/√(U/n) is also the ratio of a standard normal random variable to the square root of an independent chi square random variable divided by its df, it must have a Student t distribution with n df as well. But then T_n and −T_n have the same pdf at every point t. Therefore, f_{T_n}(−t) = f_{T_n}(t), for all t. □

Theorem 7.3.4. The pdf for a Student t random variable with n degrees of freedom is given by

    f_{T_n}(t) = [Γ((n+1)/2) / (√(nπ) Γ(n/2))] (1 + t²/n)^{−(n+1)/2},  −∞ < t < ∞

Proof. Note that T_n² = Z²/(U/n) has an F distribution with 1 and n df. Suppose that t > 0. By the symmetry of f_{T_n}(t),

    F_{T_n}(t) = P(T_n ≤ t) = 1/2 + P(0 ≤ T_n ≤ t)
               = 1/2 + (1/2) P(−t ≤ T_n ≤ t)
               = 1/2 + (1/2) P(0 ≤ T_n² ≤ t²)
               = 1/2 + (1/2) F_{T_n²}(t²)

Differentiating both sides, f_{T_n}(t) = t f_{T_n²}(t²), and substituting the F pdf from Theorem 7.3.3 gives the stated result. □

Comment. Over the years, the lower case t has come to be the accepted symbol for the random variable of Definition 7.3.3. We will follow that convention when the context allows some flexibility. In formal mathematical statements about distributions, though, we will be consistent with the capital-letter notation for random variables and denote the Student t ratio as T_n.
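The construction in Definition 7.3.3, along with the symmetry lemma, is easy to verify by simulation. The sketch below (not part of the text) builds T_3 from an independent Z and chi square U and checks both tails against the cutoff 2.3534, which is t_{.05,3} in the t table of Section 7.4:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 3, 400_000
Z = rng.standard_normal(reps)
U = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)  # chi square, n df
T = Z / np.sqrt(U / n)                                 # Definition 7.3.3
upper = np.mean(T > 2.3534)    # 2.3534 = t_{.05,3} from the t table
lower = np.mean(T < -2.3534)
print(round(upper, 3), round(lower, 3))   # both close to 0.05, by symmetry
```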
All that remains to be done to accomplish our original goal of finding the pdf for (Ȳ − μ)/(S/√n) is to show that the latter is a special case of the Student t random variable described in Definition 7.3.3. Theorem 7.3.5 provides the details. Notice that a sample of size n yields a t ratio in this case having n − 1 degrees of freedom.

Theorem 7.3.5. Let Y1, Y2, ..., Yn be a random sample from a normal distribution with mean μ and standard deviation σ. Then

    T_{n−1} = (Ȳ − μ)/(S/√n)

has a Student t distribution with n − 1 degrees of freedom.

Proof. We can write (Ȳ − μ)/(S/√n) in the form

    (Ȳ − μ)/(S/√n) = [(Ȳ − μ)/(σ/√n)] / √{[(n − 1)S²/σ²]/(n − 1)}

The numerator is a standard normal random variable, and the denominator is the square root of a chi square random variable with n − 1 df divided by its degrees of freedom. Moreover, Theorem 7.3.2 shows that the numerator and denominator are independent. The statement of the theorem follows immediately, then, from Definition 7.3.3. □

How f_{T_n}(t) and f_Z(z) Are Related

Despite the disparity in the appearance of the formulas for f_{T_n}(t) and f_Z(z), Student t distributions and the standard normal distribution have much in common. Both are bell shaped, symmetric, and centered around zero. Student t curves, though, are flatter. Figure 7.3.2 is a graph of two Student t pdfs, one with 2 df and the other with 10 df. Also pictured is the standard normal pdf, f_Z(z). Notice that as n increases, f_{T_n}(t) becomes more and more like f_Z(z).

[Figure 7.3.2: f_{T_2}(t) and f_{T_10}(t) superimposed on f_Z(z)]

The convergence of f_{T_n}(t) to f_Z(z) is a consequence of two estimation properties of the sample standard deviation: (1) S is consistent for σ, and (2) the standard deviation of S goes to 0 as n goes to infinity (see Question 7.3.4). Therefore, as n gets large, the probabilistic behavior of (Ȳ − μ)/(S/√n) will become increasingly similar to the distribution of (Ȳ − μ)/(σ/√n), that is, to f_Z(z).

QUESTIONS

7.3.1. Show directly, without appealing to the fact that χ²_n is a gamma random variable, that f_{χ²_n}(y) as stated in Definition 7.3.1 is a true probability density function.

7.3.2. Find the moment-generating function for a chi square random variable and use it to show that E(χ²_n) = n and Var(χ²_n) = 2n.

7.3.3.
Is it believable that the numbers 65, 30, and 55 are a random sample of size 3 from a normal distribution with μ = 50 and σ = 10? Answer the question by using a chi square distribution. Hint: Let Z_i = (Y_i − 50)/10 and use Theorem 7.3.1.

7.3.4. Use the fact that (n − 1)S²/σ² is a chi square random variable with n − 1 df to prove that Var(S²) = 2σ⁴/(n − 1). Hint: Use the fact that the variance of a chi square random variable with k df is 2k.

7.3.5. Let Y1, Y2, ..., Yn be a random sample from a normal distribution. Use the statement of Question 7.3.4 to prove that S² is consistent for σ².

7.3.6. If Y is a chi square random variable with n degrees of freedom, the pdf of (Y − n)/√(2n) converges to f_Z(z) as n goes to infinity (see Question 7.3.2). Use the asymptotic normality of (Y − n)/√(2n) to approximate the fortieth percentile of a chi square random variable with 200 degrees of freedom.

7.3.7. Use Appendix Table A.4 to find
(a) F_{.50,6,7}  (b) F_{.001,15,5}  (c) F_{.90,2,2}

7.3.8. Let V and U be independent chi square random variables with 7 and 9 degrees of freedom, respectively. Is it more likely that (V/7)/(U/9) will be between (1) 2.51 and 3.29 or (2) 3.29 and 4.20?

7.3.9. Use Appendix Table A.4 to find the values of x that satisfy the following equations:
(a) P(0.109 < F < x) = 0.95
(b) P(F < 1.69) = x
(c) P(F > x) = 0.01
(d) P(0.115 < F < 3.29) = 0.90
(e) P((V/2)/(U/3) ≤ x) = 0.25, where V is a chi square random variable with 2 df and U is an independent chi square random variable with 3 df

7.3.10. Suppose that two independent samples, each of size n, are drawn from a normal distribution with variance σ². Let S1² and S2² denote the two sample variances. Use the fact that (n − 1)S²/σ² has a chi square distribution with n − 1 df to explain why S1²/S2² has an F distribution with n − 1 and n − 1 degrees of freedom.

7.3.11. If the random variable F has an F distribution with m and n degrees
of freedom, show that 1/F has an F distribution with n and m degrees of freedom.

7.3.12. Use the result claimed in Question 7.3.11 to express percentiles of f_{F_{n,m}}(r) in terms of percentiles of f_{F_{m,n}}(r). That is, if we know the values a and b for which P(a ≤ F_{m,n} ≤ b) = q, what values of c and d will satisfy P(c ≤ F_{n,m} ≤ d) = q? "Check" your answer by comparing the values of F_{.05,2,8} and F_{.95,8,2} in Appendix Table A.4.

7.3.13. Show that as n → ∞, the pdf of a Student t random variable with n df converges to f_Z(z). Hint: To show that the constant term in the pdf for T_n converges to 1/√(2π), use Stirling's formula, n! ≈ √(2π) n^{n+1/2} e^{−n}.

7.3.14. Evaluate the integral ∫₀^∞ 1/(1 + x²) dx using the Student t distribution.

7.4 DRAWING INFERENCES ABOUT μ

One of the most common of all statistical objectives is to draw inferences about the mean of the population being represented by a set of data. Indeed, we have already taken a first look at that problem in Section 6.2. If the Yi s come from a normal distribution where σ is known, the null hypothesis H0: μ = μ0 can be tested by calculating a Z ratio, (ȳ − μ0)/(σ/√n) (recall Theorem 6.2.1). Implicit in that solution, though, is an assumption not likely to be satisfied: rarely does the experimenter actually know the value of σ. Section 7.3 dealt with precisely that scenario and derived the pdf of the ratio T_{n−1} = (Ȳ − μ)/(S/√n), where σ has been replaced by S. Given that the ratio has a Student t distribution with n − 1 degrees of freedom, we now have the tools to draw inferences about μ in the all-important case where σ is not known. Section 7.4 illustrates these various techniques, examines the assumption underlying the "t test," and looks at what happens when that assumption is not satisfied.

t Tables

We have already seen that doing hypothesis tests and constructing confidence intervals using (Ȳ − μ)/(σ/√n) or some other Z ratio requires that we know certain upper and/or lower percentiles from the standard normal distribution. There is a similar need to identify appropriate "cutoffs" from Student t distributions when a procedure is based on (Ȳ − μ)/(S/√n) or some other t ratio.

Figure 7.4.1 shows a portion of the t table that appears in the back of every statistics book. Each row corresponds to a different Student t pdf. The
column headings give the area to the right of the number appearing in the body of the table. For example, the entry listed in the α = .01 column and the df = 3 row is 4.541, meaning that P(T3 ≥ 4.541) = 0.01. More generally, we will use the symbol t_{α,n} to denote the 100(1 − α)th percentile of f_{T_n}(t); that is, P(T_n ≥ t_{α,n}) = α (see Figure 7.4.2). No lower percentiles of Student t curves need to be tabulated, because the symmetry of f_{T_n}(t) implies that P(T_n ≤ −t_{α,n}) = α.

The number of different Student t pdfs summarized in a t table varies considerably. Many tables will provide entries for degrees of freedom from one to thirty; others will include df values from one to fifty, or even one to one hundred. The last row in any t table, though, is always labeled "∞"; its entries, of course, correspond to z_α.

[Figure 7.4.1: a portion of the t table]

                                   α
    df    .20     .15     .10     .05      .025     .01     .005
    1    1.376   1.963   3.078   6.3138   12.706   31.821  63.657
    2    1.061   1.386   1.886   2.9200   4.3027   6.965   9.9248
    3    0.978   1.250   1.638   2.3534   3.1825   4.541   5.8409
    4    0.941   1.190   1.533   2.1318   2.7764   3.747   4.6041
    5    0.920   1.156   1.476   2.0150   2.5706   3.365   4.0321
    6    0.906   1.134   1.440   1.9432   2.4469   3.143   3.7074
    30   0.854   1.055   1.310   1.6973   2.0423   2.457   2.7500
    ∞    0.84    1.04    1.28    1.64     1.96     2.33    2.58

[Figure 7.4.2: f_{T_n}(t), with the shaded area α lying to the right of t_{α,n}]

Constructing a Confidence Interval for μ

The fact that (Ȳ − μ)/(S/√n) has a Student t distribution with n − 1 degrees of freedom justifies the statement that

    P(−t_{α/2,n−1} ≤ (Ȳ − μ)/(S/√n) ≤ t_{α/2,n−1}) = 1 − α

or, equivalently, that

    P(Ȳ − t_{α/2,n−1} · S/√n ≤ μ ≤ Ȳ + t_{α/2,n−1} · S/√n) = 1 − α    (7.4.1)

(provided the Yi s are a random sample from a normal distribution). When the actual data values are then used to evaluate Ȳ and S, the lower and upper endpoints identified in Equation 7.4.1 define a 100(1 − α)% confidence interval for μ.

Theorem 7.4.1. Let y1, y2, ..., yn be a random sample of size n from a normal distribution with (unknown) mean μ.
A 100(1 − α)% confidence interval for μ is the set of values

    (ȳ − t_{α/2,n−1} · s/√n,  ȳ + t_{α/2,n−1} · s/√n)

CASE STUDY 7.4.1

To hunt flying insects, bats emit high-frequency sounds and then listen for their echoes. Until an insect is located, these pulses are emitted at intervals of from fifty to one hundred milliseconds. When an insect is detected, the pulse-to-pulse interval suddenly decreases, sometimes to as low as ten milliseconds, thus enabling the bat to pinpoint its prey's position. This raises an interesting question: How far apart are the bat and the insect when the bat first senses that the insect is there? Or, put another way, what is the effective range of a bat's echolocation system?

The technical problems that had to be overcome in measuring the bat-to-insect detection distance were far more complex than the statistical problems involved in analyzing the actual data. The procedure that finally evolved was to put a bat into an eleven-by-sixteen-foot room, along with an ample supply of fruit flies, and record its flight with two synchronized sixteen-millimeter sound-on-film cameras. By examining the two sets of pictures frame by frame, scientists could follow the bat's flight pattern and, at the same time, monitor its pulse frequency. For each insect that was caught (64), it was therefore possible to estimate the distance between the bat and the insect at the precise moment the bat's pulse-to-pulse interval decreased (see Table 7.4.1).

Table 7.4.1

    Catch Number    Detection Distance (cm)
    1               62
    2               52
    3               68
    4               23
    5               34
    6               45
    7               27
    8               42
    9               83
    10              56
    11              40
From Table A.2 = 0.95 Accordingly, the 95% confidence interval for JJ., is (Jrr) ,y + (Y - 22281 C~)) = 48.4 - 2.2281 JIT .48.4 + 2.2281 (18~)) JiI (1&1) ( = em,60.6 EXAMPLE 1.4.1 The sample mean and sample standard deviation for the candom sample of size II = 20 in the following list are 2.6 and respectively. Let JJ., denote the true mean of the distribution being by these YiS. Section 7.4 2.5 3.2 0.5 0.1 0.1 0.2 0.2 0.1 0.4 0.4 0.3 7.4 8.6 1.8 0.3 Drawing Inferences About Ii- 485 1.3 1.4 11.2 2.1 10.1 Is it correct to say that a 95 % confidence interval for tJ. is the set of values _ ( y - t.Q2S.n-l . = ( 2.6 s _ In' Y + - 2.0930 . (025.11-1 . 3.6 6 J21i' 2. + s ) In 2.0930· 3.6) J20 = (0.9,4.3) No. It is true that all the correct factors have been used in calculating (0.9, 4.3), but Theorem 7.4.1 does not apply in this case because the normality assumption it makes is dearly being violated.. Figure 7.4.3 is a histogram of the twenty YiS. The extreme skewness that is so evident there is not consistent with the presumption that the data's underlying pdf is a normal distribution. As a result, the pdf describing the probabilistic behavior of Y-tJ. .J2O would not be h I9 (t)· SI 20 ~ ill this situation is not exactly a T19 random variable SI 20 leaves unanswered a critical question: Is the ratio approximately a T19 random variable? We will revisit the normality assumption-and what happens when that assumption is not satisfied--larer in this section when we discuss a critically important property known as robustness. Comment. To say that o y y 5 RGURE 7.4.3 10 486 Chapter 7 The Normal Distribution QUES110NS 7.4.1. Use ........,."1'1.,, Table A2 to find the following probabilities: (D) ?!:: 1.134) (b) (c) (d) P(-L055 < < 7.4.2. What values of x satisfy the following equations? (a) P(-x.::; < 0.98 (b) P(T13?!:: x) = 0.85 (cl P(T26 < x) = 0.95 (d) P(T2:::: x) = 0.025 7.4.3. Which of the following differences is or (05.n - = 9 is drawn from 7.4.4. 
A random sample of size n a normal distribution with,.., = V.6. Within what interval (-0, +0) can we expect to find 7.4..5. of the time? a random sample of size n For what value of k is p 7.4.6.. = 11 Y 27.6 80% of the time? 90% js drawn from a normal distribution with (l's7;'°1 ~ » ~ 0.05 and S denote the mean and sample standard deviation, respectively, based a set of n 20 measurements taken from a normal distribution with J.1 == 90.6. Find the function k(S) for which 00 P(90.6 k(S) .::; Y 90.6 + k(S» = 0.99 7.4.7. In the home, the amount of radiation by a color television set does not pose a health problem of any consequence. The same may not be true in department stores, where as many as 15 or 20 sets may be £Urnoo on at the same time and in a relatively confined area. The following readings (in per hour) were taken at 10 different department stores, each having at least five sets in their sales areas (89). recommended safety limit set by the National Council on Radiation Protection is mr/h.) Store 1 2 3 4 5 6 7 8 9 to Radiation Level 0.40 0.48 0.60 0.15 0.50 0.80 0.50 0.36 0.16 0.89 Section 7.4 Drawing Inferences About IL 481 Construct a 99% confidence interval for the true department-store radiation exposure level. . 7.4.8. The following table lists the costs of repairing minivan bumpers damaged by a 5-mph these seven observations to construct a 95% oonfidence interval for collision (195). JL. the true cost for the population of all minivan models damaged. NOIe: sample standard deviation for these data is Cost of Nissan Quest Oldsmobile Silhouette Grand Caravan ,"',..... " •• Lumina Toyota Previa LE Pontiac Trans Sport MazdaMPV $1154 1106 1560 1769 1741 3179 7.4.9. Creativity, as any number of studies have shown, is very much .Ii province of the Whether the focus is music., literature, science, or an individual's made most profound work seldom occurs late in life. discoveries at the Newton, at the age of 23. 
The following are twelve scientific breakthroughs dating from the middle of the sixteenth century to the early years of the twentieth century (216). All represented high-water marks in the careers of the scientists involved.

  Discovery                                      Discoverer    Date   Age
  Earth goes around sun                          Copernicus    1543   40
  Telescope, basic laws of astronomy             Galileo       1600   34
  Principles of motion, gravitation, calculus    Newton        …      23
  Nature of electricity                          Franklin      1746   40
  Burning is uniting with oxygen                 Lavoisier     1774   31
  Earth evolved by gradual processes             Lyell         …      33
  Evidence for natural selection
    controlling evolution                        Darwin        1858   49
  Field equations for light                      Maxwell       1864   33
  Radioactivity                                  Curie         1896   34
  Quantum theory                                 Planck        1901   43
  Special theory of relativity, E = mc²          Einstein      1905   26
  Mathematical foundations for quantum theory    Schrödinger   1926   39

(a) What can be inferred from these data about the true average age at which scientists do their best work? Answer the question by constructing a 95% confidence interval.

(b) Before constructing a confidence interval for a set of observations extending over a long period of time, we should be convinced that the yᵢ's exhibit no biases or trends. If, for example, the age at which scientists made major discoveries decreased from century to century, then the parameter μ would no longer be a constant, and the confidence interval would be meaningless. Plot "date" versus "age" for these twelve discoveries. Put "date" on the abscissa. Does the variability in the yᵢ's appear to be random with respect to time?

7.4.10. Fueled by the popularity of low-fat diets, the 1990s saw a profusion of new food products claiming to be "no fat" or "low fat." To assess the impact of those products, measurements were taken on the daily fat intakes of ten adults. What does μ represent in this context? Use the data (128.1, 57.1, 117.0, 146.1, 142.3, 107.8, 103.7, …, 128.7) to construct a 90% confidence interval for μ. Note: Σyᵢ = 1101.3 and Σyᵢ² = 128,428.67.

7.4.11. In a nongeriatric population, platelet counts ranging from 140 to 440 (thousands per mm³ of blood) are considered "normal." The following are the platelet counts recorded for 24 female residents (176).
Note: Σyᵢ = 4645 and Σyᵢ² = 959,265.

  Subject  Count    Subject  Count
  1        125      13       180
  2        170      14       180
  3        250      15       280
  4        270      16       240
  5        144      17       270
  6        184      18       220
  7        176      19       110
  8        100      20       176
  9        220      21       280
  10       200      22       176
  11       170      23       188
  12       160      24       176

How does the definition of "normal" given above compare with the 90% confidence interval based on these data?

7.4.12. If a normally distributed sample of size n = 16 produces a 95% confidence interval for μ that ranges from 44.7 to 49.9, what are the values of ȳ and s?

7.4.13. Two samples, each of size n, are taken from a normal distribution with unknown mean μ and unknown standard deviation σ. A 90% confidence interval for μ is constructed with the first sample, and a 95% confidence interval for μ is constructed with the second. Will the 95% confidence interval necessarily be longer than the 90% confidence interval? Explain.

7.4.14. Revenues reported last week from nine boutiques franchised by an international clothier averaged $59,540 with a standard deviation of $6,860. Based on those figures, in what range might the company expect to find the average revenue of all of its boutiques?

7.4.15. What "confidence" is associated with each of the following random intervals? Assume that the Yᵢ's are normally distributed.
(a) ( Ȳ − 2.0930(S/√20), Ȳ + 2.0930(S/√20) )
(b) ( Ȳ − 1.345(S/√15), Ȳ + 1.345(S/√15) )
(c) ( Ȳ − 1.7056(S/√27), Ȳ + 2.7787(S/√27) )
(d) ( −∞, Ȳ + 1.345(S/√15) )

7.4.16. The following are the median home resale prices (in $1000s) reported in 17 U.S. cities for the fourth quarter of 1994 (199). Would it be reasonable to estimate the true average median home resale price for that period by substituting these data into Theorem 7.4.1 to find a 95% confidence interval for μ? Explain.
  Cities: Albuquerque, Atlanta, Baton Rouge, Charlotte, Cleveland, Dallas, Denver, Fort Lauderdale, Indianapolis, Memphis, New Orleans, Philadelphia, Richmond, Sacramento, Salt Lake City, Seattle, …
  Median prices ($1000s): 93.4, 77.1, 104.6, 98.1, 92.3, 119.0, 101.8, 89.2, 77.8, 66.6, 115.4, 99.2, 121.5, 102.2, …

Testing H₀: μ = μ₀ (The One-Sample t Test)

Suppose a (normally distributed) random sample of size n is observed for the purpose of testing the null hypothesis that μ = μ₀. If σ is unknown, which is usually the case, the procedure we use is called a one-sample t test. Conceptually, the latter is much like the Z test, except that the decision rule is defined in terms of t = (ȳ − μ₀)/(s/√n) rather than z, which requires that the critical values come from h_{n−1}(t) rather than f_Z(z).

Theorem 7.4.2. Let y₁, y₂, …, yₙ be a random sample of size n from a normal distribution where σ is unknown. Let

  t = (ȳ − μ₀)/(s/√n)

a. To test H₀: μ = μ₀ versus H₁: μ > μ₀ at the α level of significance, reject H₀ if t ≥ t_{α,n−1}.
b. To test H₀: μ = μ₀ versus H₁: μ < μ₀ at the α level of significance, reject H₀ if t ≤ −t_{α,n−1}.
c. To test H₀: μ = μ₀ versus H₁: μ ≠ μ₀ at the α level of significance, reject H₀ if t is either (1) ≤ −t_{α/2,n−1} or (2) ≥ t_{α/2,n−1}.

Appendix 7.A.4 gives a generalized-likelihood-ratio justification for the decision rules described in Theorem 7.4.2. In short, the test statistic t = (ȳ − μ₀)/(s/√n) is a monotonic function of the λ that appears in the generalized likelihood ratio, which justifies testing with the one-sample t procedure.

EXAMPLE 7.4.2

Pica is a children's disorder characterized by a craving for nonfood substances such as clay, plaster, and paint. Anyone affected runs the risk of ingesting high levels of lead, which can result in kidney damage and neurological dysfunction. Checking a child's blood lead level is a standard procedure for diagnosing the condition.
Among children between the ages of six months and five years, blood lead levels of 16.0 mg/l are considered "normal." Recently, a random sample of twelve children enrolled in Molly's Mighty Bear Nursery had their blood tested. The sample mean and sample standard deviation were 18.65 and 5.049, respectively. Can it be concluded that children at this particular facility tend to have higher blood lead levels? At the α = 0.05 level, is the increase from 16.0 to 18.65 statistically significant?

Let μ denote the true average blood lead level for children enrolled at Mighty Bear. The hypotheses to be tested are

  H₀: μ = 16.0 versus H₁: μ > 16.0

Given that α = 0.05, n = 12, and H₁ is one-sided to the right, the critical value from Part (a) of Theorem 7.4.2 (and Appendix Table A.2) is t.05,11 = 1.7959 (see Figure 7.4.4). Substituting ȳ and s into the t ratio gives a test statistic that falls just a little to the right of t.05,11:

  t = (18.65 − 16.0)/(5.049/√12) = 1.82

The conclusion, then, is to reject H₀: the ȳ of 18.65 does represent a statistically significant increase.

[Figure 7.4.4: the Student t curve with 11 df; the α = 0.05 rejection region lies to the right of t.05,11 = 1.7959.]

EXAMPLE 7.4.3

Three banks serve a metropolitan area's inner-city neighborhoods: Federal Trust, American United, and Third Union. The state banking commission is concerned that loan applications from inner-city residents are not being accorded the same consideration that comparable requests have received from individuals in rural areas. Both constituencies claim to have anecdotal evidence suggesting that the other group is being given preferential treatment.

Records show that last year the three banks approved 62% of all the mortgage applications filed by rural residents. Listed in Table 7.4.2 are the approval rates posted over that same period by the twelve branch offices of Federal Trust (FT), American United (AU), and Third Union (TU) that work primarily with the inner-city community. Do these figures lend any credence to the contention that the banks are treating inner-city residents and rural residents differently? Analyze the data using an α = 0.05 level of significance.
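The t ratios in Examples 7.4.2 and 7.4.3 can be reproduced from the published summary statistics alone. A sketch using only the Python standard library; the critical values are copied from Appendix Table A.2:

```python
from math import sqrt

def t_ratio(n, ybar, s, mu0):
    """One-sample t statistic from Theorem 7.4.2."""
    return (ybar - mu0) / (s / sqrt(n))

# Example 7.4.2: blood lead levels, H0: mu = 16.0 vs H1: mu > 16.0
t_lead = t_ratio(12, 18.65, 5.049, 16.0)
print(round(t_lead, 2), t_lead >= 1.7959)   # 1.82 True -> reject H0

# Example 7.4.3: approval rates vs mu0 = 62; entries are (n, ybar, s, t_.025,n-1)
banks = {
    "All twelve":      (12, 58.667, 6.946, 2.2010),
    "American United": ( 4, 52.25,  5.38,  3.1825),
    "Federal Trust":   ( 5, 58.80,  3.96,  2.7764),
    "Third Union":     ( 3, 67.00,  2.00,  4.3027),
}
t_banks = {name: t_ratio(n, ybar, s, 62.0)
           for name, (n, ybar, s, _) in banks.items()}
for name, (n, ybar, s, crit) in banks.items():
    print(f"{name}: t = {t_banks[name]:+.2f}, reject H0? {abs(t_banks[name]) >= crit}")
```

Tiny discrepancies from the printed summaries (e.g., −3.62 here versus −3.63 in Table 7.4.4) come from rounding in the published sample statistics.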
TABLE 7.4.2

  Branch locations (twelve offices): … & Morgan; Jefferson Pike; East 150th & Clark; Midway Mall; N. Charter Highway; Lewis & Abbot; West 10th & Lorain; Highway 70; Parkway Northwest; Lanier & …; King & Tara Court; Bluedot Corners
  Affiliations: AU, TU, TU, FT, FT, AU, FT, AU, TU, AU, …
  Percents approved: 59, 65, 69, 53, 64, 67, 60, 46, 59, …

TABLE 7.4.3

  Sample            n    ȳ        s       t Ratio   Critical Values   Reject H₀?
  All twelve banks  12   58.667   6.946   −1.66     ±2.2010           No

TABLE 7.4.4

                    American United   Federal Trust   Third Union
  n                 4                 5               3
  ȳ                 52.25             58.80           67.00
  s                 5.38              3.96            2.00
  t ratio           −3.63             −1.81           +4.33
  Critical values   ±3.1825           ±2.7764         ±4.3027
  Reject H₀?        Yes               No              Yes

As a starting point, we might want to test

  H₀: μ = 62 versus H₁: μ ≠ 62

where μ is the true average approval rate for all twelve banks. Table 7.4.3 summarizes the analysis. The two critical values are ±t.025,11 = ±2.2010, and the observed t ratio is

  t = (58.667 − 62)/(6.946/√12) = −1.66

so our decision is "fail to reject H₀."

The "overall" analysis of Table 7.4.3, though, may be too simplistic. Common sense would tell us to look also at the three banks separately. What emerges, then, is an entirely different picture (see Table 7.4.4). Now we can see why both groups felt discriminated against: American United (t = −3.63) and Third Union (t = +4.33) each had approval rates that differed significantly from 62%, but in opposite directions! Only Federal Trust seems to be dealing with inner-city and rural residents in an even-handed way.

QUESTIONS

7.4.17. Recall the Bacillus subtilis data in Question … . Test the null hypothesis that exposure to the enzyme does not affect a worker's respiratory capacity (as measured by the FEV₁/VC ratio). Use a one-sided H₁ and let α = 0.05. Assume that σ is not known.

7.4.18. Recall Case Study 5.3.1. Assess the credibility of the theory that the Etruscans were native Italians by testing an appropriate H₀ against a two-sided H₁. Set α equal to 0.05. Use 143.8 mm and 6.0 mm for ȳ and s, respectively, and let μ₀ = 132.4. Do these data appear to satisfy the distribution assumption made by the t test?
7.4.19. MBAs R Us advertises that its program increases a person's score on the GMAT by an average of 40 points. As a way of checking the validity of that claim, a consumer watchdog group hires 15 students to take both the review course and the GMAT. Prior to starting the course, the 15 students were given a diagnostic test that predicted how well they would do on the GMAT in the absence of any special training. The following table gives each student's actual GMAT score minus his or her predicted score. Set up and carry out an appropriate hypothesis test. Use the 0.05 level of significance.

  Subjects: LG, SH, KN, DF, …, ML, JG, CW, DP, KH, HS, LL, CE, KK, …
  yᵢ = act. GMAT − pre. GMAT:  35, 37, 33, 34, 38, 40, 47, 42, 46, 35, 36, 38, 33, 28, 34
  yᵢ²:  1225, 1369, 1089, 1156, 1444, 1600, 2209, 1764, 2116, 1225, 1296, 1444, 1089, 784, 1156

7.4.20. Recall the Shoshoni rectangle data described in Case Study 1.2.2. Let μ denote the true average width-to-length ratio preferred by the Shoshonis. At the α = 0.05 level, test H₀: μ = 0.618 versus H₁: μ ≠ 0.618. What does your conclusion suggest about the "universality" of the Golden Rectangle? Note: ȳ and s for these data are 0.661 and 0.093, respectively.

7.4.21. A manufacturer of pipe for laying underground electrical cables is concerned about the pipe's rate of corrosion and whether a special coating may retard that rate. As a way of measuring corrosion, the manufacturer examines a short length of pipe and records the depth of the maximum pit. The manufacturer's tests have shown that in a year's time in the particular kind of soil the manufacturer must deal with, the average depth of the maximum pit in a foot of pipe is 0.0042 inch. To see whether that average can be reduced, ten pipes are coated with a new plastic and buried in the same soil. After one year, the following maximum pit depths are recorded (in inches): 0.0039, 0.0041, 0.0038, 0.0044, 0.0040, 0.0036, 0.0034, 0.0036, 0.0046, and 0.0036.
Given that the sample standard deviation for these 10 measurements is 0.000383 inch, can it be concluded at the α = 0.05 level of significance that the plastic coating is beneficial?

7.4.22. The analysis done in Example 7.4.3 (using all n = 12 banks with ȳ = 58.667) failed to reject H₀: μ = 62 at the α = 0.05 level. Had μ₀ been, say, 58.6, the same conclusion would have been reached. What do we call the entire set of μ₀'s for which H₀: μ = μ₀ would not be rejected at α = 0.05?

Testing H₀: μ = μ₀ When the Normality Assumption Is Not Met

Every t test makes the same explicit assumption, namely, that the set of n yᵢ's is normally distributed. But suppose the normality assumption is not true. What are the consequences? Is the validity of the t test compromised?

We know that if the normality assumption is true, the pdf describing the variation of the t ratio, (Ȳ − μ₀)/(S/√n), is h_{n−1}(t). The latter, of course, provides the decision rule's critical values. If H₀: μ = μ₀ is to be tested against H₁: μ ≠ μ₀, for example, the null hypothesis is rejected if t is either (1) ≤ −t_{α/2,n−1} or (2) ≥ t_{α/2,n−1} (which makes the Type I error probability equal to α). If the normality assumption is not true, the pdf of (Ȳ − μ₀)/(S/√n) will not be h_{n−1}(t), and

  P( (Ȳ − μ₀)/(S/√n) ≤ −t_{α/2,n−1} ) + P( (Ȳ − μ₀)/(S/√n) ≥ t_{α/2,n−1} ) ≠ α

In effect, violating the normality assumption creates two α's. The "nominal" α is the Type I error probability we specify at the outset (typically 0.05 or 0.01). The "true" α is the actual probability that (Ȳ − μ₀)/(S/√n) falls in the rejection region when H₀ is true. For the two-sided decision rule pictured in Figure 7.4.5, the true α is the total area, under the actual pdf of the t ratio, to the left of −t_{α/2,n−1} and to the right of +t_{α/2,n−1}. Whether or not the validity of the t test is "compromised" by the normality assumption being violated depends on the difference between the two α's.

[Figure 7.4.5: f*_T(t), the pdf of the t ratio when the data are not normally distributed, superimposed on h_{n−1}(t); each shaded tail of h_{n−1}(t) has area α/2.]
If f*_T(t) is, in fact, quite similar in shape and location to h_{n−1}(t), then the true α will be approximately equal to the nominal α. In that case, the fact that the yᵢ's are not normally distributed would be essentially irrelevant. On the other hand, if f*_T(t) and h_{n−1}(t) are markedly different (as they appear to be in Figure 7.4.5), it would follow that the normality assumption is critical, and establishing the "significance" of a t ratio becomes problematic.

Unfortunately, getting an exact expression for f*_T(t) is essentially impossible, because the distribution depends on the pdf being sampled, and there is seldom any way of knowing precisely what that pdf might be. However, we can still meaningfully explore the sensitivity of the t ratio to violations of the normality assumption by simulating samples of size n from selected distributions and comparing the resulting histogram of t ratios to h_{n−1}(t). Figure 7.4.6 shows four such simulations, using MINITAB; the first three consist of one hundred random samples of size n = 6. In Figure 7.4.6(a), the samples come from a uniform pdf defined over the interval [0, 1]; in Figure 7.4.6(b), the underlying pdf is exponential with λ = 1; in Figure 7.4.6(c), the data are coming from a Poisson pdf with λ = 5.

(a) Uniform pdf, fᵧ(y) = 1, 0 ≤ y ≤ 1:

  MTB > random 100 c1-c6;
  SUBC> uniform 0 1.
  MTB > rmean c1-c6 c7
  MTB > rstdev c1-c6 c8
  MTB > let c9 = sqrt(6)*((c7 - 0.5)/c8)    [this command calculates the t ratio (n = 6)]
  MTB > histogram c9

(b) Exponential pdf, λ = 1:

  MTB > random 100 c1-c6;
  SUBC> exponential 1.
  MTB > rmean c1-c6 c7
  MTB > rstdev c1-c6 c8
  MTB > let c9 = sqrt(6)*((c7 - 1.0)/c8)
  MTB > histogram c9

If the normality assumption were true, t ratios based on samples of size six would vary in accordance with the Student t distribution with 5 df.
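The MINITAB simulation of Figure 7.4.6(b) is easy to mimic in any language; below is a Python sketch. It draws repeated samples of size six from the exponential pdf with λ = 1 (so μ = 1), forms t = (ȳ − 1)/(s/√6), and reports the average t ratio. The replication count is larger than the figure's one hundred simply to make the skewness unmistakable.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)
n, reps = 6, 5000

t_ratios = []
for _ in range(reps):
    y = [random.expovariate(1.0) for _ in range(n)]   # exponential pdf, lambda = 1
    t_ratios.append((mean(y) - 1.0) / (stdev(y) / sqrt(n)))

# A symmetric Student t distribution has mean 0; these ratios do not.
print(round(mean(t_ratios), 2))
```

A symmetric Student t curve would put that average near zero; the clearly negative value is the left skew visible in Figure 7.4.6(b).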
In Figure 7.4.6, f_{T₅}(t) has been superimposed over the histograms of the t ratios coming from the three different distributions. What we see there is really quite remarkable. t ratios based on yᵢ's from a uniform pdf, for example, are varying in much the same way as t ratios would vary if the yᵢ's were normally distributed; that is, f*_T(t) in that case appears to be very similar to f_{T₅}(t). The same is true for samples coming from a Poisson distribution. For both of those underlying pdfs, in other words, the true α would not be much different from the nominal α.

Figure 7.4.6(b) tells a slightly different story. When samples of size six are drawn from an exponential pdf, the t ratios are not in particularly close agreement with f_{T₅}(t). Specifically, very negative t ratios are occurring much more often than the Student t curve would predict, while other values are occurring correspondingly less often (see Question 7.4.23). But look at Figure 7.4.6(d). When the sample size is increased to n = 15, the skewness so prominent in Figure 7.4.6(b) is mostly gone.

(c) Poisson pdf, p_X(k) = e⁻⁵5ᵏ/k!, k = 0, 1, 2, …:

  MTB > random 100 c1-c6;
  SUBC> poisson 5.
  MTB > rmean c1-c6 c7
  MTB > rstdev c1-c6 c8
  MTB > let c9 = sqrt(6)*((c7 - 5.0)/c8)
  MTB > histogram c9

Reflected in these specific simulations are some general properties of the t ratio:

1. The distribution of t = (Ȳ − μ)/(S/√n) is relatively unaffected by the pdf of the Yᵢ's [provided fᵧ(y) is not too skewed and n is not too small].
2. As n increases, the pdf of t = (Ȳ − μ)/(S/√n) becomes increasingly similar to h_{n−1}(t).

In mathematical statistics, the term robust is used to describe a procedure that is not heavily dependent on whatever assumptions it makes. Figure 7.4.6 shows that the t test is robust with respect to departures from normality. From a practical standpoint, it would be difficult to overstate the importance of the t test being robust.
If the pdf of (Ȳ − μ)/(S/√n) varied dramatically depending on the pdf of the Yᵢ's, we would never know if the true α associated with, say, a 0.05 decision rule was anywhere near 0.05. That degree of uncertainty would make the t test virtually worthless.

(d) Exponential pdf, λ = 1, with the sample size increased to n = 15:

  MTB > random 100 c1-c15;
  SUBC> exponential 1.
  MTB > rmean c1-c15 c16
  MTB > rstdev c1-c15 c17
  MTB > let c18 = sqrt(15)*((c16 - 1.0)/c17)
  MTB > histogram c18

QUESTIONS

7.4.23. Explain why the distribution of t ratios calculated from small samples drawn from the exponential pdf, fᵧ(y) = e⁻ʸ, y ≥ 0, will be skewed to the left (recall Figure 7.4.6(b)). Hint: What does the shape of fᵧ(y) imply about the possibility of each yᵢ being close to 0? If the entire sample did consist of yᵢ's close to 0, what value would the t ratio have?

7.4.24. Suppose 100 samples of size n = 3 are taken from each of the pdfs (1) fᵧ(y) = 2y, 0 ≤ y ≤ 1, and (2) fᵧ(y) = 4y³, 0 ≤ y ≤ 1, and for each set of three observations the t ratio (ȳ − μ)/(s/√3) is calculated, where μ is the expected value of the particular pdf being sampled. How would you expect the distributions of the two sets of t ratios to be different? How would they be similar? Be as specific as possible.

7.4.25. Suppose that random samples of size n are drawn from the uniform pdf, fᵧ(y) = 1, 0 ≤ y ≤ 1. For each sample, the ratio t = (ȳ − 0.5)/(s/√n) is calculated. Parts (b) and (d) of Figure 7.4.6 suggest that the pdf of t will become increasingly similar to h_{n−1}(t) as n increases. To which pdf is h_{n−1}(t), itself, converging as n increases?

7.4.26. On which of the following sets of data would you be reluctant to do a t test? Explain. [Parts (a), (b), and (c) each show a plot of a data set.]

7.5 DRAWING INFERENCES ABOUT σ²

When random samples are drawn from a normal distribution, it is usually the case that the parameter μ is the focus of the investigation.
More often than not, the mean mirrors the "effect" of a treatment or condition, in which case it makes sense to apply what we learned in Section 7.4; that is, either construct a confidence interval for μ or test the hypothesis that μ = μ₀. But exceptions are not that uncommon. Situations occur where the "precision" associated with a measurement is, itself, important, perhaps even more important than the measurement's "location." If so, we need to shift our focus to the scale parameter, σ².

Two facts that we learned earlier about the population variance will now come into play. First, an unbiased estimator for σ² based on the maximum likelihood estimator is the sample variance, S², where

  S² = [1/(n − 1)] Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²

And, second, the ratio

  (n − 1)S²/σ² = (1/σ²) Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²

has a chi square distribution with n − 1 degrees of freedom. Putting these two pieces of information together allows us to draw inferences about σ²; in particular, we can construct confidence intervals for σ² and test the hypothesis that σ² = σ₀².

Chi Square Tables

Just as we need a t table to draw inferences about μ (when σ is unknown), we need a chi square table to provide the cutoffs for making inferences involving σ². The layout of chi square tables is dictated by the fact that all chi square pdfs (unlike the Z and t distributions) are skewed (see, for example, Figure 7.5.1, showing a chi square curve having five degrees of freedom). Because of that skewness, chi square tables need to provide both the left-hand tail and the right-hand tail of each chi square distribution.

[Figure 7.5.1: the chi square pdf with 5 df; the value 1.145 cuts off area 0.05 to its left, and 15.086 cuts off area 0.99 to its left.]

Figure 7.5.2 reproduces a portion of the chi square table that appears in Appendix A.3. Successive rows refer to different chi square distributions (each having a different number of degrees of freedom). The column headings denote the areas to the left of the numbers listed in the body of the table.
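Before turning to the mechanics of the table, note that the first of the two facts quoted above, the unbiasedness of S², is easy to check empirically. A sketch in plain Python, with illustrative (assumed) values μ = 0 and σ² = 4:

```python
import random
from statistics import variance

random.seed(1)
mu, sigma2 = 0.0, 4.0    # illustrative normal parameters (assumed)
n, reps = 5, 20000

# Average the sample variances of many size-5 normal samples.
avg_s2 = sum(variance([random.gauss(mu, sigma2 ** 0.5) for _ in range(n)])
             for _ in range(reps)) / reps

print(round(avg_s2, 2))   # should be close to sigma2 = 4.0
```

Even with samples as small as n = 5, the long-run average of S² sits essentially on top of σ², which is exactly what unbiasedness promises.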
We will use the symbol χ²_{p,n} to denote the number along the horizontal axis that cuts off, to its left, an area of p under the chi square distribution with n degrees of freedom.

[Figure 7.5.2: a portion of the chi square table in Appendix A.3, with rows for df = 1 through 12 and columns for p = .01, .025, .05, .10, .90, .95, .975, and .99; the df = 5 row, for example, contains 0.554, 0.831, 1.145, 1.610, …, 15.086.]

For example, from the fifth row of the chi square table, we see the numbers 1.145 and 15.086 under the column headings .05 and .99, respectively. It follows that

  P(χ²₅ ≤ 1.145) = 0.05 and P(χ²₅ ≤ 15.086) = 0.99

(see Figure 7.5.1). In terms of the χ²_{p,n} notation, 1.145 = χ²_{.05,5} and 15.086 = χ²_{.99,5}. (The area to the right of 15.086, of course, must be 0.01.)

Constructing Confidence Intervals for σ²

Since (n − 1)S²/σ² has a chi square distribution with n − 1 degrees of freedom, we can write

  P( χ²_{α/2,n−1} ≤ (n − 1)S²/σ² ≤ χ²_{1−α/2,n−1} ) = 1 − α        (7.5.1)

If Equation 7.5.1 is then inverted to isolate σ² in the center of the inequalities, the two endpoints will necessarily define a 100(1 − α)% confidence interval for the population variance. The algebraic details will be left as an exercise.

Theorem 7.5.1. Let s² denote the sample variance calculated from a random sample of n observations drawn from a normal distribution with mean μ and variance σ². Then

a. a 100(1 − α)% confidence interval for σ² is the set of values

  ( (n − 1)s²/χ²_{1−α/2,n−1}, (n − 1)s²/χ²_{α/2,n−1} )

b.
a 100(1 − α)% confidence interval for σ is the set of values

  ( √[(n − 1)s²/χ²_{1−α/2,n−1}], √[(n − 1)s²/χ²_{α/2,n−1}] )

CASE STUDY 7.5.1

The chain of events that define the evolution of the Earth began hundreds of millions of years ago. Fossils play a role in documenting the relative times at which those events occurred, but to establish an absolute chronology, scientists rely primarily on radioactive decay. One of the newest dating techniques uses a rock's potassium-argon ratio. Almost all minerals contain potassium (K) as well as certain of its isotopes, including ⁴⁰K. The latter, though, is unstable and decays into isotopes of argon and calcium, ⁴⁰Ar and ⁴⁰Ca. By knowing the rates at which the various daughter products are formed and by measuring the amounts of ⁴⁰Ar and ⁴⁰K present in a specimen, geologists can estimate the object's age.

Critical to the interpretation of any such dates, of course, is the precision of the underlying procedure. One obvious way to estimate that precision is to use the technique on a sample of rocks known to have the same age. Whatever variation occurs, then, from rock to rock is reflecting the inherent precision (or lack of precision) of the procedure.

Table 7.5.1 lists the potassium-argon estimated ages of nineteen mineral samples, all taken from the Black Forest in southeastern Germany (115). Assume that the procedure's estimated ages are normally distributed with (unknown) mean μ and (unknown) variance σ². Construct a 95% confidence interval for σ.

TABLE 7.5.1

  Specimen   Estimated Age          Specimen   Estimated Age
             (millions of years)               (millions of years)
  1          249                    11         303
  2          254                    12         280
  3          243                    13         260
  4          268                    14         256
  5          253                    15         278
  6          269                    16         344
  7          287                    17         304
  8          241                    18         283
  9          273                    19         310
  10         306

Here

  Σᵢ₌₁¹⁹ yᵢ = 5261 and Σᵢ₌₁¹⁹ yᵢ² = 1,469,945

so the sample variance is 733.4:

  s² = [19(1,469,945) − (5261)²] / [19(18)] = 733.4
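As a check on the case study's arithmetic, the computation from the raw ages of Table 7.5.1 through the interval for σ can be scripted; the cutoffs χ²_{.025,18} = 8.23 and χ²_{.975,18} = 31.53 are copied from Appendix Table A.3.

```python
from math import sqrt

ages = [249, 254, 243, 268, 253, 269, 287, 241, 273, 306,
        303, 280, 260, 256, 278, 344, 304, 283, 310]   # Table 7.5.1

n = len(ages)
sum_y, sum_y2 = sum(ages), sum(a * a for a in ages)
s2 = (n * sum_y2 - sum_y ** 2) / (n * (n - 1))         # sample variance

chi_lo, chi_hi = 8.23, 31.53    # chi^2_.025,18 and chi^2_.975,18 (Table A.3)

# Theorem 7.5.1(b): confidence interval for sigma
ci_sigma = (sqrt((n - 1) * s2 / chi_hi), sqrt((n - 1) * s2 / chi_lo))
print(round(s2, 1))                                     # 733.4
print(round(ci_sigma[0], 1), round(ci_sigma[1], 1))
```

Up to rounding of the tabled cutoffs, this reproduces the case study's interval of roughly (20.5, 40.0) million years.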
Since n = 19, the critical values appearing in the left-hand and right-hand limits of the confidence interval come from the chi square pdf with 18 df. According to Appendix Table A.3,

  P(8.23 ≤ χ²₁₈ ≤ 31.53) = 0.95

so the 95% confidence interval for the potassium-argon method's precision is the set of values

  ( √[18(733.4)/31.53], √[18(733.4)/8.23] ) = (20.5 million years, 40.0 million years)

EXAMPLE 7.5.1

The width of a confidence interval for σ² is a function of both n and s²:

  Width = upper limit − lower limit
        = (n − 1)s²/χ²_{α/2,n−1} − (n − 1)s²/χ²_{1−α/2,n−1}
        = (n − 1)s² [ 1/χ²_{α/2,n−1} − 1/χ²_{1−α/2,n−1} ]        (7.5.2)

As n gets larger, the interval will tend to get narrower because the unknown σ² is being estimated more precisely. What is the smallest number of observations that will guarantee that the average width of a 95% confidence interval for σ² is no greater than σ²?

Since S² is an unbiased estimator for σ², the expected width of a 95% confidence interval for the variance is

  E(Width) = (n − 1)σ² [ 1/χ²_{.025,n−1} − 1/χ²_{.975,n−1} ]

Clearly, then, for the expected width to be less than or equal to σ², n must be chosen so that

  (n − 1) [ 1/χ²_{.025,n−1} − 1/χ²_{.975,n−1} ] ≤ 1

Trial and error can be used to identify the desired n. The three columns in Table 7.5.2 come from the chi square distribution in Appendix Table A.3. As the computation in the last column indicates, n = 39 is the smallest sample size that will yield 95% confidence intervals for σ² whose average width is less than σ².

TABLE 7.5.2

  n    χ²_{.025,n−1}   χ²_{.975,n−1}   (n − 1)[1/χ²_{.025,n−1} − 1/χ²_{.975,n−1}]
  15    5.629          26.119          1.95
  20    8.907          32.852          1.55
  30   16.047          45.722          1.17
  38   22.106          55.668          1.01
  39   22.878          56.895          0.99

Testing H₀: σ² = σ₀²

The generalized likelihood ratio criterion introduced in Section 6.5 can be used to set up hypothesis tests for σ². The complete derivation appears in the chapter appendix; Theorem 7.5.2 states the resulting decision rule. Playing a key role, just as it did in the construction of confidence intervals for σ², is the chi square ratio, (n − 1)S²/σ₀².
Theorem 7.5.2. Let s² denote the sample variance calculated from a random sample of n observations drawn from a normal distribution with mean μ and variance σ². Let χ² = (n − 1)s²/σ₀².

a. To test H₀: σ² = σ₀² versus H₁: σ² > σ₀² at the α level of significance, reject H₀ if χ² ≥ χ²_{1−α,n−1}.
b. To test H₀: σ² = σ₀² versus H₁: σ² < σ₀² at the α level of significance, reject H₀ if χ² ≤ χ²_{α,n−1}.
c. To test H₀: σ² = σ₀² versus H₁: σ² ≠ σ₀² at the α level of significance, reject H₀ if χ² is either (1) ≤ χ²_{α/2,n−1} or (2) ≥ χ²_{1−α/2,n−1}.

CASE STUDY 7.5.2

Home buyers can choose among a variety of ways to finance mortgages, ranging from fixed-rate thirty-year notes to one-year adjustables, where interest rates can move up or down from year to year. During the first quarter of 1994, lenders were charging an average rate of 8.84% on a $100,000 loan amortized over thirty years; the standard deviation from bank to bank was 0.10%. Since one-year adjustables give lenders considerable flexibility in responding quickly to changing economic conditions, we might reasonably expect those rates to have a greater standard deviation than the 0.10% that characterizes thirty-year fixed notes. Lenders should be more willing to incur higher risks to compete for potential clients if they know they can make adjustments as time goes on.

Table 7.5.3 lists the rates quoted by n = 9 lenders for one-year adjustables (186). The sample standard deviation for those yᵢ's is s = 0.22. Do these data lend credence to the speculation that rates for one-year adjustables are more variable than rates for conventional mortgages?

Let σ² denote the variance of the population represented by the yᵢ's in Table 7.5.3. To judge whether a standard deviation increase from 0.10% to 0.22% is statistically significant requires that we test

  H₀: σ² = (0.10)² versus H₁: σ² > (0.10)²

Let α = 0.05. With n = 9, the rejection region for the chi square ratio [from Part (a) of Theorem 7.5.2] begins at χ²_{.95,8} = 15.507 (see Figure 7.5.3). But

  χ² = (n − 1)s²/σ₀² = 8(0.22)²/(0.10)² = 38.72

so our decision is clear: Reject H₀.
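The observed chi square ratio in Case Study 7.5.2 is a one-line computation; a sketch with χ²_{.95,8} = 15.507 copied from Appendix Table A.3:

```python
n, s = 9, 0.22        # summary statistics for Table 7.5.3
sigma0 = 0.10         # H0: sigma^2 = (0.10)^2 vs H1: sigma^2 > (0.10)^2
chi_crit = 15.507     # chi^2_.95,8 from Appendix Table A.3

chi2 = (n - 1) * s ** 2 / sigma0 ** 2   # chi square ratio of Theorem 7.5.2
print(round(chi2, 2), chi2 >= chi_crit)  # 38.72 True -> reject H0
```

Since 38.72 far exceeds the cutoff 15.507, H₀ is rejected, exactly as in the case study.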
TABLE 7.5.3

  Lender                          Initial Rate on One-Year Adjustables
  AmSouth Mortgage                6.38%
  Boatmen's National Mortgage     6.63
  Cavalry Bank                    6.88
  First American National Bank    6.75
  First Investment                6.13
  First Republic                  6.50
  NationsBanc Mortgage            6.63
  Union Planters                  6.38
  MortgageSouth                   6.50

[Figure 7.5.3: the chi square pdf with 8 df; the α = 0.05 rejection region lies to the right of 15.507.]

QUESTIONS

7.5.1. Use Appendix Table A.3 to find the following cutoffs and indicate their location on a graph of the appropriate chi square distribution.
(a) χ²_{…,14}  (b) χ²_{…,2}  (c) χ²_{.025,9}

7.5.2. Evaluate the following probabilities:
(a) P(χ²₁₇ ≥ 8.672)  (b) P(χ²₆ < 10.645)  (c) P(9.591 ≤ χ²₂₀ ≤ 34.170)  (d) P(χ²₂ < 9.210)

7.5.3. Find the value of y that satisfies each of the following equations:
(a) P(χ²_… ≥ y) = 0.99  (b) P(χ²₁₅ ≤ y) = 0.05  (c) P(9.542 ≤ χ²_… ≤ y) = 0.09  (d) P(y ≤ χ²_… ≤ 48.232) = 0.95

7.5.4. For what value of n is each of the following statements true?
(a) P(χ²ₙ ≥ 5.009) = 0.975  (b) P(χ²ₙ ≤ 30.144) = …  (c) … = 0.05  (d) P(… ≤ χ²ₙ ≤ 24.769) = 0.80

7.5.5. For df values beyond the range of Appendix Table A.3, chi square cutoffs can be approximated using a formula based on cutoffs from the standard normal pdf, f_Z(z). Define χ²_{p,n} and z_p by P(χ²ₙ ≤ χ²_{p,n}) = p and P(Z ≤ z_p) = p, respectively. Then

  χ²_{p,n} ≈ n[1 − 2/(9n) + z_p √(2/(9n))]³

Approximate the 95th percentile of the chi square distribution with 200 df. That is, find the value of y for which P(χ²₂₀₀ ≤ y) ≈ 0.95.

7.5.6. Let Y₁, Y₂, …, Yₙ be a random sample of size n from a normal distribution having mean μ and variance σ². What is the smallest value of n for which … ? Hint: Use a trial-and-error method.

7.5.7. Start with the fact that (n − 1)S²/σ² has a chi square distribution with n − 1 df (if the Yᵢ's are normally distributed) and derive the confidence interval formulas given in Theorem 7.5.1.

7.5.8. A random sample of size n = 19 is drawn from a normal distribution for which σ² = 12.0. In what range are we likely to find the sample variance, s²? Answer the question by finding two numbers a and b such that

  P(a ≤ S² ≤ b) = 0.95

7.5.9.
One of the occupational hazards of being an airplane pilot is the hearing loss that results from being exposed to high noise levels. To document the magnitude of the problem, a team of researchers measured the cockpit noise levels in 18 commercial aircraft. The results (in decibels) were as follows (94):

  74  72  71  90  80  82  85  80  75
  75  87  73  83  86  83  83  80  …

(a) Assume that cockpit noise levels are normally distributed. Use Theorem 7.5.1 to construct a 95% confidence interval for the standard deviation of noise levels from plane to plane.
(b) Use these same data to construct two one-sided 95% confidence intervals for σ.

7.5.10. In Case Study 7.5.1, the 95% confidence interval was constructed for σ rather than σ². In practice, is an experimenter more likely to focus on the standard deviation or on the variance, or do you think both formulas in Theorem 7.5.1 are likely to be used equally often? Explain.

7.5.11. (a) Use the asymptotic normality of chi square random variables (see Question …) to derive large-sample confidence interval formulas for σ and σ².
(b) Use your answer to Part (a) to construct an approximate 95% confidence interval for the standard deviation of estimated potassium-argon ages based on the 19 observations in Table 7.5.1. How does this confidence interval compare with the one found in Case Study 7.5.1?

7.5.12. If a 90% confidence interval for σ² is reported to be (51.47, 261.90), what is the value of the sample standard deviation?

7.5.13. Let Y₁, Y₂, …, Yₙ be a random sample of size n from the exponential pdf, fᵧ(y) = (1/θ)e^{−y/θ}, y > 0.
(a) Use moment-generating functions to show that the ratio 2nȲ/θ has a chi square distribution with 2n df.
(b) Use the result in Part (a) to derive a 100(1 − α)% confidence interval for θ.

7.5.14. Another method for dating rocks was in use before the advent of the potassium-argon method described in Case Study 7.5.1.
Based on a mineral's lead content, it was capable of yielding estimates for this same time period with a standard deviation of 30.4 million years. The potassium-argon method in Case Study 7.5.1 had a smaller standard deviation of √733.4 = 27.1 million years. Does it follow that the potassium-argon method is more precise? Using the data in Table 7.5.1, test at the 0.05 level whether the potassium-argon method has a smaller standard deviation than the older procedure based on lead.

7.5.15. When working properly, the machine that a manufacturer uses to fill twenty-five-kilogram bags of cement puts amounts into the bags that have a standard deviation (σ) of 1.0 kg. Below are the weights recorded for thirty bags selected at random from a day's production. Test H₀: σ² = 1 versus H₁: σ² > 1 at the α = 0.05 level of significance. Assume that the weights are normally distributed.

26.18  25.30  25.18  24.54  25.14  25.44
24.49  25.01  25.12  25.67  24.22  23.97
25.05  26.24  25.01  24.71  25.27  24.22
24.49  25.68  26.01  25.50  25.84  26.09
25.21  26.04  25.23

Note: Σyᵢ = 758.62 and Σyᵢ² = 19,195.7938.

7.5.16. A stock analyst claims to have devised a mathematical technique for selecting high-quality mutual funds and promises that a client's portfolio will have higher 10-year annualized returns and lower volatility; that is, a smaller standard deviation. After ten years, one of the analyst's twenty-four-stock portfolios showed an average 10-year annualized return and a standard deviation of 10.17%. The benchmarks for the type of funds considered are a mean of 10.10% and a standard deviation of 15.67%.
(a) Let μ be the mean for a portfolio selected by the analyst's method. Test at the 0.05 level that the portfolio beat the benchmark; that is, test H₀: μ = 10.1 versus H₁: μ > 10.1.
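The test in Question 7.5.15 reduces to comparing (n − 1)s²/σ₀² with χ²_{.95,29}. Using only the summary statistics quoted in the note (Σyᵢ = 758.62, Σyᵢ² = 19,195.7938, n = 30), a sketch of the computation (scipy assumed):

```python
from scipy.stats import chi2

n, sum_y, sum_y2 = 30, 758.62, 19195.7938
s2 = (sum_y2 - sum_y ** 2 / n) / (n - 1)      # sample variance s^2
test_stat = (n - 1) * s2 / 1.0                # sigma_0^2 = 1.0
cutoff = chi2.ppf(0.95, n - 1)                # chi^2_{.95,29}
print(round(test_stat, 2), round(cutoff, 2))  # 12.32 42.56
# test_stat < cutoff, so H0: sigma^2 = 1 is not rejected at alpha = 0.05
```

Note that the shortcut formula s² = (Σy² − (Σy)²/n)/(n − 1) lets the whole test run off the two sums, without the raw data.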
(b) Let σ be the standard deviation for a twenty-four-stock portfolio selected by the analyst's method. Test at the 0.05 level that the portfolio beat the benchmark; that is, test H₀: σ = 15.67 versus H₁: σ < 15.67.

7.6 TAKING A SECOND LOOK AT STATISTICS ("BAD" ESTIMATORS)

Estimating parameters has been a major theme of Chapters 5, 6, and 7, and it will continue to play a prominent role in the chapters ahead as our attention increasingly turns toward statistical inference. Not surprisingly, our discussion of estimation has been driven by the desire to identify "good" estimators: ones that can claim to be unbiased, efficient, consistent, and/or sufficient. In the spirit of "thinking outside the box," though, we might want to ask whether it would ever be desirable to use a "bad" estimator. If our objective is the pursuit of truth, the answer, of course, would be "no." If, on the other hand, our objective is self-interest, the answer is sometimes "yes."

Suppose, for example, you missed your Psychology 101 exam because of illness, and you have two options to make up the work: you can take either (1) a sixty-question True/False make-up test or (2) a one hundred-question True/False make-up test. Your instructor promises that the two tests will be equivalent: the questions will be of the same level of difficulty, and in both cases 75% will be the lowest passing grade (meaning a score of forty-five or higher on the sixty-question test and seventy-five or higher on the one hundred-question test). Which option should you choose, assuming you want to maximize your chances of passing?

The answer is that either option might be better, depending on how much you know (or don't know) about Psychology 101. Suppose p denotes your probability of answering a typical question correctly. Both of these tests, of course, represent a series of independent Bernoulli trials, and we have already seen that the unbiased, efficient estimator for p in such a model is X/n, where n is the number of questions (here, sixty or one hundred) and the random variable X is the number of questions answered correctly. Moreover, the variance of X/n is p(1 − p)/n.
Since its variance is smaller, X/100 is a more precise estimator for p than X/60: increasing the length of the test improves the precision. Which estimator is better for you, though, is not necessarily the more precise one. If your knowledge of the Psychology 101 material is deficient to the extent that your value of p is less than 0.75 (i.e., less than the passing mark), then it is in your interest to estimate p as poorly as possible, meaning with as small a sample as possible (the sixty-question test). On the other hand, if your probability of answering a question correctly is greater than 0.75, it would be to your advantage to estimate p as precisely as possible (by taking the one hundred-question test).

A student's chances of passing either test reduce to a simple summation of binomial probabilities:

    P(pass 60-question test) = P(X/60 ≥ 0.75) = P(X ≥ 45) = Σ_{k=45}^{60} C(60, k) pᵏ(1 − p)^{60−k}

    P(pass 100-question test) = P(X/100 ≥ 0.75) = P(X ≥ 75) = Σ_{k=75}^{100} C(100, k) pᵏ(1 − p)^{100−k}

Table 7.6.1 shows a comparison of these two probabilities for p values of 0.65, 0.70, 0.80, and 0.85. As predicted, poorly prepared students should take their chances with the shorter test (and the less precise estimator, X/60); well-prepared students (for whom p > 0.75) are better served by estimating p as precisely as possible (by taking the one hundred-question test). For example, someone with a 65% probability of answering a random True/False question correctly who takes the one hundred-question test has only a 2% chance of "getting lucky" in the sense that his or her estimator (= X/100) would be 0.75 or larger (enough to pass the test). That same student, though, would have a 3½-times-greater chance of passing the shorter test: according to the table, the estimator X/60 when p = 0.65 has a 7% probability of erring in the student's favor (by equalling or exceeding 0.75).

TABLE 7.6.1  Probabilities of Passing

p:                  0.65   0.70   0.80   0.85
60-question test    0.07   0.24   0.87   0.99
100-question test   0.02   0.16   0.91   1.00

APPENDIX 7.A.1  MINITAB APPLICATIONS

Many statistical procedures, including several featured in this chapter, require that the sample
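The entries in Table 7.6.1 are straightforward binomial tail sums. A sketch of the calculation (scipy assumed):

```python
from scipy.stats import binom

def prob_pass(n_questions, pass_fraction, p):
    """P(X/n >= pass_fraction) for X ~ binomial(n, p)."""
    cutoff = int(n_questions * pass_fraction)      # 45 or 75 here
    return binom.sf(cutoff - 1, n_questions, p)    # P(X >= cutoff)

for p in (0.65, 0.70, 0.80, 0.85):
    p60 = prob_pass(60, 0.75, p)    # 60-question test
    p100 = prob_pass(100, 0.75, p)  # 100-question test
    print(p, round(p60, 2), round(p100, 2))
```

Running the loop reproduces the crossover visible in the table: below p = 0.75 the short test gives the larger passing probability, above p = 0.75 the long test does.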
mean and standard deviation be calculated. MINITAB's DESCRIBE command gives ȳ and s, along with several other useful numerical characteristics of a sample. Figure 7.A.1.1 shows the DESCRIBE input and output for the twenty observations in Table 7.4.1.

MTB > set c1
DATA> 2.5 3.2 0.5 0.7 0.3 0.1 0.1 0.2 1.4 8.6
DATA> 0.2 0.1 0.4 7.8 0.3 1.3 1.1 11.2 2.1 10.1
DATA> end
MTB > describe c1

Descriptive Statistics: C1

Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
C1        20  0   2.610  0.809    3.617  0.100    0.225  0.900   3.025  11.200

FIGURE 7.A.1.1

Here,
N = sample size (the number of observations in c1)
N* = number of missing observations (the number of "interior" blanks)
Mean = sample mean = ȳ
SE Mean = standard error of the mean = s/√n
StDev = sample standard deviation = s
Minimum = smallest observation
Q1 = first quartile = 25th percentile
Median = middle observation (in terms of magnitude), or the average of the middle two if n is even
Q3 = third quartile = 75th percentile
Maximum = largest observation

Using MINITAB Windows
1. Enter the data under C1 in the WORKSHEET. Click on STAT, then on BASIC STATISTICS, then on DISPLAY DESCRIPTIVE STATISTICS.
2. Type C1 in the VARIABLES box; click on OK.

Percentiles of chi square, t, and F distributions can be obtained using the INVCDF command introduced in Appendix 3.A.1. Figure 7.A.1.2 shows the syntax for printing out χ²_{.95,6} (= 12.5916) and F_{.01,4,7} (= 0.0667746).

MTB > invcdf 0.95;
SUBC> chisquare 6.

Chi-Square with 6 DF
P(X <= x)        x
   0.95      12.5916

MTB > invcdf 0.01;
SUBC> f 4 7.

F distribution with 4 DF in numerator and 7 DF in denominator
P(X <= x)        x
   0.01      0.0667746

FIGURE 7.A.1.2

To find Student t cutoffs, the notation needs to be kept in mind. By definition, t_{.10,13}, for example, is the value for which

    P(T₁₃ ≥ t_{.10,13}) = 0.10

In the terminology of the INVCDF command, though, t_{.10,13} (= 1.35017) is the ninetieth percentile of the f_{T₁₃}(t) pdf (see Figure 7.A.1.3):

MTB > invcdf 0.9;
SUBC> t 13.
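The INVCDF calls above have direct analogues in most statistical libraries. A scipy sketch (an assumption; MINITAB itself is not required) that reproduces the same three percentiles:

```python
from scipy.stats import chi2, f, t

print(chi2.ppf(0.95, 6))   # ~12.5916, matching the chi square session above
print(f.ppf(0.01, 4, 7))   # ~0.0667746, the lower F percentile
print(t.ppf(0.90, 13))     # ~1.35017, i.e., t_{.10,13}
```

In each case `ppf` is the inverse CDF, so `t.ppf(0.90, 13)` returns the ninetieth percentile, which equals the upper cutoff t_{.10,13}.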
Student's t distribution with 13 DF
P(X <= x)        x
   0.9       1.35017

FIGURE 7.A.1.3

The MINITAB command for constructing a confidence interval for μ (Theorem 7.4.1) is "TINTERVAL X Y," where X denotes the desired value for the confidence coefficient 1 − α and Y is the column where the data are stored. Figure 7.A.1.4 shows the TINTERVAL command applied to the bat data from Case Study 7.4.1; 1 − α is taken to be 0.95.

MTB > set c1
DATA> 62 52 68 23 34 45 27 42 83 56 40
DATA> end
MTB > tinterval 0.95 c1

One-Sample T: C1

Variable  N   Mean     StDev    SE Mean  95% CI
C1        11  48.3636  18.0846  5.4527   (36.2142, 60.5131)

FIGURE 7.A.1.4

Constructing Confidence Intervals Using MINITAB Windows
1. Enter the data under C1 in the WORKSHEET.
2. Click on STAT, then on BASIC STATISTICS, then on 1-SAMPLE T.
3. Enter C1 in the SAMPLES IN COLUMNS box, click on OPTIONS, and enter the value of 100(1 − α) in the CONFIDENCE LEVEL box. Click on OK. Click on OK.

Figure 7.A.1.5 shows the input and output for doing a t test on the approval data given in Table 7.4.2. The basic command is "TTEST X Y," where X is the value of μ₀ and Y is the column where the data are stored. If no punctuation is used, the program automatically takes H₁ to be two-sided. If a one-sided test to the right is desired, we write

MTB > ttest X Y;
SUBC> alternative +1.

For a one-sided test to the left, the subcommand becomes "alternative -1".

MTB > set c1
DATA> 69 65 69 63 60 68 64 46 67 61 69
DATA> end
MTB > ttest 62 c1

One-Sample T: C1

Test of mu = 62 vs not = 62

Variable  N   Mean     StDev   SE Mean  95% CI              T      P
C1        12  58.6667  6.9467  2.0050   (54.2536, 63.0797)  -1.66  0.126

FIGURE 7.A.1.5

Notice that no value for α is entered, and the conclusion is not phrased as "Accept H₀" or "Reject H₀"; the analysis ends with the calculation of the data's P-value. Here,

    P-value = P(T₁₁ ≤ −1.66) + P(T₁₁ ≥ 1.66) = 0.0626 + 0.0626 = 0.1252

(recall Definition 6.2.3). Since the P-value exceeds the intended α (= 0.05), the conclusion is "Fail to reject H₀."

Doing a t Test Using MINITAB Windows
1. Enter the data under C1 in the WORKSHEET.
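The TINTERVAL computation can be reproduced in a few lines. The sketch below (numpy and scipy assumed) applies Theorem 7.4.1 directly to the eleven echolocation-distance observations entered above:

```python
import numpy as np
from scipy.stats import t

data = np.array([62, 52, 68, 23, 34, 45, 27, 42, 83, 56, 40])
n, ybar = len(data), data.mean()
se = data.std(ddof=1) / np.sqrt(n)       # s / sqrt(n)
margin = t.ppf(0.975, n - 1) * se        # t_{.025,10} * SE
# endpoints close to the MINITAB output (36.2142, 60.5131)
print(round(ybar - margin, 4), round(ybar + margin, 4))
```

The `ddof=1` argument makes numpy divide by n − 1, matching the sample standard deviation s used throughout the chapter.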
2. Click on STAT, then on BASIC STATISTICS, then on 1-SAMPLE t.
3. Type C1 in the SAMPLES IN COLUMNS box; click on TEST MEAN and enter the value of μ₀. Click on OPTIONS, then click on NOT EQUAL and highlight whichever H₁ is desired.
4. Click on OK; then click on OK.

APPENDIX 7.A.2  SOME DISTRIBUTION RESULTS FOR Ȳ AND S²

Theorem 7.A.2.1. Let Y₁, ..., Yₙ be a random sample of size n from a normal distribution with mean μ and variance σ². Define

    Ȳ = (1/n) Σ_{j=1}^{n} Yⱼ   and   S² = (1/(n − 1)) Σ_{j=1}^{n} (Yⱼ − Ȳ)²

Then
a. Ȳ and S² are independent.
b. (n − 1)S²/σ² has a chi square distribution with n − 1 degrees of freedom.

Proof. The proof of this theorem relies on certain linear algebra techniques as well as a change-of-variables formula for multiple integrals. For details, see (46) or (224). Definition 7.A.2.1 and the Lemma that follows review the necessary background results.

Definition 7.A.2.1.
a. A matrix A is said to be orthogonal if AAᵀ = I.
b. Let v be any n-dimensional vector over the real numbers. That is, v = (c₁, c₂, ..., cₙ), where each cⱼ is a real number. The length of v is defined as ‖v‖ = (c₁² + c₂² + ... + cₙ²)^{1/2}.

Lemma.
a. A matrix A is orthogonal if and only if ‖Av‖ = ‖v‖ for each v.
b. If a matrix A is orthogonal, then |det A| = 1.
c. Let g be a one-to-one continuous mapping on a subset D of n-space. Then

    ∫_{g(D)} f(x₁, ..., xₙ) dx₁ ⋯ dxₙ = ∫_{D} f(g(y₁, ..., yₙ)) |J(g)| dy₁ ⋯ dyₙ

where J(g) is the Jacobian of the transformation.

Set Xᵢ = (Yᵢ − μ)/σ for i = 1, 2, ..., n. Then all the Xᵢ's are N(0, 1). Let A be an n × n orthogonal matrix whose last row is (1/√n, 1/√n, ..., 1/√n), let X = (X₁, ..., Xₙ)ᵀ, and define Z = (Z₁, Z₂, ..., Zₙ)ᵀ by the transformation Z = AX. [Note that Zₙ = (1/√n)X₁ + ⋯ + (1/√n)Xₙ = √n X̄.] For any set D,

    P(Z ∈ D) = P(AX ∈ D) = P(X ∈ A⁻¹D)
             = ∫_{A⁻¹D} f_{X₁,...,Xₙ}(x₁, ..., xₙ) dx₁ ⋯ dxₙ
             = ∫_{D} f_{X₁,...,Xₙ}(g(z)) |det J(g)| dz₁ ⋯ dzₙ

where g(z) = A⁻¹z. But A is orthogonal, so |det J(g)| = 1, and setting (x₁, ..., xₙ)ᵀ = A⁻¹z, we have x₁² + ⋯ + xₙ² = z₁² + ⋯ + zₙ². Thus

    f_{X₁,...,Xₙ} = (2π)^{−n/2} e^{−(1/2)(x₁²+⋯+xₙ²)} = (2π)^{−n/2} e^{−(1/2)(z₁²+⋯+zₙ²)}

From this we conclude that

    P(Z ∈ D) = ∫_{D} (2π)^{−n/2} e^{−(1/2)(z₁²+⋯+zₙ²)} dz₁ ⋯ dzₙ

implying that the Zⱼ's are independent standard normals. Moreover,

    Σ_{j=1}^{n} Zⱼ² = Σ_{j=1}^{n} Xⱼ² = Σ_{j=1}^{n} (Xⱼ − X̄)² + nX̄²
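The orthogonal matrix in the proof can be made concrete: the Helmert construction below (numpy assumed; it is one standard choice, not the only one) produces an n × n orthogonal matrix whose last row is (1/√n, ..., 1/√n), exactly as required:

```python
import numpy as np

def helmert(n):
    """n x n orthogonal matrix whose last row is (1/sqrt(n), ..., 1/sqrt(n))."""
    A = np.zeros((n, n))
    for k in range(1, n):                       # rows 0 .. n-2
        A[k - 1, :k] = 1.0 / np.sqrt(k * (k + 1))
        A[k - 1, k] = -k / np.sqrt(k * (k + 1))
    A[n - 1, :] = 1.0 / np.sqrt(n)              # last row, as in the proof
    return A

A = helmert(5)
print(np.allclose(A @ A.T, np.eye(5)))  # True: A is orthogonal
```

With this A, the last coordinate of Z = AX is literally √n X̄, so the decomposition used in the proof can be verified numerically on simulated data.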
Since Zₙ² = nX̄², it follows that

    Σ_{j=1}^{n−1} Zⱼ² = Σ_{j=1}^{n} (Xⱼ − X̄)²

Now, Zₙ = √n X̄ is independent of Z₁, ..., Zₙ₋₁, and Σ_{j=1}^{n−1} Zⱼ² is a sum of n − 1 independent squared standard normals, so the conclusion follows for standard normal variables. Also, since Ȳ = σX̄ + μ and (Yᵢ − Ȳ)² = σ²(Xᵢ − X̄)², the conclusion follows for N(μ, σ²) variables. ∎

Comment. As part of the proof just presented, we established a version of Fisher's lemma: Let X₁, X₂, ..., Xₙ be independent standard normal random variables and let A be an orthogonal matrix. Define Z = A(X₁, ..., Xₙ)ᵀ. Then the Zᵢ's are independent standard normal random variables.

APPENDIX 7.A.3  A PROOF OF THEOREM 7.5.2

We begin by considering the test of H₀: σ² = σ₀² against a two-sided H₁. The relevant parameter spaces are

    ω = {(μ, σ²): −∞ < μ < ∞, σ² = σ₀²}   and   Ω = {(μ, σ²): −∞ < μ < ∞, 0 ≤ σ²}

In both, the maximum likelihood estimate for μ is ȳ. In ω, σ² is simply σ₀²; in Ω, the maximum likelihood estimate for σ² is σ̂² = (1/n) Σ_{i=1}^{n} (yᵢ − ȳ)² (see Example 5.4.4). Hence the two likelihood functions, maximized over ω and over Ω, are

    L(ω̂) = (1/(2πσ₀²))^{n/2} exp[−(1/(2σ₀²)) Σ_{i=1}^{n} (yᵢ − ȳ)²]

and

    L(Ω̂) = (1/(2πσ̂²))^{n/2} e^{−n/2}

It follows that the generalized likelihood ratio λ = L(ω̂)/L(Ω̂) is given by

    λ = (σ̂²/σ₀²)^{n/2} exp[−(n/2)(σ̂²/σ₀²) + n/2]

We need to know the behavior of λ, considered as a function of σ̂²/σ₀². For simplicity, let x = σ̂²/σ₀². Then λ = x^{n/2} e^{−(n/2)x + n/2}, and the inequality λ ≤ λ* is equivalent to

    x e^{−x} ≤ e^{−1}(λ*)^{2/n}

The right-hand side is again an arbitrary constant, say k*. Figure 7.A.3.1 is a graph of y = x e^{−x}. Notice that the values of x = σ̂²/σ₀² for which x e^{−x} ≤ k*, and equivalently λ ≤ λ*, fall into two regions, one for values of σ̂²/σ₀² close to zero and the other for values of σ̂²/σ₀² much larger than one. According to the likelihood ratio principle, we should reject H₀ for any λ ≤ λ*, where P(λ ≤ λ*|H₀) = α. But λ* determines (via k*) numbers a and b so that the critical region is

    C = {σ̂²/σ₀²: σ̂²/σ₀² ≤ a or σ̂²/σ₀² ≥ b}

FIGURE 7.A.3.1  [Graph of y = x e^{−x}, with the horizontal line y = k* cutting off the two rejection regions, x ≤ a and x ≥ b.]

Comment.
At this point it is necessary to make an approximation. Just because P(λ ≤ λ*|H₀) = α, it does not follow that

    P(σ̂²/σ₀² ≤ a) = P(σ̂²/σ₀² ≥ b)

and, in fact, the two tails of the critical region will not have exactly the same probability. Nevertheless, the two are numerically close enough so that we will not substantially compromise the likelihood ratio criterion by setting each one equal to α/2.

Note that

    P(σ̂²/σ₀² ≤ a) = P(Σ(Yᵢ − Ȳ)²/σ₀² ≤ na) = P((n − 1)S²/σ₀² ≤ na)

and, similarly,

    P(σ̂²/σ₀² ≥ b) = P((n − 1)S²/σ₀² ≥ nb)

Thus we will choose as critical values χ²_{α/2,n−1} and χ²_{1−α/2,n−1} and reject H₀ if either

    (n − 1)s²/σ₀² ≤ χ²_{α/2,n−1}   or   (n − 1)s²/σ₀² ≥ χ²_{1−α/2,n−1}

(see Figure 7.A.3.2).

FIGURE 7.A.3.2  [The chi square pdf with n − 1 df; the rejection regions have area α/2 in each tail.]

Comment. One-sided tests for dispersion are set up in a similar fashion. In the case of H₀: σ² = σ₀² versus H₁: σ² < σ₀², H₀ is rejected if (n − 1)s²/σ₀² ≤ χ²_{α,n−1}; for H₀: σ² = σ₀² versus H₁: σ² > σ₀², H₀ is rejected if (n − 1)s²/σ₀² ≥ χ²_{1−α,n−1}.

APPENDIX 7.A.4  A PROOF THAT THE ONE-SAMPLE t TEST IS A GLRT

Theorem 7.A.4.1. The one-sample t test, as outlined in Theorem 7.4.2, is a GLRT.

Proof. Consider the test of H₀: μ = μ₀ versus H₁: μ ≠ μ₀. The two parameter spaces restricted to H₀ and to H₀ ∪ H₁ (that is, ω and Ω, respectively) are given by

    ω = {(μ, σ²): μ = μ₀, 0 < σ²}   and   Ω = {(μ, σ²): −∞ < μ < ∞, 0 < σ²}

Without elaborating the details (see Example 5.2.4 for a similar problem), it can be readily shown that, under ω,

    μ̂ = μ₀   and   σ̂_ω² = (1/n) Σ_{i=1}^{n} (yᵢ − μ₀)²

and, under Ω,

    μ̂ = ȳ   and   σ̂_Ω² = (1/n) Σ_{i=1}^{n} (yᵢ − ȳ)²

so the two maximized likelihoods are

    L(ω̂) = [n/(2π Σ_{i=1}^{n} (yᵢ − μ₀)²)]^{n/2} e^{−n/2}   and   L(Ω̂) = [n/(2π Σ_{i=1}^{n} (yᵢ − ȳ)²)]^{n/2} e^{−n/2}

From L(ω̂) and L(Ω̂) we can form the likelihood ratio:

    λ = L(ω̂)/L(Ω̂) = [Σ_{i=1}^{n} (yᵢ − ȳ)² / Σ_{i=1}^{n} (yᵢ − μ₀)²]^{n/2}

As is often the case, it will prove to be more convenient to base a test on a monotonic function of λ, rather than on λ itself. We begin by rewriting the ratio's denominator. As is easily verified,

    Σ_{i=1}^{n} (yᵢ − μ₀)² = Σ_{i=1}^{n} [(yᵢ − ȳ) + (ȳ − μ₀)]² = Σ_{i=1}^{n} (yᵢ − ȳ)² + n(ȳ − μ₀)²

Therefore,
    λ = [1 + n(ȳ − μ₀)²/Σ_{i=1}^{n} (yᵢ − ȳ)²]^{−n/2} = (1 + t²/(n − 1))^{−n/2}

where

    t = (ȳ − μ₀)/(s/√n)

Observe that as t² increases, λ decreases. This implies that the original GLRT (which, by definition, would have rejected H₀ for any λ that was too small, say, less than λ*) is equivalent to a test that rejects H₀ whenever t² is too large. But t is an observation of the random variable

    T = (Ȳ − μ₀)/(S/√n)

which has a Student t distribution with n − 1 df when H₀ is true (recall Theorem 7.3.5). Thus "too large" translates numerically into the rejection rule |t| ≥ t_{α/2,n−1}. But that is precisely the decision rule given in Theorem 7.4.2, and the theorem is proved. ∎

CHAPTER 8

Types of Data: A Brief Overview

8.1 INTRODUCTION
8.2 CLASSIFYING DATA
8.3 TAKING A SECOND LOOK AT STATISTICS (SAMPLES ARE NOT "VALID"!)

[Chapter-opener flowchart: Are the data qualitative or quantitative? Are the units similar or dissimilar? How many treatment levels are involved (one, two, more than two)? Are the samples dependent or independent?]

The practice of statistics is typically conducted on two distinct levels. Analyzing data requires first and foremost an understanding of random variables. Which pdfs are modeling the observations? What parameters are involved, and how should they be estimated? Broader issues, though, need to be addressed as well. How is the entire set of measurements configured? Which factors are being investigated; in what ways are they related? Altogether, seven different types of data are profiled in Chapter 8. Collectively, they represent a sizeable fraction of the "experimental designs" any researcher is likely to encounter.

8.1 INTRODUCTION

Chapters 6 and 7 have introduced the basic principles of statistical inference. The typical objective in that material was either to construct a confidence interval or to test the credibility of a null hypothesis. A variety of formulas and decision rules were derived to accommodate distinctions in the nature of the data and the parameter being investigated.
It should not go unnoticed, though, that every set of data in those two chapters, despite superficial differences, shares a critically important common denominator: each represents the exact same experimental design. A working knowledge of statistics requires that the subject be pursued at two different levels. On the one hand, attention needs to be paid to the mathematical properties of the individual measurements. These are what might be thought of as the "micro" structure of statistics. What is the pdf of the Yᵢ's? Do we know E(Yᵢ) or Var(Yᵢ)? Are the Yᵢ's independent? Viewed collectively, though, every set of measurements has a macrostructure, or overall design, as well. It will be those features that we focus on in this chapter.

A number of issues need to be addressed. How is one design distinguished from another? Under what circumstances is a given design desirable? Or undesirable? How does the design of an experiment influence the analysis of that experiment? The answers to some of these questions will need to be deferred until each design is taken up individually and in detail later in the text. For now our objective is much more limited: Chapter 8 is meant to be a brief introduction to some of the important ideas involved in the classification of data. What we learn here will serve as a backdrop and a frame of reference for the multiplicity of statistical procedures in Chapters 9 through 14.

Definitions

To describe an experimental design, and to distinguish one from another, requires that we understand several definitions.

Treatments and Treatment Levels. The word treatment is used to denote any condition or trait that is "applied to" or "characteristic of" the subjects being measured. Different versions, extents, or aspects of a treatment are referred to as levels.

TABLE 8.1.1

                Sports Coupe        Four-Door Sedan
Age of Subject  Male     Female     Male     Female
21-44           8, 7     7, 7       6, 8     7, 5
45-64           4, 6     7, 6       8, 8     9, 8
65+             6, 5     3, 5       6, 7     7, 9

Illustrating that distinction is the breakdown in Table 8.1.1, which shows consumer reactions (on a scale
Illustrating that distinction is breakdown in 8.1 which shows consumer reactions a scale TABLE 8_1.1 Sports Coupe Age of SubjeCt Male 21-44 8 45-64 7 7 7 65+ 4 6 7 6 6 5 3 5 Four-Door Sedt:Jn Male Female 6 8 7 5 8 8 9 8 6 7 7 9 524 Olapter B Types of Data: A Brief Overview of one to ten) to two new automobile models. Listed are the opinions given by a total of twenty-four subjects. Age, gender, and model of car are all considered treatments. The three levels of age are tbe ranges 21-44, 45-64, and 65+. Similarly, male and female are the two levels of gender, and sports coupe and four·door sedan are the model levels. Blocks.. Sometimes groups subjects share certain they respond to treatments, yet those characteristics are of no intrinsic experimenter. We call any sucb group of related subjects a block. Table 8.1.2 the yields of corn (in bushels) that were barvested from three fields: A, B, and Equal in each field were with one three King's Formula 6, or Greenway. The objective was to the ettc:ctlverle88 three fertilizers. TABLE 8..1.2 A B C Gro.Fast King's Formula 6 Greenway 126 84 113 137 119 89. 121 87 124 Even city slickers can readily appreciate that no three fields will entirely identical in their ability to grow com. Variations in drainage, soil composition, and sunlight will inevitably have on fertility. precise nature of those field-to-field differences, though, is not being quantified, nor is it the experiment's In lingo of experimental design. fields A, B, and C are blocks. (Gro-Fast, King's Formula 6, and Greenway. on the other band, are treatment levels because they represent speci.fic formulations and their comparison is the study's stated objective.) Independent and Dependent Samples. Whatever the context, data collected for purpose of comparing two or more treatment levels are necessarily dependen1 or independent. Table 8.13 is an example of the former. Listed are interest rates on borne mortgage loans offered by three competing banks. 
The 9.6, 10.1, and 9.8 reported on 15 are considered dependent because of what they have in coromon: the particular economic conditions that All three refiect, probably to no small prevailed on January 15. By the same argument, entries 9.4, 9.9, and 9.8 are also related-in TABLE 8.1.3 Date Jan. Marcb 10 JulyS Sept 1 Second Union Bankers Trust Commerce Mutual 9.6% 10.1% 9.9 9.6 11.0 9.S% 9.8 9.4 9.3 10.6 9.5 lOA Section 8.1 Introduction 525 TABLE 8.1.4 Brand A Brand B 852 801 864 835 843 &J7 832 819 their case, by of whatever circumstances were present on March 10. Without exception, measurements that belong to the same block are considered to be dependent. In practice, there are many different ways to make measurements dependent; "place" and 8.1.3) are two of the most common. and "time" (as in Tables Contrast the structure of Table with the lWO sets of measurements in Table showing the lengths of lime (in hours) that it ten light bulbs to burn out. of the bulbs were brand A; the other five were brand B. Here there is no row-by-row common 8.1.3. The recorded for the first brand denominator analogous to in bulb has no special link to the 810 recorded for the brand B bulb. Similarly, the and 801 in the second row are unrelated. Because the absence of any direct connections between these two sets of observations, row-by-row, we say that brand A and brand B measurements are independent samples. Similar and Dissimilar Un.i1s. Units must also be taken account we classify a data set's macrostructure. Two measurements are said to be similar if their units are the same and dissimilar otherwise. Tables 8.1.3, 8.1.4 have all been examples of data that are unit compatible. information displayed in Table does not follow area and (2) asking price for five that pattern. It shows (1) the amount of properties listed by a local realtor. Since the first measurement is recorded in square and the second is in dollars, the two are considered dissimilar. 
TABLE 8.1.5

Property        Living Area (in square feet)  Asking Price
1049 Ridgeview  2860                          $410,500
Tyne            3210                          419,900
6086 Harding    2350                          346,000
4111 —          5340                          659,500

Quantitative Measurements and Qualitative Measurements. Finally, a distinction needs to be drawn between measurements that are quantitative and those that are qualitative. By definition, quantitative data are observations whose values are numerical. "Values" for qualitative data are either categories or traits. Table 8.1.6 illustrates qualitative data on the status of a bank's five largest loans in trouble. Here, each loan has two (nonnumerical) measurements, one with three possible values and the other with four:

    Type of Loan ∈ {Commercial, Construction, Real estate}
    Classification ∈ {Marginal, Substandard, Doubtful, Loss}

TABLE 8.1.6

Borrower            Type          Classification
Olden Properties    Real estate   Loss
Builders —          Construction  Doubtful
Maverick CDs        Commercial    Loss
Adam East           Commercial    Substandard
Bayou Construction  Construction  Marginal

(By way of comparison, the data in Tables 8.1.1 through 8.1.5 are quantitative.)

CASE STUDY 8.1.1

Table 8.1.7 tracks the recent history of first-class postage rates (184). On May 16, 1971, the cost of sending a letter first class was 8¢; by Jan. 1, 1995 (nine price hikes later),

TABLE 8.1.7

Date            Years after Jan. 1, 1971  Cost (¢)
May 16, 1971    0.37                      8
March 2, 1974   3.17                      10
Dec. 31, 1975   5.00                      13
May 29, 1978    7.41                      15
March 22, 1981  10.22                     18
Nov. 1, 1981    10.83                     20
Feb. 17, 1985   14.13                     22
April 3, 1988   17.25                     25
Feb. 3, 1991    20.09                     29
Jan. 1, 1995    24.00                     32

TABLE 8.1.8

Month  Passenger Boardings  Passenger Boardings
       (Fiscal 1991)        (Fiscal 1992)
July   41,388               42,038
Aug.   44,880               —
Sept.  44,148               28,231
Oct.   39,568               29,109
Nov.   34,185               38,080
Dec.   37,604               34,184
Jan.   34,805               39,842
Feb.   33,025               46,727
March  34,873               —
April  31,330               38,020
May    30,954               42,828
June   32,402               41,204
Relative to the definitions just introduced, how are two sets of data comparable? How are different? In both cases, the information recorded is quantital.ive and dependent, with the source of the dependency being .. For the data, there are two treatments, "Years !lfter Jan. 1. 1971" and "Cost (¢)." For the airport data, there is at two levels, "FIscaJ 1991" one treatment-«Passenger boardings"-but it "Fiscal " Moreover, the in Table are dissimillU, those in Table 8.1.8 are similnr. Possible Designs the definitions on pages 52.3-525 can give to an enormous number of different experimental designs, far more than can be in this text Still, the number of designs that are widely is quite small The vast majority of data likely to be encountered full into one of the following seven designs: One-sample data Two-sample data k-sampJe data Paired data Randomized block data Regression data Categorical data 528 Chapter 8 Types of Data: A Brief Overview The postage figures in Table 8.1.7, for example, qualify as regression the Dru;seln2f~r boardings in Table 8.1.8 are paired daJ.o.. (The ratings in Table 8.1.1, on the other hand, have a more complicated experimental structure and cannot be described any of these seven basic designs.) and reduced to a mathematical Section 8.2, each design will be profiled model. Special attention will be given to each for what type of inference is it likely to be used? 8.2 a.ASSIFYING,DATA The answers to no more than questions are needed to classify a set of data into one of the seven basic models listed in the preceding section: 1. Are the observations quantitative or qualitative? 2. Are the units similar or dissimilar? 3. How many treatment levels are involved? 4. Are observations dependent or independent? In Section 8.2, we use these four questions as the starting point in dlstinguislting one from another. One-Sample Data The simplest of all experimental designs, one-sample tklta conslsl of a single random sample of size n. 
Necessarily, the n observations are measurements reflecting one particular set of conditions or one treatment. They could be either qualitative or quantitative. Typical of the format is Table 8.2.1, showing for a sample of ten airlines the percentage of flights that landed within fifteen minutes of their scheduled arrival times (197).

TABLE 8.2.1

Carrier       Percent on Time
United        82.0%
America West  88.0
Delta         76.1
USAir         83.5
TWA           78.1
Continental   77.3
Southwest     92.1
Alaska        87.4
American      79.3
Northwest     —

By far, the two most frequently encountered examples of one-sample data are (1) a random sample of n normally distributed observations and (2) a random sample of "successes" and "failures" occurring in a series of n Bernoulli trials. For samples from a normal distribution, the objective is often to construct confidence intervals or test hypotheses about μ (using the Student t distribution) or to draw inferences about σ² (using the χ² distribution). Theorems 7.4.1 and 7.4.2 detail the procedures for drawing conclusions about μ; Theorems 7.5.1 and 7.5.2 deal with confidence intervals and hypothesis tests for σ². Data recorded as "success" or "failure" are typically modeled by the binomial distribution, and inference procedures focus on the unknown success probability, p. Theorem 6.3.1 gives the large-sample decision rule for testing H₀: p = p₀; confidence intervals for p are taken up in Theorem 5.3.1.

Mathematical Model. Figure 8.2.1 illustrates the structure of one-sample data. For the purpose of comparing experimental designs, it often helps to represent data points as sums of fixed and variable components. These expressions are known as model equations. For one-sample data, the model equation for an arbitrary Yᵢ is written

    Yᵢ = μ + εᵢ,  i = 1, 2, ..., n

where μ denotes the (fixed) mean of the probability distribution being represented by the data and εᵢ is a random variable reflecting the "error" in the measurement, that is, the deviation of the measurement from its mean, μ.

FIGURE 8.2.1  [One treatment; model equation Yᵢ = μ + εᵢ, i = 1, 2, ..., n.]
If the Yᵢ's are quantitative measurements, the assumption often made is that εᵢ is a normal random variable with mean zero and standard deviation σ. The latter is equivalent to assuming that Yᵢ is normally distributed with mean μ and standard deviation σ.

Two-Sample Data

A one-sample design typically requires that a set of measurements be compared to a fixed standard, for example, by testing the null hypothesis H₀: μ = μ₀. More likely to be encountered, though, are situations where an appropriate standard fails to exist or cannot be identified. In those cases, measurements need to be taken on each of the treatment levels being compared. The simplest such design occurs when only two treatment levels are involved and the two samples are independent. Consider the data in Table 8.2.2 showing the times (in seconds) that male and female fruit flies (Drosophila melanogaster) spent preening themselves (31).

TABLE 8.2.2

Male Times (sec), xᵢ:    2.3  2.9  1.9  2.2  2.4  3.3  1.2  2.0  2.3  1.9  2.7  1.2  1.3  2.1
Female Times (sec), yᵢ:  3.7  11.7  5.4  2.8  2.2  2.4  4.0  2.8  2.0  2.8  2.4  2.9  10.7  2.4  3.2
Let Xi and Yj denote the ith and jth observations in the X and Y samples, respectively. The assumptions implicit in the two-sample fonnat that the Xs and Ys are independent and that Xi = ILx + Ei, i = 1,2, ... , n and j =1,2 •... ,m In many situations, the error terms, Ej and ej, are assumed to be nonnally distributed with mean zero and the same standard deviation 0' (see Figure 8.2.2). Treatment Levels Model Equation - -1 - - - -2 - - - - - - -Xi = J1.K + tj, i = 1,2, ... , n XII Y", FlGUftE 8.2.2 Section 8.2 dassH'ying Data 531 k-Sample Data When more than two treatment levels are compared, and whell the samples representing those levels are independent, the observations are said to be k-sllmple datil. Although their assumptions are comparable. two-sample data and k-sample data are treated as distinct experimental designs the methods for them are totally different. Table summarizes a set of k-sample data k = 3. The same strain of bacteria was grown in each of nine Petri dishes, and the latter were divided into three groups. Each group was treated with a different antibacterial agent. Two days later diameters of the areas showing no bacteria! were measured (in centimeters). TABlE 8..2.3 M21z ATC3 B169 2.9 5.0 3.1 4.8 4.6 2.93 4.80 4.3 Sample means: 3.87 Typically, the objective with k-sample data is to test Ho: J.Ll J.L2 = Jl j represents the true mean associated with the jih treatmellt leveL ... = J.Lt, where the in Table 8.2.3, (or example, the to resolved is whether the differellccs among the sample means (3.87, 2.93. and 4.80) are sufficiently large to reject hypothesis that J.Ll = J.L2 = J.L3· The I test format that figures so prominently in the interpretation of one-sample two-sample cannot be extended to accommodate k-sample data. A more powerful technique, known as the analysis of variance, is oeeded. The latter will be developed in Chapters and 13. Matbematical Model. 
The only structural difference between the mathematical models for two-sample and k-sample data is the number of treatment levels compared (see Figure 8.2.3). However, with k > 2, using different letters to represent different treatment levels is unwieldy. Double-subscript notation is much more convenient: Yij will denote the ith observation in the jth sample.

FIGURE 8.2.3 [k-sample model: treatment levels 1, 2, ..., k yield observations Y11, ..., Yn1,1; Y12, ..., Yn2,2; ...; Y1k, ..., Ynk,k, with model equation Yij = μj + εij, i = 1, 2, ..., nj, j = 1, 2, ..., k]

Likewise, the error terms will be written εij. As before, the latter are usually assumed to be normally distributed with mean zero and the same standard deviation σ for all i and j. Moreover, all the samples must be independent.

Paired Data

In two-sample and k-sample data, treatment levels are compared using independent samples. An alternative is to use dependent samples by grouping subjects into blocks. If only two treatment levels are involved, the dependent measurements are classified as paired data. A typical scenario is the application of two treatments or conditions to the same subject, for example, blood pressure measurements taken "before" and "after" a subject has received medication.

Table 8.2.4 shows a paired-data comparison of a baseball team's batting averages. The two treatment levels are when a game was played ("Nighttime" or "Daytime"). The two entries in a given row, for example, the .310 and .320 for the third baseman RA, are clearly dependent: A player with a high average during night games is likely to have a high average during day games as well; poor-hitting players will probably have low batting averages regardless of when games are scheduled.

TABLE 8.2.4
Player    Nighttime Ave.   Daytime Ave.
RA, 3b    .310             .320
WC, lf    .286             .290
JA, 1b    .302             .298
DC, c     .280
RS, 2b    .214             .226
JL, ss    .302             .300
BB, rf    .276             .290
          .285             .295

The statistical analysis of two-sample data and paired data is often directed at the same question: Given that μX and μY denote the true averages associated with the two treatment levels, both seek to examine the plausibility of the null hypothesis that μX and μY are equal.

Mathematical Model.
The responses to treatment levels X and Y for the ith pair are denoted xi and yi, respectively. Both measurements reflect the particular conditions that characterize the ith pair. We will denote that "pair effect" by the symbol Bi. That is,

xi = μX + Bi + εi    and    yi = μY + Bi + εi'

The fact that Bi is the same for both xi and yi is precisely what makes the samples dependent (see Figure 8.2.4).

FIGURE 8.2.4 [Paired-data model: for each pair i, xi = μX + Bi + εi and yi = μY + Bi + εi']

Randomized Block Data

When dependent samples are used to compare more than two treatment levels, the measurements are referred to as randomized block data. Despite being an obvious generalization of paired data, the randomized block design is treated separately because the methods required for its analysis are entirely different (recall the similar justification for keeping two-sample and k-sample data as two separate designs).

Table 8.2.5 summarizes the results of a randomized block experiment set up to investigate the possible effects of "blood doping," a controversial procedure whereby athletes are injected with additional red blood cells for the purpose of enhancing performance (17). Six runners were the subjects (and, thus, the blocks). Each ran three ten-thousand-meter races: once after receiving extra red blood cells, once after being injected with a placebo, and once after receiving no treatment whatsoever. Listed are their times (in minutes) to complete the race.

TABLE 8.2.5
Subject   No Injection   Placebo   Blood Doping
1         34.03          32.70     31.20
2         32.85          33.62     32.80
3         33.50          31.23     33.07
4         32.52          33.05
5         34.15          31.55
6         33.77          32.33

Clearly, the times in a given row are dependent: all three depend to some extent on the speed of that particular subject. Documenting the differences from subject to subject, though, would not be the objective for doing this sort of study. If μ1, μ2, and μ3 denote the true average times characteristic of the no injection, placebo, and blood doping treatment levels, respectively, the experimenter's first objective would be to test H0: μ1 = μ2 = μ3.
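When only two treatment levels are involved, the dependent-samples analysis reduces to a paired t test on the within-pair differences di = xi − yi: taking the difference cancels the pair effect Bi out of the model. A minimal sketch; the night/day pairings below follow the reading of Table 8.2.4 given here and should be treated as illustrative:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic: mean within-pair difference over its standard error."""
    d = [xi - yi for xi, yi in zip(x, y)]  # Bi cancels in each difference
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Night/day batting averages for five players (pairings as reconstructed here)
night = [0.310, 0.286, 0.302, 0.280, 0.214]
day = [0.320, 0.290, 0.298, 0.290, 0.226]
print(paired_t(night, day))
```

Under H0: μX = μY the statistic has a Student t distribution with n − 1 degrees of freedom, where n is the number of pairs.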
As we will see in Chapter 13, the decision as to whether or not a null hypothesis of this sort should be rejected turns out to be another application of the analysis of variance.

Mathematical Model.

Randomized block data have the same basic structure as do paired data. As we saw with k-sample data, though, the multiplicity of treatment levels dictates that double-subscript notation be used (see Figure 8.2.5). As before, the Bi component is the term that makes the observations in a given row, Yi1, Yi2, ..., and Yik, dependent.

FIGURE 8.2.5 [Randomized block model: blocks 1, 2, ..., n by treatment levels 1, 2, ..., k yield observations Yij, with model equation Yij = μj + Bi + εij, i = 1, 2, ..., n, j = 1, 2, ..., k]

Regression Data

All the experimental designs introduced up to this point share the property that their measurements have the same units. Moreover, each has had the same basic objective: to quantify or to compare the effects of one or more treatment levels. In contrast, regression data typically consist of measurements with dissimilar units, and their objective is to study the functional relationship between the variables rather than to test the null hypothesis that a set of means are all equal.

Table 8.2.6, showing the increase in the cost of a first-class postage stamp from 1971 to 1995, is an example of regression data (recall Case Study 8.1.1). Any direct comparison of the information in the second and third columns is impossible because the units are incompatible. It makes sense, instead, to focus on the relationship between years after Jan. 1, 1971 and cost. Graphing is especially helpful with regression data. Figure 8.2.6 shows a plot of Cost (= y) versus Years after Jan. 1, 1971 (= x). Superimposed is a straight line, y = 7.50 + 1.04x, that "best" fits the ten (xi, yi)s (using a technique we will learn in Chapter 10).

TABLE 8.2.6
Date        Years after Jan. 1, 1971    Cost (in cents)
5/16/71     0.37                        8
3/2/74      3.17                        10
12/31/75    5.00                        13
5/29/78     7.41                        15
3/22/81     10.22                       18
11/1/81     10.83                       20
2/17/85     14.13                       22
4/3/88      17.25                       25
2/3/91      20.09                       29
1/1/95      24.00                       32
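The "best" line mentioned above is the least-squares fit of Chapter 10: the slope is b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and the intercept is b0 = ȳ − b1·x̄. A sketch using the ten stamp-cost pairs of Table 8.2.6 (cell values as read here) recovers y = 7.50 + 1.04x:

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (intercept b0, slope b1) of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

years = [0.37, 3.17, 5.00, 7.41, 10.22, 10.83, 14.13, 17.25, 20.09, 24.00]
cents = [8, 10, 13, 15, 18, 20, 22, 25, 29, 32]
b0, b1 = least_squares(years, cents)
print(round(b0, 2), round(b1, 2))  # 7.5 1.04
```

The same two formulas are what a routine like MINITAB's regression command (or numpy.polyfit) evaluates internally.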
FIGURE 8.2.6 [Scatterplot of Cost (in cents) versus Years after Jan. 1, 1971, with the least-squares line y = 7.50 + 1.04x superimposed]

Mathematical Model.

Regression data often have the form (xi, Yi), where xi is a number and Yi is a random variable (having different units from xi). A particularly important special case is the so-called linear model, where the mean of Yi is linearly related to xi. That is,

Yi = β0 + β1·xi + εi

where εi is normally distributed with mean zero and standard deviation σ. More generally, E(Yi) can be any function g(xi, β0, β1, ...) of xi; for example, E(Yi) = β0·xi^β1 or E(Yi) = β0·e^(β1·xi) (see Figure 8.2.7).

FIGURE 8.2.7 [Regression model: subjects 1, 2, ..., n have independent variable x1, x2, ..., xn and dependent variable Y1, Y2, ..., Yn, with model equation Yi = g(xi, β0, β1, ...) + εi, i = 1, 2, ..., n]

Categorical Data

If the information recorded for each of two dissimilar variables is qualitative rather than quantitative, we call the measurements categorical data. Typical is a recent study undertaken to investigate the relationship, if one exists, between a physician's specialty (X) and his or her malpractice history (Y). The range of each variable was reduced to three (nonnumerical) classes:

Specialty = { orthopedic surgery (OS); obstetrics-gynecology (OB); internal medicine (IM) }

Malpractice history = { A: no claims; B: one or more claims ending in nonzero indemnity; C: one or more claims but none requiring compensation }

In its original form, the information collected on the 1942 physicians interviewed looked like the listing in Figure 8.2.8 (32). Data of this sort are usually summarized by tallying the number of times each (X, Y) "combination" occurs and displaying those frequencies in a contingency table (see Figure 8.2.9). The inference procedure that typically accompanies the construction of a contingency table is a hypothesis test, where H0 states that the random variables X and Y are independent. Categorical data
is a frequently encountered experimental design, especially in the social sciences. The statistical procedure that will be used for analyzing categorical data is the chi square test, developed in Chapter 11.

FIGURE 8.2.8 [Partial listing of the raw data: for each of the 1942 physicians, a specialty (OS, OB, or IM) and a malpractice-history code (A, B, or C)]

FIGURE 8.2.9
                                   Orthopedic    Obstetrics-    Internal
                                   Surgery       Gynecology     Medicine    Totals
No claims                          147           349            709         1205
At least one claim lost
  (nonzero indemnity)              106           149            62          317
At least one claim but
  no damages awarded               156           149            115         420
Totals                             409           647            886         1942

Mathematical Model.

The assumptions associated with categorical data are far weaker than those we have seen in the six previous experimental designs. There is no specific requirement of normality, for example, and no particular model equation. In effect, X and Y can be any two discrete random variables whatsoever (see Figure 8.2.10).

FIGURE 8.2.10 [Categorical model: subjects 1, 2, ..., n have first variable X1, ..., Xn and second variable Y1, ..., Yn; model: X and Y are discrete random variables]

A Flowchart for Classifying Data

It was mentioned at the outset of this section that classifying data into the seven models requires that a maximum of four questions be answered (recall page 528). Figure 8.2.11 is a flowchart that summarizes the model-identification process.

FIGURE 8.2.11 [Flowchart: Are the data qualitative or quantitative? Qualitative: categorical data. Quantitative: Are the units similar or dissimilar? Dissimilar: regression data. Similar: How many treatment levels are involved? One: one-sample data. Two: Are the samples dependent or independent? (dependent: paired data; independent: two-sample data). More than two: Are the samples dependent or independent? (dependent: randomized block data; independent: k-sample data)]

EXAMPLE 8.2.1

The federal Community Reinvestment Act of 1977 was enacted out of concern that banks were reluctant to make loans in low- and moderate-income areas, even when applicants seemed otherwise acceptable. The figures in Table 8.2.7 show one particular bank's credit penetration in ten low-income census tracts (A through J) and ten high-income census tracts (K through T).
To which of the seven models do these data belong? Note, first, that the measurements (1) are quantitative and (2) have similar units. Low-income and High-income correspond to two treatment levels, and the two samples are clearly independent (the 4.6 recorded in tract A, for example, has nothing in common with the 11.6 recorded in tract K). From the flowchart, then, the answers quantitative/similar/two/independent imply that these are two-sample data.

TABLE 8.2.7
Low-Income Census Tracts (A through J), Percent of Households with Credit:
4.6 (tract A), 6.6, 4.2, 6.9, 6.0, 4.6, 4.2, 5.1, 6.4, 5.9
High-Income Census Tracts (K through T), Percent of Households with Credit:
11.6 (tract K), 8.5, 8.2, 15.1, 12.6, 11.3, 9.1, 9.8, 11.0

EXAMPLE 8.2.2

In 1991, a rule change in college football narrowed the distance between the goalposts from 23'4" to 18'6". The effects of that legislation on the probability of players successfully kicking extra points after touchdowns (PATs) are summarized in Table 8.2.8. The numbers in the first column are based on all college games played through September of the 1990 season; those in the second column come from the 1991 season (194). What experimental design is represented?

TABLE 8.2.8
                      "Wide" Goalposts          "Narrow" Goalposts
                      (through Sept. 1990)      (1991 season)
Successful            959                       829
Unsuccessful          46                        82
Total                 1005                      911
Percent successful    95.4                      91.0

Despite the numerical appearance of the information in Table 8.2.8, the actual data here are qualitative, not quantitative. The entries 959, 829, 46, and 82 are not measurements; they are summaries of measurements. What was recorded for each attempted conversion were two pieces of qualitative information:

Goalposts = { wide; narrow }    Outcome of kick = { successful; unsuccessful }

Only later were the 1916 data points summed up and reduced to the four frequencies appearing in Table 8.2.8. By the answer to the first question posed in Figure 8.2.11, these are categorical data.
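The chi square test of independence that Chapter 11 will develop compares each observed cell count with the count expected under H0 (row total × column total / grand total) and sums the scaled squared discrepancies. A minimal sketch, applied to the specialty-by-malpractice frequencies of Figure 8.2.9 (cell counts as read here):

```python
def chi_square_stat(table):
    """Chi square statistic for an r x c table of observed frequencies."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / grand  # expected under independence
            stat += (obs - exp) ** 2 / exp
    return stat

# Malpractice history (rows A, B, C) by specialty (columns OS, OB, IM)
observed = [[147, 349, 709],   # A: no claims
            [106, 149, 62],    # B: claim(s) ending in nonzero indemnity
            [156, 149, 115]]   # C: claim(s) but no damages awarded
print(round(chi_square_stat(observed), 1))
```

Large values of the statistic, judged against a chi square distribution with (r − 1)(c − 1) degrees of freedom, argue against the independence of X and Y.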
EXAMPLE 8.2.3

People looking at the vertical lines in Figure 8.2.12 will tend to perceive the right one as shorter, even though the two are equal in length. Moreover, the perceived difference in lengths, what psychologists call the "strength" of the illusion, has been shown to be a function of age. Recently, a study was done to see whether individuals who are hypnotized and regressed to different ages perceive the illusion differently. Table 8.2.9 shows illusion strengths measured on eight subjects while they were (1) awake, (2) regressed to age nine, and (3) regressed to age five (142). Which of the seven experimental designs do these data represent?

Work through the sequence of questions posed by the flowchart in Figure 8.2.11:
1. Are the data qualitative or quantitative? Quantitative
2. Are the units similar or dissimilar? Similar
3. How many treatment levels are involved? More than two
4. Are the observations dependent or independent? Dependent

According to the flowchart, then, these measurements qualify as randomized block data.

TABLE 8.2.9
Subject   (1) Awake   (2) Regressed to Age 9   (3) Regressed to Age 5
1         0.81        0.69                     0.56
2         0.44        0.31                     0.44
3         0.44        0.44                     0.44
4         0.56        0.44                     0.44
5         0.19        0.19                     0.31
6         0.94        0.44                     0.44
7         0.44        0.44                     0.19
8         0.06        0.19

FIGURE 8.2.12 [Two vertical lines of equal length that appear unequal]

QUESTIONS

For Questions 8.2.1-8.2.12 use the flowchart in Figure 8.2.11 to identify the experimental designs represented. In each case, answer whichever of the questions on p. 528 are necessary to make the determination.

8.2.1. Kepler's Third Law states that "the squares of the periods of the planets are proportional to the cubes of their mean distances from the Sun." Listed below are the periods of revolution (x), the mean distances from the sun (y), and the values of x²/y³ for the nine planets in the solar system (4).
Planet     xi (years)   yi (astronomical units)   x²/y³
Mercury    0.241        0.387                     1.002
Venus      0.615        0.723                     1.001
Earth      1.000        1.000                     1.000
Mars       1.881        1.524                     1.000
Jupiter    11.86        5.203                     0.999
Saturn     29.46        9.54                      1.000
Uranus     84.01        19.18                     1.000
Neptune    164.8        30.06                     1.000
Pluto      248.4        39.52                     1.000

8.2.2. Mandatory helmet laws for motorcycle riders are a controversial issue. Some states have had a "limited" ordinance that applied only to younger riders; others have had a "comprehensive" statute requiring all riders to wear helmets. Listed below are the deaths per 10,000 registered motorcycles for states having each type of legislation (192).

Limited Helmet Law: 6.8, 10.6, 9.6, 9.1, 5.2, 13.2, 7.0, 4.1, 5.7, 7.6, 3.0, 6.7, 15.0, 7.1, 11.2, 17.9, 11.3, 8.5
Comprehensive Helmet Law: 9.3, 6.9, 7.3, 4.2, 4.8, 10.5, 8.1, 9.1, 0.5, 6.7, 6.4, 4.8, 5.0, 7.0, 6.8, 8.1, 12.9, 5.4

8.2.3. Aedes aegypti is the scientific name of the mosquito that transmits yellow fever. Although no longer a health threat in the Western world, yellow fever was perhaps the most feared disease in the United States for 200 years. To see how long it takes the Aedes mosquito to complete a bite, five young females were allowed to bite an exposed human forearm without the risk of being swatted. The resulting blood-sucking times (in seconds) are summarized below (90).

Mosquito   Bite Duration (sec)
1
2          202.9
3          315.0
4
5

8.2.4. Male cockroaches can be antagonistic toward other male cockroaches. Encounters may be fleeting or quite protracted, sometimes resulting in broken antennae and lost legs. A study was done to see whether cockroach density has any effect on the frequency of serious altercations. Ten groups of four male cockroaches (Byrsotria fumigata) were each subjected to three levels of density: high, intermediate, and low. The following are the numbers of encounters per minute that were observed (16).
Group      High    Intermediate    Low
1          0.30    0.20            0.17
2          0.25    0.27            0.19
3          0.12    0.28            0.20
4          0.15    0.31            0.16
5          0.20    0.17            0.18
6          0.20    0.20            0.23
7          0.31    0.29            0.11
8          0.24    0.13            0.36
9          0.20    0.12            0.19
10         0.08    0.18            0.20
Averages:  0.25    0.18

8.2.5. Luxury suites, many costing more than $100,000 to rent, have become big-budget status symbols in new sports arenas. Below are the numbers of suites (x) and their projected revenues (y) for nine of the country's newest facilities (207).

Arena                          Number of Suites, x   Projected Revenues (in millions), y
Palace (Detroit)               180                   $11.0
Orlando Arena                  26                    1.4
Bradley Center (Milwaukee)     68                    3.0
America West (Phoenix)         88                    6.0
Charlotte Coliseum             12                    0.9
Target Center (Minneapolis)    67                    4.0
City Arena                     56
Miami Arena                    18                    1.4
ARCO Arena (Sacramento)        30                    2.1

8.2.6. Depth perception is a life-or-death ability for lambs inhabiting rugged mountain terrain. How quickly a lamb develops that faculty may depend on the amount of time it spends with its ewe. Thirteen sets of lamb littermates were the subjects of an experiment that addressed that question (101). One member of each litter was left with its mother; the other was removed immediately after birth. Once every hour, the lambs were tested on a simulated cliff, part of which included a platform of glass. If a lamb placed its feet on the glass, it "failed" the test, since that would have been equivalent to walking off the cliff. Below are the numbers of the trial on which the lambs first learned not to walk on the glass, that is, when they first developed depth perception.

Number of Trials to Learn Depth Perception
Litter: 1 through 13
Mothered, xi: 2, 3, 3, 7, 8, 5, 3, 2, 5, 5, 1, 1, 4
Unmothered, yi: 2, 7, 5, 3, 1, 10, 5, 4, 8, 7, 3, 7, 5

8.2.7. To see whether expectations for students can become self-fulfilling prophecies, fifteen first-graders were given a standard IQ test. The
children's teachers, though, were told it was a special test for predicting whether a child would show sudden spurts of intellectual growth in the near future. The experimenters divided the children into three groups of sizes six, five, and four at random, but they informed the teachers that, according to the test, the children in Group I would not demonstrate any pronounced intellectual growth for the next year, those in Group II would develop at a moderate rate, and those in Group III could be expected to make exceptional progress. A year later the same fifteen children were again given a standard IQ test. Below are the differences between the two scores for each child (second test - first test).

Changes in IQ (second test - first test)
Group I: 3, 2, 6, 10, 10, 5
Group II: 10, 4, 11, 14, 3
Group III: 20, 9, 18, 19

8.2.8. Among young drivers, roughly a third of all fatal automobile accidents are speed-related; by age 60 that proportion drops to about one-tenth. Listed below are a recent year's percentages of speed-related fatalities for ages ranging from 16 to 72 (198).

Age, Percent: 16 37 17 18 33 19 34 20 24 33 31 28 27 26 32 23 42 16 13 57 10 72 9 7

8.2.9. Gorillas are not the solitary creatures that they are often made out to be: they live in groups whose average size is about 16, which usually includes 3 adult males, 6 adult females, and 7 "youngsters." Listed below are the sizes of 10 groups of mountain gorillas observed in the volcanic highlands of the Albert National Park in the Congo (161).

Group:            1    2    3    4    5    6    7    8    9    10
No. of Gorillas:  8    19   5    24   11   20   18   21   27   16

8.2.10. Roughly 360,000 bankruptcies were filed in Federal Court during 1981; by 1990 the annual number was more than twice that figure. The following are the numbers of bankruptcies reported year by year through the 1980s (182).

Year    Bankruptcies
1981    360,329
1982    344,275
1983    477,856
1984    561,274
1985    594,567
1986    642,993
1987    726,484
1988
1989
1990

8.2.11.
The diversity of bird species in a given area is related to plant diversity, as measured by variation in foliage height as well as variety of flora. Below are indices measured on those two traits for thirteen habitats (113).

Habitat   Plant Cover Diversity, xi   Bird Species Diversity, yi
1         0.90                        1.80
2         0.76                        1.36
3         1.67                        2.92
4                                     2.61
5         0.20                        0.42
6         1.44                        0.49
7         1.12                        1.90
8         1.04                        2.38
9         0.48                        1.24
10        1.33                        2.80
11        1.10                        2.41
12        1.56                        2.80
13        1.15                        2.16

8.2.12. Male toads often have trouble distinguishing other male toads from females, a state of affairs that can lead to awkward moments during mating season. When toad A inadvertently makes inappropriate romantic overtures toward toad B, the latter responds with a short call known as a release chirp. Below are the lengths of the release chirps measured for fifteen male toads innocently caught up in such liaisons (19).

Toad   Length of Release Chirp (s)
1      0.11
2      0.06
3      0.06
4      0.06
5      0.11
6      0.08
7      0.08
8      0.10
9      0.06
10     0.06
11     0.15
12     0.16
13     0.11
14     0.10
15     0.07

For Questions 8.2.13-8.2.33 identify the experimental design (one-sample, two-sample, etc.) that each set of data represents.

8.2.13. A pharmaceutical company is testing two new drugs designed to improve the blood-clotting ability of hemophiliacs. Six subjects volunteering for the study are randomly divided into two groups of size three. The first group is given drug A; the second group, drug B. The response variable in each case is the subject's prothrombin time, a number that reflects the time it takes for a clot to form. The results (in seconds) for group A are 32.6, 46.7, and 81.2; for group B, 25.9, 33.6, and 35.1.

8.2.14. Investment firms financing the construction of new shopping centers pay close attention to the amount of retail floor space already available. Listed below are population and floor space figures for five southern cities.

City   Population, x   Retail Floor Space (in thousands of square meters), y
1      400,000         3,450
2      150,000         1,825
3      1,250,000       7,480
4      2,975,000       14,260
5      760,000         5,290

8.2.15.
Nine political writers were asked to assess the United States' culpability in murders committed by revolutionary groups financed by the CIA. Scores were to be assigned using a scale of 0 to 100. Three of the writers were native Americans living in the U.S., three were native Americans living abroad, and three were foreign nationals.

Americans in U.S.: 45, 45, 65
Americans Abroad: 75, 50, 40
Foreign Nationals: 55, 90, 85

8.2.16. To see whether low-priced homes are easier to sell than high-priced homes, a national realty company collected the following data showing the number of days homes were on the market before being sold.

Number of Days on Market
City         Low-Priced   High-Priced
Buffalo      55           70
Charlotte    40           30
Newark       110          70

8.2.17. The following is a breakdown of what 120 college freshmen intend to do next summer.

          Work   School   Play
Male      22     14       19
Female    14     31       20

8.2.18. An analysis was done on the delivery of first-class mail originating from the four cities in the following table. Recorded for each city was the average time (in days) that it took a letter to reach a destination in that same city. Samples were taken on two occasions, Sept. 1, 2001 and Sept. 1, 2004.

City          Sept. 1, 2001   Sept. 1, 2004
Wooster       1.8             1.9
Midland       1.7             2.0
Beaumont      2.0             2.5
Manchester    2.2             1.7

8.2.19. Two methods (A and B) are available for removing dangerous heavy metals from public water supplies. Eight water samples collected from various parts of the United States were used to compare the two methods. Four were treated with Method A and four with Method B. After the processes were completed, each sample was rated for purity on a scale of 1 to 100.

Method A: 88.6, 92.1, 90.7, 78.6
Method B: 81.4, 84.6, 91.4, 93.6

8.2.20. Out of 120 senior citizens polled, 65 favored a complete overhaul of the health care system while 55 preferred more modest changes. When the same choice was put to 85 first-time voters, 40 said they were in favor of major reform while 45 opted for minor revisions.

8.2.21.
To illustrate the complexity and subjectivity of IRS regulations, a tax-reform lobbying group has sent the same five clients to each of two professional tax preparers. The following are the estimated tax liabilities quoted by each of the preparers.

Client   Preparer A   Preparer B
GS       $31,281      $26,850
MB       14,256       13,958
AA       26,197       25,520
DP       8,283        9,107
SB       47,825       43,192

8.2.22. The production of a certain organic chemical requires ammonium chloride. The manufacturer can obtain the ammonium chloride in one of three forms: powdered, moderately ground, and coarse. To see if the consistency of the NH4Cl is itself a factor that needs to be considered, the manufacturer decides to run the reaction seven times with each form of ammonium chloride. The following are the resulting yields (in pounds).

Powdered NH4Cl: 146, 152, 149, 161, 158, 154, 149
Moderately Ground NH4Cl: 150, 141, 138, 142, 146, 139, 137
Coarse NH4Cl: 145, 144, 148, 154, 148, 150

8.2.23. An investigation was conducted of 107 fatal poisonings of children. Each death was caused by one of three drugs. In each instance it was determined how the child received the fatal overdose. Responsibility for the 107 accidents was assessed according to the following breakdown.

                     Drug A   Drug B   Drug C
Child Responsible    10       10       4
Parent Responsible   18       18       10
Another Person       13

8.2.24. As part of an affirmative-action litigation, records were produced showing the average salaries earned by White, Black, and Hispanic workers in a manufacturing plant. Three different departments were selected at random for the comparison. The entries shown are average annual salaries, in thousands of dollars.

               White   Black   Hispanic
Department 1   20.2    20.6    19.2
Department 2   19.8    19.7    18.4
Department 3   19.9    19.0    20.0

8.2.25. In a study of rabies treatments, a comparison was done on fifty people bitten by animals. Twenty of the victims were given the standard Pasteur treatment, while the other thirty were given the standard treatment in addition to one or more doses of gamma globulin. Nine of those given the standard treatment survived; twenty survived in the gamma globulin sample.

8.2.26. To see if there are any geographical pricing differences, the monthly cost of a basic cable TV package was determined for a random sample of six cities, three in the southeast and three in the northwest. The monthly charges for the southeastern cities were $13.20, $11.55, and $16.75; subscribers in the three northwestern cities paid $14.80, $17.65, and $19.20.

8.2.27. A public relations firm hired by a would-be presidential candidate has conducted a poll to see whether their client faces a gender gap. Out of 800 men interviewed, 325 strongly supported the candidate, 151 were opposed, and 324 were undecided. Among the 750 women included in the sample, 258 were strong supporters, 241 were strong opponents, and 251 were undecided.

8.2.28. As part of a review of its rate structure, an automobile insurance company has compiled the following data on claims filed by five male policyholders and five female policyholders.

Client (male)   Claims Filed in 2004      Client (female)   Claims Filed in 2004
SB              $2750                     MS                0
JM              0                         BM                0
AK              0                         LL                0
ML              $1500                                       $2150
JT              0                                           0

8.2.29.
To see if any geographical pricing differences of six three in the and three in was determined for a random the Monthly for the southeastern cities were $13.20, $11,55, and $16.75; in the three northwestern paid $14.80, $17.65, and $19.20. 8..2.27. A public relations firm hired by a would-be presidential candidate has conducted a pon to see whether their client faces a gender Out of 800 men 325 strongly supported the candidate. 151 were opposed, and were undecided. were strong supporters, 241 were Among the 750 women included in the sample, strong opponents, and 251 were un(leCloea.. 8.2.28. As part of a review of its rate structure, an automobile insurance company has compiled the following data on claims filed by five male policyholders and five femaJe policyholders. (male) Claims in 2004 Client $2750 SB JM AK 0 0 ML JT $1500 0 MS 8M LL Claims Filed in 2004 0 0 0 $2150 0 550 Chapter 8 Types of Data: A Brief Overview 8.2.29. A company claims to have produced a blended gasoline that can improve a car's fuel consumption. They decide to compare their product with the gas currently on the market Three different cars were used for the test: a Porsche, a Buick, and a VW. The Porsche got 13.6 mpg with the new gas and 12.2 mpg with the "standard" gas; the Buick got 18.7 mpg with the new and 185 with the standard; the figures for the VW were 34.5 and 326, respectively. 8.l.30. In a survey conducted by State University's Center, a sample of three freshmen said they studied 6, 4, and 10 hours, respectively, over the weekend The same question was posed to three sophomores, who reported study times of 4, S, and 7 hours. For three juniors. the responses were 2, 8, and 6 hours. 8.2.31. A consumer advocacy group, investigating the prices of steel· belted radial tires produced by three major manufacturers. collects the following data. Year 1995 2000 2005 Company A CompanyB CompanyC $62.00 $68.00 $72.00 $65.00 $69.00 $75.00 $70.00 $78.00 $75.00 8.2.32. 
A small fourth-grade class is randomly split into two groups. Each group is taught fractions using a different method. After three weeks, both groups are given the same 100-point test. The scores of students in the first group are 91, 72, and 68; the scores reported for the second group are 76, 80, 72, and 67.

8.2.33. The track length of a storm (the distance it covers while maintaining a certain minimum wind velocity and precipitation intensity) is an important parameter in a storm's "profile." Listed below are the track lengths recorded for eight severe hailstorms that occurred in New England over a five-year period (60).

Date of Storm     Track Length (km)
6 June 1961       16
30 June 1961      160
1 July 1964       95
1 July 1964       65
5 August 1964     30
10 August 1965    26
13 August 1965    26
7 June 1966       24

8.2.34. The two-sample data shown on the left give the responses of six subjects to two treatments, X and Y. Would it make sense to graph these data using the format that appears on the right? Why or why not?

Treatment X: 3, 4, 3    Treatment Y: 2, 1, 2
[At right: a scatterplot with y plotted against x]

8.2.35. Under what circumstances would the structure below be classified as regression data? Under what circumstances would it be classified as one-sample data?

Day   Observation, y
1     y1
2     y2
3     y3
...
n     yn

8.2.36. Would it be better to graph the data shown below using format (a) or format (b)? Explain.

Pair   Treatment X   Treatment Y
1      6             4
2      10            7
3      8             6
[Formats (a) and (b): two ways of plotting Response against treatments X and Y]

8.3 TAKING A SECOND LOOK AT STATISTICS (SAMPLES ARE NOT "VALID"!)

Designing an experiment invariably requires that two fundamental issues be resolved. First and foremost is the choice of the design itself. Based on the type of data available and the objectives to be addressed, what overall "structure" should the experiment have? The most frequently encountered answers to that question are the seven
models profiled in Chapter 8, ranging from the simplicity of the one-sample design to the complexity of the randomized block design. As soon as a design has been identified, a second question immediately follows: How large should the sample size (or sample sizes) be? It is that question, though, that leads to a common sampling misconception.

There is a widely held belief (even by many experienced experimenters who should know better) that some samples are "valid" (presumably because of their size) while others are not. Every consulting statistician could probably retire to Hawaii at an early age if he or she got a dollar for every time an experimenter posed the following sort of question: "I intend to compare Treatment X and Treatment Y using the two-sample format. My plan is to take 20 measurements on each of the two treatments. Will those be valid samples?"

The sentiment behind such a question is entirely understandable: the researcher is asking whether two samples of size 20 will be "adequate" (in some sense) for addressing the objectives of the experiment. Unfortunately, the word "valid" is meaningless in this context. There is no such thing as a valid sample because the word "valid" has no statistical definition.

To be sure, we have already learned how to calculate the smallest values of n that will achieve certain objectives, typically expressed in terms of the precision of an estimator or the power of a hypothesis test. Recall Theorem 5.3.2. To guarantee that the estimator X/n for the binomial parameter p has at least a 100(1 - α)% chance of being within a distance d of p requires that n be at least as large as z²(α/2)/4d². Suppose, for example, we want a sample capable of guaranteeing that X/n will have an 80% (= 100(1 - α)%) chance of being within 0.05 (= d) of p. By Theorem 5.3.2,

n ≥ (1.28)²/4(0.05)² = 163.84

so n = 164 observations will suffice. On the other hand, that sample of n = 164 would not be large enough to guarantee that X/n has, say, a 95% chance of being within 0.03 of p. To meet these latter requirements, n would have to be at least as large as 1068 (= (1.96)²/4(0.03)²). Therein lies the problem.
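The two sample-size calculations above follow from the single formula n ≥ z²(α/2)/4d², so they are easy to reproduce. In the sketch below, the values 1.28 and 1.96 are the standard normal cutoffs for 80% and 95% confidence, as in the text:

```python
from math import ceil

def min_sample_size(z, d):
    """Smallest integer n with n >= z**2 / (4 * d**2), per Theorem 5.3.2."""
    return ceil(z ** 2 / (4 * d ** 2))

print(min_sample_size(1.28, 0.05))  # 164: within 0.05 of p with 80% confidence
print(min_sample_size(1.96, 0.03))  # 1068: within 0.03 of p with 95% confidence
```

Running both cases side by side makes the section's point concrete: the "right" n depends entirely on the precision and confidence being demanded, so no single sample size can be declared "valid" in the abstract.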
Sample sizes that can satisfy one set of specifications will not necessarily be capable of satisfying another. There is no "one size fits all" value for n that qualifies a sample as being "adequate" or "sufficient" or "valid."

In a broader sense, the phrase "valid sample" is much like the expression "statistical tie" encountered earlier. Both are widely used, and each is a well-intentioned attempt to simplify an important statistical concept. Unfortunately, both also share the dubious distinction of being mathematical nonsense.

CHAPTER 9
Two-Sample Problems

9.1 INTRODUCTION
9.2 TESTING H0: μX = μY (THE TWO-SAMPLE t TEST)
9.3 TESTING H0: σX² = σY² (THE F TEST)
9.4 BINOMIAL DATA: TESTING H0: pX = pY
9.5 CONFIDENCE INTERVALS FOR THE TWO-SAMPLE PROBLEM
9.6 TAKING A SECOND LOOK AT STATISTICS (CHOOSING SAMPLES)
APPENDIX 9.A.1 A DERIVATION OF THE TWO-SAMPLE t TEST (A PROOF OF THEOREM 9.2.2)
APPENDIX 9.A.2 MINITAB APPLICATIONS

William Sealy Gosset ("Student") (1876-1937). After earning an Oxford degree in mathematics and chemistry, Gosset began working in 1899 for Messrs. Guinness, a Dublin brewery. Fluctuations in materials and temperature and the necessarily small-scale experiments inherent in brewing convinced him of the necessity for a new small-sample theory of statistics. Writing under the pseudonym "Student," he published work with the t ratio that was destined to become a cornerstone of modern statistical methodology.

9.1 INTRODUCTION

The simplicity of the one-sample model makes it the logical starting point for any discussion of statistical inference, but it also limits its applicability to the real world. Very few experiments involve just a single treatment or a single set of conditions. On the contrary, researchers almost invariably design experiments to compare responses to several treatment levels, or, at the very least, to compare a single treatment with a control. In this chapter we examine the simplest of these multilevel designs, the two-sample problem.
Structurally, the two-sample problem always falls into one of two different formats: Either two (presumably) different treatment levels are applied to two independent sets of similar subjects, or the same treatment is applied to two (presumably) different kinds of subjects. Comparing the effectiveness of germicide A relative to that of germicide B by measuring the zones of inhibition each one produces in two sets of similarly cultured Petri dishes would be an example of the first type. Another would be testing whether monkeys raised by themselves (treatment X) react differently in a stress situation from monkeys raised with siblings (treatment Y). On the other hand, examining the bones of sixty-year-old men and sixty-year-old women, all life-long residents of the same city, to see whether both sexes absorb environmental strontium-90 at the same rate would be an example of the second type.

Inference in two-sample problems usually reduces to a comparison of location parameters. We might assume, for example, that the population of responses associated with, say, treatment X is normally distributed with mean μX and standard deviation σX, while the Y distribution is normal with mean μY and standard deviation σY. Comparing location parameters, then, reduces to testing H₀: μX = μY. As always, the alternative may be either one-sided, H₁: μX < μY or H₁: μX > μY, or two-sided, H₁: μX ≠ μY. (If the data are binomial, the location parameters are pX and pY, the true "success" probabilities for treatments X and Y, and the null hypothesis takes the form H₀: pX = pY.)

Sometimes, although much less frequently, it becomes more relevant to compare the variabilities of two treatments than their locations. A food company, for example, trying to decide which of two types of machines to buy for filling cereal boxes would naturally be concerned about the average weights of the boxes filled by each type, but it would also want to know something about the variabilities of the weights.
Obviously, a machine that produced high proportions of "underfills" and "overfills" would be a distinct liability. In a situation of this sort, the null hypothesis is H₀: σX² = σY².

For comparing the means of two normal populations, the standard procedure is the two-sample t test. As described in Section 9.2, this is a relatively straightforward extension of Chapter 7's one-sample t test. For comparing variances, though, it will be necessary to introduce a completely new test, this one based on the F distribution of Section 7.3. The binomial version of the two-sample problem, testing H₀: pX = pY, is taken up in Section 9.4.

It was mentioned in connection with one-sample problems that certain inferences, for various reasons, are more aptly phrased in terms of confidence intervals rather than hypothesis tests. The same is true of two-sample problems. In Section 9.5, confidence intervals are constructed for the location difference of two populations, μX − μY (or pX − pY), and the variability quotient, σY²/σX².

9.2 TESTING H₀: μX = μY – THE TWO-SAMPLE t TEST

We will suppose that the data for a given experiment consist of two independent random samples, X₁, X₂, ..., Xₙ and Y₁, Y₂, ..., Yₘ, representing either of the two formats described in Section 9.1. Furthermore, the two populations from which the Xs and Ys are drawn will be presumed normal. Let μX and μY denote their means. Our objective is to derive a procedure for testing H₀: μX = μY.

As it turns out, the precise form of the test we are looking for depends on the variances of the X and Y populations. If it can be assumed that σX² and σY² are equal, it is a relatively straightforward task to produce the GLRT for H₀: μX = μY. (This is what we will do in Theorem 9.2.2.) But if the variances of the two populations are not equal, the problem becomes much more complex. This second case, known as the Behrens-Fisher problem, is more than seventy-five years old and remains one of the more famous "unsolved" problems in statistics.
What headway investigators have made has been confined to approximate solutions [see, for example, Sukhatme (174) or Cochran (25)]. These, however, will not be discussed here; we will restrict our attention to testing H₀: μX = μY when it can be assumed that σX² = σY² (= σ²).

For the one-sample test that μ = μ₀, the GLRT was shown to be a function of a special case of the t ratio introduced in Definition 7.3.3 (recall Theorem 7.3.5). We begin this section with a theorem that gives still another special case of Definition 7.3.3.

Theorem 9.2.1. Let X₁, X₂, ..., Xₙ be a random sample of size n from a normal distribution with mean μX and standard deviation σ, and let Y₁, Y₂, ..., Yₘ be an independent random sample of size m from a normal distribution with mean μY and standard deviation σ. Let SX² and SY² be the two corresponding sample variances, and Sp² the pooled variance, where

Sp² = [Σ(Xᵢ − X̄)² + Σ(Yᵢ − Ȳ)²] / (n + m − 2) = [(n − 1)SX² + (m − 1)SY²] / (n + m − 2)

Then

T = [X̄ − Ȳ − (μX − μY)] / [Sp √(1/n + 1/m)]

has a Student t distribution with n + m − 2 degrees of freedom.

Proof. The method of proof here is very similar to that used for Theorem 7.3.5. Note that an equivalent formulation of T is

T = { [X̄ − Ȳ − (μX − μY)] / √(σ²/n + σ²/m) } / √{ [(n + m − 2)Sp²/σ²] / (n + m − 2) }

But E(X̄ − Ȳ) = μX − μY and Var(X̄ − Ȳ) = σ²/n + σ²/m, so the numerator of the ratio has a standard normal distribution, fZ(z). In the denominator, (n − 1)SX²/σ² and (m − 1)SY²/σ² are independent χ² random variables with n − 1 and m − 1 df, respectively, so (n + m − 2)Sp²/σ² has a χ² distribution with n + m − 2 df (recall Theorem 4.6.4). Moreover, by Appendix 7.A.2, the numerator and denominator are independent. It follows from Definition 7.3.3, then, that T has a Student t distribution with n + m − 2 df. ∎

Theorem 9.2.2. Let x₁, x₂, ..., xₙ and y₁, y₂, ..., yₘ be independent random samples from normal distributions with means μX and μY, respectively, and with the same standard deviation σ. Let

t = (x̄ − ȳ) / [sp √(1/n + 1/m)]

a. To test H₀: μX = μY versus H₁: μX > μY at the α level of significance, reject H₀ if t ≥ t_α,n+m−2.
b. To test H₀: μX = μY versus H₁: μX < μY at the α level of significance, reject H₀ if t ≤ −t_α,n+m−2.
c. To test H₀: μX = μY versus H₁: μX ≠ μY at the α level of significance, reject H₀ if t is either (1) ≤ −t_α/2,n+m−2 or (2) ≥ t_α/2,n+m−2.

Proof. See Appendix 9.A.1. ∎

CASE STUDY 9.2.1

Cases of disputed authorship are not very common, but when they do occur they can be very difficult to resolve. Speculation has persisted for several hundred years that some of Shakespeare's works were written by Sir Francis Bacon. And whether it was Alexander Hamilton or James Madison who wrote certain of the Federalist Papers is still an open question. A similar, though more recent, dispute centers around Mark Twain (18).

In 1861, a series of ten essays appeared in the New Orleans Daily Crescent. Signed "Quintus Curtius Snodgrass," the essays purported to chronicle the author's adventures as a member of the Louisiana militia. While historians generally agree that the events referred to actually did happen, there seems to be no record of anyone named Quintus Curtius Snodgrass. Adding to the mystery is the fact that the style of the essays bears unmistakable traces, at least to some critics, of the humor and irony that made Mark Twain so famous.

Most typically, efforts to unravel these sorts of "yes, he did; no, he didn't" controversies rely heavily on literary and historical clues. But not always. There is also a statistical approach to the problem. Studies have shown that authors are remarkably consistent in the extent to which they use words of a given length. That is, a given author will use roughly the same proportion of, say, three-letter words in something he writes this year as he did in whatever he wrote last year. The same holds true for words of any other length. But the proportion of three-letter words that author A consistently uses will very likely be different from the proportion of three-letter words that author B uses.
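A direct computation may help make the t ratio of Theorems 9.2.1 and 9.2.2 concrete. The sketch below (the function name and the toy data are ours, not from the text) pools the two sample variances and returns the statistic together with its degrees of freedom:

```python
import math

def pooled_t(xs, ys):
    """Two-sample t statistic of Theorem 9.2.1, computed under H0: mu_X = mu_Y."""
    n, m = len(xs), len(ys)
    xbar = sum(xs) / n
    ybar = sum(ys) / m
    ssx = sum((x - xbar) ** 2 for x in xs)      # (n - 1) * s_X^2
    ssy = sum((y - ybar) ** 2 for y in ys)      # (m - 1) * s_Y^2
    sp = math.sqrt((ssx + ssy) / (n + m - 2))   # pooled standard deviation
    t = (xbar - ybar) / (sp * math.sqrt(1 / n + 1 / m))
    return t, n + m - 2                          # statistic and its df

# hypothetical data, for illustration only
t, df = pooled_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# df = 4; the observed t would be compared with the t table value t_{alpha/2,4}
```

The decision rules of Theorem 9.2.2 then amount to comparing the returned t with the appropriate table value.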
It follows that by comparing the proportions of words of a certain length in essays known to be the work of Mark Twain to the proportions found in the ten Snodgrass essays, we should be able to assess the likelihood of the two authors' being one and the same.

TABLE 9.2.1: Proportion of Three-Letter Words

Twain | Proportion
Sergeant Fathom letter | 0.225
Madame Caprell letter | 0.262
Mark Twain letters in Territorial Enterprise
  First letter | 0.217
  Second letter | 0.240
  Third letter | 0.230
  Fourth letter | 0.229
First Innocents Abroad letter
  First half | 0.235
  Second half | 0.217

QCS | Proportion
Letter I | 0.209
Letter II | 0.205
Letter III | 0.196
Letter IV | 0.210
Letter V | 0.202
Letter VI | 0.207
Letter VII | 0.224
Letter VIII | 0.223
Letter IX | 0.220
Letter X | 0.201

Table 9.2.1 shows the proportions of three-letter words found in eight Twain essays and in the ten Snodgrass essays. (Each of the Twain works was written at approximately the same time the Snodgrass essays appeared.) If x₁ = 0.225, x₂ = 0.262, ..., x₈ = 0.217, and y₁ = 0.209, y₂ = 0.205, ..., y₁₀ = 0.201, then

x̄ = 1.855/8 = 0.2319 and ȳ = 2.097/10 = 0.2097

To analyze these data, we need to decide what the magnitude of the difference between the sample means, x̄ − ȳ = 0.2319 − 0.2097 = 0.0222, actually tells us. Let μX and μY denote the true fractions of the time that Twain and Snodgrass, respectively, used three-letter words. Of course, not having examined the complete works of the two authors, we have no way of evaluating either μX or μY, so they become the unknown parameters of the problem. What needs to be decided, then, is whether an observed sample difference as large as 0.0222 implies that μX and μY are, themselves, not the same. Or is 0.0222 small enough to still be compatible with the hypothesis that the true means are equal?
Put formally, we must choose between

H₀: μX = μY and H₁: μX ≠ μY

Since

Σxᵢ² = 0.4316 (i = 1, ..., 8) and Σyᵢ² = 0.4406 (i = 1, ..., 10)

the two sample variances are

sX² = [8(0.4316) − (1.855)²] / [8(7)] = 0.0002103
sY² = [10(0.4406) − (2.097)²] / [10(9)] = 0.0000955

Combined, they give a pooled standard deviation of 0.0121:

sp = √{ [7(0.0002103) + 9(0.0000955)] / (8 + 10 − 2) } = √0.0001457 = 0.0121

According to Theorem 9.2.1, if H₀: μX = μY is true, the sampling distribution of

T = (X̄ − Ȳ) / [Sp √(1/8 + 1/10)]

is described by a Student t curve with 16 (= 8 + 10 − 2) degrees of freedom.

[Figure 9.2.1: Student t curve with 16 df; two-sided rejection regions beyond ±2.9208, each of area 0.005.]

Suppose we let α = 0.01. By part (c) of Theorem 9.2.2, H₀ should be rejected in favor of a two-sided H₁ if either (1) t ≤ −t_.005,16 = −2.9208 or (2) t ≥ t_.005,16 = 2.9208 (see Figure 9.2.1). But

t = (0.2319 − 0.2097) / [0.0121 √(1/8 + 1/10)] = 3.88

a value falling considerably to the right of t_.005,16. Therefore, we reject H₀: it would appear that Twain and Snodgrass were not the same person.

Comment. The xᵢs and yⱼs in Table 9.2.1, being proportions, are necessarily not normally distributed random variables, so the basic assumption of Theorem 9.2.2 is not met. Fortunately, the effects of nonnormality on the probabilistic behavior of the two-sample t ratio are frequently minimal. The robustness property of the one-sample t ratio that we investigated in Chapter 7 (recall Figure 7.4.7) also holds true for the two-sample t ratio.

CASE STUDY 9.2.2

Dislike your statistics instructor? Retaliation time will come at the end of the semester, when you pepper the student course evaluation form with 1s. Were you pleased? Send a signal with a load of 5s. Either way, students' evaluations of their instructors do matter. These instruments are commonly used for promotion, tenure, and merit raise decisions. Studies of student course evaluations show that they do have value. They tend to show reliability and consistency.
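The robustness property cited in the Comment to Case Study 9.2.1 can be checked empirically. The simulation below is entirely illustrative (the sample sizes, seed, and trial count are our choices): it draws both samples from a uniform, decidedly nonnormal, distribution with H₀ true, applies the two-sided pooled t test with α = 0.05 (the critical value 2.101 is t_.025,18 from the t table), and estimates the actual Type I error rate:

```python
import math
import random

def pooled_t_stat(xs, ys):
    """Pooled two-sample t statistic (Theorem 9.2.1) under H0."""
    n, m = len(xs), len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / m
    ssx = sum((x - xbar) ** 2 for x in xs)
    ssy = sum((y - ybar) ** 2 for y in ys)
    sp = math.sqrt((ssx + ssy) / (n + m - 2))
    return (xbar - ybar) / (sp * math.sqrt(1 / n + 1 / m))

random.seed(1)
trials, rejections = 2000, 0
for _ in range(trials):
    xs = [random.random() for _ in range(10)]   # uniform, not normal
    ys = [random.random() for _ in range(10)]   # H0 true: same distribution
    if abs(pooled_t_stat(xs, ys)) >= 2.101:     # two-sided test, t_.025,18
        rejections += 1

rate = rejections / trials   # should land close to the nominal 0.05
```

Even though the normality assumption is violated, the estimated rejection probability stays near the nominal α, which is what the robustness claim asserts.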
Yet questions remain as to the ability of these questionnaires to identify good teachers and courses.

A veteran instructor of developmental psychology decided to do a study (212) on how a single changed factor might affect student course evaluations. He had attended a workshop extolling the virtues of an enthusiastic style in the classroom: more hand gestures, increased voice pitch variability, and the like. The vehicle for the study was the undergraduate developmental psychology course he had taught in the fall semester. He set about to teach the spring semester offering in the same way, with the exception of a more enthusiastic style.

The professor fully understood the difficulty of controlling for the many extraneous variables. He selected the spring class to have the same demographics as the one in the fall. He used the same textbook, syllabus, and tests. He listened to audio tapes of the fall lectures and reproduced them as closely as possible, covering the same topics in the same order.

The first step in examining the effect of enthusiasm on course evaluations is to establish that students had, in fact, perceived an increase in enthusiasm. Table 9.2.2 summarizes the ratings of the instructor on the "enthusiasm" question for the two semesters. Unless the difference in sample means (2.14 to 4.21) is statistically significant, there is no point in trying to compare fall and spring responses to other questions.

TABLE 9.2.2
Fall, xᵢ: n = 229, x̄ = 2.14, sX = 0.94
Spring, yᵢ: m = 243, ȳ = 4.21, sY = 0.83

Let μX and μY denote the true means associated with the two different teaching styles. There is no reason to think that increased enthusiasm on the part of the instructor would decrease the students' perception of enthusiasm, so it can be argued here that H₁ should be one-sided. That is, we want to test

H₀: μX = μY versus H₁: μX < μY

Let α = 0.05. Since n = 229 and m = 243, the t statistic has 229 + 243 − 2 = 470 degrees of freedom, and the decision rule calls for the rejection of H₀ if

t = (x̄ − ȳ) / [sp √(1/229 + 1/243)] ≤ −t_.05,470

A glance at Table A.2 shows that for any value n > 100, z_α is a good approximation to t_α,n. That is, −t_.05,470 ≅ −z_.05 = −1.64. The pooled standard deviation for these data is 0.885:

sp = √{ [228(0.94)² + 242(0.83)²] / 470 } = 0.885

Therefore,

t = (2.14 − 4.21) / [0.885 √(1/229 + 1/243)] = −25.39

and our conclusion is a resounding rejection of H₀: the increased enthusiasm was, indeed, noticed.

The real question of interest is whether the change in enthusiasm produced a perceived change in some other aspect of teaching that we know did not change. For example, the instructor did not become more knowledgeable about the material over the course of the two semesters. The student ratings, though, disagree. Table 9.2.3 shows the instructor's fall and spring ratings on the "knowledgeable" question. Is the increase from x̄ = 3.61 to ȳ = 4.05 statistically significant? Yes. For these data, sp = 0.898, and

t = (3.61 − 4.05) / [0.898 √(1/229 + 1/243)] = −5.32

which falls far to the left of the 0.05 critical value (−1.64).

TABLE 9.2.3
Fall, xᵢ: n = 229, x̄ = 3.61, sX = 0.84
Spring, yᵢ: m = 243, ȳ = 4.05, sY = 0.95

What we can glean from these data is both reassuring and a bit disturbing. Table 9.2.2 appears to confirm the widely held belief that enthusiasm is an important factor in effective teaching. Table 9.2.3, on the other hand, sounds a more cautionary note. It speaks to another widely held belief: that student evaluations can sometimes be difficult to interpret. Questions that purport to be measuring one trait may, in fact, be reflecting something entirely different.

Comment. It occasionally happens that an experimenter wants to test H₀: μX = μY and knows the values of σX² and σY². For those situations, the t test of Theorem 9.2.2 is inappropriate. If the n Xᵢs and m Yᵢs are normally distributed, it follows from the corollary to Theorem 4.3.4 that

Z = [X̄ − Ȳ − (μX − μY)] / √(σX²/n + σY²/m)    (9.2.1)

has a standard normal distribution.
Any such test of H₀: μX = μY, then, should be based on an observed Z ratio rather than an observed t ratio.

QUESTIONS

9.2.1. Ring Lardner was one of this country's most popular writers during the 1920s and 1930s. He was also a chronic alcoholic who died prematurely at the age of 48. The following table lists the life spans of some of Lardner's contemporaries (38). Those in the sample on the left were all problem drinkers; they died, on the average, at age 65. The twelve (sober) writers on the right tended to live a full ten years longer. Can it be argued that an increase of that magnitude is statistically significant? Test an appropriate null hypothesis against a one-sided H₁. Use the 0.05 level of significance. Note: The pooled sample standard deviation for these two samples is 13.9.

Authors Noted for Alcohol Abuse (age at death):
Ring Lardner, 48; Sinclair Lewis, 66; Raymond Chandler, 71; Eugene O'Neill, 65; Robert Benchley, 56; J.P. Marquand, 67; Dashiell Hammett, 67; e.e. cummings, 70; Edmund Wilson, 77. Average: 65.2

Authors Not Noted for Alcohol Abuse (age at death):
Carl Van Doren, 65; Ezra Pound, 87; Randolph Bourne, 32; Van Wyck Brooks, 77; Samuel Eliot Morison, 89; John Crowe Ransom, 86; Conrad Aiken, 84; Ben Ames Williams, 64; Henry Miller, 88; Archibald MacLeish, 90; James Thurber, 67; (one further entry), 77. Average: 75.5

9.2.2. Poverty Point is the name given to a number of widely scattered archaeological sites throughout Louisiana, Mississippi, and Arkansas. These sites are the remains of a society thought to have flourished during the period from 1700 to 500 B.C. Among their characteristic artifacts are ornaments that were fashioned out of clay and then baked. The following table shows the dates (in years B.C.) associated with four of these baked clay ornaments found in each of two different Poverty Point sites, Terral Lewis and Jaketown (85). The averages for the two samples are 1133.0 and 1013.5, respectively. Is it believable that these two settlements developed the technology to manufacture baked clay ornaments at the same time?
Set up and test an appropriate H₀ against a two-sided H₁ at the α = 0.05 level of significance. Note: sX = 266.9 and sY = 224.3.

Terral Lewis, xᵢ: 1492, 1169, 883, 988
Jaketown, yᵢ: 1346, 942, 908, 858

9.2.3. Nod-swimming in male ducks is a highly ritualized behavioral trait. The term refers to a rapid back-and-forth movement of a duck's head. It frequently occurs during courtship displays and occasionally occurs when the duck is approached by another male perceived to have higher status. It may also depend on the duck's "race." In an experiment investigating the latter possibility (100), two sets of green-winged teals, American and European, were photographed for several days. The following table gives the frequencies (per 10,000 frames of film) with which each bird initiated the nod-swimming motion. At the 0.01 level of significance, test the null hypothesis that the true average nod-swimming frequencies of American and European ducks are the same.

American males (six birds), frequency xᵢ: 14.6, 28.8, 19.1, 50.3, 35.7, 23.1
European males (twelve birds), frequency yᵢ: 3.6, 8.2, 7.8, 27.5, 7.0, 19.7, 17.0, 3.5, 13.3, 12.4, 19.0, 14.1

Note: For these two samples,

Σxᵢ = 171.6, Σxᵢ² = 5745.60 (i = 1, ..., 6)
Σyᵢ = 153.1, Σyᵢ² = 2526.09 (i = 1, ..., 12)

9.2.4. A major source of "mercury poisoning" comes from the ingestion of methylmercury, which is found in contaminated fish (recall Question 5.3.2). Among the questions pursued by medical investigators trying to understand the nature of this particular health problem is whether methylmercury is equally hazardous to men and women. The following (117) are the half-lives of methylmercury in the systems of six women and nine men who volunteered for a study where each subject was given an oral administration of methylmercury. Is there evidence here that women metabolize methylmercury at a different rate than men do? Do an appropriate two-sample t test at the α = 0.01 level of significance. Note: The two sample standard deviations for these data are sX = 15.1 and sY = 8.1.
Methylmercury Half-Lives (in Days)
Females, xᵢ: 52, 69, 88, 87, 73, 56
Males, yᵢ: 72, 88, 87, 74, 78, 70, 78, 93, 74

9.2.5. The use of carpeting in hospitals, while having definite esthetic merits, raises an obvious question: Are carpeted floors sanitary? One way to get at an answer is to compare bacterial levels in carpeted and uncarpeted rooms. Airborne bacteria can be counted by passing room air at a known rate over a growth medium, incubating that medium, and then counting the number of bacterial colonies that form. In one such study done in a Montana hospital (209), room air was pumped over a Petri dish at the rate of 1 cubic foot per minute. This procedure was repeated in 16 patient rooms, 8 carpeted and 8 uncarpeted. The results, expressed in terms of "bacteria per cubic foot of air," are listed in the following table.

Carpeted rooms (210, 212, 216, 220, 223, 226, 227, 228), bacteria/ft³: 11.8, 8.2, 7.1, 13.0, 10.8, 10.1, 14.6, 14.0
Uncarpeted rooms (eight rooms, including 214, 215, 217, 221, 222, 224, 229), bacteria/ft³: 12.1, 8.3, 3.8, 7.2, 12.0, 11.1, 10.1, 13.7

For the carpeted rooms,

Σxᵢ = 89.6 and Σxᵢ² = 1053.70 (i = 1, ..., 8)

For the uncarpeted rooms,

Σyᵢ = 78.3 and Σyᵢ² = 838.49 (i = 1, ..., 8)

Test whether carpeting has any effect on the level of airborne bacteria in patient rooms. Let α = 0.05.

9.2.6. In addition to marketing tea, Lipton also sells packaged dinner entrees. The company was interested in knowing whether the buying habits for such products differed between singles and married couples. In particular, in a survey of consumers, respondents were asked to react to the statement "Do you use coupons regularly?" on a numerical scale, where 1 stands for agree strongly, 2 for agree, 3 for neutral, 4 for disagree, and 5 for disagree strongly. The results of the poll are given in the following table (21).

Use Coupons Regularly
Single (X): x̄ = 3.10, sX = 1.469
Married (Y): ȳ = 2.43, sY = 1.350

Is the observed difference significant at the α = 0.05 level?
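Several of the surrounding exercises supply only summary statistics (sample sizes, means, and standard deviations). For such problems the pooled t ratio can be formed without the raw data. The helper below is a sketch (its name and the illustrative numbers are ours, not taken from any exercise):

```python
import math

def pooled_t_from_summary(n, xbar, sx, m, ybar, sy):
    """Pooled two-sample t (Theorem 9.2.1) from summary statistics."""
    sp2 = ((n - 1) * sx ** 2 + (m - 1) * sy ** 2) / (n + m - 2)
    t = (xbar - ybar) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2

# hypothetical summaries, for illustration only
t, df = pooled_t_from_summary(10, 5.0, 1.0, 10, 4.0, 1.0)
# here sp = 1, so t = 1 / sqrt(1/10 + 1/10) ≈ 2.236 with df = 18
```

The observed t would then be compared with the appropriate table value for n + m − 2 degrees of freedom.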
9.2.7. Accidents R Us and Roadkill specialize in writing insurance policies for high-risk drivers. Last year, Accidents R Us processed 100 claims. Settlements averaged $2000 and had a sample standard deviation of $600. A smaller firm, Roadkill, resolved only 50 claims, but the payouts averaged $2500 with a sample standard deviation of $700. Can we conclude from last year's experience that the average awards paid by the two companies tend not to be the same? Set up and carry out an appropriate analysis.

9.2.8. A company markets two brands of latex paint: regular and a more expensive brand that claims to dry an hour faster. A consumer magazine decides to test this claim by painting 10 panels with each product. The average drying time of the regular brand is 2.1 hours with a sample standard deviation of 12 minutes. The fast-drying version has an average of 1.6 hours with a sample standard deviation of 16 minutes. Test the null hypothesis that the more expensive brand dries an hour quicker. Use a one-sided H₁. Let α = 0.05.

9.2.9. (a) Suppose H₀: μX = μY is to be tested against H₁: μX ≠ μY. The two sample sizes are 6 and 11. If sp = 15.3, what is the smallest value for |x̄ − ȳ| that will result in H₀ being rejected at the α = 0.01 level of significance?
(b) What is the smallest value for x̄ − ȳ that will lead to the rejection of H₀: μX = μY in favor of H₁: μX > μY if α = 0.05, sp = 214.9, n = 13, and m = 8?

9.2.10. Suppose that H₀: μX = μY is being tested against H₁: μX ≠ μY, where σX² and σY² are known to be 17.6 and 22.9, respectively. If n = 10, m = 20, x̄ = 81.6, and ȳ = 79.9, what P-value would be associated with the observed Z ratio?

9.2.11. An executive has two routes that she can take to and from work each day. The first is by interstate; the second requires driving through town. On the average it takes her 33 minutes to get to work by the interstate and 35 minutes by going through town. The standard deviations for the two routes are 6 and 5 minutes, respectively.
Assume the distributions of the driving times for the two routes are approximately normal.
(a) What is the probability that on a given day driving through town would be the quicker of her choices?
(b) What is the probability that driving through town for an entire week (10 trips) would yield a lower average time than taking the interstate for the entire week?

9.2.12. Prove that the Z ratio given in Equation 9.2.1 has a standard normal distribution.
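For problems such as Question 9.2.10, the observed Z ratio of Equation 9.2.1 and its two-sided P-value can be computed directly. The sketch below uses Python's standard normal CDF; the numbers are hypothetical, not those of any exercise here:

```python
import math
from statistics import NormalDist

def z_ratio(n, xbar, var_x, m, ybar, var_y):
    """Observed Z of Equation 9.2.1 (H0: mu_X = mu_Y, variances known)."""
    z = (xbar - ybar) / math.sqrt(var_x / n + var_y / m)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# hypothetical summaries, for illustration only
z, p = z_ratio(10, 82.0, 16.0, 20, 80.0, 25.0)
# z ≈ 1.18, two-sided P-value ≈ 0.24
```

Because the variances are known, no degrees-of-freedom bookkeeping is needed; the reference distribution is exactly standard normal.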
Note: The sample standard deviations for the "Severely ill" and U Asymptomatic" groups are 428 and 183, rp.I:.........:":tnl·elv Severely III Subject 1 2 3 4 5 6 7 8 9 10 11 9~16. Titer 640 80 1280 160 640 640 1280 640 160 320 160 Asymptomatic Subject Titer 13 320 320 320 10 14 16 17 18 19 20 21 22 80 160 10 640 160 320 For the approximate two-sample t test described. in Question 9.2.15, i.t will be true that v<n+m-2 Why is that a disadvantage for the approximate test? That is, why is it better to use the Theorem 9.2.1 version of the t test if, in fact, O'} a~? 568 Chapter 9 Two-Sample Problems 9.2.17. The two-sample data described in Question 8.2.2 would be analyzed by testing Ho: ILX = MY, where MX and ILy denote the true average motorcycle-rcIated fatality rates for states having "limited" and "comprehensive" helmet laws, respectively. (a) Should the t test for Ho: ILK ILY follow the fonnat of Theorem 9.2.2 or the approximation given in Question 9.2.15? Explain. (b) Is there anything unusual about these data? 9.3 TE511NG flo: u~ u~-THE FTEST Although by far the majority of two-sample problems are set up to detect possible shifts in location parameters, situations sometimes where it is equaUy important-perhaps even more important-to compare variability parameters. Two machines on an assembly line, for example, may be producing items whose average dimensions (J-L x and Ity) of some UUI:kllleSSi-;are not significantly different but whose variabilities (as measured and O'i) are. becomes a critical of information if the increased variability by results in an unacceptable proportion items from one of the machines falling outside the engineering specifications (see Figure 9.3.1). In this section we will examine the generalized likelihood ratio test of Ho: o~ = versus H]: :p. The data will consist of two independent random samples of sizes 11 and m: The first-Xl. 
X2, • .• xn-is assumed to have come from a normal distribution the .)'2 •...• Ym-from a normal distribution having mean ILX and variance (All four are assumed to be unknown.) having mean ILY and Theorem 9.3.1 gives the test that will be used. The proof will not be given, but it follows the same basic pattern we have seen in other GLRTs; the important step is showing that the likelihood ratio is a monotonic function of the F random variable described in Definition 7.3.2. O'i o¥ ai 0';. 0'1; 0';. Comment. Tests of Ho: = o'~ another, more routine, context. Recall that the procedure for testing the equality of ILx and J-ty depended on whether or not the two Output from machine X (Acceptable) proporIion 100 tbin I U x I (Acceptable) proponioo too thick I (Unacceptable) proportion: too thin I Ux < oy I I Output from machine Y i (Unacceptable) proponion I too thick FIGURE 9.3.1: Variability of machine outputs, Section 9.3 Testing _2 _ "'x - FTest 569 ai population variances were equal. This implies thar a test of Ho: a~ should precede every test of Ho: I1x = I1Y· If the fonner is accepted, the t test on I1X and I1Y is done if Ho: a~ is rejected, Theorem 9.2.2 is not entirely to 9.2.2; appropriate. A frequently used alternative in that case is the approximate t test described Question 9.2.15. Theorem 9.3.1. , X2 • •.• , Xn and Yl, )'2, ... , Yin be illdepelldenl random samples from normal distributions with means 11 x and J1 y and standord deviations ax and ar, respectively. a. To lest Ho: a~ = versus . a; > at (he a level of significance:, reject Ho if s~/si ::: F,.,.,,,,-I.n-l. b. test Ho: a~ = a; versus Ht: < a~ at the a level of significance, reject Ho ~r s~/si ~ Fl-tt.m-l,n-l. a; '* c. To lest HO: = versus Hl: a; at the a level of significance, reject Ho if is either (1) .::: Fa /2,TII-l.n-l or (2) 2: Fl- a /2.m-l,n-l· Comment. The GLRT described in Theorem 9.3.1 isapproximale for same sort of reason the GLRT for Ho: = was approximate (see Theorem 7.5.2). 
The distribution of the test statistic, 1 is not symmetric, the two ranges of variance ratios yielding AS less than or to A* (i.e., the lefttail right tail the critical region) have slightly areas. the of convenience. though, it is customary to choose the two critical vaiues so that each cuts off the same area, a/2. St Si, a5 CASE STUDY 9.3.1 Electroencephalograms are records showing fluctuations of electrical activity in brain. Among the several different kinds of brain waves produced, the dominant ones are usually alpha waves. These have a characteristic frequency of anywhere from eight to thirteen cycles per second. this example was to see whether """",...,,,,,. of the experiment described sensory deprivation over an extended period of time has effect on alpha-wave pattern. The subjects were twenty inmates in a Canadian prison. They were randomly spIlt into two equal-sized groups. Members of one were placed in solitary confinement; in the olher group were allowed to remain in their own Seven days alpha-wave frequencies were for twenty subjects (59), as shown in Tab1e 9.3.l. UUi"Hll~ from 9.3.2, was an decrease in alpha-wave frequency for in solitary confinement. There also appears to have been an increase the (Continued on nexl page) 510 Chapter 9 Two-Sample Problems (OJse Srudy 9.3.1 continued) TAW 9.3.1: Alpha-Wave (CPS) Nonconfined, Xi Confinement, Yi 10.7 10,7 9.6 10.4 10.4 9.7 10.9 10.5 10.3 9.6 11.1 11.2 10.4 10.3 9.2 9.3 9.9 9.0 10.9 11 1)' fi ""0" ..:: 10 <U <II ~ .[ •• • •• 0 • 8 0 § 8 9 :( • NOllconfined o Solitllry 0 0 AGURE 9.3.2: Alpha-wave frequencies (cps). variability for that group. We will use the F test to determine whether observed difference in variability (4 = 0.21 versus = 0.36) is statistically significant. Let and o} denote true variances of alpha-wave frequencies for nonconfined and soIitary..confined respectively. 
The hypotheses to be tested are u 'a2 2 no· X = af versus Hl: ai ¥- {f~ (Continued on n£XI pttge) Section 9.3 Testing Ho: O'~:::: O'}-The FTest 571 be the level of significance. Given that 10 2>1 = 105.8 = 1121.26 1=1 yl =959.70 the variances become s} = -------'---'-- = 0.21 and 2 _ Sy - 10(959.70) - (97.8)2 _ 0 36 10(9) - . an observed F ratio of 1.71: the sample F = 0.36 = 171 0.21 . Both nand m are ten, so we s~ Jsj to UC:;JlU""" an F random variable with nine and nine of freedom (assuming is true). From Table AA in the Appendix, we see that the values cutting in either tail of that distribution are 0.248 and 4.03 (see Figure 9.3.3). Since the observed F between the two critical is to fail to reject Ho-a variances equal to 1.71 not out the possibility that the two true are equal. (In light of comment preceding Theorem 9.3.1, it wou~d now appropriate to test Ho: /Lx = /Ly the two-sample 1 test described in :se(~t1Cin F distribution with Density 9 and ofrree.clom Area = Area = Q.025 FIGURE 9.1.1: Distribution of 5~/5~ when HO is true. sn Chapter 9 Two--Sample Problems QUESTIONS 9.3.1. Short people tend to Jive longer than tall people, acrording to a theory held by certain medical researchers. Reasons for the disparity remain unclear, but studies have shown that short baseball players enjoy a longer tife expectancy than tall baseball players. A finding has been documented for professional boxers. The foUowing table (159) is a breakdown of the life spans of 31 former grouped into two 5'7") and "TaU" (;::: 5(811 ). The sample variance for the short categories-"Silort" , for the tall presidents, 86.9 years2• presidents is 73.6 Short Presidents (::;5'71/) President Height Madison 5'4" 5'6" Van B. Harrison J. Adams J.O. Adams 5'7/1 Tall Presidents Ci!S8") President 67 W. Harrison Polk Taylor 90 Grant 85 79 80 Hayes Truman Fillmore Pierce A. Johnson T. Roosevelt Eisenhower Cleveland Wilson Hoover Monroe Tyler Buchanan Taft Harding Jackson Washington Arthur F. 
Roosevelt L.Johnson Jefferson Height Age 5'8" 68 53 65 5'8" 5/8" 5'8 1JJ 'Z 5'8 1 II ~ 63 70 5'9" 88 74 5'10" 64 5'10" 5/10" 60 5' 10" 5'10" 5'11/1 5'1111 5' 11!1 6' 60 78 71 67 90 6' 6' 73 71 77 (I 72 6' 61 1H 6'21' 67 (l2f' 56 63 61'21' 6''21' (l2r 64 83 (8) Test Ho: aj against a two-sided HI at the a :: 0.05 level of significance. (b) Based on your conclusion in Part (a), would it be appropriate to test /-LX = MY using the two--sample l test of Theorem 9.3.2. A safe investment for the nonexpert is the certificate of deposit (CD) issued by many banks and other financial institutions. Typically, the larger the term of the investment, samples of 6-month CD rates the higher the interest rate paid The following table Testing Ho: Section 9.3 and FTest rates for a $10,000 investment. Is there a difference at the a = 0.05 level? 573 the variability of $10/)00 CD Rates Note; 6 Month 12 Month 5.10 5.10 5.31 5.00 5.26 5.10 5.26 5.02 5.15 5.35 5.20 5.40 5.20 5.83 5.21 5.40 the 6-montb rates, Sx = 0.122; for the sy = 0..209. r"U'.'VL'''''' the standard personality inventories used psychologists is the therna.tic am:>erc:.ep(IOI) test (TAT). A subject is shown a pictures and is asked to make up a story about each one. Interpreted properly, content of the stories can provide valuable insights into the subject's mental following data show the TAT results for 4() women, 20 of whom were of normal children Bnd 20 the mothers of schizophrenic In case suhject was shown the same set of 10 pictures. The figures :recorded were the numbers of stories (out of 10) that revealed a positive parent-ehild relationship, one the mother was dearly capable of :interacting with her child:in way (210). 
  Mothers of Normal Children      Mothers of Schizophrenic Children
  8  4  2  3                      1  2  7  0
  3  1  2  1                      3  2  2  0
  4  4  6  3                      3  3  1  1
  4  6  6  4                      4  2  3  1
  4  1  2  3                      2  2  2  1

(a) Test H0: σX² = σY² versus H1: σX² ≠ σY², where σX² and σY² are the variances of the scores of the mothers of normal children and of the mothers of schizophrenic children, respectively. Let α = 0.05.
(b) If H0: σX² = σY² is accepted in Part (a), test H0: μX = μY versus H1: μX ≠ μY. Set α equal to 0.05.

9.3.4. In a study designed to investigate the effects of a strong magnetic field on the early development of mice (8), ten cages, each containing three 30-day-old albino female mice, were subjected for a period of 12 days to a magnetic field having an average strength of 80 Oe/cm. Thirty other mice, housed in ten similar cages, were not put in the magnetic field and served as controls. Listed in the table are the weight gains, in grams, for each of the twenty sets of mice. Test whether the variances of the two sets of weight gains are significantly different. Let α = 0.05. Note: For the mice in the magnetic field, sx = 5.67; for the other mice, sy = 3.18.

  In Magnetic Field           Not in Magnetic Field
  Cage   Weight Gain (g)      Cage   Weight Gain (g)
  1      22.8                 11     23.5
  2      10.2                 12     31.0
  3      20.8                 13     19.5
  4      27.0                 14     26.2
  5      19.2                 15     26.5
  6      9.0                  16     25.2
  7      14.2                 17     24.5
  8      19.8                 18     23.8
  9      14.5                 19     27.8
  10     14.8                 20     22.0

9.3.5. Raynaud's syndrome is characterized by the sudden impairment of blood circulation in the fingers, a condition that results in discoloration and heat loss. The magnitude of the problem is evidenced in the following data, where twenty subjects (ten "normals" and ten with Raynaud's syndrome) immersed their right forefingers in water kept at a fixed temperature. The heat output (in cal/cm²/minute) of the forefinger was then measured with a calorimeter (109). Test the hypothesis that the heat-output variances for normal subjects and those with Raynaud's syndrome are the same. Use a two-sided alternative and the 0.05 level of significance.

  Normal Subjects,                    Subjects with Raynaud's Syndrome,
  Heat Output (cal/cm²/min)           Heat Output (cal/cm²/min)
  2.43   1.83   2.43   2.70   1.88    0.81   0.70   0.74   0.36   0.75
  1.96   1.53   2.08   1.85   2.44    0.56   0.65   0.87   0.40   0.31
  x̄ = 2.11,  sx = 0.37                ȳ = 0.62,  sy = 0.20

9.3.6. The bitter, eight-month baseball strike that ended the 1994 season so abruptly was expected to have substantial repercussions at the box office when the 1995 season finally got under way. It did. By the end of the first week of play, American League teams were playing to 12.8% fewer fans than the year before; National League teams fared even worse: their attendance was down 15.1% (200). Based on the team-by-team attendance figures given below, would it be appropriate to use the pooled two-sample t test of Theorem 9.2.2 to assess the statistical significance of the difference between those two means?

[Table: percent change in attendance for each of fourteen American League teams (Baltimore, Boston, California, Chicago, Cleveland, Detroit, Kansas City, Milwaukee, Minnesota, New York, Oakland, Seattle, Texas, Toronto; average change -12.8%) and ten National League teams (Atlanta -49%, Colorado, Houston, Montreal, New York, Philadelphia, Pittsburgh, San Diego, San Francisco, St. Louis; average change -15.1%). Two teams had played no home games; the surviving individual entries include -2, -4, -27, -30, -18, -27, -15, -16, -10, -1, -9, -28, -10, -45, and -14.]

9.3.7. For the methylmercury half-life data described earlier, the sample variances are 227 for the females and … for the males. Does the magnitude of that difference invalidate using Theorem 9.2.2 to test H0: μX = μY? Explain.

9.3.8. Crosstown busing to compensate for de facto segregation was begun on a fairly large scale in Nashville during the 1960s. Progress was made, but critics argued that too many racial imbalances remained. Among the data cited in the early 1970s are the following figures, showing the percentages of African-American students enrolled in a sample of eighteen public schools (172). Nine of the schools were located in predominantly African-American neighborhoods; the other nine, in predominantly white neighborhoods.
Which version of the two-sample t test (Theorem 9.2.2, or the approximation of Question 9.2.15) would be more appropriate for deciding whether the difference between 35.9% and 19.7% is statistically significant? Justify your answer.

  Schools in African-American     Schools in White
  Neighborhoods (% enrollment)    Neighborhoods (% enrollment)
  41                              21
  32                              14
  29                              28
  46                              11
  39                              30
  23                              6
  45                              18
  …                               24
  …                               25
  Average: 35.9%                  Average: 19.7%

9.3.9. Show that the generalized likelihood ratio for testing H0: σX² = σY² versus H1: σX² ≠ σY², as described in Theorem 9.3.1, is given by

    λ = L(ω̂)/L(Ω̂)
      = (m + n)^((m+n)/2) [Σ_{i=1..n} (xi - x̄)²]^(n/2) [Σ_{j=1..m} (yj - ȳ)²]^(m/2)
        / { n^(n/2) m^(m/2) [Σ_{i=1..n} (xi - x̄)² + Σ_{j=1..m} (yj - ȳ)²]^((m+n)/2) }

9.3.10. Let X1, X2, ..., Xn and Y1, Y2, ..., Ym be independent random samples from normal distributions with means μX and μY and standard deviations σX and σY, respectively, where μX and μY are known. Derive the GLRT for H0: σX² = σY² versus H1: σX² > σY².

9.4 BINOMIAL DATA: TESTING H0: pX = pY

Up to this point, the data considered in Chapter 9 have been independent random samples of sizes n and m drawn from two continuous distributions; in fact, from two normal distributions. Other scenarios, though, are quite possible. The Xs and Ys might be continuous random variables having density functions other than the normal, or they might be discrete. In this section we consider the most common example of this latter type: situations where the two sets of data are binomial.

Applying the Generalized Likelihood Ratio Criterion

Suppose that n Bernoulli trials related to Treatment X have resulted in x successes, and m (independent) Bernoulli trials related to Treatment Y have yielded y successes.
We wish to test whether pX and pY, the true probabilities of success for Treatment X and Treatment Y, are equal:

    H0: pX = pY (= p)  versus  H1: pX ≠ pY

Let α be the level of significance. Following the notation used for GLRTs, the two parameter spaces here are

    ω = {(pX, pY): 0 ≤ pX = pY ≤ 1}

and

    Ω = {(pX, pY): 0 ≤ pX ≤ 1, 0 ≤ pY ≤ 1}

Furthermore, the likelihood function can be written

    L = pX^x (1 - pX)^(n-x) · pY^y (1 - pY)^(m-y)

Setting the derivative of ln L with respect to p (= pX = pY) equal to zero and solving for p gives a not too surprising result, namely,

    pe = (x + y)/(n + m)

That is, the maximum likelihood estimate for p under H0 is the pooled success proportion. Similarly, solving ∂ ln L/∂pX = 0 and ∂ ln L/∂pY = 0 gives the two sample proportions as the unrestricted maximum likelihood estimates:

    pXe = x/n   and   pYe = y/m

Putting pe, pXe, and pYe back into L gives the generalized likelihood ratio:

    λ = L(ω̂)/L(Ω̂)
      = [(x + y)/(n + m)]^(x+y) [1 - (x + y)/(n + m)]^(n+m-x-y)
        / { (x/n)^x [1 - (x/n)]^(n-x) (y/m)^y [1 - (y/m)]^(m-y) }          (9.4.1)

The right-hand side of Equation 9.4.1 is such a difficult function to work with that it is customary to use an approximation to the usual likelihood ratio test. There are several available. It can be shown, for example, that -2 ln λ for this problem has an asymptotic χ² distribution with 1 degree of freedom (211), so an approximate two-sided, α = 0.05 test is to reject H0 if -2 ln λ ≥ 3.84. Another approach, the one most often used, is to appeal to the central limit theorem and make the argument that the standardized version of X/n - Y/m has an approximate standard normal distribution. Under H0, of course,

    E(X/n - Y/m) = 0

and

    Var(X/n - Y/m) = p(1 - p)/n + p(1 - p)/m

If p is now replaced by (x + y)/(n + m), its maximum likelihood estimate under ω, we get the statement of Theorem 9.4.1.

Theorem 9.4.1. Let x and y denote the numbers of successes observed in two independent sets of n and m Bernoulli trials, respectively, where pX and pY are the true success probabilities associated with each set of trials. Let pe
= (x + y)/(n + m)

and define

    z = (x/n - y/m) / sqrt( pe(1 - pe)/n + pe(1 - pe)/m )

a. To test H0: pX = pY versus H1: pX > pY at the α level of significance, reject H0 if z ≥ zα.
b. To test H0: pX = pY versus H1: pX < pY at the α level of significance, reject H0 if z ≤ -zα.
c. To test H0: pX = pY versus H1: pX ≠ pY at the α level of significance, reject H0 if z is either (1) ≤ -zα/2 or (2) ≥ zα/2.

Comment. The utility of Theorem 9.4.1 actually extends beyond the scope we have just described. Any continuous variable can always be dichotomized and "transformed" into a Bernoulli variable. For example, blood pressure can be recorded in terms of "mm Hg," a continuous variable, or simply as "normal" or "abnormal," a Bernoulli variable. The next two case studies illustrate these two sources of binomial data. In the first, the measurements begin and end as Bernoulli variables; in the second, the initial measurement of "number of nightmares per month" is dichotomized into "often" and "seldom."

CASE STUDY 9.4.1

Local judges have some discretion in the disposition of criminal cases that appear in their courts. For some cases, the judge and the defendant's lawyer will enter into a plea bargain, where the defendant pleads guilty to a lesser charge. How often this happens is measured by the mitigation rate, the proportion of criminal cases where the defendant qualifies for prison time but receives a greatly shortened term or no prison time at all.

A recent Florida Corrections Department study showed that the mitigation rate in Escambia County from January 1994 through March 1996 was 61.7% (1033 out of 1675 cases), making it the state's fourth highest. Not happy with that distinction, the area's State Attorney instituted some new policies designed to limit the number of plea bargains. A follow-up study (138) revealed that the July 1996 through June 1997 mitigation rate decreased to 52.1% (344 out of 660 cases).
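Although the analysis that follows uses the z statistic, the χ² approximation mentioned beneath Equation 9.4.1 is also easy to compute. The sketch below is an added illustration, not part of the original case study; it applies the -2 ln λ statistic to the mitigation counts just given (Question 9.4.9 asks for this statistic by hand for other data).

```python
# -2 ln(lambda) for the GLRT of Equation 9.4.1, applied to the mitigation
# counts of Case Study 9.4.1 (x = 1033 of n = 1675, y = 344 of m = 660).
# Under H0 the statistic is approximately chi-square with 1 df, so an
# approximate two-sided 0.05-level test rejects H0 when it exceeds 3.84.
from math import log

def minus_two_log_lambda(x, n, y, m):
    """-2 ln(lambda) for H0: pX = pY (assumes 0 < x < n and 0 < y < m)."""
    def loglik(s, t, p):
        return s * log(p) + (t - s) * log(1 - p)

    p_pooled = (x + y) / (n + m)
    restricted = loglik(x, n, p_pooled) + loglik(y, m, p_pooled)
    unrestricted = loglik(x, n, x / n) + loglik(y, m, y / m)
    return -2.0 * (restricted - unrestricted)

stat = minus_two_log_lambda(1033, 1675, 344, 660)
print(stat > 3.84)   # True: the statistic is roughly 17.7, so H0 is rejected
```

The two approaches are asymptotically equivalent; here the statistic is close to the square of the z value computed in the text.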
Is it fair to attribute that decline to the State Attorney's efforts, or can the drop from 61.7% to 52.1% be written off to chance? Let pX be the true probability that mitigation would have occurred during the period January 1994 through March 1996, and let pY denote the analogous probability for July 1996 through June 1997. The hypotheses to be tested are

    H0: pX = pY (= p)  versus  H1: pX > pY

Let α = 0.01. If H0 is true, the pooled estimate of p would be the overall mitigation rate. That is,

    pe = (1033 + 344)/(1675 + 660) = 1377/2335 = 0.590

The sample mitigation rates for the first period and the second period are 1033/1675 = 0.617 and 344/660 = 0.521, respectively. According to Theorem 9.4.1, then, the test statistic is equal to 4.25:

    z = (0.617 - 0.521) / sqrt( (0.590)(0.410)/1675 + (0.590)(0.410)/660 ) = 4.25

Since z exceeds the α = 0.01 critical value (z.01 = 2.33), we should reject the null hypothesis and conclude that the more stringent policies laid down by the State Attorney did have the desired effect of lowering the county's mitigation rate.

CASE STUDY 9.4.2

Over the years, numerous studies have sought to characterize the nightmare sufferer. Out of these has emerged the stereotype of someone with high anxiety, low ego strength, feelings of inadequacy, and poorer-than-average physical health. What is not so well known, though, is whether men fall into this pattern with the same frequency as women. To this end, a clinical survey (76) looked at nightmare frequencies for a sample of 160 men and 192 women. Each subject was asked whether he or she experienced nightmares "often" (at least once a month) or "seldom" (less than once a month). The percentages of men and women saying "often" were 34.4% and 31.3%, respectively (see Table 9.4.1).

TABLE 9.4.1: Frequency of Nightmares

                       Men     Women    Total
  Nightmares often     55      60       115
  Nightmares seldom    105     132      237
  Totals               160     192      352
  % often              34.4    31.3
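For reference, the z computation just carried out for Case Study 9.4.1 can be packaged as a small function (an added sketch, not part of the original text):

```python
# Two-sample z test of Theorem 9.4.1, applied to the mitigation-rate
# counts of Case Study 9.4.1 (x = 1033 of n = 1675, y = 344 of m = 660).
from math import sqrt

def two_sample_z(x, n, y, m):
    """z statistic for H0: pX = pY, using the pooled success proportion."""
    p_pooled = (x + y) / (n + m)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n + 1 / m))
    return (x / n - y / m) / se

z = two_sample_z(1033, 1675, 344, 660)
print(round(z, 2))   # 4.22; the text's 4.25 reflects rounded intermediate values
```

Either way, z comfortably exceeds the one-sided α = 0.01 critical value of 2.33, so H0 is rejected.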
Is the difference between those two percentages statistically significant? Let pM and pW denote the true probabilities of men and women, respectively, having nightmares often. The hypotheses to be tested are

    H0: pM = pW  versus  H1: pM ≠ pW

Let α = 0.05. Then ±z.025 = ±1.96 become the two critical values. Moreover,

    pe = (55 + 60)/(160 + 192) = 115/352 = 0.327

so

    z = (0.344 - 0.313) / sqrt( (0.327)(0.673)/160 + (0.327)(0.673)/192 ) = 0.62

The conclusion, then, is to fail to reject the null hypothesis: these data provide no convincing evidence that the frequency of nightmares is different for men than for women.

QUESTIONS

9.4.1. The phenomenon of handedness has been extensively studied in human populations. The percentages of adults who are right-handed, left-handed, and ambidextrous are well documented. What is not so well known is that a similar phenomenon is present in lower animals. Dogs, for example, can be either right-pawed or left-pawed. Suppose that in a random sample of 200 beagles it is found that 55 are left-pawed, and in a random sample of 200 collies, 40 are left-pawed. Can we conclude that the difference in the two proportions of left-pawed dogs is statistically significant?

9.4.2. In a study designed to see whether a controlled diet could retard the process of arteriosclerosis, a total of 846 randomly chosen persons were followed over an eight-year period. Half were instructed to eat only certain foods; the other half could eat whatever they wanted. At the end of eight years, 66 persons in the diet group were found to have died of either myocardial infarction or cerebral infarction, as compared to 93 deaths of a similar nature in the control group (214). Do the appropriate analysis. Let α = 0.05.

9.4.3. Water witching, the practice of using the movements of a forked twig to locate underground water (or minerals), dates back over 400 years. Its first detailed description appears in Agricola's De re Metallica, published in 1556. That water witching works remains a belief widely held among rural people in Europe and throughout the Americas. [In 1960 the number of "active" water witches in the United States was estimated to be more than 20,000 (205).]
Reliable evidence supporting or refuting water witching is hard to find. Personal accounts of its successes or failures tend to be strongly biased by the attitude of the observer. The following data show the outcomes of all the wells dug in Fence Lake, New Mexico, where … "witched" wells and 32 "nonwitched" wells were sunk. Recorded for each well was whether it proved to be successful (S) or unsuccessful (U). What would you conclude?

  Nonwitched Wells    Witched Wells

  S S S S U S S S S S S S S S S S S U S S
  S S U S S S S S S S S U S S S U U S S S
  S S S S S S S S S S U S S S S S S S

9.4.4. If flying saucers are a genuine phenomenon, it would follow that the nature of sightings (that is, the proportions of the various types) should be similar in different parts of the world. A prominent investigator compiled a listing of 91 sightings reported in Spain and 1117 reported elsewhere. Among the information recorded was whether the saucer was on the ground or hovering. His data are summarized in the following table (86). Let pS and pNS denote the true probabilities of "Saucer on ground" in Spain and not in Spain, respectively. Test H0: pS = pNS against a two-sided H1. Let α = 0.01.

                       In Spain    Not in Spain
  Saucer on ground     53          705
  Saucer hovering      38          412

9.4.5. Suppose H0: pX = pY is being tested against H1: pX ≠ pY on the basis of two independent sets of 100 Bernoulli trials. If x, the number of successes in the first set, is 60, and y, the number of successes in the second set, is 48, what P-value would be associated with the data?

9.4.6. A total of 8605 students are enrolled full-time at State University this semester, 4134 of whom are women. Of the 6001 students who live on campus, 2915 are women. Can it be argued that the difference between the proportions of men and women living on campus is statistically significant? Carry out an appropriate analysis.

9.4.7. The kittiwake is a seagull whose mating behavior is basically monogamous. Normally, the birds separate for several months after the completion of one breeding season and reunite at the beginning of the next. Whether or not the birds actually do reunite, though, may be affected by the success of their "relationship" the season before. A total of 769 kittiwake pair-bonds were studied over the course of two breeding seasons; of those, 609 bred successfully the first season, while the remaining 160 were unsuccessful. The following season, 175 of the previously successful pair-bonds "divorced," as did 100 of the 160 whose relationship had left something to be desired. Can we conclude that the difference in the two divorce rates (29% and 63%) is statistically significant?

                         Breeding in Previous Year
                         Successful    Unsuccessful
  Number divorced        175           100
  Number not divorced    434           60
  Total                  609           160
  Percent divorced       29            63

9.4.8. A utility infielder for a National League club hit … last season in 300 at-bats. This year he hit .250 in 200 at-bats. The owners are trying to cut his salary next year on the grounds that his output has declined. The player argues, though, that his performances the last two seasons have not been significantly different, so his salary should not be reduced. Who is right?

9.4.9. Compute -2 ln λ (see Equation 9.4.1) for the nightmare data of Case Study 9.4.2, and use it to test the hypothesis that pX = pY. Let α = 0.05.

9.5 CONFIDENCE INTERVALS FOR THE TWO-SAMPLE PROBLEM

Two-sample data lend themselves nicely to the hypothesis-testing format because a meaningful H0 can always be formulated (which was not the case with every set of one-sample data). The same inferences, though, can just as easily be couched in terms of confidence intervals. Simple inversions similar to the derivation of Equation 7.4.1 will yield confidence intervals for μX - μY, σX²/σY², and pX - pY.

Theorem 9.5.1. Let x1, x2, ..., xn and y1, y2, ..., ym be independent random samples drawn from normal distributions with means μX and μY, respectively, and with the same standard deviation σ. Let sp denote the data's pooled standard deviation. A 100(1 - α)% confidence interval for μX - μY is given by

    ( x̄ - ȳ - t(α/2, n+m-2) · sp · sqrt(1/n + 1/m),  x̄ - ȳ + t(α/2, n+m-2) · sp · sqrt(1/n + 1/m) )

Proof. We know from Theorem 9.2.1 that

    T = ( X̄ - Ȳ - (μX - μY) ) / ( Sp · sqrt(1/n + 1/m) )

has a Student t distribution with n + m - 2 df. Therefore,

    P( -t(α/2, n+m-2) ≤ T ≤ t(α/2, n+m-2) ) = 1 - α          (9.5.1)

Rewriting 9.5.1 by isolating μX - μY in the center of the inequalities gives the interval claimed in the statement of the theorem. □

CASE STUDY 9.5.1

Occasionally in forensic medicine,
or in the aftermath of a bad accident, identifying the sex of a victim can be a very difficult task. In some of these cases, dental structure provides a useful criterion, since individual teeth will remain in good condition long after other tissues have deteriorated. Furthermore, studies have shown that female teeth and male teeth have different physical and chemical characteristics. The extent to which X-rays can penetrate tooth enamel, for instance, is different for men than it is for women.

Listed in Table 9.5.1 are "spectropenetration gradients" for eight male teeth and eight female teeth (57). These numbers are measures of the rate of change in the amount of X-ray penetration through a section of tooth enamel at a wavelength of 600 nm as opposed to 400 nm.

TABLE 9.5.1: Enamel Spectropenetration Gradients

  Male, xi    Female, yi
  4.9         4.8
  5.4         5.3
  5.0         3.7
  5.5         4.1
  5.4         5.6
  6.6         4.0
  6.3         3.6
  4.3         5.0

Let μX and μY be the population means of the spectropenetration gradients associated with male teeth and with female teeth, respectively. Note that

    Σ xi = 43.4  and  Σ xi² = 239.32

from which

    x̄ = 43.4/8 = 5.4

and

    sX² = [8(239.32) - (43.4)²] / [8(7)] = 0.55

Similarly, Σ yi = 36.1 and Σ yi² = 166.95, so that

    ȳ = 36.1/8 = 4.5

and

    sY² = [8(166.95) - (36.1)²] / [8(7)] = 0.58

Therefore, the pooled standard deviation is equal to 0.75:

    sp = sqrt( [7(0.55) + 7(0.58)] / (8 + 8 - 2) ) = sqrt(0.565) = 0.75

We know that the behavior of

    ( X̄ - Ȳ - (μX - μY) ) / ( Sp · sqrt(1/8 + 1/8) )

will be described by a Student t curve with 8 + 8 - 2 = 14 degrees of freedom. Since t(.025, 14) = 2.1448, the 95% confidence interval for μX - μY is given by

    ( x̄ - ȳ - 2.1448 · sp · sqrt(1/8 + 1/8),  x̄ - ȳ + 2.1448 · sp · sqrt(1/8 + 1/8) )
    = ( 5.4 - 4.5 - 2.1448(0.75)·sqrt(0.25),  5.4 - 4.5 + 2.1448(0.75)·sqrt(0.25) )
    = (0.1, 1.7)

Comment. Here the 95% confidence interval does not include the value zero. This means that had we tested H0: μX = μY versus H1: μX ≠ μY at the α = 0.05 level of significance, H0 would have been rejected.

Theorem 9.5.2.
Let x1, x2, ..., xn and y1, y2, ..., ym be independent random samples drawn from normal distributions with standard deviations σX and σY, respectively. A 100(1 - α)% confidence interval for the variance ratio σX²/σY² is given by

    ( (sX²/sY²) · F(α/2, m-1, n-1),  (sX²/sY²) · F(1-α/2, m-1, n-1) )

Proof. Start with the fact that

    (SY²/σY²) / (SX²/σX²)

has an F distribution with m - 1 and n - 1 df, and follow the strategy used in the proof of Theorem 9.5.1; that is, isolate σX²/σY² in the center of the analogous inequalities. □

CASE STUDY 9.5.2

The easiest way to measure the movement, or flow, of a glacier is with a camera. First a set of reference points is marked off at various sites near the glacier's edge. Then these points, along with the glacier, are photographed from an airplane. The problem is this: How long should the time interval between photographs be? If too short a period has elapsed, the glacier will not have moved very far, and the errors associated with the photographic technique will be relatively large. If too long a period has elapsed, parts of the glacier might be deformed by the surrounding terrain, an eventuality that could introduce substantial variability into the point-to-point velocity estimates.

Two sets of flow rates for the Antarctic's Hoseason Glacier have been calculated (118), one based on photographs taken three years apart, the other based on photographs taken five years apart (see Table 9.5.2).

TABLE 9.5.2: Flow Rates Estimated for the Hoseason Glacier (Meters Per Day)

  Three-Year Span, xi    Five-Year Span, yi
  0.73                   0.72
  0.76                   0.74
  0.75                   0.74
  0.77                   0.72
  0.73                   0.72
  0.75
  0.74

On the basis of other considerations, it can be assumed that the "true" flow rate was constant for the eight years in question. The objective is to assess the relative variabilities associated with the three- and five-year time periods. One way to do this, assuming the data to be normal, is to construct, say, a 95% confidence interval for the variance ratio.
If that interval does not contain the value 1, we infer that the two time periods lead to flow-rate estimates of significantly different precision.

From Table 9.5.2,

    Σ xi = 5.23  and  Σ xi² = 3.9089

(the sums running over the seven three-year estimates), so that

    sX² = [7(3.9089) - (5.23)²] / [7(6)] = 0.000224

Similarly, Σ yi = 3.64 and Σ yi² = 2.6504, making

    sY² = [5(2.6504) - (3.64)²] / [5(4)] = 0.000120

The two critical values come from Table A.4 in the Appendix:

    F(.025, 4, 6) = 0.109  and  F(.975, 4, 6) = 6.23

Substituting, then, into the statement of Theorem 9.5.2 gives (0.203, 11.629) as a 95% confidence interval for σX²/σY²:

    ( (0.000224/0.000120)(0.109),  (0.000224/0.000120)(6.23) ) = (0.203, 11.629)

Thus, although the three-year data had a larger sample variance than the five-year data, no conclusion can be drawn about the true variances being different, because the ratio σX²/σY² = 1 is contained in the confidence interval.

Theorem 9.5.3. Let x and y denote the numbers of successes observed in two independent sets of n and m Bernoulli trials, respectively. If pX and pY denote the true success probabilities, an approximate 100(1 - α)% confidence interval for pX - pY is given by

    ( x/n - y/m - z(α/2) · sqrt( (x/n)(1 - x/n)/n + (y/m)(1 - y/m)/m ),
      x/n - y/m + z(α/2) · sqrt( (x/n)(1 - x/n)/n + (y/m)(1 - y/m)/m ) )

Proof. See Question 9.5.11. □

CASE STUDY 9.5.3

Until almost the end of the nineteenth century, the mortality associated with surgical operations, even minor ones, was extremely high. The major problem was infection. The germ theory as a model for disease transmission was still unknown, so there was no concept of sterilization. As a result, many patients died from postoperative complications.

The major breakthrough that was so desperately needed finally came when Joseph Lister, a British physician, began reading about some of the work done by Louis Pasteur. In a series of classic experiments, Pasteur had succeeded in demonstrating the role that yeasts and bacteria play in fermentation. Lister conjectured that human infections might have a similar organic origin. To test his theory, he began using carbolic acid as an operating-room disinfectant. The data in Table 9.5.3 show the
TABLE 9.5.3: Mortality Rates, Lister's Amputations

                    Carbolic Acid Used?
  Patient Lived?    Yes    No     Total
  Yes               34     19     53
  No                6      16     22
  Total             40     35     75

outcomes of the amputations that he performed: thirty-five without the aid of carbolic acid and forty with it (213).

Let pW (estimated by 34/40) and pW/O (estimated by 19/35) denote the true survival probabilities for patients amputated with and without the use of carbolic acid, respectively. To construct a 95% confidence interval for pW - pW/O, we note that zα/2 = z.025 = 1.96; Theorem 9.5.3 then reduces to

    ( 34/40 - 19/35 - 1.96 · sqrt( (0.85)(0.15)/40 + (0.54)(0.46)/35 ),
      34/40 - 19/35 + 1.96 · sqrt( (0.85)(0.15)/40 + (0.54)(0.46)/35 ) )
    = ( 0.31 - 1.96·sqrt(0.0103),  0.31 + 1.96·sqrt(0.0103) )
    = (0.11, 0.51)

Since pW - pW/O = 0 is not included in the interval (which lies entirely to the right of zero), it should be concluded that carbolic acid does have an effect, a beneficial one, on a surgery patient's survival rate.

QUESTIONS

9.5.1. During the 1990s, the computer and communications industries were the glamour businesses. Were their high profiles, though, reflected in the compensation paid to their CEOs (53)? The following table lists samples of 1995 salary plus bonuses (in $1000s) for chief executive officers from (1) the computer and communications industry and (2) the more traditional financial services industry. Construct a 95% confidence interval for the difference in the average compensation paid by the two groups. Note: The pooled standard deviation for these data is 411.

1995 CEO Salary + Bonuses ($1000s)

  Computer & Communications
  Company               Comp.
  Adobe Systems         668
  Alltel                200
  America Online        1688
  Applied Materials     752
  BMC Software          1235
  Frontier              1485
  Nynex                 1020
  Read-Rite             788
  Solectron             …

  Financial Services
  Company               Comp.
  Boatmen's Bancshs     1150
  CCB Financial         491
  Commercial Federal    566
  Chicago NBD           1296
  First of America Bk   498
  Great Western Finl    953
  Huntington            504
  Magna Group           750
  MBIA                  799
  National              292
  Old National Bncp     500
  OnBancorp             647
  PNC Bank              1267
  RCSB Financial        …
  Summit Bancorp        …

9.5.2. In 1965 a silver shortage in the United States prompted Congress to authorize the minting of silverless dimes and quarters. They also recommended that the silver content of half-dollars be reduced from 90% to 40%. Historically, fluctuations in the amount of rare metals found in coins are not uncommon (75). The following data may be a case in point. Listed are the silver percentages found in samples of a Byzantine coin minted on two separate occasions during the reign of Manuel I (1143-1180). Construct a 90% confidence interval for μX - μY, the true average difference in the coin's silver content (= "early" - "late"). What does the interval imply about the outcome of testing H0: μX = μY? Note: sx = 0.54 and sy = 0.36.

  Early Coinage, xi (% Ag)    Late Coinage, yi (% Ag)
  5.9                         5.3
  6.8                         5.6
  6.4                         5.5
  7.0                         5.1
  6.6                         6.2
  7.7                         5.8
  7.2                         5.8
  6.9
  6.2
  Average: 6.7                Average: 5.6

9.5.3. Male fiddler crabs solicit attention from the opposite sex by standing in front of their burrows and waving their claws at the females who walk by. If a female likes what she sees, she pays the male a brief visit in his burrow. If everything goes well and the crustacean chemistry clicks, she will stay a little longer and mate. In what may be a ploy to lessen the risk of spending the night alone, some of the males build elaborate mud domes over their burrows. Do the following data (226) suggest that the percentage of time a male spends waving to females is influenced by whether or not his burrow has a dome? Answer the question by constructing and interpreting a 95% confidence interval for μX - μY. Note: sp = 11.2.

% of Time Spent Waving to Females

  Males with Domes, xi    Males without Domes, yi
  100.0                   58.6
  76.4                    84.2
  96.5                    88.8
  85.3                    93.5
  83.6                    84.1
  79.1                    83.6

9.5.4.
Recall the preening-time data in Table 8.2.2. Let μX be the true average preening time for male fruit flies, and μY the true average for female fruit flies. Construct a 99% confidence interval for μX - μY. What do the endpoints of your interval imply about the outcome of testing H0: μX = μY versus H1: μX ≠ μY at the α = 0.01 level of significance?

9.5.5. Carry out the details to complete the proof of Theorem 9.5.1.

9.5.6. Suppose that x1, x2, ..., xn and y1, y2, ..., ym are independent random samples from normal distributions with means μX and μY and known standard deviations σX and σY, respectively. Derive a 100(1 - α)% confidence interval for μX - μY.

9.5.7. Construct a 95% confidence interval for σX²/σY² based on the presidential life-span data in Question 9.3.1. The hypothesis test referred to in Part (a) of that question led to a "fail to reject H0" conclusion. Does that agree with your confidence interval? Explain.

9.5.8. One of the parameters used in evaluating myocardial function is the end diastolic volume (EDV). The following table shows EDVs recorded for eight persons found

  Normal, xj    Constrictive Pericarditis, yi
  62            24
  60            56
  42            62
  74            49
  44            67
  28            80
  48
  …
Death certificates obtained for the 3637 members of the American Chemical Society who died over a 2O-year period revealed that the male deaths were suicides, as compared to 13 of the 115 female deaths (103). Construct a 95% confidence interval for the difference in suicide rates. What wouJd you conclude? TAKING A SECOND LOOK AT STATISllCS (CHOOSING SAMPLES) ....,."l\..1'l.J'''UI'~ sample sizes is a topic that invariably receives extensive coverage whenever statistics and design are disCussed. good reason. Whatever the context, the number of observations making up a set prominently in the ability of those data to address any and all of the questions raised by the experimenter. As sample get we know that estimators become more precise hypothesis tests at distinguishing between Ho and Hl. Larger sample of course, are also get more expensive. The trade-off between how many observations researchers can afford to take and how many they would like to take is a choice that has to be made early on in design of any experiment If sample ultimately decided upon are too small, is a risk that the objectives of the study will not be fuHy achieved-parameters may be estimated with insufficient precision hypothesis tests may reach incorrect conclusions. That said, Choosing sample sizes is often not as critical to the sUCcess of an experiment as choosing subjects. In a two-sample design, example, how should we ........'.......... which particular to to Treatment X and which to Treatment Y. If the subjects comprising a sample are somehow "biased" with respect to measurement being recorded, integrity of the conclusions is irretrievably compromised. There are no statistical techniques "correcting" inferences based on measurements were biased in some unknown way. It is aJso true that biases can be very SUbtle, yet still have a pronounced effect on the final measurements. 
That being the case, it is incumbent on researchers to take every possible precaution at the outset to prevent inappropriate assignments of subjects to treatments.

For example, suppose that for your Senior Project you plan to study whether a new synthetic testosterone can affect the behavior of female rats. Your intention is to set up a two-sample design where ten rats will be given weekly injections of the new testosterone compound and another ten rats will serve as a control group, receiving weekly injections of a placebo. At the end of eight weeks, all twenty rats will be put in a large community cage, and the behavior of each one will be closely monitored for signs of aggression.

Last week you placed an order for twenty female Rattus norvegicus from the local Rats 'R Us franchise. They arrived today, all housed in one cage. Your plan is to remove ten of the twenty "at random," and then put those ten in a similarly large cage. The ten removed will be receiving the testosterone injections; the ten remaining in the original cage will constitute the control group. The question is, which ten should be removed?

The obvious answer (reach in and pull out ten; what's the big deal?) is very much the wrong answer! Why? Because the samples formed in such a way might very well be biased if, for example, you (understandably) tended to avoid trying to grab rats that looked like they might bite. If that were the case, the ones you drew out would be biased, by virtue of being less aggressive than the ones left behind. Since the measurements ultimately to be taken deal with aggression, choosing the samples in that particular way would be a fatal flaw. Whether the total sample size was twenty or two thousand, the results would be worthless.

In general, relying on our intuitive sense of the word "random" to allocate subjects to a number of different treatments is risky, to say the least. The correct approach would be to number the rats from one to twenty and then use a random number table or a computer's random number generator to identify the ten to be removed.
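The same allocation can be generated with any software random number generator. Here is a minimal Python sketch, an illustration added to this discussion rather than part of the original text; the seed is arbitrary, chosen only to make the run reproducible:

```python
import random

rng = random.Random(20)          # arbitrary seed, for reproducibility
rats = list(range(1, 21))        # rats numbered 1 through 20

# Draw ten labels without replacement; these rats get the testosterone.
treatment = sorted(rng.sample(rats, 10))
control = sorted(set(rats) - set(treatment))

print("treatment:", treatment)
print("control:  ", control)
```

Every run partitions the twenty rats into a treatment group and a control group of ten each, with every subset of ten equally likely, which is exactly the property the "reach in and grab" method fails to deliver.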
Figure 9.6.1 shows the MINITAB syntax for choosing a random sample of ten numbers from the integers one through twenty. According to this particular run of the SAMPLE routine, the ten rats to be removed for the testosterone injections are (in order) numbers 1, 5, 8, 9, 10, 14, 15, 18, 19, and 20.

There is a moral here. Designing, carrying out, and analyzing an experiment is an exercise that draws on a variety of scientific, computational, and statistical skills, some of which may be quite sophisticated. No matter how well those issues are attended to, though, the effort will fail if the simplest and most basic aspects of the experiment, such as assigning subjects to treatments, are not carefully scrutinized and properly done. The Devil, as the saying goes, is in the details.

    MTB > set c1
    DATA> 1:20
    DATA> end
    MTB > sample 10 c1 c2
    MTB > print c2

    Data Display

    c2    18  1  20  19  9  10  8  15  14  5

FIGURE 9.6.1

APPENDIX 9.A.1  A DERIVATION OF THE TWO-SAMPLE t TEST (A PROOF OF THEOREM 9.2.2)

To begin, we note that both the restricted and unrestricted parameter spaces, ω and Ω, are three-dimensional:

    ω = {(μX, μY, σ): −∞ < μX = μY < ∞, 0 < σ < ∞}

and

    Ω = {(μX, μY, σ): −∞ < μX < ∞, −∞ < μY < ∞, 0 < σ < ∞}

Since the Xs and Ys are independent (and normal), the likelihood function restricted to ω is

    L(ω) = ∏_{i=1}^{n} f_X(x_i) · ∏_{j=1}^{m} f_Y(y_j)
         = (2πσ²)^{−(n+m)/2} exp{ −(1/2σ²)[ Σ_{i=1}^{n}(x_i − μ)² + Σ_{j=1}^{m}(y_j − μ)² ] }    (9.A.1.1)

where μ = μX = μY. If we take ln L(ω) and solve ∂ ln L(ω)/∂μ = 0 and ∂ ln L(ω)/∂σ² = 0 simultaneously, the solutions will be the maximum-likelihood estimates

    μ̂ = ( Σ_{i=1}^{n} x_i + Σ_{j=1}^{m} y_j ) / (n + m)    (9.A.1.2)

and

    σ̂_ω² = [ Σ_{i=1}^{n}(x_i − μ̂)² + Σ_{j=1}^{m}(y_j − μ̂)² ] / (n + m)    (9.A.1.3)

Substituting Equations 9.A.1.2 and 9.A.1.3 into Equation 9.A.1.1 gives the numerator of the generalized likelihood ratio. Similarly, the likelihood function unrestricted by the null hypothesis is

    L(Ω) = (2πσ²)^{−(n+m)/2} exp{ −(1/2σ²)[ Σ_{i=1}^{n}(x_i − μX)² + Σ_{j=1}^{m}(y_j − μY)² ] }    (9.A.1.4)

Here, solving

    ∂ ln L(Ω)/∂μX = 0,  ∂ ln L(Ω)/∂μY = 0,  ∂ ln L(Ω)/∂σ² = 0

gives μ̂X = x̄, μ̂Y = ȳ, and

    σ̂_Ω² = [ Σ_{i=1}^{n}(x_i − x̄)² + Σ_{j=1}^{m}(y_j − ȳ)² ] / (n + m)

If these estimates are substituted into Equation 9.A.1.4, the maximum value for L(Ω) simplifies to

    L(Ω̂) = ( e^{−1} / (2πσ̂_Ω²) )^{(n+m)/2}

It follows, then, that the generalized likelihood ratio, λ = L(ω̂)/L(Ω̂), is equal to (σ̂_Ω²/σ̂_ω²)^{(n+m)/2}, or, equivalently,

    λ^{2/(n+m)} = [ Σ_{i=1}^{n}(x_i − x̄)² + Σ_{j=1}^{m}(y_j − ȳ)² ] / [ Σ_{i=1}^{n}(x_i − μ̂)² + Σ_{j=1}^{m}(y_j − μ̂)² ]

Using the identity

    Σ_{i=1}^{n}(x_i − μ̂)² + Σ_{j=1}^{m}(y_j − μ̂)² = Σ_{i=1}^{n}(x_i − x̄)² + Σ_{j=1}^{m}(y_j − ȳ)² + [nm/(n + m)](x̄ − ȳ)²

we can write λ^{2/(n+m)} as

    λ^{2/(n+m)} = 1 / { 1 + (x̄ − ȳ)² / ( s_p²[(1/n) + (1/m)](n + m − 2) ) } = 1 / [ 1 + t²/(n + m − 2) ]

where t = (x̄ − ȳ) / ( s_p √(1/n + 1/m) ) and s_p² is the pooled variance:

    s_p² = [ Σ_{i=1}^{n}(x_i − x̄)² + Σ_{j=1}^{m}(y_j − ȳ)² ] / (n + m − 2)

Therefore, in terms of the observed t ratio, λ^{2/(n+m)} simplifies to

    λ^{2/(n+m)} = (n + m − 2) / (n + m − 2 + t²)    (9.A.1.5)

At this point the proof is almost complete. The generalized likelihood ratio criterion, rejecting H0: μX = μY when 0 < λ ≤ λ*, is clearly equivalent to rejecting the null hypothesis when 0 < λ^{2/(n+m)} ≤ (λ*)^{2/(n+m)}. But both of these, from Equation 9.A.1.5, are the same as rejecting H0 when t² is too large. Thus the decision rule in terms of t² is

    Reject H0: μX = μY in favor of H1: μX ≠ μY if t² ≥ (t*)²

Or, phrasing this in still another way, we should choose the critical value t* so that

    P(−t* < T < t* | H0: μX = μY is true) = 1 − α

By Theorem 9.2.1, though, T has a Student t distribution with n + m − 2 df, which makes ±t* = ±t_{α/2, n+m−2}, and the theorem is proved.

APPENDIX 9.A.2  MINITAB APPLICATIONS

MINITAB has a simple command, TWOSAMPLE, for doing a two-sample t test on a set of x_is and y_is stored in C1 and C2, respectively. The same command automatically constructs a 95% confidence interval for μX − μY.

    MTB > set c1
    DATA> 0.225 0.262 0.217 0.240 0.230 0.229 0.235 0.217
    DATA> end
    MTB > set c2
    DATA> 0.209 0.205 0.196 0.210 0.202 0.207 0.224 0.223 0.220 0.201
    DATA> end
    MTB > name c1 'X' c2 'Y'
    MTB > twosample c1 c2;
    SUBC> pooled.

    Two-Sample T-Test and CI: X, Y

    Two-sample T for X vs Y
         N     Mean    StDev  SE Mean
    X    8   0.2319   0.0146   0.0051
    Y   10  0.20970  0.00966   0.0031

    Difference = mu (X) - mu (Y)
    Estimate for difference: 0.022175
    95% CI for difference: (0.010053, 0.034297)
    T-Test of difference = 0 (vs not =): T-Value = 3.88  P-Value = 0.001  DF = 16
    Both use Pooled StDev = 0.0121

FIGURE 9.A.2.1

Figure 9.A.2.1 shows the syntax for analyzing the Quintus Curtius Snodgrass data in Table 9.2.1. Notice that a POOLED subcommand is included. If we write

    MTB > twosample c1 c2

MINITAB will assume the two population variances are not equal, and it will perform the approximate t test described in Question 9.2.15. If the intention is to assume that σX² = σY² (and do the t test as described in Theorem 9.2.1), the proper syntax is

    MTB > twosample c1 c2;
    SUBC> pooled.

As is typical, MINITAB pairs the test statistic with a P-value rather than an "Accept H0" or "Reject H0" conclusion. Here, P = 0.001, which is consistent with the decision reached in Case Study 9.2.1 to "reject H0 at the α = 0.01 level of significance."

Figure 9.A.2.2 shows the "unpooled" analysis of these same data. The conclusion is the same, although the P-value has almost tripled, because both the test statistic and the degrees of freedom have decreased (recall Question 9.2.16).

    MTB > set c1
    DATA> 0.225 0.262 0.217 0.240 0.230 0.229 0.235 0.217
    DATA> end
    MTB > set c2
    DATA> 0.209 0.205 0.196 0.210 0.202 0.207 0.224 0.223 0.220 0.201
    DATA> end
    MTB > name c1 'X' c2 'Y'
    MTB > twosample c1 c2

    Two-Sample T-Test and CI: X, Y

    Two-sample T for X vs Y
         N     Mean    StDev  SE Mean
    X    8   0.2319   0.0146   0.0051
    Y   10  0.20970  0.00966   0.0031

    Difference = mu (X) - mu (Y)
    Estimate for difference: 0.022175
    95% CI for difference: (0.008997, 0.035353)
    T-Test of difference = 0 (vs not =): T-Value = 3.70  P-Value = 0.003  DF = 11

FIGURE 9.A.2.2

Testing H0: μX = μY Using MINITAB Windows

1. Enter the two samples in C1 and C2, respectively.
2. Click on STAT, then on BASIC STATISTICS, then on 2-SAMPLE t.
3. Click on SAMPLES IN DIFFERENT COLUMNS, and type C1 in the FIRST box and C2 in the SECOND box.
4. Click on ASSUME EQUAL VARIANCES (if a pooled t test is desired).
5. Click on OPTIONS.
6. Enter the value for 100(1 − α) in the CONFIDENCE LEVEL box.
7. Click on NOT EQUAL; then on whichever alternative is desired.
8. Click on OK; click on the remaining OK.

CHAPTER 10  Goodness-of-Fit Tests

10.1 INTRODUCTION
10.2 THE MULTINOMIAL DISTRIBUTION
10.3 GOODNESS-OF-FIT TESTS: ALL PARAMETERS KNOWN
10.4 GOODNESS-OF-FIT TESTS: PARAMETERS UNKNOWN
10.5 CONTINGENCY TABLES
10.6 TAKING A SECOND LOOK AT STATISTICS (OUTLIERS)
APPENDIX 10.A.1 MINITAB APPLICATIONS

Called by some the founder of twentieth-century statistics, Pearson received his university education at Cambridge, concentrating on physics, philosophy, and law. He was called to the bar in 1881 but never practiced. In 1911 Pearson resigned his chair of applied mathematics and mechanics at University College, London, and became the first Galton Professor of Eugenics, as was Galton's wish. Together with Weldon, Pearson founded the prestigious journal Biometrika and served as its principal editor from 1901 until his death.

Karl Pearson (1857-1936)

10.1 INTRODUCTION

The give and take between the mathematics of probability and the empiricism of statistics should be, by now, a comfortably familiar theme.
Time and time again we have seen repeated measurements, no matter what their source, exhibiting a regularity of pattern that can be well approximated by one or more of the handful of probability functions introduced in Chapter 4. Until now, all the inferences resulting from this interfacing have been parameter specific, a fact to which the many hypothesis tests about means, variances, and binomial proportions paraded forth in Chapters 6, 7, and 9 bear ample testimony. Still, there are other situations where the basic form of p_X(k) or f_Y(y), rather than the value of its parameters, is the most important question at issue. These situations are the focus of Chapter 10.

A geneticist, for example, may want to know whether the inheritance of a certain set of traits follows the same set of ratios as those prescribed by Mendelian theory. The objective of a psychologist, on the other hand, might be to confirm or refute a newly proposed model for cognitive serial learning. Probably the most habitual users of inference procedures directed at the entire pdf, though, are statisticians themselves: As a prelude to doing any sort of hypothesis test or confidence interval, an attempt should be made, sample size permitting, to verify that the data are, indeed, representative of whatever distribution that procedure presumes. Usually, this will mean testing to see whether a set of y_is might conceivably be representing a normal distribution.

In general, any procedure that seeks to determine whether a set of data could reasonably have originated from some given probability distribution, or class of probability distributions, is called a goodness-of-fit test. The principle behind the particular goodness-of-fit test we will look at is very straightforward: First the observed data are grouped, more or less arbitrarily, into k classes; then each class's expected occupancy is calculated on the basis of the presumed model.
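To make that grouping step concrete, here is a small Python sketch (added for illustration; the die model, sample size, and seed are arbitrary assumptions, not from the text). It bins n = 600 simulated rolls of a fair die into k = 6 classes and computes each class's expected occupancy n·p_i under the presumed model:

```python
import random
from collections import Counter

rng = random.Random(0)                          # arbitrary seed
n = 600                                         # arbitrary sample size
rolls = [rng.randint(1, 6) for _ in range(n)]   # simulated observations

# Observed frequency of each class (face)
observed = Counter(rolls)

# Expected occupancy of each class under the presumed (fair-die) model
p = 1 / 6
expected = {face: n * p for face in range(1, 7)}

for face in range(1, 7):
    print(face, observed[face], expected[face])
```

Each printed row pairs a class with its observed and expected counts; comparing the two columns is the raw material of the test.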
If it should happen that the set of observed and expected frequencies show considerably more disagreement than sampling variability would predict, our conclusion will be that the supposed p_X(k) or f_Y(y) was incorrect.

In practice, goodness-of-fit tests have several variants, depending on the specificity of the null hypothesis. Section 10.3 describes the approach to take when both the form of the presumed data model and the values of its parameters are known. More typically, we know the form of p_X(k) or f_Y(y), but their parameters need to be estimated; these cases are taken up in Section 10.4. A somewhat different application of goodness-of-fit testing is the focus of Section 10.5. There the null hypothesis is that two random variables are independent. In more than a few fields of endeavor, tests for independence are among the most frequently used of all inference procedures.

10.2 THE MULTINOMIAL DISTRIBUTION

Their diversity notwithstanding, most goodness-of-fit tests are based on essentially the same statistic, one that has an asymptotic chi square distribution. The underlying structure of that statistic, though, derives from the multinomial distribution, a direct extension of the familiar binomial. In this section we define the multinomial and state those of its properties that relate to the problem of goodness-of-fit testing.

Given a series of n independent Bernoulli trials, each with success probability p, we know that the pdf for X, the total number of successes, is

    P(X = k) = p_X(k) = (n choose k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n    (10.2.1)

One of the obvious ways to generalize Equation 10.2.1 is to consider situations where at each trial one of t outcomes can occur, rather than just one of two. That is, we will assume that each trial will result in one of the outcomes r_1, r_2, ..., r_t, where P(r_i) = p_i, i = 1, 2, ..., t (see Figure 10.2.1). It follows, of course, that Σ_{i=1}^{t} p_i = 1.

FIGURE 10.2.1  n independent trials, each resulting in one of the possible outcomes r_1, r_2, ..., r_t, where p_i = P(r_i).

In the binomial model, the two possible outcomes are denoted s and f, where P(s) = p and P(f) = 1 − p. Moreover, the outcomes of the n trials can be nicely summarized with a single random variable X, where X denotes the number of successes. In the more general multinomial model, we will need a random variable to count the number of times that each of the r_is occurs. To that end, we define

    X_i = number of times r_i occurs,  i = 1, 2, ..., t

For a given set of n trials, then, X_1 = k_1, X_2 = k_2, ..., X_t = k_t.

Theorem 10.2.1. Let X_i denote the number of times that the outcome r_i occurs, i = 1, 2, ..., t, in a series of n independent trials, where p_i = P(r_i). Then the vector (X_1, X_2, ..., X_t) has a multinomial distribution and

    P(X_1 = k_1, X_2 = k_2, ..., X_t = k_t) = [ n! / (k_1! k_2! ⋯ k_t!) ] p_1^{k_1} p_2^{k_2} ⋯ p_t^{k_t},

    k_i = 0, 1, ..., n; i = 1, 2, ..., t; Σ_{i=1}^{t} k_i = n

Proof. Any particular sequence of k_1 r_1s, k_2 r_2s, ..., and k_t r_ts has probability p_1^{k_1} p_2^{k_2} ⋯ p_t^{k_t}. Moreover, the total number of outcome sequences that will generate the values (k_1, k_2, ..., k_t) is the number of ways to permute n objects, k_1 of one type, k_2 of a second type, ..., and k_t of a t-th type. By Theorem 2.6.2 that number is n!/(k_1! k_2! ⋯ k_t!), and the statement of the theorem follows.

Comment. Depending on the context, the r_is associated with the n trials pictured in Figure 10.2.1 can be either numerical values (or categories) or ranges of numerical values (or categories). Example 10.2.1 illustrates the first type; Example 10.2.2, the second. The only requirements imposed on the r_is are (1) they must span all of the outcomes possible at a given trial and (2) they must be mutually exclusive.

EXAMPLE 10.2.1

Suppose a loaded die is tossed twelve times, where

    p_i = P(face i appears) = ci,  i = 1, 2, ..., 6

What is the probability that each face will appear exactly twice? Note, first, that

    Σ_{i=1}^{6} p_i = 1 = Σ_{i=1}^{6} ci = c · [6(6 + 1)/2] = 21c

which implies that c = 1/21 (and p_i = i/21).
In the terminology of Theorem 10.2.1, then, the possible outcomes of each trial are the t = 6 faces, 1 (= r_1) through 6 (= r_6), and X_i is the number of times face i occurs, i = 1, 2, ..., 6. The question is asking for the probability of the vector (2, 2, 2, 2, 2, 2). According to Theorem 10.2.1,

    P(X_1 = 2, X_2 = 2, ..., X_6 = 2) = [ 12! / (2! 2! 2! 2! 2! 2!) ] (1/21)²(2/21)²(3/21)²(4/21)²(5/21)²(6/21)² = 0.0005

EXAMPLE 10.2.2

Five observations are drawn at random from the pdf

    f_Y(y) = 6y(1 − y),  0 ≤ y ≤ 1

What is the probability that one of the observations lies in the interval [0, 0.25), none in the interval [0.25, 0.50), three in the interval [0.50, 0.75), and one in the interval [0.75, 1.00]?

FIGURE 10.2.2  The pdf f_Y(y) = 6y(1 − y), together with the four ranges r_1 = [0, 0.25), r_2 = [0.25, 0.50), r_3 = [0.50, 0.75), and r_4 = [0.75, 1.00] and their areas p_1, p_2, p_3, p_4.

Figure 10.2.2 shows the pdf being sampled, together with the ranges r_1, r_2, r_3, and r_4, and the intended disposition of the five data points. The p_is of Theorem 10.2.1 are now areas. Integrating f_Y(y) from 0 to 0.25, for example, gives p_1:

    p_1 = ∫_0^{0.25} 6y(1 − y) dy = 3y² |_0^{0.25} − 2y³ |_0^{0.25} = 5/32

By symmetry, p_4 = 5/32. Moreover, since the total area under f_Y(y) equals 1,

    p_2 = p_3 = (1/2)(1 − 2 · 5/32) = 11/32

Let X_i denote the number of observations that fall into the i-th range, i = 1, 2, 3, 4. The probability associated with the multinomial vector (1, 0, 3, 1) is then 0.0198:

    P(X_1 = 1, X_2 = 0, X_3 = 3, X_4 = 1) = [ 5! / (1! 0! 3! 1!) ] (5/32)¹ (11/32)⁰ (11/32)³ (5/32)¹ = 0.0198

A Multinomial/Binomial Relationship

Since the multinomial pdf is conceptually a straightforward generalization of the binomial pdf, it should come as no surprise that each X_i in a multinomial vector is, itself, a binomial random variable.

Theorem 10.2.2. Suppose the vector (X_1, X_2, ..., X_t) is a multinomial random variable with parameters n, p_1, p_2, ..., and p_t. Then the marginal distribution of X_i, i = 1, 2, ..., t, is the binomial pdf with parameters n and p_i.

Proof. To deduce the pdf for X_i, we need to dichotomize the outcomes into "r_i" and "not r_i." X_i becomes,