Statistics 510: Notes 10

Reading: Section 4.7.

Schedule:
Thursday night (10/13): I will e-mail Homework 5, which will be due Friday, October 21st.
Monday, 10/17: Fall break, no class.
Wednesday, 10/19: Chapters 4.8-4.9
Monday, 10/24: Chapters 5.1-5.2
Wednesday, 10/26: Midterm. The midterm will cover Chapters 1-4. I will provide review materials and review problems for the midterm by Wednesday 10/19. I will hold extended office hours the week before the midterm.

I. Poisson Random Variables

A random variable $X$ taking on one of the values 0, 1, 2, ... is said to be a Poisson random variable with parameter $\lambda$ if for some $\lambda > 0$,

\[ p(i) = P\{X = i\} = e^{-\lambda} \frac{\lambda^i}{i!}, \quad i = 0, 1, 2, \ldots \]

This is a pmf because

\[ \sum_{i=0}^{\infty} p(i) = e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = e^{-\lambda} e^{\lambda} = 1. \]

Poisson random variables provide an approximation for a binomial random variable when $n$ is large and $p$ is small enough so that $np$ is of moderate size. Let $X$ be a binomial random variable with parameters $(n, p)$ and let $\lambda = np$. Then

\[
\begin{aligned}
P(X = i) &= \frac{n!}{(n-i)!\, i!}\, p^i (1-p)^{n-i} \\
&= \frac{n!}{(n-i)!\, i!} \left(\frac{\lambda}{n}\right)^i \left(1 - \frac{\lambda}{n}\right)^{n-i} \\
&= \frac{n(n-1)\cdots(n-i+1)}{n^i} \cdot \frac{\lambda^i}{i!} \cdot \frac{(1 - \lambda/n)^n}{(1 - \lambda/n)^i}.
\end{aligned}
\]

For $n$ large, $\lambda$ moderate and $i$ considerably smaller than $n$,

\[ \left(1 - \frac{\lambda}{n}\right)^n \approx e^{-\lambda}, \qquad \frac{n(n-1)\cdots(n-i+1)}{n^i} \approx 1, \qquad \left(1 - \frac{\lambda}{n}\right)^i \approx 1. \]

Hence for $n$ large and $\lambda$ moderate,

\[ P(X = i) \approx e^{-\lambda} \frac{\lambda^i}{i!}. \]

(Note that if $i$ is not considerably smaller than $n$ and $p$ is small, then $P(X = i) \approx 0$ for a binomial random variable and $P(X = i) \approx 0$ for a Poisson random variable, since $i$ would be much greater than $np$ if $i$ is not considerably smaller than $n$.)

The following tables give an idea of the accuracy of the Poisson approximation to the binomial. For n=100, p=1/100, the Poisson approximation is remarkably good.

Binomial probabilities and Poisson probabilities for n=5 and p=1/5 ($\lambda = 1$)

x     Binomial: $\binom{5}{x}(.2)^x(.8)^{5-x}$     Poisson: $e^{-1}(1)^x / x!$
0     0.328                                        0.368
1     0.410                                        0.368
2     0.205                                        0.184
3     0.051                                        0.061
4     0.006                                        0.015
5     0.000                                        0.003
6     0                                            0.001

Binomial probabilities and Poisson probabilities for n=100 and p=1/100 ($\lambda = 1$)

x     Binomial: $\binom{100}{x}(.01)^x(.99)^{100-x}$     Poisson: $e^{-1}(1)^x / x!$
0     0.366032                                           0.367879
1     0.369730                                           0.367879
2     0.184865                                           0.183940
3     0.060999                                           0.061313
4     0.014942                                           0.015328
5     0.002898                                           0.003066
6     0.000463                                           0.000511
7     0.000063                                           0.000073
8     0.000007                                           0.000009
9     0.000001                                           0.000001
10    0.000000                                           0.000000

Applications of Poisson random variables:

The Poisson family of random variables provides a good model for the number of successes in an experiment consisting of a large number of independent trials with a small probability of success for each trial (since the number of successes is a binomial random variable with $n$ large and $p$ small).

Examples of random phenomena that are accurately modeled as Poisson random variables include:
- The number of misprints on a page (or a group of pages) of a book
- The number of people in a community living to 100 years of age
- The number of wrong telephone numbers that are dialed in a day

Example 1: A chromosome mutation believed to be linked with colorblindness is known to occur, on the average, once in every 10,000 births. If 20,000 babies are born this year in a certain city, what is the probability that at least one will develop colorblindness?

Expected Value and Variance of Poisson Random Variables

Let $X$ be a Poisson($\lambda$) random variable.

\[
\begin{aligned}
E(X) &= \sum_{i=0}^{\infty} \frac{i\, e^{-\lambda} \lambda^i}{i!} \\
&= \lambda \sum_{i=1}^{\infty} \frac{e^{-\lambda} \lambda^{i-1}}{(i-1)!} \\
&= \lambda e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^j}{j!} \quad \text{(by letting } j = i - 1\text{)} \\
&= \lambda e^{-\lambda} e^{\lambda} = \lambda.
\end{aligned}
\]

\[
\begin{aligned}
E(X^2) &= \sum_{i=0}^{\infty} \frac{i^2 e^{-\lambda} \lambda^i}{i!} \\
&= \lambda \sum_{i=1}^{\infty} \frac{i\, e^{-\lambda} \lambda^{i-1}}{(i-1)!} \\
&= \lambda \sum_{j=0}^{\infty} \frac{(j+1)\, e^{-\lambda} \lambda^j}{j!} \quad \text{(by letting } j = i - 1\text{)} \\
&= \lambda \left[ \sum_{j=0}^{\infty} \frac{j\, e^{-\lambda} \lambda^j}{j!} + \sum_{j=0}^{\infty} \frac{e^{-\lambda} \lambda^j}{j!} \right] \\
&= \lambda(\lambda + 1),
\end{aligned}
\]

where the final equality follows since the first sum is the expected value of a Poisson random variable with parameter $\lambda$ and the second is the sum of the probabilities of this random variable. Therefore,

\[ Var(X) = E(X^2) - (E(X))^2 = \lambda(\lambda + 1) - \lambda^2 = \lambda. \]
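The binomial-versus-Poisson comparison, the setup of Example 1, and the mean and variance formulas can all be checked numerically. The following is a minimal sketch in Python using only the standard library. The function names (binomial_pmf, poisson_pmf, poisson_moments) and the truncation at 60 terms are illustrative choices, not part of the notes.

```python
from math import comb, exp, factorial


def binomial_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p); comb(n, i) is 0 when i > n."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def poisson_pmf(i, lam):
    """P(X = i) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**i / factorial(i)


def poisson_moments(lam, terms=60):
    """Approximate mean and variance by truncating the series at `terms`
    (an assumption that is adequate only for moderate lam)."""
    mean = sum(i * poisson_pmf(i, lam) for i in range(terms))
    second = sum(i * i * poisson_pmf(i, lam) for i in range(terms))
    return mean, second - mean**2


# Side-by-side pmf values for the two settings tabulated in the notes.
for n, p, max_i in [(5, 0.2, 6), (100, 0.01, 10)]:
    lam = n * p
    print(f"n={n}, p={p}, lambda={lam}")
    for i in range(max_i + 1):
        print(f"  {i:2d}  {binomial_pmf(i, n, p):.6f}  {poisson_pmf(i, lam):.6f}")

# Example 1 setup: n = 20,000 births, p = 1/10,000, so lambda = np = 2.
n, p = 20_000, 1 / 10_000
exact = 1 - binomial_pmf(0, n, p)
approx = 1 - poisson_pmf(0, n * p)
print("P(at least one), binomial:", exact)
print("P(at least one), Poisson :", approx)

# Numerical check that mean and variance both equal lambda.
print("mean, variance for lambda=2:", poisson_moments(2.0))
```

Under the Poisson approximation, the Example 1 probability is $1 - e^{-2}$, about 0.86, and the exact binomial value agrees with it to several decimal places.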
Poisson random variables for number of events occurring in a time period

Another use of the Poisson probability distribution, besides approximating the binomial for large $n$, small $p$, is to model the number of "events" occurring in a certain period of time, e.g.,
- the number of earthquakes occurring during some fixed time span
- the number of wars per year
- the number of electrons emitted from a radioactive source during a given period of time
- the number of freak accidents, such as falls in the shower, for a large population during a given period of time (used by insurance companies)
- the number of vehicles that pass a marker on a roadway during a given period of time.

Let $X$ denote the number of events occurring in a certain period of time. Suppose that for a positive constant $\lambda$, the following assumptions hold true:

1. The probability that exactly 1 event occurs in a given interval of length $h$ is equal to $\lambda h + o(h)$, where $o(h)$ stands for any function $f(h)$ that is such that $\lim_{h \to 0} f(h)/h = 0$ [for instance, $f(h) = h^2$ is $o(h)$ whereas $f(h) = h$ is not].
2. The probability that 2 or more events occur in an interval of length $h$ is equal to $o(h)$.
3. For any integers $n, j_1, j_2, \ldots, j_n$ and any set of $n$ nonoverlapping intervals, if we define $E_i$ to be the event that exactly $j_i$ of the events under consideration occur in the $i$th of these intervals, then events $E_1, \ldots, E_n$ are independent.

Under Assumptions 1-3, the number of events occurring in any interval of length $t$ is a Poisson random variable with parameter $\lambda t$.

Proof: Let $N(t)$ denote the number of events occurring in the interval $[0, t]$. To obtain an expression for $P\{N(t) = k\}$, we start by breaking the interval $[0, t]$ into $n$ non-overlapping subintervals of length $t/n$.
Now,

\[
\begin{aligned}
P\{N(t) = k\} = {} & P\{k \text{ of the subintervals contain exactly 1 event and the other } n - k \text{ contain 0 events}\} \\
& + P\{N(t) = k \text{ and at least one subinterval contains 2 or more events}\} \qquad (1.1)
\end{aligned}
\]

Let $A$ and $B$ denote the two mutually exclusive events on the right hand side of the above equation. We have

\[
\begin{aligned}
P(B) &\le P(\text{at least one subinterval contains 2 or more events}) \\
&= P\left( \bigcup_{i=1}^{n} \{i\text{th subinterval contains 2 or more events}\} \right) \\
&\le \sum_{i=1}^{n} P(i\text{th subinterval contains 2 or more events}) \\
&= \sum_{i=1}^{n} o(t/n) \\
&= n\, o(t/n) \\
&= t \left[ \frac{o(t/n)}{t/n} \right].
\end{aligned}
\]

Now for any $t$, $t/n \to 0$ as $n \to \infty$, and so $o(t/n)/(t/n) \to 0$ as $n \to \infty$ by the definition of $o$. Hence,

\[ P(B) \to 0 \text{ as } n \to \infty. \qquad (1.2) \]

On the other hand, since assumptions 1 and 2 imply that

\[ P(0 \text{ events occur in an interval of length } h) = 1 - [\lambda h + o(h) + o(h)] = 1 - \lambda h - o(h), \]

we see from Assumption 3 that

\[
\begin{aligned}
P(A) &= P\{k \text{ of the subintervals contain exactly 1 event and the other } n - k \text{ contain 0 events}\} \\
&= \binom{n}{k} \left[ \frac{\lambda t}{n} + o\!\left(\frac{t}{n}\right) \right]^k \left[ 1 - \frac{\lambda t}{n} - o\!\left(\frac{t}{n}\right) \right]^{n-k}.
\end{aligned}
\]

However, since

\[ n \left[ \frac{\lambda t}{n} + o\!\left(\frac{t}{n}\right) \right] = \lambda t + t \left[ \frac{o(t/n)}{t/n} \right] \to \lambda t \text{ as } n \to \infty, \]

it follows by the same argument that verified the Poisson approximation to the binomial that

\[ P(A) \to e^{-\lambda t} \frac{(\lambda t)^k}{k!} \text{ as } n \to \infty. \qquad (1.3) \]

Thus, by letting $n \to \infty$ and using (1.1), (1.2) and (1.3), we obtain

\[ P\{N(t) = k\} = e^{-\lambda t} \frac{(\lambda t)^k}{k!}, \quad k = 0, 1, \ldots \]

Hence, if assumptions 1-3 are satisfied, the number of events occurring in any fixed interval of length $t$ is a Poisson random variable with mean $\lambda t$. The value $\lambda$ is the rate per unit time at which events occur.

Example 2: In the 432 years from 1500 to 1931, war broke out somewhere in the world a total of 299 times. (By definition, a military action was a war if it either was legally declared, involved over 50,000 troops or resulted in significant boundary realignments. To achieve greater uniformity from war to war, major confrontations were split into smaller "subwars": World War I, for example, was treated as five separate wars.) The following table gives the distribution of the number of years in which $x$ wars broke out and the expected frequencies for a Poisson($\lambda = 0.69$) random variable.
Number of wars beginning in a given year     Observed Frequency     Expected Frequency
0                                            223                    217
1                                            142                    149
2                                            48                     52
3                                            15                     12
4+                                           4                      2
Total                                        432                    432

Example 3: Suppose that earthquakes occur in the western portion of the United States in accordance with assumptions 1, 2 and 3 with $\lambda = 2$ and with 1 week as the unit of time. (That is, earthquakes occur in accordance with the three assumptions at the rate of 2 per week.)
(a) Find the probability that at least 3 earthquakes occur during the next 2 weeks.
(b) Find the probability distribution of the time, starting from now, until the next earthquake.
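As a rough numerical companion to Examples 2 and 3, the sketch below recomputes the expected frequencies in the war table and evaluates the quantities asked for in Example 3. It is one possible way to set up the calculations, not a worked solution from the notes. The function names are illustrative, and part (b) relies on the observation that the waiting time $T$ until the next event exceeds $t$ exactly when $N(t) = 0$, so $P(T > t) = e^{-\lambda t}$.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """P(N = k) for N ~ Poisson(mean)."""
    return exp(-mean) * mean**k / factorial(k)


# Example 2: expected number of years (out of 432) with 0, 1, 2, 3, 4+ wars,
# using the rate 0.69 wars per year quoted in the notes.
years = 432
rate = 0.69
observed = [223, 142, 48, 15, 4]
expected = [years * poisson_pmf(k, rate) for k in range(4)]
expected.append(years - sum(expected))  # the "4+" cell is the remainder

for label, obs, exp_freq in zip(["0", "1", "2", "3", "4+"], observed, expected):
    print(f"{label:>2}  observed={obs:3d}  expected={exp_freq:7.2f}")


def prob_at_least(m, lam, t):
    """P(N(t) >= m) when events occur at rate lam per unit time."""
    mean = lam * t
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(m))


def prob_wait_exceeds(t, lam):
    """P(T > t) = P(N(t) = 0) = exp(-lam * t), T = time to next event."""
    return poisson_pmf(0, lam * t)


# Example 3(a): lam = 2 per week, t = 2 weeks, so N(2) has mean 4.
print("P(at least 3 earthquakes in 2 weeks):", prob_at_least(3, 2.0, 2.0))

# Example 3(b): survival function of the waiting time at a few values of t.
for t in (0.5, 1.0, 2.0):
    print(f"P(T > {t}) =", prob_wait_exceeds(t, 2.0))
```

For part (a) this evaluates $1 - e^{-4}(1 + 4 + 8)$. For part (b), $P(T \le t) = 1 - e^{-2t}$ for $t \ge 0$, which is the distribution function of an exponential random variable with rate 2.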