Statistics 510: Notes 12

Reading: Sections 4.8-4.9

Schedule:
- I will e-mail review problems for the midterm by Thursday night.
- Friday, 10/21, 5 pm: Homework 5 due. Office hours by appointment.
- Monday, 10/24, class: Chapters 5.1-5.2.
- Monday, 10/24, evening (time, location TBA): question and answer session for the midterm.
- Tuesday, 10/25: Office hours 1-2, 4:45-6:45, and by appointment.
- Wednesday, 10/26: Midterm.

Midterm info: Covers lectures 1-12 on Chapters 1-4. The best review is to go over the homework problems and class notes. The exam is closed book, but you are allowed two 8.5 x 11 sheets of notes, front and back. Bring a calculator.

I. Review: Poisson Distribution

The Poisson distribution arises in two settings:

(1) The Poisson distribution provides an approximation to the binomial distribution when n is large, p is small, and np is moderate.

(2) The Poisson distribution is used to model the number of events that occur in a time period t when
(a) the probability of an event occurring in a given small time period t' is approximately proportional to t';
(b) the probability of two or more events occurring in a given small time period t' is much smaller than t';
(c) the numbers of events occurring in two non-overlapping time periods are independent.

When (a), (b) and (c) are satisfied, the number of events occurring in a time period t has a Poisson(λt) distribution. The parameter λ is called the rate of the Poisson distribution. The mean number of events that occur is λt, and the variance of the number of events is also λt.

Sketch of proof for the Poisson distribution under (a)-(c): For a large value of n, divide the time period t into n non-overlapping intervals of length t/n. The number of events occurring in time period t is then approximately Binomial(n, λt/n). Using the Poisson approximation to the binomial, the number of events occurring in time period t is approximately Poisson(n · λt/n) = Poisson(λt). Taking the limit as n → ∞ yields the result.
Number of events occurring in space: The Poisson distribution also applies to the number of events occurring in space. Instead of intervals of length t, we have regions of area or volume t. Assumptions (a)-(c) become:
(a') the probability of an event occurring in a given small region of area or volume t' is approximately proportional to t';
(b') the probability of two or more events occurring in a given small region of area or volume t' is much smaller than t';
(c') the numbers of events occurring in two non-overlapping regions are independent.

The parameter λ of a Poisson distribution for the number of events occurring in space is called the intensity.

Example 1: Bacteria are distributed throughout a volume of liquid according to assumptions (a'), (b') and (c') with an intensity of 0.6 organisms per mm³. A measuring device counts the number of bacteria in a 10 mm³ volume of the liquid. What is the probability that more than two bacteria are in this measured volume?

II. Geometric Random Variable (Section 4.8.1)

Suppose that independent trials, each having probability p, 0 < p < 1, of being a success, are performed until a success occurs. Let X be the random variable that denotes the number of trials required. The probability mass function of X is

    P{X = n} = (1-p)^(n-1) p,   n = 1, 2, ...   (1.1)

The pmf follows because in order for X to equal n, it is necessary and sufficient that the first n-1 trials are failures and the nth trial is a success. A random variable that has the pmf (1.1) is called a geometric random variable with parameter p. The expected value and variance of a geometric(p) random variable are

    E(X) = 1/p,   Var(X) = (1-p)/p².

Example 2: A fair die is tossed. What is the probability that the first six occurs on the fourth roll? What is the expected number of tosses needed to toss the first six?

III. Negative Binomial Distribution (Section 4.8.2)

Suppose that independent trials, each having probability p, 0 < p < 1, of being a success, are performed until r successes occur.
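Examples 1 and 2 can be checked numerically. In Example 1, the count in the measured volume is Poisson with mean 0.6 × 10 = 6, so P(more than two) = 1 − P(0) − P(1) − P(2); in Example 2, X is geometric with p = 1/6, so P{X = 4} = (5/6)³(1/6) and E(X) = 1/p = 6. A quick sketch:

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)

# Example 1: intensity 0.6 per mm^3 over a 10 mm^3 volume => mean 6.
mu = 0.6 * 10
p_more_than_two = 1 - sum(poisson_pmf(k, mu) for k in range(3))
print(f"P(more than 2 bacteria) = {p_more_than_two:.4f}")   # about 0.9380

# Example 2: geometric with p = 1/6.
p = 1 / 6
p_fourth = (1 - p)**3 * p   # first six on the fourth roll
expected_tosses = 1 / p     # E(X) = 1/p
print(f"P(first six on roll 4) = {p_fourth:.4f}")           # about 0.0965
print(f"Expected tosses = {expected_tosses:.1f}")           # 6.0
```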
Let X be the random variable that denotes the number of trials required. The probability mass function of X is

    P{X = n} = C(n-1, r-1) p^r (1-p)^(n-r),   n = r, r+1, ...   (1.2)

where C(a, b) denotes the binomial coefficient "a choose b". A random variable whose pmf is given by (1.2) is called a negative binomial random variable with parameters (r, p). Note that the geometric random variable is a negative binomial random variable with parameters (1, p). The expected value and variance of a negative binomial random variable are

    E(X) = r/p,   Var(X) = r(1-p)/p².

The pmf follows because in order for X to equal n, it is necessary and sufficient that exactly r-1 of the first n-1 trials are successes and the nth trial is a success.

Example 3: Suppose that an underground military installation is fortified to the extent that it can withstand up to four direct hits from air-to-surface missiles and still function. Enemy aircraft can score direct hits with these particular missiles with probability 0.7. Assume all firings are independent. What is the probability that a plane will require fewer than 8 shots to destroy the installation? What is the expected number of shots required to destroy the installation?

IV. Hypergeometric Random Variables (Section 4.8.3)

Suppose that a sample of size n is to be chosen randomly (without replacement) from an urn containing N balls, of which m are white and N-m are black. If we let X be the random variable that denotes the number of white balls selected, then

    P{X = i} = C(m, i) C(N-m, n-i) / C(N, n),   i = 0, 1, ..., n.   (1.4)

A random variable X whose pmf is given by (1.4) is said to be a hypergeometric random variable with parameters (n, N, m). The expected value and variance of a hypergeometric random variable with parameters (n, N, m) are

    E(X) = nm/N,   Var(X) = np(1-p)[1 - (n-1)/(N-1)],   where p = m/N.

Example 4: A Scrabble set consists of 54 consonants and 44 vowels. What is the probability that your initial draw (of seven letters) will be all consonants? six consonants and one vowel? five consonants and two vowels?

V.
Zeta (or Zipf) Distribution

A random variable is said to have a zeta (sometimes called the Zipf) distribution with parameter α if its probability mass function is given by

    P{X = k} = C / k^(α+1),   k = 1, 2, ...

for some value of α > 0. Since the sum of the foregoing probabilities must equal 1, it follows that

    C = [ Σ_{k=1}^∞ (1/k)^(α+1) ]^(-1).

Consider a population of objects grouped into categories (such as all word occurrences in a book, grouped by word, or people living in urban areas in a country, grouped by city). Let {X = k} denote the event that a randomly chosen object belongs to the kth largest category. The Zipf distribution has been found to accurately describe P{X = k} in settings such as words in a book and the cities people live in.

The table below compares 1990 U.S. city populations with the populations expected under Zipf's distribution with α = 0, i.e., proportional to 1/n, normalized so that rank 1 corresponds to 10,000,000:

    Rank n   City                  Population (1990)   Expected population (10,000,000/n)
    1        New York              7,322,564           10,000,000
    7        Detroit               1,027,974           1,428,571
    13       Baltimore             736,014             769,231
    19       Washington, D.C.      606,900             526,316
    25       New Orleans           496,938             400,000
    31       Kansas City, Mo.      434,829             322,581
    37       Virginia Beach, Va.   393,089             270,270
    49       Toledo                332,943             204,082
    61       Arlington, Texas      261,721             163,934
    73       Baton Rouge, La.      219,531             136,986
    85       Hialeah, Fla.         188,008             117,647
    97       Bakersfield, Calif.   174,820             103,093

VI. Properties of the Cumulative Distribution Function (Section 4.9)

Recall that the cumulative distribution function (CDF) of a random variable X is the function F(b) = P(X ≤ b). All probability questions about X can be answered in terms of the cdf F. For example, P(a < X ≤ b) = F(b) − F(a) for all a < b.
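The "expected population" column of the table is just 10,000,000/n rounded to the nearest integer, and the zeta normalizing constant C generally has no closed form, so it is found numerically. A sketch (the truncation point 10⁶ for the infinite series and the choice α = 1 are illustrative assumptions, not from the notes):

```python
# Reproduce the table's expected-population column: 10,000,000 / rank.
ranks = [1, 7, 13, 19, 25, 31, 37, 49, 61, 73, 85, 97]
for n in ranks:
    print(n, round(10_000_000 / n))

# Zeta pmf P{X = k} = C / k**(alpha + 1). Approximate C by truncating
# the infinite series at 10**6 terms (alpha = 1 is illustrative).
alpha = 1.0
C = 1.0 / sum(k ** -(alpha + 1) for k in range(1, 10**6))

def zeta_pmf(k):
    return C / k ** (alpha + 1)

print(f"P(X = 1) = {zeta_pmf(1):.4f}")   # with alpha = 1, C is near 6/pi^2
```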