Lecture Notes
Stat 330, Fall 2006
Heike Hofmann
February 15, 2008

Contents

1 Introduction
  1.1 Basic Probability
    1.1.1 Examples for sample spaces
    1.1.2 Examples for events
  1.2 Basic Notation of Sets
  1.3 Kolmogorov's Axioms
  1.4 Counting Methods
    1.4.1 Two Basic Counting Principles
    1.4.2 Ordered Samples with Replacement
    1.4.3 Ordered Samples without Replacement
    1.4.4 Unordered Samples without Replacement
  1.5 Conditional Probabilities
  1.6 Independence of Events
  1.7 Bayes' Rule
  1.8 Bernoulli Experiments

2 Random Variables
  2.1 Discrete Random Variables
    2.1.1 Expectation and Variance
    2.1.2 Some Properties of Expectation and Variance
    2.1.3 Probability Distribution Function
  2.2 Special Discrete Probability Mass Functions
    2.2.1 Bernoulli pmf
    2.2.2 Binomial pmf
    2.2.3 Geometric pmf
    2.2.4 Poisson pmf
    2.2.5 Compound Discrete Probability Mass Functions
  2.3 Continuous Random Variables
  2.4 Some special continuous density functions
    2.4.1 Uniform Density
    2.4.2 Exponential distribution
    2.4.3 Erlang density
    2.4.4 Gaussian or Normal density
  2.5 Central Limit Theorem (CLT)

3 Elementary Simulation
  3.1 Basic Problem
  3.2 Random Number Generators
    3.2.1 A general method for discrete data
    3.2.2 A general method for continuous densities
      3.2.2.1 Simulating Binomial & Geometric distributions
      3.2.2.2 Simulating a Poisson distribution
      3.2.2.3 Simulating a Normal distribution
  3.3 Examples

4 Stochastic Processes
  4.1 Poisson Process
  4.2 Birth & Death Processes

5 Queuing systems
  5.1 Little's Law
  5.2 The M/M/1 Queue
  5.3 The M/M/1/K queue
  5.4 The M/M/c queue
  5.5 Machine-Repairmen Problems

6 Statistical Inference
  6.1 Parameter Estimation
    6.1.1 Maximum Likelihood Estimation
  6.2 Confidence intervals
    6.2.1 Large sample C.I. for µ
    6.2.2 Large sample confidence intervals for a proportion p
      6.2.2.1 Conservative Method
      6.2.2.2 Substitution Method
    6.2.3 Related C.I. Methods
  6.3 Hypothesis Testing
  6.4 Regression
    6.4.1 Simple Linear Regression (SLR)
      6.4.1.1 The sample correlation r
      6.4.1.2 Coefficient of determination R^2
    6.4.2 Simple linear Regression Model

A Distribution Tables
  A.1 Poisson Distribution
  A.2 Standard Normal Distribution


Chapter 1

Introduction

Motivation

In every field of human life there are processes that cannot be described exactly (by an algorithm). For example: How fast does a web page respond? When does the bus come? How many cars are in the parking lot at 8:55 am?
By observation of these processes or by experiments we can detect patterns of behavior, such as: “ usually, the first week of semester the campus network is slow”, “by 8.50 am the parking lot at the Design building usually is full”. Our goal is to analyze these patterns further: 1.1 Basic Probability Real World Observation, experiment with unknown outcome list of all possible outcomes 1.1.1 Mathematical World Random experiment sample space Ω (read: Omega) individual outcome elementary event A, A ∈ Ω (read: A is an element of Ω) a collection of individual outcomes event A, A ⊂ Ω (read: A is a subset of Ω) assignment of the likelihood or chance of a possible outcome probability of an event A, P (A). Examples for sample spaces 1. I attempt to sign on to AOL from my home - to do so successfully the local phone number must be working and AOL’s network must be working. Ω = { (phone up, network up), (phone up, network down), (phone down, network up), (phone down, network down) } 2. Online I attempt to access a web page and record the time required to receive and display it (in seconds). Ω = (0, ∞) seconds 3. on a network there are two possible routes a message can take to a destination - in order for a message to get to the recipient, one of the routes and the recipient’s computer must be up. 1 2 CHAPTER 1. INTRODUCTION Ω1 in tabular form: route 1 up up up up down down down down route 2 up up down down up up down down recipient’s computer up down up down up down up down or, alternatively: Ω2 = { successful transmission, no transmission } Summary 1.1.1 • Sample spaces can be finite, countable infinite or uncountable infinite. • There is no such thing as THE sample space for a problem. The complexity of Ω can vary, many are possible for a given problem. 1.1.2 Examples for events With the same examples as before, we can define events in the following way: 1. A = fail to log on B = AOL network down then A is a subset of Ω and can be written as a set of elementary events: A = { (phone up, network down), (phone down, network up), (phone down, network down)} Similarly: B = {(phone up, network down), (phone down, network down)} 2. C = at least 10 s are required, C = [10, ∞). 3. D = message gets through D with first sample space: D = {(U, U, U ), (U, D, U ), (D, U, U )} Once we begin to talk about events in terms of sets, we need to know the standard notation and basic rules for computation: 1.2 Basic Notation of Sets For the definitions throughout this section assume that A and B are two events. Definition 1.2.1 (Union) A ∪ B is the event consisting of all outcomes in A or in B or in both read: A or B 1.2. BASIC NOTATION OF SETS 3 Example 1.2.1 2. time required to retrieve and display a particular web page. Let A, B, C and D be events: A = [100, 200), B = [150, ∞), C = [200, ∞) and D = [50, 75]. Then A ∪ B = [100, ∞) and A ∪ C = [100, ∞) and A ∪ D = [50, 75] ∪ [100, 200] Definition 1.2.2 (Intersection) A ∩ B is the event consisting of all outcomes simultaneously in A and in B. read: A and B Example 1.2.2 2. Let A, B, C and D be defined as above. Then A∩B = [100, 200) ∩ [150, ∞) = [150, 200) A∩D = [100, 200) ∩ [50, 75] = ∅ 3. Let A be the event “fail to log on” and B = “network down”. Then A ∩ B = {(phone up, network down), (phone down, network down)} = B B is a subset of A. Definition 1.2.3 (Empty Set) ∅ is the the set with no outcomes Definition 1.2.4 (Complement) Ā is the event consisting of all outcomes not in A. read: not A Example 1.2.3 3. 
message example Let D be the event that a message gets through. D̄ = { ( D,D,U), (D,U,D), (U,D,D), (D,D,D) }. 4 CHAPTER 1. INTRODUCTION Definition 1.2.5 (disjoint sets) Two events A and B are called mutually exclusive or disjoint, if their intersection is empty: A∩B =∅ [ 1.3 Kolmogorov’s Axioms Example: 3. From my experience with the network provider, I can decide that the chance that my next message gets through is 90 %. Write: P (D) = 0.9 To be able to work with probabilities properly - to compute with them - one must lay down a set of postulates: Kolmogorov’s Axioms A system of probabilities ( a probability model) is an assignment of numbers P (A) to events A ⊂ Ω in such a manner that the probability of any event A is a real number between 0 and 1 the sum of probabilities of all events in the sample space is 1 (i) 0 ≤ P (A) ≤ 1 for all A (ii) P (Ω) = 1. (iii) if A1 , A2 , . . . are (possibly, infinite many) disjoint events (i.e. Ai ∩ Aj = ∅ for all i, j) then X P (A1 ∪ A2 ∪ . . .) = P (A1 ) + P (A2 ) + . . . = P (Ai ). the probability of a disjoint union of events is equal to the sum of the individual probabilities i These are the basic rules of operation of a probability model: • every valid model must obey these, • any system that does, is a valid model Whether or not a particular model is realistic or appropriate for a specific application is another question. Example 1.3.1 Draw a single card from a standard deck of playing cards Ω = { red, black } Two different, equally valid probability models are: Model 1 P (Ω) = 1 P ( red ) = 0.5 P ( black ) = 0.5 Mathematically, both schemes are equally valid. Model 2 P (Ω) = 1 P ( red ) = 0.3 P ( black ) = 0.7 1.3. KOLMOGOROV’S AXIOMS 5 Beginning from the axioms of probability one can prove a number of useful theorems about how a probability model must operate. We start with the probability of Ω and derive others from that. Theorem 1.3.1 Let A be an event in Ω, then P (Ā) = 1 − P (A) for all A ⊂ Ω. For the proof we need to consider three main facts and piece them together appropriately: 1. We know that P (Ω) = 1 because of axiom (ii) 2. Ω can be written as Ω = A ∪ Ā because of the definition of an event’s complement. 3. A and Ā are disjoint and therefore the probability of their union equals the sum of the individual probabilities (axiom iii). All together: (1) (2) (3) 1 = P (Ω) = P (A ∪ Ā) = P (A) + P (Ā). This yields the statement. 2 Example 1.3.2 3. If I believe that the probability that a message gets through is 0.9, I also must believe that it fails with probability 0.1 Corollary 1.3.2 The probability of the empty set P (∅) is zero. For a proof of the above statement we exploit that the empty set is the complement of Ω. Then we can apply Theorem 1.3.1. Thm 1.3.1 P (∅) = P (Ω̄) = 1 − P (Ω) = 1 − 1 = 0. 2 Theorem 1.3.3 (Addition Rule of Probability) Let A and B be two events of Ω, then: P (A ∪ B) = P (A) + P (B) − P (A ∩ B) To see why this makes sense, think of probability as the area in the Venn diagram: By simply adding P (A) and P (B), P (A ∩ B) gets counted twice and must be subtracted off to get P (A ∪ B). Example 1.3.3 6 CHAPTER 1. INTRODUCTION 1. 
AOL dial-up: If I judge: P ( phone up ) = 0.9 P ( network up ) = 0.6 P ( phone up, network up ) = 0.55 then P ( phone up or network up) = 0.9 + 0.6 − 0.55 = 0.95 diagram: network up down phone up down .55 .05 .35 .05 .90 .10 .60 .40 1 Example 1.3.4 computer access Out of a group of 100 students, 30 have a laptop, 50 have a computer (desktop or laptop) of their own and 90 have access to a computer lab. One student is picked at random. What is the probability that he/she (a) has a computer but not a laptop? (b) has access to a lab and a computer of his/her own? (c) does have a laptop or a computer? To get an overview of the situation, we can first define events and draw a Venn diagram. Define L = student has laptop C = student has computer of his/her own A = student has access to lab Since L is a subset of C ( we write this as L ⊂ C), L is inside C in the diagram. L C A Ω From the Venn diagram we see that the students who have a computer but no laptop are inside C but not inside L, i.e. in (a) we are looking for C ∩ L̄. Since 30 students out of the total of 50 students in C have a laptop there are 20 remaining students who have a computer but no laptop. this corresponds to a probability of 20%: P (C ∩ L̄) = 0.2 In (b) we are looking for the intersection of C and A. We cannot compute this value exactly, but we can give an upper and a lower limit: 1.3. KOLMOGOROV’S AXIOMS 7 P (A ∩ C) ≤ min P (A), P (C) = 0.5 and since P (A ∪ C) = P (A) + P (C) − P (A ∩ C) = 0.9 + 0.5 − P (A ∩ C) = 1.4 − P (A ∩ C), which can not be greater than 1, we know that P (A ∩ C) needs to be at least 0.4. In short: 0.4 ≤ P (A ∩ C) ≤ 0.5 i.e. between 40 and 50 % of all students have both access to a lab and a computer of his/her own. The number of students who have a laptop or a computer is just the number of students who have a computer, since laptops are a subgroup of computers. Therefore P (C ∪ L) = P (C) = 0.5. Example 1.3.5 A box contains 4 chips, 1 of them is defective. A person draws one chip at random. What is a suitable probability that the person draws the defective chip? Common sense tells us, that since one out of the four chips is defective, the person has a chance of 25% to draw the defective chip. Just for training, we will write this down in terms of probability theory: One possible sample space Ω is: Ω = {g1 , g2 , g3 , d} (i.e. we distinguish the good chips, which may be a bit artificial. It will become obvious, why that is a good idea anyway, later on.) The event to draw the defective chip is then A = {d}. We can write the probability to draw the defective chip by comparing the sizes of A and Ω: P (A) = |{d}| |A| = = 0.25. |Ω| |{g1 , g2 , g3 , d}| Be careful, though! The above method to compute probabilities is only valid in a special case: Theorem 1.3.4 If all elementary events in a sample space are equally likely (i.e. P ({ωi }) = const for all ω ∈ Ω), the probability of an event A is given by: |A| , P (A) = |Ω| where |A| gives the number of elements in A. Example 1.3.6 continued The person now draws two chips. What is the probability that the defective chip is among them? We need to set up a new sample space containing all possibilities for drawing two chips: Ω = {{g1 , g2 }, {g1 , g3 }, {g1 , d}, {g2 , g3 }, {g2 , d}, {g3 , d}} 8 CHAPTER 1. INTRODUCTION E = “ defective chip is among the two chips drawn” = = {{g1 , d}, {g2 , d}, {g3 , d}}. Then P (E) = |E| 3 = = 0.5. |Ω| 6 Finding P (E) involves counting the number of outcomes in E. Counting by hand is sometimes not feasible if Ω is large. 
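For a sample space this small, the count can also be checked by brute-force enumeration on a computer. The following sketch (in Python; not part of the original notes) lists all two-chip subsets of {g1, g2, g3, d} and counts how many contain the defective chip:

    from itertools import combinations

    # The four chips: three good ones (g1, g2, g3) and one defective one (d).
    chips = ["g1", "g2", "g3", "d"]

    # Omega: all unordered pairs of distinct chips (drawing two chips at once).
    omega = list(combinations(chips, 2))

    # E: the pairs that contain the defective chip.
    E = [pair for pair in omega if "d" in pair]

    print(len(omega))           # 6 possible pairs
    print(len(E))               # 3 of them contain the defective chip
    print(len(E) / len(omega))  # P(E) = 3/6 = 0.5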
Therefore, we need some standard counting methods. 1.4 Counting Methods Each week, millions of people buy tickets for the federal lottery. Each ticket has six different numbers printed on it, each in the range 1 to 49. How many different lottery tickets are there, and what are the chances of winning? – Probably you would agree that the chances of winning the lottery are very low – some people think of it as being equal to the probability of being hit by lightning. In the following section we are going to look at methods to find out how low the chances of winning the lottery are exactly – we are going to do that by counting all different possibilities for the result of a lottery drawing. 1.4.1 Two Basic Counting Principles Summation Principle If a complex action can be performed using k alternative methods and each method can be performed in n1 , n2 , ..., nk different ways, respectively, the complex action can be performed in n1 + n2 + ... + nk different ways. Multiplication Principle If a complex action can be broken down in a series of k components and these components can be performed in respectively n1 , n2 , . . . , nk ways, then the complex action can be performed in n1 · n2 · . . . · nk different ways. Example 1.4.1 Toss a coin first, then toss a die: results in 2 · 6 = 12 possible outcomes of the experiment. die coin H T 1.4.2 1 2 3 4 5 6 1 2 3 4 5 6 Ordered Samples with Replacement just to make sure we know what we are talking about, here are the definitions that will explain this section’s title: 1.4. COUNTING METHODS 9 Definition 1.4.1 (ordered sample) If r objects are selected from a set of n objects, and if the order of selection is noted, the selected set of r objects is called an ordered sample. Definition 1.4.2 (Sampling w/wo replacement) 1.4.4 Sampling with replacement occurs when an object is selected and then replaced before the next object is selected. Sampling without replacement occurs when an object is not replaced after it has been selected. Situation: Imagine a box with n balls in it numbered from 1 to n. We are interested in the number of ways to sequentially select k balls from the box when the same ball can be drawn repeatedly (with replacement). This is our first application of the multiplication principle: Instead of looking at the complex action, we break it down into the k single draws. For each draw, we have n different possibilities to draw a ball. The complex action can therefore be done in n · . . . · n} = nk different ways. | · n {z k times The sample space Ω can be written as: Ω = {(x1 , x2 , . . . , xk )|xi ∈ {1, . . . , n}} = {x1 x2 . . . xk |xi ∈ {1, . . . , n}} We already know that |Ω| = nk . Example 1.4.2 (a) How many valid five digit octal numbers (with leading zeros) do exist? In a valid octal number each digit needs to be between 0 and 7. We therefore have 8 choices for each digit, yielding 85 different five digit octal numbers. (b) What is the probability that a randomly chosen five digit number is a valid octal number? One possible sample space for this experiment would be Ω = {x1 x2 . . . x5 |xi ∈ {0, . . . , 9}}, yielding |Ω| = 105 . Since all numbers in Ω are equally likely, we can apply Thm 1.3.4 and get for the sought probability: P ( “randomly chosen five digit number is a valid octal number” ) = 85 ≈ 0.328. 105 Example 1.4.3 Pick 3 Pick 3 is a game played daily at the State Lottery. The rules are as follows: Choose three digits between 0 and 9 and order them. To win, the numbers must be drawn in the exact order you’ve chosen. 
Clearly, the number of different ways to choose numbers in this way is 10 · 10 · 10 = 1000. odds (= probability) to win: 1/1000. 10 CHAPTER 1. INTRODUCTION 1.4.3 Ordered Samples without Replacement Situation: Same box as before. We are interested in the number of ways to sequentially draw k balls from the box when each ball can be drawn only once (without replacement). Again, we break up the complex action into k single draws and apply the multiplication principle: Draw # of Choices 1st n 2nd (n − 1) 3rd (n − 2) ... ... total choices: n · (n − 1) · (n − 2) · . . . · (n − k + 1) = The fraction n! (n−k)! kth (n − k + 1) n! (n − k)! is important enough to get a name of its own: Definition 1.4.3 (Permutation number) P (n, k) := n!/(n − k)! is the number of permutations of n distinct objects taken k at a time. Example 1.4.4 (a) I only remember that a friend’s (4 digit) telephone number consists of the numbers 3,4, 8 and 9. How many different numbers does that describe? That’s the situation, where we take 4 objects out of a set of 4 objects and order them - that is P (4, 4)!. P (4, 4) = 4! 4! 24 = = = 24. (4 − 4)! 0! 1 (b) In a survey, you are asked to choose from seven items on a pizza your favorite three and rank them. How many different results will the survey have at most? - P (7, 3). P (7, 3) = 7! = 7 · 6 · 5 = 210. (7 − 3)! Variation: How many different sets of “top 3” items are there? (i.e. now we do not regard the order of the favorite three items.) Think: The value P (7, 3) is the result of a two-step action. First, we choose 3 items out of 7. Secondly, we order them. Therefore (multiplication principle!): P (7, 3) | {z } = X |{z} · P (3, 3) | {z } # of ways to choose # of ways to choose # of ways to choose 3 from 7 and order them 3 out of 7 items 3 out of 3 and order them So: X= P (7, 3) 7! 7·6·5 = = = 35. P (3, 3) 4!3! 3·2·1 This example leads us directly to the next section: 1.4. COUNTING METHODS 1.4.4 11 Unordered Samples without Replacement Same box as before. We are interested in the number of ways to choose k balls (at once) out of a box with n balls. As we’ve seen in the last example, this can be done in P (n, k) n! = P (k, k) (n − k)!k! different ways. Again, this number is interesting enough to get a name: Definition 1.4.4 (Binomial Coefficient) For two integer numbers n, k with k ≤ n the Binomial coefficient is defined as n n! := (n − k)!k! k Read: “out of n choose k” or “k out of n”. Example 1.4.5 Powerball (without the Powerball) Pick five (different) numbers out of 49 - the lottery will also draw five numbers. You’ve won, if at least three of the numbers are right. (a) What is the probability to have five matching numbers? Ω, the sample space, is the set of all possible five-number-sets: Ω = {{x1 , x2 , x3 , x4 , x5 }|xi ∈ {1, . . . , 49}} |Ω| = 49 5 = 49! = 1906884. 5!44! The odds to win a matching five are 1: 1 906 884 - they are about the same as to die from being struck by lightning. (b) What is the probability to have exactly three matching numbers? Answering this question is a bit tricky. But: since the order of the five numbers you’ve chosen doesn’t matter, we can assume that we picked the three right numbers at first and then picked two wrong numbers. Do you see it? That’s again a complex action that we can split up into two simpler actions. We need to figure out first, how many ways there are to choose 3 numbers out of the right 5 numbers. Obviously, this can be done in 53 = 10 ways. 
Secondly, the number of ways to choose the remaining 2 numbers out of the wrong 49-5 = 44 numbers is 44 2 = 231. In total, we have 10 · 231 = 2310 possible ways to choose three right numbers, which gives a probability of 11/90804 ≈ 0.0001. Note: the probability to have exactly three right numbers was given as 5 49−5 P ( “3 matching numbers” ) = 3 5−3 49 5 We will come across these probabilities quite a few times from now on. 12 CHAPTER 1. INTRODUCTION (b) What is the probability to win? (i.e to have at least three matching numbers) In order to have a win, we need to have exactly 3, 4 or 5 matching numbers. We already know the probabilities for exactly 3 or 5 matching numbers. What remains, is the probability for exactly 4 matching numbers. If we use the above formula and substitute the 3 by a 4, we get 5 49−5 P ( “4 matching numbers” ) = 4 5−4 49 5 = 5 · 49 ≈ 0.000128 49 5 In total the probability to win is: P ( “win” ) = P ( “3 matching numbers” ) + P ( “4 matches” ) + P ( “5 matches” ) = 1 + 5 · 49 + 231 = = 477 : 1906884 ≈ 0.00025. 1906884 Please note: In the previous examples we’ve used parentheses ( ), see definition , to indicate that the order of the elements inside matters. These constructs are called tuples. If the order of the elements does not matter, we use { } - the usual symbol for sets. 1.5 Conditional Probabilities Example 1.5.1 A box contains 4 computer chips, two of them are defective. Obviously, the probability to draw a defective chip in one random draw is 2/4 = 0.5. We analyze this chip now and find out that it is a good one. If we draw now, what is the probability to draw a defective chip? Now, the probability to draw a defective chip has changed to 2/3. Conclusion: The probability of an event A may change if we know (before we start the experiment for A) the outcome of another event B. We need to add another term to our mathematical description of probabilities: Real World assessment of “chance” given additional, partial information Mathematical World conditional probability of one event A given another event B. write: P (A|B) Definition 1.5.1 (conditional probability) The conditional probability of event A given event B is defined as: P (A|B) := P (A ∩ B) P (B) if P (B) 6= 0. Example 1.5.2 A lot of unmarked Pentium III chips in a box is as Good Defective 400 mHz 480 20 500 500 mHz 490 10 500 970 30 1000 1.6. INDEPENDENCE OF EVENTS 13 Drawing a chip at random has the following probabilities: P (D) = 0.03 P (400mHz) = 0.5 P (G) = 0.97 P (500mHz) = 0.5) check: these two must sum to 1. check: these two must sum to 1, too. P (D and 400mHz) = 20/1000 = 0.02 P (D and 500mHz) = 10/1000 = 0.01 Suppose now, that I have the partial information that the chip selected is a 400 mHz chip. What is now the probability that it is defective? Using the above formula, we get P ( chip is D| chip is 400mHz) = P ( chip is D and chip is 400mHz) 0.02 = = 0.04. P ( chip is 400mHz) 0.5 i.e. knowing the speed of the chip influences our probability assignment to whether the chip is defective or not. Note: Rewriting the above definition of conditional probability gives: P (A ∩ B) = P (B) · P (A|B), (1.1) i.e. knowing two out of the three probabilities gives us the third for free. We have seen that the occurrence of an event B may change the probability for an event A. 
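As a quick numerical illustration of Definition 1.5.1, the following sketch (Python; not part of the original notes) recomputes P(D | 400 mHz) directly from the counts of Example 1.5.2:

    # Counts from Example 1.5.2: good/defective Pentium III chips by clock speed.
    counts = {
        ("good", "400MHz"): 480, ("defective", "400MHz"): 20,
        ("good", "500MHz"): 490, ("defective", "500MHz"): 10,
    }
    total = sum(counts.values())  # 1000 chips in the lot

    # P(D and 400MHz) and P(400MHz), read off the table ...
    p_d_and_400 = counts[("defective", "400MHz")] / total                           # 0.02
    p_400 = (counts[("good", "400MHz")] + counts[("defective", "400MHz")]) / total  # 0.5

    # ... give the conditional probability via Definition 1.5.1: P(A|B) = P(A and B) / P(B).
    p_d_given_400 = p_d_and_400 / p_400
    print(p_d_given_400)  # 0.04

The conditional probability 0.04 differs from the unconditional P(D) = 0.03, so knowing the speed of the chip does carry information about whether it is defective.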
If an event B does not have any influence on the probability of A we say, that the events A and B are independent: 1.6 Independence of Events Definition 1.6.1 (Independence of Events) Two events A and B are called independent, if P (A ∩ B) = P (A) · P (B) (Alternate definition: P (A|B) = P (A)) Independence is the mathematical counterpart of the everyday notion of “unrelatedness” of two events. Example 1.6.1 Safety System at a nuclear reactor Suppose there are two physically separate safety systems A and B in a nuclear reactor. An “incident” can occur only when both of them fail in the event of a problem. Suppose the probabilities for the systems to fail in a problem are: P (A fails) = 10−4 P (B fails) = 10−8 The probability for an incident is then P ( incident ) = P (A and B fail at the same time = = P (A fails and B fails) Using that A and B are independent from each other, we can compute the intersection of the events that both systems fail as the product of the probabilities for individual failures: P (A fails and B fails) A,B independent = P (A fails ) · P (B fails) Therefore the probability for an incident is: P ( incident ) = P (A fails ) · P (B fails) = 10−4 · 10−8 = 10−12 . 14 CHAPTER 1. INTRODUCTION Comments The safety system at a nuclear reactor is an example for a “parallel system” A parallel system consists of k components c1 , . . . , ck , that are arranged as drawn in the diagram 1.1. C1 C2 1 2 Ck Figure 1.1: Parallel system with k components. The system works as long as there is at least one unbroken path between 1 and 2 (= at least one of the components still works). Under the assumption that all components work independently from each other, it is fairly easy to compute the probability that a parallel system will fail: P ( system fails ) = P ( all components fail ) = = P (c1 fails ∩ c2 fails ∩ . . . ck fails) components are independent = = P (c1 fails) · P (c2 fails) · . . . · P (ck fails) A similar kind of calculation can be done for a “series system”. A series system, again, consists of k supposedly independent components c1 , . . . , ck arranged as shown in diagram 1.2. 1 C1 C2 2 Ck Figure 1.2: Series system with k components. This time, the system only works, if all of its components are working. Therefore, we can compute the probability that a series system works as: P ( system works ) = P ( all components work ) = = P (c1 works ∩ c2 works ∩ . . . ck works) components are independent = = P (c1 works) · P (c2 works) · . . . · P (ck works) Please note that based on the above probabilities it is easy to compute the probability that a parallel system is working and a series system fails, respectively, as: P ( parallel system works ) P ( series system fails ) T hm1.3.1 = T hm1.3.1 = 1 − P ( parallel system fails) 1 − P ( parallel system works) The probability that a system works is sometimes called the system’s reliability. Note that a parallel system is very reliable, a series system usually is very unreliable. Warning: independence and disjointness are two very different concepts! 1.7. BAYES’ RULE 15 Disjointness: Independence: If A and B are disjoint, their intersection is empty, has therefore probability 0: If A and B are independent events, the probability of their intersection can be computed as the product of their individual probabilities: P (A ∩ B) = P (∅) = 0. P (A ∩ B) = P (A) · P (B) If neither of A or B are empty, the probability for the intersection will not be 0 either! 
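The reliability formulas above are straightforward to evaluate in code. Here is a sketch (Python; not part of the original notes, and the component failure probabilities in the second example are made-up values for illustration):

    from math import prod

    def parallel_fails(fail_probs):
        # A parallel system fails only if every (independent) component fails.
        return prod(fail_probs)

    def series_works(fail_probs):
        # A series system works only if every (independent) component works.
        return prod(1 - p for p in fail_probs)

    # The two reactor safety systems from the example: failure probabilities 1e-4 and 1e-8.
    print(parallel_fails([1e-4, 1e-8]))  # 1e-12, the probability of an "incident"

    # A series system of five components, each failing with probability 0.05 (made-up values).
    fails = [0.05] * 5
    print(series_works(fails))           # about 0.774 -- the system's reliability
    print(1 - series_works(fails))       # about 0.226 -- probability that the series system fails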
The concept of independence between events can be extended to more than two events: Definition 1.6.2 (Mutual Independence) A list of events A1 , . . . , An is called mutually independent, if for any subset {i1 , . . . , ik } ⊂ {1, . . . , n} of indices we have: P (Ai1 ∩ Ai2 ∩ . . . ∩ Aik ) = P (Ai1 ) · P (Ai2 ) · . . . · P (Aik ). Note: for more than 3 events pairwise independence does not imply mutual independence. 1.7 Bayes’ Rule Example 1.7.1 Treasure Hunt Suppose that there are three closed boxes. The first box contains two gold coins, the second box contains one gold coin and one silver coin, and the third box contains two silver coins. Suppose that you select one of the boxes randomly and then select one of the coins from this box. What is the probability that the coin you selected is golden? For a problem like this, that consists of a step-wise procedure, it is often useful to draw a tree (a flow chart) of the choices we can make in each step. The diagram below shows the tree for the 2 steps of choosing a box first and choosing one of two coins in that box. 1/3 1/3 1 gold 1/2 gold 1/2 silver 1 silver B1 B2 1/3 B3 The lines are marked by the probabilities, with which each step is done: Choosing one box (at random) means, that all boxes are equally likely to be chosen: P (Bi ) = 13 for i = 1, 2, 3. In the first box are two gold coins: A gold coin in this box is therefore chosen with probability 1. The second box has one golden and one silver coin. A gold coin is therefore chosen with probability 0.5. 16 CHAPTER 1. INTRODUCTION How do we piece these information together? We have two possible paths in the tree, to get a golden coin as a result. Each path corresponds to one event. E1 = choose Box 1 and pick one of the two golden coins E2 = choose Box 2 and pick the golden coin We need the probabilities for these two events. Think: use equation (1.1) to get P (Ei )! P (E1 ) = P ( choose Box 1 and pick one of the two golden coins) = = P ( choose Box 1 ) · P ( pick one of the two golden coins |B1 ) = 1 = · 1. 3 and P (E2 ) = P ( choose Box 2 and pick one of the two golden coins) = = P ( choose Box 2 ) · P ( pick one of the two golden coins |B2 ) = 1 1 1 · = . = 3 2 6 The probability to choose a golden coin is the sum of P (E1 ) and P (E2 ) (since those are the only ways to get a golden coin, as we’ve seen in the tree diagram). P ( golden coin ) = 1 1 + = 0.5. 3 6 There are several things to learn from this example: 1. Instead of trying to tackle the whole problem, we’ve divided it into several smaller pieces, that are more manageable (Divide and conquer Principle). 2. We identified the smaller parts by looking at the description of the problem with the help of a tree. And: if you compare the probabilities on the lines of the tree with the probabilities we used to compute the smaller pieces E1 and E2 , you’ll see that those correspond closely to the branches of the tree. The probability of E1 is computed as the product of all probabilities on the edges from the root to the leaf for E1 . Definition 1.7.1 (cover) A set of k events B1 , . . . , Bk is called a cover of the sample space Ω, if (i) the events are pairwise disjoint, i.e. Bi ∩ Bj = ∅ for all i, j (ii) the union of the events contains Ω: k [ Bi = Ω i=1 What is a cover, then? – You can think of a cover as several non-overlapping pieces, which in total contain every possible case of the sample space, like pieces of a jig-saw puzzle e.g. Compare with diagram 1.3. 1.7. BAYES’ RULE 17 Figure 1.3: B1 , B2 . . . , Bk are a cover of Ω. 
The boxes from the last example, B1 , B2 , and B3 , are a cover of the sample space. this is a formal way for “Divide and Conquer” Theorem 1.7.2 (Total Probability) If the set B1 , . . . , Bk is a cover of the sample space Ω, we can compute the probability for an event A by (cf. fig.1.4): k X P (A) = P (Bi ) · P (A|Bi ). i=1 Note: Instead of writing P (Bi )·P (A|Bi ) we could have written P (A∩Bi ) - this is the definition of conditional probability cf. def. 1.5.1. Figure 1.4: The probability of event A is put together as sum of the probabilities of the smaller pieces (theorem of total probability). The challenge in using this Theorem is to identify what set of events to use as cover, i.e. to identify in which parts to dissect the problem. Very often, the cover B1 , B2 , . . . , Bk has only two elements, and looks like E, Ē. Tree Diagram: B1 P(A| B1) A B1 B2 P(A| B2) A B2 Bk P(A| Bk) A Bk The probability of each node in the tree can be calculated by multiplying all probabilities from the root to the event (1st rule of tree diagrams). Summing up all the probabilities in the leaves gives P (A) (2nd rule). 18 CHAPTER 1. INTRODUCTION Homework: Powerball - with the Powerball Redo the above analysis under the assumption that besides the five numbers chosen from 1 to 49 you choose an additional number, again, between 1 and 49 as the Powerball. The Powerball may be a number you’ve already chosen or a new one. You’ve won, if at least the Powerball is the right number or, if the Powerball is wrong, at least three out of the other five numbers must match. • Show that the events “Powerball is right”, “Powerball is wrong”is a cover of the sample space (for that, you need to define a sample space). • Draw a tree diagram for all possible ways to win, given that the Powerball is right or wrong. • What is the probability to win? Extra Problem (tricky): Seven Lamps 1 2 A system of seven lamps is given as drawn in the diagram. 7 3 6 4 Each lamp fails (independently) with probability p = 0.1. The system works as long as not two lamps next to each other fail. What is the probability that the system works? 5 Example 1.7.2 Forensic Analysis On a crime site the police found traces of DNA (evidence DNA), which could be identified to belong to the perpetrator. Now, the search is done by looking for a DNA match. The probability for ”a man from the street” to have the same DNA as the DNA from the crime site (random match) is approx. 1: 1 Mio. For the analysis, whether someone is a DNA match or not, a test is used. The test is not totally reliable, but if a person is a true DNA match, the test will be positive with a probability of 1. If the person is not a DNA match, the test will still be positive with a probability of 1:100000. Assuming, that the police found a man with a positive test result. What is the probability that he actually is a DNA match? First, we have to formulate the above text into probability statements. The probability for a random match is P ( match ) = 1 : 1 Mio = 10−6 . Now, the probabilities for a positive test result: P ( test pos | match ) = 1 P ( test pos | no match ) = 1 : 100000 = 10−5 The probability asked for in the question is, again, a conditional probability. We know already, that the man has a positive test result. We look for the probability, that he is a match. This translates to P ( match | test pos. ). First, we use the definition of conditional probability to re-write this probability: P ( match | test pos. ) = P ( match ∩ test pos. ) P ( test pos. 
) This doesn’t seem to help a lot, since we still don’t know a single one of those probabilities. But we do the same trick once again for the numerator: P ( match ∩ test pos. ) = P ( test pos. | match ) · P ( match ) 1.7. BAYES’ RULE 19 Now, we know both of these probabilities and get P ( match ∩ test pos. ) = 1 · 10−6 . The denominator is a bit more tricky. But remember the theorem of total probabilities - we just need a proper cover to compute this probability. The way this particular problem is posed, we find a suitable cover in the events match and no match. Using the theorem of total probability gives us: P ( test pos. ) = P ( match ) · P ( test pos. | match ) + P ( no match ) · P ( test pos. | no match ) We have got the numbers for all of these probabilities! Plugging them in gives: P ( test pos. ) = 10−6 · 1 + (1 − 10−6 ) · 10−5 = 1.1 · 10− 5. In total this gives a probability for the man with the positive test result to be a true match of slightly less than 10%! P ( match | test pos. ) = 10−6 · (1.1 · 10− 5.) = 1/11. Is that result plausible? - If you look at the probability for a false positive test result and compare it with the overall probability for a true DNA match, you can see, that the test is ten times more likely to give a positive result than there are true matches.This means that, if 10 Mio people are tested, we would expect 10 people to have a true DNA match. On the other hand, the test will yield additional 100 false positive results, which gives us a total of 110 people with positive test results. This, by the way, is not a property limited to DNA tests - it’s a property of every test, where the overall percentage of positives is fairly small, like e.g. tuberculosis tests, HIV tests or - in Europe - tests for mad cow disease. Theorem 1.7.3 (Bayes’ Rule) If B1 , B2 , . . . , Bk is a cover of the sample space Ω, P (Bj |A) = P (A | Bj ) · P (Bj ) P (Bj ∩ A) = Pk P (A) i=1 P (A | Bi ) · P (Bi ) for all j and ∅ = 6 A ⊂ Ω. Example 1.7.3 A given lot of chips contains 2% defective chips. Each chip is tested before delivery. However, the tester is not wholly reliable: P ( “tester says chip is good” | “chip is good” ) = 0.95 P ( “tester says chip is defective” | “chip is defective” ) = 0.94 If the test device says the chip is defective, what is the probability that the chip actually is defective? P ( chip is defective | {z } | :=Cd tester says it’s defective ) = P (Cd |Td ) | {z } Bayes’ Rule, use Cd ,C̄d as cover :=Td = = P (Td |Cd )P (Cd ) = P (Td |Cd )P (Cd ) + P (Td |C¯d )P (C¯d ) 0.94 · 0.02 = 0.28. 0.94 · 0.02 + (1 − P (T¯d |C¯d ) ·0.98 | {z } 0.05 = 20 1.8 CHAPTER 1. INTRODUCTION Bernoulli Experiments A random experiment with only two outcomes is called a Bernoulli experiment. Outcomes are e.g. 0,1 “success”, “failure” “hit”, “miss” “good”, “defective” The probabilities for “success” and “failure” are called p and q, respectively. (Then: p + q = 1) A compound experiment, consisting of a sequence of n independent repetitions is called a sequence of Bernoulli experiments. Example 1.8.1 Transmit binary digits through a communication channel with success = “digit received correctly”. Toss a coin repeatedly, success = “head”. Sample spaces Let Ωi be the sample space of an experiment involving i Bernoulli experiments: Ω1 = {0, 1} Ω2 = {(0, 0), (0, 1), (1, 0), (1, 1)} = { 00, 01, 10, 11 | {z } } all two-digit binary numbers .. . 
Ωn = {n − digit binary numbers} = {n − tuples of 0s and 1s} Probability assignment For Ω1 probabilities are already assigned: P (0) = q P (1) = p For Ω2 : P (00) = q 2 P (01) = qp P (10) = pq Generally, for Ωn : P (s) = pk q n−k if s has exactly k 1s and n − k 0s. P (11) = p2 Example 1.8.2 Very simple dartboard We will assume, that only those darts count, that actually hit the dartboard. If a player throws a dart and hits the board at random, the probability to hit the red zone will be directly proportional to the red area. Since out of the nine squares in total 8 are gray and only one is red, the probabilities are: P ( gray ) = 98 . P ( red ) = 19 A player now throws three darts, one after the other. What are the possible sequences of red and gray hits, and what are their probabilities? We have, again, a step-wise setup of the problem, we can therefore draw a tree: 1.8. BERNOULLI EXPERIMENTS 21 r r g r r rrg rgr g g r r g g sequence rrr r g g rgg grr grg ggr ggg probability 1 93 8 93 8 93 82 93 8 93 82 93 82 93 83 93 Most of the time, however, we are not interested in the exact sequence in which the darts are thrown - but in the overall result, how many times a player hits the red area. This leads us to the notion of a random variable. 22 CHAPTER 1. INTRODUCTION Chapter 2 Random Variables If the value of a numerical variable depends on the outcome of an experiment, we call the variable a random variable. Definition 2.0.1 (Random Variable) A function X : Ω → 7 R is called a random variable. X assigns to each elementary event a real value. Standard notation: capital letters from the end of the alphabet. Example 2.0.3 Very simple Dartboard In the case of three darts on a board as in the previous example, we are usually not interested in the order, in which the darts have been thrown. We only want to count the number of times, the red area has been hit. This count is a random variable! More formally: we define X to be the function, that assigns to a sequence of three throws the number of times, that the red area is hit. X(s) = k, if s consists of k hits to the red area and 3 − k hits to the gray area. X(s) is then an integer between 0 and 3 for every possible sequence. What is then the probability, that a player hits the red area exactly two times? We are looking now for all those elementary events s of our sample space, for which X(s) = 2. Going back to the tree, we find three possibilities for s : rrg, rgr and grr. This is the subset of Ω, for which X(s) = 2. Very formally, this set can be written as: {s|X(s) = 2} We want to know the total probability: P ({s|X(s) = 2}) = P (rrg ∪ rgr ∪ grr) = P (rrg) + P (rgr) + P (grr) = To avoid cumbersome notation, we write X=x for the event {ω|ω ∈ Ω and X(ω) = x}. 23 8 8 8 + 3 + 3 = 0.03. 93 9 9 24 CHAPTER 2. RANDOM VARIABLES Example 2.0.4 Communication Channel Suppose, 8 bits are sent through a communication channel. Each bit has a certain probability to be received incorrectly. So this is a Bernoulli experiment, and we can use Ω8 as our sample space. We are interested in the number of bits that are received incorrectly. Use random variable X to “count” the number of wrong bits. X assigns a value between 0 and 8 to each sequence in Ω8 . 
Now it’s very easy to write events like: a) no wrong bit received X=0 P (X = 0) b) at least one wrong bit received X≥1 P (X ≥ 1) c) exactly three bits are wrong X=3 P (X = 3) d) at least 3, but not more than 6 bits wrong 3 ≤ X ≤ 6 P (3 ≤ X ≤ 6) Definition 2.0.2 (Image of a random variable) The image of a random variable X is defined as all possible values X can reach Im(x) := X(Ω). Depending on whether or not the image of a random variable is countable, we distinguish between discrete and continuous random variables. Example 2.0.5 1. Put a disk drive into service, measure Y = “time till the first major failure”. Sample space Ω = (0, ∞). Y has uncountable image → Y is a continuous random variable. 2. Communication channel: X = “# of incorrectly received bits” Im(X) = {0, 1, 2, 3, 4, 5, 6, 7, 8} is a finite set → X is a discrete random variable. 2.1 Discrete Random Variables Assume X is a discrete random variable. The image of X is therefore countable and can be written as {x1 , x2 , x3 , . . .} Very often we are interested in probabilities of the form P (X = x). We can think of this expression as a function, that yields different probabilities depending on the value of x. Definition 2.1.1 (Probability Mass Function, PMF) The function pX (x) := P (X = x) is called the probability mass function of X. A probability mass function has two main properties: all values must be between 0 and 1 the sum of all values is 1 Theorem 2.1.2 (Properties of a pmf ) pX is the pmf of X, if and only if (i) 0 ≤ pX (x) ≤ 1 for all x ∈ {x1 , x2 , x3 , . . .} P (ii) i pX (xi ) = 1 Note: this gives us an easy method to check, whether a function is a probability mass function! 2.1. DISCRETE RANDOM VARIABLES 25 Example 2.1.1 Which of the following functions is a valid probability mass function? 1. x pX (x) -3 0.1 -1 0.45 0 0.15 5 0.25 7 0.05 2. y pY (y) -1 0.1 0 0.45 1.5 0.25 3 -0.05 4.5 0.25 3. z pZ (z) 0 0.22 5 0.17 7 0.18 1 0.18 3 0.24 We need to check the two properties of a pmf for pX , pY and pZ . 1st property: probabilities between 0 and 1 ? This eliminates pY from the list of potential probability mass functions, since pY (3) is negative. The other two functions fulfill the property. 2nd P property: sum of all probabilities is 1? Pi p(xi ) = 1, so pX is a valid probability mass function. i p(zi ) = 0.99 6= 1, so pZ is not a valid probability mass function. Example 2.1.2 Probability Mass Functions 1. Very Simple Dartboard X, the number of times, a player hits the red area with three darts is a value between 0 and 3. What is the probability mass function for X? The probability mass function pX can be given as a list of all possible values: pX (0) = P (X = 0) = P (ggg) = 83 ≈ 0.70 93 82 ≈ 0.26 93 8 pX (2) = P (X = 2) = P (rrg) + P (rgr) + P (grr) = 3 · 3 ≈ 0.03 9 1 pX (3) = P (X = 3) = P (rrr) = 3 ≈ 0.01 9 pX (1) = P (X = 1) = P (rgg) + P (grg) + P (ggr) = 3 · 2. Roll of a fair die Let Y be the number of spots on the upturned face of a die: Obviously, Y is a random variable with image {1, 2, 3, 4, 5, 6}. Assuming, that the die is a fair die means, that the probability for each side is equal. The probability mass function for Y therefore is pY (i) = 61 for all i in {1, 2, 3, 4, 5, 6}. 3. The diagram shows all six faces of a particular die. If Z denotes the number of spots on the upturned face after toss this die, what is the probability mass function for Z? 
Assuming, that each face of the die appears with the same probability, we have 1 possibility to get a 1 or a 4, and two possibilities for a 2 or 3 to appear, which gives a probability mass function of: x p(x) 1 1/6 2 1/3 3 1/3 4 1/6 26 2.1.1 CHAPTER 2. RANDOM VARIABLES Expectation and Variance Example 2.1.3 Game Suppose we play a “game”, where you toss a die. Let X be the number of spots, then if X is 1,3 or 5 I pay you $ X 2 or 4 you pay me $ 2 · X 6 no money changes hands. What money do I expect to win? For that, we look at another function, h(x), that counts the money I win with respect to the number of spots: −x for x = 1, 3, 5 2x for x = 2, 4 h(x) = 0 for x = 6. Now we make a list: In 1/6 of all tosses X will be 1, and I will gain -1 dollars In 1/6 of all tosses X will be 2, and I will gain 4 dollars In 1/6 of all tosses X will be 3, and I will gain -3 dollars In 1/6 of all tosses X will be 4, and I will gain 8 dollars In 1/6 of all tosses X will be 5, and I will gain -5 dollars In 1/6 of all tosses X will be 6, and I will gain 0 dollars In total I expect to get 61 · (−1) + 16 · 4 + 61 · (−3) + 61 · 8 + 16 · (−5) + 61 · 0 = 63 = 0.5 dollars per play. Assume, that instead of a fair die, we use the die from example 3. How does that change my expected gain? h(x) is not affected by the different die, but my expected gain changes: in total I expect to gain: 1 1 1 1 9 1 6 · (−1) + 3 · 4 + 3 · (−3) + 6 · 8 + 0 · (−5) + 6 · 0 = 6 = 1.5 dollars per play. Definition 2.1.3 (Expectation) The expected value of a function h(X) is defined as E[h(X)] := X h(xi ) · pX (xi ). i The most important version of this is h(x) = x: E[X] = X xi · pX (xi ) =: µ i Example 2.1.4 Toss of a Die Toss a fair die, and denote by X the number of spots on the upturned face. What is the expected value for X? Looking at the above definition for E[X], we see that we need to know the probability mass function for a computation. The probability mass function of X is pX (i) = 61 for all i ∈ {1, 2, 3, 4, 5, 6}. Therefore 6 X 1 1 1 1 1 1 E[X] = ipX (i) = 1 · + 2 · + 3 · + 4 · + 5 · + 6 · = 3.5. 6 6 6 6 6 6 i=1 A second common measure for describing a random variable is a measure, how far its values are spread out. We measure, how far we expect values to be away from the expected value: 2.1. DISCRETE RANDOM VARIABLES 27 Definition 2.1.4 (Variance of a random variable) The variance of a random variable X is defined as: V ar[X] := E[(X − E[X])2 ] = X (xi − E[X])2 · pX (xi ) i The variance is measured in squared units of X. p σ := V ar[X] is called the standard deviation of X, its units are the original units of X. Example 2.1.5 Toss of a Die, continued Toss a fair die, and denote with X the number of spots on the upturned face. What is the variance for X? Looking at the above definition for V ar[X], we see that we need to know the probability mass function and E[X] for a computation. The probability mass function of X is pX (i) = 61 for all i ∈ {1, 2, 3, 4, 5, 6}; E[X] = 3.5 Therefore 6 X 1 1 1 1 1 1 2 V ar[X] = (Xi − 3.5)2 pX (i) = 6.25 · + 2.25 · + 0.25 · + 0.25 · + 2.25 · + 6.25 · = 2.917 (spots ). 6 6 6 6 6 6 i=1 The standard deviation for X is: σ= 2.1.2 p V ar(X) = 1.71 (spots). Some Properties of Expectation and Variance The following theorems make computations with expected value and variance of random variables easier: Theorem 2.1.5 For two random variables X and Y and two real numbers a, b holds: E[aX + bY ] = aE[X] + bE[Y ]. 
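Before the next theorem, a small simulation can be used to check both the fair-die computations above (E[X] = 3.5, Var[X] ≈ 2.917) and the linearity property of Theorem 2.1.5. This is only a sketch in Python (not part of the original notes); the seed and sample size are arbitrary:

    import random

    random.seed(330)  # any fixed seed, just for reproducibility

    def mean(values):
        return sum(values) / len(values)

    def variance(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    N = 100_000
    x = [random.randint(1, 6) for _ in range(N)]  # tosses of a fair die X
    y = [random.randint(1, 6) for _ in range(N)]  # an independent fair die Y

    # Empirical E[X] and Var[X]; should be close to 3.5 and 2.917.
    print(mean(x), variance(x))

    # Theorem 2.1.5 with a = 2, b = 3: E[2X + 3Y] should be 2*3.5 + 3*3.5 = 17.5.
    z = [2 * xi + 3 * yi for xi, yi in zip(x, y)]
    print(mean(z))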
Theorem 2.1.6 For a random variable X and a real number a holds: (i) E[X 2 ] = V ar[X] + (E[X])2 (ii) V ar[aX] = a2 V ar[X] Theorem 2.1.7 (Chebyshev’s Inequality) For any positive real number k, and random variable X with variance σ 2 : P (|X − E[X]| ≤ kσ) ≥ 1 − 2.1.3 1 k2 Probability Distribution Function Very often we are interested in the probability of a whole range of values, like P (X ≤ 5) or P (4 ≤ X ≤ 16). For that we define another function: Definition 2.1.8 (probability distribution function) Assume X is a discrete random variable: The function FX (t) := P (X ≤ t) is called the probability distribution function of X. 28 CHAPTER 2. RANDOM VARIABLES Relationship between pX and FX Since X is a discrete random variable, the image of X can be written as {x1 , x2 , x3 , . . .}, we are therefore interested in all xi with xi ≤ t: X FX (t) = P (X ≤ t) = P ({xi |xi ≤ t}) = pX (xi ). i,with xi ≤t Note: in contrast to the probability mass function, FX is defined on R (not only on the image of X). Example 2.1.6 Roll a fair die X = # of spots on upturned face Ω = {1, 2, 3, 4, 5, 6} pX (1) = pX (2) = . . . = pX (6) = 16 F (X)(t) = P i<t pX (i) = Properties of FX variable X. Pbtc i=1 pX (i) = btc 6 , where btc is the truncated value of t. The following properties hold for the probability distribution function FX of a random • 0 ≤ FX (t) ≤ 1 for all t ∈ R • FX is monotone increasing, (i.e. if x1 ≤ x2 then FX (x1 ) ≤ FX (x2 ).) • limt→−∞ FX (t) = 0 and limt→∞ FX (t) = 1. • FX (t) has a positive jump equal to pX (xi ) at {x1 , x2 , x3 , . . .}; FX is constant in the interval [xi , xi+1 ). Whenever no confusion arises, we will omit the subscript X. 2.2 Special Discrete Probability Mass Functions In many theoretical and practical problems, several probability mass functions occur often enough to be worth exploring here. 2.2.1 Bernoulli pmf Situation: Bernoulli experiment (only two outcomes: success/ no success) with P ( success ) = p We define a random variable X as: X( success ) = 1 X( no success ) = 0 The probability mass function pX of X is then: pX (0) = 1 − p pX (1) = p This probability mass function is called the Bernoulli mass function. The distribution function FX is then: t<0 0 1−p 0≤t<1 FX (t) = 1 1≤t This distribution function is called the Bernoulli distribution function. That’s a very simple probability function, and we’ve already seen sequences of Bernoulli experiments. . . 2.2. SPECIAL DISCRETE PROBABILITY MASS FUNCTIONS 2.2.2 29 Binomial pmf Situation: n sequential Bernoulli experiments, with success rate p for a single trial. Single trials are independent from each other. We are only interested in the number of successes he had in total after n trials, therefore we define a random variable X as: X = “ number of successes in n trials” This leads to an image of X as im(X) = {0, 1, 2, . . . , n} We can think of the sample space Ω as the set of sequences of length n that only consist of the letters S and F for “success” and ”failure”: Ω = {F...F F, F...F S, ...., S...SS} This way, we get 2n different outcomes in the sample space. Now, we want to derive a probability mass function for X, i.e. we want to get to a general expression for pX (k) for all possible k = 0, . . . , n. pX (k) = P (X = k), i.e. we want to find the probability, that in a sequence of n trials there are exactly k successes. Think: if s is a sequence with k successes and n − k failures, we already know the probability: P (s) = pk (1 − p)n−k . 
Now we need to know, how many possibilities there are, to have k successes in n trials: think of the n trials as numbers from 1 to n. To have k successes, we need to choose a set of k of these numbers out of the n possible numbers. Do you see it? - That’s the Binomial coefficient, again. pX (k) is therefore: n k p (1 − p)n−k . pX (k) = k This probability mass function is called the Binomial mass function. The distribution function FX is: FX (t) = btc X n i=0 i pi (1 − p)n−i =: Bn,p (t) This function is called the Binomial distribution Bn,p , where n is the number of trials, and p is the probability for a success. It is a bit cumbersome to compute values for the distribution function. Therefore, those values are tabled with respect to n and p. Example 2.2.1 Compute the probabilities for the following events: A box contains 15 components that each have a failure rate of 2%. What is the probability that 1. exactly two out of the fifteen components are defective? 2. at most two components are broken? 3. more than three components are broken? 4. more than 1 but less than 4 are broken? Let X be the number of broken components. Then X has a B15,0.02 distribution. 30 CHAPTER 2. RANDOM VARIABLES 1. P (exactly two out of the fifteen components are defective) = pX (2) = 15 2 0.022 0.9813 = 0.0323. 2. P (at most two components are broken) = P (X ≤ 2) = B15,0.02 (2) = 0.9638. 3. P ( more than three components are broken ) = P (X > 3) = 1 − P (X ≤ 3) = 1 − 0.9945 = 0.0055. 4. P ( more than 1 but less than 4 are broken ) = P (1 < X < 4) = P (X ≤ 3) − P (X ≤ 1) = 0.9945 − 0.8290 = 0.1655. If we want to say that a random variable has a binomial distribution, we write: X ∼ Bn,p What are the expected value and variance of X ∼ Bn,p ? E[X] = = = n X i · pX (i) = i=0 n X n i i· p (1 − p)n−i = i i=0 n X i=1 = np · i n! pi (1 − p)n−i i!(n − i)! n−1 X j=0 | V ar[X] 2.2.3 j:=i−1 = (n − 1)! pj (1 − p)n−1−j = np j!((n − 1) − j)! {z } =1 = . . . = np(1 − p). Geometric pmf Assume, we have a single Bernoulli experiment with probability for success p. Now, we repeat this experiment until we have a first success. Denote by X the number of repetitions of the experiment until we have the first success. Note: X = k means, that we have k − 1 failures and the first success in the kth repetition of the experiment. The sample space Ω is therefore infinite and starts at 1 (we need at least one experiment): Ω = {1, 2, 3, 4, . . .} Probability mass function: pX (k) = P (X = k) = (1 − p)k−1 · p | {z } |{z} k−1 failures success! This probability mass function is called the Geometric mass function. Expected value and variance of X are: E[X] = ∞ X i=1 V ar[X] = i(1 − p)i p = . . . = 1 , p ∞ X 1 1−p (i − )2 (1 − p)i p = . . . = . p p2 i=1 2.2. SPECIAL DISCRETE PROBABILITY MASS FUNCTIONS 31 Example 2.2.2 Repeat-until loop Examine the following programming statement: Repeat S until B assume P (B = true) = 0.1 and let X be the number of times S is executed. Then, X has a geometric distribution, P (X = k) = pX (k) = 0.9k−1 · 0.1 How often is S executed on average? - What is E[X]? Using the above formula, we get E[X] = 1 p = 10. We still need to compute the distribution function FX . Remember, FX (t) is the probability for X ≤ t. Instead of tackling this problem directly, we use a trick and look at the complementary event X > t. If X is greater than t, this means that the first btc trials yields failures. This is easy to compute! It’s just (1 − p)btc . 
Therefore the probability distribution function is: FX (t) = 1 − (1 − p)btc =: Geop (t) This function is called the Geometric distribution (function) Geop . Example 2.2.3 Time Outs at the Alpha Farm Watch the input queue at the alpha farm for a job that times out. The probability that a job times out is 0.05. Let Y be the number of the first job to time out, then Y ∼ Geo0.05 . What’s then the probability that • the third job times out? P (Y = 3) = 0.952 0.05 = 0.045 • Y is less than 3? P (Y < 3) = P (Y ≤ 2) = 1 − 0.952 = 0.0975 • the first job to time out is between the third and the seventh? P (3 ≤ Y ≤ 7) = P (Y ≤ 7) − P (Y ≤ 2) = 1 − 0.957 − (1 − 0.952 ) = 0.204 What are the expected value for Y , what is V ar[Y ]? Plugging in p = 0.05 in the above formulas gives us: 2.2.4 E[Y ] = V ar[Y ] = 1 = 20 p 1−p = 380 p2 we expect the 20th job to be the first time out very spread out! Poisson pmf The Poisson density follows from a certain set of assumptions about the occurrence of “rare” events in time or space. The kind of variables modelled using a Poisson density are e.g. X = # of alpha particles emitted from a polonium bar in an 8 minute period. Y = # of flaws on a standard size piece of manufactured product (100m coaxial cable) Z = # of hits on a web page in a 24h period. 32 CHAPTER 2. RANDOM VARIABLES The Poisson probability mass function is defined as: p(x) = e−λ λx x! for x = 0, 1, 2, 3, . . . λ is called the rate parameter. P oλ (t) := FX (t) is the Poisson distribution (function). We need to check that p(x) as defined above is actually a probability mass function, i.e. we need to check whether the two basic properties (see theorem 2.1.2) are true: • Obviously, all values of p(x) are positive for x ≥ 0. • Do all probabilities sum to 1? ∞ X p(x) = k=0 ∞ X ∞ e−λ X λk λk = e−λ k! k! (∗) k=0 k=0 Now, we need to remember from calculus that the exponential function has the series representation ex = ∞ X xn . n! n=0 In our case this simplifies (∗) to: e−λ ∞ X λk k=0 k! = e−λ · eλ = 1. p(x) is therefore a valid probability mass function. Expected Value and Variance of X ∼ P oλ are: E[X] V ar[X] = ∞ X e−λ λx x = ... = λ x! x=0 = ... = λ Computing E[X] and V ar[X] involves some math, but as it is not too hard, we can do the computation for E[X]: E[X] = ∞ ∞ X X λx e−λ λx = e−λ x = x x! x! x=0 x=0 = e−λ = e −λ for x = 0 the expression is 0 ∞ ∞ X X λx λx x = e−λ = x! (x − 1)! x=1 x=1 λ = e−λ λ ∞ X x=1 ∞ X x=0 x λx−1 = (x − 1)! x λx = e−λ λeλ = λ (x)! start at x = 0 again and change summation index How do we choose λ in an example? - look at the expected value! 2.2. SPECIAL DISCRETE PROBABILITY MASS FUNCTIONS 33 Example 2.2.4 A manufacturer of chips produces 1% defectives. What is the probability that in a box of 100 chips no defective is found? Let X be the number of defective chips found in the box. So far, we would have modelled X as a Binomial variable with distribution B100,0.01 . 100 Then P (X = 0) = 100 0.010 = 0.366. 0 0.99 On the other hand, a defective chip can be considered to be a rare event, since p is small (p = 0.01). What else can we do? We expect 100 · 0.01 = 1 chip out of the box to be defective. If we model X as Poisson variable, we know, that the expected value of X is λ. In this example, therefore, λ = 1. −1 0 Then P (X = 0) = e 0!1 = 0.3679. No big differences between the two approaches! For larger k, however, the binomial coefficient nk becomes hard to compute, and it is easier to use the Poisson distribution instead of the Binomial distribution. 
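This comparison is quick to verify with software. A small sketch of my own in R (the language used for the simulation examples in Chapter 3); dbinom and dpois are the built-in Binomial and Poisson probability mass functions:

# Check of example 2.2.4: P(X = 0) for a box of 100 chips with 1% defectives
dbinom(0, size = 100, prob = 0.01)        # exact Binomial probability: 0.3660
dpois(0, lambda = 100 * 0.01)             # Poisson approximation with lambda = np: 0.3679
round(rbind(binomial = dbinom(0:4, 100, 0.01),
            poisson  = dpois(0:4, 1)), 4) # the start of the two pmfs is very close
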
Poisson approximation of Binomial pmf For large n, the Binomial distribution is approximated by the Poisson distribution, where λ is given as np: n k (np)k p (1 − p)n−k ≈ e−np k! k Rule of thumb: use Poisson approximation if n ≥ 20 and (at the same time) p ≤ 0.05. Why does the approximation work? - We will have a closer look at why the Poisson distribution approximates the Binomial distribution. This also explains why the Poisson is defined as it is. Example 2.2.5 Typos Imagine you are supposed to proofread a paper. Let us assume that there are on average 2 typos on a page and a page has 1000 words. This gives a probability of 0.002 for each word to contain a typo. The number of typos on a page X is then a Binomial random variable, i.e. X ∼ B1000,0.002 . Let’s have a closer look at a couple of probabilities: • the probability for no typo on a page is P (X = 0). We know, that P (X = 0) = (1 − 0.002)1000 = 0.9981000 . We can also write this probability as P (X = 0) = 2 1− 1000 1000 (= 0.13506). From calculus we know, that x n = ex . n→∞ n Therefore the probability for no typo on the page is approximately lim 1− P (X = 0) ≈ e−2 (= 0.13534). • the probability for exactly one typo on a page is 1000 P (X = 1) = 0.002 · 0.998999 1 (= 0.27067). We can write this as 2 P (X = 1) = 1000 · 1000 1− 2 1000 999 ≈ 2 · e−2 (= 0.27067) 34 CHAPTER 2. RANDOM VARIABLES • the probability for exactly two typos on a page is 1000 P (X = 2) = 0.0022 · 0.998998 2 (= 0.27094), which we again re-write to 1000 · 999 22 P (X = 2) = · 2 1000 · 1000 2 1− 1000 998 ≈ 2 · e−2 (= 0.27067) • and a last one: the probability for exactly three typos on a page is 1000 P (X = 3) = 0.0023 · 0.998997 (= 0.18063), 3 which is P (X = 3) = 2.2.5 1000 · 999 · 998 23 · 3·2 1000 · 1000 · 1000 1− 2 1000 997 ≈ 23 −2 ·e 3! (= 0.18045) Compound Discrete Probability Mass Functions Real problems very seldom concern a single random variable. As soon as more than 1 variable is involved it is not sufficient to think of modeling them only individually - their joint behavior is important. Again, the How do we specify probabilities for more than one random variable at a time? individual probabili- Consider the 2 variable case: X, Y are two discrete variables. The joint probability mass function is defined ties must be between 0 as and 1 and their sum PX,Y (x, y) := P (X = x ∩ Y = y) must be 1. Example 2.2.6 A box contains 5 unmarked PowerPC G4 processors of different speeds: 2 400 mHz 1 450 mHz 2 500 mHz Select two processors out of the box (without replacement) and let X = speed of the first selected processor Y = speed of the second selected processor For a sample space we can draw a table of all the possible combinations of processors. We will distinguish between processors of the same speed by using the subscripts 1 or 2 . Ω 4001 4002 4001 x 4002 x 450 x x 5001 x x 5002 x x 1st processor 450 x x x x 5001 x x x x 5002 x x x x - 2nd processor In total we have 5 · 4 = 20 possible combinations. Since we draw at random, we assume that each of the above combinations is equally likely. This yields the following probability mass function: 2.2. SPECIAL DISCRETE PROBABILITY MASS FUNCTIONS 400 450 500 (mHz) 1st proc. 400 0.1 0.1 0.2 35 2nd processor 450 500 (mHz) 0.1 0.2 0.0 0.1 0.1 0.1 What is the probability for X = Y ? this might be important if we wanted to match the chips to assemble a dual processor machine: P (X = Y ) = pX,Y (400, 400) + pX,Y (450, 450) + pX,Y (500, 500) = = 0.1 + 0 + 0.1 = 0.2. 
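The bookkeeping in this example is easy to mechanize. A small R sketch of my own that stores the joint pmf as a matrix (rows = speed of the first processor X, columns = speed of the second processor Y) and recovers P(X = Y) as well as the marginal pmfs computed by hand just below:

# joint pmf of (X, Y) from the processor example
speeds <- c(400, 450, 500)
pXY <- matrix(c(0.1, 0.1, 0.2,
                0.1, 0.0, 0.1,
                0.2, 0.1, 0.1),
              nrow = 3, byrow = TRUE,
              dimnames = list(speeds, speeds))
sum(diag(pXY))   # P(X = Y) = 0.2
rowSums(pXY)     # marginal pmf of X: 0.4 0.2 0.4
colSums(pXY)     # marginal pmf of Y: 0.4 0.2 0.4
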
Another example: What is the probability that the first processor has higher speed than the second? P (X > Y ) = pX,Y (400, 450) + pX,Y (400, 500) + pX,Y (450, 500) = = 0.1 + 0.2 + 0.1 = 0.4. We can go from joint probability mass functions to individual pmfs: X pX (x) = pX,Y (x, y) y X “marginal” pmfs pY (y) = pX,Y (x, y) x Example 2.2.7 Continued For the previous example the marginal probability mass functions are x pX (x) 400 0.4 450 0.2 500 (mHz) 0.4 y pY (y) 400 0.4 450 0.2 500 (mHz) 0.4 Just as we had the notion of expected value for functions with a single random variable, there’s an expected value for functions in several random variables: X E[h(X, Y )] := h(x, y)pX,Y (x, y) x,y Example 2.2.8 Continued Let X, Y be as before. What is E[|X − Y |] (the average speed difference)? here, we have the situation E[|X − Y |] = E[h(X, Y )], with h(X, Y ) = |X − Y |. Using the above definition of expected value gives us: X E[|X − Y |] = |x − y|pX,Y (x, y) = x,y = |400 − 400| · 0.1 + |400 − 450| · 0.1 + |400 − 500| · 0.2 + |450 − 400| · 0.1 + |450 − 450| · 0.0 + |450 − 500| · 0.1 + |500 − 400| · 0.2 + |500 − 450| · 0.1 + |500 − 500| · 0.1 = = 0 + 5 + 20 + 5 + 0 + 5 + 20 + 5 + 0 = 60. 36 CHAPTER 2. RANDOM VARIABLES The most important cases for h(X, Y ) in this context are linear combinations of X and Y . For two variables we can measure how “similar” their values are: Definition 2.2.1 (Covariance) The covariance between two random variables X and Y is defined as: Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])] Note, that this definition looks very much like the definition for the variance of a single random variable. In fact, if we set Y := X in the above definition, the Cov(X, X) = V ar(X). Definition 2.2.2 (Correlation) The (linear) correlation between two variables X and Y is % := p Cov(X, Y ) V ar(X) · V ar(Y ) read: “rho” Facts about %: • % is between -1 and 1 • if % = 1 or -1, Y is a linear function of X %=1 % = −1 → Y = aX + b with a > 0, → Y = aX + b with a < 0, % is a measure of linear association between X and Y . % near ±1 indicates a strong linear relationship, % near 0 indicates lack of linear association. Example 2.2.9 Continued What is % in our box with five chips? Check: E[X] = E[Y ] = 450 Use marginal pmfs to compute! V ar[X] = V ar[Y ] = 2000 The covariance between X and Y is: X Cov(X, Y ) = (x − E[X])(y − E[Y ])pX,Y (x, y) = x,y = (400 − 450)(400 − 450) · 0.1 + (450 − 450)(400 − 450) · 0.1 + (500 − 450)(400 − 450) · 0.2 + (400 − 450)(450 − 450) · 0.1 + (450 − 450)(450 − 450) · 0.0 + (500 − 450)(450 − 450) · 0.1 + (400 − 450)(500 − 450) · 0.2 + (450 − 450)(500 − 450) · 0.1 + (500 − 450)(500 − 450) · 0.1 = = 250 + 0 − 500 + 0 + 0 + 0 − 500 + 250 + 0 = −500. % therefore is %= p Cov(X, Y ) V ar(X)V ar(Y ) = −500 = −0.25, 2000 % indicates a weak negative (linear) association. Definition 2.2.3 (Independence) Two random variables X and Y are independent, if their joint probability pX,Y is equal to the product of the marginal densities pX · pY . 2.3. CONTINUOUS RANDOM VARIABLES 37 Note: so far, we’ve had a definition for the independence of two events A and B: A and B are independent, if P (A ∩ B) = P (A) · P (B). Random variables are independent, if all events of the form X = x and Y = y are independent. Example 2.2.10 Continued Let X and Y be defined as previously. Are X and Y independent? Check: pX,Y (x, y) = pX (x) · pY (y) for all possible combinations of x and y. 
Trick: whenever there is a zero in the joint probability mass function, the variables cannot be independent: pX,Y (450, 450) = 0 6= 0.2 · 0.2 = pX (450) · pY (450). Therefore, X and Y are not independent! More properties of Variance and Expected Values Theorem 2.2.4 If two random variables X and Y are independent, E[X · Y ] V ar[X + Y ] = = E[X] · E[Y ] = V ar[X] + V ar[Y ] Theorem 2.2.5 For two random variables X and Y and three real numbers a, b, c holds: V ar[aX + bY + c] = a2 V ar[X] + b2 V ar[Y ] + 2ab · Cov(X, Y ) Note: by comparing the two results, we see that for two independent random variables X and Y , the covariance Cov(X, Y ) = 0. Example 2.2.11 Continued E[X − Y ] V ar[X − Y ] 2.3 = E[X] − E[Y ] = 450 − 450 = 0 = V ar[X] + (−1)2 V ar[Y ] − 2 Cov(X, Y ) = 2000 + 2000 + 1000 = 5000 Continuous Random Variables All previous considerations for discrete variables have direct counterparts for continuous variables. So far, a lot of sums have been involved, e.g. to compute the distribution functions or expected values. Summing over (uncountable) infinite many values corresponds to an integral. The main trick in working with continuous random variables is to substitute all sums by integrals in the definitions. As in the case of a discrete random variable, we define a distribution function as the probability that a random variable has outcome t or a smaller value: Definition 2.3.1 (probability distribution function) Assume X is a continuous random variable: The function FX (t) := P (X ≤ t) is called the probability distribution function of X. The only difference to the discrete case is that the distribution function of a continuous variable is not a stairstep function: 38 CHAPTER 2. RANDOM VARIABLES Properties of FX variable X. The following properties hold for the probability distribution function FX for random • 0 ≤ FX (t) ≤ 1 for all t ∈ R • FX is monotone increasing, (i.e. if x1 ≤ x2 then FX (x1 ) ≤ FX (x2 ).) • limt→−∞ FX (t) = 0 and limt→∞ FX (t) = 1. f (x) is no probability! f (x) may be > 1. Now, however, the situation is slightly different from the discrete case: Definition 2.3.2 (density function) For a continuous variable X with distribution function FX the density function of X is defined as: 0 fX (x) := FX (x). Theorem 2.3.3 (Properties of f (x)) A function fX is a density function of X, if (i) fX (x) ≥ 0 for all x, R∞ (ii) −∞ f (x)dx = 1. Relationship between fX and FX Since the density function fX is defined as the derivative of the distribution function, we can re-gain the distribution function from the density by integrating: Then Rt • FX (t) = P (X ≤ t) = −∞ f (x)dx • P (a ≤ X ≤ b) = Rb a f (x)dx Therefore, Z P (X = a) = P (a ≤ X ≤ a) = a f (x)dx = 0. a Example 2.3.1 Let Y be the time until the first major failure of a new disk drive. A possible density function for Y is −y e y>0 f (y) = 0 otherwise First, we need to check, that f (y) is actually a density function. Obviously, f (y) is a non-negative function on whole of R. The second condition, f must fulfill to be a density of Y is Z ∞ Z ∞ f (y)dy = e−y dy = −e−y |∞ 0 = 0 − (−1) = 1 −∞ 0 What is the probability that the first major disk drive failure occurs within the first year? Z P (Y ≤ 1) = 1 e−y dy = −e−y |10 = 1 − e−1 ≈ 0.63. 0 What is the distribution function of Y ? Z t Z t f (y)dy = e−y dy = 1 − e−t for all t ≥ 0. FY (t) = ∞ 0 2.4. SOME SPECIAL CONTINUOUS DENSITY FUNCTIONS f(y) 39 density function of Y y F(y) distribution function of Y y Figure 2.1: Density and Distribution function of random variable Y . 
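Both integrals in this example are easy to check numerically. A small R sketch of my own, using integrate() for the numerical integration and pexp(), the built-in distribution function of this particular density (it is the exponential distribution with rate 1, introduced formally in Section 2.4.2):

# density of Y, the time until the first major disk drive failure
f <- function(y) ifelse(y > 0, exp(-y), 0)

integrate(f, lower = 0, upper = Inf)   # equals 1: f is a valid density
integrate(f, lower = 0, upper = 1)     # P(Y <= 1) = 1 - exp(-1), about 0.63
pexp(1, rate = 1)                      # same probability via the built-in distribution function
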
Summary: discrete random variable image Im(X) finite or countable infinite continuous random variable image Im(X) uncountable probability distribution function: P FX (t) = P (X ≤ t) = k≤btc pX (k) FX (t) = P (X ≤ t) = probability mass function: pX (x) = P (X = x) probability density function: 0 fX (x) = FX (x) expected value: P E[h(X)] = x h(x) · pX (x) E[h(X)] = variance: V ar[X] =P E[(X − E[X])2 ] = = x (x − E[X])2 pX (x) V ar[X] =RE[(X − E[X])2 ] = ∞ = −∞ (x − E[X])2 fX (x)dx 2.4 2.4.1 R x Rt ∞ f (x)dx h(x) · fX (x)dx Some special continuous density functions Uniform Density One of the most basic cases of a continuous density is the uniform density. On the finite interval (a, b) each value has the same density (cf. diagram 2.2): 1 if a < x < b b−a f (x) = 0 otherwise The distribution function FX is Ua,b (x) := FX (x) = 0 x b−a 1 if x ≤ a if a < x < b if x ≥ b. We now know how to compute expected value and variance of a continuous random variable. 40 CHAPTER 2. RANDOM VARIABLES f(x) 1/ uniform density on (a,b) (b-a) a b x Figure 2.2: Density function of a uniform variable X on (a, b). Assume, X has a uniform distribution on (a, b). Then Z b 1 1 1 2b dx = x | = b−a b−a2 a a b2 − a2 1 = = (a + b). 2(b − a) 2 Z b a+b 2 1 (b − a)2 V ar[X] = (x − ) dx = . . . = . 2 b−a 12 a E[X] x = Example 2.4.1 The(pseudo) random number generator on my calculator is supposed to create realizations of U (0, 1) random variables. Define U as the next random number the calculator produces. What is the probability, that the next number is higher than 0.85? 1 For that, we want to compute P (U ≥ 0.85). We know the density function of U : fU (u) = 1−0 = 1. Therefore Z 1 P (U ≥ 0.85) = 1du = 1 − 0.85 = 0.15. 0.85 2.4.2 Exponential distribution This density is commonly used to model waiting times between occurrences of “rare” events, lifetimes of electrical or mechanical devices. Definition 2.4.1 (Exponential density) A random variable X has exponential density (cf. figure 2.3), if λe−λx if x ≥ 0 fX (x) = 0 otherwise λ is called the rate parameter. Mean, variance and distribution function are easy to compute. They are: E[X] = V ar[X] = Expλ (t) = 1 λ 1 λ2 FX (t) = 0 1 − e−λx if x < 0 if x ≥ 0 The following example will accompany us throughout the remainder of this class: we expect X to be in the middle between a and b - makes sense, doesn’t it? 2.4. SOME SPECIAL CONTINUOUS DENSITY FUNCTIONS 41 f2 f1 f0.5 x Figure 2.3: Density functions of exponential variables for different rate parameters 0.5, 1, and 2. Example 2.4.2 Hits on a webpage On average there are 2 hits per minute on a specific web page. I start to observe this web page at a certain time point 0, and decide to model the waiting time till the first hit Y (in min) using an exponential distribution. What is a sensible value for λ, the rate parameter? Think: on average there are 2 hits per minute - which makes an average waiting time of 0.5 minutes between hits. We will use this value as the expected value for Y : E[Y ] = 0.5. On the other hand, we know, that the expected value for Y is 1/λ. → we are back at 2 = λ as a sensible choice for the parameter! λ describes the rate, at which this web page is hit! What is the probability that we have to wait at most 40 seconds to observe the first hit? ok, we know the rate at which hits come to the web page in minutes - so, it’s advisable to express the 40s in minutes also: The above probability then becomes: What is the probability that we have to wait at most 2/3 min to observe the first hit? 
This, we can compute: P (Y ≤ 2/3) = Expλ (2/3) = 1 − −e−2/3·2 ≈ 0.736 How long do we have to wait at most, to observe a first hit with a probability of 0.9? This is a very different approach to what we have looked at so far! Here, we want to find a t, for which P (Y ≤ t) = 0.9: P (Y ≤ t) = 0.9 ⇐⇒ 1 − e−2t = 0.9 ⇐⇒ e−2t = 0.1 ⇐⇒ t = −0.5 ln 0.1 ≈ 1.15 (min) - that’s approx. 69 s. Memoryless property Example 2.4.3 Hits on a web page In the previous example I stated that we start to observe the web page a time point 0. Does the choice of this time point affect our analysis in any way? Let’s assume, that during the first minute after we started to observe the page, there is no hit. What is the probability, that we have to wait for another 40 seconds for the first hit? - this implies an answer to the question, what would have happened, if we had started our observation of the web page a minute later - would we still get the same results? 42 CHAPTER 2. RANDOM VARIABLES The probability we want to compute is a conditional probability. If we think back - the conditional probability of A given B was defined as P (A ∩ B) P (A|B) := P (B) Now, we have to identify, what the events A and B are in our case. The information we have is, that during the first minute, we did not observe a hit =: B, i.e. B = (Y > 1). The probability we want to know, is that we have to wait another 40 s for the first hit: A = wait for 1 min and 40 s for the first hit (= Y ≤ 5/3). P ( first hit within 5/3 min P (A ∩ B) P (Y ≤ 5/3 ∩ Y > 1) = = P (B) P (Y > 1) | no hit during 1st min) = P (A|B) = = P (1 < Y ≤ 5/3) e−2 − e−10/3 = 0.736. = 1 − P (Y < 1) e−2 That’s exactly the same probability as we had before!!! The result of this example is no coincidence. We can generalize: P (Y ≤ t + s|Y ≥ s) = 1 − e−λt = P (Y ≤ t) This means: a random variable with an exponential distribution “forgets” about its past. This is called the memoryless property of the exponential distribution. An electrical or mechanical device whose lifetime we model as an exponential variable therefore “stays as good as new” until it suddenly breaks, i.e. we assume that there’s no aging process. 2.4.3 Erlang density Example 2.4.4 Hits on a web page Remember: we modeled waiting times until the first hit as Exp2 . How long do we have to wait for the second hit? In order to get the waiting time for the second hit, we can add the waiting times until the first hit and the time between the first and the second hit. For both of these we know the distribution: Y1 , the waiting time until the first hit is an exponential variable with λ = 2. After we have observed the first hit, we start the experiment again and wait for the next hit. Since the exponential distribution is memoryless, this is as good as waiting for the first hit. We therefore can model Y2 , the time between first and second hit, by another exponential distribution with the same rate λ = 2. What we are interested in is Y := Y1 + Y2 . Unfortunately, we don’t know the distribution of Y , yet. Definition 2.4.2 (Erlang density) If Y1 , . . . , Yk are k independent exponential random variables with parameter λ, their sum X has an Erlang distribution: k X X := Yi is Erlang(k,λ) i=1 The Erlang density fk,λ is ( f (x) = λe −λx 0 k−1 · (λx) (k−1)! k is called the stage parameter, λ is the rate parameter. x<0 for x ≥ 0 2.4. 
SOME SPECIAL CONTINUOUS DENSITY FUNCTIONS 43 Expected value and variance of an Erlang distributed variable X can be computed using the properties of expected value and variance for sums of independent random variables: E[X] V ar[X] k k X X 1 = E[ Yi ] = E[Yi ] = k · λ i=1 i=1 = V ar[ k X Yi ] = i=1 k X V ar[Yi ] = k · i=1 1 λ2 In order to compute the distribution function, we need another result about the relationship between P oλ and Expλ . Theorem 2.4.3 If X1 , X2 , X3 , . . . are independent exponential random variables with parameter λ and (cf. fig. 2.4) W := largest index j such that j X Xi ≤ T i=1 for some fixed T > 0. Then W ∼ P oλT . * 0 X1 * X2 * X3 * * <- occurrence times T Figure 2.4: W = 3 in this example. With this theorem, we can derive an expression for the Erlang distribution function. Let X be an Erlangk,λ variable: Erlangk,λ (x) = P (X ≤ x) = 1st trick = 1 − P (X > x) = 1 − P( X Yi > x) above theorem = i | {z } less than k hits observed = 1 − P o( a Poisson r.v. with rate xλ ≤ k − 1) = = 1 − P oλx (k − 1). Example 2.4.5 Hits on a web page What is the density of the waiting time until the next hit? We said that Y as previously defined, is the sum of two exponential variables, each with rate λ = 2. X has therefore an Erlang distribution with stage parameter 2, and the density is given as fX (x) = fk,λ (x) = 4xe−2x for x ≥ 0 If we wait for the third hit, what is the probability that we have to wait more than 1 min? Z := waiting time until the third hit has an Erlang(3,2) distribution. P (Z > 1) = 1 − Erlang3,2 (1) = 1 − (1 − P o2·1 (3 − 1)) = P o2 (2) = 0.677 44 CHAPTER 2. RANDOM VARIABLES Note: The exponential distribution is a special case of an Erlang distribution: Expλ = Erlang(k=1,λ) Erlang distributions are used to model waiting times of components that are exposed to peak stresses. It is assumed that they can withstand k − 1 peaks and fail with the kth peak. We will come across the Erlang distribution again, when modelling the waiting times in queueing systems, where customers arrive with a Poisson rate and need exponential time to be served. 2.4.4 Gaussian or Normal density The normal density is the archetypical “bell-shaped” density. The density has two parameters: µ and σ 2 and is defined as (x−µ)2 1 fµ,σ2 (x) = √ e− 2σ2 2πσ 2 The expected value and variance of a normal distributed r.v. X are: Z ∞ E[X] = xfµ.σ2 (x)dx = . . . = µ −∞ Z ∞ V ar[X] = (x − µ)2 fµ.σ2 (x)dx = . . . = σ 2 . −∞ Note: the parameters µ and σ 2 are actually mean and variance of X - and that’s what they are called. f0,0.5 f0,1 f0,2 x f-1,1 f0,1 f2,1 x Figure 2.5: Normal densities for several parameters. µ determines the location of the peak on the x−axis, σ 2 determines the “width” of the bell. 2.4. SOME SPECIAL CONTINUOUS DENSITY FUNCTIONS 45 The distribution function of X is Z t fµ,σ2 (x)dx Nµ,σ2 (t) := Fµ,σ2 (t) = −∞ Unfortunately, there does not exist a closed form for this integral - fµ,σ2 does not have a simple antiderivative. However, to get probabilities means we need to evaluate this integral. This leaves us with several choices: 1. personal numerical integration uuuh, bad, bad, idea 2. use of statistical software later 3. standard tables of normal probabilities We will use the third option, mainly. First of all: only a special case of the normal distributions is tabled: only positive values of N (0, 1) are tabled - N (0, 1) is the normal distribution, that has mean 0 and a variance of 1. This is the so-called standard normal distribution, also written as Φ. 
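As a quick aside on option 2: in R (which we use later in the simulation chapter) these integrals are available directly. A sketch of my own, where pnorm(t, mean, sd) evaluates the normal distribution function Nµ,σ²(t) — note that it expects the standard deviation σ rather than the variance σ², and that the last line uses the symmetry Φ(−z) = 1 − Φ(z) derived just below:

pnorm(1)                          # Phi(1): probability that a standard normal is <= 1
pnorm(2, mean = 1, sd = sqrt(2))  # P(X <= 2) for X ~ N(1, 2); sd, not variance
pnorm(-2) + (1 - pnorm(2))        # P(|Z| > 2), using the symmetry Phi(-z) = 1 - Phi(z)
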
A table for this distribution is enough, though. We will use several tricks to get any normal distribution into the shape of a standard normal distribution: Basic facts about the normal distribution that allow the use of tables (i) for X ∼ N (µ, σ 2 ) holds: Z := X −µ ∼ N (0, 1) σ This process is called standardizing X. (this is at least plausible, since E[Z] = V ar[Z] = 1 (E[X] − µ) = 0 σ 1 V ar[X] = 1 σ2 (ii) Φ(−z) = 1 − Φ(z) since f0,1 is symmetric in 0 (see fig. 2.6 for an explanation). f0,1 P(Z ≤ -z) P(Z ‡ +z) -z +z x Figure 2.6: standard normal density. Remember, the area below the graph up to a specified vertical line represents the probability that the random variable Z is less than this value. It’s easy to see, that the areas in the tails are equal: P (Z ≤ −z) = P (Z ≥ +z). And we already know, that P (Z ≥ +z) = 1 − P (Z ≤ z), which proves the above statement. Example 2.4.6 Suppose Z is a standard normal random variable. this is, what we are going to do! 46 CHAPTER 2. RANDOM VARIABLES 1. P (Z < 1) = ? P (Z < 1) = Φ(1) straight look-up = 0.8413. 2. P (0 < Z < 1) = ? P (0 < Z < 1) = P (Z < 1) − P (Z < 0) = Φ(1) − Φ(0) look-up = 0.8413 − 0.5 = 0.3413. 3. P (Z < −2.31) = ? P (Z < −2.31) = 1 − Φ(2.31) look-up = 1 − 0.9896 = 0.0204. 4. P (|Z| > 2) = ? P (|Z| > 2) = P (Z < −2) + P (Z > 2) = 2(1 − Φ(2)) (1) look-up = (2) f0,1 (3) f0,1 Example 2.4.7 Suppose, X ∼ N (1, 2) P (1 < X < 2) =? A standardization of X gives Z := P (1 < X < 2) 2(1 − 0.9772) = 0.0456. f0,1 (4) f0,1 X−1 √ . 2 1−1 X −1 2−1 P( √ < √ < √ )= 2 2 2 √ = P (0 < Z < 0.5 2) = Φ(0.71) − Φ(0) = 0.7611 − 0.5 = 0.2611. = Note that the standard normal table only shows probabilities for z < 3.99. This is all we need, though, since P (Z ≥ 4) ≤ 0.0001. Example 2.4.8 Suppose the battery life of a laptop is normally distributed with σ = 20 min. Engineering design requires, that only 1% of batteries fail to last 300 min. What mean battery life is required to ensure this condition? Let X denote the battery life in minutes, then X has a normal distribution with unknown mean µ and standard deviation σ = 20 min. What is µ? The condition, that only 1% of batteries is allowed to fail the 300 min limit translates to: P (X < 300) ≤ 0.01 We must make sure to choose µ such, that this condition holds. 2.5. CENTRAL LIMIT THEOREM (CLT) 47 In order to compute the probability, we must standardize X: Z := Then P (X ≤ 300) = P ( X −µ 20 X −µ 300 − µ 300 − µ 300 − µ ≤ ) = P (Z ≤ ) = Φ( ) 20 20 20 20 The condition requires: P (X ≤ 300) ≤ 0.01 300 − µ ⇐⇒ Φ( ) ≤ 0.01 = 1 − 0.99 = 1 − Φ(2.33) = Φ(−2.33) 20 300 − µ ⇐⇒ ≤ −2.33 20 ⇐⇒ µ ≥ 346.6. Normal distributions have a “reproductive property”, i.e. if X and Y are normal variables, then W := aX + bY is also a normal variable, with: E[W ] V ar[W ] = aE[X] + bE[Y ] = a2 V ar[X] + b2 V ar[Y ] + 2abCov(X, Y ) The normal distribution is extremely common/ useful, for one reason: the normal distribution approximates a lot of other distributions. This is the result of one of the most fundamental theorems in Math: 2.5 Central Limit Theorem (CLT) Theorem 2.5.1 (Central Limit Theorem) If X1 , X2 , . . . , Xn are n independent, identically distributed random variables with E[Xi ] = µ and V ar[Xi ] = σ 2 , then: Pn the sample mean X̄ := n1 i=1 Xi is approximately normal distributed with E[X̄] = µ and V ar[X̄] = σ 2 /n. 2 i.e. X̄ ∼ N (µ, σn ) or P i Xi ∼ N (nµ, nσ 2 ) Corollary 2.5.2 (a) for large n the binomial distribution Bn,p is approximately normal Nnp,np(1−p) . 
(b) for large λ the Poisson distribution P oλ is approximately normal Nλ,λ . (c) for large k the Erlang distribution Erlangk,λ is approximately normal N k , k λ λ2 Why? (a) Let X be a variable with a Bn,p distribution. We know, that X is the result from repeating the same Bernoulli experiment n times and looking at the overall number of successes. We can therefor, write X as the sum of n B1,p variables Xi : X := X1 + X2 + . . . + Xn X is then the sum of n independent, identically distributed random variables. Then, the Central Limit Theorem states, that X has an approximate normal distribution with E[X] = nE[Xi ] = np and V ar[X] = nV ar[Xi ] = np(1 − p). 48 CHAPTER 2. RANDOM VARIABLES (b) it is enough to show the statement for the case that λ is a large integer: Let Y be a Poisson variable with rate λ. Then we can think of Y as the number of occurrences in an experiment that runs for time λ - that is the same as to observe λ experiments that each run independently for time 1 and add their results: Y = Y1 + Y2 + . . . + Yλ , with Yi ∼ P o1 . Again, Y is the sum of n independent, identically distributed random variables. Then, the Central Limit Theorem states, that X has an approximate normal distribution with E[Y ] = λ · 1 and V ar[Y ] = λV ar[Yi ] = λ. (c) this statement is the easiest to prove, since an Erlangk,λ distributed variable Z is by definition the sum of k independently distributed exponential variables Z1 , . . . , Zk . For Z the CLT holds, and we get, that Z is approximately normal distributed with E[Z] = kE[Zi ] = and V ar[Z] = kV ar[Zi ] = λk2 . k λ 2 Why do we need the central limit theorem at all? - first of all, the CLT gives us the distribution of the sample mean in a very general setting: the only thing we need to know, is that all the observed values come from the same distribution, and the variance for this distribution is not infinite. A second reason is, that most tables only contain the probabilities up to a certain limit - the Poisson table e.g. only has values for λ ≤ 10, the Binomial distribution is tabled only for n ≤ 20. After that, we can use the Normal approximation to get probabilities. Example 2.5.1 Hits on a webpage Hits occur with a rate of 2 per min. What is the probability to wait for more than 20 min for the 50th hit? Let Y be the waiting time until the 50th hit. We know: Y has an Erlang50,2 distribution. therefore: P (Y > 20) = 1 − Erlang50,2 (20) = 1 − (1 − P o2·20 (50 − 1)) = = P o40 (49) ≈ N40,40 (49) = 49 − 40 table √ Φ = Φ(1.42) = 0.9222. 40 = CLT ! Example 2.5.2 Mean of Uniform Variables Let U1 , U2 , U3 , U4 , and U5 be standard uniform variables, i.e. Ui ∼ U(0,1) . Without the CLT we would have no idea, what distribution the sample mean Ū = approx 1 With it, we know: Ū ∼ N (0.5, 60 ). Issue: 1 5 Accuracy of approximation • increases with n • increases with the amount of symmetry in the distribution of Xi Rule of thumb for the Binomial distribution: Use the normal approximation for Bn,p , if np > 5 (if p ≤ 0.5) or nq > 5 (if p ≥ 0.5)! P5 i=1 Ui had! Chapter 3 Elementary Simulation 3.1 Basic Problem Simulation allows us to get approximate results to all kinds of probability problems, that we couldn’t solve analytically. The basic problem is: X, Y, . . . , Z | {z } k random variables k independent random variables. Assume, we know how to simulate each of these variables g(x, y, . . . , z) some quite complicated function g of k variables V = g(X, Y, . . . 
, Z) a random variable of interest We might be interested in some aspects of the density of V - e.g. P (13 < V < 17) = ? E[V ] = ? V ar[V ] = ? unless g is simple, k is small, and we are very lucky, we may not be able to solve these problems analytically. Using simulation, we can do the following: steps of Simulation: 1. Simulate some large number (say n) of values for each of the k variables X, Y, . . . , Z. we then have a set of n k-tuples of the form (Xi , Yi , . . . , Zi ) for i = 1, . . . , n 2. plug each (Xi , Yi , . . . , Zi ) into function g and compute Vi : Vi = g(Xi , Yi , . . . , Zi ) for i = 1, . . . , n 3. then approximate (a) P (a ≤ V ≤ b) by #Vi : Vi ∈ [a, b] n 49 50 CHAPTER 3. ELEMENTARY SIMULATION (b) E[h(V )] by n n 1X h(Vi ) n i=1 i.e. E[V ] = 1X Vi = V̄ n i=1 (c) V ar[V ] by n 1X (Vi − V̄ )2 n i=1 We want to be able to perform an experiment with a given set of probabilities. Starting point: 3.2 Random Number Generators Random number generators (rng) produce a stream of numbers that look like realizations of independent standard uniform variables U1 , U2 , U3 , . . . Usually, these numbers are not completely random , but pseudo random. This way, we ensure repeatability of an experiment. (Note: even the trick to link the system’s rand() function to the internal clock, gives you only pseudo random numbers, since the same time will give you exactly the same stream of random numbers.) There are hundreds of methods that have been proposed for doing this - some (most?) are pretty bad. A good method - and, in fact, current standard in most operating systems, is: Linear Congruential Method Definition 3.2.1 (Linear Congruential Sequence) For integers a, c, and m a sequence of “random numbers” xn is defined by: xi ≡ (axi−1 + c) mod m for i = 1, 2, . . . Note: this sequence still depends on the choice of x0 , the so-called seed of the sequence. Choosing different seeds yields different sequences. That way, we get a sequence with elements in [0, m − 1]. We define ui := xmi . The choice of the parameters a, c and m is crucial! obviously, we want to get as many different numbers as possible - therefore m needs to be as large as possible and preferably prime (that way we get rid of small cycles). Example 3.2.1 rng examples Status-quo in industry is the so called Minimal standard generator. It fulfills the common requirements of a rng and at the same time is very fast. Its parameters are: c = 2 a = 16807 m = 231 − 1 An example for a terrible random number generator is the RANDU, with c = 0 a = 65539 m = 231 3.2. RANDOM NUMBER GENERATORS 51 It was widely used, before people discovered how bad it actually is: Knowing two successive random numbers gives you the possibility to predict the next number pretty well. . . . that’s not, how rng s are supposed to work. For more information about random number generators and different techniques, how to produce and check them, look at http://crypto.mat.sbg.ac.at/results/karl/server/ State of the art at the moment is the Marsaglia-Multicarry-RNG: #define znew ((z=36969*(z&65535)+(z>>16))<<16) #define wnew ((w=18000*(w&65535)+(w>>16))&65535) #define IUNI (znew+wnew) #define UNI (znew+wnew)*2.328306e-10 static unsigned long z=362436069, w=521288629; void setseed(unsigned long i1,unsigned long i2){z=i1; w=i2;} /* Whenever you need random integers or random reals in your C program, just insert those six lines at (near?) the beginning of the program. 
In every expression where you want a random real in [0,1) use UNI, or use IUNI for a random 32-bit integer. No need to mess with ranf() or ranf(lastI), etc, with their requisite overheads. Choices for replacing the two multipliers 36969 and 18000 are given below. Thus you can tailor your own in-line multiply-with-carry random number generator. This section is expressed as a C comment, in case you want to keep it filed with your essential six lines: */ /* Use of IUNI in an expression will produce a 32-bit unsigned random integer, while UNI will produce a random real in [0,1). The static variables z and w can be reassigned to i1 and i2 by setseed(i1,i2); You may replace the two constants 36969 and 18000 by any pair of distinct constants from this list: 18000 18030 18273 18513 18879 19074 19098 19164 19215 19584 19599 19950 20088 20508 20544 20664 20814 20970 21153 21243 21423 21723 21954 22125 22188 22293 22860 22938 22965 22974 23109 23124 23163 23208 23508 23520 23553 23658 23865 24114 24219 24660 24699 24864 24948 25023 25308 25443 26004 26088 26154 26550 26679 26838 27183 27258 27753 27795 27810 27834 27960 28320 28380 28689 28710 28794 28854 28959 28980 29013 29379 29889 30135 30345 30459 30714 30903 30963 31059 31083 (or any other 16-bit constants k for which both k*2^16-1 and k*2^15-1 are prime)*/ Armed with a Uniform rng, all kinds of other distributions can be generated: 52 3.2.1 CHAPTER 3. ELEMENTARY SIMULATION A general method for discrete data Consider a discrete pmf with: x p(x) x1 < x2 < . . . < xn p(x1 ) p(x2 ) . . . p(xn ) The distribution function F then is: F (t) = X p(xi ) i,xi ≤t Suppose, we have a sequence of independent, standard uniform random variables U1 , U2 , . . . and they have realizations u1 , u2 , . . . (realizations are real values in (0,1)). then we define the ith element in our new sequence to be xj , if j−1 X p(xk ) ≤ ui ≤ k=1 | j X p(xk ) k=1 {z F (xj−1 ) } {z | F (xj ) } Then X has probability mass function p. This is less complicated than it looks. Have a look at figure 3.1. Getting the right x-value for a specific u is done by drawing a horizontal line from the y-axis to the graph of F and following the graph down to the x-axis. - This is, how we get the inverse of a function, graphically. 1 u1 x u2 x 0 x1 x2 x3 x4 ... xn x Figure 3.1: Getting the value corresponding to ui is done by drawing a straight line to the right, until we hit the graph of F , and following the graph down to xj . Example 3.2.2 Simulate the roll of a fair die Let X be the number of spots on the upturned face. The probability mass function of X is p(i) = i = 1, . . . , 6, the distribution function is FX (t) = btc 6 for all t ∈ (0, 6). We therefore get X from a standard uniform variable U by 1 1 if 0 ≤ U ≤ 6 2 if 16 < U ≤ 62 3 if 2 < U ≤ 3 6 6 X= 3 4 4 if < U ≤ 6 6 5 if 46 < U ≤ 65 6 if 56 < U ≤ 66 A faster definition than the one above is X = d6 · U e. 1 6 for all 3.2. RANDOM NUMBER GENERATORS 3.2.2 53 A general Method for Continuous Densities Consider a continuous density f with distribution function F . We know (cf. fig. 3.2) that F : (x0 , ∞) 7→ (0, 1) (x0 could be −∞) has an inverse function: F −1 : (0, 1) 7→ (x0 , ∞) FX x xo Figure 3.2: Starting at some value x0 any continuous distribution function has an inverse. In this example, x0 = 1. General Method: For a given standard uniform variable U ∼ U(0,1) we define −1 X := FX (U ) Then X has distribution FX . Why? For a proof of the above statement, we must compute the distribution function X has. 
Remember, the distribution function of X, FX at value x is the probability that X is x or less: P (X ≤ x) trick = apply FX to both sides of the inequality = P (FX (X) ≤ FX (x)) dfn of U = = P (U ≤ FX (x)) = U is a standard uniform variable,P (U ≤ t) = t = F (x). Therefore, X has exactly the distribution, we wanted it to have. Example 3.2.3 Simulate from Expλ Suppose, we want to simulate a random variable X that has an exponential distribution with rate λ. How do we do this based on a standard uniform variable U ∼ U(0,1) ? The distribution function for Expλ is 0 for x ≤ 0 Expλ (x) = 1 − e−λx for x ≥ 0 So, Expλ : (0, ∞) 7→ (0, 1) has an inverse: Let u be a positive real number: u = 1 − e−λx ⇐⇒ 1 − u = e−λx ⇐⇒ ln(1 − u) = −λx ⇐⇒ x 1 −1 (u) = − ln(1 − u) =: FX λ 2 54 CHAPTER 3. ELEMENTARY SIMULATION then X := − λ1 ln(1 − U ) has an exponential distribution with rate λ. In fact, since 1 − U is uniform, if U is uniform, we could also have used X := − λ1 ln U For specific densities there are a lot of different special tricks for simulating observations: For all of the next sections, let’s assume that we have a sequence of independent standard uniform variables U1 , U2 , U3 , . . . 3.2.2.1 Simulating Binomial & Geometric distributions Let p be the rate of success for a single Bernoulli trial. define: Xi = 0 1 if ui ≥ p if ui < p Then X := n X Xi ∼ Bn,p i=1 and W := # of Xi until the first is 1 W ∼ Geometricp 3.2.2.2 Simulating a Poisson distribution With given U , we know, X = − λ1 ln U has an exponential distribution with rate λ. Define j j+1 X X Y := largest index j such that Xi ≤ 1 and Xi > 1 i=1 i=1 then Y ∼ P oλ 3.2.2.3 Simulating a Normal distribution To simulate a normal distribution, we need two sequences of standard uniform variables. Let U1 and U2 be two independent standard uniform variables. Define −1/2 cos(2πU2 ) −1/2 sin(2πU2 ) Z1 := [−2 ln U1 ] Z2 := [−2 ln U1 ] Then both Z1 and Z2 have a standard normal distribution and are independent, Z1 , Z2 ∼ N (0, 1) and X := µ + σZi ∼ N (µ, σ 2 ) 3.3. EXAMPLES 3.3 55 Examples Example 3.3.1 Simple electric circuit Consider an electric circuit with three resistors as shown in the diagram: Simple Physics predicts that R, the overall resistance is: R = R1 + ( 1 1 −1 R2 · R3 + ) = R1 + R1 R3 R2 + R3 Assume, the resistors are independent and have a normal distribution with mean 100 Ω and a standard deviation of 2 Ω. What should we expect for R, the overall resistance? The following lines are R output from a simulation of 1000 values of R: # Example: Simple Electric Circuit # # Goal: Simulate 1000 random numbers for each of the resistances R1, R2 and R3. # Compute R, the overall resistance, from those values and get approximations for # expectated value, variance and probabilities: # # rnorm (n, mean=0, sd = 1) generates n normal random numbers with the specified # mean and standard deviation # R1 <- rnorm (1000, mean=100, sd = 2) R2 <- rnorm (1000, mean=100, sd = 2) R3 <- rnorm (1000, mean=100, sd = 2) # # compute R: R <- R1 + R2*R3/(R2 + R3) # # now get the estimates: mean(R) > [1] 149.9741 sd(R) > [1] 2.134474 # # ... the probability that R is less than 146 is given by the number of values # that are less than 146 divided by 1000: sum(R<146)/1000 > [1] 0.04 Example 3.3.2 at MacMall Assume, you have a summer job at MacMall, your responsibility are the Blueberry IMacs in stock. At the start of the day, you have 20 Blueberry IMacs in stock. 
We know: X = # of IMacs ordered per day is Poisson with mean 30 Y = # of IMacs received from Apple is Poisson with mean 15 a day 56 CHAPTER 3. ELEMENTARY SIMULATION Question: What is the probability that at the end of the day you have inventory left in the stock. Let I be the number of Blueberry IMacs in stock at the end of the day. I = 20 − X + Y Asked for is the probability that I ≥ 1). Again, we use R for simulating I: # Example: MacMall # # Goal: generate 1000 Poisson values with lambda = 30 # # Remember: 1 Poisson value needs several exponential values # step 1: produce exponential values u1 <- runif(33000) e1 <- -1/30*log(u1) sum(e1) [1] 1099.096 # # sum of the exponential values is > 1000, therefore we have enough values # to produce 1000 Poisson values # # step 2: # add the exponential values (cumsum is cumulative sum) E1 <- cumsum(e1) E1[25:35] [1] 0.7834028 0.7926534 0.7929962 0.7959631 0.8060001 0.8572329 0.8670336 [8] 0.8947401 1.0182220 1.0831698 1.1001983 E1 <- floor(E1) E1[25:35] [1] 0 0 0 0 0 0 0 0 1 1 1 # # Each time we step over the next integer, we get another Poisson value # by counting how many exponential values we needed to get there. # # step 3: # The ’table’ command counts, how many values of each integer we have X <- table(E1) X[1:10] 0 1 2 3 4 5 6 7 8 9 32 26 31 32 17 27 31 33 32 31 # # we have 1099 values, we only need 1000 X <- X[1:1000] # # check, whether X is a Poisson variable (then, e.g. mean and variance # must be equal to lambda, which is 30 in our example) # mean(X) [1] 30.013 var(X) [1] 29.84067 3.3. EXAMPLES 57 # # generate another 1000 Poisson values, this time lambda is 15 Y <- rpois(1000,15) # looks a lot easier! # # now compute the variable of interest: I is the number of Blueberry IMacs # we have in store at the end of the day I <- 20 - X + Y # # and, finally, # the result we were looking for; # the (empirical) probability, that at the end of the day there are still # computers in the store: sum(I > 0)/1000 [1] 0.753 Using simulation gives us the answer, that with an estimated probability of 0.753 there will be Blueberry IMacs in stock at the end of the day. Why does simulation work? On what properties do we rely when simulating? P (V ∈ [a, b]) approximated by p̂ = #Vi :Vni ∈[a,b] Pn 1 E[h(V )] approximated by h̄ := n i=1 h(Vi ) Pn 1 2 V ar[V ] approximated by i=1 (Vi − V̄ ) n Suppose V1 = g(X1 , Y1 , . . . , Z1 ), V2 = g(X2 , Y2 , . . . , Z2 ) . . . Vn = g(Xn , Yn , . . . , Zn ) are i.i.d then #{Vi : Vi ∈ [a, b]} ∼ Bn,p with p = P (V ∈ [a, b]), n = # trials. So, we can compute expected value and variance of p̂: n E[p̂] = 1 1X E[Vi ] = n · p = p n i=1 n V ar[p̂] = n 1 1 X p(1 − p) 1 V ar[Vi ] = 2 n · p(1 − p) = ≤ → 0 for n → ∞ n2 i=1 n n 4n i.e. we have the picture that for large values of n, p̂ has a density centered at the “true” value for P (V ∈ [a, b]) with small spread. i.e. for large n p̂ is close to p with high probability. Similarly, for Vi i.i.d, h(Vi ) are also i.i.d. Then n 1X E[h(Vi )] = E[h(V )] E[h̄] = n i=1 and V ar[h̄] = n 1 X V ar[h(Vi )] = V ar[h(V )]/n → 0 for n → ∞. n2 i=1 Once again we have that picture for h̄, that the density for h̄ is centered at E[h(V )] for large n and has small spread. 58 CHAPTER 3. ELEMENTARY SIMULATION Chapter 4 Stochastic Processes Definition 4.0.1 (Stochastic Process) A stochastic process is a set of random variables indexed by time: X(t) Modeling requires somehow (mathematically consistent) specifying the joint distribution (X(t1 ), X(t2 ), X(t3 ), . . . , X(tk )) for any choice of t1 < t2 < t3 < . 
. . < tk . Values of X(t) are called states, the set of all possible values for X(t) is called the state space. We have been looking at a Poisson process for some time - our example “hits on a web page” is a typical example for a Poisson process - so, here’s a formal definition: 4.1 Poisson Process Definition 4.1.1 (Poisson Process) A stochastic process X(t) is called homogenous Poisson process with rate λ, if 1. for t > 0 X(t) takes values in {0, 1, 2, 3, . . .}. distribution depends only on length of interval 2. for any 0 ≤ t1 < t2 : X(t2 ) − X(t1 ) ∼ P oλ(t2 −t1 ) non-overlapping intervals are independent 3. for any 0 ≤ t1 < t2 ≤ t3 < t4 Xt2 − Xt1 is independent from Xt4 − Xt3 Jargon: X(t) is a “counting process” with independent Poisson increments. Example 4.1.1 Hits on a web page Number of hits on a webpage Counter X(t) A counter of the number of hits on our webpage is an example for a Poisson Process with rate λ = 2. In the example X(t) = 3 for t between 5 and 8 minutes. Time t (in min) 59 60 CHAPTER 4. STOCHASTIC PROCESSES Note: • X(t) can be thought of as the number of occurrences until time t. • Similarly, X(t2 ) − X(t1 ) is the number of occurrences in the interval (t1 , t2 ]. • With the same argument, X(0) = 0 - ALWAYS! • The distribution of X(t) is Poisson with rate λt, since: X(t) = X(t) − X(0) ∼ P oλ(t−0) For a given Poisson process X(t) we define occurrences O0 = 0 Oj = time of the jth occurrence = = the first t for which X(t) ≥ j and the inter-arrival time between successive hits: Ij = Oj − Oj−1 for j = 1, 2, . . . The time until the kth hit Ok is therefore given as the sum of inter-arrival times Ok = I1 + . . . + Ik . Theorem 4.1.2 X(t) is a Poisson process with rate λ ⇐⇒ The inter-arrival times I1 , I2 , . . . are i.i.d. Expλ . Further: the time until the kth hit Ok is an Erlangk,λ distributed variable, ⇐⇒ X(t) is a Poisson process with rate λ. This theorem is very important! - it links the Poisson, Exponential, and Erlang distributions tightly together. Consider the following very important example: Example 4.1.2 Hits on a webpage Hits on a popular Web page occur according to a Poisson Process with a rate of 10 hits/min. One begins observation at exactly noon. 1. Evaluate the probability of 2 or less hits in the first minute. Let X be the number of hits in the first minute, then X is a Poisson variable with λ = 10: P (X ≤ 2) = P o10 (2) = e−10 + 10 · e−10 + 102 /2e−10 = 0.0028. or table-lookup p.788 2. Evaluate the probability that the time till the first hit exceeds 10 seconds. Let Y be the time until the first hit - then Y has an Exponential distribution with parameter λ = 10 per minute or λ = 1/6 per second. P (Y ≥ 10) = 1 − P (Y ≤ 10) = 1 − (1 − e−10·1/6 ) = e−5/3 = 0.1889. 3. Evaluate the mean and the variance of the time till the 4th hit. Let Z be the time till the 4th hit. Then Z has an Erlang distribution with stage parameter k = 4 and λ = 10 per minute. E[Z] = V ar[Z] = 4 k = = 0.4 minutes λ 10 k 4 = = 0.04 minutes2 . λ2 100 4.2. BIRTH & DEATH PROCESSES 61 4. Evaluate the probability that the time till the 4th hit exceeds 24 seconds. P (Z > 24) = 1 − P (Z ≤ 24) = 1 − Erlang4,1/6 (24) = = 1 − (1 − P o1/6·24 (4 − 1)) = P o4 (3) table,p.786 = 0.433 5. The number of hits in the first hour is Poisson with mean 600. You would like to know the probability of more than 650 hits. Exact calculation isn’t really feasible. So approximate this probability and justify your approximation. 
A Poisson distribution with large rate λ can be approximated by a normal distribution (corollary from the Central Limit Theorem) with mean µ = λ and variance σ 2 = λ. Then X approx ∼ N (600, 600) → Z := approx X−600 √ ∼ 600 N (0, 1). Then: P (X > 650) = 1 − P (X ≤ 650) = 1 − P ≈ 1 − Φ(2.05) table, p.789 = 650 − 600 ≈ Z≤ √ 600 1 − 0.9798 = 0.0202. Another interesting property of the Poisson process model that’s consistent with thinking of it as “random occurrences” in time t, is Theorem 4.1.3 Let X(t) be a Poisson process. Given that X(T ) = k, the conditional distribution of the time of the k occurrences O1 , . . . , Ok is the same as the distribution of k ordered independent standard uniform variables U(1) , U(2) , . . . , U(k) . This tells us a way to simulate a Poisson process with rate λ on the interval (0, T ): - first, draw a Poisson value w from P oλT . - This tells us, how many uniform values Ui we need to simulate. - secondly, generate w many standard uniform values u1 , . . . , uw - define oi = T · u(i) , where u(i) is the ith smallest value among u1 , . . . , uw . The above theorem tells us, that, if we pick k values at random from an interval (0, t), we can assume, that if we order them, the distance between two successive values has an exponential distribution with rate λ = k/t. So far, we are looking only at arrivals of events. Besides that, we could, for example, look at the number of surfers that are on our web site at the same time. There, we have departures as well and, related to that, the time each surfer stays - which we will call service time (from the perspective of the web server). This leads us to another model: 4.2 Birth & Death Processes Birth & Death Processes (B+D) are a generalization of Poisson processes, that allow the modelling of queues, i.e. we assume, that arrivals stay some time in the system and leave again after that. A B+D process X(t) is a stochastic process that monitors the number of people in a system. If X(t) = k, we assume that at time t there are k people in the system. Again, X(t) is called the state at time t. X(t) is in {0, 1, 2, 3 . . .}, for all t. We can visualize (see fig. 4.1) the set-up for a B+D process in a state diagram as movements between consecutive states. Conditional on X(t) = k we either move to state k + 1 or to k − 1, depending on whether a birth or a death occurs first. 62 CHAPTER 4. STOCHASTIC PROCESSES 0 1 2 3 ... Figure 4.1: State diagram of a Birth & Death process. Example 4.2.1 Stat Printer The ”heavy-duty” printer in the Stats department gets 3 jobs per hour. On average, it takes 15 min to complete printing. The printer queue is monitored for a day (8h total time): Jobs arrive at the following points in time (in h): job i arrival time 1 0.10 2 0.40 3 0.78 4 1.06 5 1.36 6 1.84 7 1.87 8 2.04 9 3.10 10 4.42 job i arrival time 11 4.46 12 4.66 13 4.68 14 4.89 15 5.01 16 5.56 17 5.56 18 5.85 19 6.32 20 6.99 The printer finishes jobs at: job i finishing time 1 0.22 2 0.63 3 1.61 4 1.71 5 1.76 6 1.90 7 2.32 8 2.68 9 3.42 10 4.67 job i finishing time 11 5.31 12 5.54 13 5.59 14 5.62 15 5.84 16 6.04 17 6.83 18 7.10 19 7.23 20 7.39 Let X(t) be the number of jobs in the printer and its queue at time t. X(t) is a Birth & Death process. (a) Draw the graph of X(t) for the values monitored. Number of jobs in the system at time t 5 4 3 X(t) 2 1 0 0 2 4 6 Time (in h) (b) What is the (empirical) probability that there are 5 jobs in the printer and its queue at some time t? 
The empirical probability for 5 jobs in the printer is the time, X(t) is in state 5 divided by the total time: \= 5) = (5.31 − 5.01) + (5.59 − 5.56) = 0.33 = 0.04125. P (X(t) 8 8 4.2. BIRTH & DEATH PROCESSES 63 The model for a birth or a death is given conditional on X(t) = k as: B D if = time till a potential birth ∼ Expλk = time till a potential deathj ∼ Expµk B<D B>D the move is to state k + 1 at time t + B the move is to state k − 1 at time t + D remember: P (B = D) = 0! B and D are independent for each state k. This implies, that, given the process is in state k, the probability to move to state k+1 k−1 λk µk + λk µk . is µk + λk is Then Y = min(B, D) is the remaining time in state k until the move. What can we say about the distribution of Y := min(B, D)? P (Y ≤ y) = P (min(B, D) ≤ y) = P (B ≤ y ∪ D ≤ y) = way, way back, we looked at this kind of probability. . . = P (B ≤ y) + P (D ≤ y) − P (B ≤ y ∩ D ≤ y) = B, Dare independent. = P (B ≤ y) + P (D ≤ y) − P (B ≤ y) · P (D ≤ y) = −λk y +1−e −µk y = 1−e = 1 − e(λk +µk )y = Expλk +µk (y), − (1 − e −λk y )(1 − e −µk y B ∼ Expλk , D ∼ Expµk )= i.e. Y itself is again an exponential variable, its rate is the the sum of the rates of B and D. Knowing the distribution of Y , the staying time in state k, gives us, e.g. the possibility to compute the mean staying time in state k. The mean staying time in state k is the expected value of an exponential distribution with rate λk + µk . The mean staying time therefore is 1/(λk + µk ). We will mark this result by (*) and use it below. Note: A Poisson process with rate λ is a special case of a Birth & Death process, where the birth rates and death rates are constant, λk = λ and µk = 0 for all k. The analysis of this model for small t is mathematically difficult because of “start-up” effects - but in some cases, we can compute the “large t” behaviour. A lot depends on the ratio of births and deaths: this is result (*) 64 CHAPTER 4. STOCHASTIC PROCESSES Number of jobs in the system at time t 15 X(t) 5 0 0 500 1000 1500 2000 Time (in sec) Number of jobs in the system at time t 60 X(t) 20 0 0 500 1000 1500 2000 Time (in sec) Number of jobs in the system at time t 400 200 X(t) 0 0 500 1000 1500 2000 Time (in sec) In the picture, three different simulations of Birth & Death processes are shown. Only in the first case, the process is stable (birth rate < death rate). The other two processes are unstable (birth rate = death rate (2nd process) and birth rate > death rate (3rd process)). Only if the B+D process is stable, it will find an equilibrium after some time - this is called the steady state of the B+D process. Mathematically, the notion of a steady state state translates to lim P (X(t) = k) = pk for all k, P where the pk are numbers between 0 and 1, with k pk = 1. The pk probabilities are called the steady state probabilities of the B+D process, they form a density function for X. At the moment it is not clear why the steady state probabilities need to exist at all - in fact, for some systems they do not. For the moment, though, we will assume, that they exist and try to compute them. On the way to the result we will come across conditions under which they will actually exist. 
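Trajectories like the three in the picture above can be reproduced qualitatively with a few lines of simulation. A rough R sketch of my own (the rates behind the three panels are not given, so the stable case below plugs in the printer example's λ = 3 and µ = 4), built on the staying-time result derived above: the time spent in state k is exponential with rate λ + µ (just λ in state 0, where no death can occur), and the move is up with probability λ/(λ + µ):

# simulate a Birth & Death process with constant birth rate lambda and death rate mu
simulate.bd <- function(lambda, mu, t.max) {
  t <- 0; k <- 0
  times <- 0; states <- 0
  while (t < t.max) {
    rate <- if (k == 0) lambda else lambda + mu
    t <- t + rexp(1, rate)                               # staying time in the current state
    up <- (k == 0) || (runif(1) < lambda / (lambda + mu))
    k <- if (up) k + 1 else k - 1
    times <- c(times, t); states <- c(states, k)
  }
  list(times = times, states = states)
}

x <- simulate.bd(lambda = 3, mu = 4, t.max = 2000)       # stable: birth rate < death rate
plot(x$times, x$states, type = "s", xlab = "Time", ylab = "X(t)")

Rerunning the last two lines with lambda >= mu shows the unstable behaviour of the second and third panel: X(t) drifts off instead of settling around an equilibrium.
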
We can figure out what the pk must be as follows: t→∞ time in state k until time t → total time t mean stay # of visits to state k by time t in state k total time t # of visits to state k by time t total time t A fraction of λk λk +µk pk → pk → pk (λk + µk ) use (*) = long run rate of visits to k visits to state k result in moves to state k + 1, so λk · pk (λk + µk ) = λk pk λk + µk 4.2. BIRTH & DEATH PROCESSES 65 is the long run rate of transitions from state k to k + 1 and, similarly, µk pk is the long run rate of transitions from state k to state k − 1. From the very simple principle, that overall everything that flows into state k has to flow out again, we get the so-called balance equations for the steady state probabilities: Balance equations The Flow-In = Flow-Out Principle provides us with the means to derive equations between the steady state probabilities. 1. For state 0 µ1 p1 = λ0 p0 0 0 i.e. p1 = 1 λ0 µ1 p0 . 1 2. For state 1 µ1 p1 + λ1 p1 = λ0 p0 + µ2 p2 1 0 0 i.e. p2 = λ1 µ2 p1 = λ0 λ1 µ1 µ2 p0 . 1 1 2 2 3. For state 2 µ2 p2 + λ2 p2 = λ0 p0 + µ3 p3 1 0 0 i.e. p3 = λ2 µ3 p2 = λ0 λ1 λ2 µ1 µ2 µ3 p0 . 1 1 2 2 4. . . . for state k we get: pk = λ0 λ1 λ2 · . . . · λk−1 p0 . µ1 µ2 µ3 · . . . · µk ok, so now we know all the steady state probabilities depending on p0 . But what use has that, if we don’t know p0 ? Here, we need another trick: we know, that the steady state probabilities are the density function for the state X. Their sum must therefore be 1! Then 1 = p0 + p1 + p2 + . . . λ0 λ1 λ0 + + ... = p0 1 + µ1 µ1 µ2 | {z } :=S If this sum S converges, we get p0 = S −1 . If it doesn’t converge, we know that we don’t have any steady state probabilities, i.e. the B+D process never reaches an equilibrium. The analysis of S is crucial! 66 CHAPTER 4. STOCHASTIC PROCESSES If S exists, p0 does, and with p0 all pk , which implies, that the Birth & Death process is stable. If S does not exist, then the B & D process is unstable, i.e. it does not have an equilibrium and no steady state probabilities. Special case: Birth & Death process with constant birth and death rates If all birth rates λk = λ a constant birth rate and µk = µ for all k, the ratio between birth and death rates is constant, too: λ a := µ a is called the traffic intensity. In order to decide, whether a specific B&D process is stable or not, we have to look at S. For constant traffic intensities, S can be written as: ∞ S =1+ X λ0 λ1 λ0 + + . . . = 1 + a + a2 + a3 + ... = ak µ1 µ1 µ2 k=0 This sum is called a geometric series. If 0 < a < 1 the series converges: S= 1 1−a p0 = S −1 = 1 − a pk = ak · (1 − a) = P (X(t) = k), i.e. for 0 < a < 1. Then: X(t) therefore has a Geometric distribution for large t: X(t) ∼ Geo1−a for large t and 0 < a < 1. Example 4.2.2 Printer queue (continued) A certain printer in the Stat Lab gets jobs with a rate of 3 per hour. On average, the printer needs 15 min to finish a job. Let X(t) be the number of jobs in the printer and its queue at time t. X(t) is a Birth & Death Process with constant arrival rate λ = 3 and constant death rate µ = 4. (a) Draw a state diagram for X(t) - the (technically possible) number of jobs in the printer (and its queue). 3 0 3 1 4 3 2 4 3 3 4 4 (b) What is the (true) probability that at some time t the printer is idle? P (X(t) = 0) = p0 = 1 − 3 = 0.25. 4 (c) What is the probability that there arrive more than 7 jobs during one hour? Let Y be the number of arrivals. Y is a Poisson Process with arrival rate λ = 3. Y (t) ∼ P oλ·t . 
P (Y (1) > 7) = 1 − P (Y (1) ≤ 7) = 1 − P o3·1 (7) = 1 − 0.949 = 0.051. 4.2. BIRTH & DEATH PROCESSES 67 (d) What is the probability that the printer is idle for more than 1 hour at a time? (Hint: this is the probability that X(t) = 0 and - at the same time - no job arrives for more than one hour.) Let Z be the time until the next arrival, then Z ∼ Exp3 . P (X(t) = 0 ∩ Z > 1) X(t),Zindependent = P (X(t) = 0) · P (Z > 1) = p0 · (1 − Exp3 (1)) = 0.25 · e−3 = 0.0124 (d) What is the probability that there are 3 jobs in the printer queue at time t (including the job printed at the moment)? P (X(t) = 3) = p3 = .753 · .25 = 0.10 (e) What is the difference between the true and the empirical probability of exactly 5 jobs in th printer system? p5 = 0.755 · 0.25 = 005933 pb5 = 0.04125 The probabilities are close - which means that we can assume that this particular printer queue actually behaves like a Birth & Death process. Two Examples of Birth & Death Processes Communication System A communication system has two processors for decoding messages and a buffer that will hold at most two further messages. If the buffer is full, any incoming message is lost. Each processor needs on average 2 min to decode a message. Messages come in with a rate of 1 per min. Assume exponential distributions both for interarrival times between messages and the time needed to decode a message. Use a Birth & Death process to model the number of messages in the system. (a) Carefully draw a transition state diagram. 1 0 1 1 0.5 1 2 1 1 3 1 4 1 (b) Find the steady state probability that there are no messages in the system. Since p0 is S −1 , we need to compute S first: S = = Therefore p0 = 19 . λ0 λ0 λ1 λ0 λ1 λ2 λ0 λ1 λ2 λ3 + + + = µ1 µ1 µ2 µ1 µ2 µ3 µ1 µ2 µ3 µ4 1 + 2 + 2 + 2 + 2 = 9. 1+ 68 CHAPTER 4. STOCHASTIC PROCESSES (c) Find the steady state probability mass function of X(t) (i.e. find the other pk s) (d) How many messages are in the system on average once it has reached its stable state? ICB - International Campus Bank Ames The ICB Ames employs three tellers. Customers arrive according to a Poisson process with a mean rate of 1 per minute. If a customer finds all tellers busy, he or she joins a queue that is serviced by all tellers. Transaction times are independent and have exponential distributions with mean 2 minutes. (a) Sketch an appropriate state diagram for this queueing system. 1 0 1 1 0.5 1 2 1 1 3 1.5 1 4 1.5 1.5 (b) As it turns out, the large t probability that there are no customers in the system is p0 = 1/9. What is the probability that a customer entering the bank must enter the queue and wait for service? A person entering the bank must queue for service, if at least three people are in the bank (not including the one who enters at the moment). We are therefore looking for the large t probability, that X(t) is at least 3: P (X(t) ≥ 3) = = 1 − P (X(t) < 3) = 1 − P (X(t) ≤ 2) = 1 1 4 1 1 − (p0 + p1 + p2 ) = 1 − ( + 2 · + 2 · ) = . 9 9 9 9 Chapter 5 Queuing systems Queueing system server 1 enter the system some population of individuals server 2 according to some random mechanism exit the system server c Depending upon the specifics of the application there are many varieties of queuing systems corresponding combinations like • size & nature of calling population is it finite or a (potentially) infinite set? is it homogenous, i.e. only one type of individuals, or several types? 
• random mechanism by which the population enters • nature of the queue finite/ infinite • nature of the queuing discipline FIF0 or priority (i.e. different types of individuals get different treatment) • number and behavior of servers distribution of service times? Variety of matters one might want to investigate: • mean number of individuals in the system • mean queue length • fraction of customers turned away (for a finite queue length) • mean waiting time • etc. 69 70 CHAPTER 5. QUEUING SYSTEMS Notation: FY /FS /c/K FY distribution of inter arrival times Y FS distribution of service times S Usually, we will assume a FIFO queue. c number of servers K maximum number of individuals in the system The distributions FY and FS are chosen from a small set of distributions, denoted by: M exponential (Memoryless) distribution Ek Erlang k stage D deterministic distribution G a general, not further specified distribution Usually, we will be interested in a couple of properties for each queuing system. The main properties are: L length of system on average = average number of individuals in the system W average waiting time (time in queue and service time Ws average service time Wq average waiting time in queue Lq average length of queue The main idea of a queuing system is to model the number of individuals in the system (queue and server) as a Birth & Death Process. This gives us a way to analyze the queuing systems using the methods from the previous chapter. X(t) = number of individuals in the system at time t is the Birth & Death Process we’ ll be interested in. 5.1 Little’s Law The next theorem is based on a simple principle. However, don’t underestimate the theorem’s importance! - It links waiting times to the number of people in the system and will be very useful in the future: Theorem 5.1.1 (Little’s Law) For a queuing system in steady state L = λ̄ · W where L is the average number of individuals in the system, W is the average time spent in the system, and λ̄ is the average rate at which individuals enter the system. This theorem can also be applied to the queue itself: Lq = λ¯q · Wq and the service center Ls = λ¯s · Ws . Relationship between properties For the properties L, W, Ws , Wq , Lq there are usually two different ways to get a result: the easy and the difficult one (involving infinite summations and similar nice stuff). To make sure, we choose the easy way of computation, here’s an overview of the relationship between all these properties: 5.2. THE M/M/1 QUEUE 71 L = E[X(t)] W = L/λ WS = E[S] 5.2 Lq = Wq λ Wq = W - WS The M/M/1 Queue Situation: exponential inter arrival times with rate λ, exponential service times with rate µ. Let N (t) denote the number of individuals in the system at time t, N (t) can then be modeled using a Birth & Death process: λk µk = birth rate = death rate = arrival rate = service rate = λ for all k = µ for all k We’ve already seen that the ratio λ/µ is very important for the analysis of the B&D process. This ratio is called the traffic intensity a. For a M/M/1 queuing system, the traffic intensity is constant for all k. The previous problem of finding the steady state probabilities of the B&D process is equivalent to finding the steady state probabilities for the number of individuals in the queuing system, pk = lim P (N (t) = k). t→∞ The B&D balance equations then say that 1 = p0 (1 + a + a2 + . . .) The question whether we have a steady state or not is the reduced to the question whether or not a < 1. 
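As an aside, the general balance-equation solution from the previous chapter is easy to evaluate numerically for any finite list of rates. The following minimal Python sketch (not part of the notes) does this for the communication-system example from the previous chapter, reproducing S = 9 and p0 = 1/9, before we specialize to the constant-rate case below.

def bd_steady_state(birth, death):
    """Steady state probabilities of a finite Birth & Death process.

    birth[k] is the rate lambda_k out of state k (k = 0..K-1),
    death[k] is the rate mu_{k+1} into state k (k = 0..K-1), so the
    balance equations give p_{k+1} = p_k * lambda_k / mu_{k+1}."""
    ratios = [1.0]
    for lam, mu in zip(birth, death):
        ratios.append(ratios[-1] * lam / mu)
    S = sum(ratios)          # S = 1 + lambda_0/mu_1 + lambda_0*lambda_1/(mu_1*mu_2) + ...
    return [r / S for r in ratios]

# Communication system example: birth rates 1,1,1,1 and death rates 0.5,1,1,1
p = bd_steady_state([1, 1, 1, 1], [0.5, 1, 1, 1])
print(p)        # p[0] = 1/9 = 0.111..., matching S = 9 computed in the example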
P∞ 1 For a < 1 S = k=0 ak = 1−a , then 1−a p0 = p1 = a(1 − a) p2 = a2 (1 − a) p3 = a3 (1 − a) ... pk = ak (1 − a) N (t) has a geometric distribution for large t! The mean number of individuals in the queuing system L is limt→∞ E[N (t)]: L = lim E[N (t)] = t→∞ a . 1−a The closer the service rate is to the arrival rate, the larger is the expected number of people in the system. The mean time spent in the system W is then, using Little’s Law: W = L/λ = 1 1 · µ 1−a mean number in system mean time in system 72 CHAPTER 5. QUEUING SYSTEMS The overall time spent in the system is a sum of the time spent in the queue Wq and the average time spent in service Ws . Since we know that service times are exponentially distributed with rate µ, Ws = µ1 . For the time spent waiting in the queue we therefore get 1 1 1 a Wq = W − Ws = −1 = . µ 1−a µ1−a The average length of the queue is, using Little’s Law again, given as average length of the queue Lq = Wq λ = server utilization rate a2 1−a Further we see, that the long run probability that the server is busy is given as: p := P (server busy) = 1 − P (system empty) = 1 − p0 = a. distribution of time in queue Denote by q(t) the time that an individual entering the system at time t has to spend waiting in the queue. Clearly, the distribution of the waiting times depends on the number of individuals already in the queue at time t. Assume, that the individual entering the system doesn’t have to wait at all in the queue - that happens exactly when the system at time t is empty. For large t we therefore get: lim P (q(t) = 0) = p0 = 1 − a. t→∞ Think: If there are k individuals in the queue, the waiting time q(t) is Erlangk,µ (we’re waiting for k departures, departures occur with a rate of µ). This is a conditional distribution for q(t), since it is based on the assumption about the number of people in the queue: q(t)|X(t) = k ∼ Erlangk,µ for large t We can put those pieces together in order to get the large t distribution for q(t) using the theorem of total probability. For large t and x ≥ 0: Fq(t) (x) = P (q(t) ≤ x) = total probability! ∞ X P (q(t) ≤ x ∩ X(t) = k) = = = k=0 ∞ X P (q(t) ≤ x|X(t) = k)pk = k=0 = p0 + ∞ X (1 − P oµx (k − 1))pk = k=1 = p0 + ∞ X (1 − = 1 − ae j e−µx j=0 k=1 −x/W k−1 X (xµ) )pk = . . . j! , where W is the average time spent in the system, W = 1 µ · 1 1−a = 1 µ−λ . Example 5.2.1 Printer Queue (continued) A certain printer in the Stat Lab gets jobs with a rate of 3 per hour. On average, the printer needs 15 min to finish a job. average time in the queue 5.3. THE M/M/1/K QUEUE 73 Let X(t) be the number of jobs in the printer and its queue at time t. We know already: X(t) is a Birth & Death Process with constant arrival rate λ = 3 and constant death rate µ = 4. The properties of interest for this printer system then are: L = E[X(t)] = 0.75 a = =3 1−a 0.25 Wq 1 = 0.25 hours = 15 min µ 3 L = = 1 hour = λ 3 = W − Ws = 0.75 hours = 45 minutes Lq = Wq λq = 0.75 · 3 = 2.25 Ws W = On average, a job has to spend 45 min in the queue. What is the probability that a job has to spend less than 20 min in the queue? We denoted the waiting time in the queue by q(t). q(t) has distribution function 1 − aey(µ−λ) . The probability asked for is P (q(t) < 2/6) = 1 − 0.75 · e−2/6·(4−3) = 0.4626. 5.3 The M/M/1/K queue An M/M/1 queue with limited size K is a lot more realistic than the one with infinite queue. Unfortunately, it’s computationally slightly harder to deal with. X(t) is modelled as a Birth & Death Process with states {0, 1, ..., K}. 
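Before analyzing the limited queue, it may help to collect the M/M/1 formulas just derived in one place. The following Python sketch is an illustration, not part of the notes; with the printer rates λ = 3 and µ = 4 it reproduces L = 3, W = 1 hour, Wq = 45 min, Lq = 2.25 and P(q(t) < 20 min) ≈ 0.4626.

import math

def mm1_metrics(lam, mu):
    """Summary quantities of an M/M/1 queue (assumes lam < mu)."""
    a = lam / mu                 # traffic intensity
    L = a / (1 - a)              # mean number in the system
    W = L / lam                  # mean time in the system (Little's Law), = 1/(mu - lam)
    Wq = W - 1 / mu              # mean time in the queue
    Lq = Wq * lam                # mean queue length
    return a, L, W, Wq, Lq

def mm1_queue_wait_cdf(x, lam, mu):
    """Large-t P(q(t) <= x) = 1 - a * exp(-x (mu - lam))."""
    a = lam / mu
    return 1 - a * math.exp(-x * (mu - lam))

print(mm1_metrics(3, 4))                   # a = 0.75, L = 3, W = 1 h, Wq = 0.75 h, Lq = 2.25
print(mm1_queue_wait_cdf(20 / 60, 3, 4))   # P(wait in queue < 20 min) ~ 0.4626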
Its state diagram looks like: λ 0 λ λ 1 λ 2 µ µ K µ µ Since X(t) has only a finite number of states, it’s a stable process independently from the values of λ and µ. The steady state probabilities pk are: = ak p0 pk = p0 = S−1 = 1−a 1 − aK+1 where a = µλ , the traffic intensity and S = 1 + a + a2 + ... + aK = The mean number of individuals in the queue L then is: 1−aK+1 1−a . L = E[X(t)] = 0 · p0 + 1 · p1 + 2 · p2 + ... + K · pK = = K X k=0 kpk = K X k=0 kak · p0 = ... = a (K + 1)aK+1 − 1−a 1 − aK+1 Another interesting property of a queuing system with limited size is the number of individuals that get turned away. From a marketing perspective they are the ”expensive” ones - they are most likely annoyed and less inclined to return. It’s therefore a good strategy to try and minimize this number. 74 CHAPTER 5. QUEUING SYSTEMS Since an incoming individual is turned away, when the system is full, the probability for being turned away is pK . The rate of individuals being turned away therefore is pK · λ. For the expected total waiting time W , we used Little’s theorem: W = L λ̄ where λ̄ is the average arrival rate into the system. At this point we have to be careful when dealing with limited systems: λ̄ is NOT equal to the arrival rate λ. We have to adjust λ by the rate of individuals who are turned away. The adjusted rate λa of individuals entering the system is: λa = λ − pK λ = (1 − pK )λ. The expected total waiting time is then W = L/λa and the expected length of the queue Lq = Wq · λa . Example 5.3.1 Convenience Store In a small convenience store there’s room for only 4 customers. The owner himself deals with all the customers - he likes chatting a bit. On average it takes a customer 4 minutes to pay for his/her purchase. Customers arrive at an average of 1 per 5 minutes. If a customer finds the shop full, he/she will go away immediately. 1. What fraction of time will the owner be in the shop on his own? The number of customers in the shop can be modelled as a Birth& death Process with arrival rate λ = 0.2 per minute and µ = 0.25 per minute and upper size K = 4. The probability (or fraction of time) that the owner will be alone is p0 = 1−a 1−aK+1 = 0.2 1−0.85 = 0.2975. 2. What is the mean number of customers in the store? L= (K + 1)aK+1 a − = 1.56. 1−a 1 − aK+1 3. What fraction of customers is turned away per hour? p4 λ = 0.84 · 0.2975 · 0.2 per minute = 0.0243 per minute = 1.46 per hour 4. What is the average time a customer has to spend for check-out? W = L = 1.56/(0.2 − 0.0243) = 8.88 minutes . λa For limited queueing systems the adjusted arrival rate λa must be considered for applying Little’s Law. 5.4 The M/M/c queue Again, X(t) the number of individuals in the queueing system can be modeled as a birth & death process. The transition state diagram for the X(t) is: λ λ 0 λ 2 1 µ λ 2µ λ K c-1 3µ (c-1)µ λ 1c cµ cµ 5.4. THE M/M/C QUEUE 75 Clearly, the critical thing here in terms of whether or not a steady state exists is whether or not λ/(cµ) < 1. Let a = λ/µ and % = a/c = λ/(cµ). The balance equations for steady state are: p1 p2 p3 ... pc = ap0 a2 p0 = 2·1 a3 = 3! p0 = = %· pc+1 ... pn ac c! p0 = %n−c · ac c! p0 for n ≥ c. ac c! p0 In order to get an expression for p0 , we use the condition, that the overall sum of probabilities must be 1. This gives: 1= ∞ X pk = p0 k=0 ∞ ac X k−c + % k! c! c−1 k X a k=0 ! k=c c−1 k X a ac 1 . + = p0 c! 1 − % k=0 k! {z } | =:S This system has a steady state, if % < 1, in that case p0 = S −1 . The other probabilities pn are given as: an n! 
p0 an c!cn−c p0 pn = for 0 ≤ n ≤ c − 1 for n ≥ c A key descriptor for the system is the probability, with which an entering customer must queue for service this is equal to the probability that all servers are busy. The formula for this probability is known as Erlang’s C formula or Erlang’s delay formula and written as C(c, a). Obviously, in this queueing system an entering individual must queue for service exactly then, when c or more individuals are already in the system. lim P (X(t) ≥ c) = C(c, a) = t→∞ ∞ X pk = 1 − k=c c−1 pk = k=0 X ak 1 − p0 k! = p0 c−1 X prob. that entering customer must queue ! = k=0 = p0 ac . c!(1 − %) The steady state mean number of individuals in the queue Lq is Lq = ∞ X (k − c)pk = k=c a c! ∞ X k%k k=1 | {z } P k 0 %( ∞ k=1 % ) = (k − c) k=c c = p0 ∞ X % C(c, a). 1−% = p0 average number in queue ak p0 = c!ck−c ac % c! (1 − %)2 76 CHAPTER 5. QUEUING SYSTEMS By Little’s Law mean waiting time in queue 1 % 1 Wq = Lq /λ = · C(c, a) = C(c, a). λ 1−% cµ(1 − %) overall time in system Then W = Wq + Ws = Wq + number in system 1 , µ and the overall number of individual in the system is on average L=W ·λ=a+ % C(c, a). 1−% Example 5.4.1 Bank A bank has three tellers. Customers arrive at a rate of 1 per minute. Each teller needs on average 2 min to deal with a customer. What are the specifications of this queue? For this queue,λ = 1, µ = 0.5, c = 3, a = µλ = 2, and % = ac = 2/3. The probability that no customer is in he bank then is p0 = c−1 k X a k=0 ac 1 + k! c! 1 − % Lq Wq Ws W !−1 −1 4 23 1 1 = 1+2+ + · = . 2 3! 1 − % 9 ac % = 8/9. c! (1 − %)2 = Lq /λ = 8/9 minutes . 1 = = 2 minutes. µ = Ws + Wq = 26/9 minutes = p0 L = W λ = 26/9 5.5 Machine-Repairmen Problems The Machine Repairmen problems are a special case of queueing systems with non-constant arrival rates: the arrival rate at time t depends on the state the system is in at time t. The situation is like this: We have K identical machines and c repairmen. Each machine breaks down with rate λ, the time between breaks is exponentially distributed. A single repairman can work on only one machine - each machine can be repaired by only a single repairman at a time. The time for a repair is also exponentially distributed with rate µ. Denote with X(t) the number of machines broken down at time t. Then, N has values between 0 and K at any given time. X(t) can be modeled as a B&D process, the transition state diagram is: K 0 K-1 1 K-c+1 ... K-c c-1 (c-1) K-c-1 ... c c c . K-1 .. c . K. . c This system, though there are only K + 1 possible states, is computationally tricky. 5.5. MACHINE-REPAIRMEN PROBLEMS 77 Denote by a = λ/µ the traffic intensity. The steady state probabilities are given as p1 p2 p3 ... pk = = = Kap0 pc+1 ... pk K(K−1) 2 a p0 2·1 K(K−1)(K−2) 3 a p0 3! K k = = K−c c apc = k! k−c K c! % k = (K − c)% K c ac p0 ac p0 for k ≥ c ak p0 for k ≤ c and c−1 X K p0 = k=0 k k a + K X k! k=c c! % k−c !−1 K c a k Once we have the steady state probabilities, we can compute the average number of machines waiting for services, Lq , as: K X Lq = (k − c)pk k=c+1 Example 5.5.1 A company has 3 machines that each break down at exponentially distributed intervals of mean 1 hour. When one of these machines is down, only one repairman can work on it at a time, and while it is down the company loses $50/hour in profit. Repairs require 1 hour on average to complete and the repair times are exponentially distributed. Repairmen can be hired for $30/hour (and must be paid regardless of whether there is a broken machine to work on). 
Do you recommend that the company employ 1 repairman or 2 repairmen? Show your whole analysis.

Let X(t) be the number of broken machines at time t. Then X(t) can be modelled by a B&D process for a) a single repairman and b) two repairmen.

[State diagrams: in both cases the states are 0, 1, 2, 3 broken machines with breakdown rates 3, 2, 1; the repair rates are 1, 1, 1 with one repairman and 1, 2, 2 with two repairmen.]

For a) the balance equations give

1 = p0 (1 + 3 + 6 + 6)  →  p0 = 1/16 = 0.0625 and p1 = 0.1875, p2 = p3 = 0.375,

and for b)

1 = p0 (1 + 3 + 3 + 1.5)  →  p0 = 2/17 and p1 = 6/17, p2 = 6/17, p3 = 3/17.

Hiring one repairman costs the company $30 each hour. On top of that, each hour of down time of each machine costs another $50. Let L be the loss per hour.

a) E[L] = 30 p0 + 80 p1 + 130 p2 + 180 p3 = (30 + 240 + 780 + 1080)/16 = 133.125.

b) E[L] = (1/17) (60 · 2 + 110 · 6 + 160 · 6 + 210 · 3) = 139.412.

With only one repairman the overall costs for the company are slightly less than with two repairmen.

Chapter 6 Statistical Inference

From now on, we will use probability theory as a tool to find answers to the questions arising from the specific problems we are working on. In this chapter we want to draw inferences about some characteristic of an underlying population - e.g. the average height of a person. Instead of measuring this characteristic for each individual, we draw a sample, i.e. choose a "suitable" subset of the population and measure the characteristic only for those individuals. Using some probabilistic arguments we can then extend the information we got from that sample and make an estimate of the characteristic for the whole population. Probability theory will give us the means to find those estimates and to measure how "probable" our estimates are.

Of course, choosing the sample is crucial. We will demand two properties from a sample:

• the sample should be representative - taking only basketball players into the sample would change our estimate of a person's average height drastically.

• if the sample is large, the estimate should come close to the "true" value of the characteristic.

The three main areas of statistics are

• estimation of parameters: point or interval estimates: "my best guess for value x is . . . ", "my guess is that value x is in the interval (a, b)"

• evaluation of the plausibility of values: hypothesis testing

• prediction of future (individual) values

6.1 Parameter Estimation

Statistics are all around us - scores in sports, prices at the grocers, weather reports (and how often they turn out to be close to the actual weather), taxes, evaluations . . . The most basic form of statistics are descriptive statistics. But - what exactly is a statistic? Here is the formal definition:

Definition 6.1.1 (Statistics) Any function W(x1, . . . , xk) of observed values x1, . . . , xk is called a statistic.

Some statistics you already know are:

• Mean (Average): X̄ = (1/n) Σi Xi
• Minimum: X(1) - the parentheses indicate that the values are sorted
• Maximum: X(n)
• Range: X(n) − X(1)
• Mode: the value(s) that appear(s) most often
• Median: the "middle value" - that value for which one half of the data is larger, the other half smaller. If n is odd the median is X((n+1)/2); if n is even it is the average of the two middle values, 0.5 · X(n/2) + 0.5 · X(n/2+1).

For this section it is important to distinguish between xi and Xi properly. If not stated otherwise, any capital letter denotes a random variable, a small letter describes a realization of this random variable, i.e. what we have observed. xi therefore is a real number, while Xi is a function that assigns a real number to an event from the sample space.
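These statistics are easy to compute for any observed sample. As a small illustration (not part of the original notes), the sketch below evaluates them in Python for the red-car counts that appear later in Example 6.1.3.

from statistics import mean, median, multimode

def summary_statistics(x):
    """The descriptive statistics just listed, for an observed sample x1, ..., xn."""
    return {
        "mean": mean(x),              # x-bar = (1/n) * sum of the x_i
        "minimum": min(x),            # x_(1), the smallest ordered value
        "maximum": max(x),            # x_(n), the largest ordered value
        "range": max(x) - min(x),     # x_(n) - x_(1)
        "mode": multimode(x),         # value(s) that appear most often
        "median": median(x),          # middle value of the sorted sample
    }

# red-car counts used in Example 6.1.3 below
print(summary_statistics([3, 2, 3, 3, 4, 1, 4, 2, 4, 3]))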
Definition 6.1.2 (Estimator) Let X1, . . . , Xk be k i.i.d. random variables with distribution Fθ with (unknown) parameter θ. A statistic Θ̂ = Θ̂(X1, . . . , Xk) used to estimate the value of θ is called an estimator of θ. θ̂ = Θ̂(x1, . . . , xk) is called an estimate of θ.

Desirable properties of estimates:

• Unbiasedness: the expected value of the estimator is the true parameter, E[Θ̂] = θ - the estimates from repeated samples scatter around the true value and not around some other value.

• Efficiency: for two estimators Θ̂1 and Θ̂2 of the same parameter θ, Θ̂1 is said to be more efficient than Θ̂2 if Var[Θ̂1] < Var[Θ̂2] - the more efficient estimator scatters less around the true value.

• Consistency: if we have a larger sample size n, we want the estimate θ̂ to be closer to the true parameter θ:

  lim_{n→∞} P(|Θ̂ − θ| > ε) = 0.

[Figure: dot plots of estimates around the true value x, illustrating unbiasedness (values from one sample scatter around the true value, not beside it), efficiency (estimator 1 is better than estimator 2), and consistency (the same estimator for n = 100 and for n = 10000).]

Example 6.1.1 Let X1, . . . , Xn be n i.i.d. random variables with E[Xi] = µ. Then X̄ = (1/n) Σ_{i=1}^n Xi is an unbiased estimator of µ, because

E[X̄] = (1/n) Σ_{i=1}^n E[Xi] = (1/n) · n · µ = µ.

Ok, so once we have an estimator, we can decide whether it has these properties. But how do we find estimators in the first place?

6.1.1 Maximum Likelihood Estimation

Situation: We have n data values x1, . . . , xn. The assumption is that these data values are realizations of n i.i.d. random variables X1, . . . , Xn with distribution Fθ. Unfortunately, the value of θ is unknown.

[Figure: the observed values x1, x2, x3, . . . on the x-axis, together with three candidate densities fθ for θ = 0, θ = −1.8 and θ = 1.]

By changing the value of θ we can "move the density function fθ around" - in the diagram, the third density function fits the data best.

Principle: since we do not know the true value θ of the distribution, we take the value θ̂ that most likely produced the observed values, i.e. we maximize something like

P(X1 = x1 ∩ X2 = x2 ∩ . . . ∩ Xn = xn) =
  (the Xi are independent!)
= P(X1 = x1) · P(X2 = x2) · . . . · P(Xn = xn) = Π_{i=1}^n P(Xi = xi).   (*)

This is not quite the right way to write the probability if X1, . . . , Xn are continuous variables. (Remember: P(X = x) = 0 for a continuous variable X.) We use the above "probability" just as a plausibility argument. To get around the problem that P(X = x) = 0 for a continuous variable, we write (*) as

Π_{i=1}^n pθ(xi)  for discrete Xi   and   Π_{i=1}^n fθ(xi)  for continuous Xi,

where pθ is the probability mass function of the discrete Xi (all Xi have the same pmf, since they are identically distributed) and fθ is the density function of the continuous Xi. Both of these functions depend on θ. In fact, we can read the above expressions as a function of θ. This function, which we will denote by L(θ), is called the Likelihood function of X1, . . . , Xn.

The goal is now to find a value θ̂ that maximizes the Likelihood function (this is what "moves" the density to the right spot, so that it fits the observed values well). How do we find a maximum of L(θ)? - the usual way we maximize a function: differentiate it and set the derivative to zero! (After that, we ought to check with the second derivative whether we have actually found a maximum, but we won't do that unless we have found more than one candidate value for θ̂.) Most of the time it is difficult to differentiate L(θ) directly - instead we use another trick and maximize log L(θ), the Log-Likelihood function.
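Why is it legitimate to maximize log L(θ) instead of L(θ)? A short justification (added here for completeness) is that the natural logarithm is strictly increasing, so both functions have the same maximizer:

d/dθ log L(θ) = L′(θ)/L(θ) = 0  ⟺  L′(θ) = 0   (since L(θ) > 0),

and the logarithm turns the product defining L(θ) into a sum,

log L(θ) = log Π_{i=1}^n fθ(xi) = Σ_{i=1}^n log fθ(xi)

(with pθ in place of fθ in the discrete case), which is usually much easier to differentiate.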
Note: though its name is “log”, we use the natural logarithm ln. The plan to find an ML-estimator is: 1. Find Likelihood function L(θ). 2. Get natural log of Likelihood function log L(θ). 3. Differentiate log-Likelihood function with respect to θ. 82 CHAPTER 6. STATISTICAL INFERENCE 4. Set derivative to zero. 5. Solve for θ. Example 6.1.2 Roll a Die A die is rolled until its face shows a 6. repeating this experiment 100 times gave the following results: #Rolls of a Die until first 6 20 15 # runs 10 5 0 1 k # trials 1 18 2 20 3 8 4 9 2 5 9 3 4 5 6 6 5 7 8 7 8 9 11 8 3 9 5 14 11 3 16 14 3 20 15 3 27 16 1 17 1 20 1 29 21 1 27 1 29 1 We know, that k the number of rolls until a 6 shows up has a geometric distribution Geop . For a fair die, p is 1/6. The Geometric distribution has probability mass function p(k) = (1 − p)k−1 · p. What is the ML-estimate p̂ for p? 1. Likelihood function L(p): Since we have observed 100 outcomes k1 , ..., k100 , the likelihood function L(p) = L(p) = 100 Y (1 − p)ki −1 p = p100 · i=1 100 Y P100 (1 − p)ki −1 = p100 · (1 − p) i=1 (ki −1) 2. log of Likelihood function log L(p): P100 log p100 · (1 − p) i=1 ki −100 = P100 = log p100 + log (1 − p) i=1 ki −100 = ! 100 X = 100 log p + ki − 100 log(1 − p). = i=1 i=1 p(ki ), P100 = p100 · (1 − p) i=1 log L(p) Q100 i=1 ki −100 . 6.1. PARAMETER ESTIMATION 83 3. Differentiate log-Likelihood with respect to p: d log L(p) dp 1 100 + p = 100 X 100 X 100(1 − p) − ! ! ki − 100 p = i=1 1 p(1 − p) = −1 = 1−p ki − 100 i=1 1 p(1 − p) = ! 100 − p 100 X ! ki . i=1 4. Set derivative to zero. For the estimate p̂ the derivative must be zero: ⇐⇒ 1 p̂(1 − p̂) d log L(p̂) = 0 dp ! 100 X 100 − p̂ ki = 0 i=1 5. Solve for p̂. 1 p̂(1 − p̂) 100 − p̂ 100 X ! ki = 0 = 0 i=1 100 − p̂ 100 X ki i=1 100 p̂ = P100 i=1 ki In total, we have an estimate p̂ = 100 568 = 1 100 1 P100 i=1 ki . = 0.1710. Example 6.1.3 Red Cars in the Parking Lot The values 3,2,3,3,4,1,4,2,4,3 have been observed while counting the numbers of red cars pulling into the parking lot # 22 between 8:30 - 8:40 am Mo to Fr during two weeks. The assumption is, that these values are realizations of ten independent Poisson variables with (the same) rate λ. What is the Maximum Likelihood estimate of λ? x The probability mass function of a Poisson distribution is pλ (x) = e−λ · λx! . We have ten values xi , this gives a Likelihood function: L(λ) = 10 Y i=1 e−λ · 10 Y P10 1 λXi = e−10λ · λ i=1 Xi · Xi ! Xi ! i=1 The log-Likelihood then is log L(λ) = −10λ + ln(λ) · 10 X i=1 Xi − X ln(Xi ). 84 CHAPTER 6. STATISTICAL INFERENCE Differentiating the log-Likelihood with respect to λ gives: 10 d 1 X log L(λ) = −10 + · Xi dλ λ i=1 Setting it to zero: 10 1 X · Xi = 10 λ̂ i=1 10 ⇐⇒ λ̂ = 1 X Xi 10 i=1 ⇐⇒ λ̂ = 29 = 2.9 10 This gives us an estimate for λ - and since λ is also the expected value of the Poisson distribution, we can say, that on average the number of red cars pulling into the parking lot each morning between 8:30 and 8:40 pm is 2.9. ML-estimators for µ and σ 2 of a Normal distribution Let X1 , . . . , Xn be n independent, identically distributed normal variables with E[Xi ] = µ and V ar[Xi ] = σ 2 . µ and σ 2 are unknown. 
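Before working through the normal case, here is a quick numerical sanity check of the ML principle (a sketch, not part of the notes): a crude grid search over the Poisson log-likelihood of the red-car data recovers the closed-form answer λ̂ = x̄ = 2.9.

import math

def poisson_log_likelihood(lam, xs):
    """log L(lambda) = -n*lambda + ln(lambda) * sum(x_i) - sum(ln(x_i!))."""
    return (-len(xs) * lam + math.log(lam) * sum(xs)
            - sum(math.lgamma(x + 1) for x in xs))

xs = [3, 2, 3, 3, 4, 1, 4, 2, 4, 3]          # red-car counts from Example 6.1.3
grid = [k / 100 for k in range(1, 1001)]      # candidate lambda values 0.01, 0.02, ..., 10
lam_hat = max(grid, key=lambda lam: poisson_log_likelihood(lam, xs))
print(lam_hat)                                # 2.9, the sample mean, as derived above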
The normal density function fµ,σ2 is fµ,σ2 (x) = √ 1 2πσ 2 e− (x−µ)2 2σ 2 Since we have n independent variables, the Likelihood function is a product of n densities: L(µ, σ 2 ) = n Y i=1 √ 1 2πσ 2 e− (xi −µ)2 2σ 2 = (2πσ 2 )n/2 · e− Pn i=1 (xi −µ)2 2σ 2 Log-Likelihood: log L(µ, σ 2 ) = − n 1 X n ln(2πσ 2 ) − 2 (xi − µ)2 2 2σ i=1 Since we have now two parameters, µ and σ 2 , we need to get 2 partial derivatives of the log-Likelihood: d log L(µ, σ 2 ) dµ = 0−2· d log L(µ, σ 2 ) dσ 2 = − n n −1 X 1 X 2 · (x − µ) · (−1) = (xi − µ)2 i 2σ 2 i=1 σ 2 i=1 n n 1 1 X + (xi − µ)2 2 σ2 2(σ 2 )2 i=1 We know, must find values for µ and σ 2 , that yield zeros for both derivatives at the same time. d Setting dµ log L(µ, σ 2 ) = 0 gives n 1X µ̂ = xi , n i=1 plugging this value into the derivative for σ 2 and setting n d dσ 2 log L(µ̂, σ 2 ) = 0 gives 1X σˆ2 = (xi − µ̂)2 n i=1 6.2. CONFIDENCE INTERVALS 6.2 85 Confidence intervals The previous section has provided a way to compute point estimates for parameters. Based on that, our next question is - how good is this point estimate? or How close is the estimate to the true value of the parameter? Instead of just looking at the point estimate, we will now try to compute an interval around the estimated parameter value, in which the true parameter is “likely” to fall. An interval like that is called confidence interval. Definition 6.2.1 (Confidence Interval) Let θ̂ be an estimate of θ. If P (|θ̂ − θ| < e) > α, we say, that the interval (θ̂ − e, θ̂ + e) is an α · 100% Confidence interval of θ (cf. fig. 6.1). Usually, α is a value near 1, such as 0.9, 0.95, 0.99, 0.999, etc. Note: • for any given set of values x1 , . . . , xn the value or θ̂ is fixed, as well as the interval (θ̂ − e, θ̂ + e). • The true value θ is either within the confidence interval or not. P( prob £ 1 - x-e x -e x+e -e< x < e + e) > prob £ 1 - confidence interval for Figure 6.1: The probability that x̄ falls into an e interval around µ is α. Vice versa, we know, that for all of those x̄ µ is within an e interval around x̄. That’s the idea of a confidence interval. !!DON’T DO!! A lot of people are tempted to reformulate the above probability to: P (θ̂ − e < θ < θ̂ + e) > α Though it looks ok, it’s not. Repeat: IT IS NOT OK. θ is a fixed value - therefore, it does not have a probability to fall into some interval. The only probability that we have, here, is P (θ − e < θ̂ < θ + e) > α, we can therefore say, that θ̂ has a probability of at least α to fall into an e- interval around θ. Unfortunately, that doesn’t help at all, since we do not know θ! How do we compute confidence intervals, then? - that’s different for each estimator. First, we look at estimates of a mean of a distribution: 86 6.2.1 CHAPTER 6. STATISTICAL INFERENCE Large sample C.I. for µ Situation: we have a large set of observed values (n > 30, usually). The assumption is, that these values are realizations of n i.i.d random variables X1 , . . . , Xn with E[X̄] = µ and V ar[X̄] = σ 2 . We already know from the previous section, that X̄ is an unbiased ML-estimator for µ. But we know more! - The CLT tells us, that in exactly the situation we are X̄ is an approximately normal 2 distributed random variable with E[X̄] = µ and V ar[X̄] = σn . We therefore can find the boundary e by using the standard normal distribution. 
Remember: if X̄ ∼ X̄−µ √ ∼ N (0, 1) = Φ: N (µ, σ 2 /n) then Z := σ/ n P (|X̄ − µ| ≤ e) ≥ α use standardization |X̄ − µ| e √ ≤ √ ⇐⇒ P ≥α σ/ n σ/ n e √ ⇐⇒ P |Z| < ≥α σ/ n e e √ ⇐⇒ P − √ < Z < ≥α σ/ n σ/ n e e √ ⇐⇒ Φ −Φ − √ ≥α σ/ n σ/ n e e √ √ ⇐⇒ Φ − 1−Φ ≥α σ/ n σ/ n e √ −1≥α ⇐⇒ 2Φ σ/ n e α √ ⇐⇒ Φ ≥1+ 2 σ/ n e 1 + α −1 √ ≥Φ ⇐⇒ 2 σ/ n σ 1+α √ ⇐⇒ e ≥ Φ−1 2 n | {z } :=z This computation gives a α· 100% confidence value around µ as: σ σ X̄ − z · √ , X̄ + z · √ n n Now we can do an example: Example 6.2.1 Suppose, we want to find a 95% confidence interval for the mean salary of an ISU employee. A random sample of 100 ISU employees gives us a sample mean salary of $21543 = x̄. Suppose, the standard deviation of salaries is known to be $3000. By using the above expression, we get a 95% confidence interval as: 1 + 0.95 3000 −1 21543 ± Φ ·√ = 21543 ± Φ−1 (0.975) · 300 2 100 How do we read Φ−1 (0.975) from the standard normal table? - We look for which z the probability N(0,1) (z) ≥ 0.975! 6.2. CONFIDENCE INTERVALS 87 This gives us z = 1.96, the 95% confidence interval is then: 21543 ± 588, i.e. if we repeat this study 100 times (with 100 different employees each time), we can say: in 95 out of 100 studies, the true parameter µ falls into a $588 range around x̄. Critical values for z, depending on α are: α 0.90 0.95 0.98 0.99 z = Φ−1 ( 1+α 2 ) 1.65 1.96 2.33 2.58 Problem: Usually, we do not know q σP n 1 2 Slight generalization: use s = n−1 i=1 (Xi − X̄) instead of σ! An α· 100% confidence interval for µ is given as s s X̄ − z · √ , X̄ + z · √ n n where z = Φ−1 ( 1+α 2 ). Example 6.2.2 Suppose, we want to analyze some complicated queueing system, for which we have no formulas and theory. We are interested in the mean queue length of the system after reaching steady state. The only thing possible for us is to run simulations of this system and look at the queue length at some large time t, e.g. t = 1000 hrs. After 50 simulations, we have got data: X1 = number in queue at time 1000 hrs in 1st simulation X2 = number in queue at time 1000 hrs in 2nd simulation ... X50 = number in queue at time 1000 hrs in 50th simulation q Pn 1 2 Our observations yield an average queue length of x̄ = 21.5 and s = n−1 i=1 (xi − x̄) = 15. A 90% confidence interval is given as s s x̄ − z · √ , x̄ + z · √ n n = = 15 15 21.5 − 1.65 · √ , 21.5 + 1.65 · √ 50 50 (17.9998, 25.0002) = Example 6.2.3 The graphs show a set of 80 experiments. The values from each experiment are shown in one of the green framed boxes. Each experiment consists of simulating 20 values from a standard normal distributions (these are drawn as the small blue lines). For each of the experiments, the average from the 20 value is computed (that’s x̄) as well as a confidence interval for µ- for parts a) and b) it’s the 95% confidence interval, for part c) it is the 90% confidence interval, for part d) it is the 99% confidence interval. The upper and the lower confidence bound together with the sample mean are drawn in red next to the sampled observations. 88 CHAPTER 6. STATISTICAL INFERENCE a) 95 % confidence intervals b) 95 % confidence intervals c) 90 % confidence intervals d) 99 % confidence intervals There are several things to see from this diagram. First of all, we know in this example the “true” value of the parameter µ - since the observations are sampled from a standard normal distribution, µ = 0. The true parameter is represented by the straight horizontal line through 0. 6.2. 
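This experiment is easy to repeat in simulation. The sketch below is an illustration, not part of the notes; it uses z from the table above (rather than a small-sample correction), draws 20 standard normal observations, forms x̄ ± z · s/√n, and counts how often the interval covers the true mean 0.

import random
import statistics

def covers_zero(n=20, z=1.96, seed=None):
    """One experiment: sample n standard normal values and check whether the
    interval x-bar +/- z * s / sqrt(n) contains the true mean 0."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)          # uses the 1/(n-1) definition of s
    half = z * s / n ** 0.5
    return xbar - half < 0 < xbar + half

runs = 10_000
hits = sum(covers_zero(z=1.96, seed=i) for i in range(runs))
print(hits / runs)    # roughly 0.95 (slightly less, since s replaces sigma and n = 20 is small)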
CONFIDENCE INTERVALS 89 We see, that each sample yields a different confidence interval, all of the are centered around the sample mean. The different sizes of the intervals tells us another thing: in computing these confidence intervals, we had to use the estimate s instead of the true standard deviation σ = 1. Each sample gave a slightly different standard deviation. Overall, though, the intervals are not very different in lengths between parts a) and b). The intervals in c) tend to be slightly smaller, though - these are 90% confidence intervals, whereas the intervals in part d) are on average larger than the first ones, they are 99% confidence intervals. Almost all the confidence intervals contain 0 - but not all. And that is, what we expect. For a 90% confidence interval we expect, that in 10 out of 100 times, the confidence interval does not contain the true parameter. When we check that - we see, that in part c) 4 out of the 20 confidence intervals don’t contain the true parameter for µ - that’s 20%, on average we would expect 10% of the conficence intervals not to contain µ. Official use of Confidence Intervals: In an average of 90 out of 100 times the 90% confidence interval of θ does contain the true value of θ. 6.2.2 Large sample confidence intervals for a proportion p Let p be a proportion of a large population or a probability. In order to get an estimate for this proportion, we can take a sample of n individuals from the population and check each one of them, whether or not they fulfill the criterion to be in that proportion of interest. Mathematically, this corresponds to a Bernoulli-n-sequence, where we are only interested in the number of “successes”, X, which in our case corresponds to the number of individuals that qualify for the interesting subgroup. X then has a Binomial distribution, with parameters n and p. We know, that X̄ is an estimate for E[X]. Now think: for a Binomial variable X, the expected value E[X] = n · p. Therefore we get an estimate p̂ for p as p̂ = n1 X̄. Furthermore, we even have a distribution for p̂ for large n: Since X̂ is, using the CLT, a normal variable with E[X̄] = np and V ar[X̄] = np(1 − p), we get that for large n p̂ is a approximately normally distributed with E[p̂] = p and V ar[p̂] = p(1−p) . n BTW: this tells us, that p̂ is an unbiased estimator of p. Prepared with the distribution of p̂ we can set up an α · 100% confidence interval as: (p̂ − e, p̂ + e) where e is some positive real number with: P (|p̂ − p| ≤ e) ≥ α We can derive the expression for e in the same way as in the previous section and we come up with: e=z· p(1 − p) n where z = Φ−1 ( 1+α 2 ). We also run into the problem that e in this form is not ready for use, since we do not know the value for p. In this situation, we have different options. We can either find a value that maximizes the value p(1 − p) or we can substitute an appropriate value for p. 6.2.2.1 Conservative Method: replace p(1 − p) by something that’s guaranteed to be at least as large. The function p(1 − p) has a maximum for p = 0.5. p(1 − p) is then 0.25. p 90 CHAPTER 6. STATISTICAL INFERENCE The conservative α · 100% confidence interval for p is 1 p̂ ± z · √ 2 n where z = Φ−1 ( 1+α 2 ). 6.2.2.2 Substitution Method: Substitute p̂ for p, then: The α · 100% confidence interval for p by substitution is r p̂(1 − p̂) p̂ ± z · n where z = Φ−1 ( 1+α 2 ). Where is the difference between the two methods? 
• for large n there is almost no difference at all • if p̂ is close to 0.5, there is also almost no difference Besides that, conservative confidence intervals (as the name says) are larger than confidence intervals found by substitution. However, they are at the same time easier to compute. Example 6.2.4 Complicated queueing system, continued Suppose, that now we are interested in the large t probability p that a server is available. Doing 100 simulations has shown, that in 65 of them a server was available at time t = 1000 hrs. What is a 95% confidence interval for this probability? 60 If 60 out of 100 simulations showed a free server, we can use p̂ = 100 = 0.6 as an estimate for p. −1 For a 95% confidence interval, z = Φ (0.975) = 1.96. The conservative confidence interval is: 1 1 p̂ ± z √ = 0.6 ± 1.96 √ = 0.6 ± 0.098. 2 n 2 · 100 For the confidence interval using substitution we get: r r p̂(1 − p̂ 0.6 · 0.4 p̂ ± z = 0.6 ± 1.96 = 0.6 ± 0.096. n 100 Example 6.2.5 Batting Average In the 2002 season the baseball player Sammy Sosa had a batting average of 0.288. (The batting average is the ratio of the number of hits and the times at bat.) Sammy Sosa was at bats 555 times in the 2002 season. Could the ”true” batting average still be 0.300? Compute a 95% Confidence Interval for the true batting average. Conservative Method gives: 0.288 ± 0.288 ± 1 1.96 · √ 2 555 0.042 6.2. CONFIDENCE INTERVALS 91 Substitution Method gives: r 0.288 ± 1.96 · 0.288 ± 0.038 0.288(1 − 0.288) 555 The substitution method gives a slightly smaller confidence interval, but both intervals contain 0.3. There is not enough evidence to allow the conclusion that the true average is not 0.3. Confidence intervals give a way to measure the precision we get from simulations intended to evaluate probabilities. But besides that it also gives as a way to plan how large a sample size has to be to get a desired precision. Example 6.2.6 Suppose, we want to estimate the fraction of records in the 2000 IRS data base that have a taxable income over $35 K. We want to get a 98% confidence interval and wish to estimate the quantity to within 0.01. this means that our boundaries e need to be smaller than 0.01 (we’ll choose a conservative confidence interval for ease of computation): e ≤ 0.01 1 z is 2.33 ⇐⇒ z · √ ≤ 0.01 2 n 1 ⇐⇒ 2.33 · √ ≤ 0.01 2 n √ 2.33 ⇐⇒ n ≥ = 116.5 2 · 0.01 ⇒ n ≥ 13573 6.2.3 Related C.I. Methods Related to the previous confidence intervals, are confidence intervals for the difference between two means, µ1 − µ2 , or the difference between two proportions , p1 − p2 . Confidence intervals for these differences are given as: large n confidence interval for µ1 − µ2 (based on independent X̄1 and X̄2 ) X̄1 − X̄2 ± z q s21 n1 + s22 n2 large n confidence interval for p1 − p2 (based on independent p̂1 and p̂2 ) p̂1 − p̂2 ± z 12 q or p̂1 − p̂2 ± z stitution) 1 qn1 + 1 n2 p̂1 (1−p̂1 ) n1 (conservative) + p̂2 (1−p̂2 ) n2 (sub- Why? The argumentation in both cases is very similar - we will only discuss the confidence interval for the difference between means. X̄1 − X̄2 is approximately normal, since X̄1 and X̄2 are approximately normal, with (X̄1 , X̄2 are independent) E[X̄1 − X̄2 ] V ar[X̄1 − X̄2 ] = E[X̄1 ] − E[X̄2 ] = µ1 − µ2 = V ar[X̄1 ] + (−1)2 V ar[X̄2 ] = σ2 σ2 + n1 n2 92 CHAPTER 6. STATISTICAL INFERENCE Then we can use the same arguments as before and get a C.I. for µ1 − µ2 as shown above. 2 Example 6.2.7 Assume, we have two parts of the IRS database: East Coast and West Coast. 
We want to compare the mean taxable income between reported from the two regions in 2000. East Coast West Coast # of sampled records: n1 = 1000 n2 = 2000 mean taxable income: x̄1 = $37200 x̄2 = $42000 standard deviation: s1 = $10100 s2 = $15600 We can, for example, compute a 2 sided 95% confidence interval for µ1 − µ2 = difference in mean taxable income as reported from 2000 tax return between East and West Coast: r 101002 156002 + = −5000 ± 927 37000 − 42000 ± 1000 2000 Note: this shows pretty conclusively that the mean West Coast taxable income is higher than the mean East Coast taxable income (in the report from 2000). The interval contains only negative numbers - if it contained the 0, the message wouldn’t be so clear. One-sided intervals idea: use only one of the end points x̄ ± z √sn This yields confidence intervals for µ of the form (##, ∞) | {z } (−∞, #) | {z } upper bound lower bound However, now we need to adjust z to the new situation. Instead of worrying about two tails of the normal distribution, we use for a one sided confidence interval only one tail. P( x < x + e) < e x+e prob ≤ 1 - confidence interval for Figure 6.2: One sided (upper bounded) confidence interval for µ (in red). Example 6.2.8 complicated queueing system, continued What is a 95% upper confidence bound of µ, the parameter for the length of the queue? −1 x̄ + z √sn is the upper confidence bound. Instead of z = Φ−1 ( α+1 (α) (see fig. 6.2). 2 ) we use z = Φ This gives: 21.5 + 1.65 √1550 = 25.0 as the upper confidence bound. Therefore the one sided upper bounded confidence interval is (−∞, 25.0). Critical values z = Φ−1 (α) for the one sided confidence interval are 6.3. HYPOTHESIS TESTING 93 α 0.90 0.95 0.98 0.99 z = Φ−1 (α) 1.29 1.65 2.06 2.33 Example 6.2.9 Two different digital communication systems send 100 large messages via each system and determine how many are corrupted in transmission. p̂1 = 0.05 and pˆ2 = 0.10. What’s the difference in the corruption rates? Find a 98% confidence interval: Use: r 0.05 · 0.95 0.10 · 0.90 + = −0.05 ± 0.086 0.05 − 0.1 ± 2.33 · 100 100 This calculation tells us, that based on these sample sizes, we don’t even have a solid idea about the sign of p1 − p2 , i.e. we can’t tell which of the pi s is larger. So far, we have only considered large sample confidence intervals. The problem with smaller sample sizes is, that the normal approximation in the CLT doesn’t work, if the standard deviation σ 2 is unknown. What you need to know is, that there exist different methods to compute C.I. for smaller sample sizes. 6.3 Hypothesis Testing Example 6.3.1 Tea Tasting Lady It is claimed that a certain lady is able to tell, by tasting a cup of tea with milk, whether the milk was put in first or the tea was put in first. To put the claim to the test, the lady is given 10 cups of tea to taste and is asked to state in each case whether the milk went in first or the tea went in first. To guard against deliberate or accidental communication of information, before pouring each cup of tea a coin is tossed to decide whether the milk goes in first or the tea goes in first. The person who brings the cup of tea to the lady does not know the outcome of the coin toss. Either the lady has some skill (she can tell to some extent the difference) or she has not, in which case she is simply guessing. Suppose, the lady tested 10 cups of tea in this manner and got 9 of them right. This looks rather suspicious, the lady seems to have some skill. But how can we check it? 
We start with the sceptical assumption that the lady does not have any skill. If the lady has no skill at all, the probability she gives a correct answer for any single cup of tea is 1/2. The number of cups she gets right has therefore a Binomial distribution with parameter n = 10 and p = 0.5. The diagram shows the probability mass function of this distribution: p(x) observed x x 94 CHAPTER 6. STATISTICAL INFERENCE Events that are as unlikely or less likely are, that the lady got all 10 cups right or - very different, but nevertheless very rare - that she only got 1 cup or none right (note, this would be evidence of some “antiskill”, but it would certainly be evidence against her guessing). The total probability for these events is (remember, the binomial probability mass function is p(x) = nx px (1− p)n−x ) p(0) + p(1) + p(9) + p(10) = 0.510 + 10 · 0.510 + 10 · 0.510 + 0.510 = 0.021 i.e. what we have just observed is a fairly rare event under the assumption, that the lady is only guessing. This suggests, that the lady may have some skill in detecting which was poured first into the cup. Jargon: 0.021 is called the p-value for testing the hypothesis p = 0.5. The fact that the p-value is small is evidence against the hypothesis. Hypothesis testing is a formal procedure to check whether or not some - previously made - assumption can be rejected based on the data. We are going to abstract the main elements of the previous example and cook up a standard series of steps for hypothesis testing: Example 6.3.2 University CC administrators have historical records that indicate that between August and Oct 2002 the mean time between hits on the ISU homepage was 2 per min. They suspect that in fact the mean time between hits has decreased (i.e. traffic is up) - sampling 50 inter-arrival times from records for November 2002 gives: X̄ = 1.7 min and s = 1.9 min. Is this strong evidence for an increase in traffic? Formal Procedure Application to Example 1 2 3 4 5 State a “null hypothesis” of the form H0 : function of parameter(s) = # meant to embody a status quo/ pre data view State an “alternative hypothesis” of the form > 6= # Ha : function of parameter(s) < meant to identify departure from H0 State test criteria - consists of a test statistic, a “reference distribution” giving the behavior of the test statistic if H0 is true and the kinds of values of the test statistic that count as evidence against H0 . show computations Report and interpret a p-value = “observed level of significance, with which H0 can be rejected”. This is the probability of an observed value of the test statistic at least as extreme as the one at hand. The smaller this value is, the less likely it is that H0 is true. Note aside: a 90% confidence interval for µ is H0 : µ = 2.0 min between hits Ha : µ < 2 (traffic is down) √ test statistic will be Z = X̄−2.0 s/ n The reference density will be standard normal, large negative values for Z count as evidence against H0 in favor of Ha sample gives z = 1.7−2.0 √ 1.9/ 50 = −1.12 The p-value is P (Z ≤ −1.12) = Φ(−1.12) = 0.1314 This value is not terribly small - the evidence of a decrease in mean time between hits is somewhat weak. s x̄ ± 1.65 √ = 1.7 ± 0.44 n This interval contains the hypothesized value of µ = 2.0 6.3. HYPOTHESIS TESTING 95 There are four basic hypothesis tests of this form, testing a mean, a proportion or differences between two means or two proportions. Depending on the hypothesis, the test statistic will be different. 
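Each of these tests boils down to computing a z statistic and a p-value from the standard normal distribution. As a small sketch (not part of the notes), the computation for Example 6.3.2 looks like this in Python, using the error function for Φ:

import math

def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Example 6.3.2: H0: mu = 2.0 against Ha: mu < 2.0
n, xbar, s = 50, 1.7, 1.9
z = (xbar - 2.0) / (s / math.sqrt(n))
p_value = normal_cdf(z)          # one-sided: large negative z counts against H0
print(round(z, 2), round(p_value, 4))
# -1.12 and p ~ 0.132 (the notes give 0.1314, using z rounded to -1.12)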
Here’s an overview of the tests, we are going to use: Hypothesis Statistic Reference Distribution X̄−# H0 : µ = # Z = s/√n Z is standard normal H0 : p = # Z= H0 : µ1 − µ2 = # Z= H0 : p 1 − p 2 = # where p̂ = Z=√ q p̂−# Z is standard normal #(1−#) n X̄ 1 −X̄2 −# r s2 1 n1 s2 p̂(1−p̂) 1 n1 Z is standard normal + n2 2 p̂1 −p̂q 2 −# + n1 Z is standard normal 2 n1 p̂1 +n2 p̂2 n1 +n2 . Example 6.3.3 tax fraud Historically, IRS taxpayer compliance audits have revealed that about 5% of individuals do things on their tax returns that invite criminal prosecution. A sample of n = 1000 tax returns produces p̂ = 0.061 as an estimate of the fraction of fraudulent returns. does this provide a clear signal of change in the tax payer behavior? 1. state null hypothesis: H0 : p = 0.05 2. alternative hypothesis: Ha : p 6= 0.05 3. test statistic: p̂ − 0.05 Z=p 0.05 · 0.95/n Z has under the null hypothesis a standard normal distribution, any large values of Z - positive and negative values - will count as evidence against H0 . p 4. computation: z = (0.061 − 0.05)/ 0.05 · 0.95/1000 = 1.59 5. p-value: P (|Z| ≥ 1.59) = P (Z ≤ −1.59) + P (Z ≥ 1.59) = 0.11 This is not a very small value, we therefore have only very weak evidence against H0 . Example 6.3.4 life time of disk drives n1 = 30 and n2 = 40 disk drives of 2 different designs were tested under conditions of “accelerated” stress and times to failure recorded: Standard Design n1 = 30 x̄1 = 1205 hr s1 = 1000 hr New Design n2 = 40 x̄2 = 1400 hr s2 = 900 hr Does this provide conclusive evidence that the new design has a larger mean time to failure under “accelerated” stress conditions? 1. state null hypothesis: H0 : µ1 = µ2 (µ1 − µ2 = 0) 2. alternative hypothesis: Ha : µ1 < µ2 (µ1 − µ2 < 0) 96 CHAPTER 6. STATISTICAL INFERENCE 3. test statistic is: x̄1 − x̄2 − 0 Z= q 2 s1 s22 n1 + n2 Z has under the null hypothesis a standard normal distribution, we will consider large negative values of Z as evidence against H0 . p 4. computation: z = (1205 − 1400 − 0)/ 10002 /30 + 9002 /40 = −0.84 5. p-value: P (Z < −0.84) = 0.2005 This is not a very small value, we therefore have only very weak evidence against H0 . Example 6.3.5 queueing systems 2 very complicated queuing systems: We’d like to know, whether there is a difference in the large t probabilities of there being an available server. We do simulations for each system, and look whether at time t = 2000 there is a server available: System 1 System 2 n1 = 1000 runs n2 = 500 runs (each with different random seed) server at time t = 2000 available? 551 p̂1 = 1000 p̂2 = 303 500 How strong is the evidence of a difference between the t = 2000 availability of a server for the two systems? 1. state null hypothesis: H0 : p1 = p2 (p1 − p2 = 0) 2. alternative hypothesis: Ha : p1 6= p2 (p1 − p2 6= 0) 3. Preliminary: note that, if there was no difference between the two systems, a plausible estimate of the availability of a server would be p̂ = np̂1 + np̂2 551 + 303 = 0.569 = n1 + n2 1000 + 500 a test statistic is: Z=p p̂1 − p̂2 − 0 q p̂(1 − p̂) · n11 + 1 n2 Z has under the null hypothesis a standard normal distribution, we will consider large values of Z as evidence against H0 . p p 4. computation: z = (0.551 − 0.606)/( 0.569 · (1 − 0.569) 1/1000 + 1/500) = −2.03 5. p-value: P (|Z| > 2.03) = 0.04 This is fairly strong evidence of a real difference in t=2000 availabilities of a server between the two systems. 6.4 Regression A statistical investigation only rarely focusses on the distribution of a single variable. 
We are often interested in comparisons among several variables, in changes in a variable over time, or in relationships among several variables. The idea of regression is that we have a vector X1 , . . . , Xk and try to approximate the behavior of Y by finding a function g(X1 , . . . , Xk ) such that Y ≈ g(X1 , . . . , Xk ). Simplest possible version is: 6.4. REGRESSION 6.4.1 97 Simple Linear Regression (SLR) Situation: k = 1 and Y is approximately linearly related to X, i.e. g(x) = b0 + b1 x. Notes: • Scatterplot of Y vs X should show the linear relationship. • linear relationship may be true only after a transformation of X and/or Y , i.e. one needs to find the “right” scale for the variables: e.g. if y ≈ cxb , this is nonlinear in x, but it implies that ln x + ln c, ln y ≈ b |{z} |{z} x0 =:y 0 so on a log scale for both x and y-axis one gets a linear relationship. Example 6.4.1 Mileage vs Weight Measurements on 38 1978-79 model automobiles. Gas mileage in miles per gallon as measured by Consumers’ Union on a test track. Weight as reported by automobile manufacturer. A scatterplot of mpg versus weight shows an indirect proportional relationship: 35 30 M 25 P G 20 2.25 Transform weight by 1 x 3.00 Weight 3.75 to weight−1 . A scatterplot of mpg versus weight−1 reveals a linear relationship: 35 30 M 25 P G 20 0.300 0.375 1/Wgt 0.450 Example 6.4.2 Olympics - long jump Results for the long jump for all olympic games between 1900 and 1996 are: 98 CHAPTER 6. STATISTICAL INFERENCE year 1960 1964 1968 1972 1976 1980 1984 1988 1992 1996 year long jump (in m) 1900 7.19 1904 7.34 1908 7.48 1912 7.60 1920 7.15 1924 7.45 1928 7.74 1932 7.64 1936 8.06 1948 7.82 1952 7.57 1956 7.83 A scatterplot of long jump versus year shows: long jump (in m) 8.12 8.07 8.90 8.24 8.34 8.54 8.54 8.72 8.67 8.50 l o 8.5 n g j 8.0 u m 7.5 p 0 20 40 year 60 80 The plot shows that it is perhaps reasonable to say that y ≈ β0 + β1 x The first issue to be dealt with in this context is: if we accept that y ≈ β0 + β1 x, how do we derive empirical values of β0 , β1 from n data points (x, y)? The standard answer is the “least squares” principle: y y=b0 + b1 x 0.75 0.50 0.25 -0.00 0.2 0.4 0.6 0.8 x In comparing lines that might be drawn through the plot we look at: Q(b0 , b1 ) = n X (yi − (b0 + b1 xi )) 2 i=1 i.e. we look at the sum of squared vertical distances from points to the line and attempt to minimize this 6.4. REGRESSION 99 sum of squares: d Q(b0 , b1 ) db0 = −2 d Q(b0 , b1 ) db1 = −2 n X i=1 n X (yi − (b0 + b1 xi )) xi (yi − (b0 + b1 xi )) i=1 Setting the derivatives to zero gives: nb0 − b1 b0 n X xi − b1 i=1 n X i=1 n X xi = x2i = i=1 n X i=1 n X yi xi yi i=1 Least squares solutions for b0 and b1 are: b1 = Pn Pn Pn Pn 1 (x − x̄)(yi − ȳ) i=1 xi · i=1 xi yi − n i=1 yi i=1 Pn i = Pn Pn 2 2 1 2 (x − x̄) xi − ( xi ) i=1 i i=1 n b0 = n slope i=1 n 1X 1X yi − b1 xi ȳ − x̄b1 = n i=1 n i=1 y − intercept at x = 0 These solutions produce the “best fitting line”. Example 6.4.3 Olympics - long jump, continued X := year, Y := long jump n X n X xi = 1100, i=1 i=1 n X x2i = 74608 n X yi = 175.518, i=1 yi2 = 1406.109, n X xi yi = 9079.584 i=1 i=1 The parameters for the best fitting line are: b1 = b0 = 9079.584 − 1100·175.518 22 11002 22 74608 − = 0.0155(in m) 175.518 1100 − · 0.0155 = 7.2037 22 22 The regression equation is high jump = 7.204 + 0.016year (in m). It is useful for addition, to be able to judge how well the line describes the data - i.e. how “linear looking” a plot really is. 
There are a couple of means doing this: 100 CHAPTER 6. STATISTICAL INFERENCE 6.4.1.1 The sample correlation r This is what we would get for a theoretical correlation % if we had random variables X and Y and their distribution. Pn Pn Pn Pn 1 i=1 xi yi − n i=1 xi · i=1 yi i=1 (xi − x̄)(yi − ȳ) = r r := pPn Pn 2 2 Pn Pn Pn Pn 2 2 1 1 2 2 i=1 (xi − x̄) · i=1 (yi − ȳ) i=1 xi − n ( i=1 xi ) i=1 yi − n ( i=1 yi ) The numerator is the numerator of b1 , one part under the root of the denominator is the denominator of b1 . Because of its connection to %, the sample correlation r fulfills (it’s not obvious to see, and we want prove it): • −1 ≤ r ≤ 1 • r = ±1 exactly, when all (x, y) data pairs fall on a single straight line. • r has the same sign as b1 . Example 6.4.4 Olympics - long jump, continued r= q 9079.584 − (74608 − 1100·175.518 22 11002 n )(1406.109 − = 0.8997 175.5182 ) 22 Second measure for goodness of fit: 6.4.1.2 Coefficient of determination R2 This is based on a comparison of “variation accounted for” by the line versus “raw variation” of y. The idea is that !2 n n n X X 1 X 2 2 (yi − ȳ) = yi − yi = SST T otal S um of S quares n i=1 i=1 i=1 is a measure for the variability of y. (It’s (n − 1) · s2y ) y 0.75 0.50 y 0.25 -0.00 0.2 0.4 0.6 0.8 x After fitting the line ŷ = b0 + b1 x, one doesn’t predict y as ȳ anymore and suffer the errors of prediction above, but rather only the errors ŷi − yi =: ei . So, after fitting the line n X i=1 e2i = n X (yi − ŷ)2 = SSES um of S quares of E rrors i=1 is a measure for the remaining/residual/ error variation. 6.4. REGRESSION 101 y y=b0 + b1 x 0.75 0.50 0.25 -0.00 0.2 0.4 0.6 0.8 x The fact is that SST ≥ SSE. So: SSR := SST − SSE ≥ 0. SSR is taken as a measure of “variation accounted for” in the fitting of the line. The coefficient of determination R2 is defined as: R2 = SSR SST Obviously: 0 ≤ R2 ≤ 1, the closer R2 is to 1, the better is the linear fit. Example 6.4.5 Olympics - long jump, continued Pn Pn 2 2 SST = i=1 yi2 − n1 ( i=1 yi ) = 1406.109 − 175.518 = 5.81. 22 SSE and SSR? y x ŷ y − ŷ (y − ŷ)2 7.185 0 7.204 -0.019 0.000 7.341 4 7.266 0.075 0.006 7.480 8 7.328 0.152 0.023 7.601 12 7.390 0.211 0.045 7.150 20 7.513 -0.363 0.132 7.445 24 7.575 -0.130 0.017 7.741 28 7.637 0.104 0.011 7.639 32 7.699 -0.060 0.004 8.060 36 7.761 0.299 0.089 7.823 48 7.947 -0.124 0.015 7.569 52 8.009 -0.440 0.194 7.830 56 8.071 -0.241 0.058 8.122 60 8.133 -0.011 0.000 8.071 64 8.195 -0.124 0.015 8.903 68 8.257 0.646 0.417 8.242 72 8.319 -0.077 0.006 8.344 76 8.381 -0.037 0.001 8.541 80 8.443 0.098 0.010 8.541 84 8.505 0.036 0.001 8.720 88 8.567 0.153 0.024 8.670 92 8.629 0.041 0.002 8.500 96 8.691 -0.191 0.036 SSE = 1.107 So SSR = SST − SSE = 5.810 − 1.107 = 4.703 and R2 = SSR SST = 0.8095. Connection between R2 and r R2 is SSR/SST - that’s the squared sample correlation of y and ŷ. If - and only if! - we use a linear function in x to predict y, i.e. ŷ = b0 + b1 x, the correlation between ŷ and x is 1. Then R2 (and only then!) is equal to the squared sample correlation between y and x = r2 : R2 = r2 if and only if ŷ = b0 + b1 x 102 CHAPTER 6. STATISTICAL INFERENCE Example 6.4.6 Olympics - long jump, continued R2 = 0.8095 = (0.8997)2 = r2 . It is possible to go beyond simply fitting a line and summarizing the goodness of fit in terms of r and R2 to doing inference, i.e. making confidence intervals, predictions, . . . based on the line fitting. But for that, we need a probability model. 
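Before setting up that model, note that the least squares and correlation formulas only require the summary sums Σxi, Σxi², Σyi, Σyi², Σxiyi. A short Python sketch (not part of the notes) that reproduces the long-jump fit of Example 6.4.3:

def slr_fit(n, sx, sy, sxx, syy, sxy):
    """Least squares slope, intercept and sample correlation from summary sums."""
    b1 = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)        # slope
    b0 = sy / n - b1 * sx / n                              # intercept (y-bar - b1 * x-bar)
    r = (sxy - sx * sy / n) / (
        ((sxx - sx ** 2 / n) * (syy - sy ** 2 / n)) ** 0.5)
    return b1, b0, r

# Olympics long jump: n = 22 and the sums given in Example 6.4.3
b1, b0, r = slr_fit(22, 1100, 175.518, 74608, 1406.109, 9079.584)
print(round(b1, 4), round(b0, 4), round(r, 4), round(r ** 2, 4))
# 0.0155, 7.2037, 0.8997, 0.8095 -- matching b1, b0, r and R^2 above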
6.4.2 Simple Linear Regression Model

In words: for input x the output y is normally distributed with mean β0 + β1 x = µ_{y|x} and standard deviation σ. In symbols:

yi = β0 + β1 xi + εi,  with εi i.i.d. normal N(0, σ²).

β0, β1, and σ² are the parameters of the model and have to be estimated from the data (the data pairs (xi, yi)). Pictorially:

[Figure: for each value of x, the density of y given x is a normal curve centered on the line β0 + β1 x]

How do we get estimates for β0, β1, and σ²? Point estimates: β̂0 = b0 and β̂1 = b1 from the least squares fit (which gives β̂0 and β̂1 the name Least Squares Estimates).

And σ²? σ² measures the variation around the "true" line β0 + β1 x - we don't know that line, but only b0 + b1 x. Should we base the estimation of σ² on this line? The "right" estimator for σ² turns out to be:

σ̂² = (1/(n − 2)) Σ_{i=1}^n (yi − ŷi)² = SSE/(n − 2).

Example 6.4.7 Olympics - long jump, continued

β̂0 = b0 = 7.2037 (in m)
β̂1 = b1 = 0.0155 (in m per year)
σ̂² = SSE/(n − 2) = 1.107/20 = 0.055

Overall, we assume a linear regression model of the form: y = 7.2037 + 0.0155x + e, with e ∼ N(0, 0.055).

Appendix A

Distribution Tables

Binomial Distribution

Bn,p(x) = Σ_{i=0}^{⌊x⌋} C(n, i) p^i (1 − p)^(n−i),  where C(n, i) denotes the binomial coefficient.

n=1 x=0 p=0.01 0.99 0.05 0.95 0.1 0.9 0.15 1/6 0.85 0.8333333 0.2 0.8 0.25 0.75 0.3 0.7 1/3 0.6666667 0.4 0.6 0.5 0.5
n=2 x=0 1 p=0.01 0.9801 0.9999 0.05 0.9025 0.9975 0.1 0.81 0.99 0.15 1/6 0.7225 0.6944444 0.9775 0.9722222 0.2 0.64 0.96 0.25 0.5625 0.9375 0.3 0.49 0.91 1/3 0.4444444 0.8888889 0.4 0.36 0.84 0.5 0.25 0.75
n=3 x=0 1 2 p=0.01 0.970299 0.999702 0.999999 0.05 0.857375 0.992750 0.999875 0.1 0.729 0.972 0.999 0.15 1/6 0.614125 0.5787037 0.939250 0.9259259 0.996625 0.9953704 0.2 0.512 0.896 0.992 0.25 0.421875 0.843750 0.984375 0.3 0.343 0.784 0.973 1/3 0.2962963 0.7407407 0.9629630 0.4 0.216 0.648 0.936 0.5 0.125 0.500 0.875
n=4 x=0 1 2 3 p=0.01 0.960596 0.999408 0.999996 1.000000 0.05 0.8145062 0.9859812 0.9995188 0.9999938 0.1 0.6561 0.9477 0.9963 0.9999 0.15 0.5220063 0.8904813 0.9880187 0.9994937 1/6 0.4822531 0.8680556 0.9837963 0.9992284 0.2 0.4096 0.8192 0.9728 0.9984 0.25 0.3164063 0.7382812 0.9492188 0.9960938 0.3 0.2401 0.6517 0.9163 0.9919 1/3 0.1975309 0.5925926 0.8888889 0.9876543 0.4 0.1296 0.4752 0.8208 0.9744 0.5 0.0625 0.3125 0.6875 0.9375
n=5 x=0 1 2 3 4 p=0.01 0.9509900 0.9990199 0.9999901 1.0000000 1.0000000 0.05 0.7737809 0.9774075 0.9988419 0.9999700 0.9999997 0.1 0.59049 0.91854 0.99144 0.99954 0.99999 0.15 0.4437053 0.8352100 0.9733881 0.9977725 0.9999241 1/6 0.4018776 0.8037551 0.9645062 0.9966564 0.9998714 0.2 0.32768 0.73728 0.94208 0.99328 0.99968 0.25 0.2373047 0.6328125 0.8964844 0.9843750 0.9990234 0.3 0.16807 0.52822 0.83692 0.96922 0.99757 1/3 0.1316872 0.4609053 0.7901235 0.9547325 0.9958848 0.4 0.07776 0.33696 0.68256 0.91296 0.98976 0.5 0.03125 0.18750 0.50000 0.81250 0.96875
n=6 x=0 1 2 3 4 5 p=0.01 0.9414801 0.9985396 0.9999804 0.9999999 1.0000000 1.0000000 0.05 0.7350919 0.9672262 0.9977702 0.9999136 0.9999982 1.0000000 0.1 0.531441 0.885735 0.984150 0.998730 0.999945 0.999999 0.15 0.3771495 0.7764843 0.9526614 0.9941148 0.9996013 0.9999886 1/6 0.3348980 0.7367755 0.9377143 0.9912980 0.9993356 0.9999786 0.2 0.262144 0.655360 0.901120 0.983040 0.998400 0.999936 0.25 0.1779785 0.5339355 0.8305664 0.9624023 0.9953613 0.9997559 0.3 0.117649 0.420175 0.744310 0.929530 0.989065 0.999271 1/3 0.0877915 0.3511660 0.6803841 0.8998628 0.9821674 0.9986283 0.4 0.046656 0.233280 0.544320 0.820800 0.959040 0.995904 0.5 0.015625 0.109375 0.343750 0.656250 0.890625 0.984375
n=7 x=0 1 2 3 4 5 6 p=0.01 0.9320653 0.9979690 0.9999660 0.9999997 1.0000000
1.0000000 1.0000000 0.05 0.6983373 0.9556195 0.9962430 0.9998064 0.9999940 0.9999999 1.0000000 0.1 0.4782969 0.8503056 0.9743085 0.9972720 0.9998235 0.9999936 0.9999999 0.15 0.3205771 0.7165841 0.9262348 0.9878968 0.9987784 0.9999305 0.9999983 1/6 0.2790816 0.6697960 0.9042245 0.9823674 0.9979960 0.9998714 0.9999964 0.2 0.2097152 0.5767168 0.8519680 0.9666560 0.9953280 0.9996288 0.9999872 0.25 0.1334839 0.4449463 0.7564087 0.9294434 0.9871216 0.9986572 0.9999390 0.3 0.0823543 0.3294172 0.6470695 0.8739640 0.9712045 0.9962092 0.9997813 1/3 0.05852766 0.26337449 0.57064472 0.82670325 0.95473251 0.99314129 0.99954275 0.4 0.0279936 0.1586304 0.4199040 0.7102080 0.9037440 0.9811584 0.9983616 0.5 0.0078125 0.0625000 0.2265625 0.5000000 0.7734375 0.9375000 0.9921875 103 104 APPENDIX A. DISTRIBUTION TABLES n=8 x=0 1 2 3 4 5 6 7 p=0.01 0.9227447 0.9973099 0.9999461 0.9999993 1.0000000 1.0000000 1.0000000 1.0000000 0.05 0.6634204 0.9427553 0.9942118 0.9996282 0.9999846 0.9999996 1.0000000 1.0000000 0.1 0.4304672 0.8131047 0.9619082 0.9949756 0.9995683 0.9999766 0.9999993 1.0000000 0.15 0.2724905 0.6571830 0.8947872 0.9786475 0.9971461 0.9997577 0.9999881 0.9999997 1/6 0.2325680 0.6046769 0.8651531 0.9693436 0.9953912 0.9995588 0.9999756 0.9999994 0.2 0.1677722 0.5033165 0.7969178 0.9437184 0.9895936 0.9987686 0.9999155 0.9999974 0.25 0.1001129 0.3670807 0.6785431 0.8861847 0.9727020 0.9957733 0.9996185 0.9999847 0.3 0.05764801 0.25529833 0.55177381 0.80589565 0.94203235 0.98870779 0.99870967 0.99993439 1/3 0.03901844 0.19509221 0.46822131 0.74135040 0.91205609 0.98033836 0.99740893 0.99984758 0.4 0.01679616 0.10637568 0.31539456 0.59408640 0.82632960 0.95019264 0.99148032 0.99934464 0.5 0.00390625 0.03515625 0.14453125 0.36328125 0.63671875 0.85546875 0.96484375 0.99609375 n=9 x=0 1 2 3 4 5 6 7 8 p=0.01 0.9135172 0.9965643 0.9999197 0.9999988 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.05 0.6302494 0.9287886 0.9916390 0.9993574 0.9999668 0.9999988 1.0000000 1.0000000 1.0000000 0.1 0.3874205 0.7748410 0.9470279 0.9916689 0.9991091 0.9999358 0.9999970 0.9999999 1.0000000 0.15 0.2316169 0.5994792 0.8591466 0.9660685 0.9943713 0.9993660 0.9999536 0.9999980 1.0000000 1/6 0.1938067 0.5426588 0.8217404 0.9519785 0.9910499 0.9988642 0.9999061 0.9999954 0.9999999 0.2 0.1342177 0.4362076 0.7381975 0.9143583 0.9804186 0.9969336 0.9996861 0.9999811 0.9999995 0.25 0.07508469 0.30033875 0.60067749 0.83427429 0.95107269 0.99000549 0.99865723 0.99989319 0.99999619 0.3 0.04035361 0.19600323 0.46283117 0.72965910 0.90119134 0.97470516 0.99570911 0.99956697 0.99998032 1/3 0.02601229 0.14306762 0.37717828 0.65030737 0.85515419 0.95757761 0.99171874 0.99903470 0.99994919 0.4 0.01007770 0.07054387 0.23178701 0.48260966 0.73343232 0.90064742 0.97496525 0.99619891 0.99973786 0.5 0.00195313 0.01953125 0.08984375 0.25390625 0.50000000 0.74609375 0.91015625 0.98046875 0.99804688 n=10 x=0 1 2 3 4 5 6 7 8 9 p=0.01 0.9043821 0.9957338 0.9998862 0.9999980 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.05 0.5987369 0.9138616 0.9884964 0.9989715 0.9999363 0.9999972 0.9999999 1.0000000 1.0000000 1.0000000 0.1 0.3486784 0.7360989 0.9298092 0.9872048 0.9983651 0.9998531 0.9999909 0.9999996 1.0000000 1.0000000 0.15 0.1968744 0.5442998 0.8201965 0.9500302 0.9901259 0.9986168 0.9998654 0.9999913 0.9999997 1.0000000 1/6 0.1615056 0.4845167 0.7752268 0.9302722 0.9845380 0.9975618 0.9997325 0.9999806 0.9999992 1.0000000 0.2 0.1073742 0.3758096 0.6777995 0.8791261 0.9672065 0.9936306 0.9991356 0.9999221 0.9999958 
0.9999999 0.25 0.05631351 0.24402523 0.52559280 0.77587509 0.92187309 0.98027229 0.99649429 0.99958420 0.99997044 0.99999905 0.3 0.02824752 0.14930835 0.38278279 0.64961072 0.84973167 0.95265101 0.98940792 0.99840961 0.99985631 0.99999410 1/3 0.01734153 0.10404918 0.29914139 0.55926434 0.78687192 0.92343647 0.98033836 0.99659605 0.99964436 0.99998306 0.4 0.00604662 0.04635740 0.16728975 0.38228060 0.63310326 0.83376138 0.94523812 0.98770545 0.99832228 0.99989514 0.5 0.00097656 0.01074219 0.05468750 0.17187500 0.37695313 0.62304688 0.82812500 0.94531250 0.98925781 0.99902344 n=11 x=0 1 2 3 4 5 6 7 8 9 10 p=0.01 0.8953383 0.9948203 0.9998446 0.9999969 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.05 0.5688001 0.8981054 0.9847647 0.9984477 0.9998881 0.9999942 0.9999998 1.0000000 1.0000000 1.0000000 1.0000000 0.1 0.3138106 0.6973569 0.9104381 0.9814652 0.9972490 0.9997043 0.9999771 0.9999988 1.0000000 1.0000000 1.0000000 0.15 0.1673432 0.4921860 0.7788120 0.9305551 0.9841116 0.9973431 0.9996781 0.9999724 0.9999984 0.9999999 1.0000000 1/6 0.1345880 0.4306816 0.7267751 0.9044313 0.9754937 0.9953912 0.9993707 0.9999392 0.9999961 0.9999998 1.0000000 0.2 0.08589935 0.32212255 0.61740155 0.83886080 0.94959043 0.98834579 0.99803464 0.99976479 0.99998106 0.99999908 0.99999998 0.25 0.04223514 0.19709730 0.45520091 0.71330452 0.88537359 0.96567249 0.99243879 0.99881172 0.99987388 0.99999189 0.99999976 0.3 0.01977327 0.11299010 0.31274045 0.56956234 0.78969538 0.92177521 0.97838085 0.99570911 0.99942230 0.99995276 0.99999823 1/3 0.01156102 0.07514663 0.23411065 0.47255669 0.71100273 0.87791495 0.96137106 0.99117682 0.99862826 0.99987016 0.99999435 0.4 0.00362797 0.03023309 0.11891681 0.29628426 0.53277420 0.75349813 0.90064742 0.97071852 0.99407555 0.99926600 0.99995806 0.5 0.00048828 0.00585938 0.03271484 0.11328125 0.27441406 0.50000000 0.72558594 0.88671875 0.96728516 0.99414063 0.99951172 n=12 x=0 1 2 3 4 5 6 7 8 9 10 11 p=0.01 0.8863849 0.9938255 0.9997944 0.9999954 0.9999999 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.05 0.5403601 0.8816401 0.9804317 0.9977636 0.9998161 0.9999889 0.9999995 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.1 0.2824295 0.6590023 0.8891300 0.9743625 0.9956707 0.9994588 0.9999498 0.9999966 0.9999998 1.0000000 1.0000000 1.0000000 0.15 0.1422418 0.4434596 0.7358181 0.9077937 0.9760781 0.9953584 0.9993279 0.9999283 0.9999945 0.9999997 1.0000000 1.0000000 1/6 0.1121567 0.3813326 0.6774262 0.8748219 0.9636500 0.9920750 0.9987075 0.9998445 0.9999866 0.9999992 1.0000000 1.0000000 0.2 0.06871948 0.27487791 0.55834575 0.79456895 0.92744450 0.98059472 0.99609687 0.99941876 0.99993780 0.99999547 0.99999980 1.00000000 0.25 0.03167635 0.15838176 0.39067501 0.64877862 0.84235632 0.94559777 0.98574722 0.99721849 0.99960834 0.99996239 0.99999779 0.99999994 0.3 0.01384129 0.08502505 0.25281535 0.49251577 0.72365547 0.88215126 0.96139916 0.99051063 0.99830834 0.99979362 0.99998459 0.99999947 1/3 0.00770735 0.05395143 0.18112265 0.39307468 0.63152071 0.82227754 0.93355236 0.98124157 0.99614445 0.99945620 0.99995296 0.99999812 0.4 0.00217678 0.01959104 0.08344332 0.22533728 0.43817822 0.66520856 0.84178771 0.94269008 0.98473273 0.99718982 0.99968123 0.99998322 0.5 0.00024414 0.00317383 0.01928711 0.07299805 0.19384766 0.38720703 0.61279297 0.80615234 0.92700195 0.98071289 0.99682617 0.99975586 n=13 x=0 1 2 3 4 5 6 7 8 9 10 11 12 p=0.01 0.8775210 0.9927511 0.9997347 0.9999933 0.9999999 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 
1.0000000 1.0000000 1.0000000 0.05 0.5133421 0.8645761 0.9754922 0.9968970 0.9997134 0.9999803 0.9999990 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.1 0.2541866 0.6213450 0.8661172 0.9658393 0.9935398 0.9990800 0.9999007 0.9999919 0.9999995 1.0000000 1.0000000 1.0000000 1.0000000 0.15 0.1209055 0.3982769 0.6919643 0.8819973 0.9658354 0.9924664 0.9987325 0.9998382 0.9999846 0.9999989 0.9999999 1.0000000 1.0000000 1/6 0.09346388 0.33646996 0.62807727 0.84192262 0.94884530 0.98733746 0.99760204 0.99965496 0.99996289 0.99999711 0.99999985 0.99999999 1.00000000 0.2 0.05497558 0.23364622 0.50165218 0.74732431 0.90086939 0.96996468 0.99299644 0.99875438 0.99983399 0.99998394 0.99999893 0.99999996 1.00000000 0.25 0.02375726 0.12670541 0.33260170 0.58425272 0.79396190 0.91978741 0.97570986 0.99435067 0.99901088 0.99987388 0.99998894 0.99999940 0.99999999 0.3 0.00968890 0.06366992 0.20247826 0.42060565 0.65431356 0.83460252 0.93762479 0.98177719 0.99596903 0.99934804 0.99992730 0.99999500 0.99999984 1/3 0.00513823 0.03853673 0.13873224 0.32242400 0.55203870 0.75869193 0.89646076 0.96534517 0.99117682 0.99835228 0.99978737 0.99998307 0.99999937 0.4 0.00130607 0.01262534 0.05790241 0.16857970 0.35304185 0.57439642 0.77115605 0.90232913 0.96791567 0.99220698 0.99868467 0.99986243 0.99999329 0.5 0.00012207 0.00170898 0.01123047 0.04614258 0.13342285 0.29052734 0.50000000 0.70947266 0.86657715 0.95385742 0.98876953 0.99829102 0.99987793 105 n=14 x=0 1 2 3 4 5 6 7 8 9 10 11 12 13 p=0.01 0.8687458 0.9915988 0.9996649 0.9999908 0.9999998 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.05 0.4876750 0.8470144 0.9699464 0.9958268 0.9995726 0.9999669 0.9999980 0.9999999 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.1 0.2287679 0.5846291 0.8416400 0.9558671 0.9907698 0.9985259 0.9998186 0.9999828 0.9999987 0.9999999 1.0000000 1.0000000 1.0000000 1.0000000 0.15 0.1027697 0.3566712 0.6479112 0.8534924 0.9532597 0.9884717 0.9977925 0.9996724 0.9999626 0.9999968 0.9999998 1.0000000 1.0000000 1.0000000 1/6 0.07788657 0.29596895 0.57947605 0.80628173 0.93102485 0.98092210 0.99589128 0.99931280 0.99991157 0.99999141 0.99999939 0.99999997 1.00000000 1.00000000 0.2 0.04398047 0.19791209 0.44805099 0.69818988 0.87016037 0.95614562 0.98839009 0.99760279 0.99961807 0.99995395 0.99999594 0.99999975 0.99999999 1.00000000 0.25 0.01781795 0.10096837 0.28112762 0.52133996 0.74153460 0.88833103 0.96172924 0.98969047 0.99784582 0.99965813 0.99996018 0.99999679 0.99999984 1.00000000 0.3 0.00678223 0.04747562 0.16083576 0.35516743 0.58420119 0.78051584 0.90671811 0.96853147 0.99171148 0.99833434 0.99975352 0.99997469 0.99999839 0.99999995 1/3 0.00342549 0.02740390 0.10533374 0.26119341 0.47550047 0.68980752 0.85053781 0.94238370 0.98256627 0.99596046 0.99930901 0.99991783 0.99999394 0.99999979 0.4 0.00078364 0.00809763 0.03979158 0.12430878 0.27925699 0.48585459 0.69245220 0.84985990 0.94168106 0.98249046 0.99609359 0.99939132 0.99994094 0.99999732 0.5 6.1035e-05 9.1553e-04 6.4697e-03 2.8687e-02 8.9783e-02 2.1198e-01 3.9526e-01 6.0474e-01 7.8802e-01 9.1022e-01 9.7131e-01 9.9353e-01 9.9908e-01 9.9994e-01 n=15 x=0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 p=0.01 0.8600584 0.9903702 0.9995842 0.9999875 0.9999997 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.05 0.4632912 0.8290475 0.9637998 0.9945327 0.9993853 0.9999472 0.9999965 0.9999998 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.1 
0.2058911 0.5490430 0.8159389 0.9444444 0.9872795 0.9977503 0.9996894 0.9999664 0.9999972 0.9999998 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.15 0.08735422 0.31858598 0.60422520 0.82265520 0.93829461 0.98318991 0.99639441 0.99939039 0.99991910 0.99999166 0.99999935 0.99999996 1.00000000 1.00000000 1.00000000 1/6 0.06490547 0.25962189 0.53222487 0.76848078 0.91023433 0.97260589 0.99339642 0.99874255 0.99981178 0.99997810 0.99999806 0.99999987 0.99999999 1.00000000 1.00000000 0.2 0.03518437 0.16712577 0.39802321 0.64816210 0.83576628 0.93894857 0.98194119 0.99576025 0.99921501 0.99988677 0.99998754 0.99999899 0.99999994 1.00000000 1.00000000 0.25 0.01336346 0.08018077 0.23608781 0.46128688 0.68648594 0.85163192 0.94337969 0.98270016 0.99580699 0.99920505 0.99988466 0.99998764 0.99999908 0.99999996 1.00000000 0.3 0.00474756 0.03526760 0.12682772 0.29686793 0.51549106 0.72162144 0.86885743 0.94998746 0.98475747 0.99634748 0.99932777 0.99990834 0.99999128 0.99999948 0.99999999 1/3 0.00228366 0.01941110 0.07935713 0.20924019 0.40406478 0.61837184 0.79696105 0.91176840 0.96917208 0.99149573 0.99819282 0.99971489 0.99996857 0.99999784 0.99999993 0.4 0.00047019 0.00517204 0.02711400 0.09050190 0.21727771 0.40321555 0.60981316 0.78689682 0.90495259 0.96616670 0.99065234 0.99807223 0.99972110 0.99997477 0.99999893 0.5 3.0518e-05 4.8828e-04 3.6926e-03 1.7578e-02 5.9235e-02 1.5088e-01 3.0362e-01 5.0000e-01 6.9638e-01 8.4912e-01 9.4077e-01 9.8242e-01 9.9631e-01 9.9951e-01 9.9997e-01 n=16 x=0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 p=0.01 0.8514578 0.9890671 0.9994921 0.9999835 0.9999996 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.05 0.4401267 0.8107597 0.9570621 0.9929961 0.9991427 0.9999191 0.9999940 0.9999997 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.1 0.1853020 0.5147278 0.7892493 0.9315938 0.9829960 0.9967032 0.9994955 0.9999387 0.9999941 0.9999995 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.15 0.07425109 0.28390121 0.56137932 0.78989070 0.92094870 0.97645562 0.99441374 0.99894100 0.99983979 0.99998078 0.99999819 0.99999987 0.99999999 1.00000000 1.00000000 1.00000000 1/6 0.05408789 0.22716915 0.48679104 0.72910480 0.88660874 0.96221063 0.98993133 0.99785153 0.99963357 0.99995038 0.99999473 0.99999957 0.99999997 1.00000000 1.00000000 1.00000000 0.2 0.0281475 0.1407375 0.3518437 0.5981343 0.7982454 0.9183121 0.9733427 0.9929964 0.9985241 0.9997524 0.9999674 0.9999967 0.9999998 1.0000000 1.0000000 1.0000000 0.25 0.01002260 0.06347644 0.19711105 0.40498711 0.63018618 0.81034543 0.92044275 0.97287004 0.99253028 0.99835553 0.99971476 0.99996189 0.99999622 0.99999974 0.99999999 1.00000000 0.3 0.00332329 0.02611159 0.09935968 0.24585586 0.44990412 0.65978233 0.82468663 0.92564845 0.97432647 0.99287048 0.99843368 0.99973417 0.99996640 0.99999702 0.99999984 1.00000000 1/3 0.00152244 0.01370195 0.05937512 0.16594583 0.33912325 0.54693615 0.73743131 0.87349929 0.95003752 0.98405451 0.99596046 0.99920754 0.99988401 0.99998808 0.99999923 0.99999998 0.4 0.00028211 0.00329130 0.01833721 0.06514674 0.16656738 0.32884041 0.52717411 0.71606335 0.85773028 0.94168106 0.98085808 0.99510427 0.99906155 0.99987330 0.99998926 0.99999957 0.5 1.5259e-05 2.5940e-04 2.0905e-03 1.0635e-02 3.8406e-02 1.0506e-01 2.2725e-01 4.0181e-01 5.9819e-01 7.7275e-01 8.9494e-01 9.6159e-01 9.8936e-01 9.9791e-01 9.9974e-01 9.9998e-01 n=17 x=0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 p=0.01 0.8429432 0.9876910 
0.9993878 0.9999786 0.9999994 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.05 0.4181203 0.7922280 0.9497470 0.9911994 0.9988354 0.9998803 0.9999903 0.9999994 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.1 0.1667718 0.4817852 0.7617972 0.9173594 0.9778558 0.9953325 0.9992162 0.9998944 0.9999885 0.9999990 0.9999999 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.15 0.06311342 0.25245369 0.51975760 0.75561400 0.90129000 0.96812958 0.99172002 0.99826191 0.99970497 0.99995963 0.99999558 0.99999962 0.99999997 1.00000000 1.00000000 1.00000000 1.00000000 1/6 0.04507324 0.19832227 0.44352072 0.68871917 0.86035808 0.94961032 0.98531121 0.99653149 0.99933656 0.99989758 0.99998734 0.99999876 0.99999991 0.99999999 1.00000000 1.00000000 1.00000000 0.2 0.0225180 0.1182195 0.3096225 0.5488762 0.7582232 0.8942988 0.9623366 0.9890657 0.9974185 0.9995068 0.9999244 0.9999908 0.9999991 0.9999999 1.0000000 1.0000000 1.0000000 0.25 0.00751695 0.05011298 0.16370240 0.35301810 0.57388641 0.76530561 0.89291842 0.95976322 0.98761522 0.99689922 0.99937495 0.99990011 0.99998764 0.99999886 0.99999993 1.00000000 1.00000000 0.3 0.00232631 0.01927510 0.07738525 0.20190701 0.38868964 0.59681886 0.77521534 0.89535990 0.95972306 0.98730728 0.99676472 0.99934402 0.99989673 0.99998784 0.99999899 0.99999995 1.00000000 1/3 0.00101496 0.00964211 0.04415073 0.13042226 0.28139745 0.47766519 0.67393293 0.82814329 0.92452477 0.97271551 0.99199181 0.99812518 0.99965852 0.99995339 0.99999552 0.99999973 0.99999999 0.4 0.00016927 0.00208762 0.01231885 0.04642293 0.12599913 0.26393120 0.44784063 0.64050766 0.80106351 0.90810075 0.96518727 0.98940580 0.99747864 0.99954860 0.99994288 0.99999545 0.99999983 0.5 7.6294e-06 1.3733e-04 1.1749e-03 6.3629e-03 2.4521e-02 7.1732e-02 1.6615e-01 3.1453e-01 5.0000e-01 6.8547e-01 8.3385e-01 9.2827e-01 9.7548e-01 9.9364e-01 9.9883e-01 9.9986e-01 9.9999e-01 106 APPENDIX A. 
DISTRIBUTION TABLES n=18 x=0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 p=0.01 0.8345138 0.9862435 0.9992708 0.9999726 0.9999992 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.05 0.3972143 0.7735226 0.9418711 0.9891268 0.9984536 0.9998280 0.9999848 0.9999989 0.9999999 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.1 0.1500946 0.4502839 0.7337960 0.9018032 0.9718061 0.9935848 0.9988279 0.9998265 0.9999791 0.9999980 0.9999998 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.15 0.05364641 0.22405265 0.47966202 0.72023554 0.87943860 0.95810364 0.98818146 0.99728062 0.99948851 0.99992143 0.99999019 0.99999901 0.99999992 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1/6 0.03756104 0.17278077 0.40265431 0.64785276 0.83175160 0.93473495 0.97936106 0.99466145 0.99886905 0.99980408 0.99997238 0.99999686 0.99999972 0.99999998 1.00000000 1.00000000 1.00000000 1.00000000 0.2 0.0180144 0.0990792 0.2713419 0.5010255 0.7163538 0.8670837 0.9487290 0.9837199 0.9957480 0.9990891 0.9998409 0.9999775 0.9999975 0.9999998 1.0000000 1.0000000 1.0000000 1.0000000 0.25 0.00563771 0.03946397 0.13530504 0.30568917 0.51866933 0.71745081 0.86101522 0.94305202 0.98065222 0.99457822 0.99875602 0.99976882 0.99996575 0.99999605 0.99999966 0.99999998 1.00000000 1.00000000 0.3 0.00162841 0.01419046 0.05995221 0.16455048 0.33265485 0.53438010 0.72169640 0.85931654 0.94041412 0.97903201 0.99392749 0.99857023 0.99973092 0.99996050 0.99999565 0.99999966 0.99999998 1.00000000 1/3 0.00067664 0.00676639 0.03264786 0.10166508 0.23107238 0.41224261 0.60851035 0.77673984 0.89239761 0.95665193 0.98556638 0.99608072 0.99914740 0.99985510 0.99998147 0.99999832 0.99999990 1.00000000 0.4 0.00010156 0.00132028 0.00822636 0.03278130 0.09416865 0.20875837 0.37427686 0.56344085 0.73684117 0.86528585 0.94235266 0.97971839 0.99424950 0.99872062 0.99978517 0.99997442 0.99999808 0.99999993 0.5 3.8147e-06 7.2479e-05 6.5613e-04 3.7689e-03 1.5442e-02 4.8126e-02 1.1894e-01 2.4034e-01 4.0726e-01 5.9274e-01 7.5966e-01 8.8106e-01 9.5187e-01 9.8456e-01 9.9623e-01 9.9934e-01 9.9993e-01 1.00000000 n=19 x=0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 p=0.01 0.8261686 0.9847262 0.9991406 0.9999656 0.9999990 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.05 0.3773536 0.7547072 0.9334536 0.9867640 0.9979872 0.9997593 0.9999769 0.9999982 0.9999999 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.1 0.1350852 0.4202650 0.7054448 0.8850024 0.9648058 0.9914070 0.9983036 0.9997267 0.9999639 0.9999961 0.9999996 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.15 0.04559945 0.19849172 0.44132061 0.68414951 0.85555814 0.94630389 0.98366978 0.99591575 0.99915733 0.99985649 0.99997987 0.99999769 0.99999978 0.99999998 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1/6 0.03130086 0.15024415 0.36434206 0.60698635 0.80110179 0.91757105 0.97192338 0.99211138 0.99816778 0.99964824 0.99994433 0.99999278 0.99999924 0.99999994 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 0.2 0.01441152 0.08286623 0.23688934 0.45508874 0.67328814 0.83693770 0.93239993 0.97672169 0.99334235 0.99842088 0.99969051 0.99995021 0.99999349 0.99999932 0.99999994 1.00000000 1.00000000 1.00000000 1.00000000 0.25 0.00422828 0.03100741 0.11134478 
0.26309314 0.46542429 0.66775544 0.82512412 0.92254282 0.97125217 0.99109672 0.99771157 0.99951562 0.99991652 0.99998848 0.99999876 0.99999990 0.99999999 1.00000000 1.00000000 0.3 0.00113989 0.01042185 0.04622368 0.13317100 0.28222354 0.47386252 0.66550151 0.81803049 0.91608484 0.96744664 0.98945884 0.99717741 0.99938271 0.99989163 0.99998510 0.99999846 0.99999989 1.00000000 1.00000000 1/3 0.00045109 0.00473648 0.02402070 0.07865934 0.18793662 0.35185253 0.54308777 0.72066334 0.85384502 0.93523383 0.97592823 0.99257594 0.99812518 0.99961920 0.99993935 0.99999271 0.99999938 0.99999997 1.00000000 0.4 6.0936e-05 8.3279e-04 5.4639e-03 2.2959e-02 6.9614e-02 1.6292e-01 3.0807e-01 4.8778e-01 6.6748e-01 8.1391e-01 9.1153e-01 9.6477e-01 9.8844e-01 9.9693e-01 9.9936e-01 9.9990e-01 9.9999e-01 1.00000000 1.00000000 0.5 1.9073e-06 3.8147e-05 3.6430e-04 2.2125e-03 9.6054e-03 3.1784e-02 8.3534e-02 1.7964e-01 3.2380e-01 5.0000e-01 6.7620e-01 8.2036e-01 9.1647e-01 9.6822e-01 9.9039e-01 9.9779e-01 9.9964e-01 9.9996e-01 1.00000000 n=20 x=0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 p=0.01 0.8179069 0.9831407 0.9989964 0.9999574 0.9999986 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.05 0.3584859 0.7358395 0.9245163 0.9840985 0.9974261 0.9996707 0.9999661 0.9999971 0.9999998 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.1 0.1215767 0.3917470 0.6769268 0.8670467 0.9568255 0.9887469 0.9976139 0.9995844 0.9999401 0.9999928 0.9999993 0.9999999 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.15 0.03875953 0.17555788 0.40489628 0.64772517 0.82984685 0.93269203 0.97806490 0.99407885 0.99867109 0.99975162 0.99996137 0.99999502 0.99999947 0.99999995 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1/6 0.02608405 0.13042027 0.32865907 0.56654564 0.76874922 0.89815951 0.96286466 0.98874672 0.99715838 0.99940150 0.99989498 0.99998471 0.99999816 0.99999982 0.99999999 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 0.2 0.01152922 0.06917529 0.20608472 0.41144886 0.62964826 0.80420779 0.91330749 0.96785734 0.99001821 0.99740517 0.99943659 0.99989827 0.99998484 0.99999815 0.99999982 0.99999999 1.00000000 1.00000000 1.00000000 1.00000000 0.25 0.00317121 0.02431263 0.09126043 0.22515605 0.41484150 0.61717265 0.78578195 0.89818814 0.95907483 0.98613558 0.99605786 0.99906461 0.99981630 0.99997049 0.99999619 0.99999961 0.99999997 1.00000000 1.00000000 1.00000000 0.3 0.00079792 0.00763726 0.03548313 0.10708680 0.23750778 0.41637083 0.60800981 0.77227180 0.88666854 0.95203810 0.98285518 0.99486184 0.99872112 0.99973895 0.99995706 0.99999445 0.99999946 0.99999996 1.00000000 1.00000000 1/3 0.00030073 0.00330802 0.01759263 0.06044646 0.15151086 0.29721389 0.47934269 0.66147148 0.80945113 0.90810423 0.96236343 0.98702670 0.99627543 0.99912119 0.99983263 0.99997492 0.99999715 0.99999977 0.99999999 1.00000000 0.4 3.6562e-05 5.2405e-04 3.6115e-03 1.5961e-02 5.0952e-02 1.2560e-01 2.5001e-01 4.1589e-01 5.9560e-01 7.5534e-01 8.7248e-01 9.4347e-01 9.7897e-01 9.9353e-01 9.9839e-01 9.9968e-01 9.9995e-01 1.00000000 1.00000000 1.00000000 0.5 9.5367e-07 2.0027e-05 2.0123e-04 1.2884e-03 5.9090e-03 2.0695e-02 5.7659e-02 1.3159e-01 2.5172e-01 4.1190e-01 5.8810e-01 7.4828e-01 8.6841e-01 9.4234e-01 9.7931e-01 9.9409e-01 9.9871e-01 9.9980e-01 9.9998e-01 1.00000000 R commands to generate the above table: n<-1 
probs<-c(0.01,0.05,0.1,0.15,1/6,0.2,0.25,0.3,1/3,0.4,0.5) ptabline<-function(x){ return(pbinom(x,prob=probs,size=n)) 107 } ptab<-function(x){ n<<-x t(sapply(0:(x-1),FUN=ptabline)) } lapply(1:20,FUN=ptab) 108 APPENDIX A. DISTRIBUTION TABLES A.1 Poisson Distribution P oλ (x) = bxc X k=0 e−λ λk k! x 0 1 2 3 4 5 6 7 8 9 10 λ =0.1 0.9048374 0.9953212 0.9998453 0.9999962 0.9999999 1.0000000 0.2 0.8187308 0.9824769 0.9988515 0.9999432 0.9999977 0.9999999 1.0000000 0.3 0.7408182 0.9630637 0.9964005 0.9997342 0.9999842 0.9999992 1.0000000 0.4 0.6703200 0.9384481 0.9920737 0.9992237 0.9999388 0.9999960 0.9999998 1.0000000 0.5 0.6065307 0.9097960 0.9856123 0.9982484 0.9998279 0.9999858 0.9999990 0.9999999 1.0000000 0.6 0.5488116 0.8780986 0.9768847 0.9966419 0.9996055 0.9999611 0.9999967 0.9999998 1.0000000 0.7 0.4965853 0.8441950 0.9658584 0.9942465 0.9992145 0.9999100 0.9999911 0.9999992 0.9999999 1.0000000 0.8 0.4493290 0.8087921 0.9525774 0.9909201 0.9985887 0.9998157 0.9999793 0.9999979 0.9999998 1.0000000 0.9 0.4065697 0.7724824 0.9371431 0.9865413 0.9976559 0.9996565 0.9999566 0.9999952 0.9999995 1.0000000 1.0 0.3678794 0.7357589 0.9196986 0.9810118 0.9963402 0.9994058 0.9999168 0.9999898 0.9999989 0.9999999 1.0000000 x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 λ =1.1 0.3328711 0.6990293 0.9004163 0.9742582 0.9945647 0.9990321 0.9998512 0.9999799 0.9999976 0.9999997 1.0000000 1.2 0.3011942 0.6626273 0.8794871 0.9662310 0.9922542 0.9984998 0.9997489 0.9999630 0.9999951 0.9999994 0.9999999 1.0000000 1.3 0.2725318 0.6268231 0.8571125 0.9569045 0.9893370 0.9977694 0.9995964 0.9999357 0.9999909 0.9999988 0.9999999 1.0000000 1.4 0.2465970 0.5918327 0.8334977 0.9462747 0.9857467 0.9967989 0.9993777 0.9998935 0.9999837 0.9999978 0.9999997 1.0000000 1.5 0.2231302 0.5578254 0.8088468 0.9343575 0.9814241 0.9955440 0.9990740 0.9998304 0.9999723 0.9999959 0.9999994 0.9999999 1.0000000 1.6 0.2018965 0.5249309 0.7833585 0.9211865 0.9763177 0.9939597 0.9986642 0.9997396 0.9999546 0.9999929 0.9999990 0.9999999 1.0000000 1.7 0.1826835 0.4932455 0.7572232 0.9068106 0.9703852 0.9920006 0.9981249 0.9996123 0.9999283 0.9999880 0.9999982 0.9999997 1.0000000 1.8 0.1652989 0.4628369 0.7306211 0.8912916 0.9635933 0.9896220 0.9974306 0.9994385 0.9998903 0.9999806 0.9999969 0.9999995 0.9999999 1.0000000 1.9 0.1495686 0.4337490 0.7037204 0.8747022 0.9559186 0.9867808 0.9965539 0.9992065 0.9998366 0.9999696 0.9999948 0.9999992 0.9999999 1.0000000 2.0 0.1353353 0.4060058 0.6766764 0.8571235 0.9473470 0.9834364 0.9954662 0.9989033 0.9997626 0.9999535 0.9999917 0.9999986 0.9999998 1.0000000 x λ =2.1 0 0.1224564 1 0.3796149 2 0.6496314 3 0.8386428 4 0.9378739 5 0.9795509 6 0.9941379 7 0.9985140 8 0.9996627 9 0.9999307 10 0.9999870 11 0.9999978 12 0.9999996 13 0.9999999 14 1.00000000 15 16 17 2.2 0.1108032 0.3545701 0.6227137 0.8193524 0.9275037 0.9750902 0.9925387 0.9980224 0.9995305 0.9998991 0.9999802 0.9999964 0.9999994 0.9999999 1.0000000 2.3 0.1002588 0.3308542 0.5960388 0.7993471 0.9162493 0.9700243 0.9906381 0.9974112 0.9993584 0.9998561 0.9999705 0.9999944 0.9999990 0.9999998 1.0000000 2.4 0.09071795 0.30844104 0.56970875 0.77872291 0.90413141 0.96432749 0.98840592 0.99666138 0.99913802 0.99979846 0.99995696 0.99999155 0.99999846 0.99999974 0.99999996 1.00000000 2.5 0.0820850 0.2872975 0.5438131 0.7575761 0.8911780 0.9579790 0.9858127 0.9957533 0.9988597 0.9997226 0.9999384 0.9999874 0.9999976 0.9999996 0.9999999 1.0000000 2.6 0.07427358 0.26738488 0.51842958 0.73600164 0.87742349 0.95096285 0.98282990 0.99466624 0.99851305 
0.99962435 0.99991329 0.99998158 0.99999638 0.99999934 0.99999989 0.99999998 1.00000000 2.7 0.06720551 0.24866040 0.49362449 0.71409218 0.86290786 0.94326833 0.97943055 0.99337883 0.99808637 0.99949864 0.99987995 0.99997354 0.99999460 0.99999897 0.99999982 0.99999997 1.00000000 2.8 0.06081006 0.23107824 0.46945368 0.69193743 0.84767606 0.93488969 0.97558938 0.99186926 0.99756722 0.99933991 0.99983627 0.99996261 0.99999209 0.99999844 0.99999971 0.99999995 0.99999999 1.00000000 2.9 0.05502322 0.21459056 0.44596320 0.66962342 0.83177708 0.92582620 0.97128327 0.99011549 0.99694217 0.99914188 0.99977979 0.99994797 0.99998861 0.99999768 0.99999956 0.99999992 0.99999999 1.00000000 3.0 0.04978707 0.19914827 0.42319008 0.64723189 0.81526324 0.91608206 0.96649146 0.98809550 0.99619701 0.99889751 0.99970766 0.99992861 0.99998385 0.99999660 0.99999933 0.99999988 0.99999998 1.00000000 A.1. POISSON DISTRIBUTION 109 x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 λ =3.5 0.03019738 0.13588823 0.32084720 0.53663267 0.72544495 0.85761355 0.93471190 0.97326108 0.99012634 0.99668506 0.99898061 0.99971101 0.99992404 0.99998140 0.99999574 0.99999908 0.99999981 0.99999996 0.99999999 1.00000000 4.0 0.01831564 0.09157819 0.23810331 0.43347012 0.62883694 0.78513039 0.88932602 0.94886638 0.97863657 0.99186776 0.99716023 0.99908477 0.99972628 0.99992367 0.99998007 0.99999511 0.99999887 0.99999975 0.99999995 0.99999999 1.00000000 4.5 0.01110900 0.06109948 0.17357807 0.34229596 0.53210358 0.70293043 0.83105058 0.91341353 0.95974269 0.98290727 0.99333133 0.99759572 0.99919486 0.99974841 0.99992634 0.99997972 0.99999473 0.99999870 0.99999970 0.99999993 0.99999999 1.00000000 5.0 0.00673795 0.04042768 0.12465202 0.26502592 0.44049329 0.61596066 0.76218346 0.86662833 0.93190637 0.96817194 0.98630473 0.99454691 0.99798115 0.99930201 0.99977375 0.99993099 0.99998013 0.99999458 0.99999860 0.99999966 0.99999992 0.99999998 1.00000000 5.5 0.00408677 0.02656401 0.08837643 0.20169920 0.35751800 0.52891869 0.68603598 0.80948528 0.89435668 0.94622253 0.97474875 0.98901186 0.99554912 0.99831488 0.99940143 0.99979983 0.99993678 0.99998109 0.99999463 0.99999855 0.99999963 0.99999991 0.99999998 1.00000000 6.0 0.00247875 0.01735127 0.06196880 0.15120388 0.28505650 0.44567964 0.60630278 0.74397976 0.84723749 0.91607598 0.95737908 0.97990804 0.99117252 0.99637151 0.99859965 0.99949090 0.99982512 0.99994308 0.99998240 0.99999482 0.99999855 0.99999961 0.99999990 0.99999998 0.99999999 1.00000000 6.5 0.00150344 0.01127579 0.04303595 0.11184961 0.22367182 0.36904068 0.52652362 0.67275778 0.79157303 0.87738405 0.93316121 0.96612044 0.98397336 0.99289982 0.99704424 0.99884016 0.99956975 0.99984872 0.99994945 0.99998391 0.99999511 0.99999858 0.99999961 0.99999990 0.99999997 0.99999999 1.00000000 7.0 0.00091188 0.00729506 0.02963616 0.08176542 0.17299161 0.30070828 0.44971106 0.59871384 0.72909127 0.83049594 0.90147921 0.94665038 0.97300023 0.98718861 0.99428280 0.99759342 0.99904182 0.99963822 0.99987015 0.99995560 0.99998551 0.99999547 0.99999865 0.99999961 0.99999989 0.99999997 0.99999999 1.00000000 7.5 0.00055308 0.00470122 0.02025672 0.05914546 0.13206186 0.24143645 0.37815469 0.52463853 0.66196712 0.77640761 0.86223798 0.92075869 0.95733413 0.97843535 0.98973957 0.99539168 0.99804111 0.99921000 0.99969700 0.99988925 0.99996134 0.99998709 0.99999587 0.99999873 0.99999963 0.99999989 0.99999997 0.99999999 1.00000000 8.0 0.00033546 0.00301916 0.01375397 0.04238011 0.09963240 0.19123606 0.31337428 0.45296081 
0.59254734 0.71662426 0.81588579 0.88807600 0.93620280 0.96581930 0.98274301 0.99176900 0.99628200 0.99840574 0.99934963 0.99974706 0.99990603 0.99996659 0.99998861 0.99999627 0.99999883 0.99999964 0.99999990 0.99999997 0.99999999 1.00000000 x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 λ=9 0.00012341 0.00123410 0.00623220 0.02122649 0.05496364 0.11569052 0.20678084 0.32389696 0.45565260 0.58740824 0.70598832 0.80300838 0.87577343 0.92614923 0.95853367 0.97796434 0.98889409 0.99468043 0.99757360 0.99894405 0.99956075 0.99982505 0.99993317 0.99997548 0.99999135 0.99999706 0.99999904 0.99999969 0.99999991 0.99999997 0.99999999 1.00000000 10 0.00004540 0.00049940 0.00276940 0.01033605 0.02925269 0.06708596 0.13014140 0.22022060 0.33281970 0.45792970 0.58303980 0.69677610 0.79155650 0.86446440 0.91654150 0.95125960 0.97295840 0.98572240 0.99281350 0.99654570 0.99841170 0.99930030 0.99970430 0.99987990 0.99995310 0.99998230 0.99999360 0.99999770 0.99999920 0.99999970 0.99999990 1.00000000 11 0.00001670 0.00020042 0.00121087 0.00491587 0.01510460 0.03751981 0.07861437 0.14319153 0.23198513 0.34051064 0.45988870 0.57926676 0.68869665 0.78129117 0.85404401 0.90739609 0.94407565 0.96780948 0.98231349 0.99071054 0.99532892 0.99774808 0.99895765 0.99953614 0.99980129 0.99991795 0.99996731 0.99998742 0.99999532 0.99999831 0.99999941 0.99999980 12 0.00000614 0.00007987 0.00052226 0.00229179 0.00760039 0.02034103 0.04582231 0.08950450 0.15502780 0.24239220 0.34722940 0.46159730 0.57596520 0.68153560 0.77202450 0.84441570 0.89870900 0.93703370 0.96258350 0.97872020 0.98840230 0.99393490 0.99695260 0.99852710 0.99931440 0.99969220 0.99986670 0.99994420 0.99997740 0.99999110 0.99999660 0.99999880 13 0.00000226 0.00003164 0.00022264 0.00105030 0.00374019 0.01073389 0.02588692 0.05402825 0.09975791 0.16581190 0.25168200 0.35316490 0.46310470 0.57304460 0.67513150 0.76360690 0.83549310 0.89046500 0.93016690 0.95733130 0.97498820 0.98591860 0.99237750 0.99602820 0.99800570 0.99903400 0.99954810 0.99979570 0.99991060 0.99996210 0.99998440 0.99999380 14 0.00000083 0.00001247 0.00009396 0.00047425 0.00180525 0.00553205 0.01422792 0.03161966 0.06205520 0.10939940 0.17568120 0.26003990 0.35845840 0.46444760 0.57043670 0.66935990 0.75591770 0.82720060 0.88264290 0.92349510 0.95209160 0.97115590 0.98328780 0.99067240 0.99498010 0.99739240 0.99869130 0.99936490 0.99970160 0.99986420 0.99994010 0.99997430 15 0.00000031 0.00000489 0.00003931 0.00021138 0.00085664 0.00279243 0.00763190 0.01800219 0.03744649 0.06985366 0.11846440 0.18475180 0.26761100 0.36321780 0.46565370 0.56808960 0.66412320 0.74885880 0.81947170 0.87521880 0.91702910 0.94689360 0.96725580 0.98053540 0.98883520 0.99381510 0.99668810 0.99828420 0.99913930 0.99958160 0.99980270 0.99990970 20 0.00000000 0.00000004 0.00000046 0.00000320 0.00001694 0.00007191 0.00025512 0.00077859 0.00208726 0.00499541 0.01081172 0.02138682 0.03901199 0.06612764 0.10486430 0.15651310 0.22107420 0.29702840 0.38142190 0.47025730 0.55909260 0.64369760 0.72061130 0.78749280 0.84322740 0.88781500 0.92211320 0.94751930 0.96566650 0.97818180 0.98652530 0.99190820 25 0.00000000 0.00000000 0.00000000 0.00000004 0.00000027 0.00000140 0.00000611 0.00002292 0.00007548 0.00022148 0.00058646 0.00141597 0.00314412 0.00646748 0.01240206 0.02229302 0.03774765 0.06047504 0.09204086 0.13357480 0.18549230 0.24729880 0.31753350 0.39387550 0.47339850 0.55292140 0.62938580 0.70018610 0.76340070 0.81789610 0.86330890 0.89993210 30 0.00000000 0.00000000 
0.00000000 0.00000000 0.00000000 0.00000002 0.00000012 0.00000052 0.00000205 0.00000712 0.00002235 0.00006388 0.00016770 0.00040728 0.00092068 0.00194748 0.00387273 0.00727022 0.01293270 0.02187347 0.03528462 0.05444340 0.08056902 0.11464590 0.15724200 0.20835740 0.26733660 0.33286910 0.40308250 0.47571700 0.54835150 0.61864300 110 APPENDIX A. DISTRIBUTION TABLES x λ=9 10 11 12 13 14 15 20 25 30 32 1.00000000 1.00000000 0.99999993 0.99999960 0.99999760 0.99998930 0.99995980 0.99527260 0.92854400 0.68454120 0.99999998 0.99999980 0.99999910 0.99999570 0.99998260 0.99731160 0.95021960 0.74444880 33 34 0.99999999 0.99999990 0.99999970 0.99999830 0.99999270 0.99851100 0.96615760 0.79730830 1.00000000 1.00000000 0.99999990 0.99999940 0.99999700 0.99919630 0.97754190 0.84261650 35 36 1.00000000 0.99999980 0.99999880 0.99957710 0.98544770 0.88037340 37 0.99999990 0.99999950 0.99978290 0.99078940 0.91098700 1.00000000 0.99999980 0.99989120 0.99430370 0.93515570 38 39 0.99999990 0.99994680 0.99655640 0.95374700 40 1.00000000 0.99997460 0.99796440 0.96769040 0.99998810 0.99882290 0.97789300 41 42 0.99999460 0.99933390 0.98518050 43 0.99999760 0.99963100 0.99026480 0.99999890 0.99979990 0.99373140 44 45 0.99999950 0.99989360 0.99604240 46 0.99999980 0.99994460 0.99754960 0.99999990 0.99997170 0.99851170 47 48 1.00000000 0.99998580 0.99911300 0.99999300 0.99948110 49 50 0.99999660 0.99970200 51 0.99999840 0.99983190 52 0.99999930 0.99990690 53 0.99999970 0.99994930 54 0.99999980 0.99997290 55 0.99999990 0.99998570 56 1.00000000 0.99999260 0.99999620 57 58 0.99999810 59 0.99999910 60 0.99999960 R commands to generate the above table: ll <- seq(0.1:1,by=0.1) poistab<-function(x){ return(ppois(x,lambda=ll)) } t(sapply(0:10,FUN=poistab)) ll <- ll+1 t(sapply(0:13,FUN=poistab)) ll <- ll+1 t(sapply(0:17,FUN=poistab)) ll <- seq(3.5,8,by=0.5) t(sapply(0:32,FUN=poistab)) ll <- c(9,10,11,12,13,14,15,20,25,30) t(sapply(0:60,FUN=poistab)) A.2. 
STANDARD NORMAL DISTRIBUTION Standard Normal Distribution 0.1 0.3 A.2 111 -4 -2 N0,1(x) 0 Z x Nµ=0,σ2 =1 (x) = −∞ 2 1 √ ·e 2π 4 2 − t2 dt x 0.0 0.1 0.2 0.3 0.4 0.00 0.5 0.5398278 0.5792597 0.6179114 0.6554217 0.01 0.5039894 0.5437953 0.5831662 0.6217195 0.659097 0.02 0.5079783 0.5477584 0.5870644 0.6255158 0.6627573 0.03 0.5119665 0.5517168 0.5909541 0.6293 0.6664022 0.04 0.5159534 0.55567 0.5948349 0.6330717 0.6700314 0.05 0.5199388 0.5596177 0.5987063 0.6368307 0.6736448 0.06 0.5239222 0.5635595 0.6025681 0.6405764 0.6772419 0.07 0.5279032 0.5674949 0.6064199 0.6443088 0.6808225 0.08 0.5318814 0.5714237 0.6102612 0.6480273 0.6843863 0.09 0.5358564 0.5753454 0.6140919 0.6517317 0.6879331 0.5 0.6 0.7 0.8 0.9 0.6914625 0.7257469 0.7580363 0.7881446 0.8159399 0.6949743 0.7290691 0.7611479 0.7910299 0.8185887 0.6984682 0.7323711 0.7642375 0.7938919 0.8212136 0.701944 0.7356527 0.7673049 0.7967306 0.8238145 0.7054015 0.7389137 0.77035 0.7995458 0.8263912 0.7088403 0.7421539 0.7733726 0.8023375 0.8289439 0.7122603 0.7453731 0.7763727 0.8051055 0.8314724 0.7156612 0.7485711 0.7793501 0.8078498 0.8339768 0.7190427 0.7517478 0.7823046 0.8105703 0.8364569 0.7224047 0.7549029 0.7852361 0.8132671 0.8389129 1.0 1.1 1.2 1.3 1.4 0.8413447 0.8643339 0.8849303 0.9031995 0.9192433 0.8437524 0.8665005 0.8868606 0.9049021 0.9207302 0.8461358 0.8686431 0.8887676 0.9065825 0.9221962 0.848495 0.8707619 0.8906514 0.9082409 0.9236415 0.85083 0.8728568 0.8925123 0.9098773 0.9250663 0.8531409 0.8749281 0.8943502 0.911492 0.9264707 0.8554277 0.8769756 0.8961653 0.913085 0.927855 0.8576903 0.8789995 0.8979577 0.9146565 0.9292191 0.8599289 0.8809999 0.8997274 0.9162067 0.9305634 0.8621434 0.8829768 0.9014747 0.9177356 0.9318879 1.5 1.6 1.7 1.8 1.9 0.9331928 0.9452007 0.9554345 0.9640697 0.9712834 0.9344783 0.9463011 0.9563671 0.9648521 0.9719334 0.9357445 0.9473839 0.9572838 0.9656205 0.9725711 0.9369916 0.9484493 0.9581849 0.966375 0.9731966 0.9382198 0.9494974 0.9590705 0.9671159 0.9738102 0.9394292 0.9505285 0.9599408 0.9678432 0.9744119 0.9406201 0.9515428 0.9607961 0.9685572 0.9750021 0.9417924 0.9525403 0.9616364 0.9692581 0.9755808 0.9429466 0.9535213 0.962462 0.969946 0.9761482 0.9440826 0.954486 0.963273 0.970621 0.9767045 2.0 2.1 2.2 2.3 2.4 0.9772499 0.9821356 0.9860966 0.9892759 0.9918025 0.9777844 0.9825708 0.9864474 0.9895559 0.9920237 0.9783083 0.982997 0.9867906 0.9898296 0.9922397 0.9788217 0.9834142 0.9871263 0.9900969 0.9924506 0.9793248 0.9838226 0.9874545 0.9903581 0.9926564 0.9798178 0.9842224 0.9877755 0.9906133 0.9928572 0.9803007 0.9846137 0.9880894 0.9908625 0.9930531 0.9807738 0.9849966 0.9883962 0.991106 0.9932443 0.9812372 0.9853713 0.9886962 0.9913437 0.9934309 0.9816911 0.9857379 0.9889893 0.9915758 0.9936128 2.5 2.6 2.7 2.8 2.9 0.9937903 0.9953388 0.996533 0.9974449 0.9981342 0.9939634 0.9954729 0.9966358 0.9975229 0.9981929 0.9941323 0.9956035 0.9967359 0.9975988 0.9982498 0.9942969 0.9957308 0.9968333 0.9976726 0.9983052 0.9944574 0.9958547 0.996928 0.9977443 0.9983589 0.9946139 0.9959754 0.9970202 0.997814 0.9984111 0.9947664 0.996093 0.9971099 0.9978818 0.9984618 0.9949151 0.9962074 0.9971972 0.9979476 0.998511 0.99506 0.9963189 0.9972821 0.9980116 0.9985588 0.9952012 0.9964274 0.9973646 0.9980738 0.9986051 3.0 3.1 3.2 3.3 3.4 0.9986501 0.9990324 0.9993129 0.9995166 0.9996631 0.9986938 0.9990646 0.9993363 0.9995335 0.9996752 0.9987361 0.9990957 0.999359 0.9995499 0.9996869 0.9987772 0.999126 0.999381 0.9995658 0.9996982 0.9988171 0.9991553 0.9994024 0.9995811 0.9997091 0.9988558 
0.9991836 0.999423 0.9995959 0.9997197 0.9988933 0.9992112 0.9994429 0.9996103 0.9997299 0.9989297 0.9992378 0.9994623 0.9996242 0.9997398 0.998965 0.9992636 0.999481 0.9996376 0.9997493 0.9989992 0.9992886 0.9994991 0.9996505 0.9997585 3.5 3.6 3.7 3.8 3.9 0.9997674 0.9998409 0.9998922 0.9999277 0.9999519 0.9997759 0.9998469 0.9998964 0.9999305 0.9999539 0.9997842 0.9998527 0.9999004 0.9999333 0.9999557 0.9997922 0.9998583 0.9999043 0.9999359 0.9999575 0.9997999 0.9998637 0.999908 0.9999385 0.9999593 0.9998074 0.9998689 0.9999116 0.9999409 0.9999609 0.9998146 0.9998739 0.999915 0.9999433 0.9999625 0.9998215 0.9998787 0.9999184 0.9999456 0.9999641 0.9998282 0.9998834 0.9999216 0.9999478 0.9999655 0.9998347 0.9998879 0.9999247 0.9999499 0.999967 4.0 0.9999683 0.9999696 0.9999709 0.9999721 0.9999733 0.9999744 0.9999755 0.9999765 0.9999775 0.9999784 R commands to generate the above table: dec2 <- seq(0.00,.09,by=0.01) dec1 <- seq(0,4,by=.1) normtab<-function(x){ return(pnorm(x+dec2)) } t(sapply(dec1,FUN=normtab))
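In practice one rarely reads these tables by hand; the cumulative probabilities can be obtained directly from the same R functions used to generate them. A few spot checks against the tables above (values rounded):

pbinom(3, size = 10, prob = 0.25)   # B_{10,0.25}(3) = 0.7759, cf. the binomial table for n=10
ppois(2, lambda = 1.5)              # Po_{1.5}(2) = 0.8088, cf. the Poisson table
pnorm(1.96)                         # N_{0,1}(1.96) = 0.9750, cf. the standard normal table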