QBM117 Business Statistics

Probability Distributions
Random variables and probability distributions

Objectives
• To define a random variable.
• To define the probability distribution for a random variable.
• To distinguish between a discrete random variable and a continuous random variable.
• To introduce discrete probability distributions.
• To calculate the mean, variance and standard deviation of a discrete probability distribution.

Random Variables
• A random variable is a variable whose numerical value is determined by the outcome of a random experiment.
• It is random because the value it assumes depends on chance.

Examples of Random Variables
• Imagine drawing a student at random from the student body.
• The student's height, weight, weekly income and grade point average are all numerical values describing properties of the randomly selected student.
• They are all random variables.

Random experiment: Draw a student at random from the student body
Random variable: Height (meters) of the randomly selected student
Possible values for the random variable: Any value between about 1.5 m and 2 m

Random experiment: Toss two coins
Random variable: The number of heads
Possible values for the random variable: 0, 1 or 2

Random experiment: Audit 50 tax returns
Random variable: The number of returns containing errors
Possible values for the random variable: 0, 1, 2, …, 50

Random experiment: Weigh a shipment of goods
Random variable: The weight of the shipment
Possible values for the random variable: Any value greater than or equal to 0

Notation
• We distinguish between a random variable and the values it can assume by using a capital letter such as X or Y to denote the random variable, and a lower-case letter such as x or y to denote one of its values.

Discrete and Continuous Random Variables
• There are two types of random variables
  - discrete
  - continuous
• They are distinguished from one another by the set of possible values they can assume.
Discrete Random Variables
• A discrete random variable has a countable number of possible values: either finitely many, or values that can be listed one by one, such as 0, 1, 2, …
• For example
  - the number of defective items in a production batch
  - the number of telephone calls received in a given hour
  - the number of customers served in a hotel reception on a given day

Continuous Random Variables
• A continuous random variable can assume any value in an interval, so its possible values cannot be listed one by one.
• For example
  - the duration of long-distance telephone calls
  - the lifetime of a certain brand of tyres
  - the total annual sales of a firm
  - the rate of return of a particular stock

Examples revisited

Random experiment: Draw a student at random from the student body
Random variable: Height (meters) of the randomly selected student
Possible values for the random variable: Any value between about 1.5 m and 2 m
Continuous or discrete? Continuous

Random experiment: Toss two coins
Random variable: The number of heads
Possible values for the random variable: 0, 1 or 2
Continuous or discrete? Discrete

Random experiment: Audit 50 tax returns
Random variable: The number of returns containing errors
Possible values for the random variable: 0, 1, 2, …, 50
Continuous or discrete? Discrete

Random experiment: Weigh a shipment of goods
Random variable: The weight of the shipment
Possible values for the random variable: Any value greater than or equal to 0
Continuous or discrete? Continuous

Probability Distributions
• A probability distribution of a random variable X tells us what the possible values of X are and the associated probabilities P(X=x), also written p(x).
• There are two types of probability distributions
  - discrete probability distribution
  - continuous probability distribution

Discrete Probability Distributions
• The probability distribution of a discrete random variable is a table, formula or graph that lists all the possible values of the random variable and their associated probabilities.
X         x1    x2    …    xn
P(X=x)    p1    p2    …    pn

Requirements of Discrete Probability Distributions
If a discrete random variable X can take values x1, x2, …, xn with probabilities p(x1), p(x2), …, p(xn), the probabilities must satisfy two requirements:
1. Every probability p(xi) is a number between 0 and 1:
$$0 \le p(x_i) \le 1 \quad \text{for } i = 1, 2, \dots, n$$
2. The probabilities must add to 1:
$$\sum_{i=1}^{n} p(x_i) = 1$$

Example 1
Consider a study of 300 households in a town on the coast of Queensland. As part of this study, data were collected showing the number of children in each household. The following results were obtained: 54 of the households had no children, 117 had 1 child, 72 had 2 children, 42 had 3 children, 12 had 4 children, and 3 had 5 children.
Consider the experiment of randomly selecting one of these households to participate in a follow-up study.
Let X = number of children in the household selected.
The possible values of X are 0, 1, 2, 3, 4, and 5.

The probability that the selected household has no children is 54/300 = 0.18. Hence P(X=0) = 0.18.
The probability that the selected household has 1 child is 117/300 = 0.39. Hence P(X=1) = 0.39.
The probability that the selected household has 2 children is 72/300 = 0.24. Hence P(X=2) = 0.24.
The probability that the selected household has 3 children is 42/300 = 0.14. Hence P(X=3) = 0.14.
The probability that the selected household has 4 children is 12/300 = 0.04. Hence P(X=4) = 0.04.
The probability that the selected household has 5 children is 3/300 = 0.01. Hence P(X=5) = 0.01.

The probability distribution of X can be presented in tabular form.

X         0      1      2      3      4      5
P(X=x)    0.18   0.39   0.24   0.14   0.04   0.01

Note that each of the probabilities is between 0 and 1, and that the probabilities add to 1.

The probability distribution of X can also be presented in terms of the following formula:
$$p(x) = \begin{cases} 0.18 & \text{if } x = 0 \\ 0.39 & \text{if } x = 1 \\ 0.24 & \text{if } x = 2 \\ 0.14 & \text{if } x = 3 \\ 0.04 & \text{if } x = 4 \\ 0.01 & \text{if } x = 5 \end{cases}$$

It can also be presented in the form of a graph.
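The three presentations of Example 1 (table, formula and graph) can be sketched in Python. This is a minimal illustration rather than part of the lecture: the names `counts`, `table`, `p` and `bar_chart` are assumptions chosen for the example, and the graph is only a rough text bar chart.

```python
from fractions import Fraction

# Household counts from Example 1: number of children -> number of households.
counts = {0: 54, 1: 117, 2: 72, 3: 42, 4: 12, 5: 3}
total = sum(counts.values())  # 300 households

# Table form: each probability is a relative frequency, kept exact with Fraction.
table = {x: Fraction(n, total) for x, n in counts.items()}

def p(x):
    """Formula form: p(x), with probability 0 for values not in the table."""
    return table.get(x, Fraction(0))

# The two requirements of a discrete probability distribution.
assert all(0 <= prob <= 1 for prob in table.values())
assert sum(table.values()) == 1

def bar_chart(dist):
    """Graph form: a text bar chart with one '#' per 0.01 of probability."""
    lines = []
    for x, prob in sorted(dist.items()):
        bar = "#" * int(prob * 100)
        lines.append(f"{x} | {bar} {float(prob):.2f}")
    return "\n".join(lines)

print(bar_chart(table))
```

Using `Fraction` keeps 54/300 and the other relative frequencies exact, so the check that the probabilities add to 1 is not affected by floating-point rounding.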
[Figure: bar chart of P(X=x) (vertical axis, 0 to 0.45) against X (horizontal axis, 0 to 5) for Example 1]

Using a Probability Distribution
• A primary advantage of defining a random variable and its probability distribution is that once the probability distribution is known, it is relatively easy to determine the probability of a variety of events that may be of interest to a decision maker.
• We interpret the probabilities the same way we did last week when we were looking at probability.
  Consider Example 1: P(X=4) = 0.04 means that the probability that a randomly selected household has 4 children is 0.04.
• We can also apply the addition rule for mutually exclusive events.
  Consider Example 1: The values of X are mutually exclusive; a household has exactly one of 0, 1, 2, 3, 4 or 5 children. The probability that a randomly selected household has 3 or more children is
$$P(X \ge 3) = P(X=3) + P(X=4) + P(X=5) = 0.14 + 0.04 + 0.01 = 0.19$$

Example 2
Using historical records, the personnel manager of a plant has determined the probability distribution of X, the number of employees absent per day. It is

X         0       1       2       3       4       5       6       7
P(X=x)    0.005   0.025   0.310   0.340   0.220   0.080   0.019   0.001

1. What is the probability that there are no absent employees on any given day?
2. What is the probability that there are no more than 2 employees absent on any given day?

What is the probability that there are no absent employees on any given day?
P(X=0) = 0.005

What is the probability that there are at most 2 absent employees on any given day?
$$P(X \le 2) = P(X=0) + P(X=1) + P(X=2) = 0.005 + 0.025 + 0.310 = 0.34$$

Expected Value and Variance
• In Topic 1 we calculated sample and population means and variances for frequency distributions.
• A probability distribution is the distribution of a population.
• We can calculate the population mean and variance for probability distributions.
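To see why a probability distribution can be treated as the distribution of a population, take the 300 households of Example 1 as the population. The sketch below is an illustration under that assumption, with variable names invented for the example. It computes the population mean and variance from the frequency distribution, as in Topic 1, and then again by weighting each value by its probability p(x) = f/N.

```python
from fractions import Fraction

# Example 1 treated as a population of 300 households:
# number of children -> number of households (frequency f).
counts = {0: 54, 1: 117, 2: 72, 3: 42, 4: 12, 5: 3}
N = sum(counts.values())

# Topic 1 style: population mean and variance from a frequency distribution.
mean_from_freq = Fraction(sum(x * f for x, f in counts.items()), N)
var_from_freq = Fraction(1, N) * sum(
    f * (x - mean_from_freq) ** 2 for x, f in counts.items()
)

# Probability style: weight each value by its probability p(x) = f / N.
p = {x: Fraction(f, N) for x, f in counts.items()}
mean_from_probs = sum(x * p[x] for x in p)
var_from_probs = sum((x - mean_from_probs) ** 2 * p[x] for x in p)

print(float(mean_from_freq), float(mean_from_probs))  # 1.5 1.5
print(float(var_from_freq), float(var_from_probs))    # 1.25 1.25
```

Both routes give the same answers because dividing each frequency by N before summing is the same as summing first and dividing by N afterwards.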
Expected Value
The mean, or expected value, of a discrete random variable X is obtained by
• multiplying each possible value of X by its associated probability
• and then summing the resulting products.
$$E(X) = \mu = \sum_{i=1}^{n} x_i \, p(x_i)$$

Example 1 revisited

X         0      1      2      3      4      5
P(X=x)    0.18   0.39   0.24   0.14   0.04   0.01

The expected number of children per household is
$$E(X) = 0(0.18) + 1(0.39) + 2(0.24) + 3(0.14) + 4(0.04) + 5(0.01) = 1.5$$

Variance
The variance of a discrete random variable X is found by
• subtracting the mean from each value and squaring this difference,
• multiplying each squared difference by the associated probability,
• and then summing the resulting products.
$$V(X) = \sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 \, p(x_i)$$
• A more computationally efficient method of calculating the variance of a discrete random variable is to use the following formula
$$V(X) = \sigma^2 = \sum_{i=1}^{n} x_i^2 \, p(x_i) - \mu^2$$
• This is just a rearrangement of the previous formula.

Example 1 revisited
The variance of the number of children per household is
$$V(X) = \sigma^2 = \left[ 0^2(0.18) + 1^2(0.39) + 2^2(0.24) + 3^2(0.14) + 4^2(0.04) + 5^2(0.01) \right] - 1.5^2 = 3.5 - 2.25 = 1.25$$

Standard Deviation
• Following on from Topic 1, the standard deviation can be found by taking the square root of the variance.

Example 1 revisited
The standard deviation of X, the number of children per household, is
$$\sigma = \sqrt{1.25} = 1.12 \text{ (2 d.p.)}$$

Example 2 revisited
X = the number of employees absent per day

X         0       1       2       3       4       5       6       7
P(X=x)    0.005   0.025   0.310   0.340   0.220   0.080   0.019   0.001

Determine the mean and standard deviation of the number of employees absent per day.

The mean number of employees absent per day is
$$\mu = 0(0.005) + 1(0.025) + 2(0.310) + 3(0.340) + 4(0.220) + 5(0.080) + 6(0.019) + 7(0.001) = 3.066$$

The variance of the number of employees absent per day is
$$\sigma^2 = \left[ 0^2(0.005) + 1^2(0.025) + 2^2(0.310) + 3^2(0.340) + 4^2(0.220) + 5^2(0.080) + 6^2(0.019) + 7^2(0.001) \right] - (3.066)^2 = 10.578 - (3.066)^2 = 1.178 \text{ (3 d.p.)}$$

so the standard deviation is
$$\sigma = \sqrt{1.178} = 1.09 \text{ (2 d.p.)}$$
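The calculations for Example 2 can be checked with a short Python sketch. The function names `mean`, `variance` and `variance_shortcut` are illustrative choices, not part of the lecture. The sketch implements the definition of the variance and the computational formula side by side, so the two can be compared.

```python
from math import sqrt, isclose

def mean(dist):
    """E(X): sum of each value times its probability."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """V(X) by definition: sum of (x - mu)^2 * p(x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """V(X) by the computational formula: sum of x^2 * p(x), minus mu^2."""
    mu = mean(dist)
    return sum(x ** 2 * p for x, p in dist.items()) - mu ** 2

# Example 2: number of employees absent per day -> probability.
absences = {0: 0.005, 1: 0.025, 2: 0.310, 3: 0.340,
            4: 0.220, 5: 0.080, 6: 0.019, 7: 0.001}

mu = mean(absences)
var = variance(absences)
sd = sqrt(var)

# Both variance formulas agree up to floating-point rounding.
assert isclose(var, variance_shortcut(absences))

print(round(mu, 3), round(var, 3), round(sd, 2))  # 3.066 1.178 1.09
```

The same three functions work for any discrete distribution stored as a dictionary of value and probability pairs, for instance the household distribution of Example 1.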
Example 3 (Exercise 5.19)
The owner of a small firm has just purchased a personal computer, which she expects will serve her for the next two years. The owner has just been told that she must buy a surge suppressor to provide protection for her new hardware against possible surges or variations in the electrical current. Her son David, a recent university graduate, advises that an inexpensive suppressor could be purchased that would provide protection against one surge only. He notes that the amount of damage done without a suppressor would depend on the extent of the surge.
David conservatively estimates that, over the next two years, there is a 1% chance of incurring $400 damage and a 2% chance of incurring $200 damage. But the probability of incurring $100 damage is 0.1.
1. How much should the owner be willing to pay for a surge suppressor?
2. Determine the standard deviation of the possible amounts of damage.

To answer these questions we need to construct the probability distribution for the amount of damage incurred.
Let X = the amount of damage incurred (in dollars).
The remaining probability, 1 − (0.10 + 0.02 + 0.01) = 0.87, is the probability that no damage is incurred.

X         0      100    200    400
P(X=x)    0.87   0.10   0.02   0.01

1. To determine how much the owner should be willing to pay for a surge suppressor we need to work out the expected amount of damage to be incurred.
$$E(X) = 0(0.87) + 100(0.10) + 200(0.02) + 400(0.01) = 18$$
The expected amount of damage to be incurred is $18, therefore the owner should be willing to pay up to $18.

2. To determine the standard deviation of the possible amounts of damage we need to calculate the variance and then take the square root of the variance to obtain the standard deviation.
$$V(X) = \left[ 0^2(0.87) + 100^2(0.10) + 200^2(0.02) + 400^2(0.01) \right] - 18^2 = 3400 - 324 = 3076$$
$$\sigma = \sqrt{3076} = 55.46$$
Hence the standard deviation of the possible amounts of damage is $55.46.
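The steps of Example 3 can be sketched in Python as a check. This is a minimal illustration, and the variable names are assumptions made for the example. It starts from the three damage probabilities David estimates, recovers the probability of no damage as the complement, and then computes the expected damage and standard deviation.

```python
from math import sqrt, isclose

# Damage amounts (dollars) and probabilities as estimated in Example 3.
damage = {400: 0.01, 200: 0.02, 100: 0.10}

# Complement rule: the leftover probability belongs to "no damage" (X = 0).
damage[0] = 1 - sum(damage.values())

# Expected damage, then variance via the computational formula.
expected = sum(x * p for x, p in damage.items())
var = sum(x ** 2 * p for x, p in damage.items()) - expected ** 2
sd = sqrt(var)

print(round(damage[0], 2), round(expected, 2), round(sd, 2))  # 0.87 18.0 55.46
```

The standard deviation is about three times the expected damage, which reflects that most of the probability sits at zero damage while the rare outcomes are comparatively large.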
Reading for next lecture
• Chapter 5, Section 5.4

Exercises
• 5.1
• 5.5
• 5.11
• 5.22 (a and b only)