Probability distributions: part 1
BSAD 30
Dave Novak
Source: Anderson et al., 2015, Quantitative Methods for Business, 12th edition. Some slides are directly from J. Loucks, © 2013 Cengage Learning.

Covered so far…
Chapter 1: Introduction
• What is modeling
• Types of models
• Basic problem formulation
• Review of basic linear (algebraic) problems
Chapter 2: Introduction to probability
• Review of probability concepts (complement, union, intersection, conditional probability, joint probability table, independence, mutually exclusive)

Overview
• Random Variables
• Discrete Probability Distributions
  • Uniform Probability Distribution
  • Binomial Probability Distribution
  • Poisson Probability Distribution
Link to examples of types of discrete distributions
• http://www.epixanalytics.com/modelassist/AtRisk/Model_Assist.htm#Distributions/Discrete_distributions/Discrete_distributions.htm

In general, what is a probability distribution?
• A table, equation, or graphical representation that links the possible outcomes of an experiment to their likelihood (probability) of occurrence

We will briefly look at three "common" discrete probability examples
• Uniform
• Binomial
• Poisson
In business applications, we often find instances of random variables that follow a discrete uniform, binomial, or Poisson probability distribution

What is a random variable?
• A random variable (RV) is a numerical description of the outcome of an experiment
• Keep in mind that there is a difference between numeric variables and categorical variables
  • Numeric: temperature, speed, age, monetized data, etc.
  • Categorical: state of residence, gender, blood type, etc.
• Two types of numeric random variables:
  • Discrete
  • Continuous

Random variables

Question                      Random variable x                                     Type
Family size                   x = number of dependents in family reported           Discrete
                              on tax return
Distance from home to store   x = distance in miles from home to the store site     Continuous
Own dog or cat                x = 1 if own no pet; = 2 if own dog(s) only;          Discrete
                              = 3 if own cat(s) only; = 4 if own dog(s) and cat(s)

Example
Discrete random variable (RV) with a finite number of possible values
• Let x = number of TVs sold at the store in one day, where x can take on 5 values (0, 1, 2, 3, 4)
• There is a readily identifiable upper bound to the number of TVs sold on any given day
• In this case, no more than 4 TVs are sold

Example
Discrete random variable (RV) with an infinite number of possible values
• Let x = number of customers arriving in one day, where x can take on the values 0, 1, 2, . . .
• There is no readily identifiable upper bound on the number of customers coming into the store on any given day
• There cannot be an infinite number of customers, but we are not setting an upper bound (could be 75, 500, or 2,000)

Discrete probability distributions
• The probability distribution for a random variable describes how probabilities associated with each value are distributed (or allocated) over all possible values
• We can describe a discrete probability distribution with a table, graph, or equation
• In the TV sales example, we would want a mathematical and/or visual representation of the probability of selling 0, 1, 2, 3, or 4 TVs on any given day
• The probability distribution is defined by a probability function, denoted by f(x), which provides the probability for each value of the random variable
• The function f(x) is a mathematical representation of the probability distribution
• The following conditions are required:
  $f(x) \ge 0$
  $\sum f(x) = 1$

Discrete distribution: DiCarlo motors example
Using historical data on car sales, a tabular representation
of sales is created

Units sold (x)   Number of days   f(x)
0                  54             .18   (.18 = 54/300)
1                 117             .39
2                  72             .24
3                  42             .14
4                  12             .04   (.04 = 12/300)
5                   3             .01
Total             300            1.00

Discrete distribution: DiCarlo motors example
Graphical representation
[Figure: bar chart of probability f(x), on an axis from 0 to .50, against values of random variable x (car sales), 0 through 5]

The probability distribution provides the following information
• There is a 0.18 probability that no cars will be sold during a day: f(0) = 18%
• The most probable sales volume is 1, with f(1) = 0.39: f(1) = 39%
• There is a 0.05 probability of either four or five cars being sold: f(4) + f(5) = 5%

Summary
• Up to this point, we have not discussed the specific TYPE of discrete probability distribution (i.e. uniform, binomial, Poisson, etc.)
• We have only discussed probability distributions in terms of being discrete as opposed to continuous
• A review of basic statistical concepts is next

Expected value and variance
• The expected value, or mean, of a random variable is a measure of its central location
• Mean, median, and mode are measures of central tendency because they identify a single value as "typical" or representative of all values in a probability distribution
  $E(x) = \mu = \sum x f(x)$
• The variance, $\sigma^2$, summarizes the variability in the values of a random variable
• The standard deviation, $\sigma$, is defined as the positive square root of the variance
  $\mathrm{Var}(x) = \sigma^2 = \sum (x - \mu)^2 f(x)$
  $\mathrm{StdDev}(x) = \sigma = \sqrt{\sigma^2}$
• Both the StdDev and variance provide a measure of how much the values in the probability distribution differ from the mean
• The higher the standard deviation, the more the observations differ from one another and from the mean
• When a probability distribution has a high standard deviation, the mean may not be a good measure of central tendency

Expected value and variance
Scores = 1, 4, 3, 4, 2, 7, 18, 3, 7, 2, 4, 3
Mean = 4.83 (about 5)
Median = 3.5
Standard Deviation = 4.53 (sample standard deviation)
The standard
deviation indicates that the average difference between each score and the mean is around 4.5 points. However, only one score (18) is 4.5 or more points different from the mean. The one extreme score (18) overly influences the mean. The median (3.5) is a better measure of central tendency in this case because extreme scores do not influence the median

DiCarlo motors example
Calculate expected value of discrete RV

x    Number of days   f(x)   x f(x)
0      54             .18    .00    (0 x 0.18 = 0)
1     117             .39    .39    (1 x 0.39 = 0.39)
2      72             .24    .48
3      42             .14    .42
4      12             .04    .16
5       3             .01    .05
Total 300            1.00   1.50

E(x) = 1.50 = expected number of cars sold in a day

DiCarlo motors example
Calculate variance and StdDev

x    x - μ   (x - μ)^2   f(x)   (x - μ)^2 f(x)
0    -1.5     2.25       .18    .4050
1    -0.5     0.25       .39    .0975
2     0.5     0.25       .24    .0600
3     1.5     2.25       .14    .3150
4     2.5     6.25       .04    .2500
5     3.5    12.25       .01    .1225

$\mathrm{Var}(x) = \sigma^2 = \sum (x - \mu)^2 f(x) = 0.4050 + 0.0975 + 0.0600 + 0.3150 + 0.2500 + 0.1225 = 1.25$
Variance of daily sales = $\sigma^2$ = 1.2500 (cars squared)
Standard deviation of daily sales = $\sigma = \sqrt{1.2500}$ = 1.118 cars

Expected value and variance
From a decision-making or analyst perspective what are some of the practical implications of this discussion?
• If the data you are analyzing have a high variance, making decisions based on the mean, or even stressing the importance of the average, is likely to be misleading
• The median might be a better measure of central tendency

Expected value and variance
What should you do?
• Generate a visual representation of the data!
• You need to better characterize the data to see if they fit into any well-known families of probability distributions; this would be the first step in analysis
• Are data skewed or symmetrical?
• Knowing what the data "aren't" is also useful
• Knowing that data do not follow a particular distribution is important in terms of analysis
• There are particular characteristics associated with different types of distributions that can guide you in your analysis

Discrete Distributions we will examine
1) Uniform
2) Binomial (built from Bernoulli trials)
3) Poisson

Discrete uniform probability distribution
• The discrete uniform probability distribution is the simplest example of a discrete probability distribution given by a formula
  $f(x) = 1/n$
  where: n = the number of values the random variable may assume
• The values of the random variable are equally likely
• Example: getting a 1, 2, 3, 4, 5, or 6 when rolling a single die: f(x) = 1/6

Binomial probability distribution
Built from Bernoulli trials; the Bernoulli distribution is the special case n = 1
Has four properties:
1) Experiment consists of n identical trials
2) Only TWO outcomes are possible for each trial (success/failure, good/bad, on/off, yes/no, etc.)
3) The probability of success stays the same for all trials
4) All trials are independent

Binomial probability distribution
• We are interested in the number of successes, or positive outcomes, occurring in the n trials
• x denotes the number of successes, or positive outcomes, occurring in the n trials
  $f(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{(n-x)}$
  where:
  f(x) = the probability of x successes in n trials
  n = the number of trials
  p = the probability of success on any one trial
• $\frac{n!}{x!(n-x)!}$ is the number of experimental outcomes providing exactly x successes in n trials
• $p^x (1-p)^{(n-x)}$ is the probability of a particular sequence of trial outcomes with x successes in n trials

Binomial probability distribution
• Assume the probability that any customer who comes into a store actually makes a purchase is 0.3 (30% chance of success)
• What is the probability that 2 of the next 3 customers who enter the store make a purchase?
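
The binomial probability function above can be evaluated directly. The following is a minimal Python sketch, not part of the original slides; the function name `binomial_pmf` is a hypothetical helper introduced for illustration. For the store question it takes n = 3 customers, x = 2 purchases, and p = 0.3.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials.

    Hypothetical helper (not from the slides): implements
    f(x) = n! / (x! (n - x)!) * p^x * (1 - p)^(n - x).
    """
    # comb(n, x) counts the outcomes with exactly x successes
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Store example: n = 3 customers, x = 2 purchases, p = 0.3
prob_two_of_three = binomial_pmf(2, 3, 0.3)
print(round(prob_two_of_three, 3))  # 0.189
```

The result agrees with summing the three branches of the decision tree that contain exactly two purchases (3 x .063 = .189).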
Identify: n, x, p

Binomial probability distribution (decision tree)
P = purchases (.3), DNP = does not purchase (.7)

1st customer   2nd customer   3rd customer   x   Prob.
P (.3)         P (.3)         P (.3)         3   .027
P (.3)         P (.3)         DNP (.7)       2   .063
P (.3)         DNP (.7)       P (.3)         2   .063
P (.3)         DNP (.7)       DNP (.7)       1   .147
DNP (.7)       P (.3)         P (.3)         2   .063
DNP (.7)       P (.3)         DNP (.7)       1   .147
DNP (.7)       DNP (.7)       P (.3)         1   .147
DNP (.7)       DNP (.7)       DNP (.7)       0   .343

Binomial probability distribution
If a six-sided die is rolled three times, what is the probability that the number 5 comes up twice?
Identify: n, x, p

Binomial probability distribution (decision tree)
S = success, a "5" (.17); F = failure, a 1, 2, 3, 4, or 6 (.83)

1st roll   2nd roll   3rd roll   x   Prob.
S (.17)    S (.17)    S (.17)    3   .005
S (.17)    S (.17)    F (.83)    2   .024
S (.17)    F (.83)    S (.17)    2   .024
S (.17)    F (.83)    F (.83)    1   .117
F (.83)    S (.17)    S (.17)    2   .024
F (.83)    S (.17)    F (.83)    1   .117
F (.83)    F (.83)    S (.17)    1   .117
F (.83)    F (.83)    F (.83)    0   .572

Binomial probability distribution
If I roll a die 10 times, what is the probability that the number 5 comes up four times?
Identify: n, x, p

Binomial probability distribution
Expected value: $E(x) = \mu = np$
Variance: $\mathrm{Var}(x) = \sigma^2 = np(1-p)$
Standard deviation: $\sigma = \sqrt{np(1-p)}$

Binomial probability distribution
In the clothing store example, calculate:
• Expected value
• Variance
• Standard deviation

Poisson probability distribution
• A Poisson distributed random variable is often useful in estimating the number of occurrences over a specified interval of time or space which can be counted in whole numbers
• Very useful in RISK analysis
• It is a discrete random variable that may assume an infinite sequence of values (x = 0, 1, 2, . . .)

Poisson versus Binomial
How is an RV that follows a Poisson distribution different from an RV that follows a binomial distribution?
• It is possible to count how many events have occurred, but meaningless to ask how many events have NOT occurred
• In the binomial situation, we know the probability of two mutually exclusive events (p, q); in the Poisson situation, we have no q (it has only one parameter: the average frequency with which an event occurs)

Poisson versus Binomial

Binomial Distribution                                          Poisson Distribution
Fixed Number of Trials (n) [10 pie throws]                     Infinite Number of Trials
Only 2 Possible Outcomes [hit or miss]                         Unlimited Number of Outcomes Possible
Probability of Success is Constant (p) [0.4 success rate]      Mean of the Distribution is the Same for All Intervals (mu)
Each Trial is Independent [throw 1 has no effect on throw 2]   Number of Occurrences in Any Given Interval Independent of Others
Predicts Number of Successes within a Set Number of Trials     Predicts Number of Occurrences per Unit Time, Space, ...

Source: http://www2.cedarcrest.edu/academic/bio/hale/biostat/session12links/BvsP.html

Poisson probability distribution
Examples
• Number of customers arriving at a supermarket checkout between 5 PM and 6 PM
• Number of text messages you receive over the course of a week
• Number of car accidents over the course of a year

Poisson probability distribution
Two properties of Poisson distributions
1) The probability of occurrence is the same over any two time intervals of equal length
2) The occurrence or nonoccurrence in any time interval is independent of occurrence or nonoccurrence in any other time interval

Poisson probability distribution
$f(x) = \frac{\mu^x e^{-\mu}}{x!}$
where:
f(x) = probability of x occurrences in an interval
$\mu$ = mean number of occurrences in an interval
e = 2.71828
For more info: https://en.wikipedia.org/wiki/E_(mathematical_constant)

Drive-up teller window example
• Suppose that we are interested in the number of cars arriving at the drive-up teller window of a bank during a 15-minute period on weekday mornings
• We assume that the probability of a car arriving is the same for any two time periods of equal length (i.e. the probability of a car arriving in the first minute is exactly the same as the probability of a car arriving in the last minute), and the arrival or non-arrival of a car in any time period is independent of the arrival or non-arrival in any other time period
• An analysis of historical data shows that the average number of cars arriving during a 15-minute interval of time is 10, so the Poisson probability function with $\mu = 10$ applies

Drive-up teller window example
• We want to know the probability that exactly 5 cars will arrive over the 15-minute time interval
• Identify: x and $\mu$
  • x = 5
  • $\mu$ = 10 arrivals / 15 minutes: we are given that there are 10 arrivals every 15 minutes, so the average number of arrivals over the time period is 10

Drive-up teller window example
$\mu$ = 10 arrivals / 15 minutes, x = 5
$f(5) = \frac{10^5 (2.71828)^{-10}}{5!} = .0378$
So, there is a 3.78% chance that exactly 5 cars will arrive over the 15-minute time period

Highway defect example
• Suppose that we are concerned with the occurrence of major defects in a section of highway one month after that section was resurfaced
• We assume that the probability of a defect is the same for any two highway intervals of equal length (i.e. the probability of a defect between mile markers 1 and 2 is the same as the probability of a defect between mile markers 4 and 5, etc.)
and that the occurrence of a defect in any one-mile interval is independent of the occurrence or nonoccurrence of a defect in any other interval
• Thus, the Poisson probability distribution applies

Highway defect example
Find the probability that no major defects occur in a specific 3-mile stretch of highway, assuming that major defects occur at the average rate of two defects per mile

Poisson probability distribution
Expected value: $E(x) = \mu$ = the rate or frequency of an event
Variance: $\mathrm{Var}(x) = \sigma^2 = \mu$
Standard deviation: $\sigma = \sqrt{\mu}$

Highway defect example
In the highway defect example, calculate:
• Expected value
• Variance
• Standard deviation

Summary
• Discussion of random variables
  • Discrete
  • Continuous
• Examples of discrete probability distributions
  • Uniform
  • Binomial
  • Poisson
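
The two Poisson examples above can be checked numerically. The following is a minimal Python sketch, not part of the original slides; `poisson_pmf` is a hypothetical helper introduced for illustration. For the highway example it assumes the mean scales with the length of the interval, so a rate of two defects per mile gives a mean of 2 x 3 = 6 defects over a 3-mile stretch.

```python
from math import exp, factorial, sqrt


def poisson_pmf(x: int, mu: float) -> float:
    """Probability of exactly x occurrences in an interval with mean mu.

    Hypothetical helper (not from the slides): implements
    f(x) = mu^x * e^(-mu) / x!.
    """
    return mu**x * exp(-mu) / factorial(x)


# Drive-up teller window: mu = 10 arrivals per 15 minutes, x = 5
teller_prob = poisson_pmf(5, 10)
print(round(teller_prob, 4))  # 0.0378

# Highway defects: assumed mean for 3 miles = 2 defects/mile * 3 miles
highway_mu = 2 * 3
no_defect_prob = poisson_pmf(0, highway_mu)
print(round(no_defect_prob, 4))  # 0.0025

# For a Poisson RV, the mean and variance both equal mu
highway_variance = highway_mu
highway_std_dev = sqrt(highway_mu)
print(round(highway_std_dev, 3))  # 2.449
```

Under that scaling assumption, the expected value and variance for the 3-mile stretch are both 6 defects, and the standard deviation is about 2.45 defects.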