Discrete probability distributions
Chapter 6 - Sullivan

Prof. Felix Apfaltrer
fapfaltrer@bmcc.cuny.edu
Office: N518
Phone: 212-220 8000 x 74 21
Office hours: Tue, Thu 1:30-3 pm

Random variables and distributions
• A random variable is a variable (typically represented by x) that has a single numerical value, determined by chance, for each outcome of a procedure.
• A probability distribution is a graph, table, or formula that gives the probability for each value of the random variable.

Gender of children: A study consists of randomly selecting 14 newborn babies and counting the number of girls in the sample. If we assume that having a boy or a girl is equally likely, and let x = number of girls among the 14 babies, then x is a random variable because its value depends on chance. The possible values are x = 0, 1, 2, 3, …, 11, 12, 13, 14. A probability distribution is shown in the table.

Probabilities of girls
x (girls)   P(x)
0           0.000
1           0.001
2           0.006
3           0.022
4           0.061
5           0.122
6           0.183
7           0.209
8           0.183
9           0.122
10          0.061
11          0.022
12          0.006
13          0.001
14          0.000

Discrete and Continuous Random Variables (r.v.’s)
• A discrete random variable has either a finite or a countable number of values. Countable means it might be infinite, but you can still “count” them (there are gaps between them).
• A continuous random variable has infinitely many values without gaps between them (like interval subsets of the real numbers).

Examples:
Discrete random variables:
• Number of eggs a hen lays per day.
– Discrete: she cannot lay 2.3 eggs in one day!
– Random: the number is not known for sure in advance!
• Number of people attending the Columbus Day Parade.
– Discrete r.v.: counting the number of people. Random: we do not know in advance exactly how many are going (but we might have an estimate).
• The sum of the faces when we roll two dice.
• The points in a hand of Black Jack.
• The average number of eggs per hen per day on a farm with 10 hens.
Continuous random variables:
• Amount of milk a cow produces a day.
– Continuous r.v.: She might yield 1.345 gallons, or 1.34512 (no gaps in measurement).
• The humidity on a given day.
– Continuous r.v.: Percentage of humidity can be 75.34%.
• The daily closing value of the Dow Jones Industrial Average index.
• The daily ocean temperature at a marine laboratory investigating whales.

Probability histogram
• Very similar to a relative frequency histogram.
• Instead of percent (relative frequency), probability is shown.
• The values 0, 1, 2, …, 13, 14 are at the centers of the rectangles -> base = 1
• area = height*base = height, so the area of each rectangle equals the probability.

[Figure: probability histogram. Horizontal axis: Number of Girls among 14 newborns (0 to 14). Vertical axis: Probability (0.000 to 0.250).]

Requirements of Probability Distributions
• ∑P(x) = 1, where x assumes all possible values.
• 0 ≤ P(x) ≤ 1 for every individual value of x.

Discussion:
• x takes all possible values, so it represents all options in the sample space.
– For the table ‘girls’, the sum is 0.999, which differs from 1 only because of rounding.
• All P(x) are between 0 and 1 because they are probabilities!

Example: Does the table represent a probability distribution?

Probabilities ???
x      0     1     2     3
P(x)   0.2   0.3   0.4   0.5

• All values are between 0 and 1. Good!
• ∑P(x) = 0.2 + 0.3 + 0.4 + 0.5 = 1.4. Oops! The sum is not 1.
• Therefore, it is not a probability distribution.

Example: Does the function P(x) = x/9 for x = 2, 3, and 4 represent a probability distribution?
• P(2) = 2/9, P(3) = 3/9, P(4) = 4/9, so every value is between 0 and 1.
• ∑P(x) = 2/9 + 3/9 + 4/9 = (2+3+4)/9 = 9/9 = 1. The sum is 1.
• Therefore, the function does represent a probability distribution.
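
The two requirements above can also be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the function name is_probability_distribution and the tolerance parameter tol are illustrative assumptions. It tests the table example, the P(x) = x/9 example, and the rounded ‘girls’ table.

```python
from fractions import Fraction


def is_probability_distribution(probs, tol=1e-9):
    """Check both requirements: 0 <= P(x) <= 1 for every x, and sum of P(x) = 1.

    probs maps each value x to P(x). tol is an assumed tolerance that absorbs
    rounding in tabulated probabilities; it is not something the slides define.
    """
    values = list(probs.values())
    in_range = all(0 <= p <= 1 for p in values)
    sums_to_one = abs(sum(values) - 1) <= tol
    return in_range and sums_to_one


# Table from the example: every value is in [0, 1] but the sum is 1.4.
table = {0: 0.2, 1: 0.3, 2: 0.4, 3: 0.5}

# Formula from the example: P(x) = x/9 for x = 2, 3, 4 (exact fractions, no rounding).
formula = {x: Fraction(x, 9) for x in (2, 3, 4)}

# Rounded 'girls' table: the listed values sum to 0.999, so a looser tolerance is needed.
girls = dict(enumerate([0.000, 0.001, 0.006, 0.022, 0.061, 0.122, 0.183, 0.209,
                        0.183, 0.122, 0.061, 0.022, 0.006, 0.001, 0.000]))

print(is_probability_distribution(table))             # False
print(is_probability_distribution(formula))           # True
print(is_probability_distribution(girls))             # False with the strict default
print(is_probability_distribution(girls, tol=0.002))  # True once rounding is allowed for
```

The tolerance is a design choice: with exact fractions the sum test can be strict, while a table rounded to three decimals needs some slack.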

Mean, Variance and Standard Deviation for Distributions
• μ = ∑ x•P(x)   mean
• σ² = ∑ (x – μ)²•P(x)   variance
• σ² = ∑ [ x²•P(x) ] – μ²   variance (alternative formula)
• σ = √( ∑ [ x²•P(x) ] – μ² )   standard deviation

x     P(x)    x•P(x)   x²•P(x)
0     0.000   0.000    0.000
1     0.001   0.001    0.001
2     0.006   0.011    0.022
3     0.022   0.067    0.200
4     0.061   0.244    0.978
5     0.122   0.611    3.055
6     0.183   1.100    6.598
7     0.209   1.466    10.264
8     0.183   1.466    11.730
9     0.122   1.100    9.898
10    0.061   0.611    6.110
11    0.022   0.244    2.688
12    0.006   0.067    0.800
13    0.001   0.011    0.144
14    0.000   0.001    0.012
sum   1.000   7.000    52.500

(The products are computed from the unrounded probabilities, which is why the P(x) column sums to 1.000 here.)

Rationale:
mean                 = 7.000
∑ x²•P(x)            = 52.500
–mean²               = –49.000
variance             = 52.500 – 49.000 = 3.500
standard deviation   = √3.500 = 1.871

Mean, Variance for Distributions (round-off and unusual values)
• Round off to 1 more decimal place than the data!
• Minimum usual value: μ – 2σ
• Maximum usual value: μ + 2σ

Example: In the previous calculation, μ = 7 and σ = 1.9.
• Minimum usual value: μ – 2σ = 7 – 2(1.9) = 3.2
• Maximum usual value: μ + 2σ = 7 + 2(1.9) = 10.8
For the group of 14 babies, the usual values for the number of girls fall between 3.2 and 10.8.

Rare event rule: If, under a given assumption, the probability of an event is extremely low, we conclude that the assumption is most likely incorrect.
With probabilities:
• x successes among n trials is unusually high if P(x or more) < 0.05
• x successes among n trials is unusually low if P(x or fewer) < 0.05

Example (Gender Selection): Getting 13 or more girls.
P(13 or more girls) = P(13) + P(14) = 0.001 + 0.000 = 0.001 < 0.05, so this is unusually high.

Expected Value
The mean of a discrete random variable (its expected value) is denoted by E or μX, and it represents the average value of the outcomes.
μX = E = E[X] = ∑ { x•P(x) }

Example (NJ Pick 3 game): Bet $0.50 and select a 3-digit number between 000 and 999. If you get the number, you collect $275. Your net gain is then $274.50. Suppose that you bet $0.50 on the number 007. What is your expected value of gain or loss?

A: Each outcome is equally likely.

Event   x         P(x)    x•P(x)
Win     $274.50   0.001   $0.2745
Lose    -$0.50    0.999   -$0.4995
Total                     -$0.2250

P(win) = 1/1000 = 0.001
P(loss) = 999/1000 = 0.999
E[X] = ∑ x•P(x) = 274.50 • 0.001 (win) + (-0.50) • 0.999 (loss) = 0.2745 - 0.4995 = -0.225
On average you will be losing 22.5 cents every time you play.

Bernoulli Distribution
The Bernoulli probability distribution results from a procedure such that:
• there is one trial, like one flip of a coin
• there are only two outcomes (heads/tails, 0/1, red/white, success/failure)

Examples:
• Tossing one coin (or bean): 1 trial; outcomes: heads or tails.
• Birth of one child: 1 trial; outcomes: boy or girl.
• Tossing one die, win if it’s 6, lose on 1-5: 1 trial; outcomes: win or lose.
– Suppose you pay $1 to play and get $3 back if ‘6’ comes out.
• Weather tomorrow: 1 trial (day); outcomes: rain or shine.

Probabilities:
• Coin: P(X = heads) = 0.5, P(X = tails) = 0.5
• Birth: P(girl) = 0.513 = p (success probability), P(boy) = 0.487 = q = (1 – p) (failure probability)
– X = “number of girls” in one birth: 0 or 1
– μ = 0•P(0) + 1•P(1) = 0•q + 1•p = p
– σ² = 0²•P(0) + 1²•P(1) – p² = p – p² = p(1 – p) = pq
• Die: P(win) = 1/6 = p, P(lose) = 5/6 = q
– X = “number of wins” in one toss: 0 or 1
– μ = 0•P(0) + 1•P(1) = p = 1/6, σ² = pq = 5/36
– Expectation: E[X] = 3•1/6 + (-1)•6/6 = –3/6 (the $1 stake is paid on every play, hence the factor 6/6)
– On average you will be losing 50 cents per play.

Binomial Distributions
A procedure has a binomial probability distribution if:
• each trial has all outcomes classified into 2 categories
• the procedure has a fixed number of trials
• the trials are independent
• the probabilities remain constant for each trial

Notation for binomial probability distributions:
S, F                2 categories: S success (p prob. of success), F failure (q prob. of failure)
P(S) = p            probability of success
P(F) = q = 1 – p    probability of failure
n                   fixed number of trials
X = x               X denotes the random variable, x denotes the number of successes in n trials
p                   probability of success (success is arbitrary, can be good or not)
q                   probability of failure
P(x) = P(X = x)     probability of getting exactly x successes among n trials
P(X ≤ x)            probability of getting x or fewer successes among n trials
B(n, p)             binomial distribution with n trials and probability of success p

Note: B(n, p) = sum of n independent Bernoulli distributions with probability of success p
X = Y1 + Y2 + … + Yn
μX = μY1 + μY2 + … + μYn = p + p + … + p = np
σ²X = σ²Y1 + σ²Y2 + … + σ²Yn = pq + pq + … + pq = npq

Binomial Distributions: Examples
Remember:
• Poll and test samples are usually drawn without replacement -> dependent trials.
• If the sample is small enough (< 5% of the population), then it is safe to treat the trials as independent (even though, strictly speaking, they are not).

Multiple choice answers: (answered at random, options: a, b, c, d, e, 4 questions)
• P(3 answers correct)
• Binomially distributed?
– Number of trials fixed: n = 4.
– Trials independent (answers do not depend on previous ones).
– 2 outcomes: right or wrong.
– One answer correct, p = 1/5 = 0.2; q = 0.8.
YES! Use the binomial formula.

Binomial Distributions: Examples Continued
Use table A-1:

n   x   p = 0.2
4   0   0.41
4   1   0.41
4   2   0.154
4   3   0.026
4   4   0.002

x   P(x)
0   0.4096
1   0.4096
2   0.1536
3   0.0256
4   0.0016

Hence, P(3) = 0.0256

Question: What is the probability that at least 3 answers are correct?
• ‘at least 3 answers correct’ = {X ≥ 3} = {X = 3 or X = 4}
• P(X ≥ 3) = P(X = 3) + P(X = 4) = 0.0256 + 0.0016 = 0.0272

Mean, variance and expectation:
μX = np = 4 (0.2) = 0.8
σ²X = npq = 4 (0.2) (0.8) = 0.64 -> σX = 0.8

Suppose that you pay someone $1000 if a person answering at random gets 3 or more answers correct, and that you receive $100 otherwise. What is your expected loss/gain?
E[X] = -1000 (0.0272) + 100 (1 - 0.0272) = -27.2 + 97.28 = 70.08

HW: Sullivan Review Chapter 6, SC p315 #1-5, 7, 8, 13, 15

Homework
• Sullivan Review exercises chapter 6
– P. 315 (softcover)
• 1-5, 7, 8, 13, 15
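
The worked examples in this chapter can be cross-checked with a short Python sketch. It assumes the standard binomial formula P(x) = C(n, x)•p^x•q^(n – x), which the slides refer to as the "binomial formula" without writing it out. The function names binomial_pmf and mean_and_variance are illustrative and do not come from the slides or the textbook.

```python
from math import comb, sqrt


def binomial_pmf(n, p):
    """Return the list [P(X = 0), ..., P(X = n)] for X ~ B(n, p)."""
    q = 1 - p
    return [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]


def mean_and_variance(pmf):
    """Mean and variance via mu = sum x*P(x) and sigma^2 = sum x^2*P(x) - mu^2."""
    mean = sum(x * px for x, px in enumerate(pmf))
    variance = sum(x**2 * px for x, px in enumerate(pmf)) - mean**2
    return mean, variance


# Gender of children: n = 14 newborns, p = 0.5.
girls = binomial_pmf(14, 0.5)
girls_mean, girls_var = mean_and_variance(girls)
print(girls_mean, girls_var, round(sqrt(girls_var), 3))  # 7.0 3.5 1.871

# Multiple-choice example: n = 4 questions, p = 0.2.
quiz = binomial_pmf(4, 0.2)
p_at_least_3 = quiz[3] + quiz[4]
print(round(quiz[3], 4), round(p_at_least_3, 4))  # 0.0256 0.0272

quiz_mean, quiz_var = mean_and_variance(quiz)
print(round(quiz_mean, 2), round(quiz_var, 2))  # 0.8 0.64 (np and npq)

# NJ Pick 3: expected net gain per $0.50 bet.
pick3 = 274.50 * 0.001 + (-0.50) * 0.999
print(round(pick3, 4))  # -0.225

# Bet on the quiz: pay $1000 with probability P(X >= 3), receive $100 otherwise.
bet = -1000 * p_at_least_3 + 100 * (1 - p_at_least_3)
print(round(bet, 2))  # 70.08
```

Computing the mean and variance from the full list of probabilities, rather than from np and npq directly, mirrors the table method used for the 14-babies example and confirms that both routes agree.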