Section 5.1 – Probability Distributions

A random variable is a variable (typically represented by x) that has a numeric value, determined by chance, for each possible outcome of an experiment.

Examples:
o The number of students passing a certain class
o The average height of the students in a class
o The number of girls in a family of 5 children
o The sum on the faces of two rolled dice
o The number of defective parts in a sample of 20
o The average daily temperature

A word about randomness

The word randomness suggests unpredictability. Randomness and uncertainty are vague concepts that deal with variation. A simple example of randomness involves a coin toss. The outcome of the toss is uncertain. Since the coin-tossing experiment is unpredictable, the outcome is said to exhibit randomness.

Even though individual flips of a coin are unpredictable, if we flip the coin a large number of times, a pattern will emerge. Roughly half of the flips will be heads and half will be tails. This long-run regularity of a random event is described with probability. Our discussions of randomness will be limited to phenomena that are not exactly predictable in the short run but do exhibit long-run regularity.

A discrete random variable has either a finite or a countable number of values. This chapter deals with discrete random variables.

A continuous random variable has infinitely many values, and those values can be associated with measurements on a continuous scale in such a way that there are no gaps or interruptions.

A probability distribution is a graph, table, or formula that gives the probability for each possible value of the random variable. (Notice the similarity to relative frequency tables and histograms.)

A probability histogram is a way to graph a probability distribution. The vertical scale shows probabilities instead of relative frequencies. Note that the area of each rectangle equals the corresponding probability.
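To make the idea of a probability distribution concrete, here is a minimal Python sketch (not part of the original notes) that builds the table for one of the examples listed above: the sum on the faces of two rolled dice. It assumes two fair six-sided dice, and the function name two_dice_sum_distribution is just an illustrative choice. Exact fractions are used so the probabilities can be checked by hand.

```python
from fractions import Fraction
from itertools import product


def two_dice_sum_distribution():
    """Return {sum: probability} for two fair six-sided dice (assumed fair)."""
    counts = {}
    # Enumerate all 36 equally likely (first die, second die) outcomes.
    for a, b in product(range(1, 7), repeat=2):
        counts[a + b] = counts.get(a + b, 0) + 1
    return {s: Fraction(c, 36) for s, c in sorted(counts.items())}


dist = two_dice_sum_distribution()

# Print the table plus a rough text version of the probability histogram:
# one '#' per outcome (out of 36) that produces the given sum.
for s, p in dist.items():
    print(f"x = {s:2d}   P(x) = {str(p):>5}   {'#' * int(p * 36)}")

print("Sum of all probabilities:", sum(dist.values()))
```

Each probability in the table lies between 0 and 1, and together they add up to 1, which is what we expect of any probability distribution.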
M116 – NOTES – CH 5

Requirements for a Probability Distribution
o $0 \le P(X = x) \le 1$ for every value x
o The sum of the probabilities of a discrete random variable is 1: $\sum P(X = x) = 1$

To evaluate the mean and standard deviation of a probability distribution using the calculator
   Enter x into L1
   Enter the probabilities into L2
   Press STAT
   Arrow right to CALC
   Select 1: 1-Var Stats L1,L2
   Press ENTER

Identifying Unusual Results with the Range Rule of Thumb (section 6.2)

The range rule of thumb is based on the principle that for many data sets (symmetrical, bell shaped), the vast majority (such as 95%) of sample values lie within two standard deviations of the mean. Less common values are more than two standard deviations from the mean.

Minimum “usual” value ~ mean – 2 * standard deviation = $\mu - 2\sigma$
Maximum “usual” value ~ mean + 2 * standard deviation = $\mu + 2\sigma$

Identifying Unusual Results with Probabilities
o Unusually high: x successes among n trials is unusually high if P(x or more) is very small (such as less than 0.05)
o Unusually low: x successes among n trials is unusually low if P(x or fewer) is very small (such as less than 0.05)

M116 – TI 83/84 CALCULATOR – CH 5

Using the TI-83/84 calculator to find the Mean and Standard Deviation of Probability Distributions

Example 1) When randomly selecting jail inmates convicted of DWI (driving while intoxicated), the probability distribution for the number x of prior DWI sentences is as described in the accompanying table (based on data from the U.S. Department of Justice).

x      0      1      2      3
P(x)   0.512  0.301  0.138  0.049

a) What is the population and the success attribute?
b) Describe in words the random variable. (What are we counting?)
c) What are the possible values of the random variable?
d) Verify that the given table is a probability distribution.
e) Use the calculator to find the mean and standard deviation of this distribution.
f) Which values are usual and which are unusual, according to
   (i) The probability rule?
   (ii) The range rule of thumb?

M116 – NOTES – CH 5

Section 5.2 & 5.3 – Binomial Experiments

Features of a binomial experiment (5.2)
1) The experiment has a fixed number of trials (n)
2) The trials must be independent
3) Each trial has 2 possible outcomes: success (S) and failure (F)
4) Probabilities remain constant for each trial. p is the probability of success, and q is the probability of failure

When sampling without replacement, the events can be treated as if they were independent if the sample size is no more than 5% of the population size. (That is, $n \le 0.05N$.)

Find binomial probabilities with a shortcut feature of the calculator

To find individual probabilities: Use binompdf(n,p,x)
   Press 2nd VARS
   Select 0:binompdf(
   Type n,p,x)
   Press ENTER

To calculate cumulative probabilities from 0 to x, use binomcdf(n,p,x)

Mean, Variance, and Standard Deviation for the Binomial Distribution (5.3)

If we have the probability distribution in the editor of the calculator, we can use the calculator by doing STAT – CALC, 1-Var Stats L1, L2. Otherwise we can use these formulas for binomial distributions:

$\mu = np$
$\sigma = \sqrt{npq}$

Remember that the variance is the square of the standard deviation:

Variance = $\sigma^2 = (\sqrt{npq})^2 = npq$

Unusual values (5.3)

For a binomial distribution, it is unusual for the number of successes to be more than 2.5 σ from µ.

Minimum “usual” value ~ $\mu - 2.5\sigma$
Maximum “usual” value ~ $\mu + 2.5\sigma$

M116 – TI 83/84 CALCULATOR – CH 5

Binomial Distributions and Simulations (Chapter 5)

Example 2) – Booking tickets: Air America has a policy of booking as many as 15 persons on an airplane that can seat only 14. Past studies have revealed that only 85% of the booked passengers actually arrive for the flight.
Find the probability that if Air America books 15 persons, not enough seats will be available.

a) Describe the random variable and success attribute. Give the possible values of the random variable. Give the number of trials and the probability of success.
b) Use the calculator to find the probability that if Air America books 15 persons, not enough seats will be available.
c) Is it unusual to find that there are not enough seats available? Should overbooking be a concern for passengers?
d) SIMULATION: Now we are going to simulate this situation by repeating the experiment 20 times. Use MATH PRB 7:randBin(n,p) and press ENTER 20 times. Record the results in a table, and then use your table to answer the question posed in the problem.
e) Use class results and answer the question again.
f) OPTIONAL (OYO): Here we have another simulation technique. Use the calculator to generate 50 numbers that come from a binomial distribution with n = 15 and p = 0.85. (We’ll clear List 1, generate the numbers and store them into List 1, sort the list, and then explore the editor.)
   STAT 4:ClrList L1 : MATH PRB 7:randBin(n,p,50) STO L1 : STAT 2:SortA(L1)
   Go to the editor, explore the list, and count how many times we had 15 passengers showing up. Then determine the probability, and compare with the theoretical result from part (b). Comment on the law of large numbers.

M116 – TI 83/84 CALCULATOR – CH 5

Binomial Distributions – Usual and Unusual Outcomes – Why do we care about it?

1) Experiment: Counting the number of girls born to 100 women. Use the range rule of thumb with n = 100 and p = 0.5 to find the usual range of the distribution of x, the number of girls among 100 babies.

Application: Gender Selection: ProCare Industries, LTD., once provided a product called “Gender Choice”, which, according to advertising claims, allowed couples to “increase your chances of having a boy up to 85%, and a girl up to 80%”.
Gender Choice was available in blue packages for couples wanting a baby boy and pink packages for couples wanting a baby girl. Suppose we conduct an experiment with 100 couples who want to have baby girls, and they all follow the Gender Choice “easy-to-use in-home system” described in the pink package. On the box, they show the CLAIM: Gender Choice increases your chances of having a girl (or boy).

So here we have two conflicting hypotheses:
o Gender Choice has no effect
o Claim: Gender Choice increases your chances of having a girl/boy

What would you conclude about the effectiveness of Gender Choice if 100 couples using the pink package have 100 babies consisting of

a) 52 girls? (52%)
   o There is sufficient evidence to support the claim that the gender selection method is effective
   o There is not sufficient evidence to support the claim that the gender selection method is effective

b) 77 girls? (77%)
   o There is sufficient evidence to support the claim that the gender selection method is effective
   o There is not sufficient evidence to support the claim that the gender selection method is effective

Rare Event Rule for Inferential Statistics

If, under a given assumption, the probability of a particular observed event is exceptionally small (less than 0.05), we conclude that the assumption is probably not correct.

2) The rate of Lyme disease cases in Clinton County is 2%. In groups of 1000, what is the usual range of the distribution of x, the number of people in the county who have Lyme disease out of 1000?

Here is the rest of the story: A new vaccine has been developed to avoid getting Lyme disease. We would like to know whether the vaccine is effective.
There are two conflicting hypotheses:
o The vaccine is not effective
o Claim: The vaccine is effective

Case 1: When 1000 people from that county are given the new vaccine, it is found that 19 of them contract Lyme disease.
   o We support the claim that the vaccine is effective
   o We don’t have enough evidence to support the claim that the vaccine is effective

Case 2: When 1000 people from that county are given the new vaccine, it is found that 7 of them contract Lyme disease.
   o We support the claim that the vaccine is effective
   o We don’t have enough evidence to support the claim that the vaccine is effective

Rare Event Rule for Inferential Statistics

If, under a given assumption, the probability of a particular observed event is exceptionally small (less than 0.05), we conclude that the assumption is probably not correct.

Sections 5.2 and 5.3 – How does this material help us in inferential statistics?

3) There are two conflicting hypotheses:
o The coin is fair
o Claim: The coin is not fair

Case 1: Heads turns up 17 times in 30 tosses.
   o We support the claim that the coin is NOT fair
   o We don’t have enough evidence to support the claim that the coin is NOT fair

Case 2: Heads turns up 27 times in 30 tosses.
   o We support the claim that the coin is NOT fair
   o We don’t have enough evidence to support the claim that the coin is NOT fair

4) There are two conflicting hypotheses:
o The die is fair
o Claim: The die is not fair

Case 1: The outcome of 1 occurs 9 times in 60 rolls.
   o We support the claim that the die is NOT fair
   o We don’t have enough evidence to support the claim that the die is NOT fair

Case 2: The outcome of 1 occurs 52 times in 60 rolls.
   o We support the claim that the die is NOT fair
   o We don’t have enough evidence to support the claim that the die is NOT fair
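For readers who want to check this kind of reasoning without a TI-83/84, the following is a minimal Python sketch (not part of the original handout) applied to the coin in problem 3. It assumes the "coin is fair" hypothesis, so n = 30 and p = 0.5, and it combines the range rule of thumb with the probability rule P(x or more) < 0.05 for unusually high counts. The function names binom_pmf, prob_at_least, and usual_range are illustrative choices; the parameter k is set to 2 here to match the range rule of thumb, and could be changed to 2.5 to match the binomial guideline in Section 5.3.

```python
from math import comb, sqrt


def binom_pmf(n, p, x):
    """P(exactly x successes in n trials); plays the role of binompdf(n,p,x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def prob_at_least(n, p, x):
    """P(x or more successes in n trials), summed term by term."""
    return sum(binom_pmf(n, p, k) for k in range(x, n + 1))


def usual_range(n, p, k=2):
    """Return (mu - k*sigma, mu + k*sigma) with mu = np and sigma = sqrt(npq)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu - k * sigma, mu + k * sigma


# Assumption under test: the coin is fair (p = 0.5), tossed n = 30 times.
n, p = 30, 0.5
low, high = usual_range(n, p)
print(f"Usual range for the number of heads: {low:.2f} to {high:.2f}")

for heads in (17, 27):
    tail = prob_at_least(n, p, heads)
    print(
        f"{heads} heads: P({heads} or more) = {tail:.6f}, "
        f"inside usual range: {low <= heads <= high}"
    )
```

The same three functions can be reused for the die in problem 4 by setting n = 60 and p = 1/6. For a count that looks unusually low, the analogous check would sum the probabilities from 0 up to x instead.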