Goal 1: To help practicing teachers gain a fundamental understanding of t-tests and p-values in a short amount of time, and apply it to the biology data they obtain, without “too much math.”

2-sample t-tests
p-value approach
Sampling distributions & CLT
Continuous probability distribution
Probability Fundamentals

Goal 2: Learn to use the TI-84 graphing calculator to do statistical analysis and find p-values, and understand what the results mean.

AIMS – statistics workshop Day 2 page 1

Goal for Day 2:
Develop an intuitive understanding of the binomial and the normal probability distributions, de-emphasizing math formulas.
Relate normal distributions to teaching and the biology experiment.
Get comfortable with characteristics of the normal distribution, such as spread (in standard deviations) and the idea that probability is an area.
Illustrate calculator usage for finding probabilities of probability distributions.
Develop an understanding of what “statistically significant” means in terms of probability.

Illustration of the Probability Simulator on the TI-84 using the TI-SmartView software

Question: Is it better to use actual manipulatives (like spinners and dice) with students, or software simulations? Does the age of the student matter?

Recall: A probability distribution is a list of the outcomes of an experiment along with their respective probabilities.

Experiment: toss 2 coins
Probability distribution:

Heads   0     1     2
P(H)    1/4   1/2   1/4

The only rules for a probability distribution are that each probability must be a number between 0 and 1, and that the probabilities must sum to 1 (or 100%).

Probability distributions may result from discrete or continuous data. We will look first at a discrete probability distribution, the binomial distribution. Then we will look at a continuous probability distribution, the normal distribution.
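The two-coin table above can be rebuilt by listing every equally likely outcome and counting heads. The following is only a minimal sketch in Python (not part of the workshop's TI-84 materials); the function name heads_distribution is made up for illustration, and the coins are assumed to be fair.

```python
from fractions import Fraction
from itertools import product

def heads_distribution(num_coins):
    """Return {number of heads: probability} for fair coins (assumed),
    found by listing every equally likely outcome."""
    outcomes = list(product("HT", repeat=num_coins))
    dist = {}
    for outcome in outcomes:
        heads = outcome.count("H")
        dist[heads] = dist.get(heads, Fraction(0)) + Fraction(1, len(outcomes))
    return dist

two_coins = heads_distribution(2)
for heads in sorted(two_coins):
    print(heads, two_coins[heads])

# Both rules for a probability distribution can be checked directly.
print("each between 0 and 1:", all(0 <= p <= 1 for p in two_coins.values()))
print("total:", sum(two_coins.values()))
```

Using exact fractions rather than decimals keeps the output in the same form as the table (1/4, 1/2, 1/4).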
Probability Distributions
Discrete: * Binomial, Geometric, Poisson
Continuous: * Normal, Chi-Square

What is discrete data? What is continuous data? (Discrete before continuous.)

Discrete = counts
Continuous = measurements

Discrete or continuous?
1. heights of third-graders in your class
2. number of students in each grade
3. number of siblings for each child
4. weights of students
5. teacher’s salary schedule

Spinners, coins, dice = discrete data. All of the examples of probability distributions from Day 1 were discrete.

When an experiment (with discrete data) has exactly two outcomes, the probability is constant for each trial, and there are a fixed number of trials, the experiment is called binomial.

Tossing coins is a (discrete) binomial experiment.

1st toss   2nd toss
H          H
           T
T          H
           T

Notice that binomial experiments have two branches for each trial.

Experiment: toss 2 coins
Probability distribution:

Heads   0     1     2
P(H)    1/4   1/2   1/4

[Bar graph for the probability distribution: number of heads (0, 1, 2) on the horizontal axis, probability (0 to 1) on the vertical axis.]

Notice the sum of the areas of the bars = 1.

The notion that the sum of the areas of the bars in a probability distribution is equal to 100% is a key concept in statistics!

The experiment of rolling 2 dice is NOT binomial, and you can easily see that there are more than two branches in the tree diagram.

Ex: Roll 2 dice
[Tree diagram: six branches for the 1st die (1 to 6) and six branches for the 2nd die (1 to 6).]

Experiment: roll 2 dice
Probability distribution:

sum      1   2     3     4     5     6     7     8     9     10    11    12    13
P(sum)   0   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36  0

However, it is a probability distribution, and here’s the bar graph for rolling two dice. The sum of the areas of the bars = 100%.
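The two-dice table above comes from counting how many of the 36 equally likely rolls give each sum. As a rough sketch (Python, with fair six-sided dice assumed and the function name two_dice_counts invented for illustration), the counting can be done like this:

```python
from itertools import product

def two_dice_counts():
    """Count how many of the 36 equally likely rolls give each sum
    (fair six-sided dice assumed)."""
    counts = {}
    for first, second in product(range(1, 7), repeat=2):
        total = first + second
        counts[total] = counts.get(total, 0) + 1
    return counts

counts = two_dice_counts()
for total in sorted(counts):
    # Printed over 36, unreduced, to match the form used in the table.
    print(f"sum {total:2d}: {counts[total]}/36")

# Sums of 1 and 13 never occur, so their probability is 0.
print("rolls counted:", sum(counts.values()))
```

Because every one of the 36 rolls is counted exactly once, the probabilities add to 36/36, which is the "sum of the areas of the bars = 100%" idea in numeric form.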
[Bar graph: sums 2 through 12 on the horizontal axis; probabilities from 1/36 to 6/36 on the vertical axis.]

BINOMIAL EXPERIMENT

Requirements:
1. outcomes classified into two categories
2. has a fixed number of trials, n
3. each trial is independent
4. probability of success stays constant for each trial, p

Notation:
n = number of trials
x = desired number of successes
p = probability of success on a single trial

Which of the following are binomial experiments?
1. Tossing an unbiased coin 500 times.
2. Tossing a biased coin 500 times.
3. Surveying 500 consumers to find the brands of toothpaste preferred.
4. Surveying 500 consumers to determine whether their preferred brand of toothpaste is Brand X.
5. Polling 500 voters in the presidential election from a population of 200,000 voters, of which 35% are Republican and 65% are Democratic.
6. Polling 1000 voters in the presidential election from a population of 8 million voters, of which 40% are Democrats, 35% are Republicans, and 25% are Independent.
7. Firing 20 missiles at a target with a hit rate of 90%.
8. Testing a sample of eight drug dosages from a population of 5000, of which 2% are contaminated.
9. Administering a driving test to 50 license applicants with a passing rate of 72%.

QUIZ
Number your paper 1-5.
Multiple choice: answer A, B, C, D

We are going to look at the probability distribution for the number of right answers.

# right         0   1   2   3   4   5
Prob(# right)

How do we find the probabilities for the table?

We could use tree diagrams . . . very messy . . .

[Tree diagram: items 1 through 5, each branching into R (right) and W (wrong), giving 32 paths in all.]

or we could use rules of probability to find probabilities . . . very tedious . . .
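The messy tree and the tedious rule-by-rule arithmetic can both be handed to a short program. The sketch below is in Python rather than on the calculator, and it assumes every answer is a random guess among the four choices, so the chance of a right answer on each item is 1/4. The function names are hypothetical. The first function walks all 32 R/W paths of the tree; the second gets the same numbers by counting paths instead of listing them.

```python
from fractions import Fraction
from itertools import product
from math import comb

P_RIGHT = Fraction(1, 4)  # assumption: random guess among choices A, B, C, D
NUM_ITEMS = 5

def distribution_by_listing_paths(num_items, p_right):
    """Walk every R/W path of the tree and add up the path probabilities."""
    dist = {k: Fraction(0) for k in range(num_items + 1)}
    for path in product("RW", repeat=num_items):
        prob = Fraction(1)
        for answer in path:
            prob *= p_right if answer == "R" else 1 - p_right
        dist[path.count("R")] += prob
    return dist

def distribution_by_counting(num_items, p_right):
    """Same result without listing paths: (number of paths with k rights)
    times (probability of any one such path)."""
    return {
        k: comb(num_items, k) * p_right**k * (1 - p_right) ** (num_items - k)
        for k in range(num_items + 1)
    }

listed = distribution_by_listing_paths(NUM_ITEMS, P_RIGHT)
counted = distribution_by_counting(NUM_ITEMS, P_RIGHT)

for k in range(NUM_ITEMS + 1):
    print(f"{k} right: {listed[k]}  (about {float(listed[k]):.4f})")
print("methods agree:", listed == counted)
print("total:", sum(listed.values()))
```

Listing paths mirrors the tree diagram, which makes it easy to explain; the counting version is the practical one once the number of items grows.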
Ex: Prob(exactly 1 right)
= P(RWWWW) or P(WRWWW) or P(WWRWW) or P(WWWRW) or P(WWWWR)
= ¼•¾•¾•¾•¾ + ¼•¾•¾•¾•¾ + ¼•¾•¾•¾•¾ + ¼•¾•¾•¾•¾ + ¼•¾•¾•¾•¾
= 5 • ¼ • (¾)^4 = 405/1024 ≈ 0.396

And we’d have to do this for each entry in the table!!

Note: the binomial formula is a generalization of the rules of probability:

P(x) = [ n! / ( x! (n – x)! ) ] • p^x • (1 – p)^(n – x)

Another, easier alternative is to let the TI-84 calculator “do the math.”

DISTR –> binompdf(n, p, x)

# right         0   1   2   3   4   5
Prob(# right)

And the graph of the distribution might look like this:

[Bar graph: number right (0 to 5) on the horizontal axis.]

But the point here is NOT to get bogged down in calculator distributions (although you may want to explore them further on your own), and not to get bogged down in the math, but rather to understand this:

The graphs of probability distributions show 100% of the probability, which is equivalent to the sum of the areas of the bars. The area of each bar represents the probability for the particular value of the random variable.

With this main idea in mind, let’s move on to a continuous probability distribution, the normal distribution.

NORMAL DISTRIBUTION

Characteristics:
1. Continuous data (or data treated as continuous)
2. Symmetric
3. Virtual range (99.7% of data) within 3 standard deviations each side of the mean (a spread of 6 st. dev.)
4. Total area under the curve = 100%
5. Probability = area under the curve between 2 data values

Notation:
x̄ = mean
s = standard deviation

With continuous data, we can’t make a table as we did for discrete data. Instead, we draw the graph of the distribution, using a smooth frequency curve rather than a bar graph.

We are going to do two problems, just to give you a sense of how probability is related to the normal distribution.
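The link between "area of the bars" and "area under the curve" can be made concrete by slicing the region under a normal curve into many very thin bars and adding up their areas. The sketch below (Python, standard library only; the function names and the choice of 10,000 bars are assumptions made for illustration) does this for 1, 2, and 3 standard deviations on each side of the mean. The results should come out near 68%, 95%, and 99.7%.

```python
import math

def normal_curve(x, mean, sd):
    """Height of the normal (bell-shaped) curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def area_under_curve(lower, upper, mean, sd, bars=10_000):
    """Approximate the area between lower and upper by adding up
    many thin bars, each as tall as the curve at the bar's midpoint."""
    width = (upper - lower) / bars
    total = 0.0
    for i in range(bars):
        midpoint = lower + (i + 0.5) * width
        total += normal_curve(midpoint, mean, sd) * width
    return total

# Any mean and standard deviation give the same percentages;
# 0 and 1 are used here only for convenience.
mean, sd = 0.0, 1.0
areas = {}
for k in (1, 2, 3):
    areas[k] = area_under_curve(mean - k * sd, mean + k * sd, mean, sd)
    print(f"within {k} st. dev. of the mean: {areas[k]:.1%}")
```

This is the same idea as the calculus mentioned later in the workshop, done by brute force: more and thinner bars give a better approximation of the area.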
Heights of women are normally distributed with a mean of 63.6 in. and a standard deviation of 2.5 in. (based on information from the National Health Survey). The U.S. Army requires women's heights to be between 58 in. and 80 in. Find the percentage of women meeting that height requirement. Are many women being denied the opportunity to join the Army because they are too short or too tall?

To solve the problem, we need to know the area (which is the probability) between 58 and 80 under the curve.

In calculus-based statistics, you integrate to find the area between two points under a curve. CALCULUS!! Argh!!

We are going to let the TI-84 calculator “do the math.”

DISTR –> normalcdf(lower, upper, x̄, s)

Cans of regular Coke are labeled as containing 12 oz. The contents are normally distributed with a mean of 12.19 oz. and a standard deviation of 0.11 oz. (based on information from the Coca-Cola Co.). What percentage of cans contain less than the 12 oz. printed on the label?

For this problem, the question of interest is: Are many consumers being cheated?

The normal probability distribution helps us decide what results are typical and what results are unusual. For example, if you knew the heights of plants are normally distributed (mean 13 inches, st. dev. 2 inches), how unusual would it be to get a plant that grew 22 inches tall?

Now, a little twist on our thinking . . . what if you are told the probability of winning Lotto is 1/13,000,000, and yet you buy a ticket and . . . YOU WIN? Is it significant?

In statistics, we use the word significant to describe getting a result or outcome that we didn’t expect to get (because the probability was so small).

So a plant that grows to 22 inches, when the heights of plants are normally distributed with a mean of 13 and a st. dev. of 2, is statistically significant because the probability of getting such a plant is low.

Connected Math – page 8
Quality ratings for natural and regular peanut butter

If you make a decision based on looking at graphs, you can’t be sure whether the differences you see are typical sampling fluctuation or a significant difference in quality ratings. Statistics helps you find the probability, which helps you make an informed decision!

Suppose two teachers compare their students' ISAT scores and see that one of the classes has lower scores than the other class. Should the teacher whose class has lower scores be concerned?

What if the two teachers were told that these results were just random sampling fluctuation, and that there was no statistically significant difference between the classes?

Or maybe a teacher notices that many of her students are heavier than students in the other classes of the same grade. Should she be worried about her class being overweight?

What if she were told that the probability of having a class with this average weight was very low, i.e., that the result was statistically significant? Now should she worry?

Explain what it means to you now when you hear the words “statistically significant.”

OK, so knowing probabilities can help us make good decisions. And if an outcome is statistically significant, that means you got a result which had a low probability of occurring. And we have some understanding of the normal distribution and of finding probabilities using it.

But now you might be thinking … not all data is normal. The plant data you will be collecting is not necessarily normal. So what’s the fuss about knowing the normal distribution? Stay tuned for tomorrow’s lesson!
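For anyone who wants to check the three normal-distribution questions from today (Army heights, Coke cans, and the 22-inch plant) away from the calculator, here is a minimal sketch in Python. The function below is a stand-in written for this illustration, named after the calculator command and taking its arguments in the same order; it is not TI code. A very large negative or positive number stands in for "no lower limit" or "no upper limit."

```python
import math

def normalcdf(lower, upper, mean, sd):
    """Area under the normal curve between lower and upper
    (illustrative stand-in for the calculator command of the same name)."""
    def area_to_the_left(x):
        return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))
    return area_to_the_left(upper) - area_to_the_left(lower)

NO_LIMIT = 1e99  # stand-in for "infinity"

# Women's heights: mean 63.6 in., st. dev. 2.5 in.; requirement 58 to 80 in.
army = normalcdf(58, 80, 63.6, 2.5)
print(f"women meeting the height requirement: {army:.2%}")

# Coke cans: mean 12.19 oz., st. dev. 0.11 oz.; less than the labeled 12 oz.
coke = normalcdf(-NO_LIMIT, 12, 12.19, 0.11)
print(f"cans with less than 12 oz.: {coke:.2%}")

# Plants: mean 13 in., st. dev. 2 in.; a plant 22 in. or taller.
plant = normalcdf(22, NO_LIMIT, 13, 2)
print(f"chance of a plant 22 in. or taller: {plant:.6%}")
```

The plant result is a tiny fraction of one percent, which is the sense in which a 22-inch plant would be called statistically significant.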