YOUNGSTOWN CITY SCHOOLS
MATH: ALGEBRA II
UNIT 4: INFERENCES AND CONCLUSIONS FROM DATA (7 WEEKS) 2013-2014

Synopsis: Students use the concepts from statistics learned in earlier grades to relate to various types of data and probability distributions. Using several different ways of collecting data (sampling, surveys, experiments, and simulations), the students draw conclusions based on the data. They will examine randomness and error in evaluating their conclusions.

STANDARDS

S.ID.4 Use the mean and standard deviation of a data set to fit it to a normal distribution and to estimate population percentages. Recognize that there are data sets for which such a procedure is not appropriate. Use calculators, spreadsheets, and tables to estimate areas under the normal curve.

S.ID.4 While students may have heard of the normal distribution, it is unlikely that they will have prior experience using it to make specific estimates. Build on students’ understanding of data distributions to help them see how the normal distribution uses area to make estimates of frequencies (which can be expressed as probabilities). Emphasize that only some data are well described by a normal distribution.

S.IC.1 Understand statistics as a process for making inferences about population parameters based on a random sample from that population.

S.IC.2 Decide if a specified model is consistent with results from a given data generating process, e.g., using simulation. For example, a model says a spinning coin falls heads up with probability 0.5. Would a result of 5 tails in a row cause you to question the model?

S.IC.2 For S.IC.2, include comparing theoretical and empirical results to evaluate the effectiveness of a treatment.

S.IC.3, 4, 5 In earlier grades, students are introduced to different ways of collecting data and use graphical displays and summary statistics to make comparisons.
These ideas are revisited with a focus on how the way in which data are collected determines the scope and nature of the conclusions that can be drawn from that data. The concept of statistical significance is developed informally through simulation as meaning a result that is unlikely to have occurred solely as a result of random selection in sampling or random assignment in an experiment.

S.IC.3 Recognize the purposes of and differences among sample surveys, experiments, and observational studies; explain how randomization relates to each.

S.IC.4 Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling.

S.IC.5 Use data from a randomized experiment to compare two treatments; use simulations to decide if differences between parameters are significant.

S.IC.4,5 Focus on the variability of results from experiments; that is, focus on statistics as a way of dealing with, not eliminating, inherent randomness.

S.IC.6 Evaluate reports based on data.

S.MD.6, 7 Extend to more complex probability models. Include situations such as those involving quality control, or diagnostic tests that yield both false positive and false negative results.

*Use S.MD.6,7 as Enrichment Support; see Model Curriculum.

S.MD.6 (+) Use probabilities to make fair decisions (e.g., drawing by lots, using a random number generator).

S.MD.7 (+) Analyze decisions and strategies using probability concepts (e.g., product testing, medical testing, pulling a hockey goalie at the end of a game).

7/01/2013 YCS ALGEBRA II: UNIT 4 INFERENCES AND CONCLUSIONS FROM DATA 2013-2014

MATH PRACTICES
1. Make sense of problems and persevere in solving them.
2. Reason abstractly and quantitatively.
3. Construct viable arguments and critique the reasoning of others.
4. Model with mathematics.
5. Use appropriate tools strategically.
6. Attend to precision.
7. Look for and make use of structure.
8.
Look for and express regularity in repeated reasoning.

LITERACY STANDARDS
L.1 Learn to read mathematical text (including textbooks, articles, problems, problem explanations)
L.2 Communicate using correct mathematical terminology
L.4 Listen to and critique peer explanations of reasoning
L.5 Justify orally and in writing mathematical reasoning
L.6 Represent and interpret data with and without technology
L.7 Research mathematics topics or related problems
L.8 Read appropriate text, providing explanations for mathematical concepts, reasoning or procedures
L.9 Apply details of mathematical reading/use information found in texts to support reasoning and develop a “works cited document” for research done to solve a problem.

MOTIVATION
1. To introduce students to making inferences from data, engage them in activity #1 on page 10, which is a simulation concerning prizes in cereal boxes.
2. Have students set academic and personal goals for the unit.
3. Review the authentic assessment for the students so they know what will be expected at the end of the unit.

TEACHING-LEARNING

Vocabulary: Unimodal, Bimodal, Standard deviation, Normal distribution, Bell shaped curve, Probability, Relative frequency, Sample space, Independent events, Dependent events, Counting principle, Permutations, Combinations, Expected value, Conditional probability, Surveys, Empirical probability, Theoretical probability, Random variable, Observational studies, Margin of error, Statistic, Parameter

Web sites useful in this unit that help with statistical data:

http://illuminations.nctm.org/ActivityDetail.aspx?ID=159 This tool generates a sampling distribution by running a large number of trials. You can choose the type of trial, how many tickets to draw, and how many times to repeat the trial. The results are shown in a histogram and table.
http://illuminations.nctm.org/ActivityDetail.aspx?ID=78 This tool can be used to create a histogram for analyzing the distribution of a data set using data that you enter or using pre-loaded data that you select.

http://illuminations.nctm.org/ActivityDetail.aspx?ID=67 This applet simulates drawing tickets from a box, where each ticket has a number written on it. After you decide which tickets to place in the box, the applet chooses tickets at random. The relative frequency of each number is displayed in a frequency distribution at the bottom of the applet.

1. Students should have a good understanding of mean, median (refer to the textbook page 664), MAD (mean absolute deviation), histograms, and box plots. Teacher may need to review these concepts. With histograms, discuss unimodal and bimodal distributions. (S.ID.4, MP.1, MP.2, MP.4, MP.5, L.1, L.2, L.6)
For MAD, refer to the web site: http://www.glencoe.com/sites/pdfs/impact_math/ls9_c1_mean_absolute_deviation.pdf

2. To review dot plots, examine overlapping dot plots with the following two problems:
a) Go Shop Already is a babysitting service in a mall. For a month, they kept track of the number of babies they were left to watch. Make a dot plot of the data:
31, 24, 29, 24, 16, 14, 25, 17, 30, 18, 19, 26, 18, 23, 26, 17, 16, 24, 30, 27, 19, 29, 22, 20, 32, 30, 20, 21, 29, 23
b) We Sit For You is a babysitting service in a mall. For a month, they kept track of the number of babies they were left to watch. Make a dot plot of the data on top of the dot plot for (a) and in a different color:
35, 43, 43, 31, 32, 37, 37, 32, 45, 38, 43, 36, 31, 40, 42, 36, 41, 41, 44, 31, 31, 31, 31, 32, 43, 40, 33, 34, 44, 42
c) By examining the dot plots, which one would seem to be a better babysitting service? Explain.
d) Reinforce with more examples of comparing data from two different data sets. Use histograms as well as dot plots.
(S.ID.4, MP.1, MP.2, MP.4, MP.5, L.1, L.2, L.6)

3. Introduce the concept of standard deviation: a measure of the dispersion of a set of data from its mean. The more spread apart the data, the higher the standard deviation. To calculate the standard deviation $\sigma$, use the following formula, where $\bar{x}$ is the mean and $N$ is the number of entries:

$\sigma = \sqrt{\dfrac{\sum_{n=1}^{N} (x_n - \bar{x})^2}{N}}$

Introduce with the following example: The table shows the amount of money spent on education per student in 2000.

Atlantic States     Expenditure per student
New Jersey          6440
Maryland            7680
Virginia            5450
North Carolina      5890

Midwest States      Expenditure per student
Ohio                7540
Indiana             8214
Illinois            5620
Michigan            6420

a) Find the mean. (Ans: 6656.75)
b) Find the standard deviation. Use a table with columns $x_n$, $x_n - \bar{x}$, and $(x_n - \bar{x})^2$, and a final row for the sum. Divide the sum of the differences squared by the number of entries and then take the square root. Here the sum of the squared differences is 7,474,231.50; dividing by 8 gives 934,278.94, and the square root is about 966.58. (Ans: 966.58)
Refer to Sections 12-6 and 12-7 in the textbook. (S.ID.4, MP.1, MP.2, MP.4, MP.5, L.1, L.2, L.6)

4. Examine standard deviation in context with the normal distribution, or bell shaped curve. Discuss data which fit a normal distribution and data which do not. Then look at a set of data which forms a bell shaped curve, such as the one below on students’ grades; draw a histogram and then sketch the bell shaped curve using the histogram as a guide. Discuss the percentages that fall within each standard deviation of the mean (refer to the “Key Concept” on page 672 in the textbook). Once this has been discussed, continue with the example problem.

Example: The grades on a math test on statistics were as follows: 3 students received a 90%, 6 students received an 80%, 9 students received a 70%, 5 students received a 60%, and 2 students received a 50%.
a) Draw a histogram for the data.
b) Find the mean. Ans: 71.2 (1780 total points divided by 25 students)
c) Find the standard deviation. Ans: 11.07 (the sum of the squared differences is 3064; 3064/25 = 122.56, and the square root is about 11.07)
d) Draw the normal distribution.
e) Using the normal curve, what is the probability that a student will receive a score of at least 59%? Approximate the probability, then discuss #5 below to find a more exact answer.
f) Using the normal curve, what is the probability that a student will receive at least a 77%? Approximate the probability, then discuss #5 below to find a more exact answer.
(S.ID.4, MP.1, MP.2, MP.4, MP.5, MP.6, L.1, L.2, L.6)

5. If asked to find the probability or relative frequency, which is the area under the curve, use the table on page 11.
Applying the table to the previous example (4-f): find the distance between the given score and the mean (77 - 71.2 = 5.8) and divide by the standard deviation (5.8/11.07 = 0.52). The score is 0.52 standard deviations from the mean, so to read that off the table, look for 0.5 on the left hand side and 0.02 across the top, then read across and down (0.1985). This is the area from the mean to 77, so to find the area from 77 to 100, subtract 0.1985 from 0.5 to get 0.3015 (you are only working with half the curve).
For part 4-e: 71.2 - 59 = 12.2 and 12.2/11.07 = 1.10, so the score is 1.10 standard deviations below the mean. Read 1.10 on the chart to get 0.3643, then add 0.5 (the half of the curve above the mean) to obtain 0.8643.
There are more problems on the following link, and the table on page 11 can be used for those problems. The problems from the link are also attached on pages 12-15. (S.ID.4, MP.4, MP.5, MP.6, MP.8, L.2)
http://www.analyzemath.com/statistics/normal_distribution.html

DISTRICT GENERATED ASSESSMENT

6. To be able to work with probability, the students need a good understanding of the difference between a statistic and a parameter, sample space, independent events, dependent events, counting principle, permutations, combinations, expected value and making fair decisions. Those concepts need to be reviewed before starting probability.
The following links may be of help: (S.MD.6, S.IC.1, S.IC.2, MP.1, MP.2, MP.4, MP.5, MP.6, MP.7, MP.8, L.1, L.2, L.4, L.6, L.7, L.8)
http://www.regentsprep.org/Regents/math/ALGEBRA/APR1/indexAPR1.htm
http://www.regentsprep.org/Regents/math/ALGEBRA/APR2/indexAPR2.htm
http://www.regentsprep.org/Regents/math/ALGEBRA/APR4/indexAPR4.htm
http://www.regentsprep.org/Regents/math/ALGEBRA/APR6/indexAPR6.htm
http://www.regentsprep.org/Regents/math/ALGEBRA/APR8/indexAPR8.htm
http://www.regentsprep.org/Regents/math/ALGEBRA/MultipleChoiceReview/Probability.htm
http://www.mathwords.com/c/combination_formula.htm

Discuss the definition of probability, then have students do the following activity to determine the answer to the question, “When two coins are tossed, what is the probability that both coins come up heads?” What would you expect the answer to be?
a) Divide students into groups of two or three.
b) Have them toss two coins and keep track of whether both are heads.
c) Repeat the above step 49 times, yielding 50 trials.
d) Calculate the empirical probability (students have called this experimental probability in earlier grades).
e) Using a tree diagram, calculate the theoretical probability.
f) Compare the empirical to the theoretical probability and discuss discrepancies.

Discuss the following probability problem with the students: Your iTunes library on the computer holds 36 downloaded songs, of which 24 are Country-Western and 12 are rap. You are in a hurry, so you load 10 songs at random. What is the probability that you select 5 Country-Western and 5 rap?
a) Determine the number of successes: C(24,5) * C(12,5) = 42,504 * 792 = 33,663,168
b) Find the total number of possible 10-song selections: C(36,10) = 254,186,856
c) Find the probability: 33,663,168 / 254,186,856 = 0.1324 = 13.24%

Reinforce with additional problems, possibly having students make up problems similar to the one above and passing them to the student sitting next to them to solve.

In order to make decisions based on probabilities, students need a good understanding of expected value. In a probability distribution, the expected value, usually represented by E(x), is the weighted average of the possible values of a random variable, with weights given by their respective theoretical probabilities. The expected value tells what to expect in an experiment “in the long run,” after many trials. The expected value does not have to be one of the values in the sample space.

Example 1: Look at the probability distribution: suppose two dice are rolled; show the probability of each sum of the numbers appearing on the roll. This is the probability distribution:

Sum           2     3     4     5    6     7    8     9    10    11    12
Probability   1/36  1/18  1/12  1/9  5/36  1/6  5/36  1/9  1/12  1/18  1/36

The probabilities are found by making a 6x6 square with the numbers 1 through 6 across the top and down the left side. Then put the sums of the top and side numbers in the square. Count the number of 2’s in the square, which is 1, so the probability is 1/36; count the number of 3’s in the square, which is 2, so the probability is 2/36 = 1/18; etc.

     1   2   3   4   5   6
1    2   3   4   5   6   7
2    3   4   5   6   7   8
3    4   5   6   7   8   9
4    5   6   7   8   9  10
5    6   7   8   9  10  11
6    7   8   9  10  11  12

Have students make a histogram and compare it to a normal curve. Now students are ready to look at the expected value. Take each sum and multiply it by its probability and then add the results:

E(x) = 2(1/36) + 3(2/36) + 4(3/36) + 5(4/36) + 6(5/36) + 7(6/36) + 8(5/36) + 9(4/36) + 10(3/36) + 11(2/36) + 12(1/36) = 252/36 = 7

So the expected value is 7.
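
As a check on Example 1, the distribution and the expected value can be reproduced with a short script. This is a minimal sketch and not part of the original unit: the function names `two_dice_distribution` and `expected_value` are illustrative assumptions, and exact fractions are used so the results match the table.

```python
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Return {sum: probability} for the sum of two fair six-sided dice."""
    counts = {}
    # Walk through all 36 cells of the 6x6 square of sums.
    for top, side in product(range(1, 7), repeat=2):
        counts[top + side] = counts.get(top + side, 0) + 1
    return {s: Fraction(c, 36) for s, c in sorted(counts.items())}


def expected_value(distribution):
    """Weighted average: each value times its probability, then added."""
    return sum(value * prob for value, prob in distribution.items())


dist = two_dice_distribution()
print(dist[2], dist[7])      # 1/36 1/6
print(expected_value(dist))  # 7
```

Students could change `range(1, 7)` to model dice with a different number of faces and compare the new expected value with their prediction.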
Example 2: In the Vindicator a report was published stating the following statistics:

TRI-COUNTY HUNGER
By the numbers
Feeding America’s “Map the Meal Gap” study, based on statistics collected by the U.S. Census Bureau and the U.S. Bureau of Labor Statistics, says 31,430 Mahoning Valley children face food insecurity.
Food insecurity numbers
Ohio: 2,054,290, or 17.8 percent of the state’s population.
Columbiana, Mahoning and Trumbull counties combined: 91,620, or 16.4 percent of the tricounty area’s population.
Columbiana County: 16,760, or 15.5 percent of the county’s population.
Mahoning County: 41,080, or 17.1 percent of the county’s population.
Trumbull County: 33,780, or 16 percent of the county’s population.
(Source: Feeding America Map the Meal Gap study)

Based on these data for Mahoning County, how many adults face food insecurity? Ans: 9650
What is the expected value of the number of people facing food insecurity if they lived in the tricounty area? Ans: 91,620

Reinforce finding probability distributions and expected value with additional examples.

Probability is used to make fair decisions in everyday life. The following two activities illustrate fairness. A probability experiment may be considered "fair" if all outcomes are equally likely, or (in some cases) if the expected value of some random variable is 0.

Example 1: There are 6 players in a volleyball game. The team has to choose one of them randomly to be captain for a game.
Tasha's plan: Assign each player a number. Then roll a number cube. The captain is the player whose number comes up.
Martin's plan: Assign each player a number. Then flip 3 coins. Select a player according to the following chart.
HHH - 1
HHT - 2
HTH - 3
HTT - 4
THH - 5
THT - 6
TTH - 1
TTT - 2
Check whether both of the plans can be considered fair in selecting a captain.
First check Tasha's plan for fairness.
Ans: The sample space of the number cube is {1, 2, 3, 4, 5, 6}, and each outcome is equally likely. Each player has an equal chance of selection as captain, with probability 1/6.
Next check Martin's plan for fairness.
Ans: The sample space of flipping 3 coins is {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}; there are 8 outcomes, which are equally likely. Players 1 and 2 each have probability 2/8 of being selected as captain, whereas the other players each have probability 1/8. The players do not have an equal chance of selection, so Martin's plan cannot be considered “fair” in the selection of captain.

Example 2: At a school fair, you are given a number of tokens. In one stall at the fair, there is a spinner with 8 sectors. If the spinner lands on a red sector, you win 3 tokens. If you land on a green sector, you win 5 tokens. If you land on any other sector, you lose 2 tokens. Is this game fair?
The spinner has 8 sectors and each is an equally likely possibility. The sample space is {red sector, green sector, 6 other sectors}.
Write the probability distribution for a single spin of the spinner and the number of tokens you win.
Ans:

Sector        Red   Green   Other
Tokens won    3     5       -2
Probability   1/8   1/8     6/8

Use the weighted average formula to find the expected value:

E(x) = 3(1/8) + 5(1/8) + (-2)(6/8) = -4/8 = -0.5

The expected value is not zero, so the game is not fair. On average you will lose about 0.5 tokens per spin.

The following lesson works with independent events, probability of compound events (independent events), dependent events, and conditional probability (probability of dependent events).

Have students research the terms independent events and the probability of compound events occurring, and find working definitions and examples with answers. Collect the examples and use them as worksheet problems for the entire class.
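
The two fairness examples in this item (the captain selection plans and the spinner game) can be checked by direct counting. The sketch below is illustrative only: the names `selection_probabilities` and `is_fair` are assumptions, and the spinner is assumed to have 1 red, 1 green, and 6 other sectors, as the stated sample space suggests.

```python
from collections import Counter
from fractions import Fraction

# Martin's chart: each equally likely coin outcome mapped to a player number.
MARTIN_CHART = {"HHH": 1, "HHT": 2, "HTH": 3, "HTT": 4,
                "THH": 5, "THT": 6, "TTH": 1, "TTT": 2}
# Tasha's plan: each face of the number cube selects that player.
TASHA_CHART = {face: face for face in range(1, 7)}


def selection_probabilities(outcome_to_player):
    """Probability of each player when all outcomes are equally likely."""
    total = len(outcome_to_player)
    counts = Counter(outcome_to_player.values())
    return {player: Fraction(n, total) for player, n in sorted(counts.items())}


def is_fair(probabilities):
    """A plan is fair here if every player has the same probability."""
    return len(set(probabilities.values())) == 1


tasha = selection_probabilities(TASHA_CHART)
martin = selection_probabilities(MARTIN_CHART)
print(is_fair(tasha), is_fair(martin))  # True False
print(martin[1], martin[3])             # 1/4 1/8

# Spinner game (assumed: 1 red, 1 green, 6 other sectors out of 8).
SPINNER = [(3, Fraction(1, 8)), (5, Fraction(1, 8)), (-2, Fraction(6, 8))]
spinner_ev = sum(tokens * prob for tokens, prob in SPINNER)
print(spinner_ev)                       # -1/2
```

Students could edit `MARTIN_CHART` to look for a coin-based plan that is fair for six players, for example by re-flipping on TTH and TTT.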
To review dependent events and conditional probability, use the following link: http://www.mathgoodies.com/lessons/vol6/dependent_events.html (This is attached on pages 16-19. It explains the concept well and has additional problems at the end of the lesson which are not included here.)

The activities on the link below tie these concepts together and serve as a good reinforcement. The concepts are: compute and interpret the expected value of random variables in simple cases, understand how to compute the probability of a compound event, understand the concepts of conditional probability and independent events, and use simulations to construct empirical probability distributions.
http://illuminations.nctm.org/LessonDetail.aspx?id=L290

DISTRICT GENERATED ASSESSMENT

7. Students will be collecting data using the three primary methods: surveys, experiments, and observational studies. In order to do this they need to know the differences among the three, which are discussed on the worksheet attached on pages 20-21. After students have completed the worksheet, discuss different scenarios which would be appropriate for each type of data collection. (S.IC.1, S.IC.2, S.IC.3, S.IC.4, S.IC.5, S.IC.6, S.MD.7, L.2, L.5, L.6)

In groups or individually, have students pose a question or hypothesis, develop a survey that would assist in evaluating the question, find a random sample of people to answer the survey, find the mean, find the standard deviation, draw a normal curve, make a table showing the probability distribution, and find the expected value. Depending on the survey, use a die, the random number generator on the calculator, or a coin to perform a simulation of the survey. Compare the results to the actual survey and calculate the margin of error.
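
The simulation step above can also be run on a computer in place of a die, calculator, or coin. The sketch below is one possible approach under stated assumptions, not the unit's prescribed method: it simulates many random samples for a yes/no survey question and takes the margin of error to be about two standard deviations of the simulated sample proportions (a common classroom convention). The population proportion 0.5, the sample size 100, and the function names are hypothetical values chosen for illustration.

```python
import random
import statistics


def simulate_sample_proportions(p, sample_size, trials, seed=0):
    """Simulate `trials` random samples and return each sample's proportion
    of "yes" answers, assuming each person says yes with probability p."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    proportions = []
    for _ in range(trials):
        yes_count = sum(rng.random() < p for _ in range(sample_size))
        proportions.append(yes_count / sample_size)
    return proportions


def margin_of_error(proportions):
    """Assumed convention: two standard deviations of the simulated results."""
    return 2 * statistics.pstdev(proportions)


# Hypothetical survey: 100 respondents, assumed population proportion 0.5.
results = simulate_sample_proportions(p=0.5, sample_size=100, trials=2000)
print(round(statistics.mean(results), 3))
print(round(margin_of_error(results), 3))
```

If the proportion found in the students' actual survey falls within the simulated mean plus or minus this margin, the survey result is consistent with the assumed model.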
Have students research experimentally obtained data that compare two treatments. For example: A new drug was tested for its effectiveness and side effects. There were 120 people in the study; 84 had the expected results from the drug, and 36 had side effects.
P(emp) = 84/120 = .70 or 70%
P(emp) = 36/120 = .30 or 30%
For this example, a coin toss can be used to simulate the experiment, and the two results can be compared using the p-value. If there are more than two results in your experiment, you can use the random number generator found on the TI calculators, a die, etc. instead of the coin toss. The processes for calculating the p-value and stating the null hypothesis are attached on pages 22-24.

Have students do an observational study, such as finding the most popular color, and then compare it to a survey asking people what they think the most popular color is. Have them examine statistical significance in relation to the randomness of the selection in sampling.

On pages 25-27 are questions with answers that students should be able to answer upon completion of the unit.

TRADITIONAL ASSESSMENT
1. Unit Tests: Multiple-Choice Questions

TEACHER CLASSROOM ASSESSMENT
1. Quizzes
2. 2- and 4-point questions
3. Other teacher assignments

AUTHENTIC ASSESSMENT
1. Students evaluate unit goals.
2. To culminate the unit, have students research companies, banks, pharmaceutical companies, etc. that do product testing and use their statistics to make inferences to sell their product. Have them choose one, state their source and write a summary of their findings. (S.IC.1, S.IC.6, S.MD.7, MP.2, MP.4, MP.5, MP.7, L.1, L.2, L.5, L.6, L.9)
3. To make inferences and justify your conclusion you will engage in the following steps:
a. Create a hypothesis.
b. Decide on a method of data collection.
c. Implement the method and collect the data.
d. Analyze the data.
e. Compare results with expectations.
f.
Perform a simulation and compute margin of error.
g. Make inferences from your data collection and justify your conclusions.

RUBRIC: AUTHENTIC ASSESSMENT
ELEMENTS OF THE PROJECT (scored 0, 1, or 2)

Create a hypothesis
0: Did not attempt
1: Created a hypothesis not appropriate for data collection
2: Created a hypothesis appropriate for data collection

Method of data collection
0: Did not attempt
1: Did not choose one of the three methods discussed in the unit
2: Chose one of the three methods discussed in the unit

Collect and present data
0: Did not attempt
1: Collected data, did not present in an organized manner
2: Data collected and presented in an organized manner

Analyze data using mean, standard deviation, probability distributions, and/or p value
0: Did not attempt
1: Analyzed data using only one of the methods listed
2: Analyzed data using two or more of the methods listed

Compare results with expectations
0: Did not attempt
1: Compared results with something other than expectations
2: Compared results with expectations

Simulation results
0: Did not attempt
1: Simulation was performed incorrectly
2: Simulation performed correctly and results shown

Compare simulation with actual data using margin of error
0: Did not attempt
1: Compared simulation with actual data, no margin of error stated
2: Compared simulation with actual data showing margin of error

Make inferences
0: Did not attempt
1: Made one inference
2: Made several inferences

Justify conclusions
0: Did not attempt
1: Justified one conclusion
2: Justified several conclusions

MOTIVATION #1

Note to teacher: This activity is designed to review several statistical concepts: histogram, dot plot, mean, median, and MAD. You may need to review those concepts before students begin. It is also designed to introduce students to simulation activities and making inferences. A simulation is the use of a probability experiment to mimic a real-life situation.
The real-life situation that is to be mimicked is: General Mills is offering 6 different superhero prizes, one in each box, in their Honey Nut Cheerios boxes. The prizes are equally and randomly distributed. The question is “How many boxes on average would you have to buy to get a complete set of superheroes?”

Collect Data: Work in pairs to complete steps 1 through 4:
Step 1: Since there are 6 different prizes, use a die, with each number representing one of the prizes.
Step 2: Roll the die and record which prize was chosen, keeping track on a data tally sheet like the one described below.
Step 3: Continue to roll the die until all 6 prizes have been chosen and then stop. This is the end of one trial in your simulation. Record the number of boxes for this trial.
Step 4: Repeat steps 1-3 until you have 25 trials, using a new tally sheet for each trial.

Recording table: two columns, "Number of trial" and "Number of boxes per trial", with one row for each of trials 1 through 25.

DATA TALLY SHEET: two columns, "Prize Number" and "Boxes Purchased", with one row for each of prizes 1 through 6 and a final row "Total needed".

ANALYZE DATA
1. Create a histogram and dot plot displaying your data.
2. Find the mean, median, maximum, and minimum.
3. Combine all the trials from the entire class and find the mean, median, maximum and minimum.

MAKE A CONJECTURE
1. If you perform 50 additional trials, will your results change?
2. When reporting the data, should you use your individual trials or the trials from the entire class?
3. If there were 10 superheroes instead of six, would you need to buy more or fewer boxes of cereal on average?
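
For teachers who want to extend the die-rolling activity, the same simulation can be run on a computer to produce many more trials. This is a minimal sketch under the activity's own assumption that each prize is equally likely; the function names `boxes_until_complete` and `run_trials` and the seed values are illustrative choices, not part of the original activity.

```python
import random
import statistics


def boxes_until_complete(num_prizes, rng):
    """One trial: keep buying boxes until every prize has appeared once."""
    prizes_seen = set()
    boxes = 0
    while len(prizes_seen) < num_prizes:
        prizes_seen.add(rng.randint(1, num_prizes))  # one roll of the die
        boxes += 1
    return boxes


def run_trials(num_prizes=6, trials=25, seed=1):
    """Repeat the trial and return the number of boxes needed each time."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return [boxes_until_complete(num_prizes, rng) for _ in range(trials)]


class_data = run_trials(num_prizes=6, trials=25)
print("mean:", statistics.mean(class_data))
print("median:", statistics.median(class_data))
print("min:", min(class_data), "max:", max(class_data))

# Conjecture 3: compare the average for 10 prizes with the average for 6.
print("10 prizes, mean:", statistics.mean(run_trials(num_prizes=10, trials=25)))
```

For comparison with the simulated means, the theoretical long-run average for 6 equally likely prizes is 6(1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6) = 14.7 boxes.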
Data table for area under the normal distribution curve

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

Example to help read the above chart: Percent of Population Between 0 and 0.45
Start at the row for 0.4 and read along to the column for 0.05 (0.4 + 0.05 = 0.45): there is the value 0.1736.
So 0.1736 of the population, or 17.36%, is between 0 and 0.45 standard deviations from the mean.
Because the curve is symmetrical, the same table can be used for values going in either direction, so a negative 0.45 also has an area of 0.1736.

Normal Distribution Problems with Answers

Problems and applications on normal distributions are presented. The answers to these problems follow the problems. An online normal distribution probability calculator may also be useful to check your answers.

1. X is a normally distributed variable with mean μ = 30 and standard deviation σ = 4. Find
a) P(x < 40)
b) P(x > 21)
c) P(30 < x < 35)

2. A radar unit is used to measure speeds of cars on a motorway. The speeds are normally distributed with a mean of 90 km/hr and a standard deviation of 10 km/hr. What is the probability that a car picked at random is travelling at more than 100 km/hr?

3. For a certain type of computer, the length of time between charges of the battery is normally distributed with a mean of 50 hours and a standard deviation of 15 hours.
John owns one of these computers and wants to know the probability that the length of time will be between 50 and 70 hours.

4. Entry to a certain University is determined by a national test. The scores on this test are normally distributed with a mean of 500 and a standard deviation of 100. Tom wants to be admitted to this university and he knows that he must score better than at least 70% of the students who took the test. Tom takes the test and scores 585. Will he be admitted to this university?

5. The lengths of similar components produced by a company are approximated by a normal distribution model with a mean of 5 cm and a standard deviation of 0.02 cm. If a component is chosen at random
a) what is the probability that the length of this component is between 4.98 and 5.02 cm?
b) what is the probability that the length of this component is between 4.96 and 5.04 cm?

6. The length of life of an instrument produced by a machine has a normal distribution with a mean of 12 months and a standard deviation of 2 months. Find the probability that an instrument produced by this machine will last
a) less than 7 months.
b) between 7 and 12 months.

7. The time taken to assemble a car in a certain plant is a random variable having a normal distribution with a mean of 20 hours and a standard deviation of 2 hours. What is the probability that a car can be assembled at this plant in a period of time
a) less than 19.5 hours?
b) between 20 and 22 hours?

8. A large group of students took a test in Physics and the final grades have a mean of 70 and a standard deviation of 10. If we can approximate the distribution of these grades by a normal distribution, what percent of the students
a) scored higher than 80?
b) should pass the test (grades ≥ 60)?
c) should fail the test (grades < 60)?

9.
The annual salaries of employees in a large company are approximately normally distributed with a mean of $50,000 and a standard deviation of $20,000.
a) What percent of people earn less than $40,000?
b) What percent of people earn between $45,000 and $65,000?
c) What percent of people earn more than $70,000?

Answers to the Above Questions

1. Note: What is meant here by area is the area under the standard normal curve.
a) For x = 40, the z-value is z = (40 - 30) / 4 = 2.5
Hence P(x < 40) = P(z < 2.5) = [area to the left of 2.5] = 0.9938
b) For x = 21, z = (21 - 30) / 4 = -2.25
Hence P(x > 21) = P(z > -2.25) = [total area] - [area to the left of -2.25] = 1 - 0.0122 = 0.9878
c) For x = 30, z = (30 - 30) / 4 = 0 and for x = 35, z = (35 - 30) / 4 = 1.25
Hence P(30 < x < 35) = P(0 < z < 1.25) = [area to the left of z = 1.25] - [area to the left of 0] = 0.8944 - 0.5 = 0.3944

2. Let x be the random variable that represents the speed of cars. x has μ = 90 and σ = 10. We have to find the probability that x is higher than 100, or P(x > 100).
For x = 100, z = (100 - 90) / 10 = 1
P(x > 100) = P(z > 1) = [total area] - [area to the left of z = 1] = 1 - 0.8413 = 0.1587
The probability that a car selected at random has a speed greater than 100 km/hr is equal to 0.1587.

3. Let x be the random variable that represents the length of time. It has a mean of 50 and a standard deviation of 15. We have to find the probability that x is between 50 and 70, or P(50 < x < 70).
For x = 50, z = (50 - 50) / 15 = 0
For x = 70, z = (70 - 50) / 15 = 1.33 (rounded to 2 decimal places)
P(50 < x < 70) = P(0 < z < 1.33) = [area to the left of z = 1.33] - [area to the left of z = 0] = 0.9082 - 0.5 = 0.4082
The probability that John's computer has a length of time between 50 and 70 hours is equal to 0.4082.

4. Let x be the random variable that represents the scores.
x is normally distributed with a mean of 500 and a standard deviation of 100. The total area under the normal curve represents the total number of students who took the test. If we multiply the values of the areas under the curve by 100, we obtain percentages.
For x = 585, z = (585 - 500) / 100 = 0.85
The proportion P of students who scored below 585 is given by P = [area to the left of z = 0.85] = 0.8023 = 80.23%
Tom scored better than 80.23% of the students who took the test, so he will be admitted to this university.

5. a) P(4.98 < x < 5.02) = P(-1 < z < 1) = 0.6826
b) P(4.96 < x < 5.04) = P(-2 < z < 2) = 0.9544

6. a) P(x < 7) = P(z < -2.5) = 0.0062
b) P(7 < x < 12) = P(-2.5 < z < 0) = 0.4938

7. a) P(x < 19.5) = P(z < -0.25) = 0.4013
b) P(20 < x < 22) = P(0 < z < 1) = 0.3413

8. a) For x = 80, z = 1. The area to the right of (higher than) z = 1 is equal to 0.1587, so 15.87% scored more than 80.
b) For x = 60, z = -1. The area to the right of z = -1 is equal to 0.8413, so 84.13% should pass the test.
c) 100% - 84.13% = 15.87% should fail the test.

9. a) For x = 40000, z = -0.5. The area to the left of (less than) z = -0.5 is equal to 0.3085, so 30.85% earn less than $40,000.
b) For x = 45000, z = -0.25 and for x = 65000, z = 0.75. The area between z = -0.25 and z = 0.75 is equal to 0.3721, so 37.21% earn between $45,000 and $65,000.
c) For x = 70000, z = 1. The area to the right of (higher than) z = 1 is equal to 0.1587, so 15.87% earn more than $70,000.

Dependent Events

Experiment 1: A card is chosen at random from a standard deck of 52 playing cards. Without replacing it, a second card is chosen. What is the probability that the first card chosen is a queen and the second card chosen is a jack?

Analysis: The probability that the first card is a queen is 4 out of 52.
However, if the first card is not replaced, then the second card is chosen from only 51 cards. Accordingly, the probability that the second card is a jack given that the first card is a queen is 4 out of 51.

Conclusion: The outcome of choosing the first card has affected the outcome of choosing the second card, making these events dependent.

Definition: Two events are dependent if the outcome or occurrence of the first affects the outcome or occurrence of the second so that the probability is changed.

Now that we have accounted for the fact that there is no replacement, we can find the probability of the dependent events in Experiment 1 by multiplying the probabilities of each event.

Probabilities:
P(queen on first pick) = 4/52
P(jack on 2nd pick given queen on 1st pick) = 4/51
P(queen and jack) = 4/52 · 4/51 = 16/2652 = 4/663

Experiment 1 involved two compound, dependent events. The probability of choosing a jack on the second pick given that a queen was chosen on the first pick is called a conditional probability.

Definition: The conditional probability of an event B in relationship to an event A is the probability that event B occurs given that event A has already occurred. The notation for conditional probability is P(B|A) [pronounced "the probability of event B given A"].

The notation used above does not mean that B is divided by A. It means the probability of event B given that event A has already occurred. To find the probability of the two dependent events, we use a modified version of Multiplication Rule 1, which was presented in the last lesson.
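The Experiment 1 arithmetic can be checked with exact fractions. The following is a minimal Python sketch, not part of the original lesson; the function name prob_both and the variable names are illustrative assumptions.

```python
from fractions import Fraction


def prob_both(p_first, p_second_given_first):
    """Probability that two dependent events both occur: P(A) * P(B|A)."""
    return p_first * p_second_given_first


# Experiment 1: a queen first, then a jack, without replacement.
p_queen_first = Fraction(4, 52)       # 4 queens among 52 cards
p_jack_given_queen = Fraction(4, 51)  # 4 jacks among the 51 cards left

p_queen_then_jack = prob_both(p_queen_first, p_jack_given_queen)
print(p_queen_then_jack)  # 4/663
```

Using Fraction instead of decimals keeps the result in the same reduced form as the hand calculation, so the two can be compared directly.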
Multiplication Rule 2: When two events, A and B, are dependent, the probability of both occurring is: P(A and B) = P(A) · P(B|A)

Let's look at some experiments in which we can apply this rule.

Experiment 2: Mr. Parietti needs two students to help him with a science demonstration for his class of 18 girls and 12 boys. He randomly chooses one student who comes to the front of the room. He then chooses a second student from those still seated. What is the probability that both students chosen are girls?

Probabilities:
P(Girl 1 and Girl 2) = P(Girl 1) · P(Girl 2|Girl 1) = 18/30 · 17/29 = 306/870 = 51/145

Experiment 3: In a shipment of 20 computers, 3 are defective. Three computers are randomly selected and tested. What is the probability that all three are defective if the first and second ones are not replaced after being tested?

Probabilities:
P(3 defectives) = 3/20 · 2/19 · 1/18 = 6/6840 = 1/1140

Experiment 4: Four cards are chosen at random from a deck of 52 cards without replacement. What is the probability of choosing a ten, a nine, an eight and a seven in order?

Probabilities:
P(10 and 9 and 8 and 7) = 4/52 · 4/51 · 4/50 · 4/49 = 256/6,497,400 = 32/812,175

Experiment 5: Three cards are chosen at random from a deck of 52 cards without replacement. What is the probability of choosing 3 aces?

Probabilities:
P(3 aces) = 4/52 · 3/51 · 2/50 = 24/132,600 = 1/5,525

Summary: Two events are dependent if the outcome or occurrence of the first affects the outcome or occurrence of the second so that the probability is changed. The conditional probability of an event B in relationship to an event A is the probability that event B occurs given that event A has already occurred. The notation for conditional probability is P(B|A).
When two events, A and B, are dependent, the probability of both occurring is: P(A and B) = P(A) · P(B|A)

T/L #7 Worksheet: Surveys, Experiments, and Observational Data Collection

REAL-WORLD EXAMPLE 1: BIASED AND UNBIASED SAMPLES
SURVEYS State whether each method would produce a random sample. Explain your reasoning.
a. surveying people coming out of a movie theater to find out people's favorite entertainment
This would probably not result in a random sample because the people surveyed would probably be more likely than normal to select going to the movies as a favorite entertainment. Also, people who do not go to movies would not be represented.
b. placing a survey in the local newspaper to determine how people voted in the last election
This would probably not result in a random sample because only people who buy the local newspaper would be represented. Also, not all people would fill out the survey.
c. selecting students at a school to answer questions by randomly drawing their student identification numbers from a hat
This would result in a random sample. Each student identification number in the hat would have an equal chance of being selected.

EXAMPLE 2: SURVEY DESIGN
SCHOOL SURVEYS Shawna wants to determine the most desired dinner food for the senior class picnic. Which questions will get her the answer she is seeking?
a. Do you like hamburgers?
This question is biased in favor of hamburgers.
b. Which is better, hamburgers or lunch meat?
This question is biased because it only gives two options.
c. What would you most like to eat at the senior picnic?
This is an unbiased question that will produce the answer she is seeking.

REAL-WORLD EXAMPLE 3: EXPERIMENTS AND OBSERVATIONAL STUDIES
State whether each situation represents an experiment or an observational study. Identify the control group and the treatment group. If it is an experiment, determine whether there is bias.
a. Find 100 people and randomly split them into two groups. One group takes vitamins once a day and the other group does not take any vitamins at all.
This is an experiment because the people are put into groups at random. The treatment group is the vitamin takers, and the control group is the other group. This is a biased experiment because the participants all know which group they are in.
b. Find 100 students, half of whom had part-time jobs, and compare their grade-point averages.
This is an observational study. The students who had jobs are the treatment group and the other students are the control group.

EXAMPLE 4: EXPERIMENTS AND OBSERVATIONAL STUDIES
Determine whether each situation calls for a survey, an observational study, or an experiment. Explain the process.
a. You want to find opinions on a local board of education election.
This calls for a survey. It is best to call randomly selected phone numbers throughout the school district in order to get an unbiased sample.
b. You want to find out if 5 years of sending text messages affects manual dexterity.
This calls for an observational study. The manual dexterity of people who have sent text messages for five years is compared to the manual dexterity of an equal number of non-texters.
c. You want to test a treatment for allergies.
This calls for an experiment. The test subjects are people with allergies. The treatment group receives the treatment while the control group gets a placebo.

Calculating P-Value (http://www.wikihow.com/Calculate-P-Value)

P-value, or probability value, is a statistical measure that helps scientists judge whether their data are consistent with a hypothesis. It is compared against the significance level, which is an important component in determining whether the data obtained from scientific research is statistically significant.
You can use a table to find the p-value after you calculate other statistical values. Chi-square is one of the statistical values you must find first.

Calculate the chi-square to compare 2 sets of data. The equation is: chi-square = the sum over all categories of (|o - e| - 0.5)^2 / e, where "o" equals the observed, or actual, data, "e" equals the expected data, and 0.5 is a continuity correction. For example, you could test the theory that people who drive red cars get more speeding tickets than those who drive blue cars. Start by stating a null hypothesis: there is no difference between the observed numbers of speeding tickets and the numbers expected from previous research.

Determine your expected observations based on previous research. For example, you discover that prior studies show a 2-to-1 ratio of speeding tickets for red cars. So, in a hypothetical study of 150 speeding tickets given to red and blue cars, you would expect 100 to have gone to red cars versus 50 to blue cars.

Use the chi-square equation. If your research results are that 90 red cars and 60 blue cars received speeding tickets, your chi-square would be (|90 - 100| - 0.5)^2 / 100 + (|60 - 50| - 0.5)^2 / 50 = 0.9025 + 1.805 = 2.7075.

Determine degrees of freedom. Degrees of freedom measure the amount of variability involved in the research, which is limited by the number of categories you are examining. There is one degree of freedom in this example because for the chi-square test, the equation for degrees of freedom is n - 1, where "n" is the number of categories (i.e., 2 car colors, red and blue).

Choose the significance level. The significance level is determined by the researcher, and is customarily set at 0.05, or 5 percent. Essentially, this means that if the null hypothesis is true, a result extreme enough to be called significant would occur by chance alone only 5 percent of the time. In this case, if speeding tickets really follow the expected ratio, a sample would differ from that ratio enough to be called significant only 5 percent of the time.

Use a chi-square distribution table to find the p-value. A chi-square distribution table will give you the p-value based on the degrees of freedom and the chi-square calculation. You can find tables online or in statistics textbooks. Not all chi-square values are represented on distribution tables, so use the value closest to your chi-square calculation. In the example, the chi-square was 2.7075. So, using a chi-square distribution table with one degree of freedom, the p-value is approximately 0.10. So, the results of the sample study are not significantly different from the expected counts, and they do not refute, or disprove, the theory that red cars receive more speeding tickets. Since the p-value is greater than the significance level, you "fail to reject" the null hypothesis.

Below are examples of null and alternative hypotheses written out in symbolic form for cases A, B, C, and D in the following Table:

Questions students should be able to answer upon completion of the unit: http://www.southalabama.edu/coe/bset/johnson/studyq/sq16.htm
1. What is the difference between a statistic and a parameter?
2. What is the symbol for the population mean?
3. What is the symbol for the population correlation coefficient?
4. What is the definition of a sampling distribution?
5. How does the idea of repeated sampling relate to the concept of a sampling distribution?
6. What is a null hypothesis?
7. To whom is the researcher similar in hypothesis testing: the defense attorney or the prosecuting attorney? Why?
8. What is the difference between a probability value and the significance level?
9. Why do educational researchers usually use .05 as their significance level?
10. State the two decision making rules of hypothesis testing.
· Rule one:
· Rule two:
11.
Do the following statements sound like typical null or alternative hypotheses?
(A) The coin is fair.
(B) There is no difference between male and female incomes in the population.
(C) There is no correlation in the population.
(D) The patient is not sick (i.e., is well).
(E) The defendant is innocent.
12. If a finding is statistically significant, why is it also important to consider practical significance?

Questions with Answers that students should be able to answer upon completion of the unit: http://www.southalabama.edu/coe/bset/johnson/studyq/sq16.htm

1. What is the difference between a statistic and a parameter?
A statistic is a numerical characteristic of a sample, and a parameter is a numerical characteristic of a population.

2. What is the symbol for the population mean?
The symbol is the Greek letter mu (i.e., µ).

3. What is the symbol for the population correlation coefficient?
The symbol is the Greek letter rho (i.e., ρ).

4. What is the definition of a sampling distribution?
The sampling distribution is the theoretical probability distribution of the values of a statistic that results when all possible random samples of a particular size are drawn from a population.

5. How does the idea of repeated sampling relate to the concept of a sampling distribution?
Repeated sampling involves drawing many or all possible samples from a population.

6. What is a null hypothesis?
A null hypothesis is a statement about a population parameter. It usually predicts no difference or no relationship in the population. The null hypothesis is the "status quo," the "nothing new," or the "business as usual" hypothesis. It is the hypothesis that is directly tested in hypothesis testing.

7. To whom is the researcher similar in hypothesis testing: the defense attorney or the prosecuting attorney? Why?
The researcher is similar to the prosecuting attorney in the sense that the researcher brings the null hypothesis "to trial" when she believes there is probably strong evidence against the null.
· Just as the prosecutor usually believes that the person on trial is not innocent, the researcher usually believes that the null hypothesis is not true.
· In the court system the jury must assume (by law) that the person is innocent until the evidence clearly calls this assumption into question; analogously, in hypothesis testing the researcher must assume (in order to use hypothesis testing) that the null hypothesis is true until the evidence calls this assumption into question.

8. What is the difference between a probability value and the significance level?
Basically, in hypothesis testing the goal is to see if the probability value is less than or equal to the significance level (i.e., is p ≤ alpha).
· The probability value (also called the p-value) is the probability of the result found in your research study occurring (or an even more extreme result occurring), under the assumption that the null hypothesis is true.
· That is, you assume that the null hypothesis is true and then see how often your finding would occur if this assumption were true.
· The significance level (also called the alpha level) is the cutoff value the researcher selects and then uses to decide when to reject the null hypothesis.
· Most researchers select the significance or alpha level of .05 to use in their research; hence, they reject the null hypothesis when the p-value (which is obtained from the computer printout) is less than or equal to .05.

9. Why do educational researchers usually use .05 as their significance level?
It has become part of the statistical hypothesis testing culture.
· It is a convention.
· It reflects a concern over making type I errors (i.e., wanting to avoid the situation where you reject the null when it is true, that is, wanting to avoid "false positive" errors).
· If you set the significance level at .05, then you will only reject a true null hypothesis 5% of the time (i.e., you will only make a type I error 5% of the time) in the long run.

10. State the two decision making rules of hypothesis testing.
· Rule one: If the p-value is less than or equal to the significance level, then reject the null hypothesis and conclude that the research finding is statistically significant.
· Rule two: If the p-value is greater than the significance level, then you "fail to reject" the null hypothesis and conclude that the finding is not statistically significant.

11. Do the following statements sound like typical null or alternative hypotheses?
(A) The coin is fair.
(B) There is no difference between male and female incomes in the population.
(C) There is no correlation in the population.
(D) The patient is not sick (i.e., is well).
(E) The defendant is innocent.
All of these sound like null hypotheses (i.e., the "nothing new" or "status quo" hypothesis). We usually assume that a coin is fair in games of chance; when testing the difference between male and female incomes in hypothesis testing we assume the null of no difference; when testing the statistical significance of a correlation coefficient using hypothesis testing, we assume that the correlation in the population is zero; in medical testing we assume the person does not have the illness until the medical tests suggest otherwise; and in our system of jurisprudence we assume that a defendant is innocent until the evidence strongly suggests otherwise.

12. If a finding is statistically significant, why is it also important to consider practical significance?
When your finding is statistically significant, all you know is that your result would be unlikely if the null hypothesis were true and that you have therefore decided to reject your null hypothesis and to go with your alternative hypothesis. Unfortunately, this does not tell you anything about how big an effect is present or how important the effect would be for practical purposes. That's why, once you determine that a finding is statistically significant, you must next use one of the effect size indicators to tell you how strong the relationship is. Think about this effect size and the nature of your variables (e.g., is the IV easily manipulated in the real world? Will the amount of change relative to the costs in bringing this about be reasonable?).
· Once you consider these additional issues beyond statistical significance, you will be ready to make a decision about the practical significance of your study results.
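As an illustration of the two decision rules from question 10, here is a minimal Python sketch. The function name decide is an illustrative assumption. The coin example borrows the spinning-coin situation from standard S.IC.2 and treats it, as an assumption, as a one-sided test: the p-value is taken to be the probability of 5 tails in a row if the coin is fair.

```python
def decide(p_value, alpha=0.05):
    """Apply the two decision rules: reject the null when p <= alpha."""
    if p_value <= alpha:
        return "reject the null hypothesis (statistically significant)"
    return "fail to reject the null hypothesis (not statistically significant)"


# Null hypothesis: the coin is fair, so P(tails) = 0.5 on each spin.
# Assumed one-sided p-value: probability of 5 tails in a row under the null.
p_five_tails = 0.5 ** 5  # 0.03125

print(decide(p_five_tails))  # 0.03125 <= 0.05, so rule one applies
print(decide(0.10))          # 0.10 > 0.05, so rule two applies
```

A two-sided version of the coin test would also count 5 heads in a row as equally extreme, which doubles the p-value to 0.0625 and changes the decision, so the choice of test matters.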