Chapter 16
16.1 – Statistics: Organizing Data
16.2 – Measures of Central Tendency
16.3 – Measures of Variation

Stem-and-Leaf Plots
ASTRONAUTS Display the data shown in a stem-and-leaf plot.
Step 1 Find the least and the greatest number. The least is 54 and the greatest is 77.
Step 2 Draw a vertical line and write the stems from 5 to 7 to the left of the line.
Step 3 Write the leaves to the right of the line, with the corresponding stem.
Stem and Leaf Diagram
Step 4 Rearrange the leaves so they are ordered from least to greatest.
Ranked Stem and Leaf Plot

Frequency Distributions
Relative Frequency: 0.67, 0.23, 0.06, 0, 0.03
Relative Frequency: 0.14, 0.33, 0.25, 0.17, 0.11

Relative Frequency
The ratio of the absolute frequency to the total number of data points in a frequency distribution. Also known as experimental probability. As the number of data points in an experiment increases, the experimental probability approaches the theoretical probability.

Measures of Variation
Used to describe the distribution of the data by measuring how spread out the data are.
Range: Difference between the high and the low data points.
Mean, Median, Mode: Measures of central tendency (they locate the center of the data rather than its spread).
Variance and Standard Deviation: Measure how much the data values differ from the mean.
Mean Deviation: The average distance of the data values from the mean.

Mean
The average of all data points.
To find the mean of a set of data, add all the data and divide by the number of data points.
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Sample Space
The set of all possible outcomes of an experiment. The sum of all probabilities assigned to outcomes in a sample space must be 1.

Complement
For an event A, the event Not A is the complement of A.

Median
The value of the data point that is exactly in the middle of the data. To find the median, put the data in order from least to greatest and find the term exactly in the middle. If there is no one term exactly in the middle, average the two that are in the middle.

Mode
The value of the data point that occurs most often. It is possible to have more than one mode.

Box-and-Whisker Plots
Step 1 Find the least and greatest number. Then draw a number line that covers the range of the data.
Step 2 Find the median, the extremes, and the upper and lower quartiles. Mark these points above the number line.

State             Value
New Hampshire     13
Delaware          28
Maryland          31
Rhode Island      40
Georgia           100
Virginia          112
New York          127
New Jersey        130
South Carolina    187
Massachusetts     192
Maine             228
North Carolina    301
Florida           580

Points to mark: Lower Extreme, Lower Quartile (lower hinge), Median, Upper Quartile (upper hinge), Upper Extreme.
Step 3 Draw a box and the whiskers.
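The values marked in Step 2 can be checked numerically. The sketch below is an illustration, not part of the original slides: it computes the five points for the thirteen state values listed above, assuming the common textbook convention that each quartile (hinge) is the median of the lower or upper half of the ordered data, with the overall median left out when the count is odd. The function name five_number_summary is made up for this example.

```python
from statistics import median

def five_number_summary(values):
    """Return (lower extreme, lower quartile, median, upper quartile, upper extreme).

    Assumption: quartiles are the medians of the lower and upper halves,
    leaving out the middle value when the number of data points is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Skip the middle value when n is odd; split evenly when n is even.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])

state_values = [13, 28, 31, 40, 100, 112, 127, 130, 187, 192, 228, 301, 580]
summary = five_number_summary(state_values)
print(summary)  # (13, 35.5, 127, 210.0, 580)
```

Other conventions (for example, including the median in both halves) give slightly different quartiles, so the result should be compared against the definition used in class.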
Computing Measures of Variation
Range: Difference between the high and the low data points.
Mean Deviation: $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$
Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Standard Deviation: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Find the mean, median, mode, variance, mean deviation, and standard deviation of the following data:
59 59 65 68 69 72 73 76 78 81 81 88

Find the mean, median, mode, variance, mean deviation, and standard deviation of the following data:
10 12 9 10 8 9 13 8 9 9 10 10 12 13

HW #16.1-3
Pg 692-693 3, 9, 10-13
Pg 698-699 3, 6, 9-16
Pg 702-703 5, 7-10

Chapter 16
16.4 – The Normal Distribution

The heights of 16-year-old girls are not distributed uniformly, or evenly. Many more girls are average height than are very short or very tall. The values are frequent near the mean and become rarer the farther they are from the mean. The most common distribution with this characteristic is a normal distribution.

Normal curves are symmetric with respect to the vertical line at the mean. The spread of each curve is defined by its standard deviation. Areas under this curve represent probabilities from normal distributions.

When data are distributed in a bell-shaped or normal curve, about 68% of the data lie within one standard deviation on either side of the mean, about 95% lie within two standard deviations of the mean, and about 99.7% lie within three standard deviations of the mean.

Consider the bell-shaped distribution of IQ scores for students in a school. The mean is 100 and the standard deviation is 15.
What percent of students in the school would we expect to have IQs between 85 and 115?
What percent of the students can we expect to have IQs between 70 and 115?
What percent of the students in the school can we expect to have IQs above 115?
What percent of the students in the school can we expect to have IQs in the range from 85 to 145?

The given times of 33 minutes and 57 minutes represent one standard deviation on either side of the mean. So, about 68% of the shoppers will spend between 33 and 57 minutes in the supermarket.
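The IQ questions above can be checked against the exact normal curve. The sketch below is an illustration only: it uses NormalDist from Python's standard statistics module (Python 3.8 or later), and the helper name percent_between is made up for this example. The exact areas differ slightly from the rounded 68% and 95% figures.

```python
from statistics import NormalDist

# Mean and standard deviation taken from the IQ example.
iq = NormalDist(mu=100, sigma=15)

def percent_between(dist, low, high):
    """Percent of a normal distribution lying between low and high."""
    return 100 * (dist.cdf(high) - dist.cdf(low))

between_85_115 = percent_between(iq, 85, 115)  # one below to one above the mean
between_70_115 = percent_between(iq, 70, 115)  # two below to one above
above_115 = 100 * (1 - iq.cdf(115))            # more than one above
between_85_145 = percent_between(iq, 85, 145)  # one below to three above

results = [
    ("85 to 115", between_85_115),
    ("70 to 115", between_70_115),
    ("above 115", above_115),
    ("85 to 145", between_85_145),
]
for label, value in results:
    print(f"{label}: {value:.1f}%")
```

The same answers follow by hand from the symmetry of the curve: half of 68% is 34% on each side of the mean, half of 95% is 47.5%, and so on.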
According to a survey by the National Center for Health Statistics, the heights of adult men in the United States are normally distributed with a mean of 69 inches and a standard deviation of 2.75 inches. If you randomly choose 1 adult man, what is the probability that he is 71.75 inches tall or taller?

Z-Score
A z-score for a value is the number of standard deviations the value is from the mean. The sign of the z-score is its direction from the mean. For example, if a value has a z-score of -2, it is two standard deviations below the mean.
$z = \frac{x - \bar{x}}{\sigma}$

What is the z-score for 46 in a normal distribution whose mean is 44 and whose standard deviation is 2?
For this distribution, mean = 44 and standard deviation = 2.
$z = \frac{x - \bar{x}}{\sigma} = \frac{46 - 44}{2} = 1$
Thus, 46 is one standard deviation above the mean.

A value is selected randomly from a normal distribution. What is the probability that its z-score is less than -1.46?
The probability that a value has a z-score less than -1.46 is equal to the area of the shaded region under the curve. This value is given in Table 7. Table 7 on page 850 gives the probability that a value in the distribution has a z-score that is less than a given value.

A student received a score of 56 on a normally distributed standardized test. The test had a mean of 50 and a standard deviation of 5. What is the probability that a randomly selected student achieved a higher score?
We need to find the probability that a value greater than 56 is selected from a normal distribution with a mean of 50 and a standard deviation of 5. The z-score of 56 is $z = \frac{56 - 50}{5} = 1.2$. Look up 1.2 in Table 7.
P(Score < 56) = 0.8849
P(Score > 56) = 1 - 0.8849 = 0.1151

How often would we expect to find an IQ greater than 142 in a sample of students whose mean IQ was 110, where the standard deviation was 16?

A company manufactures cover plates for boxes with lengths of 4 inches. Due to variation in the process, the lengths of the plates are normally distributed about a mean of 4 inches with a standard deviation of 0.01 inch.
A plate is considered a "reject" if its length is less than 3.98 inches or greater than 4.02 inches. What percent of the production is considered "rejects"?

HW #16.4
Pg 708 1-31 Odd, 33-43

Chapter 16
16.5 – Collecting Data: Randomness and Bias
Objective: Evaluate and select sampling methods.
Objective: Describe how to take a stratified random sample.

A scientist is studying the weight gain or loss of mice that are given a certain treatment. When choosing mice for the experiment, the scientist reaches into a cage with 30 mice and selects the 5 largest mice in the cage. Is the sample random?

Describe how a random sample of 10 individuals might be chosen from a high school graduating class of 202 to receive a gift certificate.

Although the processes involved in the development of a random sample guarantee that requirements of equal probability and independence are satisfied, they do not guarantee that the sample drawn will be representative.

A town newsletter is doing an article on high school students. A questionnaire is sent to a random sample of school-aged students. Are the data representative?
No. A sample of students from kindergarten through grade 12 would not give results representative of high school students.
Even when the sample is restricted to high school students, the data may not be representative. Data might have been collected largely from members of the high school chorus. In this case, the data would most likely be biased. That is, it is likely that the data would be overly influenced by factors that are related to musical interests.

To draw a representative sample, we may need to divide the population into distinct subgroups, called strata.
Then we can use stratified random sampling to ensure that the sample has the same characteristics as the population.
1. Each member of the population must be placed in one and only one stratum.
2. A random sample is drawn so that the sample has the same distribution among the strata as the population.

A college has 1260 freshmen, 1176 sophomores, 840 juniors, and 924 seniors. Describe how to take a stratified random sample of 200 students.
The college has 4200 students in all, so a sample of 200 is 200/4200 = 1/21 of the population. Taking 1/21 of each class, we would randomly sample 60 freshmen, 56 sophomores, 40 juniors, and 44 seniors.

HW #16.5
Pg 713-714 1-19

Chapter 16
16.6 – Testing Hypotheses

Statisticians are often asked to test whether a given set of observational data represents what one would expect to observe by chance or whether it differs greatly from what one might expect. To determine what constitutes a significant difference, statisticians establish a level of error they are willing to tolerate.

If we throw a coin 30 times and the coin shows tails 23 times, we may decide either that
1. The results occurred by chance.
• The coin is fair and randomly showed 23 tails, although the likelihood that a coin lands tails 23 times is remote.
2. The results did not occur by chance.
• The coin is unevenly weighted or was tossed so that tails had a higher probability of being thrown.

Before stating that the results were biased or influenced by other circumstances, we want to be sure there is a 5% or less probability that the results occurred by chance. Then we can state that the coin is not fair at the 5% level of significance.

Find the probability of throwing 23 tails in 30 tosses.
$\binom{30}{23}\left(\frac{1}{2}\right)^{23}\left(\frac{1}{2}\right)^{7} \approx 0.0019 < 1\%$
Since the probability is less than 5%, we may state that at the 5% level of significance, the results did not occur by chance.

One of the most common ways to test whether a given set of data differs from what one would expect by chance is to use the chi-square ($\chi^2$) test.
This test is used to compare observed data with expected data. If there is a large difference between observed and expected data, we get a large value for $\chi^2$. If there is no difference, $\chi^2 = 0$. The statistic is $\chi^2 = \sum \frac{(O - E)^2}{E}$, where $O$ is an observed count and $E$ is the corresponding expected count.

If we toss a coin 30 times and get 23 heads, calculate $\chi^2$. What does that tell us about whether or not the event occurred by chance?

The chi-square test is typically used to accept or reject a hypothesis about a set of data and to generalize about a population.
Null hypothesis: There is no statistical difference between the expected and the observed data. Thus, the observed results occurred by chance.
The larger the value of $\chi^2$, the stronger the evidence that the null hypothesis is false.

How large must $\chi^2$ be for us to reject the null hypothesis?
For a specific level of significance, we reject the null hypothesis if the calculated value of chi-square exceeds the table value for the number of possible outcomes.

Suppose we rolled a number cube 72 times and found that we had 13 ones, 18 threes, and 12 sixes. Determine whether these results occurred by chance. Use a 5% level of significance.
The null hypothesis is that the results occurred by chance. There are 4 possible outcomes: a one, a three, a six, or any other number. The expected counts are 12, 12, 12, and 36, and the observed counts are 13, 18, 12, and 29, so
$\chi^2 = \frac{(13 - 12)^2}{12} + \frac{(18 - 12)^2}{12} + \frac{(12 - 12)^2}{12} + \frac{(29 - 36)^2}{36} \approx 4.44$
The chi-square value for a 5% significance level for 4 possible outcomes is 7.81. Since 4.44 does not exceed this value, we can state that our results are consistent with chance. Thus, we can accept the null hypothesis.

HW #16.6
Pg 718-720 1-13

Test Review
* Stem and Leaf Plots
* Frequency plots: Relative Frequency
* Box and Whisker plots
* Normal Distributions: Mean, Median, Mode, Standard Deviation, Mean Deviation, Variance, Z-score
* Hypothesis testing: Chi-Square, Significance Level, Null hypothesis
* Random Sample: Representative, Biases, Stratified random sample
* Challenge Problems

Find the variance and standard deviation of each set of data.
{5, 8, 2, 9, 4}
{16, 22, 18, 31, 25, 22}

The useful life of a radial tire is normally distributed with a mean of 30,000 miles and a standard deviation of 5000 miles.
The company makes 10,000 tires a month.
1. About how many tires will last between 25,000 and 35,000 miles?
2. About how many tires will last more than 40,000 miles?
3. About how many tires will last less than 25,000 miles?
4. What is the probability that if you buy a radial tire at random, it will last between 20,000 and 35,000 miles?

The vending machine in the school cafeteria usually dispenses about 6 ounces of soft drink. Lately, it is not working properly, and the variability of how much of the soft drink it dispenses has been getting greater. The amounts are normally distributed with a standard deviation of 0.2 ounce.
1. What percent of the time will you get more than 6 ounces of soft drink?
2. What percent of the time will you get less than 6 ounces of soft drink?
3. What percent of the time will you get between 5.6 and 6.4 ounces of soft drink?
4. If you purchased a soda, what is the probability that it dispenses less than 6.5 oz of soda?

If the mean GPA at Troy is 3.4 with a standard deviation of 0.38, what is the GPA for a student in the top 5%?

Mr. Burnum gave an exam to his 30 Algebra 2 students at the end of the first semester. The scores were normally distributed with a mean score of 78 and a standard deviation of 6. What percent of the students would you expect to receive a grade of at least 70%?

Cucumbers grown on a certain farm have weights with a standard deviation of 2 ounces. What is the mean weight if 85% of the cucumbers weigh less than 16 ounces?

A coke machine is set up so that it will dispense soda into a can. If the actual amount of soda dispensed is normally distributed such that only 10% of all cans have less than 11.75 ounces and only 25% of all cans have more than 12.75 ounces, what are the mean and standard deviation of the soda dispensing machine?

HW #R-16
Pg 722-724 1-17
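For the soda-machine challenge problem above, each percentage gives one z-score equation of the form x = mean + z · (standard deviation), and two equations are enough to solve for the two unknowns. The sketch below is one possible way to set this up, not a method stated in the slides. It uses NormalDist.inv_cdf from Python's standard statistics module in place of a printed z-table, so the z-scores are not rounded the way table values would be.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal: mean 0, standard deviation 1

# 10% of cans hold less than 11.75 oz.
z_low = standard.inv_cdf(0.10)
# 25% of cans hold more than 12.75 oz, so 75% hold less than 12.75 oz.
z_high = standard.inv_cdf(0.75)

# Subtracting the two equations x = mean + z * sd eliminates the mean.
sd = (12.75 - 11.75) / (z_high - z_low)
mean = 11.75 - z_low * sd

print(f"mean = {mean:.2f} oz, standard deviation = {sd:.2f} oz")
```

Answers worked from a printed table may differ in the last digit because table z-scores are rounded to two decimal places.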