MATH STUDENT BOOK
6th Grade | Unit 2

MATH 602
Data Analysis

INTRODUCTION
1. COLLECTING AND DESCRIBING DATA
   COLLECTING VALID DATA
   MEDIAN, MODE, AND RANGE
   DESCRIBING DATA
   SELF TEST 1: COLLECTING AND DESCRIBING DATA
2. ORGANIZING DATA
   FREQUENCY TABLES
   HISTOGRAMS
   LINE PLOTS
   STEM-AND-LEAF PLOTS
   SELF TEST 2: ORGANIZING DATA
3. DISPLAYING AND INTERPRETING DATA
   BAR GRAPHS
   LINE GRAPHS
   VENN DIAGRAMS
   VERTEX-EDGE GRAPHS
   SELF TEST 3: DISPLAYING AND INTERPRETING DATA
4. REVIEW
GLOSSARY

LIFEPAC Test is located in the center of the booklet. Please remove before starting the unit.

Author: Glynlyon Staff
Editor: Alan Christopherson, M.S.

MEDIA CREDITS: Page 5: © Ridofranz, iStock, Thinkstock; 12: © Jupiterimages, Brand X Pictures, Thinkstock; 30: © Ingram Publishing, Thinkstock; 59: © bgsmith, iStock, Thinkstock; © lucky336, iStock, Thinkstock.

804 N. 2nd Ave. E. Rock Rapids, IA 51246-1759
© MMXV by Alpha Omega Publications a division of Glynlyon, Inc. All rights reserved. LIFEPAC is a registered trademark of Alpha Omega Publications, Inc. All trademarks and/or service marks referenced in this material are the property of their respective owners. Alpha Omega Publications, Inc. makes no claim of ownership to any trademarks and/or service marks other than their own and their affiliates, and makes no claim of affiliation to any companies whose trademarks may be listed in this material, other than their own.

Data Analysis

Introduction
In this unit, you will be introduced to the topic of Data Analysis. You will learn about ways to collect and organize data, and many ways to display data. You’ll learn what type of graph is best to display different data sets. You’ll also discover different ways to describe data, called measures of central tendency.
You will learn many new terms and have several concepts to build on in your future studies of statistics.

Objectives
Read these objectives. The objectives tell you what you will be able to do when you have successfully completed this LIFEPAC. When you have finished this LIFEPAC, you should be able to:
• Determine whether a sample is biased or random.
• Find the measures of central tendency.
• Use Venn diagrams to solve problems, including counting problems.
• Solve route problems with vertex-edge graphs.
• Organize and display data in frequency tables, histograms, line plots, stem-and-leaf plots, bar graphs, and line graphs.

Survey the LIFEPAC. Ask yourself some questions about this study and write your questions here.
________________________________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________

1. COLLECTING AND DESCRIBING DATA
Mr. Smith asked his class at ABC School if they could determine the height of a typical sixth grader at their school. Four of the students arrived at the following conclusions:
Cathy – A typical 6th grader is about 5 feet tall.
Al – A typical 6th grader is about 6 feet tall.
Debbie – A typical 6th grader is about 5½ feet tall.
Bob – A typical 6th grader is about 5½ feet tall.
How could these students draw such different conclusions? To answer this question, we will begin to explore the mathematical subject of statistics. In statistics, we look at a set of data (such as the height of sixth graders) and see how it can be organized and analyzed to provide us with useful information. In this lesson, we will look at how data is collected and how we can draw conclusions about a set of data using the mean.

Objectives
Review these objectives. When you have completed this section, you should be able to:
• Determine whether a sample is biased or random.
• Describe a set of data using the mean.
• Find the median, mode, and range for a set of data.
• Describe a set of numerical data using the median, mode, and range.
• Determine when each measure of central tendency provides a good representation of the data.
• Determine how an outlier affects the measures of central tendency.

Vocabulary
biased question. A question that leads individuals toward a certain answer.
biased sample. A sample not representative of the entire population.
central tendency. Ways to describe or summarize data.
data. Information (often numerical).
extreme values. The smallest and largest values in a data set.
mean. The sum of a set of data divided by the number of items in the set.
median. The middle value of a set of data arranged in numerical order.
mode. The most frequently occurring number(s) in a set of data.
outlier. A value that is far removed from the rest of the values in a set of data.
population. All of the possible data in a given topic.
random sample. A sample in which every member of the population has an equal chance of being selected; unbiased sample.
range. The difference between the largest and smallest data points.
sample. A small part of a population chosen to represent the entire group.
statistics. The collection, organization, and analysis of numerical information.
survey. A sampling of a population used to make predictions.
Note: All vocabulary words in this LIFEPAC appear in boldface print the first time they are used. If you are not sure of the meaning when you are reading, study the definitions given.

COLLECTING VALID DATA
If we are curious to know the answer to a question such as “What is the typical height of a sixth grader?” we can often find answers by collecting and analyzing data. One way to collect data is to take a survey of the population we are interested in. If the population is small enough, the whole population can be surveyed. However, if the population is very large (let’s say all sixth graders in your state), then we need a sample of sixth graders that represents the population. Samples can be biased or random.
A biased sample means that the sample does not represent the population. A sample might be biased because not enough data was collected or because its members were chosen in a way that favors one part of the population. The best kind of sample to get is a random sample. A random sample means that every member of the population has an equal chance of being selected.

Let’s look at the samples taken by Mr. Smith’s students to determine if the samples are biased or random. In the ABC School, each of the four students in Mr. Smith’s class took a survey of sixth graders to collect data. Here is how each student collected the data:
Al is on the sixth-grade basketball team. He measured himself and his teammates.
Bob measured the first 20 sixth graders he saw at recess.
Cathy measured herself and four of her friends.
Debbie measured all 80 sixth graders.

What do you think about Al’s sample? Do you think sampling basketball players would provide a realistic sample of typical sixth graders? Probably not. We would say Al’s sample is biased because it does not represent the population. Cathy did not collect enough data, so her sample is biased also. Bob’s sample is random and unbiased, and he has enough data to represent the population. Bob and Debbie reached the same conclusion, yet Debbie measured all of the sixth graders, and Bob measured only 20 sixth graders. The more data we have, the more likely our conclusions are valid, but we need to be efficient also.

Did you know?
Part of statistics is carefully choosing a relatively small sample that will represent the whole population. For example, it would be impossible to survey every home in the country to see what TV show people watch on Thursday!

Example:
For each of the following topics, should we sample the population, or survey everybody?
1. the most popular movie star in America
2. the favorite food in Mr. Smith’s class
3. the number of hours of TV that students at ABC School watch in a week
Solution:
For each case, we will look at the size of the population and determine the size of the sample.
1. The American population is too large to survey everybody, so we need a relatively small sample to determine the most popular movie star.
2. The number of students is small enough in Mr. Smith’s class that the survey can include everybody.
3. The number of students in the whole school is too large to survey everyone’s TV-watching habits. A small sample from each class could be taken.

It is important that the sample is random, so that the results will not be biased.

Example:
Which of the following samples would most likely be random?
1. People coming out of the library are asked about a library tax.
2. A questionnaire is included in the newspaper.
3. An alphabetical list of students is sampled by pulling assigned numbers out of a basket.
Solution:
For each sample, we need to look at whether the sample represents the population and whether it is random.
1. Not random: The people being surveyed are likely to have strong opinions about the library.
2. Not random: Only people with strong opinions are likely to respond.
3. Random: Each student is chosen randomly and will be likely to represent the population.

Another factor that can cause a sample to be biased is how a question on a survey is worded. A biased question seems to have a right or wrong answer. The question seems to be leading you to respond in a certain way.
Examples of biased questions:
Do you, like everybody your age, dislike classical music?
Do you agree with the unfair tax law?
Who is doing a better job, the efficient mayor, or the lazy chief of police?
People who give surveys need to be careful that questions are unbiased so that the results will be valid.
Examples of unbiased questions:
What is your height?
What food do you like best?
What color are your eyes?
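
The random sample described in this lesson (pulling assigned numbers out of a basket) can also be tried on a computer. The short Python sketch below is only an illustration and not part of the lesson's method: the roster of 80 numbered students and the function name `draw_random_sample` are made up for this example, and Python's built-in `random` module stands in for the basket.

```python
import random

def draw_random_sample(population, sample_size, seed=None):
    """Pick sample_size different members, each with an equal chance."""
    rng = random.Random(seed)  # giving a seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical roster: students numbered 1 through 80 (not real data).
roster = list(range(1, 81))
sample = draw_random_sample(roster, 20, seed=1)

print(len(sample))       # 20 students were chosen
print(len(set(sample)))  # 20, so nobody was chosen twice
```

Because every number on the roster has the same chance of being drawn, a sample picked this way is not tilted toward any one group, such as the basketball team or a circle of friends.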

FINDING THE MEAN
Once we have collected a random, unbiased sample, how do we analyze the data? There are several measures of central tendency that help us summarize the data. A measure of central tendency gives us one number that represents all the data. One of these is the mean, often called the average. The mean tells us what each number would be if the data were evened out. The mean is found by adding up all the numbers in the data set, and dividing by the number of items in the set:
mean = (sum of numbers) ÷ (number of items)

Example:
Find the mean for the high temperatures for a week in Honolulu, Hawaii: 89°, 87°, 85°, 84°, 86°, 83°, 88°
Solution:
To find the mean, we will add all the numbers, and divide by the number of temperatures. There are seven temperatures, so we will divide the total by seven.
mean = (sum of numbers) ÷ (number of items)
(89° + 87° + 85° + 84° + 86° + 83° + 88°) ÷ 7 = 602° ÷ 7     Add the numbers.
602° ÷ 7 = 86°     Divide by 7.
So, the mean high temperature for the week was 86°. We can say that the temperature in Honolulu was around 86° for the week.

Think about it!
The mean will always be between the lowest and highest number in the data set. Can you see why?

Let’s go back to our problem from the beginning of our lesson and take a look at the data Bob collected for the height of sixth graders at ABC School.

Example:
Bob measured 20 sixth graders in inches: 65, 67, 64, 68, 66, 66, 60, 72, 60, 69, 69, 65, 67, 64, 67, 67, 66, 68, 66, 66. Is his conclusion that sixth graders at ABC School are around 5½ feet (66 inches) tall correct?
Solution:
To see if Bob is correct, we will use the mean to summarize the data by adding the numbers and dividing by the number of students (20).
mean = (sum of numbers) ÷ (number of items)
65 + 67 + 64 + 68 + 66 + 66 + 60 + 72 + 60 + 69 + 69 + 65 + 67 + 64 + 67 + 67 + 66 + 68 + 66 + 66 = 1,322"     Add all the heights.
1,322" ÷ 20 = 66.1"     Divide by 20.
So, Bob’s conclusion is correct: the mean is 66.1 inches, so sixth graders at ABC School are around 66 inches (5½ feet) tall.

Let’s Review!
Before going on to the practice problems, make sure you understand the main points of this lesson.
✓ Sampling is used to draw conclusions about a population that is too large to survey in its entirety.
✓ Biased samples and questions can affect the validity of the data.
✓ The mean is one measure of central tendency and helps to summarize the data.

Match the following items.
1.1
_________ a question that leads individuals toward a certain answer
_________ a sample not representative of the entire population
_________ ways to describe or summarize data
_________ information (often numerical)
_________ the sum of a set of data divided by the number of items in the set
_________ all of the possible data in a given topic
_________ a sample in which every member of the population has an equal chance of being selected; unbiased sample
_________ a small part of a population chosen to represent the entire group
_________ the collection, organization, and analysis of numerical information
_________ a sampling of a population used to make predictions
a. central tendency
b. biased sample
c. population
d. random sample
e. statistics
f. biased question
g. sample
h. survey
i. data
j. mean

Circle each correct answer.
1.2 Which sample size of a population of 200 is most likely to give a reliable conclusion?
a. 10   b. 30   c. 50   d. 80
1.3 Which question is a biased question?
a. What is your favorite color out of red, blue, and green?
b. How old are you?
c. Do you prefer a beautiful sunny day or a depressing rainy day?
d. How many siblings do you have?
1.4 A sample of a population of 100 students is collected.
Five students playing dodge ball at recess are asked if they prefer playing outdoors or indoors. Which of the following is not true of the sample?
a. The sample is not large enough.
b. The question is biased.
c. The sample is not random.
d. The students are not part of the population.
1.5 Students in Mr. Smith’s class want to find out how far they can jump. Which of the following will help give a valid conclusion about the data?
a. Five students are chosen randomly to jump.
b. All 25 students are measured.
c. Students who want to jump are measured.
d. Students playing during recess are measured.
1.6 What is the mean for the following set of data? 12, 10, 11, 13, 14
a. 10   b. 12   c. 13   d. 14
1.7 The mean for a set of data is 10. Which of the following is the set of data with a mean of 10?
a. 9, 9, 9, 9, 9
b. 12, 11, 13, 14, 11
c. 8, 12, 10, 13, 7
d. 5, 8, 9, 6, 7
1.8 The high temperatures for each day last week increased by one degree each day: 65°, 66°, 67°, 68°, 69°, 70°, 71°. What was the mean temperature for the week?
a. 68°   b. 65°   c. 71°   d. 70°

MEDIAN, MODE, AND RANGE
The median is another measure of central tendency that helps us summarize a set of data, such as home prices. You may even have seen the phrase “median home price” and wondered what it was, or why it was important. In this lesson, we will explore the measures of central tendency and the range for a set of data.

MEDIAN: A MEASURE OF CENTRAL TENDENCY
Measures of central tendency describe an average, or center, of the data. They are measures that help us summarize numerical data and draw conclusions about the data set. You have already learned one measure of central tendency: the mean. It is found by adding all the numbers in the data set and then dividing by the number of items in the set.

The median is another measure of central tendency. It is the middle value in a set of data when the values are written in numerical order. Half the values are at the median or higher, and half are at the median or lower. Let’s find the median using some of the data from the newspaper article. Seven homes were sold last month for the following amounts, in thousands of dollars: 200, 180, 250, 200, 210, 190, and 220. First, we will place the numbers in numerical order so we can find the middle value, the median: 180, 190, 200, 200, 210, 220, 250.
To find the median, we will cross out the smallest and largest values (180 and 250), then the next smallest and next largest values (190 and 220), and so on until we get to the middle:
180, 190, 200, (200), 210, 220, 250
So, the median number for this data set is 200.

When there is an odd number of items in the set of data, there will only be one value in the middle of the list. That middle value is the median. When there is an even number of items in the set of data, there will be two values in the middle of the list. There can’t be two medians in a set of data, so in this case, the mean of the two middle numbers is the median. Take a look at this next example.

This might help!
When rewriting the values in a data set in numerical order, make sure you have the same number of values and haven’t skipped any. It may help to cross out each number from the original set as you rewrite it in numerical order.

Example:
Find the median of 120, 142, 83, 211, 187, and 99.
Solution:
First, put the numbers in numerical order from smallest to largest, making sure that all values in the set are accounted for:
83, 99, 120, 142, 187, 211
Then find the middle value.
Since there is an even number of items in this set, the median will be the mean of the middle two numbers. The middle of the list is between 120 and 142. In order to find the median, find the mean of 120 and 142:
(120 + 142) ÷ 2 = 262 ÷ 2 = 131
So the median of this set is 131.

MODE AND RANGE
The mode is the third measure of central tendency. The mode is the value that occurs most often. A data set can have no mode, one mode, or more than one mode. If we take a look at the home values again (180, 190, 200, 200, 210, 220, 250), we can see that 200 is the mode because it occurs most often (twice), while each of the other values occurs only once.

The range is usually included when discussing mean, median, and mode, even though it’s not a measure of central tendency. The range is the difference between the smallest and largest values, known as the extreme values. The range gives us an idea of how far the data is spread. The wider the range, the more data is needed for a valid conclusion about the data.

When we found the median in the data set of house values, the first two values that we crossed out were the extreme values. The smallest number was 180 and the largest number was 250. Subtracting the smallest number from the largest will tell us the range.
250 – 180 = 70
So, the range in house values is $70,000, from a low of $180,000 to a high of $250,000. We also learned that the median is $200,000 (as the article stated), and the mode is also $200,000.

Keep in mind...
When stating the range, it is helpful to mention the extreme values also to give an idea of where the data is spread. A meteorologist would say, “Tomorrow’s temperatures will range from 60 to 80 degrees.”

FINDING THE MEDIAN, MODE, AND RANGE
Let’s try a few more examples.

Example:
The students in Mr. Smith’s math class took a test worth 10 points.
Find the median, mode, and range for the test scores:
1, 2, 3, 4, 4, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10
Solution:
The data is already in numerical order, so it does not need to be sorted. To find the median, we will cross out the extreme values (1 and 10), then the next pair of extreme values (2 and 10), and so on, until we get to the middle.
1, 2, 3, 4, 4, 5, 6, 6, 7, (7, 7), 8, 8, 8, 9, 9, 9, 10, 10, 10
Median
If a data set has an even number of values, the median is the mean of the two middle values. If the two middle values are the same, the median will be that value. In this example, the two middle values are 7, so the median will also be 7.
Mode
The mode is the most frequently occurring number(s) in a set of data. Since 7, 8, 9, and 10 occur 3 times each, there are 4 modes.
Range
The extreme values are 1 (low score) and 10 (high score), so the range is 9 (10 – 1 = 9).
So the median test score is 7. There are 4 modes: 7, 8, 9, and 10. The range is 9, from a low score of 1 to a high score of 10.

Example:
In Science, Al has 6 test scores, each out of 100 points: 65, 72, 83, 91, 97, and 98. Find the median, mode, and range of his test scores.
Solution:
Again, the scores are already in numerical order, so we can pair the numbers to find the median.
Median
65, 72, 83, 91, 97, 98
The two middle scores are 83 and 91, so the median is their mean:
mean = (sum of numbers) ÷ (number of items)
(83 + 91) ÷ 2     Add the scores.
174 ÷ 2 = 87     Divide by 2.
So, the median is 87.

Make note!
The median does not have to be an actual data point. It represents the middle, where half the numbers are above it and half are below it.

Range
The extreme values are 65 (low score) and 98 (high score). The range is the difference between the two numbers.
98 – 65 = 33
So, the range is 33.
Mode
Notice that all the test scores are different, so there is no mode! This happens sometimes, and it gives us an idea that the data may be spread out since none of the data repeats.
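
The measures worked out by hand in the two examples above can be checked with a few lines of Python. This is a small sketch under stated assumptions, not the lesson's own method: the helper name `describe` is invented here, while `median` and `multimode` come from Python's standard `statistics` module (Python 3.8 or later). Following this lesson, the helper reports no mode when no value repeats.

```python
from statistics import median, multimode

def describe(data):
    """Return the median, the list of modes, and the range of a list of numbers."""
    modes = multimode(data)          # every value tied for most frequent
    if len(set(data)) == len(data):  # no value repeats, so the lesson says "no mode"
        modes = []
    data_range = max(data) - min(data)  # largest value minus smallest value
    return median(data), modes, data_range

# Scores taken from the two examples in this lesson.
quiz_scores = [1, 2, 3, 4, 4, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10]
al_scores = [65, 72, 83, 91, 97, 98]

print(describe(quiz_scores))  # (7.0, [7, 8, 9, 10], 9)
print(describe(al_scores))    # (87.0, [], 33)
```

The results agree with the hand calculations: the quiz has a median of 7, four modes, and a range of 9, and Al's scores have a median of 87, no mode, and a range of 33.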
So, the median score is 87, with no mode and a range of 33, from 65 to 98.

Let’s Review!
Before going on to the practice problems, make sure you understand the main points of this lesson.
✓ The mean, median, and mode are measures of central tendency. Along with the range, these values summarize a set of numerical data.
✓ The median is the middle number, with half of the values at or above it, and half at or below it.
✓ The mode is the value that occurs most often.
✓ The range is the difference between the extreme values.

Match the following items.
1.9
_________ the smallest and largest values in a data set
_________ the middle value of a set of data arranged in numerical order
_________ the most frequently occurring number(s) in a set of data
_________ the difference between the largest and smallest data points
a. median
b. extreme values
c. mode
d. range

Circle each correct answer.
1.10 Use the following set of data: 78, 78, 83, 85, 89, 91, 95, 98
What is the median for the set of data?
a. 85   b. 87   c. 89   d. 98
1.11 Use the following set of data: 78, 78, 83, 85, 89, 91, 95, 98
What is the mode for the set of data?
a. 78   b. 87   c. 20   d. 98
1.12 Use the following set of data: 78, 78, 83, 85, 89, 91, 95, 98
What is the range for the set of data?
a. 78   b. 87   c. 20   d. 98
1.13 Which set of data does not have a mode?
a. 1, 3, 5, 5, 6
b. 1, 1, 1, 2, 2
c. 1, 5, 5, 5, 5
d. 1, 2, 5, 6, 8
1.14 For which set of data are the median and mode the same number?
a. 5, 5, 6, 8, 9, 10
b. 3, 5, 6, 8, 8, 9
c. 4, 5, 7, 7, 8, 10
d. 4, 5, 7, 8, 9, 9

Place a check mark next to each correct answer (check all that apply).
1.15 If there are five numbers in a data set, how many modes can there be?
____ 0   ____ 1   ____ 2   ____ 3
1.16 Given the following set of data: 3, 4, 5, 6, 8, 9, 9, which measures below have a value of 6?
____ range   ____ mode   ____ high value   ____ median
1.17 If all the values in a set of data are the same, which of the following is true?
____ The range is zero.
____ The median and mode are the same.
____ The low and high values are the same.
____ The range and mode are the same.

DESCRIBING DATA
Mr. Smith’s math class takes a quiz worth ten points. The scores are summarized below using the range and the measures of central tendency.
Range: 10, from 0 to 10
Mode: 9
Median: 9
Mean: 6
Can you tell how the class scored overall? Can you make a good prediction about what the data would look like for the class? In this lesson, we will look at how the measures of central tendency are used to describe data and which measures to use to best describe a set of data.

Once we know the measures of central tendency and the range for a set of numerical data, we can summarize the information. Usually we are trying to answer a question or draw conclusions about a topic. For instance: How did Mr. Smith’s class do on the math quiz? We can describe the data in two ways. We can ask ourselves what a typical number for this data would be, or we can make a prediction about future events: If a student was absent and took the math quiz tomorrow, what score would she be likely to get?

So, what do the measures of central tendency tell us? Let’s look at the results for Mr. Smith’s math quiz to see.
Range: 10, from 0 to 10
If the data is within a small range, then it’s easy to summarize the data. If the range had been 2, say from 6 to 8, we would probably conclude that most students scored around 7. However, the range is fairly wide, indicating the data could be spread out.
Median: 9
The median tells us the halfway point of the data within the range. Since we know the range for the math quiz, we know that half of the class scored 9 or 10, and half scored 9 or less, all the way down to 0.
Mode: 9
The mode is the most common number, so it can tell us where data might be concentrated (or spread out if there are a number of modes or no mode). Since the mode is 9, this confirms that there is a lot of data around 9.
Mean: 6
The mean is useful in summarizing data if the range is narrow. Since the range for the math quiz is fairly wide, it wouldn’t be accurate to say that a typical score on the math quiz was 6, when we know there are scores from 0 to 10. However, since we know that half of the data is 9 or 10, the fact that the mean is lower than 9 tells us that there are low scores that are pulling the mean down.

So, we could say that a typical score on Mr. Smith’s math quiz was around 9, but that some students scored low. Or, we could say that if another student took the quiz, she would probably score pretty high, but there would be a chance she would score very low.

If the mode, median, and mean are close to each other, then it’s easy to summarize the data set because the data must be concentrated near these values. If all three are different, and/or the range is wide, perhaps more data is needed to draw valid conclusions.

This might help!
When summarizing data, follow these general rules:
MEASURE | GENERAL RULE
Mode    | Helpful in showing where data may be concentrated (or spread out, if there is no mode).
Median  | Helpful when used with the range; it tells us where each half of the data is. Especially useful if the median is near one of the extreme values.
Mean    | Helpful if the range is narrow.

Example:
Here are the number of hours seven students watched television in one week: 5, 6, 7, 7, 7, 8, 9. Using range and the measures of central tendency, how could we describe this set of data?
Solution:
First, we will find the measures of central tendency and the range, and use them to summarize the data.
Range: 4
The extreme values are 5 and 9: 9 – 5 = 4
Mode: 7
There are three 7’s. Every other number occurs once.
Median: 7
5, 6, 7, (7), 7, 8, 9
Mean: 7
(5 + 6 + 7 + 7 + 7 + 8 + 9) ÷ 7 = 49 ÷ 7 = 7
Since the mode, median, and mean are all 7, and the range is narrow (4), we can say that these students watched around 7 hours of TV a week. If we chose another student, we would predict that they would watch around 7 hours of TV a week, too.

OUTLIERS AND DESCRIBING DATA
Sometimes, a set of data has a data point that is far from the other data. This faraway point is called an outlier. An outlier is an unusual quantity because it doesn’t represent the data. If we know there is an outlier, we can consider the data set with it, or without it. How does an outlier affect the range and measures of central tendency? Let’s look at an example.

Think about it!
By its definition, an outlier is an extreme value, so the range will always be affected. If we remove the outlier, we have a better idea of the spread of the data.

Example:
Here is a set of data with an outlier. How does the outlier affect the range and measures of central tendency?
2, 7, 8, 8, 8, 9, 9
Solution:
The outlier is 2 because it is 5 away from the next value. So we will compare these two data sets. We will calculate each measure with and without the outlier, and compare the results.

With the outlier: 2, 7, 8, 8, 8, 9, 9
Range: 7
The extreme values are 2 and 9: 9 – 2 = 7
Mode: 8
8 occurs three times, more often than any other value.
Median: 8
2, 7, 8, (8), 8, 9, 9
Mean: about 7
There are seven numbers in the set.
(2 + 7 + 8 + 8 + 8 + 9 + 9) ÷ 7 = 51 ÷ 7 ≈ 7.3

Without the outlier: 7, 8, 8, 8, 9, 9
Range: 2
The extreme values are 7 and 9: 9 – 7 = 2
Mode: 8
8 occurs three times, more often than any other value.
Median: 8
7, 8, (8, 8), 9, 9
Mean: about 8
Now there are six numbers.
(7 + 8 + 8 + 8 + 9 + 9) ÷ 6 = 49 ÷ 6 ≈ 8.2

Think about it!
With an outlier in the data, the mean may not be a good representation of the data. In our example, the numbers (except for 2) are around 8, yet the mean is about 7 because of the outlier.

So, the mode and median did not change. However, the range and mean were affected. When we look at a data set with an outlier, we should consider it without the outlier also because it gives us a better idea of the range of most of the data, and its mean.

Let’s take another look at the results from Mr. Smith’s math quiz, with one more detail added:
Range: 10, from 0 to 10
Outlier: 0
Mode: 9
Median: 9
Mean: 6
Now it makes more sense why the mean is not close to the mode. The outlier “pulls down” the mean.

Let’s Review!
Before going on to the practice problems, make sure you understand the main points of this lesson.
✓ The mean, median, and mode are used to describe a set of numerical data.
✓ Each measure of central tendency can be useful in describing the data.
✓ An outlier will affect the mean and range but not usually the mode or median.

Place a check mark next to each correct answer (check all that apply).
1.18 Which of the following is true of an outlier?
____ An outlier is a false data point.
____ An outlier does not affect measures of central tendency.
____ An outlier is not near the main cluster of data.
____ An outlier is always the highest or lowest value in a data set.
1.19 Given the set of data below, which measure(s) will change if the outlier is removed?
1, 6, 8, 8, 8
____ range   ____ mode   ____ median   ____ mean

Answer true or false.
1.20 _______________ If two different data sets have the same mean, the one with a wider range gives a better representation of its data.

Circle each correct answer.
1.21 When is the mean most useful in describing a set of data?
a. when there is an outlier
b. when the range is wide
c. when the range is narrow
d. when there are two or more modes
1.22 What will happen to the measures of central tendency if the outlier is removed? 4, 5, 5, 6, 7, 8, 17
a. They will not change. b. The mode will change. c.
The mean will decrease. d. The mean will increase.

1.23 Which of the following is an outlier in the set of data below?
2, 12, 12, 14, 17, 18, 19
a. 2   b. 12   c. 14   d. 19

1.24 A group of sixth graders’ heights are summarized below. If another sixth grader were measured, what would be a good prediction for her height?
Range: 10, from 50 to 60; Mode: 57; Median: 55; Mean: 55
a. 60 inches   b. 57 inches   c. 50 inches   d. 55 inches

1.25 Which statement best describes the data?
1, 6, 6, 6, 8
a. The range is 6.
b. The data is concentrated around 6 and 1 is an outlier.
c. The mean is 4.5 and the range is 7.
d. Eight is an outlier.

1.26 Which set of measures would give a clear conclusion about the data?
5, 6, 7, 7, 7, 8, 9
a. Range: 5; Mode: 7; Median: 9; Mean: 8
b. Range: 25; Mode: 8; Median: 8; Mean: 8; Outlier: 3
c. Range: 8; Mode: none; Median: 6; Mean: 9
d. Range: 4; Mode: 7; Median: 7; Mean: 7

TEACHER CHECK: initials ______ date ______

Review the material in this section in preparation for the Self Test. The Self Test will check your mastery of this particular section. The items missed on this Self Test will indicate specific areas where restudy is needed for mastery.

SELF TEST 1: COLLECTING AND DESCRIBING DATA

Circle each correct answer (each answer, 7 points).

1.01 What would be the best way to get an unbiased sample that represents the population for the following topic: What is the favorite sport in Mr. Smith’s class?
a. Ask four people on your soccer team.
b. Ask, “Do you like silly soccer, or exciting football?”
c. Ask people reading at recess.
d. Randomly survey at least 20 people in the class.

1.02 How much data is needed to have a representative sample of the population?
a. 10 to 20 items.
b. At least 100 items.
c. All of the population.
d. It depends on the size of the population.

1.03 Which sample size out of a population of 1,000 is most likely to lead to a valid conclusion?
a. 50 b.
100 c. 150 d. 200

1.04 What is the mean of the set of data?
6, 7, 10, 12, 12, 13
a. 6   b. 7   c. 10   d. 11

1.05 What is the median for the set of data?
6, 7, 10, 12, 12, 13
a. 7   b. 10   c. 11   d. 12

1.06 What is the mode for the set of data?
6, 7, 10, 12, 12, 13
a. 10   b. 11   c. 12   d. 13

1.07 What is the range for the set of data?
6, 7, 10, 12, 12, 13
a. 6   b. 7   c. 11   d. 13

1.08 Which set of data has the same mode and median?
a. 3, 4, 6, 7, 8
b. 3, 3, 4, 5, 5
c. 6, 7, 7, 8, 9
d. 2, 2, 3, 4, 6

1.09 Which value in the data set is an outlier?
7, 8, 9, 10, 10, 22
a. 7   b. 9.5   c. 10   d. 22

1.010 Given the following information, which is the best description of the data?
Range: 5, from 12 to 17; Mode: 14; Median: 14.5; Mean: 15
a. The data is around 12.
b. The data is around 14.
c. The data is around 15.
d. The data is around 17.

1.011 What will happen to the mean if the outlier is removed?
4, 5, 5, 7, 9, 17
a. It will not change.
b. It will be the same as the median.
c. It will decrease.
d. It will increase.

1.012 A set of data is described as: The data is around 7. If another measurement were taken it would probably be around 7. Which group of measures would lead to this conclusion?
a. Range: 10; Mode: 7; Median: 9; Mean: 5
b. Range: 4; Mode: 6; Median: 7; Mean: 8
c. Range: 12; Mode: 9; Median: 9; Mean: 9
d. Range: 5; Mode: 9; Median: 9; Mean: 7

Place a check mark next to each correct answer (check all that apply; each answer, 8 points).

1.013 Given the following set of data, which measures have a value of 6?
3, 4, 6, 8, 9
☐ range   ☐ mode   ☐ median   ☐ mean

1.014 Given the set of data below, which measure(s) will change if the outlier is removed?
4, 7, 9, 9, 10
☐ range   ☐ mode   ☐ median   ☐ mean

SCORE ______ (80/100)   TEACHER: initials ______ date ______

MAT0602 – Apr ‘15 Printing
804 N. 2nd Ave. E., Rock Rapids, IA 51246-1759
800-622-3070
www.aop.com
ISBN 978-0-7403-3466-5