Section 3: Analyzing Data with Fathom

Summary: Teachers analyze automobile data using Fathom to describe center and spread using dot plots, box plots, and histograms. They will examine distributions of univariate data for a quantitative attribute and compare distributions when a qualitative attribute is added to separate the data by category. They will consider pedagogical issues related to the use of various graphical representations, measures of center and spread, and dynamic statistical software.

Objectives:

Mathematical: Teachers will be able to
• generate questions to explore given a data set;
• examine the distribution of a univariate data set using dot plots, box plots, and histograms, including comparing distributions;
• describe the center and spread of a data set using resistant (median and interquartile range) and nonresistant (mean and standard deviation) measures;
• develop a conceptual understanding of the usefulness of the standard deviation.

Technological: Teachers will be able to use Fathom to
• create dot plots, box plots, and histograms of univariate data;
• add a qualitative attribute to an existing graphical distribution of a quantitative attribute, both as a key legend and as a category on the y-axis;
• plot statistical measures on graphs;
• compute basic statistics in a summary table.

Pedagogical: Teachers will
• consider the advantages and disadvantages of dynamic linking capabilities and different graphical representations in Fathom;
• consider how different graphical representations and measures of center and spread can draw attention to similarities and differences when comparing data sets;
• consider the benefits and drawbacks of tasks to assist students in reasoning about data.

Prerequisites: Material discussed in Section 1 of this module

Vocabulary: univariate data, bivariate data, interquartile range, deviations, standard deviation, resistant measures, and nonresistant measures.
Technology Files: 2006_Vehicles.ftm
Emergency Technology Files: 2006_Vehicles_Part_3.ftm
Required Materials: Fathom v.2

Learning to Teach Mathematics with Technology: An Integrated Approach Page 1 DRAFT MATERIALS DO NOT DISTRIBUTE Modified 9/22/2006

Data about an observed phenomenon comes in many different forms, often frequencies, scores, codes, categories, or measurements. In addition, these different forms of data can be represented in multiple ways. While viewing data in a table may assist in examining individual cases, graphs and descriptive statistical measures may help in analyzing and characterizing trends in the whole data set, or the aggregate. Software tools have made the re-presentation of data in graphs and the calculation of statistical measures quick and easy. Thus, rather than requiring valuable time for constructing graphical displays or computing measures, software tools facilitate quick displays and computations that allow more time to be spent on analyzing the data.

In Sections 1 and 2, we used the software TinkerPlots to assist in analysis of data. In Sections 3 and 4, we will be using Fathom 2.0 (Key Curriculum Press, 2005). TinkerPlots and Fathom use a similar interface to allow users to conduct data analysis. TinkerPlots was designed to encourage users to create graphical displays by implementing a series of actions, while Fathom allows users to easily create a variety of standard graphical displays with fewer actions. While TinkerPlots has the capability to display measures of center on a graph, Fathom includes a whole suite of tools that allow users to compute descriptive and inferential statistics. Thus, Fathom is a much more powerful statistical tool, while TinkerPlots is a powerful tool for analyzing data in graphical form.
Like TinkerPlots, Fathom was created to allow users to have dynamic control over data, meaning that as you change things in a document, everything linked to what you are changing will update while you drag. This linking between tabular data, graphical representations, and statistical measures can be a powerful tool for exploring data in meaningful ways. We will start this Section by exploring univariate data (a single attribute in a data set) and will use what we learn with univariate data to explore bivariate data (two attributes in a data set).

Part 1: Asking Questions from Data

Increases in gas prices over the past several years may be one contributing factor to many automobile manufacturers' focus on improving vehicle miles per gallon (mpg) performance and on developing alternative types of engines that use a combination of electricity and gasoline. Many people in America have also reconsidered the type of vehicle they own, especially families who have longer commutes to the workplace. To help us become more informed about the variety of vehicles on the market today, we have assembled a collection of 41 vehicles manufactured in 2006. Most of the vehicles (30) were rated as the top fuel economy leaders in the most popular vehicle classes. These data are shown in the 2006 Vehicle Data table below.

Although a typical cycle of data analysis starts with forming questions and then collecting data to answer the question, textbooks and teachers often use pre-collected data sets with their students to provide an immediate springboard for exploring a phenomenon and to begin analyzing data.
When students are presented with a given data set, they need to learn how to examine the data and formulate specific questions that can be answered knowing the various quantitative and qualitative variables (called attributes in Fathom) available about each case.

FOCUS ON MATHEMATICS
M-Q1. Review the data in the table. Generate at least four different questions that you could explore by analyzing this data set.

FOCUS ON PEDAGOGY
P-Q1. Describe two classroom situations, one for which it would be beneficial to use a pre-collected set of data, and one for which students should be collecting data themselves. Provide a rationale for the benefits in each situation.

2006 Vehicle Data

Mfr         Model              Class    Trans   City  Hwy  AnnFuel  Engine    Weight
Chevrolet   Cargo Van          Van      Auto    15    20   1940     Standard  4894
Chevrolet   Passenger Van      Van      Auto    15    19   1940     Standard  5295
Ford        Escape Fwd         Suv      Manual  24    29   1270     Standard  3180
Ford        Escape Hybrid Fwd  Suv      Auto    36    31   1000     Hybrid    3627
Ford        Focus Wagon        Wagon    Auto    26    32   1178     Standard  2775
Ford        Focus Wagon        Wagon    Manual  26    34   1138     Standard  2771
Ford        Ranger Pickup      Truck    Auto    21    26   1436     Standard  3028
Ford        Ranger Pickup      Truck    Manual  24    29   1270     Standard  3028
Gmc         Savana Cargo Van   Van      Auto    15    20   1940     Standard  4894
Gmc         Savana Passen Van  Van      Auto    15    19   1940     Standard  5295
Gmc         Sierra Hybrid 2wd  Truck    Auto    18    21   1736     Hybrid    5038
Gmc         Sierra Hybrid 4wd  Truck    Auto    17    19   1835     Hybrid    5357
Honda       Accord             Sedan    Auto    24    34   1178     Standard  3168
Honda       Accord Hybrid      Sedan    Auto    25    34   1176     Hybrid    3589
Honda       Civic Hybrid       Compact  Auto    49    51   660      Hybrid    2875
Honda       Insight            Compact  Auto    57    56   591      Hybrid    1881
Honda       Insight            Compact  Manual  60    66   525      Hybrid    1850
Honda       Odyssey            Minivan  Auto    20    28   1436     Standard  4475
Hyundai     Elantra            Sedan    Manual  27    34   1099     Standard  2784
Hyundai     Sonata             Sedan    Auto    24    33   1221     Standard  3266
Hyundai     Sonata             Sedan    Manual  24    34   1178     Standard  3253
Isuzu       Ascender 4wd       Suv      Auto    22    26   1338     Diesel    4954
Jeep        Liberty 4wd        Suv      Auto    22    26   1338     Diesel    4011
Lexus       Rx 330 4wd         Suv      Auto    18    24   1800     Standard  4065
Lexus       Rx 400h 4wd        Suv      Auto    31    27   1138     Hybrid    4365
Mazda       B2300 2wd          Truck    Auto    21    26   1436     Standard  2994
Mazda       B2300 2wd          Truck    Manual  24    29   1270     Standard  2994
Mazda       Tribute 2wd        Suv      Manual  24    29   1270     Standard  3192
Merc-Benz   E320 Cdi           Sedan    Auto    27    37   1024     Diesel    3835
Mini        Mini Cooper        Compact  Auto    26    34   1242     Standard  2557
Mini        Mini Cooper        Compact  Manual  28    36   1242     Standard  2425
Pontiac     Vibe               Wagon    Manual  30    36   1000     Standard  2700
Saturn      Ion                Compact  Manual  37    44   769      Diesel    2752
Suzuki      Aerio Awd          Compact  Auto    35    42   809      Diesel    2859
Toyota      Corolla Matrix     Wagon    Manual  30    36   1000     Standard  2679
Toyota      Prius              Sedan    Auto    60    51   601      Hybrid    2890
Toyota      Scion Xb           Wagon    Auto    30    34   1066     Standard  2470
Toyota      Tacoma 2wd         Truck    Auto    21    26   1436     Standard  3180
Volkswagen  Golf               Compact  Manual  37    44   769      Diesel    2972
Volkswagen  New Beetle         Compact  Auto    35    42   809      Diesel    2965
Volkswagen  New Beetle         Compact  Manual  37    44   769      Diesel    2884

Mfr: Manufacturer
Model: Model name
Class: Vehicle classes used to classify by passenger and cargo volume (cars) and gross vehicle weight rating (trucks).
Trans: Either Automatic or Manual transmission
City: Estimated mpg in city driving
Hwy: Estimated mpg in highway driving
AnnFuel: Estimated annual fuel cost assuming 15,000 miles per year (55% city and 45% hwy) and average fuel price
Engine: Standard (accepts unleaded gas), Diesel (accepts diesel), or Hybrid (runs part on electricity and part on unleaded fuel)
Weight: Weight of vehicle, including standard equipment and all fluids, but no passengers, cargo, or optional equipment

Data retrieved from 2006 Fuel Economy Guide http://www.fueleconomy.gov/feg/download.shtml

Part 2: Examining Univariate Distributions

To explore the vehicle data using Fathom, open the 2006_Vehicles.ftm file. When you open the file, you should see one icon: the collection icon 2006 Vehicles.

Double clicking on the collection icon opens the inspect collection window, which provides a view of the values of the attributes for each case (shown in Figure 3.1). The name of each attribute in the data set is listed in pink, with one attribute per row. The inspection window contains 41 data cards, one for each of the cases in the data set. The data cards are useful for examining each individual case. However, to analyze the whole data set, it is helpful to view the data set in a table.

Tech Tip: Different cases can be viewed in the inspection window by clicking the right arrow in the bottom left corner of the window. The number 41 indicates that there are a total of 41 cases in the collection.

Figure 3.1

To view a collection of data as a table:
1. Click on the Collection icon to select the 2006 Vehicles collection.
2. From the object shelf, drag and drop a New Case Table into the document.
3. Click and drag a corner of the case table to resize it.

Tech Tip: If the Case Table does not show the data, drag and drop the name of the collection onto the body of the case table.

Figure 3.2

Figure 3.3 (case table for the 2006 Vehicles collection; the first 14 of the 41 cases are visible)

The first question we are going to examine about the 2006 vehicle data set is, "How do these automobiles typically perform in their gas mileage when driving in the city?" In order to answer this question, we need a measurable attribute of the automobiles that can be used to characterize performance in gas mileage when driving in the city. The attribute that provides a measure of this characteristic is City, which gives the estimated mpg reported by the US Environmental Protection Agency based on their lab testing.

When asking questions about a phenomenon, students may have difficulty determining how to collect a measurable attribute that can be used to answer the question.
This same difficulty can occur when students have access to a pre-collected data set and want to ask questions about the phenomenon. They may ask questions that no quantitative or qualitative attribute in the data can help answer.

To answer our question, it would be useful to view the distribution of the City mpg graphically. To construct graphs in Fathom, a user must place an attribute on a given axis. This action will populate the graph with the data associated with this attribute. The purposeful placement of an attribute onto an axis can help students connect the numerical data to the graphical representation. The default graph in Fathom is a dot plot.

To view data graphically:
1. Click and drag the Graph object from the object shelf. The graph will be blank.
2. Click and drag the attribute label (City) in the Case Table and drop it onto the x-axis in the graph where it reads "Drop an attribute here".

Tech Tip: You can change the scale of the axis by clicking and dragging the axis. When the hand is vertical, dragging will translate the axis. When the hand is horizontal, dragging will dilate the scale.

Figure 3.4

We currently have three representations of our data set: 1) the collection (shown as cards in the inspection window), 2) a case table, and 3) a dot plot. These representations of data are linked together. This allows a user to locate a case across multiple representations. In addition, a change to the data in one representation is automatically reflected in all representations of the data.

To change a data value:
1. From the case table, click on the row number for a case (e.g., to choose the Ford Ranger Pickup, click on the number 7 to highlight that case row).
2. To change the data value graphically, click on the red data icon and drag it to the left or right. Notice the change in the corresponding numerical value in the table.

Tech Tip: You can undo a few changes by selecting the Undo command (ctrl-z) from the Edit menu.
Since the 2006 vehicle data should be a fixed data set, we need to revert the data to its original values. In Fathom, a data icon in a graph can be dragged to change its value; however, it is possible to prevent a user from changing the data value. In the case of the 2006 vehicle data, this would be wise.

To revert a collection:
1. Select the 2006 Vehicles collection object.
2. From the File menu, choose Revert Collection.

To prevent changes in a collection by dragging data icons:
1. Select any of the open objects (e.g., Collection, Table, Graph) in the workspace.
2. Under the Collection menu, choose Prevent Changing Values in Graphs.

Although we want to keep the data set fixed, we can still take advantage of the linking between the case table and the graph to answer a few questions about the vehicles' performance for City mpg. The linking of these representations allows students to explore individual cases while also considering each case in relation to the entire aggregate. Since many students initially are interested in and focus on individual cases, it can be helpful to ask questions about individual cases that also allow students to consider the relative position of these cases to the aggregate.

FOCUS ON MATHEMATICS
M-Q2. By clicking on the data icons on the graph, find which vehicles are at the low and high ends of the distribution.
M-Q3. The Volkswagen New Beetle with Automatic transmission is a trendy favorite for many Americans. By clicking on the case row for this vehicle in the case table, use the graph to describe the New Beetle's standing in City mpg relative to the other vehicles.
M-Q4. There appears to be a cluster of 4 vehicles with a City mpg above 45.
Clicking and dragging a selection box around those data icons will highlight the vehicles in the case table. Examine these 4 cases carefully. List two or three attributes these vehicles have in common.

FOCUS ON PEDAGOGY
P-Q2. What are the advantages and disadvantages of having the representations dynamically linked when working with a data set?
P-Q3. The linking of multiple representations in software like Fathom allows one to simultaneously view the distribution of an entire data set while focusing on individual cases. How might this feature help or hinder students' analysis of the data?

Two other graphical representations often used to display quantitative attributes of univariate data are histograms and box plots (also called box-and-whisker plots). Viewing the data in these different representations may illuminate or obscure different aspects of the distribution. Drag down two more empty Graph objects into the workspace and drag and drop the City attribute onto the x-axis of each graph. To assist in comparing the three different representations, we are going to change one graph to be a box plot and one to be a histogram.

To create a box plot:
1. From the drop down menu in the top right corner of the graph window, select the Box Plot option.

To create a histogram:
1. From the drop down menu in the top right corner of the graph window, select the Histogram option.

Figure 3.5

To adjust the bin width in a histogram:
1. Point to a vertical boundary for one bar in the histogram. The cursor will change to a double arrowed line.
2. Either click and drag to adjust the bin width dynamically, or double click and enter a value for the binAlignment and binWidth (see Figure 3.6; in our example we can start the first bin at 15 and use a width of 5).

Figure 3.6

The distribution of City mpg is shown in Figure 3.7 as a dot plot, box plot, and histogram. If you click on a case or select a range of cases in any one of the graphs, the corresponding cases will also be highlighted in the other graphs.

Figure 3.7

FOCUS ON MATHEMATICS
M-Q5. Compare the representation of the City data in the three graphs in Figure 3.7. What characteristics of the distribution are more noticeable or are hidden in each representation?
M-Q6. By only examining the graphs, what would you characterize as a typical City mpg for these automobiles?

FOCUS ON PEDAGOGY
P-Q4. How can examining a distribution using three different linked graphical representations be a help or hindrance for students?
P-Q5. How could students use the box plot to describe the center and spread of the City mpg?
P-Q6. Describe how you could help students understand why the median is not located in the center of the middle 50% of the data.

Although the median is displayed in the box plot, it may be helpful to display the location of the median and mean on the graphs. Overlaying a statistical measure on a graphical representation can provide students with a visual way of conceptualizing the location of the measure in relationship to the entire aggregate. This can help students better understand how the value of the measure represents the entire data set and how its location is related to the distribution of data values.
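As a cross-check outside Fathom, the following minimal Python sketch recomputes what the histogram and the two measures of center summarize. It assumes the 41 City values are transcribed in row order from the 2006 Vehicle Data table; the variable names and the helper bin_counts are illustrative and are not part of Fathom. The bins start at 15 with a width of 5, matching the histogram settings suggested above.

```python
from collections import Counter
from statistics import mean, median

# City mpg for the 41 vehicles, transcribed in table order
# (assumed to match the values in the Fathom file).
city = [15, 15, 24, 36, 26, 26, 21, 24, 15, 15, 18, 17, 24, 25, 49, 57, 60,
        20, 27, 24, 24, 22, 22, 18, 31, 21, 24, 24, 27, 26, 28, 30, 37, 35,
        30, 60, 30, 21, 37, 35, 37]


def bin_counts(values, start=15, width=5):
    """Count values per bin [start + k*width, start + (k+1)*width).

    Returns a dict keyed by the left edge of each non-empty bin.
    """
    counts = Counter((v - start) // width for v in values)
    return {start + k * width: counts[k] for k in sorted(counts)}


city_mean = mean(city)
city_median = median(city)
city_bins = bin_counts(city)

print(f"mean   = {city_mean:.2f}")
print(f"median = {city_median}")
# A rough text histogram: one asterisk per vehicle in each non-empty bin.
for left_edge, count in city_bins.items():
    print(f"[{left_edge}, {left_edge + 5}): {'*' * count}")
```

On these values the median is 25 and the mean is a little above 28. The few values above 45 pull the mean upward but leave the median unchanged, which is worth keeping in mind when comparing the two measures on the graphs.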
To add a vertical line representing a measure to a graph:
1. With the graph window selected, choose the Graph menu and select the Plot Value option.
2. A formula editor window will appear. In the textbox to the right of "Value=" type in the function to compute the statistical measure. For our example, we will want to use mean(City) and median(City).

Tech Tip: When typing formulas in the Formula Editor, if Fathom recognizes the function, the text turns blue. If the name of an attribute is recognized as one in the data set, the text turns pink.

Figure 3.8

You can add the mean and median measures to each of the three graphs. Figure 3.9 displays both measures overlaid on the dot plot.

Figure 3.9

FOCUS ON MATHEMATICS
M-Q7. Does either measure of center, the mean or the median, best represent a typical City mpg for these automobiles? Defend your choice or provide an alternative way of representing the typical City mpg.

Part 3: Comparing Distributions Using Center and Spread [1]

Thus far, we have explored the City mpg for the entire aggregate of vehicles. It is apparent from our analysis that some types of vehicles may have better City mpg than others. In particular, we previously noticed that the four cases considered as outliers all had Hybrid engines. Our data set contains vehicles of three different Engine types: Standard, Diesel, and Hybrid. When students make an observation like this about a data set, it often prompts them to explore a new question.
This is an important feature of exploratory data analysis (EDA): analysis of data leads to more questions, which lead to further analysis.

Consider the following question: Which type of engine gives vehicles the best fuel economy in the city? To examine this question, we need to use two attributes in the data set: City mpg and Engine type. We now have a question that requires bivariate data with one quantitative attribute (City) and one qualitative attribute (Engine). Having students examine one quantitative and one qualitative attribute together in a data set can provide a transition into working with bivariate data (two attributes) to answer a question.

One way to begin examining the data with attention to the two attributes is to overlay the qualitative attribute on top of the dot plot of the distribution of the City mpg. This action will recolor the icons according to the categories of the qualitative attribute and display a legend explaining the coloring.

To overlay a legend attribute on a graph:
1. Click and drag the name of an attribute from the case table and point to the interior of the plot window. Directions will appear as shown in Figure 3.10. You only need to use the Shift or Ctrl keys if it is not clear which type of attribute you are dragging, or if you want to purposely use an attribute in a specific way (e.g., if the categories of a qualitative attribute have been entered using numeric codes such as 1, 2, 3, you may have to use the Shift key to force Fathom to recognize the data as categorical).
2. Release the mouse and notice the appearance of the legend and that different shapes and colors are represented (see Figure 3.11). If the legend attribute is qualitative, shapes and colors will be used; if the attribute is quantitative, a color gradient will appear (we will explore this in a later section).

Figure 3.10

Figure 3.11

[1] The technology file "2006_Vehicles_Part_3.ftm" is available for students to use for Part 3 if they were unable to complete Part 2 with the technology.

FOCUS ON MATHEMATICS
M-Q8. Viewing Figure 3.11, what can you say about the City mpg for each of the three Engine types?

FOCUS ON PEDAGOGY
P-Q7. How can overlaying a categorical (qualitative) attribute on a dot plot of a numerical (quantitative) attribute influence students' ability to examine data?

The graph in Figure 3.11 is a good way for students to begin to coordinate two attributes in a data set, and thus is a first step in learning to conduct bivariate data analysis where one variable is quantitative and the other is qualitative. In Fathom, students can also place the qualitative attribute on the y-axis and separate the data into distinct categories. In our example, we can drag and drop the attribute Engine onto the y-axis. This will allow us to view the distribution of City mpg for each engine type separately (Figure 3.12).

Tech Tip: To remove a legend attribute from a graph, click on the plot window and from the Graph menu, select Remove Legend Attribute.

Figure 3.12

FOCUS ON MATHEMATICS
M-Q9. What similarities and differences do you notice about the distributions of City mpg for each of the Engine types?
M-Q10. Examine the location of the mean and median in the three distributions. Explain the relative location of the mean and median to each other in the three distributions.
Although dot plots are useful, changing the graphical representation to another form may highlight different aspects of the distribution. Change the graphical display from a dot plot to a box plot (see Figure 3.13).

Figure 3.13

FOCUS ON MATHEMATICS
M-Q11. What characteristics of the distributions beyond the measures of center are highlighted when viewed as box plots?

FOCUS ON PEDAGOGY
P-Q8. How can examining the statistical measures of mean and median along with the dot plot or box plot display of the distribution for each engine type assist students in reasoning about center and spread when comparing the three groups?
P-Q9. How could you use the data to help students understand why in each of the three box plots in Figure 3.13 the whiskers are not the same length?

In addition to comparing distributions graphically and displaying measures on a graph, it is also helpful to use technology to compute and display the exact values of several statistical measures. A summary table is useful in computing these statistics.

To create a Summary Table with several statistical measures:
1. Drag down an empty summary object.
2. Click and drag a quantitative attribute (City mpg) to the summary table. Once the cursor is over the summary table, a down arrow and a right arrow appear (Figure 3.14). Drop the quantitative attribute below the down arrow.
3. By default, the measure computed and displayed is the mean. There are three ways to add more measures. From the Summary menu, you could select Add Formula, Add Basic Statistics, or Add Five-Number Summary. For our example, choose Add Five-Number Summary. You will likely have to resize the Summary table window.
4. You can also add a qualitative attribute to the Summary table to recompute the statistics for each separate category. In our example, we want to drag and drop the attribute Engine next to the right arrow (Figure 3.15). Again, you will likely have to resize the window to view the statistical measures for each category.

Figure 3.14

Figure 3.15

Figure 3.16

Now we have two powerful tools to help us analyze and compare the distributions of City mpg for the different Engine types. We can change the graphical display to show dot plots, box plots, or histograms, or use the Summary Table to compute additional statistical measures.

FOCUS ON MATHEMATICS
M-Q12. Use the graphical displays and the statistical measures to compare the distributions of the City mpg for the three Engine types. Which type of engine gives vehicles the best fuel economy in the city? Justify your reasoning.

FOCUS ON PEDAGOGY
P-Q10. What are some of the key features of this vehicle data set that make it useful in helping students attend to important ideas of center and spread when comparing data sets?

Asking students to compare distributions has been shown to be a useful technique for helping students transition from considering data as individual cases to paying attention to data as an aggregate. In addition, tasks that ask students to compare distributions can help them consider characteristics such as shape and spread as useful complements to measures of center.
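For readers who want to see what the Summary Table does when a qualitative attribute is dropped next to the right arrow, here is a minimal Python sketch of the same split-then-summarize idea. It assumes the City and Engine values are transcribed in row order from the 2006 Vehicle Data table; the one-letter engine codes and the helper summarize_by_group are illustrative only, and the standard deviation is the sample version (dividing by n - 1).

```python
from statistics import mean, median, stdev

# City mpg and Engine type for the 41 vehicles, in table order
# (assumed to match the values in the Fathom file).
city = [15, 15, 24, 36, 26, 26, 21, 24, 15, 15, 18, 17, 24, 25, 49, 57, 60,
        20, 27, 24, 24, 22, 22, 18, 31, 21, 24, 24, 27, 26, 28, 30, 37, 35,
        30, 60, 30, 21, 37, 35, 37]

# One letter per vehicle: S = Standard, H = Hybrid, D = Diesel.
engine_codes = ("SSSHSSSSSS" "HHSHHHHSSS" "SDDSHSSSDS" "SSDDSHSSDDD")
names = {"S": "Standard", "H": "Hybrid", "D": "Diesel"}
engine = [names[code] for code in engine_codes]


def summarize_by_group(values, groups):
    """Return {group: (count, mean, median, sample standard deviation)}."""
    grouped = {}
    for value, group in zip(values, groups):
        grouped.setdefault(group, []).append(value)
    return {g: (len(v), mean(v), median(v), stdev(v))
            for g, v in grouped.items()}


summary = summarize_by_group(city, engine)
for group, (n, avg, med, sd) in sorted(summary.items()):
    print(f"{group:<9} n={n:2d}  mean={avg:5.2f}  median={med:4.1f}  s={sd:5.2f}")
```

Each printed row corresponds to one column of the Fathom summary table after Engine is added. Comparing the printed means and medians within each group is one way to approach M-Q10 and M-Q12 numerically.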
Part 4: Understanding Spread of a Distribution

Pedagogy Tip: A detailed discussion of the IQR can be found in Section 1, Part 4.

When representing data in a box plot, students can focus on the median as a measure of center and the interquartile range (IQR) as a measure of the spread of the middle 50% of the data, represented as the "box". Thus, the IQR can help describe the spread of a data set and is useful to consider in concert with the median as a measure of center.

When we use means to compare centers, it does not make sense to use interquartile ranges, which are computed using medians, to analyze spread. Rather, a different measure of spread, the standard deviation, is often used. This measure of spread takes into consideration how each data point deviates from the mean.

Consider the diagram in Figure 3.17. There are five data points shown with values {3, 5, 11, 12, 14}. The vertical red line represents the location of the mean, which has a value of 9. From each data point, there is a horizontal black line from that point to the mean, representing how much the value of that point deviates from the mean. There are five values for the deviations {-6, -4, +2, +3, +5}. Notice that the sum of the deviations from the mean is zero.

Figure 3.17

The standard deviation is a way of describing how the data points typically deviate from the mean. However, since some of the deviation values are positive while others are negative, it is not helpful to simply find the sum or the mean of these deviations. One method that can be used to eliminate the negative deviations is to square each deviation. Once deviations from the mean are squared, their sum will no longer be zero.
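The arithmetic behind Figure 3.17 can be traced with a short Python sketch. It uses only the five data points given above; the variable names are illustrative. The last lines carry the calculation through to a standard deviation by averaging the squared deviations and taking a square root, showing both common divisors (n - 1 and n) and checking each against Python's standard library.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

data = [3, 5, 11, 12, 14]   # the five points in Figure 3.17
center = mean(data)          # the mean, 9

# Signed distance of each point from the mean.
deviations = [x - center for x in data]
# Squaring removes the signs, so the values no longer cancel.
squared = [d ** 2 for d in deviations]

print("deviations:        ", deviations, "sum =", sum(deviations))
print("squared deviations:", squared, "sum =", sum(squared))

# Average the squared deviations, then take the square root to return
# to the original units. Two divisors are in common use.
sample_sd = sqrt(sum(squared) / (len(data) - 1))   # divide by n - 1
population_sd = sqrt(sum(squared) / len(data))     # divide by n

print(f"sample standard deviation     = {sample_sd:.3f}")
print(f"population standard deviation = {population_sd:.3f}")

# The hand computation agrees with the library functions.
assert abs(sample_sd - stdev(data)) < 1e-12
assert abs(population_sd - pstdev(data)) < 1e-12
```

The deviations sum to zero while the squared deviations sum to 90, which is why the squares, rather than the raw deviations, are used to build a measure of spread.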
The squared deviations are represented as the areas of the gray squares in the diagram, with values {36, 16, 4, 9, 25}.

Two common measures that are used for describing the spread or dispersion of data around the mean are variance and standard deviation, both of which are based on the mean of the squared deviations. The variance is essentially the mean of the squared deviations: it is found by dividing the sum of the squared deviations by n (if you are working with the entire population) or n-1 (if you are working with a sample); see footnote 2. In order to have a measure of spread that is on the same scale as the original data, we take the square root of the variance. The result, which is expressed in the same units as the data, is called the standard deviation. By default, Fathom will compute standard deviations and variances based on a sample. However, there are formulas in Fathom that can be used to compute these measures based on a population if so desired.

The median and interquartile range are considered resistant measures because they are based on the rank order of the data rather than on the numerical value of every data point. Therefore, they are not strongly influenced by outliers. The mean and the standard deviation are considered nonresistant measures because they are based on the numerical value of each data point. Therefore, a numerical value well outside of the range of most of the data will affect each of these measures.

FOCUS ON MATHEMATICS

M-Q13. What does the magnitude of the standard deviation tell you about the dispersion of the data points in relationship to the mean?

M-Q14. Consider the following formulas for computing the variance ($s^2$) and standard deviation ($s$) for data in a sample of size $n$, where $\bar{x}$ represents the mean and $x_i$ is the $i$th data value.
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \qquad\qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

Explain what each part of the formula represents with respect to the diagram in Figure 3.17 and the explanation above.

M-Q15. Explain why the 2006 Vehicle data are considered a sample rather than a population.

Footnote 2: When finding the variance and standard deviation of a population, we divide by n. However, most data sets are a sample of the population. If we compute the variance for a sample in the same way that we compute the variance of a population, we will have a biased estimator of the population variance. That is, if we took all possible samples of n members and calculated the variance by dividing by n and took the mean of those variances, this value would not be equal to the true value of the population variance. Fortunately the correction for this bias is remarkably simple. To correct for this bias, we divide by n-1 rather than n when we have a sample.

M-Q16. Consider the distributions and location of the mean City mpg for each of the three Engine types. Which engine type do you predict will have the largest standard deviation? The smallest? Explain your reasoning based on how the data values deviate from the mean for each Engine type.

M-Q17. Use a summary table to find the value of the standard deviation of the City mpg for each of the three Engine types. What do these values tell you about the spread of the City mpg? Do the calculations match your predictions?

FOCUS ON PEDAGOGY

P-Q11. Students are often introduced to the standard deviation through instruction on how to compute its value based on the formulas shown in M-Q14.
What is the benefit of using a diagram such as the one in Figure 3.17 to help students conceptualize standard deviation as a measure that describes typical deviation from the mean?

P-Q12. What are the advantages or drawbacks of having students examine several distributions with the means indicated, as in M-Q16, and asking them to predict the magnitude of a standard deviation before using Fathom to compute the exact values?

SUGGESTED ASSIGNMENTS

H-Q1. (Mathematical) Use Fathom to create graphical displays and compute statistical measures to compare the distributions of the Highway mpg for the three Engine types. Which type of engine gives vehicles the best fuel economy on the highway? Justify your reasoning.

H-Q2. (Mathematical and Pedagogical) The mean absolute deviation is often introduced in middle school as an introductory measure of spread. While the mean absolute deviation is easy to compute, the behavior of the absolute value function makes it a more difficult measure to use in more complex statistical analyses, and it is therefore used infrequently in high school and college. Instead of squaring to eliminate the negative deviations, the mean absolute deviation is computed by finding the absolute value of each deviation from the mean and then finding the mean of these values. Consider the collection of 9 cases with a mean of 5 shown in the table and dot plot below.

[Table and dot plot of the 9 cases]

a) What is the value of the mean absolute deviation (MAD) for this data set?
b) What does the value of the MAD indicate about the spread of the data?
c) How would you need to change the values in the data set so that the mean remains 5 but the MAD increases to 24/9?
d) Describe the benefits and drawbacks of using the mean absolute deviation and the benefits and drawbacks of using the standard deviation with middle and/or high school mathematics students.

H-Q3. (Pedagogical) Compare the pedagogical benefits and drawbacks of using Fathom and TinkerPlots to explore univariate data with respect to the following points:
• The organization of data in a collection
• The linking of representations
• The representations available and the construction of graphs
• Use of color
• The ability to display measures on a graph
• Calculation of measures

H-Q4. (Pedagogical) When is it advantageous to use the median and interquartile range as summary measures? Mean and standard deviation? When examining a distribution, how can you assist students in deciding if resistant or nonresistant measures are appropriate?
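As a supplement to Part 4, the difference between dividing by n-1 (sample) and by n (population) can be seen on the five values from Figure 3.17. The sketch below uses Python's standard `statistics` module rather than Fathom, so it is only an independent check of the arithmetic and says nothing about how Fathom names or implements its own formulas.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [3, 5, 11, 12, 14]  # the five values from Figure 3.17

# Sample versions divide the sum of squared deviations by n - 1.
sample_variance = variance(data)
sample_sd = stdev(data)

# Population versions divide the same sum by n.
population_variance = pvariance(data)
population_sd = pstdev(data)

print("sample variance:", sample_variance)
print("sample standard deviation:", round(sample_sd, 3))
print("population variance:", population_variance)
print("population standard deviation:", round(population_sd, 3))
```

Because the sum of squared deviations is the same in both cases, the sample versions are always the larger of the two, and the gap shrinks as n grows.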