Statway™
A statistics pathway for college students

Module 1: Statistical Studies and Overview of the Data Analysis Process
Module 2: Summarizing Data Graphically and Numerically
Module 3: Reasoning About Bivariate Numerical Data—Linear Relationships
Module 4: Modeling Nonlinear Relationships
Module 5: Reasoning About Bivariate Categorical Data and Introduction to Probability
Module 6: Formalizing Probability and Probability Distributions
Module 7: Linking Probability to Statistical Inference
Module 8: Inference for One Proportion
Module 9: Inference for Two Proportions
Module 10: Inference for Means
Module 11: Chi-Squared Tests
Module 12: Other Mathematical Content

Version 1.0
A resource from The Charles A. Dana Center at The University of Texas at Austin
July 2011

Frontmatter
Statway—Full Version 1.0, July 2011

Unless otherwise indicated, the materials found in this resource are Copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin.

Outside the license described below, no part of this resource shall be reproduced, stored in a retrieval system, or transmitted by any means—electronically, mechanically, or via photocopying, recording, or otherwise, including via methods yet to be invented—without express written permission from the Foundation and the University.

The original version of this work was created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. STATWAY™ / Statway™ is a trademark of the Carnegie Foundation for the Advancement of Teaching.

*** This copyright notice is intended to prohibit unlicensed commercial use of the Statway materials.

License for use
Statway Version 1.0, developed by the Charles A.
Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported license. To view the details of this license, see creativecommons.org/licenses/by-nc-sa/3.0. In general, under this license You are free: to Share—to copy, distribute, and transmit the work to Remix—to adapt the work Under the following conditions: Attribution—You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). We request you attribute the work thus: The original version of this work was developed by the Charles A. Dana Center at the University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. This work is used (or adapted) under the Creative Commons Attribution-NonCommercialShareAlike 3.0 Unported (CC BY-NC-SA 3.0) license: creativecommons.org/licenses/by-nc-sa/3.0. For more information about Carnegie’s work on Statway, see www.carnegiefoundation.org/statway; for information on the Dana Center’s work on The New Mathways Project, see www.utdanacenter.org/mathways. Noncommercial—You may not use this work for commercial purposes. Share Alike—If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one. The Charles A. Dana Center at the University of Texas at Austin, as well as the authors and editors, assume no liability for any loss or damage resulting from the use of this resource. We have made extensive efforts to ensure the accuracy of the information in this resource, to provide proper acknowledgement of original sources, and to otherwise comply with copyright law. If you find an error or you believe we have failed to provide proper acknowledgment, please contact us at dana-txshop@utlists.utexas.edu. 
The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways.

The Charles A. Dana Center
The University of Texas at Austin
1616 Guadalupe Street, Suite 3.206
Austin, TX 78701-1222
Fax: 512-232-1855
dana-txshop@utlists.utexas.edu
www.utdanacenter.org

The Carnegie Foundation for the Advancement of Teaching
51 Vista Lane
Stanford, California 94305
Phone: 650-566-5110
pathways@carnegiefoundation.org
www.carnegiefoundation.org

About the development of this resource
The content for this full version of Statway was developed under a November 30, 2010, agreement by a team of faculty authors and reviewers contracted and managed by the Charles A. Dana Center at the University of Texas at Austin with funding from the Carnegie Foundation for the Advancement of Teaching. This resource was produced in Microsoft Word 2008 and 2011 for the Mac. The content of these 12 modules was developed and produced (that is, written, reviewed, edited, and laid out) by the Charles A. Dana Center at The University of Texas at Austin and delivered by the Dana Center to the Carnegie Foundation for the Advancement of Teaching on June 30, 2011.

Some issues to be aware of:
• PDF files need to be viewed with Adobe Acrobat for full functionality. If viewed through Preview, which is the default on some computers, the URLs may not be correct.
• The file names indicate the lesson number and whether the document is the instructor or student version or the out-of-class experience. The Dana Center is engaged in a process of revising and improving these materials to create the Dana Center Statistics Pathway. We welcome feedback from the community as part of our course revision process. If you would like to discuss these materials or learn more about the Dana Center’s plans for this course, contact us at mathways@austin.utexas.edu. About the Charles A. Dana Center at The University of Texas at Austin The Dana Center collaborates with local and national entities to improve education systems so that they foster opportunity for all students, particularly in mathematics and science. We are dedicated to nurturing students’ intellectual passions and ensuring that every student leaves school prepared for success in postsecondary education and the contemporary workplace—and for active participation in our modern democracy. The Center was founded in 1991 in the College of Natural Sciences at The University of Texas at Austin. Our original purpose—which continues in our work today—was to raise student achievement in K–16 mathematics and science, especially for historically underserved populations. We carry out our work by supporting high standards and building system capacity; collaborating with key state and national organizations to address emerging issues; creating and delivering professional supports for educators and education leaders; and writing and publishing education resources, including student supports. Our staff of more than 80 researchers and education professionals has worked intensively with dozens of school systems in nearly 20 states and with 90 percent of Texas’s more than 1,000 school districts. As one of the College’s largest research units, the Dana Center works to further the university’s mission of achieving excellence in education, research, and public service. 
We are committed to ensuring that the accident of where a student attends school does not limit the academic opportunities he or she can pursue.

For more information about the Dana Center and our programs and resources, see our homepage at www.utdanacenter.org. To access our resources (many of them free) please see our products index at www.utdanacenter.org/products. To learn about Dana Center professional development sessions, see our professional development site at www.utdanacenter.org/pd.

Acknowledgments
The original version of this work was created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. Carnegie Corporation of New York, The Bill & Melinda Gates Foundation, The William and Flora Hewlett Foundation, Lumina Foundation, and The Kresge Foundation joined in partnership with the Carnegie Foundation for the Advancement of Teaching in this work.

Leadership—Charles A. Dana Center at the University of Texas at Austin
Uri Treisman, director
Susan Hudson Hull, program director of mathematics national initiatives

Leadership—Carnegie Foundation for the Advancement of Teaching
Anthony S. Bryk, president
Bernadine Chuck Fong, senior managing partner
Louis Gomez, senior fellow
Paul LeMahieu, senior fellow
James Stigler, senior fellow
Uri Treisman, senior fellow
Guadalupe Valdés, senior fellow

Statway Project Leads
Kristen Bishop, former team lead for the New Mathways Project, the Charles A. Dana Center at the University of Texas at Austin
Thomas J. Connolly, project lead, Statway, the Charles A. Dana Center at the University of Texas at Austin
Karon Klipple, director of Statway, the Carnegie Foundation for the Advancement of Teaching
Jane Muhich, director of Quantway, the Carnegie Foundation for the Advancement of Teaching

Project Staff—Charles A. Dana Center at the University of Texas at Austin
Richard Blount, advisor
Kathi Cook, project director, online services team
Jenna Cullinane, research associate
Steve Engler, lead editor and production editor
Amy Getz, team lead for the New Mathways Project
Susan Hudson Hull, program director of mathematics national initiatives
Joseph Hunt, graduate research assistant
Rachel Jenkins, consulting editor
Erica Moreno, program coordinator
Carol Robinson, administrative associate
Cathy Seeley, senior fellow
Rachele Seifert, administrative associate
Lilly Soto, senior administrative associate
Phil Swann, senior designer
Laura Torres, graduate research assistant
Thomas Wiegel, freelance formatter and proofreader
Authors Contracted by the Dana Center
Roxy Peck, professor emerita of statistics, California Polytechnic State University, San Luis Obispo, California
Beth Chance, professor of statistics, California Polytechnic State University, San Luis Obispo, California
Robert C. delMas, associate professor of educational psychology, University of Minnesota, Minneapolis, Minnesota
Scott Guth, professor of mathematics, Mt. San Antonio College, Walnut, California
Rebekah Isaak, graduate research student, University of Minnesota, Minneapolis, Minnesota
Leah McGuire, assistant professor, University of Minnesota, Minneapolis, Minnesota
Jiyoon Park, graduate research student, University of Minnesota, Minneapolis, Minnesota
Brian Kotz, associate professor of mathematics, Montgomery College, Germantown, Maryland
Chris Olsen, assistant professor of mathematics and statistics, Grinnell College, Grinnell, Iowa
Mary Parker, professor of mathematics, Austin Community College, Austin, Texas
Michael A. Posner, associate professor of statistics, Villanova University, Villanova, Pennsylvania
Thomas H. Short, professor, John Carroll University, University Heights, Ohio
Penny Smeltzer, teacher of statistics, Westwood High School, Austin, Texas
Myra Snell, professor of mathematics, Los Medanos College, Pittsburg, California
Laura Ziegler, graduate research student, University of Minnesota, Minneapolis, Minnesota

Reviewers Contracted by the Dana Center
Michelle Brock, American River College, Sacramento, California
Thomas J. Connolly, the Charles A. Dana Center at the University of Texas at Austin
Andre Freeman, Capital Community College, Hartford, Connecticut
Karon Klipple, the Carnegie Foundation for the Advancement of Teaching
Roxy Peck, professor emerita of statistics, California Polytechnic State University, San Luis Obispo, California
Jim Smart, Tallahassee Community College, Tallahassee, Florida
Myra Snell, Los Medanos College, Pittsburg, California

Committee for Statistics Learning Outcomes
Rose Asera, formerly of the Carnegie Foundation for the Advancement of Teaching
Kristen Bishop, formerly of the Charles A. Dana Center at the University of Texas at Austin
Richelle (Rikki) Blair, American Mathematical Association of Two-Year Colleges (AMATYC); Lakeland Community College, Ohio
David Bressoud, Mathematical Association of America (MAA); Macalester College, Minnesota
John Climent, American Mathematical Association of Two-Year Colleges (AMATYC); Cecil College, Maryland
Peg Crider, Lone Star College, Tomball, Texas
Jenna Cullinane, the Charles A. Dana Center at the University of Texas at Austin
Robert C. delMas, Consortium for the Advancement of Undergraduate Statistics Education (CAUSE); University of Minnesota, Minneapolis, Minnesota
Bernadine Chuck Fong, the Carnegie Foundation for the Advancement of Teaching
Karen Givvin, the University of California, Los Angeles
Larry Gray, American Mathematical Society (AMS); University of Minnesota
Susan Hudson Hull, the Charles A. Dana Center at the University of Texas at Austin
Rob Kimball, American Mathematical Association of Two-Year Colleges (AMATYC); Wake Technical Community College, North Carolina
Dennis Pearl, Consortium for the Advancement of Undergraduate Statistics Education (CAUSE); The Ohio State University
Roxy Peck, American Statistical Association (ASA); Consortium for the Advancement of Undergraduate Statistics Education (CAUSE); California Polytechnic State University, San Luis Obispo, California
Myra Snell, American Mathematical Association of Two-Year Colleges (AMATYC); Los Medanos College, Pittsburg, California
Jim Stigler, the Carnegie Foundation for the Advancement of Teaching; the University of California, Los Angeles
Daniel Teague, Mathematical Association of America (MAA); North Carolina School of Science and Mathematics, Durham
Uri Treisman, the Carnegie Foundation for the Advancement of Teaching; the Charles A. Dana Center at the University of Texas at Austin

Version 1.0 of Statway was developed in collaboration with faculty from the following colleges, the “Collaboratory,” who advised on the development of the course.
These Collaboratory colleges are:

Florida
Miami Dade College, Miami, Florida
Tallahassee Community College, Tallahassee, Florida
Valencia Community College, Orlando, Florida

California
American River College, Sacramento, California
Foothill College, Los Altos Hills, California
Mt. San Antonio College, Walnut, California
Pierce College, Woodland Hills, California
San Diego City College, San Diego, California

California State University System
CSU Northridge
Sacramento State University
San Jose State University

Texas
Austin Community College, Austin, Texas
El Paso Community College, El Paso, Texas
Houston Community College, Houston, Texas
Northwest Vista College, San Antonio, Texas
Richland College, Dallas, Texas

Connecticut
Capital Community College, Hartford, Connecticut
Gateway Community College, New Haven, Connecticut
Housatonic Community College, Bridgeport, Connecticut
Naugatuck Valley Community College, Waterbury, Connecticut

Washington
Seattle Central Community College, Seattle, Washington
Tacoma Community College, Tacoma, Washington
Statway, Full Version 1.0, July 2011
Table of Contents

Module 1: Statistical Studies and Overview of the Data Analysis Process
Lesson 1.1.1: The Statistical Analysis Process
Lesson 1.1.2: Types of Statistical Studies and Scope of Conclusions
Lesson 1.2.1: Collecting Data by Sampling
Lesson 1.2.2: Random Sampling
Lesson 1.2.3: Other Sampling Strategies
Lesson 1.2.4: Sources of Bias in Sampling
Lesson 1.3.1: Collecting Data by Conducting an Experiment
Lesson 1.3.2: Other Design Considerations—Blinding, Control Groups, and Placebos
Lesson 1.4.1: Drawing Conclusions from Statistical Studies

Module 2: Summarizing Data Graphically and Numerically
Lesson 2.1.1: Dotplots, Histograms, and Distributions for Quantitative Data
Lesson 2.1.2: Constructing Histograms for Quantitative Data
Lesson 2.1.3: Comparing Distributions of Quantitative Data in Two Independent Samples
Lesson 2.2.1: Quantifying the Center of a Distribution—Sample Mean and Sample Median
Lesson 2.2.2: Constructing Histograms for Quantitative Data
Lesson 2.3.1: Quantifying Variability Relative to the Median
Lesson 2.4.1: Quantifying Variability Relative to the Mean
Lesson 2.4.2: The Sample Variance

Module 3: Reasoning About Bivariate Numerical Data—Linear Relationships
Lesson 3.1.1: Introduction to Scatterplots and Bivariate Relationships
Lesson 3.1.2: Developing an Intuitive Sense of Form, Direction, and Strength of the Relationship Between Two Measurements
Lesson 3.1.3: Introduction to the Correlation Coefficient and Its Properties
Lesson 3.1.4: Correlation Formula
Lesson 3.1.5: Correlation Is Not Causation
Lesson 3.2.1: Using Lines to Make Predictions
Lesson 3.2.2: Least Squares Regression Line as Line of Best Fit
Lesson 3.2.3: Investigating the Meaning of Numbers in the Equation of a Line
Lesson 3.2.4: Special Properties of the Least Squares Regression Line
Lesson 3.3.1: Using Residuals to Determine If a Line Is a Good Fit
Lesson 3.3.2: Using Residuals to Determine If a Line Is an Appropriate Model

Module 4: Modeling Nonlinear Relationships
Lesson 4.1.1: Investigating Patterns in Data
Lesson 4.1.2: Exponential Models
Lesson 4.1.3: Assessing How Well a Model Fits the Data

Module 5: Reasoning About Bivariate Categorical Data and Introduction to Probability
Lesson 5.1.1: Reasoning About Risk and Chance
Lesson 5.1.2: Defining Risk
Lesson 5.1.3: Interpreting Risk
Lesson 5.1.4: Comparing Risks
Lesson 5.1.5: More on Conditional Risks

Module 6: Formalizing Probability and Probability Distributions
Lesson 6.1.1: Probability
Lesson 6.1.2: Probability Rules
Lesson 6.1.3: Simulation, Discrete Random Variables, and Probability Distributions
Lesson 6.2.1: Probability Distributions of Continuous Random Variables
Lesson 6.2.2: Z-Scores and Normal Distributions
Lesson 6.2.3: Using Normal Distributions to Find Probabilities and Critical Values

Module 7: Linking Probability to Statistical Inference
Lesson 7.1.1: Predicting an Election—Statistics and Sampling Variability
Lesson 7.1.2: Sampling from a Population
Lesson 7.1.3: Testing Statistical Hypotheses
Lesson 7.2.1: Two Types of Inferential Procedures—Estimation and Hypothesis Testing
Lesson 7.2.2: Connecting Sampling Distributions and Confidence Intervals
Lesson 7.2.3: Connecting Sampling Distributions and Hypothesis Testing

Module 8: Inference for One Proportion
Lesson 8.1.1: Sampling Distribution of One Proportion
Lesson 8.1.2: Sampling Distribution of One Proportion
Lesson 8.2.1: Estimation of One Proportion
Lesson 8.2.2: Estimation of One Proportion
Lesson 8.3.1: Estimation of One Proportion
Lesson 8.3.2: Hypothesis Testing for One Proportion

Module 9: Inference for Two Proportions
Lesson 9.1.1: Sampling Distribution of Differences of Two Proportions
Lesson 9.1.2: Using Technology to Explore the Sampling Distribution of the Differences in Two Proportions
Lesson 9.2.1: Confidence Intervals for the Difference in Two Population Proportions
Lesson 9.2.2: Computing and Interpreting Confidence Intervals for the Difference in Two Population Proportions
Lesson 9.3.1: A Statistical Test for the Difference in Two Population Proportions
Lesson 9.3.2: A Statistical Test for the Difference in Two Population Proportions
Lesson 9.3.3: Conducting a Statistical Test for the Difference in Two Population Proportions

Module 10: Inference for Means
Lesson 10.1.1: The Sampling Distribution of the Sample Mean
Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape
Lesson 10.2.1: Estimating a Population Mean
Lesson 10.2.2: T-Statistics and T-Distributions
Lesson 10.2.3: The Confidence Interval for a Population Mean
Lesson 10.3.1: Testing Hypotheses About a Population Mean
Lesson 10.3.2: Test Statistic and P-Values, One-Sample T-Test
Lesson 10.4.1: Inferences About the Difference Between Two Population Means
Lesson 10.4.2: Inference for Paired Data
Lesson 10.4.3: Two-Sample T-Test

Module 11: Chi-Squared Tests
Lesson 11.1.1: Introduction to Chi-Square Tests for One-Way Tables
Lesson 11.1.2: Executing the Chi-Square Test for One-Way Tables (Goodness-of-Fit)
Lesson 11.1.3: The Chi-Square Distribution and Degrees of Freedom
Lesson 11.2.1: Introduction to Chi-Square Tests for Two-Way Tables
Lesson 11.2.2: Executing the Chi-Square Test for Independence in Two-Way Tables
Lesson 11.2.3: Executing the Chi-Square Test for Homogeneity in Two-Way Tables

Module 12: Other Mathematical Content
Lesson 12.1.1: Statistical Linear Relationships and Mathematical Models of Linear Relationships
Lesson 12.1.2: Mathematical Linear Models
Lesson 12.1.3: Contrasting Mathematical and Statistical Linear Relationships
Lesson 12.1.4: Proportional Models
Lesson 12.2.1: Multiple Representations of Exponential Models
Lesson 12.2.2: Linear Models—Answering Various Types of Questions Algebraically
Lesson 12.2.3: Power Models
Lesson 12.2.4: Solving Inequalities

Statway Instructor’s Notes
April 16, 2012 (Full Version 1.0)
Initiating Lesson 2.1.1: Dotplots, Histograms, and Distributions for Quantitative Data

Estimated number of 50-minute class sessions: 1

Learning Goals
Students will begin to understand that the process of describing distributions includes descriptions of the center, spread, and shape. These descriptions are useful for characterizing a distribution and comparing distributions.

Students will begin to be able to
• construct dotplots using technology.
• estimate typical values and typical ranges of values from a dotplot.
• use dotplots to compare data sets.

Part I [estimated total time: 15–20 minutes]

Introduction [estimated time: 5–8 minutes]
(Note: The student handout includes the following information on how this module connects to Module 1 and on the context of the data set for this lesson. Based on your understanding of students’ needs, decide how to go over this information with the class [e.g., giving students time to read, and then asking questions to prompt comprehension of the information].)

In Module 1, you focused on thinking through the data analysis process:
• formulating a question that can be answered using data,
• deciding if you will collect data through an observational study or an experiment,
• determining a measure that generates useful data,
• analyzing the data, and
• drawing conclusions based on the analysis of the data.
In this module, you will use this data analysis process but will focus on developing statistical tools for analyzing data. Today, you will investigate a question related to basketball.

The National Basketball Association (NBA) announced that a new basketball would be used for the 2006–2007 season. (www.nba.com/news/blackbox_060628.html)

The NBA is introducing a new Official Game Ball for play beginning in the 2006–07 season. The new ball, manufactured by Spalding, features a new design and a new material that together offer better grip, feel, and consistency than the current leather ball. This marks the first change to the ball in over 35 years and only the second in 60 seasons. … The NBA and Spalding subjected the ball to a rigorous evaluation process that included laboratory and on-court testing. Every NBA team received the new ball and had the opportunity to use it in practice. The ball also was tested in the NBA Development League and was used in activities during NBA All-Star 2006 in Houston.

Players in the league complained about the new ball, and the NBA announced that the traditional leather ball would be used again beginning January 1, 2007.
(http://tinyurl.com/6cl7j8o)

Washington Wizards guard Gilbert Arenas said the new basketball gets slippery when it comes into contact with even small amounts of sweat. Teammate Antawn Jamison said he had trouble palming the new ball while driving to the basket. Miami Heat center Shaquille O’Neal said it “feels like one of those cheap balls that you buy at the toy store.” Some players, including league MVP Steve Nash, recently began complaining that the new ball was producing small cuts on their hands.

(Note: There is interesting information in these articles. Consider showing the picture of the new ball and highlighting other information in the articles that you think is of interest to students. For example, the players filed a complaint about the new ball with their union.)

The player complaints may have been based on the performance of the new ball, and this may have had a measurable effect on the games.

Task [estimated time: 5–8 minutes]
In this task, you will use data to answer the question, “Did the synthetic ball affect game performance?” Discuss the following questions with a neighbor or your group:

(1) What data could be collected to answer this question? (Be specific about what you would measure.) What would you expect to see in the data if the synthetic ball was affecting the players’ performance during the game?

For example, Jamison said he had trouble palming the new ball while driving to the basket. So, you could determine the percentage of attempted layups that were scored in every game. This is your measure of game performance. You could gather these data for the 2006 season (when the synthetic ball was used) and for the 2007 season (when the traditional leather ball was used) and compare the data sets. You can see if a smaller percentage of layups were successful with the synthetic ball. This is an observational study.
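For instructors who want to demonstrate the layup measure with technology, the example answer above can be sketched in Python. This is only an illustration under stated assumptions: the function names and the per-game counts are hypothetical, invented for this sketch, and are not real NBA data or part of the Statway materials.

```python
from statistics import mean


def layup_percentage(made, attempted):
    """Return the percentage of attempted layups that were scored in one game."""
    if attempted <= 0:
        raise ValueError("a game needs at least one attempted layup")
    return 100 * made / attempted


def typical_percentage(games):
    """Average the per-game layup percentages for a list of (made, attempted) pairs."""
    return mean(layup_percentage(made, attempted) for made, attempted in games)


# Hypothetical (made, attempted) counts per game, invented for illustration.
# These are not real NBA data.
synthetic_games = [(18, 40), (20, 40), (15, 30)]
leather_games = [(22, 40), (24, 40), (18, 30)]

if __name__ == "__main__":
    print(f"Synthetic ball: {typical_percentage(synthetic_games):.1f}% of layups scored")
    print(f"Leather ball:   {typical_percentage(leather_games):.1f}% of layups scored")
```

With these invented counts, the synthetic-ball games show a smaller typical percentage, which is the pattern students would look for if the ball hurt performance. Because the games are only observed and not assigned to a ball at random, a difference like this would not by itself show that the ball caused it.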
Wrap-Up [estimated time: 5–8 minutes]
Note: After students have had 5–8 minutes to think about what data they would collect, call on volunteers to share their ideas. Do not expect students to have detailed study designs. The goal is to have a few ideas articulated to illustrate that there are many possible ways to define a measure and design a study. Highlight use of concepts from Module 1 that come up in their responses. For example, point out if they are designing an experiment or an observational study. Add to student explanations if necessary to clarify the measurement. Point out whether the measure yields numerical or categorical data.

Part II [estimated time: 25 minutes, including wrap-up]

Task [Student Handout]
At the end of this lesson, you will analyze real data from the NBA. You will have the number of points scored by each team for the games played in the last week of 2006, when the synthetic ball was used. For comparison, you will have the number of points scored by each team for the games played in the first week of 2007, when the traditional leather ball was used.
So, your measure is “points per team scored in each game.” Before you look at real NBA data from the 2006 and 2007 seasons, first think about how the two sets of data might differ if the synthetic ball affected the points scored by the teams.

(2) The following are made-up data (not real NBA data).
(a) What does a dot represent in these dotplots?
(b) Which dotplot gives you data about game performance with the synthetic ball?
(c) Compare the two sets of data. Do the data suggest that the synthetic ball affected the points scored by the teams? Jot down some notes about how the data support your answer.

(3) The following are more made-up data. Compare the two sets of data. Do the data suggest that the synthetic ball affected the points scored by the teams? Jot down some notes about how the data support your answer.

(4) The following are more made-up data. Compare the two sets of data. Do the data suggest that the synthetic ball affected the points scored by the teams? Jot down some notes about how the data support your answer.

Wrap-Up/Direct Instruction About Statistical Concepts

The goal of the wrap-up is to introduce the concepts of shape, center (typical or representative value), and spread (overall range or range of typical values) as distinguishing features of a distribution. You can do this by leading a class discussion or delivering a minilecture. You might use an analogy here to explain that statisticians describe distributions by describing center, spread, and shape, just as you might describe a person by describing his or her sex, age, race, height, and weight.

Discuss each pair of graphs in Questions 2–4. For each pair, establish whether the synthetic ball helped or hurt game performance. Highlight how the distributions differ.

For the first pair, the synthetic ball lowered the number of points each team scored. Compare the overall range of each distribution, but emphasize that the centers differ (about 84 points for the synthetic ball and about 108 for the traditional leather ball). Talk about the center as a way to identify a typical or representative value.

For the second pair, the centers of the two distributions are about the same (around 100). The overall range of values is also the same (70 to 120). However, the synthetic ball appears to make the scoring more consistent.
There is less spread in the data when you pay attention to how the data are distributed between 70 and 120. A large portion of the data falls between about 95 and 105. Scores less than 90 and above 110 are rare. However, with the traditional leather ball, you see that scores around 80 and around 110 are not unusual. There is more variability in the scores with the traditional leather ball.

For the third pair, the shapes of the distributions differ. The synthetic ball again appears to produce more consistent scoring patterns with a triangular or somewhat bell-shaped pattern centered around 100. It appears that you would be as likely to see a game with scores between 90 and 100 as between 100 and 110. There are two outliers from this pattern, but these appear to be unusual. The leather ball produced a scoring pattern that is skewed to the left; most scores were between 90 and 100, but it was not unusual to see games in which teams scored between 70 and 90. It is very rare for games to have scores above 100. This shape is “left skewed” because there is a tail of values to the left of the main clump.

This is a good time to introduce students to making dotplots using technology. Emphasize that in order to facilitate visual comparisons, they need to make graphs of distributions side-by-side and on the same scale. Dotplots are included for use in this homework assignment, but you could also have students construct them. Homework in the rest of Module 2 requires the use of technology.
Homework [Student Handout]

(1) Now you are going to look at the real NBA data. You have the number of points scored by each team for the games played in the last week of 2006 when the synthetic ball was used. For comparison, you have the number of points scored by each team for the games played in the first week of 2007 when the traditional leather ball was used. So, your measure is “points per team scored in each game,” which is called Points in the Excel file, NBA_new_ball_data_2006-2007.xls.

Use technology to create two dotplots of the points scored in each game by each team so that you can compare the use of the synthetic ball with the use of the traditional leather ball. You can also use the two dotplots of these data provided below. Do the real data suggest that the synthetic ball affected the number of points scored? Explain what you are seeing in the data that supports your answer.

(2) One way you can compare distributions is by describing the center of the data, which can be viewed as a typical value that could be used to represent the data.
(a) Pick a typical score to represent each data set.
(b) Circle that value in each dotplot.
(c) Is the typical score you chose for games played with the synthetic ball higher, lower, or about the same as the typical score for games played with the traditional leather ball?
(3) Another way to compare distributions is to describe the spread, which can be viewed as the overall range of the data from the lowest value to the highest value. It can also be viewed as a typical range of values that could be used to represent the data.
(a) Give a range of scores that represents typical scores for each data set.
(b) Mark the typical range of values in each dotplot.
(c) Does either ball appear to result in more consistent scoring patterns? (Is the typical range you chose noticeably shorter for one set of data?)

(4) In addition to center and spread, you can also use descriptions of the shape, which try to capture the patterns you see in the data. How would you describe the overall shape of each dotplot?

(5) The following paragraphs compare the NBA data for the last week of 2006 and the first week of 2007. Read the paragraphs, and then follow the instructions below.

Our goal was to determine if the synthetic basketball affected NBA game performance. We compared the number of points scored by each team in games played the last week of 2006 with the synthetic ball to the number of points scored by each team in games played the first week of 2007 with the traditional leather ball. This was an observational study. We found that the distribution of points scored did not differ much. The typical number of points scored by a team in a game was around 100 points. This was true whether the synthetic ball or the traditional leather ball was used. Of course, there was variability in points scored by different teams during different games.
However, typical scores ranged from about 85 points to 110 points for both sets of data. So, scoring was similarly consistent with either type of ball. Both balls had scoring patterns that were slightly skewed to the right with a tail made up of 4 to 7 high-scoring games with scores above 120. Again, we viewed this as similar. So, our conclusion is that the synthetic ball did not affect scoring. Of course, other aspects of the game could have been affected by the synthetic ball as the players said. We only looked at the effect of the ball on points scored during one week of play.

These paragraphs are an example of an accurate, precise, and thorough description of a statistical study. In particular, they illustrate how to use descriptions of center, spread, and shape to compare data sets and draw a conclusion. Label the following in these paragraphs:
(a) the research question being investigated
(b) the measure used to define what data are collected
(c) the type of study (observational or experiment)
(d) the use of center in the data analysis
(e) the use of spread in the data analysis
(f) the use of shape in the data analysis
(g) the conclusion drawn from the data analysis
Statway Student Handout April 16, 2012 (Full Version 1.0) Initiating Lesson 2.1.1: Dotplots, Histograms, and Distributions for Quantitative Data

In Module 1, you focused on thinking through the data analysis process:
• formulating a question that can be answered using data,
• deciding if you will collect data through an observational study or an experiment,
• determining a measure that generates useful data,
• analyzing the data, and
• drawing conclusions based on the analysis of the data.

In this module, you will use this data analysis process but will focus on developing statistical tools for analyzing data. Today, you will investigate a question related to basketball.

Part I

The National Basketball Association (NBA) announced that a new basketball would be used for the 2006–2007 season.

(www.nba.com/news/blackbox_060628.html) The NBA is introducing a new Official Game Ball for play beginning in the 2006–07 season. The new ball, manufactured by Spalding, features a new design and a new material that together offer better grip, feel, and consistency than the current leather ball. This marks the first change to the ball in over 35 years and only the second in 60 seasons. … The NBA and Spalding subjected the ball to a rigorous evaluation process that included laboratory and on-court testing. Every NBA team received the new ball and had the opportunity to use it in practice. The ball also was tested in the NBA Development League and was used in activities during NBA All-Star 2006 in Houston.

Players in the league complained about the new ball, and the NBA announced that the traditional leather ball would be used again beginning January 1, 2007.

(http://tinyurl.com/6cl7j8o) Washington Wizards guard Gilbert Arenas said the new basketball gets slippery when it comes into contact with even small amounts of sweat. Teammate Antawn Jamison said he had trouble palming the new ball while driving to the basket.
Miami Heat center Shaquille O’Neal said it “feels like one of those cheap balls that you buy at the toy store.” Some players, including league MVP Steve Nash, recently began complaining that the new ball was producing small cuts on their hands.

The player complaints may have been based on the performance of the new ball, and this may have had a measurable effect on the games. In this task, you will use data to answer the question, “Did the synthetic ball affect game performance?”

Discuss the following questions with a neighbor or your group:

(1) What data could be collected to answer this question? (Be specific about what you would measure.) What would you expect to see in the data if the synthetic ball was affecting the players’ performance during the game?

For example, Jamison said he had trouble palming the new ball while driving to the basket. So, you could determine the percentage of attempted layups that were scored in every game. This is your measure of game performance. You could gather these data for the 2006 season (when the synthetic ball was used) and for the 2007 season (when the traditional leather ball was used) and compare the data sets. You can see if a smaller percentage of layups were successful with the synthetic ball.
This is an observational study.

Part II

At the end of this lesson, you will analyze real data from the NBA. You will have the number of points scored by each team for the games played in the last week of 2006 when the synthetic ball was used. For comparison, you will have the number of points scored by each team for the games played in the first week of 2007 when the traditional leather ball was used. So, your measure is “points per team scored in each game.” Before you look at real NBA data from the 2006 and 2007 seasons, first think about how the two sets of data might differ if the synthetic ball affected the points scored by the teams.

(2) The following are made-up data (not real NBA data).
(a) What does a dot represent in these dotplots?
(b) Which dotplot gives you data about game performance with the synthetic ball?
(c) Compare the two sets of data. Do the data suggest that the synthetic ball affected the points scored by the teams? Jot down some notes about how the data support your answer.

(3) The following are more made-up data. Compare the two sets of data. Do the data suggest that the synthetic ball affected the points scored by the teams? Jot down some notes about how the data support your answer.

(4) The following are more made-up data. Compare the two sets of data. Do the data suggest that the synthetic ball affected the points scored by the teams? Jot down some notes about how the data support your answer.

Homework

(1) Now you are going to look at the real NBA data. You have the number of points scored by each team for the games played in the last week of 2006 when the synthetic ball was used. For comparison, you have the number of points scored by each team for the games played in the first week of 2007 when the traditional leather ball was used. So, your measure is “points per team scored in each game,” which is called Points in the Excel file, NBA_new_ball_data_2006-2007.xls.

Use technology to create two dotplots of the points scored in each game by each team so that you can compare the use of the synthetic ball with the use of the traditional leather ball. You can also use the two dotplots of these data provided below. Do the real data suggest that the synthetic ball affected the number of points scored? Explain what you are seeing in the data that supports your answer.
(2) One way you can compare distributions is by describing the center of the data, which can be viewed as a typical value that could be used to represent the data.
(a) Pick a typical score to represent each data set.
(b) Circle that value in each dotplot.
(c) Is the typical score you chose for games played with the synthetic ball higher, lower, or about the same as the typical score for games played with the traditional leather ball?

(3) Another way to compare distributions is to describe the spread, which can be viewed as the overall range of the data from the lowest value to the highest value. It can also be viewed as a typical range of values that could be used to represent the data.
(a) Give a range of scores that represents typical scores for each data set.
(b) Mark the typical range of values in each dotplot.
(c) Does either ball appear to result in more consistent scoring patterns? (Is the typical range you chose noticeably shorter for one set of data?)
(4) In addition to center and spread, you can also use descriptions of the shape, which try to capture the patterns you see in the data. How would you describe the overall shape of each dotplot?

(5) The following paragraphs compare the NBA data for the last week of 2006 and the first week of 2007. Read the paragraphs, and then follow the instructions below.

Our goal was to determine if the synthetic basketball affected NBA game performance. We compared the number of points scored by each team in games played the last week of 2006 with the synthetic ball to the number of points scored by each team in games played the first week of 2007 with the traditional leather ball. This was an observational study. We found that the distribution of points scored did not differ much. The typical number of points scored by a team in a game was around 100 points. This was true whether the synthetic ball or the traditional leather ball was used. Of course, there was variability in points scored by different teams during different games. However, typical scores ranged from about 85 points to 110 points for both sets of data. So, scoring was similarly consistent with either type of ball. Both balls had scoring patterns that were slightly skewed to the right with a tail made up of 4 to 7 high-scoring games with scores above 120. Again, we viewed this as similar. So, our conclusion is that the synthetic ball did not affect scoring. Of course, other aspects of the game could have been affected by the synthetic ball as the players said. We only looked at the effect of the ball on points scored during one week of play.

These paragraphs are an example of an accurate, precise, and thorough description of a statistical study.
In particular, they illustrate how to use descriptions of center, spread, and shape to compare data sets and draw a conclusion. Label the following in these paragraphs:
(a) the research question being investigated
(b) the measure used to define what data are collected
(c) the type of study (observational or experiment)
(d) the use of center in the data analysis
(e) the use of spread in the data analysis
(f) the use of shape in the data analysis
(g) the conclusion drawn from the data analysis

Statway Instructor’s Notes April 16, 2012 (Full Version 1.0) Supporting Lesson 2.1.2: Constructing Histograms for Quantitative Data

Estimated number of 50-minute class sessions: 1

Materials Required

Appropriate statistical software or copies of plots/output

Learning Goals

Students will understand that
• the choice of bin size affects the appearance and interpretability of histograms. Thus, careful consideration must be given to the selection of bin size when constructing a histogram.
• the counts of data points within each bin can be used to construct frequency histograms. The proportions of data points within each bin can be used to construct relative frequency histograms.
• there is a difference between frequency and relative frequency histograms, and there are benefits of each.
• histograms can be used to characterize a distribution, specifically in terms of center, shape, and spread.

Students will be able to
• choose ranges for the bins, tally data values into bins, report the frequencies for each bin, and compute the relative frequencies for each bin.
• use technology to construct histograms.
• describe the important characteristics of histograms in context.
• compare two histograms based on important characteristics such as center, shape, and spread.

Constructing Histograms for a Single Quantitative Data Set

Introduction to the Context of the Task [Student Handout]

In Lesson 2.1.1, you analyzed data from the 2006–2007 National Basketball Association season, when the league changed to a new synthetic basketball. The NBA responded to pressure from the players by changing back to the traditional leather basketball on January 1, 2007. You also examined whether the change back to the traditional ball seemed to be associated with differences in the distribution of points scored by each team in the games during the last week of 2006 and the first week of 2007.

In this lesson, you will consider the distributions of a different variable. It is possible that the change in basketballs might have been associated with the difference in points scored by the two teams.

(Note: A likely explanation for the variability in the point differences between using a synthetic basketball versus a leather basketball [last week of 2006 and first week of 2007, respectively] is that unlike synthetic basketballs, leather basketballs absorb sweat and dirt, which makes them heavier. This means that there is greater variability among leather basketballs used across the NBA, and thus in different games. This variability can result in varying ball performance, such as how well it bounces, or varying ballhandling, such as the ability of a player to hold on to the ball [i.e., “palming” the ball], throw it to other players, or shoot a basket.)
There seems to be a clear home-court advantage in the NBA, so for this lesson the difference in the teams’ scores will be calculated as the Home Team’s Score minus the Visiting Team’s Score.

If the final score of a game is Home Team: 110, Visiting Team: 90, then the final score difference is 110 – 90 = 20.
If the final score of a game is Home Team: 88, Visiting Team: 100, then the final score difference is 88 – 100 = –12.

The following tables show the differences in scores for the games during the last week of 2006 and the first week of 2007.

2006 Data
 16   13   11    1   19    5    2   23   –7    8   10
 15  –13   –7    6   23  –16  –25    6  –13   15   25
  3    2   14   23    9   10   –1   10   26    9  –10
 19   10   22   –3  –10    1    7   14  –11    6   17
  8   29   23   10   –4   –7    2   10   10   14    6

2007 Data
 –6   –8   –1    4   24  –11   –8    5   12   –3
  7    9  –15    2  –18   –2   11    3    9  –24
 –4   14   19   –9   –9    2    5   32   28   –5
–18   13   11   12   17    5  –12    4   –7   –5
  3  –14    4    8   23   –3    5

(Note: As a way to check for understanding of the data presentation, it may be helpful to ask students why both data sets do not have the same number of data points. In addition, perhaps have the students determine the number of data values in each set—they will need that number later in the lesson when calculating relative frequency. Also, ask them for a minimum and maximum of each.)
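For instructors who use Python as their technology, the checks suggested in the note above (the number of values in each set, and the minimum and maximum of each) can be sketched as follows. This is only an illustration, not part of the Statway materials: the names `score_difference` and `describe` are hypothetical, and the two lists are typed in from the 2006 Data and 2007 Data tables.

```python
# Final score differences (home score minus visitor score),
# typed in from the 2006 Data and 2007 Data tables.
diffs_2006 = [
    16, 13, 11, 1, 19, 5, 2, 23, -7, 8, 10,
    15, -13, -7, 6, 23, -16, -25, 6, -13, 15, 25,
    3, 2, 14, 23, 9, 10, -1, 10, 26, 9, -10,
    19, 10, 22, -3, -10, 1, 7, 14, -11, 6, 17,
    8, 29, 23, 10, -4, -7, 2, 10, 10, 14, 6,
]
diffs_2007 = [
    -6, -8, -1, 4, 24, -11, -8, 5, 12, -3,
    7, 9, -15, 2, -18, -2, 11, 3, 9, -24,
    -4, 14, 19, -9, -9, 2, 5, 32, 28, -5,
    -18, 13, 11, 12, 17, 5, -12, 4, -7, -5,
    3, -14, 4, 8, 23, -3, 5,
]


def score_difference(home, visitor):
    """Final score difference as defined in this lesson: home minus visitor."""
    return home - visitor


def describe(values):
    """Return the number of values and the smallest and largest value."""
    return {"count": len(values), "min": min(values), "max": max(values)}


summary_2006 = describe(diffs_2006)
summary_2007 = describe(diffs_2007)

print(score_difference(110, 90))   # 20
print(score_difference(88, 100))   # -12
print("2006:", summary_2006)       # count 55, min -25, max 29
print("2007:", summary_2007)       # count 47, min -24, max 32
```

The counts (55 and 47) are the denominators students will need later when they calculate relative frequencies.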
Activities [Student Handout]

(1) Based only on a visual examination of the data values, what is your impression of whether the distributions of final score differences vary between the last week of 2006 and the first week of 2007? Explain your reasoning.

(Note: This activity demonstrates that it is difficult to do this simply by looking at data sets and that visual representations are a very helpful and appropriate tool.)

(2) A table of bins will help you group the final score differences into intervals of five points each (for example, 5 to 9 points, 10 to 14 points, and so on). The bins start with the lowest final score difference of –25 and end with the highest final score difference of 29. The bins are shown in the following table:

Last Week of 2006 Season Only

Range         Tally   Frequency   Relative Frequency
–25 to –21    |       1           1/55 = 0.018 = 1.8%
–20 to –16
–15 to –11
–10 to –6     ||||    5
–5 to –1
0 to 4
5 to 9
10 to 14
15 to 19      |
20 to 24
25 to 29

For each value in the 2006 Data table, determine the bin it falls into. For example, the first final score difference is 16, so it belongs in the bin that represents the range 15 to 19. A tally mark (|) is placed in the Tally column next to the range 15 to 19, as shown in the table.
Continue doing this for all values in the 2006 Data table. Each time a tally reaches five marks, draw the fifth mark horizontally across the first four (shown here as ||||), as in the bin (range) –10 to –6.

(3) The frequency is equal to the number of tally marks for each bin (range). Convert the tallies to numbers that represent the frequencies for each bin. For example, the bin for –10 to –6 has five tally marks, so the frequency is 5, as shown in the table.

(Note: The graphs in Questions 4 and 7 should be handed to students on a separate sheet of paper after they have completed the tallies and frequencies. Otherwise it may make the previous activities seem unnecessary or the graph anticlimactic.)

(4) The Frequency column is used to construct a frequency histogram. Draw each bar so that its height equals the frequency of its bin.

[Figure: Histogram of 2006. Vertical axis: Frequency (0 to 12); horizontal axis: 2006 final score difference (–25 to 30).]

(5) Look at the bars of the histogram carefully. What does the height of each bar represent? How can the bars help you compare the tallies of the various bins?

(Note: Provide instruction or facilitate discussion so that students recognize that the histogram is a graphical representation of the bins and that the height of each bar represents the tally for an individual bin.)
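For instructors who want to check student tallies with technology, the binning in Questions 2 through 4 can be sketched in Python. This is a minimal sketch, not part of the original lesson: the names DATA_2006, bin_start, and frequency_table are hypothetical, and the five-point bins are assumed to be aligned to multiples of five as in the table above.

```python
from collections import Counter

# Final score differences (home minus visitor), last week of 2006, from the lesson.
DATA_2006 = [
    16, 13, 11, 1, 19, 5, 2, 23, -7, 8, 10,
    15, -13, -7, 6, 23, -16, -25, 6, -13, 15, 25,
    3, 2, 14, 23, 9, 10, -1, 10, 26, 9, -10,
    19, 10, 22, -3, -10, 1, 7, 14, -11, 6, 17,
    8, 29, 23, 10, -4, -7, 2, 10, 10, 14, 6,
]

BIN_WIDTH = 5  # each bin covers five integer values, e.g. 15 to 19


def bin_start(value, width=BIN_WIDTH):
    """Return the lower edge of the bin that contains value."""
    # Floor division rounds toward negative infinity, so -7 lands in -10 to -6.
    return width * (value // width)


def frequency_table(data, width=BIN_WIDTH):
    """Count how many values fall into each bin, keyed by lower bin edge."""
    counts = Counter(bin_start(v, width) for v in data)
    first = bin_start(min(data), width)
    last = bin_start(max(data), width)
    # Include empty bins so the table has no gaps.
    return {lo: counts.get(lo, 0) for lo in range(first, last + 1, width)}


freq_2006 = frequency_table(DATA_2006)

# Print a text version of the tally table; each '|' is one game.
for lo, count in freq_2006.items():
    print(f"{lo:>4} to {lo + BIN_WIDTH - 1:>4}  {'|' * count:<12}  {count}")
```

The row of tally marks printed for each bin is already a sideways histogram, which may help students connect the tally table to the bars in Question 4.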
(6) The relative frequency is the proportion of the total number of observations that fall in each range of the table. There were 55 games played in the final week of 2006, so each relative frequency value is found by dividing the corresponding frequency by 55. Enter the resulting proportions into the column labeled Relative Frequency. The first relative frequency has already been calculated: 1/55 = 0.018 = 1.8% in the bin labeled –25 to –21, since there is a frequency of 1 for that bin. Now, complete the table by calculating the relative frequencies for the remaining bins (ranges).

(7) A relative frequency histogram represents the relative frequencies for the bins instead of the frequencies. In other words, it shows the percentage of all final score differences that fall in each bin. Your instructor handed out copies of the frequency histogram for Question 4 and a relative frequency histogram for the final score differences in the last week of 2006. The relative frequency in this histogram is reported as a percentage instead of as a decimal.

[Figure: Histogram of 2006. Vertical axis: Relative Frequency Percentage (0 to 25); horizontal axis: 2006 final score difference (–25 to 30).]

Do you notice any similarities between the two histograms?
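The calculation in Questions 6 and 7 can also be sketched in Python. The sketch below assumes the completed 2006 frequencies from the lesson's table; FREQ_2006 and rel_freq are hypothetical names chosen for illustration.

```python
# Frequencies for the 2006 bins, keyed by the lower edge of each five-point bin
# (values taken from the completed table in the lesson).
FREQ_2006 = {-25: 1, -20: 1, -15: 3, -10: 5, -5: 3, 0: 6,
             5: 10, 10: 12, 15: 6, 20: 5, 25: 3}

total = sum(FREQ_2006.values())  # number of games in the last week of 2006

# Relative frequency = bin frequency / total number of observations.
rel_freq = {lo: count / total for lo, count in FREQ_2006.items()}

for lo, count in FREQ_2006.items():
    share = rel_freq[lo]
    print(f"{lo:>4} to {lo + 4:>4}  {count:>2}/{total} = {share:.3f} = {share:.1%}")
```

Because every frequency is divided by the same total, the ratio between any two bar heights is unchanged, which is why the frequency and relative frequency histograms have the same shape.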
(Note: Provide instruction or facilitate discussion so that students recognize that the relative heights of the bars in each histogram are the same [i.e., their shapes are identical]. This can lead into instruction/discussion of the qualitative aspects of center, shape, and spread in the context of a histogram. This is addressed in the next activity.)

(8) The following questions relate to important features of a graph such as a histogram.

(a) How would you describe the shape of the distribution of final score differences in the last week of 2006? Is it symmetrical or does it lean in one direction?

(Answer: The distribution is skewed slightly toward lower values. Note: Students have not yet been exposed to the shape of distributions using terms such as skewed and symmetric.)

(b) Estimate the value for the center of the distribution.

(Answer: The center seems to be around 10 points. This is interesting because it represents the approximate value of the home-court advantage.)

(c) Report a value for the range (or spread) of the distribution.

(Answer: The range of the data is about 55 points because the smallest value is near –25 and the largest value is near 30.)
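The visual estimates in Question 8 can be compared with values computed directly from the data. The following is a minimal sketch, assuming the median as one reasonable measure of center (the lesson itself asks only for a visual estimate); the variable names are hypothetical.

```python
import statistics

# Final score differences (home minus visitor), last week of 2006, from the lesson.
DATA_2006 = [
    16, 13, 11, 1, 19, 5, 2, 23, -7, 8, 10,
    15, -13, -7, 6, 23, -16, -25, 6, -13, 15, 25,
    3, 2, 14, 23, 9, 10, -1, 10, 26, 9, -10,
    19, 10, 22, -3, -10, 1, 7, 14, -11, 6, 17,
    8, 29, 23, 10, -4, -7, 2, 10, 10, 14, 6,
]

n_games = len(DATA_2006)
smallest, largest = min(DATA_2006), max(DATA_2006)
data_range = largest - smallest          # spread reported as a single number
center = statistics.median(DATA_2006)    # assumption: median as the center

print(f"n = {n_games}, min = {smallest}, max = {largest}")
print(f"range = {data_range}, median = {center}")
```

The computed range of 54 and median of 9 are close to the histogram-based answers of about 55 points and about 10 points.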
Wrap-Up/Direct Instruction

The following is the complete table for games in the last week of the 2006 NBA season:

Last Week of 2006 Season Only

Range         Tally            Frequency   Relative Frequency
–25 to –21    |                1           1/55 = 0.018 = 1.8%
–20 to –16    |                1           1/55 = 0.018 = 1.8%
–15 to –11    |||              3           3/55 = 0.055 = 5.5%
–10 to –6     ||||             5           5/55 = 0.091 = 9.1%
–5 to –1      |||              3           3/55 = 0.055 = 5.5%
0 to 4        |||| |           6           6/55 = 0.109 = 10.9%
5 to 9        |||| ||||        10          10/55 = 0.182 = 18.2%
10 to 14      |||| |||| ||     12          12/55 = 0.218 = 21.8%
15 to 19      |||| |           6           6/55 = 0.109 = 10.9%
20 to 24      ||||             5           5/55 = 0.091 = 9.1%
25 to 29      |||              3           3/55 = 0.055 = 5.5%

(In the Tally column, each group written |||| stands for five tallies: four vertical marks crossed by a fifth.)

Note: Show students the following graphs and ask what differences they notice (possibly with a guiding question about the impact of bin width). Then have a wrap-up discussion using the information provided here.

It is important to realize that the width of the bins in a histogram is a subjective choice. A guideline is that if the bin width produces 5 to 15 bins (or bars) in the histogram, then it is a reasonable choice. Bin widths that produce fewer than five bars tend to smooth out too much of the shape of the distribution. Bin widths that produce more than 15 bars tend to exaggerate the pattern of peaks and valleys in a distribution.
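To make the 5-to-15-bins guideline concrete, the sketch below counts how many bins several candidate widths would produce for the 2006 data. It is an illustration only: bin_count is a hypothetical helper, and bins are assumed to start at a multiple of the bin width, which may differ from the alignment a particular software package chooses.

```python
# Final score differences (home minus visitor), last week of 2006, from the lesson.
DATA_2006 = [
    16, 13, 11, 1, 19, 5, 2, 23, -7, 8, 10,
    15, -13, -7, 6, 23, -16, -25, 6, -13, 15, 25,
    3, 2, 14, 23, 9, 10, -1, 10, 26, 9, -10,
    19, 10, 22, -3, -10, 1, 7, 14, -11, 6, 17,
    8, 29, 23, 10, -4, -7, 2, 10, 10, 14, 6,
]


def bin_count(data, width):
    """Number of bins of the given width needed to cover all of the data."""
    first = width * (min(data) // width)  # lower edge of the lowest bin
    last = width * (max(data) // width)   # lower edge of the highest bin
    return (last - first) // width + 1


for width in (10, 5, 2):
    n_bins = bin_count(DATA_2006, width)
    verdict = "within" if 5 <= n_bins <= 15 else "outside"
    print(f"width {width:>2}: {n_bins:>2} bins ({verdict} the 5-to-15 guideline)")
```

Under these assumptions, widths of 10 and 5 give 6 and 11 bins, both inside the guideline, while a width of 2 gives 28 bins, which is likely to exaggerate peaks and valleys.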
Here is an example of choosing a wider bin width for the score differences data from the last week of 2006:

[Figure: Histogram of 2006 with wider bins. Vertical axis: Frequency (0 to 20); horizontal axis: 2006 final score difference (–30 to 30).]

Here is an example of using narrow bin widths in the histogram:

[Figure: Histogram of 2006 with narrow bins. Vertical axis: Frequency (0 to 9); horizontal axis: 2006 final score difference (–26 to 30).]

Homework [Student Handout]

(1) Consider the NBA data.

(a) Use technology to construct a frequency histogram for the Home Team–Visiting Team differences in the first week of 2007, after the league changed back to the traditional leather ball.

(b) Use technology to construct a relative frequency histogram for the differences from the first week of 2007.

(c) Describe the important features of the distribution of differences from the first week of 2007.

(d) Compare the histogram for 2007 to that for 2006. To do this effectively, you must use the same bin size for both data sets. Did you use frequency or relative frequency in comparing the two histograms? Explain your answer.
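For instructors preparing an answer to Question 1d, one way to put both weeks on the same bins is sketched below. This is a hedged illustration, not the lesson's prescribed method: the helper relative_frequencies and the shared bin edges are assumptions, and relative frequency is used because the two weeks have different numbers of games (55 and 47).

```python
# Final score differences (home minus visitor) from the lesson's data tables.
DATA_2006 = [
    16, 13, 11, 1, 19, 5, 2, 23, -7, 8, 10,
    15, -13, -7, 6, 23, -16, -25, 6, -13, 15, 25,
    3, 2, 14, 23, 9, 10, -1, 10, 26, 9, -10,
    19, 10, 22, -3, -10, 1, 7, 14, -11, 6, 17,
    8, 29, 23, 10, -4, -7, 2, 10, 10, 14, 6,
]
DATA_2007 = [
    -6, -8, -1, 4, 24, -11, -8, 5, 12, -3, 7, 9,
    -15, 2, -18, -2, 11, 3, 9, -24, -4, 14, 19, -9,
    -9, 2, 5, 32, 28, -5, -18, 13, 11, 12, 17, 5,
    -12, 4, -7, -5, 3, -14, 4, 8, 23, -3, 5,
]

WIDTH = 5  # same bin size for both data sets


def relative_frequencies(data, edges, width=WIDTH):
    """Share of the data in each bin, keyed by lower bin edge."""
    counts = {lo: 0 for lo in edges}
    for value in data:
        counts[width * (value // width)] += 1
    return {lo: count / len(data) for lo, count in counts.items()}


# Build one set of bin edges that covers both weeks.
combined = DATA_2006 + DATA_2007
first = WIDTH * (min(combined) // WIDTH)
last = WIDTH * (max(combined) // WIDTH)
edges = range(first, last + 1, WIDTH)

rel_2006 = relative_frequencies(DATA_2006, edges)
rel_2007 = relative_frequencies(DATA_2007, edges)

print("Range          2006     2007")
for lo in edges:
    print(f"{lo:>4} to {lo + WIDTH - 1:>3}  {rel_2006[lo]:>6.1%}  {rel_2007[lo]:>7.1%}")
```

Printing the two columns side by side makes it easier to see in which bins each week has the larger share of games.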
(Note: Question 1d is an optional activity that presents an opportunity to foreshadow the next lesson. This could be graded for completeness rather than correctness.)

Answers to Question 1

(a) Here is a reasonable frequency histogram for the data from the first week of 2007:

[Figure: Histogram of 2007. Vertical axis: Frequency (0 to 9); horizontal axis: 2007 final score difference (–25 to 35).]

(b) Here is a relative frequency histogram for the same data:

[Figure: Histogram of 2007. Vertical axis: Percent (0 to 18); horizontal axis: 2007 final score difference (–25 to 35).]

(c) The distribution of the final score differences for the first week of 2007 is more symmetric than the distribution for the last week of 2006. The center for the 2007 distribution seems to be noticeably lower, located around 5 points. The spread for the 2007 distribution is about the same as that for 2006, at about 60 points. There are no outstanding unusual features in the histograms.

(2) The February 2011 issue of Consumer Reports magazine provides the overall score ratings for 25 different folding treadmill exercise machines.
The following are the overall score ratings:

81 79 76 76 75 74 73 73 73 72 71 71 71 70 70 69 66 66 65 65 64 63 61 61 50

(a) Use technology to construct a frequency histogram for the overall score ratings for folding treadmill machines.

(b) Use technology to construct a relative frequency histogram for the overall score ratings for folding treadmill machines.

(c) Describe the important features of the overall score ratings for folding treadmill machines.

Answers to Question 2

(a) Here is a reasonable frequency histogram for the overall score ratings for folding treadmill machines:

[Figure: Histogram of Overall Score. Vertical axis: Frequency (0 to 6); horizontal axis: Overall Score (48 to 84).]

(b) Here is a reasonable relative frequency histogram for the overall score ratings for folding treadmill machines:

[Figure: Relative Frequency Histogram of Overall Score. Vertical axis: Relative Frequency Percentage (0 to 25); horizontal axis: Overall Score (48 to 84).]

(c) The shape of the distribution is roughly symmetric. The center is at about 72. The spread is represented by the range of approximately 36 units, from 48 to 84. There is an outlier at 50. Note: Students have not yet been exposed to the term outlier. While they may not use this specific term, they may attempt to describe it in their own words.

Statway Student Handout April 16, 2012 (Full Version 1.0)
Supporting Lesson 2.1.2: Constructing Histograms for Quantitative Data

Constructing Histograms for a Single Quantitative Data Set

In Lesson 2.1.1, you analyzed data from the 2006–2007 National Basketball Association season, when the league changed to a new synthetic basketball. The NBA responded to pressure from the players by changing back to the traditional leather basketball on January 1, 2007.
You also examined whether the change back to the traditional ball seemed to be associated with differences in the distribution of total points scored by the two teams in the games during the last week of 2006 and the first week of 2007.

In this lesson, you will consider the distributions of a different variable. It is possible that the change in basketballs might have been associated with a change in the difference in points scored by the two teams.

There seems to be a clear home-court advantage in the NBA, so for this lesson the difference in the teams’ scores will be calculated as the Home Team’s Score minus the Visiting Team’s Score. If the final score of a game is Home Team: 110, Visiting Team: 90, then the final score difference is 110 – 90 = 20. If the final score of a game is Home Team: 88, Visiting Team: 100, then the final score difference is 88 – 100 = –12. The following tables show the differences in scores for the games during the last week of 2006 and the first week of 2007.

2006 Data
16 13 11 1 19 5 2 23 –7 8 10 15 –13 –7 6 23 –16 –25 6 –13 15 25 3 2 14 23 9 10 –1 10 26 9 –10 19 10 22 –3 –10 1 7 14 –11 6 17 8 29 23 10 –4 –7 2 10 10 14 6

2007 Data
–6 –8 –1 4 24 –11 –8 5 12 –3 7 9 –15 2 –18 –2 11 3 9 –24 –4 14 19 –9 –9 2 5 32 28 –5 –18 13 11 12 17 5 –12 4 –7 –5 3 –14 4 8 23 –3 5

(1) Based only on a visual examination of the data values, what is your impression of whether the distributions of final score differences vary between the last week of 2006 and the first week of 2007? Explain your reasoning.
(2) A table of bins will help you group the final score differences into intervals of five points each (for example, 5 to 9 points, 10 to 14 points, and so on). The bins start with the lowest final score difference of –25 and end with the highest final score difference of 29. The bins are shown in the following table:

Last Week of 2006 Season Only

Range         Tally    Frequency   Relative Frequency
–25 to –21    |        1           1/55 = 0.018 = 1.8%
–20 to –16
–15 to –11
–10 to –6     ||||     5
–5 to –1
0 to 4
5 to 9
10 to 14
15 to 19      |
20 to 24
25 to 29

For each value in the 2006 Data table, determine the bin it falls into. For example, the first final score difference is 16, so it belongs in the bin that represents the range 15 to 19. A tally mark (|) is placed in the Tally column next to the range 15 to 19, as shown in the table. Continue doing this for all values in the 2006 Data table. Each time a tally reaches five marks, draw the fifth mark horizontally across the first four (shown here as ||||), as in the bin (range) –10 to –6.

(3) The frequency is equal to the number of tally marks for each bin (range). Convert the tallies to numbers that represent the frequencies for each bin. For example, the bin for –10 to –6 has five tally marks, so the frequency is 5, as shown in the table.
(4) The Frequency column is used to construct a frequency histogram. Draw each bar so that its height equals the frequency of its bin.

(5) Look at the bars of the histogram carefully. What does the height of each bar represent? How can the bars help you compare the tallies of the various bins?

(6) The relative frequency is the proportion of the total number of observations that fall in each range of the table. There were 55 games played in the final week of 2006, so each relative frequency value is found by dividing the corresponding frequency by 55. Enter the resulting proportions into the column labeled Relative Frequency. The first relative frequency has already been calculated: 1/55 = 0.018 = 1.8% in the bin labeled –25 to –21, since there is a frequency of 1 for that bin. Now, complete the table by calculating the relative frequencies for the remaining bins (ranges).
(7) A relative frequency histogram represents the relative frequencies for the bins instead of the frequencies. In other words, it shows the percentage of all final score differences that fall in each bin. Your instructor handed out copies of the frequency histogram for Question 4 and a relative frequency histogram for the final score differences in the last week of 2006. The relative frequency in this histogram is reported as a percentage instead of as a decimal. Do you notice any similarities between the two histograms?

(8) The following questions relate to important features of a graph such as a histogram.

(a) How would you describe the shape of the distribution of final score differences in the last week of 2006? Is it symmetrical or does it lean in one direction?

(b) Estimate the value for the center of the distribution.

(c) Report a value for the range (or spread) of the distribution.

Homework

(1) Consider the NBA data.

(a) Use technology to construct a frequency histogram for the Home Team–Visiting Team differences in the first week of 2007, after the league changed back to the traditional leather ball.
(b) Use technology to construct a relative frequency histogram for the differences from the first week of 2007.

(c) Describe the important features of the distribution of differences from the first week of 2007.

(d) Compare the histogram for 2007 to that for 2006. To do this effectively, you must use the same bin size for both data sets. Did you use frequency or relative frequency in comparing the two histograms? Explain your answer.

(2) The February 2011 issue of Consumer Reports magazine provides the overall score ratings for 25 different folding treadmill exercise machines. The following are the overall score ratings:

81 79 76 76 75 74 73 73 73 72 71 71 71 70 70 69 66 66 65 65 64 63 61 61 50

(a) Use technology to construct a frequency histogram for the overall score ratings for folding treadmill machines.

(b) Use technology to construct a relative frequency histogram for the overall score ratings for folding treadmill machines.

(c) Describe the important features of the overall score ratings for folding treadmill machines.

Statway Instructor’s Notes April 16, 2012 (Full Version 1.0)
Supporting Lesson 2.1.3: Comparing Distributions of Quantitative Data in Two Independent Samples

Estimated number of 50-minute class sessions: 0.5

Materials Required
Appropriate statistical software or copies of output

Learning Goals
Students will understand that
• distributions are most easily compared when graphs are presented side by side and on the same scale.
• comparisons can be used to relate the important features in two or more graphs of quantitative data.
Students will be able to
• use technology to construct side-by-side graphs on the same scale for two or more groups.
• describe and compare the important characteristics of two or more graphs in context.

Instructor Notes
In this lesson, students should develop a facility with constructing comparative graphs for two independent groups of quantitative data using technology. The graphs should be side by side and on the same scale so that visual comparisons are easy. You want students to develop the habit of examining and comparing the important features of distributions of quantitative data, including shape, center, spread, and unusual features. Comparisons should use words like greater, smaller, and same; lists of important characteristics are not sufficient. Relative frequency histograms allow comparisons across groups with different numbers of observations.
Converting to a percentage scale facilitates visual comparisons of the two distributions without the distraction caused by different sample sizes. In subsequent lessons, students will use statistics to quantify the characteristics of distributions of quantitative information. To address the comparison of statistics between two groups, they will learn to perform statistical tests and compute confidence intervals.

Comparing Distributions in Two Samples

Introduction to the Context of the Task [Student Handout]

(Note: The shapes of the two following distributions differ, but otherwise the main feature is a clear shift toward a lower distribution center in the 2007 final score differences. It is not clear whether this change is due to chance, the change back to the traditional leather ball, or some other factor.)

In Lesson 2.1.1, you analyzed data from the 2006–2007 NBA season, when the league changed to a new synthetic basketball. The league responded to pressure from the players by changing back to the traditional leather basketball on January 1, 2007. In that lesson, you examined whether the change seemed to be associated with differences in the distribution of total points scored by the two teams in games during the last week of 2006 and first week of 2007.
For this lesson, the final score differences for those games are displayed below:

2006 Final Score Differences
16 13 11 1 19 5 2 23 –7 8 10 15 –13 –7 6 23 –16 –25 6 –13 15 25 3 2 14 23 9 10 –1 10 26 9 –10 19 10 22 –3 –10 1 7 14 –11 6 17 8 29 23 10 –4 –7 2 10 10 14 6

2007 Final Score Differences
–6 –8 –1 4 24 –11 –8 5 12 –3 7 9 –15 2 –18 –2 11 3 9 –24 –4 14 19 –9 –9 2 5 32 28 –5 –18 13 11 12 17 5 –12 4 –7 –5 3 –14 4 8 23 –3 5

The fundamental question is whether the change back to the traditional basketball is associated with a change in the final score differences.

Activities

(1) Use technology to construct side-by-side dotplots on the same scale that can be used to compare the distributions of final score differences in the last week of 2006 and the first week of 2007.

[Figure: Dotplot of Difference vs Year. Rows: 2006, 2007; horizontal axis: Difference (–24 to 32).]

(2) Use technology to construct side-by-side frequency histograms and relative frequency histograms on the same scale to compare the distributions of the final score differences in the two years.
Do your visual impressions of the comparisons of the distributions between the two years change depending on whether you use frequency histograms or relative frequency histograms? Explain your reasoning.

[Figure: Histogram of 2006, 2007 (side-by-side panels). Vertical axis: Frequency; horizontal axis: final score difference (–20 to 30).]

[Figure: Histogram of 2006, 2007 (side-by-side panels). Vertical axis: Relative Frequency Percentage; horizontal axis: final score difference (–20 to 30).]

Wrap-Up/Direct Instruction

Have students form small groups to describe the shape, center, spread, and other unusual features of the previous histograms. Ask students if it matters that the graphs they are comparing are on the same axes/scale and side by side. Elicit student group answers, guide discussion, and reach a consensus with the class about the following features:
• Shapes: Neither graph is perfectly symmetric. The 2006 data are slightly skewed toward lower values and the 2007 data toward higher values.
• Centers: The center for the 2006 data is about 10 points, and the center for the 2007 data is slightly lower, at about 5 points.
• Dispersion (Spread): Both groups spread over approximately the same ranges of values (about –25 to about 30).
The dispersions are about equal. Unusual Features: Other than the skew of the two distributions, there are no glaring unusual features. Homework [Student Handout] The February 2011 issue of Consumer Reports magazine provides the Overall Score ratings for exercise treadmills. There are two treadmill categories: nonfolding and folding. Here are the Overall Score ratings for these two types of treadmill: Nonfolding Treadmills 85 84 83 82 81 78 78 69 65 60 Folding Treadmills 81 79 76 76 75 74 73 73 73 72 71 71 71 70 70 69 66 66 65 65 64 63 61 61 50 (1) Use technology to construct side-‐by-‐side graphs to compare the distributions of the Overall Score ratings for the two types of treadmills. (2) Describe and compare the important features of the distributions of the Overall Score ratings for the two types of treadmills. (3) Suppose your friend is going to purchase a treadmill, but cannot decide whether to purchase a nonfolding or folding model. Regardless of the price, which type would you recommend for the highest quality? What evidence do you have to recommend one type over the other? Be specific. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 
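As one possible way to carry out the "use technology" step in homework part (1), the sketch below draws text-only side-by-side dotplots of the two treadmill rating lists on a shared scale. It is a minimal illustration, not the software used to produce the lesson's figures; the function name `text_dotplot` and the variable names are hypothetical, and only the Python standard library is assumed.

```python
from collections import Counter

# Overall Score ratings copied from the homework statement.
nonfolding = [85, 84, 83, 82, 81, 78, 78, 69, 65, 60]
folding = [81, 79, 76, 76, 75, 74, 73, 73, 73, 72, 71, 71, 71,
           70, 70, 69, 66, 66, 65, 65, 64, 63, 61, 61, 50]


def text_dotplot(values, lo, hi):
    """Return the rows (top to bottom) of a stacked dotplot covering lo..hi."""
    counts = Counter(values)
    height = max(counts.values())
    rows = []
    for level in range(height, 0, -1):
        # Put a dot in a column when that rating occurs at least `level` times.
        rows.append("".join("o" if counts[v] >= level else " "
                            for v in range(lo, hi + 1)))
    return rows


# Use one shared scale so the two plots can be compared directly.
lo = min(nonfolding + folding)
hi = max(nonfolding + folding)

for name, data in [("Nonfolding", nonfolding), ("Folding", folding)]:
    print(name)
    for row in text_dotplot(data, lo, hi):
        print(row)
    print("-" * (hi - lo + 1))
print(f"Rating scale: {lo} (left) to {hi} (right)")
```

Because both plots use the same `lo` and `hi`, a column in the nonfolding plot lines up with the same rating in the folding plot, which is the property the homework asks students to exploit when comparing center and spread.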
Answers

(1) Here are side-by-side dotplots, frequency histograms, and relative frequency histograms comparing the distributions of Overall Score ratings for the two types of treadmill:

[Figure: Dotplot of Rating vs Type. Side-by-side dotplots for Folding and Nonfolding; horizontal axis: Rating, from 50 to 85.]

[Figure: Histogram of Rating, panel variable: Type. Frequency histograms for Folding and Nonfolding; horizontal axis: Rating, from 50 to 90; vertical axis: Frequency.]

[Figure: Histogram of Rating, panel variable: Type. Relative frequency histograms for Folding and Nonfolding; horizontal axis: Rating, from 50 to 90; vertical axis: Relative Frequency Percentage.]

(2) Both distributions are relatively symmetric. The center for folding treadmills is about 75, lower than the center for nonfolding treadmills, which is closer to 80. The spread for the nonfolding treadmills seems smaller than the spread for folding treadmills. There is a potential outlier in the folding treadmill group and a gap near 70 in the nonfolding treadmill group.

(3) It seems that, as a group, nonfolding treadmills have greater Overall Score ratings. The center for nonfolding treadmills is about five points higher than the center for folding treadmills.
The ratings for nonfolding treadmills are also slightly more consistent; they have a smaller dispersion than the ratings for the folding treadmills. If price is not an issue, then students should recommend that their friend purchase a nonfolding treadmill.

Statway Student Handout April 16, 2012 (Full Version 1.0)
Supporting Lesson 2.1.3: Comparing Distributions of Quantitative Data in Two Independent Samples

In Lesson 2.1.1, you analyzed data from the 2006–2007 NBA season, when the league changed to a new synthetic basketball. The league responded to pressure from the players by changing back to the traditional leather basketball on January 1, 2007. In that lesson, you examined whether the change seemed to be associated with differences in the distribution of total points scored by the two teams in games during the last week of 2006 and first week of 2007.
For this lesson, the final score differences for those games are displayed below:

2006 Final Score Differences
16 13 11 1 19 5 2 23 –7 8 10 15 –13 –7 6 23 –16 –25 6 –13 15 25 3 2 14 23 9 10 –1 10 26 9 –10 19 10 22 –3 –10 1 7 14 –11 6 17 8 29 23 10 –4 –7 2 10 10 14 6

2007 Final Score Differences
–6 –8 –1 4 24 –11 –8 5 12 –3 7 9 –15 2 –18 –2 11 3 9 –24 –4 14 19 –9 –9 2 5 32 28 –5 –18 13 11 12 17 5 –12 4 –7 –5 3 –14 4 8 23 –3 5

The fundamental question is whether the change back to the traditional basketball is associated with a change in the final score differences.

(1) Use technology to construct side-by-side dotplots on the same scale that can be used to compare the distributions of final score differences in the last week of 2006 and the first week of 2007.

(2) Use technology to construct side-by-side frequency histograms and relative frequency histograms on the same scale to compare the distributions of the final score differences in the two years. Do your visual impressions of the comparisons of the distributions between the two years change depending on whether you use frequency histograms or relative frequency histograms? Explain your reasoning.
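One way to see what separates a frequency histogram from a relative frequency histogram is to tabulate both for the same bins. The sketch below does this for the two lists of final score differences given above. The bin edges (width 10, starting at -30) and all function and constant names are assumptions chosen for illustration, not the settings used for the lesson's figures. As listed, the two samples contain different numbers of games, so relative frequencies put them on a common footing.

```python
# Final score differences copied from the lesson's two data lists.
diff_2006 = [16, 13, 11, 1, 19, 5, 2, 23, -7, 8, 10, 15, -13, -7, 6, 23, -16,
             -25, 6, -13, 15, 25, 3, 2, 14, 23, 9, 10, -1, 10, 26, 9, -10, 19,
             10, 22, -3, -10, 1, 7, 14, -11, 6, 17, 8, 29, 23, 10, -4, -7, 2,
             10, 10, 14, 6]
diff_2007 = [-6, -8, -1, 4, 24, -11, -8, 5, 12, -3, 7, 9, -15, 2, -18, -2, 11,
             3, 9, -24, -4, 14, 19, -9, -9, 2, 5, 32, 28, -5, -18, 13, 11, 12,
             17, 5, -12, 4, -7, -5, 3, -14, 4, 8, 23, -3, 5]

# Hypothetical binning choice: bins [-30, -20), [-20, -10), ..., [30, 40).
START, WIDTH, N_BINS = -30, 10, 7


def frequencies(values):
    """Count how many values fall in each bin."""
    counts = [0] * N_BINS
    for x in values:
        counts[(x - START) // WIDTH] += 1
    return counts


def relative_frequencies(values):
    """Divide each bin count by the sample size."""
    n = len(values)
    return [c / n for c in frequencies(values)]


for year, data in [(2006, diff_2006), (2007, diff_2007)]:
    print(year, "n =", len(data))
    pairs = zip(frequencies(data), relative_frequencies(data))
    for i, (freq, rel) in enumerate(pairs):
        left = START + i * WIDTH
        print(f"  [{left:>4}, {left + WIDTH:>4})  freq={freq:>2}  rel={rel:.3f}")
```

The same bins are used for both years, so the two columns of counts (or proportions) correspond to histograms drawn on the same scale.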
Homework

The February 2011 issue of Consumer Reports magazine provides the Overall Score ratings for exercise treadmills. There are two treadmill categories: nonfolding and folding. Here are the Overall Score ratings for these two types of treadmill:

Nonfolding Treadmills
85 84 83 82 81 78 78 69 65 60

Folding Treadmills
81 79 76 76 75 74 73 73 73 72 71 71 71 70 70 69 66 66 65 65 64 63 61 61 50

(1) Use technology to construct side-by-side graphs to compare the distributions of the Overall Score ratings for the two types of treadmills.

(2) Describe and compare the important features of the distributions of the Overall Score ratings for the two types of treadmills.

(3) Suppose your friend is going to purchase a treadmill, but cannot decide whether to purchase a nonfolding or folding model. Regardless of the price, which type would you recommend for the highest quality? What evidence do you have to recommend one type over the other? Be specific.

Statway Instructor’s Notes April 16, 2012 (Full Version 1.0)
Initiating Lesson 2.2.1: Quantifying the Center of a Distribution—Sample Mean and Sample Median

Estimated number of 50-minute class sessions: 1

Learning Goals

Students will begin to understand that
• the center of a data set is an important and commonly studied characteristic.
• there is more than one way to quantify the center. Two commonly used measures are the sample mean and sample median.
• the characteristics of a data set affect which measure of center (median or mean) is most appropriate.

Students will begin to be able to
• approximate the values of the sample mean and sample median by visually examining a graph.
• compute the values of the sample mean and sample median for a given collection of data, by hand or using technology.
• recognize the advantages and disadvantages of using the sample mean or sample median in different circumstances (i.e., given the characteristics of a given data set).

This lesson introduces students to the notions of sample mean and sample median through visual examination and consideration of data and distributions. Subsequent lessons formally define the sample mean and sample median and consider how their properties are similar and different.

Part I: Rich Task [Student Handout]

(Note: Students work in small groups and share their results with the class, with the instructor facilitating discussion.)

(1) The following dotplots show the monthly normal temperatures for two U.S. cities: St. Louis, Missouri, and San Francisco, California.

(a) By visually examining the dotplot only (do not attempt to compute anything!), select a single temperature value that represents the typical normal temperature for St. Louis. Do the same for San Francisco. How do these two values compare? In other words, are they close in value?

(Note: Monitor student groups to ensure that they are seeking values that describe the center of the distribution, either the median or mean. At this point, they are not attempting to compute either value [i.e., they are relying only on visual inspection].)

(b) Suppose you have a friend who is going to spend a week in San Francisco in January, when the normal temperature is about 52°F. The only piece of information that your friend has about the weather in San Francisco is the representative temperature value that you selected. Will your friend be able to pack clothes that are appropriate for the weather?

(Note: Suppose a student selects a representative value of 57°F. Yes, since the difference between the representative temperature selected by the student and the monthly temperature is 57°F – 52°F = 5°F, the information the friend has will enable him or her to pack appropriate clothes.)
(c) What if your friend were going to St. Louis in January, when the normal temperature is about 29°F? Will your friend be able to pack clothes that are appropriate for the weather?

(Note: Suppose a student selects a representative value of 57°F. No, since the difference between the representative value (mean or median) and the monthly temperature is 57°F – 29°F = 28°F, the information the friend has will not enable him or her to pack appropriate clothes.)

(d) What do your answers tell you about the representative temperature value for each city? How representative are they really? Would it be helpful to have another way of describing a temperature data set? Can you think of a way to do this?

(Note: Facilitate class discussion toward the idea of examining the maximum and minimum values and the width of the range between them. This will later help students understand the concept of spread of a distribution.)

Part II: Conceptual Tasks [Student Handout]

Estimating the Center of a Data Set by Inspection of Numerical Data

(Note: Ask students to consider the following distributions from two different contexts. Ask individuals or groups of students to consider what numerical value is representative of each distribution. Ask students to describe how the selected representative value was determined.)

(2) The following are the weight gains (in grams) for a sample of six normal adolescent laboratory rats over a one-month period:

169 154 179 202 197 175

How much weight do normal adolescent laboratory rats typically gain in one month? Explain how you chose the value you report.

(3) The following table contains the number of runs scored during the 2010 Major League Baseball season for American League teams:

Team                  Runs
Baltimore Orioles     613
Boston Red Sox        818
Chicago White Sox     752
Cleveland Indians     646
Detroit Tigers        751
Kansas City Royals    676
Los Angeles Angels    681
Minnesota Twins       781
New York Yankees      859
Oakland Athletics     663
Seattle Mariners      513
Tampa Bay Rays        802
Texas Rangers         787
Toronto Blue Jays     755

How many runs per season would you expect a typical American League team to score? Explain your reasoning.

Estimating the Center of a Data Set in the Context of Comparisons of Data Sets/Graphics

(Note: Given a variety of real and simulated distributions, students will approximate the values of the sample mean and sample median by visual examination. They will encounter skewed and censored distributions, in which the values of the sample mean and sample median are substantially different. Students will understand that the values of the sample mean and sample median are similar for symmetric distributions. They will explore comparisons between distributions in the context of addressing a research question.)

(4) Here are the weight gains (in grams) for a sample of six normal adolescent laboratory rats over a one-month period:

169 154 179 202 197 175

And here are the weight gains (in grams) for a sample of six adolescent laboratory rats that were given a high daily dose of a stimulant drug:

137 158 153 147 168 147

Use these values to compare the estimated sample mean and estimated sample median weight gains for adolescent rats that are not receiving any medication and those that are receiving a high daily dose of a stimulant. Do you think that the stimulant has any effect on monthly weight gain? Explain your reasoning.

(5) The following are side-by-side histograms for the numbers of runs scored by Major League Baseball teams in 2010. The American League (AL) teams and the National League (NL) teams are represented in separate histograms.

[Figure: Histogram of Runs, panel variable: League. Frequency histograms for AL and NL; horizontal axis: Runs, from 500 to 900; vertical axis: Frequency, from 0 to 6.]

(a) By only examining the graphs, estimate the sample mean (balance point) for the number of runs scored by AL teams in 2010. Estimate the sample mean number of runs scored by NL teams in 2010.

(b) By only examining the graphs, estimate the sample median (middle value) for the number of runs scored by AL teams in 2010. Estimate the sample median number of runs scored by NL teams in 2010.

(Note about how estimation should occur: Students may estimate visually only or numerically from the histogram, for example by treating each observation in a bin as if it fell at the midpoint of that bin.)

(c) Are the values for the sample mean and sample median you estimated for AL teams similar or different?
Are the sample mean and sample median values you estimated for NL teams similar or different? Explain your reasoning.

(d) The American League uses a designated hitter, which many fans believe leads to greater offense and more runs scored, while the National League does not. (The designated hitter is a player who bats in place of the pitcher, who is usually a weak hitter.) Does the evidence you gathered from the sample means and sample medians indicate that there is a difference in runs scored between the American League and National League? How large does the difference seem to be?

Reflection [Student Handout]

How did you estimate the sample mean and sample median from the graphs? What is the definition of the sample mean? What is the definition of the sample median? How do these values represent a distribution of quantitative data?

The exploration you have conducted in this lesson introduces the notions of sample mean and sample median through visual examination and consideration of data and distributions. In the next lessons, you will define the sample mean and sample median and consider how their properties are similar and different.
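For instructors who want computed values to set beside the groups' visual estimates, the sketch below calculates a sample mean (sum divided by count) and a sample median (middle of the sorted values) for the American League runs in task (3). It assumes the usual convention of averaging the two middle values when the count is even; the function names `sample_mean` and `sample_median` are hypothetical, and the formal definitions are left to the later lessons.

```python
# Runs scored by American League teams in 2010, copied from the table in task (3).
al_runs = {
    "Baltimore Orioles": 613, "Boston Red Sox": 818, "Chicago White Sox": 752,
    "Cleveland Indians": 646, "Detroit Tigers": 751, "Kansas City Royals": 676,
    "Los Angeles Angels": 681, "Minnesota Twins": 781, "New York Yankees": 859,
    "Oakland Athletics": 663, "Seattle Mariners": 513, "Tampa Bay Rays": 802,
    "Texas Rangers": 787, "Toronto Blue Jays": 755,
}


def sample_mean(values):
    """Add up the values and divide by how many there are."""
    values = list(values)
    return sum(values) / len(values)


def sample_median(values):
    """Sort the values and take the middle one (or average the middle two)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


runs = list(al_runs.values())
print(f"n = {len(runs)}")
print(f"sample mean   = {sample_mean(runs):.1f}")
print(f"sample median = {sample_median(runs):.1f}")
```

Comparing the two printed values with the students' estimates can prompt discussion of why a low value such as Seattle's 513 pulls the mean more than the median.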
Homework [Student Handout]

(1) On January 1, 2007, the National Basketball Association (NBA) changed from using a synthetic basketball back to the traditional leather ball. The following side-by-side dotplots represent the distributions of total points scored by both teams in NBA basketball games played during the last week of 2006 (with the synthetic ball) and the first week of 2007 (with the leather ball):

[Figure: Dotplot of Difference vs Year. Side-by-side dotplots for 2006 and 2007; horizontal axis: Difference, from –24 to 32.]

(a) By visually examining the dotplots, estimate the sample mean (average) total number of points scored by both teams for games played during the last week of 2006. Estimate the sample mean total number of points scored by both teams during the first week of 2007.

(b) Compare the values of your estimated sample means. Does it seem like changing back to the traditional basketball made a difference in the sample means of total points scored? Explain your reasoning.

(c) Now estimate the sample median total numbers of points scored by both teams for games played in the last week of 2006 and then for the first week of 2007.

(d) Continue to compare the values of your estimated sample medians.
Does it seem like changing back to the traditional basketball made a difference in the sample medians of total points scored? Explain your reasoning.

(e) Does the evidence contained in the sample mean and sample median values you estimated indicate that there were differences in points scored using the synthetic and traditional leather basketballs? Why or why not?

(2) The accompanying histogram displays the distribution of the ages of the 30 women appearing on the 15th season (2010) of the television show The Bachelor.

[Figure: Histogram of Age. Horizontal axis: Age, with tick marks from 23.5 to 31.5; vertical axis: Frequency, from 0 to 6.]

(a) Based on your visual inspection of this histogram, report a representative value for the ages of women on The Bachelor.

(b) Why did you choose the value you reported? Is it the sample mean, sample median, or some other value? Explain why you think it best represents the ages in the distribution.

Statway Student Handout April 16, 2012 (Full Version 1.0)
Initiating Lesson 2.2.1: Quantifying the Center of a Distribution—Sample Mean and Sample Median

Part I

(1) The following dotplots show the monthly normal temperatures for two U.S. cities: St. Louis, Missouri, and San Francisco, California.

(a) By visually examining the dotplot only (do not attempt to compute anything!), select a single temperature value that represents the typical normal temperature for St. Louis. Do the same for San Francisco.
How do these two values compare? In other words, are they close in value?

(b) Suppose you have a friend who is going to spend a week in San Francisco in January, when the normal temperature is about 52°F. The only piece of information that your friend has about the weather in San Francisco is the representative temperature value that you selected. Will your friend be able to pack clothes that are appropriate for the weather?

(c) What if your friend were going to St. Louis in January, when the normal temperature is about 29°F? Will your friend be able to pack clothes that are appropriate for the weather?

(d) What do your answers tell you about the representative temperature value for each city? How representative are they really? Would it be helpful to have another way of describing a temperature data set? Can you think of a way to do this?

Part II

Estimating the Center of a Data Set by Inspection of Numerical Data

(2) The following are the weight gains (in grams) for a sample of six normal adolescent laboratory rats over a one-month period:

169 154 179 202 197 175

How much weight do normal adolescent laboratory rats typically gain in one month? Explain how you chose the value you report.

(3) The following table contains the number of runs scored during the 2010 Major League Baseball season for American League teams:

Team                  Runs
Baltimore Orioles     613
Boston Red Sox        818
Chicago White Sox     752
Cleveland Indians     646
Detroit Tigers        751
Kansas City Royals    676
Los Angeles Angels    681
Minnesota Twins       781
New York Yankees      859
Oakland Athletics     663
Seattle Mariners      513
Tampa Bay Rays        802
Texas Rangers         787
Toronto Blue Jays     755

How many runs per season would you expect a typical American League team to score? Explain your reasoning.
Estimating the Center of a Data Set in the Context of Comparisons of Data Sets/Graphics

(4) Here are the weight gains (in grams) for a sample of six normal adolescent laboratory rats over a one-month period:

169 154 179 202 197 175

And here are the weight gains (in grams) for a sample of six adolescent laboratory rats that were given a high daily dose of a stimulant drug:

137 158 153 147 168 147

Use these values to compare the estimated sample mean and estimated sample median weight gains for adolescent rats that are not receiving any medication and those that are receiving a high daily dose of a stimulant. Do you think that the stimulant has any effect on monthly weight gain? Explain your reasoning.

(5) The following are side-by-side histograms for the numbers of runs scored by Major League Baseball teams in 2010. The American League (AL) teams and the National League (NL) teams are represented in separate histograms.
[Figure: Histogram of Runs, panel variable: League. Frequency histograms for AL and NL; horizontal axis: Runs, from 500 to 900; vertical axis: Frequency, from 0 to 6.]

(a) By only examining the graphs, estimate the sample mean (balance point) for the number of runs scored by AL teams in 2010. Estimate the sample mean number of runs scored by NL teams in 2010.

(b) By only examining the graphs, estimate the sample median (middle value) for the number of runs scored by AL teams in 2010. Estimate the sample median number of runs scored by NL teams in 2010.

(c) Are the values for the sample mean and sample median you estimated for AL teams similar or different? Are the sample mean and sample median values you estimated for NL teams similar or different? Explain your reasoning.

(d) The American League uses a designated hitter, which many fans believe leads to greater offense and more runs scored, while the National League does not. (The designated hitter is a player who bats in place of the pitcher, who is usually a weak hitter.) Does the evidence you gathered from the sample means and sample medians indicate that there is a difference in runs scored between the American League and National League? How large does the difference seem to be?
Reflection

How did you estimate the sample mean and sample median from the graphs? What is the definition of the sample mean? What is the definition of the sample median? How do these values represent a distribution of quantitative data?

The exploration you have conducted in this lesson introduces the notions of sample mean and sample median through visual examination and consideration of data and distributions. In the next lessons, you will define the sample mean and sample median and consider how their properties are similar and different.

Homework

(1) On January 1, 2007, the National Basketball Association (NBA) changed from using a synthetic basketball back to the traditional leather ball. The following side-by-side dotplots represent the distributions of total points scored by both teams in NBA basketball games played during the last week of 2006 (with the synthetic ball) and the first week of 2007 (with the leather ball):

[Figure: Dotplot of Difference vs Year, with one row of dots for 2006 and one for 2007. Horizontal axis: Difference, -24 to 32.]

(a) By visually examining the dotplots, estimate the sample mean (average) total number of points scored by both teams for games played during the last week of 2006.
Estimate the sample mean total number of points scored by both teams during the first week of 2007.

(b) Compare the values of your estimated sample means. Does it seem like changing back to the traditional basketball made a difference in the sample means of total points scored? Explain your reasoning.

(c) Now estimate the sample median total numbers of points scored by both teams for games played in the last week of 2006 and then for the first week of 2007.

(d) Continue to compare the values of your estimated sample medians. Does it seem like changing back to the traditional basketball made a difference in the sample medians of total points scored? Explain your reasoning.

(e) Does the evidence contained in the sample mean and sample median values you estimated indicate that there were differences in points scored using the synthetic and traditional leather basketballs? Why or why not?

(2) The accompanying histogram displays the distribution of the ages of the 30 women appearing on the 15th season (2010) of the television show The Bachelor.

[Figure: Histogram of Age. Horizontal axis: Age, 23.5 to 31.5. Vertical axis: Frequency, 0 to 6.]

(a) Based on your visual inspection of this histogram, report a representative value for the ages of women on The Bachelor.

(b) Why did you choose the value you reported? Is it the sample mean, sample median, or some other value? Explain why you think it best represents the ages in the distribution.

Statway Instructor’s Notes April 16, 2012 (Full Version 1.0)
Supporting Lesson 2.2.2: Constructing Histograms for Quantitative Data

Estimated number of 50-minute class sessions: 0.5

Learning Goals

Students will understand that
• the mean and median are both relatively easy to compute by hand.
• the mean and median are not necessarily the same values.

Students will be able to
• compute the mean and median, by hand and by using technology.
• compare the values of the mean and median, and understand why the values might be different.

Activity Summary

Students are provided data in at least two groups on at least one quantitative variable, and they compute the values of the mean and median for each group. Students compare the values in the two groups and discuss which statistic (mean or median) would be used in a report of the results. They write a brief summary in context, reporting and comparing the specific values for at least one of the statistics (mean or median).

Computing the Sample Mean and Sample Median for Quantitative Data [Student Handout]

Introduction to the Context of the Task

One problem in Lesson 2.2.1 dealt with the weight gains (in grams) for a sample of six normal adolescent laboratory rats over a one-month period, accompanied by the weight gains for a sample of six adolescent rats that received a high daily dose of a stimulant drug. Here are the weight gains (in grams) for the two groups:

Control Group (Not Given a Stimulant)    Stimulant Group
169                                      137
154                                      158
179                                      153
202                                      147
197                                      168
175                                      147

To determine whether there might be an effect on weight gain due to the stimulant, you will determine representative values, namely the sample mean and sample median, for the two groups.
Activities

(1) The following histogram represents the distribution of the six weight gain values for the Control group:

[Figure: Histogram of Control. Horizontal axis: Control, 152 to 208. Vertical axis: Frequency, 0.0 to 2.0.]

Where do you think that this histogram balances? That is, where on the number line would you set a balance point so that the distribution does not tilt one way, to the left or to the right?

(2) The sample mean (or average) defines the balance point for the distribution. The sample mean is often denoted by the symbol x̄, which is pronounced “x-bar.” The sample mean is equal to the sum of the data values divided by the number of data values. For the Control group,

x̄ = (169 + 154 + 179 + 202 + 197 + 175)/6 = 179.3 grams

Use the data from the Stimulant group to compute the sample mean (average) for the weight gains. Call the resulting sample mean ȳ.

(3) Compare the values of x̄ and ȳ (the sample means for weight gains in the Control and Stimulant groups). Which sample mean is larger? Is the difference between the two sample means enough for you to believe that the stimulant might have affected weight gain in adolescent rats? Explain your reasoning.
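The sample mean formula in item (2) can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` and the variable names are illustrative choices, not part of the Statway materials, and the data are the Control group values listed in the handout.

```python
# Control group weight gains (grams), as listed in the handout.
control = [169, 154, 179, 202, 197, 175]


def sample_mean(values):
    """Sum of the data values divided by the number of data values."""
    return sum(values) / len(values)


x_bar = sample_mean(control)
print(f"Control group sample mean: {x_bar:.1f} grams")
```

The same function can be applied to the Stimulant group values to obtain ȳ for the comparison in item (3).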
(4) Recall that the weight gains for the adolescent rats in the Control group were the following:

169 154 179 202 197 175

A sample median is the middle of a sorted list of data values. The following is the process for finding the sample median for the Control group values:

• First, sort the data values in order from smallest to largest:
  154 169 175 179 197 202
• Notice that the middle of this list falls between 175 and 179:
  154 169 175 | 179 197 202

This means that the sample median for the Control group, which is denoted Mx, is the average of the two values on either side of the middle, 175 and 179. That is,

Mx = (175 + 179)/2 = 177 grams

(Note: If there is an odd number of data values, the median is the data value that falls in the middle of the sorted list.)

Now, compute the sample median for the Stimulant group, and call it My.

(5) Compare the values of the medians for weight gains in the two groups. Which median is larger? Is the difference between the two sample medians large enough to be important? Or are the two sample medians not substantially different?

(6) Compare the values of the sample mean and sample median for the Control group. Does it matter which statistic you use? Would your conclusions about whether the stimulant affects weight gain change if you used one statistic or the other? Explain your reasoning. Repeat this analysis for the sample mean and sample median for the Stimulant group.

(7) Does the evidence contained in these data values indicate that the distribution of weight gains for rats in the Stimulant group has a different representative value than the distribution of weight gains for rats in the Control group? Explain your reasoning, and refer to the data to support your conclusion.

Homework

Recall the monthly normal temperature data from St. Louis, Missouri, and San Francisco, California. The data values are contained in the following table:

City           Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept.  Oct.  Nov.  Dec.
St. Louis      29.3  33.9  45.1  56.7  66.1  75.4  79.8  77.6  70.2   58.4  46.2  33.9
San Francisco  51.1  54.4  54.9  56.0  56.6  58.4  59.1  60.1  62.3   62.0  57.2  51.7

(1) Compute the sample mean (average) monthly normal temperature values for both St. Louis and San Francisco.

[Answer: The sample mean monthly normal temperature for St. Louis is
x̄ = (29.3 + 33.9 + 45.1 + 56.7 + 66.1 + 75.4 + 79.8 + 77.6 + 70.2 + 58.4 + 46.2 + 33.9)/12 = 672.6/12 ≈ 56°F.
The sample mean monthly normal temperature for San Francisco is
ȳ = (51.1 + 54.4 + 54.9 + 56.0 + 56.6 + 58.4 + 59.1 + 60.1 + 62.3 + 62.0 + 57.2 + 51.7)/12 = 683.8/12 ≈ 57°F.]

(2) Compute the sample median monthly normal temperature values for both St. Louis and San Francisco.

[Answer: The sorted monthly normal temperatures for St. Louis are

29.3 33.9 33.9 45.1 46.2 56.7 | 58.4 66.1 70.2 75.4 77.6 79.8

The sample median falls between the sixth and seventh values, so it is equal to Mx = (56.7 + 58.4)/2 = 57.55°F, or about 57.6°F.
The sorted monthly normal temperatures for San Francisco are

51.1 51.7 54.4 54.9 56.0 56.6 | 57.2 58.4 59.1 60.1 62.0 62.3

The sample median falls between the sixth and seventh values, so it is equal to My = (56.6 + 57.2)/2 = 56.9°F.]

(3) Write a brief comparison of the sample mean and sample median monthly normal temperature values for the two cities.

[Answer: St. Louis and San Francisco are different in many ways, but their representative monthly normal temperatures barely differ at all. The sample mean for St. Louis is 56°F, which is just 1° lower than the sample mean for San Francisco. The sample median for St. Louis is nearly 1° higher than the sample median for San Francisco, 57.6°F and 56.9°F, respectively. It is hard to imagine many other large U.S. cities that are so similar in terms of their representative temperatures.]

Statway Student Handout April 16, 2012 (Full Version 1.0)
Supporting Lesson 2.2.2: Constructing Histograms for Quantitative Data

Computing the Sample Mean and Sample Median for Quantitative Data

One problem in Lesson 2.2.1 dealt with the weight gains (in grams) for a sample of six normal adolescent laboratory rats over a one-month period, accompanied by the weight gains for a sample of six adolescent rats that received a high daily dose of a stimulant drug.
Here are the weight gains (in grams) for the two groups:

Control Group (Not Given a Stimulant)    Stimulant Group
169                                      137
154                                      158
179                                      153
202                                      147
197                                      168
175                                      147

To determine whether there might be an effect on weight gain due to the stimulant, you will determine representative values, namely the sample mean and sample median, for the two groups.

(1) The following histogram represents the distribution of the six weight gain values for the Control group:

[Figure: Histogram of Control. Horizontal axis: Control, 152 to 208. Vertical axis: Frequency, 0.0 to 2.0.]

Where do you think that this histogram balances? That is, where on the number line would you set a balance point so that the distribution does not tilt one way, to the left or to the right?

(2) The sample mean (or average) defines the balance point for the distribution. The sample mean is often denoted by the symbol x̄, which is pronounced “x-bar.” The sample mean is equal to the sum of the data values divided by the number of data values. For the Control group,

x̄ = (169 + 154 + 179 + 202 + 197 + 175)/6 = 179.3 grams

Use the data from the Stimulant group to compute the sample mean (average) for the weight gains. Call the resulting sample mean ȳ.

(3) Compare the values of x̄ and ȳ (the sample means for weight gains in the Control and Stimulant groups). Which sample mean is larger? Is the difference between the two sample means enough for you to believe that the stimulant might have affected weight gain in adolescent rats? Explain your reasoning.

(4) Recall that the weight gains for the adolescent rats in the Control group were the following:

169 154 179 202 197 175

A sample median is the middle of a sorted list of data values. The following is the process for finding the sample median for the Control group values:

• First, sort the data values in order from smallest to largest:
  154 169 175 179 197 202
• Notice that the middle of this list falls between 175 and 179:
  154 169 175 | 179 197 202

This means that the sample median for the Control group, which is denoted Mx, is the average of the two values on either side of the middle, 175 and 179. That is,

Mx = (175 + 179)/2 = 177 grams

(Note: If there is an odd number of data values, the median is the data value that falls in the middle of the sorted list.)

Now, compute the sample median for the Stimulant group, and call it My.

(5) Compare the values of the medians for weight gains in the two groups. Which median is larger? Is the difference between the two sample medians large enough to be important?
Or are the two sample medians not substantially different?

(6) Compare the values of the sample mean and sample median for the Control group. Does it matter which statistic you use? Would your conclusions about whether the stimulant affects weight gain change if you used one statistic or the other? Explain your reasoning. Repeat this analysis for the sample mean and sample median for the Stimulant group.

(7) Does the evidence contained in these data values indicate that the distribution of weight gains for rats in the Stimulant group has a different representative value than the distribution of weight gains for rats in the Control group? Explain your reasoning, and refer to the data to support your conclusion.
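The median procedure from item (4) (sort, find the middle, average the two middle values when the count is even) can be sketched in Python as follows. The function name `sample_median` is an illustrative choice rather than anything defined in the handout, and only the Control group is shown because its median is already worked out in the text.

```python
def sample_median(values):
    """Middle of the sorted list; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the single middle value is the median.
        return ordered[mid]
    # Even number of values: average the two values around the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


control = [169, 154, 179, 202, 197, 175]
m_x = sample_median(control)
print(f"Control group sample median: {m_x} grams")
```

Applying the same function to the Stimulant group values gives My for the comparisons in items (5) through (7).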
Homework

Recall the monthly normal temperature data from St. Louis, Missouri, and San Francisco, California. The data values are contained in the following table:

City           Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept.  Oct.  Nov.  Dec.
St. Louis      29.3  33.9  45.1  56.7  66.1  75.4  79.8  77.6  70.2   58.4  46.2  33.9
San Francisco  51.1  54.4  54.9  56.0  56.6  58.4  59.1  60.1  62.3   62.0  57.2  51.7

(1) Compute the sample mean (average) monthly normal temperature values for both St. Louis and San Francisco.

(2) Compute the sample median monthly normal temperature values for both St. Louis and San Francisco.

(3) Write a brief comparison of the sample mean and sample median monthly normal temperature values for the two cities.

Statway Instructor’s Notes April 16, 2012 (Full Version 1.0)
Initiating Lesson 2.3.1: Quantifying Variability Relative to the Median

Estimated number of 50-minute class sessions: 2

Learning Goals

Students will begin to understand that
• quartiles are based on the notion of medians at the quarter points.
• the five-number summary is a simple way to reduce data in order to capture multiple characteristics.
• boxplots are a graphical representation of the five-number summary.

Students will begin to be able to
• compute the values in the five-number summary by hand or using technology.
• construct side-by-side boxplots on the same scale for two or more groups.

Part I: Rich Task [Student Handout]

Recall the monthly normal temperatures for St. Louis, Missouri, and San Francisco, California, which are presented again in the following table:

Monthly Normal Temperatures (°F) for St. Louis and San Francisco

Month          Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept.  Oct.  Nov.  Dec.
St. Louis      29.3  33.9  45.1  56.7  66.1  75.4  79.8  77.6  70.2   58.4  46.2  33.9
San Francisco  51.1  54.4  54.9  56.0  56.6  58.4  59.1  60.1  62.3   62.0  57.2  51.7

In earlier lessons, you examined representative values for the center of the distributions of the monthly normal temperatures, including the mean and median. In this lesson, you will consider the variability (or spread) in the data.

(1) Examine the dotplots below for the monthly normal temperatures for St. Louis and San Francisco. Write a sentence comparing how variable the monthly normal temperature values are for the cities.

(2) In previous lessons, you explored how to represent and interpret data in graphical displays such as dotplots and histograms. Then you learned to summarize the center of a distribution numerically using measures such as the sample mean and sample median. Reporting a number to represent the distribution is important, but in this example the centers of the distributions are similar and the variabilities are very different. Consider how you might report a single number to represent the variability in the data for St. Louis. What value would you report? Use the data from San Francisco to report a similar single number to represent the variability in the monthly normal temperatures. Do the numbers you determined to represent the variabilities in monthly normal temperatures for St. Louis and San Francisco capture the differences in the distributions? Is the value for St. Louis substantially larger to reflect greater variability?

(3) As individuals or in groups, share the numeric values you used to represent the variabilities in monthly normal temperatures for St. Louis and San Francisco with the rest of the class. Explain why you chose these values to represent variability and how they facilitate the comparison of the variability for the two cities.

Part II: Scaffolded Conceptual Tasks [Student Handout]

(4) One statistic (or number) that can be used to represent the variability in quantitative data is the range. This is the maximum data value minus the minimum data value:

range = maximum – minimum

Report the values of the minimum, maximum, and range in monthly normal temperature for St. Louis. Report the corresponding values for San Francisco. Make a brief comparison of the ranges for the two cities.

One problem with the range is that it is sensitive to outliers and extreme observations. In other words, it only depends on the values at the lower and upper ends of the list of sorted values. These values do not represent the central portion of the data.
It is possible that they are unusual compared to the rest of the distribution. It is also possible that they are reported in error.

Another approach to quantify variability is to find the quartiles, which are the first and third quarter points in the data. Here are the monthly normal temperatures for St. Louis, sorted from smallest to largest.

St. Louis  29.3 33.9 33.9 45.1 46.2 56.7 58.4 66.1 70.2 75.4 77.6 79.8

Consider breaking the data up into quarters. Since there are 12 observations in the St. Louis data, each quarter of the data contains 12/4 = 3 observations. The dividers in the following table illustrate the four quarters of the data.

St. Louis  29.3 33.9 33.9 | 45.1 46.2 56.7 | 58.4 66.1 70.2 | 75.4 77.6 79.8

The median value, which you computed in Lesson 2.2.2, falls in the middle of this list, at the middle divider. The value of the median is (56.7 + 58.4)/2 = 57.55 °F.

The first-quarter point occurs between the values 33.9 and 45.1. This value is called the first (or lower) quartile, and is denoted by Q1. In this example, Q1 = (33.9 + 45.1)/2 = 39.50 °F. Note that the first quartile (Q1) represents the median of the lower half of values in the data set. That is, a median of half of the data set is a quarter point.

The third-quarter point occurs between the values of 70.2 and 75.4 °F. The third (or upper) quartile is denoted by Q3 and is found to be Q3 = (70.2 + 75.4)/2 = 72.80 °F. Once again, the third quartile represents the median of the higher half of values in the data set.
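The "median of each half" description of the quartiles can be sketched in Python as shown below. The function names are illustrative, and the treatment of an odd number of values (leaving the middle value out of both halves) is an assumed convention, since the handout only works an example with an even count. With the sorted value 70.2 from the table, the upper-half median comes out to 72.8.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3), taking Q1 and Q3 as medians of the two halves.

    Assumes at least two values. For an odd count, the middle value is
    excluded from both halves (an assumed convention, not from the handout).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[len(ordered) - half:]
    return median(lower_half), median(ordered), median(upper_half)


st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4,
            79.8, 77.6, 70.2, 58.4, 46.2, 33.9]
q1, med, q3 = quartiles(st_louis)
print(f"Q1 = {q1:.2f}, median = {med:.2f}, Q3 = {q3:.2f}")
```

Statistical software may use slightly different quartile rules, so values from a calculator or package can differ a little from this hand method.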
(5) Together with the median, minimum, and maximum, the quartiles form what is called the five-number summary. Here is the five-number summary for the monthly normal temperatures in St. Louis.

St. Louis
Minimum               29.3
Q1 (first quartile)   39.50
Median                57.55
Q3 (third quartile)   72.80
Maximum               79.8

Use the information for the monthly normal temperatures in San Francisco to construct the five-number summary. Report the values in the following table:

San Francisco
Minimum
Q1 (first quartile)
Median
Q3 (third quartile)
Maximum

(6) An alternative way to quantify variability in numeric data is to find the distance between the quartiles (Q1 and Q3). This value is called the interquartile range and is abbreviated as IQR. The formula is IQR = Q3 – Q1. For the monthly normal temperatures in St. Louis, the IQR is Q3 – Q1 = 72.80 – 39.50 = 33.30 °F.

Report the value of the IQR for the monthly normal temperatures in San Francisco.

(7) Write a comparison of the values of the IQRs for the monthly normal temperatures in St. Louis and San Francisco. Did your conclusion match the conclusion from your visual examination of the data in Question 1? Did you reach the same conclusion as the one you reached when you compared the ranges of the values in Question 4? Explain.
The values in a five-number summary can be represented in a graph called a boxplot (sometimes called a “box-and-whisker plot”). Here is a boxplot for the monthly normal temperatures for St. Louis:

[Figure: Boxplot of St. Louis Monthly Normal Temperature. Vertical axis: 30 to 80. Labeled from top to bottom: Maximum, Q3, Median, Q1, Minimum.]

Each value in the five-number summary is plotted on the y-axis. A box is drawn around the quartiles, a line is drawn through the median, and the minimum and maximum are connected to the box by lines. This display represents the IQR graphically; the IQR is the length of the box. Boxplots also draw the viewer’s attention to the center of the graph, which represents the center of the distribution of the data.

(8) The graph below contains the boxplot for St. Louis. Sketch the boxplot for San Francisco beside the boxplot for St. Louis. Write a brief comparison of the variability (spread) represented in the boxplots of monthly normal temperatures for St. Louis and San Francisco. Compare your conclusion based on the boxplots with the other comparisons you have made in this lesson.
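Before sketching side-by-side boxplots, it can help to tabulate the five values each box is drawn from. The following Python sketch does this for both cities using the standard library's `statistics.median`; the function name `five_number_summary` is an illustrative choice, and the quartiles are taken as medians of the lower and upper halves, as in this lesson (an odd count would leave the middle value out of both halves, which is an assumed convention).

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]


temperatures = {
    "St. Louis": [29.3, 33.9, 45.1, 56.7, 66.1, 75.4,
                  79.8, 77.6, 70.2, 58.4, 46.2, 33.9],
    "San Francisco": [51.1, 54.4, 54.9, 56.0, 56.6, 58.4,
                      59.1, 60.1, 62.3, 62.0, 57.2, 51.7],
}

for city, temps in temperatures.items():
    low, q1, med, q3, high = five_number_summary(temps)
    # The box in a boxplot runs from Q1 to Q3, so its length is the IQR.
    print(f"{city}: min={low} Q1={q1:.2f} median={med:.2f} "
          f"Q3={q3:.2f} max={high} IQR={q3 - q1:.2f}")
```

Each printed row corresponds to one box: the whiskers end at the minimum and maximum, and the box spans Q1 to Q3.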
Reflection/Wrap-Up

The example in this lesson focuses attention on differences in variability because the centers of the two distributions are essentially identical. The context of monthly normal temperatures is accessible and provides the opportunity to concentrate on the concept of quantifying variability. The range is simple to compute, but it is sensitive to outliers and extreme observations. The IQR offers a relatively simple measure of variability, and it is resistant to the effect of outliers and extreme observations. Boxplots provide a graph with a simple structure that contains visual representations of center and variability. Boxplots are analogous to the skeleton of a data set: they are always built from the same five summary values, like bones, but they represent data sets of different sizes and characteristics, like the bodies built on top of skeletons.

If possible, encourage students to construct side-by-side boxplots using appropriate and available technology.
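Where technology is available, the five-number summary and the IQR can be checked with a few lines of code. The following is a minimal Python sketch and is not part of the original Statway materials. The function name five_number_summary is hypothetical, and the sketch assumes the median-of-halves rule used in this lesson, computed directly from the tabled temperatures. For a data set with an odd number of values it leaves the middle value out of both halves, which is one common convention among several.

```python
from statistics import median

# Monthly normal temperatures (°F), January through December, from the lesson's table.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]
san_francisco = [51.1, 54.4, 54.9, 56.0, 56.6, 58.4, 59.1, 60.1, 62.3, 62.0, 57.2, 51.7]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]    # lower half (middle value excluded when n is odd)
    upper = data[-half:]   # upper half (middle value excluded when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]


for city, temps in [("St. Louis", st_louis), ("San Francisco", san_francisco)]:
    low, q1, med, q3, high = five_number_summary(temps)
    print(f"{city}: min={low:.2f} Q1={q1:.2f} median={med:.2f} Q3={q3:.2f} "
          f"max={high:.2f} range={high - low:.2f} IQR={q3 - q1:.2f}")
```

Calculators, spreadsheets, and statistical packages often use slightly different quartile rules, so their quartiles may differ a little from the hand calculation.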
Homework [Student Handout]

(1) Recall the home team–visiting team differences in scores for NBA games during the last week of 2006 (using a synthetic basketball) and the first week of 2007 (using the traditional leather basketball) displayed in the following tables:

2006 Final Score Differences
16 13 11 1 19 5 2 23 –7 8 10 15 –13 –7 6 23 –16 –25 6 –13 15 25 3 2 14 23 9 10 –1 10 26 9 –10 19 10 22 –3 –10 1 7 14 –11 6 17 8 29 23 10 –4 –7 2 10 10 14 6

2007 Final Score Differences
–6 –8 –1 4 24 –11 –8 5 12 –3 7 9 –15 2 –18 –2 11 3 9 –24 –4 14 19 –9 –9 2 5 32 28 –5 –18 13 11 12 17 5 –12 4 –7 –5 3 –14 4 8 23 –3 5

(a) Construct side-by-side dotplots or histograms of the data values in the two years. Describe the important features of the distributions of the home team minus visiting team score differences based on the graphs.

(b) Report the values for the minimum, maximum, and range for the data values in 2006. Report the corresponding values for 2007.

(c) Report the values of the first and third quartiles (Q1 and Q3) for the data values in 2006. Report the IQR for 2006. Report the corresponding values for 2007.

(d) Construct side-by-side boxplots for the values in 2006 and 2007.

(e) Write a comparison of the variability in home team minus visiting team score differences at the end of 2006 and the beginning of 2007. Refer to specific statistical and graphical evidence to support your conclusion.

(2) Suppose an error was made when the data values for monthly normal temperatures in St. Louis were recorded in the table below. The value for July was recorded in error.

Month          Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.
St. Louis      29.3  33.9  45.1  56.7  66.1  75.4  97.8  77.6  70.2  58.4  46.2  33.9

(a) How does this error affect the range? Be specific in your response.

(b) How does this error affect the IQR? Be specific in your response.

(Note: The IQR is resistant to outliers or extreme observations. The range is sensitive to outliers and extreme observations.)

(c) How does the change affect the boxplot for the data? Explain, or illustrate by sketching the boxplot.

(3) The following table contains the ages of the 30 women on the 15th season (2010) of “The Bachelor.”

Bachelorette   Age    Bachelorette   Age
Alli           24     Lindsay        25
Ashley H       26     Lisa M         24
Ashley S       26     Lisa P         27
Britnee        25     Madison        25
Britt          25     Marissa        26
Chantal        28     Meghan         30
Cristy         30     Melissa        32
Emily          24     Michelle       30
J              26     Raichel        29
Jackie         27     Rebecca        30
Jill           28     Renee          28
Keltie         28     Sarah L        25
Kimberly       27     Sarah P        27
Lacey          27     Shawntel       25
Lauren         26     Stacey         26

(a) Construct a dotplot or histogram for the ages of the bachelorettes. Describe important features of the graph.

(b) Report the values in the five-number summary for the ages of the bachelorettes.

(c) Construct a boxplot for the ages of the bachelorettes.

(d) Compare the boxplot you constructed in Question 3c to the graph you constructed in Question 3a. Which do you prefer to represent the distribution of the ages of the bachelorettes? Explain your reasoning.
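For Question 2, a quick calculation makes the contrast between the range and the IQR concrete. This is a minimal Python sketch rather than part of the original materials. The helper names value_range and iqr are hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data, as in this lesson.

```python
from statistics import median

# St. Louis monthly normal temperatures (°F), January through December.
correct = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]
with_error = list(correct)
with_error[6] = 97.8  # July (index 6) as recorded in error


def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)


def iqr(values):
    """IQR = Q3 - Q1, with the quartiles taken as medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[-half:]) - median(data[:half])


print(f"range: {value_range(correct):.1f} -> {value_range(with_error):.1f}")
print(f"IQR:   {iqr(correct):.1f} -> {iqr(with_error):.1f}")
```

The erroneous July value raises the maximum, so the range grows from 50.5 °F to 68.5 °F. The quartiles are computed from values nearer the middle of the sorted list, so the IQR does not change.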
Statway Student Handout April 16, 2012 (Full Version 1.0) Initiating Lesson 2.3.1: Quantifying Variability Relative to the Median

Part I

Recall the monthly normal temperatures for St. Louis, Missouri, and San Francisco, California, which are presented again in the following table:

Monthly Normal Temperatures (°F) for St. Louis and San Francisco
Month          Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.
St. Louis      29.3  33.9  45.1  56.7  66.1  75.4  79.8  77.6  70.2  58.4  46.2  33.9
San Francisco  51.1  54.4  54.9  56.0  56.6  58.4  59.1  60.1  62.3  62.0  57.2  51.7

In earlier lessons, you examined representative values for the center of the distributions of the monthly normal temperatures, including the mean and median. In this lesson, you will consider the variability (or spread) in the data.

(1) Examine the dotplots below for the monthly normal temperatures for St. Louis and San Francisco. Write a sentence comparing how variable the monthly normal temperature values are for the cities.

(2) In previous lessons, you explored how to represent and interpret data in graphical displays such as dotplots and histograms. Then you learned to summarize the center of a distribution numerically using measures such as the sample mean and sample median. Reporting a number to represent the center of a distribution is important, but in this example the centers of the distributions are similar and the variabilities are very different.
Consider how you might report a single number to represent the variability in the data for St. Louis. What value would you report? Use the data from San Francisco to report a similar single number to represent the variability in the monthly normal temperatures. Do the numbers you determined to represent the variabilities in monthly normal temperatures for St. Louis and San Francisco capture the differences in the distributions? Is the value for St. Louis substantially larger to reflect greater variability?

(3) As individuals or in groups, share the numeric values you used to represent the variabilities in monthly normal temperatures for St. Louis and San Francisco with the rest of the class. Explain why you chose these values to represent variability and how they facilitate the comparison of the variability for the two cities.

Part II

(4) One statistic (or number) that can be used to represent the variability in quantitative data is the range. This is the maximum data value minus the minimum data value:

range = maximum – minimum

Report the values of the minimum, maximum, and range in monthly normal temperature for St. Louis. Report the corresponding values for San Francisco. Make a brief comparison of the ranges for the two cities.

One problem with the range is that it is sensitive to outliers and extreme observations. In other words, it depends only on the values at the lower and upper ends of the list of sorted values. These values do not represent the central portion of the data. It is possible that they are unusual compared to the rest of the distribution. It is also possible that they are reported in error.

Another approach to quantifying variability is to find the quartiles, which are the first and third quarter points in the data. Here are the monthly normal temperatures for St. Louis, sorted from smallest to largest.

St. Louis  29.3  33.9  33.9  45.1  46.2  56.7  58.4  66.1  70.2  75.4  77.6  79.8

Consider breaking the data up into quarters. Since there are 12 observations in the St. Louis data, each quarter of the data contains 12/4 = 3 observations. The dividers in the following table illustrate the four quarters of the data.

St. Louis  29.3  33.9  33.9  |  45.1  46.2  56.7  |  58.4  66.1  70.2  |  75.4  77.6  79.8

The median value, which you computed in Lesson 2.2.1, falls in the middle of this list, at the middle divider. The value of the median is (56.7 + 58.4)/2 = 57.55 °F. The first-quarter point occurs between the values 33.9 and 45.1.
This value is called the first (or lower) quartile, and is denoted by Q1. In this example, Q1 = (33.9 + 45.1)/2 = 39.50 °F. Note that the first quartile (Q1) represents the median of the lower half of values in the data set. That is, a median of half of the data set is a quarter point. The third-quarter point occurs between the values 70.2 and 75.4 °F. The third (or upper) quartile is denoted by Q3 and is found to be Q3 = (70.2 + 75.4)/2 = 72.80 °F. Once again, the third quartile represents the median of the upper half of values in the data set.

(5) Together with the median, minimum, and maximum, the quartiles form what is called the five-number summary. Here is the five-number summary for the monthly normal temperatures in St. Louis.

St. Louis
Minimum               29.3
Q1 (first quartile)   39.50
Median                57.55
Q3 (third quartile)   72.80
Maximum               79.8

Use the information for the monthly normal temperatures in San Francisco to construct the five-number summary. Report the values in the following table:

San Francisco
Minimum
Q1 (first quartile)
Median
Q3 (third quartile)
Maximum

(6) An alternative way to quantify variability in numeric data is to find the distance between the quartiles (Q1 and Q3). This value is called the interquartile range and is abbreviated as IQR.
The formula is IQR = Q3 – Q1. For the monthly normal temperatures in St. Louis, the IQR is Q3 – Q1 = 72.80 – 39.50 = 33.30 °F. Report the value of the IQR for the monthly normal temperatures in San Francisco.

(7) Write a comparison of the values of the IQRs for the monthly normal temperatures in St. Louis and San Francisco. Did your conclusion match the conclusion from your visual examination of the data in Question 1? Did you reach the same conclusion as the one you reached when you compared the ranges of the values in Question 4? Explain.

The values in a five-number summary can be represented in a graph called a boxplot (sometimes called a “box and whiskers plot”). Here is a boxplot for the monthly normal temperatures for St. Louis:

[Figure: Boxplot of St. Louis Monthly Normal Temperature. The vertical axis runs from 30 to 80 °F; the minimum, Q1, median, Q3, and maximum are labeled.]

Each value in the five-number summary is plotted on the y-axis. A box is drawn around the quartiles, a line is drawn through the median, and the minimum and maximum are connected to the box by lines. This display represents the IQR graphically; the IQR is the length of the box. Boxplots also draw the viewer’s attention to the center of the graph, which represents the center of the distribution of the data.
(8) The graph below contains the boxplot for St. Louis. Sketch the boxplot for San Francisco beside the boxplot for St. Louis. Write a brief comparison of the variability (spread) represented in the boxplots of monthly normal temperatures for St. Louis and San Francisco. Compare your conclusion based on the boxplots with the other comparisons you have made in this lesson.

Homework

(1) Recall the home team–visiting team differences in scores for NBA games during the last week of 2006 (using a synthetic basketball) and the first week of 2007 (using the traditional leather basketball) displayed in the following tables:

2006 Final Score Differences
16 13 11 1 19 5 2 23 –7 8 10 15 –13 –7 6 23 –16 –25 6 –13 15 25 3 2 14 23 9 10 –1 10 26 9 –10 19 10 22 –3 –10 1 7 14 –11 6 17 8 29 23 10 –4 –7 2 10 10 14 6

2007 Final Score Differences
–6 –8 –1 4 24 –11 –8 5 12 –3 7 9 –15 2 –18 –2 11 3 9 –24 –4 14 19 –9 –9 2 5 32 28 –5 –18 13 11 12 17 5 –12 4 –7 –5 3 –14 4 8 23 –3 5

(a) Construct side-by-side dotplots or histograms of the data values in the two years. Describe the important features of the distributions of the home team minus visiting team score differences based on the graphs.

(b) Report the values for the minimum, maximum, and range for the data values in 2006. Report the corresponding values for 2007.

(c) Report the values of the first and third quartiles (Q1 and Q3) for the data values in 2006. Report the IQR for 2006. Report the corresponding values for 2007.

(d) Construct side-by-side boxplots for the values in 2006 and 2007.

(e) Write a comparison of the variability in home team minus visiting team score differences at the end of 2006 and the beginning of 2007. Refer to specific statistical and graphical evidence to support your conclusion.
(2) Suppose an error was made when the data values for monthly normal temperatures in St. Louis were recorded in the table below. The value for July was recorded in error.

Month          Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.
St. Louis      29.3  33.9  45.1  56.7  66.1  75.4  97.8  77.6  70.2  58.4  46.2  33.9

(a) How does this error affect the range? Be specific in your response.

(b) How does this error affect the IQR? Be specific in your response.

(c) How does the change affect the boxplot for the data? Explain, or illustrate by sketching the boxplot.

(3) The following table contains the ages of the 30 women on the 15th season (2010) of “The Bachelor.”

Bachelorette   Age    Bachelorette   Age
Alli           24     Lindsay        25
Ashley H       26     Lisa M         24
Ashley S       26     Lisa P         27
Britnee        25     Madison        25
Britt          25     Marissa        26
Chantal        28     Meghan         30
Cristy         30     Melissa        32
Emily          24     Michelle       30
J              26     Raichel        29
Jackie         27     Rebecca        30
Jill           28     Renee          28
Keltie         28     Sarah L        25
Kimberly       27     Sarah P        27
Lacey          27     Shawntel       25
Lauren         26     Stacey         26

(a) Construct a dotplot or histogram for the ages of the bachelorettes. Describe important features of the graph.
(b) Report the values in the five-number summary for the ages of the bachelorettes.

(c) Construct a boxplot for the ages of the bachelorettes.

(d) Compare the boxplot you constructed in Question 3c to the graph you constructed in Question 3a. Which do you prefer to represent the distribution of the ages of the bachelorettes? Explain your reasoning.
Statway Instructor’s Notes April 16, 2012 (Full Version 1.0) Initiating Lesson 2.4.1: Quantifying Variability Relative to the Mean

Estimated number of 50-minute class sessions: 1

Learning Goals

Students will begin to understand that
• distributions of quantitative data exhibit variability, and to describe a distribution of such data effectively, measures of variability must be considered.
• the variance of a data set is a measure of the average squared deviation of the data points from the mean.

Students will begin to be able to
• compute popular measures of variability, including the sample variance.
• interpret the variance in context.

Part I: Rich Task [Student Handout]

Recall the monthly normal temperature data from St. Louis and San Francisco. The data values are contained in the following table:

Monthly Normal Temperatures (°F) for St. Louis and San Francisco
Month          Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.
St. Louis      29.3  33.9  45.1  56.7  66.1  75.4  79.8  77.6  70.2  58.4  46.2  33.9
San Francisco  51.1  54.4  54.9  56.0  56.6  58.4  59.1  60.1  62.3  62.0  57.2  51.7

Here are side-by-side dotplots of the monthly normal temperatures in the two cities.

The measures of variability that you have considered are the range and the interquartile range (IQR). The range is the difference between the maximum value and minimum value. The range does not use all of the specific data values, and it is sensitive to outliers and extreme observations. The IQR is related to the median, because it is based on the values of the first and third quartiles.

Remember that the sample mean is the average of the data values. It uses all of the specific values. Your goal is to develop a measure of variability to partner with the sample mean, using all of the specific data values.

Recall the sample means for the two cities:

St. Louis: $\bar{x}$ = (29.3 + 33.9 + 45.1 + 56.7 + 66.1 + 75.4 + 79.8 + 77.6 + 70.2 + 58.4 + 46.2 + 33.9)/12 = 672.6/12 ≈ 56.0 °F.

San Francisco: $\bar{y}$ = (51.1 + 54.4 + 54.9 + 56.0 + 56.6 + 58.4 + 59.1 + 60.1 + 62.3 + 62.0 + 57.2 + 51.7)/12 = 683.8/12 ≈ 57.0 °F.

Part II: Scaffolded Conceptual Tasks [Student Handout]

The range and IQR both take an outside-in approach to representing variability. In other words, values in the extremes of the distribution (or at least away from the center of the distribution) are subtracted to provide a number that represents the variability in a distribution.

Rather than work from the outside in, this lesson adopts an inside-out approach starting with the sample mean. Each data point has a distance from the center, which in this case is the sample mean. This distance is known as the deviation from the mean or just deviation. Data values with large deviations contribute to a larger variability in the data set. Values with small deviations do not contribute much to the total variability. These distances from the sample mean to the data values represent the inside-out (from the center to the data values) approach to quantifying variability.

(1) You know that the sample mean for St. Louis is 56.0 °F. Consider the value 29.3 for the monthly normal temperature for January.
The first deviation is computed in the table below. The deviation is the observed value (29.3) minus the sample mean (56.0). In this case the value of the first deviation is 29.3 – 56.0 = –26.7.

St. Louis
Month   Temperature   Deviation
Jan.    29.3          29.3 – 56.0 = –26.7
Feb.    33.9
March   45.1
Apr.    56.7
May     66.1
June    75.4
July    79.8
Aug.    77.6
Sept.   70.2
Oct.    58.4
Nov.    46.2
Dec.    33.9

(a) What does it mean that the first deviation is negative (–26.7)?

(b) Explain what the first deviation means in the context of the problem.

(c) Complete the Deviation column by computing the values for the remaining 11 months.

(2) One approach to measuring variability is to combine the information contained in the deviations. A simple way to combine the deviations is to add them up. Compute the sum of the deviations in the table:

St. Louis
Month   Temperature   Deviation
Jan.    29.3          –26.7
Feb.    33.9          –22.1
March   45.1          –10.9
Apr.    56.7          0.7
May     66.1          10.1
June    75.4          19.4
July    79.8          23.8
Aug.    77.6          21.6
Sept.   70.2          14.2
Oct.    58.4          2.4
Nov.    46.2          –9.8
Dec.    33.9          –22.1
                      Sum =

Notice that the sum of the deviations is 0.6, which is very close to 0. In fact, if you had kept all of the decimal places in the calculations and not rounded the values, the sum would be exactly 0.
This is true for any data set: the sum of the deviations is 0. Explain why the deviations will always sum to 0. That is, what is it about the sample mean that determines that the distances from the data values to the sample mean always add up to 0?

(Note: Ask students some questions here that will guide them toward the idea of squaring the deviations. For example,
• Since the sum of deviations is 0, what is the average or typical deviation?
• The average deviation is not a very helpful measure of spread, since it is basically 0. It would be nice if you could find a way to assess the typical distance a point is from the mean. To do so you need each distance or deviation to be positive. Mathematically, what can you do to modify the deviations so that they are positive?
Here students might say, “Take the absolute value.” Then ask students how they enjoy working with absolute values. Easy? Hard? What if you square the deviations? Will the squares be positive? Then have them complete the following table.)

(3) One way to prevent the deviations from canceling out (that is, summing to 0) is to square them. The square of any number, positive or negative, is always positive. Remember that $2^2 = 4$ and that $(-2)^2 = 4$ as well.
The square of the first deviation is calculated in the following table. The value of the first deviation is –26.7, so $(-26.7)^2 = 712.89$. Compute the squared deviations for the remaining data values, and fill in the table.

St. Louis
Month    Temperature (°F)    Deviation    Squared Deviation
Jan.     29.3                –26.7        $(-26.7)^2 = 712.89$
Feb.     33.9                –22.1
March    45.1                –10.9
Apr.     56.7                0.7
May      66.1                10.1
June     75.4                19.4
July     79.8                23.8
Aug.     77.6                21.6
Sept.    70.2                14.2
Oct.     58.4                2.4
Nov.     46.2                –9.8
Dec.     33.9                –22.1

(4) Compute the sum of the squared deviations by totaling the values in the column.

St. Louis
Month    Temperature (°F)    Deviation    Squared Deviation
Jan.     29.3                –26.7        712.89
Feb.     33.9                –22.1        488.41
March    45.1                –10.9        118.81
Apr.     56.7                0.7          0.49
May      66.1                10.1         102.01
June     75.4                19.4         376.36
July     79.8                23.8         566.44
Aug.     77.6                21.6         466.56
Sept.    70.2                14.2         201.64
Oct.     58.4                2.4          5.76
Nov.     46.2                –9.8         96.04
Dec.     33.9                –22.1        488.41
                                          Sum =

The sum of the squared deviations for St. Louis is 3,623.82. What are the units of this value?

(5) The next table contains the monthly normal temperature values for San Francisco. Complete the table by computing the deviations, squared deviations, and sum of the squared deviations.

San Francisco
Month    Temperature (°F)    Deviation    Squared Deviation
Jan.     51.1
Feb.     54.4
March    54.9
Apr.     56.0
May      56.6
June     58.4
July     59.1
Aug.     60.1
Sept.    62.3
Oct.     62.0
Nov.     57.2
Dec.     51.7
                                          Sum =

Compare the sum of the squared deviations for St. Louis to that for San Francisco. Explain how these values represent how the variability in monthly normal temperatures differs between the two cities.

(6) Although the sum of the squared deviations represents the variability in a distribution, it is not a commonly used statistic. Normally, the sum of the squared deviations is part of the calculation of the sample variance. Here is the formula for the sample variance, which is frequently denoted by $s^2$.

$$\text{sample variance} = s^2 = \frac{\text{sum of the squared deviations}}{n - 1}$$

Note that $n$ represents the sample size, which in this example is equal to 12 for each city.

For St. Louis, the sample variance is

$$s^2 = \frac{\text{sum of the squared deviations}}{n - 1} = \frac{3{,}623.82}{12 - 1} = 329.44$$

For San Francisco, the sample variance is

$$s^2 = \frac{\text{sum of the squared deviations}}{n - 1} = \frac{144.32}{12 - 1} = 13.12$$

What are the units for the sample variances in the context of monthly normal temperatures? Write a brief comparison of the values of the sample variances for the two cities, describing how they represent differences in variability in the distribution of monthly normal temperatures.

Reflection

The challenge of this lesson is to motivate the need to consider some way to stop the deviations from canceling. The tradition is to accumulate the sum of the squared deviations and then compute the sample variance by dividing the sum by one less than the sample size. Curious students may ask about the $n - 1$ in the denominator. It makes some sense to use $n$ in the denominator, which results in the average of the squared deviations. The mathematical reason for using $n - 1$ has to do with the bias of a statistic, which is a topic that is usually encountered in a mathematical statistics course.

Emphasizing the units of the measures of variability is important because many users of statistics prefer to use square roots so that the resulting measure (known as the sample standard deviation) is in the units of the original data. Standard deviation is the topic of the next lesson.
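For instructors who want to demonstrate the calculation with technology, the steps of this lesson can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original lesson: the variable names are made up here, and the script uses the unrounded sample mean, so intermediate values differ slightly from the hand calculation that rounds the mean to 56.0. It also shows both candidate denominators, $n - 1$ and $n$, mentioned in the Reflection.

```python
import statistics

# Monthly normal temperatures (deg F) for St. Louis, from the lesson's table.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4,
            79.8, 77.6, 70.2, 58.4, 46.2, 33.9]

n = len(st_louis)
mean = sum(st_louis) / n                      # unrounded sample mean
deviations = [x - mean for x in st_louis]     # signed distances from the mean
sum_dev = sum(deviations)                     # cancels to 0 (up to float rounding)
sum_sq = sum(d ** 2 for d in deviations)      # sum of the squared deviations

var_n_minus_1 = sum_sq / (n - 1)              # sample variance: divide by n - 1
var_n = sum_sq / n                            # average squared deviation: divide by n

print(f"sum of deviations         = {sum_dev:.10f}")
print(f"sum of squared deviations = {sum_sq:.2f} (deg F)^2")
print(f"divide by n - 1           = {var_n_minus_1:.2f} (deg F)^2")
print(f"divide by n               = {var_n:.2f} (deg F)^2")

# Cross-check against the standard library, which also divides by n - 1.
print(f"statistics.variance       = {statistics.variance(st_louis):.2f}")
```

Dividing by $n$ always gives a slightly smaller value than dividing by $n - 1$; with only 12 data values the difference is visible, which can prompt the discussion suggested in the Reflection.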
Homework [Student Handout]

(1) Recall the example in Lesson 2.2.1 about the weight gains (in grams) for a sample of six normal adolescent laboratory rats over a one-month period, accompanied by the weight gains for a sample of six adolescent rats that were given a high daily dose of a stimulant drug. Here are the weight gains for the two groups:

Control Group      169  154  179  202  197  175
Stimulant Group    137  158  153  147  168  147

(a) Using technology or by hand, compute the sample variances for the weight gains in each group. Be sure to state the units.

(b) Write a brief comparison of the sample variances for the weight gains in the two groups. Do the sample variances indicate that the variability in the distributions is substantially different or not? Explain your reasoning.

(2) Here are the names and ages for the 30 women on the 15th season (2010) of “The Bachelor.”

Bachelorette    Age    Bachelorette    Age
Alli            24     Lindsay         25
Ashley H        26     Lisa M          24
Ashley S        26     Lisa P          27
Britnee         25     Madison         25
Britt           25     Marissa         26
Chantal         28     Meghan          30
Cristy          30     Melissa         32
Emily           24     Michelle        30
J               26     Raichel         29
Jackie          27     Rebecca         30
Jill            28     Renee           28
Keltie          28     Sarah L         25
Kimberly        27     Sarah P         27
Lacey           27     Shawntel        25
Lauren          26     Stacey          26

Report the sample variance for the ages of the women on the show.
Include units. Explain what the sample variance means in context.

Statway Student Handout April 16, 2012 (Full Version 1.0)
Initiating Lesson 2.4.1: Quantifying Variability Relative to the Mean

Part I

Recall the monthly normal temperature data from St. Louis and San Francisco. The data values are contained in the following table:

Monthly Normal Temperatures (°F) for St. Louis and San Francisco
Month    St. Louis    San Francisco
Jan.     29.3         51.1
Feb.     33.9         54.4
Mar.     45.1         54.9
Apr.     56.7         56.0
May      66.1         56.6
June     75.4         58.4
July     79.8         59.1
Aug.     77.6         60.1
Sept.    70.2         62.3
Oct.     58.4         62.0
Nov.     46.2         57.2
Dec.     33.9         51.7

Here are side-by-side dotplots of the monthly normal temperatures in the two cities.

[Figure: side-by-side dotplots of the monthly normal temperatures for St. Louis and San Francisco]

The measures of variability that you have considered are the range and the interquartile range (IQR). The range is the difference between the maximum value and minimum value. The range does not use all of the specific data values, and it is sensitive to outliers and extreme observations. The IQR is related to the median, because it is based on the values of the first and third quartiles.

Remember that the sample mean is the average of the data values. It uses all of the specific values. Your goal is to develop a measure of variability to partner with the sample mean, using all of the specific data values.

Recall the sample means for the two cities:

St. Louis: $\bar{x} = (29.3 + 33.9 + 45.1 + 56.7 + 66.1 + 75.4 + 79.8 + 77.6 + 70.2 + 58.4 + 46.2 + 33.9)/12 = 672.6/12 \approx 56.0$ °F.

San Francisco: $\bar{y} = (51.1 + 54.4 + 54.9 + 56.0 + 56.6 + 58.4 + 59.1 + 60.1 + 62.3 + 62.0 + 57.2 + 51.7)/12 = 683.8/12 \approx 57.0$ °F.

Part II

The range and IQR both take an outside-in approach to representing variability. In other words, values in the extremes of the distribution (or at least away from the center of the distribution) are subtracted to provide a number that represents the variability in a distribution.

Rather than work from the outside in, this lesson adopts an inside-out approach starting with the sample mean. Each data point has a distance from the center, which in this case is the sample mean. This distance is known as the deviation from the mean or just deviation. Data values with large deviations contribute to a larger variability in the data set. Values with small deviations do not contribute much to the total variability. These distances from the sample mean to the data values represent the inside-out (from the center to the data values) approach to quantifying variability.

(1) You know that the sample mean for St. Louis is 56.0 °F. Consider the value 29.3 for the monthly normal temperature for January.
The first deviation is computed in the table below. The deviation is the observed value (29.3) minus the sample mean (56.0). In this case the value of the first deviation is 29.3 – 56.0 = –26.7.

St. Louis
Month    Temperature (°F)    Deviation
Jan.     29.3                29.3 – 56.0 = –26.7
Feb.     33.9
March    45.1
Apr.     56.7
May      66.1
June     75.4
July     79.8
Aug.     77.6
Sept.    70.2
Oct.     58.4
Nov.     46.2
Dec.     33.9

(a) What does it mean that the first deviation is negative (–26.7)?

(b) Explain what the first deviation means in the context of the problem.

(c) Complete the Deviation column by computing the values for the remaining 11 months.

(2) One approach to measuring variability is to combine the information contained in the deviations. A simple way to combine the deviations is to add them up in a sum. Compute the sum of the deviations in the table:

St. Louis
Month    Temperature (°F)    Deviation
Jan.     29.3                –26.7
Feb.     33.9                –22.1
March    45.1                –10.9
Apr.     56.7                0.7
May      66.1                10.1
June     75.4                19.4
July     79.8                23.8
Aug.     77.6                21.6
Sept.    70.2                14.2
Oct.     58.4                2.4
Nov.     46.2                –9.8
Dec.     33.9                –22.1
                             Sum =

Notice that the sum of the deviations is 0.6, which is very close to 0. In fact, if you kept all of the decimal places in the calculations and did not round the values, the sum would be exactly 0.
This is true for any data set: the sum of the deviations is 0. Explain why the deviations will always sum to 0. That is, what is it about the sample mean that determines that the distances from the data values to the sample mean always add up to 0?

(3) One way to prevent the deviations from canceling out (that is, summing to 0) is to square them. The square of any nonzero number, positive or negative, is always positive. Remember that $2^2 = 4$ and that $(-2)^2 = 4$ as well.

The square of the first deviation is calculated in the following table. The value of the first deviation is –26.7, so $(-26.7)^2 = 712.89$. Compute the squared deviations for the remaining data values, and fill in the table.

St. Louis
Month    Temperature (°F)    Deviation    Squared Deviation
Jan.     29.3                –26.7        $(-26.7)^2 = 712.89$
Feb.     33.9                –22.1
March    45.1                –10.9
Apr.     56.7                0.7
May      66.1                10.1
June     75.4                19.4
July     79.8                23.8
Aug.     77.6                21.6
Sept.    70.2                14.2
Oct.     58.4                2.4
Nov.     46.2                –9.8
Dec.     33.9                –22.1

(4) Compute the sum of the squared deviations by totaling the values in the column.

St. Louis
Month    Temperature (°F)    Deviation    Squared Deviation
Jan.     29.3                –26.7        712.89
Feb.     33.9                –22.1        488.41
March    45.1                –10.9        118.81
Apr.     56.7                0.7          0.49
May      66.1                10.1         102.01
June     75.4                19.4         376.36
July     79.8                23.8         566.44
Aug.     77.6                21.6         466.56
Sept.    70.2                14.2         201.64
Oct.     58.4                2.4          5.76
Nov.     46.2                –9.8         96.04
Dec.     33.9                –22.1        488.41
                                          Sum =

The sum of the squared deviations for St. Louis is 3,623.82. What are the units of this value?

(5) The next table contains the monthly normal temperature values for San Francisco. Complete the table by computing the deviations, squared deviations, and sum of the squared deviations.

San Francisco
Month    Temperature (°F)    Deviation    Squared Deviation
Jan.     51.1
Feb.     54.4
March    54.9
Apr.     56.0
May      56.6
June     58.4
July     59.1
Aug.     60.1
Sept.    62.3
Oct.     62.0
Nov.     57.2
Dec.     51.7
                                          Sum =

Compare the sum of the squared deviations for St. Louis to that for San Francisco. Explain how these values represent how the variability in monthly normal temperatures differs between the two cities.

(6) Although the sum of the squared deviations represents the variability in a distribution, it is not a commonly used statistic. Normally, the sum of the squared deviations is part of the calculation of the sample variance. Here is the formula for the sample variance, which is frequently denoted by $s^2$.

$$\text{sample variance} = s^2 = \frac{\text{sum of the squared deviations}}{n - 1}$$

Note that $n$ represents the sample size, which in this example is equal to 12 for each city.

For St. Louis, the sample variance is

$$s^2 = \frac{\text{sum of the squared deviations}}{n - 1} = \frac{3{,}623.82}{12 - 1} = 329.44$$

For San Francisco, the sample variance is

$$s^2 = \frac{\text{sum of the squared deviations}}{n - 1} = \frac{144.32}{12 - 1} = 13.12$$

What are the units for the sample variances in the context of monthly normal temperatures? Write a brief comparison of the values of the sample variances for the two cities, describing how they represent differences in variability in the distribution of monthly normal temperatures.

Homework

(1) Recall the example in Lesson 2.2.1 about the weight gains (in grams) for a sample of six normal adolescent laboratory rats over a one-month period, accompanied by the weight gains for a sample of six adolescent rats that were given a high daily dose of a stimulant drug. Here are the weight gains for the two groups:

Control Group      169  154  179  202  197  175
Stimulant Group    137  158  153  147  168  147

(a) Using technology or by hand, compute the sample variances for the weight gains in each group. Be sure to state the units.

(b) Write a brief comparison of the sample variances for the weight gains in the two groups. Do the sample variances indicate that the variability in the distributions is substantially different or not? Explain your reasoning.
(2) Here are the names and ages for the 30 women on the 15th season (2010) of “The Bachelor.”

Bachelorette    Age    Bachelorette    Age
Alli            24     Lindsay         25
Ashley H        26     Lisa M          24
Ashley S        26     Lisa P          27
Britnee         25     Madison         25
Britt           25     Marissa         26
Chantal         28     Meghan          30
Cristy          30     Melissa         32
Emily           24     Michelle        30
J               26     Raichel         29
Jackie          27     Rebecca         30
Jill            28     Renee           28
Keltie          28     Sarah L         25
Kimberly        27     Sarah P         27
Lacey           27     Shawntel        25
Lauren          26     Stacey          26

Report the sample variance for the ages of the women on the show. Include units. Explain what the sample variance means in context.

Statway Instructor’s Notes April 16, 2012 (Full Version 1.0)
Supporting Lesson 2.4.2: The Sample Variance

Estimated number of 50-minute class sessions: 0.5

Learning Goals

Students will understand that
• sample variance represents variability in the data and is reported in squared units.
• sample standard deviation is the square root of the sample variance.
• sample standard deviation is reported in the original units of the data.

Students will be able to
• report the value of a sample variance using technology.
• interpret the value of a sample variance in context.
• compare the values of two sample variances in a ratio.
• report the value of a sample standard deviation using technology.
• interpret the value of a sample standard deviation in context.
• compare the values of two sample standard deviations in a ratio.

Activity

Use technology to compute the values for the sample variances of two groups. Compare the values of the sample variances in the groups using a ratio. Discuss the relative size of one sample variance to the other. Interpret the values and their comparison in context. Discuss whether the comparison matches the visual assessment of variability in the distributions. Discuss the comfort level with presenting results in squared units. Compute the sample standard deviation as the square root of the sample variance. Interpret the sample standard deviation in context. Compare the relative sizes of two sample standard deviations.

Introduction to the Context of the Task [Student Handout]

The following table shows the state-by-state Median Household Income data¹ matched to whether the state was won by Democrat Barack Obama or Republican John McCain in the 2008 presidential election.

Median Household Income (Thousands of Dollars)
Blue States (Obama)     Income    Red States (McCain)    Income
California              57        Alabama                44
Colorado                61        Alaska                 64
Connecticut             65        Arizona                47
Delaware                51        Arkansas               40
District of Columbia    56        Georgia                46
Florida                 45        Idaho                  47
Hawaii                  62        Kansas                 48
Illinois                53        Kentucky               41
Indiana                 47        Louisiana              40
Iowa                    50        Mississippi            36
Maine                   47        Missouri               46
Maryland                64        Montana                43
Massachusetts           60        Nebraska               51
Michigan                50        North Dakota           50
Minnesota               55        Oklahoma               46
Nevada                  55        South Carolina         42
New Hampshire           66        South Dakota           52
New Jersey              65        Tennessee              40
New Mexico              42        Texas                  46
New York                50        Utah                   63
North Carolina          43        West Virginia          38
Ohio                    47        Wyoming                53
Oregon                  52
Pennsylvania            51
Rhode Island            53
Vermont                 51
Virginia                62
Washington              57
Wisconsin               51

¹ Retrieved from www.census.gov/hhes/www/income/data/statemedian/index.html

The authors of this lesson were curious to learn if there was any association between median income and whether the state was won by the Democratic or Republican candidate. They thought that there might be differences in the representative median household income in the two groups, but they also thought there might be differences in variability in the two groups. In other words, the median household income values in one group might be more or less dispersed than the median household income values in the other group. This might be possible because Obama’s support came from a more diverse collection of states scattered across the country, while McCain’s support was concentrated in the South and West.

Direct Instruction

The following are the steps for computing the sample variance for a given data set:
1. Compute the mean of the data.
2. Subtract the mean from each data point and square each result.
3. Compute the sum of the results in Step 2.
4. Divide the sum in Step 3 by the number of data points minus 1.

Tasks [Student Handout]

(1) Construct side-by-side boxplots to represent the distributions of median household income among the states won by Obama and the states won by McCain.

(2) Compute the sample variances to represent the variability in the distributions of median household income in the two groups.

(3) Interpret the values of the sample variances in this context (i.e., What do they mean?).

(4) To compare the two sample variances, form the ratio of the larger sample variance to the smaller sample variance. Does the ratio indicate that the values of the sample variances are similar or different between the two groups?

(5) Compute the sample standard deviations of the distributions of median household income in the two groups by taking the square root of each sample variance.

(6) Interpret the values of the sample standard deviations in this context (i.e., What do they mean?).

(7) As with the two sample variances, form the ratio of the larger sample standard deviation to the smaller sample standard deviation. Does the ratio make you believe that the distributions have similar variability? Explain your reasoning.

Wrap-Up

Here are side-by-side boxplots for the distributions of median household income in the two groups:

[Figure: Boxplot of Income vs Election. Vertical axis: Income (35 to 70, thousands of dollars); horizontal axis: Election (Blue, Red).]

The variability represented in the two groups does not appear to be dramatically different. If anything, the red states (McCain) may have greater variability in their distribution of median household income. This is contrary to the hypothesis that there may be greater diversity (from a financial standpoint) among the blue states (Obama).

Minitab was used to compute the values of descriptive statistics for the distributions of median household income in the two groups below:

Descriptive Statistics: Income
Variable    Election    N     Mean     StDev    Variance
Income      Blue        29    54.07    6.83     46.64
Income      Red         22    46.50    7.12     50.74

The sample variance for the blue states is $s^2 = 46.64$ (thousand dollars)$^2$. The sample variance for the red states is $s^2 = 50.74$ (thousand dollars)$^2$. These values are the representative squared deviations for their respective groups. The variability in median household income in the red states is larger than the variability in the blue states, as the graph indicates.

Sample variances are never negative and are often compared by forming a ratio of sample variances. The ratio of the larger sample variance to the smaller sample variance in this case is 50.74/46.64 = 1.09. This means the variability represented by the sample variance in the red states is 1.09 times as large as the variability in the blue states.

The sample standard deviation for the blue states is $s = \sqrt{46.64} = 6.83$ thousand dollars.
The sample standard deviation for the red states is $s = \sqrt{50.74} = 7.12$ thousand dollars.

These are the representative deviations (in the units of the original data) for the two groups. Like sample variances, sample standard deviations can be compared in a ratio because they are never negative. The ratio of the larger sample standard deviation to the smaller sample standard deviation in this case is 7.12/6.83 = 1.04. This indicates the variability in the red states (as represented by the sample standard deviations) is 1.04 times as large as the variability in the blue states.

Because the ratios of the sample variances and the sample standard deviations are close to 1, it seems there is not much difference in the variability in median household income between the states that Obama won and the states that McCain won in the 2008 presidential election.

Homework [Student Handout]

(1) Recall the monthly normal temperature data for St. Louis and San Francisco. The data appear in the following table:

Monthly Normal Temperatures (°F)
Month    St. Louis    San Francisco
Jan.     29.3         51.1
Feb.     33.9         54.4
Mar.     45.1         54.9
Apr.     56.7         56.0
May      66.1         56.6
June     75.4         58.4
July     79.8         59.1
Aug.     77.6         60.1
Sept.    70.2         62.3
Oct.     58.4         62.0
Nov.     46.2         57.2
Dec.     33.9         51.7

(a) Compute the sample variance for the monthly normal temperatures in St. Louis.
Do the same for San Francisco. Interpret the two sample variances in context.

(b) Compute the ratio of the sample variances, with the larger sample variance in the numerator. Explain whether the ratio gives the impression that the distributions of monthly normal temperatures in the two cities have similar or different variability.

(c) Compute the sample standard deviations for the monthly normal temperatures in the two cities. Interpret the sample standard deviations in this context. What information do they provide?

(d) Report the ratio of the two sample standard deviations, and explain whether the ratio makes you believe that the variability in monthly normal temperatures in the two cities is similar or different.

(2) Recall the data introduced in Lesson 2.2.1 about the weight gains (in grams) for a sample of six normal adolescent laboratory rats over a one-month period, accompanied by the weight gains for a sample of six adolescent rats given a high daily dose of a stimulant drug. Here are the weight gains for the two groups:

Control Group      169  154  179  202  197  175
Stimulant Group    137  158  153  147  168  147

(a) Compute the sample variance for the weight gains in the control group and again for the stimulant group. Interpret the two sample variances in this context.
(b) Compute the ratio of the sample variances, with the larger sample variance in the numerator. Explain whether the ratio gives the impression that the distributions of weight gains in the two groups have similar or different variability.

(c) Compute the sample standard deviations for the weight gains in the two groups. Interpret the sample standard deviations in this context.

(d) Compute the ratio of the two sample standard deviations, and explain whether the ratio makes you believe that the variability in weight gains in the two groups is similar or different.

(e) One concern in a study such as this is that the experimental group (in this case, the stimulant group) may experience substantially more or less consistent results than the control group. Based on the evidence you have accumulated in Questions 2a–2d, does there appear to be a difference in the consistency of weight gain between the control and stimulant groups? Explain by referring to specific statistical evidence.

Answers

(1) Here are the descriptive statistics for the monthly normal temperatures in St. Louis and San Francisco:

Descriptive Statistics: Temperature

Variable     City  N   Mean   StDev  Variance
Temperature  SF    12  56.98  3.62   13.12
Temperature  STL   12  56.05  18.15  329.44

Clearly there is more variability in the distribution of monthly normal temperatures for St. Louis than for San Francisco. The ratio of the sample variances is 329.44/13.12 = 25.1. The representative squared deviation is over 25 times greater for St. Louis than for San Francisco. The ratio of the sample standard deviations is 18.15/3.62 = 5.0. The representative deviation is five times greater for St. Louis than for San Francisco.
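The descriptive statistics in Answer (1) can be reproduced with a few lines of code. The following is a minimal sketch, assuming Python and its standard `statistics` module rather than whatever statistical package produced the output above; the variable and function names (`st_louis`, `san_francisco`, `summarize`) are illustrative only.

```python
from statistics import mean, stdev, variance

# Monthly normal temperatures (Jan. through Dec.) copied from the homework table.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]
san_francisco = [51.1, 54.4, 54.9, 56.0, 56.6, 58.4, 59.1, 60.1, 62.3, 62.0, 57.2, 51.7]


def summarize(values):
    """Return (n, mean, sample standard deviation, sample variance).

    statistics.stdev and statistics.variance both use the n - 1 divisor,
    which is the sample version used in this lesson.
    """
    return len(values), mean(values), stdev(values), variance(values)


stl_n, stl_mean, stl_sd, stl_var = summarize(st_louis)
sf_n, sf_mean, sf_sd, sf_var = summarize(san_francisco)

# Ratios with the larger value in the numerator, as the homework asks.
variance_ratio = max(stl_var, sf_var) / min(stl_var, sf_var)
sd_ratio = max(stl_sd, sf_sd) / min(stl_sd, sf_sd)

print(f"STL: n={stl_n} mean={stl_mean:.2f} sd={stl_sd:.2f} var={stl_var:.2f}")
print(f"SF:  n={sf_n} mean={sf_mean:.2f} sd={sf_sd:.2f} var={sf_var:.2f}")
print(f"variance ratio={variance_ratio:.1f}  sd ratio={sd_ratio:.1f}")
```

Because the standard deviation is the square root of the variance, the ratio of standard deviations is the square root of the ratio of variances, which is why the two ratios tell the same story on different scales.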
(2) Here are the descriptive statistics for the rat weight gain study:

Descriptive Statistics: Control, Stimulant

Variable   N  Mean    StDev  Variance
Control    6  179.33  17.85  318.67
Stimulant  6  151.67  10.65  113.47

There seems to be more variability in the control group. The ratio of the sample variances is 318.67/113.47 = 2.8. That is, the representative squared deviation is nearly three times as large for the control group as for the stimulant group.

The ratio of the sample standard deviations is 17.85/10.65 = 1.7. The representative deviation is almost twice as large in the control group. Based on the small sample size in this study, the weight gains in the stimulant group had less variability (i.e., were more consistent) than the weight gains in the control group.

Statway Student Handout April 16, 2012 (Full Version 1.0)
Supporting Lesson 2.4.2: The Sample Variance

The following table shows the state-by-state Median Household Income data matched to whether the state was won by Democrat Barack Obama or Republican John McCain in the 2008 presidential election.¹

Median Household Income (Thousands of Dollars)

Blue States (Obama)          Red States (McCain)
California            57     Alabama          44
Colorado              61     Alaska           64
Connecticut           65     Arizona          47
Delaware              51     Arkansas         40
District of Columbia  56     Georgia          46
Florida               45     Idaho            47
Hawaii                62     Kansas           48
Illinois              53     Kentucky         41
Indiana               47     Louisiana        40
Iowa                  50     Mississippi      36
Maine                 47     Missouri         46
Maryland              64     Montana          43
Massachusetts         60     Nebraska         51
Michigan              50     North Dakota     50
Minnesota             55     Oklahoma         46
Nevada                55     South Carolina   42
New Hampshire         66     South Dakota     52
New Jersey            65     Tennessee        40
New Mexico            42     Texas            46
New York              50     Utah             63
North Carolina        43     West Virginia    38
Ohio                  47     Wyoming          53
Oregon                52
Pennsylvania          51
Rhode Island          53
Vermont               51
Virginia              62
Washington            57
Wisconsin             51

¹ Retrieved from www.census.gov/hhes/www/income/data/statemedian/index.html
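For readers who want to check their hand calculations on the table above, here is a minimal sketch, assuming Python, that applies the sample variance definition directly (sum of squared deviations from the mean, divided by n - 1). The names `blue`, `red`, and `sample_variance` are hypothetical and are not part of the Statway materials.

```python
import math

# Median household income (thousands of dollars), copied from the table above.
blue = [57, 61, 65, 51, 56, 45, 62, 53, 47, 50, 47, 64, 60, 50, 55,
        55, 66, 65, 42, 50, 43, 47, 52, 51, 53, 51, 62, 57, 51]
red = [44, 64, 47, 40, 46, 47, 48, 41, 40, 36, 46, 43, 51, 50, 46,
       42, 52, 40, 46, 63, 38, 53]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    center = sum(values) / n
    return sum((x - center) ** 2 for x in values) / (n - 1)


blue_var = sample_variance(blue)
red_var = sample_variance(red)

# The sample standard deviation is the square root of the sample variance,
# so it is expressed in the original units (thousands of dollars).
blue_sd = math.sqrt(blue_var)
red_sd = math.sqrt(red_var)

# Ratios with the larger value in the numerator.
variance_ratio = max(blue_var, red_var) / min(blue_var, red_var)
sd_ratio = max(blue_sd, red_sd) / min(blue_sd, red_sd)

print(f"Blue: n={len(blue)} variance={blue_var:.2f} sd={blue_sd:.2f}")
print(f"Red:  n={len(red)} variance={red_var:.2f} sd={red_sd:.2f}")
print(f"variance ratio={variance_ratio:.2f}  sd ratio={sd_ratio:.2f}")
```

Writing the formula out by hand, instead of calling a library routine, keeps the n - 1 divisor visible, which is the detail that distinguishes the sample variance from a plain average of squared deviations.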
The authors of this lesson were curious to learn if there was any association between median income and whether the state was won by the Democratic or Republican candidate. They thought that there might be differences in the representative median household income in the two groups, but they also thought there might be differences in variability in the two groups. In other words, the median household income values in one group might be more or less dispersed than the median household income values in the other group. This might be possible because Obama’s support came from a more diverse collection of states scattered across the country, while McCain’s support was concentrated in the South and West.

(1) Construct side-by-side boxplots to represent the distributions of median household income among the states won by Obama and the states won by McCain.

(2) Compute the sample variances to represent the variability in the distributions of median household income in the two groups.

(3) Interpret the values of the sample variances in this context (i.e., What do they mean?).

(4) To compare the two sample variances, form the ratio of the larger sample variance to the smaller sample variance. Does the ratio indicate that the values of the sample variances are similar or different between the two groups?

(5) Compute the sample standard deviations of the distributions of median household income in the two groups by taking the square root of each sample variance.

(6) Interpret the values of the sample standard deviations in this context (i.e., What do they mean?).

(7) As with the two sample variances, form the ratio of the larger sample standard deviation to the smaller sample standard deviation. Does the ratio make you believe that the distributions have similar variability? Explain your reasoning.

Homework

(1) Recall the monthly normal temperature data for St. Louis and San Francisco. The data appear in the following table:

Month          Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept.  Oct.  Nov.  Dec.
St. Louis      29.3  33.9  45.1  56.7  66.1  75.4  79.8  77.6  70.2   58.4  46.2  33.9
San Francisco  51.1  54.4  54.9  56.0  56.6  58.4  59.1  60.1  62.3   62.0  57.2  51.7

(a) Compute the sample variance for the monthly normal temperatures in St. Louis. Do the same for San Francisco. Interpret the two sample variances in context.

(b) Compute the ratio of the sample variances, with the larger sample variance in the numerator.
Explain whether the ratio gives the impression that the distributions of monthly normal temperatures in the two cities have similar or different variability.

(c) Compute the sample standard deviations for the monthly normal temperatures in the two cities. Interpret the sample standard deviations in this context. What information do they provide?

(d) Report the ratio of the two sample standard deviations, and explain whether the ratio makes you believe that the variability in monthly normal temperatures in the two cities is similar or different.

(2) Recall the data introduced in Lesson 2.2.1 about the weight gains (in grams) for a sample of six normal adolescent laboratory rats over a one-month period, accompanied by the weight gains for a sample of six adolescent rats given a high daily dose of a stimulant drug. Here are the weight gains for the two groups:

Control Group    169  154  179  202  197  175
Stimulant Group  137  158  153  147  168  147

(a) Compute the sample variance for the weight gains in the control group and again for the stimulant group. Interpret the two sample variances in this context.

(b) Compute the ratio of the sample variances, with the larger sample variance in the numerator. Explain whether the ratio gives the impression that the distributions of weight gains in the two groups have similar or different variability.

(c) Compute the sample standard deviations for the weight gains in the two groups. Interpret the sample standard deviations in this context.

(d) Compute the ratio of the two sample standard deviations, and explain whether the ratio makes you believe that the variability in weight gains in the two groups is similar or different.

(e) One concern in a study such as this is that the experimental group (in this case, the stimulant group) may experience substantially more or less consistent results than the control group. Based on the evidence you have accumulated in Questions 2a–2d, does there appear to be a difference in the consistency of weight gain between the control and stimulant groups? Explain by referring to specific statistical evidence.
