Statway TM A statistics pathway for college students

Statway TM A statistics pathway for college students Module 1: Statistical Studies and Overview of the Data Analysis Process Module 2: Summarizing Data Graphically and Numerically Module 3: Reasoning About Bivariate Numerical Data—Linear Relationships Module 4: Modeling Nonlinear Relationships Module 5: Reasoning About Bivariate Categorical Data and Introduction to Probability Module 6: Formalizing Probability and Probability Distributions Module 7: Linking Probability to Statistical Inference Module 8: Inference for One Proportion Module 9: Inference for Two Proportions Module 10: Inference for Means Module 11: Chi-Squared Tests Module 12: Other Mathematical Content Version 1.0 A resource from The Charles A. Dana Center at The University of Texas at Austin July 2011 Frontmatter Statway—Full Version 1.0, July 2011 Unless otherwise indicated, the materials found in this resource are Copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin Outside the license described below, no part of this resource shall be reproduced, stored in a retrieval system, or transmitted by any means—electronically, mechanically, or via photocopying, recording, or otherwise, including via methods yet to be invented—without express written permission from the Foundation and the University. The original version of this work was created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. STATWAYTM / StatwayTM is a trademark of the Carnegie Foundation for the Advancement of Teaching. *** This copyright notice is intended to prohibit unlicensed commercial use of the Statway materials. License for use Statway Version 1.0, developed by the Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported license. To view the details of this license, see creativecommons.org/licenses/by-nc-sa/3.0. In general, under this license You are free: to Share—to copy, distribute, and transmit the work to Remix—to adapt the work Under the following conditions: Attribution—You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). We request you attribute the work thus: The original version of this work was developed by the Charles A. Dana Center at the University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. This work is used (or adapted) under the Creative Commons Attribution-NonCommercialShareAlike 3.0 Unported (CC BY-NC-SA 3.0) license: creativecommons.org/licenses/by-nc-sa/3.0. For more information about Carnegie’s work on Statway, see www.carnegiefoundation.org/statway; for information on the Dana Center’s work on The New Mathways Project, see www.utdanacenter.org/mathways. Noncommercial—You may not use this work for commercial purposes. Share Alike—If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one. The Charles A. Dana Center at the University of Texas at Austin, as well as the authors and editors, assume no liability for any loss or damage resulting from the use of this resource. We have made extensive efforts to ensure the accuracy of the information in this resource, to provide proper acknowledgement of original sources, and to otherwise comply with copyright law. If you find an error or you believe we have failed to provide proper acknowledgment, please contact us at dana-txshop@utlists.utexas.edu. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. ii Frontmatter The Charles A. Dana Center The University of Texas at Austin 1616 Guadalupe Street, Suite 3.206 Austin, TX 78701-1222 Fax: 512-232-1855 dana-txshop@utlists.utexas.edu www.utdanacenter.org Statway—Full Version 1.0, July 2011 The Carnegie Foundation for the Advancement of Teaching 51 Vista Lane Stanford, California, 94305 Phone: 650-566-5110 pathways@carnegiefoundation.org www.carnegiefoundation.org About the development of this resource The content for this full version of Statway was developed under a November 30, 2010, agreement by a team of faculty authors and reviewers contracted and managed by the Charles A. Dana Center at the University of Texas at Austin with funding from the Carnegie Foundation for the Advancement of Teaching. This resource was produced in Microsoft Word 2008 and 2011 for the Mac. The content of these 12 modules was developed and produced (that is, written, reviewed, edited, and laid out) by the Charles A. Dana Center at The University of Texas at Austin and delivered by the Dana Center to the Carnegie Foundation for the Advancement of Teaching on June 30, 2011. Some issues to be aware of: • PDF files need to be viewed with Adobe Acrobat for full functionality. If viewed through Preview, which is the default on some computers, the URLs may not be correct. • The file names indicate the lesson number and whether the document is the instructor or student version or the out-of-class experience. The Dana Center is engaged in a process of revising and improving these materials to create the Dana Center Statistics Pathway. We welcome feedback from the community as part of our course revision process. If you would like to discuss these materials or learn more about the Dana Center’s plans for this course, contact us at mathways@austin.utexas.edu. About the Charles A. Dana Center at The University of Texas at Austin The Dana Center collaborates with local and national entities to improve education systems so that they foster opportunity for all students, particularly in mathematics and science. We are dedicated to nurturing students’ intellectual passions and ensuring that every student leaves school prepared for success in postsecondary education and the contemporary workplace—and for active participation in our modern democracy. The Center was founded in 1991 in the College of Natural Sciences at The University of Texas at Austin. Our original purpose—which continues in our work today—was to raise student achievement in K–16 mathematics and science, especially for historically underserved populations. We carry out our work by supporting high standards and building system capacity; collaborating with key state and national organizations to address emerging issues; creating and delivering professional supports for educators and education leaders; and writing and publishing education resources, including student supports. Our staff of more than 80 researchers and education professionals has worked intensively with dozens of school systems in nearly 20 states and with 90 percent of Texas’s more than 1,000 school districts. As one of the College’s largest research units, the Dana Center works to further the university’s mission of achieving excellence in education, research, and public service. We are committed to ensuring that the accident of where a student attends school does not limit the academic opportunities he or she can pursue. For more information about the Dana Center and our programs and resources, see our homepage at www.utdanacenter.org. To access our resources (many of them free) please see our products index at www.utdanacenter.org/products. To learn about Dana Center professional development sessions, see our professional development site at www.utdanacenter.org/pd. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. iii Frontmatter Statway—Full Version 1.0, July 2011 Acknowledgments The original version of this work was created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. Carnegie Corporation of New York, The Bill & Melinda Gates Foundation, The William and Flora Hewlett Foundation, Lumina Foundation, and The Kresge Foundation joined in partnership with the Carnegie Foundation for the Advancement of Teaching in this work. Leadership—Charles A. Dana Center at the University of Texas at Austin Uri Treisman, director Susan Hudson Hull, program director of mathematics national initiatives Leadership—Carnegie Foundation for the Advancement of Teaching Anthony S. Bryk, president Bernadine Chuck Fong, senior managing partner Louis Gomez, senior fellow Paul LeMahieu, senior fellow James Stigler, senior fellow Uri Treisman, senior fellow Guadalupe Valdés, senior fellow Statway Project Leads Kristen Bishop, former team lead for the New Mathways Project, the Charles A. Dana Center at the University of Texas at Austin Thomas J. Connolly, project lead, Statway, the Charles A. Dana Center at the University of Texas at Austin Karon Klipple, director of Statway, the Carnegie Foundation for the Advancement of Teaching Jane Muhich, director of Quantway, the Carnegie Foundation for the Advancement of Teaching Project Staff—Charles A. Dana Center at the University of Texas at Austin Richard Blount, advisor Kathi Cook, project director, online services team Jenna Cullinane, research associate Steve Engler, lead editor and production editor Amy Getz, team lead for the New Mathways Project Susan Hudson Hull, program director of mathematics national initiatives Joseph Hunt, graduate research assistant Rachel Jenkins, consulting editor Erica Moreno, program coordinator Carol Robinson, administrative associate Cathy Seeley, senior fellow Rachele Seifert, administrative associate Lilly Soto, senior administrative associate Phil Swann, senior designer Laura Torres, graduate research assistant Thomas Wiegel, freelance formatter and proofreader The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. iv Frontmatter Statway—Full Version 1.0, July 2011 Authors Contracted by the Dana Center Roxy Peck, professor emerita of statistics, California Polytechnic State University, San Luis Obispo, California Beth Chance, professor of statistics, California Polytechnic State University, San Luis Obispo, California Robert C. delMas, associate professor of educational psychology, University of Minnesota, Minneapolis, Minnesota Scott Guth, professor of mathematics, Mt. San Antonio College, Walnut, California Rebekah Isaak, graduate research student, University of Minnesota, Minneapolis, Minnesota Leah McGuire, assistant professor, University of Minnesota, Minneapolis, Minnesota Jiyoon Park, graduate research student, University of Minnesota, Minneapolis, Minnesota Brian Kotz, associate professor of mathematics, Montgomery College, Germantown, Maryland Chris Olsen, assistant professor of mathematics and statistics, Grinnell College, Grinnell, Iowa Mary Parker, professor of mathematics, Austin Community College, Austin, Texas Michael A. Posner, associate professor of statistics, Villanova University, Villanova, Pennsylvania Thomas H. Short, professor, John Carroll University, University Heights, Ohio Penny Smeltzer, teacher of statistics, Westwood High School, Austin, Texas Myra Snell, professor of mathematics, Los Medanos College, Pittsburg, California Laura Ziegler, graduate research student, University of Minnesota, Minneapolis, Minnesota Reviewers Contracted by the Dana Center Michelle Brock, American River College, Sacramento, California Thomas J. Connolly, the Charles A. Dana Center at the University of Texas at Austin Andre Freeman, Capital Community College, Hartford, Connecticut Karon Klipple, the Carnegie Foundation for the Advancement of Teaching Roxy Peck, professor emerita of statistics, California Polytechnic State University, San Luis Obispo, California Jim Smart, Tallahassee Community College, Tallahassee, Florida Myra Snell, Los Medanos College, Pittsburg, California Committee for Statistics Learning Outcomes Rose Asera, formerly of the Carnegie Foundation for the Advancement of Teaching Kristen Bishop, formerly of the Charles A. Dana Center at the University of Texas at Austin Richelle (Rikki) Blair, American Mathematical Association of Two-Year Colleges (AMATYC); Lakeland Community College, Ohio David Bressoud, Mathematical Association of America (MAA); Macalester College, Minnesota John Climent, American Mathematical Association of Two-Year Colleges (AMATYC); Cecil College, Maryland Peg Crider, Lone Star College, Tomball, Texas Jenna Cullinane, the Charles A. Dana Center at the University of Texas at Austin Robert C. delMas, Consortium for the Advancement of Undergraduate Statistics Education (CAUSE); University of Minnesota, Minneapolis, Minnesota Bernadine Chuck Fong, the Carnegie Foundation for the Advancement of Teaching Karen Givvin, the University of California, Los Angeles Larry Gray, American Mathematical Society (AMS); University of Minnesota Susan Hudson Hull, the Charles A. Dana Center at the University of Texas at Austin Rob Kimball, American Mathematical Association of Two-Year Colleges (AMATYC); Wake Technical Community College, North Carolina Dennis Pearl, Consortium for the Advancement of Undergraduate Statistics Education (CAUSE); The Ohio State University Roxy Peck, American Statistical Association (ASA); Consortium for the Advancement of Undergraduate Statistics Education (CAUSE); California Polytechnic State University, San Luis Obispo, California The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. v Frontmatter Statway—Full Version 1.0, July 2011 Myra Snell, American Mathematical Association of Two-Year Colleges (AMATYC); Los Medanos College, Pittsburg, California Jim Stigler, the Carnegie Foundation for the Advancement of Teaching; the University of California, Los Angeles Daniel Teague, Mathematical Association of America (MAA); North Carolina School of Science and Mathematics, Durham Uri Treisman, the Carnegie Foundation for the Advancement of Teaching; the Charles A. Dana Center at the University of Texas at Austin Version 1.0 of Statway was developed in collaboration with faculty from the following colleges, the “Collaboratory,” who advised on the development of the course. These Collaboratory colleges are: Florida Miami Dade College, Miami, Florida Tallahassee Community College, Tallahassee, Florida Valencia Community College, Orlando, Florida California American River College, Sacramento, California Foothill College, Los Altos Hills, California Mt. San Antonio College, Walnut, California Pierce College, Woodland Hills, California San Diego City College, San Diego, California California State University System Texas CSU Northridge Sacramento State University San Jose State University Austin Community College, Austin, Texas El Paso Community College, El Paso, Texas Houston Community College, Houston, Texas Northwest Vista College, San Antonio, Texas Richland College, Dallas, Texas Connecticut Washington Capital Community College, Hartford, Connecticut Gateway Community College, New Haven, Connecticut Housatonic Community College, Bridgeport, Connecticut Naugatuck Valley Community College, Waterbury, Connecticut Seattle Central Community College, Seattle, Washington Tacoma Community College, Tacoma, Washington The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. vi Frontmatter Statway—Full Version 1.0, July 2011 Statway, Full Version 1.0, July 2011 Table of Contents Module 1: Statistical Studies and Overview of the Data Analysis Process Lesson 1.1.1: The Statistical Analysis Process Lesson 1.1.2: Types of Statistical Studies and Scope of Conclusions Lesson 1.2.1: Collecting Data by Sampling Lesson 1.2.2: Random Sampling Lesson 1.2.3: Other Sampling Strategies Lesson 1.2.4: Sources of Bias in Sampling Lesson 1.3.1: Collecting Data by Conducting an Experiment Lesson 1.3.2: Other Design Considerations—Blinding, Control Groups, and Placebos Lesson 1.4.1: Drawing Conclusions from Statistical Studies Module 2: Summarizing Data Graphically and Numerically Lesson 2.1.1: Dotplots, Histograms, and Distributions for Quantitative Data Lesson 2.1.2: Constructing Histograms for Quantitative Data Lesson 2.1.3: Comparing Distributions of Quantitative Data in Two Independent Samples Lesson 2.2.1: Quantifying the Center of a Distribution—Sample Mean and Sample Median Lesson 2.2.2: Constructing Histograms for Quantitative Data Lesson 2.3.1: Quantifying Variability Relative to the Median Lesson 2.4.1: Quantifying Variability Relative to the Mean Lesson 2.4.2: The Sample Variance Module 3: Reasoning About Bivariate Numerical Data—Linear Relationships Lesson 3.1.1: Introduction to Scatterplots and Bivariate Relationships Lesson 3.1.2: Developing an Intuitive Sense of Form, Direction, and Strength of the Relationship Between Two Measurements Lesson 3.1.3: Introduction to the Correlation Coefficient and Its Properties Lesson 3.1.4: Correlation Formula Lesson 3.1.5: Correlation Is Not Causation Lesson 3.2.1: Using Lines to Make Predictions Lesson 3.2.2: Least Squares Regression Line as Line of Best Fit The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. vii Frontmatter Statway—Full Version 1.0, July 2011 Lesson 3.2.3: Investigating the Meaning of Numbers in the Equation of a Line Lesson 3.2.4: Special Properties of the Least Squares Regression Line Lesson 3.3.1: Using Residuals to Determine If a Line Is a Good Fit Lesson 3.3.2: Using Residuals to Determine If a Line Is an Appropriate Model Module 4: Modeling Nonlinear Relationships Lesson 4.1.1: Investigating Patterns in Data Lesson 4.1.2: Exponential Models Lesson 4.1.3: Assessing How Well a Model Fits the Data Module 5: Reasoning About Bivariate Categorical Data and Introduction to Probability Lesson 5.1.1: Reasoning About Risk and Chance Lesson 5.1.2: Defining Risk Lesson 5.1.3: Interpreting Risk Lesson 5.1.4: Comparing Risks Lesson 5.1.5: More on Conditional Risks Module 6: Formalizing Probability and Probability Distributions Lesson 6.1.1: Probability Lesson 6.1.2: Probability Rules Lesson 6.1.3: Simulation, Discrete Random Variables, and Probability Distributions Lesson 6.2.1: Probability Distributions of Continuous Random Variables Lesson 6.2.2: Z-Scores and Normal Distributions Lesson 6.2.3: Using Normal Distributions to Find Probabilities and Critical Values Module 7: Linking Probability to Statistical Inference Lesson 7.1.1: Predicting an Election—Statistics and Sampling Variability Lesson 7.1.2: Sampling from a Population Lesson 7.1.3: Testing Statistical Hypotheses Lesson 7.2.1: Two Types of Inferential Procedures—Estimation and Hypothesis Testing Lesson 7.2.2: Connecting Sampling Distributions and Confidence Intervals Lesson 7.2.3: Connecting Sampling Distributions and Hypothesis Testing The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. viii Frontmatter Statway—Full Version 1.0, July 2011 Module 8: Inference for One Proportion Lesson 8.1.1: Sampling Distribution of One Proportion Lesson 8.1.2: Sampling Distribution of One Proportion Lesson 8.2.1: Estimation of One Proportion Lesson 8.2.2: Estimation of One Proportion Lesson 8.3.1: Estimation of One Proportion Lesson 8.3.2: Hypothesis Testing for One Proportion Module 9: Inference for Two Proportions Lesson 9.1.1: Sampling Distribution of Differences of Two Proportions Lesson 9.1.2: Using Technology to Explore the Sampling Distribution of the Differences in Two Proportions Lesson 9.2.1: Confidence Intervals for the Difference in Two Population Proportions Lesson 9.2.2: Computing and Interpreting Confidence Intervals for the Difference in Two Population Proportions Lesson 9.3.1: A Statistical Test for the Difference in Two Population Proportions Lesson 9.3.2: A Statistical Test for the Difference in Two Population Proportions Lesson 9.3.3: Conducting a Statistical Test for the Difference in Two Population Proportions Module 10: Inference for Means Lesson 10.1.1: The Sampling Distribution of the Sample Mean Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape Lesson 10.2.1: Estimating a Population Mean Lesson 10.2.2: T-Statistics and T-Distributions Lesson 10.2.3: The Confidence Interval for a Population Mean Lesson 10.3.1: Testing Hypotheses About a Population Mean Lesson 10.3.2: Test Statistic and P-Values, One-Sample T-Test Lesson 10.4.1: Inferences About the Difference Between Two Population Means Lesson 10.4.2: Inference for Paired Data Lesson 10.4.3: Two-Sample T-Test The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. ix Frontmatter Statway—Full Version 1.0, July 2011 Module 11: Chi-Squared Tests Lesson 11.1.1: Introduction to Chi-Square Tests for One-Way Tables Lesson 11.1.2: Executing the Chi-Square Test for One-Way Tables (Goodness-of-Fit) Lesson 11.1.3: The Chi-Square Distribution and Degrees of Freedom Lesson 11.2.1: Introduction to Chi-Square Tests for Two-Way Tables Lesson 11.2.2: Executing the Chi-Square Test for Independence in Two-Way Tables Lesson 11.2.3: Executing the Chi-Square Test for Homogeneity in Two-Way Tables Module 12: Other Mathematical Content Lesson 12.1.1: Statistical Linear Relationships and Mathematical Models of Linear Relationships Lesson 12.1.2: Mathematical Linear Models Lesson 12.1.3: Contrasting Mathematical and Statistical Linear Relationships Lesson 12.1.4: Proportional Models Lesson 12.2.1: Multiple Representations of Exponential Models Lesson 12.2.2: Linear Models—Answering Various Types of Questions Algebraically Lesson 12.2.3: Power Models Lesson 12.2.4: Solving Inequalities The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. x Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean Estimated number of 50-‐minute class sessions: 1 Learning Goals Students will begin to understand that • sampling variability is a characteristic of the sampling process when taking a sample mean from a single population. larger samples have sample means that tend to more accurately estimate the population mean, so there is less variability in sample means for large samples when compared to smaller samples. the sampling distribution of sample means is similar to the sampling of sample proportions. Specifically, the mean of the sample means is the population mean. As the sample size increases, the standard deviation of the sample means decreases. The shape of the sampling distribution of sample means is normal if the sample size is large. • • Students will be able to • given parameters for a population, identify or draw a sampling distribution with appropriate mean and standard error. use the formula for standard error to determine the standard error given ! and sample size. • Part I Introduction [Student Handout] In Modules 7–9, you focused on using sample proportions to estimate or test a claim about a population proportion. To create a confidence interval or test a hypothesis, you had to understand how sample proportions behave as you collect random samples from the population. Of course, you also needed to understand how the sample proportions are related to the population proportion. In these previous modules, you worked with categorical variables. In this module, the focus is on quantitative variables, so you will calculate means from sample data instead of proportions. The first task of this lesson asks you to recall what you know about sampling distributions for proportions. Next, you are asked to think about how these ideas might extend to sampling distributions for means. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean Task [Student Handout] (1) Here you return to a problem that is similar to an investigation from Module 7. To estimate the proportion of voters in a city who support Candidate A for mayor, the following steps are taken: • • • • Simulate polling data for the categorical variable, “Which candidate do you support?” Simulate collecting random samples of voters from all over the city. For each sample, calculate the proportion of the sample that supports Candidate A. Graph the distribution of sample proportions. (a) The dotplot to the right shows the results from the simulation for 200 samples, each with 50 voters. What does a dot represent in the dotplot of the sampling distribution? (b) Based on the dotplot, give a single number estimate for the proportion of voters in the city who support Candidate A. Jot down notes to capture your thinking. Measures from Sample Size n=50 0.3 0.4 0.5 0.6 0.7 0.8 Proportion_for_A Dot Plot 0.9 1.0 (c) The simulation was conducted twice, using different sample sizes. Each time, 200 samples were collected. One simulation used samples of size 25, while the other used samples of size 100. Which of the following graphs is size 25? Size 100? Jot down notes to capture your thinking. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean (2) Now imagine that you want to estimate the average time it takes someone to vote from the time that person reaches the polling place. As you did in Module 7, a simulated data set is created for the quantitative variable time to vote. The simulation mimics the collecting of random samples of 50 people at a time from the population of voters. Each sample has 50 people in it. Since you want to estimate the average time it takes to vote, the mean voting time is calculated for each sample in the simulation. Two hundred samples are collected. (a) What does a dot represent in the dotplot of the sampling distribution? Measures from Sample size n=50 vote time (b) Give a single number estimate for the mean amount of time it takes an individual to vote in this city. (In other words, estimate the population mean using the distribution of sample means.) Jot down notes to capture your thinking. 16.0 16.5 17.0 17.5 18.0 18.5 19.0 sample_mean_time Dot Plot 19.5 20.0 (c) If you run the simulation again using a sample size of 100, how do you think the distribution of sample means changes? (d) Now think about the distribution of voting times for all individual voters in the city. How do you think the center and spread for the distribution of sample means compares to the center and spread of the distribution of individual voting times for all voters in the city? Wrap-‐Up/Discussion of Statistical Concepts The rest of this lesson is devoted to understanding and describing the sampling distribution for sample means. So at this point, you do not need to deliver a lecture on the features of the sampling distribution for sample means. Instead, have a few students share their answers and reasoning for Question 2, and then generalize the reasoning into conjectures about the behavior of sample means. Most likely, students will use the center of the sampling distribution as an estimate for the mean time it takes to vote in the city. Point out that this makes sense because sample means are expected to be pretty good estimates for the population mean. So, it is anticipated that many samples will have means close to the population mean. Prompt students to make connections to the sampling distribution for proportions, which is also centered at the population parameter. Most likely, students will also predict that larger samples have less variability. Point out that this makes sense because larger samples are expected to provide more accurate predictions for the overall mean voting time. Prompt students to make connections to the effect of sample size on the variability in the sampling distribution for proportions. You can also tie the previous ideas to conjectures about the population’s center and spread. It is anticipated that the individual voting times will have more variability (you can think of this as a sampling distribution with sample size of 1) but a center (mean) similar to the center (mean) of the sampling distributions. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean Part II [Student Handout] In the southwestern United States, live oak trees (quercus viriniana) produce a bounty of acorns that serve as food for many types of birds and small mammals. Early Native Americans used the sweet meat as a part of their fall diet. Today, large hardwood trees are important in reducing carbon dioxide, the notorious greenhouse gas. Recent changes in temperature and rainfall in Central Texas may have an effect on the germination of live oaks and the long-‐term future of the trees. It is known that larger acorns have a better chance for germination. Rainfall and temperature seem to have an effect on acorn production and size. To set a baseline for long-‐term studies on acorn production and possible changes, a botanist wants to determine the mean weight of acorns from this species of tree. To easily combine your results for comparison, round all sample means to the nearest tenth. (Note: Depending on your class size, you may want to ask students to take more than one sample of acorns.) 1. Using a random number table or the random number generator on your calculator, select a sample of five acorns from the population and find the mean. Repeat as directed by your instructor. To better understand the sampling process as it applies to means, you will simulate the sampling process using a very large collection of these acorns as your population. 2. Place your mean values on the class dotplot for samples of size 5. 3. Using a random number table or the random number generator on your calculator, select a sample of 20 acorns from the population and find the mean. Repeat as directed by your instructor. 4. Place your mean values on the class dotplot for samples of size 20. Acorn Mass No. 1 2 3 4 Mass 3.4 3.5 3.4 3.4 No. 51 52 53 54 Mass 4.5 3.7 4.3 5.3 No. 101 102 103 104 Mass 5.2 5.6 3.9 2 No. 151 152 153 154 Mass 4.9 3.7 3.3 4.5 No. 201 202 203 204 Mass 3.5 3.7 2.6 3.8 No. 251 252 253 254 Mass 0.7 2.6 2.6 2.7 No. 301 302 303 304 Mass 4.6 4.2 3.4 3.9 No. 351 352 353 354 Mass 4.9 2.4 3.8 3.4 5 6 7 8 2.7 4.6 5.3 5.3 55 56 57 58 4.6 4.2 2.7 4.4 105 106 107 108 4.3 4.6 4.8 6.1 155 156 157 158 4.1 3.7 4.2 3.7 205 206 207 208 3.4 4.5 4 5.1 255 256 257 258 2.1 2 2.6 2.5 305 306 307 308 3.1 3.3 3.6 4.2 355 356 357 358 4.4 4.3 5.1 3.4 9 10 11 12 13 3.9 5 3.7 4.7 2.3 59 60 61 62 63 3.5 4.8 5 4.3 2.6 109 110 111 112 113 3.3 4.5 2.5 4.4 3.4 159 160 161 162 163 4.2 5 3.6 4.5 4.1 209 210 211 212 213 3 1.6 3.2 5.4 4.9 259 260 261 262 263 3.7 2.5 0.8 4.4 2.6 309 310 311 312 313 4.1 4.6 3.2 3.5 4.9 359 360 361 362 363 5.1 2.6 5.4 2.7 2 14 15 16 17 18 3.1 3.2 6.6 6.1 5 64 65 66 67 68 3.6 4.6 4.6 3.3 3.8 114 115 116 117 118 4.2 4.8 3.8 2.6 4.3 164 165 166 167 168 3.4 3.8 3.7 4.2 5.3 214 215 216 217 218 2.7 2.9 3.6 3.1 3 264 265 266 267 268 3.8 5.2 2.6 3.5 3.3 314 315 316 317 318 4.2 3.5 4 3.8 4.7 364 365 366 367 368 3.6 2.1 2.3 3.3 3.2 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean No. 19 Mass 3.4 No. 69 Mass 5.4 No. 119 Mass 3.2 No. 169 Mass 4 No. 219 Mass 4.5 No. 269 Mass 2.6 No. 319 Mass 3.6 No. 369 Mass 4.3 20 21 22 23 24 3.8 3.4 3.2 4.1 4.4 70 71 72 73 74 4.4 1.2 4.7 6.3 2.5 120 121 122 123 124 4.2 4.2 3 3.2 3.8 170 171 172 173 174 3.5 3.7 3.9 2.9 3.3 220 221 222 223 224 3.7 3.8 3.8 3.2 3.3 270 271 272 273 274 3.3 5.4 2.6 3.7 4 320 321 322 323 324 3.5 2.9 4.9 5.3 4 370 371 372 373 374 3.7 3.6 3.9 3.3 4.4 25 26 27 28 29 4.2 3.1 3.5 0.9 5.9 75 76 77 78 79 3.4 5.1 2.3 5.6 5.7 125 126 127 128 129 3 5 5 6.3 3.3 175 176 177 178 179 4.1 4.5 3.9 3.8 3.3 225 226 227 228 229 2.7 3.7 3.9 3.8 3.1 275 276 277 278 279 4.2 3.6 2.9 0.7 4.4 325 326 327 328 329 2.9 3.9 3.5 3.3 2.8 375 376 377 378 379 4.8 4.8 3.7 5.8 3.7 30 31 32 33 34 4.1 3.3 4 4.6 5.4 80 81 82 83 84 6.7 4.2 4.1 4.7 5.7 130 131 132 133 134 3.5 5.1 5.2 4.1 5.1 180 181 182 183 184 3.4 3.9 3.5 3.3 2.3 230 231 232 233 234 3.7 3.5 3.6 2 2.7 280 281 282 283 284 2.7 4.3 4 3.6 1.4 330 331 332 333 334 2.9 3.5 3.1 4 3.5 380 381 382 383 384 4.3 2.4 3.5 2.6 4.6 35 36 37 38 39 4.2 4 3.2 5.7 2.5 85 86 87 88 89 5.1 4.2 4.9 1.4 4.6 135 136 137 138 139 3.1 4.6 3.9 4 4 185 186 187 188 189 3.9 3.5 2.5 3.5 5.1 235 236 237 238 239 4.7 0.9 2.7 4 3.4 285 286 287 288 289 3.5 3.3 3.7 2.3 3.6 335 336 337 338 339 4.3 3 2.7 2.1 3.7 385 386 387 388 389 5.7 4.5 3.4 4.1 3.2 40 41 42 43 4 5 2.9 4.3 90 91 92 93 3.5 5.3 5.1 2 140 141 142 143 5.1 4.2 4.4 2.5 190 191 192 193 3.8 2.4 3.2 2.6 240 241 242 243 3 3.5 3.7 4.3 290 291 292 293 5 2 4.6 2.9 340 341 342 343 2.3 4.1 3.9 3.3 390 391 392 393 3.8 4.1 2.4 3.1 44 45 46 47 48 3.7 5.1 2.2 4.4 5.1 94 95 96 97 98 4.8 4.6 4.6 4.9 4.9 144 145 146 147 148 4 2.5 4 3.3 3.4 194 195 196 197 198 2.8 4.4 4.1 3.9 3.6 244 245 246 247 248 4.6 3 3.3 1 1.2 294 295 296 297 298 4.5 3.5 3.2 2.6 3.6 344 345 346 347 348 4.2 4.2 1 3.1 1.9 394 395 396 397 398 4.6 3.3 4.9 3 2.6 49 50 3.8 4.4 99 100 4.2 4.3 149 150 3 5.7 199 200 4.5 2.7 249 250 2.6 2.9 299 300 3.8 4.4 349 350 3.7 3.8 399 400 2.2 3.1 (3) Estimate the mean weight of the acorns for each graph. (4) Comment on any similarities and differences between the two graphs. (5) What do you suppose the population of acorns looks like? (6) Given that 80 of the acorns from your population weigh less than 3 grams, calculate the probability that a single randomly selected acorn weighs less than 3 grams. (7) Estimate the probability that a random sample of 5 acorns has a mean weight less than 3 grams. (8) Estimate the probability that a random sample of 20 acorns has a mean weight less than 3 grams. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean Homework [Student Handout] (1) Describe how the center, spread and shape of the sampling distribution of sample means is related to the distribution of individual measurements in the population. (2) Here is another way to view the ideas developed in this lesson. Go to www.stat.tamu.edu/~west/ applets/samplemean.html. The applet is set so that !µ ! = !"##!$%&!! ! = !' . (a) Use the applet to complete the table below. You can control the size of the sample using the slider. Collect 20 samples of each sample size. Sample Size 1 4 9 16 25 Number of samples Mean of means Standard deviation of means (b) Explain how the values in the table relate to what you learned in this lesson. (c) Explain how the following graph (created by this applet) relates to what you learned in this lesson. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean Part I In Modules 7–9, you focused on using sample proportions to estimate or test a claim about a population proportion. To create a confidence interval or test a hypothesis, you had to understand how sample proportions behave as you collect random samples from the population. Of course, you also needed to understand how the sample proportions are related to the population proportion. In these previous modules, you worked with categorical variables. In this module, the focus is on quantitative variables, so you will calculate means from sample data instead of proportions. The first task of this lesson asks you to recall what you know about sampling distributions for proportions. Next, you are asked to think about how these ideas might extend to sampling distributions for means. (1) Here you return to a problem that is similar to an investigation from Module 7. To estimate the proportion of voters in a city who support Candidate A for mayor, the following steps are taken: • • • • Simulate polling data for the categorical variable, “Which candidate do you support?” Simulate collecting random samples of voters from all over the city. For each sample, calculate the proportion of the sample that supports Candidate A. Graph the distribution of sample proportions. (a) The dotplot to the right shows the results from the simulation for 200 samples, each with 50 voters. What does a dot represent in the dotplot of the sampling distribution? Measures from Sample Size n=50 Dot Plot 0.3 0.4 0.5 0.6 0.7 0.8 Proportion_for_A 0.9 1.0 (b) Based on the dotplot, give a single number estimate for the proportion of voters in the city who support Candidate A. Jot down notes to capture your thinking. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean (c) The simulation was conducted twice, using different sample sizes. Each time, 200 samples were collected. One simulation used samples of size 25, while the other used samples of size 100. Which of the following graphs is size 25? Size 100? Jot down notes to capture your thinking. (2) Now imagine that you want to estimate the average time it takes someone to vote from the time that person reaches the polling place. As you did in Module 7, a simulated data set is created for the quantitative variable time to vote. The simulation mimics the collecting of random samples of 50 people at a time from the population of voters. Each sample has 50 people in it. Since you want to estimate the average time it takes to vote, the mean voting time is calculated for each sample in the simulation. Two hundred samples are collected. (a) What does a dot represent in the dotplot of the sampling distribution? Measures from Sample size n=50 vote time 16.0 16.5 17.0 17.5 18.0 18.5 19.0 sample_mean_time Dot Plot 19.5 20.0 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean (b) Give a single number estimate for the mean amount of time it takes an individual to vote in this city. (In other words, estimate the population mean using the distribution of sample means.) Jot down notes to capture your thinking. (c) If you run the simulation again using a sample size of 100, how do you think the distribution of sample means changes? (d) Now think about the distribution of voting times for all individual voters in the city. How do you think the center and spread for the distribution of sample means compares to the center and spread of the distribution of individual voting times for all voters in the city? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean Part II In the southwestern United States, live oak trees (quercus viriniana) produce a bounty of acorns that serve as food for many types of birds and small mammals. Early Native Americans used the sweet meat as a part of their fall diet. Today, large hardwood trees are important in reducing carbon dioxide, the notorious greenhouse gas. Recent changes in temperature and rainfall in Central Texas may have an effect on the germination of live oaks and the long-‐term future of the trees. It is known that larger acorns have a better chance for germination. Rainfall and temperature seem to have an effect on acorn production and size. To set a baseline for long-‐term studies on acorn production and possible changes, a botanist wants to determine the mean weight of acorns from this species of tree. To easily combine your results for comparison, round all sample means to the nearest tenth. 1. Using a random number table or the random number generator on your calculator, select a sample of five acorns from the population and find the mean. Repeat as directed by your instructor. To better understand the sampling process as it applies to means, you will simulate the sampling process using a very large collection of these acorns as your population. 2. Place your mean values on the class dotplot for samples of size 5. 3. Using a random number table or the random number generator on your calculator, select a sample of 20 acorns from the population and find the mean. Repeat as directed by your instructor. 4. Place your mean values on the class dotplot for samples of size 20. Acorn Mass No. 1 2 3 Mass 3.4 3.5 3.4 No. 51 52 53 Mass 4.5 3.7 4.3 No. 101 102 103 Mass 5.2 5.6 3.9 No. 151 152 153 Mass 4.9 3.7 3.3 No. 201 202 203 Mass 3.5 3.7 2.6 No. 251 252 253 Mass 0.7 2.6 2.6 No. 301 302 303 Mass 4.6 4.2 3.4 No. 351 352 353 Mass 4.9 2.4 3.8 4 5 6 7 8 3.4 2.7 4.6 5.3 5.3 54 55 56 57 58 5.3 4.6 4.2 2.7 4.4 104 105 106 107 108 2 4.3 4.6 4.8 6.1 154 155 156 157 158 4.5 4.1 3.7 4.2 3.7 204 205 206 207 208 3.8 3.4 4.5 4 5.1 254 255 256 257 258 2.7 2.1 2 2.6 2.5 304 305 306 307 308 3.9 3.1 3.3 3.6 4.2 354 355 356 357 358 3.4 4.4 4.3 5.1 3.4 9 10 11 12 13 3.9 5 3.7 4.7 2.3 59 60 61 62 63 3.5 4.8 5 4.3 2.6 109 110 111 112 113 3.3 4.5 2.5 4.4 3.4 159 160 161 162 163 4.2 5 3.6 4.5 4.1 209 210 211 212 213 3 1.6 3.2 5.4 4.9 259 260 261 262 263 3.7 2.5 0.8 4.4 2.6 309 310 311 312 313 4.1 4.6 3.2 3.5 4.9 359 360 361 362 363 5.1 2.6 5.4 2.7 2 14 15 16 17 3.1 3.2 6.6 6.1 64 65 66 67 3.6 4.6 4.6 3.3 114 115 116 117 4.2 4.8 3.8 2.6 164 165 166 167 3.4 3.8 3.7 4.2 214 215 216 217 2.7 2.9 3.6 3.1 264 265 266 267 3.8 5.2 2.6 3.5 314 315 316 317 4.2 3.5 4 3.8 364 365 366 367 3.6 2.1 2.3 3.3 18 19 20 21 5 3.4 3.8 3.4 68 69 70 71 3.8 5.4 4.4 1.2 118 119 120 121 4.3 3.2 4.2 4.2 168 169 170 171 5.3 4 3.5 3.7 218 219 220 221 3 4.5 3.7 3.8 268 269 270 271 3.3 2.6 3.3 5.4 318 319 320 321 4.7 3.6 3.5 2.9 368 369 370 371 3.2 4.3 3.7 3.6 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean No. 22 Mass 3.2 No. 72 Mass 4.7 No. 122 Mass 3 No. 172 Mass 3.9 No. 222 Mass 3.8 No. 272 Mass 2.6 No. 322 Mass 4.9 No. 372 Mass 3.9 23 24 25 26 27 4.1 4.4 4.2 3.1 3.5 73 74 75 76 77 6.3 2.5 3.4 5.1 2.3 123 124 125 126 127 3.2 3.8 3 5 5 173 174 175 176 177 2.9 3.3 4.1 4.5 3.9 223 224 225 226 227 3.2 3.3 2.7 3.7 3.9 273 274 275 276 277 3.7 4 4.2 3.6 2.9 323 324 325 326 327 5.3 4 2.9 3.9 3.5 373 374 375 376 377 3.3 4.4 4.8 4.8 3.7 28 29 30 31 32 0.9 5.9 4.1 3.3 4 78 79 80 81 82 5.6 5.7 6.7 4.2 4.1 128 129 130 131 132 6.3 3.3 3.5 5.1 5.2 178 179 180 181 182 3.8 3.3 3.4 3.9 3.5 228 229 230 231 232 3.8 3.1 3.7 3.5 3.6 278 279 280 281 282 0.7 4.4 2.7 4.3 4 328 329 330 331 332 3.3 2.8 2.9 3.5 3.1 378 379 380 381 382 5.8 3.7 4.3 2.4 3.5 33 34 35 36 37 4.6 5.4 4.2 4 3.2 83 84 85 86 87 4.7 5.7 5.1 4.2 4.9 133 134 135 136 137 4.1 5.1 3.1 4.6 3.9 183 184 185 186 187 3.3 2.3 3.9 3.5 2.5 233 234 235 236 237 2 2.7 4.7 0.9 2.7 283 284 285 286 287 3.6 1.4 3.5 3.3 3.7 333 334 335 336 337 4 3.5 4.3 3 2.7 383 384 385 386 387 2.6 4.6 5.7 4.5 3.4 38 39 40 41 42 5.7 2.5 4 5 2.9 88 89 90 91 92 1.4 4.6 3.5 5.3 5.1 138 139 140 141 142 4 4 5.1 4.2 4.4 188 189 190 191 192 3.5 5.1 3.8 2.4 3.2 238 239 240 241 242 4 3.4 3 3.5 3.7 288 289 290 291 292 2.3 3.6 5 2 4.6 338 339 340 341 342 2.1 3.7 2.3 4.1 3.9 388 389 390 391 392 4.1 3.2 3.8 4.1 2.4 43 44 45 46 4.3 3.7 5.1 2.2 93 94 95 96 2 4.8 4.6 4.6 143 144 145 146 2.5 4 2.5 4 193 194 195 196 2.6 2.8 4.4 4.1 243 244 245 246 4.3 4.6 3 3.3 293 294 295 296 2.9 4.5 3.5 3.2 343 344 345 346 3.3 4.2 4.2 1 393 394 395 396 3.1 4.6 3.3 4.9 47 48 49 50 4.4 5.1 3.8 4.4 97 98 99 100 4.9 4.9 4.2 4.3 147 148 149 150 3.3 3.4 3 5.7 197 198 199 200 3.9 3.6 4.5 2.7 247 248 249 250 1 1.2 2.6 2.9 297 298 299 300 2.6 3.6 3.8 4.4 347 348 349 350 3.1 1.9 3.7 3.8 397 398 399 400 3 2.6 2.2 3.1 (3) Estimate the mean weight of the acorns for each graph. (4) Comment on any similarities and differences between the two graphs. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean (5) What do you suppose the population of acorns looks like? (6) Given that 80 of the acorns from your population weigh less than 3 grams, calculate the probability that a single randomly selected acorn weighs less than 3 grams. (7) Estimate the probability that a random sample of 5 acorns has a mean weight less than 3 grams. (8) Estimate the probability that a random sample of 20 acorns has a mean weight less than 3 grams. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean Homework (1) Describe how the center, spread and shape of the sampling distribution of sample means is related to the distribution of individual measurements in the population. (2) Here is another way to view the ideas developed in this lesson. Go to www.stat.tamu.edu/~west/ applets/samplemean.html. The applet is set so that !µ ! = !"##!$%&!! ! = !' . (a) Use the applet to complete the table below. You can control the size of the sample using the slider. Collect 20 samples of each sample size. Sample Size 1 4 9 16 25 Number of samples Mean of means Standard deviation of means The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 7 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.1.1: The Sampling Distribution of the Sample Mean (b) Explain how the values in the table relate to what you learned in this lesson. (c) Explain how the following graph (created by this applet) relates to what you learned in this lesson. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 8 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape Estimated number of 50-‐minute class sessions: 0.5 Learning Goals Students will understand that the sampling distribution of the sample mean converges to a normal distribution with increasing sample size. Students will be able to describe the morphing of the shape of a sampling distribution from the shape of the population (n = 1) to normal (n  large). Introduction to Context of the Lesson [Student Handout] In Lesson 10.1.1, technology was used to sample from a population of 400 real acorn weights. In this lesson, you will use a computer to extend your study of the sampling distribution of the mean by generating samples, calculating means, and plotting the sampling distribution of the mean. The speed of the computer allows you to simulate the sampling process on a very large scale in a very short time. You will be particularly concerned with the shape of the population distribution and how it compares to the shape of the sampling distribution of the mean for different sample sizes. (Note: Depending on the availability of computers in the classroom, students can do these activities in small groups of two or three, or they may follow along with the instructor projecting on a large screen. The directions that follow are generic and cover both instances. The goal of Task 1 is for students to develop a familiarity with the applet, either following the instructor’s steps or performing the steps at their own computer. If students are performing the simulations on their computers, they will almost certainly have slight problems with the directions—the instructor should actively walk around and detect these problems, as students are sometimes reluctant to admit they are having difficulties.) This lesson uses a terrific website, the Rice Virtual Lab in Statistics (http://onlinestatbook.com/ rvls.html), developed at Rice University in Houston, Texas. To begin, click Simulations/Demonstrations and then Sampling Distribution Simulation. Your primary goal is to understand how the shape of a population distribution and the size of a sample affect the sampling distribution of the mean. Your tool for reaching this goal is computer-‐generated sampling experiments. There are two tasks in this lesson. First, you need to understand how to interpret the output of the simulation. Then you use the simulation to compare sampling distributions (a) when the shape of the population distribution is varied and (b) when the sample size is varied. Task 1 [Student Handout] The first task is understanding the simulation output. Click the Begin button and the Sampling Distributions panel appears. There are three regions for plotting histograms; use only the first two (i.e., the middle two regions.) The top region, Parent population, initially shows a population that is approximately normal. The shape of the parent population can be changed using the drop-‐down box that currently says Normal. Click the drop-‐down box and choose a Uniform population. (1) Describe the shape of this population. What are the possible values of x that can be drawn? Does it appear that some numbers are more likely to occur than others? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape In the first simulation, you take samples from this uniformly distributed population. As the simulation unfolds, two distributions are plotted. The first, Sample Data, keeps track of each individual sample. The second, Distribution of Means, keeps track of the distribution of sample means. This distribution changes as you add the results of each sample. Set the Distribution of Means drop-‐down boxes to Mean and N = 2, and press the Sample Data button labeled Animated. (2) Describe the results you see in the Sample Data distribution? What are these two values? (3) Describe the result you see in the Distribution of Means distribution. What is this value? Click the Animated button a few times to observe what is happening to the Sample Data and the Distribution of Means histograms. (4) What information does the Sample Data histogram contain? (5) What information does the Distribution of Means histogram contain? You can speed up the simulation by choosing more than one sample at a time. Repeatedly click the Sample Data button labeled 5. (6) What is happening as you click the 5 sample button? (Hint: Notice the Reps = row to the left of the Distribution of Means histogram.) To really speed up the simulation, click the 1,000 and 10,000 sample buttons. Click these until the shape of the distribution of means changes very little with each click. When this distribution of means settles down, you have completed the simulation of the sampling distribution of the mean for a uniform parent population and a sample size of 2! (7) Describe the distribution of means. What is the range of values in this distribution? What is the shape of the distribution? Where is it centered? Finally, check the Fit normal box to the right of the Distribution of Means histogram. The computer superimposes a normal distribution with the mean and standard deviation equal to the results of the simulation. (8) How does your distribution of means compare to the theoretical normal distribution? Where is the normal distribution higher than your histogram? Where is the normal distribution lower than your histogram? Wrap-‐Up/Transition to Task 2 At this point, students should understand the basic output of these programs. If students are performing the simulations at their desks, they should feel comfortable with the applet. In Task 2, you focus on two questions: (a) How does the sampling distribution of the mean change The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape when the shape of the parent population changes? and (b) How does the sampling distribution change when the sample size increases? At the end of Task 2, students should see that • the shape of the sampling distribution tends to look more “normalish” than the parent population. • the shape of the sampling distribution tends to look more and more “normalish” as the sample size increases. Task 2 [Student Handout] Now that you have mastered the simulation techniques, let’s see how the shape of the sampling distribution changes when you vary the parent population and the sample size. Click the Uniform drop-‐ down menu to check out the possibilities for parent population shapes. In the first part of this task, you use the built-‐in parent populations; then you expand your horizons and create some custom parent populations. For the second task, work in teams of two or three. Divide and conquer the work, and discuss your collective results with team members and come to a single conclusion for the task’s questions. Task 2a With the Fit normal box checked, perform the simulation for each of the built-‐in parent population shapes and sample sizes. Sketch your results in the boxes below. You may perform the simulation with all the possible sample sizes, but only sketch the results of the sample sizes N = 2, N = 10, and N = 20. Uniform N = 2 N = 10 N = 20 Normal N = 2 N = 10 N = 20 Skewed N = 2 N = 20 N = 10 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape (9) As the sample goes from N = 2 to N = 20, what changes do you see in the shapes of these sampling distributions? Are these changes very small or very large? (10) As the sample goes from N = 2 to N = 20, what changes do you see in the location of the sampling distributions? (Do the distributions move to the left or right, or do they stay where they are?) Are these changes very small or very large? (11) As the sample goes from N = 2 to N = 20, what changes do you see in the variability of the sampling distributions? (Do the distributions get more variable or less variable, or do they stay about the same?) Are these changes very small or very large? (12) In a short paragraph, summarize your findings in Questions 9–11. Specifically, compare the changes you observed in the uniform, normal, and skewed population distributions. (Note: At this point, students should notice the sampling distribution converging to a normal shape as the sample size increases. Students may delineate changes in each distribution and fail to notice this common change across the different populations. Ask students to summarize what they are finding, and ask if they discovered common behavior of the sampling distributions for the different parent distributions.) Task 2b Now that you have gained some experience with simulations of sampling distributions with different populations, you will create some populations of your own and experiment with their sampling distributions of the mean. The first custom parent population you will consider is one that is seriously skewed. Use your mouse to customize the parent population so that it looks like the figure below. This is accomplished by dragging your mouse over the distribution. You only need to get close— do not worry about being perfect here! Seriously Skewed N = 2 N = 5 N = 20 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape The second custom parent population you will consider is one that is U-‐shaped. Use your mouse to customize the parent population so that it looks like the figure below. (Again, you only need to get close—do not worry about being perfect here!) Seriously Weird N = 2 N = 5 N = 20 You have now simulated sampling from two very different populations. For the following questions, consider the changes that you saw in both sampling distributions, not each one. That is, how would you describe the behaviors that are similar for both distributions? (Note: Again, the idea here is that students see a common thread in the changes, not separate behaviors in the two parent distributions.) (13) As the sample goes from N = 2 to N = 20, what changes do you see in the shapes of these sampling distributions? Are these changes very small or very large? (14) As the sample goes from N = 2 to N = 20, what changes do you see in the location of the sampling distributions? (Do the distributions move to the left or right, or do they stay where they are?) Are these changes very small or very large? (15) As the sample goes from N = 2 to N = 20, what changes do you see in the variability of the sampling distributions? (Do the distributions get more variable or less variable, or do they stay about the same?) Are these changes very small or very large? (Note: Once again, ask students to summarize what they are finding, and ask if they discovered common behavior of the sampling distributions for the different sample sizes. Consider the following question, followed by class discussion: “In a short paragraph, summarize the answers to Questions 13 –15. Again, focus on summarizing the changes you observed in both distributions.”) The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape Task 2c Okay, it is confession time. You probably have noticed something about the sampling distribution of the mean: As the sample size gets larger, the sampling distribution of the mean seems to get closer and closer to a normal distribution. However, now is the time to unleash your secret suspicion that these distributions were concocted to convince you of something that does not actually work all the time. Here is your chance to check out this conspiracy theory! With the rest of your team, create the weirdest, wildest parent population you can collectively imagine. Implement your custom distribution with the mouse, sketch the distribution, and then simulate the sampling distribution of the mean as you did in the earlier tasks. Seriously Your Own N = 2 N = 5 N = 20 Wrap-‐Up Hopefully, students have tried any number of bizarre distributions. Encourage them to show the class what they tried and how the shape of the sampling distribution changed as the sample size increased. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape Homework [Student Handout] 1. Write a short paragraph that expresses your belief about the sampling distribution of the mean. “Based on our work with a variety of parent distributions and sample sizes, it seems to be the case that…” (Note: Assign these questions if students have out-‐of-‐class access to computers.) 2. A possible population distribution is one that is triangular. Use the applet to construct a triangular population distribution. Put the tip of the triangle at about 8. How does the shape of the sampling distribution change as the sample size increases? Write a short summary paragraph of your observations. 3. Yet another possible population distribution is one that is bimodal. For example, suppose you took samples of the length of human hair in your school. It is probable (but not certain!) that young women tend to have long hair and young men have short hair. Use the applet to construct a bimodal population distribution. How does the shape of the sampling distribution change as the sample size increases? Write a short summary paragraph of your observations. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 7 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape In Lesson 10.1.1, technology was used to sample from a population of 400 real acorn weights. In this lesson, you will use a computer to extend your study of the sampling distribution of the mean by generating samples, calculating means, and plotting the sampling distribution of the mean. The speed of the computer allows you to simulate the sampling process on a very large scale in a very short time. You will be particularly concerned with the shape of the population distribution and how it compares to the shape of the sampling distribution of the mean for different sample sizes. This lesson uses a terrific website, the Rice Virtual Lab in Statistics (http://onlinestatbook.com/ rvls.html), developed at Rice University in Houston, Texas. To begin, click Simulations/Demonstrations and then Sampling Distribution Simulation. Your primary goal is to understand how the shape of a population distribution and the size of a sample affect the sampling distribution of the mean. Your tool for reaching this goal is computer-‐generated sampling experiments. There are two tasks in this lesson. First, you need to understand how to interpret the output of the simulation. Then you use the simulation to compare sampling distributions (a) when the shape of the population distribution is varied and (b) when the sample size is varied. Task 1 The first task is understanding the simulation output. Click the Begin button and the Sampling Distributions panel appears. There are three regions for plotting histograms; use only the first two (i.e., the middle two regions.) The top region, Parent population, initially shows a population that is approximately normal. The shape of the parent population can be changed using the drop-‐down box that currently says Normal. Click the drop-‐down box and choose a Uniform population. (1) Describe the shape of this population. What are the possible values of x that can be drawn? Does it appear that some numbers are more likely to occur than others? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape In the first simulation, you take samples from this uniformly distributed population. As the simulation unfolds, two distributions are plotted. The first, Sample Data, keeps track of each individual sample. The second, Distribution of Means, keeps track of the distribution of sample means. This distribution changes as you add the results of each sample. Set the Distribution of Means drop-‐down boxes to Mean and N = 2, and press the Sample Data button labeled Animated. (2) Describe the results you see in the Sample Data distribution? What are these two values? (3) Describe the result you see in the Distribution of Means distribution. What is this value? Click the Animated button a few times to observe what is happening to the Sample Data and the Distribution of Means histograms. (4) What information does the Sample Data histogram contain? (5) What information does the Distribution of Means histogram contain? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape You can speed up the simulation by choosing more than one sample at a time. Repeatedly click the Sample Data button labeled 5. (6) What is happening as you click the 5 sample button? (Hint: Notice the Reps = row to the left of the Distribution of Means histogram.) To really speed up the simulation, click the 1,000 and 10,000 sample buttons. Click these until the shape of the distribution of means changes very little with each click. When this distribution of means settles down, you have completed the simulation of the sampling distribution of the mean for a uniform parent population and a sample size of 2! (7) Describe the distribution of means. What is the range of values in this distribution? What is the shape of the distribution? Where is it centered? Finally, check the Fit normal box to the right of the Distribution of Means histogram. The computer superimposes a normal distribution with the mean and standard deviation equal to the results of the simulation. (8) How does your distribution of means compare to the theoretical normal distribution? Where is the normal distribution higher than your histogram? Where is the normal distribution lower than your histogram? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape Task 2 Now that you have mastered the simulation techniques, let’s see how the shape of the sampling distribution changes when you vary the parent population and the sample size. Click the Uniform drop-‐ down menu to check out the possibilities for parent population shapes. In the first part of this task, you use the built-‐in parent populations; then you expand your horizons and create some custom parent populations. For the second task, work in teams of two or three. Divide and conquer the work, and discuss your collective results with team members and come to a single conclusion for the task’s questions. Task 2a With the Fit normal box checked, perform the simulation for each of the built-‐in parent population shapes and sample sizes. Sketch your results in the boxes below. You may perform the simulation with all the possible sample sizes, but only sketch the results of the sample sizes N = 2, N = 10, and N = 20. Uniform N = 2 N = 10 N = 20 Normal N = 2 N = 10 N = 20 Skewed N = 2 N = 20 N = 10 (9) As the sample goes from N = 2 to N = 20, what changes do you see in the shapes of these sampling distributions? Are these changes very small or very large? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape (10) As the sample goes from N = 2 to N = 20, what changes do you see in the location of the sampling distributions? (Do the distributions move to the left or right, or do they stay where they are?) Are these changes very small or very large? (11) As the sample goes from N = 2 to N = 20, what changes do you see in the variability of the sampling distributions? (Do the distributions get more variable or less variable, or do they stay about the same?) Are these changes very small or very large? (12) In a short paragraph, summarize your findings in Questions 9–11. Specifically, compare the changes you observed in the uniform, normal, and skewed population distributions. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape Task 2b Now that you have gained some experience with simulations of sampling distributions with different populations, you will create some populations of your own and experiment with their sampling distributions of the mean. The first custom parent population you will consider is one that is seriously skewed. Use your mouse to customize the parent population so that it looks like the figure below. This is accomplished by dragging your mouse over the distribution. You only need to get close—do not worry about being perfect here! Seriously Skewed N = 2 N = 5 N = 20 The second custom parent population you will consider is one that is U-‐shaped. Use your mouse to customize the parent population so that it looks like the figure below. (Again, you only need to get close—do not worry about being perfect here!) Seriously Weird N = 2 N = 5 N = 20 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape You have now simulated sampling from two very different populations. For the following questions, consider the changes that you saw in both sampling distributions, not each one. That is, how would you describe the behaviors that are similar for both distributions? (13) As the sample goes from N = 2 to N = 20, what changes do you see in the shapes of these sampling distributions? Are these changes very small or very large? (14) As the sample goes from N = 2 to N = 20, what changes do you see in the location of the sampling distributions? (Do the distributions move to the left or right, or do they stay where they are?) Are these changes very small or very large? (15) As the sample goes from N = 2 to N = 20, what changes do you see in the variability of the sampling distributions? (Do the distributions get more variable or less variable, or do they stay about the same?) Are these changes very small or very large? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 7 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape Task 2c Okay, it is confession time. You probably have noticed something about the sampling distribution of the mean: As the sample size gets larger, the sampling distribution of the mean seems to get closer and closer to a normal distribution. However, now is the time to unleash your secret suspicion that these distributions were concocted to convince you of something that does not actually work all the time. Here is your chance to check out this conspiracy theory! With the rest of your team, create the weirdest, wildest parent population you can collectively imagine. Implement your custom distribution with the mouse, sketch the distribution, and then simulate the sampling distribution of the mean as you did in the earlier tasks. Seriously Your Own N = 2 N = 5 N = 20 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 8 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape Homework 1. Write a short paragraph that expresses your belief about the sampling distribution of the mean. “Based on our work with a variety of parent distributions and sample sizes, it seems to be the case that…” 2. A possible population distribution is one that is triangular. Use the applet to construct a triangular population distribution. Put the tip of the triangle at about 8. How does the shape of the sampling distribution change as the sample size increases? Write a short summary paragraph of your observations. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 9 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape 3. Yet another possible population distribution is one that is bimodal. For example, suppose you took samples of the length of human hair in your school. It is probable (but not certain!) that young women tend to have long hair and young men have short hair. Use the applet to construct a bimodal population distribution. How does the shape of the sampling distribution change as the sample size increases? Write a short summary paragraph of your observations. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 10 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.2.1: Estimating a Population Mean Estimated number of 50-‐minute class sessions: 1 Learning Goals Students will understand that • • • sampling variability is a characteristic of the sampling process when taking a sample mean from a single population. the distribution of the sample mean depends on the sample size and the variability of the population under study. the T-‐family of distributions is used to describe the sampling distribution of the T-‐statistic. Students will begin to be able to • • • identify the appropriate T-‐distribution associated with a sample by finding the correct number of degrees of freedom. calculate the T-‐statistic. use the T-‐procedures to construct a confidence interval for the population mean. Materials Required • • • small numbered tags containers to place the tags T-‐tables The number of tags will vary depending on how you decide to run the experiment. In general, the best approach for the distribution of tag numbers is a triangular distribution such as the following, shown as deviations from a mean of 0: For low-‐variability tags, 0 –1 0 1 –2 –1 0 1 2 –3 –2 –1 0 1 2 3 –4 –3 –2 –1 0 1 2 3 4 –5 –4 –3 –2 –1 0 1 2 3 4 5 0 –2 0 2 –4 –2 0 2 4 –6 –4 –2 0 2 4 6 –8 –6 –4 –2 0 2 4 6 8 –10 –8 –6 –4 –2 0 2 4 6 8 10 For high-‐variability tags, The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.2.1: Estimating a Population Mean Introduction [Student Handout] (Note: Generally these distributions, replicated 3 times for a total of 108 numbers per set, should be sufficient for sample sizes of 2 and 5 [i.e., at least 10 times the sample size]. The general idea behind this lesson is to get students out of the black box and into the “real” world of sampling. The distributions of sample means they get should look like the sampling distributions the computer delivered; the intent here is to engender a physical understanding of the act of sampling from a population and calculating a mean.) In today’s topic, you will capitalize on aspects of inference that you are familiar with. You have previously constructed confidence intervals for and tested hypotheses about population proportions. Recall that the logical basis for making inferences is an understanding of the sampling distribution of a statistic. This distribution gives information about (a) the possible values of sample statistics that may occur and (b) the probabilities of those values appearing as you reach into a population and take a sample. What will be different from your earlier experience is that your sample statistic is not the sample proportion, but the sample mean. As you proceed, there are two major clusters of ideas that you need to keep straight. First, what aspects of the procedures being introduced are very similar to what you have seen before? These are ideas that you will see again as you learn more about inference. Second, what ideas and procedures are being introduced for the first time? Rich Task [Student Handout] In Topic 10.1, you used a computer to simulate the behavior of sample means and observed the power of the Central Limit Theorem. It was suggested that as sample size increases the sampling distribution of the mean gets closer and closer to the normal distribution. In that simulation, you chose the population mean and standard deviation in advance. Now let’s consider a more realistic situation, one faced by everyone who does statistics: What if the population mean or standard deviation is not known? (which, of course, you do not—if you knew those values, sampling would not needed!) Not only that—what if you cannot take sample sizes as large as you wish? Is the behavior of sample means in this circumstance plausibly similar to your theoretical result as demonstrated by the Central Limit Theorem? You will perform a small simulation to see if this seems likely. (Note: There are many ways to do this, and much depends on the preferences of the instructor. One instructor may have only four bags and have each student perform one run of the simulation while discussing a topic. Others may have the teams assemble in groups to perform the runs.) Your instructor will divide you into four teams. Each team will simulate a process of randomly sampling numbered tags for one of the following situations: • • • • The sample size is small, and the variability of the population is relatively large. The sample size is small, and the variability of the population is relatively small. The sample size is really small, and the variability of the population is relatively large. The sample size is really small, and the variability of the population is relatively small. Sample sizes of n = 2 and n = 5 have been arbitrarily chosen. Actually, the choice was not very arbitrary—with these sample sizes, calculation of sample means is easier. Your instructor will let The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.2.1: Estimating a Population Mean you in on the secret means and standard deviations of the tag numbers when you are finished with the simulation. These quantities are being kept secret to prevent your advance knowledge of the population means and standard deviations from biasing your interpretation of the simulation results. (Note: The best graphic portrayal is a dotplot—it is easy to interpret and easy for students to do on the board. The teams will graphically display the simulation results. After all the results are in, take a few moments as a whole class to discuss any differences. Jot down your thoughts for future reference. Focus your discussion on the shape, center, and spread of the distributions of the means: • • • Are the distributions centered at the same number, or at least close? Are the distributions equally variable, or at least close? Are the shapes of the distributions the same, or at least similar? You are now ready for the secret numbers: the true means and standard deviations of the tag populations.) Wrap-‐Up Give students the secret population means and standard deviations of the numbers. Discuss the center, variability, and shapes of the generated distributions of sample means. What should be observable is that the centers are unchanged and variability is lower for populations with small variability and large samples. This is consistent with what students have seen in the computer simulations. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.2.1: Estimating a Population Mean Introduction The general idea behind this lesson is to get students out of the black box and into the “real” world of sampling. The distributions of sample means they get should look like the sampling distributions the computer delivered; the intent here is to engender a physical understanding of the act of sampling from a population and calculating a mean.) In today’s topic, you will capitalize on aspects of inference that you are familiar with. You have previously constructed confidence intervals for and tested hypotheses about population proportions. Recall that the logical basis for making inferences is an understanding of the sampling distribution of a statistic. This distribution gives information about (a) the possible values of sample statistics that may occur and (b) the probabilities of those values appearing as you reach into a population and take a sample. What will be different from your earlier experience is that your sample statistic is not the sample proportion, but the sample mean. As you proceed, there are two major clusters of ideas that you need to keep straight. First, what aspects of the procedures being introduced are very similar to what you have seen before? These are ideas that you will see again as you learn more about inference. Second, what ideas and procedures are being introduced for the first time? Task In Topic 10.1, you used a computer to simulate the behavior of sample means and observed the power of the Central Limit Theorem. It was suggested that as sample size increases the sampling distribution of the mean gets closer and closer to the normal distribution. In that simulation, you chose the population mean and standard deviation in advance. Now let’s consider a more realistic situation, one faced by everyone who does statistics: What if the population mean or standard deviation is not known? (which, of course, you do not—if you knew those values, sampling would not needed!) Not only that—what if you cannot take sample sizes as large as you wish? Is the behavior of sample means in this circumstance plausibly similar to your theoretical result as demonstrated by the Central Limit Theorem? You will perform a small simulation to see if this seems likely. Your instructor will divide you into four teams. Each team will simulate a process of randomly sampling numbered tags for one of the following situations: • • • • The sample size is small, and the variability of the population is relatively large. The sample size is small, and the variability of the population is relatively small. The sample size is really small, and the variability of the population is relatively large. The sample size is really small, and the variability of the population is relatively small. Sample sizes of n = 2 and n = 5 have been arbitrarily chosen. Actually, the choice was not very arbitrary—with these sample sizes, calculation of sample means is easier. Your instructor will let you in on the secret means and standard deviations of the tag numbers when you are finished with the simulation. These quantities are being kept secret to prevent your advance knowledge of the population means and standard deviations from biasing your interpretation of the simulation results. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.2: T-‐Statistics and T-‐Distributions Estimated number of 50-‐minute class sessions: 1 Learning Goals Students will understand that the T-‐family of distributions is used to describe the sampling distribution of the T-‐statistic. Students will begin to be able to • • identify the appropriate T-‐distribution associated with a sample by finding the correct number of degrees of freedom. calculate the T-‐statistic. Introduction [Student Handout] (Note: For the most part, the introduction is a lecture/presentation. Initially, students should recall their physical simulation to orient them to the problem of mathematically describing the calculations with the T-‐distribution. As much as possible, parallels should be drawn to their prior work with the standard normal distribution and inference with proportions. The underlying logic of hypothesis testing and constructing confidence intervals is not changed; only the sampling distribution is new.) The distributions of the results from the simulations in Lesson 10.2.1 should have looked familiar. You may have noticed that the distributions of sample means were centered in different places or that they appeared to have different variability. Now that you know the actual means of the data (and have 20/20 hindsight), do your results seem reasonable? • • For the samples that were the same size, does the distribution of the sample means collected from the more variable population seem more variable? For the samples with different sample sizes but the same variability, was the variability of the sample means less for the samples from the larger size? If you answered yes to both questions, your simulation results are consistent with the mathematical theory of how sampling works in the case of the sample mean. The shape of the sampling distribution of the mean is a more difficult question to address. If you reflected on your past experience, you may have speculated that the shapes of the distributions might be approximately “normal,” which certainly seems reasonable. However, if you are not given the population standard deviation, you must resort to estimating it using the sample standard deviation. When making that substitution, there is a sudden change in the shape of the sampling distribution of the sample mean. The sudden change results in procedures and methods that are very similar to what you have previously used for statistical inference. However, you will have a new family of distributions known as T-‐distributions. When making an inference about a single population mean and substituting the sample standard deviation in place of the population standard deviation, you use two related concepts: T-‐distributions (used when constructing confidence intervals) and the T-‐statistic (used when testing a hypothesis about the population mean). The first change with the new T-‐distributions is that rather than using the standardized variable (z) as in # ! ! !µ "! = ! " $ ! The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.2: T-‐Statistics and T-‐Distributions you substitute your sample standard deviation (s) for σ. The resulting standardized variable and its formula are # ! ! !µ "! = ! $ % ! There are many different normal distributions, one each for any given mean and standard deviation. There are also many different T-‐distributions, and they are distinguished by a positive whole number known (rather mysteriously) as the number of degrees of freedom (df). All T-‐distributions are bell-‐ shaped, similar in appearance to the standard normal curve, and are centered at 0. The T-‐distributions with smaller degrees of freedom are heavier in the tails; as the number of degrees of freedom increases, the tails become less heavy, and eventually the T-‐distributions become practically indistinguishable from the standard normal distribution. The appropriate number of degrees of freedom to use with a given data set depends on the sample size. The T-‐distribution to use when making inferences about a single population mean is the one with degrees of freedom equal to the sample size minus 1 (n – 1). Skill 1: How many degrees of freedom? [Student Handout] To help you become familiar with the skills needed for inference about a population mean, you will consider some actual data. Your newly developed skills will be • • identifying the correct number of degrees of freedom to use for any given sample, and thus the correct sampling distribution for the sample mean; and calculating the T-‐statistic for a sample. Technically, to calculate a T-‐statistic you need to either know or hypothesize a population mean. For your example, µ = 0.5 has been arbitrarily selected for reasons that will become apparent shortly. The data you will consider are measures of the coiling behavior of some samples of cottonmouth snakes. There are samples of snakes of both genders and two age ranges to analyze: adult males, adult females, juvenile males, and juvenile females.1 You will investigate the degree to which these creatures appear to be left-‐handed. That is, what proportion of the time do individual snakes from an identified population curl so that the left side of its body is “pointing in”? The observed coiling index for an individual snake is defined as the fraction of the observed coils that were left-‐handed. If a particular snake was neither left-‐ handed nor right-‐handed, its coiling index was 0.5. The following are the coiling indices for the sample of 15 adult females: 0.582 0.585 0.550 0.554 0.609 0.545 0.544 0.600 0.638 0.656 0.600 0.696 0.424 0.493 0.491 1 Roth, E. D. (2003). Handedness’ in snakes? Lateralization of coiling behavior in a cottonmouth, Agkistrodon piscivorus leucostoma, population. Animal Behaviour, 66, 337–341. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.2: T-‐Statistics and T-‐Distributions Left-‐coiled snake Right-‐coiled snake Two questions will be asked about this set of data. The first question is about the degrees of freedom associated with this sample. Now, do not be suspicious—this is not a trick question! The answer is quick and easy: Q: How many degrees of freedom are associated with this sample size? A: Since n = 15, the appropriate number of degrees of freedom (n – 1) is 14. (Note: Sometimes students can get hung up on the degrees of freedom, wanting to know “what they are.” An explanation of degrees of freedom is far beyond the level of introductory statistics, although some texts make a valiant attempt. Perhaps the best substitute for an explanation is that degrees of freedom are like the degree of a polynomial: simply a number that describes which T-‐distribution and which type of polynomial you have.) Skill 2: Calculating the T-‐statistic [Student Handout] The second question involves the calculation of the T-‐statistic for a given set of data. You usually calculate the T-‐statistic for a sample when testing a hypothesis about a population mean. Recall that hypothesis testing involves making a hypothesis about a population value and then using a test statistic as a measure of the discrepancy between the statistic and the hypothesized parameter. Just as is true of the Z-‐statistic, the T-‐statistic is a measure of how many standard deviations the sample is from the hypothesized mean, which standard deviation is the standard deviation of the sampling distribution of the test statistic, and appears in the denominator of the T-‐statistic. In the case of the mean, you estimate this standard deviation using the standard deviation of the sample. Since at this point you do not actually have a hypothesis to test, a value has been arbitrarily assigned to the population mean: µ = 0.5. (This corresponds to the hypothesis that the population of snakes for that gender and age combination is neither left-‐handed nor right-‐handed.) Q: For an assumed population mean of µ = 0.5, what T-‐statistic would be calculated from the sample of adult females? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.2: T-‐Statistics and T-‐Distributions A: To calculate the value of the T-‐statistic using the previous formula, you need to find the mean and sample standard deviation of the coiling behavior of the adult females for which you have data. Turning to a calculator, you find !" ! = !"#$%& and !"! = !"#"$% . Then the T-‐statistic is calculated as follows: # ! ! !µ "#$%&! ! !"#$ "! = ! !=! ! = !)#(*$ $ "#"'( % &$ ! (Note: After demonstrating the calculation, let students determine the degrees of freedom, the means and standard deviations, and finally the T-‐statistics for the other three data sets.) The data for the other three categories of these slithery little critters are shown below: Adult males 0.563 0.556 0.522 0.541 0.395 Juvenile males 0.486 0.492 0.475 0.464 0.493 Juvenile females 0.512 0.556 0.565 0.417 0.429 (1) For each of these categories of slithering critter, answer the following: (a) The adult males and both male and female juveniles have the same sample size. What is the appropriate number of degrees of freedom for inference purposes? (b) Calculate the means and standard deviations for the adult males and both male and female juveniles. From this information, calculate the T-‐statistics, assuming a population mean of µ = 0.5. (Answers: Adult males 0.5154 0.0691 Juvenile males 0.4820 0.0123 Juvenile females 0.4958 0.0695 Adult males: "! = ! ! # ! ! !µ "#$%$&! ! !"#$ !=! ! = !"#&()* $ "#"'(% % $ Juvenile males: "! = ! ! # ! ! !µ "#$%&"! ! !"#' !=! ! = ! ! )#&*& $ "#"(&) % ' Juvenile females: "! = ! ! # ! ! !µ "#$%&'! ! !"#& !=! ! = ! ! "#)*&) $ "#"(%& % & The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.2: T-‐Statistics and T-‐Distributions After checking to make sure student answers are correct, transfer the T-‐statistics to the following table. The ensuing discussion calls attention to the fact that the T is a “standardized” number, just like the Z-‐statistic. As long as the degrees of freedom are the same, the value of T indicates how far the sample mean is from the presumed population mean, in units of standard deviations. As with the Z, negative Ts indicate a sample mean below the presumed population mean.) When you calculate the Z-‐statistic for a particular number in a data set or the sample mean from a sample of a population, you are standardizing the result to make it interpretable without reference to the mean or standard deviation in the data set or population. The T-‐statistic accomplishes the same purpose when you are considering how far a sample mean is from the hypothesized population mean, relative to the standard deviation of the sample. Write your calculated T-‐statistics in the table below. Group T-‐statistic Adult males Adult females Juvenile males Juvenile females The population mean was arbitrarily assumed to be µ = 0.5. Considering only the T-‐statistics, consider the following questions: (2) Which of the four T-‐statistics indicates that the sample mean is farthest away from µ = 0.5 when compared to the sample variability? Explain your reasoning in a couple of sentences. (3) Which of the four T-‐statistics indicate(s) that the sample mean is below µ = 0.5? (4) Which of the four T-‐statistics indicates that the sample mean is closest to µ = 0.5 when compared to the sample variability? Explain your reasoning in a couple of sentences. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.2: T-‐Statistics and T-‐Distributions The distributions of the results from the simulations in Lesson 10.2.1 should have looked familiar. You may have noticed that the distributions of sample means were centered in different places or that they appeared to have different variability. Now that you know the actual means of the data (and have 20/20 hindsight), do your results seem reasonable? • • For the samples that were the same size, does the distribution of the sample means collected from the more variable population seem more variable? For the samples with different sample sizes but the same variability, was the variability of the sample means less for the samples from the larger size? If you answered yes to both questions, your simulation results are consistent with the mathematical theory of how sampling works in the case of the sample mean. The shape of the sampling distribution of the mean is a more difficult question to address. If you reflected on your past experience, you may have speculated that the shapes of the distributions might be approximately “normal,” which certainly seems reasonable. However, if you are not given the population standard deviation, you must resort to estimating it using the sample standard deviation. When making that substitution, there is a sudden change in the shape of the sampling distribution of the sample mean. The sudden change results in procedures and methods that are very similar to what you have previously used for statistical inference. However, you will have a new family of distributions known as T-‐distributions. When making an inference about a single population mean and substituting the sample standard deviation in place of the population standard deviation, you use two related concepts: T-‐distributions (used when constructing confidence intervals) and the T-‐statistic (used when testing a hypothesis about the population mean). The first change with the new T-‐distributions is that rather than using the standardized variable (z) as in "! = ! ! # ! ! !µ " $ you substitute your sample standard deviation (s) for σ. The resulting standardized variable and its formula are "! = ! # ! ! !µ $ % ! There are many different normal distributions, one each for any given mean and standard deviation. There are also many different T-‐distributions, and they are distinguished by a positive whole number known (rather mysteriously) as the number of degrees of freedom (df). All T-‐distributions are bell-‐ shaped, similar in appearance to the standard normal curve, and are centered at 0. The T-‐distributions with smaller degrees of freedom are heavier in the tails; as the number of degrees of freedom increases, the tails become less heavy, and eventually the T-‐distributions become practically indistinguishable from the standard normal distribution. The appropriate number of degrees of freedom to use with a given data set depends on the sample size. The T-‐distribution to use when making inferences about a single population mean is the one with degrees of freedom equal to the sample size minus 1 (n – 1). The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.2: T-‐Statistics and T-‐Distributions Skill 1: How many degrees of freedom? To help you become familiar with the skills needed for inference about a population mean, you will consider some actual data. Your newly developed skills will be • • identifying the correct number of degrees of freedom to use for any given sample, and thus the correct sampling distribution for the sample mean; and calculating the T-‐statistic for a sample. Technically, to calculate a T-‐statistic you need to either know or hypothesize a population mean. For your example, µ = 0.5 has been arbitrarily selected for reasons that will become apparent shortly. The data you will consider are measures of the coiling behavior of some samples of cottonmouth snakes. There are samples of snakes of both genders and two age ranges to analyze: adult males, adult females, juvenile males, and juvenile females.1 You will investigate the degree to which these creatures appear to be left-‐handed. That is, what proportion of the time do individual snakes from an identified population curl so that the left side of its body is “pointing in”? The observed coiling index for an individual snake is defined as the fraction of the observed coils that were left-‐handed. If a particular snake was neither left-‐ handed nor right-‐handed, its coiling index was 0.5. The following are the coiling indices for the sample of 15 adult females: 0.582 0.585 0.550 0.554 0.609 0.545 0.544 0.600 0.638 0.656 0.600 0.696 0.424 0.493 0.491 Left-‐coiled snake Right-‐coiled snake 1 Roth, E. D. (2003). Handedness’ in snakes? Lateralization of coiling behavior in a cottonmouth, Agkistrodon piscivorus leucostoma, population. Animal Behaviour, 66, 337–341. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.2: T-‐Statistics and T-‐Distributions Two questions will be asked about this set of data. The first question is about the degrees of freedom associated with this sample. Now, do not be suspicious—this is not a trick question! The answer is quick and easy: Q: How many degrees of freedom are associated with this sample size? A: Since n = 15, the appropriate number of degrees of freedom (n – 1) is 14. Skill 2: Calculating the T-‐statistic The second question involves the calculation of the T-‐statistic for a given set of data. You usually calculate the T-‐statistic for a sample when testing a hypothesis about a population mean. Recall that hypothesis testing involves making a hypothesis about a population value and then using a test statistic as a measure of the discrepancy between the statistic and the hypothesized parameter. Just as is true of the Z-‐statistic, the T-‐statistic is a measure of how many standard deviations the sample is from the hypothesized mean, which standard deviation is the standard deviation of the sampling distribution of the test statistic, and appears in the denominator of the T-‐statistic. In the case of the mean, you estimate this standard deviation using the standard deviation of the sample. Since at this point you do not actually have a hypothesis to test, a value has been arbitrarily assigned to the population mean: µ = 0.5. (This corresponds to the hypothesis that the population of snakes for that gender and age combination is neither left-‐handed nor right-‐handed.) Q: For an assumed population mean of µ = 0.5, what T-‐statistic would be calculated from the sample of adult females? A: To calculate the value of the T-‐statistic using the previous formula, you need to find the mean and sample standard deviation of the coiling behavior of the adult females for which you have data. Turning to a calculator, you find !" ! = !"#$%& and !"! = !"#"$% . Then the T-‐statistic is calculated as follows: # ! ! !µ "#$%&! ! !"#$ "! = ! !=! ! = !)#(*$ $ "#"'( % &$ ! The data for the other three categories of these slithery little critters are shown below: Adult males 0.563 0.556 0.522 0.541 0.395 Juvenile males 0.486 0.492 0.475 0.464 0.493 Juvenile females 0.512 0.556 0.565 0.417 0.429 (1) For each of these categories of slithering critter, answer the following: (a) The adult males and both male and female juveniles have the same sample size. What is the appropriate number of degrees of freedom for inference purposes? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.2: T-‐Statistics and T-‐Distributions (b) Calculate the means and standard deviations for the adult males and both male and female juveniles. From this information, calculate the T-‐statistics, assuming a population mean of µ = 0.5. When you calculate the Z-‐statistic for a particular number in a data set or the sample mean from a sample of a population, you are standardizing the result to make it interpretable without reference to the mean or standard deviation in the data set or population. The T-‐statistic accomplishes the same purpose when you are considering how far a sample mean is from the hypothesized population mean, relative to the standard deviation of the sample. Write your calculated T-‐statistics in the table below. Group T-‐statistic Adult males Adult females Juvenile males Juvenile females The population mean was arbitrarily assumed to be µ = 0.5. Considering only the T-‐statistics, consider the following questions: (2) Which of the four T-‐statistics indicates that the sample mean is farthest away from µ = 0.5 when compared to the sample variability? Explain your reasoning in a couple of sentences. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.2: T-‐Statistics and T-‐Distributions (3) Which of the four T-‐statistics indicate(s) that the sample mean is below µ = 0.5? (4) Which of the four T-‐statistics indicates that the sample mean is closest to µ = 0.5 when compared to the sample variability? Explain your reasoning in a couple of sentences. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.3: The Confidence Interval for a Population Mean Estimated number of 50-‐minute class sessions: 0.5 Learning Goals Students will begin to be able to use the T-‐procedures to construct a confidence interval for the population mean. Introduction [Student Handout] (Note: For this part, a T-‐table is required. T-‐tables can have different looks and feel, but any standard T-‐table works.) Your attention now will turn to the calculation of the confidence interval for a population mean. There are many similarities between this calculation and the calculation for a population proportion: • • • The formula for the confidence interval for a population mean has very much the same structure as the formula for the confidence interval for a population proportion. You can calculate confidence intervals of different levels, 90%, 95%, and 99% being the most common confidence levels. The interpretation of the confidence interval is the same as your experience with confidence intervals for a population proportion. There are also a few differences: • • The table of the critical values that helps determine the length of the confidence interval is not as detailed as the standard normal curve table; this will take a little getting used to. The T-‐procedures for constructing confidence intervals for the population mean (and for testing hypotheses) depends on the population being at least approximately normal. These differences indicate that you have two new procedures to learn: (a) convincing yourselves that the population from which you sampled is credibly normal in shape, and (b) reading the T-‐table. (It is possible that your calculator will render the need for the T-‐table nonexistent.) Books have slightly different T-‐tables, and calculators have slightly different sequences of buttons to push. Therefore, to some extent, you will have to write in vague generalities, but your instructor will fill in the details for your T-‐chart and calculator or computer software. (The information in the T-‐tables is the same, but they are organized differently and round to different numbers of decimals.) The T-‐table [Student Handout] You will consider reading the T-‐table first. Just as the confidence interval for a population proportion surrounds the sample proportion with an interval of a certain size, the confidence interval for a population mean surrounds the sample mean with an interval of a certain size. From the T-‐table or software, you need the critical T-‐values that lead you to the correct length for the confidence interval. The formula for a confidence interval for the population mean is ! $ "#! = !!" ± # $ # $ & " %% ! Notice the similarity to the formula for the confidence interval for a population proportion. Both involve a center (the sample mean in this case), an estimate of the standard deviation of the sampling The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.3: The Confidence Interval for a Population Mean ! distribution # " interval. " ! $ " &% , and a critical value (!" ). Each quantity is needed to construct the confidence # The tables of critical values for the T-‐statistic will probably be similar to the partial table below: Central area 0.90 0.95 0.98 0.99 captured Confidence 90% 95% 98% 99% 1 6.31 12.71 31.82 63.66 2 2.92 4.30 6.97 9.93 3 2.35 3.18 4.54 5.84 4 2.13 2.78 3.75 4.60 5 2.02 2.57 3.37 4.03 … 14 1.76 2.15 2.62 2.98 15 1.75 2.13 2.60 2.95 16 1.75 2.12 2.58 2.92 …. level Each row of the table corresponds to the number of degrees of freedom associated with the sample size. Each column corresponds to a confidence level. (Remember, your table may be slightly different, perhaps rounding the critical T-‐values to three decimal points, and it will almost certainly have more columns than presented here.) As an example of constructing a confidence interval for a population mean, you will consider a common intervention associated with athletes: anterior cruciate ligament (ACL) surgery. (Thank goodness, no more snakes!) The ACL surgery aims at reconstruction of the knee and involves grafting the ACL to the tibia and femur. Grafting is accomplished using “fixation” nails. A basic consideration with this sort of surgery is preventing the tendon from rupturing again, and this brings up a question of the force needed to detach the fixation nail. To address this question, a team of researchers experimented with 15 human cadaver knees and a particular brand of fixation nail. After the surgery, the investigators increased the force on the fixation nails until they failed. The forces, measured in Newtons, needed to rupture the tendon in a sample of 15 cadaver knees are shown in the following table. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.3: The Confidence Interval for a Population Mean Pull-‐out Force (Newtons) 550 485 515 370 510 420 500 600 600 450 350 350 450 500 400 You first must address the problem of deciding whether the data can be regarded as credibly resulting from sampling from a population that is at least approximately normal. Fortunately, the T-‐procedures work pretty well even if the population is only close to normal in shape. Since data on the whole population is not available, your best judgment is required to inspect the data you have. A workable rule of thumb involves checking for outliers in the data. If you do not see any outliers, you can be confident (no pun intended!) that your procedures for generating the confidence interval will be okay to use. A boxplot of your data on forces needed to rupture tendons from the ACL surgery is shown. Since there are no outliers, you will proceed to construct the confidence interval for the population mean. For these data, the mean and standard deviation are !" ! = !"#$!%&'()*+ and !"! = !"#$%!&'()*+, . Boxplot of Force 600 550 Force 500 450 400 350 Having justified the use of the T-‐procedure for constructing a confidence interval for these data, you can proceed to calculate the confidence interval. The first question is the easy one: What is the appropriate number of degrees of freedom? Since your sample size is n = 15, the appropriate number of degrees of freedom is 14. You will set your confidence level at the commonly used level of 95% and construct the confidence interval: ! $ "#$%&'!!!!!!!" ! ± !# ( # $ & " %% ! $ !!!!!!!!!!!!!! = !)*+! ± !# ( # ,-.) & " -# % ( !!!!!!!!!!!!!! = !)*+! ± !# ( /-.+-* ! ) The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.3: The Confidence Interval for a Population Mean Now you need the appropriate critical T-‐value from the table. Remember, you have 14 degrees of freedom and have settled on a 95% confidence interval. From the 14th row and 95% column in the partial table, the critical T-‐value is 2.15. ! $ "#$%&'!!!!!" ! ± !# ( # $ & " %% ! $ !!!!!!!!!!!!!! = !)*+! ± !# ( # ,-.) & " -# % ( !!!!!!!!!!!!!! = !)*+! ± !# ( /-.+-* ( )( ) !!!!!!!!!!!!!! = !)*+! ± ! /.-# /-.+-* ) !!!!!!!!!!!!!!! = !0)/).,-1!#-#.,/2 From these calculations, you can say that you are 95% confident the true population mean is in the interval from 424.81 Newtons to 515.82 Newtons. If you have a calculator to construct the confidence interval, your task is much easier: enter the data, choose a confidence level, and press the right buttons. Your calculator will probably print the sample mean and sample standard deviation as well as the confidence interval. It is customary to report the mean and standard deviation of the data, as well as the number of degrees of freedom when constructing a confidence interval, but your instructor may have slightly different requirements. For the following problems, verify that it is credible that the samples were gathered from random samples from normal populations, and then construct a 95% confidence interval for each population mean. (1) The resting pulse rates were gathered for 10 adult men involved in a study of the effectiveness of a regimen of bicycle exercise. These data are pulse rates in heartbeats per minute. 73 83 85 87 91 99 87 85 83 79 ! $ "#$%&'!!!!!!!" ! ± !# ( # $ & " %% ! $ !!!!!!!!!!!!!! = !)#*+! ± !# ( # ,*)"& " ./ % ( !!!!!!!!!!!!!! = !)#*+! ± !# ( +*.)/ ( )( ) !!!!!!!!!!!!!! = !)#*+! ± ! +*+,+ +*.)/ ) !!!!!!!!!!!!!!! = !0)/*+,"1!"/*.-.2 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.3: The Confidence Interval for a Population Mean (2) The design of windows in cold climates must take into account how well the materials conduct heat. When the window material is manufactured, quality control procedures mandate the collection of samples of the material to see if the windows meet the design specifications. The heat conductivity results from a single day’s sample are given below. (Conductivity is measured in watts per square meter of surface per degree Celsius of temperature difference on the two sides of the material.) 1.12 1.06 1.10 1.08 1.11 1.09 1.16 1.19 1.13 1.15 1.17 1.14 ! $ "#$%&'!!!!!" ! ± !# ( # $ & " %% ! $ !!!!!!!!!!!!!! = !)*)+#! ± !# ( # ,*,-."/ & " )+ % ( !!!!!!!!!!!!!! = !)*)+#! ± !# ( ,*,))+01 ( )( ) !!!!!!!!!!!!!! = !)*)+#! ± ! +*+,) ,*,))+01 ) !!!!!!!!!!!!!!! = !2)*),,+3!)3)0".4 Wrap-‐Up After verifying the students’ correct procedures, once again point out the similarity between the Z-‐ and T-‐calculations. Stress that they should conceptually see the similarities of logic, but not forget the important differences in the formulas. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.3: The Confidence Interval for a Population Mean Homework [Student Handout] (1) At Abraham Lincoln High School, an achievement test is used for college placement. Counselors selected a simple random sample of 30 students from the school and scored their tests by hand before sending them in for computer processing. For this sample, !" ! = "#"!$%&!#! = !#" . Construct a 99% confidence interval for µ, the population mean score for Abraham Lincoln High School. (2) Logging activity in forests is thought to affect the behavior of black bears. An important measure of animal behavior is the home range, the area used by animals in their daily lives. In a study of black bears in a logged Canadian forest, the spring and early summer home range (in square kilometers) of 12 radio-‐collared female black bears was measured with the following results: 39.9 23.5 42.1 29.4 34.4 40.9 27.9 22.3 13.0 20.1 13.3 8.6 (a) Construct and interpret a 95% confidence interval for the mean home range of female black bears in this logged forest. (b) The typical home range of females in forests with no logging is 20 square kilometers. Based on the confidence interval from Question 2a, do you think that the mean home range size of females in this logged forest could be the same as the mean home range size in nonlogged forests? Explain your reasoning, using appropriate statistical terminology. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.3: The Confidence Interval for a Population Mean Your attention now will turn to the calculation of the confidence interval for a population mean. There are many similarities between this calculation and the calculation for a population proportion: • • • The formula for the confidence interval for a population mean has very much the same structure as the formula for the confidence interval for a population proportion. You can calculate confidence intervals of different levels, 90%, 95%, and 99% being the most common confidence levels. The interpretation of the confidence interval is the same as your experience with confidence intervals for a population proportion. There are also a few differences: • • The table of the critical values that helps determine the length of the confidence interval is not as detailed as the standard normal curve table; this will take a little getting used to. The T-‐procedures for constructing confidence intervals for the population mean (and for testing hypotheses) depends on the population being at least approximately normal. These differences indicate that you have two new procedures to learn: (a) convincing yourselves that the population from which you sampled is credibly normal in shape, and (b) reading the T-‐table. (It is possible that your calculator will render the need for the T-‐table nonexistent.) Books have slightly different T-‐tables, and calculators have slightly different sequences of buttons to push. Therefore, to some extent, you will have to write in vague generalities, but your instructor will fill in the details for your T-‐chart and calculator or computer software. (The information in the T-‐tables is the same, but they are organized differently and round to different numbers of decimals.) The T-‐table You will consider reading the T-‐table first. Just as the confidence interval for a population proportion surrounds the sample proportion with an interval of a certain size, the confidence interval for a population mean surrounds the sample mean with an interval of a certain size. From the T-‐table or software, you need the critical T-‐values that lead you to the correct length for the confidence interval. The formula for a confidence interval for the population mean is ! $ "#! = !!" ± # $ # $ & " %% ! Notice the similarity to the formula for the confidence interval for a population proportion. Both involve a center (the sample mean in this case), an estimate of the standard deviation of the sampling ! distribution # " interval. " ! $ " & , and a critical value (!" ). Each quantity is needed to construct the confidence #% The tables of critical values for the T-‐statistic will probably be similar to the partial table on the following page: The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.3: The Confidence Interval for a Population Mean Central area 0.90 0.95 0.98 0.99 captured Confidence 90% 95% 98% 99% 1 6.31 12.71 31.82 63.66 2 2.92 4.30 6.97 9.93 3 2.35 3.18 4.54 5.84 4 2.13 2.78 3.75 4.60 5 2.02 2.57 3.37 4.03 … 14 1.76 2.15 2.62 2.98 15 1.75 2.13 2.60 2.95 16 1.75 2.12 2.58 2.92 …. level Each row of the table corresponds to the number of degrees of freedom associated with the sample size. Each column corresponds to a confidence level. (Remember, your table may be slightly different, perhaps rounding the critical T-‐values to three decimal points, and it will almost certainly have more columns than presented here.) As an example of constructing a confidence interval for a population mean, you will consider a common intervention associated with athletes: anterior cruciate ligament (ACL) surgery. (Thank goodness, no more snakes!) The ACL surgery aims at reconstruction of the knee and involves grafting the ACL to the tibia and femur. Grafting is accomplished using “fixation” nails. A basic consideration with this sort of surgery is preventing the tendon from rupturing again, and this brings up a question of the force needed to detach the fixation nail. To address this question, a team of researchers experimented with 15 human cadaver knees and a particular brand of fixation nail. After the surgery, the investigators increased the force on the fixation nails until they failed. The forces, measured in Newtons, needed to rupture the tendon in a sample of 15 cadaver knees are shown in the following table. Pull-‐out Force (Newtons) 550 485 515 370 510 420 500 600 600 450 350 350 450 500 400 You first must address the problem of deciding whether the data can be regarded as credibly resulting from sampling from a population that is at least approximately normal. Fortunately, the T-‐procedures work pretty well even if the population is only close to normal in shape. Since data on the whole The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.3: The Confidence Interval for a Population Mean population is not available, your best judgment is required to inspect the data you have. A workable rule of thumb involves checking for outliers in the data. If you do not see any outliers, you can be confident (no pun intended!) that your procedures for generating the confidence interval will be okay to use. A boxplot of your data on forces needed to rupture tendons from the ACL surgery is shown. Since there are no outliers, you will proceed to construct the confidence interval for the population mean. For these data, the mean and standard deviation are !" ! = !"#$!%&'()*+ and !"! = !"#$%!&'()*+, . Boxplot of Force 600 550 Force 500 450 400 350 Having justified the use of the T-‐procedure for constructing a confidence interval for these data, you can proceed to calculate the confidence interval. The first question is the easy one: What is the appropriate number of degrees of freedom? Since your sample size is n = 15, the appropriate number of degrees of freedom is 14. You will set your confidence level at the commonly used level of 95% and construct the confidence interval: ! $ "#$%&'!!!!!!!" ! ± !# ( # $ & " %% ! $ !!!!!!!!!!!!!! = !)*+! ± !# ( # ,-.) & " -# % ( ) !!!!!!!!!!!!!! = !)*+! ± !# ( /-.+-* ! Now you need the appropriate critical T-‐value from the table. Remember, you have 14 degrees of freedom and have settled on a 95% confidence interval. From the 14th row and 95% column in the partial table, the critical T-‐value is 2.15. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.3: The Confidence Interval for a Population Mean ! $ "#$%&'!!!!!" ! ± !# ( # $ & " %% ! $ !!!!!!!!!!!!!! = !)*+! ± !# ( # ,-.) & " -# % ( !!!!!!!!!!!!!! = !)*+! ± !# ( /-.+-* ( )( ) !!!!!!!!!!!!!! = !)*+! ± ! /.-# /-.+-* ) !!!!!!!!!!!!!!! = !0)/).,-1!#-#.,/2 From these calculations, you can say that you are 95% confident the true population mean is in the interval from 424.81 Newtons to 515.82 Newtons. If you have a calculator to construct the confidence interval, your task is much easier: enter the data, choose a confidence level, and press the right buttons. Your calculator will probably print the sample mean and sample standard deviation as well as the confidence interval. It is customary to report the mean and standard deviation of the data, as well as the number of degrees of freedom when constructing a confidence interval, but your instructor may have slightly different requirements. For the following problems, verify that it is credible that the samples were gathered from random samples from normal populations, and then construct a 95% confidence interval for each population mean. (1) The resting pulse rates were gathered for 10 adult men involved in a study of the effectiveness of a regimen of bicycle exercise. These data are pulse rates in heartbeats per minute. 73 83 85 87 91 99 87 85 83 79 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.3: The Confidence Interval for a Population Mean (2) The design of windows in cold climates must take into account how well the materials conduct heat. When the window material is manufactured, quality control procedures mandate the collection of samples of the material to see if the windows meet the design specifications. The heat conductivity results from a single day’s sample are given below. (Conductivity is measured in watts per square meter of surface per degree Celsius of temperature difference on the two sides of the material.) 1.12 1.06 1.10 1.08 1.11 1.09 1.16 1.19 1.13 1.15 1.17 1.14 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.2.3: The Confidence Interval for a Population Mean Homework (1) At Abraham Lincoln High School, an achievement test is used for college placement. Counselors selected a simple random sample of 30 students from the school and scored their tests by hand before sending them in for computer processing. For this sample, !" ! = "#"!$%&!#! = !#" . Construct a 99% confidence interval for µ, the population mean score for Abraham Lincoln High School. (2) Logging activity in forests is thought to affect the behavior of black bears. An important measure of animal behavior is the home range, the area used by animals in their daily lives. In a study of black bears in a logged Canadian forest, the spring and early summer home range (in square kilometers) of 12 radio-‐collared female black bears was measured with the following results: 39.9 23.5 42.1 29.4 34.4 40.9 27.9 22.3 13.0 20.1 13.3 8.6 (a) Construct and interpret a 95% confidence interval for the mean home range of female black bears in this logged forest. (b) The typical home range of females in forests with no logging is 20 square kilometers. Based on the confidence interval from Question 2a, do you think that the mean home range size of females in this logged forest could be the same as the mean home range size in nonlogged forests? Explain your reasoning, using appropriate statistical terminology. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.3.1: Testing Hypotheses About a Population Mean Estimated number of 50-‐minute class sessions: 0.5 Learning Goals Students will begin to understand that the sampling distribution of the test statistic is a T-‐distribution. Students will begin to be able to set up correct null and alternative hypotheses for a population mean. Introduction [Student Handout] As you know, there are two basic forms of inference: the use of confidence intervals and hypothesis testing. In a previous lesson, you considered the problem of constructing a confidence interval for a population mean. You will now turn to the companion problem: testing the hypothesis that a population mean is equal to a specified value. Once again, you will capitalize on your previous experience with statistical inference. First, the sampling distribution of the sample mean has not changed its identity—it is a T-‐distribution. Second, the logic of the hypothesis test for a population mean has the same logic as the hypothesis for a population proportion; you will simply shift from a Z-‐statistic to a T-‐statistic. To motivate the discussion, you will use real data gathered to answer a real scientific question: How do monarch butterflies know where to go when they migrate? Rich Task [Student Handout] It is well known that some animals traveling over great distances are guided by the magnetic field of Earth. The homing pigeon is probably the most familiar example. In some cases, it is not clear how animals detect Earth’s magnetic field—one particular case is the monarch butterfly. Monarchs cannot survive a long cold winter and migrate south in the fall. Monarchs west of the Rocky Mountains travel to California, and those east of the Rocky Mountains fly to Mexico. They fly to the same winter roosts, often to the same trees year after year. It is not known how they locate their winter homes. How do they accomplish this? One possibility is that monarchs’ anatomy features some magnetic material that could be utilized in a magnetic field to assist them in navigation. A biologist wished to determine if monarchs might have some magnetic material in their bodies, using extremely sensitive magnetometers. He reasoned that if the magnetometers detected the presence of magnetic material in monarch bodies, the case for monarchs’ use of the magnetic field of Earth to help in their migration would be strengthened. Unfortunately, the magnetometer itself creates some “background” magnetism, about 200 pico-‐emus of magnetic intensity. (A pico-‐emu is 10–12 electromagnetic units.) To demonstrate that monarchs have some magnetic material in their bodies, it must be shown that the magnetic intensity of the butterflies exceeds 200 pico-‐emus naturally. A random sample of 16 butterflies was prepared by bathing them in distilled water. Then they were placed near the magnetometer to measure their body’s background magnetic intensity. The data on the measured magnetic intensity for each butterfly are as follows. You will test the hypothesis that the mean magnetic intensity is 200 pico-‐emus. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.3.1: Testing Hypotheses About a Population Mean Raw Data: Magnetic Intensity of Prepared Specimens in Pico-‐emus 486 328 532 323 173 122 366 298 124 219 264 490 145 75 402 290 (Note: As the hypothesis testing procedure unfolds, identify the similarities and differences between the hypothesis test for the mean and the previous hypothesis test [Z]. Stress that the logic is the same, the general form of the test statistic is the same, but the sampling distribution differs. In addition, stress that the T-‐distribution is the basis of inference for the confidence interval for and the hypothesis test of the mean.) In a previous topic, the T-‐statistic was introduced and calculated. You will use the T-‐statistic to test hypotheses about the population mean. (Note: At this point, assess the collective student memories for characteristics of the T-‐statistic and the T-‐distribution[s]: • • • The T-‐statistic is a standardized variable. The T-‐distributions are symmetric and centered at the mean. The T-‐distributions are distinguished by the associated the number of degrees of freedom in a particular context.) The first question to address is, “What will we accept as evidence against our hypothesis?” In other words, “What sort of sample mean counts as evidence of magnetic material in the monarch? What sort of mean is going to distinguish magnetic material in the monarch from background magnetism in the measuring device?” It seems reasonable that a sample mean sufficiently large would provide evidence of magnetic material in the monarchs. A sample mean close to 200 pico-‐emus or lower than 200 pico-‐ emus would not provide evidence of magnetic material; a sample mean of 200 pico-‐emus or less could be attributed to the magnetometer. Thus, it seems that a one-‐tailed hypothesis test is in order. You will perform the test at the 0.05 level (5%) of significance. The hypothesis testing procedure will be familiar from your work with the hypothesis testing you have done with proportions: 1. Define the parameter of interest. 2. Specify the null and alternative hypotheses. 3. Check the assumptions. 4. Perform the mechanics. 5. Interpret the results in the context of the problem. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.3.1: Testing Hypotheses About a Population Mean Let µ = mean background (natural) magnetic intensity of monarch butterflies. "# $ !µ ! = !%&& "' $ !µ ! > !%&& ! ! = !&(&) " ! = !%*+(*, #! = !,-,()& $! ! = !,. You do not know if the population of monarchs is normally distributed with respect to its level of magnetism, so you must check the plausibility of this assumption. A normal probability plot, histogram, or boxplot can be used to determine if this assumption is reasonable. Magnetic Intensity of Prepared Specimens (in Pico-‐emus) Normal Quantile Plot 3 .99 2 .95 .90 1 .75 .50 0 .25 -1 .10 .05 -2 .01 0 -3 0 100 200 300 400 500 600 100 200 400 500 600 300 Since there are no indications of skew in any of the plots, the assumption of normality in the population is judged to be reasonable. Thus, the T-‐procedure is justified. "! = ! !! = ! # ! ! !"#$%&"'()*'+!,-./' $ % 012314! ! !055 464375 48 !!! = !03792 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.3.1: Testing Hypotheses About a Population Mean P-‐value = 0.0113 (from calculator). Since P-‐value < α = 0.05, Ho is rejected. You have sufficient evidence at the 0.05 level that the mean magnetic intensity of monarchs is greater than 2.0. Without a calculator, you need the T-‐table. Since the sample size is 16, the appropriate number of degrees of freedom is n – 1 = 15. To reach statistical significance, the value of the T-‐statistic must equal or exceed 1.753. (Note: T-‐tables may differ. If the T-‐table presents the P-‐value as a two-‐tailed area, students should look for a value with a two-‐tailed area of 0.10, since this is a one-‐tailed test.) Homework [Student Handout] For each of the following problems, identify an appropriate null hypothesis and alternative hypothesis and the appropriate number of degrees of freedom for the T-‐test. (1) A boat manufacturer claims that a particular boat and motor combination burns less than 4 gallons of fuel per hour. Fuel consumption for a random sample of 10 similar boats resulted in the data below: 4.06 4.29 4.26 4.64 4.23 3.93 3.64 4.13 3.93 3.86 Is there sufficient evidence to conclude that the manufacturer’s claim is correct? Use a 5% confidence interval (α = 0.50) and test the appropriate hypothesis. (2) The SensoTeknika company specializes in designing small hand-‐held data collection devices that can be used in applications such as detecting radiation in shipping containers for homeland security, testing drinking water samples for pollutant levels, and scanning barcodes to track the delivery of packages. The company advertises that its devices last an average of 60 hours of continuous use after being fully charged. A manager who recently bought a large number of these devices for her environmental consulting services company has received complaints from some people on her team that the devices are losing power more quickly than expected. The manager suspects that SensoTeknika’s claim of 60 hours is not true, so she randomly selects eight of the devices to test how long they last after being fully charged. The results in hours are as follows: 50 57 61 60 56 62 53 57 Does the manager have sufficient evidence to support her suspicion? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.3.1: Testing Hypotheses About a Population Mean As you know, there are two basic forms of inference: the use of confidence intervals and hypothesis testing. In a previous lesson, you considered the problem of constructing a confidence interval for a population mean. You will now turn to the companion problem: testing the hypothesis that a population mean is equal to a specified value. Once again, you will capitalize on your previous experience with statistical inference. First, the sampling distribution of the sample mean has not changed its identity—it is a T-‐distribution. Second, the logic of the hypothesis test for a population mean has the same logic as the hypothesis for a population proportion; you will simply shift from a Z-‐statistic to a T-‐statistic. To motivate the discussion, you will use real data gathered to answer a real scientific question: How do monarch butterflies know where to go when they migrate? Task It is well known that some animals traveling over great distances are guided by the magnetic field of Earth. The homing pigeon is probably the most familiar example. In some cases, it is not clear how animals detect Earth’s magnetic field—one particular case is the monarch butterfly. Monarchs cannot survive a long cold winter and migrate south in the fall. Monarchs west of the Rocky Mountains travel to California, and those east of the Rocky Mountains fly to Mexico. They fly to the same winter roosts, often to the same trees year after year. It is not known how they locate their winter homes. How do they accomplish this? One possibility is that monarchs’ anatomy features some magnetic material that could be utilized in a magnetic field to assist them in navigation. A biologist wished to determine if monarchs might have some magnetic material in their bodies, using extremely sensitive magnetometers. He reasoned that if the magnetometers detected the presence of magnetic material in monarch bodies, the case for monarchs’ use of the magnetic field of Earth to help in their migration would be strengthened. Unfortunately, the magnetometer itself creates some “background” magnetism, about 200 pico-‐emus of magnetic intensity. (A pico-‐emu is 10–12 electromagnetic units.) To demonstrate that monarchs have some magnetic material in their bodies, it must be shown that the magnetic intensity of the butterflies exceeds 200 pico-‐emus naturally. A random sample of 16 butterflies was prepared by bathing them in distilled water. Then they were placed near the magnetometer to measure their body’s background magnetic intensity. The data on the measured magnetic intensity for each butterfly are as follows. You will test the hypothesis that the mean magnetic intensity is 200 pico-‐emus. Raw Data: Magnetic Intensity of Prepared Specimens in Pico-‐emus 486 328 532 323 173 122 366 298 124 219 264 490 145 75 402 290 In a previous topic, the T-‐statistic was introduced and calculated. You will use the T-‐statistic to test hypotheses about the population mean. The first question to address is, “What will we accept as evidence against our hypothesis?” In other words, “What sort of sample mean counts as evidence of magnetic material in the monarch? What sort The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.3.1: Testing Hypotheses About a Population Mean of mean is going to distinguish magnetic material in the monarch from background magnetism in the measuring device?” It seems reasonable that a sample mean sufficiently large would provide evidence of magnetic material in the monarchs. A sample mean close to 200 pico-‐emus or lower than 200 pico-‐ emus would not provide evidence of magnetic material; a sample mean of 200 pico-‐emus or less could be attributed to the magnetometer. Thus, it seems that a one-‐tailed hypothesis test is in order. You will perform the test at the 0.05 level (5%) of significance. The hypothesis testing procedure will be familiar from your work with the hypothesis testing you have done with proportions: 1. Define the parameter of interest. 2. Specify the null and alternative hypotheses. 3. Check the assumptions. 4. Perform the mechanics. 5. Interpret the results in the context of the problem. Let µ = mean background (natural) magnetic intensity of monarch butterflies. "# $ !µ ! = !%&& "' $ !µ ! > !%&& ! ! = !&(&) " ! = !%*+(*, #! = !,-,()& !$! = !,. You do not know if the population of monarchs is normally distributed with respect to its level of magnetism, so you must check the plausibility of this assumption. A normal probability plot, histogram, or boxplot can be used to determine if this assumption is reasonable. 3 .99 2 .95 .90 1 Normal Quantile Plot Magnetic Intensity of Prepared Specimens (in Pico-‐emus) .75 .50 0 .25 -1 .10 .05 -2 .01 0 -3 0 100 200 300 400 500 600 100 200 300 400 500 600 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.3.1: Testing Hypotheses About a Population Mean Since there are no indications of skew in any of the plots, the assumption of normality in the population is judged to be reasonable. Thus, the T-‐procedure is justified. "! = ! !! = ! # ! ! !"#$%&"'()*'+!,-./' $ % 012314! ! !055 464375 48 !!! = !03792 P-‐value = 0.0113 (from calculator). Since P-‐value < α = 0.05, Ho is rejected. You have sufficient evidence at the 0.05 level that the mean magnetic intensity of monarchs is greater than 2.0. Without a calculator, you need the T-‐table. Since the sample size is 16, the appropriate number of degrees of freedom is n – 1 = 15. To reach statistical significance, the value of the T-‐statistic must equal or exceed 1.753. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.3.1: Testing Hypotheses About a Population Mean Homework For each of the following problems, identify an appropriate null hypothesis and alternative hypothesis and the appropriate number of degrees of freedom for the T-‐test. (1) A boat manufacturer claims that a particular boat and motor combination burns less than 4 gallons of fuel per hour. Fuel consumption for a random sample of 10 similar boats resulted in the data below: 4.06 4.29 4.26 4.64 4.23 3.93 3.64 4.13 3.93 3.86 Is there sufficient evidence to conclude that the manufacturer’s claim is correct? Use a 5% confidence interval (α = 0.50) and test the appropriate hypothesis. (2) The SensoTeknika company specializes in designing small hand-‐held data collection devices that can be used in applications such as detecting radiation in shipping containers for homeland security, testing drinking water samples for pollutant levels, and scanning barcodes to track the delivery of packages. The company advertises that its devices last an average of 60 hours of continuous use after being fully charged. A manager who recently bought a large number of these devices for her environmental consulting services company has received complaints from some people on her team that the devices are losing power more quickly than expected. The manager suspects that SensoTeknika’s claim of 60 hours is not true, so she randomly selects eight of the devices to test how long they last after being fully charged. The results in hours are as follows: 50 57 61 60 56 62 53 57 Does the manager have sufficient evidence to support her suspicion? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.3.2: Test Statistic and P-‐Values, One-‐Sample T-‐Test Estimated number of 50-‐minute class sessions: 0.5 Learning Goals Students will begin to be able to • • understand the reasons for the steps in the hypothesis testing process. execute the procedures to test the hypothesis about a population mean. Introduction [Student Handout] (Note: This lesson may be done the same day as Lesson 10.3.1 or the next day. Here you focus on the decisions and mechanics of the hypothesis testing procedure after the setup of the initial null and alternative hypotheses. Generally, the lesson consists of a discussion of the steps necessary and why they are important. After students have written their responses, guide the discussion to the answers provided below, or something similar. Students should realize that these steps are reasonable statistically and/or pedagogically, not constraints on their creativity.) In a previous lesson, you saw the steps for testing a hypothesis about a population mean, and your homework focused on the setup of the hypotheses. Today, you will consider the remaining steps in the hypothesis testing procedure. The steps in testing a statistical hypothesis constitute a formal piece of writing in statistics, and you want to be sure you know not only what the steps are, but also why they are important. In pairs, take a few minutes to discuss the following questions and write two-‐ to three-‐ sentence answers: (1) Why is it important to check the assumptions? (Answer: The sampling distribution depends on the assumptions being at least approximately correct.) (2) Why do you write the formula? (Answer: The formula is a clear statement about what statistical procedure you are using [e.g., distinguishing between a Z-‐ and T-‐procedure].) (3) Why do you drag out the calculations? Why do you not just write the P-‐value? (Answer: Part of the reason is that the steps are traditional, but also the greater detail in work allows instructors to identify specific problems students may be having.) (4) Why do you interpret the results in context? (Answer: Since the problem is posed within a specific context, the answer should also be presented in that context.) The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.3.2: Test Statistic and P-‐Values, One-‐Sample T-‐Test Wrap-‐Up Reinforce these ideas with repetition—write the answers on the board and insist that students’ work will be carefully evaluated and adherence to these standards of writing will be strictly enforced. Give students enough time to do at least one homework problem in class, and monitor their progress to ensure they are doing the necessary work in each step. Homework [Student Handout] (1) The environmental consulting services company EnvoSense uses its hand-‐held data collection devices to test for the presence of a particular chemical in the fish caught from a large lake. To ensure that the fish are safe to eat, the company must verify that the mean concentration of the chemical does not exceed 50 parts per billion (ppb), according to the U.S. Department of Agriculture guidelines for food safety. EnvoSense took a random sample of 10 fish from the lake and measured the chemical concentrations in ppb: 26 66 48 54 51 45 69 49 37 38 Based on this sample, will EnvoSense have sufficient evidence to certify that the fish are safe to eat? (2) The microchips manufactured by the MicroHard Company must meet exact measurements such that they fit properly together with other computer components. The industrial robot used to manufacture the microchips is calibrated to ensure that the length of the microchips is exactly 6000.0 microns (a micron is one-‐thousandth of a millimeter). A technician suspects that the robot is not properly calibrated, so he randomly selects eight microchips from a batch that was recently manufactured and measures the length of each: 6,001.5 5,999.2 5,998.5 5,999.2 5,998.1 6,001.4 6,000.4 5,999.8 Based on this sample, will the technician have sufficient evidence to conclude that the robot needs to be recalibrated? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.3.2: Test Statistic and P-‐Values, One-‐Sample T-‐Test In a previous lesson, you saw the steps for testing a hypothesis about a population mean, and your homework focused on the setup of the hypotheses. Today, you will consider the remaining steps in the hypothesis testing procedure. The steps in testing a statistical hypothesis constitute a formal piece of writing in statistics, and you want to be sure you know not only what the steps are, but also why they are important. In pairs, take a few minutes to discuss the following questions and write two-‐ to three-‐ sentence answers: (1) Why is it important to check the assumptions? (2) Why do you write the formula? (3) Why do you drag out the calculations? Why do you not just write the P-‐value? (4) Why do you interpret the results in context? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.3.2: Test Statistic and P-‐Values, One-‐Sample T-‐Test Homework (1) The environmental consulting services company EnvoSense uses its hand-‐held data collection devices to test for the presence of a particular chemical in the fish caught from a large lake. To ensure that the fish are safe to eat, the company must verify that the mean concentration of the chemical does not exceed 50 parts per billion (ppb), according to the U.S. Department of Agriculture guidelines for food safety. EnvoSense took a random sample of 10 fish from the lake and measured the chemical concentrations in ppb: 26 66 48 54 51 45 69 49 37 38 Based on this sample, will EnvoSense have sufficient evidence to certify that the fish are safe to eat? (2) The microchips manufactured by the MicroHard Company must meet exact measurements such that they fit properly together with other computer components. The industrial robot used to manufacture the microchips is calibrated to ensure that the length of the microchips is exactly 6000.0 microns (a micron is one-‐thousandth of a millimeter). A technician suspects that the robot is not properly calibrated, so he randomly selects eight microchips from a batch that was recently manufactured and measures the length of each: 6,001.5 5,999.2 5,998.5 5,999.2 5,998.1 6,001.4 6,000.4 5,999.8 Based on this sample, will the technician have sufficient evidence to conclude that the robot needs to be recalibrated? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.4.1: Inferences About the Difference Between Two Population Means Estimated number of 50-‐minute class sessions: 0.5 Learning Goals Students will begin to understand that • • paired data are acquired in a different manner than independent data. the sampling distributions of the differences in sample means of paired versus independent samples are not governed by the same rules. Students will begin to be able to • • • differentiate sampling of pairs and sampling independently in context. use their knowledge of the sampling distribution of the sample mean in this new context. interpret the results of inference (hypothesis testing and confidence intervals) in context. Introduction [Student Handout] Moving from inference for a single population mean to inference for two population means, you will encounter a problem that you did not see with proportions—there are two different situations that lead you to ask about a difference in population means, and the difference affects how you proceed. The basic question you must ask of the data from the two populations is, “Are the data paired?” The answer depends on how the data were gathered. Generally speaking, if the measures taken are independent of each other, the data are not paired, and if the measures taken are somehow related, the data are paired. These ideas will be illustrated with two sets of data. Rich Task Monitoring the Blood Pressure of Small Children [Student Handout] When doctors evaluate young children (0–4 years old) for possible cardiovascular abnormalities, continuous monitoring of blood pressure is desirable. Historically, blood pressure of adults has been monitored by inserting a catheter with a pressure-‐sensing mechanism into an artery. This procedure can be dangerous because of the risk of infection and internal bleeding. Finapres Catheter 23 31 30 38 30 30 45 43 A noninvasive method known as Finapres (Finger arterial pressure) is used. With Finapres, a small clamping device called a cuff is attached to a middle finger on one end and a monitoring device on the other. A new, tiny Finapres cuff has been developed and data gathered to compare the indicated blood pressures using the Finapres method and the catheter. A researcher recently measured the blood pressure of 15 children (0–4 years), using the two methods simultaneously. If the cuff was applied to the right middle finger, the catheter was inserted in the left radial artery (lower arm) and vice versa if the cuff was applied to the left middle finger. The left/right decisions for the cuff were made by random assignment. The results for diastolic pressure are shown in the table. 51 35 45 45 51 49 25 41 37 45 58 57 57 61 25 45 76 78 57 53 45 48 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.4.1: Inferences About the Difference Between Two Population Means When taken, these data are considered to be paired because two measures are taken on a single entity, in this case a child. (1) Create a scenario that is similar to the one described, but results in unpaired data. (Answer: An example of unpaired data is taking measures using the Finapres on 15 randomly selected children, and then taking measures using a catheter on 15 other randomly selected children. This results in independent measures because there is no relation between any measure taken using Finapres and those using a catheter.) Coiling Behavior of Snakes [Student Handout] Let’s return to the data on the coiling behavior of cottonmouth snakes. Remember that the investigator defined a laterality index of the coiling behavior. Recall also some measures that were given: Adult males 0.563 0.556 0.522 0.541 0.395 Juvenile males 0.486 0.492 0.475 0.464 0.493 Juvenile females 0.512 0.556 0.565 0.417 0.429 (2) Are these data paired or unpaired? Explain your reasoning. (Answer: These data have been gathered independently, since the inclusion of any particular adult male has no relation to inclusion of any juvenile male or female in the samples.) (3) Suppose a detailed description of the snakes accompanied these data—that is, the adult male snakes are the fathers of the corresponding juvenile males and females for which coiling behavior is measured. (Answer: For this case, the data would not be independent [i.e., the data are paired].) Instructor Notes Ask a question and guide discussion to ensure that students understand that you cannot tell about independence by simply looking at the sample sizes. In the Finapres data and the snake data, the sample sizes are equal. This is very important because in many cases the data are presented in a table format. You must look to the context to decide if the data are paired! It is very important to drive home this distinction. Students will use the equality of sample sizes and the tabular presentation as a suboptimal strategy when deciding whether data are paired. It is difficult for students to correctly identify pairing and independence, possibly because they are mesmerized by their suboptimal strategy and are unwilling to read carefully. Discuss the following examples thoroughly in class, and stress the rationale behind whether the data are paired or independent. Assign examples not discussed in class as homework and discuss the next day. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.4.1: Inferences About the Difference Between Two Population Means Homework [Student Handout] In the following situations, identify the data as paired or independent, and provide reasons for your classification. Approach Distance (m) (1) The early detection of danger is important for the survival of animals. at First Run In a field experiment in Costa Rica, investigators located and directly approached black iguanas (i.e., they walked straight toward the Eye No Eye iguanas). Two treatments were randomly assigned to the individual Contact Contact iguanas. In one treatment, the investigator gazed at the iguana while 3.19 2.09 approaching, maintaining eye contact. In the second treatment, the 2.34 1.96 investigator did not gaze at the iguana while approaching. The outcome 2.45 1.85 measured was the distance of the investigator from the iguana when it decided to run away. The researchers believe that eye contact is noticed 2.71 2.45 by the iguana, leading to a longer approach distance. Data from this 1.90 2.77 experiment are shown in the table. 2.12 2.55 2.56 2.44 3.41 2.80 2.41 3.27 2.66 2.01 2.86 3.49 2.44 2.75 (2) To test how well two different weight-‐loss programs work, 10 women were randomly selected from a large population of volunteers to participate in a two-‐month study. The women followed Plan A during the first month, and they switched to Plan B during the second month. Their weight loss, in terms of percent bodyweight lost during the month, are given in the following table. Participant 1 2 3 4 5 6 7 8 9 10 Plan A 4.7 5.2 4.4 3.2 4.1 4.9 4.3 3.9 4.0 4.1 Plan B 4.9 5.6 4.7 3.9 4.4 4.8 4.6 3.8 4.3 4.9 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.4.1: Inferences About the Difference Between Two Population Means 3. The home range of an animal is the average area it occupies while foraging for food and defending its territory. It is thought that home ranges of animals usually do not change much, except when an area is under environmental stress. As part of a study of white-‐tailed deer in Florida, the deer were radiocollared and their movements followed over the course of a year. The home range data are shown below, where the area is reported in hectares (1 hectare = 2.471 acres). The investigators are interested in determining whether the home ranges of white-‐tailed deer change over the course of as little time as a year. 1992 Home Range 1991 Home Range Difference of Home Ranges 80 175 95 268 206 –62 113 103 –10 83 93 10 24 9 –15 111 115 4 100 135 35 103 14 –89 293 104 –189 95 104 9 152 319 167 133 59 –74 293 125 –168 32 112 80 80 206 126 61 115 54 271 49 –222 111 150 39 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Initiating Lesson 10.4.1: Inferences About the Difference Between Two Population Means Downy Woodpecker Bill Lengths (cm) (4) Male and female Downy Woodpeckers (Picoides pubescens) drill in different areas of trees. One theory about why this occurs is that there are physical characteristics of males and females that lead them to choose different foraging locations. One possibility is the bill length of the males and females; longer bills may allow one gender to drill deeper into a tree. Male Female 2.01 1.78 1.84 1.76 1.86 1.74 1.91 1.82 1.75 1.87 1.79 1.84 1.88 1.82 2.05 1.87 1.85 1.93 1.90 1.76 1.94 1.96 1.86 1.86 The data in the table are the bill lengths of 12 male and 12 female randomly selected Downy Woodpeckers caught and released in a banding survey. The investigators want to know whether these data provide evidence that the male and female Downy Woodpeckers differ in mean bill size. An initial analysis of the data established the plausibility that the distributions of bill lengths are approximately normal. (5) The Iowa Tests of Basic Skills is a collection of various achievement tests given to students in Grades 1–8. Student achievement levels are reported on a scale that runs from 0 to 13. The North Snowshoe Community Schools are evaluating their reading program for students whose native language is not English. In one part of the study, 10 students randomly selected from a large population of students in the program took the Reading Comprehension test in 3rd grade and again in the 4th grade. Their scores for each year, in grade equivalents, are listed below. Normal growth is defined as 1 unit. Student Number 1 2 3 4 5 6 7 8 9 10 3rd-‐Grade Score 3.5 3.2 2.7 3.9 3.5 5.8 4.6 2.5 2.4 3.5 4th-‐Grade Score 4.8 3.5 2.6 4.8 4.2 6.5 5.2 2.9 2.2 3.7 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.4.1: Inferences About the Difference Between Two Population Means Moving from inference for a single population mean to inference for two population means, you will encounter a problem that you did not see with proportions—there are two different situations that lead you to ask about a difference in population means, and the difference affects how you proceed. The basic question you must ask of the data from the two populations is, “Are the data paired?” The answer depends on how the data were gathered. Generally speaking, if the measures taken are independent of each other, the data are not paired, and if the measures taken are somehow related, the data are paired. These ideas will be illustrated with two sets of data. Monitoring the Blood Pressure of Small Children When doctors evaluate young children (0–4 years old) for possible cardiovascular abnormalities, continuous monitoring of blood pressure is desirable. Historically, blood pressure of adults has been monitored by inserting a catheter with a pressure-‐sensing mechanism into an artery. This procedure can be dangerous because of the risk of infection and internal bleeding. A noninvasive method known as Finapres (Finger arterial pressure) is used. With Finapres, a small clamping device called a cuff is attached to a middle finger on one end and a monitoring device on the other. A new, tiny Finapres cuff has been developed and data gathered to compare the indicated blood pressures using the Finapres method and the catheter. A researcher recently measured the blood pressure of 15 children (0–4 years), using the two methods simultaneously. If the cuff was applied to the right middle finger, the catheter was inserted in the left radial artery (lower arm) and vice versa if the cuff was applied to the left middle finger. The left/right decisions for the cuff were made by random assignment. The results for diastolic pressure are shown in the table. When taken, these data are considered to be paired because two measures are taken on a single entity, in this case a child. Finapres Catheter 23 31 30 38 30 30 45 43 51 35 45 45 51 49 25 41 37 45 58 57 57 61 25 45 76 78 57 53 45 48 (1) Create a scenario that is similar to the one described, but results in unpaired data. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.4.1: Inferences About the Difference Between Two Population Means Coiling Behavior of Snakes Let’s return to the data on the coiling behavior of cottonmouth snakes. Remember that the investigator defined a laterality index of the coiling behavior. Recall also some measures that were given: Adult males 0.563 0.556 0.522 0.541 0.395 Juvenile males 0.486 0.492 0.475 0.464 0.493 Juvenile females 0.512 0.556 0.565 0.417 0.429 (2) Are these data paired or unpaired? Explain your reasoning. (3) Suppose a detailed description of the snakes accompanied these data—that is, the adult male snakes are the fathers of the corresponding juvenile males and females for which coiling behavior is measured. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.4.1: Inferences About the Difference Between Two Population Means Homework In the following situations, identify the data as paired or independent, and provide reasons for your classification. (1) The early detection of danger is important for the survival of animals. In a field experiment in Costa Rica, investigators located and directly approached black iguanas (i.e., they walked straight toward the iguanas). Two treatments were randomly assigned to the individual iguanas. In one treatment, the investigator gazed at the iguana while approaching, maintaining eye contact. In the second treatment, the investigator did not gaze at the iguana while approaching. The outcome measured was the distance of the investigator from the iguana when it decided to run away. The researchers believe that eye contact is noticed by the iguana, leading to a longer approach distance. Data from this experiment are shown in the table. Approach Distance (m) at First Run Eye Contact No Eye Contact 3.19 2.09 2.34 1.96 2.45 1.85 2.71 2.45 1.90 2.77 2.12 2.55 2.56 2.44 3.41 2.80 2.41 3.27 2.66 2.01 2.86 3.49 2.44 2.75 (2) To test how well two different weight-‐loss programs work, 10 women were r andomly selected from a large population of volunteers to participate in a two-‐month study. The women followed Plan A during the first month, and they switched to Plan B during the second month. Their weight loss, in terms of percent bodyweight lost during the month, are given in the following table. Participant 1 2 3 4 5 6 7 8 9 10 Plan A 4.7 5.2 4.4 3.2 4.1 4.9 4.3 3.9 4.0 4.1 Plan B 4.9 5.6 4.7 3.9 4.4 4.8 4.6 3.8 4.3 4.9 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.4.1: Inferences About the Difference Between Two Population Means 3. The home range of an animal is the average area it occupies while foraging for food and defending its territory. It is thought that home ranges of animals usually do not change much, except when an area is under environmental stress. As part of a study of white-‐tailed deer in Florida, the deer were radiocollared and their movements followed over the course of a year. The home range data are shown below, where the area is reported in hectares (1 hectare = 2.471 acres). The investigators are interested in determining whether the home ranges of white-‐tailed deer change over the course of as little time as a year. 1992 Home Range 1991 Home Range Difference of Home Ranges 80 175 95 268 206 –62 113 103 –10 83 93 10 24 9 –15 111 115 4 100 135 35 103 14 –89 293 104 –189 95 104 9 152 319 167 133 59 –74 293 125 –168 32 112 80 80 206 126 61 115 54 271 49 –222 111 150 39 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Student Handout April 25, 2012 (Full Version 1.0) Initiating Lesson 10.4.1: Inferences About the Difference Between Two Population Means Downy Woodpecker Bill Lengths (cm) (4) Male and female Downy Woodpeckers (Picoides pubescens) drill in different areas of trees. One theory about why this occurs is that there are physical characteristics of males and females that lead them to choose different foraging locations. One possibility is the bill length of the males and females; longer bills may allow one gender to drill deeper into a tree. Male Female 2.01 1.78 1.84 1.76 1.86 1.74 1.91 1.82 1.75 1.87 1.79 1.84 1.88 1.82 2.05 1.87 1.85 1.93 1.90 1.76 1.94 1.96 1.86 1.86 The data in the table are the bill lengths of 12 male and 12 female randomly selected Downy Woodpeckers caught and released in a banding survey. The investigators want to know whether these data provide evidence that the male and female Downy Woodpeckers differ in mean bill size. An initial analysis of the data established the plausibility that the distributions of bill lengths are approximately normal. (5) The Iowa Tests of Basic Skills is a collection of various achievement tests given to students in Grades 1–8. Student achievement levels are reported on a scale that runs from 0 to 13. The North Snowshoe Community Schools are evaluating their reading program for students whose native language is not English. In one part of the study, 10 students randomly selected from a large population of students in the program took the Reading Comprehension test in 3rd grade and again in the 4th grade. Their scores for each year, in grade equivalents, are listed below. Normal growth is defined as 1 unit. Student Number 1 2 3 4 5 6 7 8 9 10 3rd-‐Grade Score 3.5 3.2 2.7 3.9 3.5 5.8 4.6 2.5 2.4 3.5 4th-‐Grade Score 4.8 3.5 2.6 4.8 4.2 6.5 5.2 2.9 2.2 3.7 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.2: Inference for Paired Data Estimated number of 50-‐minute class sessions: 0.5 Learning Goals Students will understand that • • the sampling distribution of the mean difference is a T-‐distribution with degrees of freedom equal to the number of pairs – 1. the confidence intervals and hypothesis tests proceed as analyses of single differences of pairs. Introduction [Student Handout] Two phrases that capture both the similarity and difference between inference with paired and independent data are mean of differences and difference of means. Paired data are analyzed by looking at the difference between the numbers in the two pairs; therefore, your interest is in the mean of differences. In this lesson, you will see how this analysis unfolds. Consider the Finapres data from Lesson 10.4.1. Recall that blood pressure was taken from babies using two different methods; the data is organized in the following table. The question being asked is about the difference between the two measures taken on each child. Because the same child was measured twice, the question, “Do the two methods differ in measures?” changes to, “Is the difference in measures equal to zero?” Child No. Finapres Catheter Difference 1 23 31 2 30 38 3 30 30 4 45 43 5 51 35 6 45 45 7 51 49 8 25 41 9 37 45 10 58 57 11 57 61 12 25 45 13 76 78 14 57 53 15 45 48 Thus, your inferences—both confidence intervals and hypothesis tests—are concerned with a single population of differences. Because this is so, you already know how to analyze these data, since you previously learned how to make inferences about population means. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.2: Inference for Paired Data (Note: At this point, take students through the two inferential procedures with the Finapres data. First decide in what direction to subtract, Finapres – Catheter or Catheter – Finapres. The motivation for the hypothesis test is, “Is there a difference?” and the hypothesis is a difference of zero. In the case of the confidence interval, the question is phrased as one of bias. In other words, does the Finapres method give numbers that are different from the catheter method? How large is this bias and in what direction is it?) Wrap-‐Up/Homework [Student Handout] (Note: If possible, identify and analyze one of the paired data examples provided in Lesson 10.4.1 as directed below. Assign the remaining sets of paired data as homework.) In Lesson 10.4.1, you identified the data as paired or independent. For the data that are paired, do the following: (1) Test the appropriate null hypothesis, being careful to perform all steps, including the checking of assumptions. (2) Construct a confidence interval for the mean difference. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.2: Inference for Paired Data Two phrases that capture both the similarity and difference between inference with paired and independent data are mean of differences and difference of means. Paired data are analyzed by looking at the difference between the numbers in the two pairs; therefore, your interest is in the mean of differences. In this lesson, you will see how this analysis unfolds. Consider the Finapres data from Lesson 10.4.1. Recall that blood pressure was taken from babies using two different methods; the data is organized in the following table. The question being asked is about the difference between the two measures taken on each child. Because the same child was measured twice, the question, “Do the two methods differ in measures?” changes to, “Is the difference in measures equal to zero?” Child No. Finapres Catheter Difference 1 23 31 2 30 38 3 30 30 4 45 43 5 51 35 6 45 45 7 51 49 8 25 41 9 37 45 10 58 57 11 57 61 12 25 45 13 76 78 14 57 53 15 45 48 Thus, your inferences—both confidence intervals and hypothesis tests—are concerned with a single population of differences. Because this is so, you already know how to analyze these data, since you previously learned how to make inferences about population means. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.2: Inference for Paired Data Wrap-‐Up/Homework In Lesson 10.4.1, you identified the data as paired or independent. For the data that are paired, do the following: (1) Test the appropriate null hypothesis, being careful to perform all steps, including the checking of assumptions. (2) Construct a confidence interval for the mean difference. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.3: Two-‐Sample T-‐Test Estimated number of 50-‐minute class sessions: 1 Learning Goals Students will understand that • • the sampling distribution of the difference in means is approximated by the appropriate T-‐distribution. this new hypothesis testing procedure follows the same logic as previous tests of hypotheses. Students will be able to use the T-‐tables and/or technology to test hypotheses about the difference between population means, given sample data. Introduction [Student Handout] Once again, you return to the data on the coiling behavior of cottonmouth snakes. Remember, you identified these data as having come from independent samples. In other words, the inclusion of one snake in a sample had no bearing on whether another snake was included in any other sample. Your data came from random samples of three separate populations: adult males, juvenile males, and juvenile females. Adult males 0.563 0.556 0.522 0.541 0.395 Juvenile males 0.486 0.492 0.475 0.464 0.493 Juvenile females 0.512 0.556 0.565 0.417 0.429 You will investigate the processes of inference about the differences in means, using these data to illustrate hypothesis testing and confidence intervals. To grapple with the problem of inference about µ ! !µ # (the difference two population means), you first need to understand the center, variability, and ! " shape of the sampling distribution of the statistic "" ! !"# (the difference in sample means). The ! sampling distribution of the difference in sample means is something of a replay of earlier sampling distributions, with only a minor adjustment of degrees of freedom. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.3: Two-‐Sample T-‐Test The Sampling Distribution of "" ! !" # ! Center Spread The mean of the distribution of "" ! ! !"# is equal to µ" ! !µ # : ! ! µ ! = !µ" ! !µ # ! "" !!!!"# The variability of the distribution of "" ! !"# is a function of the variability in each sampling ! distribution of the individual sample means: ## ## ! " !"!" ! = ! " ! + ! # " # $" $# ! The shape of the sampling distribution of the statistic Shape ( " ! !" ) ! ! ! (µ " # #"# !+! " ### ! !µ # ) is approximately a $" $# ! T-‐distribution, with degrees of freedom somewhere between the quantities: (a) the smaller of (" ! ! !") and !(" ! ! !#) , and (b) !(" ! ! !") ! + ! (" ! ! !") . ! " " " # The appropriate number of degrees of freedom is determined by a complicated formula involving the sample variances and the sample sizes. If you are using a calculator with statistics functions or a computer with statistical software, this formula is used to get the appropriate number of degrees of freedom. (Oddly, the number of degrees of freedom can be a decimal!) ( ) ( ) If you do not have a calculator of software, the rule is to use the smaller of "" ! ! !" and "" ! ! !# as a ! ! fallback position. (Note: Assure students that their calculations using either degrees of freedom will be okay.) The methods for testing a hypothesis about a difference in population means and constructing a confidence interval for a difference in population means follow the usual logic. For a hypothesis test, you calculate a standardized test statistic: "#$"!$"%"&$"&'!!(!! "! = ! ! $"%"&$"&'! ! !)*+,")#$&-#.!/%01# $"%2.%3.!#33,3!,4!$"%"&$"&' ( # ! ! !# ) ! ! ! (µ ! ! !µ ) 5 6 5 6 5 $ %5 !+! 6 6 6 $ %6 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.3: Two-‐Sample T-‐Test For a confidence interval, you perform the usual calculations: ( ) ( "#!!$!!%&'&(%&()!!*!! )+(&()',!-',./ ! ! ! %&'01'+1!/++2+ ) $4 $4 µ3 ! " !µ 4 ! = ! "3 ! " !"4 ! ± !# 5 3 ! + ! 4 %3 %4 ! As an example, compare male and female juvenile coiling behavior. The means and standard deviations have already been calculated for you. The following are the results. ( ) (Note: With sample sizes of 5, boxplots and histograms are useless in assessing normality. The normal probability plots from Minitab are presented as follows. Assure students that they should do as you say, not as you do in this case!) Probability Plot of Males Normal - 95% CI 99 Mean StDev N AD P-Value 95 90 0.482 0.01235 5 0.311 0.392 Percent 80 70 60 50 40 30 20 10 5 1 0.42 0.44 0.46 0.48 Males 0.50 0.52 0.54 Probability Plot of Females Normal - 95% CI 99 Mean StDev N AD P-Value 95 90 0.4958 0.06955 5 0.373 0.258 Percent 80 70 60 50 40 30 20 10 5 1 0.2 0.3 0.4 0.5 Females 0.6 0.7 0.8 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.3: Two-‐Sample T-‐Test Juvenile Snake Data Mean Sample Standard Deviation Approximately Normal? Males 0.482 0.01235 Yes Females 0.496 0.0695 Yes Gender The hypothesis test follows the usual procedure: "#$!µ " !%&'!µ # !(#!$)#!*+*,-%$.+&!/#%&0!1+2!/%-#0!%&'!1#/%-#03!2#0*#4$.5#-67 8+ 9 !µ: ! !µ ; ! = !< 8% 9 !µ: ! !µ ; ! " !< #! = !<7<= >)#4?!$)#!%00,/*$.+&0@!!AB+,2!.&0$2,4$+2!)%0!'+&#!$).03!%&'!$)#6!%2#!+?%67C $! = ! ( % ! ! !% ) ! ! ! (µ ! ! !µ ) : ; : ; : & ': !=! !+! ; ; ; & '; (<7DE;! ! !<7DFG) ! ! ! (<) <7<:;H=; <7<GF=; !+! = = !! = ! ! <7DD !( ! )*+,-! = !<7GE=3!!.# ! = !D Since the P-‐value is greater than 0.05, you do not have sufficient evidence that the male and female juveniles exhibit different coiling behavior. ( ) ( ) The rule about using the smaller of "" ! ! !" and "" ! ! !# for the degrees of freedom was employed. ! ! (Note: Remind students of the format of hypothesis testing. Different books have slightly different presentations.) As an example of constructing a confidence interval, the same data are used. Remember, you have already checked the assumption of normality. Again, the critical value of t (2.776) associated with 4 degrees of freedom is used. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.3: Two-‐Sample T-‐Test ( )( "#!!$!!%&'&(%&()!!*!! )+(&()',!-',./ %&'01'+1!/++2+ ( ) µ3 ! ! !µ 4 ! = ! "3 ! !"4 ! ± !# 5 ( $34 %3 !+! ) $44 %4 !!!!!!!!!!! = ! 67894! ! !678:; ! ± !47<<; ! ) 67634=>4 676;:>4 !+! > > !!!!!!!!!!! = !?!67363>@!676<=:A Wrap-‐Up/Homework [Student Handout] (Note: If possible, identify and analyze one of the independent data examples provided in Lesson 10.4.1 as directed below. Assign the remaining sets of independent data as homework.) In Lesson 10.4.1, you identified the data as paired or independent. For the data that are independent, do the following: (1) Test the appropriate null hypothesis, being careful to perform all steps, including the checking of assumptions. (2) Construct a confidence interval for the difference in means. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.3: Two-‐Sample T-‐Test Once again, you return to the data on the coiling behavior of cottonmouth snakes. Remember, you identified these data as having come from independent samples. In other words, the inclusion of one snake in a sample had no bearing on whether another snake was included in any other sample. Your data came from random samples of three separate populations: adult males, juvenile males, and juvenile females. Adult males 0.563 0.556 0.522 0.541 0.395 Juvenile males 0.486 0.492 0.475 0.464 0.493 Juvenile females 0.512 0.556 0.565 0.417 0.429 You will investigate the processes of inference about the differences in means, using these data to illustrate hypothesis testing and confidence intervals. To grapple with the problem of inference about µ ! !µ # (the difference two population means), you first need to understand the center, variability, and ! " shape of the sampling distribution of the statistic "" ! !"# (the difference in sample means). The ! sampling distribution of the difference in sample means is something of a replay of earlier sampling distributions, with only a minor adjustment of degrees of freedom. The Sampling Distribution of "" ! !" # ! Center Spread The mean of the distribution of "" ! ! !"# is equal to µ" ! !µ # : ! ! µ ! = !µ" ! !µ # ! "" !!!!"# The variability of the distribution of "" ! !"# is a function of the variability in each sampling ! distribution of the individual sample means: ! " !"!" ! = ! ! " # #"# $" !+! The shape of the sampling distribution of the statistic Shape ### $# ( " ! !" ) ! ! ! (µ " # # " # !+! " ! !µ # # # # ) is approximately a $" $# ! T-‐distribution, with degrees of freedom somewhere between the quantities: (a) the smaller of (" ! ! !") and !(" ! ! !#) , and (b) !(" ! ! !") ! + ! (" ! ! !") . ! " " " # The appropriate number of degrees of freedom is determined by a complicated formula involving the sample variances and the sample sizes. If you are using a calculator with statistics functions or a computer with statistical software, this formula is used to get the appropriate number of degrees of freedom. (Oddly, the number of degrees of freedom can be a decimal!) ( ) ( ) If you do not have a calculator of software, the rule is to use the smaller of "" ! ! !" and "" ! ! !# as a ! ! fallback position. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.3: Two-‐Sample T-‐Test The methods for testing a hypothesis about a difference in population means and constructing a confidence interval for a difference in population means follow the usual logic. For a hypothesis test, you calculate a standardized test statistic: $"%"&$"&'! ! !)*+,")#$&-#.!/%01# "#$"!$"%"&$"&'!!(!! $"%2.%3.!#33,3!,4!$"%"&$"&' "! = ! ( # ! ! !# ) ! ! ! (µ ! ! !µ ) 5 6 5 $56 !+! 6 $66 %5 %6 ! For a confidence interval, you perform the usual calculations: ( ) ( "#!!$!!%&'&(%&()!!*!! )+(&()',!-',./ ! ! ! %&'01'+1!/++2+ ( ) $34 ) $44 µ3 ! " !µ 4 ! = ! "3 ! " !"4 ! ± !# !+! %3 %4 ! As an example, compare male and female juvenile coiling behavior. The means and standard deviations have already been calculated for you. The following are the results. 5 Probability Plot of Males Normal - 95% CI 99 Mean StDev N AD P-Value 95 90 0.482 0.01235 5 0.311 0.392 Percent 80 70 60 50 40 30 20 10 5 1 0.42 0.44 0.46 0.48 Males 0.50 0.52 0.54 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.3: Two-‐Sample T-‐Test Probability Plot of Females Normal - 95% CI 99 Mean StDev N AD P-Value 95 90 0.4958 0.06955 5 0.373 0.258 Percent 80 70 60 50 40 30 20 10 5 1 0.2 0.3 0.4 0.5 Females 0.6 0.7 0.8 Juvenile Snake Data Mean Sample Standard Deviation Approximately Normal? Males 0.482 0.01235 Yes Females 0.496 0.0695 Yes Gender The hypothesis test follows the usual procedure: "#$!µ " !%&'!µ # !(#!$)#!*+*,-%$.+&!/#%&0!1+2!/%-#0!%&'!1#/%-#03!2#0*#4$.5#-67 8+ 9 !µ: ! !µ ; ! = !< 8% 9 !µ: ! !µ ; ! " !< #! = !<7<= >)#4?!$)#!%00,/*$.+&0@!!AB+,2!.&0$2,4$+2!)%0!'+&#!$).03!%&'!$)#6!%2#!+?%67C $! = ! ( % ! ! !% ) ! ! ! (µ ! ! !µ ) : ; : &:; ': !=! !+! ; &;; '; (<7DE;! ! !<7DFG) ! ! ! (<) <7<:;H=; <7<GF=; !+! = = !! = ! ! <7DD !( ! )*+,-! = !<7GE=3!!.# ! = !D Since the P-‐value is greater than 0.05, you do not have sufficient evidence that the male and female juveniles exhibit different coiling behavior. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.3: Two-‐Sample T-‐Test ( ) ( ) The rule about using the smaller of "" ! ! !" and "" ! ! !# for the degrees of freedom was employed. ! ! As an example of constructing a confidence interval, the same data are used. Remember, you have already checked the assumption of normality. Again, the critical value of t (2.776) associated with 4 degrees of freedom is used. ( )( "#!!$!!%&'&(%&()!!*!! )+(&()',!-',./ %&'01'+1!/++2+ ( ) µ3 ! ! !µ 4 ! = ! "3 ! !"4 ! ± !# 5 ( $34 %3 !+! ) $44 %4 !!!!!!!!!!! = ! 67894! ! !678:; ! ± !47<<; ! ) 67634=>4 676;:>4 !+! > > !!!!!!!!!!! = !?!67363>@!676<=:A The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Student Handout April 25, 2012 (Full Version 1.0) Supporting Lesson 10.4.3: Two-‐Sample T-‐Test Wrap-‐Up/Homework In Lesson 10.4.1, you identified the data as paired or independent. For the data that are independent, do the following: (1) Test the appropriate null hypothesis, being careful to perform all steps, including the checking of assumptions. (2) Construct a confidence interval for the difference in means. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5

Statway TM A statistics pathway for college students

Related documents

Products

Support

Statway TM A statistics pathway for college students

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib