Statway TM A statistics pathway for college students

Statway TM A statistics pathway for college students Module 1: Statistical Studies and Overview of the Data Analysis Process Module 2: Summarizing Data Graphically and Numerically Module 3: Reasoning About Bivariate Numerical Data—Linear Relationships Module 4: Modeling Nonlinear Relationships Module 5: Reasoning About Bivariate Categorical Data and Introduction to Probability Module 6: Formalizing Probability and Probability Distributions Module 7: Linking Probability to Statistical Inference Module 8: Inference for One Proportion Module 9: Inference for Two Proportions Module 10: Inference for Means Module 11: Chi-Squared Tests Module 12: Other Mathematical Content Version 1.0 A resource from The Charles A. Dana Center at The University of Texas at Austin July 2011 Frontmatter Statway—Full Version 1.0, July 2011 Unless otherwise indicated, the materials found in this resource are Copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin Outside the license described below, no part of this resource shall be reproduced, stored in a retrieval system, or transmitted by any means—electronically, mechanically, or via photocopying, recording, or otherwise, including via methods yet to be invented—without express written permission from the Foundation and the University. The original version of this work was created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. STATWAYTM / StatwayTM is a trademark of the Carnegie Foundation for the Advancement of Teaching. *** This copyright notice is intended to prohibit unlicensed commercial use of the Statway materials. License for use Statway Version 1.0, developed by the Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported license. To view the details of this license, see creativecommons.org/licenses/by-nc-sa/3.0. In general, under this license You are free: to Share—to copy, distribute, and transmit the work to Remix—to adapt the work Under the following conditions: Attribution—You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). We request you attribute the work thus: The original version of this work was developed by the Charles A. Dana Center at the University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. This work is used (or adapted) under the Creative Commons Attribution-NonCommercialShareAlike 3.0 Unported (CC BY-NC-SA 3.0) license: creativecommons.org/licenses/by-nc-sa/3.0. For more information about Carnegie’s work on Statway, see www.carnegiefoundation.org/statway; for information on the Dana Center’s work on The New Mathways Project, see www.utdanacenter.org/mathways. Noncommercial—You may not use this work for commercial purposes. Share Alike—If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one. The Charles A. Dana Center at the University of Texas at Austin, as well as the authors and editors, assume no liability for any loss or damage resulting from the use of this resource. We have made extensive efforts to ensure the accuracy of the information in this resource, to provide proper acknowledgement of original sources, and to otherwise comply with copyright law. If you find an error or you believe we have failed to provide proper acknowledgment, please contact us at dana-txshop@utlists.utexas.edu. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. ii Frontmatter The Charles A. Dana Center The University of Texas at Austin 1616 Guadalupe Street, Suite 3.206 Austin, TX 78701-1222 Fax: 512-232-1855 dana-txshop@utlists.utexas.edu www.utdanacenter.org Statway—Full Version 1.0, July 2011 The Carnegie Foundation for the Advancement of Teaching 51 Vista Lane Stanford, California, 94305 Phone: 650-566-5110 pathways@carnegiefoundation.org www.carnegiefoundation.org About the development of this resource The content for this full version of Statway was developed under a November 30, 2010, agreement by a team of faculty authors and reviewers contracted and managed by the Charles A. Dana Center at the University of Texas at Austin with funding from the Carnegie Foundation for the Advancement of Teaching. This resource was produced in Microsoft Word 2008 and 2011 for the Mac. The content of these 12 modules was developed and produced (that is, written, reviewed, edited, and laid out) by the Charles A. Dana Center at The University of Texas at Austin and delivered by the Dana Center to the Carnegie Foundation for the Advancement of Teaching on June 30, 2011. Some issues to be aware of: • PDF files need to be viewed with Adobe Acrobat for full functionality. If viewed through Preview, which is the default on some computers, the URLs may not be correct. • The file names indicate the lesson number and whether the document is the instructor or student version or the out-of-class experience. The Dana Center is engaged in a process of revising and improving these materials to create the Dana Center Statistics Pathway. We welcome feedback from the community as part of our course revision process. If you would like to discuss these materials or learn more about the Dana Center’s plans for this course, contact us at mathways@austin.utexas.edu. About the Charles A. Dana Center at The University of Texas at Austin The Dana Center collaborates with local and national entities to improve education systems so that they foster opportunity for all students, particularly in mathematics and science. We are dedicated to nurturing students’ intellectual passions and ensuring that every student leaves school prepared for success in postsecondary education and the contemporary workplace—and for active participation in our modern democracy. The Center was founded in 1991 in the College of Natural Sciences at The University of Texas at Austin. Our original purpose—which continues in our work today—was to raise student achievement in K–16 mathematics and science, especially for historically underserved populations. We carry out our work by supporting high standards and building system capacity; collaborating with key state and national organizations to address emerging issues; creating and delivering professional supports for educators and education leaders; and writing and publishing education resources, including student supports. Our staff of more than 80 researchers and education professionals has worked intensively with dozens of school systems in nearly 20 states and with 90 percent of Texas’s more than 1,000 school districts. As one of the College’s largest research units, the Dana Center works to further the university’s mission of achieving excellence in education, research, and public service. We are committed to ensuring that the accident of where a student attends school does not limit the academic opportunities he or she can pursue. For more information about the Dana Center and our programs and resources, see our homepage at www.utdanacenter.org. To access our resources (many of them free) please see our products index at www.utdanacenter.org/products. To learn about Dana Center professional development sessions, see our professional development site at www.utdanacenter.org/pd. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. iii Frontmatter Statway—Full Version 1.0, July 2011 Acknowledgments The original version of this work was created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. Carnegie Corporation of New York, The Bill & Melinda Gates Foundation, The William and Flora Hewlett Foundation, Lumina Foundation, and The Kresge Foundation joined in partnership with the Carnegie Foundation for the Advancement of Teaching in this work. Leadership—Charles A. Dana Center at the University of Texas at Austin Uri Treisman, director Susan Hudson Hull, program director of mathematics national initiatives Leadership—Carnegie Foundation for the Advancement of Teaching Anthony S. Bryk, president Bernadine Chuck Fong, senior managing partner Louis Gomez, senior fellow Paul LeMahieu, senior fellow James Stigler, senior fellow Uri Treisman, senior fellow Guadalupe Valdés, senior fellow Statway Project Leads Kristen Bishop, former team lead for the New Mathways Project, the Charles A. Dana Center at the University of Texas at Austin Thomas J. Connolly, project lead, Statway, the Charles A. Dana Center at the University of Texas at Austin Karon Klipple, director of Statway, the Carnegie Foundation for the Advancement of Teaching Jane Muhich, director of Quantway, the Carnegie Foundation for the Advancement of Teaching Project Staff—Charles A. Dana Center at the University of Texas at Austin Richard Blount, advisor Kathi Cook, project director, online services team Jenna Cullinane, research associate Steve Engler, lead editor and production editor Amy Getz, team lead for the New Mathways Project Susan Hudson Hull, program director of mathematics national initiatives Joseph Hunt, graduate research assistant Rachel Jenkins, consulting editor Erica Moreno, program coordinator Carol Robinson, administrative associate Cathy Seeley, senior fellow Rachele Seifert, administrative associate Lilly Soto, senior administrative associate Phil Swann, senior designer Laura Torres, graduate research assistant Thomas Wiegel, freelance formatter and proofreader The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. iv Frontmatter Statway—Full Version 1.0, July 2011 Authors Contracted by the Dana Center Roxy Peck, professor emerita of statistics, California Polytechnic State University, San Luis Obispo, California Beth Chance, professor of statistics, California Polytechnic State University, San Luis Obispo, California Robert C. delMas, associate professor of educational psychology, University of Minnesota, Minneapolis, Minnesota Scott Guth, professor of mathematics, Mt. San Antonio College, Walnut, California Rebekah Isaak, graduate research student, University of Minnesota, Minneapolis, Minnesota Leah McGuire, assistant professor, University of Minnesota, Minneapolis, Minnesota Jiyoon Park, graduate research student, University of Minnesota, Minneapolis, Minnesota Brian Kotz, associate professor of mathematics, Montgomery College, Germantown, Maryland Chris Olsen, assistant professor of mathematics and statistics, Grinnell College, Grinnell, Iowa Mary Parker, professor of mathematics, Austin Community College, Austin, Texas Michael A. Posner, associate professor of statistics, Villanova University, Villanova, Pennsylvania Thomas H. Short, professor, John Carroll University, University Heights, Ohio Penny Smeltzer, teacher of statistics, Westwood High School, Austin, Texas Myra Snell, professor of mathematics, Los Medanos College, Pittsburg, California Laura Ziegler, graduate research student, University of Minnesota, Minneapolis, Minnesota Reviewers Contracted by the Dana Center Michelle Brock, American River College, Sacramento, California Thomas J. Connolly, the Charles A. Dana Center at the University of Texas at Austin Andre Freeman, Capital Community College, Hartford, Connecticut Karon Klipple, the Carnegie Foundation for the Advancement of Teaching Roxy Peck, professor emerita of statistics, California Polytechnic State University, San Luis Obispo, California Jim Smart, Tallahassee Community College, Tallahassee, Florida Myra Snell, Los Medanos College, Pittsburg, California Committee for Statistics Learning Outcomes Rose Asera, formerly of the Carnegie Foundation for the Advancement of Teaching Kristen Bishop, formerly of the Charles A. Dana Center at the University of Texas at Austin Richelle (Rikki) Blair, American Mathematical Association of Two-Year Colleges (AMATYC); Lakeland Community College, Ohio David Bressoud, Mathematical Association of America (MAA); Macalester College, Minnesota John Climent, American Mathematical Association of Two-Year Colleges (AMATYC); Cecil College, Maryland Peg Crider, Lone Star College, Tomball, Texas Jenna Cullinane, the Charles A. Dana Center at the University of Texas at Austin Robert C. delMas, Consortium for the Advancement of Undergraduate Statistics Education (CAUSE); University of Minnesota, Minneapolis, Minnesota Bernadine Chuck Fong, the Carnegie Foundation for the Advancement of Teaching Karen Givvin, the University of California, Los Angeles Larry Gray, American Mathematical Society (AMS); University of Minnesota Susan Hudson Hull, the Charles A. Dana Center at the University of Texas at Austin Rob Kimball, American Mathematical Association of Two-Year Colleges (AMATYC); Wake Technical Community College, North Carolina Dennis Pearl, Consortium for the Advancement of Undergraduate Statistics Education (CAUSE); The Ohio State University Roxy Peck, American Statistical Association (ASA); Consortium for the Advancement of Undergraduate Statistics Education (CAUSE); California Polytechnic State University, San Luis Obispo, California The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. v Frontmatter Statway—Full Version 1.0, July 2011 Myra Snell, American Mathematical Association of Two-Year Colleges (AMATYC); Los Medanos College, Pittsburg, California Jim Stigler, the Carnegie Foundation for the Advancement of Teaching; the University of California, Los Angeles Daniel Teague, Mathematical Association of America (MAA); North Carolina School of Science and Mathematics, Durham Uri Treisman, the Carnegie Foundation for the Advancement of Teaching; the Charles A. Dana Center at the University of Texas at Austin Version 1.0 of Statway was developed in collaboration with faculty from the following colleges, the “Collaboratory,” who advised on the development of the course. These Collaboratory colleges are: Florida Miami Dade College, Miami, Florida Tallahassee Community College, Tallahassee, Florida Valencia Community College, Orlando, Florida California American River College, Sacramento, California Foothill College, Los Altos Hills, California Mt. San Antonio College, Walnut, California Pierce College, Woodland Hills, California San Diego City College, San Diego, California California State University System Texas CSU Northridge Sacramento State University San Jose State University Austin Community College, Austin, Texas El Paso Community College, El Paso, Texas Houston Community College, Houston, Texas Northwest Vista College, San Antonio, Texas Richland College, Dallas, Texas Connecticut Washington Capital Community College, Hartford, Connecticut Gateway Community College, New Haven, Connecticut Housatonic Community College, Bridgeport, Connecticut Naugatuck Valley Community College, Waterbury, Connecticut Seattle Central Community College, Seattle, Washington Tacoma Community College, Tacoma, Washington The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. vi Frontmatter Statway—Full Version 1.0, July 2011 Statway, Full Version 1.0, July 2011 Table of Contents Module 1: Statistical Studies and Overview of the Data Analysis Process Lesson 1.1.1: The Statistical Analysis Process Lesson 1.1.2: Types of Statistical Studies and Scope of Conclusions Lesson 1.2.1: Collecting Data by Sampling Lesson 1.2.2: Random Sampling Lesson 1.2.3: Other Sampling Strategies Lesson 1.2.4: Sources of Bias in Sampling Lesson 1.3.1: Collecting Data by Conducting an Experiment Lesson 1.3.2: Other Design Considerations—Blinding, Control Groups, and Placebos Lesson 1.4.1: Drawing Conclusions from Statistical Studies Module 2: Summarizing Data Graphically and Numerically Lesson 2.1.1: Dotplots, Histograms, and Distributions for Quantitative Data Lesson 2.1.2: Constructing Histograms for Quantitative Data Lesson 2.1.3: Comparing Distributions of Quantitative Data in Two Independent Samples Lesson 2.2.1: Quantifying the Center of a Distribution—Sample Mean and Sample Median Lesson 2.2.2: Constructing Histograms for Quantitative Data Lesson 2.3.1: Quantifying Variability Relative to the Median Lesson 2.4.1: Quantifying Variability Relative to the Mean Lesson 2.4.2: The Sample Variance Module 3: Reasoning About Bivariate Numerical Data—Linear Relationships Lesson 3.1.1: Introduction to Scatterplots and Bivariate Relationships Lesson 3.1.2: Developing an Intuitive Sense of Form, Direction, and Strength of the Relationship Between Two Measurements Lesson 3.1.3: Introduction to the Correlation Coefficient and Its Properties Lesson 3.1.4: Correlation Formula Lesson 3.1.5: Correlation Is Not Causation Lesson 3.2.1: Using Lines to Make Predictions Lesson 3.2.2: Least Squares Regression Line as Line of Best Fit The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. vii Frontmatter Statway—Full Version 1.0, July 2011 Lesson 3.2.3: Investigating the Meaning of Numbers in the Equation of a Line Lesson 3.2.4: Special Properties of the Least Squares Regression Line Lesson 3.3.1: Using Residuals to Determine If a Line Is a Good Fit Lesson 3.3.2: Using Residuals to Determine If a Line Is an Appropriate Model Module 4: Modeling Nonlinear Relationships Lesson 4.1.1: Investigating Patterns in Data Lesson 4.1.2: Exponential Models Lesson 4.1.3: Assessing How Well a Model Fits the Data Module 5: Reasoning About Bivariate Categorical Data and Introduction to Probability Lesson 5.1.1: Reasoning About Risk and Chance Lesson 5.1.2: Defining Risk Lesson 5.1.3: Interpreting Risk Lesson 5.1.4: Comparing Risks Lesson 5.1.5: More on Conditional Risks Module 6: Formalizing Probability and Probability Distributions Lesson 6.1.1: Probability Lesson 6.1.2: Probability Rules Lesson 6.1.3: Simulation, Discrete Random Variables, and Probability Distributions Lesson 6.2.1: Probability Distributions of Continuous Random Variables Lesson 6.2.2: Z-Scores and Normal Distributions Lesson 6.2.3: Using Normal Distributions to Find Probabilities and Critical Values Module 7: Linking Probability to Statistical Inference Lesson 7.1.1: Predicting an Election—Statistics and Sampling Variability Lesson 7.1.2: Sampling from a Population Lesson 7.1.3: Testing Statistical Hypotheses Lesson 7.2.1: Two Types of Inferential Procedures—Estimation and Hypothesis Testing Lesson 7.2.2: Connecting Sampling Distributions and Confidence Intervals Lesson 7.2.3: Connecting Sampling Distributions and Hypothesis Testing The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. viii Frontmatter Statway—Full Version 1.0, July 2011 Module 8: Inference for One Proportion Lesson 8.1.1: Sampling Distribution of One Proportion Lesson 8.1.2: Sampling Distribution of One Proportion Lesson 8.2.1: Estimation of One Proportion Lesson 8.2.2: Estimation of One Proportion Lesson 8.3.1: Estimation of One Proportion Lesson 8.3.2: Hypothesis Testing for One Proportion Module 9: Inference for Two Proportions Lesson 9.1.1: Sampling Distribution of Differences of Two Proportions Lesson 9.1.2: Using Technology to Explore the Sampling Distribution of the Differences in Two Proportions Lesson 9.2.1: Confidence Intervals for the Difference in Two Population Proportions Lesson 9.2.2: Computing and Interpreting Confidence Intervals for the Difference in Two Population Proportions Lesson 9.3.1: A Statistical Test for the Difference in Two Population Proportions Lesson 9.3.2: A Statistical Test for the Difference in Two Population Proportions Lesson 9.3.3: Conducting a Statistical Test for the Difference in Two Population Proportions Module 10: Inference for Means Lesson 10.1.1: The Sampling Distribution of the Sample Mean Lesson 10.1.2: Using an Applet to Explore the Sampling Distribution of the Mean with Focus on Shape Lesson 10.2.1: Estimating a Population Mean Lesson 10.2.2: T-Statistics and T-Distributions Lesson 10.2.3: The Confidence Interval for a Population Mean Lesson 10.3.1: Testing Hypotheses About a Population Mean Lesson 10.3.2: Test Statistic and P-Values, One-Sample T-Test Lesson 10.4.1: Inferences About the Difference Between Two Population Means Lesson 10.4.2: Inference for Paired Data Lesson 10.4.3: Two-Sample T-Test The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. ix Frontmatter Statway—Full Version 1.0, July 2011 Module 11: Chi-Squared Tests Lesson 11.1.1: Introduction to Chi-Square Tests for One-Way Tables Lesson 11.1.2: Executing the Chi-Square Test for One-Way Tables (Goodness-of-Fit) Lesson 11.1.3: The Chi-Square Distribution and Degrees of Freedom Lesson 11.2.1: Introduction to Chi-Square Tests for Two-Way Tables Lesson 11.2.2: Executing the Chi-Square Test for Independence in Two-Way Tables Lesson 11.2.3: Executing the Chi-Square Test for Homogeneity in Two-Way Tables Module 12: Other Mathematical Content Lesson 12.1.1: Statistical Linear Relationships and Mathematical Models of Linear Relationships Lesson 12.1.2: Mathematical Linear Models Lesson 12.1.3: Contrasting Mathematical and Statistical Linear Relationships Lesson 12.1.4: Proportional Models Lesson 12.2.1: Multiple Representations of Exponential Models Lesson 12.2.2: Linear Models—Answering Various Types of Questions Algebraically Lesson 12.2.3: Power Models Lesson 12.2.4: Solving Inequalities The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. x Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.1.1: Sampling Distribution of One Proportion Estimated number of 50-‐minute class sessions: 2 Materials Required • • A coin for each student A six-‐sided die for each student Learning Goals Students will begin to understand that • • • • the sample proportion (p) centers around the true population proportion (π). for sufficiently large samples of independent observations, the sample proportion is normally distributed. a sample is considered sufficiently large when the number of events (nπ) and the number of nonevents [n(1 – π)] are both at least 5. the standard error of a sample proportion in the large-‐sample case is ( ! !" ! "! • • ) ! the standard error of a sample proportion decreases as the proportion moves away from 0.5, toward either 0 or 1. as the sample size increases, the standard error decreases. Students will begin to be able to • • determine whether the large-‐sample procedures for one proportion are appropriate. state the sampling distribution for the large-‐sample case for one proportion. Lesson Overview This initiating lesson focuses on generating sampling distributions of sample proportions and examining their properties. It begins with a rich task asking students to imagine (and draw) the distribution of the proportion of heads when flipping a fair coin. This task is expanded to 2 flips and then 10 flips and then to the proportion of sixes when rolling a fair six-‐sided die 10 times. The contextual scaffolding has students manually generating these distributions, comparing them to their guesses, and discussing the impact of changing n and π. Little’s Plinko applet is then used to conduct simulations demonstrating that the sampling distribution of the sample proportion is approximately normally distributed for large samples—when nπ and n(1 – π) are both at least 5— but not for small samples. Students then examine the standard error formula using the applet, showing that it increases as they move away from 0.5, toward either 0 or 1. In summary, students will notice that the sampling distribution of the sample proportion for large " ! "# ! #! % ' . A discussion of what independence means in this context takes samples follows ! $ ! ! " $ ' # & ( ) place as well. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.1.1: Sampling Distribution of One Proportion Opening Discussion and Questions Show a coin to the class. The discussion questions below are optional and present a review of previous material. • • • What are the possible outcomes of this coin? (You may challenge students on this—how do they know it will not land on an edge?) What type of variable is it? (This is a nice chance to use personal response systems.) What is the probability of getting heads? How do you know? (Okay, assume it is a fair coin that has a probability for heads and tails of 50% each. Topic 8.3 revisits this.) Are coin flips independent? (You can include a review of Module 5 here or just present the fact that coin flips are independent. You can demonstrate that if you throw the coin up or drop it down without any rotation, the results will be [almost] perfectly dependent.) Predicting the Distribution of the Proportions At the end of this section, students should have six distributions on their paper. You may wish to provide them with a two-‐sided instruction sheet, giving them space to draw the distributions. Ask students to think about the possible values for the proportion of heads when you flip the coin one time. Have them to draw this distribution. Next ask students to think about the possible values for the proportion of heads when you flip the coin two times. Have them to draw this distribution. Next ask students to think about the possible values for the proportion of heads when you flip the coin 10 times. Have them to draw this distribution. (This is a good time for a Think-‐Pair-‐Share before class discussion occurs.) Discussion Remind students that the way to describe a distribution is through its center, spread, and shape. Have them discuss with a partner (or the class) the shape of each distribution and why they chose what they did. Show a die to the class. Engage in a similar discussion to the Opening Discussion and Questions (the discussion questions), but regarding the proportion of times the die might roll a six. Ask students to think about the possible values for the proportion of sixes when you roll the die one time. Instruct them to draw this distribution. Ask students to think about the possible values for the proportion of sixes when you roll the die two times. Instruct them to draw this distribution. Ask students to think about the possible values for the proportion of sixes when you roll the die 10 times. Instruct them to draw this distribution. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.1.1: Sampling Distribution of One Proportion Generating the Distribution of the Proportion Note: In a small class of fewer than 20 or 30 students, you may have each student generate more than one result for each. Ask each student to do the following: 1. Flip a coin one time and record the percent heads. 2. Flip a coin two times and record the percent heads. 3. Flip a coin 10 times and record the percent heads. 4. Roll a die one time and record the percent of sixes. 5. Roll a die two times and record the percent of sixes. 6. Roll a die 10 times and record the percent of sixes. Have students generate the distributions, and then collect and combine them all. Technology is helpful here, but brute force works as well (e.g., for two coins, ask how many got 0% heads, how many got 50% heads, and how many got 100% heads). Display these distributions. The following are discussion points: • • • • Why did the distribution generated by the class not look exactly like the one that you drew? (These samples are randomly selected from a population. Each time you do this, you will likely get a different answer.) Center: You should find that each distribution centers around the true value (1/2 for the coin and 1/6 for the die). Spread: You should notice that the spread decreases as the sample size (number of flips or rolls) increases. Shape: You will notice that the shapes begin to have a peak in the middle—near the true value—and start looking like the bell-‐shaped curve. Although, especially for the dice rolls, it does not really look that normal. Simulating the Distribution of the Proportion and Examining Properties of Sampling Distribution of One Proportion Next, students simulate these types of experiments using a computer. Once students understand how to generate data and read the results, they should grasp the concept of simulation. This can be done using a number of different technological options: • • • • David Little’s Plinko and the Binomial Distribution applet at www.math.psu.edu/dlittle/java/ probability/plinko/index.html (You can download a version of the applet for offline use.) Statistics Online Computational Resource (www.socr.ucla.edu/htmls/SOCR_Experiments.html) Rossman/Chance Applet Collection (http://statweb.calpoly.edu/bchance/applets/BinomDist3/ BinomDist.html) software such as Fathom, Minitab, and so on. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.1.1: Sampling Distribution of One Proportion Activity A Start off using 11 bins, probability (π) 0.5, and speed 1. There are a few things to point out to students: • • • The bins simulate 10 flips of a fair coin (i.e., the bins show the number of heads that result from ten flips of a coin). There are 11 bins, because you can get 0–10 heads (i.e., 11 different possible outcomes of the 10 coin flips). There are 10 levels of bins on the board, each corresponding to a successive flip of the coin. The number at the bottom is the number of heads (or tails) that result. Note that you have been talking about proportions, so students can divide this number by 10 to get the proportion of heads (10 => 1 or 100%, 5 => 0.5 or 50%, etc.). Ask students, “What do you think the shape of the distribution will look like?” Start the applet. As each ball drops, point out that there is a 50% chance of going to the left (getting heads) and a 50% chance of going to the right (getting tails). As the balls accumulate, take note of the numbers on the bottom: • • • The count is the number of balls dropped (the number of times the program has done 10 flips of a coin ). The mean is the mean number of heads. Divide by 10 to get the mean proportion. This number should be close to 0.5. The standard deviation is the square root of the variance. Divide this by 10 to get the standard error of the proportion. This number should be close to !"# $% & = & $"#' $% & = &%"$#' Once you end the simulation, ask students to describe the shape of the distribution. It should look fairly normally distributed. Activity B Now, clear the screen, and change the probability to 0.17, which corresponds to 10 rolls of a fair six-‐sided die. Point out that the 11 bins simulate 10 rolls of a die, counting the number of sixes that they get, since they can get 0–10 sixes. Ask students, “What do you think the shape will look like?” Start the applet. As each ball drops, point out that there is a 1/6 chance of going to the left (getting a 6), and a 5/6 chance of going to the right (getting a different number). As the balls accumulate, take note of the numbers on the bottom: • • • The count is the number of balls that have dropped (the number of times the program rolls 10 dice). The mean is the mean number of sixes. Divide by 10 to get the mean proportion. This number should be close to 0.17. The standard deviation is the square root of the variance that is listed. Divide this by 10 to get the standard error of the proportion. This number should be close to The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.1.1: Sampling Distribution of One Proportion !"#! !$ % = % !"!& !$ % = %$"!!& Notice that this number is smaller than the coin-‐flip simulation. Students may be able to see this visually as well. Once you end the simulation, ask students to describe the shape of the distribution. It should look positively skewed. Discussion Why is the standard error of the sample proportion decreasing when the proportion decreases? Have students think back to basic probability calculations. They are less likely to deviate from the mean when the probability is small (or large). Specifically, based on the binomial distribution, the chance of being 3 higher than the mean of 5 is the chance of getting 8 or more heads, which is 0.55. The chance of getting 5 or more sixes is only 0.015. Thus, the variance is maximized at π = 0.5, and decreases as it moves away from 0.5, toward either 0 or 1. The formula for standard error of a proportion is ( ! !" ! "! ! ) While students do not have the tools to derive this formula, they can see that it is consistent with the previous statement. Activity C Now, clear the results and change the probability to 0.5 and the bins to 51, simulating 50 flips of a fair coin. Point out that the 51 bins simulate 50 flips of a fair coin, since they can get 0–50 heads. Ask students, “What do you think the shape will look like?” Start the applet. Point out how unlikely it is for the ball to land near the extremes of the distribution. As the balls accumulate, take note of the numbers on the bottom: • • • The count is the number of balls that have dropped (the number of times the program flips 50 coins). The mean is the mean number of heads. Divide by 50 to get the mean proportion. This number should be close to 0.5. The standard deviation is the square root of the variance that is listed. Divide this by 50 to get the standard error of the proportion. This number should be close to !"#$ $% & = & '#$( $% & = &%#%)! Once you end the simulation, ask students to describe the shape of the distribution. It should look fairly normally distributed. The spread is much smaller than when you did 10 flips. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.1.1: Sampling Distribution of One Proportion Discussion As the sample size (number of flips) increases, the standard error decreases. Again, this is consistent with the manual simulations as well as in the formula: ( ! !" ! "! ! ) Since n is in the denominator, as n increases, the standard error decreases. Intuitively, this should make sense as well. Activity D Clear the screen, and change the probability to 0.17, simulating 50 rolls of a fair six-‐sided die. Point out that the 51 bins simulate 50 dice rolls, counting the number of sixes they get, since they can get 0–50 sixes. Ask students, “What do you think the shape will look like?” Start the applet. As each ball drops, point out that there is a 1/6 chance of going to the left (getting a 6), and a 5/6 chance of going to the right (getting a different number). As the balls accumulate, take note of the numbers on the bottom: • • • The count is the number of balls that have dropped (the number of times the program rolls 50 dice) The mean is the mean number of sixes. Divide by 50 to get the mean proportion. This number should be close to 0.17. The standard deviation is the square root of the variance that is listed. Divide this by 50 to get the standard error of the proportion. This number should be close to !"# $% & = & '"(( $% & = &%"%$) Notice that this number is smaller than the coin-‐flip simulation and smaller than the dice-‐ roll simulation with only 10 rolls. Students may be able to see this visually as well. Once you end the simulation, ask students to describe the shape of the distribution. It should look normally distributed. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.1.1: Sampling Distribution of One Proportion Discussion Summarize the distributions you simulated: Activity Type π n SE πp n(1 – π) Shape A Coin 0.5 10 0.158 5 5 Normal C Coin 0.5 50 0.071 25 25 Normal B Die 1/6 10 0.119 1.7 8.3 Skewed D Die 1/6 50 0.053 8.3 41.7 Normal Of note, when π = 1/6 and n = 10, the distribution does not look very normal. However, when π = 1/6 and n = 50, the distribution looks approximately normal. So, the condition to determine must include information about both n and π. Specifically, nπ and n(1 – π) must both be at least 5. Wrap-‐Up The sampling distribution of the sample proportion for large samples is ( " ! #$ ! $! ! ! "$! " # $ # ) %' ' & The large-‐sample condition is met if nπ and n(1 – π) are both at least 5. This is a guideline (or rule of thumb). The observations (flips, rolls, etc.) must also be independent of each other. Homework [Student Handout] (1) For which of the following situations does the large-‐sample condition hold? (a) n = 10, π = 0.5 (b) n = 10, π = 0.05 (c) n = 100, π = 0.1 (d) n = 100, π = 0.01 (e) n = 15, π = 0.8 (2) Suppose you were interested in flipping a fair coin 100 times. What is the distribution of the sample proportion of heads? State the shape of the distribution and an indicator of center and spread. State and verify any assumptions necessary for this condition. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 7 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.1.1: Sampling Distribution of One Proportion 1 (3) Suppose 22.5% of all college students have been tested for HIV . If a random sample of 68 college students are chosen, what is the distribution of the sample proportion of students who have been tested for HIV? State the shape of the distribution as well as an indicator of center and spread. State and verify any assumptions necessary for this condition. (4) [Advanced proficiency] Pass the Pigs is a popular game. Information about the game can be found on Wikipedia at http://en.wikipedia.org/wiki/Pass_the_Pigs. An online version of the game can be found at www.toptrumps.com/play/pigs/pigs.html. According to Wikipedia, 65.1% of the time the pig lands on one side or the other. (a) How many times must you roll the pig for the distribution to become approximately normal, according to the large-‐sample rule? (b) How many times must you roll the pig for the standard error to be no larger than 1%? 1 Crosby, R. A., Miller, K. H., Staten, R. R., & Noland, M. (2005). Prevalence and correlates of HIV testing among college students: an exploratory study. Sex Health, 19–22. Retrieved from www.ncbi.nlm.nih.gov/pubmed/16334708. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 8 Statway Student Handout April 24, 2012 (Full Version 1.0) Initiating Lesson 8.1.1: Sampling Distribution of One Proportion Homework (1) For which of the following situations does the large-‐sample condition hold? (a) n = 10, π = 0.5 (b) n = 10, π = 0.05 (c) n = 100, π = 0.1 (d) n = 100, π = 0.01 (e) n = 15, π = 0.8 (2) Suppose you were interested in flipping a fair coin 100 times. What is the distribution of the sample proportion of heads? State the shape of the distribution and an indicator of center and spread. State and verify any assumptions necessary for this condition. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 24, 2012 (Full Version 1.0) Initiating Lesson 8.1.1: Sampling Distribution of One Proportion 1 (3) Suppose 22.5% of all college students have been tested for HIV . If a random sample of 68 college students are chosen, what is the distribution of the sample proportion of students who have been tested for HIV? State the shape of the distribution as well as an indicator of center and spread. State and verify any assumptions necessary for this condition. (4) Pass the Pigs is a popular game. Information about the game can be found on Wikipedia at http://en.wikipedia.org/wiki/Pass_the_Pigs. An online version of the game can be found at www.toptrumps.com/play/pigs/pigs.html. According to Wikipedia, 65.1% of the time the pig lands on one side or the other. (a) How many times must you roll the pig for the distribution to become approximately normal, according to the large-‐sample rule? (b) How many times must you roll the pig for the standard error to be no larger than 1%? 1 Crosby, R. A., Miller, K. H., Staten, R. R., & Noland, M. (2005). Prevalence and correlates of HIV testing among college students: an exploratory study. Sex Health, 19–22. Retrieved from www.ncbi.nlm.nih.gov/pubmed/16334708. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion Estimated number of 50-‐minute class sessions: 1 Learning Goals Students will understand that • the sampling distribution of the sample proportion for large samples follows ( " ! #$ ! $! ! ! "$! " # $ # • • ) %' ' & the standard error decreases (or the precision increases) as the sample size increases. this material integrates directly with Topic 6.2 where the normal distributions were introduced. Students will be able to • determine whether the sample is sufficiently large and the observations are independent to state that the sample proportion is approximately normally distributed: ( " ! #$ ! $! ! ! "$! " # $ # • • ) %' ' & calculate probabilities from proportions in the large-‐sample case using the normal distribution. calculate percentiles from proportions in the large-‐sample case using the normal distribution. Lesson Overview Students are provided with contextual examples and asked to calculate probabilities and percentiles. These are based on past lessons or other real-‐world scenarios. Students verify that the large-‐sample techniques are appropriate, determine the distribution of the sample proportion, and calculate either a probability or percentile based on a series of questions. They should understand the impact of changing n on these calculations. Discussion/Questions [Student Handout] Lesson 8.1.1 discussed how, under certain conditions, the sampling distribution of the sample proportion follows a normal distribution with mean π and standard deviation: ( ! !" ! "! ! ) These conditions are (a) the observations must be independent of each other and (b) nπ and n(1 – π) must each be at least 5. Recall from Module 6 about continuous probability distributions: • The probability of being in a certain region is the area under the curve (see the graphs below). The first one represents an upper tail probability—the probability of being greater than an indicated value. The second one represents the probability of being between two indicated values. In both cases, the area under the curve is equal to the probability of being in the specified region. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion 1.0 Density 0.8 0.6 0.4 0.2 0.0 X 0.4 Density 0.3 0.2 0.1 0.0 • X The probability of being at exactly one point is equal to zero. For example, what is the chance that someone is exactly 5 feet and 8.9021384573908573405923498534 inches tall? Essentially, the probability is zero, since any one point represents an exact number. In practice, this means that weak inequalities and strict inequalities are the same. For example, P(x ≤ 5) = P(x < 5), since P(x = 5) = 0, or very close to it. Recall that Topic 6.2 discussed various properties of the normal distribution. In particular, for any normal distribution, you can calculate z, which is a standard normal distribution with a mean of 0 and a standard deviation of 1. The Z-‐value represents the number of standard deviations above the mean. For example, z = 1.3 means you are 1.3 standard deviations above the mean, while z = –2 means you are 2 standard deviations below the mean. The purpose of calculating a Z-‐value is so you can look up the probability on a table for any normal distribution (with a nonzero mean or a standard deviation that is not 1, like height or IQ). The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion ( " ! #$ ! $! Since ! ! " $ ! " # $ # ) %' , you know that !! = ! "! ! !! ( ' & ! "! ! !! # ) . One example of a standard normal table can be found at the end of this lesson. There are also a few applets on the Internet that can be used as well: • • • www-‐stat.stanford.edu/~naras/jsm/FindProbability.html http://davidmlane.com/hyperstat/z_table.html www.rossmanchance.com/applets/NormalCalcs/NormalCalculations.html Examples of Calculating Probabilities from Scenarios [Student Handout] (Note: These problems can be done by the instructor, via a Think-‐Pair-‐Share, clickers, etc. Other problems can be chosen based on student interest.) (1) According to the Center for Disease Control and Prevention, the Human Immunodeficiency Virus (HIV) rate among college students is 1 in 1,500 students. At a large university of 30,000 students, what is the chance that fewer than 10 students are HIV positive? n = 30,000 and π = 1/1,500 = 0.000666 Check assumptions: nπ = 30,000/1,500 = 20 and n(1 – π) = 29,980. Both are at least 5. You could call into question the independence of the observations, since some students might be infected by other students, so proceed with caution! p = 10/30,000 = 0.000333 " $ "#"""$$$& ! &"#"""''' $ !& ! &! ! ! < "#"""$$$%! & = &"#"""''' & = &! $ &<& "#"""''' (& ! &"#"""''' $ ! (& ! &! $# " $")"""" " !"#"""$$$ % = &! $ # & < & & = &! # & < & ! ,#,* & = &"#"($ "#"""(*+ '& # ( ) ( ( ) ) ( ) % ' ' ' ' '& Using the Z-‐table, the value 0.013 can be found in the row labeled –2.2 and the column labeled 0.04. This corresponds to a z-‐value of –2.24. (2) It is known that 52% of births are male (yet 52% of people in the United States are female, since they have a longer life expectancy). In a certain hospital, there are typically 120 births per month. What is the chance that more than 70 births are male? n = 120 and π = 0.52 Check assumptions: nπ = 120(0.52) = 62.4 and n(1 – π) = 57.6. Both are at least 5. As long as you do not have any identical twins, independence will likely hold. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion " % $ ' !' ! '! #()*+' ! '#()% ' ! ! > "# &! ' = '#()% ' = '! $ '>' $ ' $%# #()% $' ! '#()% ' $ ! $' ! '! $# '& " $%# " #(#,+ % = '! $ # ' > ' ' = '! # ' > '$(+* ' = '$' ! '! # ' < '$(+* ' = '$' ! '#(.$,' = '#(#*#(#-), '& # ( ) ( ( ) ) ( ) ( ) Examples of Calculating Percentiles from Scenarios [Student Handout] (Note: When discussing percentiles for proportions, it is sometimes confusing for students. There are now three percents that are being used: the percentile, π, and p.) Sometimes, you are interested in calculating the value of p for a given percentile, rather than a probability for a given p. The following are based on the previous examples. (3) According to the Center for Disease Control and Prevention, the HIV rate among college students is 1 in 1,500 students. At a large university of 30,000 students, what is the 10th percentile for the proportion of students you expect to be HIV positive? n = 30,000 and π = 1/1,500 = 0.000666 Check assumptions: nπ = 30,000/1,500 = 20 and n(1 – π) = 29,980. Both are at least 5. You could call into question the independence of the observations, since some students might be infected by other students, so proceed with caution! The 10th percentile corresponds to the value in the Z-‐table of –1.28. Thus, you can calculate the value of p that corresponds to this: !! = ! "! ! !! ( ! "! ! !! # ) "! ! !&#&&&''' ! => ! !"#$%! = ! "! = !&#&&&'''! ! !"#$% ( &#&&&''' "! ! !&#&&&''' ( (&)&&& &#&&&''' "! ! !&#&&&''' (&)&&& ) => ) ! => !"! = !&#&&&*+, or about 1 in 2,100. There is a shortcut you can use to determine these. Specifically, rearranging the formula results in the following equation: !! = ! "! ! !! ( ! "! ! !! # ) ! => !"! = !! ! + !! ( ! "! ! !! # ) The latter equation is the shortcut formula. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion (4) It is known that 52% of births are male (yet 52% of people in the United States are female, since they have a longer life expectancy). In a certain hospital, there are typically 120 births per month. In 80% of months, the percent of males who are born there is less than what percent? n = 120 and π = 0.52 Check assumptions: nπ = 120(0.52) = 62.4 and n(1 – π) = 57.6. Both are at least 5. As long as you do not have any identical twins, independence will likely hold. Using the shortcut formula: !! = !! ! + !" ( ! "! ! !! # ) ! => !!! = !#$%& + #$'( ( #$%& "! ! !#$%& "&# ) ! = !#$%&! + !#$#(! = !#$%) Wrap-‐Up For sampling distributions from proportions, !! = ! "! ! !! ( ! "! ! !! # ) When reading values from the standard normal table (the Z-‐table): • • • P(Z < b) can be obtained from the table. b can take on any positive or negative number. For values lower than –4 (off the table), P(Z < b) is close to 0. For values higher than 4 (off the table), P(Z < b) is close to 1. P(Z > b) = 1 – P(Z < b). If you ever want a probability for greater than, you can use the complement rule to subtract the value in the table from 1. P(a < Z < b) = P(Z < b) – P(Z < a) For calculating percentiles, the shortcut formula is !! = !! ! + !" ( ! "! ! !! # ) Homework [Student Handout] (1) Recall Question 2 from Lesson 8.1.1: Suppose you are interested in flipping a fair coin 100 times. What is the distribution of the sample proportion of heads (a) What is the chance of getting no more than 40 heads in 100 flips? (b) What is the 80th percentile for the number of heads you would get in 100 flips? (2) Recall Question 3 from Lesson 8.1.1: Suppose that 22.5% of all college students have been tested for HIV. If a random sample of 68 college students is chosen, what is the distribution of the sample proportion of students who have been tested for HIV? (a) What is the chance that at least 50% of the sample has been tested for HIV? (b) What is the 10th percentile of the proportion of students in the sample of 68 who have been tested for HIV? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion Probability Under the Standard Normal Distribution 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 0.00 0.500 0.540 0.579 0.618 0.655 0.691 0.726 0.758 0.788 0.816 0.841 0.864 0.885 0.903 0.919 0.933 0.945 0.955 0.964 0.971 0.977 0.982 0.986 0.989 0.992 0.994 0.995 0.997 0.997 0.998 0.9987 0.9990 0.9993 0.9995 0.9997 0.9998 0.9998 0.9999 0.9999 0.99995 0.01 0.504 0.544 0.583 0.622 0.659 0.695 0.729 0.761 0.791 0.819 0.844 0.867 0.887 0.905 0.921 0.934 0.946 0.956 0.965 0.972 0.978 0.983 0.986 0.990 0.992 0.994 0.995 0.997 0.998 0.998 0.9987 0.9991 0.9993 0.9995 0.9997 0.9998 0.9998 0.9999 0.9999 0.99995 0.02 0.508 0.548 0.587 0.626 0.663 0.698 0.732 0.764 0.794 0.821 0.846 0.869 0.889 0.907 0.922 0.936 0.947 0.957 0.966 0.973 0.978 0.983 0.987 0.990 0.992 0.994 0.996 0.997 0.998 0.998 0.9987 0.9991 0.9994 0.9995 0.9997 0.9998 0.9999 0.9999 0.9999 0.99996 0.03 0.512 0.552 0.591 0.629 0.666 0.702 0.736 0.767 0.797 0.824 0.848 0.871 0.891 0.908 0.924 0.937 0.948 0.958 0.966 0.973 0.979 0.983 0.987 0.990 0.992 0.994 0.996 0.997 0.998 0.998 0.9988 0.9991 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.99996 0.04 0.516 0.556 0.595 0.633 0.670 0.705 0.739 0.770 0.800 0.826 0.851 0.873 0.893 0.910 0.925 0.938 0.949 0.959 0.967 0.974 0.979 0.984 0.987 0.990 0.993 0.994 0.996 0.997 0.998 0.998 0.9988 0.9992 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.99996 0.05 0.520 0.560 0.599 0.637 0.674 0.709 0.742 0.773 0.802 0.829 0.853 0.875 0.894 0.911 0.926 0.939 0.951 0.960 0.968 0.974 0.980 0.984 0.988 0.991 0.993 0.995 0.996 0.997 0.998 0.998 0.9989 0.9992 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.99996 0.06 0.524 0.564 0.603 0.641 0.677 0.712 0.745 0.776 0.805 0.831 0.855 0.877 0.896 0.913 0.928 0.941 0.952 0.961 0.969 0.975 0.980 0.985 0.988 0.991 0.993 0.995 0.996 0.997 0.998 0.998 0.9989 0.9992 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.99996 0.07 0.528 0.567 0.606 0.644 0.681 0.716 0.749 0.779 0.808 0.834 0.858 0.879 0.898 0.915 0.929 0.942 0.953 0.962 0.969 0.976 0.981 0.985 0.988 0.991 0.993 0.995 0.996 0.997 0.998 0.9985 0.9989 0.9992 0.9995 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.99996 0.08 0.532 0.571 0.610 0.648 0.684 0.719 0.752 0.782 0.811 0.836 0.860 0.881 0.900 0.916 0.931 0.943 0.954 0.962 0.970 0.976 0.981 0.985 0.989 0.991 0.993 0.995 0.996 0.997 0.998 0.9986 0.9990 0.9993 0.9995 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.99997 0.09 0.536 0.575 0.614 0.652 0.688 0.722 0.755 0.785 0.813 0.839 0.862 0.883 0.901 0.918 0.932 0.944 0.954 0.963 0.971 0.977 0.982 0.986 0.989 0.992 0.994 0.995 0.996 0.997 0.998 0.9986 0.9990 0.9993 0.9995 0.9997 0.9998 0.9998 0.9999 0.9999 0.9999 0.99997 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion Probability Under the Standard Normal Distribution –0.0 –0.1 –0.2 –0.3 –0.4 –0.5 –0.6 –0.7 –0.8 –0.9 –1.0 –1.1 –1.2 –1.3 –1.4 –1.5 –1.6 –1.7 –1.8 –1.9 –2.0 –2.1 –2.2 –2.3 –2.4 –2.5 –2.6 –2.7 –2.8 –2.9 –3.0 –3.1 –3.2 –3.3 –3.4 –3.5 –3.6 –3.7 –3.8 –3.9 0.00 0.500 0.460 0.421 0.382 0.345 0.309 0.274 0.242 0.212 0.184 0.159 0.136 0.115 0.097 0.081 0.067 0.055 0.045 0.036 0.029 0.023 0.018 0.014 0.011 0.008 0.006 0.005 0.003 0.003 0.002 0.0013 0.0010 0.0007 0.0005 0.0003 0.0002 0.0002 0.0001 0.0001 0.00005 0.01 0.496 0.456 0.417 0.378 0.341 0.305 0.271 0.239 0.209 0.181 0.156 0.133 0.113 0.095 0.079 0.066 0.054 0.044 0.035 0.028 0.022 0.017 0.014 0.010 0.008 0.006 0.005 0.003 0.002 0.002 0.0013 0.0009 0.0007 0.0005 0.0003 0.0002 0.0002 0.0001 0.0001 0.00005 0.02 0.492 0.452 0.413 0.374 0.337 0.302 0.268 0.236 0.206 0.179 0.154 0.131 0.111 0.093 0.078 0.064 0.053 0.043 0.034 0.027 0.022 0.017 0.013 0.010 0.008 0.006 0.004 0.003 0.002 0.002 0.0013 0.0009 0.0006 0.0005 0.0003 0.0002 0.0001 0.0001 0.0001 0.00004 0.03 0.488 0.448 0.409 0.371 0.334 0.298 0.264 0.233 0.203 0.176 0.152 0.129 0.109 0.092 0.076 0.063 0.052 0.042 0.034 0.027 0.021 0.017 0.013 0.010 0.008 0.006 0.004 0.003 0.002 0.002 0.0012 0.0009 0.0006 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.00004 0.04 0.484 0.444 0.405 0.367 0.330 0.295 0.261 0.230 0.200 0.174 0.149 0.127 0.107 0.090 0.075 0.062 0.051 0.041 0.033 0.026 0.021 0.016 0.013 0.010 0.007 0.006 0.004 0.003 0.002 0.002 0.0012 0.0008 0.0006 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.00004 0.05 0.480 0.440 0.401 0.363 0.326 0.291 0.258 0.227 0.198 0.171 0.147 0.125 0.106 0.089 0.074 0.061 0.049 0.040 0.032 0.026 0.020 0.016 0.012 0.009 0.007 0.005 0.004 0.003 0.002 0.002 0.0011 0.0008 0.0006 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.00004 0.06 0.476 0.436 0.397 0.359 0.323 0.288 0.255 0.224 0.195 0.169 0.145 0.123 0.104 0.087 0.072 0.059 0.048 0.039 0.031 0.025 0.020 0.015 0.012 0.009 0.007 0.005 0.004 0.003 0.002 0.002 0.0011 0.0008 0.0006 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.00004 0.07 0.472 0.433 0.394 0.356 0.319 0.284 0.251 0.221 0.192 0.166 0.142 0.121 0.102 0.085 0.071 0.058 0.047 0.038 0.031 0.024 0.019 0.015 0.012 0.009 0.007 0.005 0.004 0.003 0.002 0.0015 0.0011 0.0008 0.0005 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.00004 0.08 0.468 0.429 0.390 0.352 0.316 0.281 0.248 0.218 0.189 0.164 0.140 0.119 0.100 0.084 0.069 0.057 0.046 0.038 0.030 0.024 0.019 0.015 0.011 0.009 0.007 0.005 0.004 0.003 0.002 0.0014 0.0010 0.0007 0.0005 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.00003 0.09 0.464 0.425 0.386 0.348 0.312 0.278 0.245 0.215 0.187 0.161 0.138 0.117 0.099 0.082 0.068 0.056 0.046 0.037 0.029 0.023 0.018 0.014 0.011 0.008 0.006 0.005 0.004 0.003 0.002 0.0014 0.0010 0.0007 0.0005 0.0003 0.0002 0.0002 0.0001 0.0001 0.0001 0.00003 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 7 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion Lesson 8.1.1 discussed how, under certain conditions, the sampling distribution of the sample proportion follows a normal distribution with mean π and standard deviation: ( ! !" ! "! ! ) These conditions are (a) the observations must be independent of each other and (b) nπ and n(1 – π) must each be at least 5. Recall from Module 6 about continuous probability distributions: • The probability of being in a certain region is the area under the curve (see the graphs below). The first one represents an upper tail probability—the probability of being greater than an indicated value. The second one represents the probability of being between two indicated values. In both cases, the area under the curve is equal to the probability of being in the specified region. 1.0 Density 0.8 0.6 0.4 0.2 0.0 X 0.4 Density 0.3 0.2 0.1 0.0 X The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion • The probability of being at exactly one point is equal to zero. For example, what is the chance that someone is exactly 5 feet and 8.9021384573908573405923498534 inches tall? Essentially, the probability is zero, since any one point represents an exact number. In practice, this means that weak inequalities and strict inequalities are the same. For example, P(x ≤ 5) = P(x < 5), since P(x = 5) = 0, or very close to it. Recall that Topic 6.2 discussed various properties of the normal distribution. In particular, for any normal distribution, you can calculate z, which is a standard normal distribution with a mean of 0 and a standard deviation of 1. The Z-‐value represents the number of standard deviations above the mean. For example, z = 1.3 means you are 1.3 standard deviations above the mean, while z = –2 means you are 2 standard deviations below the mean. The purpose of calculating a Z-‐value is so you can look up the probability on a table for any normal distribution (with a nonzero mean or a standard deviation that is not 1, like height or IQ). ( " ! #$ ! $! Since ! ! " $ ! " # $ # ) %' , you know that !! = ! ' & "! ! !! ( ! "! ! !! ) . # One example of a standard normal table can be found at the end of this lesson. There are also a few applets on the Internet that can be used as well: • • • www-‐stat.stanford.edu/~naras/jsm/FindProbability.html http://davidmlane.com/hyperstat/z_table.html www.rossmanchance.com/applets/NormalCalcs/NormalCalculations.html The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion Examples of Calculating Probabilities from Scenarios (1) According to the Center for Disease Control and Prevention, the Human Immunodeficiency Virus (HIV) rate among college students is 1 in 1,500 students. At a large university of 30,000 students, what is the chance that fewer than 10 students are HIV positive? n = 30,000 and π = 1/1,500 = 0.000666 Check assumptions: nπ = 30,000/1,500 = 20 and n(1 – π) = 29,980. Both are at least 5. You could call into question the independence of the observations, since some students might be infected by other students, so proceed with caution! p = 10/30,000 = 0.000333 " $ "#"""$$$& ! &"#"""''' $ !& ! &! ! ! < "#"""$$$%! & = &"#"""''' & = &! $ &<& "#"""''' (& ! &"#"""''' $ ! (& ! &! $# " $")"""" " % !"#"""$$$ = &! $ # & < & & = &! # & < & ! ,#,* & = &"#"($ "#"""(*+ '& # ( ) ( ( ) ( ) ) % ' ' ' ' '& Using the Z-‐table, the value 0.013 can be found in the row labeled –2.2 and the column labeled 0.04. This corresponds to a z-‐value of –2.24. (2) It is known that 52% of births are male (yet 52% of people in the United States are female, since they have a longer life expectancy). In a certain hospital, there are typically 120 births per month. What is the chance that more than 70 births are male? n = 120 and π = 0.52 Check assumptions: nπ = 120(0.52) = 62.4 and n(1 – π) = 57.6. Both are at least 5. As long as you do not have any identical twins, independence will likely hold. " % $ ' !' ! '! #()*+' ! '#()% ' $ "# ! !> &! ' = '#()% ' = '! '>' $ ' $%# ! $' ! ' ! #()% $' ! '#()% ' $ $# '& " $%# " #(#,+ % = '! $ # ' > ' ' = '! # ' > '$(+* ' = '$' ! '! # ' < '$(+* ' = '$' ! '#(.$,' = '#(#*#(#-), '& # ( ) ( ( ) ) ( ( ) ) The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion Examples of Calculating Percentiles from Scenarios Sometimes, you are interested in calculating the value of p for a given percentile, rather than a probability for a given p. The following are based on the previous examples. (3) According to the Center for Disease Control and Prevention, the HIV rate among college students is 1 in 1,500 students. At a large university of 30,000 students, what is the 10th percentile for the proportion of students you expect to be HIV positive? n = 30,000 and π = 1/1,500 = 0.000666 Check assumptions: nπ = 30,000/1,500 = 20 and n(1 – π) = 29,980. Both are at least 5. You could call into question the independence of the observations, since some students might be infected by other students, so proceed with caution! The 10th percentile corresponds to the value in the Z-‐table of –1.28. Thus, you can calculate the value of p that corresponds to this: "! ! !! !! = ! ( ! "! ! !! # ) "! ! !&#&&&''' ! => ! !"#$%! = ! "! = !&#&&&'''! ! !"#$% ( &#&&&''' "! ! !&#&&&''' (&)&&& ( &#&&&''' "! ! !&#&&&''' (&)&&& ) => ) ! => !"! = !&#&&&*+, or about 1 in 2,100. There is a shortcut you can use to determine these. Specifically, rearranging the formula results in the following equation: !! = ! "! ! !! ( ! "! ! !! # ) ! => !"! = !! ! + !! ( ! "! ! !! # ) The latter equation is the shortcut formula. (4) It is known that 52% of births are male (yet 52% of people in the United States are female, since they have a longer life expectancy). In a certain hospital, there are typically 120 births per month. In 80% of months, the percent of males who are born there is less than what percent? n = 120 and π = 0.52 Check assumptions: nπ = 120(0.52) = 62.4 and n(1 – π) = 57.6. Both are at least 5. As long as you do not have any identical twins, independence will likely hold. Using the shortcut formula: !! = !! ! + !" ( ! "! ! !! # ) ! => !!! = !#$%& + #$'( ( #$%& "! ! !#$%& "&# ) ! = !#$%&! + !#$#(! = !#$%) The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion Homework (1) Recall Question 2 from Lesson 8.1.1: Suppose you are interested in flipping a fair coin 100 times. What is the distribution of the sample proportion of heads (a) What is the chance of getting no more than 40 heads in 100 flips? (b) What is the 80th percentile for the number of heads you would get in 100 flips? (2) Recall Question 3 from Lesson 8.1.1: Suppose that 22.5% of all college students have been tested for HIV. If a random sample of 68 college students is chosen, what is the distribution of the sample proportion of students who have been tested for HIV? (a) What is the chance that at least 50% of the sample has been tested for HIV? (b) What is the 10th percentile of the proportion of students in the sample of 68 who have been tested for HIV? The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion Probability Under the Standard Normal Distribution 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 0.00 0.500 0.540 0.579 0.618 0.655 0.691 0.726 0.758 0.788 0.816 0.841 0.864 0.885 0.903 0.919 0.933 0.945 0.955 0.964 0.971 0.977 0.982 0.986 0.989 0.992 0.994 0.995 0.997 0.997 0.998 0.9987 0.9990 0.9993 0.9995 0.9997 0.9998 0.9998 0.9999 0.9999 0.99995 0.01 0.504 0.544 0.583 0.622 0.659 0.695 0.729 0.761 0.791 0.819 0.844 0.867 0.887 0.905 0.921 0.934 0.946 0.956 0.965 0.972 0.978 0.983 0.986 0.990 0.992 0.994 0.995 0.997 0.998 0.998 0.9987 0.9991 0.9993 0.9995 0.9997 0.9998 0.9998 0.9999 0.9999 0.99995 0.02 0.508 0.548 0.587 0.626 0.663 0.698 0.732 0.764 0.794 0.821 0.846 0.869 0.889 0.907 0.922 0.936 0.947 0.957 0.966 0.973 0.978 0.983 0.987 0.990 0.992 0.994 0.996 0.997 0.998 0.998 0.9987 0.9991 0.9994 0.9995 0.9997 0.9998 0.9999 0.9999 0.9999 0.99996 0.03 0.512 0.552 0.591 0.629 0.666 0.702 0.736 0.767 0.797 0.824 0.848 0.871 0.891 0.908 0.924 0.937 0.948 0.958 0.966 0.973 0.979 0.983 0.987 0.990 0.992 0.994 0.996 0.997 0.998 0.998 0.9988 0.9991 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.99996 0.04 0.516 0.556 0.595 0.633 0.670 0.705 0.739 0.770 0.800 0.826 0.851 0.873 0.893 0.910 0.925 0.938 0.949 0.959 0.967 0.974 0.979 0.984 0.987 0.990 0.993 0.994 0.996 0.997 0.998 0.998 0.9988 0.9992 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.99996 0.05 0.520 0.560 0.599 0.637 0.674 0.709 0.742 0.773 0.802 0.829 0.853 0.875 0.894 0.911 0.926 0.939 0.951 0.960 0.968 0.974 0.980 0.984 0.988 0.991 0.993 0.995 0.996 0.997 0.998 0.998 0.9989 0.9992 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.99996 0.06 0.524 0.564 0.603 0.641 0.677 0.712 0.745 0.776 0.805 0.831 0.855 0.877 0.896 0.913 0.928 0.941 0.952 0.961 0.969 0.975 0.980 0.985 0.988 0.991 0.993 0.995 0.996 0.997 0.998 0.998 0.9989 0.9992 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.99996 0.07 0.528 0.567 0.606 0.644 0.681 0.716 0.749 0.779 0.808 0.834 0.858 0.879 0.898 0.915 0.929 0.942 0.953 0.962 0.969 0.976 0.981 0.985 0.988 0.991 0.993 0.995 0.996 0.997 0.998 0.9985 0.9989 0.9992 0.9995 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.99996 0.08 0.532 0.571 0.610 0.648 0.684 0.719 0.752 0.782 0.811 0.836 0.860 0.881 0.900 0.916 0.931 0.943 0.954 0.962 0.970 0.976 0.981 0.985 0.989 0.991 0.993 0.995 0.996 0.997 0.998 0.9986 0.9990 0.9993 0.9995 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.99997 0.09 0.536 0.575 0.614 0.652 0.688 0.722 0.755 0.785 0.813 0.839 0.862 0.883 0.901 0.918 0.932 0.944 0.954 0.963 0.971 0.977 0.982 0.986 0.989 0.992 0.994 0.995 0.996 0.997 0.998 0.9986 0.9990 0.9993 0.9995 0.9997 0.9998 0.9998 0.9999 0.9999 0.9999 0.99997 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.1.2: Sampling Distribution of One Proportion Probability Under the Standard Normal Distribution –0.0 –0.1 –0.2 –0.3 –0.4 –0.5 –0.6 –0.7 –0.8 –0.9 –1.0 –1.1 –1.2 –1.3 –1.4 –1.5 –1.6 –1.7 –1.8 –1.9 –2.0 –2.1 –2.2 –2.3 –2.4 –2.5 –2.6 –2.7 –2.8 –2.9 –3.0 –3.1 –3.2 –3.3 –3.4 –3.5 –3.6 –3.7 –3.8 –3.9 0.00 0.500 0.460 0.421 0.382 0.345 0.309 0.274 0.242 0.212 0.184 0.159 0.136 0.115 0.097 0.081 0.067 0.055 0.045 0.036 0.029 0.023 0.018 0.014 0.011 0.008 0.006 0.005 0.003 0.003 0.002 0.0013 0.0010 0.0007 0.0005 0.0003 0.0002 0.0002 0.0001 0.0001 0.00005 0.01 0.496 0.456 0.417 0.378 0.341 0.305 0.271 0.239 0.209 0.181 0.156 0.133 0.113 0.095 0.079 0.066 0.054 0.044 0.035 0.028 0.022 0.017 0.014 0.010 0.008 0.006 0.005 0.003 0.002 0.002 0.0013 0.0009 0.0007 0.0005 0.0003 0.0002 0.0002 0.0001 0.0001 0.00005 0.02 0.492 0.452 0.413 0.374 0.337 0.302 0.268 0.236 0.206 0.179 0.154 0.131 0.111 0.093 0.078 0.064 0.053 0.043 0.034 0.027 0.022 0.017 0.013 0.010 0.008 0.006 0.004 0.003 0.002 0.002 0.0013 0.0009 0.0006 0.0005 0.0003 0.0002 0.0001 0.0001 0.0001 0.00004 0.03 0.488 0.448 0.409 0.371 0.334 0.298 0.264 0.233 0.203 0.176 0.152 0.129 0.109 0.092 0.076 0.063 0.052 0.042 0.034 0.027 0.021 0.017 0.013 0.010 0.008 0.006 0.004 0.003 0.002 0.002 0.0012 0.0009 0.0006 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.00004 0.04 0.484 0.444 0.405 0.367 0.330 0.295 0.261 0.230 0.200 0.174 0.149 0.127 0.107 0.090 0.075 0.062 0.051 0.041 0.033 0.026 0.021 0.016 0.013 0.010 0.007 0.006 0.004 0.003 0.002 0.002 0.0012 0.0008 0.0006 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.00004 0.05 0.480 0.440 0.401 0.363 0.326 0.291 0.258 0.227 0.198 0.171 0.147 0.125 0.106 0.089 0.074 0.061 0.049 0.040 0.032 0.026 0.020 0.016 0.012 0.009 0.007 0.005 0.004 0.003 0.002 0.002 0.0011 0.0008 0.0006 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.00004 0.06 0.476 0.436 0.397 0.359 0.323 0.288 0.255 0.224 0.195 0.169 0.145 0.123 0.104 0.087 0.072 0.059 0.048 0.039 0.031 0.025 0.020 0.015 0.012 0.009 0.007 0.005 0.004 0.003 0.002 0.002 0.0011 0.0008 0.0006 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.00004 0.07 0.472 0.433 0.394 0.356 0.319 0.284 0.251 0.221 0.192 0.166 0.142 0.121 0.102 0.085 0.071 0.058 0.047 0.038 0.031 0.024 0.019 0.015 0.012 0.009 0.007 0.005 0.004 0.003 0.002 0.0015 0.0011 0.0008 0.0005 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.00004 0.08 0.468 0.429 0.390 0.352 0.316 0.281 0.248 0.218 0.189 0.164 0.140 0.119 0.100 0.084 0.069 0.057 0.046 0.038 0.030 0.024 0.019 0.015 0.011 0.009 0.007 0.005 0.004 0.003 0.002 0.0014 0.0010 0.0007 0.0005 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.00003 0.09 0.464 0.425 0.386 0.348 0.312 0.278 0.245 0.215 0.187 0.161 0.138 0.117 0.099 0.082 0.068 0.056 0.046 0.037 0.029 0.023 0.018 0.014 0.011 0.008 0.006 0.005 0.004 0.003 0.002 0.0014 0.0010 0.0007 0.0005 0.0003 0.0002 0.0002 0.0001 0.0001 0.0001 0.00003 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 7 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.2.1: Estimation of One Proportion Estimated number of 50-‐minute class sessions: 1.5 Materials Required You will need one of the following: • • • a 1962 penny for each student in the classroom (these can be bought on eBay for $5–10 for a roll of 50 pennies), or a deck of playing cards, or thumbtacks. Learning Goals Students will understand that • • • • estimation is appropriate when the population proportion is unknown. the sample proportion is the best estimate of the population proportion. the necessary condition for large-‐sample estimation of a proportion is that np and n(1 – p) are both at least 5 (note the change from π to p). the standard error can be estimated by ( ! !" ! "! • ) " this Lesson links to informal estimation in Module 7. For a given set of data, students will be able to • • • • • • • • recognize that they are asked to estimate a population proportion. verify whether large-‐sample procedures are appropriate and if so, determine the appropriate critical value for a given level of confidence. interpret the margin of error in context. compute the confidence interval estimate of a population proportion. interpret the interval in context. interpret the confidence level associated with the confidence interval estimate. describe the relationship between sample size and margin of error. given a description of a statistical study and a confidence interval for a population proportion, evaluate whether conclusions are reasonable. Lesson Overview This initiating lesson asks students how they would estimate a proportion. Specifically, what percent of coins, when banged, will fall on heads? (The instructor should be able to easily modify the lesson to other contexts.) The contextual scaffolding clarifies the distinction between sample statistics and proportion parameters, recognizes that the sample proportion is the best estimate of the population proportion, discusses the need for an interval estimate, determines whether large-‐sample techniques are appropriate, and calculates an interval estimate. Interpretation of the margin of error and the interval is discussed. The Rossman/Chance applet (www.rossmanchance.com/applets/NewConfsim/Confsim.html) is used to reinforce the interpretation of the confidence level introduced in Module 7. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.2.1: Estimation of One Proportion Opening Discussion and Questions In this Rich Task, the true proportion of some dichotomous event is estimated. This example examines the true proportion of heads when banging a 1962 penny. Alternative exercises include the following: • • • Determine the probability of a certain outcome using Pass the Pigs (see Lesson 8.1.1). Determine the true percent of red cards in a deck, after removing some cards. Determine the chance that a thumbtack will land on its head when flipped. Topic-‐Initiating Task The goal of this task is to estimate the true proportion of heads when the students bang a 1962 penny. This coin has been chosen since its construction results in a high probability of landing on tails. To bang a penny, stand it on edge (this may take a while to master), bang the table on which it stands, and record the result of how it lands, either heads or tails. Before you begin, this is a good chance to ask students to guess what the probability of getting heads is. Predicting the answer invests them in discovering whether they are right. Main Question Ask students to develop a method to determine what the probability is that the penny lands on heads. After giving students time to think (alone or in groups), you may wish to offer them three guiding questions that break down this task and align it with the learning outcomes: • • • What is the true proportion of heads for a 1962 penny when banged? Students should recognize that it is unknown. If it were known, this exercise would be futile. How can you best estimate the true proportion of heads for a 1962 penny when banged? Discussions about samples versus populations and sample statistics versus population parameters should occur. The population parameter is unknown and constant. The sample statistic is calculated each time they do an experiment and is variable. This will be demonstrated when students begin to gather their data. What if you want to know how accurate the estimate is? Can you determine a range of values for the true proportion of heads that are believable? How confident are you in this estimate? How informative is it? Some students may approach this with the normal distribution theory, some will use informal inference via simulations, and some may think of other ideas. Generating the Data Have each student bang the coin 10 times and record the number of heads they get on the 10 trials. (The author usually asks each student to do it 10 times and then form groups of 5 students, so they have a total of 50 flips. This seems to be the right number for large-‐sample theory to apply). Then use the data to estimate the actual percent of times the coin lands on heads. You may wish to discuss experimental design here and how to guarantee that everyone is doing the experiment as similar as possible, by controlling for certain factors (e.g., whether the fist strikes the side with heads or tails). The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.2.1: Estimation of One Proportion Applying the Normal Distribution to the Large-‐Sample Case to Calculate a Confidence Interval If, for example, a group of students gets 9 heads out of 50 bangs, then… Total number of trials: n = 50 Total number of heads: x = 9 Sample proportion: p = x/n = 9/50 = 0.18 Population proportion: π = unknown (this is what they are trying to estimate). Note that this is the first condition for estimation. The parameter needs to be unknown. Tell students to apply their estimate to the data. Remind them of the large-‐sample theory from Lesson 8.1.1: If nπ and n(1 – π) are at least 5, the sample proportion should follow a normal distribution with mean π and standard error: ( ! !" ! "! ! ) However, p is unknown here, so how can you verify these conditions? Since p is a reasonable estimate of π, you can substitute p for π in the formulas. Conditions for large-‐sample theory: np and n(1 – p) are both at least 5. ( " ! #$ ! $! If so, ! ! " $ ! " # $ # ) %' . ' & The best estimate of π is p = 0.18. This is also called the point estimate, since it is the single best estimate. However, what about an interval estimate for π? Clearly, it is a different statement to say that π is 0.18 ± 0.01 (17–19%) compared to 0.18 ± 0.15 (3–33%). The former is much more preferred. So, how is the interval chosen? Answer: Use the fact that p follows a normal distribution. Thus, to have a 95% probability of capturing the true value: P(π – 1.96 se < p < π + 1.96 se) = 0.95 = P(p – 1.96 se < π < p + 1.96 se) Thus, the 95% confidence interval is !! ± !"#$% ( ! "! ! !! " ). You can choose another level of confidence. You can generalize this formula to be true for any level of confidence by using: !! ± !" " ( ! #! ! !! # ) The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.2.1: Estimation of One Proportion The value of z* can be obtained from a standard normal table. For example, for the middle 90% of the distribution, look up the 95th percentile and the 5th percentile. By symmetry, they are the same number in absolute value. In this case, the number is 1.645. Some common values for z* are given in the following table: Confidence Level z* 99% 2.576 95% 1.960 90% 1.645 80% 1.282 Thus, a 95% confidence interval for the true proportion of heads when banging a penny is !! ± !" " ( ! #! ! !! # ) !$!%&#'! ± !#&() %&#'*#! ! !%&#'+ !$!%&#'! ± !#&()*%&%,-+!$!%&#'! ± !%&##!$!*%&%./!%&0(+ ,% Or, “We are 95% confident that the true proportion of heads when banging a 1962 penny is between 7% and 29%.” Note: If students generated their results using bootstrap samples from the computer or some other methods, you can show them how much easier it is to perform this calculation compared to the informal inferential techniques. Exploring the Relationship Between Margin of Error, Confidence Level, and Sample Size Why not just say you are completely confident that the interval is 0 to 1? Why not state the interval as precisely as possible (margin of error of 0)? Choosing a Confidence Level Why would you not want to be 100% confident? You would. It is easy to create a 100% confidence interval. You might want to ask students to do this—what are the limits of a 100% confidence interval? Based on the standard normal distribution, the critical Z-‐value for a 100% confidence interval would be infinity. Thus, a 100% confidence interval goes from –infinity to +infinity, or in the case of proportions from 0 to 1. So it is pretty easy to say that the coin will land on heads between 0% and 100% of the time. However, it is also pretty noninformative. So, you need to choose a number less than 100%. What if you wanted to be more precise? Of course you want to be more precise. Can you ever say that the true proportion is exactly 0.18 (from the previous example)? You might want to ask students to calculate this—what percent confidence has a margin of error of 0? Yes, with 0% confidence, since z* = 0 and thus the margin of error is 0. This, again, is a useless statement. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.2.1: Estimation of One Proportion What is the impact of increasing the sample size? Combine data from the entire class on the number of heads and total number of bangs. Calculate the 95% confidence interval. Show how the margin of error for this interval is much smaller than the previous one. Why? The standard error is smaller, by a factor of sqrt(n). Increasing the sample size takes resources (time and money), but is a great way to reduce the margin of error. Understanding the Correct Interpretation of a Confidence Interval What is the correct interpretation of a confidence interval? Note that it is not called a probability interval. It is not correct to say that • • “There is a 95% chance that the true population proportion is inside the interval.” Since the true proportion is a constant, it is either 100% or 0%, but which one is unknown. “95% of the sample data are within the interval.” So, what is the correct interpretation? The correct interpretation is, “If I repeated this process a large number of times, then 95% of the confidence intervals using this method would contain the true population proportion.” Note: Again, you do not know whether the interval contains the true proportion. To examine this, let’s look at the Rossman/Chance applet: www.rossmanchance.com/applets/NewConfsim/Confsim.html. Using the defaults (or other values you choose for Method = Proportion), click on Sample. Notice a few things: • • • First, look at the last sample graph, in the lower right-‐hand corner. This shows the result of the sample drawn—the number of heads (successes) and tails (failures), and the estimated proportion (they use p̂ instead of p). Next, look at the sample statistics in the upper right-‐hand corner. This is the sampling distribution of the sample proportion. One value is plotted there. Then, note the plot in the middle. The true value of π is shown as a solid vertical line. Your interval is shown with a dot for the point estimate and a green bar for the confidence interval if it covers π or a red bar if it does not cover π. Next, click on Reset and change Intervals to 100. This creates 100 intervals. The bottom right graph is only the last sample, the upper right graph is the sampling distribution of the sample mean, and the middle display shows all 100 intervals. You should get about five of them red and the rest green. You can click on Sample repeatedly to show that it is a random process, generating a different set of 100 intervals each time you click. On average, there will be 5/100 intervals that are red and do not contain π. Students should see that 95% of all the intervals contain the true value, yet when they create just one interval, whether the interval contains the true value is unknown. The confidence level corresponds to the process of creating the intervals. 95% of the confidence intervals created using this method contain the true value. This applet helps students see that. If you are in a computer lab (or for homework), ask students to play with this applet. Take note of what happens if you change the confidence level to 90% (the intervals get narrower and 90% of them are green, containing the true proportion), change the true population proportion (π) (the location of the intervals will move, and the width gets slightly smaller), or change the sample size (the width decreases, but 95% of them in the long run are green). The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.2.1: Estimation of One Proportion Wrap-‐Up When the large-‐sample conditions are met, the confidence interval estimate for π is !! ± !" " ( ! #! ! !! # ) The correct interpretation for this interval relies on the fact that the process is a reliable one. You typically never know whether the interval contains the true value. Common Values for z* Confidence Level z* 99% 2.576 95% 1.960 90% 1.645 80% 1.282 Homework for Lesson 8.2.1 and/or 8.2.2 [Student Handout] (1) Which one of the following is the correct interpretation of the 95% confidence interval for one proportion? (a) There is a 95% chance that the true population proportion will be within the confidence limits. (b) 95% of the data are within the confidence limits. (c) If this process is repeated an infinite number of times, 95% of the sample proportions will be within the confidence limits. (d) None of the above. (2) If you want to reduce the width of your confidence interval, (circle all that apply) (a) increase your confidence level. (b) decrease your confidence level. (c) increase your sample size. (d) decrease your sample size. (e) take a new sample. (f) none of the above. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.2.1: Estimation of One Proportion (3) For a class project, you want to estimate the percentage of students who drink coffee at several liberal arts colleges. Two colleges in this group are Mandeville College and Gresham College, with student populations of 4,000 and 10,000 students, respectively. For each college, you plan to take a random sample of 2% of the student population. The margin of error for a 99% confidence interval will be (a) smaller for Gresham than for Mandeville. (b) smaller for Mandeville than for Gresham. (c) the same for each school. (d) there is insufficient information. (4) Suppose you wish to estimate the percentage of people who speed while driving in a town with a major university. You pick Austin, Texas (The University of Texas) and Lincoln, Nebraska (University of Nebraska). Both cities have populations of over 100,000, but Austin’s population is three times greater than that of Lincoln. You also expect the rates of speeding to be about the same in each city. Suppose you take a random sample of 1,000 drivers from each city. The margin of error for a 95% confidence interval will be (a) smaller for the Austin sample than the Lincoln sample. (b) smaller for the Lincoln sample than the Austin sample. (c) the same for both samples. (d) It is not possible to determine without more detailed information about the population sizes. (5) Suppose in a random sample of 50 students at your university, 32 believe that the men’s basketball team will make it to the NCAA tournament this year. Create a 90% confidence interval for this percent? Interpret the interval. (6) Is your backpack too heavy? Experts have stated that people should not carry a backpack that weighs more than 10% of the person’s weight. In a sample of 30 students from your school, it was found that 12 have backpacks weighing more than 10% of their weight. Create an 80% confidence interval for the proportion of all students at your school who have backpacks more than 10% of their weight. Interpret this interval. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 7 Statway Student Handout April 24, 2012 (Full Version 1.0) Initiating Lesson 8.2.1: Estimation of One Proportion Homework (1) Which one of the following is the correct interpretation of the 95% confidence interval for one proportion? (a) There is a 95% chance that the true population proportion will be within the confidence limits. (b) 95% of the data are within the confidence limits. (c) If this process is repeated an infinite number of times, 95% of the sample proportions will be within the confidence limits. (d) None of the above. (2) If you want to reduce the width of your confidence interval, (circle all that apply) (a) increase your confidence level. (b) decrease your confidence level. (c) increase your sample size. (d) decrease your sample size. (e) take a new sample. (f) none of the above. (3) For a class project, you want to estimate the percentage of students who drink coffee at several liberal arts colleges. Two colleges in this group are Mandeville College and Gresham College, with student populations of 4,000 and 10,000 students, respectively. For each college, you plan to take a random sample of 2% of the student population. The margin of error for a 99% confidence interval will be (a) smaller for Gresham than for Mandeville. (b) smaller for Mandeville than for Gresham. (c) the same for each school. (d) there is insufficient information. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 24, 2012 (Full Version 1.0) Initiating Lesson 8.2.1: Estimation of One Proportion (4) Suppose you wish to estimate the percentage of people who speed while driving in a town with a major university. You pick Austin, Texas (The University of Texas) and Lincoln, Nebraska (University of Nebraska). Both cities have populations of over 100,000, but Austin’s population is three times greater than that of Lincoln. You also expect the rates of speeding to be about the same in each city. Suppose you take a random sample of 1,000 drivers from each city. The margin of error for a 95% confidence interval will be (a) smaller for the Austin sample than the Lincoln sample. (b) smaller for the Lincoln sample than the Austin sample. (c) the same for both samples. (d) It is not possible to determine without more detailed information about the population sizes. (5) Suppose in a random sample of 50 students at your university, 32 believe that the men’s basketball team will make it to the NCAA tournament this year. Create a 90% confidence interval for this percent? Interpret the interval. (6) Is your backpack too heavy? Experts have stated that people should not carry a backpack that weighs more than 10% of the person’s weight. In a sample of 30 students from your school, it was found that 12 have backpacks weighing more than 10% of their weight. Create an 80% confidence interval for the proportion of all students at your school who have backpacks more than 10% of their weight. Interpret this interval. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.2.2: Estimation of One Proportion Estimated number of 50-‐minute class sessions: 1–1.5 Materials Required The Water Challenge (per student): • • • • • Two small cups At least 1 ounce or more of tap water At least 1 ounce or more of bottled water A marker or pen for the long version of the blind experiment A coin for randomized order modification, or other randomizing instrument (e.g., TI calculator, computer). The author usually brings one to two large bottles of tap water and one to two large bottles of bottled water. Learning Goals Students will understand that • • • • estimation is appropriate when the population proportion is unknown. the sample proportion is the best estimate of the population proportion. the necessary condition for large-‐sample estimation of a proportion is np and n(1 – p) are both at least 5 (note the change from π to p). the standard error can be estimated as ( ! !" ! "! " • ) this lesson links to informal estimation in Module 7. For a given set of data, students will be able to • • • • • • • • recognize that they are asked to estimate a population proportion. verify whether large-‐sample procedures are appropriate, and if so, determine the appropriate critical value for a given level of confidence. interpret the margin of error in context. compute the confidence interval estimate of a population proportion. interpret the interval in context. interpret the confidence level associated with the confidence interval estimate. describe the relationship between sample size and margin of error. given the description of a statistical study and a confidence interval for a population proportion, evaluate whether conclusions are reasonable. Lesson Overview Two examples of confidence interval calculations are presented. You may substitute another context that appeals to you or students. In the first example, students determine their preference between bottled water and tap water. Specifically, what percent of college students prefer bottled water over tap water? In the second example, an estimate of what percent of students support a political candidate is The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.2.2: Estimation of One Proportion calculated. Other examples could include what percent of time “Pass The Pigs” pigs land on their snout, what percent of passengers do not show up for their flights (a good discussion of independence ensues with this one), what percent of all first-‐year students consider transferring, what percent of lots of a particular manufactured product are defective, or what percent of students have been tested for HIV. Discussions around producing these data can link nicely to previous modules about experimental design or sampling. The calculations can be implemented in a number of different modalities—lecture, students working on it on their own, think-‐pair-‐shares, group work, and so on. The Bottled Water Challenge This experiment is based on the Pepsi Challenge,1 a marketing campaign that began in 1975 claiming that people preferred Pepsi Cola over Coca Cola in a national taste test. Discussing the history of this with students is a nice introduction. A number of YouTube videos can be found on this, including www.youtube.com/watch?v=v7lw_vhxtNc&feature=related. Showing the video and having a discussion about the experiment is a nice introduction for students as well. Here are some guiding questions: • • • Why were the soda labels masked from view? (Answer: If a person has a preconceived notion of which soda he or she prefers, it affects the choice, which biases the results.) Does the order in which the taster tastes them matter? (Answer: It might, so in a well-‐designed experiment this is randomized.) One critique of the Pepsi Challenge is that a sip is not the same as whole-‐cup or prolonged drinking and preference may be different due to the sweetness. Due to some students’ aversion to soda (for health reasons) and a known concern about high fructose corn syrup, this has been modified for the classroom. Instead of comparing Coke and Pepsi, compare whether students prefer bottled water or tap water. The goal is to determine what percent of students prefer bottled water to tap water. Optional Prior to conducting the experiment, ask students a few questions that can be used for other investigations within the Statway course: • • • How many bottles (or other units) of water do they typically drink per day? Do they prefer tap water or bottled water? When they drink water, do they normally drink tap or bottled? There are a few variations of this experiment, depending on how much time you want to take, the materials you have to offer students, and the additional topics (related to experimental design) that you want to cover. These variations are covered in the following sections: 1 http://en.wikipedia.org/wiki/Pepsi_Challenge The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.2.2: Estimation of One Proportion Simplest (Biased) Experiment In this experiment, students know which one they taste, and this will likely bias the experiment. So it is not recommended. Each student does the following: • • • • • Take two cups and label one with T for tap water and one with B for bottled water. Pour about 1 ounce of tap water into Cup T and about 1 ounce of bottled water into Cup B. Taste the water in each cup. Decide which one he or she prefers and record this result as either T for tap water or B for bottled water. Report this result to the instructor, when asked. Blind Experiment, Long Version In this experiment, students do not know which water they are tasting, so it is unbiased by students’ prior belief about which water tastes better. Students work in pairs. While this version takes more time than the shorter blind experiment below, it is a nice way for students to engage in more of the process of experimentation as it is really done. Each student does the following: • • • • • • • Take two cups and label one with 1 and the other with 2. For his or her partner, secretly pour one type of water into Cup 1 and another into Cup 2 (students record which one is which on their paper, but take care to not reveal it). Taste the water in each cup. Decide which one he or she prefers. Reveal which water he or she prefers to the partner. Record this result as either T for tap water or B for bottled water. Report this result to the instructor, when asked. Blind Experiment, Short Version Similar to the previous version, bring in two to four bottles of tap water, one to two bottles of bottled water, and one to two emptied bottles of bottled water filled with tap water, so that students cannot tell which contains bottled water or tap water. Label one of them (or one set of them if using two bottles of each) 1 and the other one 2. Make sure to note which one is which. Each student does the following: • • • • • • • Take two cups and label one with 1 and the other with 2. Pour Water 1 into Cup 1 and Water 2 into Cup 2. Taste the water in each cup. Decide which one he or she prefers. Record this result as either 1 or 2. Report this result to the instructor, when asked. The instructor then reveals which one is which. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.2.2: Estimation of One Proportion Randomized Order Modification This can be used for any of the previously described experiments. Each student flips a coin. Those who get heads taste Cup 1 (or Cup T if unblinded) first and those who get tails start with Cup 2 (or Cup B if unblinded). Thus, the order of drinking is randomized. If there is a preference for the first water tasted, this washes out (pun intended!) the effect of which water is tasted first. If you use a computer, you can modify the method of randomization so that half the class tastes one water first, rather than leave it to random chance. However, experience shows that this really does not make much difference. Again, it is just to learn how statistics are done in the real world. Once the experiment is done, gather data from students on which water they preferred—tap water or bottled water. For example, you might find that of 30 students, 13 prefer tap water. Then, calculation can proceed. Calculation • • • • π is unknown. π is the true proportion of college students taking the Statway course who prefer bottled water. Observations are independent of each other. Each student’s preference was not influenced by other students, unless someone did not do the experiment and just copied his or her answer off a neighbor or people collaborated on which one they preferred, for example. Notation: n = 30, x = 13, so p = x/n = 13/30 = 0.43 Thus, the best guess (point estimate) for π is 0.43. However, it is better to calculate an interval estimate to know how precise the estimate is. Let’s assume a 90% confidence interval. Recall that if np and n(1 – p) are both at least 5, and the observations are independent of each other, then the confidence interval for π is !! ± " " • ( ! #! ! !! # ) np and n(1 – p) are at least 5 np = 30(0.43) = 13 and n(1 – p) = 30(1 – 0.43) = 30(0.57) = 17, which are both at least 5 !± "! ( ! "# ! #! ! ) # = #$%&'±"%(&) ( ) # = #$%&'±"%(&)*$%$+,# = #$%&'± $%") $%&' "# ! #$%&' '$ The margin of error is 15%. The confidence interval is from 0.28 to 0.58, often written as (0.28, 0.58). Interpretation: You are 90% confident that the true proportion of students who prefer bottled water to tap water is somewhere between 28% and 58%. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.2.2: Estimation of One Proportion Political Polling Here, students try to estimate the proportion of students who plan on voting for Barack Obama in the 2012 presidential election. You can easily change this context based on your student population to another political election, a student president election, or a current-‐day political issue (e.g., immigration, abortion, invasion of other countries). Discuss how this polling will be done with students. A few issues are relevant: • There are only two possible answers—Obama or not Obama. You could exclude students who are not eligible to vote in the next election, but it is better to ask for preferences, so that everyone can participate. Are observations independent? If students choose classes together and also agree on politics, then perhaps not. However, ignore that for this calculation and assume that it is a small percent of students here. To which population are they generalizing? All students at their school? How is this class different than those students? The favorability or unfavorability is not so extreme: at least five students are Obama supporters and five are not (so large-‐scale techniques can be employed). Will students answer honestly? Will there be any nonresponses? How will they collect the results? If you ask students to raise hands (at least with eyes open), they might modify their answers based on what their peers say. The author likes collecting these data using personal response systems (such as clickers), but it can be done in a number of different ways. • • • • • Calculation • • • π is unknown. π is the true proportion of people (college students? community college students? community college students who sign up for a Statway course?) who plan on voting for Obama in 2012. Observations are independent of each other. Each student’s preference was not influenced by other students, except as noted above. Assuming that 31 of 45 students will vote for Obama: Notation: n = 45, x = 31, so p = x/n = 31/45 = 0.69 Thus, the best guess (point estimate) for π is 0.69. However, it is better to calculate an interval estimate, so you know how precise their estimate is. Let’s assume that a 95% confidence interval is wanted. Recall that if np and n(1 – p) are both at least 5, and the observations are independent of each other, then the confidence interval for π is !± "! • np and n(1 – p) are at least 5 ( ! "# ! #! # ) np = 45(0.69) = 31 and n(1 – p) = 45(1 – 0.69) = 45(0.31) = 14, which are both at least 5 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.2.2: Estimation of One Proportion • !± "! ( ! "# ! #! # ) # = #$%&'±"%'& ( $%&' "# ! #$%&' () ) # = #$%&'±"%'&*$%$&'+# = #$%&'± $%"( The margin of error is 14%. The confidence interval is (0.55, 0.83). Interpretation: You are 95% confident that the true proportion of people (students?) who plan on voting for Obama in 2012 is somewhere between 55% and 83%. Question for Thought: Name two ways to reduce the margin of error of this interval. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.2.2: Estimation of One Proportion Homework (1) Which one of the following is the correct interpretation of the 95% confidence interval for one proportion? (a) There is a 95% chance that the true population proportion will be within the confidence limits. (b) 95% of the data are within the confidence limits. (c) If this process is repeated an infinite number of times, 95% of the sample proportions will be within the confidence limits. (d) None of the above. (2) If you want to reduce the width of your confidence interval, (circle all that apply) (a) increase your confidence level. (b) decrease your confidence level. (c) increase your sample size. (d) decrease your sample size. (e) take a new sample. (f) none of the above. (3) For a class project, you want to estimate the percentage of students who drink coffee at several liberal arts colleges. Two colleges in this group are Mandeville College and Gresham College, with student populations of 4,000 and 10,000 students, respectively. For each college, you plan to take a random sample of 2% of the student population. The margin of error for a 99% confidence interval will be (a) smaller for Gresham than for Mandeville. (b) smaller for Mandeville than for Gresham. (c) the same for each school. (d) there is insufficient information. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.2.2: Estimation of One Proportion (4) Suppose you wish to estimate the percentage of people who speed while driving in a town with a major university. You pick Austin, Texas (The University of Texas) and Lincoln, Nebraska (University of Nebraska). Both cities have populations of over 100,000, but Austin’s population is three times greater than that of Lincoln. You also expect the rates of speeding to be about the same in each city. Suppose you take a random sample of 1,000 drivers from each city. The margin of error for a 95% confidence interval will be (a) smaller for the Austin sample than the Lincoln sample. (b) smaller for the Lincoln sample than the Austin sample. (c) the same for both samples. (d) It is not possible to determine without more detailed information about the population sizes. (5) Suppose in a random sample of 50 students at your university, 32 believe that the men’s basketball team will make it to the NCAA tournament this year. Create a 90% confidence interval for this percent? Interpret the interval. (6) Is your backpack too heavy? Experts have stated that people should not carry a backpack that weighs more than 10% of the person’s weight. In a sample of 30 students from your school, it was found that 12 have backpacks weighing more than 10% of their weight. Create an 80% confidence interval for the proportion of all students at your school who have backpacks more than 10% of their weight. Interpret this interval. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion Estimated number of 50-‐minute class sessions: 1.5 Materials Required Several large bags of plain M&Ms and small cups or napkins for students to take at least 40 M&Ms each. Optional A two-‐headed coin (easily found and purchased on the Internet) or a stacked deck of cards (i.e., the cards are all red [hearts and diamonds] or all black [clubs and spades]) Learning Goals Students will understand • • • • that hypothesis testing is appropriate when a decision is desired regarding an unknown population proportion. that the necessary conditions for large-‐sample hypothesis testing for a proportion are independent observations, and nπO and n(1 – πO) are both at least 5. the procedures of carrying out a hypothesis test. how this lesson links to informal estimation in Module 7. For a given set of data, students will be able to • • • • • • • • recognize that they are asked to make a decision about an unknown population proportion. given a claim about a population proportion, choose appropriate null and alternative hypotheses (for one-‐sided alternatives). describe Type I and Type II errors in context and select an appropriate significance level based on consideration of the consequences of Type I and Type II errors in a given situation. state and check the conditions needed to use the large-‐sample test for a population proportion and, if the conditions are met, compute the Z-‐value and find the associated P-‐value. use the P-‐value and chosen significance level to reach a decision. explain what the phrase statistically significant means. explain the difference between statistical significance and practical significance. given the description of a statistical study and the results of a test of hypotheses about a population proportion, evaluate whether conclusions are reasonable. Lesson Overview This initiating lesson asks students to make a decision about a population proportion. Specifically, is the proportion of brown M&Ms equal to 1/6? You should be able to easily modify the lesson to other contexts (e.g., Is the proportion of sixes on a six-‐sided die equal to 1/6? Is the proportion of heads resulting from a coin flip equal to 1/2?) The contextual scaffolding reviews the distinction between sample statistics and proportion parameters, sets up appropriate null and alternative hypotheses, recognizes the distribution of the sample proportion under the null hypothesis, explains Type I and Type II errors and chooses an appropriate level of significance, computes the Z-‐value and P-‐value, and makes a decision about the population proportion relative to the alternative hypothesis. Motivation for significance level is done using a The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion stacked deck, Out of This World magic trick (Posner), or a two-‐headed coin. A discussion of statistical significance ensues as well. Opening Discussion and Questions [Student Handout] M&Ms are popular candies. They come in different colors and different kinds (e.g., milk chocolate, peanut, dark chocolate). In milk chocolate M&Ms, the colors are blue, brown, green, orange, red, and yellow. The question is, “Are they evenly distributed?” (Note: Asking students whether they think the M&Ms are evenly distributed via personal response systems or clickers may invest them more in the answer.) Rich Task [Student Handout] General Question: How could someone determine whether M&Ms are evenly distributed? (Note: Let students ponder this for a while—the answer is really via a chi-‐squared goodness of fit, but that is not covered until later.) Research Question: How could someone determine whether brown M&Ms are underrepresented (less than 1/6 of all M&Ms are brown)? (Note: Students should recognize this as inference for one proportion. Note that this question may give students some pause—why choose underrepresented and not just different than 1/6? The answer is that this lesson first presents a one-‐sided test that is much more intuitive from a statistical perspective. Lesson 8.3.2 introduces a two-‐sided test, which may be more logically intuitive for students. Once both have been covered, the students will make the decision about which alternative hypothesis is appropriate.) Guiding Question: If there really are 1/6 brown M&Ms, is there a small or large chance that you could observe the proportion of brown M&Ms as extreme or more extreme than you actually observed in your sample? Reviewing the Distinction Between Sample Statistics and Proportion Parameters [Student Handout] You are trying to estimate the unknown population parameter (π), which is the true proportion of brown M&Ms. You will make a decision about this parameter based on the sample proportion (p), which is the proportion of brown M&Ms that you count in your batch. Setting Up Appropriate Null and Alternative Hypotheses [Student Handout] Begin by assuming that 1/6 of all M&Ms really are brown. The research question asks whether brown M&Ms are underrepresented. Statistically, write this as the null hypothesis (HO) and alternative hypothesis (Ha): HO: π = 1/6 Ha: π < 1/6 where π is the population proportion of brown M&Ms. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion (Note: In setting up the alternative hypothesis, it may be helpful to tell students that whatever is to the left and right of the equal sign in the null hypothesis should be the same in the alternative hypothesis, and that the equal sign should be replaced by either <, >, or ≠. Thus, the options for the alternative hypothesis are the following: Ha: π < 1/6 Ha: π > 1/6 Ha: π ≠ 1/6 Since they are testing whether brown M&Ms are underrepresented, this is true if the population proportion of brown M&Ms were less than 1/6, which is what the alternative hypothesis says.) Type I and Type II Errors and Choosing an Appropriate Level of Significance [Student Handout] Ultimately, you will decide whether there is evidence for the alternative hypothesis. Your decision will lead you to either reject or not reject the null hypothesis. With either decision, you might make an error. For example, assume that 1/6 of all M&Ms are brown. If you randomly choose 35 M&Ms, there is a 1-‐in-‐591 chance [1 – (1 – 1/6)35 = 0.00169 and 1/0.00169 = 591] that you will get 0 brown M&Ms. Thus, there is a chance (however small) that you would think that brown M&Ms are underrepresented even though they are not. Conversely, if you choose not to reject the null hypothesis, but the true proportion was really less than 1/6, you also made an error. Statistics distinguishes between the above errors as Type I and Type II errors. They are summarized in the following table. You only have the ability to control the decision that you make. Note that if you never want to make a type I error, you can easily choose to never reject HO. However, that puts you at a higher risk of a Type II error. True State of Nature Types of Errors Decision H0 True H0 False Reject H0 Type I error Good decision Do Not Reject H0 Good decision Type II error P(Type I error) = P(reject HO|HO true) = α The above equation reads as, “The probability of making a Type I error is equal to the probability of rejecting the null hypothesis, given that the null hypothesis is true, which is represented by the value of alpha.” The | symbol means “given that.” P(Type II error) = P(do not reject HO|HO false) = β The above equation reads as, “The probability of making a Type II error is equal to the probability of not rejecting the null hypothesis, given that the null hypothesis is false, which is represented by the value of beta.” The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion (Note: Students should recognize the type of error they might commit based on the decision that they make. In addition, they should know that one way of reducing the chance of Type I and Type II errors is by increasing the sample size. By convention, this lesson chooses an α or “significance level” of 0.05 and [in the Statway course] ignores β.) Optional If you want to motivate the reason for choosing a significance level (α = 0.05), you can do so using a two-‐headed coin. (Or, a stacked deck of cards—all red or all black—can be used instead.) Ask students if they want to play a game. A volunteer is selected, and you flip the coin. If you get tails, the volunteer gets a free point on the next homework assignment. If you get heads, he or she gets nothing. Typically, after about four to five students get tails, some students start grumbling and asking to see the coin. The probability of getting four heads in a row on a fair coin is 6% [(1/2)4] and 5 heads in a row is 3% [(1/2)5]. As behavioral psychologists have shown, the 5% cutoff for a rare event is natural for most people. This is a nice demonstration, since you can then refer back to the simple calculations for four or five heads as a hypothesis test, which also assumes it is a fair coin (HO: π = 0.5). This demonstration might be better played as a fun game at the beginning of this class, since it feels like a non sequitur to students here, or when discussing informal reasoning. Posner’s Out of This World magic trick1 can also be used to extend this discussion to correctly guessing the color of 20–25 cards from a deck. Recognizing the Distribution of the Sample Proportion Under the Null Hypothesis [Student Handout] Recall that the guiding question asks whether there is a small or large chance of getting a result as extreme or more extreme than you observed. To calculate the chance of this, you need to know what the distribution of the sample proportion is. Based on Topic 8.1, for a large enough sample size of 30 M&Ms, " " "% ! % " $ # # ! ! "$ " $ # &' $$ # ( % ) '' '' & since nπ = 30(1/6) = 5, n(1 – π) = 30(5/6) = 24, and the observations are assumed to be independent of each other. For more than 30 M&Ms, the conditions are also met, but the mean and standard error are different. What does it mean for M&Ms to be independent of each other? It means the color of M&M is not related to the color of the next one selected—no matter which color you choose, the chance that 1 www07.homepage.villanova.edu/michael.posner/files/Magic%20of%20Statistics.pdf The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion the next one is red, for example, is equal to the population proportion of red M&Ms. This task assumes this is true. In reality, however, this may not be the case (although that could be something you are testing). Apparently, M&Ms are made separately by color and then mixed together in a large vat. If they are not sufficiently mixed, then all red M&Ms, for example, are close together in the vat. When bags are extracted, you are likely to get either a bag with extra red M&Ms if it was drawn from the red section or fewer red M&Ms if it was drawn from the other sections. Now, consider what you observe in your bag. If you select a red M&M from your bag, the chance that the next M&M is red is higher than the true proportion, whereas if you select a nonred M&M from your bag, the chance that the next M&M is red is lower than the true proportion. In this case, the colors of the M&Ms are not independent. Gathering the Data [Student Handout] (Note: Hand out packs of M&Ms to students or cups to scoop some M&Ms from a larger bag [remind them not to look in the bag before choosing]. It is helpful to make sure they have at least 40 M&Ms, mostly so the large-‐sample conditions are met.) Count out the number of brown M&Ms (x) and the total number of M&Ms (n) in your cup. Once counted, you can eat them. Calculate the chance of getting a result as extreme or more extreme than what you observed if the null hypothesis is true. (Note: While they are working on this, you can gather the data from the entire class.) Once you have counted the number of M&Ms, draw the normal distribution for the data, under the null hypothesis. Computing the Test Statistic (Z-‐Value) and P-‐Value [Student Handout] If, for example, there are 207 brown M&Ms out of 1,425 M&Ms (combining data from the entire class), then n = 1,425 and x = 207, so p = 207/1,425 = 0.145 or (14.5%). If the true proportion was 1/6 (or 16.7%), what is the chance that you could get 14.5% or more extreme (lower, in this case) in a sample of 1,425 M&Ms? "! ! !! Recall from Lesson 8.1.1 that !! = ! Thus, !! = ! "! ! !! ( ! "! ! !! # ) !=! ( ! "! ! !! # ) ) !=! #$"%&! ! !#$"'( ( #$"'( "! ! !#$"'( ")%*& !#$#** ! = ! ! *$,( #$##+, From the standard normal distribution (Z) table, the probability of being less than this value is 0.009 (or just less than 1%). The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion The following plot, generated with Minitab, gives a more precise value using more significant digits in the calculation. The shaded region is what you are looking for and has a probability of 0.0098 (or again just less than 1%). This number is called the P-‐value. Distribution Plot Normal, Mean=0.1667, StDev=0.0093 40 Density 30 20 10 0.009815 0 0.145 0.1667 X Relating the P-‐value to the guiding question: The P-‐value is the chance that you could observe the proportion of brown M&Ms as extreme or more extreme than you actually observed in your sample, if the true proportion of brown M&Ms was 1/6. General Form: The P-‐value is the chance of observing a result as extreme or more extreme than the observed value if the null hypothesis is true. (Note that it is not the probability that the null hypothesis is true.) Making a Decision About the Population Proportion Relative to the Alternative Hypothesis [Student Handout] If the P-‐value is less than α, it is said that there is a small chance that you could observe a result as extreme or more extreme than the observed value from the sample data. So, what did you do wrong? You assumed that the true proportion was really 1/6. Thus, you reject the null hypothesis. When you reject the null hypothesis, there is a statistically significant difference between the sample proportion and the population proportion under the null hypothesis. Formally, you would say, “At the α level of significance, there is enough evidence to support the alternative hypothesis.” If the P-‐value is greater than α, it is said that there is a large chance that you could observe a result as extreme or more extreme than the observed value from the sample data. So, it is believable that the data came from the distribution under the null hypothesis. Thus, you fail to reject the null hypothesis. Formally, you would say, “At the α level of significance, there is not enough evidence to support the alternative hypothesis.” The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion Note that the burden of proof is in the alternative. You could fail to reject since there is no difference or only a small difference between the true parameter and hypothesized parameter, or perhaps you just do not have a large enough sample to show a real difference that exists is beyond what would have happened by chance alone. Now, back to the M&M data. The P-‐value is 0.0098, which is less than the significance level of 0.05, so you reject the null hypothesis. The formal conclusion states, “At the 5% level of significance there is enough evidence to show that the true proportion of brown M&Ms is less than 1/6.” Wrap-‐Up There are six steps in formal hypothesis testing. 1. Parameter: Identify the unknown parameter (or parameters). 2. Hypotheses: State the null and alternative hypotheses. It should take on the form HO: π = πO versus Ha: π > πO or Ha: π < πO or Ha: π ≠ πO Remind students that πO is just a placeholder for the problem-‐specific value that must be put there. In the context of a problem, it should not be written. They should also state the level of significance at which they will test (typically 0.05). 3. Assumptions: If students use large-‐sample inference on a proportion (which is all that has been covered in formal inference so far), they need to verify that # !" $% " %!" % ! ! " % !" # # % $ ( ) &( ( ( ' This is done by checking that the observations are independent of each other and that both nπO and n(1 – πO) are at least 5. (Note: There are different opinions of when these assumptions should be stated. In general, the author likes it before the hypotheses, but in this case, that cannot be done, since they need to know πO.) 4. Sample Statistic: Calculate the sample statistic based on the sample. So far, the only statistic that has been discussed is the standard normal distribution or the Z-‐statistic: !! = ! "! ! !" " ( " " #! ! !" " ) # 5. Decision: Make a decision about the hypotheses. If P-‐value < α, then reject the null hypothesis. If P-‐value > α, then do not reject the null hypothesis. 6. Conclusion: State a conclusion in the context of the problem: If you reject H0, then “At the α level of significance, there is enough evidence to believe that the alternative is true.” The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 7 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion If you do not reject H0, then “At the α level of significance, there is not enough evidence to believe that the alternative is true.” (Again, this does not mean the null hypothesis is true.) The steps for the hypothesis are essentially the same as what students did in informal inference in Module 7; however, the calculations should go much faster, since no simulation is required. In addition, generally accepted statistics practice is to use formal inferential techniques. Homework [Student Handout] (1) Suppose you flip a coin 50 times and get 35 heads. Do you think the coin is fair? Conduct a hypothesis test at the 5% level of significance to answer this. (2) You are hired as a quality control person for a local organic farm. The owner wants to make sure that the produce the farm is distributing is market-‐ready. He wants no more than 5% of the produce to be unacceptable for sale (i.e., rotten, broken). After an extensive discussion on what qualifies as unacceptable, you examine a random sample of 200 pieces of fruit from one delivery truck. If 12 pieces of fruit from the sample are unacceptable, is there statistically significant evidence that more than 5% of the fruits in the entire batch are unacceptable? Conduct the appropriate hypothesis test at the 5% level of significance. (3) Refer to the political polling problem in Lesson 8.2.2. According to CIRCLE,2 68% of young people who voted in 2008 voted for Barack Obama. Based on your data (and naively assuming that your class represents all young people), has the percent of young people who favor Obama declined from 2008 to 2012? To answer this, test whether the true percentage of young people who voted for Obama is still 68% versus the alternative that it is now lower than 68%. Conduct this test at the 5% level of significance. (4) Refer to the bottled water experiment in Lesson 8.2.2. If the majority of people (>50%) prefer bottled water, there is an argument for paying for bottled water for taste reasons. (There may be other reasons to drink it [e.g., purity of water] and reasons to not drink it [e.g., building an immune system and potentially harmful chemicals in plastic].) Test the hypothesis at the 5% level of significance that more than 50% of people prefer bottled water. 2 The Center for Information and Research on Civic Learning and Engagement (www.civicyouth.org) The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 8 Statway Student Handout April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion M&Ms are popular candies. They come in different colors and different kinds (e.g., milk chocolate, peanut, dark chocolate). In milk chocolate M&Ms, the colors are blue, brown, green, orange, red, and yellow. The question is, “Are they evenly distributed?” General Question: How could someone determine whether M&Ms are evenly distributed? Research Question: How could someone determine whether brown M&Ms are underrepresented (less than 1/6 of all M&Ms are brown)? Guiding Question: If there really are 1/6 brown M&Ms, is there a small or large chance that you could observe the proportion of brown M&Ms as extreme or more extreme than you actually observed in your sample? Reviewing the Distinction Between Sample Statistics and Proportion Parameters You are trying to estimate the unknown population parameter (π), which is the true proportion of brown M&Ms. You will make a decision about this parameter based on the sample proportion (p), which is the proportion of brown M&Ms that you count in your batch. Setting Up Appropriate Null and Alternative Hypotheses Begin by assuming that 1/6 of all M&Ms really are brown. The research question asks whether brown M&Ms are underrepresented. Statistically, write this as the null hypothesis (HO) and alternative hypothesis (Ha): HO: π = 1/6 Ha: π < 1/6 where π is the population proportion of brown M&Ms. Type I and Type II Errors and Choosing an Appropriate Level of Significance Ultimately, you will decide whether there is evidence for the alternative hypothesis. Your decision will lead you to either reject or not reject the null hypothesis. With either decision, you might make an error. For example, assume that 1/6 of all M&Ms are brown. If you randomly choose 35 M&Ms, there is a 1-‐in-‐591 chance [1 – (1 – 1/6)35 = 0.00169 and 1/0.00169 = 591] that you will get 0 brown M&Ms. Thus, there is a chance (however small) that you would think that brown M&Ms are underrepresented even though they are not. Conversely, if you choose not to reject the null hypothesis, but the true proportion was really less than 1/6, you also made an error. Statistics distinguishes between the above errors as Type I and Type II errors. They are summarized in the following table. You only have the ability to control the decision that you make. Note that if you never want to make a type I error, you can easily choose to never reject HO. However, that puts you at a higher risk of a Type II error. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion True State of Nature Types of Errors Decision H0 True H0 False Reject H0 Type I error Good decision Do Not Reject H0 Good decision Type II error P(Type I error) = P(reject HO|HO true) = α The above equation reads as, “The probability of making a Type I error is equal to the probability of rejecting the null hypothesis, given that the null hypothesis is true, which is represented by the value of alpha.” The | symbol means “given that.” P(Type II error) = P(do not reject HO|HO false) = β The above equation reads as, “The probability of making a Type II error is equal to the probability of not rejecting the null hypothesis, given that the null hypothesis is false, which is represented by the value of beta.” Recognizing the Distribution of the Sample Proportion Under the Null Hypothesis Recall that the guiding question asks whether there is a small or large chance of getting a result as extreme or more extreme than you observed. To calculate the chance of this, you need to know what the distribution of the sample proportion is. Based on Topic 8.1, for a large enough sample size of 30 M&Ms, " " "% ! % " $ # # " ! ! "$ $ &' $$ # # ( ) % ' ' '' & since nπ = 30(1/6) = 5, n(1 – π) = 30(5/6) = 24, and the observations are assumed to be independent of each other. For more than 30 M&Ms, the conditions are also met, but the mean and standard error are different. What does it mean for M&Ms to be independent of each other? It means the color of M&M is not related to the color of the next one selected—no matter which color you choose, the chance that the next one is red, for example, is equal to the population proportion of red M&Ms. This task assumes this is true. In reality, however, this may not be the case (although that could be something you are testing). Apparently, M&Ms are made separately by color and then mixed together in a large vat. If they are not sufficiently mixed, then all red M&Ms, for example, are close together in the vat. When bags are extracted, you are likely to get either a bag with extra red M&Ms if it was drawn from the red section or fewer red M&Ms if it was drawn from the other sections. Now, consider what you observe in your bag. If you select a red M&M from your bag, the chance that the next M&M is red is higher than the true proportion, whereas if you select a nonred M&M from your bag, the chance that the next M&M is red is lower than the true proportion. In this case, the colors of the M&Ms are not independent. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Student Handout April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion Gathering the Data Count out the number of brown M&Ms (x) and the total number of M&Ms (n) in your cup. Once counted, you can eat them. Calculate the chance of getting a result as extreme or more extreme than what you observed if the null hypothesis is true. Once you have counted the number of M&Ms, draw the normal distribution for the data, under the null hypothesis. Computing the Test Statistic (Z-‐Value) and P-‐Value If, for example, there are 207 brown M&Ms out of 1,425 M&Ms (combining data from the entire class), then n = 1,425 and x = 207, so p = 207/1,425 = 0.145 or (14.5%). If the true proportion was 1/6 (or 16.7%), what is the chance that you could get 14.5% or more extreme (lower, in this case) in a sample of 1,425 M&Ms? "! ! !! Recall from Lesson 8.1.1 that !! = ! Thus, !! = ! "! ! !! ( ! "! ! !! # ) !=! ( ! "! ! !! # ) ) !=! #$"%&! ! !#$"'( ( #$"'( "! ! !#$"'( ")%*& !#$#** ! = ! ! *$,( #$##+, From the standard normal distribution (Z) table, the probability of being less than this value is 0.009 (or just less than 1%). The following plot, generated with Minitab, gives a more precise value using more significant digits in the calculation. The shaded region is what you are looking for and has a probability of 0.0098 (or again just less than 1%). This number is called the P-‐value. Distribution Plot Normal, Mean=0.1667, StDev=0.0093 40 Density 30 20 10 0.009815 0 0.145 0.1667 X The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Student Handout April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion Relating the P-‐value to the guiding question: The P-‐value is the chance that you could observe the proportion of brown M&Ms as extreme or more extreme than you actually observed in your sample, if the true proportion of brown M&Ms was 1/6. General Form: The P-‐value is the chance of observing a result as extreme or more extreme than the observed value if the null hypothesis is true. (Note that it is not the probability that the null hypothesis is true.) Making a Decision About the Population Proportion Relative to the Alternative Hypothesis If the P-‐value is less than α, it is said that there is a small chance that you could observe a result as extreme or more extreme than the observed value from the sample data. So, what did you do wrong? You assumed that the true proportion was really 1/6. Thus, you reject the null hypothesis. When you reject the null hypothesis, there is a statistically significant difference between the sample proportion and the population proportion under the null hypothesis. Formally, you would say, “At the α level of significance, there is enough evidence to support the alternative hypothesis.” If the P-‐value is greater than α, it is said that there is a large chance that you could observe a result as extreme or more extreme than the observed value from the sample data. So, it is believable that the data came from the distribution under the null hypothesis. Thus, you fail to reject the null hypothesis. Formally, you would say, “At the α level of significance, there is not enough evidence to support the alternative hypothesis.” Note that the burden of proof is in the alternative. You could fail to reject since there is no difference or only a small difference between the true parameter and hypothesized parameter, or perhaps you just do not have a large enough sample to show a real difference that exists is beyond what would have happened by chance alone. Now, back to the M&M data. The P-‐value is 0.0098, which is less than the significance level of 0.05, so you reject the null hypothesis. The formal conclusion states, “At the 5% level of significance there is enough evidence to show that the true proportion of brown M&Ms is less than 1/6.” The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Student Handout April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion Homework (1) Suppose you flip a coin 50 times and get 35 heads. Do you think the coin is fair? Conduct a hypothesis test at the 5% level of significance to answer this. (2) You are hired as a quality control person for a local organic farm. The owner wants to make sure that the produce the farm is distributing is market-‐ready. He wants no more than 5% of the produce to be unacceptable for sale (i.e., rotten, broken). After an extensive discussion on what qualifies as unacceptable, you examine a random sample of 200 pieces of fruit from one delivery truck. If 12 pieces of fruit from the sample are unacceptable, is there statistically significant evidence that more than 5% of the fruits in the entire batch are unacceptable? Conduct the appropriate hypothesis test at the 5% level of significance. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Student Handout April 24, 2012 (Full Version 1.0) Initiating Lesson 8.3.1: Estimation of One Proportion 1 (3) Refer to the political polling problem in Lesson 8.2.2. According to CIRCLE, 68% of young people who voted in 2008 voted for Barack Obama. Based on your data (and naively assuming that your class represents all young people), has the percent of young people who favor Obama declined from 2008 to 2012? To answer this, test whether the true percentage of young people who voted for Obama is still 68% versus the alternative that it is now lower than 68%. Conduct this test at the 5% level of significance. (4) Refer to the bottled water experiment in Lesson 8.2.2. If the majority of people (>50%) prefer bottled water, there is an argument for paying for bottled water for taste reasons. (There may be other reasons to drink it [e.g., purity of water] and reasons to not drink it [e.g., building an immune system and potentially harmful chemicals in plastic].) Test the hypothesis at the 5% level of significance that more than 50% of people prefer bottled water. 1 The Center for Information and Research on Civic Learning and Engagement (www.civicyouth.org) The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.3.2: Hypothesis Testing for One Proportion Estimated number of 50-‐minute class sessions: 1 Learning Goals Students will understand • • • • that hypothesis testing is appropriate when a decision is desired regarding an unknown population proportion. that the necessary conditions for large-‐sample hypothesis testing for a proportion are independent observations and both nπO and n(1 – πO) are at least 5. the procedures of carrying out a hypothesis test. how this lesson links to informal estimation in Module 7. For a given set of data, students will be able to • • • • • • • • recognize that they are asked to make a decision about an unknown population proportion. given a claim about a population proportion, choose appropriate null and alternative hypotheses (including two-‐sided alternatives). describe Type I and Type II errors in context and select an appropriate significance level based on consideration of the consequences of Type I and Type II errors in a given situation. state and check the conditions needed in order to use the large-‐sample test for a population proportion and, if the conditions are met, compute the Z-‐value and find the associated P-‐value. use the P-‐value and the chosen significance level to reach a decision. explain what the phrase statistically significant means. explain the difference between statistical significance and practical significance. given the description of a statistical study and the results of a test of hypotheses about a population proportion, evaluate whether conclusions are reasonable. Examples/Activities The context can be any one that appeals to you or students. The contexts presented here concern the question of whether male births are more likely than female births and the investigation of the HIV testing rate on your campus (by either instructor or student, dependent on desired modality). Other examples include the following: • • • • whether a coin is fair, whether more people prefer bottled water over tap water (see Lesson 8.2.2), whether the homeless rate in a town has decreased, and the percentage of likely voters in the next election who will vote for the incumbent (see Lesson 8.2.2). These can link nicely to discussions about experimental design or sampling. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.3.2: Hypothesis Testing for One Proportion Are Male Births More Likely than Female Births? [Student Handout] At a local hospital, there were 13,173 males born out of 25,468 singleton (one-‐baby) births. Is there evidence at the 5% level of significance that more males are born than females? 1. Parameter: The unknown parameter is π, the true proportion of male births. 2. Hypotheses: HO: π = 0.5 versus Ha: π > 0.5 Use α = 0.05. 3. Assumptions: " "#$ &' ! '"#$ $ ! ! " $ "#$% ($%)*+ $ # ) %' ( ' ' & since births are independent of each other and nπO = 25,468(0.5) = 12,734 and n(1 – πO) = 25,468(0.5) = 12,734 are both much greater than 5. Births are independent of each other since they are all singletons. The genders of identical twins are not independent of each other. You may wish to inquire whether there are births to different mothers, since the gender of a child can be associated with people (or how they were conceived). 4. Sample Statistic and P-‐Value p = 13,173/25,468 = 0.517 !! = ! "#$%&! ! !"#$ ( ) "#$ %! ! !"#$ ! = !$#), '$()*+ P-‐value = P(p > 0.517|π = 0.5) = P(Z > 5.43) = 1 – P(Z < 5.43) < 1 – 0.9997 = 0.0003 5. Decision: Since P-‐value < α, reject the null hypothesis. 6. Conclusion: At the 5% level of significance, there is enough evidence to show that the rate of male births is higher than 50%. Statistical Versus Practical Significance In this example, you find that the difference is statistically significant, even though the difference is quite small. Do you really care whether the rate of male births is 51.7% instead of 50%? Most parents would accept that it is almost a 50/50 chance. It is important to distinguish between statistical significance (as found here) and clinical or practical significance, which asks whether the difference is meaningful in practical or clinical terms. In particular, be wary of very small or very large sample sizes. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.3.2: Hypothesis Testing for One Proportion To illustrate this point, what if 26 of 50 births at a small hospital were male? Here, the rate of males is even higher than before: 52% vs. 51.7%. However, the Z-‐statistic is 0.28, which results in a P-‐value of 0.389—not even close to statistically significant. In the first case, you say that the rate of males is clearly above 50%, whereas in the second case there is not enough evidence to state that the rate is above 50%. This is due to the fact that there are 50 people in one case and 25,468 in the other. Using expert judgment is important. In this case, who cares that the male birth rate is (significantly) just higher than 50%? Parents might not, but demographers sure do. Thus, it is not practically significant for parents, but is for demographers. In statistics, the context is always important! HIV Testing Rates on College Campuses (Example of a Two-‐Sided Test) Human Immunodeficiency Virus (HIV) is what causes Acquired Immune Deficiency Syndrome (AIDS). HIV is transmitted through bodily fluids, including sexual exchanges, intravenous drug use with used needles, breast-‐feeding, and blood transfusions. Quantitative Literacy and Motivating Question: What percent of adults, aged 15–49 in North America, are living with HIV or AIDS? (Answer: 0.5% as of 20091) It is estimated that 33 million people worldwide are HIV-‐positive (HIV+) as of 2009, with 2.6 million new HIV infections per year, and 1.8 million deaths per year due to AIDS.1 A large campaign in the United States in recent years is known as the Knowledge Campaign, where people are encouraged to know their HIV status and take appropriate actions based on this information. According to one report,2 one in 500 college students are HIV+. This means that at your college, there may be around (student population/500) students who are HIV positive. One article3 estimated that 22.5% of all college students have been tested for HIV. (28% of those age 20 or older versus 15% of those under age 20; 25% of females versus 18% of males; and 43% minorities versus 20% others) Question: Does this college (using this class as a sample) have a different HIV testing rate than the general population? As long as you have 23 students in your class (or 34 students under age 20), this example will work. You will need some way of polling the class anonymously (e.g., personal response systems/clickers, online, or having students write their answer down on paper and hand it in). Discussion of Bias This class is not a random sample from all college students. Ask students to identify one way in which they are different (class year? major? gender? etc.) and discuss how this bias could affect the result. For example, if the class is mostly first-‐year students, the rate should be lower. If the students are more female, the rate should be higher. If the students are nursing students or premed, they might have more awareness of HIV, so their rate should be higher (assuming they have not made the decision to avoid engaging in risky behavior). If the college’s HIV testing club all signed up for this class together, clearly the results would be biased. AVERT. (2009). Global HIV and AIDS estimates, end of 2009. Retrieved from www.avert.org/worldstats.htm. Retrieved from www2.potsdam.edu/aeg/pages/manual/commonques.html. 3 Crosby, R. A., Miller, K. H., Staten, R. R., & Noland, M. (2005). Prevalence and correlates of HIV testing among college students: an exploratory study. Sex Health, 19–22. 1 2 The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.3.2: Hypothesis Testing for One Proportion Another source of bias is from nonresponse bias (some students might choose to not answer for some reason) or from students who lie about whether they have been tested. These call the results of the sample into question. Hypothesis Test 1. Parameter: The unknown parameter is π, the true proportion who have been tested for HIV at your university. 2. Hypotheses: HO: π = 0.225 versus Ha: π ≠ 0.225 Use α = 0.05. If you have only one subpopulation listed above (students under age 20, for example), you can test a different hypothesis. 3. Assumptions: " "#$$% '( ! ("#$$% $ ! ! " $ "#$$%& $) $ # ) %' ( ' ' & in a class of 27 students, since answers are independent (assuming they do not copy each other and there are no couples or people who dragged each other to get tested), and both nπO = 27(0.225) = 6 and n(1 – πO) = 27(1 – 0.225) = 21 are greater than 5. 4. Sample Statistic and P-‐Value: p = 3/27 = 0.111 !! = ! "#$$$! ! !"#%%& ( ) "#%%& $! ! !"#%%& ! = ! !$#(% %' What is the P-‐value? Remind students that the P-‐value is the chance of being as extreme or more extreme than the observed data, given that the null hypothesis is true. However, in this situation, students should accept the possibility that the sample proportion is either above or below the hypothesized proportion of 0.225. The sample proportion from the data was 0.114 below the hypothesized proportion, but have students consider the chance that it could have been 0.114 above that value instead. (Two-‐ sided tests are often more intuitive for students in the two-‐sample case—males are higher than females or females are higher than males—but it is relevant here.) Thus, the P-‐value is P(p < 0.111|π = 0.225) in the lower tail and P(p > 0.339|π = 0.225). The value 0.339 is chosen since it is 0.111 above the hypothesized value (0.225 + 0.111). This can be seen in the following graph: The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.3.2: Hypothesis Testing for One Proportion Distribution Plot Normal, Mean=0.225, StDev=0.08 5 Density 4 3 2 1 0.07708 0 0.07708 0.111 0.225 X 0.339 Students can calculate that P(p < 0.111|π = 0.225) + P(p > 0.339|π = 0.225) = P(Z < –1.42) + P(Z > 1.42) Note the symmetry here! Thus, the P-‐value for a two-‐sided test is always (for normal distribution, at least) twice the appropriate one-‐sided P-‐value. The appropriate one-‐sided P-‐value is the tail probability, not its complement. Thus, P-‐value = P(Z < –1.42) + P(Z > 1.42) = 2 P(Z < –1.42) = 2(0.078) = 0.156. (Note: This is slightly different from the Minitab image, which uses a more exact number than 1.42.) 5. Decision: Since P-‐value > α, do not reject the null hypothesis. 6. Conclusion: At the 5% level of significance, there is not enough evidence to show that the rate of HIV testing at this college is different from the national average of 22.5%. Students can also say it is believable that the true rate at the university is 22.5%. However, they should not say the true rate is 22.5%. Homework [Student Handout] (1) A study found that 52% of students at a college in the Midwest are against animal testing. Do students at your college have a different rate of being against animal testing than students at the Midwest college? In a random sample of 75 students from your college, 47 are against animal testing. Conduct the appropriate hypothesis test at the 5% level of significance. (2) What does a P-‐value mean? A researcher tested the null hypothesis that the population proportion is equal to πO (HO: π = πO). The Z-‐test for one proportion produced a P-‐value of 0.01. All assumptions of the test have been satisfied. Identify which of the following statements are true: (a) There is a 1% chance of getting a result as extreme or more extreme than the observed one when HO is true. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 5 Statway Instructor’s Notes April 24, 2012 (Full Version 1.0) Supporting Lesson 8.3.2: Hypothesis Testing for One Proportion (b) There is a 1% chance that the null hypothesis is true. (c) There is a 1% chance that the decision to reject HO is wrong. (d) There is a 99% chance that the alternative hypothesis is true, given the observed data. (e) A small P-‐value indicates a large effect. (f) Failure to reject HO means that the population proportion is probably πO. (g) Rejecting HO confirms the quality of the research design. (h) If HO is not rejected, the study is a failure. (i) Assuming that HO is true and the study is repeated many times, about 1% of these results will be even more inconsistent with HO than the observed result. (3) Find the error and correct it. (The information in italics is the problem introduction and is not the error in question). A casino is alerted that one of its (six-‐sided) dice may be loaded, with the number six appearing too often. The casino’s management decides to discard the die if there is evidence, at the 5% level of significance, that there is more than a 1/6 chance that a six occurs. In a sample of 100 dice rolls, 10 resulted in a six. From this, the casino followed these six steps: • • Assumptions: Random sample with independent observations—since dice rolls are independent—and both 100 × 1/6 = 16.7 and 100 × 5/6 = 83.3 are at least 5. Hypotheses: HO : π = 1/6 versus Ha : π > 1/6 α = 0.05 • Test Statistic: !! = ! "! ! !" ( " "! ! !" # • • • ) !=! #$"! ! !#$"%% ( #$"%% "! ! !#$"%% ) ! = ! !"$&' "## P-‐value: P(Z > –1.79) = 0.037 Decision: Reject HO since p < α. Conclusion: At the 5% level of significance, there is enough evidence to show that six appears more often than expected by random chance alone. Therefore, the casino management decides to discard the die. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 6 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.3.2: Hypothesis Testing for One Proportion Are Male Births More Likely than Female Births? At a local hospital, there were 13,173 males born out of 25,468 singleton (one-‐baby) births. Is there evidence at the 5% level of significance that more males are born than females? 1. Parameter: The unknown parameter is π, the true proportion of male births. 2. Hypotheses: HO: π = 0.5 versus Ha: π > 0.5 Use α = 0.05. 3. Assumptions: " "#$ &' ! '"#$ $ ! ! " $ "#$% ($%)*+ $ # ) %' ( ' ' & since births are independent of each other and nπO = 25,468(0.5) = 12,734 and n(1 – πO) = 25,468(0.5) = 12,734 are both much greater than 5. Births are independent of each other since they are all singletons. The genders of identical twins are not independent of each other. You may wish to inquire whether there are births to different mothers, since the gender of a child can be associated with people (or how they were conceived). 4. Sample Statistic and P-‐Value p = 13,173/25,468 = 0.517 !! = ! "#$%&! ! !"#$ ( ) "#$ %! ! !"#$ ! = !$#), '$()*+ P-‐value = P(p > 0.517|π = 0.5) = P(Z > 5.43) = 1 – P(Z < 5.43) < 1 – 0.9997 = 0.0003 5. Decision: Since P-‐value < α, reject the null hypothesis. 6. Conclusion: At the 5% level of significance, there is enough evidence to show that the rate of male births is higher than 50%. Statistical Versus Practical Significance In this example, you find that the difference is statistically significant, even though the difference is quite small. Do you really care whether the rate of male births is 51.7% instead of 50%? Most parents would accept that it is almost a 50/50 chance. It is important to distinguish between statistical significance (as found here) and clinical or practical significance, which asks whether the difference is meaningful in practical or clinical terms. In particular, be wary of very small or very large sample sizes. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 1 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.3.2: Hypothesis Testing for One Proportion To illustrate this point, what if 26 of 50 births at a small hospital were male? Here, the rate of males is even higher than before: 52% vs. 51.7%. However, the Z-‐statistic is 0.28, which results in a P-‐value of 0.389—not even close to statistically significant. In the first case, you say that the rate of males is clearly above 50%, whereas in the second case there is not enough evidence to state that the rate is above 50%. This is due to the fact that there are 50 people in one case and 25,468 in the other. Using expert judgment is important. In this case, who cares that the male birth rate is (significantly) just higher than 50%? Parents might not, but demographers sure do. Thus, it is not practically significant for parents, but is for demographers. In statistics, the context is always important! The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 2 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.3.2: Hypothesis Testing for One Proportion Homework (1) A study found that 52% of students at a college in the Midwest are against animal testing. Do students at your college have a different rate of being against animal testing than students at the Midwest college? In a random sample of 75 students from your college, 47 are against animal testing. Conduct the appropriate hypothesis test at the 5% level of significance. (2) What does a P-‐value mean? A researcher tested the null hypothesis that the population proportion is equal to πO (HO: π = πO). The Z-‐test for one proportion produced a P-‐value of 0.01. All assumptions of the test have been satisfied. Identify which of the following statements are true: (a) There is a 1% chance of getting a result as extreme or more extreme than the observed one when HO is true. (b) There is a 1% chance that the null hypothesis is true. (c) There is a 1% chance that the decision to reject HO is wrong. (d) There is a 99% chance that the alternative hypothesis is true, given the observed data. (e) A small P-‐value indicates a large effect. (f) Failure to reject HO means that the population proportion is probably πO. (g) Rejecting HO confirms the quality of the research design. (h) If HO is not rejected, the study is a failure. (i) Assuming that HO is true and the study is repeated many times, about 1% of these results will be even more inconsistent with HO than the observed result. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 3 Statway Student Handout April 24, 2012 (Full Version 1.0) Supporting Lesson 8.3.2: Hypothesis Testing for One Proportion (3) Find the error and correct it. (The information in italics is the problem introduction and is not the error in question). A casino is alerted that one of its (six-‐sided) dice may be loaded, with the number six appearing too often. The casino’s management decides to discard the die if there is evidence, at the 5% level of significance, that there is more than a 1/6 chance that a six occurs. In a sample of 100 dice rolls, 10 resulted in a six. From this, the casino followed these six steps: • • Assumptions: Random sample with independent observations—since dice rolls are independent—and both 100 × 1/6 = 16.7 and 100 × 5/6 = 83.3 are at least 5. Hypotheses: HO : π = 1/6 versus Ha : π > 1/6 α = 0.05 • Test Statistic: !! = ! "! ! !" ( " "! ! !" # • • • ) !=! #$"! ! !#$"%% ( #$"%% "! ! !#$"%% ) ! = ! !"$&' "## P-‐value: P(Z > –1.79) = 0.037 Decision: Reject HO since p < α. Conclusion: At the 5% level of significance, there is enough evidence to show that six appears more often than expected by random chance alone. Therefore, the casino management decides to discard the die. The original versions of the Statway™ and Quantway™ courses were created by The Charles A. Dana Center at The University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching, and are copyright © 2011 by the Carnegie Foundation for the Advancement of Teaching and the Charles A. Dana Center at The University of Texas at Austin. STATWAY™/Statway™ and Quantway™ are trademarks of the Carnegie Foundation for the Advancement of Teaching. The Dana Center’s frontmatter for Statway™ and Quantway™ is available at www.utdanacenter.org/mathways. 4

Statway TM A statistics pathway for college students

Related documents

Products

Support

Statway TM A statistics pathway for college students

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib