Pacing Guide Secondary III: Block 1 Sampling and Inference
Recommended Time Frame: 7 weeks
Start Date: August 21, 2013
Estimated End Date: October 11, 2013
Actual End Date:

Background/Helpful Information:
The central idea in this block is the idea of inference: how reliable, predictive information can be inferred about a population from the data from a representative sample of that population. In the fullness of time, students will begin to encounter this idea in Grade 7 of the Utah Core, but this year’s transition group will not have had this experience. This block will need to build an understanding of randomness and its role in selecting a “representative” sample from a target population. Given the inherent variability in any population with respect to a given characteristic of interest (i.e., parameter), eliminating bias in the process of sampling is critical to being able to use statistical methods to make inferences from data collected.

The block has three parts:
1. Understanding Randomness and Sampling. Students discover what “random” really means, what kind(s) of variability are associated with randomness, and that randomness can’t be faked. They use simulations to explore sampling in various contexts.
2. Assuring Random Sampling in Various Kinds of Data Collection. Students learn about the four types of data collection structures (simulations, which they’ve already encountered, and sample surveys, observational studies, and experiments) and what needs to be done in each case to assure that random sampling occurs and that bias (intentional or unintentional) doesn’t infect the sampling process.
3. Recognizing When Variability of a Sample is (Probably) Not Random. Ultimately, research and data collection efforts are aimed at discovering characteristics, causes, and influences that aren’t just the result of random variation.
We need a systematic, mathematically reliable means of recognizing when a data sample is “different enough” from what we expect a completely random sample to be, so that we can infer that something statistically significant is probably accounting for the difference. Different data collection structures will lead to differently shaped distributions due to random variation. Although students will have informally seen other distributions, particularly the uniform and geometric distributions (without learning their names), we will focus on the normal distribution in detail as a means of learning how to recognize variability that is probably not due to randomness alone.

Students should understand how the standard deviation is computed and how to compute it using technology. Connections could be made to the least-squares approach for fitting a line to data and to the idea of standard deviation as characterizing the variability of a sample. Students will learn under what conditions it is appropriate to assume a normal distribution for a population (the data are unimodal and relatively symmetric). An analogy to linear regression is apt here as well: just as not all bivariate data are best modeled with a line of best fit, not all data collected from a population are best modeled with a normal distribution. Just as with bivariate data, students must learn to be alert for signs that the normal distribution may not apply. They will learn the 68-95-99.7 Rule for the normal distribution. They will learn how to compute a sample’s z-score (the number of standard deviations it is from the mean) and use it to decide whether or not it is likely to be the result of something other than random chance.

Note that we are not, for non-honors classes, computing probabilities formally, nor applying statistical hypothesis testing (such as t-tests).
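For teachers who want to show the standard deviation and z-score computations with technology, the following is a minimal Python sketch, not a prescribed classroom tool. The height data, the function names, and the choice to divide by n - 1 (the sample standard deviation) are all assumptions made for illustration; a class that divides by n would change one line.

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_std_dev(values):
    """Sample standard deviation (assumption: divide by n - 1).

    Like least-squares line fitting, this is built from squared deviations.
    """
    m = mean(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))

def z_score(observation, center, spread):
    """Number of standard deviations an observation lies from the mean."""
    return (observation - center) / spread

# Hypothetical data: five student heights in centimeters.
heights_cm = [160, 165, 170, 175, 180]
center = mean(heights_cm)
spread = sample_std_dev(heights_cm)

# How unusual would a height of 186 cm be relative to this data?
z = z_score(186, center, spread)
print(f"mean = {center:.1f}, standard deviation = {spread:.2f}, z = {z:.2f}")
```

A z-score near 2 places the observation about two standard deviations above the mean, which is where the "tail" discussion in this block becomes relevant.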
Teachers may wish to introduce a z-table of probabilities to aid in conceptual development, but should not assess fluency in using such tables to compute probabilities at the non-honors level. Rather, focus on having students visually place a sample mean on a histogram of a normal population distribution in order to consider how far out into the “tail” of the distribution it needs to be for us to be reasonably sure that it is influenced by something more than random variation.

INSTRUCTION
School Advisory Panel: http://www.illustrativemathematics.org/illustrations/186
Strict Parents: http://www.illustrativemathematics.org/illustrations/122
Why Randomize: http://www.illustrativemathematics.org/illustrations/191
Do you Fit in This Car: http://www.illustrativemathematics.org/illustrations/1020
Should We Send Out a Certificate: http://www.illustrativemathematics.org/illustrations/1218
Birthday Paradox
Does Your iPod Play Favorites?
Collect All Panda Population
What is Random Behavior?

ASSESSMENT
Summative Assessment(s): Block 1 Growth Assessment Pre- and Post
Formative Assessment(s): Homework, Checkpoints, Quizzes, Tests
Performance Assessment(s):

CURRICULUM
(connected background standards from prior years)
7.SP.1: Understand that statistics can be used to gain information about a population by examining a sample of the population; generalizations about a population from a sample are valid only if the sample is representative of that population. Understand that random sampling tends to produce representative samples and support valid inferences.
7.SP.2: Use data from a random sample to draw inferences about a population with an unknown characteristic of interest. Generate multiple samples (or simulated samples) of the same size to gauge the variation in estimates or predictions. For example, estimate the mean word length in a book by randomly sampling words from the book; predict the winner of a school election based on randomly sampled survey data. Gauge how far off the estimate or prediction might be.

Understand and evaluate random processes underlying statistical experiments.
S.IC.1: Understand that statistics allows inferences to be made about population parameters based on a random sample from that population.
S.IC.2: Decide if a specified model is consistent with results from a given data-generating process, e.g., using simulation. For example, a model says a spinning coin falls heads up with probability 0.5. Would a result of 5 tails in a row cause you to question the model? [Include comparing theoretical and empirical results to evaluate the effectiveness of a treatment.]

Make inferences and justify conclusions from sample surveys, experiments, and observational studies. [In earlier grades, students are introduced to different ways of collecting data and use graphical displays and summary statistics to make comparisons. These ideas are revisited with a focus on how the way in which data is collected determines the scope and nature of the conclusions that can be drawn from that data. The concept of statistical significance is developed informally through simulation as meaning a result that is unlikely to have occurred solely as a result of random selection in sampling or random assignment in an experiment.]
S.IC.3: Recognize the purposes of and differences among sample surveys, experiments, and observational studies; explain how randomization relates to each.
S.IC.4: Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling. [Focus on the variability of results from experiments; that is, focus on statistics as a way of dealing with, not eliminating, inherent randomness.]
S.IC.5: Use data from a randomized experiment to compare two treatments; use simulations to decide if differences between parameters are significant.
S.IC.6: Evaluate reports based on data.
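The simulation ideas in S.IC.2 and S.IC.4 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, trial counts, seed, and the survey numbers (a sample proportion of 0.5 from 100 respondents) are hypothetical, and treating the margin of error as two standard deviations of the simulated sample proportions is a common classroom convention assumed here, not something the standards prescribe.

```python
import random
import statistics

def proportion_all_tails(num_spins=5, num_trials=20_000, p_heads=0.5, seed=2013):
    """Estimate how often `num_spins` spins are all tails under the model."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    all_tails = 0
    for _ in range(num_trials):
        # Each spin is heads with probability p_heads; any() is False
        # only when every spin in the trial came up tails.
        if not any(rng.random() < p_heads for _ in range(num_spins)):
            all_tails += 1
    return all_tails / num_trials

def simulated_margin_of_error(sample_proportion, sample_size,
                              num_samples=5_000, seed=2013):
    """Assumed convention: margin of error = 2 standard deviations of the
    sample proportions produced by repeated simulated random samples."""
    rng = random.Random(seed)
    simulated = []
    for _ in range(num_samples):
        successes = sum(rng.random() < sample_proportion
                        for _ in range(sample_size))
        simulated.append(successes / sample_size)
    return 2 * statistics.pstdev(simulated)

# S.IC.2: is 5 tails in a row surprising if the coin model (p = 0.5) is right?
estimate = proportion_all_tails()
exact = 0.5 ** 5  # theoretical value, 1/32
print(f"simulated: {estimate:.4f}, theoretical: {exact:.4f}")

# S.IC.4: hypothetical survey with 100 respondents and sample proportion 0.5.
margin = simulated_margin_of_error(0.5, 100)
print(f"simulated margin of error: about {margin:.2f}")
```

Comparing the simulated proportion with the theoretical 1/32 gives students a concrete way to discuss whether "about 3 times in 100" is rare enough to question the model.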
Summarize, represent, and interpret data on a single count or measurement variable.
S.ID.4: Use the mean and standard deviation of a data set to fit it to a normal distribution and to estimate population percentages. Recognize that there are data sets for which such a procedure is not appropriate. Use calculators, spreadsheets, and tables to estimate areas under the normal curve. [While students may have heard of the normal distribution, it is unlikely that they will have prior experience using it to make specific estimates. Build on students’ understanding of data distributions to help them see how the normal distribution uses area to make estimates of frequencies (which can be expressed as probabilities). Emphasize that only some data are well described by a normal distribution.]
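As one possible "technology" option for S.ID.4, the sketch below uses Python's standard-library statistics.NormalDist (Python 3.8 or later) to estimate areas under the normal curve. The mean of 170 and standard deviation of 8 are hypothetical values chosen for illustration, and the estimate is only meaningful when the data are roughly unimodal and symmetric.

```python
from statistics import NormalDist

def proportion_within(k):
    """Area under the standard normal curve within k standard deviations."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Check the 68-95-99.7 Rule against computed areas.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * proportion_within(k):.1f}%")

# Hypothetical fitted model: mean 170, standard deviation 8.
fitted = NormalDist(mu=170, sigma=8)

# Estimated population percentage at or below 180 (area to the left of 180).
percent_below_180 = 100 * fitted.cdf(180)
print(f"estimated percent at or below 180: {percent_below_180:.1f}%")
```

The same areas can be read from a printed z-table or a spreadsheet function; the point of the sketch is that area under the curve corresponds to an estimated percentage of the population.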