Wendell B. Barnwell II
wbarnwell@wcpss.net
Leesville Road High School

Major Topics / Strands
A. Interpreting Categorical and Quantitative Data (ID)
• Exploring Data
B. Conditional Probability and the Rules of Probability (CP)
• Anticipating Patterns in Advance
C. Making Inferences and Justifying Conclusions (IC)
• Statistical Inference

Why Statistics?
Arthur Benjamin, TED Talk 2009: Teach Statistics over Calculus
http://www.youtube.com/watch?v=BhMKmovNjvc

Why Statistics? (cont'd)
Most people will take at most one Statistics class in their lives. That includes everyone from future senators to sales clerks, … as well as presidents, CEOs, jurors, doctors, and other decision makers.
It's our job to teach them how to make informed decisions!

Prudential Age Commercial
Awesome data collection example.
http://www.youtube.com/watch?v=C3qj88J7-jA

Types of Variables!!
Categorical Data: M&M colors; gender; whether an individual has a cellular phone
Quantitative Data: height; armspan; distance from home

Graphing Variables
Categorical Data: pie chart; bar chart; two-way table
Quantitative Data: dotplot; stemplot; histogram; scatterplot; time plot

Common Core Math 1 Goals
Summarize, represent, and interpret data on a single count or measurement variable.
S-ID.1 Represent data with plots on the real number line (dot plots, histograms, and box plots).
S-ID.2 Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (interquartile range, standard deviation) of two or more different data sets.
S-ID.3 Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).

Middle Grades Foundation
6th grade: Develop understanding of statistical variability.
6.SP.1. Recognize a statistical question as one that anticipates variability in the data related to the question and accounts for it in the answers.
6.SP.2.
Understand that a set of data collected to answer a statistical question has a distribution which can be described by its center, spread, and overall shape.
6.SP.3. Recognize that a measure of center for a numerical data set summarizes all of its values with a single number, while a measure of variation describes how its values vary with a single number.
Summarize and describe distributions.
6.SP.4. Display numerical data in plots on a number line, including dot plots, histograms, and box plots.
6.SP.5. Summarize numerical data sets in relation to their context.
a) Reporting the number of observations.
b) Describing the nature of the attribute under investigation, including how it was measured and its units of measurement.
c) Giving quantitative measures of center (median and/or mean) and variability (interquartile range and/or mean absolute deviation), as well as describing any overall pattern and any striking deviations from the overall pattern with reference to the context in which the data were gathered.
d) Relating the choice of measures of center and variability to the shape of the data distribution and the context in which the data were gathered.

Middle Grades Foundation
7th grade: Use random sampling to draw inferences about a population.
7.SP.1. Understand that statistics can be used to gain information about a population by examining a sample of the population; generalizations about a population from a sample are valid only if the sample is representative of that population. Understand that random sampling tends to produce representative samples and support valid inferences.
7.SP.2 Use data from a random sample to draw inferences about a population with an unknown characteristic of interest. Generate multiple samples (or simulated samples) of the same size to gauge the variation in estimates or predictions.
Draw informal comparative inferences about two populations.
7.SP.3 Informally assess the degree of visual overlap of two numerical data distributions with similar variabilities, measuring the difference between the centers by expressing it as a multiple of a measure of variability. For example, the mean height of players on the basketball team is 10 cm greater than the mean height of players on the soccer team, about twice the variability (mean absolute deviation) on either team; on a dot plot, the separation between the two distributions of heights is noticeable.
7.SP.4 Use measures of center and measures of variability for numerical data from random samples to draw informal comparative inferences about two populations. For example, decide whether the words in a chapter of a seventh-grade science book are generally longer than the words in a chapter of a fourth-grade science book.

Activity #1: Tennis Balls
Using a ruler, measure the diameter of a tennis ball to the nearest millimeter. Write your measurement on a Post-it and place it on the board above our number line. Describe the distribution.

Activity #2: Peanuts!
Don't freak out, there are none in the room! My students took a sample of unshelled peanuts and measured the lengths of those peanuts in millimeters. We then created a line plot.

Middle School Foundation
8.SP.4. Understand that patterns of association can also be seen in bivariate categorical data by displaying frequencies and relative frequencies in a two-way table. Construct and interpret a two-way table summarizing data on two categorical variables collected from the same subjects. Use relative frequencies calculated for rows or columns to describe possible association between the two variables. For example, collect data from students in your class on whether or not they have a curfew on school nights and whether or not they have assigned chores at home. Is there evidence that those who have a curfew also tend to have chores?
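The curfew and chores question in 8.SP.4 can be sketched in a few lines of Python. This is only an illustration: the counts below are made up (a real class would supply its own), and the function name `row_relative_frequencies` is a hypothetical helper, not part of any curriculum material.

```python
# Hypothetical class survey counts, invented purely for illustration.
# Each key is a (curfew status, chores status) pair from the two-way table.
counts = {
    ("curfew", "chores"): 12,
    ("curfew", "no chores"): 4,
    ("no curfew", "chores"): 6,
    ("no curfew", "no chores"): 8,
}


def row_relative_frequencies(table):
    """Divide each cell by its row total (rows are the curfew categories)."""
    row_totals = {}
    for (row, _col), n in table.items():
        row_totals[row] = row_totals.get(row, 0) + n
    return {(row, col): n / row_totals[row] for (row, col), n in table.items()}


rel = row_relative_frequencies(counts)
for (row, col), share in rel.items():
    print(f"{row:>10} | {col:<10} {share:.2f}")
```

With these invented counts, 75% of students with a curfew have chores, compared with about 43% of students without a curfew. Comparing the row percentages, rather than the raw counts, is what lets students talk about a possible association.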
Activity #3: M&M Data
Take a pack of snack-size M&Ms and compare it to a pack of regular-size M&Ms. Create a two-way table to compare the data. How can we compare this data? How can we graph this data?

Middle School Foundation
8th grade: Investigate patterns of association in bivariate data.
8.SP.1. Construct and interpret scatter plots for bivariate measurement data to investigate patterns of association between two quantities. Describe patterns such as clustering, outliers, positive or negative association, linear association, and nonlinear association.
8.SP.2 Know that straight lines are widely used to model relationships between two quantitative variables. For scatter plots that suggest a linear association, informally fit a straight line, and informally assess the model fit by judging the closeness of the data points to the line.

Activity #4: Typhoons in the Pacific
This is a problem I adapted from problem #6 of the 2013 AP Statistics exam.

Common Core Math 2 Goals
S-CP.1 Describe events as subsets of a sample space (the set of outcomes) using characteristics (or categories) of the outcomes, or as unions, intersections, or complements of other events ("or," "and," "not") with visual representations including Venn diagrams.
S-CP.2 Understand that two events A and B are independent if the probability of A and B occurring together is the product of their probabilities, and use this characterization to determine if they are independent.
S-CP.3 Understand the conditional probability of A given B as P(A and B)/P(B), and interpret independence of A and B as saying that the conditional probability of A given B is the same as the probability of A, and the conditional probability of B given A is the same as the probability of B.
S-CP.4 Construct and interpret two-way frequency tables of data when two categories are associated with each object being classified.
Use the two-way table as a sample space to decide if events are independent and to approximate conditional probabilities.
S-CP.5 Recognize and explain the concepts of conditional probability and independence in everyday language and everyday situations.
S-CP.6 Find the conditional probability of A given B as the fraction of B's outcomes that also belong to A, and interpret the answer in terms of the model.

Middle Grade Alignment
CCSS.Math.Content.7.SP.C.5 Understand that the probability of a chance event is a number between 0 and 1 that expresses the likelihood of the event occurring. Larger numbers indicate greater likelihood. A probability near 0 indicates an unlikely event, a probability around 1/2 indicates an event that is neither unlikely nor likely, and a probability near 1 indicates a likely event.
CCSS.Math.Content.7.SP.C.6 Approximate the probability of a chance event by collecting data on the chance process that produces it and observing its long-run relative frequency, and predict the approximate relative frequency given the probability.
CCSS.Math.Content.7.SP.C.7 Develop a probability model and use it to find probabilities of events. Compare probabilities from a model to observed frequencies; if the agreement is not good, explain possible sources of the discrepancy.
CCSS.Math.Content.7.SP.C.7a Develop a uniform probability model by assigning equal probability to all outcomes, and use the model to determine probabilities of events.
CCSS.Math.Content.7.SP.C.7b Develop a probability model (which may not be uniform) by observing frequencies in data generated from a chance process.
CCSS.Math.Content.7.SP.C.8 Find probabilities of compound events using organized lists, tables, tree diagrams, and simulation.
CCSS.Math.Content.7.SP.C.8a Understand that, just as with simple events, the probability of a compound event is the fraction of outcomes in the sample space for which the compound event occurs.
CCSS.Math.Content.7.SP.C.8b Represent sample spaces for compound events using methods such as organized lists, tables and tree diagrams. For an event described in everyday language (e.g., "rolling double sixes"), identify the outcomes in the sample space which compose the event.
CCSS.Math.Content.7.SP.C.8c Design and use a simulation to generate frequencies for compound events.
CCSS.Math.Content.8.SP.A.4 Understand that patterns of association can also be seen in bivariate categorical data by displaying frequencies and relative frequencies in a two-way table. Construct and interpret a two-way table summarizing data on two categorical variables collected from the same subjects. Use relative frequencies calculated for rows or columns to describe possible association between the two variables.

Why Probability?
• Looking at games of chance: card games, lotteries, fantasy sports, horse racing
• Looking at social science data: life, death, medical field, biostatistics
• Looking at scientific data: variations in individual measurements are random (example: tennis ball diameter measurements)

Chance Behavior
Chance behavior is unpredictable in the short run but has a regular and predictable pattern in the long run.

Randomness
We call a phenomenon random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions.

Principles of Randomness
1. There must be a long series of independent trials.
2. The idea is empirical. We can estimate a real-world probability by actually observing many trials (e.g., a simulation that combines class data).
3. Short runs give only a rough estimate; several hundred repetitions are often needed before the estimated probability settles down.

Definition of Probability
The probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions. That is, the probability is a long-term relative frequency.

Interpreting Probabilities
Ex.
(a) There is a 0.3 chance of rain tomorrow. How do you interpret this statement?
Answer: Over a long run of days with the same conditions as tomorrow, it rains on about 30% of them. Meteorologists may have examined 100 days, 200 days, maybe more, but probably not just 10 days of which 3 resulted in rain.
Ex. (b) Your probability of winning at this lottery game is 1/1000. How do you interpret this statement?
Answer: Over a long run of plays under the same conditions, about one play in one thousand is a winner. It may take 1,000, 2,000, maybe more plays of this lottery before the observed proportion of wins settles near this probability, and no fixed number of plays guarantees a win.

Must Be Independent!!!
For a series of trials to be considered random, the trials must be independent: the outcome of one trial does not influence the outcome of another. Example: rolling a die. Rolling a 3 does not influence the probability of rolling a 6 on the next roll.

Sample Space
The sample space S of a random phenomenon is the set of all possible outcomes.

Event
An event is any outcome or set of outcomes of a random phenomenon. It is a subset of the sample space.

Probability Model
A probability model is a mathematical description of a random phenomenon consisting of two parts:
• A sample space
• A way of assigning probabilities to events

Example #1
Consider a situation in which shoppers were categorized by gender (M or F) and the type of music purchased (C = classical, R = rock, K = country, and P = rap).
a) What is the sample space?
b) What probability is associated with each event?
c) Describe the event in which a shopper purchased classical.
d) Describe the event in which the shopper was male.
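Parts (a), (c), and (d) of Example #1 can be checked by listing the outcomes directly. The sketch below is one way to do that in Python; the two-letter outcome labels (gender letter followed by music letter) are a notation chosen here for illustration. Part (b) is deliberately left out, because the probabilities depend on purchase data the slide does not give, and there is no reason to assume the outcomes are equally likely.

```python
from itertools import product

genders = ["M", "F"]
music_types = ["C", "R", "K", "P"]  # classical, rock, country, rap

# Sample space: every (gender, music) pairing, written as a two-letter label.
sample_space = [g + m for g, m in product(genders, music_types)]

# Events are subsets of the sample space.
bought_classical = [o for o in sample_space if o[1] == "C"]
shopper_male = [o for o in sample_space if o[0] == "M"]

print(sample_space)      # ['MC', 'MR', 'MK', 'MP', 'FC', 'FR', 'FK', 'FP']
print(bought_classical)  # ['MC', 'FC']
print(shopper_male)      # ['MC', 'MR', 'MK', 'MP']
```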
Example #2
An observer stands at the bottom of a freeway off-ramp and records the turning direction (L = left, R = right) of each of three successive vehicles.
• What is the sample space?
• What's the probability of each outcome?
• Which outcome(s) have exactly one car turning right?
• Which outcome(s) have exactly one car turning left?
• Which outcome(s) have all cars turning the same direction?

Assigning Probability
Some outcomes are equally likely and some are not. Students need to be aware that the total number of outcomes is not always the denominator of the probability. Let's start with some possible "equally likely" scenarios.

Equally Likely Events
(a) Whether a fair die lands on 1, 2, 3, 4, 5, or 6
(b) The sum of two fair dice landing on 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12
(c) A fair coin landing on heads or tails when tossed
(d) A fair coin landing on heads or tails when spun on its side
(e) A tennis racquet landing with the label "up" or "down" when spun on its end
(f) Your grade in this course being A, B, C, D, or F
(g) Whether or not California experiences a catastrophic earthquake within the next year
(h) Whether or not your server correctly brings you the meal you ordered in a restaurant
(i) Whether or not there is intelligent life on Mars
(j) Whether or not a woman will be elected President in the next election
(k) Whether or not a woman will be elected President before the year 2010
(l) Colors of Reese's Pieces candies: orange, yellow, and brown

Probability Example #1
The heart association claims that only 10% of US adults over age 30 can pass the minimum requirements of the President's physical fitness commission. In a group of 4 randomly chosen adults, what is the probability that 2 can pass and 2 cannot pass?

Probability Example #2
Advertising Agency Worksheet

Simulation
The imitation of chance behavior, based on a model that accurately reflects the situation, is called a simulation.
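As one possible simulation, Probability Example #1 (4 adults, each with a 10% chance of passing) can be imitated with random numbers and compared against the exact binomial calculation. The sketch assumes the four adults pass or fail independently of one another; the function names, the seed, and the number of trials are choices made here for illustration.

```python
import random
from math import comb


def exact_probability(n=4, k=2, p=0.10):
    """Binomial probability of exactly k passes among n independent adults."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def simulated_probability(trials=100_000, n=4, k=2, p=0.10, seed=1):
    """Estimate the same probability by repeating the chance process."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One repetition: each of the n adults passes with probability p.
        passes = sum(rng.random() < p for _ in range(n))
        if passes == k:
            hits += 1
    return hits / trials


print(round(exact_probability(), 4))  # 0.0486
print(simulated_probability())
```

The exact value is 6 × (0.1)² × (0.9)² = 0.0486. The simulated proportion should land close to that, which gives students a way to check a hand calculation against long-run relative frequency.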
Simulation Steps
State: What is the question of interest about some chance process?
Plan: Describe how to use a chance device to imitate one repetition of the process. Explain clearly how to identify the outcomes of the chance process and what variable to measure.
Do: Perform many repetitions of the simulation.
Conclude: Use the results to answer the question of interest.

Probability Example #3
Eric Staal, center for the Carolina Hurricanes, is off to a strong start to the NHL season. He is getting about 8 shots on goal a game and is making a third of his shots. What is the probability that Eric scores 4 goals in a game?

Common Core Math 3 Goals
Understand and evaluate random processes underlying statistical experiments.
S-IC.1 Understand statistics as a process for making inferences about population parameters based on a random sample from that population.
Make inferences and justify conclusions from sample surveys, experiments, and observational studies.
S-IC.3 Recognize the purposes of and differences among sample surveys, experiments, and observational studies; explain how randomization relates to each.
S-IC.4 Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling.
S-IC.5 Use data from a randomized experiment to compare two treatments; use simulations to decide if differences between parameters are significant.
S-IC.6 Evaluate reports based on data.

Middle Grades Alignment
7th grade: Use random sampling to draw inferences about a population.
7.SP.1. Understand that statistics can be used to gain information about a population by examining a sample of the population; generalizations about a population from a sample are valid only if the sample is representative of that population. Understand that random sampling tends to produce representative samples and support valid inferences.
7.SP.2 Use data from a random sample to draw inferences about a population with an unknown characteristic of interest. Generate multiple samples (or simulated samples) of the same size to gauge the variation in estimates or predictions.
Draw informal comparative inferences about two populations.
7.SP.3 Informally assess the degree of visual overlap of two numerical data distributions with similar variabilities, measuring the difference between the centers by expressing it as a multiple of a measure of variability.
7.SP.4 Use measures of center and measures of variability for numerical data from random samples to draw informal comparative inferences about two populations.

Activity #1: The "1 in 6 wins" Game
As a special promotion for its 20-ounce bottles of soda, a soft drink company printed a message on the inside of each cap. Some of the caps said "Please try again", while others said "You're a winner!" The company advertised the promotion with the slogan "1 in 6 wins a prize." Seven friends each buy one 20-ounce bottle of the soda at a local convenience store. The clerk is surprised when three of them win a prize. Is this group of friends just lucky, or is the company's claim inaccurate?

Activity #2: Sleep Deprivation
Source: Rossman et al., NSF Project
Researchers have established that sleep deprivation has a harmful effect on visual learning. But do these effects linger for several days, or can a person "make up" for sleep deprivation by getting a full night's sleep on subsequent nights? A recent study investigated this question by randomly assigning 21 subjects to one of two groups: one group was deprived of sleep on the night following training and pre-testing with a visual discrimination task, and the other group was permitted unrestricted sleep on that first night. Both groups were then allowed as much sleep as they wanted on the following two nights. All subjects were then re-tested on the third day.
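Returning briefly to the "1 in 6 wins" game above: a die is a natural chance device for it, and the sketch below follows the State, Plan, Do, Conclude outline in Python. It assumes each cap wins independently with probability 1/6 (the company's claim), and treats a roll of 1 as a winning cap; the function names, seed, and trial count are illustrative choices.

```python
import random
from math import comb


def exact_at_least(k_min=3, n=7, p=1 / 6):
    """Exact chance of k_min or more winners among n bottles, if the claim holds."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def simulate_at_least(trials=100_000, k_min=3, n=7, seed=1):
    """State: how often do 3+ of 7 friends win if 1 in 6 caps is a winner?"""
    rng = random.Random(seed)
    surprising = 0
    for _ in range(trials):
        # Plan: roll one die per friend; a 1 means "You're a winner!"
        winners = sum(rng.randint(1, 6) == 1 for _ in range(n))
        # Do: count repetitions with at least k_min winners.
        if winners >= k_min:
            surprising += 1
    # Conclude: the long-run proportion estimates the probability.
    return surprising / trials


print(round(exact_at_least(), 4))  # 0.0958
print(simulate_at_least())
```

Under the company's claim, three or more winners among seven friends happens a little less than 10% of the time. Whether that is rare enough to doubt the claim is exactly the discussion the activity is meant to start.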
Sleep Deprivation Data
Subjects' performance on the test was recorded as the minimum time (in milliseconds) between stimuli appearing on a computer screen for which they could accurately report what they had seen on the screen.
• Sleep deprivation (n = 11): -14.7, -10.7, -10.7, 2.2, 2.4, 4.5, 7.2, 9.6, 10.0, 21.3, 21.8
• Unrestricted sleep (n = 10): -7.0, 11.6, 12.1, 12.6, 14.5, 18.6, 25.2, 30.5, 34.5, 45.6
Did sleep deprivation cause the difference in performance? Or is there another possible explanation?

Rerandomizing Simulation
• Place 21 cards (one per subject) in a bag.
• If there is no difference in treatment effects, then each subject's value would be the same as in the original study, whichever group they landed in.
• How large a difference in group means can arise from different random assignments alone?
• Mix your cards and draw 10 to represent the unrestricted group. Compare your mean to 19.82. Report.
• Physical simulation can be tedious…

Final Thoughts: Sleep Deprivation
• Research question: Do the effects of sleep deprivation on visual learning last for several days?
• Idea: suppose there's no "treatment effect".
• Could the differences be due to random assignment alone?
• "Re-randomize" many times.
• What would you conclude?

Closing Thoughts / Questions
2 variable statistics (scatterplot, models)
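Since the card-drawing version of the rerandomizing simulation can be tedious, here is a minimal Python sketch of the same idea using the 21 values listed for the sleep deprivation study. It assumes the "no treatment effect" idea described above: the 21 values stay fixed and only the group labels are reshuffled. The function names, the seed, and the number of repetitions are choices made for this illustration.

```python
import random

deprived = [-14.7, -10.7, -10.7, 2.2, 2.4, 4.5, 7.2, 9.6, 10.0, 21.3, 21.8]
unrestricted = [-7.0, 11.6, 12.1, 12.6, 14.5, 18.6, 25.2, 30.5, 34.5, 45.6]


def mean(values):
    return sum(values) / len(values)


# Observed difference in group means (unrestricted minus deprived).
observed_diff = mean(unrestricted) - mean(deprived)


def rerandomize(reps=10_000, seed=1):
    """Share of reshuffles whose difference is at least as large as observed."""
    rng = random.Random(seed)
    cards = deprived + unrestricted  # the 21 "cards" in the bag
    at_least_as_large = 0
    for _ in range(reps):
        rng.shuffle(cards)
        # First 10 cards play the unrestricted group, the other 11 the deprived group.
        diff = mean(cards[:10]) - mean(cards[10:])
        if diff >= observed_diff - 1e-9:  # small tolerance for float rounding
            at_least_as_large += 1
    return at_least_as_large / reps


print(round(mean(unrestricted), 2))  # 19.82
print(round(observed_diff, 2))       # 15.92
print(rerandomize())
```

The printed proportion estimates how often random assignment alone would produce a difference as large as the one observed. A small proportion suggests the difference is hard to explain by the luck of the assignment, which is the conclusion students are asked to reason toward.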