Chapter 1 Introduction to the Statistical Process

Section 1.1: Introduction to Statistics

Statistics vs. Anecdotal Evidence
Smoking causes cancer.
Seat belts save lives.

Autism and Vaccines
Nelson says it wasn't long after her son Parker's shots at 15 months that she noticed something was wrong. "He had run a slight fever after the vaccinations, but I didn't think anything of it," said Nelson. "You know kids run fevers all the time, but about a week after that he just completely stopped talking." After months of worrying, wondering, and going back and forth with doctors, an official diagnosis was made: autism. Nelson believes it started with the vaccines. "Gradually, I started piecing it together. He got sick after his vaccinations and about a week later everything changed. He was a completely different little boy then," said Nelson.

What is Statistics?
Statistics is the discipline that guides us to produce or collect data, which are then analyzed to draw inferences or make predictions. Numerical summaries such as means, percentages, and standard deviations are also called statistics.

Descriptive Statistics
Descriptive statistics refers to methods for summarizing data. These summaries consist of graphs (histograms, scatterplots, pie charts, etc.) and numbers (means, standard deviations, regression equations, percentages, etc.).

Inferential Statistics
Inferential statistics refers to methods of making decisions or predictions about a population or a process based on data obtained from a sample. We will use tests of significance and confidence intervals to achieve this.

This semester, we will be looking at and conducting a number of studies.

Statistical Process
1. Ask a research question
2. Design a study (research conjecture)
3. Collect data
4. Explore the data
5. Draw inferences (logic of inference)
   - Significance
   - Estimation
6. Formulate conclusions (scope of inference)
   - Generalize
   - Cause/effect
7. Communicate findings

Physicians’ Health Study I
1. Research question: Will taking aspirin help reduce heart attacks?
2. Design a study: Started in 1982 with 22,071 male physicians.
   • Half took a 325 mg aspirin every other day; the other half took a placebo.
3. Collect data: Intended to run until 1995, the aspirin study was stopped in 1988 after 189 heart attacks had occurred in the placebo group and 104 in the aspirin group. The study also tested beta carotene, which investigators had hoped would be a wonder drug; it was found to have neither benefit nor harm. This result allowed investigators to turn to other, more promising agents.
4. Explore the data: 1.7% of the placebo group had heart attacks, compared with only 0.9% of the aspirin group (a 45% relative reduction in heart attacks for the aspirin group).
5. Draw inferences: The probability that a difference between the two groups’ heart attack proportions as large as the one observed would arise just by chance is very, very small.
6. Formulate conclusions: The investigators concluded that taking aspirin does reduce the likelihood of heart attacks in middle-aged and older men.
7. Report findings:

Terminology
The individual entities on which data are recorded are called observational units. The recorded characteristics of the observational units are the variables of interest.
What are the observational units and variables in the Physicians’ Health Study?

Section 1.2 Introduction to the Logic of Statistical Inference

Dolphin Communication
Can dolphins communicate abstract ideas? In an experiment done in the 1960s, Doris was instructed which of two buttons to push. She then had to communicate this to Buzz, who could not see Doris. If he picked the correct button, both dolphins would get a reward.
What are the observational units and variables in this study?
In one set of trials, Buzz chose the correct button 15 out of 16 times. Based on these results, do you think Buzz knew which button to push, or was he just guessing?
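One way to check whether 15 out of 16 is unusual for a guessing Buzz is to let a fair coin stand in for each guess and repeat the whole set of 16 trials many times. The Python below is a minimal sketch of that coin-flip model, assuming each of the 16 attempts is an independent 50-50 guess when Buzz is guessing; the function name `simulate_coin_guesses`, the 10,000 repetitions, and the seed are illustrative choices, not part of the original activity.

```python
import random


def simulate_coin_guesses(n_attempts=16, n_reps=10_000, seed=1):
    """Flip n_attempts fair coins n_reps times and return the number of
    heads (correct guesses) in each repetition."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        sum(rng.random() < 0.5 for _ in range(n_attempts))  # True counts as 1
        for _ in range(n_reps)
    ]


observed = 15  # Buzz chose the correct button 15 out of 16 times
counts = simulate_coin_guesses()
# Proportion of simulated sets of 16 guesses with 15 or more correct
sim_p = sum(c >= observed for c in counts) / len(counts)
print(f"Proportion of simulations with at least {observed} correct: {sim_p:.4f}")
```

With a fair coin, 15 or more heads in 16 flips occurs in only 17 of the 2^16 = 65,536 equally likely sequences (about 0.03%), so the simulated proportion should land very close to zero while the typical simulated count sits near 8.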
How might we justify an answer? How might we model this situation?

Modeling Buzz and Doris
• Flip coins
• Applet

Can Chimps Solve Problems?
http://youtu.be/ySMh1mBi3cI

Exploration 1.2: Can Chimps Solve Problems?
Sarah, a 30-year-old chimp, is shown videos of a person struggling with a problem (can’t reach a banana, cage door locked, record player not working, etc.). She is then shown two pictures, one showing the solution and one not, and she picks one of them. Does Sarah understand the solution to these problems, or is she just randomly picking a picture?

Exploration 1.2 (page 15)
Read the first paragraph.
1. State the research question. (This is a broad statement.)
2. State the research conjecture. (This is more specific to our test.)
Sarah correctly picked 7 of the 8 pictures. Is this unlikely if she is just guessing? Continue working on the exploration.

Section 1.3 Statistical Significance: Other Random Choice Models

Can Dogs Sniff Out Cancer?
(Photo: Marine sniffing samples)
1. Research question: Can dogs detect a patient with cancer by smelling their breath?
2. Design a study: Five breath bags were shown to Marine, one from a cancer patient and four from non-cancer patients.
3. Collect data: Marine completed 33 attempts at this procedure.
4. Explore the data: Marine identified the correct bag 30 out of 33 times.
How is the chance model we will use for this situation different from our previous ones? Can we use coins again?
5. Draw inferences

Three S Strategy
Statistic: Compute the statistic from the observed data.
Simulate:
• Identify a model that represents a chance explanation.
• Use the model to simulate data that “could have happened” when the chance model is true.
• Calculate the value of the statistic from the could-have-been data.
• Repeat the simulation process to generate a distribution of the could-have-been values for the statistic.
Strength of evidence: Consider whether the value of the observed statistic is unlikely to occur when the chance model is true.

We have the statistic: Marine made the correct identification 30 out of 33 times.
How could we set up a simulation?
• Tactile (how could this be done?)
• Applet
Strength of evidence: Is 30 out of 33 very unlikely under the chance model?

6. Formulate conclusions: Can we conclude that Marine can identify cancerous breath? Can we conclude that all dogs can do this? Some dogs?
7. Communicate findings:
Marine, the dog that can sniff out bowel cancer
By Jeremy Laurance, Health Editor
A Labrador retriever called Marine has been trained to sniff out cancer with stunning accuracy, researchers report today.

Terminology: Hypotheses
The null hypothesis is the chance explanation. Typically, the alternative hypothesis is what the researchers think is true.
Null hypothesis: Marine is randomly choosing which bag to sit next to.
Alternative hypothesis: Marine is not randomly choosing which bag to sit next to.

Terminology: Null Distribution
We will refer to the distribution of chance outcomes as the null distribution. For Marine, we should have gotten a null distribution similar to the following.

Terminology: P-value
The p-value is the proportion of outcomes in the null distribution that are at least as extreme as the value of the statistic actually observed in the study.
What was our p-value for Marine? Did everyone get the same p-value? Were they all close to the same value?
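The Three S Strategy for Marine can be sketched in Python. This is a minimal sketch, assuming that under the chance model Marine picks each of the five bags with equal probability, so each attempt is independently correct with probability 1/5; the function names, the 10,000 repetitions, and the seed are illustrative assumptions. The exact binomial tail probability is not part of this chapter's simulation-based method; it is included only as a cross-check on the simulated p-value.

```python
import random
from math import comb


def simulate_null_counts(n_attempts, p_correct, n_reps=10_000, seed=2):
    """Simulate: run n_attempts chance-model trials n_reps times and record
    the number of successes each time (the null distribution)."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_correct for _ in range(n_attempts))
        for _ in range(n_reps)
    ]


def exact_upper_tail(k, n, p):
    """Exact binomial probability of k or more successes in n trials."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Statistic: Marine identified the correct bag 30 times in 33 attempts.
n, observed = 33, 30
# Chance model: one correct bag out of five, so a guessing dog is right 1/5 of the time.
null_counts = simulate_null_counts(n, 1 / 5)
# Strength of evidence: proportion of simulated counts at least as extreme (30 or more).
sim_p = sum(c >= observed for c in null_counts) / len(null_counts)

print("Center of null distribution:", sum(null_counts) / len(null_counts))
print("Simulated p-value:", sim_p)
print("Exact binomial tail (cross-check):", exact_upper_tail(observed, n, 1 / 5))
```

The null distribution centers near 33 × 1/5 = 6.6 correct identifications, so a simulated count of 30 or more essentially never appears: the simulated p-value should come out as 0, and the exact tail probability is far below 0.001.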
Guidelines for evaluating strength of evidence from p-values
• p-value > 0.10: not much evidence against the null hypothesis
• 0.05 < p-value < 0.10: moderate evidence against the null hypothesis
• 0.01 < p-value < 0.05: strong evidence against the null hypothesis
• 0.001 < p-value < 0.01: very strong evidence against the null hypothesis
• p-value < 0.001: extremely strong evidence against the null hypothesis

Terminology: Statistically Significant
If the observed results provide strong evidence that the data did not arise by random chance alone, then the research result is called statistically significant.
Are Marine’s results statistically significant?

Let’s play some rock-paper-scissors
• Rock smashes scissors
• Paper covers rock
• Scissors cut paper
Play the novice version at least 30 times and keep track of all your choices.

Activity 1.4
Now work on Activity 1.4.

Criminal Justice System vs. Significance Tests
Innocent until proven guilty: we assume a defendant is innocent, and the prosecution has to collect evidence to try to prove the defendant is guilty. Likewise, we assume our chance model (or null hypothesis) is true, collect data, and calculate a sample proportion. We then show how unlikely our proportion is if the chance model is true.
If the prosecution shows lots of evidence that goes against this assumption of innocence (DNA, witnesses, motive, contradictory story, etc.), then the jury concludes that the assumption of innocence is wrong. Likewise, if after collecting data we find that the p-value is so small that a result like ours would rarely occur by chance when the null hypothesis is true, then we conclude that our assumption that the chance model is true is wrong.

Review
For Sarah the chimp, you could have gotten a null distribution similar to the one shown here.
• What does a single dot represent?
• What does the whole distribution represent?
• What is the p-value for this simulation?
• What does this p-value mean?

More Review
The null hypothesis is the chance explanation. Typically, the alternative hypothesis is what the researchers think is true.
Three S Strategy: Statistic, Simulate, Strength of evidence.
The p-value is the proportion of outcomes in the null distribution that are at least as extreme as the value of the statistic actually observed in the study.

Still More Review
A small p-value gives evidence against the null hypothesis and for the alternative.
If the observed results provide strong evidence that the data did not arise by random chance alone, then the research result is called statistically significant.

Section 1.4 Other Chance Models

Ron Artest, choker at the line?
In the 2009-10 regular season, Ron Artest made 68.8% of his free throws, similar to his career average. In his first 15 attempts in the playoffs, he made only 7 free throws (46.7%). Is this evidence that he is “choking” and performing significantly worse than during the regular season?

Ron Artest Example
What are the observational units? Artest’s 15 free throw attempts.
What is the variable? Whether or not he makes the free throw.
What is the statistic of interest? 7/15 ≈ 0.467, the proportion of free throws he made.

Notation
Our sample proportion (statistic) can be described using the symbol p̂ (p-hat). A parameter is a numerical summary of a variable that is either an unobservable long-run outcome or a value for an entire population. It can be described using the symbol π (pi). In our example, the null hypothesis value of the parameter is π = 0.688, and the observed statistic is p̂ = 0.467.

Hypotheses
Null hypothesis: Ron Artest’s performance at the free throw line during the 2010 NBA finals is the same as his regular season performance; his probability of making a free throw in the playoffs is 0.688.
Alternative hypothesis: Ron Artest’s performance at the free throw line during the 2010 NBA finals is worse than his regular season performance; his probability of making a free throw in the playoffs is less than 0.688.

Simulated Chance Model
Coins, cards, dice, spinners, etc.
don’t really work well here for modeling a 68.8% success rate. But we can still use the magic of an applet. (While this will be a different applet from the first two we used, it works essentially the same way.)

Ron Artest Continued
So we have moderate evidence against the null hypothesis. Let’s see what would happen if we had more data. Suppose he continued to shoot 46.7% from the free throw line, making 7 of his next 15 attempts as well, for a total of 14 out of 30. Let’s return to the applet to see how our p-value would change.

As the sample size increases, there is less variability in our null distribution. It is still centered around 0.688, but it becomes narrower and narrower. As a result, 0.467 falls further and further out in the tail, and the p-value gets smaller. This makes intuitive sense: with a larger sample size, we have more evidence.

Besides a larger sample size, how else could we get more evidence against the null hypothesis? Artest could make fewer shots. Is that what really happened? No. Artest made 4 of his next 5 shots, for a total of 11 out of 20 (55%) in the playoffs. Let’s return to the applet and see how this changes our p-value.

Exploration 1.4 Shaky Putting?
Phil Mickelson is one of the best golfers in the world. He has won the Masters Tournament three times. However, 2011 was not his best year. He seemed to struggle with his putting and switched to a “belly putter” late in the year.
Was Mickelson a poor putter in 2011? In this exploration, you will compare Mickelson’s 2011 record of putting from 10 feet away from the hole with that of all other professional golfers that year. Was he significantly worse than his peers?

Section 1.5 Modeling More Complex Situations

Infant preference for helper or hinderer?
(Images: helper toy; baby chooses a toy)

Helper or Hinderer?
Sixteen babies were shown two demonstrations, one with a helper toy and one with a hinderer toy.
Which toy played the helper and the order of the demonstrations were both randomized. When presented with the two toys (with which toy was on the left and which on the right also randomized), 14 of the babies chose the helper toy.
How is this experiment different from any we have looked at so far?
The key difference is that each attempt was made by a different baby. Our chance model implies that each baby has the same chance of choosing the helper toy (50%). It could be that some babies choose randomly and some do not; we will talk about this in our conclusion. Let’s run the test.
Null hypothesis: Each baby is randomly choosing one of the two toys. (The babies choose the helper toy 50% of the time in the long run.)
Alternative hypothesis: The babies are not choosing randomly but show a preference for the helper toy. (The babies choose the helper toy more than 50% of the time in the long run.)
We can use any of our applets to test this. Remember that our sample proportion is 14 out of 16 (0.875).
So what can we conclude? Do all the babies prefer the helper toy? Do some of the babies prefer the helper toy?
Because we had a low p-value, we can conclude that not all of the babies are choosing randomly and that at least some of them prefer the helper toy. Can we make conclusions beyond these 16 babies?

Which Tire?
Two students missed a chemistry exam because of excessive partying but blamed their absence on a flat tire. The professor allowed them to take a make-up exam and sent them to separate rooms to take it. The first question, worth 5 points, was quite easy. The second question, worth 95 points, asked: Which tire was flat?
How would you answer this question?
• Driver’s side front
• Passenger’s side front
• Driver’s side rear
• Passenger’s side rear

Exploration 1.5: Tire Story Falls Flat
We will use the data from class to determine if students have a preference for picking one of the four tires.
This is similar to the helper-hinderer example because our observational units are different people. Let’s work on Exploration 1.5 (page 50).
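The chance models in Sections 1.4 and 1.5 share one structure: n independent attempts, each a success with long-run probability π when the null hypothesis is true. The Python below is a minimal sketch assuming that model. Instead of simulating as the applets do, it computes the exact binomial probability that such a simulation approximates, and it maps each p-value to the chapter's strength-of-evidence guidelines. The function names are hypothetical, and sending a p-value that falls exactly on a cutoff to the stronger category is an assumption, since the guidelines use strict inequalities.

```python
from math import comb


def binom_pmf(j, n, pi):
    """Probability of exactly j successes in n independent attempts."""
    return comb(n, j) * pi**j * (1 - pi) ** (n - j)


def p_value(observed, n, pi, direction):
    """Exact one-sided p-value under the chance model 'each attempt succeeds
    with probability pi'. direction is 'greater' or 'less', matching the
    alternative hypothesis."""
    if direction == "greater":
        tail = range(observed, n + 1)  # at least as many successes as observed
    elif direction == "less":
        tail = range(0, observed + 1)  # at most as many successes as observed
    else:
        raise ValueError("direction must be 'greater' or 'less'")
    return sum(binom_pmf(j, n, pi) for j in tail)


def strength_of_evidence(p):
    """Guideline categories; a p-value exactly on a cutoff is assigned to the
    stronger category (an assumption)."""
    if p > 0.10:
        return "not much evidence"
    if p > 0.05:
        return "moderate evidence"
    if p > 0.01:
        return "strong evidence"
    if p > 0.001:
        return "very strong evidence"
    return "extremely strong evidence"


# Ron Artest: null pi = 0.688; the alternative is "less than 0.688".
artest = {
    "7 of 15": p_value(7, 15, 0.688, "less"),
    "14 of 30": p_value(14, 30, 0.688, "less"),
    "11 of 20": p_value(11, 20, 0.688, "less"),
}
# Helper or hinderer: null pi = 0.5; the alternative is "more than 0.5".
helper = p_value(14, 16, 0.5, "greater")

for label, p in artest.items():
    print(f"Artest {label}: p = {p:.4f} ({strength_of_evidence(p)})")
print(f"Helper toy 14 of 16: p = {helper:.4f} ({strength_of_evidence(helper)})")
```

For 7 of 15 the p-value is about 0.06, which the guidelines call moderate evidence. The 14-of-30 p-value is smaller (same proportion, more data) and the 11-of-20 p-value is larger (a higher observed proportion), matching the Artest discussion. The helper-toy p-value is exactly 137/65,536, about 0.002. For Exploration 1.5, the same `p_value` function could be used with π = 0.25, assuming you count how many students in class pick one particular tire.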