Chapter 1 Statistical Thinking
• What is statistics?
• Why do we study statistics?

Statistical Thinking
• The science of collecting, organizing, and analyzing data
• The mathematics of the collection, organization, and interpretation of numerical data
• The branch of mathematics that studies the methods of collecting and analyzing data
• A branch of applied mathematics concerned with the collection and interpretation of quantitative data and the use of probability theory to estimate population parameters

Statistical Thinking
Statistics is a discipline concerned with:
  – designing experiments and other data collection,
  – summarizing information to aid understanding,
  – drawing conclusions from data, and
  – estimating the present or predicting the future.

Statistical Thinking
• "I like to think of statistics as the science of learning from data...." Jon Kettenring, ASA President, 1997
• Steps of statistical analysis involve:
  – collecting information (Data Collection)
  – evaluating the information (Data Analysis)
  – drawing conclusions (Statistical Inference)

Statistical Thinking
• What type of information?
  – A test group's favorite amount of sweetness in a blend of fruit juices
  – The number of men and women hired by a city government
  – The velocity of a burning gas on the sun's surface
  – Clinical trials to investigate the effectiveness of new treatments
  – Field experiments to evaluate irrigation methods
  – Measurements of water quality

Statistical Thinking
Problems
• Is a new treatment for heart disease more effective than a standard one?
• Is using high-octane gas beneficial to car performance?
• Does reading an article in statistics improve students’ statistics grades?

Statistical Thinking
• Is a new treatment for heart disease more effective than a standard one?
  – Pick, say, 100 heart patients
  – Divide them into two groups, 50 in each group
  – Group 1: New treatment
  – Group 2: Standard treatment

Statistical Thinking
Results
• 40 out of 50 of Group 1 patients improved
• 30 out of 50 of Group 2 patients improved
• Conclusion: New treatment is more effective!

Statistical Thinking
• How do you divide the patients?
• Have you controlled for other factors (fitness level, lifestyle, age, etc.)?
• How do you decide who gets which treatment? Ethical issues?

Statistical Thinking
Comparing Test Scores
• Select 10 students and give them a journal article in statistics.
• Test their knowledge of the article and record their scores.
• Repeat the test after they take STT 231.

Statistical Thinking
Result
• 8 out of the 10 students improved their scores.
• Question: Can we conclude that reading the article has improved students’ knowledge about statistics?

Statistical Thinking
Look at worst-case scenarios:
“Under the assumption that the new treatment is no better than the standard one, what is the chance that 80% of the patients benefit from this treatment?”
“Under the assumption that STT 231 brings no benefit, how likely is it that we see 80% of the students improve their scores?”

Statistical Thinking
Need a model to answer these questions!
If STT 231 is not beneficial, then each student’s score is equally likely to go up or down (50% chance each). This is equivalent to flipping a coin:
• 50% chance you get Heads
• 50% chance you get Tails

Statistical Thinking
• Comparing pre- and post-test scores for 10 students is equivalent to
  – flipping a coin 10 times and calculating the chance of observing 8 heads (8H)
• Relevant Questions:
  – Will the chance of observing heads 80% of the time depend on the number of students involved in the experiment?
  – Will this chance go up, go down, or remain the same if you repeat the experiment with 200 students?

Statistical Thinking
• Suppose the chance of observing 8 improvements in 10 trials is 4.4%. What does this mean?
  – If STT 231 is not beneficial, then there is a 4.4% chance that we will observe 8 out of 10 students’ scores improve.
  – There is little hope that 8 students’ scores will improve just by CHANCE.

Statistical Thinking
• Suppose the chance of observing 8 improvements in 10 trials is 4.4%.
• We observed 8 out of 10 students’ scores improve.
• What does this mean?

Statistical Thinking
• Either the course is highly effective,
• or the course is ineffective and we observed an unlikely event.
• We do not know which one!

Statistical Thinking
• Suppose there is only a “small” chance that an event happens by CHANCE.
• Then observing that event is strong evidence that the change we observe did not happen by CHANCE.
• Hence there is strong evidence that some factor is responsible for this change.

Statistical Thinking
• The course is highly effective!!
• Reasoning: What we observed would be very unlikely if the course were ineffective. Hence the course is effective.
• Seeing 80% of the students improve their scores would be unlikely if the course were ineffective.

Statistical Thinking
Some Remarks
For questions that involve uncertainty:
  – Carefully formulate the question you want to answer (Modeling)
  – Collect data
  – Summarize, analyze, and present data
  – Draw conclusions. Conclusions always include uncertainty
  – Support your conclusions by quantifying how confident you are in them.

Chapter 2 A Design Example
The Polio Vaccine Case
• Caused by a virus
• Especially deadly in children
• Big problem during the first half of the 20th century
• Develop a vaccine to fight the disease
• Jonas Salk (~1950)

A Design Example
• Problem with vaccines:
  – Are they safe?
  – Are they effective?
• Undertake a large-scale trial to answer these questions

A Design Example
• Case 1: A Simple Study
  – Distribute the vaccine widely (under the assumption it is safe)
  – A decrease in the number of polio cases after the vaccine provides evidence that the vaccine is effective
• Problem?
A Design Example
Problems
• Lack of control group
  – Is the decrease in the number of polio cases due to the vaccine or to other factors?
• How reliable is the assumption “vaccine is safe”?

A Design Example
• Case 2: Adding a Control Group
  – Have two groups
    • Control group: gets salt solution
    • Treatment group: gets the actual vaccine

A Design Example
• Example (Observed Control Study)
  – Control group: all 1st and 3rd grade children
  – Treatment group: all 2nd graders
• Assumption:
  – Age difference between control and treatment groups was felt to be unimportant

A Design Example
• Potential Problems:
  – Parents of 2nd graders may not agree to vaccinating their kids
  – Parents of sicker kids are most likely to accept the vaccine
  – More educated parents tend to accept the vaccine
  – Parents of sick 1st and 3rd graders may object that their kids are not getting treatment

A Design Example
• Difficulty in diagnosing polio
  – Extreme cases of polio are easy to diagnose
  – Less severe cases of polio have symptoms similar to other common illnesses

A Design Example
• Potential Problems
  – Physicians are aware of who has received the vaccine and who has not
  – A less severe case of polio in a 2nd grader (who has received the vaccine) may be wrongly diagnosed as another illness
  – A less severe case in a 1st or 3rd grader will most likely be diagnosed as polio

A Design Example
• Case 3: Randomization, Placebo Control, Double Blindness
  – Random assignment of control and treatment groups
    • Select a child
    • Flip a coin
      H: Treatment group
      T: Control group

A Design Example
• Placebo Control
  – Kids in the control group receive salt solution
• Double Blind
  – Neither the child,
  – nor the parents,
  – nor the doctors/nurses who make the diagnosis of polio
  know whether a kid receives the vaccine or the placebo

A Design Example
Summary
• In designing experiments
  – Introduce some sort of control group
  – Use randomization to avoid bias in selection and assignment of subjects for the study
  – Double-blind experiments give protection against biases, both intentional and unintentional

A Design Example
• Perform the experiment on a large number of subjects (in the polio case, on the order of millions of kids)
• Repeat the experiment several times before making definitive conclusions

A Design Example
Basic Principles of Experimental Designs
• Randomization
• Blocking (Treatment/Control Groups)
• Replication
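
Both chapters lean on a fair coin: Chapter 1 uses it as the model for a course that brings no benefit, and Chapter 2 uses it to decide which child gets the vaccine. The following is a minimal Python sketch of both ideas; it is not part of the original notes. The function names (`chance_of_heads`, `chance_of_at_least`, `assign_by_coin_flip`), the `seed` parameter, and the child labels are hypothetical, chosen only for illustration.

```python
import random
from math import comb


def chance_of_heads(n_flips, n_heads, p=0.5):
    """Chance of exactly n_heads heads in n_flips independent coin flips."""
    return comb(n_flips, n_heads) * p**n_heads * (1 - p) ** (n_flips - n_heads)


def chance_of_at_least(n_flips, n_heads, p=0.5):
    """Chance of n_heads or more heads in n_flips independent coin flips."""
    return sum(chance_of_heads(n_flips, k, p) for k in range(n_heads, n_flips + 1))


def assign_by_coin_flip(subjects, seed=None):
    """Flip a fair coin for each subject: heads -> treatment, tails -> control.

    The seed argument is an assumption added here so a run can be repeated.
    """
    rng = random.Random(seed)
    treatment, control = [], []
    for subject in subjects:
        if rng.random() < 0.5:      # "heads"
            treatment.append(subject)
        else:                       # "tails"
            control.append(subject)
    return treatment, control


if __name__ == "__main__":
    # Chapter 1: 8 of 10 students improve under the "no benefit" coin model.
    print(f"P(exactly 8 of 10)  = {chance_of_heads(10, 8):.4f}")     # 0.0439
    print(f"P(at least 8 of 10) = {chance_of_at_least(10, 8):.4f}")  # 0.0547

    # Same 80% improvement rate, but with 200 students instead of 10.
    print(f"P(at least 160 of 200) = {chance_of_at_least(200, 160):.3e}")

    # Chapter 2: random assignment of (made-up) children to the two groups.
    children = [f"child_{i}" for i in range(1, 11)]
    treatment, control = assign_by_coin_flip(children, seed=231)
    print("Treatment:", treatment)
    print("Control:  ", control)
```

Under the fair-coin model, exactly 8 improvements out of 10 has probability 45/1024, about 4.4%, which matches the figure quoted in Chapter 1; 8 or more improvements has probability 56/1024, about 5.5%. The same 80% rate among 200 students is far less likely under that model, which bears on the "Relevant Questions" slide. Note that coin-flip assignment generally does not produce two groups of exactly equal size.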