Chapter 1 Statistical Thinking
•What is statistics?
•Why do we study statistics?
Statistical Thinking
• the science of collecting, organizing, and analyzing data
• the mathematics of the collection, organization and interpretation of numerical data
• the branch of mathematics that studies methods of collecting and analyzing data
• a branch of applied mathematics concerned with the collection and interpretation of quantitative data and the use of probability theory to estimate population parameters
Statistical Thinking
Statistics is a discipline which is concerned with:
– designing experiments and other data collection,
– summarizing information to aid understanding,
– drawing conclusions from data, and
– estimating the present or predicting the future.
Statistical Thinking
• "I like to think of statistics as the science of learning from data ...." Jon Kettenring, ASA President, 1997
• Steps of statistical analysis involve:
– collecting information (Data Collection)
– evaluating the information (Data Analysis)
– drawing conclusions (Statistical Inference)
Statistical Thinking
• What type of information?
– A test group's favorite amount of sweetness in a blend of fruit juices
– The number of men and women hired by a city government
– The velocity of a burning gas on the sun's surface
– Clinical trials to investigate the effectiveness of new treatments
– Field experiments to evaluate irrigation methods
– Measurements of water quality
Statistical Thinking
Problems
• Is a new treatment for heart disease more effective than a standard one?
• Is using a high octane gas beneficial to car performance?
• Does reading an article in statistics improve students’ statistics grade?
Statistical Thinking
• Is a new treatment for heart disease more effective than a standard one?
– Pick, say, 100 heart patients
– Divide them into two groups, 50 in each group
– Group 1: new treatment
– Group 2: standard treatment
Statistical Thinking
Results
• 40 out of 50 of Group 1 patients improved
• 30 out of 50 of Group 2 patients improved
• Conclusion: New treatment is more effective!
Statistical Thinking
• How do you divide the patients?
• Have you controlled for other factors (fitness level, lifestyle, age, etc.)?
• How do you decide who gets which treatment? Are there ethical issues?
Statistical Thinking
Comparing Test Scores
• Select 10 students and give them a journal article in statistics.
• Test their knowledge about the article and record their scores
• Repeat the test after they take STT 231.
Statistical Thinking
Result
• 8 out of the 10 students improved their scores.
• Question: Can we conclude that taking STT 231 has improved students’ knowledge about statistics?
Statistical Thinking
Look at worst case scenarios:
“Under the assumption that the new treatment is no better than the standard one, what is the chance that 80% of the patients who receive it improve?”
“Under the assumption that STT 231 brings no benefit, how likely is it that we see 80% of the students improve their scores?”
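The first question can be made concrete with a small calculation. The sketch below is one possible model, not necessarily the one intended in these slides: it assumes that the 70 patients who improved (40 + 30) would have improved under either treatment, and that the 50 patients in Group 1 were chosen at random from the 100. The function name and its parameters are illustrative.

```python
from math import comb

def chance_group1_at_least(k_obs, group_size=50, total=100, improved=70):
    """Chance that a randomly chosen group contains at least k_obs improvers.

    Assumption (not from the slides): the treatment has no effect, so the
    same `improved` patients would have improved in either group, and
    Group 1 is a random draw of `group_size` patients out of `total`.
    """
    all_groupings = comb(total, group_size)
    upper = min(group_size, improved)
    favorable = sum(
        # ways to pick k improvers and (group_size - k) non-improvers
        comb(improved, k) * comb(total - improved, group_size - k)
        for k in range(k_obs, upper + 1)
    )
    return favorable / all_groupings

p = chance_group1_at_least(40)
print(f"Chance of 40 or more improvers in Group 1 by luck alone: {p:.4f}")
```

A small value here means the 40-versus-30 split would be hard to explain by the luck of the group assignment alone.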
Statistical Thinking
Need a model to answer these questions!!
If STT 231 is not beneficial, then each student’s score is equally likely to go up or down (50% chance each).
This is equivalent to flipping a coin:
• 50% chance you get heads
• 50% chance you get tails
Statistical Thinking
• Comparing pre- and post-test scores for 10 students is equivalent to
– flipping a coin 10 times and calculating the chance of observing 8 heads
• Relevant Questions:
– Will the chance of observing heads 80% of the time depend on the number of students involved in the experiment?
– Will this chance go up, go down, or remain the same if you repeat the experiment with 200 students?
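The two questions above can be explored with the binomial formula for a fair coin. This is a minimal sketch; the helper names `prob_exact` and `prob_at_least` are illustrative, and the model assumes each flip is independent with a 50% chance of heads.

```python
from math import comb

def prob_exact(n, k, p=0.5):
    """Chance of exactly k heads in n independent flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(n, k, p=0.5):
    """Chance of k or more heads in n independent flips."""
    return sum(prob_exact(n, j, p) for j in range(k, n + 1))

for n in (10, 200):
    k = n * 8 // 10  # 80% of the flips come up heads
    print(f"n={n}: exactly {k} heads: {prob_exact(n, k):.3g}, "
          f"at least {k} heads: {prob_at_least(n, k):.3g}")
```

Comparing the two rows shows how the chance of seeing 80% heads changes as the number of flips grows from 10 to 200.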
Statistical Thinking
• Suppose the chance of observing exactly 8 heads in 10 flips is 4.4%. What does this mean?
– If STT 231 is not beneficial, then there is a 4.4% chance that we will observe 8 out of 10 students’ scores improve.
– It is unlikely that 8 students’ scores would improve just by CHANCE
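The 4.4% figure can also be checked by simulation instead of a formula. The sketch below repeats the 10-flip experiment many times under the assumption that each score goes up with probability 0.5; the function name, the number of repetitions, and the seed are illustrative choices.

```python
import random

def simulate_chance(n_students=10, n_improved=8, n_reps=100_000, seed=0):
    """Estimate the chance that exactly n_improved of n_students improve
    when each student's score goes up with probability 0.5 (a fair coin)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_reps):
        heads = sum(rng.random() < 0.5 for _ in range(n_students))
        if heads == n_improved:
            hits += 1
    return hits / n_reps

estimate = simulate_chance()
print(f"Estimated chance of exactly 8 improvements out of 10: {estimate:.3f}")
```

The estimate should land close to 45/1024, the exact chance of 8 heads in 10 fair flips, with a small amount of simulation noise.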
Statistical Thinking
• Suppose the chance of observing exactly 8 heads in 10 flips is 4.4%.
• We observed 8 out of 10 students’ scores improve.
• What does this mean?
Statistical Thinking
• Either the course is highly effective,
• or the course is ineffective and we observed an unlikely event.
• We do not know which one!
Statistical Thinking
• Suppose an observed outcome has only a “small” chance of happening by CHANCE alone.
• Then observing it is strong evidence that the change did not happen by CHANCE.
• Hence there is strong evidence that some factor is responsible for this change.
Statistical Thinking
• The course is highly effective!!
• Reasoning: What we observed would be very unlikely if the course were ineffective. Hence we conclude that the course is effective.
• An improvement in 80% of the students’ scores would be unlikely if the course were ineffective.
Statistical Thinking
Some Remarks
For questions that involve uncertainty:
– Carefully formulate the question you want to answer (Modeling)
– Collect Data
– Summarize, analyze and present data
– Draw Conclusions. Conclusions always include uncertainty
– Support your conclusions by quantifying how confident you are about your conclusions.
Chapter 2 A Design Example
The Polio Vaccine Case
• Caused by a virus
• Especially deadly in children
• Big problem during the first half of the 20th century
• Develop vaccine to fight the disease
• Jonas Salk (~1950)
A Design Example
• Problem with vaccines:
– Are they safe?
– Are they effective?
• Undertake a large scale trial to answer these questions
A Design Example
• Case 1: A Simple Study
– Distribute the vaccine widely (under the assumption it is safe)
– Decrease in the number of polio cases after the vaccine provides evidence that the vaccine is effective
• Problem?????
A Design Example
Problems
• Lack of control group
– Is the decrease in the number of polio cases due to the vaccine or to other factors?
• How reliable is the assumption “vaccine is safe”?
A Design Example
• Case 2: Adding a Control Group
– Have two groups
• Control group: gets a salt solution
• Treatment group: gets the actual vaccine
A Design Example
• Example (Observed Control Study)
– Control group: all 1st and 3rd grade children
– Treatment group: all 2nd graders
• Assumption:
– Age difference between control and treatment group was felt to be unimportant
A Design Example
• Potential Problems:
– Parents of 2nd graders may not agree to vaccinating their kids
– Parents of sicker kids are most likely to accept the vaccine
– More educated parents tend to accept the vaccine
– Parents of sick 1st and 3rd graders may object that their kids are not getting treatment
A Design Example
• Difficulty in diagnosing polio
– Extreme cases of polio are easy to diagnose
– Less severe cases of polio have symptoms similar to other common illnesses
A Design Example
• Potential Problems
– Physicians are aware of who has received the vaccine and who has not
– A less severe case of polio in a 2nd grader (who has received the vaccine) may be wrongly diagnosed as another illness
– A less severe case in a 1st or 3rd grader will most likely be diagnosed as polio
A Design Example
• Case 3: Randomization, Placebo Control, Double Blindness
– Random assignment of control and treatment groups
• Select a child
• Flip a coin: H means Treatment Group, T means Control Group
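The coin-flip assignment described above can be sketched in a few lines. This is an illustration only: the function name `assign_groups`, the child IDs, and the seed are hypothetical and do not come from the polio trial itself.

```python
import random

def assign_groups(children, seed=None):
    """Assign each child to a group by an independent fair coin flip:
    H sends the child to the treatment group, T to the control group."""
    rng = random.Random(seed)
    groups = {"treatment": [], "control": []}
    for child in children:
        flip = rng.choice("HT")  # one fair coin flip per child
        groups["treatment" if flip == "H" else "control"].append(child)
    return groups

children = [f"child_{i}" for i in range(1, 11)]  # hypothetical IDs
groups = assign_groups(children, seed=1)
print(len(groups["treatment"]), "in treatment,", len(groups["control"]), "in control")
```

Because every child gets an independent flip, the two groups need not be the same size, but neither parents nor physicians influence who ends up where.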
A Design Example
• Placebo Control
– Kids in the control group receive salt solution
• Double Blind
– Neither the child
– nor the parents
– nor the doctors/nurses who make the diagnosis of polio know whether a kid receives the vaccine or the placebo
A Design Example
Summary
• In designing experiments
– Introduce some sort of control group
– Use randomization to avoid bias in selection and assignment of subjects for the study
– Double blind experiments give protection against biases, both intentional and unintentional
A Design Example
• Perform the experiment on a large number of subjects (in the polio case, on the order of millions of kids)
• Repeat the experiment several times before making definitive conclusions
A Design Example
Basic Principles of Experimental Designs
• Randomization
• Blocking (Treatment/Control Groups)
• Replication