Chapter 1: Drawing Statistical Conclusions
1.1 Case Studies
1.1.1 Motivation and Creativity
Strong evidence was found that subjects given intrinsic motivation scored higher than those
given extrinsic motivation. It is estimated that the difference between the mean scores is 4.1 points,
with a 95% confidence interval from 1.3 to 7.0 points.
• Can the researcher conclude that the difference in scores is caused by the group membership? Why or why not?
• To what group of people do these results apply? Why?
1.1.2 Sex Discrimination in Employment
The mean starting salary for males is estimated to be $560 to $1080 larger than the mean
starting salary for females (95% confidence interval).
• Can we conclude that the difference in salaries is due to gender? Why or why not?
• If not, could you design an experiment to establish a causal relationship?
• What population can these results be inferred to?
1.2 Statistical Inference and Study Design
• What does the term inference mean?
• What is a statistical inference?
• Why must every statistical statement include a measure of uncertainty?
• Interpreting statistical results is crucially linked to study design! That is, the inferences
we can make DEPEND on the study’s design.
1.2.1 Does A Cause B?
Were the units (subjects) randomly assigned to groups?
What does randomly assigned actually mean?
• YES:
– Study is a randomized experiment.
– Cause-and-effect statements can be made.
– What are confounding variables?
– Why does randomization allow for causal statements?
– Unfair randomization?
• NO:
– Study is an observational study.
– Cause-and-effect statements are NOT justified from the design
– Remember, “A causes B” is not equivalent to “A is associated with B”!
– Don’t worry, observational studies are useful! See p 7.
1.
2.
3.
4. Sometimes observational studies are all that can be done.
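The contrast between the YES and NO branches above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variable `ability` is a hypothetical confounding variable, and the self-selection rule is invented for illustration. It compares how well the confounder is balanced between groups when subjects choose their own group versus when groups are assigned at random.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 2000                        # hypothetical number of subjects
ability <- rnorm(n)              # hypothetical confounding (lurking) variable

# Observational-style grouping: subjects with higher ability tend to end up in group A
self_selected <- ifelse(ability + rnorm(n) > 0, "A", "B")

# Randomized grouping: exactly half of the subjects go to each group, by chance alone
randomized <- sample(rep(c("A", "B"), each = n / 2))

# Difference in mean ability between groups A and B under a given grouping
diff_in_means <- function(group) {
  mean(ability[group == "A"]) - mean(ability[group == "B"])
}
diff_self <- diff_in_means(self_selected)
diff_rand <- diff_in_means(randomized)
round(c(self_selected = diff_self, randomized = diff_rand), 2)
```

Under self-selection the groups differ systematically in `ability`, so any difference in the response could be due to ability rather than the treatment. Under randomization the groups differ only by chance, which is what justifies a cause-and-effect statement.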
Experimental Design
A field of statistics studying methods for randomly assigning units to groups (after they have
been selected for the study).
• For starters:
What are the independent units to which the treatments are applied?
• For a 2-treatment randomized experiment all units have the same chance of being in group A
(or group B).
• Examples of simple randomization procedures:
– If there are 20 experimental units, flip a coin for each one. The units that get the first 10 heads
are placed in group A, the remaining units in group B.
R> sample(c("A","B"), 20, replace=TRUE) ## one simulated coin flip per unit
– Number all units from 1 to 20 and pick 10 numbers at random; those units get treatment A, the rest get treatment B.
R> sample(1:20, 10, replace=FALSE)
## 1:20 means the integers 1,2,...,19,20
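The second procedure can be turned into a complete assignment table. This is a minimal sketch, assuming 20 units labeled 1 to 20; the seed value and the names `group_a`, `group_b`, and `assignment` are illustrative choices, not part of the notes.

```r
set.seed(2024)                          # arbitrary seed so the assignment can be reproduced
units <- 1:20                           # labels for the 20 experimental units
group_a <- sort(sample(units, 10, replace = FALSE))  # 10 labels drawn at random for treatment A
group_b <- setdiff(units, group_a)      # the remaining 10 units get treatment B

# One row per unit, recording which treatment it was assigned
assignment <- data.frame(
  unit  = units,
  group = ifelse(units %in% group_a, "A", "B")
)
table(assignment$group)                 # counts per group
```

Drawing 10 labels without replacement guarantees equal group sizes, whereas independent coin flips (one per unit) do not unless a cap on group size is imposed.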
1.2.2 Were the units used randomly sampled from some larger population?
• YES:
– Results CAN be inferred to the larger population (the one sampled from)
– What does random sampling attempt to ensure?
– One method: Simple Random Sampling (SRS). Each member of the population has
an equal chance of being selected, independent of which other members are selected.
• NO:
– Results can NOT be inferred to any larger population (only apply to the units in the
study).
– Examples of non-random sampling techniques:
– Inference beyond your sample is only as good as your assumption that your sample
is as representative as a random sample.
Sampling
Again, a whole field of statistics studies methods for randomly sampling units from a population.
At MSU, Stat 446.
• Example of choosing a simple random sample (SRS) of size n:
1. Assign each unit in the population a number (a frame), usually successive integers.
2. For a random sample of n units, use a computer to draw n distinct integers at random,
without replacement, from the integers 1 to N (the size of the frame), such that each
has an equal chance of being selected. For example, with N = 2000 and n = 27:
R> sample(2000, 27, replace=FALSE) ## here 2000 is short for 1:2000
3. Include the units with the corresponding n labels in the study. Do not second-guess
your random number generator or you defeat the purpose of random sampling!
• Other random sampling procedures:
– Systematic sampling, random cluster sampling, variable probability sampling, adaptive sampling, etc.
• Does random sampling apply to randomized experiments or observational studies or both?
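The SRS steps above can be sketched end to end in R. The data frame `frame` is a hypothetical stand-in for a real list of population units, and the seed is arbitrary. The last few lines show one common version of systematic sampling (a random start, then every k-th unit), included only as an illustration of the first of the other procedures named above.

```r
set.seed(446)                                  # arbitrary seed for reproducibility
N <- 2000                                      # size of the frame
n <- 27                                        # desired sample size
frame <- data.frame(id = 1:N)                  # hypothetical frame: one row per population unit

# Simple random sample: n distinct labels, each equally likely
srs_ids <- sample(N, n, replace = FALSE)
srs <- frame[frame$id %in% srs_ids, , drop = FALSE]   # the units included in the study

# Systematic sample (one common version): random start, then every k-th unit
k <- floor(N / n)                              # sampling interval
start <- sample(k, 1)                          # random starting label between 1 and k
sys_ids <- seq(start, by = k, length.out = n)

nrow(srs)                                      # number of units selected by SRS
```

Once `srs_ids` has been drawn, the corresponding units are used as they are. Redrawing because the sample "looks wrong" defeats the purpose of random sampling.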
Comments:
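• When writing a Scope of Inference, answer the TWO questions (Were the units randomly assigned
to groups? Were the units randomly sampled from a larger population?) and explain the answers
with language specific to the problem.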
• Cause-and-effect conclusions may still be drawn from randomized experiments without
random sampling, but they apply to the units in the study. Extending causation from the sample
to some larger population then rests on expert opinion rather than on the study design.
• An observational study based on a non-random sample is scary. We have to pretend that
the units are representative of the population, but this should be accompanied by strong
justification and even then the potential for bias cannot be ruled out. Always report such
issues and let the reader evaluate the strength of conclusions.
• Do you know of a counter-example to this statement? “All studies are run on convenience
samples.”
• Best design for an experiment/study:
1. Randomly select units from population of interest
2. Randomly assign units to treatment (study) groups
3. Replicate by assigning multiple units to treatment groups (or choosing multiple units
per group)
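The three steps of the best design can be combined in a few lines of R. This is a sketch under assumed numbers: a frame of 2000 units, 20 units selected, and two treatments with 10 replicates each. None of these values come from a particular study.

```r
set.seed(10)                                      # arbitrary seed for reproducibility
N <- 2000                                         # hypothetical size of the population frame

# Step 1: randomly select 20 units from the population of interest
selected <- sample(N, 20, replace = FALSE)

# Steps 2 and 3: randomly assign the selected units to treatments,
# with 10 replicate units per treatment group
treatment <- sample(rep(c("A", "B"), each = 10))

design <- data.frame(unit = selected, treatment = treatment)
design[order(design$unit), ]                      # the study plan, sorted by unit label
```

Random selection supports inference to the population, random assignment supports cause-and-effect statements, and the replicates provide the within-group variability needed to measure uncertainty.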
1.2.3 Chance Models and Statistical Inference
• Random sampling and random treatment assignment allow us to use probability models
to link data to parameters of the population.
• Statistical inference depends on a probability model.
• Probability models allow us to measure uncertainty.
• Statistical uncertainty covers the variability due to random sampling from the population
and random allocation to groups.
• It does not cover measurement error or bias in the measurements.
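As an illustration of how a chance model measures uncertainty, the sketch below builds a randomization distribution for a two-group randomized experiment. The scores are hypothetical values invented purely for illustration, and the number of re-randomizations (`reps`) is an arbitrary choice.

```r
set.seed(3)                                   # arbitrary seed for reproducibility

# Hypothetical scores, invented purely for illustration
group_a <- c(22, 25, 19, 27, 24, 21)
group_b <- c(18, 20, 17, 23, 16, 19)
scores <- c(group_a, group_b)
observed <- mean(group_a) - mean(group_b)     # observed difference in group means

# Chance model: if treatment had no effect, any split of the 12 scores
# into two groups of 6 was equally likely under random assignment
n_a <- length(group_a)
reps <- 5000
rand_diffs <- replicate(reps, {
  in_a <- sample(length(scores), n_a)         # labels re-randomized to group A
  mean(scores[in_a]) - mean(scores[-in_a])
})

# Proportion of re-randomizations at least as extreme as the observed difference
p_value <- mean(abs(rand_diffs) >= abs(observed))
p_value
```

The probability model here comes directly from the random assignment, which is why the design of the study determines what kind of uncertainty statement can be made.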
                    Randomized Experiment     Observational Study
Random Sample       _____________________     ___________________
Nonrandom Sample    _____________________     ___________________
Table 1: Fill in the statistical inference permitted by study designs
Moral of the story
• Time spent thinking and planning PRIOR to data collection is time well spent. You should
know what statistical analysis you are going to use BEFORE collecting your data. A well
thought out design can make statistical analysis and its interpretation “easy.”
• A statistical analysis cannot salvage a poorly designed study.