Using Simulations to Teach
Statistical Inference
Beth Chance, Allan Rossman (Cal Poly)
ICTCM 2011
Joint Work with

Soma Roy, Karen McGaughey (Cal Poly)
Alex Herrington (Cal Poly undergrad)
John Holcomb (Cleveland State)
George Cobb (Mt. Holyoke)
Nathan Tintle, Jill VanderStoep, Todd Swanson (Hope College)
This project has been supported by the
National Science Foundation, DUE/CCLI
#0633349
Outline


Motivation/Goals
Examples






Binomial process, randomized experiment - binary response,
randomized experiment - quantitative response
Series of lab assignments
Discussion points
Student feedback, Evaluation results
Design principles & implementation
Observations, Open questions
Motivation

Cobb (2007) – 12 reasons to teach
permutation tests…





Model is “simple and easily grasped”
Matches production process, links data production
and inference
Role for tactile and computer simulations
Easily extendible to other designs (e.g., blocking)
Fisherian logic
-- “The Introductory Statistics Course:
A Ptolemaic Curriculum” (TISE)
Goals

Develop an introductory curriculum that
focuses on randomization-based approach to
inference



vs. using simulation to teach traditional inference
From beginning of course, permeate all topics
Improve understanding of inference and
statistical process in general

More modern (computer intensive) and flexible
approach to inferential analysis
Brief overview of labs


Case-study focus
Pre-lab



50-minute (computer) lab period
Online instructions




Background, Review questions submitted in advance
Directed questions following statistical process
Embedded applets or statistical software
Application/Extension
Lab report with partner
Example 1: Friend or Foe
(Helper/Hinderer)







Videos
Research question
Pre-lab
Descriptive analysis
Introduction of null hypothesis,
p-value terminology
Plausible values
Conclusions
Discussion Points

Can this be done on day one?

Yes if can motivate the simulation


Loaded dice
Before reveal the data?
<<After tactile simulation>> How many infants
would need to choose the helper toy for you to be
convinced the choice was not made “at random,”
but they actually prefer the helper toy?

Many students can reason inferentially



“If a choice is made at complete random, then
having 13 infants would be highly unlikely”
“Based on the coin flipping experiment, the results
stated that at/over 12 was extremely rare.
Therefore, at least 12 infants …”
“Would be around 12-16 because it seems highly
unlikely that given a 50-50 option 12-16 would
choose the helper toy”
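
A minimal Python sketch of the coin-flipping simulation behind this question. It assumes 16 infants (the count implied by the "12-16" answers and the "one coin 16 times" discussion point) and a 50/50 chance model. The names `simulate_heads` and `tail_proportion` are illustrative and do not come from the lab applets.

```python
import random

def simulate_heads(n_flips, n_reps, seed=0):
    """Number of heads in each of n_reps sets of n_flips fair coin flips."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n_flips))
            for _ in range(n_reps)]

def tail_proportion(counts, cutoff):
    """Proportion of simulated counts at or above the cutoff."""
    return sum(c >= cutoff for c in counts) / len(counts)

# Assumption: 16 infants, each choosing "at random" (50/50) under the null.
counts = simulate_heads(n_flips=16, n_reps=10_000)

# How rare is each candidate cutoff under the chance model?
for cutoff in range(9, 17):
    print(cutoff, tail_proportion(counts, cutoff))
```

Reading down the printed column shows where "at or above this many" stops being a common outcome of pure chance, which is the distributional comparison the question is after.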
<<After tactile simulation>> How many infants
would need to choose the helper toy for you to be
convinced the choice was not made “at random,”
but they actually prefer the helper toy?

But maybe not as well “distributionally”

Is it unusual? = “barely over half”
vs. unusual compared to distribution
Examine language carefully
“Unlikely that choice is random”
“Prove”
“Simulate”, “Repeated this study”
“At random” = 50/50, “model”
“Random” = anything is possible
Discussion Points

Can this be done on day one?

Yes if can motivate the simulation





Loaded dice
Before reveal the data?
Enough understanding of “chance model”?
Use of class data instead? (“observed” vs. research
study)
Yes, if return to and build on the ideas throughout
the course

So what comes next?
Discussion Points

Tactile simulation


Population vs process




Defining the parameter
3Ss: statistic, simulate, strength of evidence


One coin 16 times vs. 16 coins
“could have been” distribution of data
“what if the null was true” distribution of statistic
Fill in the blank wording
Timing of final report

Follow-up in-class discussion
Example 2: Two Proportions

Is Yawning Contagious?





Modelling entire process: data collection,
descriptive statistics, inferential analysis,
conclusions
Parallelisms to first example
Could random assignment alone produce a
difference in the group proportions at least this
extreme?
Card shuffling, recreate two-way table
Extend to own data
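
The card-shuffling step can be sketched in Python as follows. The counts are hypothetical (11 of 30 yawners in one group, 4 of 20 in the other), chosen only for illustration and not taken from the study. `shuffle_diffs` is an illustrative name.

```python
import random

def shuffle_diffs(successes_a, n_a, successes_b, n_b, n_reps=5000, seed=0):
    """Re-deal the outcome 'cards' to two groups and record the
    difference in group proportions for each shuffle."""
    rng = random.Random(seed)
    total_successes = successes_a + successes_b
    # One card per subject: 1 = yawned, 0 = did not yawn.
    cards = [1] * total_successes + [0] * (n_a + n_b - total_successes)
    diffs = []
    for _ in range(n_reps):
        rng.shuffle(cards)                        # random assignment alone
        group_a, group_b = cards[:n_a], cards[n_a:]
        diffs.append(sum(group_a) / n_a - sum(group_b) / n_b)
    return diffs

# Hypothetical two-way table (assumption, not the study's data).
observed = 11 / 30 - 4 / 20
diffs = shuffle_diffs(11, 30, 4, 20)

# Proportion of shuffles with a difference at least as extreme as observed.
p_value = sum(d >= observed - 1e-12 for d in diffs) / len(diffs)
print(round(observed, 3), p_value)
```

The total number of yawners is held fixed in every shuffle, which mirrors recreating the two-way table by hand with cards.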
Lab Instructions
Exam Questions



Horizontal axis
Shade p-value
Make up a research question
Discussion Points






Starting with a significant result, but when are
students ready to discuss a non-significant one?
How critical is authentic data?
Choice of statistic (count vs. difference in
proportion)
Role of traditional symbols and notation?
Visualization of bar graphs from trial to trial
Implementation of predict and test
Example 3: Two means

Are there lingering effects to sleep
deprivation?



Randomized experiment
Quantitative data
Parallel inferential reasoning process


Index cards
Possible follow-up/extensions: what if -4.33?,
medians, plausible values
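
A sketch of the index-card shuffle for a quantitative response, written so the statistic can be swapped from means to medians. The two samples are hypothetical improvement scores invented for illustration. `permutation_diffs` is an illustrative name.

```python
import random
from statistics import mean, median

def permutation_diffs(group_a, group_b, stat=mean, n_reps=5000, seed=0):
    """Shuffle all responses, re-deal them into two groups of the original
    sizes, and record stat(group A) - stat(group B) for each shuffle."""
    rng = random.Random(seed)
    cards = list(group_a) + list(group_b)   # one index card per subject
    n_a = len(group_a)
    diffs = []
    for _ in range(n_reps):
        rng.shuffle(cards)
        diffs.append(stat(cards[:n_a]) - stat(cards[n_a:]))
    return diffs

# Hypothetical data (assumption): scores for two randomly assigned groups.
unrestricted = [25.2, 14.5, 20.0, 12.6, 9.6, 21.1, 18.6, 16.3]
deprived = [10.5, 6.3, 12.0, 4.4, -3.2, 9.9, 7.6, 11.1]

observed = mean(unrestricted) - mean(deprived)
diffs = permutation_diffs(unrestricted, deprived)
p_value = sum(d >= observed - 1e-12 for d in diffs) / len(diffs)
print(round(observed, 3), p_value)

# Extension: same reasoning with medians instead of means.
observed_median = median(unrestricted) - median(deprived)
diffs_median = permutation_diffs(unrestricted, deprived, stat=median)
p_value_median = sum(d >= observed_median - 1e-12
                     for d in diffs_median) / len(diffs_median)
print(round(observed_median, 3), p_value_median)
```

Only the `stat` argument changes between the two analyses, which is one way to show students that the inferential logic does not depend on the choice of statistic.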
Discussion Points


Role of tactile simulation
Scaffolding of lab report



When should “normal-based” methods be
introduced



Introductory sentences, labeling of graphs
Write conclusion to journal
Alternative approximation to simulation
Position, method for confidence intervals
Choice of technology

Advantages/Disadvantages

Applets, Minitab, R, Fathom
Post-Lab Assessment (Fall 2010)

Following the lab comparing two groups on a
quantitative variable (65 responses)





Discuss the purpose of the simulation process
What information does the simulation process reveal
to help you answer the research question?
Essentially correct: 35.4% demonstrated
understanding of the big picture (looking at
repeated shuffles to assess whether the
observed results happened by chance)
Partially: 38.5% (one of null or comparison)
Incorrect: 26.1% (“better understand the data”)
Post-Lab Assessment (Fall 2010)

Did students address the null hypothesis?


Did students reference the random assignment?


36.9% E/ 36.9% P/ 26.2% I
Did students focus on comparing the observed
result?


33.9% E/ 38.5% P/ 27.7% I
64.6% E/ 13.8% P/ 21.5% I
Did students explain how they would link the
pieces together and draw their conclusion?

24.6% E/ 60% P/ 15% I
Student Surveys
Student Surveys

Example 3 simulation
Student Surveys

Helper/Hinderer (Winter 2011) – Did the lab
help you understand the overall process of a
statistical investigation?
Student Surveys

Did subsequent labs increase understanding?
Remainder of labs


Lab 4: Random babies
Lab 5: Reese’s Pieces (demo)



Lab 6: Sleepless nights (finite population)




Normal approximation, CLT for binary
Transition to formal test of significance (6 steps)
t approximation, CLT for quantitative, conf interval
Lab 7: Simulation of matched-pairs
Lab 8: Simulation of regression sampling
Chi-square, ANOVA
Lab Report
Student Feedback (Winter 2011)


Google docs survey during last week of
course
Two instructors
Student end-of-course surveys (W 11)
Top 2 most interesting labs

Instructor A



Is Yawning Contagious?
Heart Rates (matched pairs)
Instructor B



Friend or Foe
Is Yawning Contagious?
Reese’s Pieces
Top 2 most/least helpful labs

Most helpful:


Friend or Foe
Least Helpful (Instructor B):


Random babies
Melting away (intro two-sample t, paired)
Exam 1


In a recent Gallup survey of 500 randomly
selected US adult Republicans, 390 said they
believe their congressional representative
should vote to repeal the Healthcare Law.
Suppose we wish to determine if significantly
more than three-quarters (75%) of US adult
Republicans favor repeal.
The coin tossing simulation applet was used to
generate the following two dotplots (A) and (B).
Which, if either, of the two plots (A) and (B) was
created using the correct procedure? Explain
how you know.
Exam 1

35% picked B (usually citing null .75 × 500)




But some look at shape, or later p-value
29% picked A (observed result)
23% neither (wanted .5 × 500 = 250)
13% other responses: 0, .75, 50, can’t tell,
anything possible, label is wrong
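
A sketch of what the correct procedure looks like in code, using the numbers from the exam question (n = 500, observed count 390, null value 0.75). The simulation is run under the null, so the dotplot should be centered near 0.75 × 500 = 375, not at the observed 390 and not at 0.5 × 500 = 250. `simulate_counts` is an illustrative name, not the applet's.

```python
import random

def simulate_counts(n, prob, n_reps, seed=0):
    """Number of successes in each of n_reps samples of size n,
    with success probability prob."""
    rng = random.Random(seed)
    return [sum(rng.random() < prob for _ in range(n))
            for _ in range(n_reps)]

n, observed, null_prob = 500, 390, 0.75

# Simulate under the null hypothesis, not under the observed proportion.
counts = simulate_counts(n, null_prob, n_reps=2000)

center = sum(counts) / len(counts)
p_value = sum(c >= observed for c in counts) / len(counts)
print(round(center, 1), p_value)
```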
Exam 2

Heights of females are known to follow a normal
distribution with a mean of 64 inches and a
standard deviation of 3 inches. Consider the
behavior of sample means. Each of the graphs
below depicts the behavior of the sample mean
heights of females.
a. One graph shows the distribution of sample
means for many, many samples of size 10. The
other graph shows the distribution of sample
means for many, many samples of size 50.
Which graph goes with which sample size?
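
A short simulation sketch of the two distributions in part (a), using the stated population (normal, mean 64, SD 3). The spread of the sample means should be close to 3/√n, so the narrower graph belongs to n = 50. `sample_means` is an illustrative name.

```python
import random
from statistics import fmean, stdev

def sample_means(n, n_reps, mu=64.0, sigma=3.0, seed=0):
    """Means of n_reps random samples of size n from Normal(mu, sigma)."""
    rng = random.Random(seed)
    return [fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(n_reps)]

means10 = sample_means(n=10, n_reps=2000)
means50 = sample_means(n=50, n_reps=2000)

sd10, sd50 = stdev(means10), stdev(means50)

# Compare simulated spread with the theoretical value sigma / sqrt(n).
print(round(sd10, 3), round(3 / 10 ** 0.5, 3))
print(round(sd50, 3), round(3 / 50 ** 0.5, 3))
```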
Exam 2

85% matched n=10 and n = 50
Exam 2

Suppose we wish to test the following
hypotheses about the population of Cal Poly
undergraduate women:
H o : Height  64
H A :  Height  64

For which graph (A or B) would you expect
the p-value to be smaller? Explain using the
appropriate statistical reasoning.
Exam 2

77% picked B


Mixture of appealing to smaller SD/outliers, larger
sample size means smaller p-value, and thinking
in terms of test statistic
A few choices not internally consistent
Student understanding of p-value

CAOS questions (final exam)

Statistically significant results correspond to small
p-values



Recognize valid p-value interpretation



Traditional (National/Hope/CP): 69/86/41%
Randomization (Hope/CP): 95%/95%
Traditional (National/Hope/CP): 57/41/74%
Randomization (Hope/CP): 60/72%
p-value as probability of Ho - Invalid


Traditional (National/Hope/CP): 59/69/68%
Randomization (Hope/CP): 80%/89%
Student understanding of p-value

CAOS questions (final exam)

p-value as probability of Ha – Invalid



Traditional (National/Hope/CP): 54/48/72%
Randomization (Hope/CP): 45/67%
Recognize a simulation approach to evaluate
significance (simulate with no preference vs.
repeating the experiment)


Traditional (National/Hope/CP): 20/20/30%
Randomization (Hope/CP): 32%/40%
Student understanding of p-value

p-value interpretation in regression (final
exam)
Student understanding of process

Video game question (Final exam: NCSU, Hope,
Cal Poly, UCLA, Rhodes College)




What is the explanation for the process the
student followed?
Which of the following was used as a basis for
simulating the data 1000 times?
What does the histogram tell you about whether
$5 incentives are effective in improving
performance on the video game?
Which of the following could be the approximate
p-value in this situation?
Student understanding of process

Simulation process




Fall: over 40% chose “This process allows her to
determine how many times she needs to replicate
the experiment for valid results.”
About 70% pick “The $5 incentive and verbal
encouragement are equally effective at improving
performance.” as underlying assumption
Still evidence some look at center at zero or
shape as evidence of no treatment effect
1/3 to 1/2 could estimate p-value from graph
Example – 2009 AP Statistics Exam

A consumer organization would like a method
for measuring the skewness of the data. One
possible statistic for measuring skewness is
the ratio mean/median….


Calculate statistic for sample data…
Draw conclusion from simulated data …
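
The two steps can be sketched in Python under stated assumptions: the sample below is hypothetical, and a normal population with the sample's mean and SD is assumed as the symmetric reference model. Neither the data nor the reference model comes from the slides. `skew_ratio` and `simulate_ratios` are illustrative names.

```python
import random
from statistics import fmean, median, stdev

def skew_ratio(data):
    """Skewness statistic from the exam question: mean / median."""
    return fmean(data) / median(data)

def simulate_ratios(n, mu, sigma, n_reps=2000, seed=0):
    """mean/median ratios for n_reps samples of size n drawn from a
    symmetric Normal(mu, sigma) population (an assumed reference model)."""
    rng = random.Random(seed)
    return [skew_ratio([rng.gauss(mu, sigma) for _ in range(n)])
            for _ in range(n_reps)]

# Hypothetical right-skewed sample (assumption, for illustration only).
sample = [18, 19, 20, 20, 21, 22, 23, 25, 28, 34, 41, 55]

observed_ratio = skew_ratio(sample)
ratios = simulate_ratios(len(sample), fmean(sample), stdev(sample))

# How often does a symmetric population give a ratio this large?
tail = sum(r >= observed_ratio for r in ratios) / len(ratios)
print(round(observed_ratio, 3), tail)
```

A small tail proportion suggests the observed ratio is larger than a symmetric population would typically produce, which is the same "compare the observed statistic to the simulated distribution" reasoning used in the labs.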
Design Principles




Tactile simulation
Visual, contextual animation of tactile simulation
Intermediate animation capability
Level of student construction



Carefully designed, spiraling activities



Ease of changing inputs
Connect elements between graphs
“Stop!”
Thought questions
Allow for student exploration
Implementation




Early in course
Repetition through course, connections
Normal approximations
Lab assignments








Focus on entire statistical process
Motivating research question
Follow-up application
Thought questions
Screen captures
Pre-lab questions
Minitab demos (Adobe Captivate)
Exam questions
Observations


Students quickly get sense of trying to
determine whether a result could be “just due
to chance”
Still struggle with more technical
understanding



Under the null hypothesis
Observed vs. hypothesized value
Students may fail to see connections
between scenarios
Suggestions/Open Questions

Begin with class discussion/brainstorming on
how to evaluate data before showing class
results



Student data vs. genuine research article


Loaded dice, biased coin tossing
Thought questions
“the result” vs. “your result”
Choice of first exposure


Significant?
Random sampling or random assignment
Suggestions/Open Questions

Scaffolding

Observational units, variable




How would you add one more dot to graph?
At some point, require students to enter the
correct “observed result” (e.g., Captivate)
At some point, ask students to design the
simulation?
Start with fill in the blank interpretation?
Suggestions/Open Questions

One crank or more?
When connect to normal approximations?

How make sure traditional methods don’t overtake
once they are introduced?
How much discuss exact methods?
Confidence intervals


Summary

Very promising but also need to be very
careful, and need a strong cycle of repetition
closely tied to rest of course…