BrawStats

Making Statistics Surprising
Roger Watt
Kelly Younger
Lizzie Collins
Rebecca Skinner
Francesca Worsnop
Science from the outside
Idea
Knowledge
Science from the inside
Idea
Result
Evidence
Knowledge
Idea
Result
Hypothesis
Design
Evidence
Inference
Describing
Data Analysis
Persuading
Knowledge
What matters here?
Idea
Result
Hypothesis
Design
Evidence
Decisions required
Inference
Describing
Data Analysis
Persuading
Knowledge
What matters here?
Idea
Hypothesis
Result
What variables?
What types of variable?
What relationships between variables?
Inference
Describing
Data Analysis
Design
Persuading
What sampling method?
What deployment of sample (between/within)?
What sample size?
Evidence
Knowledge
Lesson
• We must make decisions
– these matter
• We may have preferences
– these don’t matter
The Student Journey
What appears to matter here to a student?
Idea
Result
Hypothesis
Design
Evidence
Inference
Describing
Data Analysis
Persuading
Knowledge
What appears to matter here to a student?
What test?
t-test
chi-sqr
correlation
ANOVA
regression
ANCOVA
MANOVA
How to test?
Formulae
Calculations
Σ(xᵢ − x̄)²
SPSS
What columns?
Result
Inference
Data Analysis
Numbers…
Dozens of numbers
SSQ
F, t, p
How many sig figs?
The Student Experience
• Stats is Hard
– disconnected facts
– tedious arithmetic
• Stats is Disempowering
– easy to make simple mistakes
– myriad of details obscure concepts
• Stats is not fun
– no pleasant surprises
The Main Goal: Doing stats
• Understanding:
– Preserve the whole picture
• Conceptual Insight:
– Full grasp of issues that matter for the outcome
• Skills:
– Confident in essentials
The Plan
• Materials
– Whole picture always present
– Concentrate on research decisions
– Remove disconnected facts
• Learning
– Repeated Experience
– Immediate feedback
– Discovery
The Whole Picture
Idea
Result
Hypothesis
Design
Evidence
Inference
Describing
Data Analysis
Persuading
Knowledge
Research Decisions
Result
Idea
Hypothesis
Design
Evidence
Inference
Describing
Data Analysis
Persuading
Knowledge
Result
Idea
Hypothesis
What variables?
Inference
Describing
What types of variable?
What relationships between variables?
Data Analysis
Persuading
Design
What sampling method?
What deployment of sample (between/within)?
What sample size?
Evidence
Knowledge
BrawStats
• Materials
– Whole process always visible
– Decisions require user input
• everything else automatic
• Learning
– Encourages experimenting & discovery
– Every action produces a relevant graphical output
• immediately
BrawStats
• Hypothesis
– How many variables?
– What variables?
– What types of variable?
– What relationship between variables?
[Screenshots: BrawStats Variables, Logic, Prediction and Hypothesis panels, with a plot of IQ (50 to 150) against gender (female, male).]
Hypothesis
Dependent Variable: IQ (Interval), Mean = 100, Std = 15
Independent Variable: gender (Categorical), female (50%), male (50%)
Predicted Means (IQ by gender): female 107, male 93
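The hypothesis panel above can be read as a recipe for generating data. The Python below is a minimal sketch of that idea, not BrawStats code: the names `draw_sample` and `group_means` are hypothetical, the predicted means are read as female = 107 and male = 93, and it assumes the overall Std of 15 includes the gap between the two groups, so the spread within each gender is sqrt(15² − 7²) ≈ 13.3.

```python
import math
import random
import statistics

# Values read from the hypothesis panel.
POPULATION_SD = 15.0
PREDICTED_MEANS = {"female": 107.0, "male": 93.0}

# Assumption: the overall SD of 15 includes the spread between the two groups,
# so the SD inside each group is smaller than 15.
HALF_GAP = (PREDICTED_MEANS["female"] - PREDICTED_MEANS["male"]) / 2
WITHIN_SD = math.sqrt(POPULATION_SD**2 - HALF_GAP**2)


def draw_sample(n, rng):
    """Return n (gender, IQ) pairs drawn at random from the hypothesised population."""
    sample = []
    for _ in range(n):
        gender = rng.choice(["female", "male"])  # 50% / 50%
        iq = rng.gauss(PREDICTED_MEANS[gender], WITHIN_SD)
        sample.append((gender, iq))
    return sample


def group_means(sample):
    """Mean IQ for each gender in one sample (NaN if a group is empty)."""
    means = {}
    for g in PREDICTED_MEANS:
        values = [iq for gender, iq in sample if gender == g]
        means[g] = statistics.fmean(values) if values else float("nan")
    return means


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(3):
        # Three samples of 42 participants from the same population.
        print(group_means(draw_sample(42, rng)))
```

Each run of the loop draws a fresh sample from the same population, so the printed means differ from one another and from 107 and 93; that difference is sampling error.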
BrawStats
• Design
– How to sample?
– Within/Between?
– How many participants?
[Screenshots: the Variables, Logic, Prediction and Hypothesis panels for the IQ by gender example (three build steps), followed by the Design panel.]
BrawStats
• Everything else
– done for you
[Screenshots: the IQ by gender example with the Variables, Logic, Prediction, Hypothesis and Design panels, then with the Evidence panel added.]
BrawStats
• Structure
1. Whole process always visible
2. Decisions require user input
3. Everything else automatic
• Learning
4. Every action produces a relevant graphical output immediately
5. Encourages experimenting & discovery
The Main Goal: Doing stats
• Understanding:
– Preserve the whole picture
• Conceptual Insight:
– Full grasp of issues that matter for the outcome
• Skills:
– Confident in essentials
The Next Goal: Expected Outcomes
• Understanding:
– Relationship of outcome to chance (sampling error)
• Conceptual Insight:
– Strengths and weaknesses of statistical testing (NHST)
• Skills:
– Interpret statistical outcomes
Consequences of the p-value distribution
We are locked into the type of system given by this truth table:
            H0 Correct      H0 Incorrect
p<=0.05     Type I error    -
p>0.05      -               Type II error
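The truth table can also be written as a small function. This is only an illustrative sketch: the name `classify` and the label "no error" for the two cells the table leaves blank are assumptions, not taken from the slides.

```python
def classify(h0_correct, p, criterion=0.05):
    """Return the truth-table cell for a single study.

    h0_correct  True if the null hypothesis is actually correct
    p           the p-value the study produced
    criterion   the criterion p (0.05 in the table)
    """
    significant = p <= criterion
    if h0_correct and significant:
        return "Type I error"    # H0 correct, yet p <= criterion
    if not h0_correct and not significant:
        return "Type II error"   # H0 incorrect, yet p > criterion
    return "no error"


if __name__ == "__main__":
    for h0_correct in (True, False):
        for p in (0.01, 0.20):
            print(h0_correct, p, classify(h0_correct, p))
```

In real research `h0_correct` is never known, which is why only the rates of the two errors can be studied, and only by simulation or theory.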
[Figure: t-test independent samples (n=63100). p(Type I error) and p(Type II error), each on a 0 to 1 axis, plotted against criterion p (0.01 to 1).]
Lessons
• sampling error matters
• p-value
– depends on sampling error
– is poorly behaved
• p-values cannot be easily interpreted
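These lessons can be checked with a short simulation. The sketch below is not how BrawStats works: to stay within the Python standard library it uses a z-test with a known SD of 15 as a stand-in for the t-test, and the true difference of 10 IQ points and the 21 participants per group are arbitrary choices made for illustration. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

SD = 15.0  # assumed known, so a z-test can stand in for the t-test


def p_value(group_a, group_b):
    """Two-sided p-value for a difference in means (z-test, known SD)."""
    se = SD * math.sqrt(1 / len(group_a) + 1 / len(group_b))
    z = (fmean(group_a) - fmean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_p_values(true_difference, n_per_group, n_studies, rng):
    """Run the same study n_studies times and collect the p-values."""
    p_values = []
    for _ in range(n_studies):
        a = [rng.gauss(100 + true_difference / 2, SD) for _ in range(n_per_group)]
        b = [rng.gauss(100 - true_difference / 2, SD) for _ in range(n_per_group)]
        p_values.append(p_value(a, b))
    return p_values


def proportion_significant(p_values, criterion=0.05):
    """Fraction of studies with p at or below the criterion."""
    return sum(p <= criterion for p in p_values) / len(p_values)


if __name__ == "__main__":
    rng = random.Random(2)
    null = simulate_p_values(0.0, 21, 2000, rng)   # H0 correct
    real = simulate_p_values(10.0, 21, 2000, rng)  # H0 incorrect
    print("p(Type I error)  ~", proportion_significant(null))
    print("p(Type II error) ~", 1 - proportion_significant(real))
    print("same study, new samples:", [round(p, 3) for p in real[:8]])
```

The last line printed is the point of the exercise: identical studies of the same real effect give p-values that range widely, purely because of sampling error.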
The Last Goal: Exploring stats
• Understanding:
– Relationship of outcome to design decisions
• Conceptual Insight:
– Strengths and weaknesses of designs
• Skills:
– Make optimal decisions
Result
Idea
Hypothesis
What variables?
Inference
Describing
What types of variable?
What relationships between variables?
Data Analysis
Persuading
Design
What sampling method?
What deployment of sample (between/within)?
What sample size?
Evidence
Knowledge
The Basic Design Choices
• Variable Type
• Between/Within
• No. of participants
• Sampling strategy
[Screenshot: Hypothesis panel beside a plot whose axes both run from 0 to 1. Dependent Variable: IQ (Interval), Mean = 100, Std = 15. Independent Variable: gender (Categorical), female (50%), male (50%). Predicted Means (IQ by gender): female 107, male 93.]
[Figure: Pearson correlation (n=11260). p(Type I error) and p(Type II error), each on a 0 to 1 axis, plotted against type of IV (i, o, c5, c4, c3, c2).]
[Figure: Pearson correlation (n=18380). Same axes.]
[Figure: t-test paired samples (n=10480). p(Type I error) and p(Type II error), each on a 0 to 1 axis, plotted against repeated measures (i, r).]
[Figure: t-test paired samples (n=162040). Same axes.]
[Figure: t-test independent samples (n=2780). p(Type I error) and p(Type II error), each on a 0 to 1 axis, plotted against no. of participants (0 to 100).]
[Figure: t-test independent samples (n=18000). Same axes.]
[Figure: t-test independent samples (n=27100). p(Type I error) and p(Type II error), each on a 0 to 1 axis, plotted against independence (0 to 1).]
[Figure: t-test independent samples (n=13580). Same axes.]
The Basic Assumptions
• Normality:
– skew
– kurtosis
[Figure: t-test independent samples (n=8270). p(Type I error) and p(Type II error), each on a 0 to 1 axis, plotted against skew (-1 to 1).]
[Figure: t-test independent samples (n=15000). Same axes.]
[Figure: t-test independent samples (n=8640). p(Type I error) and p(Type II error), each on a 0 to 1 axis, plotted against kurtosis (-1 to 1).]
[Figure: t-test independent samples (n=8640). p(Type I error) only, plotted against kurtosis (-1 to 1).]
Lessons
• early decisions matter:
– interval>ordinal>categorical
– no. of participants
– sampling strategy
• between/within
• non-independence
• not much else matters
– skew
– kurtosis
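Two of these early decisions, the number of participants and between versus within, can be explored with a textbook approximation rather than a simulation. The sketch below is an assumption-based illustration, not the calculation BrawStats performs: it uses the normal approximation for a two-sided test of a difference between two means, and the function name `type_ii_error`, the standardised effect of 0.5 and the correlation of 0.5 are all hypothetical choices.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def type_ii_error(effect_d, n_per_group, criterion=0.05, correlation=None):
    """Approximate p(Type II error) for a two-sided test of a mean difference.

    effect_d     standardised difference between the two means
    n_per_group  participants per condition
    correlation  None for a between design; for a within design, the
                 correlation (< 1) between each participant's two scores
    """
    if correlation is None:
        se = math.sqrt(2 / n_per_group)                      # between
    else:
        se = math.sqrt(2 * (1 - correlation) / n_per_group)  # within
    z_crit = _NORMAL.inv_cdf(1 - criterion / 2)
    shift = effect_d / se
    power = (1 - _NORMAL.cdf(z_crit - shift)) + _NORMAL.cdf(-z_crit - shift)
    return 1 - power


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        between = type_ii_error(0.5, n)
        within = type_ii_error(0.5, n, correlation=0.5)
        print(f"n={n:3d}  between={between:.2f}  within={within:.2f}")
```

Under these assumptions p(Type II error) falls steadily as participants are added, and a within design with correlated scores reaches the same error rate with fewer participants.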
The Student Experience
• Stats is Hard
– disconnected facts
– tedious arithmetic
• Stats is Disempowering
– easy to make simple mistakes
– myriad of details obscure concepts
• Stats is not fun
– no pleasant surprises
The Main Goal: Doing stats
• Understanding:
– Preserve the whole picture
• Conceptual Insight:
– Full grasp of issues that matter for the outcome
• Skills:
– Confident in essentials
The Plan
• Materials
– Whole picture always present
– Concentrate on research decisions
– Remove disconnected facts
• Learning
– Repeated Experience
– Immediate feedback
– Discovery
BrawStats
• Materials
– Whole process always visible
– Decisions require user input
• everything else automatic
• Learning
– Encourages experimenting & discovery
– Every action produces a relevant graphical output
• immediately
Lessons
• It (almost) worked
– not sure why
– maybe because:
• no numbers/arithmetic
• single coherent process
• it is (??) self-explaining & self-illustrating
• foraging for undocumented features