Making Statistics Surprising
Roger Watt, Kelly Younger, Lizzie Collins, Rebecca Skinner, Francesca Worsnop

Science from the outside
[Diagram: Idea, Knowledge]

Science from the inside
[Diagram: Idea, Evidence, Result, Knowledge]

[Diagram: the full research process (Idea, Hypothesis, Design, Evidence, Describing, Inference, Data Analysis, Result, Persuading, Knowledge)]

What matters here?
Decisions required:
• Hypothesis
  – What variables?
  – What types of variable?
  – What relationships between variables?
• Design
  – What sampling method?
  – What deployment of sample (between/within)?
  – What sample size?

Lesson
• We must make decisions – these matter
• We may have preferences – these don't matter

The Student Journey
What appears to matter here to a student?
• What test? t-test, chi-square, correlation, ANOVA, regression, ANCOVA, MANOVA
• How to test? Formulae, calculations such as $\sum_i (x_i - \bar{x})^2$, SPSS (what columns?)
• Result? Numbers... dozens of numbers: SSQ, F, t, p. How many significant figures?
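The sum of squares on the Student Journey slide is typical of the arithmetic students meet first. As a point of reference, here is a minimal Python sketch (standard library only) of what that formula computes and how it becomes a sample variance; the function name `sum_of_squares` and the example data are illustrative and are not part of BrawStats.

```python
from statistics import mean, variance


def sum_of_squares(xs):
    """Sum of squared deviations from the sample mean: sum_i (x_i - mean)^2."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, not from the slides
ss = sum_of_squares(data)
# The sample variance divides the sum of squares by its degrees of freedom, n - 1.
var = ss / (len(data) - 1)
print(ss, var, variance(data))  # the last two values should agree
```

This is the kind of step the approach described here aims to hide, so that attention stays on the research decisions.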
The Student Experience
• Stats is hard
  – disconnected facts
  – tedious arithmetic
• Stats is disempowering
  – easy to make simple mistakes
  – a myriad of details obscures the concepts
• Stats is not fun
  – no pleasant surprises

The Main Goal: Doing Stats
• Understanding: preserve the whole picture
• Conceptual insight: full grasp of the issues that matter for the outcome
• Skills: confidence in the essentials

The Plan
• Materials
  – Whole picture always present
  – Concentrate on research decisions
  – Remove disconnected facts
• Learning
  – Repeated experience
  – Immediate feedback
  – Discovery

The Whole Picture
[Diagram: the research process (Idea, Hypothesis, Design, Evidence, Describing, Inference, Data Analysis, Result, Persuading, Knowledge)]

Research Decisions
[Diagram: the research process, with the decisions attached to Hypothesis and Design]
• Hypothesis: What variables? What types of variable? What relationships between variables?
• Design: What sampling method? What deployment of sample (between/within)? What sample size?

BrawStats
• Materials
  – Whole process always visible
  – Decisions require user input; everything else is automatic
• Learning
  – Encourages experimenting & discovery
  – Every action immediately produces a relevant graphical output

BrawStats
• Hypothesis
  – How many variables?
  – What variables?
  – What types of variable?
  – What relationship between variables?

[Screenshot: BrawStats panels Variables, Logic and Prediction, showing IQ (50 to 150) by gender (female, male)]
Hypothesis
  Dependent variable: IQ (interval), mean = 100, SD = 15
  Independent variable: gender (categorical), female (50%), male (50%)
  Predicted means: female IQ = 107, male IQ = 93

BrawStats
• Design
  – How to sample?
  – Within/between?
  – How many participants?
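To make concrete what BrawStats does automatically once the hypothesis and design are set, the sketch below draws one simulated sample from the example hypothesis (IQ by gender, predicted means 107 and 93) and tests it. It is a minimal sketch under stated assumptions, not BrawStats's implementation: the names `draw_sample` and `z_test` and the size of 21 participants per group are hypothetical; the overall SD of 15 is assumed to include the group difference; and, to stay within the Python standard library, a z-test that treats the within-group SD as known stands in for the t-test.

```python
import math
import random
from statistics import NormalDist, fmean

# Example hypothesis from the slides: IQ (interval, mean 100, SD 15) by gender
# (categorical, 50% female / 50% male), predicted means female 107, male 93.
PREDICTED_MEANS = {"female": 107.0, "male": 93.0}
OVERALL_SD = 15.0
# Assumption: the overall SD of 15 includes the group difference. With a 50/50
# split the between-group variance is (half the difference)^2, so the
# within-group SD is sqrt(15^2 - 7^2), about 13.3.
HALF_DIFF = (PREDICTED_MEANS["female"] - PREDICTED_MEANS["male"]) / 2
WITHIN_SD = math.sqrt(OVERALL_SD**2 - HALF_DIFF**2)


def draw_sample(n_per_group, rng):
    """Hypothetical helper: one between-participants sample of IQ per gender."""
    return {
        g: [rng.gauss(mu, WITHIN_SD) for _ in range(n_per_group)]
        for g, mu in PREDICTED_MEANS.items()
    }


def z_test(a, b, sigma):
    """Two-sided z-test for a difference in means, with known SD sigma."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(a) - fmean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


rng = random.Random(2024)
sample = draw_sample(21, rng)  # hypothetical design: 21 per group
z, p = z_test(sample["female"], sample["male"], WITHIN_SD)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Re-running with a different seed gives a different z and p from the same hypothesis and design; that sample-to-sample variation is sampling error.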
[Screenshot builds: the same hypothesis panels (IQ by gender; predicted means female 107, male 93), followed by the Design panel]

BrawStats
• Everything else: done for you
[Screenshot builds: BrawStats panels Variables, Logic, Prediction, Design and Evidence]

BrawStats
• Structure
  1. Whole process always visible
  2. Decisions require user input
  3. Everything else automatic
• Learning
  4. Every action immediately produces a relevant graphical output
  5. Encourages experimenting & discovery

Recap: The Main Goal: Doing Stats (understanding, conceptual insight, skills)

The Next Goal: Expected Outcomes
• Understanding: relationship of outcome to chance (sampling error)
• Conceptual insight: strengths and weaknesses of statistical testing (NHST)
• Skills: interpret statistical outcomes

Consequences of the p-value distribution
We are locked into the type of system given by this truth table:

                 p ≤ 0.05        p > 0.05
  H0 correct     Type I error    –
  H0 incorrect   –               Type II error

[Figure: t-test, independent samples (n = 63100): p(Type I error) and p(Type II error) against criterion p (log scale, 0.01 to 1)]

Lessons
• Sampling error matters
• The p-value
  – depends on sampling error
  – is poorly behaved
• p-values cannot easily be interpreted

The Last Goal: Exploring Stats
• Understanding: relationship of outcome to design decisions
• Conceptual insight: strengths and weaknesses of designs
• Skills: make optimal decisions

[Diagram: the research process (Idea, Hypothesis, Design, Evidence, Describing, Inference, Data Analysis, Result, Persuading, Knowledge), with the decisions attached]
• Hypothesis: What variables? What types of variable? What relationships between variables?
• Design: What sampling method? What deployment of sample (between/within)? What sample size?
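The truth table and the sample-size decision can be connected with a small simulation. The sketch below estimates p(Type I error) (H0 correct, p ≤ criterion) and p(Type II error) (H0 incorrect, p > criterion) by repeated sampling for several group sizes, using the IQ-by-gender example. Its assumptions are ours, not BrawStats's: a within-group SD of √(15² − 7²) ≈ 13.3, the hypothetical names `p_value` and `error_rates`, 2000 simulations per setting, and a known-SD z-test in place of the t-test so that only the Python standard library is needed.

```python
import math
import random
from statistics import NormalDist, fmean

DIFF = 14.0  # predicted female - male difference in mean IQ (107 - 93)
SIGMA = math.sqrt(15.0**2 - (DIFF / 2) ** 2)  # assumed within-group SD


def p_value(a, b, sigma):
    """Two-sided z-test p-value for a difference in means, with known sigma."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def error_rates(n_per_group, alpha=0.05, n_sims=2000, seed=1):
    """Monte Carlo estimates of p(Type I error) and p(Type II error)."""
    rng = random.Random(seed)
    type1 = type2 = 0
    for _ in range(n_sims):
        # H0 correct: both groups share the same population mean.
        a = [rng.gauss(100, SIGMA) for _ in range(n_per_group)]
        b = [rng.gauss(100, SIGMA) for _ in range(n_per_group)]
        if p_value(a, b, SIGMA) <= alpha:
            type1 += 1
        # H0 incorrect: the predicted 14-point difference is real.
        a = [rng.gauss(100 + DIFF / 2, SIGMA) for _ in range(n_per_group)]
        b = [rng.gauss(100 - DIFF / 2, SIGMA) for _ in range(n_per_group)]
        if p_value(a, b, SIGMA) > alpha:
            type2 += 1
    return type1 / n_sims, type2 / n_sims


for n in (5, 10, 20, 40):
    t1, t2 = error_rates(n)
    print(f"n per group = {n:2d}: p(Type I) = {t1:.3f}, p(Type II) = {t2:.3f}")
```

With the criterion fixed at 0.05, p(Type I error) should stay near 0.05 whatever the group size, while p(Type II error) falls as the group size grows. Passing a smaller `alpha`, such as 0.01, lowers p(Type I error) and raises p(Type II error), the trade-off plotted against criterion p.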
The Basic Design Choices
• Variable type
• Between/within
• Number of participants
• Sampling strategy
[Figure: the example hypothesis (IQ by gender), with empty error-rate axes]

The Basic Design Choices: Variable type
[Figure: Pearson correlation (n = 11260): p(Type I error) and p(Type II error) by type of IV (i, o, c5, c4, c3, c2)]
[Figure: Pearson correlation (n = 18380): same axes]

The Basic Design Choices: Between/within
[Figure: t-test, paired samples, gender (n = 10480): p(Type I error) and p(Type II error) by repeated measures (i, r)]
[Figure: t-test, paired samples, gender (n = 162040): same axes]

The Basic Design Choices: Number of participants
[Figure: t-test, independent samples (n = 2780): p(Type I error) and p(Type II error) against number of participants (0 to 100)]
[Figure: t-test, independent samples (n = 18000): same axes]

The Basic Design Choices: Sampling strategy
[Figure: t-test, independent samples (n = 27100): p(Type I error) and p(Type II error) against independence (0 to 1)]
[Figure: t-test, independent samples (n = 13580): same axes]

The Basic Assumptions
• Normality:
  – skew
  – kurtosis
[Figure: t-test, independent samples (n = 8270): p(Type I error) and p(Type II error) against skew (−1 to 1)]
[Figure: t-test, independent samples (n = 15000): same axes]
[Figure: t-test, independent samples (n = 8640): p(Type I error) and p(Type II error) against kurtosis (−1 to 1)]
[Figure: t-test, independent samples (n = 8640): p(Type I error) against kurtosis (−1 to 1)]

Lessons
• Early decisions matter:
  – interval > ordinal > categorical
  – number of participants
  – sampling strategy: between/within, non-independence
• Not much else matters:
  – skew
  – kurtosis

Recap: The Student Experience, The Main Goal, The Plan and BrawStats, as presented earlier.

Lessons
• It (almost) worked
  – not sure why
  – maybe because:
    • no numbers/arithmetic
    • a single coherent process
    • it is (??) self-explaining & self-illustrating
    • foraging for undocumented features