Core Methods in Educational Data Mining HUDK4050 Fall 2015 Assignment Q1-Q3 • Within data set 1, how many correlations are statistically significant (according to the customary p<0.05 definition) – If you do not apply any sort of post-hoc control? – If you use Bonferroni – If you use Benjamini&Hochberg Q4 • What is the correlation with the lowest pvalue that comes up significant for B&H but not for Bonferroni? Q5-Q7 • Within data set 2, how many correlations are statistically significant (according to the customary p<0.05 definition) – If you do not apply any sort of post-hoc control? – If you use Bonferroni – If you use Benjamini&Hochberg Q8 • What is the lowest correlation that is still statistically significant, according to Bonferroni’s test? Why were the “lowest significant correlations” so different between data sets? Q9 Question 9: Now do you see why Professor Baker says that statistical significance doesn’t matter much for really big data sets? (and this is NOT a big data set by reckoning in other fields) • Yes – Bonferroni is ridiculously conservative with 1,112 tests, and yet correlations that are absurdly small still come up statistically significant. • No. I think the answer to Question 9 is a fine correlation, perfectly likely to represent a large effect size. • No. Big data is a FAD. No one should ever expect to work with a data set over 150 data points, especially after the societal collapse predicted by James Howard Kunstler occurs. Today… • We’ll discuss discovery with models Discovery with Models • What is discovery with models? Discovery with Models • How is it used in prediction? • How is it used in relationship mining? Discovery with Models • Why do we need a special term for it? (Do we need a special term for it?) Advantages & Disadvantages • What are some of the advantages and disadvantages of discovery with models? Questions or comments? Questions or comments about Dawson et al. case study? Questions or comments about Hershkovitz et al. paper? Questions or comments about Fancsali paper? Any other questions about discovery with models? Assignment B7 • Clustering • Any questions? Next Class • Tuesday, November 17 • Baker, R.S. (2015) Big Data and Education. Ch. 7, V1, V2, V3, V4, V5. • Amershi, S. Conati, C. (2009) Combining Unsupervised and Supervised Classification to Build User Models for Exploratory Learning Environments. Journal of Educational Data Mining, 1 (1), 18-71. • Bowers, A.J. (2010) Analyzing the Longitudinal K-12 Grading Histories of Entire Cohorts of Students: Grades, Data Driven Decision Making, Dropping Out and Hierarchical Cluster Analysis. Practical Assessment, Research & Evaluation (PARE), 15(7), 1-18. The End