Assumptions

"Essentially, all models are wrong, but some are useful."
— George E.P. Box

Your model has to be wrong…
… but that's o.k. if it's illuminating!

Linear Model Assumptions

- Absence of Collinearity
- No influential data points
- Normality of Errors
- Homoskedasticity of Errors
- Independence

Absence of Collinearity

Collinearity: predictors that are correlated with each other (Baayen 2008: 182). When predictors are collinear, the model cannot cleanly separate their effects, so the coefficient estimates become unstable and hard to interpret.

Where does collinearity come from? Most often, from correlated predictor variables. (Demo.) What to do?

No Influential Data Points

Leverage (Baayen 2008: 189-190)

[Figure: four scatterplots of y against x showing how a single high-leverage data point can change the outcome of a regression: p = 0.37 vs. p = 0.000529 in one pair of panels, p ≈ 1.2e-26 vs. p = 0.07 in the other.]

Leave-one-out influence diagnostics: DFbeta (…and much more). See Winter & Matlock (2013).

Normality of Errors

The error (not the data!) is assumed to be normally distributed. So, the residuals should be normally distributed.

xmdl = lm(y ~ x)
hist(residuals(xmdl))

[Figure: histogram of residuals(xmdl), roughly bell-shaped ✔]

qqnorm(residuals(xmdl))
qqline(residuals(xmdl))

[Figure: two Normal Q-Q plots of residuals(xmdl): in one, the points fall on the line ✔; in the other, they deviate from it ✗]

Homoskedasticity of Errors

The error (not the data!)
is assumed to have equal variance across the predicted values. So, the residuals should have equal variance across the predicted values.

[Figure: three plots of residuals(xmdl) against fitted(xmdl). ✔ The residuals form an even band of noise around zero across the fitted values. ✗ In the other two (e.g., raw reaction times), the residuals fan out as the fitted values increase.]

What to do if normality/homoskedasticity is violated?

Either: nothing + report the violation
Or: report the violation + transformations

Two Types of Transformations

Linear transformations (centering, scaling) leave the shape of the distribution intact. Nonlinear transformations do change the shape of the distribution.

[Figure: histogram of xdata$Pic.RT (right-skewed) next to histogram of log(xdata$Pic.RT) (roughly symmetrical).]

[Figure: residual plots before and after the log transformation. Before: residuals(xmdl) fan out across fitted(xmdl). After: residuals(xmdl.log) form a much more even band across fitted(xmdl.log). Still bad… … but better!!]

Assumptions (recap)

- Absence of Collinearity
- No influential data points
- Normality of Errors (check: histogram of residuals, Q-Q plot of residuals)
- Homoskedasticity of Errors (check: residual plot)
- Independence

Independence

What is independence? Common experimental data: each subject responds to multiple items, and each item is presented repeatedly (Rep 1, Rep 2, Rep 3, …). Multiple responses from the same subject (or to the same item) are not independent of each other.

Pseudoreplication = Disregarding Dependencies

Subject1   Item1
Subject1   Item2
Subject1   Item3
…
Subject2   Item1
Subject2   Item2
Subject2   Item3
…

Treating every row as an independent observation is the "pooling fallacy" (Machlis et al., 1985), also known as "pseudoreplication" (Hurlbert, 1984).

Hierarchical Data Is Everywhere

- Typological data (e.g., Bell 1978, Dryer 1989, Perkins 1989; Jaeger et al., 2011)
- Organizational data
- Classroom data

[Figure: languages (Finnish, Norwegian, Swedish, English, French, Spanish, German, Hungarian, Romanian, Italian, Turkish) grouped by language family: languages from the same family are not independent data points.]

[Figure: students nested within Class 1 and Class 2: students from the same class are not independent data points.]

Intraclass Correlation (ICC)

[Figure: simulation for 16 subjects comparing the Type I error rate of a pseudoreplicating analysis with that of an items analysis.]

Interpretational Problem: What's the population for inference?

Violating the independence assumption makes the p-value… … meaningless.

That's it (for now)
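The collinearity demo can be sketched in R. This is a minimal sketch with simulated data (the variable names x1, x2, y are hypothetical, not from the slides): two predictors that are nearly copies of each other make the coefficient estimates unstable.

```r
# Collinearity sketch (simulated data): x2 is nearly a copy of x1,
# so the model cannot tell their effects apart.
set.seed(42)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)   # strongly correlated with x1
y  <- 2 * x1 + rnorm(100)

xmdl <- lm(y ~ x1 + x2)
summary(xmdl)   # the x1 effect is split unpredictably between x1 and x2

cor(x1, x2)     # near 1: a red flag for collinearity

# Variance inflation factors (requires the car package):
# car::vif(xmdl)   # large values indicate collinear predictors
```

A quick correlation matrix of the predictors is often enough to spot the problem before fitting anything.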
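The leave-one-out influence diagnostics (DFbeta) are built into base R. A minimal sketch with simulated data, where one extreme point drives the fit:

```r
# Influence sketch: 20 ordinary points plus one outlier at (10, 10)
# that creates a spurious positive trend.
set.seed(1)
x <- c(rnorm(20), 10)
y <- c(rnorm(20), 10)

xmdl <- lm(y ~ x)

# DFbeta: how much each coefficient changes when each data point is
# left out, one at a time.
dfb <- dfbeta(xmdl)
head(dfb)

# The outlier (row 21) has by far the largest |DFbeta| for the slope:
which.max(abs(dfb[, "x"]))

# influence.measures() bundles DFbetas, Cook's distance, leverage, ...
summary(influence.measures(xmdl))
```

Leaving out the flagged point and refitting shows how fragile the original estimate was.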
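The contrast between linear and nonlinear transformations can be seen directly on simulated skewed data (a sketch; the data here are not the slides' xdata):

```r
# Linear vs. nonlinear transformations on right-skewed data.
set.seed(5)
x <- rexp(1000)                    # right-skewed

# Linear transformations: shape stays the same.
x_centered <- x - mean(x)          # centering
x_scaled   <- as.vector(scale(x))  # centering + scaling (z-scores)

# Nonlinear transformation: shape changes.
x_log <- log(x)

par(mfrow = c(1, 3))
hist(x,        main = "raw: skewed")
hist(x_scaled, main = "scaled: still skewed")
hist(x_log,    main = "log: shape changed")
```

Centering and scaling are invertible linear maps, which is exactly why they cannot fix skew or heteroskedasticity; only a nonlinear transformation (log, square root, …) reshapes the distribution.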
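The before/after transformation comparison can be reproduced with simulated reaction-time-like data (standing in for the slides' xdata$Pic.RT; the model names xmdl and xmdl.log follow the slides):

```r
# Residual diagnostics before and after a log transform, using
# simulated lognormal "reaction times".
set.seed(7)
x  <- rnorm(200)
rt <- exp(6.5 + 0.4 * x + rnorm(200, sd = 0.3))  # right-skewed RTs

xmdl     <- lm(rt ~ x)        # raw model
xmdl.log <- lm(log(rt) ~ x)   # log-transformed model

# Homoskedasticity check: residuals against fitted values.
plot(fitted(xmdl), residuals(xmdl))          # fans out: heteroskedastic
plot(fitted(xmdl.log), residuals(xmdl.log))  # even band: much better

# Normality check: histogram and Q-Q plot of the residuals.
hist(residuals(xmdl.log))
qqnorm(residuals(xmdl.log))
qqline(residuals(xmdl.log))
```

In the raw model the residual spread grows with the fitted values; after the log transform it is roughly constant, which is the pattern the slides' before/after plots illustrate.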
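The Type I error problem caused by pseudoreplication can be illustrated with a small simulation (a sketch, not the slides' original simulation): 16 subjects, 8 trials each, no real condition effect, but strong by-subject dependencies that the pooled analysis ignores.

```r
# Pseudoreplication sketch: null effect, but trials cluster by subject.
set.seed(123)
n_sims   <- 500
p_pooled <- numeric(n_sims)  # all 128 trials treated as independent
p_bysubj <- numeric(n_sims)  # correct: one mean per subject

for (i in 1:n_sims) {
  subj_mean <- rnorm(16, sd = 2)                 # subject-level variation
  y    <- rep(subj_mean, each = 8) + rnorm(16 * 8)  # 8 trials per subject
  cond <- rep(c("A", "B"), each = 8 * 8)         # subjects 1-8 vs. 9-16

  # Pseudoreplicating analysis: ignore the subject dependency.
  p_pooled[i] <- t.test(y ~ cond)$p.value

  # Aggregated analysis: average over each subject first.
  means <- tapply(y, rep(1:16, each = 8), mean)
  p_bysubj[i] <- t.test(means[1:8], means[9:16])$p.value
}

mean(p_pooled < 0.05)  # far above the nominal 0.05
mean(p_bysubj < 0.05)  # close to the nominal 0.05
```

With no true effect anywhere, the pooled test rejects far more often than 5%, which is why violating independence makes the p-value meaningless; averaging within subjects (or, better, a mixed model) restores the nominal error rate.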