Now What
• Last class introduced statistical concepts.
• How do we actually implement these ideas and get some results?
• Goal: introduce you to what's out there and the things you need to be conscious of
• Familiarity with terms (so you can understand papers)

Experiment: IV = smartness drug; DV = IQ
Experimental group 1 scores: mean = 115
Experimental group 2 scores: mean = 102
Control group scores: mean = 100
It looks different, but how different IS different?

T tests
An inferential test to decide whether there is a real (significant) difference between the means of 2 data sets. In other words, do our 2 groups of people (experimental and control) represent 2 different POPULATIONS of people, or not?

Hypothesis Testing (review)
• The steps of hypothesis testing are:
1. Restate the question as a research hypothesis and a null hypothesis about the populations.
   ex) Does drug A make people super smart?
   Null hypothesis (assumed to be true) = Drug A has no effect on people's intelligence.
   H0: µ1 = µ2
2. Figure out what the comparison distribution looks like (usually your control group).
3. Decide the Type 1 error rate (alpha level).
   • Determine the cutoff sample score on the comparison distribution at which the null should be rejected
   • Typically 5% or 1%
4. Determine where your sample falls on the comparison distribution.
5. Reject or retain the null hypothesis.

Backbone of inferential stats
• Idea: assume the null, and reject it if your results would happen only 5% of the time (or less) if the null were true
• Underlies any "p" value or significance test you'll ever see

Stats packages
• Matlab, Stata, SAS, R, JMP, SPSS
• Different ones are more popular in different fields
• I will reference SPSS (popular in psychology)

Descriptive statistics
• Start off by getting familiar with your data
• Analyze descriptive statistics
• Means, quartiles, outliers, plots, frequencies

T tests: NOT interchangeable
• 1-sample T test
Use this if you know the population mean (or you have hypothesized a population mean) and you want to know whether your sample belongs to this population.
Ex: the IQ population mean is 100; is my sample different? Or, I have a theory that everyone is 6 feet tall. I can take a sample of people and see if this is true.

1 sample t test
• Analyze > Compare Means > One-Sample T Test
Test Variable = the variable you want to compare against the hypothesized population mean
Test Value = hypothesized population mean (default is 0)

Reading output
• Is my variable 'caldif' significantly different from the null population mean of zero?

One-Sample Statistics
           N     Mean     Std. Deviation   Std. Error Mean
caldif     67    -.1090   1.80174          .22012

One-Sample Test (Test Value = 0)
           t       df    Sig. (2-tailed)   Mean Difference   95% CI of the Difference (Lower, Upper)
caldif     -.495   66    .622              -.10896           -.5484, .3305

• No: p = .622 is well above .05, and the 95% confidence interval of the difference includes 0.

Independent Samples T Test
Compares the mean scores of two groups on a given variable.
Ex: are the IQ scores for my control and experimental groups different? Is mean height different for men and women?

Independent Samples T Test
• Analyze > Compare Means > Independent-Samples T Test
Test Variable = the dependent variable (IQ, height)
Grouping Variable = the variable that distinguishes the 2 groups you're comparing. Example: you have a variable called sex whose values can be 1 or 2, corresponding to male and female.
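To see roughly what the One-Sample and Independent-Samples dialogs compute, here is a minimal sketch in plain Python (standard library only). The function names and all scores are hypothetical, made up for illustration. It returns only the t statistic and df; the p-value still comes from a t table or a stats package (for example, SciPy's `scipy.stats.ttest_1samp` and `scipy.stats.ttest_ind`, where `equal_var=False` gives Welch's test).

```python
import math
from statistics import mean, stdev, variance


def one_sample_t(sample, mu0):
    """t and df for H0: the population mean equals mu0 (the SPSS 'Test Value')."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (mean(sample) - mu0) / se, n - 1


def independent_t(a, b, equal_var=True):
    """t and df for H0: the two population means are equal.

    equal_var=True pools the two variances ("Equal variances assumed" row);
    equal_var=False uses Welch's correction ("Equal variances not assumed" row).
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 in the denominator)
    diff = mean(a) - mean(b)
    if equal_var:
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1 / na + 1 / nb))
        df = na + nb - 2
    else:
        qa, qb = va / na, vb / nb
        se = math.sqrt(qa + qb)
        # Welch-Satterthwaite approximation: df is usually not a whole number
        df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return diff / se, df


# Hypothetical IQ scores tested against the population mean of 100
iq = [104, 98, 110, 101, 95, 108]
t, df = one_sample_t(iq, 100)
print(f"one-sample: t = {t:.3f}, df = {df}")  # t = 1.129, df = 5

# Hypothetical control vs experimental scores
control = [98, 101, 100, 97, 104]
experimental = [103, 108, 101, 106, 107]
print("pooled:", independent_t(control, experimental))
print("Welch: ", independent_t(control, experimental, equal_var=False))
```

With equal group sizes the pooled and Welch t values coincide; only the df differs, and it differs more as the two variances drift apart.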
Paired samples t test
• Compares the means of 2 variables; tests whether the average difference is significantly different from zero
• Use when the scores are not independent of each other (ex: scores from the same subjects before and after some intervention)
• Ask yourself: are all the data points randomly selected, or is the second sample paired to the first?
Ex: before using your device, the subjects' mean happiness score was 100; afterwards it's 102. Is this average difference of 2 significantly different from no difference at all?

Paired (dependent) samples T test
• Analyze > Compare Means > Paired-Samples T Test
Paired variables: HappinessScoreBefore, HappinessScoreAfter

Which T test?
• Is there a change in children's understanding of algebra before and after using a learning program?
• The average IQ in America is 100. Are MIT students different from this?
• Which deodorant is better? Each subject gets each brand, one on each arm.
• Which shoes lead to faster running? One sibling gets type A, the other type B.
• Which remote control do people learn to use faster? We randomly select subjects from the population.
• You will have more power with a repeated measures design, but sometimes (often) there are reasons you can't design your study that way:
  - order effects (learning, long-lasting intervention)
  - 'demand' effects

Important assumptions for inferential statistics
1. Homogeneity of variance: check, and correct if necessary (ex: Levene test; Welch procedure)
2. Normal distribution: check, and correct if necessary (ex: transform the data with a log or square)
3. Random sample of the population: vital! Otherwise, be clear about what population you're really learning about
4. Independence of samples: vital! Knowing the score for one subject gives you no specific hints about how another will score

Anova
• T tests are for when you have only 1 or 2 groups. For more groups, use the anova model.
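The paired samples t test described above is just a one-sample t test on each subject's difference score against 0. The sketch below assumes hypothetical happiness scores for 5 subjects (chosen so the means are 100 and 102, as in the example) and, for contrast, runs the same numbers through an unpaired pooled-variance t test. The gap between the two t values illustrates the extra power of a repeated measures design: pairing removes between-subject differences from the error term.

```python
import math
from statistics import mean, stdev, variance


def paired_t(before, after):
    """Paired (dependent) t test: a one-sample t test of the differences against 0."""
    if len(before) != len(after):
        raise ValueError("each subject needs both a before and an after score")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Hypothetical happiness scores for the same 5 subjects (means 100 and 102)
before = [100, 98, 103, 99, 100]
after = [102, 101, 104, 101, 102]
t, df = paired_t(before, after)
print(f"paired: t = {t:.2f}, df = {df}")  # t = 6.32, df = 4

# The same numbers wrongly treated as two independent groups (pooled variance).
# With equal n, the pooled variance is the simple average of the two variances.
pooled = (variance(before) + variance(after)) / 2
se_ind = math.sqrt(pooled * (2 / len(before)))
t_ind = (mean(after) - mean(before)) / se_ind
print(f"unpaired: t = {t_ind:.2f}, df = {2 * len(before) - 2}")  # t = 2.00, df = 8
```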
• Basic method: compares the variance between groups to the variance within groups
• Is this ratio (the 'F ratio') significantly > 1?

1 way anova
• Compare means from multiple groups
What is the effect of three different online learning environments on students' 'interest' scores?
Three different groups (N = 12)

                     Scores      Treatment mean
OnlineEnvironment1   9 7 8 9     8.25
OnlineEnvironment2   2 3 1 1     1.75
OnlineEnvironment3   8 7 9 7     7.75
Overall mean ('grand mean') = 5.92

1 way anova
• The basic model is that an individual score = overall mean + effect of treatment (group mean - grand mean) + error
• Total variability = variability between groups + variability within groups (the error term), each measured as a 'sum of squares'

'Sum of squares' (same data: treatment means 8.25, 1.75, 7.75; grand mean 5.92)
SS total = (9 - 5.92)^2 + (7 - 5.92)^2 + (8 - 5.92)^2 + ... + (7 - 5.92)^2 = 112.92
SS between = 4 × [(8.25 - 5.92)^2 + (1.75 - 5.92)^2 + (7.75 - 5.92)^2] = 104.67 (each group's squared deviation counts once per person in the group, n = 4)
SS within = (9 - 8.25)^2 + (7 - 8.25)^2 + ... + (7 - 7.75)^2 = 8.25
Check: SS between + SS within = SS total (104.67 + 8.25 = 112.92)

Mean squares
You get the average sum of squares, or mean square, by dividing a sum of squares by its degrees of freedom (the number of independent pieces of information).
• df between = J - 1 (groups - 1)
• df within = N - J (total people - groups)
• So, MS between = 104.67 / (3 - 1 = 2) = 52.33
• MS within = 8.25 / (12 - 3 = 9) = .917

F ratio
• MS between / MS within
• Signal / noise ratio
• 52.33 / .917 ≈ 57.1
• If there is no effect, you'd expect a ratio of about 1
• A ratio of 57 seems strong. Check it against an F table with 2 and 9 degrees of freedom (same principle as with the t test earlier!)

Spss 1 way anova
• Analyze > General Linear Model > Univariate

Fixed vs random factors
• Fixed factor: the levels under study are the only levels of interest; you can't generalize to other levels
• Random factor: the levels were drawn randomly from a population of levels, so you can generalize
• Ex: do people from different countries like my new phone differently? Give the phone to people from Japan, India, and America.
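The sums of squares, mean squares and F ratio for the online-learning example can be reproduced in a few lines. This is a minimal sketch for a one-way, between-subjects design; the function name `one_way_anova` is hypothetical. SciPy's `scipy.stats.f_oneway` reports the same F together with a p-value.

```python
from statistics import mean


def one_way_anova(groups):
    """Sums of squares, mean squares and F for a one-way between-subjects anova."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_total = sum((x - grand) ** 2 for x in scores)
    # each group mean's squared deviation counts once per person in that group
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1            # J - 1
    df_within = len(scores) - len(groups)   # N - J
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (df_between, df_within),
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


interest = [
    [9, 7, 8, 9],  # OnlineEnvironment1
    [2, 3, 1, 1],  # OnlineEnvironment2
    [8, 7, 9, 7],  # OnlineEnvironment3
]
for name, value in one_way_anova(interest).items():
    print(name, value if isinstance(value, tuple) else round(value, 2))
# ss_total 112.92, ss_between 104.67, ss_within 8.25, df (2, 9),
# ms_between 52.33, ms_within 0.92, F 57.09
```

The check SS between + SS within = SS total is a quick way to catch arithmetic slips when working by hand.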
2+ way anova
• Main effects
• Interaction effects
Ex: testing gender and age (undergrad vs senior citizen); DV = engagedness with robot
You get 3 overall effects:
- effect of gender on engagedness
- effect of age on engagedness
- interaction of gender and age on engagedness (does the effect of gender depend on age? / does the effect of age depend on gender?)

Reading the output
Tests of Between-Subjects Effects
Dependent Variable: ANXIETY
Source            Type III SS   df   Mean Square   F         Sig.   Partial Eta Squared
Corrected Model   944.708a      5    188.942       22.787    .000   .864
Intercept         3927.042      1    3927.042      473.613   .000   .963
CHLDREAR          826.583       2    413.292       49.844    .000   .847
SEX               100.042       1    100.042       12.065    .003   .401
CHLDREAR * SEX    18.083        2    9.042         1.090     .357   .108
Error             149.250       18   8.292
Total             5021.000      24
Corrected Total   1093.958      23
a. R Squared = .864 (Adjusted R Squared = .826)

contrasts
(same online learning data: OnlineEnvironment1 9 7 8 9; OnlineEnvironment2 2 3 1 1; OnlineEnvironment3 8 7 9 7; treatment means 8.25, 1.75, 7.75; grand mean 5.92)
The main effect ('omnibus test') tells you that something is going on here, that there is some difference somewhere, but it doesn't tell you what. Is group 1 different from group 3? Are groups 1 and 3 together different from group 2?

Spss contrasts
Contrast coefficients (must add to zero):
1, 0, -1     (group 1 vs group 3)
.5, -1, .5   (groups 1 and 3 together vs group 2)
"Name brand" contrasts: polynomial, etc.

Omnibus v contrasts
• A significant omnibus test means there will be at least 1 significant contrast, but
• A nonsignificant omnibus test DOESN'T necessarily mean there are no significant contrasts

A Priori vs Post-hoc
• A priori (planned) = theory driven; you planned to test this before you saw your data
• Post-hoc = exploratory, data-driven
• When doing post-hoc contrasts you must be especially careful of type 1 error.
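A contrast is a weighted sum of group means, tested with the anova's MS within as the error term. The sketch below, with hypothetical function names, applies the two coefficient sets above to the online-learning data, assuming equal variances across groups. It also shows how quickly the chance of at least one false positive grows with the number of tests; the formula 1 - (1 - alpha)^k assumes the tests are independent, so treat it as an approximation.

```python
import math
from statistics import mean


def contrast_t(groups, coefs):
    """Estimate, t and df for a contrast in a one-way design.

    Uses MS within from the anova as the error term (equal variances assumed).
    """
    if not math.isclose(sum(coefs), 0.0, abs_tol=1e-9):
        raise ValueError("contrast coefficients must add to zero")
    n_total = sum(len(g) for g in groups)
    df_within = n_total - len(groups)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / df_within
    estimate = sum(c * mean(g) for c, g in zip(coefs, groups))
    se = math.sqrt(ms_within * sum(c ** 2 / len(g) for c, g in zip(coefs, groups)))
    return estimate, estimate / se, df_within


def familywise_error(alpha, k):
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k


interest = [[9, 7, 8, 9], [2, 3, 1, 1], [8, 7, 9, 7]]
for coefs in [(1, 0, -1), (0.5, -1, 0.5)]:
    est, t, df = contrast_t(interest, coefs)
    print(coefs, f"estimate = {est:.2f}, t = {t:.2f}, df = {df}")
# (1, 0, -1): estimate = 0.50, t = 0.74; (0.5, -1, 0.5): estimate = 6.25, t = 10.66

for k in (1, 5, 20):
    print(k, "tests:", round(familywise_error(0.05, k), 3))
# 1 tests: 0.05, 5 tests: 0.226, 20 tests: 0.642
```

Each contrast t is compared against a t table with df = 9; if the contrasts were chosen after looking at the data, use a stricter cutoff.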
Planned v post-hoc
• Familywise error: take this SERIOUSLY
• With an alpha (type 1 error rate) of .05 per test, the chance that at least one test gives a false positive is:
  1 test: 5%
  5 tests: about 23% (1 - .95^5; roughly 5 × 5%)
  20 tests: about 64%, so one of them is probably wrong
• To keep overall error at 5% when you are doing multiple contrasts, you can use a "Bonferroni" correction, which just means dividing .05 by the number of contrasts
• Ex: 10 contrasts, and I want overall error to be 5%. So each contrast must meet a stricter cutoff: .05 / 10 = .005 (0.5%)

Correlation and Regression
• Correlation: linear relationship between X and Y (no assumption about which is the IV and which is the DV)
• Regression: what is the best guess about Y given a certain value of X (X is the IV)
• Similar to the anova model
Best fit line to the data
[Figure: "racism as a function of conservatism", scatterplot with best-fit line; x-axis: political conservatism]

spss
• Analyze > Correlate > Bivariate

Correlations
                                  date    meandif
date      Pearson Correlation     1       .013
          Sig. (2-tailed)                 .908
          N                       82      82
meandif   Pearson Correlation     .013    1
          Sig. (2-tailed)         .908
          N                       82      82

The 2 variables, date and meandif, have a Pearson correlation of .013, and the significance is .908 (i.e., not significant).

• Analyze > Regression > Linear

ANOVA(b)
Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    61.393           1    61.393        19.741   .000a
Residual      202.148          65   3.110
Total         263.542          66
a. Predictors: (Constant), maxvoltage
b. Dependent Variable: secondcal

The regression is significant: maxvoltage predicts secondcal, F(1, 65) = 19.74, p < .001, and it accounts for 61.393 / 263.542 ≈ 23% of the variance in secondcal.

Effect size
• Review: a result can be significant even when the effect is tiny
• Knowing significance doesn't tell you how strong the effect is
• One view: the amount by which the 2 populations don't overlap
• Another view: the amount of total variance explained by your variable (for some measures)
d = (mean1 - mean2) / standard deviation

Effect size
• There are different measures of effect size
• SPSS gives you 'partial eta squared' (tick the check box for it in the dialog):
  variance associated with your variable / (variance associated with your variable + error)

Confidence intervals
• If you repeated the experiment many times and computed a 95% confidence interval each time, about 95% of those intervals would contain the true mean
• Loosely: you can be 95% confident that the true mean falls within your confidence interval
• If the 95% confidence interval for a difference includes 0, the difference is not significant (at .05, 2-tailed)

Independent Samples Test (meandif)
Levene's Test for Equality of Variances: F = .002, Sig. = .964
t-test for Equality of Means:
                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference (Lower, Upper)
Equal variances assumed       1.262   80       .210              4.58285           3.63036                 -2.64180, 11.80750
Equal variances not assumed   1.266   79.027   .209              4.58285           3.62026                 -2.62306, 11.78876

Levene's test is not significant (.964), so read the "Equal variances assumed" row. Either way, p ≈ .21 and the confidence interval includes 0, so the difference is not significant.

Power calculation
• Tough to actually calculate unless you have an idea of the effect size (which you make up, or estimate from past research)
• Power is a matter of effect size and N (and the alpha level)
Ex: what is the effect of a new keyboard on typing speed?
Mean typing speed with old keyboard = 40
Standard deviation of population using old keyboard = 10
A researcher plans to do a study with 25 people and predicts that with the new keyboard, people's typing speed will be 49. This will be tested at the 1% significance level (1-tailed).

Z scores
• A standardized way to see how many standard deviations a point is from the mean

Power calculation cont.
Comparison distribution: mean = 40; sd = 10
Predicted distribution: mean = 49; sd = 10
N = 25
1. Figure the sd of the distribution of means = sqrt(10^2 / 25) = 2
2. Figure the cutoff on the comparison distribution (use a z table)
3. Figure the z score for this cutoff point on the experimental (predicted) distribution
4. Figure the probability of getting a score more extreme than the cutoff on the experimental distribution. This is your power.
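The four power steps for the keyboard example map directly onto code. This is a minimal sketch that assumes, as the example does, a known population sd (so a z test rather than a t test) and a predicted mean above the comparison mean (upper tail only). The function name is hypothetical; `statistics.NormalDist` stands in for the z table.

```python
import math
from statistics import NormalDist


def one_tailed_power(mu0, mu1, sigma, n, alpha):
    """Power of a 1-tailed (upper tail) z test, following the 4 power steps."""
    z = NormalDist()  # standard normal: mean 0, sd 1
    sd_means = math.sqrt(sigma ** 2 / n)              # 1. sd of the distribution of means
    cutoff = mu0 + z.inv_cdf(1 - alpha) * sd_means    # 2. cutoff on the comparison distribution
    z_on_experimental = (cutoff - mu1) / sd_means     # 3. that cutoff as a z score on the predicted distribution
    power = 1 - z.cdf(z_on_experimental)              # 4. chance of a mean beyond the cutoff
    return sd_means, cutoff, z_on_experimental, power


sd_means, cutoff, z_exp, power = one_tailed_power(mu0=40, mu1=49, sigma=10, n=25, alpha=0.01)
print(f"sd of means = {sd_means}, cutoff = {cutoff:.2f}, z = {z_exp:.2f}, power = {power:.3f}")
# sd of means = 2.0, cutoff = 44.65, z = -2.17, power = 0.985

d = (49 - 40) / 10  # predicted effect size: d = (mean1 - mean2) / sd = 0.9
```

Power here is about 98.5%: a predicted d of 0.9 is large, so 25 people are plenty even at the 1% level. Rerunning with a smaller predicted mean or a smaller n shows how quickly power drops.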