Correlations and T-tests

Matching level of measurement to statistical procedures

We can match statistical methods to the level of measurement of the two variables that we want to assess:

Level of Measurement   Nominal          Ordinal       Interval                  Ratio
Nominal                Chi-square       Chi-square    T-test, ANOVA             T-test, ANOVA
Ordinal                Chi-square       Chi-square    ANOVA                     ANOVA
Interval               T-test, ANOVA    ANOVA         Correlation, Regression   Correlation, Regression
Ratio                  T-test, ANOVA    ANOVA         Correlation, Regression   Correlation, Regression

However, we should only use these tests when:
  The interval or ratio level variable has a normal distribution.
  The dependent variable (for Correlation, T-test, ANOVA, and Regression) is interval or ratio.
  The sample has been randomly selected or is from a population.

Interpreting a Correlation from an SPSS Printout

Correlations
                                                  Educational Level (years)   Beginning Salary
Educational Level (years)   Pearson Correlation   1                           .633**
                            Sig. (2-tailed)       .                           .000
                            N                     474                         474
Beginning Salary            Pearson Correlation   .633**                      1
                            Sig. (2-tailed)       .000                        .
                            N                     474                         474
**. Correlation is significant at the 0.01 level (2-tailed).

A correlation:
  Is an association between two interval or ratio variables.
  Can be positive or negative.
  Measures the strength of the association between the two variables and whether it is large enough to be statistically significant.
  Ranges from -1.00 to +1.00.

Example: Types of Relationships

Positive                       Negative                       No Relationship
Income ($)   Education (yrs)   Income ($)   Education (yrs)   Income ($)   Education (yrs)
20,000       10                20,000       18                20,000       14
30,000       12                30,000       16                30,000       18
40,000       14                40,000       14                40,000       10
50,000       16                50,000       12                50,000       12
75,000       18                75,000       10                75,000       16

The stronger the correlation, the closer it will be to 1.00 or -1.00.
Weak correlations will be close to 0.00 (either positive or negative).

You can see the degree of correlation (association) by using a scatterplot graph.

[Scatterplot: Current Salary on the horizontal axis, 0 to 140,000; vertical axis ticks from 6 to 22]

Looking at a scatterplot from the same data set, for current and beginning salary, we can see a stronger correlation.

[Scatterplot: Current Salary on the horizontal axis, 0 to 140,000; vertical axis ticks from 0 to 100,000]

If we run the correlation between these two variables in SPSS, we find:

Correlations
                                         Beginning Salary   Current Salary
Beginning Salary   Pearson Correlation   1                  .880**
                   Sig. (2-tailed)       .                  .000
                   N                     474                474
Current Salary     Pearson Correlation   .880**             1
                   Sig. (2-tailed)       .000               .
                   N                     474                474
**. Correlation is significant at the 0.01 level (2-tailed).

For these two variables, if we were to test a hypothesis at the .01 significance level:
  Alternative Hypothesis: There is a positive association between beginning and current salary.
  Null Hypothesis: There is no association between beginning and current salary.
  Decision: r (correlation) = .88 at p = .000 (that is, p is below .001). This is less than .01, so we reject the null hypothesis and accept the alternative hypothesis!

(Bonus Question): Why would we expect the previous correlation to be statistically significant at below the p = .01 level?
Answer: This is a large data set (N = 474). This makes it likely that if there is a correlation, it will be statistically significant at a low significance (p) level. Larger data sets are less likely to be affected by sampling or random error!

Other important information on correlation
  Correlation does not tell us if one variable "causes" the other, so there really isn't an independent or dependent variable.
  With correlation, you should be able to draw a straight line between the highest and lowest point in the distribution.
  Points that are off the "best fit" line indicate that the correlation is less than perfect (-1.00 or +1.00).
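The three example relationships from the Types of Relationships table can be checked numerically. The sketch below is only an illustration, not the SPSS procedure: it assumes the textbook Pearson formula, and the function name pearson_r and the variable names are made up for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Values copied from the example table (income in dollars, education in years)
income = [20000, 30000, 40000, 50000, 75000]
education_positive = [10, 12, 14, 16, 18]
education_negative = [18, 16, 14, 12, 10]
education_none = [14, 18, 10, 12, 16]

r_positive = pearson_r(income, education_positive)
r_negative = pearson_r(income, education_negative)
r_none = pearson_r(income, education_none)

print(f"positive:        r = {r_positive:.3f}")  # about  0.974
print(f"negative:        r = {r_negative:.3f}")  # about -0.974
print(f"no relationship: r = {r_none:.3f}")      # about  0.037
```

The positive and negative examples come out close to, but not exactly, +1.00 and -1.00 because the income values are not evenly spaced, so the points do not fall on a single straight line.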
Regression is the statistical method that allows us to determine whether the value of one interval/ratio level variable can be used to predict or determine the value of another.

Another measure of association is a t-test.

T-tests
  Measure the association between a nominal level variable and an interval or ratio level variable.
  Look at whether the nominal level variable "causes" a change in the interval/ratio variable.
  Therefore the nominal level variable is always the independent variable and the interval/ratio variable is always the dependent.

Example of t-test: Self-Esteem Scores

        Men      Women
        32       34
        44       18
        56       52
        18       16
        21       33
        39       26
        25       35
        28       20
Mean    32.875   29.25

Important things to know about an independent samples t-test
  It can only be used when the nominal variable has only two categories.
  Most often the nominal variable pertains to membership in a specific demographic group or a sample.
  The association examined by the independent samples t-test is whether the mean of the interval/ratio variable differs significantly between the two groups. If it does, that means that group membership "causes" the change or difference in the mean score.

Looking at the difference in means between the two groups, can we tell if the difference is large enough to be statistically significant?

Group Statistics
                   Gender   N     Mean       Std. Deviation   Std. Error Mean
Beginning Salary   Male     258   $20301.4   *********        $567.275
                   Female   216   $13092.0   *********        $199.742
(Asterisks mark values the printout did not display.)

T-test results

Independent Samples Test (Beginning Salary)
                                                    Equal variances assumed   Equal variances not assumed
Levene's Test for Equality of Variances   F         105.969
                                          Sig.      .000
t-test for Equality of Means   t                    11.152                    11.987
                               df                   472                       318.818
                               Sig. (2-tailed)      .000                      .000
                               Mean Difference      $7,209.43                 $7,209.43
                               Std. Error Difference   $646.447               $601.413
95% Confidence Interval of the Difference   Lower   $5939.16                  $6026.19
                                            Upper   $8479.70                  $8392.67

Positive and Negative t-tests
  Your t-test will be positive when the lowest value category (1,2) or (0,1) is entered into the grouping menu first and the mean of that first group is higher than the mean of the second group.
  Your t-test will be negative when the lowest value category is entered into the grouping menu first and the mean of the second group is higher than the mean of the first group.

Paired Samples T-Test
  Used when respondents have taken both a pre-test and a post-test using the same measurement tool (usually a standardized test).
  Supplements the result obtained when the mean post-test score for all the respondents is subtracted from the mean pre-test score.
  If there is a change in the scores between the pre-test and the post-test, it usually means that the intervention is effective.
  A statistically significant paired samples t-test usually means that the change between pre-test and post-test scores is large enough that it cannot be simply due to random or sampling error.
  An important exception here is that the change between pre-test and post-test scores must be in the direction (positive/negative) specified in the hypothesis.

Paired Samples T-Test (continued)
  For example, suppose our hypothesis states that participation in the welfare reform experiment is associated with a positive change in welfare recipient wages from work. If participation in the experiment actually decreased wages, then our hypothesis would not be confirmed. We would accept the null hypothesis and reject the alternative hypothesis.
  Pre-test wages: Mean = $400 per month for each participant.
  Post-test wages: Mean = $350 per month for each participant.
  However, we need to know the t-test value to know if the difference in means is large enough to be statistically significant.
  What are the alternative and null hypotheses for this study?
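Returning to the self-esteem scores for men and women: the question of whether the difference in means is large enough can be worked through by hand. The sketch below makes several assumptions that are not in the slides: it uses the pooled-variance ("equal variances assumed") form of the independent samples t-test, a .05 significance level, and a critical value read from a standard t table. The function name independent_t is hypothetical.

```python
from math import sqrt

def independent_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent samples t-test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sums of squared deviations within each group
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    df = n_a + n_b - 2
    pooled_variance = (ss_a + ss_b) / df
    # Standard error of the difference between the two group means
    se_difference = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_difference, df

# Self-esteem scores copied from the example table
men = [32, 44, 56, 18, 21, 39, 25, 28]
women = [34, 18, 52, 16, 33, 26, 35, 20]

t_value, df = independent_t(men, women)

# Assumption: two-tailed critical t for df = 14 at the .05 level (standard t table)
CRITICAL_T_05 = 2.145
significant = abs(t_value) > CRITICAL_T_05

print(f"t = {t_value:.3f}, df = {df}, significant at .05: {significant}")
```

Under these assumptions t is roughly 0.59 with 14 degrees of freedom, well below the critical value, so the 3.6-point difference between the group means would not be treated as statistically significant with samples this small. Men are entered first and have the higher mean, so t is positive.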
Let's test a hypothesis for an independent t-test
  We want to know if women have higher scores on a test of exam-related anxiety than men.
  The researcher has set the significance level for this study at p = .05.
  On the SPSS printout, t = 2.6, p = .03.
  What are the alternative and null hypotheses? Can we accept or reject the null hypothesis?

Answer
  Alternative hypothesis: Women have higher levels of exam-related anxiety than men as measured by a standardized test.
  Null hypothesis: There will be no difference between men and women on the standardized test of exam-related anxiety.
  Reject the null hypothesis (p = .03 is less than the significance level of .05).
  Accept the alternative hypothesis. There is a relationship.

Computing a Correlation
  Select Analyze
  Select Correlate
  Select two or more variables and click add
  Click o.k.

Computing an independent t-test
  Select Analyze
  Select Compare Means
  Select Independent T-test
  Select Test (Dependent Variable - must be interval or ratio)
  Select Grouping Variable (must be nominal, with only two categories)
  Select numerical category for each group (usually group 1 = 1, group 2 = 2)
  Click o.k.

Computing a paired sample t-test
  Select Analyze
  Select Compare Means
  Select Paired Samples T-test
  Highlight two interval/ratio variables (should be from pre-test and post-test)
  Click on arrow
  Click o.k.

Data from Paired Sample T-test

Paired Samples Statistics
                            Mean       N     Std. Deviation   Std. Error Mean
Pair 1   Current Salary     $34419.6   474   *********        $784.311
         Beginning Salary   $17016.1   474   *********        $361.510
(Asterisks mark values the printout did not display.)

More data from paired samples t-test

Paired Samples Test
Pair 1: Current Salary - Beginning Salary
  Paired Differences
    Mean                                        $17403.5
    Std. Deviation                              *********
    Std. Error Mean                             $496.732
    95% Confidence Interval of the Difference   Lower $16427.4, Upper $18379.6
  t                                             35.036
  df                                            473
  Sig. (2-tailed)                               .000

Analysis of Variance (ANOVA)
  Is used when you want to compare means for three or more groups.
  You have a normal distribution (random sample or population).
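  It can be used to assess whether group membership "causes" a difference in the mean, although a causal conclusion also depends on how the study was designed.
  It contains an independent variable that is nominal and a dependent variable that is interval/ratio.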