- Identify the appropriate test
- One proportion test
- Two samples independent
- Two samples paired
- Estimate the correlation
- Significant correlation

What test should you do?

Example 1. You want to find out whether more than 90% of SkyTrain users have a valid fare.
1. The test will be ___________ tailed.
2. It involves ______ sample(s).
3. It deals with _______________ (mean(s) / proportion(s)).

What test?

Example 2. Two groups of people, one from western Canada and one from eastern Canada, were asked to rate the appeal of a pilot episode for a TV series from 0 to 10. You’re interested in whether there is a difference between eastern and western preferences.
1. The test will be ___________ tailed.
2. It involves ______ sample(s), which are ____________
3. It deals with _______________ (mean(s) / proportion(s)).

What test should you do?

Example 3. You want to know if adding vitamins to the diet of one child will make that child grow taller than their sibling.
1. The test will be ___________ tailed.
2. It involves ______ sample(s), which are ____________
3. It deals with _______________ (mean(s) / proportion(s)).

We need a one-tailed, paired-sample test.

Possible confounds:
Age (Are we comparing siblings of different ages? Should we make the comparison in adulthood?)
Sex (Boys tend to be taller, but the difference appears mainly at later ages.)

If we were correlating height to vitamin use, we could handle the confounding variables by taking the __________________ of vitamin intake and height, ______________ age and sex.

One downside to partial correlation (other than the additional math that we leave to SPSS) is that every variable we control for costs us another degree of freedom. In this course, we control for a single variable, so the degrees of freedom become _______

Seriously, they fly for over a hundred metres to escape predators.
In the vitamin example, say the mean height difference was 23 cm. (When they first tested the effects of vitamins in Mexico, the results were DRAMATIC.) Assume these are the heights of adult men after the experiment.

Baseline sibling (cm)   146   154   160   141   155
Vitamin sibling (cm)    167   179   178   172   175
Difference (cm)          21    25    18    31    20

Since this is paired data, everything is done using the differences and not the original data.

Mean difference = 23
Standard deviation of the differences: SD = 5.15 (will be provided)
The SD of the differences cannot be found from the standard errors of the two groups.

In paired data, everything is handled like a single sample, but it’s a sample of the differences.

SD = 5.15, mean difference = 23, n = 5 (5 pairs, so 5 raw numbers)

We could get a confidence interval of the mean difference. n = 5, so df = n − 1 = 4. For a 95% confidence interval with df = 4, t* = 2.776.

The confidence interval is
23 ± 2.776 × 5.15/√5 = 23 ± 6.39, or (16.61 to 29.39).

We’re 95% sure this interval contains the true mean of the difference. At alpha = 0.05, a two-tailed test would reject any null hypothesis that claimed the mean difference was outside (16.61 to 29.39).

Example 4: A recent sample of 120 faucets at Simon Fraser – Burnaby found that 42 of them were faulty. If more than 0.25 of the faucets are faulty, then ‘major repairs needed’ will be declared on all of the faucets. Test whether 42/120 is significantly above 0.25.

What kind of test? We should do a one-tailed test of one proportion.

Recall: Proportions always use the normal distribution (z-score instead of t-score, df irrelevant).

P is the sample proportion. P = 42/120 = 0.35
SE = 0.0435. Now we find the z-score:
z = (0.35 − 0.25) / 0.0435 = 2.297

From the normal table we could get the p-value of 0.0108. From the t-table, bottom row, one-tailed, we could get 0.01 < p < 0.025 because z is between 1.960 and 2.326.

We’re interested in 5% on a particular side. A 90% confidence interval would also tell us if we’re significantly over .25 faulty faucets. It’s 90% because that leaves 10% on the outside, so 5% on each side.
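As a rough check on the arithmetic in the paired and one-proportion examples, here is a minimal Python sketch using only the standard library. It assumes t* = 2.776, the usual table value for df = 4 at the two-sided 95% level, and, as in the notes, it builds the standard error of the proportion from the sample proportion 0.35. All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# --- Paired example: everything is done on the differences ---
baseline = [146, 154, 160, 141, 155]
vitamin = [167, 179, 178, 172, 175]
diffs = [v - b for v, b in zip(vitamin, baseline)]

n_pairs = len(diffs)
mean_diff = mean(diffs)               # mean of the differences
sd_diff = stdev(diffs)                # sample SD of the differences
se_diff = sd_diff / sqrt(n_pairs)     # standard error of the mean difference

T_STAR_DF4 = 2.776                    # assumed table value: df = 4, 95% two-sided
paired_ci = (mean_diff - T_STAR_DF4 * se_diff,
             mean_diff + T_STAR_DF4 * se_diff)

# --- One-proportion example: normal distribution, no df ---
n_faucets = 120
p_hat = 42 / n_faucets                # sample proportion
p_null = 0.25                         # hypothesized proportion
se_p = sqrt(p_hat * (1 - p_hat) / n_faucets)   # SE from the sample proportion, as in the notes

z = (p_hat - p_null) / se_p
p_value = 1 - NormalDist().cdf(z)     # one-tailed (upper) p-value

z_star_90 = NormalDist().inv_cdf(0.95)  # 5% in each tail
prop_ci = (p_hat - z_star_90 * se_p, p_hat + z_star_90 * se_p)

print(f"mean diff = {mean_diff}, SD = {sd_diff:.2f}, 95% CI = {paired_ci}")
print(f"z = {z:.3f}, one-tailed p = {p_value:.4f}, 90% CI = {prop_ci}")
```

Under these assumptions the z-score comes out near 2.297, the one-tailed p-value near 0.0108, and the 90% interval for the proportion near (.28 to .42).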
The 90% confidence interval that SPSS gives us is (.03 to .17). It’s always the confidence interval of the _______ in SPSS. In this case that means difference from the _______ ______________ value of .25.

The confidence interval of the proportion is (.28 to .42). (That’s a difference of .03 from .25 and a difference of .17 from .25.)

Komodo dragons have a slow-acting poison that lets them take down prey larger than they are. They’re already huge.

Recall: The requirements for correlation:
1. Interval data.
2. Normally distributed x and y.
3. Linear relationship.
4. Random sample (usually assumed instead of checked).

Also, heteroscedasticity can be a problem because it inflates the number of outliers.

Estimate the Pearson correlation. Identify any problems with using the correlation.
The correlation is _______ There is no upward or downward trend (through the whole thing). Correlation doesn’t describe the strength of the relationship because the ______________ requirement isn’t met.

Estimate the Pearson correlation. Identify any problems with using the correlation.
The correlation is _______ There is an upward trend but still some unexplained scatter/variation. (______________ of the variation is explained.) No notable issues with correlation here.

Estimate the Pearson correlation. Identify any problems with using the correlation.
The correlation is _______. There is a downward trend, but it’s by no means a perfect line. Also, there is evidence of heteroscedasticity.

Consider the following correlation: r = -0.600, n = 18.
How much variance in Y is explained by X? ______________
Is this correlation significantly different from zero at α = 0.05?

The t-score is 3 in magnitude (t = -3, since r is negative).
n = 18, so degrees of freedom = 16. (That’s a sample of 18 minus 2 variables.)
t* = 2.119 for df = 16, ______________ 0.05 level.
(Two-sided, or two-tailed, because we didn’t specify above zero or below zero.)

The t-score is larger in magnitude than t-critical: |t| = 3 > t* = 2.119. So we reject the null hypothesis. This correlation is significantly different from zero.

New question. Consider the correlation r = 0.125, n = 300.
How much variance in Y is explained by X? _____________________.
Through this correlation, X explains _______ of the variance.

This correlation is pretty weak. But is it significant (α = 0.05)?

t = 2.175
t* = 1.980, two-tailed, 0.05 level, df = 120 (book)
t* = 1.968, two-tailed, 0.05 level, df = 298 (computer)

t-score > t*, so again, we reject the null. Despite the correlation being weak, it is significant.

Depending on the situation, r = 0.125 with n = 300 may not be practically significant, but it is statistically significant, which is the first step. (All statistical significance means is that we can conclude the correlation isn’t zero.)

Good luck on Friday!
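For review, the two correlation significance tests above can be sketched in a few lines of Python. This assumes the standard formula t = r√(n − 2) / √(1 − r²) with df = n − 2, which reproduces the t-scores quoted in these notes; the critical values are copied from the notes rather than computed, and the function name `correlation_t` is purely illustrative.

```python
from math import sqrt


def correlation_t(r: float, n: int) -> float:
    """t-score for testing whether a Pearson correlation differs from zero (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# (r, n, two-tailed 0.05 critical value as quoted in the notes)
cases = [
    (-0.600, 18, 2.119),   # df = 16
    (0.125, 300, 1.968),   # df = 298
]

results = []
for r, n, t_star in cases:
    t = correlation_t(r, n)
    results.append({
        "r": r,
        "n": n,
        "df": n - 2,
        "t": t,
        # two-tailed test: compare the magnitude of t with the critical value
        "reject_null": abs(t) > t_star,
    })

for row in results:
    print(row)
```

The sign of t follows the sign of r, which is why the two-tailed comparison uses the absolute value.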