Estimate the Pearson correlation.

Identify the appropriate test
One proportion test
Two samples independent
Two samples paired
Estimate the correlation
Significant correlation
What test should you do? Example 1
You want to find out if more than 90% of SkyTrain users have a valid
fare.
1. The test will be ___________ tailed.
2. It involves ______ sample(s).
3. It deals with _______________ (mean(s) / proportion(s)).
What test? Example 2.
Two groups of people, one from western Canada and one
from eastern Canada, were asked to rate the appeal of a pilot
episode for a TV series from 0 to 10.
You’re interested in whether there is a difference between
eastern and western preferences.
1. The test will be ___________ tailed.
2. It involves ______ sample(s), which are ____________
3. It deals with _______________ (mean(s) / proportion(s)).
What test should you do? Example 3
You want to know if adding vitamins to the diet of one child
will make that child grow taller than their sibling.
1. The test will be ___________ tailed.
2. It involves ______ sample(s), which are ____________
3. It deals with _______________ (mean(s) / proportion(s)).
We need a one-tailed, paired-sample test.
Possible confounds:
Age (Are we comparing siblings of different ages? Should we
make the comparison in adulthood?)
Sex (Boys tend to be taller, but the difference appears mainly
in later ages.)
If we were correlating height to vitamin use, we could handle
the confounding variables by taking the
__________________ of vitamin intake and height,
______________age and sex.
One downside to partial correlation (other than the additional
math that we leave to SPSS) is that every variable we control
for costs us another degree of freedom.
In this course, we control for a single variable, so the degrees
of freedom become _______
Seriously, they fly for over a hundred metres to escape
predators.
In the vitamin example, say the mean height difference was
23 cm.
(When they first tested the effects of vitamins in Mexico, the
results were DRAMATIC)
Assume these are heights of adult men after the experiment.
Baseline Sibling   Vitamin Sibling   Difference
146 cm             167 cm            21 cm
154                179               25
160                178               18
141                172               31
155                175               20
Since this is paired data, everything is done using the
differences and not the original data.
Differences (cm): 21, 25, 18, 31, 20
Mean difference = 23 cm
Standard deviation of the differences: SD = 5.15 cm (will be
provided)
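The difference column and its summary statistics can be reproduced with a short Python sketch. The variable names are illustrative, and `statistics.stdev` is used on the assumption that the SD quoted here is the sample (n - 1) standard deviation.

```python
from statistics import mean, stdev

# Heights in cm from the data above (adult men after the experiment).
baseline = [146, 154, 160, 141, 155]
vitamin = [167, 179, 178, 172, 175]

# Paired data: work with one difference per sibling pair.
differences = [v - b for v, b in zip(vitamin, baseline)]

mean_diff = mean(differences)   # mean of the differences
sd_diff = stdev(differences)    # sample SD (divides by n - 1)

print(differences)
print(mean_diff, round(sd_diff, 2))
```

Only the list of differences feeds into the mean and SD; the two original columns are not used again.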
SD cannot be found from the standard errors of the two
groups.
In paired data, everything is handled like a single sample, but
it’s a sample of the differences.
Mean difference = 23
SD = 5.15
n = 5 (5 pairs, so 5 raw numbers)
We could get a confidence interval of the mean difference.
n = 5, so df = n - 1 = 4
For a 95% confidence interval, t* = 2.776
The confidence interval is
mean ± t* × SD / √n = 23 ± 2.776 × 5.15 / √5 = 23 ± 6.39
That gives (16.61 to 29.39).
We’re 95% sure this interval contains the true mean of the
difference.
At alpha = 0.05, we would reject any two-sided null hypothesis that
claimed the mean difference was outside (16.61 to 29.39)
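As a check on the arithmetic, the interval can be computed from the summary values. This is only a sketch: `paired_ci` is a hypothetical helper name, and the critical value 2.776 (two-sided 95%, df = 4) is typed in from a standard t table rather than computed.

```python
from math import sqrt

def paired_ci(mean_diff, sd_diff, n, t_star):
    """Confidence interval for a mean difference: mean ± t* × SD / sqrt(n)."""
    margin = t_star * sd_diff / sqrt(n)
    return mean_diff - margin, mean_diff + margin

# Summary values from the vitamin example; t* looked up for df = n - 1 = 4.
low, high = paired_ci(mean_diff=23, sd_diff=5.15, n=5, t_star=2.776)
print(round(low, 2), round(high, 2))
```

Because the data are paired, the function takes the SD of the differences and the number of pairs, exactly as a one-sample interval would.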
Example 4: A recent sample of 120 faucets in Simon Fraser –
Burnaby found that 42 of them were faulty. If more than 0.25
of the faucets are faulty then ‘major repairs needed’ will be
declared on all of the faucets.
Test if the 42/120 is significantly above 0.25.
What kind of test?
We should do a one-tailed test of one proportion.
Recall: Proportions always use the normal distribution (z-score
instead of t-score, df irrelevant)
p is the sample proportion.
p = 42/120 = 0.35
SE = √(p(1 - p)/n) = √(0.35 × 0.65 / 120) = 0.0435. Now we find the z-score:
z = (0.35 - 0.25) / 0.0435 = 2.297. From the normal table we could get the p-value of
0.0108.
From the t-table, bottom row, one-tailed, we could get
0.01 < p < 0.02 because z is between 2.054 and 2.326.
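The z-score and one-tailed p-value can be reproduced with Python's standard library. This sketch assumes, as the numbers above do, that the standard error is built from the sample proportion; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

faulty, n = 42, 120
p_null = 0.25                        # proportion claimed by the null hypothesis

p_hat = faulty / n                   # sample proportion
se = sqrt(p_hat * (1 - p_hat) / n)   # standard error from the sample proportion
z = (p_hat - p_null) / se

# One-tailed test ("above 0.25"): area to the right of z under the normal curve.
p_value = 1 - NormalDist().cdf(z)

print(round(p_hat, 2), round(se, 4), round(z, 3), round(p_value, 4))
```

No degrees of freedom appear anywhere, which is the practical meaning of "proportions always use the normal distribution".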
We’re interested in 5% on a particular side. A 90% confidence
interval would also tell us if we’re significantly over .25 faulty
faucets.
90% because it’s 10% on the outside, so 5% on each side.
The 90% confidence interval that SPSS gives us is (.03 to .17).
It’s always the confidence interval of the _______ in SPSS.
In this case that means difference from the _______
______________ value of .25.
The confidence interval of the proportion is (.28 to .42)
(That’s a difference of .03 from .25 and a difference of .17 from
.25)
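The link between the two ways of writing the interval can be checked numerically. This is a sketch, again assuming the sample-proportion standard error; `z_star` is the normal critical value that leaves 5% in each tail.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n, test_value = 0.35, 120, 0.25

se = sqrt(p_hat * (1 - p_hat) / n)
z_star = NormalDist().inv_cdf(0.95)   # leaves 5% in each tail, about 1.645

# 90% confidence interval of the proportion itself.
ci_proportion = (p_hat - z_star * se, p_hat + z_star * se)

# Same interval expressed as a difference from the test value.
ci_difference = tuple(bound - test_value for bound in ci_proportion)

print([round(b, 2) for b in ci_proportion])
print([round(b, 2) for b in ci_difference])
```

The lower bound of the difference interval stays above zero, which matches the one-tailed conclusion that the faulty proportion is significantly over .25.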
Komodo dragons have a slow-acting poison to let them take
down prey larger than them. They’re already huge.
Recall: The requirements for correlation:
1. Interval data.
2. Normally distributed x and y.
3. Linear relationship.
4. Random sample (usually assumed instead of checked).
Also, heteroscedasticity can be a problem because it inflates
the number of outliers.
Estimate the Pearson correlation.
Identify any problems with using the correlation.
The correlation is _______. There is no upward or
downward trend (through the whole thing).
Correlation doesn’t describe the strength of the relationship
because the ______________ requirement isn’t met.
Estimate the Pearson correlation.
Identify any problems with using the correlation.
The correlation is _______. There is an upward trend but still
some unexplained scatter/variation. (______________ of
the variation is explained)
No notable issues with correlation here.
Estimate the Pearson correlation.
Identify any problems with using the correlation.
The correlation is _______. There is a downward trend, but
it’s by no means a perfect line.
Also, there is evidence of heteroscedasticity.
Consider the following correlation.
r = -0.600, n=18
How much variance in Y is explained by X?
______________
Is this correlation (r = -0.600, n = 18) significantly different from
zero at alpha = 0.05?
t-score = 3 (in absolute value; r is negative, so t = -3).
n = 18, so degrees of freedom = 16.
(That’s a sample of 18 minus 2 variables)
|t| = 3. t* = 2.120 for df = 16, two-sided 0.05 level.
(two-sided or two-tailed because we didn’t specify above zero
or below zero)
|t| > t-critical t*
So we reject the null hypothesis. This correlation is
significantly different from zero.
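The t-score for a correlation comes from t = r × sqrt(n - 2) / sqrt(1 - r²). A minimal sketch of the calculation follows; `correlation_t` is a hypothetical helper name, and the critical value is typed in from a t table (df = 16, two-tailed, 0.05) rather than computed.

```python
from math import sqrt

def correlation_t(r, n):
    """t-score for testing whether a Pearson correlation differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r, n = -0.600, 18
t = correlation_t(r, n)
df = n - 2
t_star = 2.120                      # table value: df = 16, two-tailed, 0.05

print(round(r ** 2, 2))             # share of variance in Y explained by X
print(round(t, 3), df)
print(abs(t) > t_star)              # True means reject the null hypothesis
```

The sign of t follows the sign of r; for a two-tailed test only its absolute value is compared with t*.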
New question. Consider the correlation r=0.125, n=300.
How much variance in Y is explained by X?
_____________________.
Through this correlation, X explains _______ of the variance.
This correlation is pretty weak. But is it significant (alpha
= 0.05)?
t = 2.175. t* = 1.980, two-tailed, 0.05 level, df = 120 (book table)
t = 2.175. t* = 1.968, two-tailed, 0.05 level, df = 298 (computer)
t-score > t*
So again, we reject the null. Despite the correlation being
weak, it is significant.
Depending on the situation, r = 0.125 with n = 300 may not be
practically significant, but it is statistically significant, which is
the first step.
(All statistical significance means is that we can conclude the
correlation isn’t zero)
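To see how a weak correlation can still be statistically significant, the same t-score formula, t = r × sqrt(n - 2) / sqrt(1 - r²), can be applied to r = 0.125 at two sample sizes. This is an illustrative sketch: the n = 30 case is a made-up comparison that is not part of the example, `correlation_t` is a hypothetical helper name, and both critical values are typed in from a t table (two-tailed, 0.05).

```python
from math import sqrt

def correlation_t(r, n):
    """t-score for testing whether a Pearson correlation differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r = 0.125
# Two-tailed 0.05 critical values keyed by sample size (df = n - 2).
# n = 30 is a hypothetical comparison; n = 300 is the example above.
t_star_by_n = {30: 2.048, 300: 1.968}

print(f"variance explained: {r ** 2:.2%}")
for n, t_star in t_star_by_n.items():
    t = correlation_t(r, n)
    print(n, round(t, 3), "significant" if t > t_star else "not significant")
```

The variance explained is identical in both rows; only the sample size changes whether the t-score clears t*.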
Good luck on Friday!