Problems 10.19 (in detail), and 10.21 (basic)

Using correlation to see if paired
Begin Chapter 11: The regression equation.
Problem 10.19: Do reading and television viewing compete for
leisure time? We have a random sample of 10 children with…
X: Books read last year: 0, 7, 2, 1, 5, 4, 3, 3, 0, 1
Y: Hours TV watched per day: 3, 1, 2, 2, 0, 1, 3, 2, 7, 4
First, what are we interested in finding?
(X is books/year, Y is TV/day)
A) Is the mean of X less than a specific value?
B) Is the mean of X less than the mean of Y?
C) Does X decrease as Y increases?
D) Is the proportion of X more than a specific value?
What are we interested in finding?
(X is books/year, Y is TV/day)
A) Is the mean of X less than a specific value? NO (that would be a one-sample t-test)
B) Is the mean of X less than the mean of Y? NO (that would be a two-sample t-test)
C) Does X decrease as Y increases? YES!!!! (that's a correlation)
D) Is the proportion of X more than a specific value? NO (that would be a one-sample z-test of a proportion)
So let’s do a correlation.
We’re trying to find out if book reading decreases as television
watching increases.
In other words, we’re looking for a __________
correlation.
That implies that any test for significance will be
______ tailed, also called _______ sided.
In other words, we’re looking for a NEGATIVE
correlation.
That implies that any test for significance will be
ONE tailed, also called ONE sided.
Analyze → Correlate → Bivariate
Protip: You can set the test to one-tailed in the pop-up
window where you select variables. This just cuts the p-value
in half for you.
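For anyone following along without SPSS, the same Pearson correlation can be sketched by hand in plain Python. This is only an illustrative check, not the course's method: the names `books`, `tv_hours`, and `pearson_r` are made up here, and the t statistic uses the usual formula t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.

```python
import math

# Data from Problem 10.19 (variable names are illustrative assumptions).
books = [0, 7, 2, 1, 5, 4, 3, 3, 0, 1]      # X: books read last year
tv_hours = [3, 1, 2, 2, 0, 1, 3, 2, 7, 4]   # Y: hours of TV watched per day


def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the two spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(books, tv_hours)
n = len(books)
# Test statistic for H0: no correlation, with n - 2 = 8 degrees of freedom.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(r, 3))  # -0.725
print(round(t, 2))  # -2.98
```

A t statistic of about -2.98 on 8 degrees of freedom is beyond the one-tailed 1% critical value of -2.896 from a standard t table, which fits a one-tailed p-value just under .01.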
r = -.725, which is strong and negative.
p-value = .009
We could reject the null hypothesis that books read and TV
watched are uncorrelated, but... are we done?
No, we should also check the scatterplot and
maybe the histograms or residuals.
A Pearson correlation assumes a linear relationship and
approximately normal data.
With n=10, a histogram won’t tell us much.
The scatterplot shows a downward trend. With so few points,
it’s hard to tell if there’s a curve or just an influential point and
an outlier.
We can do a Spearman correlation as well. It works on ranks, so it
handles curved (but still steadily decreasing) relationships better.
We come to the same conclusion: a strong negative
correlation, definitely significant.
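As a rough cross-check, a Spearman correlation can be sketched as a Pearson correlation computed on ranks, with tied values sharing the average of their ranks. The helper names `average_ranks` and `pearson_r` are assumptions for illustration, and SPSS output may differ in the last decimal place.

```python
import math

books = [0, 7, 2, 1, 5, 4, 3, 3, 0, 1]      # X: books read last year
tv_hours = [3, 1, 2, 2, 0, 1, 3, 2, 7, 4]   # Y: hours of TV watched per day


def average_ranks(values):
    """Rank values from 1 to n; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Spearman's rho = Pearson's r applied to the ranks.
rho = pearson_r(average_ranks(books), average_ranks(tv_hours))
print(round(rho, 2))  # -0.83
```

The rank-based value is, if anything, a little stronger than the Pearson r of -.725, which is consistent with a relationship that is decreasing but not perfectly straight.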
Part of the problem with this data point (the child who watches 7
hours of TV a day and read 0 books) is that the number of
books read per year can’t go below zero. Following the trend,
this person would read negative books if he/she could.
We have a slight violation of normality. You can only read 0
books or more, but the normal curve continues into the
negatives. Usually this part of the curve is so small we ignore
it, but here it comes up. Without it, correlation might have
been even stronger.
One more thing: We’re using books per year, but TV per day.
Does it matter that these are on different scales?
To check, let TVyear be hours of television per year (TV × 365, ignoring leap years) and rerun the correlation.
There is no difference.
Absolutely none.
This is good. It means the correlation reflects the relationship
between two variables regardless of the scale of
measurement.
Correlation (and t-scores, etc.) is unaffected by scale.
The size of an object is the same if you measure it in meters or
kilometers, so why should any conclusions about it change?
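A quick sketch of that check in Python, assuming the same hand-rolled `pearson_r` helper (an illustrative name, not part of the course material): multiplying every TV value by 365 rescales the variable but leaves r unchanged.

```python
import math

books = [0, 7, 2, 1, 5, 4, 3, 3, 0, 1]        # books read last year
tv_per_day = [3, 1, 2, 2, 0, 1, 3, 2, 7, 4]   # hours of TV per day
tv_per_year = [365 * h for h in tv_per_day]   # same variable, yearly scale


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_day = pearson_r(books, tv_per_day)
r_year = pearson_r(books, tv_per_year)

# The factor of 365 appears in both the numerator and the denominator,
# so it cancels (up to floating-point rounding).
print(math.isclose(r_day, r_year))  # True
```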
Another quick one before we finish Chapter 10.
Problem 10.21: Besides studying time, intelligence itself may
be related to test performance. Find the partial correlation of
studying time and exam grade, holding baseline skill constant.
Alpha = 0.10 (higher than usual)
X: Hours Studied: 4, 1, 3, 5, 8, 2, 7, 6
Y: Exam Grade: 5, 2, 1, 5, 9, 7, 6, 8
Z: Baseline skill: 100, 95, 95, 108, 110, 117, 110, 115
First, let’s look at the simple correlations.
We find a positive correlation between grades and study time.
Hours and skill look correlated, but with only six degrees of
freedom, it could easily be a fluke. (With no true correlation, we’d
still see a correlation this strong 24% of the time.)
Exam grades and baseline skill are also strongly correlated.
Is the grades payoff from study time just a side effect of a
possible link between skill and time spent?
For that we need the partial correlations.
Simple correlation: rXY = .683
Partial correlation: rXY.Z = .634
Not much change. However, the correlation between study
time and grades is now considered not significant because
we’re down to 5 degrees of freedom.
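The partial correlation can be sketched from the three simple correlations using the standard first-order formula rXY.Z = (rXY - rXZ * rYZ) / sqrt((1 - rXZ^2)(1 - rYZ^2)). The slides do not show this formula, so treat it, and the names `hours`, `grade`, `skill`, `pearson_r`, and `t_stat`, as assumptions made for illustration.

```python
import math

# Data from Problem 10.21.
hours = [4, 1, 3, 5, 8, 2, 7, 6]                  # X: hours studied
grade = [5, 2, 1, 5, 9, 7, 6, 8]                  # Y: exam grade
skill = [100, 95, 95, 108, 110, 117, 110, 115]    # Z: baseline skill


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_stat(r, df):
    """t statistic for testing a correlation against zero."""
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)


r_xy = pearson_r(hours, grade)
r_xz = pearson_r(hours, skill)
r_yz = pearson_r(grade, skill)

# Correlation of X and Y with Z held constant.
r_xy_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

n = len(hours)
print(round(r_xy, 3), round(r_xy_z, 3))   # 0.683 0.634
print(round(t_stat(r_xy, n - 2), 2))      # 2.29 (simple, 6 df)
print(round(t_stat(r_xy_z, n - 3), 2))    # 1.83 (partial, 5 df)
```

Assuming a two-sided test at alpha = 0.10, the critical values from a standard t table are 1.943 for 6 degrees of freedom and 2.015 for 5. The simple correlation clears its bar (2.29 > 1.943) and the partial correlation does not (1.83 < 2.015), even though r itself barely moved.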
rXY = .683 (simple)
rXY.Z = .634 (holding skill constant)
The skill-time correlation wasn’t significant, so it’s not a
surprise that it didn’t affect the grades-time correlation much.
Study time is positively correlated with both skill and grades, so if
there were a difference, we’d expect partial < simple.
Everyone likes finishing a chapter.
Chapter 11: Regression
With correlation, we can describe how strongly and in what
direction there is a linear relationship between two variables.
With regression we can describe how much one variable
increases or decreases as another one goes up.
This is done by way of a regression line, which is the line
that goes through the middle of the data in a scatter plot.
The formula of the regression line, like that of any line, is the slope-intercept formula:
Y = a + bX + e
X is the explanatory/independent variable.
Y is the response/dependent variable.
Which variable goes where is up to the question at hand.
b is the slope. It’s defined by rise over run.
(IMPORTANT!!!)
“For every 1 unit that X increases, Y increases by b units.”
If the slope b is positive, then Y will increase by a positive
amount when X increases.
If the slope b is negative, then Y will increase by a negative
amount when X increases. (In other words, Y decreases
when X increases when b is negative)
Either the correlation r and the slope b are both positive or
they’re both negative.
If the correlation is significantly different from zero, so is the
slope.
a is the intercept.
It’s the average value of Y when X is zero.
This makes a lot more sense in a practical context.
e stands for error, or residual.
It’s a measure of how far the line is from describing Y perfectly.
The errors always have an average of zero (taken over all points).
Example 1: Books vs. Television.
This is the scatterplot of books read per year vs. TV watched per
day.
The regression line through it has the formula:
Ŷ = 4.701 – 0.841 X
(Here X is hours of TV watched per day and Ŷ is the predicted number of books read per year.)
The slope is -0.841.
That means for every extra hour/day of TV watching, 0.841
fewer books are read per year, on average.
Ŷ = 4.701 – 0.841 X
The slope is -0.841, and the intercept is 4.701.
That means when no TV is watched (the TV variable is zero), an
average of 4.701 books are read per year.
Ŷ = 4.701 – 0.841 X
There is no number for error shown because this is the average
trend.
Ŷ is an estimate of the true response Y.
The errors are the difference between what we estimate, and
what we really get.
e = Y - Ŷ
When we’re just looking at the trend, we ignore e.
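A minimal least-squares sketch of Example 1 in Python, assuming the usual formulas b = Sxy / Sxx and a = mean(Y) - b * mean(X), and the convention that a residual is observed minus predicted. The helper name `fit_line` is hypothetical. TV hours are the explanatory variable here and books are the response.

```python
# Example 1 data: TV hours explain books read.
tv_hours = [3, 1, 2, 2, 0, 1, 3, 2, 7, 4]   # X: hours of TV per day
books = [0, 7, 2, 1, 5, 4, 3, 3, 0, 1]      # Y: books read last year


def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(tv_hours, books)
predicted = [a + b * x for x in tv_hours]
# Residual = what we really get minus what we estimate.
residuals = [y - y_hat for y, y_hat in zip(books, predicted)]
mean_residual = sum(residuals) / len(residuals)

print(round(a, 3), round(b, 3))     # 4.701 -0.841
print(abs(mean_residual) < 1e-9)    # True: the errors average to zero
print(round(a + b * 7, 2))          # -1.18 predicted books at 7 hours/day
```

The last line shows the issue raised earlier: at 7 hours of TV a day the line predicts a negative number of books, which no child can actually read.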
Example 2: Grades and study time.
Ŷ = 1.893 + 0.774X
The slope is 0.774.
That means for every 1 hour of study time, the average exam
grade went up by 0.774 points.
Ŷ = 1.893 + 0.774X
The slope is 0.774, and the intercept is 1.893.
That means the average exam grade for someone who put no
hours into studying is estimated to be 1.893.
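The slope and the correlation are linked by the identity b = r * (sY / sX), where sX and sY are the sample standard deviations. Because standard deviations are never negative, b and r must share a sign. A short sketch on the Example 2 data, with `pearson_r` again an illustrative helper rather than anything from the course:

```python
import math
import statistics

hours = [4, 1, 3, 5, 8, 2, 7, 6]   # X: hours studied
grade = [5, 2, 1, 5, 9, 7, 6, 8]   # Y: exam grade


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(hours, grade)
# Slope from the correlation and the two sample standard deviations.
slope = r * statistics.stdev(grade) / statistics.stdev(hours)
# The line always passes through the point (mean of X, mean of Y).
intercept = statistics.mean(grade) - slope * statistics.mean(hours)

print(round(slope, 3), round(intercept, 3))  # 0.774 1.893
```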
Monday: More on the intercept, regression SPSS, prediction
and extrapolation.
Wednesday: Midterm 2 review. (Mostly chapter 7)
Friday: Midterm 2. DUN DUN DUN!