Wk13_2

This time: A correction, ANOVA with review, student reviews.
Assignment 4 is marked and in the workshop.
Correction from Monday’s lecture.
A more detailed account of why this is so is in Wk13_Extra on
the webpage, but it’s for interest only.
What I said:
The proportion of variation explained is just like the r-squared
from correlation. In fact, if I took the correlation of marriage
number and marriage length and squared it, I should get the
same proportion.
What I neglected to say:
I will get the same amount of variance that is explained
BY THE REGRESSION.
The regression and ANOVA
are two different ways to model the data, so depending on the
data one will explain a lot more of the variance than the other.
If the independent variable is nominal (groups), ANOVA will do
much better. It will explain more variance than a regression.
If the independent variable is interval (X), regression
will do better.
In the caffeine.sav data, ANOVA beats regression hands down.
ANOVA r-squared 235/281 = 0.836, 83.6% variance explained
Regression r-squared 16/281 = 0.057, 5.7% variance explained
But in the marriage data, number of previous marriages could
be interpreted as interval. This fact and the linear decrease in
marriage length as previous marriages increase mean that
regression is ALSO a useful tool to analyze the data.
In short, both ANOVA and regression were appropriate for the
marriage data. We would expect to get similar results from
both.
Regression explains 80.3%
ANOVA explains 80.6%
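To make the comparison concrete, here is a minimal sketch that computes both proportions by hand. The data are made up for illustration (they are NOT the marriage or caffeine data), and the function names are hypothetical. Both functions divide an "explained" sum of squares by the same total sum of squares; they differ only in what does the explaining (group means versus a straight line).

```python
from statistics import mean

def total_ss(y):
    """Total sum of squares: variation of y around its overall mean."""
    m = mean(y)
    return sum((v - m) ** 2 for v in y)

def anova_r_squared(x, y):
    """Proportion of variance explained when x is treated as group labels."""
    grand = mean(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / total_ss(y)

def regression_r_squared(x, y):
    """Proportion of variance explained by a straight line in x
    (the squared correlation of x and y)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy ** 2 / (sxx * total_ss(y))

# Hypothetical data: three groups coded 1, 2, 3 with three cases each.
x = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y_linear = [10, 9, 11, 7, 8, 6, 4, 5, 3]   # group means fall on a straight line
y_bump = [2, 3, 1, 9, 8, 10, 2, 1, 3]      # middle group is high, no linear trend

for label, y in [("linear", y_linear), ("bump", y_bump)]:
    print(label,
          "ANOVA:", round(anova_r_squared(x, y), 3),
          "regression:", round(regression_r_squared(x, y), 3))
```

When the group means fall on a line, the two proportions agree. When they do not, the straight line misses the pattern and only the group-means model picks it up. On the same data the group-means fit can match the straight-line fit but cannot fall below it, which is consistent with 80.6% versus 80.3% above.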
Too long, didn’t read:
When I said the proportion of variance explained is exactly r-squared…
Proportion of variance explained in a regression is the r-squared for regression.
Proportion of variance explained in ANOVA is the r-squared for
ANOVA. You can compare these to help determine which is
better.
That they matched for the marriage example was a freak event.
Thanks for bearing with me sandwiching that into the lecture.
(No dragons harmed in this DLT)
New example: Consider the dataset Ch8_24.sav, which has a
list of patients with a panic disorder including
- The type of treatment they are receiving.
(Behavioural, Cognitive, or Medication)
- The response from 0 (no help) to 10 (helping
completely) to the treatment.
Tossing ethics aside, let’s assume that the assignment of
therapy to person was random.
For cost reasons, the medication group is larger than the other
two. This is called an ____________ design.
The response/dependent variable is ordinal (0-10 scale), but
we’ll treat it like it’s interval because the points on the scale
could be assumed to be evenly spaced apart.
The explanatory/independent variable is nominal (type of
treatment). It has no natural ordering, so we can only treat it
as nominal.
What should we do?
Our toolbox:
Normal z-test
One sample t-test
Two sample t-test
Correlation
Regression
Chi-Squared
Odds Ratio
ANOVA
Our toolbox:
Normal z-test
  One group only, needs known σ
One sample t-test
  One group only
Two sample t-test
  Two groups only
Correlation
  Needs interval explanatory
Regression
  Needs interval explanatory
Chi-Squared
  Needs nominal response
Odds Ratio
  Needs nominal response
ANOVA
  This one
There is one more requirement of ANOVA, _________
standard deviation. We can check this subjectively by looking
at a scatterplot.
They appear to be spread about the same amount, so the
assumption that the standard deviations are the same is
reasonable.
However, since all the values are whole numbers, the
scatterplot can be hiding something: multiple cases with the
same value.
There’s no way to tell how many cases each of these dots
represents. There could be any number of cases with a
response of “5”.
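Outside of a plot, the overlap is easy to count. A minimal sketch, using made-up (group, response) pairs rather than the Ch8_24.sav values:

```python
from collections import Counter

# Hypothetical (treatment, response) pairs, invented for illustration only.
cases = [
    ("Medication", 5), ("Medication", 5), ("Medication", 5),
    ("Medication", 6), ("Cognitive", 5), ("Cognitive", 4),
]

# Each distinct pair is one dot on the scatterplot; the count is how many
# cases are stacked on that dot.
overlap = Counter(cases)

for (group, response), count in sorted(overlap.items()):
    print(f"{group:12s} response={response}  cases on this dot: {count}")
```

Here the single dot at Medication, response 5 stands for three cases, which the scatterplot alone would not show.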
Option one: Another visualization.
Does anyone else remember the __________________?
The boxplot gives us a picture of a measure of spread, the
_______________, the range between Q1 and Q3.
Each box spans the middle half of the data, and no box is much
taller than any other.
The boxes are close to the same height; also none of the
categories have tons of outliers. So there’s little evidence that
the true standard deviations are different.
Side-by-side boxplots give information that scatterplots can’t.
They’re also very useful when the groups have _________.
Option two: Look at the sample standard deviation.
We can find it and other info in the summary statistics.
All the standard deviations are between 1.68 and 1.81, again
no evidence of heteroscedasticity.
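Both checks (IQR from the boxplot, standard deviation from the summary statistics) can be sketched in a few lines. The responses below are made up for illustration and are NOT the Ch8_24.sav values; only the group sizes (7, 7, 13) follow the lecture. Python's default quartile method may also differ slightly from the one SPSS uses, so the IQRs are indicative only.

```python
from statistics import quantiles, stdev

# Hypothetical 0-10 responses, invented for illustration only.
responses = {
    "Behavioural": [3, 4, 5, 5, 6, 7, 8],
    "Cognitive": [2, 4, 4, 5, 6, 6, 8],
    "Medication": [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 9],
}

def spread_summary(values):
    """Return group size, sample standard deviation, and interquartile range."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points Q1, Q2, Q3
    return {"n": len(values), "sd": stdev(values), "iqr": q3 - q1}

summary = {name: spread_summary(v) for name, v in responses.items()}

# Compare the largest and smallest sample standard deviations.
sds = [s["sd"] for s in summary.values()]
sd_ratio = max(sds) / min(sds)

for name, s in summary.items():
    print(f"{name:12s} n={s['n']:2d}  sd={s['sd']:.2f}  iqr={s['iqr']:.2f}")
print(f"largest sd / smallest sd = {sd_ratio:.2f}")
```

A ratio close to 1 means the groups are spread about equally. A common rule of thumb (not from this lecture) is to start worrying when the largest standard deviation is more than about twice the smallest. For the values quoted above, 1.81 / 1.68 is roughly 1.08.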
Let’s actually do the ANOVA, with a default alpha 0.05.
Is there a significant difference between the means?
No. Sig. (the p-value) is .204. The p-value is greater than alpha, so we
fail to reject the null (that all the true means are the same).
The sample means aren’t different enough to say that the
population means are different.
How much of the variation in response to therapy is explained
by the type of treatment?
Only 10.426 / 84.074 = 0.124, or 12.4% of the variation is
explained.
Knowing the group would help very little if at all in predicting
the response to treatment.
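The ANOVA table can be rebuilt from the two sums of squares quoted above plus the group sizes. This is a sketch under assumptions: it reads the group sizes as 13, 7, and 7 (seven each in the behavioural and cognitive groups), and it uses a closed-form p-value that only works when there are exactly three groups (2 degrees of freedom between groups). It is not how SPSS computes the output.

```python
# Sums of squares quoted in the lecture's ANOVA output.
ss_between = 10.426        # variation explained by type of treatment
ss_total = 84.074          # total variation in response
group_sizes = [13, 7, 7]   # medication, behavioural, cognitive (assumed 7 each)

k = len(group_sizes)       # number of groups
n = sum(group_sizes)       # total number of cases
df_between = k - 1
df_within = n - k

# Whatever the groups do not explain is left over within groups.
ss_within = ss_total - ss_between
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# Upper-tail probability of an F distribution with 2 numerator degrees of
# freedom. This shortcut is valid ONLY when df_between == 2.
assert df_between == 2
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

r_squared = ss_between / ss_total

print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
print(f"p-value = {p_value:.3f}")
print(f"proportion of variation explained = {r_squared:.3f}")
```

Under these assumptions the p-value should come out close to the .204 in the Sig. column, and the proportion explained is the same 10.426 / 84.074 as above.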
Some ending notes about this problem:
Having one group larger than the others (13 cases for
medication, 7 cases each in the behavioural and cognitive groups) didn’t
cause any problems.
Like t-tests and chi-squared tests, really small groups have
their own issues, but just because they’re small, not because
they’re a different size than some other group.
Only groups with different amounts of variation, and
oddities like outliers, are a problem (we’ll see one on Friday).
Early ending for student evaluations.
Next time: ANOVA examples, course wrap-up.
For reference (not on final):
Boxplots made by:
Graphs → Legacy Dialogs → Boxplots
→ Summaries for groups of cases.
Summary stats made by:
Analyze → Descriptive Stats → Explore
Put “Response to Therapy” in Dependent List
Put “Name of Therapy” in Factor List