One more example of a hypothesis test Chapter 10: Scatterplots

One more example of a hypothesis test
Chapter 10: Scatterplots
If time: Joy of Stats – 200 Countries, 200 Years
You should know this.
You should be familiar with all of this, but don’t waste
too much time memorizing.
Alzheimer’s Onset and Gender (From Ch.7 exercises, #21)
We’ve been given a list of Alzheimer’s onset ages for
men and women.
We want to find out whether there is a difference between the age
at which men get Alzheimer’s and the age at which women get it.
Alzheimer’s Onset    Sample Mean    Standard Deviation    Sample Size
Men                  67.75          6.58                  8
Women                66.55          5.34                  9
Is this a one-sided or two-sided test?
“If there’s a difference” tells us this is…
______________
a two-sided test.
It’s a two-sided test…
Is it about means or proportions?
We’re talking about ages, so the ___________ is
appropriate.
We’re talking about ages, so the mean is
appropriate.
It’s a two-sided test of a mean or means.
We weren’t told otherwise, so we don’t know the true
standard deviation.
We use the sample standard deviation instead.
It’s a two-sided test of a mean or means, using the sample
standard deviation
We’re comparing men and women, so we’re
interested in ________________
We’re comparing men and women, so we’re
interested in two means.
It’s a two-sided, two-sample test of means, using the sample
standard deviation.
Is it an independent or a paired test?
There are a different number of people in each
group, so they must be ________________
There are a different number of people in each
group, so they must be independent.
It’s a two-sided, independent two-sample test of means, using
the sample standard deviation.
Hope you’re not getting caught up in the concepts
We have this data:
Alzheimer’s Onset    Sample Mean    Standard Deviation    Sample Size
Men                  67.75          6.58                  8
Women                66.55          5.34                  9
And we know to do an independent two-sample test.
All our formulae will have t and s in them, instead of z
and sigma.
If we do a confidence interval, it will have mu =
something, instead of pi = something.
We input the raw data into SPSS and then click
Analyze > Compare Means > Independent-Samples T Test
We get this for the first couple columns:
From this result we _____________ assume equal
variances. This is because __________________,
so we use the __________ row.
From this result we can assume equal variances.
This is because Sig. (p-value) is large,
so we use the top row.
We get this for the middle five columns of the table.
Sig. (2-tailed) is large so we _________________ the null.
That means we detect __________________ between the
onset ages of Alzheimer’s in men and women.
Sig. (2-tailed) is large so we fail to reject the null.
That means we detect no significant difference
between the onset ages of Alzheimer’s in men
and women.
We can also tell that the difference between the means was
________ standard errors. (t-score)
The area beyond this score (on both sides) was found on the t-distribution with _____ degrees of freedom.
The area was ____________ (p-value)
We can also tell that the difference between the means was
0.413 standard errors. (t-score)
The area beyond this score (on both sides) was found on the t-distribution with
15 degrees of freedom.
The area was .686 (p-value)
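The SPSS numbers can be roughly checked by hand from the summary table. Here is a minimal Python sketch of the equal-variances (pooled) calculation. The function names are made up for illustration, and the p-value comes from a simple numerical integration of the t-distribution, which is an assumption of this sketch and not necessarily how SPSS computes it. Because the table values are rounded, expect a t-score near 0.415 instead of exactly 0.413.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Area beyond +/-|t| (both tails), using Simpson's rule on [-|t|, |t|]."""
    a, b = -abs(t), abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 1 - total * h / 3

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Independent two-sample t-test assuming equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / std_err
    return t, df, two_sided_p(t, df)

# Summary statistics from the table: men first, then women.
t_score, df, p_value = pooled_t_test(67.75, 6.58, 8, 66.55, 5.34, 9)
print(f"t = {t_score:.3f}, df = {df}, p = {p_value:.3f}")
```

The degrees of freedom are 8 + 9 - 2 = 15, matching the SPSS output, and the p-value lands close to the .686 reported there.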
To a new chapter we go!
Chapter 10: Correlations
Correlations are one way to quantify and show the
relationship between two features of the same object.
Usually this is between two sets of interval data;
otherwise it’s called an association.
If two values increase together, they are said to be
positively correlated. (As one goes up, so does the
other)
If one value increases as the other decreases, they are
said to be negatively correlated.
Example: Longer bearded dragons tend to be larger all
around, so they weigh more.
Length and Weight are positively correlated in bearded
dragons.
Example: Heating bills tend to be a lot less when it’s
warmer out.
Heating Cost and Outdoor Temperature are
negatively correlated.
The most common graph to show two sets of interval
data together is the scatter plot.
Each dot represents a subject. In Length vs. Weight,
each dot is a dragon.
The height of the dot represents the length of the
dragon. How far it is to the right represents the weight
of the dragon.
The dragon for this dot is 18cm long, and weighs 700g.
There is an obvious upward trend in the graph. This
shows a positive correlation.
The negative correlation between heating cost and
outdoor temperature can be shown the same way.
The lack of correlation between two variables can also
be shown in a scatterplot.
The strength of a correlation is how well the data
points fit onto a straight line.
Stronger correlations are easier to see and have less
random scatter or variation.
We can quantify the strength and direction of a
correlation with the correlation coefficient.
The correlation coefficient, called
r from a sample (we’ll see r frequently) and
ρ, or rho, from a population (we’ll see rho rarely),
is a value between -1 and 1 that tells how strong a correlation
is and in what direction.
The stronger a correlation, the farther the coefficient is from
zero (and the closer it is to 1 or -1)
Positive correlations have positive coefficients r.
Negative correlations have negative coefficients r.
The stronger the negative correlation, the closer it is to -1.
A perfect correlation, one in which all the values fit perfectly
on a line, has a correlation of 1 (for positive) or -1 (for negative).
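To see where such a number comes from, here is a minimal Python sketch, assuming r is the usual Pearson sample correlation coefficient. The data values are invented for illustration (they are not read off the lecture's graphs), and pearson_r is just a helper name chosen for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up bearded dragon data: longer dragons weigh more.
lengths_cm = [12, 15, 18, 20, 24]
weights_g = [250, 420, 700, 760, 1100]

# Made-up heating data: warmer months cost less.
temps_c = [-5, 0, 5, 10, 15]
heating_cost = [210, 180, 140, 95, 60]

r_dragons = pearson_r(lengths_cm, weights_g)        # positive
r_heating = pearson_r(temps_c, heating_cost)        # negative
r_line = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])      # points exactly on a line

print(round(r_dragons, 3), round(r_heating, 3), round(r_line, 3))
```

The sign of r comes from the cross-product sum in the numerator, and dividing by the square root of the two sums of squares is what keeps r between -1 and 1.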
If there is no correlation at all, r will have a value of zero.
However, since r is from a sample, it will vary like everything
else from a sample. Instead of zero, it usually has some value
close to zero on either side.
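A quick simulation can illustrate this sampling variation. The sketch below draws two lists of unrelated random numbers several times and computes r for each draw. The sample size of 30, the number of repeats, and the seed are arbitrary choices made for this example, and pearson_r is a hypothetical helper implementing the usual Pearson formula.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so reruns give the same samples

sample_rs = []
for _ in range(5):
    # x and y are generated independently, so the true correlation is zero.
    xs = [rng.random() for _ in range(30)]
    ys = [rng.random() for _ in range(30)]
    sample_rs.append(pearson_r(xs, ys))

print([round(r, 2) for r in sample_rs])
```

Each printed value is a sample r for variables that are truly unrelated, so the values scatter around zero without landing exactly on it.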
Recall the Burnaby vs. Coquitlam gas example from last week.
One reason a paired t-test was appropriate is that gas
prices in the two cities were correlated.
Guess the correlation coefficient between Burnaby and
Coquitlam gas prices.
A) r = 0.05
B) r = 0.97
C) r = 0.592
D) r = -0.592
C) r = 0.592
There is a relationship between Burnaby and Coquitlam gas
prices, but it’s not a perfect relationship.
It’s positive, so the correlation coefficient r is positive, not
negative.
Which of these is a possible correlation coefficient?
A) r = -0.28
B) r = 1.21
C) r = 0.41 grams per bean.
A) r = -0.28
r is always between -1 and 1. Also, it has no units, so ‘grams
per bean’ doesn’t make much sense, even though it’s a
relationship between two variables.
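One way to convince yourself that r has no units is to change the units and watch r stay the same. The sketch below uses invented dragon measurements and a hypothetical pearson_r helper (the usual Pearson formula), then recomputes r after converting centimetres to inches and grams to kilograms.

```python
import math

def pearson_r(xs, ys):
    """Pearson sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up measurements, for illustration only.
lengths_cm = [12, 15, 18, 20, 24]
weights_g = [250, 420, 700, 760, 1100]

# Same dragons, different units.
lengths_in = [x / 2.54 for x in lengths_cm]
weights_kg = [w / 1000 for w in weights_g]

r_cm_g = pearson_r(lengths_cm, weights_g)
r_in_kg = pearson_r(lengths_in, weights_kg)

# The two values agree up to floating-point rounding.
print(math.isclose(r_cm_g, r_in_kg, rel_tol=1e-9))
```

The unit conversion factors appear in both the numerator and the denominator of the formula, so they cancel.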
Joy of Stats 28:45 – 33:00 (200 Countries, 200 Years)
Next time: r-squared, significance test for correlation,
nonlinearity.