Statistics 203

Topics for Today
- Introduction to Non-parametric Significance tests
- 1-Way Chi-square test
- 2-Way Chi-square test
Stat203
Fall 2011 – Week 9 Lecture 3
Page 1 of 23
Non-Parametric is a big word
But it makes life easier! (sometimes)
Recall, we had assumptions that were
necessary to use the t-tests for comparing
means and z-tests for comparing proportions.
We might need a non-parametric test if these
assumptions (or conditions) are not met
There are also some scientific questions about
nominal or ordinal data that can’t be answered
using the t-tests or z-tests.
First off, a recap of the assumptions/conditions
necessary for the t-tests and z-tests we’ve
looked at already
Assumptions: t-tests for means
Hypotheses involving means of interval or ratio
level data.
Assumptions required to use a t-test for one
or more samples:
- Test variable(s) normally distributed, or
the sample(s) large enough (i.e. n > 50) so
that the sampling distribution of the mean
is normally distributed
- Interval or Ratio level data
Assumptions: z-tests for proportions
Hypotheses involving 1 or 2 proportions.
Assumptions required to use a z-test:
- $n p_0 > 10$ and $n(1 - p_0) > 10$ (1-sample)
- $n_1 > 10$ and $n_2 > 10$, and $n_1 p_1 > 5$ and $n_2 p_2 > 5$ (2-samples)
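The conditions above can be checked mechanically before running a z-test. The following is a minimal Python sketch; the function names and the example numbers are made up for illustration and do not come from the course materials.

```python
def one_sample_z_ok(n, p0):
    """Check n*p0 > 10 and n*(1 - p0) > 10 for a 1-sample z-test."""
    return n * p0 > 10 and n * (1 - p0) > 10


def two_sample_z_ok(n1, p1, n2, p2):
    """Check n1 > 10, n2 > 10, n1*p1 > 5 and n2*p2 > 5 for a 2-sample z-test."""
    return n1 > 10 and n2 > 10 and n1 * p1 > 5 and n2 * p2 > 5


# Hypothetical inputs, chosen only to show one passing and one failing case.
print(one_sample_z_ok(100, 0.5))  # 100 * 0.5 = 50 on both sides of the check
print(one_sample_z_ok(30, 0.1))   # 30 * 0.1 = 3, which is too small
```

When a check like this fails, a non-parametric alternative is worth considering.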
1-Way Chi-Square Test
The objective of this test is to determine how
similar an observed set of frequencies (or
relative frequencies), fo, is to an expected set of
frequencies, fe.
A typical research hypothesis would indicate
that individuals are more (or less) likely than
expected to select some categories.
… and the most common research hypothesis
is that the relative-frequency of responses is
similar for all categories.
Which has the following statistical hypotheses:
H0: fo = fe
Ha: fo ≠ fe
… but what are fo and fe?
We’ve seen fo before, it’s just the observed
frequency (or relative frequency) for each
category of a nominal or ordinal variable!
The new item is fe … think of this as some
expected relative frequency, or %. What does
that mean?
- What is the expected relative frequency of
men and women going into a men’s room?
- What is the expected relative frequency of
heads and tails out of 100 flips of a fair
coin?
- In Vancouver, what is the expected
relative frequency of raining and sunny
days?
The Chi-Square Test Statistic
Here’s the formula that we’ll need to calculate
the Chi-square test statistic:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
…so it’s a bit more complicated than the t-statistic for means and the z-statistic for
proportions.
Once we calculate this, we then look up the
value in Table E. As with the t-distribution,
though, we need a ‘degrees of freedom’ for
this test statistic.
Unlike the t-test, the degrees of
freedom for the Chi-square test are the number of
categories (k) minus 1.
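To make the formula concrete, here is a minimal Python sketch of the test statistic and its degrees of freedom for the 1-way case. The function names are illustrative, and the coin-flip counts (58 heads, 42 tails) are invented for the example; only the formula itself comes from the slide.

```python
def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def degrees_of_freedom(k):
    """1-way test: number of categories (k) minus 1."""
    return k - 1


# Hypothetical data: 100 flips of a coin assumed fair, so fe = 50 for each side.
observed = [58, 42]
expected = [50, 50]
print(chi_square(observed, expected), degrees_of_freedom(len(observed)))
```

Each category contributes its own squared gap, scaled by what was expected, so a gap of 8 counts for more when fe is 50 than it would if fe were 500.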
Example (1-way Chi-Square test): To
determine whether dogs are color blind, a
student sets up an experiment where she
provides food to a dog in 4 differently coloured
dishes and records the colour of the dish the
dog chooses to eat from first. She does this for
a total of 80 dogs, randomly ordering the
dishes each time. If dogs are truly colour blind,
each colour dish should be selected about the
same number of times.
The Chi-Square test allows us to formally test
this research hypothesis.
Research Hypothesis:
Individuals:
Population:
Variable:
Parameter:
Statistical Hypotheses:
The observed frequency of each colour from
the 80 dogs is below, as is the expected
frequency if the dogs were colour blind:
Colour   Brown  Orange  Yellow  Green
fo          25      18      19     18
fe          20      20      20     20
And we have,
N = 80
k = 4 (# of categories)
Now, let’s calculate our test statistic:
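One way to check the hand calculation is the short Python sketch below. It uses the fo and fe values from the table above. The p-value line relies on a closed-form expression for the chi-square upper tail that holds only for 3 degrees of freedom, and the critical value 7.815 is the standard table entry for α = 0.05 with df = 3; both are brought in for illustration and are not taken from the slides.

```python
import math

colours = ["Brown", "Orange", "Yellow", "Green"]
fo = [25, 18, 19, 18]   # observed frequencies from the slide
N = sum(fo)             # 80 dogs
k = len(fo)             # 4 categories
fe = [N / k] * k        # 20 per colour if dogs are colour blind

# Test statistic: sum of (fo - fe)^2 / fe
chi2 = sum((o - e) ** 2 / e for o, e in zip(fo, fe))
df = k - 1

# Upper-tail probability of the chi-square distribution, valid for df = 3 only.
p_value = (math.erfc(math.sqrt(chi2 / 2))
           + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))

CRITICAL_05_DF3 = 7.815  # standard table value, alpha = 0.05, df = 3

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}")
print("Reject H0 at 0.05?", chi2 > CRITICAL_05_DF3)
```

Under these assumptions the statistic works out to 34/20 = 1.7 on 3 degrees of freedom, well below 7.815, so this sketch would not reject H0 at α = 0.05.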
p-value:
Reject H0 at α = 0.05?
Conclusion:
2-Way Chi-Square Test
We used the 1-way test to determine whether
the observed relative frequency distribution
was different from some ‘expected’
distribution.
Note the similarity to a 1-sample test for a mean
or proportion, where we test whether the
mean or proportion is different from some ‘null’
value.
We can use a 2-way test to determine whether
relative frequency distributions from two
samples are the same or different from one
another.
Note the similarity to a 2-sample test for means
or proportions, where we test whether the
means or proportions are different from one
another.
Research questions that require a 2-way chi-square test
are based on relative frequencies
(like the 1-way test), but compare two
populations (or samples).
- Is the relative frequency of sunny days in
a year different between Vancouver and
Seattle?
- Is the relative frequency of female
students different between UBC and
SFU?
- Is the relative frequency of job type (white
vs blue vs service) the same for women
and men?
Example (Q24, pg 338): A radio executive
considering a switch in his station’s format
collects data on the radio preferences of
various age groups of 78 listeners. Does radio
format preference differ by age group?
Research Hypothesis:
Individuals:
Populations:
Variables:
Parameters:
Statistical Hypotheses:
The observed frequency (fo) of age group and
radio format preference is below:
                      Age Group
Format      Young Adult  Middle Age  Older Adult  Total
Music                14          10            3     27
News-talk             4          15           11     30
Sports                7           9            5     21
Total                25          34           19     78
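And we have,
N = 78
k = 9 (# of cells: 3 formats x 3 age groups)
For a 2-way table the degrees of freedom are
(rows - 1) x (columns - 1) = (3 - 1) x (3 - 1) = 4.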
but … we need fe! Here’s the formula for each
cell, with row and column totals associated
with that cell:
$$f_e = \frac{(\text{column total}) \times (\text{row total})}{\text{grand total}}$$
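The expected-frequency formula can be applied to every cell at once. The Python sketch below does this for the radio-format counts from the observed table. The function name `expected_frequencies` is illustrative, and the layout (rows are formats, columns are age groups) is an assumption about how to encode the table.

```python
def expected_frequencies(observed):
    """fe for each cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Music, News-talk, Sports. Columns: Young Adult, Middle Age, Older Adult.
observed = [
    [14, 10, 3],
    [4, 15, 11],
    [7, 9, 5],
]

expected = expected_frequencies(observed)
for row in expected:
    print([round(fe, 1) for fe in row])
```

Rounded to one decimal place, the first row comes out as 8.7, 11.8 and 6.6.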
So, the table with fe is:
                      Age Group
Format      Young Adult  Middle Age  Older Adult  Total
Music          25*27/78    34*27/78     19*27/78     27
News-talk      25*30/78    34*30/78     19*30/78     30
Sports         25*21/78    34*21/78     19*21/78     21
Total                25          34           19     78
Which … after you do the arithmetic you get:
                      Age Group
Format      Young Adult  Middle Age  Older Adult  Total
Music               8.7        11.8          6.6     27
News-talk           9.6        13.1          7.3     30
Sports              6.7         9.2          5.1     21
Total                25          34           19     78
… now that we have all the components, we
can calculate our test statistic (try the
arithmetic on your own):
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{28.1}{8.7} + \frac{3.2}{11.8} + \frac{13.0}{6.6} + \frac{31.4}{9.6} + \frac{3.6}{13.1} + \frac{13.7}{7.3} + \frac{0.09}{6.7} + \frac{0.04}{9.2} + \frac{0.01}{5.1} = 10.9$$
p-value:
Reject H0 at α = 0.05?
Conclusion:
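The whole 2-way calculation for the radio example can be checked with the Python sketch below. Several pieces are assumptions brought in for illustration rather than taken from the slides: the degrees of freedom use the standard (rows - 1) x (columns - 1) rule, the p-value line uses a closed-form chi-square upper tail that holds only for df = 4, and 9.488 is the standard table value for α = 0.05 with df = 4.

```python
import math

# Rows: Music, News-talk, Sports. Columns: Young Adult, Middle Age, Older Adult.
observed = [
    [14, 10, 3],
    [4, 15, 11],
    [7, 9, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / grand_total  # unrounded fe
        chi2 += (fo - fe) ** 2 / fe

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability of the chi-square distribution, valid for df = 4 only.
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

CRITICAL_05_DF4 = 9.488  # standard table value, alpha = 0.05, df = 4

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}")
print("Reject H0 at 0.05?", chi2 > CRITICAL_05_DF4)
```

Because this sketch keeps fe unrounded, the statistic comes out near 10.96 rather than the 10.9 obtained by hand with fe rounded to one decimal place. Either value exceeds 9.488, so under these assumptions H0 would be rejected at α = 0.05.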
One snag …
There is still an assumption necessary for us
to be able to use any of the Chi-Square tests.
All the cells in the table (ie: the expected
frequency, fe, for every category) must be at least 5.
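A check of this kind is easy to automate. The Python sketch below reads the rule as applying to the expected frequencies, which is the usual convention but an assumption here. The function names are illustrative, and the small 2x2 table is invented to show a failing case.

```python
def smallest_expected(observed):
    """Smallest fe in a 2-way table, with fe = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return min(r * c / grand_total for r in row_totals for c in col_totals)


def meets_minimum(observed, minimum=5):
    """True when every expected frequency is at least `minimum`."""
    return smallest_expected(observed) >= minimum


radio = [[14, 10, 3], [4, 15, 11], [7, 9, 5]]  # observed counts from the radio example
tiny = [[1, 2], [3, 4]]                        # hypothetical table with sparse cells

print(round(smallest_expected(radio), 1), meets_minimum(radio))
print(round(smallest_expected(tiny), 1), meets_minimum(tiny))
```

For the radio example the smallest expected frequency is about 5.1 (Sports, Older Adult), so the table only just satisfies the rule under this reading.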
Other names for Chi-square tests:
1-Way:
- One-sample Chi-square
- Chi-square goodness of fit
2-Way:
- 2-sample Chi-square
- 2x2 Chi-square
- r by c Chi-square
- Chi-square test for independence
Nice page describing how to do Chi-Square
tests in SPSS.
http://academic.uofs.edu/department/psych/methods/cannon99/level2d.html
So, a decision tree for choosing hypothesis
tests:
Single sample?
- Interval or Ratio data?
  - One-sample t-test
- Nominal or Ordinal data?
  - Proportion (ie: 2 categories)?
    - 1-sample z-test for proportions
  - Distribution (ie: several categories)?
    - 1-way Chi-square
Two samples?
- Interval or Ratio data?
  - Individuals measured twice/Matched?
    - Paired t-test
  - Variances equal?
    - 2-sample t-test w/equal variances
  - Variances not equal?
    - 2-sample t-test w/unequal variances
- Nominal or Ordinal data?
  - Proportion (ie: 2 categories) & conditions met?
    - 2-sample z-test for proportions
  - Distribution (ie: several categories)?
    - 2-way Chi-square
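The decision tree can also be written as a small function, which makes the branching explicit. This is only a sketch: the function name `choose_test` and its parameters are hypothetical, and where the tree gives no answer (two categories but the z-test conditions are not met) the function returns None rather than guessing.

```python
def choose_test(samples, level, paired=False, equal_variances=True,
                two_categories=False, conditions_met=True):
    """Return the test name suggested by the decision tree, or None if uncovered."""
    if samples not in (1, 2):
        raise ValueError("the tree only covers one or two samples")

    if level in ("interval", "ratio"):
        if samples == 1:
            return "One-sample t-test"
        if paired:
            return "Paired t-test"
        if equal_variances:
            return "2-sample t-test w/equal variances"
        return "2-sample t-test w/unequal variances"

    if level in ("nominal", "ordinal"):
        if two_categories:
            if conditions_met:
                return f"{samples}-sample z-test for proportions"
            return None  # this case is not covered by the tree
        return f"{samples}-way Chi-square"

    raise ValueError("level must be interval, ratio, nominal or ordinal")


print(choose_test(1, "ratio"))
print(choose_test(2, "interval", paired=True))
print(choose_test(2, "nominal"))
```

For instance, the radio example (two nominal/ordinal variables with several categories each) lands on the 2-way Chi-square branch.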
Today’s Topics
Chi-square tests
- for comparing distributions of nominal or
ordinal data
- 1-way: compares the distribution in a single
sample to some expected distribution
- 2-way: compares distributions for two
populations
New Reading
Chapter 10 up to pg 352