Chi-Square (χ²)

February 2013
• Nonparametric statistics are used when the nature of the distribution is not known, or is known to be non-normal.
• Sometimes called distribution-free statistics.
• Everything up to this point has assumed that the data ARE normally distributed.
• Nonparametric tests use nominal and ordinal data.
• Nominal (presence or absence)
  • Pass or fail
  • Male or female
  • Presence or absence of a co-morbid disease in a clinical study
  • Taste (bitter, sweet or savory)
• Ordinal (some level of ranking of nominal features)
  • Rating scales for pain
  • Clubbing continuous variables into groups, e.g. high, medium or low; young, middle aged, elderly; low income, moderate income, high income
  • Number of bouts with asthma in a week
Sign Test
• Tests for equal medians between two groups.
• Can be used on all data but is more common with ordinal data.
• Simple: counts the number of times a value in one group is higher (+) or lower (-) than its counterpart in the other group.
• If +'s and -'s occur with roughly equal frequency, the data are consistent with equal medians.
• Uses a Z statistic for a proportion equal to 0.5 to test for differences between the two groups.
• Does not account for how large or small the differences may be.
• NOTE: Does not require a normal distribution. It plays the same role as the parametric t-test: if the t-test would otherwise be the appropriate test, but you have non-normal data, that is when you use the Sign Test.
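The counting and Z statistic described for the Sign Test can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name `sign_test_z` and the pain ratings are hypothetical, ties are simply dropped, and the normal approximation is used without a continuity correction, which is rough for small samples.

```python
import math

def sign_test_z(before, after):
    """Sign test on paired values via the normal approximation (p = 0.5)."""
    diffs = [a - b for b, a in zip(before, after)]
    plus = sum(1 for d in diffs if d > 0)    # second value higher
    minus = sum(1 for d in diffs if d < 0)   # second value lower
    n = plus + minus                         # ties (zero differences) are dropped
    # Under the null hypothesis the number of +'s is Binomial(n, 0.5):
    # mean n/2, standard deviation sqrt(n)/2.
    z = (plus - n / 2) / (math.sqrt(n) / 2)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return plus, minus, z, p

# Hypothetical pain ratings (0-10) for 12 patients, before and after treatment.
before = [7, 6, 8, 5, 9, 7, 6, 8, 7, 5, 6, 8]
after = [5, 6, 6, 4, 7, 8, 4, 6, 5, 4, 5, 6]

plus, minus, z, p = sign_test_z(before, after)
print(f"+: {plus}  -: {minus}  z = {z:.2f}  p = {p:.4f}")
```

Note that only the direction of each difference enters the calculation; a drop of 1 point counts the same as a drop of 5.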

Wilcoxon Signed Rank Test
• Tests for equal medians between two groups, BUT in this case it takes into account the magnitude of the differences between the paired results (how much higher or lower one value is than its pair, not just whether it is the same, higher or lower).
• Uses paired data.
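To show how magnitude enters the Wilcoxon Signed Rank Test, the sketch below ranks the absolute paired differences and sums the ranks separately for positive and negative differences. The function name `signed_rank_sums` and the data are hypothetical; zero differences are dropped and tied magnitudes share an average rank, which is one common convention rather than the only one.

```python
def signed_rank_sums(x, y):
    """Return (W+, W-): rank sums of positive and negative paired differences."""
    diffs = [b - a for a, b in zip(x, y) if b != a]  # drop zero differences
    ordered = sorted(diffs, key=abs)                 # smallest magnitude first
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of differences sharing the same absolute value.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j + 1) / 2               # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical paired pain ratings (0-10), before and after treatment.
before = [7, 6, 8, 5, 9, 7, 6, 8, 7, 5, 6, 8]
after = [5, 6, 6, 4, 7, 8, 4, 6, 5, 4, 5, 6]

w_plus, w_minus = signed_rank_sums(before, after)
print(w_plus, w_minus)
```

The smaller of the two sums is what gets compared against a table of critical values. In practice a library routine such as `scipy.stats.wilcoxon` would supply the p-value.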

Wilcoxon Rank Sum Test
• Tests for differences between two independent groups

Kruskal-Wallis Test
• Nonparametric counterpart of one-way ANOVA

What you need to know:
• Use the appropriate statistic for your data. Never dumb your data down to use a lower-level statistic unless there are problems, e.g. with distributions, that you can't overcome.
• Studies must use nonparametric tests when the data do not support more quantitative analyses.
• Know that these nonparametric alternatives exist.
• Chi-square is probably the most commonly used and easiest to understand nonparametric test, and one of the few that reveals association between variables.
• Uses categorical data which can be presented in tabular fashion, e.g., rows and columns.
• The chi-square statistic compares the observed count in each cell of the table with what would be expected if there were no association between the rows and columns in the table.
• Used to test the hypothesis of no association between two (or more) groups by comparing observed to expected counts.
                        Got the Flu   Did not get the Flu   Total
Got the Shot                 13                86             99
Did not get the Shot         80                35            115
Total                        93               121            214


The relationship between getting the flu and receiving a flu shot can be displayed in a contingency table.
From the table we can see:
• 86/99 = 87% of those who got a shot did not get the flu
• 80/93 = 86% of those who got the flu did NOT get a shot
• Does this suggest an association between the flu shot and getting the flu?
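The percentages read off the contingency table can be checked with a few lines of Python. This is only a sketch: the dictionary layout and variable names are illustrative assumptions, while the counts are those from the flu table.

```python
# Observed counts from the flu / flu-shot contingency table.
table = {
    "shot": {"flu": 13, "no_flu": 86},
    "no_shot": {"flu": 80, "no_flu": 35},
}

shot_total = sum(table["shot"].values())                    # row total: 99
no_shot_total = sum(table["no_shot"].values())              # row total: 115
flu_total = table["shot"]["flu"] + table["no_shot"]["flu"]  # column total: 93

# Share of vaccinated people who did not get the flu (row percentage).
pct_shot_no_flu = 100 * table["shot"]["no_flu"] / shot_total
# Share of flu cases that occurred among the unvaccinated (column percentage).
pct_flu_no_shot = 100 * table["no_shot"]["flu"] / flu_total

# Flu rates within each row, which is the comparison the test formalizes.
flu_rate_shot = 100 * table["shot"]["flu"] / shot_total
flu_rate_no_shot = 100 * table["no_shot"]["flu"] / no_shot_total

print(f"{pct_shot_no_flu:.0f}% {pct_flu_no_shot:.0f}%")
print(f"{flu_rate_shot:.0f}% {flu_rate_no_shot:.0f}%")
```

The first figure is a row percentage and the second a column percentage, so they use different denominators (99 and 93).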



The question of interest: does the flu shot decrease
your likelihood of getting the flu?
Need to calculate the numbers of shot/no shot
individuals that would be expected if the probability of
getting the flu were the same for each group.
If there is no association between having a shot and
getting the flu, then the observed counts should nearly
equal the expected counts, and the χ² value
should be small.

In our example:
• Overall proportion getting the flu shot = 99 / 214 = 0.463
• Overall proportion not getting the shot = 115 / 214 = 0.537

The observed numbers or counts in the table:
                        Got the Flu   Did not get the Flu   Total
Got the shot                 13                86             99
Did not get the shot         80                35            115
Total                        93               121            214

Under the assumption of no association between
getting the flu shot and getting the flu, the expected
numbers or counts in the table would be:
(Note: Expected count = row total × column total / total number)

                Got the Flu             Did not get the Flu     Total
Got the shot    99 × 93 / 214 = 43      99 × 121 / 214 = 56       99
No flu shot     115 × 93 / 214 = 50     115 × 121 / 214 = 65     115
Total           93                      121                      214
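The expected-count rule (row total × column total / total number) can be sketched in Python as follows. The variable names are illustrative assumptions; the observed counts are those from the flu table.

```python
# Observed counts: rows = got the shot / no flu shot,
# columns = got the flu / did not get the flu.
observed = [
    [13, 86],
    [80, 35],
]

row_totals = [sum(row) for row in observed]        # [99, 115]
col_totals = [sum(col) for col in zip(*observed)]  # [93, 121]
n = sum(row_totals)                                # 214

# Expected count for each cell under "no association".
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```

Rounded to whole numbers, these values match the 43, 56, 50 and 65 shown in the table.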
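\[
\chi^2 = \sum_i \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}
\]

\[
\begin{aligned}
\chi^2 &= \frac{(13-43)^2}{43} + \frac{(86-56)^2}{56} + \frac{(80-50)^2}{50} + \frac{(35-65)^2}{65} \\
       &= \frac{900}{43} + \frac{900}{56} + \frac{900}{50} + \frac{900}{65} \\
       &= 20.93 + 16.07 + 18.00 + 13.85 \\
       &= 68.85
\end{aligned}
\]

• χ² calculated = 68.85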
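The arithmetic above can be reproduced with a short Python sketch. The helper name `chi_square` is an assumption for illustration. The sketch also shows the statistic computed from unrounded expected counts, which comes out slightly higher (about 68.95) because the hand calculation rounds the expected counts to whole numbers.

```python
observed = [13, 86, 80, 35]

# Expected counts rounded to whole numbers, as in the hand calculation.
expected_rounded = [43, 56, 50, 65]

# Unrounded expected counts: row total * column total / grand total.
expected_exact = [
    99 * 93 / 214,
    99 * 121 / 214,
    115 * 93 / 214,
    115 * 121 / 214,
]

def chi_square(obs, exp):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

chi2_rounded = chi_square(observed, expected_rounded)
chi2_exact = chi_square(observed, expected_exact)

print(f"rounded expected counts:   {chi2_rounded:.2f}")
print(f"unrounded expected counts: {chi2_exact:.2f}")
```

Either way the statistic is far above the table value, so the rounding does not change the conclusion.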
• We have made the assumption for our test that there is no association between flu shots and getting the flu.
• A small value for chi-square would support this assumption: why?
• A large value would not support this assumption: why?
• The question would be, is this a statistically significant result?
• So, just like the t-test, we go to the tables:
  • χ² calculated = 68.85
  • χ² table = 3.84 with 1 degree of freedom (d.f. = (rows - 1) × (columns - 1)) and alpha = 0.05
• Therefore, we reject the hypothesis of no association and can state that the p-value is less than 0.05 (we would need to look it up in the table to obtain the actual p-value).
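The decision step can also be sketched in Python. The critical value 3.84 is taken from the chi-square table for 1 degree of freedom at alpha = 0.05. The p-value shortcut via `math.erfc` is an assumption that holds only for 1 degree of freedom, where the chi-square variable is the square of a standard normal; other degrees of freedom need a table or a statistics library.

```python
import math

chi2_calculated = 68.85
rows, columns = 2, 2
df = (rows - 1) * (columns - 1)  # degrees of freedom: 1
critical_value = 3.84            # table value for 1 d.f., alpha = 0.05

# Reject "no association" when the statistic exceeds the table value.
reject_null = chi2_calculated > critical_value

def p_value_1df(x):
    """Upper-tail probability P(chi-square > x); valid for 1 d.f. only."""
    return math.erfc(math.sqrt(x / 2))

p_value = p_value_1df(chi2_calculated)
p_at_critical = p_value_1df(critical_value)  # sanity check, close to 0.05

print(f"d.f. = {df}, reject: {reject_null}")
print(f"p-value = {p_value:.3g}")
```

The p-value here is vanishingly small, which is why the hand calculation can simply state p < 0.05.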

T-tests
• One sample and two sample (paired and independent)
• Useful for comparing the means of two groups
• Can be used for more than two groups, but repeated tests increase the
risk of making a Type I error.

Analysis of Variance
• Compares two or more means while controlling the
experiment-wise Type I error rate

Correlation and Regression
• Compares multiple data points and provides the ability to
predict values of the dependent variables

Chi-square
• Useful in helping determine association between
variables. Not causal, just if there is any association.