Discovering Statistics
2nd Edition Daniel T. Larose
Chapter 14:
Nonparametric Statistics
Lecture PowerPoint Slides
Chapter 14 Overview

14.1 Introduction to Nonparametric Statistics
14.2 Sign Test
14.3 Wilcoxon Signed Ranks Test for Matched-Pair Data
14.4 Wilcoxon Rank Sum Test for Two Independent Samples
14.5 Kruskal-Wallis Test
14.6 Rank Correlation Test
14.7 Runs Test for Randomness
The Big Picture
Where we are coming from and where we are headed…
In earlier chapters, we learned how to perform hypothesis tests for
population parameters, such as the mean µ or proportion p.

Here in Chapter 14 we learn about a family of hypothesis tests,
called nonparametric hypothesis tests, whose conditions are similar
to those in earlier chapters but less stringent.

Congratulations on getting this far in your discovery of the field of
statistics! Best of luck in the future!

14.1: Introduction to Nonparametric Statistics
Objectives:
Explain what a nonparametric hypothesis test is and why we use it.
Describe what is meant by the efficiency of a nonparametric test.
Nonparametric Hypothesis Tests
In Chapters 9–13, we learned how to perform hypothesis tests for
population parameters, such as µ and p. To perform each of these
parametric hypothesis tests, certain conditions need to be
satisfied.
Parametric hypothesis tests are used to test claims about a
population parameter, such as the population mean µ or proportion
p. Often, parametric tests require that the population follow a
particular distribution, such as the normal distribution.
Nonparametric hypothesis tests, also called distribution-free
hypothesis tests, generally have fewer required conditions. In
particular, nonparametric tests do not require the population to
follow a particular distribution, such as the normal distribution.
Nonparametric Hypothesis Tests
Advantages of Nonparametric Hypothesis Tests
1. May be used on a greater variety of data because they require
fewer conditions than their parametric counterparts.
2. Can be applied to categorical (qualitative) data.
3. Manual computations tend to be easier than their parametric
counterparts.
Disadvantages of Nonparametric Hypothesis Tests
1. Less efficient than their parametric counterparts, since they require
a larger sample size to reject a false null hypothesis with the same probability.
2. Replace the actual data values with signs or ranks, so some of the
information contained in the data values is lost.
3. Technology often does not have dedicated procedures for
performing these tests.
Nonparametric Hypothesis Tests
In general, parametric tests are more efficient than corresponding
nonparametric tests. The efficiency of a nonparametric test is used
to compare it with its corresponding parametric test.
The efficiency of a nonparametric hypothesis test is defined as the
ratio of the sample size required for the corresponding parametric
test to the sample size required for the nonparametric test, in order
to achieve the same result (such as correctly rejecting the null
hypothesis). The efficiency ratings are reported on the assumption
that required conditions for both the parametric and nonparametric
tests have been met.
Section  Situation                    Parametric Test      Nonparametric Test           Efficiency
14.2     Matched pairs                t or Z test          Sign test                    0.63
14.3     Matched pairs                t or Z test          Wilcoxon signed ranks test   0.95
14.4     Two independent samples      t or Z test          Wilcoxon rank sum test       0.95
14.5     Several independent samples  ANOVA                Kruskal-Wallis test          0.95
14.6     Correlation                  Linear correlation   Rank correlation test        0.91
14.7     Randomness                   None                 Runs test                    --
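As a rough illustration of the efficiency definition, the Python sketch below (the function name is hypothetical) rearranges efficiency = n_parametric / n_nonparametric to estimate how many observations a nonparametric test needs to match a parametric test with a given sample size. Like the efficiency ratings themselves, it assumes the conditions for both tests are met.

```python
import math
from fractions import Fraction


def nonparametric_sample_size(n_parametric: int, efficiency: float) -> int:
    """Estimate the sample size a nonparametric test needs to match a
    parametric test that uses n_parametric observations.

    Rearranges efficiency = n_parametric / n_nonparametric and rounds up,
    since a sample must contain a whole number of observations.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    # Fraction(str(...)) keeps the division exact, so ceil() is not
    # thrown off by floating-point rounding
    ratio = Fraction(str(efficiency))
    return math.ceil(n_parametric / ratio)


# Sign test (efficiency 0.63) vs. a matched-pairs t test with 50 pairs
print(nonparametric_sample_size(50, 0.63))  # 80
# Wilcoxon signed ranks test (efficiency 0.95) vs. a t test with 95 pairs
print(nonparametric_sample_size(95, 0.95))  # 100
```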
14.2: Sign Test
Objectives:
Perform the sign test for a single population median.
Carry out the sign test for matched-pair data from two dependent samples.
Perform the sign test for binomial data.
Sign Test for a Population Median
In Section 9.4, we learned how to perform the one-sample t test for
the population mean µ. This is a parametric test requiring either a
normal population or a large sample. What do we do when we have
neither? We use the sign test for the population median.
The sign test is a nonparametric hypothesis test in which the
original data are transformed into plus or minus signs. The sign test
may be conducted for (a) a single population median, (b) matched-pair
data from two dependent samples, or (c) binomial data.
The sign test requires only that the sample data have been
randomly selected. The population need not be normally distributed.
Sign Test for a Population Median
Sign Test for the Population Median M (Small Sample, n ≤ 25)
If the data have been randomly selected, assign each value a (+) if it is
greater than the hypothesized median M0 or a (–) if it is less than M0.
Values equal to M0 are discarded, and n is the number of remaining values.
Step 1: State the hypotheses. H0: M = M0 vs. Ha: M > M0, M < M0, or M ≠ M0
Step 2: Find the critical value and state the rejection rule. Use Table X,
α, and the sample size n to identify Scrit. Reject H0 if Sdata ≤ Scrit.
Step 3: Find the test statistic Sdata.
Right-tailed test: Sdata = the number of minus signs
Left-tailed test: Sdata = the number of plus signs
Two-tailed test: Sdata = the smaller of the number of plus signs and the number of minus signs
Step 4: State the conclusion and the interpretation.
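The counting rules in Step 3 can be sketched in Python as follows. The function names are hypothetical, values equal to M0 are dropped (the usual convention), and instead of looking up Scrit in Table X the sketch also computes the exact probability P(S ≤ Sdata) under H0, when each sign is + or – with probability 0.5. For a two-tailed test this probability would be doubled. Treat it as an illustration alongside the table procedure.

```python
from math import comb


def sign_test_statistic(data, m0, tail):
    """Return (S_data, n) for the sign test of H0: M = m0.

    tail is "right" (Ha: M > m0), "left" (Ha: M < m0), or "two" (Ha: M != m0).
    Values equal to m0 are discarded, so n counts only the + and - signs.
    """
    plus = sum(1 for x in data if x > m0)
    minus = sum(1 for x in data if x < m0)
    n = plus + minus
    if tail == "right":
        s_data = minus
    elif tail == "left":
        s_data = plus
    elif tail == "two":
        s_data = min(plus, minus)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return s_data, n


def exact_lower_tail(s_data, n):
    """P(S <= s_data) when S ~ Binomial(n, 0.5), i.e., when H0 is true."""
    return sum(comb(n, k) for k in range(s_data + 1)) / 2 ** n


# Hypothetical sample; test H0: M = 10 vs. Ha: M > 10
sample = [12, 15, 9, 20, 18, 14, 22, 11]
s_data, n = sign_test_statistic(sample, 10, "right")
print(s_data, n)                    # 1 8
print(exact_lower_tail(s_data, n))  # 0.03515625
```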
Sign Test for a Population Median
Sign Test for the Population Median M (Large Sample, n > 25)
If the data have been randomly selected, assign each value a (+) if it is
greater than the hypothesized median M0 or a (–) if it is less than M0.
Step 1: State the hypotheses. H0: M = M0 vs. Ha: M > M0, M < M0, or M ≠ M0
Step 2: Find the critical value and state the rejection rule. Use Table X
to find Zcrit.
Step 3: Find Sdata as in the small-sample case, then compute the test statistic Zdata.
$$Z_{data} = \frac{(S_{data} + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}}$$
Step 4: State the conclusion and the interpretation.
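A direct translation of the large-sample formula into Python, as a sketch (the function name is hypothetical). The 0.5 added to Sdata is a continuity correction, which adjusts for approximating the discrete count of signs with the continuous normal distribution.

```python
import math


def sign_test_z(s_data, n):
    """Large-sample sign test: Z_data = ((S_data + 0.5) - n/2) / (sqrt(n)/2)."""
    return ((s_data + 0.5) - n / 2) / (math.sqrt(n) / 2)


# Hypothetical example: 36 nonzero signs, S_data = 10
print(sign_test_z(10, 36))  # -2.5
```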
14.3: Wilcoxon Signed Ranks Test for Matched-Pair Data
Objectives:
Assess whether or not a data set is symmetric.
Carry out the Wilcoxon signed ranks test for matched-pair data from two dependent samples.
Perform the Wilcoxon signed ranks test for a single population median.
Assessing the Symmetry of a Data Set
In Section 2.2, we learned that a distribution is symmetric if there is
an axis of symmetry that splits the image in half so one side is the
mirror image of the other. In Section 3.5, we learned that a boxplot is
a convenient method for assessing the symmetry of a dataset.
Boxplot Criterion for Assessing Symmetry
A data set is symmetric when its corresponding boxplot has
whiskers of approximately equal length, and the median line is
situated approximately in the center of the box.
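The boxplot criterion can be checked numerically by comparing the two whisker lengths and the two halves of the box. This sketch rests on assumptions: the function name is hypothetical, the whiskers are taken to run to the minimum and maximum (no outlier fences), and quartiles come from Python's statistics.quantiles with method="inclusive", which may differ slightly from the textbook's quartile rule. Deciding whether the lengths are "approximately equal" remains a judgment call.

```python
import statistics


def boxplot_lengths(data):
    """Return (left whisker, lower half of box, upper half of box, right whisker).

    Assumes the whiskers extend to the minimum and maximum values.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q1 - min(data), median - q1, q3 - median, max(data) - q3)


print(boxplot_lengths([1, 2, 3, 4, 5, 6, 7, 8, 9]))   # (2.0, 2.0, 2.0, 2.0)
print(boxplot_lengths([1, 2, 3, 4, 5, 6, 7, 8, 30]))  # (2.0, 2.0, 2.0, 23.0)
```

In the second data set the right whisker is far longer than the left one, so by the boxplot criterion it would not be considered symmetric.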
Wilcoxon Signed Ranks Test
In Section 14.2, we performed the sign test for both a single
population median and for the population median of the difference
between two dependent samples.
The Wilcoxon signed ranks test is a nonparametric hypothesis
test in which the original data are transformed into their ranks. The
Wilcoxon signed ranks test may be conducted for (a) a single
population median, or (b) matched-pair data from two dependent
samples.
To perform the test, the data must be randomly selected and have a
symmetric distribution. Order the absolute values of the differences
(for a single median, the data values minus the hypothesized median)
from smallest to largest, discarding any differences of zero. Rank
these values from smallest to largest, assigning the average rank to
any tied values. Then attach the sign of each corresponding difference
to its rank.
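The ranking procedure can be sketched in Python as follows (function names hypothetical). It takes the differences (paired differences, or data values minus M0), drops zeros as is customary, ranks the absolute values with average ranks for ties, reattaches the signs, and reports T+ (the sum of the positive ranks) and T– (the sum of the negative ranks).

```python
def signed_ranks(diffs):
    """Rank |d| from smallest to largest (ties share their average rank),
    then attach the sign of each d. Zero differences are dropped."""
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied absolute values
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return [r if d > 0 else -r for r, d in zip(ranks, nonzero)]


def rank_sums(signed):
    """Return (T+, T-): the sums of the positive and of the negative ranks."""
    t_plus = sum(r for r in signed if r > 0)
    t_minus = sum(r for r in signed if r < 0)
    return t_plus, t_minus


# Hypothetical differences, including a tie and a zero
ranks = signed_ranks([3, -1, 4, -1, 0, 5, -2])
print(ranks)             # [4.0, -1.5, 5.0, -1.5, 6.0, -3.0]
print(rank_sums(ranks))  # (15.0, -6.0)
```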
Wilcoxon Signed Ranks Test
Wilcoxon Signed Ranks Test for Matched-Pair Data
(Small Sample, n ≤ 30)
If the data have been randomly selected and the distribution is
symmetric, assign each value a signed rank. Let T+ denote the sum of
the positive ranks and T– the sum of the negative ranks.
Step 1: State the hypotheses. H0: Md = 0 vs. Ha: Md > 0, Md < 0, or Md ≠ 0,
where Md is the population median of the differences.
Step 2: Find the critical value and state the rejection rule. Use Table X,
α, and the sample size n to identify Tcrit. Reject H0 if Tdata ≤ Tcrit.
Step 3: Find the test statistic Tdata.
Right-tailed test: Tdata = |T–|
Left-tailed test: Tdata = T+
Two-tailed test: Tdata = the smaller of T+ and |T–|
Step 4: State the conclusion and the interpretation.
Wilcoxon Signed Ranks Test
Wilcoxon Signed Ranks Test for Matched-Pair Data
(Large Sample, n > 30)
If the data have been randomly selected and the distribution is
symmetric, assign each value a signed rank.
Step 1: State the hypotheses. H0: Md = 0 vs. Ha: Md > 0, Md < 0, or Md ≠ 0
Step 2: Find the critical value and state the rejection rule. Use Table X
to identify Zcrit.
Step 3: Find Tdata as in the small-sample case, then compute the test statistic Zdata.
$$Z_{data} = \frac{T_{data} - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$
Step 4: State the conclusion and the interpretation.
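The large-sample formula can be sketched in Python (function name hypothetical); Tdata is assumed to have been chosen by the same tail rules as in the small-sample case. The subtracted term n(n + 1)/4 is the expected value of the rank sum under H0, half of the total rank sum n(n + 1)/2.

```python
import math


def wilcoxon_signed_ranks_z(t_data, n):
    """Large-sample Wilcoxon signed ranks statistic:
    Z_data = (T_data - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24).
    """
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t_data - mean_t) / sd_t


# With n = 50, a rank sum equal to its expected value 637.5 gives Z_data = 0
print(wilcoxon_signed_ranks_z(637.5, 50))  # 0.0
```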
Wilcoxon Signed Ranks Test
We can use the same method for the Wilcoxon signed ranks test for a
single population median that we used for matched-pair data. However,
instead of subtracting paired sample values to find the differences,
we subtract the hypothesized median M0 from each data value and assign
the signed ranks to these differences.
Null Hypothesis   Alternative Hypothesis   Type of Test
H0: M = M0        Ha: M > M0               Right-tailed
H0: M = M0        Ha: M < M0               Left-tailed
H0: M = M0        Ha: M ≠ M0               Two-tailed
14.4: Wilcoxon Rank Sum Test for Two Independent Samples
Objective:
Perform the Wilcoxon rank sum test for the difference in population medians, using two independent samples.
Wilcoxon Rank Sum Test
In Section 14.3, we compared data from dependent samples. Recall
that two samples are independent when the subjects selected for
the first sample do not determine the subjects in the second sample.
The two-sample t test that we learned in Section 10.2 required that
either each sample be large or that each population be normally
distributed.
The Wilcoxon rank sum test is a nonparametric hypothesis test in
which the original data from two independent samples are
transformed into their ranks. It tests whether the two population
medians are equal or not.
In the Wilcoxon rank sum test, the two samples are temporarily
combined, and the ranks of the combined data values are calculated.
Then the ranks are summed separately for each sample.
R1 = the sum of the ranks for the first sample
R2 = the sum of the ranks for the second sample
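The combined ranking can be sketched in Python as follows (function names hypothetical). Tied values share the average of the ranks they occupy. A quick check on the arithmetic: R1 + R2 always equals N(N + 1)/2, where N = n1 + n2.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sums_two_samples(sample1, sample2):
    """Temporarily combine the samples, rank them, and return (R1, R2)."""
    combined = list(sample1) + list(sample2)
    ranks = average_ranks(combined)
    n1 = len(sample1)
    return sum(ranks[:n1]), sum(ranks[n1:])


# Hypothetical samples with one tied value (5) across the two groups
print(rank_sums_two_samples([3, 5, 8], [1, 5, 9, 10]))  # (10.5, 17.5)
```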
Wilcoxon Rank Sum Test
Wilcoxon Rank Sum Test for Two Independent Samples
The requirements are: (a) two independent random samples, (b) each
sample size > 10, and (c) the shapes of the distributions are the same.
Step 1: State the hypotheses. H0: M1 = M2 vs. Ha: M1 > M2, M1 < M2, or M1 ≠ M2
Step 2: Find the critical value and state the rejection rule. See Table
14.14.
Step 3: Find the test statistic Zdata.
$$Z_{data} = \frac{R_1 - \mu_R}{\sigma_R}, \quad \mu_R = \frac{n_1(n_1 + n_2 + 1)}{2}, \quad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
Step 4: State the conclusion and the interpretation.
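The rank sum formulas translate directly into Python; this is a minimal sketch with a hypothetical function name. R1 is assumed to come from the combined ranking of both samples, and μR is the rank sum the first sample would be expected to have if H0 were true.

```python
import math


def rank_sum_z(r1, n1, n2):
    """Wilcoxon rank sum statistic Z_data = (R1 - mu_R) / sigma_R."""
    mu_r = n1 * (n1 + n2 + 1) / 2
    sigma_r = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (r1 - mu_r) / sigma_r


# Hypothetical rank sum R1 = 330 with n1 = 15 and n2 = 20
print(rank_sum_z(330, 15, 20))  # 2.0
```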
14.5: Kruskal-Wallis Test
Objective:
Perform the Kruskal-Wallis test for equal medians in three or more populations.
Kruskal-Wallis Test
In Section 14.4, we learned the Wilcoxon rank sum test, which uses two
independent random samples to test whether two population medians are
equal. Here, we extend this method to three or more populations.
The Kruskal-Wallis test is a nonparametric hypothesis test in
which the original data from three or more independent samples
are transformed into their ranks. It tests whether the population
medians are all equal.
As in the Wilcoxon rank sum test, the samples are temporarily
combined, and the ranks of the combined data values are calculated.
Then the ranks are summed separately for each sample.
R1 = the sum of the ranks for the first sample
R2 = the sum of the ranks for the second sample, and so on, through
Rk = the sum of the ranks for the kth (last) sample
Kruskal-Wallis Test
Kruskal-Wallis Test for Three or More Independent Samples
The requirements are (a) k ≥ 3 independent random samples and (b)
each sample size > 5.
Step 1: State the hypotheses. H0: The population medians are all
equal vs. Ha: Not all population medians are equal.
Step 2: Find the χ² critical value and state the rejection rule. Use Table
X, α, and k – 1 degrees of freedom. Reject H0 if χ²data ≥ χ²crit.
Step 3: Find the test statistic χ²data, where N = n1 + n2 + ... + nk is the
total number of observations.
$$\chi^2_{data} = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$
Step 4: State the conclusion and the interpretation.
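A Python sketch of Step 3, under these assumptions: the function names are hypothetical, ties receive average ranks, and no correction for ties is applied (statistical software often adds one, so its results can differ slightly when ties are present).

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_statistic(samples):
    """chi2_data = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)."""
    combined = [x for sample in samples for x in sample]
    ranks = average_ranks(combined)
    big_n = len(combined)
    total = 0.0
    start = 0
    for sample in samples:
        # R_i: the sum of the ranks belonging to this sample
        r_i = sum(ranks[start:start + len(sample)])
        total += r_i ** 2 / len(sample)
        start += len(sample)
    return 12 / (big_n * (big_n + 1)) * total - 3 * (big_n + 1)


# Tiny hypothetical samples, only to show the arithmetic (real use needs each n > 5)
print(round(kruskal_wallis_statistic([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 4))  # 7.2
```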
14.6: Rank Correlation Test
Objective:
Perform the rank correlation test for paired data.
Rank Correlation Test
In Chapter 4, we learned how to calculate the correlation coefficient,
which measures the strength of linear association between two
variables. Here, we will learn how to calculate the rank correlation of
two variables, which is the correlation of the variables based on
ranks.
The rank correlation test (Spearman’s rank correlation test) is
based on the ranks of matched-pair data. This test may also be
applied when the original data are ranks. In the rank correlation
test, we investigate whether two variables are related by analyzing
the ranks of matched-pair data. The rank correlation test may also
be used to detect a nonlinear relationship between two variables,
provided the relationship is monotonic (consistently increasing or
consistently decreasing).
The hypotheses for the rank correlation test are:
H0: There is no rank correlation between the two variables.
Ha: There is a rank correlation between the two variables.
To find the test statistic, we must calculate and square the paired
differences of the ranks.
Rank Correlation Test
Rank Correlation Test (Small Sample, n ≤ 30)
The sample data must be randomly selected.
Step 1: State the hypotheses.
Step 2: Find the rcrit critical value and state the rejection rule. Use
Table X, α, and sample size n.
Step 3: Find the test statistic rdata.
Rank the values of each variable from lowest to highest.
Find the difference d in ranks for each subject, square the
differences, and add them up.
$$r_{data} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Step 4: State the conclusion and the interpretation.
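Step 3 can be sketched in Python (function names hypothetical). Note that this shortcut formula agrees exactly with the ordinary correlation coefficient computed on the ranks only when there are no ties; with tied ranks it is an approximation.

```python
def average_ranks(values):
    """Rank values from lowest to highest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_correlation(x, y):
    """r_data = 1 - 6 * sum(d^2) / (n(n^2 - 1)), with d the rank differences."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of paired values")
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical paired data
print(rank_correlation([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```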
Rank Correlation Test
Rank Correlation Test (Large Sample, n > 30)
The sample data must be randomly selected.
Step 1: State the hypotheses.
Step 2: Find the Zcrit critical value and state the rejection rule. Use
Table 14.24.
Step 3: Find the test statistic Zdata.
Rank the values of each variable from lowest to highest.
Find the difference d in ranks for each subject, square the
differences, and add them up to compute rdata.
$$r_{data} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Then standardize rdata:
$$Z_{data} = r_{data}\sqrt{n - 1}$$
Step 4: State the conclusion and the interpretation.
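A sketch of the large-sample calculation in Python, assuming the commonly used normal approximation Zdata = rdata·√(n − 1) for the rank correlation coefficient (the function name is hypothetical).

```python
import math


def rank_correlation_z(r_data, n):
    """Large-sample statistic, assuming Z_data = r_data * sqrt(n - 1)."""
    return r_data * math.sqrt(n - 1)


# Hypothetical example: r_data = 0.5 from n = 37 pairs
print(rank_correlation_z(0.5, 37))  # 3.0
```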
14.7: Runs Test for Randomness
Objective:
Perform the runs test for randomness.
Runs Test for Randomness
Recall from Chapter 13 that one of the assumptions for the linear
regression model was that the y values were independent. Here we
learn a test for checking this assumption.
The runs test for randomness helps us determine whether the
data in a sequence are random or if there is a pattern. The test
applies to data that have two possible outcomes or data that can be
re-expressed as one of two outcomes. The test works by counting
the number of runs in the data set.
A sequence is an ordered set of data. A run is a sequence of
observations sharing the same value (of two possible values),
preceded or followed by data having the other possible value or by
no data at all. The runs test for randomness tests whether the
data in a sequence are random or whether there is a pattern in the
sequence.
Runs Test for Randomness
The notation for the runs test for randomness is as follows:
n1 = the number of observations having the first outcome
n2 = the number of observations having the second outcome
n = the total number of observations
G = the number of runs in the sequence
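The notation can be illustrated with a short Python sketch (function name hypothetical) that counts n1, n2, n, and G for a sequence of two outcomes. Which outcome counts as "first" is arbitrary; here it is assumed to be the one that appears first in the sequence. G is one more than the number of places where consecutive values change.

```python
def runs_summary(sequence):
    """Return (n1, n2, n, G) for a sequence with exactly two distinct outcomes."""
    values = list(sequence)
    # Distinct outcomes in order of first appearance
    outcomes = list(dict.fromkeys(values))
    if len(outcomes) != 2:
        raise ValueError("the sequence must contain exactly two distinct outcomes")
    n1 = values.count(outcomes[0])
    n2 = values.count(outcomes[1])
    # Each change of outcome between neighbors starts a new run
    g = 1 + sum(1 for a, b in zip(values, values[1:]) if a != b)
    return n1, n2, len(values), g


# Hypothetical coin-toss sequence: HH | TTT | H | T
print(runs_summary("HHTTTHT"))  # (3, 4, 7, 4)
```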
Runs Test for Randomness
Runs Test for Randomness (Small Samples n1 and n2 ≤ 20)
There are two conditions: (a) the data are ordered, and (b) each data
value represents one of two distinct outcomes.
Step 1: State the hypotheses. H0: The sequence of data is random vs.
Ha: The sequence of data is not random.
Step 2: Find the Gcrit critical value and state the rejection rule. Use
Table X, α = 0.05, row n1, and column n2.
Step 3: Find the test statistic Gdata.
Gdata = G, the number of runs in the sequence.
Step 4: State the conclusion and the interpretation.
Runs Test for Randomness
Runs Test for Randomness (Large Samples n1 or n2 > 20)
There are two conditions: (a) the data are ordered, and (b) each data
value represents one of two distinct outcomes.
Step 1: State the hypotheses. H0: The sequence of data is random vs.
Ha: The sequence of data is not random.
Step 2: Find the Zcrit critical value and state the rejection rule. Use
Table 14.27.
Step 3: Find the test statistic Zdata.
$$Z_{data} = \frac{G - \mu_G}{\sigma_G}, \quad \mu_G = \frac{2n_1 n_2}{n_1 + n_2} + 1, \quad \sigma_G = \sqrt{\frac{(2n_1 n_2)(2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$
Step 4: State the conclusion and the interpretation.
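The large-sample formulas can be sketched in Python (function name hypothetical). Too few runs (clustering) give a negative Zdata and too many runs (alternation) give a positive one; both count as evidence against randomness.

```python
import math


def runs_test_z(g, n1, n2):
    """Large-sample runs test statistic Z_data = (G - mu_G) / sigma_G."""
    mu_g = 2 * n1 * n2 / (n1 + n2) + 1
    var_g = (2 * n1 * n2) * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return (g - mu_g) / math.sqrt(var_g)


# With n1 = n2 = 25, mu_G = 26, so G = 26 runs gives Z_data = 0
print(runs_test_z(26, 25, 25))  # 0.0
```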

