Chapter 5: Regression

Discovering Statistics, 2nd Edition
Daniel T. Larose
Chapter 14: Nonparametric Statistics
Lecture PowerPoint Slides

Chapter 14 Overview
• 14.1 Introduction to Nonparametric Statistics
• 14.2 Sign Test
• 14.3 Wilcoxon Signed Ranks Test for Matched-Pairs Data
• 14.4 Wilcoxon Rank Sum Test for Two Independent Samples
• 14.5 Kruskal-Wallis Test
• 14.6 Rank Correlation Test
• 14.7 Runs Test for Randomness

The Big Picture
Where we are coming from and where we are headed…
• In earlier chapters, we learned how to perform hypothesis tests for population parameters, such as the mean µ or proportion p.
• Here in Chapter 14 we learn about a family of hypothesis tests, called nonparametric hypothesis tests, whose conditions are similar to those in earlier chapters but less stringent.
• Congratulations on getting this far in your discovery of the field of statistics! Best of luck in the future!

14.1: Introduction to Nonparametric Statistics
Objectives:
• Explain what a nonparametric hypothesis test is and why we use it.
• Describe what is meant by the efficiency of a nonparametric test.

Nonparametric Hypothesis Tests
In Chapters 9–13, we learned how to perform hypothesis tests for population parameters, such as µ and p. To perform each of these parametric hypothesis tests, certain conditions need to be satisfied.

Parametric hypothesis tests are used to test claims about a population parameter, such as the population mean µ or proportion p. Often, parametric tests require that the population follow a particular distribution, such as the normal distribution.

Nonparametric hypothesis tests, also called distribution-free hypothesis tests, generally have fewer required conditions. In particular, nonparametric tests do not require the population to follow a particular distribution, such as the normal distribution.

Advantages of Nonparametric Hypothesis Tests
1. May be used on a greater variety of data because they require fewer conditions than their parametric counterparts.
2. Can be applied to categorical (qualitative) data.
3. Manual computations tend to be easier than their parametric counterparts.

Disadvantages of Nonparametric Hypothesis Tests
1. Less efficient than parametric tests, as they require a larger sample size to reject a null hypothesis.
2. Replace actual data values with either signs or ranks. Thus, the actual data values are wasted.
3. Technology often does not have dedicated procedures for performing these tests.

Efficiency of a Nonparametric Test
In general, parametric tests are more efficient than corresponding nonparametric tests. The efficiency of a nonparametric test is used to compare it with its corresponding parametric test.

The efficiency of a nonparametric hypothesis test is defined as the ratio of the sample size required for the corresponding parametric test to the sample size required for the nonparametric test, in order to achieve the same result (such as correctly rejecting the null hypothesis).

The efficiency ratings are reported on the assumption that required conditions for both the parametric and nonparametric tests have been met.

Section  Situation                    Parametric Test     Nonparametric Test          Efficiency
14.2     Matched pairs                t or Z test         Sign test                   0.63
14.3     Matched pairs                t or Z test         Wilcoxon signed ranks test  0.95
14.4     Two independent samples      t or Z test         Wilcoxon rank sum test      0.95
14.5     Several independent samples  ANOVA               Kruskal-Wallis test         0.95
14.6     Correlation                  Linear correlation  Rank correlation test       0.91
14.7     Randomness                   None                Runs test                   --

14.2: Sign Test
Objectives:
• Perform the sign test for a single population median.
• Carry out the sign test for matched-pair data from two dependent samples.
• Perform the sign test for binomial data.

Sign Test for a Population Median
In Section 9.4, we learned how to perform the one-sample t test for the population mean µ.
This is a parametric test requiring either a normal population or a large sample. What do we do when we have neither? We use the sign test for the population median.

The sign test is a nonparametric hypothesis test in which the original data are transformed into plus or minus signs. The sign test may be conducted for (a) a single population median, (b) matched-pair data from two dependent samples, or (c) binomial data. The sign test requires only that the sample data have been randomly selected. It is not required that the population be normally distributed.

Sign Test for the Population Median M (Small Sample: n ≤ 25)
If the data have been randomly selected, assign each value a (+) if it is greater than the hypothesized median or a (–) if it is less than the hypothesized median.
Step 1: State the hypotheses. H0: M = M0 vs. Ha: M > M0, M < M0, or M ≠ M0
Step 2: Find the critical value and state the rejection rule. Use Table X, α, and the sample size n to identify Scrit.
Step 3: Find the test statistic Sdata.
   Right-tailed test: Sdata = the number of minus signs
   Left-tailed test: Sdata = the number of plus signs
   Two-tailed test: Sdata = the smaller of the number of plus signs and the number of minus signs
Step 4: State the conclusion and the interpretation.

Sign Test for the Population Median M (Large Sample: n > 25)
If the data have been randomly selected, assign each value a (+) if it is greater than the hypothesized median or a (–) if it is less than the hypothesized median.
Step 1: State the hypotheses. H0: M = M0 vs. Ha: M > M0, M < M0, or M ≠ M0
Step 2: Find the critical value and state the rejection rule. Use Table X to find Zcrit.
Step 3: Find the test statistics Sdata and Zdata.
$$Z_{\text{data}} = \frac{(S_{\text{data}} + 0.5) - \dfrac{n}{2}}{\dfrac{\sqrt{n}}{2}}$$
Step 4: State the conclusion and the interpretation.

14.3: Wilcoxon Signed Ranks Test for Matched-Pair Data
Objectives:
• Assess whether or not a data set is symmetric.
• Carry out the Wilcoxon signed ranks test for matched-pair data from two dependent samples.
• Perform the Wilcoxon signed ranks test for a single population median.

Assessing the Symmetry of a Data Set
In Section 2.2, we learned that a distribution is symmetric if there is an axis of symmetry that splits the image in half so one side is the mirror image of the other. In Section 3.5, we learned that a boxplot is a convenient method for assessing the symmetry of a data set.

Boxplot Criterion for Assessing Symmetry
A data set is symmetric when its corresponding boxplot has whiskers of approximately equal length, and the median line is situated approximately in the center of the box.

Wilcoxon Signed Ranks Test
In Section 14.2, we performed the sign test for both a single population median and for the population median of the difference between two dependent samples.

The Wilcoxon signed ranks test is a nonparametric hypothesis test in which the original data are transformed into their ranks. The Wilcoxon signed ranks test may be conducted for (a) a single population median, or (b) matched-pair data from two dependent samples.

To perform the test, the data must be randomly selected and have a symmetric distribution. Order the observations, or the absolute values of the differences, from smallest to largest. Rank these values from smallest to largest, assigning the average rank to any values that are the same. Then attach the sign of each corresponding value to its rank. T+ is the sum of the positive signed ranks, and |T–| is the absolute value of the sum of the negative signed ranks.

Wilcoxon Signed Ranks Test for Matched-Pair Data (Small Sample: n ≤ 30)
If the data have been randomly selected and the distribution is symmetric, assign each value a signed rank.
Step 1: State the hypotheses. H0: Md = 0 vs. Ha: Md > 0, Md < 0, or Md ≠ 0
Step 2: Find the critical value and state the rejection rule. Use Table X, α, and the sample size n to identify Tcrit.
Step 3: Find the test statistic Tdata.
   Right-tailed test: Tdata = |T–|
   Left-tailed test: Tdata = T+
   Two-tailed test: Tdata = the smaller of T+ and |T–|
Step 4: State the conclusion and the interpretation.
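
The ranking procedure and the small-sample test statistic for the Wilcoxon signed ranks test can be sketched in Python as follows. This is a minimal illustration, not the textbook's own procedure: the function names `signed_rank_sums` and `t_data` and the sample differences are hypothetical, and the sketch assumes that zero differences are dropped before ranking, a common convention that the slides do not state.

```python
def signed_rank_sums(differences):
    """Return (T_plus, abs_T_minus) for a list of matched-pair differences."""
    # Assumption: zero differences carry no sign and are dropped before ranking.
    nonzero = [d for d in differences if d != 0]
    # Order the differences by absolute value, smallest to largest.
    ordered = sorted(nonzero, key=abs)

    t_plus = 0.0
    abs_t_minus = 0.0
    i = 0
    while i < len(ordered):
        # Find the block of tied absolute values starting at position i.
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Positions i..j-1 hold ranks i+1..j; tied values share the average rank.
        average_rank = (i + 1 + j) / 2
        for d in ordered[i:j]:
            if d > 0:
                t_plus += average_rank
            else:
                abs_t_minus += average_rank
        i = j
    return t_plus, abs_t_minus


def t_data(differences, tail):
    """Choose the small-sample statistic for tail in {'right', 'left', 'two'}."""
    t_plus, abs_t_minus = signed_rank_sums(differences)
    if tail == "right":
        return abs_t_minus
    if tail == "left":
        return t_plus
    if tail == "two":
        return min(t_plus, abs_t_minus)
    raise ValueError("tail must be 'right', 'left', or 'two'")


differences = [3, -1, 4, -2, 2, 5]  # made-up differences, for illustration only
print(signed_rank_sums(differences))  # (17.5, 3.5)
print(t_data(differences, "two"))     # 3.5
```

In this example the two values with absolute value 2 share the average rank 2.5, and the two sums add up to n(n + 1)/2 = 21, which is a quick check on the ranking.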
Wilcoxon Signed Ranks Test for Matched-Pair Data (Large Sample: n > 30)
If the data have been randomly selected and the distribution is symmetric, assign each value a signed rank.
Step 1: State the hypotheses. H0: Md = 0 vs. Ha: Md > 0, Md < 0, or Md ≠ 0
Step 2: Find the critical value and state the rejection rule. Use Table X to identify Zcrit.
Step 3: Find the test statistic Zdata.
$$Z_{\text{data}} = \frac{T_{\text{data}} - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}$$
Step 4: State the conclusion and the interpretation.

Wilcoxon Signed Ranks Test for a Single Population Median
We can use the same methods for the Wilcoxon signed ranks test for a single population median that we used for matched-pair data. However, there is no subtracting of sample values to find the differences. Instead, subtract the hypothesized median from each data value and assign the signed ranks to the differences.

Null Hypothesis  Alternative Hypothesis  Type of Test
H0: M = M0       Ha: M > M0              Right-tailed
H0: M = M0       Ha: M < M0              Left-tailed
H0: M = M0       Ha: M ≠ M0              Two-tailed

14.4: Wilcoxon Rank Sum Test for Two Independent Samples
Objective:
• Perform the Wilcoxon rank sum test for the difference in population medians, using two independent samples.

Wilcoxon Rank Sum Test
In Section 14.3, we compared data from dependent samples. Recall that two samples are independent when the subjects selected for the first sample do not determine the subjects in the second sample. The two-sample t test that we learned in Section 10.2 required that either each sample be large or that each population be normally distributed.

The Wilcoxon rank sum test is a nonparametric hypothesis test in which the original data from two independent samples are transformed into their ranks. It tests whether the two population medians are equal or not.

In the Wilcoxon rank sum test, the two samples are temporarily combined, and the ranks of the combined data values are calculated. Then the ranks are summed separately for each sample.
R1 = the sum of the ranks for the first sample
R2 = the sum of the ranks for the second sample

Wilcoxon Rank Sum Test for Two Independent Samples
The requirements are: (a) two independent random samples, (b) each sample size > 10, and (c) the shapes of the distributions are the same.
Step 1: State the hypotheses. H0: M1 = M2 vs. Ha: M1 > M2, M1 < M2, or M1 ≠ M2
Step 2: Find the critical value and state the rejection rule. See Table 14.14.
Step 3: Find the test statistic Zdata.
$$Z_{\text{data}} = \frac{R_1 - \mu_R}{\sigma_R}, \qquad \mu_R = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
Step 4: State the conclusion and the interpretation.

14.5: Kruskal-Wallis Test
Objective:
• Perform the Kruskal-Wallis test for equal medians in three or more populations.

Kruskal-Wallis Test
In Section 14.4, we learned the Wilcoxon rank sum test, which tests whether the population medians of two independent random samples are equal. Here, we extend this method to three or more populations.

The Kruskal-Wallis test is a nonparametric hypothesis test in which the original data from three or more independent samples are transformed into their ranks. It tests whether the population medians are all equal.

As in the Wilcoxon rank sum test, the samples are temporarily combined, and the ranks of the combined data values are calculated. Then the ranks are summed separately for each sample.
R1 = the sum of the ranks for the first sample
R2 = the sum of the ranks for the second sample, and so on
...
Rk = the sum of the ranks for the last sample

Kruskal-Wallis Test for Three or More Independent Samples
The requirements are (a) k ≥ 3 independent random samples and (b) each sample size > 5.
Step 1: State the hypotheses. H0: The population medians are all equal vs. Ha: Not all population medians are equal.
Step 2: Find the χ² critical value and state the rejection rule. Use Table X, α, and k − 1 degrees of freedom.
Step 3: Find the test statistic χ²data.
$$\chi^2_{\text{data}} = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$
Here N = n1 + n2 + ... + nk is the total number of observations in the combined samples.
Step 4: State the conclusion and the interpretation.

14.6: Rank Correlation Test
Objective:
• Perform the rank correlation test for paired data.

Rank Correlation Test
In Chapter 4, we learned how to calculate the correlation coefficient, which measures the strength of linear association between two variables. Here, we will learn how to calculate the rank correlation of two variables, which is the correlation of the variables based on ranks.

The rank correlation test (Spearman’s rank correlation test) is based on the ranks of matched-pair data. This test may also be applied when the original data are ranks.

In the rank correlation test, we investigate whether two variables are related by analyzing the ranks of matched-pair data. The rank correlation test may also be used to detect a nonlinear relationship between two variables.

The hypotheses for the rank correlation test are:
H0: There is no rank correlation between the two variables.
Ha: There is a rank correlation between the two variables.
To find the test statistic, we must calculate and square the paired differences of the ranks.

Rank Correlation Test (Small Sample: n ≤ 30)
The sample data must be randomly selected.
Step 1: State the hypotheses.
Step 2: Find the rcrit critical value and state the rejection rule. Use Table X, α, and the sample size n.
Step 3: Find the test statistic rdata. Rank the values of each variable from lowest to highest. Find the difference in ranks d for each subject, square the differences, and add them up.
$$r_{\text{data}} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Step 4: State the conclusion and the interpretation.

Rank Correlation Test (Large Sample: n > 30)
The sample data must be randomly selected.
Step 1: State the hypotheses.
Step 2: Find the Zcrit critical value and state the rejection rule. Use Table 14.24.
Step 3: Find the test statistic Zdata.
Rank the values of each variable from lowest to highest. Find the difference in ranks d for each subject, square the differences, and add them up to obtain the rank correlation. Under H0 the rank correlation has a standard deviation of approximately 1/√(n − 1), so it is standardized by multiplying by √(n − 1).
$$r_{\text{data}} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \qquad Z_{\text{data}} = r_{\text{data}}\sqrt{n - 1}$$
Step 4: State the conclusion and the interpretation.

14.7: Runs Test for Randomness
Objective:
• Perform the runs test for randomness.

Runs Test for Randomness
Recall from Chapter 13 that one of the assumptions for the linear regression model was that the values of y were independent. Here we learn a test for checking this assumption.

The runs test for randomness helps us determine whether the data in a sequence are random or whether there is a pattern in the sequence. The test applies to data that have two possible outcomes or data that can be re-expressed as one of two outcomes. The test works by counting the number of runs in the data set.

A sequence is an ordered set of data. A run is a sequence of observations sharing the same value (of two possible values), preceded or followed by data having the other possible value or by no data at all.

The notation for the runs test for randomness is as follows:
n1 = the number of observations having the first outcome
n2 = the number of observations having the second outcome
n = the total number of observations
G = the number of runs in the sequence

Runs Test for Randomness (Small Samples: n1 ≤ 20 and n2 ≤ 20)
There are two conditions: (a) the data are ordered, and (b) each data value represents one of two distinct outcomes.
Step 1: State the hypotheses. H0: The sequence of data is random vs. Ha: The sequence of data is not random.
Step 2: Find the Gcrit critical value and state the rejection rule. Use Table X, α = 0.05, row n1, and column n2.
Step 3: Find the test statistic Gdata = G.
Step 4: State the conclusion and the interpretation.
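
Counting runs by hand is error-prone for long sequences, so the runs test notation can be illustrated with a short Python sketch. The function name `runs_summary` and the coin-flip sequence are hypothetical, chosen only for illustration; the sketch assumes the data have already been expressed as one of two outcomes.

```python
from itertools import groupby


def runs_summary(sequence):
    """Return (n1, n2, G) for an ordered sequence with two distinct outcomes."""
    values = list(sequence)
    # Distinct outcomes in order of first appearance.
    outcomes = list(dict.fromkeys(values))
    if len(outcomes) != 2:
        raise ValueError("the sequence must contain exactly two distinct outcomes")
    first, second = outcomes
    n1 = values.count(first)   # observations having the first outcome
    n2 = values.count(second)  # observations having the second outcome
    # groupby starts a new group each time the value changes,
    # so the number of groups equals the number of runs G.
    g = sum(1 for _ in groupby(values))
    return n1, n2, g


# Made-up sequence of heads (H) and tails (T): runs are HH, TTT, H, T, HH.
print(runs_summary("HHTTTHTHH"))  # (5, 4, 5)
```

For the small-sample test, the value of G returned here is Gdata, which is then compared with the critical values from the table.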
Runs Test for Randomness (Large Samples: n1 > 20 or n2 > 20)
There are two conditions: (a) the data are ordered, and (b) each data value represents one of two distinct outcomes.
Step 1: State the hypotheses. H0: The sequence of data is random vs. Ha: The sequence of data is not random.
Step 2: Find the Zcrit critical value and state the rejection rule. Use Table 14.27.
Step 3: Find the test statistic Zdata.
$$Z_{\text{data}} = \frac{G - \mu_G}{\sigma_G}, \qquad \mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \qquad \sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$
Step 4: State the conclusion and the interpretation.
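
The large-sample runs test statistic can be computed directly from n1, n2, and G. The following Python sketch is one way to do so; the function name `runs_z_statistic` and the input values are hypothetical, and the critical value Zcrit must still be looked up separately.

```python
import math


def runs_z_statistic(n1, n2, g):
    """Return (mu_G, sigma_G, Z_data) for the large-sample runs test."""
    # Mean number of runs when the sequence is random.
    mu_g = (2 * n1 * n2) / (n1 + n2) + 1
    # Standard deviation of the number of runs when the sequence is random.
    sigma_g = math.sqrt(
        (2 * n1 * n2) * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    z_data = (g - mu_g) / sigma_g
    return mu_g, sigma_g, z_data


# Made-up example: 25 observations of each outcome and 20 runs.
mu_g, sigma_g, z_data = runs_z_statistic(25, 25, 20)
print(mu_g)              # 26.0
print(round(z_data, 2))  # -1.71
```

A negative Zdata indicates fewer runs than expected under randomness, and a positive value indicates more runs than expected.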
