Sarah Loveland
MAT 5900 Monte Carlo Methods
Final Paper

Robustness of the t Test for Violations of Normality

Statistical Inference

Statistics involves making inferences about a population by using a sample of data from the population. There are two different methods of statistical inference: estimation and hypothesis testing. Estimation answers the question, "What is the value of the parameter?" while hypothesis testing answers the question, "Is the parameter value equal to some specified value?" (Ott and Longnecker 175). An estimate can either be a point estimate, such as the sample mean, or an interval estimate, such as a confidence interval. A confidence interval for the mean is formed by the formula $\bar{y} \pm z_{\alpha/2}\,\sigma_{\bar{y}}$, where $\bar{y}$ is the sample mean, $\alpha$ is the specified level of significance of the test (decided on ahead of time), $\pm z_{\alpha/2}$ are the z-critical values with $100(\alpha/2)\%$ of the area in the upper and lower tails, $\sigma_{\bar{y}} = \sigma/\sqrt{n}$ is the standard deviation of the sample mean, and $n$ is the sample size. The meaning of the confidence level is that $100(1-\alpha)\%$ of intervals constructed this way will contain the true population mean; equivalently, $100(1-\alpha)\%$ of sample means fall within $z_{\alpha/2}\,\sigma_{\bar{y}}$ of the population mean (see Figure 1).

One type of hypothesis testing involves testing a null hypothesis that the mean of a population, $\mu$, is equal to a certain value, $\mu_0$, versus an alternative hypothesis that $\mu \neq \mu_0$. A test statistic is calculated, such as the sample mean $\bar{y}$, and is then used to calculate a z-value, $z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$. This z-value is then compared to the z-critical value, $z_{\alpha/2}$, to determine whether to reject or fail to reject the null hypothesis. If $|z| > z_{\alpha/2}$, the null hypothesis is rejected, meaning that there is evidence that $\mu \neq \mu_0$. Otherwise, there is no evidence against the null, and it is not rejected. This type of test is used when the population is normally distributed, so that the standardized statistic follows the standard normal, or z, distribution, and when the standard deviation, $\sigma$, of the population is known. However, in most hypothesis tests, especially those using real-world data, the standard deviation is unknown. It is possible to replace the standard deviation of the population with the sample standard deviation, $s$, but some adjustments must be made to the hypothesis test in order to make this replacement.

The One-Sample t Test

The Student's t test can be used to test hypotheses about populations that follow normal distributions with unknown standard deviations. The test statistic is still calculated in the same way and is used to calculate a t-value, $t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$, where $s$ is the sample standard deviation. The adjustment that must be made in order to use the sample standard deviation is in finding the critical value. A z-critical value can no longer be used; instead, a t-critical value must be found. The same $\alpha$ can be used, but another value, called the degrees of freedom, must be found in order to account for the deviation from the standard normal. The degrees of freedom are defined as the sample size less one, $df = n - 1$. A t distribution with $df = 1$ will have a lower peak and heavier tails than the standard normal. A t distribution with $df = 5$ will have a higher peak and lighter tails than one with $df = 1$, but will still have a lower peak and heavier tails than the standard normal (see Figure 2). The t-critical value is found using $\alpha/2$ and the degrees of freedom as parameters, and is then compared to the t-value in the same way as the z-value is compared to the z-critical value. This test is used to compare one sample mean to a specified expected value of the population mean, and is called the one-sample t test.
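As a quick illustration of these mechanics (separate from the simulations later in the paper), the following R snippet computes a one-sample t test by hand on a small made-up sample and checks it against R's built-in t.test; the data values and hypothesized mean here are hypothetical, not from the study.

samp <- c(5.1, 4.8, 5.6, 5.0, 4.7)             # hypothetical data
mu0  <- 5                                      # hypothesized mean (H0: mu = 5)
n    <- length(samp)
tval <- (mean(samp) - mu0)/(sd(samp)/sqrt(n))  # t-value
crit <- qt(1 - .05/2, df = n - 1)              # t-critical value at alpha = .05
abs(tval) > crit        # FALSE here, so the null hypothesis is not rejected
t.test(samp, mu = mu0)  # built-in version reports the same t and df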
The Two-Sample t Test

It is also possible to use the t test to compare the means of two populations in order to determine whether they are equal to each other. This is called the two-sample t test, and it involves concepts similar to the one-sample t test, but it must take into account two sample sizes, two sample means, and two sample standard deviations. In the two-sample test, the null hypothesis is that $\mu_1 = \mu_2$, and the alternative hypothesis is that $\mu_1 \neq \mu_2$. The test statistic for this test is $\bar{y}_1 - \bar{y}_2$, and the t-value is found by the equation $t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}$, where $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ is the pooled variance of the two samples, $n_1$ and $n_2$ are the sample sizes, $\bar{y}_1$ and $\bar{y}_2$ are the sample means, and $n_1 + n_2 - 2$ is the degrees of freedom. The t-critical value is once again found using $\alpha/2$ and the degrees of freedom as parameters and is compared to the t-value to determine whether to reject or fail to reject the null hypothesis.

Assumptions behind the t Test

When using a t test, there are assumptions that must be taken into account in order to ensure the accuracy of the results. The first of these assumptions is independence, meaning that the samples must not depend on each other. For example, two samples of patients, each receiving either a treatment or a placebo, are independent because there are different patients in each sample. On the other hand, testing the prices of two mechanic shops by sending the same cars to both shops would create two dependent samples, which require a slightly adjusted test. Another important assumption behind the t test is normality: the test is designed to be used with normally distributed or approximately normally distributed samples. Violating normality will cause the confidence interval, alpha value, and power of the test to differ from the expected values (Ott and Longnecker 242). The last assumption is that the variances of the two populations in a two-sample test are equal. Most of the time the variances are unknown, but this assumption can be checked using the sample variances. If it is not met, there is an alternative test for samples with unequal population variances.

Research Focus

My research focuses on the robustness of the one-sample and two-sample t tests against violations of normality. I focused on the robustness of the alpha value (the probability of rejecting the null hypothesis when it is true) and the power (the probability of correctly rejecting the null hypothesis when it is false). I focused my study on six non-normal distributions: uniform, exponential, t with 3 degrees of freedom, t with 5 degrees of freedom, t with 7 degrees of freedom, and Cauchy. The uniform distribution is a light-tailed distribution because every value between the two endpoints is equally probable, meaning that no values fall in the "tails" (see Figure 3). The exponential distribution is a skewed distribution, meaning that the peak of the distribution does not fall in the center (see Figure 4). The three t distributions are heavy-tailed distributions because there is less area under the peak and more under the tails than in the standard normal (see Figure 5). The Cauchy, another heavy-tailed distribution, brings some added challenges because the mean and variance of the distribution are not defined (see Figure 6). For this distribution I instead used the median for comparison. I compared the performance of the t tests on samples from each of these distributions to their performance on samples from the standard normal distribution (see Figure 7).
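Codes 1-4 below show only the normal case, where samples are drawn with rnorm. For the other five distributions the sampling line is swapped out; the paper does not print those lines, so the following is a plausible sketch assuming standard R parameterizations (the endpoints, rate, and scale shown are my assumptions, not stated in the paper):

n <- 20
x_unif <- runif(n, -1, 1)    # uniform: light tails, mean 0 (endpoints assumed)
x_exp  <- rexp(n, rate = 1)  # exponential: right-skewed, true mean 1 (rate assumed)
x_t3   <- rt(n, df = 3)      # t distributions: heavy tails, lighter as df grows
x_t5   <- rt(n, df = 5)
x_t7   <- rt(n, df = 7)
x_cau  <- rcauchy(n)         # Cauchy: mean and variance undefined, median 0

For the alpha runs, the hypothesized value $\mu_0$ must equal the true center of the chosen distribution (for example, 1 for a rate-1 exponential, or the median 0 for the Cauchy) so that the null hypothesis is actually true.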
Method in R: One-Sample t Test

In order to test the robustness of the one-sample t test for alpha in R, I first set up one t test with $\alpha = .05$, sample size $n = 5$, and the mean of the population, $\mu$, equal to $\mu_0$, the hypothesized value of the mean. I then set up an outer for loop and set the number of runs to 100,000 in order to use Monte Carlo methods to get a reliable estimate. In order to estimate the alpha value for each distribution, I kept track of how many times the null hypothesis was rejected. Since the means were set to be equal, these rejections are Type I errors: rejecting the null hypothesis when it is actually true. I divided the number of rejections by the number of runs in order to get the alpha value. I also calculated the standard error for each value to quantify the precision of the estimate. I ran the same code (see Code 1) for a number of different sample sizes and for each distribution and kept track of the results in a table.

The method was similar for testing the power of the one-sample t test for each distribution. I still ran the test for 100,000 runs with $\alpha = .05$. However, I set the mean of the distribution, $\mu$, so that it was not equal to $\mu_0$. So when the test was run and the rejections were again tracked, the rejection rate estimated the power: how often the null was rejected when it was actually false. I used two different sample sizes for the power test. I also varied the shift size, $|\mu - \mu_0|$, which is the difference between the actual mean of the distribution and the hypothesized mean. This was to test how much more effective the test was at recognizing the difference between the mean and the hypothesized mean when the distance was larger. Again, I ran the same R code (see Code 2) for each distribution and kept track of the results in tables.

Method in R: Two-Sample t Test

To test the robustness of the two-sample t test for alpha in R, I set up the test with $\alpha = .05$ and both sample sizes equal to $n = 5$. Since the two-sample test is used to compare the means of two different samples, I set the means of the two distributions equal to each other, $\mu_1 = \mu_2$. I used the same outer for loop as I had for the one-sample test, again with 100,000 runs. I used the formula for the two-sample test and kept track of how many times the null hypothesis was rejected even though it was set to be true. I then calculated the alpha value and the standard error and kept track of these in a table. I ran this same code (see Code 3) for a number of different sample sizes and for each distribution.

For running the two-sample test with the Cauchy distribution, I had to take into account the fact that the variance is not defined. Therefore, the t statistic must be replaced by the $t'$ statistic, $t' = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, which takes into account the two potentially different variances. There is also a different equation for the degrees of freedom for this statistic: $df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)}$. This test was run in exactly the same way in R, except with the different formulas substituted.

The method was similar for testing the power of the two-sample t test for each distribution. However, I set the means of the two distributions so that they were not equal, so when the test kept track of the rejections, it was keeping track of the power. I used two different sample sizes for this test as well, with both samples having the same sample size, and I varied the shift size, $|\mu_1 - \mu_2|$, in the same way as with the one-sample test. I ran the same R code (see Code 4) for each distribution and kept track of the results in tables. Again, I used $t'$ for the Cauchy distribution because of the undefined variances.
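The paper does not print the $t'$ version of the two-sample code. The sketch below implements the $t'$ statistic and its degrees-of-freedom formula exactly as given above (this is the Welch unequal-variance form with Satterthwaite degrees of freedom); it is consistent with the description but is not necessarily the author's exact code:

# One simulated two-sample test using t' instead of the pooled t
welch_reject <- function(samp1, samp2, alpha = .05) {
  n1 <- length(samp1); n2 <- length(samp2)
  v1 <- var(samp1)/n1                                # s1^2/n1
  v2 <- var(samp2)/n2                                # s2^2/n2
  tprime <- (mean(samp1) - mean(samp2))/sqrt(v1 + v2)
  df <- (v1 + v2)^2/(v1^2/(n1 - 1) + v2^2/(n2 - 1))  # Satterthwaite df
  abs(tprime) > qt(1 - alpha/2, df)                  # TRUE means reject H0: mu1 = mu2
}
welch_reject(rcauchy(5), rcauchy(5))                 # one run with two Cauchy samples

Wrapping this in the same 100,000-run loop and counting TRUE results reproduces the alpha and power estimation described above.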
Results and Implications

The results of these tests give some insight into the robustness of the t test against violations of normality. As can be seen in Table 1, which lists the alpha values for the one-sample t test at different sample sizes, the standard normal distribution exhibits an alpha value of .05 (to within simulation error) at every sample size. This is the ideal against which the other distributions are compared. The uniform distribution takes a slightly larger sample size to reach the expected alpha value, but once the test was run with a sample size of 15 or 20, the alpha value was very close to .05. The exponential distribution takes even longer than the uniform to reach an acceptable alpha level, which is to be expected because of the skew of the distribution. With sample sizes of 50 or 100 it is fairly close to .05, but still greater than the expected value. This means that we cannot be as confident in the test with the exponential distribution as with the standard normal. The t distributions get closer to the standard normal as the degrees of freedom increase. Their alpha values are quite close to those obtained for the normal, and, as expected, the values improve for the higher degrees of freedom. The Cauchy distribution had extremely low alpha values, which was somewhat of a surprise to me because its shape looks fairly close to normal.

For the power of the one-sample t test, the normal distribution reached very high power with a shift size as small as 2, even with a sample size of only 5 (see Table 2). The other distributions, as expected, took longer to reach these high levels of power. The uniform was fairly close to the standard normal values again, especially for a sample size of 20 (see Table 3). The exponential did better than the uniform distribution in terms of power, achieving values much closer to those of the normal distribution. The t distributions with different degrees of freedom once again show that higher degrees of freedom approach the standard normal more quickly. The Cauchy distribution did reach a moderately high power, but never exceeded .742, even for a shift size of 5.

The standard normal distribution had alpha values for the two-sample t test extremely close or equal to .05 for all sample sizes (see Table 4). The uniform was very close again, even with both samples having a sample size of 15. The exponential took longer to achieve appropriate alpha values, but with sample sizes of 20 it was fairly close, while still a little less than the expected value of .05. The t distributions all approached the values of the standard normal, with the lower degrees of freedom approaching at a slightly slower rate, as expected. The Cauchy distribution once again did not achieve alpha values above .021, despite appearing fairly close to the normal distribution.

The power calculations for the two-sample t test followed essentially the same pattern as those for the one-sample test (see Tables 5 and 6). The uniform and exponential distributions took much longer to approach high power for sample sizes of 5, reaching powers of only .468 and .387, respectively, even at a shift size of 5. For sample sizes of 20, however, these two distributions approached higher power more quickly, both with powers greater than .9 at a shift size of 3. The t distributions once again follow what is expected, with the distribution with 3 degrees of freedom approaching higher power slightly more slowly than those with 5 and 7 degrees of freedom. The Cauchy had extremely low power for the two-sample test: even with a shift size of 5, the power was only slightly greater than .5. This means that the test was able to spot the difference between the means only about half of the time, even at a shift size that was more than enough for the other distributions to achieve high power.
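The parenthesized values in Tables 1-6 are the Monte Carlo standard errors produced by the stderr line in Codes 1-4. As a quick check of their size for a rejection rate near the nominal .05 with 100,000 runs:

p <- .05; nruns <- 100000
sqrt(p*(1 - p)/nruns)   # about .0007, which appears rounded as (.001) in the tables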
Conclusions

The t test was not effective at all for samples from the Cauchy distribution. This is most likely because of the undefined mean and variance of the distribution. Although the median seems to be a good estimator of the center of the distribution, the alpha values and power for the tests with Cauchy samples were not even close to those expected in most cases. This is strong evidence that the t test is not robust for samples with an underlying Cauchy distribution.

For the other distributions, specifically the uniform and the exponential, the t test seems to be robust for both alpha and power at higher sample sizes, above 15 or 20. I had expected fairly good results from the uniform because it is symmetric: even though it does not have tails like the standard normal, there is no skew to worry about, and it is relatively easy to predict how this distribution will behave. I had not expected such good results from the exponential distribution because of the amount of skew that is present; it had even better power than the uniform in almost all cases. As for the t distributions with degrees of freedom 3, 5, and 7, they acted as expected, slightly lagging behind the standard normal. It took slightly larger sample sizes for each of these distributions to reach expected values, with lower degrees of freedom achieving those values more slowly.

Overall, caution should be used when violating the assumptions of the t test. However, I am fairly confident in saying that the uniform, exponential, and t ($df = 3, 5, 7$) distributions can be used with one-sample and two-sample t tests for higher sample sizes, around 20.

Figures, R Code, Tables

Figure 1: 95% confidence interval – 95% of the sample means fall within this central interval; 5% fall in the tails.

Figure 2: t distributions with $df = 1, 2, 5$. The distribution with $df = \infty$ is the standard normal.
Figure 3: Uniform distribution

Figure 4: Exponential distribution

Figure 5: t distribution with various degrees of freedom

Figure 6: Cauchy distribution

Figure 7: Normal distribution

Code 1:

n=100         #sample size
mu=0          #hypothesized mean (equal to the true mean, so H0 is true)
alpha=.05     #level of significance
df=n-1        #degrees of freedom
rej=0         #rejection counter
nruns=100000  #number of Monte Carlo runs
#H0: mu=0;  Ha: mu!=0;  reject H0 if abs(t)>crit
for (run in 1:nruns){
  samp=rnorm(n)              #sample
  ybar=mean(samp)            #sample mean
  s=sd(samp)                 #sample standard deviation
  t=(ybar-mu)/(s/sqrt(n))    #test statistic
  crit=qt(1-(alpha/2),df)    #t-critical value
  if (abs(t)>crit){rej=rej+1}}
rejrate=rej/nruns
rejrate       #rate of rejecting the null when it is true (alpha)
stderr=sqrt((rejrate*(1-rejrate))/nruns)
stderr        #Monte Carlo standard error of the estimate

Code 2:

n=20          #sample size
mu=1          #hypothesized mean mu0 (data are drawn with mean 0, so shift size = 1)
alpha=.05     #level of significance
df=n-1        #degrees of freedom
rej=0         #rejection counter
nruns=100000  #number of Monte Carlo runs
#H0: mu=1;  Ha: mu!=1;  reject H0 if abs(t)>crit
for (run in 1:nruns){
  samp=rnorm(n)              #sample
  ybar=mean(samp)            #sample mean
  s=sd(samp)                 #sample standard deviation
  t=(ybar-mu)/(s/sqrt(n))    #test statistic
  crit=qt(1-(alpha/2),df)    #t-critical value
  if (abs(t)>crit){rej=rej+1}}
rejrate=rej/nruns
rejrate       #rate of rejecting the null when Ha is true (power)
stderr=sqrt((rejrate*(1-rejrate))/nruns)
stderr        #Monte Carlo standard error of the estimate

Code 3:

n1=5          #sample 1 size
n2=5          #sample 2 size
alpha=.05     #level of significance
df=n1+n2-2    #degrees of freedom
rej=0         #rejection counter
nruns=100000  #number of Monte Carlo runs
#H0: mu1=mu2;  Ha: mu1!=mu2;  reject H0 if abs(t)>crit
for (run in 1:nruns){
  samp1=rnorm(n1)            #sample 1
  samp2=rnorm(n2)            #sample 2
  ybar1=mean(samp1)          #sample 1 mean
  ybar2=mean(samp2)          #sample 2 mean
  s1=sd(samp1)               #sample 1 standard deviation
  s2=sd(samp2)               #sample 2 standard deviation
  spsq=(((n1-1)*(s1**2))+((n2-1)*(s2**2)))/df   #pooled variance
  t=(ybar1-ybar2)/sqrt((spsq/n1)+(spsq/n2))     #test statistic
  crit=qt(1-(alpha/2),df)                       #t-critical value
  if (abs(t)>crit){rej=rej+1}}
rejrate=rej/nruns
rejrate       #rate of rejecting the null when it is true (alpha)
stderr=sqrt((rejrate*(1-rejrate))/nruns)
stderr        #Monte Carlo standard error of the estimate

Code 4:

n1=20         #sample 1 size
n2=20         #sample 2 size
alpha=.05     #level of significance
df=n1+n2-2    #degrees of freedom
rej=0         #rejection counter
nruns=100000  #number of Monte Carlo runs
#H0: mu1=mu2;  Ha: mu1!=mu2;  reject H0 if abs(t)>crit
for (run in 1:nruns){
  samp1=rnorm(n1)            #sample 1
  samp2=rnorm(n2,2)          #sample 2 with a different mean (mean 2, so shift size = 2)
  ybar1=mean(samp1)          #sample 1 mean
  ybar2=mean(samp2)          #sample 2 mean
  s1=sd(samp1)               #sample 1 standard deviation
  s2=sd(samp2)               #sample 2 standard deviation
  spsq=(((n1-1)*(s1**2))+((n2-1)*(s2**2)))/df   #pooled variance
  t=(ybar1-ybar2)/sqrt((spsq/n1)+(spsq/n2))     #test statistic
  crit=qt(1-(alpha/2),df)                       #t-critical value
  if (abs(t)>crit){rej=rej+1}}
rejrate=rej/nruns
rejrate       #rate of rejecting the null when Ha is true (power)
stderr=sqrt((rejrate*(1-rejrate))/nruns)
stderr        #Monte Carlo standard error of the estimate

Table 1: One-Sample t: Alpha Calculations, alpha = .05. Each column is a different sample size; entries are estimated alpha (standard error).

Distribution   n = 5        n = 10       n = 15       n = 20       n = 50       n = 100
Normal         .050 (.001)  .050 (.001)  .050 (.001)  .051 (.001)  .050 (.001)  .051 (.001)
Uniform        .066 (.001)  .054 (.001)  .052 (.001)  .051 (.001)  .050 (.001)  .050 (.001)
Exponential    .117 (.001)  .099 (.001)  .089 (.001)  .081 (.001)  .066 (.001)  .059 (.001)
t₃             .038 (.001)  .040 (.001)  .042 (.001)  .044 (.001)  .046 (.001)  .047 (.001)
t₅             .042 (.001)  .046 (.001)  .046 (.001)  .047 (.001)  .049 (.001)  .051 (.001)
t₇             .045 (.001)  .048 (.001)  .047 (.001)  .049 (.001)  .048 (.001)  .049 (.001)
Cauchy         .018 (.000)  .019 (.000)  .020 (.000)  .020 (.000)  .021 (.000)  .020 (.000)

Table 2: One-Sample t: Power Calculations, sample size = 5, alpha = .05. Each column is a different shift size $|\mu - \mu_0|$; entries are estimated power (standard error).

Distribution   0            0.5          1            2            3            5
Normal         .050 (.001)  .142 (.001)  .399 (.002)  .908 (.001)  .998 (.002)  1 (0)
Uniform        .066 (.001)  .085 (.001)  .144 (.001)  .420 (.002)  .839 (.001)  1 (0)
Exponential    .118 (.001)  .319 (.001)  .543 (.002)  .839 (.001)  .951 (.001)  .996 (.000)
t₃             .038 (.001)  .100 (.001)  .282 (.001)  .687 (.001)  .881 (.001)  .976 (.000)
t₅             .043 (.001)  .117 (.001)  .327 (.001)  .780 (.001)  .948 (.001)  .996 (.000)
t₇             .044 (.001)  .122 (.001)  .348 (.002)  .817 (.001)  .970 (.001)  .999 (.000)
Cauchy         .019 (.000)  .048 (.001)  .132 (.001)  .333 (.001)  .488 (.002)  .667 (.001)
Table 3: One-Sample t: Power Calculations, sample size = 20, alpha = .05. Each column is a different shift size $|\mu - \mu_0|$; entries are estimated power (standard error).

Distribution   0            0.5          1            2             3             5
Normal         .049 (.001)  .564 (.002)  .988 (.000)  1 (0)         1 (0)         1 (0)
Uniform        .052 (.001)  .219 (.001)  .679 (.001)  1.000 (.000)  1 (0)         1 (0)
Exponential    .080 (.001)  .595 (.002)  .939 (.001)  1.000 (.000)  1 (0)         1 (0)
t₃             .043 (.001)  .320 (.001)  .780 (.001)  .982 (.000)   .996 (.000)   .999 (.000)
t₅             .048 (.001)  .413 (.002)  .901 (.001)  .999 (.000)   1.000 (.000)  1.000 (.000)
t₇             .048 (.001)  .453 (.002)  .941 (.001)  1.000 (.000)  1 (0)         1 (0)
Cauchy         .020 (.000)  .069 (.001)  .194 (.001)  .440 (.002)   .592 (.002)   .742 (.001)

Table 4: Two-Sample t: Alpha Calculations, alpha = .05. Each column is a different pair of sample sizes (n₁, n₂); entries are estimated alpha (standard error).

Distribution   5, 5         10, 10       15, 15       20, 20       50, 50       100, 100
Normal         .049 (.001)  .049 (.001)  .049 (.001)  .049 (.001)  .050 (.001)  .049 (.001)
Uniform        .054 (.001)  .052 (.001)  .050 (.001)  .050 (.001)  .051 (.001)  .050 (.001)
Exponential    .038 (.001)  .043 (.001)  .046 (.001)  .047 (.001)  .049 (.001)  .048 (.001)
t₃             .041 (.001)  .042 (.001)  .045 (.001)  .044 (.001)  .047 (.001)  .049 (.001)
t₅             .046 (.001)  .047 (.001)  .048 (.001)  .049 (.001)  .049 (.001)  .050 (.001)
t₇             .047 (.001)  .049 (.001)  .049 (.001)  .048 (.001)  .049 (.001)  .049 (.001)
Cauchy         .011 (.000)  .017 (.000)  .018 (.000)  .019 (.000)  .020 (.000)  .021 (.000)

Table 5: Two-Sample t: Power Calculations, alpha = .05, sample sizes = 5, 5. Each column is a different shift size $|\mu_1 - \mu_2|$; entries are estimated power (standard error).

Distribution   0            0.5          1            2            3            5
Normal         .050 (.001)  .107 (.001)  .283 (.001)  .793 (.001)  .984 (.000)  1 (0)
Uniform        .054 (.001)  .069 (.001)  .102 (.001)  .196 (.001)  .300 (.001)  .468 (.002)
Exponential    .039 (.001)  .061 (.001)  .103 (.001)  .192 (.001)  .278 (.001)  .387 (.001)
t₃             .041 (.001)  .073 (.001)  .183 (.001)  .520 (.002)  .787 (.001)  .959 (.001)
t₅             .045 (.001)  .089 (.001)  .221 (.001)  .629 (.002)  .892 (.001)  .993 (.000)
t₇             .048 (.001)  .094 (.001)  .237 (.001)  .677 (.001)  .928 (.001)  .998 (.001)
Cauchy         .012 (.000)  .019 (.000)  .044 (.001)  .133 (.001)  .242 (.001)  .434 (.002)

Table 6: Two-Sample t: Power Calculations, alpha = .05, sample sizes = 20, 20. Each column is a different shift size $|\mu_1 - \mu_2|$; entries are estimated power (standard error).

Distribution   0            0.5          1            2             3             5
Normal         .049 (.001)  .339 (.001)  .870 (.001)  1 (0)         1 (0)         1 (0)
Uniform        .050 (.001)  .126 (.001)  .318 (.001)  .732 (.001)   .925 (.001)   .996 (.000)
Exponential    .047 (.001)  .215 (.001)  .519 (.002)  .883 (.001)   .974 (.001)   .997 (.000)
t₃             .045 (.001)  .179 (.001)  .523 (.002)  .937 (.001)   .989 (.000)   .998 (.000)
t₅             .049 (.001)  .237 (.001)  .680 (.001)  .993 (.000)   1.000 (.000)  1.000 (.000)
t₇             .048 (.001)  .267 (.001)  .746 (.001)  .998 (.000)   1 (0)         1 (0)
Cauchy         .019 (.000)  .032 (.001)  .071 (.001)  .198 (.001)   .334 (.001)   .532 (.002)

Works Cited

Ott, Lyman, and Michael Longnecker. A First Course in Statistical Methods. Belmont, CA: Thomson-Brooks/Cole, 2004. Print.