Normal Curve Tests of Means and Proportions

Normal curve means tests, commonly called simply "hypothesis tests," are a basic method of exploring possible differences between two samples, or of testing the null hypothesis that an observed sample mean does not differ significantly from some hypothesized population value. The normal curve test is a parametric test assuming a normal distribution, but when its assumptions are met it is more powerful than corresponding two-sample nonparametric tests. The normal curve z-test is used when sample sizes are larger (ex., n > 29); with smaller samples the t-test is used instead. For large samples the two tests give essentially identical results.

Key Concepts and Terms

Deviation scores are the observed scores minus the mean, for any given variable. By definition, deviation scores always average to zero, since the positive and negative deviations cancel exactly.

Standard deviation. For any given variable, if we get rid of the signs (so the deviations do not sum to zero) by squaring the deviations, then add them up, divide by sample size (n), and finally get rid of the squaring by taking the square root, we have the standard deviation:

    s.d. = SQRT[SUM((x - xmean)^2)/n]

Sample standard deviation is a conservative adjustment statisticians sometimes make when dealing with sample data. It is simply the formula above, but with (n - 1) in the denominator rather than n, the sample size.

Variance is the square of the standard deviation.

Standard error. If we took several samples of the same thing we would, of course, be able to compute several means, one for each sample. If we computed the standard deviation of these sample means as an estimate of their variation around the true but unknown population mean, that standard deviation of means would be called the standard error. Standard error thus measures the variability of sample means. However, since we normally have only one sample but still wish to assess this variability, we compute the estimated standard error by this formula:

    SE = s.d./SQRT(n - 1)

where s.d. is the standard deviation for a variable and n is sample size. Often estimated standard error is just called "standard error."

Confidence limits set upper and lower bounds on an estimate for a given level of significance (ex., the .05 level). The confidence interval is the range within these bounds. For instance, for normally distributed data, the confidence limits for an estimated mean are the sample mean plus or minus 1.96 times the standard error, as discussed below. Some researchers recommend reporting confidence limits wherever point estimates (ex., means) and their significance are reported, because confidence limits provide additional information on the relative meaningfulness of the estimates. Significance has a different meaning when, for example, the confidence interval spans the entire range of the data than when it spans only ten percent of the range.
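To make these definitions concrete, here is a minimal Python sketch computing deviation scores, the standard deviation, the estimated standard error, and 95% confidence limits. The data values are invented purely for illustration, and the formulas follow the definitions above.

    import math

    x = [23, 27, 19, 31, 25, 22, 28, 26]    # hypothetical sample data
    n = len(x)
    mean = sum(x) / n

    deviations = [xi - mean for xi in x]     # deviation scores; they sum to zero
    sd = math.sqrt(sum(d**2 for d in deviations) / n)               # standard deviation
    sample_sd = math.sqrt(sum(d**2 for d in deviations) / (n - 1))  # sample s.d.
    variance = sd**2                         # variance is the square of the s.d.

    se = sd / math.sqrt(n - 1)               # estimated standard error: sd/SQRT(n - 1)

    # 95% confidence limits: sample mean plus or minus 1.96 standard errors
    lower, upper = mean - 1.96 * se, mean + 1.96 * se
    print(f"mean={mean:.2f} sd={sd:.2f} SE={se:.2f} 95% CI=({lower:.2f}, {upper:.2f})")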
Binomial distribution. The binomial distribution is the frequency distribution which occurs when one follows the rules of probability. It describes, for instance, the distribution of two things (R = Republicans, D = Democrats), each with a .5 probability of selection, taken four at a time with repetition allowed. The binomial distribution follows the formula (p + q)^n, where p is the probability of one thing (Republicans, in this example, with p = .5), q is the probability of non-occurrence (q = 1 - p), and n is the number of trials (4 in this example). Thus,

    (p + q)^4 = (1/2 + 1/2)^4 = p^4 + 4p^3*q + 6p^2*q^2 + 4p*q^3 + q^4
              = 1/16 + 4/16 + 6/16 + 4/16 + 1/16

The terms of this binomial expansion give the frequencies of samples containing four, three, two, one, and zero Republicans, respectively.

A normal distribution is similar to a binomial distribution, but for continuous interval data and large sample size. A normal distribution is assumed by many statistical procedures. Normal distributions take the form of a symmetric bell-shaped curve. The standard normal distribution is one with a mean of 0 and a standard deviation of 1. Standard scores, also called z-scores or standardized data, are scores which have had the mean subtracted and which have been divided by the standard deviation, yielding scores with a mean of 0 and a standard deviation of 1. Normality can be assessed visually by looking at a histogram of frequencies, or at a normal probability plot output by most computer programs. Tests of normality are discussed further in the section on testing assumptions.

The area under the normal curve represents probability: 68.27% of cases will lie within 1 standard deviation of the mean, 95.45% within 2 standard deviations, and 99.73% within 3 standard deviations. Often this is simplified by rounding to say that 1 s.d. corresponds to 2/3 of the cases, 2 s.d. to 95%, and 3 s.d. to 99%. Another way to put this is to say there is less than a .05 chance that a sampled case will lie outside 2 standard deviations of the mean, and less than a .01 chance that it will lie outside 3 standard deviations. These statements are analogous to statements pertaining to significance levels of .05 and .01.

Thus, if the mean in our sample is 20 and the standard deviation is 12, then if the data are normally distributed and randomly sampled, we would estimate that 95% of the cases will be within the range of 20 plus or minus 1.96*12 = 23.52, which is the range -3.52 to 43.52. By the same token, the chance of a given case being 43.52 or higher, or -3.52 or lower, is .05. This calculation is a two-tailed test. The chance of a given case being 43.52 or higher is .025, which corresponds to the one-tailed test. Note that the significance level of a two-tailed test is numerically twice that of a one-tailed test; and since a significance level that is numerically lower (closer to 0) is substantively better (less likelihood that the observation is due merely to the chance of random sampling), the one-tailed test yields a significance level twice as favorable for the same data.
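The binomial probabilities and normal-curve areas just quoted are easy to verify. The sketch below uses only the Python standard library (math.comb and statistics.NormalDist, available since Python 3.8); the mean of 20 and standard deviation of 12 are taken from the worked example above.

    from math import comb
    from statistics import NormalDist

    # Binomial (p + q)^4 with p = q = 1/2: probability of k Republicans in 4 draws
    print([comb(4, k) * 0.5**4 for k in range(5)])   # 1/16, 4/16, 6/16, 4/16, 1/16

    z = NormalDist()                      # standard normal: mean 0, s.d. 1
    for k in (1, 2, 3):
        area = z.cdf(k) - z.cdf(-k)       # area within k standard deviations
        print(f"within {k} s.d.: {area:.2%}")        # 68.27%, 95.45%, 99.73%

    print(z.inv_cdf(0.975))               # two-tailed .05 critical value: 1.96

    mean, sd = 20, 12                     # the worked example above
    print(mean - 1.96 * sd, mean + 1.96 * sd)        # -3.52 to 43.52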
Normal curve means tests ("hypothesis tests"). The sampling distribution represented in a normal curve can be used to test hypotheses about means. A hypothesis of this sort might be as follows: someone claims the average age of members of Congress is 55, but you think it is higher. You take a sample of the members of Congress and find the mean age is 59 -- but is 59 significantly different from 55, or could this just be due to the chance of sampling when the real mean age was in fact 55? In normal curve terms: if we hypothesize that sample means are normally distributed around a true mean of 55 and were to take samples from this distribution, what percentage of the time would we get a sample mean age 4 or more years different from 55? This is a two-tailed test, like the one just illustrated.

If the distance from the hypothesized real mean of 55 to the sample mean of 59 (4 years) is 1.96 standard errors or greater, then the proportion of cases in the tail is .025 or less, and the proportion in both tails is .05 or less. Recall that standard error is the standard deviation of sample means, which is what this example involves; the logic is the same as for standard deviations of cases. We want the two-tailed situation because the hypothesis dealt with "different from"; had it dealt only with "more than," we would want the one-tailed test. Dividing 4 by 1.96, we find that if the standard error is 2.04 or less, then 59 is at least 1.96 standard errors away from 55. We can then say that we are 95% confident that our sample mean of 59 is significantly different from the hypothesized real mean of 55. Equivalently, we can say that the sample mean is significantly different at the .05 significance level.

The confidence interval for the example above is the sample mean, 59, plus or minus 1.96 times the standard error, for the 95% confidence level. A finding of significance at the .05 level corresponds to the hypothesized mean falling outside the 95% confidence limits around the sample mean. The Statistics, Summarize, Explore menu choice in SPSS displays the 95% confidence interval for the mean.

One-sample formula for z values for means tests. It is conventional to denote the value we look up in a table of areas under the normal curve as "z." In the example above, z was 1.96, but it may calculate to any number according to this formula:

    z = (mean_sample - mean_population)/(s.d./SQRT(n - 1))

where s.d. is the sample standard deviation, used as an estimate of the unknown population standard deviation, and n is the sample size. Note that the denominator term is the standard error, discussed above. The researcher uses this formula to compute the z value, then sees how it compares with the critical value (ex., 1.96 for significance = .05) in a table of areas under the normal curve. In a two-tailed test, if z is 1.96 or higher, the difference of means is significant at the .05 level.

Normal curve proportions tests. Identical logic may be used to test a difference in proportions (percentages) rather than means. Let Po be an observed sample proportion, such as 55% of students favoring letter grades rather than pass-fail grades; let Pt be the hypothesized true proportion in the population, say 45%; and let Qt be 1 - Pt. Then the one-sample formula for testing hypotheses about proportions is:

    z = (Po - Pt)/SQRT(Pt*Qt/n)

The sample z value is compared to critical values found in a table of areas under the normal curve, as in means tests.
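To illustrate both one-sample formulas, here is a minimal Python sketch. The standard deviation of 14 and the sample sizes of 50 and 100 are assumed values invented to complete the examples; only the means and proportions come from the text.

    import math
    from statistics import NormalDist

    def z_mean(sample_mean, pop_mean, sd, n):
        # One-sample means test: z = (mean_sample - mean_population)/(s.d./SQRT(n - 1))
        return (sample_mean - pop_mean) / (sd / math.sqrt(n - 1))

    def z_proportion(p_obs, p_true, n):
        # One-sample proportions test: z = (Po - Pt)/SQRT(Pt*Qt/n)
        return (p_obs - p_true) / math.sqrt(p_true * (1 - p_true) / n)

    # Congress example: hypothesized mean 55, sample mean 59; with an assumed
    # s.d. of 14 and n of 50, the standard error is exactly 2.0 (under the 2.04 cutoff)
    z = z_mean(59, 55, sd=14, n=50)
    print(z, 2 * (1 - NormalDist().cdf(abs(z))))     # z = 2.0, two-tailed p = .0455

    # Letter-grades example: Po = .55 observed vs. Pt = .45 hypothesized, assumed n = 100
    print(z_proportion(0.55, 0.45, n=100))           # z = 2.01

With these assumed values both z statistics exceed the 1.96 critical value, so both differences would be judged significant at the .05 level.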
Independent two-sample tests. Slightly different formulas apply when there are two samples, though the researcher still computes a z value which is then compared with critical values in the table of areas under the normal curve, following the same inference logic as in one-sample tests. Two-sample tests apply to a situation such as testing the difference in mean age, or in percent Democrat, between a sample from City A and a sample from City B. In the formulas below, the following notation applies:

    n1, n2: sample sizes in samples 1 and 2
    xmean1, xmean2: means for samples 1 and 2
    P1, P2: proportions for samples 1 and 2
    Q1, Q2: 1 minus the proportions for samples 1 and 2
    s1^2, s2^2: variances for samples 1 and 2

Independent Samples (Uncorrelated Data)

Means test:

    z = (xmean1 - xmean2)/SQRT[(s1^2/(n1 - 1)) + (s2^2/(n2 - 1))]

Proportions test:

    z = (P1 - P2)/SQRT[(P1*Q1/n1) + (P2*Q2/n2)]

Correlated two-sample tests. Data in two samples are correlated if the responses of person #1 in the first sample are associated with the responses of person #1 in the second sample. This happens in before-after studies of the same people, or in matched-pair tests of similar people. When data in the two samples are correlated, that correlation must be factored into the formulas for two-sample means and proportions tests. Notation is as for independent two-sample tests, with the addition of r12, the Pearson correlation of the given variable between samples 1 and 2.

Dependent Samples (Correlated Data)

Means test:

    z = (xmean1 - xmean2)/SQRT[(s1^2/(n1 - 1)) + (s2^2/(n2 - 1)) - 2*r12*(s1/SQRT(n1 - 1))*(s2/SQRT(n2 - 1))]

Proportions test:

    z = (P1 - P2)/SQRT[(P1*Q1/n1) + (P2*Q2/n2) - 2*r12*SQRT(P1*Q1/n1)*SQRT(P2*Q2/n2)]

(All four two-sample formulas are illustrated in the code sketch at the end of this section.)

Assumptions

Normal distribution. The normal curve means and proportions tests assume the variable of interest is normally distributed in the population. The z tests are parametric tests because they assume normally distributed data; nonparametric statistics are ones which do not require an assumption about the distribution of the data.

Interval data are assumed, since the calculations involved, starting with the computation of deviations (observed scores minus mean scores), assume equal intervals.

Sample size should not be small. Normal curve z tests assume that sample size n is large enough to form a normal curve. There is no accepted cutoff, but typically if n < 30, t-tests should be used in place of z tests. The t-tests are computed identically to z tests for larger samples, but a table of the t distribution is consulted instead; in such a table, degrees of freedom is (n - 1) for one-sample tests and (n1 + n2 - 2) for two-sample tests.

Homogeneity of variances is assumed in two-sample tests: the variance of the given variable should be approximately equal in the two groups.

Can I use normal curve means tests with ordinal data in spite of the assumption to the contrary? It is common in social science to use ordinal data with interval procedures provided there are at least five ordinal categories. Most researchers find that assumptions of normality are too grossly violated when there are fewer scale points. See further discussion in the section on testing assumptions.
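Finally, here is a sketch of the four two-sample formulas above as Python functions. The City A / City B numbers passed in at the bottom are hypothetical, invented only to show the calls; in practice r12 would come from the Pearson correlation between the paired responses.

    import math

    def z_means_independent(m1, m2, v1, v2, n1, n2):
        # z = (xmean1 - xmean2)/SQRT[s1^2/(n1 - 1) + s2^2/(n2 - 1)]
        return (m1 - m2) / math.sqrt(v1 / (n1 - 1) + v2 / (n2 - 1))

    def z_props_independent(p1, p2, n1, n2):
        # z = (P1 - P2)/SQRT[P1*Q1/n1 + P2*Q2/n2], with Q = 1 - P
        return (p1 - p2) / math.sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2)

    def z_means_correlated(m1, m2, v1, v2, n1, n2, r12):
        # Correlated data: subtract 2*r12*SE1*SE2 inside the square root
        se1, se2 = math.sqrt(v1 / (n1 - 1)), math.sqrt(v2 / (n2 - 1))
        return (m1 - m2) / math.sqrt(se1**2 + se2**2 - 2 * r12 * se1 * se2)

    def z_props_correlated(p1, p2, n1, n2, r12):
        # Same adjustment applied to the proportions standard errors
        se1 = math.sqrt(p1*(1 - p1)/n1)
        se2 = math.sqrt(p2*(1 - p2)/n2)
        return (p1 - p2) / math.sqrt(se1**2 + se2**2 - 2 * r12 * se1 * se2)

    # Hypothetical City A vs. City B data: mean ages, then percent Democrat
    print(z_means_independent(41.0, 38.5, v1=110.0, v2=95.0, n1=60, n2=55))
    print(z_props_independent(0.52, 0.44, n1=60, n2=55))

Each computed z is then compared with the usual critical values (ex., 1.96 for the .05 level, two-tailed), exactly as in the one-sample tests.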