STATISTICS REVIEW

Introduction

Students are often intimidated by statistics. This brief overview is intended to place statistics in context and to provide a reference sheet for those who are trying to interpret statistics that they read. It does not attempt to show or to explain the mathematics involved. Although it is helpful if those who use statistics understand the math, the computer age has rendered that understanding unnecessary for many purposes. Practically speaking, students often simply want to know whether a particular result is significant, i.e., how likely it is that the obtained result may be attributable to something other than chance. Computer programs can easily produce numbers that allow such conclusions, if the student knows which tests to use and has an understanding of what the numbers mean. This summary is intended to help achieve that understanding.

Basic Concepts

Variables

Most statistics involve at least two variables: an independent variable and a dependent variable. The independent variable is the one that the researcher focuses on as influencing the dependent variable. The dependent variable "depends" on the independent variable. For example, height "depends" on age: as individuals age, they usually grow taller. Withholding food might stunt a child's growth, but it would not change the child's age, so age cannot logically "depend" on height. As another example, scores on a test "depend" on the amount of knowledge an individual has of the subject matter. Assigning higher test scores would not increase knowledge.

Attributes

Every variable has attributes, the components that constitute the variable. For example, the attributes of gender include male, female, and transgendered. The attributes of age (in years) include all of the numbers from zero to 122 or so. In research, the list of a variable's attributes has to be exhaustive, that is, it has to cover all possibilities, and the attributes have to be mutually exclusive, that is, they cannot overlap with each other.

Levels of Measurement

Every variable fits into one and only one level of measurement: nominal, ordinal, interval, or ratio. Nominal variables are differentiated by names (from Latin nomen, which means name). Examples include ethnicity, language group, and hair color. Ordinal variables are differentiated by their order (from Latin ordo, which means order) in relationship to each other. Examples include class rank, place of finish in a race, and birth order. We know which individual is above or below another individual based on their places in the ordered list, but the place on the ordered list tells nothing about the amount of difference between any two individuals. For example, the second-place finisher in a race could have been behind the first-place finisher by .01 second or by 100 seconds, but if all we know is the order of finishing, we know nothing about the closeness of their times. Likert scales produce ordinal levels of measurement. Interval variables are differentiated by a measurement that has regular intervals, such as inches, degrees Celsius, and some test scores. With interval measurements, unlike ordinal measurements, one can properly say that the difference between two values is the same size as the difference between two other values. Ratio variables are similar to interval variables, except ratio variables have a true zero, so one can also properly say that one value is twice or half as large as another. Examples include degrees Kelvin (but not Celsius or Fahrenheit), number of children, number of times married, and number of statistics tests passed.

Descriptive and Inferential Statistics

Statistics are divided into two general categories: descriptive and inferential. Descriptive statistics include the mean, median, mode, range, sum of squares (sum of squared deviation scores), variance (also known as mean square, which is short for mean of squared deviation scores), and standard deviation. It is assumed that graduate students have some familiarity with each of these statistics.

Scores are often converted to z scores. A z score is simply a number that shows how far a score is from the mean, measured in standard deviations. In principle, z scores can range from negative infinity to positive infinity, but practically speaking, a z score of 4, which means that the raw score is 4 standard deviations above (or below) the mean, is so large in absolute value that the area under the normal curve beyond it is less than 0.0001.

Although not a statistic, a regression line is a line through a data plot that best fits the data. The line produces the lowest possible sum of squared differences between actual values and predicted values. Other descriptive statistics include the error sum of squares (SSE), the regression sum of squares (SSR), and the total sum of squares (SST); SSE + SSR = SST. Another descriptive statistic is the proportion of variance explained (PVE), which equals SSR divided by SST. PVE is a measure of how well the regression line predicts the scores that are actually obtained.

A correlation coefficient is a measure of the strength of a relationship between two variables. It is represented by the symbol r, and it can range between -1 and +1. The negative and positive signs reflect the direction of the slope of the line that shows the relationship between the two variables. The standard error of estimate is the standard deviation of the prediction errors. It tells how spread out scores are with respect to their predicted values.
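For readers who want to see how a computer program produces these descriptive statistics, the following is a minimal sketch in Python using the NumPy and SciPy libraries (neither of which is discussed in this handout). The age and height values are invented purely for illustration.

import numpy as np
from scipy import stats

ages = np.array([4, 6, 8, 10, 12, 14], dtype=float)              # independent variable (made-up data)
heights = np.array([100, 112, 125, 138, 149, 160], dtype=float)  # dependent variable (made-up data)

mean = heights.mean()                    # mean
sd = heights.std(ddof=1)                 # sample standard deviation
z_scores = (heights - mean) / sd         # z scores: distance from the mean in standard deviations

# Least-squares regression line predicting height from age; r is the correlation coefficient
slope, intercept, r, p_value, stderr = stats.linregress(ages, heights)
predicted = intercept + slope * ages

sse = np.sum((heights - predicted) ** 2)         # error sum of squares (SSE)
ssr = np.sum((predicted - mean) ** 2)            # regression sum of squares (SSR)
sst = np.sum((heights - mean) ** 2)              # total sum of squares (SST = SSE + SSR)
pve = ssr / sst                                  # proportion of variance explained (equals r squared)
se_estimate = np.sqrt(sse / (len(heights) - 2))  # standard error of estimate

print(round(pve, 3), round(r, 3), round(se_estimate, 3))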
Inferential statistics involve tests to support conclusions about a population from which a sample is presumed to have been drawn. The tests that follow are considered to be inferential statistics.

Assumptions

All tests are based on the assumption that samples are randomly selected and randomly assigned and that individuals are independent of each other, i.e., that one member's score does not influence another member's score. Parametric tests are based on the additional assumption that the populations from which samples are drawn have a normal distribution. Nonparametric tests do not have this assumption. Each test also has other assumptions, such as assumptions about the type of data and the number of data points. The following test descriptions outline those assumptions.

Number of Samples

Different situations require different testing procedures. The following discussion is organized according to the number of samples being evaluated: one sample, two samples, and more than two samples. In each category, both parametric and nonparametric tests are explained.

Single Sample Tests

One-sample t test (Parametric)

A t test is commonly used to compare two means to see if they are significantly different from each other. The t distribution varies according to the size of the sample, so once a t value is calculated, it has to be looked up in a table to find its significance level. Critical values of t in standard tables range from about 1.28 to 636.62, and larger t values show increasing significance. (These statements are true of all t tests.) A one-sample or single-sample t test is used to see whether the mean of a group within a population is different from the mean of the population as a whole. The dependent variable must be interval or ratio.

Chi-square goodness of fit test (Nonparametric)

A Chi-square goodness of fit test is used to compare observed and expected frequencies within a group in a sample, i.e., to test whether the observed results differ from the expected results, with the expected results derived either from the whole population or from theoretical expectations. In addition to the universal assumptions, the Chi-square goodness of fit test rests on the assumptions that the categories are mutually exclusive and exhaustive, that the dependent variable is nominal, that no expected frequency is less than 1, and that no more than 20% of the expected frequencies are less than 5. The Chi-square statistic is looked up in a table of critical values, and the statistic must be larger than the critical value to reject the null hypothesis. Chi-square values range from 0 into the hundreds, and higher numbers show an increasing departure of the observed frequencies from the expected frequencies.
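The one-sample t test described above is easy to run with a statistics package. Here is a minimal sketch using Python's SciPy library, with invented scores and an assumed population mean of 100; in practice the population mean would come from norms or other population data.

from scipy import stats

group_scores = [104, 98, 110, 101, 97, 105, 112, 99]    # made-up scores for one group
t_stat, p_value = stats.ttest_1samp(group_scores, popmean=100)

# If p_value is below the chosen alpha level (e.g., .05), the group mean differs
# significantly from the hypothesized population mean of 100.
print(t_stat, p_value)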
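A Chi-square goodness of fit test can be computed the same way. The sketch below, again with invented counts, compares the observed frequencies in four categories with the frequencies expected if the sample matched the population; the observed and expected counts must sum to the same total.

from scipy import stats

observed = [30, 25, 25, 20]   # made-up observed frequencies
expected = [25, 25, 25, 25]   # expected frequencies
chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)

# Reject the null hypothesis if chi2 exceeds the critical value,
# or equivalently if p_value is below the chosen alpha level.
print(chi2, p_value)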
Two Related Samples Tests

These tests are used when measurements on one group are taken at two different times, or when two samples are drawn and members are individually matched on some attribute.

Dependent samples t test (Parametric)

The dependent samples t test is also known as the correlated, paired, or matched t test. The dependent samples t test is used to determine whether the results of one measure differ significantly from the results of another measure. The dependent variable must be interval or ratio. Results fit the t distribution, and the statistic must be larger than the critical value to reject the null hypothesis. Significant t values begin at about 1.96 and run into the hundreds, with higher numbers being increasingly significant.

Wilcoxon matched-pairs signed ranks test (Nonparametric)

The Wilcoxon matched-pairs signed ranks test is used to determine whether the results of one measure differ significantly from the results of another measure. The dependent variable must be ordinal (interval or ratio differences must be converted to ranks). The test statistic is called T. (This is not the same as the t test.) The statistic must be looked up in a table to find its level of significance. The values of T range from 0 upward, with the maximum depending on the number of pairs. Unlike most tests, T must be less than or equal to the critical value to reject the null hypothesis.

McNemar change test (Nonparametric)

The McNemar change test is used to test whether a change in a pre-post design is significant. The dependent measure must be nominal (e.g., improved vs. not improved, increased vs. decreased), and no expected frequency within a category should be less than 5. Although the calculation requires a correction factor (the Yates correction), the result is a Chi-square statistic, which must be looked up in a table to find significance, and the statistic must be larger than the critical value to reject the null hypothesis. (A computer sketch of these three related-samples tests appears at the end of this section.)

Two Independent Samples Tests

Independent samples (group) t test (Parametric)

The independent samples t test is used to determine whether the results from one experimental condition differ significantly from the results from another experimental condition when the two groups contain different individuals. The dependent measure must be interval or ratio, the samples must be drawn from populations whose variances are equal, and the samples must be of the same size. Results fit the t distribution, and the statistic must be larger than the critical value to reject the null hypothesis.

Wilcoxon/Mann-Whitney test (Nonparametric)

The Wilcoxon/Mann-Whitney test is used to determine whether the results from one experimental condition differ significantly from the results from another experimental condition. The dependent measures must be ordinal (interval or ratio scores must be converted to ranks). The test statistic is U, which must be less than or equal to the critical value to reject the null hypothesis. The critical value is found in the table of critical values for the Wilcoxon rank-sum test. The U statistic can range from 0 into the hundreds, depending on how many data points are in the test. For larger samples, some authors show a method for calculating a z value, which is then looked up in a table of the z distribution.

Chi-square test of independence (2 x k) (Nonparametric)

The Chi-square test of independence is used to determine the likelihood that an apparent relationship between two categorical variables could have come from a population in which no such relationship exists. The categories must be mutually exclusive, the dependent measure must be nominal, no expected frequency should be less than 1, and no more than 20% of the expected frequencies should be less than 5. The Chi-square statistic is looked up in a table of critical values, and the statistic must be larger than the critical value to reject the null hypothesis. (A sketch of these two-independent-samples tests also appears at the end of this section.)

Three or More Independent Samples Tests

One-way analysis of variance (ANOVA) (Parametric)

The one-way analysis of variance (ANOVA) is used to determine whether differences among three or more groups are significant. The dependent measure must be interval or ratio, the samples must be drawn from populations whose variances are equal, and the samples must be of the same size. ANOVA produces an F statistic, which is compared to F statistics in a table of critical values. To find the proper critical value in the table, one must know the degrees of freedom associated with the numerator and the denominator. The F statistic can range from 0 upward, and the statistic must be larger than the critical value to reject the null hypothesis.

Kruskal-Wallis test (Nonparametric)

The Kruskal-Wallis test is used to determine whether differences among three or more groups are significant in situations that do not meet the assumptions necessary for ANOVA. The dependent measure must be ordinal (interval or ratio scores must be converted to ranks). The Kruskal-Wallis test is a screening test: if it reveals a significant difference, individual pairs are then evaluated with the Wilcoxon/Mann-Whitney test. The statistic for the Kruskal-Wallis test is H, which is approximately distributed like Chi-square with degrees of freedom = k - 1. The H statistic must be larger than the critical value to reject the null hypothesis.

Chi-square test for independence (k x k) (Nonparametric)

The (k x k) Chi-square test of independence is calculated and interpreted in the same way as the (2 x k) Chi-square test of independence.
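The following sketch illustrates the three related-samples tests described above (dependent samples t, Wilcoxon matched-pairs signed ranks, and McNemar) using Python's SciPy and statsmodels libraries, which are not covered in this handout. All of the pre/post scores and counts are invented for illustration.

import numpy as np
from scipy import stats
from statsmodels.stats.contingency_tables import mcnemar

pre  = np.array([12, 15, 11, 14, 13, 16, 12, 15])   # made-up scores before treatment
post = np.array([14, 17, 12, 15, 16, 18, 13, 16])   # made-up scores after treatment

# Dependent (paired) samples t test: interval or ratio data
t_stat, p_t = stats.ttest_rel(pre, post)

# Wilcoxon matched-pairs signed ranks test: the paired differences are ranked internally
T_stat, p_w = stats.wilcoxon(pre, post)

# McNemar change test: nominal pre/post data summarized as a 2 x 2 table
# (rows = pre-test category, columns = post-test category)
table = np.array([[20,  5],
                  [15, 10]])
result = mcnemar(table, exact=False, correction=True)   # Yates-corrected Chi-square version

print(t_stat, p_t, T_stat, p_w, result.statistic, result.pvalue)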
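This sketch covers the two-independent-samples tests (independent samples t, Wilcoxon/Mann-Whitney, and the 2 x k Chi-square test of independence), again with made-up data and SciPy.

import numpy as np
from scipy import stats

group_a = np.array([23, 25, 28, 30, 26, 24])   # made-up scores, condition A
group_b = np.array([20, 22, 25, 21, 23, 24])   # made-up scores, condition B

# Independent samples t test (assumes equal population variances)
t_stat, p_t = stats.ttest_ind(group_a, group_b, equal_var=True)

# Wilcoxon/Mann-Whitney test on the same scores treated as ranks
u_stat, p_u = stats.mannwhitneyu(group_a, group_b)

# Chi-square test of independence on a 2 x 3 table of made-up frequencies
observed = np.array([[30, 10, 20],
                     [25, 15, 15]])
chi2, p_chi, dof, expected = stats.chi2_contingency(observed)

print(t_stat, p_t, u_stat, p_u, chi2, p_chi)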
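Finally, this sketch runs the one-way ANOVA and the Kruskal-Wallis test on three made-up groups.

import numpy as np
from scipy import stats

group_1 = np.array([10, 12, 11, 14, 13])
group_2 = np.array([15, 17, 16, 18, 14])
group_3 = np.array([20, 19, 22, 21, 23])

# One-way ANOVA: F statistic compared against the critical value for the relevant df
f_stat, p_f = stats.f_oneway(group_1, group_2, group_3)

# Kruskal-Wallis test: H statistic, evaluated against Chi-square with k - 1 df
h_stat, p_h = stats.kruskal(group_1, group_2, group_3)

# If Kruskal-Wallis is significant, follow up with pairwise Wilcoxon/Mann-Whitney
# tests, e.g., stats.mannwhitneyu(group_1, group_2).
print(f_stat, p_f, h_stat, p_h)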