Measures of Association
POWER CALCULATOR
David J. Pittenger
Marietta College

MS-DOS and Windows are registered trademarks of Microsoft Corporation. IBM is a registered trademark of International Business Machines Corporation.

Copyright © by David J. Pittenger. All rights reserved. Except as permitted under the United States Copyright Act of 1976, no part of this publication or the accompanying software may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the author.

TABLE OF CONTENTS

Chapter 1: Introduction to Power Calculator ..... 7
    Computer Hardware Requirements ..... 7
    Making Backups ..... 7
    Installing the Programs to the Hard Drive ..... 7
    Starting the Program ..... 7
        The Mouse ..... 8
        The Keyboard ..... 8
    The Main Menu ..... 8
        Setup ..... 8
            Background Color ..... 9
            Highlight Color ..... 9
Chapter 2: Power ~ One Sample t-Ratio .....
11
    Introduction ..... 11
    Power Estimation ..... 12
        The Effect Size Index: d1 ..... 12
    Estimating Power ..... 13
        Alpha Level ..... 14
        Number of Tails ..... 14
        Degrees of Freedom and Sample Size ..... 14
        Example of Power Estimation ..... 15
Chapter 3: Power ~ Two Sample Independent t-Ratio ..... 17
    Introduction ..... 17
        Independent Groups t-Ratio ..... 17
        Dependent Groups t-Ratio ..... 17
        Statistical Issues Related to the Independent Groups t-Ratio ..... 18
        Effect Size ..... 19
        Assumptions for the t-Ratio and Power ..... 19
    Estimating Power ..... 20
        Alpha Level .....
21
        Number of Tails ..... 21
        Degrees of Freedom and Sample Sizes ..... 21
        Examples of Power Estimation ..... 22
    t-Ratio When Variances Are Not Equal ..... 25
Chapter 4: Power ~ Two Sample Dependent t-Ratio ..... 27
    Introduction ..... 27
    Estimating Power ..... 29
        Examples of Power Estimation ..... 29
Chapter 5: Power ~ Pearson Correlation Coefficient ..... 31
    Introduction ..... 31
    Estimating Power ..... 32
        Alpha Level ..... 32
        Number of Tails ..... 32
        Degrees of Freedom and Sample Sizes ..... 32
        Examples of Power Estimation ..... 33
Chapter 6: Power ~ Difference Between Correlations: ρ1 = ρ2 ..... 37
    Introduction ..... 37
    Estimating Power .....
38
        Alpha Level ..... 38
        Number of Tails ..... 38
        Sample Sizes ..... 38
        Examples of Power Estimation ..... 39
Chapter 7: Power ~ Multiple Regression ..... 41
    Introduction ..... 41
    Estimating Power ..... 41
        Alpha Level ..... 41
        Number of Predictors: U ..... 41
        Sample Size: N ..... 41
        Examples of Power Estimation ..... 42
Chapter 8: Power ~ Sign Test and P = .50 ..... 45
    Introduction ..... 45
    Estimating Power ..... 45
        Alpha Level ..... 45
        Number of Tails ..... 45
        Sample Size .....
46
        Examples of Power Estimation ..... 46
Chapter 9: Power ~ Difference Between Proportions: P1 = P2 ..... 49
    Introduction ..... 49
    Estimating Power ..... 49
        Alpha Level ..... 50
        Number of Tails ..... 50
        Degrees of Freedom and Sample Sizes ..... 50
        Examples of Power Estimation ..... 50
Chapter 10: Power ~ Analysis of Variance ..... 53
    Introduction ..... 53
        Foundation of the ANOVA ..... 54
        Interpreting the F-ratio ..... 54
        F-ratio ..... 55
        The Correlation Ratio: η² ..... 55
        Effect Size: f ..... 55
        Special Issues for Power Estimation for the ANOVA ..... 56
        Measurement Error and Power .....
56
        Factorial Designs and Power ..... 57
        Analysis of Covariance and Power ..... 57
        ANOVA vs. ANCOVA ..... 58
    Estimating Power ..... 59
        Between-Subjects Factors ..... 59
        Within-Subjects Factors ..... 59
        Levels of a Factor ..... 59
        Sample Size ..... 59
Chapter 11: Power ~ χ² ..... 65
    Introduction ..... 65
    Estimating Power ..... 66
        Alpha Level ..... 66
        Degrees of Freedom ..... 66
        Sample Size ..... 66
        Examples of Power Estimation ..... 67
Chapter 12: Random Number Generator ..... 69
    Introduction .....
69
        Random Integers ..... 69
    Random Normal Distribution ..... 71
    Random Assignment of Subjects ..... 71
    Latin Square Generator ..... 72
    Whole Number Generator ..... 73
    Generating Samples with Specified Means and Standard Deviations
        Frequency Distributions
        Correlation and Regression
        Analysis of Variance
            One-Way ANOVA
            Two-Way ANOVA
    Running the Program ..... 73
        Lower and Upper Sample Size ..... 80
        Lower and Upper Standard Deviations ..... 81
        Denominator for SD ..... 81
        Replications in Set ..... 81
        Compute ..... 81
        Print .....
81
        Exit ..... 81
        Settings ..... 81
Chapter 13: Statistical Tables Generator ..... 85
    Introduction ..... 85
        Critical Values: t-ratio ..... 85
        Critical Values: F-ratio ..... 86
        Critical Values: χ² ..... 86
        Critical Values: r ..... 86
        r to z Transformation ..... 86
        Normal Distribution ..... 86
        Binomial Distribution ..... 87
Chapter 14: ANOVA — Monte Carlo Simulator ..... 89
    Introduction ..... 89
        Design ANOVA Model ..... 89
        Change Parameters ..... 90
        Plot Factors .....
90
        Start Demonstration ..... 90
        Iterations ..... 90
        Significant Digits ..... 90
        Print Summary Tables to Screen ..... 91
        Print Summary Tables to Disk ..... 91
        Print Summary Tables to Printer ..... 91
        Print Raw Data with Tables ..... 91
        Create F-Ratio File ..... 91
        Other Features ..... 92
        Practice Session ..... 93
    Examining Power ..... 96
    Robustness of the ANOVA ..... 102
References ..... 105

CHAPTER 1: INTRODUCTION TO POWER CALCULATOR

Power Calculator is a collection of computer programs that allow you to estimate the power of various statistical tests and to create frequently used statistical tables. Power Calculator is fully interactive and offers considerable flexibility. The program will produce tabular as well as graphical representations of the power estimates.
In addition, you can vary all the essential parameters of the calculations used to create the estimated power of a statistic. I wrote the program with the hope that researchers and students with different levels of statistical training can use it in many different contexts. Specifically, I hope that the program will be as useful for the professional researcher who needs quick power estimates as for the student who is learning about the foundations of inferential statistics, confidence intervals, and statistical power.

COMPUTER HARDWARE REQUIREMENTS

To operate Power Calculator you will need an IBM® compatible computer. The program operates within MS-DOS® 2.0 or greater. The program will work within either the DOS or WINDOWS® environments.

MAKING BACKUPS

The programs are not copy protected. The programs and manual are copyrighted, however. Please do not distribute the program without permission.

INSTALLING THE PROGRAMS TO THE HARD DRIVE

The programs must be stored on your computer’s hard drive. The install program allows you to create a new directory on your hard drive for storing the programs, copy all the necessary files, and then prepare the programs to run on your system. To start the installation of Power Calculator, double click your mouse on the POWER1 icon. The program will begin by asking you to identify the directory you want to use for storing the programs. The default directory is C:\POWER. You should use this directory unless you have a specific need to store the program elsewhere. To accept the default directory, press the ENTER key. To identify a new directory, type the new directory and press the ENTER key. Be sure that you follow DOS conventions for identifying the directory.

STARTING THE PROGRAM

When the program begins, you will see a large menu of options.
As you can see, the program offers an array of computational alternatives ranging from estimation of power for various statistical procedures, to generation of random numbers, to the creation of frequently used statistical tables. The options listed in the Main Menu of the program can be selected using either a mouse or the keyboard.

THE MOUSE

To activate an option with the mouse, place the mouse cursor over the button representing the desired option and click the left button. You do not need to hold the button down other than to click it once. The program will immediately start the option you selected.

THE KEYBOARD

To activate a menu option with the keyboard, press the highlighted letter associated with the option you wish to run. For example, to select the statistical tables, press the letter “L.” To estimate the power of an Analysis of Variance, press the letter “F.”

THE MAIN MENU

The Main Menu contains a list of all the options for this program. This menu serves as the central control for the program. In essence, you will control the operation of the program from this location. Each of the computational options is explained in a subsequent section of this manual. The following is a brief review of the general options found in the Main Menu and in many of the specific routines.

HELP

The Help button is present in all the programs. Pressing this button will present a small text window that contains information about the program and the specific routine you are using. To scroll through the help information, use the N, P, and E keys to move to the Next page, move to the Previous page, or Exit the help routine.

PRINT

Once the program computes the power table, you can click on this button to have the program print a version of the table to the printer.

EXIT

As its name suggests, the Exit button causes the program to leave the current application.
If you are using one of the specific applications in Power Calculator, pressing the Exit button will return you to the Main Menu. If you are in the Main Menu, pressing the Exit button will return your computer to DOS or Windows.

SETUP

This set of routines allows you to customize your program. You can change the color of the background and the color of the highlighted text.

Background Color

You can vary the color of the background screen by changing the proportions of Red, Green, and Blue. To increase or decrease a specific color, place the mouse on the appropriate arrow and click the left button. The colored slider bar will change to represent the proportion of that color in the background screen. At the same time, the background color will change.

Highlight Color

This option is useful for increasing the contrast of the highlighted text. You can select any color to complement the color of the background screen by varying the amount of Red, Green, and Blue. To increase or decrease a specific color, place the mouse on the appropriate arrow and click the left button. The colored slider bar will change to represent the proportion of that color in the highlighted text. At the same time, the highlighted text color will change.

CHAPTER 2: POWER ~ ONE SAMPLE T-RATIO

INTRODUCTION

William Gossett (Student, 1908) initiated a new generation of statistical testing when he described the t-ratio in 1908 under the pseudonym “Student.” The t-ratio has become the familiar friend and beast of burden for contemporary researchers who depend upon inferential statistics for their work. Furthermore, this statistic served as the inspiration for more complex analytic tools such as the analysis of variance. In this chapter we will examine the power of the single sample t-ratio. The two sample t-ratios are examined in the next two chapters.
Perhaps the simplest of the inferential statistics is the t-ratio for a single sample. For this test we are interested in whether the mean of a single sample should be considered a member of a specific population or not. In general, the null hypothesis for this test is:

    H0: μ1 = μ0    (2.1)

If the null hypothesis is true, then any difference between the sample and population means is due to sampling error or random effects. In other words, the null hypothesis states that there is no meaningful difference between these two numbers; random effects created the difference between the means. According to the central limit theorem, the mean of any sample drawn from the population may deviate from the population mean, but only within specific parameters. Therefore, our acceptance of the null hypothesis indicates that the difference between the sample and population means is within these parameters.

The alternative hypothesis can take one of two forms. One form of the alternative hypothesis is:

    H1: μ1 ≠ μ0    (2.2)

This alternative hypothesis is the non-directional hypothesis, or a two-tailed test. We use these names because the researcher has predicted that there will be a difference between the sample and population means, but has not specified the direction of the difference. The directional, or one-tailed, hypothesis is the other form of the alternative hypothesis; it is used when the researcher has reason to predict that the sample mean will be greater than (>) or less than (<) the population mean. Depending on the researcher’s prediction, the directional hypothesis is:

    H1: μ1 > μ0  or  H1: μ1 < μ0    (2.3)

An advantage of a two-tailed test is that it allows a researcher to find significant differences between the means without having to specify the direction of the difference.
For example, if a researcher used a one-tailed test and predicted that the sample mean would be greater than the population mean, but the sample mean is really LESS than the population mean, the researcher could not reject the null hypothesis. By contrast, if the researcher had used a non-directional test, the difference could have been considered statistically significant. The disadvantage of the two-tailed test is that it is less powerful than the one-tailed test. Therefore, it is important to specify beforehand the type of hypothesis test you will use and then follow through with power estimation.

POWER ESTIMATION

The t-ratio for the one sample case is similar to the equation for the z-score. The t-ratio is:

    t = (X̄ − μ) / (ŝ / √n)    (2.4)

The t-ratio can be negative, 0, or positive. The sign is essential if you are conducting a one-tailed test: if the sign of the t-ratio does not match the sign predicted in the alternative hypothesis, one cannot reject the null hypothesis.

Three factors affect the magnitude of the t-ratio. The first factor is the difference between the sample and population means. All else being equal, the larger the difference between these values, the greater the magnitude of the t-ratio. The second factor that influences the t-ratio is ŝ, the unbiased estimate of the standard deviation of the population, σ. Larger values of ŝ will decrease the size of the t-ratio when all other factors are constant. Similarly, smaller values of ŝ increase the magnitude of the t-ratio. Therefore, power increases as ŝ decreases. The last factor that affects the size of the t-ratio is n, the number of subjects in the sample. According to the central limit theorem, the spread of the distribution of sample means drawn from the population decreases as a function of the square root of n. In other words, as n increases, the theoretical spread of sample means drawn from a population decreases.
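The computation in Equation 2.4 can be illustrated numerically. The following sketch (Python with scipy; the scores are hypothetical data invented for illustration) computes the one sample t-ratio by hand and reports both one- and two-tailed p-values, then checks the result against scipy’s built-in test.

```python
import math
from scipy import stats

# Hypothetical sample of test scores; the population mean is assumed to be 100.
scores = [104, 98, 110, 105, 97, 102, 108, 101, 99, 106]
mu = 100

n = len(scores)
xbar = sum(scores) / n
# Unbiased estimate of the population standard deviation (denominator n - 1).
s_hat = math.sqrt(sum((x - xbar) ** 2 for x in scores) / (n - 1))

# Equation 2.4: t = (sample mean - population mean) / (s-hat / sqrt(n))
t = (xbar - mu) / (s_hat / math.sqrt(n))

df = n - 1
p_one_tailed = stats.t.sf(t, df)            # directional: H1 predicts a larger mean
p_two_tailed = 2 * stats.t.sf(abs(t), df)   # non-directional

print(t, p_one_tailed, p_two_tailed)

# The same t-ratio is produced by scipy's built-in one sample test.
t_check, p_check = stats.ttest_1samp(scores, mu)
```

Note that the one-tailed p-value is half the two-tailed value when the observed sign matches the prediction, which is the sense in which the one-tailed test is more powerful.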
Therefore, larger sample sizes produce a smaller standard error of the mean. The consequence for the t-ratio is that larger samples will, all other things being equal, produce tests of greater power.

THE EFFECT SIZE INDEX: d1

Cohen (1988) devised the effect size index as a way to characterize the relative difference between two population means. The statistic is the ratio of the difference between the means to the estimate of the population standard deviation. In mathematical terms, the effect size is determined by:

    d1 = (X̄ − μ) / ŝ    (2.5)

In this equation, X̄ and μ represent the sample and population means, respectively. The denominator, ŝ, is the unbiased estimate of the standard deviation of the population. If the null hypothesis is correct, then X̄ and μ will be identical, and the numerator of the statistic and d1 will equal 0. In words, when d1 = 0 there is no difference between the two population means. Values of d1 that are not equal to 0 represent a difference between the two populations. The larger the absolute value of d1, the greater the relative difference between the two populations. Cohen (1988) recommended some general benchmarks for evaluating the magnitude of d1.

0.0 ≤ d1 < 0.20: No Effect to Little Effect: The difference between the means is nonexistent or trivial. There may be many reasons for this condition. First, the population means may be different, but relative to the amount of variation within the population, the effect is difficult to detect without extremely large samples.

0.20 ≤ d1 < 0.50: Little Effect to Moderate Effect: This range of effect sizes represents a difference between the means that is difficult to detect without large samples. The variation among scores is large relative to the difference between the means and contributes much noise to the data. Cohen (1988) suggested that an effect size close to d1 = .20 is equivalent to the difference in heights between 15- and 16-year-old women.
0.50 ≤ d1 < 0.80: Moderate Effect to Large Effect: This range of effect sizes represents a difference between the means that can be seen in a graph of the data. In other words, effect sizes in this range allow the researcher to find significant effects with fewer subjects. Such effects are better suited for laboratory studies.

0.80 ≤ d1: Large Effect: Effects of this magnitude are extremely easy to observe and require few subjects to effectively estimate the difference between the population means. In essence, there is little overlap of the two sampling populations. The difference in height between 13- and 18-year-old women represents a large effect.

ESTIMATING POWER

You can use Power Calculator to estimate the power of the single sample t-ratio for various sample sizes, α-levels, and directionality of the test. When you select this option, you will see a screen similar to the one presented in Figure 2.1. As you can see, you can change several parameters of the statistic. Let’s look at each of these in turn.

Figure 2.1: The first screen for the program that calculates the power of a single sample t-ratio.

Alpha Level

This option allows you to vary the α-level you plan to use for your research. Although the default value is set at α = .05, you can increase or decrease the value of α. You can enter values between .50 and .00001.

Number of Tails

This option allows you to toggle between a one- and two-tailed test. Remember that when you use a two-tailed test you divide α between the two extremes of the sampling distribution; thus the proportion of the distribution at either extreme is α/2.

Sample Size

You can enter sample sizes as small as 5 and as large as 9999. Recall that the degrees of freedom for the single sample t-ratio are n − 1.

COMPUTE

This function creates the power table for the parameters you have entered. The effect sizes for the table will range between 0 and 1.80.
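A miniature version of such a power table can be reproduced outside the program. The sketch below (Python with scipy) uses the noncentral t distribution, a standard way to compute t-test power; it is not necessarily the exact algorithm Power Calculator uses internally. The sample size of 25 is an arbitrary value chosen for illustration.

```python
import math
from scipy import stats

def one_sample_t_power(d1, n, alpha=0.05, tails=2):
    """Power of the one sample t-ratio, computed from the noncentral
    t distribution with noncentrality parameter d1 * sqrt(n)."""
    df = n - 1
    nc = d1 * math.sqrt(n)
    if tails == 2:
        crit = stats.t.ppf(1 - alpha / 2, df)
        # Reject when |t| > crit; sum the probability in both rejection regions.
        return (1 - stats.nct.cdf(crit, df, nc)) + stats.nct.cdf(-crit, df, nc)
    crit = stats.t.ppf(1 - alpha, df)
    return 1 - stats.nct.cdf(crit, df, nc)

# A small power table: n = 25, alpha = .05, two-tailed,
# with effect sizes spanning 0 to 1.80 as in the program's tables.
for d1 in [round(0.3 * k, 1) for k in range(7)]:   # 0.0, 0.3, ..., 1.8
    print(d1, round(one_sample_t_power(d1, 25), 3))
```

As a sanity check, power at d1 = 0 equals α (rejecting a true null hypothesis is a Type I error), and power climbs toward 1 as the effect size grows.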
Note that the power distribution is symmetrical. Therefore, you can apply this information to positive and negative values of d1.

GRAPH POWER

This alternative offers a graph of the relation between sample size, effect size, and power for the α-level and type of hypothesis you are using. The graph option is useful when you want a quick estimate of the sample size your study may require.

HELP

The help function calls a help screen that should provide you with general information that will help you understand the features of this computational option. The help screens are a greatly abridged version of this manual.

PRINT

Once the program computes the power table, you can click on this button to have the program print a version of the table to the printer.

EXIT

This function returns you to the Main Menu.

EXAMPLE OF POWER ESTIMATION

An educational psychologist wants to compare the reading scores of students in a special education program to national norms. Because the psychologist uses a popular reading test, good estimates of the population mean and standard deviation are available. For specific reasons, the researcher decides to use an α-level of α = .01 and a two-tailed test. Therefore, we can estimate the number of students required for a fair test of the hypothesis. After changing the basic parameters, select the Graph Power option. You will see a graph like the one presented in Figure 2.2. As you can see, the power of the study increases as effect size and sample size increase. How many subjects should the psychologist use?

Figure 2.2: The power graph produced by the Power Calculator. The parameters are α = .01 for a two-tailed test.

Obviously, using 500 subjects will provide good power, but at what cost? Testing this many subjects will be expensive and time consuming. If the effect size is large, then the psychologist will actually be wasting time and money testing this many subjects.
In essence, the researcher will be committing statistical overkill. Therefore, the researcher will need to strike a balance between the competing needs of conducting cost-effective research and obtaining useful results. One benchmark is to set power to .80. Using this standard, we can use the graph to estimate the optimal sample size across the range of effect sizes. The researcher may estimate that the effect size is d1 = .40. From the graph, it appears that a sample size between 70 and 80 will produce the desired power. For a more accurate estimate, we can return to the previous screen. Therefore, exit the graph and return to the calculation screen. Increase the sample size to 75 and select the calculate option. You will see a screen similar to the one presented in Figure 2.3. As you can see, 75 subjects will afford power of approximately .80 when d1 = .40 and α = .01, two-tailed.

Figure 2.3: The power table produced by the Power Calculator. The parameters are: N = 75, α = .01 for a two-tailed test.

CHAPTER 3: POWER ~ TWO SAMPLE INDEPENDENT T-RATIO

Introduction

The most common use of the t-ratio is to compare the means of two groups. In its typical application, the data may be discrete or continuous and represent interval or ratio data. The goal of the t-ratio is to determine whether the difference between the two means represents chance factors or a meaningful difference. In addition to hypothesis testing, the t-ratio allows us to determine the relation between the independent and dependent variables, the effect size of the statistic. There are two essential forms of the t-ratio. The first is the independent groups t-ratio, which is described in this chapter. The second is the dependent groups t-ratio, which is described in the next chapter. The difference between these two t-ratios depends upon how subjects are assigned to the two groups.
INDEPENDENT GROUPS t-RATIO

We use the independent groups t-ratio whenever we directly compare two freestanding groups of subjects. Thus, the independent groups t-ratio can be used for either a true experiment or an intact groups design. In the true experiment, the researcher assigns subjects to one of two groups, such as a control or an experimental condition. For the intact groups design, the researcher randomly selects subjects from two different and preexisting populations for comparison. The essential element of the independent groups t-ratio is that the behavior of subjects in one group has no effect on subjects in the other group. Similarly, the selection of subjects for one condition is unrelated to, or independent of, the selection of subjects for the alternate condition.

DEPENDENT GROUPS t-RATIO

Researchers use the dependent groups t-ratio when the two sets of data are in some way related to each other. For example, many researchers use a matched groups design to increase the power of the experiment. In this type of experiment, the researcher identifies a significant subject variable that is related to the purpose of the research. The researcher then uses this subject variable to assign subjects to the treatment conditions. The goal of a matched groups design is to equate the groups before beginning the experiment. Another form of dependent groups design is to measure the same subjects under different treatment conditions. Such a design is called a repeated measures design. For example, in a study of forgetting, a psychologist may test the subjects' memory of specific material over the course of several days. In the repeated measures design, the researcher tests the same subjects under different levels of the independent variable. The essential element in the dependent groups design is that subjects are assigned to the treatment conditions in a predetermined manner.
The advantage of the dependent groups design is that it tends to increase power.

STATISTICAL ISSUES RELATED TO THE INDEPENDENT GROUPS t-RATIO

Equation 3.1 is a common version of the t-ratio for independent groups.

t = [(X̄1 − X̄2) − (μ1 − μ2)] / √{ [(ΣX1² − (ΣX1)²/n1) + (ΣX2² − (ΣX2)²/n2)] / (n1 + n2 − 2) × (1/n1 + 1/n2) }    (3.1)

Equation 3.2 is another version of the same equation. Notice that the denominator uses the variances of the two groups.

t = [(X̄1 − X̄2) − (μ1 − μ2)] / √( s²X̄1 + s²X̄2 )    (3.2)

Many statistics textbooks do not report the full numerator as I have done in Equations 3.1 and 3.2. As you can see, you can include the estimated difference between the population means. Because most researchers assume that the means are equal (e.g., μ1 − μ2 = 0), this term is removed from the equation as superfluous. There may be conditions where the researcher knows that there is a difference between the populations and wishes to determine if the data from an experiment exceed that difference. For example, the researcher may have reason to assume that μ1 = 10.0 and μ2 = 5.0, therefore μ1 − μ2 = 5.0. The researcher may wish to determine if the difference between the two sample means is greater than 5.0.

The denominator for this equation is the estimated standard error of the difference between means. As you can see, the denominator combines the sum of squares for the two sets of data to form the estimate. Statisticians call this process pooling because the equation combines the estimated variances of the two groups into a single estimate.

The magnitude of the independent groups t-ratio depends upon several general factors. First, the difference between the means directly affects the value of the t-ratio. As the difference between the two means increases, the absolute value of the t-ratio also increases. When planning a research project, it is important to ensure that the two groups are as different from each other as possible. For a true experiment, one should select an independent variable that maximally influences the data.
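The pooled-variance t-ratio of Equation 3.1 can be sketched in a few lines of code. This is a minimal illustration with a hypothetical function name, computed under the usual assumption that μ1 − μ2 = 0 unless a different population difference is supplied:

```python
import math

def independent_t(x1, x2, mu_diff=0.0):
    """Pooled-variance independent-groups t-ratio (Equation 3.1 style).

    mu_diff is the hypothesized population difference mu1 - mu2
    (usually 0)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    # Sum of squares for each group: sum(X^2) - (sum X)^2 / n
    ss1 = sum(v * v for v in x1) - sum(x1) ** 2 / n1
    ss2 = sum(v * v for v in x2) - sum(x2) ** 2 / n2
    pooled = (ss1 + ss2) / (n1 + n2 - 2)            # pooled variance estimate
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))  # std error of the difference
    return ((m1 - m2) - mu_diff) / se
```

For example, `independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])` pools two equal sums of squares and returns a t-ratio of -2.0.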
For the intact groups design study, one should select from populations that are clearly defined and substantively different from each other. Another factor that influences the magnitude of the independent groups t-ratio is the amount of variability within the groups. With all else being equal, the less intersubject variability, the greater the t-ratio and the power of the statistic. Again, reducing the factors that affect intersubject variability can increase the power of the statistic. Finally, all else being equal, the sample size influences the size of the independent groups t-ratio and the power of the statistic. As a generality, larger sample sizes decrease the size of the standard error of the difference between means. Therefore, increasing sample size will increase power.

EFFECT SIZE

The effect size of the independent groups t-ratio is defined as:

d2 = (X̄1 − X̄2) / ŝ    (3.3)

The numerator contains the sample means, X̄1 and X̄2. The denominator of the equation represents an estimate of the common intersubject variability, or sampling error. We estimate ŝ using the simple equation

ŝ = (ŝ1 + ŝ2) / 2    (3.4)

Equation 3.4 is valid only when the two sample sizes are the same. The absolute value of d2 can range between 0 and ∞. For practical purposes, however, most statisticians limit themselves to discussing effect sizes that range between 0 and 2. As noted in the previous chapter, Cohen (1988) listed these benchmarks as general guidelines.

0.00 ≤ d2 < 0.20: No to Little Effect
0.20 ≤ d2 < 0.50: Little Effect to Moderate Effect
0.50 ≤ d2 < 0.80: Moderate Effect to Large Effect
0.80 ≤ d2: Large Effect

ASSUMPTIONS FOR t-RATIO AND POWER

The accuracy of the t-ratio is dependent upon meeting several mathematical assumptions. The independent groups t-ratio requires independence of groups, normally distributed data, and homogeneity of variance.
Although each of these assumptions is important, we will focus specifically on the assumption of homogeneity of variance. Because the denominator of the t-ratio relies upon the variance of each group, a great difference between the variances will compromise the accuracy of the t-ratio. The accuracy of the test is greatly compromised when the variances are radically different from each other and the sample sizes are not equal.

A quick way to determine whether two variances are equal is to conduct a simple test called the Fmax test. The Fmax test is the larger variance divided by the smaller variance.

Fmax = ŝ²larger / ŝ²smaller    (3.5)

The degrees of freedom correspond to the sample sizes for the two groups: the sample size represented in the numerator determines the first degrees of freedom, and the sample size represented in the denominator determines the second degrees of freedom. The F-ratio is then tested against a table of F-ratios for α/2. This table represents the upper and lower critical values of the F-distribution. If the F-ratio exceeds the critical value, the variances cannot be considered equal. If the variances are not equal, then the t-ratio may not provide an accurate estimate of the statistic. A powerful alternative to use in these conditions is the t-ratio developed by Welch (1936, 1938, 1947, 1951). We will consider this statistic at the end of this chapter. For this program we will assume that the variances can be considered equivalent.

If the variances are not equal, then you must use an averaged denominator to estimate d2. The following equation shows how to average the variances of the groups for estimating the effect size.

d2 = (X̄1 − X̄2) / √[(ŝ1² + ŝ2²) / 2]    (3.6)

If you use this estimate of d2, the sample sizes must be equal. If the sample variances are not equal and the sample sizes are not equal, then power cannot be accurately estimated.
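The Fmax check (Equation 3.5) and the averaged-denominator effect size (Equation 3.6) are both one-liners. A minimal sketch, with hypothetical function names:

```python
import math

def f_max(var1, var2):
    """Fmax ratio: larger sample variance over the smaller (Equation 3.5).
    Compare the result against a table of critical F values for alpha/2."""
    return max(var1, var2) / min(var1, var2)

def d2_averaged(mean1, mean2, var1, var2):
    """Effect size with averaged variances (Equation 3.6).
    Valid only when the two sample sizes are equal."""
    return (mean1 - mean2) / math.sqrt((var1 + var2) / 2.0)
```

For example, variances of 4 and 16 yield Fmax = 4.0, and means of 10 and 5 with equal variances of 4 yield d2 = 2.5.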
ESTIMATING POWER

We can use Power Calculator to calculate the power of the independent groups t-ratio for various sample sizes, α-levels, and directionality of the test. When you select this option, you will see a screen similar to the one presented in Figure 3.1. As you can see, you can change several parameters of the statistic. Let's look at each of these in turn.

Figure 3.1: The initial screen for the program to calculate the power of a two-sample t-ratio.

Alpha Level This option allows you to vary the α-level you plan to use for your research. Although the default value is set as α = 0.05, you can increase or decrease the value of α. You can enter values between .50 and .00001.

Number of Tails This option allows you to toggle between a one- and two-tailed test. Remember that when you use a two-tailed test you divide α between the two extremes of the sampling distribution.

Degrees of Freedom and Sample Sizes The program allows you to set the sample size of each group; it will then determine the degrees of freedom. You can enter sample sizes as small as 5 and as large as 9999. Recall that the degrees of freedom for the independent groups t-ratio is determined by (N1 − 1) + (N2 − 1). If the sample sizes are not equal, the program calculates a common sample size using:

Ñ = 2N1N2 / (N1 + N2)    (3.7)

If the sample sizes are not equal, the sample variances must be equal in order to create accurate power estimates!

COMPUTE This function causes the program to create a power table for the parameters you have entered. The effect sizes for the table will range between 0 and 1.80.

GRAPH POWER This option draws a graph of the relation between sample size, effect size, and power for the α-level and type of hypothesis you are using. The graph option is useful when you want a quick estimate of the sample size your study may require.
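The common sample size of Equation 3.7 is simply the harmonic mean of the two group sizes. A one-line sketch (the function name is illustrative):

```python
def common_n(n1, n2):
    """Harmonic-mean common sample size (Equation 3.7):
    N-tilde = 2 * N1 * N2 / (N1 + N2)."""
    return 2.0 * n1 * n2 / (n1 + n2)
```

Note that the harmonic mean is pulled toward the smaller group: `common_n(10, 40)` gives 16.0, not the arithmetic mean of 25, which reflects how much the smaller group limits power.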
HELP The help function calls a help screen that should provide you with general information that will help you understand the features of this computational option. The help screens are a greatly abridged version of this manual.

PRINT Once the program computes the power table, you can click on this button to have the program print a version of the table to the printer.

EXIT This function returns you to the Main Menu.

EXAMPLES OF POWER ESTIMATION

EXAMPLE 1 Assume that an experimental psychologist wants to examine two different conditions that are thought to evoke altruistic behavior. The researcher will randomly assign some subjects to a control group and the other subjects to an experimental group. Based on previous research, the psychologist believes that the effect size for this research will be moderate at best. Therefore, the researcher decides to set d2 = 0.40 and use a conventional α-level of α = .05 and a two-tailed test.

Figure 3.2: A graph of the power curves for a two sample t-ratio where α = .05, two-tailed.

Using these parameters, have the computer create a graph of the data. Looking at this graph, we can see that there should be approximately 100 subjects in each group to achieve a power of .80. In other words, the researcher will require 200 subjects randomly assigned to one of the two groups in order to find a significant result 80% of the time.

Figure 3.3: A graph of the power curves for a two sample t-ratio where α = .05, one-tailed.

What would happen to the power of the statistic if the researcher decided to convert the test from a two-tailed to a one-tailed test? Such a decision may be made in the planning phase of the experiment if the researcher can provide a clear rationale for such a test. Following the same steps, we find that 75 subjects in each group (150 subjects, total) are needed for the study.
Figure 3.3 presents the power curves for the one-tailed test, and Figure 3.4 presents the power estimates for n = 75. Therefore, changing from a two-tailed test to a one-tailed test allows the researcher to use fewer subjects to obtain the same level of power. This tactic has risks, however. Using a one-tailed test requires that the sign of the t-ratio match the predicted outcome. If the relation between the means is opposite to the predicted relation, then the null hypothesis cannot be rejected.

Figure 3.4: A table of power estimates for a two sample t-ratio where N = 150, α = .05, one-tailed.

EXAMPLE 2 A psychologist wants to conduct a study examining the effects of reinforcement on the speed with which a behavior is learned. Two groups of subjects will learn a multi-step task. One group of subjects will receive no extrinsic rewards for completing the task. The other subjects will be paid $5.00 each time they complete the task correctly. The researcher believes that d2 = 0.20 and wishes to use an α-level of α = .05, two-tailed. Look at Figure 3.2 to estimate the number of subjects required for adequate power. Given these conditions, the researcher should use 400 subjects in each group to have power close to .80. This prediction is confirmed by using the power calculator. Figure 3.5 presents these computations.

Figure 3.5: A table of power estimates for a two sample t-ratio where N = 800, α = .05, two-tailed.

t-RATIO WHEN VARIANCES ARE NOT EQUAL

There are many cases when the variances for the two samples will be unequal. Although the t-ratio tends to be robust against violations of the homogeneity of variance assumption, the test will produce spurious results when the difference between the variances is large and when the sample sizes are unequal. The problem of unequal variances has been recognized for quite some time and is known as the Behrens-Fisher problem.
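Welch's alternative t-ratio and adjusted degrees of freedom, presented next as Equations 3.8 and 3.9, can be sketched as follows. Note the hedge: this sketch uses the widely taught Welch-Satterthwaite degrees-of-freedom form, which may differ slightly in its small-sample correction from the exact formula Welch published; the function name is illustrative.

```python
import math

def welch_t(x1, x2):
    """Welch t-ratio and approximate degrees of freedom for two
    groups with unequal variances. Returns (t, df)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((v - m1) ** 2 for v in x1)   # sum of squares, group 1
    ss2 = sum((v - m2) ** 2 for v in x2)
    v1 = ss1 / (n1 * (n1 - 1))             # SS1 / (n1(n1-1)) = s1^2 / n1
    v2 = ss2 / (n2 * (n2 - 1))
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

When the variances and sample sizes are equal, the Welch t-ratio matches the pooled t-ratio and the degrees of freedom recover n1 + n2 − 2, so nothing is lost by using it in the well-behaved case.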
Welch (1936, 1938, 1947, 1951) provided a solution to the problem when he devised an alternative form for calculating the t-ratio and its degrees of freedom.

t̂ = (X̄1 − X̄2) / √[ SS1/(n1(n1 − 1)) + SS2/(n2(n2 − 1)) ]    (3.8)

df′ = [ SS1/(n1(n1 − 1)) + SS2/(n2(n2 − 1)) ]² / { [SS1/(n1(n1 − 1))]²/(n1 − 1) + [SS2/(n2(n2 − 1))]²/(n2 − 1) }    (3.9)

In these equations, SS represents the sum of squares for each group and n represents the number of subjects in each group. Although the equations do require considerable computational effort, one is rewarded with a parametric test that is powerful and robust. Indeed, there are several advantages to using these equations when the homogeneity assumption cannot be met. First, the Welch t-ratio uses the same sampling distribution as the conventional Student's t-ratio. Second, Kohr and Games (1974), Scheffé (1970), Wang (1971), and Zimmerman and Zumbo (1993) demonstrated that the Welch t-ratio has favorable features for protecting against Type I and Type II errors when sample variances and sizes are not equal. Therefore, one may apply power estimates generated by this program to the Welch version of the t-ratio.

CHAPTER 4: POWER ~ TWO SAMPLE DEPENDENT T-RATIO

Introduction

The essential difference between the independent and dependent groups t-ratio is the manner by which the researcher assigns subjects to the groups. For the independent groups test, we assume that the researcher randomly assigns subjects or that the subjects are members of preexisting groups. By contrast, for the dependent groups t-ratio, the researcher purposefully assigns subjects to each of the groups. There are two general experimental procedures where a researcher will use a dependent groups t-ratio. The first is the matched group design; the second is the repeated measures design.
For the matched group design, the researcher evaluates the subjects on some basic characteristic and then rank orders the subjects based on their scores. Next, the researcher randomly assigns the highest scoring subject to one group and the next subject to the second group. The researcher repeats this procedure until all subjects are assigned to the two groups. Using the matched groups design, the researcher can increase the equivalence of the groups before the experiment begins. The other form of dependent groups design is the repeated measures design. For this design, the researcher tests the same subject under more than one condition. Stated simply, the researcher tests the same subject under both the control and experimental conditions. Consequently, the subjects serve as their own control condition. The advantage of using a dependent groups design is that the systematic variance among subjects can be estimated and statistically removed from the denominator of the t-ratio. The dependent groups t-ratio is

t = (X̄1 − X̄2) / √( s²X̄1 + s²X̄2 − 2r12 sX̄1 sX̄2 )    (4.1)

As you can see in the denominator, the size of the standard error of the difference between means is reduced by the correlation between the two groups. As a generality, a dependent groups design is more powerful than the equivalent independent groups design if there is a correlation between the treatment conditions. There are some interesting points about the difference between the independent- and dependent-groups designs that require additional attention. The first issue to consider is the degrees of freedom for a dependent groups design. Because the subjects' scores are treated as pairs, the degrees of freedom are the number of pairs less one (n − 1, not n1 + n2 − 2). Therefore, the dependent groups t-ratio will have fewer degrees of freedom than the comparable independent groups design.
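Equation 4.1 can be computed equivalently from the difference scores of the pairs, which is often the easiest way to implement it; the sketch below (hypothetical function names) also includes the d3 effect-size conversion of Equation 4.2, discussed later in this chapter.

```python
import math

def dependent_t(x1, x2):
    """Dependent-groups t-ratio computed from the paired difference
    scores, which is algebraically equivalent to Equation 4.1.
    Returns (t, df) with df = number of pairs - 1."""
    assert len(x1) == len(x2), "paired data must have equal length"
    n = len(x1)
    diffs = [a - b for a, b in zip(x1, x2)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(var_d / n)      # standard error of the mean difference
    return mean_d / se, n - 1

def d3_from_d2(d2, r12):
    """Convert the independent-groups effect size d2 into d3
    (Equation 4.2): d3 = d2 / sqrt(1 - r12)."""
    return d2 / math.sqrt(1.0 - r12)
```

For example, `d3_from_d2(0.20, 0.70)` returns roughly .3651, the value worked out in Example 1 of this chapter.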
For example, if a researcher used 20 subjects in an independent groups design, the degrees of freedom would be 18 = (10 − 1) + (10 − 1). By contrast, the dependent groups design treats the 20 scores as 10 pairs of data. Therefore, the degrees of freedom will be 9 = (10 − 1). Although the degrees of freedom are smaller for the dependent groups design, the power will not necessarily be smaller. Indeed, the power of the dependent groups design will be equal to or greater than the power of an equivalent independent groups design. The reason the power does not decrease has to do with the assumptions of the dependent groups test. In the independent groups design there are two means, each representing a separate population. Consequently, each population contributes its own sampling error. In the dependent groups design there is only one mean, which represents the differences between the paired scores. Thus there is a single source of variation. The dependent groups test has half the amount of variance. If there is no correlation between the two groups, the power of the dependent groups t-ratio will be no greater than that of the independent groups t-ratio. Therefore, researchers using a matched groups design should select a matching procedure that is relevant to the study. When there is a correlation between the groups, the consequence on power can be dramatic. Consider the graph presented in Figure 4.1, which represents the increase in power as the correlation between groups varies between 0 and 1.0. This graph was generated using the Comparison of ts option in the Power Calculator program. The lines represent effect sizes ranging between 0.1 and 1.4. As you can see, when the correlation between groups is 0, the effect size of the dependent groups design is no greater than that of the independent groups design. However, as the size of the correlation between the groups increases, effect size increases.
The increase in power is especially dramatic for larger effect sizes. Consider, for example, a moderate effect size of d = .40. An independent groups t-ratio will produce extremely low power, approximately 1 − β = .12. If a matched groups design is used, and the correlation between the groups is .90, the power jumps to approximately 1 − β = .58.

Figure 4.1: A graphic illustration of the relation between power, effect size, and the correlation between two groups in a dependent groups t-ratio. When the correlation between the groups is 0, the power of the dependent groups t-ratio is equal to that of the equivalent independent groups design. As the size of the correlation increases, the power of the test increases.

ESTIMATING POWER

To estimate the power of a dependent groups t-ratio, you will need to conduct an intermediate statistical calculation. You will be converting the conventional measure of d, used for the independent groups test, to a form of d that reflects the correlation between the groups. This conversion is easy to perform using the following equation.

d3 = (X̄1 − X̄2) / (ŝ √(1 − r12))    (4.2)

Note that d3 is simply our measure of effect size for the independent groups test, d2, divided by the correction factor √(1 − r12) created by the correlation between the two groups, r12. Once you convert the effect size, you can use the procedure described in the previous chapter to examine the power of the statistic.

EXAMPLES OF POWER ESTIMATION

EXAMPLE 1 A researcher who specializes in the study of math education wishes to examine the effectiveness of a new mathematics education program for middle school students. The researcher plans to conduct a preliminary research program that compares the new procedure with a conventional mathematics curriculum. The researcher has access to 120 students who are available for the research project. We will assume that the effect size is small, thus d2 = .2.
In addition, we will assume that α = .05, two-tailed. If the researcher were to use an independent groups design, 60 subjects would be randomly assigned to each group (120 = 60 + 60). Under these conditions, the power would be 1 − β = .18. Because middle school mathematics requires students to solve word problems, the researcher decides to use a matched groups design where students are matched based on a combined mathematics and verbal skills achievement test. If the correlation between the groups is moderate, r = .70, we can re-estimate the power of the dependent groups design. Specifically,

d3 = .20 / √(1 − .70) = .20 / √.30 = .20 / .5477 = .3651

We can round d3 to .35 and leave α = .05, two-tailed, and N remains 60 for each group. Using a matched groups design with these conditions will produce a power of 1 − β = .46. Therefore, the power increased by 28 percentage points (.28 = .46 − .18), a 64% increase in power.

CHAPTER 5: POWER ~ PEARSON CORRELATION COEFFICIENT

INTRODUCTION

Sir Francis Galton, the famous British hereditarian, first conceived the concept of correlation. It was the mathematician Karl Pearson, however, who established the descriptive statistic that we currently recognize as the correlation coefficient or, more formally, as the Pearson Product Moment Correlation Coefficient. The definitional equation for the correlation coefficient is:

rXY = Σ(zX zY) / N    (5.1)

The correlation coefficient is an index of the relatedness between the two variables. Perfect correlations are represented by r = -1.00 and r = 1.00. A correlation of 0 indicates no linear* or systematic relation between the variables. When the correlation coefficient is squared, we calculate the coefficient of determination, r2. Specifically, the coefficient of determination indicates the proportion of variance in one variable that is shared with the other variable. There are many ways to interpret the correlation coefficient. One is to examine the size of r and r2.
Cohen (1988) suggested that the magnitude of the correlation can be divided into four major categories.

0.0 ≤ r < 0.10: No to Little Effect
0.1 ≤ r < 0.30: Little Effect to Moderate Effect
0.3 ≤ r < 0.50: Moderate Effect to Large Effect
0.5 ≤ r: Large Effect

The correlation coefficient can also be subjected to hypothesis testing. A common hypothesis to test is whether the correlation for the population is significantly different from 0. This test is accomplished by converting the correlation coefficient to a t-ratio.

t = rXY √[ df / (1 − r²XY) ]    (5.2)

where rXY is the correlation coefficient and df = n − 2. Because of the nature of the t-ratio, we can plan to conduct either directional or nondirectional tests of the null hypothesis.

* The correlation coefficient assumes that the relation between the two variables is linear. Therefore, it is possible that a non-linear relation may exist between the two variables and that the correlation coefficient will be close to or equal to 0.

ESTIMATING POWER

We can use Power Calculator to calculate the power of the correlation coefficient for various sample sizes, α-levels, and directionality of the test. When you select this option, you will see that you can change several parameters of the statistic. Let's look at each of these in turn.

Alpha Level This option allows you to vary the α-level you plan to use for your research. Although the default value is set as α = 0.05, you can increase or decrease the value of α. You can enter values between .50 and .00001.

Number of Tails This option allows you to toggle between a one- and two-tailed test. Remember that when you use a two-tailed test you divide α between the two extremes of the sampling distribution.

Sample Size You can enter sample sizes as small as 5 and as large as 9999. Recall that the degrees of freedom are determined by n − 2.
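The r-to-t conversion of Equation 5.2 is straightforward to code. A minimal sketch with a hypothetical function name:

```python
import math

def r_to_t(r, n):
    """Convert a correlation coefficient to a t-ratio (Equation 5.2).
    Returns (t, df) with df = n - 2."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, df
```

For example, a large correlation of r = .70 with only 20 subjects already yields a t-ratio above 4 on 18 degrees of freedom, which is why the worked example later in this chapter needs so few subjects.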
COMPUTE This function causes the program to create a power table for the parameters you have entered. The correlations in the table will range between 0 and 1.00. Because the correlation and t-ratio are symmetrical for this hypothesis, the information presented in the table applies equally to negative correlation coefficients.

GRAPH POWER This alternative offers a graph of the relation between sample size, effect size, and power for the α-level and type of hypothesis you are using. The graph option is useful when you want a quick estimate of the sample size your study may require.

HELP The help function calls a help screen that should provide you with general information that will help you understand the features of this computational option. The help screens are a greatly abridged version of this manual.

PRINT Once the program computes the power table, you can click on this button to have the program print a version of the table to the printer.

EXIT This function returns you to the Main Menu.

EXAMPLES OF POWER ESTIMATION

A psychologist who studies personality created a new personality inventory that measures extroversion and introversion. In order to determine the validity of the new test, the psychologist decides to compare the new inventory to a commonly used measure of extroversion. How many subjects will the psychologist need in order to find a significant correlation between the two personality measures? Based on previous research, the psychologist believes that extroversion is a relatively easy construct to measure and that the effect size is large, r = .70. Using α = .05, two-tailed, we can have the computer create a graph of the relation between effect size, sample size, and power.

Figure 5.1: The power graph produced by the Power Calculator. The parameters are: α = .05 for a two-tailed test.
With an extremely large effect size, we can see that the researcher will not require many subjects to detect the effect. Indeed, with 20 subjects, the power is 1 − β = .95.

Figure 5.2: The power table produced by the Power Calculator. The parameters are: α = .05 for a two-tailed test.

A health psychologist wants to determine if there is a correlation between a person's environmental stress and physical health. To test this correlation, the psychologist develops an environmental stress inventory that asks people to describe the frequency of stressful events in their lives (e.g., death of a family member, a promotion at work, or buying a new car). The participants will also complete a questionnaire about their health (e.g., blood pressure, number of times ill, and a general appraisal of their health). The psychologist believes that the effect size will be small, r = .20. Because the correlation is small, the researcher will require a larger sample in order to detect the effect. As you can see in the next figure, the researcher will require approximately 200 subjects in order to detect the effect.

Figure 5.4: The power graph produced by the Power Calculator. The parameters are: α = .01 for a two-tailed test.

CHAPTER 6: POWER ~ DIFFERENCE BETWEEN CORRELATIONS: ρ1 = ρ2

INTRODUCTION

In the previous chapter we examined the method for determining the power of the statistical test that determines whether or not a correlation coefficient equals 0. In this chapter we will examine a different test of the correlation: whether or not two correlations equal each other. When examining the hypothesis ρ = 0, we first convert the correlation to a t-ratio and then test the size of the t-ratio. To determine whether ρ1 = ρ2, we must convert the correlation coefficients to z-scores and then compare the difference between the z-scores.
The first step in comparing two correlations is to convert the correlations to z-scores using Fisher's z-transformation:

zr = .5[loge(1 + r) − loge(1 − r)]    (6.1)

The Power Calculator program contains a routine that will print a table of r-to-z transformations. Once the correlations are transformed, we can proceed with a test of their difference using the equation:

z = (z1 − z2) / √[ 1/(N1 − 3) + 1/(N2 − 3) ]    (6.2)

The effect size for the difference between correlations is q, which is determined as

q = z1 − z2    (6.3)

for directional tests and as

q = |z1 − z2|    (6.4)

for nondirectional tests.

0.0 ≤ q < 0.10: No to Little Effect
0.1 ≤ q < 0.30: Little Effect to Moderate Effect
0.3 ≤ q < 0.50: Moderate Effect to Large Effect
0.5 ≤ q: Large Effect

ESTIMATING POWER

We can use Power Calculator to calculate the power of the difference between two independent correlation coefficients for various sample sizes, α-levels, and directionality of the test. When you select this option, you will see that you can change several parameters of the statistic. Let's look at each of these in turn.

Alpha Level This option allows you to vary the α-level you plan to use for your research. Although the default value is set as α = 0.05, you can increase or decrease the value of α. You can enter values between .50 and .00001.

Number of Tails This option allows you to toggle between a one- and two-tailed test. Remember that when you use a two-tailed test you divide α between the two extremes of the sampling distribution.

Sample Sizes You can enter sample sizes as small as 5 and as large as 9999. When comparing unequal sample sizes, the program uses the following equation to create a balanced sample size estimate:

n′ = 2(n1 − 3)(n2 − 3) / (n1 + n2 − 6) + 3    (6.5)

COMPUTE This function causes the program to create a power table for the parameters you have entered.

GRAPH POWER This alternative offers a graph of the relation between sample size, effect size, and power for the α-level and type of hypothesis you are using.
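Before turning to the remaining options, the full chain from Equations 6.1 through 6.5 can be sketched in a few short functions. This is an illustrative sketch with hypothetical function names, not the program's internal routine:

```python
import math

def fisher_z(r):
    """Fisher r-to-z transformation (Equation 6.1)."""
    return 0.5 * (math.log(1.0 + r) - math.log(1.0 - r))

def correlation_difference_z(r1, n1, r2, n2):
    """z-test statistic for H0: rho1 = rho2 (Equation 6.2)."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

def effect_size_q(r1, r2):
    """Nondirectional effect size q = |z1 - z2| (Equation 6.4)."""
    return abs(fisher_z(r1) - fisher_z(r2))

def balanced_n(n1, n2):
    """Balanced sample-size estimate for unequal groups (Equation 6.5)."""
    return 2.0 * (n1 - 3) * (n2 - 3) / (n1 + n2 - 6) + 3.0
```

Note that q is measured on the z-transformed scale, so equal differences in raw r correspond to larger values of q as the correlations approach 1.0; for instance, correlations of .50 and .30 give q of roughly .24, a small-to-moderate effect by the benchmarks above.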
The graph option is useful when you want a quick estimate of the sample size your study may require.

HELP
The help function calls a help screen that should provide you with general information that will help you understand the features of this computational option. The help screens are a greatly abridged version of this manual.

PRINT
Once the program computes the power table, you can click on this button to have the program print a version of the table to the printer.

EXIT
This function returns you to the Main Menu.

EXAMPLES OF POWER ESTIMATION

A researcher wishes to examine the correlations among different variables. One test will be to determine whether two correlations are equivalent. The researcher believes that the effect is moderate to large; therefore she sets q = .40. She wants to know the power of her statistical test if she uses 100 pairs for each correlation. According to the following table, the power of the test is 1 - β = .80.

Figure 6.1: The power graph produced by the Power Calculator. The parameters are: α = .05 for a two-tailed test.

CHAPTER 7: POWER ~ MULTIPLE REGRESSION

INTRODUCTION

The process of multiple regression is a logical extension of simple linear regression. Specifically, the researcher hopes to use two or more variables to predict a criterion. The goal of multiple regression is, therefore, to offer a better model or method of predicting the criterion. Although there are many ways to characterize the size of the effect, the more common form is R2. For the simple linear regression, r2 is an estimate of effect size as it indicates the proportion of the criterion variance that is accounted for by the predictor variable. For the multiple regression, R2 estimates the proportion of the criterion variance that is predicted by the model.
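The interpretation of R2 as the proportion of criterion variance accounted for by the model can be illustrated with a short sketch. This is a hypothetical helper, not part of the Power Calculator; the data are invented for the example.

```python
# R-squared as 1 minus the ratio of residual to total sum of squares.
def r_squared(y, y_hat):
    """Proportion of criterion variance accounted for by the predictions."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return 1 - ss_residual / ss_total

y = [2.0, 4.0, 6.0, 8.0]          # observed criterion scores (toy data)
y_hat = [2.5, 3.5, 6.5, 7.5]      # predictions from some hypothetical model
print(r_squared(y, y_hat))        # -> 0.95
```

The closer the predictions track the observed scores, the closer R2 comes to 1.0; a model no better than the mean of the criterion yields R2 = 0.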
0.00 ≤ R2 < 0.02 : No to Little Effect
0.02 ≤ R2 < 0.13 : Little Effect to Moderate Effect
0.13 ≤ R2 < 0.26 : Moderate Effect to Large Effect
0.26 ≤ R2 : Large Effect

ESTIMATING POWER

We can use Power Calculator to calculate the power of the multiple linear regression with different numbers of predictors, α-levels, and sample sizes. When you select this option, you will see that you can change several parameters of the statistic. Let's look at each of these in turn.

Alpha Level
This option allows you to vary the α-level you plan to use for your research. Although the default value is set as α = 0.05, you can increase or decrease the value of α. You can enter values between .50 and .00001.

Number of Predictors: U
This variable determines the number of predictor variables you will use in the model.

Sample Size: N
You can enter sample sizes as small as 5 and as large as 9999.

COMPUTE
This function causes the program to create a power table for the parameters you have entered.

GRAPH POWER
This alternative offers a graph of the relation between sample size, effect size, and power for the α-level and type of hypothesis you are using. The graph option is useful when you want a quick estimate of the sample size your study may require.

HELP
The help function calls a help screen that should provide you with general information that will help you understand the features of this computational option. The help screens are a greatly abridged version of this manual.

PRINT
Once the program computes the power table, you can click on this button to have the program print a version of the table to the printer.

EXIT
This function returns you to the Main Menu.

EXAMPLES OF POWER ESTIMATION

A researcher wants to use two measures to predict an outcome. The researcher believes that the effect size is moderate to large. Therefore, R2 is approximately .20.
How many subjects should the researcher use in the study? Set U = 2 and then click the mouse over the GRAPH POWER button. According to Figure 7.1, the researcher will need approximately 50 subjects to find the effect.

Figure 7.1: The power graph produced by the Power Calculator. The parameters are: α = .05 for a two-tailed test.

When you return to the computational format, you can test the accuracy of the prediction. As you can see in Figure 7.2, if U = 2 and n = 50, the power is 1 - β = .78.

Figure 7.2: The power table produced by the Power Calculator. The parameters are: α = .05 for a two-tailed test.

CHAPTER 8: POWER ~ SIGN TEST AND P = .50

INTRODUCTION

There are many situations where the researcher wishes to determine a proportion in the population. Consider a political election. A candidate for an elected office may wish to know the proportion of registered voters who will offer their support. If there are two candidates, Smith and Jones, a pollster may want to determine the proportion of the voters who will vote for Smith rather than Jones. If Smith and Jones are perceived equally by the electorate, then P = .50. However, if Smith has an advantage over Jones, then P > .50. By contrast, if more voters favor Jones, then P < .50.

The effect size for the proportion test is g, which is determined as

g = P - .50    8.1

for directional tests, and as

g = |P - .50|    8.2

for nondirectional tests.

0.00 ≤ g < 0.05 : No to Little Effect
0.05 ≤ g < 0.15 : Little Effect to Moderate Effect
0.15 ≤ g < 0.25 : Moderate Effect to Large Effect
0.25 ≤ g : Large Effect

ESTIMATING POWER

We can use Power Calculator to calculate the power of the sign test of the hypothesis P = .50 for various sample sizes, α-levels, and directionality of the test. When you select this option, you will see a screen similar to the one presented in Figure X. As you can see, you can change several parameters of the statistic. Let's look at each of these in turn.
Alpha Level
This option allows you to vary the α-level you plan to use for your research. Although the default value is set as α = 0.05, you can increase or decrease the value of α. You can enter values between .50 and .00001.

Number of Tails
This option allows you to toggle between a one- and two-tailed test. Remember that when you use a two-tailed test you divide α between the two extremes of the sampling distribution.

Sample Size
You may vary either of these values; when you change one, the other is updated. You can enter sample sizes as small as 5 and as large as 9999.

COMPUTE
This function causes the program to create a power table for the parameters you have entered.

GRAPH POWER
This alternative offers a graph of the relation between sample size, effect size, and power for the α-level and type of hypothesis you are using. The graph option is useful when you want a quick estimate of the sample size your study may require.

HELP
The help function calls a help screen that should provide you with general information that will help you understand the features of this computational option. The help screens are a greatly abridged version of this manual.

PRINT
Once the program computes the power table, you can click on this button to have the program print a version of the table to the printer.

END
This function returns you to the Main Menu.

EXAMPLES OF POWER ESTIMATION

A researcher wants to conduct a survey of local voters. He believes that one party has a slight edge over the other in a specific county (e.g., P = .53); therefore, g = .03. How many subjects should he sample in order to detect the effect? The graph presented in Figure 8.1 suggests that 500 will not be enough if α = .05, two-tailed.

Figure 8.1: The power graph produced by the Power Calculator. The parameters are: α = .05 for a two-tailed test.

Figure 8.2 illustrates the power table when N = 2100.
As you can see, with this many subjects, we will have an 80% chance of detecting the difference if P = .53.

Figure 8.2: The power graph produced by the Power Calculator. The parameters are: α = .05 for a two-tailed test.

CHAPTER 9: POWER ~ DIFFERENCE BETWEEN PROPORTIONS: P1 = P2

INTRODUCTION

When sampling from different populations or comparing the results of two samples, one may want to determine whether the two proportions are equal or significantly different. The difference between proportions can be converted to a z-score using:

z = (P1 - P2) / √[p̄(1 - p̄)/n1 + p̄(1 - p̄)/n2]    9.1

where p̄ is

p̄ = (n1P1 + n2P2) / (n1 + n2)    9.2

In both equations, n represents the sample size and P represents the observed proportion. The first step is to convert the proportions to a common metric,

φ = 2 arcsin(√P)    9.3

With this transformation we can determine the effect size, h, using:

h = φ1 - φ2    9.4

for directional tests and

h = |φ1 - φ2|    9.5

for nondirectional tests. When the sample sizes used to determine the two proportions differ, we can create a harmonic sample size using:

n' = 2n1n2 / (n1 + n2)    9.6

ESTIMATING POWER

We can use Power Calculator to calculate the power of the test of the difference between two proportions for various sample sizes, α-levels, and directionality of the test. As you can see, you can change several parameters of the statistic. Let's look at each of these in turn.

Alpha Level
This option allows you to vary the α-level you plan to use for your research. Although the default value is set as α = 0.05, you can increase or decrease the value of α. You can enter values between .50 and .00001.

Number of Tails
This option allows you to toggle between a one- and two-tailed test. Remember that when you use a two-tailed test you divide α between the two extremes of the sampling distribution.

Sample Sizes
You may vary either of these values; when you change one, the computer updates the other sample size.
You can enter sample sizes as small as 5 and as large as 9999.

COMPUTE
This function causes the program to create a power table for the parameters you have entered. The power values in the table will range between 0 and 1.00.

GRAPH POWER
This alternative offers a graph of the relation between sample size, effect size, and power for the α-level and type of hypothesis you are using. The graph option is useful when you want a quick estimate of the sample size your study may require.

HELP
The help function calls a help screen that should provide you with general information that will help you understand the features of this computational option. The help screens are a greatly abridged version of this manual.

PRINT
Once the program computes the power table, you can click on this button to have the program print a version of the table to the printer.

EXIT
This function returns you to the Main Menu.

EXAMPLES OF POWER ESTIMATION

A researcher wishes to compare the proportion of women who support a candidate against the proportion of men who support the same candidate. The researcher believes that the difference between the proportions will produce a medium effect size (e.g., h = .50). Using the graph option (Figure 9.1), you can see that the researcher will need about 70 subjects in each group.

Figure 9.1: The power graph produced by the Power Calculator. The parameters are: α = .05 for a two-tailed test.

We can confirm this estimate by entering the appropriate sample sizes for n1 and n2. Using 70 men and 70 women will create a power of .84.

Figure 9.2: The power graph produced by the Power Calculator. The parameters are: α = .05 for a two-tailed test.

CHAPTER 10: POWER ~ ANALYSIS OF VARIANCE

INTRODUCTION

Much like the ubiquitous t-ratio, the Analysis of Variance (ANOVA) has become an indispensable statistical tool for the contemporary researcher.
Since its introduction in the 1920s by the great statistician Sir Ronald A. Fisher, the ANOVA has been cultivated in many disciplines. Lovie (1979), for example, documented the introduction of the ANOVA to psychology. In essence, the ANOVA freed researchers from the use of haphazard single-factor experiments whose results were analyzed by visual inspection, a hodgepodge of rudimentary descriptive statistics, and considerable guesswork and subjective appraisal. Crutchfield and Tolman, two of the first researchers to use the ANOVA, noted that:

In this paper we wish to indicate the unique significance of multiple-variable designs in the study of those areas of behavior where it is known or suspected that complex interaction of variables exist. ... According to [Tolman's] system, direct study of the isolated relationships between each of the independent variables or conditions, on the one hand, and the dependent variable (resultant behavior) on the other, is considered not feasible. Instead, certain mediate conceptual constructs (intervening variables) are developed, and these bridge the operational gap between the independent variables and the dependent variables. The underlying presupposition of this system is that the combination of variables is not simple and direct in nature, but is a complex synthesis of field-relations (1940, p. 39).

Crutchfield and Tolman recognized that the ANOVA afforded the opportunity to design experiments in which one could simultaneously examine the effects of several independent variables and their interaction upon behavior. This is an essential insight for any science where the phenomenon of interest is affected by many variables operating in a complex manner. By the 1940s, statisticians had developed many of the forms of ANOVA currently in use.
Reading any advanced text on the ANOVA will reveal that the statistic can be applied to univariate and multivariate procedures, randomized and fixed models, Latin square designs, and mixed models, to list but a few experimental models. With the advent of post hoc test procedures that control experimentwise error rates (e.g., Tukey's HSD or the Scheffé test), the ANOVA now offers the researcher a full complement of statistical tools.

The general logic of the ANOVA is quite simple. The total variance among all subjects is subdivided into identifiable components. In the simple single-factor design, the total variance is divided, or partitioned, into variance due to differences among treatment conditions and variance due to error. If the variance among groups is sufficiently larger than the variance within groups, one can reject the null hypothesis that the two variances are equal to each other.

A complete review of how the ANOVA partitions the total sum of squares is beyond the agenda of this text. Rather than attempt to repeat what is done well elsewhere, I will focus on how one can estimate the power of a specific experimental design. The following section is a greatly abridged review of the ANOVA test. This information will allow us to knowledgeably discuss the power of the statistic.

FOUNDATION OF THE ANOVA

There are many ways to write the null and alternative hypotheses for the F-ratio. One of the more common is to present the hypotheses as:

H0: μ1 = μ2 = μ3 = ... = μk    10.1

H1: Not H0    10.2

In this form of the hypothesis, all group means are said to be equivalent to each other. This hypothesis is satisfactory for simple main effects, but can be more cumbersome when examining the relation among means for an interaction. Consequently, I prefer to think of the null and alternative hypotheses in a different form.
Specifically, I note that:

H0: VarianceEffect = VarianceResidual    10.3

H1: VarianceEffect ≠ VarianceResidual    10.4

I am an advocate of this form of hypothesis testing for several reasons. First, this hypothesis reminds us that the F-ratio is a nondirectional test. The statistic merely indicates whether the variance among the means is within the range predicted by sampling error. The F-ratio does not indicate the location of significant differences among the means (unless we are comparing two groups). A second reason that I prefer this form of hypothesis is that the statements make clear what we are comparing in the statistic. The F-ratio is the ratio of the variance attributed to the effect (main effect or interaction) to the variance attributed to random effects. Therefore, the hypothesis can be readily applied to the analysis of main effects and interactions. Of course, the hypothesis can be written in a more elementary form:

H0: FEffect = 1    10.5

H1: FEffect ≠ 1    10.6

INTERPRETING THE F-RATIO

Although the purpose of the ANOVA is to determine whether the null hypothesis can be rejected, there are additional and important bits of information we can extract from the summary table. This information enhances our interpretation of the data and the experimental results. Besides the F-ratio, the ANOVA summary table offers two additional statistics that we need to use. These statistics are the (1) Measure of Association and (2) Measure of Effect Size. Let's consider each of these statistics in turn.

F-RATIO

The F-ratio is an inferential statistic that allows us to determine whether to reject the null hypothesis. If the size of F is sufficiently large, we can reject the null hypothesis at a specified α-level. With proper planning, we can design experiments that will yield data that have sufficient statistical power to reject the null hypothesis.
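The partitioning of the total variance described above can be sketched in a few lines of Python. This is an illustrative computation on toy data, not the Power Calculator's code.

```python
# Partition the total sum of squares of a single-factor design into a
# between-groups (effect) component and a within-groups (error) component,
# then form the F-ratio from the two mean squares.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # toy data: three treatment groups

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)

ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
F = (ss_between / df_between) / (ss_within / df_within)
print(F)  # -> 3.0 for this toy data
```

Note that SSBetween + SSWithin recovers the total sum of squares, which is the sense in which the total variability is "partitioned."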
Aside from determining whether or not to reject the null hypothesis, the F-ratio affords no other direct comparison or interpretation. Unfortunately, many who interpret the F-ratio mistakenly believe that the size of the statistic indicates the importance of the data, the likelihood that the results of the experiment can be replicated, or the relation between the independent and dependent variables. Such interpretations are incorrect. However, other statistical tools address these issues.

THE CORRELATION RATIO: η2

There are several ways to determine the relation between the independent and dependent variables. One of these is the correlation ratio, which is represented as η2. The η2 is determined by dividing the sum of squares for the effect by the total sum of squares. In mathematical terms,

η2 = σ2Effect / σ2Total, or as η2 = SSEffect / SSTotal    10.7

The numerator is the sum of squares for the effect and the denominator is the sum of squares for the total variance. We can interpret the correlation ratio in the same manner as r2, the coefficient of determination. Specifically, η2 will range between 0 and 1.0. Small values of η2 indicate that a small proportion of the total variance among the observations is due to the treatment effect(s) and that the majority of the variance is due to other factors such as error. Larger values of η2 indicate that a large portion of the differences among scores can be attributed to the treatment effect(s).

EFFECT SIZE: f

Another useful statistic is the effect size of the F-ratio. The effect size, f, is the ratio of the standard deviation attributed to the effect to the standard deviation attributed to error. Mathematically, we define f as

f = σEffect / σError    10.8

Of course, η2 and f are interrelated, as is indicated in the following equations:

f = √[η2 / (1 - η2)]    10.9

η2 = f2 / (1 + f2)    10.10

We can interpret f in the same way that we interpret d for the t-ratio. Indeed, the t-ratio is a special case of the F-ratio when there is one degree of freedom for the numerator of the F-ratio.
That is, t2(N - 2) = F(1, N - 2). Therefore, when there is one degree of freedom in the numerator of the F-ratio, d = 2f. Following Cohen's (1988) lead, we can characterize the magnitude of f as small, medium, and large.

0.00 ≤ f < 0.10 : No to Little Effect
0.10 ≤ f < 0.25 : Little Effect to Moderate Effect
0.25 ≤ f < 0.40 : Moderate Effect to Large Effect
0.40 ≤ f : Large Effect

We can now use this basic information to proceed with a purposeful study of the power analysis of the ANOVA.

SPECIAL ISSUES FOR POWER ESTIMATION FOR THE ANOVA

When conducting any empirical research, one must be concerned with the accuracy with which the dependent variable is measured. Measurement error refers to the fact that the measurement of the dependent variable is subject to random variation. This variation, in combination with variance created by intrasubject differences and other sources of random error, increases the error term of the ANOVA and degrades the power of the statistic. Therefore, increasing the accuracy with which measurement is made can increase power.

Intrasubject variability is another issue that influences the power of any statistic. As the average difference among subjects increases, the ability to detect differences among the groups decreases. Fortunately, researchers have access to a number of methodological and statistical procedures that reduce or statistically control for intrasubject variability. Among the more common procedures to be examined are blocking, within-subjects designs, and the analysis of covariance.

Measurement Error and Power

Measurement error reflects the reliability with which a construct is measured. Reliable tests produce consistent results; unreliable tests do not. When the reliability of the measurement technique is perfect, the correlation between two sets of measurements will be rXX = 1.00. A measurement procedure with no reliability will produce measurements that can be best described as a series of random numbers.
In such cases rXX = 0.00. Hopkins and Hopkins (1979) and Rogers and Hopkins (1988) demonstrated that the relation between effect size and measurement error can be expressed using the following equation:

fρYY = √ρYY × f1.0    10.11

In this equation, fρYY is the effect size given the reliability of the measure, ρYY is the estimate of test-retest reliability, and f1.0 is the effect size assuming perfect measurement.

Here is a simple example of how this statistic can be used. Assume that you are designing a study that involves a measure of academic achievement. According to published results, the test-retest reliability of the test is rYY = .64. You have reason to believe that the effect size of the study is moderate, f = .35. If your estimate of effect size did not include an estimate of measurement error, you will need to adjust your effect size measure as f = .28 = √.64 × .35. In essence, measurement error reduces power.

This drop in power can be overcome through several strategies. The first would be to increase the sample size to ensure sufficient power. Another alternative would be to increase the measurement accuracy. For example, the test could be lengthened to increase reliability, or you could add an additional test to increase measurement accuracy. The choice among these alternatives will depend upon the relative cost of each. Using longer and more complex tests may tax the patience of the participants and burden the budget of the study.

Factorial Designs and Power

The factorial design has many advantages. Among these is the ability to examine the interaction among two or more independent variables. Another advantage is that factorial designs are considered more cost effective than comparable single-factor experiments. In essence, the factorial design has the potential to make better use of subjects than more simplistic designs. Because of these two factors, the factorial ANOVA has the potential of increasing the power of a research project.
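The attenuation relation above (equation 10.11) reduces to a single multiplication, sketched here with the academic achievement example from the text; the function name is my own.

```python
# Attenuate an effect size for imperfect test-retest reliability:
# the observed effect size is the perfect-measurement effect size
# scaled by the square root of the reliability.
import math

def attenuated_f(f_perfect, reliability):
    """Effect size f adjusted for test-retest reliability (0 < reliability <= 1)."""
    return math.sqrt(reliability) * f_perfect

# The manual's example: f = .35 with reliability rYY = .64 gives f = .28.
print(attenuated_f(0.35, 0.64))
```

Because the adjustment is multiplicative, halving the reliability does not halve the effect size; the square root softens the penalty, but any reliability below 1.0 shrinks f and therefore shrinks power.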
ANALYSIS OF COVARIANCE AND POWER

Another method for controlling intrasubject variation is the analysis of covariance, or ANCOVA. The ANCOVA is a sophisticated statistical technique that systematically estimates intrasubject variability due to a specific subject variable. This variability is then removed from the general residual estimate, thus increasing the power of the measure. Rogers and Hopkins (1988) provided an estimate of the effect size when one uses an analysis of covariance. Specifically, they noted that

f' = √ρYY × f1.0 / √(1 - ρ2XY ρXX ρYY)    10.12

In the equation, ρ2XY represents the true-score correlation between the covariate and the dependent variable. The variables ρXX and ρYY represent the reliability of the measure of the covariate and the dependent variable, respectively. Other than this transformation, the same procedures for estimating the power of the conventional ANOVA can be used to estimate the power of the ANCOVA. In other words, the following examples may be used for either the ANOVA or the ANCOVA.

ANOVA VS ANCOVA

As a generality, the ANCOVA is more powerful than the ANOVA for the simple reason that the ANCOVA identifies an additional source of error variance that is then removed from the general error term. The consequence is a larger F-ratio. Although it is tempting to assume that the ANCOVA is always the superior research tool, there are instances where the ANOVA may be preferred. Maxwell, Cole, Arvey, and Salas (1991) demonstrated that there are clear instances where the ANOVA may be the preferred statistical tool. As they noted in their paper, the noncentrality parameter for the ANCOVA is:

λANCOVA = n Σαi2 / [a σ2W (1 - ρ2XY)]    10.13

where n is the sample size, αi is the deviation between the mean of a treatment condition and the grand mean, a is the number of levels for the factor, σ2W is the within-group variance for the dependent variable, and ρXY is the population correlation between the pre-test and post-test.
The noncentrality parameter for the ANOVA is:

λANOVA = n'k Σαi2 / {a σ2W [1 + (k - 1) ρYY]}    10.14

Here, n' is the sample size of the groups, ρYY is the test-retest reliability of the dependent variable measure, and k is the factor by which the test is lengthened or shortened.

There are several important aspects of these equations. First, the noncentrality parameter is central to the estimation of power. For example, we can write the equation in its most simple form as:

f = √(λ/n)    10.15

Using these basic mathematical principles, Maxwell et al. (1991) demonstrated that under special conditions,

λANCOVA = n Σαi2 / [a σ2W (1 - ρ2XY)] = λANOVA = n'k Σαi2 / {a σ2W [1 + (k - 1) ρYY]}

Specifically, one can adjust the length of the dependent measure test in the ANOVA to equal the power of the ANCOVA. Maxwell et al. demonstrated that when ρYY and ρXY are small, the ANOVA with a longer post-test will be as powerful as, and require fewer subjects than, the comparable ANCOVA. For larger values of ρYY and ρXY, the ANCOVA requires fewer subjects to achieve the same power.

ESTIMATING POWER

Before the computer can estimate the power of your statistic, it must know the specifics of the design you are studying. Specifically, you will need to enter the number of factors used in the study and the number of levels within each factor. It is essential to understand the terminology used in this manual to use the program effectively. Here are general definitions of the terms the program uses.

Between-Subjects Factors

When the program begins, you will be asked to enter the number of between-subjects factors that are used in the study. A factor is an independent variable. A between-subjects factor is an independent variable wherein the subjects are exposed to only one level of the variable.

Within-Subjects Factors

The program will also ask you to enter the number of within-subjects factors in the design. A within-subjects factor is an independent variable where subjects are exposed to all levels of the treatment condition.
In other words, if the same subject is measured under several treatment conditions or tested on several occasions, then the variable is a within-subjects factor.

Levels of a Factor

For the analysis of variance, each factor will have two or more levels. Each level represents a unique condition within that factor. Consider an experiment that examines the relation between drug dosage and response. The drug treatment represents a factor. Each dosage represents a level of the factor. If separate groups of subjects are randomly assigned to each dosage condition, then drug dosage is a between-subjects variable. If all subjects experience each drug dosage during the course of the experiment, then drug dosage is a within-subjects variable.

Sample Size

The sample size represents the number of subjects assigned to each treatment condition. The program will assume that you intend to assign equal numbers of subjects to all treatment conditions.

Once you have entered the relevant information, the program will create a type of ANOVA summary table. The table will identify each of the major terms in the ANOVA, its degrees of freedom, the adjusted sample size (N'), and the power of the effect for various effect sizes. Let's consider several examples as a way to illustrate the use of the program.

Example 1: One-Way ANOVA

A researcher wants to conduct a single-factor ANOVA with five levels of the factor. Subjects are to be randomly assigned to each of the five treatment conditions. Therefore, there is one between-subjects factor with five levels. We will assume that there are to be 5 subjects in each group and that α = .05. In response to the computer's questions we enter the following information:

Number of Between-Subjects Factors: 1
Levels of B-S Factor 1: 5
Number of Within-Subjects Factors: 0
Sample Size: 5
α: .05

The computer will generate a table similar to the one in Figure 10.1.
If you want to examine larger effect sizes, use the arrow pointing to the right. The arrow pointing to the left reveals smaller effect sizes.

Figure 10.1: An example of the output for the ANOVA power estimate. For this example the program estimated the power of a one-way ANOVA with five levels of the independent variable, five subjects in each group, and α = .05.

You can experiment with the power calculator by changing the sample size and α-level. After you change these values, activate the REDO option for a revised set of power estimates.

Example 2: One-Way ANOVA With Repeated Measures

A researcher is interested in how rapidly people will forget information and arranges to test subjects once a week for five weeks after they have memorized specific information. Because the researcher is testing the same subjects once a week, time is a within-subjects variable. Therefore, there is one within-subjects factor with five levels. We will assume that there are to be 5 subjects in the study and that α = .05. In response to the computer's questions we enter the following information:

Number of Between-Subjects Factors: 0
Number of Within-Subjects Factors: 1
Levels of W-S Factor 1: 5
Sample Size: 5
α: .05

The computer will generate a table similar to the one in Figure 10.2. If you want to examine larger effect sizes, use the arrow pointing to the right. The arrow pointing to the left reveals smaller effect sizes. Note that in the "Model" line the 5 is surrounded by brackets ( [5] ). The program does this to indicate that the factor is a within-subjects variable.

Figure 10.2: An example of the output for the ANOVA power estimate. For this example the program estimated the power of a one-way repeated-measures ANOVA with five levels of the independent variable, five subjects in each group, and α = .05.

You can experiment with the power calculator by changing the sample size and α-level.
After you change these values, activate the REDO option for a revised set of power estimates.

Example 3: Two-Way Factorial ANOVA

A developmental psychologist wants to study children's reactions to strangers. The researcher decides to study the interaction between the age of the child and the sex of the stranger. The first factor, sex, has two levels. The researcher decides to make this a between-subjects variable: half of the children meet a stranger who is male, the others meet a stranger who is female. The second factor is also a between-subjects variable because the children are tested once. The researcher decides to use four age groups representing children who are 4-8 months, 8-12 months, 12-16 months, and 16-24 months. Therefore, the researcher has a 2 × 4 factorial design.

Number of Between-Subjects Factors: 2
Levels of B-S Factor 1: 2
Levels of B-S Factor 2: 4
Number of Within-Subjects Factors: 0
Sample Size: 5
α: .05

Figure 10.3: An example of the output for the ANOVA power estimate. For this example the program estimated the power of a two-way ANOVA with two levels of the first factor, four levels of the second factor, five subjects in each group, and α = .05.

Example 4: Two-Way Factorial ANOVA With One Repeated Factor

Assume that a clinical researcher wants to examine the long-term effectiveness of three forms of psychotherapy across time. First, subjects are randomly assigned to one of three treatment programs: psychoanalytic, humanistic, or behavioral. After a fixed number of sessions, the treatment is terminated. The participants in the study are contacted 3, 6, and 12 months after the end of the treatment for assessment. The first factor is a between-subjects factor and has three levels. The second factor is a within-subjects variable with three levels. Assume that there are 21 subjects (7 in each group).
Therefore the information supplied to the computer is:

Number of Between-Subjects Factors: 1
Levels of B-S Factor 1: 3
Number of Within-Subjects Factors: 1
Levels of W-S Factor 1: 3
Sample Size: 7
α: .05

Figure 10.4 An example of the output for the ANOVA power estimate. For this example the program estimated the power of a two-way ANOVA with three levels of the first factor, three levels of the second factor, seven subjects in each group, and α = .05.

Note that in the "Model" line, the second 3 is surrounded by brackets ( [3] ). The program does this to indicate the factor that is the within-subjects variable.

CHAPTER 11: POWER ~ χ²

INTRODUCTION

Where the analysis of variance is the statistic of preference for data described as continuous and interval or ratio, the χ² is the statistic of preference for categorical data. This statistic is a frequently used inferential procedure in both the natural and social sciences. Thus, it is not uncommon to find extensive references to the statistic made by geneticists studying population characteristics, political scientists studying voting patterns, and psychologists studying the relation between attribution and behavior. Since its introduction by Karl Pearson in 1900, many excellent accounts of the use and misuse of the statistic have been written. I refer you to several of the more lucid for your review so that we may proceed with a review of the power of the χ².

As a brief review, the χ² for a contingency table having I rows and J columns is determined using the following equation:

χ² = Σᵢ₌₁ᴵ Σⱼ₌₁ᴶ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ    (11.1)

where Oᵢⱼ and Eᵢⱼ represent the observed and expected frequencies for each cell in the table. The degrees of freedom for the χ² are simply

df = (Number of Rows − 1)(Number of Columns − 1)    (11.2)

An alternative application of the χ² is the goodness of fit test. This test is used to determine whether a single row of frequencies conforms to some predetermined set of frequencies.
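Equations 11.1 and 11.2 are easy to verify with a short script. The sketch below derives the expected frequencies from the row and column totals of a small, hypothetical contingency table:

```python
def chi_square(table):
    """Chi-square statistic (Equation 11.1) for an I x J contingency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n   # expected frequency E_ij
            chi2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(col_totals) - 1)     # Equation 11.2
    return chi2, df

# A hypothetical 2 x 2 table; every expected frequency works out to 20
chi2, df = chi_square([[30, 10], [10, 30]])
print(chi2, df)   # -> 20.0 1
```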
When using the goodness of fit test, the degrees of freedom are simply the number of cells less 1. To determine whether or not to reject the null hypothesis, the observed χ² is compared to a critical value of χ² based on the degrees of freedom and α-level selected.

If you reject the null hypothesis for χ², you can calculate several descriptive statistics that aid in the interpretation of the statistic. Of the many that are available, Pearson's contingency coefficient, C, is most useful to us in our exploration of power. In brief, C is a measure of association that indicates the degree to which the row and column variables are related to each other. We can determine C using

C = √( χ² / (χ² + N) )    (11.3)

This statistic is important to us because we can use it to develop a measure of effect size, w. Technically, the effect size of the χ² is determined by

w = √( Σᵢ₌₁ᴵ Σⱼ₌₁ᴶ [P(Oᵢⱼ) − P(Eᵢⱼ)]² / P(Eᵢⱼ) )    (11.4)

where P(Oᵢⱼ) and P(Eᵢⱼ) represent the proportion of the total frequency represented in each cell. A more convenient method of calculating w, however, is

w = √( C² / (1 − C²) )    (11.5)

As with other measures of effect size, Cohen (1988) has recommended several benchmarks for interpreting w. Specifically,

0.0 ≤ w < 0.10 : No to Little Effect
0.1 ≤ w < 0.30 : Little Effect to Moderate Effect
0.3 ≤ w < 0.50 : Moderate Effect to Large Effect
0.5 ≤ w : Large Effect

ESTIMATING POWER

We can use Power Calculator to calculate the power of the χ² for various sample sizes, degrees of freedom, and α-levels. When you select this option, you will see a screen similar to the one presented in Figure X. As you can see, you can change several parameters of the statistic. Let's look at each of these in turn.

Alpha Level

This option allows you to vary the α-level you plan to use for your research. Although the default value is set as α = 0.05, you can increase or decrease the value of α. You can enter values between .50 and .00001.
The χ², like the F-ratio, is considered a nondirectional test.

Degrees of Freedom

As noted previously, the degrees of freedom depend upon the size of the table used to conduct the test. For a goodness of fit test, the degrees of freedom are the number of cells less 1. For the contingency table, the degrees of freedom are (R − 1)(C − 1).

Sample Size

As the name implies, the sample size reflects the total number of observations used to make up the table of data.

COMPUTE

This function causes the program to create a power table for the parameters you have entered.

GRAPH POWER

This alternative offers a graph of the relation between sample size, effect size, and power for the α-level you are using. The graph option is useful when you want a quick estimate of the sample size your study may require.

HELP

The help function calls a help screen that should provide you with general information that will help you understand the features of this computational option. The help screens are a greatly abridged version of this manual.

PRINT

Once the program computes the power table, you can click on this button to have the program print a version of the table to the printer.

EXIT

This function returns you to the Main Menu.

EXAMPLES OF POWER ESTIMATION

Two researchers believe that the effect size for a study they wish to conduct is relatively large (e.g., w = .40). Will an N of 50 observations be sufficient? According to the results presented in Figure 11.1, the answer is yes. The power is 1 − β = .82.

Figure 11.1: The power table produced by the Power Calculator. The parameters are: α = .05.

CHAPTER 12: RANDOM NUMBER GENERATOR

INTRODUCTION

This chapter describes five separate options built into the Power Calculator program. Each of these options generates a table of some form of random number.
Using these options, you can (a) generate random integers between two extremes, (b) generate a normally distributed random sample with a specified mean and standard deviation, (c) create a random sequence for assigning subjects to treatment conditions, (d) generate a Latin Square of a specified size, and (e) generate residuals for whole numbers. Each option will print the results to either the computer screen or the printer.

RANDOM INTEGERS

Random numbers are essential for work in statistics and experimental design. Consider a political scientist who wishes to poll the registered voters of a county. In order to produce useful data, the researcher will need to ensure that the sample is not biased. One protection against selection bias is to use random selection. Let's look at how the researcher could use the program. When the Random Integers option is selected, you will see a screen similar to the one presented in Figure 12.1. Options available are:

Figure 12.1 The initial screen that you will see when you select the Random Integer option.

Lowest Value in Sample

This number represents the lowest integer that can potentially be in your sample. You may include positive as well as negative numbers. The only requirements are that this number be a whole number less than the highest potential integer in the sample.

Highest Value in Sample

This number represents the highest integer that can potentially be in your sample. You may include positive as well as negative numbers. The only requirements are that this number be a whole number greater than the lowest potential integer in the sample.

Sample Size

This tells the computer how many numbers to generate for the sample.

Print to Screen

You may print the sample of integers to the screen or your printer.

For the sake of illustration, assume that we want to generate 100 numbers between 0 and 25,000, inclusive. After you enter the appropriate values, press the Compute button.
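Outside the program, the same kind of sample takes only a few lines of code. A sketch (this is an illustration, not the program's own routine):

```python
import random

def random_integers(lowest, highest, sample_size):
    """Random integers between lowest and highest, inclusive.
    Duplicates are possible, exactly as in the program's output."""
    return [random.randint(lowest, highest) for _ in range(sample_size)]

sample = random_integers(0, 25000, 100)
print(sample[:10])   # first ten of the 100 generated values
```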
The computer will generate a screen that looks something like the following.

Figure 12.2 An example of the printout created by the Random Integer program. For this example, the computer generated 100 random numbers between 0 and 25,000.

The program prints the numbers in the order it creates them, so the list is not sorted, and there is a chance that the program will produce duplicate values. The probability of duplicates increases as the difference between the highest and lowest numbers decreases and the number in the sample increases.

RANDOM NORMAL DISTRIBUTION

This option is similar to the previous option. The primary difference is that the numbers that are generated are normally distributed and are selected from a population with a mean and standard deviation that you define. When this program begins, you may alter the value of the population mean, the population standard deviation, and the number of observations to include in the sample. The table can be printed to the screen or the printer. The following figure is an example of the printout that the program generates. These data were generated from a population where μ = 100 and σ = 15.

Figure 12.3 An example of the printout created by the Random Normal Distribution program. For this example, 10 observations were selected from a population where μ = 100 and σ = 15.

RANDOM ASSIGNMENT OF SUBJECTS

One of the key elements of a true experiment is the random assignment of subjects to the control and treatment conditions. Many researchers believe that valid inferences of cause and effect can be made only when random assignment determined the treatment conditions the subjects experienced. This program will create a table that will allow you to randomly assign subjects to treatment conditions. Let's look at some examples of how the program could be used. Assume that a researcher is conducting an experiment where there are four treatment conditions.
The experiment could be a single-factor design with four levels of the factor, or a 2 × 2 factorial design. Each treatment condition is designated by a number between 1 and 4, inclusive. We will also assume that the researcher plans to put five subjects into each treatment condition. Therefore, the number of groups is 4 and the number of subjects is 5. Enter this information into the computer and then click on the Compute button. The program will generate a table something like the following figure.

The program created five sets of numbers ranging between 1 and 4. Looking at Set 1, we can see that the first subject is assigned to Group 2, the second subject is assigned to Group 3, the third subject is assigned to Group 1, and the fourth subject is assigned to Group 4. The researcher will follow this table until all the subjects are assigned to the appropriate treatment conditions. Immediately following is a test of randomness. This test determines whether the distribution of numbers across the treatment conditions (positions) is random or not. The computer generates a χ² test for each position. As a general benchmark, if the p value is greater than .05, the column of numbers can be considered random. For all practical purposes, the numbers in this example appear to be reasonably random. If you are suspicious of the table, generate a new table by returning to the previous screen and creating a new table.

Figure 12.4 An example of a table of random numbers that can be used to randomly assign subjects to treatment conditions. In this example there were four treatment conditions with five subjects in each treatment condition. The test of randomness is a χ² for the numbers in each column or treatment condition.

LATIN SQUARE GENERATOR

Latin Squares are an important control and analysis procedure. In its simplest form, the Latin Square is a counterbalancing procedure. The Latin Square design has many methodological and statistical advantages.
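A small valid square is also easy to produce in code. The sketch below uses a randomized cyclic construction (shuffle the rows, columns, and condition labels of a cyclic square) so that each condition appears exactly once in every row and column; this is only an illustration, not the program's own random-search algorithm, and it cannot reach every possible square.

```python
import random

def latin_square(n, rng=random):
    """A randomized n x n Latin Square: each of the numbers 1..n
    appears exactly once in every row and every column."""
    base = [[(i + j) % n + 1 for j in range(n)] for i in range(n)]  # cyclic square
    rng.shuffle(base)                       # permute the rows
    cols = list(range(n))
    rng.shuffle(cols)                       # permute the columns
    symbols = list(range(1, n + 1))
    rng.shuffle(symbols)                    # relabel the conditions
    return [[symbols[base[i][c] - 1] for c in cols] for i in range(n)]

square = latin_square(10)
for row in square:
    print(row)
```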
To learn more about the Latin Square, consult an advanced text on statistics such as Winer, Brown, and Michels (1991). The Latin Square is a matrix of numbers. Each number represents a specific condition. The distinguishing feature of the Latin Square is that each number appears in a given position only once. In other words, no number occupies the same row or column position more than once. We can look at an example to see how the Latin Square is constructed. The Latin Square in Figure 12.5 is a 10 × 10 matrix. Each row and each column contains the numbers between 1 and 10, inclusive. If you look closely at the second number in the second row (7), you will see that the 7 is not repeated again in that row or column.

Figure 12.5 An example of a 10 × 10 printout produced by the Latin Square program. The numbers in the square represent the individual treatment conditions. Each number resides in a given row and column location only once; there are no replications. The numbers at the ends of the rows and columns are the totals of the rows and columns and thus serve as a check of the square.

The computer generates the Latin Square by selecting numbers at random. The implication of this fact is twofold. First, each time you generate a Latin Square it will be different from the previous Latin Square.* The second implication is that with large Latin Squares, the computer will spend some time generating the square. I cannot offer you a good estimate of the time it will take to generate a Latin Square. Factors such as the size of the square and the operating speed of your computer interact to make it impossible to offer useful predictions. However, if you require a large Latin Square (e.g., greater than 10 × 10), be patient. If the program seems to be "stuck," wait; the program is looking for a valid solution for your request. If you become impatient, click on the Exit button and then restart the program.

WHOLE NUMBER GENERATOR

In statistics, as in all disciplines, doing is learning.
* In fact, when the number of rows is three or fewer, there is only one possible solution. When there are four rows, there are four potential solutions. Five rows produce 56 potential solutions. The number of potential solutions increases rapidly as the number of rows is increased.

Homework assignments, worked examples, and computational examples are effective methods for teaching simple and complex statistical techniques. Requiring students to work through examples allows them to practice their computational skills as well as examine how a statistical test works. One problem that students often encounter is computational error. These errors are most likely to occur when an intermediate step produces a number with a remainder. The student may transpose numbers, round inappropriately, or make some other mistake that produces the wrong answer. Finding the error in computation can be daunting and frustrating, and may make statistical work aversive for students.

A partial solution to this problem is to create data sets that produce whole numbers for the desired statistical test. These data sets allow students to work through assignments that do not produce an intimidating collection of numbers to manipulate. These experiences may help the student who is easily threatened by numbers to develop a measure of confidence in his or her abilities.

GENERATING SAMPLES WITH SPECIFIED MEANS AND STANDARD DEVIATIONS

A residual is eᵢ = Xᵢ − X̄. The program will allow you to generate residuals with different sample sizes and standard deviations. Here is an example of the arrays in the table.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Residuals for whole numbers: SD = (SS/(n–1))^.5  repeats = 5
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Residual Arrays for N = 5 and s = 4
Array: e
1:  7 –3 –2 –1 –1  :SK =  2.0
2:  6 –5  1 –1 –1  :SK =  0.6
3:  6 –4  2 –2 –2  :SK =  0.9
4:  5 –5  3 –2 –1  :SK =  0.1
5:  5 –5 –3  2  1  :SK = –0.1
6:  4  4 –4 –4  0  :SK =  0.0
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––

In this example, the standard deviation of each array is 4.0 using n − 1 in the denominator of the equation. The mean of each array is 0.0. The table also includes the skew of each array.

You can use these arrays to generate data sets for students. All you need to do is add a constant to each residual. The result will produce an array of data whose mean will equal the constant. Specifically,

Xᵢ = eᵢ + X̄

For example, adding X̄ = 10 to array 2 produces X = (16, 5, 11, 9, 9), and adding X̄ = 6 to array 6 produces X = (10, 10, 2, 2, 6):

             e        X           e        X
Sum         0.0     50.0         0.0     30.0
Mean        0.0     10.0         0.0      6.0
Median     –1.0      9.0         0.0      6.0
ΣX²        64.0    564.0        64.0    244.0
(ΣX)²       0.0   2500.0         0.0    900.0
ŝ           4.0      4.0         4.0      4.0

You can use the information about skew when you wish to illustrate how outliers affect the skew of the data and the difference between the mean and median. The greater the absolute value of the skew, the greater the difference between the mean and median. To reverse the sign of the skew, multiply the residuals by –1. For example, the array e = (7, –3, –2, –1, –1) has a skew of 2.0. Multiplying the array by –1 produces a new array e' = (–7, 3, 2, 1, 1), which has a skew of –2.0.

FREQUENCY DISTRIBUTIONS

Using arrays of different sizes, you can create data sets for frequency distributions, stem-and-leaf plots, or other forms of exploratory data analysis.
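The defining properties of these arrays (a mean of zero and the stated standard deviation), and the effect of adding a constant, are easy to check. A short sketch using array 2 from the table above:

```python
def sd(values, n_minus_1=True):
    """Standard deviation with an n-1 (or n) denominator."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)
    return (ss / (n - 1 if n_minus_1 else n)) ** 0.5

e = [6, -5, 1, -1, -1]             # array 2: N = 5, s = 4
assert sum(e) == 0                 # residuals sum to zero
assert sd(e) == 4.0                # SS = 64; 64/4 = 16; sqrt = 4

x = [r + 10 for r in e]            # add a constant to set the mean
print(x, sum(x) / len(x), sd(x))   # -> [16, 5, 11, 9, 9] 10.0 4.0
```

Adding the constant shifts the mean without changing the spread, which is the whole trick behind the generator.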
For example, with N = 10 and ŝ = 2, two of the many residual arrays are

e = (4 –4 1 1 –1 –1 0 0 0 0)
e = (3 3 –3 –3 0 0 0 0 0 0)

The following graph of the data is a simple frequency distribution. The two sample means are 9 and 10, respectively.

[Graph: frequency distribution of the two arrays, Frequency of X plotted against Values of X.]

CORRELATION AND REGRESSION

Using the residuals, we can generate data for correlation coefficients. Although the final correlation coefficient will have fractional values, the intermediate steps will be whole numbers. The following data represent an extended example. Here are four sets of residuals where N = 6 and ŝ = 4. For the sake of illustration, I rearranged the order of the residuals to create different patterns of correlations. In this example, all the means equal zero. Therefore, the correlation is the sum of the cross products divided by the sum of squared residuals:

r = Σ(eₓe_y) / Σe²

Data For Correlations

Subject    Set 1   Set 2   Set 3   Set 4
1            7      –5      –3      –2
2           –4      –4      –3      –4
3           –3      –1      –3      –4
4            2       2       0       6
5           –1       3       2       2
6           –1       5       7       2
Σe²         80      80      80      80

Cross Products

Subject    1·2    1·3    1·4    2·3    2·4    3·4
1          –35    –21    –14     15     10      6
2           16     12     16     12     16     12
3            3      9     12      3      4     12
4            4      0     12      0     12      0
5           –3     –2     –2      6      6      4
6           –5     –7     –2     35     10     14
Σeₓe_y     –20     –9     22     71     58     48

Correlations

        1        2        3        4
1    1.000
2   –0.250    1.000
3   –0.113    0.888    1.000
4    0.275    0.725    0.600    1.000

By adding a constant to each array of residuals, you can eliminate negative values and present data that are more realistic. You can, of course, use pairs of residuals that have different standard deviations. Take care in your selection, however. The denominator of the correlation coefficient is the geometric mean of the sums of squares for the two samples, √(Σeₓ² · Σe_y²). If the product of the two sums of squares is a perfect square, the denominator will be a whole number; other combinations of sums of squares will produce denominators with fractional values.
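Because every residual array has a mean of zero and, here, the same sum of squares (80), the correlation reduces to the sum of cross products over the sum of squared residuals. A sketch using Sets 1 and 2 from the table above:

```python
def correlation_from_residuals(ex, ey):
    """r = sum(ex*ey) / sqrt(SSx * SSy); the residual arrays have mean 0."""
    cross = sum(x * y for x, y in zip(ex, ey))
    ssx = sum(x * x for x in ex)
    ssy = sum(y * y for y in ey)
    return cross / (ssx * ssy) ** 0.5

set1 = [7, -4, -3, 2, -1, -1]      # SS = 80
set2 = [-5, -4, -1, 2, 3, 5]       # SS = 80
r = correlation_from_residuals(set1, set2)
print(r)   # -> -0.25
```

Every intermediate quantity (the cross products and the sums of squares) is a whole number; only the final division produces a fraction.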
ANALYSIS OF VARIANCE

One-Way ANOVA

Assume that you want students to complete a single-factor ANOVA with four levels and six observations in each group. To create the data, we need to follow several specific steps.

Step 1: Generate Group Means

For this example, there are four group means. For our example we will set ŝ = 2.0. The residuals for the group means are:

e = (3 –1 –1 –1)

With these residuals, we can select the grand mean. The grand mean determines the mean of each group. Let's set the grand mean to 10. Therefore, the four group means are

Ms = (13 9 9 9)

Recall that the estimate of the between-groups variance is:

ŝ²between = Σ nⱼ(X̄ⱼ − X̄)² / (k − 1)

As you can see, we need to recognize that sample size affects the sum of squares between groups. In this example, the sum of squares among the means is SS = 12.0, and the sample size for each group is 6. Therefore, the sum of squares between groups is 72.0.

Step 2: Generate Individual Scores

We can now use these means to generate the data for the groups. For this example, I selected ŝ = 2.0.

Residuals:
X1 = (4 –1 –1 –1 –1 0)
X2 = (3 –3 1 –1 0 0)
X3 = (3 2 –2 –1 –1 –1)
X4 = (3 –2 –2 1 1 –1)

Data (residuals plus the group means 13, 9, 9, 9):
X1 = (17 12 12 12 12 13)
X2 = (12 6 10 8 9 9)
X3 = (12 11 7 8 8 8)
X4 = (12 7 7 10 10 8)

Totals

Group    ΣXᵢ     ΣXᵢ²
X1        78     1034
X2        54      506
X3        54      506
X4        54      506
Total    240     2552

Source     Sum of Squares   df   Mean Square   F-Ratio
Between         72.00        3      24.00        6.00
Within          80.00       20       4.00
Total          152.00       23

Two-Way ANOVA

For this example, I will generate data for a 3 × 4 ANOVA with five scores in each cell. As in the previous example, the first thing we need to do is select a grand mean. For the sake of simplicity, we will use a grand mean of 10.0.

Step 1: Generate Group Means

For this example, I chose the following residuals.

eA = (5 –5 0)
eB = (6 –2 –2 –2)
To calculate the cell mean, multiple the row and column mean and divide by the grand mean. For example, the mean of A1B1 is 24 = (15 16)/10. a1 a2 a3 b1 24 8 16 16 b2 12 4 8 8 b3 12 4 8 8 b4 12 4 8 8 15 5 10 10 Step 2: Generate Individual Scores For a sample size of n = 5 and ŝ = 3.0 there are only three residual arrays that produce whole numbers. e1 = (5 e2 = (4 e3 = (3 –3 –3 3 –1 –3 –3 –1 1 –3 0) 1) 0) We can randomly assign the arrays to the different cell means to generate the data for the individual cells. For example; a1 a2 a3 Totals: b1 29 21 23 23 24 120 b2 15 15 9 9 12 60 b3 16 9 9 13 13 60 b4 17 9 11 11 12 60 Totals 13 5 7 7 8 40 7 7 1 1 4 20 8 1 1 5 5 20 9 1 3 3 4 20 100 21 13 15 15 16 80 240 11 11 5 5 8 40 120 12 5 5 9 9 40 120 13 5 7 7 8 40 120 200 600 300 80 Random Number Generator The calculations for the ANOVA produce whole numbers until we reach the F–ratio. The F–ratios would have been whole numbers if I had selected arrays for the main effects that have standard deviations whose ratio is a whole number. In fact, for Factor A, s = 5, for Factor B, s = 4, and s = 3 for each group. Source A B AB Within Total Sum of Squares 1000.00 720.00 120.00 432.00 2272.00 df 2 3 6 48 59 Mean Square 500.00 240.00 20.00 9.00 F–Ratio 55.556 26.667 2.222 RUNNING THE PROGRAM The program is very easy to use. Just enter the parameters for the data you desire and then click a button — the computer does all the work. When you start the program, the screen will be similar to the one in Figure 13.2. As you can see, you can change several parameters. Let's review each in turn. Figure 12.7 Sample screen of the Whole Number Generator Lower and Upper Sample Size These parameters determine the size of the arrays that the computer will generate. In the default mode, the values are 2 and 5. Therefore, the computer will generate arrays of residuals for sample sizes of 2, 3, 4, and 5. You can set the upper sample size to 35. 
To generate arrays of only one sample size, set the lower and upper values to the same value.

Lower and Upper Standard Deviations

These values determine the standard deviations of the arrays that the computer will generate. In the default mode, the values are 2 and 5. Therefore, the computer will generate arrays of residuals with standard deviations of 2, 3, 4, and 5. You should note that not all sample size and standard deviation combinations will produce an array. The computer will indicate in the printout those combinations that are impossible to solve. The largest standard deviation that you can enter is 20. Use large standard deviations with caution, however. The larger the standard deviation, the greater the number of arrays the computer will produce. By large, I mean on the order of more than 20,000!

Denominator for SD

The computer will create data using N or N − 1 for the denominator of the standard deviation. You can switch between 0, for N, and 1, for N − 1.

Replications in Set

This is an important parameter you can set. In essence, you can determine the frequency of equal values in the array of residuals. The number of replications can be set to any number between 1 and the upper sample size. Restricting the number of replications will reduce the number of arrays. In some cases, there may be no solution for a given N and s with a specified maximum number of replications.

Compute

Clicking the mouse over this button causes the program to start generating the numbers. The computer will print the information to the screen, printer, or the disk drive. If you print to the screen, the computer will pause after each page of information until you press any key to continue the report. Press the ESC key to quit the current run and start over.

Print

Each time you click on this button, the program will send the information to a different destination. The computer's screen is the default device. You can also print to the printer or to the disk drive.
If you print to the drive, the computer will create a file named FILE####.NMS. The #s represent numbers that the computer will generate. Each time you print to the disk drive, the computer will generate a new file. The files are in ASCII format and can be easily read by any word processor.

Exit

This button returns control of the program to the main menu.

Settings

This option gives you access to a set of routines that allow you to change the color of the screen and text as well as other information concerning the program.

The following table is an example of the data generated by the computer. For this example, I set the upper and lower sample size to 8, the standard deviations to 4, the denominator to N − 1, and the number of replications to 5. The computer found 70 unique arrays that produce whole numbers.

-------------------------------------------------------------------------------
Residuals for whole numbers: SD = (SS/(n-1))^.5  repeats = 5
-------------------------------------------------------------------------------
Residual Arrays for N = 8 and s = 4
Array: e
 1:  9 -2 -2 -2 -1 -1 -1  0  :SK = 2.08
 2:  8 -5 -2  1 -1 -1  0  0  :SK = 1.12
 3:  8 -4 -4  0  0  0  0  0  :SK = 1.14
 4:  8 -4 -3  2 -1 -1 -1  0  :SK = 1.26
 5:  8 -4 -3 -2  1  1 -1  0  :SK = 1.23
 6:  8 -4  2 -2 -2 -2  0  0  :SK = 1.28
 7:  8 -3 -3 -3  2 -1  0  0  :SK = 1.3
 8:  8  3 -3 -2 -2 -2 -1 -1  :SK = 1.44
 9:  8 -3 -3  2 -2 -2  1 -1  :SK = 1.33
10:  7 -6 -3  1  1  0  0  0  :SK = .3
11:  7 -6  2 -2  1 -1 -1  0  :SK = .37
12:  7 -6 -2 -2  1  1  1  0  :SK = .33
13:  7 -5 -4  2  1 -1  0  0  :SK = .48
14:  7 -5  3 -3 -2  0  0  0  :SK = .62
15:  7 -5  3 -3  1 -1 -1 -1  :SK = .64
16:  7 -5 -3 -3  1  1  1  1  :SK = .5
17:  7 -5  3 -2 -2 -2  1  0  :SK = .66
18:  7 -5 -3  2  2 -2 -1  0  :SK = .58
19:  7  4 -4 -3 -2 -1 -1  0  :SK = .91
20:  7 -4 -4  3 -2  1 -1  0  :SK = .69
21:  7 -4 -4 -3  2  1  1  0  :SK = .58
22:  7  4 -3 -3 -3 -2  0  0  :SK = .94
23:  7  4 -3 -3 -2 -2 -2  1  :SK = .98
24:  7 -4  3 -3  2 -2 -2 -1  :SK = .8
25:  7 -4 -3 -3  2  2 -2  1  :SK = .69
26:  7  3 -3 -3 -3 -3  1  1  :SK = .78
27:  6 -6  4 -2 -2  0  0  0  :SK = .14
28:  6 -6 -4  2  2  0  0  0  :SK = -.15
29:  6 -6  4 -2  1 -1 -1 -1  :SK = .16
30:  6 -6 -4  2  1  1  1 -1  :SK = -.17
31:  6 -6  3 -3  2 -1 -1  0  :SK = .01
32:  6 -6  3 -3 -2  1  1  0  :SK = -.02
33:  6 -6  2  2  2 -2 -2 -2  :SK = 0
34:  6 -5 -5  3  1  0  0  0  :SK = -.02
35:  6  5 -5 -2 -2 -1 -1  0  :SK = .58
36:  6 -5 -5  2  2  1 -1  0  :SK = -.06
37:  6  5 -4 -4 -1 -1 -1  0  :SK = .62
38:  6 -5  4 -4  1 -1 -1  0  :SK = .26
39:  6  5 -4 -3 -3 -1  0  0  :SK = .66
40:  6 -5  4 -3 -3  1  0  0  :SK = .3
41:  6  5 -4 -3 -2 -2  1 -1  :SK = .69
42:  6 -5  4 -3  2 -2 -1 -1  :SK = .37
43:  6 -5  4 -3 -2 -2  1  1  :SK = .33
44:  6 -5 -4  3  2 -2  1 -1  :SK = .16
45:  6 -5 -4 -3  2  2  1  1  :SK = .05
46:  6 -5  3  3 -3 -2 -2  0  :SK = .3
47:  6  4 -4 -4 -3  1  1 -1  :SK = .37
48:  6 -4 -4 -4  3  1  1  1  :SK = .16
49:  6  4 -4 -4  2 -2 -2  0  :SK = .42
50:  6 -4 -4 -4  2  2  2  0  :SK = .14
51:  6 -4 -4  3  3 -3 -1  0  :SK = .33
52:  6  4  3 -3 -3 -3 -2 -2  :SK = .62
53:  6 -4  3 -3 -3 -3  2  2  :SK = .33
54:  5  5 -5 -4 -2  1  0  0  :SK = .16
55:  5 -5 -5  4  2 -1  0  0  :SK = -.17
56:  5  5 -5 -3 -3  1  1 -1  :SK = .21
57:  5 -5 -5  3  3  1 -1 -1  :SK = -.22
58:  5  5 -5 -3  2 -2 -2  0  :SK = .26
59:  5 -5 -5  3  2  2 -2  0  :SK = -.27
60:  5  5 -4 -4 -3  2 -1  0  :SK = .3
61:  5 -5  4 -4  3 -2 -1  0  :SK = .05
62:  5 -5  4 -4 -3  2  1  0  :SK = -.06
63:  5  5 -4  3 -3 -2 -2 -2  :SK = .48
64:  5 -5  4 -3 -3  2  2 -2  :SK = .05
65:  5 -5 -4  3  3  2 -2 -2  :SK = -.06
66:  5  5  3 -3 -3 -3 -3 -1  :SK = .5
67:  5  4  4 -4 -3 -3 -2 -1  :SK = .37
68:  5  4 -4 -4  3 -3 -2  1  :SK = .16
69:  5 -4 -4 -4  3  3  2 -1  :SK = -.02
70:  4  4  4 -4 -4 -4  0  0  :SK = 0
-------------------------------------------------------------------------------
===============================================================================
Program Finished

CHAPTER 13: STATISTICAL TABLES GENERATOR
INTRODUCTION

The goal of this option is to provide accurate tables that researchers frequently use when conducting statistical tests. Specifically, the program will generate a table of values for a specific sampling distribution or a table of critical values based on the parameters that you supply. The obvious advantage of this program is that it supplies, on demand, a table that meets your particular needs. Because you can specify the parameters of the table, the computer can create a specific table of values that may not otherwise be readily available. When you select this option, you will see a menu of alternatives like the one in the following figure. As you can see, the program offers eight alternatives, including tables for t-ratios, F-ratios, χ², the correlation coefficient, and the normal and binomial distributions. Each option will print the table to the screen or your computer's printer.

Figure 13.1 Menu of options available for the Statistical Tables option.

CRITICAL VALUES: t-RATIO

In this program you will have the opportunity to create the critical values required to reject the null hypothesis for Student's t-ratio. To use the program you will need to specify the α-level to be used and whether you wish a one- or a two-tailed test. The lower and upper degrees of freedom establish the size of the table. You can make the table small by selecting a small range between the upper and lower limits. Similarly, you can make the table extremely large by setting the upper limit at an extremely large level. As a generality, the critical values of t vary considerably as the degrees of freedom increase. When the degrees of freedom reach approximately 120, the change from one value to the next falls to within one or two parts per thousand. Indeed, with degrees of freedom as large as these there is little difference between the t-distribution and the normal distribution.
CRITICAL VALUES: F-RATIO

This program generates the critical values of the F-ratio for a specified α-level and range of degrees of freedom. The program will generate a table of F-ratios that identifies the degrees of freedom and critical value. To use the table, compare the observed F-ratio to the appropriate critical value. If the observed value is greater than the critical value, the null hypothesis may be rejected. On some occasions, you may find that the observed F-ratio is less than 1.0. You may want to know if the F-ratio is significantly less than 1. To conduct such a test, take the reciprocal of the F-ratio and reverse the degrees of freedom. Then test the revised F-ratio against the tabled values. For example, if the F-ratio were 0.436 with degrees of freedom of Numerator = 4 and Denominator = 30, the revised F-ratio is 2.294 = 1/0.436 with degrees of freedom of Numerator = 30 and Denominator = 4.

CRITICAL VALUES: χ²

This program produces critical values required to reject the null hypothesis for the χ² test. You may enter the α-level and the range of degrees of freedom that the table should contain.

CRITICAL VALUES: r

When using the Pearson Product Moment Correlation Coefficient, one may determine whether the correlation coefficient is different from 0. In essence, the coefficient can be converted to a t-ratio that is then tested against critical values. This program will generate the minimum size of r required to reject the null hypothesis. You may set the α-level, whether the test is a one- or two-tailed test, and the range of degrees of freedom to include in the table.

r TO z TRANSFORMATION

This program creates a table representing the transformation of correlation coefficients to z-scores using Fisher's transformation. There are no parameters to set. The creation of the table is automatic.

NORMAL DISTRIBUTION

The goal of this program is to create a table of z-scores and the corresponding proportion of the distribution under specific sections of the curve.
Once you set the range of the table, the program will create a table of z-scores and the proportion of the distribution above and below each z-score.

BINOMIAL DISTRIBUTION

The Binomial Distribution option creates the values for a binomial distribution. The program allows you to determine P and the number of events in the population. When the program creates the table, it provides you with basic information about the binomial distribution you created: the values of P; Q (1 − P); and the mean, standard deviation, skew, and kurtosis of the distribution. The table itself includes four columns. Column 1 represents the specific event, X(i). This column will range in value between 0 and the number of events in the distribution. The next column, headed p(X(i)), is the proportion of the curve at the value of X(i). The last two columns are the cumulative proportions of the distribution: the first represents the cumulative proportion at and below X(i); the second represents the cumulative proportion above X(i).

CHAPTER 14: ANOVA — MONTE CARLO SIMULATOR

INTRODUCTION

The ANOVA Monte Carlo Simulator is a program that has many functions and uses. You can use the program as a tutorial or as a utility to generate data. As a tutorial, you can use the program to examine many of the general principles of the Analysis of Variance. Like the other tutorials in this package, the program allows you to vary many of the essential parameters of the ANOVA and then generate random data that fit those parameters. In this capacity you can learn more about important concepts such as the robustness of the ANOVA when the population parameters violate the specific assumptions of the statistic. As a utility, you can generate data for homework assignments or other projects where one needs data that fit within specific parameters.
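In essence, the simulator’s inner loop draws normally distributed scores for each treatment condition and then computes the ANOVA on those scores. A minimal sketch of that loop (hypothetical Python, not part of the package):

```python
import random

def generate_group(mu, sigma, n, rng):
    """Random scores for one treatment condition with the given parameters."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

def one_way_f(groups):
    """F-ratio for a one-way between-subjects ANOVA."""
    scores = [x for g in groups for x in g]
    grand = sum(scores) / len(scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

rng = random.Random(7)
# The program's default parameters: mu = 5.0, sigma = 2.5, n = 5 per group.
groups = [generate_group(5.0, 2.5, 5, rng) for _ in range(2)]
print(round(one_way_f(groups), 3))
```

Repeating the last two lines many times, once per iteration, produces the collection of F-ratios that the program plots and saves.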
Let’s begin by looking at how the program works. We will then examine different applications of the program. When the program begins, you will see a screen similar to the one presented in Figure 14.1. As you can see, there are many buttons. Each button controls an operation of the program. The following is a brief description of each button.

Figure 14.1: Primary screen for the ANOVA Monte Carlo Simulator.

Design ANOVA Model

When you select this option, the program will allow you to design the type of ANOVA you wish to examine. You can control the number of Between-Subjects and Within-Subjects factors, and the levels of each factor. Therefore, you will be able to model simple one-way ANOVAs, factorial ANOVAs, and mixed-model ANOVAs.

Change Parameters

This function allows you to change the parameters of the ANOVA you are examining. Specifically, the program allows you to change the mean (μ), standard deviation (σ), and sample size of each treatment condition within the ANOVA. When the program begins, all parameters are set as μ = 5.0, σ = 2.5, and n = 5.

Plot Factors

Once you have generated data, the program will allow you to plot several graphs. These graphs represent each of the tests conducted in the ANOVA. That is, you will be able to examine the results for the main effects and interactions conducted for the model you created.

Start Demonstration

This button starts the program. After you have designed the ANOVA and set the parameters and the number of iterations, the program will generate data, conduct the ANOVA, and save the information to the computer’s memory.

Iterations

You can use this option to set the number of trials to generate. For quick demonstrations, you can use a small number of trials (e.g., 100). For more comprehensive tests of the ANOVA, you can increase the iterations to a large value (e.g., 10,000). For homework projects, you can set the iterations to the number of students in the class.
The program always starts by setting the number of iterations to 10. You can change the number of iterations using the arrows to increase or decrease the value, or by clicking the mouse over the number of iterations and then entering the desired value. A word of caution is in order. Selecting a large iteration value may commit you to a long computer run. Remember that the computer must generate new random numbers for each iteration. If you are examining a 2 × 4 ANOVA with 10 observations within each cell, the computer must generate 80 numbers and then conduct the appropriate ANOVA. These steps, multiplied by the number of iterations you select, can create a very time-consuming task for the computer. Of course, your computer’s speed will affect the time required to complete the analysis. Similarly, a math coprocessor will enhance the speed of the intermediate calculations.

Significant Digits

The program generates data at random and will then round each number to the number of significant digits (places to the right of the decimal point) you specify. You can instruct the program to generate whole numbers (0 significant digits) or numbers with up to 4 significant digits (e.g., 2.1234). The number of significant digits affects only the random numbers that the computer generates, not the calculations. The computer uses the highest level of precision to calculate the ANOVA and descriptive statistics and reports all statistics to the third significant digit.

Print Summary Tables to Screen

When you select this option, the program will print the ANOVA summary tables to the screen. This will allow you to see the summary tables as the computer generates the data.

Print Summary Tables to Disk

This option allows you to make a permanent record of the ANOVA summary tables. The program saves each completed summary table to the disk drive. You can then retrieve the file using a word processor for review or printing.
For each Monte Carlo simulation, the computer will create a new file in the Statistics Tutor subdirectory. The general format for these files is FILEXXXX.AOV, where the XXXXs represent the file number. Each time you run a simulation, the computer creates a new file. These files are numbered sequentially.

Print Summary Tables to Printer

This option allows you to make a permanent record of the ANOVA summary tables. The program prints each summary table to the printer. This option will slow the time it takes the computer to complete the Monte Carlo simulation.

Print Raw Data with Tables

If you print the summary tables to the disk or the printer, activating this option causes the program to print the raw data used for each ANOVA. You would use this option if you want a permanent copy of the data for later analysis. The raw data are never printed to the computer screen. This option is especially useful for instructors who wish to create homework assignments. Specifically, the instructor can create an ANOVA design, set the population parameters, and then set the iterations to the number of students in the class. Therefore, each student will produce a different data set for the assignment. Because the computer prints the data along with the summary tables, the instructor has a ready mechanism for checking each student’s work.

Create F-Ratio File

This option creates a special file that contains only the F-ratios and their p-values. You might use this option if you want a permanent table of the F-ratios and their p-values that you can use with a spreadsheet program. For each Monte Carlo simulation, the computer will create a new file in the Statistics Tutor subdirectory. The general format for these files is TABLEXXX.TAB, where the XXXs represent the file number. Each time you run a simulation, the computer creates a new file. These files are numbered sequentially. The following table is an example of the file that the computer created.
As you can see, the computer provides general information about the design of the ANOVA and the population parameters. For each data set, the computer records the label, degrees of freedom, F-ratio, and probability value for each F-ratio in the ANOVA.

Table 14.1: Example of the table generated by the Monte Carlo Simulator.

ANOVA Model: 2
Number of Between-Groups Factors: 1
Factor 1: Levels = 2
Number of Within-Subjects Factors: 0

Parameters for ANOVA
Name    MU    SIGMA    n
A1      5     2.5      5
A2      5     2.5      5

Name        F-Ratio         p-value
A ( 1, 8)   1.214707        .3024517
A ( 1, 8)   .2932584        .6028903
A ( 1, 8)   8.424465E-02    .7790068
A ( 1, 8)   1.672475        .2320189
A ( 1, 8)   .2825279        .6094869
A ( 1, 8)   .1869996        .676846
A ( 1, 8)   .3089594        .59352
A ( 1, 8)   .5579816        .4764526
A ( 1, 8)   4.054998        7.882446E-02
A ( 1, 8)   3.627503        9.330428E-02

Other Features:

You will notice that there are two additional bits of information on the screen. The first is the line ANOVA MODEL:. This line represents the type of ANOVA you have designed. The program always begins by conducting a one-way ANOVA with two independent groups. The next line is a graphic that represents how many ANOVAs the program has generated. When the program begins, it will draw a blue rectangle to show the proportion of the total iterations it has generated.

Figure 14.2: Example of the ANOVA Monte Carlo screen after the program has generated the data for the simulation.

Practice Session:

Let’s begin with a simple practice session. We will use the default ANOVA model and the default parameters. Click the mouse over the Start button or press the letter “S” on your keyboard. The program will immediately begin the process of generating the 10 ANOVAs. After the program generates the data, the screen will look like the one in Figure 14.2. Now that you have generated the 10 ANOVAs, let's look at the results. Click the mouse over the Plot Factors button, or press the letter “P” on the keyboard.
You will see the screen change to the one presented in Figure 14.3.

Figure 14.3: Listing of the available effects that can be examined using the Plot Factors option.

As you can see, there is only one effect to examine, the Main Effect for Factor A. If you had entered a factorial design, the program would present separate buttons for each main effect and interaction of factors. Because we have only the one option, press the return key, or click the mouse over the “A” button. The program will then tell you that it is working. In essence, the program is looking at all the F-ratios it generated and preparing the information for the following graphs. Once it has analyzed the data, you will see a graph similar to the one in Figure 14.4.

Figure 14.4: Sample screen of the frequency distribution of F-ratios generated by the Monte Carlo simulator. Note that the program generates all numbers at random. Therefore, each sampling distribution will be different from all others.

This graph represents the frequency distribution of the F-ratios generated for the ANOVA. Specifically, this is the frequency distribution of F-ratios for Factor A. The horizontal axis represents the F-ratios. These values will range from 0.0 (on the left of the scale) to a large value of F. The graph also represents the probability level of the F-ratios. In this graph, the computer plotted the locations of p = .5, p = .1, and p = .05. The vertical axis represents the observed frequency of each F-ratio. You can now move the mouse through the graph. As you do, the numbers on the lower right of the screen will change. These numbers represent the location of the mouse in the graph. For example, the mouse in Figure 14.4 is at an F-ratio of 4.00. There was only one F-ratio of this value. In essence, the data represented in Figure 14.4 are the sampling distribution of the F-ratio for the following conditions.
First, the null hypothesis is a true statement; specifically, the two population means are equal: μ₁ = μ₂. Second, the degrees of freedom are 1 for the numerator and 8 for the denominator. Of course, our sample of F-ratios is small. Had we run more iterations, say 10,000, the distribution of F-ratios would look more like the distributions created by the Sampling Distributions tutorial. You can also generate other graphs. Press the letter “P” or click the mouse on the button to the left of the line “Press P to view Cumulative Probabilities.” When you do, you will see a graph like the one in Figure 14.5.

Figure 14.5: Example of the cumulative probabilities graph. The horizontal axis represents the probability of the F-ratios. The vertical axis represents the cumulative probability. The dark curved line represents the ideal cumulative probability when the null hypothesis is true. The lighter histogram represents the observed data.

This graph represents the cumulative frequency of the probabilities for the tests you conducted. Along the horizontal axis are the probabilities. The probabilities range from 1.0 (on the right of the screen) to .0009 (on the left of the screen); the scale of the probabilities is a log scale. The vertical axis represents the cumulative percentage of the F-ratios. The percentages range from 0% (the bottom of the scale) to 100% (the top of the scale). There are two elements in the graph. The first is a black curved line. This line represents the cumulative probability that would occur under the null hypothesis. If the null hypothesis were a correct statement, all the probabilities should fit under this line. The second component is the blue bars. These bars represent the observed cumulative frequency of the probabilities. As you can see, for the 10 ANOVAs we conducted, the blue bars are close to the black line.
This occurred because the null hypothesis is a true statement: the two population means are equal to each other. Move the mouse around the graph. As you do, you will see the numbers in the lower right of the screen change. These numbers represent the location of your mouse. In Figure 14.5, the mouse is at p = .05 and a cumulative percentage of 5%. In this example, we set the population parameters so that the null hypothesis was correct. That is, we ensured that μ₁ = μ₂. As we would expect, the size of the F-ratios followed what we would expect given this situation. We are now ready to begin experimenting with the ANOVA and this program. Specifically, we can use the program to examine such issues as power and the robustness of the ANOVA.

EXAMINING POWER

You should recall that power is the ability to reject the null hypothesis when the null hypothesis is a false statement. Power is affected by several variables, including the sample size, the difference among the populations, the amount of within-group error, and the alpha level the researcher uses to test the null and alternative hypotheses. Researchers attempt to maximize the power of their experiments by optimizing each of these variables. Let’s look at how we can use the program to examine the concept of power as it relates to the ANOVA. In this example we will use a one-way ANOVA with four levels of the independent variable. To set up the experiment, begin at the first page of the Monte Carlo program and click over the Design ANOVA Model button. The program will begin with the request:

Enter Number of Between-S Variables #B/S: 1

Because we are using a simple one-way ANOVA, just press the ENTER key to indicate that there is one between-subjects variable. The program will then ask:

Enter LEVELS of Between-S Variable 1
LEVELS 2

The analysis we want to conduct calls for four levels of the independent variable. Therefore, type 4 and press the ENTER key.
The program will then ask for the name of Between-Subjects variable 1. You can either press the ENTER key or type a short name and then press ENTER. The program now asks for the number of within-subjects variables:

Enter Number of Within-S Variables #W/S: 0

There are no within-subjects variables in this example. Enter a 0 and press the ENTER key. The program will automatically return to the first page. Notice that at the bottom of the screen, the program prints:

ANOVA MODEL: 4

This indicates that we have entered the design of the ANOVA as a one-way ANOVA with four levels of the independent variable. Now that we have entered the design of the ANOVA, we can change the parameters of the population variables. Click the mouse over the Change Parameters button. You will now see a screen like the one in Figure 14.6.

Figure 14.6: Example of the screen for changing the parameters of a simulation. Note that the population means (μ) all equal 5.000, the population standard deviations (σ) equal 2.500, and the sample sizes are 5.

For this example, let’s set the four means to 5.0, 6.0, 7.0, and 8.0. Enter the appropriate value for each mean and press the ENTER key. You have now changed the population means for the four groups. Click the mouse over the Exit button to return to the first screen. For the next step, set the number of iterations to 100. Click the mouse over the 10, press the backspace key to clear space for the new number, and then type 100. When you press the ENTER key, the number of iterations will be reset. You are now ready to start the demonstration. Click the mouse over the Start Demonstration button. The program will automatically begin to generate the numbers for the separate ANOVAs. When the program is done, select the Plot Factors option, and then Factor A. You will then see a graph similar to the one in Figure 14.7.
Your graph will be somewhat different from the one you see below because the computer is generating the numbers at random. Therefore, each run of the simulation will produce a slightly different pattern of results.

Figure 14.7: Frequency distribution of F-ratios produced by the Monte Carlo Simulator.

When you click the mouse over the Cumulative Probabilities option you will see a graph similar to the one in Figure 14.8. This is an important graph because it allows us to examine the power of the ANOVA design. Recall that the black curved line represents the cumulative probability that would be expected if the null hypothesis were true. That is, the line represents the condition where μ₁ = μ₂ = μ₃ = μ₄. As you can see, the actual cumulative probability level is above this line. This fact suggests that the probability of rejecting the null hypothesis is greater than α. According to the graph, the cumulative percentage is about 26%. In other words, the probability of rejecting the null hypothesis is approximately 26%.

Figure 14.8: Cumulative probability distribution created by the Monte Carlo simulator.

By most standards, a power of 26% is small. What can the researcher do to increase the power of a research design? One alternative is to increase the level of α. Although this is an easy and effective strategy, most researchers do not like to use an α larger than .05. The options, therefore, are to reduce the within-group variation, increase the difference among the means, or increase the sample size. As a quick experiment, return to the main page of the ANOVA Monte Carlo simulator, select the Change Parameters option, and then increase the sample size of the groups from 5 to 10. Once you have changed the sample sizes, rerun the simulation and determine the change in power. As you will see, a small increase in sample size greatly increases the power of the research design.
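If you would like to verify these power figures outside the program, the same experiment can be simulated directly. In this hypothetical Python sketch (not part of the program), the critical values 3.24 and 2.87 are the tabled values of F at α = .05 for 3, 16 and 3, 36 degrees of freedom:

```python
import random

def one_way_f(groups):
    """F-ratio for a one-way between-subjects ANOVA."""
    scores = [x for g in groups for x in g]
    grand = sum(scores) / len(scores)
    ss_b = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_w = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_b / (len(groups) - 1)) / (ss_w / (len(scores) - len(groups)))

def power(mus, sigma, n, f_crit, reps=2000, seed=3):
    """Proportion of simulated ANOVAs that reject the null hypothesis."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in mus]
        if one_way_f(groups) >= f_crit:
            hits += 1
    return hits / reps

mus = [5.0, 6.0, 7.0, 8.0]
print(power(mus, 2.5, n=5, f_crit=3.24))   # near the 26% figure in the text
print(power(mus, 2.5, n=10, f_crit=2.87))  # noticeably larger
```

Because the iterations are random, each run gives a slightly different estimate, but doubling the sample size reliably produces a large jump in the rejection rate.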
Figure 14.9: Cumulative probability distribution created by the Monte Carlo simulator.

You should note that this estimate of power is just that, an estimate. True estimates of power can be determined mathematically. Cohen’s book (19XX) contains a complete list of power tables for various statistics and designs. That book, however, examines power under the condition that the assumptions of the statistic are met. This program will allow you to experiment with violations of the assumptions of the ANOVA. As we noted above, there are many ways to experiment with the power of a research design. So far, we have examined the effects of increasing sample size. You can continue to experiment with the program by systematically altering the different parameters of the population. For example, all else being equal, how does increasing or decreasing the within-group variation in each treatment condition affect the results? Similarly, how much difference must there be among the groups to increase power to 80%? Another dramatic means of affecting the power of the ANOVA is altering the design of the research methods. One of the more dramatic alterations involves using a within-subjects research design. Within-subjects research designs are important in the behavioral sciences because they offer the researcher increased power and the ability to detect interesting effects. Let’s take a moment and examine how these designs work. There are two general ways we can use a within-subjects design. The first is known as a repeated-measures design. As the name implies, the repeated-measures design includes many measures taken from the same subject. Here is a simple example. A researcher may be interested in how quickly people forget information. Therefore, she has people memorize a list of words. Once the people memorize the list, the researcher tests their memory once every 3 hours for the next 12 hours.
In this case, the researcher has 4 observations for each subject. The independent variable is the passage of time and the dependent variable is the individual’s performance on the memory test. The second general method for using a within-subjects design is the matched-groups design. In a matched-groups design, the participants are randomly selected but assigned to one of the treatment conditions on the basis of some important characteristic. The researcher assigns the subjects to the groups in such a way that each group has nearly identical, or matched, individuals. The researcher first selects a significant subject variable that he or she believes affects the outcome of the results. Next, the researcher ranks the individuals from highest to lowest on this important variable. Now comes the important part. The researcher begins with the highest-ranking individuals and randomly assigns each to one of the treatment conditions. If there were four treatment conditions, the researcher would take the four highest-scoring subjects and randomly assign each to one of the four treatment conditions. Whether one uses a repeated-measures design or a matched-groups design, the net result is the same: we can estimate the proportion of variance that is due to subject variables. In the within-subjects design, each person serves as his or her own control. In the matched-groups design, the matching variable allows us to identify an important contributing factor to the total variance. The net result is that a within-subjects design has the potential of being more powerful than the equivalent between-subjects design. The increase in power is due to the fact that the within-subjects design allows us to estimate the portion of the total variance that is unique to differences among subjects. Because we have identified a new source of variation, the total error term of the ANOVA is reduced. Here is a simple experiment you can perform.
Return to the original page of the program and select the Design ANOVA Model option. Then enter the following information:

Enter Number of Between-S Variables #B/S: 0
Enter Number of Within-S Variables #W/S: 1
Enter LEVELS of Within-S Variable 1
LEVELS 4

After the program asks for the name of the within-subjects variable, it will ask for the correlation among the groups. Accept the default value of .750 by pressing the ENTER key:

Enter r = .750

When you are finished, the program will note at the bottom of your screen:

ANOVA MODEL: [4]

The model indicates that you entered a one-way ANOVA with repeated measures. Now change the four population means to 5, 6, 7, and 8. What we have done is create a situation similar to the earlier between-subjects design. The only difference is that we now know that there is a significant correlation among the subjects’ scores across the treatment conditions. Run 100 iterations and then look at the cumulative probability. The following graph is from a simulation using the same parameters. Compare this graph to the one in Figure 14.9. As you can see, the power for the within-subjects design is much greater.

Figure 14.10: Cumulative probability distribution created by the Monte Carlo simulator.

ROBUSTNESS OF THE ANOVA

Robustness is a statistical term that refers to the ability of an inferential statistic to afford accurate inferences when the mathematical assumptions of the statistic cannot be met. More specifically, robustness refers to the degree to which the rate of Type I errors is held constant when the assumptions are violated. Let’s look at an example for the sake of illustration. One of the key assumptions of the ANOVA is homogeneity of variance. When we conduct an ANOVA we assume that the variances of all groups are equal. This is an important assumption because the denominator of the F-ratio is known as a pooled error term. This phrase means that the mean-squares error is really a type of average variance.
If there are large differences among the variances, then the pooled error term may not accurately reflect the typical variance for any group. If there are large differences among the variances, will the ANOVA continue to provide useful information about the variance among groups? From the perspective of hypothesis testing, will the ANOVA create too many or too few Type I and Type II errors? If the ANOVA is robust, then the rate of Type I errors will remain relatively constant. If the test is not robust, then the rate of Type I errors will be greatly increased or decreased. Indeed, let’s see what would happen if we violated the assumption of homogeneity. Return to the first page and design a one-way ANOVA with four levels of the independent variable. In the Change Parameters option, set all population means to 5 and the sample sizes to 10. Now change the four standard deviations to 1, 2, 4, and 6. This arrangement will create a considerable amount of heterogeneity of variance. How will this arrangement affect the robustness of the ANOVA?

Figure 14.11: Cumulative probability distribution created by the Monte Carlo simulator.

As you can see in Figure 14.11, the violations of the homogeneity assumption had minimal influence on the rate of Type I errors around the conventional testing area of α = .05. In this example, there was a slight inflation of Type I errors, but it appears to be minimal. Let’s look at what happens, however, when the sample sizes are not equal. Return to the Change Parameters option and change the sample sizes to 4, 6, 8, and 10. When you run a simulation under these conditions you may produce a cumulative percentage graph like the one in Figure 14.12.

Figure 14.12: Cumulative probability distribution created by the Monte Carlo simulator.

The results are dramatic!
The violation of the homogeneity assumption coupled with unequal sample sizes greatly reduced the frequency of Type I errors.
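The heterogeneity experiments above can likewise be reproduced directly. In this hypothetical Python sketch (not part of the program), all population means are equal, so every rejection is a Type I error; 3.01 is the tabled value of F at α = .05 for 3 and 24 degrees of freedom:

```python
import random

def one_way_f(groups):
    """F-ratio for a one-way between-subjects ANOVA with unequal n."""
    scores = [x for g in groups for x in g]
    grand = sum(scores) / len(scores)
    ss_b = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_w = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_b / (len(groups) - 1)) / (ss_w / (len(scores) - len(groups)))

def type1_rate(sigmas, ns, f_crit, reps=4000, seed=11):
    """All population means equal 5, so each rejection is a Type I error."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        groups = [[rng.gauss(5.0, s) for _ in range(n)]
                  for s, n in zip(sigmas, ns)]
        if one_way_f(groups) >= f_crit:
            hits += 1
    return hits / reps

# Larger variances paired with larger samples: the test becomes conservative,
# rejecting less often than the nominal 5% rate, as in Figure 14.12.
print(type1_rate(sigmas=[1, 2, 4, 6], ns=[4, 6, 8, 10], f_crit=3.01))
```

Reversing the pairing, so that the largest variance goes with the smallest sample, has the opposite effect and inflates the Type I error rate; you can confirm this with the program or with the sketch above.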