Creating a Random Data Set for the Math SAT Scores for Simulating a Hypothesis Test

In data analysis courses, data are often fabricated or altered for classroom use. The goal of this article is to showcase the variability that results when such a data set is generated with a small sample size. In the summer of 2015, I collaborated with graduate students Chris and Jamie to generate a sample of what we called “dummy” data for normally distributed student Math SAT scores. In what follows, Chris’s model of the SAT scores had 100 students, while Jamie’s model used 1,000 students. The description of how to create the model is based on Chris’s work. We assume that the reader can write functions for a collection, graph data in parallel dot plots, and use the history tool to capture random samples from a collection. For more information on simulating the hypothesis test on two means, please consult the Hypothesis Testing pages on this Wiki.

The College Board provided 2014 statistics for almost 1.7 million students, 47% male and 53% female. The population mean scores were 530 for males and 499 for females, and the standard deviations were 123 and 114, respectively.

To begin, Chris added 100 cases to an empty case table by selecting the command New Cases in the Data menu. She labeled the first attribute Gender. In TinkerPlots, the caseindex function assigns each case in the collection a value from 1 to n. The formula for Gender used an if-then-else command with caseindex to designate the first 47 cases as males and the remaining 53 cases as females. Values for the second attribute, labeled Score, were randomly generated using the formulas randomNormal(530, 123) for males and randomNormal(499, 114) for females. The if-then-else logic in Chris’s model is shown in Figure 1.

Figure 1.
Math SAT data set for 100 students

After graphing the data, Chris realized that the random normal function generated values below 200 points and above 800 points, which are not valid Math SAT scores. For example, the male student in row 45 has a score of 851.329 (see figure 1). The model had to be adjusted so that all of the scores would be valid. One option was to drop the extreme data points from the model. The other option was to change all of the scores that fell below 200 to 200 and all of the scores that fell above 800 to 800. We chose to change the scores so that the ratio of males to females was preserved. The attribute New_Score was added as the third column in the case table (see figure 2). Notice that male student number 27’s score of 813.49 was changed to 800. TinkerPlots’ switch function allows for a series of logic tests on the same attribute, which is more efficient than nested if-then-else commands. The three options in the switch command were to lower Score to 800, raise it to 200, or leave it unchanged.

Kahthryn G. Shafer © January 2016

Figure 2. Revised data set with valid scores in the third column

After creating the dummy data set, Chris plotted the data and measured the difference of the means with the ruler tool (see figure 3). Chris found that the difference between the male and female groups was 79 points (579 – 500 = 79 points). This difference was over twice the 31-point difference found in the population means (530 – 499 = 31 points). Since Chris continued to work with this model, the results of her simulated hypothesis test were not accurate.

Figure 3. Parallel dot plot of male and female math SAT scores and the difference of the means

In order to further illustrate the danger of creating this particular data set with only 100 data points, I generated four more random data sets (collections) of 100 students each. The differences of the mean Math SAT scores for the groups of male and female students were 47, -2, -21, and 52, respectively (see figures 4a-d).
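
The construction described so far can be approximated outside TinkerPlots. The following is a minimal Python sketch, not Chris's actual model. It assumes that `random.gauss` is an acceptable stand-in for randomNormal and uses an `if`/`elif` chain in place of the switch command; the function names (`new_score`, `make_dummy_data`, `difference_of_means`) and the seed values are hypothetical. It generates male and female scores, moves invalid scores to 200 or 800, and repeats the process to show how much the difference of the means changes from one data set to the next.

```python
import random
from statistics import mean

# Population statistics quoted in the article (College Board, 2014).
MALE_MEAN, MALE_SD = 530, 123
FEMALE_MEAN, FEMALE_SD = 499, 114


def new_score(score):
    """Mimic the New_Score attribute: force a raw score into 200..800."""
    if score > 800:
        return 800
    elif score < 200:
        return 200
    return score


def make_dummy_data(n_male, n_female, rng):
    """Return a list of (gender, new_score) pairs, with the males first."""
    data = []
    for case_index in range(1, n_male + n_female + 1):
        # The case index decides the gender, as in the Gender formula.
        if case_index <= n_male:
            gender, raw = "male", rng.gauss(MALE_MEAN, MALE_SD)
        else:
            gender, raw = "female", rng.gauss(FEMALE_MEAN, FEMALE_SD)
        data.append((gender, new_score(raw)))
    return data


def difference_of_means(data):
    """Mean male score minus mean female score."""
    males = [score for gender, score in data if gender == "male"]
    females = [score for gender, score in data if gender == "female"]
    return mean(males) - mean(females)


if __name__ == "__main__":
    # Seeds are arbitrary; they only make each run repeatable.
    for seed in range(1, 5):
        rng = random.Random(seed)
        small = difference_of_means(make_dummy_data(47, 53, rng))
        large = difference_of_means(make_dummy_data(470, 530, rng))
        print(f"seed {seed}: 100 students {small:6.1f}, 1,000 students {large:6.1f}")
```

The printed differences depend on the random number generator and will not match the values in the figures. The sketch is meant only to let a reader compare how widely the difference varies for 100 students versus 1,000 students.
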
The difference of the means for the statistics reported by the College Board for males and females was 31 points. The only dummy data set that came close was the first collection, with a difference of 47 points. Because the valid SAT scores span a range of 600 points and only 100 data points were generated, there was too much variability in the model. My results indicate what might happen if a teacher assigned this task to four different groups of students.

Figure 4a-d. Difference of the means for 47 males and 53 females generated from four different randomizations of dummy data.

Jamie’s T-Test Simulation

Jamie followed the same steps completed by Chris to create a set of random “dummy” data with 1,000 students instead of 100. Her formulas for Gender and SAT-mathscore are shown in Figure 5. Figure 6 shows parallel dot plots separated by gender. Excerpts from Jamie’s simulated hypothesis test are shown in the graphics and text boxes below.

Figure 5. Data formulas for Math SAT scores

Figure 6. Parallel dot plots of male and female Math SAT scores

The difference in means for my original data set is 21 points. The males scored 21 points higher than the females on the math [SAT].

Null hypothesis: There is no difference in the mean for males and females on the math section of the SAT.

Alternate hypothesis: The mean score for males is greater than the mean score for females on the math section of the SATs.

While simulating the hypothesis test, we encountered a challenge because the New_Score attribute re-randomized the dummy data set every time a sample was generated for assignment to group A or B. For example, say case 1 is a male with a Math SAT score of 388 in the first iteration of the simulation. Without “fixing” the data, Student 1 could have a Math SAT score of 215 in the second iteration, and a score of 780 in the third iteration.
I suggested that we “fix” the dummy data by copying and pasting it into Excel and then moving it back into a new case table. [With TinkerPlots version 2.3, simply copy and paste the collection into a new case table.] The following text and figures 7, 8, and 9a-d illustrate feasible results of the hypothesis test simulation completed by Jamie.

Figure 7: I will mix up the data to see if I can find a difference that high when there are both males and females together. I will keep my sample sizes the same (530, 470). I will create two more columns in my table (Rank: uniqueRank(random()) and Group: if(Rank<531){"A","B"}).

Figure 8: I plotted the new score by group. With gender clicked on, you can see that the groups are a mix of males and females. I will find the new difference in means then use the history tool to create 99 more samples. I will create 100 samples 4 times to see how many means in each group of 100 are greater than or equal to 21.

Figure 7. Case table with random group assignment formulas

Figure 8. Parallel dot plots of a single random group assignment

Interpretation of the results: After 4 samples of 100 differences in means, only 1 case was above 21 points. The results were 0,0,0,1. As a result, I reject the null hypothesis. There is a statistically significant relationship between gender and math SAT score.

Figure 9a-d. Results of 100 differences of the means

Improvements and Discussion

As I reviewed the SAT projects completed by Chris and Jamie, I noticed that the scores were not integers. An improvement to the model would be to use a round function when the scores are randomly generated. What other improvements could be made to this simulation? Would it be helpful to randomize the data until both of the means and standard deviations are relatively close to the given statistics? What would happen if the invalid scores were simply excluded from the dummy data set?
Should the teacher create the data or allow students to run the simulation with a small data set knowing that their results might be outrageous? These are issues that classroom teachers need to consider when creating “dummy” data for their students, or when they ask students to create their own models.
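
Jamie's random group assignment (a random rank for every case, the first 530 ranks in group A and the remaining 470 in group B, repeated with the history tool) can also be sketched in Python. This is an illustrative sketch under stated assumptions rather than a reproduction of her TinkerPlots document: shuffling a fixed list stands in for uniqueRank(random()), the function names and the seed are hypothetical, the scores are freshly generated stand-ins for her data, and the observed difference of 21 points is simply passed in as a number.

```python
import random
from statistics import mean


def shuffled_difference(scores, size_a, rng):
    """Randomly assign scores to groups A and B; return mean(A) - mean(B)."""
    mixed = list(scores)   # copy, so the fixed data set is never altered
    rng.shuffle(mixed)     # plays the role of uniqueRank(random())
    group_a, group_b = mixed[:size_a], mixed[size_a:]
    return mean(group_a) - mean(group_b)


def count_at_least(scores, size_a, observed, trials, rng):
    """Count the random assignments whose difference reaches `observed`."""
    return sum(
        shuffled_difference(scores, size_a, rng) >= observed
        for _ in range(trials)
    )


if __name__ == "__main__":
    rng = random.Random(2016)  # arbitrary seed, used only for repeatability

    # Hypothetical stand-in for the fixed 1,000-case data set: 470 male and
    # 530 female scores, generated once and limited to the range 200..800.
    males = [min(800, max(200, rng.gauss(530, 123))) for _ in range(470)]
    females = [min(800, max(200, rng.gauss(499, 114))) for _ in range(530)]
    fixed_scores = males + females

    # Four runs of 100 random group assignments, as in Jamie's write-up.
    for run in range(1, 5):
        hits = count_at_least(fixed_scores, 530, observed=21, trials=100, rng=rng)
        print(f"run {run}: {hits} of 100 differences were at least 21")
```

Because the list of scores is created once and afterwards only copied and shuffled, each case keeps the same score in every iteration. This is the same effect the article achieves by pasting the data into a new case table.
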