+ Using StatCrunch to Teach Statistics Using Resampling Techniques
Webster West
Texas A&M University
+ Background
George Cobb started the interest in using resampling methods for introductory statistics with his plenary talk at the First USCOTS in 2007. Several groups are now working on integrating these approaches into the curriculum. I have added numerous resampling procedures to StatCrunch, which is widely used in teaching introductory statistics. Roger Woodard and I have our INCIST NSF grant to develop teaching materials that incorporate these methods. We have conducted numerous workshops around the country where we have presented these materials to statistics teachers.
+ A Randomization Activity
Students were randomly assigned to a version of an exam (Yellow or Green) when they entered the classroom. Afterwards, both sets of students complained about the exam, saying their version was harder. Students investigate whether the observed difference of 6.3 between the two group means could plausibly occur by random chance alone. They shuffle cards with scores written on them into yellow and green groups and then calculate the difference between the two means. They place a Post-it note on a whiteboard in the proper location and evaluate the resulting randomization distribution.
+ The randomization applet
+ A Sampling Distribution Activity
A very inconvenient printed roster of 12,000 students at a fictitious university is provided to the students. They build a sampling distribution by each collecting a random sample of size 30 and reporting the mean number of Facebook friends for their sample via a StatCrunch survey. From the sampling distribution, students see that the normal curve is a good descriptor of the sampling variability of the sample mean. This leads to the CLT and the idea of statistic ± 2×standard error as a 95% confidence interval for an unknown population mean.
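The sampling distribution activity above can be mimicked in a few lines of code. The following is a minimal sketch, not the StatCrunch implementation: the roster is replaced by a made-up, right-skewed population of 12,000 friend counts, and the function name `sampling_distribution` is hypothetical. It draws 1000 samples of size 30, uses the spread of the sample means as the standard error, and forms a statistic ± 2×standard error interval for one student's sample.

```python
import random
import statistics


def sampling_distribution(population, sample_size, num_samples, rng):
    """Return the means of num_samples simple random samples (without replacement)."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


rng = random.Random(2024)

# ASSUMPTION: a made-up stand-in for the printed roster, 12,000 right-skewed
# friend counts with a mean near 300. The real roster data is not reproduced here.
population = [int(rng.expovariate(1 / 300)) for _ in range(12_000)]
population_mean = statistics.mean(population)

# 1000 sample means, each based on 30 students.
means = sampling_distribution(population, sample_size=30, num_samples=1000, rng=rng)

# The standard deviation of the sample means estimates the standard error.
standard_error = statistics.stdev(means)

# One student's sample and the resulting statistic +/- 2 x standard error interval.
one_sample = rng.sample(population, 30)
estimate = statistics.mean(one_sample)
interval = (estimate - 2 * standard_error, estimate + 2 * standard_error)

# Fraction of the 1000 sample means that fall within 2 standard errors of the
# population mean; this should be close to 95% if the normal curve fits well.
coverage = sum(
    abs(m - population_mean) <= 2 * standard_error for m in means
) / len(means)

print(f"standard error of the mean: {standard_error:.1f}")
print(f"one student's interval: ({interval[0]:.1f}, {interval[1]:.1f})")
print(f"coverage across 1000 samples: {coverage:.1%}")
```

Even though the made-up population is strongly skewed, the coverage figure typically lands near 95%, which is the point students are meant to see from the whiteboard or survey results.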
+ A Sampling Distribution Activity
Mary’s Sample Mean
+ A Sampling Distribution Activity
The instructor then uses their access to the data in electronic form within StatCrunch to compute 1000 sample means, with each sample mean based on a sample of 30 students. This is like doing the activity with 1000 students instead of 10.
+ A Bootstrapping Activity
Building off the sampling distribution activity, students are then tasked with estimating the standard error of the sample mean using a single sample. Using the sample as a proxy for the population, each student collects 30 resamples taken with replacement from the common sample data. Each student reports the mean of their bootstrap sample. The student results are then augmented with applet results to compute a 95% confidence interval for the population mean.
+ The bootstrapping applet
+ What we have learned about the randomization approach
This approach appeals nicely to a basic intuition that most people have about the problem. The tactile simulation adds a great deal of value in terms of students' understanding of what is taking place in the applet. It can be introduced with little or no background in other statistical concepts such as normal theory or even the jargon of hypothesis testing. It can easily be used at a variety of time points in the standard introductory course. Students and instructors seem to really take to it!
+ What we have learned about the bootstrapping approach
The bootstrap can be used to reinforce the idea of a sampling distribution. The bootstrap approach requires a great deal of backstory before it can be effectively introduced. It is probably best to rely on technology alone for the bootstrap after the student does the sampling activity in a more tactile way. Students seem to like it, but instructors not so much!
The bootstrap may also lead to possible misconceptions on the part of students: “Am I getting a confidence interval for the sample mean?”
+ For discussion
With these approaches, two people can get different results even when using the same data set. Taking a simple random sample from a large list of values is difficult to do in a tactile way. Must we always rely on help from the computer? How can we make one-sample problems more interesting to students? Students are interested in samples but not in population parameters! Why do we care about the mean number of Facebook friends? Are we too focused on inference in the introductory course? How often will the average student ever be confronted with a random sample?
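The first discussion point, that two people can get different results from the same data, can be illustrated with a short sketch. The code below is an assumption-laden illustration and not the StatCrunch applets: the exam scores are invented (their difference in means is not the 6.3 from the activity), and the function names `randomization_p_value` and `bootstrap_interval` are hypothetical. Each procedure is run twice on identical data with different random seeds.

```python
import random
import statistics


def randomization_p_value(group_a, group_b, num_shuffles, rng):
    """Two-sided p-value for the difference in means under random relabeling."""
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(num_shuffles):
        rng.shuffle(pooled)  # like shuffling the score cards
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / num_shuffles


def bootstrap_interval(sample, num_resamples, rng):
    """Sample mean +/- 2 x bootstrap standard error."""
    n = len(sample)
    boot_means = [
        statistics.mean(rng.choices(sample, k=n))  # resample with replacement
        for _ in range(num_resamples)
    ]
    standard_error = statistics.stdev(boot_means)
    center = statistics.mean(sample)
    return (center - 2 * standard_error, center + 2 * standard_error)


# ASSUMPTION: invented exam scores for illustration only.
yellow = [78, 85, 69, 91, 74, 88, 80, 95, 72, 83]
green = [70, 76, 81, 65, 79, 68, 73, 84, 62, 77]

# Same data, two different seeds: think of two students running the applet.
p_first = randomization_p_value(yellow, green, 2000, random.Random(1))
p_second = randomization_p_value(yellow, green, 2000, random.Random(2))

ci_first = bootstrap_interval(yellow, 1000, random.Random(1))
ci_second = bootstrap_interval(yellow, 1000, random.Random(2))

print(f"randomization p-values: {p_first:.4f} vs {p_second:.4f}")
print(f"bootstrap intervals: {ci_first} vs {ci_second}")
```

The two runs usually agree to within a small margin, and that margin shrinks as the number of shuffles or resamples grows. Note also that the bootstrap interval is centered on the sample mean but is an interval for the unknown population mean, which speaks to the misconception quoted above.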