LAB 8 Sampling Distributions

The Situation:

Joe, Cally, and Talia are three clerks who work in the small claims office at the government building in a large metropolitan city. Their supervisor periodically checks up on them to see how long it takes each of them to process claims. To deal with the large volume of small claims that come through the office each day, each clerk must take 6 minutes or less, on average, to process a claim. It would be too time consuming, if not impossible, for the supervisor to monitor every transaction performed by each clerk. The supervisor needs to decide on the best way to make a reliable estimate of the time it takes each clerk to process claims. In general, she has one of two options:

1) The supervisor will randomly pick a small sample (somewhere between 2 and 4 claims) and calculate the average time it takes a clerk to process claims.

or

2) The supervisor will randomly pick a large sample (somewhere between 16 and 25 claims) and calculate the average time it takes a clerk to process claims.

If a clerk takes longer than 6 minutes, on average, to process small claims, the clerk may be moved to a less demanding position, or possibly fired. The supervisor does not want to make an incorrect decision and dismiss a clerk who is actually performing the job well. Your task is to find the method that will help the supervisor make the best decision.

Overview:

The supervisor isn’t able to observe the entire population of claims for a particular clerk. She has to make her decision based on a single sample of the worker’s performance. From this sample the supervisor can calculate statistics like the sample mean and sample median. However, we know that if she took a different sample she could get different values for the sample statistics. Thus, she shouldn’t expect the sample mean to be exactly equal to the population mean. Let’s find out how well the sample mean estimates the population mean.
In this lab you will use a program developed by Robert delMas at the University of Minnesota to investigate distributions of sample means. Once you understand how sample means behave, you can decide which monitoring method is best for the employees.

Case 1: Population is Normal (Joe)

Let’s start with Joe. On average, Joe takes 5 minutes to process a claim, and his processing times follow a Normal distribution. However, the supervisor does not know this information, and must base all conclusions on sample data. We want to know the chances of Joe getting unlucky and showing his supervisor a sample mean of more than 6 minutes per claim, even though he’s a 5-minute claim-processor. To find this out, we need to know the behavior of the sampling distribution of the sample mean.

Program Instructions:

Locate and double-click the Sampling SIM program. You will see a screen similar to the figure on the right. The program lets you create predefined population distributions by simply clicking on a button. Hold the mouse button down on the button, slide the mouse down to highlight NORMAL, and let go. This creates a population with the shape of a Normal distribution (see figure above). The characteristics of this population distribution are displayed below the graph for the population, e.g. a population mean of μ = 5 and a standard deviation of σ = 1.805. This graph represents the population distribution of Joe’s claim-processing times. The mean (blue μ) and median (red M) are represented by vertical lines.

1. Let’s simulate Monitoring Method 1, where the supervisor picks two claims at random.
Go to the Menu bar and select Windows -> Sampling Distribution.
The Sample Size is set to 1. Change the Sample Size to 2.
Click once on the New Series button so it reads Add More.
Click once on the button labeled Draw Samples. The program will draw one sample, calculate its sample mean, and then place a green square in the graph area to represent the location of the sample mean.
Click again (just once) on the Draw Samples button. A second sample is drawn and its sample mean is plotted on the graph. Look at the box labeled Total Samples Drawn. The number 2 is in the box, indicating that a total of 2 samples have been drawn. Click on the Draw Samples button eight more times so that you have a total of 10 sample means plotted.

2. Change the value in the Number of Samples box from 1 to 490. Now click on Draw Samples; the program will draw 490 more samples and plot the sample mean for each, so that the Total Samples Drawn box shows 500. With a total of 500 samples, you get an even better idea of how the sample mean varies from sample to sample.

3. Go to the worksheet column labeled Joe: Normal Distribution and sketch a graph that matches the graph created by the Sampling Distributions program. Be sure to label the x axis. Just below where you placed the graph, you will see a place to record the Mean of x̄ and the sd of x̄.
The MEAN of Sample Means box on the computer screen tells you the average of the sample means. This measures the center of the distribution of sample means. Locate this value on the computer screen and write it in for the Mean of x̄ just below where you placed the graph.
The Standard Dev. of Sample Means box on the computer screen tells you the standard deviation of the sample means. This measures the variability among the sample means. Locate this value on the computer screen and write it in for the sd of x̄.

4. If the Central Limit Theorem (CLT) holds, we would expect the sd of x̄ to be close to σ/sqrt(n), where σ is the original population standard deviation and n is the sample size.

5. Move the red tab to find the actual proportion of sample means (x̄) that fell above 6. On your worksheet, write the proportion of times that Joe would have had a sample that his supervisor would consider poor.

STOP. Before you proceed, ask the instructor to check your work.

Repeat each step for sample sizes of 9 and 16.
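As a rough check on the values recorded for Joe, the sketch below computes σ/sqrt(n) and the Normal-theory chance that a sample mean exceeds 6 minutes, and then repeats the 500-sample experiment. It assumes Joe's population is exactly Normal with μ = 5 and σ = 1.805. The use of Python, the function names, and the random seed are illustrative assumptions; none of them come from the Sampling SIM program, whose results will differ somewhat.

```python
import math
import random
import statistics

MU = 5.0       # Joe's population mean (from the lab text)
SIGMA = 1.805  # Joe's population standard deviation (from the lab text)
CUTOFF = 6.0   # supervisor's limit in minutes per claim (from the lab text)


def theoretical_sd(n):
    """Standard deviation of the sample mean for samples of size n."""
    return SIGMA / math.sqrt(n)


def prob_mean_above_cutoff(n):
    """P(sample mean > CUTOFF), treating the sample mean as Normal(MU, SIGMA/sqrt(n))."""
    z = (CUTOFF - MU) / theoretical_sd(n)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail area of the standard Normal


def simulate(n, num_samples=500, seed=8):
    """Draw num_samples samples of size n from the assumed Normal population.

    Returns (mean of sample means, sd of sample means, proportion above CUTOFF).
    The seed is an arbitrary choice so that the run is repeatable.
    """
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
             for _ in range(num_samples)]
    proportion_above = sum(m > CUTOFF for m in means) / num_samples
    return statistics.fmean(means), statistics.stdev(means), proportion_above


for n in (2, 9, 16):
    sim_mean, sim_sd, sim_prop = simulate(n)
    print(f"n={n:2d}  sigma/sqrt(n)={theoretical_sd(n):.3f}  simulated sd={sim_sd:.3f}  "
          f"P(mean>6): theory={prob_mean_above_cutoff(n):.3f}  simulated={sim_prop:.3f}")
```

Under these assumptions the theoretical chance of a sample mean above 6 is about 0.22 for n = 2, about 0.05 for n = 9, and about 0.01 for n = 16. The simulated proportions, like the ones read from the red tab, vary around these values from run to run.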
Case 2: The Population is not Normal (Talia)

Now, let’s take a look at another employee’s claim-processing times. Talia’s claim-processing times follow a skewed left distribution with mean μ = 6.81 and standard deviation σ = 2.063. How will the means for samples taken from her distribution behave? To find out, return to the Population window (Windows -> Population), click on Normal, and then select "Skewed -". This will create Talia’s skewed left population distribution. This distribution is also shown at the top of the second column of the worksheet. Follow the previous steps to look at distributions of sample means for the same three sample sizes we used with Joe’s population (n = 2, n = 9, and n = 16).

Case 3: Erratic Behavior (Cally)

Now we look at an employee whose claim-processing times are quite irregular. Cally is pretty erratic; the population of her claim-processing times is presented at the top of the third column on the worksheet. Her distribution has a population mean of μ = 5 and a standard deviation of σ = 3.410. What will distributions of sample means from Cally’s population look like? To find out, return to the Population window (Windows -> Population). You will see four buttons at the bottom of the Population window. Each button has the outline of a distribution. Locate the last button along the bottom with the blue outline. Click once on the button to create Cally’s population. Make sure you have the correct population by checking that μ = 5 and σ = 3.410. Follow the previous steps to look at distributions of sample means for the same three sample sizes we used with Joe’s population (n = 2, n = 9, and n = 16).

Sampling Distributions Activity – Part 1

For a sample size of n = 2 (the first graph), how does the SHAPE of the distribution of 500 sample means compare to the shape of the population?

For a sample size of n = 9 (the second graph), how does the SHAPE of the distribution of 500 sample means compare to the shape of the population?
For a sample size of n = 16 (the third graph), how does the SHAPE of the distribution of 500 sample means compare to the shape of the population?

For all sample sizes n = 2, 9, and 16, how does the Mean of x̄ compare to the MEAN of the population (the value of μ from the Sampling Distribution Worksheet)?

Response options (answer separately in each worksheet column: Joe, Normal Dist.; Talia, Negative Skew; Cally, Irregular Shape):

SHAPE questions, one answer for each of n = 2, n = 9, and n = 16:
Very Different / Different / A Little Different / About the Same

Mean of x̄ question:
Much Lower / A Bit Lower / About the Same / A Bit Higher / Much Higher

LARGEST variability question:
n = 2 / n = 9 / n = 16

SMALLEST variability question:
n = 2 / n = 9 / n = 16

GREATER than the population standard deviation question:
YES / NO

For all sample sizes n = 2, 9, and 16, describe how the STANDARD DEVIATION of the sample means (sd of x̄) compares to the STANDARD DEVIATION of the population (the value of σ from the Sampling Distribution Worksheet).

Which sample size produced a distribution of sample means with the LARGEST variability (largest value for the sd of x̄)?

Which sample size produced a distribution of sample means with the SMALLEST variability (smallest value for the sd of x̄)?

Look at the value for the sd of x̄ for all three sample sizes. Are any of the values GREATER than the standard deviation for the population (larger than the value of σ)?

Sampling Distributions Scrapbook

Score: ____ out of 3    Score: ____ out of 3    Score: ____ out of 3
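The worksheet comparisons can also be previewed outside the program. The sketch below draws 500 samples of each size from a hypothetical left-skewed population and reports the mean and sd of the sample means next to μ and σ/sqrt(n). The population is an assumption: a Beta distribution rescaled to 0 to 10 minutes, with its two parameters chosen only so that the mean and standard deviation match Talia's stated values (6.81 and 2.063). The actual shape used by Sampling SIM is not known here, and the 10-minute upper limit, the names, and the seed are all illustrative.

```python
import math
import random
import statistics

MU = 6.81        # Talia's population mean (from the lab text)
SIGMA = 2.063    # Talia's population standard deviation (from the lab text)
MAX_TIME = 10.0  # ASSUMPTION: longest possible processing time, in minutes

# Method-of-moments Beta parameters on the 0..1 scale (an assumed stand-in shape).
_m = MU / MAX_TIME
_v = (SIGMA / MAX_TIME) ** 2
_k = _m * (1 - _m) / _v - 1
ALPHA = _m * _k        # larger than BETA, so the distribution is skewed left
BETA = (1 - _m) * _k


def draw_claim_time(rng):
    """One processing time from the assumed left-skewed population."""
    return MAX_TIME * rng.betavariate(ALPHA, BETA)


def summarize(n, num_samples=500, seed=8):
    """Return (mean, sd) of num_samples sample means, each from a sample of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw_claim_time(rng) for _ in range(n))
             for _ in range(num_samples)]
    return statistics.fmean(means), statistics.stdev(means)


for n in (2, 9, 16):
    mean_of_means, sd_of_means = summarize(n)
    print(f"n={n:2d}  mean of sample means={mean_of_means:.3f} (mu={MU})  "
          f"sd of sample means={sd_of_means:.3f}  sigma/sqrt(n)={SIGMA / math.sqrt(n):.3f}")
```

Swapping in Joe's or Cally's μ and σ changes the numbers, but the same comparison applies: set the reported sd of the sample means beside σ and beside σ/sqrt(n) for each sample size.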