Minitab Simulation to examine the sampling distribution of the sample proportion p-hat

We will use Minitab to simulate randomly selecting samples from a population with a known proportion of “success” (i.e., with a population in which a given proportion have a particular characteristic of interest). Open Minitab (no need to open a data file) and follow these steps:

1) To simulate selecting a sample of size 20 from a population with p = 0.5 (50% of the population has the characteristic of interest):
   Calc > Random data > Binomial
   Generate 500 rows of data
   Store in X1
   Number of trials = 20
   Probability of success = 0.5
   Now you have the results for 500 samples of size 20. The counts X (the values in X1) are the number in each sample who have the characteristic of interest.

2) To calculate the proportion of “success” for each of these 500 random samples of size 20:
   Calc > Calculator
   Store result in variable: p-hat20
   Expression: X1 / 20

3) Now we will examine the distribution of the 500 sample proportions. Make a histogram of p-hat20. Describe the distribution. Where is the distribution centered? Use Stat > Basic Statistics > Descriptive Statistics to find the mean and the standard deviation of the 500 sample proportions.
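If Minitab is not at hand, steps 1 to 3 can be approximated in a few lines of Python. This is a minimal sketch that is not part of the original lab, assuming Python 3 with only the standard library. Each count is built as a sum of 20 Bernoulli(0.5) draws, which has the same Binomial(20, 0.5) distribution that Calc > Random data > Binomial samples from. The helper name `simulate_p_hats` and the seed value are hypothetical choices for illustration, so your numbers will differ somewhat from your Minitab output. `statistics.stdev` uses the n - 1 divisor, which matches the usual sample standard deviation.

```python
import random
import statistics


def simulate_p_hats(n, p, reps=500, seed=None):
    """Return `reps` simulated sample proportions, each from a sample of size n
    drawn from a population with proportion p of "successes"."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(reps):
        # Number of "successes" in one sample: a Binomial(n, p) count (one row of X1).
        count = sum(1 for _ in range(n) if rng.random() < p)
        # Sample proportion for that sample (the Calculator step X1 / n).
        p_hats.append(count / n)
    return p_hats


# Steps 1-2: 500 samples of size 20 from a population with p = 0.5.
p_hat20 = simulate_p_hats(n=20, p=0.5, seed=1)

# Step 3: center and spread of the 500 sample proportions.
print("mean of p-hat20 =", round(statistics.mean(p_hat20), 4))
print("std dev of p-hat20 =", round(statistics.stdev(p_hat20), 4))
```

Because every count is a whole number between 0 and 20, each simulated p-hat is a multiple of 1/20. This is why the histogram of p-hat20 looks "gappy" compared with the histograms for larger samples.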
4) To simulate selecting a sample of size 70 from a population with p = 0.5 (50% of the population has the characteristic of interest) and storing the number in the sample who also have that characteristic in X2:
   Calc > Random data > Binomial
   Generate 500 rows of data
   Store in X2
   Number of trials = 70
   Probability of success = 0.5

5) To calculate the proportion of “success” for each of these 500 random samples of size 70:
   Calc > Calculator
   Store result in variable: p-hat70
   Expression: X2 / 70

6) To simulate selecting a sample of size 200 from a population with p = 0.5 (50% of the population has the characteristic of interest) and storing the number in the sample who also have that characteristic in X3:
   Calc > Random data > Binomial
   Generate 500 rows of data
   Store in X3
   Number of trials = 200
   Probability of success = 0.5
   Now you have the results for 500 samples of size 200. The counts X (the values in X3) are the number in each sample that have the characteristic of interest.

7) To calculate the proportion of “success” for each of these 500 random samples of size 200:
   Calc > Calculator
   Store result in variable: p-hat200
   Expression: X3 / 200

8) Now we will examine the distribution of the 500 sample proportions for samples of size 20, 70, and 200 by viewing three histograms:
   Graph > Histogram
   Graph the variables p-hat20, p-hat70, and p-hat200.
   Click on Multiple Graphs. Select ‘Same X’ and ‘Same Y’.
   Describe and compare the three distributions. How are they similar? How are they different? Use Stat > Basic Statistics > Descriptive Statistics to find the means and the standard deviations.

9) Fill in the following table:

   Population proportion = 50% (p = 0.5 is fixed; n changes)

   Sample size n   Mean of the 500 sample proportions   Std dev of the 500 sample proportions
   20              ________                             ________
   70              ________                             ________
   200             ________                             ________

Refer to Example 3.32 on page 214. We will use Minitab to do the simulation described there. Note: we’ll define “success” here to mean that a person finds clothes shopping frustrating.
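Before starting the Example 3.32 simulation, note that steps 4 to 9 (p fixed at 0.5, n changing) can also be approximated outside Minitab. This is a minimal sketch, not part of the original lab, assuming Python 3 with only the standard library. The helper `simulate_p_hats` and the per-n seeds are hypothetical, and the printed values are one random realization rather than the "right answers" for the table in #9.

```python
import random
import statistics


def simulate_p_hats(n, p, reps=500, seed=None):
    """Simulate `reps` sample proportions, each from a sample of size n (proportion p)."""
    rng = random.Random(seed)
    # Each entry: a Binomial(n, p) count of "successes" divided by n.
    return [sum(1 for _ in range(n) if rng.random() < p) / n for _ in range(reps)]


# Steps 4-9: p stays at 0.5 while the sample size n changes.
results = {}
for n in (20, 70, 200):
    p_hats = simulate_p_hats(n, p=0.5, seed=n)
    results[n] = (statistics.mean(p_hats), statistics.stdev(p_hats))

# One row per sample size, in the same layout as the table in step 9.
print(f"{'n':>5}  {'mean':>8}  {'std dev':>8}")
for n, (mean, sd) in results.items():
    print(f"{n:>5}  {mean:>8.4f}  {sd:>8.4f}")
```

Fixing p and changing only n isolates the effect of sample size. The means should all land near 0.5, while the standard deviations shrink as n grows.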
File > New > Minitab worksheet.

10) To simulate taking a sample of size n = 25 from a population in which 60% of the individuals find clothes shopping frustrating, do the following:
   Calc > Random data > Binomial
   Generate 500 rows of data
   Store in X1
   Number of trials = 25
   Probability of success = 0.6
   We now have in column ‘X1’ the number of “successes” in each of 500 random samples of size 25 taken from a population with a 60% “success” rate.

11) To calculate the sample proportion for each of these 500 samples:
   Calc > Calculator
   Store result in variable: p-hat60%
   Expression: X1 / 25

12) To simulate taking a sample of size 25 from a population in which only 28% of the individuals find clothes shopping frustrating, do the following:
   Calc > Random data > Binomial
   Generate 500 rows of data
   Store in X2
   Number of trials = 25
   Probability of success = 0.28
   We now have in column ‘X2’ the number of “successes” in each of 500 random samples of size 25, but now taken from a population with 28% “successes”.

13) To calculate the sample proportion for each of these 500 samples:
   Calc > Calculator
   Store result in variable: p-hat28%
   Expression: X2 / 25

14) To simulate taking a sample of size 25 from a population in which 75% of the individuals find clothes shopping frustrating, do the following:
   Calc > Random data > Binomial
   Generate 500 rows of data
   Store in X3
   Number of trials = 25
   Probability of success = 0.75
   We now have in column ‘X3’ the number of “successes” in each of 500 random samples of size 25, but now taken from a population with 75% “successes”.

15) To calculate the sample proportion for each of these 500 samples:
   Calc > Calculator
   Store result in variable: p-hat75%
   Expression: X3 / 25

16) Now we will examine the distribution of the 500 sample proportions for samples of size 25 taken from populations with 60%, 28%, and 75% as the population proportion. Make three histograms:
   Graph > Histogram
   Graph the variables p-hat60%, p-hat28%, and p-hat75%.
   Click on Multiple Graphs. Select ‘Same X’ and ‘Same Y’.
   Describe the three distributions. Where is each distribution centered? Use Stat > Basic Statistics > Descriptive Statistics to find the mean and the standard deviation of each set of sample proportions. What are the means? What are the standard deviations?

File > New > Minitab worksheet.

17) Repeat #s 10–16 for samples of size 250 (so your Number of trials will be 250 instead of 25), and then for samples of size 800. When you calculate p-hat each time, be sure to divide by the new value of n. Compare the histograms (using the same scale).

18) Fill in the following tables:

   Sample size = 25
   Population proportion p                 60%        28%        75%
   Mean of the 500 sample proportions      ________   ________   ________
   Std dev of the 500 sample proportions   ________   ________   ________

   Sample size = 250
   Population proportion p                 60%        28%        75%
   Mean of the 500 sample proportions      ________   ________   ________
   Std dev of the 500 sample proportions   ________   ________   ________

   Sample size = 800
   Population proportion p                 60%        28%        75%
   Mean of the 500 sample proportions      ________   ________   ________
   Std dev of the 500 sample proportions   ________   ________   ________

19) Use your results from #9 and #18 to explain the idea that the sample proportion p-hat is an unbiased estimator of p, the population proportion.

20) Use your results from #9 and #18 to explain how sample size affects the variability of the estimate p-hat.

21) Explain why an estimate from a larger sample is better than an estimate from a smaller sample. Which is more likely to give “reliable” results? Why?
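The simulations in steps 10 to 18 (three population proportions, each at n = 25, 250, and 800) can be sketched the same way as a cross-check for the tables in #18. This is a minimal sketch, not part of the original lab, assuming Python 3 with only the standard library. The helper `simulate_p_hats` and the seeds are hypothetical. The reference column sqrt(p(1 - p)/n) is the standard textbook formula for the standard deviation of p-hat. It is brought in here only as an outside benchmark and does not come from this worksheet.

```python
import math
import random
import statistics


def simulate_p_hats(n, p, reps=500, seed=None):
    """Simulate `reps` sample proportions, each from a sample of size n (proportion p)."""
    rng = random.Random(seed)
    # Each entry: a Binomial(n, p) count of "successes" divided by n.
    return [sum(1 for _ in range(n) if rng.random() < p) / n for _ in range(reps)]


# Steps 10-18: every combination of sample size and population proportion.
grid = {}
seed = 0
for n in (25, 250, 800):
    for p in (0.60, 0.28, 0.75):
        seed += 1  # a different (arbitrary) seed for each simulated column
        p_hats = simulate_p_hats(n, p, seed=seed)
        # Textbook standard deviation of p-hat, used only as a reference value.
        theory_sd = math.sqrt(p * (1 - p) / n)
        grid[(n, p)] = (statistics.mean(p_hats), statistics.stdev(p_hats), theory_sd)

for (n, p), (mean, sd, theory_sd) in grid.items():
    print(f"n = {n:>3}, p = {p:.2f}: mean = {mean:.4f}, "
          f"std dev = {sd:.4f} (sqrt(p(1-p)/n) = {theory_sd:.4f})")
```

In each row, the simulated mean should sit close to p, which is the pattern #19 asks about. Within each value of p, the simulated standard deviation should shrink as n increases from 25 to 250 to 800, which is the pattern #20 and #21 ask about.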