Exploring Statistical Concepts

Name: _______________________    UF ID#: ______________________

Purpose:
  - To explore the sampling distribution of the sample proportion and the sample mean
  - To create and interpret confidence intervals for the population proportion and the population mean

Due Date: June 10th, 2010 at the beginning of class. Late projects will receive a 20% penalty.

Statistical Software: This project requires the statistical software program Minitab. Minitab can be accessed:
  - on the CIRCA computers (http://labs.circa.ufl.edu/hours.php)
  - by renting the program for free for 30 days through www.e-academy.com/minitab
  - by using the computers in CBD 220

Tutoring Room: Tutoring will be held in CBD 220 on Tuesday (June 8th, 11 to 5 pm) and Wednesday (June 9th, 11 to 7 pm). The TA will be there to help anyone with the project and/or other questions about the class.

Part A: Exploring the Sampling Distribution of the Sample Proportion

Applet: http://www.stat.tamu.edu/~west/ph/sampledist.html

1. Terms of the simulation. In class, we studied the sampling distribution of p̂, the sample proportion of successes in a binomial experiment. We saw that this distribution is approximately normal if np and n(1-p) are both greater than or equal to fifteen. The parent population of the data is binary, meaning that it has only two potential responses: success (1) or failure (0). Using the applet, we are going to repeatedly draw samples from the binary distribution and compute the sample proportion of successes. This will allow us to see when the sampling distribution of p̂ can be approximated with the normal distribution. Familiarize yourself with the applet and answer the following questions.

   a) Each of the samples that will be drawn from the parent distribution will be of the same size. What is the symbol that the website uses to represent the size of each sample? ____________

   b) What is the symbol used to represent how many times we collect these samples?
___________

2. Simulation. For each setting of n and p given in the table that follows, compute the values of np and n(1-p). Determine if np and n(1-p) are greater than or equal to fifteen. Then use the applet to see the sampling distribution and determine if the normal approximation is good for each case (get at least one thousand samples for each combination of n and p). Select the value of p by using the drop-down box next to the population graph. For each combination, determine if the graph shows that the sampling distribution of p̂ is close to normal. Look at symmetry, continuity (no big gaps in the data), and tails.

   n    | p    | np | n(1-p) | Both ≥ 15? | Sketch | Continuity? (no big gaps in data) | Symmetry? | Normal approximation good?
   10   | 0.90 |    |        |            |        |                                   |           |
   50   | 0.90 |    |        |            |        |                                   |           |
   1000 | 0.90 |    |        |            |        |                                   |           |
   10   | 0.50 |    |        |            |        |                                   |           |
   50   | 0.50 |    |        |            |        |                                   |           |
   100  | 0.50 |    |        |            |        |                                   |           |
   10   | 0.20 |    |        |            |        |                                   |           |
   50   | 0.20 |    |        |            |        |                                   |           |
   100  | 0.20 |    |        |            |        |                                   |           |
   1000 | 0.20 |    |        |            |        |                                   |           |

3. Summary. Play with the applet a bit. In your own words, explain what combinations of n and p are necessary for the sampling distribution of p̂ to be approximately normal, and why.

Part B: Exploring the Sampling Distribution of the Sample Mean

1. Simulation. (Use the same applet as in Part A.) For each parent distribution and sample size given in the table that follows, write down the mean and the standard deviation (given by the computer) in the first column. Then, compute the mean and standard deviation of the distribution of x̄ in the theoretical columns using the values that the Central Limit Theorem specifies. Then use the applet to get the distribution (get at least one thousand samples for each case). Record the mean and standard deviation of your simulation in the observed columns. Comment on the shape of the graph. NOTE: When you look at the shape, imagine it being smoother.

                                  |             | Sampling Distribution of x̄
   Parent Population              | Sample Size | Theoretical Mean | Theoretical Stdev | Observed Mean | Observed Stdev | Shape
   Normal   μ = ____  σ = ____    | 2           |                  |                   |               |                |
   Normal   μ = ____  σ = ____    | 30          |                  |                   |               |                |
   Skewed   μ = ____  σ = ____    | 2           |                  |                   |               |                |
   Skewed   μ = ____  σ = ____    | 30          |                  |                   |               |                |
   Uniform  μ = ____  σ = ____    | 2           |                  |                   |               |                |
   Uniform  μ = ____  σ = ____    | 30          |                  |                   |               |                |

2. Summary.
Based on the results of the simulation, what happens to:

   a) the shape of the distribution of x̄ as n increases?

   b) the mean of the distribution of x̄ as n increases?

   c) the standard deviation of the distribution of x̄ as n increases?

Part C: Identifying the Type of Problem and Entering the Data

1. On the first day of class, everyone was invited to take part in a class survey. One of the questions is below. Identify whether the problem calls for estimating the population mean or the population proportion.

   a. Aside from class time, how many hours a week, on average, do you expect to spend studying and completing assignments for this course? _________________

2. Open Minitab.

3. Go to the website www.stat.ufl.edu/~mmeece/2023/DataSummerA10.htm and copy the data. Enter the number of hours spent studying for each student in the first column.

Part D: Make a 95% Confidence Interval for the Population Mean

1. Summarize the data for the mean problem. To do this, go to Stat > Basic Statistics > Display Descriptive Statistics. Double click on the variable and select OK.

   x̄ = _________    s = ________    n = _________

2. Identify the following:

   x̄ =
   μ =

3. Assumptions. What are the assumptions necessary for making inferences in this case? Have they been met? (To help explore the data, make a boxplot. To do this, go to Graph > Boxplot > One Y > Simple and select OK. On the next screen, double click on the variable name and select OK.) Put a copy of the boxplot here.

4. Regardless of your answer to number 3, construct a 95% confidence interval for the population mean. Go to Stat > Basic Statistics > 1 Sample t. Click inside the "Samples in columns" box. A list of variables will appear. Double click on “Study” and select OK. The 95% confidence interval should appear. Paste your Minitab output below.

   95% CI:

5. Regardless of your answer to number 3, interpret the confidence interval.

6. Now, consider your answer to number 3.
Do you trust your results in numbers 4 and 5? Explain.

7. More Interpretations: Suppose a random sample of 114 students was chosen, and each student was asked how many hours he or she studies each week. The resulting 95% confidence interval for μ was (8.9, 11.8). Mark each of the following statements true with a capital “T” or false with a capital “F.”

   _____ a) 95% of all students study between 8.9 and 11.8 hours per week.
   _____ b) 95% of all sample means will be between 8.9 and 11.8.
   _____ c) 95% of samples will have averages between 8.9 and 11.8.
   _____ d) For 95% of all samples, will be between 8.9 and 11.8.
   _____ e) For 95% of all samples, will be included in the resulting 95% confidence interval.
   _____ f) The formula produces intervals that capture the population mean for 95% of all samples.
   _____ g) The formula produces intervals that capture the sample mean for 95% of all samples.

Part E: Explore Your Own Data Set

Website: http://www.norc.org/GSS+Website/Data+Analysis/

1. Select a research question from the General Social Survey. Go to the new GSS website. Open the catalog by clicking on the plus sign next to NORC Public Use Catalog, then click on the plus sign next to GSS, then click on the icon next to General Social Survey 1972-2006, then Variable Description, and then Mnemonic Index. Select a letter and then a variable name. Play around and look at the various variable options. Find a question for which it is reasonable to make a confidence interval. Print the page that contains the question and the data. Attach this to your project.

2. Write down the question selected.

3. What will be considered a success?

4. Summarize the data.

   number of successes: X =
   total number of observations: n =

5. Identify the following:

   p =
   p̂ =

6. Assumptions. What are the assumptions necessary for making inferences in this case? Have they been met?

7. Construct a 95% confidence interval for the population proportion. Go to Stat > Basic Statistics > 1 Proportion.
Click on the bullet for “Summarized Data”. Enter the number of events (X) and the number of trials (n). Select OK. The 95% confidence interval should appear. Paste the Minitab output.

8. Interpret the results of the confidence interval.
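
As a cross-check on the Minitab output in step 7 of Part E, the large-sample interval p̂ ± z·sqrt(p̂(1-p̂)/n) can be sketched in Python. This is a minimal sketch and not part of the assignment: the function names and the counts X = 120, n = 400 are hypothetical and chosen only for illustration. Minitab's 1 Proportion procedure may use a different method depending on the options selected, so its interval need not match this one exactly.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample (normal approximation) interval for a population proportion."""
    p_hat = successes / n
    # Critical value z for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


def large_sample_ok(successes, n, threshold=15):
    """Check for at least `threshold` successes and `threshold` failures."""
    return successes >= threshold and (n - successes) >= threshold


# Hypothetical counts for illustration only; substitute your own X and n.
x, n = 120, 400
low, high = proportion_ci(x, n)
print(f"p-hat = {x / n:.3f}")
print(f"at least 15 successes and 15 failures: {large_sample_ok(x, n)}")
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The check in large_sample_ok mirrors the "greater than or equal to fifteen" condition from Part A, applied to the observed counts of successes and failures rather than to np and n(1-p).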