Statistics 101L – Laboratory 3

The first activity looks at the distribution of the heights of students in this class. The other activities look at distributions of variables to see if the data could have come from a Normal model.

Activity 1:

On the first day of class we talked about the heights of students in the class. In last week’s lab we measured those heights, in cm. Now we will look at the heights and see if gender can explain some of the variation in the heights of students. The data from last week’s lab is on the course web site as a JMP data set called Height.JMP. Open the JMP data set and use JMP to analyze the distribution of height and the distribution of height by gender. Be sure to follow the suggestions from previous labs, homework and class on what should be included in the output and how the output should look. Turn in this output with your lab.

1. Describe the distribution of heights of students in the class. Make sure to discuss shape, center and spread and the presence of any outliers. Also, discuss the values of the numerical summaries for this distribution and what they tell you about the distribution. Recall that some numerical summaries, e.g. the range, are not calculated by JMP but can be calculated from what JMP gives you.

2. Discuss the similarities and differences you see in the distributions of heights of the male and female students. Again, be sure to discuss shape, center and spread, and support your comparisons by referring to graphical and numerical summaries in the JMP output.

3. What are some other variables that could account for variation in heights? How could you investigate whether one of these other variables could account for variation in heights?

Activity 2:

As part of this lab you are given JMP output for 3 samples of data, Data Set #1, Data Set #2 and Data Set #3. Each data set contains 100 observations. Different models were used to generate these data sets.
You are to determine if a Normal model or some other model was used to generate each sample data set. In other words, could the data in each set have come from a population that can be modeled using a Normal model? Because the data sets are generated from known models, the population means and population standard deviations are known. For each data set the values of the population mean μ and the population standard deviation σ are given in the table below.

Data Set    #1    #2    #3
μ            1     5    50
σ            1     1     1

For each data set, answer questions 1 – 3 and 5 below.

1. Describe the shape of the histogram. What does this indicate about the probable shape of the distribution for the entire population?

2. Describe the overall pattern of the normal quantile plot.

3. If a Normal model was used to generate a data set, the 68-95-99.7 rule should apply, roughly, to the 100 observations. Using μ and σ from the table above, determine what percentage of the 100 observations are within 1, 2, and 3 standard deviations of the population mean. What does this indicate about the probable shape of the distribution for the entire population?

4. After looking at the shapes of the histograms for the three data sets, can you determine what properties the normal quantile plot will have if the data were generated from a Normal model? How about if the data were generated from a different type of model?

5. Based on your answers to questions 1 – 3 above, do you believe a Normal model was used to generate the 100 observations in each of the data sets? Explain your answer briefly.

Activity 3:

You have JMP output for the total weight of a sample of 325 Fun Size Bags of M&Ms. Could these weights have come from a population that could be modeled using a Normal model? For this data set, answer the following questions.

1. Describe the shape of the histogram. What does this indicate about the probable shape of the distribution for the entire population?

2. What is the overall pattern for the normal quantile plot? What does this indicate about whether the distribution of total weight could be modeled with a Normal model?

3. Using your answers to questions 1 and 2 above, do you believe the distribution of the total weight of Fun Size Bags of M&Ms can be modeled with a Normal model? Explain your answer.


Statistics 101L – Laboratory 3 – Answer Sheet

Names: _______________________ _______________________ _______________________ _______________________

Activity 1:

1. Describe the distribution of heights of students in the class and discuss the values of the numerical summaries for this distribution and what they tell you about the distribution.

2. Discuss the similarities and differences you see in the distributions of heights of male and female students.

3. What are some other variables that could account for variation in heights? How could you investigate whether one of these other variables could account for variation in heights?

Activity 2:

Data Set #1                  Data Set #2                  Data Set #3
1. Shape                     1. Shape                     1. Shape
2. Pattern of Normal plot    2. Pattern of Normal plot    2. Pattern of Normal plot
3. 68-95-99.7 rule           3. 68-95-99.7 rule           3. 68-95-99.7 rule

4. Shape of histogram and properties of Normal quantile plot

5. Normal model?             5. Normal model?             5. Normal model?

Activity 3: Total Weight of Fun Size Bags of M&Ms

1. Describe the shape of the histogram. What does this indicate about the probable shape of the distribution for the entire population?

2. What is the overall pattern for the normal quantile plot? What does this indicate about whether the distribution of total weight could be modeled with a Normal model?

3. Using your answers to questions 1 and 2 above, do you believe the distribution of the total weight of Fun Size Bags of M&Ms can be modeled with a Normal model? Explain your answer.
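
Optional illustration: the lab itself uses JMP output, but the counting asked for in Activity 2, question 3 (the percentage of observations within 1, 2 and 3 standard deviations of the population mean) can also be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names percent_within and rule_check are invented for this example, and the sample values are made up for illustration and are not one of the lab data sets.

```python
# Sketch: compare a sample against the 68-95-99.7 rule using the
# population mean (mu) and population standard deviation (sigma).
# Function names and sample values are hypothetical, for illustration only.

def percent_within(data, mu, sigma, k):
    """Percentage of observations within k standard deviations of mu."""
    low, high = mu - k * sigma, mu + k * sigma
    inside = sum(1 for x in data if low <= x <= high)
    return 100.0 * inside / len(data)

def rule_check(data, mu, sigma):
    """Return {k: observed percentage} for k = 1, 2, 3."""
    return {k: percent_within(data, mu, sigma, k) for k in (1, 2, 3)}

# Made-up values (NOT one of the lab data sets), with mu = 5 and sigma = 1.
sample = [4.0, 4.5, 5.0, 5.5, 6.0, 7.5, 2.0, 5.2, 4.8, 9.0]
observed = rule_check(sample, mu=5, sigma=1)

# Print the observed percentages next to what a Normal model predicts.
for k, predicted in zip((1, 2, 3), (68, 95, 99.7)):
    print(f"within {k} SD: observed {observed[k]:.1f}%, "
          f"Normal model predicts about {predicted}%")
```

With real data, replace the sample list with the 100 observations and use μ and σ from the table in Activity 2. Observed percentages far from 68, 95 and 99.7 suggest that a Normal model may not be appropriate; percentages close to them are consistent with a Normal model but do not prove it, so the histogram and normal quantile plot should still be examined.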