Session Packet #2
Supplemental Instruction
Iowa State University

Leader: Carly
Course: Stat 226
Instructor:
Date: 1/29/13

1060 Hixson-Lied Student Success Center
515-294-6624
sistaff@iastate.edu
http://www.si.iastate.edu

Session Agenda:
Opening Activity: Quiz – Reviewing Material
Main Worksheet: Mean, Variance, St Dev
Closing Activity: Number Summaries
Questions: Ask Any Questions

Opening Activity: Quiz

For questions 1-13, answer true or false. If the statement is false, change it in a way that makes it true. Try to complete this quiz on your own without using your notes.

___ 1.) The numbers 6, 7, 8, 9 have the same standard deviation as 160, 170, 180, 190.
False. Corrected: The numbers 6, 7, 8, 9 do not have the same standard deviation as 160, 170, 180, 190. The numbers in the first set are not as spread out as the numbers in the second set. If we were comparing {6, 7, 8, 9} to {106, 107, 108, 109}, the original statement would be true because these sets have the same spread.

___ 2.) The quartiles Q1, Q2, and Q3 are numbers that divide an ordered group of observations into 3 equally sized groups.
False. Corrected: The quartiles divide an ordered group of observations into 4 equally sized groups. “Quartile” means “each of 4 equal groups.” If you have trouble remembering this, think about how there are 4 quarters in a dollar.

___ 3.) Q1 is the median of all observations to the left of the median M, including M.
False. Corrected: Q1 is the median of all observations to the left of the median M, excluding M. We exclude the median in finding Q1.

___ 4.) 75% of all observations in a data set are of lesser value than Q3.
True.

___ 5.) For a symmetric distribution, Q1 and Q3 are about equally far from M.
True.

___ 6.) For a left-skewed distribution, Q1 will be closer to M than Q3.
False. Corrected: For a left-skewed distribution, Q3 will be closer to M than Q1. You can also correct this by changing left-skewed to right-skewed.

___ 7.) For a right-skewed distribution, the minimum is closer to M than the maximum.
True.

___ 8.) The IQR is equal to the middle 50% of the data.
True. More precisely, the IQR is the width of the interval that contains the middle 50% of the data.

___ 9.) The sample variance corresponds to the mean of all deviations of each observation from the sample mean.
True, reading "deviations" as squared deviations: the sample variance is approximately the mean of the squared deviations from the sample mean, with n − 1 rather than n in the denominator. (The unsquared deviations always sum to zero.)

___ 10.) The sample variance, s, is equal to the square root of the standard deviation, s2.
False. Corrected: The sample standard deviation, s, is equal to the square root of the sample variance, $s^2$.

___ 11.) For the data set 7, 7, 7, 7, 7, 7, 7, 7, the standard deviation is equal to zero, but the variance is equal to 1.
False. Corrected: For the data set 7, 7, 7, 7, 7, 7, 7, 7, the standard deviation is equal to zero and the variance is also equal to zero.

___ 12.) If a data set is reasonably symmetric and no outliers are present, then we should only use the IQR and 5-number summary as measures of spread and center.
False. Corrected: If a data set is reasonably symmetric and no outliers are present, then we could use the mean and standard deviation or the IQR and 5-number summary as measures of spread and center.

___ 13.) It is more advantageous to work with standard deviation than variance because standard deviation is expressed in the same units as the observations in the data set.
True.

Main Worksheet: Using the Mean, Variance, and Standard Deviation

Sample Mean:

$$\bar{y} = \frac{y_1 + y_2 + \dots + y_n}{n} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

Why: This is the formula for calculating the sample mean, a measure of central tendency.

When: The mean will be a fairly accurate measure of center when we are working with a relatively symmetric distribution of data.

How:
$\bar{y}$ – the sample mean; measured in the same units as the observations
$y_1, y_2, y_3, \ldots$ – the first, second, third (and so on) observations in the data set
$i$ – the index that denotes an observation's position in the data set
$y_i$ – the observation at position $i$
$n$ – the sample size (the number of observations); it counts whatever units are being sampled
$y_n$ – the last observation in the data set
$\sum_{i=1}^{n} y_i$ – the sum of all observations in the data set

Name 1 disadvantage of using the mean.
It is not robust against outliers.

What is the main difference between the mean and median?
The median uses only the positions of the ordered observations to find the center of a data distribution. The mean uses the actual data values and averages them to find the center of a data distribution.

Sample Variance:

$$s^2 = \frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \dots + (y_n - \bar{y})^2}{n - 1} = \frac{1}{n - 1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Why: This is the formula for calculating the sample variance, a measure of spread.

When: The variance will be a fairly accurate measure of spread when we are working with a relatively symmetric distribution of data.

How:
$s^2$ – the sample variance; measured in the square of the units in which the response is measured
$(y_i - \bar{y})$ – the deviation of an observation from the sample mean
$\sum_{i=1}^{n} (y_i - \bar{y})^2$ – the sum of all squared deviations from the sample mean

Why do we square the deviations of the observations from the sample mean?
To avoid cancelling out the positive deviations from the sample mean with the negative deviations. Without squaring, the deviations always sum to zero.

Everything else equal, what happens to the variance as we increase the sample size?
If the sum of squared deviations is held fixed, a larger sample size means a larger denominator n − 1, so the sample variance decreases. With real data, each added observation also adds a term to the numerator, so the sample variance does not shrink systematically as the sample grows.

Standard Deviation:

$$s = \sqrt{s^2}$$

Why: This is the formula for calculating the standard deviation, a measure of spread.

When: The standard deviation will be a fairly accurate measure of spread when we are working with a relatively symmetric distribution of data.

How:
$s$ – the sample standard deviation; measured in the same units as the response
$s^2$ – the sample variance; measured in squared units

What is the advantage of using the standard deviation over the sample variance?
The standard deviation is measured in the same units as the response.

Everything else equal, what happens to the standard deviation as we increase the sample size?
Because s is the square root of the sample variance, the same reasoning applies: with the sum of squared deviations held fixed, the standard deviation decreases as the sample size increases.

Closing Activity: Number Summaries

For the questions below, use the given data set to compute the number summaries. It is a good idea to show your work (for the questions that require work) in case you would like to refer back to this activity later. To gauge what you know and what you need to review further, try to complete this activity on your own, without using your notes. A calculator might be needed.

Data: 12, 2, 25, 5, 30, 3, 4, 7, 12, 28, 4, 6, 1, 11, 25, 1, 9, 3, 4, 25

Sorted: 1, 1, 2, 3, 3, 4, 4, 4, 5, 6, 7, 9, 11, 12, 12, 25, 25, 25, 28, 30

1.) Minimum = 1
2.) Maximum = 30
3.) Range = Maximum – Minimum = 30 – 1 = 29
4.) Median = (6 + 7)/2 = 6.5
5.) Q1 = (3 + 4)/2 = 3.5
6.) Q3 = (12 + 25)/2 = 18.5
7.) IQR = Q3 – Q1 = 18.5 – 3.5 = 15
8.) Use a calculator, Excel, or JMP to find the mean.
Mean = 10.850
9.) Use a calculator, Excel, or JMP to find the sample variance.
Sample Variance = 98.766
10.) Use a calculator, Excel, or JMP to find the standard deviation.
Standard deviation = 9.938
11.) Draw a box plot of this data. Don’t forget to label the minimum, Q1, median, Q3, and maximum.
12.) What is the shape of the distribution? Are there any outliers?
The distribution appears to be bimodal. If we look back at the data set, we see that it has 2 modes: 4 and 25 (each occurs 3 times). It could also be argued that the distribution is right-skewed with outliers occurring at 25 and higher.
13.) Are Q1 and Q3 approximately the same distance from M? Why or why not?
No. From the box-and-whisker plot we see that Q1 is much closer to M than Q3 is (3 units versus 12 units). This occurs because the data are skewed to the right.
14.) The middle 50% of this data is between ____ and ____. (Give actual values.)
3.5 and 18.5
15.) If you had to choose between using the IQR and 5-number summary or using the mean and standard deviation, which would you choose? Why?
It would be best to use the IQR and 5-number summary in this case because the data are not symmetric. The mean and standard deviation are not robust against outlying data.
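
The hand computations in the Closing Activity, along with quiz statements 1 and 11, can be checked with a short script. The sketch below uses Python, which is an assumption: the worksheet itself names only a calculator, Excel, or JMP. The helper `five_number_summary` is a hypothetical name introduced here; it follows the worksheet's rule that Q1 and Q3 are the medians of the observations below and above M, with M excluded. The 1.5 × IQR outlier check at the end is also an assumption, since the worksheet does not state an outlier rule.

```python
import math
from statistics import mean, median, stdev, variance

# Data set from the Closing Activity.
data = [12, 2, 25, 5, 30, 3, 4, 7, 12, 28, 4, 6, 1, 11, 25, 1, 9, 3, 4, 25]


def five_number_summary(values):
    """Return min, Q1, median, Q3, max using the worksheet's quartile rule.

    Q1 and Q3 are the medians of the observations below and above the
    median M; M itself is left out when the sample size is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # observations to the left of M
    upper = ordered[half + n % 2:]      # observations to the right of M
    return {
        "min": ordered[0],
        "q1": median(lower),
        "median": median(ordered),
        "q3": median(upper),
        "max": ordered[-1],
    }


summary = five_number_summary(data)
data_range = summary["max"] - summary["min"]
iqr = summary["q3"] - summary["q1"]

sample_mean = mean(data)
sample_var = variance(data)    # sum of squared deviations divided by n - 1
sample_sd = stdev(data)        # square root of the sample variance

# Assumption (not stated in the worksheet): the common 1.5 * IQR rule
# for flagging potential outliers.
lower_fence = summary["q1"] - 1.5 * iqr
upper_fence = summary["q3"] + 1.5 * iqr
flagged = [y for y in data if y < lower_fence or y > upper_fence]

# Quiz statement 1: shifting every value by a constant keeps the spread,
# while rescaling the values changes it.
same_spread = math.isclose(stdev([6, 7, 8, 9]), stdev([106, 107, 108, 109]))
different_spread = not math.isclose(stdev([6, 7, 8, 9]), stdev([160, 170, 180, 190]))

# Quiz statement 11: a constant data set has zero variance.
constant_var = variance([7] * 8)

print(summary)
print("range:", data_range, "IQR:", iqr)
print("mean:", round(sample_mean, 3))
print("sample variance:", round(sample_var, 3))
print("standard deviation:", round(sample_sd, 3))
print("flagged by the 1.5 * IQR rule:", flagged)
```

With this data set the fences work out to 3.5 − 22.5 = −19 and 18.5 + 22.5 = 41, so the 1.5 × IQR rule flags no observations. That does not contradict the answer to question 12, which only says that outliers at 25 and higher "could be argued"; it just shows that this particular rule would not label them as outliers.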