MAT 1000 Mathematics in Today's World

Last Time

1. Three keys to summarize a collection of data: shape, center, spread.
2. Can measure spread with the five-number summary.
3. The five-number summary can be represented visually by a boxplot, which is useful for making comparisons between distributions.

Today

Another measurement for the spread of a distribution: the standard deviation.

For a distribution of the correct shape, the two numbers mean and standard deviation give us more information than the whole five-number summary.

These special shaped distributions are called normal distributions, and they are very common.

Standard deviation

Spread should describe how widely data values are dispersed about the center.

Finding the standard deviation uses the mean $\bar{x}$ as the center.

The standard deviation is the “average” of the distance of each data value $x_i$ from the mean $\bar{x}$.

The standard deviation can be either a parameter or a statistic.

Parameter: $\sigma$ (This is a Greek letter. It is pronounced “sigma.”)
Statistic: $s$

For the following we will assume we are computing a statistic $s$.

Example

Let’s find the standard deviation of 7, 8, 11, 14 (assuming these are from a sample).

First, find the mean:

$\bar{x} = \frac{7 + 8 + 11 + 14}{4} = 10$

Now we make a table.

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
| 7 | $7 - 10 = -3$ | $(-3)^2 = 9$ |
| 8 | $8 - 10 = -2$ | $(-2)^2 = 4$ |
| 11 | $11 - 10 = 1$ | $1^2 = 1$ |
| 14 | $14 - 10 = 4$ | $4^2 = 16$ |
| | Sum | 30 |

Add up all the numbers in the last column.

We divide that sum by one less than the number of data values.

Remember the data set is: 7, 8, 11, 14. This has 4 values, so we divide the sum 30 by 3:

$\frac{30}{3} = 10$

This number is called the variance.

The standard deviation is the square root of the variance.

$\sqrt{10} \approx 3.16$

Let’s review the steps we took, using $\bar{x}$ for the mean and $x_1, x_2, \ldots, x_n$ for the $n$ data values.

1. We found the difference of each data value and the mean: $x_i - \bar{x}$
2. We squared each of these numbers: $(x_i - \bar{x})^2$
3. Add all of these up: $(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2$
4. Divide by $n - 1$, and take a square root: $\sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$

This is the formula for the standard deviation you will be given on tests:

$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$

The key is to remember the steps this formula describes.

Notes

We divide by $n - 1$ because we are computing a statistic (the reason is subtle but important).

If we were finding a parameter, we would divide by $n$.

If the data values have units, then the mean and standard deviation have the same units.

How should we interpret the standard deviation?

If the standard deviation is 0 then there is no deviation from the mean (all the data is equal).

Otherwise, the standard deviation will be positive.

The larger the value of the standard deviation, the more spread out the data.

Five-number summary and standard deviation

We have two ways to measure the center and spread of a distribution:

1. The five-number summary.
2. The mean and standard deviation.

If the data is symmetric without many outliers, we will see that the mean and standard deviation give lots of information.

If the data is not very symmetric, or has lots of outliers, the five-number summary is best.

Normal distributions

The goal is to summarize large data sets.

For a one number summary, measures of center like mean or median are the best we have, but no one number summary is very informative.

It may be surprising, but for a large group of commonly occurring distributions, a two number summary can be quite informative.

These distributions are called normal distributions.

Normal distributions all have a particular shape: fairly symmetric, one peak, few outliers, and a characteristic “bell” shape.

The shape is easier to see with a smooth curve…

[Figures: normal distributions shown as a histogram, as a smooth curve, and both at once.]

If we know the mean $\bar{x}$ and the standard deviation $s$ of a normal distribution, we can get lots of information.

We can get (close to) Q1, the median, and Q3.

Since normal distributions are very symmetric, the median is very close to $\bar{x}$.

What about the first and third quartiles of a normal distribution?

The first quartile Q1 is: $\bar{x} - 0.67s$
In words, multiply $s$ by 0.67, then find $\bar{x}$ minus that.

The third quartile Q3 is: $\bar{x} + 0.67s$
In words, multiply $s$ by 0.67, then add that to $\bar{x}$.

Example

The heights of men in the US are normally distributed with mean 69.3 in. (5′ 9″) and standard deviation 2.9 in. (notice the unit of the standard deviation is in.). Find the median, Q1, and Q3.

The median height is equal to the mean, 69.3 in.

Q1 is $69.3 - 0.67(2.9) = 67.4$ in. = 5′ 7″
Q3 is $69.3 + 0.67(2.9) = 71.2$ in. = 5′ 11″

Remember that 25% of the data is below Q1, and 75% is below Q3.

This is the same as saying “50% of the data is between Q1 and Q3.”

So in a normal distribution, the middle 50% of the data is between:

$\bar{x} - 0.67s$ and $\bar{x} + 0.67s$

Remember: not every distribution is normal.

Don’t use the formulas $\bar{x} - 0.67s$ and $\bar{x} + 0.67s$ unless you know the distribution is normal.

Normal distributions have a specific shape: symmetric, one peak, few outliers, and no clusters.

How can you tell if a distribution is normal? Look at a histogram or a stemplot!
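
As a check on the two worked examples in these notes, here is a minimal Python sketch of both calculations. It follows the steps for the sample standard deviation (subtract the mean, square, add up, divide by n − 1, take the square root) and the 0.67 rule for the quartiles of a normal distribution. The function names `sample_std_dev` and `normal_quartiles` are made up for this illustration and are not part of the course materials.

```python
import math


def sample_std_dev(values):
    """Sample standard deviation s, computed with the steps from the notes."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data values to divide by n - 1")
    mean = sum(values) / n
    # Steps 1 and 2: difference from the mean, then squared.
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Steps 3 and 4: add them up, divide by n - 1 (this is the variance),
    # then take the square root.
    variance = sum(squared_deviations) / (n - 1)
    return math.sqrt(variance)


def normal_quartiles(mean, std_dev):
    """Approximate (Q1, median, Q3), assuming the distribution is normal."""
    offset = 0.67 * std_dev
    return mean - offset, mean, mean + offset


# Example from the notes: the sample 7, 8, 11, 14.
s = sample_std_dev([7, 8, 11, 14])
print(round(s, 2))  # 3.16

# Example from the notes: heights with mean 69.3 in. and s = 2.9 in.
q1, median, q3 = normal_quartiles(69.3, 2.9)
print(round(q1, 1), median, round(q3, 1))  # 67.4 69.3 71.2
```

The quartile function is only meaningful when the data really is normal, so it should not be applied before checking a histogram or stemplot.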