Chapter 2
Turning Data Into Information

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc.

Some key words:
• sample vs. population
• categorical (ordinal?) vs. quantitative
• explanatory vs. response
• outlier
• mean vs. median
• standard deviation vs. range vs. interquartile range (IQR)
• histogram vs. boxplot
• shape?

Symmetric: mean = median
Skewed Left: mean < median
Skewed Right: mean > median

2.7 Bell-Shaped Distributions of Numbers

Many measurements follow a predictable pattern:
• Most individuals are clumped around the center.
• The greater the distance a value is from the center, the fewer individuals have that value.

Variables that follow such a pattern are said to be “bell-shaped”. A special case is called a normal distribution or normal curve.

Example 2.11 Bell-Shaped British Women’s Heights

Data: representative sample of 199 married British couples. The mean height of the wives = 1602 millimeters.

[Figure: histogram of the wives’ heights with a normal curve superimposed]

Describing Spread with Standard Deviation

Standard deviation measures variability by summarizing how far individual data values are from the mean. Think of the standard deviation as roughly the average distance values fall from the mean.

Calculating the Standard Deviation

Formula for the (sample) standard deviation:

$$ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} $$

The value of $s^2$ is called the (sample) variance. An equivalent formula, easier to compute, is:

$$ s = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n - 1}} $$
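The two formulas for the sample standard deviation can be checked against each other with a short Python sketch. The function names and the five heights below are hypothetical, chosen only for illustration; they are not data from the text.

```python
import math


def sample_sd_definition(values):
    """Sample SD from squared deviations about the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


def sample_sd_shortcut(values):
    """Sample SD from the equivalent formula: (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum(x * x for x in values)
    return math.sqrt((sum_of_squares - n * mean ** 2) / (n - 1))


# Hypothetical heights in millimeters (illustrative only, not from the text).
heights_mm = [1550, 1600, 1610, 1640, 1700]

sd_by_definition = sample_sd_definition(heights_mm)
sd_by_shortcut = sample_sd_shortcut(heights_mm)
print(sd_by_definition, sd_by_shortcut)
```

Both functions should print the same value up to floating-point rounding. The shortcut form avoids a second pass over the deviations, but it can lose precision when the values are large relative to their spread, so the definition form is usually the safer choice in software.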
Population Standard Deviation

Data sets usually represent a sample from a larger population. If the data set includes measurements for an entire population, the notations for the mean and standard deviation are different, and the formula for the standard deviation is also slightly different. A population mean is represented by the symbol μ (“mu”), and the population standard deviation is

$$ \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{n}} $$

Interpreting the Standard Deviation for Bell-Shaped Curves: The Empirical Rule

For any bell-shaped curve, approximately
• 68% of the values fall within 1 standard deviation of the mean in either direction
• 95% of the values fall within 2 standard deviations of the mean in either direction
• 99.7% of the values fall within 3 standard deviations of the mean in either direction

The Empirical Rule, the Standard Deviation, and the Range

• Empirical Rule => the range from the minimum to the maximum data values equals about 4 to 6 standard deviations for data with an approximate bell shape.
• You can get a rough idea of the value of the standard deviation by dividing the range by 6.

$$ s \approx \frac{\text{Range}}{6} $$

Example 2.11 Women’s Heights (cont)

Mean height for the 199 British women is 1602 mm and standard deviation is 62.4 mm.
• 68% of the 199 heights would fall in the range 1602 ± 62.4, or 1539.6 to 1664.4 mm
• 95% of the heights would fall in the interval 1602 ± 2(62.4), or 1477.2 to 1726.8 mm
• 99.7% of the heights would fall in the interval 1602 ± 3(62.4), or 1414.8 to 1789.2 mm

Example 2.11 Women’s Heights (cont)

Summary of the actual results: [table not reproduced]

Note: The minimum height = 1410 mm and the maximum height = 1760 mm, for a range of 1760 – 1410 = 350 mm.
So an estimate of the standard deviation is:

$$ s \approx \frac{\text{Range}}{6} = \frac{350}{6} \approx 58.3 \text{ mm} $$

Standardized z-Scores

Standardized score or z-score:

$$ z = \frac{\text{Observed value} - \text{Mean}}{\text{Standard deviation}} $$

Example: Mean resting pulse rate for adult men is 70 beats per minute (bpm), standard deviation is 8 bpm. The standardized score for a resting pulse rate of 80:

$$ z = \frac{80 - 70}{8} = 1.25 $$

A pulse rate of 80 is 1.25 standard deviations above the mean pulse rate for adult men.

The Empirical Rule Restated

For bell-shaped data,
• About 68% of the values have z-scores between –1 and +1.
• About 95% of the values have z-scores between –2 and +2.
• About 99.7% of the values have z-scores between –3 and +3.
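As a rough illustration, the z-score formula and the Empirical Rule intervals can be reproduced in a few lines of Python. The numbers come from the pulse-rate example and Example 2.11; the function names are hypothetical and serve only this sketch.

```python
def z_score(observed, mean, sd):
    """Standardized score: how many standard deviations a value lies from the mean."""
    return (observed - mean) / sd


def empirical_rule_intervals(mean, sd):
    """Intervals mean ± k*sd for k = 1, 2, 3 (about 68%, 95%, 99.7% of bell-shaped data)."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


# Pulse-rate example: mean 70 bpm, standard deviation 8 bpm, observed 80 bpm.
pulse_z = z_score(80, 70, 8)

# Example 2.11: mean height 1602 mm, standard deviation 62.4 mm.
height_intervals = empirical_rule_intervals(1602, 62.4)

# Rough standard deviation from the range (maximum 1760 mm, minimum 1410 mm).
rough_sd = (1760 - 1410) / 6

print(pulse_z)
for k, (low, high) in height_intervals.items():
    print(k, round(low, 1), round(high, 1))
print(round(rough_sd, 1))
```

Converting the interval endpoints back with `z_score` gives –k and +k, up to floating-point rounding, which is the restated form of the rule: the three intervals correspond to z-scores between –1 and +1, –2 and +2, and –3 and +3.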