Chapter 3
Descriptive Statistics: Numerical Methods

Measures of Variability
Measures of Relative Location and Detecting Outliers
Exploratory Data Analysis
Measures of Association Between Two Variables

© 2003 South-Western/Thomson Learning™

Measures of Variability

It is often desirable to consider measures of variability (dispersion) as well as measures of location. For example, in choosing between supplier A and supplier B, we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation

Measures of Variation

Variation
  Range
  Interquartile Range
  Variance
    Population Variance
    Sample Variance
  Standard Deviation
    Population Standard Deviation
    Sample Standard Deviation
  Coefficient of Variation

Variation

Measures of variation give information on the spread or variability of the data values.

[Figure: same center, different variation]

Range

Simplest measure of variation.
Difference between the largest and the smallest observations:

$\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}$

Example:
[Figure: observations on a number line marked 0 to 14]
Range = 14 - 1 = 13

Example: Apartment Rents

Range

Range = largest value - smallest value
Range = 615 - 425 = 190

425  440  450  465  480  510  575
430  440  450  470  485  515  575
430  440  450  470  490  525  580
435  445  450  472  490  525  590
435  445  450  475  490  525  600
435  445  460  475  500  535  600
435  445  460  475  500  549  600
435  445  460  480  500  550  600
440  450  465  480  500  570  615
440  450  465  480  510  570  615

Interquartile Range

The interquartile range of a data set is the difference between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
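
The range and interquartile range can be checked directly against the 70 apartment rents. The following is a minimal Python sketch, not necessarily the procedure used to prepare the slides: the helper name `percentile` and its position rule (compute i = (p/100)n, round up when i is fractional, otherwise average the i-th and (i+1)-th ordered values) are assumptions made for illustration. Other quartile conventions, such as linear interpolation, can give slightly different values.

```python
# The 70 apartment rents from the example, in ascending order.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(sorted_values, p):
    """Return the p-th percentile (0 < p < 100) of ascending data.

    ASSUMED rule (hypothetical helper, chosen for illustration):
    position i = (p / 100) * n; if i is fractional, round up and take
    that ordered value; if i is whole, average the i-th and (i+1)-th.
    """
    n = len(sorted_values)
    whole, remainder = divmod(p * n, 100)  # integer arithmetic avoids float error
    if remainder:
        # Fractional position: rounding up gives 1-based position whole + 1,
        # which is 0-based index `whole`.
        return sorted_values[whole]
    # Whole position: average the i-th and (i+1)-th values (1-based).
    return (sorted_values[whole - 1] + sorted_values[whole]) / 2


data_range = max(RENTS) - min(RENTS)
q1 = percentile(RENTS, 25)
q3 = percentile(RENTS, 75)
iqr = q3 - q1

print(data_range)   # prints 190
print(q1, q3, iqr)  # prints 445 525 80
```

Under this assumed rule the quartiles fall on the 18th and 53rd ordered values, so half of the rents lie within an 80-dollar band even though the full range is 190.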

Example: Apartment Rents

Interquartile Range

3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80

Variance

The variance is a measure of variability that utilizes all the data.
It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).

Variance

The variance is the average of the squared differences between each data value and the mean.

If the data set is a sample, the variance is denoted by $s^2$.

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

If the data set is a population, the variance is denoted by $\sigma^2$.

$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$

Variance for Grouped Data

Sample Data

$s^2 = \frac{\sum f_i (X_i - \bar{x})^2}{n - 1}$

Population Data

$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N}$

Standard Deviation

Most commonly used measure of variation.
Shows variation about the mean.
The standard deviation of a data set is the positive square root of the variance.

If the data set is a sample, the standard deviation is denoted $s$.

$s = \sqrt{s^2}$

If the data set is a population, the standard deviation is denoted $\sigma$ (sigma).

$\sigma = \sqrt{\sigma^2}$

Calculation Example: Sample Standard Deviation

Sample data ($X_i$): 10  12  14  15  17  18  18  24
n = 8
Mean = $\bar{x}$ = 16

$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$

$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$

$= \sqrt{\frac{130}{7}} = 4.3095$

Coefficient of Variation

Measures relative variation.
Always in percentage (%).
Shows variation relative to the mean.
Is used to compare two or more sets of data measured in different units.

Population

$CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$

Sample

$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$

Example: Apartment Rents

Variance

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$

Standard Deviation

$s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$

Coefficient of Variation

$\frac{s}{\bar{x}} \cdot 100 = \frac{54.74}{490.80} \cdot 100 = 11.15$

Measures of Relative Location and Detecting Outliers

z-Scores
Detecting Outliers

z-Scores

The z-score is often called the standardized value.
It denotes the number of standard deviations a data value $x_i$ is from the mean.

$z_i = \frac{x_i - \bar{x}}{s}$

A data value less than the sample mean will have a z-score less than zero.
A data value greater than the sample mean will have a z-score greater than zero.
A data value equal to the sample mean will have a z-score of zero.

Example: Apartment Rents

z-Score of Smallest Value (425)

$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$

Standardized Values for Apartment Rents

-1.20  -0.93  -0.75  -0.47  -0.20   0.35   1.54
-1.11  -0.93  -0.75  -0.38  -0.11   0.44   1.54
-1.11  -0.93  -0.75  -0.38  -0.01   0.62   1.63
-1.02  -0.84  -0.75  -0.34  -0.01   0.62   1.81
-1.02  -0.84  -0.75  -0.29  -0.01   0.62   1.99
-1.02  -0.84  -0.56  -0.29   0.17   0.81   1.99
-1.02  -0.84  -0.56  -0.29   0.17   1.06   1.99
-1.02  -0.84  -0.56  -0.20   0.17   1.08   1.99
-0.93  -0.75  -0.47  -0.20   0.17   1.45   2.27
-0.93  -0.75  -0.47  -0.20   0.35   1.45   2.27

Detecting Outliers

An outlier is an unusually small or unusually large value in a data set.
A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
It might be an incorrectly recorded data value.
It might be a data value that was incorrectly included in the data set.

Example: Apartment Rents

Detecting Outliers

The most extreme z-scores are -1.20 and 2.27.
Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
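
The sample variance, standard deviation, coefficient of variation, and z-scores for the apartment rents can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative assumptions, and the |z| > 3 cutoff is the rule of thumb stated above rather than a formal test.

```python
import math

# The 70 apartment rents from the example, in ascending order.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(RENTS)
mean = sum(RENTS) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
sum_sq_dev = sum((x - mean) ** 2 for x in RENTS)
variance = sum_sq_dev / (n - 1)

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = std_dev / mean * 100

# z-score: number of standard deviations each value lies from the mean.
z_scores = [(x - mean) / std_dev for x in RENTS]

# Flag values whose |z| exceeds 3 (rule-of-thumb outlier criterion).
outliers = [x for x, z in zip(RENTS, z_scores) if abs(z) > 3]

print(f"mean = {mean:.2f}")            # mean = 490.80
print(f"variance = {variance:.2f}")    # variance = 2996.16
print(f"std dev = {std_dev:.2f}")      # std dev = 54.74
print(f"CV = {cv:.2f}%")               # CV = 11.15%
print(f"z range = {min(z_scores):.2f} to {max(z_scores):.2f}")  # -1.20 to 2.27
print(f"outliers = {outliers}")        # outliers = []
```

The deviations are computed from the unrounded mean and standard deviation, so individual z-scores can differ in the last digit from values obtained with the rounded figures 490.80 and 54.74.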

Exploratory Data Analysis

Five-Number Summary

Five-Number Summary

Smallest Value
First Quartile
Median
Third Quartile
Largest Value

Example: Apartment Rents

Five-Number Summary

Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615

Measures of Association Between Two Variables

Covariance
Correlation Coefficient

Covariance

The covariance is a measure of the linear association between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.

Covariance

If the data sets are samples, the covariance is denoted by $s_{xy}$.

$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

If the data sets are populations, the covariance is denoted by $\sigma_{xy}$.

$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$

Correlation Coefficient

The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.

If the data sets are samples, the coefficient is $r_{xy}$.

$r_{xy} = \frac{s_{xy}}{s_x s_y}$

If the data sets are populations, the coefficient is $\rho_{xy}$.

$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
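
The sample covariance and correlation formulas translate almost line for line into code. The sketch below is illustrative only: the function names are assumptions, and the paired values in `x` and `y` are hypothetical data invented for the example, since the slides give no two-variable data set.

```python
import math


def sample_covariance(xs, ys):
    """s_xy: sum of paired deviation products, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(values):
    """s_x: the covariance of a variable with itself is its sample variance."""
    return math.sqrt(sample_covariance(values, values))


def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y)."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Hypothetical paired observations (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)

print(f"s_xy = {s_xy:.2f}")  # s_xy = 1.50
print(f"r_xy = {r_xy:.4f}")  # r_xy = 0.7746
```

The covariance of 1.50 depends on the units of x and y, whereas the correlation of about 0.77 is unit-free, which is why the coefficient is easier to interpret as a measure of the strength of a linear relationship.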