Descriptive Statistics: Numerical Measures
Distribution
Chapter 3
BA 201
Slide 1

DISTRIBUTION
Slide 2

Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Distribution Shape
• z-Scores
• Chebyshev’s Theorem
• Empirical Rule
• Detecting Outliers
Slide 3

Distribution Shape: Skewness
An important measure of the shape of a distribution is called skewness.
The formula for the skewness of sample data is
$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3$$
Slide 4

Distribution Shape: Skewness
Symmetric (not skewed)
• Skewness is zero.
• Mean and median are equal.
[Figure: relative frequency histogram (vertical axis 0 to .35), Skewness = 0]
Slide 5

Distribution Shape: Skewness
Moderately Skewed Left
• Skewness is negative.
• Mean will usually be less than the median.
[Figure: relative frequency histogram (vertical axis 0 to .35), Skewness = -.31]
Slide 6

Distribution Shape: Skewness
Moderately Skewed Right
• Skewness is positive.
• Mean will usually be more than the median.
[Figure: relative frequency histogram (vertical axis 0 to .35), Skewness = .31]
Slide 7

Distribution Shape: Skewness
Highly Skewed Right
• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.
[Figure: relative frequency histogram (vertical axis 0 to .35), Skewness = 1.25]
Slide 8

Distribution Shape: Skewness
Apartment Rents
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Slide 9

Distribution Shape: Skewness
Apartment Rents
[Figure: relative frequency histogram of apartment rents (vertical axis 0 to .35), Skewness = 0.92]
Slide 10

z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value $x_i$ is from the mean.
$$z_i = \frac{x_i - \bar{x}}{s}$$
Slide 11

z-Scores
An observation’s z-score is a measure of the relative location of the observation in a data set.
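As a minimal sketch, the skewness and z-score formulas can be applied to the 70 apartment rents listed earlier using only Python's standard library. The names `rents`, `sample_skewness`, and `z_score` are illustrative choices, not part of the slides.

```python
import statistics

# Apartment rents from the slides (70 observations).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_skewness(data):
    """Skewness = n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


def z_score(x, mean, s):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / s


mean = statistics.mean(rents)
s = statistics.stdev(rents)

print("mean:", round(mean, 2))            # the slides use 490.80
print("std dev:", round(s, 2))            # the slides use 54.74
print("skewness:", round(sample_skewness(rents), 2))  # the slides report 0.92
print("z-score of smallest rent:", round(z_score(min(rents), mean, s), 2))
```

A positive result from `sample_skewness` indicates a right-skewed data set, and a negative z-score marks a value below the mean.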
[Figure: values below $\bar{x}$ have z-score < 0, values above $\bar{x}$ have z-score > 0, and a value equal to $\bar{x}$ has z-score = 0]
Slide 12

z-Scores
Apartment Rents
• z-Score of Smallest Value (425)
$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$
Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
Slide 13

PRACTICE
Z-SCORES
Slide 14

Practice #6 – z-Scores
$\bar{x}$ = 13, s = 7.4
$$z_i = \frac{x_i - \bar{x}}{s}$$
x_i     x_i - x̄     z-Score
3       -10
7       -6
11      -2
16      3
18      5
23      10
Slide 15

Chebyshev’s Theorem
At least $(1 - 1/k^2)$ of the items in any data set will be within k standard deviations of the mean, where k is any value greater than 1.
Within k standard deviations of mean     % of data values
k = 2                                    at least 75%
k = 3                                    at least 89%
k = 4                                    at least 94%
Slide 16

Chebyshev’s Theorem
Apartment Rents
Let k = 1.5 with $\bar{x}$ = 490.80 and s = 54.74.
At least $(1 - 1/(1.5)^2) = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$\bar{x} - k(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + k(s) = 490.80 + 1.5(54.74) = 573$
(Actually, 86% of the rent values are between 409 and 573.)
Slide 17

Empirical Rule
When data approximate a bell-shaped distribution, the empirical rule can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.
Within … of the mean             % of data values
+/- 1 standard deviation         68.26%
+/- 2 standard deviations        95.44%
+/- 3 standard deviations        99.72%
Slide 18

Empirical Rule
[Figure: bell curve over x with 68.26% of values within μ ± 1σ, 95.44% within μ ± 2σ, and 99.72% within μ ± 3σ]
Slide 19

PRACTICE
CHEBYSHEV’S THEOREM AND EMPIRICAL RULE
Slide 20

Practice #7 – Chebyshev’s Theorem
$\bar{x}$ = 1200, s = 110
How many items (%) are within k standard deviations?
• k = 1.25
• k = 3.5
Slide 21

Practice #7 – Empirical Rule
$\bar{x}$ = 1200, s = 110
What is the lower bound for 2 standard deviations? The upper bound?
How many items (%) are within this area?
Slide 22

Detecting Outliers
An outlier is an unusually small or unusually large value in a data set.
A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
It might be:
• an incorrectly recorded data value
• a data value that was incorrectly included in the data set
• a correctly recorded data value that belongs in the data set
Slide 23

Detecting Outliers
Apartment Rents
• The most extreme z-scores are -1.20 and 2.27.
• Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
Standardized Values for Apartment Rents: the same 70 values as in the earlier z-score table for Apartment Rents, ranging from -1.20 to 2.27.
Slide 24
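The Chebyshev bound and the |z| > 3 outlier criterion above can be sketched in a few lines of Python. This is an illustrative sketch that uses only the summary statistics quoted in the slides (mean 490.80, standard deviation 54.74, smallest rent 425, largest rent 615). The function names `chebyshev_bound`, `interval`, and `is_outlier` are hypothetical helpers, not part of the course material.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


def interval(mean, s, k):
    """Lower and upper bounds k standard deviations from the mean."""
    return mean - k * s, mean + k * s


def is_outlier(x, mean, s, threshold=3.0):
    """Flag x when its z-score is beyond +/- threshold (3 in the slides)."""
    return abs((x - mean) / s) > threshold


# Summary statistics for the apartment rents, as quoted in the slides.
mean, s = 490.80, 54.74

print(round(chebyshev_bound(1.5), 2))  # proportion guaranteed for k = 1.5
low, high = interval(mean, s, 1.5)
print(round(low), round(high))         # bounds for k = 1.5

# Check the smallest and largest rents against the |z| > 3 criterion.
print(is_outlier(425, mean, s), is_outlier(615, mean, s))
```

Chebyshev's theorem guarantees only a minimum share of the data, which is why the observed 86% for the rents can sit well above the 56% bound.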