Measures of Location
The population mean of a data set is the average of all the data values:
$\mu = \frac{\sum x_i}{N}$
where $\sum x_i$ is the sum of the values of the N observations and N is the number of observations in the population.
The sample mean is the point estimator of the population mean $\mu$:
$\bar{x} = \frac{\sum x_i}{n}$
where $\sum x_i$ is the sum of the values of the n observations and n is the number of observations in the sample.

Measures of Location
Example: Recall the Hudson Auto Repair example
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed below.
 91  78  93  57  75  52  99  80  97  62
 71  69  72  89  66  75  79  75  72  76
104  74  62  68  97 105  77  65  80 109
 85  97  88  68  83  68  71  69  67  74
 62  82  98 101  79 105  79  69  62  73
$\bar{x} = \frac{\sum x_i}{n} = \frac{3949}{50} = 78.98$

Measures of Location
For an odd number of observations, the median is the middle value once the observations are in ascending order.
26 18 27 12 14 27 19 (7 observations)
In ascending order: 12 14 18 19 26 27 27
Median = 19

Measures of Location
For an even number of observations, the median is the average of the middle two values once the observations are in ascending order.
26 18 27 12 14 27 19 30 (8 observations)
In ascending order: 12 14 18 19 26 27 27 30
Median = (19 + 26)/2 = 22.5

Measures of Location
Example: Hudson Auto Repair
Averaging the 25th and 26th data values: Median = (75 + 76)/2 = 75.5
 52  57  62  62  62  62  65  66  67  68
 68  68  69  69  69  71  71  72  72  73
 74  74  75  75  75  76  77  78  79  79
 79  80  80  82  83  85  88  89  91  93
 97  97  97  98  99 101 104 105 105 109
Note: Data is in ascending order.

Measures of Location
Example: Hudson Auto Repair
Mode = 62 (62 occurs four times in the sorted data above, more often than any other value).
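As a quick check of the Hudson Auto Repair figures above, the mean, median, and mode can be recomputed from the 50 invoice values. The following is a minimal sketch using Python's standard statistics module; the variable names are illustrative and do not come from the slides.

```python
import statistics

# Parts costs (dollars) from the 50 Hudson Auto Repair tune-up invoices.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

mean_cost = statistics.mean(parts_cost)      # sum of the values divided by n
median_cost = statistics.median(parts_cost)  # n is even: average of the two middle sorted values
mode_cost = statistics.mode(parts_cost)      # most frequently occurring value

print(f"n = {len(parts_cost)}, sum = {sum(parts_cost)}")
print(f"mean = {mean_cost:.2f}, median = {median_cost}, mode = {mode_cost}")
```

The library sorts the data internally for the median, so the unsorted invoice list can be passed in directly.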
Measures of Location
Example: Hudson Auto Repair
First quartile = 25th percentile
i = (p/100)n = (25/100)50 = 12.5
Because i is not an integer, round up: the first quartile is the 13th value in the sorted data.
First quartile = 69

Measures of Location
Example: Hudson Auto Repair: 80th Percentile
i = (p/100)n = (80/100)50 = 40
Because i is an integer, average the 40th and 41st values in the sorted data.
80th Percentile = (93 + 97)/2 = 95

Pelican Stores -- continued
Pelican Stores is a chain of women’s apparel stores. It recently ran a promotion in which discount coupons were sent to customers of other National Clothing stores. Data collected for a sample of 100 in-store credit card transactions at Pelican Stores during one day while the promotion was running are shown in Table 2.18. Customers who made a purchase using a discount coupon are referred to as promotional customers, and customers who made a purchase but did not use a discount coupon are referred to as regular customers. Because the promotional coupons were not sent to regular Pelican Stores customers, management considers the sales made to people presenting the promotional coupons as sales it would not otherwise make. Pelican’s management would like to use this sample data to learn about its customer base and to evaluate the promotion involving discounts.
Managerial Report
1. Using graphs and tables, summarize the qualitative variables.
2. Using graphs and tables, summarize the quantitative variables.
3. Using pivot tables and scatter plots, summarize the variables.
4. Compute the mean, mode, median, and the 25th and 75th percentiles.
data_pelican.xls

Measures of Variability
Example: Hudson Auto Repair
Range = maximum - minimum
Range = 109 - 52 = 57

Measures of Variability
Example: Hudson Auto Repair
3rd Quartile (Q3) = 89
1st Quartile (Q1) = 69
Interquartile Range = Q3 - Q1 = 89 - 69 = 20

Measures of Variability
The population variance is the average squared deviation from the population mean:
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
where
$\mu$ is the population mean,
$x_i - \mu$ is the i-th deviation from the population mean,
$(x_i - \mu)^2$ is the i-th squared deviation from the population mean,
$\sum (x_i - \mu)^2$ is the sum of squared deviations from the population mean (the total variation of x),
N is the number of observations in the population.

Measures of Variability
For a sample, N is replaced by n, the number of observations in the sample, and the unknown $\mu$ is replaced by the sample mean $\bar{x}$. Dividing by n alone would underestimate $\sigma^2$ on average, so the result is scaled by n/(n - 1):
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n} \cdot \frac{n}{n - 1}$

Measures of Variability
The sample variance is an unbiased estimator of
$\sigma^2$:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
where n - 1 is the degrees of freedom.

Measures of Variability
The standard deviation is the positive square root of the variance: $s = \sqrt{s^2}$ for a sample and $\sigma = \sqrt{\sigma^2}$ for a population.
The coefficient of variation expresses the standard deviation as a percentage of the mean: $\left(\frac{s}{\bar{x}} \times 100\right)\%$ for a sample and $\left(\frac{\sigma}{\mu} \times 100\right)\%$ for a population.

Measures of Variability
$\bar{x}$ = 78.98
Sorted invoice   Observed value   Squared deviation from the mean
 1                52               727.92
 2                57               483.12
 3                62               288.32
 4                62               288.32
 5                62               288.32
 6                62               288.32
 7                65               195.44
...
49               105               677.04
50               109               901.20
Sum             3949              9592.98

Measures of Variability
Example: Hudson Auto Repair
Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{9592.98}{50 - 1} = 195.78$
Standard deviation: $s = \sqrt{s^2} = \sqrt{195.78} = 13.992$
Coefficient of variation: $\left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{13.992}{78.98} \times 100\right)\% = 17.72\%$

Pelican Stores -- continued
Managerial Report
1. Using graphs and tables, summarize the qualitative variables.
2. Using graphs and tables, summarize the quantitative variables.
3. Using pivot tables and scatter plots, summarize the variables.
4. Compute the mean, mode, median, and the 25th and 75th percentiles.
5. Compute the range, IQR, variance, and standard deviations.
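The measures named in items 4 and 5 can be sketched on the Hudson Auto Repair data, where the expected answers are already known from the slides. This is an illustrative sketch, not a prescribed method: the helper name `percentile` and the variable names are assumptions, and the percentile rule is the one used in the Hudson example (compute i = (p/100)n, round up if i is not an integer, otherwise average the i-th and (i+1)-th values). Spreadsheet and library percentile functions often interpolate differently and may return slightly different quartiles.

```python
import math

# Hudson Auto Repair parts costs (dollars), in ascending order.
sorted_cost = [
    52, 57, 62, 62, 62, 62, 65, 66, 67, 68,
    68, 68, 69, 69, 69, 71, 71, 72, 72, 73,
    74, 74, 75, 75, 75, 76, 77, 78, 79, 79,
    79, 80, 80, 82, 83, 85, 88, 89, 91, 93,
    97, 97, 97, 98, 99, 101, 104, 105, 105, 109,
]

def percentile(sorted_values, p):
    """p-th percentile (integer p, 0 < p < 100) of data already in ascending order."""
    n = len(sorted_values)
    whole, remainder = divmod(p * n, 100)   # i = (p/100) n, kept in integer arithmetic
    if remainder:                           # i is not an integer: round up to position whole + 1
        return sorted_values[whole]         # 0-based index whole is 1-based position whole + 1
    # i is an integer: average the i-th and (i+1)-th values
    return (sorted_values[whole - 1] + sorted_values[whole]) / 2

n = len(sorted_cost)
mean = sum(sorted_cost) / n

data_range = sorted_cost[-1] - sorted_cost[0]
q1, q3 = percentile(sorted_cost, 25), percentile(sorted_cost, 75)
iqr = q3 - q1

sum_sq_dev = sum((x - mean) ** 2 for x in sorted_cost)
variance = sum_sq_dev / (n - 1)             # sample variance, n - 1 degrees of freedom
std_dev = math.sqrt(variance)
coef_var = std_dev / mean * 100             # coefficient of variation, in percent

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"variance = {variance:.2f}, std dev = {std_dev:.3f}, CV = {coef_var:.2f}%")
```

The same steps should carry over to the Pelican Stores sample once the relevant column of data_pelican.xls has been loaded into a sorted list.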
data_pelican.xls

Measures of Shape
Example: Hudson Auto Repair
z-Score of Smallest Value
$z = \frac{x_i - \bar{x}}{s} = \frac{52 - 78.98}{13.992} = -1.93$

Measures of Shape
$\bar{x}$ = 78.98, s = 13.992
Observed value   Deviation from the mean   z-score
  52              -26.98                   -1.93
  57              -21.98                   -1.57
  62              -16.98                   -1.21
  62              -16.98                   -1.21
  62              -16.98                   -1.21
  62              -16.98                   -1.21
  65              -13.98                   -1.00
...
 105               26.02                    1.86
 109               30.02                    2.15
Sum 3949            0                       0

Measures of Shape
An important measure of the shape of a distribution is called skewness.
$\text{skew} = \frac{n}{(n - 1)(n - 2)} \sum z_i^3$
When n is "large", it is just the average of the n cubed z-scores:
$\text{skew} \approx \frac{\sum z_i^3}{n}$

Measures of Shape
Observed value   z-score   Cubed z-score
  52             -1.93     -7.17
  57             -1.57     -3.88
  62             -1.21     -1.79
  62             -1.21     -1.79
  62             -1.21     -1.79
  62             -1.21     -1.79
  65             -1.00     -1.00
...
 105              1.86      6.43
 109              2.15      9.88
Sum 3949          0        22.567

Measures of Shape
$\text{skew} = \frac{n \sum z_i^3}{(n - 1)(n - 2)} = \frac{(50)(22.567)}{(49)(48)} = 0.4797$
[Figure: histogram "Tune-up Parts Cost"; x-axis: Parts Cost ($), y-axis: Frequency; marked values: mode $62, median $75.50, mean $78.98]

Measures of Shape
[Figure: three example histograms. Symmetric: skew = 0. Moderately skewed left: skew = -.31. Highly skewed right: skew = 1.25.]

Measures of Shape
Chebyshev's Theorem: At least $(1 - 1/z^2)$ of the data values are within z standard deviations of the mean.
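The z-scores, the skewness formula, and Chebyshev's bound can all be tried on the Hudson Auto Repair data. The sketch below is illustrative only; the function names `chebyshev_bound` and `share_within` are assumptions introduced here, not names from the slides. It computes the z-scores from the sample mean and sample standard deviation, applies the skewness formula with the n/((n - 1)(n - 2)) factor, and compares the guaranteed minimum share with the share actually observed.

```python
import math

# Hudson Auto Repair parts costs (dollars), in ascending order.
sorted_cost = [
    52, 57, 62, 62, 62, 62, 65, 66, 67, 68,
    68, 68, 69, 69, 69, 71, 71, 72, 72, 73,
    74, 74, 75, 75, 75, 76, 77, 78, 79, 79,
    79, 80, 80, 82, 83, 85, 88, 89, 91, 93,
    97, 97, 97, 98, 99, 101, 104, 105, 105, 109,
]

n = len(sorted_cost)
mean = sum(sorted_cost) / n
std_dev = math.sqrt(sum((x - mean) ** 2 for x in sorted_cost) / (n - 1))

# z-score: number of standard deviations an observation lies from the mean.
z_scores = [(x - mean) / std_dev for x in sorted_cost]

# Skewness: scaled sum of the cubed z-scores.
skew = n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in z_scores)

def chebyshev_bound(z):
    """Minimum share of values within z standard deviations of the mean."""
    return 1 - 1 / z ** 2

def share_within(z):
    """Observed share of the Hudson values within z standard deviations of the mean."""
    return sum(abs(score) <= z for score in z_scores) / n

print(f"z-score of smallest value = {z_scores[0]:.2f}, skew = {skew:.4f}")
for z in (2, 3):
    print(f"z = {z}: at least {chebyshev_bound(z):.0%}, observed {share_within(z):.0%}")
```

Chebyshev's bound is a guaranteed minimum for any distribution, so the observed share should never fall below it.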
At least 0% of the data values are within 1 standard deviation of the mean.
At least 75% of the data values are within 2 standard deviations of the mean.
At least 89% of the data values are within 3 standard deviations of the mean.
At least 94% of the data values are within 4 standard deviations of the mean.

Measures of Shape
Empirical Rule: For data with a bell-shaped (approximately normal) distribution,
about 68.26% of the data values are within 1 standard deviation of the mean,
about 95.44% of the data values are within 2 standard deviations of the mean,
about 99.74% of the data values are within 3 standard deviations of the mean,
about 99.99% of the data values are within 4 standard deviations of the mean.

Measures of Shape
z-score   Is the observation within 2 std dev?
-1.93     Yes
-1.57     Yes
-1.21     Yes
-1.21     Yes
-1.21     Yes
-1.21     Yes
-1.00     Yes
...
 1.86     Yes
 2.15     No
49 of the 50 data values are within 2 s of the mean = 98%
50 of the 50 data values are within 3 s of the mean = 100%
None of the values are outliers.

Pelican Stores -- continued
Managerial Report
1. Using graphs and tables, summarize the qualitative variables.
2. Using graphs and tables, summarize the quantitative variables.
3. Using pivot tables and scatter plots, summarize the variables.
4. Compute the mean, mode, median, and the 25th and 75th percentiles.
5. Compute the range, IQR, variance, and standard deviations.
6. Compute the z-scores and skew, find the outliers, and count the observations that are within 1, 2, and 3 standard deviations of the mean.
data_pelican.xls

Measures of the relationship between 2 variables
The covariance is computed as follows:
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ (for samples)
$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$ (for populations)
where
$x_i - \bar{x}$ and $x_i - \mu_x$ are the i-th deviations from x's means,
$y_i - \bar{y}$ and $y_i - \mu_y$ are the i-th deviations from y's means,
n and N are the sizes of the sample and population,
n - 1 is the degrees of freedom.

Measures of the relationship between 2 variables
The correlation coefficient is the covariance divided by the product of the standard deviations:
$r_{xy} = \frac{s_{xy}}{s_x s_y}$ (for samples)
$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$ (for populations)

Measures of the relationship between 2 variables
Example: Reed Auto Sales
Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.
Number of TV Ads (x)   Number of Cars Sold (y)
1                      14
3                      24
2                      18
1                      17
3                      27

Measures of the relationship between 2 variables
Example: Reed Auto Sales
[Figure: scatter plot; x-axis: TV Ads (0 to 4), y-axis: Cars sold (0 to 35)]

Measures of the relationship between 2 variables
Example: Reed Auto Sales
x     y     x - x̄   (x - x̄)²   y - ȳ   (y - ȳ)²   (x - x̄)(y - ȳ)
1     14    -1       1          -6      36         6
3     24     1       1           4      16         4
2     18     0       0          -2       4         0
1     17    -1       1          -3       9         3
3     27     1       1           7      49         7
Sum   10    100      4 (x col.: 0)
Column sums: x = 10, y = 100, (x - x̄)² = 4, (y - ȳ)² = 114, (x - x̄)(y - ȳ) = 20
x̄ = 10/5 = 2 (ads), ȳ = 100/5 = 20 (cars)
s_xx = 4/4 = 1 (ads squared), s_x = 1 (ads)
s_yy = 114/4 = 28.5 (cars squared), s_y = 5.34 (cars)
s_xy = 20/4 = 5 (ads-cars)

Measures of the relationship between 2 variables
Example: Reed Auto Sales
s_xy = 5 (ads-cars), s_x = 1 (ads), s_y = 5.34 (cars)
$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{5}{(1)(5.34)} = .9363$
The units (ads-cars divided by ads times cars) cancel, so the correlation has no units.

Pelican Stores -- continued
Managerial Report
1. Using graphs and tables, summarize the qualitative variables.
2. Using graphs and tables, summarize the quantitative variables.
3. Using pivot tables and scatter plots, summarize the variables.
4. Compute the mean, mode, median, and the 25th and 75th percentiles.
5. Compute the range, IQR, variance, and standard deviations.
6. Compute the z-scores and skew, find the outliers, and count the observations that are within 1, 2, and 3 standard deviations of the mean.
7. Compute the covariances and correlations.
data_pelican.xls
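For item 7, the Reed Auto Sales example gives a small data set on which the sample covariance and correlation formulas can be checked by hand. The following is a minimal sketch in plain Python; the variable names are illustrative. It keeps full precision for s_y, so the correlation comes out near 0.9366, slightly above the .9363 obtained on the slide with s_y rounded to 5.34.

```python
import math

# Reed Auto Sales: sample of 5 previous sales.
tv_ads = [1, 3, 2, 1, 3]          # x: number of TV ads
cars_sold = [14, 24, 18, 17, 27]  # y: number of cars sold

n = len(tv_ads)
mean_x = sum(tv_ads) / n
mean_y = sum(cars_sold) / n

# Sample covariance: sum of cross-products of deviations over n - 1.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tv_ads, cars_sold)) / (n - 1)

# Sample standard deviations, also with n - 1 degrees of freedom.
s_x = math.sqrt(sum((x - mean_x) ** 2 for x in tv_ads) / (n - 1))
s_y = math.sqrt(sum((y - mean_y) ** 2 for y in cars_sold) / (n - 1))

# Sample correlation coefficient: covariance scaled by both standard deviations.
r_xy = s_xy / (s_x * s_y)

print(f"s_xy = {s_xy}, s_x = {s_x}, s_y = {s_y:.2f}, r_xy = {r_xy:.4f}")
```

The same calculation should apply to any pair of quantitative columns in the Pelican Stores sample, provided the two lists are the same length and aligned by transaction.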