Review 1 (2004.11.10)

Chapter 1:
1. Elements, variables, and observations.
2. Types of data: qualitative data and quantitative data.
(a) Qualitative data may be nonnumeric or numeric.
(b) Quantitative data are always numeric.
(c) Arithmetic operations are meaningful only with quantitative data.

Chapter 2 (see Figure 2.22, p. 66):
1. Summarizing qualitative data: frequency distribution, relative frequency distribution, and percent frequency distribution; bar plot and pie plot.
2. Summarizing quantitative data: frequency distribution, relative frequency distribution, percent frequency distribution, cumulative frequency distribution, cumulative relative frequency distribution, and cumulative percent frequency distribution; histogram, ogive, and stem-and-leaf display.

Chapter 3: Measures of location, measures of dispersion, exploratory data analysis, measures of relative location, and the weighted and grouped mean and variance.

Chapter 4:
Tabular and graphical methods: crosstabulation (qualitative and quantitative data) and scatter diagram (quantitative data only).
Numerical methods: covariance and correlation coefficient.

Example 1 (Chapter 1)
A magazine surveyed a sample of its subscribers. Some of the responses from the survey are shown below.

| Subscriber ID | Gender | Age | Income ($1000) |
|---|---|---|---|
| 0006 | F | 22 | 45 |
| 4798 | M | 21 | 53 |
| 2291 | F | 33 | 82 |
| 4988 | M | 38 | 30 |

(a) How many elements are in the data set? Write them down.
(b) How many variables are in the data set? Write them down.
(c) How many observations are in the data set? Write them down.
(d) Which of the variables (Gender, Age, Income) are qualitative and which are quantitative?
(e) Are the data time series or cross-sectional?

[solution:]
(a) 4 elements, the subscribers 0006, 4798, 2291, and 4988.
(b) 3 variables: Gender, Age, and Income.
(c) 4 observations: (F, 22, 45), (M, 21, 53), (F, 33, 82), and (M, 38, 30).
(d) Quantitative: Age and Income. Qualitative: Gender.
(e) The data are cross-sectional: all responses come from one survey, collected at (approximately) the same point in time.
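To make the distinction between elements, variables, and observations concrete, the Example 1 data can be held in a small Python structure. This is an illustrative sketch only; the names `records`, `variables`, and `kind` are assumptions introduced here. Because qualitative data may be numeric codes (Chapter 1, 2(a)), the type of each variable is recorded by hand rather than inferred from the Python type.

```python
from collections import Counter
from statistics import mean

# Example 1 survey data. Subscriber ID identifies each element; it is kept as a
# string so leading zeros ("0006") survive and no arithmetic is done on it.
records = [
    ("0006", "F", 22, 45),
    ("4798", "M", 21, 53),
    ("2291", "F", 33, 82),
    ("4988", "M", 38, 30),
]
variables = ("Gender", "Age", "Income")

# Recorded by hand: a numeric-looking column is not automatically quantitative.
kind = {"Gender": "qualitative", "Age": "quantitative", "Income": "quantitative"}

elements = [r[0] for r in records]        # one element per subscriber
observations = [r[1:] for r in records]   # one observation (tuple) per element

for j, var in enumerate(variables):
    column = [obs[j] for obs in observations]
    if kind[var] == "quantitative":
        # Arithmetic such as a mean is meaningful only for quantitative data.
        print(var, "mean:", mean(column))
    else:
        # Qualitative data are summarized by counting each category.
        print(var, "counts:", dict(Counter(column)))
```

The number of elements, variables, and observations in (a) to (c) is simply `len(elements)`, `len(variables)`, and `len(observations)`.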
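Example 2 (Chapter 2)
Consider the sample data in the following frequency distribution.

| Class | Frequency |
|---|---|
| 3-7 | 4 |
| 8-12 | 7 |
| 13-17 | 9 |
| 18-22 | 5 |

Summarize the data by constructing:
(a) a relative frequency distribution, a cumulative relative frequency distribution, and a cumulative percent frequency distribution;
(b) a histogram and an ogive;
(c) the mean, the standard deviation, and the coefficient of variation for the grouped data.

[solution:]
(a)

| Class | Relative Frequency | Cumulative Relative Frequency | Cumulative Percent Frequency |
|---|---|---|---|
| 3-7 | 0.16 | 0.16 | 16 |
| 8-12 | 0.28 | 0.44 | 44 |
| 13-17 | 0.36 | 0.80 | 80 |
| 18-22 | 0.20 | 1.00 | 100 |

(c) With class midpoints $M_i = 5, 10, 15, 20$ and $n = 25$,
$$\bar{x}_g = \frac{\sum_i f_i M_i}{n} = \frac{5(4) + 10(7) + 15(9) + 20(5)}{25} = \frac{325}{25} = 13.$$
Since
$$s_g^2 = \frac{\sum_i f_i (M_i - \bar{x}_g)^2}{n - 1} = \frac{(5-13)^2(4) + (10-13)^2(7) + (15-13)^2(9) + (20-13)^2(5)}{25 - 1} = \frac{600}{24} = 25,$$
we have $s_g = \sqrt{25} = 5$ and $\text{c.v.} = \frac{s_g}{\bar{x}_g} \times 100 = \frac{5}{13} \times 100 \approx 38.46$.

Example 3 (Chapter 3)
For the data 2, 1, 0, 2, 0, 2, 1, 2, 0, 2, 1, 2, find:
(a) the mean;
(b) the standard deviation;
(c) the coefficient of variation;
(d) the (100/3)th percentile;
(e) the 82nd percentile;
(f) the mode;
(g) the interquartile range;
(h) the five-number summary;
(i) the box plot;
(j) any outliers.

[solution:]
(a) $\bar{x} = \frac{\sum_{i=1}^{12} x_i}{12} = \frac{2 + 1 + \cdots + 2}{12} = \frac{15}{12} = 1.25$
(b) $s = \sqrt{\frac{\sum_{i=1}^{12} (x_i - \bar{x})^2}{12 - 1}} = \sqrt{\frac{(2-1.25)^2 + (1-1.25)^2 + \cdots + (2-1.25)^2}{11}} = \sqrt{\frac{8.25}{11}} \approx 0.866$
(c) $\text{C.V.} = \frac{s}{\bar{x}} \times 100 = \frac{0.866}{1.25} \times 100 \approx 69.28$
(d) The sorted data are 0 0 0 1 1 1 2 2 2 2 2 2. The index is $i = \frac{100/3}{100} \times 12 = 4$, an integer, so the percentile is the average of the 4th and 5th values: $\frac{1 + 1}{2} = 1$ is the (100/3)th percentile.
(e) The index is $i = \frac{82}{100} \times 12 = 9.84$, which is not an integer, so round up to 10. Thus the 10th value in (d), 2, is the 82nd percentile.
(f) The mode is 2.
(g) For $Q_1$ the index is $\frac{25}{100} \times 12 = 3$, so $Q_1 = \frac{0 + 1}{2} = 0.5$; for $Q_3$ the index is $\frac{75}{100} \times 12 = 9$, so $Q_3 = \frac{2 + 2}{2} = 2$. Hence $\text{IQR} = Q_3 - Q_1 = 2 - 0.5 = 1.5$.
(h)

| Minimum | $Q_1$ | $Q_2$ | $Q_3$ | Maximum |
|---|---|---|---|---|
| 0 | 0.5 | 1.5 | 2 | 2 |

Example 4 (Chapter 3)
Suppose we have the following data:

| Rent | Frequency |
|---|---|
| 420-439 | 8 |
| 440-459 | 17 |
| 460-479 | 12 |
| 480-499 | 8 |
| 500-519 | 7 |
| 520-539 | 4 |
| 540-559 | 2 |
| 560-579 | 4 |
| 580-599 | 2 |
| 600-619 | 6 |

What are the mean rent and the sample variance for the rent?

[solution:]
$$\bar{x}_g = \frac{\sum_{i=1}^{10} f_i M_i}{n},$$
where $f_i$ is the frequency of class $i$, $M_i$ is the midpoint of class $i$, and $n = 70$ is the sample size.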
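The grouped-data formulas used here and in Example 2 can be checked with a short Python sketch. The function names `grouped_stats` and `relative_and_cumulative` are illustrative assumptions, not part of the notes; as in the notes, each class is represented by its midpoint, and the variance uses the sample divisor $n - 1$.

```python
def grouped_stats(midpoints, freqs):
    """Grouped-data sample mean, variance, standard deviation, and coefficient
    of variation (in percent); class i is represented by its midpoint M_i,
    weighted by its frequency f_i."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    var = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
    sd = var ** 0.5
    return mean, var, sd, sd / mean * 100


def relative_and_cumulative(freqs):
    """Relative frequencies and cumulative relative frequencies."""
    n = sum(freqs)
    rel = [f / n for f in freqs]
    cum, running = [], 0
    for f in freqs:
        running += f
        cum.append(running / n)
    return rel, cum


# Example 2: classes 3-7, 8-12, 13-17, 18-22
ex2_mid = [5, 10, 15, 20]
ex2_freq = [4, 7, 9, 5]
print(relative_and_cumulative(ex2_freq))
print(grouped_stats(ex2_mid, ex2_freq))

# Example 4: ten rent classes of width 20, the first being 420-439
ex4_mid = [429.5 + 20 * k for k in range(10)]
ex4_freq = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]
mean, var, sd, cv = grouped_stats(ex4_mid, ex4_freq)
print(round(mean, 2), round(var, 2))
```

Multiplying the cumulative relative frequencies by 100 gives the cumulative percent frequency column of Example 2(a).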
The class frequencies and midpoints are:

| Rent | $f_i$ | $M_i$ |
|---|---|---|
| 420-439 | 8 | 429.5 |
| 440-459 | 17 | 449.5 |
| 460-479 | 12 | 469.5 |
| 480-499 | 8 | 489.5 |
| 500-519 | 7 | 509.5 |
| 520-539 | 4 | 529.5 |
| 540-559 | 2 | 549.5 |
| 560-579 | 4 | 569.5 |
| 580-599 | 2 | 589.5 |
| 600-619 | 6 | 609.5 |

Thus $\sum_{i=1}^{10} f_i M_i = 34525$ and $\bar{x}_g = \frac{34525}{70} \approx 493.21$.

For the sample variance,
$$s_g^2 = \frac{\sum_{i=1}^{10} f_i (M_i - \bar{x}_g)^2}{70 - 1} \approx \frac{208234.29}{69} \approx 3017.89.$$

Example 5 (Chapter 3)
(a) Consider a sample with data values 10, 20, 12, 17, and 16. Compute the z-score for each of the five observations.
(b) Suppose the data have a bell-shaped distribution with a mean of 20 and a standard deviation of 5. Use both Chebyshev's theorem and the empirical rule to determine the percentage of data within the range 10 to 30.

[solution:]
(a) Since
$$\bar{x} = \frac{10 + 20 + 12 + 17 + 16}{5} = 15 \quad\text{and}\quad s = \sqrt{\frac{(10-15)^2 + (20-15)^2 + (12-15)^2 + (17-15)^2 + (16-15)^2}{5 - 1}} = \sqrt{\frac{64}{4}} = 4,$$
the z-scores are
$$z_1 = \frac{x_1 - \bar{x}}{s} = \frac{10 - 15}{4} = -1.25,\quad z_2 = \frac{20 - 15}{4} = 1.25,\quad z_3 = \frac{12 - 15}{4} = -0.75,\quad z_4 = \frac{17 - 15}{4} = 0.5,\quad z_5 = \frac{16 - 15}{4} = 0.25.$$
(b) $[10, 30] = 20 \pm 10 = \bar{x} \pm 2s$. Thus, by Chebyshev's theorem, at least $\left(1 - \frac{1}{2^2}\right) \times 100\% = 75\%$ of the data lie within 2 standard deviations of the mean. By the empirical rule, approximately 95% of the data values lie within this interval.

Example 6 (Chapter 4)
(a) The following data are for 30 observations on two qualitative variables, x and y. The categories for x are A, B, and C; the categories for y are 1 and 2.

| Observation | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x | A | B | B | C | B | C | B | C | A | B | A | B | C | C | C |
| y | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | 2 |

| Observation | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x | B | C | B | C | B | C | B | C | A | B | C | C | A | B | B |
| y | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 |

(i) Develop a crosstabulation for the data, with x in columns and y in rows.
(ii) What is the relationship, if any, between x and y?
(b) For the following data,

| X | 2 | 4 | 6 | 8 | 10 |
|---|---|---|---|---|---|
| Y | -5 | -7 | -9 | -11 | -13 |

compute and interpret the sample covariance and the sample correlation coefficient.

[solution:]
(a) (i)

| y \ x | A | B | C | Total |
|---|---|---|---|---|
| 1 | 5 | 11 | 2 | 18 |
| 2 | 0 | 2 | 10 | 12 |
| Total | 5 | 13 | 12 | 30 |

(ii) Category A values for x are always associated with category 1 values for y.
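Category B values for x are usually associated with category 1 values for y. Category C values for x are usually associated with category 2 values for y.
(b) Since $\bar{x} = 6$ and $\bar{y} = -9$,
$$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{(2-6)(-5-(-9)) + (4-6)(-7-(-9)) + (6-6)(-9-(-9)) + (8-6)(-11-(-9)) + (10-6)(-13-(-9))}{5 - 1} = \frac{-40}{4} = -10.$$
Also, since
$$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(2-6)^2 + (4-6)^2 + (6-6)^2 + (8-6)^2 + (10-6)^2}{5 - 1} = \frac{40}{4} = 10$$
and
$$s_y^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} = \frac{(-5-(-9))^2 + (-7-(-9))^2 + (-9-(-9))^2 + (-11-(-9))^2 + (-13-(-9))^2}{5 - 1} = \frac{40}{4} = 10,$$
we obtain
$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-10}{\sqrt{10}\sqrt{10}} = -1.$$
The negative covariance indicates a negative linear relationship between the two variables: larger values of x go with smaller values of y. Further, the correlation coefficient $r_{xy} = -1$ indicates a perfect negative linear relationship: every point lies exactly on a straight line with negative slope (here $y = -3 - x$).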
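The ungrouped calculations in Examples 3, 5, and 6 can be cross-checked with the following Python sketch. It is an illustration, not part of the original notes: all function names are hypothetical, `percentile` follows the index rule used in Example 3 (round a non-integer index up; average positions i and i + 1 for an integer index), and the outlier limits $Q_1 - 1.5\,\text{IQR}$ and $Q_3 + 1.5\,\text{IQR}$ are an assumption, since the notes do not state the rule. The covariance is written out rather than taken from `statistics.covariance` (Python 3.10+) so that it mirrors the formula in Example 6(b).

```python
import math
from collections import Counter
from statistics import mean, stdev


def percentile(data, p):
    """p-th percentile, 0 < p < 100, by the index rule used in Example 3:
    i = (p / 100) * n; a non-integer i is rounded up and that position is used,
    an integer i gives the average of positions i and i + 1 (1-based)."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    xs = sorted(data)
    i = p / 100 * len(xs)
    if abs(i - round(i)) > 1e-9:          # non-integer index: round up
        return xs[math.ceil(i) - 1]
    i = round(i)                          # integer index (tolerates float error)
    return (xs[i - 1] + xs[i]) / 2


def five_number_summary(data):
    return (min(data), percentile(data, 25), percentile(data, 50),
            percentile(data, 75), max(data))


def outliers(data):
    """Values below Q1 - 1.5 IQR or above Q3 + 1.5 IQR (assumed rule)."""
    q1, q3 = percentile(data, 25), percentile(data, 75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


def z_scores(data):
    m, s = mean(data), stdev(data)        # sample mean and sample std. deviation
    return [(x - m) / s for x in data]


def chebyshev_bound(k):
    """Chebyshev: at least 1 - 1/k^2 of the data lie within k > 1 std. deviations."""
    return 1 - 1 / k ** 2


def covariance_and_correlation(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return sxy, sxy / (stdev(x) * stdev(y))


# Example 3
ex3 = [2, 1, 0, 2, 0, 2, 1, 2, 0, 2, 1, 2]
print(mean(ex3), stdev(ex3), stdev(ex3) / mean(ex3) * 100)
print(percentile(ex3, 100 / 3), percentile(ex3, 82))
print(five_number_summary(ex3), outliers(ex3))

# Example 5
print(z_scores([10, 20, 12, 17, 16]), chebyshev_bound(2))

# Example 6: observations 1-30, one character per observation
x6 = "ABBCBCBCABABCCCBCBCBCBCABCCABB"
y6 = "111212121111222211112121122112"
print(Counter(zip(x6, y6)))               # (x, y) pair -> count
print(covariance_and_correlation([2, 4, 6, 8, 10], [-5, -7, -9, -11, -13]))
```

With the Example 3 data the assumed limits are $0.5 - 1.5(1.5) = -1.75$ and $2 + 1.5(1.5) = 4.25$, so `outliers(ex3)` returns an empty list: under that rule the data contain no outliers.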