Chapter 2

Sec. 2.1 One Variable Data

How data are described:
Sorting: arrange the values from smallest to largest.
Graphing: visualize the data and look for patterns.
Numerical summaries: measures of center and of variation.

Sec. 2.2 The Graphical Display of Data

1. Dot-plots

Example. Given the data set {1, 2, 3, 2, 4, 5, 2, 3, 6, 5}, draw a dot-plot.

[Figure: dot-plot of the data set, with the data values on the x-axis and one dot stacked above a value for each of its occurrences.]

2. Stem-leaf

Example 1. Given a set of test scores: {78, 65, 96, 81, 53, 70, 82, 98, 58, 45, 64, 75, 88, 67, 66, 68, 56, 77, 65, 48, 34, 55, 61, 70, 60}, construct a stem-leaf display.

Stem | Leaf
   3 | 4
   4 | 5 8
   5 | 3 5 6 8
   6 | 0 1 4 5 5 6 7 8
   7 | 0 0 5 7 8
   8 | 1 2 8
   9 | 6 8

Example 2. Given the stem-leaf display:

Stem | Leaf
  40 | 6 6 8
  41 | 0 5
  42 | 1 3 4

What is the original data set?
Answer: {406, 406, 408, 410, 415, 421, 423, 424}

3. Histograms

(a). Frequency Table (using the data from Example 1 above)

Class      Frequency   Relative Frequency   Cumulative Relative Frequency
30 - 39        1             1/25                     1/25
40 - 49        2             2/25                     3/25
50 - 59        4             4/25                     7/25
60 - 69        8             8/25                    15/25
70 - 79        5             5/25                    20/25
80 - 89        3             3/25                    23/25
90 - 99        2             2/25                      1
Total         25               1

In each class, the first number (30, 40, ..., 90) is the lower class limit and the second number (39, 49, ..., 99) is the upper class limit.

Bar Graph (there are gaps between the bars):

[Figure: bar graph of frequency (vertical axis, 0 to 9) against score class (30 - 39, 40 - 49, ..., 90 - 99), with gaps between the bars.]

(b). Histogram

[Figure: histogram of the same data, frequency (vertical axis, 0 to 9) against score, with adjacent bars touching.]

Note: To change the bar graph into a histogram, adjacent class limits merge at their midpoint: 39 and 40 merge at 39.5, 49 and 50 merge at 49.5, and so on. The frequency axis can also be a relative frequency axis.

[Figure: histogram of the same data with relative frequency (vertical axis, 0 to 0.35) against score.]

Section 2.3 Describe the Center of the Data

1. Arithmetic mean (Average)

Given the data set $\{x_1, x_2, \ldots, x_n\}$:

Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$

Population mean: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$

2. Median (the “middle value” of the data in ascending or descending order)

Steps:
(a). Order the data.
(b). If $n$ is even, then the median is $M = \frac{1}{2}\left(x_{n/2} + x_{(n/2)+1}\right)$.
(c). If $n$ is odd, then the median is $M = x_{(n+1)/2}$.

Examples: Find the median of the given data set.

(1).
{1, 6, 11, 9, 2}
Order: $\{x_1, x_2, x_3, x_4, x_5\} = \{1, 2, 6, 9, 11\}$
$n = 5$, an odd number, so $M = x_{(n+1)/2} = x_3 = 6$.

(2). {5, 7, 31, 19, 14, 29, 10, 25, 42, 18}
Order: 5, 7, 10, 14, 18, 19, 25, 29, 31, 42
$n = 10$, an even number, so $M = \frac{1}{2}\left(x_{n/2} + x_{(n/2)+1}\right) = \frac{1}{2}(x_5 + x_6) = \frac{1}{2}(18 + 19) = 18.5$.

3. Mode

The most frequent data value in the data set.
Note: A data set may have more than one mode. If all data values have the same frequency, there is no need to discuss the mode.

4. Midrange: $\frac{1}{2}(\text{smallest} + \text{largest})$

Section 2.4 Describe the Spread of the Data

1. Range of the data set

Range = largest - smallest
Advantage: Simple.
Disadvantage: It cannot describe the spread of the data well if there is an extreme value.
Note the difference between the range of a data set and the range of a random variable, which is the set of all possible values of the random variable.

Example 1. Find the range of the given data set.
(a). {1, 3, 4, 6, 7, 8, 10, 11, 13}
(b). {1, 3, 4, 6, 7, 8, 10, 11, 101}

Solution:
(a). range $= 13 - 1 = 12$
(b). range $= 101 - 1 = 100$
Note: The range of the second data set does not describe the spread of the data well, because a single extreme value (101) determines it.

2. Interquartile Range (IQR)

1st quartile $Q_1$: the value below which 25% of the data fall.
2nd quartile $Q_2$: the value below which 50% of the data fall. Note: $Q_2 = M$, the median.
3rd quartile $Q_3$: the value below which 75% of the data fall.
Interquartile range: $IQR = Q_3 - Q_1$

Example 2. Find the quartiles and the IQR.
(a). {1, 3, 4, 6, 8, 9, 11}
(b). {1, 3, 4, 6, 8, 9, 11, 100}

Solution:
(a). $Q_2 = 6$ (the median).
There are two ways to calculate $Q_1$ and $Q_3$, and hence the IQR:
Excluding 6: $Q_1 = 3$, the median of {1, 3, 4}. $Q_3 = 9$, the median of {8, 9, 11}. $IQR = 9 - 3 = 6$.
Including 6: $Q_1 = \frac{1}{2}(3 + 4) = 3.5$, the median of {1, 3, 4, 6}. $Q_3 = \frac{1}{2}(8 + 9) = 8.5$, the median of {6, 8, 9, 11}. $IQR = 8.5 - 3.5 = 5$.
(b). $Q_2 = \frac{1}{2}(6 + 8) = 7$, which is not a member of the data set.
$Q_1 = \frac{1}{2}(3 + 4) = 3.5$, the median of {1, 3, 4, 6}.
$Q_3 = \frac{1}{2}(9 + 11) = 10$, the median of {8, 9, 11, 100}.
$IQR = 10 - 3.5 = 6.5$

3. Standard deviation

The standard deviation is a quantity that describes the spread of the data around the mean.

Some ideas for describing the spread of the data from the mean:
How far a value is from the mean, to its left or right: the difference between a data value and the mean.
Combined effect: the sum of the differences.

Example 3. Find the sum of the differences from the mean.
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

Solution: (Treat as a sample data set.)
Mean: $\bar{x} = 6$ (verify).
Sum of the differences $= \sum_{i=1}^{11}(i - 6) = 0$ (Note: $x_i = i$ here.)

In fact, for any set $\{x_1, x_2, \ldots, x_n\}$, $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$.

Proof: $\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n}\bar{x} = n\bar{x} - n\bar{x} = 0$.

So this calculation does not show the spread of the data from the mean, because positive and negative differences cancel.

How about $\sum_{i=1}^{n}|x_i - \bar{x}|$? Mathematically, the absolute value is awkward to work with (for example, it is not differentiable at zero).

A better way to describe the deviation:

Definition of the standard deviation and the variance:

Sample standard deviation: $s = \left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{1/2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

Population standard deviation: $\sigma = \left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2\right]^{1/2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$

(Note: For the same data set, $s \ge \sigma$, with equality only when all data values are equal.)

Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Population variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$

Example 4. Find the standard deviation of the given data set.
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

Solution:
Treat as a sample: $s = \left[\frac{1}{11-1}\sum_{i=1}^{11}(i - 6)^2\right]^{1/2} = \sqrt{\frac{1}{10}\cdot 110} = \sqrt{11} \approx 3.32$. (Note: $\bar{x} = 6$.)
Treat as a population: $\sigma = \left[\frac{1}{11}\sum_{i=1}^{11}(i - 6)^2\right]^{1/2} = \sqrt{\frac{1}{11}\cdot 110} = \sqrt{10} \approx 3.16$. (Note: $\mu = 6$.)

Remarks: (The sample standard deviation $s$ is used here. The same remarks hold for $\sigma$.)
(a). $s \ge 0$, and $s = 0 \iff x_1 = x_2 = \cdots = x_n = \bar{x}$.
(b). In general, the more spread out the data values are, the larger $s$ is.
(c). $s$ is influenced strongly by extreme values.
(d). For most data sets, at least 90% of the values are in the interval $(\bar{x} - 3s, \bar{x} + 3s)$.

The 68-95-99 rule, also called the empirical rule, applies to an approximately bell-shaped distribution: about 68% of the data lie within one standard deviation of the mean, about 95% within two, and about 99% (more precisely, 99.7%) within three.
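The calculations in Examples 3 and 4 can be checked numerically. The following is a minimal sketch in Python, not part of the original notes; the variable names are chosen here for illustration. It computes the sum of the differences, the sum of the squared differences, and both standard deviations directly from the definitions, and compares them with the standard library functions `statistics.stdev` (divides by n - 1) and `statistics.pstdev` (divides by n).

```python
import math
import statistics

# Data set from Examples 3 and 4: {1, 2, ..., 11}
data = list(range(1, 12))
n = len(data)

mean = statistics.mean(data)

# Differences from the mean; their sum is always 0.
deviations = [x - mean for x in data]
sum_dev = sum(deviations)

# Sum of squared differences, the common numerator of both variances.
sum_sq = sum(d * d for d in deviations)

# Standard deviations computed directly from the definitions.
s = math.sqrt(sum_sq / (n - 1))   # sample: divide by n - 1
sigma = math.sqrt(sum_sq / n)     # population: divide by n

print("mean:", mean)
print("sum of differences:", sum_dev)
print("sum of squared differences:", sum_sq)
print("sample s:", s, "library:", statistics.stdev(data))
print("population sigma:", sigma, "library:", statistics.pstdev(data))
```

The two hand-computed values should agree with the library values up to rounding, and the sample value is the larger of the two, as noted after the definitions.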
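[Figure: bell-shaped curve over the x-axis, marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$, with brackets showing 68% of the data within $\bar{x} \pm s$, 95% within $\bar{x} \pm 2s$, and 99% within $\bar{x} \pm 3s$.]

Section 2.5 More Graphical Display: Box-Plots

Min = the minimum value
Max = the maximum value
$Q_1$ = the 1st quartile or lower quartile
$Q_2$ = the 2nd quartile (the median)
$Q_3$ = the 3rd quartile or upper quartile

[Figure: box-plot. A box extends from $Q_1$ to $Q_3$ (its length is the IQR) with a line inside at $Q_2$; whiskers extend from the box to Min and to Max.]

Section 2.6 Data Transformation

Linear transformation: $y = ax + b$, where $a$ and $b$ are fixed real numbers.

Example 1. Transform the given sample data set. Discuss the change of the mean and the standard deviation.
Sample: {1, 2, 3, 4, 5}
Transformations: (1). $y = x + 3$, (2). $y = 2x$, (3). $y = 2x + 1$

Solution:
$\bar{x} = \frac{1}{5}(1 + 2 + 3 + 4 + 5) = 3$
$s_x = \left\{\frac{1}{4}\left[(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2\right]\right\}^{1/2} = \sqrt{\frac{10}{4}} = \sqrt{\frac{5}{2}} = \frac{\sqrt{10}}{2}$

(1). $y = x + 3$: $y \in \{4, 5, 6, 7, 8\}$
$\bar{y} = \frac{1}{5}(4 + 5 + 6 + 7 + 8) = 6$
$s_y = \left\{\frac{1}{4}\left[(4-6)^2 + (5-6)^2 + (6-6)^2 + (7-6)^2 + (8-6)^2\right]\right\}^{1/2} = \frac{\sqrt{10}}{2}$
Note: $\bar{y} = \bar{x} + 3$, $s_y = s_x$.

(2). $y = 2x$: $y \in \{2, 4, 6, 8, 10\}$
$\bar{y} = \frac{1}{5}(2 + 4 + 6 + 8 + 10) = 6$
$s_y = \left\{\frac{1}{4}\left[(2-6)^2 + (4-6)^2 + (6-6)^2 + (8-6)^2 + (10-6)^2\right]\right\}^{1/2} = \sqrt{10}$
Note: $\bar{y} = 2\bar{x}$, $s_y = 2 s_x$.

(3). $y = 2x + 1$: $y \in \{3, 5, 7, 9, 11\}$
$\bar{y} = \frac{1}{5}(3 + 5 + 7 + 9 + 11) = 7$
$s_y = \left\{\frac{1}{4}\left[(3-7)^2 + (5-7)^2 + (7-7)^2 + (9-7)^2 + (11-7)^2\right]\right\}^{1/2} = \sqrt{10}$
Note: $\bar{y} = 2\bar{x} + 1$, $s_y = 2 s_x$.

In general, if $y = ax + b$ transforms the data $X$ into $Y$, then $\bar{y} = a\bar{x} + b$ and $s_y = |a|\, s_x$. Adding $b$ shifts the mean but does not change the spread; multiplying by $a$ scales both.

Standard score ($z$-score)

Definition formula:
For a sample: $z = \frac{x - \bar{x}}{s}$
For a population: $z = \frac{x - \mu}{\sigma}$

The $z$-score is a special linear transformation:
Sample: $z = \frac{x - \bar{x}}{s} = \frac{1}{s}x - \frac{\bar{x}}{s} = ax + b$, where $a = \frac{1}{s}$, $b = -\frac{\bar{x}}{s}$.
Population: $z = \frac{x - \mu}{\sigma} = \frac{1}{\sigma}x - \frac{\mu}{\sigma} = ax + b$, where $a = \frac{1}{\sigma}$, $b = -\frac{\mu}{\sigma}$.

The $z$-score is a combination of two simple linear transformations: given data $x$, let $y = x - \bar{x}$ and $z = \frac{1}{s}y$; then $z = \frac{x - \bar{x}}{s}$.

Note:
$\bar{y} = \bar{x} - \bar{x} = 0$, $s_y = s_x = s$.
$\bar{z} = \frac{1}{s}\bar{y} = 0$, $s_z = \frac{1}{s}s_y = \frac{1}{s}s = 1$.

Thus the transformation $z_i = \frac{x_i - \bar{x}}{s}$ takes $\{x_1, x_2, x_3, \ldots, x_n\}$ to $\{z_1, z_2, z_3, \ldots, z_n\}$. The original data may have any mean $\bar{x}$ and any standard deviation $s > 0$. However, the set of $z$-scores always has a mean of 0 and a standard deviation of 1, i.e. $\bar{z} = 0$, $s_z = 1$.

The meaning of the $z$-score: the $z$-score gives the number of standard deviations that a data value $x$ is away from the mean $\bar{x}$.

For example, if $z = 2$, that is $\frac{x - \bar{x}}{s} = 2$, then $x = \bar{x} + 2s$, which means $x$ is 2 standard deviations to the right of the mean $\bar{x}$.

Example 2.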
Given the scores of a student's exams in a class, which one is his/her best performance relative to the class? Which one is the worst?

Exam                  1     2     3     4
Score                40    50    60    70
Class average        30    53    65    69
Standard deviation   20     8    15    10

Solution: To answer the question we need to compute the standard score of each exam score.

Exam 1: $z_1 = \frac{40 - 30}{20} = 0.5$, that is, 0.5 standard deviations to the right of 30.
Exam 2: $z_2 = \frac{50 - 53}{8} = -0.375$, that is, 0.375 standard deviations to the left of 53.
Exam 3: $z_3 = \frac{60 - 65}{15} \approx -0.333$, that is, 0.333 standard deviations to the left of 65.
Exam 4: $z_4 = \frac{70 - 69}{10} = 0.1$, that is, 0.1 standard deviations to the right of 69.

Therefore, the score on exam 1 is the best and the score on exam 2 is the worst.
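The comparison in Example 2 can be reproduced with a few lines of code. The following is a minimal sketch in Python, not part of the original notes; the helper `z_score` and the dictionary layout are assumptions made for illustration. It computes the standard score of each exam and picks the largest and smallest, then standardizes the sample {1, 2, 3, 4, 5} from Example 1 to check that the resulting $z$-scores have mean 0 and standard deviation 1.

```python
import statistics


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Example 2 table: exam number -> (score, class average, standard deviation)
exams = {
    1: (40, 30, 20),
    2: (50, 53, 8),
    3: (60, 65, 15),
    4: (70, 69, 10),
}

z = {exam: z_score(*values) for exam, values in exams.items()}
best = max(z, key=z.get)    # exam with the largest z-score
worst = min(z, key=z.get)   # exam with the smallest z-score

for exam, value in z.items():
    print(f"Exam {exam}: z = {value:.3f}")
print("best:", best, "worst:", worst)

# Standardizing the sample from Example 1 gives mean 0 and standard deviation 1.
sample = [1, 2, 3, 4, 5]
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)          # sample standard deviation (n - 1)
z_sample = [z_score(x, x_bar, s) for x in sample]
print("mean of z:", statistics.mean(z_sample))
print("std dev of z:", statistics.stdev(z_sample))
```

Comparing raw scores would rank exam 4 (score 70) highest; the $z$-scores rank exam 1 highest because they measure each score against its own class average and spread.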