Ch2.1 Numerical Summary Measures of Center for Data
--------------------------------------------------------------------------------
Topics: Measures of Center: mean, median

(Note: Here we cover the summary measures for DATA only. We will cover the measures for DISTRIBUTIONS in the 4th and 5th weeks, after we introduce the concept of a probability distribution.)
--------------------------------------------------------------------------------

I: Measures of Center for Data

(1) Mean

The mean $\bar{x}$ of $n$ observations $x_1, x_2, \ldots, x_n$ is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

(The subscript only labels the observation; it does not contain information on the magnitude of a data point.)

Ex. Sue wanted to study the systolic blood pressure (BP), $x$, of NCSU freshmen. 7 freshmen were randomly selected, and their BP values are

    121, 110, 114, 100, 103, 130, 130        (Note: $\sum x_i = 808$)

The sample mean of BP is $\overline{BP} = 808/7 \approx 115.4$.

We use one more digit in the mean than in the original data.

Note: The R function to calculate the mean is mean()

    > bp <- c(121, 110, 114, 100, 103, 130, 130)
    > mean(bp)

(2) Median

The median is the middle value of the data: the same number of data points lie above it and below it.

To get the median $\tilde{x}$ of the $n$ observations in the sample:
(1) Sort the data from the smallest to the largest.
(2) If $n$ is odd, then $\tilde{x}$ is the middle value, i.e.,

$$\tilde{x} = \text{the } \tfrac{n+1}{2}\text{th data point}$$

If $n$ is even, then $\tilde{x}$ is the average of the middle two values, i.e.,

$$\tilde{x} = \frac{1}{2}\left[\text{the } \tfrac{n}{2}\text{th data point} + \text{the } \left(\tfrac{n}{2} + 1\right)\text{th data point}\right]$$

Ex. In the BP example, there are 7 observations: 121, 110, 114, 100, 103, 130, 130.
The sorted data are: 100 103 110 114 121 130 130.
So the sample median is $\widetilde{BP} = 114$ (the 4th data point).

Ex. Suppose the BP example has one more data point, 105.
The sorted data become: 100 103 105 110 114 121 130 130.
So the sample median becomes $\widetilde{BP} = (110 + 114)/2 = 112$.

Note: The R function to calculate the median is median()

    > median(bp)

Comment (1): Mean vs. Median

1. The mean is sensitive to outliers (extreme values), while the median is much less affected by them.

Ex. Data set 1: {1, 2, 3}. Data set 2: {1, 2, 99}.
The mean of data set 1: 2        The median of data set 1: 2
The mean of data set 2: 34       The median of data set 2: 2

2. The mean is the balance point of the data. A balance point is the point such that
   sum of the distances of the points above the mean = sum of the distances of the points below the mean.

Ex. A sample consists of 5 data points: 1, 2, 3, 10, 14. The mean is $\bar{x} = 6$.

                              Below the mean       Above the mean
    Data points               1, 2, 3              10, 14
    Total distance to mean    5 + 4 + 3 = 12       4 + 8 = 12

The median is the midpoint of the distribution. That is, half of the data points lie above the median and half lie below it.

3. The relationship between the mean and the median depends on the shape of the distribution:
a. For a symmetric distribution, mean ≈ median
b. For a positively skewed distribution, mean > median
c. For a negatively skewed distribution, mean < median
In other words, from the relationship between the mean and the median, we can guess the shape of the distribution.

Comment (2): Change of Unit

The mean and the median have the same unit as the data, so their values change when the unit of measurement changes.

Original data: $x_1, x_2, \ldots, x_n$. Transformation: $y = ax + b$. New data: $y_1, y_2, \ldots, y_n$, where $y_i = a x_i + b$.

When the unit of measurement changes from $x$ to $y = ax + b$, then
The new mean: $\bar{y} = a\bar{x} + b$
The new median: $\tilde{y} = a\tilde{x} + b$

Ex. The temperatures in Raleigh over the next 6 days are predicted to be 43, 39, 33, 39, 45 and 48 in Fahrenheit. What are the mean and median of these temperatures? What are the mean ($\bar{x}_C$) and median ($\tilde{x}_C$) if we switch to Centigrade? Note that

$$C = (F - 32) \times \frac{5}{9}$$

Mean = 247/6 ≈ 41.2 (F)
Median = (39 + 43)/2 = 41 (F)
New mean in C = (41.17 - 32) × 5/9 ≈ 5.1 (C)
New median in C = (41 - 32) × 5/9 = 5.0 (C)
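The two-step median procedure from section (2) can be sketched in R as follows. The function name median_by_hand is a hypothetical helper introduced only for illustration; it is not part of R. It sorts the data, then picks the middle value (odd n) or averages the middle two (even n), and it should agree with the built-in median().

```r
# Hypothetical helper (not a base R function): the median computed
# by following the sort-then-pick-the-middle steps from the notes.
median_by_hand <- function(x) {
  s <- sort(x)            # step (1): sort from smallest to largest
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]        # odd n: the (n+1)/2-th data point
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2   # even n: average of the middle two
  }
}

bp  <- c(121, 110, 114, 100, 103, 130, 130)   # 7 observations
bp8 <- c(bp, 105)                             # with the extra data point

median_by_hand(bp)    # 114
median_by_hand(bp8)   # 112
```

Comparing median_by_hand(bp) with median(bp) is a quick way to confirm that the hand calculation and the R function match.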
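As a small check of Comment (1), the R sketch below recomputes the outlier example and the balance-point example. The variable names (set1, set2, dist_below, dist_above) are chosen here for illustration only.

```r
# Outlier example: replacing 3 by 99 moves the mean but not the median
set1 <- c(1, 2, 3)
set2 <- c(1, 2, 99)

c(mean = mean(set1), median = median(set1))   # mean 2,  median 2
c(mean = mean(set2), median = median(set2))   # mean 34, median 2

# Balance-point example: distances to the mean on each side are equal
x    <- c(1, 2, 3, 10, 14)
xbar <- mean(x)                               # 6

dist_below <- sum(xbar - x[x < xbar])         # 5 + 4 + 3
dist_above <- sum(x[x > xbar] - xbar)         # 4 + 8

c(below = dist_below, above = dist_above)     # both 12
```

The same balance check can be tried with the median in place of the mean; in general the two totals will then differ, which is one way to see that the balance property belongs to the mean.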
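The change-of-unit rule in Comment (2) can be checked numerically with the Raleigh temperatures. In this sketch, to_celsius is a hypothetical helper written for illustration, using the conversion C = (F - 32) × 5/9 stated in the example. It converts every temperature first and then summarizes, and compares that with converting the Fahrenheit mean and median directly.

```r
temp_f <- c(43, 39, 33, 39, 45, 48)           # predicted temperatures (F)

# Hypothetical helper: linear change of unit with a = 5/9, b = -160/9
to_celsius <- function(f) (f - 32) * 5 / 9

temp_c <- to_celsius(temp_f)                  # convert each data point

mean_c_direct   <- mean(temp_c)               # summarize the converted data
mean_c_rule     <- to_celsius(mean(temp_f))   # convert the original mean
median_c_direct <- median(temp_c)
median_c_rule   <- to_celsius(median(temp_f))

round(c(mean_c_direct, mean_c_rule), 1)       # both about 5.1
round(c(median_c_direct, median_c_rule), 1)   # both 5
```

Both routes give the same answer because the transformation is linear; a nonlinear change of scale (for example, a logarithm) would not preserve the mean in this way.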