Numerical Summary Measures of Center for Data

Ch2.1 Numerical Summary Measures of Center for Data
---------------------------------------------------------------------------------------------------------
Topics:

Measures of Center: mean, median
(Note: Here we cover the summary measures for DATA only. We will cover the
measures for DISTRIBUTIONS in the 4th and 5th weeks, after we
introduce the concept of a probability distribution.)
---------------------------------------------------------------------------------------------------------
I: Measures of Center for Data:
(1) Mean

The mean $\bar{x}$ of n observations $x_1, x_2, \ldots, x_n$ is (here the subscript
does not contain information on the magnitude of a data point)
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Ex. Sue wanted to study the systolic blood pressure (BP), x, of NCSU
freshmen; 7 freshmen were randomly selected and their BP values are
121, 110, 114, 100, 103, 130, 130 (Note: $\sum x_i = 808$)
The sample mean of BP is
$\overline{BP} = 808/7 \approx 115.4$
We report the mean with one more digit than the original data.
Note: The R function to calculate the mean is mean()
> bp <- c(121, 110, 114, 100, 103, 130, 130)
> mean(bp)
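The same result can be reproduced step by step from the definition. The following is a minimal sketch; the variable names `n` and `total` are illustrative and not part of the notes.

```r
# BP values from the example above
bp <- c(121, 110, 114, 100, 103, 130, 130)

n <- length(bp)       # number of observations
total <- sum(bp)      # sum of all observations

total / n             # mean from the definition: sum divided by n
round(mean(bp), 1)    # built-in mean, reported with one extra digit
```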
(2) Median

The median is the middle value of the data, such that the same number of
data points lie above it and below it.

To get the median $\tilde{x}$ of the n observations in the sample:
(1) Sort the data, from the smallest to the largest
(2) If n is odd, then $\tilde{x}$ = the middle value, i.e.,
$$\tilde{x} = \text{the } \left(\frac{n+1}{2}\right)\text{th data point}$$
If n is even, then $\tilde{x}$ = the average of the middle two values, i.e.,
$$\tilde{x} = \frac{1}{2}\left[\text{the } \left(\frac{n}{2}\right)\text{th data point} + \text{the } \left(\frac{n}{2}+1\right)\text{th data point}\right]$$
Ex. In the BP example, there are 7 observations: 121, 110, 114, 100, 103,
130, 130. The sample median is:
Sorted data is: 100 103 110 114 121 130 130. So $\widetilde{BP} = 114$
Ex. Suppose the BP example has one more data point, 105. Then the sample
median becomes:
The sorted data becomes: 100 103 105 110 114 121 130 130.
So $\widetilde{BP} = (110 + 114)/2 = 112$.
Note: The R function to calculate the median is median()
> median(bp)
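The even-n case from the second example can be checked the same way. This is a short sketch; the name `bp8` is illustrative.

```r
bp <- c(121, 110, 114, 100, 103, 130, 130)
bp8 <- c(bp, 105)     # add the extra observation 105

sort(bp8)             # 100 103 105 110 114 121 130 130
median(bp8)           # average of the 4th and 5th sorted values: 112
```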
Comment (1) : Mean vs. Median
1. Mean is sensitive to outliers (extreme values), while median is less affected
by outliers.
Ex. Data set 1: {1, 2, 3}. Data set 2: {1, 2, 99}.
The mean of data set 1: 2
The median of data set 1: 2
The mean of data set 2: 34
The median of data set 2: 2
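A quick check of the two data sets in R shows the effect of the single extreme value. This is a sketch; `set1` and `set2` are illustrative names.

```r
set1 <- c(1, 2, 3)
set2 <- c(1, 2, 99)   # same data with the largest value replaced by an outlier

mean(set1)            # 2
median(set1)          # 2
mean(set2)            # 34: pulled up by the outlier
median(set2)          # 2: unchanged
```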
2. Mean is the balance point of the data.
A balance point is the point such that
(sum of the distances of the points above the mean)
=
(sum of the distances of the points below the mean)
Ex. A sample consists of 5 data points 1, 2, 3, 10, 14.
The mean $\bar{x}$ = 6
Data points < $\bar{x}$: 1, 2, 3. Total distance to $\bar{x}$ = 5 + 4 + 3 = 12
Data points > $\bar{x}$: 10, 14. Total distance to $\bar{x}$ = 4 + 8 = 12
Median is the midpoint of the distribution.
That is, half of the data points are above the median and half are below it.
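The balance-point property of the mean can be verified numerically for the 5-point sample. This is a minimal sketch; `dev` is an illustrative name for the deviations from the mean.

```r
x <- c(1, 2, 3, 10, 14)
m <- mean(x)                 # 6

dev <- x - m                 # signed deviations from the mean
sum(dev[dev > 0])            # total distance of points above the mean: 12
sum(-dev[dev < 0])           # total distance of points below the mean: 12
sum(dev)                     # the signed deviations cancel: 0
```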
3. The relationship between mean and median depends on the shape of the
distribution
a. For a symmetric distribution, mean ≈ median
b. For a positively-skewed distribution, mean > median
c. For a negatively-skewed distribution, mean < median
In other words, from the relationship between the mean and median, we can guess
the shape of the distribution
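As an illustration, the sketch below uses a small made-up sample with a long right tail. The values are hypothetical and chosen only to show the direction of the inequality.

```r
# Hypothetical right-skewed sample (illustrative values, not from the notes)
right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 4, 20)
mean(right_skewed) > median(right_skewed)    # TRUE: mean is pulled toward the long right tail

# Mirroring the data gives a negatively-skewed sample
left_skewed <- -right_skewed
mean(left_skewed) < median(left_skewed)      # TRUE
```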
Comment (2) : Change of Unit
The mean and median have the same unit as the data, so their values
change when the measuring unit changes.
Original data: $x_1, x_2, \ldots, x_n$; transformation: y = ax + b.
New data: $y_1, y_2, \ldots, y_n$, where $y_i = a x_i + b$
When the unit of measure changes from x to $y = a x + b$, then
The new mean $\bar{y} = a\bar{x} + b$
The new median $\tilde{y} = a\tilde{x} + b$
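The rule for the mean follows directly from the definition. A short derivation, using the same symbols as above:

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
        = \frac{1}{n}\sum_{i=1}^{n} (a x_i + b)
        = a \cdot \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{n b}{n}
        = a\bar{x} + b
```

For the median, when a > 0 the transformation preserves the order of the data, so the middle value (or the two middle values) of the y's comes from the middle value(s) of the x's.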
Ex. The temperatures in Raleigh over the next 6 days are predicted to be 43, 39,
33, 39, 45 and 48 in Fahrenheit. What are the mean and median of these
temperatures? What are the mean ($\bar{x}_C$) and median ($\tilde{x}_C$) if we switch to
Centigrade? Note that $C = (F - 32) \times \frac{5}{9}$.
Mean = 41.2 (F)
Median = 41 (F)
New mean in C = (41.17 - 32) × 5/9 = 5.1 (C)
New median in C = (41 - 32) × 5/9 = 5.0 (C)
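The temperature example can be worked in R using the conversion C = (F - 32) × 5/9 stated in the example. This is a sketch; `to_celsius` is a hypothetical helper, not a built-in R function.

```r
temp_f <- c(43, 39, 33, 39, 45, 48)          # forecast in Fahrenheit

# Hypothetical helper: linear change of unit with a = 5/9 and b = -160/9
to_celsius <- function(f) (f - 32) * 5 / 9

mean(temp_f)                                 # about 41.2 F
median(temp_f)                               # 41 F

mean_c   <- to_celsius(mean(temp_f))         # convert the summary directly
median_c <- to_celsius(median(temp_f))
round(mean_c, 1)                             # about 5.1 C
round(median_c, 1)                           # 5 C

# Converting every data point first gives the same summaries
all.equal(mean(to_celsius(temp_f)), mean_c)
all.equal(median(to_celsius(temp_f)), median_c)
```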