Measure of Location

Notation: We use n to denote the sample size; i.e. the number of
observations in a single sample.
e.g. if the sample of students’ heights is {180cm, 175cm, 191cm,
184cm, 178cm, 188cm}, then n = 6.
Furthermore, we use x1, x2, ..., xn to denote the sample data.
e.g. in the above example,
x1 = 180, x2 = 175, x3 = 191, x4 = 184, x5 = 178, x6 = 188.
Liang Zhang (UofU)
Applied Statistics I
June 9, 2008
Sample Mean:
The sample mean x̄ of observations x1 , x2 , . . . , xn is defined as
\[
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}
\]
Remark:
1. For simplicity, we can informally write $\bar{x} = \sum x_i / n$, where the
summation is over all sample observations.
2. When reporting x̄, we use decimal accuracy of one digit more than
the accuracy of the xi ’s.
3. The average of all values in the population is defined as the population
mean and is denoted by the Greek letter µ. In statistics, µ is
usually unavailable, and we want to get some information about the
population mean µ from the sample mean x̄.
Example:
In the previous example, the sample is {180, 175, 191, 184, 178, 188} and
the sample size is 6; then the sample mean is calculated as
\[
\bar{x} = \frac{180 + 175 + 191 + 184 + 178 + 188}{6} = \frac{1096}{6} \approx 182.7
\]
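The calculation above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `sample_mean` and the variable `heights` are illustrative choices, not part of the slides.

```python
# Sample heights in cm, taken from the example above.
heights = [180, 175, 191, 184, 178, 188]


def sample_mean(xs):
    """Return the sum of the observations divided by the sample size n."""
    n = len(xs)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    return sum(xs) / n


x_bar = sample_mean(heights)

# The data are whole centimetres, so we report one extra decimal digit.
print(round(x_bar, 1))  # prints 182.7
```

The rounding step follows the reporting rule in the remark: one more decimal digit than the data themselves carry.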
Pros and Cons
Pros: the sample mean tells us the location (center) of the sample.
Cons: the sample mean can be significantly affected by outliers, since every observation enters the sum with equal weight.
Sample Median
The sample median is obtained by first ordering the n observations
from smallest to largest (with any repeated values included so that
every sample observation appears in the ordered list). Then,
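\[
\tilde{x} =
\begin{cases}
\text{the } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ ordered value}, & \text{if } n \text{ is odd} \\
\text{average of the } \left(\frac{n}{2}\right)^{\text{th}} \text{ and } \left(\frac{n}{2}+1\right)^{\text{th}} \text{ ordered values}, & \text{if } n \text{ is even}
\end{cases}
\]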
e.g. in the previous example, the sample is
x1 = 180, x2 = 175, x3 = 191, x4 = 184, x5 = 178, x6 = 188. Then the
ordered observations are
x1:6 = 175, x2:6 = 178, x3:6 = 180, x4:6 = 184, x5:6 = 188, x6:6 = 191.
Since the sample size is even, the sample median is the average of x3:6 and
x4:6, which is (180 + 184)/2 = 182.
If we have one more observation x7 = 189, then the ordered observations are
x1:7 = 175, x2:7 = 178, x3:7 = 180, x4:7 = 184, x5:7 = 188, x6:7 = 189,
x7:7 = 191, and the sample median is x4:7 = 184, since the sample size is now odd.
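The odd/even rule can be sketched in Python as follows. The function name `sample_median` is an illustrative choice; note that Python lists are indexed from 0, so the k-th ordered value is `ordered[k - 1]`.

```python
def sample_median(xs):
    """Return the sample median using the odd/even rule for ordered values."""
    ordered = sorted(xs)  # repeated values are kept in the ordered list
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th ordered value, i.e. index n // 2
        return ordered[mid]
    # n even: average of the (n / 2)-th and (n / 2 + 1)-th ordered values
    return (ordered[mid - 1] + ordered[mid]) / 2


heights = [180, 175, 191, 184, 178, 188]
print(sample_median(heights))          # prints 182.0 (n = 6, even)
print(sample_median(heights + [189]))  # prints 184 (n = 7, odd)
```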
Remark:
1. In contrast to the sample mean, the sample median is very insensitive to
outliers. In fact, its value is determined by at most two values in
the sample: the middle one or two ordered observations.
2. Just as we defined the population mean alongside the sample mean, we can
define the population median. However, in general, the sample median does not
equal the population median. In statistics, we want to use the sample
median to infer the population median.
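To illustrate the first remark, the sketch below compares how the mean and the median react to a single outlier. The value 291 is a hypothetical mis-recorded height (191 typed as 291) introduced only for this illustration; it does not appear in the slides. The functions `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Original sample of heights in cm.
heights = [180, 175, 191, 184, 178, 188]

# Hypothetical data-entry error: 191 recorded as 291 (assumption for illustration).
with_outlier = [180, 175, 291, 184, 178, 188]

mean_shift = mean(with_outlier) - mean(heights)
median_shift = median(with_outlier) - median(heights)

print(f"mean moves by {mean_shift:.1f} cm")      # the outlier drags the mean upward
print(f"median moves by {median_shift:.1f} cm")  # the middle two values are unchanged
```

Here the two middle ordered values (180 and 184) are the same in both samples, so the median stays at 182 while the mean rises by 100/6 cm.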
Other Measures of Location:
Quartiles: a quartile is any of the three values which divide the
ordered data set into four equal parts, so that each part represents
one quarter (1/4) of the sample.
e.g. If our sample data about the students’ heights is 180, 175, 191,
184, 178, 188, 189, 183, 197, 186, 172, 169, 181, 177, 170, 172, then
the ordered data would be
169 170 172 172 | 175 177 178 180 | 181 183 184 186 | 188 189 191 197.
A summary of this sample data is given by:

Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
169.0   173.5    180.5   180.8   187.0    197.0
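The summary values can be reproduced with the sketch below. It assumes the quartiles are computed as the medians of the lower and upper halves of the ordered data, which is the convention that matches the numbers shown here; other software conventions (for example interpolation-based quantiles) can give slightly different quartiles. The function name `summary` is an illustrative choice.

```python
from statistics import mean, median

heights16 = [180, 175, 191, 184, 178, 188, 189, 183,
             197, 186, 172, 169, 181, 177, 170, 172]


def summary(xs):
    """Min, quartiles, median, mean and max, with quartiles as medians of halves."""
    ordered = sorted(xs)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # Assumption: when n is odd, the middle value is left out of both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return {
        "min": ordered[0],
        "q1": median(lower),
        "median": median(ordered),
        "mean": mean(ordered),
        "q3": median(upper),
        "max": ordered[-1],
    }


for name, value in summary(heights16).items():
    print(f"{name:>6}: {value:.1f}")
```

For this sample the first quartile is the average of 172 and 175, and the third quartile is the average of 186 and 188.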
Percentiles: a percentile is the data value below which a certain
percent of the observations fall.
e.g. the 20th percentile is the value below which 20 percent of the
observations may be found. In our previous example, the sample size
is 16, and 20% of 16 is 3.2. So the 20th percentile is 171.
Trimmed Mean: a p% trimmed mean is obtained by eliminating the
smallest p% and the largest p% of the data values and
averaging the remaining data values. It is a compromise between the sample
mean and the sample median.
e.g. in our previous example, the sample data is 180, 175, 191, 184,
178, 188, 189, 183, 197, 186, 172, 169, 181, 177, 170, 172. If we
eliminate the largest and the smallest observations, we trim 1 of the 16
values from each end, so this is a 1/16 = 6.25% trimmed mean. The 6.25%
trimmed mean is x̄tr(6.25%) = 180.4.
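A minimal Python sketch of this calculation follows. The function `trimmed_mean` and its parameter `k` (the number of values dropped from each end) are illustrative names; expressing the trim as a count rather than a percentage is a simplifying assumption that avoids rounding p% of n to a whole number of observations.

```python
def trimmed_mean(xs, k):
    """Drop the k smallest and k largest values, then average the rest."""
    ordered = sorted(xs)
    n = len(ordered)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n / 2")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


heights16 = [180, 175, 191, 184, 178, 188, 189, 183,
             197, 186, 172, 169, 181, 177, 170, 172]

k = 1
trim_percent = 100 * k / len(heights16)
print(trim_percent)                            # prints 6.25
print(round(trimmed_mean(heights16, k), 1))    # prints 180.4
```

With `k = 0` the function returns the ordinary sample mean, and as `k` grows toward n/2 the result approaches the sample median, which is the sense in which the trimmed mean is a compromise between the two.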
Categorical Data:
In some cases, we can assign numerical values to categorical data and
then calculate the sample mean. When the two categories are coded as 0
and 1, the sample mean is the sample proportion of the category coded 1.
e.g. if we toss a coin 10 times and get the result T, H, T, T, H,
T, H, H, H, T, we can assign 0 to T and 1 to H. Then, the sample
mean would be (1 + 1 + 1 + 1 + 1)/10 = 0.5 which is exactly the
proportion of heads in the sample data.
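The coin-toss example can be checked with a short Python sketch. The variable names `tosses`, `coded` and `p_hat` are illustrative.

```python
# Results of the 10 tosses from the example above.
tosses = ["T", "H", "T", "T", "H", "T", "H", "H", "H", "T"]

# Code heads as 1 and tails as 0.
coded = [1 if t == "H" else 0 for t in tosses]

# The sample mean of the 0/1 values equals the proportion of heads.
p_hat = sum(coded) / len(coded)
print(p_hat)  # prints 0.5
```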