Answers
1. When might the reported median of data be more appropriate than the mean of data?
What is your biggest challenge in deciding which is more appropriate? Add an
example in order to better illustrate your answer.
The mean and the median are both measures of central tendency. The mean is defined
as the sum of the observations divided by the number of observations. Consider the
observations 4, 5, 7, 5, 9, 5, 7.
Number of observations = 7
Sum of the observations = 4+5+7+5+9+5+7 = 42
Mean = 42/7 = 6.
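The calculation above can be checked with a few lines of Python. This is only a sketch:
the variable names are chosen here for illustration and are not part of the original
answer.

```python
from statistics import mean

# The seven observations from the example above.
observations = [4, 5, 7, 5, 9, 5, 7]

total = sum(observations)        # sum of the observations
count = len(observations)        # number of observations
mean_by_hand = total / count     # sum divided by the number of observations

# The standard-library mean should agree with the hand calculation.
library_mean = mean(observations)

print(total, count, mean_by_hand, library_mean)
```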
The median is defined as the middlemost observation when the observations are
arranged in the order of magnitude. First arrange the given observations in the order
of magnitude. Then we get
4, 5, 5, 5, 7, 7, 9
The middlemost observation is the 4th observation, so the median is 5.
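The same steps (sort, then pick the middle value) can be sketched in Python. The
indexing shown here assumes an odd number of observations, as in the example; for an
even number the usual convention, which statistics.median follows, is to average the two
middle values.

```python
from statistics import median

observations = [4, 5, 7, 5, 9, 5, 7]

# Arrange the observations in order of magnitude.
ordered = sorted(observations)
n = len(ordered)

# With n odd, the middle value sits at zero-based index n // 2,
# which is the 4th observation when n = 7.
middle_index = n // 2
median_by_hand = ordered[middle_index]

# The standard-library median should give the same value.
library_median = median(observations)

print(ordered, median_by_hand, library_median)
```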
When the distribution of the observations is symmetric, the mean and the median will
be equal. When the distribution is moderately asymmetric, the two measures will be
approximately equal. When the distribution is highly skewed (asymmetric), the two
measures will be very different.
The mean is strongly affected by extreme values (outliers), whereas the median is
hardly affected by them. The following example illustrates this.
Suppose in the above data, the last observation 7 is replaced by 25. Then the data
becomes
4, 5, 7, 5, 9, 5, 25
The mean is (4+5+7+5+9+5+25)/7 = 60/7 ≈ 8.57.
The median is 5.
The mean changed from 6 to about 8.57, but the median is unaltered.
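A short Python sketch makes the comparison concrete. Dividing the new sum, 60, by the 7
observations gives a mean of about 8.57, while the median stays at 5. The list names are
illustrative only.

```python
from statistics import mean, median

original = [4, 5, 7, 5, 9, 5, 7]
with_outlier = [4, 5, 7, 5, 9, 5, 25]   # last observation 7 replaced by 25

for label, data in (("original", original), ("with outlier", with_outlier)):
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")

# How far each measure moved when the extreme value was introduced.
mean_shift = mean(with_outlier) - mean(original)
median_shift = median(with_outlier) - median(original)
print(f"mean shift = {mean_shift:.2f}, median shift = {median_shift}")
```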
When the magnitudes of the observations are not known and only the relative
positions or ranks are known, the mean cannot be computed. In this case we can
compute the median.
2. What is the significance of the standard deviation in the empirical rule? Please add an
example to bolster your answer.
The standard deviation is a measure of dispersion. It is defined as the square root of
the variance. The variance is defined as the mean of the squares of the deviations of
the observations from the mean. Consider the data
4, 5, 7, 5, 9, 5, 7
The mean is 6. The deviations of these observations from the mean are
-2, -1, 1, -1, 3, -1, 1
The squares of the deviations are
4, 1, 1, 1, 9, 1, 1
The sum of the squared deviations is 4+1+1+1+9+1+1=18
Variance = 18/7 = 2.571
Standard deviation is √2.571 = 1.604
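The steps above (deviations, squares, mean of the squares, square root) can be sketched
in Python as follows. The calculation divides by the number of observations n, as the
text does, which corresponds to the population variance; statistics.pvariance and
statistics.pstdev use the same divisor.

```python
from math import sqrt
from statistics import pstdev, pvariance

observations = [4, 5, 7, 5, 9, 5, 7]
n = len(observations)
mu = sum(observations) / n                     # the mean

deviations = [x - mu for x in observations]    # deviations from the mean
squared = [d ** 2 for d in deviations]         # squares of the deviations

variance = sum(squared) / n                    # mean of the squared deviations
std_dev = sqrt(variance)                       # square root of the variance

# The population functions in the standard library should agree.
assert abs(variance - pvariance(observations)) < 1e-12
assert abs(std_dev - pstdev(observations)) < 1e-12

print(round(variance, 3), round(std_dev, 3))
```

Many textbooks and software packages report the sample variance instead, which divides
by n - 1; statistics.variance and statistics.stdev do that, so their results differ
slightly from the figures above.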
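The most important distribution studied in statistics is the normal distribution. It is
characterised by two parameters, μ and σ, where μ is the mean and σ is the standard
deviation. The distribution is symmetric about the mean μ. The empirical rule states how
much of a normal distribution lies within a given number of standard deviations of the
mean: approximately 68.27% of the observations lie within 1 standard deviation of the
mean (between μ – σ and μ + σ), approximately 95.45% lie within 2 standard deviations,
and approximately 99.73% lie within 3 standard deviations. The standard deviation is
therefore the unit in which distance from the mean is measured. For example, for a
normal distribution with μ = 6 and σ = 1.604, about 95.45% of the observations would lie
between 6 – 2(1.604) = 2.792 and 6 + 2(1.604) = 9.208.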
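These percentages can be reproduced from the normal distribution itself. The sketch
below relies on the standard identity that the probability of a normal variable falling
within k standard deviations of its mean is erf(k/√2); the helper name coverage_within
is chosen here for illustration.

```python
from math import erf, sqrt


def coverage_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * coverage_within(k):.2f}%")
```

The result does not depend on μ or σ, which is why the rule can be stated once for every
normal distribution.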
The z score corresponding to a value x is defined as z = (x – μ)/σ, where μ is the mean
and σ is the standard deviation. It measures how many standard deviations x lies from
the mean. If the variable x follows the normal distribution with mean μ and standard
deviation σ, then the z score follows the standard normal distribution, which is the
normal distribution with mean 0 and standard deviation 1.
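As an illustration, the seven observations used earlier can be converted to z scores.
Standardising any data set this way gives values with mean 0 and standard deviation 1;
the result is standard normal only if the original variable is normal, which is not
claimed for this small data set.

```python
from statistics import fmean, pstdev

observations = [4, 5, 7, 5, 9, 5, 7]
mu = fmean(observations)        # mean of the data
sigma = pstdev(observations)    # population standard deviation of the data

# z = (x - mu) / sigma for every observation.
z_scores = [(x - mu) / sigma for x in observations]

print([round(z, 3) for z in z_scores])
print("mean of z scores:", round(fmean(z_scores), 6))
print("standard deviation of z scores:", round(pstdev(z_scores), 6))
```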
The normal distribution is very important because of the central limit theorem. The
central limit theorem says that, whatever the distribution of the population (provided
it has a finite variance), the sample mean is approximately normally distributed if the
sample size is sufficiently large.
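A small simulation can make the theorem more tangible. The sketch below is an
illustration under assumed settings, not a proof: the exponential population, the sample
size of 50 and the 2000 repetitions are all choices made here for the example.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50     # assumed to be "sufficiently large" for this illustration
NUM_SAMPLES = 2000   # number of repeated samples

# Population: exponential with rate 1, which is strongly right-skewed.
# Its mean and its standard deviation are both 1.
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

center = fmean(sample_means)    # expected to be close to 1
spread = pstdev(sample_means)   # expected to be close to 1 / sqrt(50)

# Share of sample means within 1 and 2 standard deviations of their centre.
# For a normal distribution these shares are about 68% and 95%.
within_1 = sum(abs(m - center) <= spread for m in sample_means) / NUM_SAMPLES
within_2 = sum(abs(m - center) <= 2 * spread for m in sample_means) / NUM_SAMPLES

print(f"centre = {center:.3f}, spread = {spread:.3f}")
print(f"within 1 sd: {within_1:.1%}, within 2 sd: {within_2:.1%}")
```

Although the population is far from normal, the sample means should cluster around the
population mean with shares close to those of the empirical rule.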
3. Can a z score be negative? Why or why not?
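Yes. The z score, z = (x – μ)/σ, can take any real value. Since σ is positive, the sign
of z is the sign of x – μ. If x is less than the mean μ, x – μ is negative and z is
negative. If x is equal to the mean μ, x – μ is 0 and z is 0. If x is greater than the
mean μ, x – μ is positive and z is positive.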
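The three cases can be checked with the mean and standard deviation from the earlier
example (μ = 6, σ = 1.604). The function name z_score is introduced here for
illustration only.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return (x - mu) / sigma; sigma must be positive."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


mu, sigma = 6.0, 1.604   # mean and standard deviation from the earlier example

below = z_score(4, mu, sigma)   # x below the mean gives a negative z
equal = z_score(6, mu, sigma)   # x equal to the mean gives z = 0
above = z_score(9, mu, sigma)   # x above the mean gives a positive z

print(round(below, 3), equal, round(above, 3))
```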
4. When is it more appropriate to use the standard deviation of data rather than the
variance of data? Is one a better measure of dispersion than the other? Explain and
provide an example.
The standard deviation and the variance are not really two different measures of
dispersion. The variance is the square of the standard deviation. The standard
deviation is the square root of the variance. In our example given above, variance =
2.571 and standard deviation = 1.604. The square of 1.604 is 2.571 and the square
root of 2.571 is 1.604.
Since variance is the mean of the squared deviations, the unit of variance is the square
of the unit of data. But the unit of standard deviation is the same as the unit of data.
For example, if the data are given in inches, the variance is in squared inches and the
standard deviation is in inches. For this reason the standard deviation is usually the
more convenient of the two to report and interpret alongside the mean. Neither carries
more information than the other: if we know the value of the variance, then we know the
value of the standard deviation and vice versa.
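The point about units can be illustrated by converting the same data from inches to
centimetres. In this sketch the earlier observations are treated as hypothetical lengths
in inches, an assumption made only for the example.

```python
from statistics import pstdev, pvariance

CM_PER_INCH = 2.54

# Hypothetical lengths in inches (the earlier example data reused as measurements).
inches = [4, 5, 7, 5, 9, 5, 7]
centimetres = [x * CM_PER_INCH for x in inches]

var_in, sd_in = pvariance(inches), pstdev(inches)              # square inches, inches
var_cm, sd_cm = pvariance(centimetres), pstdev(centimetres)    # square cm, cm

# The standard deviation scales like the data; the variance scales by the square.
sd_ratio = sd_cm / sd_in       # about 2.54
var_ratio = var_cm / var_in    # about 2.54 ** 2

print(round(sd_ratio, 4), round(var_ratio, 4))
```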