Understanding centre and spread (year 11/12)
Centre and spread are the two most basic concepts used in descriptive and
comparative statistics. Whether we are describing one sample or comparing two
samples, we want to identify the one number that best describes the whole
group (the centre) and how much the individuals in that group vary about that
centre (the spread). Other descriptors, such as skew and unusual features, are
also useful, but the most basic information about a distribution is captured
by its centre and spread.
Centre                                  Spread
one best number to describe the group   how different members of the group are from each other
position                                variation
central tendency                        dispersion
signal                                  noise
Centre
Median and mean are measures of central tendency, or average. If you needed
one number to describe the whole group, this would be it. Centre describes
position: how far along the scale the group sits.
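As a rough illustration of the two measures of centre, the Python sketch below computes the median and mean of a small set of heights. The heights are made up for illustration (they are not data from this handout), and the sketch assumes Python 3 with its standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical heights (cm) for a small sample; invented for illustration.
heights = [152, 158, 160, 163, 165, 166, 170, 174, 181]

centre_median = median(heights)  # middle value once the list is sorted
centre_mean = mean(heights)      # sum of the values divided by how many there are

print(f"median = {centre_median} cm, mean = {centre_mean:.1f} cm")
# prints: median = 165 cm, mean = 165.4 cm
```

Here the two measures differ by less than half a centimetre; either would serve as the "one best number" for this group.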
Describe what you observe and use the median as confirmation of your
observations. You must demonstrate that you understand what the median
measures in terms of the context (not the formula). Suitable words for
demonstrating that understanding include “on average” and “tend to”.
Spread
As well as describing the centre or position of our sample, we want to
describe its variability: how different the values are from each other, or
from the centre. We need a measure of the variability of the whole sample.
The IQR (interquartile range) is a measure of variation, or spread, for a
group; it describes how different the values are from each other.
The discussion of spread should be separate from the discussion of centre,
and should not refer to position along the scale. Range is not a useful
measure of spread, because it is determined only by the two most extreme values.
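To see why range is a poor measure of spread, the sketch below compares the range and the IQR of the same hypothetical heights before and after the tallest value is replaced by one unusually tall student. It assumes Python's `statistics.quantiles` with `method="inclusive"` for the quartiles; calculators and other software use slightly different quartile rules, so their IQRs can differ a little for the same data.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range, using Python's 'inclusive' quartile rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def value_range(values):
    """Range: depends only on the smallest and largest values."""
    return max(values) - min(values)

# Hypothetical heights (cm); the second list swaps the tallest
# student for one unusually tall student.
heights = [152, 158, 160, 163, 165, 166, 170, 174, 181]
with_extreme = [152, 158, 160, 163, 165, 166, 170, 174, 199]

for label, data in [("original", heights), ("one extreme value", with_extreme)]:
    print(f"{label}: range = {value_range(data)} cm, IQR = {iqr(data)} cm")
# prints:
# original: range = 29 cm, IQR = 10.0 cm
# one extreme value: range = 47 cm, IQR = 10.0 cm
```

Changing a single extreme value moves the range from 29 cm to 47 cm, while the IQR stays at 10 cm, because the quartiles do not depend on the most extreme values.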
Describe what you observe and use the IQR as confirmation of your
observations. Give the IQR with units in context and demonstrate that you
understand what the IQR measures in terms of the context (not the formula,
so not “width of middle 50%”). The concept being described is the variability
of the whole sample or population. Large values of IQR indicate a lot of
variability in the sample or population. What “large” means depends on the
context. In manufacturing, variability needs to be small. Some natural
populations are very variable. Consider whether the variation described by
the IQR is large for the context you are investigating.
Note that a measure of variability describes the whole group, so we say
that “there is more variation in the heights of males than in the heights of
females in my sample”. We don’t say “tends to” or “on average” when we are
talking about variation.
Shift and Overlap
Shift and overlap are comparisons of the centre relative to spread for two
samples, answering the questions “Which one is bigger?” and “How much
bigger, relative to the variation in each sample?”
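Shift and overlap can be checked numerically as well as by eye. The sketch below, using two made-up samples of heights, reports how far apart the medians are and how much the two middle-50% intervals (lower quartile to upper quartile) overlap, alongside each IQR so that the shift can be judged relative to the spread. The function names are hypothetical, and the quartiles follow Python's `statistics.quantiles` with `method="inclusive"`.

```python
from statistics import median, quantiles

def middle_50(values):
    """Return (lower quartile, upper quartile) using Python's 'inclusive' rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3

def shift_and_overlap(a, b):
    """Return the shift of a's median above b's, the length of the shared part
    of their middle-50% intervals, and the IQR of each sample."""
    a_lo, a_hi = middle_50(a)
    b_lo, b_hi = middle_50(b)
    shift = median(a) - median(b)
    # Shared length of the two middle-50% intervals; 0 if they do not overlap.
    overlap = max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
    return shift, overlap, a_hi - a_lo, b_hi - b_lo

# Hypothetical heights (cm); not the data behind the examples in this handout.
boys = [155, 160, 163, 165, 167, 169, 171, 175, 180]
girls = [150, 155, 158, 162, 165, 166, 168, 170, 176]

shift, overlap, iqr_boys, iqr_girls = shift_and_overlap(boys, girls)
print(f"shift of medians: {shift} cm")                         # 2 cm
print(f"overlap of middle 50%: {overlap} cm")                  # 5.0 cm
print(f"IQR boys: {iqr_boys} cm, IQR girls: {iqr_girls} cm")   # 8.0 cm and 10.0 cm
```

A shift of 2 cm is small compared with IQRs of 8 cm and 10 cm, and the middle-50% intervals share 5 cm, so these made-up samples would be described as showing a small shift with a lot of overlap.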
Think, describe what you see and relate it to the real world. You will not get
credit for general sentences which do not relate to the context and could be
taken from the table of statistics without understanding. For example, it is not
acceptable to say that “the median for females is 165 cm, which is about 2 cm
less than the median for males at 167 cm”, unless it is followed by further
interpretation. Show understanding of the context (eg “shorter” or “taller”
showing an understanding that you are discussing height) and understanding
of the concept of average (eg “tend to”).
Example 1:
In my sample I notice that the year 9 boys tend to be taller than the year 9
girls. The middle 50% of the boys’ heights is shifted further up the scale than
the middle 50% of the girls’ heights. There is quite a lot of overlap between
the middle 50% of boys’ and girls’ heights, indicating that there are a lot of
boys and girls in my sample who are quite similar in height. The median height
for boys in my sample was 167 cm, while the median height for girls was 165 cm.
This confirms that the boys in my sample tend to be about 2 cm taller than the girls.
This makes sense because lots of year 9 boys I know are taller than the year
9 girls I know.
In my sample I notice that the heights of year 9 boys have a similar spread to
the heights of year 9 girls. This is confirmed by the IQR of the girls’ heights
in my sample, 12 cm, which shows a reasonable amount of variation. The IQR of
the boys’ heights is 9.4 cm, only slightly smaller. This means that there is less
variation in the heights of boys than girls in my sample. This makes sense
because there are more short girls than there are short boys, but there are tall
students of both sexes in my sample. I wonder whether the difference in spread
between girls’ and boys’ heights in my sample is due to sampling variability.
Example 2:
In my sample I notice that the males tend to have driven at a higher maximum
speed than the females. The middle 50% of the males’ maximum speeds is shifted
towards higher speeds than that of the females, and the middle 50% of males
and females only partly overlap. The median maximum speed driven by males is
120 km/hr, while the median maximum speed driven by females is 15 km/hr lower,
at 105 km/hr, which confirms my observations.
In my sample I notice that the males seem to be more variable in the
maximum speed driven than the females, with more males driving at
higher speeds. The IQR of maximum speed driven for females in my sample
is 77.5 km/hr, which is a huge amount of variation in speed. For males the
IQR is 40 km/hr, which is quite a lot less. This means that the males in my
sample are more similar to each other in the maximum speed driven than the
females are. At first this does not seem to match my initial observations,
but it makes sense when I look more closely. The IQR is higher for females
because the female distribution is bimodal: many females have never driven
(a maximum speed of zero), while most of the others are clustered between
90 km/hr and 120 km/hr. This makes the spread, as measured by the IQR,
larger for females than for males.
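The effect described in Example 2, where a cluster of zeros stretches the IQR, can be reproduced with made-up numbers. In the sketch below, three hypothetical females have never driven (maximum speed 0 km/hr) and the rest are clustered between 90 and 120 km/hr. The data are invented, and the quartiles follow Python's `statistics.quantiles` with `method="inclusive"`, so the values will not match the IQRs quoted in the example.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range, using Python's 'inclusive' quartile rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical maximum speeds (km/hr); 0 means "has never driven".
female_speeds = [0, 0, 0, 90, 95, 100, 105, 110, 120]
male_speeds = [100, 110, 115, 118, 120, 122, 125, 130, 140]

# Keep only the females who have driven, to see the spread within that cluster.
drivers_only = [s for s in female_speeds if s > 0]

print("female IQR (all):", iqr(female_speeds))          # 105.0
print("female IQR (drivers only):", iqr(drivers_only))  # 12.5
print("male IQR:", iqr(male_speeds))                    # 10.0
```

Removing the non-drivers shrinks the female IQR from 105 km/hr to 12.5 km/hr, close to the male IQR, which suggests that in these made-up data the large female IQR reflects the two clusters rather than wide variation among the drivers.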