2.3 Measures of Center and Spread

In this section you will learn to compute exact values of those same summary statistics you
estimated _______________ in the previous sections.
Measures of Center
The Mean of a sample:
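$\bar{x}$ (“x-bar”) – the “average value”

$$\bar{x} = \frac{\text{sum of all the } x \text{ values}}{\text{number of values}} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

From a frequency table (discrete data):
$$\bar{x} = \frac{\underline{\hspace{3cm}}}{n}$$
From a frequency table (continuous data):
$$\bar{x} = \frac{\underline{\hspace{3cm}}}{n}$$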
The mean can be estimated visually on a dot plot or histogram by finding the __________ ________
of the distribution (where you would have to place a finger below the horizontal axis in order to
balance the distribution).
The Median: M – the “_________ value”
1. Arrange all the values in order.
2. If the number of values is odd, the median is the middle one. The middle value is in
position $\frac{n+1}{2}$.
3. If the number of values is even, the median is the average of the _____ ________ _________.
Visually, the median is the value on the horizontal axis that separates the histogram into two parts,
with 50% of the area under each part of the curve.
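The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the small data set are made up for the sketch, and the frequency-table version assumes the usual approach of weighting each value by its frequency.

```python
from collections import Counter


def mean(values):
    """Sample mean: the sum of all the x values divided by the number of values."""
    return sum(values) / len(values)


def mean_from_table(freq):
    """Mean from a discrete frequency table given as {value: frequency}.

    Each value is weighted by how often it occurs; n is the total frequency.
    """
    n = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / n


def median(values):
    """Median by the position rule: order the values, then take the middle."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the value in position (n + 1) / 2, counting from 1.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the values in positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


sample = [7, 3, 5, 3, 2]        # hypothetical data, not from these notes
table = Counter(sample)         # frequency table built from the same values

print(mean(sample))             # 4.0
print(mean_from_table(table))   # 4.0
print(median(sample))           # 3
print(median([4, 1, 3, 2]))     # 2.5
```

The raw-data mean and the frequency-table mean agree because the table is just a compressed record of the same values.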
Mean v. Median
The mean and median of a _______________ distribution are close together. If the distribution is
exactly symmetric, the mean and median are ____________.
In a skewed distribution, the values in the “tail” pull the mean up or down, so the mean generally
lies farther out in the tail than the median. In fact, the mean can be very sensitive to the presence of
even a single outlier, which can make it suspect as a measure of center.
Example: 40 students were enrolled in a course at Cal Poly. One month after the course began the
instructor requested a report that indicated how many times each student had accessed a web page
on the class site. The 40 observations were:
(values are in increasing order, reading down each column)
0    7    16   37
0    7    18   42
0    8    19   84
0    8    19   331
0    8    20
0    12   20
3    12   21
4    13   22
4    13   23
4    13   26
5    14   36
5    14   36
a) Compute the values of the mean and the median of this data set.
b) Of the mean and median, which does the best job of describing a typical value for this data
set? Explain.
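One way to check the arithmetic for part (a), and to see how strongly a single value can move the mean, is a short script using Python's standard `statistics` module. The variable names are illustrative, and dropping the value 331 is only an experiment for this sketch, not a step the example asks for.

```python
import statistics

# The 40 observations from the example, in increasing order.
visits = [
    0, 0, 0, 0, 0, 0, 3, 4, 4, 4, 5, 5,
    7, 7, 8, 8, 8, 12, 12, 13, 13, 13, 14, 14,
    16, 18, 19, 19, 20, 20, 21, 22, 23, 26, 36, 36,
    37, 42, 84, 331,
]

mean_all = statistics.mean(visits)
median_all = statistics.median(visits)
print(f"all 40 values: mean = {mean_all:.2f}, median = {median_all}")

# Experiment: remove the single largest value and recompute both measures.
without_largest = [v for v in visits if v != 331]
mean_trimmed = statistics.mean(without_largest)
median_trimmed = statistics.median(without_largest)
print(f"without 331:   mean = {mean_trimmed:.2f}, median = {median_trimmed}")
```

Comparing the two printed lines shows which measure of center is more affected when the extreme value is removed.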
Measures of Spread
Two AP Statistics classes took the same test. Here are their results:
Class 1: 78, 78, 78, 78, 78, 78, 78, 78, 78, 78
Class 2: 60, 64, 66, 74, 77, 79, 84, 90, 92, 94
What are the median and mean scores for both classes?
Mean Class 1:
Median Class 1:
Mean Class 2:
Median Class 2:
Can you conclude that the classes performed in the same way given only these measures of their
centers?
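The four blanks above can be checked with a brief sketch using the standard `statistics` module; the dictionary layout and names are just one convenient arrangement.

```python
import statistics

classes = {
    "Class 1": [78, 78, 78, 78, 78, 78, 78, 78, 78, 78],
    "Class 2": [60, 64, 66, 74, 77, 79, 84, 90, 92, 94],
}

# Compute both measures of center for each class.
centers = {
    name: (statistics.mean(scores), statistics.median(scores))
    for name, scores in classes.items()
}

for name, (mean_score, median_score) in centers.items():
    print(f"{name}: mean = {mean_score}, median = {median_score}")
```

Both classes produce identical centers even though the individual scores look very different, which is the motivation for measuring spread.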
The Range
The range is the simplest measure of variability. It is defined as:
range = largest value – smallest value
The Interquartile Range: IQR
The IQR is a measure of variability that is resistant to the effects of outliers. It is based on ___________.
1. Arrange the values in increasing order.
2. Find the median (Q2).
3. Find the quartiles:
lower quartile (Q1) = median of the lower half
upper quartile (Q3) = median of the upper half
If there is an odd number of values, the median is excluded from both halves when finding
the quartiles.
Note: there is no standard rule for finding the quartiles, so different statistical
software packages use different procedures that can give slightly different values.
4. Calculate the IQR:
interquartile range = upper quartile – lower quartile
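The steps above can be sketched as follows, applied to the two classes from the start of this section. The helper name `quartiles` is made up for the sketch, and it follows the median-of-halves rule from these notes, so library routines that use a different quartile rule may give slightly different values.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the median itself so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )


class_1 = [78] * 10
class_2 = [60, 64, 66, 74, 77, 79, 84, 90, 92, 94]

for name, scores in (("Class 1", class_1), ("Class 2", class_2)):
    q1, med, q3 = quartiles(scores)
    value_range = max(scores) - min(scores)
    print(f"{name}: range = {value_range}, Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```

Unlike the mean and median, the range and IQR clearly separate the two classes.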
Five-Number Summaries
The following collection of summary measures is often referred to as the five-number summary.
1. Minimum – the smallest value.
2. Lower quartile – the median of the lower half of the ordered values.
3. Median – the value that divides the ordered values into halves.
4. Upper quartile – the median of the upper half of the ordered values.
5. Maximum – the largest value.
These values give a reasonably complete description of center and spread. They also lead to another
visual representation of a distribution, the boxplot.
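As a rough sketch, the five-number summary can be assembled from the quartile rule used in these notes. The function name `five_number_summary` is hypothetical, and the two score lists are the classes from earlier in the section.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using median-of-halves quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Exclude the median from both halves when n is odd.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )


class_1 = [78] * 10
class_2 = [60, 64, 66, 74, 77, 79, 84, 90, 92, 94]

print(five_number_summary(class_1))
print(five_number_summary(class_2))
```

The middle number is the same for both classes, while the other four show how much more spread out Class 2 is.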
Boxplots
A boxplot is a compact display that provides information about the center, spread, and symmetry or
__________ of the data. There are two types of boxplots: the regular boxplot and the ____________
boxplot (the latter is the one we typically use).
Regular Boxplot
Modified Boxplot
Like the regular boxplot except the whiskers
only extend to the largest and smallest
non-outliers. Any outliers appear as
individual dots or other symbols.
An outlier is a value that is more than 1.5 times the IQR from the nearest quartile. Once again, other
methods of identifying outliers exist, but the 1.5 ∙ IQR method is the most common.
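A minimal sketch of the 1.5 ∙ IQR rule, applied to the 40 web-page access counts from the earlier example. The function name `iqr_fences` is made up for the sketch, and the quartiles follow the median-of-halves rule from these notes.

```python
import statistics


def iqr_fences(values):
    """Return (Q1, Q3, low fence, high fence) for the 1.5 * IQR rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])
    # Exclude the median from the upper half when n is odd.
    q3 = statistics.median(ordered[half + 1:] if n % 2 == 1 else ordered[half:])
    iqr = q3 - q1
    return q1, q3, q1 - 1.5 * iqr, q3 + 1.5 * iqr


visits = [
    0, 0, 0, 0, 0, 0, 3, 4, 4, 4, 5, 5,
    7, 7, 8, 8, 8, 12, 12, 13, 13, 13, 14, 14,
    16, 18, 19, 19, 20, 20, 21, 22, 23, 26, 36, 36,
    37, 42, 84, 331,
]

q1, q3, low_fence, high_fence = iqr_fences(visits)
outliers = [v for v in visits if v < low_fence or v > high_fence]
non_outliers = [v for v in visits if low_fence <= v <= high_fence]

print(f"Q1 = {q1}, Q3 = {q3}, fences = ({low_fence}, {high_fence})")
print(f"outliers: {outliers}")
# In a modified boxplot the whiskers stop at the extreme non-outliers.
print(f"whiskers run from {min(non_outliers)} to {max(non_outliers)}")
```

The whisker endpoints printed last are the values a modified boxplot would use, with each outlier drawn as a separate symbol.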
The Standard Deviation of a sample: s
The most common measures of variability describe the extent to which the values deviate from the
mean. The deviations from the sample mean are the differences
$(x_1 - \bar{x}),\ (x_2 - \bar{x}),\ (x_3 - \bar{x}),\ \ldots,\ (x_n - \bar{x})$
It is always true that
$\sum (x - \bar{x}) = 0$, so the _____________ deviation cannot be used as a
measure of variability. Instead the deviations are squared to prevent the negative and positive
deviations from “cancelling out” when summed (absolute values would also work, but squares are
much easier to work with).
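For readers who want to see why the deviations always sum to zero, a short derivation follows. It uses only the definition of the sample mean; the index $i$ is added here for clarity and is not part of the notation in these notes.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\ \text{ gives } \ \sum_{i=1}^{n} x_i = n\bar{x}.
```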
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \qquad\qquad s^2 = \text{variance}$$
From a frequency table:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2 \cdot f}{n - 1}}$$
Why divide by n – 1? Essentially the reason is to adjust for working with a ___________. The
population standard deviation is estimated by the sample standard deviation. The variability in a
random sample tends to be less than in the entire population. Thus, you divide by n – 1 rather than n
to ___________ the estimate of the population standard deviation a bit. As the sample size
increases it makes little difference whether you divide by n or by n – 1.
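The two formulas for s, and the effect of dividing by n – 1 instead of n, can be sketched as follows using the Class 2 scores from earlier in the section. The function names are made up for the sketch; `statistics.stdev` from the standard library is used only as a cross-check, since it also divides by n – 1.

```python
import math
import statistics
from collections import Counter


def sample_sd(values):
    """Sample standard deviation: sum of squared deviations over n - 1, then a square root."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def sample_sd_from_table(freq):
    """Same statistic from a frequency table {value: frequency}."""
    n = sum(freq.values())
    x_bar = sum(x * f for x, f in freq.items()) / n
    return math.sqrt(sum((x - x_bar) ** 2 * f for x, f in freq.items()) / (n - 1))


def sd_dividing_by_n(values):
    """For comparison only: the same calculation with n in the denominator."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / n)


class_2 = [60, 64, 66, 74, 77, 79, 84, 90, 92, 94]

s = sample_sd(class_2)
print(f"s (divide by n - 1) = {s:.3f}")
print(f"divide by n instead = {sd_dividing_by_n(class_2):.3f}")
print(f"from a frequency table = {sample_sd_from_table(Counter(class_2)):.3f}")
print(f"statistics.stdev check = {statistics.stdev(class_2):.3f}")
```

Dividing by n gives a slightly smaller result than dividing by n – 1, and the gap shrinks as the sample size grows.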
Example: page 70, P21
Stemplot of average mammal longevities
0 | 1 3 4
∙ | 5 5 5 6 7 7 8 8
1 | 0 0 0 2 2 2 2 2 2 2 2 2
∙ | 5 5 5 5 5 5 5 6
2 | 0 0 0 0
∙ | 5
3 |
∙ | 5
4 | 1
1 | 5 stands for 15 years
(a) Five-number summary:
(b) IQR =
(c) Q1 – 1.5 ∙ IQR =
Low end outliers:
(d) Q3 + 1.5 ∙ IQR =
High end outliers:
(e) Draw a modified boxplot.