How to describe a distribution

Lesson 1-2
Describing Distributions with Numbers
How to describe a distribution
Make sure you consider all of the following points:
1. What is the approximate center of the distribution?
2. Do any unusual features stick out? Does it have outliers?
3. Is the distribution symmetric or is it skewed? Does the distribution have a single, central mode
(unimodal) or does it have several modes (bimodal or multimodal)? Is it uniform?
4. How spread out is the distribution? Remember, the range is a single value: maximum – minimum.
It might be helpful to use CUSS to remember all the important points: Center, Unusual points, Shape,
Spread. If you are asked to compare two graphs, discuss both similarities and differences. Be sure to include
comparative words such as greater, less, more, half, etc.
Measures of Center
The mean is the most common measure of central tendency. If n observations are denoted by
$x_1, x_2, x_3, \ldots, x_n$, their mean is:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
A more common notation is: $\bar{x} = \frac{1}{n} \sum x_i$
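As a quick illustration of the formula for the mean, here is a minimal Python sketch. The function name `mean` and the data values are made up for this example.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the observations divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / n


scores = [2, 4, 4, 5, 10]   # hypothetical observations
x_bar = mean(scores)        # (2 + 4 + 4 + 5 + 10) / 5
print(x_bar)                # 5.0
```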
Another measure of center is the middle number in a dataset, the median. To find the median of a
distribution:
1. Arrange all observations in order of size, from the smallest to the largest.
2. If the number of observations, n, is odd, the median M is the center observation in the ordered list.
3. If the number of observations, n, is even, the median M is the mean of the two center observations in the
ordered list.
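The three steps above can be sketched in Python as follows. The function name `median` and the sample values are hypothetical.

```python
def median(values):
    """Return the median M by following the three steps for finding it."""
    ordered = sorted(values)          # step 1: smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:                    # step 2: n is odd, take the center value
        return ordered[mid]
    # step 3: n is even, take the mean of the two center values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 5]))   # 4.0
```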
Unusual Points and Spread
The distance between the first and third quartiles is called the interquartile range (IQR): IQR = Q3 − Q1.
It measures the spread of the middle half of the data. The interquartile range is a single value, not a
place on a graph.
For most purposes, an outlier is any value that falls more than 1.5 × IQR below Q1 or more than
1.5 × IQR above Q3.
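The 1.5 × IQR rule can be sketched in Python. The quartile convention used here is an assumption: Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd. Textbooks and software use several slightly different conventions, so results may differ a little. The function names and the data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: when n is odd, the overall median is left out
    of both halves. Needs at least two observations.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(upper)


def find_outliers(values):
    """Return the values more than 1.5 * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


data = [1, 3, 4, 5, 5, 6, 7, 8, 30]   # hypothetical observations
print(quartiles(data))       # (3.5, 7.5), so IQR = 4.0
print(find_outliers(data))   # [30], since 30 > 7.5 + 1.5 * 4.0 = 13.5
```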
The five-number summary consists of the smallest observation, the first quartile, the median, the third
quartile, and the largest observation.
Minimum    Q1    M    Q3    Maximum
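A five-number summary might be computed as in the sketch below. It assumes the quartiles are the medians of the lower and upper halves of the sorted data (one common textbook convention; others exist), and the function name and data are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves, with the overall median left out of both when n is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


heights = [25, 10, 18, 12, 20, 15]    # hypothetical observations
print(five_number_summary(heights))   # (10, 12, 16.5, 20, 25)
```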
The term percentile refers to the percent of data points which fall at or below an individual data point.
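Using the "at or below" definition given here, the percentile of an individual data point could be computed as in this sketch. The function name `percentile_of` and the scores are hypothetical.

```python
def percentile_of(value, values):
    """Return the percent of data points that fall at or below `value`."""
    if not values:
        raise ValueError("percentile is undefined for an empty dataset")
    at_or_below = sum(1 for v in values if v <= value)
    return 100 * at_or_below / len(values)


scores = [55, 60, 65, 70, 70, 80, 85, 90, 95, 100]   # hypothetical test scores
print(percentile_of(80, scores))   # 60.0, because 6 of the 10 scores are <= 80
```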
The variance ($s^2$) and the standard deviation ($s$) measure how far the “typical” observations are from the
mean. The variance is approximately the average squared deviation from the mean (the sum of squared
deviations is divided by n − 1 rather than n). In order to retain the original units of measure,
statisticians use the square root of the variance, the standard deviation.
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$
Or, in compact notation: $s^2 = \frac{1}{n - 1} \sum (x_i - \bar{x})^2$
The standard deviation is the square root of the variance: $s = \sqrt{\frac{1}{n - 1} \sum (x_i - \bar{x})^2}$
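To make the two formulas concrete, here is a minimal Python sketch that divides the sum of squared deviations by n − 1. The function names and the data are hypothetical.

```python
import math


def sample_variance(values):
    """Return s^2: the sum of squared deviations from the mean, over n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Return s, the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 5, 10]        # hypothetical observations, mean = 5
# Squared deviations: 9, 1, 1, 0, 25, which sum to 36; 36 / (5 - 1) = 9
print(sample_variance(data))   # 9.0
print(sample_std(data))        # 3.0
```

Note that the standard deviation (3.0) is in the same units as the data, while the variance (9.0) is in squared units.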
Shape
A distribution is symmetric if the right and left sides of the histogram are roughly mirror images of each
other.
A distribution is skewed if one side stretches out much farther than the other. The distribution is skewed to
the left if the left side extends out farther and it is skewed to the right if the right side extends out much
farther.
Mean vs. Median
The mean is nonresistant because it is sensitive to the influence of extreme observations, which may or may
not be outliers. Even with no outliers, a skewed distribution pulls the mean in the direction of the tail.
The median is a resistant measure of center because it is not sensitive to a few extreme observations.
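A small numerical sketch shows the difference. The salary figures below are invented for illustration; replacing the largest value with an extreme one moves the mean a long way but leaves the median where it was.

```python
from statistics import mean, median

salaries = [30, 32, 35, 38, 40]        # hypothetical salaries, in thousands
with_extreme = [30, 32, 35, 38, 400]   # same data, largest value replaced

print(mean(salaries), median(salaries))           # 35 35
print(mean(with_extreme), median(with_extreme))   # 107 35
```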
When to Use Standard Deviation
Variance and standard deviation are used to describe the spread when the mean is used to describe the
center. Since both the mean and standard deviation are influenced by outliers or skewed distributions, they
should only be used for reasonably symmetric distributions.
What Can Go Wrong?
• Don’t forget to sort the values before you find the median, quartiles, or percentiles.
• Don’t compute numerical summaries for categorical data.
• Beware of outliers.
• Make a picture as your first step. Don’t use the mean and standard deviation until you know it is
reasonable to do so.
• Don’t forget to check the reasonableness of your answers.
Homework: (Distributions, Graphs, and Numerical Summaries, oh my!)
Page 42: 37, 39, 40, 44, 45, 46, 49, 51, 52, 64, 65, 69-74
Page 70: 79-81, 83-92, 94, 96, 97, 98, 103, 104, 107-110