Chapter 2: Exploring Distributions
The goal of this chapter is to provide you with the basic concepts and tools for exploring distributions of
univariate data (data that come from observing a single variable).
Univariate data can be grouped into two categories:
1. Categorical (qualitative) Variables
The observations specify the group (category) to which an individual belongs.
e.g. the brand of calculator students own, whether a mammal is considered wild or not.
2. Numerical (quantitative) Variables
The observations assign a numerical value or measurement to each individual.
e.g. gestation period of elephants, maximum longevity of smokers.
2.1 Visualizing Distributions: Shape, Center, and Spread
This first section simply introduces various distribution shapes and asks you to estimate some summary
values visually. Later we will compute summary values numerically.
The AP Statistics Exam frequently asks you to describe or compare distributions. Although it may not
mention each of these specifically, when discussing numerical variables you must always include three
things (categorical variables will be covered later):
1. Shape
The common shapes of a distribution are:
• Uniform
• Normal
• Skewed (left or right)
• Symmetric
• Bimodal
If a distribution does not look like any of these, it is said to be “indistinct”.
Any other features – such as outliers, clusters or gaps – should also be mentioned when
discussing the shape.
2. Center
The two common measures of center are the:
• Mean (average value)
• Median (middle value)
The mode, or most common value, is not very useful except when talking about bimodal
distributions.
3. Spread
The spread measures the variability in the values. The two common measures of spread are:
• Interquartile range
• Standard deviation
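As a rough illustration of the center measures and the standard deviation named above, here is a minimal Python sketch using the standard library's statistics module. The data values are hypothetical, invented only for this example, and the sample (n - 1) form of the standard deviation is an assumption, since the notes have not yet said which form they use.

```python
import statistics

# Hypothetical data, invented for illustration only.
values = [2, 3, 3, 4, 5, 6, 7, 8, 16]

mean = statistics.mean(values)      # average value
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most common value
sd = statistics.stdev(values)       # sample standard deviation (divides by n - 1)

print(f"mean = {mean}, median = {median}, mode = {mode}, SD = {sd:.2f}")
```

In this made-up data set the single large value 16 pulls the mean (6) above the median (5), which is one reason the two measures of center can disagree.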
Distributions come in a variety of shapes, and the following is a summary of the four most common.
Uniform (Rectangular) Distributions
In a uniform distribution all values occur equally often. The shape is rectangular.
e.g. The last digit of social security numbers is fairly uniform. To summarize this distribution, you might
write “The distribution of the last digits of 50 social security numbers is approximately uniform, with
about 5 out of 50 (10%) per digit.”
[Figure: dot plot “Social Security Digits”; horizontal axis SS_Last_Digit, 0 to 10]
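To see what “about 5 out of 50 per digit” can look like in practice, the sketch below counts simulated last digits and prints a rough text dot plot. The digits are generated at random as a stand-in for real data (an assumption of this example, not the data behind the figure), so individual counts will wander around 5 rather than equal it exactly.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is repeatable

# Simulated stand-in for the last digits of 50 social security numbers.
digits = [random.randrange(10) for _ in range(50)]

counts = Counter(digits)

# Text dot plot: one dot per observation, plus the share of the sample.
for digit in range(10):
    n = counts[digit]
    print(f"{digit}: {'.' * n:<12} {n}/50 = {n / 50:.0%}")
```

A sample of only 50 values rarely looks perfectly rectangular, which is why the summary sentence says “approximately uniform”.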
Normal Distributions
This is one of the most important shapes, but be careful about applying the term normal: not every
distribution that is bell-shaped and symmetric is normal. In a normal distribution the shape is symmetric.
There is a single peak at the line of symmetry, and the curve drops off smoothly on both sides, flattening
toward the x-axis but never quite reaching it and stretching infinitely far in both directions. The mean,
median and mode are all at the line of symmetry.
Inflection point: where the curve changes from concave up to concave down.
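For readers who want the formula behind this description, the following is a short sketch of the normal density and the location of its inflection points. The symbols \mu (the mean, at the line of symmetry) and \sigma (the standard deviation) are notation introduced here and do not appear in the notes above.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},
\qquad
f''(x) = \frac{f(x)}{\sigma^2}\left(\frac{(x-\mu)^2}{\sigma^2} - 1\right)

f''(x) = 0 \iff x = \mu \pm \sigma
```

Since f(x) is always positive, the curve is concave down between \mu - \sigma and \mu + \sigma and concave up outside that interval, so the two inflection points sit one standard deviation on either side of the mean.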
Skewed Distributions
Distributions that are bunched at one end and have a long “tail” at the other end are called
skewed. The direction of the tail tells whether the distribution is:
• Skewed right – tail stretches right
• Skewed left – tail stretches left
e.g. The distribution of GPAs of 62 students is skewed left.
[Figure: dot plot “Grade-Point Averages of 62 students” (each dot represents two points); horizontal axis GPA, 0.5 to 4.0]
Typically the median is used to describe the center of a skewed distribution. To estimate the median
from a dot plot, locate the value that divides the dots into two equal halves. Use the interquartile range to
indicate the spread.
Interquartile range = upper quartile – lower quartile
Upper quartile: the value that divides the upper half into two halves.
Lower quartile: the value that divides the lower half into two halves.
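The quartile definitions above can be sketched in Python as follows. The function name quartile_summary and the data are hypothetical, and the sketch assumes one common convention: when the number of values is odd, the median itself is left out of both halves. Calculators and software packages may use slightly different conventions and give slightly different quartiles.

```python
import statistics

def quartile_summary(data):
    """Return (lower quartile, median, upper quartile, IQR) by the halves method."""
    xs = sorted(data)
    if len(xs) < 2:
        raise ValueError("need at least two values")
    half = len(xs) // 2
    lower_half = xs[:half]            # values below the middle position
    upper_half = xs[len(xs) - half:]  # values above the middle position
    q1 = statistics.median(lower_half)
    q3 = statistics.median(upper_half)
    return q1, statistics.median(xs), q3, q3 - q1

# Hypothetical data, invented for illustration only.
values = [2, 3, 3, 4, 5, 6, 7, 8, 16]
q1, med, q3, iqr = quartile_summary(values)
print(f"Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {iqr}")
```

For this made-up data the lower half is 2, 3, 3, 4 and the upper half is 6, 7, 8, 16, so the quartiles are 3 and 7.5 and the interquartile range is 4.5.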
Bimodal Distributions
A distribution that has two peaks is bimodal. The peaks should be distinct and separated by some
distance before you call the distribution bimodal. It doesn’t make sense to discuss the center of the whole
distribution, but you could talk about the location of the two peaks and give a reason for the two
modes.
Other Features: Outliers, Gaps, and Clusters
An outlier is a value that stands apart from the bulk of the data (later we will look at a rule to help
identify an outlier).
There is no formal definition of a gap or a cluster – just use common sense.
e.g. Lord Rayleigh noticed two clusters separated by a gap when he measured the density of nitrogen in
samples collected from the atmosphere (the cluster on the right) and from a chemical procedure (the
cluster on the left). This led him to discover inert gases in the atmosphere.
[Figure: dot plot “Rayleigh's Nitrogen Data”; horizontal axis Density, 2.298 to 2.312]
Example:
Describe the distribution below in terms of shape, center, and spread.
[Figure: dot plot “Los Angeles Rainfall 1899-1999”; horizontal axis Rainfall (inches), 0 to 35]
Solution:
Shape:
Center:
Spread:
Other Features: