Chapter 2
Turning Data
Into
Information
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc.
Some key words:
• sample vs. population
• categorical (ordinal?) vs. quantitative
• explanatory vs. response
• outlier
• mean vs. median
• standard deviation vs. range vs. interquartile range (IQR)
• histogram vs. boxplot
• shape?
Symmetric: mean = median
Skewed Left: mean < median
Skewed Right: mean > median
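A minimal sketch of these three relationships, using Python's standard `statistics` module. The three small data sets are invented purely for illustration and do not come from the chapter; with real data the relationships are typical rather than guaranteed.

```python
from statistics import mean, median

# Hypothetical data sets, made up for illustration only.
symmetric = [1, 2, 3, 4, 5]
skewed_right = [1, 2, 3, 4, 20]   # one large value stretches the right tail
skewed_left = [-14, 2, 3, 4, 5]   # one small value stretches the left tail

for name, data in [("symmetric", symmetric),
                   ("skewed right", skewed_right),
                   ("skewed left", skewed_left)]:
    print(f"{name}: mean = {mean(data)}, median = {median(data)}")
```

The median stays at 3 in all three sets, while the mean is pulled toward the long tail.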
2.7 Bell-Shaped Distributions
of Numbers
Many measurements follow a predictable pattern:
• Most individuals are clumped around the center
• The greater the distance a value is from the
center, the fewer individuals have that value.
Variables that follow such a pattern are said
to be “bell-shaped”. A special case is called
a normal distribution or normal curve.
Example 2.11 Bell-Shaped
British Women’s Heights
Data: representative sample of 199 married British couples.
The figure shows a histogram of the wives’ heights with a normal
curve superimposed. The mean height = 1602 millimeters.
Describing Spread
with Standard Deviation
Standard deviation measures variability
by summarizing how far individual
data values are from the mean.
Think of the standard deviation as
roughly the average distance
values fall from the mean.
Calculating the Standard Deviation
Formula for the (sample) standard deviation:
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
The value of $s^2$ is called the (sample) variance.
An equivalent formula, easier to compute, is:
$$s = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n - 1}}$$
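As a rough check that the definitional and computational forms of the sample standard deviation agree, the sketch below implements both and compares them with `statistics.stdev` from the Python standard library. The function names and the data values are hypothetical, chosen only for illustration.

```python
import math
from statistics import stdev

def sample_sd_definition(xs):
    """Square root of the sum of squared deviations from the mean, over n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

def sample_sd_shortcut(xs):
    """Equivalent form: (sum of squares minus n times the squared mean), over n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt((sum(x * x for x in xs) - n * xbar ** 2) / (n - 1))

# Hypothetical data, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_sd_definition(data))
print(sample_sd_shortcut(data))
print(stdev(data))  # library result for comparison
```

The shortcut form is convenient by hand, but in floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definitional form is usually the safer choice in code.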
Population Standard Deviation
Data sets usually represent a sample from a larger
population. If the data set includes measurements for
an entire population, the notations for the mean and
standard deviation are different, and the formula for
the standard deviation is also slightly different.
A population mean is represented by the symbol $\mu$ (“mu”), and the
population standard deviation is
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{n}}$$
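The only change from the sample formula is the divisor: n instead of n - 1. A minimal sketch, assuming the data values below (hypothetical, invented for illustration) make up an entire population; `statistics.pstdev` is the standard-library counterpart.

```python
import math
from statistics import pstdev, stdev

def population_sd(xs):
    """Square root of the mean squared deviation from the population mean (divide by n)."""
    n = len(xs)
    mu = sum(xs) / n
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / n)

# Hypothetical data, treated here as a complete population.
population = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd(population))  # divides by n
print(pstdev(population))         # library result for comparison
print(stdev(population))          # sample formula (n - 1) gives a slightly larger value
```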
Interpreting the Standard Deviation
for Bell-Shaped Curves:
The Empirical Rule
For any bell-shaped curve, approximately
• 68% of the values fall within 1 standard
deviation of the mean in either direction
• 95% of the values fall within 2 standard
deviations of the mean in either direction
• 99.7% of the values fall within 3 standard
deviations of the mean in either direction
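For the special case of an exactly normal curve, these percentages can be computed rather than memorized. The sketch below uses the standard identity that the proportion of a normal distribution within k standard deviations of the mean is erf(k / sqrt(2)); the function name `share_within` is hypothetical. For data that are only roughly bell-shaped, the figures are approximations.

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```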
The Empirical Rule, the Standard
Deviation, and the Range
• Empirical Rule => the range from the
minimum to the maximum data values equals
about 4 to 6 standard deviations for data with
an approximate bell shape.
• You can get a rough idea of the value of the
standard deviation by dividing the range by 6.
$$s \approx \frac{\text{Range}}{6}$$
Example 2.11 Women’s Heights (cont)
Mean height for the 199 British women is 1602 mm
and standard deviation is 62.4 mm.
• 68% of the 199 heights would fall in the range
1602 ± 62.4, or 1539.6 to 1664.4 mm
• 95% of the heights would fall in the interval
1602 ± 2(62.4), or 1477.2 to 1726.8 mm
• 99.7% of the heights would fall in the interval
1602 ± 3(62.4), or 1414.8 to 1789.2 mm
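The three intervals above are simply the mean plus or minus 1, 2, and 3 standard deviations. A short sketch that reproduces them from the slide's mean and standard deviation; the helper name `empirical_interval` is hypothetical.

```python
def empirical_interval(mean, sd, k):
    """Return (low, high) for mean plus or minus k standard deviations."""
    return mean - k * sd, mean + k * sd

mean_height = 1602.0  # mm, from the example
sd_height = 62.4      # mm, from the example

for k, pct in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = empirical_interval(mean_height, sd_height, k)
    print(f"about {pct}: {low:.1f} to {high:.1f} mm")
```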
Example 2.11 Women’s Heights (cont)
Summary of the actual results:
Note: The minimum height = 1410 mm and the maximum
height = 1760 mm, for a range of 1760 – 1410 = 350 mm.
So an estimate of the standard deviation is:
$$s \approx \frac{\text{Range}}{6} = \frac{350}{6} \approx 58.3 \text{ mm}$$
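The same arithmetic as a small sketch. The function name `sd_from_range` and its `divisor` parameter are hypothetical. Dividing by 6 follows the slide; since the range is described as about 4 to 6 standard deviations, dividing by 4 gives the other end of the rough estimate.

```python
def sd_from_range(minimum, maximum, divisor=6):
    """Rough standard deviation estimate for bell-shaped data: range / divisor."""
    return (maximum - minimum) / divisor

min_height = 1410  # mm, from the example
max_height = 1760  # mm, from the example

print(round(sd_from_range(min_height, max_height), 1))             # range / 6
print(round(sd_from_range(min_height, max_height, divisor=4), 1))  # range / 4
```

The actual standard deviation of 62.4 mm falls between the two estimates, closer to range / 6.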
Standardized z-Scores
Standardized score or z-score:
$$z = \frac{\text{Observed value} - \text{Mean}}{\text{Standard deviation}}$$
Example: Mean resting pulse rate for adult men is 70
beats per minute (bpm), standard deviation is 8 bpm.
The standardized score for a resting pulse rate of 80:
$$z = \frac{80 - 70}{8} = 1.25$$
A pulse rate of 80 is 1.25 standard deviations
above the mean pulse rate for adult men.
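The pulse-rate calculation can be written as a one-line function. This is a minimal sketch; the name `z_score` is chosen here for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Resting pulse example: mean 70 bpm, standard deviation 8 bpm.
print(z_score(80, 70, 8))  # 1.25
```

A negative result would indicate a value below the mean; for instance, a pulse of 62 bpm is one standard deviation below it.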
The Empirical Rule Restated
For bell-shaped data,
• About 68% of the values have
z-scores between –1 and +1.
• About 95% of the values have
z-scores between –2 and +2.
• About 99.7% of the values have
z-scores between –3 and +3.
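To connect the restated rule with the women's heights example, the sketch below converts the reported minimum and maximum heights to z-scores and finds the narrowest band (±1, ±2, or ±3) that contains each. The helper names are hypothetical; the mean, standard deviation, minimum, and maximum are the values given in Example 2.11.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

def empirical_band(z):
    """Smallest k in (1, 2, 3) with |z| <= k, or None if z lies beyond 3."""
    for k in (1, 2, 3):
        if abs(z) <= k:
            return k
    return None

mean_height, sd_height = 1602, 62.4   # mm, from Example 2.11
for label, height in [("minimum", 1410), ("maximum", 1760)]:
    z = z_score(height, mean_height, sd_height)
    print(f"{label}: z = {z:.2f}, band = {empirical_band(z)}")
```

The minimum height lands just outside ±3, which is not surprising: the rule says about 99.7% of values fall within that band, not all of them.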