1.2 Describing Distributions with Numbers

Age of Presidents at Inauguration

President        Age    President        Age    President        Age
Washington       57     Buchanan         65     Coolidge         51
J. Adams         61     Lincoln          52     Hoover           54
Jefferson        57     A. Johnson       56     F. D. Roosevelt  51
Madison          57     Grant            46     Truman           60
Monroe           58     Hayes            54     Eisenhower       61
J. Q. Adams      57     Garfield         49     Kennedy          43
Jackson          61     Arthur           51     L. Johnson       55
Van Buren        54     Cleveland        47     Nixon            56
W. H. Harrison   68     B. Harrison      55     Ford             61
Tyler            51     Cleveland        55     Carter           52
Polk             49     McKinley         54     Reagan           69
Taylor           64     T. Roosevelt     42     Bush             64
Fillmore         50     Taft             51     Clinton          46
Pierce           48     Wilson           56     Bush             54
                        Harding          55     Obama            47

[Histogram: Ages of Presidents at Inauguration. Horizontal axis: Age at Inauguration, in classes 40-44, 45-49, 50-54, 55-59, 60-64, 65-69. Vertical axis: Number of Presidents, 0 to 14.]

Describe the histogram in terms of center, shape, spread, and outliers.

Mean:
• The most common measure of center (also known as the average)
• Denoted by x̄
• The mean is non-resistant: it is sensitive to extreme values, whether or not those values are outliers.
• On a calculator, use 1-Var Stats to get the mean.

Median:
• The middle value of the ordered set of data
• Denoted by M
• If the number of observations is odd, the median is the center observation.
• If the number of observations is even, the median is the mean of the two center observations.
• The median is resistant to extreme values.
• On a calculator, use 1-Var Stats to get the median.
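
The odd/even rule for the median, and the difference between a resistant and a non-resistant measure, can be sketched in a few lines of Python. This is only an illustration: the function name `median_by_rule` and the small data set are made up for the example and do not come from the slides.

```python
from statistics import mean


def median_by_rule(values):
    """Median via the odd/even rule: sort, then take the center observation
    (odd count) or the mean of the two center observations (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical illustrative data, not taken from the slides.
scores = [2, 4, 5, 7, 9]
with_extreme = [2, 4, 5, 7, 90]  # same data with the largest value replaced

print(mean(scores), median_by_rule(scores))
print(mean(with_extreme), median_by_rule(with_extreme))
```

Replacing a single value moves the mean a long way while the median stays where it was, which is what "non-resistant" and "resistant" mean here.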

Example 1: Find x̄ and M for the set of data
Number of hysterectomies performed by a male doctor in one year
20 25 25 27 28 31 33 34 36 37 44 50 59 85 86
x̄ = 41.3
M = 34

Example 2: Find x̄ and M for the set of data
Number of hysterectomies performed by a female doctor in one year
5 7 10 14 18 19 25 29 31 33
x̄ = 19.1
M = 18.5

Comparison of x̄ and M
• If the distribution is...
◦ Exactly symmetric: x̄ and M are exactly the same
◦ Roughly symmetric: x̄ and M are close in value
◦ Skewed: x̄ is farther out in the long tail than M
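
As a rough check of this comparison, the two hysterectomy data sets from Examples 1 and 2 can be run through Python's standard `statistics` module. The variable names are chosen for this sketch only.

```python
from statistics import mean, median

# Data from Examples 1 and 2.
male = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]
female = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]

for label, data in (("male", male), ("female", female)):
    x_bar = mean(data)
    m = median(data)
    # A large positive gap suggests the mean is being pulled into a right tail.
    print(f"{label}: mean = {x_bar:.1f}, median = {m}, mean - median = {x_bar - m:.1f}")
```

For the male doctors the mean sits about 7 above the median, pulled toward the two large values 85 and 86. For the female doctors the two measures differ by less than 1, as expected for a roughly symmetric set.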

Measuring Spread: Range and the Quartiles
• Range = largest value − smallest value
• Q1, the lower quartile: the median of the observations smaller than the median
• Q2: the median
• Q3, the upper quartile: the median of the observations larger than the median
• IQR, the interquartile range: IQR = Q3 − Q1
• Outliers fall more than 1.5 × IQR below Q1 or above Q3
** 1-Var Stats on your calculator gives you all of these values.

5-Number Summary
• The 5-number summary consists of the smallest and largest observations from a set of data, along with Q1, M, and Q3.
• The 5-number summary leads to a new graph called the box-and-whisker plot (boxplot).
• Boxplots are best used for comparing two sets of data.

Example 3: Find any outliers for the set of data.
Number of hysterectomies performed by a male doctor in one year
20 25 25 27 28 31 33 34 36 37 44 50 59 85 86
• IQR = Q3 − Q1 = 50 − 27 = 23
• 1.5 × IQR = 34.5
• Q3 + 34.5 = 84.5
• Q1 − 34.5 = −7.5
• Therefore, the observations 85 and 86 are both outliers for the set of data: both lie above 84.5. No observation lies below −7.5.
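
The quartile definitions and the 1.5 × IQR rule can be checked against this example with a short Python sketch. The helper names `quartiles` and `outliers` are invented for this illustration. The quartiles are taken as medians of the lower and upper halves, leaving out the median itself when the count is odd, which is one reading of the definition in these slides. Library routines such as `statistics.quantiles` use other conventions by default and may give slightly different quartiles.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves rule (needs >= 2 values)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # observations before the median position
    upper = ordered[half + n % 2:]    # observations after the median position
    return median(lower), median(ordered), median(upper)


def outliers(values):
    """Flag observations more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


# Data from Example 3 (male doctors).
male = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]

q1, m, q3 = quartiles(male)
print("Q1, M, Q3:", q1, m, q3)
print("fences:", q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1))
print("outliers:", outliers(male))
```

With this rule the quartiles are 27, 34, and 50, the fences are −7.5 and 84.5, and the flagged observations are 85 and 86.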

Example 4: Create a boxplot for each set of data. What can you conclude?
Number of hysterectomies performed by a male doctor in one year
20 25 25 27 28 31 33 34 36 37 44 50 59 85 86
[Boxplot marking Min, Q1, M, Q3, and Max]
Number of hysterectomies performed by a female doctor in one year
5 7 10 14 18 19 25 29 31 33
[Boxplot marking Min, Q1, M = 18.5, Q3, and Max]
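
The numbers behind the two boxplots can be computed with a small Python sketch. The function `five_number_summary` is a hypothetical helper written for this example. It assumes the median-of-halves quartile rule used in this section, so other software may report slightly different quartiles.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, M, Q3, max), with quartiles as medians of the halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + n % 2:])  # skip the median itself when n is odd
    return ordered[0], q1, median(ordered), q3, ordered[-1]


# Data from Example 4.
male = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]
female = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]

print("male:  ", five_number_summary(male))
print("female:", five_number_summary(female))
```

Under these assumptions the summaries are 20, 27, 34, 50, 86 for the male doctors and 5, 10, 18.5, 29, 33 for the female doctors. One possible conclusion is that the female doctors performed fewer hysterectomies overall: their maximum (33) is below the male doctors' median (34), and their values are less spread out.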

Standard Deviation
• Measures spread by looking at how far the observations are from the mean.
• Denoted by s
** On a calculator: 1-Var Stats, reported as Sx

Properties of Standard Deviation
• s measures spread about the mean and should be used only when the mean is used.
• As s gets larger, the observations are more spread out from the mean.
• s is highly influenced by outliers.

Example 5: Find the standard deviation for the set of data
Number of hysterectomies performed by a male doctor in one year
20 25 25 27 28 31 33 34 36 37 44 50 59 85 86
s = 20.6
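
The slides do not write out a formula for s. The sketch below assumes the usual sample standard deviation, which divides the sum of squared deviations from the mean by n − 1. That is the quantity calculators typically label Sx and that Python's `statistics.stdev` computes. The helper `sample_sd` is written only for this illustration.

```python
from math import sqrt
from statistics import mean, stdev


def sample_sd(values):
    """Assumed formula: s = sqrt( sum((x - x_bar)^2) / (n - 1) )."""
    x_bar = mean(values)
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sqrt(sum(squared_deviations) / (len(values) - 1))


# Data from Example 5 (male doctors).
male = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]

print(round(sample_sd(male), 1))  # by the formula above
print(round(stdev(male), 1))      # standard library, same n - 1 convention

# Drop the two outliers found in Example 3 to see how much they inflate s.
without_outliers = [x for x in male if x < 85]
print(round(sample_sd(without_outliers), 1))
```

Both calculations reproduce s = 20.6 for the full data set. Without 85 and 86, s falls to roughly 11, which illustrates how strongly outliers influence the standard deviation.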

*** The 5-number summary is usually better than the mean and standard deviation for describing a skewed distribution. Use the mean and standard deviation for data that are reasonably symmetric.