Descriptive Statistics
Summarizing data using graphs
Which graph to use?
• Depends on type of data
• Depends on what you want to illustrate
• Depends on available statistical software
Bar Chart
[Figure: bar chart of "Birth Order of Spring 1998 Stat 250 Students". Horizontal axis: Birth Order (Middle, Oldest, Only, Youngest); vertical axis: Percent (10 to 40); n=92 students]
Bar Chart
• Summarizes categorical data.
• Horizontal axis represents categories, while
vertical axis represents either counts
(“frequencies”) or percentages (“relative
frequencies”).
• Used to illustrate the differences in
percentages (or counts) between categories.
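The counts and percentages behind a bar chart can be sketched in a few lines of Python. This is a minimal illustration only: the `responses` list is made-up data (not the Stat 250 survey), and `category_summary` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical birth-order responses (NOT the Stat 250 data).
responses = ["Oldest", "Youngest", "Middle", "Oldest", "Only",
             "Youngest", "Oldest", "Middle", "Youngest", "Oldest"]

def category_summary(values):
    """Return {category: (count, percent)} for categorical data."""
    counts = Counter(values)
    n = len(values)
    return {cat: (k, 100.0 * k / n) for cat, k in sorted(counts.items())}

summary = category_summary(responses)
for cat, (count, pct) in summary.items():
    # One text "bar" per category; the bar length is the count.
    print(f"{cat:<9}{'#' * count} {count} ({pct:.0f}%)")
```

Either number in each pair could drive the bar heights: the count gives a frequency bar chart, the percent a relative-frequency one.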
Histogram
[Figure: histogram of "Age of Spring 1998 Stat 250 Students". Horizontal axis: Age (in years), 18 to 27; vertical axis: Frequency (Count), 0 to 50; n=92 students]
Analogy
Bar chart is to categorical data as
histogram is to ...
measurement data.
Histogram
• Divide measurement up into equal-sized
categories.
• Determine number (or percentage) of
measurements falling into each category.
• Draw a bar for each category so bars’
heights represent number (or percent)
falling into the categories.
• Label and title appropriately.
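The histogram steps above can be sketched as follows. This is a rough illustration under stated assumptions: the `ages` list is hypothetical, `histogram_counts` is an invented helper name, and each category is taken to include its left endpoint but not its right.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width categories [lo, lo + width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)   # index of the category v falls into
        if 0 <= i < n_bins:             # values outside the range are ignored
            counts[i] += 1
    return counts

# Hypothetical ages (NOT the Stat 250 data).
ages = [18, 19, 19, 20, 20, 20, 21, 21, 22, 24, 27]

start, width = 18, 2
counts = histogram_counts(ages, start, width, n_bins=5)
for i, c in enumerate(counts):
    lo = start + width * i
    # Draw one text bar per category; bar length is the count.
    print(f"[{lo}, {lo + width}): {'#' * c} ({c})")
```

Calling the same function with a different `width` and `n_bins` is a quick way to try out other numbers of categories on the same data.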
Histogram
Use common sense in determining
number of categories to use.
(Trial-and-error works fine, too.)
Too few categories
[Figure: histogram of "Age of Spring 1998 Stat 250 Students" with too few categories. Horizontal axis: Age (in years), ticks at 18, 23, 28; vertical axis: Frequency (Count), 0 to 60; n=92 students]
Too many categories
[Figure: histogram of "GPAs of Spring 1998 Stat 250 Students" with too many categories. Horizontal axis: GPA, 2 to 4; vertical axis: Frequency (Count), 0 to 7; n=92 students]
Dot Plot
[Figure: dot plot of "Fastest Ever Driving Speed", 226 Stat 100 Students, Fall '98, with separate rows for Men (100) and Women (126). Horizontal axis: Speed, 70 to 160]
Dot Plot
• Summarizes measurement data.
• Horizontal axis represents measurement
scale.
• Plot one dot for each data point.
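A text-only dot plot can be sketched as below. It is a simplified illustration: the `speeds` list is hypothetical, `text_dot_plot` is an invented helper name, and the measurement scale runs down the page rather than across it, so each line is one value and repeated values stack as extra dots.

```python
from collections import Counter

def text_dot_plot(values):
    """Return one line per distinct value, with one dot per data point."""
    counts = Counter(values)
    return [f"{v:>4} | {'.' * counts[v]}" for v in sorted(counts)]

# Hypothetical fastest driving speeds in mph (NOT the Stat 100 data).
speeds = [90, 100, 100, 110, 110, 110, 120, 150]

lines = text_dot_plot(speeds)
print("\n".join(lines))
```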
Stem-and-Leaf Plot
Stem-and-leaf of Shoes   N = 139
Leaf Unit = 1.0

   12   0  223334444444
   63   0  555555555555566666666677777778888888888888999999999
  (33)  1  000000000000011112222233333333444
   43   1  555555556667777888
   25   2  0000000000023
   12   2  5557
    8   3  0023
    4   3
    4   4  00
    2   4
    2   5  0
    1   5
    1   6
    1   6
    1   7
    1   7  5
Stem-and-Leaf Plot
• Summarizes measurement data.
• Each data point is broken down into a
“stem” and a “leaf.”
• First, “stems” are aligned in a column.
• Then, “leaves” are attached to the stems.
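The stem-and-leaf construction can be sketched as follows. This is a minimal version under stated assumptions: the `shoes` list is hypothetical, the data are non-negative whole numbers, the stem is everything but the last digit, and each stem gets a single row (the display above splits each stem into two rows, which this sketch does not do).

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens and up) to its sorted leaves (ones digit)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # break the data point into stem and leaf
        rows[stem].append(leaf)
    return dict(rows)

# Hypothetical numbers of pairs of shoes owned (NOT the slide's data).
shoes = [2, 5, 9, 10, 10, 13, 15, 18, 20, 23, 75]

rows = stem_and_leaf(shoes)
# Align the stems in a column, then attach the leaves; empty stems are
# printed too, so gaps in the data stay visible.
for stem in range(min(rows), max(rows) + 1):
    leaves = "".join(str(d) for d in rows.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```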
Box Plot
[Figure: box plot of "Amount of sleep in past 24 hours of Spring 1998 Stat 250 Students". Vertical axis: Hours of sleep, 0 to 10]
Box Plot
• Summarizes measurement data.
• Vertical (or horizontal) axis represents
measurement scale.
• Lines in box represent the 25th percentile
(“first quartile”), the 50th percentile
(“median”), and the 75th percentile (“third
quartile”), respectively.
An aside...
• Roughly speaking:
– The “25th percentile” is the number such that
25% of the data points fall below the number.
– The “median” or “50th percentile” is the
number such that half of the data points fall
below the number.
– The “75th percentile” is the number such that
75% of the data points fall below the number.
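The three percentiles can be computed with Python's standard `statistics.quantiles` function (Python 3.8 or later). This is a sketch only: the `data` list is hypothetical, `fraction_below` is an invented helper, and the function's default interpolation rule is just one of several conventions that software packages use.

```python
import statistics

# Hypothetical measurements (NOT taken from the slides).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Three cut points that split the sorted data into four roughly equal parts.
q1, median, q3 = statistics.quantiles(data, n=4)

def fraction_below(values, cutoff):
    """Fraction of data points strictly below the cutoff."""
    return sum(v < cutoff for v in values) / len(values)

for name, q in [("25th", q1), ("50th", median), ("75th", q3)]:
    print(f"{name} percentile = {q}, "
          f"fraction below = {fraction_below(data, q):.2f}")
```

With a small data set the printed fractions only approximate 25%, 50%, and 75%, which is why the definitions above are stated "roughly speaking."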
Box Plot (cont’d)
• “Whiskers” are drawn to the most extreme
data points that are not more than 1.5 times
the length of the box beyond either quartile.
– Whiskers are useful for identifying outliers.
• “Outliers,” or extreme observations, are
denoted by asterisks.
– Generally, data points falling beyond the
whiskers are considered outliers.
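The whisker and outlier rule can be sketched as follows. The `sleep_hours` list is hypothetical, `box_plot_summary` is an invented helper name, and the quartiles come from the default rule of `statistics.quantiles`, which may differ slightly from the rule a given statistics package uses.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends, and outliers using the 1.5 x box-length rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    box_length = q3 - q1
    low_limit = q1 - 1.5 * box_length    # farthest a lower whisker may reach
    high_limit = q3 + 1.5 * box_length   # farthest an upper whisker may reach
    inside = [v for v in values if low_limit <= v <= high_limit]
    outliers = [v for v in values if v < low_limit or v > high_limit]
    return {
        "quartiles": (q1, median, q3),
        # Whiskers stop at the most extreme points still inside the limits.
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

# Hypothetical hours of sleep (NOT the Stat 250 data).
sleep_hours = [1, 5, 6, 6, 7, 7, 7, 8, 8, 9, 10]

summary = box_plot_summary(sleep_hours)
print(summary)
```

In this made-up sample the box runs from 6 to 8 hours, so the limits are 3 and 11 hours, and the single value of 1 hour falls beyond the lower whisker.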
Using Box Plots to Compare
[Figure: side-by-side box plots of "Fastest Ever Driving Speed", 226 Stat 100 Students, Fall 1998. Horizontal axis: Gender (female, male); vertical axis: Fastest Speed (mph), 60 to 160]
Which graph to use when?
• Stem-and-leaf plots and dotplots are good
for small data sets, while histograms and
box plots are good for large data sets.
• Boxplots and dotplots are good for
comparing two groups.
• Boxplots are good for identifying outliers.
• Histograms and boxplots are good for
identifying “shape” of data.
Scatter Plots
[Figure: scatter plot of "Foot sizes of Spring 1998 Stat 250 students". Horizontal axis: Left foot (in cm), 22 to 31; vertical axis: Right foot (in cm), 22 to 31; n=88 students]
Scatter Plots
• Summarizes the relationship between two
measurement variables.
• Horizontal axis represents one variable and
vertical axis represents second variable.
• Plot one point for each pair of
measurements.
No relationship
[Figure: scatter plot of "Lengths of left forearms and head circumferences of Spring 1998 Stat 250 Students". Horizontal axis: Head circumference (in cm), 52 to 62; vertical axis: Left forearm (in cm), 22 to 32; n=89 students]
Closing comments
• Many possible types of graphs.
• Use common sense in reading graphs.
• When creating graphs, don’t summarize
your data too much or too little.
• When creating graphs, label everything for
others. Remember you are trying to
communicate something to others!