Lecture 5: Graphical Display of Data
MA 217 - Stephen Sawin
Fairfield University
January 10, 2014
5: Descriptive Statistics of Categorical Data
Your political party is a categorical variable. To summarize it for a class of 63 students, all you can do is count how many people gave each answer. Better is to give proportions.
Party   #    Proportion
Rep.    28   28/63 = 44.4%
Ind.    19   19/63 = 30.2%
Dem.    16   16/63 = 25.4%
Total   63   100%
[Bar Chart: proportion of the class in each party (Rep., Ind., Dem.)]
[Pie Chart: the same proportions (Rep., Ind., Dem.)]
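The proportion calculation can be checked with a few lines of Python. This is only a sketch: the dictionary `counts` and the function name `proportions` are made up for illustration, and the percentages are rounded to one decimal place.

```python
# Counts for the class of 63 students, taken from the table.
counts = {"Rep.": 28, "Ind.": 19, "Dem.": 16}


def proportions(counts):
    """Return each category's share of the total, as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


props = proportions(counts)
for party, share in props.items():
    # .1% formats a fraction as a percentage with one decimal place.
    print(f"{party:<5} {counts[party]:>3}  {share:.1%}")
```

The proportions always add up to 1 (that is, 100%), which is a quick check on the arithmetic.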
5: Graphical Display of Numerical Data
Quiz scores for a class are a numerical variable.
10 10 9 9 8 8 8 7 5 3
The smallest data value in your data set is called the minimum, the largest is the maximum, and their difference is the range. For this data, minimum = 3, maximum = 10, and range = 10 − 3 = 7.
To make a dot plot, draw a horizontal number line that covers the minimum to the maximum. Mark a dot or X for each data value. Stack multiple instances on top of each other.
                        X
                        X  X  X
         X     X     X  X  X  X
--------------------------------
0  1  2  3  4  5  6  7  8  9  10
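The same steps can be sketched in Python. The function `dot_plot` is a hypothetical helper written for this example, and it assumes whole-number data values; it stacks one X per data value above a number line running from 0 to 10.

```python
from collections import Counter

scores = [10, 10, 9, 9, 8, 8, 8, 7, 5, 3]

minimum = min(scores)
maximum = max(scores)
data_range = maximum - minimum


def dot_plot(values, lo, hi, width=3):
    """Return a text dot plot: one X per data value, stacked above a number line."""
    counts = Counter(values)  # how many times each value occurs
    tallest = max(counts.values())
    rows = []
    # Build the rows from the top of the tallest stack down to the number line.
    for level in range(tallest, 0, -1):
        row = "".join(
            ("X" if counts[v] >= level else " ").ljust(width)
            for v in range(lo, hi + 1)
        )
        rows.append(row.rstrip())
    rows.append("".join(str(v).ljust(width) for v in range(lo, hi + 1)).rstrip())
    return "\n".join(rows)


plot = dot_plot(scores, 0, 10)
print("minimum =", minimum, " maximum =", maximum, " range =", data_range)
print(plot)
```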
Histograms
10 10 9 9 8 8 8 7 5 3
Dot plots are too fussy for large data sets. Instead, make a histogram.
Divide the range up into equal-size bins.
Pick a number of bins (guideline: square root of the number of data values; here √10 ≈ 3.2, so 3 or 4 bins is reasonable).
Bin Size = Range / Number of Bins
You can adjust the max and min to make this divide nicely. If we go from 2 to 10, then 4 bins gives
Bin Size = (Bin max − Bin min) / Number of Bins = (10 − 2) / 4 = 2
So bin boundaries are at 2, 4, 6, 8, and 10.
Count how many data values fall in each bin. A value on a boundary is counted in the lower bin here, so the three 8s go in 6−8.
Bin     2−4  4−6  6−8  8−10
Count   1    1    4    4
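Here is a minimal Python sketch of the binning steps. The function name `bin_counts` is made up for this example. Two things are assumptions rather than part of the lecture: the square-root guideline is rounded up to a whole number, and a value sitting exactly on a boundary is put in the lower bin, which is the convention the counts 1, 1, 4, 4 imply (the three 8s are counted in 6−8).

```python
import math

scores = [10, 10, 9, 9, 8, 8, 8, 7, 5, 3]


def bin_counts(values, bin_min, bin_max, num_bins):
    """Count values per equal-width bin; a boundary value goes in the lower bin."""
    bin_size = (bin_max - bin_min) / num_bins
    counts = [0] * num_bins
    for v in values:
        # ceil(...) - 1 sends a value exactly on a boundary to the lower bin.
        index = math.ceil((v - bin_min) / bin_size) - 1
        # Clamp so bin_min itself lands in the first bin. Values outside
        # [bin_min, bin_max] would also be clamped into the end bins.
        index = min(max(index, 0), num_bins - 1)
        counts[index] += 1
    return bin_size, counts


# Guideline: square root of the number of data values, rounded up (assumption).
suggested_bins = math.ceil(math.sqrt(len(scores)))
bin_size, counts = bin_counts(scores, bin_min=2, bin_max=10, num_bins=suggested_bins)
print("bins:", suggested_bins, " bin size:", bin_size, " counts:", counts)
```

Other software may put boundary values in the upper bin instead, which would change the counts, so it is worth checking which convention a tool uses.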
Making Histograms
10 10 9 9 8 8 8 7 5 3
Bin     2−4  4−6  6−8  8−10
Count   1    1    4    4
Draw the number line as you would for a dot plot, but now for each bin make a vertical bar the width of the bin and with height proportional to the number of points in the bin.
[Histogram: bars over the bins 2−4, 4−6, 6−8, and 8−10 with heights 1, 1, 4, and 4]
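As a rough illustration, the bars can be drawn in text with Python. The helper `text_histogram` is hypothetical, and the one-character gap between bars is only there to keep the text readable; it takes the bin labels and counts for the quiz data and makes each bar as many rows tall as the bin's count.

```python
def text_histogram(labels, counts):
    """Return a histogram drawn with text: one column per bin, height = count."""
    width = max(len(label) for label in labels) + 2
    rows = []
    # Work from the top of the tallest bar down to the axis labels.
    for level in range(max(counts), 0, -1):
        row = "".join(
            ("#" * (width - 1) if c >= level else "").ljust(width)
            for c in counts
        )
        rows.append(row.rstrip())
    rows.append("".join(label.ljust(width) for label in labels).rstrip())
    return "\n".join(rows)


labels = ["2-4", "4-6", "6-8", "8-10"]
counts = [1, 1, 4, 4]
chart = text_histogram(labels, counts)
print(chart)
```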
Some Examples of Histograms
Hours per week spent exercising (from survey)
[Histogram: horizontal axis "Data" in 10 bins from 0 to 27.5; vertical axis from 0 to 25]
College GPA (from survey)
[Histogram: horizontal axis "Data" in 10 bins from 1.86 to 4.52; vertical axis from 0 to 25]
Height (from similar survey at big university in Georgia)
Shapes of Histograms
Uniform (dice, random number generators)
Bimodal (height, two distinct subpopulations)
Unimodal (many things)
Multimodal (multiple subpopulations)
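To see the uniform shape, here is a small simulation sketch in Python. The 6000 rolls and the seed value are arbitrary choices made for this illustration, not part of the lecture; each face of a fair die should come up roughly the same number of times, so the bars come out at about the same length.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(6000)]  # randint includes both 1 and 6
face_counts = Counter(rolls)

for face in range(1, 7):
    # One '#' per 50 rolls, so each bar is about 20 characters long.
    print(face, "#" * (face_counts[face] // 50), face_counts[face])
```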
More Shapes of Histograms
Symmetric (variables where you are as likely to be above average as below)
Skewed Right (income, prices, time spent studying, things bounded below that occasionally are very large)
Skewed Left (GPA, lifespan, things bounded above that occasionally are very small)
Lecture 5 Key Points
After this lecture you should be able to
- know the terms maximum, minimum, range, dot plot, histogram, bin size, bin number, bin minimum, bin maximum.
- know the terms uniform, unimodal, bimodal, multimodal and skew left, skew right, symmetric.
- find and interpret proportions for a categorical variable.
- calculate maximum, minimum and range.
- produce very simple dot plots.
- read a dot plot; read a histogram.
- understand what's involved in choosing bin size and number.
- identify the shape of a histogram (uniform/unimodal/bimodal/multimodal, symmetric/skew left/skew right).
After processing this lecture you should be able to
- suggest explanations for the shape of a histogram.
- guess the shape from a description of the variable.
- produce a histogram in Excel.