Slides - Fairfield University

Lecture 5: Graphical Display of Data

MA 217 - Stephen Sawin

Fairfield University

January 10, 2014

5: Descriptive Statistics of Categorical Data

Your political party is a categorical variable. To summarize it for a class of 63 students, all you can do is count how many people gave each answer.

Party   #
Rep.    28
Ind.    19
Dem.    16
Total   63

Better is to give proportions.

Party   #    Proportion
Rep.    28   28/63 = 44.4%
Ind.    19   19/63 = 30.2%
Dem.    16   16/63 = 25.4%
Total   63   100%
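The proportion column can be checked with a few lines of Python. This is only a sketch: the slides themselves use no code, and the names `counts`, `total`, and `proportions` are chosen here for illustration.

```python
# Counts from the party table above.
counts = {"Rep.": 28, "Ind.": 19, "Dem.": 16}

total = sum(counts.values())  # 63 students in the class

# Proportion for each category = count / total.
proportions = {party: n / total for party, n in counts.items()}

for party, n in counts.items():
    # ".1%" multiplies by 100 and rounds to one decimal place.
    print(f"{party:<6}{n:>3}   {n}/{total} = {proportions[party]:.1%}")
print(f"{'Total':<6}{total:>3}   {sum(proportions.values()):.0%}")
```

Rounding each proportion to one decimal place gives 44.4%, 30.2%, and 25.4%, which add up to 100%.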

Bar Chart

[Figure: bar chart of the proportions for Rep., Ind., and Dem.]

Pie Chart

[Figure: pie chart of the same three proportions]

5 Graphic Display of Numerical Data

Quiz score is a numerical variable. Scores for a class:

10 10 9 9 8 8 8 7 5 3

The smallest data value in your data set is called the minimum, the largest is the maximum, and their difference is the range. For this data:

minimum = 3    maximum = 10    range = 10 − 3 = 7

To make a dot plot, draw a horizontal number line from the minimum to the maximum. Mark a dot or X for each data value. Stack multiple instances on top of each other.

                        X
                        X  X  X
         X     X     X  X  X  X
--------------------------------
0  1  2  3  4  5  6  7  8  9  10
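The same steps can be sketched in Python: find the minimum, maximum, and range, then stack one X per data value above a number line. The helper `dot_plot` is a hypothetical name introduced here, and it assumes whole-number data such as these quiz scores.

```python
from collections import Counter

scores = [10, 10, 9, 9, 8, 8, 8, 7, 5, 3]

minimum = min(scores)
maximum = max(scores)
data_range = maximum - minimum
print(f"minimum = {minimum}, maximum = {maximum}, range = {data_range}")


def dot_plot(values, low, high):
    """Return a text dot plot: one X per value, stacked above a number line."""
    tally = Counter(values)          # how many times each value occurs
    tallest = max(tally.values())    # height of the tallest stack
    rows = []
    # Build the rows from the top of the tallest stack down to the axis.
    for level in range(tallest, 0, -1):
        cells = ["X" if tally[v] >= level else " " for v in range(low, high + 1)]
        rows.append("  ".join(cells).rstrip())
    # The number line: one label per whole number from low to high.
    rows.append("  ".join(str(v) for v in range(low, high + 1)))
    return "\n".join(rows)


print(dot_plot(scores, 0, 10))
```

The number line here runs from 0 to 10, as on the slide; calling `dot_plot(scores, minimum, maximum)` would start it at the minimum instead.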

Histograms

10 10 9 9 8 8 8 7 5 3

Dot plots are too fussy for large data sets. Instead, make a histogram.

Divide the range up into equal-size bins. Pick a number of bins (guideline: the square root of the number of data values; here √10 ≈ 3.2, so 3 or 4 bins).

\[ \text{Bin Size} = \frac{\text{Range}}{\text{Number of Bins}} \]

You can adjust the max and min to make this divide nicely. If we go from 2 to 10, then 4 bins gives

\[ \text{Bin Size} = \frac{\text{Bin max} - \text{Bin min}}{\text{Number of Bins}} = \frac{10 - 2}{4} = 2 \]

So the bin boundaries are at 2, 4, 6, 8, and 10.

Count how many data values fall in each bin. A value on a boundary (such as 8) is counted in the bin below it:

Bin     2 − 4   4 − 6   6 − 8   8 − 10
Count   1       1       4       4
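The bin arithmetic above can be sketched in Python. The boundary rule used here (each bin includes its upper edge, and the first bin also includes its lower edge) is an assumption, chosen because it reproduces the counts 1, 1, 4, 4; other tools may put boundary values in the bin above.

```python
import math

scores = [10, 10, 9, 9, 8, 8, 8, 7, 5, 3]

# Guideline: about sqrt(n) bins. sqrt(10) is about 3.16, so 3 or 4 bins.
suggested_bins = math.sqrt(len(scores))

bin_min, bin_max, num_bins = 2, 10, 4
bin_size = (bin_max - bin_min) / num_bins                       # (10 - 2) / 4 = 2
edges = [bin_min + i * bin_size for i in range(num_bins + 1)]   # 2, 4, 6, 8, 10

# Assumed convention: bin i covers (low, high]; the first bin also takes its lower edge.
counts = [0] * num_bins
for x in scores:
    for i in range(num_bins):
        low, high = edges[i], edges[i + 1]
        if low < x <= high or (i == 0 and x == low):
            counts[i] += 1
            break

print(edges)
print(counts)
```

With this rule the three 8s land in the 6 − 8 bin, which is why that bin holds four values (7, 8, 8, 8) rather than one.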

Making Histograms

10 10 9 9 8 8 8 7 5 3

Bin     2 − 4   4 − 6   6 − 8   8 − 10
Count   1       1       4       4

Draw the number line as you would for a dot plot, but now for each bin make a vertical bar the width of the bin and with height proportional to the number of points in the bin.

[Figure: histogram with bars of height 1, 1, 4, and 4 over the bins 2-4, 4-6, 6-8, and 8-10]
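As a rough illustration of the drawing rule, the sketch below renders the bin counts as a text histogram, with equal-width bars that touch and heights equal to the counts. The lecture itself produces histograms in Excel; Python is swapped in here only for illustration, and `text_histogram` is a hypothetical helper name.

```python
bins = ["2-4", "4-6", "6-8", "8-10"]
counts = [1, 1, 4, 4]


def text_histogram(labels, heights, width=5):
    """Return a text histogram with touching bars, each `width` characters wide."""
    rows = []
    # One row per count level, from the tallest bar down to 1.
    for level in range(max(heights), 0, -1):
        bars = "".join(("#" if h >= level else " ") * width for h in heights)
        rows.append(f"{level:>2} |{bars}".rstrip())
    # Horizontal axis, then the bin labels left-aligned under each bar.
    rows.append("   +" + "-" * (width * len(labels)))
    rows.append("    " + "".join(label.ljust(width) for label in labels).rstrip())
    return "\n".join(rows)


print(text_histogram(bins, counts))
```

Because every bin has the same width, bar height alone carries the comparison between bins.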

Some Examples of Histograms

Hours per week spent exercising (from survey)

College GPA (from survey)

Height (from similar survey at big university in Georgia)

[Figures: example histograms. The bin labels that survive run from 0 to 27.5 for one histogram and from 1.86 to 4.52 for another, each with a vertical axis from 0 to 25.]

Shapes of Histograms

Uniform (dice, random number generators)

Unimodal (many things)

Bimodal (height, two distinct subpopulations)

Multimodal (multiple subpopulations)

More Shapes of Histograms

Symmetric (variables where you are as likely to be above average as below)

Skewed Right (income, prices, time spent studying: things bounded below that occasionally are very large)

Skewed Left (GPA, lifespan: things bounded above that occasionally are very small)

Lecture 5 Key Points

After this lecture you should be able to

- know the terms maximum, minimum, range, dot plot, histogram, bin size, bin number, bin minimum, bin maximum.
- know the terms uniform, unimodal, bimodal, multimodal and skew left, skew right, symmetric.
- find and interpret proportions for a categorical variable.
- calculate maximum, minimum and range.
- produce very simple dot plots.
- read a dot plot; read a histogram.
- understand what's involved in choosing bin size and number.
- identify the shape of a histogram (uniform/unimodal/bimodal/multimodal, symmetric/skew left/skew right).

After processing this lecture you should be able to

- suggest explanations for the shape of a histogram, and guess the shape from a description of the variable.
- produce a histogram in Excel.
- choose appropriate bin sizes, bin maximums and minimums.
