Frequencies


Topics for Today

Elements of a good summary

Summarizing Data

Frequencies

Graphical Presentation of Frequencies

Stat302

Fall 2010 – Week1, Lecture 2

Page 1 of 29

Elements of a Good Summary

1. Who?

- What ___________ do the data describe?

Individuals may be people, animals, or things.

- How many individuals appear in the data?

2. What?

- The ______ of variables available.

- Exact definitions of these variables.

- Units of measurement for each variable

Weights, for example, might be recorded in grams or kilograms. Costs might be recorded in $ or millions of $.

3. Why?

- What _______ do the data have?

- What are specific _________ ?

- Do we want to draw conclusions about individuals other than the ones we actually have data for?

Summarizing Specific Variables


Each type of data is most effectively summarized differently:

Proportions / Frequencies:

- ___________

- ___________

- ___________

Means (& SD) / Medians (IQR):

- ________ (counts or if there are many categories)

- ________

But interval and ratio data can be converted to ‘ordinal’ data and presented with proportions and frequencies.


Frequency and Frequency Distribution

Frequencies and frequency distributions are the most commonly used summary statistics

Frequency: number of _____ each unique value of a variable occurs in a data set

Frequency Distribution: listing of the frequency of unique ______ of a variable in a data set

Relative frequency: (# of times each value occurs) / (# of obs. in data set)

Percentage frequency: relative frequency x 100%
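As a minimal sketch of these definitions, the following Python snippet counts how often each value occurs and derives the relative and percentage frequencies from the counts. The function name `frequency_table` and the small sample of answers are made up for illustration; they are not data from the lecture.

```python
from collections import Counter


def frequency_table(observations):
    """Return {value: (frequency, relative frequency, percentage frequency)}."""
    n = len(observations)
    counts = Counter(observations)
    return {
        value: (freq, freq / n, 100 * freq / n)
        for value, freq in counts.items()
    }


# Hypothetical sample, for illustration only.
sample = ["TV", "Radio", "TV", "Newspaper", "TV", "Newspaper", "TV", "Radio"]
table = frequency_table(sample)

for value, (freq, rel, pct) in table.items():
    print(f"{value:<10} {freq:>3} {rel:>7.4f} {pct:>7.2f}")
```

The relative frequencies always add up to 1 and the percentage frequencies to 100, which is a quick check on any hand-built table.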


Examples of Frequencies for Nominal Data

Example 1 (Nominal Data)

How students get their information on current affairs (in 1995)

Media        Freq   Rel. Freq.   % Freq
Television     37       0.4625    46.25
Newspaper      35       0.4375    43.75
Radio           7       0.0875     8.75
Magazine        1       0.0125     1.25
Total          80       1.0000   100.00


Example 2 (Nominal Data)

To summarize 2,439 complaints about the comfort-related characteristics of its airplanes, an airline’s customer service department issues the following table:

Nature of Complaint           Rel. Freq.    Freq
Inadequate leg room                 .295     719
Uncomfortable seats                 .375     914
Narrow aisles                       .060     146
Insufficient carry-on space         ____     ___
Insufficient rest rooms             .024      58
Miscellaneous                       .157     384
Total                              1.000   2,439
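Because the frequencies must add up to the total, and the relative frequencies to 1, a missing row in such a table can be recovered by subtraction. A rough sketch of that check, using the figures given in the table (the variable names are illustrative):

```python
total = 2439

# Frequencies that are printed in the table.
known = {
    "Inadequate leg room": 719,
    "Uncomfortable seats": 914,
    "Narrow aisles": 146,
    "Insufficient rest rooms": 58,
    "Miscellaneous": 384,
}

# The one remaining category must account for the rest of the complaints.
missing_freq = total - sum(known.values())
missing_rel = missing_freq / total

print(missing_freq, round(missing_rel, 3))
```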


Example 3 (Ordinal Data)

Seventy-five (75) students were interviewed regarding how often they eat breakfast:

Frequency of Breakfast Eating   % Freq   Rel. Freq.   Freq
Always                             20%         .2       15
Almost all the time              14.7%       .147       11
Most of the time                   12%        .12        9
Seldom                             24%        .24       18
Never                            _____       ____       __
Total                             100%      1.000       75


Frequency Distributions of Interval and Ratio Data

You can convert _____ or ________ scaled data into ordinal data by grouping, to generate a frequency distribution.

For Ordinal and Nominal data, the categories are obvious; they are the ______ the variable takes.

For Ratio or Interval data, you have to construct the categories, or _______ by defining class boundaries, and midpoints.


Defining Classes for Grouping Interval and Ratio Data

Class intervals should be non-overlapping and _____________ defined.

In most circumstances, the intervals would be of the same width. (Open ended intervals are sometimes convenient.)

If there are no individuals in a particular interval, it should still be included to _____ a misleading impression of the data.
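One way to picture these rules is a small tally routine, sketched below under the assumption that every class is half-open, [lower, upper), and has the same width. Each value then falls in exactly one class, and empty classes stay in the result as zeros. The function names and the sample values are hypothetical.

```python
def class_index(value, lower, width):
    """Index i of the class [lower + i*width, lower + (i+1)*width) holding value."""
    return int((value - lower) // width)


def tally(values, lower, width, k):
    """Count values per class, keeping empty classes as zeros."""
    counts = [0] * k
    for v in values:
        i = class_index(v, lower, width)
        if not 0 <= i < k:
            raise ValueError(f"{v} falls outside the defined classes")
        counts[i] += 1
    return counts


# Hypothetical values; the classes are [0, 5), [5, 10), [10, 15), [15, 20).
counts = tally([1, 4, 5, 17, 19.5], lower=0, width=5, k=4)
print(counts)
```

Note that the value 5 lands in [5, 10), not [0, 5), and that the empty class [10, 15) is still reported.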


Example 4 (Ratio)

The following data are percentages of persons 65 years old or older in 40 large ___________ in 2001. Set up a frequency distribution for these data and include the following:

(a) frequencies

(b) midpoints of the classes

(c) percentages

(d) cumulative frequencies

(e) cumulative percentages.


Percentages of Persons 65 Years old or Older in 40 large urban locations in 2000

Location   Percent 65+     Location   Percent 65+
    1      10.1               21      12.5
    2       7.0               22      11.9
    3      10.3               23      11.1
    4      12.2               24       6.9 (L)
    5      11.7               25      11.5
    6      12.2               26       9.6
    7      10.6               27      10.9
    8      12.0               28       9.9
    9       9.2               29       8.9
   10       7.5               30       8.4
   11      13.0               31       7.7
   12      13.1               32      10.8
   13       8.1               33       7.2
   14      12.9               34      10.6
   15      10.7               35      10.9
   16       7.8               36       8.9
   17      17.2 (H)           37      10.4
   18       8.4               38      11.7
   19       7.8               39       8.5
   20       9.7               40       7.3


We’ll eventually construct frequency distributions with software, but let’s do it by hand first so that we understand the construction.

First, what is the __________ being measured in this example?

How many individuals are measured?

What is the ________ being measured?

Now, let’s summarize the data for these individuals.




Step 1: Determine the ______ of classes or groups that we wish to construct.

A useful rule of thumb is to have the number of classes ( k ) equal to the square root of the number of individuals.

________________________________

Step 2: Determine the range of values and the class size (width).

width = (highest value - lowest value) / k = (17.2 - 6.9) / 6 = 10.3 / 6 = 1.717

We want the width to be a ‘convenient’ number, so we’ll round this to 2.

Step 3: Determine the classes and tally the data. Make sure that the smallest and largest data values are included in the tally.
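The three steps can be sketched in Python with the 40 values from the table. Two choices here are assumptions rather than part of the rule itself: the raw width is made 'convenient' by rounding up to the next whole number, and the first class starts at 6, the lower limit used on the slides.

```python
import math

percent_65_plus = [
    10.1, 7.0, 10.3, 12.2, 11.7, 12.2, 10.6, 12.0, 9.2, 7.5,
    13.0, 13.1, 8.1, 12.9, 10.7, 7.8, 17.2, 8.4, 7.8, 9.7,
    12.5, 11.9, 11.1, 6.9, 11.5, 9.6, 10.9, 9.9, 8.9, 8.4,
    7.7, 10.8, 7.2, 10.6, 10.9, 8.9, 10.4, 11.7, 8.5, 7.3,
]

# Step 1: number of classes, roughly the square root of n.
n = len(percent_65_plus)
k = round(math.sqrt(n))  # sqrt(40) is about 6.3

# Step 2: raw width, then a 'convenient' width (assumed: round up).
raw_width = (max(percent_65_plus) - min(percent_65_plus)) / k
width = math.ceil(raw_width)

# Step 3: tally into half-open classes [6, 8), [8, 10), ..., [16, 18).
start = 6  # assumed starting point, taken from the slide's first class
freqs = [0] * k
for x in percent_65_plus:
    freqs[int((x - start) // width)] += 1

print(k, round(raw_width, 3), width, freqs)
```

The smallest value (6.9) falls in the first class and the largest (17.2) in the last, so both extremes are included in the tally, as Step 3 requires.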


Class midpoints   Class      Tally (frequency)
m1 = 7            [6-8)      f1 = 8
m2 = 9            [8-10)     f2 = 10
m3 = 11           [10-12)    f3 = 14
____________      _______    ______
m5 = 15           [14-16)    f5 = 0
m6 = 17           [16-18)    f6 = 1

Since the endpoints of one interval are ‘adjacent’ to the endpoints of the next interval, these numbers are called the real limits or class limits.


Frequency, percent frequency, cumulative frequency and cumulative percent frequency can then be calculated.

Class      Midpoint   Frequency   Percentage          Cum. Freq.       Cum. %
           m          f           %                   cf               c%
[6, 8)     7          8           100(8/40) = 20.0    f1 = 8           20
[8, 10)    9          10          100(10/40) = 25.0   f1+f2 = 18       45
[10, 12)   11         14          100(14/40) = 35.0   f1+f2+f3 = 32    80
______     __         _           ______________      ______           ____
[14, 16)   15         0           100(0/40) = 0.0     … = 39           97.5
[16, 18)   17         1           100(1/40) = 2.5     … = 40           100

Note: Percentage distributions are used extensively to compare samples with different sample sizes.
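The cumulative columns are running totals of the frequency column, which the sketch below reproduces with `itertools.accumulate`. The frequency of the blank class is not printed in the table; the value 7 used here is inferred from the cumulative counts on either side of it (39 - 32) and should be read as an assumption.

```python
from itertools import accumulate

# Class frequencies for [6, 8), [8, 10), ..., [16, 18).
# The fourth entry (7) is inferred from the cumulative counts 32 and 39.
freqs = [8, 10, 14, 7, 0, 1]
n = sum(freqs)

percents = [100 * f / n for f in freqs]
cum_freqs = list(accumulate(freqs))
cum_percents = [100 * cf / n for cf in cum_freqs]

for f, p, cf, cp in zip(freqs, percents, cum_freqs, cum_percents):
    print(f"{f:>3} {p:>6.1f} {cf:>3} {cp:>6.1f}")
```

The last cumulative frequency always equals the number of observations, and the last cumulative percentage is always 100.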


Advantages of Grouping

- it reduces the apparent complexity of the data by reducing the number of separate pieces of information.

- it helps to smooth out irregularities in the data.

Disadvantage of Grouping

- information is lost


Graphical Display of Data

Include title, labels, unit of measurement, etc. to describe the main features.

1.

Bar Charts

2.

Pie Charts

3.

Histograms


Bar Chart: series of rectangular bars where the _______ of the bars represent ___________ of the respective quantities; bars have equal width, label axes, start at zero.

Pie Chart: circle divided into sectors in such a way that the area of each ______ is ____________ to the quantity represented.

Pie charts emphasize the proportion of occurrences of each category. Bar charts focus attention on frequencies.


Example of a Pie Chart:

The following data gives the breakdown of purposes for which a population makes six million trips on a normal working day.

Purpose               No. of Trips   Relative    Angle of
                      (millions)     Frequency   Sector
To and from work          2.01         0.335      120.6
To and from School        1.14         0.190       68.4
Social                    0.84         0.140       50.4
Personal Business         0.64         0.107       38.4
To and from Shops         0.60         0.100       36.0
Other                     0.77         0.128       46.2
Total                     6.00         1.000      360.0
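Each sector angle in the table is the category's relative frequency multiplied by 360 degrees. A short sketch of that arithmetic, using the trip counts from the table (the dictionary and variable names are illustrative):

```python
# Number of trips in millions, by purpose.
trips = {
    "To and from work": 2.01,
    "To and from school": 1.14,
    "Social": 0.84,
    "Personal business": 0.64,
    "To and from shops": 0.60,
    "Other": 0.77,
}

total = sum(trips.values())

# angle of sector = relative frequency x 360 degrees
angles = {purpose: 360 * t / total for purpose, t in trips.items()}

for purpose, angle in angles.items():
    print(f"{purpose:<20} {trips[purpose] / total:6.3f} {angle:6.1f}")
```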


Pie Chart

[Figure: pie chart of trips by purpose. To and from work 33.50%, To and from school 19.00%, Social 14.00%, Other 12.83%, Personal business 10.67%, To and from shops 10.00%.]


Bar Chart

[Figure: bar chart of trips by purpose; vertical axis from 0.0 to 2.0, horizontal axis: Purpose.]


How to lie with a bar graph


Histogram

A bar chart is used for plotting frequencies of nominal or ordinal variables.

__________ are used for plotting frequencies of _______ interval or ratio data. The main difference is there is no gap between the bars.

Either the frequency or the relative frequency can be plotted; the two histograms have the same shape and differ only in the scale of the y-axis.

- Frequency Histogram

- Rel. Freq. Histogram

The centre of the base of each rectangle is placed at the class mark (the class midpoint).
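To illustrate that the two kinds of histogram share a shape, here is a rough text-only sketch that draws one row of '#' marks per class, labelled by its midpoint, and reports either the frequency or the relative frequency as the bar height. It uses the class midpoints from the seniors example; the frequency 7 for the [12, 14) class is inferred from the cumulative counts, and the function name `text_histogram` is hypothetical.

```python
def text_histogram(midpoints, freqs, relative=False):
    """Return one line per class: midpoint, a bar of '#', and the plotted height."""
    n = sum(freqs)
    lines = []
    for m, f in zip(midpoints, freqs):
        # The bar length follows the count either way; only the label changes.
        label = f"{f / n:.3f}" if relative else str(f)
        lines.append(f"{m:>4} | {'#' * f} {label}")
    return lines


midpoints = [7, 9, 11, 13, 15, 17]
freqs = [8, 10, 14, 7, 0, 1]  # the 7 is inferred, see the note above

freq_lines = text_histogram(midpoints, freqs)
rel_lines = text_histogram(midpoints, freqs, relative=True)

print("\n".join(freq_lines))
print()
print("\n".join(rel_lines))
```

The bars are identical in both printouts; only the numbers on the height scale change.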


Back to the seniors example earlier. Below is a frequency histogram of this data:

[Figure: frequency histogram of the percent of seniors in 40 locations; horizontal axis: category midpoints.]


… and a relative frequency histogram of the same data:

[Figure: relative frequency histogram of the percent of seniors in 40 locations; horizontal axis: category midpoints.]


Notes on Histograms

Look for the overall _______ and for striking deviations from that pattern

Describe the overall pattern of a histogram by its _____ , centre and spread

Look for ________ , individual values that fall outside the overall pattern.


Histogram Patterns

Symmetric: when reflected about a vertical axis through its centre, the histogram falls on its own image

Asymmetric: otherwise

Positively Skewed: Long tail to the right

Negatively Skewed: Long tail to the left

Modal Class: class with the largest number of observations

Unimodal: histogram with a single peak

Bimodal: histogram with 2 peaks, not necessarily equal in height

Bell-shaped

Today’s Topics


Elements of a good Summary

- Who, What, Why

Summarizing Data

- Different methods for different types of data

Frequencies

- Frequency, relative frequency, percent frequency, cumulative frequency

- Frequency Distribution

- Grouping Interval and Ratio data

Graphical Presentation of Frequencies

- Pie Charts

- Bar Charts

- Histograms


Reading for next lecture

Chapter 2
