Topics for Today
Elements of a good summary
Summarizing Data
Frequencies
Graphical Presentation of Frequencies
Stat302
Fall 2010 – Week1, Lecture 2
Page 1 of 29
Elements of a Good Summary
1. Who?
- What ___________ do the data describe?
Individuals may be people, animals, or things.
- How many individuals appear in the data?
2. What?
- The ______ of variables available.
- Exact definitions of these variables.
- Units of measurement for each variable
Weights, for example, might be recorded in grams or kilograms. Costs might be recorded in $ or millions of $.
3. Why?
- What _______ do the data have?
- What are specific _________ ?
- Do we want to draw conclusions about individuals other than the ones we actually have data for?
Summarizing Specific Variables
Each type of data is most effectively summarized differently:
Proportions / Frequencies:
o ___________
o ___________
o ___________
Means (& SD) / Medians (IQR):
o ________ (counts or if there are many categories)
o ________
But interval and ratio data can be converted to ‘ordinal’ data and presented with proportions and frequencies.
Frequency and Frequency Distribution
Frequencies and frequency distributions are the most commonly used summary statistics
Frequency: number of _____ each unique value of a variable occurs in a data set
Frequency Distribution: listing of the frequency of unique ______ of a variable in a data set
Relative frequency = (# times each value occurs) / (# of obs. in data set)
Percentage frequency = Relative frequency x 100%
Examples of Frequencies for Nominal Data
Example 1 (Nominal Data)
How students get their information on current affairs (in 1995)
Media         Freq   Rel. Freq.   % Freq
Television      37     0.4625      46.25
Newspaper       35     0.4375      43.75
Radio            7     0.0875       8.75
Magazine         1     0.0125       1.25
Total           80     1.0000     100.00
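The relative and percentage frequencies in Example 1 can be recomputed from the counts alone. The following is a minimal sketch; the function name `frequency_table` is an illustrative choice, not something from the lecture.

```python
# Counts taken from Example 1 (how students get information on current affairs).
counts = {"Television": 37, "Newspaper": 35, "Radio": 7, "Magazine": 1}


def frequency_table(counts):
    """Return {category: (frequency, relative frequency, percentage frequency)}."""
    n = sum(counts.values())  # number of observations in the data set
    table = {}
    for category, freq in counts.items():
        rel = freq / n  # relative frequency
        table[category] = (freq, rel, rel * 100)  # percentage = relative x 100%
    return table


table = frequency_table(counts)
for category, (freq, rel, pct) in table.items():
    print(f"{category:<12}{freq:>4}{rel:>9.4f}{pct:>8.2f}")
```

The relative frequencies always add up to 1 and the percentages to 100, which is a quick way to check a table built by hand.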
Example 2 (Nominal Data)
To summarize 2,439 complaints about the comfort-related characteristics of its airplanes, an airline’s customer service department issues the following table:
Nature of Complaint            Rel. Freq.    Freq
Inadequate leg room               .295        719
Uncomfortable seats               .375        914
Narrow aisles                     .060        146
Insufficient carry-on space       ____        ___
Insufficient rest rooms           .024         58
Miscellaneous                     .157        384
Total                            1.000      2,439
Example 3 (Ordinal Data)
Seventy-five (75) students were interviewed regarding how often they eat breakfast:
Frequency of Breakfast Eating   % Freq   Rel. Freq.   Freq
Always                           20%       .2           15
Almost all the time              14.7%     .147         11
Most of the time                 12%       .12           9
Seldom                           24%       .24          18
Never                            _____     ____         __
Total                            100%      1.000        75
Frequency Distributions of Interval and Ratio Data
You can convert _____ or ________ scaled data into ordinal data by grouping, to generate a frequency distribution.
For Ordinal and Nominal data, the categories are obvious; they are the ______ the variable takes.
For Ratio or Interval data, you have to construct the categories, or _______ by defining class boundaries, and midpoints.
Defining Classes for Grouping Interval and Ratio Data
Class intervals should be non-overlapping and _____________ defined.
In most circumstances, the intervals would be of the same width. (Open-ended intervals are sometimes convenient.)
If there are no individuals in a particular interval, it should still be included to _____ a misleading impression of the data.
Example 4 (Ratio)
The following data are percentages of persons 65 years old or older in 40 large ___________ in 2001. Set up a frequency distribution for these data and include the following:
(a) frequencies
(b) midpoints of the classes
(c) percentages
(d) cumulative frequencies
(e) cumulative percentages.
Percentages of Persons 65 Years old or Older in 40 large urban locations in 2000
Location   Percent 65+     Location   Percent 65+
    1        10.1              21       12.5
    2         7.0              22       11.9
    3        10.3              23       11.1
    4        12.2              24        6.9 (L)
    5        11.7              25       11.5
    6        12.2              26        9.6
    7        10.6              27       10.9
    8        12.0              28        9.9
    9         9.2              29        8.9
   10         7.5              30        8.4
   11        13.0              31        7.7
   12        13.1              32       10.8
   13         8.1              33        7.2
   14        12.9              34       10.6
   15        10.7              35       10.9
   16         7.8              36        8.9
   17        17.2 (H)          37       10.4
   18         8.4              38       11.7
   19         7.8              39        8.5
   20         9.7              40        7.3
(H) = highest value, (L) = lowest value
We’ll eventually construct frequency distributions with software, but let’s first do it by hand so that we understand the construction.
First, what is the __________ being measured in this example?
How many individuals are measured?
What is the ________ being measured?
Now, let’s summarize the data for these individuals.
Step 1: Determine the ______ of classes or groups that we wish to construct.
A useful rule of thumb is to have the number of classes ( k ) equal to the square root of the number of individuals.
________________________________
Step 2: Determine the range of values and the class size (width).
width = (highest value - lowest value) / k = (17.2 - 6.9) / 6 = 10.3 / 6 = 1.717
We want the width to be a ‘convenient’ number, so we’ll round this to 2.
Step 3: Determine the classes and tally the data. Make sure that the smallest and largest data values are included in the tally.
Class midpoints   Class      Tally (frequency)
m1 = 7            [6-8)      f1 = 8
m2 = 9            [8-10)     f2 = 10
m3 = 11           [10-12)    f3 = 14
____________      _______    ______
m5 = 15           [14-16)    f5 = 0
m6 = 17           [16-18)    f6 = 1
Since the endpoints of one interval are ‘adjacent’ to the endpoints of the next interval, these numbers are called the real limits or class limits.
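Steps 1 to 3 can also be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the course's software: the width is rounded up with `math.ceil` (one way to reach a 'convenient' number that still covers the range), and the lower boundary 6 is taken from the tally table.

```python
import math

# The 40 values from Example 4 (percent of persons aged 65+ in each location).
data = [
    10.1, 7.0, 10.3, 12.2, 11.7, 12.2, 10.6, 12.0, 9.2, 7.5,
    13.0, 13.1, 8.1, 12.9, 10.7, 7.8, 17.2, 8.4, 7.8, 9.7,
    12.5, 11.9, 11.1, 6.9, 11.5, 9.6, 10.9, 9.9, 8.9, 8.4,
    7.7, 10.8, 7.2, 10.6, 10.9, 8.9, 10.4, 11.7, 8.5, 7.3,
]

# Step 1: number of classes, rule of thumb k ~ sqrt(n), rounded to a whole number.
k = round(math.sqrt(len(data)))  # sqrt(40) is about 6.32

# Step 2: raw class width, then rounded up to a convenient whole number.
raw_width = (max(data) - min(data)) / k  # (17.2 - 6.9) / 6
width = math.ceil(raw_width)  # assumption: round up so the classes cover the range

# Step 3: left-closed, right-open classes [lo, hi), then tally the data.
start = 6  # lower boundary of the first class, as in the tally table
classes = [(start + i * width, start + (i + 1) * width) for i in range(k)]
tally = [sum(1 for x in data if lo <= x < hi) for lo, hi in classes]
midpoints = [(lo + hi) / 2 for lo, hi in classes]

for (lo, hi), m, f in zip(classes, midpoints, tally):
    print(f"[{lo}-{hi})  midpoint {m:g}  frequency {f}")
```

Because each class includes its lower boundary and excludes its upper one, a value such as 12.0 is counted exactly once, and the empty class [14-16) still appears in the output.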
Frequency, percent frequency, cumulative frequency and cumulative percent frequency can then be calculated.
Class      Midpoint (m)   Frequency (f)   Percentage (%)       Cum. Freq. (cf)     Cum. % (c%)
[6, 8)          7               8         100(8/40) = 20.0     f1 = 8                 20
[8, 10)         9              10         100(10/40) = 25.0    f1+f2 = 18             45
[10, 12)       11              14         100(14/40) = 35.0    f1+f2+f3 = 32          80
______         __               _         ______________       ______                 ____
[14, 16)       15               0         100(0/40) = 0.0      … = 39                 97.5
[16, 18)       17               1         100(1/40) = 2.5      … = 40                100
Note: Percentage distributions are used extensively to compare samples with different sample sizes.
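The percentage and cumulative columns follow mechanically from the class frequencies. Here is a small sketch using `itertools.accumulate`; the frequency left blank on the slide is inferred in the code as the remainder of the total of 40, which is an assumption of this example rather than a value printed in the notes.

```python
from itertools import accumulate

n = 40  # number of locations

# Class frequencies for [6,8), [8,10), ..., [16,18). None marks the entry that
# is blank in the table; it is inferred below from the total n.
known = [8, 10, 14, None, 0, 1]
missing = n - sum(f for f in known if f is not None)
freqs = [missing if f is None else f for f in known]

pct = [100 * f / n for f in freqs]           # percentage frequency
cum_freq = list(accumulate(freqs))           # running total of frequencies
cum_pct = [100 * cf / n for cf in cum_freq]  # cumulative percentage

for f, p, cf, cp in zip(freqs, pct, cum_freq, cum_pct):
    print(f"{f:>3}{p:>7.1f}{cf:>5}{cp:>7.1f}")
```

The last cumulative frequency must equal the number of observations and the last cumulative percentage must equal 100, which gives a simple check on the whole table.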
Advantages of Grouping
- it reduces the apparent complexity of the data by reducing the number of separate pieces of information.
- it helps to smooth out irregularities in the data.
Disadvantage of Grouping
- information is lost
Graphical Display of Data
Include title, labels, unit of measurement, etc. to describe the main features.
1. Bar Charts
2. Pie Charts
3. Histograms
Bar Chart: series of rectangular bars where the _______ of the bars represent ___________ of the respective quantities; bars have equal width, label axes, start at zero
Pie Chart: circle divided into sectors in such a way that the area of each ______ is ____________ to the quantity represented.
Pie charts emphasize the proportion of occurrences of each category. Bar charts focus attention on frequencies.
Example of a Pie Chart:
The following data give the breakdown of purposes for which a population makes six million trips on a normal working day.
Purpose               No. of Trips (millions)   Relative Frequency   Angle of Sector (degrees)
To and from work              2.01                   0.335                 120.6
To and from school            1.14                   0.190                  68.4
Social                        0.84                   0.140                  50.4
Personal business             0.64                   0.107                  38.4
To and from shops             0.60                   0.100                  36.0
Other                         0.77                   0.128                  46.2
Total                         6.00                   1.000                 360.0
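Each sector angle is the relative frequency multiplied by 360 degrees. The sketch below recomputes the angles from the trip counts in the table; the helper name `sector_angle` is illustrative only.

```python
# Trips in millions, from the pie chart example table.
trips = {
    "To and from work": 2.01,
    "To and from school": 1.14,
    "Social": 0.84,
    "Personal business": 0.64,
    "To and from shops": 0.60,
    "Other": 0.77,
}

total = sum(trips.values())  # about 6.00 million trips


def sector_angle(value, total):
    """Angle in degrees of one sector: relative frequency x 360."""
    return value / total * 360


angles = {purpose: sector_angle(v, total) for purpose, v in trips.items()}
for purpose, angle in angles.items():
    print(f"{purpose:<20}{trips[purpose] / total:>7.3f}{angle:>8.1f}")
```

The angles add up to 360 degrees, just as the relative frequencies add up to 1.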
Pie Chart
[Figure: pie chart of trips by purpose. Sectors: To and from work 33.50%, To and from school 19.00%, Social 14.00%, Other 12.83%, Personal business 10.67%, To and from shops 10.00%.]
Bar Chart
[Figure: bar chart of trips by purpose; vertical axis ticks from 0.0 to 2.0, horizontal axis: Purpose.]
How to lie with a bar graph
Histogram
A bar chart is used for plotting frequencies of nominal or ordinal variables.
__________ are used for plotting frequencies of _______ interval or ratio data. The main difference is that there is no gap between the bars.
The frequency and the relative frequency can both be plotted; the two plots look almost identical except for the scale of the y-axis.
[Figures: Frequency Histogram; Rel. Freq. Histogram]
The centre of the base of each rectangle is placed at the class mark (the class midpoint).
Back to the earlier seniors example. Below is a frequency histogram of these data:
[Figure: Frequency histogram of the percent of seniors in 40 locations; horizontal axis: category midpoints]
… and a relative frequency histogram of the same data:
[Figure: Relative frequency histogram of the percent of seniors in 40 locations; horizontal axis: category midpoints]
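A rough text rendering of the two histograms can be sketched as follows. It is only an illustration: the bars are drawn sideways as rows of `#` characters, the helper `text_histogram` is a made-up name, and the frequency of the fourth class is inferred from the total of 40 rather than read from the notes.

```python
n = 40  # number of locations
midpoints = [7, 9, 11, 13, 15, 17]

# Class frequencies; the fourth one is inferred as the remainder of the total.
freqs = [8, 10, 14, n - (8 + 10 + 14 + 0 + 1), 0, 1]
rel_freqs = [f / n for f in freqs]


def text_histogram(labels, heights, scale):
    """Return one text row per class, with a bar of round(height * scale) marks."""
    return [
        f"{label:>4} | " + "#" * round(height * scale)
        for label, height in zip(labels, heights)
    ]


freq_plot = text_histogram(midpoints, freqs, scale=1)      # 1 mark per location
rel_plot = text_histogram(midpoints, rel_freqs, scale=n)   # 1 mark per 1/40

print("\n".join(freq_plot))
```

Dividing every frequency by the same number of observations rescales the axis without changing the relative bar lengths, so `freq_plot` and `rel_plot` contain identical rows.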
Notes on Histograms
Look for the overall _______ and for striking deviations from that pattern
Describe the overall pattern of a histogram by its _____ , centre and spread
Look for ________ , individual values that fall outside the overall pattern.
Histogram Patterns
Symmetric: when reflected about a vertical axis, the histogram falls on its own image
Asymmetric: otherwise
Positively Skewed: Long tail to the right
Negatively Skewed: Long tail to the left
Modal Class: class with the largest number of observations
Unimodal: histogram with a single peak
Bimodal: histogram with 2 peaks, not necessarily equal in height
Bell-shaped
Today’s Topics
Elements of a good Summary
- Who, What, Why
Summarizing Data
- Different methods for different types of data
Frequencies
- Frequency, relative frequency, percent frequency, cumulative frequency
- Frequency Distribution
- Grouping Interval and Ratio data
Graphical Presentation of Frequencies
- Pie Charts
- Bar Charts
- Histograms
Reading for next lecture
Chapter 2